My group at Google continues to grow, and this year we had the opportunity to publish two short workshop papers on some of the areas we've been investigating.
- Projecting Disk Usage Based on Historical Trends in a Cloud Environment, Proceedings of the 3rd International Workshop on Scientific Cloud Computing, ACM (2012)
- Uncertainty in Aggregate Estimates from Sampled Distributed Traces, 2012 Workshop on Managing Systems Automatically and Dynamically, USENIX
The first paper describes some of the work we've done on forecasting storage growth in datacenters for capacity planning, using ensemble forecasting methods and trend-change detection. It builds on some of the earlier work we did on web search traffic forecasting and, to a lesser extent, on building a market economy for datacenter resources.
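To give a rough feel for what "ensemble forecasting" means here, below is a minimal sketch, not the method from the paper. It assumes a daily disk-usage series, fits a straight line to several trailing windows, and takes the median of the extrapolations. The function names, the window lengths, and the choice of the median are all my own illustrative assumptions, and the sketch leaves out trend-change detection entirely.

```python
from statistics import median


def linear_fit(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ensemble_forecast(history, horizon, windows=(7, 14, 28)):
    """Median of linear extrapolations fitted over several trailing windows.

    history: usage values, one per day, oldest first (hypothetical input).
    horizon: number of days past the last observation to forecast.
    windows: trailing window lengths in days (illustrative defaults).
    """
    forecasts = []
    for w in windows:
        if w < 2 or w > len(history):
            continue  # not enough data to fit this ensemble member
        slope, intercept = linear_fit(history[-w:])
        # Within the window the last observation sits at x = w - 1.
        forecasts.append(intercept + slope * (w - 1 + horizon))
    if not forecasts:
        raise ValueError("history is too short for every window")
    return median(forecasts)
```

Taking the median rather than the mean keeps one badly fitting window from dominating the projection, which is one common reason to combine several simple models instead of trusting a single fit.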
The second paper, to which I made only minor contributions, is a more mathematical description of a method of quantifying the uncertainty in aggregate metrics from a sampled RPC tracing system for large-scale distributed systems (e.g. Dapper).
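As a simplified illustration of the kind of question involved, and not the method from the paper, the sketch below assumes each trace is kept independently with a known probability. Under that assumption, scaling the sampled sum by the inverse sampling rate estimates the true total, and the standard error follows from the usual Bernoulli-sampling variance. The function name and inputs are hypothetical.

```python
import math


def estimate_total(sampled_values, sampling_rate):
    """Estimate a population total and its standard error from a sample.

    Assumes every item was kept independently with probability
    sampling_rate (Bernoulli sampling); real tracing systems may
    sample differently, so treat this as a sketch only.
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    # Each sampled item stands in for 1 / sampling_rate items.
    total = sum(sampled_values) / sampling_rate
    # Unbiased variance estimate: (1 - p) / p^2 * sum of squared sampled values.
    variance = (1 - sampling_rate) / sampling_rate ** 2 * sum(
        v * v for v in sampled_values
    )
    return total, math.sqrt(variance)
```

For example, `estimate_total([10.0, 20.0, 30.0], 0.5)` gives an estimated total of 120 with a standard error of about 52.9, which shows how wide the uncertainty can be when only a handful of traces contribute to an aggregate.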
Both papers address problems that typically arise in very large-scale distributed systems, so their applicability in smaller contexts is somewhat limited, but I would be very interested in feedback regardless.