Friday, June 7, 2013

New Work on Flash Provisioning at USENIX ATC on June 26

Later this month I'll be at the USENIX Annual Technical Conference in San Jose with some coauthors on the Storage Analytics and Colossus teams at Google to present some of our recent work on optimizing flash provisioning for cloud storage workloads. Our paper is titled "Janus: Optimal Flash Provisioning for Cloud Storage Workloads", and a pre-print is available from Google Research.

This work is about using statistical samples of I/O patterns from a large distributed filesystem to formulate and solve an optimization problem that helps us allocate flash better in our datacenters. I'm looking forward to returning to USENIX ATC as it has been nearly 10 years since I've been to this conference. Shoot me a mail if you will be there and want to meet up.

Sunday, October 14, 2012

Two Recent Short Papers

My group at Google continues to grow, and we had the opportunity to publish a few short workshop papers this year about some of the areas we've investigated.

The first paper describes some of the work we've done on forecasting storage growth in datacenters for capacity planning purposes using ensemble forecasting methods and trend-change detection. It builds on some of the earlier work we did for websearch traffic forecasting and, to a lesser extent, building a market economy for datacenter resources.

The second paper, to which I made only minor contributions, is a more mathematical description of a method of quantifying the uncertainty in aggregate metrics from a sampled RPC tracing system for large-scale distributed systems (e.g. Dapper).
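To give a rough feel for the kind of question the second paper is about, here is a minimal sketch of putting error bars on a total estimated from sampled traces. It is not the method from the paper. It assumes each RPC is sampled independently with a known probability, uses a textbook inverse-probability (Horvitz-Thompson style) estimate, and relies on a normal approximation for the interval. The function name `estimate_total` and its parameters are made up for illustration.

```python
import math


def estimate_total(sampled_values, sampling_rate, z=1.96):
    """Estimate a population total from independently sampled values.

    Assumption: every item was kept independently with probability
    `sampling_rate`. Returns (estimate, lower, upper), where the bounds
    are a normal-approximation interval (z=1.96 is roughly 95%).
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    p = sampling_rate
    # Scale each sampled value by 1/p to account for the unsampled items.
    total = sum(sampled_values) / p
    # Unbiased variance estimate for this estimator under independent sampling.
    variance = (1 - p) / p**2 * sum(x * x for x in sampled_values)
    half_width = z * math.sqrt(variance)
    return total, total - half_width, total + half_width


# Hypothetical example: bytes transferred by three sampled RPCs, 50% sampling.
estimate, low, high = estimate_total([10, 20, 30], sampling_rate=0.5)
print(f"estimated total: {estimate:.0f} (interval {low:.1f} to {high:.1f})")
```

With only three samples the interval is very wide, which is the point: an aggregate number from a sampled system is much more useful when it comes with an indication of how far off it might be.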

Both papers address problems that usually come up in very large-scale distributed systems, so their applicability is somewhat limited in smaller contexts, but I would be very interested in feedback regardless.

Thursday, September 6, 2012

Cycling 320+ Miles Next Week for Charity

Next week I will be cycling from Eureka to San Francisco for the California Climate Ride. I'll be mostly out of touch for the week, but will try to post pictures and check in via email and mobile phone when possible. Please consider donating towards my fundraising goal to support the San Francisco Bicycle Coalition.

Thursday, October 28, 2010

What I've been up to..

It's been nearly a year since I posted here and much has changed. The obvious and most important change is a second new addition to our family, which I've been blogging about elsewhere. On the work front, I was able to publish a paper about some of my work studying the Availability in Globally Distributed Storage Systems at Google last year. This is an exciting space given the growth of cloud-based storage services and sophisticated distributed storage software.

I've been blogging a little more regularly about work-related topics on Google company blogs, with four posts so far this year:
As you can see I've been working on data analysis, distributed cloud storage, and open source, along with some other projects I'm not yet able to talk about. I'll try to post more updates about some of my interests and side projects in the remainder of the year.

Sunday, January 10, 2010

Fun with Amazon Web Services

Amazon has been doing a really great job at selling excess compute capacity in their datacenters through products such as Amazon Elastic Compute Cloud (EC2), Elastic MapReduce, and their simple and structured distributed storage products. The economics of this kind of model, as represented in the two graphs here, are clearly compelling. Instead of buying large numbers of computers to mostly sit idle, new start-up companies, researchers, and individuals can rent the excess capacity from Amazon instead. Last year I worked on some related ideas for internal pricing and provisioning of resources at work. This was my first direct experience with the Amazon consumer offerings, however, and I was impressed. It took less than half an hour last night to sign up, start a few basic Linux instances, copy some application code over, compile it, and begin running it on the Linux Xen instances.

Not everything is so easily scaled to run on more computers. Some tasks are more feasibly done with human involvement, and I've also been experimenting with Amazon Mechanical Turk. This service is named after the 18th-century fake chess-playing machine that actually used a hidden human operator to control the device. I have used this service recently to improve the captions for FreeBSD technical conference videos that I am involved with, and the results have been stunning.

The combination of cheap on-demand distributed computer clusters and a global English-language workforce that can be paid by the task almost engenders too many business ideas to contemplate.. If only there were more hours in the day..

Sunday, June 7, 2009

Support Simon Singh and Scientific Debate

Simon Singh has been sued for libel by the British Chiropractic Association. Simon is an author, journalist, and TV producer who works to popularize math and science. I had the opportunity to hear Simon speak about an earlier book on the Big Bang at Keble College, Oxford. Simon wrote a more recent book on alternative medicine, in which he suggests that there is no evidence for the efficacy of chiropractic treatments for asthma, ear infections, and other infant conditions. British libel laws are stricter than those in the U.S., and this scientific debate has, unbelievably, been construed as a form of libel. Read more about the dispute and sign the petition here.

Monday, June 1, 2009

Erdős Number

My current Erdős number is 4. There are several paths of length 3 from my M.Sc. advisor, Joël Ouaknine, to Paul Erdős. The path currently returned by the AMS Collaboration Distance Calculator is:

  • Murray Stokely coauthored with Joël Ouaknine
  • Joël Ouaknine coauthored with A. W. Roscoe
  • A. W. Roscoe coauthored with Mary Ellen Rudin
  • Mary Ellen Rudin coauthored with Paul Erdős
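
A collaboration distance like this is just a shortest path in the coauthorship graph. As a small sketch, the breadth-first search below recovers the number 4 from the four links listed above. The graph here contains only those four edges, so this is an illustration of the idea rather than how the AMS calculator works; the function names are my own.

```python
from collections import deque

# Coauthorship links taken from the path listed above (treated as undirected).
COAUTHORS = [
    ("Murray Stokely", "Joël Ouaknine"),
    ("Joël Ouaknine", "A. W. Roscoe"),
    ("A. W. Roscoe", "Mary Ellen Rudin"),
    ("Mary Ellen Rudin", "Paul Erdős"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (author, author) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of names on a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


path = shortest_path(build_graph(COAUTHORS), "Murray Stokely", "Paul Erdős")
erdos_number = len(path) - 1  # number of coauthorship links on the path
print(erdos_number)  # prints 4
```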