Thursday, October 28, 2010

What I've been up to...

It's been nearly a year since I posted here and much has changed. The obvious and most important change is a second new addition to our family, which I've been blogging about elsewhere. On the work front, last year I was able to publish a paper, "Availability in Globally Distributed Storage Systems", about some of my work at Google. This is an exciting space given the growth of cloud-based storage services and sophisticated distributed storage software.

I've been blogging a little more regularly about work-related topics on Google company blogs, with four posts so far this year.
As you can see I've been working on data analysis, distributed cloud storage, and open source, along with some other projects I'm not yet able to talk about. I'll try to post more updates about some of my interests and side projects in the remainder of the year.

Sunday, January 10, 2010

Fun with Amazon Web Services

Amazon has been doing a really great job of selling excess compute capacity in its datacenters through products such as Amazon Elastic Compute Cloud (EC2), Elastic MapReduce, and its simple and structured distributed storage products. The economics of this kind of model, as represented in the two graphs here, are clearly compelling. Instead of buying large numbers of computers that mostly sit idle, start-up companies, researchers, and individuals can rent the excess capacity from Amazon. Last year I worked on some related ideas for internal pricing and provisioning of resources at work. This was my first direct experience with Amazon's consumer offerings, however, and I was impressed. It took less than half an hour last night to sign up, start a few basic Linux instances, copy some application code over, compile it, and begin running it on the Xen-based Linux instances.
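To make the rent-versus-buy argument concrete, here is a rough back-of-the-envelope sketch. The function names and every figure in it are hypothetical, picked only for illustration. They are not Amazon's prices or anything from my work, and the model deliberately ignores power, cooling, staff, and networking costs.

```python
def owned_cost_per_useful_hour(purchase_price, lifetime_hours, utilization):
    """Hardware cost of each hour of useful work on a machine you own."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return purchase_price / (lifetime_hours * utilization)


def break_even_utilization(purchase_price, lifetime_hours, rental_rate_per_hour):
    """Utilization above which owning becomes cheaper than renting."""
    return purchase_price / (lifetime_hours * rental_rate_per_hour)


if __name__ == "__main__":
    # Hypothetical inputs, not real prices.
    price = 2000.0            # purchase price of one server
    lifetime = 3 * 365 * 24   # three-year service life, in hours
    rental = 0.10             # rental price per instance-hour

    threshold = break_even_utilization(price, lifetime, rental)
    print(f"Owning only wins above {threshold:.0%} utilization")
    for u in (0.1, 0.5, 1.0):
        cost = owned_cost_per_useful_hour(price, lifetime, u)
        print(f"utilization {u:.0%}: {cost:.3f} per useful hour owned "
              f"vs {rental:.3f} rented")
```

Under these made-up numbers, a machine that sits idle most of the time costs far more per useful hour than a rented one, which is the whole point of the model.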

Not everything is so easily scaled to run on more computers. Some tasks are more feasibly done with human involvement, and I've also been experimenting with Amazon Mechanical Turk. This service is named after the 18th-century fake chess-playing machine that actually used a hidden human operator to control the device. I have used this service recently to improve the captions for the FreeBSD technical conference videos that I am involved with, and the results have been stunning.

The combination of cheap on-demand distributed computer clusters and a global English-language workforce that can be paid by the task engenders almost too many business ideas to contemplate. If only there were more hours in the day...