Tuesday, December 17, 2013

HistogramTools 0.3

Version 0.3 of HistogramTools is now on CRAN. HistogramTools provides a number of methods for manipulating histograms, measuring the distance between histograms, calculating the information loss due to binning aggregate data sets, and other tools useful for statistical analysis of binned/histogram data. It also uses RProtoBuf to provide a protocol buffer representation of the default R histogram class, so that histograms over large data sets can be computed and manipulated in a MapReduce environment with tools written in other languages.

The full list of updates:

  • Moved 'Hmisc' from Depends to Imports.
  • Improved introduction vignette significantly.
  • Added ScaleHistograms function.
  • Added PlotRelativeFrequency function to plot relative frequency histograms.
  • Added minkowski.dist, intersect.dist, kl.divergence, jeffrey.divergence measures for two histograms.
  • Added PreBinnedHistogram for creating histogram objects from an already binned dataset (e.g. just a vector of bins and counts).
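
To give a feel for what the new distance measures compute, here is a minimal Python sketch. It is not the package's R code: the function names below are hypothetical Python stand-ins, the formulas are the common textbook definitions, and the normalization to relative frequencies is an assumption that may differ from what HistogramTools does internally. Both histograms are assumed to share the same bin breaks, so only the count vectors matter. For the Jeffrey divergence the sketch uses the symmetric form that compares each histogram with the bin-wise mean of the two; other sources define it as the sum of the two KL divergences.

```python
import math

def normalize(counts):
    """Convert raw bin counts to relative frequencies (assumed normalization)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]

def minkowski_dist(p, q, order):
    """Minkowski distance of the given order between two frequency vectors."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)

def intersect_dist(p, q):
    """One minus the overlap of two frequency vectors (0 means identical)."""
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """Kullback-Leibler divergence of q from p; infinite if q lacks mass where p has it."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # empty bins in p contribute nothing
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total

def jeffrey_divergence(p, q):
    """Symmetric divergence against the bin-wise mean of p and q (assumed form)."""
    total = 0.0
    for a, b in zip(p, q):
        m = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total

# Two histograms over the same four bins, given only as counts.
h1 = normalize([1, 2, 3, 4])
h2 = normalize([4, 3, 2, 1])

print("Minkowski (order 1):", minkowski_dist(h1, h2, 1))
print("Minkowski (order 2):", minkowski_dist(h1, h2, 2))
print("Intersection:", intersect_dist(h1, h2))
print("KL divergence:", kl_divergence(h1, h2))
print("Jeffrey divergence:", jeffrey_divergence(h1, h2))
```

Unlike the KL divergence, the Jeffrey form used here stays finite when one histogram has an empty bin where the other does not, which is handy for sparse histograms.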

Dirk's CRANberries service provides a diff against the previous release, 0.2. More information is available on the HistogramTools page on CRAN, which includes the 18-page package vignette and a 1-page Quick Reference Guide. Please mail me directly with any questions or suggestions about this package.

Friday, June 7, 2013

New Work on Flash Provisioning at USENIX ATC on June 26

Later this month I'll be at the USENIX Annual Technical Conference in San Jose with some coauthors on the Storage Analytics and Colossus teams at Google to present some of our recent work on optimizing flash provisioning for cloud storage workloads. Our paper is titled "Janus: Optimal Flash Provisioning for Cloud Storage Workloads", and a pre-print is available from Google Research.

This work is about using statistical samples of I/O patterns from a large distributed filesystem to formulate and solve an optimization problem that helps us allocate flash better in our datacenters. I'm looking forward to returning to USENIX ATC as it has been nearly 10 years since I've been to this conference. Shoot me a mail if you will be there and want to meet up.