Last year, Rini Kaushik and I published a paper, "GreenHDFS: Towards An
Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster", at
HotPower '10 (PDF here:
http://www.usenix.org/event/hotpower10/tech/full_papers/Kaushik.pdf), that
analyzed the "hotness" of files based on real namenode audit logs.
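
For a rough first look at file popularity from an audit log, something like
the sketch below may help. It is not the method from the paper; it simply
counts open events per path. It assumes the audit lines carry key=value
fields such as cmd=open and src=/some/path, and that paths contain no
whitespace. The function names count_opens and hottest are made up for
illustration.

```python
import re
from collections import Counter

# Assumption: each audit line contains key=value fields (e.g. cmd=open,
# src=/path) separated by tabs or spaces, and values contain no whitespace.
_FIELD = re.compile(r"(\w+)=(\S+)")


def count_opens(lines):
    """Return a Counter mapping file path -> number of cmd=open events."""
    opens = Counter()
    for line in lines:
        fields = dict(_FIELD.findall(line))
        if fields.get("cmd") == "open" and "src" in fields:
            opens[fields["src"]] += 1
    return opens


def hottest(lines, n=10):
    """Return the n most frequently opened paths as (path, count) pairs."""
    return count_opens(lines).most_common(n)
```

Passing an open file object (for example the namenode audit log) as lines
should work, since the functions only iterate over it line by line.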

Hi all,

We're trying to set up monitoring on HDFS that can detect when a datanode
or a data block is "hot". It would be useful to see patterns of popularity
in live HDFS deployments.

Would anyone know if there are any publicly available statistics on data
access patterns that we