On Apr 10, 2009, at 2:06 PM, Todd Lipcon wrote:

On Fri, Apr 10, 2009 at 12:03 PM, Brian Bockelman <[email protected]> wrote:



0.19.1 with a few convenience patches (mostly, they improve logging so the local file system researchers can play around with our data patterns).


Hey Brian,

I'm curious about this. Could you elaborate a bit on what kind of stuff you're logging? I'm interested in what FS metrics you're looking at and how you instrumented the code.

-Todd

No clue what they're doing *with* the data, but I know what we've applied to HDFS to get the data. We apply both of these patches:
http://issues.apache.org/jira/browse/HADOOP-5222
https://issues.apache.org/jira/browse/HADOOP-5625

These add the duration and offset to each read, and each read is then logged through the HDFS audit mechanisms. We've been pulling the logs through the web interface, putting them back into HDFS, and then processing them (actually, today we've been playing with log collection via Chukwa).
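To make this kind of access-pattern analysis concrete, here is a minimal Python sketch that works on per-read records after they have been pulled out of the logs. It assumes each record has already been parsed into a block ID, offset, length, and duration. The `ReadRecord` type, the `summarize` function, and the sample block IDs are hypothetical, and the exact log line format produced by the HADOOP-5222/HADOOP-5625 patches is not reproduced here. A read counts as sequential when it starts exactly where the previous read on the same block ended, and throughput is reported as total bytes over total read time.

```python
from dataclasses import dataclass


@dataclass
class ReadRecord:
    # Hypothetical fields, assumed to be extracted from the per-read log entries.
    block_id: str
    offset: int         # byte offset of the read within the block
    length: int         # bytes transferred by this read
    duration_ms: float  # time the read took, in milliseconds


def summarize(records):
    """Return (sequential, non_sequential, aggregate_bytes_per_sec).

    A read is sequential if it begins exactly where the previous read on the
    same block ended; all other reads, including the first read seen for a
    block, are counted as non-sequential.
    """
    last_end = {}  # block_id -> end offset of the previous read on that block
    sequential = 0
    non_sequential = 0
    total_bytes = 0
    total_secs = 0.0
    for r in records:
        if last_end.get(r.block_id) == r.offset:
            sequential += 1
        else:
            non_sequential += 1
        last_end[r.block_id] = r.offset + r.length
        total_bytes += r.length
        total_secs += r.duration_ms / 1000.0
    # Aggregate throughput: total bytes divided by total time spent reading.
    throughput = total_bytes / total_secs if total_secs > 0 else 0.0
    return sequential, non_sequential, throughput


# Illustrative input only; these are not real trace records.
example = [
    ReadRecord("blk_1", 0, 65536, 10.0),
    ReadRecord("blk_1", 65536, 65536, 10.0),  # continues the previous read
    ReadRecord("blk_2", 4096, 4096, 2.0),
]

if __name__ == "__main__":
    print(summarize(example))
```

Keying the sequential check on block ID alone is a simplification; a fuller analysis would probably also key on the reading client, so that interleaved readers of the same block are not misclassified.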

There is a student who is looking at our cluster's I/O access patterns, and there are a few folks who design metadata caching algorithms and love to see application traces. Personally, I'm interested in hooking the logfiles up to our I/O accounting system so I can keep historical records of transfers and compare them to our other file systems.

Brian

