On Apr 10, 2009, at 2:06 PM, Todd Lipcon wrote:
On Fri, Apr 10, 2009 at 12:03 PM, Brian Bockelman <[email protected]> wrote:
0.19.1 with a few convenience patches (mostly, they improve logging so the local file system researchers can play around with our data patterns).
Hey Brian,

I'm curious about this. Could you elaborate a bit on what kind of stuff you're logging? I'm interested in what FS metrics you're looking at and how you instrumented the code.

-Todd
No clue what they're doing *with* the data, but I know what we've applied to HDFS to get the data. We apply both of these patches:
http://issues.apache.org/jira/browse/HADOOP-5222
https://issues.apache.org/jira/browse/HADOOP-5625
Together, these add the duration and offset to each read. Each read is then logged through the HDFS audit mechanisms. We've been pulling the logs through the web interface, putting them back into HDFS, and then processing them (actually, today we've been playing with log collection via Chukwa).
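For illustration only, here is a minimal Python sketch of the kind of post-processing that a per-read offset and duration make possible: grouping reads by file and counting how many start where the previous read of that file ended. The `ReadRecord` fields (in particular `length`) and the `summarize` helper are hypothetical, not the actual log format produced by the patches above; the sketch assumes the log lines have already been parsed into records, in log order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReadRecord:
    # Hypothetical record; the real trace lines from the patches may differ.
    path: str
    offset: int       # byte offset where the read started
    length: int       # bytes read (assumed field, not named in the patches)
    duration_ms: int  # how long the read took


def summarize(records):
    """Per-file read count, total read time, and fraction of sequential reads."""
    by_path = defaultdict(list)
    for r in records:
        by_path[r.path].append(r)

    summary = {}
    for path, reads in by_path.items():
        # A read counts as "sequential" if it begins exactly where the
        # previous read of the same file (in log order) ended.
        sequential = sum(
            1
            for prev, cur in zip(reads, reads[1:])
            if cur.offset == prev.offset + prev.length
        )
        transitions = len(reads) - 1
        summary[path] = {
            "reads": len(reads),
            "total_ms": sum(r.duration_ms for r in reads),
            "sequential_fraction": sequential / transitions if transitions else 0.0,
        }
    return summary
```

A high sequential fraction suggests streaming-style access; a low one points to seeks within the file, which is the sort of pattern the metadata caching folks would want to see.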
There is a student who is looking at our cluster's I/O access patterns, and there are a few folks who work on designing metadata caching algorithms and love to see application traces. Personally, I'm interested in hooking the logfiles up to our I/O accounting system so I can keep historical records of transfers and compare them to our other file systems.
Brian