Also, Chukwa (a project already in Hadoop contrib) is designed to do
something similar with Hadoop directly:
http://wiki.apache.org/hadoop/Chukwa
I think some of the examples even mention Apache logs. Haven't used
it personally, but it looks nice.
Brian
On Apr 9, 2009, at 11:14 PM, Alex Loddengaard wrote:
This is a great idea and a common application, Ricky. Scribe is probably
useful for you as well:
<http://sourceforge.net/projects/scribeserver/>
<http://www.flickr.com/photos/niallkennedy/2197670659/>
Scribe is what Facebook uses to get its Apache logs to Hadoop.
Unfortunately, HDFS doesn't (yet) have append, so you'll have to batch
log files and load them into HDFS in bulk.
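That batching pattern can be sketched roughly like this (class name, file naming, and the batch threshold are all illustrative, not from Scribe): buffer events in a local file, roll it when the batch is full, then bulk-load each closed file into HDFS with a single `hadoop fs -put`.

```python
import os
import subprocess


class LogBatcher:
    """Buffer log events in a local file; when a batch fills up,
    close the file and hand it off for a bulk HDFS upload."""

    def __init__(self, local_dir, batch_size=10000):
        self.local_dir = local_dir
        self.batch_size = batch_size
        self.seq = 0
        self._open_new_file()

    def _open_new_file(self):
        # Sequence-numbered file names keep batches ordered and unique.
        self.path = os.path.join(self.local_dir, "events-%06d.log" % self.seq)
        self.seq += 1
        self.fh = open(self.path, "w")
        self.count = 0

    def write(self, event):
        self.fh.write(event + "\n")
        self.count += 1
        if self.count >= self.batch_size:
            self.roll()

    def roll(self):
        """Close the current batch and start a new one. A real deployment
        would retry the upload on failure and delete the local copy."""
        self.fh.close()
        closed = self.path
        # Bulk load into HDFS (commented out so the sketch runs anywhere):
        # subprocess.check_call(["hadoop", "fs", "-put", closed, "/logs/"])
        self._open_new_file()
        return closed
```

Rolling on size (or a timer) is the usual workaround while HDFS lacks append: each closed file is immutable, so the bulk load never races with a writer.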
Alex
On Thu, Apr 9, 2009 at 9:06 PM, Ricky Ho <[email protected]> wrote:
I want to analyze the traffic pattern and statistics of a distributed
application. I am thinking of having the application write the events
as log entries into HDFS, and then later I can use a Map/Reduce task
to do the analysis in parallel. Is this a good approach?
In this case, does HDFS support concurrent writes (appends) to a file?
Another question is whether the write API is thread-safe?
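The parallel analysis described above maps naturally onto a word-count-style Map/Reduce job, e.g. via Hadoop Streaming. A minimal sketch (the log format and field positions are assumptions for illustration): the mapper emits one `(endpoint, 1)` pair per log line, and the reducer sums the counts per endpoint.

```python
from itertools import groupby


def mapper(lines):
    """Emit (endpoint, 1) for each log line; assumes the endpoint
    is the third whitespace-separated field (a made-up format)."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            yield fields[2], 1


def reducer(pairs):
    """Sum counts per endpoint. Hadoop delivers mapper output sorted
    by key, which is what groupby relies on here."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(v for _, v in group)
```

In a Streaming job each function would read stdin and print tab-separated key/value lines; the same logic can be run locally by sorting the mapper output before feeding it to the reducer.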
Rgds,
Ricky