Also, Chukwa (a project already in Hadoop contrib) is designed to do something similar with Hadoop directly:

http://wiki.apache.org/hadoop/Chukwa

I think some of the examples even mention Apache logs. Haven't used it personally, but it looks nice.

Brian

On Apr 9, 2009, at 11:14 PM, Alex Loddengaard wrote:

This is a great idea and a common application, Ricky. Scribe is probably
useful for you as well:

<http://sourceforge.net/projects/scribeserver/>
<http://www.flickr.com/photos/niallkennedy/2197670659/>


Scribe is what Facebook uses to get its Apache logs to Hadoop.
Unfortunately, HDFS doesn't (yet) have append, so you'll have to batch log
files and load them into HDFS in bulk.
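One common workaround for the missing append is to "close" the live log file on a schedule and bulk-load each closed batch with `hadoop fs -put`. A minimal Python sketch of that pattern is below; the file names, directories, and helper functions (`close_batch`, `bulk_load`) are illustrative assumptions, not anything from this thread:

```python
# Hypothetical sketch: since HDFS has no append yet, batch log events into
# local files and bulk-load each closed batch into HDFS.
import shutil
import subprocess
import time
from pathlib import Path

def close_batch(current: Path, batch_dir: Path) -> Path:
    """Rename the live log so the writer can reopen a fresh file,
    and return the path of the closed (no-longer-written) batch."""
    batch_dir.mkdir(exist_ok=True)
    closed = batch_dir / f"app-{int(time.time())}.log"
    current.rename(closed)
    current.touch()  # writer (or logrotate) continues into a fresh file
    return closed

def bulk_load(closed: Path, hdfs_dir: str = "/logs/myapp") -> None:
    """Push one closed batch into HDFS via the command-line client.
    The hdfs_dir path is an assumption; 'hadoop' must be on PATH."""
    if shutil.which("hadoop"):  # skip quietly on machines without Hadoop
        subprocess.run(["hadoop", "fs", "-put", str(closed), hdfs_dir],
                       check=True)
```

In practice the same effect is often achieved with plain logrotate plus a cron job running `hadoop fs -put`; the point is just that only closed files, never the file currently being appended to, go into HDFS.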

Alex

On Thu, Apr 9, 2009 at 9:06 PM, Ricky Ho <[email protected]> wrote:

I want to analyze the traffic patterns and statistics of a distributed
application. I am thinking of having the application write events as log
entries into HDFS, and then later using a Map/Reduce job to do the
analysis in parallel. Is this a good approach?
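For the analysis step, one approach is Hadoop Streaming with a small mapper and reducer. The sketch below counts requests per URL path; the assumed input is Apache common log format (request path in the seventh whitespace-separated field), which may not match your application's events:

```python
# Hypothetical Streaming-style mapper/reducer: count requests per URL path.
from itertools import groupby

def mapper(lines):
    """Emit (path, 1) for each Apache common-log request line."""
    for line in lines:
        fields = line.split()
        if len(fields) > 6:
            yield fields[6], 1  # field 7 is the request path

def reducer(pairs):
    """Sum counts per path; Streaming delivers mapper output sorted by key,
    so consecutive grouping is sufficient."""
    for path, group in groupby(pairs, key=lambda kv: kv[0]):
        yield path, sum(count for _, count in group)
```

In a real job you would wire these to stdin/stdout as tab-separated records and submit them with the streaming jar (`hadoop jar .../hadoop-streaming.jar -mapper ... -reducer ...`); the functions above just show the per-record logic.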

In this case, does HDFS support concurrent writes (appends) to a file?
Another question: is the write API thread-safe?

Rgds,
Ricky

