Hi Emile,

On Tue, Dec 4, 2012 at 2:04 AM, Emile Kao <emile...@gmx.net> wrote:
> 1. Which is the best way to implement such a scenario using Flume/Hadoop?

You could use the file spooling client / source, available in the latest
trunk and the upcoming Flume 1.3.0 builds, to stream these files in, along
with the HDFS sink.
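As a rough sketch of what I mean (the agent name, directories, and NameNode
address below are only placeholders, so adjust them to your environment; a
file channel would be the more durable choice in production):

  agent.sources  = spool
  agent.channels = ch1
  agent.sinks    = sink1

  # Spooling directory source: reads files dropped into spoolDir.
  # Files should be complete and immutable once they land there.
  agent.sources.spool.type     = spooldir
  agent.sources.spool.spoolDir = /var/log/app/spool
  agent.sources.spool.channels = ch1

  agent.channels.ch1.type = memory

  # HDFS sink with configurable file rolling
  agent.sinks.sink1.type              = hdfs
  agent.sinks.sink1.channel           = ch1
  agent.sinks.sink1.hdfs.path         = hdfs://namenode:8020/flume/logs
  agent.sinks.sink1.hdfs.fileType     = DataStream
  # roll the HDFS file every 10 minutes or 128 MB, whichever comes first
  agent.sinks.sink1.hdfs.rollInterval = 600
  agent.sinks.sink1.hdfs.rollSize     = 134217728
  agent.sinks.sink1.hdfs.rollCount    = 0

You would then start the agent with something like
"flume-ng agent --conf-file example.conf --name agent". Note that the
spooling source renames files (by default to *.COMPLETED) once it has
finished ingesting them, which matters for your second question below.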
> 2. The customer would like to keep the log files in their original state
> (file name, size, etc.). Is it practicable using Flume?

Not recommended. Flume is an event streaming system, not a file copying
mechanism. If you want to do that, just use some scripts with hadoop fs -put
instead of Flume (see the sketch at the end of this mail). Flume provides a
bunch of stream-oriented features on top of its event streaming
architecture, such as data enrichment capabilities, event routing, and
configurable file rolling on HDFS, to name a few.

Regards,
Mike
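P.S. If you do go the plain file copy route, the kind of script I have in
mind is just a small wrapper around hadoop fs -put that you could run from
cron; the directories below are only placeholders for illustration:

  #!/bin/sh
  # Copy rotated log files into HDFS as-is, keeping the original file names.
  SRC_DIR=/var/log/app                  # where rotated log files land
  ARCHIVE_DIR=/var/log/app/uploaded     # local archive for shipped files
  DEST_DIR=/data/raw-logs/$(date +%Y-%m-%d)

  mkdir -p "$ARCHIVE_DIR"
  # ignore the error if the target directory already exists
  hadoop fs -mkdir "$DEST_DIR" 2>/dev/null

  for f in "$SRC_DIR"/*.log; do
    [ -e "$f" ] || continue
    # -put keeps the original name and refuses to overwrite an existing
    # target, so an accidental re-run will not clobber data already in HDFS
    if hadoop fs -put "$f" "$DEST_DIR/"; then
      mv "$f" "$ARCHIVE_DIR/"
    fi
  done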