Hi Jeff & JS,
I tried using the spooling directory source & a memory channel. It still
takes ~4 minutes to copy 1 GB of data into HDFS.
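For reference, this is roughly the spooling-directory setup I tested (the
agent, directory, and capacity values below are illustrative, not tuned):

```properties
agent.sources = spoolSource
agent.channels = memChannel

# Spooling directory source: picks up completed files dropped into spoolDir
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.batchSize = 1000
agent.sources.spoolSource.channels = memChannel

# In-memory channel traded durability for speed in this test
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 10000
```

Note that the spooling source requires files to be immutable once placed in
the directory; it renames (or deletes) them after ingestion.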
By the way, thanks for suggesting the spooling source. I think it is better
than exec + cat in my case.
Cuong LUU
On 21/10/2013 22:50, Jeff Lord wrote:
Luu,
Have you tried using the spooling directory source?
-Jeff
On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[email protected]> wrote:
Hi all,
I need to copy data from a local directory (on the Hadoop server) into HDFS
regularly and automatically. This is my Flume config:
agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink
agent.sources.execSource.type = exec
agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
agent.sources.execSource.restart = true
agent.sources.execSource.restartThrottle = 3600000
agent.sources.execSource.batchSize = 100
...
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 100000
...
agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
...
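One likely bottleneck in the config above is the mismatch between the
source batch size (100 events per transaction) and the much larger sink
batch. A sketch of aligned values (illustrative numbers, not tuned for any
particular hardware):

```properties
# Commit larger batches from the exec source so the channel is not
# the bottleneck; keep the channel transaction size at least as large.
agent.sources.execSource.batchSize = 10000
agent.channels.fileChannel.transactionCapacity = 10000

# Match the HDFS sink batch to the source batch so events flush in
# comparably sized writes. (The sink property is hdfs.batchSize.)
agent.sinks.hdfsSink.hdfs.batchSize = 10000
```

The file channel also persists every event to disk for durability, which
adds write overhead that a plain hadoop copy does not pay.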
While the hadoop command takes about 30 seconds, Flume takes around 4
minutes to copy a 1 GB text file into HDFS. I am not sure whether my
config is bad or whether I shouldn't use Flume in this case at all.
What do you think?