Hi all,

I need to copy data from a local directory (on the Hadoop server) into HDFS
regularly and automatically. This is my Flume config:

agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink

agent.sources.execSource.type = exec

agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat "$i"; done

agent.sources.execSource.restart = true
agent.sources.execSource.restartThrottle = 3600000
agent.sources.execSource.batchSize = 100

...
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 100000
...
agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
...

While the hadoop command takes about 30 seconds, Flume takes around 4 minutes
to copy a 1 GB text file into HDFS. I am wondering whether my config is bad,
or whether I simply shouldn't use Flume for this use case.
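For reference, the 30-second baseline I am comparing against is a plain HDFS client copy, something like the following (the file and target paths here are just placeholders):

```shell
# Direct HDFS copy used as the 30-second baseline.
# Paths are illustrative placeholders, not my real layout.
hdfs dfs -put /local-dir/bigfile.txt /target-dir/
```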

What is your opinion?