Hi all, I need to copy data from a local directory (on the Hadoop server) into HDFS regularly and automatically. This is my Flume config:
agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink

agent.sources.execSource.type = exec
agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
agent.sources.execSource.restart = true
agent.sources.execSource.restartThrottle = 3600000
agent.sources.execSource.batchSize = 100
...
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.batchsize = 100000
...
agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
...

While the hadoop command takes about 30 seconds, Flume takes around 4 minutes to copy a 1 GB text file into HDFS. Is my configuration bad, or should I not be using Flume for this case at all? What is your opinion?
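For context, here is a minimal sketch of how the quoted fragments would fit together as one agent definition. The source-to-channel and sink-to-channel bindings are my assumptions (they are required for events to flow but are not shown above, presumably hidden in the "..." parts), as are the file-channel checkpoint/data directories, which the FILE channel type needs:

```
# Sketch only - the binding lines and directory paths below are assumptions,
# not taken from the config quoted in the question.
agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# Assumed bindings: without these, events cannot flow source -> channel -> sink
agent.sources.execSource.channels = fileChannel
agent.sinks.hdfsSink.channel = fileChannel

agent.sources.execSource.type = exec
agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done

agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
# Assumed paths: the FILE channel persists every event to local disk here,
# which is one reason it is slower than a direct hdfs dfs -put
agent.channels.fileChannel.checkpointDir = /var/flume/checkpoint
agent.channels.fileChannel.dataDirs = /var/flume/data
```

Note that with a FILE channel, every event is written to local disk (and fsynced) before it ever reaches the HDFS sink, so some overhead compared with a direct copy is expected.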
