Hi all, I need to copy data from a local directory (on the Hadoop server) into HDFS regularly and automatically. This is my Flume config:
agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink

agent.sources.execSource.type = exec
agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
agent.sources.execSource.restart = true
agent.sources.execSource.restartThrottle = 3600000
agent.sources.execSource.batchSize = 100
...
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.batchsize = 100000
...
agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
...

While the hadoop command takes about 30 seconds, Flume takes around 4 minutes to copy a 1 GB text file into HDFS. Is my configuration bad, or should I not be using Flume for this case at all? What is your opinion?
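For context, here is a minimal sketch of how the quoted fragments would fit together as one agent definition. The source-to-channel and sink-to-channel bindings are my assumptions (they are required for events to flow but are not shown above, presumably hidden in the "..." parts), as are the file-channel checkpoint/data directories, which the FILE channel type needs:

```
# Sketch only - the binding lines and directory paths below are assumptions,
# not taken from the config quoted in the question.
agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# Assumed bindings: without these, events cannot flow source -> channel -> sink
agent.sources.execSource.channels = fileChannel
agent.sinks.hdfsSink.channel = fileChannel

agent.sources.execSource.type = exec
agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done

agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
# Assumed paths: the FILE channel persists every event to local disk here,
# which is one reason it is slower than a direct hdfs dfs -put
agent.channels.fileChannel.checkpointDir = /var/flume/checkpoint
agent.channels.fileChannel.dataDirs = /var/flume/data
```

Note that with a FILE channel, every event is written to local disk (and fsynced) before it ever reaches the HDFS sink, so some overhead compared with a direct copy is expected.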
