Hi Jeff & JS,

I tried using the spooling directory source with a memory channel. It still takes ~4 minutes to copy 1 GB of data into HDFS.

By the way, thanks for suggesting the spooling source. I think it is better than exec + cat in my case.
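For reference, the spooling directory setup I'm testing looks roughly like this (agent, channel, and path names are illustrative, and the batch sizes are just starting points to tune):

```
# Sketch of a spooling-directory -> memory channel -> HDFS pipeline.
# Names and paths are placeholders; adjust to your environment.
agent.sources = spoolSrc
agent.channels = memCh
agent.sinks = hdfsSink

agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /local-dir
agent.sources.spoolSrc.channels = memCh
# Larger source batches reduce per-transaction overhead
agent.sources.spoolSrc.batchSize = 1000

agent.channels.memCh.type = memory
agent.channels.memCh.capacity = 100000
agent.channels.memCh.transactionCapacity = 10000

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memCh
agent.sinks.hdfsSink.hdfs.path = /flume/events
# DataStream writes plain text instead of SequenceFiles
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.batchSize = 10000
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
```

The source batchSize, the channel transactionCapacity, and the sink's hdfs.batchSize all limit how many events move per transaction, so throughput is usually bounded by the smallest of them.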

Cuong LUU

On 21/10/2013 22:50, Jeff Lord wrote:
Luu,

Have you tried using the spooling directory source?

-Jeff


On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[email protected]> wrote:

    Hi all,

    I need to copy data from a local directory (on the Hadoop server) into
    HDFS regularly and automatically. This is my Flume config:

    agent.sources = execSource
    agent.channels = fileChannel
    agent.sinks = hdfsSink

    agent.sources.execSource.type = exec

    agent.sources.execSource.shell = /bin/bash -c
    agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done

    agent.sources.execSource.restart = true
    agent.sources.execSource.restartThrottle = 3600000
    agent.sources.execSource.batchSize = 100

    ...
    agent.sinks.hdfsSink.hdfs.rollInterval = 0
    agent.sinks.hdfsSink.hdfs.rollSize = 262144000
    agent.sinks.hdfsSink.hdfs.rollCount = 0
    agent.sinks.hdfsSink.hdfs.batchSize = 100000
    ...
    agent.channels.fileChannel.type = FILE
    agent.channels.fileChannel.capacity = 100000
    ...

    While the hadoop command takes about 30 seconds, Flume takes around
    4 minutes to copy a 1 GB text file into HDFS. I am worried that my
    config is not good, or that I shouldn't use Flume in this case.

    What is your opinion?


