Hi, what throughput can I expect when writing to the HDFS sink? Here is the Flume config I'm using:
# Sources, channels and sinks are defined per agent,
# in this case called 'agent1'.

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an exec source called exec-source1 on agent1 that runs
# linesource.sh, and connect it to channel ch1.
agent1.sources.exec-source1.channels = ch1
agent1.sources.exec-source1.type = exec
agent1.sources.exec-source1.restart = true
agent1.sources.exec-source1.batchSize = 100
agent1.sources.exec-source1.command = /home/ubuntu/flume/linesource.sh

# Define an HDFS sink called hdfs-sink1 that writes all events it receives
# to HDFS, and connect it to the other end of the same channel.
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://ip-10-000-000-000.ec2.internal/user/ubuntu/event
agent1.sinks.hdfs-sink1.hdfs.filePrefix = event
agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
agent1.sinks.hdfs-sink1.hdfs.rollInterval = 60
agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink1.hdfs.batchSize = 1000

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = exec-source1
agent1.sinks = hdfs-sink1

So far I only get about 20 Mb/min, i.e. less than 1 Mb/s. I am wondering how far this can be improved. Are there any benchmarks of HDFS sink performance?

Thanks in advance,
Pankaj
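For reference, the 20 Mb/min figure works out to roughly 0.33 Mb/s. Since the source and sink batch sizes in the config are counted in events rather than bytes, it can also help to express the rate in events per second. The sketch below does both conversions. It assumes "Mb" means megabytes (1 MB = 1,000,000 bytes) and uses a hypothetical average line size of 100 bytes for the output of linesource.sh; neither value comes from the setup above.

```python
def per_second(amount_per_minute: float) -> float:
    """Convert a per-minute throughput figure to a per-second one."""
    return amount_per_minute / 60.0


def events_per_second(mb_per_minute: float, avg_event_bytes: int) -> float:
    """Estimate events/s from an MB/min figure and an ASSUMED average event size.

    Treats 1 MB as 1_000_000 bytes; avg_event_bytes is a hypothetical input.
    """
    bytes_per_second = per_second(mb_per_minute) * 1_000_000
    return bytes_per_second / avg_event_bytes


if __name__ == "__main__":
    observed = 20.0  # observed throughput per minute, as reported above
    print(f"{per_second(observed):.2f} MB/s")
    # Hypothetical: assume each line emitted by linesource.sh is ~100 bytes.
    print(f"~{events_per_second(observed, 100):.0f} events/s at 100 B/event")
```

With the assumed 100-byte events, this comes to about 3,333 events per second, which can then be compared against the configured batch sizes (100 for the source, 1000 for the sink).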