Guys,

I have Flume set up to write partitioned data to HDFS, and each partition has its own directory. Is there a way to make all the data for one partition end up in a single file? I am currently using

MyAgent.sinks.HDFS.hdfs.batchSize = 10000
MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
MyAgent.sinks.HDFS.hdfs.rollCount = 10000
MyAgent.sinks.HDFS.hdfs.rollInterval = 360

to make the file roll at 15 MB of data or after 6 minutes. Is this the best way to achieve my goal?

Thanks,
Chen
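
A note on the settings above: with rollCount = 10000 the sink also rolls the file after 10,000 events, so a roll can happen before either the 15 MB or the 6 minute limit is reached. A minimal sketch of the same settings with count-based rolling turned off follows. It assumes the stock Flume HDFS sink, where a value of 0 for a roll setting disables that trigger. It reuses the agent and sink names from the question and is not a guarantee of one file per partition, since the size and time triggers still apply.

```
# Sketch only: same sink as above, but roll on size or time, not on event count
MyAgent.sinks.HDFS.hdfs.batchSize = 10000
# Roll once the file reaches about 15 MB (value is in bytes)
MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
# 0 = never roll based on the number of events
MyAgent.sinks.HDFS.hdfs.rollCount = 0
# Roll after 360 seconds (6 minutes)
MyAgent.sinks.HDFS.hdfs.rollInterval = 360
```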