If you don't intend to roll based on the number of events, then you will want to set rollCount to 0:

MyAgent.sinks.HDFS.hdfs.rollCount = 0
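For reference, here is a minimal sketch of all three roll triggers together, assuming the agent and sink keep the MyAgent/HDFS names from your config (for the HDFS sink, a value of 0 disables that trigger entirely):

# Disable count-based rolling so event counts never split a file
MyAgent.sinks.HDFS.hdfs.rollCount = 0
# Roll once the file reaches ~15 MB ...
MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
# ... or after 6 minutes (360 s), whichever fires first
MyAgent.sinks.HDFS.hdfs.rollInterval = 360

If the ~15 MB batches occasionally trip the size trigger and split a partition across two files, you could also set rollSize = 0 and rely on rollInterval alone.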
On Mon, Jan 20, 2014 at 12:35 PM, Jimmy <jimmyj...@gmail.com> wrote:

> Seems like the only reason is the "too many files" issue, correct?
>
> Running File Crusher regularly might be a better option than trying to
> tune this in Flume:
>
> http://www.jointhegrid.com/hadoop_filecrush/index.jsp
>
>
> ---------- Forwarded message ----------
> From: Chen Wang <chen.apache.s...@gmail.com>
> Date: Mon, Jan 20, 2014 at 11:21 AM
> Subject: Re: best way to make all hdfs records in one file under a folder?
> To: user@flume.apache.org
>
>
> Chris,
> It's partitioned every 6 minutes (that's why I set the roll interval to
> 60 * 6 = 360). The data size is around 15 MB, so I want it all in one file.
> Chen
>
>
> On Mon, Jan 20, 2014 at 10:57 AM, Christopher Shannon <
> cshannon...@gmail.com> wrote:
>
>> How is your data partitioned, by date?
>>
>>
>> On Monday, January 20, 2014, Chen Wang <chen.apache.s...@gmail.com>
>> wrote:
>>
>>> Guys,
>>> I have Flume set up to flow partitioned data to HDFS, and each
>>> partition has its own folder. Is there a way to force all the data
>>> under one partition into a single file?
>>> I am currently using
>>>
>>> MyAgent.sinks.HDFS.hdfs.batchSize = 10000
>>> MyAgent.sinks.HDFS.hdfs.rollSize = 15000000
>>> MyAgent.sinks.HDFS.hdfs.rollCount = 10000
>>> MyAgent.sinks.HDFS.hdfs.rollInterval = 360
>>>
>>> to make the file roll at 15 MB of data or after 6 minutes.
>>>
>>> Is this the best way to achieve my goal?
>>> Thanks,
>>> Chen