The source is an Avro Source, which gets events fed by a custom JVM application using the Flume client SDK.
So, referring to the client SDK: if the batchSize property has been set to 1,000, but I pass, say, 10,000 events in the client.appendBatch(List&lt;Event&gt;) call, what happens?

On Tue, Oct 15, 2013 at 3:54 PM, Hari Shreedharan <[email protected]> wrote:
> What source are you using? Looks like the source is writing > 5K events
> in one transaction
>
> Thanks,
> Hari
>
> On Tuesday, October 15, 2013 at 12:24 PM, Bhaskar V. Karambelkar wrote:
>
> Recently we switched over from the Memory Channel to the File Channel, as
> the Memory Channel has some GC issues.
> Occasionally with the File Channel I see this exception:
>
> org.apache.flume.ChannelException: Put queue for FileBackedTransaction of
> capacity 5000 full, consider committing more frequently, increasing
> capacity or increasing thread count. [channel=fileChannelD1]
>
> The client batchSize is 1,000, and the HDFS Sink batch size is also 1,000.
> The channel capacity is 1M (1,000,000), and the channel transaction
> capacity is 5,000.
>
> The underlying directories are not full, so the channel should have enough
> space, nor does the channel have any backlog.
>
> What I'm confused by are the three options the exception mentions.
>
> How do I commit more frequently? How do I increase capacity? (The channel
> capacity is 1M, and that is not full.) How do I increase the thread count?
> (I see no thread count option in the File Channel; is this referring to the
> thread count of the HDFS Sink which reads from this channel?)
>
> Lastly, would GC in Hadoop (mostly the NameNode) cause HDFS timeout issues
> in the HDFS Sink? We see HDFS timeout errors at more or less the same time
> across all our Flume nodes, so I suspect NameNode GC could be causing the
> timeouts.
>
> thanks
> Bhaskar
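[Editor's note: for reference, a minimal sketch of the agent configuration the thread describes. The channel name (fileChannelD1) and the values (capacity 1M, transaction capacity 5,000, sink batch size 1,000) come from the thread; the agent and sink names (agent1, sink1) and the directory paths are hypothetical placeholders.]

```properties
# File Channel as described in the thread (hypothetical agent name agent1)
agent1.channels.fileChannelD1.type = file
agent1.channels.fileChannelD1.checkpointDir = /flume/checkpoint
agent1.channels.fileChannelD1.dataDirs = /flume/data
agent1.channels.fileChannelD1.capacity = 1000000
agent1.channels.fileChannelD1.transactionCapacity = 5000

# HDFS Sink draining that channel, batch size 1,000 as in the thread
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = fileChannelD1
agent1.sinks.sink1.hdfs.batchSize = 1000
```

The "Put queue ... of capacity 5000 full" message refers to transactionCapacity, the per-transaction limit, not to the overall channel capacity, which is why a 1M-capacity channel with free space can still raise it.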
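[Editor's note: one way to sidestep the 10,000-events-in-one-call question is to split the list client-side so no single appendBatch() call exceeds the configured batchSize (or the channel's transaction capacity). The sketch below is a generic, hypothetical helper, not Flume SDK code; the Event and RpcClient types from the SDK are assumed and not shown.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper: split an oversized event list into
// sub-batches no larger than batchSize, so each appendBatch() call stays
// within the configured limits. Generic over the element type, so it
// works for Flume's Event without depending on the SDK here.
public class BatchSplitter {
    public static <T> List<List<T>> split(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // subList returns a view; copy it so each batch is independent
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> events = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) events.add(i);

        List<List<Integer>> batches = split(events, 1_000);
        System.out.println(batches.size());        // 10
        System.out.println(batches.get(0).size()); // 1000
    }
}
```

In a real client each sub-batch would then be passed to RpcClient.appendBatch() in its own call, e.g. `for (List<Event> b : batches) client.appendBatch(b);`.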
