To be clear, we have this load spread across 3 EC2 instances running Flume,
so each instance is individually being asked to handle about 3.3k
messages/second (5k).  With 16GB of data in the channel, I would have
expected the replay to be faster.
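
For reference, this is roughly how that channel is declared (the paths below
are placeholders and the exact property set may differ slightly from our real
config, so treat it as a sketch):

agent-1.channels.trdbuy-bid-req-ch1.type = file
agent-1.channels.trdbuy-bid-req-ch1.checkpointDir = /opt/flume/trdbuy/ch1/checkpoint
agent-1.channels.trdbuy-bid-req-ch1.dataDirs = /opt/flume/trdbuy/ch1/data
agent-1.channels.trdbuy-bid-req-ch1.capacity = 100000000
agent-1.channels.trdbuy-bid-req-ch1.transactionCapacity = 10000
# If the build supports dual checkpoints, a backup checkpoint is supposed to
# avoid a full data-log replay when the primary checkpoint is unusable:
agent-1.channels.trdbuy-bid-req-ch1.useDualCheckpoints = true
agent-1.channels.trdbuy-bid-req-ch1.backupCheckpointDir = /opt/flume/trdbuy/ch1/backup-checkpoint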


On Wed, Aug 20, 2014 at 12:12 AM, Gary Malouf <malouf.g...@gmail.com> wrote:

> Our capacity setting is:
>
> agent-1.channels.trdbuy-bid-req-ch1.capacity = 100000000
>
>
> Our current channel size cannot be accessed because the channel is still in
> this odd 'replay' mode.  There are no logs, but the CPU is cranking on the
> Flume node and the Avro source ports have not yet opened.  The pattern we
> see is that after anywhere from 15-30 minutes, the ports magically open and
> we can continue.
>
>
> We set the capacity this high because we are logging around 10k
> messages/second and did not want to lose any data during brief
> interruptions.
>
>
> On Wed, Aug 20, 2014 at 12:02 AM, Hari Shreedharan <
> hshreedha...@cloudera.com> wrote:
>
>> How large is your channel (and how long does it take to replay?)
>>
>> Gary Malouf wrote:
>>
>>
>> For the record, we are using Flume 1.4.0 packaged with CDH5.0.2
>>
>>
>> On Tue, Aug 19, 2014 at 11:55 PM, Gary Malouf <malouf.g...@gmail.com> wrote:
>>
>>     We are repeatedly running into cases where the replays of a file
>>     channel going to HDFS take an eternity.
>>
>>     I've read this thread
>>     <http://mail-archives.apache.org/mod_mbox/flume-dev/201306.mbox/%3ccahbpyvbmed6pkzkdadmyaw_gc_p7cqdefpsycwknky72tfi...@mail.gmail.com%3E>,
>>     but I am just not convinced that our checkpoints are constantly
>>     being corrupted.
>>
>>     We are seeing messages such as:
>>
>>     20 Aug 2014 03:52:26,849 INFO  [lifecycleSupervisor-1-2]
>>     (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:57)
>>     - Reading checkpoint metadata from
>>     /opt/flume/brq/ch1/checkpoint/checkpoint.meta
>>
>>
>>     How can it be that this takes so long?
>>
>>
>
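
(Side note on reading the channel size: once the agent is up, the ChannelSize
counter can usually be pulled from Flume's built-in JSON monitoring rather
than from the logs, e.g. by starting the agent with something like

    -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

and fetching http://<host>:34545/metrics.  The port is just an example, and
this may not help while the channel is still replaying.)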
