Hi Hari,

I wanted to provide more details on the issues we are seeing, as there may be better actions we can take.
As mentioned earlier, we are on the Flume 1.4 release that is packaged with the CDH 5.0.2 distribution. We've been in production with Flume for well over a year, but just in the last month or two we have seen more significant issues as our traffic has ramped up.

We have 2 sets of Flume 'clusters/configurations' - each instance has an agent with 2 or more source-channel-sink combinations. We use file channels for all of them as they allow us to buffer when downstream issues occur (usually when replacing HDFS nodes - more on this at the bottom). All of the channels use capacities of 100000000 items (again, using Flume as a buffer during downtime). For reference, we have a 12 node HDFS cluster that these are writing to, with a replication factor of 3.

An example channel configuration:

agent-1.channels.trdbuy-bid-req-ch1.type = file
agent-1.channels.trdbuy-bid-req-ch1.checkpointDir = /opt/flume/trdbuy-req-ch1/checkpoint
agent-1.channels.trdbuy-bid-req-ch1.dataDirs = /opt/flume/trdbuy-req-ch1/data
agent-1.channels.trdbuy-bid-req-ch1.capacity = 100000000

We use HDFS sinks for everything; an example configuration:

agent-1.sinks.hdfs-trdbuy-bid-req-sink1.channel = trdbuy-bid-req-ch1
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.type = hdfs
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.path = hdfs://nn-01:8020/flume/trading/buying/bidrequests/yr=%Y/mo=%m/d=%d/
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.filePrefix = %{host}s1
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.batchSize = 10000
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.rollInterval = 3600
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.rollCount = 0
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.rollSize = 0
agent-1.sinks.hdfs-trdbuy-bid-req-sink1.hdfs.idleTimeout = 1800

We are running into a few distinct problems:

1) Losing a single datanode causes the Flume instances that are under higher volume (approximately 6k messages/second) to be unable to write (the others seem to continue to hum along). They do not seem to recover until we restart the instance. I think this exception is related:

20 Aug 2014 01:16:23,942 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:82) - Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:147)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:68)
        at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:505)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:440)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.9.178.151:50010, 10.144.197.136:50010], original=[10.9.178.151:50010, 10.144.197.136:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:960)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1026)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1174)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
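For what it's worth, we have not tried changing the replacement policy the exception calls out. Our understanding (an assumption on our part, not something we have verified) is that it is a client-side HDFS setting, so it would live in the hdfs-site.xml that the Flume agents pick up on their classpath, roughly like this:

<!-- hdfs-site.xml on the Flume hosts - sketch only, untested on our side.
     Property names are taken from the exception text; values are just examples. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <!-- DEFAULT is what we appear to be running now; NEVER and ALWAYS are the other documented values -->
  <value>DEFAULT</value>
</property>

Is tuning this policy a direction you would recommend here, or is the more fundamental problem that the sink never recovers once the pipeline fails?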
2) Stopping a namenode (as expected) causes the Flume instances to stop writing as well. When the namenode comes back up, Flume does not recover either; it again needs a restart.

3) After restarting Flume in the case of either (1) or (2), it seems to take an eternity (anywhere from 5-30 minutes, depending on the amount of data in the channel) for the source ports to open for listening. During this time a single CPU on the machine is cranking but no data gets copied to HDFS. Once the ports finally open, the data built up in the channel starts to copy and the catch-up process begins. This last point has caused us to consider other logging solutions, as our system is stuck backed up while waiting for Flume to sort itself out. The embedded agents (on the client/writer side) have protected us to date, but they too can be subject to the same backup issues with file channels.
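On (3) specifically: if the "QueueSet population inserting 22678800 took 1874849" line in the replay log below is in milliseconds (an assumption on our part), that is a bit over 31 minutes just to rebuild the queue, which lines up with the waits we see. One thing we have been reading about, but have not tried (and are not even sure applies to the 1.4 build we are on), is the file channel's backup/dual checkpoint support in newer releases. Our assumption is the configuration would look roughly like the following - property names are our best reading of the Flume user guide, and the backup directory path is just a placeholder - so please correct us if this is the wrong direction:

agent-1.channels.trdbuy-bid-req-ch1.useDualCheckpoints = true
agent-1.channels.trdbuy-bid-req-ch1.backupCheckpointDir = /opt/flume/trdbuy-req-ch1/checkpoint-backup

As we understand it, the idea is that a usable backup checkpoint lets the channel avoid a full replay of the data files on restart. Would that actually help in the failure cases above, or is something else forcing the full replay?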
Thank you for any help with this; if nothing else, it documents for other potential users the problems we have been seeing.

Gary

On Wed, Aug 20, 2014 at 12:50 AM, Gary Malouf <malouf.g...@gmail.com> wrote:
> We run on ssd ephemeral storage.
>
> You can see from the timestamps below how long the replay ended up taking.
>
> 20 Aug 2014 04:05:20,520 INFO [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.Log.replay:348) - Replay started
>
> 20 Aug 2014 04:05:20,532 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.replay:348) - Replay started
>
> 20 Aug 2014 04:05:20,534 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.Log.replay:360) - Found NextFileID 0, from []
>
> 20 Aug 2014 04:05:20,544 INFO [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.Log.replay:360) - Found NextFileID 619, from [/opt/flume/trdbuy-bid-req-ch1/data/log-613, /opt/flume/trdbuy-bid-req-ch1/data/log-615, /opt/flume/trdbuy-bid-req-ch1/data/log-618, /opt/flume/trdbuy-bid-req-ch1/data/log-616, /opt/flume/trdbuy-bid-req-ch1/data/log-617, /opt/flume/trdbuy-bid-req-ch1/data/log-612, /opt/flume/trdbuy-bid-req-ch1/data/log-619, /opt/flume/trdbuy-bid-req-ch1/data/log-614]
>
> 20 Aug 2014 04:05:20,552 INFO [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:53) - Starting up with /opt/flume/trdbuy-bid-req-ch1/checkpoint/checkpoint and /opt/flume/trdbuy-bid-req-ch1/checkpoint/checkpoint.meta
>
> 20 Aug 2014 04:05:20,553 INFO [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:57) - Reading checkpoint metadata from /opt/flume/trdbuy-bid-req-ch1/checkpoint/checkpoint.meta
>
> 20 Aug 2014 04:36:36,024 INFO [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.FlumeEventQueue.<init>:114) - QueueSet population inserting 22678800 took 1874849
>
>
> ---------- Forwarded message ----------
> From: Hari Shreedharan <hshreedha...@cloudera.com>
> Date: Wed, Aug 20, 2014 at 12:42 AM
> Subject: Re: FileChannel Replays consistently take a long time
> To: "user@flume.apache.org" <user@flume.apache.org>
>
>
> Are you running on EBS or ephemeral storage? I have seen IO being slow on AWS when EBS with provisioned IO is not used. This might be what you are seeing.
>
> Also, what do you see as the checkpoint size when the channel starts up?
>
>
> On Tue, Aug 19, 2014 at 9:18 PM, Gary Malouf <malouf.g...@gmail.com> wrote:
>
>> To be clear, we have this load handled across 3 EC2 instances running Flume, so each one individually is being asked to handle 3.3k (5k). With 16GB of data in the channel, I would have expected the replay to be faster.
>>
>>
>> On Wed, Aug 20, 2014 at 12:12 AM, Gary Malouf <malouf.g...@gmail.com> wrote:
>>
>>> Our capacity setting is:
>>>
>>> agent-1.channels.trdbuy-bid-req-ch1.capacity = 100000000
>>>
>>> Our current channel size cannot be accessed because it is still in this odd 'replay' mode. There are no logs, but the CPU is cranking on the Flume node and the avro source ports have not yet opened. The pattern we see is that after anywhere from 15-30 minutes, the ports magically open and we can continue.
>>>
>>> This is because we are logging around 10k messages/second and did not want to lose any data during brief interruptions.
>>>
>>> On Wed, Aug 20, 2014 at 12:02 AM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>>>
>>>> How large is your channel (and how long does it take to replay?)
>>>>
>>>> Gary Malouf wrote:
>>>>
>>>> For the record, we are using Flume 1.4.0 packaged with CDH5.0.2
>>>>
>>>> On Tue, Aug 19, 2014 at 11:55 PM, Gary Malouf <malouf.g...@gmail.com> wrote:
>>>>
>>>> We are repeatedly running into cases where the replays from a file channel going to HDFS take an eternity.
>>>>
>>>> I've read this thread
>>>> <http://mail-archives.apache.org/mod_mbox/flume-dev/201306.mbox/%3ccahbpyvbmed6pkzkdadmyaw_gc_p7cqdefpsycwknky72tfi...@mail.gmail.com%3E>,
>>>> but I just am not convinced that our checkpoints are constantly being corrupted.
>>>>
>>>> We are seeing messages such as:
>>>>
>>>> 20 Aug 2014 03:52:26,849 INFO [lifecycleSupervisor-1-2] (org.apache.flume.channel.file.EventQueueBackingStoreFileV3.<init>:57) - Reading checkpoint metadata from /opt/flume/brq/ch1/checkpoint/checkpoint.meta
>>>>
>>>> How can it be that this takes so long?