Hi, I am using Flume 1.5.0 with the HDFS sink to write files to S3. The process writes fine for a while, but eventually I start getting the following messages:
INFO [pool-5-thread-1] (org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents:224) - Last read was never committed - resetting mark position.
WARN [pool-5-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:238) - The channel is full, and cannot write data now. The source will try again after 4000 milliseconds

I looked through the logs and don't see any errors besides the channel-full warning above. I did notice that the number of files open for writing to HDFS is one shy of the maxOpenFiles limit (500), and that is exactly when this issue starts, every single time. Following are my configuration settings:

*BEGIN CONFIG*
agent.sources = spooldir
agent.channels = memoryChannel
agent.sinks = s3Sink

agent.sources.spooldir.channels = memoryChannel
agent.sources.spooldir.type = spooldir
agent.sources.spooldir.spoolDir = <spool_directory>
agent.sources.spooldir.deserializer.maxLineLength = 4096
agent.sources.spooldir.decodeErrorPolicy = IGNORE

#s3 sink
agent.sinks.s3Sink.channel = memoryChannel
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.hdfs.rollInterval = 0
agent.sinks.s3Sink.hdfs.rollSize = 134217728
agent.sinks.s3Sink.hdfs.rollCount = 0
agent.sinks.s3Sink.hdfs.batchSize = 1000
agent.sinks.s3Sink.hdfs.maxOpenFiles = 500
agent.sinks.s3Sink.hdfs.idleTimeout = 30
agent.sinks.s3Sink.hdfs.codeC = gzip
agent.sinks.s3Sink.hdfs.writeFormat = Text
agent.sinks.s3Sink.hdfs.path = <s3_path>
#hdfs sink uses java.util.TimeZone and needs the long timezone format
agent.sinks.s3Sink.hdfs.timeZone = America/New_York

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100000
agent.channels.memoryChannel.transactionCapacity = 1000
*END CONFIG*
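From the stack trace below, my reading is that the sink caches its open BucketWriters in a LinkedHashMap subclass (HDFSEventSink$WriterLinkedHashMap appears in the trace) whose removeEldestEntry() closes the eldest writer once the map grows past maxOpenFiles. Here is a minimal sketch of that eviction pattern, just to illustrate the mechanism I think is in play; the class and interface names below are my own stand-ins, not Flume's actual code:

import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for org.apache.flume.sink.hdfs.BucketWriter (hypothetical).
interface Writer {
    void close();
}

// Hypothetical sketch of the maxOpenFiles eviction pattern: an
// access-ordered LinkedHashMap that closes the eldest writer when
// the cache exceeds its capacity.
public class WriterCache extends LinkedHashMap<String, Writer> {
    private final int maxOpenFiles;

    public WriterCache(int maxOpenFiles) {
        // accessOrder = true makes the map behave as an LRU cache
        super(16, 0.75f, true);
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
        if (size() > maxOpenFiles) {
            // NOTE: this close() runs inside put(), i.e. while the caller
            // still holds whatever lock guards the map -- which is exactly
            // where the jstack below shows one side of the deadlock.
            eldest.getValue().close();
            return true;
        }
        return false;
    }
}

If the real sink works like this, the first eviction (and therefore the first close-inside-put) happens right as the cache fills up to maxOpenFiles, which matches when I see the problem start.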
Out of curiosity, I ran jstack on the process, and it reports a deadlock:

Java stack information for the threads listed above:
===================================================
"hdfs-s3Sink-roll-timer-0":
        at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:404)
        - waiting to lock <0x000000077ae51798> (a java.lang.Object)
        at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:487)
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:448)
        - locked <0x0000000781f08ec0> (a org.apache.flume.sink.hdfs.BucketWriter)
        at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:472)
        at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:467)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"SinkRunner-PollingRunner-DefaultSinkProcessor":
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:331)
        - waiting to lock <0x0000000781f08ec0> (a org.apache.flume.sink.hdfs.BucketWriter)
        at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:175)
        at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
        at java.util.HashMap.put(HashMap.java:509)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:415)
        - locked <0x000000077ae51798> (a java.lang.Object)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.

Is this deadlock what is choking the channel and preventing it from moving forward? (A minimal reconstruction of the lock-order inversion I think I am seeing is in the P.S. below.)

Thanks,
Viral
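P.S. To spell out how I read the two threads above: the sink runner holds the sink-wide lock (<0x...798>) in process() and then tries to enter the writer's monitor (<0x...ec0>) during eviction, while the roll timer holds the writer's monitor in close() and then calls back into the sink, which needs the sink-wide lock. Here is a minimal, self-contained reconstruction of that inversion; the locks, sleeps, and thread bodies are my own stand-ins, not Flume's actual code:

// Hypothetical two-thread reconstruction of the inverted lock order
// shown in the jstack output above. Run it and both threads hang.
public class LockInversionDemo {
    // Stand-in for the java.lang.Object locked in HDFSEventSink.process()
    // (the <0x...798> monitor in the jstack).
    static final Object sinkLock = new Object();
    // Stand-in for the BucketWriter instance monitor (<0x...ec0>).
    static final Object writerLock = new Object();

    public static void main(String[] args) {
        // Mimics SinkRunner-PollingRunner: process() holds sinkLock, then
        // map eviction calls the writer's synchronized close().
        Thread sinkRunner = new Thread(() -> {
            synchronized (sinkLock) {
                sleep(100); // give the timer thread time to grab writerLock
                synchronized (writerLock) {
                    System.out.println("sink runner closed the writer");
                }
            }
        }, "SinkRunner-PollingRunner");

        // Mimics hdfs-s3Sink-roll-timer-0: the synchronized close() fires a
        // close callback that re-enters the sink, which needs sinkLock.
        Thread rollTimer = new Thread(() -> {
            synchronized (writerLock) {
                sleep(100);
                synchronized (sinkLock) {
                    System.out.println("roll timer removed the writer");
                }
            }
        }, "hdfs-s3Sink-roll-timer-0");

        sinkRunner.start();
        rollTimer.start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}

If that matches what the sink is doing, then with both threads parked the sink can never drain events, which seems consistent with the channel filling up and the spooldir source backing off.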