Hi, I am using Flume 1.5.0 with the HDFS sink to write files to S3. The process writes fine for a while, but eventually I start getting the following messages:
INFO [pool-5-thread-1] (org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents:224) - Last read was never committed - resetting mark position.
WARN [pool-5-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:238) - The channel is full, and cannot write data now. The source will try again after 4000 milliseconds

I looked through the logs and don't see any errors besides the channel-full warning above. I did notice that the number of files open for writing to HDFS is one shy of the maxOpenFiles limit (500), and that is exactly when this issue starts, every single time. Following are my configuration settings:

*BEGIN CONFIG*
agent.sources = spooldir
agent.channels = memoryChannel
agent.sinks = s3Sink

agent.sources.spooldir.channels = memoryChannel
agent.sources.spooldir.type = spooldir
agent.sources.spooldir.spoolDir = <spool_directory>
agent.sources.spooldir.deserializer.maxLineLength = 4096
agent.sources.spooldir.decodeErrorPolicy = IGNORE

#s3 sink
agent.sinks.s3Sink.channel = memoryChannel
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.hdfs.rollInterval = 0
agent.sinks.s3Sink.hdfs.rollSize = 134217728
agent.sinks.s3Sink.hdfs.rollCount = 0
agent.sinks.s3Sink.hdfs.batchSize = 1000
agent.sinks.s3Sink.hdfs.maxOpenFiles = 500
agent.sinks.s3Sink.hdfs.idleTimeout = 30
agent.sinks.s3Sink.hdfs.codeC = gzip
agent.sinks.s3Sink.hdfs.writeFormat = Text
agent.sinks.s3Sink.hdfs.path = <s3_path>
#hdfs sink uses java.util.TimeZone and needs the long timezone format
agent.sinks.s3Sink.hdfs.timeZone = America/New_York

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100000
agent.channels.memoryChannel.transactionCapacity = 1000
*END CONFIG*
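From the stack trace below, my reading is that the sink caches its open BucketWriters in a LinkedHashMap subclass (HDFSEventSink$WriterLinkedHashMap appears in the trace) whose removeEldestEntry() closes the eldest writer once the map grows past maxOpenFiles. Here is a minimal sketch of that eviction pattern, just to illustrate the mechanism I think is in play; the class and interface names below are my own stand-ins, not Flume's actual code:

import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for org.apache.flume.sink.hdfs.BucketWriter (hypothetical).
interface Writer {
    void close();
}

// Hypothetical sketch of the maxOpenFiles eviction pattern: an
// access-ordered LinkedHashMap that closes the eldest writer when
// the cache exceeds its capacity.
public class WriterCache extends LinkedHashMap<String, Writer> {
    private final int maxOpenFiles;

    public WriterCache(int maxOpenFiles) {
        // accessOrder = true makes the map behave as an LRU cache
        super(16, 0.75f, true);
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
        if (size() > maxOpenFiles) {
            // NOTE: this close() runs inside put(), i.e. while the caller
            // still holds whatever lock guards the map -- which is exactly
            // where the jstack below shows one side of the deadlock.
            eldest.getValue().close();
            return true;
        }
        return false;
    }
}

If the real sink works like this, the first eviction (and therefore the first close-inside-put) happens right as the cache fills up to maxOpenFiles, which matches when I see the problem start.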
Out of curiosity, I ran jstack on the process, and it reports a deadlock:

Java stack information for the threads listed above:
===================================================
"hdfs-s3Sink-roll-timer-0":
        at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:404)
        - waiting to lock <0x000000077ae51798> (a java.lang.Object)
        at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:487)
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:448)
        - locked <0x0000000781f08ec0> (a org.apache.flume.sink.hdfs.BucketWriter)
        at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:472)
        at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:467)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"SinkRunner-PollingRunner-DefaultSinkProcessor":
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:331)
        - waiting to lock <0x0000000781f08ec0> (a org.apache.flume.sink.hdfs.BucketWriter)
        at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:175)
        at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
        at java.util.HashMap.put(HashMap.java:509)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:415)
        - locked <0x000000077ae51798> (a java.lang.Object)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.

Is this deadlock what is choking the channel and preventing it from moving forward? (A minimal reconstruction of the lock-order inversion I think I am seeing is in the P.S. below.)

Thanks,
Viral
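P.S. To spell out how I read the two threads above: the sink runner holds the sink-wide lock (<0x...798>) in process() and then tries to enter the writer's monitor (<0x...ec0>) during eviction, while the roll timer holds the writer's monitor in close() and then calls back into the sink, which needs the sink-wide lock. Here is a minimal, self-contained reconstruction of that inversion; the locks, sleeps, and thread bodies are my own stand-ins, not Flume's actual code:

// Hypothetical two-thread reconstruction of the inverted lock order
// shown in the jstack output above. Run it and both threads hang.
public class LockInversionDemo {
    // Stand-in for the java.lang.Object locked in HDFSEventSink.process()
    // (the <0x...798> monitor in the jstack).
    static final Object sinkLock = new Object();
    // Stand-in for the BucketWriter instance monitor (<0x...ec0>).
    static final Object writerLock = new Object();

    public static void main(String[] args) {
        // Mimics SinkRunner-PollingRunner: process() holds sinkLock, then
        // map eviction calls the writer's synchronized close().
        Thread sinkRunner = new Thread(() -> {
            synchronized (sinkLock) {
                sleep(100); // give the timer thread time to grab writerLock
                synchronized (writerLock) {
                    System.out.println("sink runner closed the writer");
                }
            }
        }, "SinkRunner-PollingRunner");

        // Mimics hdfs-s3Sink-roll-timer-0: the synchronized close() fires a
        // close callback that re-enters the sink, which needs sinkLock.
        Thread rollTimer = new Thread(() -> {
            synchronized (writerLock) {
                sleep(100);
                synchronized (sinkLock) {
                    System.out.println("roll timer removed the writer");
                }
            }
        }, "hdfs-s3Sink-roll-timer-0");

        sinkRunner.start();
        rollTimer.start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}

If that matches what the sink is doing, then with both threads parked the sink can never drain events, which seems consistent with the channel filling up and the spooldir source backing off.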