After a quick test (20k records) with patch [1] applied, the deadlock problem looks fixed. Thanks!
[1] https://issues.apache.org/jira/browse/FLUME-2973

On 9 February 2017 at 11:29, Chia-Hung Lin <cli...@googlemail.com> wrote:
> Thanks for the information. The maxOpenFiles value I use is the
> default one (I don't touch that config value in fact).
>
> On 8 February 2017 at 15:28, Denes Arvay <de...@cloudera.com> wrote:
>> Hi,
>>
>> Yes, it seems to be a bug; I also bumped into it.
>> The conf file poller detects a change in the config file and tries to
>> stop the components while, at the same time, the HDFS sink tries to
>> roll a file.
>> It should be solved by https://issues.apache.org/jira/browse/FLUME-2973
>>
>> From your thread dump it seems that rolling is triggered by the
>> maxOpenFiles limit; is it overridden in your config file? A very low
>> value could increase the chances of this deadlock.
>>
>> I'd also recommend using the --no-reload-conf command line parameter
>> if the live config reload feature is not needed.
>>
>> Kind regards,
>> Denes
>>
>> On Mon, Feb 6, 2017 at 6:08 PM Chia-Hung Lin <cli...@googlemail.com> wrote:
>>>
>>> I use Flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080)
>>> to move files from the local file system to S3. Only one Flume
>>> process is launched (a single JVM process). The problem is that each
>>> time, after running for a while, a deadlock occurs between the roll
>>> timer and PollingRunner threads.
>>> A thread dump is shown below:
>>>
>>> "hdfs-sk-roll-timer-0":
>>>   waiting to lock monitor 0x00007f46c40b5578 (object 0x00000000e002dc90,
>>>   a java.lang.Object),
>>>   which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor"
>>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>>   waiting to lock monitor 0x00007f4684004db8 (object 0x00000000e17b64d8,
>>>   a org.apache.flume.sink.hdfs.BucketWriter),
>>>   which is held by "hdfs-sk-roll-timer-0"
>>>
>>> Java stack information for the threads listed above:
>>> ===================================================
>>> "hdfs-sk-roll-timer-0":
>>>   at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396)
>>>   - waiting to lock <0x00000000e002dc90> (a java.lang.Object)
>>>   at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447)
>>>   at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408)
>>>   - locked <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>>   at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280)
>>>   at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274)
>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>>   at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304)
>>>   - waiting to lock <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>>   at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163)
>>>   at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
>>>   at java.util.HashMap.put(HashMap.java:505)
>>>   at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407)
>>>   - locked <0x00000000e002dc90> (a java.lang.Object)
>>>   at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>   at java.lang.Thread.run(Thread.java:745)
>>>
>>> Found 1 deadlock.
>>>
>>> The configuration is:
>>>
>>> a1.sources = src
>>> a1.sinks = sk
>>> a1.channels = ch
>>> ...
>>> a1.sinks.sk.type = hdfs
>>> a1.sinks.sk.channel = ch
>>> ...
>>> a1.sinks.sk.hdfs.fileType = DataStream
>>> ...
>>> a1.sinks.k1.hdfs.rollCount = 0
>>> a1.sinks.k1.hdfs.rollSize = 0
>>> a1.sinks.k1.hdfs.rollInterval = 100
>>> ...
>>> a1.channels.ch.type = file
>>> a1.channels.ch.checkpointDir = /path/to/checkpointDir
>>> a1.channels.ch.dataDirs = /path/to/dataDir
>>>
>>> The command to run Flume is:
>>>
>>> nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name
>>> a1 ... > /path/to/test.log 2>&1 &
>>>
>>> Is this a bug or something I can tune to fix it?
>>>
>>> Thanks
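[Editor's note] A side point on the quoted configuration: the agent declares its sink as `sk` (`a1.sinks = sk`), but the roll settings are set under the prefix `a1.sinks.k1.hdfs.*`. Flume only passes properties to declared components, so those three `k1` lines are ignored and the sink runs with the roll defaults (rollInterval=30, rollSize=1024, rollCount=10). If the intent was to roll purely by interval, the keys would presumably need to use the `sk` prefix:

```properties
# Roll only by time, every 100 seconds; size- and count-based rolling disabled.
# Note the prefix matches the declared sink name "sk", not "k1".
a1.sinks.sk.hdfs.rollCount = 0
a1.sinks.sk.hdfs.rollSize = 0
a1.sinks.sk.hdfs.rollInterval = 100
```

This does not cause the deadlock itself, but with the defaults in effect the sink rolls far more often (every 30 s or 1024 bytes or 10 events), which increases the number of chances to hit it.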
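[Editor's note] The dump above shows a classic lock-order inversion: the sink thread holds the sink-level monitor and waits for the `BucketWriter` monitor, while the roll timer holds the `BucketWriter` monitor and waits for the sink-level monitor. A standard way to make such an inversion impossible is to impose a single global acquisition order on the two locks. The sketch below is illustrative only (the class and method names are invented, and it does not reproduce the actual FLUME-2973 patch); it shows the discipline: every code path takes the "sink" lock before the "writer" lock, never the reverse.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of a fixed lock-acquisition order. "sinkLock" plays
// the role of the HDFSEventSink-level monitor, "writerLock" the role of the
// per-BucketWriter monitor. Because no path ever takes writerLock first and
// then sinkLock, the circular wait seen in the thread dump cannot form.
public class LockOrdering {
    private final ReentrantLock sinkLock = new ReentrantLock();
    private final ReentrantLock writerLock = new ReentrantLock();

    // Both the "roll timer" path and the "process" path must use this same
    // order: sink lock first, writer lock second.
    public String closeWriter() {
        sinkLock.lock();
        try {
            writerLock.lock();
            try {
                return "closed";
            } finally {
                writerLock.unlock();
            }
        } finally {
            sinkLock.unlock();
        }
    }
}
```

As the thread also suggests, running with `--no-reload-conf` sidesteps the triggering condition entirely when live config reload is not needed.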