Thanks for the information. The maxOpenFiles value I use is the default one (in fact I don't touch that config value at all).
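For reference, the --no-reload-conf suggestion in the quoted reply below only means appending one flag to the launch command already shown in the thread, so the config-file poller never runs. Everything else here (paths, agent name, test.conf, the elided extra options) is taken verbatim from the thread's own placeholders; this is a sketch, not a verified command.

```shell
# Same launch command as in the thread, with --no-reload-conf added
# to disable live config reloading (the poller involved in the deadlock).
# Other options elided in the original thread are elided here too.
nohup ./bin/flume-ng agent \
  --conf conf/ \
  --conf-file test.conf \
  --name a1 \
  --no-reload-conf \
  > /path/to/test.log 2>&1 &
```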
On 8 February 2017 at 15:28, Denes Arvay <de...@cloudera.com> wrote:
> Hi,
>
> Yes, it seems to be a bug; I also bumped into it.
> It seems that the conf file poller detects a change in the config file and
> tries to stop the components while, at the same time, the HDFS sink tries
> to roll a file.
> It should be solved by https://issues.apache.org/jira/browse/FLUME-2973
>
> From your thread dump it seems that rolling is triggered by the
> maxOpenFiles limit; is it overridden in your config file? A very low value
> could increase the chances of this deadlock.
>
> I'd also recommend using the --no-reload-conf command line parameter if
> the live config reload feature is not needed.
>
> Kind regards,
> Denes
>
>
> On Mon, Feb 6, 2017 at 6:08 PM Chia-Hung Lin <cli...@googlemail.com> wrote:
>>
>> I use Flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080)
>> for testing, to move files from the local file system to S3. Only one
>> Flume process is launched (a single JVM process). The problem is that,
>> after running for a while, a deadlock occurs between the roll timer and
>> PollingRunner threads.
>> A thread dump is shown below:
>>
>> "hdfs-sk-roll-timer-0":
>>   waiting to lock monitor 0x00007f46c40b5578 (object 0x00000000e002dc90,
>>   a java.lang.Object),
>>   which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor"
>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>   waiting to lock monitor 0x00007f4684004db8 (object 0x00000000e17b64d8,
>>   a org.apache.flume.sink.hdfs.BucketWriter),
>>   which is held by "hdfs-sk-roll-timer-0"
>>
>> Java stack information for the threads listed above:
>> ===================================================
>> "hdfs-sk-roll-timer-0":
>>   at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396)
>>   - waiting to lock <0x00000000e002dc90> (a java.lang.Object)
>>   at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447)
>>   at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408)
>>   - locked <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>   at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280)
>>   at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>   at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304)
>>   - waiting to lock <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>   at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163)
>>   at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
>>   at java.util.HashMap.put(HashMap.java:505)
>>   at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407)
>>   - locked <0x00000000e002dc90> (a java.lang.Object)
>>   at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> Found 1 deadlock.
>>
>> The configuration is as below:
>>
>> a1.sources = src
>> a1.sinks = sk
>> a1.channels = ch
>> ...
>> a1.sinks.sk.type = hdfs
>> a1.sinks.sk.channel = ch
>> ...
>> a1.sinks.sk.hdfs.fileType = DataStream
>> ...
>> a1.sinks.k1.hdfs.rollCount = 0
>> a1.sinks.k1.hdfs.rollSize = 0
>> a1.sinks.k1.hdfs.rollInterval = 100
>> ...
>> a1.channels.ch.type = file
>> a1.channels.ch.checkpointDir = /path/to/checkpointDir
>> a1.channels.ch.dataDirs = /path/to/dataDir
>>
>> The command to run Flume is:
>>
>> nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name
>> a1 ... > /path/to/test.log 2>&1 &
>>
>> Is this a bug or something I can tune to fix it?
>>
>> Thanks
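The dump quoted above is a classic lock-order inversion: the roll timer holds the BucketWriter monitor and waits for the sink's lock, while the sink thread holds its lock and (via removeEldestEntry evicting the eldest writer) waits for the same BucketWriter. The usual remedy, which FLUME-2973 addresses in spirit, is to acquire the two monitors in one global order on every code path. A minimal standalone sketch of that fix follows; this is illustrative Java, not actual Flume code, and the class and lock names are invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: both close paths take sinkLock first, then writerLock,
// so neither thread can ever hold one monitor while waiting on the other.
public class LockOrderDemo {
    static final Object sinkLock = new Object();   // stands in for HDFSEventSink's monitor
    static final Object writerLock = new Object(); // stands in for the BucketWriter monitor
    static final AtomicInteger closed = new AtomicInteger();

    // Path analogous to the roll timer closing a bucket writer.
    static void closeFromTimer() {
        synchronized (sinkLock) {
            synchronized (writerLock) {
                closed.incrementAndGet(); // close the writer
            }
        }
    }

    // Path analogous to the sink evicting the eldest writer during put().
    static void closeFromSink() {
        synchronized (sinkLock) {
            synchronized (writerLock) {
                closed.incrementAndGet(); // evict and close
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread timer = new Thread(LockOrderDemo::closeFromTimer, "roll-timer");
        Thread sink = new Thread(LockOrderDemo::closeFromSink, "sink-runner");
        timer.start();
        sink.start();
        timer.join();
        sink.join();
        System.out.println("both close paths completed: " + closed.get());
    }
}
```

With the inverted order (one path taking writerLock first), the two threads could block each other exactly as in the dump; the single global order makes that interleaving impossible.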