What version of Hadoop are you using? Looks like you are getting hit by https://issues.apache.org/jira/browse/HADOOP-6762.
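For context on why a retry inside Flume can't save the stream: HADOOP-6762 is about the IPC client's sensitivity to thread interrupts. Hadoop's RPC reads from interruptible NIO channels, so an interrupt delivered while a call is in flight closes the underlying channel, which surfaces as the java.nio.channels.ClosedByInterruptException in the logs below. A minimal, self-contained illustration of that NIO behavior (this is plain JDK code, not Flume or Hadoop internals; the Pipe just stands in for the RPC socket):

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;

public class InterruptClosesChannel {
    public static void main(String[] args) throws Exception {
        // A Pipe's source channel is an InterruptibleChannel, like the
        // sockets Hadoop's IPC client reads from.
        Pipe pipe = Pipe.open();

        Thread reader = new Thread(() -> {
            try {
                // Blocks indefinitely: nothing is ever written to the pipe.
                pipe.source().read(ByteBuffer.allocate(16));
            } catch (ClosedByInterruptException e) {
                // Interrupting the blocked thread closes the channel and
                // raises this exception -- the same one that shows up
                // wrapped in IOException in the Flume logs.
                System.out.println("caught ClosedByInterruptException");
            } catch (Exception e) {
                System.out.println("unexpected: " + e);
            }
        });

        reader.start();
        Thread.sleep(200);      // let the reader block in read()
        reader.interrupt();     // deliver the interrupt mid-read
        reader.join();

        // The channel is permanently closed; retrying on it can only fail,
        // which is why the sink stays wedged until the process restarts.
        System.out.println("channel still open: " + pipe.source().isOpen());
    }
}
```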
Hari

--
Hari Shreedharan

On Monday, May 13, 2013 at 6:50 PM, Matt Wise wrote:

> So we've just had this happen twice to two different flume machines... we're
> using the HDFS sink as well, but ours is writing to an S3N:// URL. Both times
> our sink stopped working and the filechannel clogged up immediately, causing
> serious problems. A restart of Flume worked -- but the filechannel was so
> backed up at that point that it took a good long while to get Flume started
> up again properly.
>
> Anyone else seeing this behavior?
>
> (oh, and we're running flume 1.3.0)
>
> On May 7, 2013, at 8:42 AM, Rahul Ravindran <rahu...@yahoo.com> wrote:
>
> > Hi,
> > We have noticed this a few times now: we appear to get an IOException
> > from HDFS, and the channel then stops draining until the flume process is
> > restarted. Below are the logs. namenode-v01-00b is the active namenode
> > (namenode-v01-00a is standby). We are using Quorum Journal Manager for our
> > Namenode HA, but no Namenode failover was initiated. If this is an
> > expected error, should flume handle it and gracefully retry (thereby not
> > requiring a restart)?
> > Thanks,
> > ~Rahul.
> >
> > 7 May 2013 06:35:02,494 WARN [hdfs-hdfs-sink4-call-runner-2]
> > (org.apache.flume.sink.hdfs.BucketWriter.append:378) - Caught IOException
> > writing to HDFSWriter (IOException flush:java.io.IOException: Failed on
> > local exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00a.a.com":8020; ). Closing file
> > (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp)
> > and rethrowing exception.
> > 07 May 2013 06:35:02,494 WARN [hdfs-hdfs-sink4-call-runner-2]
> > (org.apache.flume.sink.hdfs.BucketWriter.append:384) - Caught IOException
> > while closing file
> > (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp).
> > Exception follows.
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local
> > exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00a.a.com":8020;
> >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >     at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >     at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >     at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >     at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:02,495 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor]
> > (org.apache.flume.sink.hdfs.HDFSEventSink.process:456) - HDFS IO error
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local
> > exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00a.a.com":8020;
> >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >     at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >     at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >     at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >     at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:05,350 WARN [hdfs-hdfs-sink1-call-runner-5]
> > (org.apache.flume.sink.hdfs.BucketWriter.append:378) - Caught IOException
> > writing to HDFSWriter (IOException flush:java.io.IOException: Failed on
> > local exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00b.a.com":8020; ). Closing file
> > (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-1//event.1367891734999.tmp)
> > and rethrowing exception.
> > 07 May 2013 06:35:05,351 WARN [hdfs-hdfs-sink1-call-runner-5]
> > (org.apache.flume.sink.hdfs.BucketWriter.append:384) - Caught IOException
> > while closing file
> > (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-1//event.1367891734999.tmp).
> > Exception follows.
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local
> > exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00b.a.com":8020;
> >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >     at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >     at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >     at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >     at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:743)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:741)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:05,352 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor]
> > (org.apache.flume.sink.hdfs.HDFSEventSink.process:456) - HDFS IO error
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local
> > exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00b.a.com":8020;
> >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >     at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >     at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >     at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >     at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >     at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:743)
> >     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:741)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:07,497 WARN [hdfs-hdfs-sink4-call-runner-8]
> > (org.apache.flume.sink.hdfs.BucketWriter.append:378) - Caught IOException
> > writing to HDFSWriter (IOException flush:java.io.IOException: Failed on
> > local exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00a.a.com":8020; ). Closing file
> > (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp)
> > and rethrowing exception.
> > 07 May 2013 06:35:07,497 WARN [hdfs-hdfs-sink4-call-runner-8]
> > (org.apache.flume.sink.hdfs.BucketWriter.append:384) - Caught IOException
> > while closing file
> > (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp).
> > Exception follows.
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local
> > exception: java.nio.channels.ClosedByInterruptException; Host Details :
> > local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is:
> > "namenode-v01-00a.a.com":8020;
> >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
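For anyone tuning around this until the underlying Hadoop issue is addressed: the sink can't revive a channel killed by an interrupt on its own, but you can cap how long HDFS calls block and keep the file channel small enough that the post-failure restart replays quickly. A sketch of the relevant settings (agent and component names are placeholders; values are illustrative, not recommendations):

```
# Hypothetical agent "a1"; names, paths, and values are illustrative.
a1.sinks.s1.type = hdfs
a1.sinks.s1.hdfs.path = hdfs://nameservice1/user/br/data_platform/eventstream/event
# Fail fast on stuck HDFS calls instead of blocking the sink runner (ms).
a1.sinks.s1.hdfs.callTimeout = 30000
a1.sinks.s1.hdfs.batchSize = 1000
# Roll files periodically so a wedged .tmp file bounds the data at risk.
a1.sinks.s1.hdfs.rollInterval = 300

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data
# Cap the backlog so a post-failure restart replays the channel quickly.
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 1000
```

To be clear, this only limits the blast radius described above (slow startup from a backed-up file channel); it does not make the sink recover without a restart.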