Per Roshan’s request, I have filed a bug for this issue. For those interested, here is the link:
https://issues.apache.org/jira/browse/FLUME-2451

Hopefully this will bring some visibility to the problem.

Thanks,
Andrew

From: Roshan Naik <ros...@hortonworks.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Tuesday, August 26, 2014 at 4:11 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Flume 1.4 HDFS Sink Cannot Reconnect

Please file a bug for this with the details provided in your email.

On Tue, Aug 26, 2014 at 9:44 AM, Gary Malouf <malouf.g...@gmail.com> wrote:

+1 I've seen this same issue.

On Tue, Aug 26, 2014 at 12:33 PM, Andrew O'Neill <aone...@paytronix.com> wrote:

Hello all,

My setup:
- Flume 1.4
- CDH 4.2.2 (2.0.0-cdh4.2.2)

I am testing a simple Flume setup with a Sequence Generator Source, a File Channel, and an HDFS Sink (see my flume.conf below). This configuration works as expected until I reboot the cluster’s NameNode or restart the HDFS service on the cluster. At that point the Flume agent cannot reconnect to HDFS and must be restarted manually. Since NameNode restarts are not uncommon in our production cluster, it is important that Flume be able to reconnect gracefully without any manual intervention. So, how do we fix this HDFS reconnection issue?
Here is our flume.conf:

appserver.sources = rawtext
appserver.channels = testchannel
appserver.sinks = test_sink

appserver.sources.rawtext.type = seq
appserver.sources.rawtext.channels = testchannel

appserver.channels.testchannel.type = file
appserver.channels.testchannel.capacity = 10000000
appserver.channels.testchannel.minimumRequiredSpace = 214748364800
appserver.channels.testchannel.checkpointDir = /Users/aoneill/Desktop/testchannel/checkpoint
appserver.channels.testchannel.dataDirs = /Users/aoneill/Desktop/testchannel/data
appserver.channels.testchannel.maxFileSize = 20000000

appserver.sinks.test_sink.type = hdfs
appserver.sinks.test_sink.channel = testchannel
appserver.sinks.test_sink.hdfs.path = hdfs://cluster01:8020/user/aoneill/flumetest
appserver.sinks.test_sink.hdfs.closeTries = 3
appserver.sinks.test_sink.hdfs.filePrefix = events-
appserver.sinks.test_sink.hdfs.fileSuffix = .avro
appserver.sinks.test_sink.hdfs.fileType = DataStream
appserver.sinks.test_sink.hdfs.writeFormat = Text
appserver.sinks.test_sink.hdfs.inUsePrefix = inuse-
appserver.sinks.test_sink.hdfs.inUseSuffix = .avro
appserver.sinks.test_sink.hdfs.rollCount = 100000
appserver.sinks.test_sink.hdfs.rollInterval = 30
appserver.sinks.test_sink.hdfs.rollSize = 10485760

These are the two error messages that the Flume agent outputs constantly after the restart:

2014-08-26 10:47:24,572 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:96)] Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
    at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)

and

2014-08-26 10:47:29,592 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)

I can provide additional information if needed. Thank you very much for any insight you are able to provide into this problem.

Best,
Andrew

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
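[Editor's note: the close/reconnect behavior tracked in FLUME-2451 was eventually addressed in later Flume releases, where the HDFS sink's hdfs.closeTries setting is paired with an hdfs.retryInterval setting controlling how often close attempts are retried. The fragment below is a sketch of those settings only; their availability and defaults depend on your Flume version, so verify them against your release's user guide before relying on them.]

```properties
# Sketch only: hdfs.retryInterval is not present in Flume 1.4; it appears in
# later releases alongside hdfs.closeTries. Verify against your version's docs.
appserver.sinks.test_sink.hdfs.closeTries = 0       # 0 = keep retrying the close until it succeeds
appserver.sinks.test_sink.hdfs.retryInterval = 180  # seconds between close/rename retry attempts
```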