Well, that is quite a broad topic. For starters, you could check the namenode and datanode logs to see what kinds of warnings and errors are showing up there.
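
As a rough starting point, a Python sketch like the one below could tally WARN/ERROR/FATAL lines per logger in a namenode or datanode log, so the noisiest components stand out. It assumes the log lines follow the layout "yyyy-MM-dd HH:mm:ss,SSS LEVEL logger: message"; that layout, the function names, and the file path in the usage comment are assumptions for illustration, so adjust the pattern to whatever your log4j configuration actually emits.

```python
import re
from collections import Counter

# Assumed line layout (not taken from this thread):
#   2013-08-07 08:41:42,776 WARN some.logger.Name: message text
# Lines that do not start with a timestamp (stack traces, wrapped
# messages) are skipped on purpose.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+"
    r"(WARN|ERROR|FATAL)\s+(\S+?):?\s"
)


def summarize(lines):
    """Count WARN/ERROR/FATAL lines per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def summarize_file(path):
    """Run summarize() over a log file, tolerating odd bytes."""
    with open(path, errors="replace") as handle:
        return summarize(handle)


# Hypothetical usage (the path is a placeholder):
#   for (level, logger), n in summarize_file("/path/to/datanode.log").most_common(20):
#       print(n, level, logger)
```

Running this against the log of each datanode and comparing the top entries may show whether the warnings cluster on a few nodes or are spread evenly, which is a useful first split when the client reports bad datanodes all over the cluster.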
On Wed, Aug 7, 2013 at 11:11 AM, Miguel Coelho dos Santos <miguel.coelho.san...@cern.ch> wrote:
> Thanks for the quick response.
> Any additional pointers on what typically is unhealthy in the HDFS cluster
> when this error occurs?
>
> Miguel
> ________________________________________
> From: Jeff Lord [jl...@cloudera.com]
> Sent: 07 August 2013 19:31
> To: user@flume.apache.org
> Subject: Re: java.io.IOException: Bad response ERROR for block ... from
> datanode
>
> Miguel,
>
> These errors usually indicate that there is a problem on your HDFS cluster.
> You should probably investigate the health of the cluster first.
>
> -Jeff
>
> On Wed, Aug 7, 2013 at 7:21 AM, Miguel Coelho dos Santos
> <miguel.coelho.san...@cern.ch> wrote:
>> Hi,
>>
>> we are using flume to write data to hdfs.
>> Our hdfs sinks recently started to report these errors.
>>
>> 07 Aug 2013 08:41:42,776 WARN [ResponseProcessor for block
>> BP-1897030109-WAS_IP_1-1343818418899:blk_-6113435038800423576_51499410]
>> (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run:747)
>> - DFSOutputStream ResponseProcessor exception for block
>> BP-1897030109-WAS_IP_1-1343818418899:blk_-6113435038800423576_51499410
>> java.io.IOException: Bad response ERROR for block
>> BP-1897030109-WAS_IP_1-1343818418899:blk_-6113435038800423576_51499410 from
>> datanode WAS_IP_2:50010
>>
>> 07 Aug 2013 08:41:42,776 WARN [DataStreamer for file
>> /tmp/syslog/px401/2013-07.tmp/29.tmp/FlumeData.1375857577304.tmp block
>> BP-1897030109-WAS_IP_1-1343818418899:blk_-6113435038800423576_51499410]
>> (org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery:964)
>> - Error Recovery for block
>> BP-1897030109-WAS_IP_1-1343818418899:blk_-6113435038800423576_51499410 in
>> pipeline 10.32.22.22:50010, WAS_IP_3:50010, WAS_IP_2:50010: bad datanode
>> WAS_IP_2:50010
>>
>> The errors are not referencing a single datanode, we see errors that
>> reference all datanodes in the hadoop cluster.
>>
>> Has anyone seen these errors in the past? What could be causing this?
>>
>> Happy to provide extra details about the setup or config files if needed.
>> Please note that the various IPs were sanitized into WAS_IP_1, etc.
>>
>> Regards,
>> Miguel