Re: Flume DStream produces 0 records after HDFS node killed

2017-06-22 Thread N B
This issue got resolved. I was able to trace it to the fact that the driver program's pom.xml was pulling in Spark 2.1.1, which in turn was pulling in Hadoop 2.2.0. Explicitly adding dependencies on the Hadoop 2.7.3 libraries resolved it. The following API in HDFS: DatanodeManager.getDatanodeStorageInfos …
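
For reference, here is a minimal sketch of the kind of pom.xml override described above. The exact artifact IDs and the use of version properties are assumptions; the point is only that the Hadoop client version gets pinned explicitly instead of being inherited transitively from Spark:

    <properties>
      <spark.version>2.1.1</spark.version>
      <!-- Pin Hadoop explicitly; Spark 2.1.1's poms otherwise pull in 2.2.0 -->
      <hadoop.version>2.7.3</hadoop.version>
    </properties>

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>

With the explicit hadoop-client entry, Maven's nearest-wins resolution picks 2.7.3 over the 2.2.0 that spark-streaming brings in transitively.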

Re: Flume DStream produces 0 records after HDFS node killed

2017-06-20 Thread N B
Hadoop version 2.7.3

On Tue, Jun 20, 2017 at 11:12 PM, yohann jardin wrote:
> Which version of Hadoop are you running on?
>
> *Yohann Jardin*
> On 6/21/2017 at 1:06 AM, N B wrote:
> > Ok some more info about this issue to see if someone can shine a light on
> > what could be going on. I turned on debug logging for
> > org.apache.spark.streaming.scheduler …

Re: Flume DStream produces 0 records after HDFS node killed

2017-06-20 Thread yohann jardin
Which version of Hadoop are you running on?

Yohann Jardin

On 6/21/2017 at 1:06 AM, N B wrote:
> Ok some more info about this issue to see if someone can shine a light on
> what could be going on. I turned on debug logging for
> org.apache.spark.streaming.scheduler in the driver process and this is
> what gets thrown in the logs …

Re: Flume DStream produces 0 records after HDFS node killed

2017-06-20 Thread N B
Ok, some more info about this issue to see if someone can shine a light on what could be going on. I turned on debug logging for org.apache.spark.streaming.scheduler in the driver process, and this is what gets thrown in the logs and keeps getting thrown even after the downed HDFS node is restarted. Using …
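
For anyone trying to reproduce this, enabling that logging is a one-line change in the driver's log4j configuration (a sketch, assuming the stock log4j 1.x setup that Spark 2.x ships with):

    # conf/log4j.properties on the driver: raise only the streaming
    # scheduler package to DEBUG, leaving everything else at the default.
    log4j.logger.org.apache.spark.streaming.scheduler=DEBUG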

Re: Flume DStream produces 0 records after HDFS node killed

2017-06-20 Thread N B
BTW, this is running on Spark 2.1.1. I have been trying to debug this issue, and what I have found so far is that it is somehow related to the Spark WAL. The directory named /receivedBlockMetadata seems to stop getting written to after the point of an HDFS node being killed and restarted. I have …
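
A quick way to check whether that directory is still being appended to is to list it directly with the HDFS CLI (the checkpoint root shown here is a placeholder for whatever path the application actually checkpoints to):

    # Watch the receiver WAL metadata under the streaming checkpoint directory;
    # the timestamps and file sizes should keep advancing while batches run.
    hdfs dfs -ls /checkpoints/flume-app/receivedBlockMetadata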

Flume DStream produces 0 records after HDFS node killed

2017-06-19 Thread N B
Hi all, We are running a Spark Standalone cluster for a streaming application. The application consumes data from Flume using a Flume polling stream created as follows:

    flumeStream = FlumeUtils.createPollingStream(streamingContext,
        socketAddress.toArray(new InetSocketAddress[socketAddress.size()]), …
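
For context, here is a self-contained sketch of how such a polling stream is typically wired up with the receiver WAL enabled (the host name, port, batch interval, and checkpoint path below are assumptions, not details from the original setup):

    import java.net.InetSocketAddress;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.flume.FlumeUtils;
    import org.apache.spark.streaming.flume.SparkFlumeEvent;

    public class FlumePollingSketch {
      public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
            .setAppName("FlumePollingSketch")
            // Turn on the receiver write-ahead log, which is what backs
            // the receivedBlockMetadata directory discussed in this thread.
            .set("spark.streaming.receiver.writeAheadLog.enable", "true");

        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(10));
        // The WAL lives under this checkpoint directory in HDFS.
        jssc.checkpoint("hdfs:///checkpoints/flume-app");

        List<InetSocketAddress> socketAddress =
            Arrays.asList(new InetSocketAddress("flume-host-1", 9988));

        // Pull events from the Flume spark sink(s) at the given addresses.
        JavaReceiverInputDStream<SparkFlumeEvent> flumeStream =
            FlumeUtils.createPollingStream(jssc,
                socketAddress.toArray(new InetSocketAddress[socketAddress.size()]),
                StorageLevel.MEMORY_AND_DISK_SER_2());

        // Minimal sanity check: print the per-batch record count.
        flumeStream.count().print();

        jssc.start();
        jssc.awaitTermination();
      }
    }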