mapWithState termination

2017-05-31 Thread Dominik Safaric
Dear all, I would appreciate it if anyone could explain when mapWithState terminates, i.e. applies subsequent transformations such as writing the state to an external sink. Given a KafkaConsumer instance pulling messages from a Kafka topic, and a mapWithState transformation updating the state
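A minimal sketch of the pattern the question describes, assuming a Kafka-backed DStream of (key, count) pairs; updateCount and the sink call are illustrative names, not the thread author's code. mapWithState emits once per batch interval; the accumulated state can be read via stateSnapshots() and written out inside foreachRDD:

```scala
import org.apache.spark.streaming.{State, StateSpec}

// Hypothetical mapping function: sums counts per key across batches.
def updateCount(key: String, value: Option[Long], state: State[Long]): (String, Long) = {
  val sum = state.getOption.getOrElse(0L) + value.getOrElse(0L)
  state.update(sum)
  (key, sum)
}

// stream: DStream[(String, Long)] from the Kafka consumer (assumed):
// val stateStream = stream.mapWithState(StateSpec.function(updateCount _))
// stateStream.stateSnapshots().foreachRDD { rdd =>
//   rdd.foreachPartition(writeToSink)  // writeToSink is a placeholder; runs every batch
// }
```

So there is no separate "termination" point: the state-derived output operations fire on every micro-batch.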

Spark Streaming 2.1 recovery

2017-05-16 Thread Dominik Safaric
Hi, currently I am exploring Spark’s fault tolerance capabilities in terms of fault recovery. Namely, I run a Spark 2.1 standalone cluster with a master and four worker nodes. The application pulls data using the Kafka direct stream API from a Kafka topic over a (sliding) window of time, and write
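For this kind of recovery experiment, the driver is usually rebuilt from a checkpoint with StreamingContext.getOrCreate. A sketch under assumed paths and names (the HDFS checkpoint directory and app name are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical setup: checkpoint directory and app name are assumptions.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("recovery-test")
  val ssc = new StreamingContext(conf, Seconds(1))
  ssc.checkpoint("hdfs:///spark/checkpoints")
  // ... build the Kafka direct stream and windowed transformations here ...
  ssc
}

// On restart, the driver reconstructs the context from the checkpoint if present,
// otherwise builds it fresh via createContext.
val ssc = StreamingContext.getOrCreate("hdfs:///spark/checkpoints", createContext _)
ssc.start()
ssc.awaitTermination()
```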

Spark Streaming 2.1 - slave parallel recovery

2017-05-04 Thread Dominik Safaric
Hi all, I’m running a cluster consisting of a master and four slaves. The cluster runs a Spark application that reads data from a Kafka topic over a window of time, and writes the data back to Kafka. Checkpointing is enabled by using HDFS. However, although Spark periodically commits checkpoints

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-27 Thread Dominik Safaric
27, 2017 at 11:33 AM, Dominik Safaric > wrote: >> Indeed I have. But, even when storing the offsets in Spark and committing >> offsets upon completion of an output operation within the foreachRDD call >> (as pointed in the example), the only offset that Spark’s Kafka

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-27 Thread Dominik Safaric
ming-kafka-0-10-integration.html#kafka-itself > > On Wed, Apr 26, 2017 at 1:17 PM, Dominik Safaric > wrote: >> The reason why I want to obtain this information, i.e. > timestamp> tuples is to relate the consumption with the production rates >> using the __consumer_offsets Kafka

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-26 Thread Dominik Safaric
ally a meaningful idea for a range of offsets. > > > On Tue, Apr 25, 2017 at 2:43 PM, Dominik Safaric > wrote: >> Hi all, >> >> Because the Spark Streaming direct Kafka consumer maps offsets for a given >> Kafka topic and a partition internally while having

Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-25 Thread Dominik Safaric
Hi all, Because the Spark Streaming direct Kafka consumer maps offsets for a given Kafka topic and a partition internally while having enable.auto.commit set to false, how can I retrieve the offset of each of the consumer’s poll calls using the offset ranges of an RDD? More precisely, the informat
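With the 0.10 direct stream (the integration referenced later in this thread), the offsets backing each micro-batch can be read from the RDD itself; note this is per batch, not per underlying KafkaConsumer.poll() call. A sketch, with the stream assumed to come from KafkaUtils.createDirectStream:

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

// stream is assumed to be the direct Kafka DStream from the question.
def logOffsets(stream: DStream[ConsumerRecord[String, String]]): Unit =
  stream.foreachRDD { rdd =>
    // Each micro-batch RDD carries the offset ranges it was built from.
    val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    ranges.foreach { r =>
      println(s"${r.topic}-${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
    }
  }
```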

Re: Streaming 2.1.0 - window vs. batch duration

2017-03-18 Thread Dominik Safaric
; 1489765611 > 1489765612 > 1489765613 > Window: > 1489765610 > 1489765611 > 1489765612 > 1489765613 > 1489765614 > Window: > 1489765611 > 1489765612 > 1489765613 > 1489765614 > 1489765615 > > On Thu, Mar 16, 2017 at 2:34 PM, Dominik Safaric

Re: Streaming 2.1.0 - window vs. batch duration

2017-03-18 Thread Dominik Safaric
Window: > 1489765608 > 1489765609 > 1489765610 > 1489765611 > 1489765612 > Window: > 1489765609 > 1489765610 > 1489765611 > 1489765612 > 1489765613 > Window: > 1489765610 > 1489765611 > 1489765612 > 1489765613 > 1489765614 > Window: > 1489765611 > 1

Streaming 2.1.0 - window vs. batch duration

2017-03-16 Thread Dominik Safaric
Hi all, As I’ve implemented a streaming application pulling data from Kafka every 1 second (batch interval), I am observing some quite strange behaviour (I didn’t use Spark extensively in the past, but continuous operator-based engines instead). Namely the dstream.window(Seconds(60)) windowe
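The behaviour in this thread follows from the default slide duration. A short sketch of the distinction (dstream is an assumed input DStream):

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// With a 1 s batch interval, dstream.window(Seconds(60)) slides by the batch
// interval by default: consecutive windows overlap by 59 s, so each element
// appears in up to 60 windows (matching the overlapping output quoted in the
// replies above). For non-overlapping, tumbling windows, pass the slide
// duration explicitly:
def tumbling(dstream: DStream[String]): DStream[String] =
  dstream.window(Seconds(60), Seconds(60))
```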

Re: Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function

2017-03-01 Thread Dominik Safaric
at interfere. > > > On Wed, Mar 1, 2017, 14:20 Dominik Safaric wrote: > I've been trying to submit a Spark Streaming application using spark-submit > to a cluster of mine consisting of a master and two worker nodes. The > applicatio

Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function

2017-03-01 Thread Dominik Safaric
I've been trying to submit a Spark Streaming application using spark-submit to a cluster of mine consisting of a master and two worker nodes. The application has been written in Scala, and built using Maven. Importantly, the Maven build is configured to produce a fat JAR containing all dependenc
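A ClassNotFoundException for a Scala anonymous function typically means the classes compiled from the closures never reached the executors. A sketch of a submission that ships the fat JAR (class name, master URL, and JAR path are placeholders, assuming the Maven Shade or Assembly plugin produced the `jar-with-dependencies` artifact):

```shell
# Hypothetical names/paths: the fat JAR passed as the application JAR is
# distributed to the workers, so the anonymous-function classes travel with it.
spark-submit \
  --class com.example.StreamingApp \
  --master spark://master-host:7077 \
  target/streaming-app-1.0-jar-with-dependencies.jar
```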

Spark Streaming - parallel recovery

2017-02-22 Thread Dominik Safaric
Hi, As I am investigating, among other things, the fault recovery capabilities of Spark, I’ve been curious: what source code artifact initiates the parallel recovery process? In addition, how is a faulty node detected (from a driver's point of view)? Thanks in advance, Dominik

Spark Streaming fault tolerance benchmark

2016-08-13 Thread Dominik Safaric
A few months ago, as part of an empirical research project, I started investigating several stream processing engines, including but not limited to Spark Streaming. As the benchmark should extend the scope beyond performance metrics such as throughput and latency, I've focused on fault tolerance a

RESOLVED - Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
r the host.name it should advertise to the consumers and producers. By setting this property, I instantly started receiving Kafka log messages. Nevertheless, thank you all for your help, I appreciate it! > On 07 Jun 2016, at 17:44, Dominik Safaric wrote: > > Dear Todd, > > By r

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
? > > Can you verify the offsets on the broker: > > kafka-run-class.sh kafka.tools.GetOffsetShell --topic --broker-list > --time -1 > > -Todd > > On Tue, Jun 7, 2016 at 8:17 AM, Dominik Safaric wrote: >

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
hanges in the kafka clients between 0.8.2.2 > and 0.9.0.x. See this for more information: > > https://issues.apache.org/jira/browse/SPARK-12177 > > -Todd > > On Tue, Jun 7, 2016 at 7:35 AM, Dominik Safaric

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
> > Hi, > > What's the version of Spark? You're using Kafka 0.9.0.1, ain't you? What's > the topic name? > > Jacek > > On 7 Jun 2016 11:06 a.m., "Dominik Safaric" wrote: >

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
AWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmich.wordpress.com > > > On 7 June 2016 at 11:32, Dominik Safaric

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
indow > val countByValueAndWindow = price.filter(_ > > 95.0).countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval)) > countByValueAndWindow.print() > // > ssc.start() > ssc.awaitTermination() > > HTH > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.co

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
UrV8Pw > > http://talebzadehmich.wordpress.com > > > On 7 June 2016 at 10:06, Dominik Safaric

Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
As I am trying to integrate Kafka into Spark, the following exception occurs: org.apache.spark.SparkException: java.nio.channels.ClosedChannelException org.apache.spark.SparkException: Couldn't find leader offsets for Set([**,0]) at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$ch
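Per the RESOLVED follow-up in this thread, the ClosedChannelException was fixed on the broker side by setting the hostname Kafka advertises to clients. A sketch of the relevant broker settings (the hostname is a placeholder; property names are from the 0.8/0.9-era broker config this thread uses):

```properties
# config/server.properties on the Kafka broker
# Advertise a hostname the Spark driver/executors can actually resolve;
# otherwise leader-offset lookups fail with ClosedChannelException.
host.name=broker-host.example.com
advertised.host.name=broker-host.example.com
advertised.port=9092
```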

Spark Streaming - Travis CI and GitHub custom receiver - continuous data but empty RDD?

2016-03-05 Thread Dominik Safaric
Dear all, Lately, as part of scientific research, I've been developing an application that streams (or at least should stream) data from Travis CI and GitHub, using their REST APIs. The purpose of this is to get insight into the commit-build relationship, in order to further perform numerous analysi

Spark Streaming - processing/transforming DStreams using a custom Receiver

2016-02-25 Thread Dominik Safaric
Recently, I've implemented the following Receiver and custom Spark Streaming InputDStream using Scala: /** * The GitHubUtils object declares an interface consisting of overloaded createStream * functions. The createStream function takes as arguments the ctx : StreamingContext * passed by the dr
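The receiver described above can be sketched roughly as follows; GitHubReceiver and fetchEvents are placeholder names for the REST-polling logic, not the thread author's actual code:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Minimal custom receiver sketch: polls an assumed REST endpoint on a
// background thread and hands each record to Spark via store().
class GitHubReceiver(endpoint: String)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    new Thread("github-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          // fetchEvents(endpoint).foreach(store)  // placeholder REST call
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  // The polling loop exits on its own once isStopped() turns true.
  override def onStop(): Unit = ()
}

// Wired up as: ssc.receiverStream(new GitHubReceiver("https://api.github.com/events"))
```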