http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
also has an example of how to close over the offset ranges so they are available on executors.

On Fri, Sep 25, 2015 at 12:50 PM, Neelesh <neele...@gmail.com> wrote:
> Hi,
> We are using DirectKafkaInputDStream and store completed consumer
> offsets in Kafka (0.8.2). However, some of our use cases require that
> offsets not be written if processing of a partition fails with certain
> exceptions. This allows us to build various backoff strategies for that
> partition, instead of either blindly committing consumer offsets regardless
> of errors (because KafkaRDD as HasOffsetRanges is available only on the
> driver) or relying on Spark's retry logic and continuing without remedial
> action.
>
> I was playing with SparkListener and found that while one can listen for
> taskCompletedEvent on the driver and even figure out that there was an
> error, there is no way to map the task back to the partition and
> retrieve the offset range, topic, Kafka partition #, etc.
>
> Any pointers appreciated!
>
> Thanks!
> -neelesh
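Once the offset ranges are in hand on the executor (per the linked doc's approach of capturing them in a `transform` and indexing by partition), the "commit only on success" part is plain control flow. A minimal, self-contained sketch of that logic (the `OffsetRange` case class, `processAndCommit`, and the in-memory `committed` map here are hypothetical stand-ins for illustration, not Spark or Kafka APIs — in practice the commit would go to Kafka/ZooKeeper):

```scala
// Hypothetical stand-in for a Kafka topic-partition offset range.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

object CommitOnSuccess {
  // In-memory stand-in for the offset store (Kafka/ZooKeeper in a real setup).
  val committed = scala.collection.mutable.Map[(String, Int), Long]()

  // Process one partition's records; advance the stored offset only if
  // every record is handled without an exception. On failure the old
  // offset stays put, so the partition can be retried or backed off.
  def processAndCommit(osr: OffsetRange, records: Seq[String])(handler: String => Unit): Boolean = {
    try {
      records.foreach(handler)
      committed((osr.topic, osr.partition)) = osr.untilOffset
      true
    } catch {
      case _: Exception => false
    }
  }

  def main(args: Array[String]): Unit = {
    val good = OffsetRange("events", 0, 0L, 2L)
    val bad  = OffsetRange("events", 1, 0L, 2L)
    processAndCommit(good, Seq("a", "b"))(_ => ())
    processAndCommit(bad, Seq("a", "boom")) { r =>
      if (r == "boom") throw new RuntimeException("processing failed")
    }
    println(committed.get(("events", 0)))  // Some(2)  -> committed
    println(committed.get(("events", 1)))  // None     -> left for retry/backoff
  }
}
```

In a Spark job this decision would run inside `foreachPartition`, keyed by the partition's offset range, rather than in a `SparkListener` on the driver.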