I also created a JIRA for task failures https://issues.apache.org/jira/browse/SPARK-12452
On Mon, Dec 21, 2015 at 9:54 AM, Neelesh <neele...@gmail.com> wrote: > I am leaning towards something like that. Things get interesting when > multiple different transformations and regrouping happen. At the end of it > all, when the "task" is done, we no longer are sure which kafka partition > they came from, even when all transforms/ grouping happen local to the > original partition. > > Registering an onTaskCompleted listener on my custom RDDIterator seemed > like a good choice, since the Iterator is the only one that really knows > about the kafka offsets for that partition. Granted, the driver has a list > of them too, but there is no way of mapping the task that failed/succeeded > to the list offsets (HasOffsetRanges) on the driver. > > All of this may seem counterintuitive to how the streaming jobs are > generated (in memory increments to offsets). One of my requirements is to > not generate an RDD, if the previous one did not complete - because the > business logic cannot go forward in the face of errors, and has to stop > consuming from the kafka partition that caused failures. This is easy > enough to implement because I have my own DStream/RDD/Iterator > implementations - basically from DirectKafkaInputDStream et al, and > modified to support dynamically adding/removing topics. Now the issue has > moved to how can I update consumer offsets for a partition when a task > succeeds. > > Thanks! > > > > On Mon, Dec 21, 2015 at 8:24 AM, Cody Koeninger <c...@koeninger.org> > wrote: > >> Honestly it's a lot easier to deal with this using transactions. >> >> Someone else would have to speak to the possibility of getting task >> failures added to listener callbacks. >> >> On Sat, Dec 19, 2015 at 5:44 PM, Neelesh <neele...@gmail.com> wrote: >> >>> Hi, >>> I'm trying to build automatic Kafka watermark handling in my stream >>> apps by overriding the KafkaRDDIterator, and adding a >>> taskcompletionlistener and updating watermarks if task was completed (the >>> iterator has access to offsets). But I found out that there is no way to >>> listen to a task error inside the executors. Only the driver gets the >>> TaskReason notification, but not the task completion listener. This >>> basically means watermarks will be updated regardless of whether the task >>> completed successfully or not. >>> >>> Is there any way to listen for the task failures on the executors? >>> >>> Thanks! >>> -neelesh >>> >> >> >