I am leaning towards something like that. Things get interesting when multiple different transformations and regrouping happen. At the end of it all, when the "task" is done, we no longer are sure which kafka partition they came from, even when all transforms/ grouping happen local to the original partition.
Registering an onTaskCompleted listener on my custom RDDIterator seemed like a good choice, since the Iterator is the only one that really knows about the kafka offsets for that partition. Granted, the driver has a list of them too, but there is no way of mapping the task that failed/succeeded to the list offsets (HasOffsetRanges) on the driver. All of this may seem counterintuitive to how the streaming jobs are generated (in memory increments to offsets). One of my requirements is to not generate an RDD, if the previous one did not complete - because the business logic cannot go forward in the face of errors, and has to stop consuming from the kafka partition that caused failures. This is easy enough to implement because I have my own DStream/RDD/Iterator implementations - basically from DirectKafkaInputDStream et al, and modified to support dynamically adding/removing topics. Now the issue has moved to how can I update consumer offsets for a partition when a task succeeds. Thanks! On Mon, Dec 21, 2015 at 8:24 AM, Cody Koeninger <c...@koeninger.org> wrote: > Honestly it's a lot easier to deal with this using transactions. > > Someone else would have to speak to the possibility of getting task > failures added to listener callbacks. > > On Sat, Dec 19, 2015 at 5:44 PM, Neelesh <neele...@gmail.com> wrote: > >> Hi, >> I'm trying to build automatic Kafka watermark handling in my stream >> apps by overriding the KafkaRDDIterator, and adding a >> taskcompletionlistener and updating watermarks if task was completed (the >> iterator has access to offsets). But I found out that there is no way to >> listen to a task error inside the executors. Only the driver gets the >> TaskReason notification, but not the task completion listener. This >> basically means watermarks will be updated regardless of whether the task >> completed successfully or not. >> >> Is there any way to listen for the task failures on the executors? >> >> Thanks! >> -neelesh >> > >