Hi Nick, these are some interesting ideas. I have been thinking about this: maybe we can add a special output stream (for example Kafka, but it could be generic) that would receive the errors/exceptions that were thrown during processing. The actual processing would not stop, and the messages in this special stream would contain some information about the current state of processing, the input element(s), and the machine/VM where the computation is happening.
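A rough sketch of that idea in plain Java (all names here are hypothetical, not Flink API; the error stream is modeled as a list rather than an actual Kafka topic or side output):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "special error output stream" idea: the job keeps running,
// and elements that fail in a UDF are routed to a separate error stream
// together with context for post-processing.
public class ErrorStreamSketch {

    // An error record carrying the failed input element, the exception,
    // and the machine where the computation happened.
    static final class ErrorRecord {
        final String input;
        final String error;
        final String host;
        ErrorRecord(String input, String error, String host) {
            this.input = input;
            this.error = error;
            this.host = host;
        }
    }

    // Result of one processing pass: the normal output plus the error stream.
    static final class Result {
        final List<Integer> mainOut = new ArrayList<>();
        final List<ErrorRecord> errorOut = new ArrayList<>();
    }

    // Applies a parsing "UDF" to each element; failures go to the error
    // stream instead of killing the job.
    static Result process(String[] elements, String host) {
        Result r = new Result();
        for (String element : elements) {
            try {
                r.mainOut.add(Integer.parseInt(element));
            } catch (Exception e) {
                r.errorOut.add(new ErrorRecord(element, e.getClass().getSimpleName(), host));
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = process(new String[] {"1", "two", "3"}, "worker-1");
        System.out.println("main=" + r.mainOut + " errors=" + r.errorOut.size());
        // prints: main=[1, 3] errors=1
    }
}
```

In a real job the error records would be serialized to the dedicated sink instead of a list, and the host would come from the runtime environment rather than a string literal.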
In Cascading you get the erroneous tuples at the end. This is not possible in streaming, hence the more stream-like approach. What do you think about that?

Cheers,
Aljoscha

> On 11 Nov 2015, at 21:49, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> Heya,
>
> I don't see a section in the online manual dedicated to this topic, so I want
> to raise the question here: how should errors be handled? Specifically I'm
> thinking about streaming jobs, which are expected to "never go down". For
> example, errors can be raised at the point where objects are serialized
> to/from sources/sinks, and in UDFs. Cascading provides failure traps [0] where
> erroneous tuples are saved off for post-processing. Is there any such
> functionality in Flink?
>
> I started down the road of implementing a Maybe/Optional type, a generic POJO
> triple of <IN, OUT, Throwable>, for capturing errors at each stage of a
> pipeline. However, because of Java type erasure, the job is rejected at
> submission time even though it compiles.
>
> How are other people handling errors in their stream processing?
>
> Thanks,
> Nick
>
> [0]: http://docs.cascading.org/cascading/1.2/userguide/html/ch06s03.html
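The type-erasure problem Nick describes can be reproduced without Flink: the type arguments of a plain generic instantiation are gone at runtime, which is why a framework cannot recover the POJO's concrete element types at submission time. A minimal stdlib-only sketch (class and method names are illustrative) showing both the erasure and the usual "super type token" workaround of an anonymous subclass:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ErasureDemo {

    // A generic triple like the one described: <IN, OUT, Throwable>.
    static class Triple<A, B, C> {}

    // Returns the actual type arguments recorded in a subclass's generic
    // superclass, or null if they were erased.
    static Type[] typeArgs(Object o) {
        Type sup = o.getClass().getGenericSuperclass();
        return (sup instanceof ParameterizedType)
                ? ((ParameterizedType) sup).getActualTypeArguments()
                : null;
    }

    public static void main(String[] args) {
        // Plain instantiation: the arguments are erased, nothing recoverable.
        System.out.println(typeArgs(new Triple<String, Long, Throwable>()));
        // prints: null

        // Anonymous subclass ("super type token"): the arguments are stored
        // in the class file and survive erasure.
        Type[] actual = typeArgs(new Triple<String, Long, Throwable>() {});
        System.out.println(actual[0]);
        // prints: class java.lang.String
    }
}
```

This is the same mechanism behind type-hint utilities in serialization frameworks: an anonymous subclass pins the concrete type arguments into class metadata so they can be read back reflectively.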