Hi Mich, at the moment there is not much support for handling such data-driven exceptions (badly formatted data, late data, ...). However, there is a proposal to improve this: FLIP-13 [1]. So it is work in progress.
It would be very helpful if you could check whether the proposal would address your use cases. Until then, I guess that either the split operator or directly writing to an external storage system would be the way to go.

Best, Fabian

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-13+Side+Outputs+in+Flink

2016-11-11 2:33 GMT+01:00 Michel Betancourt <michelbetanco...@gmail.com>:
>
> Hi, new to Apache Flink. Trying to find some solid input on how best to
> handle exceptions in streams -- specifically those that should not
> interrupt the stream.
>
> For example, if an error occurs during deserialization from bytes/Strings
> to your data type, in my use case I would rather queue the data for visual
> inspection than discard it and filter it out.
>
> One way of doing this is to diverge the stream so that good items take one
> path, while bad items take another.
>
> The closest thing I can find in Flink that can achieve this effect is the
> split operator. The caveat is that split does not also allow for inlined
> transformations. In other words, the best use of split appears to be to
> first perform the logic that catches the exception, then pass the result
> into the next stage, which uses split to check for an exception and gives
> a name to each branch of the decision, for example "OK" vs "error".
>
> Frameworks like RX (Reactive Extensions, e.g. RxJava) have built-in
> functionality that allows the user to decide whether they want to handle
> exceptions globally or specifically and resume if needed. I was hoping to
> find similar operations in Flink but so far no luck.
>
> At any rate, it would be great to get some feedback to see if I am heading
> down the right path here, and whether there are any caveats / gotchas to
> be aware of?
>
> Thanks!
> Mich
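The catch-then-split pattern described in the quoted message can be sketched in plain Java (no Flink dependency, so the sketch is runnable on its own). The `Tagged` wrapper and the routing loop are made up for illustration: the first stage wraps the risky deserialization in a try/catch and emits a tagged record instead of throwing; the second stage routes by tag, which is the role `split` plays on a `DataStream`.

```java
import java.util.ArrayList;
import java.util.List;

public class CatchAndSplit {
    // Hypothetical wrapper: carries either a parsed value or the raw bad input.
    static final class Tagged {
        final Integer value;  // non-null on success
        final String raw;     // original input, kept for inspection on failure
        Tagged(Integer value, String raw) { this.value = value; this.raw = raw; }
        boolean isOk() { return value != null; }
    }

    // Stage 1: perform the risky transformation and catch the exception
    // inline, instead of letting it interrupt the stream.
    static Tagged parse(String raw) {
        try {
            return new Tagged(Integer.parseInt(raw.trim()), raw);
        } catch (NumberFormatException e) {
            return new Tagged(null, raw);
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("1", "two", " 3 ", "4x");

        // Stage 2: route by tag -- the plain-Java analogue of
        // split(r -> r.isOk() ? "OK" : "error") on a DataStream.
        List<Integer> ok = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        for (String raw : input) {
            Tagged t = parse(raw);
            if (t.isOk()) ok.add(t.value);
            else errors.add(t.raw);  // queue bad records for visual inspection
        }

        System.out.println("ok=" + ok);       // prints ok=[1, 3]
        System.out.println("errors=" + errors); // prints errors=[two, 4x]
    }
}
```

In Flink terms, stage 1 would be a `map` (or `flatMap`) over the raw stream and stage 2 the `split`/`select` step; once FLIP-13 lands, side outputs should let the failing record be emitted to a separate output directly from the same operator.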