Hi all! I think we need to change the interface of the streaming source function.
The function currently has just a run() method, in which it does its work until canceled. With this, it is hard to write sources where the state and the snapshot barriers are exactly aligned.

When performing a checkpoint, the vertex grabs the state from the source and injects a checkpoint barrier. It is not clear that the injected barrier aligns with the state, because the source may have emitted more records since the state was grabbed, or may not yet have emitted the record that is reflected in the state (the offset).

If we change the interface to a more iterator-like one (hasNext() and next()), then the vertex calls these methods and can checkpoint in between the calls. The point right after hasNext() returns is well defined: there the state can be grabbed and the barrier emitted.

Any opinions on that?

Stephan
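
For illustration, here is a minimal sketch of what such an iterator-style source and the calling loop could look like. All names in it (IteratorSource, snapshotState(), CountingSource, drive()) are hypothetical and chosen only for this example; they are not the existing interface, and the fixed "checkpoint every N records" trigger merely stands in for whatever actually triggers a checkpoint. The point it tries to show is that the caller owns the loop, so the state grab and the barrier happen at one well-defined place between hasNext() and next().

```java
import java.util.ArrayList;
import java.util.List;

public class IteratorSourceSketch {

    /** Hypothetical iterator-style source interface (an assumption, not an existing API). */
    interface IteratorSource<T, S> {
        boolean hasNext();
        T next();
        /** Returns the state that describes everything emitted so far. */
        S snapshotState();
    }

    /** Toy source emitting 0..limit-1; its state is the offset of the next record. */
    static class CountingSource implements IteratorSource<Long, Long> {
        private final long limit;
        private long offset = 0;

        CountingSource(long limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            return offset < limit;
        }

        @Override
        public Long next() {
            return offset++;
        }

        @Override
        public Long snapshotState() {
            return offset;
        }
    }

    /**
     * Plays the role of the vertex: it drives the source and, every
     * checkpointEvery records, grabs the state and emits a barrier.
     * Both happen after hasNext() and before next(), so no record can
     * slip in between the state grab and the barrier.
     */
    static List<String> drive(IteratorSource<Long, Long> source, int checkpointEvery) {
        List<String> output = new ArrayList<>();
        int sinceBarrier = 0;
        while (source.hasNext()) {
            if (sinceBarrier == checkpointEvery) {
                output.add("BARRIER(state=" + source.snapshotState() + ")");
                sinceBarrier = 0;
            }
            output.add("record " + source.next());
            sinceBarrier++;
        }
        return output;
    }

    public static void main(String[] args) {
        drive(new CountingSource(5), 2).forEach(System.out::println);
    }
}
```

In this sketch, every barrier carries the offset of the first record that follows it, so the records before the barrier are exactly the ones reflected in the state. With a run()-style source, the same guarantee would need extra locking between the source thread and the checkpointing code.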