Hi Flavio,

Thanks for bringing up this topic. I think running periodic jobs with state that is restored from and persisted in a savepoint is a very valid use case and would fit the "stream is a superset of batch" story quite well. I'm not sure whether this behavior is already supported, but I think it would be a desirable feature.
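Just to make the idea concrete, here is a minimal sketch of such a job (hypothetical class and state names, using the DataStream API). The point is only that the keyed state would be included in a savepoint and picked up again when the next run is started from that savepoint:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class PeriodicStatefulJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Hypothetical finite source; in a periodic run this would be the
            // new slice of input for this execution.
            env.fromElements("a", "b", "a", "c")
               .keyBy(new KeySelector<String, String>() {
                   @Override
                   public String getKey(String value) {
                       return value;
                   }
               })
               .flatMap(new CountingFunction())
               .print();

            env.execute("periodic stateful job");
        }

        /** Keeps a per-key counter that should survive across runs via a savepoint. */
        public static class CountingFunction
                extends RichFlatMapFunction<String, Tuple2<String, Long>> {

            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public void flatMap(String value, Collector<Tuple2<String, Long>> out) throws Exception {
                Long current = count.value();
                long updated = (current == null ? 0L : current) + 1;
                count.update(updated);
                out.collect(Tuple2.of(value, updated));
            }
        }
    }

The "periodic" part would then boil down to taking a savepoint when the job is stopped (e.g. cancel with savepoint) and resubmitting the job later with `flink run -s <savepointPath> ...` once the next batch of input is available. Whether this interacts cleanly with finite sources shutting the job down is exactly the open question here.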
I'm looping in Till and Aljoscha, who might have some thoughts on this as well. Depending on the discussion we should open a JIRA for this feature.

Cheers, Fabian

2017-10-25 10:31 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> Hi to all,
> in my current use case I'd like to improve one step of our batch pipeline.
> There's one specific job that ingests a tabular dataset (of Rows) and
> explodes it into a set of RDF statements (as Tuples). The objects we output
> are containers of those Tuples (grouped by a field).
> Flink stateful streaming could be a perfect fit here because we
> incrementally grow the state of those containers without having to
> spend a lot of time performing GET operations against an external
> key-value store.
> The big problem here is that the sources are finite and the state of the
> job gets lost once the job ends, whereas I was expecting Flink to
> snapshot the state of its operators before exiting.
>
> This idea was inspired by
> https://data-artisans.com/blog/queryable-state-use-case-demo#no-external-store,
> with the difference that one can resume the state of the stateful
> application only when required.
> Do you think that it could be possible to support such a use case (which we
> can summarize as "periodic batch jobs that pick up where they left off")?
>
> Best,
> Flavio
>
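For reference, the kind of state Flavio describes (containers of RDF-statement Tuples, grouped by a field) could be modeled with keyed ListState roughly as sketched below. The field positions and types are hypothetical and assume the input has already been exploded into Tuple3<subject, predicate, object> records:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    /**
     * Accumulates, per grouping key, the RDF statements (subject, predicate, object)
     * seen so far and emits the updated container. The ListState is part of every
     * checkpoint/savepoint, so a later run started from that savepoint would
     * continue from the containers built by previous runs.
     */
    public class StatementAccumulator
            extends RichFlatMapFunction<Tuple3<String, String, String>,
                                        List<Tuple3<String, String, String>>> {

        private transient ListState<Tuple3<String, String, String>> container;

        @Override
        public void open(Configuration parameters) {
            container = getRuntimeContext().getListState(
                    new ListStateDescriptor<>(
                            "statements",
                            TypeInformation.of(new TypeHint<Tuple3<String, String, String>>() {})));
        }

        @Override
        public void flatMap(Tuple3<String, String, String> statement,
                            Collector<List<Tuple3<String, String, String>>> out) throws Exception {
            // Grow the per-key container with the new statement.
            container.add(statement);

            // Emit a copy of the current container for this key.
            List<Tuple3<String, String, String>> snapshot = new ArrayList<>();
            for (Tuple3<String, String, String> s : container.get()) {
                snapshot.add(s);
            }
            out.collect(snapshot);
        }
    }

Wiring it in would be a matter of keying the exploded statement stream by the grouping field and applying this function; the external key-value store lookups would then be replaced by Flink's own keyed state, provided the state survives the end of the finite input as discussed above.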