Hi all,
in my current use case I'd like to improve one step of our batch pipeline.
There's one specific job that ingests a tabular dataset (of Rows) and
explodes it into a set of RDF statements (as Tuples). The objects we output
are containers of those Tuples, grouped by a field.
Flink stateful streaming could be a perfect fit here: we incrementally grow
the state of those containers without having to spend a lot of time on GET
operations against an external key-value store.
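To make the idea concrete, here is a minimal sketch of what I have in mind
(class, state and stream names are mine, and the Tuple3 of
subject/predicate/object is just an assumption about the statement shape):
the per-group containers live in Flink keyed state instead of an external
key-value store.

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Keeps every RDF statement seen so far for the current key (e.g. the
    // subject) in keyed ListState, so the container grows incrementally
    // without any external GET.
    public class RdfStatementAccumulator
            extends RichFlatMapFunction<Tuple3<String, String, String>,
                                        Tuple3<String, String, String>> {

        private transient ListState<Tuple3<String, String, String>> statements;

        @Override
        public void open(Configuration parameters) {
            statements = getRuntimeContext().getListState(
                new ListStateDescriptor<>(
                    "rdf-statements",
                    TypeInformation.of(
                        new TypeHint<Tuple3<String, String, String>>() {})));
        }

        @Override
        public void flatMap(Tuple3<String, String, String> stmt,
                            Collector<Tuple3<String, String, String>> out)
                throws Exception {
            statements.add(stmt); // incrementally extend this key's container
            out.collect(stmt);    // or emit the whole container, as needed
        }
    }

    // Applied on a keyed stream, e.g.:
    //   tuples.keyBy(t -> t.f0).flatMap(new RdfStatementAccumulator());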
The big problem is that the sources are finite and the state of the job is
lost once the job ends, whereas I was expecting Flink to snapshot the state
of its operators before exiting.
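For an unbounded job one can already get this behaviour manually with a
savepoint plus an explicit resume, roughly (job id, paths and jar name are
placeholders):

    bin/flink savepoint <jobId> hdfs:///savepoints/rdf-job
    bin/flink run -s hdfs:///savepoints/rdf-job/<savepointDir> my-rdf-job.jar

but with a finite source the job reaches FINISHED before there is any
obvious chance to trigger the savepoint.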

This idea was inspired by
https://data-artisans.com/blog/queryable-state-use-case-demo#no-external-store,
with the difference that one would resume the state of the stateful
application only when required.
Do you think it would be possible to support such a use case (which we could
summarize as "periodic batch jobs that pick up where they left off")?

Best,
Flavio
