Hi Flavio,

Thanks for bringing up this topic. I think running periodic jobs with state that is restored from and persisted in a savepoint is a very valid use case and would fit the "stream is a superset of batch" story quite well. I'm not sure whether this behavior is already supported, but I think it would be a desirable feature.
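Just to make the idea concrete, here is a minimal sketch of such a job (hypothetical class and state names, using the DataStream API). The point is only that the keyed state would be included in a savepoint and picked up again when the next run is started from that savepoint:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class PeriodicStatefulJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Hypothetical finite source; in a periodic run this would be the
            // new slice of input for this execution.
            env.fromElements("a", "b", "a", "c")
               .keyBy(new KeySelector<String, String>() {
                   @Override
                   public String getKey(String value) {
                       return value;
                   }
               })
               .flatMap(new CountingFunction())
               .print();

            env.execute("periodic stateful job");
        }

        /** Keeps a per-key counter that should survive across runs via a savepoint. */
        public static class CountingFunction
                extends RichFlatMapFunction<String, Tuple2<String, Long>> {

            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public void flatMap(String value, Collector<Tuple2<String, Long>> out) throws Exception {
                Long current = count.value();
                long updated = (current == null ? 0L : current) + 1;
                count.update(updated);
                out.collect(Tuple2.of(value, updated));
            }
        }
    }

The "periodic" part would then boil down to taking a savepoint when the job is stopped (e.g. cancel with savepoint) and resubmitting the job later with `flink run -s <savepointPath> ...` once the next batch of input is available. Whether this interacts cleanly with finite sources shutting the job down is exactly the open question here.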
I'm looping in Till and Aljoscha, who might have some thoughts on this as well. Depending on the discussion we should open a JIRA for this feature.

Cheers, Fabian

2017-10-25 10:31 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> Hi to all,
> in my current use case I'd like to improve one step of our batch pipeline.
> There's one specific job that ingests a tabular dataset (of Rows) and
> explodes it into a set of RDF statements (as Tuples). The objects we output
> are containers of those Tuples (grouped by a field).
> Flink stateful streaming could be a perfect fit here because we
> incrementally grow the state of those containers without having to
> spend a lot of time performing GET operations against an external
> key-value store.
> The big problem here is that the sources are finite and the state of the
> job gets lost once the job ends, whereas I was expecting Flink to
> snapshot the state of its operators before exiting.
>
> This idea was inspired by
> https://data-artisans.com/blog/queryable-state-use-case-demo#no-external-store,
> with the difference that one can resume the state of the stateful
> application only when required.
> Do you think that it could be possible to support such a use case (which we
> can summarize as "periodic batch jobs that pick up where they left off")?
>
> Best,
> Flavio
>
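For reference, the kind of state Flavio describes (containers of RDF-statement Tuples, grouped by a field) could be modeled with keyed ListState roughly as sketched below. The field positions and types are hypothetical and assume the input has already been exploded into Tuple3<subject, predicate, object> records:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    /**
     * Accumulates, per grouping key, the RDF statements (subject, predicate, object)
     * seen so far and emits the updated container. The ListState is part of every
     * checkpoint/savepoint, so a later run started from that savepoint would
     * continue from the containers built by previous runs.
     */
    public class StatementAccumulator
            extends RichFlatMapFunction<Tuple3<String, String, String>,
                                        List<Tuple3<String, String, String>>> {

        private transient ListState<Tuple3<String, String, String>> container;

        @Override
        public void open(Configuration parameters) {
            container = getRuntimeContext().getListState(
                    new ListStateDescriptor<>(
                            "statements",
                            TypeInformation.of(new TypeHint<Tuple3<String, String, String>>() {})));
        }

        @Override
        public void flatMap(Tuple3<String, String, String> statement,
                            Collector<List<Tuple3<String, String, String>>> out) throws Exception {
            // Grow the per-key container with the new statement.
            container.add(statement);

            // Emit a copy of the current container for this key.
            List<Tuple3<String, String, String>> snapshot = new ArrayList<>();
            for (Tuple3<String, String, String> s : container.get()) {
                snapshot.add(s);
            }
            out.collect(snapshot);
        }
    }

Wiring it in would be a matter of keying the exploded statement stream by the grouping field and applying this function; the external key-value store lookups would then be replaced by Flink's own keyed state, provided the state survives the end of the finite input as discussed above.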