Hi Flavio,

This kind of feature is indeed useful and is currently not supported by Flink.
It is a bit tricky to implement, though, because Tasks cannot currently
initiate checkpoints/savepoints on their own. Supporting it would entail
some changes to the lifecycle of a Task and an extra communication step with
the JobManager. None of that is impossible, however.

Please open a JIRA issue describing the problem, so that we can continue the
discussion there.

Cheers,
Till

On Thu, Oct 26, 2017 at 9:58 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Flavio,
>
> Thanks for bringing up this topic.
> I think running periodic jobs with state that gets restored and persisted
> in a savepoint is a very valid use case and would fit the "stream is a
> superset of batch" story quite well.
> I'm not sure if this behavior is already supported, but I think this would
> be a desirable feature.
>
> I'm looping in Till and Aljoscha who might have some thoughts on this as
> well.
> Depending on the discussion we should open a JIRA for this feature.
>
> Cheers, Fabian
>
> 2017-10-25 10:31 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> Hi to all,
>> in my current use case I'd like to improve one step of our batch pipeline.
>> There's one specific job that ingests a tabular dataset (of Rows) and
>> explodes it into a set of RDF statements (as Tuples). The objects we output
>> are containers of those Tuples (grouped by a field).
>> Flink stateful streaming could be a perfect fit here because we
>> incrementally increase the state of those containers but we don't have to
>> spend a lot of time performing GET operations against an external key-value
>> store.
>> The big problem here is that the sources are finite and the state of the
>> job gets lost once the job ends, whereas I expected Flink to snapshot the
>> state of its operators before exiting.
>>
>> This idea was inspired by
>> https://data-artisans.com/blog/queryable-state-use-case-demo#no-external-store,
>> with the difference that one can resume the state of the stateful
>> application only when required.
>> Do you think that it could be possible to support such a use case (which
>> we can summarize as "periodic batch jobs that pick up where they left off")?
>>
>> Best,
>> Flavio
>>
>
>
