[ 
https://issues.apache.org/jira/browse/FLINK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685388#comment-16685388
 ] 

Flavio Pompermaier commented on FLINK-7930:
-------------------------------------------

What I would really like to do with Flink is probably too specific to ever be 
natively supported, I fear. That is, I would like to be able to synchronize a 
continuously running streaming job and a sporadic batch job.

The first one reads all incoming rows and efficiently groups them by a field, 
updating the internal Flink (map) state; the second job, the batch one, runs 
from time to time and reads all the current state (scan + some filter, ideally) 
in order to apply some transformation to it. However, the batch job should 
start only once the streaming job has no more incoming data and has 
snapshotted the current state somewhere (in order to make it available as a 
source for the batch job).
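
To make the intended flow concrete, here is a minimal plain-Java sketch of the 
two sides. It does not use any Flink API: StreamingSide, snapshotIfIdle and 
countPerKey are hypothetical names invented for this illustration, and an 
in-memory queue and map stand in for the job sources and the Flink map state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.function.Predicate;

public class StreamBatchSyncSketch {

    /** Stand-in for the streaming job: groups incoming rows by a key field. */
    static final class StreamingSide {
        // Rows that have arrived but have not been processed yet.
        private final Queue<String[]> pending = new ArrayDeque<>();
        // Stand-in for the keyed map state: key field -> grouped values.
        private final Map<String, List<String>> state = new HashMap<>();

        void offer(String key, String value) {
            pending.add(new String[] {key, value});
        }

        /** Processes a single pending row, if there is one. */
        void processOne() {
            String[] row = pending.poll();
            if (row != null) {
                state.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
            }
        }

        /** Returns a copy of the state only when no input is left unprocessed. */
        Optional<Map<String, List<String>>> snapshotIfIdle() {
            if (!pending.isEmpty()) {
                return Optional.empty();
            }
            Map<String, List<String>> copy = new HashMap<>();
            state.forEach((k, v) -> copy.put(k, new ArrayList<>(v)));
            return Optional.of(copy);
        }
    }

    /** Stand-in for the batch job: scan + filter over a snapshot, then transform. */
    static Map<String, Integer> countPerKey(Map<String, List<String>> snapshot,
                                            Predicate<String> keyFilter) {
        Map<String, Integer> result = new HashMap<>();
        snapshot.forEach((k, v) -> {
            if (keyFilter.test(k)) {
                result.put(k, v.size());
            }
        });
        return result;
    }

    public static void main(String[] args) {
        StreamingSide streaming = new StreamingSide();
        streaming.offer("a", "1");
        streaming.offer("b", "2");
        streaming.offer("a", "3");

        streaming.processOne();
        // Two rows are still pending, so no snapshot is handed out yet.
        System.out.println(streaming.snapshotIfIdle().isPresent()); // prints "false"

        streaming.processOne();
        streaming.processOne();
        Map<String, List<String>> snapshot = streaming.snapshotIfIdle().get();
        System.out.println(countPerKey(snapshot, k -> k.startsWith("a"))); // prints "{a=2}"
    }
}
```

The point of the sketch is only the ordering constraint: the batch side never 
sees the state unless the streaming side has drained its input and produced a 
consistent copy.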

Unfortunately, this is not very easy to achieve at the moment: there should be 
a way to check whether any of the job's sources still has unprocessed incoming 
data and, only when none does, trigger a state checkpoint (cancelling the job 
is somewhat of a workaround, because I don't really need to stop it; I just 
need to force the state to be flushed).

But even if we simplify the process by dropping the first check (on available 
incoming data) and only force a checkpoint, there is still the problem of 
using the checkpointed data as a source for the batch job...

As I said at the beginning of this comment, this is probably a very specific 
case, but I'd like to know whether there are other similar use cases and how 
they were solved.

> Support periodic jobs with state that gets restored and persisted in a 
> savepoint 
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-7930
>                 URL: https://issues.apache.org/jira/browse/FLINK-7930
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API
>            Reporter: Flavio Pompermaier
>            Priority: Major
>              Labels: stateful, streaming
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/State-snapshotting-when-source-is-finite-td16398.html,
>  it could be useful to support the use case of  periodic jobs with state that 
> gets restored and persisted in a savepoint (in order to avoid the need of an 
> external sink)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
