Re: Force enabling checkpoints for iterative streaming jobs

Ufuk Celebi Tue, 09 Jun 2015 15:54:12 -0700

Hey Gyula,

I understand your reasoning, but I don't think it's worth rushing this into the 
release.

As you've said, we cannot give precise guarantees. But precise guarantees are 
arguably one of the key requirements for any fault tolerance mechanism. 
Therefore, I disagree that this is better than not having anything at all. I 
think it will already go a long way to have the non-iterative case working 
reliably.

And as far as I know there are no users really suffering from this at the 
moment (in the sense that someone has complained on the mailing list).

Hence, I vote to postpone this.

– Ufuk

On 10 Jun 2015, at 00:19, Gyula Fóra <gyf...@apache.org> wrote:

> Hey all,
> 
> It is currently impossible to enable state checkpointing for iterative
> jobs, because an exception is thrown when creating the jobgraph. This
> behaviour is motivated by the lack of precise guarantees that we can give
> with the current fault-tolerance implementations for cyclic graphs.
> 
> This PR <https://github.com/apache/flink/pull/812> adds an optional flag to
> force checkpoints even in case of iterations. The algorithm will take
> checkpoints periodically as before, but records in transit inside the loop
> will be lost.
> 
> However, even this guarantee is enough for most applications (Machine
> Learning for instance) and certainly much better than not having anything
> at all.
> 
> 
> I suggest we add this to the 0.9 release, as many applications currently
> suffer from this limitation (SAMOA, ML pipelines, graph streaming, etc.).
> 
> 
> Cheers,
> 
> Gyula
