I disagree. Not having checkpointed operators inside the iteration still breaks the guarantees.
It is not about the states; it is about the loop itself.

On Wed, Jun 10, 2015 at 10:12 AM Aljoscha Krettek <aljos...@apache.org> wrote:

> This is the answer I gave on the PR (we should have one place for
> discussing this, though):
>
> I would be against merging this in the current form. What I propose is
> to analyse the topology to verify that there are no checkpointed
> operators inside iterations. Operators before and after iterations can
> be checkpointed, and we can safely allow the user to enable
> checkpointing.
>
> If we have the code to analyse which operators are inside iterations,
> we could also disallow windows inside iterations. I think windows
> inside iterations don't make sense, since elements in different
> "iterations" would end up in the same window. Maybe I'm wrong here,
> though; then please correct me.
>
> On Wed, Jun 10, 2015 at 10:08 AM, Márton Balassi
> <balassi.mar...@gmail.com> wrote:
>
> > I agree that, for the sake of the above-mentioned use cases, it is
> > reasonable to add this to the release with the right documentation.
> > For machine learning, potentially losing one round of feedback data
> > should not matter.
> >
> > Let us not block prominent users on this until the next release.
> >
> > On Wed, Jun 10, 2015 at 8:09 AM, Gyula Fóra <gyula.f...@gmail.com>
> > wrote:
> >
> >> As for people currently suffering from it:
> >>
> >> An application King is developing requires iterations, and they need
> >> checkpoints. Practically all SAMOA programs would need this.
> >>
> >> It is very likely that the state interfaces will be changed after the
> >> release, so this is not something that we can just add later. I don't
> >> see a reason why we should not add it, as it is clearly documented. In
> >> this case, not having guarantees at all means people will never use
> >> it in any production system. Having limited guarantees means that it
> >> will depend on the application.
> >>
> >> On Wed, Jun 10, 2015 at 12:53 AM, Ufuk Celebi <u...@apache.org> wrote:
> >>
> >> > Hey Gyula,
> >> >
> >> > I understand your reasoning, but I don't think it's worth rushing
> >> > this into the release.
> >> >
> >> > As you've said, we cannot give precise guarantees. But this is
> >> > arguably one of the key requirements for any fault tolerance
> >> > mechanism. Therefore, I disagree that this is better than not having
> >> > anything at all. I think it will already go a long way to have the
> >> > non-iterative case working reliably.
> >> >
> >> > And as far as I know, there are no users really suffering from this
> >> > at the moment (in the sense that someone has complained on the
> >> > mailing list).
> >> >
> >> > Hence, I vote to postpone this.
> >> >
> >> > – Ufuk
> >> >
> >> > On 10 Jun 2015, at 00:19, Gyula Fóra <gyf...@apache.org> wrote:
> >> >
> >> > > Hey all,
> >> > >
> >> > > It is currently impossible to enable state checkpointing for
> >> > > iterative jobs, because an exception is thrown when creating the
> >> > > jobgraph. This behaviour is motivated by the lack of precise
> >> > > guarantees that we can give with the current fault-tolerance
> >> > > implementations for cyclic graphs.
> >> > >
> >> > > This PR <https://github.com/apache/flink/pull/812> adds an optional
> >> > > flag to force checkpoints even in the case of iterations. The
> >> > > algorithm will take checkpoints periodically as before, but records
> >> > > in transit inside the loop will be lost.
> >> > >
> >> > > However, even this guarantee is enough for most applications
> >> > > (machine learning, for instance) and certainly much better than not
> >> > > having anything at all.
> >> > >
> >> > > I suggest we add this to the 0.9 release, as many applications
> >> > > currently suffer from this limitation (SAMOA, ML pipelines, graph
> >> > > streaming, etc.).
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Gyula
> >> >
> >> >
> >>
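Aljoscha's proposal (analyse the topology and reject checkpointed operators that sit inside an iteration) boils down to finding the operators that lie on a cycle of the job graph. Below is a minimal sketch of that check. It assumes a hypothetical plain adjacency map of operator names, not Flink's actual StreamGraph/JobGraph classes, and uses Tarjan's strongly-connected-components algorithm: an operator is "inside an iteration" if its component has more than one member or it has a feedback edge to itself.

```java
import java.util.*;

// Hypothetical sketch: the graph is a plain Map of operator name -> downstream
// operator names, NOT Flink's StreamGraph API.
class IterationCheck {

    private final Map<String, List<String>> edges;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final Set<String> inLoop = new HashSet<>();
    private int counter = 0;

    IterationCheck(Map<String, List<String>> edges) {
        this.edges = edges;
    }

    /** Returns every operator that lies on at least one cycle (i.e. inside an iteration). */
    Set<String> operatorsInsideIterations() {
        for (String v : edges.keySet()) {
            if (!index.containsKey(v)) {
                strongConnect(v);
            }
        }
        return inLoop;
    }

    // Tarjan's SCC algorithm (recursive; adequate for small job topologies).
    private void strongConnect(String v) {
        index.put(v, counter);
        lowLink.put(v, counter);
        counter++;
        stack.push(v);
        onStack.add(v);
        for (String w : edges.getOrDefault(v, List.of())) {
            if (!index.containsKey(w)) {
                strongConnect(w);
                lowLink.put(v, Math.min(lowLink.get(v), lowLink.get(w)));
            } else if (onStack.contains(w)) {
                lowLink.put(v, Math.min(lowLink.get(v), index.get(w)));
            }
        }
        // v is the root of a strongly connected component: pop the whole component.
        if (lowLink.get(v).equals(index.get(v))) {
            List<String> component = new ArrayList<>();
            String w;
            do {
                w = stack.pop();
                onStack.remove(w);
                component.add(w);
            } while (!w.equals(v));
            boolean selfLoop = edges.getOrDefault(v, List.of()).contains(v);
            if (component.size() > 1 || selfLoop) {
                inLoop.addAll(component);
            }
        }
    }

    /** Throws if any checkpointed operator sits inside an iteration. */
    static void validate(Map<String, List<String>> edges, Set<String> checkpointed) {
        Set<String> inside = new IterationCheck(edges).operatorsInsideIterations();
        for (String op : checkpointed) {
            if (inside.contains(op)) {
                throw new IllegalStateException("Checkpointed operator inside iteration: " + op);
            }
        }
    }
}
```

With a topology `source -> head -> map`, plus a feedback edge `map -> head` and `map -> sink`, only `head` and `map` are reported, so `source` and `sink` (before and after the loop) could still be checkpointed. The same set could also be used to reject windows inside iterations, as suggested above.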