The elements that are in-flight in an iteration are also state of the
job. I'm wondering whether the state inside iterations still makes
sense without these in-flight elements. But I also don't know the King
use-case, that's why I though an example could be helpful.

On Wed, Jun 10, 2015 at 12:37 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> I don't understand the question, I vote for checkpointing all state in the
> job, even inside iterations (its more of a loop).
>
> Aljoscha Krettek <aljos...@apache.org> ezt írta (időpont: 2015. jún. 10.,
> Sze, 12:34):
>
>> I don't understand why having the state inside an iteration but not
>> the elements that correspond to this state or created this state is
>> desirable. Maybe an example could help understand this better?
>>
>> On Wed, Jun 10, 2015 at 11:27 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> > The other tests verify that the checkpointing algorithm runs properly.
>> That
>> > also ensures that it runs for iterations because a loop is just an extra
>> > source and sink in the jobgraph (so it is the same for the algorithm).
>> >
>> > Fabian Hueske <fhue...@gmail.com> ezt írta (időpont: 2015. jún. 10.,
>> Sze,
>> > 11:19):
>> >
>> >> Without going into the details, how well tested is this feature? The PR
>> >> only extends one test by a few lines.
>> >>
>> >> Is that really enough to ensure that
>> >> 1) the change does not cause trouble
>> >> 2) is working as expected
>> >>
>> >> If this feature should go into the release, it must be thoroughly
>> checked
>> >> and we must take the time for that.
>> >> Including code and hoping for the best because time is scarce is not an
>> >> option IMO.
>> >>
>> >> Fabian
>> >>
>> >>
>> >> 2015-06-10 11:05 GMT+02:00 Gyula Fóra <gyula.f...@gmail.com>:
>> >>
>> >> > And also I would like to remind everyone that any fault tolerance we
>> >> > provide is only as good as the fault tolerance of the master node.
>> Which
>> >> is
>> >> > non existent at the moment.
>> >> >
>> >> > So I don't see a reason why a user should not be able to choose
>> whether
>> >> he
>> >> > wants state checkpoints for iterations as well.
>> >> >
>> >> > In any case this will be used by King for instance, so making it part
>> of
>> >> > the release would save a lot of work for everyone.
>> >> >
>> >> > Paris Carbone <par...@kth.se> ezt írta (időpont: 2015. jún. 10., Sze,
>> >> > 10:29):
>> >> >
>> >> > >
>> >> > > To continue Gyula's point, for consistent snapshots we need to
>> persist
>> >> > the
>> >> > > records in transit within the loop  and also slightly change the
>> >> current
>> >> > > protocol since it works only for DAGs. Before going into that
>> direction
>> >> > > though I would propose we first see whether there is a nice way to
>> make
>> >> > > iterations more structured.
>> >> > >
>> >> > > Paris
>> >> > > ________________________________________
>> >> > > From: Gyula Fóra <gyula.f...@gmail.com>
>> >> > > Sent: Wednesday, June 10, 2015 10:19 AM
>> >> > > To: dev@flink.apache.org
>> >> > > Subject: Re: Force enabling checkpoints for iterative streaming jobs
>> >> > >
>> >> > > I disagree. Not having checkpointed operators inside the iteration
>> >> still
>> >> > > breaks the guarantees.
>> >> > >
>> >> > > It is not about the states it is about the loop itself.
>> >> > > On Wed, Jun 10, 2015 at 10:12 AM Aljoscha Krettek <
>> aljos...@apache.org
>> >> >
>> >> > > wrote:
>> >> > >
>> >> > > > This is the answer I gave on the PR (we should have one place for
>> >> > > > discussing this, though):
>> >> > > >
>> >> > > > I would be against merging this in the current form. What I
>> propose
>> >> is
>> >> > > > to analyse the topology to verify that there are no checkpointed
>> >> > > > operators inside iterations. Operators before and after iterations
>> >> can
>> >> > > > be checkpointed and we can safely allow the user to enable
>> >> > > > checkpointing.
>> >> > > >
>> >> > > > If we have the code to analyse which operators are inside
>> iterations
>> >> > > > we could also disallow windows inside iterations. I think windows
>> >> > > > inside iterations don't make sense since elements in different
>> >> > > > "iterations" would end up in the same window. Maybe I'm wrong here
>> >> > > > though, then please correct me.
>> >> > > >
>> >> > > > On Wed, Jun 10, 2015 at 10:08 AM, Márton Balassi
>> >> > > > <balassi.mar...@gmail.com> wrote:
>> >> > > > > I agree that for the sake of the above mentioned use cases it is
>> >> > > > reasonable
>> >> > > > > to add this to the release with the right documentation, for
>> >> machine
>> >> > > > > learning potentially loosing one round of feedback data should
>> not
>> >> > > > matter.
>> >> > > > >
>> >> > > > > Let us not block prominent users until the next release on this.
>> >> > > > >
>> >> > > > > On Wed, Jun 10, 2015 at 8:09 AM, Gyula Fóra <
>> gyula.f...@gmail.com>
>> >> > > > wrote:
>> >> > > > >
>> >> > > > >> As for people currently suffering from it:
>> >> > > > >>
>> >> > > > >> An application King is developing requires iterations, and they
>> >> need
>> >> > > > >> checkpoints. Practically all SAMOA programs would need this.
>> >> > > > >>
>> >> > > > >> It is very likely that the state interfaces will be changed
>> after
>> >> > the
>> >> > > > >> release, so this is not something that we can just add later. I
>> >> > don't
>> >> > > > see a
>> >> > > > >> reason why we should not add it, as it is clearly documented.
>> In
>> >> > this
>> >> > > > >> actual case not having guarantees at all means people will
>> never
>> >> use
>> >> > > it
>> >> > > > in
>> >> > > > >> any production system. Having limited guarantees means that it
>> >> will
>> >> > > > depend
>> >> > > > >> on the application.
>> >> > > > >>
>> >> > > > >> On Wed, Jun 10, 2015 at 12:53 AM, Ufuk Celebi <u...@apache.org>
>> >> > wrote:
>> >> > > > >>
>> >> > > > >> > Hey Gyula,
>> >> > > > >> >
>> >> > > > >> > I understand your reasoning, but I don't think its worth to
>> rush
>> >> > > this
>> >> > > > >> into
>> >> > > > >> > the release.
>> >> > > > >> >
>> >> > > > >> > As you've said, we cannot give precise guarantees. But this
>> is
>> >> > > > arguably
>> >> > > > >> > one of the key requirements for any fault tolerance
>> mechanism.
>> >> > > > Therefore
>> >> > > > >> I
>> >> > > > >> > disagree that this is better than not having anything at
>> all. I
>> >> > > think
>> >> > > > it
>> >> > > > >> > will already go a long way to have the non-iterative case
>> >> working
>> >> > > > >> reliably.
>> >> > > > >> >
>> >> > > > >> > And as far as I know there are no users really suffering from
>> >> this
>> >> > > at
>> >> > > > the
>> >> > > > >> > moment (in the sense that someone has complained on the
>> mailing
>> >> > > list).
>> >> > > > >> >
>> >> > > > >> > Hence, I vote to postpone this.
>> >> > > > >> >
>> >> > > > >> > – Ufuk
>> >> > > > >> >
>> >> > > > >> > On 10 Jun 2015, at 00:19, Gyula Fóra <gyf...@apache.org>
>> wrote:
>> >> > > > >> >
>> >> > > > >> > > Hey all,
>> >> > > > >> > >
>> >> > > > >> > > It is currently impossible to enable state checkpointing
>> for
>> >> > > > iterative
>> >> > > > >> > > jobs, because en exception is thrown when creating the
>> >> jobgraph.
>> >> > > > This
>> >> > > > >> > > behaviour is motivated by the lack of precise guarantees
>> that
>> >> we
>> >> > > can
>> >> > > > >> give
>> >> > > > >> > > with the current fault-tolerance implementations for cyclic
>> >> > > graphs.
>> >> > > > >> > >
>> >> > > > >> > > This PR <https://github.com/apache/flink/pull/812> adds an
>> >> > > optional
>> >> > > > >> > flag to
>> >> > > > >> > > force checkpoints even in case of iterations. The algorithm
>> >> will
>> >> > > > take
>> >> > > > >> > > checkpoints periodically as before, but records in transit
>> >> > inside
>> >> > > > the
>> >> > > > >> > loop
>> >> > > > >> > > will be lost.
>> >> > > > >> > >
>> >> > > > >> > > However even this guarantee is enough for most applications
>> >> > > (Machine
>> >> > > > >> > > Learning for instance) and certainly much better than not
>> >> having
>> >> > > > >> anything
>> >> > > > >> > > at all.
>> >> > > > >> > >
>> >> > > > >> > >
>> >> > > > >> > > I suggest we add this to the 0.9 release as currently many
>> >> > > > applications
>> >> > > > >> > > suffer from this limitation (SAMOA, ML pipelines, graph
>> >> > streaming
>> >> > > > etc.)
>> >> > > > >> > >
>> >> > > > >> > >
>> >> > > > >> > > Cheers,
>> >> > > > >> > >
>> >> > > > >> > > Gyula
>> >> > > > >> >
>> >> > > > >> >
>> >> > > > >>
>> >> > > >
>> >> > >
>> >> >
>> >>
>>

Reply via email to