Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

Xintong Song Tue, 30 Nov 2021 20:57:08 -0800

@David,

Thanks for the clarification.


No more concerns from my side. +1 for this FLIP.

Thank you~

Xintong Song



On Wed, Dec 1, 2021 at 12:28 AM Till Rohrmann <[email protected]> wrote:

> Given the other breaking changes, I think that it is ok to remove the
> `RunningJobsRegistry` completely.
>
> Since we allow users to specify a HighAvailabilityServices implementation
> when starting Flink via `high-availability: FQDN`, I think we should mark
> the interface at least @Experimental.
>
> Cheers,
> Till
>
> On Tue, Nov 30, 2021 at 2:29 PM Mika Naylor <[email protected]> wrote:
>
> > Hi Till,
> >
> > We thought that breaking interfaces, specifically
> > HighAvailabilityServices and RunningJobsRegistry, was acceptable in this
> > instance because:
> >
> > - Neither of these interfaces is marked @Public, so neither carries any
> >    guarantee of being public and stable.
> > - As far as we are aware, we currently have no users with custom
> >    HighAvailabilityServices implementations.
> > - The interface was already broken in 1.14 with the changes to
> >    CheckpointRecoveryFactory, and will likely be changed again in 1.15
> >    due to further changes in that factory.
> >
> > Given that, we thought changes to the interface would not be disruptive.
> > Perhaps it could be annotated as @Internal - I'm not sure exactly what
> > guarantees we try to give for the stability of the
> > HighAvailabilityServices interface.
> >
> > Kind regards,
> > Mika
> >
> > On 26.11.2021 18:28, Till Rohrmann wrote:
> > >Thanks for creating this FLIP Matthias, Mika and David.
> > >
> > >I think the JobResultStore is an important piece for fixing Flink's last
> > >high-availability problem (afaik). Once we have this piece in place, users
> > >no longer risk re-executing a successfully completed job.
> > >
> > >I have one comment concerning breaking interfaces:
> > >
> > >If we don't want to break interfaces, then we could keep the
> > >HighAvailabilityServices.getRunningJobsRegistry() method and add a default
> > >implementation for HighAvailabilityServices.getJobResultStore(). We could
> > >then deprecate the former method and then remove it in the subsequent
> > >release (1.16).
> > >
> > >Apart from that, +1 for the FLIP.
> > >
> > >Cheers,
> > >Till
> > >
> > >On Wed, Nov 17, 2021 at 6:05 PM David Morávek <[email protected]> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> Matthias, Mika and I want to start a discussion about the introduction of
> > >> a new Flink component, the *JobResultStore*.
> > >>
> > >> The main motivation is to address shortcomings of the *RunningJobsRegistry*
> > >> and replace it with the new component. These shortcomings were first
> > >> described in FLINK-11813 [1].
> > >>
> > >> This change should improve the overall stability of the JobManager's
> > >> components and address the race conditions in some of the failover
> > >> scenarios during the job cleanup lifecycle.
> > >>
> > >> It should also help to ensure that Flink doesn't leave any uncleaned
> > >> resources behind.
> > >>
> > >> We've prepared FLIP-194 [2], which outlines the design and reasoning
> > >> behind this new component.
> > >>
> > >> [1] https://issues.apache.org/jira/browse/FLINK-11813
> > >> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726435
> > >>
> > >> We're looking forward to your feedback ;)
> > >>
> > >> Best,
> > >> Matthias, Mika and David
> > >>
> >
> > Mika Naylor
> > https://autophagy.io
> >
>
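
For reference, the non-breaking alternative Till raised in his message of 26.11.2021 (keep getRunningJobsRegistry(), add a default implementation for getJobResultStore(), deprecate the former) could look roughly like the sketch below. This is only an illustration under stated assumptions: the three interfaces are simplified, hypothetical stand-ins and not Flink's real types, the store method names are invented for the example, and having the default method adapt the legacy registry is one possible choice that the thread does not specify.

```java
import java.util.HashSet;
import java.util.Set;

public class DefaultMethodMigrationSketch {

    /** Hypothetical stand-in for the old component; NOT Flink's real RunningJobsRegistry. */
    interface RunningJobsRegistry {
        void setJobFinished(String jobId);
        boolean isJobFinished(String jobId);
    }

    /** Hypothetical stand-in for the new component; NOT the interface defined in FLIP-194. */
    interface JobResultStore {
        void markJobAsFinished(String jobId);
        boolean hasJobResult(String jobId);
    }

    /** Simplified stand-in for HighAvailabilityServices. */
    interface HighAvailabilityServices {

        /** Kept for one release so existing custom implementations still compile. */
        @Deprecated
        RunningJobsRegistry getRunningJobsRegistry();

        /**
         * New accessor with a default body, so implementations written against
         * the old interface need no changes. Assumption: the default simply
         * adapts the legacy registry.
         */
        default JobResultStore getJobResultStore() {
            final RunningJobsRegistry registry = getRunningJobsRegistry();
            return new JobResultStore() {
                @Override
                public void markJobAsFinished(String jobId) {
                    registry.setJobFinished(jobId);
                }

                @Override
                public boolean hasJobResult(String jobId) {
                    return registry.isJobFinished(jobId);
                }
            };
        }
    }

    /** A pre-existing custom implementation that only knows the old method. */
    static class LegacyCustomHaServices implements HighAvailabilityServices {
        private final Set<String> finished = new HashSet<>();
        private final RunningJobsRegistry registry = new RunningJobsRegistry() {
            @Override
            public void setJobFinished(String jobId) {
                finished.add(jobId);
            }

            @Override
            public boolean isJobFinished(String jobId) {
                return finished.contains(jobId);
            }
        };

        @Override
        public RunningJobsRegistry getRunningJobsRegistry() {
            return registry;
        }
    }

    public static void main(String[] args) {
        HighAvailabilityServices ha = new LegacyCustomHaServices();
        JobResultStore store = ha.getJobResultStore();
        store.markJobAsFinished("job-1");
        System.out.println(store.hasJobResult("job-1"));
        System.out.println(store.hasJobResult("job-2"));
    }
}
```

The thread ultimately settles on removing `RunningJobsRegistry` outright, so this sketch only illustrates the option that was considered and set aside.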
