Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-12-06 Thread David Morávek
> I also hope that we could remove all the pointers in the HA store (ZK, ConfigMap) in the future.
I'll open a new thread with {user,dev}@f.a.o to verify the thoughts around strong read-after-write consistency for FileSystems. If that goes well, I can see it as one of the possible topics for 1.16 ;)

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-12-06 Thread Yang Wang
Thanks for the fruitful discussion. I also hope that we could remove all the pointers in the HA store (ZK, ConfigMap) in the future. After that, we would rely on ZK/ConfigMap only for leader election/retrieval. Best, Yang On Mon, Dec 6, 2021 at 4:57 PM David Morávek wrote: > as all of the concerns seem to be

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-12-06 Thread David Morávek
As all of the concerns seem to be addressed, I'd like to proceed with the vote to move things forward. Thanks everyone for the feedback, it was really helpful! Best, D. On Wed, Dec 1, 2021 at 6:39 AM Zhu Zhu wrote: > Thanks for the explanation Matthias. The solution sounds good to me. > I hav

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Zhu Zhu
Thanks for the explanation Matthias. The solution sounds good to me. I have no more concerns and +1 for the FLIP. Thanks, Zhu On Wed, Dec 1, 2021 at 12:56 PM Xintong Song wrote: > @David, > > Thanks for the clarification. > > No more concerns from my side. +1 for this FLIP. > > Thank you~ > > Xintong Song >

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Xintong Song
@David, Thanks for the clarification. No more concerns from my side. +1 for this FLIP. Thank you~ Xintong Song On Wed, Dec 1, 2021 at 12:28 AM Till Rohrmann wrote: > Given the other breaking changes, I think that it is ok to remove the > `RunningJobsRegistry` completely. > > Since we allow

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Till Rohrmann
Given the other breaking changes, I think that it is ok to remove the `RunningJobsRegistry` completely. Since we allow users to specify a HighAvailabilityServices implementation when starting Flink via `high-availability: FQDN`, I think we should mark the interface at least @Experimental. Cheers,

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Mika Naylor
Hi Till, We thought that breaking interfaces, specifically HighAvailabilityServices and RunningJobsRegistry, was acceptable in this instance because:
- Neither of these interfaces is marked @Public, so they carry no guarantees about being public and stable.
- As far as we are aware, we currentl

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread David Morávek
Hi Xintong,
> However, it's probably not so good for users who don't need such retrieval and already used a ZooKeeper/Native-Kubernetes HA to specify another remote FS path for storing job results, even if they are automatically cleaned-up on committed.
Users of ZK / k8s HA are forced to us

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Xintong Song
Thanks for the explanations, Matthias. Including JobResultStore in HighAvailabilityServices as a replacement for RunningJobsRegistry makes sense to me. And initializing JobResultStore in the same way as JobGraphStore also sounds good. I have another question concerning where to persist th

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Matthias Pohl
Hi Kurt, thanks for sharing your concerns. Our naming is based on the fact that there is already a JobResult class. That's the metadata container we store in the JobResultStore. That JobResult is furthermore used in the REST API where (ironically) it is handled by the JobExecutionResultHandler. The

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Matthias Pohl
Hi Xintong, your observation is correct. We probably didn't address this explicitly enough in the FLIP. We planned to include it in the HighAvailabilityServices analogously to the RunningJobsRegistry (and replace the RunningJobsRegistry with the JobResultStore in the end). One additional thing, I want

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-30 Thread Matthias Pohl
Hi Zhu Zhu, thanks for your reply. Your concern is valid. Our goal is to only touch the CompletedCheckpointStore and CheckpointIDCounter without instantiating JobMaster/Scheduler/ExecutionGraph. We would have to initialize these classes (and for the CompletedCheckpointStore reload the CompletedChec

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-29 Thread Xintong Song
Thanks David, Matthias and Mika, I like this FLIP in the way it handles potential re-execution and resource leaks due to clean-up failures. I have one question: Why is this JobResultStore not part of the high availability services? Or, to ask it differently, are there cases where we only need the HA serv

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-29 Thread Kurt Young
Hi, I didn't fully read the FLIP, but the name somewhat confused me. My first impression on seeing it was that we are providing some storage for job execution results, like the one returned with accumulators in batch mode. Would a name like "JobStatusStore" be more appropriate? Best, Kurt On Mon, Nov

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-29 Thread Zhu Zhu
Thanks for drafting this FLIP, Matthias, Mika and David. I like the proposed JobResultStore. Besides addressing the problem of re-executing finished jobs, it's also an important step towards HA of multi-job Flink applications. I have one question: in the "Cleanup" section, it shows that the

Re: [DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-26 Thread Till Rohrmann
Thanks for creating this FLIP Matthias, Mika and David. I think the JobResultStore is an important piece for fixing Flink's last high-availability problem (afaik). Once we have this piece in place, users no longer risk re-executing a successfully completed job. I have one comment concerning brea

[DISCUSS] FLIP-194: Introduce the JobResultStore

2021-11-17 Thread David Morávek
Hi everyone, Matthias, Mika and I want to start a discussion about the introduction of a new Flink component, the *JobResultStore*. The main motivation is to address shortcomings of the *RunningJobsRegistry* and supersede it with the new component. These shortcomings were first described in FLINK-
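
For readers skimming the thread, the behaviour discussed above (not re-executing a finished job after a failover, and retrying cleanup that did not complete) can be illustrated with a small, language-neutral sketch. It is written in Python rather than Flink's Java and is not the FLIP's actual interface: the class `InMemoryJobResultStore`, its methods, the `recover` helper, and the in-memory dict standing in for durable storage are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryState(Enum):
    DIRTY = "dirty"  # job reached a terminal state, cleanup not yet confirmed
    CLEAN = "clean"  # job reached a terminal state and cleanup completed


@dataclass
class InMemoryJobResultStore:
    """Hypothetical stand-in for a durable store of terminal job results."""

    _entries: dict = field(default_factory=dict)

    def create_dirty_result(self, job_id: str, result: str) -> None:
        # Record the result before any cleanup starts, so a crash during
        # cleanup cannot make the job look as if it never finished.
        if job_id in self._entries:
            raise ValueError(f"result for {job_id} already recorded")
        self._entries[job_id] = (result, EntryState.DIRTY)

    def mark_clean(self, job_id: str) -> None:
        # Called only after all artifacts of the job were removed.
        result, _ = self._entries[job_id]
        self._entries[job_id] = (result, EntryState.CLEAN)

    def has_result(self, job_id: str) -> bool:
        return job_id in self._entries

    def dirty_job_ids(self) -> list:
        return sorted(
            job_id
            for job_id, (_, state) in self._entries.items()
            if state is EntryState.DIRTY
        )


def recover(store: InMemoryJobResultStore, submitted_job_ids: list):
    """After a failover, split jobs into 'run again' and 'retry cleanup'."""
    to_run = [j for j in submitted_job_ids if not store.has_result(j)]
    to_clean = store.dirty_job_ids()
    return to_run, to_clean
```

In this sketch, a job whose result is already recorded is never handed back for execution, while a job whose entry is still dirty is only scheduled for another cleanup attempt. Where such entries would be persisted (file system versus ZK/ConfigMap) is exactly what the replies above debate, and the sketch takes no position on it.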