Thanks Chesnay for drafting the FLIP and starting this discussion. I have a couple of comments:

* I know that I also coined the terms global/local result partition, but maybe they are not the perfect names. Maybe we could rethink the terminology and call them persistent result partitions?
* Nit: I would call the last parameter of void releasePartitions(JobID jobId, Collection<ResultPartitionID> partitionsToRelease, Collection<ResultPartitionID> partitionsToPromote) either partitionsToRetain or partitionsToPersistent.
* I'm not sure whether partitionsToRelease should contain a global/persistent result partition id. I always thought that the user would be responsible for managing the lifecycle of a global/persistent result partition.
* Instead of extending the PartitionTable to be able to store global/persistent and local/transient result partitions, I would rather introduce a global PartitionTable to store the global/persistent result partitions explicitly. I think there is a benefit in making things as explicit as possible.
* The handover logic between the JM and the RM for the global/persistent result partitions seems a bit brittle to me. What will happen if the JM cannot reach the RM? I think it would be better if the TM announces the global/persistent result partitions to the RM via its heartbeats. That way we don't rely on an established connection between the JM and RM, and we keep the TM as the source of truth. Moreover, the RM should simply forward the release calls to the TM without much internal logic.

Cheers,
Till

On Fri, Sep 6, 2019 at 3:16 PM Chesnay Schepler <ches...@apache.org> wrote:

> Hello,
>
> FLIP-36 (interactive programming)
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink>
> proposes a new programming paradigm where jobs are built incrementally
> by the user.
>
> To support this in an efficient manner I propose to extend partition
> life-cycle to support the notion of /global partitions/, which are
> partitions that can exist beyond the life-time of a job.
>
> These partitions could then be re-used by subsequent jobs in a fairly
> efficient manner, as they don't have to be persisted to an external storage
> first and consuming tasks could be scheduled to exploit data-locality.
>
> The FLIP outlines the required changes on the JobMaster, TaskExecutor
> and ResourceManager to support this from a life-cycle perspective.
>
> This FLIP does /not/ concern itself with the /usage/ of global
> partitions, including client-side APIs, job-submission, scheduling and
> reading said partitions; these are all follow-ups that will either be
> part of FLIP-36 or spliced out into separate FLIPs.
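
To make the heartbeat-based announcement from the last comment more concrete, here is a rough sketch of what the RM-side bookkeeping could look like. Everything in it is an assumption for illustration: PersistentPartitionTracker, onHeartbeat, onTaskExecutorLost and hostsOf are hypothetical names and not existing Flink classes or methods, and plain strings stand in for the TaskExecutor and partition ids. The point is only that the RM overwrites its view with whatever the TM last reported, so the TM stays the source of truth.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical RM-side view of persistent partitions; not a Flink class. */
final class PersistentPartitionTracker {

    // TaskExecutor id -> persistent partitions reported in its latest heartbeat.
    private final Map<String, Set<String>> partitionsByTaskExecutor = new HashMap<>();

    /** Replaces the view for one TM with the content of its latest heartbeat. */
    void onHeartbeat(String taskExecutorId, Set<String> reportedPartitions) {
        if (reportedPartitions.isEmpty()) {
            partitionsByTaskExecutor.remove(taskExecutorId);
        } else {
            // Defensive copy so later changes by the caller do not leak in.
            partitionsByTaskExecutor.put(taskExecutorId, new HashSet<>(reportedPartitions));
        }
    }

    /** Forgets everything about a TM, e.g. after its heartbeat timed out. */
    void onTaskExecutorLost(String taskExecutorId) {
        partitionsByTaskExecutor.remove(taskExecutorId);
    }

    /** Returns the TMs hosting the partition, i.e. where a release call would be forwarded. */
    Set<String> hostsOf(String partitionId) {
        Set<String> hosts = new HashSet<>();
        for (Map.Entry<String, Set<String>> entry : partitionsByTaskExecutor.entrySet()) {
            if (entry.getValue().contains(partitionId)) {
                hosts.add(entry.getKey());
            }
        }
        return hosts;
    }
}
```

In this sketch a release call would not touch the tracker at all: the RM would look up hostsOf(partitionId), forward the call to those TMs, and the next heartbeat would reflect the result. That keeps the RM free of internal release logic and independent of a JM connection.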