Thanks Chesnay for drafting the FLIP and starting this discussion.

I have a couple of comments:

* I know that I've also coined the terms global/local result partition but
maybe it is not the perfect name. Maybe we could rethink the terminology
and call them persistent result partitions?
* Nit: I would call the last parameter of void releasePartitions(JobID
jobId, Collection<ResultPartitionID> partitionsToRelease,
Collection<ResultPartitionID> partitionsToPromote) either
partitionsToRetain or partitionsToPersistent.
* I'm not sure whether partitionsToRelease should contain a
global/persistent result partition id. I always thought that the user will
be responsible for managing the lifecycle of a global/persistent
result partition.
* Instead of extending the PartitionTable to be able to store
global/persistent and local/transient result partitions, I would rather
introduce a global PartitionTable to store the global/persistent result
partitions explicitly. I think there is a benefit in making things as
explicit as possible.
* The handover logic between the JM and the RM for the global/persistent
result partitions seems a bit brittle to me. What will happen if the JM
cannot reach the RM? I think it would be better if the TM announces the
global/persistent result partitions to the RM via its heartbeats. That way
we don't rely on an established connection between the JM and RM and we
keep the TM as the ground of truth. Moreover, the RM should simply forward
the release calls to the TM without much internal logic.

Cheers,
Till

On Fri, Sep 6, 2019 at 3:16 PM Chesnay Schepler <ches...@apache.org> wrote:

> Hello,
>
> FLIP-36 (interactive programming)
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink>
>
> proposes a new programming paradigm where jobs are built incrementally
> by the user.
>
> To support this in an efficient manner I propose to extend partition
> life-cycle to support the notion of /global partitions/, which are
> partitions that can exist beyond the life-time of a job.
>
> These partitions could then be re-used by subsequent jobs in a fairly
> efficient manner, as they don't have to persisted to an external storage
> first and consuming tasks could be scheduled to exploit data-locality.
>
> The FLIP outlines the required changes on the JobMaster, TaskExecutor
> and ResourceManager to support this from a life-cycle perspective.
>
> This FLIP does /not/ concern itself with the /usage/ of global
> partitions, including client-side APIs, job-submission, scheduling and
> reading said partitions; these are all follow-ups that will either be
> part of FLIP-36 or spliced out into separate FLIPs.
>
>

Reply via email to