Re: [DISCUSS] Inconsistent naming of intermediate results

Ufuk Celebi Wed, 01 Apr 2015 01:40:59 -0700

To summarize so far: everyone is in favor of a rename. I agree with both of
Henry's points regarding the docs.


@Stephan: what would you suggest? I would trust your gut feeling on this
one. ;) JobResult, ExecutionJobResult, ExecutionResult, etc.?

On Tue, Mar 31, 2015 at 8:16 PM, Henry Saputra <[email protected]> wrote:

> As one of the devs who has recently been tracing the runtime portion of
> the code: +1 for renaming to bring the names in line with the concepts.
>
> One thing I would like is for the renaming PR to change the
> documentation [1] right away.
>
> We then also need to file a follow-up ticket to update Kostas' awesome
> wiki page [2].
>
> - Henry
>
> [1]
> http://ci.apache.org/projects/flink/flink-docs-master/internal_job_scheduling.html
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>
> On Tue, Mar 31, 2015 at 7:50 AM, Ufuk Celebi <[email protected]> wrote:
> > At a high level, we call the intermediate data produced by programs
> > "intermediate results". For example, in a WordCount map-reduce program,
> > the map function produces an intermediate result consisting of (word, 1)
> > pairs, which the reduce function then consumes. Kostas has recently added
> > documentation explaining the core concepts [1].
> >
> > The naming of classes related to intermediate results is inconsistent
> > (and probably confusing).
> >
> > - In JobGraphs (the internal, low-level API for defining programs), they
> >   are called IntermediateDataSet and are identified by an
> >   IntermediateDataSetID.
> >
> > - In ExecutionGraphs (the JobManager structure used for state tracking
> >   and scheduling), they are called IntermediateResult at the
> >   ExecutionJobVertex level (identified by IntermediateDataSetID) and
> >   IntermediateResultPartition at the ExecutionVertex level (identified
> >   by IntermediateResultPartitionID).
> >
> > - At runtime (TaskManager), they are called ResultPartition and are
> >   identified by ResultPartitionID (a composite of ExecutionAttemptID and
> >   IntermediateResultPartitionID). Each ResultPartition is further
> >   subdivided into ResultSubpartition instances.
> >
> > I propose to bring the naming more in line with the existing naming
> > scheme and to prefix each name with the corresponding management
> > structure:
> >
> > 1) IntermediateDataSet => JobVertexResult (identified by JobVertexResultID)
> > 2) IntermediateResult => ExecutionJobVertexResult (identified by
> >    JobVertexResultID)
> > 3) IntermediateResultPartition => ExecutionVertexResult (identified by
> >    ExecutionVertexResultID)
> > 4) ResultPartition => Result
> > 5) ResultSubpartition => ResultPartition
> > These names are not user-facing, but they are still at the core of the
> > system. I think consistent naming of these classes will make it easier
> > for new contributors to get an overview of how the individual components
> > relate to each other (the prefixes indicate this). In the docs, we can
> > still refer to the high-level concept as "intermediate results".
> >
> > What's your opinion on this? I think now is a good time to address this,
> > because the core classes were only recently added to the system. Feel
> > free to propose alternatives. :-)
> >
> > – Ufuk
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>
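The quoted proposal notes that, at runtime, a ResultPartitionID is a composite of an ExecutionAttemptID and an IntermediateResultPartitionID. A minimal Java sketch of that composition follows. The record types are hypothetical stand-ins, not Flink's actual classes, and the UUID-based IDs are an assumption for illustration. It shows one plausible consequence of the composite: a retried attempt producing the same logical partition ends up with a distinct runtime ID.

```java
import java.util.Objects;
import java.util.UUID;

/**
 * Hypothetical sketch, NOT Flink's real classes: a runtime partition ID
 * built from the logical partition ID plus the producing attempt's ID.
 */
public class ResultPartitionIdSketch {

    /** Stand-in for the ID of one attempt to run an ExecutionVertex. */
    record ExecutionAttemptID(UUID value) {}

    /** Stand-in for the ID of one partition of an intermediate result. */
    record IntermediateResultPartitionID(UUID value) {}

    /**
     * Stand-in for the runtime ID. Records derive equals/hashCode from all
     * components, so equality requires BOTH parts to match.
     */
    record ResultPartitionID(IntermediateResultPartitionID partitionId,
                             ExecutionAttemptID producerId) {
        ResultPartitionID {
            Objects.requireNonNull(partitionId, "partitionId");
            Objects.requireNonNull(producerId, "producerId");
        }
    }

    public static void main(String[] args) {
        IntermediateResultPartitionID partition =
                new IntermediateResultPartitionID(UUID.randomUUID());
        ExecutionAttemptID firstAttempt = new ExecutionAttemptID(UUID.randomUUID());
        ExecutionAttemptID retryAttempt = new ExecutionAttemptID(UUID.randomUUID());

        ResultPartitionID first = new ResultPartitionID(partition, firstAttempt);
        ResultPartitionID retry = new ResultPartitionID(partition, retryAttempt);

        // Same logical partition...
        System.out.println(first.partitionId().equals(retry.partitionId()));
        // ...but different runtime IDs, because the producing attempts differ.
        System.out.println(first.equals(retry));
    }
}
```

Running `main` prints `true` followed by `false`.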
