Re: [DISCUSS] Inconsistent naming of intermediate results

Maximilian Michels Wed, 01 Apr 2015 06:39:52 -0700

+1 for the renaming proposed by Ufuk.

@Stephan: At the moment, the IntermediateDataSet is tied to a JobVertex,
so the renaming makes sense. In the future, it might be constructed
differently; only then would JobVertexResult no longer make sense. I'm
not sure if that will even happen.


> 4) ResultPartition => Result
> 5) ResultSubpartition => ResultPartition

Not sure about these. Maybe we should change them to ExecutionResult and
ExecutionResultPartition because that's more specific and would relate to
the other class names.

On Wed, Apr 1, 2015 at 10:39 AM, Ufuk Celebi <u...@apache.org> wrote:

> To summarize so far: all are in favor of a rename. I agree with both of
> Henry's points regarding the docs.
>
> @Stephan: what would you suggest? I would trust your gut feeling on this
> one. ;) JobResult, ExecutionJobResult, ExecutionResult, etc.?
>
> On Tue, Mar 31, 2015 at 8:16 PM, Henry Saputra <henry.sapu...@gmail.com>
> wrote:
>
> > As one of the devs who has recently been tracing the runtime portion of
> > the code, +1 for renaming to bring the names in line with the concepts.
> >
> > One thing I would like to have is an immediate change to the
> > documentation [1] with the renaming PR. Otherwise
> >
> > Then we need to file a follow-up ticket to update Kostas' awesome wiki
> > page [2].
> >
> > - Henry
> >
> > [1] http://ci.apache.org/projects/flink/flink-docs-master/internal_job_scheduling.html
> > [2] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> >
> >
> > On Tue, Mar 31, 2015 at 7:50 AM, Ufuk Celebi <u...@apache.org> wrote:
> > > On a high level we call intermediate data produced by programs
> > > "intermediate results". For example, in a WordCount map-reduce program
> > > the map function produces an intermediate result, which consists of
> > > (word, 1) pairs, and the reduce function consumes this intermediate
> > > result. Kostas has recently added documentation explaining the core
> > > concepts [1].
> > >
> > > The naming of classes related to intermediate results is inconsistent
> > > (and probably confusing).
> > >
> > > - In JobGraphs (internal low-level API to define programs) they are
> > > called IntermediateDataSet and identified by IntermediateDataSetIDs.
> > >
> > > - In ExecutionGraphs (JobManager structure used for state
> > > tracking/scheduling) they are called IntermediateResult at the
> > > ExecutionJobVertex (identified by IntermediateDataSetID) and
> > > IntermediateResultPartition at the ExecutionVertex (identified by
> > > IntermediateResultPartitionID).
> > >
> > > - At runtime (TaskManager) they are called ResultPartition and
> > > identified by ResultPartitionID (composition of ExecutionAttemptID and
> > > IntermediateResultPartitionID). These are further subpartitioned into
> > > ResultSubpartition instances.
> > >
> > > I propose to get the naming more in line with the existing naming
> > > scheme and prefix it with the corresponding management structures:
> > >
> > > 1) IntermediateDataSet => JobVertexResult (identified by
> > > JobVertexResultID)
> > > 2) IntermediateResult => ExecutionJobVertexResult (identified by
> > > JobVertexResultID)
> > > 3) IntermediateResultPartition => ExecutionVertexResult (identified by
> > > ExecutionVertexResultID)
> > > 4) ResultPartition => Result
> > > 5) ResultSubpartition => ResultPartition
> > >
> > > These names are non-user facing, but still at the core of the system. I
> > > think that consistent naming of these classes will make it easier for new
> > > contributors to get an overview of how single components relate to each
> > > other (the prefixes indicate this). In the docs, we can still refer to
> > > the high-level concept as "intermediate results".
> > >
> > > What's your opinion on this? I think now is a good time to think about
> > > this stuff, because the core classes have only been added recently to the
> > > system. Feel free to propose alternatives. :-)
> > >
> > > – Ufuk
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> >
>
