Re: [DISCUSS] Inconsistent naming of intermediate results

Maximilian Michels Thu, 04 Jun 2015 04:11:17 -0700

Rename what to streams? Do you mean "ResultPartition" => "StreamPartition"?
I'm not sure if that makes it easier to understand what the classes do.


On Mon, Jun 1, 2015 at 10:11 AM, Aljoscha Krettek <[email protected]> wrote:

> +1
> I like it. We are a streaming system underneath after all.
> On Jun 1, 2015 10:02 AM, "Ufuk Celebi" <[email protected]> wrote:
>
> > I would like to get this done with the upcoming release to have a stable
> > name for the documentation.
> >
> > Thinking about the names with Stephan, he had a great suggestion to rename
> > them to "streams".
> >
> > I like this idea very much. The supported result variants make more sense
> > when you think about them as streams... blocking vs. pipelined/back
> > pressure vs. no back pressure/persistent vs. ephemeral streams.
> >
> > Any opinions on this?
> >
> >
> > On Wed, Apr 1, 2015 at 3:39 PM, Maximilian Michels <[email protected]> wrote:
> >
> > > +1 for the renaming proposed by Ufuk.
> > >
> > > @Stephan: At the moment, the IntermediateDataSet is tied to a JobVertex.
> > > So the renaming makes sense. In the future, it might be constructed
> > > differently. Only then would JobVertexResult no longer make sense. I'm
> > > not sure if that will even happen.
> > >
> > > > 4) ResultPartition => Result
> > > > 5) ResultSubpartition => ResultPartition
> > > >
> > >
> > > Not sure about these. Maybe we should change them to ExecutionResult and
> > > ExecutionResultPartition because that's more specific and would relate to
> > > the other class names.
> > >
> > > On Wed, Apr 1, 2015 at 10:39 AM, Ufuk Celebi <[email protected]> wrote:
> > >
> > > > To summarize so far: all are in favor of a rename. I agree with both of
> > > > Henry's points regarding the docs.
> > > >
> > > > @Stephan: what would you suggest? I would trust your gut feeling on this
> > > > one. ;) JobResult, ExecutionJobResult, ExecutionResult, etc.?
> > > >
> > > > On Tue, Mar 31, 2015 at 8:16 PM, Henry Saputra <[email protected]> wrote:
> > > >
> > > > > As one of the devs who has recently been tracing the runtime portion of
> > > > > the code, +1 for renaming to bring the names in line with the concepts.
> > > > >
> > > > > One thing I would like to have is an immediate change to the
> > > > > documentation [1] with the renaming PR. Otherwise
> > > > >
> > > > > Then we need to file a follow-up ticket to update Kostas' awesome wiki
> > > > > page [2].
> > > > >
> > > > > - Henry
> > > > >
> > > > > [1] http://ci.apache.org/projects/flink/flink-docs-master/internal_job_scheduling.html
> > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> > > > >
> > > > > On Tue, Mar 31, 2015 at 7:50 AM, Ufuk Celebi <[email protected]> wrote:
> > > > > > On a high level we call intermediate data produced by programs
> > > > > > "intermediate results". For example in a WordCount map-reduce program
> > > > > > the map function produces an intermediate result, which consists of
> > > > > > (word, 1) pairs and the reduce function consumes this intermediate
> > > > > > result. Kostas has recently added documentation explaining the core
> > > > > > concepts [1].
> > > > > >
> > > > > > The naming of classes related to intermediate results is inconsistent
> > > > > > (and probably confusing).
> > > > > >
> > > > > > - In JobGraphs (internal low-level API to define programs) they are
> > > > > >   called IntermediateDataSet and identified by IntermediateDataSetIDs.
> > > > > >
> > > > > > - In ExecutionGraphs (JobManager structure used for state
> > > > > >   tracking/scheduling) they are called IntermediateResult at the
> > > > > >   ExecutionJobVertex (identified by IntermediateDataSetID) and
> > > > > >   IntermediateResultPartition at the ExecutionVertex (identified by
> > > > > >   IntermediateResultPartitionID).
> > > > > >
> > > > > > - At runtime (TaskManager) they are called ResultPartition and
> > > > > >   identified by ResultPartitionID (composition of ExecutionAttemptID
> > > > > >   and IntermediateResultPartitionID). These are further subpartitioned
> > > > > >   into ResultSubpartition instances.
> > > > > >
> > > > > > I propose to get the naming more in line with the existing naming
> > > > > > scheme and prefix it with the corresponding management structures:
> > > > > >
> > > > > > 1) IntermediateDataSet => JobVertexResult (identified by
> > > > > >    JobVertexResultID)
> > > > > > 2) IntermediateResult => ExecutionJobVertexResult (identified by
> > > > > >    JobVertexResultID)
> > > > > > 3) IntermediateResultPartition => ExecutionVertexResult (identified by
> > > > > >    ExecutionVertexResultID)
> > > > > > 4) ResultPartition => Result
> > > > > > 5) ResultSubpartition => ResultPartition
> > > > > >
> > > > > > These names are non-user facing, but still at the core of the system. I
> > > > > > think that consistent naming of these classes will make it easier for
> > > > > > new contributors to get an overview of how single components relate to
> > > > > > each other (the prefixes indicate this). In the docs, we can still
> > > > > > refer to the high-level concept as "intermediate results".
> > > > > >
> > > > > > What's your opinion on this? I think now is a good time to think about
> > > > > > this stuff, because the core classes have only been added recently to
> > > > > > the system. Feel free to propose alternatives. :-)
> > > > > >
> > > > > > – Ufuk
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> > > > >
> > > >
> > >
> >
>
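
For reference, the rename proposed in items 1) to 5) of the original mail
quoted above can be written down as a lookup table. This is only a minimal
sketch: the class RenameProposal and the method proposedName are hypothetical
helpers made up for illustration and are not part of Flink; only the class
names inside the table come from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Flink: the rename from items 1) to 5). */
public final class RenameProposal {

    /** Current class name mapped to proposed class name, in proposal order. */
    static final Map<String, String> PROPOSED = new LinkedHashMap<>();

    static {
        // JobGraph level
        PROPOSED.put("IntermediateDataSet", "JobVertexResult");
        // ExecutionGraph level
        PROPOSED.put("IntermediateResult", "ExecutionJobVertexResult");
        PROPOSED.put("IntermediateResultPartition", "ExecutionVertexResult");
        // Runtime (TaskManager) level
        PROPOSED.put("ResultPartition", "Result");
        PROPOSED.put("ResultSubpartition", "ResultPartition");
    }

    /** Returns the proposed name, or the input unchanged if it is not in the table. */
    static String proposedName(String currentName) {
        return PROPOSED.getOrDefault(currentName, currentName);
    }

    public static void main(String[] args) {
        // Print the table in the order of the proposal.
        PROPOSED.forEach((from, to) -> System.out.println(from + " => " + to));
    }
}
```

Note that ResultPartition appears on both sides of the table (it is the old
name in item 4 and the new name in item 5), so each current name is looked up
exactly once rather than chaining renames.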
