Rename what to streams? Do you mean "ResultPartition" => "StreamPartition"? I'm not sure if that makes it easier to understand what the classes do.
On Mon, Jun 1, 2015 at 10:11 AM, Aljoscha Krettek <aljos...@apache.org> wrote:

> +1
> I like it. We are a streaming system underneath after all.
>
> On Jun 1, 2015 10:02 AM, "Ufuk Celebi" <u...@apache.org> wrote:
>
> > I would like to get this done with the upcoming release to have a stable
> > name for the documentation.
> >
> > Thinking about the names with Stephan, he had a great suggestion to rename
> > them to "streams".
> >
> > I like this idea very much. The supported result variants make more sense
> > when you think about them as streams: blocking vs. pipelined, back
> > pressure vs. no back pressure, persistent vs. ephemeral streams.
> >
> > Any opinions on this?
> >
> > On Wed, Apr 1, 2015 at 3:39 PM, Maximilian Michels <m...@apache.org> wrote:
> >
> > > +1 for the renaming proposed by Ufuk.
> > >
> > > @Stephan: At the moment, the IntermediateDataSet is tied to a JobVertex.
> > > So the renaming makes sense. In the future, it might be constructed
> > > differently. Only then would JobVertexResult no longer make sense. I'm
> > > not sure if that will even happen.
> > >
> > > > 4) ResultPartition => Result
> > > > 5) ResultSubpartition => ResultPartition
> > >
> > > Not sure about these. Maybe we should change them to ExecutionResult and
> > > ExecutionResultPartition because that's more specific and would relate to
> > > the other class names.
> > >
> > > On Wed, Apr 1, 2015 at 10:39 AM, Ufuk Celebi <u...@apache.org> wrote:
> > >
> > > > To summarize so far: all are in favor of a rename. I agree with both of
> > > > Henry's points regarding the docs.
> > > >
> > > > @Stephan: what would you suggest? I would trust your gut feeling on this
> > > > one. ;) JobResult, ExecutionJobResult, ExecutionResult, etc.?
> > > >
> > > > On Tue, Mar 31, 2015 at 8:16 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> > > >
> > > > > As one of the devs who has recently been tracing the runtime portion
> > > > > of the code, +1 for renaming to bring the names in line with the
> > > > > concepts.
> > > > >
> > > > > One thing I would like to have is an immediate change to the
> > > > > documentation [1] with the renaming PR. Otherwise
> > > > >
> > > > > Then we need to file a follow-up ticket to update Kostas' awesome
> > > > > wiki page [2].
> > > > >
> > > > > - Henry
> > > > >
> > > > > [1]
> > > > > http://ci.apache.org/projects/flink/flink-docs-master/internal_job_scheduling.html
> > > > > [2]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> > > > >
> > > > > On Tue, Mar 31, 2015 at 7:50 AM, Ufuk Celebi <u...@apache.org> wrote:
> > > > >
> > > > > > On a high level we call intermediate data produced by programs
> > > > > > "intermediate results". For example, in a WordCount map-reduce
> > > > > > program the map function produces an intermediate result, which
> > > > > > consists of (word, 1) pairs, and the reduce function consumes this
> > > > > > intermediate result. Kostas has recently added documentation
> > > > > > explaining the core concepts [1].
> > > > > >
> > > > > > The naming of classes related to intermediate results is
> > > > > > inconsistent (and probably confusing).
> > > > > >
> > > > > > - In JobGraphs (internal low-level API to define programs) they are
> > > > > >   called IntermediateDataSet and identified by
> > > > > >   IntermediateDataSetIDs.
> > > > > >
> > > > > > - In ExecutionGraphs (JobManager structure used for state
> > > > > >   tracking/scheduling) they are called IntermediateResult at the
> > > > > >   ExecutionJobVertex (identified by IntermediateDataSetID) and
> > > > > >   IntermediateResultPartition at the ExecutionVertex (identified by
> > > > > >   IntermediateResultPartitionID).
> > > > > >
> > > > > > - At runtime (TaskManager) they are called ResultPartition and
> > > > > >   identified by ResultPartitionID (composition of
> > > > > >   ExecutionAttemptID and IntermediateResultPartitionID). These are
> > > > > >   further subpartitioned into ResultSubpartition instances.
> > > > > >
> > > > > > I propose to get the naming more in line with the existing naming
> > > > > > scheme and prefix it with the corresponding management structures:
> > > > > >
> > > > > > 1) IntermediateDataSet => JobVertexResult (identified by
> > > > > >    JobVertexResultID)
> > > > > > 2) IntermediateResult => ExecutionJobVertexResult (identified by
> > > > > >    JobVertexResultID)
> > > > > > 3) IntermediateResultPartition => ExecutionVertexResult (identified
> > > > > >    by ExecutionVertexResultID)
> > > > > > 4) ResultPartition => Result
> > > > > > 5) ResultSubpartition => ResultPartition
> > > > > >
> > > > > > These names are non-user facing, but still at the core of the
> > > > > > system. I think that consistent naming of these classes will make
> > > > > > it easier for new contributors to get an overview of how single
> > > > > > components relate to each other (the prefixes indicate this). In
> > > > > > the docs, we can still refer to the high-level concept as
> > > > > > "intermediate results".
> > > > > >
> > > > > > What's your opinion on this? I think now is a good time to think
> > > > > > about this stuff, because the core classes have only been added
> > > > > > recently to the system. Feel free to propose alternatives. :-)
> > > > > >
> > > > > > – Ufuk
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks