As one of the devs who has recently been tracing the runtime portion of the code, +1 for renaming to bring it in line with the concepts.
One thing I would like to have is an immediate change to the documentation [1] as part of the renaming PR. Then we need to file a follow-up ticket to update Kostas' awesome wiki page [2].

- Henry

[1] http://ci.apache.org/projects/flink/flink-docs-master/internal_job_scheduling.html
[2] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

On Tue, Mar 31, 2015 at 7:50 AM, Ufuk Celebi <u...@apache.org> wrote:
> On a high level we call intermediate data produced by programs
> "intermediate results". For example in a WordCount map-reduce program the
> map function produces an intermediate result, which consists of (word, 1)
> pairs and the reduce function consumes this intermediate result. Kostas has
> recently added documentation explaining the core concepts [1].
>
> The naming of classes related to intermediate results is inconsistent (and
> probably confusing).
>
> - In JobGraphs (internal low-level API to define programs) they are called
>   IntermediateDataSet and identified by IntermediateDataSetIDs.
>
> - In ExecutionGraphs (JobManager structure used for state
>   tracking/scheduling) they are called IntermediateResult at the
>   ExecutionJobVertex (identified by IntermediateDataSetID) and
>   IntermediateResultPartition at the ExecutionVertex (identified by
>   IntermediateResultPartitionID).
>
> - At runtime (TaskManager) they are called ResultPartition and identified by
>   ResultPartitionID (composition of ExecutionAttemptID and
>   IntermediateResultPartitionID). These are further subpartitioned into
>   ResultSubpartition instances.
>
> I propose to get the naming more in line with the existing naming scheme and
> prefix it with the corresponding management structures:
>
> 1) IntermediateDataSet => JobVertexResult (identified by JobVertexResultID)
> 2) IntermediateResult => ExecutionJobVertexResult (identified by
>    JobVertexResultID)
> 3) IntermediateResultPartition => ExecutionVertexResult (identified by
>    ExecutionVertexResultID)
> 4) ResultPartition => Result
> 5) ResultSubpartition => ResultPartition
>
> These names are non-user facing, but still at the core of the system. I
> think that consistent naming of these classes will make it easier for new
> contributors to get an overview of how single components relate to each
> other (the prefixes indicate this). In the docs, we can still refer to the
> high-level concept as "intermediate results".
>
> What's your opinion on this? I think now is a good time to think about this
> stuff, because the core classes have only been added recently to the system.
> Feel free to propose alternatives. :-)
>
> – Ufuk
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
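
For reference, the five renames proposed above can be written down as a simple lookup table. This is only an illustrative sketch in plain Java: the class RenameSketch and the method proposedName are hypothetical and not part of Flink. Only the old and new class names are taken from the proposal.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Flink: the renames proposed in this thread. */
final class RenameSketch {

    // Current class name -> proposed class name, in the order of the proposal.
    private static final Map<String, String> RENAMES;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("IntermediateDataSet", "JobVertexResult");
        m.put("IntermediateResult", "ExecutionJobVertexResult");
        m.put("IntermediateResultPartition", "ExecutionVertexResult");
        m.put("ResultPartition", "Result");
        m.put("ResultSubpartition", "ResultPartition");
        RENAMES = Collections.unmodifiableMap(m);
    }

    /** Returns the proposed name, or the input unchanged if no rename applies. */
    static String proposedName(String currentName) {
        return RENAMES.getOrDefault(currentName, currentName);
    }

    public static void main(String[] args) {
        // Print each rename as "old => new".
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}
```

Note that ResultPartition appears on both sides of the table: it is the old name in rename 4 and the new name in rename 5. A lookup like the one above applies exactly one step, so a mechanical rename would have to be done in a single pass (or 4 before 5) to avoid renaming the same class twice.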