At a high level, we call the intermediate data produced by programs "intermediate results". For example, in a WordCount map-reduce program, the map function produces an intermediate result consisting of (word, 1) pairs, and the reduce function consumes this intermediate result. Kostas has recently added documentation explaining the core concepts [1].
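To make the WordCount example concrete, here is a minimal sketch in plain Java. It deliberately does not use the Flink API, and the class and method names (WordCountSketch, map, reduce) are made up for illustration. The list returned by map() plays the role of the intermediate result that reduce() consumes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, not Flink classes.
public class WordCountSketch {

    // "Map" side: emits one (word, 1) pair per word in the line.
    // The returned list stands in for the intermediate result.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" side: consumes the intermediate result and sums the
    // counts per word. A TreeMap keeps the words in sorted order.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> intermediate) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> intermediate = map("to be or not to be");
        System.out.println(intermediate);         // the (word, 1) pairs
        System.out.println(reduce(intermediate)); // the final counts
    }
}
```

In the real system the intermediate result is of course not a single in-memory list: it is produced and consumed by parallel tasks, which is where the partition and subpartition classes come in.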
The naming of classes related to intermediate results is inconsistent (and probably confusing).

- In JobGraphs (internal low-level API to define programs) they are called IntermediateDataSet and identified by IntermediateDataSetIDs.
- In ExecutionGraphs (JobManager structure used for state tracking/scheduling) they are called IntermediateResult at the ExecutionJobVertex (identified by IntermediateDataSetID) and IntermediateResultPartition at the ExecutionVertex (identified by IntermediateResultPartitionID).
- At runtime (TaskManager) they are called ResultPartition and identified by ResultPartitionID (composition of ExecutionAttemptID and IntermediateResultPartitionID). These are further subpartitioned into ResultSubpartition instances.

I propose to bring the naming more in line with the existing naming scheme and to prefix each class with the corresponding management structure:

1) IntermediateDataSet => JobVertexResult (identified by JobVertexResultID)
2) IntermediateResult => ExecutionJobVertexResult (identified by JobVertexResultID)
3) IntermediateResultPartition => ExecutionVertexResult (identified by ExecutionVertexResultID)
4) ResultPartition => Result
5) ResultSubpartition => ResultPartition

These names are not user-facing, but they are still at the core of the system. I think that consistent naming of these classes will make it easier for new contributors to get an overview of how the individual components relate to each other (the prefixes indicate this). In the docs, we can still refer to the high-level concept as "intermediate results".

What's your opinion on this? I think now is a good time to think about this, because the core classes have only recently been added to the system. Feel free to propose alternatives. :-)

– Ufuk

[1] https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks