Hi Tarandeep,

the number of elements in each partition should stay constant between the two mapPartition passes. In fact, the elements in each partition themselves should not change.
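
To make the two phases concrete, here is a minimal plain-Java sketch of the idea. It is not the actual DataSetUtils code: the class and method names (ZipWithIndexSketch, countPerPartition, assignIndices) are made up for illustration, in-memory lists stand in for partitions, and the index is simply prepended to the element as a string. It shows why the second pass relies on seeing the same counts as the first: each partition's starting offset is the sum of the counts of the partitions before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipWithIndexSketch {

    // Phase 1: count the elements in each partition.
    static long[] countPerPartition(List<List<String>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int p = 0; p < partitions.size(); p++) {
            counts[p] = partitions.get(p).size();
        }
        return counts;
    }

    // Phase 2: partition p starts at the sum of the counts of partitions 0..p-1,
    // and its elements are numbered consecutively from that offset.
    static List<List<String>> assignIndices(List<List<String>> partitions, long[] counts) {
        List<List<String>> result = new ArrayList<>();
        long offset = 0;
        for (int p = 0; p < partitions.size(); p++) {
            List<String> indexed = new ArrayList<>();
            long index = offset;
            for (String element : partitions.get(p)) {
                indexed.add(index + ":" + element);
                index++;
            }
            result.add(indexed);
            offset += counts[p];
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: three partitions of different sizes.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        long[] counts = countPerPartition(partitions);
        // prints [[0:a, 1:b], [2:c], [3:d, 4:e, 5:f]]
        System.out.println(assignIndices(partitions, counts));
    }
}
```

If the second pass saw a different number of elements in some partition than the first pass counted, the offsets would no longer line up and indices could collide or leave gaps, which is why the counts must be stable.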

Cheers,
Till

On Wed, Mar 30, 2016 at 8:14 AM, Tarandeep Singh <tarand...@gmail.com> wrote:
> Hi,
>
> I am looking at implementation of zipWithIndex in DataSetUtils-
>
> https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java
>
> It works in two phases/steps
> 1) Count number of elements in each partition (using mapPartition)
> 2) In second mapPartition, unique ID is assigned by calculating offset
> using number of elements computed in step 1.
>
> Is there any chance the second mapPartition won't get same number of
> elements as first mapPartition (assuming data is in HDFS)?
>
> Thanks
> Tarandeep