OK, thanks for the clarification, Till!
On Thu, Mar 31, 2016 at 2:14 PM, Till Rohrmann wrote:
> A partition is the portion of data each task receives. Thus, the degree of
> parallelism of your program/task decides how many different partitions you
> have. Depending on the upstream operators (and
A partition is the portion of data each task receives. Thus, the degree of
parallelism of your program/task decides how many different partitions you
have. Depending on the upstream operators (and which data is sent to which
task), the partitions will most likely differ in size.
Cheers,
Till
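
To make the point about partition count and size concrete, here is a minimal plain-Java sketch (no Flink dependency). It assumes a simple hash-based routing rule, which is only one of several ways an upstream operator might distribute records; `PartitionSketch` and `hashPartition` are hypothetical names used for illustration and are not part of Flink's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hypothetical routing rule (an assumption, not Flink's partitioner):
    // each record goes to the task whose number equals hash mod parallelism.
    public static <T> List<List<T>> hashPartition(List<T> records, int parallelism) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>()); // one partition per parallel task
        }
        for (T record : records) {
            int target = Math.floorMod(record.hashCode(), parallelism);
            partitions.get(target).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(0, 3, 6, 1);
        List<List<Integer>> partitions = hashPartition(records, 3);
        // The number of partitions equals the parallelism; sizes may differ.
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

With a parallelism of 3 there are always three partitions, but how many records each one holds depends entirely on how the data is routed.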
On T
Hi Till and Tarandeep,
I'm also interested in better understanding the concept of a partition.
From what I know, a partition is the portion of data assigned by the job
manager to each task manager, right?
Then, each partition is divided again at the task manager to maximize the
Hi Tarandeep,
the number of elements in each partition should stay constant. In fact the
elements in each partition should not change.
Cheers,
Till
On Wed, Mar 30, 2016 at 8:14 AM, Tarandeep Singh
wrote:
> Hi,
>
> I am looking at implementation of zipWithIndex in DataSetUtils-
>
> https://gith
Hi,
I am looking at the implementation of zipWithIndex in DataSetUtils:
https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java
It works in two phases/steps:
1) Count the number of elements in each partition (using mapPartition)
2) In second m
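
The two-phase idea can be sketched in plain Java, without Flink. Because the description of the second step is cut off above, the sketch assumes that it assigns indices by starting each partition at the sum of the counts of all preceding partitions. `ZipWithIndexSketch`, `Indexed`, and both method names are hypothetical, and partitions are modelled as in-memory lists rather than a DataSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipWithIndexSketch {

    /** A value paired with its global index. */
    public static final class Indexed<T> {
        public final long index;
        public final T value;

        Indexed(long index, T value) {
            this.index = index;
            this.value = value;
        }
    }

    // Phase 1: count the number of elements in each partition.
    public static <T> long[] countPerPartition(List<List<T>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int p = 0; p < partitions.size(); p++) {
            counts[p] = partitions.get(p).size();
        }
        return counts;
    }

    // Phase 2 (assumed): each partition starts at the sum of the counts of
    // the partitions before it and numbers its elements consecutively.
    public static <T> List<Indexed<T>> zipWithIndex(List<List<T>> partitions) {
        long[] counts = countPerPartition(partitions);
        List<Indexed<T>> result = new ArrayList<>();
        long offset = 0;
        for (int p = 0; p < partitions.size(); p++) {
            long next = offset;
            for (T value : partitions.get(p)) {
                result.add(new Indexed<>(next++, value));
            }
            offset += counts[p]; // valid only if the partition is unchanged since phase 1
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        for (Indexed<String> e : zipWithIndex(partitions)) {
            System.out.println(e.index + " -> " + e.value);
        }
    }
}
```

The offsets computed from the first pass are only correct if every partition holds the same elements in the second pass, which is why the question of whether partition contents stay constant matters here.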