Hi Max,

You can always shuffle your elements using the rebalance method. Flink then
distributes the elements of each partition evenly across all parallel
instances of the following operator, which run on the available
TaskManagers. This happens in a round-robin fashion, so the result is well
balanced but not random.
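To show why round-robin is balanced but not random, here is a minimal plain-Java simulation (no Flink dependency; the class and method names are hypothetical) of one sender dealing its elements to n receivers in turn. Flink's actual implementation lives in its runtime and may differ in details such as the starting receiver.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinSketch {
    // Hypothetical simulation, not Flink code: element i goes to receiver i % numTargets.
    static <T> List<List<T>> roundRobin(List<T> elements, int numTargets) {
        List<List<T>> targets = new ArrayList<>();
        for (int t = 0; t < numTargets; t++) {
            targets.add(new ArrayList<>());
        }
        int next = 0;
        for (T e : elements) {
            targets.get(next).add(e);
            next = (next + 1) % numTargets; // move on to the next receiver
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6, 7);
        // The input order fully determines where each element lands,
        // so repeated runs produce the same assignment.
        System.out.println(roundRobin(input, 3)); // prints [[1, 4, 7], [2, 5], [3, 6]]
    }
}
```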

Another option is the partitionCustom method, which lets you specify, for
each element, the partition it is sent to. To do this, you have to provide
a Partitioner.
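As a sketch of the kind of logic a custom Partitioner could contain, assuming each element's unique id is a long: in Flink the method below would become the body of `partition(key, numPartitions)` in a `Partitioner` implementation passed to partitionCustom. To keep it runnable on its own, this version has no Flink dependency, and the class name, the `seed` parameter, and the choice of the SplitMix64 mixing function are all illustrative assumptions.

```java
import java.util.Arrays;

public class SeededHashPartitionSketch {
    // SplitMix64 finalizer: scrambles the bits of a 64-bit value.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Same shape as Partitioner.partition(key, numPartitions), plus a hypothetical seed.
    // A different seed gives a different, but still reproducible, assignment.
    static int partitionFor(long id, long seed, int numPartitions) {
        return (int) Math.floorMod(mix(id ^ mix(seed)), (long) numPartitions);
    }

    public static void main(String[] args) {
        int[] counts = new int[4];
        for (long id = 0; id < 10_000; id++) {
            counts[partitionFor(id, 7L, 4)]++;
        }
        System.out.println(Arrays.toString(counts)); // roughly 2500 per partition
    }
}
```

Using a seeded hash of the id rather than a fresh random draw keeps the assignment reproducible if a task has to be re-executed. Note that this only randomizes which partition an element lands in, not the order of elements within a partition.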

For the splitting, there is at the moment no syntactic sugar. What you can
do, though, is assign each element a split ID and then apply one filter
operation per split to extract it. Depending on your split ID distribution,
you will get differently sized splits.
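A minimal plain-Java sketch of the split-ID-plus-filter idea, assuming each element has a unique long id and you want roughly a `ratio` : (1 - `ratio`) split. All names here are hypothetical, and the hash-to-[0, 1) step is one possible way to derive a split ID, not something Flink provides.

```java
import java.util.ArrayList;
import java.util.List;

public class RatioSplitSketch {
    // SplitMix64 finalizer: scrambles the bits of a 64-bit value.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Maps a unique id to a pseudo-random double in [0, 1) using the top 53 bits.
    static double uniform(long id, long seed) {
        return (mix(id ^ mix(seed)) >>> 11) * 0x1.0p-53;
    }

    // Split ID 0 with probability about `ratio`, otherwise split ID 1.
    static int splitId(long id, long seed, double ratio) {
        return uniform(id, seed) < ratio ? 0 : 1;
    }

    // One "filter" pass: keep only the ids whose split ID matches `wanted`.
    static List<Long> filterSplit(List<Long> ids, long seed, double ratio, int wanted) {
        List<Long> out = new ArrayList<>();
        for (long id : ids) {
            if (splitId(id, seed, ratio) == wanted) {
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10_000; i++) {
            ids.add(i);
        }
        List<Long> first = filterSplit(ids, 42L, 0.8, 0);
        List<Long> second = filterSplit(ids, 42L, 0.8, 1);
        System.out.println(first.size() + " / " + second.size()); // roughly 8000 / 2000
    }
}
```

In a Flink job, the splitId computation would go into the filter functions (or a preceding map), with one filter keeping split 0 and another keeping split 1. Because each element is assigned independently, the split sizes match the ratio only approximately.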

Cheers,
Till

On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber <alber.maximil...@gmail.com>
wrote:

> Hi Flinksters,
>
> I would like to shuffle my elements in the data set and then split it in
> two according to some ratio. Each element in the data set has a unique id.
> Is there a nice way to do it with the Flink API?
> (It would be nice to have guaranteed random shuffling.)
> Thanks!
>
> Cheers,
> Max
>
