Thanks!

Ok, so for a random shuffle I need partitionCustom. But in that case, couldn't
the data end up unbalanced?

As for the splitting: is there no way to get exact sizes?

Cheers,
Max

On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Max,
>
> you can always shuffle your elements using the rebalance method. What
> Flink does here is distribute the elements of each partition among all
> available TaskManagers. This happens in a round-robin fashion and is thus
> not completely random.
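The round-robin behaviour described above can be pictured with a small plain-Java model. It is an illustration only, not Flink's rebalance implementation; in particular, the assumption that every sender starts dealing at target 0 is made up for this sketch, and the class and method names are hypothetical. It shows why the result is evenly balanced yet deterministic: the same input always produces the same layout.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a round-robin redistribution such as rebalance():
// every source partition deals its elements out to the target partitions in turn.
// This is NOT Flink's source code; it only illustrates "balanced but not random".
class RoundRobinModel {

    static List<List<Integer>> rebalance(List<List<Integer>> sources, int numTargets) {
        List<List<Integer>> targets = new ArrayList<>();
        for (int t = 0; t < numTargets; t++) {
            targets.add(new ArrayList<>());
        }
        for (List<Integer> source : sources) {
            // Assumption for the sketch: each sender starts dealing at target 0.
            int next = 0;
            for (Integer element : source) {
                targets.get(next).add(element);
                next = (next + 1) % numTargets;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<List<Integer>> sources = List.of(
                List.of(1, 2, 3, 4, 5, 6),
                List.of(7, 8, 9, 10, 11, 12));
        // Each target receives 4 elements, and rerunning gives exactly the same layout.
        System.out.println(rebalance(sources, 3));
        // prints [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
    }
}
```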
>
> An alternative is the partitionCustom method, which allows you to
> specify for each element the partition to which it is sent. You would
> have to implement a Partitioner to do this.
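A random shuffle via partitionCustom might look like the following sketch. To keep it runnable without Flink on the classpath, it redeclares an interface with the same shape as Flink's `Partitioner` (`int partition(K key, int numPartitions)`); the class names, the seed, and the SplitMix64-style hash are assumptions for illustration. Hashing the unique id instead of calling a random number generator keeps the assignment reproducible. The demo also touches on the balance question: random assignment evens out partitions only in expectation, so the counts come out close to, but generally not exactly, equal.

```java
import java.util.Arrays;

class RandomPartitionDemo {
    public static void main(String[] args) {
        RandomIdPartitioner partitioner = new RandomIdPartitioner(42L);
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        for (long id = 0; id < 10_000; id++) {
            counts[partitioner.partition(id, numPartitions)]++;
        }
        // Counts land near 2,500 each but are generally not exactly equal:
        // random assignment balances partitions only in expectation.
        System.out.println(Arrays.toString(counts));
    }
}

// Same shape as Flink's Partitioner<K> (int partition(K key, int numPartitions)),
// redeclared here so the sketch compiles without Flink.
interface KeyPartitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical partitioner that scatters elements pseudo-randomly by hashing their
// unique id together with a seed. Unlike java.util.Random, the hash gives the same
// partition every time the same id is seen (e.g. when a task is restarted).
class RandomIdPartitioner implements KeyPartitioner<Long> {
    private final long seed;

    RandomIdPartitioner(long seed) {
        this.seed = seed;
    }

    // SplitMix64 finalizer: turns consecutive ids into well-mixed 64-bit values.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    @Override
    public int partition(Long id, int numPartitions) {
        long h = mix(id + seed * 0x9E3779B97F4A7C15L);
        // Unsigned remainder keeps the result in [0, numPartitions).
        return (int) Long.remainderUnsigned(h, numPartitions);
    }
}
```

In an actual Flink job the class would implement `org.apache.flink.api.common.functions.Partitioner` instead and be passed to `partitionCustom` together with the field (or key selector) that holds the id.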
>
> For the splitting there is at the moment no syntactic sugar. What you can
> do, though, is assign each item a split ID and then use a filter operation
> to select the individual splits. Depending on your split ID distribution,
> you will get differently sized splits.
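The split-ID-plus-filter idea can be sketched on a plain Java list (the class and method names and the 0.8 ratio are hypothetical). Each element gets a pseudo-random value in [0, 1) derived from its unique id; comparing that value against the ratio yields the split ID, and one filter per split selects it, so the sizes match the ratio only approximately. The `splitExact` variant shows one possible way to get exact sizes: sort by the same pseudo-random value and cut after round(ratio * n) elements. That variant is an assumption, not something the Flink API offers directly; in a Flink job it would additionally require knowing the total count n and imposing a global order, which costs more than two filters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class SplitDemo {

    // SplitMix64 finalizer, used as a reproducible pseudo-random hash of an id.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Maps an id to a pseudo-random double in [0, 1) using the top 53 bits of the hash.
    static double uniform(long id, long seed) {
        long h = mix(id + seed * 0x9E3779B97F4A7C15L);
        return (h >>> 11) * 0x1.0p-53;
    }

    // Split ID + filter: one filter pass per split. Sizes follow the ratio only approximately.
    static List<List<Long>> splitByFilter(List<Long> ids, double ratio, long seed) {
        List<Long> first = ids.stream()
                .filter(id -> uniform(id, seed) < ratio)
                .collect(Collectors.toList());
        List<Long> second = ids.stream()
                .filter(id -> uniform(id, seed) >= ratio)
                .collect(Collectors.toList());
        return List.of(first, second);
    }

    // Exact sizes: order all elements by their pseudo-random key and cut after
    // round(ratio * n) elements. Needs the total count n and a global order.
    static List<List<Long>> splitExact(List<Long> ids, double ratio, long seed) {
        List<Long> shuffled = new ArrayList<>(ids);
        shuffled.sort(Comparator.comparingDouble(id -> uniform(id, seed)));
        int cut = (int) Math.round(ratio * shuffled.size());
        List<Long> head = new ArrayList<>(shuffled.subList(0, cut));
        List<Long> tail = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return List.of(head, tail);
    }

    public static void main(String[] args) {
        List<Long> ids = LongStream.range(0, 1_000).boxed().collect(Collectors.toList());
        List<List<Long>> approx = splitByFilter(ids, 0.8, 7L);
        List<List<Long>> exact = splitExact(ids, 0.8, 7L);
        System.out.println("filter split: " + approx.get(0).size() + " / " + approx.get(1).size());
        System.out.println("exact split:  " + exact.get(0).size() + " / " + exact.get(1).size()); // 800 / 200
    }
}
```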
>
> Cheers,
> Till
>
> On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber <alber.maximil...@gmail.com>
> wrote:
>
>> Hi Flinksters,
>>
>> I would like to shuffle the elements of my data set and then split it in
>> two according to some ratio. Each element in the data set has a unique id.
>> Is there a nice way to do this with the Flink API?
>> (It would be nice to have a guaranteed random shuffle.)
>> Thanks!
>>
>> Cheers,
>> Max
>>
>
