Hi,

With partitionCustom, the data distribution depends only on the
probability distribution you use to pick the target partition. If it is
uniform, the partitions are balanced in expectation and you should be
fine, i.e., choosing the channel like

> private final Random random = new Random(System.currentTimeMillis());
> int partition(K key, int numPartitions) {
>   return random.nextInt(numPartitions);
> }

should do the trick.
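
To get a feeling for how balanced the result is, here is a minimal
standalone sketch that runs the same idea outside Flink. The nested
Partitioner interface is only a local stand-in that I assume has the same
shape as Flink's, and the class name, the helper countPerPartition, and
the fixed seed are made up for illustration.

```java
import java.util.Random;

class RandomPartitionerSketch {

    // Local stand-in for Flink's Partitioner interface (assumed to have the
    // same single-method shape), so the sketch compiles without Flink.
    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Ignores the key and picks a target partition uniformly at random.
    static class RandomPartitioner<K> implements Partitioner<K> {
        private final Random random;

        RandomPartitioner(long seed) {
            this.random = new Random(seed);
        }

        @Override
        public int partition(K key, int numPartitions) {
            return random.nextInt(numPartitions);
        }
    }

    // Hypothetical helper: counts how many elements land in each partition.
    static int[] countPerPartition(int numElements, int numPartitions, long seed) {
        Partitioner<Integer> partitioner = new RandomPartitioner<>(seed);
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numElements; i++) {
            counts[partitioner.partition(i, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPartition(1_000_000, 4, 42L);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```

The counts will not be exactly equal, but with a uniform choice the
relative deviation shrinks as the number of elements grows.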

-Matthias

On 06/15/2015 05:41 PM, Maximilian Alber wrote:
> Thanks!
> 
> Ok, so for a random shuffle I need partitionCustom. But in that case the
> data might be out of balance then?
> 
> For the splitting. Is there no way to have exact sizes?
> 
> Cheers,
> Max
> 
> On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <trohrm...@apache.org
> <mailto:trohrm...@apache.org>> wrote:
> 
>     Hi Max,
> 
>     you can always shuffle your elements using the |rebalance| method.
>     What Flink does here is distribute the elements of each partition
>     among all available TaskManagers. This happens in a round-robin
>     fashion and is thus not completely random.
> 
>     Another option is the |partitionCustom| method, which allows you to
>     specify for each element the partition to which it is sent. You
>     would have to provide a |Partitioner| to do this.
> 
>     For the splitting there is at the moment no syntactic sugar. What
>     you can do, though, is assign each item a split ID and then use a
>     |filter| operation to select the individual splits. Depending on
>     your split ID distribution, you will have differently sized splits.
> 
>     Cheers,
>     Till
> 
>     On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber
>     alber.maximil...@gmail.com
>     <http://mailto:alber.maximil...@gmail.com> wrote:
> 
>         Hi Flinksters,
> 
>         I would like to shuffle the elements in my data set and then
>         split it in two according to some ratio. Each element in the
>         data set has a unique ID. Is there a nice way to do this with
>         the Flink API?
>         (It would be nice to have guaranteed random shuffling.)
>         Thanks!
> 
>         Cheers,
>         Max
> 
> 
> 
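
For reference, the split-ID-plus-filter approach Till describes in the
quoted message can be simulated on plain Java collections. This is only a
sketch under my own assumptions, not Flink API code: the class name, the
Tagged helper type, the methods assignSplitIds and selectSplit, and the
seed are all hypothetical. Each element gets split ID 0 with probability
`ratio` and split ID 1 otherwise; one filter per ID then selects the split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class SplitByIdSketch {

    // Hypothetical helper type: an element together with its split ID.
    static final class Tagged<T> {
        final int splitId;
        final T value;

        Tagged(int splitId, T value) {
            this.splitId = splitId;
            this.value = value;
        }
    }

    // Step 1: assign split ID 0 with probability `ratio`, otherwise split ID 1.
    static <T> List<Tagged<T>> assignSplitIds(List<T> data, double ratio, long seed) {
        Random random = new Random(seed);
        List<Tagged<T>> tagged = new ArrayList<>(data.size());
        for (T value : data) {
            int splitId = random.nextDouble() < ratio ? 0 : 1;
            tagged.add(new Tagged<>(splitId, value));
        }
        return tagged;
    }

    // Step 2: a filter keeps only the elements that carry the requested ID.
    static <T> List<T> selectSplit(List<Tagged<T>> tagged, int splitId) {
        return tagged.stream()
                .filter(t -> t.splitId == splitId)
                .map(t -> t.value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            data.add(i);
        }
        List<Tagged<Integer>> tagged = assignSplitIds(data, 0.8, 7L);
        System.out.println("split 0: " + selectSplit(tagged, 0).size());
        System.out.println("split 1: " + selectSplit(tagged, 1).size());
    }
}
```

Because every element draws its ID independently, the split sizes only
approximate the requested ratio; they are not exact.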
