Hi,

using partitionCustom, the data distribution depends only on your
probability distribution. If it is uniform, you should be fine, i.e.,
choosing the channel like

> private final Random random = new Random(System.currentTimeMillis());
> int partition(K key, int numPartitions) {
>     return random.nextInt(numPartitions);
> }

should do the trick.

-Matthias

On 06/15/2015 05:41 PM, Maximilian Alber wrote:
> Thanks!
>
> Ok, so for a random shuffle I need partitionCustom. But in that case the
> data might be out of balance then?
>
> For the splitting: is there no way to have exact sizes?
>
> Cheers,
> Max
>
> On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>     Hi Max,
>
>     you can always shuffle your elements using the |rebalance| method.
>     What Flink does here is distribute the elements of each partition
>     among all available TaskManagers. This happens in a round-robin
>     fashion and is thus not completely random.
>
>     A different means is the |partitionCustom| method, which allows you to
>     specify for each element to which partition it shall be sent. You
>     would have to specify a |Partitioner| to do this.
>
>     For the splitting there is at the moment no syntactic sugar. What you
>     can do, though, is to assign each item a split ID and then use a
>     |filter| operation to filter the individual splits. Depending on your
>     split ID distribution you will have differently sized splits.
>
>     Cheers,
>     Till
>
>     On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber
>     alber.maximil...@gmail.com wrote:
>
>         Hi Flinksters,
>
>         I would like to shuffle my elements in the data set and then
>         split it in two according to some ratio. Each element in the
>         data set has a unique id. Is there a nice way to do it with the
>         Flink API?
>         (It would be nice to have guaranteed random shuffling.)
>         Thanks!
>
>         Cheers,
>         Max
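
As a rough illustration of the two suggestions in this thread, the sketch below simulates them in plain Java without Flink: a uniform random channel choice as in the partition method above, and a random split ID followed by a filter. The class and method names (ShuffleSplitSketch, channelCounts, splitByRatio) are made up for this example and are not part of the Flink API, and the fixed seeds are only there to make runs repeatable. It suggests that with a uniform distribution the channels end up roughly, but not exactly, equal in size, and that the same holds for splits drawn by ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper class for illustration only; not a Flink class.
final class ShuffleSplitSketch {

    // Same idea as the partition(...) method quoted in the thread:
    // ignore the key and pick a channel uniformly at random.
    static int partition(Random random, int numPartitions) {
        return random.nextInt(numPartitions);
    }

    // Simulates sending n elements and counts how many land on each channel.
    static int[] channelCounts(int n, int numPartitions, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numPartitions];
        for (int i = 0; i < n; i++) {
            counts[partition(random, numPartitions)]++;
        }
        return counts;
    }

    // Split-ID approach: each item gets split ID 0 with probability `ratio`
    // and split ID 1 otherwise; the two result lists correspond to filtering
    // on that ID. The sizes are only approximately ratio : (1 - ratio).
    static <T> List<List<T>> splitByRatio(List<T> items, double ratio, long seed) {
        Random random = new Random(seed);
        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        for (T item : items) {
            int splitId = random.nextDouble() < ratio ? 0 : 1;
            if (splitId == 0) {
                first.add(item);
            } else {
                second.add(item);
            }
        }
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        // Channel sizes for 100000 elements on 4 channels.
        System.out.println(Arrays.toString(channelCounts(100_000, 4, 42L)));

        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            items.add(i);
        }
        List<List<Integer>> splits = splitByRatio(items, 0.3, 7L);
        System.out.println(splits.get(0).size() + " / " + splits.get(1).size());
    }
}
```

In a Flink job the first method would correspond to the body of a Partitioner passed to partitionCustom, and the second to a map that attaches the split ID followed by one filter per split. Neither variant gives exact sizes; both only match the target distribution on average.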