Hi Max,

you can always shuffle your elements using the rebalance method. What Flink does here is distribute the elements of each partition among all available TaskManagers. This happens in a round-robin fashion and is thus not completely random.
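
To illustrate why a round-robin distribution is not a random shuffle, here is a minimal plain-Java sketch. It does not use Flink and is not how Flink implements rebalance; the class name `RoundRobinSketch` and the method `distribute` are made up for this example. It only shows that dealing elements out in turn is fully determined by the input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only: hypothetical names, no Flink dependency.
class RoundRobinSketch {

    // Deal the elements out to numTargets buckets in turn: 0, 1, 2, ..., 0, 1, 2, ...
    static <T> List<List<T>> distribute(List<T> elements, int numTargets) {
        List<List<T>> targets = new ArrayList<>();
        for (int i = 0; i < numTargets; i++) {
            targets.add(new ArrayList<>());
        }
        int next = 0;
        for (T element : elements) {
            targets.get(next).add(element);
            next = (next + 1) % numTargets; // wrap around to the first bucket
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Prints [[1, 4, 7], [2, 5], [3, 6]]
        System.out.println(distribute(input, 3));
    }
}
```

Running it twice on the same input gives the same assignment, so neighbouring elements are spread evenly but predictably.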
Another option is the partitionCustom method, which lets you specify for each element the partition it should be sent to. You would have to provide a Partitioner for this.

For the splitting there is at the moment no syntactic sugar. What you can do, though, is assign each item a split ID and then use a filter operation to extract the individual splits. Depending on your split ID distribution, the splits will differ in size.

Cheers,
Till

On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber alber.maximil...@gmail.com wrote:

> Hi Flinksters,
>
> I would like to shuffle my elements in the data set and then split it in
> two according to some ratio. Each element in the data set has an unique id.
> Is there a nice way to do it with the flink api?
> (It would be nice to have guaranteed random shuffling.)
>
> Thanks!
>
> Cheers,
> Max
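
As a follow-up to the split-ID suggestion in the reply, here is a rough plain-Java sketch of the idea. It does not use Flink; the names `SplitIdSketch`, `Tagged`, `assignSplitIds` and `selectSplit` are hypothetical, and drawing the split ID from a seeded `java.util.Random` is an assumption about how one might choose the IDs. In a Flink job the tagging step would correspond to a map and each split to one filter on the tagged data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Plain-Java illustration only: hypothetical names, no Flink dependency.
class SplitIdSketch {

    // An element paired with the ID of the split it was assigned to.
    static final class Tagged<T> {
        final T value;
        final int splitId;

        Tagged(T value, int splitId) {
            this.value = value;
            this.splitId = splitId;
        }
    }

    // Assign split ID 0 with probability `ratio`, otherwise split ID 1.
    static <T> List<Tagged<T>> assignSplitIds(List<T> elements, double ratio, long seed) {
        Random random = new Random(seed);
        List<Tagged<T>> tagged = new ArrayList<>();
        for (T element : elements) {
            int splitId = random.nextDouble() < ratio ? 0 : 1;
            tagged.add(new Tagged<>(element, splitId));
        }
        return tagged;
    }

    // Keep only the elements that carry the requested split ID.
    static <T> List<T> selectSplit(List<Tagged<T>> tagged, int splitId) {
        return tagged.stream()
                .filter(t -> t.splitId == splitId)
                .map(t -> t.value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add(i);
        }
        List<Tagged<Integer>> tagged = assignSplitIds(ids, 0.8, 42L);
        List<Integer> first = selectSplit(tagged, 0);
        List<Integer> second = selectSplit(tagged, 1);
        System.out.println("split 0: " + first.size() + ", split 1: " + second.size());
    }
}
```

Because each ID is drawn independently, the split sizes only approximate the requested ratio. An exact ratio would need a different way of assigning the IDs.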