Thank you! Still, I cannot guarantee the size of each partition, or can I? I am thinking of something like randomSplit in Spark.
Cheers,
Max

On Mon, Jun 15, 2015 at 5:46 PM, Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote:

> Hi,
>
> using partitionCustom, the data distribution depends only on your
> probability distribution. If it is uniform, you should be fine, i.e.,
> choosing the channel like
>
>     private final Random random = new Random(System.currentTimeMillis());
>
>     public int partition(K key, int numPartitions) {
>         return random.nextInt(numPartitions);
>     }
>
> should do the trick.
>
> -Matthias
>
> On 06/15/2015 05:41 PM, Maximilian Alber wrote:
> > Thanks!
> >
> > OK, so for a random shuffle I need partitionCustom. But in that case,
> > might the data be out of balance?
> >
> > As for the splitting: is there no way to get exact sizes?
> >
> > Cheers,
> > Max
> >
> > On Mon, Jun 15, 2015 at 2:26 PM, Till Rohrmann <trohrm...@apache.org> wrote:
> >
> >     Hi Max,
> >
> >     you can always shuffle your elements using the rebalance method.
> >     What Flink does here is distribute the elements of each partition
> >     among all available TaskManagers. This happens in a round-robin
> >     fashion and is thus not completely random.
> >
> >     A different means is the partitionCustom method, which allows you to
> >     specify for each element the partition it is sent to. You would
> >     have to specify a Partitioner to do this.
> >
> >     For the splitting, there is at the moment no syntactic sugar. What you
> >     can do, though, is assign each item a split ID and then use a
> >     filter operation to select the individual splits. Depending on your
> >     split ID distribution, you will get differently sized splits.
> >
> >     Cheers,
> >     Till
> >
> >     On Mon, Jun 15, 2015 at 1:50 PM Maximilian Alber
> >     <alber.maximil...@gmail.com> wrote:
> >
> >         Hi Flinksters,
> >
> >         I would like to shuffle the elements in my data set and then
> >         split it in two according to some ratio. Each element in the
> >         data set has a unique id. Is there a nice way to do this with
> >         the Flink API?
> >         (It would be nice to have guaranteed random shuffling.)
> >         Thanks!
> >
> >         Cheers,
> >         Max
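For reference, here is a minimal sketch in plain Java of the two splitting strategies discussed in this thread. The first is the split-ID-plus-filter approach, which only gives approximately sized parts. The second shuffles and then cuts, which gives exact sizes but requires knowing the total element count. The sketch runs on a local List rather than a Flink DataSet, and the names RatioSplitSketch, Split, splitById and exactSplit are hypothetical, not Flink API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch on a local List (not a Flink DataSet); all names are illustrative.
final class RatioSplitSketch {

    // The two parts produced by a split.
    static final class Split<T> {
        final List<T> first;
        final List<T> second;

        Split(List<T> first, List<T> second) {
            this.first = first;
            this.second = second;
        }
    }

    // "Split ID + filter": each element independently gets split ID 0 with
    // probability `ratio`, otherwise 1; each part is then selected by a filter.
    // Part sizes are random and only approximately ratio * n.
    static <T> Split<T> splitById(List<T> elements, double ratio, long seed) {
        Random random = new Random(seed);
        int n = elements.size();
        int[] splitIds = new int[n];
        for (int i = 0; i < n; i++) {
            splitIds[i] = random.nextDouble() < ratio ? 0 : 1;
        }
        List<T> first = IntStream.range(0, n)
                .filter(i -> splitIds[i] == 0)
                .mapToObj(elements::get)
                .collect(Collectors.toList());
        List<T> second = IntStream.range(0, n)
                .filter(i -> splitIds[i] == 1)
                .mapToObj(elements::get)
                .collect(Collectors.toList());
        return new Split<>(first, second);
    }

    // Exact sizes: shuffle randomly, then cut at round(ratio * n).
    // This needs the total count n up front, i.e. a global view of the data.
    static <T> Split<T> exactSplit(List<T> elements, double ratio, long seed) {
        List<T> shuffled = new ArrayList<>(elements);
        Collections.shuffle(shuffled, new Random(seed)); // random permutation (Fisher-Yates)
        int cut = (int) Math.round(ratio * shuffled.size());
        return new Split<>(
                new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < 1000; id++) {
            ids.add(id); // unique ids, as in the original question
        }
        Split<Integer> approx = splitById(ids, 0.3, 42L);
        Split<Integer> exact = exactSplit(ids, 0.3, 42L);
        System.out.println("split ID + filter: " + approx.first.size() + " / " + approx.second.size());
        System.out.println("shuffle + cut:     " + exact.first.size() + " / " + exact.second.size());
    }
}
```

With the 30/70 ratio used in main, shuffle + cut always yields 300 / 700, while split ID + filter yields sizes that vary around that depending on the seed. In a distributed job, the exact variant would presumably need an extra global step (counting the elements and assigning consecutive indices after the shuffle) before filtering. The per-element split-ID approach avoids that step at the cost of exact sizes.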