Hi A. Bellet,

You can try RDD.randomSplit(weights), where weights is an array of
weights for the consecutive splits; it returns an array of RDDs, one per
weight. For example, RDD.randomSplit(Array(0.7, 0.3)) returns two RDDs,
one with roughly 70% of the data and the other with roughly 30%, with
elements assigned at random. Similarly,
RDD.randomSplit(Array.fill(10)(0.1)) returns 10 RDDs of randomly
selected elements with equal weights. Note that the weights are
normalized if they don't sum to 1, and the resulting split sizes are
approximate, not exact.
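To illustrate the semantics without needing a Spark cluster, here is a
plain-Scala sketch of a weighted random split over a local collection.
The helper name, seed, and bucketing logic are my own illustration of
how elements end up distributed, not Spark's actual implementation:

```scala
import scala.util.Random

// Illustrative local analogue of RDD.randomSplit: assign each element to
// a bucket with probability proportional to its normalized weight.
def randomSplitLocal[T](data: Seq[T], weights: Array[Double], seed: Long = 42L): Array[Seq[T]] = {
  val rng = new Random(seed)
  val total = weights.sum
  // Cumulative normalized boundaries, e.g. Array(0.7, 1.0) for (0.7, 0.3).
  val cumulative = weights.map(_ / total).scanLeft(0.0)(_ + _).tail
  val buckets = Array.fill(weights.length)(Seq.newBuilder[T])
  for (x <- data) {
    val r = rng.nextDouble()
    // First boundary exceeding r picks the bucket; guard against
    // floating-point rounding at the top end.
    val idx = cumulative.indexWhere(r < _)
    buckets(if (idx == -1) weights.length - 1 else idx) += x
  }
  buckets.map(_.result())
}

val Array(bigger, smaller) = randomSplitLocal(1 to 100, Array(0.7, 0.3))
println(s"bigger: ${bigger.size}, smaller: ${smaller.size}")
```

As with the real randomSplit, the two parts together contain every
element exactly once, but the 70/30 division is only approximate because
each element is assigned independently at random.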
 Thank you


Himanshu



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-randomly-distribute-elements-tp23391p23392.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

