Hi Jaonary,

I believe the n folds should be mapped to n keys in Spark using a map function. You can then reduce the resulting PairRDD by key and you should get your metric for each fold. I don't fully understand partitions, but from what I understand of them, they aren't required in your scenario.
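To make the idea concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. It tags each record with a fold key, reduces the (key, value) pairs per key, and then averages the per-fold results. The function names (`assign_folds`, `reduce_by_key`, `mean_metric_over_folds`), the round-robin fold assignment, and the use of a simple per-fold mean as the "metric" are all assumptions made for illustration; in Spark the same shape would presumably be a `map` to (fold, value) pairs followed by `reduceByKey`.

```python
def assign_folds(records, n_folds):
    """Tag each record with a fold key (round-robin by position)."""
    return [(i % n_folds, record) for i, record in enumerate(records)]


def reduce_by_key(pairs, combine):
    """Merge all values that share a key, like a reduce on a pair collection."""
    merged = {}
    for key, value in pairs:
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


def mean_metric_over_folds(records, n_folds):
    """Return ({fold: metric}, mean of the metric over all folds).

    The metric here is just the mean of the values in each fold,
    a stand-in for whatever is really computed per fold.
    """
    keyed = assign_folds(records, n_folds)
    # Map step: each value becomes a (sum, count) pair so it can be reduced.
    partials = [(fold, (value, 1)) for fold, value in keyed]
    # Reduce step: add up sums and counts within each fold.
    totals = reduce_by_key(partials, lambda a, b: (a[0] + b[0], a[1] + b[1]))
    fold_means = {fold: s / c for fold, (s, c) in totals.items()}
    overall = sum(fold_means.values()) / len(fold_means)
    return fold_means, overall


if __name__ == "__main__":
    per_fold, overall = mean_metric_over_folds([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
    print(per_fold, overall)
```

Note that the fold key is independent of how the data is physically partitioned, which is why partitions don't need to enter into it.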
Regards,
Sanjay

On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

Hi,

I need to partition my data, represented as an RDD, into n folds, run a metrics computation on each fold, and finally compute the mean of my metrics over all the folds. Can Spark do the data partitioning out of the box, or do I need to implement it myself? I know that RDD has a partitions method and mapPartitions, but I really don't understand the purpose and meaning of a partition here.

Cheers,
Jaonary