Hi Jaonary,

You can find the code for k-fold CV in https://github.com/apache/incubator-spark/pull/448. I have not found the time to resubmit the pull request against the latest master.
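
For illustration only, here is a rough sketch of the k-fold idea in plain Python. It is not Spark code and it is not taken from the pull request above; the function names (`assign_folds`, `k_fold_splits`, `mean_metric`) and the `metric(training, validation)` callback are made up for this example. Each record is tagged with a fold key, every fold in turn serves as the validation set, and the per-fold metrics are averaged. On an RDD, the two list comprehensions would presumably become `filter` calls on the keyed data.

```python
import random


def assign_folds(records, n_folds, seed=0):
    """Shuffle the records and tag each one with a fold key in [0, n_folds)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment keeps the fold sizes within one record of each other.
    return [(i % n_folds, rec) for i, rec in enumerate(shuffled)]


def k_fold_splits(keyed, n_folds):
    """Yield (fold, training, validation) once per fold."""
    for fold in range(n_folds):
        training = [rec for key, rec in keyed if key != fold]
        validation = [rec for key, rec in keyed if key == fold]
        yield fold, training, validation


def mean_metric(records, n_folds, metric, seed=0):
    """Average a user-supplied metric(training, validation) over all folds."""
    keyed = assign_folds(records, n_folds, seed)
    scores = [metric(train, valid) for _, train, valid in k_fold_splits(keyed, n_folds)]
    return sum(scores) / len(scores)
```

For example, `mean_metric(range(10), 5, lambda train, valid: len(valid))` returns `2.0`, since each of the 5 folds holds out 2 of the 10 records. Note that the folds here are a property of the data (a key per record), which is unrelated to how an RDD is physically partitioned.
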

On Fri, Mar 21, 2014 at 8:46 PM, Sanjay Awatramani <sanjay_a...@yahoo.com> wrote:

> Hi Jaonary,
>
> I believe the n folds should be mapped to n keys in Spark using a map
> function. You can then reduce the returned PairRDD and you should get your
> metric.
> I don't understand partitions fully, but from what I understand of them,
> they aren't required in your scenario.
>
> Regards,
> Sanjay
>
>
> On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <jaon...@gmail.com>
> wrote:
> Hi
>
> I need to partition my data, represented as an RDD, into n folds, run a
> metrics computation on each fold, and finally compute the mean of my
> metrics over all the folds.
> Can Spark do the data partitioning out of the box, or do I need to
> implement it myself? I know that RDD has a partitions method and
> mapPartitions, but I really don't understand the purpose and the meaning of
> a partition here.
>
> Cheers,
>
> Jaonary

--
Hai-Anh Trinh | Senior Software Engineer | http://adatao.com/
http://www.linkedin.com/in/haianh