There is also https://github.com/apache/spark/pull/18 against the current repo, which may be easier to apply.
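In case it helps before either pull request lands, here is a rough Scala sketch of the key-per-fold idea Sanjay describes below: tag every record with a random fold index, aggregate a per-fold metric with reduceByKey, and average the fold results on the driver. The names here (KFoldSketch, foldMetrics) and the use of the mean of the values as a stand-in metric are only placeholders of mine, not code from either pull request.

import scala.util.Random
import org.apache.spark.SparkContext._   // pair-RDD functions; needed on Spark < 1.3
import org.apache.spark.rdd.RDD

object KFoldSketch {
  // Assign each record a random fold in [0, numFolds), compute a per-fold
  // metric (here just the mean of the values, as a placeholder), and
  // average the per-fold results.
  def foldMetrics(data: RDD[Double], numFolds: Int, seed: Long = 42L): Double = {
    val keyed = data.mapPartitionsWithIndex { (idx, iter) =>
      val rng = new Random(seed + idx)           // per-partition RNG, reproducible for a fixed seed
      iter.map(x => (rng.nextInt(numFolds), x))  // (foldIndex, value) pairs
    }

    val perFoldMetric = keyed
      .mapValues(x => (x, 1L))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))   // per-fold (sum, count)
      .mapValues { case (sum, count) => sum / count }      // per-fold mean

    val folds = perFoldMetric.values.collect()             // at most numFolds values
    folds.sum / folds.length
  }
}

For a full k-fold cross-validation you would additionally, for each fold i, train on the records whose key is not i and evaluate on those whose key is i, which you can get by filtering on the fold key; the sketch above only covers the "compute a metric per fold and average" part of the question.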
On Fri, Mar 21, 2014 at 8:58 AM, Hai-Anh Trinh <a...@adatao.com> wrote:

> Hi Jaonary,
>
> You can find the code for k-fold CV in
> https://github.com/apache/incubator-spark/pull/448. I have not found the
> time to resubmit the pull request against the latest master.
>
>
> On Fri, Mar 21, 2014 at 8:46 PM, Sanjay Awatramani <sanjay_a...@yahoo.com> wrote:
>
>> Hi Jaonary,
>>
>> I believe the n folds should be mapped to n keys in Spark using a map
>> function. You can then reduce the returned PairRDD and you should get
>> your metric.
>> I don't fully understand partitions, but from what I understand of them,
>> they aren't required in your scenario.
>>
>> Regards,
>> Sanjay
>>
>>
>> On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
>>
>> Hi,
>>
>> I need to partition my data, represented as an RDD, into n folds, run
>> metrics computation on each fold, and finally compute the mean of the
>> metrics over all the folds.
>> Can Spark do this data partitioning out of the box, or do I need to
>> implement it myself? I know that RDD has a partitions method and
>> mapPartitions, but I don't really understand the purpose and meaning
>> of partitions here.
>>
>> Cheers,
>>
>> Jaonary
>
>
> --
> Hai-Anh Trinh | Senior Software Engineer | http://adatao.com/
> http://www.linkedin.com/in/haianh

--
Cell : 425-233-8271