If someone wanted or needed to implement this themselves, are partitions the right way to go? Any tips on how to get started (say, dividing an RDD into 5 parts)?
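One way to get started is the approach Sanjay suggests further down the thread: key each element to one of n folds with a map, then aggregate per key. Below is a plain-Python sketch of that logic (no Spark required) just to show the shape of the computation; in Spark this would correspond to something like zipWithIndex / map / reduceByKey over a PairRDD. The function names here are illustrative, not part of Spark's API.

```python
from collections import defaultdict

def assign_folds(data, n_folds):
    """Pair each element with a fold id via its index modulo n_folds."""
    return [(i % n_folds, x) for i, x in enumerate(data)]

def mean_metric_over_folds(data, n_folds, metric):
    """Compute the metric on each fold, then average across folds."""
    folds = defaultdict(list)
    for fold_id, x in assign_folds(data, n_folds):
        folds[fold_id].append(x)
    per_fold = [metric(values) for values in folds.values()]
    return sum(per_fold) / len(per_fold)

data = list(range(10))
# 5 folds: fold k holds the elements whose index % 5 == k
print(mean_metric_over_folds(data, 5, lambda xs: sum(xs) / len(xs)))  # → 4.5
```

Index-modulo keying gives deterministic, equal-sized folds; for cross-validation you would typically shuffle (or hash) first so folds are random, which is what the RandomSplitRDD code referenced below appears to address.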
On Fri, Mar 21, 2014 at 9:51 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> Thank you Hai-Anh. Are the files CrossValidation.scala and RandomSplitRDD.scala
> enough to use it? I'm currently using Spark 0.9.0 and I want to avoid
> rebuilding everything.
>
>
> On Fri, Mar 21, 2014 at 4:58 PM, Hai-Anh Trinh <a...@adatao.com> wrote:
>
>> Hi Jaonary,
>>
>> You can find the code for k-fold CV in
>> https://github.com/apache/incubator-spark/pull/448. I have not found the
>> time to resubmit the pull request against the latest master.
>>
>>
>> On Fri, Mar 21, 2014 at 8:46 PM, Sanjay Awatramani <sanjay_a...@yahoo.com> wrote:
>>
>>> Hi Jaonary,
>>>
>>> I believe the n folds should be mapped to n keys in Spark using a map
>>> function. You can then reduce the returned PairRDD and you should get
>>> your metric.
>>> I don't understand partitions fully, but from what I understand of them,
>>> they aren't required in your scenario.
>>>
>>> Regards,
>>> Sanjay
>>>
>>>
>>> On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I need to partition my data, represented as an RDD, into n folds, run a
>>> metrics computation on each fold, and finally compute the mean of my
>>> metrics over all the folds.
>>> Can Spark do the data partitioning out of the box, or do I need to
>>> implement it myself? I know that RDD has a partitions method and
>>> mapPartitions, but I don't really understand the purpose and meaning of
>>> a partition here.
>>>
>>> Cheers,
>>>
>>> Jaonary
>>
>>
>> --
>> Hai-Anh Trinh | Senior Software Engineer | http://adatao.com/
>> http://www.linkedin.com/in/haianh