There is also https://github.com/apache/spark/pull/18 against the current repo, which may be easier to apply.
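In case it helps before either pull request lands, here is a rough Scala sketch of the key-per-fold idea Sanjay describes below: tag every record with a random fold index, aggregate a per-fold metric with reduceByKey, and average the fold results on the driver. The names here (KFoldSketch, foldMetrics) and the use of the mean of the values as a stand-in metric are only placeholders of mine, not code from either pull request.

import scala.util.Random
import org.apache.spark.SparkContext._   // pair-RDD functions; needed on Spark < 1.3
import org.apache.spark.rdd.RDD

object KFoldSketch {
  // Assign each record a random fold in [0, numFolds), compute a per-fold
  // metric (here just the mean of the values, as a placeholder), and
  // average the per-fold results.
  def foldMetrics(data: RDD[Double], numFolds: Int, seed: Long = 42L): Double = {
    val keyed = data.mapPartitionsWithIndex { (idx, iter) =>
      val rng = new Random(seed + idx)           // per-partition RNG, reproducible for a fixed seed
      iter.map(x => (rng.nextInt(numFolds), x))  // (foldIndex, value) pairs
    }

    val perFoldMetric = keyed
      .mapValues(x => (x, 1L))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))   // per-fold (sum, count)
      .mapValues { case (sum, count) => sum / count }      // per-fold mean

    val folds = perFoldMetric.values.collect()             // at most numFolds values
    folds.sum / folds.length
  }
}

For a full k-fold cross-validation you would additionally, for each fold i, train on the records whose key is not i and evaluate on those whose key is i, which you can get by filtering on the fold key; the sketch above only covers the "compute a metric per fold and average" part of the question.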
On Fri, Mar 21, 2014 at 8:58 AM, Hai-Anh Trinh <a...@adatao.com> wrote:

> Hi Jaonary,
>
> You can find the code for k-fold CV in
> https://github.com/apache/incubator-spark/pull/448. I have not found the
> time to resubmit the pull request against the latest master.
>
>
> On Fri, Mar 21, 2014 at 8:46 PM, Sanjay Awatramani <sanjay_a...@yahoo.com> wrote:
>
>> Hi Jaonary,
>>
>> I believe the n folds should be mapped to n keys in Spark using a map
>> function. You can then reduce the returned PairRDD and you should get
>> your metric.
>> I don't fully understand partitions, but from what I understand of them,
>> they aren't required in your scenario.
>>
>> Regards,
>> Sanjay
>>
>>
>> On Friday, 21 March 2014 7:03 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
>>
>> Hi,
>>
>> I need to partition my data, represented as an RDD, into n folds, run
>> metrics computation on each fold, and finally compute the mean of the
>> metrics over all the folds.
>> Can Spark do this data partitioning out of the box, or do I need to
>> implement it myself? I know that RDD has a partitions method and
>> mapPartitions, but I don't really understand the purpose and meaning
>> of partitions here.
>>
>> Cheers,
>>
>> Jaonary
>
>
> --
> Hai-Anh Trinh | Senior Software Engineer | http://adatao.com/
> http://www.linkedin.com/in/haianh

--
Cell : 425-233-8271