There is also a "randomSplit" method on RDD in the latest version of Spark:
https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala
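
For anyone who wants to build the folds by hand, here is a minimal sketch of how randomSplit could be used for k-fold splitting. It assumes a Spark version that already ships RDD.randomSplit(weights, seed). The helper name kFolds is hypothetical and is not part of Spark; it is only meant to show the idea.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper (not part of Spark): builds k (training, validation)
// pairs from a single RDD using RDD.randomSplit.
def kFolds[T: ClassTag](data: RDD[T], k: Int, seed: Long): Seq[(RDD[T], RDD[T])] = {
  require(k >= 2, "need at least two folds")
  // Equal weights give k random, disjoint parts of roughly equal size.
  val parts: Array[RDD[T]] = data.randomSplit(Array.fill(k)(1.0 / k), seed)
  (0 until k).map { i =>
    val validation = parts(i)
    // Training set: the union of every part except the i-th one.
    val training = data.sparkContext.union(parts.indices.filter(_ != i).map(parts(_)))
    (training, validation)
  }
}
```

Each fold can then be passed to whatever metric computation you have, and the per-fold results averaged on the driver. Every part re-evaluates the parent RDD, so caching the input first is probably worthwhile.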
On Tue, Mar 25, 2014 at 1:21 AM, Holden Karau wrote:
There is also https://github.com/apache/spark/pull/18 against the current
repo which may be easier to apply.
On Fri, Mar 21, 2014 at 8:58 AM, Hai-Anh Trinh wrote:
> Hi Jaonary,
>
> You can find the code for k-fold CV in
> https://github.com/apache/incubator-spark/pull/448. I have not found the
> time to resubmit the pull request against the latest master.
If someone wanted / needed to implement this themselves, are partitions the
correct way to go? Any tips on how to get started (say, dividing an RDD
into 5 parts)?
On Fri, Mar 21, 2014 at 9:51 AM, Jaonary Rabarisoa wrote:
Thank you, Hai-Anh. Are the files CrossValidation.scala and
RandomSplitRDD.scala
enough to use it? I'm currently using Spark 0.9.0 and I want to avoid
rebuilding everything.
On Fri, Mar 21, 2014 at 4:58 PM, Hai-Anh Trinh wrote:
Hi Jaonary,
You can find the code for k-fold CV in
https://github.com/apache/incubator-spark/pull/448. I have not found the
time to resubmit the pull request against the latest master.
On Fri, Mar 21, 2014 at 8:46 PM, Sanjay Awatramani wrote:
Hi Jaonary,
I believe the n folds should be mapped to n keys in Spark using a map
function. You can then reduce the resulting pair RDD to get your metric.
I don't fully understand partitions, but from what I do understand, they
aren't required in your scenario.
Regards,
Sanjay
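
A rough sketch of the key-per-fold idea described above, under some assumptions: each record has already been turned into a numeric score, and the metric for a fold is simply the mean score of its records. The name meanOverFolds is hypothetical and not part of Spark. This only covers metrics that can be computed from the records of a fold alone; it does not train a model on the remaining folds.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark 0.9/1.x
import org.apache.spark.rdd.RDD
import scala.util.Random

// Hypothetical helper: `scores` holds one numeric score per record.
def meanOverFolds(scores: RDD[Double], n: Int, seed: Long): Double = {
  require(n >= 1, "need at least one fold")
  // Map step: tag every record with a random fold key in [0, n).
  val keyed: RDD[(Int, Double)] = scores.mapPartitionsWithIndex { (idx, it) =>
    val rng = new Random(seed + idx) // one generator per partition
    it.map(s => (rng.nextInt(n), s))
  }
  // Reduce step: accumulate (sum, count) per fold, then take the fold mean.
  val perFold: Array[(Int, Double)] = keyed
    .mapValues(s => (s, 1L))
    .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
    .mapValues { case (sum, count) => sum / count }
    .collect()
  // Average the per-fold metrics on the driver (NaN if the input is empty).
  perFold.map(_._2).sum / perFold.length
}
```

Folds that receive no records produce no key, so they are left out of the final average.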
Hi,
I need to partition my data, represented as an RDD, into n folds, run a
metrics computation on each fold, and finally compute the mean of my metrics
over all the folds.
Can Spark do the data partitioning out of the box, or do I need to
implement it myself? I know that RDD has a partitions method an