Hello,
For the needs of my application, I need to periodically "shuffle" the data
across the nodes/partitions of a reasonably large dataset. This is an expensive
operation, but I only need to do it every now and then. However, it seems that
I am doing something wrong, because as the iterations go on, the memory usage
keeps growing.
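For illustration, a minimal sketch of this kind of periodic reshuffle (sc,
data and numIterations are placeholders, as is the checkpoint path;
checkpointing every few iterations is one common way to keep the RDD lineage,
and with it memory use, from growing across iterations):

    import org.apache.spark.storage.StorageLevel

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // placeholder path

    var rdd = data.persist(StorageLevel.MEMORY_AND_DISK)
    for (i <- 1 to numIterations) {
      // ... per-iteration work producing a new `rdd` ...
      if (i % 10 == 0) {                             // reshuffle every now and then
        val reshuffled = rdd.repartition(rdd.partitions.length)
                            .persist(StorageLevel.MEMORY_AND_DISK)
        reshuffled.checkpoint()                      // cut the lineage
        reshuffled.count()                           // force materialization
        rdd.unpersist()                              // release the old copy
        rdd = reshuffled
      }
    }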
Thanks a lot for the suggestions!
On 18/06/2015 15:02, Himanshu Mehra [via Apache Spark User List] wrote:
> Hi A bellet
>
> You can try RDD.randomSplit(weights array), where the weights array is the
> array of weights you want to assign to the consecutive splits, for example
> RDD.randomSplit(Array(...)).
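The quoted example is cut off; presumably it was a call along these lines
(the weights below are illustrative). Note that randomSplit returns an array
of separate RDDs, one per weight, rather than reassigning partitions within a
single RDD:

    // `rdd` stands for the RDD to split; weights that do not sum to 1
    // are normalized.
    val splits = rdd.randomSplit(Array(0.25, 0.25, 0.25, 0.25), seed = 17L)
    // splits: an Array of RDDs, with elements assigned to each at random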
Hello,
In the context of a machine learning algorithm, I need to be able to
randomly distribute the elements of a large RDD across partitions (i.e.,
essentially assign each element to a random partition). How could I achieve
this? I have tried to call repartition() with the current number of
partitions.
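One possible sketch (rdd and numPartitions are placeholders): key each
element by a uniformly random partition id and shuffle by that key. As far as
I know, repartition() spreads elements round-robin from a random starting
offset, so it balances the partitions but does not assign each element
independently at random:

    import org.apache.spark.HashPartitioner
    import scala.util.Random

    // Draw a random partition id in [0, numPartitions) for every element,
    // then shuffle by that key; an Int key hashes to itself, so each
    // element lands in the partition it drew.
    val randomized = rdd
      .map(x => (Random.nextInt(numPartitions), x))
      .partitionBy(new HashPartitioner(numPartitions))
      .values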
Hi everyone,
I have a large RDD and I am trying to create an RDD of a random sample of
pairs of elements from it. The two elements of each pair should come from the
same partition, for efficiency. The idea I've come up with is to take two
random samples and then use zipPartitions to pair the elements of the two
samples within each partition.
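A sketch of that idea (the fraction and seeds are arbitrary): sample()
preserves the number of partitions, so the two samples can be zipped
partition-wise, and the per-partition zip simply stops at the shorter of the
two iterators:

    // `rdd` is the large RDD; fraction and seeds are placeholders.
    val s1 = rdd.sample(withReplacement = false, fraction = 0.1, seed = 1L)
    val s2 = rdd.sample(withReplacement = false, fraction = 0.1, seed = 2L)

    // Pair up the two samples partition by partition, so both elements
    // of every pair come from the same partition of `rdd`.
    val pairs = s1.zipPartitions(s2) { (it1, it2) => it1.zip(it2) }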
Hello everyone,
I am a Spark novice facing a nontrivial problem to solve with Spark.
I have an RDD consisting of many elements (say, 60K), where each element
is a d-dimensional vector.
I want to implement an iterative algorithm which does the following. At each
iteration, I want to apply an operation to every vector in the RDD.
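The message breaks off before the operation is specified, but the general
shape of such a loop might look as follows (initial, maxIterations and the
step function are placeholders):

    import org.apache.spark.rdd.RDD

    // Placeholder for the per-element operation applied at each iteration.
    def step(v: Array[Double]): Array[Double] = v

    var vectors: RDD[Array[Double]] = initial.cache()
    for (_ <- 1 to maxIterations) {
      val next = vectors.map(step).cache()
      next.count()          // materialize before releasing the old RDD
      vectors.unpersist()
      vectors = next
    }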