You can find a gist that illustrates this issue here:
https://gist.github.com/jrabary/9953562
I got this with Spark built from the master branch.
On Sat, Mar 29, 2014 at 7:12 PM, Andrew Ash wrote:
Is this Spark 0.9.0? Try setting spark.shuffle.spill=false. There was a hash
collision bug, fixed in 0.9.1, that might cause you to get too few results in
that join.
Sent from my mobile phone
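For reference, a minimal sketch of applying that setting through SparkConf
(the app name is hypothetical; this assumes the SparkConf API available since
Spark 0.9.0):

    import org.apache.spark.{SparkConf, SparkContext}

    // Disable shuffle spilling to work around the 0.9.0 hash collision bug.
    val conf = new SparkConf()
      .setAppName("CartesianRepro")           // hypothetical app name
      .set("spark.shuffle.spill", "false")
    val sc = new SparkContext(conf)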
On Mar 28, 2014 8:04 PM, "Matei Zaharia" wrote:
Weird, how exactly are you pulling out the sample? Do you have a small program
that reproduces this?
Matei
On Mar 28, 2014, at 3:09 AM, Jaonary Rabarisoa wrote:
I forgot to mention that I don't really use all of my data. Instead I use a
sample extracted with randomSample.
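The exact sampling call isn't shown in the thread; a minimal sketch using the
standard RDD.sample API (the fraction and seed here are assumptions) would be:

    // Take a ~10% sample without replacement, with a fixed seed.
    // In Spark 0.9 the seed argument is required.
    val sampled = data.sample(false, 0.1, 42)
    // Because RDDs are lazy, an uncached sampled RDD is recomputed on each
    // action; only a fixed seed guarantees the same elements come back.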
On Fri, Mar 28, 2014 at 10:58 AM, Jaonary Rabarisoa wrote:
Hi all,
I've noticed that RDD.cartesian behaves strangely with cached versus uncached
data. More precisely, I have a set of data that I load with objectFile:

    val data: RDD[(Int,String,Array[Double])] = sc.objectFile("data")

Then I split it into two sets depending on some criterion:

    val part1 = data
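The original message is cut off in the archive. A minimal sketch of the setup
it describes, with a hypothetical split criterion, might look like:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext("local[4]", "CartesianRepro")  // hypothetical setup
    val data: RDD[(Int, String, Array[Double])] = sc.objectFile("data")

    // Hypothetical criterion: the real predicate is not shown in the archive.
    val part1 = data.filter { case (id, _, _) => id % 2 == 0 }
    val part2 = data.filter { case (id, _, _) => id % 2 != 0 }

    // Compare the cartesian product size with and without caching.
    val uncached = part1.cartesian(part2).count()
    part1.cache(); part2.cache()
    val cached = part1.cartesian(part2).count()
    println(s"uncached = $uncached, cached = $cached")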