Re: Is shuffle "stable"?

Matei Zaharia Sat, 14 Jun 2014 15:56:26 -0700

The order is not guaranteed actually, only which keys end up in each partition. 
Reducers may fetch data from map tasks in an arbitrary order, depending on 
which ones are available first. If you’d like a specific order, you should sort 
each partition. Here you might be getting it because each partition only ends 
up having one element, and collect() does return the partitions in order.
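
As a rough sketch of "sort each partition" (not code from this thread): the object name, the helper sortPartition, the local[3] master, and the choice of a descending sort on the value are all assumptions made for illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object SortedPartitionsSketch {
  // Hypothetical helper: orders one partition's records by value, descending.
  // It buffers the whole partition in memory, so it assumes partitions are small.
  def sortPartition(iter: Iterator[(Int, Int)]): Iterator[(Int, Int)] =
    iter.toArray.sortBy { case (_, value) => -value }.iterator

  def main(args: Array[String]): Unit = {
    // Master URL and app name are illustrative assumptions.
    val conf = new SparkConf().setMaster("local[3]").setAppName("sorted-partitions-sketch")
    val sc = new SparkContext(conf)
    try {
      val shuffled = sc
        .parallelize(Seq(0 -> 3, 0 -> 2, 0 -> 1), 3)
        .partitionBy(new HashPartitioner(3))

      // Sort after the shuffle so the order within each partition no longer
      // depends on which map outputs the reducer happened to fetch first.
      val sorted = shuffled.mapPartitions(sortPartition, preservesPartitioning = true)

      println(sorted.collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Passing preservesPartitioning = true keeps the HashPartitioner attached to the result, since the sort only reorders records inside a partition and never moves them between partitions.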


Matei

On Jun 14, 2014, at 12:14 PM, Daniel Darabos <daniel.dara...@lynxanalytics.com> 
wrote:

> What I mean is, let's say I run this:
> 
> sc.parallelize(Seq(0->3, 0->2, 0->1), 3).partitionBy(new HashPartitioner(3)).collect
> 
> Will the result always be Array((0,3), (0,2), (0,1))? Or could I possibly get 
> a different order?
> 
> I'm pretty sure the shuffle files are taken in the order of the source 
> partitions... But after much searching, and the discussion on 
> http://stackoverflow.com/questions/24206660/does-groupbykey-in-spark-preserve-the-original-order
> I still can't find the code that does this.
> 
> Thanks!
