Re: Possible space improvements to shuffle

2015-06-02 Thread John Carrino
ined && !aggregator.isDefined. If an aggregator is defined but we don't have an ordering, then I don't think it makes sense to sort the keys based on their hashcodes or some default ordering, since hashcode collisions would lead to incorrect results for sort-bas

Possible space improvements to shuffle

2015-06-02 Thread John Carrino
One thing I have noticed with ExternalSorter is that if an ordering is not defined, it sorts using only the partition_id instead of (partition_id, hash). This means that on the reduce side you need to pull the entire dataset into memory before you can begin iterating over the results. I f
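
To make the difference concrete, here is a minimal Python sketch of the two sort keys. It is not Spark's ExternalSorter code: sort_records and stream_combine are hypothetical names, and Python's built-in hash() stands in for the key's hashCode. With a partition-only sort, records for one key can be scattered anywhere inside a partition, so a combiner has to buffer the whole partition. With a (partition_id, hash) sort, every record for a key falls inside one contiguous run, so only that run needs to be held at a time. Keys whose hashes collide share a run, so they are still separated by equality within it.

```python
from itertools import groupby


def sort_records(records, num_partitions, use_hash):
    """Tag each (key, value) with a partition id and sort the result.

    use_hash=False sorts by partition id only; use_hash=True sorts by
    (partition_id, hash(key)). Both are illustrative, not Spark's code.
    """
    tagged = [(hash(k) % num_partitions, k, v) for k, v in records]
    if use_hash:
        tagged.sort(key=lambda r: (r[0], hash(r[1])))
    else:
        tagged.sort(key=lambda r: r[0])  # stable: keys stay interleaved
    return tagged


def stream_combine(sorted_records, combine):
    """Combine values per key, assuming input sorted by (partition_id, hash).

    Only one run of equal (partition_id, hash) is buffered at a time.
    """
    run_key = lambda r: (r[0], hash(r[1]))
    for (pid, _hash), run in groupby(sorted_records, key=run_key):
        # Distinct keys with colliding hashes land in the same run,
        # so they are told apart by equality rather than by hash.
        acc = {}
        for _, k, v in run:
            acc[k] = combine(acc[k], v) if k in acc else v
        for k, v in acc.items():
            yield pid, k, v


records = [(1, 10), (3, 1), (1, 5)]

# Partition-only sort: key 1 appears on both sides of key 3.
by_partition = sort_records(records, num_partitions=2, use_hash=False)

# (partition_id, hash) sort: all records for key 1 are adjacent.
by_hash = sort_records(records, num_partitions=2, use_hash=True)
combined = list(stream_combine(by_hash, lambda a, b: a + b))
```

In this sketch the memory held by stream_combine is bounded by the largest run of equal hashes rather than by the size of the partition, which is the saving the message describes.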