Hi folks,

I hit several Spark SQL unit test failures when sort-based shuffle is enabled. 
It seems Spark SQL uses GenericMutableRow, which causes every entry in 
ExternalSorter's internal buffer to refer to the same object. My guess is that 
GenericMutableRow reuses a single mutable object to represent different rows. 
This is fine for hash-based shuffle because each row is written directly to 
file, but it fails in sort-based shuffle because the rows are held in memory 
in order to sort them. I have just opened a JIRA ticket for this; details are at 
https://issues.apache.org/jira/browse/SPARK-2967.
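
To illustrate what I suspect is happening, here is a minimal standalone sketch. 
It does not use any Spark classes: MutableRow, reusedRows and the two buffer 
helpers are hypothetical names I made up for the example, and the ArrayBuffer 
only stands in for a sorter that keeps records in memory. It assumes the 
producer really does hand out the same row instance each time, which is my 
guess rather than something I have confirmed in the code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a reusable mutable row; NOT Spark's GenericMutableRow.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseDemo {
  // Produces rows by mutating and returning the same instance on every call to next().
  def reusedRows(values: Seq[Int]): Iterator[MutableRow] = {
    val row = new MutableRow(0)
    values.iterator.map { v => row.value = v; row }
  }

  // Buffers the references as-is, the way an in-memory sorter would hold its records.
  def bufferWithoutCopy(values: Seq[Int]): List[Int] = {
    val buf = ArrayBuffer.empty[MutableRow]
    reusedRows(values).foreach(r => buf += r)
    buf.map(_.value).toList
  }

  // Copies each row before buffering, so every entry keeps its own value.
  def bufferWithCopy(values: Seq[Int]): List[Int] = {
    val buf = ArrayBuffer.empty[MutableRow]
    reusedRows(values).foreach(r => buf += r.copy())
    buf.map(_.value).toList
  }

  def main(args: Array[String]): Unit = {
    println(bufferWithoutCopy(Seq(3, 1, 2))) // List(2, 2, 2): all entries alias the last row
    println(bufferWithCopy(Seq(3, 1, 2)))    // List(3, 1, 2)
  }
}
```

If the guess is right, copying the row before it goes into the buffer would be 
one possible workaround, at the cost of an extra allocation per record.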

Any suggestions?

Thanks
Jerry
