I'm not 100% sure I understand your question, but yes, Spark (both the RDD API and SQL/DataFrame) does partial aggregation, i.e. map-side combining.
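
To make "partial aggregation" concrete, here is a minimal plain-Python sketch of the idea. It is not Spark code and does not show how Spark implements it; the function names and sample data are made up for illustration. Each partition pre-combines its values per key, and only those partial results are merged afterwards.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Map side: sum values per key within a single partition."""
    acc = defaultdict(int)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def final_aggregate(partials):
    """Reduce side: merge the per-partition partial sums."""
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


# Hypothetical input: two partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]

partials = [partial_aggregate(p) for p in partitions]
result = final_aggregate(partials)

print(partials)
print(result)
```

In this toy input the five original records shrink to four partial records before the merge step, which is the point of combining on the map side. In the RDD API, reduceByKey is the usual example of an operation that combines this way.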

On Tue, Feb 9, 2016 at 8:37 PM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:
> Can anybody confirm, whether ANY operator in Spark SQL uses
> map-side-combine ? If not, is it safe to assume SortShuffleManager will
> always use Serialized sorting in case of queries from Spark SQL ?