Hello,
I know some operators in Spark are expensive because of shuffle.
This document describes shuffle:
https://www.educba.com/spark-shuffle/

and says: "More shufflings in numbers are not always bad. Memory constraints and
other impossibilities can be overcome by shuffling."

It lists the following RDD operations as examples that cause a shuffle:
– subtractByKey
– groupBy
– foldByKey
– reduceByKey
– aggregateByKey
– transformations of a join of any type
– distinct
– cogroup
I know some operations like reduceByKey are well known for causing a shuffle,
but what I don't understand is why the distinct operation should cause one!
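For context, my current understanding (an assumption from skimming the Spark source, so please correct me if wrong) is that RDD.distinct is built on top of reduceByKey, roughly map(x => (x, null)).reduceByKey(...).map(_._1), so it would inherit reduceByKey's shuffle. Here is a plain-Python sketch of that idea, no Spark needed; NUM_PARTITIONS and partition_of are made-up names for illustration:

```python
# Sketch of why distinct needs a shuffle: equal elements may start out on
# different partitions, so they must be hash-partitioned to the same place
# before duplicates can be collapsed. That regrouping is the shuffle.

NUM_PARTITIONS = 3  # made-up partition count for the sketch

def partition_of(x):
    # Hash partitioner: picks the target partition for an element.
    return hash(x) % NUM_PARTITIONS

def distinct(partitions):
    # "Map side": every input partition routes each element to its target
    # partition. Using a set per target partition plays the role of the
    # reduceByKey step: only one copy per key survives.
    shuffled = [set() for _ in range(NUM_PARTITIONS)]
    for part in partitions:
        for x in part:
            shuffled[partition_of(x)].add(x)
    return shuffled

# The duplicates of 2 live on two different input partitions; only after
# the shuffle do they meet on one partition and collapse to a single copy.
parts = [[1, 2, 2], [2, 3], [3, 4]]
result = sorted(x for part in distinct(parts) for x in part)
print(result)  # [1, 2, 3, 4]
```

If that reading of the implementation is right, distinct shuffles for the same reason reduceByKey does: deduplication is a by-key aggregation.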

Thanks
