Re: What are the most common operators for shuffle in Spark

2022-01-23 Thread Khalid Mammadov
I don't know the actual implementation, but to me a shuffle is still necessary: each worker reads its data separately and reduces it to a local distinct set, and these local results then need to be shuffled to find the actual distinct values. On Sun, 23 Jan 2022, 17:39 ashok34...@yahoo.com.INVALID, wrote: > Hello, > > I know some operat
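
To illustrate the reasoning in the reply above, here is a minimal plain-Python sketch of a two-stage distinct. It is not Spark's implementation (which the reply says is unknown to its author); the function names `local_distinct`, `shuffle`, and `distributed_distinct`, and the `num_reducers` parameter, are hypothetical and exist only for this illustration. Each list stands in for the data one worker reads.

```python
from collections import defaultdict


def local_distinct(partition):
    # Stage 1: each worker removes duplicates within its own partition only.
    return set(partition)


def shuffle(partials, num_reducers):
    # Stage 2: route every value to a bucket chosen by its hash, so equal
    # values coming from different workers end up in the same bucket.
    buckets = defaultdict(set)
    for partial in partials:
        for value in partial:
            buckets[hash(value) % num_reducers].add(value)
    return buckets


def distributed_distinct(partitions, num_reducers=2):
    # Hypothetical driver: local dedup, then shuffle, then merge the buckets.
    partials = [local_distinct(p) for p in partitions]
    buckets = shuffle(partials, num_reducers)
    result = set()
    for bucket in buckets.values():
        result |= bucket
    return result


partitions = [[1, 2, 2, 3], [3, 4, 4], [1, 5]]
print(sorted(distributed_distinct(partitions)))  # [1, 2, 3, 4, 5]
```

In this sketch the local step alone leaves 7 values across the three partitions, because 1 and 3 each appear on two workers; only after values are regrouped by key do the cross-partition duplicates collapse to the 5 distinct values.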

What are the most common operators for shuffle in Spark

2022-01-23 Thread ashok34...@yahoo.com.INVALID
Hello, I know some operators in Spark are expensive because of shuffle. This document describes shuffle, https://www.educba.com/spark-shuffle/, and says: More shufflings in numbers are not always bad. Memory constraints and other impossibilities can be overcome by shuffling. In RDD, the below are a