Is there any formula with which I could determine the Shuffle Write size before
execution?

For example, in a Sort Merge Join, in the stage where the first table is
loaded, the Shuffle Write is 429.2 MB. The table is 5.5 GB in HDFS with a
128 MB block size, so it is loaded in 45 tasks/partitions.
How does this 5.5 GB end up as 429 MB of shuffle write, and could I determine
that before execution?
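One way I could imagine estimating it up front (a rough sketch only, assuming a
Spark 2.x+ SparkSession, a hypothetical Parquet table at hdfs:///path/to/table,
and hypothetical columns join_key/payload): count the rows, serialize a small
sample of the projected records, and scale the average record size up. It will
not match the web UI exactly, because the shuffle writer uses its own row
format and adds per-partition overhead, but it illustrates why the shuffled
bytes are much smaller than the HDFS footprint (only the pruned columns are
exchanged):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import org.apache.spark.sql.SparkSession

// Rough, pre-execution estimate of the shuffle write for one side of a
// sort-merge join. Path, column names and sample fraction are hypothetical
// placeholders, not taken from the question above.
object ShuffleWriteEstimate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shuffle-write-estimate").getOrCreate()

    // Only the columns that survive pruning/projection are shuffled, not the
    // full on-disk rows -- one reason 5.5 GB in HDFS can shrink to ~429 MB.
    val projected = spark.read.parquet("hdfs:///path/to/table")
      .select("join_key", "payload")

    val totalRows = projected.count()

    // Sample ~1% of the rows, serialize them (spark.shuffle.compress=false,
    // so no compression on top), and measure the average bytes per record.
    val sampleRows = projected.sample(withReplacement = false, fraction = 0.01).collect()
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    sampleRows.foreach(oos.writeObject)
    oos.close()
    val avgBytesPerRow = bos.size().toDouble / math.max(sampleRows.length, 1)

    // Estimate = rows * serialized bytes per row; this ignores the per-file
    // overhead of the sort-based shuffle writer and the actual row encoding.
    val estimateMB = totalRows * avgBytesPerRow / (1024 * 1024)
    println(f"Estimated shuffle write: $estimateMB%.1f MB")

    spark.stop()
  }
}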

Environment:
#Workers = 2
#Cores/Worker = 4
#Assigned Memory / Worker = 512M

spark.sql.shuffle.partitions=200
spark.shuffle.compress=false
spark.shuffle.memoryFraction=0.1
spark.shuffle.spill=true
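
For completeness, this is a minimal sketch of how I apply those settings when
building the session (equivalent to passing --conf flags to spark-submit);
property names mirror the list above, the app name is arbitrary:

import org.apache.spark.sql.SparkSession

// Same settings as listed above, applied programmatically.
val spark = SparkSession.builder()
  .appName("sort-merge-join-test")
  .config("spark.sql.shuffle.partitions", "200")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.memoryFraction", "0.1") // deprecated as of Spark 1.6
  .config("spark.shuffle.spill", "true")         // ignored as of Spark 1.6
  .getOrCreate()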


