I think the shuffle write size depends not on your data size but on the join
operation. Perhaps your join doesn't need to shuffle much data because the
table's data is already on the right partitions, in which case little or no
shuffle write is needed. Is that possible?
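
To make that concrete, here is a minimal sketch (Scala DataFrame API; the
paths and the join key "id" are hypothetical) of how pre-partitioning both
sides on the join key lets the sort-merge join reuse the existing
partitioning, so the join stage itself shows little or no shuffle write:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("shuffle-write-demo").getOrCreate()

// Hypothetical inputs; replace with your own tables.
val left  = spark.read.parquet("hdfs:///data/left")
val right = spark.read.parquet("hdfs:///data/right")

// Repartition both sides on the join key with the same partition count.
// This step itself shuffles, but the subsequent sort-merge join can reuse
// the resulting hash partitioning and does not need another exchange.
val leftPart  = left.repartition(200, left("id"))
val rightPart = right.repartition(200, right("id"))

val joined = leftPart.join(rightPart, "id")
joined.explain()   // count the Exchange operators in the physical plan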

2015-12-25 0:53 GMT+08:00 gsvic <victora...@gmail.com>:

> Is there any formula with which I could determine Shuffle Write before
> execution?
>
> For example, in a sort-merge join, the stage in which the first table is
> loaded reports a shuffle write of 429.2 MB. The table is 5.5 GB in HDFS
> with a block size of 128 MB, so it is loaded in 45 tasks/partitions.
> How does this 5.5 GB result in 429 MB of shuffle write? Could I determine
> it before execution?
>
> Environment:
> #Workers = 2
> #Cores/Worker = 4
> #Assigned Memory / Worker = 512M
>
> spark.shuffle.partitions=200
> spark.shuffle.compress=false
> spark.shuffle.memoryFraction=0.1
> spark.shuffle.spill=true
>
>
>
>
