Hi all,

I'm using Spark 2.0.0 on HDP 2.5.0 to build a Spark SQL app. Below is the spark-submit configuration:

spark-submit \
  --class "FASTMDTFlow" \
  --master yarn \
  --deploy-mode client \
  --driver-memory 12g \
  --num-executors 110 \
  --executor-memory 8g \
  --executor-cores 3 \
  --conf "spark.driver.maxResultSize=2g" \
  --conf "spark.sql.codegen=true" \
  --conf "spark.default.parallelism=360" \
  --conf "spark.sql.shuffle.partitions=800" \
  my.jar
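For reference, the entry point launched by this command is roughly the following skeleton. Apart from the class name FASTMDTFlow, everything here is my simplification; the real code only differs in the SQL it runs:

import org.apache.spark.sql.SparkSession

object FASTMDTFlow {
  def main(args: Array[String]): Unit = {
    // In yarn-client mode the executor counts and memory sizes all come
    // from the spark-submit flags above, so the app only obtains the session.
    val spark = SparkSession.builder()
      .appName("FASTMDTFlow")
      .enableHiveSupport()   // assumption: the source tables live in the HDP Hive metastore
      .getOrCreate()

    // ... the SQL statements described below go here ...

    spark.stop()
  }
}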
The app is written in Scala using plain SQL statements (the SQL text is simply embedded in the Scala code). The statements are made up of avg/sum aggregate functions together with a GROUP BY clause.

While the app is running, only one executor in the "Aggregated Metrics by Executor" table on the Spark UI shows a "Shuffle Read Size / Records" value (about 100 GB); the others show zero. I read this as meaning there is only one reduce (shuffle) task doing all of the shuffle reading and writing. Is that interpretation correct? (If this information is not enough, please tell me.) A sketch of the kind of statement the app runs follows below my signature.

By Luo Hai
Best Wishes!
des...@163.com
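Here is the sketch mentioned above. The table and column names are made up; only the avg/sum aggregates and the GROUP BY clause match the real SQL:

// Hypothetical statement shape, run with the SparkSession from the skeleton above.
val result = spark.sql(
  """SELECT key_col,
    |       AVG(metric_a) AS avg_a,
    |       SUM(metric_b) AS sum_b
    |FROM   source_table
    |GROUP  BY key_col""".stripMargin)

// Writing the result out triggers the shuffle whose read size shows up in
// "Aggregated Metrics by Executor" on the UI.
result.write.mode("overwrite").parquet("/output/path")   // placeholder path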