Hi all,
    I'm using Spark 2.0.0 on HDP 2.5.0 to build a Spark SQL app. Below is the
spark-submit configuration:
spark-submit \
  --class "FASTMDTFlow" \
  --master yarn \
  --deploy-mode client \
  --driver-memory 12g \
  --num-executors 110 \
  --executor-memory 8g \
  --executor-cores 3 \
  --conf "spark.driver.maxResultSize=2g" \
  --conf "spark.sql.codegen=true" \
  --conf "spark.default.parallelism=360" \
  --conf "spark.sql.shuffle.partitions=800" \
  my.jar

The app is written in Scala with plain SQL statements (the SQL is simply embedded
as strings inside the Scala code). The SQL statements consist of 'avg' and 'sum'
aggregate functions together with a 'group by' clause; a minimal sketch of what I
mean is below.
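
This is only a sketch of the query shape; the table and column names (mdt_records, cell_id, rsrp, volume, mdt_agg_result) are placeholders, not my real schema:

import org.apache.spark.sql.SparkSession

object FASTMDTFlow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FASTMDTFlow")
      .enableHiveSupport()
      .getOrCreate()

    // the SQL is embedded as a plain string: avg/sum aggregates plus group by
    val result = spark.sql(
      """SELECT cell_id,
        |       AVG(rsrp)   AS avg_rsrp,
        |       SUM(volume) AS sum_volume
        |FROM   mdt_records
        |GROUP  BY cell_id""".stripMargin)

    result.write.mode("overwrite").saveAsTable("mdt_agg_result")
    spark.stop()
  }
}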
While the app was running, in the 'Aggregated Metrics by Executor' table on the
Spark UI, only one executor showed a non-zero 'Shuffle Read Size / Records' value
(about 100 GB); all the other executors showed zero.
I read this as meaning there was only one reduce (shuffle) task reading and
writing the data.
Is that interpretation correct?
(If this information is not enough, please tell me.)
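
In case it is useful, this is the kind of check I could run to see whether the group-by key itself is heavily skewed (again, table and column names are just placeholders):

    // count rows per group key and list the largest groups
    spark.sql(
      """SELECT cell_id, COUNT(*) AS cnt
        |FROM   mdt_records
        |GROUP  BY cell_id
        |ORDER  BY cnt DESC
        |LIMIT  20""".stripMargin).show(false)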




By Luo Hai 
Best Wishes!
des...@163.com
