Hi all,

I'm using Spark 2.0.0 on HDP 2.5.0 to build a Spark SQL app. Below is the spark-submit configuration:

spark-submit \
  --class "FASTMDTFlow" \
  --master yarn \
  --deploy-mode client \
  --driver-memory 12g \
  --num-executors 110 \
  --executor-memory 8g \
  --executor-cores 3 \
  --conf "spark.driver.maxResultSize=2g" \
  --conf "spark.sql.codegen=true" \
  --conf "spark.default.parallelism=360" \
  --conf "spark.sql.shuffle.partitions=800" \
  my.jar
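For reference, the entry point launched by this command is roughly the following skeleton. Apart from the class name FASTMDTFlow, everything here is my simplification; the real code only differs in the SQL it runs:

import org.apache.spark.sql.SparkSession

object FASTMDTFlow {
  def main(args: Array[String]): Unit = {
    // In yarn-client mode the executor counts and memory sizes all come
    // from the spark-submit flags above, so the app only obtains the session.
    val spark = SparkSession.builder()
      .appName("FASTMDTFlow")
      .enableHiveSupport()   // assumption: the source tables live in the HDP Hive metastore
      .getOrCreate()

    // ... the SQL statements described below go here ...

    spark.stop()
  }
}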
The app is written in Scala using plain SQL statements (the SQL text is simply embedded in the Scala code). The statements are made up of avg/sum aggregate functions together with a GROUP BY clause.

While the app is running, only one executor in the "Aggregated Metrics by Executor" table on the Spark UI shows a "Shuffle Read Size / Records" value (about 100 GB); the others show zero. I read this as meaning there is only one reduce (shuffle) task doing all of the shuffle reading and writing. Is that interpretation correct? (If this information is not enough, please tell me.) A sketch of the kind of statement the app runs follows below my signature.

By Luo Hai
Best Wishes!
des...@163.com
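Here is the sketch mentioned above. The table and column names are made up; only the avg/sum aggregates and the GROUP BY clause match the real SQL:

// Hypothetical statement shape, run with the SparkSession from the skeleton above.
val result = spark.sql(
  """SELECT key_col,
    |       AVG(metric_a) AS avg_a,
    |       SUM(metric_b) AS sum_b
    |FROM   source_table
    |GROUP  BY key_col""".stripMargin)

// Writing the result out triggers the shuffle whose read size shows up in
// "Aggregated Metrics by Executor" on the UI.
result.write.mode("overwrite").parquet("/output/path")   // placeholder path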