Hello, I recently ran into a problem that confuses me while using Spark 3.0.
I used the TPCx-BB dataset (200 GB) and ran Query #5 from it. The query reads about 65.7 GB of table data. Query #5 is defined here: https://github.com/NVIDIA/spark-rapids/blob/branch-0.3/integration_tests/src/main/scala/com/nvidia/spark/rapids/tests/tpcxbb/TpcxbbLikeSpark.scala

The execution script is:
---------------------------------------------------------
#! /bin/bash
$Spark_Home/bin/spark-submit \
  --class org.tpcxBB \
  --master spark://10.3.68.116:7077 \
  --executor-memory 40g \
  --total-executor-cores 48 \
  --executor-cores 8 \
  --conf spark.task.cpus=1 \
  --conf spark.driver.memory=40g \
  --conf spark.driver.maxResultSize=60g \
  --conf spark.executor.memory=40g \
  --conf spark.sql.shuffle.partitions=100 \
  /home/runJars/Tpcxbb-1.0.jar "/user/root/benchmarks/data-200g/data" "5"
---------------------------------------------------------

The Spark cluster has three machines, each with 56 CPU cores and 360 GB of memory allocated.

The strange part is this: when I increase the number of CPU cores, the overall execution time of Query #5 goes down (as shown on the history server web UI), but the average time of the tasks that read the input data (510 tasks in total) increases significantly:

When cpu cores = 32, the average task time is 7s
When cpu cores = 48, the average task time is 17s
When cpu cores = 96, the average task time is 20s

Looking into the task logs, I found that for the same data file, the time a task spends reading it grows as the number of CPU cores grows. What is the reason for this?

Next, I kept the total number of executors unchanged and only increased the cores per executor, and the result was the same. The executor memory and driver memory are definitely sufficient.

Can you explain the reason for this? Thank U~
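P.S. In case it helps with the diagnosis, below is a minimal sketch of how I could collect per-task metrics with Spark's SparkListener API and compare run time against CPU time and bytes read for each core count (a large gap between run time and CPU time would point at I/O or scheduling wait rather than computation). This is only an illustrative harness, not part of the benchmark jar; the names ReadMetricsListener, ListenerExample, and the app name are made up for this example.
---------------------------------------------------------
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer

// Illustrative helper: records per-task run time, CPU time, and bytes read
// so runs with different core counts can be compared after the job finishes.
class ReadMetricsListener extends SparkListener {
  case class TaskSample(stageId: Int, runTimeMs: Long, cpuTimeMs: Long, bytesRead: Long)
  // Appended from the listener-bus thread, read only after the job completes.
  val samples = ArrayBuffer[TaskSample]()

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      samples += TaskSample(
        taskEnd.stageId,
        m.executorRunTime,              // wall-clock time of the task, in ms
        m.executorCpuTime / 1000000L,   // CPU time is reported in ns; convert to ms
        m.inputMetrics.bytesRead)       // bytes read from the input source
    }
  }

  // Print per-stage averages: avgRun >> avgCpu suggests tasks are waiting
  // on disk/network rather than doing CPU work.
  def report(): Unit = {
    samples.groupBy(_.stageId).toSeq.sortBy(_._1).foreach { case (stage, ts) =>
      val n = ts.size
      println(f"stage $stage%3d  tasks=$n%4d  " +
        f"avgRun=${ts.map(_.runTimeMs).sum / n}%6d ms  " +
        f"avgCpu=${ts.map(_.cpuTimeMs).sum / n}%6d ms  " +
        f"avgRead=${ts.map(_.bytesRead).sum / n / (1024 * 1024)}%5d MB")
    }
  }
}

object ListenerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("tpcxbb-q5-metrics").getOrCreate()
    val listener = new ReadMetricsListener
    spark.sparkContext.addSparkListener(listener)

    // ... run Query #5 here ...

    listener.report()
    spark.stop()
  }
}
---------------------------------------------------------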