Hello,
    I recently ran a test with Spark 3.2 on the TPC-DS dataset; the total 
TPC-DS data size is 1TB (stored in HDFS). I ran into a result that I can't 
explain, and I hope to get your help.
    The SQL statement tested is as follows:
     select count(*) from (
         select distinct c_last_name, c_first_name, d_date
         from store_sales, date_dim, customer
         where store_sales.ss_sold_date_sk = date_dim.d_date_sk
           and store_sales.ss_customer_sk = customer.c_customer_sk
           and d_month_seq between 1200 and 1200 + 11
     ) hot_cust
     limit 100;

Three tables are used here, and their sizes are: store_sales (387.97 GB), 
date_dim (9.84 GB), customer (1.52 GB).

When submitting the application, the driver and executor parameters are set as 
follows:
DRIVER_OPTIONS="--master spark://xx.xx.xx.xx:7078 --total-executor-cores 48
    --conf spark.driver.memory=100g --conf spark.default.parallelism=100
    --conf spark.sql.adaptive.enabled=false"
EXECUTOR_OPTIONS="--executor-cores 12 --conf spark.executor.memory=50g
    --conf spark.task.cpus=1 --conf spark.sql.shuffle.partitions=100
    --conf spark.sql.crossJoin.enabled=true"
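
For context, here is a minimal sketch of how these option strings can be 
combined into a single submission (this assumes spark-sql as the entry point; 
query.sql is just a placeholder for a file holding the statement above):

    # Sketch only: combine the driver and executor option strings into one call.
    spark-sql \
        $DRIVER_OPTIONS \
        $EXECUTOR_OPTIONS \
        -f query.sql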

The Spark cluster runs in standalone mode on a single physical node with one 
worker: 56 cores and 376 GB of memory in total. I kept the number of executors 
fixed at 4 and varied executor-cores across 2, 4, 8, and 12, so 
total-executor-cores was 8, 16, 32, and 48. The corresponding application 
times were 96s, 66s, 50s, and 49s. Once total-executor-cores reaches 32 or 
more, the total time barely changes.
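
In other words, the four test cases amount to something like the sketch below 
(OTHER_OPTS stands for the unchanged --conf settings listed above; query.sql 
is again just a placeholder):

    # Sketch only: vary executor-cores while keeping the executor count at 4,
    # so total-executor-cores = 4 * executor-cores (8, 16, 32, 48).
    for CORES in 2 4 8 12; do
        spark-sql --master spark://xx.xx.xx.xx:7078 \
            --total-executor-cores $((4 * CORES)) \
            --executor-cores $CORES \
            $OTHER_OPTS \
            -f query.sql
    done
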
I monitored CPU utilization and I/O and found no bottlenecks. I then looked at 
the task times in the longest stage of each run in the Spark web UI (in all 
four cases the longest stage is stage 4, and it took 48s, 28s, 22s, and 23s 
respectively). As the number of CPU cores increases, the task times in stage 4 
also increase: the average task time is 3s, 4s, 5s, and 7s. In addition, each 
executor's first batch of tasks takes longer than the subsequent tasks, and 
the more cores there are, the larger the gap:

Case | total-executor-cores | executor-cores | total app time | stage4 time | avg task time | first-batch tasks | remaining tasks
1    | 8                    | 2              | 96s            | 48s         | 3s            | 5-6s              | 3-4s
2    | 16                   | 4              | 66s            | 28s         | 4s            | 7-8s              | ~4s
3    | 32                   | 8              | 50s            | 23s         | 5s            | 10-12s            | ~5s
4    | 48                   | 12             | 49s            | 22s         | 7s            | 14-15s            | ~6s

This result puzzles me: why does task execution time get longer as the number 
of cores increases? I compared the amount of data processed by each task in 
the different cases, and it is the same in all of them. And why does overall 
execution efficiency stop improving once the core count passes a certain 
point?
Looking forward to your reply.



15927907...@163.com
