Re: Execution efficiency slows down as the number of CPU cores increases

2022-02-10 Thread Mich Talebzadeh
Hi, can you please provide some more info about your Spark cluster setup? You mentioned Hadoop as the underlying storage. I assume that there is data locality between your Spark cluster and the underlying Hadoop. In your SQL statement below select count(*) from ( select *distinct* c_l

Re: Execution efficiency slows down as the number of CPU cores increases

2022-02-10 Thread Sean Owen
With more cores, the job is finishing faster, as expected. My guess is that you are getting more parallelism overall and that speeds things up. However, with more tasks executing concurrently on one machine, you are getting some contention, so it's possible more tasks are taking longer - a little I/O contention, C

Execution efficiency slows down as the number of CPU cores increases

2022-02-10 Thread 15927907...@163.com
Hello, I recently used Spark 3.2 to run a test based on the TPC-DS dataset, and the entire TPC-DS data scale is 1 TB (located in HDFS). But I encountered a problem that I couldn't understand, and I hope to get your help. The SQL statement tested is as follows: select count(*) from (