Hi Senthil,

Which version of Spark 3 are you using? We saw similar behaviour with 
Spark 3.0.2 and 3.1.x, and traced it to a large value we had configured 
for spark.network.timeout. That value did not take effect in any release 
prior to 3.0.2; this was fixed as part of 
https://issues.apache.org/jira/browse/SPARK-33557. 
Because the large spark.network.timeout value was now actually being applied, 
TPCDS queries took a long time when run on Spark 3.0.2 and 3.1.x. Once we 
corrected the setting, we observed that the queries executed much faster.
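
In case it helps with checking your own configuration, below is a minimal sketch in plain Python (it does not use Spark itself) that looks for spark.network.timeout in the text of a spark-defaults.conf style file and converts it to seconds for comparison with the documented default of 120s. The helper names, the sample file contents and the 10000s value are hypothetical and only for illustration, and the time-string parsing is a simplified subset of what Spark accepts.

```python
import re

# Documented default for spark.network.timeout, in seconds.
DEFAULT_NETWORK_TIMEOUT_S = 120

# Simplified subset of Spark's time suffixes (assumption: sub-second
# units such as "ms" are not needed for this check).
_UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def timeout_seconds(value):
    """Convert a time string such as '120s' or '10min' to seconds.

    A bare number is treated as seconds.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognised time string: {value!r}")
    number, unit = match.groups()
    if unit and unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return int(number) * _UNIT_SECONDS.get(unit, 1)


def find_network_timeout(conf_text):
    """Return the last spark.network.timeout value in the text, or None."""
    found = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Keys and values are separated by whitespace or '='.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.network.timeout":
            found = parts[1].strip()
    return found


# Hypothetical configuration text, for illustration only.
sample_conf = """
# spark-defaults.conf (example)
spark.executor.memory   8g
spark.network.timeout   10000s
"""

configured = find_network_timeout(sample_conf)
if configured is None:
    print("spark.network.timeout is not set; the default applies")
else:
    seconds = timeout_seconds(configured)
    if seconds > DEFAULT_NETWORK_TIMEOUT_S:
        print(f"spark.network.timeout={configured} exceeds the default "
              f"of {DEFAULT_NETWORK_TIMEOUT_S}s")
```

The same check applies to values passed with --conf on the command line, since those override the defaults file.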

Thanks and Regards,
Abhishek

From: Senthil Kumar <sen...@gmail.com>
Sent: Sunday, December 19, 2021 11:58 PM
To: dev <dev@spark.apache.org>
Subject: Spark 3 is Slower than Spark 2 for TPCDS Q04 query.

Hi All,

We are comparing Spark 2.4.5 and Spark 3 (without enabling Spark 3's additional 
features) with TPCDS queries and found that Spark 3's performance is at least 
30-40% lower than that of Spark 2.4.5.

For example:

Data size used: 1 TB

Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at least 2.5 min.

Note: We tested this on the same cluster with the same data size, and we 
ensured that the parameters we passed were identical for Spark 2.4.* and Spark 
3.*.

Has anyone else encountered the same issue in their benchmarking 
activities? If so, please share your input on what could be the 
reason behind this poor performance.

--
Senthil kumar
