Hi All,

We are comparing Spark 2.4.5 and Spark 3 (without enabling any of Spark 3's
additional features) using TPC-DS queries, and found that Spark 3's
performance is at least 30-40% worse than that of Spark 2.4.5.

For example, with 1 TB of data:

Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at least 2.5
min.
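
To make the Q4 numbers concrete, here is a rough sketch of the arithmetic. The function name `compare_runtimes` is just an illustrative helper, and the inputs are the two Q4 timings quoted above; depending on whether you look at runtime or throughput, the same measurements give quite different percentages.

```python
def compare_runtimes(baseline_min, candidate_min):
    """Return (runtime increase %, throughput drop %) of candidate vs. baseline."""
    # How much longer the candidate takes, relative to the baseline runtime.
    runtime_increase = (candidate_min - baseline_min) / baseline_min * 100
    # How much less work per unit time the candidate does (throughput ~ 1 / runtime).
    throughput_drop = (1 - baseline_min / candidate_min) * 100
    return runtime_increase, throughput_drop


# Q4 timings from this mail: 1.5 min on Spark 2.4.5, 2.5 min on Spark 3.*
increase, drop = compare_runtimes(1.5, 2.5)
print(f"runtime increase: {increase:.1f}%")
print(f"throughput drop:  {drop:.1f}%")
```

So for Q4 the runtime is about 67% longer, which corresponds to roughly a 40% drop in throughput.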

Note: We tested this on the same cluster with the same data size, and we
ensured that the parameters we passed were identical for Spark 2.4.* and
Spark 3.*.
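
One caveat: passing the same parameters does not necessarily mean the effective configuration is the same, because defaults for options we did not set may differ between releases. A minimal sketch of how the two effective configurations could be compared is below. It assumes each configuration has been dumped into a plain dict (for instance from the output of `SET -v` on each version); the keys and values shown are hypothetical placeholders, not real Spark settings or measured values.

```python
def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    missing = "<unset>"  # marker for a key present on only one side
    keys = sorted(set(old) | set(new))
    return {
        key: (old.get(key, missing), new.get(key, missing))
        for key in keys
        if old.get(key, missing) != new.get(key, missing)
    }


# Hypothetical placeholder dumps; replace with the real effective configs.
conf_2_4_5 = {"example.option.a": "true", "example.option.b": "200"}
conf_3 = {
    "example.option.a": "false",
    "example.option.b": "200",
    "example.option.c": "64m",
}

differences = diff_configs(conf_2_4_5, conf_3)
for key, (old_value, new_value) in differences.items():
    print(f"{key}: {old_value} -> {new_value}")
```

Any key that shows up in such a diff would be a candidate to pin explicitly on both versions before re-running the query.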

Has anyone else encountered the same issue in their benchmarking
activities? If so, please share your thoughts on what could be the
reason behind this poor performance.

-- 
Senthil kumar
