@Abhishek, we use Spark 3.1.*

On Mon, 20 Dec 2021, 09:50 Rao, Abhishek (Nokia - IN/Bangalore), <abhishek....@nokia.com> wrote:
> Hi Senthil,
>
> Which version of Spark 3 are you using? We made a similar observation
> with Spark 3.0.2 and 3.1.x, but then we figured out that we had configured
> a large value for spark.network.timeout, and this value did not take
> effect in any release prior to 3.0.2.
>
> This was fixed as part of
> https://issues.apache.org/jira/browse/SPARK-33557. Because we had
> configured a large value for spark.network.timeout, TPCDS queries took a
> long time when run with Spark 3.0.2 and 3.1.x. Once we corrected it, we
> observed that the queries executed much faster.
>
> Thanks and Regards,
> Abhishek
>
> *From:* Senthil Kumar <sen...@gmail.com>
> *Sent:* Sunday, December 19, 2021 11:58 PM
> *To:* dev <dev@spark.apache.org>
> *Subject:* Spark 3 is Slower than Spark 2 for TPCDS Q04 query.
>
> Hi All,
>
> We are comparing Spark 2.4.5 and Spark 3 (without enabling Spark 3's
> additional features) with TPCDS queries and found that Spark 3's
> performance is at least 30-40% lower than that of Spark 2.4.5.
>
> E.g.:
>
> Data size used: 1TB
>
> Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at least 2.5 min.
>
> Note: We tested this in the same cluster with the same size of data, and
> we ensured that the parameters we passed are the same for Spark 2.4.* and
> Spark 3.*.
>
> It would be helpful to know whether any of you have encountered the same
> issue in your benchmarking activities. If so, please share your input on
> what could be the reason behind this poor performance.
>
> --
> Senthil kumar
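
For anyone who wants to rule out the spark.network.timeout cause described above, here is a minimal Python sketch that scans the text of a spark-defaults.conf style file and flags an unusually large value. Everything here is an assumption for illustration, not part of Spark: the function names, the simplified time-suffix parsing, and the "10x the default" warning threshold are all hypothetical. The only Spark-specific fact relied on is the documented default of 120s for spark.network.timeout.

```python
import re

# Documented Spark default for spark.network.timeout, in seconds.
DEFAULT_NETWORK_TIMEOUT_S = 120

# Simplified subset of Spark's time suffixes (assumption: enough for this check).
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def parse_spark_time(value, default_unit="s"):
    """Convert a string such as '120s', '2min' or '120' to seconds."""
    match = re.fullmatch(r"(\d+)\s*(us|ms|s|m|min|h|d)?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or default_unit]


def read_conf(text):
    """Parse 'key value' or 'key=value' lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def check_network_timeout(conf_text, warn_factor=10):
    """Return (timeout_in_seconds, flagged).

    flagged is True when the configured timeout exceeds warn_factor times
    the default. warn_factor is an arbitrary, hypothetical threshold.
    """
    raw = read_conf(conf_text).get("spark.network.timeout")
    seconds = DEFAULT_NETWORK_TIMEOUT_S if raw is None else parse_spark_time(raw)
    return seconds, seconds > DEFAULT_NETWORK_TIMEOUT_S * warn_factor


if __name__ == "__main__":
    sample = "# example settings\nspark.network.timeout 10000s\nspark.executor.memory 8g\n"
    print(check_network_timeout(sample))
```

This only inspects a config file's text. Values passed with --conf on the command line or set in application code would need to be checked separately, for example in the Environment tab of the Spark UI.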