@Abhishek, we use Spark 3.1.x.
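
Since 3.1.x is among the releases where, per Abhishek's note below, spark.network.timeout takes effect, one quick check is whether the configured value is far above the default. What follows is only a minimal sketch in plain Python, assuming the value is copied by hand from the job's configuration. The helper names and the 5x threshold are hypothetical, treating a bare number as seconds is an assumption of the sketch, and 120s is the default documented for spark.network.timeout.

```python
import re

# Documented default for spark.network.timeout, in seconds.
DEFAULT_NETWORK_TIMEOUT_S = 120

# Common suffixes in Spark-style time strings, converted to seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def timeout_seconds(value):
    """Convert a time string such as '120s' or '10min' to seconds.

    Assumption of this sketch: a bare number is treated as seconds.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognised time string: {value!r}")
    number, unit = match.groups()
    unit = unit or "s"
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unrecognised time unit: {unit!r}")
    return int(number) * _UNIT_SECONDS[unit]


def is_oversized(value, factor=5.0):
    """Hypothetical rule of thumb: flag values above `factor` times the default."""
    return timeout_seconds(value) > factor * DEFAULT_NETWORK_TIMEOUT_S


# Example values are made up for illustration, not taken from this thread.
for configured in ("120s", "10min", "10000s"):
    print(configured, timeout_seconds(configured), is_oversized(configured))
```

The threshold is arbitrary. The point is only to make an unusually large setting stand out before comparing Spark 2 and Spark 3 timings.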

On Mon, 20 Dec 2021, 09:50 Rao, Abhishek (Nokia - IN/Bangalore), <
abhishek....@nokia.com> wrote:

> Hi Senthil,
>
>
>
> Which version of Spark 3 are you using? We saw the same kind of slowdown
> with Spark 3.0.2 and 3.1.x, and then figured out that we had configured a
> large value for spark.network.timeout, which did not take effect in any
> release prior to 3.0.2.
>
> This was fixed as part of
> https://issues.apache.org/jira/browse/SPARK-33557. Because we had
> configured a large value for spark.network.timeout, the TPCDS queries took
> a long time when run with Spark 3.0.2 and 3.1.x. Once we corrected it, the
> queries ran much faster.
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Senthil Kumar <sen...@gmail.com>
> *Sent:* Sunday, December 19, 2021 11:58 PM
> *To:* dev <dev@spark.apache.org>
> *Subject:* Spark 3 is Slower than Spark 2 for TPCDS Q04 query.
>
>
>
> Hi All,
>
> We are comparing Spark 2.4.5 and Spark 3 (without enabling the additional
> Spark 3 features) on TPCDS queries and found that Spark 3's performance is
> at least 30-40% lower than that of Spark 2.4.5.
>
>
>
> E.g.:
>
> Data size used: 1 TB
>
>
> Spark 2.4.5 finishes Q4 in 1.5 min, but Spark 3.* takes at least 2.5
> min.
>
>
>
> Note: We tested this on the same cluster with the same data size, and we
> ensured that the parameters we passed were identical for Spark 2.4.* and
> Spark 3.*.
>
>
>
> Has anyone else encountered the same issue in their benchmarking
> activities? If so, please share your input on what could be the reason
> behind this poor performance.
>
>
>
> --
>
> Senthil kumar
>
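
As a rough check on the Q4 timings quoted in the original message (1.5 min on Spark 2.4.5, 2.5 min on Spark 3), the sketch below expresses the same gap in two common ways. The function name is hypothetical and the inputs are simply the two figures from the thread. Which definition the original "30-40%" refers to is not stated, so this is an illustration rather than a restatement of the author's calculation.

```python
def slowdown(baseline_min, candidate_min):
    """Return (extra runtime %, throughput drop %) of candidate vs. baseline."""
    # How much longer the candidate takes, relative to the baseline runtime.
    extra_runtime_pct = (candidate_min - baseline_min) / baseline_min * 100
    # How much less work per unit time the candidate completes.
    throughput_drop_pct = (1 - baseline_min / candidate_min) * 100
    return extra_runtime_pct, throughput_drop_pct


extra, drop = slowdown(1.5, 2.5)
print(f"{extra:.0f}% longer runtime, {drop:.0f}% lower throughput")
# prints "67% longer runtime, 40% lower throughput"
```

Stating which of the two measures is meant makes benchmark reports easier to compare across Spark versions.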
