Re: Spark Union performance issue

2023-02-22 Thread Prem Sahoo
Please see inline comments. "So you union two tables, union the result with another one, and finally with a last one?" First, union of 2 tables = Result1; second, union of another 2 tables = Result2; third, Result1 union Result2 = finalResult. "How many columns do all these tables have?" Each has around 700.

Re: Spark Union performance issue

2023-02-22 Thread Zhiyuan Lin
Hi Spark devs, I'm experiencing a Union performance degradation as well. Since this thread is closely related, I'm posting here to see whether anyone has any insights. *Background*: After upgrading a Spark job from Spark 2.4 to Spark 3.1 without any code change, we saw *big performance degradation*

Re: Spark Union performance issue

2023-02-22 Thread Enrico Minack
So you union two tables, union the result with another one, and finally with a last one? How many columns do all these tables have? Are you sure creating the plan depends on the number of rows? Enrico On 22.02.23 at 19:08, Prem Sahoo wrote: Here is the missing information: 1. Spark 3.2.0 2.

Re: Spark Union performance issue

2023-02-22 Thread Prem Sahoo
Here is the missing information: 1. Spark 3.2.0 2. It is Scala-based 3. Size of tables will be ~60G 4. The explain plan for Catalyst shows a lot of time being spent creating the plan 5. Number of unioned tables is 2, then another 2, then finally 2. Slowness is providing result as the data size & colum

Re: Spark Union performance issue

2023-02-22 Thread Enrico Minack
Plus, the number of unioned tables would be helpful, as well as which downstream operations are performed on the unioned tables. And what "performance issues" exactly do you measure? Enrico On 22.02.23 at 16:50, Mich Talebzadeh wrote: Hi, a few details will help: 1. Spark version 2. Spark SQL,

Re: Spark Union performance issue

2023-02-22 Thread Mich Talebzadeh
Hi, a few details will help: 1. Spark version 2. Spark SQL, Scala or PySpark 3. Size of tables in join 4. What does explain() or the joining operation show? HTH

Spark Union performance issue

2023-02-22 Thread Prem Sahoo
Hello Team, We are observing Spark Union performance issues when unioning big tables with lots of rows. Do we have any options apart from Union?