So you union two tables, union the result with another one, and finally
union that with a last one?
How many columns do all these tables have?
Are you sure creating the plan depends on the number of rows?
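That is, a chain like the following? (A minimal Scala sketch; the
table names and the session setup are placeholders, not from this
thread.)

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("union-sketch").getOrCreate()

    // Hypothetical tables standing in for the real ones.
    val t1 = spark.table("table1")
    val t2 = spark.table("table2")
    val t3 = spark.table("table3")
    val t4 = spark.table("table4")

    // Union two tables, union the result with another one, and
    // finally with a last one. union() matches columns by position,
    // so all the schemas must line up.
    val result = t1.union(t2).union(t3).union(t4)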
Enrico
On 22.02.23 at 19:08, Prem Sahoo wrote:
here is the missing information:
1. Spark 3.2.0
2. it is Scala based
3. the size of the tables will be ~60 GB
4. the explain plan shows that lots of time is being spent in
Catalyst creating the plan (see the sketch after this list)
5. the number of unioned tables is 2, then another 2, and finally 2
The slowness shows up in producing the result as the data size and
the number of columns increase.
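One way to separate planning time from execution time (a sketch; it
assumes an active SparkSession `spark` and that the unioned DataFrame
is called `result`):

    // Forces analysis and Catalyst optimization only; no data is read.
    spark.time(result.queryExecution.optimizedPlan)

    // Prints the parsed, analyzed, optimized and physical plans.
    result.explain("extended")

    // Triggers actual execution, for comparison.
    spark.time(result.count())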
On Wed, Feb 22, 2023 at 11:07 AM Enrico Minack
<i...@enrico.minack.dev> wrote:
Plus, the number of unioned tables would be helpful, as well as which
downstream operations are performed on the unioned tables.
And what "performance issues" exactly do you measure?
Enrico
On 22.02.23 at 16:50, Mich Talebzadeh wrote:
Hi,
A few details will help:
1. Spark version
2. Spark SQL, Scala or PySpark
3. size of the tables in the union
4. What does explain() on the union show? (see the sketch below)
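For item 4, something along these lines produces the plan output (a
minimal sketch; the table names are placeholders and an active
SparkSession `spark` is assumed):

    val df = spark.table("table1").union(spark.table("table2"))
    df.explain()            // physical plan only
    df.explain("formatted") // Spark 3.x: one section per operator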
HTH
View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh
Disclaimer: Use it at your own risk. Any and all responsibility
for any loss, damage or destruction of data or any other property
which may arise from relying on this email's technical content is
explicitly disclaimed. The author will in no case be liable for
any monetary damages arising from such loss, damage or destruction.
On Wed, 22 Feb 2023 at 15:42, Prem Sahoo <prem.re...@gmail.com>
wrote:
Hello Team,
We are observing Spark union performance issues when unioning
big tables with lots of rows. Do we have any option apart
from union?
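For reference, such a union is typically written along these lines (a
sketch; the table names are placeholders, an active SparkSession
`spark` is assumed, and unionByName is simply the by-name variant of
union):

    import org.apache.spark.sql.DataFrame

    // Hypothetical list of same-schema tables to combine.
    val tables: Seq[DataFrame] = Seq("table1", "table2", "table3")
      .map(spark.table)

    // union() matches columns by position; unionByName() matches by name.
    val combined = tables.reduce(_ unionByName _)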