Re: Spark Explain Plan and Joins

2022-02-20 Thread Mich Talebzadeh
Do a Google search for *sort-merge spark*; there are plenty of notes and examples on the topic. Nested-loop joins (NLJ), sort-merge joins, hash joins and their derivatives are common join algorithms in database systems; they were not invented by Spark. At a given time, there are reasons why one specific join is preferred over
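To make the three algorithm families concrete, here is a minimal pure-Python sketch of an inner equi-join done three ways, assuming small in-memory lists of `(key, value)` tuples with integer keys. It only illustrates the classic algorithms. It is not Spark's implementation, which runs these ideas over partitioned, shuffled or broadcast data (they surface in a physical plan as operators such as `BroadcastNestedLoopJoin`, `BroadcastHashJoin` and `SortMergeJoin`). The function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[int, str]          # hypothetical row shape: (join key, payload)
Joined = Tuple[int, str, str]  # (key, left payload, right payload)


def nested_loop_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Compare every left row with every right row: O(n * m) comparisons."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def hash_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Build a hash table on the right side, then probe it with each left row."""
    table = defaultdict(list)
    for rk, rv in right:                     # build phase
        table[rk].append(rv)
    return [(lk, lv, rv)                     # probe phase
            for lk, lv in left
            for rv in table.get(lk, [])]


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Sort both sides on the key, then walk them in step, pairing equal-key runs."""
    l = sorted(left, key=lambda row: row[0])
    r = sorted(right, key=lambda row: row[0])
    i = j = 0
    out: List[Joined] = []
    while i < len(l) and j < len(r):
        lk, rk = l[i][0], r[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(l) and l[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == rk:
                j_end += 1
            # Emit the cross product of the two equal-key runs.
            for _, lv in l[i:i_end]:
                for _, rv in r[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
    for join in (nested_loop_join, hash_join, sort_merge_join):
        print(join.__name__, join(left, right))
```

All three return the same rows. They differ in cost: the nested loop compares every pair, the hash join needs one side small enough to hold in a hash table, and the sort-merge join pays for sorting but then needs only a single pass and copes with large inputs on both sides.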

Docker images for Spark 3.1.1 and Spark 3.1.2 with Java 11 and Java 8 from docker hub

2022-02-20 Thread Mich Talebzadeh
I have uploaded Docker images to my repository on Docker Hub, and it is public. They are built on Spark 3.1.2 or 3.1.1, with Scala 2.12 and Java 11 or Java 8, on the jre-slim-buster OS image. The ones built on 3.1.1 with Java 8 should work with GCP. No additional packages are added to PySpark i

Re: Docker images for Spark 3.1.1 and Spark 3.1.2 with Java 11 and Java 8 from docker hub

2022-02-20 Thread Mich Talebzadeh
Added Docker images for Spark 3.2.1 with the default 11-jre-slim-buster base for spark and spark-py. HTH view my LinkedIn profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility fo

Re: Spark Explain Plan and Joins

2022-02-20 Thread Sid
Thank you so much for your reply, Mich. I will go through it. However, I want to understand how to read this plan. If I face any errors, or if I want to see how Spark is cost-optimizing, how should I approach it? Could you explain it in layman's terms? Thanks, Sid On Sun, 20 Feb 2022, 17:50 Mich T
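When reading a physical plan, the name of the join operator tells you which algorithm Spark picked. The sketch below is a deliberately simplified, hypothetical model of that choice for an inner join. It only encodes two real settings with their default values, `spark.sql.autoBroadcastJoinThreshold` (10 MB; -1 disables broadcasting) and `spark.sql.join.preferSortMergeJoin` (true). It ignores join hints, the extra size conditions Spark applies before choosing a shuffled hash join, and runtime changes made by Adaptive Query Execution, so treat it as a reading aid rather than Spark's actual selection logic.

```python
# Default of spark.sql.autoBroadcastJoinThreshold (10 MB); -1 disables broadcasting.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def choose_join(left_bytes: int,
                right_bytes: int,
                is_equi_join: bool,
                threshold: int = DEFAULT_BROADCAST_THRESHOLD,
                prefer_sort_merge: bool = True) -> str:
    """Hypothetical, simplified model of join selection for an INNER join.

    Returns the physical operator name you would typically look for in
    explain() output. Not Spark's real code path.
    """
    # A side can be broadcast if its estimated size fits under the threshold.
    broadcastable = threshold >= 0 and min(left_bytes, right_bytes) <= threshold

    if is_equi_join:
        if broadcastable:
            return "BroadcastHashJoin"   # ship the small side to every executor
        if not prefer_sort_merge:
            return "ShuffledHashJoin"    # shuffle both sides, hash per partition
        return "SortMergeJoin"           # shuffle and sort both sides, then merge
    # Non-equi conditions (e.g. a.x < b.y) cannot use hashing or merging on keys.
    if broadcastable:
        return "BroadcastNestedLoopJoin"
    return "CartesianProduct"


if __name__ == "__main__":
    mb, gb = 1024 * 1024, 1024 ** 3
    print(choose_join(5 * mb, 1 * gb, is_equi_join=True))
    print(choose_join(1 * gb, 2 * gb, is_equi_join=True))
    print(choose_join(5 * mb, 1 * gb, is_equi_join=True, threshold=-1))
```

In practice, running `explain()` on a DataFrame and finding, say, `SortMergeJoin` preceded by `Exchange` and `Sort` nodes is the pattern this model predicts for two large inputs joined on equality keys.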

Re: Spark Explain Plan and Joins

2022-02-20 Thread Mich Talebzadeh
Hi Sid, This article is concise and fairly up to date: Spark’s Logical and Physical plans … When, Why, How and Beyond. It is a good start. If, after reading it, anything needs explaining, re

Re: Spark Explain Plan and Joins

2022-02-20 Thread Gourav Sengupta
Hi, what are you trying to achieve with this? If performance has deteriorated, try collecting the query execution runtime statistics from Spark SQL. They can be seen in the Spark SQL UI and, if I am not wrong, are also available over APIs. Please ensure that you are not trying to over automa