Hi Everyone,
Thanks a lot for your answers. They helped me a lot to clear up the concept :)
Best,
Sid
On Mon, Aug 1, 2022 at 12:17 AM Vinod KC wrote:
Hi Sid,
This example code, with its output, should add some more clarity:

spark-shell --conf spark.sql.shuffle.partitions=3 --conf spark.sql.autoBroadcastJoinThreshold=-1

scala> import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.DataFrame

scala> import org.apache.spark.sql.func
One option is to create a separate salted column in table A and use it as the partition key, while using the original column for joining.
Ayan
On Sun, 31 Jul 2022 at 6:45 pm, Jacob Lynn wrote:
The key is this line from Amit's email (emphasis added):
> Change the join_col to *all possible values* of the salt.
The two tables are treated asymmetrically:
1. The skewed table gets random salts appended to the join key.
2. The other table gets all possible salts appended to the join key (e.g
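For illustration, the asymmetric scheme can be checked on toy data with plain Scala collections. No Spark is needed: paste it into the Scala REPL or spark-shell with :paste. The table contents, numSalts and the key_salt string format are assumptions made for this sketch, not anyone's code from this thread. The sketch compares the salted inner join with the ordinary one.

```scala
import scala.util.Random

// Hypothetical toy data modelled on this thread's example:
// Table A (col1, join_col) is skewed on x1; Table B (join_col, col3) has one row per key.
val tableA = Seq("a1" -> "x1", "a2" -> "x1", "a3" -> "x1", "a4" -> "x2", "a5" -> "x2", "a6" -> "x3")
val tableB = Seq("x1" -> "b1", "x2" -> "b2", "x3" -> "b3")

val numSalts = 3 // assumption: number of salt buckets
val rng = new Random(42)

// 1. Skewed table: append ONE random salt in [0, numSalts) to each key.
val saltedA = tableA.map { case (col1, key) => (col1, s"${key}_${rng.nextInt(numSalts)}") }

// 2. Other table: emit one copy of each row per possible salt value.
val saltedB = tableB.flatMap { case (key, col3) =>
  (0 until numSalts).map(salt => (s"${key}_$salt", col3))
}

// Inner join on the salted key (nested loops are fine for toy data).
val saltedJoin = for ((col1, ka) <- saltedA; (kb, col3) <- saltedB if ka == kb) yield (col1, col3)

// Reference result: the ordinary inner join on the unsalted key.
val plainJoin = for ((col1, ka) <- tableA; (kb, col3) <- tableB if ka == kb) yield (col1, col3)

println(saltedJoin.sorted == plainJoin.sorted) // true
```

Every salted key on the skewed side has exactly one partner on the replicated side, so the join output is unchanged. What changes is that the hot key x1 is split into several distinct keys.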
Hi Sid,
I am not sure I understood your question.
But the keys cannot be different in the two tables after salting; this is
what I have shown in the explanation.
You salt Table A and then explode Table B to create all possible values.
In your case, I do not understand why Table B has x_8/9. It
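Here is a minimal Spark sketch of "salt Table A, explode Table B" with the DataFrame API. It assumes Spark 3.x with spark-sql on the classpath. In spark-shell, spark and its implicits are already in scope, so the builder and implicits lines can be skipped.

The table contents, the column names col1, join_col, col3 and salted_key, and numSalts = 3 are all illustrative assumptions. explode over an array of literal salts is what turns each Table B row into one row per salt value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, concat_ws, explode, floor, lit, rand}

// In spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("salted-join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical toy tables: Table A is skewed on x1.
val tableA = Seq(
  ("a1", "x1"), ("a2", "x1"), ("a3", "x1"), ("a4", "x2"), ("a5", "x2"), ("a6", "x3")
).toDF("col1", "join_col")
val tableB = Seq(("x1", "b1"), ("x2", "b2"), ("x3", "b3")).toDF("join_col", "col3")

val numSalts = 3 // assumption: number of salt buckets

// Salt Table A: join_col + "_" + a random integer in [0, numSalts).
val saltedA = tableA.withColumn(
  "salted_key",
  concat_ws("_", col("join_col"), floor(rand(42) * numSalts).cast("string")))

// Explode Table B: one row per possible salt value, so every salted A key has a partner.
val saltedB = tableB
  .withColumn("salt", explode(array((0 until numSalts).map(i => lit(i)): _*)))
  .withColumn("salted_key", concat_ws("_", col("join_col"), col("salt").cast("string")))
  .drop("join_col", "salt")

// Inner join on the salted key; the original join_col from Table A is kept for output.
val joined = saltedA.join(saltedB, Seq("salted_key")).select("col1", "join_col", "col3")
joined.show()
```

Each run may place the x1 rows under different salted keys, but the join result stays the same. A larger numSalts can spread the hot key over more shuffle partitions, at the cost of replicating Table B numSalts times.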
Hi Amit,
Thanks for your reply. However, your answer doesn't seem different from
what I explained.
My question is: if the keys are different after salting, as in my example,
then the join would return no results, assuming an inner join, because
even though the keys are segregat
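To make that concern concrete, here is a small plain-Scala sketch of what happens when *both* tables get an independent random salt. The toy keys, numSalts and the key_salt format are assumptions for illustration.

A left row survives the inner join only if its salt happens to equal the salt drawn for the matching right row. On average only about 1 in numSalts left rows match, which is 2 of 6 here. Replicating the non-skewed table for every salt value, as described elsewhere in this thread, removes that loss.

```scala
import scala.util.Random

// Hypothetical toy keys: the left side is skewed on x1.
val leftKeys  = Seq("x1", "x1", "x1", "x2", "x2", "x3")
val rightKeys = Seq("x1", "x2", "x3")
val numSalts  = 3 // assumption: number of salt buckets
val rng = new Random(7)

// Appends one random salt in [0, numSalts) to a key, e.g. "x1" -> "x1_2".
def randomSalt(key: String): String = s"${key}_${rng.nextInt(numSalts)}"

// Unsalted inner join: every left row finds its match.
val plainMatches = leftKeys.count(k => rightKeys.contains(k))

// Both sides salted independently: a left "x1_0" only matches if the single
// right-side "x1" row happened to draw salt 0 as well.
val saltedLeft    = leftKeys.map(randomSalt)
val saltedRight   = rightKeys.map(randomSalt)
val saltedMatches = saltedLeft.count(k => saltedRight.contains(k))

println(s"unsalted: $plainMatches") // unsalted: 6
println(s"salted on both sides at random: $saltedMatches")
```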
Hi Sid,
Salting is normally a technique to add random characters to existing values.
In big data, we can use salting to deal with data skew.
Salting in a join can be used as follows:
*Table A*
Col1, join_col, where join_col values are {x1, x2, x3}
x1
x1
x1
x2
x2
x3
*Table B*
join_col, Col3, where j