Re: Salting technique doubt

2022-08-03 Thread Sid
Hi Everyone, Thanks a lot for your answers. They helped me a lot in clearing up the concept :) Best, Sid On Mon, Aug 1, 2022 at 12:17 AM Vinod KC wrote: > Hi Sid, > This example code with output will add some more clarity > > spark-shell --conf spark.sql.shuffle.partitions=3 --conf >> spark.sql.autoBr

Re: Salting technique doubt

2022-07-31 Thread Vinod KC
Hi Sid,

This example code with output will add some more clarity:

    spark-shell --conf spark.sql.shuffle.partitions=3 --conf spark.sql.autoBroadcastJoinThreshold=-1

    scala> import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.DataFrame

    scala> import org.apache.spark.sql.func

Re: Salting technique doubt

2022-07-31 Thread ayan guha
One option is to create a separate salted column in table A. Use it as the partition key, and use the original column for joining. Ayan On Sun, 31 Jul 2022 at 6:45 pm, Jacob Lynn wrote: > The key is this line from Amit's email (emphasis added): > > > Change the join_col to *all possible values* of the s

Re: Salting technique doubt

2022-07-31 Thread Jacob Lynn
The key is this line from Amit's email (emphasis added):

> Change the join_col to *all possible values* of the salt.

The two tables are treated asymmetrically:

1. The skewed table gets random salts appended to the join key.
2. The other table gets all possible salts appended to the join key (e.g
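
The asymmetric treatment described above can be sketched without Spark, using plain Scala collections. This is only an illustration under stated assumptions: the row values, the salt range `numSalts`, the `_` separator, and the helper `innerJoin` are all hypothetical and do not come from the thread.

```scala
import scala.util.Random

// Hypothetical salt range: salts are 0, 1, 2.
val numSalts = 3
// Fixed seed only so that reruns give the same salts.
val rng = new Random(42)

// Rows are (joinKey, payload) pairs. "x1" is the hot key on the skewed side.
val skewed = Seq("x1" -> "a1", "x1" -> "a2", "x1" -> "a3",
                 "x2" -> "a4", "x2" -> "a5", "x3" -> "a6")
val other  = Seq("x1" -> "b1", "x2" -> "b2", "x3" -> "b3")

// 1. Skewed side: append ONE random salt to each row's key.
val skewedSalted = skewed.map { case (k, v) => (s"${k}_${rng.nextInt(numSalts)}", v) }

// 2. Other side: replicate each row once per possible salt.
val otherExploded = for {
  (k, v) <- other
  salt   <- 0 until numSalts
} yield (s"${k}_$salt", v)

// Naive inner join on the key; returns the matched payload pairs.
def innerJoin(left: Seq[(String, String)],
              right: Seq[(String, String)]): Seq[(String, String)] =
  for {
    (lk, lv) <- left
    (rk, rv) <- right
    if lk == rk
  } yield (lv, rv)

val plain  = innerJoin(skewed, other)
val salted = innerJoin(skewedSalted, otherExploded)

// Whatever salt a skewed row drew, exactly one replicated row matches it,
// so the salted join returns the same pairs as the unsalted join.
assert(plain.sorted == salted.sorted)
```

The cost of the approach is visible in `otherExploded`: the non-skewed side grows by a factor of `numSalts`.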

Re: Salting technique doubt

2022-07-31 Thread Amit Joshi
Hi Sid, I am not sure I understood your question. However, the keys cannot differ between the two tables after salting; this is what I showed in the explanation. You salt Table A and then explode Table B to create all possible values. In your case, I do not understand why Table B has x_8/9. It

Re: Salting technique doubt

2022-07-31 Thread Sid
Hi Amit, Thanks for your reply. However, your answer doesn't seem different from what I have explained. My question is: if the keys are different after salting, as in my example, then the join would return no results (assuming an inner join), because even though the keys are segregat
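
The concern raised here can be reproduced in a few lines of plain Scala (no Spark). This is a sketch under assumptions: the rows, the salt values, and the helper `joinOnKey` are hypothetical, and the original example from the message is not reproduced. It shows that an inner join is empty when each side is salted independently, and that the matches return once the second table carries every possible salt.

```scala
// Rows are (joinKey, payload) pairs; all values are illustrative.
val tableA = Seq("x" -> "a1", "x" -> "a2", "x" -> "a3")
val tableB = Seq("x" -> "b1")

// Table A: each row gets a salt from 0..2 (assigned by position here,
// instead of randomly, so the outcome is deterministic).
val saltedA = tableA.zipWithIndex.map { case ((k, v), i) => (s"${k}_$i", v) }

// Mistake: Table B gets a single salt that Table A never used.
val saltedBOnce = tableB.map { case (k, v) => (s"${k}_8", v) }

// Fix: Table B is replicated once per salt that Table A can carry.
val saltedBAll = for {
  (k, v) <- tableB
  salt   <- 0 until 3
} yield (s"${k}_$salt", v)

// Naive inner join on the key; returns the matched payload pairs.
def joinOnKey(left: Seq[(String, String)],
              right: Seq[(String, String)]): Seq[(String, String)] =
  for {
    (lk, lv) <- left
    (rk, rv) <- right
    if lk == rk
  } yield (lv, rv)

println(joinOnKey(saltedA, saltedBOnce).size) // prints 0: no salted keys match
println(joinOnKey(saltedA, saltedBAll).size)  // prints 3: every row of A finds its match
```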

Re: Salting technique doubt

2022-07-30 Thread Amit Joshi
Hi Sid, Salting is normally a technique to add random characters to existing values. In big data we can use salting to deal with skew. Salting in a join can be used as follows: *Table A* - Col1, join_col, where the join_col values are {x1, x2, x3}: x1 x1 x1 x2 x2 x3 *Table B* - join_col, Col3, where j
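
To make the skew argument concrete, here is a minimal plain-Scala sketch (no Spark) that takes the Table A join_col values listed above and compares the largest group of identical keys before and after salting. The number of salts (`saltBuckets`), the `_` separator, and the helper `largestGroup` are assumptions made for illustration. Salts are assigned by row position rather than randomly, so that the result is deterministic.

```scala
// join_col values of Table A: x1 appears three times, so it is the hot key.
val joinCol = Seq("x1", "x1", "x1", "x2", "x2", "x3")

// Hypothetical number of distinct salts.
val saltBuckets = 3

// Size of the largest group of identical keys, i.e. the most rows that
// would be sent to the same place when grouping by key.
def largestGroup(keys: Seq[String]): Int =
  keys.groupBy(identity).values.map(_.size).max

// Append a salt to every key (row position modulo saltBuckets).
val saltedCol = joinCol.zipWithIndex.map { case (k, i) => s"${k}_${i % saltBuckets}" }

println(largestGroup(joinCol))   // prints 3 (the three x1 rows)
println(largestGroup(saltedCol)) // prints 1 (x1_0, x1_1, x1_2 are now distinct keys)
```

With random salts the split is usually less even, but the hot key is still spread over up to `saltBuckets` distinct keys.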