Re: Salting technique doubt

Amit Joshi Sun, 31 Jul 2022 01:41:46 -0700

Hi Sid,

I am not sure I understood your question.
But the keys cannot differ between the two tables after salting; that is
what I showed in my explanation.
You salt Table A, and then explode Table B so that it contains every
possible salted value of each key.


In your case, I do not understand why Table B has x_8/x_9. It should contain
every value from the range that you used to create the salt.
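
To make the matching concrete, here is a minimal sketch in plain Python (a
stand-in for illustration, not the Spark API). It assumes a salt range of
1..2 as in my example. The names SALT_BUCKETS, salt_rows, explode_rows and
inner_join, and the Col1/Col3 values a1.. and b1.., are made up for this
sketch.

```python
import random

SALT_BUCKETS = 2  # assumed salt range 1..2, matching "Salt value = 2"

# Table A is (Col1, join_col) and is skewed on x1.
table_a = [("a1", "x1"), ("a2", "x1"), ("a3", "x1"),
           ("a4", "x2"), ("a5", "x2"), ("a6", "x3")]
# Table B is (join_col, Col3).
table_b = [("x1", "b1"), ("x2", "b2")]


def salt_rows(rows, buckets, rng):
    """Append one random salt in 1..buckets to the join key of each row."""
    return [(col1, f"{key}_{rng.randint(1, buckets)}") for col1, key in rows]


def explode_rows(rows, buckets):
    """Replicate each row once per possible salt value (no randomness)."""
    return [(f"{key}_{salt}", col3)
            for key, col3 in rows
            for salt in range(1, buckets + 1)]


def inner_join(left, right):
    """Inner join (col1, key) rows with (key, col3) rows on the salted key."""
    lookup = {}
    for key, col3 in right:
        lookup.setdefault(key, []).append(col3)
    return [(col1, key, col3)
            for col1, key in left
            for col3 in lookup.get(key, [])]


rng = random.Random(0)
salted_a = salt_rows(table_a, SALT_BUCKETS, rng)
exploded_b = explode_rows(table_b, SALT_BUCKETS)
joined = inner_join(salted_a, exploded_b)

# Strip the salt again to compare with the result of an unsalted join.
result = sorted((col1, key.rsplit("_", 1)[0], col3)
                for col1, key, col3 in joined)
print(result)
```

Whichever salt a row of Table A draws, Table B holds that same salted key,
so every row that matched before salting still matches exactly once. Only
Table A is salted randomly; Table B is expanded deterministically.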

I hope you understand.

Thanks



On Sun, Jul 31, 2022 at 10:02 AM Sid <[email protected]> wrote:

> Hi Amit,
>
> Thanks for your reply. However, your answer doesn't seem different from
> what I have explained.
>
> My question is: if the keys are different after salting, as in my example,
> then the join would return no results (assuming an inner join). Even though
> the keys are spread across different partitions based on unique keys, they
> do not match, because x_1/x_2 != x_8/x_9.
>
> How do you ensure that the results are matched?
>
> Best,
> Sid
>
> On Sun, Jul 31, 2022 at 1:34 AM Amit Joshi <[email protected]>
> wrote:
>
>> Hi Sid,
>>
>> Salting is normally a technique to add random characters to existing
>> values.
>> In big data we can use salting to deal with skew.
>> Salting in a join can be used as follows:
>> *Table A-*
>> Col1, join_col , where join_col values are {x1, x2, x3}
>> x1
>> x1
>> x1
>> x2
>> x2
>> x3
>>
>> *Table B-*
>> join_col, Col3 , where join_col values are {x1, x2}
>> x1
>> x2
>>
>> *Problem:* Let's say that for Table A, the data is skewed on x1.
>> Now salting goes like this. *Salt value = 2*
>> For *Table A,* create a new column by salting the join column:
>> *New_Join_Col*
>> x1_1
>> x1_2
>> x1_1
>> x2_1
>> x2_2
>> x3_1
>>
>> For *Table B,*
>> change the join_col to all possible values of the salt.
>> join_col
>> x1_1
>> x1_2
>> x2_1
>> x2_2
>>
>> And then join them like this:
>> tableA.join(tableB, tableA.new_join_col == tableB.join_col)
>>
>> Let me know if you have any questions.
>>
>> Regards
>> Amit Joshi
>>
>>
>> On Sat, Jul 30, 2022 at 7:16 PM Sid <[email protected]> wrote:
>>
>>> Hi Team,
>>>
>>> I was trying to understand the Salting technique for the column where
>>> there would be a huge load on a single partition because of the same keys.
>>>
>>> I watched a YouTube video and came away with the understanding below:
>>>
>>> So, using the salting technique we can actually change the joining
>>> column values by appending some random number in a specified range.
>>>
>>> So, suppose I have these two values in a partition of two different
>>> tables:
>>>
>>> Table A:
>>> Partition1:
>>> x
>>> .
>>> .
>>> .
>>> x
>>>
>>> Table B:
>>> Partition1:
>>> x
>>> .
>>> .
>>> .
>>> x
>>>
>>> After Salting it would be something like the below:
>>>
>>> Table A:
>>> Partition1:
>>> x_1
>>>
>>> Partition 2:
>>> x_2
>>>
>>> Table B:
>>> Partition1:
>>> x_3
>>>
>>> Partition 2:
>>> x_8
>>>
>>> Now, when I inner join these two tables after salting in order to avoid
>>> data skewness problems, I won't get a match since the keys are different
>>> after applying salting techniques.
>>>
>>> So how does this resolve the data skewness issue, or is there some
>>> gap in my understanding?
>>>
>>> Could anyone help me in layman's terms?
>>>
>>> TIA,
>>> Sid
>>>
>>
