Hi Team,

I am trying to understand the salting technique for a join column where a
single partition gets a huge load because many rows share the same key.

I watched a YouTube video and came away with the following understanding:

With salting, we change the values of the join column by appending a random
number from a specified range to each key.
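
To show what I mean, here is a minimal sketch in plain Python (not Spark).
The helper name salt_key and the salt range of 10 are my own assumptions,
chosen only for illustration:

```python
import random
from collections import Counter

def salt_key(key, num_salts, rng):
    """Append a random salt in [0, num_salts) to the key, e.g. 'x' -> 'x_3'."""
    return f"{key}_{rng.randrange(num_salts)}"

rng = random.Random(42)            # fixed seed so the run is repeatable
table_a_keys = ["x"] * 1000        # one hot key repeated many times
salted = [salt_key(k, 10, rng) for k in table_a_keys]

# The single hot key is now spread over up to 10 distinct salted keys,
# so the rows no longer have to land in one partition.
counts = Counter(salted)
print(len(counts), max(counts.values()))
```

As far as I understand, this spreading of one hot key over several salted
keys is the whole point of the technique.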

So, suppose both tables have the same key x repeated many times in a single
partition:

Table A:
Partition1:
x
.
.
.
x

Table B:
Partition1:
x
.
.
.
x

After salting, it would look something like this:

Table A:
Partition1:
x_1

Partition2:
x_2

Table B:
Partition1:
x_3

Partition2:
x_8

Now, when I inner join these two tables after salting (to avoid the data
skew problem), I won't get a match, because the salted keys in the two
tables are different.
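
To make the concern concrete, here is a small sketch in plain Python (not
Spark) of what I think happens when both tables are salted independently.
The helper names and the salt range are made up for illustration:

```python
import random
from collections import Counter

NUM_SALTS = 10  # assumed salt range, purely illustrative

def salt_randomly(keys, num_salts, rng):
    """Append an independent random salt to every key."""
    return [f"{k}_{rng.randrange(num_salts)}" for k in keys]

def inner_join_count(left_keys, right_keys):
    """Number of row pairs an inner join on these keys would produce."""
    right_counts = Counter(right_keys)
    return sum(right_counts[k] for k in left_keys)

rng = random.Random(0)
table_a = ["x"] * 100
table_b = ["x"] * 100

# Without salting, every row of A matches every row of B.
expected_pairs = inner_join_count(table_a, table_b)

# With independent random salts on both sides, a pair only matches
# when both rows happen to draw the same salt.
salted_pairs = inner_join_count(
    salt_randomly(table_a, NUM_SALTS, rng),
    salt_randomly(table_b, NUM_SALTS, rng),
)
print(expected_pairs, salted_pairs)
```

The salted join should return fewer pairs than the unsalted one, which is
exactly the lost-match problem I am asking about.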

So how does this resolve the data skew issue? Or is there a gap in my
understanding?

Could anyone explain this in layman's terms?

TIA,
Sid
