DataFrames have a partitionBy function too.
You can avoid a shuffle if one of your datasets is small enough to
broadcast.
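
For illustration, here is a minimal sketch of a broadcast join in Scala. It assumes Spark 2.x or later running in local mode. The object name, column names (key, value, label), and sample rows are hypothetical stand-ins for your two parsed text files, not your actual data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  // Joins `large` with `small` on the string column "key", hinting that
  // `small` should be shipped whole to every executor instead of shuffled.
  def broadcastJoin(large: DataFrame, small: DataFrame): DataFrame =
    large.join(broadcast(small), Seq("key"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data standing in for the two loaded text files.
    val large = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")
    val small = Seq(("a", "alpha"), ("b", "beta")).toDF("key", "label")

    val joined = broadcastJoin(large, small)

    // Print the physical plan; a broadcast join normally shows up as
    // BroadcastHashJoin rather than SortMergeJoin.
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

Note that broadcast() is only a hint, and the small side has to fit in memory on the driver and on each executor. Spark may also pick a broadcast join on its own when it estimates one side to be below spark.sql.autoBroadcastJoinThreshold.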
On Thu., 4 Jul. 2019, 7:34 am Mkal wrote:
Please keep in mind I'm fairly new to Spark.
I have some Spark code where I load two text files as datasets and, after some
map and filter operations to bring the columns into a specific shape, I join
the datasets.
The join takes place on a common column (of type string).
Is there any way to avoid the