Re: Attempting to avoid a shuffle on join

2019-07-05 Thread Chris Teoh

DataFrames have a partitionBy function too. You can avoid a shuffle if one of your datasets is small enough to broadcast.

On Thu., 4 Jul. 2019, 7:34 am Mkal wrote:
> Please keep in mind I'm fairly new to Spark.
> I have some Spark code where I load two text files as datasets and after
> some
>
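To illustrate why broadcasting the small side avoids a shuffle, here is a minimal plain-Python sketch of the idea (it does not use Spark itself). The function name `broadcast_hash_join` and the sample data are hypothetical and exist only for illustration. In PySpark the corresponding hint is `pyspark.sql.functions.broadcast`, and Spark may also choose this strategy on its own when a table is under `spark.sql.autoBroadcastJoinThreshold`.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Inner-join every partition of the large side against one copy of the small side.

    Hypothetical helper for illustration only; it models the idea, not Spark's internals.
    """
    # Build the lookup table once. In Spark, this copy of the small
    # dataset would be sent to every executor.
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    joined_partitions = []
    for partition in large_partitions:
        # Each partition is joined where it already lives: no row of the
        # large side has to move to another partition, so no shuffle.
        joined = [
            (key, left, right)
            for key, left in partition
            for right in lookup.get(key, [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


# Two partitions of the large dataset, keyed by a string column.
large = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
# The small dataset that fits in memory.
small = [("a", "x"), ("b", "y")]

result = broadcast_hash_join(large, small)
```

The partition layout of the large side is unchanged after the join, which is the property that makes a broadcast join cheaper than repartitioning both sides on the join key.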

Attempting to avoid a shuffle on join

2019-07-03 Thread Mkal

Please keep in mind I'm fairly new to Spark. I have some Spark code where I load two text files as datasets and, after some map and filter operations to bring the columns into a specific shape, I join the datasets. The join takes place on a common column (of type string). Is there any way to avoid the