Hello All, Hope this email finds you well. I have a dataframe of size 8TB (Parquet, Snappy-compressed). When I group it by a column, I get a much smaller aggregated dataframe of about 700 rows (just two columns: key and count). When I try to broadcast this aggregated result as shown below, Spark throws a "dataframe cannot be broadcasted" error.
df_agg = df.groupBy('column1').count().cache()
# df_agg.count()
df_join = df.join(broadcast(df_agg), 'column1', 'left_outer')
df_join.write.parquet('PATH')

The same code works with an input df of 3TB without any modifications. Any suggestions?

--
Regards,
Rishi Shah
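For context, the intended behavior of the broadcast hint here is to ship the small aggregate to every executor as an in-memory hash map, so each row of the big side is joined by lookup rather than by shuffle. A plain-Python sketch of that lookup logic (illustration only, not Spark code; the function name and sample data are hypothetical):

```python
def broadcast_left_outer_join(big_rows, small_rows):
    # small_rows: (key, count) pairs -- the "broadcast" side,
    # turned into an in-memory hash map, like Spark's broadcast hash join
    lookup = {key: count for key, count in small_rows}
    # left outer semantics: every big-side row survives;
    # keys missing from the broadcast side get None
    return [(key, payload, lookup.get(key)) for key, payload in big_rows]

big = [('a', 1), ('b', 2), ('c', 3)]   # stand-in for the 8TB side
agg = [('a', 10), ('b', 20)]           # stand-in for the ~700-row aggregate
print(broadcast_left_outer_join(big, agg))
# -> [('a', 1, 10), ('b', 2, 20), ('c', 3, None)]
```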