Thanks Amit, I was referring to dynamic partition pruning (
https://issues.apache.org/jira/browse/SPARK-11150) & adaptive query
execution (https://issues.apache.org/jira/browse/SPARK-31412) in Spark 3,
where it would figure out the right partitions & push the filters to the
input before applying the join.
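For example, roughly this kind of setup (a sketch only; the table and column names are placeholders, not my actual job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dpp-aqe-sketch")
  // Dynamic partition pruning (SPARK-11150); enabled by default in 3.x
  .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
  // Adaptive query execution (SPARK-31412); enabled by default from 3.2
  .config("spark.sql.adaptive.enabled", "true")
  .getOrCreate()

// Placeholder tables: a large fact table partitioned by date_key joined
// to a small, filtered dimension. DPP derives a partition filter for the
// fact side from the dimension filter before the join runs.
val facts = spark.table("sales")
val dims  = spark.table("dates").filter("year = 2020")
facts.join(dims, "date_key").explain()   // plan should show dynamicpruning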
Hi Rishi,
Maybe you have already done these steps.
Can you check the size of the dataframe you are trying to broadcast using
logInfo(SizeEstimator.estimate(df))
and adjust the driver memory accordingly?
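For example, something along these lines (a sketch; df stands in for whatever frame you plan to broadcast):

import org.apache.spark.util.SizeEstimator

// Collecting the frame first mirrors what a broadcast join makes the
// driver hold, so the estimate reflects the real driver-side footprint
// rather than just the size of the plan object.
val rows = df.collect()
val estimatedBytes = SizeEstimator.estimate(rows)
println(s"Estimated broadcast payload: $estimatedBytes bytes")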
There is one more issue which I found in Spark 2:
broadcast does not work on cached data. It is poss
Thanks Amit. I have tried increasing driver memory, and also tried increasing
the max result size returned to the driver. Nothing works; I believe Spark is
not able to determine that the result to be broadcast is small
enough because the input data is huge. When I tried this in 2 stages, write out
Hi,
I think the problem lies with driver memory. Broadcast in Spark works by
collecting all the data to the driver and then the driver broadcasting it to
all the executors. A different strategy could be employed for the transfer,
like BitTorrent, though.
Please try increasing the driver memory. See if it works.
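For example (values are only illustrative, not a recommendation):

import org.apache.spark.sql.SparkSession

// Illustrative values only. spark.driver.memory must be set before the
// driver JVM starts (spark-submit --driver-memory, or spark-defaults.conf);
// setting it on an already-running session has no effect.
val spark = SparkSession.builder()
  .config("spark.driver.memory", "8g")
  .config("spark.driver.maxResultSize", "4g")
  .getOrCreate()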
Regards