Okay, from looking more closely at some of the code, I'm not sure that what
I'm asking for in terms of adaptive execution makes much sense, as it can only
happen between stages, i.e. optimising future /stages/ based on the results
of previous stages. Thus an "on-demand" adaptive coalesce doesn't make much
Hi angers.zhu,
Thanks for pointing me towards that PR. I think the main issue there is that
the coalesce operation requires an additional computation, which in this case
is undesirable. It also approximates the row size rather than just directly
using the partition size. Thus it has the potential t
Hi all,
Do you mean something like this?
https://github.com/apache/spark/pull/27248/files
If you need it, I can raise a PR to add a SizeBasedCoalescer.
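
To make the idea concrete, here is a rough sketch of what size-based
coalescing could look like. It is plain Python rather than Spark code and is
not taken from the PR above: the function name group_by_size and the
target_bytes parameter are made up for illustration, and it assumes the
per-partition sizes are already known. It greedily packs adjacent partitions
into groups that stay within a target size, so no shuffle is implied.

```python
from typing import List


def group_by_size(partition_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Greedily pack adjacent partition indices into groups.

    Hypothetical helper: a group is closed as soon as adding the next
    partition would push it past target_bytes. A single partition larger
    than target_bytes ends up in a group of its own.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0

    for index, size in enumerate(partition_sizes):
        # Close the current group if this partition would overflow it.
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size

    # Flush the last, possibly partially filled, group.
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sizes = [10, 20, 70, 5, 200, 30]  # made-up partition sizes in bytes
    print(group_by_size(sizes, target_bytes=100))
```

Only neighbouring partitions are merged here, which keeps the grouping simple
but can leave small groups next to an oversized partition.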
On Tue, Mar 30, 2021 at 9:06 PM, mhawes wrote:
> Hi Pol, I had considered repartitioning but the main issue for me there is
> that it will trigger a shuffle and could si