Re: [Spark Core]: Adding support for size based partition coalescing

2021-03-31 Thread mhawes
Okay, from looking closer at some of the code, I'm not sure that what I'm asking for in terms of adaptive execution makes much sense, as it can only happen between stages, i.e. optimising future /stages/ based on the results of previous stages. Thus an "on-demand" adaptive coalesce doesn't make much

Re: [Spark Core]: Adding support for size based partition coalescing

2021-03-31 Thread mhawes
Hi angers.zhu, thanks for pointing me towards that PR. I think the main issue there is that the coalesce operation requires an additional computation, which in this case is undesirable. It also approximates the row size rather than just directly using the partition size. Thus it has the potential t

Re: [Spark Core]: Adding support for size based partition coalescing

2021-03-31 Thread angers zhu
Hi all, do you mean something like this: https://github.com/apache/spark/pull/27248/files? If you need, I can raise a PR to add a SizeBasedCoaleaser. mhawes wrote on Tue, 30 Mar 2021 at 9:06 PM: > Hi Pol, I had considered repartitioning but the main issue for me there is > that it will trigger a shuffle and could si
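
For readers following the thread, the idea of coalescing on partition size directly (rather than on an estimated row size, and without a shuffle) can be sketched roughly as below. This is only an illustration in plain Python, not Spark code and not the approach taken in the linked PR: the function name `coalesce_by_size`, its parameters, and the greedy adjacent-grouping strategy are all assumptions made for the example.

```python
from typing import List


def coalesce_by_size(partition_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Hypothetical helper: greedily group adjacent partition indices so that
    each group's total size stays at or below target_bytes where possible.

    A single partition larger than target_bytes ends up in a group of its own,
    since coalescing can only merge partitions, not split them.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0

    for index, size in enumerate(partition_sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size

    # Flush the last, possibly partial, group.
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Example sizes in bytes (made-up values for illustration only).
    print(coalesce_by_size([10, 20, 30, 100, 5, 5], target_bytes=64))
```

Because only neighbouring partitions are merged, no data has to move between groups, which is the property that distinguishes this kind of coalesce from a repartition. The trade-off is that the resulting groups can be uneven when the input sizes are skewed.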