Re: Bridging gap between Spark UI and Code

2021-05-21 Thread mhawes
Reviving this thread to ask whether any of the Spark maintainers would consider helping to scope a solution for this. Michal outlines the problem in this thread, but to clarify: the issue is that for very complex Spark applications, where the Logical Plans often span many pages, it is extremely hard

Re: [Spark Core]: Adding support for size based partition coalescing

2021-05-21 Thread mhawes
Adding /another/ update to say that I'm currently planning on using a recently introduced feature whereby calling `.repartition()` with no args will cause the dataset to be optimised by AQE. This actually suits our use-case perfectly! Example: sparkSession.conf().set("spark.sql.adaptive.e