Reviving this thread to ask whether any of the Spark maintainers would
consider helping to scope a solution for this. Michal outlines the problem
in this thread, but to clarify. The issue is that for very complex spark
application where the Logical Plans often span many pages, it is extremely
hard
Adding /another/ update to say that I'm currently planning on using a
recently introduced feature whereby calling `.repartition()` with no args
will cause the dataset to be optimised by AQE. This actually suits our
use-case perfectly!
Example:
sparkSession.conf().set("spark.sql.adaptive.e