Adding /another/ update to say that I'm currently planning on using a recently introduced feature whereby calling `.repartition()` with no args lets AQE (adaptive query execution) optimise the dataset's partitioning by coalescing the shuffle partitions. This actually suits our use-case perfectly!
Example:

    sparkSession.conf().set("spark.sql.adaptive.enabled", "true");
    Dataset<Long> dataset = sparkSession.range(1, 4, 1, 4).repartition();
    assertThat(dataset.rdd().collectPartitions().length).isEqualTo(1); // true

Relevant PRs/Issues:

[SPARK-31220][SQL] repartition obeys initialPartitionNum when adaptiveExecutionEnabled
https://github.com/apache/spark/pull/27986

[SPARK-32056][SQL] Coalesce partitions for repartition by expressions when AQE is enabled
https://github.com/apache/spark/pull/28900

[SPARK-32056][SQL][Follow-up] Coalesce partitions for repartiotion hint and sql when AQE is enabled
https://github.com/apache/spark/pull/28952
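
As an aside, for anyone who would rather not set this in code: the same switch can be applied through configuration. Below is a minimal sketch of a `spark-defaults.conf` fragment. The second key is my own addition and is not something I mentioned above; as far as I know it already defaults to true in Spark 3.0+, so it is only spelled out here to make the dependency on partition coalescing explicit.

```
# Enable adaptive query execution (same setting as in the example above)
spark.sql.adaptive.enabled                      true
# Assumption: listed only for clarity, should already be the default
spark.sql.adaptive.coalescePartitions.enabled   true
```

The first setting can also be passed at submit time with `--conf spark.sql.adaptive.enabled=true`.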