andygrove commented on code in PR #1525:
URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2003626404
##########
docs/source/user-guide/tuning.md:
##########
@@ -141,30 +191,22 @@ It must be set before the Spark context is created. You can enable or disable Co
 at runtime by setting `spark.comet.exec.shuffle.enabled` to `true` or `false`.
 Once it is disabled, Comet will fall back to the default Spark shuffle manager.

-### Shuffle Mode
+### Shuffle Implementations

-Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and Auto Mode.
+Comet provides two shuffle implementations: Native Shuffle and Columnar Shuffle. Comet will use Native Shuffle
+where possible, then will use Columnar Shuffle where possible, and will fall back to Spark for shuffle operations
+that cannot be supported by either.

-#### Auto Mode
+#### Native Shuffle

-`spark.comet.exec.shuffle.mode` to `auto` will let Comet choose the best shuffle mode based on the query plan. This
-is the default.
+Comet provides a fully native shuffle implementation, which generally provides the best performance. However,
+native shuffle currently only supports `HashPartitioning` and `SinglePartitioning` and has some restrictions on
+supported data types.

 #### Columnar (JVM) Shuffle

 Comet Columnar shuffle is JVM-based and supports `HashPartitioning`, `RoundRobinPartitioning`, `RangePartitioning`, and
-`SinglePartitioning`. This mode has the highest query coverage.
-
-Columnar shuffle can be enabled by setting `spark.comet.exec.shuffle.mode` to `jvm`. If this mode is explicitly set,
-then any shuffle operations that cannot be supported in this mode will fall back to Spark.

Review Comment:
This is now explained under ### Shuffle Implementations based on Parth's suggested text.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.