viirya commented on code in PR #1210:
URL: https://github.com/apache/datafusion-comet/pull/1210#discussion_r1902636032
##########
docs/source/user-guide/tuning.md:
##########
@@ -23,11 +23,52 @@ Comet provides some tuning options to help you get the best performance from you
 
 ## Memory Tuning
 
-Comet shares an off-heap memory pool between Spark and Comet. This requires setting `spark.memory.offHeap.enabled=true`.
-If this setting is not enabled, Comet will not accelerate queries and will fall back to Spark.
+### Unified Memory Management with Off-Heap Memory
+
+The recommended way to share memory between Spark and Comet is to set `spark.memory.offHeap.enabled=true`. This allows
+Comet to share an off-heap memory pool with Spark. The size of the pool is specified by `spark.memory.offHeap.size`.
+
+### Dedicated Comet Memory Pools
+
+If the `spark.memory.offHeap.enabled` setting is not enabled then Comet will use its own dedicated memory pools that
+are not shared with Spark. This requires additional configuration settings to be specified to set the size and type of
+memory pool to use.
+
+The size of the pool can be set explicitly with `spark.comet.memoryOverhead`. If this setting is not specified then
+the memory overhead will be calculated by multiplying the executor memory by `spark.comet.memory.overhead.factor`
+(defaults to `0.2`).
+
+The type of pool can be specified with `spark.comet.exec.memoryPool`. The default setting is `greedy_task_shared`.
+
+The valid pool types are:
+
+- `greedy`
+- `greedy_global`
+- `greedy_task_shared`
+- `fair_spill`
+- `fair_spill_global`
+- `fair_spill_task_shared`
+
+Pool types ending with `_global` use a single global memory pool between all tasks.

Review Comment:
```suggestion
Pool types ending with `_global` use a single global memory pool between all tasks on the same executor.
```
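The fallback sizing rule in the diff (use `spark.comet.memoryOverhead` if set, otherwise multiply the executor memory by `spark.comet.memory.overhead.factor`, default `0.2`) can be sketched as below. This is a minimal illustration under stated assumptions, not Comet's implementation: the function name `comet_memory_overhead_mib` is hypothetical, all sizes are assumed to be in MiB, and any minimum floor or rounding that Comet may apply is not modeled.

```python
from typing import Optional


def comet_memory_overhead_mib(executor_memory_mib: int,
                              explicit_overhead_mib: Optional[int] = None,
                              overhead_factor: float = 0.2) -> int:
    """Hypothetical sketch of the dedicated-pool sizing rule from the docs diff.

    explicit_overhead_mib stands in for spark.comet.memoryOverhead and
    overhead_factor for spark.comet.memory.overhead.factor (default 0.2).
    Assumption: sizes are in MiB and the product is truncated to an int;
    no minimum floor is applied here.
    """
    # An explicitly configured overhead takes precedence.
    if explicit_overhead_mib is not None:
        return explicit_overhead_mib
    # Otherwise derive it from the executor memory.
    return int(executor_memory_mib * overhead_factor)


# An 8 GiB executor with no explicit overhead: 0.2 * 8192 MiB, truncated.
print(comet_memory_overhead_mib(8192))
# An explicit setting wins over the factor.
print(comet_memory_overhead_mib(8192, explicit_overhead_mib=2048))
```

With the default factor, an 8192 MiB executor gets 1638 MiB for its dedicated Comet pool in this sketch, while an explicit value is returned unchanged.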