cloud-fan commented on code in PR #52190:
URL: https://github.com/apache/spark/pull/52190#discussion_r2318591245
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala:
##########
@@ -34,30 +34,34 @@ import org.apache.spark.util.collection.unsafe.sort.{UnsafeExternalSorter, Unsaf
 /**
  * An append-only array for [[UnsafeRow]]s that strictly keeps content in an in-memory array
- * until [[numRowsInMemoryBufferThreshold]] is reached post which it will switch to a mode which
- * would flush to disk after [[numRowsSpillThreshold]] is met (or before if there is
- * excessive memory consumption). Setting these threshold involves following trade-offs:
+ * until [[numRowsInMemoryBufferThreshold]] or [[sizeInBytesInMemoryBufferThreshold]] is reached
+ * post which it will switch to a mode (backed by [[UnsafeExternalSorter]]) which would flush to
+ * disk after [[numRowsSpillThreshold]] or [[sizeInBytesSpillThreshold]] is met (or before if there
+ * is excessive memory consumption). Setting these threshold involves following trade-offs:
  *
- * - If [[numRowsInMemoryBufferThreshold]] is too high, the in-memory array may occupy more memory
- *   than is available, resulting in OOM.
- * - If [[numRowsSpillThreshold]] is too low, data will be spilled frequently and lead to
- *   excessive disk writes. This may lead to a performance regression compared to the normal case
- *   of using an [[ArrayBuffer]] or [[Array]].
+ * - If [[numRowsInMemoryBufferThreshold]] and [[sizeInBytesInMemoryBufferThreshold]] are too high,
+ *   the in-memory array may occupy more memory than is available, resulting in OOM.
+ * - If [[numRowsSpillThreshold]] or [[sizeInBytesSpillThreshold]] is too low, data will be spilled
+ *   frequently and lead to excessive disk writes. This may lead to a performance regression compared
+ *   to the normal case of using an [[ArrayBuffer]] or [[Array]].

Review Comment:
```suggestion
 *   frequently and lead to excessive disk writes. This may lead to a performance regression
 *   compared to the normal case of using an [[ArrayBuffer]] or [[Array]].
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
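
For context on the Scaladoc quoted in the diff, the two-stage behavior it describes can be sketched roughly as follows. This is a simplified model, not Spark's implementation. `TwoStageRowBuffer`, `pending`, `spilled`, `spillCount`, and `usesSpillableMode` are hypothetical names, rows are plain byte arrays rather than `UnsafeRow`s, and "spilling" only moves rows to a second buffer instead of writing them to disk through `UnsafeExternalSorter`. The exact comparison used for each threshold (`<` versus `<=`) is also an assumption.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the buffering described in the Scaladoc.
// Parameter names mirror the thresholds in the diff; everything else is illustrative.
final class TwoStageRowBuffer(
    numRowsInMemoryBufferThreshold: Int,
    sizeInBytesInMemoryBufferThreshold: Long,
    numRowsSpillThreshold: Int,
    sizeInBytesSpillThreshold: Long) {

  private val inMemory = ArrayBuffer.empty[Array[Byte]] // stage 1: plain in-memory array
  private val pending = ArrayBuffer.empty[Array[Byte]]  // stage 2: rows not yet spilled
  private val spilled = ArrayBuffer.empty[Array[Byte]]  // stand-in for data on disk
  private var inMemoryBytes = 0L
  private var pendingBytes = 0L
  private var spillable = false
  var spillCount = 0

  def usesSpillableMode: Boolean = spillable
  def length: Int = inMemory.length + pending.length + spilled.length

  def add(row: Array[Byte]): Unit = {
    val fitsInMemory = !spillable &&
      inMemory.length < numRowsInMemoryBufferThreshold &&
      inMemoryBytes + row.length <= sizeInBytesInMemoryBufferThreshold
    if (fitsInMemory) {
      inMemory += row
      inMemoryBytes += row.length
    } else {
      if (!spillable) {
        // An in-memory threshold was reached: hand all buffered rows to stage 2.
        spillable = true
        pending ++= inMemory
        pendingBytes = inMemoryBytes
        inMemory.clear()
        inMemoryBytes = 0L
      }
      pending += row
      pendingBytes += row.length
      // A spill threshold was met: simulate flushing the pending rows to disk.
      if (pending.length >= numRowsSpillThreshold || pendingBytes >= sizeInBytesSpillThreshold) {
        spilled ++= pending
        pending.clear()
        pendingBytes = 0L
        spillCount += 1
      }
    }
  }
}

object TwoStageRowBufferDemo {
  def main(args: Array[String]): Unit = {
    val buf = new TwoStageRowBuffer(2, 1024L, 3, 1024L)
    (1 to 5).foreach(_ => buf.add(new Array[Byte](8)))
    println(s"rows=${buf.length} spills=${buf.spillCount} spillable=${buf.usesSpillableMode}")
  }
}
```

In this model, lowering either spill threshold makes `spillCount` grow faster for the same input, which corresponds to the "excessive disk writes" trade-off in the Scaladoc. Raising the in-memory thresholds keeps more rows in the plain array, which corresponds to the OOM risk.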