viirya commented on code in PR #50301:
URL: https://github.com/apache/spark/pull/50301#discussion_r2029710907


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3391,6 +3391,19 @@ object SQLConf {
       .intConf
       .createWithDefault(10000)

+  val ARROW_EXECUTION_MAX_RECORDS_PER_OUTPUT_BATCH =
+    buildConf("spark.sql.execution.arrow.maxRecordsPerOutputBatch")

Review Comment:
   Vectorized engines usually have a maximum batch size setting, which prevents an overly large batch from being passed as input and possibly causing memory issues. Currently, the Arrow output batch is sent directly as input to downstream operators, which may be custom vectorized operators. If a user is concerned that the output batch might be too large for those operators, they can set this config to limit the output batch size.
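For illustration, a minimal sketch of how a user might set this config. The config key `spark.sql.execution.arrow.maxRecordsPerOutputBatch` is taken from the diff above; the app name and the value `4096` are arbitrary choices for the example, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

object ArrowOutputBatchLimitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ArrowOutputBatchLimitExample")
      // Cap the number of records per Arrow output batch handed to
      // downstream (possibly custom vectorized) operators. The value
      // 4096 is an arbitrary illustration.
      .config("spark.sql.execution.arrow.maxRecordsPerOutputBatch", "4096")
      .getOrCreate()

    // ... run Arrow-backed queries here; output batches would be
    // split so that none exceeds the configured record count.

    spark.stop()
  }
}
```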
########## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ########## @@ -3391,6 +3391,19 @@ object SQLConf { .intConf .createWithDefault(10000) + val ARROW_EXECUTION_MAX_RECORDS_PER_OUTPUT_BATCH = + buildConf("spark.sql.execution.arrow.maxRecordsPerOutputBatch") Review Comment: Vectorized engines usually have a maximum batch size setting which prevents too big batch as input that possibly causes memory issue. For the downstream operators if they are custom vectorized operators, currently the Arrow output batch is sent to them as input. When user worries that the output batch might be too big for these operators, the user can set this config to limit the output batch size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org