2010YOUY01 commented on PR #14766: URL: https://github.com/apache/datafusion/pull/14766#issuecomment-2667753322
Thank you for the help.

This change stops execution once `maxrows` is reached. I think this is the optimal behavior for application developers using `datafusion-cli` for quick experiments. However, DataFusion internal developers might use it to time certain queries, so they may want queries to run to completion. The current behavior can also be useful when the goal is to measure the maximum memory footprint with the result fully materialized (though that is a rarer case).

So I think we need two extra configurations for this purpose:

- `--stop-after-max-rows`, default: false. Controls whether to stop early when `maxrows` is reached. I suspect fewer people use `datafusion-cli` for application purposes, so it defaults to false for accurate timing.
- `--retain-full-results`, default: false. Controls whether to throw away or accumulate result batches after `maxrows` is reached.

I'm not yet sure whether it's okay to register the result sink for a `MemoryReservation`; I'll think about it in the background for a while.
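To make the proposed semantics concrete, here is a minimal sketch of how the two flags might interact. It is not `datafusion-cli` code: `PrintOptions`, `RunSummary`, and `consume` are hypothetical names, batches are modeled as plain row counts instead of Arrow `RecordBatch`es, and the case where both flags are true (stop early but keep everything seen so far) is my own assumption, since the comment does not specify it.

```rust
/// Hypothetical options mirroring the two flags proposed above.
/// These are illustrative only and do not exist in datafusion-cli.
#[derive(Debug, Clone, Copy)]
struct PrintOptions {
    max_rows: usize,
    /// Proposed `--stop-after-max-rows`: stop pulling input once `max_rows` is reached.
    stop_after_max_rows: bool,
    /// Proposed `--retain-full-results`: keep accumulating rows past `max_rows`.
    retain_full_results: bool,
}

/// What happened while draining the (simulated) result stream.
#[derive(Debug, Default, PartialEq)]
struct RunSummary {
    /// Number of batches pulled from the input.
    batches_pulled: usize,
    /// Total rows produced by the batches that were pulled.
    rows_seen: usize,
    /// Rows kept in memory (for display or footprint measurement).
    rows_retained: usize,
}

/// Drain `batches` (each item is a batch's row count) under `opts`.
fn consume<I: IntoIterator<Item = usize>>(batches: I, opts: PrintOptions) -> RunSummary {
    let mut summary = RunSummary::default();
    for rows in batches {
        summary.batches_pulled += 1;
        summary.rows_seen += rows;

        if opts.retain_full_results {
            // Accumulate everything, e.g. to measure the peak memory footprint.
            summary.rows_retained += rows;
        } else {
            // Keep only as many rows as still fit under `max_rows`; drop the rest.
            let room = opts.max_rows.saturating_sub(summary.rows_retained);
            summary.rows_retained += rows.min(room);
        }

        // Early exit trades accurate end-to-end timing for a faster response.
        if opts.stop_after_max_rows && summary.rows_seen >= opts.max_rows {
            break;
        }
    }
    summary
}
```

With three batches of four rows each and `max_rows = 5`, the defaults (both false) pull all three batches but retain only five rows, which keeps timing accurate while bounding memory. Setting `stop_after_max_rows` stops after the second batch, and setting `retain_full_results` keeps all twelve rows.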