2010YOUY01 commented on PR #14766:
URL: https://github.com/apache/datafusion/pull/14766#issuecomment-2667753322

   Thank you for the help. 
   This change will stop execution once `maxrows` is reached. I think this is 
the optimal behavior for application developers using `datafusion-cli` for 
quick experiments.
   However, DataFusion internal developers might use it to time certain 
queries, so they may want queries to run to completion. The current behavior 
can also be useful for measuring the maximum memory footprint with the result 
fully materialized (though that is a rarer case).
   
   So I think we need two extra configuration options for this purpose:
   `--stop-after-max-rows, default: false` -- Controls whether to stop early 
when `maxrows` is reached. I guess fewer people use `datafusion-cli` for 
application purposes, so it defaults to false for accurate timing.
   `--retain-full-results, default: false` -- Controls whether to throw away 
or accumulate result batches after `maxrows` is reached.
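
A minimal sketch of how the two proposed flags could interact, assuming a simplified model in which each result batch is represented only by its row count. `PrintSettings`, `DrainSummary`, and `drain_results` are hypothetical names for illustration, not part of `datafusion-cli`, and the sketch also assumes that batches needed for display are always kept:

```rust
/// Hypothetical settings mirroring the two proposed flags.
#[derive(Clone, Copy)]
pub struct PrintSettings {
    pub max_rows: usize,
    pub stop_after_max_rows: bool,
    pub retain_full_results: bool,
}

/// What happened while draining the (simulated) result stream.
#[derive(Debug, Default, PartialEq)]
pub struct DrainSummary {
    pub batches_pulled: usize,
    pub rows_seen: usize,
    pub rows_printed: usize,
    pub batches_retained: usize,
}

/// Drains a stream of batches, each modeled as a row count.
pub fn drain_results<I>(batches: I, settings: PrintSettings) -> DrainSummary
where
    I: IntoIterator<Item = usize>,
{
    let mut summary = DrainSummary::default();
    for batch_rows in batches {
        summary.batches_pulled += 1;
        summary.rows_seen += batch_rows;

        if summary.rows_printed < settings.max_rows {
            // Still below the limit: this batch contributes to the output
            // (assumption: display batches are always kept).
            let remaining = settings.max_rows - summary.rows_printed;
            summary.rows_printed += batch_rows.min(remaining);
            summary.batches_retained += 1;
        } else if settings.retain_full_results {
            // Past the limit: accumulate only if asked to.
            summary.batches_retained += 1;
        }
        // Otherwise the batch is dropped, but the query keeps running.

        if settings.stop_after_max_rows && summary.rows_printed >= settings.max_rows {
            // Stop pulling batches, which ends execution early.
            break;
        }
    }
    summary
}
```

With four batches of four rows each and `max_rows: 6`, stopping early pulls only two batches, while the other two combinations pull all four and differ only in how many batches stay in memory.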
   
   I'm not yet sure whether it's okay to register the result sink for a 
`MemoryReservation`; I'll think about it in the background for a while.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

