alamb opened a new issue, #14337:
URL: https://github.com/apache/datafusion/issues/14337

   ### Is your feature request related to a problem or challenge?
   
   We rely on 
[`MemoryExec`](https://github.com/alamb/datafusion/blob/f77579108d1dc0285636fbfb24507d2bfca66446/datafusion/physical-plan/src/memory.rs#L54-L53)
 for many of our tests, but `MemoryExec` doesn't support pushing limits down 
(aka `fetch`). As a result, bugs such as the following are harder to 
observe, because plans over in-memory tables never exercise limit pushdown:
   - https://github.com/apache/datafusion/issues/14335
   
   
   
   ### Describe the solution you'd like
   
   I would like `MemoryExec` to support `fetch` pushdown too, so that it mirrors 
the other sources and gives us additional test coverage.
   
   
   ### Describe alternatives you've considered
   
   One way would be:
   1. Add a `MemoryExec::fetch` field similar to 
https://github.com/apache/datafusion/blob/f3b1141d0f417e9d9e6c0ada03592c9d9ec60cd4/datafusion/physical-plan/src/limit.rs#L213-L214
   2. Implement the limit logic like this: 
https://github.com/apache/datafusion/blob/f3b1141d0f417e9d9e6c0ada03592c9d9ec60cd4/datafusion/physical-plan/src/limit.rs#L390-L397
   3. Add the limit to the explain plan display: 
https://github.com/apache/datafusion/blob/f77579108d1dc0285636fbfb24507d2bfca66446/datafusion/physical-plan/src/memory.rs#L71-L78
   4. Add tests here: 
https://github.com/apache/datafusion/blob/f77579108d1dc0285636fbfb24507d2bfca66446/datafusion/physical-plan/src/memory.rs#L862
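
To make the steps above concrete, here is a minimal, standalone sketch of the idea. It is not DataFusion code: `MemorySource`, `Batch`, `with_fetch`, and `execute` are hypothetical names for illustration, and a plain `Vec<i32>` stands in for an Arrow `RecordBatch`. It only shows the shape of the change: an optional `fetch` field, output truncated once the limit is reached, and the limit shown in the display.

```rust
use std::fmt;

/// Stand-in for an Arrow `RecordBatch` (assumption: one i32 per row).
type Batch = Vec<i32>;

/// Hypothetical, simplified model of an in-memory source.
struct MemorySource {
    batches: Vec<Batch>,
    /// Maximum number of rows to emit; `None` means no limit.
    fetch: Option<usize>,
}

impl MemorySource {
    fn new(batches: Vec<Batch>) -> Self {
        Self { batches, fetch: None }
    }

    /// Step 1: record the pushed-down limit on the source.
    fn with_fetch(mut self, fetch: Option<usize>) -> Self {
        self.fetch = fetch;
        self
    }

    /// Step 2: emit batches in order, truncating the batch that
    /// crosses the limit and stopping once the limit is reached.
    fn execute(&self) -> Vec<Batch> {
        let mut remaining = self.fetch.unwrap_or(usize::MAX);
        let mut out = Vec::new();
        for batch in &self.batches {
            if remaining == 0 {
                break;
            }
            let take = batch.len().min(remaining);
            out.push(batch[..take].to_vec());
            remaining -= take;
        }
        out
    }
}

/// Step 3: include the limit in the plan display when it is set.
impl fmt::Display for MemorySource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MemorySource: batches={}", self.batches.len())?;
        if let Some(fetch) = self.fetch {
            write!(f, ", fetch={}", fetch)?;
        }
        Ok(())
    }
}

fn main() {
    let source = MemorySource::new(vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8]])
        .with_fetch(Some(4));
    println!("{}", source);
    println!("{:?}", source.execute());
}
```

In the real operator the truncation would presumably happen on the stream of batches rather than on a materialized `Vec`, but the bookkeeping (a running count of remaining rows) should look similar.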
   
   
   
   ### Additional context
   
   I think this is a relatively self-contained project for a newcomer who has 
written Rust before and wants to get more experience with DataFusion and the 
engine.
   
   The only potential challenge is that this may expose more bugs such as 
https://github.com/apache/datafusion/issues/14335, and it will likely require 
updating a bunch of tests. 
   
   It might be good to do this in a few PRs:
   1. Add the limit to `MemoryExec` and to its display in one PR (but don't 
actually limit the output)
   2. Implement the actual output limits
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org