Re: [PR] Apply filter early in TopK [datafusion]

via GitHub Sat, 14 Jun 2025 10:16:46 -0700


Dandandan commented on PR #16408:
URL: https://github.com/apache/datafusion/pull/16408#issuecomment-2972888776


   > Or you can just push to the main PR, I gave you write access to our fork :)
   > 
   > My one question is: how does this optimization play with filter pushdown?
   > If a child plan accepted the filter as Exact should we then _not_ re-filter? A
   > related question which you've alluded to before: if no child plan accepted the
   > filter at all should we avoid updating it?
   
   I think avoiding running the filter twice in the "Exact" cases would be optimal.
   In practice, though, I am not sure it adds much overhead here: converting to
   rows, comparing against / updating the heap, and running the compaction logic
   will be by far the most expensive parts. I think it will be hard to find a case
   where it is noticeably slower.
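To make the trade-off concrete, here is a minimal sketch of applying the TopK threshold as an early filter, and skipping it when a child is assumed to have already applied the same filter exactly. It is a simplified stand-in, not DataFusion's implementation: `TopK`, `threshold`, `insert_batch` and the `child_applied_exact` flag are hypothetical names, the keys are plain `i64` values (keeping the K smallest), and the heap update stands in for the row conversion and compaction work mentioned above.

```rust
use std::collections::BinaryHeap;

// Hypothetical, simplified TopK (keeps the K smallest i64 keys).
// Not DataFusion's API: all names here are illustrative.
pub struct TopK {
    k: usize,
    // Max-heap: once it holds K values, the root is the current threshold.
    heap: BinaryHeap<i64>,
    // Assumption: true when a child operator already applied the dynamic
    // filter exactly, so filtering again here would be redundant.
    child_applied_exact: bool,
}

impl TopK {
    pub fn new(k: usize, child_applied_exact: bool) -> Self {
        Self { k, heap: BinaryHeap::new(), child_applied_exact }
    }

    /// Only values strictly below the threshold can still enter the heap.
    /// `None` until the heap holds K values.
    pub fn threshold(&self) -> Option<i64> {
        if self.heap.len() == self.k { self.heap.peek().copied() } else { None }
    }

    /// Processes one batch and returns how many values reached the
    /// (expensive) heap path.
    pub fn insert_batch(&mut self, batch: &[i64]) -> usize {
        // Early filter: one cheap comparison per value, skipped when the
        // child is assumed to have applied the same filter exactly.
        let survivors: Vec<i64> = match (self.threshold(), self.child_applied_exact) {
            (Some(t), false) => batch.iter().copied().filter(|&v| v < t).collect(),
            _ => batch.to_vec(),
        };
        for &v in &survivors {
            if self.heap.len() < self.k {
                self.heap.push(v);
            } else if self.heap.peek().map_or(false, |&top| v < top) {
                // Replace the current K-th value with the smaller one.
                self.heap.pop();
                self.heap.push(v);
            }
        }
        survivors.len()
    }

    /// Returns the retained values in ascending order.
    pub fn into_sorted(self) -> Vec<i64> {
        self.heap.into_sorted_vec()
    }
}
```

With `k = 3`, a first batch `[5, 1, 9, 3]` fills the heap and sets the threshold to 5; in a second batch `[7, 2, 8, 6]` only `2` passes the early filter. The filter costs one comparison per value while every survivor still pays for the heap path, which is why skipping the redundant filter in the Exact case is unlikely to show up much in benchmarks.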
   
   Your second question: if we always use the filter inside TopK anyway, I don't
   think updating it is really wasteful. Theoretically we should update it just
   before running the TopK on a batch instead of after (to avoid computing an
   update after the last batch that is never used). But here too I think it will
   be hard to show any benefit.
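A minimal sketch of the "update just before use" idea, assuming a shared bound that the TopK (and, hypothetically, a scan) reads. `DynamicFilter`, `LazyTopK`, `publish_count` and `publish_if_dirty` are illustrative names, not DataFusion APIs. Inserting into the heap only marks the bound as dirty; it is recomputed at the start of the next batch, so the update that would follow the final batch never happens.

```rust
use std::collections::BinaryHeap;

// Hypothetical shared filter state; not a DataFusion type.
#[derive(Default)]
pub struct DynamicFilter {
    // Rows with `value >= bound` can be skipped.
    pub bound: Option<i64>,
    // How many times the bound was recomputed and published.
    pub publish_count: usize,
}

// Hypothetical TopK (keeps the K smallest i64 keys) with lazy filter updates.
pub struct LazyTopK {
    k: usize,
    heap: BinaryHeap<i64>,
    // Set when the heap changed since the bound was last published.
    dirty: bool,
    pub filter: DynamicFilter,
}

impl LazyTopK {
    pub fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::new(), dirty: false, filter: DynamicFilter::default() }
    }

    // Recompute the bound only if the heap is full and changed since last time.
    fn publish_if_dirty(&mut self) {
        if self.dirty && self.heap.len() == self.k {
            self.filter.bound = self.heap.peek().copied();
            self.filter.publish_count += 1;
            self.dirty = false;
        }
    }

    pub fn insert_batch(&mut self, batch: &[i64]) {
        // Update the filter *before* using it on this batch, so no update
        // is computed after the final batch.
        self.publish_if_dirty();
        for &v in batch {
            // Early filter. The threshold only tightens, so a bound
            // published at the start of the batch is still safe to use.
            if let Some(b) = self.filter.bound {
                if v >= b {
                    continue;
                }
            }
            if self.heap.len() < self.k {
                self.heap.push(v);
                self.dirty = true;
            } else if self.heap.peek().map_or(false, |&top| v < top) {
                self.heap.pop();
                self.heap.push(v);
                self.dirty = true;
            }
        }
    }
}
```

For example, with `k = 2` and batches `[4, 8, 6]`, `[5, 1]`, `[3]`, the bound is published twice (6, then 4) rather than after every batch; the heap change caused by the last batch is simply never published. The saving is one bound computation per partition, which fits the expectation that it will be hard to measure.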


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
