alamb commented on issue #15513:
URL: https://github.com/apache/datafusion/issues/15513#issuecomment-3027160719

   Here is my suggestion for a blog / outline:
   
   The goal is a technical evangelism piece. The reader should come away having 
learned something about columnar query engines (not just that DataFusion is 
great, which it is!)
   
   # Title: Using Dynamic Filters to make TopK / LIMIT queries much faster
   
   # Structure:
   
   ## Intro 
   
   w/ some sort of summary performance chart
   
   Running example:
   A simple example query -- I think the clickbench Q23 `SELECT * FROM hits 
ORDER BY time DESC LIMIT 10` is a pretty good one as it is so simple but 
illustrates the point. More details can be summarized from 
https://github.com/apache/datafusion/issues/15177
   
   ## Background
   
   Show the plan for Q23
   
   Explain the existing topk optimization (that there is a heap)
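
For the heap explanation, a minimal sketch could help the reader. This is plain Python and not DataFusion's actual TopK code; the function name `top_k_largest` and the sample data are made up for illustration. It keeps a bounded min-heap of size k, so the smallest retained value is always at the root and is the only thing a new row has to beat:

```python
import heapq


def top_k_largest(values, k):
    """Return the k largest values in descending order using a bounded min-heap."""
    heap = []  # min-heap: heap[0] is the smallest value currently retained
    for v in values:
        if len(heap) < k:
            # Heap is not full yet, so every value is retained.
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the current k-th largest value, so it replaces it.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


times = [5, 1, 9, 3, 7, 8, 2]
print(top_k_largest(times, 3))  # [9, 8, 7]
```

Note that every input value is still inspected, which is the wasted work the next point is about.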
   
   Explain that the query does much more work than necessary because it decodes 
all rows just to throw all but 10 of them away
   
   Introduce the notion of filter pushdown and point out that DataFusion does 
it at multiple phases
   - Listing table (prune files)
   - During opening (prune files again)
   - During row group / data page filtering 
   - During the scan (if `pushdown_filters` is on)
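
A small sketch of statistics-based pruning might make the list above concrete. It assumes each row group carries min/max values for the `time` column (as Parquet metadata can); the `RowGroupStats` type, the `prune` function, and the numbers are hypothetical and only illustrate the idea for a `time > threshold` predicate:

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    name: str
    min_time: int
    max_time: int


def prune(row_groups, threshold):
    """Keep only row groups that could contain a row with time > threshold."""
    # If even the largest time in a row group fails the predicate,
    # no row in it can pass, so it never needs to be decoded.
    return [rg for rg in row_groups if rg.max_time > threshold]


row_groups = [
    RowGroupStats("rg0", 0, 99),
    RowGroupStats("rg1", 100, 199),
    RowGroupStats("rg2", 200, 299),
]
print([rg.name for rg in prune(row_groups, 150)])  # ['rg1', 'rg2']
```

The same check can be applied at each phase (file, row group, data page); only the granularity of the statistics changes.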
   
   ## Dynamic Filters
   
   Explain that once the plan is running, the topk operator knows the minimum time that could still be emitted -- basically like `WHERE time > (current min in top k)`.
   
   However, the current min isn't known at plan time; it only becomes available (and keeps tightening) as the query runs.
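
To illustrate the effect, here is a toy sketch (not DataFusion code) of a scan that consults the heap's current minimum before decoding each row group. The function `scan_top_k` and the data are invented for the example, and `max(rows)` stands in for a max statistic that a real file format would store in metadata:

```python
import heapq


def scan_top_k(row_groups, k):
    """Return (k largest times in descending order, number of rows decoded)."""
    heap = []    # min-heap of the k largest times seen so far
    decoded = 0  # counts rows we actually had to look at
    for rows in row_groups:
        # Dynamic filter: once the heap is full, only rows with
        # time > heap[0] can change the result, so a row group whose
        # maximum is not above that bound is skipped entirely.
        if len(heap) == k and max(rows) <= heap[0]:
            continue
        for t in rows:
            decoded += 1
            if len(heap) < k:
                heapq.heappush(heap, t)
            elif t > heap[0]:
                heapq.heapreplace(heap, t)
    return sorted(heap, reverse=True), decoded


groups = [[50, 60, 70], [10, 20, 30], [65, 80, 5], [1, 2, 3]]
print(scan_top_k(groups, 2))  # ([80, 70], 6)
```

In this toy input only 6 of the 12 rows are decoded, because the bound learned from the first row group rules out the second and fourth.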
   
   Then summarize the technical approach, highlighting that you made it general purpose so it can also support SIPs (sideways information passing) and other user defined dynamic filters. Also highlight that you worked with the community to do this
   * Add an API to the ExecutionPlan trait for pushing down filters and introducing dynamic filters
   * Add appropriate APIs for updating those filters at runtime and for adding new points to prune (e.g. on file open)
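
One way to picture the runtime-update part is a shared handle that the producer (topk) writes and the pruning points read. The sketch below is purely illustrative: `DynamicFilter`, `update`, and `may_match` are hypothetical names, not the API that was added to DataFusion, and it only models a single `time > threshold` bound:

```python
import threading


class DynamicFilter:
    """Hypothetical shared handle for a 'time > threshold' predicate."""

    def __init__(self):
        self._lock = threading.Lock()
        self._threshold = None  # None means no bound is known yet

    def update(self, new_threshold):
        """Producer side (e.g. topk): the bound is only ever tightened."""
        with self._lock:
            if self._threshold is None or new_threshold > self._threshold:
                self._threshold = new_threshold

    def may_match(self, max_value):
        """Consumer side (e.g. on file open): can anything up to max_value pass?"""
        with self._lock:
            return self._threshold is None or max_value > self._threshold


f = DynamicFilter()
print(f.may_match(10))   # True: nothing can be pruned before a bound exists
f.update(100)
f.update(50)             # ignored: it would loosen the bound
print(f.may_match(80))   # False
print(f.may_match(120))  # True
```

The point of the design is that a pruning point only holds a reference and re-checks it whenever it is about to do work, so it picks up later updates without replanning.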
   
   ## Results
   Show some sort of results if possible
   
   ## Conclusion / Call to action
   This will be released in DataFusion 49
   
   Come help / join us / use DataFusion 🎣 

