alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3052692116
My analysis of these results is very consistent with my last attempt at caching filter results.

The biggest slowdowns are in Q30 and Q31:

```
│ QQuery 30 │ 758.48 ms │ 1197.40 ms │ 1.58x slower │
│ QQuery 31 │ 780.68 ms │ 1172.13 ms │ 1.50x slower │
```

I am fairly sure this is due to the overhead of RowSelection (these queries select many small selections). I started analyzing them here: https://github.com/apache/datafusion/pull/16562#issuecomment-3009778287

So the TL;DR is that I think the caching approach is good, but to avoid some queries getting slower we will need to improve the RowSelection representation too.

I will try to think about this and whip up a POC, hopefully over the next few days.
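
To illustrate why many small selections can be costly, here is a minimal sketch, assuming a run-length style representation in which each entry either selects or skips a number of consecutive rows. The names `Run`, `runs_from_mask`, `selected_rows`, `runs_bytes`, and `bitmap_bytes` are hypothetical and written for this example only; they are not the parquet crate's API, and the sketch measures only the size of the representation, not the decode cost seen in the benchmark.

```rust
/// Hypothetical run-length entry: select or skip `row_count` consecutive rows.
/// Loosely modeled on the idea of a row selection; not the parquet crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Run-length encode a per-row filter result (true = row is selected).
pub fn runs_from_mask(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        match runs.last_mut() {
            // Extend the current run when the select/skip state is unchanged.
            Some(last) if last.skip == skip => last.row_count += 1,
            // Otherwise start a new run of length one.
            _ => runs.push(Run { row_count: 1, skip }),
        }
    }
    runs
}

/// Total number of rows selected by a list of runs.
pub fn selected_rows(runs: &[Run]) -> usize {
    runs.iter().filter(|r| !r.skip).map(|r| r.row_count).sum()
}

/// In-memory size of the run list (entries only, ignoring Vec overhead).
pub fn runs_bytes(runs: &[Run]) -> usize {
    runs.len() * std::mem::size_of::<Run>()
}

/// Size of a one-bit-per-row bitmap covering `num_rows` rows.
pub fn bitmap_bytes(num_rows: usize) -> usize {
    (num_rows + 7) / 8
}
```

In this sketch, a filter that keeps one contiguous half of a batch collapses to two runs, while a filter that keeps every other row needs one run per row even though both select the same number of rows. In the second case a bitmap would be far smaller than the run list. That is one possible direction for a more compact representation, not a claim about what the eventual fix will be.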