alamb commented on issue #14231: URL: https://github.com/apache/datafusion/issues/14231#issuecomment-2612024839
My preference is to keep the core of DataFusion focused, as much as possible, on executing the plans as provided and on performing "always good" optimizations.

For optimizations where there is some tradeoff (like choosing between sorting for a sort merge join or hashing, for example), I strongly suggest we keep as much of that out of the core as possible (and use user-defined passes instead).

The rationale is that when tradeoffs are present, no particular choice will be ideal for all use cases (which is why we already have `prefer_existing_sort`). I can imagine some systems that want to prioritize plans that require less memory but more compute, as well as other systems that would prefer maximum performance even if it takes more memory, etc.

If we bake tradeoffs/heuristics into the optimizer passes in the core of DataFusion, I think it will just get more and more complicated as people try to change how the tradeoffs work.

I feel strongly enough about this to help with the project.
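
To make the split between core passes and user-defined passes concrete, here is a rough, self-contained sketch. It is a toy model and does not use DataFusion's API: `JoinPlan`, `OptimizerPass`, `KeepAsProvided`, `PreferLowMemory`, `PreferThroughput`, and `Optimizer` are all hypothetical names. It also assumes, purely for illustration, that the sort merge join is the lower-memory choice.

```rust
// Toy model only: none of these types are DataFusion APIs; all names are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JoinStrategy {
    Hash,
    SortMerge,
}

#[derive(Debug, Clone, PartialEq)]
pub struct JoinPlan {
    pub strategy: JoinStrategy,
    /// True when both inputs already arrive sorted on the join key.
    pub inputs_sorted: bool,
}

/// A rewrite pass over a plan, analogous in spirit to an optimizer rule.
pub trait OptimizerPass {
    fn optimize(&self, plan: JoinPlan) -> JoinPlan;
}

/// Tradeoff-free core pass: it executes the plan as provided.
pub struct KeepAsProvided;

impl OptimizerPass for KeepAsProvided {
    fn optimize(&self, plan: JoinPlan) -> JoinPlan {
        plan
    }
}

/// User-defined pass for a memory-constrained system.
/// Assumption for illustration: sort merge join is the lower-memory choice.
pub struct PreferLowMemory;

impl OptimizerPass for PreferLowMemory {
    fn optimize(&self, mut plan: JoinPlan) -> JoinPlan {
        plan.strategy = JoinStrategy::SortMerge;
        plan
    }
}

/// User-defined pass for a throughput-oriented system: hash unless the
/// inputs are already sorted, in which case no extra sort is needed.
pub struct PreferThroughput;

impl OptimizerPass for PreferThroughput {
    fn optimize(&self, mut plan: JoinPlan) -> JoinPlan {
        plan.strategy = if plan.inputs_sorted {
            JoinStrategy::SortMerge
        } else {
            JoinStrategy::Hash
        };
        plan
    }
}

/// The core applies whatever passes it was configured with, in order.
pub struct Optimizer {
    passes: Vec<Box<dyn OptimizerPass>>,
}

impl Optimizer {
    /// The core ships only the tradeoff-free pass.
    pub fn core() -> Self {
        Self { passes: vec![Box::new(KeepAsProvided)] }
    }

    /// Each system opts in to the tradeoff it wants.
    pub fn with_pass(mut self, pass: Box<dyn OptimizerPass>) -> Self {
        self.passes.push(pass);
        self
    }

    /// Run every configured pass over the plan, first to last.
    pub fn run(&self, plan: JoinPlan) -> JoinPlan {
        self.passes.iter().fold(plan, |p, pass| pass.optimize(p))
    }
}
```

In this sketch, `Optimizer::core()` leaves a plan untouched, while a memory-constrained system would call `Optimizer::core().with_pass(Box::new(PreferLowMemory))` and a throughput-oriented one would add `PreferThroughput` instead. The tradeoff lives entirely in the pass each system chooses to register, so the core never has to pick a side.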
