timsaucer commented on issue #1612: URL: https://github.com/apache/datafusion-python/issues/1612#issuecomment-4843678368
Thanks for the PR! The main issue I see with this is that it makes datafusion-distributed a new dependency for datafusion-python. That's going to add bloat to the already large wheels we're producing. Also, if we want to support ballista in the same way, then we're adding yet another external dependency and trying to ship/support both of them in this main repo.

The big advantage of this PR is how small / easy it is.

The longer term version I had in mind was that we expose via FFI the physical optimizer (done) and query planner (in progress). Then you have a relatively thin `datafusion-distributed` python package, and when you create a session context you simply add in the new query planner or physical optimizer. This would work the same for ballista, datafusion-distributed, and any other package that comes along and wants to do something similar.

What do you think?
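To make the shape of that longer term version concrete, here is a toy sketch in plain Python. It does not use the datafusion-python API: `ToySessionContext`, `default_planner`, and `distribute_rule` are hypothetical names invented for illustration, and a "plan" is only a list of step names. The point is just that the core context takes an optional planner and a list of optimizer rules, while the distributed behavior would live in a separate thin package.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Toy model: a "plan" is just an ordered list of step names.
Plan = List[str]


def default_planner(sql: str) -> Plan:
    """Hypothetical built-in planner; ignores the SQL and returns a fixed plan."""
    return ["scan", "aggregate"]


@dataclass
class ToySessionContext:
    """Hypothetical stand-in for a session context (NOT the real API).

    The core package only knows about the hook points, not about any
    particular distributed engine.
    """

    query_planner: Optional[Callable[[str], Plan]] = None
    physical_optimizers: List[Callable[[Plan], Plan]] = field(default_factory=list)

    def plan(self, sql: str) -> Plan:
        # Use the injected planner if one was supplied, else the default.
        planner = self.query_planner or default_planner
        plan = planner(sql)
        # Apply each injected optimizer rule in registration order.
        for rule in self.physical_optimizers:
            plan = rule(plan)
        return plan


# --- This part would live in a separate, thin package (assumption) ---
def distribute_rule(plan: Plan) -> Plan:
    """Hypothetical rule: insert a network shuffle before each aggregate."""
    out: Plan = []
    for step in plan:
        if step == "aggregate":
            out.append("network_shuffle")
        out.append(step)
    return out


local_ctx = ToySessionContext()
distributed_ctx = ToySessionContext(physical_optimizers=[distribute_rule])

local_plan = local_ctx.plan("SELECT ...")
distributed_plan = distributed_ctx.plan("SELECT ...")
```

In this sketch the core context never imports the distributed package; a second engine would plug in by supplying its own planner or rule through the same two hooks.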
