wengh commented on PR #49961:
URL: https://github.com/apache/spark/pull/49961#issuecomment-2707172162

@cloud-fan
> Now to push down filters, we need to create the python batch reader earlier, which means one more round of Python worker communication in the optimizer. I'm wondering that once we finish pushdown, shall we do the planning work immediately and keep PythonDataSourceReadInfo in PythonScan?

Good idea to avoid the extra round of worker communication. I think that would require some refactoring of the plan_read worker, so I'll implement it in a new PR since this PR is already very large.

Also, when we add column pruning support, we should get partitions in the column pruning worker rather than in the filter pushdown worker.
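The idea of folding the planning work into the pushdown step can be sketched as follows. This is a toy illustration only: the names `EqualTo`, `ToyReader`, `push_filters`, and the partition-caching scheme are assumptions for the sketch, not the actual `pyspark.sql.datasource` API or the PythonScan internals discussed above.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class EqualTo:
    """Stand-in for a pushed-down equality filter (hypothetical)."""
    column: str
    value: Any


class ToyReader:
    """Toy reader: accepts pushdown of EqualTo filters and runs the
    planning step immediately afterwards, caching the partitions so a
    later partitions() call does not need another worker round."""

    def __init__(self) -> None:
        self.pushed: List[EqualTo] = []
        self._partitions = None

    def push_filters(self, filters):
        # Accept EqualTo filters; return the rest as unsupported so
        # the engine keeps evaluating them after the scan.
        unsupported = []
        for f in filters:
            if isinstance(f, EqualTo):
                self.pushed.append(f)
            else:
                unsupported.append(f)
        # Plan right away, while we are already in this worker call.
        self._partitions = self._plan()
        return unsupported

    def _plan(self):
        # Pretend planning: one partition per pushed filter, else one.
        return list(range(max(1, len(self.pushed))))

    def partitions(self):
        # Reuse the result computed during pushdown if available.
        if self._partitions is None:
            self._partitions = self._plan()
        return self._partitions


reader = ToyReader()
rest = reader.push_filters([EqualTo("id", 1), "unsupported-expr"])
```

In this shape, `push_filters` and the planning happen in one worker invocation, which is the saving being discussed: without it, the optimizer would call into the Python worker once for pushdown and again for planning.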
@cloud-fan > Now to push down filters, we need to create the python batch reader earlier, which means one more round of Python worker communication in the optimizer. I'm wondering that once we finish pushdown, shall we do the planning work immediately and keep PythonDataSourceReadInfo in PythonScan? Good idea to avoid the extra round of worker. I think that would require some refactoring of the plan_read worker so I'll implement that in a new PR since this PR is already very large. Also when we add column pruning support we should get partitions in column pruning worker rather than filter pushdown worker. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org