wengh commented on PR #49961:
URL: https://github.com/apache/spark/pull/49961#issuecomment-2707172162

@cloud-fan
> Now to push down filters, we need to create the python batch reader earlier, which means one more round of Python worker communication in the optimizer. I'm wondering that once we finish pushdown, shall we do the planning work immediately and keep PythonDataSourceReadInfo in PythonScan?

Good idea to avoid the extra round of worker communication. I think that would require some refactoring of the plan_read worker, so I'll implement it in a new PR since this PR is already very large.

Also, when we add column pruning support, we should get partitions in the column pruning worker rather than in the filter pushdown worker.
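The idea of folding the planning work into the pushdown step can be sketched as follows. This is a toy illustration only: the names `EqualTo`, `ToyReader`, `push_filters`, and the partition-caching scheme are assumptions for the sketch, not the actual `pyspark.sql.datasource` API or the PythonScan internals discussed above.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class EqualTo:
    """Stand-in for a pushed-down equality filter (hypothetical)."""
    column: str
    value: Any


class ToyReader:
    """Toy reader: accepts pushdown of EqualTo filters and runs the
    planning step immediately afterwards, caching the partitions so a
    later partitions() call does not need another worker round."""

    def __init__(self) -> None:
        self.pushed: List[EqualTo] = []
        self._partitions = None

    def push_filters(self, filters):
        # Accept EqualTo filters; return the rest as unsupported so
        # the engine keeps evaluating them after the scan.
        unsupported = []
        for f in filters:
            if isinstance(f, EqualTo):
                self.pushed.append(f)
            else:
                unsupported.append(f)
        # Plan right away, while we are already in this worker call.
        self._partitions = self._plan()
        return unsupported

    def _plan(self):
        # Pretend planning: one partition per pushed filter, else one.
        return list(range(max(1, len(self.pushed))))

    def partitions(self):
        # Reuse the result computed during pushdown if available.
        if self._partitions is None:
            self._partitions = self._plan()
        return self._partitions


reader = ToyReader()
rest = reader.push_filters([EqualTo("id", 1), "unsupported-expr"])
```

In this shape, `push_filters` and the planning happen in one worker invocation, which is the saving being discussed: without it, the optimizer would call into the Python worker once for pushdown and again for planning.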
@cloud-fan > Now to push down filters, we need to create the python batch reader earlier, which means one more round of Python worker communication in the optimizer. I'm wondering that once we finish pushdown, shall we do the planning work immediately and keep PythonDataSourceReadInfo in PythonScan? Good idea to avoid the extra round of worker. I think that would require some refactoring of the plan_read worker so I'll implement that in a new PR since this PR is already very large. Also when we add column pruning support we should get partitions in column pruning worker rather than filter pushdown worker. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org