wengh commented on code in PR #50684:
URL: https://github.com/apache/spark/pull/50684#discussion_r2069715404


##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -517,6 +530,121 @@ The following example demonstrates how to implement a basic Data Source using Ar
 
     df.show()
 
+Filter Pushdown in Python Data Sources
+--------------------------------------
+
+Filter pushdown is an optimization technique that allows data sources to handle filters natively, reducing the amount of data that needs to be transferred and processed by Spark.
+
+The filter pushdown API, introduced in Spark 4.1, enables a DataSourceReader to selectively push down filters from the query to the source.
+
+You must set the configuration ``spark.sql.python.filterPushdown.enabled`` to ``true`` to enable filter pushdown.
+
+**How Filter Pushdown Works**
+
+When a query includes filter conditions, Spark can pass these filters to the data source implementation, which can then apply the filters during data retrieval. This is especially beneficial for:
+
+- Data sources backed by formats that allow efficient filtering (e.g. key-value stores)
+- APIs that support filtering (e.g. REST and GraphQL APIs)
+
+The data source receives the filters, decides which ones can be pushed down, and returns the remaining filters to Spark to be applied later.
+
+**Implementing Filter Pushdown**
+
+To enable filter pushdown in your Python Data Source, implement the ``pushFilters`` method in your ``DataSourceReader`` class:
+
+.. code-block:: python
+
+    from typing import Iterable, List
+
+    from pyspark.sql.datasource import EqualTo, Filter, GreaterThan, LessThan
+
+    def pushFilters(self, filters: List[Filter]) -> Iterable[Filter]:

Review Comment:
   Changed to an example source that returns prime numbers sequentially
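
   For readers following along, below is a minimal, self-contained sketch of what such a prime-number source's reader could look like. The filter classes here are hypothetical stand-ins for the real `pyspark.sql.datasource` ones (so the snippet runs without a Spark installation), and `PrimeReader` is an illustrative name, not part of this PR's actual example.

   ```python
   # NOTE: Filter/EqualTo/GreaterThan/LessThan are simplified stand-ins for
   # the pyspark.sql.datasource filter classes; this is a sketch, not the
   # actual PR code.
   from dataclasses import dataclass
   from typing import Iterable, List, Tuple


   @dataclass(frozen=True)
   class Filter:
       pass


   @dataclass(frozen=True)
   class EqualTo(Filter):
       attribute: Tuple[str, ...]
       value: object


   @dataclass(frozen=True)
   class GreaterThan(Filter):
       attribute: Tuple[str, ...]
       value: object


   @dataclass(frozen=True)
   class LessThan(Filter):
       attribute: Tuple[str, ...]
       value: object


   class PrimeReader:
       """Sketch of a reader over a sequential stream of prime numbers."""

       def __init__(self) -> None:
           self.lower = None  # exclusive lower bound pushed down, if any
           self.upper = None  # exclusive upper bound pushed down, if any

       def pushFilters(self, filters: List[Filter]) -> Iterable[Filter]:
           # Accept range filters on the "value" column; hand everything
           # else back to Spark to evaluate after reading.
           remaining = []
           for f in filters:
               if isinstance(f, GreaterThan) and f.attribute == ("value",):
                   self.lower = f.value
               elif isinstance(f, LessThan) and f.attribute == ("value",):
                   self.upper = f.value
               else:
                   remaining.append(f)
           return remaining

       def read(self):
           def is_prime(n: int) -> bool:
               return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

           n = 2
           while self.upper is None or n < self.upper:
               if is_prime(n) and (self.lower is None or n > self.lower):
                   yield (n,)
               n += 1
               if self.upper is None and n > 100:
                   break  # cap the sketch's otherwise unbounded stream
   ```

   With `value > 10 AND value < 30` pushed down, the reader yields only the primes in that range and reports no leftover filters, while an unrelated filter (e.g. on another column) is returned for Spark to apply.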



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

