Re: [PR] [GH-2331] Geopandas: Document differences of sindex compared to gpd + sindex fixes [sedona]

via GitHub Sun, 07 Sep 2025 06:22:18 -0700


petern48 commented on code in PR #2332:
URL: https://github.com/apache/sedona/pull/2332#discussion_r2325779811



##########
python/sedona/spark/geopandas/sindex.py:
##########
@@ -38,12 +38,23 @@ def __init__(self, geometry, index_type="strtree", column_name=None):
 
         Parameters
         ----------
-        geometry : np.array of Shapely geometries, PySparkDataFrame column, or PySparkDataFrame
+        geometry : np.array of Shapely geometries, GeoSeries, or PySparkDataFrame
         index_type : str, default "strtree"
             The type of spatial index to use.
         column_name : str, optional
             The column name to extract geometry from if `geometry` is a PySparkDataFrame.
+
+        Note: query methods (ie. query, nearest, intersection) have different behaviors depending on how the index is constructed.
+        When constructed from a np.array, the query methods return indices like original geopandas.
+        When constructed from a GeoSeries or PySparkDataFrame, the query methods return geometries.

Review Comment:
   Here's the gist of the first point: passing in an np.array returns indices, 
matching the original geopandas behavior (though it will be slow), while 
passing in a GeoSeries or PySparkDataFrame returns geometries. This is the 
best we can do in terms of compatibility.
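
   To make the two return styles concrete, here is a toy sketch in plain 
Python. It does not use Sedona or geopandas: `query_indices` and 
`query_geometries` are hypothetical names, and bounding-box tuples stand in 
for real geometries. It only illustrates "indices" versus "geometries" as a 
result shape, not how the index is actually implemented.

```python
# Toy illustration only: hypothetical helpers, not Sedona's or geopandas' API.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query_indices(geoms: List[Box], target: Box) -> List[int]:
    """geopandas-style result: integer positions into the input array."""
    return [i for i, g in enumerate(geoms) if boxes_intersect(g, target)]


def query_geometries(geoms: List[Box], target: Box) -> List[Box]:
    """Geometry-style result: the matching geometries themselves."""
    return [g for g in geoms if boxes_intersect(g, target)]


geoms = [(0, 0, 1, 1), (2, 2, 3, 3), (0.5, 0.5, 2.5, 2.5)]
target = (0, 0, 1, 1)

indices = query_indices(geoms, target)     # positions of the matches
matches = query_geometries(geoms, target)  # the matches themselves
```

   A caller that gets indices has to look the geometries up in the original 
array, whereas a caller that gets geometries can use them directly but loses 
the positional link to the input.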



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
