petern48 commented on code in PR #2332:
URL: https://github.com/apache/sedona/pull/2332#discussion_r2325779811
##########
python/sedona/spark/geopandas/sindex.py:
##########
@@ -38,12 +38,23 @@ def __init__(self, geometry, index_type="strtree", column_name=None):
Parameters
----------
- geometry : np.array of Shapely geometries, PySparkDataFrame column, or PySparkDataFrame
+ geometry : np.array of Shapely geometries, GeoSeries, or PySparkDataFrame
index_type : str, default "strtree"
The type of spatial index to use.
column_name : str, optional
The column name to extract geometry from if `geometry` is a
PySparkDataFrame.
+
+ Note: query methods (i.e. query, nearest, intersection) have different behaviors depending on how the index is constructed.
+ When constructed from an np.array, the query methods return indices, like the original geopandas.
+ When constructed from a GeoSeries or PySparkDataFrame, the query methods return geometries.
Review Comment:
Here's the gist of the first point. Passing in an np.array returns indices, matching the original geopandas behavior (though, obviously, this path will be slow). Passing in a GeoSeries or PySparkDataFrame returns geometries instead. This is the best we can do in terms of compatibility.
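A minimal sketch of the return-type contract described above, in plain Python so it runs without Sedona or Shapely. `ToySpatialIndex`, its `from_array` flag, the `_intersects` helper and the box-tuple "geometries" are all hypothetical stand-ins, not Sedona's API: `from_array=True` mimics building the index from an np.array (indices returned), and `from_array=False` mimics building it from a GeoSeries or PySparkDataFrame (geometries returned).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

# Hypothetical stand-in for a geometry: an axis-aligned box (minx, miny, maxx, maxy).
Box = Tuple[float, float, float, float]


def _intersects(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class ToySpatialIndex:
    """Hypothetical sketch, NOT Sedona's SpatialIndex.

    from_array=True  -> behaves like an index built from an np.array: returns indices.
    from_array=False -> behaves like an index built from a GeoSeries or
                        PySparkDataFrame: returns the matching geometries.
    """

    geometries: Sequence[Box]
    from_array: bool

    def query(self, target: Box) -> Union[List[int], List[Box]]:
        # Linear scan for brevity; a real STRtree would prune candidates by bounding box.
        hits = [i for i, g in enumerate(self.geometries) if _intersects(g, target)]
        if self.from_array:
            return hits  # positional indices, as in geopandas' sindex.query
        return [self.geometries[i] for i in hits]  # the geometries themselves
```

For example, with `boxes = [(0, 0, 1, 1), (2, 2, 3, 3), (0.5, 0.5, 2.5, 2.5)]` and a query box `(0.8, 0.8, 1.2, 1.2)`, the array-style index returns `[0, 2]`, while the other style returns the first and third boxes. Returning geometries avoids relying on positional indices, which a distributed source does not naturally provide.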