Copilot commented on code in PR #2332:
URL: https://github.com/apache/sedona/pull/2332#discussion_r2325822947
##########
python/sedona/spark/geopandas/sindex.py:
##########
@@ -170,10 +186,14 @@ def nearest(
Note: Unlike Geopandas, Sedona does not support geometry input of type
np.array or GeoSeries.
+ Note: nearest() has different behaviors depending on how the index is
constructed.
+ When constructed from a np.array, this method returns indices like
original geopandas.
+ When constructed from a GeoSeries or PySparkDataFrame, this method
returns geometries.
+
Returns
-------
list or tuple
- List of indices of nearest geometries, optionally with distances.
+ List of the nearest geometries, optionally with distances.
Review Comment:
Similar to the query method, this docstring states it returns geometries but
should acknowledge that it returns indices when constructed from np.array,
based on the behavioral notes above.
```suggestion
If the index was constructed from a np.array, returns a list of
indices (or a tuple of (indices, distances) if return_distance is True).
If constructed from a GeoSeries or PySparkDataFrame, returns a
list of geometries (or a tuple of (geometries, distances) if return_distance is
True).
```
##########
python/tests/geopandas/test_sindex.py:
##########
@@ -63,6 +63,31 @@ def setup_method(self):
]
)
+ def test_construct_from_geoseries(self):
+ # Construct from a GeoSeries
+ gs = GeoSeries([Point(x, x) for x in range(5)])
+ sindex = SpatialIndex(gs)
+ result = sindex.query(Point(2, 2))
+ # SpatialIndex constructed from GeoSeries return geometries
+ assert result == [Point(2, 2)]
+
+ def test_construct_from_pyspark_dataframe(self):
+ # Construct from PySparkDataFrame
+ df = self.spark.createDataFrame(
+ [(Point(x, x),) for x in range(5)], ["geometry"]
+ )
+ sindex = SpatialIndex(df, column_name="geometry")
+ result = sindex.query(Point(2, 2))
+ assert result == [Point(2, 2)]
+
+ def test_construct_from_nparray(self):
+ # Construct from np.array
+ array = np.array([Point(x, x) for x in range(5)])
+ sindex = SpatialIndex(array)
+ result = sindex.query(Point(2, 2))
+ # Returns indices like original geopandas
+ assert result.tolist() == np.array([2])
Review Comment:
The assertion compares a list to a numpy array, which will always be False.
Use `assert result.tolist() == [2]` or `assert result == np.array([2])` instead.
```suggestion
assert result.tolist() == [2]
```
##########
python/sedona/spark/geopandas/sindex.py:
##########
@@ -220,20 +243,28 @@ def nearest(
def intersection(self, bounds):
"""
- Find geometries that intersect the given bounding box.
+ Find geometries that intersect the given bounding box. Similar to the
Geopandas version,
+ this is a compatibility wrapper for rtree.index.Index.intersection,
use query instead.
Parameters
----------
bounds : tuple
Bounding box as (min_x, min_y, max_x, max_y).
+ Note: intersection() has different behaviors depending on how the
index is constructed.
+ When constructed from a np.array, this method returns indices like
original geopandas.
+ When constructed from a GeoSeries or PySparkDataFrame, this method
returns geometries.
+
+ Note: Unlike Geopandas, Sedona does not support geometry input of type
np.array or GeoSeries.
+ It is recommended to instead use GeoSeries.intersects directly.
+
Returns
-------
list
- List of indices of matching geometries.
+ List of matching geometries.
Review Comment:
The return description should be consistent with the behavioral notes and
acknowledge that indices are returned when constructed from np.array.
```suggestion
List of matching indices (if constructed from a np.array) or
list of matching geometries (if constructed from a GeoSeries or
PySparkDataFrame).
```
##########
python/sedona/spark/geopandas/sindex.py:
##########
@@ -82,12 +93,17 @@ def query(self, geometry: BaseGeometry, predicate: str =
None, sort: bool = Fals
sort : bool, optional, default False
Whether to sort the results.
+ Note: query() has different behaviors depending on how the index is
constructed.
+ When constructed from a np.array, this method returns indices like
original geopandas.
+ When constructed from a GeoSeries or PySparkDataFrame, this method
returns geometries.
+
Note: Unlike Geopandas, Sedona does not support geometry input of type
np.array or GeoSeries.
+ It is recommended to instead use GeoSeries.intersects directly.
Returns
-------
list
- List of indices of matching geometries.
+ List of matching geometries.
Review Comment:
The docstring states this returns 'List of matching geometries' but the note
above indicates it returns indices when constructed from np.array. The return
description should acknowledge both behaviors or be more generic.
```suggestion
If the index was constructed from a np.array, returns a list of
indices of matching geometries.
If constructed from a GeoSeries or PySparkDataFrame, returns a
list of matching geometries.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]