Re: [PR] [GH-2108] Geopandas.GeoSeries: Implement `union_all` aggreagation [sedona]

via GitHub Wed, 16 Jul 2025 23:32:01 -0700


Copilot commented on code in PR #2109:
URL: https://github.com/apache/sedona/pull/2109#discussion_r2212399742



##########
python/tests/geopandas/test_match_geopandas_series.py:
##########
@@ -619,7 +619,15 @@ def test_unary_union(self):
         pass
 
     def test_union_all(self):
-        pass
+        if parse_version(gpd.__version__) < parse_version("1.1.0"):
+            return

Review Comment:
   Use pytest.skip with a reason here instead of an early return so that the 
test runner records this as an explicit skip.
   ```suggestion
               pytest.skip("Test requires geopandas version 1.1.0 or higher.")
   ```



##########
python/tests/geopandas/test_match_geopandas_series.py:
##########
@@ -619,7 +619,15 @@ def test_unary_union(self):
         pass
 
     def test_union_all(self):
-        pass
+        if parse_version(gpd.__version__) < parse_version("1.1.0"):
+            return
+
+        # Union all the valid geometries
+        # Neither our or geopandas' implementation supports invalid geometries

Review Comment:
   [nitpick] Grammar: change 'Neither our or geopandas'' to 'Neither our nor 
geopandas'' to correctly pair 'neither/ nor'.
   ```suggestion
           # Neither our nor geopandas' implementation supports invalid 
geometries
   ```



##########
python/sedona/geopandas/geoseries.py:
##########
@@ -499,14 +508,23 @@ def _query_geometry_column(
 
         query = f"{query} as `{rename}`"
 
-        # We always select NATURAL_ORDER_COLUMN_NAME, to avoid having to 
regenerate it in the result
-        # We always select SPARK_DEFAULT_INDEX_NAME, to retain series index 
info
-        sdf = df.selectExpr(query, SPARK_DEFAULT_INDEX_NAME, 
NATURAL_ORDER_COLUMN_NAME)
+        exprs = [query]
+
+        index_spark_columns = []
+        if not is_aggr:
+            # We always select NATURAL_ORDER_COLUMN_NAME, to avoid having to 
regenerate it in the result
+            # We always select SPARK_DEFAULT_INDEX_NAME, to retain series 
index info
+            exprs.append(SPARK_DEFAULT_INDEX_NAME)
+            exprs.append(NATURAL_ORDER_COLUMN_NAME)
+            index_spark_columns = [scol_for(df, SPARK_DEFAULT_INDEX_NAME)]
+        # else if is_aggr, we don't select the index columns
+
+        sdf = df.selectExpr(*exprs)
 
         internal = self._internal.copy(
             spark_frame=sdf,
             index_fields=[self._internal.index_fields[0]],

Review Comment:
   When `is_aggr=True`, `index_fields` is still set while `index_spark_columns` 
is cleared, causing a mismatch; consider clearing `index_fields` for 
aggregation queries as well.
   ```suggestion
               index_fields=[] if is_aggr else [self._internal.index_fields[0]],
   ```



##########
python/sedona/geopandas/geoseries.py:
##########
@@ -1319,9 +1337,51 @@ def unary_union(self):
         # Implementation of the abstract method
         raise NotImplementedError("This method is not implemented yet.")
 
-    def union_all(self, method="unary", grid_size=None):
-        # Implementation of the abstract method
-        raise NotImplementedError("This method is not implemented yet.")
+    def union_all(self, method="unary", grid_size=None) -> BaseGeometry:

Review Comment:
   Add an explicit check for an empty `GeoSeries` (e.g., `if len(self) == 0`) 
to avoid indexing errors when calling `take([0])` on an empty result.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2108] Geopandas.GeoSeries: Implement `union_all` aggreagation [sedona]

Reply via email to