Re: [PR] [GH-2110] Geopandas.GeoSeries: Implement simplify and snap [sedona]

via GitHub Wed, 23 Jul 2025 04:51:59 -0700


Copilot commented on code in PR #2111:
URL: https://github.com/apache/sedona/pull/2111#discussion_r2225314816



##########
python/sedona/geopandas/geoseries.py:
##########
@@ -3177,6 +3293,71 @@ def buffer(
             returns_geom=True,
         )
 
+    def simplify(self, tolerance=None, preserve_topology=True) -> "GeoSeries":
+        """Returns a ``GeoSeries`` containing a simplified representation of
+        each geometry.
+
+        The algorithm (Douglas-Peucker) recursively splits the original line
+        into smaller parts and connects these parts' endpoints
+        by a straight line. Then, it removes all points whose distance
+        to the straight line is smaller than `tolerance`. It does not
+        move any points and it always preserves endpoints of
+        the original line or polygon.
+        See 
https://shapely.readthedocs.io/en/latest/manual.html#object.simplify
+        for details
+
+        Simplifies individual geometries independently, without considering
+        the topology of a potential polygonal coverage. If you would like to 
treat
+        the ``GeoSeries`` as a coverage and simplify its edges, while 
preserving the
+        coverage topology, see :meth:`simplify_coverage`.
+
+        Parameters
+        ----------
+        tolerance : float
+            All parts of a simplified geometry will be no more than
+            `tolerance` distance from the original. It has the same units
+            as the coordinate reference system of the GeoSeries.
+            For example, using `tolerance=100` in a projected CRS with meters
+            as units means a distance of 100 meters in reality.
+        preserve_topology: bool (default True)
+            False uses a quicker algorithm, but may produce self-intersecting
+            or otherwise invalid geometries.
+
+        Notes
+        -----
+        Invalid geometric objects may result from simplification that does not
+        preserve topology and simplification may be sensitive to the order of
+        coordinates: two geometries differing only in order of coordinates may 
be
+        simplified differently.
+
+        See also
+        --------
+        simplify_coverage : simplify geometries using coverage simplification
+
+        Examples
+        --------
+        >>> from sedona.geopandas import GeoSeries
+        >>> from shapely.geometry import Point, LineString
+        >>> s = GeoSeries(
+        ...     [Point(0, 0).buffer(1), LineString([(0, 0), (1, 10), (0, 20)])]
+        ... )
+        >>> s
+        0    POLYGON ((1 0, 0.99518 -0.09802, 0.98079 -0.19...
+        1                         LINESTRING (0 0, 1 10, 0 20)
+        dtype: geometry
+
+        >>> s.simplify(1)
+        0    POLYGON ((0 1, 0 -1, -1 0, 0 1))
+        1              LINESTRING (0 0, 0 20)
+        dtype: geometry
+        """
+
+        spark_expr = stf.ST_Simplify(self.spark.column, tolerance)
+        if preserve_topology:
+            spark_expr = stf.ST_SimplifyPreserveTopology(self.spark.column, 
tolerance)

Review Comment:
   [nitpick] The variable `spark_expr` is reassigned without using the initial 
value when `preserve_topology=True`. Consider using conditional assignment 
instead: `spark_expr = stf.ST_SimplifyPreserveTopology(...) if 
preserve_topology else stf.ST_Simplify(...)`
   ```suggestion
           spark_expr = (
               stf.ST_SimplifyPreserveTopology(self.spark.column, tolerance)
               if preserve_topology
               else stf.ST_Simplify(self.spark.column, tolerance)
           )
   ```



##########
python/tests/geopandas/test_match_geopandas_series.py:
##########
@@ -785,6 +795,32 @@ def test_intersection(self):
                     gpd_result = gpd_series1.intersection(gpd_series2, 
align=False)
                     self.check_sgpd_equals_gpd(sgpd_result, gpd_result)
 
+    def test_snap(self):
+        if parse_version(gpd.__version__) < parse_version("1.0.0"):
+            return
+
+        num_passed = 0
+        for i, (_, geom) in enumerate(self.geoms):
+            for _, geom2 in self.geoms[i:]:
+                sgpd_result = GeoSeries(geom).snap(GeoSeries(geom2), 1.1)
+                gpd_result = gpd.GeoSeries(geom).snap(gpd.GeoSeries(geom2), 
1.1)
+                if self.check_sgpd_equals_gpd(sgpd_result, gpd_result, 
error="bool"):
+                    num_passed += 1
+
+                if len(geom) == len(geom2):
+                    sgpd_result = GeoSeries(geom).snap(GeoSeries(geom2), 1, 
align=False)
+                    gpd_result = gpd.GeoSeries(geom).snap(
+                        gpd.GeoSeries(geom2), 1, align=False
+                    )
+                    if self.check_sgpd_equals_gpd(
+                        sgpd_result, gpd_result, error="bool"
+                    ):
+                        num_passed += 1
+
+        # Sedona's snap result fails fairly often, even though the results are 
fairly close.
+        # We instead hard code the number of tests that pass currently so we 
can catch any potential regressions.
+        assert num_passed >= 25

Review Comment:
   Hard-coding the expected number of passing tests (25) makes the test 
brittle. Consider using a more flexible approach like checking that the pass 
rate is above a certain percentage or documenting why exactly 25 tests should 
pass.
   ```suggestion
           # We use a minimum pass rate to ensure flexibility and catch 
potential regressions.
           total_tests = sum(1 for i, (_, geom) in enumerate(self.geoms) for _, 
geom2 in self.geoms[i:])
           min_pass_rate = 0.8  # Minimum pass rate (80%)
           min_passed = int(total_tests * min_pass_rate)
           assert num_passed >= min_passed, f"Expected at least {min_passed} 
tests to pass, but got {num_passed}."
   ```



##########
python/tests/geopandas/test_geopandas_base.py:
##########
@@ -57,9 +64,20 @@ def check_sgpd_equals_gpd(cls, actual: GeoSeries, expected: 
gpd.GeoSeries):
             # Sometimes sedona and geopandas both return empty geometries but 
of different types (e.g Point and Polygon)
             elif a.is_empty and e.is_empty:
                 continue
-            cls.assert_geometry_almost_equal(
-                a, e, tolerance=1e-2
-            )  # increased tolerance from 1e-6
+            try:
+                cls.assert_geometry_almost_equal(
+                    a, e, tolerance=1e-2
+                )  # increased tolerance from 1e-6
+            except ValueError as e:
+                if error == "raise":
+                    raise e

Review Comment:
   The variable name `e` shadows the earlier variable `e` from line 65. Use a 
different variable name like `error` or `ve` to avoid confusion.
   ```suggestion
               except ValueError as error:
                   if error == "raise":
                       raise error
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [GH-2110] Geopandas.GeoSeries: Implement simplify and snap [sedona]

Reply via email to