Re: [PR] Add more python integration tests + fix pre-commit ruff-format [sedona-db]

via GitHub Tue, 02 Sep 2025 08:09:28 -0700


paleolimbot commented on code in PR #8:
URL: https://github.com/apache/sedona-db/pull/8#discussion_r2316329852



##########
python/sedonadb/python/sedonadb/testing.py:
##########
@@ -260,11 +269,15 @@ def assert_result(self, result, expected, **kwargs) -> 
"DBEngine":
             self.assert_result(result, [(expected,)], **kwargs)
         elif isinstance(expected, bool):
             self.assert_result(result, [(str(expected).lower(),)], **kwargs)
+        elif isinstance(expected, bytes):
+            self.assert_result(result, [(expected,)], datatype="binary", 
**kwargs)

Review Comment:
   It may make more sense to implement this in the same way that you 
implemented the numeric comparison (i.e., take the first cell of the Pandas 
output). I am not sure that passing the datatype makes much sense for the 
tuples output (which should always be strings).



##########
.pre-commit-config.yaml:
##########
@@ -44,13 +44,6 @@ repos:
       name: rustfmt
       args: ["--all", "--"]
 
-  - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.1.5
-    hooks:
-      - id: ruff
-        args: [ --fix ]
-      - id: ruff-format
-

Review Comment:
   Thanks!



##########
python/sedonadb/python/sedonadb/testing.py:
##########
@@ -260,11 +269,15 @@ def assert_result(self, result, expected, **kwargs) -> 
"DBEngine":
             self.assert_result(result, [(expected,)], **kwargs)
         elif isinstance(expected, bool):
             self.assert_result(result, [(str(expected).lower(),)], **kwargs)
+        elif isinstance(expected, bytes):
+            self.assert_result(result, [(expected,)], datatype="binary", 
**kwargs)
         elif isinstance(expected, (int, float)):
             result_df = self.result_to_pandas(result)
             assert result_df.shape == (1, 1)
             result_value = result_df.iloc[0, 0]
-            assert result_value == expected, f"Expected {expected}, got 
{result_value}"
+            assert math.isclose(result_value, expected), (
+                f"Expected {expected}, got {result_value}"
+            )

Review Comment:
   This is something we should handle explicitly as an argument (i.e., 
`numeric_epsilon`, or perhaps align it with the name that 
`pandas.testing.assert_dataframe_equal()` uses to specify an epsilon). Most of 
the time we want to assert identicalness and only in specific cases do we want 
to relax that (and we want that relaxing to be explicit in the test code).



##########
python/sedonadb/tests/functions/test_functions.py:
##########
@@ -261,6 +327,60 @@ def test_st_geomfromtext(eng, wkt, expected):
     eng.assert_query_result(f"SELECT ST_GeomFromText({val_or_null(wkt)})", 
expected)
 
 
[email protected]("eng", [SedonaDB, PostGIS])
[email protected](
+    ("geom", "expected"),
+    [
+        ("POINT (1 1)", "POINT (1 1)"),
+        ("POINT EMPTY", "POINT (nan nan)"),
+        ("LINESTRING EMPTY", "LINESTRING EMPTY"),
+        ("POLYGON EMPTY", "POLYGON EMPTY"),
+        ("GEOMETRYCOLLECTION EMPTY", "GEOMETRYCOLLECTION EMPTY"),
+        ("POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))", "POLYGON ((0 0, 1 0, 1 1, 0 1, 
0 0))"),
+        (
+            "MULTILINESTRING ((0 0, 1 1), (1 1, 2 2))",
+            "MULTILINESTRING ((0 0, 1 1), (1 1, 2 2))",
+        ),
+        (
+            "GEOMETRYCOLLECTION (POINT (0 0), LINESTRING (0 0, 1 1), POLYGON 
((0 0, 0 1, 1 1, 1 0, 0 0)))",
+            "GEOMETRYCOLLECTION (POINT (0 0), LINESTRING (0 0, 1 1), POLYGON 
((0 0, 0 1, 1 1, 1 0, 0 0)))",
+        ),
+    ],
+)
+def test_st_geogfromwkb(eng, geom, expected):
+    eng = eng.create_or_skip()
+    eng.assert_query_result(
+        f"SELECT ST_GeogFromWKB(ST_AsBinary({geom_or_null(geom)}))", expected

Review Comment:
   This would be a less circular test if it used `shapely` to convert to WKB. 
That would also let you check `flavor="iso"` and `flavor="extended"`, which 
both engines should accept.
   
   It's also worth checking what happens on invalid input (we error...I'm not 
sure what PostGIS does).



##########
python/sedonadb/tests/functions/test_functions.py:
##########
@@ -50,6 +50,106 @@ def test_st_area(eng, geom, expected):
     eng.assert_query_result(f"SELECT ST_Area({geom_or_null(geom)})", expected)
 
 
[email protected]("eng", [SedonaDB, PostGIS])
[email protected](
+    ("geom", "expected"),
+    [
+        ("POINT (1 1)", b"0101000000000000000000F03F000000000000F03F"),
+        ("POINT EMPTY", b"0101000000000000000000F87F000000000000F87F"),
+        (
+            "LINESTRING (0 0, 1 2, 3 4)",
+            
b"01020000000300000000000000000000000000000000000000000000000000F03F000000000000004000000000000008400000000000001040",
+        ),
+        ("LINESTRING EMPTY", b"010200000000000000"),
+        (
+            "POINT ZM (0 0 0 0)",
+            
b"01B90B00000000000000000000000000000000000000000000000000000000000000000000",
+        ),
+        (
+            "GEOMETRYCOLLECTION (POINT (0 0), POLYGON ((0 0, 1 0, 1 1, 0 1, 0 
0)))",
+            
b"0107000000020000000101000000000000000000000000000000000000000103000000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000",
+        ),
+    ],
+)
+def test_st_asbinary(eng, geom, expected):
+    eng = eng.create_or_skip()
+    eng.assert_query_result(f"SELECT ST_AsBinary({geom_or_null(geom)})", 
expected)
+
+
[email protected]("eng", [SedonaDB, PostGIS])
[email protected](
+    ("geom", "expected"),
+    [
+        (None, None),
+        # Arrow-rs returns POINT (nan nan) instead of POINT EMPTY
+        ("POINT EMPTY", "POINT (nan nan)"),
+        ("LINESTRING EMPTY", "LINESTRING EMPTY"),
+        ("POLYGON EMPTY", "POLYGON EMPTY"),
+        ("MULTIPOINT EMPTY", "MULTIPOINT EMPTY"),
+        ("MULTILINESTRING EMPTY", "MULTILINESTRING EMPTY"),
+        ("MULTIPOLYGON EMPTY", "MULTIPOLYGON EMPTY"),
+        ("GEOMETRYCOLLECTION EMPTY", "GEOMETRYCOLLECTION EMPTY"),
+        ("POINT (1 1)", "POINT (1 1)"),
+        ("LINESTRING (0 0, 1 1)", "LINESTRING (0 0, 1 1)"),
+        ("POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))", "POLYGON ((0 0, 1 0, 1 1, 0 1, 
0 0))"),
+        ("MULTIPOINT ((0 0), (1 1))", "MULTIPOINT (0 0, 1 1)"),
+        (
+            "MULTILINESTRING ((0 0, 1 1), (1 1, 2 2))",
+            "MULTILINESTRING ((0 0, 1 1), (1 1, 2 2))",
+        ),
+        (
+            "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)), ((0 0, 1 0, 1 1, 0 1, 
0 0)))",
+            "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)), ((0 0, 1 0, 1 1, 0 1, 
0 0)))",
+        ),
+        (
+            "GEOMETRYCOLLECTION (POINT (0 0), POLYGON ((0 0, 1 0, 1 1, 0 1, 0 
0)), LINESTRING (0 0, 1 1), GEOMETRYCOLLECTION (POLYGON ((0 0, -1 0, -1 -1, 0 
-1, 0 0))))",
+            "GEOMETRYCOLLECTION (POINT (0 0), POLYGON ((0 0, 1 0, 1 1, 0 1, 0 
0)), LINESTRING (0 0, 1 1), GEOMETRYCOLLECTION (POLYGON ((0 0, -1 0, -1 -1, 0 
-1, 0 0))))",
+        ),
+        ("POINT Z (0 0 0)", "POINT Z (0 0 0)"),
+        ("POINT ZM (0 0 0 0)", "POINT ZM (0 0 0 0)"),
+    ],
+)
+def test_st_astext(eng, geom, expected):
+    eng = eng.create_or_skip()
+    # Engines vary on where they place spaces in the text, so we convert back 
to geometry to compare

Review Comment:
   I think this would be a better test if checked the output directly (which 
should eliminate the need to repeat the parameter twice). Because there's 
differing behaviour, parameterizing by engine doesn't make sense but you can do 
something like:
   
   ```python
   sedona = SedonaDB()
   sedona_wkt = geom.replace(" (", "").replace(", ", ",")
   sedona.assert_query_equal(..., sedona_wkt)
   
   postgis = PostGIS.create_or_skip()
   postgis.assert_query_equal(..., geom)
   ```



##########
python/sedonadb/tests/functions/test_functions.py:
##########
@@ -50,6 +50,106 @@ def test_st_area(eng, geom, expected):
     eng.assert_query_result(f"SELECT ST_Area({geom_or_null(geom)})", expected)
 
 
[email protected]("eng", [SedonaDB, PostGIS])
[email protected](
+    ("geom", "expected"),
+    [
+        ("POINT (1 1)", b"0101000000000000000000F03F000000000000F03F"),
+        ("POINT EMPTY", b"0101000000000000000000F87F000000000000F87F"),
+        (
+            "LINESTRING (0 0, 1 2, 3 4)",
+            
b"01020000000300000000000000000000000000000000000000000000000000F03F000000000000004000000000000008400000000000001040",
+        ),
+        ("LINESTRING EMPTY", b"010200000000000000"),
+        (
+            "POINT ZM (0 0 0 0)",
+            
b"01B90B00000000000000000000000000000000000000000000000000000000000000000000",
+        ),
+        (
+            "GEOMETRYCOLLECTION (POINT (0 0), POLYGON ((0 0, 1 0, 1 1, 0 1, 0 
0)))",
+            
b"0107000000020000000101000000000000000000000000000000000000000103000000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000",
+        ),
+    ],
+)
+def test_st_asbinary(eng, geom, expected):
+    eng = eng.create_or_skip()
+    eng.assert_query_result(f"SELECT ST_AsBinary({geom_or_null(geom)})", 
expected)
+
+
[email protected]("eng", [SedonaDB, PostGIS])
[email protected](
+    ("geom", "expected"),
+    [
+        (None, None),
+        # Arrow-rs returns POINT (nan nan) instead of POINT EMPTY

Review Comment:
   ```suggestion
           # geoarrow-c returns POINT (nan nan) instead of POINT EMPTY
   ```



##########
python/sedonadb/tests/functions/test_functions.py:
##########
@@ -50,6 +50,106 @@ def test_st_area(eng, geom, expected):
     eng.assert_query_result(f"SELECT ST_Area({geom_or_null(geom)})", expected)
 
 
[email protected]("eng", [SedonaDB, PostGIS])
[email protected](
+    ("geom", "expected"),
+    [
+        ("POINT (1 1)", b"0101000000000000000000F03F000000000000F03F"),

Review Comment:
   ```suggestion
           ("POINT (1 1)", "0101000000000000000000F03F000000000000F03F"),
   ```
   
   If we're asserting bytes output as uppercase hex strings, the `b"` is 
confusing. If we're asserting bytes output as bytes, use a bytestring with `\x` 
escapes (e.g., `b"\x01\x01\x00..."`). I think hex strings will work out of the 
box here without any modifications to the tester (because that is how Arrow C++ 
casts them to string).



##########
python/sedonadb/python/sedonadb/testing.py:
##########
@@ -183,6 +188,10 @@ def result_to_tuples(self, result, *, wkt_precision=None) 
-> List[Tuple[str]]:
             # isinstance() does not always work with pyarrow in pytest
             if _type_is_geoarrow(col.type):
                 columns.append(ga.format_wkt(col, 
precision=wkt_precision).to_pylist())
+            elif datatype == "binary":
+                binary_lst = col.cast(pa.binary()).to_pylist()
+                # Convert to hex format for comparison e.g 
b'0101000000000000000000F03F000000000000F03F',
+                columns.append([b.hex().upper().encode() for b in binary_lst])

Review Comment:
   I thought that `col.cast(pa.string())` does this already. Does it not 
generate an uppercase hex string?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add more python integration tests + fix pre-commit ruff-format [sedona-db]

Reply via email to