Re: [PR] perf: Optimize st_has(z/m) using WKBBytesExecutor [sedona-db]

via GitHub Fri, 17 Oct 2025 14:06:57 -0700


petern48 commented on code in PR #171:
URL: https://github.com/apache/sedona-db/pull/171#discussion_r2403696499



##########
rust/sedona-functions/src/st_haszm.rs:
##########
@@ -107,28 +107,63 @@ impl SedonaScalarKernel for STHasZm {
     }
 }
 
-fn invoke_scalar(item: &Wkb, dim_index: usize) -> Result<Option<bool>> {
-    match item.as_type() {
-        geo_traits::GeometryType::GeometryCollection(collection) => {
-            use geo_traits::GeometryCollectionTrait;
-            if collection.num_geometries() == 0 {
-                Ok(Some(false))
-            } else {
-                // PostGIS doesn't allow creating a GeometryCollection with geometries of different dimensions
-                // so we can just check the dimension of the first one
-                let first_geom = unsafe { collection.geometry_unchecked(0) };
-                invoke_scalar(first_geom, dim_index)
-            }
-        }
-        _ => {
-            let geom_dim = item.dim();
-            match dim_index {
-                2 => Ok(Some(matches!(geom_dim, Dimensions::Xyz | Dimensions::Xyzm))),
-                3 => Ok(Some(matches!(geom_dim, Dimensions::Xym | Dimensions::Xyzm))),
-                _ => sedona_internal_err!("unexpected dim_index"),
-            }
+/// Fast-path inference of geometry type name from raw WKB bytes
+/// An error will be thrown for invalid WKB bytes input
+///
+/// Spec: https://libgeos.org/specifications/wkb/
+fn infer_haszm(buf: &[u8], dim_index: usize) -> Result<Option<bool>> {
+    if buf.len() < 5 {
+        return sedona_internal_err!("Invalid WKB: buffer too small ({} bytes)", buf.len());
+    }
+
+    let byte_order = buf[0];
+    let code = match byte_order {
+        0 => u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]),
+        1 => u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]),
+        other => return sedona_internal_err!("Unexpected byte order: {other}"),
+    };
+
+    // 0000 -> xy or unspecified
+    // 1000 -> xyz
+    // 2000 -> xym
+    // 3000 -> xyzm
+    match code / 1000 {
+        // If xy, it's possible we need to infer the dimension
+        0 => {}
+        1 => return Ok(Some(dim_index == 2)),
+        2 => return Ok(Some(dim_index == 3)),
+        3 => return Ok(Some(true)),
+        _ => return sedona_internal_err!("Unexpected code: {code}"),
+    };

Review Comment:
   Interesting. I didn't think we handled EWKB anywhere in sedona-db, since
CRS info is stored at the type level (at least for now). Given that, why
would we use EWKB in sedona-db at all? How do other parts of the codebase
detect whether a buffer is EWKB or WKB? If EWKB input is really possible, it
would break the GEOMETRYCOLLECTION implementation, where I need to recurse
into the nested geometry: when the SRID flag is set, a 4-byte SRID follows
wkbType and shifts the offset of everything after it.
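
   To make the offset concern concrete, here is a minimal sketch of a header
reader that handles both ISO WKB type codes and EWKB flag bits. It assumes
the PostGIS EWKB convention (0x80000000 for Z, 0x40000000 for M, 0x20000000
for SRID in the high bits of the type word). `HeaderInfo` and `read_header`
are hypothetical names for illustration, not existing sedona-db APIs, and
errors are plain strings rather than `sedona_internal_err!`.

```rust
// EWKB flag bits (PostGIS convention), stored in the high bits of the type word.
const EWKB_Z: u32 = 0x8000_0000;
const EWKB_M: u32 = 0x4000_0000;
const EWKB_SRID: u32 = 0x2000_0000;

/// Hypothetical summary of a WKB/EWKB header (not a sedona-db type).
#[derive(Debug, PartialEq)]
struct HeaderInfo {
    has_z: bool,
    has_m: bool,
    /// Offset of the first byte after the header:
    /// 5 for plain WKB, 9 when a 4-byte SRID follows the type word.
    body_offset: usize,
}

fn read_header(buf: &[u8]) -> Result<HeaderInfo, String> {
    if buf.len() < 5 {
        return Err(format!("buffer too small ({} bytes)", buf.len()));
    }
    let word = [buf[1], buf[2], buf[3], buf[4]];
    let code = match buf[0] {
        0 => u32::from_be_bytes(word),
        1 => u32::from_le_bytes(word),
        other => return Err(format!("unexpected byte order: {other}")),
    };

    let has_srid = code & EWKB_SRID != 0;
    if has_srid && buf.len() < 9 {
        return Err("truncated SRID".to_string());
    }

    // Strip the EWKB flags so the ISO thousands digit can be read on its own.
    let base = code & !(EWKB_Z | EWKB_M | EWKB_SRID);
    let (iso_z, iso_m) = match base / 1000 {
        0 => (false, false),
        1 => (true, false),
        2 => (false, true),
        3 => (true, true),
        _ => return Err(format!("unexpected code: {code}")),
    };

    Ok(HeaderInfo {
        has_z: iso_z || code & EWKB_Z != 0,
        has_m: iso_m || code & EWKB_M != 0,
        body_offset: if has_srid { 9 } else { 5 },
    })
}
```

   With something like this, the GEOMETRYCOLLECTION path would read the
geometry count at `body_offset` instead of a fixed offset of 5. Whether
nested geometries can carry their own SRID is a separate question this sketch
does not address.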


