paleolimbot commented on code in PR #475:
URL: https://github.com/apache/sedona-db/pull/475#discussion_r2656770632


##########
r/sedonadb/R/dataframe.R:
##########
@@ -316,6 +316,54 @@ as_nanoarrow_array_stream.sedonadb_dataframe <- 
function(x, ..., schema = NULL)
 
 #' @export
 print.sedonadb_dataframe <- function(x, ..., width = NULL, n = NULL) {
+  # Print class header
+  schema <- nanoarrow::infer_nanoarrow_schema(x)
+  ncols <- length(schema$children)
+
+  cat(sprintf("# A sedonadb_dataframe: ? x %d\n", ncols))
+
+  # Print geometry column info
+  # we just use sd_parse_crs() to extract CRS info from 
ARROW:extension:metadata
+  geo_col_info <- character()
+  for (col_name in names(schema$children)) {
+    child <- schema$children[[col_name]]
+    ext_name <- child$metadata[["ARROW:extension:name"]]
+    if (!is.null(ext_name) && grepl("^geoarrow\\.", ext_name)) {

Review Comment:
   It requires adding a bit more in Rust, but I think we could leverage the 
existing SedonaDB code that does some of this with:
   
   ```r
   sd_type <- SedonaType$new(child)
   if (sd_type$logical_type_name() == "geometry") {
      geo_col_info <- c(geo_col_info, sprintf("%s%s", child$name, 
sd_type$crs$display()))
   }
   ```
   
   For this approach specifically, `nanoarrow::nanoarrow_schema_parse(schema)` 
and `geoarrow::geoarrow_schema_parse()` could help extract the CRS string for 
you (so that in Rust all you need is `deserialize_crs(&str)`).



##########
r/sedonadb/R/pkg-sf.R:
##########
@@ -29,7 +29,7 @@ as_sedonadb_dataframe.sf <- function(x, ..., schema = NULL) {
   as_sedonadb_dataframe(new_sedonadb_dataframe(ctx, df), schema = schema)
 }
 
-# dynamically registered in zzz.R
+#' @exportS3Method sf::st_as_sf

Review Comment:
   I didn't know about this...cool!



##########
r/sedonadb/R/000-wrappers.R:
##########


Review Comment:
   You're kind to regenerate these files with a minimal diff...in #468 I added 
a shell script that handles adding the license header and fixing the typo in 
init.c, so hopefully this gets easier. Feel free to copy that script into this 
PR to make your life easier 🙂 



##########
r/sedonadb/src/rust/src/lib.rs:
##########
@@ -66,3 +66,60 @@ fn configure_proj_shared(
     configure_global_proj_engine(builder)?;
     Ok(())
 }
+
+#[savvy]
+fn parse_crs_metadata(crs_json: &str) -> savvy::Result<savvy::Sexp> {
+    use sedona_schema::crs::deserialize_crs_from_obj;
+
+    // The input is GeoArrow extension metadata, which is a JSON object like:
+    // {"crs": <PROJJSON or string>}
+    // We need to extract the "crs" field first.
+    let metadata: serde_json::Value = serde_json::from_str(crs_json)
+        .map_err(|e| savvy::Error::new(format!("Failed to parse metadata JSON: 
{e}")))?;

Review Comment:
   What do you think about adding a savvy-powered wrapper around the `Crs` 
object to expose this information?
   
   ```rust
   #[savvy]
   struct SedonaDBCrs {
     inner: Arc<dyn CoordinateReferenceSystem>
   }
   
   #[savvy]
   impl SedonaDBCrs {
     fn srid(&self) -> savvy::Result<Optional<i32>> {
       // I forget if we can return `Optional<i32>` here...we might need to 
return `savvy::Sexp`
       // and construct the Sexp ourselves
       Ok(self.inner.srid()
     }
   }
   ```
   
   To do this we'd probably also need a wrapper around `SedonaType` (I linked 
the Python file where something similar happens).



##########
r/sedonadb/tests/testthat/test-crs.R:
##########
@@ -0,0 +1,145 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+test_that("sd_parse_crs works for GeoArrow metadata with EPSG", {
+  meta <- '{"crs": {"id": {"authority": "EPSG", "code": 5070}, "name": "NAD83 
/ Conus Albers"}}'
+  parsed <- sedonadb:::sd_parse_crs(meta)
+  expect_identical(parsed$authority_code, "EPSG:5070")
+  expect_identical(parsed$srid, 5070L)
+  expect_identical(parsed$name, "NAD83 / Conus Albers")
+  # The proj_string is the *unwrapped* and *minified* PROJJSON content
+  expect_match(parsed$proj_string, '"authority":"EPSG"', fixed = TRUE)
+  expect_match(parsed$proj_string, '"code":5070', fixed = TRUE)
+})
+
+test_that("sd_parse_crs works for Engineering CRS (no EPSG ID)", {
+  # A realistic example of a local engineering CRS that wouldn't have an EPSG 
code
+  meta <- '{
+    "crs": {
+      "type": "EngineeringCRS",
+      "name": "Construction Site Local Grid",
+      "datum": {
+        "type": "EngineeringDatum",
+        "name": "Local Datum"
+      },
+      "coordinate_system": {
+        "subtype": "Cartesian",
+        "axis": [
+          {"name": "Northing", "abbreviation": "N", "direction": "north", 
"unit": "metre"},
+          {"name": "Easting", "abbreviation": "E", "direction": "east", 
"unit": "metre"}
+        ]
+      }
+    }
+  }'
+  parsed <- sedonadb:::sd_parse_crs(meta)
+  expect_null(parsed$authority_code)
+  expect_null(parsed$srid)
+  expect_identical(parsed$name, "Construction Site Local Grid")
+  expect_false(is.null(parsed$proj_string))
+})
+
+test_that("sd_parse_crs returns NULL if crs field is missing", {
+  expect_null(sedonadb:::sd_parse_crs('{"something_else": 123}'))
+  expect_null(sedonadb:::sd_parse_crs('{}'))
+})
+
+test_that("sd_parse_crs handles invalid JSON gracefully", {
+  expect_error(
+    sedonadb:::sd_parse_crs('invalid json'),
+    "Failed to parse metadata JSON"
+  )
+})
+
+test_that("sd_parse_crs works with plain strings if that's what's in 'crs'", {
+  meta <- '{"crs": "EPSG:4326"}'
+  parsed <- sedonadb:::sd_parse_crs(meta)
+  # Note: PROJ/sedona normalizes EPSG:4326 (lat/lon) to OGC:CRS84 (lon/lat)
+  # for consistent axis order in WKT/GeoJSON contexts.
+  expect_identical(parsed$authority_code, "OGC:CRS84")
+  expect_identical(parsed$srid, 4326L)
+  expect_false(is.null(parsed$proj_string))
+})
+
+# Tests for CRS display in print.sedonadb_dataframe
+
+test_that("print.sedonadb_dataframe shows CRS info for geometry column with 
EPSG", {
+  df <- sd_sql("SELECT ST_SetSRID(ST_Point(1, 2), 4326) as geom")
+  output <- capture.output(print(df, n = 0))
+
+  # Check that the Geometry line is present
+
+  geo_line <- grep("^# Geometry:", output, value = TRUE)
+  expect_length(geo_line, 1)
+
+  # Should show CRS information (OGC:CRS84 or EPSG:4326)
+  expect_match(geo_line, "geom .*(CRS: OGC:CRS84|CRS: EPSG:4326)")
+})
+
+test_that("print.sedonadb_dataframe shows CRS info with different SRID", {
+  df <- sd_sql("SELECT ST_SetSRID(ST_Point(1, 2), 5070) as geom")
+  output <- capture.output(print(df, n = 0))
+
+  geo_line <- grep("^# Geometry:", output, value = TRUE)
+  expect_length(geo_line, 1)
+  expect_match(geo_line, "geom .*(CRS: EPSG:5070|CRS:.*5070)")

Review Comment:
   I think these (all) would be easier to review/maintain with 
`expect_snapshot()`. CI will complain at you if you add this before #468 merges 
(feel free to copy over the ignores I added there for `_snaps` or I can help if 
you like).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to