2010YOUY01 commented on code in PR #560:
URL: https://github.com/apache/sedona-db/pull/560#discussion_r2758192176


##########
python/sedonadb/python/sedonadb/context.py:
##########
@@ -134,14 +135,60 @@ def read_parquet(
                 files.
             options: Optional dictionary of options to pass to the Parquet reader.
                 For S3 access, use {"aws.skip_signature": True, "aws.region": "us-west-2"} for anonymous access to public buckets.
+            geometry_columns: Optional JSON string mapping column name to
+                GeoParquet column metadata (e.g.,
+                '{"geom": {"encoding": "WKB"}}'). Use this to mark binary WKB
+                columns as geometry columns or correct metadata such as the
+                column CRS.
+
+                Supported keys (others in the spec are not implemented):
+                - encoding: "WKB" (required if the column is not already geometry)
+                - crs: (e.g., "EPSG:4326")
+                - edges: "planar" (default) or "spherical"
+                See spec for details: https://geoparquet.org/releases/v1.1.0/
+
+                Useful for:
+                - Legacy Parquet files with Binary columns containing WKB payloads.
+                - Overriding GeoParquet metadata when fields like `crs` are missing.
+
+                Precedence:
+                - GeoParquet metadata is used to infer geometry columns first.
+                - geometry_columns then overrides the auto-inferred schema:
+                  - If a column is not geometry in metadata but appears in
+                    geometry_columns, it is treated as a geometry column.
+                  - If a column is geometry in metadata and also appears in
+                    geometry_columns, only the provided keys override; other
+                    fields remain as inferred. If a key already exists in metadata
+                    and is provided again with a different value, an error is
+                    returned.
+
+                Example:
+                - For `geo.parquet(geo1: geometry, geo2: geometry, geo3: binary)`,
+                  `read_parquet("geo.parquet", geometry_columns='{"geo2": {"encoding": "WKB"}, "geo3": {"encoding": "WKB"}}')`
+                  overrides `geo2` metadata and treats `geo3` as a geometry column.
+                - If `geo` inferred from metadata has:
+                  - `geo: {"encoding": "wkb", "crs": None, "edges": "spherical"...}`
+                  and geometry_columns provides:
+                  - `geo: {"crs": 4326}`
+                  then the result is (only override provided keys):
+                  - `geo: {"encoding": "wkb", "crs": "EPSG:4326", "edges": "spherical"...}`
+                - If `geo` inferred from metadata has:
+                  - `geo: {"encoding": "wkb", "crs": "EPSG:4326"}`
+                  and geometry_columns provides:
+                  - `geo: {"crs": "EPSG:3857"}`
+                  an error is returned for a conflicting key. This option is only
+                  allowed to provide missing optional fields in geometry columns.

Review Comment:
   Addressed in [5113c2f](https://github.com/apache/sedona-db/pull/560/commits/5113c2fc8c5f1836f910fa6c22e750c199ebe65d).
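
   For readers following the precedence rules in the docstring above, here is a minimal sketch of the merge behavior as documented. It is not the code from this PR: `merge_geometry_columns` and the plain-dict representation of the inferred metadata are hypothetical, and the sketch assumes exact value comparison, with no normalization of values such as `4326` to `"EPSG:4326"` or `"wkb"` to `"WKB"`.

```python
import json


def merge_geometry_columns(inferred, geometry_columns):
    """Hypothetical helper mirroring the documented precedence rules.

    inferred: dict of column name -> metadata dict inferred from GeoParquet.
    geometry_columns: JSON string of column name -> metadata overrides.
    """
    overrides = json.loads(geometry_columns)
    # Copy so the caller's inferred metadata is left untouched.
    merged = {name: dict(meta) for name, meta in inferred.items()}

    for name, provided in overrides.items():
        if name not in merged and "encoding" not in provided:
            # A column that is not already geometry must declare its encoding.
            raise ValueError(f"column {name!r} requires an 'encoding' key")
        current = merged.setdefault(name, {})
        for key, value in provided.items():
            existing = current.get(key)
            # Only missing (absent or None) fields may be filled in.
            if existing is not None and existing != value:
                raise ValueError(
                    f"conflicting value for {name!r}.{key}: "
                    f"{existing!r} vs {value!r}"
                )
            current[key] = value
    return merged


inferred = {"geo": {"encoding": "WKB", "crs": None, "edges": "spherical"}}
merged = merge_geometry_columns(
    inferred,
    json.dumps({"geo": {"crs": "EPSG:4326"}, "geo3": {"encoding": "WKB"}}),
)
```

   In this sketch `geo` keeps its inferred `encoding` and `edges` and gains the missing `crs`, `geo3` becomes a new geometry column, and passing `{"geo": {"crs": "EPSG:3857"}}` against metadata that already carries `"EPSG:4326"` raises `ValueError`.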


