paleolimbot commented on PR #240: URL: https://github.com/apache/parquet-format/pull/240#issuecomment-2637914308
I think we are all on the same page here! The high-level intent for everybody is that the Parquet and Iceberg types are able to interoperate with the rest of the ecosystem (to maximize adoption for spatial and non-spatial libraries alike) with minimal ambiguity, and that Parquet and Iceberg should take on a minimum of spatial understanding. It is often confusing, but I cannot stress enough that the language we have in GeoParquet is the industry standard. Requiring something else is a sufficient barrier to interoperability that there is a risk the "official" type will not be supported. (This is not to say that I will not try to help...I will! But I can only do so much in the face of such a significant departure from the norm.)

> This specifically states that the order of dimensions in bounding box metadata must differ from the CRS in some cases

I think it was always the (perhaps unclear) intent that the axes of the bounds/statistics were identical to the WKB (on purpose, so that implementations do not need to parse the CRS to iterate over the WKB and calculate the bounds). Perhaps there is a way to make this more clear?

> I'm actually leaning toward the current language where X and Y are consistent with the data values and never flipped

I think this requires that GeoArrow/GeoPandas/PostGIS/every other library I'm aware of either (1) rewrite their WKB before writing to Parquet (slow) or (2) permute the axes of the CRS (which invalidates the identifier and requires some logic that isn't baked into most libraries today).

> It seems to me that this makes the choice opaque and has much less implementation risk.

Some options that I think would have less implementation risk:

- Include an optional `permutation` alongside the CRS (e.g., `[0, 1]` to indicate authority compliance), but assume GeoParquet/GeoArrow/GeoPackage/Industry standard otherwise (credit to Martin who pointed out this option in a thread on this PR).
- Use the GeoParquet/GeoArrow/GeoPackage/Industry standard language and see if there are any issues (I'm not aware of any in several years of experience with GeoArrow/GeoParquet)
- Make no assertions about axis order (i.e., CRS interpretation is purely up to the reader/writer). Because the industry standard is ubiquitous, I think this will cause fewer problems than being explicit about the opposite.
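
To illustrate the point about bounds following the WKB rather than the CRS, here is a minimal Python sketch. It is an illustration only, not the method of any particular library: the function names `wkb_point_xy` and `bounds_from_wkb` are hypothetical, and it assumes every value is a plain 2D WKB Point. The idea it shows is that the first stored ordinate is treated as X and the second as Y, so the bounds can be computed without ever reading the CRS.

```python
import struct


def wkb_point_xy(wkb: bytes):
    """Return the two ordinates of a 2D WKB Point in stored order.

    Hypothetical helper: only handles geometry type 1 (Point) without Z/M.
    """
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    (geometry_type,) = struct.unpack_from(byte_order + "I", wkb, 1)
    if geometry_type != 1:
        raise ValueError("this sketch only handles 2D WKB Point values")
    # The ordinates follow the 1-byte flag and the 4-byte type code.
    return struct.unpack_from(byte_order + "dd", wkb, 5)


def bounds_from_wkb(values):
    """Compute bounds using stored ordinate order; the CRS is never consulted."""
    xs, ys = zip(*(wkb_point_xy(value) for value in values))
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


# Two points, one little endian and one big endian.
points = [
    struct.pack("<BIdd", 1, 1, -64.0, 45.0),
    struct.pack(">BIdd", 0, 1, -63.0, 44.0),
]
print(bounds_from_wkb(points))
```

A writer following this convention never needs to know whether the CRS authority defines latitude or longitude first; that question only arises for a reader that wants to interpret the ordinates against the CRS.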