paleolimbot commented on PR #240:
URL: https://github.com/apache/parquet-format/pull/240#issuecomment-2637914308

   I think we are all on the same page here! The high-level intent for 
everybody is that the Parquet and Iceberg types are able to interoperate with 
the rest of the ecosystem (to maximize adoption for spatial and non-spatial 
libraries alike) with minimal ambiguity, and that Parquet and Iceberg should 
take on a minimum of spatial understanding.
   
   It is often confusing, but I cannot stress enough that the language we have 
in GeoParquet is the industry standard. Requiring something else is a 
sufficient barrier to interoperability that there is a risk the "official" type 
will not be supported. (This is not to say that I will not try to help. I will! 
But I can only do so much in the face of such a significant departure from the 
norm.)
   
   > This specifically states that the order of dimensions in bounding box 
metadata must differ from the CRS in some cases
   
   I think it was always the (perhaps unclear) intent that the axis order of 
the bounds/statistics is identical to that of the WKB (on purpose, so that 
implementations do not need to parse the CRS to iterate over the WKB and 
calculate the bounds). Perhaps there is a way to make this clearer?
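
   To illustrate why no CRS parsing is needed, here is a minimal sketch that 
computes bounds directly from the stored ordinates. The function name 
`wkb_xy_bounds` is hypothetical, and the sketch assumes 2D ISO WKB and handles 
only Point and LineString; it is not how any particular library does it.

```python
import struct


def wkb_xy_bounds(wkb: bytes):
    """Return (xmin, ymin, xmax, ymax) in WKB storage order.

    Hypothetical helper: supports only 2D Point (type 1) and
    LineString (type 2). The CRS is never consulted.
    """
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(byte_order + "I", wkb, 1)
    offset = 5

    if geom_type == 1:  # Point: exactly one coordinate
        n_coords = 1
    elif geom_type == 2:  # LineString: uint32 count, then coordinates
        (n_coords,) = struct.unpack_from(byte_order + "I", wkb, offset)
        offset += 4
    else:
        raise ValueError(f"unsupported geometry type: {geom_type}")

    if n_coords == 0:
        raise ValueError("empty geometry has no bounds")

    # Ordinates are interleaved as stored: first, second, first, second, ...
    values = struct.unpack_from(f"{byte_order}{2 * n_coords}d", wkb, offset)
    first, second = values[0::2], values[1::2]
    return min(first), min(second), max(first), max(second)


# A little-endian LineString with two coordinates: (10, 50) and (12, 48).
line = struct.pack("<BII4d", 1, 2, 2, 10.0, 50.0, 12.0, 48.0)
bounds = wkb_xy_bounds(line)
```

   The bounds come out in the same order as the stored ordinates, whatever 
the CRS says about its axes.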
   
   > I'm actually leaning toward the current language where X and Y are 
consistent with the data values and never flipped
   
   I think this requires that GeoArrow/GeoPandas/PostGIS/every other 
library I'm aware of either (1) rewrite their WKB before writing to 
Parquet (slow) or (2) permute the axes of the CRS (which invalidates the 
identifier and requires some logic that isn't baked into most libraries today).
   
   > It seems to me that this makes the choice opaque and has much less 
implementation risk.
   
   Some options that I think would have less implementation risk:
   
   - Include an optional `permutation` alongside the CRS (e.g., `[0, 1]` to 
indicate authority compliance), but assume 
GeoParquet/GeoArrow/GeoPackage/Industry standard otherwise (credit to Martin 
who pointed out this option in a thread on this PR).
   - Use the GeoParquet/GeoArrow/GeoPackage/Industry standard language and see 
if there are any issues (I'm not aware of any in several years of experience 
with GeoArrow/GeoParquet).
   - Make no assertions about axis order (i.e., CRS interpretation is purely up 
to the reader/writer). Because the industry standard is ubiquitous, I think 
this will cause fewer problems than being explicit about the opposite.
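
   As a rough sketch of the first option, a reader could apply the 
permutation like this. The semantics are my assumption and are not defined 
anywhere yet: here `permutation[i]` is taken to be the index of the stored 
ordinate that fills CRS axis `i`, so `[0, 1]` leaves the coordinate untouched. 
The function name `to_crs_axis_order` is hypothetical.

```python
def to_crs_axis_order(coord, permutation):
    """Reorder one stored coordinate into CRS axis order.

    Assumed semantics (not from any spec): permutation[i] is the index
    of the stored ordinate that belongs on CRS axis i.
    """
    if sorted(permutation) != list(range(len(coord))):
        raise ValueError(f"invalid permutation {permutation!r}")
    return tuple(coord[i] for i in permutation)


# Stored as (x, y) = (lon, lat).
stored = (-64.0, 45.0)

# [0, 1]: stored order already matches the CRS axis order.
unchanged = to_crs_axis_order(stored, [0, 1])

# [1, 0]: the CRS lists latitude first, so the ordinates are swapped.
swapped = to_crs_axis_order(stored, [1, 0])
```

   With no permutation present, a reader would skip this step and treat the 
stored ordinates as x, y per the existing convention.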


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


