jorisvandenbossche commented on PR #240:
URL: https://github.com/apache/parquet-format/pull/240#issuecomment-2638004832

   I follow what Dewey has already answered, but I want to additionally 
clarify a few points from Ryan's post:
   
   > Also, please correct me if I'm wrong here. My current understanding is 
that the WKB data will correspond to the CRS even if the bounding box 
dimensions override it.
   
   @rdblue if I understand you correctly, then yes I think that is not correct. 
WKB data is defined to be x/y, and almost any producer of WKB values or file 
format using WKB under the hood (including GeoParquet) will use the mapping of 
x=lon / y=lat. 
   So for example, when using EPSG:4326 (defined with an axis order of 
lat/lon), the coordinate order in the WKB will not correspond to the axis order 
of the CRS.
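
   To make the x=lon / y=lat convention concrete, here is a minimal sketch of 
how a 2D point is laid out in WKB (byte-order flag, geometry type, then x and 
y as doubles). The helper names `point_to_wkb` and `wkb_point_xy` are 
hypothetical and only for illustration; they are not part of any library or of 
the spec.

```python
import struct


def point_to_wkb(lon: float, lat: float) -> bytes:
    """Encode a 2D point as little-endian WKB with x=lon, y=lat."""
    # 1 = little-endian byte order flag, 1 = WKB geometry type "Point"
    return struct.pack("<BIdd", 1, 1, lon, lat)


def wkb_point_xy(wkb: bytes) -> tuple[float, float]:
    """Decode the (x, y) pair of a 2D WKB point."""
    endian = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D points")
    return x, y


# Amsterdam, stored in the order that WKB producers conventionally use:
# x = longitude, y = latitude, even though EPSG:4326 declares lat/lon.
wkb = point_to_wkb(4.9, 52.37)
x, y = wkb_point_xy(wkb)
```

   Here `x` is the longitude and `y` is the latitude, regardless of the axis 
order that the CRS definition declares.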
   
   > This specifically states that the order of dimensions in bounding box 
metadata must differ from the CRS in some cases. To me, that seems like a big 
implementation risk if people don't know to swap them. In addition, the names 
that we use for the bounding box values (xmin, ymin, xmax, ymax) are misleading 
when the WKB values use x=latitude, y=longitude but x and y in metadata must be 
x=longitude, y=latitude.
   
   So with my above answer, your last sentence is also not correct (I 
am considering GeoParquet here for a moment). We define both the bbox and the 
WKB values to use the convention of x=lon / y=lat, so that the bbox and the WKB 
data are always consistent with each other. 
   This actually ensures that you can read and filter data based on the bbox 
statistics _without_ having to inspect the CRS of the column. You mention _"I 
think we want to avoid needing everything to understand the CRS"_, but that 
is exactly what GeoParquet tries to achieve by saying that x=lon and y=lat. 
If you are not sure whether the bbox and WKB data are lon/lat or lat/lon, 
then you always have to inspect the CRS first before you know how to specify 
the bbox filter and how to parse the WKB values.
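
   As a rough illustration of that point, the sketch below prunes row groups 
using only bbox statistics, under the assumption that both the statistics and 
the query box use x=lon / y=lat. The `BBox` type and the function names are 
hypothetical, and the sketch deliberately ignores boxes that wrap around the 
antimeridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float  # assumed to be longitude
    ymin: float  # assumed to be latitude
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two axis-aligned boxes overlap (no wraparound handling)."""
    return (
        a.xmin <= b.xmax
        and b.xmin <= a.xmax
        and a.ymin <= b.ymax
        and b.ymin <= a.ymax
    )


def row_groups_to_read(stats: list[BBox], query: BBox) -> list[int]:
    """Select row group indices whose bbox statistics overlap the query box."""
    return [i for i, bbox in enumerate(stats) if intersects(bbox, query)]


# Made-up statistics for two row groups; no CRS lookup is needed to filter.
stats = [BBox(0.0, 50.0, 10.0, 55.0), BBox(100.0, -10.0, 110.0, 0.0)]
query = BBox(4.0, 52.0, 5.0, 53.0)
selected = row_groups_to_read(stats, query)
```

   If the axis order instead depended on the CRS, the same filter would first 
have to parse the CRS and possibly swap the query coordinates.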
   
   ---
   
   This is clearly all confusing, and it is easy to misunderstand or 
misinterpret each other, which is IMO a good reason to make this more explicit 
in the spec. So I am personally not a fan of Dewey's last suggestion of leaving 
this vague and letting implementations choose how to handle it (which in 
practice will then be how GeoParquet does it, I would guess, but which is the 
opposite of what you _could_ read in the current version of the spec).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

