wgtmac commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2059908869


##########
Geospatial.md:
##########
@@ -94,6 +94,39 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of edge cases.
+
+* `null` instance: Skip it and continue processing the remaining 
+  geospatial instances. Do not produce a bounding box if all instances are null.
+* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values):
+  * X and Y: Skip any invalid X or Y value and continue processing the 
+    remaining X or Y values. Do not produce a bounding box if all X or all Y 
+    values are invalid.
+
+  * Z: Skip any invalid Z value and continue processing the remaining Z values.
+    Omit Z from the bounding box if all Z values are invalid.
+
+  * M: Skip any invalid M value and continue processing the remaining M values.
+    Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes. 
+Parquet does not permit `null` or `NaN` values in bounding boxes, whether at 
+the overall bounding box level or within individual coordinate fields.
+
+* No bounding box: No assumptions can be made about the presence or validity 
+  of coordinate values. Readers may need to load all individual coordinate 
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: Both X and Y of the bounding box must be present.

Review Comment:
   ```suggestion
       * X and Y: Both X and Y of the bounding box must be present. If any X or Y
         value is invalid, this bounding box is not reliable and cannot be used.
   ```
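
To make the suggested reader rule concrete, here is a minimal Python sketch. It assumes a hypothetical `BoundingBox` dataclass that mirrors the Thrift struct (X/Y bounds always present, Z/M bounds optional) and treats `NaN` as the invalid value. The helper name `xy_usable` is illustrative, not part of any Parquet API. It refuses to use the bounding box for X/Y filtering if any X or Y bound is invalid.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundingBox:
    # Hypothetical mirror of the Thrift BoundingBox struct: X and Y bounds
    # are always present, Z and M bounds are optional (None when omitted).
    xmin: float
    xmax: float
    ymin: float
    ymax: float
    zmin: Optional[float] = None
    zmax: Optional[float] = None
    mmin: Optional[float] = None
    mmax: Optional[float] = None


def _is_valid(v: Optional[float]) -> bool:
    # A missing (None) or NaN bound is treated as invalid.
    return v is not None and not math.isnan(v)


def xy_usable(bbox: Optional[BoundingBox]) -> bool:
    """Return True only if the X/Y bounds can be trusted for filtering."""
    if bbox is None:
        return False  # no bounding box: make no assumptions
    return all(_is_valid(v) for v in (bbox.xmin, bbox.xmax, bbox.ymin, bbox.ymax))


print(xy_usable(BoundingBox(0.0, 10.0, 0.0, 5.0)))          # True
print(xy_usable(BoundingBox(0.0, float("nan"), 0.0, 5.0)))  # False
```

Z and M are deliberately ignored here, so an invalid Z bound does not by itself make the X/Y bounds unusable.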



##########
Geospatial.md:
##########
@@ -162,3 +195,19 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet
 follows the de facto standard for axis order in WKB and is therefore always
 (x, y) where x is easting or longitude and y is northing or latitude. This
 ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to the coordinate values of a non-`null` 
+geospatial instance that are encoded in a valid WKB format, but are not 
+considered valid values under this specification. While different WKB 
+readers may interpret such values differently, the resulting output should 
+be treated as invalid.
+
+* `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a 
+  `Point` with each ordinate value set to an IEEE-754 quiet NaN value.
+* `Empty geometries`: Geometries explicitly marked as empty in WKB using 

Review Comment:
   I suppose that `LINESTRING EMPTY` or `POLYGON EMPTY` are WKT? Do we have canonical WKB values to demonstrate?
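
For reference, the sketch below builds the WKB byte strings commonly used for these empty geometries. It assumes little-endian byte order (flag `0x01`) and 2D ISO type codes (1 = Point, 2 = LineString, 3 = Polygon). `POINT EMPTY` uses two quiet NaN ordinates, as the diff describes, while `LINESTRING EMPTY` and `POLYGON EMPTY` are marked empty only by a zero `numPoints` or `numRings` count. The constant names are illustrative; whether these bytes should appear in the spec as canonical examples is still the open question here.

```python
import struct

# Assumptions for this sketch: little-endian WKB (byte-order flag 0x01)
# and 2D (XY) ISO geometry type codes.
LITTLE_ENDIAN = b"\x01"
POINT, LINESTRING, POLYGON = 1, 2, 3

# IEEE-754 quiet NaN 0x7FF8000000000000, written out little-endian.
QUIET_NAN_LE = bytes.fromhex("000000000000f87f")


def wkb_header(geom_type: int) -> bytes:
    # 1-byte byte-order flag followed by a uint32 geometry type.
    return LITTLE_ENDIAN + struct.pack("<I", geom_type)


# POINT EMPTY: a Point whose X and Y ordinates are both NaN.
POINT_EMPTY = wkb_header(POINT) + QUIET_NAN_LE * 2
# LINESTRING EMPTY: numPoints = 0.
LINESTRING_EMPTY = wkb_header(LINESTRING) + struct.pack("<I", 0)
# POLYGON EMPTY: numRings = 0.
POLYGON_EMPTY = wkb_header(POLYGON) + struct.pack("<I", 0)

for name, wkb in [("POINT EMPTY", POINT_EMPTY),
                  ("LINESTRING EMPTY", LINESTRING_EMPTY),
                  ("POLYGON EMPTY", POLYGON_EMPTY)]:
    print(f"{name}: {wkb.hex()}")
```

This prints `0101000000000000000000f87f000000000000f87f`, `010200000000000000` and `010300000000000000`. The multi-geometry types follow the same pattern with `numGeometries = 0`.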



##########
Geospatial.md:
##########
@@ -94,6 +94,39 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of edge cases.
+
+* `null` instance: Skip it and continue processing the remaining 
+  geospatial instances. Do not produce a bounding box if all instances are null.
+* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values):
+  * X and Y: Skip any invalid X or Y value and continue processing the 
+    remaining X or Y values. Do not produce a bounding box if all X or all Y 
+    values are invalid.
+
+  * Z: Skip any invalid Z value and continue processing the remaining Z values.
+    Omit Z from the bounding box if all Z values are invalid.
+
+  * M: Skip any invalid M value and continue processing the remaining M values.
+    Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes. 
+Parquet does not permit `null` or `NaN` values in bounding boxes, whether at 
+the overall bounding box level or within individual coordinate fields.
+
+* No bounding box: No assumptions can be made about the presence or validity 
+  of coordinate values. Readers may need to load all individual coordinate 
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: Both X and Y of the bounding box must be present.
+    * Z: If Z of the bounding box is missing, readers should not assume 

Review Comment:
   ```suggestion
       * Z: If Z of the bounding box is missing or contains any invalid value, readers should not assume
   ```
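
One way a reader might apply this Z rule (and, symmetrically, the M rule) is sketched below in Python. `optional_axis_range` and `may_skip_by_z` are hypothetical helper names. The sketch assumes a missing bound arrives as `None` and an invalid one as `NaN`, and it only prunes when a valid Z range provably excludes the query range.

```python
import math
from typing import Optional, Tuple


def optional_axis_range(
    lo: Optional[float], hi: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return (lo, hi) if a Z or M range is usable, otherwise None.

    None means "unknown": the reader should assume nothing about the
    presence or validity of that axis and may need to inspect coordinates.
    """
    if lo is None or hi is None:
        return None  # axis omitted from the bounding box
    if math.isnan(lo) or math.isnan(hi):
        return None  # axis present but holding an invalid value
    return (lo, hi)


def may_skip_by_z(
    zmin: Optional[float], zmax: Optional[float], query_lo: float, query_hi: float
) -> bool:
    """True only when a valid Z range proves no coordinate can match the query."""
    rng = optional_axis_range(zmin, zmax)
    if rng is None:
        return False  # unknown Z: cannot prune, the data must be read
    lo, hi = rng
    return hi < query_lo or lo > query_hi


print(may_skip_by_z(0.0, 10.0, 20.0, 30.0))           # True: ranges do not overlap
print(may_skip_by_z(None, None, 20.0, 30.0))          # False: Z omitted
print(may_skip_by_z(float("nan"), 10.0, 20.0, 30.0))  # False: Z invalid
```

Treating "missing" and "invalid" identically keeps the reader conservative: in both cases it falls back to reading the data rather than trusting the bounds.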



##########
Geospatial.md:
##########
@@ -162,3 +195,19 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet
 follows the de facto standard for axis order in WKB and is therefore always
 (x, y) where x is easting or longitude and y is northing or latitude. This
 ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to the coordinate values of a non-`null` 
+geospatial instance that are encoded in a valid WKB format, but are not 
+considered valid values under this specification. While different WKB 
+readers may interpret such values differently, the resulting output should 
+be treated as invalid.
+
+* `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a 
+  `Point` with each ordinate value set to an IEEE-754 quiet NaN value.
+* `Empty geometries`: Geometries explicitly marked as empty in WKB using 
+  indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples 
+  include `LINESTRING EMPTY` or `POLYGON EMPTY`.
+* `Out-of-bounds coordinates`: Values that fall outside the valid range 

Review Comment:
   Do we need to provide all invalid examples so implementations do not miss anything?
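
If the list of invalid cases grows, a reference check might help implementations stay consistent. The Python sketch below covers only the empty-geometry case. It parses the WKB header and treats a Point as empty when every ordinate is NaN, and any other type as empty when its leading count (`numPoints`, `numRings` or `numGeometries`) is zero. It assumes ISO WKB type codes (dimension encoded as `type // 1000`) and does not handle EWKB flags; `is_empty_wkb` is a hypothetical name.

```python
import math
import struct

# Ordinates per point for the ISO WKB dimension prefix (geometry type // 1000):
# 0 -> XY, 1 -> XYZ, 2 -> XYM, 3 -> XYZM.
_ORDINATES = {0: 2, 1: 3, 2: 3, 3: 4}


def is_empty_wkb(buf: bytes) -> bool:
    """Report whether an ISO WKB geometry is EMPTY (hypothetical helper)."""
    if buf[0] not in (0, 1):
        raise ValueError(f"unknown byte-order flag {buf[0]}")
    order = "<" if buf[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(order + "I", buf, 1)
    base, dim = geom_type % 1000, geom_type // 1000
    if dim not in _ORDINATES:
        raise ValueError(f"unsupported geometry type {geom_type}")
    if base == 1:
        # Point: empty when every ordinate is NaN.
        coords = struct.unpack_from(order + "d" * _ORDINATES[dim], buf, 5)
        return all(math.isnan(c) for c in coords)
    if 2 <= base <= 7:
        # LineString: numPoints; Polygon: numRings;
        # Multi* and GeometryCollection: numGeometries.
        (count,) = struct.unpack_from(order + "I", buf, 5)
        return count == 0
    raise ValueError(f"unsupported geometry type {geom_type}")


print(is_empty_wkb(bytes.fromhex("010200000000000000")))          # True: LINESTRING EMPTY
print(is_empty_wkb(b"\x01" + struct.pack("<Idd", 1, 1.0, 2.0)))  # False: POINT (1 2)
```

Note that a collection whose only member is itself empty, for example a MultiPoint containing just `POINT EMPTY`, has a non-zero count and is not flagged by this sketch. That is one of the corner cases an exhaustive list would need to settle.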



##########
Geospatial.md:
##########
@@ -162,3 +195,19 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet
 follows the de facto standard for axis order in WKB and is therefore always
 (x, y) where x is easting or longitude and y is northing or latitude. This
 ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to the coordinate values of a non-`null` 
+geospatial instance that are encoded in a valid WKB format, but are not 

Review Comment:
   Since we mention `a valid WKB format`, do we also need to provide guidelines for `invalid WKB format`?
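
To illustrate the distinction this question draws, here is a minimal Python sketch of a structural check that would run before any value checks. A buffer that fails it is malformed WKB, while one that passes may still carry invalid values such as `NaN` or empty geometries. Only the 5-byte header is inspected. `wkb_format_error` is a hypothetical helper, and the sketch takes no position on what a writer should do with malformed input.

```python
import struct
from typing import Optional

# Base ISO geometry type codes 1..7 (Point through GeometryCollection).
_KNOWN_BASE_TYPES = set(range(1, 8))


def wkb_format_error(buf: bytes) -> Optional[str]:
    """Return a description of a malformed WKB header, or None if it looks sane.

    A full validator would also walk the body and confirm that the buffer
    length matches the declared counts; this sketch stops at the header.
    """
    if len(buf) < 5:
        return "buffer shorter than the 5-byte WKB header"
    if buf[0] not in (0, 1):
        return f"unknown byte-order flag {buf[0]}"
    order = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", buf, 1)
    if geom_type % 1000 not in _KNOWN_BASE_TYPES or geom_type // 1000 > 3:
        return f"unsupported geometry type {geom_type}"
    return None


print(wkb_format_error(b"\x01\x02"))                           # header too short
print(wkb_format_error(bytes.fromhex("010200000000000000")))  # None
```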



##########
Geospatial.md:
##########
@@ -94,6 +94,39 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of edge cases.
+
+* `null` instance: Skip it and continue processing the remaining 
+  geospatial instances. Do not produce a bounding box if all instances are null.
+* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values):
+  * X and Y: Skip any invalid X or Y value and continue processing the 
+    remaining X or Y values. Do not produce a bounding box if all X or all Y 
+    values are invalid.
+
+  * Z: Skip any invalid Z value and continue processing the remaining Z values.
+    Omit Z from the bounding box if all Z values are invalid.
+
+  * M: Skip any invalid M value and continue processing the remaining M values.
+    Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes. 
+Parquet does not permit `null` or `NaN` values in bounding boxes, whether at 
+the overall bounding box level or within individual coordinate fields.
+
+* No bounding box: No assumptions can be made about the presence or validity 
+  of coordinate values. Readers may need to load all individual coordinate 
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: Both X and Y of the bounding box must be present.
+    * Z: If Z of the bounding box is missing, readers should not assume 
+      anything about the presence or validity of Z values and may need to 
+      load individual coordinates for validation.
+    * M: If M of the bounding box is missing, readers should not assume

Review Comment:
   ```suggestion
       * M: If M of the bounding box is missing or contains any invalid value, readers should not assume
   ```
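
For the writer-side rules in the quoted hunk above, a minimal Python sketch of bounding-box accumulation might look like the following. It assumes each non-`null` instance has already been decoded into `(x, y, z, m)` tuples, with `None` for an absent Z or M and `NaN` for an invalid ordinate. `AxisRange` and `compute_bbox` are hypothetical names. Each axis is accumulated independently, so an invalid X does not discard the Y of the same coordinate.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple

# (x, y, z, m); z and m are None when the geometry has no such dimension.
Coord = Tuple[float, float, Optional[float], Optional[float]]


class AxisRange:
    """Running min/max for one axis that skips invalid (NaN) values."""

    def __init__(self) -> None:
        self.lo = math.inf
        self.hi = -math.inf

    def update(self, v: float) -> None:
        if math.isnan(v):
            return  # invalid value: skip it instead of poisoning min/max
        self.lo = min(self.lo, v)
        self.hi = max(self.hi, v)

    def result(self) -> Optional[Tuple[float, float]]:
        # None when no valid value was seen on this axis.
        return (self.lo, self.hi) if self.lo <= self.hi else None


def compute_bbox(instances: Iterable[Optional[Sequence[Coord]]]) -> Optional[dict]:
    x, y, z, m = AxisRange(), AxisRange(), AxisRange(), AxisRange()
    for coords in instances:
        if coords is None:
            continue  # null instance: skip it
        for cx, cy, cz, cm in coords:
            x.update(cx)
            y.update(cy)
            if cz is not None:
                z.update(cz)
            if cm is not None:
                m.update(cm)
    xr, yr = x.result(), y.result()
    if xr is None or yr is None:
        return None  # all instances null, or all X or all Y invalid: no bounding box
    # A None entry for "z" or "m" means that axis is omitted from the bounding box.
    return {"x": xr, "y": yr, "z": z.result(), "m": m.result()}


print(compute_bbox([None, [(1.0, 2.0, None, None)], [(float("nan"), 5.0, 3.0, None)]]))
# {'x': (1.0, 1.0), 'y': (2.0, 5.0), 'z': (3.0, 3.0), 'm': None}
```

Empty geometries contribute no coordinates and therefore nothing to any axis, and a `POINT EMPTY` decoded as NaN ordinates is skipped by `update`, so both cases fall out of the same loop.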



-- 
This is an automated message from the Apache Git Service.