wgtmac commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2048097604


##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, 
the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
 
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and

Review Comment:
   What about putting these between lines 95 and 97?



##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, 
the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
 
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and

Review Comment:
   ```suggestion
   To produce `GeospatialStatistics`, writers must omit zmin and zmax if and
   ```



##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, 
the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
 
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and
+only if there are zero non-NaN Z values in the column chunk, and must omit mmin
+and mmax if and only if there are zero non-NaN M values. The bounding box must 
+be omitted entirely if and only if there are zero non-NaN X values or zero 
+non-NaN Y values in the column chunk. If Z or M values are missing, the writer
+may still include a bounding box using only the available dimensions.
+
+Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as
+an indication that all corresponding values are null, and may use this 
+information to skip data during predicate evaluation. For example, a reader may
+skip a row group if the bounding box is absent, indicating that all X and Y 
+coordinates are null.

Review Comment:
   > For example, a reader may
   skip a row group if the bounding box is absent, indicating that all X and Y 
   coordinates are null.
   
   This is counter-intuitive: in Parquet, a reader usually cannot skip a row 
group when its min/max stats are missing, because it cannot make any assumption 
about the data. It might be that all values are null, or that the writer 
does not implement this feature at all.
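
A minimal Python sketch of the conservative reader behavior described in this comment. The names `BoundingBox` and `may_skip_row_group` are hypothetical and not part of any Parquet API. The sketch assumes a non-wrapping query box and simply declines to prune on X when the stored box wraps the antimeridian (`xmin > xmax`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundingBox:
    # Hypothetical stand-in for the X/Y part of a GeospatialStatistics bbox.
    xmin: float
    xmax: float
    ymin: float
    ymax: float


def may_skip_row_group(stats_bbox: Optional[BoundingBox], query: BoundingBox) -> bool:
    """Return True only when the statistics prove that no row can match."""
    if stats_bbox is None:
        # Missing stats carry no information: every value might be null,
        # or the writer might not produce a bounding box at all.
        return False

    y_disjoint = stats_bbox.ymax < query.ymin or stats_bbox.ymin > query.ymax

    if stats_bbox.xmin > stats_bbox.xmax:
        # Assumption: the stored box wraps the antimeridian, so this
        # sketch stays conservative and does not prune on X.
        x_disjoint = False
    else:
        x_disjoint = stats_bbox.xmax < query.xmin or stats_bbox.xmin > query.xmax

    return x_disjoint or y_disjoint
```

Under this approach an absent bounding box never leads to pruning; a row group is skipped only when a bounding box is present and provably disjoint from the query.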



##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, 
the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
 
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and
+only if there are zero non-NaN Z values in the column chunk, and must omit mmin
+and mmax if and only if there are zero non-NaN M values. The bounding box must 

Review Comment:
   I think it is better to mention X and Y values first because they are 
required.



##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, 
the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
 
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and
+only if there are zero non-NaN Z values in the column chunk, and must omit mmin
+and mmax if and only if there are zero non-NaN M values. The bounding box must 
+be omitted entirely if and only if there are zero non-NaN X values or zero 
+non-NaN Y values in the column chunk. If Z or M values are missing, the writer
+may still include a bounding box using only the available dimensions.
+
+Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as
+an indication that all corresponding values are null, and may use this 
+information to skip data during predicate evaluation. For example, a reader may
+skip a row group if the bounding box is absent, indicating that all X and Y 
+coordinates are null.

Review Comment:
   What is a null value in this case? Is WKB able to encode a null coordinate? 
I was thinking the binary value is null in this case.
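
To make the question concrete, here is a minimal Python sketch of the writer-side rule as worded in the hunk, under the reading that "null" means the whole binary value is null. Everything here is an assumption for illustration: `compute_bbox` is a hypothetical helper, a null geometry is modeled as `None`, each geometry is already decoded into `(x, y, z, m)` tuples with NaN marking an absent ordinate, and antimeridian wraparound for `GEOGRAPHY` is ignored.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# (x, y, z, m); NaN marks an ordinate that is absent or empty.
Coord = Tuple[float, float, float, float]


def compute_bbox(
    rows: Iterable[Optional[Iterable[Coord]]],
) -> Optional[Dict[str, float]]:
    """Return the bounding box for a column chunk, or None if it must be omitted."""
    mins = [math.inf] * 4
    maxs = [-math.inf] * 4

    for geom in rows:
        if geom is None:
            # Null binary value: contributes no coordinates.
            continue
        for coord in geom:
            for i, value in enumerate(coord):
                if not math.isnan(value):
                    mins[i] = min(mins[i], value)
                    maxs[i] = max(maxs[i], value)

    # A dimension was seen iff at least one non-NaN value updated its range.
    seen = [mins[i] <= maxs[i] for i in range(4)]

    # Omit the whole box when there are zero non-NaN X or zero non-NaN Y values.
    if not (seen[0] and seen[1]):
        return None

    bbox: Dict[str, float] = {}
    for i, name in enumerate("xyzm"):
        # zmin/zmax and mmin/mmax are included only if that dimension was seen.
        if seen[i]:
            bbox[name + "min"] = mins[i]
            bbox[name + "max"] = maxs[i]
    return bbox
```

In this sketch a chunk of only null binary values and a chunk of only empty geometries both yield no bounding box, so a reader cannot tell the two cases apart from the statistics alone.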



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

