paleolimbot commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2049167267


##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of
 [-180, 180] for X and [-90, 90] for Y.
 
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and
+only if there are zero non-NaN Z values in the column chunk, and must omit mmin
+and mmax if and only if there are zero non-NaN M values. The bounding box must 
+be omitted entirely if and only if there are zero non-NaN X values or zero 
+non-NaN Y values in the column chunk. If Z or M values are missing, the writer
+may still include a bounding box using only the available dimensions.
+
+Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as
+an indication that all corresponding values are null, and may use this 
+information to skip data during predicate evaluation. For example, a reader may
+skip a row group if the bounding box is absent, indicating that all X and Y 
+coordinates are null.

Review Comment:
   Ah, I didn't realize that the lack of a null count was a C++-specific
limitation (I was remembering a previous draft of the geospatial types that had
very strong language forbidding writers from writing statistics and requiring
readers to ignore any that were present). It would be ideal if we could also
prune column chunks that are 100% empty (but not null), but I think the null
count plus geospatial_types is enough to prune in most reasonable cases.
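A minimal Python sketch of the two rules under discussion may help. It assumes a planar (GEOMETRY) column, so the GEOGRAPHY antimeridian case where `xmin > xmax` is ignored. `BoundingBox`, `compute_bbox`, and `can_skip` are hypothetical names for illustration, not parquet-cpp or parquet-java APIs. On the writer side, each dimension is omitted independently when it has no non-NaN values. On the reader side, a missing bounding box triggers pruning only when the null count shows every value is null, because an absent box alone cannot distinguish all-null from all-empty.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


# Hypothetical stand-in for a GeospatialStatistics bounding box (planar only).
@dataclass
class BoundingBox:
    xmin: float
    xmax: float
    ymin: float
    ymax: float
    zmin: Optional[float] = None  # None means "omitted"
    zmax: Optional[float] = None
    mmin: Optional[float] = None
    mmax: Optional[float] = None


# (x, y, z, m); a missing ordinate is represented as None or NaN.
Coord = Tuple[float, float, Optional[float], Optional[float]]


def _present(v: Optional[float]) -> bool:
    """True if v is a real value (neither None nor NaN)."""
    return v is not None and not math.isnan(v)


def compute_bbox(coords: Iterable[Coord]) -> Optional[BoundingBox]:
    """Writer side: omit each dimension that has zero non-NaN values."""
    xs, ys, zs, ms = [], [], [], []
    for x, y, z, m in coords:
        for value, bucket in ((x, xs), (y, ys), (z, zs), (m, ms)):
            if _present(value):
                bucket.append(value)
    if not xs or not ys:
        return None  # zero non-NaN X or Y values: omit the box entirely
    bbox = BoundingBox(min(xs), max(xs), min(ys), max(ys))
    if zs:
        bbox.zmin, bbox.zmax = min(zs), max(zs)
    if ms:
        bbox.mmin, bbox.mmax = min(ms), max(ms)
    return bbox


def can_skip(bbox: Optional[BoundingBox], null_count: Optional[int],
             num_values: int, query: Sequence[float]) -> bool:
    """Reader side: True if no value can intersect query = (xmin, ymin, xmax, ymax)."""
    if null_count is not None and null_count == num_values:
        return True  # every value is null, so nothing can match
    if bbox is None:
        # Without an all-null count, an absent box may mean all-empty
        # geometries or simply no statistics, so stay conservative.
        return False
    qxmin, qymin, qxmax, qymax = query
    return (bbox.xmax < qxmin or bbox.xmin > qxmax
            or bbox.ymax < qymin or bbox.ymin > qymax)
```

The reader deliberately falls back to the null count instead of treating a missing box as "all null", which reflects the point above that null count plus geospatial statistics covers most reasonable pruning cases.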


