paleolimbot commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2049167267

##########
Geospatial.md:
##########
@@ -104,6 +104,19 @@ crosses the antimeridian line. In geographic terminology, the concepts of `xmin`
 For `GEOGRAPHY` types, X and Y values are restricted to the canonical ranges of [-180, 180] for X and [-90, 90] for Y.
+When `GeospatialStatistics` is present, writers must omit zmin and zmax if and
+only if there are zero non-NaN Z values in the column chunk, and must omit mmin
+and mmax if and only if there are zero non-NaN M values. The bounding box must
+be omitted entirely if and only if there are zero non-NaN X values or zero
+non-NaN Y values in the column chunk. If Z or M values are missing, the writer
+may still include a bounding box using only the available dimensions.
+
+Readers may interpret the absence of a bounding box, zmin/zmax, or mmin/mmax as
+an indication that all corresponding values are null, and may use this
+information to skip data during predicate evaluation. For example, a reader may
+skip a row group if the bounding box is absent, indicating that all X and Y
+coordinates are null.

Review Comment:
   Ah, I didn't understand that the lack of a null count was a C++-specific limitation (I was remembering a previous draft of the geospatial types with some very strong language saying that writers were forbidden to write statistics and readers were required to ignore any that were present). It would be ideal if we could also prune the 100% empty (but not null) case, but I think we can use the null count + geospatial_types to prune in most reasonable cases.
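
For illustration, the writer-side rule in the quoted diff can be sketched in Python. This is a minimal sketch under stated assumptions, not the parquet-format reference or any library's API: the function name `bounding_box`, the `(x, y, z, m)` tuple layout with NaN marking a missing value, and the plain-dict result are all hypothetical. It also assumes planar data and ignores antimeridian wraparound for `GEOGRAPHY`.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: one (x, y, z, m) tuple per coordinate, NaN = missing.
Coord = Tuple[float, float, float, float]


def _min_max(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Return (min, max) over the non-NaN values, or None if there are none."""
    non_nan = [v for v in values if not math.isnan(v)]
    if not non_nan:
        return None
    return min(non_nan), max(non_nan)


def bounding_box(coords: Iterable[Coord]) -> Optional[Dict[str, float]]:
    """Sketch of the omission rules quoted in the diff.

    Returns None (bounding box omitted entirely) when there are zero
    non-NaN X values or zero non-NaN Y values. zmin/zmax and mmin/mmax
    are included only when at least one non-NaN Z or M value exists.
    """
    coords = list(coords)
    x = _min_max(c[0] for c in coords)
    y = _min_max(c[1] for c in coords)
    if x is None or y is None:
        return None

    bbox = {"xmin": x[0], "xmax": x[1], "ymin": y[0], "ymax": y[1]}

    z = _min_max(c[2] for c in coords)
    if z is not None:
        bbox["zmin"], bbox["zmax"] = z

    m = _min_max(c[3] for c in coords)
    if m is not None:
        bbox["mmin"], bbox["mmax"] = m

    return bbox
```

Under this sketch, a `None` result corresponds to the case the diff describes: there are no usable X/Y coordinates in the column chunk, so a reader evaluating a spatial predicate would have nothing to match.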
