paleolimbot commented on PR #2971: URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2797118363
Thank you for this!

> If all values are null, we can deduce it by checking null_count == value_count to ignore the empty min/max values.

I wasn't able to find a null count for a row group in statistics for all null values (or otherwise) because (at least in C++) the statistics aren't written because the sort order is unknown? The test case in C++ for this is https://github.com/apache/arrow/blob/a14fb07155073c4625e67a8f5ef448fd80b59e65/cpp/src/parquet/column_writer_test.cc#L1999-L2024 .

> For the last step, I think it is better to drop the bbox instead of writing all NaNs to confuse users.

I agree...we can also clarify in the comments of the format that an omitted bbox (when GeospatialStatistics exists) occurs if-and-only-if there are no x or y values? (And also that omitted z and/or m statistics occur if-and-only-if there were no z and/or m values, respectively, which is true today in both Java and C++).

Provided that there is another way to detect the 100% null case (probably common) I'm not personally concerned about excluding 100% empty row groups (probably not that common).
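To make the rule discussed above concrete, here is a minimal sketch in Python of the "drop the bbox instead of writing NaNs" behaviour. It is not the parquet-java or Arrow C++ implementation: the names `Range`, `BBoxAccumulator`, and `is_all_null` are hypothetical, and the sketch assumes that the bbox is omitted when either the x or the y range is empty, and that `value_count` includes nulls.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Range:
    """Running min/max that ignores None and NaN; empty until a value arrives."""
    lo: float = math.inf
    hi: float = -math.inf

    def update(self, value: Optional[float]) -> None:
        if value is None or math.isnan(value):
            return  # empty geometries and missing dimensions contribute nothing
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def as_tuple(self) -> Optional[Tuple[float, float]]:
        # lo > hi only holds while no value has been recorded
        return None if self.lo > self.hi else (self.lo, self.hi)


@dataclass
class BBoxAccumulator:
    """Hypothetical accumulator for one row group's coordinate bounds."""
    x: Range = field(default_factory=Range)
    y: Range = field(default_factory=Range)
    z: Range = field(default_factory=Range)
    m: Range = field(default_factory=Range)

    def add(self, x: float, y: float,
            z: Optional[float] = None, m: Optional[float] = None) -> None:
        self.x.update(x)
        self.y.update(y)
        self.z.update(z)
        self.m.update(m)

    def finish(self) -> Optional[Dict[str, float]]:
        x, y = self.x.as_tuple(), self.y.as_tuple()
        if x is None or y is None:
            return None  # assumption: no x or y values means the bbox is omitted
        bbox = {"xmin": x[0], "xmax": x[1], "ymin": y[0], "ymax": y[1]}
        # z and m bounds are written only if z or m values were actually seen
        for name, rng in (("z", self.z), ("m", self.m)):
            bounds = rng.as_tuple()
            if bounds is not None:
                bbox[f"{name}min"], bbox[f"{name}max"] = bounds
        return bbox


def is_all_null(null_count: int, value_count: int) -> bool:
    """All-null check from the quoted suggestion; needs a written null count."""
    return null_count == value_count
```

With this shape, a reader that sees no bbox still cannot tell "100% null" from "100% empty geometries" without a null count from elsewhere, which is the gap raised in the comment.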
