paleolimbot commented on PR #2971:
URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2797118363

   Thank you for this!
   
   > If all values are null, we can deduce it by checking null_count == value_count to ignore the empty min/max values.
   
   I wasn't able to find a null count for a row group in the statistics, either for all-null values or otherwise. Is that because (at least in C++) the statistics aren't written at all when the sort order is unknown? The C++ test case for this is here: https://github.com/apache/arrow/blob/a14fb07155073c4625e67a8f5ef448fd80b59e65/cpp/src/parquet/column_writer_test.cc#L1999-L2024
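   The `null_count == value_count` check only helps if a null count was actually written. A minimal Java sketch of that reader-side logic, assuming the null count may be absent (modelled here as `OptionalLong`); `NullCountCheck` and `isKnownAllNull` are hypothetical names for illustration, not parquet-java API:

```java
import java.util.OptionalLong;

/**
 * Hypothetical helper (not part of parquet-java): decides whether a column
 * chunk is known to be all-null from its value count and null count.
 */
public final class NullCountCheck {
  private NullCountCheck() {}

  /**
   * Returns true only if a null count was written and equals the value count.
   * A missing null count (e.g. statistics omitted by the writer) means
   * "unknown", which this sketch reports as false. Note that an empty chunk
   * (0 values, 0 nulls) also returns true.
   */
  public static boolean isKnownAllNull(long valueCount, OptionalLong nullCount) {
    return nullCount.isPresent() && nullCount.getAsLong() == valueCount;
  }

  public static void main(String[] args) {
    System.out.println(isKnownAllNull(10, OptionalLong.of(10))); // prints true
    System.out.println(isKnownAllNull(10, OptionalLong.of(3)));  // prints false
    System.out.println(isKnownAllNull(10, OptionalLong.empty())); // prints false
  }
}
```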
   
   > For the last step, I think it is better to drop the bbox instead of writing all NaNs to confuse users.
   
   I agree. Could we also clarify in the comments of the format that an omitted bbox (when GeospatialStatistics exists) occurs if and only if there are no x or y values? We could likewise state that omitted z and/or m statistics occur if and only if there were no z and/or m values, respectively, which is already true today in both Java and C++. Provided there is another way to detect the 100% null case (probably common), I'm not personally concerned about excluding 100% empty row groups (probably not that common).
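   A minimal sketch of the writer-side behaviour described above (omit the bbox, or an individual z/m range, rather than writing NaN bounds), assuming NaN marks a missing ordinate; `BboxAccumulator`, `Range`, and `XYBox` are hypothetical names for illustration, not the parquet-java or Parquet C++ API:

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not the parquet-java API): accumulates bounds per axis
 * and omits a range entirely when no value was seen for it.
 */
public final class BboxAccumulator {
  /** Inclusive bounds for one axis. */
  public record Range(double min, double max) {}

  /** The x/y part of the bbox, which is present only as a pair. */
  public record XYBox(Range x, Range y) {}

  /** Running min/max for one axis; NaN inputs are ignored. */
  private static final class Axis {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    void update(double v) {
      if (Double.isNaN(v)) {
        return; // assumption: NaN means "ordinate missing", so it never widens the range
      }
      min = Math.min(min, v);
      max = Math.max(max, v);
    }

    Optional<Range> range() {
      // min > max only if no non-NaN value was ever seen for this axis.
      return min <= max ? Optional.of(new Range(min, max)) : Optional.empty();
    }
  }

  private final Axis x = new Axis(), y = new Axis(), z = new Axis(), m = new Axis();

  /** Adds one coordinate; pass Double.NaN for a missing z or m. */
  public void add(double xv, double yv, double zv, double mv) {
    x.update(xv);
    y.update(yv);
    z.update(zv);
    m.update(mv);
  }

  /** Omitted (empty) unless both x and y values were seen. */
  public Optional<XYBox> xy() {
    Optional<Range> rx = x.range();
    Optional<Range> ry = y.range();
    if (rx.isEmpty() || ry.isEmpty()) {
      return Optional.empty(); // no x or y values: drop the bbox instead of NaN bounds
    }
    return Optional.of(new XYBox(rx.get(), ry.get()));
  }

  /** Omitted (empty) if and only if no z values were seen. */
  public Optional<Range> z() {
    return z.range();
  }

  /** Omitted (empty) if and only if no m values were seen. */
  public Optional<Range> m() {
    return m.range();
  }
}
```

   With x/y-only input, `xy()` is present while `z()` and `m()` stay empty, and an accumulator that never saw a non-NaN x or y returns an empty `xy()` rather than NaN bounds, which mirrors the if-and-only-if rules suggested above.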

