mdibaiee opened a new issue, #3574: URL: https://github.com/apache/parquet-java/issues/3574
### Describe the bug, including details regarding any error messages, version, and platform.

Currently in [ParquetMetadataConverter.java](https://github.com/apache/parquet-java/blob/7be05b4702df78ae0c0c6b44adc6b7b7af2d931f/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java), a guard prevents statistics from being written, both min/max AND `null_count`, when the stats are larger than the maximum allowed size under truncation. That rationale makes sense for omitting min/max, but `null_count` can be written to the file regardless of the size of the statistics' content.

See the code below:

https://github.com/apache/parquet-java/blob/7be05b4702df78ae0c0c6b44adc6b7b7af2d931f/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L800-L807

The missing `null_count` metadata sometimes causes downstream consumers of the Parquet files to fail. For example, in Snowflake we are seeing the following kind of error:

```
non-nullable column without default has null values according to file statistics
```

### Component(s)

Core
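The behaviour requested above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the parquet-java code: `ColumnStats`, `WrittenStats`, `toWritten`, and `maxMinMaxSize` are hypothetical names invented for illustration. It only shows the idea that the size guard could apply to min/max alone, while `null_count` is always carried through.

```java
// Hypothetical sketch: none of these types exist in parquet-java.
class NullCountSketch {

    /** Stand-in for the in-memory statistics of one column chunk. */
    public static final class ColumnStats {
        public final byte[] min;
        public final byte[] max;
        public final long nullCount;

        public ColumnStats(byte[] min, byte[] max, long nullCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
        }

        /** Combined size in bytes of the min and max values. */
        public long minMaxSize() {
            return (long) min.length + max.length;
        }
    }

    /** Stand-in for what ends up in the file; min/max are null when omitted. */
    public static final class WrittenStats {
        public final byte[] min;
        public final byte[] max;
        public final long nullCount;

        public WrittenStats(byte[] min, byte[] max, long nullCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
        }
    }

    /**
     * Applies the size guard to min/max only. The null count is a fixed-size
     * number, so it is written whether or not min/max are dropped.
     */
    public static WrittenStats toWritten(ColumnStats stats, long maxMinMaxSize) {
        if (stats.minMaxSize() > maxMinMaxSize) {
            // Too large: omit min/max but keep null_count.
            return new WrittenStats(null, null, stats.nullCount);
        }
        return new WrittenStats(stats.min, stats.max, stats.nullCount);
    }

    public static void main(String[] args) {
        ColumnStats oversized = new ColumnStats(new byte[10], new byte[10], 3L);
        WrittenStats written = toWritten(oversized, 8L);
        System.out.println("minMaxWritten=" + (written.min != null)
                + " nullCount=" + written.nullCount);
    }
}
```

With this shape, a reader that relies on `null_count` still receives it for columns whose min/max values were too large to store.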
