Hey Ian,

Thanks for raising this. For the numbers you mention, do you know whether they were measured compressed or uncompressed?

> I have read other issues in GitHub which mention gigabyte-scale metadata
> files.

This sounds like bad practice, and that table probably needs some maintenance.

I don't have the historical context for why we produce pretty-printed JSON. I think this would be an easy optimization, and I agree that making the files easily consumable by humans afterward is trivial. FWIW, PyIceberg also produces compact (non-pretty-printed) JSON.

Kind regards,
Fokko

On Mon, 17 Feb 2025 at 16:48, Ian Streeter <i...@snowplow.io.invalid> wrote:

> Currently, metadata files are pretty-printed, with lots of newlines and
> whitespace indentation. This is the relevant line of code, which uses
> the Jackson default pretty printer:
> https://github.com/apache/iceberg/blob/abb47830e7df7dc2ae93c74b0ad97f06cdd37aad/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L131
>
> If we could write metadata files without redundant whitespace, then it
> would save some storage space and network overhead.
>
> This will have the most impact for tables with large metadata files. For
> example, I have seen a metadata file which was 53.6MB. After removing
> whitespace, this was reduced to 41.4MB. I have read other issues in GitHub
> which mention gigabyte-scale metadata files.
>
> I cannot think of any downside to this suggested change. Metadata files
> are mainly read by machines, not humans. And if a human does want to inspect
> a metadata file, then it is fairly easy to prettify a JSON file when needed.
>
> I opened this as an issue in GitHub, and then took advice to move the
> discussion to this dev list. See
> https://github.com/apache/iceberg/issues/12281
>
> I would appreciate hearing your thoughts.
> Thanks,
> Ian
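
P.S. For anyone who wants to get a feel for the whitespace overhead, here is a minimal sketch using only Python's standard `json` module. The `metadata` dictionary is a made-up, heavily simplified stand-in and not a real Iceberg metadata file, so the ratio it prints is only illustrative (Ian's example, 53.6MB to 41.4MB, works out to roughly a 23% reduction). It also shows that a compact file can be prettified again on demand.

```python
import json

# Hypothetical, heavily simplified stand-in for a table metadata document.
# Real metadata files contain many more fields; this only mimics the shape.
metadata = {
    "format-version": 2,
    "snapshots": [
        {
            "snapshot-id": i,
            "timestamp-ms": 1700000000000 + i,
            "summary": {"operation": "append"},
        }
        for i in range(1000)
    ],
}

# Pretty-printed: newlines plus two-space indentation.
pretty = json.dumps(metadata, indent=2)

# Compact: no whitespace after separators.
compact = json.dumps(metadata, separators=(",", ":"))

pretty_size = len(pretty.encode("utf-8"))
compact_size = len(compact.encode("utf-8"))
saving = 1 - compact_size / pretty_size

print(f"pretty:  {pretty_size} bytes")
print(f"compact: {compact_size} bytes")
print(f"saving:  {saving:.1%}")

# A human can re-indent the compact form whenever needed.
restored = json.dumps(json.loads(compact), indent=2)
```

From a shell, `python -m json.tool metadata.json` does the same re-indenting for a file (the file name here is just a placeholder).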