Currently, metadata files are pretty-printed, with many newlines and
indentation whitespace. This is the relevant line of code, which uses
the Jackson default pretty printer:
https://github.com/apache/iceberg/blob/abb47830e7df7dc2ae93c74b0ad97f06cdd37aad/core/src/main/java/org/apache/iceberg/TableMetadataParser.java#L131

If we wrote metadata files without redundant whitespace, it would save
storage space and network overhead.

This would have the most impact on tables with large metadata files. For
example, I have seen a metadata file that was 53.6MB. After removing
whitespace, it was reduced to 41.4MB, roughly 23% smaller. I have read
other issues on GitHub that mention gigabyte-scale metadata files.
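
To give a rough feel for where this kind of saving comes from, here is a
small Python sketch. It is not Iceberg code: the metadata dictionary is a
made-up stand-in, Python's json module stands in for Jackson, and the
actual ratio will depend on the shape of the real metadata.

```python
import json

# Hypothetical stand-in for table metadata; NOT a real Iceberg metadata file.
metadata = {
    "format-version": 2,
    "snapshots": [
        {
            "snapshot-id": i,
            "timestamp-ms": 1700000000000 + i,
            "summary": {"operation": "append"},
        }
        for i in range(1000)
    ],
}

# Pretty-printed: one key per line, two-space indentation.
pretty = json.dumps(metadata, indent=2)

# Compact: no newlines, no spaces after separators.
compact = json.dumps(metadata, separators=(",", ":"))

saved = 1 - len(compact) / len(pretty)
print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars, "
      f"saved: {saved:.0%}")
```

Both strings parse back to the same data, so only the encoding changes,
not the content.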

I cannot think of any downside to this change. Metadata files are
mainly read by machines, not humans. And if a human does want to inspect a
metadata file, it is fairly easy to prettify a JSON file when needed.
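
As an illustration of how little effort that takes, here is a minimal
Python sketch that re-indents a compact JSON string. The prettify helper
and the sample string are hypothetical, made up for this example. The
standard library's "python -m json.tool" command does much the same job
from a shell.

```python
import json


def prettify(compact_json: str) -> str:
    """Parse a JSON string and re-serialize it with two-space indentation."""
    return json.dumps(json.loads(compact_json), indent=2)


# Made-up compact sample; not taken from a real metadata file.
compact = '{"format-version":2,"properties":{"owner":"ian"}}'
print(prettify(compact))
```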

I opened this as an issue on GitHub, and was then advised to move the
discussion to this dev list. See
https://github.com/apache/iceberg/issues/12281

I would appreciate hearing your thoughts.
Thanks,
Ian
