Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-03-21 Thread Marc Cenac
> Perhaps this is a good default on the catalog side when creating new metadata json. +1 for this b/c I think it's an easy performance win for tables with large metadata. Is there any reason not to have write.metadata.compression-codec default to gzip? I'm curious if there was a reason it's curr

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-03-20 Thread Micah Kornfield
This reminds me that GZipped metadata files are not covered in the spec. I opened https://github.com/apache/iceberg/pull/12598 to try to document them (feedback welcome). On Mon, Feb 17, 2025 at 2:35 PM Kevin Liu wrote: > +1, json with no whitespace sounds like a reasonable default. But if sa

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Kevin Liu
+1, json with no whitespace sounds like a reasonable default. But if saving storage space and network is the main goal, then setting `write.metadata.compression-codec` to `gzip` is way more impactful. Perhaps this is a good default on the catalog side when creating new metadata json. Best, Kevin L
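
To make the comparison above concrete, here is a rough sketch that measures pretty vs. minified JSON, with and without gzip. The `metadata` dict is a made-up, heavily simplified stand-in for a real table metadata file (an assumption for illustration only), and Python's `json`/`gzip` modules stand in for what the Java writer does, so the actual ratios on a real table will differ.

```python
import gzip
import json

# Hypothetical, simplified stand-in for a table metadata document.
# Real metadata files have many more fields; this only mimics the
# repetitive snapshot-list shape.
metadata = {
    "format-version": 2,
    "location": "s3://example-bucket/warehouse/db/tbl",
    "snapshots": [
        {
            "snapshot-id": i,
            "timestamp-ms": 1739800000000 + i,
            "summary": {"operation": "append"},
        }
        for i in range(1000)
    ],
}

# Pretty-printed (indented) vs. minified (no insignificant whitespace).
pretty = json.dumps(metadata, indent=2).encode("utf-8")
compact = json.dumps(metadata, separators=(",", ":")).encode("utf-8")

# Byte counts for all four combinations.
sizes = {
    "pretty": len(pretty),
    "compact": len(compact),
    "pretty+gzip": len(gzip.compress(pretty)),
    "compact+gzip": len(gzip.compress(compact)),
}

for name, size in sizes.items():
    print(f"{name:>13}: {size} bytes")
```

On repetitive input like this, gzip should shrink the file far more than dropping whitespace does, and the gap between the two gzipped variants should be much smaller than the gap between the uncompressed ones.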

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Ian Streeter
The numbers I shared were for uncompressed files. I am embarrassed to say I had not noticed there is an option `write.metadata.compression-codec`. I had it set to the default `none`, and I reckon many other Iceberg users will too. Here are some updated numbers for my example metadata file: - Un

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Russell Spitzer
+0 - I would be surprised if post-compression sizes were that different, but minifying JSON is a pretty standard practice for over-the-wire transfers. On Mon, Feb 17, 2025 at 1:51 PM Steve Zhang wrote: > +1. Setting the table property `write.metadata.compression-codec` to gzip is usually suggested

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Steve Zhang
+1. Setting the table property `write.metadata.compression-codec` to gzip is usually suggested to reduce metadata size, but dropping whitespace can still help here. Thanks, Steve Zhang > On Feb 17, 2025, at 8:32 AM, Fokko Driesprong wrote: > Hey Ian, Thanks for raising this. The numbers yo

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Steven Wu
+1. It seems reasonable to produce unpretty json by default. On Mon, Feb 17, 2025 at 8:35 AM Fokko Driesprong wrote: > Hey Ian, Thanks for raising this. The numbers you mention, do you know if this was compressed or uncompressed? I have read other issues in github which mention gigabyt

Re: [Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Fokko Driesprong
Hey Ian, Thanks for raising this. The numbers you mention, do you know if this was compressed or uncompressed? > I have read other issues in github which mention gigabyte-scale metadata files. This sounds like a bad practice, and that table probably needs some maintenance. I don't have the his

[Discuss] Print un-pretty metadata JSON files without whitespace

2025-02-17 Thread Ian Streeter
Currently, metadata files are pretty-printed, with lots of new-lines and whitespace indentations. This is the relevant line of code, which uses the Jackson default pretty printer: https://github.com/apache/iceberg/blob/abb47830e7df7dc2ae93c74b0ad97f06cdd37aad/core/src/main/java/org/apache/iceberg
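
As a small illustration of what "un-pretty" means here, the sketch below serializes the same document both ways and checks that only the insignificant whitespace differs. It uses Python's `json` module as a stand-in for the Jackson writer referenced above, and the `metadata` dict is a hypothetical, simplified example rather than a real metadata file.

```python
import json

# Hypothetical, simplified stand-in for a table metadata document.
metadata = {
    "format-version": 2,
    "location": "s3://example-bucket/warehouse/db/tbl",
    "snapshots": [
        {"snapshot-id": i, "timestamp-ms": 1739800000000 + i}
        for i in range(100)
    ],
}

# Pretty-printed: newlines plus indentation, roughly what a pretty
# printer emits.
pretty = json.dumps(metadata, indent=2)

# Un-pretty: no newlines, no spaces after "," or ":".
compact = json.dumps(metadata, separators=(",", ":"))

# Both forms parse back to the same document; only the whitespace differs.
assert json.loads(pretty) == json.loads(compact)

saved = len(pretty) - len(compact)
print(f"pretty:  {len(pretty)} chars")
print(f"compact: {len(compact)} chars")
print(f"saved:   {saved} chars ({saved / len(pretty):.0%})")
```

Readers do not need to change anything to consume the compact form, since any JSON parser treats the two outputs as identical.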