> Perhaps this is a good default on the catalog side when creating new
metadata json.
+1 for this b/c I think it's an easy performance win for tables with large
metadata. Is there any reason not to have `write.metadata.compression-codec`
default to `gzip`? I'm curious if there was a reason it's curr
This reminds me that gzipped metadata files are not covered in the spec. I
opened https://github.com/apache/iceberg/pull/12598 to try to document them
(feedback welcome).
On Mon, Feb 17, 2025 at 2:35 PM Kevin Liu wrote:
+1, json with no whitespace sounds like a reasonable default. But if saving
storage space and network is the main goal, then setting
`write.metadata.compression-codec` to `gzip` is way more impactful. Perhaps
this is a good default on the catalog side when creating new metadata json.
Best,
Kevin L
The numbers I shared were for uncompressed files.
I am embarrassed to say I had not noticed there is an option
`write.metadata.compression-codec`. I had it set to the default `none`,
and I reckon many other Iceberg users will too.
Here are some updated numbers for my example metadata file:
- Un
+0 - I would be surprised if post-compression sizes were that different, but
minifying JSON is a pretty standard practice for over-the-wire transfers.
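One rough way to test whether post-compression sizes really differ is to gzip a pretty-printed and a compact version of the same document and compare. The sketch below uses only `java.util.zip`; the class name, the synthetic "snapshots" document, and its field values are all hypothetical and are not taken from Iceberg or from the numbers quoted in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for a quick experiment; not part of Iceberg.
final class GzipSizeCheck {
    private GzipSizeCheck() {}

    /** Size in bytes of the UTF-8 encoding of text after gzip compression. */
    static int gzipSize(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip trailer is written on close, so buf is complete here.
        return buf.size();
    }

    /** Synthetic metadata-like JSON with n made-up snapshot entries. */
    static String sample(boolean pretty, int n) {
        String entry = pretty
            ? "    {\n      \"snapshot-id\" : %d,\n      \"manifest-list\" : \"s3://bucket/metadata/snap-%d.avro\"\n    }"
            : "{\"snapshot-id\":%d,\"manifest-list\":\"s3://bucket/metadata/snap-%d.avro\"}";
        StringJoiner joiner = pretty
            ? new StringJoiner(",\n", "{\n  \"snapshots\" : [\n", "\n  ]\n}")
            : new StringJoiner(",", "{\"snapshots\":[", "]}");
        for (int i = 0; i < n; i++) {
            joiner.add(String.format(entry, 1000L + i, i));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String pretty = sample(true, 500);
        String compact = sample(false, 500);
        System.out.printf("raw:  pretty=%d compact=%d bytes%n",
            pretty.getBytes(StandardCharsets.UTF_8).length,
            compact.getBytes(StandardCharsets.UTF_8).length);
        System.out.printf("gzip: pretty=%d compact=%d bytes%n",
            gzipSize(pretty), gzipSize(compact));
    }
}
```

Because indentation and newlines are highly repetitive, gzip removes most of their cost, so the gap between the two forms should shrink sharply after compression. Real metadata files will behave differently from this synthetic input, so measuring actual files is the only reliable check.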
On Mon, Feb 17, 2025 at 1:51 PM Steve Zhang wrote:
+1. Configuring the table property `write.metadata.compression-codec` to `gzip`
is usually suggested to reduce metadata size, but dropping whitespace can
still help here.
Thanks,
Steve Zhang
> On Feb 17, 2025, at 8:32 AM, Fokko Driesprong wrote:
+1. It seems reasonable to produce compact (non-pretty-printed) JSON by default.
On Mon, Feb 17, 2025 at 8:35 AM Fokko Driesprong wrote:
Hey Ian,
Thanks for raising this. The numbers you mention, do you know if this was
compressed or uncompressed?
> I have read other issues in GitHub which mention gigabyte-scale metadata
> files.
This sounds like a bad practice, and that table probably needs some
maintenance.
I don't have the his
Currently, metadata files are pretty-printed, with lots of newlines and
whitespace indentation. This is the relevant line of code, which uses
the Jackson default pretty printer:
https://github.com/apache/iceberg/blob/abb47830e7df7dc2ae93c74b0ad97f06cdd37aad/core/src/main/java/org/apache/iceberg
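Jackson writes compact output when no pretty printer is enabled, so the change itself is small. To illustrate what dropping the insignificant whitespace does to a pretty-printed document, here is a dependency-free sketch. `JsonMinifier` is a hypothetical class, not Iceberg or Jackson code; it assumes the input is already valid JSON and only removes whitespace outside string literals, leaving escaped quotes and spaces inside strings untouched.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; Iceberg would simply skip the pretty printer.
final class JsonMinifier {
    private JsonMinifier() {}

    /** Removes whitespace outside string literals; assumes valid JSON input. */
    static String minify(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);                  // string contents are kept verbatim
                if (escaped) {
                    escaped = false;            // this char was escaped by a backslash
                } else if (c == '\\') {
                    escaped = true;             // next char is escaped
                } else if (c == '"') {
                    inString = false;           // unescaped quote closes the string
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c != ' ' && c != '\n' && c != '\r' && c != '\t') {
                out.append(c);                  // keep structural chars and literals
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Pretty-printed input using " : " separators and two-space indentation.
        String pretty = "{\n  \"format-version\" : 2,\n  \"location\" : \"s3://bucket/my table\"\n}";
        String compact = minify(pretty);
        System.out.println(compact); // prints {"format-version":2,"location":"s3://bucket/my table"}
        System.out.println(pretty.getBytes(StandardCharsets.UTF_8).length + " -> "
            + compact.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The savings grow with nesting depth, since every nested line in pretty-printed output carries its own indentation.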