Thanks for bringing this up, Micah! I think it's better to treat `.json.gz` as the "default" file scheme and `.gz.json` as the "legacy" one.
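
To make the default/legacy split concrete, here is a rough sketch of how a reader could classify a metadata file name by its suffix. The function name `detect_metadata_codec` and the returned labels are hypothetical and only for illustration; this is not how the Java client's parser is written, just one way the rule could look.

```python
from typing import Tuple


def detect_metadata_codec(file_name: str) -> Tuple[str, str]:
    """Classify a table metadata file name by suffix.

    Hypothetical helper for illustration only; not part of any Iceberg client.
    Returns (codec, scheme), where scheme is "default", "legacy", or "plain".
    """
    # Proposed default: the compression suffix comes last.
    if file_name.endswith(".metadata.json.gz"):
        return ("gzip", "default")
    # Legacy: the compression marker sits before ".metadata.json".
    # This must be checked before the plain case, because a legacy name
    # also ends with ".metadata.json".
    if file_name.endswith(".gz.metadata.json"):
        return ("gzip", "legacy")
    # Uncompressed metadata file.
    if file_name.endswith(".metadata.json"):
        return ("none", "plain")
    raise ValueError(f"not a metadata file name: {file_name}")
```

With this sketch, `detect_metadata_codec("00001-abc.metadata.json.gz")` returns `("gzip", "default")`, while `detect_metadata_codec("00001-abc.gz.metadata.json")` returns `("gzip", "legacy")`, so a reader can keep accepting the old names while writers move to the new one.
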
I agree with the other points brought up here. Across the broader ecosystem, I think `.json.gz` is used more often. DuckDB, for example, can automatically detect compression from the file suffix when it comes last (`.json.gz`), but not the other way around (`.gz.json`).
See https://duckdb.org/docs/stable/data/json/loading_json#parameters

Best,
Kevin Liu

On Sun, Apr 27, 2025 at 11:54 PM Fokko Driesprong <fo...@apache.org> wrote:

> Hey Micah,
>
> For some reason, your email ended up in my spam box 😨
>
> There is a reason for everything!
>
>> .gz.metadata.json is quite uncommon and can't be read by most existing
>> tools. Would it be better to support .metadata.json.gz and treat
>> .gz.metadata.json as legacy for backward compatibility?
>
> The Java client supports both
> <https://github.com/apache/iceberg/blob/dc26b72ad016840b79d62bf8a84b7f2109e9b71b/core/src/test/java/org/apache/iceberg/TableMetadataParserCodecTest.java#L29-L40>.
> I looked into this years ago, and if I recall correctly, it was to bypass
> the decompressor of Hadoop <https://github.com/apache/iceberg/pull/258/>.
> Hadoop would detect the .gz and handle all the (de)compression, which we
> wanted to do ourselves.
>
>> gzip is becoming increasingly outdated due to its lack of support for
>> modern CPUs. New algorithms like zstd are gaining popularity, so should
>> we consider allowing users to use .metadata.json.zst as well?
>
> Yes, I think that would make a lot of sense.
>
> Kind regards,
> Fokko
>
> On Mon, Apr 28, 2025 at 08:41, Xuanwo <xua...@apache.org> wrote:
>
>> I've copied my comments from GitHub here for a broader discussion:
>>
>> Hi, I have two concerns about this change:
>>
>> - .gz.metadata.json is quite uncommon and can't be read by most
>>   existing tools. Would it be better to support .metadata.json.gz and
>>   treat .gz.metadata.json as legacy for backward compatibility?
>> - gzip is becoming increasingly outdated due to its lack of support
>>   for modern CPUs. New algorithms like zstd are gaining popularity, so
>>   should we consider allowing users to use .metadata.json.zst as well?
>>
>> On Sun, Apr 27, 2025, at 07:36, Micah Kornfield wrote:
>>
>> I created https://github.com/apache/iceberg/pull/12598 to document this
>> feature. Kevin Liu already took a look, but I would like to get more eyes
>> on it before starting a vote for merging.
>>
>> Thanks,
>> Micah
>>
>> Xuanwo
>>
>> https://xuanwo.io/