[DISCUSS] Spec update to cover compressed JSON metadata files

2025-05-01 Thread Micah Kornfield
Hi Ryan, Thank you for the feedback. I responded on the PR; let's try to close out the conversation there. Thanks, Micah On Thursday, May 1, 2025, Ryan Blue wrote: > I just commented on the PR, but I don't think that file naming should be > part of the spec requirements. This is informational a

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-05-01 Thread Ryan Blue
I just commented on the PR, but I don't think that file naming should be part of the spec requirements. This is informational and there are lots of ways to determine file compression, including magic bytes and catalog metadata. There isn't a need for this to be required by the spec so I think it sh
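
As an illustration of the magic-byte approach mentioned above, here is a minimal Python sketch, not taken from the PR or from any Iceberg implementation. It assumes the metadata file has already been read into memory as bytes; the function name `read_metadata_json` is hypothetical. It relies only on the fact that every gzip stream begins with the two bytes 0x1f 0x8b, so the file name is never consulted.

```python
import gzip
import json

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def read_metadata_json(raw: bytes) -> dict:
    """Parse table metadata JSON, decompressing first if it looks gzipped.

    Hypothetical helper: detection is based purely on the leading bytes,
    so it works regardless of how the file was named.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Usage: the same call handles both plain and gzip-compressed input.
plain = json.dumps({"format-version": 2}).encode("utf-8")
print(read_metadata_json(plain))
print(read_metadata_json(gzip.compress(plain)))
```

Valid JSON text cannot begin with the byte 0x1f, so the check does not misclassify an uncompressed metadata file.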

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-30 Thread Micah Kornfield
I'll plan on starting a vote for the proposed PR tomorrow, unless there are any objections. I look forward to follow-ups on ways we can improve compression here. Thanks, Micah On Tue, Apr 29, 2025 at 10:38 AM Micah Kornfield wrote: > > I wanted to clarify, as others have pointed out, that the

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-29 Thread Micah Kornfield
I wanted to clarify, as others have pointed out, that the PR documents existing functionality, and making changes to it at this point risks breaking clients. I think any changes to the naming convention would have to be done as part of a new version of the spec (and file system based commits must be com

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-28 Thread Ryan Blue
It would be great to mention how to determine the compression of the metadata JSON file in the spec. Thanks for bringing this up. It makes sense to me to use the file name and get a bit more strict about this. That said, we will need to make sure that the current default behavior is documented and

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-28 Thread Kevin Liu
Thanks for bringing this up Micah! I think it's better to treat `.json.gz` as the "default" file scheme and `.gz.json` as the "legacy". I agree with the other points brought up here. Across the broader ecosystem, I think `.json.gz` is used more often. DuckDB, for example, can automatically detect

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-27 Thread Fokko Driesprong
Hey Micah, For some reason, your email ended up in my spam box 😨 There is a reason for everything! > .gz.metadata.json is quite uncommon and can't be read by most existing > tools. Would it be better to support .metadata.json.gz and treat > .gz.metadata.json as legacy for backward compatibility?

Re: [DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-27 Thread Xuanwo
I've copied my comments from GitHub here for a broader discussion: Hi, I have two concerns about this change: • `.gz.metadata.json` is quite uncommon and can't be read by most existing tools. Would it be better to support `.metadata.json.gz` and treat `.gz.metadata.json` as legacy for backwar
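
To make the two naming schemes under discussion concrete, here is a minimal Python sketch of suffix-based detection. It is an assumption-laden illustration, not the behavior of any existing reader: the function name `codec_from_file_name`, the return values, and the example file names are all hypothetical, and the thread does not settle whether the `.metadata.json.gz` form will be accepted.

```python
def codec_from_file_name(file_name: str) -> str:
    """Guess the compression codec of a metadata file from its name.

    Hypothetical helper covering only the two gzip spellings raised in
    this thread, plus the uncompressed case.
    """
    # Check the ".gz.metadata.json" form first: it also ends with
    # ".metadata.json" and would otherwise be treated as uncompressed.
    if file_name.endswith(".gz.metadata.json"):
        return "gzip"
    # The alternative spelling suggested in the discussion.
    if file_name.endswith(".metadata.json.gz"):
        return "gzip"
    if file_name.endswith(".metadata.json"):
        return "none"
    raise ValueError(f"not a recognized metadata file name: {file_name}")


# Usage with made-up file names.
print(codec_from_file_name("00001-abc.gz.metadata.json"))
print(codec_from_file_name("00001-abc.metadata.json.gz"))
print(codec_from_file_name("00001-abc.metadata.json"))
```

The order of the checks matters because the first gzip suffix is a superset of the uncompressed suffix.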

[DISCUSS] Spec update to cover compressed JSON metadata files

2025-04-26 Thread Micah Kornfield
I created https://github.com/apache/iceberg/pull/12598 to document this feature. Kevin Liu already took a look, but I would like to get more eyes on it before starting a vote for merging. Thanks, Micah