Hi Ryan,
Thank you for the feedback. I responded on the PR; let's try to close out
the conversation there.
Thanks,
Micah
On Thursday, May 1, 2025, Ryan Blue wrote:
I just commented on the PR, but I don't think that file naming should be
part of the spec requirements. This is informational and there are lots of
ways to determine file compression, including magic bytes and catalog
metadata. There isn't a need for this to be required by the spec so I think
it sh
I'll plan on starting a vote for the proposed PR tomorrow, unless there are
any objections. I look forward to follow-ups on ways we can improve
compression here.
Thanks,
Micah
On Tue, Apr 29, 2025 at 10:38 AM Micah Kornfield
wrote:
I wanted to clarify, as others have pointed out, that the PR documents
existing functionality and making changes to it at this point risks
breaking clients
I think any changes to naming convention would have to be done as part of a
new version of the spec (and file system based commits must be com
It would be great to mention how to determine the compression of the
metadata JSON file in the spec. Thanks for bringing this up. It makes sense
to me to use the file name and get a bit more strict about this.
That said, we will need to make sure that the current default behavior is
documented and
Thanks for bringing this up, Micah!
I think it's better to treat `.json.gz` as the "default" file scheme and
`.gz.json` as the "legacy".
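To make the two naming schemes concrete, here is a minimal sketch of how a reader might classify a metadata file by its suffix, with the magic-byte check mentioned earlier in the thread as an independent signal. The function names and return labels are hypothetical and are not taken from the spec or from any Iceberg implementation; the suffixes are the ones quoted in this discussion.

```python
# Illustrative only: helper names and labels are assumptions, not Iceberg APIs.

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def codec_from_name(file_name: str) -> str:
    """Classify a metadata file name by the suffixes discussed in this thread."""
    if file_name.endswith(".metadata.json.gz"):
        return "gzip"         # suffix proposed as the default
    # Must be checked before the plain suffix, which it also ends with.
    if file_name.endswith(".gz.metadata.json"):
        return "gzip-legacy"  # existing suffix, kept for backward compatibility
    if file_name.endswith(".metadata.json"):
        return "none"
    return "unknown"


def is_gzip(data: bytes) -> bool:
    """Check the leading magic bytes instead of trusting the file name."""
    return data[:2] == GZIP_MAGIC
```

A reader that accepts both suffixes could call `codec_from_name` first and fall back to `is_gzip` on the first bytes of the file when the name is inconclusive.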
I agree with the other points brought up here. Across the broader
ecosystem, I think `.json.gz` is used more often. DuckDB, for example, can
automatically detect
Hey Micah,
For some reason, your email ended up in my spam box 😨
There is a reason for everything!
> .gz.metadata.json is quite uncommon and can't be read by most existing
> tools. Would it be better to support .metadata.json.gz and treat
> .gz.metadata.json as legacy for backward compatibility?
I've copied my comments from GitHub here for a broader discussion:
Hi, I have two concerns about this change:
• `.gz.metadata.json` is quite uncommon and can't be read by most existing
tools. Would it be better to support `.metadata.json.gz` and treat
`.gz.metadata.json` as legacy for backwar
I created https://github.com/apache/iceberg/pull/12598 to document this
feature. Kevin Liu already took a look, but I would like to get more eyes
on it before starting a vote for merging.
Thanks,
Micah