Hi Gidon, sorry for the late reply.

> Yep, that'd work, as long as the checksum is kept in a trusted
> independent storage/db.
>
> Then I guess both catalog clients and servers would need access to the
> trusted storage of the checksums.

Thanks for chiming in. These two lines actually change how I think about
table metadata.json protection. I’m leaning toward the conclusion that we
don’t need to add extra messages to the REST spec. A few reasons:
1. The REST catalog isn't fundamentally different from other catalogs (HMS,
Hadoop) in terms of the security boundary for the table metadata.json.
2. The tamper-proof requirement should be exactly the same across different
types of catalogs.
3. Each IRC implementation can still choose to add extra protections, like
the checksum I proposed.
4. And longer-term, if we ever manage to remove the requirement that table
metadata.json must live in storage as a file, then we could revisit the
spec and add more targeted guarantees at the API layer.
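
For illustration only, here is a minimal sketch of the commit-time checksum
and load-time validation discussed above. It assumes the checksums live in a
trusted store that is independent of the table storage. The names
TrustedChecksumStore, record_on_commit and verify_on_load are hypothetical,
not part of the REST spec or of any IRC implementation, and the in-memory
dict only stands in for a real database.

```python
import hashlib
import hmac


class TrustedChecksumStore:
    """Hypothetical stand-in for a trusted store kept apart from table storage."""

    def __init__(self):
        self._checksums = {}

    def put(self, metadata_location, checksum):
        self._checksums[metadata_location] = checksum

    def get(self, metadata_location):
        return self._checksums.get(metadata_location)


def sha256_hex(data):
    """Return the SHA-256 digest of the raw metadata.json bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def record_on_commit(store, metadata_location, metadata_bytes):
    """At commit time, remember the checksum of the metadata file just written."""
    store.put(metadata_location, sha256_hex(metadata_bytes))


def verify_on_load(store, metadata_location, metadata_bytes):
    """At load time, reject metadata whose checksum is unknown or does not match."""
    expected = store.get(metadata_location)
    if expected is None:
        raise ValueError("no trusted checksum recorded for " + metadata_location)
    if not hmac.compare_digest(expected, sha256_hex(metadata_bytes)):
        raise ValueError("metadata.json checksum mismatch for " + metadata_location)
    return metadata_bytes
```

An engine that reads metadata.json directly from storage would need to call
something like verify_on_load against the same trusted store, which is why
both clients and servers would need access to it.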


Yufei


On Thu, Nov 6, 2025 at 4:34 AM Gidon Gershinsky <[email protected]> wrote:

> Hi Yufei, thank you.
>
> I'll start by saying that if the main storage is tamper-proof, then there
> is no problem and no extra requirements for REST catalogs.
> The rest of the mail refers to the scenarios where the main storage is not
> tamper-proof.
>
> > For metadata.json integrity, the REST catalog can add a checksum to the
> > metadata.json file at the commit time and validate it while loading it back
>
> Yep, that'd work, as long as the checksum is kept in a trusted independent
> storage/db.
>
> > There are certain use cases where engines may still load tables directly
> > from storage even when IRC is used for committing.
>
> Then I guess both catalog clients and servers would need access to the
> trusted storage of the checksums.
>
> > It seems like a loophole, but IRC couldn't really do anything about it.
> > It's probably the system admin's responsibility to take care of it.
>
> Ok. As a baseline protection, the REST spec addition patch explicitly
> states what is required of a catalog implementation/deployment to prevent a
> compromise of encrypted tables. Maybe some IRC implementations will handle
> this requirement (fully or partially) for untrusted main storage backends.
> But I agree: ultimately, it is the admin's responsibility to make sure the
> requirement is handled; e.g. by choosing a tamper-proof main storage
> backend, or by deploying an independent storage / db for the metadata or
> its checksums.
>
> > For metadata.json confidentiality, I thought the metadata.json itself is
> > encrypted as well, no?
>
> By broken confidentiality, I meant leaks in the data files and
> manifest/list files. They are obviously confidential (values and stats). In
> the community discussion on encryption, we've decided not to encrypt the
> metadata.json file, for two reasons: metadata fields don't have
> confidential info, and a loss of the metadata encryption key due to a
> catalog glitch would mean a loss of the table.
>
> Cheers, Gidon
>
>
> On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:
>
>> Thanks Gidon for raising this! It's great that we are starting to think
>> through REST API support for encryption. We have been asked to support
>> encryption in the Polaris community multiple times.
>>
>> For metadata.json integrity, the REST catalog can add a checksum to the
>> metadata.json file at the commit time and validate it while loading it
>> back. There are certain use cases where engines may still load tables
>> directly from storage even when IRC is used for committing. It seems like a
>> loophole, but IRC couldn't really do anything about it. It's probably the
>> system admin's responsibility to take care of it.
>>
>> For metadata.json confidentiality, I thought the metadata.json itself is
>> encrypted as well, no?
>>
>> Yufei
>>
>>
>> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> REST catalog server implementations that keep the table metadata in
>>> a JSON file in untrusted storage are not safe for table encryption [1].
>>> The data confidentiality and integrity can be broken by malicious
>>> modifications of the metadata.json.
>>>
>>> We propose a short addition to the REST spec [2] that requires
>>> protection of the metadata integrity in catalog implementations that will
>>> be used for encrypted tables.
>>>
>>> Being a spec add-on, this is brought for a community discussion. All
>>> comments are welcome.
>>>
>>> Thanks,
>>> Gidon
>>>
>>>
>>>
>>> [1] thread starting at
>>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>>> [2] https://github.com/apache/iceberg/pull/14486
>>>
>>
