Hi everyone,

A few colleagues at AWS and I would like to discuss a new proposal for
supporting securable objects in the Iceberg REST catalog (IRC) spec.

Here is our proposal as a Google Doc:
https://docs.google.com/document/d/1KmIDbPuN6IYF0nWs9ostXIB9F4b8iH3zZO0hjgs1lm4/edit

And here is the corresponding GitHub issue:
https://github.com/apache/iceberg/issues/10407

I will also paste the intro here for an overview. There are two main reasons
we looked into this area and drafted this proposal:

*IRC lacks privilege-related concepts to express access decisions:*

In a proposal we published previously regarding access decisions, we wanted
to express ideas such as a LoadTableResponse telling the engine the list of
privileges (e.g. read only, read and write, insert but no update, etc.) the
caller has on the table, so that compute engines can enforce them
accordingly.

However, as we explored deeper into this topic, we found that there is no
standard in Iceberg to express such privileges on table objects. And when
we started to come up with keywords like SELECT, INSERT, DELETE, etc. to
express such privileges, we realized that we were basically defining a
securable object framework that is well-known in database systems (see
Reference Works section for more details). This is the primary reason we
are publishing this proposal before making further progress on the access
decisions work.

*IRC lacks clear guidelines on access management requirements:*

This is feedback we heard frequently when interviewing AWS customers using
Iceberg and considering building an IRC. Today Iceberg objects (namespaces,
tables, views) are not securable within the Iceberg catalog itself, and
need to be secured using an auxiliary system. This means that an
organization building an IRC service needs to wrap many important
operations into custom-built APIs for downstream users to consume (e.g. an
API to grant Iceberg table access on S3 needs to grant corresponding IAM
users/roles the right S3 policy or ACL setting). A huge amount of effort
is needed to figure out which APIs are missing in IRC to satisfy
enterprise-level data warehouse access management requirements.

There are some IRC products that offer vendor-specific APIs outside IRC to
perform those operations, but this means that users are locked in to that
vendor’s securable object management system when using the IRC solution,
and are not truly free to switch to another solution if it offers better
price-performance.

We understand that Iceberg is not a security product, and it is not in the
best interest of the community to dive too deep into security-related
domains. However, we believe that *we should at least offer the right
interfaces and set the right standards for how an Iceberg catalog expresses
securable objects and how Iceberg catalog users interact with those objects*,
such that (1) users who would like to build an IRC have a clear guideline
on what API contract to implement for managing access to objects in IRC,
and (2) users who are on one IRC product do not need to be locked in due
to access management aspects.

We would really appreciate any feedback on this topic and proposal!

Best,
Jack Ye
