Hi everyone,

A few colleagues at AWS and I would like to discuss a new proposal for supporting securable objects in the Iceberg REST catalog (IRC) spec.
Here is our proposal as a Google doc: https://docs.google.com/document/d/1KmIDbPuN6IYF0nWs9ostXIB9F4b8iH3zZO0hjgs1lm4/edit

And here is the corresponding GitHub issue: https://github.com/apache/iceberg/issues/10407

I will also paste the intro here for an overview.

There are 2 main reasons for us to look into this area and draft this proposal:

*IRC lacks privilege-related concepts to express access decisions:* In a proposal we published previously regarding access decisions, we wanted to express ideas such as, for example, a LoadTableResponse telling the engine the list of privileges (e.g. read only, read and write, insert but no update, etc.) the caller has on the table, so that compute engines can enforce them accordingly. However, as we explored this topic more deeply, we found that there is no standard in Iceberg to express such privileges on table objects. And when we started to come up with keywords like SELECT, INSERT, DELETE, etc. to express such privileges, we realized that we were basically defining a securable object framework of the kind that is well known in database systems (see the Reference Works section for more details). This is the primary reason that led us to publish this proposal before making further progress on the work on access decisions.

*IRC lacks clear guidelines on access management requirements:* This is feedback we heard frequently when interviewing AWS customers who use Iceberg and are considering building an IRC. Today Iceberg objects (namespaces, tables, views) are not securable within the Iceberg catalog itself, and need to be secured using an auxiliary system. This means that an organization building an IRC service needs to wrap many important operations into custom-built APIs for downstream users to consume (e.g. an API to grant Iceberg table access on S3 needs to grant the corresponding IAM users/roles the right S3 policy or ACL setting).
A huge amount of effort needs to be spent figuring out which APIs are missing in IRC to satisfy enterprise-level data warehouse access management requirements. There are some IRC products that offer vendor-specific APIs outside IRC to perform those operations, but this means that users are locked in to that vendor’s securable object management system when using the IRC solution, and do not have the true freedom to easily switch to another solution if it offers better price-performance.

We understand that Iceberg is not a security product, and it is not in the best interest of the community to dive too deep into security-related domains. However, we believe that *we should at least offer the right interfaces and set the right standards for how the Iceberg catalog expresses securable objects and how Iceberg catalog users interact with those objects*, such that (1) users who would like to build an IRC have a clear guideline on what API contract to implement for managing access to objects in IRC, and (2) users who are on one IRC product do not need to be locked in because of access management.

We would really appreciate any feedback on this topic and proposal!

Best,
Jack Ye