+1, Thanks Jack and team for getting the discussion started with this
proposal!

Much of this is well aligned with what we noticed when implementing RBAC
for Polaris Catalog, namely that even if a more complicated User/Role
structure exists outside of the catalog, that it's necessary to be able to
express some common building blocks around "grantee" roles/principals and
scoping/definitions of grants/privileges to make RBAC enforcement work well
and be more standardized across engines.

Your suggestions about initially trying to avoid known problems with things
like "OWNER" privileges and problems depending on the "grantor" in grant
records definitely seem like good ideas.

One thing that came up when trying to distill catalog-enforceable
privileges in Polaris was that by the nature of Iceberg's metadata model,
traditional SQL-style privileges ran into rough edges when it came to
distinguishing e.g. SELECT vs DESCRIBE, or UPDATE vs INSERT vs DELETE,
since "loadTable" is what the Catalog server sees, but then the engine
could be satisfied with just the JSON metadata or might be intending to
just crack open manifest files to select some aggregate statistics, or
might be going all the way to Parquet files.

One way to address this is if we're willing to make privilege definitions
more closely reflect the implementation semantics, e.g. mapping
INSERT/DELETE/UPDATE all to TABLE_WRITE_DATA since at least for now, from
the Catalog's perspective, any deletes require being able to write new
manifests, and anything that can do inserts by writing new manifests can
also effectively "delete" data in the newest snapshot.

It also seems like there's a relationship between having more type-specific
privileges, the ability to have unambiguous hierarchical grants (e.g.
granting TABLE_READ_DATA on a namespace to inherit the privilege in all
child tables), and also having a way to express storage-credential-vending
privileges under the same model.

A few suggestions relating to this:

   - Consider expanding the set of privilege definitions to be
   type-specific (beyond inferring the type-privilege from the object on which
   a privilege is granted). Maybe there should still be a common convention
   for all the "pure CRUDL" operations, but then types might have some
   additional type-specific privileges too
      - Example: NAMESPACE_CREATE, NAMESPACE_READ_PROPERTIES,
      NAMESPACE_WRITE_PROPERTIES,  NAMESPACE_DROP,  NAMESPACE_LIST
   - Allow/define a convention for inheriting grants in the securable
   object hierarchy -- though it makes sense to also allow for non-inheritance
   if an implementation wants to keep the model simple, if we do have
   type-specific privileges, it at least mitigates one of the listed concerns
   about accidental privileges.
      - For example, if the privilege is only DESCRIBE, then granting
      DESCRIBE on a namespace isn't clear whether it should also
confer DESCRIBE
      on tables/views underneath it. But we could say NAMESPACE_READ_PROPERTIES
      on a namespace doesn't mean any kind of TABLE/VIEW privileges, while
      TABLE_READ_PROPERTIES granted on a namespace would more clearly mean to
      inherit the ability to read table properties underneath that namespace.
      - Hierarchical grants probably also address some of the same use
      cases that people might otherwise address with FUTURE GRANTS,
and for some
      scenarios FUTURE GRANTS might be the more complex or error-prone
alternative
   - To handle the concept of Catalog-based storage-credential vending, we
   could introduce separate privileges TABLE_READ_DATA vs
   TABLE_READ_PROPERTIES and the mutate counter parts TABLE_WRITE_DATA vs
   TABLE_WRITE_PROPERTIES. Implementation-wise it could just mean
   TABLE_READ_DATA/TABLE_WRITE_DATA enable receiving appropriately-scoped
   storage credentials (e.g. read-only subscoped session token for
   TABLE_READ_DATA) in things like loadTable and
   createTable(stage-create=true). Whereas
   TABLE_READ_PROPERTIES/TABLE_WRITE_PROPERTIES would only enable whatever the
   REST Catalog server is able to handle directly in the REST request/response.

Would love to hear anyone's thoughts on these areas.

Cheers,
Dennis Huo


On 2024/06/08 19:12:10 Walaa Eldin Moustafa wrote:
> Thanks Jack and team for working on this proposal. I went over it and it
is
> very well written. I particularly like:
>
> (1) The fact that it is adopting the SQL standard and adjusting some of
its
> semantics to fit the Iceberg model.
>
> (2) It includes views from v1. Views are a very important tool for policy
> enforcement. We have built a dynamic privacy and compliance enforcement
> catalog extension at LinkedIn using views [1], and one of the main
> improvements to that catalog extension would be securable view objects.
> Admittedly, it might require further improvements to compute engines to
> implement the permissions, but having an Iceberg spec would be the first
> step.
>
> Looking forward to the next steps of the proposal discussion and adoption.
>
> [1]
>
https://www.slideshare.net/slideshow/viewshift-hassle-free-dynamic-policy-enforcement-for-every-data-lake/269577447
>
> Thanks,
> Walaa.
>
>
> On Thu, May 30, 2024 at 10:35 PM Jack Ye <ye...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > Me and a few colleagues at AWS would like to discuss a new proposal for
> > supporting securable objects in the Iceberg REST catalog spec.
> >
> > Here is our proposal in Google doc:
> >
https://docs.google.com/document/d/1KmIDbPuN6IYF0nWs9ostXIB9F4b8iH3zZO0hjgs1lm4/edit
> >
> > And here is the corresponding GitHub issue:
> > https://github.com/apache/iceberg/issues/10407
> >
> > I will also paste the intro here for an overview. There are 2 main
reasons
> > for us to look into this area and draft this proposal:
> >
> > *IRC lacks clear guidelines on access management requirements:*
> >
> > This is feedback we heard frequently when interviewing AWS customers
using
> > Iceberg and considering building an IRC. Today Iceberg objects
(namespaces,
> > tables, views) are not securable within the Iceberg catalog itself, and
> > need to be secured using an auxiliary system. This means that an
> > organization building an IRC service needs to wrap many important
> > operations into custom-built APIs for downstream users to consume (e.g.
an
> > API to grant Iceberg table access on S3 needs to grant corresponding IAM
> > users/roles the right S3 policy or ACL setting). Huge amount of effort
> > needs to be spent to figure out what are the missing APIs in IRC to
satisfy
> > enterprise level data warehouse access management requirements.
> >
> > There are some IRC products that offer vendor-specific APIs outside IRC
to
> > perform those operations, but this means that users are locked-in to
this
> > vendor’s securable object management system when using the IRC solution,
> > and do not have the true freedom to easily switch to another solution
if it
> > offers better price-performance.
> >
> > We understand that Iceberg is not a security product, and it is not the
> > best interest of the community to dive too deep into security-related
> > domains. However, we believe that *we should at least offer the right
> > interfaces and set the right standards for how Iceberg catalog expresses
> > securable objects and how Iceberg catalog users interact with those
objects*,
> > such that (1) users that would like to build IRC can have a clear
guideline
> > of what API constract to implement for managing access to objects in
IRC,
> > and (2) users that are on one IRC product do not need to be locked-in
due
> > to access management aspects.
> >
> > Would really appreciate any feedback on this topic and proposal!
> >
> > Best,
> > Jack Ye
> >
>

Reply via email to