Hi everyone,

I would like to get some initial thoughts on the possibility of adding
permission control constructs to the Iceberg REST spec. Do we think this is
valuable? If so, what shape and form should it take?

The background is that Iceberg today already supports passing credentials
for a table through the config field
<https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2714-L2719>
in LoadTableResponse, as a basic way to control data access. We have heard
that users really like this feature and want more data access control and
permission configuration in Iceberg.

For example, we could consider adding a *policy* field to the REST
LoadTableResponse, where a policy has sub-fields that describe:
- general access patterns, like read-only, read-write, admin full access,
etc.
- columns that the specific caller has access to for read or write
- filters (maybe expressed as Iceberg expressions) that should be applied by
the engine on behalf of the caller during a table scan
- constraints (again, maybe expressed as Iceberg expressions) that should
cause a table scan or table commit to be rejected
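
To make the shape a bit more concrete, here is a rough Python sketch of how
an engine might interpret such a policy. Everything in it is an assumption
for illustration only: the names TablePolicy, AccessMode, scan and
check_commit are hypothetical and not part of the REST spec, and plain
Python callables stand in for Iceberg expressions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]  # stand-in for an Iceberg expression


class AccessMode(Enum):
    """Hypothetical general access patterns."""
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    ADMIN = "admin"


@dataclass
class TablePolicy:
    """Hypothetical policy a catalog could return alongside a table."""
    access: AccessMode
    readable_columns: List[str]
    writable_columns: List[str] = field(default_factory=list)
    row_filter: Optional[Predicate] = None                    # applied during scans
    reject_if: List[Predicate] = field(default_factory=list)  # commit constraints


def scan(rows: List[Row], policy: TablePolicy, columns: List[str]) -> List[Row]:
    """Project the requested columns, enforcing column access and the row filter."""
    denied = [c for c in columns if c not in policy.readable_columns]
    if denied:
        raise PermissionError(f"no read access to columns: {denied}")
    visible = []
    for row in rows:
        if policy.row_filter is not None and not policy.row_filter(row):
            continue  # row is hidden from this caller
        visible.append({c: row[c] for c in columns})
    return visible


def check_commit(new_rows: List[Row], policy: TablePolicy) -> None:
    """Raise if the policy does not allow committing these rows."""
    if policy.access is AccessMode.READ_ONLY:
        raise PermissionError("table is read-only for this caller")
    for row in new_rows:
        denied = [c for c in row if c not in policy.writable_columns]
        if denied:
            raise PermissionError(f"no write access to columns: {denied}")
        if any(violates(row) for violates in policy.reject_if):
            raise ValueError(f"row violates a policy constraint: {row}")
```

In this sketch a read-only policy rejects every commit, while column lists
and the row filter only narrow what a scan returns. Whether enforcement
belongs in the engine, the REST service, or both is exactly the kind of
question a design doc would need to settle.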

This could be the solution to some topics we discussed in the past. For
example, we can use this as a solution to the EXTERNAL database semantics
support discussion
<https://lists.apache.org/thread/ohqfvhf4wofzkhrvff1lxl58blh432o6> by
saying an external table has read-only access. We can also let the REST
service decide access to columns, which solves some governance issues
raised during the column tagging discussion
<https://lists.apache.org/thread/yflg8w1h87qgwc4s3qtog4l8nx8nk8m0>.

Beyond existing discussions, this could also work well with popular
engine vendor features like row-level security
<https://cloud.google.com/bigquery/docs/row-level-security-intro>, check
constraints <https://docs.databricks.com/en/tables/constraints.html>, etc.

In general, permission control and data governance are important aspects
of enterprise data warehousing. I think having these constructs in the
REST spec and related engine integration could increase enterprise adoption
and help our vision of standardizing access through the REST interface.

Would appreciate any thoughts in this domain! And if we have some general
interest in this direction, I can put up a more detailed design doc.

Best,
Jack Ye
