*Hi everyone,*

As we progress with *Read Restrictions [1]*, we need to reach a community
consensus on two key items: the *list* of predefined masks to include in
the spec, and the *representation* of those masks.

Regarding representation, the current proposal uses an *Action* model. As
Ryan rightly puts it, this is essentially *syntactic sugar* for these
predefined common masking operations.

*Here is how the current "Action" proposal compares to a full "Transform"
approach for a standard mask:*

   - *Transform Approach (define new transforms):*

     {"field-id": 1,
      "expr": {"type": "alias", "name": "col-name",
               "child": {"type": "apply", "func-name": "mask_alphanum",
                         "child": {"type": "reference", "field-id": 1}}}}

   - *Action Approach (Current Proposal):*

     {"field-id": 1, "action": "mask_alphanum"}
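
To make the "syntactic sugar" point concrete, here is a minimal Python sketch
that expands the short Action form into the explicit Transform form shown
above. The helper name action_to_transform and its column_name parameter are
hypothetical and exist only for illustration; they are not part of the spec PR.

```python
import json


def action_to_transform(entry: dict, column_name: str) -> dict:
    """Expand an Action entry into the equivalent Transform entry.

    Hypothetical helper: it only mirrors the two JSON shapes quoted in this
    email and is not an API defined by the proposal.
    """
    field_id = entry["field-id"]
    return {
        "field-id": field_id,
        "expr": {
            # The alias keeps the original column name on the masked output.
            "type": "alias",
            "name": column_name,
            "child": {
                # The action name is reused as the function to apply.
                "type": "apply",
                "func-name": entry["action"],
                "child": {"type": "reference", "field-id": field_id},
            },
        },
    }


action = {"field-id": 1, "action": "mask_alphanum"}
transform = action_to_transform(action, "col-name")
print(json.dumps(transform))
```

The expansion is purely mechanical, which is the sense in which the Action
form adds no expressive power over the Transform form for predefined masks.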

In the "Action" model, the REST spec defines what the action means, and the
caller simply ensures that it is understood and enforced. This mirrors how
many existing policy stores handle masking:

   - *Apache Ranger [2]:* Uses maskType (e.g., "maskType":
     "MASK_SHOW_LAST_4").

   - *Google BigQuery [3]:* Uses predefinedExpression (e.g.,
     "predefinedExpression": "SHA256").

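The requirement that the caller understands and enforces each action could be
sketched as follows. This is only an illustration under stated assumptions:
the names SUPPORTED_ACTIONS, UnsupportedActionError, and check_actions are
hypothetical, and rejecting the read when an action is unknown (failing
closed) is my assumption about reasonable behavior, not something the
proposal text quoted here specifies.

```python
# Hypothetical set of actions this engine knows how to enforce.
# "mask_alphanum" is the only action name that appears in this thread.
SUPPORTED_ACTIONS = frozenset({"mask_alphanum"})


class UnsupportedActionError(Exception):
    """Raised when a read restriction names an action the engine cannot enforce."""


def check_actions(column_actions, supported=SUPPORTED_ACTIONS):
    """Return a {field-id: action} map, or raise if any action is unknown.

    Assumption: an engine that does not understand an action must refuse
    the read rather than return unmasked data.
    """
    unknown = sorted({entry["action"] for entry in column_actions} - set(supported))
    if unknown:
        raise UnsupportedActionError(f"cannot enforce actions: {unknown}")
    return {entry["field-id"]: entry["action"] for entry in column_actions}


plan = check_actions([{"field-id": 1, "action": "mask_alphanum"}])
print(plan)
```

With a fixed, spec-defined list of action names, this check reduces to a set
membership test, which is part of the appeal of keeping the list small.
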
*I would love to get your feedback on the following:*

   - *Representation:* Does the community agree with using this *Action*
     (syntactic sugar) approach for standard masks, or should we strictly
     use the explicit *Transform* approach?

   - *The List:* Based on the research into Ranger, BigQuery, PostgreSQL,
     and MS SQL Server [2,3,4,5], what is the "minimal must-have" list of
     masks we should define in the spec? (I have some defined already.)

Please feel free to comment on the *Spec PR #13879
<https://github.com/apache/iceberg/pull/13879>* or reply here.

*Best regards,*

Prashant Singh
------------------------------

*References:*

   - *[1] Proposal:* https://github.com/apache/iceberg/pull/13879

   - *[2] Ranger:* Column masking in Hive
     <https://docs.cloudera.com/runtime/7.3.1/security-ranger-authorization/topics/security-ranger-resource-based-column-masking-in-hive-with-ranger-policies.html>

   - *[3] Google BQ:* BigQuery predefined expressions
     <https://docs.cloud.google.com/bigquery/docs/reference/bigquerydatapolicy/rest/v1/projects.locations.dataPolicies#PredefinedExpression>

   - *[4] PostgreSQL Extension:* Masking functions
     <https://postgresql-anonymizer.readthedocs.io/en/stable/masking_functions/>

   - *[5] MS SQL Server:* Dynamic Data Masking
     <https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver17#define-a-dynamic-data-mask>
