Surely applying a mask should also prevent querying on the field, otherwise you could leak its contents? This pretty much prohibits using a masked column in a clustering key, or in a partition key with the ordered partitioner. It probably rules out a hashed partitioner too, since we do not use a cryptographic hash and the hash function is well defined.
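To make the leak concrete, here is a toy Python model of it. This is only a sketch, not Cassandra code: the table, the `mask` function and the `select_where_ssn` helper are all hypothetical, and it assumes a design where filtering is evaluated against the clear value while the result set shows the masked value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


def mask(value: str) -> str:
    """Hypothetical masking function: hide every character."""
    return "*" * len(value)


@dataclass
class Row:
    user_id: int
    ssn: str  # the masked column in this toy model


# Hypothetical table contents, stored in clear on the server side.
TABLE = [Row(1, "123"), Row(2, "456")]


def select_where_ssn(candidate: str) -> List[Tuple[int, str]]:
    """Model of an equality filter on the masked column for a role
    without UNMASK: the predicate sees the clear value (an assumption),
    but the returned column is masked."""
    return [(r.user_id, mask(r.ssn)) for r in TABLE if r.ssn == candidate]


def recover(user_id: int, candidates: Iterable[str]) -> Optional[str]:
    """Guess the clear value by trying candidates until the row matches."""
    for c in candidates:
        if any(uid == user_id for uid, _ in select_where_ssn(c)):
            return c
    return None
```

Every result the unprivileged role sees is masked, yet enumerating a small value space recovers the clear value, because a match or non-match on the predicate is itself information.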
We probably also need to ensure that any ALLOW FILTERING queries on such a field are disabled. Plausibly the data could be cryptographically jumbled before using it in a primary key component (or permitting filtering), but it is probably easier and safer to exclude for now…

> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>
> Some thoughts on this one:
>
> In a prior job, we'd give app teams access to a single keyspace, and two
> roles: a read-write role and a read-only role. In some cases, a "privileged"
> application role was also requested. Depending on the requirements, I could
> see the UNMASK permission being applied to the RW or privileged roles. But
> if there's a problem on the table and the operators go in to investigate,
> they will likely use a SUPERUSER account, and they'll see that data.
>
> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK
> permission?
>
> I'll also echo the concerns around masking primary key components. It's
> highly likely that certain personal data properties would be used as a
> partition or clustering key (ex: range query for people born within a certain
> timeframe). In addition to the "breaks existing" concern, I'm curious about
> the challenges around getting that to work with the current primary key
> implementation.
>
> Does this first implementation only apply to payload (non-key) columns? The
> examples in the CEP currently do not show primary key components being
> masked.
>
> Thanks,
>
> Aaron
>
>
>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <adelap...@apache.org>
>> wrote:
>>>> One thought: The way the CEP is currently written, it is only possible to
>>>> mask a column one way. You can only define one masking function for a
>>>> column, and since you use the original column name, you could only return
>>>> one version of it in the result set, even if you had a way to define
>>>> several functions.
>>>
>>> Right, it's one single type of mapping per the column, declared on
>>> CREATE/ALTER TABLE statements. Also, users can manually specify their own
>>> masking function in SELECT statements if they have permissions for seeing
>>> the clear data.
>>>
>>> For those cases where the data is automatically masked for an unprivileged
>>> user, I don't see the use of including different types of masking for the
>>> same column into the same result set. Instead, we might be interested on
>>> having different types of masking associated to different roles. We could
>>> do so with dedicated CREATE/DROP/LIST MASK statements, instead of using the
>>> CREATE/ALTER/DESCRIBE TABLE statements. That CREATE MASK statement would
>>> associate a masking function to a column and role. However, I'm not sure we
>>> need that type of granularity instead of the simplicity of attaching the
>>> masking to the column declaration. wdyt?
>>>
>>
>> My gut feeling likewise is that this adds complexity but little value.
>>
>> --
>> Henrik Ingo
>> +358 40 569 7354