Re: [DISCUSS] CEP-20: Dynamic Data Masking

Benedict Tue, 23 Aug 2022 10:20:13 -0700

Applying a mask should surely prevent querying on that field, else you could 
leak its contents? This pretty much prohibits masking a clustering key, or a 
partition key with an ordered partitioner, and probably also with a hashed 
partitioner, since we do not use a cryptographic hash and the hash function is 
well defined.
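
To illustrate the hashed-partitioner concern, here is a minimal sketch, not Cassandra code. It assumes an unprivileged reader can somehow learn the token of a row whose partition key is masked, and it uses zlib.crc32 purely as a stand-in for a public, unkeyed, non-cryptographic hash. The names toy_token and guess_from_token are made up for the example.

```python
import zlib


def toy_token(value: str) -> int:
    """Stand-in for a partitioner hash: public, deterministic, unkeyed (assumption)."""
    return zlib.crc32(value.encode("utf-8"))


def guess_from_token(observed_token: int, candidates: list[str]) -> list[str]:
    """Return every candidate value whose token equals the observed token."""
    return [c for c in candidates if toy_token(c) == observed_token]


# Hypothetical low-entropy key space: dates within a single year.
candidates = [f"1990-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 29)]

secret = "1990-07-14"          # the clear value hidden behind the mask
observed = toy_token(secret)   # what the reader is assumed to be able to see

matches = guess_from_token(observed, candidates)
print(matches)
```

Because the function is well defined and takes no secret key, anyone can hash every plausible value and compare, so the mask adds little protection for small value spaces.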


We probably also need to ensure that any ALLOW FILTERING queries on such a 
field are disabled.
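
As a rough illustration of the leak (a toy model, not Cassandra code): if a filter is evaluated against the clear value while the result shows only the masked value, a reader without UNMASK can still recover the value by bisecting on range predicates. ToyTable, select_where and recover_birth_year are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MASK = "****"


@dataclass
class Row:
    user_id: int
    birth_year: int  # the masked column in this toy model


class ToyTable:
    """Toy stand-in for a table with one masked column."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)

    def select_where(self, predicate: Callable[[int], bool]) -> List[Tuple[int, str]]:
        """Filter on the clear value, but only ever return the masked value."""
        return [(r.user_id, MASK) for r in self._rows if predicate(r.birth_year)]


def recover_birth_year(table: ToyTable, user_id: int, lo: int = 1900, hi: int = 2022) -> int:
    """Binary-search the hidden value using only masked query results."""
    while lo < hi:
        mid = (lo + hi) // 2
        hits = table.select_where(lambda year: year <= mid)
        if any(uid == user_id for uid, _ in hits):
            hi = mid       # the row matched, so its value is <= mid
        else:
            lo = mid + 1   # the row did not match, so its value is > mid
    return lo


table = ToyTable([Row(1, 1984), Row(2, 1991), Row(3, 1967)])
recovered = {uid: recover_birth_year(table, uid) for uid in (1, 2, 3)}
print(recovered)  # {1: 1984, 2: 1991, 3: 1967}
```

Each value falls out after about seven queries, even though every returned row shows only the mask.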

Plausibly the data could be cryptographically jumbled before using it in a 
primary key component (or permitting filtering), but it is probably easier and 
safer to exclude for now…

> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
> 
> 
> Some thoughts on this one:
> 
> In a prior job, we'd give app teams access to a single keyspace, and two 
> roles: a read-write role and a read-only role.  In some cases, a "privileged" 
> application role was also requested.  Depending on the requirements, I could 
> see the UNMASK permission being applied to the RW or privileged roles.  But 
> if there's a problem on the table and the operators go in to investigate, 
> they will likely use a SUPERUSER account, and they'll see that data.
> 
> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK 
> permission?
> 
> I'll also echo the concerns around masking primary key components.  It's 
> highly likely that certain personal data properties would be used as a 
> partition or clustering key (e.g., a range query for people born within a certain 
> timeframe).  In addition to the "breaks existing" concern, I'm curious about 
> the challenges around getting that to work with the current primary key 
> implementation.
> 
> Does this first implementation only apply to payload (non-key) columns?  The 
> examples in the CEP currently do not show primary key components being 
> masked. 
> 
> Thanks,
> 
> Aaron
> 
> 
>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <adelap...@apache.org> 
>> wrote:
>>>> One thought: The way the CEP is currently written, it is only possible to 
>>>> mask a column one way. You can only define one masking function for a 
>>>> column, and since you use the original column name, you could only return 
>>>> one version of it in the result set, even if you had a way to define 
>>>> several functions.
>>> 
>>> Right, it's a single type of masking per column, declared on 
>>> CREATE/ALTER TABLE statements. Also, users can manually specify their own 
>>> masking function in SELECT statements if they have permissions for seeing 
>>> the clear data.
>>> 
>>> For those cases where the data is automatically masked for an unprivileged 
>>> user, I don't see the use of including different types of masking for the 
>>> same column in the same result set. Instead, we might be interested in 
>>> having different types of masking associated with different roles. We could 
>>> do so with dedicated CREATE/DROP/LIST MASK statements, instead of using the 
>>> CREATE/ALTER/DESCRIBE TABLE statements. That CREATE MASK statement would 
>>> associate a masking function with a column and role. However, I'm not sure we 
>>> need that type of granularity instead of the simplicity of attaching the 
>>> masking to the column declaration. wdyt?
>>> 
>>> 
>> 
>> My gut feeling likewise is that this adds complexity but little value.
>> 
>> 
>> -- 
>> Henrik Ingo
>> +358 40 569 7354
>>
