Re: [DISCUSS] CEP-20: Dynamic Data Masking

Benedict Wed, 24 Aug 2022 00:40:48 -0700

Is it typical for a masking feature to make no effort to prevent unmasking? I’m 
just struggling to see the value of this without such mechanisms. Otherwise 
it’s just a default formatter, and we should consider renaming the feature IMO.
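
A minimal toy sketch of the unmasking concern, in Python. Everything in it (the ToyMaskedTable class, the Row layout, mask_year, infer_year) is hypothetical and is not Cassandra code. It only assumes what the CEP describes: reads return masked values, while WHERE filters still compare against the clear values.

```python
from dataclasses import dataclass


@dataclass
class Row:
    patient_id: int
    year_of_birth: int  # the masked column in this toy model


def mask_year(_value: int) -> str:
    # Hypothetical masking function: replaces any value with a fixed placeholder.
    return "****"


class ToyMaskedTable:
    """Hypothetical in-memory stand-in for a table with one masked column.

    Reads return masked values, but filtering compares clear values.
    """

    def __init__(self, rows):
        self._rows = list(rows)

    def select(self, where_year=None):
        matches = [r for r in self._rows
                   if where_year is None or r.year_of_birth == where_year]
        # Only the masked form of year_of_birth is ever returned.
        return [(r.patient_id, mask_year(r.year_of_birth)) for r in matches]


def infer_year(table, patient_id, candidates):
    # Try each candidate in the "WHERE clause"; the first one that returns
    # the target row reveals the masked value without it ever being shown.
    for year in candidates:
        if any(pid == patient_id for pid, _ in table.select(where_year=year)):
            return year
    return None


table = ToyMaskedTable([Row(1, 1987), Row(2, 2002)])
print(table.select())                           # [(1, '****'), (2, '****')]
print(infer_year(table, 2, range(1900, 2023)))  # 2002
```

No query ever displays an unmasked value, yet enumerating a small domain such as a birth year recovers it with at most one query per candidate.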


> On 23 Aug 2022, at 21:27, Andrés de la Peña <[email protected]> wrote:
> 
> 
> As mentioned in the CEP document, dynamic data masking doesn't try to prevent 
> malicious users with SELECT permissions from indirectly guessing the real value 
> behind the masked value. This can easily be done by just trying values in the 
> WHERE clause of SELECT queries. DDM would not be a replacement for proper 
> column-level permissions.
> 
> The data served by the database is usually consumed by applications that 
> present this data to end users. These end users are not necessarily the users 
> directly connecting to the database. With DDM, it would be easy for 
> applications to mask sensitive data that is going to be consumed by the end 
> users. However, the users directly connecting to the database should be 
> trusted, provided that they have the right SELECT permissions.
> 
> In other words, DDM doesn't directly protect the data, but it eases the 
> production of protected data.
> 
> That said, we could later go one step further and add a way to prevent 
> untrusted users from inferring the masked data. That could be done by adding a 
> new permission, different from the current SELECT permission, that would be 
> required to use certain columns in WHERE clauses. That would play especially 
> well with column-level permissions, which is something that we still have pending. 
> 
> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <[email protected]> wrote:
>>> Applying this should prevent querying on a field, else you could leak its 
>>> contents, surely?
>> 
>> In theory, yes.  Although I could see folks doing something like this:
>> 
>> SELECT COUNT(*) FROM patients
>> WHERE year_of_birth = 2002
>> AND date_of_birth >= '2002-04-01'
>> AND date_of_birth < '2002-11-01';
>> 
>> In this case, the rows containing the masked key column(s) could be filtered 
>> on without revealing the actual data.  But again, that's probably better for 
>> a "phase 2" of the implementation.
>> 
>>> Agreed on not being a queryable field. That would also preclude secondary 
>>> indexing, right?
>> 
>> Yes, that's my thought as well. 
>> 
>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <[email protected]> 
>>> wrote:
>>> Agreed on not being a queryable field. That would also preclude secondary 
>>> indexing, right? 
>>> 
>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <[email protected]> wrote:
>>>> Applying this should prevent querying on a field, else you could leak its 
>>>> contents, surely? This pretty much prohibits using it in a clustering key, 
>>>> and a partition key with the ordered partitioner - but probably also a 
>>>> hashed partitioner since we do not use a cryptographic hash and the hash 
>>>> function is well defined.
>>>> 
>>>> We probably also need to ensure that any ALLOW FILTERING queries on such a 
>>>> field are disabled.
>>>> 
>>>> Plausibly the data could be cryptographically jumbled before using it in a 
>>>> primary key component (or permitting filtering), but it is probably easier 
>>>> and safer to exclude for now…
>>>> 
>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <[email protected]> wrote:
>>>>>> 
>>>>> 
>>>>> Some thoughts on this one:
>>>>> 
>>>>> In a prior job, we'd give app teams access to a single keyspace, and two 
>>>>> roles: a read-write role and a read-only role.  In some cases, a 
>>>>> "privileged" application role was also requested.  Depending on the 
>>>>> requirements, I could see the UNMASK permission being applied to the RW 
>>>>> or privileged roles.  But if there's a problem on the table and the 
>>>>> operators go in to investigate, they will likely use a SUPERUSER account, 
>>>>> and they'll see that data.
>>>>> 
>>>>> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK 
>>>>> permission?
>>>>> 
>>>>> I'll also echo the concerns around masking primary key components.  It's 
>>>>> highly likely that certain personal data properties would be used as a 
>>>>> partition or clustering key (e.g., a range query for people born within a 
>>>>> certain timeframe).  In addition to the "breaks existing" concern, I'm 
>>>>> curious about the challenges around getting that to work with the current 
>>>>> primary key implementation.
>>>>> 
>>>>> Does this first implementation only apply to payload (non-key) columns?  
>>>>> The examples in the CEP currently do not show primary key components 
>>>>> being masked. 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Aaron
>>>>> 
>>>>> 
>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <[email protected]> 
>>>>>> wrote:
>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <[email protected]> 
>>>>>> wrote:
>>>>>>>> One thought: The way the CEP is currently written, it is only possible 
>>>>>>>> to mask a column one way. You can only define one masking function for 
>>>>>>>> a column, and since you use the original column name, you could only 
>>>>>>>> return one version of it in the result set, even if you had a way to 
>>>>>>>> define several functions.
>>>>>>> 
>>>>>>> Right, it's one single type of mapping per column, declared on 
>>>>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their 
>>>>>>> own masking function in SELECT statements if they have permission to 
>>>>>>> see the clear data.
>>>>>>> 
>>>>>>> For those cases where the data is automatically masked for an 
>>>>>>> unprivileged user, I don't see the use of including different types of 
>>>>>>> masking for the same column in the same result set. Instead, we might 
>>>>>>> be interested in having different types of masking associated with 
>>>>>>> different roles. We could do so with dedicated CREATE/DROP/LIST MASK 
>>>>>>> statements, instead of using the CREATE/ALTER/DESCRIBE TABLE 
>>>>>>> statements. That CREATE MASK statement would associate a masking 
>>>>>>> function with a column and role. However, I'm not sure we need that type 
>>>>>>> of granularity instead of the simplicity of attaching the masking to 
>>>>>>> the column declaration. wdyt?
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> My gut feeling likewise is that this adds complexity but little value.
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Henrik Ingo
>>>>>> +358 40 569 7354
>>>>>>       
>>> 
>>> 
>>> -- 
>>> +---------------------------------------------------------------+
>>> | Derek Chen-Becker                                             |
>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>> +---------------------------------------------------------------+
>>>
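
A similar toy sketch of the hashed-partitioner point raised above. If a masked value is used as a partition key and its token is observable, a deterministic, publicly specified hash over a small domain can be inverted by enumeration. Cassandra's Murmur3Partitioner uses MurmurHash3, which is not in the Python standard library, so zlib.crc32 stands in for it here as an assumption. The function names are hypothetical, and the sketch relies only on the hash being deterministic and known.

```python
import zlib
from datetime import date, timedelta


def toy_token(value: str) -> int:
    # Stand-in for a partitioner hash (assumption: crc32 instead of MurmurHash3).
    # What matters is that it is deterministic and publicly specified.
    return zlib.crc32(value.encode("utf-8"))


def invert_token(observed_token: int, start: date, end: date):
    # Enumerate every candidate date in [start, end) and keep those whose
    # token matches; for a small domain this recovers the clear value.
    matches = []
    day = start
    while day < end:
        if toy_token(day.isoformat()) == observed_token:
            matches.append(day)
        day += timedelta(days=1)
    return matches


secret = date(2002, 6, 15)
# Assumption: the attacker can observe the token of the row's partition key.
observed = toy_token(secret.isoformat())
print(invert_token(observed, date(1900, 1, 1), date(2023, 1, 1)))
```

Roughly 45,000 candidate dates are enough to cover 1900 through 2022, so the search is trivial. This is why the thread suggests that a masked value would need to be cryptographically jumbled before it is used in a primary key component.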
