Re: [DISCUSS] CEP-20: Dynamic Data Masking

Aaron Ploetz Tue, 23 Aug 2022 11:14:03 -0700

>
> Applying this should prevent querying on a field, else you could leak its
> contents, surely?
>


In theory, yes.  Although I could see folks doing something like this:

SELECT COUNT(*) FROM patients
WHERE year_of_birth = 2002
AND date_of_birth >= '2002-04-01'
AND date_of_birth < '2002-11-01';

In this case, the rows containing the masked key column(s) could be
filtered on without revealing the actual data.  But again, that's probably
better for a "phase 2" of the implementation.
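
To make the leak concern concrete, here is a minimal Python sketch (not
Cassandra code) of how count-only range filters could still narrow a masked
value down. It assumes a caller who cannot see date_of_birth but may filter on
it, and a range that holds a single matching row. ROWS, count_in_range and
recover_single_value are hypothetical names used only for illustration.

```python
from datetime import date

# Hypothetical stand-in for the year_of_birth = 2002 partition of the
# patients table; date_of_birth is the column assumed to be masked.
ROWS = [date(2002, 6, 17)]


def count_in_range(lo: date, hi: date) -> int:
    """Mimic SELECT COUNT(*) ... WHERE date_of_birth >= lo AND date_of_birth < hi."""
    return sum(1 for dob in ROWS if lo <= dob < hi)


def recover_single_value(lo: date, hi: date) -> date:
    """Binary-search the half-open range [lo, hi) using only counts.

    Assumes at least one row falls in [lo, hi); with several rows it
    converges on the earliest one.
    """
    lo_ord, hi_ord = lo.toordinal(), hi.toordinal()
    while hi_ord - lo_ord > 1:
        mid = (lo_ord + hi_ord) // 2
        # If the lower half still matches, the value lies in [lo, mid).
        if count_in_range(date.fromordinal(lo_ord), date.fromordinal(mid)) > 0:
            hi_ord = mid
        else:
            lo_ord = mid
    return date.fromordinal(lo_ord)


recovered = recover_single_value(date(2002, 4, 1), date(2002, 11, 1))
```

Each count halves the candidate range, so a seven-month window needs only a
handful of queries. That is one reason filtering on masked columns would likely
need extra restrictions before it could be allowed.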

> Agreed on not being a queryable field. That would also preclude secondary
> indexing, right?


Yes, that's my thought as well.

On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <de...@chen-becker.org>
wrote:

> Agreed on not being a queryable field. That would also preclude secondary
> indexing, right?
>
> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org> wrote:
>
>> Applying this should prevent querying on a field, else you could leak its
>> contents, surely? This pretty much prohibits using it in a clustering key,
>> and a partition key with the ordered partitioner - but probably also a
>> hashed partitioner since we do not use a cryptographic hash and the hash
>> function is well defined.
>>
>> We probably also need to ensure that any ALLOW FILTERING queries on such
>> a field are disabled.
>>
>> Plausibly the data could be cryptographically jumbled before using it in
>> a primary key component (or permitting filtering), but it is probably
>> easier and safer to exclude for now…
>>
>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>>
>> 
>> Some thoughts on this one:
>>
>> In a prior job, we'd give app teams access to a single keyspace, and two
>> roles: a read-write role and a read-only role.  In some cases, a
>> "privileged" application role was also requested.  Depending on the
>> requirements, I could see the UNMASK permission being applied to the RW or
>> privileged roles.  But if there's a problem on the table and the operators
>> go in to investigate, they will likely use a SUPERUSER account, and they'll
>> see that data.
>>
>> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK
>> permission?
>>
>> I'll also echo the concerns around masking primary key components.  It's
>> highly likely that certain personal data properties would be used as a
>> partition or clustering key (ex: range query for people born within a
>> certain timeframe).  In addition to the "breaks existing" concern, I'm
>> curious about the challenges around getting that to work with the current
>> primary key implementation.
>>
>> Does this first implementation only apply to payload (non-key) columns?
>> The examples in the CEP currently do not show primary key components being
>> masked.
>>
>> Thanks,
>>
>> Aaron
>>
>>
>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com>
>> wrote:
>>
>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <adelap...@apache.org>
>>> wrote:
>>>
>>>>> One thought: The way the CEP is currently written, it is only possible
>>>>> to mask a column one way. You can only define one masking function for a
>>>>> column, and since you use the original column name, you could only return
>>>>> one version of it in the result set, even if you had a way to define
>>>>> several functions.
>>>>>
>>>>
>>>> Right, it's one single type of mapping per the column, declared on
>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their own
>>>> masking function in SELECT statements if they have permissions for seeing
>>>> the clear data.
>>>>
>>>> For those cases where the data is automatically masked for an
>>>> unprivileged user, I don't see the use of including different types of
>>>> masking for the same column in the same result set. Instead, we might be
>>>> interested in having different types of masking associated with different
>>>> roles. We could do so with dedicated CREATE/DROP/LIST MASK statements,
>>>> instead of using the CREATE/ALTER/DESCRIBE TABLE statements. That CREATE
>>>> MASK statement would associate a masking function with a column and role.
>>>> However, I'm not sure we need that type of granularity instead of the
>>>> simplicity of attaching the masking to the column declaration. wdyt?
>>>>
>>>>
>>>>
>>> My gut feeling likewise is that this adds complexity but little value.
>>>
>>>>
>>>>>
>>>
>>> --
>>>
>>> Henrik Ingo
>>>
>>> +358 40 569 7354 <358405697354>
>>>
>>>
>>
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
>
>
