Re: [DISCUSS] CEP-20: Dynamic Data Masking

Andrés de la Peña Wed, 24 Aug 2022 03:06:40 -0700

Here are the names of the feature on some databases out there, errors and
omissions excepted:


   - Microsoft SQL Server / Azure SQL: Dynamic data masking
   - MySQL: Enterprise data masking and de-identification
   - PostgreSQL: Dynamic masking
   - MongoDB: Data masking
   - IBM Db2: Masks
   - Oracle: Redaction
   - MariaDB/MaxScale: Data masking
   - Snowflake: Dynamic data masking


On Wed, 24 Aug 2022 at 10:40, Benedict <[email protected]> wrote:

> Right, but we get to decide how we offer such features and what we call
> them. I can’t imagine a good reason to call this a masking feature,
> especially one that applies differentially to certain users, when it is
> trivial to unmask.
>
> I’m ok offering a feature called “default formatter” or something that
> applies some UDF to a field before returning to the client, and if users
> wish to “mask” their data in this way that’s fine. But calling it a data
> mask when it is trivial to circumvent is IMO dangerous, and I’d at least
> want to see evidence that all other equivalent features in the industry are
> similarly poorly named and offer similarly poor protection.
>
> On 24 Aug 2022, at 09:50, Benjamin Lerer <[email protected]> wrote:
>
> 
>
>> The PCI DSS Standard v4_0
>> <https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0.pdf>
>>  requires
>> that credit card numbers stored on the system must be "rendered
>> unreadable", thus this proposal is _NOT_ a good way to protect credit card
>> numbers.
>
>
> My point was simply that Dynamic Data Masking, like any other feature,
> makes sense for some scenarios but not for others. I apologise if my
> example was a bad one.
>
> On Wed, 24 Aug 2022 at 10:36, Claude Warren, Jr via dev <
> [email protected]> wrote:
>
>> This change appears to be looking at two aspects:
>>
>>    1. Add metadata to columns
>>    2. Add functionality based on the metadata.
>>
>> If the system had generic user-defined metadata and the ability to
>> define filter functions at the point where data are being returned to the
>> client, it would be possible for users to implement this filter, or any
>> other filter on the data.
>>
>> The concept of user defined metadata and filters could be applied to
>> other parts of the system as well.  For example, if the metadata were
>> accessible from UDFs the metadata could be used in low level filters to
>> remove rows from queries before they were returned.
>>
>>
>>
>>
>> On Wed, Aug 24, 2022 at 9:29 AM Claude Warren, Jr <[email protected]>
>> wrote:
>>
>>> The PCI DSS Standard v4_0
>>> <https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0.pdf>
>>>  requires
>>> that credit card numbers stored on the system must be "rendered
>>> unreadable", thus this proposal is _NOT_ a good way to protect credit card
>>> numbers.  In fact, for any critically sensitive data this is not an
>>> appropriate solution.  However, there seems to be agreement that it is
>>> appropriate for obfuscating some data in some queries by some users.
>>>
>>>
>>>
>>> On Wed, Aug 24, 2022 at 9:02 AM Benjamin Lerer <[email protected]>
>>> wrote:
>>>
>>>>> Is it typical for a masking feature to make no effort to prevent
>>>>> unmasking? I’m just struggling to see the value of this without such
>>>>> mechanisms. Otherwise it’s just a default formatter, and we should
>>>>> consider renaming the feature IMO
>>>>
>>>>
>>>> The security that Dynamic Data Masking brings depends on how you use
>>>> the feature. It is much the same with passwords: if you use a weak
>>>> password, it does not bring much security.
>>>> Masking a field like people's gender is useless because you will be
>>>> able to determine its value in one query. On the other hand, masking credit
>>>> card numbers makes a lot of sense, as it will complicate the life of the
>>>> person trying to gain access to them, and the queries needed to reach the
>>>> information will leave some clear traces in the audit log.
>>>>
>>>> Dynamic Data Masking is not a magic bullet. Nevertheless, it is a good
>>>> way to protect sensitive data like credit card numbers or passwords.
>>>>
>>>>
>>>> On Wed, 24 Aug 2022 at 09:40, Benedict <[email protected]> wrote:
>>>>
>>>>> Is it typical for a masking feature to make no effort to prevent
>>>>> unmasking? I’m just struggling to see the value of this without such
>>>>> mechanisms. Otherwise it’s just a default formatter, and we should
>>>>> consider renaming the feature IMO
>>>>>
>>>>> On 23 Aug 2022, at 21:27, Andrés de la Peña <[email protected]>
>>>>> wrote:
>>>>>
>>>>> 
>>>>> As mentioned in the CEP document, dynamic data masking doesn't try to
>>>>> prevent malicious users with SELECT permissions from indirectly guessing
>>>>> the real value of a masked column. This can easily be done by just trying
>>>>> values in the WHERE clause of SELECT queries. DDM would not be a
>>>>> replacement for proper column-level permissions.
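
The guessing attack described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions: it is an in-memory
simulation in Python, not Cassandra code, and every name in it (ROWS, mask,
select_where_card, infer_card) is hypothetical. It assumes the filter is
evaluated on the clear value while the returned column is masked.

```python
# Toy in-memory model, not Cassandra code: every name here is hypothetical.
ROWS = [
    {"id": 1, "card": "4111111111111111"},
    {"id": 2, "card": "5500000000000004"},
]


def mask(value):
    """Hide all but the last four characters (an assumed masking function)."""
    return "*" * (len(value) - 4) + value[-4:]


def select_where_card(candidate):
    """Mimic SELECT ... WHERE card = ? for a user without UNMASK:
    the filter runs on the clear value, the returned column is masked."""
    return [
        {"id": row["id"], "card": mask(row["card"])}
        for row in ROWS
        if row["card"] == candidate
    ]


def infer_card(row_id, candidates):
    """Recover the clear value of one row by trying candidates in the filter."""
    for candidate in candidates:
        if any(row["id"] == row_id for row in select_where_card(candidate)):
            return candidate
    return None


guesses = ["4111111111111110", "4111111111111111", "4111111111111112"]
recovered = infer_card(1, guesses)
```

Each guess corresponds to one query, which is why a low-cardinality column
falls to a handful of queries while a long card number needs many more.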
>>>>>
>>>>> The data served by the database is usually consumed by applications
>>>>> that present this data to end users. These end users are not necessarily
>>>>> the users directly connecting to the database. With DDM, it would be easy
>>>>> for applications to mask sensitive data that is going to be consumed by
>>>>> the end users. However, the users directly connecting to the database
>>>>> should be trusted, provided that they have the right SELECT permissions.
>>>>>
>>>>> In other words, DDM doesn't directly protect the data, but it eases
>>>>> the production of protected data.
>>>>>
>>>>> That said, we could later go one step further and add a way to prevent
>>>>> untrusted users from inferring the masked data. That could be done by
>>>>> adding a new permission required to use certain columns in WHERE clauses,
>>>>> different from the current SELECT permission. That would play especially
>>>>> well with column-level permissions, which is something that we still have
>>>>> pending.
>>>>>
>>>>> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Applying this should prevent querying on a field, else you could leak
>>>>>>> its contents, surely?
>>>>>>>
>>>>>>
>>>>>> In theory, yes.  Although I could see folks doing something like this:
>>>>>>
>>>>>> SELECT COUNT(*) FROM patients
>>>>>> WHERE year_of_birth = 2002
>>>>>> AND date_of_birth >= '2002-04-01'
>>>>>> AND date_of_birth < '2002-11-01';
>>>>>>
>>>>>> In this case, the rows containing the masked key column(s) could be
>>>>>> filtered on without revealing the actual data.  But again, that's
>>>>>> probably better for a "phase 2" of the implementation.
>>>>>>
>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>>> secondary indexing, right?
>>>>>>
>>>>>>
>>>>>> Yes, that's my thought as well.
>>>>>>
>>>>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>>> secondary indexing, right?
>>>>>>>
>>>>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Applying this should prevent querying on a field, else you could
>>>>>>>> leak its contents, surely? This pretty much prohibits using it in a
>>>>>>>> clustering key, and a partition key with the ordered partitioner - but
>>>>>>>> probably also a hashed partitioner since we do not use a cryptographic
>>>>>>>> hash and the hash function is well defined.
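
A rough sketch of why a well-defined, unkeyed hash can leak a masked
partition key: anyone who sees a token can hash candidate values offline
and compare. The code below is an illustration only. CRC32 is a stand-in
chosen because it ships with Python (Cassandra's default partitioner uses
Murmur3), and the names token and recover_from_token, as well as the
birthday data, are hypothetical.

```python
import zlib


def token(value: str) -> int:
    """Stand-in for the partitioner hash. CRC32 is used only because it is
    in the standard library; it is not the hash Cassandra uses."""
    return zlib.crc32(value.encode("utf-8"))


def recover_from_token(observed_token, candidates):
    """Return every candidate whose public hash matches the observed token."""
    return [c for c in candidates if token(c) == observed_token]


# Hypothetical scenario: the user only ever sees the token of a masked key.
secret = "1987-06-15"
observed = token(secret)

# The attacker enumerates a small candidate space offline.
birthdays = [f"1987-06-{day:02d}" for day in range(1, 31)]
matches = recover_from_token(observed, birthdays)
```

The smaller the space of plausible values, the cheaper this enumeration
becomes, regardless of which public hash function is used.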
>>>>>>>>
>>>>>>>> We probably also need to ensure that any ALLOW FILTERING queries on
>>>>>>>> such a field are disabled.
>>>>>>>>
>>>>>>>> Plausibly the data could be cryptographically jumbled before using
>>>>>>>> it in a primary key component (or permitting filtering), but it is
>>>>>>>> probably easier and safer to exclude for now…
>>>>>>>>
>>>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> 
>>>>>>>> Some thoughts on this one:
>>>>>>>>
>>>>>>>> In a prior job, we'd give app teams access to a single keyspace,
>>>>>>>> and two roles: a read-write role and a read-only role.  In some cases,
>>>>>>>> a "privileged" application role was also requested.  Depending on the
>>>>>>>> requirements, I could see the UNMASK permission being applied to the
>>>>>>>> RW or privileged roles.  But if there's a problem on the table and the
>>>>>>>> operators go in to investigate, they will likely use a SUPERUSER
>>>>>>>> account, and they'll see that data.
>>>>>>>>
>>>>>>>> How hard would it be for SUPERUSERs to *not* automatically get the
>>>>>>>> UNMASK permission?
>>>>>>>>
>>>>>>>> I'll also echo the concerns around masking primary key components.
>>>>>>>> It's highly likely that certain personal data properties would be used
>>>>>>>> as a partition or clustering key (ex: range query for people born
>>>>>>>> within a certain timeframe).  In addition to the "breaks existing"
>>>>>>>> concern, I'm curious about the challenges around getting that to work
>>>>>>>> with the current primary key implementation.
>>>>>>>>
>>>>>>>> Does this first implementation only apply to payload (non-key)
>>>>>>>> columns?  The examples in the CEP currently do not show primary key
>>>>>>>> components being masked.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>> One thought: The way the CEP is currently written, it is only
>>>>>>>>>>> possible to mask a column one way. You can only define one masking
>>>>>>>>>>> function for a column, and since you use the original column name,
>>>>>>>>>>> you could only return one version of it in the result set, even if
>>>>>>>>>>> you had a way to define several functions.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Right, it's one single type of masking per column, declared
>>>>>>>>>> on CREATE/ALTER TABLE statements. Also, users can manually specify
>>>>>>>>>> their own masking function in SELECT statements if they have
>>>>>>>>>> permissions for seeing the clear data.
>>>>>>>>>>
>>>>>>>>>> For those cases where the data is automatically masked for an
>>>>>>>>>> unprivileged user, I don't see the use of including different types
>>>>>>>>>> of masking for the same column in the same result set. Instead, we
>>>>>>>>>> might be interested in having different types of masking associated
>>>>>>>>>> with different roles. We could do so with dedicated CREATE/DROP/LIST
>>>>>>>>>> MASK statements, instead of using the CREATE/ALTER/DESCRIBE TABLE
>>>>>>>>>> statements. That CREATE MASK statement would associate a masking
>>>>>>>>>> function with a column and role. However, I'm not sure we need that
>>>>>>>>>> type of granularity instead of the simplicity of attaching the
>>>>>>>>>> masking to the column declaration. wdyt?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> My gut feeling likewise is that this adds complexity but little
>>>>>>>>> value.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Henrik Ingo
>>>>>>>>>
>>>>>>>>> +358 40 569 7354 <358405697354>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> +---------------------------------------------------------------+
>>>>>>> | Derek Chen-Becker                                             |
>>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>>> +---------------------------------------------------------------+
>>>>>>>
>>>>>>>
