Here are the names of the feature in some databases out there, errors and omissions excepted:
- Microsoft SQL Server / Azure SQL: Dynamic data masking
- MySQL: Enterprise data masking and de-identification
- PostgreSQL: Dynamic masking
- MongoDB: Data masking
- IBM Db2: Masks
- Oracle: Redaction
- MariaDB/MaxScale: Data masking
- Snowflake: Dynamic data masking

On Wed, 24 Aug 2022 at 10:40, Benedict <bened...@apache.org> wrote:

> Right, but we get to decide how we offer such features and what we call them. I can’t imagine a good reason to call this a masking feature, especially one that applies differentially to certain users, when it is trivial to unmask.
>
> I’m ok offering a feature called “default formatter” or something that applies some UDF to a field before returning to the client, and if users wish to “mask” their data in this way that’s fine. But calling it a data mask when it is trivial to circumvent is IMO dangerous, and I’d at least want to see evidence that all other equivalent features in the industry are similarly poorly named and offer similarly poor protection.
>
> On 24 Aug 2022, at 09:50, Benjamin Lerer <ble...@apache.org> wrote:
>
>> The PCI DSS Standard v4_0 <https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0.pdf> requires that credit card numbers stored on the system must be "rendered unreadable", thus this proposal is _NOT_ a good way to protect credit card numbers.
>
> My point was simply about the fact that Dynamic Data Masking like any other feature made sense for some scenario but not for others. I apologise if my example was a bad one.
>
> On Wed, 24 Aug 2022 at 10:36, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
>
>> This change appears to be looking at two aspects:
>>
>> 1. Add metadata to columns
>> 2. Add functionality based on the metadata.
>>
>> If the system had a generic user defined metadata and the ability to define filter functions at the point where data are being returned to the client it would be possible for users to implement this filter, or any other filter on the data.
>>
>> The concept of user defined metadata and filters could be applied to other parts of the system as well. For example, if the metadata were accessible from UDFs the metadata could be used in low level filters to remove rows from queries before they were returned.
>>
>> On Wed, Aug 24, 2022 at 9:29 AM Claude Warren, Jr <claude.war...@aiven.io> wrote:
>>
>>> The PCI DSS Standard v4_0 <https://docs-prv.pcisecuritystandards.org/PCI%20DSS/Standard/PCI-DSS-v4_0.pdf> requires that credit card numbers stored on the system must be "rendered unreadable", thus this proposal is _NOT_ a good way to protect credit card numbers. In fact, for any critically sensitive data this is not an appropriate solution. However, there seems to be agreement that it is appropriate for obfuscating some data in some queries by some users.
>>>
>>> On Wed, Aug 24, 2022 at 9:02 AM Benjamin Lerer <b.le...@gmail.com> wrote:
>>>
>>>>> Is it typical for a masking feature to make no effort to prevent unmasking? I’m just struggling to see the value of this without such mechanisms. Otherwise it’s just a default formatter, and we should consider renaming the feature IMO
>>>>
>>>> The security that Dynamic Data Masking is bringing is related to how you make use of the feature. It is somehow the same with passwords. If you use a weak password it does not bring much security.
>>>> Masking a field like people's gender is useless because you will be able to determine its value in one query.
>>>> On the other hand masking credit card numbers makes a lot of sense as it will complicate the life of the person trying to have access to it and the queries needed to reach the information will leave some clear traces in the audit log.
>>>>
>>>> Dynamic Data Masking is not a magic bullet. Nevertheless, it is a good way to protect sensitive data like credit card numbers or passwords.
>>>>
>>>> On Wed, 24 Aug 2022 at 09:40, Benedict <bened...@apache.org> wrote:
>>>>
>>>>> Is it typical for a masking feature to make no effort to prevent unmasking? I’m just struggling to see the value of this without such mechanisms. Otherwise it’s just a default formatter, and we should consider renaming the feature IMO
>>>>>
>>>>> On 23 Aug 2022, at 21:27, Andrés de la Peña <adelap...@apache.org> wrote:
>>>>>
>>>>> As mentioned in the CEP document, dynamic data masking doesn't try to prevent malicious users with SELECT permissions from indirectly guessing the real value of the masked value. This can easily be done by just trying values on the WHERE clause of SELECT queries. DDM would not be a replacement for proper column-level permissions.
>>>>>
>>>>> The data served by the database is usually consumed by applications that present this data to end users. These end users are not necessarily the users directly connecting to the database. With DDM, it would be easy for applications to mask sensitive data that is going to be consumed by the end users. However, the users directly connecting to the database should be trusted, provided that they have the right SELECT permissions.
>>>>>
>>>>> In other words, DDM doesn't directly protect the data, but it eases the production of protected data.
>>>>>
>>>>> That said, we could later go one step ahead and add a way to prevent untrusted users from inferring the masked data.
>>>>> That could be done by adding a new permission required to use certain columns on WHERE clauses, different to the current SELECT permission. That would play especially well with column-level permissions, which is something that we still have pending.
>>>>>
>>>>> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>>>>>
>>>>>>> Applying this should prevent querying on a field, else you could leak its contents, surely?
>>>>>>
>>>>>> In theory, yes. Although I could see folks doing something like this:
>>>>>>
>>>>>> SELECT COUNT(*) FROM patients
>>>>>> WHERE year_of_birth = 2002
>>>>>> AND date_of_birth >= '2002-04-01'
>>>>>> AND date_of_birth < '2002-11-01';
>>>>>>
>>>>>> In this case, the rows containing the masked key column(s) could be filtered on without revealing the actual data. But again, that's probably better for a "phase 2" of the implementation.
>>>>>>
>>>>>>> Agreed on not being a queryable field. That would also preclude secondary indexing, right?
>>>>>>
>>>>>> Yes, that's my thought as well.
>>>>>>
>>>>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>>>
>>>>>>> Agreed on not being a queryable field. That would also preclude secondary indexing, right?
>>>>>>>
>>>>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org> wrote:
>>>>>>>
>>>>>>>> Applying this should prevent querying on a field, else you could leak its contents, surely? This pretty much prohibits using it in a clustering key, and a partition key with the ordered partitioner - but probably also a hashed partitioner since we do not use a cryptographic hash and the hash function is well defined.
>>>>>>>>
>>>>>>>> We probably also need to ensure that any ALLOW FILTERING queries on such a field are disabled.
>>>>>>>>
>>>>>>>> Plausibly the data could be cryptographically jumbled before using it in a primary key component (or permitting filtering), but it is probably easier and safer to exclude for now…
>>>>>>>>
>>>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Some thoughts on this one:
>>>>>>>>
>>>>>>>> In a prior job, we'd give app teams access to a single keyspace, and two roles: a read-write role and a read-only role. In some cases, a "privileged" application role was also requested. Depending on the requirements, I could see the UNMASK permission being applied to the RW or privileged roles. But if there's a problem on the table and the operators go in to investigate, they will likely use a SUPERUSER account, and they'll see that data.
>>>>>>>>
>>>>>>>> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK permission?
>>>>>>>>
>>>>>>>> I'll also echo the concerns around masking primary key components. It's highly likely that certain personal data properties would be used as a partition or clustering key (ex: range query for people born within a certain timeframe). In addition to the "breaks existing" concern, I'm curious about the challenges around getting that to work with the current primary key implementation.
>>>>>>>>
>>>>>>>> Does this first implementation only apply to payload (non-key) columns? The examples in the CEP currently do not show primary key components being masked.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <adelap...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>>> One thought: The way the CEP is currently written, it is only possible to mask a column one way. You can only define one masking function for a column, and since you use the original column name, you could only return one version of it in the result set, even if you had a way to define several functions.
>>>>>>>>>>
>>>>>>>>>> Right, it's one single type of mapping per the column, declared on CREATE/ALTER TABLE statements. Also, users can manually specify their own masking function in SELECT statements if they have permissions for seeing the clear data.
>>>>>>>>>>
>>>>>>>>>> For those cases where the data is automatically masked for an unprivileged user, I don't see the use of including different types of masking for the same column into the same result set. Instead, we might be interested on having different types of masking associated to different roles. We could do so with dedicated CREATE/DROP/LIST MASK statements, instead of using the CREATE/ALTER/DESCRIBE TABLE statements. That CREATE MASK statement would associate a masking function to a column and role. However, I'm not sure we need that type of granularity instead of the simplicity of attaching the masking to the column declaration. wdyt?
>>>>>>>>>
>>>>>>>>> My gut feeling likewise is that this adds complexity but little value.
>>>>>>>>>
>>>>>>>>> Henrik Ingo
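
To make the inference concern raised in this thread concrete (a user with SELECT permission guessing a masked value by trying values in the WHERE clause), here is a minimal Python sketch. It is a toy in-memory model, not Cassandra code: the `Patient` rows, `mask_year`, `select_masked` and `infer_year` are all hypothetical names, and the only assumption carried over from the thread is that filtering runs on the clear value while the returned column is masked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Patient:
    name: str
    year_of_birth: int  # hypothetical column with a mask attached


# Toy stand-in for a table; not real data.
PATIENTS = [Patient("alice", 1987), Patient("bob", 2002)]


def mask_year(_value: int) -> str:
    """Hypothetical mask function: hides the value entirely."""
    return "****"


def select_masked(year_filter: int) -> List[Tuple[str, str]]:
    """Toy equivalent of:
        SELECT name, year_of_birth FROM patients WHERE year_of_birth = ?
    The WHERE clause is evaluated on the clear value; only the
    returned column goes through the mask."""
    return [
        (p.name, mask_year(p.year_of_birth))
        for p in PATIENTS
        if p.year_of_birth == year_filter
    ]


def infer_year(name: str, candidates: range) -> Optional[int]:
    """Recover the masked value by trying each candidate in the filter."""
    for year in candidates:
        if any(row_name == name for row_name, _ in select_masked(year)):
            return year
    return None


if __name__ == "__main__":
    print(select_masked(2002))
    print(infer_year("bob", range(1900, 2023)))
```

Each probe is an ordinary query, which is why the thread suggests either a separate permission for using masked columns in WHERE clauses or relying on the audit log to surface this kind of enumeration.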