Is it typical for a masking feature to make no effort to prevent unmasking? I’m just struggling to see the value of this without such mechanisms. Otherwise it’s just a default formatter, and we should consider renaming the feature, IMO.
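
To make the concern concrete, here is a minimal Python sketch of the WHERE-clause probing described in the quoted message below. It is a toy model, not Cassandra code: `Row`, `MaskedTable`, `mask` and `infer_year` are hypothetical names, and the only assumption is that predicates are evaluated against the clear values while results come back masked.

```python
from dataclasses import dataclass


@dataclass
class Row:
    patient_id: int
    year_of_birth: int  # hypothetical masked column


def mask(_value):
    """Stand-in for a default masking function: hides the real value."""
    return "****"


class MaskedTable:
    """Toy model: query output is masked, but predicates see clear values."""

    def __init__(self, rows):
        self._rows = rows

    def select(self, predicate):
        # The predicate runs against the clear data; only the output is masked.
        return [
            (r.patient_id, mask(r.year_of_birth))
            for r in self._rows
            if predicate(r)
        ]


def infer_year(table, patient_id, lo=1900, hi=2100):
    """Recover a masked integer by binary search over range predicates.

    Assumes the real value lies within [lo, hi].
    """
    while lo < hi:
        mid = (lo + hi) // 2
        hit = table.select(
            lambda r: r.patient_id == patient_id and r.year_of_birth <= mid
        )
        if hit:
            hi = mid       # the value is at most mid
        else:
            lo = mid + 1   # the value is greater than mid
    return lo


table = MaskedTable([Row(1, 2002), Row(2, 1987)])
print(table.select(lambda r: True))  # year_of_birth is masked in every row
print(infer_year(table, 1))          # the clear value is recovered anyway
```

In this model a reader who only ever sees masked output still recovers the value in a handful of queries, which is why a restriction on using masked columns in predicates comes up in the thread.
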
> On 23 Aug 2022, at 21:27, Andrés de la Peña <adelap...@apache.org> wrote:
>
>
> As mentioned in the CEP document, dynamic data masking doesn't try to prevent
> malicious users with SELECT permissions to indirectly guess the real value of
> the masked value. This can easily be done by just trying values on the WHERE
> clause of SELECT queries. DDM would not be a replacement for proper
> column-level permissions.
>
> The data served by the database is usually consumed by applications that
> present this data to end users. These end users are not necessarily the users
> directly connecting to the database. With DDM, it would be easy for
> applications to mask sensitive data that is going to be consumed by the end
> users. However, the users directly connecting to the database should be
> trusted, provided that they have the right SELECT permissions.
>
> In other words, DDM doesn't directly protect the data, but it eases the
> production of protected data.
>
> Said that, we could later go one step ahead and add a way to prevent
> untrusted users from inferring the masked data. That could be done adding a
> new permission required to use certain columns on WHERE clauses, different to
> the current SELECT permission. That would play especially well with
> column-level permissions, which is something that we still have pending.
>
> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>>> Applying this should prevent querying on a field, else you could leak its
>>> contents, surely?
>>
>> In theory, yes. Although I could see folks doing something like this:
>>
>> SELECT COUNT(*) FROM patients
>> WHERE year_of_birth = 2002
>> AND date_of_birth >= '2002-04-01'
>> AND date_of_birth < '2002-11-01';
>>
>> In this case, the rows containing the masked key column(s) could be filtered
>> on without revealing the actual data. But again, that's probably better for
>> a "phase 2" of the implementation.
>>
>>> Agreed on not being a queryable field. That would also preclude secondary
>>> indexing, right?
>>
>> Yes, that's my thought as well.
>>
>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <de...@chen-becker.org>
>>> wrote:
>>> Agreed on not being a queryable field. That would also preclude secondary
>>> indexing, right?
>>>
>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org> wrote:
>>>> Applying this should prevent querying on a field, else you could leak its
>>>> contents, surely? This pretty much prohibits using it in a clustering key,
>>>> and a partition key with the ordered partitioner - but probably also a
>>>> hashed partitioner since we do not use a cryptographic hash and the hash
>>>> function is well defined.
>>>>
>>>> We probably also need to ensure that any ALLOW FILTERING queries on such a
>>>> field are disabled.
>>>>
>>>> Plausibly the data could be cryptographically jumbled before using it in a
>>>> primary key component (or permitting filtering), but it is probably easier
>>>> and safer to exclude for now…
>>>>
>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>>>>>>
>>>>>
>>>>> Some thoughts on this one:
>>>>>
>>>>> In a prior job, we'd give app teams access to a single keyspace, and two
>>>>> roles: a read-write role and a read-only role. In some cases, a
>>>>> "privileged" application role was also requested. Depending on the
>>>>> requirements, I could see the UNMASK permission being applied to the RW
>>>>> or privileged roles. But if there's a problem on the table and the
>>>>> operators go in to investigate, they will likely use a SUPERUSER account,
>>>>> and they'll see that data.
>>>>>
>>>>> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK
>>>>> permission?
>>>>>
>>>>> I'll also echo the concerns around masking primary key components. It's
>>>>> highly likely that certain personal data properties would be used as a
>>>>> partition or clustering key (ex: range query for people born within a
>>>>> certain timeframe). In addition to the "breaks existing" concern, I'm
>>>>> curious about the challenges around getting that to work with the current
>>>>> primary key implementation.
>>>>>
>>>>> Does this first implementation only apply to payload (non-key) columns?
>>>>> The examples in the CEP currently do not show primary key components
>>>>> being masked.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Aaron
>>>>>
>>>>>
>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com>
>>>>>> wrote:
>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <adelap...@apache.org>
>>>>>> wrote:
>>>>>>>> One thought: The way the CEP is currently written, it is only possible
>>>>>>>> to mask a column one way. You can only define one masking function for
>>>>>>>> a column, and since you use the original column name, you could only
>>>>>>>> return one version of it in the result set, even if you had a way to
>>>>>>>> define several functions.
>>>>>>>
>>>>>>> Right, it's one single type of mapping per the column, declared on
>>>>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their
>>>>>>> own masking function in SELECT statements if they have permissions for
>>>>>>> seeing the clear data.
>>>>>>>
>>>>>>> For those cases where the data is automatically masked for an
>>>>>>> unprivileged user, I don't see the use of including different types of
>>>>>>> masking for the same column into the same result set. Instead, we might
>>>>>>> be interested on having different types of masking associated to
>>>>>>> different roles. We could do so with dedicated CREATE/DROP/LIST MASK
>>>>>>> statements, instead of using the CREATE/ALTER/DESCRIBE TABLE
>>>>>>> statements. That CREATE MASK statement would associate a masking
>>>>>>> function to a column and role. However, I'm not sure we need that type
>>>>>>> of granularity instead of the simplicity of attaching the masking to
>>>>>>> the column declaration. wdyt?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> My gut feeling likewise is that this adds complexity but little value.
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Henrik Ingo
>>>>>> +358 40 569 7354
>>>>>>
>>>
>>>
>>> --
>>> +---------------------------------------------------------------+
>>> | Derek Chen-Becker                                             |
>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
>>> +---------------------------------------------------------------+
>>>