> Applying this should prevent querying on a field, else you could leak its
> contents, surely?

In theory, yes. Although I could see folks doing something like this:

    SELECT COUNT(*) FROM patients
    WHERE year_of_birth = 2002
      AND date_of_birth >= '2002-04-01'
      AND date_of_birth < '2002-11-01';

In this case, the rows containing the masked key column(s) could be
filtered on without revealing the actual data. But again, that's probably
better for a "phase 2" of the implementation.

> Agreed on not being a queryable field. That would also preclude secondary
> indexing, right?

Yes, that's my thought as well.

On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <de...@chen-becker.org> wrote:

> Agreed on not being a queryable field. That would also preclude secondary
> indexing, right?
>
> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org> wrote:
>
>> Applying this should prevent querying on a field, else you could leak its
>> contents, surely? This pretty much prohibits using it in a clustering key,
>> and a partition key with the ordered partitioner - but probably also a
>> hashed partitioner since we do not use a cryptographic hash and the hash
>> function is well defined.
>>
>> We probably also need to ensure that any ALLOW FILTERING queries on such
>> a field are disabled.
>>
>> Plausibly the data could be cryptographically jumbled before using it in
>> a primary key component (or permitting filtering), but it is probably
>> easier and safer to exclude for now…
>>
>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com> wrote:
>>
>> Some thoughts on this one:
>>
>> In a prior job, we'd give app teams access to a single keyspace, and two
>> roles: a read-write role and a read-only role. In some cases, a
>> "privileged" application role was also requested. Depending on the
>> requirements, I could see the UNMASK permission being applied to the RW or
>> privileged roles. But if there's a problem on the table and the operators
>> go in to investigate, they will likely use a SUPERUSER account, and they'll
>> see that data.
>>
>> How hard would it be for SUPERUSERs to *not* automatically get the UNMASK
>> permission?
>>
>> I'll also echo the concerns around masking primary key components. It's
>> highly likely that certain personal data properties would be used as a
>> partition or clustering key (ex: range query for people born within a
>> certain timeframe). In addition to the "breaks existing" concern, I'm
>> curious about the challenges around getting that to work with the current
>> primary key implementation.
>>
>> Does this first implementation only apply to payload (non-key) columns?
>> The examples in the CEP currently do not show primary key components being
>> masked.
>>
>> Thanks,
>>
>> Aaron
>>
>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>>
>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <adelap...@apache.org> wrote:
>>>
>>>>> One thought: The way the CEP is currently written, it is only possible
>>>>> to mask a column one way. You can only define one masking function for a
>>>>> column, and since you use the original column name, you could only return
>>>>> one version of it in the result set, even if you had a way to define
>>>>> several functions.
>>>>
>>>> Right, it's one single type of mapping per column, declared on
>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their own
>>>> masking function in SELECT statements if they have permissions for seeing
>>>> the clear data.
>>>>
>>>> For those cases where the data is automatically masked for an
>>>> unprivileged user, I don't see the use of including different types of
>>>> masking for the same column into the same result set. Instead, we might be
>>>> interested in having different types of masking associated with different
>>>> roles. We could do so with dedicated CREATE/DROP/LIST MASK statements,
>>>> instead of using the CREATE/ALTER/DESCRIBE TABLE statements.
>>>> That CREATE MASK statement would associate a masking function with a
>>>> column and role. However, I'm not sure we need that type of granularity
>>>> instead of the simplicity of attaching the masking to the column
>>>> declaration. wdyt?
>>>
>>> My gut feeling likewise is that this adds complexity but little value.
>>>
>>> --
>>> Henrik Ingo
>
> --
> Derek Chen-Becker
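
A footnote on the quoted concern that querying on a masked field could leak its contents. The sketch below is a hypothetical, in-memory stand-in and not Cassandra code: the `count_in_range` helper, the `recover_date` function and the sample row are all made up for illustration. It assumes a caller who can only run count-style range filters like the SELECT COUNT(*) query earlier in this message, and that exactly one row falls in the starting range. Under those assumptions, repeatedly halving the range recovers the hidden date of birth from counts alone.

```python
from datetime import date, timedelta

# Hypothetical table contents. In the scenario discussed in the thread the
# caller never sees these values directly, only counts.
_HIDDEN_ROWS = [date(2002, 6, 17)]


def count_in_range(start: date, end: date) -> int:
    """Stand-in for: SELECT COUNT(*) ... WHERE dob >= start AND dob < end."""
    return sum(1 for dob in _HIDDEN_ROWS if start <= dob < end)


def recover_date(start: date, end: date) -> date:
    """Binary-search [start, end) using only counts.

    Assumes exactly one row matches the initial range.
    """
    assert count_in_range(start, end) == 1
    while (end - start).days > 1:
        mid = start + timedelta(days=(end - start).days // 2)
        if count_in_range(start, mid) == 1:
            end = mid      # the row is in the lower half
        else:
            start = mid    # the row is in the upper half
    return start


if __name__ == "__main__":
    print(recover_date(date(2002, 1, 1), date(2003, 1, 1)))
```

This is only meant to make the "in theory, yes" part concrete; whether it matters in practice depends on how many rows share a range and what queries the unprivileged role may run.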