>
> Yes, I was thinking that simple projection views (essentially a SELECT
> statement with application of transform functions) would complement masking
> functions, and from the discussion it sounds like this is basically what
> some of the other databases do.


I don't think the databases mentioned generally suggest using views for
dynamic data masking. So far, the only such suggestion I have found is this
blog post <https://dev.mysql.com/blog-archive/data-masking-in-mysql/>, which
proposes using MySQL views (which are not materialized) with masking
functions, probably because MySQL lacks the more sophisticated data masking
mechanisms that other databases offer.

However, MySQL views can still allow malicious users to run queries that
infer the masked data, which is exactly what we were trying to avoid. For
example:

CREATE TABLE employees (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  PRIMARY KEY (id));

CREATE VIEW employee_mask AS SELECT
  id,
  mask_inner(name, 1, 0, _binary'*') AS name
  FROM employees;

INSERT INTO employees(name) VALUES ('Joseph');
INSERT INTO employees(name) VALUES ('Olivia');

SELECT * FROM employee_mask WHERE name = 'Joseph';
+----+--------+
| id | name   |
+----+--------+
|  1 | J***** |
+----+--------+
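To make the leak concrete outside MySQL, here is a small Python toy model of the behavior shown above. It assumes, as the output suggests, that the WHERE predicate is matched against the stored (clear) value while the projection applies the mask. `mask_inner` below is only an approximation of MySQL's function of the same name (keep `margin1` leading and `margin2` trailing characters, mask the rest); `query_masked_view` and the sample rows are hypothetical.

```python
def mask_inner(value: str, margin1: int, margin2: int, mask_char: str) -> str:
    """Approximation of MySQL's mask_inner(): keep margin1 leading and margin2
    trailing characters and replace the ones in between with mask_char.
    Leaving too-short strings unchanged is an assumption of this sketch."""
    if margin1 + margin2 >= len(value):
        return value
    inner = len(value) - margin1 - margin2
    return value[:margin1] + mask_char * inner + value[len(value) - margin2:]


EMPLOYEES = [
    {"id": 1, "name": "Joseph"},
    {"id": 2, "name": "Olivia"},
]


def query_masked_view(rows, where_name):
    """Toy model of the view: filter on the raw column, project the masked one."""
    result = []
    for row in rows:
        if row["name"] == where_name:  # the predicate sees the clear value
            result.append({"id": row["id"],
                           "name": mask_inner(row["name"], 1, 0, "*")})
    return result


print(query_masked_view(EMPLOYEES, "Joseph"))  # [{'id': 1, 'name': 'J*****'}]
print(query_masked_view(EMPLOYEES, "Olga"))    # []
```

A non-empty result confirms the guess even though the returned value stays masked.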

On Fri, 26 Aug 2022 at 02:45, Derek Chen-Becker <de...@chen-becker.org>
wrote:

> Yes, I was thinking that simple projection views (essentially a SELECT
> statement with application of transform functions) would complement masking
> functions, and from the discussion it sounds like this is basically what
> some of the other databases do. Projection views seem like they would be
> useful in their own right, so would it be proper to write a separate CEP
> for that? I would be happy to help drive that document and discussion. I'm
> not sure it's the best name, but I'm trying to distinguish views that
> expose a subset of an existing schema from materialized views, which offer
> more complex capabilities.
>
> Cheers,
>
> Derek
>
> On Thu, Aug 25, 2022, 3:11 PM Benedict <bened...@apache.org> wrote:
>
>> I’m inclined to agree that this seems a more straightforward approach
>> that makes fewer implied promises.
>>
>> Perhaps we could deliver simple views backed by virtual tables, and model
>> our approach on that of Postgres, MySQL et al?
>>
>> Views in C* would be very simple, just offering a subset of fields with
>> some UDFs applied. It would allow users to define roles with access only to
>> the views, or for applications to use the views for presentation purposes.
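The "subset of fields with some UDFs applied" idea can be pictured as a list of (output column, source column, function) entries. The Python sketch below is purely illustrative: `ProjectionView`, `redact_email` and the sample row are hypothetical and say nothing about how Cassandra would actually implement views.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class ProjectionView:
    """Hypothetical projection view: a column subset with per-column functions."""
    # Each entry: (output column name, source column name, transform function)
    columns: List[Tuple[str, str, Callable[[Any], Any]]]

    def project(self, row: Row) -> Row:
        # Only the listed columns are exposed; every other column is dropped.
        return {out: fn(row[src]) for out, src, fn in self.columns}


def identity(v: Any) -> Any:
    return v


def redact_email(v: str) -> str:
    # Keep only the domain part of an email address.
    _, _, domain = v.partition("@")
    return "****@" + domain


patients_public = ProjectionView(columns=[
    ("id", "id", identity),
    ("email", "email", redact_email),
])

row = {"id": 7, "email": "alice@example.com", "ssn": "123-45-6789"}
print(patients_public.project(row))  # {'id': 7, 'email': '****@example.com'}
```

A role granted access only to such a view would never see `ssn`, and the same definition could serve an application purely for presentation.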
>>
>> It feels like a cleaner approach to me, and we’d get two features for the
>> price of one. BUT I don’t feel super strongly about this.
>>
>> On 25 Aug 2022, at 20:16, Derek Chen-Becker <de...@chen-becker.org>
>> wrote:
>>
>> To make sure I understand, if I wanted to use a masked column for a
>> conditional update, you're saying we would need SELECT_MASKED to use it in
>> the IF clause? I worry that this proposal is increasing in complexity; I
>> would actually be OK starting with something smaller in scope. Perhaps just
>> providing the masking functions and not tying masking to schema would be
>> sufficient for an initial goal? That wouldn't preclude additional
>> permissions, schema integration, or perhaps just plain Views in the future.
>>
>> Cheers,
>>
>> Derek
>>
>> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña <adelap...@apache.org>
>> wrote:
>>
>>> I have modified the proposal to add a new SELECT_MASKED permission.
>>> Using masked columns in WHERE/IF clauses would require SELECT and
>>> either UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in
>>> the query results would always require both SELECT and UNMASK.
>>>
>>> This way we can have the best of both worlds, allowing admins to decide
>>> whether they trust their immediate users or not. wdyt?
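The rule described in this message can be summarized as two small decision functions. This sketches only the rule as stated here: the permission names come from the proposal, while the functions and their signatures are hypothetical.

```python
from typing import FrozenSet


def can_filter_on_masked(perms: FrozenSet[str]) -> bool:
    # WHERE/IF on a masked column: SELECT plus either UNMASK or SELECT_MASKED.
    return "SELECT" in perms and ("UNMASK" in perms or "SELECT_MASKED" in perms)


def sees_clear_values(perms: FrozenSet[str]) -> bool:
    # Unmasked values in query results: always SELECT plus UNMASK.
    return "SELECT" in perms and "UNMASK" in perms


for perms in (frozenset({"SELECT"}),
              frozenset({"SELECT", "SELECT_MASKED"}),
              frozenset({"SELECT", "UNMASK"})):
    print(sorted(perms), can_filter_on_masked(perms), sees_clear_values(perms))
```

SELECT_MASKED is what lets an admin grant filtering to a trusted user without also revealing the clear values.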
>>>
>>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo <henrik.i...@datastax.com>
>>> wrote:
>>>
>>>> This is the difference between security and compliance I guess :-D
>>>>
>>>> The way I see this, the attacker or threat in this concept is not the
>>>> developer with access to the database. Rather, a feature like this is just
>>>> a convenient way to apply some masking rule in a centralized way. The
>>>> protection is against an end user of the application, who should not be
>>>> able to see the personal data of someone else. Or themselves, even. As long
>>>> as the application end user can't run arbitrary CQL, these forms of
>>>> masking prevent accidental unauthorized use or leaking of personal data.
>>>>
>>>> henrik
>>>>
>>>>
>>>>
>>>> On Wed, Aug 24, 2022 at 10:40 AM Benedict <bened...@apache.org> wrote:
>>>>
>>>>> Is it typical for a masking feature to make no effort to prevent
>>>>> unmasking? I’m just struggling to see the value of this without such
>>>>> mechanisms. Otherwise it’s just a default formatter, and we should
>>>>> consider renaming the feature IMO.
>>>>>
>>>>> On 23 Aug 2022, at 21:27, Andrés de la Peña <adelap...@apache.org>
>>>>> wrote:
>>>>>
>>>>> As mentioned in the CEP document, dynamic data masking doesn't try to
>>>>> prevent malicious users with SELECT permissions from indirectly guessing
>>>>> the real value of the masked data. This can easily be done by just trying
>>>>> values in the WHERE clause of SELECT queries. DDM would not be a
>>>>> replacement for proper column-level permissions.
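The inference described here needs nothing more than a yes/no answer per probe. In the hypothetical Python sketch below, `row_exists` stands in for running a `SELECT ... WHERE col = ?` query against a masked column and checking whether any row comes back; the stored values and the candidate list are made up.

```python
from typing import Callable, Iterable, Optional


def guess_masked_value(row_exists: Callable[[str], bool],
                       candidates: Iterable[str]) -> Optional[str]:
    """Try candidates one by one until the oracle reports a matching row."""
    for candidate in candidates:
        if row_exists(candidate):
            return candidate
    return None


# Hypothetical stored (clear) values; the attacker only ever sees masked output.
_stored = {"Joseph", "Olivia"}


def row_exists(value: str) -> bool:
    # Stand-in for: SELECT id FROM t WHERE name = ? returning at least one row.
    return value in _stored


print(guess_masked_value(row_exists, ["Anna", "Joseph", "Olga"]))  # Joseph
```

For low-cardinality data such as first names or birth years, the candidate space is small enough that this kind of probing is cheap.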
>>>>>
>>>>> The data served by the database is usually consumed by applications
>>>>> that present this data to end users. These end users are not necessarily
>>>>> the users directly connecting to the database. With DDM, it would be easy
>>>>> for applications to mask sensitive data that is going to be consumed by
>>>>> the end users. However, the users directly connecting to the database
>>>>> should be trusted, provided that they have the right SELECT permissions.
>>>>>
>>>>> In other words, DDM doesn't directly protect the data, but it eases
>>>>> the production of protected data.
>>>>>
>>>>> That said, we could later go one step further and add a way to prevent
>>>>> untrusted users from inferring the masked data. That could be done by
>>>>> adding a new permission, distinct from the current SELECT permission,
>>>>> that is required to use certain columns in WHERE clauses. That would play
>>>>> especially well with column-level permissions, which are something that
>>>>> we still have pending.
>>>>>
>>>>> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <aaronplo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>> Applying this should prevent querying on a field, else you could
>>>>>>> leak its contents, surely?
>>>>>>>
>>>>>>
>>>>>> In theory, yes.  Although I could see folks doing something like this:
>>>>>>
>>>>>> SELECT COUNT(*) FROM patients
>>>>>> WHERE year_of_birth = 2002
>>>>>> AND date_of_birth >= '2002-04-01'
>>>>>> AND date_of_birth < '2002-11-01';
>>>>>>
>>>>>> In this case, the rows containing the masked key column(s) could be
>>>>>> filtered on without revealing the actual data. But again, that's
>>>>>> probably better for a "phase 2" of the implementation.
>>>>>>
>>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>>> secondary indexing, right?
>>>>>>
>>>>>>
>>>>>> Yes, that's my thought as well.
>>>>>>
>>>>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <
>>>>>> de...@chen-becker.org> wrote:
>>>>>>
>>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>>> secondary indexing, right?
>>>>>>>
>>>>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Applying this should prevent querying on a field, else you could
>>>>>>>> leak its contents, surely? This pretty much prohibits using it in a
>>>>>>>> clustering key, or in a partition key with the ordered partitioner,
>>>>>>>> and probably also with a hashed partitioner, since we do not use a
>>>>>>>> cryptographic hash and the hash function is well defined.
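One way to see why an ordered key leaks its contents: if range restrictions are allowed, a client can binary-search the stored value using only counts. The sketch assumes a hypothetical oracle `count_le(x)` standing in for something like a `SELECT COUNT(*) ... WHERE k <= x` query on the masked column, an integer value domain, and a made-up secret value.

```python
from typing import Callable


def recover_by_ranges(count_le: Callable[[int], int], lo: int, hi: int) -> int:
    """Return the smallest x in [lo, hi] with count_le(x) > 0, i.e. the smallest
    stored value, using O(log(hi - lo)) range-count queries.
    Assumes at least one stored value lies within [lo, hi]."""
    while lo < hi:
        mid = (lo + hi) // 2
        if count_le(mid) > 0:
            hi = mid        # some stored value is at or below mid
        else:
            lo = mid + 1    # every stored value is above mid
    return lo


SECRET_BIRTH_YEAR = 1987  # hypothetical masked clustering value


def count_le(x: int) -> int:
    # Stand-in for a COUNT(*) with a range restriction on the masked column.
    return 1 if SECRET_BIRTH_YEAR <= x else 0


print(recover_by_ranges(count_le, 1900, 2022))  # 1987
```

About seven queries are enough to pin down a year within that range, without the masked value ever being returned.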
>>>>>>>>
>>>>>>>> We probably also need to ensure that any ALLOW FILTERING queries on
>>>>>>>> such a field are disabled.
>>>>>>>>
>>>>>>>> Plausibly the data could be cryptographically jumbled before using
>>>>>>>> it in a primary key component (or permitting filtering), but it is
>>>>>>>> probably easier and safer to exclude for now…
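As one possible reading of "cryptographically jumbled", a keyed hash (HMAC) yields deterministic tokens: equality lookups still work, but without the key the token reveals neither the value nor its ordering. This sketch uses Python's standard `hmac` and `hashlib` modules. The hard-coded key is a placeholder rather than a recommendation, and low-entropy values remain guessable by anyone who holds the key or can ask the server to compute tokens.

```python
import hashlib
import hmac

# Placeholder key for illustration only; real key management is out of scope.
KEY = b"example-secret-key"


def pseudonymize(value: str, key: bytes = KEY) -> str:
    """Deterministic keyed token: the same input always gives the same token,
    so equality lookups keep working, but ordering and content are hidden
    from anyone without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


a = pseudonymize("2002-04-15")
b = pseudonymize("2002-04-15")
c = pseudonymize("2002-04-16")
print(a == b, a == c)  # True False
```

Because token order is unrelated to value order, range queries on such a column stop being meaningful, which matches excluding clustering-order use.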
>>>>>>>>
>>>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Some thoughts on this one:
>>>>>>>>
>>>>>>>> In a prior job, we'd give app teams access to a single keyspace,
>>>>>>>> and two roles: a read-write role and a read-only role. In some cases, a
>>>>>>>> "privileged" application role was also requested. Depending on the
>>>>>>>> requirements, I could see the UNMASK permission being applied to the RW
>>>>>>>> or privileged roles. But if there's a problem on the table and the
>>>>>>>> operators go in to investigate, they will likely use a SUPERUSER
>>>>>>>> account, and they'll see that data.
>>>>>>>>
>>>>>>>> How hard would it be for SUPERUSERs to *not* automatically get the
>>>>>>>> UNMASK permission?
>>>>>>>>
>>>>>>>> I'll also echo the concerns around masking primary key components.
>>>>>>>> It's highly likely that certain personal data properties would be used
>>>>>>>> as a partition or clustering key (ex: range query for people born
>>>>>>>> within a certain timeframe). In addition to the "breaks existing"
>>>>>>>> concern, I'm curious about the challenges around getting that to work
>>>>>>>> with the current primary key implementation.
>>>>>>>>
>>>>>>>> Does this first implementation only apply to payload (non-key)
>>>>>>>> columns?  The examples in the CEP currently do not show primary key
>>>>>>>> components being masked.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Aaron
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <
>>>>>>>> henrik.i...@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <
>>>>>>>>> adelap...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>>> One thought: The way the CEP is currently written, it is only
>>>>>>>>>>> possible to mask a column one way. You can only define one masking
>>>>>>>>>>> function for a column, and since you use the original column name,
>>>>>>>>>>> you could only return one version of it in the result set, even if
>>>>>>>>>>> you had a way to define several functions.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Right, it's one single type of mapping per column, declared
>>>>>>>>>> in CREATE/ALTER TABLE statements. Also, users can manually specify
>>>>>>>>>> their own masking function in SELECT statements if they have
>>>>>>>>>> permissions for seeing the clear data.
>>>>>>>>>>
>>>>>>>>>> For those cases where the data is automatically masked for an
>>>>>>>>>> unprivileged user, I don't see the use of including different types
>>>>>>>>>> of masking for the same column in the same result set. Instead, we
>>>>>>>>>> might be interested in having different types of masking associated
>>>>>>>>>> with different roles. We could do so with dedicated CREATE/DROP/LIST
>>>>>>>>>> MASK statements, instead of using the CREATE/ALTER/DESCRIBE TABLE
>>>>>>>>>> statements. That CREATE MASK statement would associate a masking
>>>>>>>>>> function with a column and role. However, I'm not sure we need that
>>>>>>>>>> type of granularity instead of the simplicity of attaching the
>>>>>>>>>> masking to the column declaration. wdyt?
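The CREATE MASK alternative would amount to a lookup keyed by (column, role) that falls back to the column-level default. The Python below is a hypothetical sketch of that resolution order only; neither these data structures nor the masking functions exist in the CEP.

```python
from typing import Callable, Dict, Optional, Tuple

MaskFn = Callable[[str], str]


def mask_all(v: str) -> str:
    return "*" * len(v)


def keep_last4(v: str) -> str:
    # Mask everything except the last four characters.
    return "*" * max(len(v) - 4, 0) + v[-4:]


# Column-level default, as declared in CREATE/ALTER TABLE in the current CEP.
column_defaults: Dict[str, MaskFn] = {"card": mask_all}
# Hypothetical per-role overrides, as a CREATE MASK statement might register.
role_masks: Dict[Tuple[str, str], MaskFn] = {("card", "support"): keep_last4}


def resolve_mask(column: str, role: str) -> Optional[MaskFn]:
    # A role-specific mask wins; otherwise use the column default, if any.
    return role_masks.get((column, role), column_defaults.get(column))


for role in ("support", "analyst"):
    fn = resolve_mask("card", role)
    print(role, fn("4111111111111111") if fn else "unmasked")
```

The extra lookup table is the added complexity being weighed against attaching a single mask to the column declaration.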
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> My gut feeling likewise is that this adds complexity but little
>>>>>>>>> value.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Henrik Ingo
>>>>>>>>>
>>>>>>>>> +358 40 569 7354
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> +---------------------------------------------------------------+
>>>>>>> | Derek Chen-Becker                                             |
>>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>>> +---------------------------------------------------------------+
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>
>>
>>
>>
