Sounds interesting. I would like to understand a couple of things here. If the column names are the same for masked and unmasked data, this would impact existing applications. I am curious what the transition plan looks like for applications that expect unmasked data.
For example, let’s say you store SSNs and birth dates. Upon enabling this feature, suppose the app user is not granted the UNMASK permission. The app now receives masked values for these columns. This is fine for most read-only applications. However, these columns are often used as primary keys, or as part of primary keys, in other tables, so masking them would break existing applications.

How would this work in mixed mode, when some nodes in the cluster are masking data and others aren’t? How would it impact the driver? How would the application learn that the column values are masked? This is important in case a user is granted the UNMASK permission and it is later revoked. Again, this would break a lot of applications.

Dinesh

> On Aug 19, 2022, at 4:50 AM, Andrés de la Peña <adelap...@apache.org> wrote:
>
> Hi everyone,
>
> I'd like to start a discussion about this proposal for dynamic data masking:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
>
> Dynamic data masking allows to obscure sensitive information without changing
> the stored data. It would be based on a set of native CQL functions providing
> different types of masking, such as replacing the column value by "XXXX".
> These functions could be used as regular functions or attached to table
> columns with CREATE/ALTER table. There would be a new UNMASK permission, so
> only the users with this permissions would be able to see the unmasked column
> values. It would be possible to customize masking by using UDFs as masking
> functions.
>
> Thanks,
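To make the key-breakage concern concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not Cassandra's implementation or API. `read_column`, `mask_default`, the `users`/`tax_records` tables, and the SSN value are all hypothetical. The model assumes a fixed "XXXX" mask, as in the proposal's example, applied at read time whenever the reader lacks UNMASK. The stored data never changes, yet an app that reads a masked SSN and uses it as a key into another table silently finds nothing.

```python
# Hypothetical model of read-time masking; NOT Cassandra's actual API.
MASK = "XXXX"


def mask_default(value):
    """Replace any value with a fixed placeholder, like the "XXXX" example in the proposal."""
    return MASK


def read_column(value, has_unmask, mask_fn=mask_default):
    """Return the stored value if the reader holds UNMASK, otherwise the masked value."""
    return value if has_unmask else mask_fn(value)


# Hypothetical stored data: it is never modified by masking.
users = {"alice": {"ssn": "123-45-6789"}}
# A second hypothetical table that uses the SSN as its primary key.
tax_records = {"123-45-6789": {"year": 2021, "filed": True}}


def lookup_tax_record(username, has_unmask):
    """Read the SSN as the app user would, then use it as a key into another table."""
    ssn = read_column(users[username]["ssn"], has_unmask)
    return tax_records.get(ssn)


print(lookup_tax_record("alice", has_unmask=True))   # {'year': 2021, 'filed': True}
print(lookup_tax_record("alice", has_unmask=False))  # None: the masked value matches no key
```

Note that the failure is silent: nothing in the returned value tells the app it was masked. That is why the question of how the application (or driver) learns a column is masked matters, especially if UNMASK is revoked after the app was built against unmasked data.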