Re: [DISCUSS] CEP-20: Dynamic Data Masking

Berenguer Blasi Wed, 07 Sep 2022 06:57:20 -0700

A. I agree the implementor's preference is an important aspect to take into account.


On 7/9/22 15:23, Ekaterina Dimitrova wrote:

On Wed, 7 Sep 2022 at 9:05, Andrés de la Peña <adelap...@apache.org> wrote:


    The poll makes sense to me. I would slightly change it to:

    A) We shouldn't prefer either approach, and I agree to the
    implementor selecting the table schema approach for this CEP
    B) We should prefer the view approach, but I am not opposed to the
    implementor selecting the table schema approach for this CEP
    C) We should NOT implement the table schema approach, and should
    implement the view approach
    D) We should NOT implement the view approach, and should
    implement the table schema approach
    E) We should NOT implement the table schema approach, and should
    implement some other scheme (or not implement this feature)

    Where my vote is for A.


    On Wed, 7 Sept 2022 at 13:12, Benedict <bened...@apache.org> wrote:

        I’m not convinced there’s been adequate resolution over which
        approach is adopted. I know you have expressed a preference
        for the table schema approach, but the weight of other opinion
        so far appears to be against this approach - even if it is
        broadly adopted by other databases. I will note that Postgres
        does not adopt this approach; it has a more sophisticated
        security label approach that has not been proposed by anybody
        so far.

        I think extra weight should be given to the implementer’s
        preference, so while I personally do not like the table schema
        approach, I am happy to accept this is an industry norm, and
        leave the decision to you.

        However, we should ensure the community as a whole endorses
        this. I think an indicative poll should be undertaken first, e.g.:

        A) We should implement the table schema approach, as proposed
        B) We should prefer the view approach, but I am not opposed to
        the implementor selecting the table schema approach for this CEP
        C) We should NOT implement the table schema approach, and
        should implement the view approach
        D) We should NOT implement the table schema approach, and
        should implement some other scheme (or not implement this feature)

        Where my vote is B

        On 7 Sep 2022, at 12:50, Andrés de la Peña
        <adelap...@apache.org> wrote:

        
        If nobody has more concerns regarding the CEP I will start
        the vote tomorrow.

        On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña
        <adelap...@apache.org> wrote:

                Is there enough support here for VIEWS to be the
                implementation strategy for displaying masking functions?


            I'm not sure that views should be "the" strategy for
            masking functions. We have multiple approaches here:

            1) CQL functions only. Users can decide to use the
            masking functions of their own accord. I think most dbs
            allow this pattern of usage, which is quite
            straightforward. Obviously, it doesn't allow admins to
            enforce that users see only masked data.
            Nevertheless, it's still useful for trusted database
            users generating masked data that will be consumed by the
            end users of the application.

            2) Masking functions attached to specific columns. This
            way the same queries will see different data (masked or
            not) depending on the permissions of the user running the
            query. It has the advantage of not requiring changes to
            the queries that users with different permissions run.
            The downside is that users would need to query the schema
            if they need to know whether a column is masked, unless
            we change the names of the returned columns. This is the
            approach offered by Azure/SQL Server, PostgreSQL, IBM
            Db2, Oracle, MariaDB/MaxScale and Snowflake. All these
            databases support applying the masking function to
            columns on the base table, and some of them also allow
            applying masking to views.

            3) Masking functions as part of projected views. This
            way users might need to query the view appropriate for
            their permissions instead of the base table. This might
            mean changing the queries if the masking policy is
            changed by the admin. MySQL recommends this approach in a
            blog entry, although it's not part of its main
            documentation for data masking, and the implementation
            has security issues. Some of the other databases offering
            approach 2) as their main option also support masking
            on view columns.

            Each approach has its own advantages and limitations, and
            I don't think we necessarily have to choose. The CEP
            proposes implementing 1) and 2), but nothing prevents us
            from also having 3) if we get to have projected views.
            However, I think that projected views are a new
            general-purpose feature with their own complexities, so
            they would deserve their own CEP, if someone is willing
            to work on the implementation.
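
[Editor's illustration] To make the difference between approaches 1), 2) and 3) concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code and does not reflect the CEP's actual syntax or implementation; `Table`, `make_view`, the `can_unmask` flag and this particular `mask_inner` helper are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
MaskFn = Callable[[str], str]


def mask_inner(value: str, keep: int = 1, pad: str = "*") -> str:
    """Hypothetical masking function: keep `keep` chars at each end."""
    if len(value) <= 2 * keep:
        return pad * len(value)
    return value[:keep] + pad * (len(value) - 2 * keep) + value[-keep:]


@dataclass
class Table:
    """Toy table; `column_masks` models approach 2 (masks in the schema)."""
    rows: List[Row]
    column_masks: Dict[str, MaskFn] = field(default_factory=dict)

    def select(self, columns: List[str], can_unmask: bool) -> List[Row]:
        # Same query for every user; the result depends on permissions.
        result = []
        for row in self.rows:
            projected = {}
            for col in columns:
                value = row[col]
                mask = self.column_masks.get(col)
                if mask is not None and not can_unmask:
                    value = mask(value)
                projected[col] = value
            result.append(projected)
        return result


def make_view(table: Table,
              projection: Dict[str, Optional[MaskFn]]) -> Callable[[], List[Row]]:
    """Approach 3: a projected view that always applies its own masks."""
    def query() -> List[Row]:
        return [
            {col: (fn(row[col]) if fn else row[col])
             for col, fn in projection.items()}
            for row in table.rows
        ]
    return query


patients = Table(rows=[{"id": "1", "name": "Alice"}],
                 column_masks={"name": mask_inner})

# Approach 1: a trusted user calls the function explicitly.
explicit = mask_inner("Alice")

# Approach 2: identical query, different output per permission.
restricted = patients.select(["id", "name"], can_unmask=False)
privileged = patients.select(["id", "name"], can_unmask=True)

# Approach 3: restricted users are pointed at a separate view.
masked_view = make_view(patients, {"id": None, "name": mask_inner})
via_view = masked_view()
```

In this sketch, approach 2 keeps the query text identical for all users, whereas approach 3 requires restricted users to query `masked_view` instead of `patients`, which matches the trade-off described above.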



            On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev
            <dev@cassandra.apache.org> wrote:

                Is there enough support here for VIEWS to be the
                implementation strategy for displaying masking functions?

                It seems to me the view would have to store the query
                and apply a where clause to it, so the same PK would
                be in play.

                It has data leaking properties.

                It has more use cases as it can be used to

                  * construct views that filter out sensitive columns
                  * apply transforms to convert units of measure

                Are there more thoughts along this line?
