Re: [DISCUSS] Removing support for java 8

2022-08-30 Thread Benjamin Lerer
I seem to recall some discussion about the fact that we took some shortcuts
when introducing Java 11 support. Before removing Java 8 support we should
probably make sure that we have cleaned those up.
My understanding was that this would be part of the work related to adding
support for Java 17, but I might be wrong.

On Tue, 30 Aug 2022 at 08:56, Mick Semb Wever wrote:

>
>
> On Mon, 29 Aug 2022 at 23:01, Brandon Williams  wrote:
>
>> +1 for removing it when we add 17, to avoid making extra work.
>>
>
>
> +1 on that^
>


Re: [DISCUSS] Removing support for java 8

2022-08-30 Thread Brad
+1 on removing jdk8.  We should also remove python 3.6 (EOL 12/21) on trunk
at the same time.

On Mon, Aug 29, 2022 at 9:40 PM Blake Eggleston 
wrote:

> Sorry, I meant trunk, not 4.1 :)
>
> > On Aug 29, 2022, at 1:09 PM, Blake Eggleston 
> wrote:
> >
> > Hi all, I wanted to propose removing jdk8 support for 4.1. Active
> support ended back in March of this year, and I believe the community has
> built enough confidence in java 11 to make it an uncontroversial change for
> our next major release. Let me know what you think.
> >
> > Thanks,
> >
> > Blake
>
>


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Avi Kivity via dev
Agree with views, or alternatively, column permissions together with 
computed columns:



CREATE TABLE foo (
  id int PRIMARY KEY,
  unmasked_name text,
  name text GENERATED ALWAYS AS some_mask_function(text, 'xxx', 7)
)

(syntax from postgresql)

GRANT SELECT ON foo.name TO general_use;
GRANT SELECT ON foo.unmasked_name TO top_secret;


On 26/08/2022 00.10, Benedict wrote:
I’m inclined to agree that this seems a more straightforward approach 
that makes fewer implied promises.


Perhaps we could deliver simple views backed by virtual tables, and 
model our approach on that of Postgres, MySQL et al?


Views in C* would be very simple, just offering a subset of fields 
with some UDFs applied. It would allow users to define roles with 
access only to the views, or for applications to use the views for 
presentation purposes.


It feels like a cleaner approach to me, and we’d get two features for 
the price of one. BUT I don’t feel super strongly about this.


On 25 Aug 2022, at 20:16, Derek Chen-Becker  
wrote:



To make sure I understand, if I wanted to use a masked column for a 
conditional update, you're saying we would need SELECT_MASKED to use 
it in the IF clause? I worry that this proposal is increasing in 
complexity; I would actually be OK starting with something smaller in 
scope. Perhaps just providing the masking functions and not tying 
masking to schema would be sufficient for an initial goal? That 
wouldn't preclude additional permissions, schema integration, or 
perhaps just plain Views in the future.


Cheers,

Derek

On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña 
 wrote:


I have modified the proposal adding a new SELECT_MASKED
permission. Using masked columns on WHERE/IF clauses would
require having SELECT and either UNMASK or SELECT_MASKED
permissions. Seeing the unmasked values in the query results
would always require both SELECT and UNMASK.

This way we can have the best of both worlds, allowing admins to
decide whether they trust their immediate users or not. wdyt?
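
The permission rule described above can be sketched as two predicates. This is only an illustrative Python model of the proposal as worded in this message, not Cassandra code; the function names and the representation of permissions as a set of strings are assumptions made for the example.

```python
from typing import Set


def can_filter_on_masked(perms: Set[str]) -> bool:
    # Using a masked column in WHERE/IF needs SELECT plus either
    # UNMASK or SELECT_MASKED.
    return "SELECT" in perms and bool(perms & {"UNMASK", "SELECT_MASKED"})


def sees_unmasked(perms: Set[str]) -> bool:
    # Seeing clear values in query results needs both SELECT and UNMASK.
    return {"SELECT", "UNMASK"} <= perms
```

Under this model a role holding only SELECT can read masked results but cannot filter on the masked column, while SELECT plus SELECT_MASKED allows filtering without revealing clear values.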

On Wed, 24 Aug 2022 at 16:06, Henrik Ingo
 wrote:

This is the difference between security and compliance I
guess :-D

The way I see this, the attacker or threat in this concept is
not the developer with access to the database. Rather a
feature like this is just a convenient way to apply some
masking rule in a centralized way. The protection is against
an end user of the application, who should not be able to see
the personal data of someone else. Or themselves, even. As
long as the application end user doesn't have access to run
arbitrary CQL, then these forms of masking prevent
accidental unauthorized use/leaking of personal data.

henrik



On Wed, Aug 24, 2022 at 10:40 AM Benedict
 wrote:

Is it typical for a masking feature to make no effort to
prevent unmasking? I’m just struggling to see the value
of this without such mechanisms. Otherwise it’s just a
default formatter, and we should consider renaming the
feature IMO


On 23 Aug 2022, at 21:27, Andrés de la Peña
 wrote:


As mentioned in the CEP document, dynamic data masking
doesn't try to prevent malicious users with SELECT
permissions from indirectly guessing the real value behind
the masked value. This can easily be done by just trying
values in the WHERE clause of SELECT queries. DDM would
not be a replacement for proper column-level permissions.
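
The guessing approach mentioned above can be sketched with a toy model. This is a hedged Python illustration, not CQL or Cassandra code: the in-memory `ROWS` table, the `MASK` constant, and both function names are hypothetical and exist only to show why filtering on a masked column leaks its value.

```python
from typing import Dict, List, Optional

MASK = "****"

# Toy table: the real values live in storage, readers only get MASK back.
ROWS: List[Dict[str, object]] = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]


def select_where_name(guess: str) -> List[Dict[str, object]]:
    # The filter is evaluated against the real value, but every returned
    # row carries only the masked name.
    return [{"id": r["id"], "name": MASK} for r in ROWS if r["name"] == guess]


def infer_name(row_id: int, candidates: List[str]) -> Optional[str]:
    # Try each candidate in the WHERE clause; a hit on row_id reveals the value.
    for guess in candidates:
        if any(r["id"] == row_id for r in select_where_name(guess)):
            return guess
    return None
```

Even though no query ever returns a clear value, a reader who can enumerate plausible candidates recovers the name of a given row from which filters match.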

The data served by the database is usually consumed by
applications that present this data to end users. These
end users are not necessarily the users directly
connecting to the database. With DDM, it would be easy
for applications to mask sensitive data that is going to
be consumed by the end users. However, the users
directly connecting to the database should be trusted,
provided that they have the right SELECT permissions.

In other words, DDM doesn't directly protect the data,
but it eases the production of protected data.

That said, we could later go one step further and add a
way to prevent untrusted users from inferring the masked
data. That could be done by adding a new permission
required to use certain columns in WHERE clauses,
different from the current SELECT permission. That would
play especially well with column-level permissions,
which is something that we still have pending.

On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz
 wrote:

Applying this should prevent querying on a
field, else you could leak its contents, surely?


   

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Andrés de la Peña
>
> GRANT SELECT ON foo.unmasked_name TO top_secret;


Note that Cassandra doesn't have support for column-level permissions.
There was an initiative to add them in 2016, CASSANDRA-12859. However, the
ticket has been inactive since 2017. The last comments appear to be
discussions about design.

Also, generated columns in PostgreSQL are always stored, so if they were
used for masking they would constitute static data masking, not dynamic.

The approach for dynamic data masking that PostgreSQL suggests in its
documentation doesn't seem to be based on generating a masked copy of the
column, either in a generated column or in a view. Instead, it uses security
labels to associate columns with users and masking functions. That way, the
same column will be seen masked or unmasked depending on the user.

I'd say that applying the masking rule to the base column itself, and not
to a copy, is the most common approach among the databases discussed so
far. Also, it has the advantage for us of not depending on other
relatively complex features that we lack, such as column-level permissions
or non-materialized views. If someday we add those features I think they
would play well with what is proposed in the CEP.
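
To illustrate the static versus dynamic distinction drawn above, here is a small Python sketch. It is not Cassandra or PostgreSQL code: `mask_inner` is a stand-in written for this example, and the boolean `can_unmask` flag is an assumed simplification of whatever permission check ends up being used.

```python
def mask_inner(value: str, begin: int, end: int, pad: str = "*") -> str:
    # Keep the first `begin` and last `end` characters, pad the rest.
    if begin + end >= len(value):
        return value
    hidden = len(value) - begin - end
    return value[:begin] + pad * hidden + value[len(value) - end:]


def write_static(name: str) -> dict:
    # Static masking: a masked copy is computed once and stored next to the
    # original, as a stored generated column would do.
    return {"unmasked_name": name, "name": mask_inner(name, 1, 0)}


def read_dynamic(stored_name: str, can_unmask: bool) -> str:
    # Dynamic masking: only the real value is stored; the mask is applied
    # at read time depending on who is asking.
    return stored_name if can_unmask else mask_inner(stored_name, 1, 0)
```

In the dynamic variant there is a single column, so which form a reader sees is decided per request rather than by which column they were granted.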

On Tue, 30 Aug 2022 at 11:46, Avi Kivity via dev 
wrote:

> Agree with views, or alternatively, column permissions together with
> computed columns:
>
>
> CREATE TABLE foo (
>
>   id int PRIMARY KEY,
>
>   unmasked_name text,
>
>   name text GENERATED ALWAYS AS some_mask_function(text, 'xxx', 7)
>
> )
>
>
> (syntax from postgresql)
>
>
> GRANT SELECT ON foo.name TO general_use;
>
> GRANT SELECT ON foo.unmasked_name TO top_secret;
>
>
> On 26/08/2022 00.10, Benedict wrote:
>
> I’m inclined to agree that this seems a more straightforward approach that
> makes fewer implied promises.
>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?
>
> Views in C* would be very simple, just offering a subset of fields with
> some UDFs applied. It would allow users to define roles with access only to
> the views, or for applications to use the views for presentation purposes.
>
> It feels like a cleaner approach to me, and we’d get two features for the
> price of one. BUT I don’t feel super strongly about this.
>
> On 25 Aug 2022, at 20:16, Derek Chen-Becker 
>  wrote:
>
> 
> To make sure I understand, if I wanted to use a masked column for a
> conditional update, you're saying we would need SELECT_MASKED to use it in
> the IF clause? I worry that this proposal is increasing in complexity; I
> would actually be OK starting with something smaller in scope. Perhaps just
> providing the masking functions and not tying masking to schema would be
> sufficient for an initial goal? That wouldn't preclude additional
> permissions, schema integration, or perhaps just plain Views in the future.
>
> Cheers,
>
> Derek
>
> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña 
> wrote:
>
>> I have modified the proposal adding a new SELECT_MASKED permission. Using
>> masked columns on WHERE/IF clauses would require having SELECT and either
>> UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the
>> query results would always require both SELECT and UNMASK.
>>
>> This way we can have the best of both worlds, allowing admins to decide
>> whether they trust their immediate users or not. wdyt?
>>
>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo 
>> wrote:
>>
>>> This is the difference between security and compliance I guess :-D
>>>
>>> The way I see this, the attacker or threat in this concept is not the
>>> developer with access to the database. Rather a feature like this is just a
>>> convenient way to apply some masking rule in a centralized way. The
>>> protection is against an end user of the application, who should not be
>>> able to see the personal data of someone else. Or themselves, even. As long
>>> as the application end user doesn't have access to run arbitrary CQL, then
>>> these forms of masking prevent accidental unauthorized use/leaking of
>>> personal data.
>>>
>>> henrik
>>>
>>>
>>>
>>> On Wed, Aug 24, 2022 at 10:40 AM Benedict  wrote:
>>>
>>>> Is it typical for a masking feature to make no effort to prevent
>>>> unmasking? I’m just struggling to see the value of this without such
>>>> mechanisms. Otherwise it’s just a default formatter, and we should consider
>>>> renaming the feature IMO
>>>>
>>>> On 23 Aug 2022, at 21:27, Andrés de la Peña
>>>> wrote:
>>>>
>>>>
>>>> As mentioned in the CEP document, dynamic data masking doesn't try to
>>>> prevent malicious users with SELECT permissions from indirectly guessing
>>>> the real value behind the masked value. This can easily be done by just
>>>> trying values in the WHERE clause of SELECT queries. DDM would not be a
>>>> replacement for proper column-level permissions.
>>>>
>>>> T

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Benedict
Not to push the point too strongly (I don’t have a very firm view of my own), 
but if we provide this via a view feature we’re just implementing one new 
feature and we get masking for free. I don’t think it is materially more 
complicated than redefining columns for users - it might even be less so, as we 
do not have to consider how applications interpret table metadata. 

Projection views are a very simple concept and pretty simple to implement I 
think, and conceptually very familiar to users. So let’s at least not prefer 
the table column modifier approach because it’s simpler or requires fewer new 
features, as I do not believe this to be the case.

> On 30 Aug 2022, at 12:46, Andrés de la Peña  wrote:
> 
> 
>> GRANT SELECT ON foo.unmasked_name TO top_secret;
> 
> Note that Cassandra doesn't have support for column-level permissions. There
> was an initiative to add them in 2016, CASSANDRA-12859. However, the ticket
> has been inactive since 2017. The last comments appear to be discussions
> about design.
> 
> Also, generated columns in PostgreSQL are always stored, so if they were used 
> for masking they would constitute static data masking, not dynamic. 
> 
> The approach for dynamic data masking that PostgreSQL suggests in its
> documentation doesn't seem to be based on generating a masked copy of the
> column, either in a generated column or in a view. Instead, it uses security
> labels to associate columns with users and masking functions. That way, the
> same column will be seen masked or unmasked depending on the user.
>
> I'd say that applying the masking rule to the base column itself, and not to
> a copy, is the most common approach among the databases discussed so far.
> Also, it has the advantage for us of not depending on other relatively
> complex features that we lack, such as column-level permissions or
> non-materialized views. If someday we add those features I think they would
> play well with what is proposed in the CEP.
> 
>> On Tue, 30 Aug 2022 at 11:46, Avi Kivity via dev  
>> wrote:
>> Agree with views, or alternatively, column permissions together with 
>> computed columns:
>> 
>> 
>> 
>> CREATE TABLE foo (
>> 
>>   id int PRIMARY KEY,
>> 
>>   unmasked_name text,
>> 
>>   name text GENERATED ALWAYS AS some_mask_function(text, 'xxx', 7)
>> 
>> )
>> 
>> 
>> 
>> (syntax from postgresql)
>> 
>> 
>> 
>> GRANT SELECT ON foo.name TO general_use;
>> 
>> GRANT SELECT ON foo.unmasked_name TO top_secret;
>> 
>> 
>> 
>>> On 26/08/2022 00.10, Benedict wrote:
>>> I’m inclined to agree that this seems a more straightforward approach that 
>>> makes fewer implied promises.
>>> 
>>> Perhaps we could deliver simple views backed by virtual tables, and model 
>>> our approach on that of Postgres, MySQL et al?
>>> 
>>> Views in C* would be very simple, just offering a subset of fields with 
>>> some UDFs applied. It would allow users to define roles with access only to 
>>> the views, or for applications to use the views for presentation purposes.
>>> 
>>> It feels like a cleaner approach to me, and we’d get two features for the 
>>> price of one. BUT I don’t feel super strongly about this.
>>> 
>>>> On 25 Aug 2022, at 20:16, Derek Chen-Becker wrote:
>>>>
>>>>
>>>> To make sure I understand, if I wanted to use a masked column for a
>>>> conditional update, you're saying we would need SELECT_MASKED to use it in
>>>> the IF clause? I worry that this proposal is increasing in complexity; I
>>>> would actually be OK starting with something smaller in scope. Perhaps
>>>> just providing the masking functions and not tying masking to schema would
>>>> be sufficient for an initial goal? That wouldn't preclude additional
>>>> permissions, schema integration, or perhaps just plain Views in the future.
>>>>
>>>> Cheers,
>>>>
>>>> Derek
>>>>
>>>> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña
>>>> wrote:
>>>>> I have modified the proposal adding a new SELECT_MASKED permission. Using
>>>>> masked columns on WHERE/IF clauses would require having SELECT and either
>>>>> UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the
>>>>> query results would always require both SELECT and UNMASK.
>>>>>
>>>>> This way we can have the best of both worlds, allowing admins to decide
>>>>> whether they trust their immediate users or not. wdyt?
>>>>>
>>>>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo
>>>>> wrote:
>>>>>> This is the difference between security and compliance I guess :-D
>>>>>>
>>>>>> The way I see this, the attacker or threat in this concept is not the
>>>>>> developer with access to the database. Rather a feature like this is
>>>>>> just a convenient way to apply some masking rule in a centralized way.
>>>>>> The protection is against an end user of the application, who should not
>>>>>> be able to see the personal data of someone else. Or themselves, even.
>>>>>> As long as the application end user doesn't have access to run arbitrary
>>>>>> CQL, then t

Re: [DISCUSS] Removing support for java 8

2022-08-30 Thread Jon Haddad
+1 to removal of 8 in trunk.

On 2022/08/29 20:09:55 Blake Eggleston wrote:
> Hi all, I wanted to propose removing jdk8 support for 4.1. Active support 
> ended back in March of this year, and I believe the community has built 
> enough confidence in java 11 to make it an uncontroversial change for our 
> next major release. Let me know what you think.
> 
> Thanks,
> 
> Blake


[DISCUSS] LWT UPDATE semantics with + and - when null

2022-08-30 Thread David Capwell
4.1 added the ability for LWT to support "UPDATE ... SET name = name + 42",
but we never really fleshed out with the larger community what the
semantics should be in the case where the column or row is NULL; I opened
up https://issues.apache.org/jira/browse/CASSANDRA-17857 for this issue.

As I see it there are 3 possible outcomes:
1) fail the query
2) null + 42 = null (matches SQL)
3) null + 42 == 0 + 42 = 42 (matches counters)

In SQL you get NULL (option 2), but CQL counters treat NULL as 0 (option 3)
meaning we already do not match SQL (though counters are not a standard SQL
type so might not be applicable).  Personally I lean towards option 3 as
the "zero" for addition and subtraction is 0 (1 for multiplication and
division).

So looking for feedback so we can update in CASSANDRA-17857 before 4.1
release.
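
For readers comparing the three outcomes listed above, they can be modelled roughly as follows. This is an illustrative Python sketch only, with `None` standing in for NULL; the function names and the `NullOperandError` exception are hypothetical and do not correspond to anything in the Cassandra code base.

```python
from typing import Optional


class NullOperandError(Exception):
    """Raised by option 1 when the existing value is null."""


def add_fail(current: Optional[int], delta: int) -> int:
    # Option 1: reject the query when the column (or row) is null.
    if current is None:
        raise NullOperandError("cannot apply + to a null value")
    return current + delta


def add_sql(current: Optional[int], delta: int) -> Optional[int]:
    # Option 2: null propagates, as in SQL (null + 42 = null).
    if current is None:
        return None
    return current + delta


def add_counter_like(current: Optional[int], delta: int) -> int:
    # Option 3: null is treated as 0, as counters do (null + 42 = 42).
    return (0 if current is None else current) + delta
```

All three agree whenever the existing value is non-null; they differ only in what happens to a null operand.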


Re: [DISCUSS] LWT UPDATE semantics with + and - when null

2022-08-30 Thread Benedict
I’m a bit torn here, as consistency with counters is important. But they are a
unique eventually consistent data type, and I am inclined to default standard
numeric types to behave as SQL does, since they write a new value rather than a
“delta”.

It is far from optimal to have divergent behaviours, but also suboptimal to 
diverge from relational algebra, and probably special casing counters is the 
least bad outcome IMO.


> On 30 Aug 2022, at 22:52, David Capwell  wrote:
> 
> 
> 4.1 added the ability for LWT to support "UPDATE ... SET name = name + 42", 
> but we never really fleshed out with the larger community what the semantics 
> should be in the case where the column or row is NULL; I opened up
> https://issues.apache.org/jira/browse/CASSANDRA-17857 for this issue.
> 
> As I see it there are 3 possible outcomes:
> 1) fail the query
> 2) null + 42 = null (matches SQL)
> 3) null + 42 == 0 + 42 = 42 (matches counters)
> 
> In SQL you get NULL (option 2), but CQL counters treat NULL as 0 (option 3) 
> meaning we already do not match SQL (though counters are not a standard SQL 
> type so might not be applicable).  Personally I lean towards option 3 as the 
> "zero" for addition and subtraction is 0 (1 for multiplication and division).
> 
> So looking for feedback so we can update in CASSANDRA-17857 before 4.1 
> release.
> 
> 


Re: [DISCUSS] Removing support for java 8

2022-08-30 Thread Caleb Rackliffe
+1 on removing 8 for trunk

On Tue, Aug 30, 2022 at 2:42 PM Jon Haddad 
wrote:

> +1 to removal of 8 in trunk.
>
> On 2022/08/29 20:09:55 Blake Eggleston wrote:
> > Hi all, I wanted to propose removing jdk8 support for 4.1. Active
> support ended back in March of this year, and I believe the community has
> built enough confidence in java 11 to make it an uncontroversial change for
> our next major release. Let me know what you think.
> >
> > Thanks,
> >
> > Blake
>


Re: [DISCUSS] LWT UPDATE semantics with + and - when null

2022-08-30 Thread Caleb Rackliffe
Also +1 on the SQL behavior here. I was uneasy w/ coercing to "" / 0 / 1
(depending on the type) in our previous discussion, but for some reason
didn't bring up the SQL analog :-|

On Tue, Aug 30, 2022 at 5:38 PM Benedict  wrote:

> I’m a bit torn here, as consistency with counters is important. But they
> are a unique eventually consistent data type, and I am inclined to default
> standard numeric types to behave as SQL does, since they write a new value
> rather than a “delta”
>
> It is far from optimal to have divergent behaviours, but also suboptimal
> to diverge from relational algebra, and probably special casing counters is
> the least bad outcome IMO.
>
>
> On 30 Aug 2022, at 22:52, David Capwell  wrote:
>
> 
> 4.1 added the ability for LWT to support "UPDATE ... SET name = name +
> 42", but we never really fleshed out with the larger community what the
> semantics should be in the case where the column or row is NULL; I opened
> up https://issues.apache.org/jira/browse/CASSANDRA-17857 for this issue.
>
> As I see it there are 3 possible outcomes:
> 1) fail the query
> 2) null + 42 = null (matches SQL)
> 3) null + 42 == 0 + 42 = 42 (matches counters)
>
> In SQL you get NULL (option 2), but CQL counters treat NULL as 0 (option
> 3) meaning we already do not match SQL (though counters are not a standard
> SQL type so might not be applicable).  Personally I lean towards option 3
> as the "zero" for addition and subtraction is 0 (1 for multiplication and
> division).
>
> So looking for feedback so we can update in CASSANDRA-17857 before 4.1
> release.
>
>
>