[DISCUSS] CEP-42: Constraints Framework

2024-05-31 Thread Bernardo Botella
Hello everyone,

I am proposing this CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework

And I’m looking for feedback from the community.

Thanks a lot!
Bernardo

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-02 Thread Bernardo Botella
Hi Jeff,

Thanks a lot for your comments. 

To your first question, "Would this be implemented solely in the write path?", 
the answer is yes. Enforcing it at read, compaction, or repair time could cause 
problems when an ALTER TABLE adds new or stricter constraints to a table that 
already holds offending data. I think the cleanest way to handle these 
scenarios is simply to prevent new data from being added if it does not comply 
with the current constraints.
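
To make the write-path-only behaviour concrete, here is a minimal Python sketch. It is not Cassandra code; `Table`, `alter_constraint`, `write`, and `read` are hypothetical names used only for illustration. Tightening a constraint leaves existing rows untouched and only rejects new writes:

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


class Table:
    """Toy table: the constraint is checked on write only, never on read."""

    def __init__(self, constraint: Callable[[Row], bool]) -> None:
        self.constraint = constraint
        self.rows: List[Row] = []

    def alter_constraint(self, constraint: Callable[[Row], bool]) -> None:
        # Assumption: existing rows are NOT re-validated when the schema changes.
        self.constraint = constraint

    def write(self, row: Row) -> None:
        # The only place where the constraint is enforced.
        if not self.constraint(row):
            raise ValueError(f"constraint violated by {row}")
        self.rows.append(row)

    def read(self) -> List[Row]:
        # No constraint check here: previously written data stays readable.
        return list(self.rows)


table = Table(lambda r: r["number_of_items"] < 1000)
table.write({"number_of_items": 900})
# Stricter constraint added later; the row with 900 items is now "offending".
table.alter_constraint(lambda r: r["number_of_items"] < 100)
```

After the alter, reading still returns the row with 900 items, while a new write of 500 items would be rejected.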

On your second comment:
For the third point, I didn’t want to be prescriptive about what those 
validations should be, but the fact that the proposal is extensible to those 
potential use cases is something concrete that, in my opinion, is a benefit of 
the proposal. I’d be happy to develop the main sizeOf example a bit further if 
that helps alleviate your concerns on this point.

I still think that the flexibility to add limits on what can be written to the 
database is a positive that helps Cassandra users keep their clusters healthy.


 

> On Jun 2, 2024, at 12:04 PM, Jeff Jirsa  wrote:
> 
> Separately, when we discuss benefits of a proposal in a CEP, we should talk 
> about what’s concrete and ignore the stuff that’s idealistic. Of these four 
> points:
> 
> This brings to the table several benefits and flexibility. Some examples:
> 
> 1. Cassandra operators have more control to reason about your data and 
>    appropriately tune for performance.
> 2. Potential reduction on maintenance overhead, being able to better predict 
>    partition sizes.
> 3. Extensibility to more complex validations in the future.
> 4. Potential value in storage engine making decisions based on data size.
> 
> The second is just the first, restated, and the fourth seems incredibly 
> unlikely. The third seems maybe possible, but why not spec out the full range 
> with the CEP instead of assuming iterative implementation?
> 
> 
> 
>> On Jun 2, 2024, at 20:59, Jeff Jirsa  wrote:
>> 
>> 
>> Would this be implemented solely in the write path? Or would you also try to 
>> enforce it in the read and sstable/compaction/repair paths as well?  
>> 
>> 
>> 
>>> On May 31, 2024, at 23:24, Bernardo Botella wrote:
>>> 
>>> Hello everyone,
>>> 
>>> I am proposing this CEP:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>>> 
>>> And I’m looking for feedback from the community.
>>> 
>>> Thanks a lot!
>>> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Bernardo Botella
Basically, I am trying to protect the limits set by the operator against 
misconfigured schemas from the customers. 

I see the guardrails as a safety limit added by the operator, setting the 
limits within which the customers who own the actual schema (and its 
constraints) can operate. With that in mind, if a customer tries to “ignore” 
the limits set by the operator by adding more relaxed constraints, they get a 
clear message saying that “that is not allowed for the cluster, please contact 
your admin”.
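
As a rough illustration of that precedence rule, here is a Python sketch under stated assumptions: `validate_alter`, `GuardrailViolation`, and the byte limits are hypothetical, and a guardrail of `None` stands for "no guardrail configured". A schema change whose size limit is looser than the operator's guardrail is rejected:

```python
from typing import Optional


class GuardrailViolation(Exception):
    """Raised when a schema constraint is looser than the operator's guardrail."""


def validate_alter(guardrail_max_size: Optional[int], constraint_max_size: int) -> int:
    """Return the accepted constraint limit, or raise if it exceeds the guardrail."""
    if guardrail_max_size is not None and constraint_max_size > guardrail_max_size:
        raise GuardrailViolation(
            "that is not allowed for the cluster, please contact your admin"
        )
    return constraint_max_size


# A constraint stricter than the guardrail is fine.
accepted = validate_alter(guardrail_max_size=1024, constraint_max_size=256)
```

The point of the design is that the guardrail bounds the range in which schema owners can choose their own limits; it does not replace those limits.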



> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev wrote:
> 
> You wrote in the CEP:
>  
> As we mentioned in the motivation section, we currently have some guardrails 
> for columns size in place which can be extended for other data types.
> Those guardrails will take preference over the defined constraints in the 
> schema, and a SCHEMA ALTER adding constraints that break the limits defined 
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning 
> mentioning that there are schemas with offending constraints.
>  
> I think that this should be the other way around. Guardrails should kick in when 
> there are no constraints and they would be overridden by table schema. That 
> way, there is always a “default” in terms of guardrails (which one can turn 
> off on demand / change) but you can override it by table alteration.
>  
> Basically, what is in schema should win regardless of how guardrails are 
> configured. They don’t matter when a constraint is explicitly specified in a 
> schema. It should take the defaults in guardrails if there are any and no 
> constraint is specified on schema level.
>  
> What is your motivation to do it like you suggested?
>  
> From: Bernardo Botella  <mailto:conta...@bernardobotella.com>>
> Date: Friday, 31 May 2024 at 23:24
> To: dev@cassandra.apache.org <mailto:dev@cassandra.apache.org> 
> mailto:dev@cassandra.apache.org>>
> Subject: [DISCUSS] CEP-42: Constraints Framework
> 
> 
> 
> 
> Hello everyone, 
>  
> I am proposing this CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>  
> And I’m looking for feedback from the community.
>  
> Thanks a lot!
> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-03 Thread Bernardo Botella
Yes, that is correct. This particular behavior will need CEP-24 in order to 
work reliably. But, if my understanding is correct, that statement holds true 
for the entirety of Guardrails, and not only for this particular feature.
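
To illustrate the failure mode Stefan describes, here is a small Python sketch. It is purely illustrative: the node names, limits, and the `alter_allowed` helper are assumptions, not Cassandra APIs. With node-local guardrail settings, the same schema change is accepted or rejected depending on which node handles it:

```python
from typing import Dict


def alter_allowed(guardrail_max_size: int, constraint_max_size: int) -> bool:
    """A node accepts the constraint only if it fits within its local guardrail."""
    return constraint_max_size <= guardrail_max_size


# Hypothetical cluster where the guardrail was configured differently per node.
node_guardrails: Dict[str, int] = {"node1": 1024, "node2": 512}
requested_limit = 800

outcome = {
    node: alter_allowed(limit, requested_limit)
    for node, limit in node_guardrails.items()
}
print(outcome)  # {'node1': True, 'node2': False}
```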

> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan wrote:
> 
> That would work reliably only if there is no way to misconfigure 
> guardrails in the cluster. What if you set a guardrail on one node but you 
> don’t set it (or set it differently) on the other? If it is configured 
> differently and you want to check that constraints do not violate the 
> guardrails, then your query might fail or not depending on which node is hit. 
>  
> I guess that guardrails would need to start to be transactional to be sure 
> this is avoided and guardrails are indeed same everywhere (CEP-24 thread sent 
> recently here in ML).
>  
>  
> From: Bernardo Botella  <mailto:conta...@bernardobotella.com>>
> Date: Tuesday, 4 June 2024 at 00:31
> To: dev@cassandra.apache.org <mailto:dev@cassandra.apache.org> 
> mailto:dev@cassandra.apache.org>>
> Cc: Miklosovic, Stefan  <mailto:stefan.mikloso...@netapp.com>>
> Subject: Re: [DISCUSS] CEP-42: Constraints Framework
> 
> 
> 
> 
> Basically, I am trying to protect the limits set by the operator against 
> misconfigured schemas from the customers. 
>  
> I see the guardrails as a safety limit added by the operator, setting the 
> limits within the customers owning the actual schema (and their constraints) 
> can operate. With that vision, if a customer tries to “ignore” the actual 
> limits set by the operator by adding more relaxed constraints, it gets a nice 
> message saying that “that is not allowed for the cluster, please contact your 
> admin".
>  
>  
> 
> 
> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev 
>  wrote:
>  
> You wrote in the CEP:
>  
> As we mentioned in the motivation section, we currently have some guardrails 
> for columns size in place which can be extended for other data types.
> Those guardrails will take preference over the defined constraints in the 
> schema, and a SCHEMA ALTER adding constraints that break the limits defined 
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning 
> mentioning that there are schemas with offending constraints.
>  
> I think that this should be the other way around. Guardrails should kick in when 
> there are no constraints and they would be overridden by table schema. That 
> way, there is always a “default” in terms of guardrails (which one can turn 
> off on demand / change) but you can override it by table alteration.
>  
> Basically, what is in schema should win regardless of how guardrails are 
> configured. They don’t matter when a constraint is explicitly specified in a 
> schema. It should take the defaults in guardrails if there are any and no 
> constraint is specified on schema level.
>  
> What is your motivation to do it like you suggested?
>  
> From: Bernardo Botella  <mailto:conta...@bernardobotella.com>>
> Date: Friday, 31 May 2024 at 23:24
> To: dev@cassandra.apache.org <mailto:dev@cassandra.apache.org> 
> mailto:dev@cassandra.apache.org>>
> Subject: [DISCUSS] CEP-42: Constraints Framework
> 
>  
> 
> Hello everyone, 
>  
> I am proposing this CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>  
> And I’m looking for feedback from the community.
>  
> Thanks a lot!
> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-04 Thread Bernardo Botella
In the CEP document there is another example (although not explicitly called 
out) that adds a constraint on the max value of an int: `number_of_items int 
CONSTRAINT number_of_items < 1000`

This basic example also shows how the two initial constraints (size and value) 
can be composed to create new data types with proper validation. 

For example, this could create an IPv4 CIDR type with built-in validation:
CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_adress inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask < 32
) 

Or a color type:
CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r <= 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g <= 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b <= 255
) 
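
To show how several value constraints on one type could be evaluated together, here is a minimal Python sketch using the `subnet_mask` bounds from the `cidr_address_ipv4` example. The tuple representation and the `violations` helper are assumptions made for illustration, not the proposed implementation:

```python
import operator
from typing import Callable, Dict, List, Tuple

# Comparison operators a value constraint may use in this sketch.
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# (field, operator, bound), mirroring "CONSTRAINT subnet_mask > 0" etc.
Constraint = Tuple[str, str, int]

CIDR_CONSTRAINTS: List[Constraint] = [
    ("subnet_mask", ">", 0),
    ("subnet_mask", "<", 32),
]


def violations(value: Dict[str, int], constraints: List[Constraint]) -> List[str]:
    """Return the textual form of every constraint the value fails."""
    failed = []
    for field, op, bound in constraints:
        if not OPS[op](value[field], bound):
            failed.append(f"{field} {op} {bound}")
    return failed


print(violations({"subnet_mask": 24}, CIDR_CONSTRAINTS))  # []
print(violations({"subnet_mask": 32}, CIDR_CONSTRAINTS))  # ['subnet_mask < 32']
```

Collecting every failed constraint, rather than stopping at the first, is a design choice of this sketch; it would let an error message list all violations at once.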


Other types of constraints and functions can be added in the future to 
provide even more flexibility, but they are out of scope for this CEP.

Bernardo

> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
> 
> The idea is interesting.  I think it would help to have more concrete 
> examples.  It's a bit sparse at the moment, and I have a hard time getting on 
> board with new features where the main selling point is Extensibility over 
> the value they provide on their own.  
> 
> I think it would help a lot if we knew what types of constraints, besides the 
> size check, you were thinking of adding.
> 
> Jon
> 
> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella wrote:
>> Yes, that is correct. This particular behavior will need CEP-24 in order to 
>> work reliably. But, if my understanding is correct, that statement holds 
>> true for the entirety of Guardrails, and not only for this particular 
>> feature.
>> 
>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan 
>>> mailto:stefan.mikloso...@netapp.com>> wrote:
>>> 
>>> That would work reliably in case there is no way how to misconfigure 
>>> guardrails in the cluster. What if you set a guardrail on one node but you 
>>> don’t set it (or set it differently) on the other? If it is configured 
>>> differently and you want to check the guardrails if constraints do not 
>>> violate them, then your query might fail or not based on what node is hit. 
>>>  
>>> I guess that guardrails would need to start to be transactional to be sure 
>>> this is avoided and guardrails are indeed same everywhere (CEP-24 thread 
>>> sent recently here in ML).
>>>  
>>>  
>>> From: Bernardo Botella >> <mailto:conta...@bernardobotella.com>>
>>> Date: Tuesday, 4 June 2024 at 00:31
>>> To: dev@cassandra.apache.org <mailto:dev@cassandra.apache.org> 
>>> mailto:dev@cassandra.apache.org>>
>>> Cc: Miklosovic, Stefan >> <mailto:stefan.mikloso...@netapp.com>>
>>> Subject: Re: [DISCUSS] CEP-42: Constraints Framework
>>> 
>>> 
>>> 
>>> 
>>> Basically, I am trying to protect the limits set by the operator against 
>>> misconfigured schemas from the customers. 
>>>  
>>> I see the guardrails as a safety limit added by the operator, setting the 
>>> limits within the customers owning the actual schema (and their 
>>> constraints) can operate. With that vision, if a customer tries to “ignore” 
>>> the actual limits set by the operator by adding more relaxed constraints, 
>>> it gets a nice message saying that “that is not allowed for the cluster, 
>>> please contact your admin".
>>>  
>>>  
>>> 
>>> 
>>> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev 
>>> mailto:dev@cassandra.apache.org>> wrote:
>>>  
>>> You wrote in the CEP:
>>>  
>>> As we mentioned in the motivation section, we currently have some 
>>> guardrails for columns size in place which can be extended for other data 
>>> types.
>>> Those guardrails will take preference over the defined constraints in the 
>>> schema, and a SCHEMA ALTER adding constraints that break the limits defined 
>>> by the guardrails framework will fail.
>>> If the guardrails themselves are modified, operator should get a warning 
>>> mentioning that there are schemas with offending constraints.

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Bernardo Botella
Thanks for the clarification Jon.

I will update the CEP to be specific about the two Constraint types I will be 
adding, which are size and value (the ones shown in the example). 

And, just to clarify, the mention of extensibility only aims to state that the 
feature should be built in a way that allows more constraints to be added. 
 


> On Jun 5, 2024, at 9:24 PM, Jon Haddad  wrote:
> 
> I think there's some promising ideas here, but the CEP needs to be developed 
> a bit more.
> 
> > Another types of constraints and functions can be added in the future to 
> > provide even more flexibility, but are out of the scope of this CEP.
> 
> > For the third point, I didn’t want to be prescriptive on what those 
> > validations should be, but the fact that the proposal is extensible to 
> > those potential use cases is something concrete that, in my opinion, comes 
> > as a benefit of the actual proposal. I’d be happy to develop a bit more the 
> > main example used of sizeOf if it helps alleviate your concerns on this 
> > point.
> 
> I disagree, quite strongly, with this.  While I appreciate extensibility, I 
> think having a variety of actual constraints that ship with the feature means 
> it needs to be built to satisfy real world use cases.  Without going through 
> this process, it feels a bit too much like triggers, UDAs and UDFs  - 
> incomplete, and too much left to the end user.  
> 
> To me, punting on thinking through constraints kicks the most important can 
> down the road.  
> 
> Jon
> 
> 
> On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella  <mailto:conta...@bernardobotella.com>> wrote:
>> In the CEP document there is another example (although not explicitly 
>> called out) that adds a constraint on the max value of an int: 
>> `number_of_items int CONSTRAINT number_of_items < 1000`
>> 
>> This basic example can also be used to expand on how to extend this 
>> functionality with these two initial constraints (size and value), by 
>> composing them to create new data types with proper validation. 
>> 
>> For example, this could create an ipv4 with built in validation:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> ) 
>> 
>> Or a color type:
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r <= 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g <= 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b <= 255
>> ) 
>> 
>> 
>> Another types of constraints and functions can be added in the future to 
>> provide even more flexibility, but are out of the scope of this CEP.
>> 
>> Bernardo
>> 
>>> On Jun 4, 2024, at 1:01 PM, Jon Haddad >> <mailto:j...@jonhaddad.com>> wrote:
>>> 
>>> The idea is interesting.  I think it would help to have more concrete 
>>> examples.  It's a bit sparse at the moment, and I have a hard time getting 
>>> on board with new features where the main selling point is Extensibility 
>>> over the value they provide on their own.  
>>> 
>>> I think it would help a lot if we knew what types of constraints, besides 
>>> the size check, you were thinking of adding.
>>> 
>>> Jon
>>> 
>>> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella 
>>> mailto:conta...@bernardobotella.com>> wrote:
>>>> Yes, that is correct. This particular behavior will need CEP-24 in order 
>>>> to work reliably. But, if my understanding is correct, that statement 
>>>> holds true for the entirety of Guardrails, and not only for this 
>>>> particular feature.
>>>> 
>>>>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan 
>>>>> mailto:stefan.mikloso...@netapp.com>> 
>>>>> wrote:
>>>>> 
>>>>> That would work reliably in case there is no way how to misconfigure 
>>>>> guardrails in the cluster. What if you set a guardrail on one node but 
>>>>> you don’t set it (or set it differently) on the other? If it is 
>>>>> configured differently and you want to check the guardrails if 
>>>>> constraints do not violate them, then your query might fail or not based 
>>>>> on what node is hit. 
>>>>>  
>>>>> I guess that guardrails would need to start to be transactional to be 
>>>>> sure this is avoided and guardrails 

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-07 Thread Bernardo Botella
My concern about mentioning in the CEP other potential constraints to be 
implemented in the future is that it may derail the conversation from the 
initial set I want to propose, which are size and value constraints. There are 
definitely many other potential constraints that we could discuss in future 
updates. For example:

For numeric types:
- Max, Min, equality, difference (included)

For date types:
- Range (as you mentioned)

For text based types:
- Size (included)
- isJson
- complies with a pattern (as you mentioned)
- is block listed
- complies with an enum

General table constraints (including one or more columns):
- Compare between numeric types (a < b, a > b, a != b, …)
- Compare between date types (date1 < date2, date1>date2, date1!=date2, …)

Do you think this CEP should also contain those?

And, about your question, the answer is yes. Take a look at the Color example 
that I mentioned above:
CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r <= 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g <= 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b <= 255
) 

Here, you have more than one constraint per column to form a composite object. 
Similar things should be supported at table level.

I hope this helps,
Bernardo



> On Jun 6, 2024, at 11:08 PM, Dinesh Joshi  wrote:
> 
> On Thu, Jun 6, 2024 at 1:50 PM Bernardo Botella  <mailto:conta...@bernardobotella.com>> wrote:
>> I will update the CEP being specific with the two specific Constraint types 
>> I will be adding, which are size and value (the ones shown in the example). 
> 
> Could you identify constraints for the most common data types? It would be 
> nice to ship a good set of default constraints. For example, it would be nice 
> to constrain numeric & date data types within a range, text could comply with 
> a pattern, etc.
> 
> One question that I'm not sure if it came up, is whether a column could have 
> multiple constraints?
> 
> Dinesh
> 
> 



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-10 Thread Bernardo Botella
Hi everyone,

Following the feedback, I'd like to recap what we have discussed in this 
thread and try to move the conversation forward.

I made some clarifications:
- Constraints are only applied at write time.
- Guardrail configurations should take precedence over what is defined 
as a constraint.

Specify constraints:
The general feedback asks for more concrete examples than the ones found in 
the CEP document. 
Basically, the initial constraints I am proposing are:
- SizeOf Constraint for String types, as in
name text CONSTRAINT sizeOf(name) < 256

- Value Constraint for numeric types
number_of_items int CONSTRAINT number_of_items < 1000
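
As a rough sketch of what these two checks amount to, here is a small Python example. One assumption is made loudly: `size_of` measures UTF-8 encoded bytes rather than characters, which the thread does not specify. The function names are hypothetical:

```python
def size_of(value: str) -> int:
    # Assumption: size is counted in UTF-8 bytes, not in characters.
    return len(value.encode("utf-8"))


def check_name(name: str) -> bool:
    """Mirrors: name text CONSTRAINT sizeOf(name) < 256"""
    return size_of(name) < 256


def check_number_of_items(number_of_items: int) -> bool:
    """Mirrors: number_of_items int CONSTRAINT number_of_items < 1000"""
    return number_of_items < 1000


print(check_name("a" * 255))        # True
print(check_name("a" * 256))        # False
print(check_number_of_items(1000))  # False
```

Whether size means bytes or characters matters for non-ASCII text: under the byte-based assumption, 128 two-byte characters already reach the 256 limit.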

Those two alone and combined provide a lot of flexibility, and allow complex 
validations that enable "new types" such as:

CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_adress inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask < 32
)

CREATE TYPE keyspace.color (
  r int,
  g int,
  b int,
  CONSTRAINT r >= 0,
  CONSTRAINT r <= 255,
  CONSTRAINT g >= 0,
  CONSTRAINT g <= 255,
  CONSTRAINT b >= 0,
  CONSTRAINT b <= 255
) 


Those two initial Constraints are the fundamental constraints that would give 
value to the feature. The framework can (and will) be extended with other 
Constraints, leaving us with the following:

For numeric types:
- Max (<)
- Min (>)
- Equality (==)
- Difference (!=)

For date types:
- Before (<)
- After (>)

For text based types:
- Size (sizeOf)
- isJson (is the text a json?)
- complies with a given pattern
- Is it block listed?
- Is it part of an enum?

General table constraints (including more than one column):
- Compare between numeric types (a < b, a > b, a != b, …)
- Compare between date types (date1 < date2, date1>date2, date1!=date2, …)

I have updated the CEP with this information.

Potential dependency on CEP-24:
Given that the Constraints Framework provides a set of checks to be performed 
alongside those that can be made using the Guardrails framework, there may be 
some relation with CEP-24, which mentions transactional Guardrails to prevent 
situations in which the limit configurations differ across the cluster.

CEP-42 does not propose modifying the Guardrails framework, and therefore 
should not be affected by CEP-24. It is true that the improvements provided by 
CEP-24 would benefit this Constraints framework, but it is not dependent on 
them.


I hope I included all the points and addressed them on the CEP, otherwise, 
please call it out and I’ll be more than happy to include it.

Thanks everyone for all the inputs!
Bernardo

> On Jun 7, 2024, at 11:54 AM, Štefan Miklošovič  
> wrote:
> 
> How I see it is that in 5.1 there will be TCM for the very first time and I 
> do not think that config in TCM would make it into 5.1 based on what Sam 
> talks about (need for some stability etc), that makes total sense to me. TCM 
> is quite a big feature to deliver on its own and putting even way more stuff 
> into that might be detrimental to the quality if we rush it.
> 
> Then sometime after 5.1 we might take a serious look at config in TCM 
> itself.
> 
> My plan, ideally, is to still ship CEP-24 without config in TCM, then after 
> 5.1 when config in TCM lands, CEP-24 might integrate with that on a deeper 
> level.
> 
> If CEP-42 (this one) makes it into 5.1 as well, I think the similar case 
> might be done about that as well (integration with guardrails).
> 
> On Fri, Jun 7, 2024 at 8:49 PM Sam Tunnicliffe wrote:
>> We've been working on a draft CEP for migrating config from yaml to cluster 
>> metadata but have been a bit short of time recently, I'll try to get 
>> something out for discussion as soon as possible. 
>> A little delay isn't such a bad thing IMO, as we're still ironing out the 
>> kinks in the TCM implementation itself. It'd be good to get a bit more road 
>> testing done with that before we start adding more to it, which I'm sure 
>> will start to ramp up once 5.0 is out.  
>> 
>> Thanks,
>> Sam
>> 
>>> On 7 Jun 2024, at 19:19, Štefan Miklošovič wrote:
>>> 
>>> Yes, all configuration should be transactional (configuration which makes 
>>> sense to require to be the same cluster-wide). Guardrails in TCM are just a 
>>> subset of this problem. When I started to do CEP-24 I started with 
>>> guardrails in TCM but then I realized it leads to more general "all config 
>>> in TCM" and I found myself rabbit-hole-ing endlessly.
>>> 
>>> BTW I do not think that once CEP-24 is in place without guardrails in TCM 
>>> then implementing it would blow up things a lot. It is really just about a 
>>> couple mutable virtual tables and a couple transformations for various 
>>> guardrail types we have but I expect that its integration into more general 
>>> config in TCM should be rather straightforward.
>>> 
>>> Config in TCM definitely deserves its own CEP, it is too much to hand

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-11 Thread Bernardo Botella
Hi Štefan,

I'll address the different points:
1)
An example (possibly a stretch) of a use case for the != constraint would be:
Let's say you have a table in which you want to record a movement, from 
position p1 to position p2. You may want to check that those two are different 
to make sure there is actual movement.

CREATE TABLE keyspace.table (
  p1 int, 
  p2 int,
  ...,
  CONSTRAINT p1 != p2
);
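
A minimal Python sketch of such a cross-column check follows. The `check_columns` helper is hypothetical, and how null or missing columns would be treated is deliberately left out because the thread does not settle it:

```python
import operator
from typing import Callable, Dict

# Comparison operators a table-level constraint may use in this sketch.
COMPARATORS: Dict[str, Callable[[int, int], bool]] = {
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def check_columns(row: Dict[str, int], left: str, op: str, right: str) -> None:
    """Raise if the two named columns of the row do not satisfy the comparison."""
    if not COMPARATORS[op](row[left], row[right]):
        raise ValueError(f"CONSTRAINT {left} {op} {right} violated by {row}")


# Mirrors: CONSTRAINT p1 != p2
check_columns({"p1": 3, "p2": 7}, "p1", "!=", "p2")  # passes silently
```

Unlike the single-column checks, this one needs the values of both columns from the same write to evaluate.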

For the case of ==, I agree that it is harder to come up with a valid use case, 
and I added it for completeness.

2)
"Is part of an enum" would make up for the lack of enum types in CQL. The 
constraint could be something like CONSTRAINT belongsToEnum([list of valid values], field):
CREATE TABLE keyspace.table (
  field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
  ...
);

3)
Similarly, we can check and reject if a term is part of a list of blocked terms:
CREATE TABLE keyspace.table (
  field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], field), 
  ...
);
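
Both list-based checks reduce to set membership and its negation. The Python sketch below makes that explicit; the function names simply echo the hypothetical `belongsToEnum` and `isNotBlocked` from the examples above and do not correspond to any existing Cassandra function:

```python
from typing import Iterable


def belongs_to_enum(valid_values: Iterable[str], field: str) -> bool:
    """True when the value is one of the allowed terms."""
    return field in frozenset(valid_values)


def is_not_blocked(blocked_values: Iterable[str], field: str) -> bool:
    """True when the value is NOT one of the blocked terms."""
    return field not in frozenset(blocked_values)


print(belongs_to_enum(["foo", "foo2"], "foo"))                         # True
print(belongs_to_enum(["foo", "foo2"], "bar"))                         # False
print(is_not_blocked(["blocked_foo", "blocked_foo2"], "blocked_foo"))  # False
```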

Please let me know if this helps,
Bernardo



> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič  
> wrote:
> 
> Hi Bernardo,
> 
> 1) Could you elaborate on these two constraints?
> 
> == and != ?
> 
> What is the use case? Why would I want to have data in a database stored in 
> some column which would need to be _same as my constraint_ and which _could 
> not_ be same as my constraint? Can you give me at least one example of each? 
> It looks like I am going to put a constant into a database in case of ==, 
> wouldn't a static column be better?
> 
> 2) For examples of text based types you mentioned: "is part of an enum" - how 
> would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
> 
> In the meanwhile, I made changes to CEP-24 to move transactionality into 
> optional features.
> 
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella 
> mailto:conta...@bernardobotella.com>> wrote:
>> Hi everyone,
>> 
>> After the feedback, I'd like to make a recap of what we have discussed in 
>> this thread and try to move forward with the conversation.
>> 
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should maintain preference over what's being 
>> defined as a constraint.
>> 
>> Specify constraints:
>> There is a general feedback around adding more concrete examples than the 
>> ones that can be found on the CEP document. 
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>> 
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>> 
>> Those two alone and combined provide a lot of flexibility, and allow complex 
>> validations that enable "new types" such as:
>> 
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r <= 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g <= 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b <= 255
>> ) 
>> 
>> 
>> Those two initial Constraints are the fundamental constraints that would give 
>> value to the feature. The framework can (and will) be extended with other 
>> Constraints, leaving us with the following:
>> 
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>> 
>> For date types:
>> - Before (<)
>> - After (>)
>> 
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>> 
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>> 
>> I have updated the CEP with this information.
>> 
>> Potential dependency on CEP-24:
>> Giving that the Constraints Framework provides a set of checks to be 
>> performed along side those that can be made using the Guardrails framework, 
>> there may be some relation with CEP-24, which mentions transactional 
>> Guardrails to prevent situation in which the limit configuratio

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Bernardo Botella
Hi again,

I completely agree that anything beyond simple constraints poses a problem. My 
point is that the definition of simple may vary, and each of the constraints I 
mentioned deserves a conversation of its own. As I previously mentioned on the 
dev thread:
https://lists.apache.org/thread/qln8cbkhlw9j9563p0kl12wrm5w62nq0

I am trying to propose here the two constraints that will add a lot of value to 
the framework (size and value), and to illustrate how the framework is to be 
extended.

The final list I proposed can either be expanded (I’m more than happy to hear 
more proposals :-) ) or reduced (you and Claude present very valid points), 
but I think using this thread to discuss them one by one may derail the 
conversation and make it hard to follow. Having said that, we can leave the 
isList type of constraints out of the CEP and defer them to a future 
conversation if the constraints framework CEP is approved. Once we have the 
basic ones in place, we can have a deeper discussion on these.

What do you think?


> On Jun 12, 2024, at 3:39 AM, Štefan Miklošovič  
> wrote:
> 
> My gut feeling is that anything beyond simple comparisons is just too 
> problematic / complex. I think that this should be part of the application 
> logic rather than putting that to the database. Is there any major database 
> out there which has constraints modelled like that? (belongsToEnum, 
> isNotBlocked, inList ...). It just opens a lot of questions, like how would 
> we treat nulls? How would this be supported in the driver? Etc ... 
>  
> 
> 
> On Wed, Jun 12, 2024 at 12:34 PM Claude Warren, Jr via dev wrote:
>>> 2)
>>> An "is part of an enum" check would partly make up for the lack of enum 
>>> types. The constraint could be something like CONSTRAINT 
>>> belongsToEnum([list of valid values], field):
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
>>>   ...
>>> );
>>> 3)
>>> Similarly, we can check and reject if a term is part of a list of blocked 
>>> terms:
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], 
>>> field), 
>>>   ...
>>> );
>> 
>> Are these not just "CONSTRAINT inList([List of valid values], field);"  and 
>> "CONSTRAINT not inList([List of valid values], field);"?
>> At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not 
>> inList([p1], p2);"?
>> 
>> Can "[List of values]" point to a variable containing a list?  Or does it 
>> require hard coding in the constraint itself?
>> 
>> 
>> 
>> On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella 
>> <conta...@bernardobotella.com> wrote:
>>> Hi Štefan
>>> 
>>> I'll address the different points:
>>> 1)
>>> An example (possibly a stretch) of use case for != constraint would be:
>>> Let's say you have a table in which you want to record a movement, from 
>>> position p1 to position p2. You may want to check that those two are 
>>> different to make sure there is actual movement.
>>> 
>>> CREATE TABLE keyspace.table (
>>>   p1 int, 
>>>   p2 int,
>>>   ...,
>>>   CONSTRAINT p1 != p2
>>> );
>>> 
>>> For the case of ==, I agree that it is harder to come up with a valid use 
>>> case, and I added it for completeness.
>>> 
>>> 2)
>>> An "is part of an enum" check would partly make up for the lack of enum 
>>> types. The constraint could be something like CONSTRAINT 
>>> belongsToEnum([list of valid values], field):
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field), 
>>>   ...
>>> );
>>> 
>>> 3)
>>> Similarly, we can check and reject if a term is part of a list of blocked 
>>> terms:
>>> CREATE TABLE keyspace.table (
>>>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'], 
>>> field), 
>>>   ...
>>> );
>>> 
>>> Please let me know if this helps,
>>> Bernardo
>>> 
>>> 
>>> 
>>>> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič 
>>>> <stefan.mikloso...@gmail.com> wrote:
>>>> 
>>>> Hi Bernardo,
>>>> 
>>>> 1) Could you elaborate on these two constraints?
>>>> 
>>>> == and != ?
>>>> 

Re: [Discuss] CEP-24 Password validation and generation

2024-06-12 Thread Bernardo Botella
+1 on Francisco’s comments. TCM is a general feature that a lot of other things 
will benefit from, and the fact that this CEP is one of those that will benefit 
shouldn’t block it from moving forward.

> On Jun 11, 2024, at 11:16 PM, Francisco Guerrero  wrote:
> 
> Stefan, thanks for moving this CEP forward. This CEP brings a lot of value
> to Cassandra without needing to wait for TCM. I can see how a misconfigured
> node can be problematic, but the issue is not something introduced in
> this CEP, and it affects many other features in Cassandra. I think it needs to
> be addressed separately.
> 
> Mature database offerings have functionality that is proposed in your CEP such
> as password strength, and preventing usage of previously used passwords.
> 
> I'm looking forward to seeing what shape this CEP takes in the coming weeks,
> and also looking forward to the pull request when it lands.
> 
> I think we can even extend this concept to MutualTLS authentication where
> we can impose certain restrictions on certificates. I recently contributed
> https://issues.apache.org/jira/browse/CASSANDRA-18951 to Cassandra to
> add restrictions to the allowed certificate validity period. We can consider 
> having
> CEP-24 as a pluggable way to configure restrictions that are not necessarily
> just scoped for passwords, but more generally to other authentication methods.
> 
> Best,
> - Francisco
> 
> On 2024/06/07 17:58:34 Štefan Miklošovič wrote:
>> Hi Shailaja,
>> 
>> thanks for taking a look at this.
>> 
>> That was indeed just an example we can change. It was more about showing
>> what might be possible in the future, nothing is set in stone yet, as the
>> last sentence "this is not the part of the initial implementation" explains.
>> 
>> When it comes to these very specific features you mentioned, I feel like
>> this is very "business specific" and I do not want to "pollute" Cassandra
>> system tables unnecessarily. It was a long time ago since I was writing
>> that CEP and it made sense to me back then to have a table for previous
>> passwords but then I started to reconsider it because I do not know about
>> any database out there which would offer something similar (correct me if I
>> am wrong) plus I start to question its actual benefit for a database user.
>> We are not trying to mimic the behavior of a website after all. More to it,
>> the password rotation itself is quite a topic and there are opinions that
>> password should not be actually rotated at all. Hence I think that it is
>> not the role of Cassandra to define how passwords are going to be rotated,
>> with what frequency etc. Let's just keep it simple and let's just enforce
>> the password strength itself.
>> 
>> More to this CEP in general, after I read in the other thread about CEP-42
>> that Dinesh does not consider TCM to be a hard requirement for this CEP and
>> he finds it very useful already, I think I will consolidate what I have and
>> I will remove TCM part of that in order to make it happen sooner.
>> 
>> I think I made a mistake by waiting for config in TCM but it was only with
>> good intentions - to provide a comprehensive feature without any
>> compromises. It seems to me that providing a well rounded config in TCM +
>> guardrails in TCM was too much for me to handle and it would take way more
>> time than I anticipated and it will be better if this is a more iterative
>> process. I think that based on where I am with the implementation of
>> guardrails in TCM (POC is basically done) it is more or less just a coding
>> exercise to integrate it into general config in TCM once config in TCM is
>> introduced.
>> 
>> I think I will restructure the current CEP-24 a little bit and I will move
>> more optional features into possible extensions in the future in order to
>> keep the core functionality at the minimum in order to reason about it more
>> easily. I will try to get back to this in the upcoming weeks and I will
>> eventually start a voting thread.
>> 
>> Regards
>> 
>> 
>> On Fri, Jun 7, 2024 at 6:00 PM  wrote:
>> 
>>> Hi Stefan,
>>> 
>>> Thanks for the CEP, sounds great. Regarding
>>> 
>>> If we were about to make this even harder to bypass, we may say that
>>> password can be changed once per day, for example (anytime for a
>>> superuser). Since we have "created" column which is of type timeuuid, we
>>> would check this table and see if there was some password already set that
>>> day or not and fail the request eventually. This is not the part of the
>>> initial implementation.
>>> 
>>> Allowing password change only once a day would be too restrictive and may
>>> create chaos for users. For example, I am trying to file a tax return on
>>> the last day of deadline, I forgot the password I had set last year, now
>>> changed it. Assume I forgot the password I just set either due to an
>>> unclear/faulty website or due to my bad memory with stress to file tax
>>> returns on the last day. In that case either I should be able to change t

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-13 Thread Bernardo Botella
Thanks a lot for your comments Abe!

I do agree that the Constraint clause should be as simple as possible. I will 
add a note on the CEP along with some specifics about the proposed constraints 
(removing the ones that are contentious, and adding them to a possible future 
additions section). And yeah, I also think that these constraints will help 
different Cassandra operating paradigms (multi-tenant clusters and diverse 
workflows).

Besides that, I hope that I’ve addressed all the potential concerns and 
feedback on the thread. Let’s allow a bit more time for others to chime in (any 
further feedback will be more than welcome), but I’d like to move forward with 
a vote soon if no other concerns are pointed out.

All in all, thanks a lot to everyone that participated in the thread and added 
to the discussion!
Bernardo



> On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky  wrote:
> 
> I've thought about this some more. It would be useful for Cassandra to 
> support user-defined "guardrails" (or constraints, whatever you want to call 
> them), that could be applied per keyspace or table. Whether a user or an 
> operator is considered the owner of a table depends on the organization 
> deploying Cassandra, so allowing both parties to protect their tables against 
> mis-use seems good to me, especially for large multi-tenant clusters with 
> diverse workloads.
> 
> For example, it would be really useful if a user could set the 
> Guardrails.{read,write}ConsistencyLevels for their tables, or declare whether 
> all operations should be over LWTs to avoid mixing regular and LWT workloads.
> 
> I'm hesitant about adding lots of expression syntax to the CONSTRAINT clause. 
> I think I'd prefer a function calling syntax that represents:
> 1. Whether the constraint is system / keyspace / table scoped
> 2. Where in query processing the constraint is checked
> 3. What is executed by the check



Re: [DISCUSS] Increments on non-existent rows in Accord

2024-06-20 Thread Bernardo Botella
Doesn’t an UPDATE statement create a row if no row with that primary key 
exists? That’s also confirmed by the official Cassandra documentation:

“Unlike in SQL, UPDATE does not check the prior existence of the row by 
default. The row is created if none existed before, and updated otherwise. 
Furthermore, there is no means of knowing which action occurred.”

That being the case, I think the second option you mention is the one that 
stays consistent with UPDATEs outside of a transaction.
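To make the comparison concrete, here is a minimal Python sketch (not Cassandra code) that models the table as a dict keyed by primary key. The function names are hypothetical. `increment_without_row_creation` only mirrors the result described in the quoted message below, and `increment_as_upsert` mirrors the second option, where the row is created with a null balance.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]             # (partition, account_id)
Table = Dict[Key, Optional[int]]  # primary key -> balance, None models null


def plain_update(table: Table, key: Key, balance: Optional[int]) -> None:
    """Regular CQL UPDATE: an upsert, the row is created if it did not exist."""
    table[key] = balance


def increment_without_row_creation(table: Table, key: Key, delta: int) -> None:
    """Reported behaviour: incrementing a missing row leaves it missing."""
    current = table.get(key)
    if current is not None:
        table[key] = current + delta


def increment_as_upsert(table: Table, key: Key, delta: int) -> None:
    """Second option: null + delta is still null, but the row now exists."""
    current = table.get(key)
    table[key] = None if current is None else current + delta


accounts: Table = {("default", 0): 100, ("default", 1): 100}
increment_without_row_creation(accounts, ("default", 1), -10)
increment_without_row_creation(accounts, ("default", 3), 10)

upserted: Table = dict(accounts)
increment_as_upsert(upserted, ("default", 3), 10)
```

With the second function the row for account_id 3 shows up with a null balance, which is the same outcome a plain UPDATE that sets a null value would give.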

Kind regards,
Bernardo

> On Jun 20, 2024, at 1:54 PM, Caleb Rackliffe  wrote:
> 
> We had a bug report a while back from Luis E Fernandez and team in 
> CASSANDRA-18988 around the behavior of increments/decrements on numeric 
> fields for non-existent rows. Consider the following, which can be run on the 
> cep-15-accord branch:
> 
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true
> 
> CREATE TABLE accord.accounts (
> partition text,
> account_id int,
> balance int,
> PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC) AND transactional_mode='full'
> 
> BEGIN TRANSACTION
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 0, 100);
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 1, 100);
> COMMIT TRANSACTION
> 
> BEGIN TRANSACTION
> UPDATE accord.accounts SET balance -= 10 WHERE partition = 'default' AND 
> account_id = 1;
> UPDATE accord.accounts SET balance += 10 WHERE partition = 'default' AND 
> account_id = 3;
> COMMIT TRANSACTION
> 
> Reading the 'default' partition will produce the following result.
> 
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
> 
> As you will notice, we have not implicitly inserted a row for account_id 3, 
> which does not exist when we request that its balance be incremented by 10. 
> This is by design, as null + 10 == null.
> 
> Before I close CASSANDRA-18988, I'd like to confirm 
> with everyone reading this that the behavior above is reasonable. The only 
> other option I've seen proposed that would make sense is perhaps producing a 
> result like:
> 
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
>    default |          3 |    null
> 
> 
> Note however that this is exactly what we would produce if we had first 
> inserted a row w/ no value for balance:
> 
> INSERT INTO accord.accounts (partition, account_id) VALUES ('default', 3);



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Bernardo Botella
Thanks for the comments Jordan.

Completely agreed that we will need to be careful not to accept constraints 
that require a read before a write. This is called out in the CEP itself, and 
will have to be enforced in the future.
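As a rough illustration of what "no read before write" means in practice, here is a small Python sketch of a check that only looks at the incoming values. Everything in it is an assumption made for illustration: the `max_size_of` helper, the `ConstraintViolation` exception, and the choice to let null values pass are not taken from the CEP. The 64 KiB limit is borrowed from the chunk size example in Jordan's message below.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[bytes]]
Check = Callable[[Row], bool]


class ConstraintViolation(Exception):
    """Raised when an incoming write breaks a constraint."""


def max_size_of(column: str, limit_bytes: int) -> Check:
    # Hypothetical sizeOf-style check; treating null as valid is an assumption.
    def check(row: Row) -> bool:
        value = row.get(column)
        return value is None or len(value) <= limit_bytes
    return check


def validate_write(row: Row, checks: List[Check]) -> None:
    # Only the incoming row is inspected; nothing is read from storage.
    for check in checks:
        if not check(row):
            raise ConstraintViolation("write rejected by constraint")


chunk_checks = [max_size_of("chunk", 64 * 1024)]
validate_write({"chunk": b"x" * 1024}, chunk_checks)  # accepted
```

A constraint such as "value not already present in the table" could not be written as a `Check` here, because it would need stored data as input. That is the kind of constraint the framework would have to refuse.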

After all the feedback and discussion, I think we are ready to move to a voting 
thread for CEP-42. I will be posting the thread today.

Thanks everyone who participated in the discussion!
Bernardo

> On Jun 23, 2024, at 2:38 PM, Jordan West  wrote:
> 
> I am generally for this CEP, particularly the sizeOf guardrail. For example, 
> we recently had an incident caused by a client who wrote outside of the 
> contract we had verbally established. The constraint would have let us encode 
> that contract into the database. In this case, clients are writing large 
> blobs at the application layer and internally the client performs chunking.  
> We had established a chunk size of 64k, for example. However, the application 
> team wanted to use a different programming language than the ones we provide 
> clients for so they wrote their own. The new client had a bug that did not 
> honor the agreed upon chunk size and wrote chunks that were MBs in size. This 
> eventually led to a production incident and the issue was discovered as a 
> result of a bunch of analysis (dumping sstables, etc). Had we had the sizeOf 
> guardrail it would have turned a production incident with hours of 
> investigation into a bug found immediately during development. Could this be 
> done with a node-level guardrail? Likely. But config has the issues described 
> above and it's possible to have two tables with different constraints around 
> similar fields (for example, two different chunk size configs due to data 
> shape). Could it be done at the client layer? Yes that's what we are doing 
> now, but this incident highlights the weakness with that approach (having to 
> implement the contract everywhere and having disjoint features across 
> clients).
>  
> I also think there is benefit to application owners. Encoding constraints in 
> the database ensures continuity as ownership and contributors change and 
> reduces the need for comments or documentation as the means to enforce or 
> share this knowledge. 
> 
> I think enforcing them at write time makes sense. Thinking about it in the 
> scope of compaction for example reminds me of a data loss incident where 
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch of 
> 4 byte ints were thrown away because the field expected an 8 byte long. 
> 
> My primary concern would be ensuring that we don't implement constraints that 
> require a read before write (not inList comes to mind as an example of one 
> that could imply reading before writing and could confuse a user if it 
> doesn't). 
> 
> Regarding the conflict with existing guardrails, I do think that is tougher. 
> On one hand I find this feature to be more evolved than those guardrails and 
> would be fine to see them be replaced by it. On the other, the guardrails 
> provide sole control to the operator which is nice but adds some complexity 
> that has been rightly called out.  But I don't see that as a reason not to go 
> forward with this feature. We should pick a path and accept the tradeoffs. 
>   
> Jordan
> 
> 
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella 
> <conta...@bernardobotella.com> wrote:
>> Thanks a lot for your comments Abe!
>> 
>> I do agree that the Constraint clause should be as simple as possible. I 
>> will add a note on the CEP along with some specifics about the proposed 
>> constraints (removing the ones that are contentious, and adding them to a 
>> possible future additions section). And yeah, I also think that these 
>> constraints will help different Cassandra operating paradigms (multi-tenant 
>> clusters and diverse workflows).
>> 
>> Besides that, I hope that I’ve addressed all the potential concerns and 
>> feedback on the thread. Let’s allow a bit more time for others to chime in 
>> (any further feedback will be more than welcome), but I’d like to move 
>> forward with a vote soon if no other concerns are pointed out.
>> 
>> All in all, thanks a lot to everyone that participated in the thread and 
>> added to the discussion!
>> Bernardo
>> 
>> 
>> 
>> > On Jun 12, 2024, at 2:37 PM, Abe Ratnofsky <a...@aber.io> wrote:
>> > 
>> > I've thought about this some more. It would be useful for Cassandra to 
>> > support user-defined "guardrails" (or constraints, whatever you want to 
>> > call them), that could be applied per keyspace or table. Whether a user or 

[VOTE] CEP-42: Constraints Framework

2024-06-24 Thread Bernardo Botella
Hi everyone,

I would like to start the voting for CEP-42.

Proposal: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
Discussion: https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj

The vote will be open for 72 hours. A vote passes if there are at least 3 
binding +1s and no binding vetoes.

Thanks,
Bernardo Botella

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Bernardo Botella
Hi Ariel and Jon,

Let me address your question first. Yes, AND is supported in the proposal. 
Below you can find some examples of different constraints applied to the same 
column.

As for using the LENGTH name instead of sizeOf as in the proposal, I am also not 
opposed to it if it is more consistent with the terminology used by other 
databases.

So, to recap, there seems to be general agreement on the usefulness of the 
Constraints Framework.
Now, from the feedback that has arrived after the voting has been called, I see 
there are three different proposals for syntax:

1.-
The syntax currently described in the CEP. Example:
CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_adress inet,
  subnet_mask int,
  CONSTRAINT subnet_mask > 0,
  CONSTRAINT subnet_mask < 32
)

2.-
As Jon suggested, leaving these definitions to more specific Guardrails at table 
level. For example, something like:
column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32

3.-
As Ariel suggested, adding the CHECK keyword for consistency with SQL. 
Example:
CREATE TYPE keyspace.cidr_address_ipv4 (
  ip_adress inet,
  subnet_mask int,
  CONSTRAINT CHECK subnet_mask > 0,
  CONSTRAINT CHECK subnet_mask < 32
)
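Options 1 and 3 express the same rule and differ only in keywords. As a rough sketch of how several constraints on one column could be AND-ed together, here is a small Python model. The tuple representation and the `satisfies_all` helper are hypothetical and only illustrate the semantics, not how the CEP would implement them.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Comparison operators a simple value constraint could use.
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

Constraint = Tuple[str, str, int]  # (column, operator, literal)


def satisfies_all(row: Dict[str, int], constraints: List[Constraint]) -> bool:
    # Every constraint must hold, i.e. the constraints are AND-ed together.
    return all(OPS[op](row[column], literal) for column, op, literal in constraints)


# The two subnet_mask constraints from the examples above.
subnet_mask_constraints: List[Constraint] = [
    ("subnet_mask", ">", 0),
    ("subnet_mask", "<", 32),
]

accepted = satisfies_all({"subnet_mask": 24}, subnet_mask_constraints)
rejected = satisfies_all({"subnet_mask": 32}, subnet_mask_constraints)
```

Here `accepted` is true and `rejected` is false, since a value of 32 fails the second constraint even though it passes the first.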

For the guardrails vs CQL syntax, I think that keeping the conceptual 
separation that has been explored in this thread, and perfectly recapped by 
Doug, is closer to what we are trying to achieve with this framework. In my 
opinion, having them in the CQL schema definition provides those application 
level constraints that Doug mentions in a more accessible way than having to 
configure such specific guardrails.

For the addition of the CHECK keyword, I'm definitely not opposed to it if it 
helps Cassandra users coming from other databases understand concepts that were 
already familiar to them.

I hope this helps move the conversation forward,
Bernardo



> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I see a vote for this has been called. I should have provided more prompt 
> feedback sooner.
> 
> I am a strong +1 on adding column level constraints being a good thing to 
> add. I'm not too concerned about row/partition/table level constraints, but I 
> would like to change the syntax before I would be +1 on this CEP.
> 
> It would be good to align the syntax as closely as possible to our existing 
> syntax, and if not that then MySQL/Postgres. For example it looks like we 
> don't have a string length function so maybe add `LENGTH` (consistent with 
> MySQL/Postgres) to also use with column level constraints.
> 
> It looks like there are generally two forms of constraint syntax, one is 
> expressed as part of the column definition, and the other is a named or 
> anonymous constraint on the table. https://www.w3schools.com/sql/sql_check.asp
> 
> Can we align with having these column level ones as `CHECK` constraints like 
> in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if creating a 
> named or multi-column constraint?
> 
> Will column level check constraints support `AND` so that you can specify 
> multiple constraints on the column? I am not sure if that is supported in 
> other databases, but it would be good to align on that as well.
> 
> RE some implementation things to keep in mind:
> 
> If TCM is in use and the constraints are defined in the schema data structure 
> this should work fine with Accord because all coordinators (regular, 
> recovery) will deterministically agree on the constraints being enforced 
> BUT... this also has to map to how/when constraints are enforced.
> 
> Both Accord and Paxos work best when the constraints are enforced when the 
> final mutation to be applied is created and not later when it is being 
> applied to the CFS. This also reduces duplication of enforcement checking 
> work to just the coordinator for the write.
> 
> Ariel
> 
> On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
>> Hello everyone,
>> 
>> I am proposing this CEP:
>> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation 
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework>
>> 
>> And I’m looking for feedback from the community.
>> 
>> Thanks a lot!
>> Bernardo



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Bernardo Botella
Got it. Thanks for the clarification Jon. Then, in terms of syntax, I think we 
can discard option 2.

As for the GUARDRAIL vs CONSTRAINT concept you bring up, I guess we have 
pros and cons on both sides. It is true that there is an existing concept of 
GUARDRAIL in Cassandra, and that reusing it comes with benefits. But, in my 
opinion, there are two main advantages to using the CONSTRAINT name for the 
feature:
- It keeps consistency with concepts from other databases (this may be a minor 
point, but I really think there is a benefit for those coming from other 
databases, and it may help them understand what this actually is).
- Having it presented as a different concept helps illustrate how those two 
features are different. Following the example provided by Doug, we can have a 
clear separation between those two levels of restrictions on a write.



> On Jun 24, 2024, at 9:46 PM, Jon Haddad  wrote:
> 
> I think my suggestion was unclear. I was referring to the name guardrail, 
> using the same infra as guardrails, rather than a separate concept. Not 
> applying it like we do table options. 
> 
> 
> 
> On Tue, Jun 25, 2024 at 12:44 AM Bernardo Botella 
> <conta...@bernardobotella.com> wrote:
>> Hi Ariel and Jon,
>> 
>> Let me address your question first. Yes, AND is supported in the proposal. 
>> Below you can find some examples of different constraints applied to the 
>> same column.
>> 
>> As for using the LENGTH name instead of sizeOf as in the proposal, I am also 
>> not opposed to it if it is more consistent with the terminology used by 
>> other databases.
>> 
>> So, to recap, there seems to be general agreement on the usefulness of the 
>> Constraints Framework.
>> Now, from the feedback that has arrived after the voting has been called, I 
>> see there are three different proposals for syntax:
>> 
>> 1.-
>> The syntax currently described in the CEP. Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> 2.-
>> As Jon suggested, leaving these definitions to more specific Guardrails at 
>> table level. For example, something like:
>> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
>> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>> 
>> 3.-
>> As Ariel suggested, adding the CHECK keyword for consistency with SQL. 
>> Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT CHECK subnet_mask > 0,
>>   CONSTRAINT CHECK subnet_mask < 32
>> )
>> 
>> For the guardrails vs CQL syntax, I think that keeping the conceptual 
>> separation that has been explored in this thread, and perfectly recapped by 
>> Doug, is closer to what we are trying to achieve with this framework. In my 
>> opinion, having them in the CQL schema definition provides those application 
>> level constraints that Doug mentions in a more accessible way than having to 
>> configure such specific guardrails.
>> 
>> For the addition of the CHECK keyword, I'm definitely not opposed to it if 
>> it helps Cassandra users coming from other databases understand concepts 
>> that were already familiar to them.
>> 
>> I hope this helps move the conversation forward,
>> Bernardo
>> 
>> 
>> 
>>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> I see a vote for this has been called. I should have provided more prompt 
>>> feedback sooner.
>>> 
>>> I am a strong +1 on adding column level constraints being a good thing to 
>>> add. I'm not too concerned about row/partition/table level constraints, but 
>>> I would like to change the syntax before I would be +1 on this CEP.
>>> 
>>> It would be good to align the syntax as closely as possible to our existing 
>>> syntax, and if not that then MySQL/Postgres. For example it looks like we 
>>> don't have a string length function so maybe add `LENGTH` (consistent with 
>>> MySQL/Postgres) to also use with column level constraints.
>>> 
>>> It looks like there are generally two forms of constraint syntax, one is 
>>> expressed as part of the column definition, and the other is a named or 
>>> anonymous constraint on the table. 
>>> https://www.w3schools.com/sql/sql_check.asp
>>> 
>>> Can we align with having these column level one

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Bernardo Botella
Hi Ariel,

Your suggestions make sense, and I’ll be updating the CEP with the details. 
Basically:
- We have an optional name for the constraints. If the name is not provided, a 
random name is generated for a constraint:
CREATE TABLE keyspace.table (
  p1 int, 
  p2 int,
  ...,
  CONSTRAINT [name] CHECK p1 != p2
);

- Alter and Drop constraints are as follows:
ALTER CONSTRAINT [name] CHECK new_condition
DROP CONSTRAINT [name]

- Describe table returns the list of constraints for a table.
- The condition of the CONSTRAINT (after the CHECK keyword) can be surrounded 
by optional parentheses to keep consistency with other databases' syntax.
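To show how the points above could fit together, here is a small Python sketch of a per-table registry of named constraints. The `TableConstraints` class, its method names, and the `constraint_<hex>` scheme for generated names are all hypothetical; the CEP only says that a random name is generated when none is given.

```python
import uuid
from typing import Dict, Optional


class TableConstraints:
    """Toy model of the constraints attached to one table."""

    def __init__(self) -> None:
        self._by_name: Dict[str, str] = {}

    def add(self, condition: str, name: Optional[str] = None) -> str:
        # Anonymous constraints get a generated name so they can be
        # altered or dropped later; the naming scheme is an assumption.
        if name is None:
            name = f"constraint_{uuid.uuid4().hex[:8]}"
        if name in self._by_name:
            raise ValueError(f"constraint {name} already exists")
        self._by_name[name] = condition
        return name

    def alter(self, name: str, new_condition: str) -> None:
        if name not in self._by_name:
            raise KeyError(name)
        self._by_name[name] = new_condition

    def drop(self, name: str) -> None:
        del self._by_name[name]

    def describe(self) -> Dict[str, str]:
        # What a DESCRIBE of the table could list for its constraints.
        return dict(self._by_name)


constraints = TableConstraints()
generated = constraints.add("p1 != p2")                  # anonymous constraint
constraints.add("p1 > 0", name="p1_positive")            # named constraint
constraints.alter("p1_positive", "p1 >= 0")
constraints.drop(generated)
```

After these calls, `constraints.describe()` contains only `p1_positive` with its updated condition.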

I will update the CEP with those details.

To Dinesh’s point, I agree that a NOT NULL constraint will be really useful. I 
can add it to the list in the CEP.

Regards,
Bernardo


> On Jun 25, 2024, at 9:22 AM, Ariel Weisberg  wrote:
> 
> Hi,
> 
> I am also +1 on Doug's distinction between things that can be managed by 
> operators and things that can be managed by applications.
> 
> Some things to note about the syntax is that there are parens around the 
> condition in SQL. In your example there are multiple anonymous constraints on 
> the same column, how are anonymous constraints handled? Does the database 
> automatically generate a named constraint for them so they can be referenced 
> later? Do we allow multiple constraints on the same column and AND them 
> together?
> 
> Ariel
> 
> 
> 
> On Mon, Jun 24, 2024, at 6:43 PM, Bernardo Botella wrote:
>> Hi Ariel and Jon,
>> 
>> Let me address your question first. Yes, AND is supported in the proposal. 
>> Below you can find some examples of different constraints applied to the 
>> same column.
>> 
>> As for using the LENGTH name instead of sizeOf as in the proposal, I am also 
>> not opposed to it if it is more consistent with the terminology used by 
>> other databases.
>> 
>> So, to recap, there seems to be general agreement on the usefulness of the 
>> Constraints Framework.
>> Now, from the feedback that has arrived after the voting has been called, I 
>> see there are three different proposals for syntax:
>> 
>> 1.-
>> The syntax currently described in the CEP. Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>> 
>> 2.-
>> As Jon suggested, leaving these definitions to more specific Guardrails at 
>> table level. For example, something like:
>> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
>> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>> 
>> 3.-
>> As Ariel suggested, adding the CHECK keyword for consistency with SQL. 
>> Example:
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_adress inet,
>>   subnet_mask int,
>>   CONSTRAINT CHECK subnet_mask > 0,
>>   CONSTRAINT CHECK subnet_mask < 32
>> )
>> 
>> For the guardrails vs CQL syntax, I think that keeping the conceptual 
>> separation that has been explored in this thread, and perfectly recapped by 
>> Doug, is closer to what we are trying to achieve with this framework. In my 
>> opinion, having them in the CQL schema definition provides those application 
>> level constraints that Doug mentions in a more accessible way than having to 
>> configure such specific guardrails.
>> 
>> For the addition of the CHECK keyword, I'm definitely not opposed to it if 
>> it helps Cassandra users coming from other databases understand concepts 
>> that were already familiar to them.
>> 
>> I hope this helps move the conversation forward,
>> Bernardo
>> 
>> 
>> 
>>> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> I see a vote for this has been called. I should have provided more prompt 
>>> feedback sooner.
>>> 
>>> I am a strong +1 on adding column level constraints being a good thing to 
>>> add. I'm not too concerned about row/partition/table level constraints, but 
>>> I would like to change the syntax before I would be +1 on this CEP.
>>> 
>>> It would be good to align the syntax as closely as possible to our existing 
>>> syntax, and if not that then MySQL/Postgres. For example it looks like we 
>>> don't have a string length function so maybe add `LENGTH` (consistent with 
>>> MySQL/Postgres) to also use with column level constraints.
>>> 
>>> It looks like there are generally two forms of constraint syntax

Re: [DISCUSS] CEP-42: Constraints Framework

2024-07-01 Thread Bernardo Botella
Thanks everyone for all the feedback that came in after the call for votes.

To Yifan's point, yes you are right, and I updated the CEP with the expressions.

There’s been a really good discussion around adding or supporting constraints 
at read time. I think the point Doug made illustrates that such constraints may 
come with rough edges that have other implications that need to be taken care 
of. Because of that, I’d like to follow Dinesh’s suggestion of deferring it, and 
restart the call for votes for the proposal. 
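For anyone skimming the thread, this is roughly what write-time-only enforcement implies, as a toy Python model. The `WritePathOnlyTable` class is purely illustrative and not how Cassandra stores data: rows written before a constraint was added stay readable, and only new writes are checked.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Check = Callable[[Row], bool]


class WritePathOnlyTable:
    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.checks: List[Check] = []

    def add_constraint(self, check: Check) -> None:
        # Existing rows are deliberately not re-validated here.
        self.checks.append(check)

    def write(self, row: Row) -> None:
        if not all(check(row) for check in self.checks):
            raise ValueError("write rejected by constraint")
        self.rows.append(row)

    def read(self) -> List[Row]:
        # No constraint is consulted on the read path.
        return list(self.rows)


table = WritePathOnlyTable()
table.write({"subnet_mask": 64})  # written before any constraint exists
table.add_constraint(lambda row: row["subnet_mask"] < 32)
```

After the constraint is added, `table.read()` still returns the row with 64, while a new write of the same value would be rejected.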

I will resurface the call for votes thread.

Thanks everyone!
Bernardo

> On Jun 29, 2024, at 1:26 PM, Dinesh Joshi  wrote:
> 
> The read time constraint application is going to be expensive and possibly 
> complicated to implement with low RoI. Therefore my suggestion is to defer 
> it. If there are situations where it appears to be helpful, we can always 
> reconsider it.
> 
> On Tue, Jun 25, 2024 at 3:34 PM Yifan Cai wrote:
>>> - Alter and Drop constraints are as follows
>>> ALTER CONSTRAINT [name] CHECK new_condition
>>> DROP CONSTRAINT [name]
>> 
>> I think you mean the following syntax to modify existing constraints, since 
>> constraints are part of the table definition. 
>> ALTER TABLE [keyspace_name.]table_name ALTER CONSTRAINT [constraint_name] 
>> CHECK check_expression
>> 
>> Dinesh's proposal to check on read is a good addition. I think it is 
>> optional and should be enabled/disabled w/ configuration. The extra check 
>> may not be desirable in some circumstances, e.g. the use cases never 
>> change the constraints and do not write data by any means other than CQL. 
>> Since the original CEP defines that the constraints are applied at the write 
>> time, we need to update the CEP if we decide to include the check on read.
>> 
>> - Yifan
>> 
>> 
>> On Tue, Jun 25, 2024 at 1:13 PM Štefan Miklošovič wrote:
>>> I wonder how often it is that users will apply the constraints on tables 
>>> with data while they know their data is probably not compliant with the 
>>> constraint configuration. I humbly think that people are aware of this in 
>>> advance and what usually happens is that there is some kind of a job which 
>>> consolidates the data (or migrates them to a new table) before admins put a 
>>> "lid" on that so moving forward nobody puts there anything which would 
>>> violate it.
>>> 
>>> I probably have not kept myself up to date with the discussion but I was 
>>> thinking that constraints are effectively there just on the write path. 
>>> Whatever is read is not a job of a constraint to refuse to return.
>>> 
>>> On Tue, Jun 25, 2024 at 9:57 PM Dinesh Joshi wrote:
>>>> Abe, that's a good point. We need to call out distinct use-cases here. 
>>>> When a fresh cluster is set up with constraints we don't have any issues 
>>>> because the data written and read back is going to be compliant to the 
>>>> constraint(s). For existing data in a cluster where new constraints are 
>>>> applied or existing constraints changed in such a way that may render 
>>>> existing data unreadable, we need a good user experience. This is what I 
>>>> propose –
>>>> 
>>>> 1. When a constraint is added or changed in such a way that existing data 
>>>> could be rendered unreadable, we should warn the user.
>>>> 
>>>> 2. Give the user a choice of whether it is ok for the data to be rendered 
>>>> unreadable and an error is issued or a warning should be issued when the 
>>>> read violates the constraint but data is still readable. New data going in 
>>>> will meet the constraint but old data would need to be rewritten for the 
>>>> application to make it compliant.
>>>> 
>>>> With this approach the application developer can decide what is right for 
>>>> their particular use-case. In many cases the application developer may 
>>>> decide to rewrite the data when they see a warning.
>>>> 
>>>> 
>>>> On Tue, Jun 25, 2024 at 12:46 PM Abe Ratnofsky wrote:
>>>>> If we're going to introduce a feature that looks like SQL constraints, we 
>>>>> should make sure it's "reasonably" compliant. In particular, we should 
>>>>> avoid situations where a user creates a constraint, writes some data, 
>>>>> then reads data that violates that constraint, unless they've expressed 
>>>>> that violations on read would be acceptable.
>>>>> 
>>>>> For Postgres, when adding a new constraint you can specify NOT VALID to 
>>>>> avoid scanning all existing relevant data[1]. If we want to avoid 
>>>>> scan-on-DDL, this tradeoff needs to be made clear to a user.
>>>>> 
>>>>> As we've already discussed, constraints must deal with operations that 
>>>>> appear within limits on the write path, but once reconciled on read or 
>>>>> during compaction can lead to a violation. Adding to non-frozen 
>>>>> collections is one example. Expecting users to understand the write path 
>>>>> for collections feels unrealistic t

Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-01 Thread Bernardo Botella
+1 on the feature branch, which allows breaking the effort into smaller chunks 
that can even be worked on in parallel.



> On Jul 1, 2024, at 3:13 AM, Štefan Miklošovič  wrote:
> 
> Hi Maxim,
> 
> thank you for doing this. I think that Picocli is a great choice, comparing 
> it with airline v2 which is an attempt to resurrect the original airline, it 
> seems to be way more active and popular.
> 
> I personally don't have anything against what you suggested, however I think 
> that this kind of work will put additional stress on us being sure that the 
> output of the commands will be exactly as it is now. We do have nodetool 
> tests which cover the output, which is very handy in this kind of 
> situation, but I think we do not test all of them. It would be great 
> to increase our test coverage where possible in this area and I think it is 
> actually going to be a requirement as only then we will be sure that old and 
> new code produces the same output.
> 
> I think it is too soon to contemplate when we switch to this, we just need to 
> be sure that it is the same so existing integrations will not be broken.
> 
> Regards 
> 
> On Fri, Jun 28, 2024 at 3:48 PM Maxim Muzafarov wrote:
>> Hello everyone,
>> 
>> 
>> The nodetool relies on the airlift/airline library to mark up the CLI
>> commands used to manage Cassandra, which are part of our public API.
>> This library is no longer maintained, so we need to update it anyway,
>> and the good news is that we already have several good alternatives:
>> airline-2 [3] or picocli [2].
>> 
>> In this message, I'm mainly talking about CASSANDRA-17445 [4], which
>> refers to the problem and is a prerequisite for a larger CEP-38 CQL
>> Management API [5]. It doesn't make sense to use annotations from the
>> deprecated library to build a new API, so this is another reason to
>> update the library as soon as possible and do some inherently small
>> code refactoring required for the CEP-38.
>> 
>> In addition to being widely used and well supported, the Picocli
>> library offers the following advantages for us:
>> - We can detach the jmx-specific parameters from the commands so that
>> they can be reused in other APIs (e.g. without host, port) while
>> remaining backwards compatible;
>> - We can set up nodetool's autocompletion after the migration with
>> minimal effort;
>> - There is a good Picocli ecosystem of tools that we can use to
>> simplify our codebase, e.g. generate man pages tool to make our CLIs
>> more Unix friendly [7];
>> 
>> 
>> = Prototype =
>> 
>> I have a working prototype [8] that shows what the result will look
>> like. The prototype includes:
>> - Tests between the execution of commands via the nodetool and nodetoolv2;
>> - 5 out of 164 nodetool commands have been moved so far, to show the
>> refactoring we need to do to the command's body;
>> - The command help output for the nodetoolv2 is the same as it
>> is currently for the nodetool and this is the default, however a
>> "cassandra.cli.picocli.layout" is added to switch to the Picocli
>> defaults;
>> - You can also see that the colour scheme is applied by the Picocli
>> out of the box, and this is how it looks [9];
>> - The nodetoolv2 is called first when the shell is triggered, and if
>> the nodetoolv2 doesn't contain the command it needs yet, it falls back
>> to the nodetool and the old argument parser;
>> 
>> 
>> Since the number of commands is quite large (164), I'd like to create
>> a feature branch and move all the commands one at a time, while
>> keeping the output backwards compatible by applying additional tests at
>> the same time and checking that the CI is always green. I think the "feature
>> branch" approach will be less stressful for us since it focuses on
>> requiring a review of only tedious changes to the feature branch,
>> rather than reviewing the 15k line patch.
>> 
>> 
>> Anyway, I am open to any suggestions and advice based on your
>> experience and best practices for this case. Looking forward to your
>> thoughts and suggestions.
>> 
>> 
>> 
>> [1] https://github.com/airlift/airline
>> [2] https://picocli.info/
>> [3] https://github.com/rvesse/airline
>> [4] https://issues.apache.org/jira/browse/CASSANDRA-17445
>> [5] 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-38%3A+CQL+Management+API
>> [6] 
>> https://github.com/apache/cassandra/pull/2497/files#diff-acdd5f29d28df5c02f4bfc933528f084508b4923112e312e68a4aff7df973bce
>> [7] https://picocli.info/man/gen-manpage.html
>> [8] https://github.com/apache/cassandra/pull/2497/files
>> [9] 
>> https://github.com/apache/cassandra/assets/3415046/57b14ae0-ff59-43d2-b542-10d3218ae075
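
[Editor's sketch] The fallback behaviour described in the prototype (try nodetoolv2 first, fall back to the old parser for commands not yet migrated) and the output-parity tests can be sketched as follows. This is a language-neutral illustration in Python, not the prototype's Java/Picocli code; the command tables and their outputs are hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


def dispatch(argv: List[str], v2: Dict[str, Handler], legacy: Dict[str, Handler]) -> str:
    """Run a command with the new parser if it has been migrated, else fall back."""
    name, args = argv[0], argv[1:]
    handler = v2.get(name, legacy.get(name))
    if handler is None:
        raise KeyError(f"unknown command: {name}")
    return handler(args)


def output_mismatches(argv_cases: List[List[str]],
                      v2: Dict[str, Handler],
                      legacy: Dict[str, Handler]) -> List[str]:
    """Return names of migrated commands whose output differs from the legacy output."""
    mismatches = []
    for argv in argv_cases:
        name, args = argv[0], argv[1:]
        if name in v2 and name in legacy and v2[name](args) != legacy[name](args):
            mismatches.append(name)
    return mismatches


# Hypothetical commands: "version" has been migrated, "status" has not.
legacy_commands: Dict[str, Handler] = {
    "version": lambda args: "ReleaseVersion: x.y.z",
    "status": lambda args: "Datacenter: dc1",
}
v2_commands: Dict[str, Handler] = {
    "version": lambda args: "ReleaseVersion: x.y.z",
}
```

A parity check of this kind, run for every migrated command, is one way to gain the confidence Štefan asks for that old and new code produce the same output.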



Re: [VOTE] CEP-42: Constraints Framework

2024-07-01 Thread Bernardo Botella
With all the feedback that came in the discussion thread after the call for 
votes, I’d like to extend the period another 72 hours starting today.

As before, a vote passes if there are at least 3 binding +1s and no binding 
vetoes.

Thanks,
Bernardo Botella

> On Jun 24, 2024, at 7:17 AM, Bernardo Botella  
> wrote:
> 
> Hi everyone,
> 
> I would like to start the voting for CEP-42.
> 
> Proposal: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
> Discussion: https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
> 
> The vote will be open for 72 hours. A vote passes if there are at least 3 
> binding +1s and no binding vetoes.
> 
> Thanks,
> Bernardo Botella
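
[Editor's sketch] The passing rule quoted in this vote thread (at least 3 binding +1s and no binding vetoes) can be written down as a small check. The Vote type below is hypothetical and only illustrates the arithmetic; it is not part of any Apache tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    value: int     # +1, 0, or -1
    binding: bool  # True for votes that count as binding


def vote_passes(votes: Iterable[Vote], required_binding: int = 3) -> bool:
    """Apply the rule from the thread: enough binding +1s and no binding vetoes."""
    votes = list(votes)
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    binding_vetoes = sum(1 for v in votes if v.binding and v.value == -1)
    return binding_plus_ones >= required_binding and binding_vetoes == 0
```

Non-binding votes are counted by neither term, so under this reading they can neither carry nor block a vote.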



Re: [VOTE] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-07-02 Thread Bernardo Botella
+1 (nb)

> On Jul 2, 2024, at 2:15 PM, Jordan West  wrote:
> 
> +1
> 
> On Fri, Jun 28, 2024 at 05:56  > wrote:
>> +1
>> 
>> 
>>> On Jun 27, 2024, at 3:03 PM, Josh McKenzie wrote:
>>> 
>>> +1
>>> 
>>> On Thu, Jun 27, 2024, at 12:40 AM, Abhijeet Dubey wrote:
 +1
 
 On Thu, Jun 27, 2024 at 1:47 AM Francisco Guerrero wrote:
 +1
 
 On 2024/06/21 15:13:31 Venkata Hari Krishna Nukala wrote:
 > Hi everyone,
 > 
 > I would like to start the voting for CEP-40 as all the feedback in the
 > discussion thread seems to be addressed.
 > 
 > Proposal:
 > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
 > Discussion thread:
 > https://lists.apache.org/thread/g397668tp0zybf29g8hgbllv7t3j493f
 > 
 > As per the CEP process documentation, this vote will be open for 72 hours
 > (longer if needed).
 > 
 > Thanks!
 > Hari
 >
 
 
 --
 Abhijeet Dubey
 Software Engineer @ Apple Inc.
 IIT Bombay Computer Science & Engineering Class of 2019
 Contact : +91-9900190105
 Apple Inc. | IIT Bombay
>> 



[RESULT][VOTE] CEP-42: Constraints Framework

2024-07-04 Thread Bernardo Botella
The vote passes with 7 binding +1s, 3 non-binding, and no vetoes.

Thanks everyone who was part of the discussion!
Bernardo


Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-08-19 Thread Bernardo Botella
Definitely a nice addition to CQL.

Looking for inspiration at how Postgres and MySQL do that may also help with 
the final design (I like the WITH proposed by Stefan, but I would definitely 
take a look at the INCLUDING keyword proposed by Postgres).
https://www.postgresql.org/docs/current/sql-createtable.html
https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html

On top of that, and as part of the interesting questions, I would like to add 
the permissions to the mix. Both the question about copying them over (with a 
WITH keyword probably), and the need for read permissions on the source table 
as well.

Bernardo

> On Aug 19, 2024, at 10:01 AM, Štefan Miklošovič  
> wrote:
> 
> BTW this would be cool to do as well:
> 
> ALTER TABLE ks.to_copy LIKE ks.tb WITH INDICES;
> 
> This would mean that if we create a copy of a table, later we can decide that 
> we need indices too, so we might "enrich" that table with indices from the 
> old one without necessarily explicitly re-creating them on that new table. 
> 
> On Mon, Aug 19, 2024 at 6:55 PM Štefan Miklošovič wrote:
>> I think this is an interesting idea worth exploring. I definitely agree with 
>> Benjamin who raised important questions which need to be answered first. 
>> Also, what about triggers?
>> 
>> It might be rather "easy" to come up with something simple but it should be 
>> a comprehensive solution with predictable behavior we all agree on.
>> 
>> If a keyspace of a new table does not exist we would need to create that one 
>> too before. For simplicity, I would just make it a must to create it in the 
>> same keyspace. We might iterate on that in the future.
>> 
>> UDTs are created per keyspace so there is nothing to re-create. We just need 
>> to reference it from a new table, right?
>> 
>> Indexes and MVs are interesting but in theory they might be re-created too.
>> 
>> Would it be appropriate to use something like this?
>> 
>> CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND TRIGGERS  
>> 
>> Without "WITH" it would just copy a table with nothing else.
>> 
>> On Mon, Aug 19, 2024 at 6:10 PM guo Maxwell wrote:
>>> Hello, everyone:
>>> As Jira CASSANDRA-7662 has described, we would like to introduce a new 
>>> grammar, "CREATE TABLE LIKE", which simplifies creating new tables that 
>>> duplicate existing ones.
>>> The format may be like : CREATE TABLE  LIKE 
>>> 
>>> Before I implement this function, do you have any suggestions on this?
>>> 
>>> Looking forward to your reply!
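
[Editor's sketch] The semantics floated in this thread (a plain copy without "WITH", optional INDEXES / VIEWS / TRIGGERS, same-keyspace only, and read permission on the source table) can be modelled roughly as below. Everything here is hypothetical: TableSchema and create_table_like are illustration names, not Cassandra classes, and the final design may differ.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

# Kinds named in the thread's "WITH INDEXES AND VIEWS AND TRIGGERS" example.
COPYABLE_KINDS: FrozenSet[str] = frozenset({"INDEXES", "VIEWS", "TRIGGERS"})


@dataclass(frozen=True)
class TableSchema:
    keyspace: str
    name: str
    columns: Tuple[Tuple[str, str], ...]  # (column name, CQL type)
    indexes: Tuple[str, ...] = ()
    views: Tuple[str, ...] = ()
    triggers: Tuple[str, ...] = ()


def create_table_like(source: TableSchema,
                      new_name: str,
                      with_kinds: FrozenSet[str] = frozenset(),
                      can_select_source: bool = True) -> TableSchema:
    """Copy a table definition into the same keyspace, optionally with extras."""
    if not can_select_source:
        # Assumption from the thread: reading the source schema needs SELECT.
        raise PermissionError(f"SELECT on {source.keyspace}.{source.name} is required")
    unknown = set(with_kinds) - COPYABLE_KINDS
    if unknown:
        raise ValueError(f"unsupported WITH options: {sorted(unknown)}")
    return replace(
        source,
        name=new_name,
        indexes=source.indexes if "INDEXES" in with_kinds else (),
        views=source.views if "VIEWS" in with_kinds else (),
        triggers=source.triggers if "TRIGGERS" in with_kinds else (),
    )


tb = TableSchema("ks", "tb", (("id", "int"), ("v", "text")), indexes=("tb_v_idx",))
plain_copy = create_table_like(tb, "tb_copy")
indexed_copy = create_table_like(tb, "tb_copy2", frozenset({"INDEXES"}))
```

Without any WITH option only the columns are copied, which matches the "just copy a table with nothing else" default proposed above.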



Re: Welcome Doug Rohrer as Cassandra Committer

2024-08-23 Thread Bernardo Botella
Congratulations Doug!

> On Aug 23, 2024, at 1:20 PM, Yifan Cai  wrote:
> 
> Congrats Doug!
> From: Jordan West 
> Sent: Friday, August 23, 2024 1:19:04 PM
> To: dev@cassandra.apache.org 
> Subject: Re: Welcome Doug Rohrer as Cassandra Committer
>  
> Awesome! Congratulations Doug! 
> 
> On Fri, Aug 23, 2024 at 12:17 Štefan Miklošovič wrote:
> Great news! Congratulations, Doug. 
> 
> On Fri, Aug 23, 2024 at 8:55 PM Dinesh Joshi wrote:
> The Apache Cassandra PMC is thrilled to announce that Doug Rohrer has
> accepted the invitation to become a committer!
> 
> Doug has worked on several aspects of Cassandra, Sidecar, and
> Analytics. Congratulations and welcome!
> 
> The Apache Cassandra PMC members



Re: Welcome Jordan West and Stefan Miklosovic as Cassandra PMC members!

2024-08-30 Thread Bernardo Botella
Congrats to you both!

> On Aug 30, 2024, at 1:32 PM, Melissa Logan  wrote:
> 
> Great news - congrats Jordan and Stefan!
> 
> On Fri, Aug 30, 2024 at 1:31 PM Sumanth Pasupuleti wrote:
>> Congratulations Jordan and Stefan!!!
>> 
>> On Fri, Aug 30, 2024 at 1:21 PM Jon Haddad wrote:
>>> The PMC's members are pleased to announce that Jordan West and Stefan 
>>> Miklosovic have accepted invitations to become PMC members.
>>> 
>>> Thanks a lot, Jordan and Stefan, for everything you have done for the 
>>> project all these years.
>>> 
>>> Congratulations and welcome!!
>>> 
>>> The Apache Cassandra PMC



Re: [VOTE] Release test-api 0.0.17

2024-09-10 Thread Bernardo Botella
+1 nb

> On Sep 10, 2024, at 7:34 AM, Doug Rohrer  wrote:
> 
> Proposing the test build of in-jvm dtest API 0.0.17 for release
> 
> Repository:
> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
> 
> Candidate SHA:
> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/85b538ca8259dedc2aded8a633cf3174f551f664
> Tagged with 0.0.17
> 
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1343/org/apache/cassandra/dtest-api/0.0.17/
> 
> Key signature: 9A648E3DEDA36EE374C4277B602ED2C52277
> 
> Changes since last release:
> * CASSANDRA-19783 - In-jvm dtest to detect InstanceClassLoaderLeaks
> * CASSANDRA-19239 - jvm-dtests crash on java 17
> 
> The vote will be open for 24 hours. Everyone who has tested the build
> is invited to vote. Votes by PMC members are considered binding. A
> vote passes if there are at least three binding +1s.



Re: Welcome Chris Bannister, James Hartig, Jackson Flemming and João Reis, as cassandra-gocql-driver committers

2024-09-12 Thread Bernardo Botella
It is great to see the project growing like this. Congratulations!!

> On Sep 12, 2024, at 6:27 AM, Tolbert, Andy  wrote:
> 
> Congratulations everyone! 🎉
> 
> On Thu, Sep 12, 2024 at 6:41 AM Mick Semb Wever wrote:
>> The PMC's members are pleased to announce that Chris Bannister, James 
>> Hartig, Jackson Flemming and João Reis have accepted invitations to become 
>> committers on the Drivers subproject.  
>> 
>> Thanks a lot for everything you have done with the gocql driver all these 
>> years.  We are very excited to see the driver now inside the Apache 
>> Cassandra project.
>> 
>> Congratulations and welcome!!
>> 
>> The Apache Cassandra PMC



Re: [DISCUSS] Donating easy-cass-stress to the project

2024-10-08 Thread Bernardo Botella
Just found out about this thread.

I do agree, after seeing Jon and Jordan’s talk on this tool, it would be great 
to have it under the project umbrella. Like Alexander, I also have some ideas 
on workflows to contribute, and would love to help maintain it.

Bernardo

> On Oct 8, 2024, at 1:51 PM, Doug Rohrer  wrote:
> 
> Hey folks,
> 
> I just wanted to resurface this conversation, especially after Jon and 
> Jordan’s talk at Community over Code this week. I think there would be some 
> real value in getting easy-cass-lab donated and part of the ecosystem.
> 
> To try to summarize:
> 
> - Jon would like to donate if his active development of the project isn’t 
> negatively affected.
> 
> - It seems a separate repo/subproject is the right way to go rather than 
> bringing it in-tree
> 
> - Several other folks have stepped up to be co-maintainers (thanks!)
> 
> - Some form of IP clearance would need to be done if this were to move 
> forward.
> 
> It seems the major concerns other than IP clearance were taken care of in the 
> thread. Is there an appetite to bring easy-cass-stress into the Apache 
> umbrella and, if so, how would we move forward from here?
> 
> Doug Rohrer
> 
>> On May 3, 2024, at 1:16 PM, Alexander DEJANOVSKI  
>> wrote:
>> 
>> 
>> Hi folks,
>> 
>> I'm familiar with the codebase and can help with the maintenance and 
>> evolution.
>> I already have some additional profiles that I can push there which were 
>> never merged in the main branch of tlp-cluster.
>> 
>> I love this tool (I know I'm biased) and hope it gets the attention it 
>> deserves.
>> 
>> On Tue, Apr 30, 2024, 23:17, Jordan West wrote:
>>> I would likely commit to it as well
>>> 
>>> Jordan 
>>> 
>>> On Mon, Apr 29, 2024 at 10:55 David Capwell wrote:
>>>>> So: besides Jon, who in the community expects/desires to maintain this 
>>>>> going forward? 
>>>> 
>>>> I have been maintaining a fork for years, so don’t mind helping maintain 
>>>> this project.
>>>> 
>>>>> On Apr 28, 2024, at 4:08 AM, Mick Semb Wever wrote:
>>>>> 
>>>>>> A separate subproject like dtest and the Java driver would maybe help 
>>>>>> address concerns with introducing a gradle build system and Kotlin.
>>>>> 
>>>>> 
>>>>> 
>>>>> Nit, dtest is a separate repository, not a subproject.  The Java driver 
>>>>> is one repository to be in the Drivers subproject.  Esoteric maybe, but 
>>>>> ASF terminology we need to get right :-) 
>>>>> 
>>>>> To your actual point (IIUC), it can be a separate repository and not a 
>>>>> separate subproject.  This permits it to be kotlin+gradle, while not 
>>>>> having the formal subproject procedures.  It still needs 3 responsible 
>>>>> committers from the get-go to show sustainability.  Would 
>>>>> easy-cass-stress have releases, or always be a codebase users work 
>>>>> directly with ?
>>>>> 
>>>>> Can/Should we first demote cassandra-stress by moving it out to a 
>>>>> separate repo ? 
>>>>>  ( Can its imports work off non-snapshot dependencies ? )
>>>>> It might feel like an extra prerequisite step to introduce, but maybe it 
>>>>> helps move the needle forward and make this conversation a bit 
>>>>> easier/obvious.
>>>>> 
>>>> 



Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-15 Thread Bernardo Botella
 2024 at 11:14 AM Josh McKenzie >> <mailto:jmcken...@apache.org>> wrote:
>>> 
>>> The CEP for the sidecar has stalled. The sidecar itself is very much alive 
>>> and a thing.
>>> 
>>> CEP != artifact.
>>> 
>>> We should definitely clean that up though.
>>> 
>>> On Mon, Sep 30, 2024, at 10:59 AM, Dinesh Joshi wrote:
>>>> Patrick, could you please elaborate? The Sidecar has been a thing for a 
>>>> while now.
>>>> 
>>>> On Mon, Sep 30, 2024 at 7:51 AM Patrick McFadin wrote:
>>>> I made the mistake of asking two things in one email. 
>>>> 
>>>> First thing I asked. Sidecar? Stalled CEP so why is this being talked 
>>>> about like this is a thing?
>>>> 
>>>> On Mon, Sep 30, 2024 at 7:21 AM Benedict wrote:
>>>> 
>>>> Sorry Bernardo, you may have misunderstood me. I don’t have any concerns, 
>>>> I was suggesting a possible future scenario where CDC for Kafka via 
>>>> sidecar is changed to use a hypothetical future topic subscription service 
>>>> provided by C*. It was meant to show that this CEP may be easily decoupled 
>>>> from any future evolution in this area. 
>>>> 
>>>> 
>>>>> On 30 Sep 2024, at 14:58, Bernardo Botella wrote:
>>>>> Thanks everyone for the comments.
>>>> 
>>>>> 
>>>>> Patrick:
>>>>> The proposal includes a “best effort” approach for deduplication (some 
>>>>> details can be found on the Digest class comments on the PR here 
>>>>> https://github.com/apache/cassandra-analytics/pull/87/files#diff-3a09caecc1da13419d92cde56a7cfc7d253faac08182e6c2768b3d32c015de82R185-R193
>>>>>  ). That alone won’t eliminate all the duplicates, but as Josh points 
>>>>> out, it moves the line to something way easier to handle for consumers, 
>>>>> and definitely in the direction we should aim for. Accord is definitely 
>>>>> something this contribution will benefit from, as it will move that line 
>>>>> even further.
>>>>> 
>>>>> Benedict:
>>>>> If I understand it correctly, your concern is that Kafka is somewhat the 
>>>>> hardcoded option for a CDC stream being published? The proposal 
>>>>> introduces a concept of data sources and sinks 
>>>>> (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=323488575#CEP44:KafkaintegrationforCassandraCDCusingSidecar-SourcesandSinks),
>>>>> with Kafka being the first implemented data sink. That means that the 
>>>>> actual Kafka output should (will) be something pluggable.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Sep 30, 2024, at 5:43 AM, Josh McKenzie wrote:
>>>>>> 
>>>>>>> I don't see much on how this would be handled other than "left to the 
>>>>>>> end user to figure out." 
>>>>>> My immediate thought when I read that was "Yes. But it's moving where we 
>>>>>> draw the line of 'left to the end user to figure out' much further than 
>>>>>> it was before".
>>>>>> 
>>>>>> This should only be necessary in edge cases w/extended severe degraded 
>>>>>> availability where you can't hit QUORUM w/this design. So we go from 
>>>>>> "De-dupe literally everything o ye' user" to "de-dupe a small fraction 
>>>>>> of a % of the time when things really go off the rails".
>>>>>> 
>>>>>> It still leaves the burden of processing potential duplicates 
>>>>>> downstream, so some complexity burden on the users remains if they have 
>>>>>> no tolerance for processing duplicate messages, however the underlying 
>>>>>> machine resource utilization (from "dedupe everything" to "dedupe a 
>>>>>> small % of things") is pretty massively shifted by this design change. 
>>>>>> That, and using the hash of the mutation the way the extended design 
>>>>>> does is something a downstream consumer could also do on their side to 
>>>>>> ensure anything that 
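
[Editor's sketch] The "best effort" deduplication idea discussed above (hash each mutation and skip payloads whose hash was seen recently) can be illustrated as follows. This is not the Digest class from the linked PR: the SHA-256 choice, the bounded LRU-style window, and the BestEffortDeduper name are assumptions made for the example.

```python
import hashlib
from collections import OrderedDict


class BestEffortDeduper:
    """Remember the hashes of recently seen payloads in a bounded window."""

    def __init__(self, max_tracked: int = 10_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._max_tracked = max_tracked

    def is_new(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            # Seen recently: refresh its position and report a duplicate.
            self._seen.move_to_end(digest)
            return False
        self._seen[digest] = None
        if len(self._seen) > self._max_tracked:
            # Evict the least recently seen hash to keep memory bounded.
            self._seen.popitem(last=False)
        return True


deduper = BestEffortDeduper(max_tracked=2)
events = [b"mutation-a", b"mutation-b", b"mutation-a", b"mutation-c", b"mutation-b"]
delivered = [e for e in events if deduper.is_new(e)]
```

With a window of two hashes, the repeated "mutation-a" is dropped but the second "mutation-b" slips through after its hash is evicted, which shows why such a scheme reduces duplicates without eliminating them.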

Re: [VOTE] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-18 Thread Bernardo Botella
+1 nb

> On Oct 17, 2024, at 5:52 PM, Josh McKenzie  wrote:
> 
> +1
> 
> On Thu, Oct 17, 2024, at 2:51 PM, Yifan Cai wrote:
>> +1 nb
>> 
>> 
>> From: Brandon Williams 
>> Sent: Thursday, October 17, 2024 11:47:13 AM
>> To: dev@cassandra.apache.org 
>> Subject: Re: [VOTE] CEP-44: Kafka integration for Cassandra CDC using Sidecar
>>  
>> +1
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Thu, Oct 17, 2024 at 1:08 PM James Berragan  wrote:
>> >
>> > Hi everyone,
>> >
>> > I would like to start the voting for CEP-44 as all the feedback in the 
>> > discussion thread seems to be addressed.
>> >
>> > Proposal: CEP-44: Kafka integration for Cassandra CDC using Sidecar
>> > Discussion thread: 
>> > https://lists.apache.org/thread/8k6njsnvdbmjb6jhyy07o1s7jz8xp1qg
>> >
>> > As per the CEP process documentation, this vote will be open for 72 hours 
>> > (longer if needed).
>> >
>> > Thanks!
>> > James.



Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-21 Thread Bernardo Botella
>>>>>>>>>>>>>> On Wed, Oct 16, 2024 at 11:16 AM Yifan Cai wrote:
>>>>>>>>>>>>>>> "WITH ALL" seems to be a natural addition to the directives. 
>>>>>>>>>>>>>>> What do you think about adding the fifth keyword ALL to retain 
>>>>>>>>>>>>>>> all fields of the table schema? 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> For instance, CREATE TABLE new_table LIKE original_table WITH 
>>>>>>>>>>>>>>> ALL, it replicates options, indexes, triggers, constraints and 
>>>>>>>>>>>>>>> any applicable kinds that are introduced in the future.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Yifan
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Oct 16, 2024 at 7:46 AM guo Maxwell wrote:
>>>>>>>>>>>>>>>> Discussed with Bernardo on Slack, and +1 to his advice on 
>>>>>>>>>>>>>>>> adding a fourth keyword. 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The keyword would be CONSTRAINTS; any more suggestions?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> guo Maxwell wrote on Wed, Oct 16, 2024 at 9:55 AM:
>>>>>>>>>>>>>>>>> Hi Yifan, 
>>>>>>>>>>>>>>>>> Thanks for bringing this up. The SELECT permission on the 
>>>>>>>>>>>>>>>>> original table is needed. MySQL and PG both mention 
>>>>>>>>>>>>>>>>> this, and I also specifically noticed this in my code.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I probably missed this in the CEP documentation. 😅
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yifan Cai wrote on Wed, Oct 16, 2024 at 07:46:
>>>>>>>>>>>>>>>>>> Thanks for creating the CEP! I think it is missing 
>>>>>>>>>>>>>>>>>> Bernardo's comment on "the need for read permissions on the 
>>>>>>>>>>>>>>>>>> source table". 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> CreateTableStatement does not check the permissions outside 
>>>>>>>>>>>>>>>>>> of the enclosing keyspace. Having the SELECT permission on 
>>>>>>>>>>>>>>>>>> the original table is a requirement for CREATE TABLE LIKE.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> - Yifan
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Sun, Sep 29, 2024 at 11:01 PM guo Maxwell wrote:
>>>>>>>>>>>>>>>>>>> Hello, everyone , 
>>>>>>>>>>>>>>>>>>> I have finished the doc for CEP-43 for CREATE_TABLE_LIKE 
>>>>>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-43++Apache+Cassandra+CREATE+TABLE++LIKE>
>>>>>>>>>>>>>>>>>>>  as said before, looking forward to your suggestions. 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-17 Thread Bernardo Botella
+1 to CASSANALYTICS

> On Oct 17, 2024, at 1:48 PM, Yifan Cai  wrote:
> 
> yep. CASSANALYTICS sounds good to me. +1
> 
> On Thu, Oct 17, 2024 at 1:45 PM Francisco Guerrero wrote:
>> > Can we include Cassandra Analytics to the infra ticket? I am looking
>> > forward to jira project name suggestions for it...
>> 
>> How about CASSANALYTICS ?
>> 
>> On 2024/10/17 18:50:45 Yifan Cai wrote:
>> > Can we include Cassandra Analytics to the infra ticket? I am looking
>> > forward to jira project name suggestions for it...
>> > 
>> > - Yifan
>> > 
>> > On Thu, Oct 17, 2024 at 10:46 AM Patrick McFadin wrote:
>> > 
>> > > I think it needs a bit more blue. Maybe some pink stripes. I'll file a
>> > > Jira.
>> > >
>> > > On Thu, Oct 17, 2024 at 9:01 AM Brandon Williams wrote:
>> > >
>> > >> Thanks everyone, I've created
>> > >> https://issues.apache.org/jira/browse/INFRA-26212
>> > >>
>> > >> Kind Regards,
>> > >> Brandon
>> > >>
>> > >> On Thu, Oct 17, 2024 at 9:55 AM Ekaterina Dimitrova wrote:
>> > >> >
>> > >> > It would have been nice to be in red italic but… :-)
>> > >> >
>> > >> > Thanks, Brandon, +1 to the suggestion on my end too. Sounds reasonable to me
>> > >> >
>> > >> >
>> > >> > On Thu, 17 Oct 2024 at 17:50, Abe Ratnofsky wrote:
>> > >> >>
>> > >> >> +1 to CASSDRIVER-JAVA et al.
>> > >> >>
>> > >> >> On Oct 17, 2024, at 10:37 AM, Jon Haddad wrote:
>> > >> >>
>> > >> >> Sgtm, let’s ship it
>> > >> >>
>> > >> >> +1
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Thu, Oct 17, 2024 at 4:09 AM Brandon Williams wrote:
>> > >> >>>
>> > >> >>> Nobody wants to suggest a color for this bikeshed?  I'll start:
>> > >> >>> CASSDRIVER-. I'd like to get on this sooner than later since
>> > >> >>> during the time we wait the situation worsens.
>> > >> >>>
>> > >> >>> Kind Regards,
>> > >> >>> Brandon
>> > >> >>>
>> > >> >>> On Wed, Oct 2, 2024 at 5:07 PM Brandon Williams wrote:
>> > >> >>> >
>> > >> >>> > I think we just need to ask infra to create the jira instances, but I
>> > >> >>> > guess we need to have some kind of consistent naming scheme to help
>> > >> >>> > identify them?
>> > >> >>> >
>> > >> >>> > Kind Regards,
>> > >> >>> > Brandon
>> > >> >>> >
>> > >> >>> > On Wed, Oct 2, 2024 at 1:02 PM Francisco Guerrero wrote:
>> > >> >>> > >
>> > >> >>> > > +1 too on the points brought by Mick, we need more visibility into
>> > >> >>> > > subprojects. For starters, we should look into integrating Qbot
>> > >> >>> > > notifications in #cassandra-dev and #cassandra-noise for
>> > >> >>> > > CASSANDRASC tickets. Let me know if I can help with that.
>> > >> >>> > >
>> > >> >>> > > On 2024/10/02 17:39:28 Yifan Cai wrote:
>> > >> >>> > > > +1 on all the points raised by Mick. Please let me know if there is
>> > >> >>> > > > anything I can help with.
>> > >> >>> > > >
>> > >> >>> > > > - Yifan
>> > >> >>> > > >
>> > >> >>> > > > On Wed, Oct 2, 2024 at 8:13 AM Josh McKenzie wrote:
>> > >> >>> > > >
>> > >> >>> > > > > - Qbot notifications in #cassandra-dev and #cassandra-noise, as well as
>> > >> >>> > > > > in any subproject channels
>> > >> >>> > > > > - some cadence of dev@ ML updates, e.g. on activities, or dependency
>> > >> >>> > > > > changes, etc
>> > >> >>> > > > > - regular releases
>> > >> >>> > > > >
>> > >> >>> > > > > Agree on all 3 points. Also - I've *definitely* fallen off on the project
>> > >> >>> > > > > updates for mainline; I'll pick that back up after ApacheCon.
>> > >> >>> > > > >
>> > >> >>> > > > >
>> > >> >>> > > > > On Wed, Oct 2, 2024, at 1:57 AM, Mick Semb Wever wrote:
>> > >> >>> > > > >
>> > >> >>> > > > > To play devil's advocate here, it's important that the subprojects don't
>> > >> >>> > > > > lose visibility and silo from the rest of the project.
>> > >> >>> > > > >
>> > >> >>> > > > > There are different ways to solve this, and lumping everything into one
>> > >> >>> > > > > jira project is a messy and poor way of doing it.  But as the sidecar has
>> > >> >>> > > > > shown us, subproject activity should somehow be made noisy to us.  We need
>> > >> >>> > > > > sorts of common spaces in the project.
>> > >> >>> > > > >
>> > >> >>> > > > > If we go the separate jira project route, then some suggestions to help
>> > >> >>> > > > > with this are:
>> > >> >>> > > > > - Qbot notifications in #cassandra-dev and #cassandra-noise

Re: [VOTE] CEP-42: Constraints Framework

2024-11-04 Thread Bernardo Botella
fiers unless quoted, this can 
>>>>>>> complicate data definition declarations. We should aim to avoid adding 
>>>>>>> new reserved keywords where possible. Here are a couple of alternatives:
>>>>>>> 
>>>>>>> 1.1 Inline Constraint Definition
>>>>>>> 
>>>>>>> We could eliminate the keyword "CONSTRAINT." Instead, similar to data 
>>>>>>> masking, constraints could be defined using "CONSTRAINED WITH." For 
>>>>>>> example, in the following code, r_value_range_lower_bound and 
>>>>>>> r_value_range_upper_bound are constraint names, followed immediately by 
>>>>>>> their expressions, with multiple constraints connected using "AND".
>>>>>>> 
>>>>>>> CREATE TABLE rgb (
>>>>>>>   name text PRIMARY KEY,
>>>>>>>   r int CONSTRAINED WITH r_value_range_lower_bound CHECK r >= 0 AND 
>>>>>>> r_value_range_upper_bound CHECK r < 256,
>>>>>>>   ...
>>>>>>> );
>>>>>>> 1.2 Special Symbol
>>>>>>> 
>>>>>>> Another option is to use a special symbol to differentiate from 
>>>>>>> identifiers, such as "@CONSTRAINT." However, since there is currently 
>>>>>>> no annotation-like concept in CQL, this might confuse users.
>>>>>>> 
>>>>>>> CREATE TABLE rgb (
>>>>>>>   name text PRIMARY KEY,
>>>>>>>   r int,
>>>>>>>   ...
>>>>>>>   @CONSTRAINT r_value_range_lower_bound CHECK r >= 0,
>>>>>>>   @CONSTRAINT r_value_range_upper_bound CHECK r < 256,
>>>>>>>   ...
>>>>>>> );
>>>>>>> 2. Constraint Name
>>>>>>> CEP-42 states, "Name of the constraint is optional. If it is not 
>>>>>>> provided, a name is generated for the constraint." 
>>>>>>> 
>>>>>>> However, based on the actual statements defining constraints, I believe 
>>>>>>> names should be mandatory for clarity and usability. System-generated 
>>>>>>> names often lack descriptiveness.
>>>>>>> 
>>>>>>> 3. Cross-Column Constraints
>>>>>>> CEP-42 proposes allowing constraints that compare multiple columns. For 
>>>>>>> example,
>>>>>>> 
>>>>>>> CREATE TABLE keyspace.table (
>>>>>>>   p1 int,
>>>>>>>   p2 int,
>>>>>>>   ...,
>>>>>>>   CONSTRAINT [name] CHECK (p1 != p2)
>>>>>>> );
>>>>>>> Such constraints can be problematic due to their referential nature. 
>>>>>>> Consider scenarios where column p2 is dropped, or when insert/update 
>>>>>>> operations include only partial values (e.g., only inserting p1). 
>>>>>>> Should the query result in a read (before write), or should it fail due 
>>>>>>> to incomplete values?
>>>>>>> 
>>>>>>> For simplicity, I propose that, at least for the initial iteration, we 
>>>>>>> exclude support for cross-column constraints. In other words, 
>>>>>>> constraints should only check the values of individual columns.
>>>>>>> 
>>>>>>> - Yifan
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Sep 19, 2024 at 11:46 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>> Thanks for the update. My inbox search failed me :D
>>>>>>>> 
>>>>>>>> On Thu, Sep 19, 2024 at 11:31 AM Bernardo Botella <conta...@bernardobotella.com> wrote:
>>>>>>>>> Hi Patrick,
>>>>>>>>> 
>>>>>>>>> Thanks for taking a look at this and keeping the house tidy.
>>>>>>>>> 
>>>>>>>>> I announced the voting results on a separate thread:
>>>>>>>>> lists.apache.org
>>>>>>>>> 
>>>>>>>>>  
>>>>>>>>> <https://li

Re: [VOTE] CEP-42: Constraints Framework

2024-11-04 Thread Bernardo Botella
My comment comes from the fact that I find it counterintuitive to have 
constraints defined at column level that reference other columns. From your 
example, referencing column b from the definition of column a looks off to 
me. Besides, validation can become trickier. For instance, when validating 
the constraint on "a int CONSTRAINED WITH a != b", we still don't know whether b 
exists.
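
A minimal sketch of this point, not taken from the CEP or the PR: the function names and the dictionary model of a write are assumptions made purely for illustration. On the write path only the columns present in the mutation are visible, so a single-column check can always be decided from the write itself, while a cross-column check may be undecidable without reading the stored row first.

```python
from typing import Callable, Dict, Optional

# Hypothetical model: a write is a dict holding only the columns it sets.
Write = Dict[str, int]


def check_single_column(write: Write, column: str,
                        predicate: Callable[[int], bool]) -> Optional[bool]:
    """Return True/False if the column is in the write, None if it is absent."""
    if column not in write:
        return None  # nothing written to this column, so nothing to validate
    return predicate(write[column])


def check_cross_column(write: Write, left: str, right: str,
                       predicate: Callable[[int, int], bool]) -> Optional[bool]:
    """Return True/False only if both columns are in the write, None otherwise."""
    if left not in write or right not in write:
        return None  # undecidable without reading the stored row
    return predicate(write[left], write[right])


if __name__ == "__main__":
    full_write = {"id": 1, "a": 5, "b": 7}
    partial_write = {"id": 1, "a": 5}  # e.g. an UPDATE that only sets a

    def not_equal(x: int, y: int) -> bool:
        return x != y

    print(check_cross_column(full_write, "a", "b", not_equal))       # True
    print(check_cross_column(partial_write, "a", "b", not_equal))    # None
    print(check_single_column(partial_write, "a", lambda v: v > 0))  # True
```

The None result for the partial write is the case in question: the server would have to either reject the write or read b before writing.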

Considering those two items, I do think there is value in keeping the "table 
level" constraints (basically, constraints defined outside of the column 
definition), which can potentially be used for such cross-column constraints.

As for the naming, the current proposal (and the implementation in the PR) has the 
constraint name as optional, generating a name for the constraint if none is 
passed. I think we are discussing here whether we should keep this, or 
remove the name completely along with the "table level" constraints.

Bernardo

> On Nov 4, 2024, at 10:21 AM, Štefan Miklošovič  wrote:
> 
> Could you give some concrete examples of potential problems when introducing 
> general / cross column constraints?
> 
> When I have a table where I do not want two columns to contain the same 
> values for a given primary key (and we do not want to deal with a tuple as 
> suggested before), why would this not be possible?
> 
> create table ks.tb (
> id int,
> a int CONSTRAINED WITH a != b,
> b int CONSTRAINED WITH b != a, 
> primary key (id)
> )
> 
> ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a != b AND b CONSTRAINED WITH b != 
> a;
> 
> Maybe one constraint would be enough? It depends on whether we look at the 
> constraints for just that column or for all columns when checking for potential 
> constraint violations ... 
> 
> Can you give more (please, practical) examples where we would have problems 
> with unnamed constraints?
> 
> In general, I think that we should not deliver something which is not optimal 
> to use on delivery in the first place. What if "general constraints" never 
> come? Then we would end up with something which is not so comfortable to use. 
> If we want to choose named constraints without any practical example which 
> would use them, I think we should deliver general constraints with the CEP from 
> the start; otherwise it will be half-baked, without any guarantee we will 
> see the other half of it.
> 
> Also, if you cannot live without named constraints, I think that they 
> _could_ be named, but their name would be hidden. How it appears in CQL is 
> just syntactic sugar. You could indeed have them named, it is just you _do 
> not have to_. For example when I have 
> 
> create table ks.tb (
> id int,
> a int CONSTRAINED WITH a > 10,
> b int CONSTRAINED WITH b < 50,
> primary key (id)
> )
> 
> then there is nothing wrong with the internal representation containing 
> constraints named "ks_tb_a" and "ks_tb_b"; I just do not need 
> to use those names in CQL every single time I interact with them. So, keep 
> the names if you want, and if you think that multicolumn constraints 
> absolutely need them; a user just would not need to deal with them 
> every time, which would be easier on the eye and better UX.
> 
> On Mon, Nov 4, 2024 at 4:58 PM Bernardo Botella <conta...@bernardobotella.com> wrote:
>> Hi everyone,
>> 
>> Thanks a lot for the constructive discussion! Sorry for coming to it so late 
>> in the game, I’ve been out this past week, but I’m back up and running.
>> 
>> Really interesting ideas. So, to recap:
>> 
>> I do agree that we can keep cross-column constraints out of the initial 
>> implementation. The CEP calls them "General table constraints", and it 
>> also states that they should be part of a future contribution with these 
>> words: "The framework can (and will) be extended with other Constraints". 
>> So, yeah, not supporting them right now was already part of the plan :-)
>> 
>> For the second (and more interesting) part of the discussion, I completely 
>> agree that we need to avoid adding new reserved words whenever possible, 
>> but that shouldn't harm the "expressivity" and readability of CQL by 
>> overloading the meanings of words too much. Now, having said that, I think you 
>> folks are proposing really interesting alternatives.
>> 
>> If we go with the CONSTRAINED WITH -> This allows us to remove the need of 
>> [constraint names] at the expense of removing the constraints definitions 
>> outside of the column definition. This may come back to hurt us in the 
>> future by, for example, making it harder to include those “General t

Re: [VOTE] CEP-37: Repair scheduling inside C*

2024-11-05 Thread Bernardo Botella
+1 (non binding)

> On Nov 5, 2024, at 1:28 PM, Jaydeep Chovatia wrote:
> 
> Hi Everyone,
> 
> I would like to start the voting for CEP-37 as all the feedback in the 
> discussion thread seems to be addressed.
> 
> Proposal: CEP-37 Repair Scheduling Inside Cassandra 
> 
> Discussion thread: 
> https://lists.apache.org/thread/nl8rmsyxxovryl3nnlt4mzrj9t0x66ln
> 
> As per the CEP process documentation, this vote will be open for 72 hours 
> (longer if needed).
> 
> Thanks,
> Jaydeep



Re: Backporting CASSANDRA-17812 to 4.x

2024-11-05 Thread Bernardo Botella
+1 on backporting it

> On Nov 5, 2024, at 8:51 AM, Štefan Miklošovič  wrote:
> 
> Backporting in such a way that all auth requests will still go to the same 
> request executor as before is OK for me.
> 
> On Tue, Nov 5, 2024 at 3:32 PM J. D. Jordan wrote:
>> If I read the ticket correctly, this is preventing bcrypt of incoming 
>> credentials from causing a DoS?
>> I think that’s reasonable to backport.  If we want to be conservative it 
>> could be backported with added code that keeps the current behavior by 
>> default?
>> 
>>> On Nov 5, 2024, at 7:43 AM, Josh McKenzie wrote:
>>> 
>>> 
>>> I'm neutral to the backport. In terms of the letter of the law, I can see 
>>> the argument either way of it being an improvement or a bugfix.
>>> 
>>> Definitely wouldn't -1 a backport.
>>> 
>>> On Tue, Nov 5, 2024, at 7:23 AM, Mick Semb Wever wrote:
>>>> Can you please put the ticket description in the email.  Saves us having 
>>>> to follow the link to know what you're talking about.
>>>> 
>>>> Yes to backporting this.
>>>> 
>>>> On Tue, 5 Nov 2024 at 10:27, Štefan Miklošovič wrote:
>>>>> Hello,
>>>>> 
>>>>> I want to ask if there are objections for backporting CASSANDRA-17812 (1) 
>>>>> to 4.0.x and 4.1.x.
>>>>> 
>>>>> There is a question already in that ticket about backporting from another 
>>>>> person and we keep being asked about this a lot. It seems to me that while 
>>>>> this is technically an improvement, it is so valuable that we should make 
>>>>> an exception here. 
>>>>> 
>>>>> It is even security related.
>>>>> 
>>>>> (1) https://issues.apache.org/jira/browse/CASSANDRA-17812
>>>>> 
>>>>> Regards
>>> 



Re: [VOTE] CEP-42: Constraints Framework

2024-11-11 Thread Bernardo Botella
After an offline discussion (thanks Yifan and Stefan!), we have decided to drop 
support for "table level" constraints, at least for this initial 
implementation, and to support defining constraints along with column definitions. 
This removes the need for new reserved keywords (CONSTRAINT no longer has to be 
a reserved keyword) and simplifies the grammar for altering and dropping 
constraints. 
Another reason is that supporting cross-column constraints comes with 
further challenges and is prone to introducing "read before write" patterns in 
some cases. As such, we decided it is better to leave that for a future PR in 
case there is demand for that feature. Having said that, the grammar now supports:

// Create constraints at column level
CREATE TABLE keyspace.color (
  name text PRIMARY KEY,
  r int CHECK r >= 0 AND r < 256,
  g int,
  b int CHECK b > 10
);

// Drop the created constraints on a column
ALTER TABLE keyspace.color ALTER r DROP CHECK;

// Alter the constraints on a column
ALTER TABLE keyspace.color ALTER r CHECK r >= 10 AND r < 20;
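
As a rough illustration of how such column-level checks could be enforced on the write path, here is a minimal Python sketch. It is not the implementation from the PR: the ColumnChecks class, its method names and the (operator, bound) representation of a condition are all assumptions made for this example. Conditions attached to a column are combined with AND, and only the columns present in a write are validated, so no read before write is needed.

```python
import operator
from typing import Dict, List, Tuple

# Comparison operators used in the CHECK examples, mapped to Python functions.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "!=": operator.ne,
}

Condition = Tuple[str, int]  # (operator, bound), e.g. (">=", 10)


class ColumnChecks:
    """Hypothetical in-memory registry of per-column CHECK conditions."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Condition]] = {}

    def set_check(self, column: str, conditions: List[Condition]) -> None:
        # Models defining or altering a CHECK: replaces any existing conditions.
        self._checks[column] = list(conditions)

    def drop_check(self, column: str) -> None:
        # Models DROP CHECK: removes all conditions on the column, if any.
        self._checks.pop(column, None)

    def validate(self, write: Dict[str, int]) -> List[str]:
        """Return one message per violated condition among the written columns."""
        violations = []
        for column, value in write.items():
            for op, bound in self._checks.get(column, []):
                if not OPS[op](value, bound):
                    violations.append(
                        f"{column} {op} {bound} failed for value {value}")
        return violations


if __name__ == "__main__":
    checks = ColumnChecks()
    checks.set_check("r", [(">=", 10), ("<", 20)])
    checks.set_check("b", [(">", 10)])
    print(checks.validate({"r": 25, "g": 1, "b": 11}))
    # ['r < 20 failed for value 25']
    checks.drop_check("r")
    print(checks.validate({"r": 25}))
    # []
```

A real implementation would reject the write with an error rather than return messages; the list is used here only to keep the sketch easy to inspect.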

Thanks everyone! 
Bernardo

> On Nov 4, 2024, at 1:15 PM, Bernardo Botella wrote:
> 
> My comment comes from the fact that I find it counterintuitive to have 
> constraints defined at column level that reference other columns. From your 
> example, referencing column b from the definition of column a looks off to 
> me. Besides, validation can become trickier. For instance, when validating 
> the constraint on "a int CONSTRAINED WITH a != b", we still don't know whether b 
> exists.
> 
> Considering those two items, I do think there is value in keeping the "table 
> level" constraints (basically, constraints defined outside of the column 
> definition), which can potentially be used for such cross-column constraints.
> 
> As for the naming, the current proposal (and the implementation in the PR) has the 
> constraint name as optional, generating a name for the constraint if none is 
> passed. I think we are discussing here whether we should keep this, or 
> remove the name completely along with the "table level" constraints.
> 
> Bernardo
> 
>> On Nov 4, 2024, at 10:21 AM, Štefan Miklošovič wrote:
>> 
>> Could you give some concrete examples of potential problems when introducing 
>> general / cross column constraints?
>> 
>> When I have a table where I do not want two columns to contain the same 
>> values for a given primary key (and we do not want to deal with a tuple as 
>> suggested before), why would this not be possible?
>> 
>> create table ks.tb (
>> id int,
>> a int CONSTRAINED WITH a != b,
>> b int CONSTRAINED WITH b != a, 
>> primary key (id)
>> )
>> 
>> ALTER TABLE ks.tb ALTER a CONSTRAINED WITH a != b AND b CONSTRAINED WITH b 
>> != a;
>> 
>> Maybe one constraint would be enough? It depends on whether we look at the 
>> constraints for just that column or for all columns when checking for 
>> potential constraint violations ... 
>> 
>> Can you give more (please, practical) examples where we would have problems 
>> with unnamed constraints?
>> 
>> In general, I think that we should not deliver something which is not 
>> optimal to use on delivery in the first place. What if "general constraints" 
>> never come? Then we would end up with something which is not so comfortable 
>> to use. If we want to choose named constraints without any practical example 
>> which would use them, I think we should deliver general constraints with the 
>> CEP from the start; otherwise it will be half-baked, without any guarantee 
>> we will see the other half of it.
>> 
>> Also, if you cannot live without named constraints, I think that they 
>> _could_ be named, but their name would be hidden. How it appears in CQL is 
>> just syntactic sugar. You could indeed have them named, it is just you _do 
>> not have to_. For example when I have 
>> 
>> create table ks.tb (
>> id int,
>> a int CONSTRAINED WITH a > 10,
>> b int CONSTRAINED WITH b < 50,
>> primary key (id)
>> )
>> 
>> then there is nothing wrong with the internal representation containing 
>> constraints named "ks_tb_a" and "ks_tb_b"; I just do not 
>> need to use those names in CQL every single time I interact with them. 
>> So, keep the names if you want, and if you think that multicolumn constraints 
>> absolutely need them; a user just would not need to deal with them 
>> every time, which would be easier on the eye and better UX.
>> 
>> On Mon, Nov 4, 2024 at 4:58 PM Bernardo Bot

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Bernardo Botella
o from a 
>>>> generalized Kafka producer engine to something specific to a particular 
>>>> use case. I don't see much on how this would be handled other than "left 
>>>> to the end user to figure out." 
>>>> 
>>>> There is also little mention of where the increased resource load would be 
>>>> handled. 
>>>> 
>>>> This has been discussed many times before, but is it time to introduce the 
>>>> concept of an elected leader for a token range for this type of operation? 
>>>> It would eliminate a ton of problems that need to be managed when bridging C* 
>>>> to a system like Kafka. The last time it was discussed in earnest was for 
>>>> KIP-30: 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
>>>>  
>>>> 
>>>> Patrick
>>>> 
>>>> On Sat, Sep 28, 2024 at 11:44 AM Jon Haddad <j...@rustyrazorblade.com> wrote:
>>>> Yes! I’m really looking forward to trying this out. The CEP looks really 
>>>> well thought out. I think this will make CDC a lot more useful for a lot 
>>>> of teams. 
>>>> Jon
>>>> 
>>>> 
>>>> On Fri, Sep 27, 2024 at 4:23 PM Josh McKenzie <jmcken...@apache.org> wrote:
>>>> 
>>>> Really excited to see this hit the ML James.
>>>> 
>>>> As author of the base CDC (get your stones ready for throwing :D) and 
>>>> someone moderately involved in the CEP here, definitely welcome any 
>>>> questions. CDC is a thorny problem in a multi-replica distributed system 
>>>> like this.
>>>> 
>>>> On Fri, Sep 27, 2024, at 5:40 PM, James Berragan wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> Wiki: 
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-44%3A+Kafka+integration+for+Cassandra+CDC+using+Sidecar
>>>>> 
>>>>> We would like to propose this CEP for adoption by the community.
>>>>> 
>>>>> CDC is a common technique in databases but right now there is no 
>>>>> out-of-the-box solution to do this easily and at scale with Cassandra. 
>>>>> Our proposal is to build a fully-fledged solution into the Apache 
>>>>> Cassandra Sidecar. This comes with a number of benefits:
>>>>> - Sidecar is an official part of the existing Cassandra eco-system.
>>>>> - Sidecar runs co-located with Cassandra instances and so scales with the 
>>>>> cluster size.
>>>>> - Sidecar can access the underlying Cassandra database to store CDC 
>>>>> configuration and the CDC state in a special table.
>>>>> - Running in the Sidecar does not require additional external resources 
>>>>> to run.
>>>>> 
>>>>> The core CDC module we anticipate will be pluggable and re-usable, it is 
>>>>> available for review here: 
>>>>> https://github.com/apache/cassandra-analytics/pull/87. The remaining 
>>>>> Sidecar code will follow.
>>>>> 
>>>>> As a reminder, please keep the discussion here on the dev list vs. in the 
>>>>> wiki, as we’ve found it easier to manage via email.
>>>>> 
>>>>> Sincerely,
>>>>> James Berragan
>>>>> Bernardo Botella Corbi
>>>>> Yifan Cai
>>>>> Jyothsna Konisa



Re: [VOTE] CEP-42: Constraints Framework

2024-09-19 Thread Bernardo Botella
Hi Patrick,

Thanks for taking a look at this and keeping the house tidy.

I announced the voting results on a separate thread:
https://lists.apache.org/thread/v73cwc8p80xx7zpkldjq6w1qrkf2k9h0

As a follow up, this is not stalled, and I’m currently working on a patch that 
will be soon available for review.

Thanks,
Bernardo


> On Sep 19, 2024, at 11:20 AM, Patrick McFadin  wrote:
> 
> I'm going to cap this thread. Vote passes with no binding -1s.
> 
> On Tue, Jul 2, 2024 at 2:25 PM Jordan West <jorda...@gmail.com> wrote:
>> +1
>> 
>> On Tue, Jul 2, 2024 at 12:15 Francisco Guerrero <fran...@apache.org> wrote:
>>> +1
>>> 
>>> On 2024/07/02 18:45:33 Josh McKenzie wrote:
>>> > +1
>>> > 
>>> > On Tue, Jul 2, 2024, at 1:18 PM, Abe Ratnofsky wrote:
>>> > > +1 (nb)
>>> > > 
>>> > >> On Jul 2, 2024, at 12:15 PM, Yifan Cai <yc25c...@gmail.com> wrote:
>>> > >> 
>>> > >> +1 on CEP-42.
>>> > >> 
>>> > >> - Yifan
>>> > >> 
>>> > >> On Tue, Jul 2, 2024 at 5:17 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>> > >>> +1
>>> > >>> 
>>> > >>> On Tue, Jul 2, 2024 at 5:06 AM <shailajako...@icloud.com> wrote:
>>> > >>>> +1
>>> > >>>> 
>>> > >>>> 
>>> > >>>>> On Jul 1, 2024, at 8:34 PM, Doug Rohrer <droh...@apple.com> wrote:
>>> > >>>>> 
>>> > >>>>> +1 (nb) - Thanks for all of the suggestions and Bernardo for 
>>> > >>>>> wrangling the CEP into shape!
>>> > >>>>> 
>>> > >>>>> Doug
>>> > >>>>> 
>>> > >>>>>> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi <djo...@apache.org> wrote:
>>> > >>>>>> 
>>> > >>>>>> +1
>>> > >>>>>> 
>>> > >>>>>> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> > >>>>>>> Hi,
>>> > >>>>>>> 
>>> > >>>>>>> I am +1 on CEP-42 with the latest updates to the CEP to clarify 
>>> > >>>>>>> syntax, error messages, constraint naming and generated naming, 
>>> > >>>>>>> alter/drop, describe etc.
>>> > >>>>>>> 
>>> > >>>>>>> I think this now tracks very closely to how other SQL databases 
>>> > >>>>>>> define constraints and the syntax is easily extensible to 
>>> > >>>>>>> multi-column and multi-table constraints.
>>> > >>>>>>> 
>>> > >>>>>>> Ariel
>>> > >>>>>>> 
>>> > >>>>>>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>>> > >>>>>>>> With all the feedback that came in the discussion thread after 
>>> > >>>>>>>> the call for votes, I’d like to extend the period another 72 
>>> > >>>>>>>> hours starting today.
>>> > >>>>>>>> 
>>> > >>>>>>>> As before, a vote passes if there are at least 3 binding +1s and 
>>> > >>>>>>>> no binding vetoes.
>>> > >>>>>>>> 
>>> > >>>>>>>> Thanks,
>>> > >>>>>>>> Bernardo Botella
>>> > >>>>>>>> 
>>> > >>>>>>>>> On Jun 24, 2024, at 7:17 AM, Bernardo Botella <conta...@bernardobotella.com> wrote:
>>> > >>>>>>>>> 
>>> > >>>>>>>>> Hi everyone,
>>> > >>>>>>>>> 
>>> > >>>>>>>>> I would like to start the voting for CEP-42.
>>> > >>>>>>>>> 
>>> > >>>>>>>>> Proposal: 
>>> > >>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>>> > >>>>>>>>> Discussion: 
>>> > >>>>>>>>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>>> > >>>>>>>>> 
>>> > >>>>>>>>> The vote will be open for 72 hours. A vote passes if there are 
>>> > >>>>>>>>> at least 3 binding +1s and no binding vetoes.
>>> > >>>>>>>>> 
>>> > >>>>>>>>> Thanks,
>>> > >>>>>>>>> Bernardo Botella
>>> > >>>>>>>



Re: [VOTE] CEP-43: Apache Cassandra CREATE TABLE LIKE

2024-10-15 Thread Bernardo Botella
Fair point. I will move my feedback there.

> On Oct 15, 2024, at 4:19 PM, Yifan Cai  wrote:
> 
> For further discussions, should we use the discussion thread? This thread is 
> for voting. 
> 
> - Yifan
> 
> On Tue, Oct 15, 2024 at 3:31 PM Bernardo Botella <conta...@bernardobotella.com> wrote:
>> Hi Guo,
>> 
>> Do you think it would make sense to add a fourth keyword after 
>> WITH for constraints? (See CEP-42)
>> 
>> Copying a table without the defined constraints may be useful.
>> 
>> Bernardo
>> 
>> 
>>> On Oct 9, 2024, at 9:32 PM, guo Maxwell <cclive1...@gmail.com> wrote:
>>> 
>>> OK, I think the period can be two weeks.
>>> 
>>> Looking forward to your feedback.
>>> 
>>> On Thu, Oct 10, 2024 at 11:51 AM Abe Ratnofsky <a...@aber.io> wrote:
>>>> With the CEP only being completed last week and the Community over Code 
>>>> conference finishing up this week, I'd love to have a few more days to 
>>>> review and discuss the proposal.
>> 



Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-15 Thread Bernardo Botella
. 
>>>>>> see mysql create table like 
>>>>>> <https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html> and the 
>>>>>> newly created index name is the same as the original table's index 
>>>>>> name.
>>>>>> 
>>>>>> So for Cassandra, I hope it can also support copying index 
>>>>>> and even view/trigger information. And I also hope to be able to flexibly 
>>>>>> decide which information is copied, like pg.
>>>>>> 
>>>>>> Besides, I think the copy can happen between different keyspaces. And 
>>>>>> UDT needs to be taken into account.
>>>>>> 
>>>>>> But as we know, index/view/trigger names are all scoped at the keyspace level, 
>>>>>> so it seems that the newly created index name (or view/trigger 
>>>>>> name) must be different from the original table's; otherwise the names 
>>>>>> would clash.
>>>>>> 
>>>>>> So regarding the above problem, one idea I have is that for newly 
>>>>>> created types, indexes and views, whether in a different keyspace or the same 
>>>>>> keyspace, we first generate random names for them, and then add 
>>>>>> the ability to modify the names (for types/indexes/views/triggers) so 
>>>>>> that users can manually change them.
>>>>>>  
>>>>>> 
>>>>>> On Fri, Sep 20, 2024 at 8:06 AM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>>>> No, I think we still need some discussion on the grammar details after I 
>>>>>>> finish the first version.
>>>>>>> 
>>>>>>> On Fri, Sep 20, 2024 at 2:24 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>> Is this CEP ready for a VOTE thread?
>>>>>>>> 
>>>>>>>> On Sat, Aug 24, 2024 at 8:56 PM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>>>>>> Thank you for your replies, I will prepare a CEP later. 
>>>>>>>>> 
>>>>>>>>> On Tue, Aug 20, 2024 at 2:11 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>>>> +1 This is a CEP
>>>>>>>>>> 
>>>>>>>>>> On Mon, Aug 19, 2024 at 10:50 AM Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>>>>>> Given the fairly large surface area for this, i think it should be 
>>>>>>>>>>> a CEP. 
>>>>>>>>>>> 
>>>>>>>>>>> —
>>>>>>>>>>> Jon Haddad
>>>>>>>>>>> Rustyrazorblade Consulting
>>>>>>>>>>> rustyrazorblade.com <http://rustyrazorblade.com/>
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Aug 19, 2024 at 10:44 AM Bernardo Botella <conta...@bernardobotella.com> wrote:
>>>>>>>>>>>> Definitely a nice addition to CQL.
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking for inspiration at how Postgres and Mysql do that may also 
>>>>>>>>>>>> help with the final design (I like the WITH proposed by Stefan, 
>>>>>>>>>>>> but I would definitely take a look at the INCLUDING keyword 
>>>>>>>>>>>> proposed by Postgres).
>>>>>>>>>>>> https://www.postgresql.org/docs/current/sql-createtable.html
>>>>>>>>>>>> https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html
>>>>>>>>>>>> 
>>>>>>>>>>>> On top of that, and as part of the interesting questions, I would 
>>>>>>>>>>>> like to add the permissions to the mix. Both the question about 
>>>>>>>>>>>> copying them over (with a WITH keyword probably), and the need for 
>>>>>>>>>>>> read permissions on the source table as well.
>>>>>>>>>>>> 
>>>>>>>>>>>> Bernardo
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 19, 2024, at 10:01 AM, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BTW this would be cool to do as well:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ALTER TABLE ks.to_copy LIKE ks.tb WITH INDICES;
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This would mean that if we create a copy of a table, later we can 
>>>>>>>>>>>>> decide that we need indices too, so we might "enrich" that table 
>>>>>>>>>>>>> with indices from the old one without necessarily explicitly 
>>>>>>>>>>>>> re-creating them on that new table. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Aug 19, 2024 at 6:55 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>>>>>>>>>> I think this is an interesting idea worth exploring. I 
>>>>>>>>>>>>>> definitely agree with Benjamin, who raised important questions 
>>>>>>>>>>>>>> which need to be answered first. Also, what about triggers?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It might be rather "easy" to come up with something simple but 
>>>>>>>>>>>>>> it should be a comprehensive solution with predictable behavior 
>>>>>>>>>>>>>> we all agree on.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If the keyspace of the new table does not exist, we would need to 
>>>>>>>>>>>>>> create it first. For simplicity, I would just 
>>>>>>>>>>>>>> require the new table to be created in the same keyspace. We might 
>>>>>>>>>>>>>> iterate on that in the future.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> UDTs are created per keyspace so there is nothing to re-create. 
>>>>>>>>>>>>>> We just need to reference it from a new table, right?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Indexes and MVs are interesting but in theory they might be 
>>>>>>>>>>>>>> re-created too.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Would it be appropriate to use something like this?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND 
>>>>>>>>>>>>>> TRIGGERS  
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Without "WITH" it would just copy a table with nothing else.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Aug 19, 2024 at 6:10 PM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hello, everyone:
>>>>>>>>>>>>>>> As Jira CASSANDRA-7662 
>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-7662> 
>>>>>>>>>>>>>>> describes, we would like to introduce a new grammar, "CREATE 
>>>>>>>>>>>>>>> TABLE LIKE", which simplifies creating new tables that duplicate 
>>>>>>>>>>>>>>> existing ones.
>>>>>>>>>>>>>>> The format may be like: CREATE TABLE  LIKE 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Before I implement this function, do you have any suggestions 
>>>>>>>>>>>>>>> on this?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Looking forward to your reply!
>>>>>>>>>>>> 



Re: [VOTE] CEP-43: Apache Cassandra CREATE TABLE LIKE

2024-10-15 Thread Bernardo Botella
Hi Guo,

Do you think it would make sense to add a fourth keyword after WITH 
for constraints? (See CEP-42)

Copying a table without the defined constraints may be useful.

Bernardo


> On Oct 9, 2024, at 9:32 PM, guo Maxwell wrote:
> 
> OK, I think the period can be two weeks.
> 
> Looking forward to your feedback.
> 
> On Thu, Oct 10, 2024 at 11:51 AM Abe Ratnofsky <a...@aber.io> wrote:
>> With the CEP only being completed last week and the Community over Code 
>> conference finishing up this week, I'd love to have a few more days to 
>> review and discuss the proposal.



Re: [VOTE] CEP-43: Apache Cassandra CREATE TABLE LIKE

2024-11-06 Thread Bernardo Botella
+1 (nb)

Thanks a lot Guo for addressing all the comments!

> On Nov 6, 2024, at 7:21 AM, Štefan Miklošovič  wrote:
> 
> With everything cleared up in the discussion thread (1), I think we can 
> finally vote on this.
> 
> +1
> 
> I welcome everybody to finish this vote or raise other issues in the 
> discussion thread if any.
> 
> (1) https://lists.apache.org/thread/2z09twbrv75rszpxbm1przxxohpjvkkl
> 
> On Mon, Nov 4, 2024 at 2:53 AM guo Maxwell <cclive1...@gmail.com> wrote:
>> At this point I think we can continue the voting for CEP-43, as all the 
>> feedback in the discussion thread seems to be addressed.
>> 
>> Proposal: CEP43-CREATE TABLE LIKE 
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-43++Apache+Cassandra+CREATE+TABLE++LIKE>
>> Discussion thread:  discussion 
>> <https://lists.apache.org/list?dev@cassandra.apache.org:lte=1M:create%20table%20like>
>> 
>> As per the CEP process documentation, this vote will be open for 72 hours 
>> (longer if needed).
>> 
>> On Wed, Oct 16, 2024 at 7:40 AM Bernardo Botella <conta...@bernardobotella.com> wrote:
>>> Fair point. I will move my feedback there.
>>> 
>>>> On Oct 15, 2024, at 4:19 PM, Yifan Cai <yc25c...@gmail.com> wrote:
>>>> 
>>>> For further discussions, should we use the discussion thread? This thread 
>>>> is for voting. 
>>>> 
>>>> - Yifan
>>>> 
>>>> On Tue, Oct 15, 2024 at 3:31 PM Bernardo Botella <conta...@bernardobotella.com> wrote:
>>>>> Hi Guo,
>>>>> 
>>>>> Do you think it would make sense to add a fourth keyword after 
>>>>> WITH for constraints? (See CEP-42)
>>>>> 
>>>>> Copying a table without the defined constraints may be useful.
>>>>> 
>>>>> Bernardo
>>>>> 
>>>>> 
>>>>>> On Oct 9, 2024, at 9:32 PM, guo Maxwell <cclive1...@gmail.com> wrote:
>>>>>> 
>>>>>> OK, I think the period can be two weeks.
>>>>>> 
>>>>>> Looking forward to your feedback.
>>>>>> 
>>>>>> On Thu, Oct 10, 2024 at 11:51 AM Abe Ratnofsky <a...@aber.io> wrote:
>>>>>>> With the CEP only being completed last week and the Community over Code 
>>>>>>> conference finishing up this week, I'd love to have a few more days to 
>>>>>>> review and discuss the proposal.
>>>>> 
>>> 



Re: February 2025 project status update

2025-02-03 Thread Bernardo Botella
Thanks a lot Josh for those Jira filters!! I think they are going to be really 
useful for avoiding hanging and stale tickets, reducing contributors' 
frustration at not getting the attention they deserve.

It is great to see community activity growing!

Bernardo

> On Feb 3, 2025, at 9:35 AM, Josh McKenzie  wrote:
> 
> Welcome to February. An oddly spelled month with a peculiar and inconsistent 
> number of days.
> 
> Releases:
> We find ourselves in the somewhat odd place where release votes passed for 
> 3.0, 3.11, and 4.0, however there were insufficient votes on 4.1 and 5.0 to 
> release those branches. Expect more to come on this front.
> 
> Java driver released 3.12.0 and 3.12.1; check out the user@ announcements 
> here: https://lists.apache.org/thread/pnv3xq1d2sydmxzh128trd797sb4zjc0, 
> https://lists.apache.org/thread/jtgxrx5fhx772lhndvc4d6w3507tff43
> 
> Discussions:
> 10 topics on dev@: Link to ponymail: 
> https://lists.apache.org/list.html?dev@cassandra.apache.org
> - Patrick is shopping around for contributors to another Cassandra 
> Forward virtual event: 
> https://lists.apache.org/thread/0h53v3v5c8t8txfo7th9xnlsffvs67r1. If you have 
> a topic you want to speak about, chime in!
> - Brandon reached out to clarify the difference between Patch Available 
> and Needs Committer, and Dmitry followed up with some questions around the 
> contribution process and workflow that are perhaps sparsely documented: 
> https://lists.apache.org/thread/ktds22nm1jptrhrfnmm0y2yyg5v22zcq. Great info 
> here for new contributors if you're looking for some insight and 
> clarification; ideally we'll get the results of this discussion into either 
> the wiki or even better, the "how to contribute" on our project website.
> - The discussion on "5.1 should be 6.0" continues: 
> https://lists.apache.org/thread/6sv3pjp2fdowgs21wjl8mw54q7t2oxgn. I'm 
> planning on forking off another thread to propose some ideas around 
> simplifying our release process. For now, that thread _seems_ to have 
> surfaced a general acceptance of us versioning our next major as 6.0. Barring 
> any last minute protests of course.
> - The discussion about capabilities and feature advertisement between 
> nodes and cluster-wide consistency on this topic continues in "Capabilities": 
> https://lists.apache.org/thread/0ychf0zwqoys9jbbr0bjchzg5zcts6pg. There's a 
> lot of really good, interesting work going on in that thread; I think the 
> outcome of this discussion and more robust and consistent cluster-wide 
> awareness of what is and is not safe to use would be incredibly valuable to 
> our users (nevermind those of us on the project never having to write a 
> mixed-capability in-jvm cluster dtest again...). Definitely worth a read.
> - And last but not least, Maxwell Guo has reached out to ask about the 
> status of triggers on the project here: 
> https://lists.apache.org/thread/n5b2pmr0451ho82f3xvgt153yj3d54sx. Triggers, 
> counters, oldschool Secondary Indexes, Materialized Views: all of these 
> features live in a fenced off place where we generally tell people "Don't use 
> these unless you _really_ know what you're doing". There might be a nugget of 
> insight to take from that that would inform our future development path on 
> some of those things.
> 
> 
> 7 topics on user@ (excluding UNSUBSCRIBE flailing ;) ): Link to ponymail: 
> https://lists.apache.org/list.html?u...@cassandra.apache.org
> - Sebastian Albrecht had a question around whether requiring forceful 
> nodetool intervention to enable audit logging after enabling the feature was 
> required: https://lists.apache.org/thread/3whc30bqfcr1vgwv73zwlv74l2v3c0gt.  
> I think there's some interesting UX insights we could take away from this 
> discussion broadly, not just for this specific feature.
> - Tommy Stendahl looks like he's both a) wrestling with java driver 
> changes and schema agreement check changes or regressions, and b) a lack of 
> clarity around whether the java driver mailing list is still a thing or if 
> conversation should come to user@: 
> https://lists.apache.org/thread/w7hg99q0zcp196vvo75r3t6w32j2rjfo. Either way, 
> CASSJAVA-69 was created to track this potential issue: 
> https://issues.apache.org/jira/browse/CASSJAVA-69
> 
> JIRA:
> Closed: 95 since Jan 1: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20resolved%20%3E%20%222025-01-01%22%20and%20resolution%20%3D%20fixed%20order%20by%20resolved%20.
>  That's... a lot of jiras. I'm impressed. I had no idea we had this much 
> movement, even with me staying on top of the JIRA firehose.
> Created: 108 new issues created since Jan 1, nothing higher than "normal" 
> priority: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20and%20created%20%3E%20%222025-01-01%22%20order%20by%20priority%2C%20created
> 
> New Contributors:
> Here's a kanban filter that'll show you some good starter tickets that are 
> currently unclaimed: 
> h

Re: [VOTE] CEP-45: Mutation Tracking

2025-02-04 Thread Bernardo Botella
+1 (nb)

> On Feb 4, 2025, at 12:34 PM, Dinesh Joshi  wrote:
> 
> +1
> 
> On Mon, Feb 3, 2025 at 10:35 AM Blake Eggleston  > wrote:
>> Hi dev@,
>> 
>> I’d like to start the voting for CEP-45: Mutation Tracking
>> 
>> Proposal: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-45:+Mutation+Tracking
>> Discussion: https://lists.apache.org/thread/0rstj4bzbb2596o5vw1m863ofggdjc81
>> 
>> The vote will be open for 72 hours. A vote passes if there are at least 3 
>> binding +1s and no binding vetoes.
>> 
>> Thanks,
>> Blake Eggleston



[DISCUSS] Fine grained max size guardails

2025-02-08 Thread Bernardo Botella
Hi everyone,

Now that the Constraints framework has been merged, I would like to come back 
to the discussion Jordan brought up in this Jira:
https://issues.apache.org/jira/browse/CASSANDRA-19677

For context, that Jira ticket (and PR) adds a number of more fine-grained 
size thresholds for column types using guardrails, expanding on what these 
Jiras added:
https://issues.apache.org/jira/browse/CASSANDRA-17151
https://issues.apache.org/jira/browse/CASSANDRA-17150

Now, we have an alternative way to set sizes on specific columns using 
constraints (we have the LENGTH constraint, which is technically different, but 
adding a SIZE constraint is on the roadmap and straightforward).

Jordan raised a really valid concern that these new guardrails may be adding 
some noise to an already crowded space such as settings. On the other hand, 
these guardrails operate at a different level than constraints, as they are 
generic as opposed to column specific.
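
To make the two levels concrete, here is a minimal sketch of a write-time size check. It is a toy model, not Cassandra code: `MAX_COLUMN_VALUE_BYTES`, `COLUMN_MAX_BYTES`, `check_write` and the limits themselves are hypothetical names and values chosen for illustration. It assumes a write is rejected if it violates either the guardrail or the column constraint, and that the caller is told which one failed.

```python
# Hypothetical model: these names and limits are illustrative, not Cassandra APIs.
MAX_COLUMN_VALUE_BYTES = 1024        # guardrail: one generic limit set by the operator
COLUMN_MAX_BYTES = {"bio": 256}      # constraints: per-column limits declared in the schema


def check_write(column, value):
    """Return "ok", or the name of the first limit the value violates."""
    # The guardrail applies to every column, whatever the schema says.
    if len(value) > MAX_COLUMN_VALUE_BYTES:
        return "guardrail"
    # The constraint applies only to columns that declare one.
    limit = COLUMN_MAX_BYTES.get(column)
    if limit is not None and len(value) > limit:
        return "constraint"
    return "ok"
```

In this model a 300-byte value is rejected for `bio` by its constraint but accepted for an unconstrained column, while a 2000-byte value is rejected by the guardrail for any column.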

We would like to hear what the community thinks in this case. Should these 
guardrails go in? Or do we drop them in favour of plain constraints?

My two cents: My opinion is that these guardrails still add value and give 
operators more fine-grained control to "protect" the database.

Regards,
Bernardo

Re: [DISCUSS] Fine grained max size guardails

2025-02-08 Thread Bernardo Botella
Yifan: how is the SIZE constraint from the LENGTH constraint? -> I think you 
are asking how they are different? They are similar, but not exactly the same, 
and the difference depends on the type of the column they are added to. For 
example, for a blob, both SIZE and LENGTH would be equivalent. But for 
strings, they are different. For the string “foo”, LENGTH would be 3, but size 
could be bigger than 3 (depending on the actual encoding used).
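
A small sketch of that distinction, assuming LENGTH counts characters and SIZE counts encoded bytes (the helper `length_and_size` is a hypothetical name, not part of any patch). For plain ASCII text in UTF-8 the two values coincide; they diverge for multi-byte characters or for other encodings.

```python
def length_and_size(text, encoding="utf-8"):
    """Return (character count, encoded byte count) for a string."""
    return len(text), len(text.encode(encoding))


ascii_utf8 = length_and_size("foo")                    # 3 characters, 3 bytes
ascii_utf16 = length_and_size("foo", "utf-16-le")      # 3 characters, 6 bytes
accented_utf8 = length_and_size("\u00f1and\u00fa")     # 5 characters, 7 bytes
```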


> On Feb 8, 2025, at 7:58 PM, Yifan Cai  wrote:
> 
> It makes sense to me to have both guardrails (which is for operators) and 
> constraints (which is for app owners) to define size limits. Besides the 
> difference in the target audience groups, the scope where guardrail and 
> constraints are applicable also differs. 
> 
> However, it is unnecessary to reject constraints definition if it goes beyond 
> the relevant guardrail, as long as the write failure indicates whether the 
> size violates the guardrail or column constraint, which should be propagated 
> to clients for transparency.
> 
> Btw, how is the SIZE constraint from the LENGTH constraint? 
> 
> - Yifan
> 
> On Sat, Feb 8, 2025 at 6:25 PM Bernardo Botella  <mailto:conta...@bernardobotella.com>> wrote:
>> Thanks everyone for the inputs.
>> 
>> Dinesh: "constraint should not violate the max bound of the guardrail” -> 
>> Yes, that statement is true with the proposed patch. With code as is, the 
>> write will fail if it either does not comply with the guardrail OR does not 
>> comply with the constraint. The CEP touched this as well, stating that 
>> guardrails take preference over defined constraints in schemas, so no matter 
>> what, these guardrails will always be respected.
>> 
>> Thanks,
>> Bernardo
>> 
>>> On Feb 8, 2025, at 6:09 PM, Dinesh Joshi >> <mailto:djo...@apache.org>> wrote:
>>> 
>>> Guardrails and constraints serve distinct purposes. Guardrails allow the 
>>> operator to define reasonable bounds while constraints allow the developer 
>>> to do the same in the schema. However the constraint should not violate the 
>>> max bound of the guardrail. For example, if an operator defines the max 
>>> size of a column to be 1MiB then a constraint in the schema cannot go 
>>> beyond this max size limit. This allows the operator to define reasonable 
>>> limits while allowing the developer control over their application’s limits.
>>> 
>>> On Sat, Feb 8, 2025 at 12:03 PM Bernardo Botella 
>>> mailto:conta...@bernardobotella.com>> wrote:
>>>> Hi everyone,
>>>> 
>>>> After Constraints framework was merged in, I would like to come back to 
>>>> the discussion Jordan brought up in this Jira:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-19677
>>>> 
>>>> For context, that Jira ticket (and PR) is adding a bunch of more fine 
>>>> grained size thresholds for column types using guardrails, expanding on 
>>>> what these Jiras added:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-17151
>>>> https://issues.apache.org/jira/browse/CASSANDRA-17150
>>>> 
>>>> Now, we have an alternative way to set sizes on specific columns using 
>>>> constraints (we have the LENGTH constraint, which is technically different, 
>>>> but adding a SIZE constraint is on the roadmap and straightforward).
>>>> 
>>>> Jordan raised a really valid concern that these new guardrails may be 
>>>> adding some noise to an already crowded space such as settings. On the 
>>>> other hand, these guardrails operate at a different level than 
>>>> constraints, as they are generic as opposed to column specific.
>>>> 
>>>> We would like to hear what the community thinks in this case. Should these 
>>>> guardrails go in? Or do we drop them in favour of plain constraints?
>>>> 
>>>> My two cents: My opinion is that these guardrails still add value and give 
>>>> operators more fine-grained control to "protect" the database.
>>>> 
>>>> Regards,
>>>> Bernardo
>> 



Re: [DISCUSS] Fine grained max size guardails

2025-02-08 Thread Bernardo Botella
Noted for when we get to implement the SERIALIZED_SIZE constraint :-)

> On Feb 8, 2025, at 8:55 PM, Yifan Cai wrote:
> 
> Thanks for the example. 
> 
> "SIZE" is in fact "SERIALIZED_SIZE". 
> 
> The term size and length are mostly interchangeable. Some modifiers on size 
> will be required in order to distinguish. 
> 
> - Yifan
> 
> On Sat, Feb 8, 2025 at 8:50 PM Bernardo Botella  <mailto:conta...@bernardobotella.com>> wrote:
>> Yifan: how is the SIZE constraint from the LENGTH constraint? -> I think you 
>> are asking how they are different? They are similar, but not exactly the 
>> same, and the difference depends on the type of the column they are added 
>> to. For example, for a blob, both SIZE and LENGTH would be equivalent. But 
>> for strings, they are different. For the string “foo”, LENGTH would be 3, but 
>> size could be bigger than 3 (depending on the actual encoding used).
>> 
>> 
>>> On Feb 8, 2025, at 7:58 PM, Yifan Cai >> <mailto:yc25c...@gmail.com>> wrote:
>>> 
>>> It makes sense to me to have both guardrails (which is for operators) and 
>>> constraints (which is for app owners) to define size limits. Besides the 
>>> difference in the target audience groups, the scope where guardrail and 
>>> constraints are applicable also differs. 
>>> 
>>> However, it is unnecessary to reject constraints definition if it goes 
>>> beyond the relevant guardrail, as long as the write failure indicates 
>>> whether the size violates the guardrail or column constraint, which should 
>>> be propagated to clients for transparency.
>>> 
>>> Btw, how is the SIZE constraint from the LENGTH constraint? 
>>> 
>>> - Yifan
>>> 
>>> On Sat, Feb 8, 2025 at 6:25 PM Bernardo Botella 
>>> mailto:conta...@bernardobotella.com>> wrote:
>>>> Thanks everyone for the inputs.
>>>> 
>>>> Dinesh: "constraint should not violate the max bound of the guardrail” -> 
>>>> Yes, that statement is true with the proposed patch. With code as is, the 
>>>> write will fail if it either does not comply with the guardrail OR does 
>>>> not comply with the constraint. The CEP touched this as well, stating that 
>>>> guardrails take preference over defined constraints in schemas, so no 
>>>> matter what, these guardrails will always be respected.
>>>> 
>>>> Thanks,
>>>> Bernardo
>>>> 
>>>>> On Feb 8, 2025, at 6:09 PM, Dinesh Joshi >>>> <mailto:djo...@apache.org>> wrote:
>>>>> 
>>>>> Guardrails and constraints serve distinct purposes. Guardrails allow the 
>>>>> operator to define reasonable bounds while constraints allow the 
>>>>> developer to do the same in the schema. However the constraint should not 
>>>>> violate the max bound of the guardrail. For example, if an operator 
>>>>> defines the max size of a column to be 1MiB then a constraint in the 
>>>>> schema cannot go beyond this max size limit. This allows the operator to 
>>>>> define reasonable limits while allowing the developer control over their 
>>>>> application’s limits.
>>>>> 
>>>>> On Sat, Feb 8, 2025 at 12:03 PM Bernardo Botella 
>>>>> mailto:conta...@bernardobotella.com>> 
>>>>> wrote:
>>>>>> Hi everyone,
>>>>>> 
>>>>>> After Constraints framework was merged in, I would like to come back to 
>>>>>> the discussion Jordan brought up in this Jira:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19677
>>>>>> 
>>>>>> For context, that Jira ticket (and PR) is adding a bunch of more fine 
>>>>>> grained size thresholds for column types using guardrails, expanding on 
>>>>>> what these Jiras added:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-17151
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-17150
>>>>>> 
>>>>>> Now, we have an alternative way to set sizes on specific columns using 
>>>>>> constraints (we have the LENGTH constraint, which is technically 
>>>>>> different, but adding a SIZE constraint is on the roadmap and 
>>>>>> straightforward).
>>>>>> 
>>>>>> Jordan raised a really valid concern that these new guardrails may be 
>>>>>> adding some noise to an already crowded space such as settings. On the 
>>>>>> other hand, these guardrails operate at a different level than 
>>>>>> constraints, as they are generic as opposed to column specific.
>>>>>> 
>>>>>> We would like to hear what the community thinks in this case. Should 
>>>>>> these guardrails go in? Or do we drop them in favour of plain 
>>>>>> constraints?
>>>>>> 
>>>>>> My two cents: My opinion is that these guardrails still add value and 
>>>>>> give operators more fine-grained control to "protect" the database.
>>>>>> 
>>>>>> Regards,
>>>>>> Bernardo
>>>> 
>> 



Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
Hi. This was a topic we discussed during the ML thread:
https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj

Here was one of my answers on that:
https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv

It was also specified in the CEP 
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime):
"Note: This constraints are only enforced at write time. So, an ALTER 
CONSTRAINT with more restrictive constraints shouldn’t affect preexisting data.”

Long story short, constraints are only checked at write time. If a constraint 
is added to a table with preexisting offending data, that data stays untouched.
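
As a rough illustration of "checked at write time only", here is a toy in-memory model. It is not how Cassandra implements constraints; the `Table` class and its methods are hypothetical and exist only to show that adding a stricter constraint leaves existing rows alone while rejecting new offending writes.

```python
class Table:
    """Toy in-memory table; a constraint is just a predicate on the written value."""

    def __init__(self):
        self.rows = []
        self.constraint = None

    def alter_constraint(self, predicate):
        # Existing rows are deliberately not re-validated here.
        self.constraint = predicate

    def insert(self, value):
        # The constraint is evaluated only against the incoming value.
        if self.constraint is not None and not self.constraint(value):
            raise ValueError("value does not satisfy constraint")
        self.rows.append(value)
```

For example, after `t.insert(500)` followed by `t.alter_constraint(lambda v: v < 100)`, the row holding 500 is still there, but a second `t.insert(500)` raises.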

I hope this helps,
Bernardo

> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
> 
> This is counterintuitive to me. The constraint should be applied to the 
> table, not to the update. NOT NULL should imply a value is always specified.
> 
> How are you handling this for tables that already exist? Can we alter table 
> to add constraints, and if so what are the semantics?
> 
>> On 10 Feb 2025, at 14:50, Bernardo Botella  
>> wrote:
>> 
>> Hi everyone,
>> 
>> Stefan Miklosovic and I have been working on a NOT_NULL 
>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>> the constraints tool belt, and a really interesting conversation came up.
>> 
>> First, as a problem statement, let's consider this:
>> 
>> -
>> CREATE TABLE ks.tb2 (
>>   id int,
>>   cl1 int,
>>   cl2 int,
>>   val text CHECK NOT_NULL(val),
>>   PRIMARY KEY (id, cl1, cl2)
>> )
>> 
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>> null);
>> InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
>> value does not satisfy value constraint for column 'val' as it is null."
>> 
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>> 'text');
>> cassandra@cqlsh> select * from ks.tb2;
>> 
>>  id | cl1 | cl2 | val
>> ----+-----+-----+------
>>   1 |   2 |   3 | text
>> 
>> (1 rows)
>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>> cassandra@cqlsh> select * from ks.tb2;
>> 
>>  id | cl1 | cl2 | val
>> ----+-----+-----+------
>>   1 |   2 |   3 | text
>>   1 |   2 |   4 | null
>> 
>> -
>> 
>> As you see, we have a hole in which a 'null' value is getting written on 
>> column val even if we have a NOT_NULL on that particular column whenever the 
>> column is NOT specified on the write. That raises the question on how this 
>> particular constraint should behave.
>> 
>> If we consider the other constraints (scalar constraint and length 
>> constraint so far), this particular behavior is fine. But, if the constraint 
>> is NOT_NULL, then it becomes a little bit trickier.
>> 
>> The conclusions we have reached is that the meaning of constraints should be 
>> interpreted like: I check whatever you give me as part of the write, 
>> ignoring everything else. Let me elaborate:
>> If we decide to treat this particular NOT_NULL constraint differently, and 
>> check if the value for that column is present in the insert statement, we 
>> then open a different can of worms. What happens if the row already exists 
>> with a valid value, and that insert statement is only trying to do an update 
>> to a different column in the row? If that was the case, we would be forcing 
>> the user to specify the 'val' column value for every update, even if it is 
>> not needed.
>> 
>> Mainly for this reason, we think it is better to treat this NOT_NULL 
>> constraint just like the other constraints, and execute it ONLY on the 
>> values that are present on the insert statement.
>> 
>> The main con is that it may lead to a little bit of confusion (as in, why 
>> did I just add a null value to the table even though I have a NOT_NULL 
>> constraint?). We have thought of alleviating this particular confusion by:
>> - Extensive documentation. Let's be upfront about what this constraint does 
>> and does not do. 
>> (https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
>> - Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), 
>> that checks that the column value is actually present in the insert 
>> statement.
>> 
>> If you've made it until here, that means you are really interested in 
>> constraints. Thanks! The question for you is, would you have any concern 
>> with this approach?
>> 
>> Thanks,
>> Bernardo



Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
I will create a Jira to keep track of that “NO VERIFY” suggestion. For this 
thread, I’d like to stick to the actual proposal for both NOT_NULL and 
STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.


> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
> 
> Thanks. While I agree we shouldn’t be applying these constraints post hoc on 
> read or compaction, I think we need to make clear to the user whether we are 
> validating a new constraint before accepting it for alter table. Which is to 
> say I think alter table should require something like “NO VERIFY” or some 
> other additional keywords to make clear we aren’t checking the constraint 
> applies to existing data.
> 
> 
>> On 10 Feb 2025, at 15:10, Bernardo Botella  
>> wrote:
>> 
>> Hi. This was a topic we discussed during the ML thread:
>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>> 
>> Here was one of my answers on that:
>> https://lists.apache.org/thread/76olqf6225noygxcclsrs56ngnlmcvxv
>> 
>> It was also specified in the CEP 
>> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime):
>> "Note: This constraints are only enforced at write time. So, an ALTER 
>> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting 
>> data.”
>> 
>> Long story short, constraints are only checked at write time. If a 
>> constraint is added to a table with preexisting offending data, that data 
>> stays untouched.
>> 
>> I hope this helps,
>> Bernardo
>> 
>>> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>>> 
>>> This is counterintuitive to me. The constraint should be applied to the 
>>> table, not to the update. NOT NULL should imply a value is always specified.
>>> 
>>> How are you handling this for tables that already exist? Can we alter table 
>>> to add constraints, and if so what are the semantics?
>>> 
>>>> On 10 Feb 2025, at 14:50, Bernardo Botella  
>>>> wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> Stefan Miklosovic and I have been working on a NOT_NULL 
>>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>>>> the constraints tool belt, and a really interesting conversation came up.
>>>> 
>>>> First, as a problem statement, let's consider this:
>>>> 
>>>> -
>>>> CREATE TABLE ks.tb2 (
>>>>   id int,
>>>>   cl1 int,
>>>>   cl2 int,
>>>>   val text CHECK NOT_NULL(val),
>>>>   PRIMARY KEY (id, cl1, cl2)
>>>> )
>>>> 
>>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>>> null);
>>>> InvalidRequest: Error from server: code=2200 [Invalid query] 
>>>> message="Column value does not satisfy value constraint for column 'val' 
>>>> as it is null."
>>>> 
>>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>>> 'text');
>>>> cassandra@cqlsh> select * from ks.tb2;
>>>> 
>>>>  id | cl1 | cl2 | val
>>>> ----+-----+-----+------
>>>>   1 |   2 |   3 | text
>>>> 
>>>> (1 rows)
>>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>>>> cassandra@cqlsh> select * from ks.tb2;
>>>> 
>>>>  id | cl1 | cl2 | val
>>>> ----+-----+-----+------
>>>>   1 |   2 |   3 | text
>>>>   1 |   2 |   4 | null
>>>> 
>>>> -
>>>> 
>>>> As you see, we have a hole in which a 'null' value is getting written on 
>>>> column val even if we have a NOT_NULL on that particular column whenever 
>>>> the column is NOT specified on the write. That raises the question on how 
>>>> this particular constraint should behave.
>>>> 
>>>> If we consider the other constraints (scalar constraint and length 
>>>> constraint so far), this particular behavior is fine. But, if the 
>>>> constraint is NOT_NULL, then it becomes a little bit trickier.
>>>> 
>>>> The conclusions we have reached is that the meaning of constraints should 
>>>> be interpreted like: I check whatever you give me as part of the write, 
>>>> ignoring everythin

[DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
Hi everyone,

Stefan Miklosovic and I have been working on a NOT_NULL 
(https://github.com/apache/cassandra/pull/3867) constraint to be added to the 
constraints tool belt, and a really interesting conversation came up.

First, as a problem statement, let's consider this:

-
CREATE TABLE ks.tb2 (
    id int,
    cl1 int,
    cl2 int,
    val text CHECK NOT_NULL(val),
    PRIMARY KEY (id, cl1, cl2)
);

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, null);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
value does not satisfy value constraint for column 'val' as it is null."

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 'text');
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
----+-----+-----+------
  1 |   2 |   3 | text

(1 rows)
cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
----+-----+-----+------
  1 |   2 |   3 | text
  1 |   2 |   4 | null

-

As you see, we have a hole in which a 'null' value is getting written on column 
val even if we have a NOT_NULL on that particular column whenever the column is 
NOT specified on the write. That raises the question on how this particular 
constraint should behave.

If we consider the other constraints (scalar constraint and length constraint 
so far), this particular behavior is fine. But, if the constraint is NOT_NULL, 
then it becomes a little bit trickier.

The conclusion we have reached is that constraints should be interpreted 
like this: I check whatever you give me as part of the write, ignoring 
everything else. Let me elaborate:
If we decide to treat this particular NOT_NULL constraint differently, and 
check if the value for that column is present in the insert statement, we then 
open a different can of worms. What happens if the row already exists with a 
valid value, and that insert statement is only trying to do an update to a 
different column in the row? If that was the case, we would be forcing the user 
to specify the 'val' column value for every update, even if it is not needed. 

Mainly for this reason, we think it is better to treat this NOT_NULL constraint 
just like the other constraints, and execute it ONLY on the values that are 
present on the insert statement.

The main con is that it may lead to a little bit of confusion (as in, why did 
I just add a null value to the table even though I have a NOT_NULL 
constraint?). We have thought of alleviating this particular confusion by:
- Extensive documentation. Let's be upfront about what this constraint does 
and does not do. 
(https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
- Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), 
that checks that the column value is actually present in the insert statement.
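
The difference between the two semantics can be sketched as two predicates over a write, modelled here as a dict of the columns named in the statement. This is only an illustration of the behaviour described in this message, not the code in the patch; `check_not_null` and `check_strictly_not_null` are hypothetical helper names.

```python
def check_not_null(write, column):
    """Loose semantics: reject only an explicit null; an omitted column passes."""
    return column not in write or write[column] is not None


def check_strictly_not_null(write, column):
    """Strict semantics: the column must be present in the write and non-null."""
    return write.get(column) is not None


explicit_null = {"id": 1, "cl1": 2, "cl2": 3, "val": None}
with_value = {"id": 1, "cl1": 2, "cl2": 3, "val": "text"}
omitted = {"id": 1, "cl1": 2, "cl2": 4}
```

Both predicates reject `explicit_null` and accept `with_value`. They differ only on `omitted`, which the loose check accepts (the hole shown in the cqlsh session above) and the strict check rejects.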

If you've made it this far, that means you are really interested in 
constraints. Thanks! The question for you is, would you have any concerns with 
this approach?

Thanks,
Bernardo

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
We have consensus then. Let’s ditch the non strict version, and rename the 
STRICTLY_NOT_NULL to NOT_NULL.

Thanks everyone!
Bernardo

> On Feb 10, 2025, at 8:58 AM, Štefan Miklošovič  wrote:
> 
> I agree.
> 
> The only reason would be purely practical: if a user has a table consisting 
> of 1000 columns not being null and a user wants to modify 1 column only, then 
> a user would be forced to specify the remaining 999 columns just for the sake 
> of it.
> 
> But in this case, I think it would be more practical just to ensure in the 
> application that what he is putting there is not null rather than having 1000 
> constraints on the table.
> 
> On Mon, Feb 10, 2025 at 5:52 PM Yifan Cai  <mailto:yc25c...@gmail.com>> wrote:
>> While LOOSE_NOT_NULL might improve the clarity a bit, what value does such 
>> a constraint provide to users? It still permits null. Meanwhile, it is 
>> easier to check the nullness of the bound values on the application side.
>> IMO, what benefits users is a way to ensure no null value can exist for the 
>> constrained columns. Reading the thread, it is the behavior of the strict 
>> version. 
>> How about we just drop the LOOSE one and call the STRICT one “NOT_NULL”?
>> 
>> - Yifan
>> From: Bernardo Botella > <mailto:conta...@bernardobotella.com>>
>> Sent: Monday, February 10, 2025 8:44:13 AM
>> To: dev@cassandra.apache.org <mailto:dev@cassandra.apache.org> 
>> mailto:dev@cassandra.apache.org>>
>> Subject: Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint
>>  
>> To recap,
>> 
>> The sentiment I am getting is that NOT_NULL allowing null values is too 
>> confusing. Nice, that’s why we started the thread.
>> 
>> As an alternative, instead of ditching the loose not null constraint, I 
>> propose we change the “default” behavior. From my initial proposal, I 
>> suggest renaming the Constraints:
>> - NOT_NULL -> LOOSE_NOT_NULL
>> - STRICTLY_NOT_NULL -> NOT_NULL
>> 
>> The reasoning behind trying to keep it is:
>> - It is already implemented.
>> - By being explicit with it being loose, we avoid the confusion of allowing 
>> nulls.
>> - It still adds value on its own.
>> 
>> With this, the default NOT_NULL doesn't allow null or absent values in the 
>> insert statement, while we still support the more relaxed LOOSE_NOT_NULL 
>> for updates.
>> 
>> Thoughts?
>> 
>> 
>>> On Feb 10, 2025, at 8:29 AM, Štefan Miklošovič >> <mailto:smikloso...@apache.org>> wrote:
>>> 
>>> 
>>> 
>>> On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi >> <mailto:djo...@apache.org>> wrote:
>>> In my head NOT_NULL constraint implies that the column must be specified on 
>>> each write and must not be NULL. If a column with the NOT_NULL constraint 
>>> is omitted during a write then shouldn’t it be treated as if it was 
>>> specified and set to NULL?
>>> 
>>> Well, yes. One may also look at it that way. But then we would end up with 
>>> "null" in a column, while it would be quite surprising for users to see 
>>> that because they were thinking that if they specified it as NOT NULL on a 
>>> table creation, then it is "guaranteed" that it will not be null ever 
>>> again. It just looks strange to say in table schema it is not null but then 
>>> it actually might be.
>>>  
>>> 
>>> If the column has a non-NULL value that was previously written and you’re 
>>> updating the rest of the columns, you still have to force the user to 
>>> specify it otherwise you will have to perform a read before write to 
>>> validate that the column was not NULL. I think this is a fine compromise 
>>> given that the goal here is to ensure that an application shouldn’t 
>>> inadvertently write a NULL value for a column specified as NOT_NULL.
>>> 
>>> 
>>> Yes. I see it the same way. 
>>>  
>>> On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella 
>>> mailto:conta...@bernardobotella.com>> wrote:
>>> Hi everyone,
>>> 
>>> Stefan Miklosovic and I have been working on a NOT_NULL 
>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>>> the constraints tool belt, and a really interesting conversation came up.
>>> 
>>> First, as a problem statement, let's consider this:
>>> 
>>> -
>>> CREATE TABLE ks.tb2 (
>>> id int,
>>

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Bernardo Botella
To recap,

The sentiment I am getting is that NOT_NULL allowing null values is too 
confusing. Nice, that’s why we started the thread.

As an alternative, instead of ditching the loose not null constraint, I propose 
we change the “default” behavior. From my initial proposal, I suggest renaming 
the Constraints:
- NOT_NULL -> LOOSE_NOT_NULL
- STRICTLY_NOT_NULL -> NOT_NULL

The reasoning behind trying to keep it is:
- It is already implemented.
- By being explicit with it being loose, we avoid the confusion of allowing 
nulls.
- It still adds value on its own.

With this, the default NOT_NULL rejects null or missing values on the insert 
statement, while we still support the more relaxed LOOSE_NOT_NULL for 
updates.
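
To make the two behaviours concrete, here is a minimal Java sketch of the check as I understand it from this thread. It is not the code from the pull request: NotNullCheck, Mode and validate are hypothetical names, and a write is modelled simply as a map holding only the columns named in the statement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for discussion only, not the actual Cassandra implementation.
final class NotNullCheck
{
    // STRICT models the proposed NOT_NULL, LOOSE models the proposed LOOSE_NOT_NULL.
    enum Mode { STRICT, LOOSE }

    /**
     * Returns true if the write satisfies the constraint on the given column.
     * A key mapped to null models an explicit null literal in the statement;
     * a missing key models a column that was not specified at all.
     */
    static boolean validate(Mode mode, String column, Map<String, Object> write)
    {
        if (!write.containsKey(column))
            return mode == Mode.LOOSE; // column omitted: only the loose variant accepts it

        return write.get(column) != null; // an explicit null is rejected by both variants
    }

    public static void main(String[] args)
    {
        Map<String, Object> omitted = new HashMap<>();
        omitted.put("id", 1);

        Map<String, Object> explicitNull = new HashMap<>(omitted);
        explicitNull.put("val", null);

        System.out.println(validate(Mode.LOOSE, "val", omitted));       // true
        System.out.println(validate(Mode.STRICT, "val", omitted));      // false
        System.out.println(validate(Mode.LOOSE, "val", explicitNull));  // false
    }
}
```

Neither variant needs a read before write, since both look only at what the statement itself carries.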

Thoughts?


> On Feb 10, 2025, at 8:29 AM, Štefan Miklošovič  wrote:
> 
> 
> 
> On Mon, Feb 10, 2025 at 5:20 PM Dinesh Joshi wrote:
>> In my head NOT_NULL constraint implies that the column must be specified on 
>> each write and must not be NULL. If a column with the NOT_NULL constraint is 
>> omitted during a write then shouldn’t it be treated as if it was specified 
>> and set to NULL?
> 
> Well, yes. One may also look at it that way. But then we would end up with 
> "null" in a column, while it would be quite surprising for users to see that 
> because they were thinking that if they specified it as NOT NULL on a table 
> creation, then it is "guaranteed" that it will not be null ever again. It 
> just looks strange to say in table schema it is not null but then it actually 
> might be.
>  
>> 
>> If the column has a non-NULL value that was previously written and you’re 
>> updating the rest of the columns, you still have to force the user to 
>> specify it otherwise you will have to perform a read before write to 
>> validate that the column was not NULL. I think this is a fine compromise 
>> given that the goal here is to ensure that an application shouldn’t 
>> inadvertently write a NULL value for a column specified as NOT_NULL.
>> 
> 
> Yes. I see it the same way. 
>  
>> On Mon, Feb 10, 2025 at 6:50 AM Bernardo Botella wrote:
>>> Hi everyone,
>>> 
>>> Stefan Miklosovic and I have been working on a NOT_NULL 
>>> (https://github.com/apache/cassandra/pull/3867) constraint to be added to 
>>> the constraints tool belt, and a really interesting conversation came up.
>>> 
>>> First, as a problem statement, let's consider this:
>>> 
>>> -
>>> CREATE TABLE ks.tb2 (
>>> id int,
>>> cl1 int,
>>> cl2 int,
>>> val text CHECK NOT_NULL(val),
>>> PRIMARY KEY (id, cl1, cl2)
>>> );
>>> 
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>> null);
>>> InvalidRequest: Error from server: code=2200 [Invalid query] 
>>> message="Column value does not satisfy value constraint for column 'val' as 
>>> it is null."
>>> 
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
>>> 'text');
>>> cassandra@cqlsh> select * from ks.tb2;
>>> 
>>>  id | cl1 | cl2 | val
>>> ----+-----+-----+------
>>>   1 |   2 |   3 | text
>>> 
>>> (1 rows)
>>> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
>>> cassandra@cqlsh> select * from ks.tb2;
>>> 
>>>  id | cl1 | cl2 | val
>>> ----+-----+-----+------
>>>   1 |   2 |   3 | text
>>>   1 |   2 |   4 | null
>>> 
>>> -
>>> 
>>> As you see, we have a hole in which a 'null' value is getting written on 
>>> column val even if we have a NOT_NULL on that particular column whenever 
>>> the column is NOT specified on the write. That raises the question on how 
>>> this particular constraint should behave.
>>> 
>>> If we consider the other constraints (scalar constraint and length 
>>> constraint so far), this particular behavior is fine. But, if the 
>>> constraint is NOT_NULL, then it becomes a little bit trickier.
>>> 
>>> The conclusion we have reached is that the meaning of constraints should 
>>> be interpreted like: I check whatever you give me as part of the write, 
>>> ignoring everything else. Let me elaborate:
>>> If we decide to treat this particular NOT_NULL constraint differently, and 
>>> check if the

Re: 【DISCUSS】What is the current status of triggers in Cassandra ?

2025-01-31 Thread Bernardo Botella
+1 on skipping triggers if we can’t make sure that it will work in every 
scenario. 
The experience of copying a table and having a broken result is definitely 
something to avoid.

Kind regards,
Bernardo

> On Jan 31, 2025, at 10:49 AM, Brandon Williams  wrote:
> 
> I agree, and triggers are an expert feature anyway so I wouldn't
> expect them to be copied.
> 
> Kind Regards,
> Brandon
> 
> On Fri, Jan 31, 2025 at 12:46 PM Štefan Miklošovič
>  wrote:
>> 
>> Thank you Maxwell for reaching ML with this.
>> 
>> I was talking to Maxwell about a feature where CREATE TABLE LIKE would also 
>> support triggers.
>> 
>> create table ks.tb_copy like ks.tb with triggers
>> 
>> "with triggers" would be added to CQL grammar and it would "copy" what that 
>> trigger(s) is / are doing.
>> 
>> While this is technically possible to do, I am not completely sure it is the 
>> right thing to do. If you take a look into examples/triggers (we have this 
>> dir in the repository), there is an example of a trigger which is parsing 
>> keyspace / table to operate on from the configuration file 
>> (examples/triggers/conf/AuditTrigger.properties).
>> 
>> If a user copies a table like this, then, sure, a trigger will be copied as 
>> well, but it will not match anymore.
>> 
>> My argument against supporting copying triggers is that when we can not 
>> guarantee that it will work _in all cases_, then I would say that we should 
>> not be supporting this.
>> 
>> On Fri, Jan 31, 2025 at 6:06 PM guo Maxwell  wrote:
>>> 
>>> Hello dev,
>>> I'm very sorry to disturb everyone's weekend. Please allow me to ask about 
>>> triggers in Cassandra.
>>> As most of you know, to make a trigger do something the user has to 
>>> package a jar and load the corresponding class, which then does something 
>>> similar to preprocessing on the write path.
>>> So my question is: are we fine with the current implementation, and should 
>>> we support triggers in new features?
>>> I ran into this recently: we are discussing whether CEP-43 (CREATE TABLE 
>>> LIKE) needs to support copying triggers.
>>> 
>>> Looking forward to your reply.



Re: Joining Slack as a multi channel guest

2024-12-16 Thread Bernardo Botella
Hi Soheil,

I think Brandon refers to this Jira project
https://issues.apache.org/jira/projects/INFRA/

Regards,
Bernardo

> On Dec 16, 2024, at 10:04 AM, Soheil Rahsaz  wrote:
> 
> Thank you for your quick response.
> How do I contact INFRA?
> Should I Use Apache's Jira and put `INFRA` as the project?
> 
> On Mon, Dec 16, 2024 at 9:27 PM Brandon Williams wrote:
>> I don't think anyone is going to be able to help here, you may need to
>> contact INFRA.  When I try to invite your address I can't because
>> you're already in the workspace as a guest.
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Mon, Dec 16, 2024 at 11:55 AM Soheil Rahsaz wrote:
>> >
>> > Hello everyone,
>> > Can anyone help me with an issue joining the Slack?
>> > I'm trying to join the Slack as a multi channel guest with this email:
>> > soheilrahsaz...@gmail.com 
>> > Could anyone add me?
>> >
>> > Sincerely,
>> > Soheil



Re: Capabilities

2024-12-19 Thread Bernardo Botella
+1 to the positive sentiment of such a feature. Huge benefit towards reducing 
risks.

> On Dec 19, 2024, at 8:31 AM, Patrick McFadin  wrote:
> 
> Thanks for bringing this back, Jordan. I had completely forgotten
> about Riak's Capabilities support. That was a fan favorite for
> operators, along with a couple other interesting ways to control the
> upgrade process.
> 
> +1 on a CEP from me.
> 
> On Thu, Dec 19, 2024 at 7:38 AM Josh McKenzie  wrote:
>> 
>> Strong +1.
>> 
>> Much like having repair scheduling built in to the ecosystem, this feels 
>> like table stakes for having a self-contained, usable distributed database.
>> 
>> On Wed, Dec 18, 2024, at 6:11 PM, Dinesh Joshi wrote:
>> 
>> Hi Jordan,
>> 
>> Thank you for starting this thread. This is a great idea. From an ecosystem 
>> perspective this is absolutely critical. I'm a big +1 on working towards 
>> building this into Cassandra and the surrounding ecosystem. This would be a 
>> step in the right direction to derisk upgrades.
>> 
>> Dinesh
>> 
>> On Wed, Dec 18, 2024 at 3:01 PM Jordan West  wrote:
>> 
>> In a recent discussion on the pains of upgrading, one topic that came up is a 
>> feature that Riak had called Capabilities [1]. A major pain with upgrades is 
>> that each node independently decides when to start using new or modified 
>> functionality. Even when we put this behind a config (like storage 
>> compatibility mode) each node immediately enables the feature when the 
>> config is changed and the node is restarted. This causes various types of 
>> upgrade pain such as failed streams and schema disagreement. A recent 
>> example of this is CASSANDRA-20118 [2]. In some cases operators can prevent 
>> this from happening through careful coordination (e.g. ensuring upgrade 
>> sstables only runs after the whole cluster is upgraded) but typically 
>> requires custom code in whatever control plane the operator is using. A 
>> capabilities framework would distribute the state of what features each node 
>> has (and their status e.g. enabled or not) so that the cluster can choose to 
>> opt in to new features once the whole cluster has them available. From 
>> experience, having this in Riak made upgrades a significantly less risky 
>> process and also paved a path towards repeatable downgrades. I think 
>> Cassandra would benefit from it as well.
>> 
>> Further, other tools like analytics could benefit from having this 
>> information since currently it's up to the operator to manually determine 
>> the state of the cluster in some cases.
>> 
>> I am considering drafting a CEP proposal for this feature but wanted to take 
>> the general temperature of the community and get some early thoughts while 
>> working on the draft.
>> 
>> Looking forward to hearing y'alls thoughts,
>> Jordan
>> 
>> [1] 
>> https://github.com/basho/riak_core/blob/25d9a6fa917eb8a2e95795d64eb88d7ad384ed88/src/riak_core_capability.erl#L23-L72
>> 
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-20118
>> 
>> 
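
The opt-in rule Jordan describes above (a feature is turned on only once every node has it available) can be sketched in a few lines of Java. This is only an illustration under assumptions: CapabilityNegotiation, clusterWide and the feature names are hypothetical, and how the per-node sets would actually be distributed (gossip, cluster metadata, or something else) is left open here, as it is in the thread.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not Riak's implementation and not an existing Cassandra API.
final class CapabilityNegotiation
{
    /**
     * Given the features each node advertises, returns the features that are
     * safe to enable cluster-wide, i.e. the intersection of all per-node sets.
     */
    static Set<String> clusterWide(Map<String, Set<String>> advertisedByNode)
    {
        Set<String> common = null;
        for (Set<String> features : advertisedByNode.values())
        {
            if (common == null)
                common = new HashSet<>(features); // the first node seeds the candidate set
            else
                common.retainAll(features);       // drop anything a later node lacks
        }
        return common == null ? new HashSet<>() : common;
    }

    public static void main(String[] args)
    {
        // node2 has not been upgraded yet, so it does not advertise the new sstable format.
        Map<String, Set<String>> nodes = Map.of(
            "node1", Set.of("new_sstable_format", "new_messaging"),
            "node2", Set.of("new_messaging"));

        System.out.println(clusterWide(nodes)); // [new_messaging]
    }
}
```

In this model a mixed-version cluster keeps using the old behaviour until the last node advertises the new feature.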



Re: Planet Cassandra meetup organizer opportunity

2024-12-06 Thread Bernardo Botella
Hi Melissa,

I’ll be happy to jump in to keep this going. Let’s sync about this when you 
have a chance.

Regards,
Bernardo

> On Dec 6, 2024, at 10:17 AM, Melissa Logan  wrote:
> 
> Hi folks:
> 
> My team and I created and managed the Planet Cassandra Meetup - a virtual 
> monthly meetup to share Cassandra use cases, best practices, case studies, 
> community updates, and other similar topics: 
> https://www.meetup.com/cassandra-global/
> 
> As we have stepped back from organizing this, we're looking for one or more 
> individuals who are interested in stepping up. In short, it would mean 
> finding speakers and hosting the event virtually each month. You can see a 
> detailed list of activities here: 
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/Planet+Cassandra+Meetup
> 
> It's a fun way to stay up-to-date on Cassandra and connected with the 
> community. I'm happy to share more details if anyone is interested. In an 
> ideal world, a couple people would collaborate to draw ideas from multiple 
> sources and share the workload.
> 
> Note: Meetup.com does incur a monthly fee of about $15/mo which I believe 
> will remain at this legacy/lower fee.
> 
> If we don't hear from anyone by Friday, Jan. 3 we plan to close the group.
> 
> Any questions, just let me know. Thanks!
> 
> Melissa



Re: Checkstyle as style contract for Cassandra

2025-01-19 Thread Bernardo Botella
>>>>>> >>> I recall reading through / offering this guide in the past as a 
>>>>>> >>> starting point for an org I was managing at the time: 
>>>>>> >>> https://google.github.io/eng-practices/review/reviewer/
>>>>>> >>>
>>>>>> >>> Been years; might be worth it to have a skim through that and see if 
>>>>>> >>> it could serve as a reasonable starting point for us if someone has 
>>>>>> >>> the inclination.
>>>>>> >>>
>>>>>> >>> On Thu, Jan 16, 2025, at 9:17 AM, Benedict wrote:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> I can imagine that it might cause some frustrating review 
>>>>>> >>> interactions people would like to avoid, but for solving that I’d 
>>>>>> >>> prefer we take a more social approach.
>>>>>> >>>
>>>>>> >>> Review shouldn’t spend much time on minor style points, and these 
>>>>>> >>> should normally be framed as suggestions. Obviously newer 
>>>>>> >>> contributors may need pointing to the style guide as something to 
>>>>>> >>> familiarise themselves with, but it shouldn’t readily be invoked as 
>>>>>> >>> a “thou shalt do this” tool.
>>>>>> >>>
>>>>>> >>> Perhaps a “Review Guide” is what we need to make sure we keep review 
>>>>>> >>> primarily focused on the core contribution, and to help avoid folk 
>>>>>> >>> getting bogged down in style sniping.
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On 16 Jan 2025, at 14:08, Josh McKenzie wrote:
>>>>>> >>>
>>>>>> >>> 
>>>>>> >>> Right now our codebase is pretty consistent, especially for not 
>>>>>> >>> having a linter enforcing this kind of thing. Are we trying to solve 
>>>>>> >>> for codebase consistency, education of new contributors, both? 
>>>>>> >>> Neither?
>>>>>> >>>
>>>>>> >>> If just solving for consistency I'd argue we're good. If educating 
>>>>>> >>> new contributors, the Code Style guide seems pretty thorough to me? 
>>>>>> >>> https://cassandra.apache.org/_/development/code_style.html
>>>>>> >>>
>>>>>> >>> All of which is to say - it feels like the status quo is fine here 
>>>>>> >>> for me. i.e. it's not clear to me what problem we're trying to solve 
>>>>>> >>> w/a change here.
>>>>>> >>>
>>>>>> >>> On Wed, Jan 15, 2025, at 9:58 PM, guo Maxwell wrote:
>>>>>> >>>
>>>>>> >>> I agree with you on both points.
>>>>>> >>>
>>>>>> >>> I think you should open a ticket if you want to add a rule to 
>>>>>> >>> checkstyle; as far as I know, a lot of old code does not comply 
>>>>>> >>> with this rule.
>>>>>> >>> Point 2 really feels like personal preference, but I'd probably 
>>>>>> >>> listen to the reviewer's opinion.😁
>>>>>> >>>
>>>>>> >>> On Thu, Jan 16, 2025 at 08:47, Tolbert, Andy wrote:
>>>>>> >>>
>>>>>> >>> Reading back https://issues.apache.org/jira/browse/CASSANDRA-19276 a 
>>>>>> >>> bit more, I think I *was* able to make checkstyle bend to the "Code 
>>>>>> >>> Style" definition by ignoring lambda tokens.  It's just that there 
>>>>>> >>> were a lot of "violations" which defined a method on one line:
>>>>>> >>>
>>>>>> >>> public int  getActiveTaskCount()    { return 0; }
>>>>>> >>> public long getCompletedTaskCount() { return 0; }
>>>>>> >>> public int  getPendingTaskCount()   { return 0; }
>>>>>> >>> public int  getCorePoolSize()       { return 0; }
>>>>>> >>> public int  getMaximumPoolSize()    { return 0; }
>>>>>> >>>
>>>>>> >>> I felt that this code was perfectly readable and wouldn't be right 
>>>>>> >>> to change.  This is what I wanted to make checkstyle consider 
>>>>>> >>> acceptable.
>>>>>> >>>
>>>>>> >>> I think it would be really nice if checkstyle would fail for the 
>>>>>> >>> more obvious case we want to avoid that comes up in reviews or 
>>>>>> >>> sometimes slips into the codebase if not caught by a reviewer, e.g.:
>>>>>> >>>
>>>>>> >>> if {
>>>>>> >>> //...
>>>>>> >>> }
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>> Andy
>>>>>> >>>
>>>>>> >>> On Wed, Jan 15, 2025 at 6:21 PM Tolbert, Andy wrote:
>>>>>> >>>
>>>>>> >>> Hi Bernardo,
>>>>>> >>>
>>>>>> >>> Thanks for bringing this up!
>>>>>> >>>
>>>>>> >>> Last year I was looking into enforcing curly braces as defined in 
>>>>>> >>> Code Style and had some thoughts on how to make this work but hit a 
>>>>>> >>> bit of a brick wall:
>>>>>> >>>
>>>>>> >>> https://issues.apache.org/jira/browse/CASSANDRA-19276
>>>>>> >>>
>>>>>> >>> I don't think there is an easy way as is to enforce this with 
>>>>>> >>> checkstyle currently:
>>>>>> >>>
>>>>>> >>> "{ and } are placed on a new line except when empty or opening a 
>>>>>> >>> multi-line lambda expression. Braces may be elided to a depth of one 
>>>>>> >>> if the condition or loop guards a single expression."
>>>>>> >>>
>>>>>> >>> Without making changes to checkstyle itself (e.g.: 
>>>>>> >>> https://github.com/checkstyle/checkstyle/issues/12226).
>>>>>> >>>
>>>>>> >>> I think if we were to add a new rule around brackets and newlines, 
>>>>>> >>> we would ideally try to make it match the Code Style definition as 
>>>>>> >>> it's declared, and hopefully it would not require touching a lot of 
>>>>>> >>> files (which may be the case, unfortunately).
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>> Andy
>>>>>> >>>
>>>>>> >>> On Wed, Jan 15, 2025 at 6:10 PM Benedict wrote:
>>>>>> >>>
>>>>>> >>> Even something as simple as the curly brace rule has sensible 
>>>>>> >>> exceptions. I’m pretty hard -1 on letting a linter make all our 
>>>>>> >>> editing decisions. Formatting is a contextual choice about how to 
>>>>>> >>> best represent information to the reader, and we should not abdicate 
>>>>>> >>> responsibility. The style guide is exactly that: a guide that 
>>>>>> >>> helps us navigate editing choices, and it can be evolved or refined 
>>>>>> >>> via discussion and experimentation.
>>>>>> >>>
>>>>>> >>> For example, the second clause in your quote (re: lambdas) came 
>>>>>> >>> about only because we could break the restrictions of the first 
>>>>>> >>> clause and demonstrate an improvement to readability.
>>>>>> >>>
>>>>>> >>> If this is a pain point during review, either some people are too 
>>>>>> >>> eager to point to the code style guide, or perhaps your IDE defaults 
>>>>>> >>> need updating. This shouldn’t cause lots of traffic.
>>>>>> >>>
>>>>>> >>> People should try not to overly nitpick formatting, though of course 
>>>>>> >>> a balance is to be struck between contributors’ expression of their 
>>>>>> >>> code and that code sitting neatly in its context in the codebase.
>>>>>> >>>
>>>>>> >>> > On 15 Jan 2025, at 23:50, Bernardo Botella wrote:
>>>>>> >>> >
>>>>>> >>> > Hi everyone!
>>>>>> >>> >
>>>>>> >>> > I wanted to raise a question about code style for the project. 
>>>>>> >>> > I've been receiving some feedback on PRs about the need to:
>>>>>> >>> > - Have curly braces start on a new line
>>>>>> >>> > - Remove curly braces if the condition or loop has only one 
>>>>>> >>> > expression
>>>>>> >>> >
>>>>>> >>> > Taking a look at the official Code Style stated in the web, I read 
>>>>>> >>> > that:
>>>>>> >>> > "{ and } are placed on a new line except when empty or opening a 
>>>>>> >>> > multi-line lambda expression. Braces may be elided to a depth of 
>>>>>> >>> > one if the condition or loop guards a single expression."
>>>>>> >>> >
>>>>>> >>> > Which addresses the first type of comments I mentioned (curly 
>>>>>> >>> > braces starting in a new line), but leaves open the second type of 
>>>>>> >>> > comments (remove not needed curly braces).
>>>>>> >>> >
>>>>>> >>> > But, when looking at the checkstyle.xml, I don't see any rule 
>>>>>> >>> > enforcing any of those two types of comments.
>>>>>> >>> >
>>>>>> >>> > I believe checkstyle.xml should be our contract, so I'm proposing 
>>>>>> >>> > here:
>>>>>> >>> >
>>>>>> >>> > For "curly braces starting in a new line" rule, add something like 
>>>>>> >>> > what we already have on Sidecar and Analytics projects:
>>>>>> >>> > 
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> > ...
>>>>>> >>> > 
>>>>>> >>> >
>>>>>> >>> > That way, we can fail fast and not worry about those comments on 
>>>>>> >>> > PRs. This of course may be painful, as we probably will have to 
>>>>>> >>> > fix a bunch of wrongly placed brackets all over the place.
>>>>>> >>> >
>>>>>> >>> > If there are no concerns here, I'll be more than happy to bite the 
>>>>>> >>> > bullet and add a patch for this.
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> > For "remove not needed curly braces", I understand that it tends 
>>>>>> >>> > to be the preference on the code, so we either modify the 
>>>>>> >>> > documentation and add a rule for that on the checkstyle.xml, or we 
>>>>>> >>> > are fine with that style and there is no need to remove them on 
>>>>>> >>> > patches.
>>>>>> >>> >
>>>>>> >>> > I wanted to hear the thoughts on the community for this one. My 
>>>>>> >>> > preference is to always use brackets, but that's just a 
>>>>>> >>> > preference, so it's perfectly fine not to enforce it and leave the 
>>>>>> >>> > documentation as is.
>>>>>> >>> >
>>>>>> >>> > Thanks everyone!
>>>>>> >>> > Bernardo
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>> 



Re: [DISCUSS] Review Guide for the project

2025-01-19 Thread Bernardo Botella
Well, as a growing member of this community, I definitely see the benefit of 
such a review guide. I’d also be more than happy to participate in its creation 
with my “fresh eyes reviewer” hat on, as someone who hasn’t been reviewing PRs 
for as long as other members of the community.

Having said that, coming back to Josh’s initial question about starting from 
an existing artifact or from scratch, I am sure we can take an existing 
artifact, use it for inspiration, and adapt it to the nuances of our project. 

Just to state the obvious, the review guide should not only cover the main 
Cassandra project. It should also aim to extend to all the subprojects (such as 
Sidecar and Analytics).

Bernardo

> On Jan 18, 2025, at 4:37 PM, Ekaterina Dimitrova wrote:
> 
> Having updated guides is great and probably we can also put links to them 
> into the PR template, for convenience sake. Thus anyone who opens a PR will 
> see them.
> 
> On Sat, 18 Jan 2025 at 18:46, Jordan West wrote:
>> I generally support a guide we can point new contributors to as well. 
>> 
>> Jordan 
>> 
>> On Sat, Jan 18, 2025 at 10:20 Dinesh Joshi wrote:
>>> As a growing community with new committers and contributors, I would 
>>> support establishing a review guide to ensure consistency and uniformity of 
>>> feedback.
>>> 
>>> On Sat, Jan 18, 2025 at 5:21 AM Josh McKenzie wrote:
>>>> See thread "Checkstyle as style contract for Cassandra".
>>>> 
>>>> One area where we haven't formalized much guidance is around our code 
>>>> review culture as a project. I've worked with guides based on Google's 
>>>> "How to do a code review" eng-practices in the past with some success. In 
>>>> the above thread, while there are a lot of interconnected concerns (do we 
>>>> rely on checkstyle to enforce style, how are new people on the project 
>>>> expected to learn about style or get involved reviewing code, etc), we can 
>>>> split out the review piece to its own discussion.
>>>> 
>>>> We currently have a fairly extensive guide on our Code Style. Should we do 
>>>> the same around how to review and, if so, should we start with an existing 
>>>> artifact or from scratch?



Re: Patrick McFadin joins the PMC

2025-01-22 Thread Bernardo Botella
Nice! 

Congrats Patrick!

> On Jan 22, 2025, at 8:05 AM, Jordan West  wrote:
> 
> The PMC's members are pleased to announce that Patrick McFadin has accepted 
> an invitation to become a PMC member.
> 
> Thanks a lot, Patrick, for everything you have done for the project all these 
> years.
> 
> Congratulations and welcome!!
> 
> The Apache Cassandra PMC



Re: [DISCUSS] synchronisation of properties between Config.java and cassandra.yaml

2025-01-24 Thread Bernardo Botella
Love the suggestion of marking the hidden/advanced configuration properties 
with annotations. Leaving a “configuration” property out of the main 
configuration file should be a deliberate, well-thought-out and well-argued 
decision. I highly doubt we have 112 “advanced” properties that really need to 
be hidden to protect users from themselves :-(

Agreed with Ekaterina that it is worth reviewing the current properties, 
coming up with the subset that should stay hidden, and exposing the rest in the 
yaml file.
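
As a rough illustration of the annotation idea, here is a minimal Java sketch. Everything in it is an assumption made for the example: the @Hidden annotation, its reason attribute, the ExampleConfig class and its field names are hypothetical and are not taken from Config.java. The point is only that a check like the one quoted below could skip fields that were deliberately marked as hidden and report everything else.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names exist in Cassandra today.
final class HiddenPropertyScan
{
    /** Marks a property that is deliberately left out of the yaml file. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Hidden
    {
        String reason();
    }

    /** Stand-in for a config class; field names are made up for the example. */
    static class ExampleConfig
    {
        public int documented_property = 32;

        @Hidden(reason = "illustrative only")
        public int internal_tuning_knob = 1;

        public static final String IGNORED = "static fields are not properties";
    }

    /** Public, non-static fields without @Hidden are expected to appear in the yaml. */
    static List<String> propertiesExpectedInYaml(Class<?> configClass)
    {
        List<String> names = new ArrayList<>();
        for (Field field : configClass.getFields())
        {
            if (Modifier.isStatic(field.getModifiers()))
                continue; // constants are not configuration properties
            if (field.isAnnotationPresent(Hidden.class))
                continue; // explicitly and deliberately hidden
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args)
    {
        System.out.println(propertiesExpectedInYaml(ExampleConfig.class)); // [documented_property]
    }
}
```

Requiring a reason on the annotation would force whoever hides a property to write down why.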


> On Jan 24, 2025, at 8:00 AM, Dmitry Konstantinov  wrote:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-20249
> 
> On Fri, 24 Jan 2025 at 15:40, Dmitry Konstantinov wrote:
>> Maybe I missed some patterns but it looks like a pretty good estimation, I 
>> did like 10 random checks manually to verify :-)
>> I will try to make an ant target with a similar logic (hopefully, during the 
>> weekend)
>> I will create a ticket to track this activity (to share attachments there to 
>> not overload the thread with such outputs in future).
>> 
>> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič wrote:
>>> Oh my god, 112? :DD I was thinking it would be less than 10.
>>> 
>>> Anyway, I think we need to integrate this to some ant target. If you 
>>> expanded on this, that would be great.
>>> 
>>> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov wrote:
 A very primitive implementation of the 1st idea below:
 
 // Needs: java.lang.reflect.Field, java.lang.reflect.Modifier, java.net.URL,
 //        java.nio.file.Files, java.nio.file.Paths, java.util.ArrayList, java.util.List
 String configUrl = "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml";
 
 // Every public, non-static field of Config is treated as a top-level property.
 Field[] allFields = Config.class.getFields();
 List<String> topLevelPropertyNames = new ArrayList<>();
 for (Field field : allFields)
 {
     if (!Modifier.isStatic(field.getModifiers()))
     {
         topLevelPropertyNames.add(field.getName());
     }
 }
 
 URL url = new URL(configUrl);
 List<String> lines = Files.readAllLines(Paths.get(url.toURI()));
 
 // A property counts as present if a line starts with "name:", "#name:" or "# name:".
 int missedCount = 0;
 for (String propertyName : topLevelPropertyNames)
 {
     boolean found = false;
     for (String line : lines)
     {
         if (line.startsWith(propertyName + ":")
             || line.startsWith("#" + propertyName + ":")
             || line.startsWith("# " + propertyName + ":"))
         {
             found = true;
             break;
         }
     }
     if (!found)
     {
         missedCount++;
         System.out.println(propertyName);
     }
 }
 System.out.println("Total missed:" + missedCount);
 
 It prints the following config property names which are defined in 
 Config.java but not present as "property" or "# property " in a file:
 permissions_cache_max_entries
 roles_cache_max_entries
 credentials_cache_max_entries
 auto_bootstrap
 force_new_prepared_statement_behaviour
 use_deterministic_table_id
 repair_request_timeout
 stream_transfer_task_timeout
 cms_await_timeout
 cms_default_max_retries
 cms_default_retry_backoff
 epoch_aware_debounce_inflight_tracker_max_size
 metadata_snapshot_frequency
 available_processors
 repair_session_max_tree_depth
 use_offheap_merkle_trees
 internode_max_message_size
 native_transport_max_message_size
 native_transport_max_request_data_in_flight_per_ip
 native_transport_max_request_data_in_flight
 native_transport_receive_queue_capacity
 min_free_space_per_drive
 max_space_usable_for_compactions_in_percentage
 reject_repair_compaction_threshold
 concurrent_index_builders
 max_streaming_retries
 commitlog_max_compression_buffers_in_pool
 max_mutation_size
 dynamic_snitch
 failure_detector
 use_creation_time_for_hint_ttl
 key_cache_migrate_during_compaction
 key_cache_invalidate_after_sstable_deletion
 paxos_cache_size
 file_cache_round_up
 disk_optimization_estimate_percentile
 disk_optimization_page_cross_chance
 purgeable_tobmstones_metric_granularity
 windows_timer_interval
 otc_coalescing_strategy
 otc_coalescing_window_us
 otc_coalescing_enough_coalesced_messages
 otc_backlog_expiration_interval_ms
 scripted_user_defined_functions_enabled
 user_defined_functions_threads_enabled
 allow_insecure_udfs
 allow_extra_insecure_udfs
 user_defined_functions_warn_timeout
 user_defined_functions_fail_timeout
 user_function_timeout_policy
 back_pressure_enabled
 back_pressure_strategy
 repair_command_pool_full_strategy
 repair_command_pool_size
 block_for_peers_timeout_in_secs
 block_for_peers_in_remote_dcs
 skip_stream_disk_space_check
 snapshot_on_repaired_data_mismatch
 validation_preview_purge_head_start
 initial_range_tombstone_list_allocation_size
 range_tombstone_list_growth_factor
 snapshot_on_duplicate_row_detection
 ch

Checkstyle as style contract for Cassandra

2025-01-15 Thread Bernardo Botella
Hi everyone!

I wanted to raise a question about code style for the project. I've been 
receiving some feedback on PRs about the need to:
- Have curly braces start on a new line
- Remove curly braces if the condition or loop has only one expression

Taking a look at the official Code Style published on the website, I read that:
"{ and } are placed on a new line except when empty or opening a multi-line 
lambda expression. Braces may be elided to a depth of one if the condition or 
loop guards a single expression."

This addresses the first type of comment I mentioned (curly braces starting 
on a new line), but leaves open the second type (removing curly braces that 
are not needed).
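
As a rough illustration of the two rules quoted above, here is a minimal sketch. The class and method names are made up for this example and are not taken from the Cassandra code base; only the brace placement matters.

```java
// Hypothetical example class, used only to show brace placement.
public class BraceStyleExample
{
    // Opening and closing braces each sit on their own line.
    public static int countPositives(int[] values)
    {
        int count = 0;
        for (int value : values)
        {
            // Braces elided to a depth of one: the condition guards a single expression.
            if (value > 0)
                count++;
        }
        return count;
    }
}
```

Whether the braces around the single-statement `if` must be removed or may be kept is exactly the open question here.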

But, when looking at the checkstyle.xml, I don't see any rule enforcing either 
of those two types of comments.

I believe checkstyle.xml should be our contract, so I'm proposing here:

For the "curly braces starting on a new line" rule, add something like what we 
already have in the Sidecar and Analytics projects:



...


That way, we can fail fast and not worry about those comments on PRs. This of 
course may be painful, as we probably will have to fix a bunch of wrongly 
placed brackets all over the place.

If there are no concerns here, I'll be more than happy to bite the bullet and 
add a patch for this.



For "remove not needed curly braces", I understand that it tends to be the 
preference in the code, so we either modify the documentation and add a rule 
for that to the checkstyle.xml, or we are fine with that style and there is no 
need to remove them in patches.

I would like to hear the community's thoughts on this one. My preference is 
to always use braces, but that's just a preference, so it's perfectly fine 
not to enforce it and leave the documentation as is.

Thanks everyone!
Bernardo

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-14 Thread Bernardo Botella
Guo: I think I’d be -1 on your name change suggestion. In my head, 
LOOSE_NOT_NULL would imply that there is another NOT_NULL which happens to be 
NOT_NULL. That is not the case now, after this thread's discussion. There are a 
lot of differences between MySQL and Cassandra, and I think it is a valid 
assumption that this constraint's behavior is one of them. I don’t think there 
is a need to be verbose in naming the constraint.

Bernardo



> On Feb 11, 2025, at 12:42 AM, guo Maxwell  wrote:
> 
> I think it may be better to use LOOSE_NOT_NULL instead of NOT_NULL.
> The reason is: NOT_NULL can easily make users think that it is a related 
> function of MYSQL, but in fact we are different.
> Changing a different name may avoid users' preconceived feelings.
> 
> On Tue, Feb 11, 2025 at 01:55, Dinesh Joshi <djo...@apache.org> wrote:
>> On Mon, Feb 10, 2025 at 9:05 AM Bernardo Botella 
>> <conta...@bernardobotella.com> wrote:
>>> We have consensus then. Let’s ditch the non strict version, and rename the 
>>> STRICTLY_NOT_NULL to NOT_NULL.
>> 
>> Can you give this thread at least 24-48 hours to ensure we capture any other 
>> perspectives? 



Re: [DISCUSS] Fine grained max size guardrails

2025-02-08 Thread Bernardo Botella
Thanks everyone for the inputs.

Dinesh: "constraint should not violate the max bound of the guardrail” -> Yes, 
that statement is true with the proposed patch. With the code as is, the write 
will fail if it either does not comply with the guardrail OR does not comply 
with the constraint. The CEP touched on this as well, stating that guardrails 
take precedence over constraints defined in schemas, so no matter what, these 
guardrails will always be respected.
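
A minimal sketch of that "guardrail OR constraint" rule, assuming both limits are expressed as a maximum size in bytes. The class and method names are hypothetical and are not the names used in the patch:

```java
// Hypothetical sketch; names and signature are illustrative, not Cassandra APIs.
public class SizeChecks
{
    /**
     * Returns true only when the value satisfies both the operator-defined
     * guardrail and the schema-defined constraint.
     */
    public static boolean isWriteAllowed(long valueSizeBytes, long guardrailMaxBytes, long constraintMaxBytes)
    {
        boolean withinGuardrail = valueSizeBytes <= guardrailMaxBytes;
        boolean withinConstraint = valueSizeBytes <= constraintMaxBytes;
        return withinGuardrail && withinConstraint;
    }
}
```

With a 1 MiB guardrail and a looser 2 MiB constraint, a 1.5 MiB value is still rejected, which is the sense in which the guardrail is always respected.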

Thanks,
Bernardo

> On Feb 8, 2025, at 6:09 PM, Dinesh Joshi  wrote:
> 
> Guardrails and constraints serve distinct purposes. Guardrails allow the 
> operator to define reasonable bounds while constraints allow the developer to 
> do the same in the schema. However the constraint should not violate the max 
> bound of the guardrail. For example, if an operator defines the max size of a 
> column to be 1MiB then a constraint in the schema cannot go beyond this max 
> size limit. This allows the operator to define reasonable limits while 
> allowing the developer control over their application’s limits.
> 
> On Sat, Feb 8, 2025 at 12:03 PM Bernardo Botella 
> <conta...@bernardobotella.com> wrote:
>> Hi everyone,
>> 
>> After Constraints framework was merged in, I would like to come back to the 
>> discussion Jordan brought up in this Jira:
>> https://issues.apache.org/jira/browse/CASSANDRA-19677
>> 
>> For context, that Jira ticket (and PR) is adding a bunch of more fine 
>> grained size thresholds for column types using guardrails, expanding on what 
>> these Jiras added:
>> https://issues.apache.org/jira/browse/CASSANDRA-17151
>> https://issues.apache.org/jira/browse/CASSANDRA-17150
>> 
>> Now, we have an alternative way to set sizes for specific columns using 
>> constraints (we have the LENGTH constraint, which is technically different, 
>> but adding a SIZE constraint is on the roadmap and straightforward).
>> 
>> Jordan raised a really valid concern that these new guardrails may be adding 
>> some noise to an already crowded space such as settings. On the other hand, 
>> these guardrails operate at a different level than constraints, as they are 
>> generic as opposed to column specific.
>> 
>> We would like to hear what the community thinks in this case. Should these 
>> guardrails go in? Or do we drop them in favour of plain constraints?
>> 
>> My two cents: My opinion is that these guardrails still add value and give 
>> operators more fine-grained control to "protect" the database.
>> 
>> Regards,
>> Bernardo



Re: Welcome Jeremiah Jordan to the PMC

2025-02-16 Thread Bernardo Botella
Congrats!

> On Feb 16, 2025, at 9:08 AM, Aaron  wrote:
> 
> Congratulations, JD!
> 
> On Sat, Feb 15, 2025 at 7:05 AM Jasonstack Zhao Yang 
> <jasonstack.z...@gmail.com> wrote:
>> Congrats!
>> 
>> On Sat, 15 Feb 2025 at 20:25, Maxim Muzafarov wrote:
>>> Congratulations Jeremiah!
>>> 
>>> On Sat, 15 Feb 2025 at 05:01, Paulo Motta wrote:
>>> >
>>> > Congrats JD!
>>> >
>>> > On Fri, 14 Feb 2025 at 18:35 guo Maxwell wrote:
>>> >>
>>> >> Congrats!
>>> >> On Sat, Feb 15, 2025 at 6:22 AM, Tolbert, Andy wrote:
>>> >>>
>>> >>> Congrats JD!
>>> >>>
>>> >>> On Fri, Feb 14, 2025 at 4:13 PM >> >>> > wrote:
>>> 
>>>  Congratulations, well deserved!
>>> 
>>>  On Feb 14, 2025, at 20:40, Alex Petrov wrote:
>>> 
>>>  
>>>  Congratulations!
>>> 
>>>  On Fri, Feb 14, 2025, at 7:33 PM, Josh McKenzie wrote:
>>> 
>>>  Congrats Jeremiah!
>>> 
>>>  I know you're excited to have yet another email list to attend to, 
>>>  aren't you? :D
>>> 
>>>  On Fri, Feb 14, 2025, at 1:29 PM, Jeremiah Jordan wrote:
>>> 
>>>  Thanks all!  Excited to continue being a part of the project in this 
>>>  new role.
>>> 
>>>  -Jeremiah Jordan
>>> 
>>>  On Feb 14, 2025 at 12:23:17 PM, Francisco Guerrero wrote:
>>> 
>>>  Congrats!
>>> 
>>>  On 2025/02/14 18:20:02 Yifan Cai wrote:
>>> 
>>>  Congrats!
>>> 
>>> 
>>>  On Fri, Feb 14, 2025 at 10:16 AM Jordan West wrote:
>>> 
>>> 
>>>  > Congrats, JD! Welcome aboard!
>>> 
>>>  >
>>> 
>>>  > Jordan
>>> 
>>>  >
>>> 
>>>  > On Fri, Feb 14, 2025 at 11:01 Mick Semb Wever wrote:
>>> 
>>>  >
>>> 
>>>  >>.
>>> 
>>>  >>
>>> 
>>>  >> > I hope you will join me in welcoming him to the committee.
>>> 
>>>  >>
>>> 
>>>  >>
>>> 
>>>  >> Welcome JD!
>>> 
>>>  >>
>>> 
>>>  >
>>> 
>>> 
>>> 
>>> 



Re: Welcome Caleb Rackliffe to the PMC

2025-02-20 Thread Bernardo Botella
So much good news today! Congratulations Caleb!



> On Feb 20, 2025, at 4:07 PM, Ekaterina Dimitrova  
> wrote:
> 
> That’s an awesome addition! Well done! Thanks for everything, Caleb!! Congrats!!!
> 
> On Thu, 20 Feb 2025 at 18:55, Jeremiah Jordan wrote:
>> Congrats Caleb!
>> 
>> On Feb 20, 2025 at 4:06:19 PM, Jon Haddad wrote:
>>> The PMC for Apache Cassandra is delighted to announce that Caleb Rackliffe 
>>> has joined its membership!
>>> 
>>> Caleb has been a member of the community for 10 years and is one of the 
>>> most active committers on the project.  
>>> 
>>> Please join us in welcoming Caleb to his new role!
>>> 
>>> Jon
>>> On behalf of the Cassandra PMC
>>> 
>>> 



[DISCUSS] Virtualise system_schema (CASSANDRA-19129)

2025-02-20 Thread Bernardo Botella
Hi everyone!

As part of the Jira ticket (CASSANDRA-20331) that creates a new system table 
to improve Constraints support in the driver, I have been pointed to this other 
Jira (CASSANDRA-19129). It makes perfect sense for us to move to virtual 
tables, and avoid growing the snowball of tables that will need to be 
migrated. So I think it's a good time to plan on moving these tables. Now, for 
doing so, I see two different approaches:
- Moving all the tables at once and making sure the driver can find them.
- Taking an incremental approach, moving the tables one by one.

I wanted to pick this list's brains on the potential concerns with either of 
those two approaches, and on whether there is any preference. Or, even better, 
whether someone has already thought of a better one.

In my opinion, being able to split the work across the different tables will 
help us spread the workload in a less dramatic way. Unless it becomes a burden 
for the actual client, that is, I think we'd be better off migrating them one 
by one to become virtual tables.

Regards,
Bernardo

Re: New committers: Maxwell Guo and Dmitry Konstantinov

2025-02-20 Thread Bernardo Botella
Nice!!

Congratulations Maxwell and Dmitry!! Well deserved milestone!! And great for 
the community!


> On Feb 20, 2025, at 9:47 AM, Štefan Miklošovič  wrote:
> 
> The Project Management Committee (PMC) for Apache Cassandra
> has invited Maxwell Guo and Dmitry Konstantinov to become committers and we 
> are pleased
> to announce that they have accepted.
> 
> Please join us in welcoming Maxwell Guo and Dmitry Konstantinov to their new 
> role and
> responsibility in our project community.
> 
> Stefan Miklosovic
> 
> On behalf of the Apache Cassandra PMC



Re: [VOTE] Release Apache Sidecar Cassandra 0.1.0

2025-03-02 Thread Bernardo Botella
+1 (nb)

Awesome milestone

On Fri, Feb 28, 2025 at 11:06 Josh McKenzie  wrote:

> +1 - great work everyone!
>
> On Fri, Feb 28, 2025, at 1:58 PM, Dinesh Joshi wrote:
>
> +1, thanks to everyone who worked towards this milestone.
>
> On Fri, Feb 28, 2025 at 10:47 AM Doug Rohrer  wrote:
>
> +1 (nb)
>
> Thanks for putting in the work to get this ready to go!
>
> Doug
>
> > On Feb 28, 2025, at 7:46 AM, Brandon Williams  wrote:
> >
> > +1, verified sigs/checksums, tested packaging.
> >
> > Minor note: the packages do not declare any deps (like java.)  This is
> > probably not an issue in practice since nobody will run a dedicated
> > 'sidecar machine' but still could be improved.
> >
> > Kind Regards,
> > Brandon
> >
> > On Thu, Feb 27, 2025 at 4:15 PM Francisco Guerrero 
> wrote:
> >>
> >> Proposing the test build of Cassandra Sidecar 0.1.0 for release.
> >>
> >> sha1: a2c19e8ccf04bd3ddbdf8ac4d792d2d55f2e497f
> >> Git: https://github.com/apache/cassandra-sidecar/tree/0.1.0-tentative
> >> Maven Artifacts:
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-server/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-client/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-adapters-cassandra41/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-adapters-base/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-client/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-client-common/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-client-all/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-server-common/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1388/org/apache/cassandra/sidecar-vertx-auth-mtls/0.1.0/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-client/0.1.0-jdk8/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-vertx-client/0.1.0-jdk8/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-client-common/0.1.0-jdk8/
> >>
> https://repository.apache.org/content/repositories/orgapachecassandra-1387/org/apache/cassandra/sidecar-vertx-client-all/0.1.0-jdk8/
> >>
> >> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> >>
> https://dist.apache.org/repos/dist/dev/cassandra/cassandra-sidecar/0.1.0/
> >>
> >> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
> >>
> >> [1]: CHANGES.txt:
> https://github.com/apache/cassandra-sidecar/blob/0.1.0-tentative/CHANGES.txt
> >> [2]: NEWS.txt:
> https://github.com/apache/cassandra-sidecar/blob/0.1.0-tentative/NEWS.txt
>
>
>


Re: [RELEASE] Apache Cassandra Sidecar 0.1.0 released

2025-03-07 Thread Bernardo Botella
This is a huge milestone! It’s incredible to see this release happening. 
Congrats to everyone involved!

> On Mar 7, 2025, at 9:48 AM, Francisco Guerrero  wrote:
> 
> The Cassandra team is pleased to announce the release of Apache Sidecar 
> Cassandra version 0.1.0.
> 
> 
> Downloads of source and binary distributions are available here:
> 
>  https://dlcdn.apache.org/cassandra/cassandra-sidecar/0.1.0/
> 
> 
> The Maven artifacts can be found at:
> 
>  https://repo.maven.apache.org/maven2/org/apache/cassandra/
> 
> These will be mirrored to other repositories.
> 
> 
> As always, please review the changes[1] and pay attention to the release 
> notes[2]. Let us know[3] if you were to encounter any problem.
> 
> 
> Enjoy!
> 
> [1]: CHANGES.txt 
> https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/CHANGES.txt
> [2]: NEWS.txt 
> https://github.com/apache/cassandra-sidecar/blob/cassandra-sidecar-0.1.0/NEWS.txt
> [3]: https://issues.apache.org/jira/browse/CASSSIDECAR



Re: Welcome Ekaterina Dimitrova as Cassandra PMC member

2025-03-04 Thread Bernardo Botella
Congratulations!!

On Tue, Mar 4, 2025 at 22:17 Berenguer Blasi 
wrote:

> Congrats Ekaterina!
> On 5/3/25 2:03, Jasonstack Zhao Yang wrote:
>
> Congratulations Ekaterina!
>
> On Wed, 5 Mar 2025 at 08:18, Josh McKenzie  wrote:
>
>> Welcome Ekaterina!  \o/
>>
>> On Tue, Mar 4, 2025, at 7:07 PM, Francisco Guerrero wrote:
>>
>> Congratulations Ekaterina! Well deserved!
>>
>> On 2025/03/04 20:25:08 Paulo Motta wrote:
>> > Aloha,
>> >
>> > The Project Management Committee (PMC) for Apache Cassandra is
>> delighted to
>> > announce that Ekaterina Dimitrova has joined the PMC!
>> >
>> > Thanks a lot, Ekaterina, for everything you have done for the project
>> all
>> > these years.
>> >
>> > The PMC - Project Management Committee - manages and guides the
>> direction
>> > of the project, and is responsible for inviting new committers and PMC
>> > members to steward the longevity of the project.
>> >
>> > See https://community.apache.org/pmc/responsibilities.html if you're
>> > interested in learning more about the rights and responsibilities of PMC
>> > members.
>> >
>> > Please join us in welcoming Ekaterina Dimitrova to her new role in our
>> > project!
>> >
>> > Paulo, on behalf of the Apache Cassandra PMC
>> >
>>
>>
>>


Re: Welcome Aaron Ploetz as Cassandra Committer

2025-03-03 Thread Bernardo Botella
That’s awesome!!

Congratulations Aaron!! Long overdue for sure!


On Mon, Mar 3, 2025 at 16:25 Patrick McFadin  wrote:

> The Apache Cassandra PMC is very happy to announce that Aaron Ploetz has
> accepted the invitation to become a committer!
>
> Aaron has been tireless in his mission to help every single Cassandra
> operator on planet Earth. If you don't believe me, check out his Stack
> Overflow profile page: https://stackoverflow.com/users/1054558/aaron
> He's been a continuous speaker on Cassandra topics and is one of the
> coordinators for the Planet Cassandra meetup. Those are just the
> recent highlights.
>
> Please join us in congratulating and welcoming Aaron.
>
> The Apache Cassandra PMC members
>


Re: [VOTE][IP CLEARANCE] Spark-Cassandra-Connector

2025-03-18 Thread Bernardo Botella
+1 (nb)

> On Mar 18, 2025, at 10:52 AM, Yifan Cai  wrote:
> 
> +1 (nb)
> 
> From: Jeremiah Jordan 
> Sent: Tuesday, March 18, 2025 10:32:14 AM
> To: dev@cassandra.apache.org 
> Cc: gene...@incubator.apache.org 
> Subject: Re: [VOTE][IP CLEARANCE] Spark-Cassandra-Connector
>  
> +1
> 
> On Mar 18, 2025 at 3:13:09 AM, Mick Semb Wever  > wrote:
>> (general@incubator cc'd)
>> 
>> Please vote on the acceptance of the Spark-Cassandra-Connector and its
>> IP Clearance:
>> https://incubator.apache.org/ip-clearance/cassandra-spark-cassandra-connector.html
>> 
>> All consent from original authors of the donation, and tracking of
>> collected CLAs, is found in
>> https://github.com/datastax/spark-cassandra-connector/pull/1376 and
>> https://docs.google.com/spreadsheets/d/1rkFtfnXbIckV1tYQlgFtwoHHOKUJj0vv-VndlQWA4rY
>> These do not all require acknowledgement before the vote.
>> 
>> The code is prepared for donation at
>> https://github.com/datastax/spark-cassandra-connector
>> 
>> Once this vote passes we will request ASF Infra to move the
>> datastax/spark-cassandra-connector as-is to
>> apache/cassandra-spark-connector  .  The master and gh-pages branches,
>> all tags, and all history, will be kept.  The master branch will be
>> renamed to trunk.
>> 
>> PMC members, please check carefully the IP Clearance requirements before 
>> voting.
>> 
>> The vote will be open for 72 hours (or longer). Votes by PMC members
>> are considered binding. A vote passes if there are at least three
>> binding +1s and no -1's.
>> 
>> regards,
>> Mick



Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-11 Thread Bernardo Botella
Benedict:

An alternative to that, keeping the CHECK word, would be to change the 
constraint name to IS_JSON. CHECK IS_JSON would read as you intend without the 
need to jump to REQUIRE. I think that’s true for the rest of the provided 
constraints as well.

Bernardo


> On Apr 11, 2025, at 6:02 AM, Benedict  wrote:
> 
> We have taken a different approach though, as we do not actually take a 
> predicate on the RHS and do not supply the column name. In our examples we 
> had eg CHECK JSON, which doesn’t parse unambiguously to a human. The 
> equivalent to Postgres would seem to be CHECK is_json(field).
> 
> I’m all for following an existing example, but once we decide to diverge the 
> justification is gone and we should decide holistically what we think is 
> best. So if we want to elide the column entirely and have a list of built in 
> restrictions, I’d prefer eg REQUIRE JSON since this parses unambiguously to a 
> human, whereas if we want to follow Postgres let’s do that, but that means eg 
> CHECK is_json(field).
> 
>> On 11 Apr 2025, at 10:57, Štefan Miklošovič  wrote:
>> 
>> 
>> While modelling that, we followed how it is done in SQL world, PostgreSQL as 
>> well as MySQL both use CHECK.
>> 
>> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS
>> https://dev.mysql.com/doc/refman/8.4/en/create-table-check-constraints.html
>> 
>> On Fri, Apr 11, 2025 at 10:43 AM Benedict wrote:
>>> I would prefer require/expect/is over check
>>> 
 On 11 Apr 2025, at 08:05, Štefan Miklošovič wrote:
 
 
 Yes, you will have it like that :) Thank you for this idea. Great example 
 of cooperation over diverse domains.
 
 On Fri, Apr 11, 2025 at 12:29 AM David Capwell wrote:
> I am biased but I do prefer
> 
> val3 text CHECK NOT NULL AND JSON AND LENGTH() < 1024
> 
> Here is a similar accord CQL
> 
> BEGIN TRANSACTION
>   LET a = (…);
>   IF a IS NOT NULL 
>   AND a.b IS NOT NULL 
>   AND a.c IS NULL; THEN
> -- profit
>   END IF
> COMMIT TRANSACTION
> 
>> On Apr 10, 2025, at 8:46 AM, Yifan Cai wrote:
>> 
>> Re: reserved keywords, “check” is currently not, and I don’t think it 
>> needs to be a reserved keyword with the proposal.
>> 
>> From: C. Scott Andreas
>> Sent: Thursday, April 10, 2025 7:59:35 AM
>> To: dev@cassandra.apache.org
>> Cc: dev@cassandra.apache.org
>> Subject: Re: Constraint's "not null" alignment with transactions and 
>> their simplification
>>  
>> If the proposal does not introduce “check” as a reserved keyword that 
>> would require quoting in existing DDL/DML, this concern doesn’t apply 
>> and the email below can be ignored. This might be the case if “CHECK NOT 
>> NULL” is the full token introduced rather than “CHECK” separately from 
>> constraints that are checked.
>> 
>> If “check” is introduced as a standalone reserved keyword: my primary 
>> feedback is on the introduction of reserved words in the CQL grammar 
>> that may affect compatibility of existing schemas.
>> 
>> In the Cassandra 3.x series, several new CQL reserved words were added 
>> (more than necessary) and subsequently backed out, because it required 
>> users to begin quoting schemas and introduced incompatibility between 
>> 3.x and 4.x for queries and DDL that “just worked” before.
>> 
>> The word “check” is used in many domains (test/evaluation engineering, 
>> finance, business processes, etc) and is likely to be used in user 
>> schemas. If the proposal introduces this as a reserved word that would 
>> require it to be quoted if used in table or column names, this will 
>> create incompatibility for existing user queries on upgrade.
>> 
>> Otherwise, ignore me. :)
>> 
>> Thanks,
>> 
>> – Scott
>> 
>> –––
>> Mobile
>> 
>>> On Apr 10, 2025, at 7:47 AM, Jon Haddad wrote:
>>> 
>>> 
>>> This looks like a really nice improvement to me. 
>>> 
>>> 
>>> On Thu, Apr 10, 2025 at 7:27 AM Štefan Miklošovič 
>>> <smikloso...@apache.org> wrote:
>>> Recently, David Capwell was commenting on constraints in one of Slack 
>>> threads (1) in dev channel and he suggested that the current form of 
>>> "not null" constraint we have right now in place, e.g like this
>>> 
>>> create table ks.tb (id int primary key, val int check not_null(val));
>>> 
>>> could be instead of that form used like this:
>>> 

Re: [DISCUSS] 5.1 should be 6.0

2025-04-10 Thread Bernardo Botella
+1 on 6.0

> On Apr 10, 2025, at 1:07 PM, Josh McKenzie  wrote:
> 
> Let's keep this thread to just +1's on 6.0; I'll see about a proper isolated 
> [DISCUSS] thread for my proposal above hopefully tomorrow, schedule 
> permitting.
> 
> On Thu, Apr 10, 2025, at 3:46 PM, Jeremiah Jordan wrote:
>> +1 to 6.0
>> 
>> On Thu, Apr 10, 2025 at 1:38 PM Josh McKenzie wrote:
>> 
>> +1 to 6.0.
>> 
>> On Thu, Apr 10, 2025, at 2:28 PM, Jon Haddad wrote:
>>> Bringing this back up.
>>> 
>>> I don't think we have any reason to hold up renaming the version.  We can 
>>> have a separate discussion about what upgrade paths are supported, but 
>>> let's at least address this one issue of version number so we can have 
>>> consistent messaging.  When I talk to people about the next release, I'd 
>>> like to be consistent with what I call it, and have a unified voice as a 
>>> project.
>>> 
>>> Jon
>>> 
>>> On Thu, Jan 30, 2025 at 1:41 AM Mick Semb Wever wrote:
>>> .
>>>
 If you mean only 4.1 and 5.0 would be online upgrade targets, I would 
 suggest we change that to T-3 so you encompass all “currently supported” 
 releases at the time the new branch is GAed.
>>> I think that's better actually, yeah. I was originally thinking T-2 from 
>>> the "what calendar time frame is reasonable" perspective, but saying "if 
>>> you're on a currently supported branch you can upgrade to a release that 
>>> comes out" makes clean intuitive sense. That'd mean:
>>> 
>>> 6.0: 5.0, 4.1, 4.0 online upgrades supported. Drop support for 4.0. API 
>>> compatible guaranteed w/5.0.
>>> 7.0: 6.0, 5.0, 4.1 online upgrades supported. Drop support for 4.1. API 
>>> compatible guaranteed w/6.0.
>>> 8.0: 7.0, 6.0, 5.0 online upgrades supported. Drop support for 5.0. API 
>>> compatible guaranteed w/7.0.
>>> 
>>> 
>>> 
>>> 
>>> I like this.
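
The T-3 rule quoted above can be sketched as a small lookup, assuming the release line is exactly the one named in the thread (4.0, 4.1, 5.0, then 6.0, 7.0, 8.0). The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "T-3" online upgrade window discussed in the thread.
public class UpgradeWindow
{
    // Releases oldest first; the future versions are the ones named in the thread.
    private static final List<String> RELEASES = Arrays.asList("4.0", "4.1", "5.0", "6.0", "7.0", "8.0");

    /** Returns up to `window` releases immediately preceding `target`, newest first. */
    public static List<String> onlineUpgradeSources(String target, int window)
    {
        int index = RELEASES.indexOf(target);
        if (index < 0)
            throw new IllegalArgumentException("Unknown release: " + target);

        List<String> sources = new ArrayList<>();
        for (int i = index - 1; i >= 0 && i >= index - window; i--)
            sources.add(RELEASES.get(i));
        return sources;
    }
}
```

Calling `onlineUpgradeSources("6.0", 3)` yields 5.0, 4.1 and 4.0, matching the first line of the list in the quoted message.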



Re: Project hygiene on old PRs

2025-04-13 Thread Bernardo Botella
Thanks Josh and Stefan for the comments!

Such a script can definitely be helpful for keeping our house tidy. It seems 
that the thread hasn’t gained much steam yet. As this is by no means an urgent 
matter, let’s give people some more time to pitch in. I’ll wait a few more days 
for answers on this thread. Then, if no one has any strong opinion against it, 
I can start closing old PRs.
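
For illustration only, the title-matching step that the quoted message below describes could look roughly like this. It is not the actual script; the class name is hypothetical, and it assumes the PR title starts with the ticket key, as in "CASSANDRA-123 a title of the ticket":

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; looking up the ticket status in JIRA is left out.
public class PrTitleParser
{
    // Assumes the title begins with the ticket key, e.g. "CASSANDRA-123 ...".
    private static final Pattern TICKET_KEY = Pattern.compile("^(CASSANDRA-\\d+)\\b");

    /** Returns the ticket key at the start of the PR title, if there is one. */
    public static Optional<String> ticketKey(String prTitle)
    {
        Matcher matcher = TICKET_KEY.matcher(prTitle.trim());
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

Titles that do not follow the format return an empty result, which is why the approach is described as not bulletproof.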

Thanks!
Bernardo

> On Apr 11, 2025, at 10:22 AM, Štefan Miklošovič  
> wrote:
> 
> I have a small script which scans GH pull requests (their titles) and looks 
> into JIRA to see what is their status. When it is "resolved" it prints it to 
> the console. Then I go over the links of PRs and close them one by one. This 
> relies on the title of the PR to be in exact format (CASSANDRA-123 a title of 
> the ticket) and not bullet proof but I have not come up with anything better 
> so far.
> 
> On Fri, Apr 11, 2025 at 5:19 PM Josh McKenzie <jmcken...@apache.org> wrote:
>> +1 from me.
>> 
>> My intuition is that this is a logical consequence of us not using github to 
>> merge PR's so they don't auto-close. Which seems like it's a logical 
>> consequence of us using merge commits instead of per-branch commits of 
>> patches.
>> 
>> The band-aid of at least having a human-in-the-loop to close out old 
>> inactive things is better than the status quo; the information is all still 
>> available in github but the status of the PR's will communicate different 
>> things.
>> 
>> On Thu, Apr 10, 2025, at 7:14 PM, Bernardo Botella wrote:
>>> Hi everyone!
>>> 
>>> First of all, this may have come up before, and I understand it is really 
>>> hard to keep a tidy house with so many different collaborations. But I 
>>> can't help feeling that coming to the main Apache Cassandra repository 
>>> and seeing more than 600 open PRs, some of them without activity for 5+ 
>>> years, gives the wrong impression about the love and care that we all share 
>>> for this code base. I think we can find an easy-to-follow agreement to try 
>>> and keep things a bit tidier. I wanted to propose some kind of "rule" that 
>>> allows us to directly close PRs that haven't had activity in a reasonable 
>>> and conservative amount of time of, let's say, 6 months? I want to 
>>> reiterate that I mean no activity at all for six months from the PR author. 
>>> I understand that complex PRs can stay open for longer than that period, 
>>> and that's perfectly fine.
>>> 
>>> What do you all think?
>>> 
>>> Bernardo
>> 



Re: Constraint's "not null" alignment with transactions and their simplification

2025-04-14 Thread Bernardo Botella
Now this is becoming a really interesting discussion. Thanks everyone for 
pitching in!

Here is my take on some of the proposed changes:

We are talking about treating some constraints (NOT_NULL, JSON) as special 
cases by omitting the CHECK keyword (not reserved as per the current 
implementation). Now, while this may seem like a nice approach to feature gaps 
in our CQL, it really worries me that by doing so we open the door to 
unneeded complexity, both in the implementation and conceptually, in the 
constraints framework. 

In my mind, what is the constraints framework? It is a simple and really 
easy-to-extend integration point for validators for a row (LENGTH, SCALAR, and 
REGEX are really good examples of it).

What’s NOT the responsibility of the constraints framework? I don’t think it 
should be used to deliver partial solutions to feature gaps in CQL data 
modeling. Let’s take the JSON constraint as an example. In the constraints case, 
it is as simple as checking that the provided string is valid JSON. Easy. 
Simple. But what would JSON look like if it were a first-class citizen in CQL? 
Setting the grammar aside, it would probably be handled differently. Things 
like: Can we store it better? Do we allow queries on fields inside the JSON 
blob? Are there any optimizations that can be done when 
serializing/deserializing it? All of those definitely fall outside the scope of 
the constraints framework. So, I guess the question then becomes: is the JSON 
constraint a valid constraint to have? Just a temporary patch until (if) a JSON 
type is in? Should we just remove it and keep ignoring JSON? Those are valid 
questions and discussions to have. But I really think that we shouldn’t see 
this simple validator as a fully fledged, first-class type in CQL. Similar 
arguments could be made for the NOT_NULL constraint that has spawned so many 
interesting conversations.

Now, having made that distinction, I don’t think we should have constraints 
that can be defined differently in the CQL statement. They should all have a 
CHECK keyword, specifying that they are a constraint that will be checked (aka, 
the row value will be validated against whatever function). That’s easy to 
identify, and it makes it conceptually easy to understand the limitations it 
comes with (as opposed to the JSON example mentioned above).
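
To make the "simple validator" view concrete, here is a minimal sketch in which every constraint is just a check applied to the value being written. The types below are hypothetical and are not the classes in the Cassandra code base; treating null as passing the length check (so that NOT_NULL is the only check that rejects it) is also an assumption of this sketch.

```java
// Hypothetical sketch of constraints as plain validators; not Cassandra's actual classes.
public class ConstraintSketch
{
    /** A constraint only answers one question: is this value acceptable? */
    public interface Constraint
    {
        boolean isSatisfiedBy(String value);
    }

    public static final Constraint NOT_NULL = value -> value != null;

    /** Assumption: null is left to NOT_NULL, so it passes the length check. */
    public static Constraint maxLength(int limit)
    {
        return value -> value == null || value.length() <= limit;
    }

    /** Models "CHECK a AND b AND ...": every listed constraint must pass. */
    public static boolean check(String value, Constraint... constraints)
    {
        for (Constraint constraint : constraints)
        {
            if (!constraint.isSatisfiedBy(value))
                return false;
        }
        return true;
    }
}
```

Nothing in this shape knows how a value is stored or queried, which is the limitation being argued for: a validator of this kind cannot stand in for a first-class type.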

Bernardo



> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič  
> wrote:
> 
> As Yifan already said, "check" is not a reserved word now and its usage does 
> not collide with anything. 
> 
> If people have columns, tables, keyspaces with name "check" that seems to 
> work already so they don't need to do anything:
> 
> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary key 
> (id));
> 
> ALTER TABLE ks.tb ADD check int check check > 0;
> 
> DESCRIBE ks.tb;
> 
> CREATE TABLE ks.tb (
> id int CHECK id > 0 PRIMARY KEY,
> check int CHECK check > 0,
> val int CHECK val > 0
> ) 
> 
> CREATE TABLE ks.check (id int check id > 0, check int check check > 0, 
> primary key (id));
> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> CREATE TABLE check.check (check int check check > 0, val int check val > 0, 
> primary key (check));
> INSERT INTO check.check (check , val ) VALUES ( 1, 1);
> 
> PostgreSQL has this:
> 
> CREATE TABLE products (
> product_no integer,
> name text,
> price numeric CHECK (price > 0)
> );
> 
> we follow this approach (minus parenthesis). We can also chain constraints 
> whatever we like
> 
> val int CHECK val > 0 and age < 100
> 
> We can make a stab in trying to model
> 
> val int not null check val > 0
> 
> this is how PostgreSQL has it (1).
> 
> but that would be more complicated on the implementation side because we 
> would need to also accommodate "CQL describe" to dump it like that, plus I am 
> not sure how complicated it would be to tweak the parser as well.
> 
> I will try to make some progress and will report back.
> 
> Regards  
> 
> (1) 
> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL
> 
> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi wrote:
>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin wrote:
>>> I'm loving all the syntax discussion lately. It's a good debate and 
>>> essential for the project's future with a good developer experience.
>> 
>> +1
>>  
>>> On NULL. I've been asked this a million times by end users. Why is there no 
>>> "NOT NULL" in the schema?
>> 
>> I would've expected this to be in billions by now ;)
>>  
>>> I'm in favor of the standard SQL syntax here because it's what users have 
>>> been using forever: 
>>> name   text NOT NULL
>> 
>> I hold a weak opinion on this. We don't necessarily have to align with the 
>> standard SQL syntax. In my experience, users subconsciously feel Cassandra 
>> is a SQL database and try to design their schema to fit the trad

Re: Project hygiene on old PRs

2025-04-14 Thread Bernardo Botella
Just for reference, I ran a script I wrote using Stefan’s as inspiration, and 
there are 413 PRs without any activity for the past 6 months.
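
A minimal sketch of this kind of staleness check, not the actual script mentioned above. The record fields, the sample data, and the 180-day cutoff are assumptions for illustration, and "last update" stands in for "last activity from the PR author", which a real script would have to check separately:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample records; a real script would fetch these from the
# hosting platform's API. The field names are assumptions for illustration.
SAMPLE_PRS = [
    {"number": 101, "state": "open", "updated_at": "2024-06-01T12:00:00Z"},
    {"number": 102, "state": "open", "updated_at": "2025-03-20T08:30:00Z"},
    {"number": 103, "state": "closed", "updated_at": "2023-01-15T00:00:00Z"},
]


def parse_timestamp(value):
    """Parse a UTC timestamp of the form 2025-04-14T10:53:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_prs(prs, now, max_idle_days=180):
    """Return the numbers of open PRs whose last update is older than the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        pr["number"]
        for pr in prs
        if pr["state"] == "open" and parse_timestamp(pr["updated_at"]) < cutoff
    ]


if __name__ == "__main__":
    reference = datetime(2025, 4, 14, tzinfo=timezone.utc)
    print(stale_prs(SAMPLE_PRS, reference))
```

Passing the reference time in explicitly keeps the check reproducible, so the same list of PRs gives the same answer regardless of when the script runs.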

Bernardo

> On Apr 14, 2025, at 6:39 AM, Štefan Miklošovič wrote:
> 
> 
> 
> On Mon, Apr 14, 2025 at 3:22 PM Josh McKenzie wrote:
>>> Funny that people don't forget to create a PR when trying to make a change 
>>> but as soon as it is delivered the respective PR is "memory holed".
>> We use the PR mechanisms for review but don't use the PR mechanism for 
>> merge. Makes sense that we open them since they're part of our workflow and 
>> forget to close them.
>> 
>> I'd much prefer a workflow where we just used the industry standard tools 
>> for both opening and closing (i.e. had per-branch patches we merged using gh 
>> after review passed and linked CI passed). But I suspect that's another 
>> [DISCUSS] thread and 
>> I should appropriately don metaphorical flame retardant protective gear 
>> before wading back into that particular dumpster fire.
>> 
> 
> genuinely chuckled :D Look ... just try to put a JIRA number into a title and 
> bonus points for closing it afterwards. If a PR is merged, and a ticket is 
> going to be resolved, people still need to put there a commit url from GitHub 
> etc ... so maybe internalizing it a little bit more that a PR might be closed 
> too would be great.
> 
> Unless we start to merge the PRs by "pushing buttons" I don't think this is 
> going to be resolved.
> 
> What is interesting is that there is automatic creation of a link into a JIRA 
> ticket when a PR is created (I guess that works by scanning a title of a PR 
> and linking it to a ticket? Or does it look into the name of a branch of that 
> PR?). Anyway, I would expect that the same is done when a JIRA is closed - 
> that it would go over the links of PRs and close them. When it can work one 
> way, why cannot it work the other way around as well?
>  
>> :D
>> 
>> On Mon, Apr 14, 2025, at 8:27 AM, Mick Semb Wever wrote:
>>> 
>>> 
>>> On Mon, 14 Apr 2025 at 10:23, Štefan Miklošovič wrote:
>>> BTW If you still do not want to take care of closing it, that is also fine, 
>>> because we have a script at least. 
>>> 
>>> 
>>> 
>>> Relying on the PR name seems a bit brittle.  Maybe it wouldn't take much to 
>>> improve it.
>>> e.g. would it be possible to also auto-detect which PRs, still open, have 
>>> no changes to merge? This is an easy indicator that the PR has otherwise 
>>> been merged.
>>> Stale PRs with file conflicts are another low-hanging-fruit category that 
>>> can get closed out.
>>> 
>>> Not putting the work on you Stefan, just brainstorming…
>> 



Project hygiene on old PRs

2025-04-10 Thread Bernardo Botella
Hi everyone!

First of all, this may have come up before, and I understand it is really hard 
to keep a tidy house with so many different collaborations. But I can't help 
feeling that coming to the main Apache Cassandra repository and seeing more 
than 600 open PRs, some of them without activity for 5+ years, gives the wrong 
impression about the love and care that we all share for this code base. I 
think we can find an easy-to-follow agreement to try and keep things a bit 
tidier. I wanted to propose some kind of "rule" that allows us to directly close 
PRs that haven't had activity in a reasonable and conservative amount of time 
of, let's say, 6 months? I want to reiterate that I mean no activity at all for 
six months from the PR author. I understand that complex PRs can stay open for 
longer than that period, and that's perfectly fine.

What do you all think?

Bernardo

Re: Project hygiene on old PRs

2025-04-14 Thread Bernardo Botella
+1 on Paulo’s proposal. That will definitely help.

I will give it one or two more days in case someone missed the thread, and if 
there are no voices against it, I’ll just close the stale PRs and raise a 
ticket with INFRA to close PRs when a Jira ticket is resolved.
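
A minimal sketch of how "close the PR once its ticket is resolved" could be decided, assuming the PR title carries the ticket key in the CASSANDRA-123 style described in this thread. The function names and input dictionaries are hypothetical; a real tool would query GitHub and Jira rather than take in-memory data:

```python
import re

# Ticket keys are assumed to look like CASSANDRA-123 somewhere in the PR title.
TICKET_KEY = re.compile(r"\bCASSANDRA-\d+\b")


def ticket_key(pr_title):
    """Return the first ticket key found in a PR title, or None if there is none."""
    match = TICKET_KEY.search(pr_title)
    return match.group(0) if match else None


def closable_prs(open_prs, ticket_status):
    """Pick the open PRs whose linked ticket is resolved.

    open_prs: dict mapping PR number -> PR title (hypothetical input).
    ticket_status: dict mapping ticket key -> status string (hypothetical input).
    """
    closable = []
    for number, title in sorted(open_prs.items()):
        key = ticket_key(title)
        # PRs without a recognizable key are skipped rather than guessed at.
        if key is not None and ticket_status.get(key, "").lower() == "resolved":
            closable.append(number)
    return closable


if __name__ == "__main__":
    prs = {
        1: "CASSANDRA-123 Fix a thing",
        2: "Improve docs",
        3: "CASSANDRA-456 Add a feature",
    }
    statuses = {"CASSANDRA-123": "Resolved", "CASSANDRA-456": "Open"}
    print(closable_prs(prs, statuses))
```

Title matching is brittle, as noted elsewhere in the thread: a PR without a ticket key in its title is simply never selected by this sketch.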

Bernardo

> On Apr 14, 2025, at 4:56 PM, Jordan West wrote:
> 
> If we want something to happen repeatably we should automate it, not add more 
> manual tasks to the list. Paulo’s suggestion seems to be in line with that 
> so +1 to something in that direction. 
> 
> We continually are swimming upstream making up our own process. The ask to 
> put the ticket number in the PR title requires manual effort because by 
> default GitHub takes the top line of the commit message which we explicitly 
> say shouldn’t contain the JIRA (that goes in the second line). Until we have 
> a solution it’s reasonable to ask folks to make that manual change but we 
> should also accept folks will forget or won’t do it because we asked them to 
> remember yet another manual step. 
> 
> Jordan 
> 
> On Mon, Apr 14, 2025 at 09:17 Paulo Motta wrote:
>> *committers (and not reviewers)
>> 
>> On Mon, Apr 14, 2025 at 12:13 PM Paulo Motta wrote:
>>> >  I am not sure why it is so hard for people to not forget to close a PR 
>>> > when their branch is merged. 
>>> 
>>> I wonder if reviewers* know they need to append the message "Closes #PR_ID" 
>>> to the end of the commit message to have the PR closed. This does not 
>>> seem very obvious, and it's also a bit inconvenient.
>> 
>>> 
>>> 
>>> Since Apache INFRA already links github PRs to the appropriate JIRA, it 
>>> would probably not be very hard to have the PR be closed when the JIRA is 
>>> resolved. If this does not sound stupid perhaps we could submit an INFRA 
>>> feature request to address this issue.
>>> 
>>> On Mon, Apr 14, 2025 at 3:17 AM Štefan Miklošovič wrote:
>>>> I am not sure why it is so hard for people to not forget to close a PR 
>>>> when their branch is merged. I stopped "fighting" this and I just run a 
>>>> script every few weeks. Funny that people don't forget to create a PR when 
>>>> trying to make a change but as soon as it is delivered the respective PR 
>>>> is "memory holed". A PR does not close itself if it was not obvious 
>>>> already.
>>>> 
>>>> On Mon, Apr 14, 2025 at 8:00 AM Bernardo Botella wrote:
>>>>> Thanks Josh and Stefan for the comments!
>>>>> 
>>>>> Such a script can definitely be helpful for this purpose of keeping our 
>>>>> house tidy. It seems that the thread hasn’t picked up much steam yet. As 
>>>>> this is by no means an urgent matter, let’s give people some more time 
>>>>> to chime in. I’ll wait a few more days for answers on this 
>>>>> thread. Then, if no one has any strong opinion against it, I can start 
>>>>> closing old PRs.
>>>>> 
>>>>> Thanks!
>>>>> Bernardo
>>>>> 
>>>>>> On Apr 11, 2025, at 10:22 AM, Štefan Miklošovič wrote:
>>>>>> 
>>>>>> I have a small script which scans GH pull requests (their titles) and 
>>>>>> looks into JIRA to see what their status is. When it is "resolved", it 
>>>>>> prints the PR to the console. Then I go over the links of PRs and close 
>>>>>> them one by one. This relies on the title of the PR being in an exact 
>>>>>> format (CASSANDRA-123 followed by the title of the ticket) and is not 
>>>>>> bulletproof, but I have not come up with anything better so far.
>>>>>> 
>>>>>> On Fri, Apr 11, 2025 at 5:19 PM Josh McKenzie wrote:
>>>>>>> +1 from me.
>>>>>>> 
>>>>>>> My intuition is that this is a logical consequence of us not using 
>>>>>>> github to merge PR's so they don't auto-close. Which seems like it's a 
>>>>>>> logical consequence of us using merge commits instead of per-branch 
>>>>>>> commits of patches.
>>>>>>> 
>>>>>>> The band-aid of

Re: [VOTE] Simplifying our release versioning process

2025-04-23 Thread Bernardo Botella
+1

> On Apr 22, 2025, at 7:20 PM, Joseph Lynch wrote:
> 
> I'm unclear if Josh/Ekaterina/Benedict's statements are part of the vote 
> amending our project governance. If consensus is required for breaking 
> changes with a strong preference for not breaking I am +1, but the original 
> text of Josh's proposal is merely "We use a deprecate-then-remove strategy 
> for API breaking changes (deprecate in release N, then remove in N+1)" and I 
> don't see anything in the linked Governance page referring to this discuss 
> policy on breaking changes.
> 
> Can we just remove the parenthetical in #4 or clarify that breaking changes 
> require a minimum duration as determined by a DISCUSS thread - not to be 
> shorter than 1 major release?
> 
> -Joey
> 
> On Tue, Apr 22, 2025 at 6:17 PM Patrick McFadin wrote:
>> +1
>> 
>> On Tue, Apr 22, 2025 at 8:52 AM Dmitry Konstantinov wrote:
>>> +1
>>> 
>>> On Tue, 22 Apr 2025 at 16:37, Caleb Rackliffe wrote:
>>>> +1
>>>> 
>>>> On Tue, Apr 22, 2025 at 8:37 AM Ekaterina Dimitrova wrote:
>>>>> +1
>>>>> 
>>>>> I also remember we agreed on Discuss thread for removing anything plus 
>>>>> preference for backward compatibility wherever it is possible. 
>>>>> 
>>>>> On Tue, 22 Apr 2025 at 7:00, Sam Tunnicliffe wrote:
>>>>>> +1
>>>>>> 
>>>>>> > On 17 Apr 2025, at 16:58, Josh McKenzie wrote:
>>>>>> > 
>>>>>> > [DISCUSS] thread: 
>>>>>> > https://lists.apache.org/thread/jy6vodbkh64plhdfwqz3l3364gsmh2lq
>>>>>> > 
>>>>>> > The proposed new versioning mechanism:
>>>>>> > • We no longer use semver .MINOR
>>>>>> > • Online upgrades are supported for all GA supported releases at 
>>>>>> > time of new .MAJOR
>>>>>> > • T-1 releases are guaranteed API compatible for non-deprecated 
>>>>>> > features
>>>>>> > • We use a deprecate-then-remove strategy for API breaking changes 
>>>>>> > (deprecate in release N, then remove in N+1)
>>>>>> > This would translate into the following for our upcoming releases 
>>>>>> > (assuming 3 supported majors at all times):
>>>>>> > • 6.0: 5.0, 4.1, 4.0 online upgrades are supported (grandfather 
>>>>>> > window). We drop support for 4.0. API compatibility is guaranteed w/5.0
>>>>>> > • 7.0: 6.0, 5.0, 4.1 online upgrades are supported (grandfather 
>>>>>> > window). We drop support for 4.1. API compatibility is guaranteed w/6.0
>>>>>> > • 8.0: 7.0, 6.0, 5.0 online upgrades are supported (fully on new 
>>>>>> > paradigm). We drop support for 5.0. API compatibility guaranteed w/7.0
>>>>>> > David asked the question:
>>>>>> >> 
>>>>>> >> 
>>>>>> >> Does this imply that each release is allowed to make breaking changes 
>>>>>> >> (assuming they followed the “correct” deprecation process)? My first 
>>>>>> >> instinct is to not like this
>>>>>> > 
>>>>>> > Each release would be allowed to make breaking changes but only for 
>>>>>> > features that have already been deprecated for one major release cycle.
>>>>>> > 
>>>>>> > This is a process change so as per our governance: 
>>>>>> > https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance,
>>>>>> >  it'll require a super majority of 50% of the roll called PMC in 
>>>>>> > favor. Current roll call is 21 so we need 11 pmc members to 
>>>>>> > participate, 8 of which are in favor of the change.
>>>>>> > 
>>>>>> > I'll plan to leave the vote open until we hit enough participation to 
>>>>>> > pass or fail it up to probably a couple weeks.
>>>>>> 
>>>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Dmitry Konstantinov