Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Joel Shepherd
WITH INDEX (or something equivalent) seems really useful. Less opinionated on the specific syntax, but I think there is a lot of value in the form of predictable, controllable performance, in giving developers more direct control over query execution, whether that's index selection or even low

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Jeremiah Jordan
> > On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe > wrote: > >> You mean like to control the tokenization/analysis of query terms? >> > Yes. Elastic for example lets you specify the query time analyzer in the query, overriding what is specified at the index level. https://www.elastic.co/guide

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Caleb Rackliffe
You mean like to control the tokenization/analysis of query terms? On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan wrote: > Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}. If we > move into allowing analysis/tokenization on indexed items, then a more > general WITH OPTIONS woul

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Caleb Rackliffe
So that would look something like... SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' : [, ] } On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe wrote: > You mean like to control the tokenization/analysis of query terms? > > On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan > wrote

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Jeremiah Jordan
Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}. If we move into allowing analysis/tokenization on indexed items, then a more general WITH OPTIONS would be useful for that too. That would let us add any other new options to a SELECT without needing to modify the grammar further.

[DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Caleb Rackliffe
Some of you are probably familiar with work in the DS fork to improve the selection of indexes for SAI queries in https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424 . While I'm eagerly anticip

Re: Capabilities

2024-12-20 Thread Benedict
Mostly conceptual; the problem with a linearizable history is that if you lose some of it (e.g. because some logic bug prevents you from processing some epoch) you stop the world until an operator can step in to perform surgery about what the history should be. I do know of one recent bug to schema ch

Re: Capabilities

2024-12-20 Thread Jordan West
On Fri, Dec 20, 2024 at 11:06 AM Benedict wrote: > If TCM breaks we all have a really bad time, much worse than if any one of > these features individually has problems. If you break TCM in the right way > the cluster could become inoperable, or operations like topology changes > may be prevented

Re: Capabilities

2024-12-20 Thread Benedict
If TCM breaks we all have a really bad time, much worse than if any one of these features individually has problems. If you break TCM in the right way the cluster could become inoperable, or operations like topology changes may be prevented. So, we want to keep its responsibilities scoped sensibly,

Re: Capabilities

2024-12-20 Thread Jon Haddad
I don’t know the details and limits of TCM well enough to comment on what it can do, but I think it’s fair to say that if we can’t put a few hundred configuration options in taking up maybe a few MB, there’s a fundamental problem with it, and we need to seriously reconsider if it’s ready for product

Re: Capabilities

2024-12-20 Thread Paulo Motta
Apologies I missed the forked thread "Re: Capabilities" before commenting on this. I think the TCM-lite suggestion there is not incompatible with the generic "In Maintenance" TCM state that I am proposing, since while in this state each individual feature could also have their independent/parallel

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič
I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there. On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič wrote: > I am super hesitant to base distributed guardrails or any configuration > for that matter on anything but TCM. Does not "C" in TCM sta

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič
I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. Does not "C" in TCM stand for "configuration" anyway? So rename it to TSM like "schema" then if it is meant to be just for that. It seems to be quite ridiculous to code tables with caches on

Re: Capabilities

2024-12-20 Thread Paulo Motta
> It should be possible to use distributed system tables just fine for capabilities, config and guardrails. I have been thinking about this recently and I agree we should be wary about introducing new TCM states and create additional complexity that can be serviced by existing data dissemination m

Re: Capabilities

2024-12-20 Thread Jordan West
One minor clarification: ETS is entirely in memory (unless you explicitly dump it to disk or use DETS) so the equivalence to a local system table is only partially accurate but I think the parallel is fine in the case of what I was describing. Jordan On Fri, Dec 20, 2024 at 09:07 Jordan West wr

Re: Capabilities

2024-12-20 Thread Jordan West
Benedict, I agree with you TCM might be overkill for capabilities. It’s truly something that’s fine to be eventually consistent. Riak’s implementation used a local ETS table (ETS is built into Erlang - equivalent for us would be a local-only system table) and an efficient and reliable gossip protocol.

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-20 Thread Brandon Williams
That sounds like a possibility to me on the surface. Kind Regards, Brandon On Fri, Dec 20, 2024 at 8:42 AM Paul Chandler wrote: > > Hi Brandon, > > That sounds good. Will that fix be in 4.1, as it is the old nodes that don’t > transmit the hints? > > Thanks > > Paul > > > On 20 Dec 2024, at 13:

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-20 Thread Paul Chandler
Hi Brandon, That sounds good. Will that fix be in 4.1, as it is the old nodes that don’t transmit the hints? Thanks Paul > On 20 Dec 2024, at 13:41, Brandon Williams wrote: > > I think after a discussion on #cassandra-dev yesterday, we are going > to remove the requirement for schema agree

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-20 Thread Brandon Williams
I think after a discussion on #cassandra-dev yesterday, we are going to remove the requirement for schema agreement to deliver hints, as suggested by Jeff Jirsa. Kind Regards, Brandon On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler wrote: > > Hi Brandon, > > I am not sure which part changes after

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič
Having a parallel and feature focused TCM log as you suggested seems perfectly reasonable to me. On Fri, Dec 20, 2024 at 11:33 AM Benedict wrote: > Guardrails are broadly the same as Auth which works this way, but with > less criticality. It’s fine if guardrails are updated slowly. > > But, agai

Re: Capabilities

2024-12-20 Thread Benedict
Guardrails are broadly the same as Auth which works this way, but with less criticality. It’s fine if guardrails are updated slowly. But, again, TCM is a fine target for this. It would however be nice to have an in-between capability though, TCM-lite if you will, for these features. Perhaps even jus

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič
What do you mean by a distributed table? You mean these in system_distributed keyspace? If so, imagine we introduce a table system_distributed.guardrails where each row would hold what a guardrail would be set to, hence on guardrails evaluation in runtime (and there are a bunch of them to consider

Re: Capabilities

2024-12-20 Thread Benedict
If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. We can (and should) probably introduce event listeners for distributed tables,

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič
I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is a godsend. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to rep

Re: Capabilities

2024-12-20 Thread Benedict
TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there. It should be possible to use distributed system tab

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič
Jordan, I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom. Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of a pushback (1) (even though super reasonable