> we are already late. We have several features running in production that we chose to not open source yet because implementing phase 1 of the CEP would have heavily simplified their designs. The cost of developing them was much higher than what it would have been if the CEP had already been implemented. We are also currently working on some SAI features that need cost-based optimization.

Are there DISCUSS threads or CEPs for any of that work? For us to have a useful discussion about whether we're at a point in the project where a query optimizer is appropriate, this information would be vital.

On Thu, Dec 21, 2023, at 12:33 PM, Benjamin Lerer wrote:
> Hey German,
>
> To clarify things, we intend to push cardinalities across nodes, not costs. It will be up to the Cost Model to estimate cost based on those cardinalities. We will implement some functionalities to collect costs on query execution to be able to provide them as the output of EXPLAIN ANALYZE.
>
> We will provide more details on how we will collect and distribute cardinalities. We will probably not go into details on how we will estimate costs before the patch for it is ready. The main reason being that there are a lot of different parts that you need to account for and that it will require significant testing and experimentation.
>
> Regarding multi-tenancy, even if you use query cost, do not forget that you will also have to account for background tasks such as compaction, repair, backup, ... which are not included in this CEP.
>
> On Thu, Dec 21, 2023 at 00:18, German Eichberger via dev <dev@cassandra.apache.org> wrote:
>> All,
>>
>> very much agree with Scott's reasoning.
>>
>> It seems expedient given the advent of ACCORD transactions to be more like the other distributed SQL databases and just support SQL. But just because it's expedient doesn't make it right, and we should work out the relational features in more detail before we embark on tying ourselves to some query planning design.
>>
>> The main problem in this space is pushing cost across nodes based on data density. I understand that TCM will level out data density but the cost based optimizer proposal does a lot of hand waving when it comes to collecting/estimating costs for each node. I would like to see more details on this since otherwise it will be fairly limiting.
>>
>> I am less tied to ALLOW FILTERING - many of my customers find allowing filtering beneficial for their workloads so I think removing it makes sense to me (and yes we try to discourage them 🙂)
>>
>> I am also intrigued by this proposal when I think about multi-tenancy and resource governance: We have heard from several operators who run multiple internal teams on the same Cassandra cluster just to optimize costs. Having a way to attribute those costs more fairly by adding up the costs the optimizer calculates might be hugely beneficial. There could also be a way to have a "cost budget" on a keyspace to minimize the noisy neighbor problem and do more intelligent request throttling.
>>
>> In summary I support the proposal with the caveats raised above.
>>
>> Thanks,
>> German
>>
>> *From:* C. Scott Andreas <sc...@paradoxica.net>
>> *Sent:* Wednesday, December 20, 2023 8:15 AM
>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>> *Cc:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-39: Cost Based Optimizer
>>
>> Thanks for this proposal and apologies for my delayed engagement during the Cassandra Summit last week. Benjamin, I appreciate your work on this and your engagement on this thread – I know it’s a lot of discussion to field.
>>
>> On ALLOW FILTERING:
>>
>> I share Chris Lohfink’s experience in operating clusters that have made heavy use of ALLOW FILTERING. It is a valuable guardrail for the database to require users specifically annotate queries that may cost 1000x+ that of a simple lookup for a primary key. For my own purposes, I’d actually like to go a step further and disable queries that require ALLOW FILTERING by default unless explicitly reviewed - but haven’t taken the step of adding such a guardrail yet.
>>
>> CBOs, CQL, and SQL:
>>
>> The CBO proposal cuts to the heart of one of the fundamental differences between SQL and CQL that I haven’t seen exercised yet.
>>
>> SQL allows users to define schemas that provide structure to data and to issue queries over them based on a relational algebra. SQL’s purpose is to decouple the on-disk representation of data from the query language used to access and aggregate it. This produces a very flexible query language that can be used to ask a database anything - but at a cost of execution that may be effectively infinite (think recursive subqueries).
>>
>> CQL is very different. While SQL is designed to decouple query language and on-disk representation, CQL is designed specifically to couple them. A CQL schema declares data placement, query routing, disk serialization, and sorting to enable efficient retrieval. This is a very different design goal from a general-purpose query language. In time CQL may gain many SQL-like capabilities (and I hope it does!), but it will require careful work to do so without creating many footguns.
>>
>> Feature evolution:
>>
>> I agree that in the coming years, Cassandra is likely to gain semi-relational features via maturation of the byte-ordered partitioner (via range splitting, via TCM); the availability of SAI and its evolution (e.g., via new functionality enabled by Lucene libraries); potentially joins via BOP; and others. This is a really exciting future, and one that probably requires a planner and optimizer.
>>
>> My general inclination is that a planner + optimizer seem valuable for Cassandra, but that the proposal feels a year or two early. The database doesn’t yet have a plan of record to add support for some of the semi-relational constructs we’ve talked about, and I’m not aware of active CEPs that propose designs for features like these yet.
>>
>> Like Jeff, I’d find this much easier to discuss in the context of a database gaining support for these features with specific designs available to discuss. The ALLOW FILTERING and constant folding examples are a little slim. Index selection is probably the best one I can think of right now - e.g., if we wanted to add the ability to issue partition-restricted queries over a base table with multiple indexes defined without users specifically declaring an index. I haven’t seen an at-scale use case that would be better served by planner-driven index selection vs. user-driven, but they might be out there.
>>
>> It’s not my role to suggest changes in prioritization for work that isn’t mine. But I feel that the project could design better interfaces and a better planner/optimizer if that work were oriented toward improving particular features that are in wide use.
>>
>> To summarize my thoughts:
>>
>> – I agree that it is valuable for Apache Cassandra to gain a planner/optimizer.
>> – I disagree with removing or deprecating ALLOW FILTERING and see this as a necessary guardrail.
>> – I think the proposal surfaces the differences between the design goals of CQL and SQL, but I don’t feel that it quite addresses it.
>> – I think we could collectively build a stronger planner/optimizer once some of the features it’s meant to optimize are in place.
>> – I’m not quite sold on the need for the implementation to be bespoke based on discussion so far (vs. Calcite/Catalyst etc), but haven’t done the legwork to investigate this myself.
>> – I *love* the idea of capturing many of the execution and hotness statistics that are proposed in the CEP. It would be very valuable to surface query cost to users independent of a CBO. Stats like these would also be valuable toward retrofitting Cassandra for multitenancy by bounding or rate-limiting users on query cost.
>> Tracking SSTable hotness would also be useful toward evaluating feasibility of tiered storage, too.
>>
>> Thanks for this proposal and discussion so far - appreciate and enjoying it.
>>
>> – Scott
>>
>>> On Dec 20, 2023, at 7:52 AM, Benjamin Lerer <ble...@apache.org> wrote:
>>>
>>>> If we are to address that within the CEP itself then we should discuss it here, as I would like to fully understand the approach as well as how it relates to consistency of execution and the idea of triggering re-optimisation.
>>>
>>> Sure, that was my plan.
>>>
>>>> I’m not sold on the proposed set of characteristics, and think my coupling an execution plan to a given prepared statement for clients to supply is perhaps simpler to implement and maintain, and has corollary benefits - such as providing a mechanism for users to specify their own execution plan.
>>>>
>>>> Note, my proposal cuts across all of these elements of the CEP. There is no obvious need for a cross-cluster re-optimisation event or cross cluster statistic management.
>>>
>>> I think that I am missing one part of your proposal. How do you plan to build the initial execution plan for a prepared statement?
>>>
>>> On Wed, Dec 20, 2023 at 14:05, Benedict <bened...@apache.org> wrote:
>>>>
>>>> If we are to address that within the CEP itself then we should discuss it here, as I would like to fully understand the approach as well as how it relates to consistency of execution and the idea of triggering re-optimisation. These ideas are all interrelated.
>>>>
>>>> I’m not sold on the proposed set of characteristics, and think my coupling an execution plan to a given prepared statement for clients to supply is perhaps simpler to implement and maintain, and has corollary benefits - such as providing a mechanism for users to specify their own execution plan.
>>>>
>>>> Note, my proposal cuts across all of these elements of the CEP.
>>>> There is no obvious need for a cross-cluster re-optimisation event or cross cluster statistic management.
>>>>
>>>> We still also need to discuss more concretely how the base statistics themselves will be derived, as there is little detail here today in the proposal.
>>>>
>>>>> On 20 Dec 2023, at 12:58, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>>
>>>>> After the second phase of the CEP, we will have two optimizer implementations. One will be similar to what we have today and the other one will be the CBO. As those implementations will be behind the new Optimizer API interfaces they will both have support for EXPLAIN and they will both benefit from the simplification/normalization rules, such as the ones that David mentioned.
>>>>>
>>>>> Regarding functions, we are already able to determine which ones are deterministic (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55). We simply do not take advantage of it.
>>>>>
>>>>> I removed the ALLOW FILTERING part and will open a discussion about it at the beginning of next year.
>>>>>
>>>>> Regarding the statistics management part, I would like to try to address it within the CEP itself, if feasible. If it turns out to be too complicated, I will separate it into its own CEP.
>>>>>
>>>>> On Tue, Dec 19, 2023 at 22:23, David Capwell <dcapw...@apple.com> wrote:
>>>>>>> even if the only outcome of all this work were to tighten up inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I think that would be highly valuable
>>>>>>
>>>>>> In my mental model a no-op optimizer just becomes what we have today (since all new features really should be disabled by default, I would hope we support this), so we benefit from having a logical AST + ability to mutate it before we execute it and we can use this to make things nicer for users (as you are calling out)
>>>>>>
>>>>>> Here is one example that stands out to me in Accord
>>>>>>
>>>>>> LET a = (select * from tbl where pk=0);
>>>>>> Insert into tbl2 (pk, …) values (a.pk, …); -- this is not allowed as we don’t know the primary key… but this could trivially be rewritten to replace a.pk with 0…
>>>>>>
>>>>>> With this work we could also rethink what functions are deterministic and which ones are not (not trying to bike shed)… simple example is “now” (select now() from tbl; -- each row will have a different timestamp), if we make this deterministic we can avoid calling it for each row and instead just replace it with a constant for the query…
>>>>>>
>>>>>> Even if the CBO is dropped in favor of no-op (what we do today), I still see value in this work.
>>>>>>
>>>>>> I do think that the CBO really doesn’t solve the fact some features don’t work well, if anything it could just mask it until it’s too late….
>>>>>> If a user builds an app using filtering and everything is going well in QA, but once they see a spike in traffic in prod we start rejecting… this is a bad user experience IMO… we KNOW you must think about this before you go this route, so a CBO letting you ignore it till you hit a wall I don’t think is the best (not saying ALLOW FILTERING is the solution to this… but it at least is a signal to users to think through their data model).
>>>>>>
>>>>>>> On Dec 15, 2023, at 6:38 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>
>>>>>>>> Goals
>>>>>>>> • Introduce a Cascades(2) query optimizer with rules easily extendable
>>>>>>>> • Improve query performance for most common queries
>>>>>>>> • Add support for EXPLAIN and EXPLAIN ANALYZE to help with query optimization and troubleshooting
>>>>>>>> • Lay the groundwork for the addition of features like joins, subqueries, OR/NOT and index ordering
>>>>>>>> • Put in place some performance benchmarks to validate query optimizations
>>>>>>> I think these are sensible goals. We're possibly going to face a chicken-or-egg problem with a feature like this that so heavily intersects with other as-yet unwritten features where much of the value is in the intersection of them; if we continue down the current "one heuristic to rule them all" query planning approach we have now, we'll struggle to meaningfully explore or conceptualize the value of potential alternatives different optimizers could present us. Flip side, to Benedict's point, until SAI hits and/or some other potential future things we've all talked about, this CBO would likely fall directly into the same path that we effectively have hard-coded today (primary index path only).
>>>>>>>
>>>>>>> One thing I feel pretty strongly about: even if the only outcome of all this work were to tighten up inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I think that would be highly valuable. This path of "only" would be predicated on us not having successful introduction of a robust secondary index implementation and a variety of other things we have a lot of interest in, so I find it unlikely, but worth calling out.
>>>>>>>
>>>>>>> re: the removal of ALLOW FILTERING - is there room for compromise here and instead converting it to a guardrail that defaults to being enabled? That could theoretically give us a more gradual path to migration to a cost-based guardrail for instance, and would preserve the current robustness of the system while making it at least a touch more configurable.
>>>>>>>
>>>>>>> On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote:
>>>>>>>> Thanks for time in addressing concerns. At least with initial versions, as long as there is a way to replace it with noop or disable it I would be happy. This is pretty standard practice with features nowadays but I wanted to highlight it as this might require some pretty tight coupling.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <ble...@apache.org> wrote:
>>>>>>>>> Hey Chris,
>>>>>>>>> You raise some valid points.
>>>>>>>>>
>>>>>>>>> I believe that there are 3 points that you mentioned:
>>>>>>>>> 1) CQL restrictions are some form of safety net and should be kept
>>>>>>>>> 2) A lot of Cassandra features do not scale and/or are too easy to use in a wrong way that can make the whole system collapse. We should not add more to that list. Especially not joins.
>>>>>>>>> 3) Should we not start to fix features like secondary index rather than adding new ones? Which is heavily linked to 2).
>>>>>>>>>
>>>>>>>>> Feel free to correct me if I got them wrong or missed one.
>>>>>>>>>
>>>>>>>>> Regarding 1), I believe that you refer to the "Removing unnecessary CQL query limitations and inconsistencies" section. We are not planning to remove any safety net here.
>>>>>>>>> What we want to remove is a certain amount of limitations which make things confusing for a user trying to write a query for no good reason. Like "why can I define a column alias but not use it anywhere in my query?" or "Why can I not create a list with 2 bind parameters?". While refactoring some CQL code, I kept on finding those types of exceptions that we can easily remove while simplifying the code at the same time.
>>>>>>>>>
>>>>>>>>> For 2), I agree that at a certain scale or for some scenarios, some features simply do not scale or catch users by surprise. The goal of the CEP is to improve things in 2 ways. One is by making Cassandra smarter in the way it chooses how to process queries, hopefully improving its overall scalability. The other by being transparent about how Cassandra will execute the queries through the use of EXPLAIN. One problem of GROUP BY for example is that most users do not realize what is actually happening under the hood and therefore its limitations. I do not believe that EXPLAIN will change everything but it will help people to get a better understanding of the limitations of some features.
>>>>>>>>>
>>>>>>>>> I do not know which features will be added in the future to C*. That will be discussed through some future CEPs.
>>>>>>>>> Nevertheless, I do not believe that it makes sense to write a CEP for a query optimizer without taking into account that we might at some point add some level of support for joins or subqueries. We have been too often delivering features without looking at what could be the possible evolutions, which resulted in code where adding new features was more complex than it should have been. I do not want to make the same mistake. I want to create an optimizer that can be improved easily, and considering joins or other features simply helps to build things in a more generic way.
>>>>>>>>>
>>>>>>>>> Regarding feature stabilization, I believe that it is happening. I have heard plans of how to solve MVs, range queries, hot partitions, ... and there was a lot of thinking behind those plans. Secondary indexes are being worked on. We hope that the optimizer will also help with some index queries.
>>>>>>>>>
>>>>>>>>> It seems to me that this proposal is going toward the direction that you want without introducing new problems for scalability.
>>>>>>>>>
>>>>>>>>> On Thu, Dec 14, 2023 at 16:47, Chris Lohfink <clohfin...@gmail.com> wrote:
>>>>>>>>>> I don't wanna be a blocker for this CEP or anything but did want to put my 2 cents in. This CEP is horrifying to me.
>>>>>>>>>>
>>>>>>>>>> I have seen thousands of clusters across multiple companies and helped them get working successfully. A vast majority of that involved blocking the use of MVs, GROUP BY, secondary indexes, and even just simple _range queries_. The "unnecessary restrictions of cql" are not only necessary IMHO, more restrictions are necessary to be successful at scale.
>>>>>>>>>> The idea of just opening up CQL to general purpose relational queries and lines like "supporting queries with joins in an efficient way" ... I would really like us to make secondary indexes be a viable option before we start opening up floodgates on stuff like this.
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP.
>>>>>>>>>>>
>>>>>>>>>>> I think it should be. This should form a major part of the API on which any CBO is built.
>>>>>>>>>>>
>>>>>>>>>>> > It seems that there is a difference between the goal of your proposal and the one of the CEP. The goal of the CEP is first to ensure optimal performance. It is ok to change the execution plan for one that delivers better performance. What we want to minimize is having a node performing queries in an inefficient way for a long period of time.
>>>>>>>>>>>
>>>>>>>>>>> You have made a goal of the CEP synchronising summary statistics across the whole cluster in order to achieve some degree of uniformity of query plan. So this is explicitly a goal of the CEP, and synchronising summary statistics is a hard problem and won’t provide strong guarantees.
>>>>>>>>>>>
>>>>>>>>>>> > The client side proposal targets consistency for a given query on a given driver instance. In practice, it would be possible to have 2 similar queries with 2 different execution plans on the same driver
>>>>>>>>>>>
>>>>>>>>>>> This would only be possible if the driver permitted it. A driver could (and should) enforce that it only permits one query plan per query.
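
A minimal sketch of the "one query plan per query" rule a driver could enforce, as suggested above. Everything here is hypothetical: the `PinnedPlans` class, the use of the query text as the key, and the opaque byte payloads are illustrative assumptions, not part of any existing Cassandra driver API.

```python
class PinnedPlans:
    """Client-side map from query text to the first plan payload seen for it (hypothetical)."""

    def __init__(self):
        self._plans = {}

    def pin(self, query, offered_plan):
        # setdefault keeps an existing entry, so a later, different plan
        # offered for the same query is ignored and the pinned one is returned.
        return self._plans.setdefault(query, offered_plan)


plans = PinnedPlans()
query = "SELECT * FROM tbl WHERE pk = ?"
first = plans.pin(query, b"plan-from-node-1")
second = plans.pin(query, b"plan-from-node-2")  # still the plan from node 1
```

The payload is treated as opaque on purpose: under this assumption the driver never interprets the plan, it only hands the same bytes back to whichever coordinator it contacts next.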
>>>>>>>>>>>
>>>>>>>>>>> The opposite is true for your proposal: some queries may begin degrading because they touch specific replicas that optimise the query differently, and this will be hard to debug.
>>>>>>>>>>>
>>>>>>>>>>>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The binding of the parser output to the schema (what is today the Raw.prepare call) will create the logical plan, expressed as a tree of relational operators. Simplification and normalization will happen on that tree to produce a new equivalent logical plan. That logical plan will be used as input to the optimizer. The output will be a physical plan producing the output specified by the logical plan. A tree of physical operators specifying how the operations should be performed.
>>>>>>>>>>>>
>>>>>>>>>>>> That physical plan will be stored as part of the statements (SelectStatement, ModificationStatement, ...) in the prepared statement cache. Upon execution, variables will be bound and the RangeCommands/Mutations will be created based on the physical plan.
>>>>>>>>>>>>
>>>>>>>>>>>> The string representation of a physical plan will effectively represent the output of an EXPLAIN statement but outside of that the physical plan will stay encapsulated within the statement classes.
>>>>>>>>>>>> Hints will be parameters provided to the optimizer to enforce some specific choices. Like always using an Index Scan instead of a Table Scan, ignoring the cost comparison.
>>>>>>>>>>>>
>>>>>>>>>>>> So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP.
>>>>>>>>>>>> I did not document it because it will simply be a tree of physical operators used internally.
>>>>>>>>>>>>
>>>>>>>>>>>>> My proposal is that the execution plan of the coordinator that prepares a query gets serialised to the client, which then provides the execution plan to all future coordinators, and coordinators provide it to replicas as necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This means it is not possible for any conflict to arise for a single client. It would guarantee consistency of execution for any single client (and avoid any drift over the client’s sessions), without necessarily guaranteeing consistency for all clients.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems that there is a difference between the goal of your proposal and the one of the CEP. The goal of the CEP is first to ensure optimal performance. It is ok to change the execution plan for one that delivers better performance. What we want to minimize is having a node performing queries in an inefficient way for a long period of time.
>>>>>>>>>>>>
>>>>>>>>>>>> The client side proposal targets consistency for a given query on a given driver instance. In practice, it would be possible to have 2 similar queries with 2 different execution plans on the same driver making things really confusing. Identifying the source of an inefficient query will also be pretty hard.
>>>>>>>>>>>>
>>>>>>>>>>>> Interestingly, having 2 nodes with 2 different execution plans might not be a serious problem. It simply means that based on cardinality at t1, the optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at t2.
>>>>>>>>>>>> In practice, if the cost estimates properly reflect the actual cost, those 2 plans should have pretty similar efficiency. The problem is more about the fact that you would ideally want a uniform behavior around your cluster. Changes of execution plans should only occur at certain points. So the main problematic scenario is when the data distribution is around one of those points. Which is also the point where the change should have the least impact.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Dec 14, 2023 at 11:38, Benedict <bened...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> There surely needs to be a more succinct and abstract representation in order to perform transformations on the query plan? You don’t intend to manipulate the object graph directly as you apply any transformations when performing simplification or cost based analysis? This would also (I expect) be the form used to support EXPLAIN functionality, and probably also HINTs etc. This would ideally *not* be coupled to the CBO itself, and would ideally be succinctly serialised.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would very much expect the query plan to be represented abstractly as part of this work, and for there to be a mechanism that translates this abstract representation into the object graph that executes it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I’m incorrect, could you please elaborate more specifically how you intend to go about this?
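
To make the idea of an abstract, serialisable plan concrete, here is a rough sketch of a plan expressed as a tree of operators whose string form doubles as EXPLAIN-style output. The `PlanNode` class, the operator names (`Filter`, `IndexScan`) and the JSON encoding are assumptions chosen for illustration; the thread does not specify the actual operators or wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanNode:
    """One operator in a (hypothetical) plan tree with at most one child."""
    operator: str                      # illustrative operator name, e.g. "IndexScan"
    detail: str                        # what the operator acts on
    child: Optional["PlanNode"] = None

    def explain(self, depth: int = 0) -> str:
        # Render the tree top-down, indenting each child by two spaces.
        line = "  " * depth + f"{self.operator}({self.detail})"
        if self.child is None:
            return line
        return line + "\n" + self.child.explain(depth + 1)

    def serialise(self) -> str:
        # asdict recurses into nested dataclasses, so the whole tree is encoded.
        return json.dumps(asdict(self))

    @staticmethod
    def deserialise(payload: str) -> "PlanNode":
        def build(d):
            if d is None:
                return None
            return PlanNode(d["operator"], d["detail"], build(d["child"]))
        return build(json.loads(payload))


plan = PlanNode("Filter", "v > 10", PlanNode("IndexScan", "tbl.idx_v"))
restored = PlanNode.deserialise(plan.serialise())
print(restored.explain())
```

Because the tree is plain data, the same structure could be printed for EXPLAIN, shipped to a client, or rewritten by simplification rules without touching the objects that execute the query.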
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I mean that an important part of this work - not specified in the CEP (AFAICT) - should probably be to define some standard execution model, that we can manipulate and serialise, for use across (and without) optimisers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am confused because for me an execution model defines how operations are executed within the database in a conceptual way, which is not something that this CEP intends to change. Do you mean the physical/execution plan?
>>>>>>>>>>>>>> Today this plan is somehow represented for reads by the SelectStatement and its components (Selections, StatementRestrictions, ...); it is then converted at execution time, after parameter binding, into a ReadCommand which is sent to the replicas.
>>>>>>>>>>>>>> We plan to refactor SelectStatement and its components but the ReadCommands change should be relatively small. What you are proposing is not part of the scope of this CEP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Dec 14, 2023 at 10:24, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Can you share the reasons why Apache Calcite is not suitable for this case and why it was rejected
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My understanding is that Calcite was made for two main things: to help with optimizing SQL-like languages and to let people query different kinds of data sources together.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We could think about using it for our needs, but there are some big problems:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. CQL is not SQL.
>>>>>>>>>>>>>>> There are significant differences between the 2 languages
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Cassandra has its own specificities that will influence the cost model and the way we deal with optimizations: partitions, replication factors, consistency levels, LSM tree storage, ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3. Every framework comes with its own limitations and additional cost
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From my view, there are too many big differences between what Calcite does and what we need in Cassandra. If we used Calcite, it would also mean relying a lot on another system that everyone would have to learn and adjust to. The problems and extra work this would bring don't seem worth the benefits we might get
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Dec 13, 2023 at 18:06, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>>>>>>>>>>>>> One thing that I did not mention is the fact that this CEP is only a high level proposal. There will be deeper discussions on the dev list around the different parts of this proposal when we reach those parts and have enough details to make those discussions more meaningful.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The maintenance and distribution of summary statistics in particular is worthy of its own CEP, and it might be preferable to split it out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For maintaining node statistics the idea is to re-use the current Memtable/SSTable mechanism and rely on mergeable statistics. That will allow us to easily build node level statistics for a given table by merging all the statistics of its memtables and SSTables.
For the distribution of these node >>>>>>>>>>>>>>>> statistics we are still exploring different options. We can >>>>>>>>>>>>>>>> come back with a precise proposal once we have hammered all >>>>>>>>>>>>>>>> the details. >>>>>>>>>>>>>>>> Is it for you a blocker for this CEP or do you just want to >>>>>>>>>>>>>>>> make sure that this part is discussed in deeper details before >>>>>>>>>>>>>>>> we implement it? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The proposal also seems to imply we are aiming for >>>>>>>>>>>>>>>>> coordinators to all make the same decision for a query, which >>>>>>>>>>>>>>>>> I think is challenging, and it would be worth fleshing out >>>>>>>>>>>>>>>>> the design here a little (perhaps just in Jira). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The goal is that the large majority of nodes preparing a query >>>>>>>>>>>>>>>> at a given point in time should make the same decision and >>>>>>>>>>>>>>>> that over time all nodes should converge toward the same >>>>>>>>>>>>>>>> decision. This part is dependent on the node statistics >>>>>>>>>>>>>>>> distribution, the cost model and the triggers for >>>>>>>>>>>>>>>> re-optimization (that will require some experimentation). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There’s also not much discussion of the execution model: I >>>>>>>>>>>>>>>>> think it would make most sense for this to be independent of >>>>>>>>>>>>>>>>> any cost and optimiser models (though they might want to >>>>>>>>>>>>>>>>> operate on them), so that EXPLAIN and hints can work across >>>>>>>>>>>>>>>>> optimisers (a suitable hint might essentially bypass the >>>>>>>>>>>>>>>>> optimiser, if the optimiser permits it, by providing a >>>>>>>>>>>>>>>>> standard execution model) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It is not clear to me what you mean by "a standard execution >>>>>>>>>>>>>>>> model"? Otherwise, we were not planning to have the execution >>>>>>>>>>>>>>>> model or the hints depending on the optimizer. 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think it would be worth considering providing the execution
>>>>>>>>>>>>>>>>> plan to the client as part of query preparation, as an opaque
>>>>>>>>>>>>>>>>> payload to supply to coordinators on first contact, as this
>>>>>>>>>>>>>>>>> might simplify the problem of ensuring queries behave the
>>>>>>>>>>>>>>>>> same without adopting a lot of complexity for synchronising
>>>>>>>>>>>>>>>>> statistics (which will never provide strong guarantees). Of
>>>>>>>>>>>>>>>>> course, re-preparing a query might lead to a new plan, though
>>>>>>>>>>>>>>>>> any coordinators with the query in their cache should be able
>>>>>>>>>>>>>>>>> to retrieve it cheaply. If the execution model is efficiently
>>>>>>>>>>>>>>>>> serialised this might have the ancillary benefit of improving
>>>>>>>>>>>>>>>>> the occupancy of our prepared query cache.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not sure that I understand your proposal. If two nodes
>>>>>>>>>>>>>>>> build different execution plans, how do you resolve that
>>>>>>>>>>>>>>>> conflict?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, 13 Dec 2023 at 09:55, Benedict <bened...@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A CBO can only make worse decisions than the status quo for
>>>>>>>>>>>>>>>>> what I presume are the majority of queries - i.e. those that
>>>>>>>>>>>>>>>>> touch only primary indexes. In general, there are plenty of
>>>>>>>>>>>>>>>>> use cases that prefer determinism. So I agree that there
>>>>>>>>>>>>>>>>> should at least be a CBO implementation that makes the same
>>>>>>>>>>>>>>>>> decisions as the status quo, deterministically.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do support the proposal, but would like to see some
>>>>>>>>>>>>>>>>> elements discussed in more detail. The maintenance and
>>>>>>>>>>>>>>>>> distribution of summary statistics in particular is worthy of
>>>>>>>>>>>>>>>>> its own CEP, and it might be preferable to split it out. The
>>>>>>>>>>>>>>>>> proposal also seems to imply we are aiming for coordinators
>>>>>>>>>>>>>>>>> to all make the same decision for a query, which I think is
>>>>>>>>>>>>>>>>> challenging, and it would be worth fleshing out the design
>>>>>>>>>>>>>>>>> here a little (perhaps just in Jira).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced
>>>>>>>>>>>>>>>>> that this CEP deprecates it. It is a concrete qualitative
>>>>>>>>>>>>>>>>> guard rail, that I expect some users will prefer to a
>>>>>>>>>>>>>>>>> cost-based guard rail. Perhaps this could be left to the CBO
>>>>>>>>>>>>>>>>> to decide how to treat.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There’s also not much discussion of the execution model: I
>>>>>>>>>>>>>>>>> think it would make most sense for this to be independent of
>>>>>>>>>>>>>>>>> any cost and optimiser models (though they might want to
>>>>>>>>>>>>>>>>> operate on them), so that EXPLAIN and hints can work across
>>>>>>>>>>>>>>>>> optimisers (a suitable hint might essentially bypass the
>>>>>>>>>>>>>>>>> optimiser, if the optimiser permits it, by providing a
>>>>>>>>>>>>>>>>> standard execution model)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think it would be worth considering providing the execution
>>>>>>>>>>>>>>>>> plan to the client as part of query preparation, as an opaque
>>>>>>>>>>>>>>>>> payload to supply to coordinators on first contact, as this
>>>>>>>>>>>>>>>>> might simplify the problem of ensuring queries behave the
>>>>>>>>>>>>>>>>> same without adopting a lot of complexity for synchronising
>>>>>>>>>>>>>>>>> statistics (which will never provide strong guarantees). Of
>>>>>>>>>>>>>>>>> course, re-preparing a query might lead to a new plan, though
>>>>>>>>>>>>>>>>> any coordinators with the query in their cache should be able
>>>>>>>>>>>>>>>>> to retrieve it cheaply. If the execution model is efficiently
>>>>>>>>>>>>>>>>> serialised this might have the ancillary benefit of improving
>>>>>>>>>>>>>>>>> the occupancy of our prepared query cache.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think it makes sense to see what the actual overhead is of
>>>>>>>>>>>>>>>>>> CBO before making the assumption it'll be so high that we
>>>>>>>>>>>>>>>>>> need to have two code paths. I'm happy to provide thorough
>>>>>>>>>>>>>>>>>> benchmarking and analysis when it reaches a testing phase.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm excited to see where this goes. I think it sounds very
>>>>>>>>>>>>>>>>>> forward looking and opens up a lot of possibilities.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <cclive1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Nothing expresses my thoughts better than +1.
>>>>>>>>>>>>>>>>>>> It feels like this means a lot to Cassandra.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have a question. Is it easy to turn off the CBO or
>>>>>>>>>>>>>>>>>>> bypass it in some way? Some simple read and write
>>>>>>>>>>>>>>>>>>> requests will perform better without the CBO, which is
>>>>>>>>>>>>>>>>>>> also an advantage of Cassandra compared to some RDBMSs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, 13 Dec 2023 at 3:37 AM, David Capwell <dcapw...@apple.com> wrote:
>>>>>>>>>>>>>>>>>>>> Overall LGTM.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <ble...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would like to open the discussion on the introduction
>>>>>>>>>>>>>>>>>>>>> of a cost based optimizer to allow Cassandra to pick the
>>>>>>>>>>>>>>>>>>>>> best execution plan based on the data distribution,
>>>>>>>>>>>>>>>>>>>>> thereby improving overall query performance.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This CEP should also lay the groundwork for the future
>>>>>>>>>>>>>>>>>>>>> addition of features like joins, subqueries, OR/NOT and
>>>>>>>>>>>>>>>>>>>>> index ordering.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The proposal is here:
>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you in advance for your feedback.
>>