Re: [DISCUSS] CEP-39: Cost Based Optimizer

Josh McKenzie Thu, 21 Dec 2023 10:00:15 -0800

> we are already late. We have several features running in production that we 
> chose to not open source yet because implementing phase 1 of the CEP would 
> have heavily simplify their designs. The cost of developing them was much 
> higher than what it would have been if the CEP had already been implemented. 
> We are also currently working on some SAI features that need cost based 
> optimization.
Are there DISCUSS threads or CEP's for any of that work? For us to have a 
useful discussion about whether we're at a point in the project where a query 
optimizer is appropriate for the project this information would be vital.


On Thu, Dec 21, 2023, at 12:33 PM, Benjamin Lerer wrote:
> Hey German,
> 
> To clarify things, we intend to push cardinalities across nodes, not costs. 
> It will be up to the Cost Model to estimate cost based on those 
> cardinalities. We will implement some functionalities to collect costs on 
> query execution to be able to provide them as the output of EXPLAIN ANALYZE.
> 
> We will provide more details on how we will collect and distribute 
> cardinalities. We will probably not go into details on how we will estimate 
> costs before the patch for it is ready. The main reason being that there are 
> a lot of different parts that you need to account for and that it will 
> require significant testing and experimentation.
> 
> Regarding multi-tenancy, even if you use query cost, do not forget that you 
> will have to account also for background tasks such as compaction, repair, 
> backup, ... which is not included in this CEP.      
> 
> Le jeu. 21 déc. 2023 à 00:18, German Eichberger via dev 
> <dev@cassandra.apache.org> a écrit :
>> All,
>> 
>> very much agree with Scott's reasoning. 
>> 
>> It seems expedient given the advent of ACCORD transactions to be more like 
>> the other distributed SQL databases and just support SQL. But just because 
>> it's expedient it isn't right and we should work out the relational features 
>> in more detail before we embark on tying us to some query planning design.
>> 
>> The main problem in this space is pushing cost / across nodes based on data 
>> density. I understand that TCM will level out data density but the cost 
>> based optimizer proposal does a lot of hand waving when it comes to 
>> collecting/estimating costs for each node. I like to see more details on 
>> this since otherwise it will be fairly limiting.
>> 
>> I am less tied to ALLOW FILTERING - many of my customers find allowing 
>> filtering beneficial for their workloads so I think removing it makes sense 
>> to me (and yes we try to discourage them 🙂)
>> 
>> I am also intrigued by this proposal when I think about multi tenancy and 
>> resource governance: We have heard from several operator who run multiple 
>> internal teams on the same Cassandra cluster jut to optimize costs. Having a 
>> way to attribute those costs more fairly by adding up the costs the 
>> optimizer calculates might be hugely beneficial.  There could also be a way 
>> to have a "cost budget" on a keyspace to minimize the noisy neighbor problem 
>> and do more intelligent request throttling. 
>> 
>> In summary I support the proposal with the caveats raised above.
>> 
>> Thanks,
>> German
>> 
>> 
>> *From:* C. Scott Andreas <sc...@paradoxica.net>
>> *Sent:* Wednesday, December 20, 2023 8:15 AM
>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>> *Cc:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-39: Cost Based Optimizer
>>  
>> 
>> You don't often get email from sc...@paradoxica.net. Learn why this is 
>> important <https://aka.ms/LearnAboutSenderIdentification>
>> 
>> Thanks for this proposal and apologies for my delayed engagement during the 
>> Cassandra Summit last week. Benjamin, I appreciate your work on this and 
>> your engagement on this thread – I know it’s a lot of discussion to field.
>> 
>> On ALLOW FILTERING:
>> 
>> I share Chris Lohfink’s experience in operating clusters that have made 
>> heavy use of ALLOW FILTERING. It is a valuable guardrail for the database to 
>> require users specifically annotate queries that may cost 1000x+ that of a 
>> simple lookup for a primary key. For my own purposes, I’d actually like to 
>> go a step further and disable queries that require ALLOW FILTERING by 
>> default unless explicitly reviewed - but haven’t taken the step of adding 
>> such a guardrail yet.
>> 
>> CBOs, CQL, and SQL:
>> 
>> The CBO proposal cuts to the heart of one of the fundamental differences 
>> between SQL and CQL that I haven’t seen exercised yet.
>> 
>> SQL allows users to define schemas that provide structure to data and to 
>> issue queries over them based on a relational algebra. SQL’s purpose is to 
>> decouple the on-disk representation of data from the query language used to 
>> access and aggregate it. This produces a very flexible query language that 
>> can be used to ask a database anything - but at a cost of execution that may 
>> be effectively infinite (think recursive subqueries).
>> 
>> CQL is very different. While SQL is designed to decouple query language and 
>> on-disk representation, CQL is designed specifically to couple them. A CQL 
>> schema declares data placement, query routing, and disk serialization, and 
>> sorting to enable efficient retrieval. This is a very different design goal 
>> from a general-purpose query language. In time CQL may gain many SQL-like 
>> capabilities (and I hope it does!), but it will require careful work to do 
>> so without creating many footguns.
>> 
>> Feature evolution:
>> 
>> I agree that in the coming years, Cassandra is likely to gain 
>> semi-relational features via maturation of the byte-ordered partitioner (via 
>> range splitting, via TCM); the availability of SAI and its evolution (e.g., 
>> via new functionality enabled by Lucene libraries); potentially joins via 
>> BOP; and others. This is a really exciting future, and one that probably 
>> requires a planner and optimizer.
>> 
>> My general inclination is that a planner + optimizer seem valuable for 
>> Cassandra, but that the proposal feels a year or two early. The database 
>> doesn’t yet have a plan of record to add support for some of the 
>> semirelational constructs we’ve talked about, and I’m not aware of active 
>> CEPs that propose designs for features like these yet.
>> 
>> Like Jeff, I’d find this much easier to discuss in the context of a database 
>> gaining support for these features with specific designs available to 
>> discuss. The ALLOW FILTERING and constant folding examples are a little 
>> slim. Index selection is probably the best one I can think of right now - 
>> e.g., if we wanted to add the ability to issue partition-restricted queries 
>> over a base table with multiple indexes defined without users specifically 
>> declaring an index. I haven’t seen an at-scale use case that would be better 
>> served by planner-driven index selection vs. user-driven, but they might be 
>> out there.
>> 
>> It’s not my role to suggest changes in prioritization for work that isn’t 
>> mine. But I feel that the project could design better interfaces and a 
>> better planner/optimizer if that work were oriented toward improving 
>> particular features that are in wide use.
>> 
>> To summarize my thoughts:
>> 
>> – I agree that it is valuable for Apache Cassandra to gain a 
>> planner/optimizer.
>> – I disagree with removing or deprecating ALLOW FILTERING and see this as a 
>> necessary guardrail.
>> – I think the proposal surfaces the differences between the design goals of 
>> CQL and SQL, but I don’t feel that it quite addresses it.
>> – I think we could collectively build a stronger planner/optimizer once some 
>> of the features it’s meant to optimize are in place.
>> – I’m not quite sold on the need for the implementation to be bespoke based 
>> on discussion so far (vs. Calcite/Catalyst etc), but haven’t done the 
>> legwork to investigate this myself.
>> – I *love* the idea of capturing many of the execution and hotness 
>> statistics that are proposed in the CEP. It would be very valuable to 
>> surface query cost to users independent of a CBO. Stats like these would 
>> also be valuable toward retrofitting Cassandra for multitenancy by bounding 
>> or rate-limiting users on query cost. Tracking SSTable hotness would also be 
>> useful toward evaluating feasibility of tiered storage, too.
>> 
>> Thanks for this proposal and discussion so far — appreciate and enjoying it.
>> 
>> – Scott
>> 
>>> On Dec 20, 2023, at 7:52 AM, Benjamin Lerer <ble...@apache.org> wrote:
>>> 
>>> 
>>>> If we are to address that within the CEP itself then we should discuss it 
>>>> here, as I would like to fully understand the approach as well as how it 
>>>> relates to consistency of execution and the idea of triggering 
>>>> re-optimisation.
>>> 
>>> Sure, that was my plan.
>>> 
>>> 
>>>> I’m not sold on the proposed set of characteristics, and think my coupling 
>>>> an execution plan to a given prepared statement for clients to supply is 
>>>> perhaps simpler to implement and maintain, and has corollary benefits - 
>>>> such as providing a mechanism for users to specify their own execution 
>>>> plan.
>>>>  
>>>> Note, my proposal cuts across all of these elements of the CEP. There is 
>>>> no obvious need for a cross-cluster re-optimisation event or cross cluster 
>>>> statistic management.
>>>  
>>> I think that I am missing one part of your proposal. How do you plan to 
>>> build the initial execution plan for a prepared statement?
>>> 
>>> Le mer. 20 déc. 2023 à 14:05, Benedict <bened...@apache.org> a écrit :
>>>> 
>>>> If we are to address that within the CEP itself then we should discuss it 
>>>> here, as I would like to fully understand the approach as well as how it 
>>>> relates to consistency of execution and the idea of triggering 
>>>> re-optimisation. These ideas are all interrelated.
>>>> 
>>>> I’m not sold on the proposed set of characteristics, and think my coupling 
>>>> an execution plan to a given prepared statement for clients to supply is 
>>>> perhaps simpler to implement and maintain, and has corollary benefits - 
>>>> such as providing a mechanism for users to specify their own execution 
>>>> plan.
>>>> 
>>>> Note, my proposal cuts across all of these elements of the CEP. There is 
>>>> no obvious need for a cross-cluster re-optimisation event or cross cluster 
>>>> statistic management.
>>>> 
>>>> We still also need to discuss more concretely how the base statistics 
>>>> themselves will be derived, as there is little detail here today in the 
>>>> proposal.
>>>> 
>>>> 
>>>>> On 20 Dec 2023, at 12:58, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>> 
>>>>> After the second phase of the CEP, we will have two optimizer 
>>>>> implementations. One will be similar to what we have today and the other 
>>>>> one will be the CBO. As those implementations will be behind the new 
>>>>> Optimizer API interfaces they will both have support for EXPLAIN and they 
>>>>> will both benefit from the simplification/normalization rules. Such as 
>>>>> the ones that David mentioned.
>>>>> 
>>>>> Regarding functions, we are already able to determine which ones are 
>>>>> deterministic 
>>>>> (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55).
>>>>>  We simply do not take advantage of it.
>>>>> 
>>>>> I removed the ALLOW FILTERING part and will open a discussion about it at 
>>>>> the beginning of next year.
>>>>> 
>>>>> Regarding the statistics management part, I would like to try to address 
>>>>> it within the CEP itself, if feasible. If it turns out to be too 
>>>>> complicated, I will separate it into its own CEP.
>>>>> 
>>>>> Le mar. 19 déc. 2023 à 22:23, David Capwell <dcapw...@apple.com> a écrit :
>>>>>>> even if the only outcome of all this work were to tighten up 
>>>>>>> inconsistencies in our grammar and provide more robust EXPLAIN and 
>>>>>>> EXPLAIN ANALYZE functionality to our end users, I think that would be 
>>>>>>> highly valuable
>>>>>> 
>>>>>> In my mental model a no-op optimizer just becomes what we have today 
>>>>>> (since all new features really should be disabled by default, I would 
>>>>>> hope we support this), so we benefit from having a logical AST + ability 
>>>>>> to mutate it before we execute it and we can use this to make things 
>>>>>> nicer for users (as you are calling out)
>>>>>> 
>>>>>> Here is one example that stands out to me in accord
>>>>>> 
>>>>>> LET a = (select * from tbl where pk=0);
>>>>>> Insert into tbl2 (pk, …) values (a.pk, …); — this is not allowed as we 
>>>>>> don’t know the primary key… but this could trivially be written to 
>>>>>> replace a.pk with 0…
>>>>>> 
>>>>>> With this work we could also rethink what functions are deterministic 
>>>>>> and which ones are not (not trying to bike shed)… simple example is 
>>>>>> “now” (select now() from tbl; — each row will have a different 
>>>>>> timestamp), if we make this deterministic we can avoid calling it for 
>>>>>> each row and instead just replace it with a constant for the query… 
>>>>>> 
>>>>>> Even if the CBO is dropped in favor of no-op (what we do today), I still 
>>>>>> see value in this work.
>>>>>> 
>>>>>> I do think that the CBO really doesn’t solve the fact some features 
>>>>>> don’t work well, if anything it could just mask it until it’s too late…. 
>>>>>>  If user builds an app using filtering and everything is going well in 
>>>>>> QA, but once they see a spike in traffic in prod we start rejecting… 
>>>>>> this is a bad user experience IMO… we KNOW you must think about this 
>>>>>> before you go this route, so a CBO letting you ignore it till you hit a 
>>>>>> wall I don’t think is the best (not saying ALLOW FILTERING is the 
>>>>>> solution to this… but it at least is a signal to users to think through 
>>>>>> their data model). 
>>>>>> 
>>>>>> 
>>>>>>> On Dec 15, 2023, at 6:38 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Goals
>>>>>>>>  • Introduce a Cascades(2) query optimizer with rules easily 
>>>>>>>> extendable 
>>>>>>>>  • Improve query performance for most common queries
>>>>>>>>  • Add support for EXPLAIN and EXPLAIN ANALYZE to help with query 
>>>>>>>> optimization and troubleshooting
>>>>>>>>  • Lay the groundwork for the addition of features like joins, 
>>>>>>>> subqueries, OR/NOT and index ordering
>>>>>>>>  • Put in place some performance benchmarks to validate query 
>>>>>>>> optimizations
>>>>>>> I think these are sensible goals. We're possibly going to face a 
>>>>>>> chicken-or-egg problem with a feature like this that so heavily 
>>>>>>> intersects with other as-yet written features where much of the value 
>>>>>>> is in the intersection of them; if we continue down the current "one 
>>>>>>> heuristic to rule them all" query planning approach we have now, we'll 
>>>>>>> struggle to meaningfully explore or conceptualize the value of 
>>>>>>> potential alternatives different optimizers could present us. Flip 
>>>>>>> side, to Benedict's point, until SAI hits and/or some other potential 
>>>>>>> future things we've all talked about, this cbo would likely fall 
>>>>>>> directly into the same path that we effectively have hard-coded today 
>>>>>>> (primary index path only).
>>>>>>> 
>>>>>>> One thing I feel pretty strongly about: even if the only outcome of all 
>>>>>>> this work were to tighten up inconsistencies in our grammar and provide 
>>>>>>> more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, 
>>>>>>> I think that would be highly valuable. This path of "only" would be 
>>>>>>> predicated on us not having successful introduction of a robust 
>>>>>>> secondary index implementation and a variety of other things we have a 
>>>>>>> lot of interest in, so I find it unlikely, but worth calling out.
>>>>>>> 
>>>>>>> re: the removal of ALLOW FILTERING - is there room for compromise here 
>>>>>>> and instead converting it to a guardrail that defaults to being 
>>>>>>> enabled? That could theoretically give us a more gradual path to 
>>>>>>> migration to a cost-based guardrail for instance, and would preserve 
>>>>>>> the current robustness of the system while making it at least a touch 
>>>>>>> more configurable.
>>>>>>> 
>>>>>>> On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote:
>>>>>>>> Thanks for time in addressing concerns. At least with initial 
>>>>>>>> versions, as long as there is a way to replace it with noop or disable 
>>>>>>>> it I would be happy. This is pretty standard practice with features 
>>>>>>>> nowadays but I wanted to highlight it as this might require some 
>>>>>>>> pretty tight coupling.
>>>>>>>> 
>>>>>>>> Chris
>>>>>>>> 
>>>>>>>> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <ble...@apache.org> 
>>>>>>>> wrote:
>>>>>>>>> Hey Chris,
>>>>>>>>> You raise some valid points.
>>>>>>>>> 
>>>>>>>>> I believe that there are 3 points that you mentioned:
>>>>>>>>> 1) CQL restrictions are some form of safety net and should be kept
>>>>>>>>> 2) A lot of Cassandra features do not scale and/or are too easy to 
>>>>>>>>> use in a wrong way that can make the whole system collapse. We should 
>>>>>>>>> not add more to that list. Especially not joins.
>>>>>>>>> 
>>>>>>>>> 3) Should we not start to fix features like secondary index rather 
>>>>>>>>> than adding new ones? Which is heavily linked to 2).
>>>>>>>>> 
>>>>>>>>> Feel free to correct me if I got them wrong or missed one.
>>>>>>>>> 
>>>>>>>>> Regarding 1), I believe that you refer to the "Removing unnecessary 
>>>>>>>>> CQL query limitations and inconsistencies" section. We are not 
>>>>>>>>> planning to remove any safety net here.
>>>>>>>>> What we want to remove is a certain amount of limitations which make 
>>>>>>>>> things confusing for a user trying to write a query for no good 
>>>>>>>>> reason. Like "why can I define a column alias but not use it anywhere 
>>>>>>>>> in my query?" or "Why can I not create a list with 2 bind 
>>>>>>>>> parameters?". While refactoring some CQL code, I kept on finding 
>>>>>>>>> those types of exceptions that we can easily remove while simplifying 
>>>>>>>>> the code at the same time.
>>>>>>>>> 
>>>>>>>>> For 2), I agree that at a certain scale or for some scenarios, some 
>>>>>>>>> features simply do not scale or catch users by surprise. The goal of 
>>>>>>>>> the CEP is to improve things in 2 ways. One is by making Cassandra 
>>>>>>>>> smarter in the way it chooses how to process queries, hopefully 
>>>>>>>>> improving its overall scalability. The other by being transparent 
>>>>>>>>> about how Cassandra will execute the queries through the use of 
>>>>>>>>> EXPLAIN. One problem of GROUP BY for example is that most users do 
>>>>>>>>> not realize what is actually happening under the hood and therefore 
>>>>>>>>> its limitations. I do not believe that EXPLAIN will change everything 
>>>>>>>>> but it will help people to get a better understanding of the 
>>>>>>>>> limitations of some features.
>>>>>>>>> 
>>>>>>>>> I do not know which features will be added in the future to C*. That 
>>>>>>>>> will be discussed through some future CEPs. Nevertheless, I do not 
>>>>>>>>> believe that it makes sense to write a CEP for a query optimizer 
>>>>>>>>> without taking into account that we might at some point add some 
>>>>>>>>> level of support for joins or subqueries. We have been too often 
>>>>>>>>> delivering features without looking at what could be the possible 
>>>>>>>>> evolutions which resulted in code where adding new features was more 
>>>>>>>>> complex than it should have been. I do not want to make the same 
>>>>>>>>> mistake. I want to create an optimizer that can be improved easily 
>>>>>>>>> and considering joins or other features simply help to build things 
>>>>>>>>> in a more generic way.
>>>>>>>>> 
>>>>>>>>> Regarding feature stabilization, I believe that it is happening. I 
>>>>>>>>> have heard plans of how to solve MVs, range queries, hot partitions, 
>>>>>>>>> ... and there was a lot of thinking behind those plans. Secondary 
>>>>>>>>> indexes are being worked on. We hope that the optimizer will also 
>>>>>>>>> help with some index queries.
>>>>>>>>> 
>>>>>>>>> It seems to me that this proposal is going toward the direction that 
>>>>>>>>> you want without introducing new problems for scalability.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>  
>>>>>>>>> 
>>>>>>>>> Le jeu. 14 déc. 2023 à 16:47, Chris Lohfink <clohfin...@gmail.com> a 
>>>>>>>>> écrit :
>>>>>>>>>> I don't wanna be a blocker for this CEP or anything but did want to 
>>>>>>>>>> put my 2 cents in. This CEP is horrifying to me.
>>>>>>>>>> 
>>>>>>>>>> I have seen thousands of clusters across multiple companies and 
>>>>>>>>>> helped them get working successfully. A vast majority of that 
>>>>>>>>>> involved blocking the use of MVs, GROUP BY, secondary indexes, and 
>>>>>>>>>> even just simple _range queries_. The "unncessary restrictions of 
>>>>>>>>>> cql" are not only necessary IMHO, more restrictions are necessary to 
>>>>>>>>>> be successful at scale. The idea of just opening up CQL to general 
>>>>>>>>>> purpose relational queries and lines like "supporting queries with 
>>>>>>>>>> joins in an efficient way" ... I would really like us to make 
>>>>>>>>>> secondary indexes be a viable option before we start opening up 
>>>>>>>>>> floodgates on stuff like this.
>>>>>>>>>> 
>>>>>>>>>> Chris
>>>>>>>>>> 
>>>>>>>>>> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> > So yes, this physical plan is the structure that you have in mind 
>>>>>>>>>>> > but the idea of sharing it is not part of the CEP.
>>>>>>>>>>> 
>>>>>>>>>>> I think it should be. This should form a major part of the API on 
>>>>>>>>>>> which any CBO is built.
>>>>>>>>>>> 
>>>>>>>>>>> > It seems that there is a difference between the goal of your 
>>>>>>>>>>> > proposal and the one of the CEP. The goal of the CEP is first to 
>>>>>>>>>>> > ensure optimal performance. It is ok to change the execution plan 
>>>>>>>>>>> > for one that delivers better performance. What we want to 
>>>>>>>>>>> > minimize is having a node performing queries in an inefficient 
>>>>>>>>>>> > way for a long period of time.
>>>>>>>>>>> 
>>>>>>>>>>> You have made a goal of the CEP synchronising summary statistics 
>>>>>>>>>>> across the whole cluster in order to achieve some degree of 
>>>>>>>>>>> uniformity of query plan. So this is explicitly a goal of the CEP, 
>>>>>>>>>>> and synchronising summary statistics is a hard problem and won’t 
>>>>>>>>>>> provide strong guarantees.
>>>>>>>>>>> 
>>>>>>>>>>> > The client side proposal targets consistency for a given query on 
>>>>>>>>>>> > a given driver instance. In practice, it would be possible to 
>>>>>>>>>>> > have 2 similar queries with 2 different execution plans on the 
>>>>>>>>>>> > same driver
>>>>>>>>>>> 
>>>>>>>>>>> This would only be possible if the driver permitted it. A driver 
>>>>>>>>>>> could (and should) enforce that it only permits one query plan per 
>>>>>>>>>>> query.
>>>>>>>>>>> 
>>>>>>>>>>> The opposite is true for your proposal: some queries may begin 
>>>>>>>>>>> degrading because they touch specific replicas that optimise the 
>>>>>>>>>>> query differently, and this will be hard to debug.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> The binding of the parser output to the schema (what is today the 
>>>>>>>>>>>> Raw.prepare call) will create the logical plan, expressed as a 
>>>>>>>>>>>> tree of relational operators. Simplification and normalization 
>>>>>>>>>>>> will happen on that tree to produce a new equivalent logical plan. 
>>>>>>>>>>>> That logical plan will be used as input to the optimizer. The 
>>>>>>>>>>>> output will be a physical plan producing the output specified by 
>>>>>>>>>>>> the logical plan. A tree of physical operators specifying how the 
>>>>>>>>>>>> operations should be performed.
>>>>>>>>>>>> 
>>>>>>>>>>>> That physical plan will be stored as part of the statements 
>>>>>>>>>>>> (SelectStatement, ModificationStatement, ...) in the prepared 
>>>>>>>>>>>> statement cache. Upon execution, variables will be bound and the 
>>>>>>>>>>>> RangeCommands/Mutations will be created based on the physical plan.
>>>>>>>>>>>> 
>>>>>>>>>>>> The string representation of a physical plan will effectively 
>>>>>>>>>>>> represent the output of an EXPLAIN statement but outside of that 
>>>>>>>>>>>> the physical plan will stay encapsulated within the statement 
>>>>>>>>>>>> classes.   
>>>>>>>>>>>> Hints will be parameters provided to the optimizer to enforce some 
>>>>>>>>>>>> specific choices. Like always using an Index Scan instead of a 
>>>>>>>>>>>> Table Scan, ignoring the cost comparison.
>>>>>>>>>>>> 
>>>>>>>>>>>> So yes, this physical plan is the structure that you have in mind 
>>>>>>>>>>>> but the idea of sharing it is not part of the CEP. I did not 
>>>>>>>>>>>> document it because it will simply be a tree of physical operators 
>>>>>>>>>>>> used internally.
>>>>>>>>>>>> 
>>>>>>>>>>>>> My proposal is that the execution plan of the coordinator that 
>>>>>>>>>>>>> prepares a query gets serialised to the client, which then 
>>>>>>>>>>>>> provides the execution plan to all future coordinators, and 
>>>>>>>>>>>>> coordinators provide it to replicas as necessary. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This means it is not possible for any conflict to arise for a 
>>>>>>>>>>>>> single client. It would guarantee consistency of execution for 
>>>>>>>>>>>>> any single client (and avoid any drift over the client’s 
>>>>>>>>>>>>> sessions), without necessarily guaranteeing consistency for all 
>>>>>>>>>>>>> clients.
>>>>>>>>>>>> 
>>>>>>>>>>>>  It seems that there is a difference between the goal of your 
>>>>>>>>>>>> proposal and the one of the CEP. The goal of the CEP is first to 
>>>>>>>>>>>> ensure optimal performance. It is ok to change the execution plan 
>>>>>>>>>>>> for one that delivers better performance. What we want to minimize 
>>>>>>>>>>>> is having a node performing queries in an inefficient way for a 
>>>>>>>>>>>> long period of time.
>>>>>>>>>>>> 
>>>>>>>>>>>> The client side proposal targets consistency for a given query on 
>>>>>>>>>>>> a given driver instance. In practice, it would be possible to have 
>>>>>>>>>>>> 2 similar queries with 2 different execution plans on the same 
>>>>>>>>>>>> driver making things really confusing. Identifying the source of 
>>>>>>>>>>>> an inefficient query will also be pretty hard.
>>>>>>>>>>>> 
>>>>>>>>>>>> Interestingly, having 2 nodes with 2 different execution plans 
>>>>>>>>>>>> might not be a serious problem. It simply means that based on 
>>>>>>>>>>>> cardinality at t1, the optimizer on node 1 chose plan 1 while the 
>>>>>>>>>>>> one on node 2 chose plan 2 at t2. In practice if the cost 
>>>>>>>>>>>> estimates reflect properly the actual cost those 2 plans should 
>>>>>>>>>>>> have pretty similar efficiency. The problem is more about the fact 
>>>>>>>>>>>> that you would ideally want a uniform behavior around your cluster.
>>>>>>>>>>>> Changes of execution plans should only occur at certain points. So 
>>>>>>>>>>>> the main problematic scenario is when the data distribution is 
>>>>>>>>>>>> around one of those points. Which is also the point where the 
>>>>>>>>>>>> change should have the least impact.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Le jeu. 14 déc. 2023 à 11:38, Benedict <bened...@apache.org> a 
>>>>>>>>>>>> écrit :
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There surely needs to be a more succinct and abstract 
>>>>>>>>>>>>> representation in order to perform transformations on the query 
>>>>>>>>>>>>> plan? You don’t intend to manipulate the object graph directly as 
>>>>>>>>>>>>> you apply any transformations when performing simplification or 
>>>>>>>>>>>>> cost based analysis? This would also (I expect) be the form used 
>>>>>>>>>>>>> to support EXPLAIN functionality, and probably also HINTs etc. 
>>>>>>>>>>>>> This would ideally *not* be coupled to the CBO itself, and would 
>>>>>>>>>>>>> ideally be succinctly serialised.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I would very much expect the query plan to be represented 
>>>>>>>>>>>>> abstractly as part of this work, and for there to be a mechanism 
>>>>>>>>>>>>> that translates this abstract representation into the object 
>>>>>>>>>>>>> graph that executes it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If I’m incorrect, could you please elaborate more specifically 
>>>>>>>>>>>>> how you intend to go about this?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I mean that an important part of this work - not specified in 
>>>>>>>>>>>>>>> the CEP (AFAICT) - should probably be to define some standard 
>>>>>>>>>>>>>>> execution model, that we can manipulate and serialise, for use 
>>>>>>>>>>>>>>> across (and without) optimisers.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I am confused because for me an execution model defines how 
>>>>>>>>>>>>>> operations are executed within the database in a conceptual way, 
>>>>>>>>>>>>>> which is not something that this CEP intends to change. Do you 
>>>>>>>>>>>>>> mean the physical/execution plan?
>>>>>>>>>>>>>> Today this plan is somehow represented for reads by the 
>>>>>>>>>>>>>> SelectStatement and its components (Selections, 
>>>>>>>>>>>>>> StatementRestrictions, ...) it is then converted at execution 
>>>>>>>>>>>>>> time after parameter binding into a ReadCommand which is sent to 
>>>>>>>>>>>>>> the replicas.
>>>>>>>>>>>>>> We plan to refactor SelectStatement and its components but the 
>>>>>>>>>>>>>> ReadCommands change should be relatively small. What you are 
>>>>>>>>>>>>>> proposing is not part of the scope of this CEP.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Le jeu. 14 déc. 2023 à 10:24, Benjamin Lerer <b.le...@gmail.com> 
>>>>>>>>>>>>>> a écrit :
>>>>>>>>>>>>>>>> Can you share the reasons why Apache Calcite is not suitable 
>>>>>>>>>>>>>>>> for this case and why it was rejected
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> My understanding is that Calcite was made for two main things: 
>>>>>>>>>>>>>>> to help with optimizing SQL-like languages and to let people 
>>>>>>>>>>>>>>> query different kinds of data sources together.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We could think about using it for our needs, but there are some 
>>>>>>>>>>>>>>> big problems:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>  1. CQL is not SQL. There are significant differences between 
>>>>>>>>>>>>>>> the 2 languages
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>  2. Cassandra has its own specificities that will influence the 
>>>>>>>>>>>>>>> cost model and the way we deal with optimizations: partitions, 
>>>>>>>>>>>>>>> replication factors, consistency levels, LSM tree storage, ...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>  3. Every framework comes with its own limitations and 
>>>>>>>>>>>>>>> additional cost
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> From my view, there are too many big differences between what 
>>>>>>>>>>>>>>> Calcite does and what we need in Cassandra. If we used Calcite, 
>>>>>>>>>>>>>>> it would also mean relying a lot on another system that 
>>>>>>>>>>>>>>> everyone would have to learn and adjust to. The problems and 
>>>>>>>>>>>>>>> extra work this would bring don't seem worth the benefits we 
>>>>>>>>>>>>>>> might get
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Le mer. 13 déc. 2023 à 18:06, Benjamin Lerer 
>>>>>>>>>>>>>>> <b.le...@gmail.com> a écrit :
>>>>>>>>>>>>>>>> One thing that I did not mention is the fact that this CEP is 
>>>>>>>>>>>>>>>> only a high level proposal. There will be deeper discussions 
>>>>>>>>>>>>>>>> on the dev list around the different parts of this proposal 
>>>>>>>>>>>>>>>> when we reach those parts and have enough details to make 
>>>>>>>>>>>>>>>> those discussions more meaningful.    
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> The maintenance and distribution of summary statistics in 
>>>>>>>>>>>>>>>>> particular is worthy of its own CEP, and it might be 
>>>>>>>>>>>>>>>>> preferable to split it out.
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> For maintaining node statistics the idea is to re-use the 
>>>>>>>>>>>>>>>> current Memtable/SSTable mechanism and relies on mergeable 
>>>>>>>>>>>>>>>> statistics. That will allow us to easily build node level 
>>>>>>>>>>>>>>>> statistics for a given table by merging all the statistics of 
>>>>>>>>>>>>>>>> its memtable and SSTables. For the distribution of these node 
>>>>>>>>>>>>>>>> statistics we are still exploring different options. We can 
>>>>>>>>>>>>>>>> come back with a precise proposal once we have hammered all 
>>>>>>>>>>>>>>>> the details. 
>>>>>>>>>>>>>>>> Is it for you a blocker for this CEP or do you just want to 
>>>>>>>>>>>>>>>> make sure that this part is discussed in deeper details before 
>>>>>>>>>>>>>>>> we implement it?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> The proposal also seems to imply we are aiming for 
>>>>>>>>>>>>>>>>> coordinators to all make the same decision for a query, which 
>>>>>>>>>>>>>>>>> I think is challenging, and it would be worth fleshing out 
>>>>>>>>>>>>>>>>> the design here a little (perhaps just in Jira).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The goal is that the large majority of nodes preparing a query 
>>>>>>>>>>>>>>>> at a given point in time should make the same decision and 
>>>>>>>>>>>>>>>> that over time all nodes should converge toward the same 
>>>>>>>>>>>>>>>> decision. This part is dependent on the node statistics 
>>>>>>>>>>>>>>>> distribution, the cost model and the triggers for 
>>>>>>>>>>>>>>>> re-optimization (that will require some experimentation).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There’s also not much discussion of the execution model: I 
>>>>>>>>>>>>>>>>> think it would make most sense for this to be independent of 
>>>>>>>>>>>>>>>>> any cost and optimiser models (though they might want to 
>>>>>>>>>>>>>>>>> operate on them), so that EXPLAIN and hints can work across 
>>>>>>>>>>>>>>>>> optimisers (a suitable hint might essentially bypass the 
>>>>>>>>>>>>>>>>> optimiser, if the optimiser permits it, by providing a 
>>>>>>>>>>>>>>>>> standard execution model)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> It is not clear to me what you mean by "a standard execution 
>>>>>>>>>>>>>>>> model"? Otherwise, we were not planning to have the execution 
>>>>>>>>>>>>>>>> model or the hints depending on the optimizer. 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think it would be worth considering providing the execution 
>>>>>>>>>>>>>>>>> plan to the client as part of query preparation, as an opaque 
>>>>>>>>>>>>>>>>> payload to supply to coordinators on first contact, as this 
>>>>>>>>>>>>>>>>> might simplify the problem of ensuring queries behave the 
>>>>>>>>>>>>>>>>> same without adopting a lot of complexity for synchronising 
>>>>>>>>>>>>>>>>> statistics (which will never provide strong guarantees). Of 
>>>>>>>>>>>>>>>>> course, re-preparing a query might lead to a new plan, though 
>>>>>>>>>>>>>>>>> any coordinators with the query in their cache should be able 
>>>>>>>>>>>>>>>>> to retrieve it cheaply. If the execution model is efficiently 
>>>>>>>>>>>>>>>>> serialised this might have the ancillary benefit of improving 
>>>>>>>>>>>>>>>>> the occupancy of our prepared query cache.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I am not sure that I understand your proposal. If 2 nodes 
>>>>>>>>>>>>>>>> build a different execution plan how do you solve that 
>>>>>>>>>>>>>>>> conflict?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Le mer. 13 déc. 2023 à 09:55, Benedict <bened...@apache.org> a 
>>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> A CBO can only make worse decisions than the status quo for 
>>>>>>>>>>>>>>>>> what I presume are the majority of queries - i.e. those that 
>>>>>>>>>>>>>>>>> touch only primary indexes. In general, there are plenty of 
>>>>>>>>>>>>>>>>> use cases that prefer determinism. So I agree that there 
>>>>>>>>>>>>>>>>> should at least be a CBO implementation that makes the same 
>>>>>>>>>>>>>>>>> decisions as the status quo, deterministically.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I do support the proposal, but would like to see some 
>>>>>>>>>>>>>>>>> elements discussed in more detail. The maintenance and 
>>>>>>>>>>>>>>>>> distribution of summary statistics in particular is worthy of 
>>>>>>>>>>>>>>>>> its own CEP, and it might be preferable to split it out. The 
>>>>>>>>>>>>>>>>> proposal also seems to imply we are aiming for coordinators 
>>>>>>>>>>>>>>>>> to all make the same decision for a query, which I think is 
>>>>>>>>>>>>>>>>> challenging, and it would be worth fleshing out the design 
>>>>>>>>>>>>>>>>> here a little (perhaps just in Jira).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced 
>>>>>>>>>>>>>>>>> that this CEP deprecates it. It is a concrete qualitative 
>>>>>>>>>>>>>>>>> guard rail, that I expect some users will prefer to a 
>>>>>>>>>>>>>>>>> cost-based guard rail. Perhaps this could be left to the CBO 
>>>>>>>>>>>>>>>>> to decide how to treat.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There’s also not much discussion of the execution model: I 
>>>>>>>>>>>>>>>>> think it would make most sense for this to be independent of 
>>>>>>>>>>>>>>>>> any cost and optimiser models (though they might want to 
>>>>>>>>>>>>>>>>> operate on them), so that EXPLAIN and hints can work across 
>>>>>>>>>>>>>>>>> optimisers (a suitable hint might essentially bypass the 
>>>>>>>>>>>>>>>>> optimiser, if the optimiser permits it, by providing a 
>>>>>>>>>>>>>>>>> standard execution model)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think it would be worth considering providing the execution 
>>>>>>>>>>>>>>>>> plan to the client as part of query preparation, as an opaque 
>>>>>>>>>>>>>>>>> payload to supply to coordinators on first contact, as this 
>>>>>>>>>>>>>>>>> might simplify the problem of ensuring queries behave the 
>>>>>>>>>>>>>>>>> same without adopting a lot of complexity for synchronising 
>>>>>>>>>>>>>>>>> statistics (which will never provide strong guarantees). Of 
>>>>>>>>>>>>>>>>> course, re-preparing a query might lead to a new plan, though 
>>>>>>>>>>>>>>>>> any coordinators with the query in their cache should be able 
>>>>>>>>>>>>>>>>> to retrieve it cheaply. If the execution model is efficiently 
>>>>>>>>>>>>>>>>> serialised this might have the ancillary benefit of improving 
>>>>>>>>>>>>>>>>> the occupancy of our prepared query cache.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> 
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think it makes sense to see what the actual overhead is of 
>>>>>>>>>>>>>>>>>> CBO before making the assumption it'll be so high that we 
>>>>>>>>>>>>>>>>>> need to have two code paths.  I'm happy to provide thorough 
>>>>>>>>>>>>>>>>>> benchmarking and analysis when it reaches a testing phase.  
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm excited to see where this goes.  I think it sounds very 
>>>>>>>>>>>>>>>>>> forward looking and opens up a lot of possibilities.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Jon
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell 
>>>>>>>>>>>>>>>>>> <cclive1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Nothing expresses my thoughts better than +1
>>>>>>>>>>>>>>>>>>> ，It feels like it means a lot to Cassandra.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have a question. Is it easy to turn off cbo's optimizer 
>>>>>>>>>>>>>>>>>>> or by pass in some way? Because some simple read and write 
>>>>>>>>>>>>>>>>>>> requests will have better performance without cbo, which is 
>>>>>>>>>>>>>>>>>>> also the advantage of Cassandra compared to some rdbms.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> David Capwell <dcapw...@apple.com>于2023年12月13日 周三上午3:37写道：
>>>>>>>>>>>>>>>>>>>> Overall LGTM.  
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer 
>>>>>>>>>>>>>>>>>>>>> <ble...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I would like to open the discussion on the introduction 
>>>>>>>>>>>>>>>>>>>>> of a cost based optimizer to allow Cassandra to pick the 
>>>>>>>>>>>>>>>>>>>>> best execution plan based on the data 
>>>>>>>>>>>>>>>>>>>>> distribution.Therefore, improving the overall query 
>>>>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> This CEP should also lay the groundwork for the future 
>>>>>>>>>>>>>>>>>>>>> addition of features like joins, subqueries, OR/NOT and 
>>>>>>>>>>>>>>>>>>>>> index ordering.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The proposal is here: 
>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thank you in advance for your feedback.
>>

Re: [DISCUSS] CEP-39: Cost Based Optimizer

Reply via email to