The CEP expressly includes an item for coordinated cardinality estimation, by producing whole-cluster summaries. I’m not sure whether you addressed this in your feedback, since it’s not clear what you’re referring to with distributed estimates. Avoiding this was expressly the driver of my suggestion to instead include the plan as a payload (which offers users some additional facilities).

> On 2 Jan 2024, at 21:26, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> Hi,
>
> I am burying the lede, but it's important to keep an eye on runtime-adaptive
> vs planning time optimization as the cost/benefits vary greatly between the
> two and runtime adaptive can be a game changer. Basically CBO optimizes for
> query efficiency and startup time at the expense of not handling some queries
> well and runtime adaptive is cheap/free for expensive queries and can handle
> cases that CBO can't.
>
> Generally speaking I am +1 on the introduction of a CBO, since it seems like
> there exist things that would benefit from it materially (and many of the
> associated refactors/cleanup) and it aligns with my north star that includes
> joins.
>
> Do we all have the same north star that Cassandra should eventually support
> joins? Just curious if that is controversial.
>
> I don't feel like this CEP in particular should need to really nail down
> exactly how distributed estimates work since we can start with using local
> estimates as a proxy for the entire cluster and then improve. If someone has
> bandwidth to do a separate CEP for that then sure that would be great, but
> this seems big enough in scope already.
>
> RE testing, continuity of performance of queries is going to be really
> important. I would really like to see that we have fuzzed the space
> deterministically and via a collection of hand rolled cases, and can compare
> performance between versions to catch queries that regress. Hopefully we can
> agree on a baseline for releasing where we know what prior release to compare
> to and what acceptable changes in performance are.
>
> RE prepared statements - It feels to me like trying to send the plan blob
> back and forth to get more predictable, but not absolutely predictable, plans
> is not worth it? Feels like a lot for an incremental improvement over a
> baseline that doesn't exist yet, IOW it doesn't feel like something for V1.
> Maybe it ends up in YAGNI territory.
>
> The north star of predictable behavior for queries is a *very* important one
> because it means the world to users, but CBO is going to make mistakes all
> over the place. It's simply unachievable even with accurate statistics
> because it's very hard to tell how predicates will behave on a column.
>
> This segues nicely into the importance of adaptive execution :-) It's how you
> rescue the queries that CBO doesn't handle well for any reason such as bugs,
> bad statistics, missing features. Re-ordering predicate evaluation, switching
> indexes, and re-ordering joins can all be done on the fly.
>
> CBO is really a performance optimization since adaptive approaches will allow
> any query to complete with some wasted resources.
>
> If my pager were waking me up at night and I wanted to stem the bleeding I
> would reach for runtime adaptive over CBO because I know it will catch more
> cases even if it is slower to execute up front.
>
> What is the nature of the queries we are looking to solve right now? Are they
> long running heavy hitters, or short queries that explode if run incorrectly,
> or a mix of both?
>
> Ariel
>
>> On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
>> Hi everybody,
>>
>> I would like to open the discussion on the introduction of a cost based
>> optimizer to allow Cassandra to pick the best execution plan based on the
>> data distribution, thereby improving the overall query performance.
>>
>> This CEP should also lay the groundwork for the future addition of features
>> like joins, subqueries, OR/NOT and index ordering.
>>
>> The proposal is here:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>
>> Thank you in advance for your feedback.