On Wed, Jul 19, 2023 at 1:46 PM Jeff Davis <pg...@j-davis.com> wrote: > On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote: > > Makes sense, I also need to think about maybe not having duplicate > > clauses in the two lists. What annoys me on that it partially > > prevents > > the cost-based reordering done by order_qual_clauses(). So maybe we > > should have three lists ... Also, some of the expressions count be > > fairly expensive. > > Can we just calculate the costs of the pushdown and do it when it's a > win? If the random_page_cost savings exceed the costs from evaluating > the clause earlier, then push down.
My patch that teaches nbtree to execute ScalarArrayOps intelligently (by dynamically choosing to not re-descend the btree to perform another "primitive index scan" when the data we need is located on the same leaf page as the current ScalarArrayOps arrays) took an interesting turn recently -- one that seems related. I found that certain kinds of queries are dramatically faster once you teach the optimizer to accept that multi-column ScalarArrayOps can be trusted to return tuples in logical/index order, at least under some circumstances. For example: pg@regression:5555 [583930]=# create index order_by_saop on tenk1(two,four,twenty); CREATE INDEX pg@regression:5555 [583930]=# EXPLAIN (ANALYZE, BUFFERS) select ctid, thousand from tenk1 where two in (0,1) and four in (1,2) and twenty in (1,2) order by two, four, twenty limit 20; This shows "Buffers: shared hit=1377" on HEAD, versus "Buffers: shared hit=13" with my patch. All because we can safely terminate the scan early now. The vast majority of the buffer hits the patch will avoid are against heap pages, even though I started out with the intention of eliminating unnecessary repeat index page accesses. Note that build_index_paths() currently refuses to allow SAOP clauses to be recognized as ordered with a multi-column index and a query with a clause for more than the leading column -- that is something that the patch needs to address (to get this particular improvement, at least). Allowing such an index path to have useful pathkeys is typically safe (in the sense that it won't lead to wrong answers to queries), but we still make a conservative assumption that they can lead to wrong answers. There are comments about "equality constraints" that describe the restrictions right now. But it's not just the question of basic correctness -- the optimizer is very hesitant to use multi-column SAOPs, even in cases that don't care about ordering. So it's also (I think, implicitly) a question of *risk*. The risk of getting very inefficient SAOP execution in nbtree, when it turns out that a huge number of "primitive index scans" are needed. But, if nbtree is taught to "coalesce together" primitive index scans at runtime, that risk goes way down. Anyway, this seems related to what you're talking about because the relationship between selectivity and ordering seems particularly important in this context. And because it suggests that there is at least some scope for adding "run time insurance" to the executor, which is valuable in the optimizer if it bounds the potential downside. If you can be practically certain that there is essentially zero risk of serious problems when the costing miscalculates (for a limited subset of cases), then life might be a lot easier -- clearly we should be biased in one particular direction with a case that has that kind of general profile. My current understanding of the optimizer side of things -- particularly things like costing for "filter quals/pqquals" versus costing for "true index quals" -- is rather basic. That will have to change. Curious to hear your thoughts (if any) on how what you're discussing may relate to what I need to do with my patch. Right now my patch assumes that making SAOP clauses into proper index quals (that usually preserve index ordering) is an unalloyed good (when safe!). This assumption is approximately true on average, as far as I can tell. But it's probably quite untrue in various specific cases, that somebody is bound to care about. -- Peter Geoghegan