Thanks for the insights! On Wed, Feb 26, 2025, at 16:05, Peter Geoghegan wrote: > On Wed, Feb 26, 2025 at 9:29 AM <large.goose2...@salomvary.com> wrote: > > Without being familiar the internals of the query planner, I *think* there > > *should* be a way to come up with WHERE conditions that results in the > > "perfect" plan. > > There is a fundamental trade-off involved here. The simple, fast > "WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatever > tuples are stored immediately after "(10, 20, 29)" in the index. > Naturally, they're returned in index order, which is usually the most > useful order (simple ASC order or simple DESC order for all columns). > > > The B-Tree code can physically traverse your mixed-ASC-and-DESC order > index in almost the same way. But it is much less useful, since the > matching index tuples won't be physically located together as exactly > one contiguous group of tuples.
I am not sure I understand this. My understanding is that given this "mixed order" index: CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3); The index tuples are physically organized exactly in this way: ORDER BY col_1, col_2 DESC, col_3 So that I should be able to write a query that reads a continuous range from this index without filtering. > > And so (with your "handwritten" row > comparison) you get a filter qual that filters out non-matching tuples > using lower-order index columns. The index scan actually just returns > "Index Cond: (col_1 >= 10)" (which still returns a contiguous group of > index tuples), while a filter condition excludes those tuples returned > by the index scan node that don't satisfy the later/lower-order column > condition. Does this mean that it is not possible to come up with a plan that has the same performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" filters, or only for "mixed order"? Or not a theoretical limitation but a limitation of the current implementation of the query planner? I don't know whether it's polite to bring up competitors on this mailing list, but MySQL 8 (which quite ironically has poor performance for the "row values" syntax, doing a full index scan) seems to avoid index filtering using the "OR AND" variant (at least when no mixed ordering is involved): SELECT content FROM data WHERE col_1 > 10 OR ( col_1 = 10 AND ( col_2 > 20 OR ( col_2 = 20 AND col_3 > 29 ) ) ) ORDER BY col_1, col_2, col_3 LIMIT 100; -> Limit: 100 row(s) (cost=100710 rows=100) (actual time=0.0322..1.15 rows=100 loops=1) -> Index range scan on data using data_index over (col_1 = 10 AND col_2 = 20 AND 29 < col_3) OR (col_1 = 10 AND 20 < col_2) OR (10 < col_1), with index condition: ((`data`.col_1 > 10) or ((`data`.col_1 = 10) and ((`data`.col_2 > 20) or ((`data`.col_2 = 20) and (`data`.col_3 > 29))))) (cost=100710 rows=511937) (actual time=0.0316..1.14 rows=100 loops=1) Still slightly slower actual total time on the same machine as PostgreSQL though (based on a single EXPLAIN ANALYZE sample only). > > The book "Relational Database Index Design and the Optimizers" > proposes a vocabulary for the trade-offs in this area -- the 3 star > model. When creating the best possible index for certain queries it is > sometimes inherently necessary to choose between what it calls the > first star (which means avoiding a sort) and the second star (which > means having the thinnest possible "row slice"). Sometimes those > things are in tension, which makes sense when you imagine how the > index must be physically traversed. Aka. "Good, Fast, Cheap — Pick Any Two" ;) Cheers, Márton