Re: Efficient pagination using multi-column cursors

large . goose2829 Wed, 26 Feb 2025 07:41:29 -0800

Thanks for the insights!

On Wed, Feb 26, 2025, at 16:05, Peter Geoghegan wrote:
> On Wed, Feb 26, 2025 at 9:29 AM <large.goose2...@salomvary.com> wrote:
> > Without being familiar the internals of the query planner, I *think* there 
> > *should* be a way to come up with WHERE conditions that results in the 
> > "perfect" plan.
> 
> There is a fundamental trade-off involved here. The simple, fast
> "WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatever
> tuples are stored immediately after "(10, 20, 29)" in the index.
> Naturally, they're returned in index order, which is usually the most
> useful order (simple ASC order or simple DESC order for all columns).
> 
> 
> The B-Tree code can physically traverse your mixed-ASC-and-DESC order
> index in almost the same way. But it is much less useful, since the
> matching index tuples won't be physically located together as exactly
> one contiguous group of tuples.


I am not sure I understand this.

My understanding is that given this "mixed order" index:
CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3);

The index tuples are physically organized exactly in this way:
ORDER BY col_1, col_2 DESC, col_3

So that I should be able to write a query that reads a continuous range from 
this index without filtering.

> 
> And so (with your "handwritten" row
> comparison) you get a filter qual that filters out non-matching tuples
> using lower-order index columns. The index scan actually just returns
> "Index Cond: (col_1 >= 10)" (which still returns a contiguous group of
> index tuples), while a filter condition excludes those tuples returned
> by the index scan node that don't satisfy the later/lower-order column
> condition.

Does this mean that it is not possible to come up with a plan that has the same 
performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" 
filters, or only for "mixed order"? Or not a theoretical limitation but a 
limitation of the current implementation of the query planner?

I don't know whether it's polite to bring up competitors on this mailing list, 
but MySQL 8 (which quite ironically has poor performance for the "row values" 
syntax, doing a full index scan) seems to avoid index filtering using the "OR 
AND" variant (at least when no mixed ordering is involved):

SELECT content
FROM data
WHERE
  col_1 > 10
  OR (
    col_1 = 10
    AND (
      col_2 > 20
      OR (
        col_2 = 20
        AND col_3 > 29
      )
    )
  )
ORDER BY col_1, col_2, col_3
LIMIT 100;

-> Limit: 100 row(s)  (cost=100710 rows=100) (actual time=0.0322..1.15 rows=100 
loops=1)
    -> Index range scan on data using data_index
         over (col_1 = 10 AND col_2 = 20 AND 29 < col_3) OR (col_1 = 10 AND 20 
< col_2) OR (10 < col_1),
         with index condition: ((`data`.col_1 > 10) or ((`data`.col_1 = 10) and 
((`data`.col_2 > 20) or ((`data`.col_2 = 20) and (`data`.col_3 > 29)))))
         (cost=100710 rows=511937) (actual time=0.0316..1.14 rows=100 loops=1)


Still slightly slower actual total time on the same machine as PostgreSQL 
though (based on a single EXPLAIN ANALYZE sample only).

> 
> The book "Relational Database Index Design and the Optimizers"
> proposes a vocabulary for the trade-offs in this area -- the 3 star
> model. When creating the best possible index for certain queries it is
> sometimes inherently necessary to choose between what it calls the
> first star (which means avoiding a sort) and the second star (which
> means having the thinnest possible "row slice"). Sometimes those
> things are in tension, which makes sense when you imagine how the
> index must be physically traversed.

Aka. "Good, Fast, Cheap — Pick Any Two" ;)

Cheers,
Márton

Re: Efficient pagination using multi-column cursors

Reply via email to