Re: POC, WIP: OR-clause support for indexes

Alena Rybakina Mon, 03 Feb 2025 03:55:10 -0800


On 03.02.2025 14:32, Alexander Korotkov wrote:

On Mon, Feb 3, 2025 at 12:22 PM Alena Rybakina
<[email protected]>  wrote:

Thank you for updated version! I agree for your version of the code.


On 02.02.2025 21:00, Alexander Korotkov wrote:

On Fri, Jan 31, 2025 at 4:31 PM Alena Rybakina
<[email protected]>  wrote:

I started reviewing at the patch and saw some output "ERROR" in the output of 
the test and is it okay here?

SELECT * FROM tenk1 t1
WHERE t1.thousand = 42 OR t1.thousand = (SELECT t2.tenthous FROM tenk1 t2 WHERE 
t2.thousand = t1.tenthous);
ERROR: more than one row returned by a subquery used as an expression

The output is correct for this query.  But the query is very
unfortunate for the regression test.  I've revised query in the v47
revision [1].

Links.
1.https://www.postgresql.org/message-id/CAPpHfdsBZmNt9qUoJBqsQFiVDX1%3DyCKpuVAt1YnR7JCpP%3Dk8%2BA%40mail.gmail.com

While analyzing the modified query plan from the regression test, I noticed 
that despite using a full seqscan for table t2 in the original plan,
its results are cached by Materialize node, and this can significantly speed up 
the execution of the NestedLoop algorithm.

For example, after running the query several times, I got results that show 
that the query execution time was twice as bad.

Original plan:

EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.b OR 
t1.a=1; QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=0.00..70067.00 rows=2502499 width=24) (actual time=0.015..1123.247 
rows=2003000 loops=1) Join Filter: ((t1.a = t2.b) OR (t1.a = 1)) Rows Removed by Join 
Filter: 1997000 Buffers: shared hit=22 -> Seq Scan on bitmap_split_or t1 
(cost=0.00..31.00 rows=2000 width=12) (actual time=0.006..0.372 rows=2000 loops=1) 
Buffers: shared hit=11 -> Materialize (cost=0.00..41.00 rows=2000 width=12) (actual 
time=0.000..0.111 rows=2000 loops=2000) Storage: Memory Maximum Storage: 110kB Buffers: 
shared hit=11 -> Seq Scan on bitmap_split_or t2 (cost=0.00..31.00 rows=2000 width=12) 
(actual time=0.003..0.188 rows=2000 loops=1) Buffers: shared hit=11 Planning Time: 0.118 
ms Execution Time: 1204.874 ms (13 rows)

Query plan after the patch:

EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.b 
OR t1.a=1; QUERY PLAN 
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=0.28..56369.00 rows=2502499 width=24) (actual time=0.121..2126.606 
rows=2003000 loops=1) Buffers: shared hit=16009 read=2 -> Seq Scan on 
bitmap_split_or t2 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.017..0.652 
rows=2000 loops=1) Buffers: shared hit=11 -> Index Scan using t_a_b_idx on 
bitmap_split_or t1 (cost=0.28..18.15 rows=1002 width=12) (actual time=0.044..0.627 
rows=1002 loops=2000) Index Cond: (a = ANY (ARRAY[t2.b, 1])) Buffers: shared 
hit=15998 read=2 Planning Time: 0.282 ms Execution Time: 2344.367 ms (9 rows)

I'm afraid that we may lose this with this optimization. Maybe this can be 
taken into account somehow, what do you think?

The important aspect is that the second plan have lower cost than the
first one.  So, that's the question to the cost model.  The patch just
lets optimizer consider more comprehensive plurality of paths.  You
can let optimizer select the first plan by tuning *_cost params.  For
example, setting cpu_index_tuple_cost = 0.02 makes first plan win for
me.

Other than that the test query is quite unfortunate as t1.a=1 is very
frequent. I've adjusted the query so that nested loop with index scan
wins both in cost and execution time.

I've also adjusted another test query as proposed by Andrei.

I'm going to push this patch is there is no more notes.

Links.
1.https://www.postgresql.org/message-id/fc1017ca-877b-4f86-b491-154cf123eedd%40gmail.com


Okay.I agree with your codeand have no more notes

--
Regards,
Alena Rybakina
Postgres Professional

Re: POC, WIP: OR-clause support for indexes

Reply via email to