On 05.04.2025 13:09, Alexander Korotkov wrote:
On Fri, Apr 4, 2025 at 11:35 AM Andrei Lepikhov<lepi...@gmail.com> wrote:
On 4/4/25 04:53, Richard Guo wrote:
On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov<aekorot...@gmail.com> wrote:
I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.
Hmm, I'm not sure about the fix. It seems to me that it simply
prevents removing any PHVs in the self-join removal case. My concern
is that this might result in PHVs that could actually be removed not
being removed in many cases.
Let's play with use cases:
If a PHV is needed in the inner or outer only, it means we have a clause
in the baserestrictinfo that will be transferred to the keeping
relation, and we shouldn't remove the PHV.
Another case is when the PHV is needed in a join clause of the
self-join. I may imagine such a case:
toKeep.x+toRemove.y=PHV
This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
baserestrictinfo of keeping relation and should be saved.
I think it is possible to invent quite a narrow case of clause like the
following:
PHV_evaluated_at_inner = PHV_evaluated_at_outer
It needs to prove reproducibility. But even if it makes sense, it seems
to have no danger for further selectivity estimation compared to the
source clause and is a too-narrow case, isn't it?
In other cases, this PHV is needed something else, and we can't remove it.
I agree with that, and I also believe this case will be quite rare in
practice.
Should we add more regression tests covering these cases?
I experimented with some examples like this and noticed that it does
affect cardinality estimation, though I'm not sure the impact is
significant.
I used the tables from the regression tests, so if they’re appropriate
for reproducing this case, it should be straightforward to add them.
EXPLAIN (analyze, VERBOSE)
SELECT 1
FROM tbl_phv t1
LEFT JOIN (
SELECT 1 AS extra, x, y FROM tbl_phv tl
) t3
JOIN (
SELECT y, x FROM tbl_phv tr
) t4 ON t4.y = t3.y and (t4.x*0) = (t3.x *0)
ON true
WHERE t3.extra IS NOT NULL
AND t4.x + t3.y= (t1.x * 0);
The query plan with transformation:
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.27..7.16 rows=100 width=4) (actual
time=0.149..0.151 rows=0.00 loops=1)
Output: 1
Hash Cond: ((tr.x + tr.y) = (t1.x * 0))
Buffers: shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.26 rows=100
width=8) (actual time=0.027..0.047 rows=100.00 loops=1)
Output: tr.x, tr.y
Filter: ((1 IS NOT NULL) AND ((tr.x * 0) IS NOT NULL))
Rows Removed by Filter: 1
Buffers: shared hit=1
-> Hash (cost=2.01..2.01 rows=101 width=4) (actual
time=0.078..0.079 rows=100.00 loops=1)
Output: t1.x
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=1
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101
width=4) (actual time=0.005..0.038 rows=101.00 loops=1)
Output: t1.x
Buffers: shared hit=1
Planning Time: 0.607 ms
Execution Time: 0.194 ms
(18 rows)
The query plan without transformation:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.09..8.74 rows=1 width=4) (actual time=0.248..0.250
rows=0.00 loops=1)
Output:1
Hash Cond:((t1.x * 0) = (tr.x + tl.y))
Buffers:shared hit=3
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101 width=4)
(actual time=0.011..0.021 rows=101.00 loops=1)
Output:t1.x, t1.y
Buffers:shared hit=1
-> Hash (cost=6.08..6.08 rows=1 width=8) (actual time=0.206..0.207
rows=100.00 loops=1)
Output:tl.y, tr.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=2
-> Hash Join (cost=3.51..6.08 rows=1 width=8) (actual time=0.090..0.160
rows=100.00 loops=1)
Output:tl.y, tr.x
Inner Unique:true
Hash Cond:((tr.y = tl.y) AND ((tr.x * 0) = (tl.x * 0)))
Buffers:shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=101 width=8)
(actual time=0.005..0.016 rows=101.00 loops=1)
Output:tr.x, tr.y
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=8) (actual time=0.080..0.080
rows=100.00 loops=1)
Output:tl.y, tl.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv tl (cost=0.00..2.01 rows=100 width=8)
(actual time=0.008..0.035 rows=101.00 loops=1)
Output:tl.y, tl.x
Filter:(1 IS NOT NULL)
Buffers:shared hit=1
Planning:
Buffers:shared hit=25
Planning Time:0.609 ms
Execution Time:0.319 ms
(31 rows)
EXPLAIN (analyze, VERBOSE)
SELECT 1
FROM tbl_phv t1
LEFT JOIN (
SELECT 1 AS extra, x, y FROM tbl_phv tl
) t3
JOIN (
SELECT y, x FROM tbl_phv tr
) t4 ON t4.y = t3.y
ON true
WHERE t3.extra IS NOT NULL
AND t4.x + t3.y= (t1.x * 0);
The query plan with transformation:
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.26..6.90 rows=100 width=4) (actual
time=0.145..0.146 rows=0.00 loops=1)
Output: 1
Hash Cond: ((t1.x * 0) = (tr.x + tr.y))
Buffers: shared hit=2
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101
width=4) (actual time=0.030..0.040 rows=101.00 loops=1)
Output: t1.x, t1.y
Buffers: shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=8) (actual
time=0.081..0.081 rows=100.00 loops=1)
Output: tr.y, tr.x
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=1
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=100
width=8) (actual time=0.012..0.040 rows=101.00 loops=1)
Output: tr.y, tr.x
Filter: (1 IS NOT NULL)
Buffers: shared hit=1
Planning Time: 0.571 ms
Execution Time: 0.195 ms
(17 rows)
The query plan without transformation:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.53..9.94 rows=50 width=4) (actual time=0.236..0.238
rows=0.00 loops=1)
Output:1
Hash Cond:((tr.x + tl.y) = (t1.x * 0))
Buffers:shared hit=3
-> Hash Join (cost=3.26..5.55 rows=100 width=8) (actual
time=0.075..0.137 rows=101.00 loops=1)
Output:tl.y, tr.x
Inner Unique:true
Hash Cond:(tr.y = tl.y)
Buffers:shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=101 width=8)
(actual time=0.005..0.017 rows=101.00 loops=1)
Output:tr.x, tr.y
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=4) (actual time=0.064..0.065
rows=101.00 loops=1)
Output:tl.y
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv tl (cost=0.00..2.01 rows=100 width=4)
(actual time=0.006..0.033 rows=101.00 loops=1)
Output:tl.y
Filter:(1 IS NOT NULL)
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=101 width=4) (actual time=0.078..0.078
rows=100.00 loops=1)
Output:t1.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101 width=4)
(actual time=0.014..0.039 rows=101.00 loops=1)
Output:t1.x
Buffers:shared hit=1
Planning:
Buffers:shared hit=12
Planning Time:0.478 ms
Execution Time:0.293 ms
(31 rows)
--
Regards,
Alena Rybakina
Postgres Professional