On 8/3/23 07:26, Peter Geoghegan wrote: > On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <p...@bowt.ie> wrote: >> I don't dispute the fact that this can only happen when the planner >> believes (with good reason) that the expected cost will be lower. But >> I maintain that there is a novel risk to be concerned about, which is >> meaningfully distinct from the general risk of regressions that comes >> from making just about any change to the planner. The important >> principle here is that we should have a strong bias in the direction >> of making quals into true "Access Predicates" whenever practical. >>
OK, preference for access predicates sounds like a reasonable principle. >> Yeah, technically the patch doesn't directly disrupt how existing >> index paths get generated. But it does have the potential to disrupt >> it indirectly, by providing an alternative very-similar index path >> that's likely to outcompete the existing one in these cases. I think >> that we should have just one index path that does everything well >> instead. > > You can see this for yourself, quite easily. Start by running the > relevant query from the regression tests, which is: > > SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous > = 3 OR tenthous = 42); > > EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query > slightly faster, as expected. I see 7 buffer hits for the bitmap index > scan plan on master, versus only 4 buffer hits for the patch's index > scan. Obviously, this is because we go from multiple index scans > (multiple bitmap index scans) to only one. > > But, if I run this insert statement and try the same thing again, > things look very different: > > insert into tenk1 (thousand, tenthous) select 42, i from > generate_series(43, 1000) i; > > (Bear in mind that we've inserted rows that don't actually need to be > selected by the query in question.) > > Now the master branch's plan works in just the same way as before -- > it has exactly the same overhead (7 buffer hits). Whereas the patch > still gets the same risky plan -- which now blows up. The plan now > loses by far more than it could ever really hope to win by: 336 buffer > hits. (It could be a lot higher than this, even, but you get the > picture.) Are you sure? Because if I try this on master (62e9af4c without any patches), I get this: regression=# explain (verbose, analyze, buffers) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42); QUERY PLAN ------------------------------------------------------------------------ Index Scan using tenk1_thous_tenthous on public.tenk1 (cost=0.29..1416.32 rows=1 width=244) (actual time=0.078..1.361 rows=1 loops=1) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 Index Cond: (tenk1.thousand = 42) Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42)) Rows Removed by Filter: 967 Buffers: shared hit=335 Planning Time: 0.225 ms Execution Time: 1.430 ms (8 rows) So not sure about the claim that master works fine as before. OTOH, the patched branch (with "my" patch 2023/07/16, just to be clear) does this: QUERY PLAN ------------------------------------------------------------------------ Index Scan using tenk1_thous_tenthous on public.tenk1 (cost=0.29..23.57 rows=1 width=244) (actual time=0.077..0.669 rows=1 loops=1) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4 Index Cond: (tenk1.thousand = 42) Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42)) Rows Removed by Index Recheck: 967 Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42)) Buffers: shared hit=7 Planning Time: 0.211 ms Execution Time: 0.722 ms (9 rows) Which is just the 7 buffers ... Did I do something wrong? > > Sure, it's difficult to imagine a very general model that captures > this sort of risk, in the general case. But you don't need a degree in > actuarial science to understand that it's inherently a bad idea to > juggle chainsaws -- no matter what your general risk tolerance happens > to be. Some things you just don't do. > Oh come on. I've been juggling chainsaws (lit on fire!) since I was a little boy! There's nothing risky about it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company