Re: Use of additional index columns in rows filtering

Tomas Vondra Thu, 03 Aug 2023 04:21:23 -0700

On 8/3/23 07:26, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <[email protected]> wrote:
>> I don't dispute the fact that this can only happen when the planner
>> believes (with good reason) that the expected cost will be lower. But
>> I maintain that there is a novel risk to be concerned about, which is
>> meaningfully distinct from the general risk of regressions that comes
>> from making just about any change to the planner. The important
>> principle here is that we should have a strong bias in the direction
>> of making quals into true "Access Predicates" whenever practical.
>>


OK, preference for access predicates sounds like a reasonable principle.

>> Yeah, technically the patch doesn't directly disrupt how existing
>> index paths get generated. But it does have the potential to disrupt
>> it indirectly, by providing an alternative very-similar index path
>> that's likely to outcompete the existing one in these cases. I think
>> that we should have just one index path that does everything well
>> instead.
> 
> You can see this for yourself, quite easily. Start by running the
> relevant query from the regression tests, which is:
> 
> SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous
> = 3 OR tenthous = 42);
> 
> EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
> slightly faster, as expected. I see 7 buffer hits for the bitmap index
> scan plan on master, versus only 4 buffer hits for the patch's index
> scan. Obviously, this is because we go from multiple index scans
> (multiple bitmap index scans) to only one.
> 
> But, if I run this insert statement and try the same thing again,
> things look very different:
> 
> insert into tenk1 (thousand, tenthous) select 42, i from
> generate_series(43, 1000) i;
> 
> (Bear in mind that we've inserted rows that don't actually need to be
> selected by the query in question.)
> 
> Now the master branch's plan works in just the same way as before --
> it has exactly the same overhead (7 buffer hits). Whereas the patch
> still gets the same risky plan -- which now blows up. The plan now
> loses by far more than it could ever really hope to win by: 336 buffer
> hits. (It could be a lot higher than this, even, but you get the
> picture.)

Are you sure? Because if I try this on master (62e9af4c without any
patches), I get this:

regression=# explain (verbose, analyze, buffers) SELECT * FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

                              QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on public.tenk1
(cost=0.29..1416.32 rows=1 width=244) (actual time=0.078..1.361 rows=1
loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand,
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
(tenk1.tenthous = 42))
   Rows Removed by Filter: 967
   Buffers: shared hit=335
 Planning Time: 0.225 ms
 Execution Time: 1.430 ms
(8 rows)

So not sure about the claim that master works fine as before. OTOH, the
patched branch (with "my" patch 2023/07/16, just to be clear) does this:

                              QUERY PLAN
------------------------------------------------------------------------
 Index Scan using tenk1_thous_tenthous on public.tenk1
(cost=0.29..23.57 rows=1 width=244) (actual time=0.077..0.669 rows=1
loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand,
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
(tenk1.tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
(tenk1.tenthous = 42))
   Buffers: shared hit=7
 Planning Time: 0.211 ms
 Execution Time: 0.722 ms
(9 rows)

Which is just the 7 buffers ...

Did I do something wrong?

> 
> Sure, it's difficult to imagine a very general model that captures
> this sort of risk, in the general case. But you don't need a degree in
> actuarial science to understand that it's inherently a bad idea to
> juggle chainsaws -- no matter what your general risk tolerance happens
> to be. Some things you just don't do.
> 

Oh come on. I've been juggling chainsaws (lit on fire!) since I was a
little boy! There's nothing risky about it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Use of additional index columns in rows filtering

Reply via email to