On Thu, Mar 27, 2025 at 6:03 PM Alena Rybakina <a.rybak...@postgrespro.ru> wrote: > I replied an example like this:
This example shows costs that are dominated by heap access costs. Both the sequential scan and the bitmap heap scan must access 637 heap blocks. So I don't think that this is such a great example -- the heap accesses are irrelevant from the point of view of assessing how well we're modelling index scan related costs. > I think it would be useful to show information that we used an index scan but > at the same time we skipped the "region" column and I assume we should output > how many distinct values the "region" column had. > > For example it will look like this "Skip Scan on region (4 distinct values)": > What do you think? As I said on our call today, I think that we should keep the output for EXPLAIN ANALYZE simple. While I'm sympathetic to the idea that we should show more information about how quals can be applied in index scan node output, that seems like it should be largely independent work to me. Masahiro Ikeda wrote a patch that aimed to improve matters in this area some months back. I'm supportive of that (there is definitely value in signalling to users that the index might actually look quite different to how they imagine it looks, say by having an omitted-by-query prefix attribute). I don't exactly know what the most useful kind of information to show is with skip scan in place, since skip scan makes the general nature of quals (whether a given qual is what oracle calls "access predicates", or what oracle calls "filter predicates") is made squishy/dynamic by skip scan, in a way that is new. The relationship between the number of values that a skip array ever uses, and the number of primitive index scans is quite complicated. Sometimes it is actually as simple as your example query, but that's often not true. "Index Searches: N" can be affected by: * The use of SAOP arrays, which also influence primitive scan scheduling, in the same way as they have since Postgres 17 -- and can be mixed freely with skip arrays. * The availability of opclass skipsupport, which makes skip arrays generate their element values by addition/subtraction from the current array element, rather than using NEXT/PRIOR sentinel keys. The sentinel keys act as probes that get the next real (non-sentinel) value that we need to look up next. Whereas skip support can often successfully guess that (for example) the next value in the index after 268 is 269, saving a primitive scan each time (this might not happen at all, or it might work only some of the time, or it might work all of the time). * Various primitive index scan scheduling heuristics. Another concern here is that I don't want to invent a special kind of "index search" just for skip scan. We're going to show an "Index Searches: N" that's greater than 1 with SAOP array keys, too -- which don't use skip scan at all (nothing new about that, except for the fact that we report the number of searches directly from EXPLAIN ANALYZE in Postgres 18). There really is almost no difference between a scan with a skip array and a scan of the same index with a similar SAOP array (when each array "contains the same elements", and is used to scan the same index, in the same way). That's why the cost model is as similar as possible to the Postgres 17 costing of SAOP array scans -- it's really the same access method. Reusing the cost model makes sense because actual execution times are almost identical when we compare a skip array to a SAOP array in the way that I described. The only advantage that I see from putting something about "skip scan" in EXPLAIN ANALYZE is that it is more googleable that way. But it seems like "Index Searches: N" is almost as good, most of the time. In any case, the fact that we don't need a separate optimizer index path/executor node for this is something that I see as a key advantage, and something that I'd like EXPLAIN ANALYZE to preserve. The problem with advertising that an index scan node is a skip scan is: what if it just never skips? Never skipping like this isn't necessarily unexpected. And even if it is unexpected, it's not necessarily a problem. > I didn't see any regression tests. Maybe we should add some tests? To be > honest I didn't see it mentioned in the commit message but I might have > missed something. There are definitely new regression tests -- I specifically tried to keep the test coverage high, using gcov html reports (like the ones from coverage.postgresql.org). The test updates appear towards the end of the big patch file, though. Maybe you're just not used to seeing tests appear last like this? I use "git config diff.orderfile ... " to get this behavior. I find it useful to put the important changes (particularly header file changes) first, and less important changes (like tests) much later. Thanks for taking a look at my patch! -- Peter Geoghegan