On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > When you say "index page accesses" do you mean accesses to index pages, > or accesses to heap pages from the index scan?
Yes, that's exactly what I mean. Note that that's the dominant cost for the original BitmapOr plan. As I said upthread, the original BitmapOr plan has 7 buffer hits. The breakdown is 1 single heap page access, 3 root page accesses, and 3 leaf page accesses. There is only 1 heap page access because only 1 out of the 3 index scans that feed into the BitmapOr actually end up finding any matching rows in the index. In short, the dominant cost here is index page accesses. It's a particularly good case for my SAOP patch! > Because my patch is all about reducing the heap pages, which are usually > the expensive part of the index scan. But you're right the "index scan" > with index filter may access more index pages, because it has fewer > "access predicates". The fact is that your patch correctly picks the cheapest plan, which is kinda like a risky version of the plan that my SAOP patch would pick -- it is cheaper for the very same reason. I understand that that's not your intention at all, but this is clearly what happened. That's what I meant by "weird second order effects". To me, it really does kinda look like your patch accidentally discovered a plan that's fairly similar to the plan that my SAOP patch would have found by design! Perhaps I should have been clearer on this point earlier. (If you're only now seeing this for yourself for the first time, then...oops. No wonder you were confused about which patch it was I was going on about!) > I don't quite see that with the tenk1 query we've been discussing (the > extra buffers were due to non-allvisible heap pages), but I guess that's > possible. The extra buffer hits occur because I made them occur by inserting new tuples where thousand = 42. Obviously, I did it that way because I had a point that I wanted to make. Obviously, there wouldn't have been any notable regression from your patch at all if I had (say) inserted tuples where thousand = 43 instead. (Not for the original "42" query, at least.) That's part of the problem, as I see it. Local changes like that can have outsized impact on individual queries, even though there is no inherent reason to expect it. How can statistics reliably guide the planner here? Statistics are supposed to be a summary of the whole attribute, that allow us to make various generalizations during planning. But this plan leaves us sensitive to relatively small changes in one particular "thousand" grouping, with potentially outsized impact. And, this can happen very suddenly, because it's so "local". Making this plan perform robustly just doesn't seem to be one of the things that statistics can be expected to help us with very much. -- Peter Geoghegan