On Mon, Jun 24, 2024 at 2:28 PM Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Jun 24, 2024 at 1:47 PM Peter Geoghegan <p...@bowt.ie> wrote: > > I agree, with the proviso that "avoid gratuitous failures" should > > include cases where a query that got the optimization suddenly fails > > to get the optimization, due only to some very innocuous looking > > change. Such as a change from using a constant 1_000_000_000 to a > > constant 5_000_000_000 in the query text. That is a POLA violation. > > Nope, I don't agree with that at all. If you imagine that we can > either have the optimization apply to one of those cases on the other, > or on the other hand we can have some cases that outright fail, I > think it's entirely clear that the former is better.
I'm just saying that not having the optimization apply to a query very similar to one where it does apply is a POLA violation. That's another kind of failure, for all practical purposes. Weird performance cliffs like that are bad. It's very easy to imagine code that generates a query text, that at some point randomly and mysteriously gets a sequential scan. Or a much less efficient index scan. > I was assuming this patch shouldn't be changing the way indexes work > at all, just making use of the facilities that we have today. More > could be done, but that might make it harder to get anything > committed. I was just pointing out that there is currently no good way to make nbtree efficiently execute a qual "WHERE a = 5 OR a IS NULL", which is almost entirely (though not quite entirely) due to a lack of any way of expressing that idea through SQL, in a way that'll get pushed down to the index scan node. You can write "WHERE a = any('{5,NULL')", of course, but that doesn't treat NULL as just another array element to match against via an IS NULL qual (due to NULL semantics). Yeah, this is nonessential. But it's quite a nice optimization, and seems entirely doable within the framework of the patch. It would be a natural follow-up. All that I'd need on the nbtree side is to get an input scan key that directly embodies "WHERE mycol = 5 OR mycol IS NULL". That would probably just be a scan key with sk_flags "SK_SEARCHARRAY | SK_SEARCHNULL", that was otherwise identical to the current SK_SEARCHARRAY scan keys. Adopting the nbtree array index scan code to work with this would be straightforward. SK_SEARCHNULL scan keys basically already work like regular equality scan keys at execution time, so all that this optimization requires on the nbtree side is teaching _bt_advance_array_keys to treat NULL as a distinct array condition (evaluated as IS NULL, not as = NULL). > It's even possible, in my mind at least, that the patch is already > doing exactly the right things here. Even if it isn't, the problem > doesn't seem to be fundamental, because if this example can work (and > it does) then what the patch is trying to do should be workable, too. > We just have to make sure we're plugging all the pieces properly > together, and that we have comments adequately explain what is > happening and test cases that verify it. My feeling is that the patch > doesn't meet that standard today, but I think that just means it needs > some more work. I'm not arguing we have to throw the whole thing out, > or invent a lot of new infrastructure, or anything like that. I feel like my point about the potential for POLA violations is pretty much just common sense. I'm not particular about the exact mechanism by which we avoid it; only that we avoid it. -- Peter Geoghegan