On Sat, 8 May 2021 at 09:15, James Coleman <jtc...@gmail.com> wrote: > > On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowle...@gmail.com> wrote: > > I'm a bit undecided if it's safe to set the opfuncid to the negator > > function. If anything were to set that again based on the opno then > > it would likely set it to the wrong thing. We can't go changing the > > opno either because EXPLAIN would display the wrong thing. > > I don't personally see a reason why this is a problem. But I also > don't know that I have enough knowledge of the codebase to say that > definitively.
The reason for my concern is that if the opfuncid is set to InvalidOid, set_sa_opfuncid() always sets the ScalarArrayOpExpr's opfuncid to get_opcode(opexpr->opno) . I'm effectively setting the opfuncid to get_opcode(get_negator(opexpr->opno)), if anything were to reset the ScalarArrayOpExpr's opfuncid to InvalidOid, then set_sa_opfuncid() would repopulate it with the wrong value. Maybe the solution there is to teach set_sa_opfuncid() about our hashing NOT IN trick and have it check if (!opexpr->useOr && OidIsValid(opexpr->hashfuncid)) and if that's true then do opexpr->opfuncid = get_opcode(get_negator(opexpr->opno)). Then we could just not bothing setting opfuncid in convert_saop_to_hashed_saop_walker(). > > Anyway, I've attached what I ended up with after spending a few hours > > looking at this. > > Overall I like this approach. > > One thing I think we could clean up: > > + bool useOr; /* use OR or AND semantics? */ > ... > + /* useOr == true means an IN clause, useOr == false is NOT IN */ > > I'm wondering if the intersection of these two lines implies that > useOr isn't quite the right name here. Perhaps something like > "negated"? I'm not sure I want to go changing that. The whole IN() / NOT IN() behaviour regarding NULLs all seems pretty weird until you mentally replace a IN (1,2,3) with a = 1 OR a = 2 OR a = 3. And for the a NOT IN(1,2,3) case, a <> 1 AND a <> 2 AND a <> 3. People can make a bit more sense of the weirdness of NULLs with NOT IN when they mentally convert their expression like that. I think having that in code is useful too. Any optimisations that are added must match those semantics. > The other thing I got to thinking about was = ALL. It doesn't get > turned into a hash op because the negator of = isn't hashable. I think > it's correct that that's the determining factor, because I can't > imagine what it would mean to hash <>. But...I wanted to confirm I > wasn't missing something. We don't have explicit tests for that case, > but I'm not sure it's necessary either. It's important to think of other cases, I just don't think there's any need to do anything for that one. Remember that we have the restriction of requiring a set of Consts, so for that case to be met, someone would have to write something like: col = ALL('{1,1,1,1,1,1,1,1}'::int[]); I think if anyone comes along complaining that a query containing that is not as fast as they'd like then we might tell them that they should just use: col = 1. A sanity checkup might not go amiss either. David