Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

David Rowley Fri, 07 May 2021 17:39:06 -0700

On Sat, 8 May 2021 at 09:15, James Coleman <jtc...@gmail.com> wrote:
>
> On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowle...@gmail.com> wrote:
> > I'm a bit undecided if it's safe to set the opfuncid to the negator
> > function.  If anything were to set that again based on the opno then
> > it would likely set it to the wrong thing. We can't go changing the
> > opno either because EXPLAIN would display the wrong thing.
>
> I don't personally see a reason why this is a problem. But I also
> don't know that I have enough knowledge of the codebase to say that
> definitively.


The reason for my concern is that if the opfuncid is set to
InvalidOid, set_sa_opfuncid() always sets the ScalarArrayOpExpr's
opfuncid to get_opcode(opexpr->opno) . I'm effectively setting the
opfuncid to get_opcode(get_negator(opexpr->opno)), if anything were to
reset the ScalarArrayOpExpr's opfuncid to InvalidOid, then
set_sa_opfuncid() would repopulate it with the wrong value.

Maybe the solution there is to teach set_sa_opfuncid() about our
hashing NOT IN trick and have it check if (!opexpr->useOr &&
OidIsValid(opexpr->hashfuncid)) and if that's true then do
opexpr->opfuncid = get_opcode(get_negator(opexpr->opno)).  Then we
could just not bothing setting opfuncid in
convert_saop_to_hashed_saop_walker().


> > Anyway, I've attached what I ended up with after spending a few hours
> > looking at this.
>
> Overall I like this approach.
>
> One thing I think we could clean up:
>
> + bool            useOr;  /* use OR or AND semantics? */
> ...
> + /* useOr == true means an IN clause, useOr == false is NOT IN */
>
> I'm wondering if the intersection of these two lines implies that
> useOr isn't quite the right name here. Perhaps something like
> "negated"?

I'm not sure I want to go changing that.  The whole IN() / NOT IN()
behaviour regarding NULLs all seems pretty weird until you mentally
replace a IN (1,2,3) with a = 1 OR a = 2 OR a = 3.  And for the a NOT
IN(1,2,3) case, a <> 1 AND a <> 2 AND a <> 3.  People can make a bit
more sense of the weirdness of NULLs with NOT IN when they mentally
convert their expression like that.  I think having that in code is
useful too. Any optimisations that are added must match those
semantics.

> The other thing I got to thinking about was = ALL. It doesn't get
> turned into a hash op because the negator of = isn't hashable. I think
> it's correct that that's the determining factor, because I can't
> imagine what it would mean to hash <>. But...I wanted to confirm I
> wasn't missing something. We don't have explicit tests for that case,
> but I'm not sure it's necessary either.

It's important to think of other cases, I just don't think there's any
need to do anything for that one.  Remember that we have the
restriction of requiring a set of Consts, so for that case to be met,
someone would have to write something like: col =
ALL('{1,1,1,1,1,1,1,1}'::int[]);  I think if anyone comes along
complaining that a query containing that is not as fast as they'd like
then we might tell them that they should just use: col = 1. A sanity
checkup might not go amiss either.

David

Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

Reply via email to