Re: NOT IN subquery optimization

Tom Lane Tue, 24 Mar 2020 16:49:32 -0700

"Li, Zheng" <zhe...@amazon.com> writes:
>     * I find it entirely unacceptable to stick some planner temporary
>     fields into struct SubLink.  If you need that storage you'll have
>     to find some other place to put it.  But in point of fact I don't
>     think you need it; it doesn't look to me to be critical to generate
>     the subquery's plan any earlier than make_subplan would have done it.
>     Moreover, you should really strive to *not* do that, because it's
>     likely to get in the way of other future optimizations.  As the
>     existing comment in make_subplan already suggests, we might want to
>     delay subplan planning even further than that in future.


>   The reason for calling make_subplan this early is that we want to
> Call subplan_is_hashable(plan), to decide whether to proceed with the proposed
> transformation.

Well, you're going to have to find another way, because this one won't do.

If you really need to get whacked over the head with a counterexample for
this approach, consider what happens if some part of the planner decides
to pass the SubLink through copyObject, expression_tree_mutator, etc
in between where you've done the planning and where make_subplan looks
at it.  Since you haven't taught copyfuncs.c about these fields, they'll
semi-accidentally wind up as NULL afterwards, meaning you lost the
information anyway.  (In fact, I wouldn't be surprised if that's happening
already in some cases; you couldn't really tell, since make_subplan will
just repeat the work.)  On the other hand, you can't have copyfuncs.c
copying such fields either --- we don't have copyfuncs support for
PlannerInfo, and if we did, the case would end up as infinite recursion.
Nor would it be particularly cool to try to fake things out by copying the
pointers as scalars; that will lead to dangling pointers later on.

BTW, so far as I can see, the only reason you're bothering with the whole
thing is to compare the size of the subquery output with work_mem, because
that's all that subplan_is_hashable does.  I wonder whether that
consideration is even still necessary in the wake of 1f39bce02.  If it is,
I wonder whether there isn't a cheaper way to figure it out.  (Note
similar comment in make_subplan.)

Also ...

> We try to stick with the fast hashed subplan when possible to avoid
> potential performance degradation from the transformation which may
> restrict the planner to choose Nested Loop Anti Join in order to handle
> Null properly,

But can't you detect that case directly?  It seems like you'd need to
figure out the NULL situation anyway to know whether the transformation
to antijoin is valid in the first place.

                        regards, tom lane

Re: NOT IN subquery optimization

Reply via email to