On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc...@gmail.com> wrote:

On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>
> On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
> >
> > ...
> >
> >I've been able to confirm that the problem goes away if we stop adding
> >the gather merge paths in generate_useful_gather_paths().
> >
> >I'm not sure yet what conclusion that leads us to. It seems to me that
> >the biggest clue remains that all of this works correctly unless one
> >of the selected columns (which happens to be a pathkey at this point
> >because it's a DISTINCT query) contains a volatile expression.
> >
>
> Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
> which is calling find_em_expr_for_rel and is happy with anything it
> returns. But this was copied from postgres_fdw, which however does a bit
> more here:
>
>      if (pathkey_ec->ec_has_volatile ||
>          !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
>          !is_foreign_expr(root, rel, em_expr))
>
> So not only does it explicitly check volatility of the pathkey, it also
> calls is_foreign_expr which checks the expression for mutable functions.
>
> The attached patch seems to fix this, but it only adds the check for
> mutable functions. Maybe it should check ec_has_volatile too ...

We actually discussed the volatility check in that function back on
the original thread [1], and we'd concluded that was specifically
necessary for the fdw code because the function would execute on two
different servers (and thus produce different results), but that in a
local server only scenario it should be fine.

My understanding (correct me if I'm wrong) is that the volatile
function should only be executed once (at the scan level?) to build
the tuple and from then on the datum isn't going to change, so I'm not
sure why the volatility would matter here.

James

[1] https://www.postgresql.org/message-id/20200328025830.6v6ogkseohakc32q%40development

Oh, hmm, could what I said all be true, but there still be some rule
that you shouldn't compare datums generated from volatile expressions
in different backends (i.e., parallel query)?


I'm not sure it's all that related to parallel query - it's more likely
that when constructing the paths below the gather merge, this new code
fails to do something important.

I'm not 100% sure how the grouping and volatile functions work, so let
me think aloud here ...

The backtrace looks like this:

    #0  get_sortgroupref_tle
    #1  0x0000000000808ab9 in prepare_sort_from_pathkeys
    #2  0x000000000080926c in make_sort_from_pathkeys
    #3  0x0000000000801032 in create_sort_plan
    #4  0x00000000007fe7e0 in create_plan_recurse
    #5  0x0000000000800b2c in create_gather_merge_plan
    #6  0x00000000007fe94d in create_plan_recurse
    #7  0x0000000000805328 in create_nestloop_plan
    #8  0x00000000007ff3c5 in create_join_plan
    #9  0x00000000007fe5f8 in create_plan_recurse
    #10 0x0000000000800d68 in create_projection_plan
    #11 0x00000000007fe662 in create_plan_recurse
    #12 0x0000000000801252 in create_upper_unique_plan
    #13 0x00000000007fe760 in create_plan_recurse
    #14 0x00000000007fe4f2 in create_plan
    #15 0x000000000081082f in standard_planner

and the create_sort_plan works with lefttree that is IndexScan, so the
query we're constructing looks like this:

   Distinct
    -> Nestloop
        -> Gather Merge
            -> Sort
                -> Index Scan

and it's the sort that expects to find the expression in the Index Scan
target list. Which seems rather bogus, because clearly the index scan
does not include the expression. (I wonder if it's somehow related that
indexes can't be built on volatile expressions ...)

Anyway, the index scan clearly does not include the expression the sort
references, hence the failure. And the index scan can't compute it,
because we probably need to compute it on top of the join I think
(otherwise we might get duplicate values for volatile functions etc.)
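
To illustrate the lookup that fails, here is a toy model in plain C. It
is not planner code: ToyTargetEntry, toy_find_by_sortgroupref and the
sample target list are names made up for this sketch, and the
sortgroupref numbers are arbitrary. It only shows the shape of the
problem: the sort asks its child for an entry the child never produced.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a target list entry; NOT the real TargetEntry. */
typedef struct ToyTargetEntry
{
    const char *expr;           /* textual form, for readability only */
    unsigned    sortgroupref;   /* 0 = not referenced by any sort/group */
} ToyTargetEntry;

/*
 * Return the entry carrying the given sortgroupref, or NULL if the
 * target list has no such entry.
 */
const ToyTargetEntry *
toy_find_by_sortgroupref(const ToyTargetEntry *tlist, size_t n, unsigned ref)
{
    assert(tlist != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (tlist[i].sortgroupref == ref)
            return &tlist[i];
    }
    return NULL;
}

/*
 * Hypothetical target list of the Index Scan: it emits the plain
 * column (ref 1) but not the volatile expression (ref 2), which would
 * only be computed higher up in the plan.
 */
#define TOY_INDEX_SCAN_TLIST_LEN 1
const ToyTargetEntry toy_index_scan_tlist[TOY_INDEX_SCAN_TLIST_LEN] = {
    {"t.a", 1},
};
```

Looking up ref 1 in toy_index_scan_tlist finds the column, while
looking up ref 2 returns NULL, which corresponds to the sort not finding
its expression in the Index Scan target list.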


Looking at this from a slightly different angle, the root cause here
seems to be that generate_useful_gather_paths uses the pathkeys it gets
from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
But all other create_gather_merge_path calls use root->sort_pathkeys, so
maybe this is the actual problem and get_useful_pathkeys_for_relation
should use root->sort_pathkeys instead. That does fix the issue for me
too (and it passes all regression tests).
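
For comparison, the other option discussed earlier in this thread
(rejecting volatile pathkeys, as postgres_fdw does) can be pictured
with another toy model. Again, this is only a sketch under assumptions:
ToyPathKey, toy_useful_pathkey_prefix and the sample keys are invented
here, the two flags merely stand in for ec_has_volatile and for
"find_em_expr_for_rel returned something", and stopping at the first
rejected key is my simplification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pathkey; NOT the real PathKey/EquivalenceClass. */
typedef struct ToyPathKey
{
    const char *expr;               /* textual form, for readability */
    bool        has_volatile;       /* models ec_has_volatile */
    bool        computable_on_rel;  /* models a non-NULL em_expr */
} ToyPathKey;

/*
 * Count the leading pathkeys the rel could be sorted by below the
 * Gather Merge.  Stops at the first rejected key, because only a
 * prefix of the requested ordering is useful.
 */
size_t
toy_useful_pathkey_prefix(const ToyPathKey *keys, size_t nkeys,
                          bool check_volatile)
{
    size_t n;

    for (n = 0; n < nkeys; n++)
    {
        if (!keys[n].computable_on_rel)
            break;
        if (check_volatile && keys[n].has_volatile)
            break;
    }
    return n;
}

/* Hypothetical pathkeys for a DISTINCT over a column and a volatile expr. */
#define TOY_DISTINCT_NKEYS 2
const ToyPathKey toy_distinct_keys[TOY_DISTINCT_NKEYS] = {
    {"t.a", false, true},
    {"t.b + random()", true, true},
};
```

Without the volatility check both keys are accepted, so the sort below
the Gather Merge would reference the volatile expression. With the
check only the first key survives.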


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

