On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:
On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
...
>> As a side note here, I'm wondering if this (determining useful pathkeys)
>> can be made a bit smarter by looking both at query_pathkeys and pathkeys
>> useful for merging, similarly to what truncate_useless_pathkeys() does.
>> But that can be seen as an improvement of what we do now.
>
>Unless your comment below about looking at truncate_useless_pathkeys
>is implying you're considering aiming to get this in now, I wonder if
>we should just expand the comment to reference pathkeys useful for
>merging as a possible future extension.
>

Maybe. I've been thinking about how to generate those path keys, but
it's a bit tricky, because pathkeys_useful_for_merging() - the thing
that backs truncate_useless_pathkeys() actually gets pathkeys and merely
verifies if those are useful for merging. But this code needs to do the
opposite - generate those pathkeys.

But let's say we know we have a join on columns (a,b,c). For each of
those PathKey values we know it's useful for merging, but should we
generate pathkeys (a,b,c), (b,c,a), ... or any other permutation? I
suppose we can look at pathkeys for existing paths of the relation to
prune the possibilities a bit, but what then?

I'm not convinced it's worth it this time around. Both because of the
late hour in the CF etc., but also because it seems likely to become
pretty complex quickly, and also far more likely to raise performance
questions in planning if there's not a lot of work done to keep it
limited.


Agreed. There'll be time to add more optimizations in the future.

BTW I wonder if we actually need the ec_has_volatile check in
get_useful_pathkeys_for_relation. The comment says we can't but is it
really true? pathkeys_useful_for_ordering doesn't do any such checks, so
I'd bet this is merely an unnecessary copy-paste from postgres_fdw.
Opinions?

I hadn't really looked at that part in depth before, but after reading
it over, re-reading the definition of volatility, and running a few
basic queries, I agree.

For example: we already do allow volatile pathkeys in a simple query like:
-- t(a, b) with index on (a)
select * from t order by a, random();

It makes sense that you couldn't push down such a sort to a foreign
server, given there's no constraints said function isn't operating
directly on the primary database (in the fdw case). But that obviously
wouldn't apply here.


Thanks for confirming my reasoning.

>> >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation:
>> >* Considering query_pathkeys is always worth it, because it might let us
>> >* avoid a local sort.
>> >
>> >That originally was a copy from the fdw code, but since the two
>> >functions have diverged (Is that concerning? I could be confusing, but
>> >isn't a compilation problem) I didn't move the function.
>> >
>>
>> I think it's OK the two functions diverged, it's simply because the FDW
>> one needs to check other things too. But I might rework this once I look
>> closer at truncate_useless_pathkeys.
>
>Agreed, for now at least. It's tempting to think they should always be
>shared, but I'm not convinced (without a lot more digging) that this
>represents structural rather than incidental duplication.
>

The more I look at pathkeys_useful_for_ordering() the more I think the
get_useful_pathkeys_for_relation() function should look more like it
than the postgres_fdw one ...

If we go down that path (and indeed this is actually implied by
removing the volatile check too), doesn't that really just mean (at
least for now) that get_useful_pathkeys_for_relation goes away
entirely and we effectively use root->query_pathkeys directly?

Yes, basically.

The only thing your lose them is the check that each eclass has a
member in the rel. But that check probably wasn't quite right anyway:
at least for incremental sort (maybe not regular sort), I think we
probably don't necessarily care that the entire list has members in the
rel, but rather that some prefix does, right?

Probably, although I always forget how exactly this EC business works.
My reasoning is more "If pathkeys_useful_for_ordering does not need
that, why should we need it here?"

The idea there would be that we could create a gather merge here that
supplies a partial ordering (that prefix of query_pathkeys) and then
above it the planner might place another incremental sort (say above a
join), or perhaps even a join above that would actually be sufficient
(since many joins are capable of providing an ordering anyway).


Not sure. Isn't the idea that we do the Incremental Sort below the
Gather Merge, because then it's actually done in parallel?

I've attached (added prefix .txt to avoid the cfbot assuming this is a
full series) an idea in that direction to see if we're thinking along
the same route. You'll want to apply and view with `git diff -w`
probably.

The attached also adds a few XXX comments. In particular, I wonder if
we should verify that some prefix of the root->query_pathkeys is
actually made up of eclass members for the current rel, because
otherwise I think we can skip the loop on the subpaths entirely.


Hmmm, that's an interesting possible optimization. I wonder if that
actually saves anything, though, because the number of paths in the loop
is likely fairly low.

>> >I did notice though that find_em_expr_for_rel() is wholesale
>> >copied (and unchanged) from the fdw code, so I moved it to
>> >equivclass.c so both places can share it.
>> >
>>
>> +1
>>

... which would also get rid of find_em_expr_for_rel().

... which, I think, would retain the need for find_em_expr_for_rel().

Note: The attached applied to the previous series compiles and runs
make check...but I haven't really tested it; it's meant more for "is
this the direction we want to go".


Thanks, I'll take a look.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to