> On Sun, Jul 12, 2020 at 12:48:47PM +0000, Floris Van Nee wrote:
> >
> > Good point, thanks for looking at this. With the latest planner version 
> > there
> > are indeed more possibilities to use skipping. It never occured to me that
> > some of those paths will still rely on index scan returning full data set. 
> > I'll look
> > in details and add verification to prevent putting something like this on 
> > top of
> > skip scan in the next version.
>
> I believe the required changes are something like in attached patch. There 
> were a few things I've changed:
> - build_uniquekeys was constructing the list incorrectly. For a DISTINCT a,b, 
> it would create two unique keys, one with a and one with b. However, it 
> should be one unique key with (a,b).

Yes, I've also noticed that while preparing fix for index scan not
covered by index and included it.

> - the uniquekeys that is built, still contains some redundant keys, that are 
> normally eliminated from the path keys lists.

I guess you're talking about:

+ if (EC_MUST_BE_REDUNDANT(ec))
+   continue;

Can you add some test cases to your changes to show the effect of it? It
seem to me redundant keys are already eliminated at this point by either
make_pathkeys_for_uniquekeys or even earlier for distinct on, but could
be I've missed something.

Along the lines I'm also curious about this part:

-       ListCell   *k;
-       List *exprs = NIL;
-
-       foreach(k, ec->ec_members)
-       {
-               EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
-               exprs = lappend(exprs, mem->em_expr);
-       }
-
-       result = lappend(result, makeUniqueKey(exprs, false, false));
+       EquivalenceMember *mem = (EquivalenceMember*) 
lfirst(list_head(ec->ec_members));

I'm curious about this myself, maybe someone can clarify. It looks like
generaly speaking there could be more than one member (if not
ec_has_volatile), which "representing knowledge that multiple items are
effectively equal". Is this information is not interesting enough to
preserve it in unique keys?

> - the distinct_pathkeys may be NULL, even though there's a possibility for 
> skipping. But it wouldn't create the uniquekeys in this case. This makes the 
> planner not choose skip scans even though it could. For example in queries 
> that do SELECT DISTINCT ON (a) * FROM t1 WHERE a=1 ORDER BY a,b; Since a is 
> constant, it's eliminated from regular pathkeys.

What would be the point of skipping if it's a constant?

> - to combat the issues mentioned earlier, there's now a check in 
> build_index_paths that checks if the query_pathkeys matches the 
> useful_pathkeys. Note that we have to use the path keys here rather than any 
> of the unique keys. The unique keys are only Expr nodes - they do not contain 
> the necessary information about ordering. Due to elimination of some constant 
> path keys, we have to search the attributes of the index to find the correct 
> prefix to use in skipping.

IIUC here you mean this function, right?

+ prefix = find_index_prefix_for_pathkey(root,
+                                                                            
index,
+                                                                            
BackwardScanDirection,
+                                                                            
llast_node(PathKey,
+                                                                            
root->distinct_pathkeys));

Doesn't it duplicate the job already done in build_index_pathkeys by
building those pathkeys again? If yes, probably it's possible to reuse
useful_pathkeys. Not sure about unordered indexes, but looks like
query_pathkeys should also match in this case.

Will also look at the follow up questions in the next email.


Reply via email to