On Thu, 11 Jul 2019 at 02:48, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Tue, Jul 2, 2019 at 2:27 PM David Rowley <david.row...@2ndquadrant.com> > > wrote: > > > > The more I think about these UniqueKeys, the more I think they need to > > be a separate concept to PathKeys. For example, UniqueKeys: { x, y } > > should be equivalent to { y, x }, but with PathKeys, that's not the > > case, since the order of each key matters. UniqueKeys equivalent > > version of pathkeys_contained_in() would not care about the order of > > individual keys, it would say things like, { a, b, c } is contained in > > { b, a }, since if the path is unique on columns { b, a } then it must > > also be unique on { a, b, c }. > > > On Tue, Jul 9, 2019 at 3:32 PM Jesper Pedersen <jesper.peder...@redhat.com> > > wrote: > > > > David, are you thinking about something like the attached ? > > > > Some questions. > > > > * Do you see UniqueKey as a "complete" planner node ? > > - I didn't update the nodes/*.c files for this yet > > > > * Is a UniqueKey with a list of EquivalenceClass best, or a list of > > UniqueKey with a single EquivalenceClass > > Just for me to clarify, the idea is to replace PathKeys with a new concept of > "UniqueKey" for skip scans, right? If I see it correctly, of course > > UniqueKeys { x, y } == UniqueKeys { y, x } > > from the result point of view, but the execution costs could be different due > to different values distribution. In fact there are efforts to utilize this to > produce more optimal order [1], but with UniqueKeys concept this information > is > lost. Obviously it's not something that could be immediately (or even never) a > problem for skip scan functionality, but I guess it's still worth to point it > out.
The UniqueKeys idea is quite separate from pathkeys. Currently, a Path can have a List of PathKeys which define the order that the tuples will be read from the Plan node that's created from that Path. The idea with UniqueKeys is that a Path can also have a non-empty List of UniqueKeys to define that there will be no more than 1 row with the same value for the Paths UniqueKey column/exprs. As of now, if you look at standard_qp_callback() sets root->query_pathkeys, the idea here would be that the callback would also set a new List field named "query_uniquekeys" based on the group_pathkeys when non-empty and !root->query->hasAggs, or by using the distinct clause if it's non-empty. Then in build_index_paths() around the call to match_pathkeys_to_index() we'll probably also want to check if the index can support UniqueKeys that would suit the query_uniquekeys that were set during standard_qp_callback(). As for setting query_uniquekeys in standard_qp_callback(), this should be simply a matter of looping over either group_pathkeys or distinct_pathkeys and grabbing the pk_eclass from each key and making a canonical UniqueKey from that. To have these canonical you'll need to have a new field in PlannerInfo named canon_uniquekeys which will do for UniqueKeys what canon_pathkeys does for PathKeys. So you'll need an equivalent of make_canonical_pathkey() in uniquekey.c With this implementation, the code that the patch adds in create_distinct_paths() can mostly disappear. You'd only need to look for a path that uniquekeys_contained_in() matches the root->query_uniquekeys and then just leave it to set_cheapest(distinct_rel); to pick the cheapest path. It would be wasted effort to create paths with UniqueKeys if there's multiple non-dead base rels in the query as the final rel in create_distinct_paths would be a join rel, so it might be worth checking that before creating such index paths in build_index_paths(). However, down the line, we could carry the uniquekeys forward into the join paths if the join does not duplicate rows from the other side of the join... That's future stuff though, not for this patch, I don't think. I think a UniqueKey can just be a struct similar to PathKey, e.g be located in pathnodes.h around where PathKey is defined. Likely we'll need a uniquekeys.c file that has the equivalent to pathkeys_contained_in() ... uniquekeys_contained_in(), which return true if arg1 is a superset of arg2 rather than check for one set being a prefix of another. As you mention above: UniqueKeys { x, y } == UniqueKeys { y, x }. That superset check could perhaps be optimized by sorting UniqueKey lists in memory address order, that'll save having a nested loop, but likely that's not going to be required for a first cut version. This would work since you'd want UniqueKeys to be canonical the same as PathKeys are (Notice that compare_pathkeys() only checks memory addresses of pathkeys and not equals()). I think the UniqueKey struct would only need to contain an EquivalenceClass field. I think all the other stuff that's in PathKey is irrelevant to UniqueKey. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services