On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> Hi: > > I have finished the parts for baserel, joinrel, subquery, distinctrel. I > think > the hardest ones have been verified. Here is the design overview. > > 1. Use EC instead of expr to cover more UniqueKey cases. > 2. Redesign the UniqueKey as below: > > @@ -246,6 +246,7 @@ struct PlannerInfo > > List *eq_classes; /* list of active EquivalenceClasses */ > + List *unique_exprs; /* List of unique expr */ > > bool ec_merging_done; /* set true once ECs are canonical */ > > +typedef struct UniqueKey > +{ > + NodeTag type; > + Bitmapset *unique_expr_indexes; > + bool multi_nulls; > +} UniqueKey; > + > > PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set > of > EquivalenceClass. for example: > > CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary > key); > CREATE UNIQUE INDEX ON t1(a, b); > > SELECT DISTINCT * FROM T1 WHERE a = c; > > Then we would have PlannerInfo.unique_exprs as below > [ > [EC(a, c), EC(b)], > [EC(pk)] > ] > > RelOptInfo(t1) would have 2 UniqueKeys. > UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false] > UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false] > > The design will benefit many table joins cases. For instance a 10- tables > join, > each table has a primary key (a, b). Then we would have a UniqueKey like > this. > > JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b} > JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a > t5.b} > > When more tables are joined, the list would be longer and longer, build > the list > consumes both CPU cycles and memory. > > With the method as above, we can store it as: > > root->unique_exprs = /* All the UniqueKey is stored once */ > [ > [t1.a, t1.b], -- EC is ignored in document. > [t2.a, t2.b], > [t3.a, t3.b], > [t4.a, t4.b], > [t5.a, t5.b], > [t6.a, t6.b], > [t7.a, t7.b], > [t8.a, t8.b], > [t9.a, t9.b], > [t10.a, t10.b], > ] > > JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword. > JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword. > > The member in the bitmap is the index of root->unique_exprs rather than > root->eq_classes because we need to store the SingleRow case in > root->unqiue_exprs as well. > > 3. Define a new SingleRow node and use it in joinrel as well. > > +typedef struct SingleRow > +{ > + NodeTag type; > + Index relid; > +} SingleRow; > > SELECT * FROM t1, t2 WHERE t2.pk = 1; > > root->unique_exprs > [ > [t1.a, t1.b], > SingleRow{relid=2} > ] > > JoinRel{t1} - Bitmapset{0} > JoinRel{t2} - Bitmapset{1} > > JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to > dedicated > exprs. > > 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as > possible. > > The overall idea is similar with PathKey, I distinguish the UniqueKey for > distinct purpose and useful_for_merging purpose. > > SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like > root->query_pathkey. > > SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK > is > use before t1 join t2, but not useful after it. > > Currently the known issue I didn't pass the "interesting UniqueKey" info to > subquery well [2], I'd like to talk more about this when we discuss about > UnqiueKey on subquery part. > > > 5. relation_is_distinct_for > > Now I design the function as > > + bool > + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List > *distinct_pathkey) > > It is "List *distinct_pathkey", rather than "List *exprs". The reason > pathkey > has EC in it, and all the UniqueKey has EC as well. if so, the subset-like > checking is very effective. As for the distinct/group as no-op case, we > have > pathkey all the time. The only drawback of it is some clauses are > not-sortable, > in this case, the root->distinct_pathkey and root->group_pathkey is not > set. However it should be rare by practice, so I ignore this part for > now. After all, I can have a relation_is_disticnt_for version for Exprs. I > just > not implemented it so far. > > 6. EC overhead in UnqiueKey & UNION case. > > Until now I didn't create any new EC for the UniqueKey case, I just used > the > existing ones. However I can't handle the case like > > SELECT a, b FROM t1 > UNION > SELECT x, y FROM t2; > > In this case, there is no EC created with existing code. and I don't want > to > create them for the UniqueKey case as well. so my plan is just not to > handle > the case for UNION. > > Since we need some big effort from the reviewer, I split the patch into > many > smaller chunks. > > Patch 1 / Patch 2: I just split some code which UniqueKey uses but not > very > interrelated. Splitting them out to reduce the core patch size. > Patch 3: still the notnull stuff. This one doesn't play a big role > overall, > even if the design is changed at last, we can just modify some small > stuff. IMO, > I don't think it is a blocker issue to move on. > Patch 4: Support the UniqueKey for baserel. > Patch 5: Support the UniqueKey for joinrel. > Patch 6: Support the UniqueKey for some upper relation, like distinctrel, > groupby rel. > > I'd suggest moving on like this: > 1. Have an overall review to see if any outstanding issues the patch has. > 2. If not, we can review and commit patch 1 & patch 2 to reduce the patch > size. > 3. Decide which method to take for not null stuff. IMO Tom's method > would be a bit > complex and the benefit is not very clear to me[1]. So the choices > include: a). move on UniqueKey stuff until Tom's method is ready. b). > Move > on the UniqueKey with my notnull way, and changes to Tom's method when > necessary. I prefer method b). > 4. Review & Commit the UniqueKey for BaseRel part. > 5. Review & Commit the UniqueKey for JoinRel part. > 6. Review & Commit the UniqueKey for SubQuery part *without* the > Interesting > UniqueKey well handled. > 7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting > UniqueKey well handled. > 8. Discuss about the UniqueKey on partitioned rel case and > develop/review/commit > it. > 9. Apply UniqueKey stuff on more user cases rather than DISTINCT. > > What do you think about this? > > [1] > https://www.postgresql.org/message-id/CAApHDvo5b2pYX%2BdbPy%2BysGBSarezRSfXthX32TZNFm0wPdfKGQ%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAKU4AWo6-%3D9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN%3D%2Bhb_w%40mail.gmail.com > > > Thanks > Hi, For v3-0005-Support-UniqueKey-on-JoinRel.patch : +static void populate_joinrel_composited_uniquekey(PlannerInfo *root, populate_joinrel_composited_uniquekey -> populate_joinrel_composite_uniquekey (without the trailing d for composite) For populate_joinrel_uniquekeys(): + foreach(lc, outerrel->uniquekeys) + { ... + return; Should the remaining unique keys be considered ? For populate_joinrel_uniquekey_for_rel(): + else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != NULL) + { + other_ecs = lappend(other_ecs, r->right_ec); + other_relids = bms_add_members(other_relids, r->left_relids); It seems the append to other_ecs is the same as the one for the `bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is this correct ? Cheers