Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Zhihong Yu Sun, 15 Aug 2021 09:36:03 -0700

On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <zhihui.fan1...@gmail.com> wrote:


> Hi:
>
> I have finished the parts for baserel, joinrel, subquery, distinctrel. I
> think
> the hardest ones have been verified.  Here is the design overview.
>
> 1. Use EC instead of expr to cover more UniqueKey cases.
> 2. Redesign the UniqueKey as below:
>
> @@ -246,6 +246,7 @@ struct PlannerInfo
>
> List   *eq_classes; /* list of active EquivalenceClasses */
> + List   *unique_exprs; /* List of unique expr */
>
>   bool ec_merging_done; /* set true once ECs are canonical */
>
> +typedef struct UniqueKey
> +{
> + NodeTag type;
> + Bitmapset *unique_expr_indexes;
> + bool multi_nulls;
> +} UniqueKey;
> +
>
> PlannerInfo.unique_exprs is a List of unique exprs.  Unique Exprs is a set
> of
> EquivalenceClass. for example:
>
> CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT,  pk INT primary
> key);
> CREATE UNIQUE INDEX ON t1(a, b);
>
> SELECT DISTINCT * FROM T1 WHERE a = c;
>
> Then we would have PlannerInfo.unique_exprs as below
> [
> [EC(a, c), EC(b)],
> [EC(pk)]
> ]
>
> RelOptInfo(t1) would have 2 UniqueKeys.
> UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
> UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]
>
> The design will benefit many table joins cases. For instance a 10- tables
> join,
> each table has a primary key (a, b).  Then we would have a UniqueKey like
> this.
>
> JoinRel{1,2,3,4} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
> JoinRel{1,2,3,4,5} -  {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a
> t5.b}
>
> When more tables are joined, the list would be longer and longer, build
> the list
> consumes both CPU cycles and memory.
>
> With the method as above, we can store it as:
>
> root->unique_exprs =  /* All the UniqueKey is stored once */
> [
> [t1.a, t1.b], -- EC is ignored in document.
> [t2.a, t2.b],
> [t3.a, t3.b],
> [t4.a, t4.b],
> [t5.a, t5.b],
> [t6.a, t6.b],
> [t7.a, t7.b],
> [t8.a, t8.b],
> [t9.a, t9.b],
> [t10.a, t10.b],
> ]
>
> JoinRel{1,2,3,4} -  Bitmapset{0,1,2,3} -- one bitmapword.
> JoinRel{1,2,3,4,5} -  Bitmapset{0,1,2,3,4} -- one bitmapword.
>
> The member in the bitmap is the index of root->unique_exprs rather than
> root->eq_classes because we need to store the SingleRow case in
> root->unqiue_exprs as well.
>
> 3. Define a new SingleRow node and use it in joinrel as well.
>
> +typedef struct SingleRow
> +{
> + NodeTag type;
> + Index relid;
> +} SingleRow;
>
> SELECT * FROM t1, t2 WHERE t2.pk = 1;
>
> root->unique_exprs
> [
> [t1.a, t1.b],
> SingleRow{relid=2}
> ]
>
> JoinRel{t1} - Bitmapset{0}
> JoinRel{t2} - Bitmapset{1}
>
> JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to
> dedicated
> exprs.
>
> 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as
> possible.
>
> The overall idea is similar with PathKey, I distinguish the UniqueKey for
> distinct purpose and useful_for_merging purpose.
>
> SELECT DISTINCT pk FROM  t; -- OK, maintain it all the time, just like
> root->query_pathkey.
>
> SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK
> is
> use before t1 join t2, but not useful after it.
>
> Currently the known issue I didn't pass the "interesting UniqueKey" info to
> subquery well [2], I'd like to talk more about this when we discuss about
> UnqiueKey on subquery part.
>
>
> 5. relation_is_distinct_for
>
> Now I design the function as
>
> + bool
> + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
>   *distinct_pathkey)
>
> It is "List *distinct_pathkey", rather than "List *exprs". The reason
> pathkey
> has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
> checking is very effective.  As for the distinct/group as no-op case, we
> have
> pathkey all the time. The only drawback of it is some clauses are
> not-sortable,
> in this case, the root->distinct_pathkey and root->group_pathkey is not
> set. However it should be rare by practice, so I ignore this part for
> now. After all, I can have a relation_is_disticnt_for version for Exprs. I
> just
> not implemented it so far.
>
> 6. EC overhead in UnqiueKey & UNION case.
>
> Until now I didn't create any new EC for the UniqueKey case, I just used
> the
> existing ones. However I can't handle the case like
>
> SELECT a, b FROM t1
> UNION
> SELECT x, y FROM t2;
>
> In this case, there is no EC created with existing code. and I don't want
> to
> create them for the UniqueKey case as well.  so my plan is just not to
> handle
> the case for UNION.
>
> Since we need some big effort from the reviewer, I split the patch into
> many
> smaller chunks.
>
> Patch 1 / Patch 2:  I just split some code which UniqueKey uses but not
> very
> interrelated. Splitting them out to reduce the core patch size.
> Patch 3:  still the notnull stuff.  This one doesn't play a big role
> overall,
> even if the design is changed at last, we can just modify some small
> stuff. IMO,
> I don't think it is a blocker issue to move on.
> Patch 4:  Support the UniqueKey for baserel.
> Patch 5: Support the UniqueKey for joinrel.
> Patch 6: Support the UniqueKey for some upper relation, like distinctrel,
> groupby rel.
>
> I'd suggest moving on like this:
> 1. Have an overall review to see if any outstanding issues the patch has.
> 2. If not, we can review and commit patch 1 & patch 2 to reduce the patch
> size.
> 3. Decide which method to take for not null stuff. IMO Tom's method
> would be a bit
>    complex and the benefit is not very clear to me[1].  So the choices
>    include: a). move on UniqueKey stuff until Tom's method is ready. b).
> Move
>    on the UniqueKey with my notnull way, and changes to Tom's method when
>    necessary. I prefer method b).
> 4. Review & Commit the UniqueKey for BaseRel part.
> 5. Review & Commit the UniqueKey for JoinRel part.
> 6. Review & Commit the UniqueKey for SubQuery part *without* the
> Interesting
>    UniqueKey well handled.
> 7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting
>    UniqueKey well handled.
> 8. Discuss about the UniqueKey on partitioned rel case and
> develop/review/commit
>    it.
> 9. Apply UniqueKey stuff on more user cases rather than DISTINCT.
>
> What do you think about this?
>
> [1]
> https://www.postgresql.org/message-id/CAApHDvo5b2pYX%2BdbPy%2BysGBSarezRSfXthX32TZNFm0wPdfKGQ%40mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/CAKU4AWo6-%3D9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN%3D%2Bhb_w%40mail.gmail.com
>
>
> Thanks
>
Hi,
For v3-0005-Support-UniqueKey-on-JoinRel.patch :

+static void populate_joinrel_composited_uniquekey(PlannerInfo *root,

populate_joinrel_composited_uniquekey
-> populate_joinrel_composite_uniquekey (without the trailing d for
composite)

For populate_joinrel_uniquekeys():

+       foreach(lc, outerrel->uniquekeys)
+       {
...
+           return;

Should the remaining unique keys be considered ?

For populate_joinrel_uniquekey_for_rel():

+       else if (bms_equal(r->right_relids, rel->relids) && r->left_ec !=
NULL)
+       {
+           other_ecs = lappend(other_ecs, r->right_ec);
+           other_relids = bms_add_members(other_relids, r->left_relids);

It seems the append to other_ecs is the same as the one for the
`bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is
this correct ?

Cheers

Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

Reply via email to