Antonin Houska <a...@cybertec.at> wrote:

> Andy Fan <zhihuifan1...@163.com> wrote:
> > >
> > > * Combining the UKs
> > >
> > >   IMO this is the most problematic part of the patch. You call
> > >   populate_joinrel_uniquekeys() for the same join multiple times,
> > 
> > Why do you think so? The below code is called in "make_join_rel"
> 
> Consider join of tables "a", "b" and "c". My understanding is that
> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
> produce the same set of unique keys.
> 
> I need to check this part more in detail.

I think I understand now. By calling populate_joinrel_uniquekeys() for various
orderings, you can find out that various input relation unique keys can
represent the whole join. For example, if the ordering is

A JOIN (B JOIN C)

you can prove that the unique keys of A can be used for the whole join, while
for the ordering

B JOIN (A JOIN C)

you can prove the same for the unique keys of B, and so on.

> > Is your original question is about populate_joinrel_uniquekey_for_rel
> > rather than populate_joinrel_uniquekeys? We have the below codes:
> > 
> >     outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, 
> > outerrel,
> >                                                                             
> >                                          innerrel, restrictlist);
> >     inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, 
> > innerrel,
> >                                                                             
> >                                          outerrel, restrictlist);
> > 
> > This is mainly because of the following theory. Quoted from
> > README.uniquekey. Let's called this as "rule 1".
> > 
> > """
> > How to maintain the uniquekey?
> > -------------------------------
> > .. From the join relation, it is maintained with two rules:
> > 
> > - the uniquekey in one side is still unique if it can't be duplicated
> >   after the join. for example:
> > 
> >   SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk;
> >   UniqueKey on t1:  t1.pk
> >   UniqueKey on t1 Join t2:  t1.pk
> > """
> > 
> > AND the blow codes:
> > 
> > 
> >     if (outeruk_still_valid || inneruk_still_valid)
> > 
> >             /*
> >              * the uniquekey on outers or inners have been added into 
> > joinrel so
> >              * the combined uniuqekey from both sides is not needed.
> >              */
> >             return;
> > 
> > 
> > We don't create the component uniquekey if any one side of the boths
> > sides is unique already. For example:
> > 
> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
> > unique", there is no need to create component UniqueKey (t1.id, t2.id);  
> 
> ok, I need to check more in detail how this part works.

This optimization makes sense to me.

> > >
> > >   Of course one problem is that the number of combinations can grow
> > >   exponentially as new relations are joined.
> > 
> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
> > UniqueKey.README is introduced. 

I think there should yet be some guarantee that the number of unique keys does
not grow exponentially. Perhaps a constant that allows a relation (base or
join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
or 4.) And when picking the "best N keys", one criterion could be the number
of expressions in the key (the shorter key the better).

> > >
> > >   2) Check if the join relation is single-row
> > >
> > >   I in order to get rid of the dependency on 'restrictlist', I think you 
> > > can
> > >   use ECs. Consider a query from your regression tests:
> > >
> > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c 
> > > int, d int, e int);
> > >
> > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id 
> > > = 1;
> > >
> > >   The idea here seems to be that no more than one row of t1 matches the 
> > > query
> > >   clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures 
> > > that
> > >   no more than one row of t2 matches the query (because t1 cannot provide 
> > > the
> > >   clause with more than one input value of 'e'). And therefore, the join 
> > > also
> > >   produces at most one row.
> > 
> > You are correct and IMO my current code are able to tell it is a single
> > row as well.
> > 
> > 1. Since t1.id = 1, so t1 is single row, so t1.d is unqiuekey as a
> > consequence.
> > 2. Given t2.id is unique, t1.e = t2.id so t1's unqiuekey can be kept
> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
> > joinrel is singlerow as well.

ok, I think I understand now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply via email to