That example makes sense, but it isn't directly related to the issue here. In this example:

`select T1.id, count(*) from T1 inner join T2 on T1.foreignKey = T2.ID where T2.foo = 1234 group by T1.id`

AggregateRemoveRule wants to know whether `T1.id` is unique after the filter. The answer should be "yes", because T1 is not expanded by the join (in these examples, all ID fields are unique at the TableScan level). For example:

T1:

ID | foreignKey
 1 | 10
 2 | 10
 3 | 20
 4 | 20
T2 (let's ignore foo):

ID
10
20

T1 inner join T2 on T1.foreignKey = T2.id:

T1.ID | T2.ID
    1 | 10
    2 | 10
    3 | 20
    4 | 20

And the analyser is already aware of this. If I ask the join "is T1.id unique?", it says "yes"; if I ask "is T2.id unique?", it says "no", because the non-uniqueness of the foreign-key field negates any uniqueness on the opposite (primary-key) side.

But if you ask the filter "is T1.id unique?", it asks the join "is {T1.id, T2.foo} unique?". The join looks at T1.id and sees that it is unique, then looks at T2.foo and sees that it is not. For a join, those per-column answers are "and"ed together, so if any non-unique column exists in the set, the join claims that the whole key set is not unique.

This gets back to my question: "What is this supposed to do?" Is it answering "Is this set of keys, when taken together, unique?", or is it answering "Are *all* of these keys *independently* unique?" If it's the first, then the join analysis is too conservative; if it's the second, then the table-scan analysis is incorrect (I think).

To me, the analysis could be as simple as "Is any one of these columns unique after the join?" If yes, then the key set as a whole is unique. A sketch of the two interpretations follows below.
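To make the distinction concrete, here is a minimal sketch in plain Java. It is not Calcite's actual RelMdColumnUniqueness logic, and `isColumnUnique` is a hypothetical stand-in for whatever per-column answer the join derives from its inputs (in Calcite the question would be posed through something like `RelMetadataQuery.areColumnsUnique`).

```java
import java.util.BitSet;

public class JoinUniquenessSketch {

  // Per-column uniqueness after the join in the example above:
  // column 0 = T1.id (unique), column 1 = T2.foo (not unique).
  // Hypothetical stand-in for the join's per-column analysis.
  static boolean isColumnUnique(int column) {
    return column == 0;
  }

  // Interpretation 2: "are *all* of these keys *independently* unique?"
  // This is the "and" behaviour described above: the non-unique T2.foo
  // makes the whole key set {T1.id, T2.foo} report non-unique.
  static boolean allColumnsIndependentlyUnique(BitSet keys) {
    return keys.stream().allMatch(JoinUniquenessSketch::isColumnUnique);
  }

  // Interpretation 1 (the proposal): "is this set of keys, taken
  // together, unique?" If any single column in the set is unique on its
  // own, every row has a distinct value in that column, so the set as a
  // whole is unique too.
  static boolean keySetUniqueTogether(BitSet keys) {
    return keys.stream().anyMatch(JoinUniquenessSketch::isColumnUnique);
  }

  public static void main(String[] args) {
    BitSet keys = new BitSet();
    keys.set(0); // T1.id
    keys.set(1); // T2.foo
    System.out.println("'and' semantics: " + allColumnsIndependentlyUnique(keys)); // false
    System.out.println("'any' semantics: " + keySetUniqueTogether(keys));          // true
  }
}
```

Note that the "any column unique" test is sufficient but not necessary: a composite key can be unique taken together even when no individual column is, so an analysis built only on this test would still answer "no" in some cases where the true answer is "yes". It would, however, give the right answer for the query above.

-Ian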