Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan Tue, 14 Apr 2020 17:20:10 -0700

Hi David:

Thanks for your time.



> 1. Out of date comment in join.sql
>
> -- join removal is not possible when the GROUP BY contains a column that is
> -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
> -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
> -- but this happens too late for join removal in the outer plan level.)
> explain (costs off)
> select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
>   on d.a = s.d;
>
> You've changed the GROUP BY clause so it does not include b.id, so the
> Note in the comment is now misleading.
>

Thanks, I will fix this one in the following patch.


>
> 2. I think 0002 is overly restrictive in its demands that
> parse->hasAggs must be false. We should be able to just use a Group
> Aggregate with unsorted input when the input_rel is unique on the
> GROUP BY clause.  This will save on hashing and sorting.  Basically
> similar to what we do for when a query contains aggregates without any
> GROUP BY.
>
>
Yes,  This will be a perfect result,  the difficult is the current
aggregation function
execution is highly coupled with Agg node(ExecInitAgg) which is removed in
the
unique case.  I ever make the sum (w/o finalfn) and avg(with finalfn)
works in a hack way, but still many stuffs is not handled.  Let me prepare
the code
for this purpose in 1~2  days to see if I'm going with the right direction.

Ashutosh also has an idea[1] that if the relation underlying an Agg node is
known to be unique for given groupByClause, we could safely use
AGG_SORTED strategy. Though the input is not ordered, it's sorted thus for
every row Agg
node will combine/finalize the aggregate result.

I will target the perfect result first and see how many effort do we need,
if not,
I will try Ashutosh's suggestion.



> 3. I don't quite understand why you changed this to a right join:
>
>  -- Test case where t1 can be optimized but not t2
>  explain (costs off) select t1.*,t2.x,t2.z
> -from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
> +from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
>
> Perhaps this change is left over from some previous version of the patch?
>

This is on purpose.   the original test case is used to test we can short
the group key for t1 but not t2 for aggregation, but if I keep the inner
join, the
aggnode will be removed totally, so I have to change it to right join in
order
to keep the aggnode.  The full test case is:

-- Test case where t1 can be optimized but not t2

explain (costs off) select t1.*,t2.x,t2.z

from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y

group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;


where (a, b) is the primary key of t1.



[1]
https://www.postgresql.org/message-id/CAExHW5sY%2BL6iZ%3DrwnL7n3jET7aNLCNQimvfcS7C%2B5wmdjmdPiw%40mail.gmail.com

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Reply via email to