Hi David: Thanks for your time.
> 1. Out of date comment in join.sql > > -- join removal is not possible when the GROUP BY contains a column that is > -- not in the join condition. (Note: as of 9.6, we notice that b.id is a > -- primary key and so drop b.c_id from the GROUP BY of the resulting plan; > -- but this happens too late for join removal in the outer plan level.) > explain (costs off) > select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s > on d.a = s.d; > > You've changed the GROUP BY clause so it does not include b.id, so the > Note in the comment is now misleading. > Thanks, I will fix this one in the following patch. > > 2. I think 0002 is overly restrictive in its demands that > parse->hasAggs must be false. We should be able to just use a Group > Aggregate with unsorted input when the input_rel is unique on the > GROUP BY clause. This will save on hashing and sorting. Basically > similar to what we do for when a query contains aggregates without any > GROUP BY. > > Yes, This will be a perfect result, the difficult is the current aggregation function execution is highly coupled with Agg node(ExecInitAgg) which is removed in the unique case. I ever make the sum (w/o finalfn) and avg(with finalfn) works in a hack way, but still many stuffs is not handled. Let me prepare the code for this purpose in 1~2 days to see if I'm going with the right direction. Ashutosh also has an idea[1] that if the relation underlying an Agg node is known to be unique for given groupByClause, we could safely use AGG_SORTED strategy. Though the input is not ordered, it's sorted thus for every row Agg node will combine/finalize the aggregate result. I will target the perfect result first and see how many effort do we need, if not, I will try Ashutosh's suggestion. > 3. I don't quite understand why you changed this to a right join: > > -- Test case where t1 can be optimized but not t2 > explain (costs off) select t1.*,t2.x,t2.z > -from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y > +from t1 right join t2 on t1.a = t2.x and t1.b = t2.y > > Perhaps this change is left over from some previous version of the patch? > This is on purpose. the original test case is used to test we can short the group key for t1 but not t2 for aggregation, but if I keep the inner join, the aggnode will be removed totally, so I have to change it to right join in order to keep the aggnode. The full test case is: -- Test case where t1 can be optimized but not t2 explain (costs off) select t1.*,t2.x,t2.z from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z; where (a, b) is the primary key of t1. [1] https://www.postgresql.org/message-id/CAExHW5sY%2BL6iZ%3DrwnL7n3jET7aNLCNQimvfcS7C%2B5wmdjmdPiw%40mail.gmail.com