On 01/12/2015 12:07, David Rowley wrote:
> On 1 December 2015 at 17:09, Marko Tiikkaja <ma...@joh.to> wrote:
>
>     On 2015-12-01 05:00, David Rowley wrote:
>
>         We already allow a SELECT's target list to contain non-aggregated
>         columns in a GROUP BY query in cases where the non-aggregated
>         column is functionally dependent on the GROUP BY clause.
>
>         For example a query such as;
>
>         SELECT p.product_id,p.description, SUM(s.quantity)
>         FROM product p
>         INNER JOIN sale s ON p.product_id = s.product_id
>         GROUP BY p.product_id;
>
>         is perfectly fine in PostgreSQL, as p.description is functionally
>         dependent on p.product_id (assuming product_id is the PRIMARY KEY
>         of product).
>
>     This has come up before (on other forums, at least), and my main
>     concern has been that unlike the case where we go from throwing an
>     error to allowing a query, this has a chance to make the planning of
>     currently legal queries slower. Have you tried to measure the impact
>     of this on queries where there's no runtime gains to be had?
>
> I've performed a series of benchmarks on the following queries:
>
> Test1: explain select id1,id2 from t1 group by id1,id2;
> Test2: explain select id from t2 group by id;
> Test3: explain select t1.id1,t1.id2 from t2 inner join t1 on
>        t1.id1=t2.id group by t1.id1,t1.id2;
>
> I ran each of these with pgbench for 60 seconds, 3 runs per query. In
> each case below I've converted the TPS into seconds using the average
> TPS over the 3 runs.
>
> In summary:
>
> Test1 is the worst case test. It's a very simple query so planning
> overhead of join searching is non-existent. The fact that there's 2
> columns in the GROUP BY means that the fast path cannot be used. I added
> this as if there's only 1 column in the GROUP BY then there's no point
> in searching for something to remove.
>
> [...]
>
> It seems that the worst case test adds about 7.6 microseconds onto
> planning time. To get this worse case result I had to add two GROUP BY
> columns, as having only 1 triggers a fast path as the code knows it
> can't remove any columns, since there's only 1. A similar fast path also
> exists which will only lookup the PRIMARY KEY details if there's more
> than 1 column per relation in the GROUP BY, so for example GROUP BY
> rel1.col1, rel2.col1 won't lookup any PRIMARY KEY constraint.
>
> Given that the extra code really only does anything if the GROUP BY has
> 2 or more expressions, are you worried that this will affect too many
> short and fast to execute queries negatively?
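Before going through the code, here's a minimal standalone sketch of the
per-relation test I understand the patch to be doing, just to make sure I
read it correctly. The helper name and its Bitmapset-of-attnos signature are
mine, not the patch's:

#include "postgres.h"
#include "nodes/bitmapset.h"

/*
 * Sketch only, not the patch's actual code: given the attribute numbers of
 * a relation's PK columns and of that relation's Vars appearing in the
 * GROUP BY clause (both sets assumed already offset so they are
 * non-negative, as the backend does with FirstLowInvalidHeapAttributeNumber),
 * decide whether the non-PK Vars could be dropped from the GROUP BY.
 */
static bool
groupby_vars_are_removable(Bitmapset *pkattnos, Bitmapset *groupbyattnos)
{
    /*
     * Fast path: with at most one GROUP BY column for this relation there
     * is nothing that could possibly be removed.
     */
    if (bms_membership(groupbyattnos) != BMS_MULTIPLE)
        return false;

    /*
     * Otherwise the extra columns are removable only if every PK column
     * appears in the GROUP BY, which makes the rest functionally dependent.
     */
    return bms_is_subset(pkattnos, groupbyattnos);
}

I used bms_is_subset() here only to state the condition compactly; I
understand the patch tracks the matched columns with varlistmatches and
found_col instead.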
In identify_key_vars()

+ /* Only PK constraints are of interest for now, see comment above */
+ if (con->contype != CONSTRAINT_PRIMARY)
+     continue;

I think the comment should rather refer to remove_useless_groupby_columns()
(i.e. don't optimize further than what check_functional_grouping() does).

+ /*
+  * Exit if the constraint is deferrable, there's no point in looking
+  * for another PK constriant, as there can only be one.
+  */

Small typo on "constriant".

+ {
+     varlistmatches = bms_add_member(varlistmatches, varlistidx);
+     found_col = true;
+     /* don't break, there might be dupicates */
+ }

Small typo on "dupicates". Also, I thought transformGroupClauseExpr(), called
during parse analysis, already makes sure that there are no duplicates in the
GROUP BY clause. Am I missing something?

In remove_useless_groupby_columns()

+ /*
+  * Skip if there were no Vars found in the GROUP BY clause which belong
+  * to this relation. We can also skip if there's only 1 Var, as there
+  * needs to be more than one Var for there to be any columns which are
+  * functionally dependent on another column.
+  */

This comment doesn't sound very clear. I'm not a native English speaker, so
maybe it's just me.

+ /*
+  * If we found any surplus Vars in the GROUP BY clause, then we'll build
+  * a new GROUP BY clause without these surplus Vars.
+  */
+ if (anysurplus)
+ {
+     List *new_groupby = NIL;
+
+     foreach (lc, root->parse->groupClause)
+     {
+         SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
+         TargetEntry *tle;
+         Var *var;
+
+         tle = get_sortgroupclause_tle(sgc, root->parse->targetList);
+         var = (Var *) tle->expr;
+
+         if (!IsA(var, Var))
+             continue;
+ [...]

If var isn't a Var, it needs to be added to new_groupby instead of being
skipped (a rough sketch of what I mean is at the end of this mail).

+ /* XXX do we to alter tleSortGroupRef to remove gaps? */

No idea on that :/

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
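PS: here is, roughly, the shape I had in mind for the non-Var case above.
surplusvars is a name I made up for whatever set of sortgroupref numbers the
patch ends up proving redundant, so only the structure of the loop matters,
not the membership test itself:

List       *new_groupby = NIL;
ListCell   *lc;

foreach(lc, root->parse->groupClause)
{
    SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
    TargetEntry *tle = get_sortgroupclause_tle(sgc, root->parse->targetList);

    /*
     * Keep the item if it isn't a plain Var, or if it's a Var we could not
     * prove redundant; only proven-surplus Vars are dropped.
     * (surplusvars is hypothetical, see above.)
     */
    if (!IsA(tle->expr, Var) ||
        !bms_is_member(sgc->tleSortGroupRef, surplusvars))
        new_groupby = lappend(new_groupby, sgc);
}

root->parse->groupClause = new_groupby;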