[HACKERS] Pull up aggregate subquery

Hitoshi Harada Wed, 04 May 2011 03:02:45 -0700

I sometimes wonder if we could pull up aggregate query in the optimizer.

My typical problem is:


Consider two relations, medium size M and large size L. L has
reference column to M's primary key. Say,

    create table size_m as select i as id, repeat(i::text, i % 100) as
val from generate_series(1, 20000) i;
    create table size_l as select i as id, m.id as m_id,
repeat(i::text, i % 100) as val from generate_series(1, 100000) i,
size_m m where i.i / 10 = m.id;

Now, you want to query M under some condition with join aggregate L
group by M's primary key.

    select m_id, sum_len from size_m m inner join(select m_id,
sum(length(val)) as sum_len from size_l group by m_id)l on m.id =
l.m_id where val = '1';

The generated plan is:

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=36116.92..38339.67 rows=50 width=235) (actual
time=440.679..1039.964 rows=1 loops=1)
   Join Filter: (m.id = size_l.m_id)
   ->  Seq Scan on size_m m  (cost=0.00..897.00 rows=1 width=227)
(actual time=0.021..16.698 rows=1 loops=1)
         Filter: (val = '1'::text)
   ->  GroupAggregate  (cost=36116.92..37217.09 rows=10026 width=248)
(actual time=440.651..1013.704 rows=10000 loops=1)
         ->  Sort  (cost=36116.92..36366.90 rows=99991 width=248)
(actual time=440.619..593.062 rows=99991 loops=1)
               Sort Key: size_l.m_id
               Sort Method: external sort  Disk: 25736kB
               ->  Seq Scan on size_l  (cost=0.00..4565.91 rows=99991
width=248) (actual time=0.011..138.243 rows=99991 loops=1)
 Total runtime: 1044.039 ms
(10 rows)

Note that most of the result of aggregate is discarded on join M,
because M resulted in small output with filter by M.val. If we can
filter M first and filter L by the M's result before running
aggregate, the response may dramatically get faster.

If you filter by M.id instead of M.val the optimizer is smart enough
to push down the condition to L, which is filtered before aggregate.

    select m_id, sum_len from size_m m inner join(select m_id,
sum(length(val)) as sum_len from size_l group by m_id)l on m.id =
l.m_id where id = 1;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..5713.02 rows=2 width=235) (actual
time=72.121..82.364 rows=1 loops=1)
   ->  Seq Scan on size_m m  (cost=0.00..897.00 rows=1 width=227)
(actual time=0.028..10.252 rows=1 loops=1)
         Filter: (id = 1)
   ->  GroupAggregate  (cost=0.00..4815.98 rows=2 width=248) (actual
time=72.065..72.067 rows=1 loops=1)
         ->  Seq Scan on size_l  (cost=0.00..4815.89 rows=10
width=248) (actual time=0.051..71.968 rows=10 loops=1)
               Filter: (m_id = 1)
 Total runtime: 82.525 ms
(7 rows)

This seems like the benefit of EquivalentClass. 1 = M.id = L.m_id is
implied and the optimizer adds implicit constant filter L.m_id = 1 in
the plan tree. In contrast, in the former case of M.val = '1' doesn't
imply any condition for L.m_id. That's fair enough.

However, I think we can filter L by L.m_id before aggregate because
L.m_id is of the group clause as well as the join condition and M.id
is unique in M. In such cases, the query can be transform something
like:
    GroupAggregate
      -> NestLoop (L.m_id = M.id)
        -> SeqScan L
        -> SeqScan M (filter: M.val = '1')
This transformation can be done by pulling up aggregate Query in
pull_up_subqueries(). Currently the optimizer doesn't pull up any
queries which contains aggregate, but as shown above in some cases we
can do it. Attached is WIP proof of concept patch to do it. I know it
breaks general queries but it transforms as I described above. I
suppose the missing piece is adding condition of "when" to pull up
aggregate. "how" is done in the patch.

db1=# explain select m_id, sum_len from size_m m inner join(select
m_id, sum(length(val)) as sum_len from size_l group by m_id)l on m.id
= l.m_id where val = '1';
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=6712.96..6713.16 rows=10 width=471) (actual
time=125.496..125.499 rows=1 loops=1)
   ->  Sort  (cost=6712.96..6712.99 rows=10 width=471) (actual
time=125.228..125.288 rows=10 loops=1)
         Sort Key: size_l.m_id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.00..6712.80 rows=10 width=471)
(actual time=0.142..125.089 rows=10 loops=1)
               Join Filter: (m.id = size_l.m_id)
               ->  Seq Scan on size_m m  (cost=0.00..897.00 rows=1
width=227) (actual time=0.037..8.956 rows=1 loops=1)
                     Filter: (val = '1'::text)
               ->  Seq Scan on size_l  (cost=0.00..4565.91 rows=99991
width=248) (actual time=0.060..65.910 rows=99991 loops=1)
 Total runtime: 125.658 ms
(10 rows)

(You may notice that by the transformation non-group/non-agg
TargetEntry is in the Agg node, but actually there's no check for that
during optimizer and executor. May be thanks to Functional
Dependency.)

Comments?

Regards,

-- 
Hitoshi Harada

diff --git a/.gitignore b/.gitignore
index 81c4d5e..2230d62 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,3 +28,4 @@ win32ver.rc
 /pgsql.sln.cache
 /Debug/
 /Release/
+tags
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 5d16329..91d6fa0 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -877,9 +877,23 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	parse->hasSubLinks |= subquery->hasSubLinks;
 
 	/*
-	 * subquery won't be pulled up if it hasAggs or hasWindowFuncs, so no work
-	 * needed on those flags
+	 * For group clauses, reassign the indexes to the targetList, by
+	 * adding new junk ones (for now).
 	 */
+	parse->hasAggs = subquery->hasAggs;
+	parse->groupClause = copyObject(subquery->groupClause);
+	foreach (lc, parse->groupClause)
+	{
+		SortGroupClause *sort = (SortGroupClause *) lfirst(lc);
+		TargetEntry	   *tle = get_tle_by_resno(subquery->targetList, sort->tleSortGroupRef);
+		TargetEntry	   *newtle;
+		AttrNumber		resno = list_length(parse->targetList);
+
+		Assert(tle != NULL);
+		newtle = makeTargetEntry(tle->expr, resno + 1, tle->resname, true);
+		parse->targetList = lappend(parse->targetList, newtle);
+		sort->tleSortGroupRef = assignSortGroupRef(newtle, parse->targetList);
+	}
 
 	/*
 	 * Return the adjusted subquery jointree to replace the RangeTblRef entry
@@ -1071,10 +1085,14 @@ is_simple_subquery(Query *subquery)
 	 * that case the locking was originally declared in the upper query
 	 * anyway.
 	 */
-	if (subquery->hasAggs ||
+	/*
+	 * In some cases aggregate query can be pulled up. Need to add code
+	 * to make decision for more precise conditions.
+	 */
+	if (/*subquery->hasAggs ||*/
 		subquery->hasWindowFuncs ||
-		subquery->groupClause ||
-		subquery->havingQual ||
+		/*subquery->groupClause ||
+		subquery->havingQual ||*/
 		subquery->sortClause ||
 		subquery->distinctClause ||
 		subquery->limitOffset ||

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Pull up aggregate subquery

Reply via email to