This is all on Postgres 9.0.0: I'm working on the definition for a view that, as part of its output, includes three columns that each contain a sum of values in a table that are in one of a few different states -- essentially: for each parent, give me the sum of children in the foo state, a second sum of the children in the bar state, and a third sum of the children in the baz state.
Thinking I'd be clever and avoid multiple almost-identical subqueries to for each SUM, I decided to do it in a single subquery that returns an array of all 3 sums and then split it out into its component parts higher up (see the example below if you don't understand what I mean). Unfortunately, the query planner doesn't quite handle this case: For each reference to the subquery value, it duplicates the subquery and thus its plan: CREATE TABLE parent ( id INT PRIMARY KEY ); INSERT INTO parent(id) SELECT GENERATE_SERIES(1, 1000); CREATE TABLE child ( parent_id INT, v1 INT, v2 INT, PRIMARY KEY(parent_id, v1) ); INSERT INTO child(parent_id, v1, v2) SELECT p.id, v1.id, RANDOM() * 500 FROM GENERATE_SERIES(1, 1000) AS p(id) CROSS JOIN GENERATE_SERIES(1, 20) AS v1(id) ; EXPLAIN SELECT *, child_data[1] AS foo, child_data[2] AS bar, child_data[3] AS baz FROM ( SELECT p.id, ( SELECT ARRAY[ SUM(c.v2), SUM(CASE WHEN c.v1 > 15 THEN c.v2 ELSE 0 END), SUM(CASE WHEN c.v1 > 5 THEN c.v2 ELSE 0 END) ] FROM child AS c WHERE c.parent_id=p.id ) AS child_data FROM parent AS p ) AS t; produces: Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4) Output: p.id, (SubPlan 1), ((SubPlan 2))[1], ((SubPlan 3))[2], ((SubPlan 4))[3] SubPlan 1 -> Aggregate (cost=40.26..40.27 rows=1 width=8) Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0 END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)] -> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10 rows=20 width=8) Output: c.parent_id, c.v1, c.v2 Index Cond: (c.parent_id = $0) SubPlan 2 -> Aggregate (cost=40.26..40.27 rows=1 width=8) Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0 END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)] -> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10 rows=20 width=8) Output: c.parent_id, c.v1, c.v2 Index Cond: (c.parent_id = $0) SubPlan 3 -> Aggregate (cost=40.26..40.27 rows=1 width=8) Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0 END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)] -> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10 rows=20 width=8) Output: c.parent_id, c.v1, c.v2 Index Cond: (c.parent_id = $0) SubPlan 4 -> Aggregate (cost=40.26..40.27 rows=1 width=8) Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0 END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)] -> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10 rows=20 width=8) Output: c.parent_id, c.v1, c.v2 Index Cond: (c.parent_id = $0) There's another method of writing this that is more efficient in this PARTICULAR case: EXPLAIN VERBOSE SELECT parent.id, t.foo, t.bar, t.baz FROM parent INNER JOIN ( SELECT child.parent_id, SUM(child.v2) AS foo, SUM(CASE WHEN child.v1 > 15 THEN child.v2 ELSE 0 END) AS bar, SUM(CASE WHEN child.v1 > 5 THEN child.v2 ELSE 0 END) AS baz FROM child GROUP BY child.parent_id ) AS t ON t.parent_id=parent.id Hash Join (cost=547.12..556.62 rows=200 width=28) Output: parent.id, (sum(child.v2)), (sum(CASE WHEN (child.v1 > 15) THEN child.v2 ELSE 0 END)), (sum(CASE WHEN (child.v1 > 5) THEN child.v2 ELSE 0 END)) Hash Cond: (child.parent_id = parent.id) -> HashAggregate (cost=483.12..487.62 rows=200 width=12) Output: child.parent_id, sum(child.v2), sum(CASE WHEN (child.v1 > 15) THEN child.v2 ELSE 0 END), sum(CASE WHEN (child.v1 > 5) THEN child.v2 ELSE 0 END) -> Seq Scan on wings_sky.child (cost=0.00..291.06 rows=19206 width=12) Output: child.parent_id, child.v1, child.v2 -> Hash (cost=34.00..34.00 rows=2400 width=4) Output: parent.id -> Seq Scan on wings_sky.parent (cost=0.00..34.00 rows=2400 width=4) Output: parent.id However, the second plan performs poorly in cases in the case where parent_id can be more than one possible value (i.e. there is no WHERE parent_id=1 clause) -- it takes nearly as long in that instance as it does with no WHERE clause altogether. Both plans run the same speed with one parent_id. The first plan starts losing speed gradually as the number of parents increase; the second plan is either all-or-nothing. In the first case, it seems inefficient to duplicate the subplan for each reference -- I'd think the (corrected) plan should look something like this: Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4) Output: p.id, (SubPlan 1), ((SubPlan 1))[1], ((SubPlan 1))[2], ((SubPlan 1))[3] SubPlan 1 -> Aggregate (cost=40.26..40.27 rows=1 width=8) Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0 END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)] -> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10 rows=20 width=8) Output: c.parent_id, c.v1, c.v2 Index Cond: (c.parent_id = $0) Is there any chance this might be looked at in a future release? -- Daniel Grace AGE, LLC System Administrator and Software Developer dgr...@wingsnw.com // www.wingsnw.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs