This is all on Postgres 9.0.0:

I'm working on the definition for a view that, as part of its output,
includes three columns that each contain a sum of values in a table
that are in one of a few different states -- essentially: for each
parent, give me the sum of children in the foo state, a second sum of
the children in the bar state, and a third sum of the children in the
baz state.

Thinking I'd be clever and avoid multiple almost-identical subqueries
to for each SUM, I decided to do it in a single subquery that returns
an array of all 3 sums and then split it out into its component parts
higher up (see the example below if you don't understand what I mean).
 Unfortunately, the query planner doesn't quite handle this case: For
each reference to the subquery value, it duplicates the subquery and
thus its plan:

CREATE TABLE parent (
        id INT PRIMARY KEY
);
INSERT INTO parent(id) SELECT GENERATE_SERIES(1, 1000);

CREATE TABLE child (
        parent_id INT,
        v1 INT,
        v2 INT,
        PRIMARY KEY(parent_id, v1)
);

INSERT INTO child(parent_id, v1, v2)
SELECT p.id, v1.id, RANDOM() * 500
FROM
        GENERATE_SERIES(1, 1000) AS p(id)
        CROSS JOIN GENERATE_SERIES(1, 20) AS v1(id)
;

EXPLAIN SELECT *, child_data[1] AS foo, child_data[2] AS bar,
child_data[3] AS baz FROM (
        SELECT p.id,
                (
                        SELECT
                                ARRAY[
                                        SUM(c.v2),
                                        SUM(CASE WHEN c.v1 > 15 THEN c.v2 ELSE 
0 END),
                                        SUM(CASE WHEN c.v1 >  5 THEN c.v2 ELSE 
0 END)
                                ]
                        FROM child AS c
                        WHERE c.parent_id=p.id
                ) AS child_data
        FROM parent AS p
) AS t;

produces:

Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4)
Output: p.id, (SubPlan 1), ((SubPlan 2))[1], ((SubPlan 3))[2], ((SubPlan 4))[3]
SubPlan 1
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 2
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 3
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 4
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)

There's another method of writing this that is more efficient in this
PARTICULAR case:


EXPLAIN VERBOSE SELECT parent.id, t.foo, t.bar, t.baz
FROM
        parent
        INNER JOIN (
                SELECT
                        child.parent_id,
                        SUM(child.v2) AS foo,
                        SUM(CASE WHEN child.v1 > 15 THEN child.v2 ELSE 0 END) 
AS bar,
                        SUM(CASE WHEN child.v1 >  5 THEN child.v2 ELSE 0 END) 
AS baz
                FROM
                        child
                GROUP BY
                        child.parent_id
        ) AS t ON t.parent_id=parent.id

Hash Join (cost=547.12..556.62 rows=200 width=28)
Output: parent.id, (sum(child.v2)), (sum(CASE WHEN (child.v1 > 15)
THEN child.v2 ELSE 0 END)), (sum(CASE WHEN (child.v1 > 5) THEN
child.v2 ELSE 0 END))
Hash Cond: (child.parent_id = parent.id)
-> HashAggregate (cost=483.12..487.62 rows=200 width=12)
Output: child.parent_id, sum(child.v2), sum(CASE WHEN (child.v1 > 15)
THEN child.v2 ELSE 0 END), sum(CASE WHEN (child.v1 > 5) THEN child.v2
ELSE 0 END)
-> Seq Scan on wings_sky.child (cost=0.00..291.06 rows=19206 width=12)
Output: child.parent_id, child.v1, child.v2
-> Hash (cost=34.00..34.00 rows=2400 width=4)
Output: parent.id
-> Seq Scan on wings_sky.parent (cost=0.00..34.00 rows=2400 width=4)
Output: parent.id


However, the second plan performs poorly in cases in the case where
parent_id can be more than one possible value (i.e. there is no WHERE
parent_id=1 clause) -- it takes nearly as long in that instance as it
does with no WHERE clause altogether.  Both plans run the same speed
with one parent_id.  The first plan starts losing speed gradually as
the number of parents increase; the second plan is either
all-or-nothing.



In the first case, it seems inefficient to duplicate the subplan for
each reference -- I'd think the (corrected) plan should look something
like this:

Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4)
Output: p.id, (SubPlan 1), ((SubPlan 1))[1], ((SubPlan 1))[2], ((SubPlan 1))[3]
SubPlan 1
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)

Is there any chance this might be looked at in a future release?

-- 
Daniel Grace
AGE, LLC
System Administrator and Software Developer
dgr...@wingsnw.com // www.wingsnw.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to