On Sat, Oct 22, 2016 at 9:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> brolga is still not terribly happy with this patch: it's choosing not to > push down the aggregates in one of the queries. While I failed to > duplicate that result locally, investigation suggests that brolga's result > is perfectly sane; in fact it's not very clear why we're not getting that > from multiple critters, because the plan brolga is choosing is not > inferior to the expected one. > > The core of the problem is this subquery: > > contrib_regression=# explain verbose select min(13), avg(ft1.c1), > sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12; > > QUERY PLAN > ------------------------------------------------------------ > ------------------------------------------------------------ > --------------------------------- > Foreign Scan (cost=108.61..108.64 rows=1 width=44) > Output: (min(13)), (avg(ft1.c1)), (sum(ft2.c1)) > Relations: Aggregate on ((public.ft1) INNER JOIN (public.ft2)) > Remote SQL: SELECT min(13), avg(r1."C 1"), sum(r2."C 1") FROM ("S 1"."T > 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = > 12)))) > (4 rows) > > If you look at the estimate to just fetch the data, it's: > > contrib_regression=# explain verbose select ft1.c1, ft2.c1 from ft1 right > join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12; > QUERY PLAN > ------------------------------------------------------------ > -------------------------------------------------------------------------- > Foreign Scan (cost=100.55..108.62 rows=1 width=8) > Output: ft1.c1, ft2.c1 > Relations: (public.ft1) INNER JOIN (public.ft2) > Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN > "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = 12)))) > (4 rows) > > Note we're expecting only one row out of the join. Now the cost of doing > three aggregates on a single row of input is not a lot. Comparing these > local queries: > > regression=# explain select min(13),avg(q1),sum(q2) from int8_tbl where > q2=456; > QUERY PLAN > --------------------------------------------------------------- > Aggregate (cost=1.07..1.08 rows=1 width=68) > -> Seq Scan on int8_tbl (cost=0.00..1.06 rows=1 width=16) > Filter: (q2 = 456) > (3 rows) > > regression=# explain select (q1),(q2) from int8_tbl where q2=456; > QUERY PLAN > --------------------------------------------------------- > Seq Scan on int8_tbl (cost=0.00..1.06 rows=1 width=16) > Filter: (q2 = 456) > (2 rows) > > we seem to have startup = input cost + .01 and then another .01 > for total. So the estimate to do the above remote scan and then > aggregate locally should have been 108.63 startup and 108.64 total, > give or take. The estimate for aggregating remotely is a hair better, > but it's not nearly better enough to guarantee that the planner won't > see it as fuzzily the same cost. > > In short: the problem with this test case is that it's considering > aggregation over only a single row, which is a situation in which > pushing down the aggregate actually doesn't save us anything, because > we're retrieving one row from the remote either way. So it's not at all > surprising that we don't get a stable plan choice. The test query needs > to be adjusted so that the aggregation is done over multiple rows, > allowing fdw_tuple_cost to kick in and provide some daylight between > the cost estimates. > Attached patch which performs aggrgation over 1000 rows as suggested by Tom. I believe it will have a stable output/plan now. Thanks > > regards, tom lane > -- Jeevan B Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
agg_push_down_fix_testcase.patch
Description: binary/octet-stream
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers