Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

Tom Lane Sat, 22 Oct 2016 08:40:27 -0700

brolga is still not terribly happy with this patch: it's choosing not to
push down the aggregates in one of the queries.  While I failed to
duplicate that result locally, investigation suggests that brolga's result
is perfectly sane; in fact it's not very clear why we're not getting that
from multiple critters, because the plan brolga is choosing is not
inferior to the expected one.


The core of the problem is this subquery:

contrib_regression=# explain verbose select min(13), avg(ft1.c1), sum(ft2.c1) 
from ft1 right join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12;
                                                                       QUERY 
PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=108.61..108.64 rows=1 width=44)
   Output: (min(13)), (avg(ft1.c1)), (sum(ft2.c1))
   Relations: Aggregate on ((public.ft1) INNER JOIN (public.ft2))
   Remote SQL: SELECT min(13), avg(r1."C 1"), sum(r2."C 1") FROM ("S 1"."T 1" 
r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = 12))))
(4 rows)

If you look at the estimate to just fetch the data, it's:

contrib_regression=# explain verbose select ft1.c1, ft2.c1 from ft1 right join 
ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12; 
                                                              QUERY PLAN        
                                                      
--------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.55..108.62 rows=1 width=8)
   Output: ft1.c1, ft2.c1
   Relations: (public.ft1) INNER JOIN (public.ft2)
   Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 
1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = 12))))
(4 rows)

Note we're expecting only one row out of the join.  Now the cost of doing
three aggregates on a single row of input is not a lot.  Comparing these
local queries:

regression=# explain select min(13),avg(q1),sum(q2) from int8_tbl where q2=456;
                          QUERY PLAN                           
---------------------------------------------------------------
 Aggregate  (cost=1.07..1.08 rows=1 width=68)
   ->  Seq Scan on int8_tbl  (cost=0.00..1.06 rows=1 width=16)
         Filter: (q2 = 456)
(3 rows)

regression=# explain select (q1),(q2) from int8_tbl where q2=456;
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on int8_tbl  (cost=0.00..1.06 rows=1 width=16)
   Filter: (q2 = 456)
(2 rows)

we seem to have startup = input cost + .01 and then another .01
for total.  So the estimate to do the above remote scan and then
aggregate locally should have been 108.63 startup and 108.64 total,
give or take.  The estimate for aggregating remotely is a hair better,
but it's not nearly better enough to guarantee that the planner won't
see it as fuzzily the same cost.

In short: the problem with this test case is that it's considering
aggregation over only a single row, which is a situation in which
pushing down the aggregate actually doesn't save us anything, because
we're retrieving one row from the remote either way.  So it's not at all
surprising that we don't get a stable plan choice.  The test query needs
to be adjusted so that the aggregation is done over multiple rows,
allowing fdw_tuple_cost to kick in and provide some daylight between
the cost estimates.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

Reply via email to