Hello. At Thu, 02 Aug 2018 01:06:41 +0100, Andrew Gierth <and...@tao11.riddles.org.uk> wrote in <87pnz1aby9....@news-spur.riddles.org.uk> > This analysis comes from investigating a report from an IRC user. A > summary of the initial report is: > > Using PG 9.6.9 and postgres_fdw, a query of the form "select * from > foreign_table order by col limit 1" is getting a local Sort plan, not > pushing the ORDER BY to the remote. Turning off use_remote_estimates > changes the plan to use a remote sort, with a 10000x speedup. > > I don't think this can be called a bug, exactly, and I don't have an > immediate fix, so I'm putting this analysis up for the benefit of anyone > working on this in future.
I didn't see the concrete estimates, it seems that the cause is too-small total cost of non-remote-sorted plan compared with the startup cost of remote-sorted one. In other words, tuple cost by the remoteness is estimated as too small. Perhaps setting fdw_tuple_cost to , say 1 as an extreme value, will bring victory to remote sort path for the query. > The cause of the misplan seems to be this: postgres_fdw with > use_remote_estimates on does not attempt to obtain fast-start plans from > the remote. In this case what happens is this: > > 1. postgres_fdw gets the cost estimate from the plain remote fetch, by > doing "EXPLAIN select * from table". This produces a plan with a low > startup cost (just the constant overhead) and a high total cost (on > the order of 1.2e6 in this case). > > 2. postgres_fdw gets the cost estimate for the ordered fetch, by doing > "EXPLAIN select * from table order by col". Note that there is no > LIMIT nor any cursor_tuple_fraction in effect, so the plan returned > in this case is a seqscan+sort plan (in spite of the presence of an > index on "col"), with a very high (order of 8e6) startup and total > cost. > > So when the local side tries to generate paths, it has the choice of > using a remote-ordered path with startup cost 8e6, or a local top-1 > sort on top of an unordered remote path, which has a total cost on the > order of 1.5e6 in this case; cheaper than the remote sort because this > only needs to do top-1, while the remote is sorting millions of rows > and would probably spill to disk. A simple test at hand showed that (on a unix-domain connection): =# explain (verbose on, analyze on) select * from ft1 order by a; > Foreign Scan on public.ft1 (cost=9847.82..17097.82 rows=100000 width=4) > (actual time=195.861..515.747 rows=100000 loops=1) =# explain (verbose on, analyze on) select * from ft1; > Foreign Scan on public.ft1 (cost=100.00..8543.00 rows=100000 width=4) > (actual time=0.659..399.427 rows=100000 loops=1) The cost is apaprently wrong. On my environment fdw_startup_cost = 45 and fdw_tuple_cost = 0.2 gave me an even cost/actual time ratio *for these queries*. (hard coded default is 100 and 0.01. Of course this disucussion is ignoring the accuracy of local-execution estimate.) =# explain (verbose on, analyze on) select * from ft1 order by a; > Foreign Scan on public.ft1 (cost=9792.82..31042.82 rows=100000 width=4) > (actual time=201.493..533.913 rows=100000 loops=1) =# explain (verbose on, analyze on) select * from ft1; > Foreign Scan on public.ft1 (cost=45.00..22488.00 rows=100000 width=4) > (actual time=0.837..484.469 rows=100000 loops=1) This gave me a remote-sorted plan for "select * from ft1 order by a limit 1". (But also gave me a remote-sorted plan without a LIMIT..) > However, when it comes to actual execution, postgres_fdw opens a cursor > for the remote query, which means that cursor_tuple_fraction will come > into play. As far as I can tell, this is not set anywhere, so this means > that the plan that actually gets run on the remote is likely to have > _completely_ different costs from those returned by the EXPLAINs. In > particular, in this case the fast-start index-scan plan for the ORDER BY > remote query is clearly being chosen when use_remote_estimates is off > (since the query completes in 15ms rather than 150 seconds). > > One possibility: would it be worth adding an option to EXPLAIN that > makes it assume cursor_tuple_fraction? Cursor fraction seems working since the foreign scan with remote sort has a cost with different startup and total values. The problem seems to be a too-small tuple cost. So, we might have a room for improvement on DEFAULT_FDW_STARTUP_COST, DEFAULT_FDW_TUPLE_COST and DEFAULT_FDW_SORT_MULTIPLIER settings. regards. -- Kyotaro Horiguchi NTT Open Source Software Center