Re: Asynchronous Append on postgres_fdw nodes.

Etsuro Fujita Wed, 10 Feb 2021 02:31:30 -0800

On Thu, Feb 4, 2021 at 7:21 PM Etsuro Fujita <etsuro.fuj...@gmail.com> wrote:
> On Mon, Feb 1, 2021 at 12:06 PM Etsuro Fujita <etsuro.fuj...@gmail.com> wrote:
> > Rather than doing so, I'd like to propose to allow
> > FDWs to disable async execution of them in problematic cases by
> > themselves during executor startup in the first cut.  What I have in
> > mind for that is:
> >
> > 1) For an FDW that has async-capable ForeignScan(s), we allow the FDW
> > to record, for each of the async-capable and non-async-capable
> > ForeignScan(s), the information on a connection to be used for the
> > ForeignScan into EState during BeginForeignScan().
> >
> > 2) After doing ExecProcNode() to each SubPlan and the main query tree
> > in InitPlan(), we give the FDW a chance to a) reconsider, for each of
> > the async-capable ForeignScan(s), whether the ForeignScan can be
> > executed asynchronously as planned, based on the information stored
> > into EState in #1, and then b) disable async execution of the
> > ForeignScan if not.
>
> s/ExecProcNode()/ExecInitNode()/.  Sorry for that.  I’ll post an
> updated patch for this in a few days.


I created a WIP patch for this.  For #2, I added a new callback
routine ReconsiderAsyncForeignScan().  The routine for postgres_fdw
postgresReconsiderAsyncForeignScan() is pretty simple: async execution
of an async-capable ForeignScan is disabled if the connection used for
it is used in other parts of the query plan tree except async subplans
just below the parent Append.  Here is a running example:

postgres=# create table t1 (a int, b int, c text);
postgres=# create table t2 (a int, b int, c text);
postgres=# create foreign table p1 (a int, b int, c text) server
server1 options (table_name 't1');
postgres=# create foreign table p2 (a int, b int, c text) server
server2 options (table_name 't2');
postgres=# create table pt (a int, b int, c text) partition by range (a);
postgres=# alter table pt attach partition p1 for values from (10) to (20);
postgres=# alter table pt attach partition p2 for values from (20) to (30);
postgres=# insert into p1 select 10 + i % 10, i, to_char(i, 'FM0000')
from generate_series(0, 99) i;
postgres=# insert into p2 select 20 + i % 10, i, to_char(i, 'FM0000')
from generate_series(0, 99) i;
postgres=# analyze pt;
postgres=# create table loct (a int, b int);
postgres=# create foreign table ft (a int, b int) server server1
options (table_name 'loct');
postgres=# insert into ft select i, i from generate_series(0, 99) i;
postgres=# analyze ft;
postgres=# create view v as select * from ft;

postgres=# explain verbose select * from pt, v where pt.b = v.b and v.b = 99;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop  (cost=200.00..306.84 rows=2 width=21)
   Output: pt.a, pt.b, pt.c, ft.a, ft.b
   ->  Foreign Scan on public.ft  (cost=100.00..102.27 rows=1 width=8)
         Output: ft.a, ft.b
         Remote SQL: SELECT a, b FROM public.loct WHERE ((b = 99))
   ->  Append  (cost=100.00..204.55 rows=2 width=13)
         ->  Foreign Scan on public.p1 pt_1  (cost=100.00..102.27
rows=1 width=13)
               Output: pt_1.a, pt_1.b, pt_1.c
               Remote SQL: SELECT a, b, c FROM public.t1 WHERE ((b = 99))
         ->  Async Foreign Scan on public.p2 pt_2
(cost=100.00..102.27 rows=1 width=13)
               Output: pt_2.a, pt_2.b, pt_2.c
               Remote SQL: SELECT a, b, c FROM public.t2 WHERE ((b = 99))
(12 rows)

For this query, while p2 is executed asynchronously, p1 isn’t as it
uses the same connection with ft.  BUT:

postgres=# create role view_owner SUPERUSER;
postgres=# create user mapping for view_owner server server1;
postgres=# alter view v owner to view_owner;

postgres=# explain verbose select * from pt, v where pt.b = v.b and v.b = 99;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop  (cost=200.00..306.84 rows=2 width=21)
   Output: pt.a, pt.b, pt.c, ft.a, ft.b
   ->  Foreign Scan on public.ft  (cost=100.00..102.27 rows=1 width=8)
         Output: ft.a, ft.b
         Remote SQL: SELECT a, b FROM public.loct WHERE ((b = 99))
   ->  Append  (cost=100.00..204.55 rows=2 width=13)
         ->  Async Foreign Scan on public.p1 pt_1
(cost=100.00..102.27 rows=1 width=13)
               Output: pt_1.a, pt_1.b, pt_1.c
               Remote SQL: SELECT a, b, c FROM public.t1 WHERE ((b = 99))
         ->  Async Foreign Scan on public.p2 pt_2
(cost=100.00..102.27 rows=1 width=13)
               Output: pt_2.a, pt_2.b, pt_2.c
               Remote SQL: SELECT a, b, c FROM public.t2 WHERE ((b = 99))
(12 rows)

in this setup, p1 is executed asynchronously as ft doesn’t use the
same connection with p1.

I added to postgresReconsiderAsyncForeignScan() this as well: even if
the connection isn’t used in the other parts, async execution of an
async-capable ForeignScan is disabled if the subplans of the Append
are all async-capable, and they use the same connection, because in
that case the subplans won’t be parallelized at all, and the overhead
of async execution may cause a performance degradation.

Attached is an updated version of the patch.  Sorry for the delay.

Best regards,
Etsuro Fujita

async-wip-2021-02-10.patch
Description: Binary data

Re: Asynchronous Append on postgres_fdw nodes.

Reply via email to