Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

Fujii Masao Wed, 23 Sep 2020 07:49:36 -0700



On 2020/09/21 12:44, Bharath Rupireddy wrote:

On Thu, Sep 17, 2020 at 10:20 PM Fujii Masao <[email protected] 
<mailto:[email protected]>> wrote:
 >
 > In 1st way, you may also need to call ReleaseExternalFD() when new 
connection fails
 > to be made, as connect_pg_server() does. Also we need to check that
 > non-superuser has used password to make new connection,
 > as connect_pg_server() does? I'm concerned about the case where
 > pg_hba.conf is changed accidentally so that no password is necessary
 > at the remote server and the existing connection is terminated. In this case,
 > if we connect to the local server as non-superuser, connection to
 > the remote server should fail because the remote server doesn't
 > require password. But with your patch, we can successfully reconnect
 > to the remote server.
 >
 > Therefore I like 2nd option. Also maybe disconnect_ps_server() needs to
 > be called before that. I'm not sure how much useful 1st option is.
 >

Thanks. Above points look sensible. +1 for the 2nd option i.e. instead of 
PQreset(entry->conn);, let's try to disconnect_pg_server() and 
connect_pg_server().

 >
 > What if 2nd attempt happens with have_prep_stmt=true? I'm not sure
 > if this case really happens, though. But if that can, it's strange to start
 > new connection with have_prep_stmt=true even when the caller of
 > GetConnection() doesn't intend to create any prepared statements.
 >
 > I think it's safer to do 2nd attempt in the same way as 1st one. Maybe
 > we can simplify the code by making them into common code block
 > or function.
 >

I don't think the have_prep_stmt will be set by the time we make 2nd attempt because 
entry->have_prep_stmt |= will_prep_stmt; gets hit only after we are successful in 
either 1st attempt or 2nd attempt. I think we don't need to set all transient state. 
No other transient state except changing_xact_state changes from 1st attempt to 2nd 
attempt(see below), so let's set only entry->changing_xact_state to false before 
2nd attempt.

1st attempt:
(gdb) p *entry
$3 = {key = 16389, conn = 0x55a896199990, xact_depth = 0, have_prep_stmt = 
false,
   have_error = false, changing_xact_state = false, invalidated = false,
   server_hashvalue = 3905865521, mapping_hashvalue = 2617776010}

2nd attempt i.e. in retry block:
(gdb) p *entry
$4 = {key = 16389, conn = 0x55a896199990, xact_depth = 0, have_prep_stmt = 
false,
   have_error = false, changing_xact_state = true, invalidated = false,
   server_hashvalue = 3905865521, mapping_hashvalue = 2617776010}

 >>
 > > If an error occurs in the first attempt, we return from
 > > pgfdw_get_result()'s  if (!PQconsumeInput(conn)) to the catch block we
 > > added and pgfdw_report_error() will never get called. And the below
 > > part of the code is reached only in scenarios as mentioned in the
 > > comments. Removing this might have problems if we receive errors other
 > > than CONNECTION_BAD or for subtxns. We could clear if any result and
 > > just rethrow the error upstream. I think no problem having this code
 > > here.
 >
 > But the orignal code works without this?
 > Or you mean that the original code has the bug?
 >

There's no bug in the original code. Sorry, I was wrong in saying 
pgfdw_report_error() will never get called with this patch. Indeed it gets 
called even when 1's attempt connection is failed. Since we added an extra 
try-catch block, we will not be throwing the error to the user, instead we make 
a 2nd attempt from the catch block.

I'm okay to remove below part of the code

 > >> +                       PGresult *res = NULL;
 > >> +                       res = PQgetResult(entry->conn);
 > >> +                       PQclear(res);
 > >> Are these really necessary? I was just thinking that's not because
 > >> pgfdw_get_result() and pgfdw_report_error() seem to do that
 > >> already in do_sql_command().

Please let me know if okay with the above agreed points, I will work on the new 
patch.


Yes, please work on the patch! Thanks! I may revisit the above points later, 
though ;)

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

Reply via email to