On Thu, Jul 2, 2020 at 4:29 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > This is actually strange. AFAIR the code, without looking at the > > current code, when a query picks a foreign connection it checks its > > state. It's possible that the connection has not been marked bad by > > the time you fire new query. If the problem exists probably we should > > fix it anyway since the backend at the other end of the connection has > > higher chances of being killed while the connection was sitting idle > > in the cache. > > > > Thanks Ashutosh for the suggestion. One way, we could solve the above > problem is that, upon firing the new foreign query from local backend using > cached > connection, (assuming the remote backend/session that was cached in the local > backed got > killed by some means), instead of failing the query in the local > backend/session, upon > detecting error from remote backend, we could just delete the cached old > entry and try getting another > connection to remote backend/session, cache it and proceed to submit the > query. This has to happen only at > the beginning of remote xact.
Yes, I believe that would be good. > > This way, instead of failing(as mentioned above " server closed the > connection unexpectedly"), > the query succeeds if the local session is able to get a new remote backend > connection. > In GetConnection() there's a comment /* * We don't check the health of cached connection here, because it would * require some overhead. Broken connection will be detected when the * connection is actually used. */ Possibly this is where you want to check the health of connection when it's being used the first time in a transaction. > I worked on a POC patch to prove the above point. Attaching the patch. > Please note that, the patch doesn't contain comments and has some issues like > having some new > variable in PGconn structure and the things like. I don't think changing the PGConn structure for this is going to help. It's a libpq construct and used by many other applications/tools other than postgres_fdw. Instead you could use ConnCacheEntry for the same. See how we track invalidated connection and reconnect upon invalidation. > > If the approach makes some sense, then I can rework properly on the patch and > probably > can open another thread for the review and other stuff. > > The way I tested the patch: > > 1. select * from foreign_tbl; > /*from local session - this results in a > remote connection being cached in > the connection cache and > a remote backend/session is opened. > */ > 2. kill the remote backend/session > 3. select * from foreign_tbl; > /*from local session - without patch > this throws error "ERROR: server closed the connection unexpectedly" > with path - try to use > the cached connection at the beginning of remote xact, upon receiving > error from remote postgres > server, instead of aborting the query, delete the cached entry, try to > get a new connection, if it > gets, cache it and use that for executing the query, query succeeds. > */ This will work. Be cognizant of the fact that the same connection may be used by multiple plan nodes. -- Best Wishes, Ashutosh Bapat