I have observed the following situation a few times now with 8.4.5.

Multiple psql clients are connected to the server; some of them are running a transaction and some of them are idle. One of the backends is then killed or crashes (using kill -9 <backend-pid>). The connection reset attempt from the active clients (that is, those that were in the middle of a transaction when the crash happened) fails, because they make the attempt immediately, while the server is still in its startup phase. As you can see from the following:

-----------------------
ACTIVE CLIENT
-----------------------
[amul@localhost ~]$ psql -p 5432 postgres
psql (8.4.5)
Type "help" for help.

postgres=# create table emp( id int,name varchar(20));
CREATE TABLE
postgres=# insert into emp values(generate_series(1,999999999),'XYZ');
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!

-----------------------
IDLE CLIENT
-----------------------
[amul@localhost ~]$ psql -p 5432 postgres
psql (8.4.5)
Type "help" for help.

postgres=# select pg_backend_pid();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=#

I went through this and found the following:

1. When a backend crashes, the server goes into recovery mode, and it takes a little time before it is back in the normal state and accepting connections.
2. The busy client (which was running a transaction before the crash) immediately tries to reconnect while the server is still in its startup phase, so it gets a negative reply and fails to reconnect.

So I think that, before sending the reconnect request, the client needs to wait for the server to reach a state in which it can accept connections. The wait should have some timeout. I am not sure whether this is the correct way to modify the code, or whether it has any other impact.
I tried making the client wait before it sends the reconnect request to the server. For that, I added some sleep time on the client side in src/bin/psql/common.c (that is, it changes things only for psql clients). Please check the attached patch for the modification.
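To make the idea concrete, here is a minimal sketch of a bounded wait-and-retry around the reset attempt. It is not the contents of the attached patch, only an illustration under assumptions: reset_with_retry, attempt_fn, sleep_ms and fake_attempt are hypothetical names, and the attempt is passed in as a callback so the sketch compiles without libpq. In psql the callback would presumably wrap PQreset(conn) followed by a check that PQstatus(conn) == CONNECTION_OK.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical callback type: performs ONE reconnect attempt and returns
 * true on success. In psql this would wrap PQreset() plus a PQstatus()
 * check; it is abstracted here so the sketch does not depend on libpq.
 */
typedef bool (*attempt_fn) (void *arg);

/* Sleep for the given number of milliseconds (POSIX nanosleep). */
static void
sleep_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/*
 * Try to reconnect up to max_attempts times, sleeping delay_ms between
 * failed attempts. The total wait is therefore bounded by roughly
 * (max_attempts - 1) * delay_ms, which acts as the timeout.
 */
static bool
reset_with_retry(attempt_fn attempt, void *arg, int max_attempts, long delay_ms)
{
	assert(attempt != NULL);

	for (int i = 0; i < max_attempts; i++)
	{
		if (attempt(arg))
			return true;
		/* no point in sleeping after the final failed attempt */
		if (i + 1 < max_attempts)
			sleep_ms(delay_ms);
	}
	return false;
}

/*
 * Stand-in for a server that is still starting up: the attempt fails
 * until the counter pointed to by arg has been counted down to zero.
 */
static bool
fake_attempt(void *arg)
{
	int		   *failures_left = arg;

	if (*failures_left > 0)
	{
		(*failures_left)--;
		return false;
	}
	return true;
}
```

With a counter of 3, fake_attempt rejects the first three attempts and accepts the fourth, so reset_with_retry(fake_attempt, &counter, 10, 1) returns true, while a limit of 3 attempts gives up and returns false. Retrying with a cap, instead of one fixed sleep, keeps the client from blocking forever if the server never comes back.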
0001-psql-connection-wait.patch