On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
>> On the other hand, I see no particular
>> harm in leaving the option in there either, though I definitely think
>> the default should be changed to -1.
>
> The default setting should be to *not* freeze up if you lose the
> standby. That behaviour unexpectedly leads to an effective server down
> situation, rather than 2 minutes of slow running.
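Whether losing the standby costs "2 minutes of slow running" or something much worse depends on whether the timeout is paid once or on every commit. The C sketch below is a hypothetical model, not code from the patch. The function names are invented. It assumes that a timeout of -1 means "wait indefinitely" and that each commit waits independently for the standby's reply, which is how Robert reads the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical model (not PostgreSQL source): how long one commit blocks
 * waiting for a standby acknowledgment.
 *   ack_after_ms : when the standby would acknowledge, or -1 if it never will
 *   timeout_ms   : wait limit; -1 is ASSUMED to mean "wait forever"
 * Returns the time spent waiting, or -1 if the commit would block forever.
 */
long long commit_wait_ms(long long ack_after_ms, long long timeout_ms)
{
    bool acked = ack_after_ms >= 0;
    bool bounded = timeout_ms >= 0;

    if (acked && (!bounded || ack_after_ms <= timeout_ms))
        return ack_after_ms;    /* standby replied before any timeout */
    if (bounded)
        return timeout_ms;      /* gave up once the timeout expired */
    return -1;                  /* no reply and no timeout: commit hangs */
}

/*
 * Total blocking time for `commits` commits issued one after another,
 * ASSUMING every commit pays its own wait (nothing remembers that the
 * standby already timed out on an earlier commit).
 */
long long total_wait_ms(int commits, long long ack_after_ms, long long timeout_ms)
{
    long long per_commit = commit_wait_ms(ack_after_ms, timeout_ms);

    if (per_commit < 0)
        return -1;              /* the first commit never returns */
    return per_commit * commits;
}
```

For example, with the standby gone and a 120000 ms (two-minute) timeout, total_wait_ms(30, -1, 120000) gives 3600000 ms: thirty serialized commits would stall for an hour in total. With a timeout of -1, the first commit simply hangs.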
My understanding is that the parameter will wait on every commit, not
just once. There's no mechanism to do anything else. But I did some
testing this evening and actually it appears to not work at all. I hit
walreceiver with a SIGSTOP and the commit never completes, even after
the two-minute timeout. Also, when I restarted walreceiver after a long
time, I got a server crash.

DEBUG: write 0/3027BC8 flush 0/3014690 apply 0/3014690
DEBUG: released 0 procs up to 0/3014690
DEBUG: write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
DEBUG: released 2 procs up to 0/3027BC8
WARNING: could not locate ourselves on wait queue
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
Failed.
!>
LOG: record with zero length at 0/3027BC8
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 0/3000000
LOG: streaming replication successfully connected to primary
DEBUG: standby "standby" is a potential synchronous standby
DEBUG: write 0/0 flush 0/0 apply 0/3027BC8
DEBUG: released 0 procs up to 0/0
DEBUG: standby "standby" has now caught up with primary
DEBUG: write 0/3027C18 flush 0/0 apply 0/3027BC8
DEBUG: standby "standby" is now the synchronous replication standby
DEBUG: released 0 procs up to 0/0
DEBUG: write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
DEBUG: released 0 procs up to 0/3027C18
DEBUG: write 0/3027C18 flush 0/3027C18 apply 0/3027C18
DEBUG: released 0 procs up to 0/3027C18
(lots more copies of those last two messages)

I believe the problem is that the definition of IsOnSyncRepQueue is
bogus, so that the loop in SyncRepWaitOnQueue always takes the first
branch.

When setting this up, I found it a little confusing that setting only
synchronous_replication did nothing; I also had to set
synchronous_standby_names. We might need a cross-check there.

I believe the docs for synchronous_replication also need some
updating; this part appears to be out of date:

+        between primary and standby. The commit wait will last until the
+        first reply from any standby. Multiple standby servers allow
+        increased availability and possibly increase performance as well.

The words "on the primary" in the next sentence may not be necessary
any more either, as I believe this parameter now has no effect
anywhere else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
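The cross-check suggested above, between synchronous_replication and synchronous_standby_names, could look something like the following C sketch. It is hypothetical: neither function exists in the patch. It assumes that a NULL, empty, or blank (whitespace/comma-only) synchronous_standby_names means no standby can ever become the synchronous one. Under that assumption, synchronous_replication = on would silently do nothing.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the standby-name list contains no names,
 * i.e. it is NULL, empty, or made only of whitespace and commas. */
static bool names_list_is_empty(const char *names)
{
    if (names == NULL)
        return true;
    for (const char *p = names; *p != '\0'; p++)
    {
        if (!isspace((unsigned char) *p) && *p != ',')
            return false;       /* found part of a real standby name */
    }
    return true;
}

/*
 * Hypothetical cross-check (not PostgreSQL code): returns false when
 * synchronous replication is requested but no standby is named, the
 * combination that appeared to have no effect during testing.
 */
bool sync_rep_settings_consistent(bool synchronous_replication,
                                  const char *synchronous_standby_names)
{
    if (!synchronous_replication)
        return true;            /* feature off: nothing to check */
    return !names_list_is_empty(synchronous_standby_names);
}
```

A caller, for instance a check run when the configuration is loaded, could emit a WARNING whenever sync_rep_settings_consistent(true, "") returns false, while a setting such as "standby" passes.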