On Wed, Mar 16, 2011 at 7:39 AM, Robert Haas <robertmh...@gmail.com> wrote: >>> The only idea I have for allowing fast shutdown to still be fast, even >>> when sync rep is involved, is to shut down the system in two phases. >>> The postmaster would need to stop accepting new connections, and first >>> kill off all the backends that aren't waiting for sync rep. Then, >>> once all remaining backends are waiting for sync rep, we can have them >>> proceed as above: close the connection without acking the commit or >>> throwing ERROR/FATAL, and exit. That's pretty complicated, especially >>> given the rule that the postmaster mustn't touch shared memory, but I >>> don't see any alternative. >> >> What extra capability are we actually delivering by doing that?? >> The risk of introducing a bug and thereby losing data far outweighs the >> rather dubious benefit. > > Well, my belief is that when users ask the database to shut down, they > want it to work. If I'm the only one who thinks that, then whatever. > But I firmly believe we'll get bug reports about this.
On further review, the approach proposed above doesn't really work, because a backend can get a SIGTERM either because the system is doing a fast shutdown or because a user has issued pg_terminate_backend(PID); and in the latter case we have to continue letting in connections. As of right now, synchronous replication continues to wait even when: - someone tries to perform a fast shutdown - someone tries to kill the backend using pg_terminate_backend() - someone attempts to cancel the query using pg_cancel_backend() or by pressing control-C in, for example, psql - someone attempts to shut off synchronous replication by changing synchronous_standby_names in postgresql.conf and issuing pg_ctl reload We've worked pretty hard to ensure that things like query cancel and shutdown work quickly and reliably, and I don't think we want to make synchronous replication the one part of the system that departs from that general principle. So, patch attached. This patch arranges to do the following things: 1. If a die interrupt is received (pg_terminate_backend or fast shutdown), then terminate the sync rep wait and arrange for the connection to be closed without acknowledging the commit (but do send a warning message back). The commit still happened, though, so other transactions will see its effects. This is unavoidable unless we're willing to either ignore attempts to terminate a backend waiting for sync rep, or panic the system when it happens, and I don't think either of those is appropriate. 2. If a query cancel interrupt is received (pg_cancel_backend or ^C), then cancel the sync rep wait and issue a warning before acknowledging the commit. Again, the alternative is to either ignore the cancel or panic, neither of which I believe to be what users will want. 3. If synchronous_standby_names is changed to '' by editing postgresql.conf and issuing pg_ctl reload, then cancel all waits in progress and wake everybody up. As I mentioned before, reloading the config file from within the waiting backend (which can't safely throw an error) seems risky, so what I did instead is made WAL writer responsible for handling this. Nobody's allowed to wait for sync rep unless a global shared memory flag is set, and the WAL writer process is responsible for setting and clearing this flag when the config file is reloaded. This has basically no performance cost; WAL writer only ever does any extra work at all with this code when it receives a SIGHUP, and even then the work is trivial except in the case where synchronous_standby_names has changed from empty to non-empty or visca versa. The advantage of putting this in WAL writer rather than, say, bgwriter is that WAL writer doesn't have nearly as many jobs to do and they don't involve nearly as much I/O, so the chances of a long delay due to the process being busy are much less. 4. Remove the SYNC_REP_MUST_DISCONNECT state, which actually does absolutely nothing right now, despite what the name would seem to imply. In particular, it doesn't arrange for any sort of disconnect. This patch does arrange for that, but not using this mechanism. 5. The existing code relies on being able to read MyProc->syncRepState without holding the lock, even while a WAL sender must be updating it in another process. I'm not 100% sure this is safe on a multi-processor machine with weak memory ordering. In practice, the chances of something going wrong here seem extremely small. You'd need something like this: a WAL sender updates MyProc->syncRepState just after the wait timeout expires and before the latch is reset, but the regular backend fails to see the state change due to memory-ordering effects and drops through the loop, waiting another 60 s, and then finally wakes up and completes the wait (but a minute later than expected). That seems vanishingly unlikely but it's also simple to protect against, so I did. Review appreciated. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
sync-rep-wait-fixes.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers