Re: [HACKERS] Sync Rep v17

Fujii Masao Wed, 02 Mar 2011 06:31:15 -0800

On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <[email protected]> wrote:
> The WALSender deliberately does *not* wake waiting users if the standby
> disconnects. Doing so would break the whole reason for having sync rep
> in the first place. What we do is allow a potential standby to takeover
> the role of sync standby, if one is available. Or the failing standby
> can reconnect and then release waiters.


If a potential standby exists when the synchronous standby goes away, I
agree that it's not a good idea to release the waiting backends immediately.
In that case, those backends should wait for the next synchronous standby.

On the other hand, if there is no potential standby, I think the waiting
backends should not wait for the timeout and should wake up as soon as
the synchronous standby has gone. Otherwise, those backends are suspended
for a long time (i.e., until the timeout expires), which would reduce
availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is one
option. But that looks like the behavior for "allow_standalone_primary = off".
If allow_standalone_primary = on, it seems more natural to let the
primary work alone without waiting for the timeout.
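
The policy argued for above can be written down as a small decision
function. This is only an illustrative sketch in Python, not code from the
patch: the function name, the Action enum and the two boolean inputs are
hypothetical, and only allow_standalone_primary corresponds to a setting
named in this thread.

```python
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep waiting"
    RELEASE_NOW = "release now"


def on_sync_standby_lost(potential_standby_available: bool,
                         allow_standalone_primary: bool) -> Action:
    """Decide what waiting backends do when the sync standby goes away.

    Hypothetical helper; it only mirrors the three cases discussed above.
    """
    if potential_standby_available:
        # Another standby can take over the sync role, so wait for it.
        return Action.KEEP_WAITING
    if allow_standalone_primary:
        # Nobody can take over and the primary may run alone: wake up at
        # once instead of stalling until the timeout expires.
        return Action.RELEASE_NOW
    # allow_standalone_primary = off: wait for the standby to reconnect.
    return Action.KEEP_WAITING
```

Only the combination "no potential standby" plus "allow_standalone_primary
= on" releases the waiters early; the other cases keep them waiting.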

> If we shutdown, then we wait for the shutdown commit record to be
> transferred to our standby, so a normal or fast shutdown of the master
> always leaves all connected standbys up to date. We already do that, so
> sync rep doesn't touch that behaviour. If a standby is disconnected,
> then it doesn't receive the shutdown checkpoint record.
>
> The wait state for a commit does not persist when we shutdown and
> restart.
>
> Can you restate which bits of the above you think need to be changed?

What I'm thinking is: when waiting backends are released by the timeout
while a fast shutdown is in progress on the master, those backends should
not return a success indication to the client. Of course, in that case the
WAL has already been flushed on the master, but I think those backends
should exit with a FATAL error before returning success. This avoids
breaking the synchronous replication rule, i.e., every transaction that
the client knows as committed must be committed on the synchronous standby
after failover.

If we allow those backends to return success in that situation, the
following scenario, which can cause data loss, can happen.

1. The primary is running with allow_standalone_primary = on. There
    is only one (synchronous) standby connected.
2. The replication connection is closed because of a network outage.
3. While some backends are waiting for replication, the user requests
    a fast shutdown of the master.
4. Since the timeout expires, those backends stop waiting and return
    the success indication to the client (although the transactions
    have not been replicated to the standby).
5. Since no backend is waiting for replication any more, the fast
    shutdown completes.
6. Clusterware such as Pacemaker detects the death of the primary
    and triggers a failover.
7. The new primary lacks some transactions that were reported to the
    client as committed, i.e., transactions are lost!!
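
The seven steps can be replayed in a toy model to see where the
acknowledged-but-unreplicated commits come from, and how exiting with FATAL
instead of returning success avoids them. This is a simplified simulation in
Python under stated assumptions, not PostgreSQL code: run_scenario, the
fatal_on_shutdown_timeout flag and the transaction names are all made up
for illustration.

```python
def run_scenario(fatal_on_shutdown_timeout: bool) -> dict:
    """Replay the scenario; return what the client saw and what was lost."""
    primary_wal = []   # commits flushed on the primary
    standby_wal = []   # commits received by the standby
    acked = []         # commits reported to the client as successful

    # Steps 1-2: the only synchronous standby loses its connection.
    standby_connected = False

    # Step 3: backends flush their commits locally and start waiting;
    # then a fast shutdown is requested.
    waiting = ["tx1", "tx2"]
    primary_wal.extend(waiting)
    shutdown_in_progress = True

    # Step 4: the timeout expires for every waiting backend.
    for tx in waiting:
        replicated = standby_connected
        if replicated:
            standby_wal.append(tx)
        if replicated or not (shutdown_in_progress and fatal_on_shutdown_timeout):
            acked.append(tx)  # client is told the commit succeeded
        # Otherwise the backend exits with FATAL and the client sees no
        # success (the proposed behavior).

    # Steps 5-7: shutdown completes and the standby is promoted.
    new_primary_wal = standby_wal
    lost = [tx for tx in acked if tx not in new_primary_wal]
    return {"acked": acked, "lost": lost}
```

With fatal_on_shutdown_timeout=False the model reports both transactions as
acknowledged and lost; with True nothing is acknowledged, so nothing the
client believes committed is missing after failover.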

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
