On Sun, Mar 6, 2011 at 1:53 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Sat, 2011-03-05 at 20:08 +0900, Fujii Masao wrote:
>> On Sat, Mar 5, 2011 at 7:28 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> > Yes, that can happen. As people will no doubt observe, this seems to be
>> > an argument for wait-forever. What we actually need is a wait that lasts
>> > longer than it takes for us to decide to failover, if the standby is
>> > actually up and this is some kind of split brain situation. That way the
>> > clients are still waiting when failover occurs. WAL is missing, but
>> > since we didn't acknowledge the client we are OK to treat that situation
>> > as if it were an abort.
>>
>> Oracle Data Guard in the maximum availability mode behaves that way?
>>
>> I'm sure that you are implementing something like the maximum availability
>> mode rather than the maximum protection one. So I'd like to know how
>> the data loss situation I described can be avoided in the maximum
>> availability mode.
>
> It can't. (Oracle or otherwise...)
>
> Once we begin waiting for sync rep, if the transaction or backend ends
> then other backends will be able to see the changed data. The only way
> to prevent that is to shut down the database to ensure that no readers or
> writers have access to it.
>
> Oracle's protection mechanism is to shut down the primary if there is no
> sync standby available. Maximum Protection. Any other mode must
> therefore be less than maximum protection, according to Oracle, and me.
> "Available" here means one that has not timed out, via parameter.
>
> Shutting down the main server is cool, as long as you fail over to one of
> the standbys. If there aren't any standbys, or you don't have a
> mechanism for switching quickly, you have availability problems.
>
> What shutting down the server doesn't do is keep the data safe for
> transactions that were in their commit-wait phase when the disconnect
> occurs. That data exists, yet will not have been transferred to the
> standby.
>
> From now on, I also say we should wait forever. It is the safest mode and I
> want no argument about whether sync rep is safe or not. We can introduce
> a more relaxed mode later with high availability for the primary. That
> is possible and in some cases desirable.
>
> Now, when we lose the last sync standby we have three choices:
>
> 1. reconnect the standby, or wait for a potential standby to catch up
>
> 2. immediate shutdown of the master and failover to one of the standbys
>
> 3. reclassify an async standby as a sync standby
>
> More than likely we would attempt (1) for a while, then do (2).
>
> This means that at startup the primary will freeze for a while
> until the sync standbys connect. Which is OK, since they try to
> reconnect every 5 seconds and we don't plan on shutting down the primary
> much anyway.
>
> I've removed the timeout parameter, plus if we begin waiting we wait
> until released, forever if that's how long it takes.
>
> The recommendation to use more than one standby remains.
>
> Fast shutdown will wake backends from their latch and there isn't any
> changed interrupt behaviour any more.
>
> synchronous_standby_names = '*' is no longer the default.
>
> On a positive note this is one less parameter and will improve
> performance as well.
>
> All above changes made.
>
> Ready to commit, barring concrete objections to important behaviour.
>
> I will do one final check tomorrow evening then commit.
I agree with this change.

One comment: what about introducing a built-in function that wakes up all the waiting backends? When the replication connection is closed, if we STONITH the standby, we can safely (avoiding physical data loss, though not logical) switch the primary to standalone mode. But there is currently no way to wake up the waiting backends. Setting synchronous_replication to OFF and reloading the configuration file doesn't affect the backends that are already waiting.

The attached patch introduces the "pg_wakeup_all_waiters" (better name?) function, which wakes up all the backends on the queue.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 13905,13910 **** SELECT set_config('log_statement_stats', 'off', false);
--- 13905,13917 ----
        <entry><type>boolean</type></entry>
        <entry>Terminate a backend</entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_wakeup_all_waiters()</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>Wake up all the backends waiting for replication</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
***************
*** 13939,13944 **** SELECT set_config('log_statement_stats', 'off', false);
--- 13946,13956 ----
      subprocess.
     </para>
  
+    <para>
+     <function>pg_wakeup_all_waiters</> signals all the backends waiting
+     for replication, to wake up and complete the transaction.
+    </para>
+ 
    <indexterm>
     <primary>backup</primary>
    </indexterm>
*** a/src/backend/replication/syncrep.c
--- b/src/backend/replication/syncrep.c
***************
*** 72,78 ****
  static void SyncRepWaitOnQueue(XLogRecPtr XactCommitLSN);
  static void SyncRepQueueInsert(void);
  static int	SyncRepGetStandbyPriority(void);
! static int	SyncRepWakeQueue(void);
  
  /*
   * ===========================================================
--- 72,78 ----
  static void SyncRepWaitOnQueue(XLogRecPtr XactCommitLSN);
  static void SyncRepQueueInsert(void);
  static int	SyncRepGetStandbyPriority(void);
! static int	SyncRepWakeQueue(bool wakeup_all);
  
  /*
   * ===========================================================
***************
*** 180,196 **** SyncRepWaitOnQueue(XLogRecPtr XactCommitLSN)
  		 * unlikely to be far enough, yet is possible. Next time we are
  		 * woken we should be more lucky.
  		 */
! 		if (XLByteLE(XactCommitLSN, walsndctl->lsn))
  		{
  			SHMQueueDelete(&(MyProc->syncrep_links));
  			LWLockRelease(SyncRepLock);
  
- 			/*
- 			 * Reset our waitLSN.
- 			 */
- 			MyProc->waitLSN.xlogid = 0;
- 			MyProc->waitLSN.xrecoff = 0;
- 
  			if (new_status)
  			{
  				/* Reset ps display */
--- 180,192 ----
  		 * unlikely to be far enough, yet is possible. Next time we are
  		 * woken we should be more lucky.
  		 */
! 		if (XLByteLE(XactCommitLSN, walsndctl->lsn) ||
! 			(MyProc->waitLSN.xlogid == 0 &&
! 			 MyProc->waitLSN.xrecoff == 0))
  		{
  			SHMQueueDelete(&(MyProc->syncrep_links));
  			LWLockRelease(SyncRepLock);
  
  			if (new_status)
  			{
  				/* Reset ps display */
***************
*** 347,353 **** SyncRepReleaseWaiters(void)
  		 * release up to this location.
  		 */
  		walsndctl->lsn = MyWalSnd->flush;
! 		numprocs = SyncRepWakeQueue();
  	}
  
  	LWLockRelease(SyncRepLock);
--- 343,349 ----
  		 * release up to this location.
  		 */
  		walsndctl->lsn = MyWalSnd->flush;
! 		numprocs = SyncRepWakeQueue(false);
  	}
  
  	LWLockRelease(SyncRepLock);
***************
*** 427,436 **** SyncRepGetStandbyPriority(void)
   * to be woken. We don't modify the queue, we leave that for individual
   * procs to release themselves.
   *
   * Must hold SyncRepLock
   */
  static int
! SyncRepWakeQueue(void)
  {
  	volatile WalSndCtlData *walsndctl = WalSndCtl;
  	PGPROC	   *proc;
--- 423,434 ----
   * to be woken. We don't modify the queue, we leave that for individual
   * procs to release themselves.
   *
+  * If 'wakeup_all' is true, set the latches of all procs in the queue.
+  *
   * Must hold SyncRepLock
   */
  static int
! SyncRepWakeQueue(bool wakeup_all)
  {
  	volatile WalSndCtlData *walsndctl = WalSndCtl;
  	PGPROC	   *proc;
***************
*** 445,454 **** SyncRepWakeQueue(void)
  		/*
  		 * Assume the queue is ordered by LSN
  		 */
! 		if (XLByteLT(walsndctl->lsn, proc->waitLSN))
  			return numprocs;
  
  		numprocs++;
  		SetLatch(&(proc->waitLatch));
  		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
  									   &(proc->syncrep_links),
--- 443,454 ----
  		/*
  		 * Assume the queue is ordered by LSN
  		 */
! 		if (!wakeup_all && XLByteLT(walsndctl->lsn, proc->waitLSN))
  			return numprocs;
  
  		numprocs++;
+ 		proc->waitLSN.xlogid = 0;
+ 		proc->waitLSN.xrecoff = 0;
  		SetLatch(&(proc->waitLatch));
  		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
  									   &(proc->syncrep_links),
***************
*** 457,459 **** SyncRepWakeQueue(void)
--- 457,477 ----
  
  	return numprocs;
  }
+ 
+ /*
+  * Wake up all the waiting backends
+  */
+ Datum
+ pg_wakeup_all_waiters(PG_FUNCTION_ARGS)
+ {
+ 	if (!superuser())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 (errmsg("must be superuser to signal other server processes"))));
+ 
+ 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+ 	SyncRepWakeQueue(true);
+ 	LWLockRelease(SyncRepLock);
+ 
+ 	PG_RETURN_VOID();
+ }
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***************
*** 2869,2874 **** DATA(insert OID = 3821 (  pg_last_xlog_replay_location	PGNSP PGUID 12 1 0 0 f f f
--- 2869,2876 ----
  DESCR("last xlog replay location");
  DATA(insert OID = 3830 (  pg_last_xact_replay_timestamp	PGNSP PGUID 12 1 0 0 f f f t f v 0 0 1184 "" _null_ _null_ _null_ _null_	pg_last_xact_replay_timestamp _null_ _null_ _null_ ));
  DESCR("timestamp of last replay xact");
+ DATA(insert OID = 3831 (  pg_wakeup_all_waiters	PGNSP PGUID 12 1 0 0 f f f t f v 0 0 2278 "" _null_ _null_ _null_ _null_	pg_wakeup_all_waiters _null_ _null_ _null_ ));
+ DESCR("wake up all waiters");
  DATA(insert OID = 3071 (  pg_xlog_replay_pause	PGNSP PGUID 12 1 0 0 f f f t f v 0 0 2278 "" _null_ _null_ _null_ _null_	pg_xlog_replay_pause _null_ _null_ _null_ ));
  DESCR("pause xlog replay");
*** a/src/include/replication/syncrep.h
--- b/src/include/replication/syncrep.h
***************
*** 34,37 **** extern void SyncRepCleanupAtProcExit(int code, Datum arg);
--- 34,40 ----
  extern void SyncRepInitConfig(void);
  extern void SyncRepReleaseWaiters(void);
  
+ /* system administration functions */
+ extern Datum pg_wakeup_all_waiters(PG_FUNCTION_ARGS);
+ 
  #endif   /* _SYNCREP_H */
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers