Re: [HACKERS] Streaming replication, and walsender during recovery

Fujii Masao Mon, 18 Jan 2010 22:04:30 -0800

On Mon, Jan 18, 2010 at 11:42 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Mon, 2010-01-18 at 09:31 -0500, Tom Lane wrote:
>> Fujii Masao <masao.fu...@gmail.com> writes:
>> > When I configured a cascaded standby (i.e, made the additional
>> > standby server connect to the standby), I got the following
>> > errors, and a cascaded standby didn't start replication.
>>
>> >   ERROR:  timeline 0 of the primary does not match recovery target timeline 1
>>
>> > I didn't care about that case so far. To avoid a confusing error
>> > message, we should forbid a startup of walsender during recovery,
>> > and emit a suitable message? Or support such cascade-configuration?
>> > Though I don't think that the latter is difficult to be implemented,
>> > ISTM it's not the time to do that now.
>>
>> It would be kind of silly to add code to forbid it if making it work
>> would be about the same amount of effort.  I think it'd be worth looking
>> closer to find out what the problem is.
>
> There is an ERROR, but no problem AFAICS. The tli isn't set until end of
> recovery because it doesn't need to have been set yet. That shouldn't
> prevent retrieving WAL data.


OK. Here is a patch that supports walsender processes during recovery:

* Change walsender so that, if it was started during recovery, it sends
  the WAL written by the walreceiver.
* Kill the walsenders that were started during recovery when recovery
  ends, because replication cannot survive the change of timeline ID.
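
To make the two changes concrete, here is a rough standalone sketch of the
logic. It is not part of the patch and does not use the real PostgreSQL
types: ToyLsn, ToySender, toy_send_limit() and toy_select_cascaded() are
hypothetical names invented for illustration only. It assumes the send
limit is simply the walreceiver's write position while in recovery, and
that a sender is "cascaded" if it was started during recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the real PostgreSQL types. */
typedef uint64_t ToyLsn;

typedef struct ToySender
{
	int		pid;		/* 0 means the slot is unused */
	bool	cascaded;	/* started while the server was in recovery */
} ToySender;

/*
 * First change: the upper bound of WAL that may be streamed.
 * During recovery only what the walreceiver has written is available;
 * otherwise the server's own write position is used.
 */
static ToyLsn
toy_send_limit(bool in_recovery, ToyLsn receiver_written, ToyLsn local_written)
{
	return in_recovery ? receiver_written : local_written;
}

/*
 * Second change: at the end of recovery, pick the senders to terminate.
 * Stores the chosen pids in out[] (which must have room for nslots
 * entries) and returns how many were chosen. The patch sends SIGTERM
 * to each such pid; this sketch only selects them.
 */
static size_t
toy_select_cascaded(const ToySender *slots, size_t nslots, int *out)
{
	size_t	n = 0;

	assert(slots != NULL && out != NULL);

	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].pid != 0 && slots[i].cascaded)
			out[n++] = slots[i].pid;
	}
	return n;
}
```

With slots {101, cascaded}, {0, cascaded} and {202, not cascaded}, only
pid 101 would be selected: the unused slot and the sender started after
recovery are left alone.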

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 6384,6389 **** StartupXLOG(void)
--- 6384,6397 ----
  		xlogctl->SharedRecoveryInProgress = false;
  		SpinLockRelease(&xlogctl->info_lck);
  	}
+ 
+ 	/*
+ 	 * Kill the walsender processes which were started during recovery
+ 	 * since they cannot survive the change of timeline ID at the end of
+ 	 * an archive recovery. This is the right place to do it because
+ 	 * no new 'cascaded' walsenders will be started from here on.
+ 	 */
+ 	ShutdownCascadedWalSnds();
  }
  
  /*
***************
*** 6666,6671 **** GetWriteRecPtr(void)
--- 6674,6682 ----
  	volatile XLogCtlData *xlogctl = XLogCtl;
  	XLogRecPtr	recptr;
  
+ 	if (LocalRecoveryInProgress)
+ 		return GetWalRcvWriteRecPtr();
+ 
  	SpinLockAcquire(&xlogctl->info_lck);
  	recptr = xlogctl->LogwrtResult.Write;
  	SpinLockRelease(&xlogctl->info_lck);
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 491,496 **** InitWalSnd(void)
--- 491,505 ----
  				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
  				 errmsg("sorry, too many standbys already")));
  
+ 	/*
+ 	 * Use the recovery target timeline ID during recovery.
+ 	 */
+ 	if (RecoveryInProgress())
+ 	{
+ 		MyWalSnd->cascaded = true;
+ 		ThisTimeLineID = GetRecoveryTargetTLI();
+ 	}
+ 
  	/* Arrange to clean up at walsender exit */
  	on_shmem_exit(WalSndKill, 0);
  }
***************
*** 506,511 **** WalSndKill(int code, Datum arg)
--- 515,521 ----
  	 * for this.
  	 */
  	MyWalSnd->pid = 0;
+ 	MyWalSnd->cascaded = false;
  
  	/* WalSnd struct isn't mine anymore */
  	MyWalSnd = NULL;
***************
*** 848,850 **** GetOldestWALSendPointer(void)
--- 858,880 ----
  	}
  	return oldest;
  }
+ 
+ /*
+  * Stop only the cascaded walsender processes.
+  */
+ void
+ ShutdownCascadedWalSnds(void)
+ {
+ 	int	i;
+ 
+ 	for (i = 0; i < MaxWalSenders; i++)
+ 	{
+ 		/* use volatile pointer to prevent code rearrangement */
+ 		volatile WalSnd	*walsnd = &WalSndCtl->walsnds[i];
+ 		pid_t	walsndpid;
+ 
+ 		walsndpid = walsnd->pid;
+ 		if (walsndpid != 0 && walsnd->cascaded)
+ 			kill(walsndpid, SIGTERM);
+ 	}
+ }
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
***************
*** 22,27 **** typedef struct WalSnd
--- 22,28 ----
  {
  	pid_t	pid;		/* this walsender's process id, or 0 */
  	XLogRecPtr sentPtr;	/* WAL has been sent up to this point */
+ 	bool	cascaded;	/* was this walsender started during recovery? */
  
  	slock_t	mutex;		/* locks shared variables shown above */
  } WalSnd;
***************
*** 45,49 **** extern void WalSndSignals(void);
--- 46,51 ----
  extern Size WalSndShmemSize(void);
  extern void WalSndShmemInit(void);
  extern XLogRecPtr GetOldestWALSendPointer(void);
+ extern void ShutdownCascadedWalSnds(void);
  
  #endif	/* _WALSENDER_H */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
