On Mon, Jan 18, 2010 at 11:42 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Mon, 2010-01-18 at 09:31 -0500, Tom Lane wrote:
>> Fujii Masao <masao.fu...@gmail.com> writes:
>> > When I configured a cascaded standby (i.e, made the additional
>> > standby server connect to the standby), I got the following
>> > errors, and a cascaded standby didn't start replication.
>>
>> >   ERROR: timeline 0 of the primary does not match recovery target
>> > timeline 1
>>
>> > I didn't care about that case so far. To avoid a confusing error
>> > message, we should forbid a startup of walsender during recovery,
>> > and emit a suitable message? Or support such cascade-configuration?
>> > Though I don't think that the latter is difficult to be implemented,
>> > ISTM it's not the time to do that now.
>>
>> It would be kind of silly to add code to forbid it if making it work
>> would be about the same amount of effort. I think it'd be worth looking
>> closer to find out what the problem is.
>
> There is an ERROR, but no problem AFAICS. The tli isn't set until end of
> recovery because it doesn't need to have been set yet. That shouldn't
> prevent retrieving WAL data.

OK. Here is a patch that supports a walsender process during recovery:

* Change walsender so that it sends the WAL written by the walreceiver
  if it has been started during recovery.
* Kill the walsenders started during recovery at the end of recovery,
  because replication cannot survive the change of timeline ID.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 6384,6389 **** StartupXLOG(void)
--- 6384,6397 ----
  		xlogctl->SharedRecoveryInProgress = false;
  		SpinLockRelease(&xlogctl->info_lck);
  	}
+ 
+ 	/*
+ 	 * Kill the walsender processes which were started during recovery
+ 	 * since they cannot survive the change of timeline ID at the end of
+ 	 * an archive recovery. Here is the right place to do that because
+ 	 * new 'cascaded' walsender will not be started from here on.
+ 	 */
+ 	ShutdownCascadedWalSnds();
  }
  
  /*
***************
*** 6666,6671 **** GetWriteRecPtr(void)
--- 6674,6682 ----
  	volatile XLogCtlData *xlogctl = XLogCtl;
  	XLogRecPtr	recptr;
  
+ 	if (LocalRecoveryInProgress)
+ 		return GetWalRcvWriteRecPtr();
+ 
  	SpinLockAcquire(&xlogctl->info_lck);
  	recptr = xlogctl->LogwrtResult.Write;
  	SpinLockRelease(&xlogctl->info_lck);
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 491,496 **** InitWalSnd(void)
--- 491,505 ----
  				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
  				 errmsg("sorry, too many standbys already")));
  
+ 	/*
+ 	 * Use the recovery target timeline ID during recovery.
+ 	 */
+ 	if (RecoveryInProgress())
+ 	{
+ 		MyWalSnd->cascaded = true;
+ 		ThisTimeLineID = GetRecoveryTargetTLI();
+ 	}
+ 
  	/* Arrange to clean up at walsender exit */
  	on_shmem_exit(WalSndKill, 0);
  }
***************
*** 506,511 **** WalSndKill(int code, Datum arg)
--- 515,521 ----
  	 * for this.
  	 */
  	MyWalSnd->pid = 0;
+ 	MyWalSnd->cascaded = false;
  
  	/* WalSnd struct isn't mine anymore */
  	MyWalSnd = NULL;
***************
*** 848,850 **** GetOldestWALSendPointer(void)
--- 858,880 ----
  	}
  	return oldest;
  }
+ 
+ /*
+  * Stop only the cascaded walsender processes.
+  */
+ void
+ ShutdownCascadedWalSnds(void)
+ {
+ 	int			i;
+ 
+ 	for (i = 0; i < MaxWalSenders; i++)
+ 	{
+ 		/* use volatile pointer to prevent code rearrangement */
+ 		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ 		pid_t		walsndpid;
+ 
+ 		walsndpid = walsnd->pid;
+ 		if (walsndpid != 0 && walsnd->cascaded)
+ 			kill(walsndpid, SIGTERM);
+ 	}
+ }
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
***************
*** 22,27 **** typedef struct WalSnd
--- 22,28 ----
  {
  	pid_t		pid;			/* this walsender's process id, or 0 */
  	XLogRecPtr	sentPtr;		/* WAL has been sent up to this point */
+ 	bool		cascaded;		/* this walsender is started during recovery? */
  
  	slock_t		mutex;			/* locks shared variables shown above */
  } WalSnd;
***************
*** 45,49 **** extern void WalSndSignals(void);
--- 46,51 ----
  extern Size WalSndShmemSize(void);
  extern void WalSndShmemInit(void);
  extern XLogRecPtr GetOldestWALSendPointer(void);
+ extern void ShutdownCascadedWalSnds(void);
  
  #endif   /* _WALSENDER_H */
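
For readers who want to poke at the idea outside the backend, here is a
rough standalone sketch of the two behaviours the patch adds. It is not
PostgreSQL code: SenderSlot, MAX_SLOTS, claim_slot, release_slot,
collect_cascaded and send_limit are all hypothetical names, the slot table
is a plain array instead of shared memory, and instead of calling
kill(pid, SIGTERM) it only collects the pids that would be signalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared walsender slot table. */
#define MAX_SLOTS 4

typedef struct SenderSlot
{
	int			pid;		/* 0 means the slot is free */
	bool		cascaded;	/* started while recovery was in progress? */
} SenderSlot;

/* Claim the first free slot; returns its index, or -1 if the table is full. */
int
claim_slot(SenderSlot *slots, int pid, bool in_recovery)
{
	for (int i = 0; i < MAX_SLOTS; i++)
	{
		if (slots[i].pid == 0)
		{
			slots[i].pid = pid;
			slots[i].cascaded = in_recovery;
			return i;
		}
	}
	return -1;
}

/* Release a slot and clear the flag so the next owner starts clean. */
void
release_slot(SenderSlot *slots, int idx)
{
	assert(idx >= 0 && idx < MAX_SLOTS);
	slots[idx].pid = 0;
	slots[idx].cascaded = false;
}

/*
 * Collect the pids of all cascaded senders into 'out' (which must have
 * room for MAX_SLOTS entries) and return how many were found.  The patch
 * sends SIGTERM at this point; the sketch only reports the targets.
 */
int
collect_cascaded(const SenderSlot *slots, int *out)
{
	int			n = 0;

	for (int i = 0; i < MAX_SLOTS; i++)
	{
		if (slots[i].pid != 0 && slots[i].cascaded)
			out[n++] = slots[i].pid;
	}
	return n;
}

/*
 * Upper bound of WAL that a sender may ship: during recovery it is what
 * the receiver has written, otherwise what was written locally.
 */
uint64_t
send_limit(bool in_recovery, uint64_t local_write, uint64_t received_write)
{
	return in_recovery ? received_write : local_write;
}
```

The point of keeping the flag per slot is that senders started after the
end of recovery are left alone; only those marked as cascaded are picked
up when recovery finishes.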

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers