Setting up failback safe standby

*** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** *** 1749,1754 **** include 'filename' --- 1749,1787 ---- + + synchronous_transfer (enum) + + synchronous_transfer configuration parameter + + + + This parameter controls the synchronous nature of WAL transfer and + maintains file system level consistency between the master server and + the standby server. It specifies whether the master server delays a file + system level change (for example, modifying a data page) until the + corresponding WAL records are replicated to the standby server. + + + Valid values are commit, data_flush and + all. The default value is commit, meaning + that the master waits only for transaction commits. This is equivalent + to turning off the synchronous_transfer parameter, and the standby + server behaves as a synchronous standby in + Streaming Replication. For the value data_flush, the master + waits only for data page modifications but not for transaction + commits, hence the standby server acts as an asynchronous + failback safe standby. For the value all, the master waits + for data page modifications as well as for transaction commits, and + the resulting standby server acts as a synchronous failback safe + standby. The wait is on background activities and hence is not + expected to add much performance overhead. + To configure a synchronous failback safe standby, + should be set. + + + + wal_sync_method (enum) *************** *** 2258,2271 **** include 'filename' ! Specifies a comma-separated list of standby names that can support ! synchronous replication, as described in ! . ! At any one time there will be at most one active synchronous standby; ! transactions waiting for commit will be allowed to proceed after ! this standby server confirms receipt of their data. ! The synchronous standby will be the first standby named in this list ! that is both currently connected and streaming data in real-time (as shown by a state of streaming in the pg_stat_replication view). --- 2291,2315 ---- ! 
Specifies a comma-separated list of standby names. If this parameter ! is set, the standby behaves either as a synchronous standby in replication, ! as described in or as a synchronous ! failback safe standby, as described in . ! At any time there will be at most one active standby. When the standby is a ! synchronous standby in replication, transactions waiting for commit ! are allowed to proceed after this standby server confirms receipt ! of their data. When the standby is a synchronous failback safe standby, ! both data page modifications and transaction commits are allowed ! to proceed only after this standby server confirms receipt of their data. ! If this parameter is empty and synchronous_transfer ! is set to data_flush, ! the standby is called an asynchronous failback safe standby and only ! data page modifications wait until the corresponding WAL record is ! replicated to the standby. ! ! ! The synchronous standby in replication will be the first standby named in ! this list that is both currently connected and streaming data in real-time (as shown by a state of streaming in the pg_stat_replication view). *** a/doc/src/sgml/high-availability.sgml --- b/doc/src/sgml/high-availability.sgml *************** *** 1140,1145 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' --- 1140,1203 ---- + + + Setting up failback safe standby + + + Setting up failback safe standby + + + + PostgreSQL Streaming Replication offers durability, but if the master + crashes before a particular WAL record reaches the standby + server, that WAL record is present on the master server but not + on the standby server. In such a case the master is ahead of the standby server + in terms of WAL records and data in the database. This leads to + file-system level inconsistency between the master and standby servers. + For example, a heap page update on the master might not have been reflected + on the standby when the master crashes. 
+ + + + Because of this inconsistency, a fresh backup of the new master onto the new standby + is needed to re-prepare the HA cluster. Taking a fresh backup can be a very + time-consuming process when the database is large. In such a case + disaster recovery can take a very long time if Streaming Replication is + used to set up the high availability cluster. + + + + If the HA cluster is configured with a failback safe standby, this fresh + backup can be avoided. The synchronous_transfer + parameter controls all WAL transfers and does not allow any file + system level change until the master gets a confirmation from the standby server. + This avoids the need for a fresh backup by maintaining consistency. + + + + Basic Configuration + + A failback safe standby can be asynchronous or synchronous in nature. + This depends on whether the master waits for transaction commits + or not. By default the failback safe mechanism is turned off. + + + + The first step in configuring HA with a failback safe standby is to set up + synchronous streaming replication. Configuring a synchronous failback + safe standby requires setting synchronous_transfer to + all. This configuration causes each commit and data + page modification to wait for confirmation that the standby has written + the corresponding WAL record to durable storage. Configuring an asynchronous + failback safe standby requires setting synchronous_transfer + to data_flush. This configuration causes only data + page modifications to wait for confirmation that the standby has written + the corresponding WAL record to durable storage. + + + + *************** *** 1201,1212 **** primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' ! So, switching from primary to standby server can be fast but requires ! some time to re-prepare the failover cluster. Regular switching from ! primary to standby is useful, since it allows regular downtime on ! each system for maintenance. This also serves as a test of the ! failover mechanism to ensure that it will really work when you need it. ! 
Written administration procedures are advised. --- 1259,1286 ---- ! At the time of failover there is a possibility of file-system level ! inconsistency between the old primary and the old standby server, hence ! a fresh backup from the new master onto the old master is needed for configuring ! the old master server as a new standby server. Without a fresh ! backup, even if the new standby starts, streaming replication does not ! start successfully. Taking the backup can be fast for a small ! database, but for a large database it requires more time to re-prepare the ! failover cluster and could break the service level agreement for crash ! recovery. The need for a fresh backup and the problem of long ! recovery time can be solved if the HA cluster is configured with a ! failback safe standby; see . ! A failback safe standby makes WAL transfer synchronous at the required ! places and maintains file-system level consistency between the ! master and standby servers, so the old master server can easily be ! configured as the new standby server. ! ! ! ! Regular switching from primary to standby is useful, since it allows ! regular downtime on each system for maintenance. This also serves as ! a test of the failover mechanism to ensure that it will really work ! when you need it. Written administration procedures are advised. *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *************** *** 1569,1574 **** SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; --- 1569,1582 ---- corruption) in case of a crash of the database alone. + + + + Set synchronous_transfer to commit; there is no + need to guard against database inconsistency between master and standby + server during failover. + + *** a/src/backend/access/transam/clog.c --- b/src/backend/access/transam/clog.c *************** *** 37,42 **** --- 37,44 ---- #include "access/transam.h" #include "miscadmin.h" #include "pg_trace.h" + #include "replication/syncrep.h" + #include "replication/walsender.h" /* * Defines for CLOG page sizes. 
A page is the same BLCKSZ as is used *************** *** 708,715 **** WriteZeroPageXlogRec(int pageno) /* * Write a TRUNCATE xlog record * ! * We must flush the xlog record to disk before returning --- see notes ! * in TruncateCLOG(). */ static void WriteTruncateXlogRec(int pageno) --- 710,719 ---- /* * Write a TRUNCATE xlog record * ! * Before returning we must flush the xlog record to disk ! * and if synchronous transfer is requested wait for failback ! * safe standby to receive WAL up to recptr. ! * --- see notes in TruncateCLOG(). */ static void WriteTruncateXlogRec(int pageno) *************** *** 723,728 **** WriteTruncateXlogRec(int pageno) --- 727,738 ---- rdata.next = NULL; recptr = XLogInsert(RM_CLOG_ID, CLOG_TRUNCATE, &rdata); XLogFlush(recptr); + + /* + * Wait for failback safe standby. + */ + if (SyncTransRequested()) + SyncRepWaitForLSN(recptr, true, true); } /* *** a/src/backend/access/transam/slru.c --- b/src/backend/access/transam/slru.c *************** *** 54,59 **** --- 54,61 ---- #include "access/slru.h" #include "access/transam.h" #include "access/xlog.h" + #include "replication/syncrep.h" + #include "replication/walsender.h" #include "storage/fd.h" #include "storage/shmem.h" #include "miscadmin.h" *************** *** 744,749 **** SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata) --- 746,758 ---- START_CRIT_SECTION(); XLogFlush(max_lsn); END_CRIT_SECTION(); + + /* + * If synchronous transfer is requested, wait for failback safe + * standby to receive WAL up to max_lsn. + */ + if (SyncTransRequested()) + SyncRepWaitForLSN(max_lsn, true, true); } } *** a/src/backend/access/transam/twophase.c --- b/src/backend/access/transam/twophase.c *************** *** 1091,1102 **** EndPrepare(GlobalTransaction gxact) END_CRIT_SECTION(); /* ! * Wait for synchronous replication, if required. * * Note that at this stage we have marked the prepare, but still show as * running in the procarray (twice!) and continue to hold locks. */ ! 
SyncRepWaitForLSN(gxact->prepare_lsn); records.tail = records.head = NULL; } --- 1091,1102 ---- END_CRIT_SECTION(); /* ! * Wait for synchronous/synchronous failback safe standby, if required. * * Note that at this stage we have marked the prepare, but still show as * running in the procarray (twice!) and continue to hold locks. */ ! SyncRepWaitForLSN(gxact->prepare_lsn, false, true); records.tail = records.head = NULL; } *************** *** 2058,2069 **** RecordTransactionCommitPrepared(TransactionId xid, END_CRIT_SECTION(); /* ! * Wait for synchronous replication, if required. * * Note that at this stage we have marked clog, but still show as running * in the procarray and continue to hold locks. */ ! SyncRepWaitForLSN(recptr); } /* --- 2058,2069 ---- END_CRIT_SECTION(); /* ! * Wait for synchronous/synchronous failback safe standby, if required. * * Note that at this stage we have marked clog, but still show as running * in the procarray and continue to hold locks. */ ! SyncRepWaitForLSN(recptr, false, true); } /* *************** *** 2138,2147 **** RecordTransactionAbortPrepared(TransactionId xid, END_CRIT_SECTION(); /* ! * Wait for synchronous replication, if required. * * Note that at this stage we have marked clog, but still show as running * in the procarray and continue to hold locks. */ ! SyncRepWaitForLSN(recptr); } --- 2138,2147 ---- END_CRIT_SECTION(); /* ! * Wait for synchronous/synchronous failback safe standby, if required. * * Note that at this stage we have marked clog, but still show as running * in the procarray and continue to hold locks. */ ! SyncRepWaitForLSN(recptr, false, true); } *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *************** *** 1189,1201 **** RecordTransactionCommit(void) latestXid = TransactionIdLatest(xid, nchildren, children); /* ! * Wait for synchronous replication, if required. 
* * Note that at this stage we have marked clog, but still show as running * in the procarray and continue to hold locks. */ if (wrote_xlog) ! SyncRepWaitForLSN(XactLastRecEnd); /* Reset XactLastRecEnd until the next transaction writes something */ XactLastRecEnd = 0; --- 1189,1201 ---- latestXid = TransactionIdLatest(xid, nchildren, children); /* ! * Wait for synchronous/synchronous failback safe standby, if required. * * Note that at this stage we have marked clog, but still show as running * in the procarray and continue to hold locks. */ if (wrote_xlog) ! SyncRepWaitForLSN(XactLastRecEnd, false, true); /* Reset XactLastRecEnd until the next transaction writes something */ XactLastRecEnd = 0; *************** *** 4690,4697 **** xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn, --- 4690,4706 ---- * for any user that requested ForceSyncCommit(). */ if (XactCompletionForceSyncCommit(xinfo)) + { XLogFlush(lsn); + /* + * If synchronous transfer is requested, wait for failback safe + * standby to receive WAL up to lsn. + */ + if (SyncTransRequested()) + SyncRepWaitForLSN(lsn, true, true); + + } } /* *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 39,44 **** --- 39,45 ---- #include "pgstat.h" #include "postmaster/bgwriter.h" #include "postmaster/startup.h" + #include "replication/syncrep.h" #include "replication/walreceiver.h" #include "replication/walsender.h" #include "storage/barrier.h" *************** *** 8282,8287 **** CreateCheckPoint(int flags) --- 8283,8300 ---- END_CRIT_SECTION(); /* + * If synchronous transfer is requested, wait for failback safe standby + * to receive WAL up to the checkpoint WAL record. Otherwise, a failure that occurs + * before the standby receives the CHECKPOINT WAL record causes an inconsistency + * between the control files of master and standby, and the master would + * start from a location which is not known to the standby at the time of failover. 
+ * + * There is no need to wait for shutdown CHECKPOINT. + */ + if (SyncTransRequested()) + SyncRepWaitForLSN(recptr, true, !shutdown); + + /* * Let smgr do post-checkpoint cleanup (eg, deleting old files). */ smgrpostckpt(); *** a/src/backend/catalog/storage.c --- b/src/backend/catalog/storage.c *************** *** 25,30 **** --- 25,32 ---- #include "catalog/catalog.h" #include "catalog/storage.h" #include "catalog/storage_xlog.h" + #include "replication/syncrep.h" + #include "replication/walsender.h" #include "storage/freespace.h" #include "storage/smgr.h" #include "utils/memutils.h" *************** *** 288,293 **** RelationTruncate(Relation rel, BlockNumber nblocks) --- 290,303 ---- */ if (fsm || vm) XLogFlush(lsn); + + /* + * If synchronous transfer is requested, wait for failback safe standby + * to receive WAL up to lsn. Otherwise, we may have a situation where + * the heap is truncated, but the action never replayed on the standby. + */ + if (SyncTransRequested()) + SyncRepWaitForLSN(lsn, true, true); } /* Do the real work */ *************** *** 521,526 **** smgr_redo(XLogRecPtr lsn, XLogRecord *record) --- 531,543 ---- */ XLogFlush(lsn); + /* + * If synchronous transfer is requested, wait for failback safe standby + * to receive WAL up to lsn. 
+ */ + if (SyncTransRequested()) + SyncRepWaitForLSN(lsn, true, true); + smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno); /* Also tell xlogutils.c about it */ *** a/src/backend/postmaster/autovacuum.c --- b/src/backend/postmaster/autovacuum.c *************** *** 85,90 **** --- 85,92 ---- #include "postmaster/autovacuum.h" #include "postmaster/fork_process.h" #include "postmaster/postmaster.h" + #include "replication/syncrep.h" + #include "replication/walsender.h" #include "storage/bufmgr.h" #include "storage/ipc.h" #include "storage/latch.h" *************** *** 1591,1598 **** AutoVacWorkerMain(int argc, char *argv[]) * Force synchronous replication off to allow regular maintenance even if * we are waiting for standbys to connect. This is important to ensure we * aren't blocked from performing anti-wraparound tasks. */ ! if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH) SetConfigOption("synchronous_commit", "local", PGC_SUSET, PGC_S_OVERRIDE); --- 1593,1602 ---- * Force synchronous replication off to allow regular maintenance even if * we are waiting for standbys to connect. This is important to ensure we * aren't blocked from performing anti-wraparound tasks. + * Note that if synchronous transfer is requested, we can't perform regular + * maintenance until standbys connect. */ ! if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH && !SyncTransRequested()) SetConfigOption("synchronous_commit", "local", PGC_SUSET, PGC_S_OVERRIDE); *** a/src/backend/replication/syncrep.c --- b/src/backend/replication/syncrep.c *************** *** 67,72 **** static bool announce_next_takeover = true; --- 67,74 ---- static int SyncRepWaitMode = SYNC_REP_NO_WAIT; + int synchronous_transfer = SYNCHRONOUS_TRANSFER_COMMIT; + static void SyncRepQueueInsert(int mode); static void SyncRepCancelWait(void); *************** *** 83,110 **** static bool SyncRepQueueIsOrderedByLSN(int mode); */ /* ! * Wait for synchronous replication, if requested by user. 
* * Initially backends start in state SYNC_REP_NOT_WAITING and then ! * change that state to SYNC_REP_WAITING before adding ourselves ! * to the wait queue. During SyncRepWakeQueue() a WALSender changes ! * the state to SYNC_REP_WAIT_COMPLETE once replication is confirmed. ! * This backend then resets its state to SYNC_REP_NOT_WAITING. */ ! void ! SyncRepWaitForLSN(XLogRecPtr XactCommitLSN) { char *new_status = NULL; const char *old_status; int mode = SyncRepWaitMode; /* ! * Fast exit if user has not requested sync replication, or there are no ! * sync replication standby names defined. Note that those standbys don't ! * need to be connected. */ ! if (!SyncRepRequested() || !SyncStandbysDefined()) ! return; Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))); Assert(WalSndCtl != NULL); --- 85,151 ---- */ /* ! * Wait for synchronous/failback safe standby, if requested by user. * * Initially backends start in state SYNC_REP_NOT_WAITING and then ! * change that state to SYNC_REP_WAITING/SYNC_REP_WAITING_FOR_DATA_FLUSH ! * before adding ourselves to the wait queue. During SyncRepWakeQueue() a ! * WALSender changes the state to SYNC_REP_WAIT_COMPLETE once replication is ! * confirmed. This backend then resets its state to SYNC_REP_NOT_WAITING. ! * ! * ForDataFlush - if TRUE, we wait before flushing a data page. ! * Otherwise we wait for the sync standby. ! * ! * Wait - if FALSE, we don't actually wait, but tell the caller whether or not ! * the standby has already progressed up to the given XactCommitLSN. ! * ! * Return TRUE if either the synchronous standby/failback safe standby is not ! * configured/turned off OR the standby has made enough progress. */ ! bool ! SyncRepWaitForLSN(XLogRecPtr XactCommitLSN, bool ForDataFlush, bool Wait) { char *new_status = NULL; const char *old_status; int mode = SyncRepWaitMode; + bool ret; /* ! * Fast exit if there are no sync replication standby names defined. ! * Note that those standbys don't need to be connected. */ ! 
if (!SyncStandbysDefined()) ! return true; ! ! /* If user has not requested sync replication, exit. */ ! if (!SyncRepRequested() && !ForDataFlush) ! return true; ! ! /* ! * If the caller has specified ForDataFlush, but synchronous transfer ! * is not specified or is turned off, exit. ! * ! * We would like to allow the failback safe mechanism even for cascaded ! * standbys. But we can't really wait for the standby to catch ! * up until we reach a consistent state, since the standbys won't ! * even be able to connect without us reaching that state (XXX Confirm) ! */ ! if (!SyncTransRequested() && ForDataFlush) ! return true; ! ! /* ! * If the caller has not specified ForDataFlush, but synchronous commit ! * is skipped by the value of synchronous_transfer, exit. ! */ ! if (IsSyncRepSkipped() && !ForDataFlush) ! return true; ! ! /* ! * If both synchronous replication and synchronous transfer ! * are requested but the system is still in recovery, exit. ! */ ! if (RecoveryInProgress()) ! return true; Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks))); Assert(WalSndCtl != NULL); *************** *** 120,130 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN) * condition but we'll be fetching that cache line anyway so it's likely to * be a low cost check. */ ! if (!WalSndCtl->sync_standbys_defined || XactCommitLSN <= WalSndCtl->lsn[mode]) { LWLockRelease(SyncRepLock); ! return; } /* --- 161,180 ---- * condition but we'll be fetching that cache line anyway so it's likely to * be a low cost check. */ ! if ((!ForDataFlush && !WalSndCtl->sync_standbys_defined) || XactCommitLSN <= WalSndCtl->lsn[mode]) { LWLockRelease(SyncRepLock); ! return true; ! } ! ! /* ! * Exit if we are told not to block on the standby. ! */ ! if (!Wait) ! { ! LWLockRelease(SyncRepLock); ! return false; } /* *************** *** 151,156 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN) --- 201,208 ---- new_status[len] = '\0'; /* truncate off " waiting ..." 
*/ } + ret = false; + /* * Wait for specified LSN to be confirmed. * *************** *** 187,193 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN) --- 239,248 ---- LWLockRelease(SyncRepLock); } if (syncRepState == SYNC_REP_WAIT_COMPLETE) + { + ret = true; break; + } /* * If a wait for synchronous replication is pending, we can neither *************** *** 264,269 **** SyncRepWaitForLSN(XLogRecPtr XactCommitLSN) --- 319,326 ---- set_ps_display(new_status, false); pfree(new_status); } + + return ret; } /* *** a/src/backend/storage/buffer/bufmgr.c --- b/src/backend/storage/buffer/bufmgr.c *************** *** 41,46 **** --- 41,48 ---- #include "pg_trace.h" #include "pgstat.h" #include "postmaster/bgwriter.h" + #include "replication/syncrep.h" + #include "replication/walsender.h" #include "storage/buf_internals.h" #include "storage/bufmgr.h" #include "storage/ipc.h" *************** *** 1975,1982 **** FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln) * skip the flush if the buffer isn't permanent. */ if (buf->flags & BM_PERMANENT) XLogFlush(recptr); ! /* * Now it's safe to write buffer to disk. Note that no one else should * have been able to write it while we were busy with log flushing because --- 1977,1990 ---- * skip the flush if the buffer isn't permanent. */ if (buf->flags & BM_PERMANENT) + { XLogFlush(recptr); ! /* If synchronous transfer is requested, wait for failback safe standby ! * to receive WAL up to recptr. ! */ ! if (SyncTransRequested()) ! SyncRepWaitForLSN(recptr, true, true); ! } /* * Now it's safe to write buffer to disk. 
Note that no one else should * have been able to write it while we were busy with log flushing because *** a/src/backend/utils/cache/relmapper.c --- b/src/backend/utils/cache/relmapper.c *************** *** 48,53 **** --- 48,55 ---- #include "catalog/pg_tablespace.h" #include "catalog/storage.h" #include "miscadmin.h" + #include "replication/syncrep.h" + #include "replication/walsender.h" #include "storage/fd.h" #include "storage/lwlock.h" #include "utils/inval.h" *************** *** 711,716 **** write_relmap_file(bool shared, RelMapFile *newmap, --- 713,719 ---- int fd; RelMapFile *realmap; char mapfilename[MAXPGPATH]; + XLogRecPtr lsn = InvalidXLogRecPtr; /* * Fill in the overhead fields and update CRC. *************** *** 753,759 **** write_relmap_file(bool shared, RelMapFile *newmap, { xl_relmap_update xlrec; XLogRecData rdata[2]; - XLogRecPtr lsn; /* now errors are fatal ... */ START_CRIT_SECTION(); --- 756,761 ---- *************** *** 849,854 **** write_relmap_file(bool shared, RelMapFile *newmap, --- 851,863 ---- /* Critical section done */ if (write_wal) END_CRIT_SECTION(); + + /* + * If synchronous transfer is requested, wait for failback safe + * standby to receive WAL up to lsn. 
+ */ + if (SyncTransRequested()) + SyncRepWaitForLSN(lsn, true, true); } /* *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** *** 381,386 **** static const struct config_enum_entry synchronous_commit_options[] = { --- 381,396 ---- }; /* + * The documented values of synchronous_transfer are "all", "data_flush", and "commit". + */ + static const struct config_enum_entry synchronous_transfer_options[] = { + {"all", SYNCHRONOUS_TRANSFER_ALL, false}, + {"data_flush", SYNCHRONOUS_TRANSFER_DATA_FLUSH, false}, + {"commit", SYNCHRONOUS_TRANSFER_COMMIT, false}, + {NULL, 0, false} + }; + + /* * Options for enum values stored in other modules */ extern const struct config_enum_entry wal_level_options[]; *************** *** 3300,3305 **** static struct config_enum ConfigureNamesEnum[] = --- 3310,3325 ---- }, { + {"synchronous_transfer", PGC_SIGHUP, WAL_SETTINGS, + gettext_noop("Sets the data flush synchronization level"), + NULL + }, + &synchronous_transfer, + SYNCHRONOUS_TRANSFER_COMMIT, synchronous_transfer_options, + NULL, NULL, NULL + }, + + { {"trace_recovery_messages", PGC_SIGHUP, DEVELOPER_OPTIONS, gettext_noop("Enables logging of recovery-related debugging information."), gettext_noop("Each level includes all the levels that follow it. 
The later" *** a/src/backend/utils/misc/postgresql.conf.sample --- b/src/backend/utils/misc/postgresql.conf.sample *************** *** 220,225 **** --- 220,227 ---- #synchronous_standby_names = '' # standby servers that provide sync rep # comma-separated list of application_name # from standby(s); '*' = all + #synchronous_transfer = commit # data page synchronization level + # commit, data_flush or all #vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed # - Standby Servers - *** a/src/backend/utils/time/tqual.c --- b/src/backend/utils/time/tqual.c *************** *** 60,65 **** --- 60,67 ---- #include "access/subtrans.h" #include "access/transam.h" #include "access/xact.h" + #include "replication/walsender.h" + #include "replication/syncrep.h" #include "storage/bufmgr.h" #include "storage/procarray.h" #include "utils/tqual.h" *************** *** 115,120 **** SetHintBits(HeapTupleHeader tuple, Buffer buffer, --- 117,134 ---- if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer)) return; /* not flushed yet, so don't set hint */ + + /* + * If synchronous transfer is requested, we check if the commit WAL record + * has made it to the standby before allowing hint bit updates. We should not + * wait for the standby to receive the WAL since it's OK to delay hint bit + * updates. 
+ */ + if (SyncTransRequested()) + { + if (!SyncRepWaitForLSN(commitLSN, true, false)) + return; + } } tuple->t_infomask |= infomask; *** a/src/include/replication/syncrep.h --- b/src/include/replication/syncrep.h *************** *** 19,24 **** --- 19,30 ---- #define SyncRepRequested() \ (max_wal_senders > 0 && synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH) + #define SyncTransRequested() \ + (max_wal_senders > 0 && synchronous_transfer > SYNCHRONOUS_TRANSFER_COMMIT) + + #define IsSyncRepSkipped() \ + (max_wal_senders > 0 && synchronous_transfer == SYNCHRONOUS_TRANSFER_DATA_FLUSH) + /* SyncRepWaitMode */ #define SYNC_REP_NO_WAIT -1 #define SYNC_REP_WAIT_WRITE 0 *************** *** 31,41 **** #define SYNC_REP_WAITING 1 #define SYNC_REP_WAIT_COMPLETE 2 /* user-settable parameters for synchronous replication */ extern char *SyncRepStandbyNames; /* called by user backend */ ! extern void SyncRepWaitForLSN(XLogRecPtr XactCommitLSN); /* called at backend exit */ extern void SyncRepCleanupAtProcExit(void); --- 37,59 ---- #define SYNC_REP_WAITING 1 #define SYNC_REP_WAIT_COMPLETE 2 + typedef enum + { + SYNCHRONOUS_TRANSFER_COMMIT, /* no wait for data page flush */ + SYNCHRONOUS_TRANSFER_DATA_FLUSH, /* wait for data page flush only, + * no wait for commit */ + SYNCHRONOUS_TRANSFER_ALL /* wait for data page flush and commit */ + } SynchronousTransferLevel; + /* user-settable parameters for synchronous replication */ extern char *SyncRepStandbyNames; + /* user-settable parameters for failback safe replication */ + extern int synchronous_transfer; + /* called by user backend */ ! extern bool SyncRepWaitForLSN(XLogRecPtr XactCommitLSN, ! bool ForDataFlush, bool Wait); /* called at backend exit */ extern void SyncRepCleanupAtProcExit(void);
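The early-exit checks at the top of the patched SyncRepWaitForLSN() decide which callers wait under each synchronous_transfer value. The sketch below is a simplified standalone model of those checks, in the same order as the patch. It is not backend code: the WaitModel struct, its field names, and model_skips_wait() are hypothetical stand-ins for the GUCs and macros (SyncRepRequested, SyncTransRequested, IsSyncRepSkipped, SyncStandbysDefined, RecoveryInProgress) that the real function consults, and it ignores the later LSN comparison and the Wait argument.

```c
#include <stdbool.h>

/* Same ordering as the enum the patch adds to syncrep.h. */
typedef enum
{
    SYNCHRONOUS_TRANSFER_COMMIT,
    SYNCHRONOUS_TRANSFER_DATA_FLUSH,
    SYNCHRONOUS_TRANSFER_ALL
} SynchronousTransferLevel;

/* Hypothetical snapshot of the server state the real function reads. */
typedef struct
{
    int  max_wal_senders;
    bool sync_commit_remote;     /* synchronous_commit above local flush */
    bool standby_names_defined;  /* synchronous_standby_names is non-empty */
    bool in_recovery;
    SynchronousTransferLevel synchronous_transfer;
} WaitModel;

/*
 * Returns true when the early-exit checks would let the caller proceed
 * without consulting the standby, false when the caller would go on to
 * compare LSNs and possibly wait.
 */
static bool
model_skips_wait(const WaitModel *m, bool for_data_flush)
{
    bool sync_rep_requested =
        m->max_wal_senders > 0 && m->sync_commit_remote;
    bool sync_trans_requested =
        m->max_wal_senders > 0 &&
        m->synchronous_transfer > SYNCHRONOUS_TRANSFER_COMMIT;
    bool sync_rep_skipped =
        m->max_wal_senders > 0 &&
        m->synchronous_transfer == SYNCHRONOUS_TRANSFER_DATA_FLUSH;

    if (!m->standby_names_defined)
        return true;            /* no standby names: never wait */
    if (!sync_rep_requested && !for_data_flush)
        return true;            /* commit wait not requested */
    if (!sync_trans_requested && for_data_flush)
        return true;            /* data-flush wait not requested */
    if (sync_rep_skipped && !for_data_flush)
        return true;            /* data_flush mode skips commit waits */
    if (m->in_recovery)
        return true;            /* never wait during recovery */
    return false;
}
```

Under this model, commit leaves only commit callers waiting, data_flush leaves only data-flush callers waiting, and all leaves both. An empty standby name list or a server still in recovery disables every wait.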