Hi, On Mon, Jun 15, 2026 at 9:02 PM Daisuke Higuchi <[email protected]> wrote: > > I found that "CREATE DATABASE ... TEMPLATE" with WAL_LOG strategy can > cause sequence values to rollback after a streaming replication failover. > > When creating a new database with "CREATE DATABASE ... TEMPLATE" from > a template database that has sequences, calling nextval() on those > sequences in the new database causes the primary to have a higher > last_value than the standby temporarily. > When a failover occurs and the standby is promoted in this state, the > sequence value appears to have rollbacked on the new primary. > I found this issue in PostgreSQL version 15, but I've confirmed that it also > happens on the master branch.
Nice catch! Thanks for reporting! > I show the example as follows. > > In the attached patch, during "CREATE DATABASE ... TEMPLATE" with > WAL_LOG strategy, the sequence's log_cnt is reset to 0 so that > SEQ_LOG_VALS (32) values are fetched in advance on the next nextval() > call. This ensures that the standby's last_value is greater than or > equal to the primary's last_value. > And, this patch opens sequence page using readBufferWithoutRelcache() > without relying on relcache entries and resets log_cnt. This follows > the same approach as ScanSourceDatabasePGClass(). > Note that this issue does not occur with FILE_COPY strategy because > this performs a checkpoint internally and a WAL record to fetch > SEQ_LOG_VALS in advance is always emitted on the next nextval() call. > However, I could not apply the same approach because I understand the > WAL_LOG strategy is designed to avoid checkpoints. Some comments on the patch: 1/ @@ -218,6 +222,10 @@ CreateDatabaseUsingWalLog(Oid src_dboid, Oid dst_dboid, /* Copy relation storage from source to the destination. */ CreateAndCopyRelationData(srcrlocator, dstrlocator, relinfo->permanent); + /* Reset log_cnt for sequences to ensure WAL on first nextval() */ + if (relinfo->relkind == RELKIND_SEQUENCE) + ResetSequenceLogCnt(dstrlocator, relinfo->permanent); What happens if we reset the log_cnt for some sequences and the CREATE DATABASE then fails (say, while copying other remaining relations' data pages)? We would have written reset WAL records, but the CREATE DATABASE failed - and those WAL records now not only need to be replayed for crash recovery but also flow through to standbys and logical replication clients. 2/ Don't we need to gate the WAL logging in ResetSequenceLogCnt with RelationNeedsWAL(rel)? 3/ WAL logging in ResetSequenceLogCnt and other sequence.c look mostly similar - try to dedup it with a similar one there? 4/ Why not move ResetSequenceLogCnt closer to its friends in sequence.c? 5/ Do we need any special handling for temporary sequences? 6/ Instead of a new file, why not add the test case to existing src/test/recovery/t/034_create_database.pl? -- Bharath Rupireddy Amazon Web Services: https://aws.amazon.com
