> On Apr 1, 2016, at 10:04 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>
> I would suggest the following name modifications, node names have been
> introduced to help tracking of each node's log:
> - Candie => master
> - Django => slave or just standby
> There is no need for complication! And each node's log file is
> prefixed by the test number. Note that in other tests there are no
> promotions, or fallbacks done, but we stick with a node name that
> represents the initial state of the cluster.

Ok, let's reflect the initial state in the node names. So master and standby then.

>> +# Switch to synchronous replication
>> +$node_master->append_conf('postgresql.conf', qq(
>> +synchronous_standby_names = '*'
>> +));
>> +$node_master->restart;
>>
>> Reloading would be fine.
>>
>> Good catch, done.
>
> +$node_master->psql('postgres', "select pg_reload_conf()");
>
> It would be cleaner to introduce a new routine in PostgresNode that
> calls pg_ctl reload (mentioned in the N-sync patch as well, that would
> be useful for many purposes).

Okay.
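Something along these lines, perhaps? Just a rough, untested sketch — it assumes
PostgresNode's existing name and data_dir accessors and TestLib::system_or_bail,
and the routine name is of course up for discussion:

sub reload
{
    my ($self) = @_;
    my $pgdata = $self->data_dir;
    my $name   = $self->name;

    # Signal the postmaster through pg_ctl instead of connecting with psql
    # and calling pg_reload_conf().
    print "### Reloading node \"$name\"\n";
    TestLib::system_or_bail('pg_ctl', '-D', $pgdata, 'reload');
}

The test could then switch to synchronous replication without a restart:

$node_master->append_conf('postgresql.conf', qq(
synchronous_standby_names = '*'
));
$node_master->reload;

That should also be reusable for the N-sync patch you mentioned.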
>> All accesses to 2pc dummies in ProcArray are covered with TwoPhaseStateLock,
>> so I think that's safe. Also I've deleted the comment above that block;
>> it was probably more confusing than descriptive.
>
> OK, you removed the use of allProcs. Though by reading again the code
> just holding TwoPhaseStateLock that's actually fine.
>
> The patch needs a small cleanup:
> $ git diff master --check
> src/test/recovery/t/006_twophase.pl:224: new blank line at EOF.
>
> 006_twophase.pl should be renamed to 007. It keeps its license to
> kill, and gains in being famous.

Huh, so eventually there will be a Fleming reference instead of the Tarantino one in the node names.

> - * Scan the pg_twophase directory and setup all the required information to
> - * allow standby queries to treat prepared transactions as still active.
> - * This is never called at the end of recovery - we use
> - * RecoverPreparedTransactions() at that point.
> + * It's a caller responsibility to call MarkAsPrepared() on returned gxact.
>   *
> Wouldn't it be more simple to just call MarkAsPrepared at the end of
> RecoverPreparedFromBuffer?

I did that intentionally, to allow modification of the gxact before it is unlocked:

RecoverPreparedFromXLOG:

    gxact = RecoverPreparedFromBuffer((char *) XLogRecGetData(record), false);
    gxact->prepare_start_lsn = record->ReadRecPtr;
    gxact->prepare_end_lsn = record->EndRecPtr;
    MarkAsPrepared(gxact);

RecoverPreparedFromFiles:

    gxact = RecoverPreparedFromBuffer(buf, forceOverwriteOK);
    gxact->ondisk = true;
    MarkAsPrepared(gxact);

While both functions are only called during recovery, I think it is better to
write them in a way that could also be used in a multiprocess environment.

> While testing the patch, I found a bug in the recovery conflict code
> path. You can do the following to reproduce it:
> 1) Start a master with a standby
> 2) prepare a transaction on master
> 3) Stop immediate on standby to force replay
> 4) commit prepare transaction on master
> 5) When starting the standby, it remains stuck here:

Hm, I wasn't able to reproduce that. Do you mean the following scenario, or am I missing something?
(async replication)

$node_master->psql('postgres', "
    begin;
    insert into t values (1);
    prepare transaction 'x';");
$node_slave->teardown_node;
$node_master->psql('postgres', "commit prepared 'x'");
$node_slave->start;
$node_slave->psql('postgres', "select count(*) from pg_prepared_xacts",
    stdout => \$psql_out);
is($psql_out, '0', "Commit prepared on master while slave is down.");

> * thread #1: tid = 0x229b4, 0x00007fff8e2d4f96 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x00007fff8e2d4f96 libsystem_kernel.dylib`poll + 10
>     frame #1: 0x0000000107e5e043 postgres`WaitEventSetWaitBlock(set=0x00007f90c20596a8, cur_timeout=-1, occurred_events=0x00007fff581efd28, nevents=1) + 51 at latch.c:1102
>     frame #2: 0x0000000107e5da26 postgres`WaitEventSetWait(set=0x00007f90c20596a8, timeout=-1, occurred_events=0x00007fff581efd28, nevents=1) + 390 at latch.c:935
>     frame #3: 0x0000000107e5d4c7 postgres`WaitLatchOrSocket(latch=0x0000000111432464, wakeEvents=1, sock=-1, timeout=-1) + 343 at latch.c:347
>     frame #4: 0x0000000107e5d36a postgres`WaitLatch(latch=0x0000000111432464, wakeEvents=1, timeout=0) + 42 at latch.c:302
>     frame #5: 0x0000000107e7b5a6 postgres`ProcWaitForSignal + 38 at proc.c:1731
>     frame #6: 0x0000000107e6a4eb postgres`ResolveRecoveryConflictWithLock(locktag=LOCKTAG at 0x00007fff581efde8) + 187 at standby.c:391
>     frame #7: 0x0000000107e7a6a8 postgres`ProcSleep(locallock=0x00007f90c203dac8, lockMethodTable=0x00000001082f6218) + 1128 at proc.c:1215
>     frame #8: 0x0000000107e72886 postgres`WaitOnLock(locallock=0x00007f90c203dac8, owner=0x0000000000000000) + 358 at lock.c:1703
>     frame #9: 0x0000000107e70f93 postgres`LockAcquireExtended(locktag=0x00007fff581f0238, lockmode=8, sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at lock.c:998
>     frame #10: 0x0000000107e6a9a6 postgres`StandbyAcquireAccessExclusiveLock(xid=863, dbOid=16384, relOid=16385) + 358 at standby.c:627
>     frame #11: 0x0000000107e6af0b postgres`standby_redo(record=0x00007f90c2041e38) + 251 at standby.c:809
>     frame #12: 0x0000000107b0e227 postgres`StartupXLOG + 9351 at xlog.c:6871
>
> It seems that the replay of the in-memory state of the PREPARE transaction
> is conflicting directly with replay.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company