On Wed, Mar 30, 2016 at 10:19 PM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote: > Hm, it’s hard to create descriptive names because test changes master/slave > roles for that nodes several times during test.
Really? the names used in the patch help less then. > It’s possible to call them > “node1” and “node2” but I’m not sure that improves something. But anyway I’m > not insisting on that particular names and will agree with any reasonable > suggestion. I would suggest the following name modifications, node names have been introduced to help tracking of each node's log: - Candie => master - Django => slave or just standby There is no need for complication! And each node's log file is prefixed by the test number. Note that in other tests there are no promotions, or fallbacks done, but we stick with a node name that represents the initial state of the cluster. > +# Switch to synchronous replication > +$node_master->append_conf('postgresql.conf', qq( > +synchronous_standby_names = '*' > +)); > +$node_master->restart; > Reloading would be fine. > > Good catch, done. +$node_master->psql('postgres', "select pg_reload_conf()"); It would be cleaner to introduce a new routine in PostgresNode that calls pg_ctl reload (mentioned in the N-sync patch as well, that would be useful for many purposes). > All accesses to 2pc dummies in ProcArray are covered with TwoPhaseStateLock, > so I thick that’s safe. Also I’ve deleted comment above that block, probably > it’s more confusing than descriptive. OK, you removed the use to allProcs. Though by reading again the code just holding TwoPhaseStateLock that's actually fine. The patch needs a small cleanup: $ git diff master --check src/test/recovery/t/006_twophase.pl:224: new blank line at EOF. 006_twophase.pl should be renamed to 007. It keeps its license to kill, and gains in being famous. - * Scan the pg_twophase directory and setup all the required information to - * allow standby queries to treat prepared transactions as still active. - * This is never called at the end of recovery - we use - * RecoverPreparedTransactions() at that point. + * It's a caller responsibility to call MarkAsPrepared() on returned gxact. * Wouldn't it be more simple to just call MarkAsPrepared at the end of RecoverPreparedFromBuffer? While testing the patch, I found a bug in the recovery conflict code path. You can do the following to reproduce it: 1) Start a master with a standby 2) prepare a transaction on master 3) Stop immediate on standby to force replay 4) commit prepare transaction on master 5) When starting the standby, it remains stuck here: * thread #1: tid = 0x229b4, 0x00007fff8e2d4f96 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00007fff8e2d4f96 libsystem_kernel.dylib`poll + 10 frame #1: 0x0000000107e5e043 postgres`WaitEventSetWaitBlock(set=0x00007f90c20596a8, cur_timeout=-1, occurred_events=0x00007fff581efd28, nevents=1) + 51 at latch.c:1102 frame #2: 0x0000000107e5da26 postgres`WaitEventSetWait(set=0x00007f90c20596a8, timeout=-1, occurred_events=0x00007fff581efd28, nevents=1) + 390 at latch.c:935 frame #3: 0x0000000107e5d4c7 postgres`WaitLatchOrSocket(latch=0x0000000111432464, wakeEvents=1, sock=-1, timeout=-1) + 343 at latch.c:347 frame #4: 0x0000000107e5d36a postgres`WaitLatch(latch=0x0000000111432464, wakeEvents=1, timeout=0) + 42 at latch.c:302 frame #5: 0x0000000107e7b5a6 postgres`ProcWaitForSignal + 38 at proc.c:1731 frame #6: 0x0000000107e6a4eb postgres`ResolveRecoveryConflictWithLock(locktag=LOCKTAG at 0x00007fff581efde8) + 187 at standby.c:391 frame #7: 0x0000000107e7a6a8 postgres`ProcSleep(locallock=0x00007f90c203dac8, lockMethodTable=0x00000001082f6218) + 1128 at proc.c:1215 frame #8: 0x0000000107e72886 postgres`WaitOnLock(locallock=0x00007f90c203dac8, owner=0x0000000000000000) + 358 at lock.c:1703 frame #9: 0x0000000107e70f93 postgres`LockAcquireExtended(locktag=0x00007fff581f0238, lockmode=8, sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at lock.c:998 frame #10: 0x0000000107e6a9a6 postgres`StandbyAcquireAccessExclusiveLock(xid=863, dbOid=16384, relOid=16385) + 358 at standby.c:627 frame #11: 0x0000000107e6af0b postgres`standby_redo(record=0x00007f90c2041e38) + 251 at standby.c:809 frame #12: 0x0000000107b0e227 postgres`StartupXLOG + 9351 at xlog.c:6871 It seems that the replay on on-memory state of the PREPARE transaction is conflicting directly with replay. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers