On Wed, Jan 17, 2024 at 7:30 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot > <bertranddrouvot...@gmail.com> wrote: > > > > Hi, > > > > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > > > PFA v62. Details: > > > > Thanks! > > > > > v62-003: > > > It is a new patch which attempts to implement slot-sync worker as a > > > special process which is neither a bgworker nor an Auxiliary process. > > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > > > it is hot-standby and 'enable_syncslot' is ON. > > > > The implementation looks reasonable to me (from what I can see some parts is > > copy/paste from an already existing "special" process and some parts are > > "sync slot" specific) which makes fully sense. > > > > A few remarks: > > > > 1 === > > + * Was it the slot sycn worker? > > > > Typo: sycn > > > > 2 === > > + * ones), and no walwriter, autovac launcher or bgwriter or > > slot sync > > > > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot > > sync" > > > > 3 === > > + * restarting slot slyc worker. If stopSignaled is set, the worker will > > > > Typo: slyc > > > > 4 === > > +/* Flag to tell if we are in an slot sync worker process */ > > > > s/an/a/ ? > > > > 5 === (coming from v62-0002) > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > Is it even possible for the related query to not return only one row? (I > > think the > > "count" ensures it). > > > > 6 === > > if (conninfo_changed || > > primary_slotname_changed || > > + old_enable_syncslot != enable_syncslot || > > (old_hot_standby_feedback != hot_standby_feedback)) > > { > > ereport(LOG, > > errmsg("slot sync worker will restart > > because of" > > " a parameter change")); > > > > I don't think "slot sync worker will restart" is true if one change > > enable_syncslot > > from on to off. > > > > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would > > ease > > the review). But let's wait to see if others think differently. > > > > Regards, > > > > -- > > Bertrand Drouvot > > PostgreSQL Contributors Team > > RDS Open Source Databases > > Amazon Web Services: https://aws.amazon.com > > > PFA v63. > > --It addresses comments by Peter given in [1], [2], comment by Nisha > given in [3], comments by Bertrand given in [4] > --It also moves race-condition fix from patch003 to patch002 as > suggested by Swada-san offlist. Race-condition is mentioned in [5] >
Thank you for updating the patch. I have some comments: --- + latestWalEnd = GetWalRcvLatestWalEnd(); + if (remote_slot->confirmed_lsn > latestWalEnd) + { + elog(ERROR, "exiting from slot synchronization as the received slot sync" + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + remote_slot->name, + LSN_FORMAT_ARGS(latestWalEnd)); + } IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is typically the primary server's flush position and doesn't mean the LSN where the walreceiver received/flushed up to. Does it really happen that the slot's confirmed_flush_lsn is higher than the primary's flush lsn? --- After dropping a database on the primary, I got the following LOG (PID 2978463 is the slotsync worker on the standby): LOG: still waiting for backend with PID 2978463 to accept ProcSignalBarrier CONTEXT: WAL redo at 0/301CE00 for Database/DROP: dir 1663/16384 Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com