At Mon, 11 Oct 2021 15:24:46 +0900, Fujii Masao <masao.fu...@oss.nttdata.com> wrote in > > > On 2021/10/09 22:22, Bharath Rupireddy wrote: > > Hi, > > It looks like auxiliary processes will not have a valid MyBackendId as > > they don't call InitPostgres() and SharedInvalBackendInit() unlike > > backends. But the startup process (which is an auxiliary process) in > > hot standby mode seems to be different in the way that it does have a > > valid MyBackendId as SharedInvalBackendInit() gets called from > > InitRecoveryTransactionEnvironment(). The SharedInvalBackendInit() > > usually stores the MyBackendId in the caller's PGPROC structure i.e. > > MyProc->backendId. The auxiliary processes (including the startup > > process) usually register themselves in procsignal array with > > ProcSignalInit(MaxBackends + MyAuxProcType + 1) unlike the backends > > with ProcSignalInit(MyBackendId) after SharedInvalBackendInit() in > > InitPostgres(). > > The problem comes when a postgres process wants to send a multiplexed > > SIGUSR1 signal (probably using SendProcSignal()) to the startup > > process after receiving its ProcSignal->psh_slot[] with its backendId > > from the PGPROC (the postgres process can get the startup process > > PGPROC structure from AuxiliaryPidGetProc()). Remember the startup > > process has registered in the procsignal array with > > ProcSignalInit(MaxBackends + MyAuxProcType + 1), not with the > > ProcSignalInit(MyBackendId) like the backends did. So, the postgres > > process, wanting to send SIGUSR1 to the startup process, refers to the > > wrong ProcSignal->psh_slot[] and may not send the signal. > > Is this inconsistency of MyBackendId for a startup process a problem > > at all? Thoughts? > > These are the following ways I think we can fix it, if at all some > > other hacker agrees that it is actually an issue: > > 1) Fix the startup process code, probably by unregistering the > > procsignal array entry that was made with ProcSignalInit(MaxBackends + > > MyAuxProcType + 1) in AuxiliaryProcessMain() and register with > > ProcSignalInit(MyBackendId) immediately after SharedInvalBackendInit() > > calculates the MyBackendId in with SharedInvalBackendInit() in > > InitRecoveryTransactionEnvironment(). This seems risky to me as > > unregistering and registering ProcSignal array involves some barriers > > and during the registering and unregistering window, the startup > > process may miss the SIGUSR1. > > 2) Ensure that the process, that wants to send the startup process > > SIGUSR1 signal, doesn't use the backendId from the startup process > > PGPROC, in which case it has to loop over all the entries of > > ProcSignal->psh_slot[] array to find the entry with the startup > > process PID. It seems easier and less riskier but only caveat is that > > the sending process shouldn't look at the backendId from auxiliary > > process PGPROC, instead it should just traverse the entire proc signal > > array to find the right slot. > > 3) Add a comment around AuxiliaryPidGetProc() that says "auxiliary > > processes don't have valid backend ids, so don't use the backendId > > from the returned PGPROC". > > (2) and (3) seem reasonable to me. Thoughts?
(I'm not sure how the trouble happens.) 2 and 3 looks like fixing inconsistency by another inconsistency. I'm not sure 1 is acceptable. > How about modifying SharedInvalBackendInit() so that it accepts > BackendId as an argument and allocates the ProcState entry of > the specified BackendId? That is, the startup process determines > that its BackendId is "MaxBackends + MyAuxProcType (=StartupProcess) + > 1" > in AuxiliaryProcessMain(), and then it passes that BackendId to > SharedInvalBackendInit() in InitRecoveryTransactionEnvironment(). > > Maybe you need to enlarge ProcState array so that it also handles > auxiliary processes if it does not for now. It seems to me that the backendId on startup process is used only for vxid generation. Actually "make check-world" doesn't fail by skipping SharedInvalBackendinit() (and disabling an assertion). I thought that we could decouple vxid from backend ID (or sinval code) by using pgprocno for vxid generation instead of backend ID. "make check-world" doesn't fail with that change, too. (I don't think "doesn't fail" ncecessarily mean that that change is correct, though), but vxid gets somewhat odd after the change.. =# select distinct virtualxid from pg_locks; virtualxid ------------ 116/1 # startup 99/48 # backend 1 ... regards. -- Kyotaro Horiguchi NTT Open Source Software Center