On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
> Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
> beginning of WalReceiverMain().
> 
> But it seems that after this assignment things could be wrong before the
> walreicever actually starts streaming (like not being able to connect
> to the primary).
> 
> It looks to me that WALRCV_STREAMING should be set once 
> walrcv_startstreaming()
> returns true: this is the proposal of this patch.

Per the state name (streaming), it seems it should be set later as you
proposed, however, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():

                        SetInstallXLogFileSegmentActive();
                        RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
                                             PrimarySlotName,
                                             wal_receiver_create_temp_slot);
                        flushedUpto = 0; 
                    }

                    /*
                     * Check if WAL receiver is active or wait to start up.
                     */
                    if (!WalRcvStreaming())
                    {    
                        lastSourceFailed = true;
                        break;
                    }

After a few lines the function WalRcvStreaming() has:

    SpinLockRelease(&walrcv->mutex);

    /*  
     * If it has taken too long for walreceiver to start up, give up. Setting
     * the state to STOPPED ensures that if walreceiver later does start up
     * after all, it will see that it's not supposed to be running and die
     * without doing anything.
     */
    if (state == WALRCV_STARTING)
    {   
        pg_time_t   now = (pg_time_t) time(NULL);

        if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
        {   
            bool        stopped = false;

            SpinLockAcquire(&walrcv->mutex);
            if (walrcv->walRcvState == WALRCV_STARTING)
            {   
                state = walrcv->walRcvState = WALRCV_STOPPED;
                stopped = true;
            }
            SpinLockRelease(&walrcv->mutex);

            if (stopped)
                ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
        }
    }   

Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.
Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
have issues during overload periods.


--
Euler Taveira
EDB   https://www.enterprisedb.com/

Reply via email to