On Tue, May 11, 2021 at 3:38 PM Amul Sul <sula...@gmail.com> wrote: > > On Tue, May 11, 2021 at 2:26 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > On Tue, May 11, 2021 at 2:16 PM Amul Sul <sula...@gmail.com> wrote: > > > > > I get why you think that, I wasn't very precise in briefing the problem. > > > > > > Any new backend that gets connected right after the shared memory > > > state changes to WALPROHIBIT_STATE_GOING_READ_WRITE will be by > > > default allowed to do the WAL writes. Such backends can perform write > > > operation before the checkpointer does the XLogAcceptWrites(). > > > > Okay, make sense now. But my next question is why do we allow backends > > to write WAL in WALPROHIBIT_STATE_GOING_READ_WRITE state? why don't we > > wait until the shared memory state is changed to > > WALPROHIBIT_STATE_READ_WRITE? > > > > Ok, good question. > > Now let's first try to understand the Checkpointer's work. > > When Checkpointer sees the wal prohibited state is an in-progress state, then > it first emits the global barrier and waits until all backers absorb that. > After that it set the final requested WAL prohibit state. > > When other backends absorb those barriers then appropriate action is taken > (e.g. abort the read-write transaction if moving to read-only) by them. Also, > LocalXLogInsertAllowed flags get reset in it and that backend needs to call > XLogInsertAllowed() to get the right value for it, which further decides WAL > writes permitted or prohibited. > > Consider an example that the system is trying to change to read-write and for > that wal prohibited state is set to WALPROHIBIT_STATE_GOING_READ_WRITE before > Checkpointer starts its work. If we want to treat that system as read-only > for > the WALPROHIBIT_STATE_GOING_READ_WRITE state as well. Then we might need to > think about the behavior of the backend that has absorbed the barrier and > reset > the LocalXLogInsertAllowed flag. That backend eventually going to call > XLogInsertAllowed() to get the actual value for it and by seeing the current > state as WALPROHIBIT_STATE_GOING_READ_WRITE, it will set > LocalXLogInsertAllowed > again same as it was before for the read-only state.
I might be missing something, but assume the behavior should be like this 1. If the state is getting changed from WALPROHIBIT_STATE_READ_WRITE -> WALPROHIBIT_STATE_READ_ONLY, then as soon as the backend process the barrier, we can immediately abort any read-write transaction(and stop allowing WAL writing), because once we ensure that all session has responded that now they have no read-write transaction then we can safely change the state from WALPROHIBIT_STATE_GOING_READ_ONLY to WALPROHIBIT_STATE_READ_ONLY. 2. OTOH, if we are changing from WALPROHIBIT_STATE_READ_ONLY -> WALPROHIBIT_STATE_READ_WRITE, then we don't need to allow the backend to consider the system as read-write, instead, we should wait until the shared state is changed to WALPROHIBIT_STATE_READ_WRITE. So your problem is that on receiving the barrier we need to call LocalXLogInsertAllowed() from the backend, but how does that matter? you can still make IsWALProhibited() return true. I don't know the complete code so I might be missing something but at least that is what I would expect from the design POV. Other than this point, I think the state names READ_ONLY, READ_WRITE are a bit confusing no? because actually, these states represent whether WAL is allowed or not, but READ_ONLY, READ_WRITE seems like we are putting the system under a Read-only state. For example, if you are doing some write operation on an unlogged table will be allowed, I guess because that will not generate the WAL until you commit (because commit generates WAL) right? so practically, we are just blocking the WAL, not the write operation. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com