On Tue, May 11, 2021 at 2:26 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, May 11, 2021 at 2:16 PM Amul Sul <sula...@gmail.com> wrote:
> >
> > I get why you think that, I wasn't very precise in briefing the problem.
> >
> > Any new backend that gets connected right after the shared memory
> > state changes to WALPROHIBIT_STATE_GOING_READ_WRITE will be by
> > default allowed to do the WAL writes. Such backends can perform write
> > operation before the checkpointer does the XLogAcceptWrites().
>
> Okay, make sense now. But my next question is why do we allow backends
> to write WAL in WALPROHIBIT_STATE_GOING_READ_WRITE state? why don't we
> wait until the shared memory state is changed to
> WALPROHIBIT_STATE_READ_WRITE?
>
Ok, good question. Let's first look at what the checkpointer does. When the checkpointer sees that the WAL prohibit state is an in-progress state, it first emits the global barrier and waits until all backends absorb it. After that, it sets the final requested WAL prohibit state.

When the other backends absorb the barrier, they take the appropriate action themselves (e.g. aborting a read-write transaction if the system is moving to read-only). The LocalXLogInsertAllowed flag is also reset during absorption, so the backend then needs to call XLogInsertAllowed() to recompute it, and that value decides whether its WAL writes are permitted or prohibited.

Consider an example where the system is being changed to read-write, and for that the WAL prohibit state is set to WALPROHIBIT_STATE_GOING_READ_WRITE before the checkpointer starts its work. Suppose we want to treat the system as read-only in the WALPROHIBIT_STATE_GOING_READ_WRITE state as well. Then we need to think about the behavior of a backend that has absorbed the barrier and reset its LocalXLogInsertAllowed flag. That backend will eventually call XLogInsertAllowed() to get the actual value, and on seeing the current state as WALPROHIBIT_STATE_GOING_READ_WRITE, it will set LocalXLogInsertAllowed back to the same value it had in the read-only state. Now the question is: when should this value be reset again so that the backend can become read-write? The barrier has already been processed, so nothing will ever bring that backend back to read-write.

One solution, I think, is to set the final state before emitting the barrier, but per the current design the final state should be set only after all barrier processing is done. Let's see what Robert says on this.

Regards,
Amul