On Thu, Sep 9, 2021 at 11:12 PM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > >
Thank you, for looking at the patch. Please see my reply inline below: > > > On Sep 8, 2021, at 6:44 AM, Amul Sul <sula...@gmail.com> wrote: > > > > Here is the rebased version. > > v33-0004 > > This patch moves the include of "catalog/pg_control.h" from transam/xlog.c > into access/xlog.h, making pg_control.h indirectly included from a much > larger set of files. Maybe that's ok. I don't know. But it seems you are > doing this merely to get the symbol (not even the definition) for struct > DBState. I'd recommend rearranging the code so this isn't necessary, but > otherwise you'd at least want to remove the now redundant includes of > catalog/pg_control.h from xlogdesc.c, xloginsert.c, auth-scram.c, > postmaster.c, misc/pg_controldata.c, and pg_controldata/pg_controldata.c. > Yes, you are correct, xlog.h is included in more than 150 files. I was wondering if we can have a forward declaration instead of including pg_control.h (e.g. The same way struct XLogRecData was declared in xlog.h). Perhaps, DBState is enum & I don't see we have done the same for enum elsewhere as we are doing for structures, but that seems to be fine, IMO. Earlier, I was unsure before preparing this patch, but since that makes sense (I assumed) and minimizes duplications, can we go ahead and post separately with the same change in StartupXLOG() which I have skipped for the same reason mentioned in patch commit-msg. > v33-0005 > > This patch makes bool XLogInsertAllowed() more complicated than before. The > result used to depend mostly on the value of LocalXLogInsertAllowed except > that when that value was negative, the result was determined by > RecoveryInProgress(). There was an arcane rule that LocalXLogInsertAllowed > must have the non-negative values binary coercible to boolean "true" and > "false", with the basis for that rule being the coding of > XLogInsertAllowed(). Now that the function is more complicated, this rule > seems even more arcane. Can we change the logic to not depend on casting an > integer to bool? > We can't use a boolean variable because LocalXLogInsertAllowed represents three states as, 1 means "wal is allowed'', 0 for "wal is disallowed", and -1 is for "need to check". > The code comment change in autovacuum.c introduces a non-grammatical > sentence: "First, the system is not read only i.e. wal writes permitted". > > The function comment in checkpointer.c reads more like it toggles the system > into allowing something, rather than actually doing that same something: > "SendSignalToCheckpointer allows a process to send a signal to the checkpoint > process". > > The new code comment in ipci.c contains a typo, but more importantly, it > doesn't impart any knowledge beyond what a reader of the function name could > already surmise. Perhaps the comment can better clarify what is happening: > "Set up wal probibit shared state" > > The new code comment in sync.c copies and changes a nearby comment but drops > part of the verb phrase: "As in ProcessSyncRequests, we don't want to stop > wal prohibit change requests". The nearby comment reads "stop absorbing". I > think this one should read "stop processing". This same comment is used > again below. Then a third comment reads "For the same reason mentioned > previously for the wal prohibit state change request check." That third > comment is too glib. > > tcop/utility.c needlessly includes "access/walprohibit.h" > > wait_event.h extends enum WaitEventIO with new values > WAIT_EVENT_WALPROHIBIT_STATE and WAIT_EVENT_WALPROHIBIT_STATE_CHANGE. I > don't find the difference between these two names at all clear. Waiting for > a state change is clear enough. But how is waiting on a state different? > > xlog.h defines a new enum. I don't find any of it clear; not the comment, > nor the name of the enum, nor the names of the values: > > /* State of work that enables wal writes */ > typedef enum XLogAcceptWritesState > { > XLOG_ACCEPT_WRITES_PENDING = 0, /* initial state, not started */ > XLOG_ACCEPT_WRITES_SKIPPED, /* skipped wal writes */ > XLOG_ACCEPT_WRITES_DONE /* wal writes are enabled */ > } XLogAcceptWritesState; > > This enum seems to have been written from the point of view of someone who > already knew what it was for. It needs to be written in a way that will be > clear to people who have no idea what it is for. > > v33-0006: > > The new code comments in brin.c and elsewhere should use the verb "require" > rather than "have", otherwise "building indexes" reads as a noun phrase > rather than as a gerund: /* Building indexes will have an XID */ > Will try to think about the pointed code comments for the improvements. > The new function CheckWALPermitted() seems to test the current state of > variables but not lock any of them, and the new function comment says: > CheckWALPermitted() calls XLogInsertAllowed() does check the LocalXLogInsertAllowed flag which is local to that process only, and nobody else reads that concurrently. > /* > * In opposite to the above assertion if a transaction doesn't have valid XID > * (e.g. VACUUM) then it won't be killed while changing the system state to > WAL > * prohibited. Therefore, we need to explicitly error out before entering > into > * the critical section. > */ > > This suggests to me that a vacuum process can check whether wal is > prohibited, then begin a critical section which needs wal to be allowed, and > concurrently somebody else might disable wal without killing the vacuum > process. I'm given to wonder what horrors await when the vacuum process does > something that needs to be wal logged but cannot be. Does it trigger a > panic? I don't like the idea that calling pg_prohibit_wal durning a vacuum > might panic the cluster. If there is some reason this is not a problem, I > think the comment should explain it. In particular, why is it sufficient to > check whether wal is prohibited before entering the critical section and not > necessary to be sure it remains allowed through the lifetime of that critical > section? > Hm, interrupts absorption are disabled inside the critical section. The wal prohibited state for that process (here vacuum) will never get set until it sees the interrupts & the system will not be said wal prohibited until every process sees that interrupts. I am not sure we should explain the characteristics of the critical section at this place, if want, we can add a brief saying that inside the critical section we should not worry about the state change which never happens because interrupts are disabled there. > v33-0007: > > I don't really like what the documentation has to say about pg_prohibit_wal. > Why should pg_prohibit_wal differ from other signal sending functions in > whether it returns a boolean? If you believe it must always succeed, you can > still define it as returning a boolean and always return true. That leaves > the door open to future code changes which might need to return false for > some reason. > Ok, I am fine to always return true. > But I also don't like the idea that existing transactions with xids are > immediately killed. Shouldn't this function take an optional timeout, > perhaps defaulting to none, but otherwise allowing the user to put the system > into WALPROHIBIT_STATE_GOING_READ_ONLY for a period of time before killing > remaining transactions? > Ok, will check. > Why is this function defined to take a boolean such that > pg_prohibit_wal(true) means to prohibit wal and pg_prohibit_wal(false) means > to allow wal. Wouldn't a different function named pg_allow_wal() make it > more clear? This also would be a better interface if taking the system > read-only had a timeout as I suggested above, as such a timeout parameter > when allowing wal is less clearly useful. > Like Robert, I am too inclined to have a single function that is easy to remember. Apart from this, recently while testing this patch with pgbench where I have exhausted the connection limit and want to change the system's prohibited state in between but I was unable to do that, I wish I could do that using the pg_clt option. How about having a pg_clt option to alter wal prohibited state? > That's enough code review for now. Next I will review your regression > tests.... > Thanks again.