On Thu, Sep 9, 2021 at 11:12 PM Mark Dilger
<mark.dil...@enterprisedb.com> wrote:
>
>

Thank you, for looking at the patch.  Please see my reply inline below:

>
> > On Sep 8, 2021, at 6:44 AM, Amul Sul <sula...@gmail.com> wrote:
> >
> > Here is the rebased version.
>
> v33-0004
>
> This patch moves the include of "catalog/pg_control.h" from transam/xlog.c 
> into access/xlog.h, making pg_control.h indirectly included from a much 
> larger set of files.  Maybe that's ok.  I don't know.  But it seems you are 
> doing this merely to get the symbol (not even the definition) for struct 
> DBState.  I'd recommend rearranging the code so this isn't necessary, but 
> otherwise you'd at least want to remove the now redundant includes of 
> catalog/pg_control.h from xlogdesc.c, xloginsert.c, auth-scram.c, 
> postmaster.c, misc/pg_controldata.c, and pg_controldata/pg_controldata.c.
>

Yes, you are correct, xlog.h is included in more than 150 files. I was
wondering if we can have a forward declaration instead of including
pg_control.h (e.g. The same way struct XLogRecData was declared in
xlog.h). Perhaps, DBState is enum & I don't see we have done the same
for enum elsewhere as we are doing for structures, but that seems to
be fine, IMO.

Earlier, I was unsure before preparing this patch, but since that
makes sense (I assumed) and minimizes duplications, can we go ahead
and post separately with the same change in StartupXLOG() which I have
skipped for the same reason mentioned in patch commit-msg.

> v33-0005
>
> This patch makes bool XLogInsertAllowed() more complicated than before.  The 
> result used to depend mostly on the value of LocalXLogInsertAllowed except 
> that when that value was negative, the result was determined by 
> RecoveryInProgress().  There was an arcane rule that LocalXLogInsertAllowed 
> must have the non-negative values binary coercible to boolean "true" and 
> "false", with the basis for that rule being the coding of 
> XLogInsertAllowed().  Now that the function is more complicated, this rule 
> seems even more arcane.  Can we change the logic to not depend on casting an 
> integer to bool?
>

We can't use a boolean variable because LocalXLogInsertAllowed
represents three states as, 1 means "wal is allowed'', 0 for "wal is
disallowed", and -1 is for "need to check".

> The code comment change in autovacuum.c introduces a non-grammatical 
> sentence: "First, the system is not read only i.e. wal writes permitted".
>
> The function comment in checkpointer.c reads more like it toggles the system 
> into allowing something, rather than actually doing that same something: 
> "SendSignalToCheckpointer allows a process to send a signal to the checkpoint 
> process".
>
> The new code comment in ipci.c contains a typo, but more importantly, it 
> doesn't impart any knowledge beyond what a reader of the function name could 
> already surmise.  Perhaps the comment can better clarify what is happening: 
> "Set up wal probibit shared state"
>
> The new code comment in sync.c copies and changes a nearby comment but drops 
> part of the verb phrase:  "As in ProcessSyncRequests, we don't want to stop 
> wal prohibit change requests".  The nearby comment reads "stop absorbing".  I 
> think this one should read "stop processing".  This same comment is used 
> again below.   Then a third comment reads "For the same reason mentioned 
> previously for the wal prohibit state change request check."  That third 
> comment is too glib.
>
> tcop/utility.c needlessly includes "access/walprohibit.h"
>
> wait_event.h extends enum WaitEventIO with new values 
> WAIT_EVENT_WALPROHIBIT_STATE and WAIT_EVENT_WALPROHIBIT_STATE_CHANGE.  I 
> don't find the difference between these two names at all clear.  Waiting for 
> a state change is clear enough.  But how is waiting on a state different?
>
> xlog.h defines a new enum.  I don't find any of it clear; not the comment, 
> nor the name of the enum, nor the names of the values:
>
> /* State of work that enables wal writes */
> typedef enum XLogAcceptWritesState
> {
>     XLOG_ACCEPT_WRITES_PENDING = 0, /* initial state, not started */
>     XLOG_ACCEPT_WRITES_SKIPPED,     /* skipped wal writes */
>     XLOG_ACCEPT_WRITES_DONE         /* wal writes are enabled */
> } XLogAcceptWritesState;
>
> This enum seems to have been written from the point of view of someone who 
> already knew what it was for.  It needs to be written in a way that will be 
> clear to people who have no idea what it is for.
>
> v33-0006:
>
> The new code comments in brin.c and elsewhere should use the verb "require" 
> rather than "have", otherwise "building indexes" reads as a noun phrase 
> rather than as a gerund: /* Building indexes will have an XID */
>

Will try to think about the pointed code comments for the improvements.

> The new function CheckWALPermitted() seems to test the current state of 
> variables but not lock any of them, and the new function comment says:
>

CheckWALPermitted() calls XLogInsertAllowed() does check the
LocalXLogInsertAllowed flag which is local to that process only, and
nobody else reads that concurrently.

> /*
>  * In opposite to the above assertion if a transaction doesn't have valid XID
>  * (e.g. VACUUM) then it won't be killed while changing the system state to 
> WAL
>  * prohibited.  Therefore, we need to explicitly error out before entering 
> into
>  * the critical section.
>  */
>
> This suggests to me that a vacuum process can check whether wal is 
> prohibited, then begin a critical section which needs wal to be allowed, and 
> concurrently somebody else might disable wal without killing the vacuum 
> process.  I'm given to wonder what horrors await when the vacuum process does 
> something that needs to be wal logged but cannot be.  Does it trigger a 
> panic?  I don't like the idea that calling pg_prohibit_wal durning a vacuum 
> might panic the cluster.  If there is some reason this is not a problem, I 
> think the comment should explain it.  In particular, why is it sufficient to 
> check whether wal is prohibited before entering the critical section and not 
> necessary to be sure it remains allowed through the lifetime of that critical 
> section?
>

Hm, interrupts absorption are disabled inside the critical section.
The wal prohibited state for that process (here vacuum) will never get
set until it sees the interrupts & the system will not be said wal
prohibited until every process sees that interrupts. I am not sure we
should explain the characteristics of the critical section at this
place, if want, we can add a brief saying that inside the critical
section we should not worry about the state change which never happens
because interrupts are disabled there.

> v33-0007:
>
> I don't really like what the documentation has to say about pg_prohibit_wal.  
> Why should pg_prohibit_wal differ from other signal sending functions in 
> whether it returns a boolean?  If you believe it must always succeed, you can 
> still define it as returning a boolean and always return true.  That leaves 
> the door open to future code changes which might need to return false for 
> some reason.
>

Ok, I am fine to always return true.

> But I also don't like the idea that existing transactions with xids are 
> immediately killed.  Shouldn't this function take an optional timeout, 
> perhaps defaulting to none, but otherwise allowing the user to put the system 
> into WALPROHIBIT_STATE_GOING_READ_ONLY for a period of time before killing 
> remaining transactions?
>

Ok, will check.

> Why is this function defined to take a boolean such that 
> pg_prohibit_wal(true) means to prohibit wal and pg_prohibit_wal(false) means 
> to allow wal.  Wouldn't a different function named pg_allow_wal() make it 
> more clear?  This also would be a better interface if taking the system 
> read-only had a timeout as I suggested above, as such a timeout parameter 
> when allowing wal is less clearly useful.
>

Like Robert, I am too inclined to have a single function that is easy
to remember. Apart from this, recently while testing this patch with
pgbench where I have exhausted the connection limit and want to change
the system's prohibited state in between but I was unable to do that,
I wish I could do that using the pg_clt option. How about having a
pg_clt option to alter wal prohibited state?

> That's enough code review for now.  Next I will review your regression 
> tests....
>
Thanks again.


Reply via email to