On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Thu, Jan 7, 2021 at 11:32 AM Amul Sul <sula...@gmail.com> wrote: > > Snip from CreateCheckPoint(): > > -- > > /* > > * Acquire CheckpointLock to ensure only one checkpoint happens at a time. > > * (This is just pro forma, since in the present system structure there is > > * only one process that is allowed to issue checkpoints at any given > > * time.) > > */ > > LWLockAcquire(CheckpointLock, LW_EXCLUSIVE); > > -- > > > > As per this comment, it seems to be that we don't really need this LW lock. > > We > > could have something else instead if we are afraid of having multiple > > checkpoints at any given time which isn't possible, btw. > > Looks like CheckpointLock() can also be called in standalone backends > (RequestCheckpoint->CreateCheckPoint) along with the checkpointer > process. Having said that, I'm not quite sure whether it can get > called at the same time from a backend(may be with checkpoint; > command) and the checkpointer process. > > /* > * If in a standalone backend, just do it ourselves. > */ > if (!IsPostmasterEnvironment) > { > /* > * There's no point in doing slow checkpoints in a standalone backend, > * because there's no other backends the checkpoint could disrupt. > */ > CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE); > > See the below comment for IsPostmasterEnvironment: > > /* > * IsPostmasterEnvironment is true in a postmaster process and any postmaster > * child process; it is false in a standalone process (bootstrap or > * standalone backend). >
I am not sure I understood your point completely. Do you mean there could be multiple bootstrap or standalone backends that could exist at a time and can perform checkpoint? > > This CheckpointLock is acquired almost at the start of CreateCheckPoint() > > and > > released at the end. The in-between code can take a while to be complete. > > All > > that time interrupt will be on hold which doesn't seem to be a great idea > > but > > that wasn't problematic until now. I am having trouble due to this > > interrupt > > hold for a long since I was trying to add code changes where the > > checkpointer > > process is suppose to respond to the system read-write state change request > > as > > soon as possible[1]. That cannot be done if the interrupt is disabled > > since read-write state change uses the global barrier mechanism to convey > > system > > state changes to other running processes. > > > > So my question is, can we completely get rid of the CheckpointLock need in > > CreateCheckPoint()? > > > > Thoughts/Comments/Suggestions? > > I don't think we can completely get rid of CheckpointLock in > CreateCheckPoint given the fact that it can get called from multiple > processes. > How? > How about serving that ASRO barrier request alone for checkpointer > process in ProcessInterrupts even though the CheckpointLock is held by > the checkpointer process? And also while the checkpointing is > happening, may be we should call CHECK_FOR_INTERRUPTS to see if the > ASRO barrier has arrived. This may sound bit hacky, but just a > thought. I'm thinking that in ProcessInterrupts we can get to know the > checkpointer process type via MyAuxProcType, and whether the lock is > held or not using CheckpointLock, and then we can handle the ASRO > global barrier request. > I am afraid that this will easily get rejected by the community. Note that this is a very crucial code path of the database server. There are other options too, but don't want to propose them until clarity on the CheckpointLock necessity. Regards, Amul