Hi,

On 2022-07-03 10:07:54 -0700, Nathan Bossart wrote:
> Thanks for the prompt review.
>
> On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote:
> > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote:
> >> +	/* Obtain requested tasks */
> >> +	SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +	flags = CustodianShmem->cust_flags;
> >> +	CustodianShmem->cust_flags = 0;
> >> +	SpinLockRelease(&CustodianShmem->cust_lck);
> >
> > Just resetting the flags to 0 is problematic. Consider what happens if
> > there's two tasks and the one processed first errors out. You'll lose
> > information about needing to run the second task.
>
> I think we also want to retry any failed tasks.
I don't think so, at least not if it's just going to retry that task
straight away - then we'll get stuck on that one task forever. If we had
the ability to "queue" it at the end, to be processed after other already
dequeued tasks, it'd be a different story.

> The way v6 handles this is by requesting all tasks after an exception.

Ick. That strikes me as a bad idea.

> >> +/*
> >> + * RequestCustodian
> >> + *		Called to request a custodian task.
> >> + */
> >> +void
> >> +RequestCustodian(int flags)
> >> +{
> >> +	SpinLockAcquire(&CustodianShmem->cust_lck);
> >> +	CustodianShmem->cust_flags |= flags;
> >> +	SpinLockRelease(&CustodianShmem->cust_lck);
> >> +
> >> +	if (ProcGlobal->custodianLatch)
> >> +		SetLatch(ProcGlobal->custodianLatch);
> >> +}
> >
> > With this representation we can't really implement waiting for a task or
> > such. And it doesn't seem like a great API for the caller to just
> > specify a mix of flags.
>
> At the moment, the idea is that nothing should need to wait for a task
> because the custodian only handles things that are relatively
> non-critical.

Which is just plainly not true as the patchset stands... I think we're
going to have to block if some cleanup as part of a checkpoint hasn't been
completed by the next checkpoint - otherwise it'll just end up being way
too confusing and there's absolutely no backpressure anymore.

> If that changes, this could probably be expanded to look more like
> RequestCheckpoint().
>
> What would you suggest using instead of a mix of flags?

I suspect an array of tasks with requested and completed counters or such?
With a condition variable to wait on?

> >> +	/* let the custodian know what it can remove */
> >> +	CustodianSetLogicalRewriteCutoff(cutoff);
> >
> > Setting this variable in a custodian datastructure and then fetching it
> > from there seems architecturally wrong to me.
>
> Where do you think it should go?
> I previously had it in the checkpointer's shared memory, but you didn't
> like that the functions were declared in bgwriter.h (along with the other
> checkpoint stuff). If the checkpointer shared memory is the right place,
> should we create checkpointer.h and use that instead?

Well, so far I have not understood what the whole point of the shared
state is, so I have a bit of a hard time answering this ;)

> >> +/*
> >> + * Remove all mappings not needed anymore based on the logical restart
> >> + * LSN saved by the checkpointer.  We use this saved value instead of
> >> + * calling ReplicationSlotsComputeLogicalRestartLSN() so that we don't
> >> + * interfere with an ongoing call to CheckPointLogicalRewriteHeap()
> >> + * that is flushing mappings to disk.
> >> + */
> >
> > What interference could there be?
>
> My concern is that the custodian could obtain a later cutoff than what
> the checkpointer does, which might cause files to be concurrently
> unlinked and fsync'd. If we always use the checkpointer's cutoff, that
> shouldn't be a problem. This could probably be better explained in this
> comment.

How about having a Datum argument to RequestCustodian() that is forwarded
to the task?

> >> +void
> >> +RemoveOldLogicalRewriteMappings(void)
> >> +{
> >> +	XLogRecPtr	cutoff;
> >> +	DIR		   *mappings_dir;
> >> +	struct dirent *mapping_de;
> >> +	char		path[MAXPGPATH + 20];
> >> +	bool		value_set = false;
> >> +
> >> +	cutoff = CustodianGetLogicalRewriteCutoff(&value_set);
> >> +	if (!value_set)
> >> +		return;
> >
> > Afaics nothing clears value_set - is that a good idea?
>
> I'm using value_set to differentiate the case where InvalidXLogRecPtr
> means the checkpointer hasn't determined a value yet versus the case
> where it has. In the former, we don't want to take any action. In the
> latter, we want to unlink all the files. Since we're moving to a request
> model for the custodian, I might be able to remove this value_set stuff
> completely.
> If that's not possible, it probably deserves a better comment.

It would.

Greetings,

Andres Freund