Re: [Patch] Optimize dropping of relation buffers using dlist

Thomas Munro Wed, 21 Oct 2020 23:45:58 -0700

On Thu, Oct 22, 2020 at 7:33 PM Kyotaro Horiguchi
<[email protected]> wrote:
> At Thu, 22 Oct 2020 14:16:37 +0900 (JST), Kyotaro Horiguchi 
> <[email protected]> wrote in
> > smgrtruncate and smgrextend modify that cache from their parameter,
> > not from lseek().  At the very first the value in the cache comes from
> > lseek(), but if nothing other than postgres has changed the file size,
> > I believe we can rely on the cache even with such a buggy kernel, even
> > if one still exists.
>
> Mmm. Not exact. The requirement here is that we must be certain that
> we don't have a buffer for blocks after the file size known to
> the process.  While recovering, if the first lseek() returned a smaller
> size than the actual one, we cannot have a buffer for the blocks after
> that size. After we have truncated or extended the file, we are certain
> that we don't have a buffer for unknown blocks.


Thanks, I understand now.  Something feels fragile about it, perhaps
because it's not really acting as a "cache" anymore despite its name,
but I see the logic now.  It becomes the authoritative source of
information, even if the kernel decides to make our file smaller
asynchronously.
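
To make sure I have the invariant right, here is a toy model of it in C.
It is only a sketch: ToyRel, the toy_* functions and the fixed-size pool
are all made up for illustration and are not PostgreSQL code.  The
(simulated) lseek() result is consulted only the first time the size is
asked for; after that only extend/truncate change the cached value, and
no buffer can exist for a block at or beyond it, so a drop only needs to
probe block numbers below the cached size.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_POOL_SIZE 16      /* toy buffer pool capacity (assumption) */
#define TOY_NO_BLOCK  (-1L)   /* marks an empty pool slot */

/* Hypothetical per-relation state as seen by the recovery process. */
typedef struct ToyRel
{
    long cached_nblocks;            /* -1 until the first size lookup */
    long buffered[TOY_POOL_SIZE];   /* block number held by each slot */
} ToyRel;

void toy_init(ToyRel *rel)
{
    rel->cached_nblocks = -1;
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        rel->buffered[i] = TOY_NO_BLOCK;
}

/* Only the first call trusts the simulated lseek() result. */
long toy_nblocks(ToyRel *rel, long lseek_result)
{
    if (rel->cached_nblocks < 0)
        rel->cached_nblocks = lseek_result;
    return rel->cached_nblocks;
}

/* Refuses to create a buffer for a block beyond the known size. */
bool toy_read_buffer(ToyRel *rel, long blkno)
{
    if (blkno < 0 || blkno >= rel->cached_nblocks)
        return false;
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        if (rel->buffered[i] == blkno)
            return true;            /* already in the pool */
    for (int i = 0; i < TOY_POOL_SIZE; i++)
    {
        if (rel->buffered[i] == TOY_NO_BLOCK)
        {
            rel->buffered[i] = blkno;
            return true;
        }
    }
    return false;                   /* toy pool is full */
}

/* Extend updates the cache from its parameter, not from lseek(). */
void toy_extend(ToyRel *rel, long blkno)
{
    assert(rel->cached_nblocks >= 0);
    if (blkno >= rel->cached_nblocks)
        rel->cached_nblocks = blkno + 1;
}

/* Truncate discards buffers past the new size, then updates the cache. */
void toy_truncate(ToyRel *rel, long nblocks)
{
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        if (rel->buffered[i] >= nblocks)
            rel->buffered[i] = TOY_NO_BLOCK;
    rel->cached_nblocks = nblocks;
}

/* Drop probes only block numbers below the cached size. */
int toy_drop_buffers(ToyRel *rel)
{
    int dropped = 0;

    for (long blk = 0; blk < rel->cached_nblocks; blk++)
        for (int i = 0; i < TOY_POOL_SIZE; i++)
            if (rel->buffered[i] == blk)
            {
                rel->buffered[i] = TOY_NO_BLOCK;
                dropped++;
            }
    return dropped;
}

int toy_count_buffers(const ToyRel *rel)
{
    int n = 0;

    for (int i = 0; i < TOY_POOL_SIZE; i++)
        if (rel->buffered[i] != TOY_NO_BLOCK)
            n++;
    return n;
}
```

The inner loop over slots stands in for a per-block buffer lookup.  The
point is just that, under this invariant, nothing is left behind after
toy_drop_buffers() even if a later lseek() would report a smaller file.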

> > If there's no longer such a buggy kernel, we can rely on lseek() only
> > when InRecovery. If we had a synchronized file size cache we could rely
> > on the cache even while !InRecovery.  (I'm not sure how vacuum
> > affects this, though.)

Perhaps the buggy kernel of 2006 is actually Linux working as designed
according to its philosophy on ejecting dirty buffers on writeback
failure (and apparently adjusting the size at the same time).  At
least in 2020 it'll tell us about the problem that caused that when we
next perform an operation that reads the error counter, but in the
case of a relation we're dropping -- the use case in this thread --
that won't happen!  (I mean, something else will probably tell you
your system is toast pretty soon, but this particular condition may be
undetected).

I think a synchronised file size cache wouldn't be enough to use this
trick outside the recovery process, because the initial value would
come from a call to lseek(), but unlike recovery, that wouldn't happen
*before* we start putting pages in the buffer pool.  Also, if we one
day have a size-limited relcache, even recovery could get into
trouble, if it evicts the RelationData that holds the authoritative
nblocks value.
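
The ordering problem can be shown with a few lines of C.  Again this is
a hypothetical sketch (toy_leftover_after_drop is not a real function):
it counts the buffers that a size-bounded drop would never probe if
pages were already in the pool before the cached size was initialized
from an lseek() that under-reported.

```c
#include <assert.h>

/*
 * Count buffers that a drop probing only blocks [0, cached_nblocks)
 * would miss.  'buffered' lists the block numbers already in the pool
 * at the time the cached size was first set (assumed scenario).
 */
int toy_leftover_after_drop(const long *buffered, int nbuffered,
                            long cached_nblocks)
{
    int leftover = 0;

    assert(nbuffered >= 0);
    for (int i = 0; i < nbuffered; i++)
        if (buffered[i] >= cached_nblocks)
            leftover++;
    return leftover;
}
```

With blocks 0..5 already buffered and a first lseek() that claims only 4
blocks, two buffers survive the drop; with an accurate size of 6, none
do.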
