Re: [Patch] Optimize dropping of relation buffers using dlist

Kyotaro Horiguchi Wed, 16 Sep 2020 01:32:46 -0700

At Wed, 16 Sep 2020 10:05:32 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote 
in 
> On Wed, Sep 16, 2020 at 9:02 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> >
> > At Wed, 16 Sep 2020 08:33:06 +0530, Amit Kapila <amit.kapil...@gmail.com> 
> > wrote in
> > > On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
> > > <horikyota....@gmail.com> wrote:
> > By the way I'm not sure that actually happens, but if one smgrextend
> > call exnteded the relation by two or more blocks, the cache is
> > invalidated and succeeding smgrnblocks returns lseek()'s result.
> >
> 
> Can you think of any such case? I think in recovery we use
> XLogReadBufferExtended->ReadBufferWithoutRelcache for reading the page
> which seems to be extending page-by-page but there could be some case
> where that is not true. One idea is to run regressions and add an
> Assert to see if we are extending more than a block during recovery.


I agree with you. Actually XLogReadBufferExtended is the only point to
read a page while recovery and seems calling ReadBufferWithoutRelcache
page by page up to the target page. The only case I found where the
cache is invalidated is ALTER TABLE SET TABLESPACE while
wal_level=minimal and not during recovery. smgrextend is called
without smgrnblocks called at the time.

Considering that the behavior of lseek can be a problem only just after
extending a file, an assertion in smgrextend seems to be
enough. Although, I'm not confident on the diagnosis.

--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -474,7 +474,14 @@ smgrextend(SMgrRelation reln, ForkNumber forknum, 
BlockNumber blocknum,
        if (reln->smgr_cached_nblocks[forknum] == blocknum)
                reln->smgr_cached_nblocks[forknum] = blocknum + 1;
        else
+       {
+               /*
+                * DropRelFileNodeBuffers relies on the behavior that nblocks 
cache
+                * won't be invalidated by file extension while recoverying.
+                */
+               Assert(!InRecovery);
                reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+       }
 }

> > Don't
> > we need to guarantee the cache to be valid while recovery?
> >
> 
> One possibility could be that we somehow detect that the value we are
> using is cached one and if so then only do this optimization.

I basically like this direction.  But I'm not sure the additional
parameter for smgrnblocks is acceptable.

But on the contrary, it might be a better design that
DropRelFileNodeBuffers gives up the optimization when
smgrnblocks(,,must_accurate = true) returns InvalidBlockNumber.


@@ -544,9 +551,12 @@ smgrwriteback(SMgrRelation reln, ForkNumber forknum, 
BlockNumber blocknum,
 /*
  *     smgrnblocks() -- Calculate the number of blocks in the
  *                                      supplied relation.
+ *
+ *     Returns InvalidBlockNumber if must_accurate is true and 
smgr_cached_nblocks
+ *     is not available.
  */
 BlockNumber
-smgrnblocks(SMgrRelation reln, ForkNumber forknum)
+smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool must_accurate)
 {
        BlockNumber result;
 
@@ -561,6 +571,17 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum)
 
        reln->smgr_cached_nblocks[forknum] = result;
 
+       /*
+        * We cannot believe the result from smgr_nblocks is always accurate
+        * because lseek of buggy Linux kernels doesn't account for a recent
+        * write. However, we can rely on the result from lseek while recovering
+        * because the first call to this function is not happen just after a 
file
+        * extension. Return values on subsequent calls return cached nblocks,
+        * which should be accurate during recovery.
+        */
+       if (!InRecovery && must_accurate)
+               return InvalidBlockNumber;
+
        return result;
 }


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [Patch] Optimize dropping of relation buffers using dlist

Reply via email to