Thanks for the new version. This mail contains only replies; I'll send some
further comments in another mail later.

At Thu, 3 Dec 2020 03:49:27 +0000, "k.jami...@fujitsu.com" <k.jami...@fujitsu.com> wrote in
> On Thursday, November 26, 2020 4:19 PM, Horiguchi-san wrote:
> > Hello, Kirk. Thank you for the new version.
>
> Apologies for the delay, but attached are the updated versions to simplify
> the patches.
> The changes reflect most of your comments/suggestions.
>
> Summary of changes in the latest versions:
> 1. Updated the function description of DropRelFileNodeBuffers in 0003.
> 2. Updated the commit logs of 0003 and 0004.
> 3. FindAndDropRelFileNodeBuffers is now called for each relation fork,
>    instead of for all involved forks.
> 4. Removed the unnecessary palloc() and subscripts like forks[][],
>    firstDelBlock[], nforks, as advised by Horiguchi-san. The memory
>    allocation for block[][] was also simplified.
>    So 0004 became simpler and more readable.

...

> > > a reliable size of nblocks for the supplied relation's fork at that time,
> > > and it's safe because DropRelFileNodeBuffers() relies on the behavior
> > > that cached nblocks will not be invalidated by file extension during
> > > recovery. Otherwise, or if not in recovery, proceed to sequential
> > > search of the whole buffer pool.
> >
> > This sentence seems to involve some confusion. It reads as if "we can rely
> > on it because we're relying on it". And "the cached value won't be
> > invalidated" doesn't explain the reason precisely. The reason, I think, is
> > that the cached value is guaranteed to be the maximum page we have in
> > shared buffers at least while in recovery, and that guarantee is held by
> > not asking fseek again once we have cached the value.
>
> Fixed the commit log of 0003.

Thanks!

...

> > +	nforks = palloc(sizeof(int) * n);
> > +	forks = palloc(sizeof(ForkNumber *) * n);
> > +	blocks = palloc(sizeof(BlockNumber *) * n);
> > +	firstDelBlocks = palloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));
> > +	for (i = 0; i < n; i++)
> > +	{
> > +		forks[i] = palloc(sizeof(ForkNumber) * (MAX_FORKNUM + 1));
> > +		blocks[i] = palloc(sizeof(BlockNumber) * (MAX_FORKNUM + 1));
> > +	}
> >
> > We can allocate the whole array at once like this.
> >
> >   BlockNumber (*blocks)[MAX_FORKNUM+1] =
> >       (BlockNumber (*)[MAX_FORKNUM+1])
> >       palloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1))
>
> Thank you for suggesting a way to reduce the lines for the 2D dynamic memory
> allocation. I followed this way in 0004, but it's my first time seeing it
> written like this. I am very glad it works, though is it okay to write it
> this way, since I cannot find similar code declaring and allocating 2D
> arrays like this in the Postgres source code?

Actually it may be somewhat novel to a certain portion of people, but it is
fundamentally the same construct as a function-pointer declaration. Hard to
write from scratch, but I suppose not so hard to read :)

  int (*func_char_to_int)(char x) = some_func;

FWIW isn.c has the following part:

> static bool
> check_table(const char *(*TABLE)[2], const unsigned TABLE_index[10][2])
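To make the pointer-to-array form concrete, here is a minimal standalone C
program. It uses plain malloc() in place of palloc() and a made-up
MAX_FORKNUM of 3; it only illustrates the same single-allocation pattern and
is not code from the patch.

#include <stdio.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;

#define MAX_FORKNUM 3			/* made-up value for illustration */

int
main(void)
{
	int		n = 4;				/* number of relations, for example */

	/*
	 * One allocation covers the whole n x (MAX_FORKNUM + 1) matrix.
	 * "blocks" is a pointer to an array of MAX_FORKNUM + 1 BlockNumbers,
	 * so blocks[i][j] addresses row i, column j without per-row allocations.
	 */
	BlockNumber (*blocks)[MAX_FORKNUM + 1] =
		malloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));

	if (blocks == NULL)
		return 1;

	for (int i = 0; i < n; i++)
		for (int j = 0; j <= MAX_FORKNUM; j++)
			blocks[i][j] = (BlockNumber) (i * 100 + j);

	printf("blocks[2][1] = %u\n", blocks[2][1]);	/* prints 201 */

	free(blocks);
	return 0;
}

The declaration reads the same way as a function pointer: strip the variable
name and you are left with the type, BlockNumber (*)[MAX_FORKNUM + 1], i.e. a
pointer to an array of MAX_FORKNUM + 1 BlockNumbers.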
> > +		nBlocksToInvalidate += blocks[i][numForks];
> > +
> > +		forks[i][numForks++] = j;
> >
> > We can signal the absence of a fork to the later code by setting the
> > corresponding blocks[] entry to InvalidBlockNumber. Thus forks[], nforks
> > and numForks can be removed.
>
> Followed it in 0004.

Looks fine to me, thanks.

> > +	/* Zero the array of blocks because these will all be dropped anyway */
> > +	MemSet(firstDelBlocks, 0, sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));
> >
> > We don't need to prepare nforks, forks and firstDelBlocks for all relations
> > before looping over relations. In other words, we can fill in the arrays
> > for one relation at each iteration over the relations.
>
> Followed your advice, although I now drop the buffers per fork, which
> removes forks[][], nforks, and firstDelBlocks[].

That's fine with me.

> > +	 * We enter the optimization iff we are in recovery and the number of blocks to
> >
> > This comment sticks out past 80 columns. (I'm not sure whether that
> > convention is still valid..)
>
> Fixed.

> > +	if (InRecovery && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
> >
> > We don't need to check InRecovery here. DropRelFileNodeBuffers doesn't do
> > that.
>
> As for the DropRelFileNodesAllBuffers use case, I used InRecovery so that
> the optimization still works.
>
> Horiguchi-san also wrote in another mail:
> > A bit different from the point, but if some tuples have been inserted into
> > the truncated table, XLogReadBufferExtended() is called for the table and
> > the length is cached.
>
> I was wrong in my previous claim that the "cached" value always returns
> false. When I checked the log from the recovery TAP test, there was only one
> case where "cached" became true (script below) and the optimization path was
> entered. However, in all other cases, including the TRUNCATE test case in my
> patch, the "cached" flag returns "false".

Yeah, I agree that smgrnblocks returns false in the targeted cases, so some
amendment is needed. We need to discuss this point further.

> The "cached" flag became true:
> # in different subtransaction patterns
> $node->safe_psql(
>     'postgres', "
>     BEGIN;
>     CREATE TABLE spc_commit (id serial PRIMARY KEY, id2 int);
>     INSERT INTO spc_commit VALUES (DEFAULT, generate_series(1,3000));
>     TRUNCATE spc_commit;
>     SAVEPOINT s; ALTER TABLE spc_commit SET TABLESPACE other;
>     RELEASE s;
>     COPY spc_commit FROM '$copy_file' DELIMITER ',';
>     COMMIT;");
> $node->stop('immediate');
> $node->start;
>
> So I used InRecovery for the optimization case of DropRelFileNodesAllBuffers.
> I retained smgrnblocks' "cached" parameter as it is useful in
> DropRelFileNodeBuffers.

I think that's OK for this version of the patch.
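As an aside for readers following along, here is a rough, self-contained
sketch of the decision being discussed: enter the targeted-lookup path only
when the fork size is known from the cache and is small enough, and otherwise
fall back to scanning the whole buffer pool. This is not code from the patch:
the threshold constant and the helper below are made-up stand-ins for the
patch's BUF_DROP_FULL_SCAN_THRESHOLD and for smgrnblocks() with its "cached"
output parameter. (The patch's DropRelFileNodesAllBuffers path currently
tests InRecovery instead, because, as noted above, the "cached" flag comes
back false there.)

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int BlockNumber;

/* Made-up stand-in for the patch's BUF_DROP_FULL_SCAN_THRESHOLD. */
#define DROP_FULL_SCAN_THRESHOLD 512

/*
 * Stand-in for the patched smgrnblocks(): return the fork size and set
 * *cached when the value comes from the relation's cache rather than from
 * a fresh lseek().  Purely illustrative.
 */
static BlockNumber
sketch_nblocks(BlockNumber cached_size, bool have_cache, bool *cached)
{
	*cached = have_cache;
	return have_cache ? cached_size : 100000;	/* pretend lseek() result */
}

int
main(void)
{
	bool		cached;
	BlockNumber	nblocks = sketch_nblocks(128, true, &cached);

	if (cached && nblocks < DROP_FULL_SCAN_THRESHOLD)
	{
		/*
		 * During recovery the cached size bounds the pages of this fork
		 * that can be in shared buffers, so each candidate block can be
		 * looked up individually (BufTableLookup() in the real patch).
		 */
		printf("targeted drop: look up blocks 0..%u individually\n",
			   nblocks - 1);
	}
	else
	{
		/* No reliable bound: scan every buffer header, as before. */
		printf("full scan of shared buffers\n");
	}
	return 0;
}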
> "cached" flag became true: > # in different subtransaction patterns > $node->safe_psql( > 'postgres', " > BEGIN; > CREATE TABLE spc_commit (id serial PRIMARY KEY, id2 int); > INSERT INTO spc_commit VALUES (DEFAULT, > generate_series(1,3000)); > TRUNCATE spc_commit; > SAVEPOINT s; ALTER TABLE spc_commit SET TABLESPACE other; > RELEASE s; > COPY spc_commit FROM '$copy_file' DELIMITER ','; > COMMIT;"); > $node->stop('immediate'); > $node->start; > > So I used the InRecovery for the optimization case of > DropRelFileNodesAllBuffers. > I retained the smgrnblocks' "cached" parameter as it is useful in > DropRelFileNodeBuffers. I think that's ok as this version of the patch. > > > > I agree that we can do a better job by expanding comments to clearly > > > > state why it is safe. > > > > > > Yes, basically what Amit-san also mentioned above. The first patch > > prevents that. > > > And in the description of DropRelFileNodeBuffers in the 0003 patch, > > > please check If that would suffice. > > > > + * While in recovery, if the expected maximum number of > > buffers to be > > + * dropped is small enough and the sizes of all involved forks > > are > > + * already cached, individual buffer is located by > > BufTableLookup(). > > + * It is safe because cached blocks will not be invalidated by file > > + * extension during recovery. See smgrnblocks() and > > smgrextend() for > > + * more details. Otherwise, if the conditions for optimization are > > not > > + * met, the buffer pool is sequentially scanned so that no > > buffers are > > + * left behind. > > > > I'm not confident on it, but it seems somewhat obscure. How about > > something like this? > > > > We mustn't leave a buffer for the relations to be dropped. We invalidate > > buffer blocks by locating using BufTableLookup() when we assure that we > > know up to what page of every fork we possiblly have a buffer for. We can > > know that by the "cached" flag returned by smgrblocks. It currently gets > > true > > only while recovery. See > > smgrnblocks() and smgrextend(). Otherwise we scan the whole buffer pool to > > find buffers for the relation, which is slower when a small part of buffers > > are > > to be dropped. > > Followed your advice and modified it a bit. > > I have changed the status to "Needs Review". > Feedbacks are always welcome. > > Regards, > Kirk Jamison regards. -- Kyotaro Horiguchi NTT Open Source Software Center