Thanks for the new version. This mail contains only replies; I'll send some
further comments in another mail later.

At Thu, 3 Dec 2020 03:49:27 +0000, "k.jami...@fujitsu.com" <k.jami...@fujitsu.com> wrote in
> On Thursday, November 26, 2020 4:19 PM, Horiguchi-san wrote:
> > Hello, Kirk. Thank you for the new version.
>
> Apologies for the delay, but attached are the updated versions to simplify
> the patches.
> The changes reflect most of your comments/suggestions.
>
> Summary of changes in the latest versions:
> 1. Updated the function description of DropRelFileNodeBuffers in 0003.
> 2. Updated the commit logs of 0003 and 0004.
> 3. FindAndDropRelFileNodeBuffers is now called for each relation fork,
>    instead of for all involved forks.
> 4. Removed the unnecessary palloc() and subscripts like forks[][],
>    firstDelBlock[], nforks, as advised by Horiguchi-san. The memory
>    allocation for block[][] was also simplified.
>    So 0004 became simpler and more readable.

...

> > > a reliable size of nblocks for the supplied relation's fork at that time,
> > > and it's safe because DropRelFileNodeBuffers() relies on the behavior
> > > that cached nblocks will not be invalidated by file extension during
> > > recovery. Otherwise, or if not in recovery, proceed to sequential
> > > search of the whole buffer pool.
> >
> > This sentence seems to involve some confusion. It reads as if "we can rely
> > on it because we're relying on it". And "the cached value won't be
> > invalidated" doesn't explain the reason precisely. The reason, I think, is
> > that the cached value is guaranteed to be the maximum page we have in
> > shared buffers at least while in recovery, and that guarantee is held by
> > not asking fseek again once we have cached the value.
>
> Fixed the commit log of 0003.

Thanks!

...

> > +	nforks = palloc(sizeof(int) * n);
> > +	forks = palloc(sizeof(ForkNumber *) * n);
> > +	blocks = palloc(sizeof(BlockNumber *) * n);
> > +	firstDelBlocks = palloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));
> > +	for (i = 0; i < n; i++)
> > +	{
> > +		forks[i] = palloc(sizeof(ForkNumber) * (MAX_FORKNUM + 1));
> > +		blocks[i] = palloc(sizeof(BlockNumber) * (MAX_FORKNUM + 1));
> > +	}
> >
> > We can allocate the whole array at once like this.
> >
> >   BlockNumber (*blocks)[MAX_FORKNUM+1] =
> >       (BlockNumber (*)[MAX_FORKNUM+1])
> >       palloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1))
>
> Thank you for suggesting a way to reduce the lines for the 2D dynamic memory
> allocation. I followed this way in 0004, but it's my first time seeing it
> written like this. I am very glad it works, though is it okay to write it
> this way, since I cannot find similar code declaring and allocating 2D
> arrays like this in the Postgres source code?

Actually it may be somewhat novel to a certain portion of people, but it is
fundamentally the same construct as a function-pointer declaration. Hard to
write from scratch, but I suppose not so hard to read :)

  int (*func_char_to_int)(char x) = some_func;

FWIW isn.c has the following part:

> static bool
> check_table(const char *(*TABLE)[2], const unsigned TABLE_index[10][2])
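To make the pointer-to-array form concrete, here is a minimal standalone C
program. It uses plain malloc() in place of palloc() and a made-up
MAX_FORKNUM of 3; it only illustrates the same single-allocation pattern and
is not code from the patch.

#include <stdio.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;

#define MAX_FORKNUM 3			/* made-up value for illustration */

int
main(void)
{
	int		n = 4;				/* number of relations, for example */

	/*
	 * One allocation covers the whole n x (MAX_FORKNUM + 1) matrix.
	 * "blocks" is a pointer to an array of MAX_FORKNUM + 1 BlockNumbers,
	 * so blocks[i][j] addresses row i, column j without per-row allocations.
	 */
	BlockNumber (*blocks)[MAX_FORKNUM + 1] =
		malloc(sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));

	if (blocks == NULL)
		return 1;

	for (int i = 0; i < n; i++)
		for (int j = 0; j <= MAX_FORKNUM; j++)
			blocks[i][j] = (BlockNumber) (i * 100 + j);

	printf("blocks[2][1] = %u\n", blocks[2][1]);	/* prints 201 */

	free(blocks);
	return 0;
}

The declaration reads the same way as a function pointer: strip the variable
name and you are left with the type, BlockNumber (*)[MAX_FORKNUM + 1], i.e. a
pointer to an array of MAX_FORKNUM + 1 BlockNumbers.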
> > +		nBlocksToInvalidate += blocks[i][numForks];
> > +
> > +		forks[i][numForks++] = j;
> >
> > We can signal the absence of a fork to the later code by setting the
> > corresponding blocks[] entry to InvalidBlockNumber. Thus forks[], nforks
> > and numForks can be removed.
>
> Followed it in 0004.

Looks fine to me, thanks.

> > +	/* Zero the array of blocks because these will all be dropped anyway */
> > +	MemSet(firstDelBlocks, 0, sizeof(BlockNumber) * n * (MAX_FORKNUM + 1));
> >
> > We don't need to prepare nforks, forks and firstDelBlocks for all relations
> > before looping over relations. In other words, we can fill in the arrays
> > for one relation at each iteration over the relations.
>
> Followed your advice, although I now drop the buffers per fork, which
> removes forks[][], nforks, and firstDelBlocks[].

That's fine with me.

> > +	 * We enter the optimization iff we are in recovery and the number of blocks to
> >
> > This comment sticks out past 80 columns. (I'm not sure whether that
> > convention is still valid..)
>
> Fixed.

> > +	if (InRecovery && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
> >
> > We don't need to check InRecovery here. DropRelFileNodeBuffers doesn't do
> > that.
>
> As for the DropRelFileNodesAllBuffers use case, I used InRecovery so that
> the optimization still works.
>
> Horiguchi-san also wrote in another mail:
> > A bit different from the point, but if some tuples have been inserted into
> > the truncated table, XLogReadBufferExtended() is called for the table and
> > the length is cached.
>
> I was wrong in my previous claim that the "cached" value always returns
> false. When I checked the log from the recovery TAP test, there was only one
> case where "cached" became true (script below) and the optimization path was
> entered. However, in all other cases, including the TRUNCATE test case in my
> patch, the "cached" flag returns "false".

Yeah, I agree that smgrnblocks returns false in the targeted cases, so some
amendment is needed. We need to discuss this point further.

> The "cached" flag became true:
> # in different subtransaction patterns
> $node->safe_psql(
>     'postgres', "
>     BEGIN;
>     CREATE TABLE spc_commit (id serial PRIMARY KEY, id2 int);
>     INSERT INTO spc_commit VALUES (DEFAULT, generate_series(1,3000));
>     TRUNCATE spc_commit;
>     SAVEPOINT s; ALTER TABLE spc_commit SET TABLESPACE other;
>     RELEASE s;
>     COPY spc_commit FROM '$copy_file' DELIMITER ',';
>     COMMIT;");
> $node->stop('immediate');
> $node->start;
>
> So I used InRecovery for the optimization case of DropRelFileNodesAllBuffers.
> I retained smgrnblocks' "cached" parameter as it is useful in
> DropRelFileNodeBuffers.

I think that's OK for this version of the patch.
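As an aside for readers following along, here is a rough, self-contained
sketch of the decision being discussed: enter the targeted-lookup path only
when the fork size is known from the cache and is small enough, and otherwise
fall back to scanning the whole buffer pool. This is not code from the patch:
the threshold constant and the helper below are made-up stand-ins for the
patch's BUF_DROP_FULL_SCAN_THRESHOLD and for smgrnblocks() with its "cached"
output parameter. (The patch's DropRelFileNodesAllBuffers path currently
tests InRecovery instead, because, as noted above, the "cached" flag comes
back false there.)

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int BlockNumber;

/* Made-up stand-in for the patch's BUF_DROP_FULL_SCAN_THRESHOLD. */
#define DROP_FULL_SCAN_THRESHOLD 512

/*
 * Stand-in for the patched smgrnblocks(): return the fork size and set
 * *cached when the value comes from the relation's cache rather than from
 * a fresh lseek().  Purely illustrative.
 */
static BlockNumber
sketch_nblocks(BlockNumber cached_size, bool have_cache, bool *cached)
{
	*cached = have_cache;
	return have_cache ? cached_size : 100000;	/* pretend lseek() result */
}

int
main(void)
{
	bool		cached;
	BlockNumber	nblocks = sketch_nblocks(128, true, &cached);

	if (cached && nblocks < DROP_FULL_SCAN_THRESHOLD)
	{
		/*
		 * During recovery the cached size bounds the pages of this fork
		 * that can be in shared buffers, so each candidate block can be
		 * looked up individually (BufTableLookup() in the real patch).
		 */
		printf("targeted drop: look up blocks 0..%u individually\n",
			   nblocks - 1);
	}
	else
	{
		/* No reliable bound: scan every buffer header, as before. */
		printf("full scan of shared buffers\n");
	}
	return 0;
}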
> "cached" flag became true: > # in different subtransaction patterns > $node->safe_psql( > 'postgres', " > BEGIN; > CREATE TABLE spc_commit (id serial PRIMARY KEY, id2 int); > INSERT INTO spc_commit VALUES (DEFAULT, > generate_series(1,3000)); > TRUNCATE spc_commit; > SAVEPOINT s; ALTER TABLE spc_commit SET TABLESPACE other; > RELEASE s; > COPY spc_commit FROM '$copy_file' DELIMITER ','; > COMMIT;"); > $node->stop('immediate'); > $node->start; > > So I used the InRecovery for the optimization case of > DropRelFileNodesAllBuffers. > I retained the smgrnblocks' "cached" parameter as it is useful in > DropRelFileNodeBuffers. I think that's ok as this version of the patch. > > > > I agree that we can do a better job by expanding comments to clearly > > > > state why it is safe. > > > > > > Yes, basically what Amit-san also mentioned above. The first patch > > prevents that. > > > And in the description of DropRelFileNodeBuffers in the 0003 patch, > > > please check If that would suffice. > > > > + * While in recovery, if the expected maximum number of > > buffers to be > > + * dropped is small enough and the sizes of all involved forks > > are > > + * already cached, individual buffer is located by > > BufTableLookup(). > > + * It is safe because cached blocks will not be invalidated by file > > + * extension during recovery. See smgrnblocks() and > > smgrextend() for > > + * more details. Otherwise, if the conditions for optimization are > > not > > + * met, the buffer pool is sequentially scanned so that no > > buffers are > > + * left behind. > > > > I'm not confident on it, but it seems somewhat obscure. How about > > something like this? > > > > We mustn't leave a buffer for the relations to be dropped. We invalidate > > buffer blocks by locating using BufTableLookup() when we assure that we > > know up to what page of every fork we possiblly have a buffer for. We can > > know that by the "cached" flag returned by smgrblocks. It currently gets > > true > > only while recovery. See > > smgrnblocks() and smgrextend(). Otherwise we scan the whole buffer pool to > > find buffers for the relation, which is slower when a small part of buffers > > are > > to be dropped. > > Followed your advice and modified it a bit. > > I have changed the status to "Needs Review". > Feedbacks are always welcome. > > Regards, > Kirk Jamison regards. -- Kyotaro Horiguchi NTT Open Source Software Center