From: Thomas Munro <thomas.mu...@gmail.com>
> On Thu, Oct 22, 2020 at 7:33 PM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> > Mmm. Not exact. The requirement here is that we must be certain that
> > we don't have a buffer for blocks after the file size known to
> > the process. While recovering, if the first lseek() returned a smaller
> > size than the actual one, we cannot have a buffer for the blocks after
> > that size. After we have truncated or extended the file, we are certain
> > that we don't have a buffer for unknown blocks.
>
> Thanks, I understand now. Something feels fragile about it, perhaps
> because it's not really acting as a "cache" anymore despite its name,
> but I see the logic now. It becomes the authoritative source of
> information, even if the kernel decides to make our file smaller
> asynchronously.

Thank you Horiguchi-san, you are a savior! I was as worried as if the end of the world had come.

> I think a synchronised file size cache wouldn't be enough to use this
> trick outside the recovery process, because the initial value would
> come from a call to lseek(), but unlike recovery, that wouldn't happen
> *before* we start putting pages in the buffer pool. Also, if we one
> day have a size-limited relcache, even recovery could get into
> trouble, if it evicts the RelationData that holds the authoritative
> nblocks value.

That's too bad, because we hoped we might be able to apply this trick to various operations during normal operation (TRUNCATE, DROP TABLE/INDEX, DROP DATABASE, etc.). An honest man can't trust the system call; that's hell.

I'm probably being silly, but can't we avoid the problem by using fstat() instead of lseek(SEEK_END)? Would they return the same value from the i-node? Or, can't we just try BufTableLookup() on the block one past what smgrnblocks() returns?

Regards
Takayuki Tsunakawa
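
P.S. To make the fstat() question concrete, here is a minimal standalone sketch that compares the two calls on a freshly written temporary file. It is not PostgreSQL code: the helper names (size_via_lseek, size_via_fstat, sizes_agree_after_write) and the /tmp path template are hypothetical, and BLCKSZ is assumed to be the default 8192. It only shows that both calls report the same byte count in the simple local case; it says nothing about whether fstat() would behave differently when the kernel reports a stale size.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192				/* assumption: default PostgreSQL block size */

/* File size in bytes as reported by lseek(SEEK_END), or -1 on error. */
static off_t
size_via_lseek(int fd)
{
	return lseek(fd, 0, SEEK_END);
}

/* File size in bytes as reported by fstat(), or -1 on error. */
static off_t
size_via_fstat(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_size;
}

/*
 * Hypothetical demo: write nblocks zero-filled blocks to a temporary file
 * and check that both calls report nblocks * BLCKSZ bytes.
 * Returns 1 if both match the expected size, 0 if not, -1 on setup error.
 */
int
sizes_agree_after_write(int nblocks)
{
	char		path[] = "/tmp/sizecheckXXXXXX";	/* hypothetical location */
	char		block[BLCKSZ] = {0};
	off_t		expected;
	int			fd;
	int			i;
	int			ok;

	assert(nblocks >= 0);
	expected = (off_t) nblocks * BLCKSZ;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);				/* file disappears once fd is closed */

	for (i = 0; i < nblocks; i++)
	{
		if (write(fd, block, BLCKSZ) != BLCKSZ)
		{
			close(fd);
			return -1;
		}
	}

	ok = (size_via_lseek(fd) == expected && size_via_fstat(fd) == expected);
	close(fd);
	return ok;
}
```

Calling sizes_agree_after_write(3) writes three blocks and compares both results against 3 * BLCKSZ. Dividing either size by BLCKSZ gives the block count that a caller would treat as nblocks.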