On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
<horikyota....@gmail.com> wrote:
>
> At Wed, 2 Sep 2020 08:18:06 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote in
> > On Wed, Sep 2, 2020 at 7:01 AM Kyotaro Horiguchi
> > <horikyota....@gmail.com> wrote:
> > > Isn't a relation always locked access-exclusively at truncation
> > > time? If so, isn't even the result of lseek reliable enough?
> > >
> >
> > Even if the relation is locked, background processes like checkpointer
> > can still touch the relation, which might cause problems. Consider a
> > case where we extend the relation but didn't flush the newly added
> > pages. Now during the truncate operation, checkpointer can still flush
> > those pages, which can cause trouble for truncate. But I think in the
> > recovery path such cases won't cause a problem.
>
> I reconsidered this and still have a doubt.
>
> Does this mean lseek(SEEK_END) doesn't count blocks that are
> write(2)'ed (by smgrextend) but not yet flushed? (I don't think so,
> for clarity.) The nblocks cache is added just to reduce the number of
> lseek()s and is expected to always have the same value as what lseek()
> is expected to return.
>

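For reference, the expectation in the question above (that lseek(SEEK_END)
counts blocks that were written but not yet flushed) can be checked with a
small standalone program. This is only a sketch, not PostgreSQL code: the
function name and the scratch-file path are hypothetical, and BLCKSZ is
redefined locally to PostgreSQL's default of 8192 rather than taken from
the server headers.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192		/* local stand-in for PostgreSQL's default block size */

/*
 * Hypothetical helper: append nblocks zero-filled blocks to a new file at
 * "path" without calling fsync(), then return the number of blocks implied
 * by lseek(SEEK_END).  Returns -1 on any error.  The scratch file is
 * removed before returning.
 */
long
blocks_seen_after_unflushed_write(const char *path, int nblocks)
{
	char		block[BLCKSZ];
	off_t		end;
	int			fd;
	int			i;

	memset(block, 0, sizeof(block));

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	for (i = 0; i < nblocks; i++)
	{
		if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
		{
			close(fd);
			unlink(path);
			return -1;
		}
	}

	/* No fsync() here: the data may exist only in the OS page cache. */
	end = lseek(fd, 0, SEEK_END);

	close(fd);
	unlink(path);

	return (end < 0) ? -1 : (long) (end / BLCKSZ);
}
```

On a kernel that behaves as POSIX specifies, calling this with nblocks = 3
should return 3 even though nothing was flushed. It only shows the expected
behaviour; it cannot show that a given kernel never returns a stale result.
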
See the comments in ReadBuffer_common(), which indicate such a
possibility ("Unfortunately, we have also seen this case occurring
because of buggy Linux kernels that sometimes return an
lseek(SEEK_END) result that doesn't account for a recent write.").
Also, refer to my previous email [1] on this and the other email linked
from it, which has a discussion on this point.

> The reason it is reliable only during recovery
> is that the cache is not shared but the startup process is the only
> process that changes the relation size during recovery.
>

Yes, that is why we are planning to do this optimization for the
recovery path.

> If any other process can extend the relation while smgrtruncate is
> running, the current DropRelFileNodeBuffers should have the chance
> that a new buffer for extended area is allocated at a buffer location
> where the function has already passed by, which is a disaster.
>

The relation might have been extended before smgrtruncate, but the newly
added pages can be flushed by checkpointer during smgrtruncate.

[1] - https://www.postgresql.org/message-id/CAA4eK1LH2uQWznwtonD%2Bnch76kqzemdTQAnfB06z_LXa6NTFtQ%40mail.gmail.com

--
With Regards,
Amit Kapila.