On Sun, Nov 01, 2020 at 05:39:40PM -0800, Andres Freund wrote: > I'm a bit limited writing - one handed for a while following an injury > on Friday...
Oops. Take care. > On 2020-11-02 10:05:25 +0900, Michael Paquier wrote: > > There is no place doing that on HEAD. > > Err? > > /* see if the block is in the buffer pool or not */ > LWLockAcquire(partLock, LW_SHARED); > buf_id = BufTableLookup(&buf_tag, buf_hash); > > [...] > > How is this not doing IO while holding a buffer mapping lock? Well, other than the one we are discussing of course :) > > > > This specific point was mentioned in the first message of this thread, > > 7th paragraph. That was a long thread, so it is easy to miss: > > https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5q0y+wulml5sr...@mail.gmail.com > > The code clearly doesnt implement it that way. > > > > I am wondering what you have in mind regarding the use of > > BM_IO_IN_PROGRESS or a similar flag. Wouldn't that imply some > > consequences for other existing buffers in the table, like a possible > > eviction? > > You'd need exactly one empty buffer for that - it can be reused for the > next to-be-checked buffer. > > > > I'd like to think that we should not do any manipulation of > > the buffer tables in this case. > > Why? Its the way we lock buffers - why is this so special that we need > to do differently? > > > > Hence, in order to prevent a > > concurrent activity to load in shared buffers the page currently > > checked on disk, I got to think that we would need something new here, > > like a filtering hash table that would be checked each time a backend > > tries to insert an entry into the buffer tables. > > Thats going to slow down everything a bit - the mapping already is a > bottleneck. > > > > 1) If the buffer is in shared buffers, we have the APIs to solve that > > by using a pin, unlock the partition, and then do the I/O. (Still > > that's unsafe with the smgrwrite() argument?) > > Thats why you need an appropriate relation lock... Something CheckBuffer > didnt bother to document. Its a restriction, but one we probably can > live with. > > > > 2) If the buffer is not in shared buffers, we don't have what it takes > > to solve the problem yet. > > We do. Set up enough state for the case to be otherwise the same as the > in s_b case. > > > But even if we solve this problem, we will > > never really be sure that this is entirely safe, as per the argument > > with concurrent smgrwrite() calls. Current in-core code assumes that > > this can happen only for init forks of unlogged relations which would > > not be visible yet in the backend doing a page check, still it can be > > really easy to break this assumption with any new code added by a new > > feature. > > It also happens in a few other cases than just init forks. But > visibility & relation locking can take care of that. But you need to > document that. If the locking allows concurent readers - and especially > concurrent writers, then you cant really use smgrwite for anything but > relation extension. -- Michael
signature.asc
Description: PGP signature