Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> On Fri, Aug 17, 2018 at 9:55 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Another point is that the truncation code attempts to remove all
>> to-be-truncated-away pages from the shared buffer arena, but that only
>> works if nobody else is loading such pages into shared buffers
>> concurrently. In the presence of concurrent scans, we might be left
>> with valid-looking buffers for pages that have been truncated away
>> on-disk. That could cause all sorts of fun later. Yeah, the buffers
>> should contain only dead tuples ... but, for example, they might not
>> be hinted dead. If somebody sets one of those hint bits and then
>> writes the buffer back out to disk, you've got real problems.
> Thank you for the explanation. I see that injecting past EOF pages
> into shared buffers doesn't look good. I start thinking about letting
> caller of ReadBuffer() (or its variation) handle past EOF situation.

That'd still have the same race condition, though: between the time
we start to drop the doomed pages from shared buffers, and the time
we actually perform ftruncate, concurrent scans could re-load such
pages into shared buffers.

Could it work to ftruncate first and flush shared buffers after?
Probably not, I think the write-back-dirty-hint-bits scenario breaks
that one.

If this were easy, we'd have fixed it years ago :-(. It'd sure be
awfully nice not to need AEL during autovacuum, even transiently;
but I'm not sure how we get there without adding an unpleasant
amount of substitute locking in table scans.

			regards, tom lane
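
For illustration only, the two orderings discussed above can be replayed in a toy Python model. This is a sketch, not PostgreSQL code: ToyRelation and all of its methods are hypothetical names, the interleavings are hard-coded rather than produced by real concurrency, and the model assumes that writing a dirty buffer back past EOF simply re-extends the file.

```python
class ToyRelation:
    """Toy model: 'disk' is a list of pages, 'buffers' maps block number -> cached page."""

    def __init__(self, npages):
        self.disk = [{"hinted": False} for _ in range(npages)]
        self.buffers = {}  # blkno -> {"page": dict, "dirty": bool}

    def read_buffer(self, blkno):
        # A scan loads the page into the cache if it is not already there.
        if blkno not in self.buffers:
            if blkno >= len(self.disk):
                raise IOError("read past EOF")
            self.buffers[blkno] = {"page": dict(self.disk[blkno]), "dirty": False}
        return self.buffers[blkno]

    def set_hint_bit(self, blkno):
        # Setting a hint bit dirties the cached copy.
        buf = self.buffers[blkno]
        buf["page"]["hinted"] = True
        buf["dirty"] = True

    def drop_buffers_from(self, new_len):
        # Vacuum discards cached copies of the pages it is about to truncate.
        for blkno in [b for b in self.buffers if b >= new_len]:
            del self.buffers[blkno]

    def truncate(self, new_len):
        # Stand-in for ftruncate on the relation file.
        del self.disk[new_len:]

    def write_back(self, blkno):
        # ASSUMPTION of this toy model: a write past EOF re-extends the file.
        buf = self.buffers[blkno]
        if buf["dirty"]:
            while len(self.disk) <= blkno:
                self.disk.append({"hinted": False})
            self.disk[blkno] = dict(buf["page"])
            buf["dirty"] = False


def drop_then_truncate():
    rel, new_len = ToyRelation(4), 2
    rel.drop_buffers_from(new_len)   # vacuum: drop doomed pages from the cache
    rel.read_buffer(3)               # concurrent scan re-loads a doomed page
    rel.truncate(new_len)            # vacuum: truncate the file
    stale = sorted(b for b in rel.buffers if b >= len(rel.disk))
    rel.set_hint_bit(3)              # someone sets a hint bit on the stale buffer
    rel.write_back(3)                # ... and the buffer is written back out
    return stale, len(rel.disk)


def truncate_then_drop():
    rel, new_len = ToyRelation(4), 2
    rel.read_buffer(3)               # a scan already has the page cached
    rel.truncate(new_len)            # vacuum: truncate first
    rel.set_hint_bit(3)              # hint bit set before the buffers are dropped
    rel.write_back(3)                # dirty buffer written back past EOF
    rel.drop_buffers_from(new_len)   # vacuum: drop buffers, too late
    return sorted(rel.buffers), len(rel.disk)
```

In this model both orderings end with the file back at its original four pages: drop_then_truncate() leaves a stale buffer for block 3 after the truncation, and truncate_then_drop() ends with a clean cache but a page that was resurrected on disk by the hint-bit write-back.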