Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> On Fri, Aug 17, 2018 at 9:55 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Another point is that the truncation code attempts to remove all
>> to-be-truncated-away pages from the shared buffer arena, but that only
>> works if nobody else is loading such pages into shared buffers
>> concurrently.  In the presence of concurrent scans, we might be left
>> with valid-looking buffers for pages that have been truncated away
>> on-disk.  That could cause all sorts of fun later.  Yeah, the buffers
>> should contain only dead tuples ... but, for example, they might not
>> be hinted dead.  If somebody sets one of those hint bits and then
>> writes the buffer back out to disk, you've got real problems.

> Thank you for the explanation.  I see that injecting past-EOF pages
> into shared buffers doesn't look good.  I'm starting to think about
> letting the caller of ReadBuffer() (or one of its variants) handle the
> past-EOF situation.

That'd still have the same race condition, though: between the time
we start to drop the doomed pages from shared buffers, and the time
we actually perform ftruncate, concurrent scans could re-load such
pages into shared buffers.
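
To make that window concrete, here is a toy sketch of the interleaving.
It is Python, not PostgreSQL code: the file is modelled as a list of
pages, shared buffers as a dict keyed by block number, and every name in
it is made up for illustration.

```python
# Toy model only; none of these names exist in PostgreSQL.

def truncate_drop_first(file_pages, shared_buffers, new_nblocks, concurrent_reads):
    """Drop doomed buffers, let concurrent scans run, then truncate the file."""
    # Step 1: drop every cached page at or beyond the new end of file.
    for blkno in [b for b in shared_buffers if b >= new_nblocks]:
        del shared_buffers[blkno]

    # Window: scans that are not locked out can still read the doomed pages,
    # because those pages still exist on disk at this point.
    for blkno in concurrent_reads:
        if blkno < len(file_pages):
            shared_buffers[blkno] = file_pages[blkno]

    # Step 2: the actual ftruncate.
    del file_pages[new_nblocks:]

    # Block numbers of buffers that now describe pages past EOF.
    return sorted(b for b in shared_buffers if b >= len(file_pages))


file_pages = [f"page {i}" for i in range(8)]
shared_buffers = {5: "page 5", 6: "page 6"}
stale = truncate_drop_first(file_pages, shared_buffers,
                            new_nblocks=4, concurrent_reads=[6])
print(stale)  # [6]
```

The buffer for block 6 survives the truncation even though the drop step
ran to completion, and that holds no matter who is responsible for
noticing a past-EOF read.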

Could it work to ftruncate first and flush shared buffers after?
Probably not, I think the write-back-dirty-hint-bits scenario
breaks that one.
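
A toy sketch of how that ordering could go wrong, under the assumption
that writing a block past EOF simply re-extends the file.  Again this is
Python with invented names, not PostgreSQL code, and it shows only one
possible interleaving.

```python
# Toy model only; none of these names exist in PostgreSQL.

def truncate_first_then_flush(file_pages, shared_buffers, new_nblocks, hinted_blkno):
    """ftruncate first; a hinted buffer is written back before the flush."""
    # Step 1: the actual ftruncate.
    del file_pages[new_nblocks:]

    # Window: a backend sets a hint bit on a still-cached doomed page,
    # which marks the buffer dirty.
    buf = shared_buffers[hinted_blkno]
    buf["hinted"] = True
    buf["dirty"] = True

    # The dirty buffer is written out.  Assumption: a write past EOF
    # extends the file, leaving holes (None) for the skipped blocks.
    while len(file_pages) <= hinted_blkno:
        file_pages.append(None)
    file_pages[hinted_blkno] = {"hinted": True, "dirty": False}
    buf["dirty"] = False

    # Step 2: only now are the doomed buffers dropped.
    for blkno in [b for b in shared_buffers if b >= new_nblocks]:
        del shared_buffers[blkno]

    return len(file_pages)


pages_on_disk = [{"hinted": False, "dirty": False} for _ in range(8)]
cached = {6: {"hinted": False, "dirty": False}}
nblocks_after = truncate_first_then_flush(pages_on_disk, cached,
                                          new_nblocks=4, hinted_blkno=6)
print(nblocks_after)  # 7
```

The file was cut to 4 blocks and ends up 7 blocks long, with block 6
back on disk.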

If this were easy, we'd have fixed it years ago :-(.  It'd sure
be awfully nice not to need AEL during autovacuum, even transiently;
but I'm not sure how we get there without adding an unpleasant amount
of substitute locking in table scans.

                        regards, tom lane
