On 2016-07-18 01:33:10 -0700, Andres Freund wrote:
> On 2016-07-18 10:02:52 +0530, Amit Kapila wrote:
> > On Mon, Jul 18, 2016 at 9:13 AM, Andres Freund <and...@anarazel.de> wrote:
> > > On 2016-07-18 09:07:19 +0530, Amit Kapila wrote:
> > >> + /*
> > >> + * Before locking the buffer, pin the visibility map page if it may be
> > >> + * necessary.
> > >> + */
> > >>
> > >> + if (PageIsAllVisible(BufferGetPage(*buffer)))
> > >> + visibilitymap_pin(relation, block, &vmbuffer);
> > >> +
> > >> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> > >>
> > >> I think we need to check for PageIsAllVisible and try to pin the
> > >> visibility map after taking the lock on buffer. I think it is quite
> > >> possible that in the time this routine tries to acquire lock on
> > >> buffer, the page becomes all visible.
> > >
> > > I don't see how. Without a cleanup lock it's not possible to mark a page
> > > all-visible/frozen.
> > >
> >
> > Consider the below scenario.
> >
> > Vacuum
> > a. acquires a cleanup lock for page - 10
> > b. busy in checking visibility of tuples
> > --assume, here it takes some time and in the meantime Session-1
> > performs step (a) and (b) and start waiting in step- (c)
> > c. marks the page as all-visible (PageSetAllVisible)
> > d. unlockandrelease the buffer
> >
> > Session-1
> > a. In heap_lock_tuple(), readbuffer for page-10
> > b. check PageIsAllVisible(), found page is not all-visible, so didn't
> > acquire the visbilitymap_pin
> > c. LockBuffer in ExlusiveMode - here it will wait for vacuum to
> > release the lock
> > d. Got the lock, but now the page is marked as all-visible, so ideally
> > need to recheck the page and acquire the visibilitymap_pin
>
> So, I've tried pretty hard to reproduce that. While the theory above is
> sound, I believe the relevant code-path is essentially dead for SQL
> callable code, because we'll always hold a buffer pin before even
> entering heap_update/heap_lock_tuple. It's possible that you could
> concoct a dangerous scenario with follow_updates though; but I can't
> immediately see how. Due to that, and based on the closing in beta
> release, I'm planning to push a version of the patch that the returns
> fixed; but not this. It seems better to have the majority of the fix
> in.

Pushed that way. Let's try to figure out a good solution to a) test this
case b) how to fix it in a reasonable way. Note that there's also
http://archives.postgresql.org/message-id/20160718071729.tlj4upxhaylwv75n%40alap3.anarazel.de
which seems related.

Regards,

Andres
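
For illustration only, the "recheck after taking the lock" idea from the
scenario quoted above could be sketched roughly as follows. This is a
standalone toy model, not PostgreSQL code and not the pushed patch:
ToyPage, ToyWaitHook, toy_lock(), toy_unlock(),
toy_vacuum_marks_all_visible() and toy_lock_with_recheck() are all
hypothetical names, and the "pin" is just a boolean. The only point is the
ordering: check the flag, take the lock, then check the flag again, and if
it changed, drop the lock before pinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap page; not a PostgreSQL structure. */
typedef struct ToyPage
{
    bool all_visible;   /* models the page-level all-visible flag */
    bool locked;        /* models holding the exclusive content lock */
} ToyPage;

/* Hypothetical hook: what another backend does while we wait for the lock. */
typedef void (*ToyWaitHook) (ToyPage *page);

/* Models vacuum step (c): the page becomes all-visible while we wait. */
static void
toy_vacuum_marks_all_visible(ToyPage *page)
{
    page->all_visible = true;
}

/* Models acquiring the lock; the hook runs during the simulated wait. */
static void
toy_lock(ToyPage *page, ToyWaitHook while_waiting)
{
    assert(!page->locked);
    if (while_waiting != NULL)
        while_waiting(page);
    page->locked = true;
}

static void
toy_unlock(ToyPage *page)
{
    assert(page->locked);
    page->locked = false;
}

/*
 * Take the lock so that, on return, the page is locked and the (toy)
 * visibility map pin is held whenever the page is all-visible.
 */
static void
toy_lock_with_recheck(ToyPage *page, ToyWaitHook while_waiting,
                      bool *vm_pinned)
{
    /* Unlocked check: may be stale by the time the lock is granted. */
    *vm_pinned = page->all_visible;

    toy_lock(page, while_waiting);

    /* Recheck under the lock; the flag may have been set while waiting. */
    while (page->all_visible && !*vm_pinned)
    {
        /* Pinning may need I/O, so release the lock around it. */
        toy_unlock(page);
        *vm_pinned = true;
        toy_lock(page, NULL);
    }
}
```

Calling toy_lock_with_recheck() with toy_vacuum_marks_all_visible as the
hook models steps (b) to (d) of Session-1: the first check sees the flag
unset, the flag is set during the wait, and the recheck under the lock
notices and takes the pin. Passing NULL as the hook models the case where
nothing changes while waiting.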