On Mon, Feb 24, 2020 at 3:38 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund <and...@anarazel.de> wrote: > > What I'm advocating is that extension locks should continue to go > > through lock.c. And yes, that requires some changes to group locking, > > but I still don't see why they'd be complicated. > > > > Fair position, as per initial analysis, I think if we do below three > things, it should work out without changing to a new way of locking > for relation extension or page type locks. > a. As per the discussion above, ensure in code we will never try to > acquire another heavy-weight lock after acquiring relation extension > or page type locks (probably by having Asserts in code or maybe some > other way).
I have done an analysis of the relation extension lock (which can be acquired via LockRelationForExtension or ConditionalLockRelationForExtension) and found that we don't acquire any other heavyweight lock after acquiring it. However, we do sometimes try to acquire it again in the places where we update FSM after extension, see points (e) and (f) described below. The usage of this lock can be broadly divided into six categories and each one is explained as follows: a. Where after taking the relation extension lock we call ReadBuffer (or its variant) and then LockBuffer. The LockBuffer internally calls either LWLock to acquire or release neither of which acquire another heavy-weight lock. It is quite obvious as well that while taking some lightweight lock, there is no reason to acquire another heavyweight lock on any object. The specs/comments of ReadBufferExtended (which gets called from variants of ReadBuffer) API says that if the blknum requested is P_NEW, only one backend can call it at-a-time which indicates that we don't need to acquire any heavy-weight lock inside this API. Otherwise, also, this API won't need a heavy-weight lock to read the existing block into shared buffer as two different backends are allowed to read the same block. I have also gone through all the functions called/used in this path to ensure that we don't use heavy-weight locks inside it. The usage by APIs BloomNewBuffer, GinNewBuffer, gistNewBuffer, _bt_getbuf, and SpGistNewBuffer falls in this category. Another API that falls under this category is revmap_physical_extend which uses ReadBuffer, LocakBuffer and ReleaseBuffer. The ReleaseBuffer API unpins aka decrement the reference count for buffer and disassociates a buffer from the resource owner. None of that requires heavy-weight lock. T b. After taking relation extension lock, we call RelationGetNumberOfBlocks which primarily calls file-level functions to determine the size of the file. This doesn't acquire any other heavy-weight lock after relation extension lock. The usage by APIs ginvacuumcleanup, gistvacuumscan, btvacuumscan, and spgvacuumscan falls in this category. c. There is a usage in API brin_page_cleanup() where we just acquire and release the relation extension lock to avoid reinitializing the page. As there is no call in-between acquire and release, so there is no chance of another heavy-weight lock acquire after having relation extension lock. d. In fsm_extend() and vm_extend(), after acquiring relation extension lock, we perform various file-level operations like RelationOpenSmgr, smgrexists, smgrcreate, smgrnblocks, smgrextend. First, from theory, we don't have any heavy-weight lock other than relation extension lock which can cover such operations and then I have verified it by going through these APIs that these don't acquire any other heavy-weight lock. Then these APIs also call PageSetChecksumInplace computes a checksum of the page and sets the same in page header which is quite straight-forward and doesn't acquire any heavy-weight lock. In vm_extend, we additionally call CacheInvalidateSmgr to send a shared-inval message to force other backends to close any smgr references they may have for the relation for which we extending visibility map which has no reason to acquire any heavy-weight lock. I have checked the code path as well and I didn't find any heavy-weight lock call in that. e. In brin_getinsertbuffer, we call ReadBuffer() and LockBuffer(), the usage of which is the same as what is mentioned in (a). In addition to that it calls brin_initialize_empty_new_buffer() which further calls RecordPageWithFreeSpace which can again acquire relation extension lock for same relation. This usage is safe because we have a mechanism in heavy-weight lock manager that if we already hold a lock and a request came for the same lock and in same mode, the lock will be granted. f. In RelationGetBufferForTuple(), there are multiple APIs that get called and like (e), it can try to reacquire the relation extension lock in one of those APIs. The main APIs it calls after acquiring relation extension lock are described as follows: - GetPageWithFreeSpace: This tries to find a page in the given relation with at least the specified amount of free space. This mainly checks the FSM pages and in one of the paths might call fsm_extend which can again try to acquire the relation extension lock on the same relation. - RelationAddExtraBlocks: This adds multiple pages in a relation if there is contention around relation extension lock. This calls RelationExtensionLockWaiterCount which is mainly to check how many lockers are waiting for the same lock, then call ReadBufferBI which as explained above won't require heavy-weight locks and FSM APIs which can acquire Relation extension lock on the same relation, but that is safe as discussed previously. The Page locks can be acquired via LockPage and ConditionalLockPage. This is acquired from one place in the code during Gin index cleanup (ginInsertCleanup). The basic idea is that it will scan the pending list and move entries into the main index. While moving entries to the main page, it might need to add a new page that will require us to take a relation extension lock. Now, unlike relation extension lock, after acquiring page lock, we do acquire another heavy-weight lock (relation extension lock), but as we never acquire it in reverse order, this is safe. So, as per this analysis, we can add Asserts for relation extension and page locks which will indicate that they won't participate in deadlocks. It would be good if someone else can also do independent analysis and verify my findings. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com