Re: Exposing the lock manager's WaitForLockers() to SQL
Here is a new series adding a single pg_wait_for_lockers() function that takes a boolean argument to control the interpretation of the lock mode. It omits LOCK's handling of descendant tables so it requires permissions directly on descendants in order to wait for locks on them. Not sure if that would be a problem for anyone. v6-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch Description: Binary data v6-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch Description: Binary data v6-0003-Add-pg_wait_for_lockers-function.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
On Fri, Jan 26, 2024 at 4:54 AM vignesh C wrote: > > CFBot shows that there is one warning as in [1]: > patching file doc/src/sgml/libpq.sgml > ... > [09:30:40.000] [943/2212] Compiling C object > src/backend/postgres_lib.a.p/storage_lmgr_lock.c.obj > [09:30:40.000] c:\cirrus\src\backend\storage\lmgr\lock.c(4084) : > warning C4715: 'ParseLockmodeName': not all control paths return a > value Thanks Vignesh, I guess the MS compiler doesn't have __builtin_constant_p()? So I added an unreachable return, and a regression test that exercises this error path. I also made various other simplifications and minor fixes to the code, docs, and tests. Back in v5 (with a new SQL command) I had a detailed example in the docs, which I removed when changing to a function, and I'm not sure if I should try to add it back now...I could shrink it but it might still be too long for this part of the docs? Anyway, please see attached. v7-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch Description: Binary data v7-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch Description: Binary data v7-0003-Add-pg_wait_for_lockers-function.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
I guess the output of the deadlock test was unstable, so I simply removed it in v8 here, but I can try to fix it instead if it seems important to test that. v8-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch Description: Binary data v8-0003-Add-pg_wait_for_lockers-function.patch Description: Binary data v8-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
Minor style fix; sorry for the spam. v9-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch Description: Binary data v9-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch Description: Binary data v9-0003-Add-pg_wait_for_lockers-function.patch Description: Binary data
Re: [PATCH] LockAcquireExtended improvement
On Thu, Mar 14, 2024 at 1:15 PM Robert Haas wrote: > Seeing no further discussion, I have committed my version of this > patch, with your test case. This comment on ProcSleep() seems to have the values of dontWait backward (double negatives are tricky): * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR * if not (if dontWait = true, this is a deadlock; if dontWait = false, we * would have had to wait). Also there's a minor typo in a comment in LockAcquireExtended(): * Check the proclock entry status. If dontWait = true, this is an * expected case; otherwise, it will open happen if something in the * ipc communication doesn't work correctly. "open" should be "only".
Re: Exposing the lock manager's WaitForLockers() to SQL
Rebased, fixed a couple typos, and reordered the isolation tests to put the most elaborate pair last. v11-0001-Refactor-GetLockConflicts-into-more-general-GetL.patch Description: Binary data v11-0002-Allow-specifying-single-lockmode-in-WaitForLocke.patch Description: Binary data v11-0003-Add-pg_wait_for_lockers-function.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
Rebased and fixed conflicts. FWIW re: Andrey's comment in his excellent CF summary email[0]: we're currently using vanilla Postgres (via Gentoo) on single nodes, and not anything fancy like Citus. The Citus relationship is just that we were inspired by Marco's blog post there. We have a variety of clients written in different languages that generally don't coordinate their table modifications, and Marco's scheme merely requires them to use sequences idiomatically, which we can just about manage. :-) This feature is then a performance optimization to support this scheme while avoiding the case where one writer holding a RowExclusiveLock blocks the reader from taking a ShareLock which in turn prevents other writers from taking a RowExclusiveLock for a long time. Instead, the reader can wait for the first writer without taking any locks or blocking later writers. I've illustrated this difference in the isolation tests. Still hoping we can get this into 17. :-) [0] https://www.postgresql.org/message-id/C8D65462-0888-4484-A72C-C99A94381ECD%40yandex-team.ru v10-0003-Add-pg_wait_for_lockers-function.patch Description: Binary data v10-0001-Refactor-GetLockConflicts-into-more-general-GetL.patch Description: Binary data v10-0002-Allow-specifying-single-lockmode-in-WaitForLocke.patch Description: Binary data
Exposing the lock manager's WaitForLockers() to SQL
Hi there, We'd like to be able to call the lock manager's WaitForLockers() and WaitForLockersMultiple() from SQL. Below I describe our use case, but basically I'm wondering if this: 1. Seems like a reasonable thing to do 2. Would be of interest upstream 3. Should be done with a new pg_foo() function (taking an oid?), or a whole new SQL command, or something else If this sounds promising, we may be able to code this up and submit it. The rest of this email describes our use cases and related observations: Use Case Background Our use case is inspired by this blog post by Marco Slot (CC'ed) at Citus Data: https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/ . This describes a scheme for correctly aggregating rows given minimal coordination with an arbitrary number of writers while keeping minimal additional state. It relies on two simple facts: 1. INSERT/UPDATE take their ROW EXCLUSIVE lock on the target table before evaluating any column DEFAULT expressions, and thus before e.g. calling nextval() on a sequence in the DEFAULT expression. And of course, this lock is only released when the transaction commits or rolls back. 2. pg_sequence_last_value() (still undocumented!) can be used to obtain an instantaneous upper bound on the sequence values that have been returned by nextval(), even if the transaction that called nextval() hasn't yet committed. So, assume we have a table: create table tbl ( id bigserial, data text ); which is only ever modified by INSERTs that use DEFAULT for id. Then, a client can process each row exactly once using a loop like this (excuse the pseudo-SQL): min_id := 0; while true: max_id := pg_sequence_last_value('tbl_id_seq'); wait_for_writers('tbl'::regclass); SELECT some_aggregation(data) FROM tbl WHERE id > min_id AND id <= max_id; min_id := max_id; In the blog post, the equivalent of wait_for_writers() is implemented by taking and immediately releasing a SHARE ROW EXCLUSIVE lock on tbl. It's unclear why this can't be SHARE, since it just needs to conflict with INSERT's ROW EXCLUSIVE, but in any case it's sufficient for correctness. (Note that this version only works if the rows committed by the transactions that it waited for are actually visible to the SELECT, so for example, the whole thing can't be within a Repeatable Read or Serializable transaction.) Why WaitForLockers()? No new writer can acquire a ROW EXCLUSIVE lock as long as we're waiting to obtain the SHARE lock, even if we only hold it for an instant. If we have to wait a long time, because some existing writer holds its ROW EXCLUSIVE lock for a long time, this could noticeably reduce overall writer throughput. But we don't actually need to obtain a lock at all--and waiting for transactions that already hold conflicting locks is exactly what WaitForLockers() / WaitForLockersMultiple() does. Using it instead would prevent any interference with writers. Appendix: Extensions and Observations Aside from downgrading to SHARE mode and merely waiting instead of locking, we propose a couple other extensions and observations related to Citus' scheme. These only tangentially motivate our need for WaitForLockers(), so you may stop reading here unless the overall scheme is of interest. == Separate client for reading sequences and waiting == First, in our use case each batch of rows might require extensive processing as part of a larger operation that doesn't want to block waiting for writers to commit. A simple extension is to separate the processing from the determination of sequence values. In other words, have a single client that sits in a loop: while true: seq_val := pg_sequence_last_value('tbl_id_seq'); WaitForLockers('tbl'::regclass, 'SHARE'); publish(seq_val); and any number of other clients that use the series of published sequence values to do their own independent processing (maintaining their own additional state). This can be extended to multiple tables with WaitForLockersMultiple(): while true: seq_val1 := pg_sequence_last_value('tbl1_id_seq'); seq_val2 := pg_sequence_last_value('tbl2_id_seq'); WaitForLockersMultiple( ARRAY['tbl1', 'tbl2']::regclass[], 'SHARE'); publish('tbl1', seq_val1); publish('tbl2', seq_val2); Which is clearly more efficient than locking or waiting for the tables in sequence, hence the desire for that function as well. == Latency == This brings us to a series of observations about latency. If some writers take a long time to commit, some already-committed rows might not be processed for a long time. To avoid exacerbating this when using WaitForLockersMultiple(), which obviously has to wait for the last writer of any specified table, it should be
Re: Exposing the lock manager's WaitForLockers() to SQL
Hi Marco, thanks for the reply! Glad to know you'd find it useful too. :-) On Tue, Jan 10, 2023 at 1:01 AM Marco Slot wrote: > I'm wondering whether it could be an option of the LOCK command. > (LOCK WAIT ONLY?) I assume that's doable, but just from looking at the docs, it might be a little confusing. For example, at least if we use WaitForLockersMultiple(), waiting for multiple tables would happen in parallel (which I think is good), while locking them is documented to happen sequentially. Also, normal LOCK is illegal outside a transaction, but waiting makes perfect sense. (Actually, normal LOCK makes sense too, if the goal was just to wait. :-) ) By contrast, while LOCK has NOWAIT, and SELECT's locking clause has NOWAIT and SKIP LOCKED, they only change the blocking/failure behavior, while success still means taking the lock and has the same semantics. But I'm really no expert on SQL syntax or typical practice for things like this. Anything that works is fine with me. :-) As a possibly superfluous sidebar, I wanted to correct this part of my original message: > On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen wrote: > > pg_sequence_last_value() (still undocumented!) can be used to > > obtain an instantaneous upper bound on the sequence values > > that have been returned by nextval(), even if the transaction > > that called nextval() hasn't yet committed. This is true, but not the most important part of making this scheme work: as you mentioned in the Citus blog post, to avoid missing rows, we need (and this gives us) an instantaneous *lower* bound on the sequence values that could be used by transactions that commit after we finish waiting (and start processing). This doesn't work with sequence caching, since without somehow inspecting all sessions' sequence caches, rows with arbitrarily old/low cached sequence values could be committed arbitrarily far into the future, and we'd fail to process them. As you also implied in the blog post, the upper bound is what allows us to also process each row *exactly* once (instead of at least once) and in sequence order, if desired. So those are the respective justifications for both arms of the WHERE clause: id > min_id AND id <= max_id . On Tue, Jan 10, 2023 at 1:01 AM Marco Slot wrote: > > On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen wrote: > > We'd like to be able to call the lock manager's WaitForLockers() and > > WaitForLockersMultiple() from SQL. Below I describe our use case, but > > basically I'm wondering if this: > > > > 1. Seems like a reasonable thing to do > > > > 2. Would be of interest upstream > > > > 3. Should be done with a new pg_foo() function (taking an > >oid?), or a whole new SQL command, or something else > > Definitely +1 on adding a function/syntax to wait for lockers without > actually taking a lock. The get sequence value + lock-and-release > approach is still the only reliable scheme I've found for reliably and > efficiently processing new inserts in PostgreSQL. I'm wondering > whether it could be an option of the LOCK command. (LOCK WAIT ONLY?) > > Marco
Re: Exposing the lock manager's WaitForLockers() to SQL
Hi Andres, On Wed, Jan 11, 2023 at 12:33 PM Andres Freund wrote: > I think such a function would still have to integrate enough with the lock > manager infrastructure to participate in the deadlock detector. Otherwise I > think you'd trivially end up with loads of deadlocks. Could you elaborate on which unusual deadlock concerns arise? To be clear, WaitForLockers() is an existing function in lmgr.c (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986), and naively it seems like we mostly just need to call it. To my very limited understanding, from looking at the existing callers and the implementation of LOCK, that would look something like this (assuming we're in a SQL command like LOCK and calling unmodified WaitForLockers() with a single table): 1. Call something like RangeVarGetRelidExtended() with AccessShareLock to ensure the table is not dropped and obtain the table oid 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid 3. Call WaitForLockers(), which internally calls GetLockConflicts() and VirtualXactLock(). These certainly take plenty of locks of various types, and will likely sleep in LockAcquire() waiting for transactions to finish, but there don't seem to be any unusual pre/postconditions, nor do we hold any unusual locks already. Obviously a deadlock is possible if transactions end up waiting for each other, just as when taking table or row locks, etc., but it seems like this would be detected as usual?
Re: Exposing the lock manager's WaitForLockers() to SQL
I suppose if it's correct that we need to lock the table first (at least in ACCESS SHARE mode), an option to LOCK perhaps makes more sense. Maybe you could specify two modes like: LOCK TABLE IN _lockmode_ MODE AND THEN WAIT FOR CONFLICTS WITH _waitmode_ MODE; But that might be excessive. :-D And I don't know if there's any reason to use a _lockmode_ other than ACCESS SHARE. On Wed, Jan 11, 2023 at 11:03 PM Will Mortensen wrote: > > Hi Andres, > > On Wed, Jan 11, 2023 at 12:33 PM Andres Freund wrote: > > I think such a function would still have to integrate enough with the lock > > manager infrastructure to participate in the deadlock detector. Otherwise I > > think you'd trivially end up with loads of deadlocks. > > Could you elaborate on which unusual deadlock concerns arise? To be > clear, WaitForLockers() is an existing function in lmgr.c > (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986), > and naively it seems like we mostly just need to call it. To my very > limited understanding, from looking at the existing callers and the > implementation of LOCK, that would look something like this > (assuming we're in a SQL command like LOCK and calling unmodified > WaitForLockers() with a single table): > > 1. Call something like RangeVarGetRelidExtended() with AccessShareLock > to ensure the table is not dropped and obtain the table oid > > 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid > > 3. Call WaitForLockers(), which internally calls GetLockConflicts() and > VirtualXactLock(). These certainly take plenty of locks of various types, > and will likely sleep in LockAcquire() waiting for transactions to finish, > but there don't seem to be any unusual pre/postconditions, nor do we > hold any unusual locks already. > > Obviously a deadlock is possible if transactions end up waiting for each > other, just as when taking table or row locks, etc., but it seems like this > would be detected as usual?
Re: Exposing the lock manager's WaitForLockers() to SQL
Hi Andres, On Thu, Jan 12, 2023 at 11:31 AM Andres Freund wrote: > I know that WaitForLockers() is an existing function :). I'm not sure it's > entirely suitable for your use case. So I mainly wanted to point out that if > you end up writing a separate version of it, you still need to integrate with > the deadlock detection. I see. What about it seems potentially unsuitable? > On 2023-01-11 23:03:30 -0800, Will Mortensen wrote: > > To my very limited understanding, from looking at the existing callers and > > the implementation of LOCK, that would look something like this (assuming > > we're in a SQL command like LOCK and calling unmodified WaitForLockers() > > with a single table): > > > > 1. Call something like RangeVarGetRelidExtended() with AccessShareLock > > to ensure the table is not dropped and obtain the table oid > > > > 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid > > > > 3. Call WaitForLockers(), which internally calls GetLockConflicts() and > > VirtualXactLock(). These certainly take plenty of locks of various types, > > and will likely sleep in LockAcquire() waiting for transactions to finish, > > but there don't seem to be any unusual pre/postconditions, nor do we > > hold any unusual locks already. > > I suspect that keeping the AccessShareLock while doing the WaitForLockers() is > likely to increase the deadlock risk noticeably. I think for the use case you > might get away with resolving the relation names, building the locktags, and > then release the lock, before calling WaitForLockers. If somebody drops the > table or such, you'd presumably still get desired behaviour that way, without > the increased deaadlock risk. That makes sense. I agree it seems fine to just return if e.g. the table is dropped. FWIW re: deadlocks in general, I probably didn't highlight it well in my original email, but the existing solution for this use case (as Marco described in his blog post) is to actually lock the table momentarily. Marco's blog post uses ShareRowExclusiveLock, but I think ShareLock is sufficient for us; in any case, that's stronger than the AccessShareLock that we need to merely wait. And actually locking the table with e.g. ShareLock seems perhaps *more* likely to cause deadlocks (and hurts performance), since it not only waits for existing conflicting lockers (e.g. RowExclusiveLock) as desired, but also undesirably blocks other transactions from newly acquiring conflicting locks in the meantime. Hence the motivation for this feature. :-) I'm sure I may be missing something though. Thanks for all your feedback. :-)
Re: Exposing the lock manager's WaitForLockers() to SQL
Hi Andres, On Thu, Jan 12, 2023 at 7:49 PM Andres Freund wrote: > Consider a scenario like this: > > tx 1: acquires RowExclusiveLock on tbl1 to insert rows > tx 2: acquires AccessShareLock on tbl1 > tx 2: WaitForLockers(ShareRowExclusiveLock, tbl1) ends up waiting for tx1 > tx 1: truncate tbl1 needs an AccessExclusiveLock Oh of course, thanks. Is it even necessary to take the AccessShareLock? I see that one can call e.g. RangeVarGetRelidExtended() with NoLock, and from the comments it seems like that might be OK here? Did you have any remaining concerns about the suitability of WaitForLockers() for the use case? Any thoughts on the syntax? It seems like an option to LOCK (like Marco suggested) might be simplest to implement albeit a little tricky to document. Supporting descendant tables looks straightforward enough (just collect more locktags?). Views look more involved; maybe we can avoid supporting them?
Re: Exposing the lock manager's WaitForLockers() to SQL
On Sun, Sep 3, 2023 at 11:16 PM Will Mortensen wrote: > I realized that for our use case, we'd ideally wait for holders of > RowExclusiveLock only, and not e.g. VACUUM holding > ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems > possible by generalizing/duplicating WaitForLockersMultiple() and > GetLockConflicts(), but I'd love to have a sanity check before > attempting that. Also, I imagine those semantics might be too > different to make sense as part of the LOCK command. Well I attempted it. :-) Here is a new series that refactors GetLockConflicts(), generalizes WaitForLockersMultiple(), and adds a new WAIT FOR LOCKERS command. I first tried extending LOCK further, but the code became somewhat unwieldy and the syntax became very confusing. I also thought again about making new pg_foo() functions, but that would seemingly make it harder to share code with LOCK, and sharing syntax (to the extent it makes sense) feels very natural. Also, a new SQL command provides plenty of doc space. :-) (I'll see about adding more examples later.) I'll try to edit the title of the CF entry accordingly. Still looking forward to any feedback. :-) v4-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch Description: Binary data v4-0003-Add-WAIT-FOR-LOCKERS-command.patch Description: Binary data v4-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
I meant to add that the example in the doc is adapted from Marco Slot's blog post linked earlier: https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/
Re: Exposing the lock manager's WaitForLockers() to SQL
Simplified the code and docs, and rewrote the example with more prose instead of PL/pgSQL, which unfortunately made it longer, although it could be truncated. Not really sure what's best... v5-0001-Refactor-GetLockConflicts-into-more-general-GetLo.patch Description: Binary data v5-0003-Add-WAIT-FOR-LOCKERS-command.patch Description: Binary data v5-0002-Allow-specifying-single-lockmode-in-WaitForLocker.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
Hi Laurenz, thanks for taking a look! On Sat, Jan 6, 2024 at 4:00 AM Laurenz Albe wrote: > While your original use case is valid, I cannot think of > any other use case. So it is a special-purpose statement that is only > useful for certain processing of append-only tables. It is definitely somewhat niche. :-) But as I mentioned in my longwinded original message, the scheme is easily extended (with some tradeoffs) to process updates, if they set a non-primary-key column using a sequence. As for deletions though, our applications handle them separately. > Is it worth creating a new SQL statement for that, which could lead to > a conflict with future editions of the SQL standard? Couldn't we follow > the PostgreSQL idiosyncrasy of providing a function with side effects > instead? I would be happy to add a pg_foo() function instead. Here are a few things to figure out: * To support waiting for lockers in a specified mode vs. conflicting with a specified mode, should there be two functions, or one function with a boolean argument like I used in C? * Presumably the function(s) would take a regclass[] argument? * Presumably the lock mode would be specified using strings like 'ShareLock'? There's no code to parse these AFAICT, but we could add it. * Maybe we could omit LOCK's handling of descendant tables for simplicity? I will have to see how much other code needs to be duplicated or shared. I'll look further into it later this week.
Re: Exposing the lock manager's WaitForLockers() to SQL
I realized that for our use case, we'd ideally wait for holders of RowExclusiveLock only, and not e.g. VACUUM holding ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems possible by generalizing/duplicating WaitForLockersMultiple() and GetLockConflicts(), but I'd love to have a sanity check before attempting that. Also, I imagine those semantics might be too different to make sense as part of the LOCK command. Alternatively, I had originally been trying to use the pg_locks view, which obviously provides flexibility in identifying existing lock holders. But I couldn't find a way to wait for the locks to be released / transactions to finish, and I was a little concerned about the performance impact of selecting from it frequently when we only care about a subset of the locks, although I didn't try to assess that in our particular application. In any case, I'm looking forward to hearing more feedback from reviewers and potential users. :-)
Re: Exposing the lock manager's WaitForLockers() to SQL
Here is a first attempt at a WIP patch. Sorry about the MIME type. It doesn't take any locks on the tables, but I'm not super confident that that's safe, so any input would be appreciated. I omitted view support for simplicity, but if that seems like a requirement I'll see about adding it. I assume we would need to take AccessShareLock on views (and release it, per above). If the syntax and behavior seem roughly correct I'll work on updating the docs. The commit message at the beginning of the .patch has slightly more commentary. Thanks for any and all feedback! 0001-Add-WAIT-ONLY-option-to-LOCK.patch Description: Binary data
[PATCH] fix doc example of bit-reversed MAC address
Pretty trivial since this is documenting something that Postgres *doesn't* do, but it incorrectly reversed only the bits of each nibble, not the whole byte. See e.g. https://www.ibm.com/docs/en/csfdcd/7.1?topic=ls-bit-ordering-in-mac-addresses for a handy table. diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 9b0c6c6926..eb2cf34ae6 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -3907,7 +3907,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays IEEE Std 802-2001 specifies the second shown form (with hyphens) as the canonical form for MAC addresses, and specifies the first form (with colons) as the bit-reversed notation, so that - 08-00-2b-01-02-03 = 01:00:4D:08:04:0C. This convention is widely + 08-00-2b-01-02-03 = 10:00:D4:80:40:C0. This convention is widely ignored nowadays, and it is relevant only for obsolete network protocols (such as Token Ring). PostgreSQL makes no provisions for bit reversal, and all accepted formats use the canonical LSB
Re: Exposing the lock manager's WaitForLockers() to SQL
Updated docs a bit. I'll see about adding this to the next CF in hopes of attracting a reviewer. :-) v3-0001-Add-WAIT-ONLY-option-to-LOCK-command.patch Description: Binary data
Re: [PATCH] LockAcquireExtended improvement
On Tue, Mar 26, 2024 at 7:14 PM Will Mortensen wrote: > This comment on ProcSleep() seems to have the values of dontWait > backward (double negatives are tricky): > > * Result: PROC_WAIT_STATUS_OK if we acquired the lock, > PROC_WAIT_STATUS_ERROR > * if not (if dontWait = true, this is a deadlock; if dontWait = false, we > * would have had to wait). > > Also there's a minor typo in a comment in LockAcquireExtended(): > > * Check the proclock entry status. If dontWait = true, this is an > * expected case; otherwise, it will open happen if something in the > * ipc communication doesn't work correctly. > > "open" should be "only". Here's a patch fixing those typos. v1-0001-Fix-typos-from-LOCK-NOWAIT-improvement.patch Description: Binary data
Re: [PATCH] LockAcquireExtended improvement
Thanks! :-)
Re: Exposing the lock manager's WaitForLockers() to SQL
I got some very helpful off-list feedback from Robert Haas that this needed more self-contained explanation/motivation. So here goes. :-) This patch set adds a new SQL function pg_wait_for_lockers(), which waits for transactions holding specified table locks to commit or roll back. This can be useful with knowledge of the queries in those transactions, particularly for asynchronous and incremental processing of inserted/updated rows. Specifically, consider a scenario where INSERTs and UPDATEs always set a serial column to its default value. A client can call pg_sequence_last_value() + pg_wait_for_lockers() and then take a new DB snapshot and know that rows committed after this snapshot will have values of the serial column greater than the value from pg_sequence_last_value(). As shown in the example at the end, this allows the client to asynchronously and incrementally read inserted/updated rows with minimal per-client state, without buffering changes, and without affecting writer transactions. There are lots of other ways to support incrementally reading new rows, but they don’t have all of those qualities. For example: * Forcing writers to commit in a specific order (e.g. by serial column value) would reduce throughput * Explicitly tracking or coordinating with writers would likely be more complex, impact performance, and/or require much more state * Methods that are synchronous or buffer/queue changes are problematic if readers fall behind Existing ways to wait for table locks also have downsides: * Taking a conflicting lock with LOCK blocks new transactions from taking the lock of interest while LOCK waits. And in order to wait for writers holding RowExclusiveLock, we must take ShareLock, which also conflicts with ShareUpdateExclusiveLock and therefore unnecessarily interferes with (auto)vacuum. Finally, with multiple tables LOCK locks them one at a time, so it waits (and holds locks) longer than necessary. * Using pg_locks / pg_lock_status() to identify the transactions holding the locks is more expensive since it also returns all other locks in the DB cluster, plus there’s no efficient built-in way to wait for the transactions to commit or roll back. By contrast, pg_wait_for_lockers() doesn’t block other transactions, waits on multiple tables in parallel, and doesn’t spend time looking at irrelevant locks. This change is split into three patches for ease of review. The first two patches modify the existing WaitForLockers() C function and other locking internals to support waiting for lockers in a single lock mode, which allows waiting for INSERT/UPDATE without waiting for vacuuming. These changes could be omitted at the cost of unnecessary waiting, potentially for a long time with slow vacuums. The third patch adds the pg_wait_for_lockers() SQL function, which just calls WaitForLockers(). FWIW, another solution might be to directly expose the functions that WaitForLockers() calls, namely GetLockConflicts() (generalized to GetLockers() in the first patch) to identify the transactions holding the locks, and VirtualXactLock() to wait for each transaction to commit or roll back. That would be more complicated for the client but could be more broadly useful. I could investigate that further if it seems preferable. === Example === Assume we have the following table: CREATE TABLE page_views ( id bigserial, view_time timestamptz ); which is only ever modified by (potentially concurrent) INSERT commands that assign the default value to the id column. We can run the following commands: SELECT pg_sequence_last_value('page_views_id_seq'); pg_sequence_last_value 4 SELECT pg_wait_for_lockers(array['page_views']::oid, regclass[], 'RowExclusiveLock', FALSE); Now we know that all rows where id <= 4 have been committed or rolled back, and we can observe/process them: SELECT * FROM page_views WHERE id <= 4; id | view_time +--- 2 | 2024-01-01 12:34:01.00-00 3 | 2024-01-01 12:34:00.00-00 Later we can iterate: SELECT pg_sequence_last_value('page_views_id_seq'); pg_sequence_last_value 9 SELECT pg_wait_for_lockers(array['page_views']::oid, regclass[], 'RowExclusiveLock', FALSE); We already observed all the rows where id <= 4, so this time we can filter them out: SELECT * FROM page_views WHERE id > 4 AND id <= 9; id | view_time +--- 5 | 2024-01-01 12:34:05.00-00 8 | 2024-01-01 12:34:04.00-00 9 | 2024-01-01 12:34:07.00-00 We can continue iterating like this to incrementally observe more newly inserted rows. Note that the only state we persist across iterations is the value returned by pg_sequence_last_value(). In this example, we processed inserted rows exactly once. Variations are possible for handling updates, as discussed in the original email, and I could explain that again
Re: Exposing the lock manager's WaitForLockers() to SQL
I should add that the latest patches remove permissions checks because pg_locks doesn't have any, and improve the commit messages. Hope I didn't garble anything doing this late after the dev conference. :-) Robert asked me about other existing functions that could be leveraged, such as GetConflictingVirtualXIDs(), but I didn't see any with close-enough semantics that handle fast-path locks as needed for tables/relations.
[PATCH] doc: fix markup indentation in MVCC
Trivial fix to make the indentation consistent. From 46977fbe5fa0a26ef77938a8fe30b9def062e8f8 Mon Sep 17 00:00:00 2001 From: Will Mortensen Date: Sat, 27 Aug 2022 17:07:11 -0700 Subject: [PATCH 1/6] doc: fix markup indentation in MVCC --- doc/src/sgml/mvcc.sgml | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index 337f6dd429..69b01d01b9 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -109,8 +109,8 @@ dirty read dirty read - - + + A transaction reads data written by a concurrent uncommitted transaction. @@ -121,8 +121,8 @@ nonrepeatable read nonrepeatable read - - + + A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read). @@ -135,8 +135,8 @@ phantom read phantom read - - + + A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. @@ -149,8 +149,8 @@ serialization anomaly serialization anomaly - - + + The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time. -- 2.25.1
[PATCH] doc: add missing mention of MERGE in MVCC
MERGE is now a data-modification command too. 0002-doc-add-missing-mention-of-MERGE-in-MVCC.patch Description: Binary data
Re: [PATCH] doc: add missing mention of MERGE in MVCC
I saw, thanks again! On Wed, Jun 21, 2023 at 4:08 PM Bruce Momjian wrote: > > On Mon, Jun 19, 2023 at 11:32:46PM -0700, Will Mortensen wrote: > > MERGE is now a data-modification command too. > > Yes, this has been applied too. > > -- > Bruce Momjian https://momjian.us > EDB https://enterprisedb.com > > Only you can decide what is important to you.
Re: Exposing the lock manager's WaitForLockers() to SQL
Updated patch with more tests and a first attempt at doc updates. As the commit message and doc now point out, using WaitForLockersMultiple() makes for a behavior difference with actually locking multiple tables, in that the combined set of conflicting locks is obtained only once for all tables, rather than obtaining conflicts and locking / waiting for just the first table and then obtaining conflicts and locking / waiting for the second table, etc. This is definitely desirable for my use case, but maybe these kinds of differences illustrate the potential awkwardness of extending LOCK? Thanks again for any and all feedback! v2-0001-Add-WAIT-ONLY-option-to-LOCK-statement.patch Description: Binary data
Re: Exposing the lock manager's WaitForLockers() to SQL
On Thu, May 30, 2024 at 12:01 AM Will Mortensen wrote: > FWIW, another solution might be to directly expose the functions that > WaitForLockers() calls, namely GetLockConflicts() (generalized to > GetLockers() in the first patch) to identify the transactions holding > the locks, and VirtualXactLock() to wait for each transaction to > commit or roll back. That would be more complicated for the client but > could be more broadly useful. I could investigate that further if it > seems preferable. We will look further into this. Since the main advantage over polling the existing pg_locks view would be efficiency, we will try to provide more quantitative evidence/analysis of that. That will probably want to be a new thread and CF entry, so I'm withdrawing this one. Thanks again for all the replies, and to Robert for your off-list feedback and letting me bend your ear in Vancouver. :-)
README.tuplock and SHARE lock
README.tuplock says: > There is one exception > here: since infomask space is limited, we do not provide a separate bit > for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in > that case. (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are > presumably more commonly used due to being the standards-mandated locking > mechanism, or heavily used by the RI code, so we want to provide fast paths > for those.) But looking at the explanations of the infomask bits further down (as updated in commit cdbdc4382743fcfabb3437ea7c4d103adaa01324), as well as the actual code for locking a not-yet-locked tuple in compute_new_xmax_infomask(), this doesn't seem to be true. Was this an oversight?
Re: README.tuplock and SHARE lock
Sounds good to me. :-)