Re: Logical Replication of sequences
On Thu, Jun 13, 2024 at 6:14 PM Masahiko Sawada wrote: > > On Thu, Jun 13, 2024 at 7:06 PM Amit Kapila wrote: > > > > On Thu, Jun 13, 2024 at 1:09 PM Masahiko Sawada > > wrote: > > > > > > On Wed, Jun 12, 2024 at 6:59 PM Amit Kapila > > > wrote: > > > > > > > > > > > > Yeah, starting with a single worker sounds good for now. Do you think > > > > we should sync all the sequences in a single transaction or have some > > > > threshold value above which a different transaction would be required > > > > or maybe a different sequence sync worker altogether? Now, having > > > > multiple sequence-sync workers requires some synchronization so that > > > > only a single worker is allocated for one sequence. > > > > > > > > The simplest thing is to use a single sequence sync worker that syncs > > > > all sequences in one transaction but with a large number of sequences, > > > > it could be inefficient. OTOH, I am not sure if it would be a problem > > > > in reality. > > > > > > I think that we can start with using a single worker and one > > > transaction, and measure the performance with a large number of > > > sequences. > > > > > > > Fair enough. However, this raises the question Dilip and Vignesh are > > discussing whether we need a new relfilenode for sequence update even > > during initial sync? As per my understanding, the idea is that similar > > to tables, the CREATE SUBSCRIPTION command (with copy_data = true) > > will create the new sequence entries in pg_subscription_rel with the > > state as 'i'. Then the sequence-sync worker would start a transaction > > and one-by-one copy the latest sequence values for each sequence (that > > has state as 'i' in pg_subscription_rel) and mark its state as ready > > 'r' and commit the transaction. Now if there is an error during this > > operation it will restart the entire operation. The idea of creating a > > new relfilenode is to handle the error so that if there is a rollback, > > the sequence state will be rolled back to 'i' and the sequence value > > will also be rolled back. The other option could be that we update the > > sequence value without a new relfilenode and if the transaction rolled > > back then only the sequence's state will be rolled back to 'i'. This > > would work with a minor inconsistency that sequence values will be > > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > > I am not sure if that matters because anyway, they can quickly be > > out-of-sync with the publisher again. > > I think it would be fine in many cases even if the sequence value is > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > But the case we would like to avoid is where suppose the sequence-sync > worker does both synchronizing sequence values and updating the > sequence states for all sequences in one transaction, and if there is > an error we end up retrying the synchronization for all sequences. > The one idea to avoid this is to update sequences in chunks (say 100 or some threshold number of sequences in one transaction). Then we would only redo the sync for the last and pending set of sequences. > > > > Now, say we don't want to maintain the state of sequences for initial > > sync at all then after the error how will we detect if there are any > > pending sequences to be synced? One possibility is that we maintain a > > subscription level flag 'subsequencesync' in 'pg_subscription' to > > indicate whether sequences need sync. 
This flag would indicate whether > > to sync all the sequences in pg_susbcription_rel. This would mean that > > if there is an error while syncing the sequences we will resync all > > the sequences again. This could be acceptable considering the chances > > of error during sequence sync are low. The benefit is that both the > > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same > > idea and sync all sequences without needing a new relfilenode. Users > > can always refer 'subsequencesync' flag in 'pg_subscription' to see if > > all the sequences are synced after executing the command. > > I think that REFRESH PUBLICATION {SEQUENCES} can be executed even > while the sequence-sync worker is synchronizing sequences. In this > case, the worker might not see new sequences added by the concurrent > REFRESH PUBLICATION {SEQUENCES} command since it's already running. > The worker could end up marking the subsequencesync as completed while > not synchronizing these new sequences. > This is possible but we could avoid REFRESH PUBLICATION {SEQUENCES} by not allowing to change the subsequencestate during the time sequence-worker is syncing the sequences. This could be restrictive but there doesn't seem to be cases where user would like to immediately refresh sequences after creating the subscription. > > > > > > > Or yet another idea I came up with is that a tablesync worker will > > > > > synchronize both the table and sequences owned by the table. That is, > > > > > after the table
Re: Allow non-superuser to cancel superuser tasks.
> On 13 Jun 2024, at 02:04, Nathan Bossart wrote: > > I adjusted 0001 based on my upthread feedback. This patch looks good to me. The spellchecker is complaining about “signaling” instead of “signalling”, but ISTM it’s OK. I’ve tried to dig into the test. The problem is that the CV is allocated in inj_state = GetNamedDSMSegment("injection_points", which seems to be destroyed in shmem_exit() calling dsm_backend_shutdown(). This happens before we broadcast that the sleep is over. I think this might happen with any wait on an injection point if it is pg_terminate_backend()ed. Is there a way to wake up from the CV sleep before processing the actual termination? Thanks! Best regards, Andrey Borodin.
Re: Improve the granularity of PQsocketPoll's timeout parameter?
On Thu, Jun 13, 2024 at 9:18 PM Tom Lane wrote: > Ranier Vilela writes: > > +1 for push. > > Done. [...] Thanks a lot Tom (and reviewers)! --DD
may be a buffer overflow problem
Hi hackers, I am using gcc version 11.3.0 to compile the postgres source code. Gcc complains about the following line: ```c strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); ``` with this error: misc.c:529:17: error: ‘strncpy’ output truncated before terminating nul copying 5 bytes from a string of the same length [-Werror=stringop-truncation] I looked at the definition of `sqlca->sqlstate` and it has only 5 bytes. When the statement ```c strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); ``` gets executed, `sqlca->sqlstate` will have no '\0' byte, which makes me anxious when someone prints it as a string. Indeed, I found code (in src/interfaces/ecpg/ecpglib/misc.c) that does exactly that: ```c fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", sqlca->sqlcode, sqlca->sqlstate); ``` Is there any chance to fix the code?
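To see why the warning matters, here is a minimal self-contained example of the situation being reported: the array is exactly five bytes, strncpy() fills all of them, and a later %s print has no terminator to stop at. The struct is a stand-in reduced to the one field under discussion.

```c
#include <stdio.h>
#include <string.h>

/* stand-in for the ECPG sqlca, reduced to the one field under discussion */
struct sqlca_t
{
	char		sqlstate[5];	/* fixed-length SQLSTATE, not NUL-terminated */
};

int
main(void)
{
	struct sqlca_t sqlca;

	/* copies exactly 5 bytes; no room is left for a trailing '\0' */
	strncpy(sqlca.sqlstate, "YE001", sizeof(sqlca.sqlstate));

	/* printing with %s reads past the array: undefined behavior */
	printf("state: %s\n", sqlca.sqlstate);

	return 0;
}
```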
Don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid.
Hello! The src/backend/access/heap/README.tuplock says about HEAP_XMAX_INVALID bit that "Any tuple with this bit set does not have a valid value stored in XMAX." Found that FreezeMultiXactId() tries to process such an invalid multi xmax and may looks for an update xid in the pg_multixact for it. Maybe not do this work in FreezeMultiXactId() and exit immediately if the bit HEAP_XMAX_INVALID was already set? For instance, like that: master @@ -6215,6 +6215,15 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, /* We should only be called in Multis */ Assert(t_infomask & HEAP_XMAX_IS_MULTI); + /* Xmax is already marked as invalid */ + if (MultiXactIdIsValid(multi) && + (t_infomask & HEAP_XMAX_INVALID)) + { + *flags |= FRM_INVALIDATE_XMAX; + pagefrz->freeze_required = true; + return InvalidTransactionId; + } + if (!MultiXactIdIsValid(multi) || HEAP_LOCKED_UPGRADED(t_infomask)) With the best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Avoid orphaned objects dependencies, take 3
Hi, On Thu, Jun 13, 2024 at 02:27:45PM -0400, Robert Haas wrote: > On Thu, Jun 13, 2024 at 12:52 PM Bertrand Drouvot > wrote: > > > > table_open(childRelId, ...) would lock any "ALTER TABLE > > > > DROP CONSTRAINT" > > > > already. Not sure I understand your concern here. > > > > > > I believe this is not true. This would take a lock on the table, not > > > the constraint itself. > > > > I agree that it would not lock the constraint itself. What I meant to say > > is that > > , nevertheless, the constraint can not be dropped. Indeed, the "ALTER TABLE" > > necessary to drop the constraint (ALTER TABLE DROP CONSTRAINT) > > would > > be locked by the table_open(childRelId, ...). > > Ah, right. So, I was assuming that, with either this version of your > patch or the earlier version, we'd end up locking the constraint > itself. Was I wrong about that? The child contraint itself is not locked when going through ConstraintSetParentConstraint(). While at it, let's look at a full example and focus on your concern. Let's do that with this gdb file: " $ cat gdb.txt b dependency.c:1542 command 1 printf "Will return for: classId %d and objectId %d\n", object->classId, object->objectId c end b dependency.c:1547 if object->classId == 2606 command 2 printf "Will lock constraint: classId %d and objectId %d\n", object->classId, object->objectId c end " knowing that: " Line 1542 is the return here in depLockAndCheckObject() (your concern): if (object->classId == RelationRelationId || object->classId == AuthMemRelationId) return; Line 1547 is the lock here in depLockAndCheckObject(): /* assume we should lock the whole object not a sub-object */ LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock); " So, with gdb attached to a session let's: 1. Create the parent relation CREATE TABLE upsert_test ( a INT, b TEXT ) PARTITION BY LIST (a); gdb produces: --- Will return for: classId 1259 and objectId 16384 Will return for: classId 1259 and objectId 16384 --- Oid 16384 is upsert_test, so I think the return (dependency.c:1542) is fine as we are creating the object (it can't be dropped as not visible to anyone else). 2. Create another relation (will be the child) CREATE TABLE upsert_test_2 (b TEXT, a int); gdb produces: --- Will return for: classId 1259 and objectId 16391 Will return for: classId 1259 and objectId 16394 Will return for: classId 1259 and objectId 16394 Will return for: classId 1259 and objectId 16391 --- Oid 16391 is upsert_test_2 Oid 16394 is pg_toast_16391 so I think the return (dependency.c:1542) is fine as we are creating those objects (can't be dropped as not visible to anyone else). 3. Attach the partition ALTER TABLE upsert_test ATTACH PARTITION upsert_test_2 FOR VALUES IN (2); gdb produces: --- Will return for: classId 1259 and objectId 16384 --- That's fine because we'd already had locked the relation 16384 through AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid(). 4. Add a constraint on the child relation ALTER TABLE upsert_test_2 add constraint bdtc2 UNIQUE (a); gdb produces: --- Will return for: classId 1259 and objectId 16391 Will lock constraint: classId 2606 and objectId 16397 --- That's fine because we'd already had locked the relation 16391 through AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid(). Oid 16397 is the constraint we're creating (bdtc2). 5. 
Add a constraint on the parent relation (this goes through ConstraintSetParentConstraint()) ALTER TABLE upsert_test add constraint bdtc1 UNIQUE (a); gdb produces: --- Will return for: classId 1259 and objectId 16384 Will lock constraint: classId 2606 and objectId 16399 Will return for: classId 1259 and objectId 16398 Will return for: classId 1259 and objectId 16391 Will lock constraint: classId 2606 and objectId 16399 Will return for: classId 1259 and objectId 16391 --- Regarding "Will return for: classId 1259 and objectId 16384": That's fine because we'd already had locked the relation 16384 through AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid(). Regarding "Will lock constraint: classId 2606 and objectId 16399": Oid 16399 is the constraint that we're creating. Regarding "Will return for: classId 1259 and objectId 16398": That's fine because Oid 16398 is an index that we're creating while creating the constraint (so the index can't be dropped as not visible to anyone else). Regarding "Will return for: classId 1259 and objectId 16391": That's fine because 16384 we'd be already locked (as mentioned above). And I think that's fine because trying to drop "upsert_test_2" (aka 16391) would produce RemoveRelations()->RangeVarGetRelidExtended()->RangeVarCallbackForDropRelation() ->LockRelationOid(relid=16384, lockmode=8) and so would be locked. Regarding this example, I don't think that the return in depLockAndCheckObject(): " if (object->classId == R
Re: may be a buffer overflow problem
> On 14 Jun 2024, at 09:38, Winter Loo wrote: > I find the definition of `sqlca->sqlstate` and it has only 5 bytes. When the > statement > > ```c > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > ``` > > get executed, `sqlca->sqlstate` will have no '\0' byte which makes me anxious > when someone prints that as a string. sqlstate is defined as a fixed-length field that is not NUL-terminated, leaving the callers to handle termination. > Indeed, I found the code(in src/interfaces/ecpg/ecpglib/misc.c) does that, > > fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", > sqlca->sqlcode, sqlca->sqlstate); This is indeed buggy and needs to take the length into account, as per the attached. This only happens in the undocumented regression test debug mode, which may be why it's gone unnoticed. -- Daniel Gustafsson ecgp_sqlstate.diff Description: Binary data
Re: may be a buffer overflow problem
On Fri, 2024-06-14 at 15:38 +0800, Winter Loo wrote: > I am using gcc version 11.3.0 to compile postgres source code. Gcc complains > about the following line: > > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > > with error as: > > misc.c:529:17: error: ‘strncpy’ output truncated before terminating nul > copying 5 bytes from a string of the same length [-Werror=stringop-truncation] > > I find the definition of `sqlca->sqlstate` and it has only 5 bytes. When the > statement > > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > > get executed, `sqlca->sqlstate` will have no '\0' byte which makes me anxious > when someone prints that as a string. Indeed, I found the code(in > src/interfaces/ecpg/ecpglib/misc.c) does that, > > fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", > sqlca->sqlcode, sqlca->sqlstate); > > Is there any chance to fix the code? I agree that that is wrong. We could either use memcpy() to avoid the warning and use a format string with %.*s in fprintf(), or we could make the "sqlstate" one byte longer. I think that the second option would be less error-prone. Yours, Laurenz Albe
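A hedged sketch of the first option described above, not the diff that was actually attached upthread: memcpy() copies the five bytes without tripping -Wstringop-truncation, and a bounded width keeps the debug print from running past the unterminated field.

```c
/* copy the fixed-length SQLSTATE; no terminator is expected or written */
memcpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate));

/* bound the width so fprintf stops at the end of the field */
fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %.*s\n",
		sqlca->sqlcode, (int) sizeof(sqlca->sqlstate), sqlca->sqlstate);
```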
Re: may be a buffer overflow problem
On Fri, 2024-06-14 at 09:55 +0200, Daniel Gustafsson wrote: > > On 14 Jun 2024, at 09:38, Winter Loo wrote: > > > I find the definition of `sqlca->sqlstate` and it has only 5 bytes. When > > the statement > > > > ```c > > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > > ``` > > > > get executed, `sqlca->sqlstate` will have no '\0' byte which makes me > > anxious when someone prints that as a string. > > sqlstate is defined as not being unterminated fixed-length, leaving the > callers > to handle termination. > > > Indeed, I found the code(in src/interfaces/ecpg/ecpglib/misc.c) does that, > > > > fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", > > sqlca->sqlcode, sqlca->sqlstate); > > This is indeed buggy and need to take the length into account, as per the > attached. This only happens when in the undocumented regression test debug > mode which may be why it's gone unnoticed. So you think we should ignore that compiler warning? What about using memcpy() instead of strncpy()? Yours, Laurenz Albe
Re: may be a buffer overflow problem
> On 14 Jun 2024, at 10:06, Laurenz Albe wrote: > So you think we should ignore that compiler warning? We already do using this in meson.build: # Similarly disable useless truncation warnings from gcc 8+ 'format-truncation', 'stringop-truncation', -- Daniel Gustafsson
Re: may be a buffer overflow problem
On Fri, 2024-06-14 at 10:10 +0200, Daniel Gustafsson wrote: > > On 14 Jun 2024, at 10:06, Laurenz Albe wrote: > > > So you think we should ignore that compiler warning? > > We already do using this in meson.build: > > # Similarly disable useless truncation warnings from gcc 8+ > 'format-truncation', > 'stringop-truncation', Right; and I see that -Wno-stringop-truncation is also set if you build with "make". So your patch is good. I wonder how Winter Loo got to see that warning... Yours, Laurenz Albe
Re: may be a buffer overflow problem
> On 14 Jun 2024, at 10:29, Laurenz Albe wrote: > > On Fri, 2024-06-14 at 10:10 +0200, Daniel Gustafsson wrote: >>> On 14 Jun 2024, at 10:06, Laurenz Albe wrote: >> >>> So you think we should ignore that compiler warning? >> >> We already do using this in meson.build: >> >> # Similarly disable useless truncation warnings from gcc 8+ >> 'format-truncation', >> 'stringop-truncation', > > Right; and I see that -Wno-stringop-truncation is also set if you build > with "make". So your patch is good. Thanks for looking! I will apply it backpatched all the way down as this has been wrong since 2006. > I wonder how Winter Loo got to see that warning... And it would be interesting to know if that was the only warning, since error.c in ECPG performs the exact same string copy. -- Daniel Gustafsson
Re: libpq contention due to gss even when not using gss
> On Thu, Jun 13, 2024 at 10:30:24AM GMT, Andres Freund wrote: > > > To investigate a report of both postgres and pgbouncer having issues when > > > a > > > lot of new connections aree established, I used pgbench -C. Oddly, on an > > > early attempt, the bottleneck wasn't postgres+pgbouncer, it was pgbench. > > > But > > > only when using TCP, not with unix sockets. > > > > > > c=40;pgbench -C -n -c$c -j$c -T5 -f <(echo 'select 1') 'port=6432 > > > host=127.0.0.1 user=test dbname=postgres password=fake' > > > > > > host=127.0.0.1: 16465 > > > host=127.0.0.1,gssencmode=disable 20860 > > > host=/tmp:49286 > > > > > > Note that the server does *not* support gss, yet gss has a substantial > > > performance impact. > > > > > > Obviously the connection rates here absurdly high and outside of badly > > > written > > > applications likely never practically relevant. However, the number of > > > cores > > > in systems are going up, and this quite possibly will become relevant in > > > more > > > realistic scenarios (lock contention kicks in earlier the more cores you > > > have). > > > > By not supporting gss I assume you mean having built with --with-gssapi, > > but only host (not hostgssenc) records in pg_hba, right? > > Yes, the latter. Or not having kerberos set up on the client side. I've been experimenting with both: * The server is built without gssapi, but the client does support it. This produces exactly the contention you're talking about. * The server is built with gssapi, but do not use it in pg_hba, the client does support gssapi. In this case the difference between gssencmode=disable/prefer is even more dramatic in my test case (milliseconds vs seconds) due to the environment with configured kerberos (for other purposes, thus gss_init_sec_context spends huge amount of time to still return nothing). At the same time after quick look I don't see an easy way to avoid that. Current implementation tries to initialize gss before getting any confirmation from the server whether it's supported. Doing this other way around would probably just shift overhead to the server side.
Re: Optimize numeric.c mul_var() using the Karatsuba algorithm
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation:not tested This applies cleanly to master, builds and passes regression tests in my Windows/Cygwin environment. I also read through comments and confirmed for myself that the assumption about the caller ensuring var1 is shorter is done already in unchanged code from mul_var. Frees correspond to inits. The "Karatsuba condition" reasoning for deciding whether a number is big enough to use this algorithm appears to match what Joel has stated in this thread. The arithmetic appears to match what's been described in the comments. I have *not* confirmed that with any detailed review of the Karatsuba algorithm from outside sources, other implementations like the Rust one referenced here, or anything similar. I'm hoping that the regression tests give sufficient coverage that if the arithmetic was incorrect there would be obvious failures. If additional coverage was needed, cases falling immediately on either side of the limiting conditions used in the patch would probably be useful. From the limited precedent I've exposed myself to, that doesn't seem to be required here, but I'm open to contrary input from other reviewers. In the meantime, I'm marking this approved. Thanks for the detailed background and comments, Joel! The new status of this patch is: Ready for Committer
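For reference, the textbook Karatsuba identity that such a mul_var() split relies on, stated generically for a split at digit position m in base B (the patch's actual split point and thresholds are discussed earlier in the thread and not restated here):

$$x = x_1 B^m + x_0, \qquad y = y_1 B^m + y_0$$

$$xy = x_1 y_1\,B^{2m} + \bigl[(x_1 + x_0)(y_1 + y_0) - x_1 y_1 - x_0 y_0\bigr] B^m + x_0 y_0$$

Only three half-length multiplications are needed instead of four, giving the $T(n) = 3T(n/2) + O(n)$ recurrence, i.e. $O(n^{\log_2 3}) \approx O(n^{1.58})$; the extra additions and bookkeeping are why it only pays off above a size threshold, which is what the "Karatsuba condition" mentioned above guards.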
Re: Logical Replication of sequences
On Fri, Jun 14, 2024 at 5:16 AM Michael Paquier wrote: > > On Thu, Jun 13, 2024 at 03:36:05PM +0530, Amit Kapila wrote: > > Fair enough. However, this raises the question Dilip and Vignesh are > > discussing whether we need a new relfilenode for sequence update even > > during initial sync? As per my understanding, the idea is that similar > > to tables, the CREATE SUBSCRIPTION command (with copy_data = true) > > will create the new sequence entries in pg_subscription_rel with the > > state as 'i'. Then the sequence-sync worker would start a transaction > > and one-by-one copy the latest sequence values for each sequence (that > > has state as 'i' in pg_subscription_rel) and mark its state as ready > > 'r' and commit the transaction. Now if there is an error during this > > operation it will restart the entire operation. > > Hmm. You mean to use only one transaction for all the sequences? > I've heard about deployments with a lot of them. Could it be a > problem to process them in batches, as well? I don't think so. We can even sync one sequence per transaction but then it would be resource and time consuming without much gain. As mentioned in a previous email, we might want to sync 100 or some other threshold number of sequences per transaction. The other possibility is to make a subscription-level option for this batch size but I don't see much advantage in doing so as it won't be convenient for users to set it. I feel we should pick some threshold number that is neither too low nor too high and if we later see any problem with it, we can make it a configurable knob. > > > The idea of creating a > > new relfilenode is to handle the error so that if there is a rollback, > > the sequence state will be rolled back to 'i' and the sequence value > > will also be rolled back. The other option could be that we update the > > sequence value without a new relfilenode and if the transaction rolled > > back then only the sequence's state will be rolled back to 'i'. This > > would work with a minor inconsistency that sequence values will be > > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > > I am not sure if that matters because anyway, they can quickly be > > out-of-sync with the publisher again. > > Seeing a mention to relfilenodes specifically for sequences freaks me > out a bit, because there's some work I have been doing in this area > and sequences may not have a need for a physical relfilenode at all. > But I guess that you refer to the fact that like tables, relfilenodes > would only be created as required because anything you'd do in the > apply worker path would just call some of the routines of sequence.h, > right? > Yes, I think so. The only thing the patch expects is a way to rollback the sequence changes if the transaction rolls back during the initial sync. But I am not sure if we need such a behavior. The discussion for the same is in progress. Let's wait for the outcome. > > Now, say we don't want to maintain the state of sequences for initial > > sync at all then after the error how will we detect if there are any > > pending sequences to be synced? One possibility is that we maintain a > > subscription level flag 'subsequencesync' in 'pg_subscription' to > > indicate whether sequences need sync. This flag would indicate whether > > to sync all the sequences in pg_susbcription_rel. This would mean that > > if there is an error while syncing the sequences we will resync all > > the sequences again. 
This could be acceptable considering the chances > > of error during sequence sync are low. > > There could be multiple subscriptions to a single database that point > to the same set of sequences. Is there any conflict issue to worry > about here? > I don't think so. In the worst case, the same value would be copied twice. The same scenario in case of tables could lead to duplicate data or unique key violation ERRORs which is much worse. So, I expect users to be careful about the same. > > The benefit is that both the > > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same > > idea and sync all sequences without needing a new relfilenode. Users > > can always refer 'subsequencesync' flag in 'pg_subscription' to see if > > all the sequences are synced after executing the command. > > That would be cheaper, indeed. Isn't a boolean too limiting? > In this idea, we only need a flag to say whether the sequence sync is required or not. > Isn't that something you'd want to track with a LSN as "the point in > WAL where all the sequences have been synced"? > It won't be any better for the required purpose because after CREATE SUBSCRIPTION, if REFERESH wants to toggle the flag to indicate the sequences need sync again then using LSN would mean we need to set it to Invalid value. > The approach of doing all the sync work from the subscriber, while > having a command that can be kicked from the subscriber side is a good > user experience. > Thank you for
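To make the batching idea above concrete, here is a rough sketch of a sequence-sync worker loop that commits every N sequences, so an error only forces a redo of the current chunk. SEQ_SYNC_BATCH_SIZE, fetch_remote_sequence_value() and apply_remote_sequence_value() are hypothetical names used purely for illustration; the catalog update and locking details are heavily simplified.

```c
#define SEQ_SYNC_BATCH_SIZE 100		/* hypothetical threshold, per the discussion above */

static void
sync_pending_sequences(List *pending_seqs)	/* sequences still in state 'i' */
{
	int			done_in_xact = 0;
	ListCell   *lc;

	StartTransactionCommand();
	foreach(lc, pending_seqs)
	{
		Oid			seqoid = lfirst_oid(lc);

		/* hypothetical helpers: pull the remote value and apply it locally */
		apply_remote_sequence_value(seqoid, fetch_remote_sequence_value(seqoid));

		/* flip the pg_subscription_rel state from 'i' to ready */
		UpdateSubscriptionRelState(MySubscription->oid, seqoid,
								   SUBREL_STATE_READY, InvalidXLogRecPtr);

		/* commit every batch so a failure only redoes the current chunk */
		if (++done_in_xact >= SEQ_SYNC_BATCH_SIZE)
		{
			CommitTransactionCommand();
			StartTransactionCommand();
			done_in_xact = 0;
		}
	}
	CommitTransactionCommand();
}
```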
Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello, Michael. > Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid? As far as I can see (1) !indisvalid indexes are filtered out. But... It looks like this choice is not locked in any way (2), so index_concurrently_swap or index_concurrently_set_dead can change this index after the decision is made, even despite WaitForLockersMultiple (3). In some cases, it may cause a data loss... But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption. [1]: https://github.com/postgres/postgres/blob/915de706d28c433283e9dc63701e8f978488a2b9/src/backend/optimizer/util/plancat.c#L804 [2]: https://github.com/postgres/postgres/blob/915de706d28c433283e9dc63701e8f978488a2b9/src/backend/optimizer/util/plancat.c#L924-L928 [3]: https://github.com/postgres/postgres/blob/8aee330af55d8a759b2b73f5a771d9d34a7b887f/src/backend/commands/indexcmds.c#L4153
Re: libpq contention due to gss even when not using gss
> On 14 Jun 2024, at 10:46, Dmitry Dolgov <9erthali...@gmail.com> wrote: > >> On Thu, Jun 13, 2024 at 10:30:24AM GMT, Andres Freund wrote: To investigate a report of both postgres and pgbouncer having issues when a lot of new connections aree established, I used pgbench -C. Oddly, on an early attempt, the bottleneck wasn't postgres+pgbouncer, it was pgbench. But only when using TCP, not with unix sockets. c=40;pgbench -C -n -c$c -j$c -T5 -f <(echo 'select 1') 'port=6432 host=127.0.0.1 user=test dbname=postgres password=fake' host=127.0.0.1: 16465 host=127.0.0.1,gssencmode=disable 20860 host=/tmp:49286 Note that the server does *not* support gss, yet gss has a substantial performance impact. Obviously the connection rates here absurdly high and outside of badly written applications likely never practically relevant. However, the number of cores in systems are going up, and this quite possibly will become relevant in more realistic scenarios (lock contention kicks in earlier the more cores you have). >>> >>> By not supporting gss I assume you mean having built with --with-gssapi, >>> but only host (not hostgssenc) records in pg_hba, right? >> >> Yes, the latter. Or not having kerberos set up on the client side. > > I've been experimenting with both: > > * The server is built without gssapi, but the client does support it. > This produces exactly the contention you're talking about. > > * The server is built with gssapi, but do not use it in pg_hba, the > client does support gssapi. In this case the difference between > gssencmode=disable/prefer is even more dramatic in my test case > (milliseconds vs seconds) due to the environment with configured > kerberos (for other purposes, thus gss_init_sec_context spends huge > amount of time to still return nothing). > > At the same time after quick look I don't see an easy way to avoid that. > Current implementation tries to initialize gss before getting any > confirmation from the server whether it's supported. Doing this other > way around would probably just shift overhead to the server side. The main problem seems to be that we check whether or not there is a credential cache when we try to select encryption but not yet authentication, as a way to figure out if gssenc it as all worth trying? I experimented with deferring it with potentially cheaper heuristics in encryption selection, but it seems hard to get around since other methods were even more expensive. -- Daniel Gustafsson
Re: Pgoutput not capturing the generated columns
> Thanks for the updated patch, few comments: > 1) The option name seems wrong here: > In one place include_generated_column is specified and other place > include_generated_columns is specified: > > + else if (IsSet(supported_opts, > SUBOPT_INCLUDE_GENERATED_COLUMN) && > +strcmp(defel->defname, > "include_generated_column") == 0) > + { > + if (IsSet(opts->specified_opts, > SUBOPT_INCLUDE_GENERATED_COLUMN)) > + errorConflictingDefElem(defel, pstate); > + > + opts->specified_opts |= > SUBOPT_INCLUDE_GENERATED_COLUMN; > + opts->include_generated_column = defGetBoolean(defel); > + } Fixed. > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c > index d453e224d9..e8ff752fd9 100644 > --- a/src/bin/psql/tab-complete.c > +++ b/src/bin/psql/tab-complete.c > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end) > COMPLETE_WITH("binary", "connect", "copy_data", "create_slot", > "disable_on_error", > "enabled", "failover", "origin", > "password_required", > "run_as_owner", "slot_name", > - "streaming", > "synchronous_commit", "two_phase"); > + "streaming", > "synchronous_commit", "two_phase","include_generated_columns"); > > 2) This small data table need not have a primary key column as it will > create an index and insertion will happen in the index too. > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); Fixed. > 3) Please add a test case for this: > + set to false. If the subscriber-side > column is also a > + generated column then this option has no effect; the > replicated data will > + be ignored and the subscriber column will be filled as > normal with the > + subscriber-side computed or default data. Added the required test case. > 4) You can use a new style of ereport to remove the brackets around errcode > 4.a) > + else if (!parse_bool(strVal(elem->arg), > &data->include_generated_columns)) > + ereport(ERROR, > + > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > +errmsg("could not > parse value \"%s\" for parameter \"%s\"", > + > strVal(elem->arg), elem->defname))); > > 4.b) similarly here too: > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + /*- translator: both %s are strings of the form > "option = value" */ > + errmsg("%s and %s are mutually > exclusive options", > + "copy_data = true", > "include_generated_column = true"))); > > 4.c) similarly here too: > + if (include_generated_columns_option_given) > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > +errmsg("conflicting > or redundant options"))); Fixed. > 5) These variable names can be changed to keep it smaller, something > like gencol or generatedcol or gencolumn, etc > +++ b/src/include/catalog/pg_subscription.h > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > * slots) in the upstream database are enabled > * to be synchronized to the standbys. 
*/ > > + bool subincludegeneratedcolumn; /* True if generated columns must be > published */ > + > #ifdef CATALOG_VARLEN /* variable-length fields start here */ > /* Connection string to the publisher */ > text subconninfo BKI_FORCE_NOT_NULL; > @@ -157,6 +159,7 @@ typedef struct Subscription > List*publications; /* List of publication names to subscribe to */ > char*origin; /* Only publish data originating from the > * specified origin */ > + bool includegeneratedcolumn; /* publish generated column data */ > } Subscription; Fixed. The attached Patch contains the suggested changes. Thanks and Regards, Shubham Khanna. v6-0001-Enable-support-for-include_generated_columns-opti.patch Description: Binary data
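For readers following the thread, a hedged sketch of the column filter that an include_generated_columns option implies in the output plugin's attribute loop (attisdropped and attgenerated are existing pg_attribute fields; the variable include_generated_columns stands in for wherever the patch keeps the option value):

```c
for (int i = 0; i < desc->natts; i++)
{
	Form_pg_attribute att = TupleDescAttr(desc, i);

	/* dropped columns are never replicated */
	if (att->attisdropped)
		continue;

	/* skip generated columns unless the option asks to publish them */
	if (att->attgenerated && !include_generated_columns)
		continue;

	/* ... emit this column's name and value as usual ... */
}
```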
Re: Conflict Detection and Resolution
On Thu, Jun 13, 2024 at 7:00 PM Alvaro Herrera wrote: > > On 2024-Jun-07, Tomas Vondra wrote: > > > On 6/3/24 09:30, Amit Kapila wrote: > > > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra > > > wrote: > > > >> How is this going to deal with the fact that commit LSN and timestamps > > >> may not correlate perfectly? That is, commits may happen with LSN1 < > > >> LSN2 but with T1 > T2. > > > But as I wrote, I'm not quite convinced this means there are not other > > issues with this way of resolving conflicts. It's more likely a more > > complex scenario is required. > > Jan Wieck approached me during pgconf.dev to reproach me of this > problem. He also said he had some code to fix-up the commit TS > afterwards somehow, to make the sequence monotonically increasing. > Perhaps we should consider that, to avoid any problems caused by the > difference between LSN order and TS order. It might be quite > nightmarish to try to make the system work correctly without > reasonable constraints of that nature. > I agree with this, but the problem Jan was worried about was not directly reproducible with what PostgreSQL provides, at least that is what I understood then. We are also unable to think of a concrete scenario where this is a problem, but we are planning to spend more time deriving a test to reproduce the problem. -- With Regards, Amit Kapila.
Re: Pgoutput not capturing the generated columns
On Tue, Jun 4, 2024 at 8:12 AM Peter Smith wrote: > > Hi, > > Here are some review comments for patch v5-0001. > > == > GENERAL G.1 > > The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g. > maybe before they were always ignored, but now they are not? > > OTOH, when 'include_generated_columns' is false then the PUBLICATION > col-list will ignore any generated cols even when they are present in > a PUBLICATION col-list, right? > > These kinds of points should be noted in the commit message and in the > (col-list?) documentation. Fixed. > == > Commit message > > General 1a. > IMO the commit message needs some background to say something like: > "Currently generated column values are not replicated because it is > assumed that the corresponding subscriber-side table will generate its > own values for those columns." > > ~ > > General 1b. > Somewhere in this commit message, you need to give all the other > special rules --- e.g. the docs says "If the subscriber-side column is > also a generated column then this option has no effect" > > ~~~ Fixed. > 2. > This commit enables support for the 'include_generated_columns' option > in logical replication, allowing the transmission of generated column > information and data alongside regular table changes. This option is > particularly useful for scenarios where applications require access to > generated column values for downstream processing or synchronization. > > ~ > > I don't think the sentence "This option is particularly useful..." is > helpful. It seems like just saying "This commit supports option XXX. > This is particularly useful if you want XXX". > Fixed. > > 3. > CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= > 'publication pub1; > > ~ > > What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of > the new parameter being used in it? > Added the description for this in the Patch. > > 4. > Currently copy_data option with include_generated_columns option is > not supported. A future patch will remove this limitation. > > ~ > > Suggest to single-quote those parameter names for better readability. > Fixed. > > 5. > This commit aims to enhance the flexibility and utility of logical > replication by allowing users to include generated column information > in replication streams, paving the way for more robust data > synchronization and processing workflows. > > ~ > > IMO this paragraph can be omitted. Fixed. > == > .../test_decoding/sql/decoding_into_rel.sql > > 6. > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > +INSERT INTO gencoltable (a) VALUES (4), (5), (6); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > +DROP TABLE gencoltable; > + > > 6a. > I felt some additional explicit comments might help the readabilty of > the output file. > > e.g.1 > -- When 'include-generated=columns' = '1' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes... > > e.g.2 > -- When 'include-generated=columns' = '0' the generated column 'b' > values will not be replicated > SELECT data FROM pg_logical_slot_get_changes... 
Added the required description for this. > 6b. > Suggest adding one more test case (where 'include-generated=columns' > is not set) to confirm/demonstrate the default behaviour for > replicated generated cols. Added the required Test case. > == > doc/src/sgml/protocol.sgml > > 7. > + > + class="parameter">include-generated-columns > + > + > +Boolean option to enable generated columns. > +The include-generated-columns option controls whether generated > +columns should be included in the string representation of tuples > +during logical decoding in PostgreSQL. This allows users to > +customize the output format based on whether they want to include > +these columns or not. The default is false. > + > + > + > > 7a. > It doesn't render properly. e.g. Should not be bold italic (probably > the class is wrong?), because none of the nearby parameters look this > way. > > ~ > > 7b. > The name here should NOT have hyphens. It needs underscores same as > all other nearby protocol parameters. > > ~ > > 7c. > The description seems overly verbose. > > SUGGESTION > Boolean option to enable generated columns. This option controls > whether generated columns should be included in the string > representation of tuples during logical decoding in PostgreSQL. The > default is false. Fixed. > == > doc/src/sgml/ref
Re: Re: may be a buffer overflow problem
>Thanks for looking! I will apply it backpatched all the way down as this has >been wrong since 2006. > >> I wonder how Winter Loo got to see that warning... > I was compiling the source code of postgres version 13, and the build flags are changed in my development environment. >And it would be interesting to know if that was the only warning, since >error.c >in ECPG performs the exact same string copy. > Yes, that was the only warning. I searched for all `sqlstate` occurrences in the ecpg directory; there are no other dubious problems.
Re: Conflict Detection and Resolution
On Thu, Jun 13, 2024 at 11:18 PM Jonathan S. Katz wrote: > > On 6/13/24 7:28 AM, Amit Kapila wrote: > >> > >> I feel that we should log all types of conflict in an uniform way. For > >> example, with detect_conflict being true, the update_differ conflict > >> is reported as "conflict %s detected on relation "%s"", whereas > >> concurrent inserts with the same key is reported as "duplicate key > >> value violates unique constraint "%s"", which could confuse users. > >> Ideally, I think that we log such conflict detection details (table > >> name, column name, conflict type, etc) to somewhere (e.g. a table or > >> server logs) so that the users can resolve them manually. > >> > > > > It is good to think if there is a value in providing in > > pg_conflicts_history kind of table which will have details of > > conflicts that occurred and then we can extend it to have resolutions. > > I feel we can anyway LOG the conflicts by default. Updating a separate > > table with conflicts should be done by default or with a knob is a > > point to consider. > > +1 for logging conflicts uniformly, but I would +100 to exposing the log > in a way that's easy for the user to query (whether it's a system view > or a stat table). Arguably, I'd say that would be the most important > feature to come out of this effort. > We can have both the system view and a stats table. The system view could have some sort of cumulative stats data like how many times a particular conflict had occurred and the table would provide detailed information about the conflict. The one challenge I see in providing a table is in its cleanup mechanism. We could prove a partitioned table such that users can truncate/drop the not needed partitions or provide a non-partitioned table where users can delete the old data in which case they generate a work for auto vacuum. > Removing how conflicts are resolved, users want to know exactly what row > had a conflict, and users from other database systems that have dealt > with these issues will have tooling to be able to review and analyze if > a conflicts occur. This data is typically stored in a queryable table, > with data retained for N days. When you add in automatic conflict > resolution, users then want to have a record of how the conflict was > resolved, in case they need to manually update it. > > Having this data in a table also gives the user opportunity to > understand conflict stats (e.g. conflict rates) and potentially identify > portions of the application and other parts of the system to optimize. > It also makes it easier to import to downstream systems that may perform > further analysis on conflict resolution, or alarm if a conflict rate > exceeds a certain threshold. > Agreed those are good use cases to store conflict history. -- With Regards, Amit Kapila.
Re: Re: An implementation of multi-key sort
hi Tomas, So many thanks for your kind response and detailed report. I am working on locating issues based on your report/script and optimizing code, and will update later. Could you please also send me the script to generate report pdf from the test results (explain*.log)? I can try to make one by myself, but I'd like to get a report exactly the same as yours. It's really helpful. Thanks in advance. Yao Wang On Mon, Jun 10, 2024 at 5:09 AM Tomas Vondra wrote: > > Hello Yao, > > I was interested in the patch, considering the promise of significant > speedups of sorting, so I took a quick look and did some basic perf > testing today. Unfortunately, my benchmarks don't really confirm any > peformance benefits, so I haven't looked at the code very much and only > have some very basic feedback: > > 1) The new GUC is missing from the .sample config, triggering a failure > of "make check-world". Fixed by 0002. > > 2) There's a place mixing tabs/spaces in indentation. Fixed by 0003. > > 3) I tried running pgindent, mostly to see how that would affect the > comments, and for most it's probably fine, but a couple are mangled > (usually those with a numbered list of items). Might needs some changes > to use formatting that's not reformatted like this. The changes from > pgindent are in 0004, but this is not a fix - it just shows the changes > after running pgindent. > > Now, regarding the performance tests - I decided to do the usual black > box testing, i.e. generate tables with varying numbers of columns, data > types, different data distribution (random, correlated, ...) and so on. > And then run simple ORDER BY queries on that, measuring timing with and > without mk-sort, and checking the effect. > > So I wrote a simple bash script (attached) that does exactly that - it > generates a table with 1k - 10M rows, fills with with data (with some > basic simple data distributions), and then runs the queries. > > The raw results are too large to attach, I'm only attaching a PDF > showing the summary with a "speedup heatmap" - it's a pivot with the > parameters on the left, and then the GUC and number on columns on top. > So the first group of columns is with enable_mk_sort=off, the second > group with enable_mk_sort=on, and finally the heatmap with relative > timing (enable_mk_sort=on / enable_mk_sort=off). > > So values <100% mean it got faster (green color - good), and values > >100% mean it got slower (red - bad). And the thing is - pretty much > everything is red, often in the 200%-300% range, meaning it got 2x-3x > slower. There's only very few combinations where it got faster. That > does not seem very promising ... but maybe I did something wrong? > > After seeing this, I took a look at your example again, which showed > some nice speedups. But it seems very dependent on the order of keys in > the ORDER BY clause. 
For example consider this: > > set enable_mk_sort = on; > explain (analyze, timing off) > select * from t1 order by c6, c5, c4, c3, c2, c1; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c6, c5, c4, c3, c2, c1 >Sort Method: quicksort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.054 ms > Execution Time: 1095.183 ms > (6 rows) > > set enable_mk_sort = on; > explain (analyze, timing off) > select * from t1 order by c6, c5, c4, c3, c2, c1; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c6, c5, c4, c3, c2, c1 >Sort Method: multi-key quick sort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.130 ms > Execution Time: 633.635 ms > (6 rows) > > Which seems great, but let's reverse the sort keys: > > set enable_mk_sort = off; > explain (analyze, timing off) > select * from t1 order by c1, c2, c3, c4, c5, c6; > > QUERY PLAN > --- > > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c1, c2, c3, c4, c5, c6 >Sort Method: quicksort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.146 ms > Execution Time: 170.085 ms > (6 rows) > > set enable_mk_sort = off; > explain (analyze, timing off) > select * from t1 order by c1, c2, c3, c4, c5, c6; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=4
Re: Conflict Detection and Resolution
On Fri, Jun 14, 2024 at 12:10 AM Robert Haas wrote: > > On Thu, May 23, 2024 at 2:37 AM shveta malik wrote: > > c) update_deleted: The row with the same value as that incoming > > update's key does not exist. The row is already deleted. This conflict > > type is generated only if the deleted row is still detectable i.e., it > > is not removed by VACUUM yet. If the row is removed by VACUUM already, > > it cannot detect this conflict. It will detect it as update_missing > > and will follow the default or configured resolver of update_missing > > itself. > > I think this design is categorically unacceptable. It amounts to > designing a feature that works except when it doesn't. I'm not exactly > sure how the proposal should be changed to avoid depending on the > timing of VACUUM, but I think it's absolutely not OK to depend on the > timing of VACUUm -- or, really, this is going to depend on the timing > of HOT-pruning, which will often happen almost instantly. > Agreed, above Tomas has speculated to have a way to avoid vacuum cleaning dead tuples until the required changes are received and applied. Shveta also mentioned another way to have deads-store (say a table where deleted rows are stored for resolution) [1] which is similar to a technique used by some other databases. There is an agreement to not rely on Vacuum to detect such a conflict but the alternative is not clear. Currently, we are thinking to consider such a conflict type as update_missing (The row with the same value as that incoming update's key does not exist.). This is how the current HEAD code behaves and LOGs the information (logical replication did not find row to be updated ..). [1] - https://www.postgresql.org/message-id/CAJpy0uCov4JfZJeOvY0O21_gk9bcgNUDp4jf8%2BBbMp%2BEAv8cVQ%40mail.gmail.com -- With Regards, Amit Kapila.
TruncateMultiXact() bugs
I was performing tests around multixid wraparound, when I ran into this assertion: TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981 postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e] postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d] postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e] postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb] postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a] postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1] postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b] postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3] postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66] postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d] postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead] postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e] postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb] postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e] /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45] postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31] 2024-06-14 13:11:02.025 EEST [920971] LOG: server process (PID 920981) was terminated by signal 6: Aborted 2024-06-14 13:11:02.025 EEST [920971] DETAIL: Failed process was running: autovacuum: VACUUM pg_toast.pg_toast_13407 (to prevent wraparound) The attached python script reproduces this pretty reliably. It's a reduced version of a larger test script I was working on, it probably could be simplified further for this particular issue. Looking at the code, it's pretty clear how it happens: 1. TruncateMultiXact does START_CRIT_SECTION(); 2. In the critical section, it calls PerformMembersTruncation() -> SlruDeleteSegment() -> SlruInternalDeleteSegment() -> RegisterSyncRequest() -> ForwardSyncRequest() 3. If the fsync request queue is full, it calls CompactCheckpointerRequestQueue(), which calls palloc0. Pallocs are not allowed in a critical section. A straightforward fix is to add a check to CompactCheckpointerRequestQueue() to bail out without compacting, if it's called in a critical section. That would cover any other cases like this, where RegisterSyncRequest() is called in a critical section. I haven't tried searching if any more cases like this exist. But wait there is more! After applying that fix in CompactCheckpointerRequestQueue(), the test script often gets stuck. There's a deadlock between the checkpointer, and the autovacuum backend trimming the SLRUs: 1. TruncateMultiXact does this: MyProc->delayChkptFlags |= DELAY_CHKPT_START; 2. It then makes that call to PerformMembersTruncation() and RegisterSyncRequest(). If it cannot queue the request, it sleeps a little and retries. But the checkpointer is stuck waiting for the autovacuum backend, because of delayChkptFlags, and will never clear the queue. To fix, I propose to add AbsorbSyncRequests() calls to the wait-loops in CreateCheckPoint(). Attached patch fixes both of those issues. I can't help thinking that TruncateMultiXact() should perhaps not have such a long critical section. TruncateCLOG() doesn't do that. 
But it was added for good reasons in commit 4f627f897367, and this fix seems appropriate for the stable branches anyway, even if we come up with something better for master. -- Heikki Linnakangas Neon (https://neon.tech)

Attached reproducer script (truncated):

import time
import logging
import psycopg2
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple, Type, Union, cast
import subprocess

PGDATA = "pgdata-multixid-repro"

log = logging.getLogger(__name__)


def connect():
    return psycopg2.connect(
        database="postgres",
        user="postgres",
        host="/tmp/",
    )


# Constants and macros copied from PostgreSQL multixact.c and headers. These are needed to
# calculate the SLRU segments that a particular multixid or multixid-offsets falls into.
BLCKSZ = 8192
MULTIXACT_OFFSETS_PER_PAGE = int(BLCKSZ / 4)
SLRU_PAGES_PER_SEGMENT = int(32)
MXACT_MEMBER_BITS_PER_XACT = 8
MXACT_MEMBER_FLAGS_PER_BYTE = 1
MULTIXACT_FLAGBYTES_PER_GROUP = 4
MULTIXACT_MEMBERS_PER_MEMBERGROUP = MULTIXACT_FLAGBYTES_PER_GROUP * MXACT_MEMBER_FLAGS_PER_BYTE
MULTIXACT_MEMBERGROUP_SIZE = 4 * MULTIXACT_MEMBERS_PER_MEMBERGROUP + MULTIXACT_FLAGBYTES_PER_GROUP
MULTIXACT_MEMBERGROUPS_PER_PAGE = int(BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
MULTIXACT_MEMBERS_PER_PAGE = MULTIXACT_MEMBERGROUPS_PER_PAGE * MULTIXACT_MEMBERS_PER_MEMBERGROUP


def MultiXactIdToOffsetSegment(xid: int):
    return int(xid / (SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE))


def MXOffsetToMemberSegment(off: int):
    return int(off /
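The attached patch itself isn't shown above; what follows is a hedged sketch of the two fixes it describes, not the real function bodies in checkpointer.c and xlog.c, which contain more than this and may place the changes differently.

```c
/*
 * Sketch of fix 1 (checkpointer.c): bail out of queue compaction when called
 * inside a critical section, because compaction needs to palloc a tracking
 * structure and palloc is disallowed there.
 */
static bool
CompactCheckpointerRequestQueue(void)
{
	if (CritSectionCount > 0)
		return false;			/* caller treats this as "queue still full" */

	/* ... existing compaction logic unchanged ... */
}

/*
 * Sketch of fix 2 (CreateCheckPoint): while waiting for backends that have
 * DELAY_CHKPT_START set, keep absorbing fsync requests so that a waiter such
 * as TruncateMultiXact() can get its RegisterSyncRequest() through.
 */
do
{
	AbsorbSyncRequests();
	pg_usleep(10000L);			/* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START));
```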
RE: Conflict Detection and Resolution
On Thursday, June 13, 2024 8:46 PM Peter Eisentraut wrote: > > On 23.05.24 08:36, shveta malik wrote: > > Conflict Resolution > > > > a) latest_timestamp_wins:The change with later commit timestamp > wins. > > b) earliest_timestamp_wins: The change with earlier commit timestamp > wins. > > c) apply: Always apply the remote change. > > d) skip:Remote change is skipped. > > e) error: Error out on conflict. Replication is stopped, manual > > action is needed. > > You might be aware of pglogical, which has similar conflict resolution modes, > but they appear to be spelled a bit different. It might be worth reviewing > this, > so that we don't unnecessarily introduce differences. Right. Some of the proposed resolution names are different from pglogical's while the functionalities are the same. The following is the comparison with pglogical: latest_timestamp_wins(proposal) - last_update_wins(pglogical) earliest_timestamp_wins(proposal) - first_update_wins(pglogical) apply(proposal) - apply_remote(pglogical) skip(proposal)- keep_local(pglogical) I personally think the pglogical's names read more naturally. But others may have different opinions on this. > > https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts > > There might also be other inspiration to be found related to this in pglogical > documentation or code. Another difference is that we allow users to specify different resolutions for different conflicts, while pglogical allows specifying one resolution for all conflict. I think the proposed approach offers more flexibility to users, which seems more favorable to me. Best Regards, Hou zj
Re: Remove dependence on integer wrapping
14.06.2024 05:48, Joseph Koshakow wrote: v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I just incremented the version number. > Also there are several trap-producing cases with date types: > SELECT to_date('1', 'CC'); > SELECT to_timestamp('10,999', 'Y,YYY'); > SELECT make_date(-2147483648, 1, 1); > > And one more with array... > CREATE TABLE t (ia int[]); > INSERT INTO t(ia[2147483647:2147483647]) VALUES ('{}'); I'll try and get patches to address these too in the next couple of weeks unless someone beats me to it. > I think it's not the whole iceberg too. +1 After sending my message, I toyed with -ftrapv a little time more and found other cases: SELECT '[]'::jsonb -> -2147483648; #4 0x7efe232d67f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x55e8fde9f211 in __negvsi2 () #6 0x55e8fdcca62c in jsonb_array_element (fcinfo=0x55e8fec28220) at jsonfuncs.c:948 (gdb) f 6 #6 0x55e14cb9362c in jsonb_array_element (fcinfo=0x55e14d493220) at jsonfuncs.c:948 948 if (-element > nelements) (gdb) p element $1 = -2147483648 --- SELECT jsonb_delete_path('{"a":[]}', '{"a",-2147483648}'); #4 0x7f1873bef7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x564a009d2211 in __negvsi2 () #6 0x564a00807c89 in setPathArray (it=0x7fff865c7380, path_elems=0x564a017baf20, path_nulls=0x564a017baf40, path_len=2, st=0x7fff865c7388, level=1, newval=0x0, nelems=2, op_type=2) at jsonfuncs.c:5407 (gdb) f 6 #6 0x55985e823c89 in setPathArray (it=0x7ffc22258fe0, path_elems=0x559860286f20, path_nulls=0x559860286f40, path_len=2, st=0x7ffc22258fe8, level=1, newval=0x0, nelems=0, op_type=2) at jsonfuncs.c:5407 5407 if (-idx > nelems) (gdb) p idx $1 = -2147483648 --- CREATE FUNCTION check_foreign_key () RETURNS trigger AS .../refint.so' LANGUAGE C; CREATE TABLE t (i int4 NOT NULL); CREATE TRIGGER check_fkey BEFORE DELETE ON t FOR EACH ROW EXECUTE PROCEDURE check_foreign_key (2147483647, 'cascade', 'i', "ft", "i"); INSERT INTO t VALUES (1); DELETE FROM t; #4 0x7f57f0bef7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x7f57f1671351 in __addvsi3 () from .../src/test/regress/refint.so #6 0x7f57f1670234 in check_foreign_key (fcinfo=0x7ffebf523650) at refint.c:321 (gdb) f 6 #6 0x7f3400ef9234 in check_foreign_key (fcinfo=0x7ffd6e16a600) at refint.c:321 321 nkeys = (nargs - nrefs) / (nrefs + 1); (gdb) p nargs $1 = 3 (gdb) p nrefs $2 = 2147483647 --- And the most interesting case to me: SET temp_buffers TO 10; CREATE TEMP TABLE t(i int PRIMARY KEY); INSERT INTO t VALUES(1); #4 0x7f385cdd37f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x5620071c4f51 in __addvsi3 () #6 0x562007143f3c in init_htab (hashp=0x562008facb20, nelem=610070812) at dynahash.c:720 (gdb) f 6 #6 0x560915207f3c in init_htab (hashp=0x560916039930, nelem=10) at dynahash.c:720 720 hctl->high_mask = (nbuckets << 1) - 1; (gdb) p nbuckets $1 = 1073741824 Best regards, Alexander
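For the jsonb cases above, the needed change is essentially to stop negating a value that can be INT32_MIN. A hedged sketch follows (types simplified; nelements comes from JB_ROOT_COUNT() in the real code, and an eventual patch might instead use the overflow-checked helpers in src/include/common/int.h):

```c
/* jsonb_array_element(): handle negative subscripts without negating */
if (element < 0)
{
	/*
	 * The old test "-element > nelements" overflows when element is
	 * INT32_MIN.  Comparing in 64-bit arithmetic is equivalent and never
	 * negates the value.
	 */
	if ((int64) element + (int64) nelements < 0)
		PG_RETURN_NULL();
	element += nelements;
}
```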
Re: POC, WIP: OR-clause support for indexes
On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov wrote: > > I've revised the patch. Did some beautification, improvements for > documentation, commit messages etc. > > I've pushed the 0001 patch without 0002. I think 0001 is good by > itself given that there is the or_to_any_transform_limit GUC option. > The more similar OR clauses are here the more likely grouping them > into SAOP will be a win. But I've changed the default value to 5. > This will make it less invasive and affect only queries with obvious > repeating patterns. That also reduced the changes in the regression > tests expected outputs. > > Regarding 0002, it seems questionable since it could cause a planning > slowdown for SAOP's with large arrays. Also, it might reduce the win > of transformation made by 0001. So, I think we should skip it for > now.

The patch has been reverted from pg17. Let me propose a new version for pg18 based on the valuable feedback from Tom Lane [1][2].

* The transformation is moved to the stage of adding restrictinfos to the base relation (in particular add_base_clause_to_rel()). This leads to interesting consequences. While this allows IndexScans to use transformed clauses, BitmapScans and SeqScans seem unaffected. Therefore, I wasn't able to find a planning regression.
* Since there is no planning regression anymore, I've removed the or_to_any_transform_limit GUC, which was a source of criticism.
* Now, not only Consts but also Params are allowed in the SAOP's list.
* The criticized hash based on expression jumbling has been removed. Now, a plain list is used instead.
* OrClauseGroup now gets a legal node tag. That allows mixing it in a list with other nodes without hacks.

I think this patch won't be as good as before at optimizing the performance of large OR lists, given that BitmapScans and SeqScans still deal with ORs. However, it allows IndexScans to handle more, doesn't seem to cause a planning regression, and therefore introduces no extra GUC. Overall, this seems like a good compromise. This patch could use some polishing, but I'd like to first hear some feedback on the general design.

Links
1. https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
2. https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us

-- Regards, Alexander Korotkov Supabase

v25-0001-Transform-OR-clauses-to-ANY-expression.patch Description: Binary data
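For readers new to this thread, the transformation in question rewrites a chain of equality ORs over the same expression into a single ScalarArrayOpExpr (SAOP). Roughly speaking (the table and column names here are made up for illustration):

    -- a query written like this ...
    SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 3 OR x = 4;

    -- ... is treated by the planner as if it had been written like this,
    -- which an index scan on t(x) can use as a single SAOP condition:
    SELECT * FROM t WHERE x = ANY (ARRAY[1, 2, 3, 4]);

With the new approach the rewrite happens while restrictinfos are added to the base relation, which is why index paths see the SAOP form while sequential and bitmap scans keep evaluating the original OR list.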
Re: RFC: adding pytest as a supported test framework
On Thu, Jun 13, 2024 at 8:12 PM Jacob Champion wrote: > But also: do you have opinions on what to fill in as steps 2 > (something we have no ability to test today) and 3 (something we do > test today, but hate)? > > My vote for step 2 is "client and server separation", perhaps by > testing libpq fallback against a server that claims support for > different build-time options. I don't want to have a vote in step 3, > because part of that step is proving that this framework can provide > value for a part of the project I don't really know much about.

I mean, both Perl and Python are Turing-complete. Anything you can do in one, you can do in the other, especially when you consider that we're not likely to accept too many dependencies on external Perl or Python modules. That's why I see this as nothing more or less than an exercise in letting people use the programming language they prefer. We've talked about a libpq FFI interface, but it hasn't been done; now we're talking about maybe having a Python one. Perhaps we'll end up with both. Then you can imagine porting tests from one language to the other and the only difference is whether you'd rather have line noise before all of your variable names or semantically significant whitespace.

I just don't believe in the idea that we're going to write one category of tests in one language and another category in another language. As soon as we open the door to Python tests, people are going to start writing the TAP tests that they would have written in Perl in Python instead. And if the test utilities that we have for Perl are not there for Python, then they'll either open-code things for which they would have used a module, or they'll write a stripped-down version of the module that will then get built up patch by patch until, 50 or 100 or 200 hours of committer review later, it resembles the existing Perl module. And, if the committer pushes back and says, hey, why not write this test in Perl, which already has all of this test infrastructure in place, then the submitter will wander off muttering about how PostgreSQL committers are hidebound backward individuals who just try to ruin everybody's day.

So as I see it, the only reasonable plan here if we want to introduce testing in Python (or C#, or Ruby, or Go, or JavaScript, or Lua, or LOLCODE) is to try to achieve a reasonable degree of parity between that language and Perl. Because then we can at least review the new infrastructure all at once, instead of incrementally, spread across many patches written, reviewed, and committed by many different people. Now, I completely understand if you're not excited about getting sucked down that rabbit-hole, and maybe some other committer is going to see this differently than I do, and that's fine. But my view is that if you're not interested in doing all the work to let people do more or less the kind of stuff that they currently do in Perl in Python instead, then your alternative is to take the tests that you want to add and rewrite them in Perl. And I am fairly certain that if you choose that option, it will save me, and a bunch of other committers, a whole lot of work, at least in the short term. If we add support for Python, we are going to end up having to do a lot of things twice for the next, let's say, ten to twenty years, until somebody rewrites the remaining Perl tests in Python or whatever language is hot and cool by then. 
My opinion is that we need to be open to enduring that pain because we can't indefinitely hold our breath and insist on using obsolete tools for everything, but that doesn't mean that I don't think it will be painful. Consider the meson build system project. To get that committed, Andres had to make it do pretty much everything MSVC could do and mostly everything that configure could do, and the places where he didn't make it do everything configure could do remain sore spots that I, at least, am not happy about. And in that case, he also managed to get MSVC removed entirely, so that we do not have a larger number of build systems in total than we had before. Furthermore, the amount of code in buildsystem files (makefiles, meson.build) in a typical patch needing review is usually none or very little, whereas the amount of test code in a patch is sometimes quite large. I've come around to the belief that the meson work was worthwhile -- running tests is so much faster and nicer now -- but it was a ton of work to get done and impacted everyone in the development community, and I think the blast radius for this change is likely to be larger for the reasons suggested earlier in this paragraph. -- Robert Haas EDB: http://www.enterprisedb.com
Re: 回复: An implementation of multi-key sort
On 6/14/24 13:20, Yao Wang wrote: > hi Tomas, > > So many thanks for your kind response and detailed report. I am working > on locating issues based on your report/script and optimizing code, and > will update later. > > Could you please also send me the script to generate report pdf > from the test results (explain*.log)? I can try to make one by myself, > but I'd like to get a report exactly the same as yours. It's really > helpful. > I don't have a script for that. I simply load the results into a spreadsheet, do a pivot table to "aggregate and reshuffle" it a bit, and then add a heatmap. I use google sheets for this, but any other spreadsheet should handle this too, I think. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi, Ivan! On Wed, Jun 12, 2024 at 11:36 AM Kartyshov Ivan wrote: > > Hi, Alexander, Here, I made some improvements according to your > discussion with Heikki. > > On 2024-04-11 18:09, Alexander Korotkov wrote: > > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas > > wrote: > >> In a nutshell, it's possible for the loop in WaitForLSN to exit > >> without > >> cleaning up the process from the heap. I was able to hit that by > >> adding > >> a delay after the addLSNWaiter() call: > >> > >> > TRAP: failed Assert("!procInfo->inHeap"), File: > >> > "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152 > >> > postgres: heikki postgres [local] > >> > CALL(ExceptionalCondition+0xab)[0x55da1f68787b] > >> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8] > >> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc] > >> > postgres: heikki postgres [local] > >> > CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5] > >> > postgres: heikki postgres [local] > >> > CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a] > >> > postgres: heikki postgres [local] > >> > CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9] > >> > >> I think there's a similar race condition if the timeout is reached at > >> the same time that the startup process wakes up the process. > > > > Thank you for catching this. I think WaitForLSN() just needs to call > > deleteLSNWaiter() unconditionally after exit from the loop. > > Fix and add injection point test on this race condition. > > > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas > > wrote: > >> The docs could use some-copy-editing, but just to point out one issue: > >> > >> > There are also procedures to control the progress of recovery. > >> > >> That's copy-pasted from an earlier sentence at the table that lists > >> functions like pg_promote(), pg_wal_replay_pause(), and > >> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control > >> the > >> progress of recovery like those functions do, it only causes the > >> calling > >> backend to wait. > > Fix documentation and add extra tests on multi-standby replication > and cascade replication. Thank you for the revised patch. I see couple of items which are not addressed in this revision. * As Heikki pointed, that it's currently not possible in one round trip to call call pg_wal_replay_wait() and do other job. The attached patch addresses this. It milds the requirement. Now, it's not necessary to be in atomic context. It's only necessary to have no active snapshot. This new requirement works for me so far. I appreciate a feedback on this. * As Alvaro pointed, multiple waiters case isn't covered by the test suite. That leads to no coverage of some code paths. The attached patch addresses this by adding a test case with multiple waiters. The rest looks good to me. Links. 1. https://www.postgresql.org/message-id/d1303584-b763-446c-9409-f4516118219f%40iki.fi 2.https://www.postgresql.org/message-id/202404051815.eri4u5q6oj26%40alvherre.pgsql -- Regards, Alexander Korotkov Supabase From 0229f55f8e63688f6660d07600dd6918e0e55b91 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Thu, 13 Jun 2024 00:39:45 +0300 Subject: [PATCH v19] Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed before starting the transaction. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes on standby. 
The queue of waiters is stored in the shared memory array sorted by LSN. During replay of WAL waiters whose LSNs are already replayed are deleted from the shared memory array and woken up by setting of their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working without an active snapshot, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira --- doc/src/sgml/func.sgml| 112 ++ src/backend/access/transam/xact.c | 6 + src/backend/access/transam/xlog.c | 7 + src/backend/access/transam/xlogrecovery.c | 11 + src/backend/catalog/system_functions.sql | 3 + src/backend/commands/Makefile | 3 +- src/backend/commands/meson.build | 1 + src/backend/commands/waitlsn.c| 345 ++ src/backend/lib/pairingheap.c | 18 +- src/backend/storage/ipc/ipci.c| 3 + src/backend/storage/lmgr/proc.c | 6 + .../utils/activity/w
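To make the intended usage concrete, here is a minimal example (the accounts table is hypothetical; the procedure name and behavior are as described in this patch):

    -- on the primary: make a change and capture the resulting WAL position
    INSERT INTO accounts (id, balance) VALUES (42, 100);
    SELECT pg_current_wal_insert_lsn() AS lsn \gset

    -- on the standby: wait for that LSN to be replayed, then read
    CALL pg_wal_replay_wait(:'lsn');
    SELECT balance FROM accounts WHERE id = 42;

Because of the no-active-snapshot requirement described above, the CALL has to be issued outside of a transaction block that already holds a snapshot.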
assertion failure at cost_memoize_rescan()
Hello, I met an assertion failure, and identified the root of the problem, but no idea how to fix it. The location of the problematic Assert() is at cost_memoize_rescan() to check 'hit_ratio' is between 0.0 and 1.0. The 'calls' is provided by the caller, and 'ndistinct' is the result of estimate_num_groups(). #4 0x0084d583 in cost_memoize_rescan (root=0x2e95748, mpath=0x30aece8, rescan_startup_cost=0x7ffd72141260, rescan_total_cost=0x7ffd72141258) at costsize.c:2564 /home/kaigai/source/pgsql-16/src/backend/optimizer/path/costsize.c:2564:83932:beg:0x84d583 (gdb) l 2559 * how many of those scans we expect to get a cache hit. 2560 */ 2561hit_ratio = ((calls - ndistinct) / calls) * 2562(est_cache_entries / Max(ndistinct, est_cache_entries)); 2563 2564Assert(hit_ratio >= 0 && hit_ratio <= 1.0); 2565 2566/* 2567 * Set the total_cost accounting for the expected cache hit ratio. We 2568 * also add on a cpu_operator_cost to account for a cache lookup. This (gdb) bt #0 0x7f3a39aa154c in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x7f3a39a54d06 in raise () from /lib64/libc.so.6 #2 0x7f3a39a287f3 in abort () from /lib64/libc.so.6 #3 0x00b6ff2c in ExceptionalCondition (conditionName=0xd28c28 "hit_ratio >= 0 && hit_ratio <= 1.0", fileName=0xd289a4 "costsize.c", lineNumber=2564) at assert.c:66 #4 0x0084d583 in cost_memoize_rescan (root=0x2e95748, mpath=0x30aece8, rescan_startup_cost=0x7ffd72141260, rescan_total_cost=0x7ffd72141258) at costsize.c:2564 #5 0x00850831 in cost_rescan (root=0x2e95748, path=0x30aece8, rescan_startup_cost=0x7ffd72141260, rescan_total_cost=0x7ffd72141258) at costsize.c:4350 #6 0x0084e333 in initial_cost_nestloop (root=0x2e95748, workspace=0x7ffd721412d0, jointype=JOIN_INNER, outer_path=0x3090058, inner_path=0x30aece8, extra=0x7ffd72141500) at costsize.c:2978 #7 0x00860f58 in try_partial_nestloop_path (root=0x2e95748, joinrel=0x30ae158, outer_path=0x3090058, inner_path=0x30aece8, pathkeys=0x0, jointype=JOIN_INNER, extra=0x7ffd72141500) at joinpath.c:887 #8 0x00862a64 in consider_parallel_nestloop (root=0x2e95748, joinrel=0x30ae158, outerrel=0x308f428, innerrel=0x2eac390, jointype=JOIN_INNER, extra=0x7ffd72141500) at joinpath.c:2083 #9 0x0086273d in match_unsorted_outer (root=0x2e95748, joinrel=0x30ae158, outerrel=0x308f428, innerrel=0x2eac390, jointype=JOIN_INNER, extra=0x7ffd72141500) at joinpath.c:1940 #10 0x008600f0 in add_paths_to_joinrel (root=0x2e95748, joinrel=0x30ae158, outerrel=0x308f428, innerrel=0x2eac390, jointype=JOIN_INNER, sjinfo=0x7ffd721415f0, restrictlist=0x30ae5a8) at joinpath.c:296 #11 0x00864d10 in populate_joinrel_with_paths (root=0x2e95748, rel1=0x308f428, rel2=0x2eac390, joinrel=0x30ae158, sjinfo=0x7ffd721415f0, restrictlist=0x30ae5a8) at joinrels.c:925 #12 0x008649e1 in make_join_rel (root=0x2e95748, rel1=0x308f428, rel2=0x2eac390) at joinrels.c:776 #13 0x00863ec1 in make_rels_by_clause_joins (root=0x2e95748, old_rel=0x308f428, other_rels_list=0x3088ed0, other_rels=0x3088ee8) at joinrels.c:312 #14 0x0086399a in join_search_one_level (root=0x2e95748, level=3) at joinrels.c:123 #15 0x008463f8 in standard_join_search (root=0x2e95748, levels_needed=4, initial_rels=0x3088ed0) at allpaths.c:3454 #16 0x0084636d in make_rel_from_joinlist (root=0x2e95748, joinlist=0x306b4f8) at allpaths.c:3385 #17 0x00841548 in make_one_rel (root=0x2e95748, joinlist=0x306b4f8) at allpaths.c:229 #18 0x008806a9 in query_planner (root=0x2e95748, qp_callback=0x886bcb , qp_extra=0x7ffd72141960) at planmain.c:278 #19 0x00882f5f in grouping_planner (root=0x2e95748, 
tuple_fraction=0) at planner.c:1495 #20 0x0088268c in subquery_planner (glob=0x2e95348, parse=0x2e90e98, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1064 #21 0x00880cdb in standard_planner (parse=0x2e90e98, query_string=0x2e3a0e8 "explain\nselect sum(lo_revenue), d_year, p_brand1\n from lineorder, date1, part, supplier\n where lo_orderdate = d_datekey\nand lo_partkey = p_partkey\nand lo_suppkey = s_suppkey\n and p_brand1"..., cursorOptions=2048, boundParams=0x0) at planner.c:413 I tracked the behavior of estimate_num_groups() using gdb line-by-line to observe how 'input_rows' is changed and how it affects the result value. According to the call trace, the problematic estimate_num_groups() invocation is called with "input_rows=3251872.91667", then it was rounded up to 3251873 by the clamp_row_est(). Eventually, its result value was calculated larger than the upper limit, so the return value was suppressed by 3251873, but it is a tiny bit larger than the input value! Back to the cost_memoize_rescan(). The hit_ratio is calculated as follows: hit_ratio = ((calls - ndistinct) / calls) *
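Given that estimate_num_groups() can hand back an ndistinct that clamp_row_est() has rounded to just above calls, one defensive option (a sketch only, not necessarily the right fix) is to clamp before computing the ratio:

    /*
     * Sketch: ndistinct can come back fractionally larger than calls due to
     * clamp_row_est() rounding inside estimate_num_groups(), which would make
     * hit_ratio slightly negative. Clamp it before use.
     */
    ndistinct = Min(ndistinct, calls);

    hit_ratio = ((calls - ndistinct) / calls) *
        (est_cache_entries / Max(ndistinct, est_cache_entries));

    Assert(hit_ratio >= 0 && hit_ratio <= 1.0);

Whether the clamp belongs here or inside estimate_num_groups() itself is the open question.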
Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello. > But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption. It seems like the assumption is correct - we may use an invalid index as arbiter due to race condition. The attached patch adds a check for that case, and now the test fails like this: # pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold" # DETAIL: Key (i)=(42) already exists. # pgbench: error: client 9 script 1 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters # pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey" # DETAIL: Key (i)=(69) already exists. # pgbench: error: client 7 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters # pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters # pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters I think It is even possible to see !alive index in the same situation (it is worse case), but I was unable to reproduce it so far. Best regards, Mikhail. Subject: [PATCH] error report in case of invalid index used as arbiter test for issue with upsert fail --- Index: src/test/modules/test_misc/t/006_concurrently_unique_fail.pl IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 === diff --git a/src/test/modules/test_misc/t/006_concurrently_unique_fail.pl b/src/test/modules/test_misc/t/006_concurrently_unique_fail.pl new file mode 100644 --- /dev/null (revision 9446f944b415306d9e5d5ab98f69938d8f5ee87f) +++ b/src/test/modules/test_misc/t/006_concurrently_unique_fail.pl (revision 9446f944b415306d9e5d5ab98f69938d8f5ee87f) @@ -0,0 +1,158 @@ + +# Copyright (c) 2024, PostgreSQL Global Development Group + +# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates +use strict; +use warnings; + +use Config; +use Errno; + +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Time::HiRes qw(usleep); +use IPC::SysV; +use threads; +use Time::HiRes qw( time ); +use Test::More; +use Test::Builder; + +if ($@ || $windows_os) +{ + plan skip_all => 'Fork and shared memory are not supported by this platform'; +} + +my ($pid, $shmem_id, $shmem_key, $shmem_size); +eval 'sub IPC_CREAT {0001000}' unless defined &IPC_CREAT; +$shmem_size = 4; +$shmem_key = rand(100); +$shmem_id = shmget($shmem_key, $shmem_size, &IPC_CREAT | 0777) or die "Can't shmget: $!"; +shmwrite($shmem_id, "wait", 0, $shmem_size) or die "Can't shmwrite: $!"; + +my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default); +# +# Test set-up +# +my ($node, $result); +$node = PostgreSQL::Test::Cluster->new('RC_test'); +$node->init; +$node->append_conf('postgresql.conf', + 'lock_timeout = ' . 
(1000 * $PostgreSQL::Test::Utils::timeout_default)); +$node->append_conf('postgresql.conf', 'fsync = off'); +$node->append_conf('postgresql.conf', 'autovacuum = off'); +$node->start; +$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp))); + + +### +### +### + +# IT IS NOT REQUIRED TO REPRODUCE THE ISSUE BUT MAKES IT TO HAPPEN FASTER +$node->safe_psql('postgres', q(CREATE INDEX idx ON tbl(i, updated_at))); + +### +### +### + +my $builder = Test::More->builder; +$builder->use_numbers(0); +$builder->no_plan(); + +my $child = $builder->child("pg_bench"); + +if(!defined($pid = fork())) { + # fork returned undef, so unsuccessful + die "Cannot fork a child: $!"; +} elsif ($pid == 0) { + + $node->pgbench( + '--no-vacuum --client=20 -j 2 --transactions=10', + 0, + [qr{actually processed}], + [qr{^$}], + 'concurrent INSERTs, UPDATES and RC', + { + # Ensure some HOT updates happen + '002_pgbench_concurrent_transaction_updates' => q( +BEGIN; +INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); +COMMIT; + ), + '003_pgbench_concurrent_transaction_updates' => q( +
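For reference, the "ON CONFLICT does not support invalid indexes as arbiters" errors in the pgbench output above come from a check of roughly this shape in the arbiter-index lookup path (illustrative only; the exact location and variable names are those of the attached patch, which is not reproduced here):

    /* illustrative: reject a not-yet-valid index chosen as an arbiter */
    if (!idxForm->indisvalid)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("ON CONFLICT does not support invalid indexes as arbiters")));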
Bugfix and improvements in multixact.c
Hi! While working on making multixact offsets 64-bit [0], I've discovered a minor issue. The thing is that the type 'xid' is used in all the macros, but that isn't correct. The appropriate MultiXactId or MultiXactOffset should be used, actually. And the second thing: as Heikki Linnakangas points out, the argument naming is also misleading. Since these problems are not the point of thread [0], I decided to create this discussion. And here is the patch set addressing the mentioned issues (0001 and 0002).

Additionally, I made an optional patch 0003 to switch from macros to inline functions. For me, personally, the use of macro functions is justified if we are dealing with different argument types, to make a polymorphic call, which is not the case here. So, we can have more control over types and still generate the same code in terms of speed. See https://godbolt.org/z/KM8voadhs. Starting from -O1 the function is inlined, thus no overhead is noticeable. Anyway, it's up to the actual committer to decide whether it's worth it or not. Again, this particular patch 0003 is completely optional.

As always, any opinions and reviews are very welcome!

[0] https://www.postgresql.org/message-id/flat/ff143b24-a093-40da-9833-d36b83726bdf%40iki.fi#61d5a0e1cf6ab94b0e8aae8559bc4cf7

-- Best regards, Maxim Orlov.

v1-0003-Switch-from-macro-functions-to-inline-in-multixac.patch Description: Binary data v1-0002-Fix-using-of-MultiXactOffset-type-insted-of-Trans.patch Description: Binary data v1-0001-Get-rid-of-misleading-xid-arg-name-in-multixact.patch Description: Binary data
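To illustrate the kind of change 0003 makes, a hedged sketch (the return type is abbreviated here rather than copied from the patch; the macro and constant names are the existing ones in multixact.c):

    /* before: the argument is really a MultiXactId, not a TransactionId */
    #define MultiXactIdToOffsetEntry(xid) \
        ((xid) % (MultiXactId) MULTIXACT_OFFSETS_PER_PAGE)

    /* after: a typed static inline function documents and enforces the intent,
     * and generates the same code from -O1 upward */
    static inline int
    MultiXactIdToOffsetEntry(MultiXactId multi)
    {
        return multi % MULTIXACT_OFFSETS_PER_PAGE;
    }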
Re: jsonpath: Missing Binary Execution Path?
On Jun 13, 2024, at 22:31, Chapman Flack wrote: > It's baked right into the standard grammar: || can only have a > on its right and a > on its left. > > && can only have a on its right and a > on its left. Wow. > The case for ! is even more limiting: it can't be applied to anything > but a . That can be either the exists predicate, > or, any other but wrapped in parentheses. > > The language seems sort of gappy in the same way XPath 1.0 was. XPath 2.0 > became much more consistent and conceptually unified, only by that time, > XML was old school, and JSON was cool, and apparently started inventing > a path language. I suppose that’s the reason for this design. But if these sorts of limitations were changed in XPath, perhaps SQL-Next could fix them, too. Thanks for citing the standard; super helpful. D
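To make that concrete with PostgreSQL's jsonpath (my reading of the grammar, not examples taken from the thread):

    -- ! applies only to a delimited (parenthesized) predicate, so this parses:
    SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (!(@ < 2))');

    -- ...whereas applying ! directly to a comparison is a syntax error:
    SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (! @ < 2)');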
Re: Shouldn't jsonpath .string() Unwrap?
On Jun 13, 2024, at 21:55, Chapman Flack wrote: > My opinion is yes, that should be done. 9.46, umm, General > Rule 11 g ii 6) A) says just "if MODE is lax and is not > type or size, then let BASE be Unwrap(BASE)." No special exemption > there for string(), nor further below at C) XV) for the operation > of string(). Thank you! Cited that bit in the commit message in the attached patch (also available as a GitHub PR[1]). D [1]: https://github.com/theory/postgres/pull/5 v1-0001-Teach-jsonpath-string-to-unwrap-in-lax-mode.patch Description: Binary data
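For the archives, the behavior change the patch is after looks roughly like this (the outputs are my expectation of the patched behavior, not copied from its regression tests):

    -- lax mode (the default): the array should be unwrapped, so .string()
    -- applies to each element and returns "1" and "2"
    SELECT jsonb_path_query('[1, 2]', '$.string()');

    -- strict mode does not unwrap, so this should still raise an error about
    -- applying .string() to a non-scalar value
    SELECT jsonb_path_query('[1, 2]', 'strict $.string()');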
Re: may be a buffer overflow problem
Daniel Gustafsson writes: > This is indeed buggy and need to take the length into account, as per the > attached. This only happens when in the undocumented regression test debug > mode which may be why it's gone unnoticed. Seeing that this code is exercised thousands of times a day in the regression tests and has had a failure rate of exactly zero (and yes, the tests do check the output), there must be some reason why it's okay. regards, tom lane
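For context, "taking the length into account" here means something like bounding the printf width, since sqlca->sqlstate is a 5-byte array with no NUL terminator. Illustrative only, not the attached patch:

    /* illustrative: print the 5-byte, non-NUL-terminated sqlstate safely */
    fprintf(stderr, "sqlstate %.*s\n",
            (int) sizeof(sqlca->sqlstate), sqlca->sqlstate);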
Re: CI and test improvements
Hi, On Thu, 13 Jun 2024 at 14:56, Justin Pryzby wrote: > > ccache should be installed in the image rather than re-installed on each > invocation. ccache is installed in the Windows VM images now [1]. It can be used as 'set CC=ccache.exe cl.exe' in the Windows CI task. [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e -- Regards, Nazir Bilal Yavuz Microsoft
Re: RFC: adding pytest as a supported test framework
On 2024-06-14 Fr 08:10, Robert Haas wrote: We've talked about a libpq FFI interface, but it hasn't been done; Hold my beer :-) I just posted a POC for that. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: may be a buffer overflow problem
I wrote: > Seeing that this code is exercised thousands of times a day in the > regression tests and has had a failure rate of exactly zero (and > yes, the tests do check the output), there must be some reason > why it's okay.

After looking a little closer, I think the reason why it works in practice is that there's always a few bytes of zero padding at the end of struct sqlca_t. I don't see any value in changing individual code sites that are depending on that, because there are surely many more, both in our own code and end users' code. What I suggest we ought to do is formalize the existence of that zero pad. Perhaps like this:

     char    sqlstate[5];
+    char    sqlstatepad;    /* nul terminator for sqlstate */
 };

Another way could be to change

-    char    sqlstate[5];
+    char    sqlstate[6];

but I fear that could have unforeseen consequences in code that is paying attention to sizeof(sqlstate). Either way there are probably doc adjustments to be made.

regards, tom lane
Re: CI and test improvements
On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Thu, 13 Jun 2024 at 14:56, Justin Pryzby wrote: > > > > ccache should be installed in the image rather than re-installed on each > > invocation. > > ccache is installed in the Windows VM images now [1]. It can be used > as 'set CC=ccache.exe cl.exe' in the Windows CI task. > > [1] > https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e Thanks. I think that works by using a "shim" created by choco in C:\ProgramData\chocolatey\bin. But going through that indirection seems to incur an extra 15sec of compilation time; I think we'll want to do something to avoid that. I'm not sure what the options are, like maybe installing ccache into a fixed path like c:\ccache or installing a custom link, to a "pinned" version of ccache. -- Justin
Using LibPq in TAP tests via FFI
Over at [1] Andres expressed enthusiasm for enabling TAP tests to call LibPQ directly via FFI, and there was some support from others as well. Attached is a very rough POC for just that. There are two perl modules, one which wraps libpq (or almost all of it) in perl, and another which uses that module to create a session object that can be used to run SQL. Also in the patch is a modification of one TAP test (arbitrarily chosen as src/bin/pg_amcheck/t/004_verify_heapam.pl) to use the new interface, so it doesn't use psql at all.

There's a bunch of work to do here, but for a morning's work it's not too bad :-) Luckily I had most of the first file already to hand. Next I plan to look at some of the recovery tests and other uses of background_psql, which might be more challenging, and require extension of the session API. Also there's a lot of error checking and documentation that need to be added.

cheers andrew

[1] https://postgr.es/m/20240612152812.ixz3eiz2p475g...@awork3.anarazel.de

-- Andrew Dunstan EDB: https://www.enterprisedb.com
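For anyone who hasn't played with FFI::Platypus, the wrapper module boils down to bindings along these lines. This is a minimal sketch of the general approach, not the attached module; the handful of functions bound here and the way the library is located are illustrative choices:

    use strict;
    use warnings;
    use FFI::CheckLib qw(find_lib_or_die);
    use FFI::Platypus 2.00;

    my $ffi = FFI::Platypus->new(api => 2);
    $ffi->lib(find_lib_or_die(lib => 'pq'));

    # a few representative libpq calls; the real wrapper covers nearly all of them
    $ffi->attach(PQconnectdb    => ['string'] => 'opaque');
    $ffi->attach(PQstatus       => ['opaque'] => 'int');
    $ffi->attach(PQerrorMessage => ['opaque'] => 'string');
    $ffi->attach(PQexec         => ['opaque', 'string'] => 'opaque');
    $ffi->attach(PQgetvalue     => ['opaque', 'int', 'int'] => 'string');
    $ffi->attach(PQclear        => ['opaque'] => 'void');
    $ffi->attach(PQfinish       => ['opaque'] => 'void');

    my $conn = PQconnectdb("dbname=postgres");
    die "connection failed: " . PQerrorMessage($conn)
      if PQstatus($conn) != 0;    # 0 == CONNECTION_OK

    my $res = PQexec($conn, "SELECT count(*) FROM pg_class");
    print "pg_class has ", PQgetvalue($res, 0, 0), " rows\n";
    PQclear($res);
    PQfinish($conn);

The session module then layers result handling and error checking on top of calls like these.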
Re: Using LibPq in TAP tests via FFI
On 2024-06-14 Fr 11:09, Andrew Dunstan wrote: Over at [1] Andres expressed enthusiasm for enabling TAP tests to call LibPQ directly via FFI, and there was some support from others as well. Attached is a very rough POC for just that.There are two perl modules, one which wraps libpq (or almost all of it) in perl, and another which uses that module to create a session object that can be used to run SQL. Also in the patch is a modification of one TAP test (arbitrarily chosen as src/bin/pg_amcheck/t/004_verify_heapam.p) to use the new interface, so it doesn't use psql at all. There's a bunch of work to do here, but for a morning's work it's not too bad :-) Luckily I had most of the first file already to hand. Next I plan to look at some of the recovery tests and other uses of background_psql, which might be more challenging,a dn require extension of the session API. Also there's a lot of error checking and documentation that need to be added. cheers andrew [1] https://postgr.es/m/20240612152812.ixz3eiz2p475g...@awork3.anarazel.de And here's the patch cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl index f6d2c5f787..c47ff3f939 100644 --- a/src/bin/pg_amcheck/t/004_verify_heapam.pl +++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl @@ -5,6 +5,7 @@ use strict; use warnings FATAL => 'all'; use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Session; use PostgreSQL::Test::Utils; use Test::More; @@ -190,16 +191,17 @@ $node->append_conf('postgresql.conf', 'max_prepared_transactions=10'); $node->start; my $port = $node->port; my $pgdata = $node->data_dir; -$node->safe_psql('postgres', "CREATE EXTENSION amcheck"); -$node->safe_psql('postgres', "CREATE EXTENSION pageinspect"); +my $session = PostgreSQL::Test::Session->new(node => $node); +$session->do("CREATE EXTENSION amcheck"); +$session->do("CREATE EXTENSION pageinspect"); # Get a non-zero datfrozenxid -$node->safe_psql('postgres', qq(VACUUM FREEZE)); +$session->do(qq(VACUUM FREEZE)); # Create the test table with precisely the schema that our corruption function # expects. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT); ALTER TABLE public.test SET (autovacuum_enabled=false); ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL; @@ -209,14 +211,15 @@ $node->safe_psql( # We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze # an otherwise unused table, public.junk, prior to inserting data and freezing # public.test -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column; ALTER TABLE public.junk SET (autovacuum_enabled=false); - VACUUM FREEZE public.junk - )); + ), + 'VACUUM FREEZE public.junk' +); -my $rel = $node->safe_psql('postgres', +my $rel = $session->query_oneval( qq(SELECT pg_relation_filepath('public.test'))); my $relpath = "$pgdata/$rel"; @@ -229,23 +232,24 @@ my $ROWCOUNT_BASIC = 16; # First insert data needed for tests unrelated to update chain validation. # Then freeze the page. These tuples are at offset numbers 1 to 16. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( INSERT INTO public.test (a, b, c) SELECT x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', repeat('w', 1) FROM generate_series(1, $ROWCOUNT_BASIC); - VACUUM FREEZE public.test;) +), + 'VACUUM FREEZE public.test' ); # Create some simple HOT update chains for line pointer validation. 
After # the page is HOT pruned, we'll have two redirects line pointers each pointing # to a tuple. We'll then change the second redirect to point to the same # tuple as the first one and verify that we can detect corruption. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( INSERT INTO public.test (a, b, c) VALUES ( x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', generate_series(1,2)); -- offset numbers 17 and 18 @@ -254,8 +258,8 @@ $node->safe_psql( )); # Create some more HOT update chains. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( INSERT INTO public.test (a, b, c) VALUES ( x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', generate_series(3,6)); -- offset numbers 21 through 24 @@ -264,25 +268,29 @@ $node->safe_psql( )); # Negative test case of HOT-pruning with aborted tuple. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( BEGIN; UPDATE public.test SET c = 'a' WHERE c = '5'; -- offset number 27 ABORT; - VACUUM FREEZE public.test; - )); + ), + 'VACUUM FREEZE public.test;', + ); # Next update on any tuple will be stored at the same place of tuple inserted # by aborted transaction. This should not cause the table to appear corrupt. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( +BEGIN; UPDATE public.test SET c = 'a' WHERE c = '6'; -- o
Re: Shouldn't jsonpath .string() Unwrap?
On 06/14/24 10:39, David E. Wheeler wrote: > Cited that bit in the commit message in the attached patch (also available as > a GitHub PR[1]). > > [1]: https://github.com/theory/postgres/pull/5 I would s/extepsions/exceptions/ in the added documentation. :) Offhand (as GitHub PRs aren't really The PG Way), if they were The Way, I would find this one a little hard to follow, being based at a point 28 unrelated commits ahead of the ref it's opened against. I suspect 'master' on theory/postgres could be fast-forwarded to f1affb6 and then the PR would look much more approachable. Regards, -Chap
Re: CI and test improvements
Hi, On June 14, 2024 8:22:01 AM PDT, Justin Pryzby wrote: >On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote: >> Hi, >> >> On Thu, 13 Jun 2024 at 14:56, Justin Pryzby wrote: >> > >> > ccache should be installed in the image rather than re-installed on each >> > invocation. >> >> ccache is installed in the Windows VM images now [1]. It can be used >> as 'set CC=ccache.exe cl.exe' in the Windows CI task. >> >> [1] >> https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e > >Thanks. I think that works by using a "shim" created by choco in >C:\ProgramData\chocolatey\bin. > >But going through that indirection seems to incur an extra 15sec of >compilation time; I think we'll want to do something to avoid that. > >I'm not sure what the options are, like maybe installing ccache into a >fixed path like c:\ccache or installing a custom link, to a "pinned" >version of ccache. Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for startup performance when using containers), but we just merged them. So it might be easiest to just explicitly use the ccache from there (also an explicit path might be faster). Could you check if that's fast? If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Using LibPq in TAP tests via FFI
On Fri, Jun 14, 2024 at 11:11 AM Andrew Dunstan wrote: > And here's the patch

I haven't reviewed the patch, but a big +1 for the idea. Not only might this cut down on the resource costs of running the tests in CI, as Andres has pointed out a few times, but it could also lead to much nicer user interfaces. For instance, right now, we have a number of TAP tests that are parsing psql output to recover the values returned by queries. Perhaps eventually - or maybe already, again I haven't looked at the code - you'll be able to do something like $resultset->[0][0] to pull the first column out of the first row. That kind of thing could substantially improve the readability and maintainability of some of our tests.

-- Robert Haas EDB: http://www.enterprisedb.com
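To illustrate the kind of interface being hoped for above (an entirely hypothetical API, not something in the posted patch):

    # hypothetical: run a query and get the result set back as a Perl structure
    my $rows = $session->query("SELECT datname FROM pg_database ORDER BY 1");
    is($rows->[0][0], 'postgres', 'first database is postgres');

No more regex-scraping of psql output to recover a single value.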
Re: Using LibPq in TAP tests via FFI
Hi, On June 14, 2024 8:09:43 AM PDT, Andrew Dunstan wrote: >Over at [1] Andres expressed enthusiasm for enabling TAP tests to call LibPQ >directly via FFI, and there was some support from others as well. Attached is >a very rough POC for just that.There are two perl modules, one which wraps >libpq (or almost all of it) in perl, and another which uses that module to >create a session object that can be used to run SQL. Also in the patch is a >modification of one TAP test (arbitrarily chosen as >src/bin/pg_amcheck/t/004_verify_heapam.p) to use the new interface, so it >doesn't use psql at all. > >There's a bunch of work to do here, but for a morning's work it's not too bad >:-) Luckily I had most of the first file already to hand. Yay! >Next I plan to look at some of the recovery tests and other uses of >background_psql, which might be more challenging,a dn require extension of the >session API. Also there's a lot of error checking and documentation that need >to be added. I'd suggest trying to convert the various looping constructs first, they're responsible for a large number of spawned shells. And I vaguely recall that there were none/very few that depend on actually being run via psql. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: RFC: adding pytest as a supported test framework
Robert Haas writes: > I mean, both Perl and Python are Turing-complete. Anything you can do > in one, you can do in the other, especially when you consider that > we're not likely to accept too many dependencies on external Perl or > Python modules. That's why I see this as nothing more or less than > exercise in letting people use the programming language they prefer. I think that's an oversimplified analysis. Sure, the languages are both Turing-complete, but for our purposes here they are both simply glue languages around some set of testing facilities. Some of those facilities will be provided by the base languages (plus whatever extension modules we choose to require) and some by code we write. The overall experience of writing tests will be determined by the testing facilities far more than by the language used for glue. That being the case, I do agree with your point that Python equivalents to most of PostgreSQL::Test will need to be built up PDQ. Maybe they can be better than the originals, in features or ease of use, but "not there at all" is not better. But what I'd really like to see is some comparison of the language-provided testing facilities that we're proposing to depend on. Why is pytest (or whatever) better than Test::More? I also wonder about integration of python-based testing with what we already have. A significant part of what you called the meson work had to do with persuading pg_regress, isolationtester, etc to output test results in the common format established by TAP. Am I right in guessing that pytest will have nothing to do with that? Can we even manage to dump perl and python test scripts into the same subdirectory and sort things out automatically? I'm definitely going to be -1 for a python testing feature that cannot integrate with what we have because it demands its own control and result-collection infrastructure. regards, tom lane
Re: libpq contention due to gss even when not using gss
> On Fri, Jun 14, 2024 at 12:12:55PM GMT, Daniel Gustafsson wrote: > > I've been experimenting with both: > > > > * The server is built without gssapi, but the client does support it. > > This produces exactly the contention you're talking about. > > > > * The server is built with gssapi, but do not use it in pg_hba, the > > client does support gssapi. In this case the difference between > > gssencmode=disable/prefer is even more dramatic in my test case > > (milliseconds vs seconds) due to the environment with configured > > kerberos (for other purposes, thus gss_init_sec_context spends huge > > amount of time to still return nothing). > > > > At the same time after quick look I don't see an easy way to avoid that. > > Current implementation tries to initialize gss before getting any > > confirmation from the server whether it's supported. Doing this other > > way around would probably just shift overhead to the server side. > > The main problem seems to be that we check whether or not there is a > credential > cache when we try to select encryption but not yet authentication, as a way to > figure out if gssenc it as all worth trying? Yep, this is my understanding as well. Which other methods did you try for checking that?
Re: libpq contention due to gss even when not using gss
Hi, On 2024-06-14 10:46:04 +0200, Dmitry Dolgov wrote: > At the same time after quick look I don't see an easy way to avoid that. > Current implementation tries to initialize gss before getting any > confirmation from the server whether it's supported. Doing this other > way around would probably just shift overhead to the server side. Initializing the gss cache at all isn't so much the problem. It's that we do it for every connection. And that doing so requires locking inside gss. So maybe we could just globally cache that gss isn't available, instead of rediscovering it over and over for every new connection. Greetings, Andres Freund
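The shape of that could be as simple as the following (a rough sketch with invented names, not a patch; a real version would also have to consider credentials that only appear after the first connection attempt):

    #include <gssapi/gssapi.h>

    /* process-wide negative cache: once we know there is no credential
     * cache, later connections skip the expensive GSS lookup entirely */
    static bool gss_cred_known_absent = false;

    static bool
    have_gss_cred_cache(void)
    {
        OM_uint32       maj_stat,
                        min_stat;
        gss_cred_id_t   cred = GSS_C_NO_CREDENTIAL;

        if (gss_cred_known_absent)
            return false;

        maj_stat = gss_acquire_cred(&min_stat, GSS_C_NO_NAME, 0,
                                    GSS_C_NO_OID_SET, GSS_C_INITIATE,
                                    &cred, NULL, NULL);
        if (maj_stat != GSS_S_COMPLETE)
        {
            gss_cred_known_absent = true;
            return false;
        }

        gss_release_cred(&min_stat, &cred);
        return true;
    }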
Re: Shouldn't jsonpath .string() Unwrap?
> On Jun 14, 2024, at 11:25, Chapman Flack wrote: > > I would s/extepsions/exceptions/ in the added documentation. :) Bah, fixed and attached, thanks. > Offhand (as GitHub PRs aren't really The PG Way), if they were The Way, > I would find this one a little hard to follow, being based at a point > 28 unrelated commits ahead of the ref it's opened against. I suspect > 'master' on theory/postgres could be fast-forwarded to f1affb6 and then > the PR would look much more approachable. Yeah, I pushed the PR and branch before I synced master, and GitHub was taking a while to notice and update the PR. I fixed it with `git commit --all --amend --date now --reedit-message HEAD` and force-pushed (then fixed the typo and fixed again). D v2-0001-Teach-jsonpath-string-to-unwrap-in-lax-mode.patch Description: Binary data
Re: RFC: adding pytest as a supported test framework
Hi, On 2024-06-14 11:49:29 -0400, Tom Lane wrote: > I also wonder about integration of python-based testing with what > we already have. A significant part of what you called the meson > work had to do with persuading pg_regress, isolationtester, etc > to output test results in the common format established by TAP.

FWIW, meson's testrunner doesn't require TAP; the lowest common denominator is just an exit code. However, for things that run many "sub" tests, it's a lot nicer if the failures can be reported more granularly than just "the entire testsuite failed". Meson currently supports:

exitcode: the executable's exit code is used by the test harness to record the outcome of the test
tap: Test Anything Protocol
gtest (since 0.55.0): for Google Tests
rust (since 0.56.0): for native rust tests

> Am I right in guessing that pytest will have nothing to do with that?

Looks like there's a plugin for pytest to support tap as output: https://pypi.org/project/pytest-tap/

However, it's not available as a debian package. I know that some folks just advocate installing dependencies via venv, but I personally don't think that'll fly. For one, it'll basically prevent tests being run by packagers.

> Can we even manage to dump perl and python test scripts into the same > subdirectory and sort things out automatically?

That shouldn't be much of a problem.

> I'm definitely going to be -1 for a python testing feature that cannot > integrate with what we have because it demands its own control and > result-collection infrastructure.

Ditto.

Greetings, Andres Freund
Re: RFC: adding pytest as a supported test framework
On 2024-06-14 09:11:11 -0700, Andres Freund wrote: > On 2024-06-14 11:49:29 -0400, Tom Lane wrote: > > Am I right in guessing that pytest will have nothing to do with that? > > Looks like there's a plugin for pytest to support tap as output: > https://pypi.org/project/pytest-tap/ > > However, it's not available as a debian package. I know that some folks just > advocate installing dependencies via venv, but I personally don't think > that'll fly. For one, it'll basically prevent tests being run by packagers. If this were the blocker, I think we could just ship an output adapter ourselves. pytest-tap is not a lot of code: https://github.com/python-tap/pytest-tap/blob/main/src/pytest_tap/plugin.py So either vendoring it or just writing an even simpler version ourselves seems entirely feasible.
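To give a sense of scale, a stripped-down adapter really is tiny. Here is a hedged sketch of the idea as a conftest.py plugin (this is not pytest-tap, just an illustration of the hooks involved):

    # conftest.py -- emit TAP from pytest results (illustrative sketch only)
    import pytest

    _results = []

    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_makereport(item, call):
        outcome = yield
        report = outcome.get_result()
        # record one line per test: the "call" phase, plus setup-time skips
        if report.when == "call" or (report.when == "setup" and report.skipped):
            _results.append((item.nodeid, report.outcome))

    def pytest_sessionfinish(session, exitstatus):
        # TAP permits the plan line to trail the results
        for i, (nodeid, outcome) in enumerate(_results, start=1):
            if outcome == "passed":
                print(f"ok {i} - {nodeid}")
            elif outcome == "skipped":
                print(f"ok {i} - {nodeid} # SKIP")
            else:
                print(f"not ok {i} - {nodeid}")
        print(f"1..{len(_results)}")

In practice you would also need to keep pytest's normal console report from interleaving with the TAP stream, but the amount of code is clearly not the hard part.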
jsonpath: Missing regex_like && starts with Errors?
Hackers, I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, raise an error when the operand is not a string (or array of strings): david=# select jsonb_path_query('true', '$ like_regex "^hi"'); jsonb_path_query -- null (1 row) david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"'); jsonb_path_query -- null (1 row) This is true in strict and lax mode, and with verbosity enabled (as in these examples). Most other operators raise an error when they can’t operate on the operand: david=# select jsonb_path_query('{"x": "hi"}', '$.integer()'); ERROR: jsonpath item method .integer() can only be applied to a string or numeric value david=# select jsonb_path_query('{"x": "hi"}', '$+$'); ERROR: left operand of jsonpath operator + is not a single numeric value Should `like_regex` and `starts with` adopt this behavior, too? I note that filter expressions seem to suppress these sorts of errors, but I assume that’s by design: david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ starts with "^hi")'); jsonb_path_query -- (0 rows) david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ like_regex "^hi")'); jsonb_path_query -- (0 rows) david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@.integer() == 1)'); jsonb_path_query -- (0 rows) D
Re: Using LibPq in TAP tests via FFI
Hi, On 2024-06-14 11:11:38 -0400, Andrew Dunstan wrote: > On 2024-06-14 Fr 11:09, Andrew Dunstan wrote: > > Over at [1] Andres expressed enthusiasm for enabling TAP tests to call > > LibPQ directly via FFI, and there was some support from others as well. > > Attached is a very rough POC for just that. There are two perl modules, > > one which wraps libpq (or almost all of it) in perl, and another which > > uses that module to create a session object that can be used to run SQL.

What are your current thoughts about a fallback for this? It seems possible to implement the session module on top of BackgroundPsql.pm, if necessary. But I suspect we'll eventually get to a point where that gets less and less convenient.

How much of a dependency is FFI::Platypus, compared to requiring perl headers to be installed? In case FFI::Platypus is a complicated dependency, a small XS wrapper could be an alternative.

Greetings, Andres Freund
Re: libpq contention due to gss even when not using gss
Andres Freund writes: > Initializing the gss cache at all isn't so much the problem. It's that we do > it for every connection. And that doing so requires locking inside gss. So > maybe we could just globally cache that gss isn't available, instead of > rediscovering it over and over for every new connection. I had the impression that krb5 already had such a cache internally. Maybe they don't cache the "failed" state though. I doubt we'd want to either in long-lived processes --- what if the user installs the credential while we're running? regards, tom lane
Re: libpq contention due to gss even when not using gss
Hi, On 2024-06-14 12:27:12 -0400, Tom Lane wrote: > Andres Freund writes: > > Initializing the gss cache at all isn't so much the problem. It's that we do > > it for every connection. And that doing so requires locking inside gss. So > > maybe we could just globally cache that gss isn't available, instead of > > rediscovering it over and over for every new connection. > > I had the impression that krb5 already had such a cache internally. Well, if so, it clearly doesn't seem to work very well, given that it causes contention at ~15k lookups/sec. That's obviously a trivial number for anything cached, even with the worst possible locking regimen. > Maybe they don't cache the "failed" state though. I doubt we'd > want to either in long-lived processes --- what if the user > installs the credential while we're running? If we can come up with something better - cool. But it doesn't seem great that gss introduces contention for the vast majority of folks that use libpq in environments that never use gss. I don't think we should cache the set of credentials when gss is actually available on a process-wide basis, just the fact that gss isn't available at all. I think it's very unlikely for that fact to change while an application is running. And if it happens, requiring a restart in those cases seems an acceptable price to pay for what is effectively a niche feature. Greetings, Andres Freund
Re: Bugfix and improvements in multixact.c
On 14/06/2024 16:56, Maxim Orlov wrote: Hi! While working on a making multix xact offsets 64-bit [0] I've discovered a minor issue. The thing is that type 'xid' is used in all macro, but it doesn't correct. Appropriate MultiXactId or MultiXactOffset should be used, actually. And the second thing, as Heikki Linnakangas points out, args naming is also misleading. Thanks! Looks good to me at a quick glance. I'll try to review and commit these properly by Monday. Additionally, I made an optional patch 0003 to switch from macro to inline functions. For me, personally, use of macro functions is justified if we are dealing with different argument types, to make polymorphic call. Which is not the case here. So, we can have more control over types and still generate the same code in terms of speed. See https://godbolt.org/z/KM8voadhs Starting from O1 function is inlined, thus no overhead is noticeable. Anyway, it's up to the actual commiter to decide does it worth it or not. Again, this particular patch 0003 is completely optional. I agree static inline functions are generally easier to work with than macros. These particular macros were not too bad, though. I'll bite the bullet and commit this one too unless someone objects. It's late in the v17 release cycle, but these are local to multixact.c so there's no risk of breaking extensions, and it seems good to do it now since we're modifying the macros anyway. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Allow non-superuser to cancel superuser tasks.
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote: > This patch looks good to me. Thanks for looking. > Spellchecker is complaining about "signaling" instead of "signalling", > but ISTM it's OK. I think this is an en-US versus en-GB thing. We've standardized on en-US for "cancel" (see commits 8c9da14, 21f1e15, and af26857), so IMO we might as well do so for "signal," too. -- nathan
Re: RFC: adding pytest as a supported test framework
On Thu, Jun 13, 2024 at 1:08 PM Jelte Fennema-Nio wrote: > But Perl is at the next level of unmaintained infrastructure. It is > actually clear how you can contribute to it, but still no new > community members actually want to contribute to it. Also, it's not > only unmaintained by us but it's also pretty much unmaintained by the > upstream community. I am not happy with the state of Perl, as it has made some MAJOR missteps along the way, particularly in the last 5 years. But can we dispel this strawman? There is a difference between "unpopular" and "unmaintained". The latest version of Perl was released May 20, 2024. The latest release of Test::More was April 25, 2024. Both are heavily used. Just not as heavily as they used to be. :) Cheers, Greg
Re: RFC: adding pytest as a supported test framework
On Fri, 14 Jun 2024 at 22:33, Greg Sabino Mullane wrote: > I am not happy with the state of Perl, as it has made some MAJOR missteps > along the way, particularly in the last 5 years. But can we dispel this > strawman? There is a difference between "unpopular" and "unmaintained". The > latest version of Perl was released May 20, 2024. The latest release of > Test::More was April 25, 2024. Both are heavily used. Just not as heavily as > they used to be. :) Sorry, yes I exaggerated here. Looking at the last Perl changelog[1] it's definitely getting more new features and improvements than I had thought. Test::More on the other hand, while indeed still maintained, it's definitely not getting significant new feature development or improvements[2]. Especially when comparing it to pytest[3]. [1]: https://perldoc.perl.org/perldelta [2]: https://github.com/Test-More/test-more/blob/master/Changes [3]: https://docs.pytest.org/en/stable/changelog.html
Re: RFC: adding pytest as a supported test framework
On Fri, Jun 14, 2024 at 9:24 AM Robert Haas wrote: > For example, the fact that > nobody's helping Thomas maintain this cfbot that we all have come to > rely on, or helping him get that integrated into > commitfest.postgresql.org, is a problem. I've been talking to Magnus and Jelte about cfbot and we're hoping to have some good news soon...
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Wed, Jun 12, 2024 at 09:41:01AM -0500, Nathan Bossart wrote: > Actually, I think you are right that we need a manual checkpoint, except I > think we need it to be after prepare_new_globals(). set_frozenxids() > connects to each database (including template0) and updates a bunch of > pg_class rows, and we probably want those on disk before we start copying > the files to create all the user's databases. Here is an updated patch. -- nathan >From c3aeed33b351475896c2b4002cff125f1edeeb42 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 14 Jun 2024 16:25:04 -0500 Subject: [PATCH v3 1/1] Use CREATE DATABASE ... STRATEGY = FILE_COPY in pg_upgrade. While the strategy is considered very costly due to expensive checkpoints when the target system has configured large shared_buffers, the issues that those checkpoints protect against don't apply in a pg_upgrade system. By skipping the checkpoints during CREATE DATABASE ... STRATEGY=FILE_COPY and using that strategy for pg_upgrade, we can save a lot of time and IO that would otherwise be spent on reading pages from disk, generating WAL records, writing those WAL records to disk, and then writing the pages to disk - which has the potential of at least double the IO of FILE_COPY. Co-authored-by: Matthias van de Meent Reviewed-by: Ranier Vilela, Dilip Kumar Discussion: https://postgr.es/m/Zl9ta3FtgdjizkJ5%40nathan --- src/backend/commands/dbcommands.c | 18 +++--- src/bin/pg_dump/pg_dump.c | 9 - src/bin/pg_upgrade/pg_upgrade.c | 12 3 files changed, 35 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index be629ea92c..99d7a9ba3c 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -563,9 +563,14 @@ CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid, * happened while we're copying files, a file might be deleted just when * we're about to copy it, causing the lstat() call in copydir() to fail * with ENOENT. +* +* In binary upgrade mode, we can skip this checkpoint because we are +* careful to ensure that template0 is fully written to disk prior to any +* CREATE DATABASE commands. */ - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | - CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL); + if (!IsBinaryUpgrade) + RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | + CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL); /* * Iterate through all tablespaces of the template database, and copy each @@ -657,10 +662,17 @@ CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid, * seem to be much we can do about that except document it as a * limitation. * +* In binary upgrade mode, we can skip this checkpoint because neither of +* these problems applies: we don't ever replay the WAL generated during +* pg_upgrade, and we don't concurrently modify template0 (not to mention +* that trying to take a backup during pg_upgrade is pointless). +* * See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE * strategy that avoids these problems. */ - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); + if (!IsBinaryUpgrade) + RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | + CHECKPOINT_WAIT); } /* diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index e324070828..1e51a63442 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3123,10 +3123,17 @@ dumpDatabase(Archive *fout) * since those can't be altered later. 
Other DB properties are left to * the DATABASE PROPERTIES entry, so that they can be applied after * reconnecting to the target DB. +* +* For binary upgrade, we use the FILE_COPY strategy because testing has +* shown it to be faster. When the server is in binary upgrade mode, it +* will also skip the checkpoints normally triggered by CREATE DATABASE +* STRATEGY = FILE_COPY. */ if (dopt->binary_upgrade) { - appendPQExpBuffer(creaQry, "CREATE DATABASE %s WITH TEMPLATE = template0 OID = %u", + appendPQExpBuffer(creaQry, + "CREATE DATABASE %s WITH TEMPLATE = template0 " + "OID = %u STRATEGY = FILE_COPY", qdatname, dbCatId.oid); } else diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c index af370768b6..45f095a230 100644 --- a/
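For concreteness, here is the shape of command this patch changes (an illustrative sketch, not taken from the patch itself; the database name and OID are placeholders):

-- What pg_dump --binary-upgrade would emit with the patch applied; because
-- the server is in binary upgrade mode, the two checkpoints that FILE_COPY
-- normally requires are skipped:
CREATE DATABASE "mydb" WITH TEMPLATE = template0 OID = 16384 STRATEGY = FILE_COPY;

-- Outside of pg_upgrade the same strategy can still be requested explicitly,
-- but then the checkpoints are still performed:
CREATE DATABASE mydb_copy TEMPLATE = template0 STRATEGY = FILE_COPY;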
Re: RFC: adding pytest as a supported test framework
On Fri, 14 Jun 2024 at 17:49, Tom Lane wrote:
> But what I'd really like to see is some comparison of the
> language-provided testing facilities that we're proposing
> to depend on. Why is pytest (or whatever) better than Test::More?

Some advantages of pytest over Test::More:

1. It's much easier to debug failing tests using the output that pytest gives. A good example of this is on the pytest homepage[1] (e.g. it shows the value passed to the call to inc in the error output).
2. There's no need to remember a specific comparison DSL (is/isnt/is_deeply/like/ok/cmp_ok/isa_ok); just put assert in front of a boolean expression and the output is great (if you want to add a message for clarity you can use: assert a == b, "the world is ending").
3. It's very easy to print postgres log files to stderr/stdout when a test fails. This might be possible/easy with Perl too, but we currently don't do that. So right now, for many failures, you're forced to traverse the build/testrun/... directory tree to find the logs.
4. Pytest has autodiscovery of test files and functions, so we probably wouldn't have to specify all of the exact test files anymore in the meson.build files.

Regarding 2, there are ~150 checks that are using a suboptimal way of testing for a comparison. Most of them could use "like(..., ...)" instead of "ok(... =~ ...)":

❯ grep '\bok(.*=' **.pl | wc -l
151

[1]: https://docs.pytest.org/en/8.2.x/#a-quick-example

Re: tiny step toward threading: reduce dependence on setlocale()
On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote: > > I think this patch series is a nice cleanup, as well, making libc > more > like the other providers and not dependent on global state. New rebased series attached with additional cleanup. Now that pg_locale_t is never NULL, we can simplify the way the collation cache works, eliminating ~100 lines. -- Jeff Davis PostgreSQL Contributor Team - AWS From d3862b88d8df3372ebdd368489a86142bd11f42c Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Wed, 5 Jun 2024 11:45:55 -0700 Subject: [PATCH v2 1/7] Make database default collation internal to pg_locale.c. --- src/backend/utils/adt/pg_locale.c | 64 ++- src/backend/utils/init/postinit.c | 35 ++--- src/include/utils/pg_locale.h | 3 +- 3 files changed, 66 insertions(+), 36 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 7e5bb2b703..29f16c49cb 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -56,6 +56,7 @@ #include "access/htup_details.h" #include "catalog/pg_collation.h" +#include "catalog/pg_database.h" #include "mb/pg_wchar.h" #include "miscadmin.h" #include "utils/builtins.h" @@ -116,6 +117,8 @@ char *localized_full_months[12 + 1]; /* is the databases's LC_CTYPE the C locale? */ bool database_ctype_is_c = false; +static struct pg_locale_struct default_locale; + /* indicates whether locale information cache is valid */ static bool CurrentLocaleConvValid = false; static bool CurrentLCTimeValid = false; @@ -1443,8 +1446,6 @@ lc_ctype_is_c(Oid collation) return (lookup_collation_cache(collation, true))->ctype_is_c; } -struct pg_locale_struct default_locale; - void make_icu_collator(const char *iculocstr, const char *icurules, @@ -1537,6 +1538,65 @@ pg_locale_deterministic(pg_locale_t locale) return locale->deterministic; } +void +pg_init_database_collation() +{ + HeapTuple tup; + Form_pg_database dbform; + Datum datum; + bool isnull; + + /* Fetch our pg_database row normally, via syscache */ + tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId)); + if (!HeapTupleIsValid(tup)) + elog(ERROR, "cache lookup failed for database %u", MyDatabaseId); + dbform = (Form_pg_database) GETSTRUCT(tup); + + if (dbform->datlocprovider == COLLPROVIDER_BUILTIN) + { + char *datlocale; + + datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); + datlocale = TextDatumGetCString(datum); + + builtin_validate_locale(dbform->encoding, datlocale); + + default_locale.info.builtin.locale = MemoryContextStrdup( + TopMemoryContext, datlocale); + } + else if (dbform->datlocprovider == COLLPROVIDER_ICU) + { + char *datlocale; + char *icurules; + + datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); + datlocale = TextDatumGetCString(datum); + + datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull); + if (!isnull) + icurules = TextDatumGetCString(datum); + else + icurules = NULL; + + make_icu_collator(datlocale, icurules, &default_locale); + } + else + { + Assert(dbform->datlocprovider == COLLPROVIDER_LIBC); + } + + default_locale.provider = dbform->datlocprovider; + + /* + * Default locale is currently always deterministic. Nondeterministic + * locales currently don't support pattern matching, which would break a + * lot of things if applied globally. + */ + default_locale.deterministic = true; + + ReleaseSysCache(tup); +} + /* * Create a locale_t from a collation OID. Results are cached for the * lifetime of the backend. 
Thus, do not free the result with freelocale(). diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 0805398e24..6347efdd5a 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -423,43 +423,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect strcmp(ctype, "POSIX") == 0) database_ctype_is_c = true; - if (dbform->datlocprovider == COLLPROVIDER_BUILTIN) - { - datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); - datlocale = TextDatumGetCString(datum); - - builtin_validate_locale(dbform->encoding, datlocale); - - default_locale.info.builtin.locale = MemoryContextStrdup( - TopMemoryContext, datlocale); - } - else if (dbform->datlocprovider == COLLPROVIDER_ICU) - { - char *icurules; + pg_init_database_collation(); - datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); + datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datlocale, &isnull); + if (!isnull) datlocale = TextDatumGetCString(datum); - - datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull); - if (!isnull) - icurules = TextDatumGetCString(datum); - else - icurules = NULL; - - make_icu_collator(datlocale, icurules,
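For reference, the catalog fields that the new pg_init_database_collation() consults for the current database can be inspected like this (a query of my own, not part of the patch series; datlocale is the newer name of what was daticulocale):

-- Which provider and locale settings will seed default_locale for this DB?
SELECT datname, datlocprovider, datcollate, datctype, datlocale, daticurules
  FROM pg_database
 WHERE datname = current_database();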
Re: Using LibPq in TAP tests via FFI
On Sat, Jun 15, 2024 at 3:33 AM Robert Haas wrote: > I haven't reviewed the patch, but a big +1 for the idea. Not only this > might cut down on the resource costs of running the tests in CI, as It would be good to keep some context between the threads here. For the archives' sake, here is where the potential savings were reported, and this and other ideas were discussed: https://www.postgresql.org/message-id/flat/CA%2BhUKGJoEO33K%3DZynsH%3DxkiEyfBMZjOoqBK%2BgouBdTGW2-woig%40mail.gmail.com
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Robert Haas writes:
> I think the only thing we absolutely have to fix here is the dangling
> ACL references.

Here's a draft patch that takes care of Hannu's example, and I think it fixes other potential dangling-reference scenarios too. I'm not sure whether I like the direction this is going, but I do think we may not be able to do a lot more than this for v17.

The core change is to install SHARED_DEPENDENCY_INITACL entries in pg_shdepend for all references to non-pinned roles in pg_init_privs, whether they are the object's owner or not. Doing that ensures that we can't drop a role that is still referenced there, and it provides a basis for shdepDropOwned and shdepReassignOwned to take some kind of action for such references. The semantics I've implemented on top of that are:

* ALTER OWNER does not touch pg_init_privs entries.

* REASSIGN OWNED replaces pg_init_privs references with the new role, whether the references are as grantor or grantee.

* DROP OWNED removes pg_init_privs mentions of the doomed role (as grantor or grantee), removing the pg_init_privs entry altogether if it goes to zero clauses. (This is what happened already, but only if a SHARED_DEPENDENCY_INITACL entry existed.)

I'm not terribly thrilled with this, because it's still possible to get into a state where a pg_init_privs entry is based on an owning role that's no longer the owner: you just have to use ALTER OWNER directly rather than via REASSIGN OWNED. While I don't think the backend has much problem with that, it probably will confuse pg_dump to some extent. However, such cases are going to confuse pg_dump anyway for reasons discussed upthread, namely that we don't really support dump/restore of extensions where not all the objects are owned by the extension owner. I'm content to leave that in the pile of unfinished work for now.

An alternative definition could be that ALTER OWNER also replaces the old role with the new one in pg_init_privs entries. That would reduce the surface for confusing pg_dump a little bit, but I don't think that I like it better, for two reasons:

* ALTER OWNER would have to probe pg_init_privs for potential entries every time, which would be wasted work for most ALTERs.

* This gets away from the notion that pg_init_privs should be a historical record of the state that existed at the end of CREATE EXTENSION. Now, maybe that notion is unworkable anyway, but I don't want to let go of it before we're sure about that.

A couple of more-minor points for review:

* As this stands, updateInitAclDependencies() no longer pays any attention to its ownerId argument, and in one place I depend on that to skip doing a rather expensive lookup of the current object owner. Perhaps we should remove that argument altogether, and in consequence simplify some other callers too? However, that would only help much if we were willing to revert 534287403's changes to pass the object's owner ID to recordExtensionInitPriv, which I'm hesitant to do because I suspect we'll end up wanting to record the owner ID explicitly in pg_init_privs entries. On the third hand, maybe we'll never do that, so perhaps we should revert those changes for now; some of them add nontrivial lookup costs.

* In shdepReassignOwned, I refactored to move the switch on sdepForm->classid into a new subroutine. We could have left it where it is, but it would need a couple more tab stops of indentation, which felt like too much. It's in the eye of the beholder though.

Comments?
regards, tom lane diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 2f52e70b48..a63cc71efa 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7182,9 +7182,6 @@ SCRAM-SHA-256$:&l The referenced object (which must be a role) is mentioned in a pg_init_privs entry for the dependent object. - (A SHARED_DEPENDENCY_INITACL entry is not made for - the owner of the object, since the owner will have - a SHARED_DEPENDENCY_OWNER entry anyway.) diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 143876b77f..ad61e2e4a1 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -4846,6 +4846,124 @@ recordExtensionInitPrivWorker(Oid objoid, Oid classoid, int objsubid, table_close(relation, RowExclusiveLock); } +/* + * ReplaceRoleInInitPriv + * + * Used by shdepReassignOwned to replace mentions of a role in pg_init_privs. + */ +void +ReplaceRoleInInitPriv(Oid oldroleid, Oid newroleid, + Oid classid, Oid objid, int32 objsubid) +{ + Relation rel; + ScanKeyData key[3]; + SysScanDesc scan; + HeapTuple oldtuple; + Datum oldAclDatum; + bool isNull; + Acl *old_acl; + Acl *new_acl; + HeapTuple newtuple; + int noldmembers; + int nnewmembers; + Oid *oldmembers; + Oid *newmembers; + + /* Search for existing pg_init_privs entry for the target object. */ + rel = table_open(Init
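As an illustration of the intended semantics (my own sketch, not taken from the patch or its tests; the extension, table, and role names are hypothetical), assume an extension whose install script grants privileges to a non-pinned role:

-- Suppose extension "myext" creates table myext_tab and its script runs:
--   GRANT SELECT ON myext_tab TO alice;
-- so that pg_init_privs records alice as a grantee.

-- With the patch, a SHARED_DEPENDENCY_INITACL ('i') entry now exists for
-- alice even though she is not the table's owner, so dropping her outright
-- would be blocked:
SELECT d.classid::regclass, d.objid, d.deptype
  FROM pg_shdepend d
  JOIN pg_authid a ON a.oid = d.refobjid
 WHERE a.rolname = 'alice' AND d.deptype = 'i';

-- REASSIGN OWNED rewrites alice's mentions in pg_init_privs to bob, while
-- DROP OWNED simply removes them (dropping the row if it becomes empty):
REASSIGN OWNED BY alice TO bob;
-- or
DROP OWNED BY alice;
DROP ROLE alice;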
Speed up collation cache
The blog post here (thank you depesz!): https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/ showed an interesting result where the builtin provider is not quite as fast as "C" for queries like:

SELECT * FROM a WHERE t = '...';

The reason is that it's calling varstr_cmp() many times, which does a lookup in the collation cache for each call. For sorts, it only does a lookup in the collation cache once, so the effect is not significant.

The reason looking up "C" is faster is that there's a special check for C_COLLATION_OID, so it doesn't even need to do the hash lookup. If you create an equivalent collation like:

CREATE COLLATION libc_c (PROVIDER = libc, LOCALE = 'C');

it will perform the same as a collation with the builtin provider.

Attached is a patch to use simplehash.h instead, which speeds things up enough to make them fairly close (from around 15% slower to around 8%). The patch is based on the series here: https://postgr.es/m/f1935bc481438c9d86c2e0ac537b1c110d41a00a.ca...@j-davis.com which does some refactoring in a related area, but I can make them independent.

We can also consider what to do about those special cases:

* add a special case for PG_C_UTF8?
* instead of a hardwired set of special collation IDs, have a single-element "last collation ID" to check before doing the hash lookup?
* remove the special cases entirely if we can close the performance gap enough that it's not important?

(Note: the special case in lc_ctype_is_c() is currently required for correctness because hba.c uses C_COLLATION_OID for regexes before the syscache is initialized. That can be fixed pretty easily a couple different ways, though.)

-- Jeff Davis PostgreSQL Contributor Team - AWS

From 777186a41955da8d05929f5c34e531dfa985b513 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Fri, 14 Jun 2024 15:38:42 -0700 Subject: [PATCH v2 7/7] Change collation cache to use simplehash.h.
--- src/backend/utils/adt/pg_locale.c | 39 +-- 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 435a37a0e3..b71ca2d780 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -57,12 +57,12 @@ #include "access/htup_details.h" #include "catalog/pg_collation.h" #include "catalog/pg_database.h" +#include "common/hashfn.h" #include "mb/pg_wchar.h" #include "miscadmin.h" #include "utils/builtins.h" #include "utils/formatting.h" #include "utils/guc_hooks.h" -#include "utils/hsearch.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/pg_locale.h" @@ -129,10 +129,27 @@ typedef struct { Oid collid; /* hash key: pg_collation OID */ pg_locale_t locale; /* locale_t struct, or 0 if not valid */ -} collation_cache_entry; -static HTAB *collation_cache = NULL; + /* needed for simplehash */ + uint32 hash; + char status; +} collation_cache_entry; +#define SH_PREFIX collation_cache +#define SH_ELEMENT_TYPE collation_cache_entry +#define SH_KEY_TYPE Oid +#define SH_KEY collid +#define SH_HASH_KEY(tb, key) hash_uint32((uint32) key) +#define SH_EQUAL(tb, a, b) (a == b) +#define SH_GET_HASH(tb, a) a->hash +#define SH_SCOPE static inline +#define SH_STORE_HASH +#define SH_DECLARE +#define SH_DEFINE +#include "lib/simplehash.h" + +static MemoryContext CollationCacheContext = NULL; +static collation_cache_hash *CollationCache = NULL; #if defined(WIN32) && defined(LC_MESSAGES) static char *IsoLocaleName(const char *); @@ -1235,18 +1252,16 @@ lookup_collation_cache(Oid collation) Assert(OidIsValid(collation)); Assert(collation != DEFAULT_COLLATION_OID); - if (collation_cache == NULL) + if (CollationCache == NULL) { - /* First time through, initialize the hash table */ - HASHCTL ctl; - - ctl.keysize = sizeof(Oid); - ctl.entrysize = sizeof(collation_cache_entry); - collation_cache = hash_create("Collation cache", 100, &ctl, - HASH_ELEM | HASH_BLOBS); + CollationCacheContext = AllocSetContextCreate(TopMemoryContext, + "collation cache", + ALLOCSET_DEFAULT_SIZES); + CollationCache = collation_cache_create( + CollationCacheContext, 128, NULL); } - cache_entry = hash_search(collation_cache, &collation, HASH_ENTER, &found); + cache_entry = collation_cache_insert(CollationCache, collation, &found); if (!found) { /* -- 2.34.1
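To reproduce the effect being measured (a sketch in the spirit of the blog post's scenario; the table and data are made up), note that the per-row comparison looks the collation up in the cache each time, which is exactly what the simplehash.h conversion makes cheaper:

-- As noted above, an explicit "C" collation under the libc provider does not
-- hit the hardwired C_COLLATION_OID fast path, so it exercises the cache the
-- same way the builtin provider does:
CREATE COLLATION libc_c (PROVIDER = libc, LOCALE = 'C');

CREATE TABLE a (t text COLLATE libc_c);
INSERT INTO a SELECT 'row ' || g FROM generate_series(1, 1000000) g;

-- The comparison below is evaluated once per row, and each evaluation does a
-- collation-cache lookup; with the patch that lookup goes through simplehash:
SELECT * FROM a WHERE t = 'row 123456';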
Re: jsonpath: Missing regex_like && starts with Errors?
On 06/14/24 12:21, David E. Wheeler wrote:
> I noticed that neither `regex_like` nor `starts with`, the jsonpath
> operators, raise an error when the operand is not a string (or array of
> strings):
>
> david=# select jsonb_path_query('true', '$ like_regex "^hi"');
>  jsonb_path_query
> ------------------
>  null
> (1 row)
>
> david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"');
>  jsonb_path_query
> ------------------
>  null
> (1 row)

To begin with, both of those path queries should have been rejected at the parsing stage, just like the one David G. Johnston pointed out:

On 06/13/24 22:14, David G. Johnston wrote:
> On Thursday, June 13, 2024, Chapman Flack wrote:
>> On 06/13/24 21:46, David G. Johnston wrote:
>>> david=# select jsonb_path_query('1', '$ >= 1');
>> Good point. I can't either. No way I can see to parse that as a <JSON path wff>.
> Whether we note it as non-standard or not is an open question then, but it does work and opens up a documentation question.

All of these are appearing where a <JSON path wff> is needed, and that's not allowed in the standard. Strictly speaking, the only place a <JSON path predicate> can appear is within a <JSON filter expression>.

So I should go look at our code to see what grammar we've implemented, exactly. It is beginning to seem as if we have simply added <JSON path predicate> as another choice for an expression, not restricted to only appearing in a filter. If so, and we add documentation about how we diverge from the standard, that's probably the way to say it.

On 06/13/24 22:14, David G. Johnston wrote:
> I don't get why the outcome of a boolean producing operation isn't just
> generally allowed to be produced

I understand; after all, what is a 'predicate' but another 'boolean producing operation'? But the committee (at least in this edition) has stuck us with this clear division in the grammar: there is no <JSON path predicate>, boolean as it may be, that can appear as a <JSON path wff>, and there is no <JSON path predicate> that can appear outside of a filter and be treated as a boolean-valued expression.

As for the error behavior of a <JSON path predicate> (which strictly, again, can only appear inside a <JSON filter expression>), the standard says what seems to be the same thing, in a couple of different ways. In 4.48.5, "Overview of SQL/JSON path language", this is said: "The SQL/JSON path language traps any errors that occur during the evaluation of a <JSON path predicate>. Depending on the precise <JSON path predicate> ... the result may be Unknown, True, or False, ...". Later, in 9.46's General Rules, where the specific semantics of the various predicates are laid out, each predicate has rules spelling out which of Unknown, True, or False results when an error condition is encountered (usually Unknown, except where something already seen allows returning True or False). Finally, the <JSON filter expression> itself collapses the three-valued logic to two; it includes the items for which the predicate returns True, and excludes them for False or Unknown. So that's where the errors went.

The question of what should happen to the errors when a <JSON path predicate> appears outside of a <JSON filter expression> of course isn't answered in the standard, because that's not supposed to be possible. So if we're allowing predicates to appear on their own as expressions, it's also up to us to say what should happen with errors when they do.

Regards,
-Chap
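A small demonstration of the trapping behavior described above, using the same functions as in the quoted examples (the queries here are mine; the expected results reflect current PostgreSQL behavior):

-- Inside a filter, an error in the predicate yields Unknown and the
-- filter simply excludes that item: only "hi" is returned here, with no
-- error reported for the number 1.
SELECT jsonb_path_query('[1, "hi"]', '$[*] ? (@ like_regex "^h")');

-- The same predicate at top level (outside any filter) is the PostgreSQL
-- extension under discussion; today it returns null rather than raising
-- an error:
SELECT jsonb_path_query('true', '$ like_regex "^hi"');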
Re: jsonpath: Missing regex_like && starts with Errors?
On 06/14/24 22:29, Chapman Flack wrote:
> So I should go look at our code to see what grammar we've implemented,
> exactly. It is beginning to seem as if we have simply added <JSON path predicate>
> as another choice for an expression, not restricted
> to only appearing in a filter. If so, and we add documentation about how
> we diverge from the standard, that's probably the way to say it.

That's roughly what we've done:

119 result:
120 	mode expr_or_predicate	{
121 		...
125 	}
126 	| /* EMPTY */			{ *result = NULL; }
127 	;
128
129 expr_or_predicate:
130 	expr			{ $$ = $1; }
131 	| predicate		{ $$ = $1; }
132 	;

Oddly, that's only at the top-level goal production: we allow your entire JSON path query to be a predicate in lieu of an expr, but we still don't allow a predicate to appear in place of an expr within any other production.

Regards,
-Chap
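In other words (these examples are mine, not from the message), the divergence is confined to that top-level production:

-- A predicate as the entire path query is accepted (the PostgreSQL
-- "predicate check" extension):
SELECT jsonb_path_query('1', '$ >= 1');              -- returns true

-- A predicate inside a filter is the standard-conforming placement:
SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (@ >= 2)');

-- But, reading the grammar excerpt above, a predicate cannot serve as an
-- operand of a larger expression; something like '($ >= 1) + 1' should be
-- rejected by the parser (my inference from the excerpt, not a tested claim).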