Re: Logical Replication of sequences
On Thu, Jun 13, 2024 at 6:14 PM Masahiko Sawada wrote: > > On Thu, Jun 13, 2024 at 7:06 PM Amit Kapila wrote: > > > > On Thu, Jun 13, 2024 at 1:09 PM Masahiko Sawada > > wrote: > > > > > > On Wed, Jun 12, 2024 at 6:59 PM Amit Kapila > > > wrote: > > > > > > > > > > > > Yeah, starting with a single worker sounds good for now. Do you think > > > > we should sync all the sequences in a single transaction or have some > > > > threshold value above which a different transaction would be required > > > > or maybe a different sequence sync worker altogether? Now, having > > > > multiple sequence-sync workers requires some synchronization so that > > > > only a single worker is allocated for one sequence. > > > > > > > > The simplest thing is to use a single sequence sync worker that syncs > > > > all sequences in one transaction but with a large number of sequences, > > > > it could be inefficient. OTOH, I am not sure if it would be a problem > > > > in reality. > > > > > > I think that we can start with using a single worker and one > > > transaction, and measure the performance with a large number of > > > sequences. > > > > > > > Fair enough. However, this raises the question Dilip and Vignesh are > > discussing whether we need a new relfilenode for sequence update even > > during initial sync? As per my understanding, the idea is that similar > > to tables, the CREATE SUBSCRIPTION command (with copy_data = true) > > will create the new sequence entries in pg_subscription_rel with the > > state as 'i'. Then the sequence-sync worker would start a transaction > > and one-by-one copy the latest sequence values for each sequence (that > > has state as 'i' in pg_subscription_rel) and mark its state as ready > > 'r' and commit the transaction. Now if there is an error during this > > operation it will restart the entire operation. The idea of creating a > > new relfilenode is to handle the error so that if there is a rollback, > > the sequence state will be rolled back to 'i' and the sequence value > > will also be rolled back. The other option could be that we update the > > sequence value without a new relfilenode and if the transaction rolled > > back then only the sequence's state will be rolled back to 'i'. This > > would work with a minor inconsistency that sequence values will be > > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > > I am not sure if that matters because anyway, they can quickly be > > out-of-sync with the publisher again. > > I think it would be fine in many cases even if the sequence value is > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > But the case we would like to avoid is where suppose the sequence-sync > worker does both synchronizing sequence values and updating the > sequence states for all sequences in one transaction, and if there is > an error we end up retrying the synchronization for all sequences. > The one idea to avoid this is to update sequences in chunks (say 100 or some threshold number of sequences in one transaction). Then we would only redo the sync for the last and pending set of sequences. > > > > Now, say we don't want to maintain the state of sequences for initial > > sync at all then after the error how will we detect if there are any > > pending sequences to be synced? One possibility is that we maintain a > > subscription level flag 'subsequencesync' in 'pg_subscription' to > > indicate whether sequences need sync. 
This flag would indicate whether > > to sync all the sequences in pg_susbcription_rel. This would mean that > > if there is an error while syncing the sequences we will resync all > > the sequences again. This could be acceptable considering the chances > > of error during sequence sync are low. The benefit is that both the > > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same > > idea and sync all sequences without needing a new relfilenode. Users > > can always refer 'subsequencesync' flag in 'pg_subscription' to see if > > all the sequences are synced after executing the command. > > I think that REFRESH PUBLICATION {SEQUENCES} can be executed even > while the sequence-sync worker is synchronizing sequences. In this > case, the worker might not see new sequences added by the concurrent > REFRESH PUBLICATION {SEQUENCES} command since it's already running. > The worker could end up marking the subsequencesync as completed while > not synchronizing these new sequences. > This is possible but we could avoid REFRESH PUBLICATION {SEQUENCES} by not allowing to change the subsequencestate during the time sequence-worker is syncing the sequences. This could be restrictive but there doesn't seem to be cases where user would like to immediately refresh sequences after creating the subscription. > > > > > > > Or yet another idea I came up with is that a tablesync worker will > > > > > synchronize both the table and sequences owned by the table. That is, > > > > > after the table
Re: Allow non-superuser to cancel superuser tasks.
> On 13 Jun 2024, at 02:04, Nathan Bossart wrote: > > I adjusted 0001 based on my upthread feedback. This patch looks good to me. The spellchecker is complaining about “signaling” instead of “signalling”, but ISTM it’s OK. I’ve tried to dig into the test. The problem is that the CV is allocated in inj_state = GetNamedDSMSegment("injection_points", which seems to be destroyed in shmem_exit() calling dsm_backend_shutdown(). This happens before we broadcast that the sleep is over. I think this might happen with any wait on an injection point if it is pg_terminate_backend()ed. Is there a way to wake up from the CV sleep before processing the actual termination? Thanks! Best regards, Andrey Borodin.
Re: Improve the granularity of PQsocketPoll's timeout parameter?
On Thu, Jun 13, 2024 at 9:18 PM Tom Lane wrote: > Ranier Vilela writes: > > +1 for push. > > Done. [...] Thanks a lot Tom (and reviewers)! --DD
may be a buffer overflow problem
Hi hackers, I am using gcc version 11.3.0 to compile the postgres source code. Gcc complains about the following line: ```c strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); ``` with this error: misc.c:529:17: error: ‘strncpy’ output truncated before terminating nul copying 5 bytes from a string of the same length [-Werror=stringop-truncation] I looked at the definition of `sqlca->sqlstate` and it has only 5 bytes. When the statement ```c strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); ``` gets executed, `sqlca->sqlstate` will have no '\0' byte, which makes me anxious when someone prints it as a string. Indeed, I found code (in src/interfaces/ecpg/ecpglib/misc.c) that does exactly that: ```c fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", sqlca->sqlcode, sqlca->sqlstate); ``` Is there any chance to fix the code?
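To see why the warning matters, here is a minimal self-contained example of the situation being reported: the array is exactly five bytes, strncpy() fills all of them, and a later %s print has no terminator to stop at. The struct is a stand-in reduced to the one field under discussion.

```c
#include <stdio.h>
#include <string.h>

/* stand-in for the ECPG sqlca, reduced to the one field under discussion */
struct sqlca_t
{
	char		sqlstate[5];	/* fixed-length SQLSTATE, not NUL-terminated */
};

int
main(void)
{
	struct sqlca_t sqlca;

	/* copies exactly 5 bytes; no room is left for a trailing '\0' */
	strncpy(sqlca.sqlstate, "YE001", sizeof(sqlca.sqlstate));

	/* printing with %s reads past the array: undefined behavior */
	printf("state: %s\n", sqlca.sqlstate);

	return 0;
}
```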
Don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid.
Hello! The src/backend/access/heap/README.tuplock says about HEAP_XMAX_INVALID bit that "Any tuple with this bit set does not have a valid value stored in XMAX." Found that FreezeMultiXactId() tries to process such an invalid multi xmax and may looks for an update xid in the pg_multixact for it. Maybe not do this work in FreezeMultiXactId() and exit immediately if the bit HEAP_XMAX_INVALID was already set? For instance, like that: master @@ -6215,6 +6215,15 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, /* We should only be called in Multis */ Assert(t_infomask & HEAP_XMAX_IS_MULTI); + /* Xmax is already marked as invalid */ + if (MultiXactIdIsValid(multi) && + (t_infomask & HEAP_XMAX_INVALID)) + { + *flags |= FRM_INVALIDATE_XMAX; + pagefrz->freeze_required = true; + return InvalidTransactionId; + } + if (!MultiXactIdIsValid(multi) || HEAP_LOCKED_UPGRADED(t_infomask)) With the best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Avoid orphaned objects dependencies, take 3
Hi, On Thu, Jun 13, 2024 at 02:27:45PM -0400, Robert Haas wrote: > On Thu, Jun 13, 2024 at 12:52 PM Bertrand Drouvot > wrote: > > > > table_open(childRelId, ...) would lock any "ALTER TABLE > > > > DROP CONSTRAINT" > > > > already. Not sure I understand your concern here. > > > > > > I believe this is not true. This would take a lock on the table, not > > > the constraint itself. > > > > I agree that it would not lock the constraint itself. What I meant to say > > is that > > , nevertheless, the constraint can not be dropped. Indeed, the "ALTER TABLE" > > necessary to drop the constraint (ALTER TABLE DROP CONSTRAINT) > > would > > be locked by the table_open(childRelId, ...). > > Ah, right. So, I was assuming that, with either this version of your > patch or the earlier version, we'd end up locking the constraint > itself. Was I wrong about that? The child contraint itself is not locked when going through ConstraintSetParentConstraint(). While at it, let's look at a full example and focus on your concern. Let's do that with this gdb file: " $ cat gdb.txt b dependency.c:1542 command 1 printf "Will return for: classId %d and objectId %d\n", object->classId, object->objectId c end b dependency.c:1547 if object->classId == 2606 command 2 printf "Will lock constraint: classId %d and objectId %d\n", object->classId, object->objectId c end " knowing that: " Line 1542 is the return here in depLockAndCheckObject() (your concern): if (object->classId == RelationRelationId || object->classId == AuthMemRelationId) return; Line 1547 is the lock here in depLockAndCheckObject(): /* assume we should lock the whole object not a sub-object */ LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock); " So, with gdb attached to a session let's: 1. Create the parent relation CREATE TABLE upsert_test ( a INT, b TEXT ) PARTITION BY LIST (a); gdb produces: --- Will return for: classId 1259 and objectId 16384 Will return for: classId 1259 and objectId 16384 --- Oid 16384 is upsert_test, so I think the return (dependency.c:1542) is fine as we are creating the object (it can't be dropped as not visible to anyone else). 2. Create another relation (will be the child) CREATE TABLE upsert_test_2 (b TEXT, a int); gdb produces: --- Will return for: classId 1259 and objectId 16391 Will return for: classId 1259 and objectId 16394 Will return for: classId 1259 and objectId 16394 Will return for: classId 1259 and objectId 16391 --- Oid 16391 is upsert_test_2 Oid 16394 is pg_toast_16391 so I think the return (dependency.c:1542) is fine as we are creating those objects (can't be dropped as not visible to anyone else). 3. Attach the partition ALTER TABLE upsert_test ATTACH PARTITION upsert_test_2 FOR VALUES IN (2); gdb produces: --- Will return for: classId 1259 and objectId 16384 --- That's fine because we'd already had locked the relation 16384 through AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid(). 4. Add a constraint on the child relation ALTER TABLE upsert_test_2 add constraint bdtc2 UNIQUE (a); gdb produces: --- Will return for: classId 1259 and objectId 16391 Will lock constraint: classId 2606 and objectId 16397 --- That's fine because we'd already had locked the relation 16391 through AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid(). Oid 16397 is the constraint we're creating (bdtc2). 5. 
Add a constraint on the parent relation (this goes through ConstraintSetParentConstraint()) ALTER TABLE upsert_test add constraint bdtc1 UNIQUE (a); gdb produces: --- Will return for: classId 1259 and objectId 16384 Will lock constraint: classId 2606 and objectId 16399 Will return for: classId 1259 and objectId 16398 Will return for: classId 1259 and objectId 16391 Will lock constraint: classId 2606 and objectId 16399 Will return for: classId 1259 and objectId 16391 --- Regarding "Will return for: classId 1259 and objectId 16384": That's fine because we'd already had locked the relation 16384 through AlterTableLookupRelation()->RangeVarGetRelidExtended()->LockRelationOid(). Regarding "Will lock constraint: classId 2606 and objectId 16399": Oid 16399 is the constraint that we're creating. Regarding "Will return for: classId 1259 and objectId 16398": That's fine because Oid 16398 is an index that we're creating while creating the constraint (so the index can't be dropped as not visible to anyone else). Regarding "Will return for: classId 1259 and objectId 16391": That's fine because 16384 we'd be already locked (as mentioned above). And I think that's fine because trying to drop "upsert_test_2" (aka 16391) would produce RemoveRelations()->RangeVarGetRelidExtended()->RangeVarCallbackForDropRelation() ->LockRelationOid(relid=16384, lockmode=8) and so would be locked. Regarding this example, I don't think that the return in depLockAndCheckObject(): " if (object->classId == R
Re: may be a buffer overflow problem
> On 14 Jun 2024, at 09:38, Winter Loo wrote: > I find the definition of `sqlca->sqlstate` and it has only 5 bytes. When the > statement > > ```c > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > ``` > > get executed, `sqlca->sqlstate` will have no '\0' byte which makes me anxious > when someone prints that as a string. sqlstate is defined as a fixed-length field that is not NUL-terminated, leaving the callers to handle termination. > Indeed, I found the code(in src/interfaces/ecpg/ecpglib/misc.c) does that, > > fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", > sqlca->sqlcode, sqlca->sqlstate); This is indeed buggy and needs to take the length into account, as per the attached. This only happens in the undocumented regression test debug mode, which may be why it's gone unnoticed. -- Daniel Gustafsson ecgp_sqlstate.diff Description: Binary data
Re: may be a buffer overflow problem
On Fri, 2024-06-14 at 15:38 +0800, Winter Loo wrote: > I am using gcc version 11.3.0 to compile postgres source code. Gcc complains > about the following line: > > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > > with error as: > > misc.c:529:17: error: ‘strncpy’ output truncated before terminating nul > copying 5 bytes from a string of the same length [-Werror=stringop-truncation] > > I find the definition of `sqlca->sqlstate` and it has only 5 bytes. When the > statement > > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > > get executed, `sqlca->sqlstate` will have no '\0' byte which makes me anxious > when someone prints that as a string. Indeed, I found the code(in > src/interfaces/ecpg/ecpglib/misc.c) does that, > > fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", > sqlca->sqlcode, sqlca->sqlstate); > > Is there any chance to fix the code? I agree that that is wrong. We could either use memcpy() to avoid the warning and use a format string with %.*s in fprintf(), or we could make the "sqlstate" one byte longer. I think that the second option would be less error-prone. Yours, Laurenz Albe
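A hedged sketch of the first option described above, not the diff that was actually attached upthread: memcpy() copies the five bytes without tripping -Wstringop-truncation, and a bounded width keeps the debug print from running past the unterminated field.

```c
/* copy the fixed-length SQLSTATE; no terminator is expected or written */
memcpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate));

/* bound the width so fprintf stops at the end of the field */
fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %.*s\n",
		sqlca->sqlcode, (int) sizeof(sqlca->sqlstate), sqlca->sqlstate);
```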
Re: may be a buffer overflow problem
On Fri, 2024-06-14 at 09:55 +0200, Daniel Gustafsson wrote: > > On 14 Jun 2024, at 09:38, Winter Loo wrote: > > > I find the definition of `sqlca->sqlstate` and it has only 5 bytes. When > > the statement > > > > ```c > > strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate)); > > ``` > > > > get executed, `sqlca->sqlstate` will have no '\0' byte which makes me > > anxious when someone prints that as a string. > > sqlstate is defined as not being unterminated fixed-length, leaving the > callers > to handle termination. > > > Indeed, I found the code(in src/interfaces/ecpg/ecpglib/misc.c) does that, > > > > fprintf(debugstream, "[NO_PID]: sqlca: code: %ld, state: %s\n", > > sqlca->sqlcode, sqlca->sqlstate); > > This is indeed buggy and need to take the length into account, as per the > attached. This only happens when in the undocumented regression test debug > mode which may be why it's gone unnoticed. So you think we should ignore that compiler warning? What about using memcpy() instead of strncpy()? Yours, Laurenz Albe
Re: may be a buffer overflow problem
> On 14 Jun 2024, at 10:06, Laurenz Albe wrote: > So you think we should ignore that compiler warning? We already do using this in meson.build: # Similarly disable useless truncation warnings from gcc 8+ 'format-truncation', 'stringop-truncation', -- Daniel Gustafsson
Re: may be a buffer overflow problem
On Fri, 2024-06-14 at 10:10 +0200, Daniel Gustafsson wrote: > > On 14 Jun 2024, at 10:06, Laurenz Albe wrote: > > > So you think we should ignore that compiler warning? > > We already do using this in meson.build: > > # Similarly disable useless truncation warnings from gcc 8+ > 'format-truncation', > 'stringop-truncation', Right; and I see that -Wno-stringop-truncation is also set if you build with "make". So your patch is good. I wonder how Winter Loo got to see that warning... Yours, Laurenz Albe
Re: may be a buffer overflow problem
> On 14 Jun 2024, at 10:29, Laurenz Albe wrote: > > On Fri, 2024-06-14 at 10:10 +0200, Daniel Gustafsson wrote: >>> On 14 Jun 2024, at 10:06, Laurenz Albe wrote: >> >>> So you think we should ignore that compiler warning? >> >> We already do using this in meson.build: >> >> # Similarly disable useless truncation warnings from gcc 8+ >> 'format-truncation', >> 'stringop-truncation', > > Right; and I see that -Wno-stringop-truncation is also set if you build > with "make". So your patch is good. Thanks for looking! I will apply it backpatched all the way down as this has been wrong since 2006. > I wonder how Winter Loo got to see that warning... And it would be interesting to know if that was the only warning, since error.c in ECPG performs the exact same string copy. -- Daniel Gustafsson
Re: libpq contention due to gss even when not using gss
> On Thu, Jun 13, 2024 at 10:30:24AM GMT, Andres Freund wrote: > > > To investigate a report of both postgres and pgbouncer having issues when > > > a > > > lot of new connections aree established, I used pgbench -C. Oddly, on an > > > early attempt, the bottleneck wasn't postgres+pgbouncer, it was pgbench. > > > But > > > only when using TCP, not with unix sockets. > > > > > > c=40;pgbench -C -n -c$c -j$c -T5 -f <(echo 'select 1') 'port=6432 > > > host=127.0.0.1 user=test dbname=postgres password=fake' > > > > > > host=127.0.0.1: 16465 > > > host=127.0.0.1,gssencmode=disable 20860 > > > host=/tmp:49286 > > > > > > Note that the server does *not* support gss, yet gss has a substantial > > > performance impact. > > > > > > Obviously the connection rates here absurdly high and outside of badly > > > written > > > applications likely never practically relevant. However, the number of > > > cores > > > in systems are going up, and this quite possibly will become relevant in > > > more > > > realistic scenarios (lock contention kicks in earlier the more cores you > > > have). > > > > By not supporting gss I assume you mean having built with --with-gssapi, > > but only host (not hostgssenc) records in pg_hba, right? > > Yes, the latter. Or not having kerberos set up on the client side. I've been experimenting with both: * The server is built without gssapi, but the client does support it. This produces exactly the contention you're talking about. * The server is built with gssapi, but do not use it in pg_hba, the client does support gssapi. In this case the difference between gssencmode=disable/prefer is even more dramatic in my test case (milliseconds vs seconds) due to the environment with configured kerberos (for other purposes, thus gss_init_sec_context spends huge amount of time to still return nothing). At the same time after quick look I don't see an easy way to avoid that. Current implementation tries to initialize gss before getting any confirmation from the server whether it's supported. Doing this other way around would probably just shift overhead to the server side.
Re: Optimize numeric.c mul_var() using the Karatsuba algorithm
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation:not tested This applies cleanly to master, builds and passes regression tests in my Windows/Cygwin environment. I also read through comments and confirmed for myself that the assumption about the caller ensuring var1 is shorter is done already in unchanged code from mul_var. Frees correspond to inits. The "Karatsuba condition" reasoning for deciding whether a number is big enough to use this algorithm appears to match what Joel has stated in this thread. The arithmetic appears to match what's been described in the comments. I have *not* confirmed that with any detailed review of the Karatsuba algorithm from outside sources, other implementations like the Rust one referenced here, or anything similar. I'm hoping that the regression tests give sufficient coverage that if the arithmetic was incorrect there would be obvious failures. If additional coverage was needed, cases falling immediately on either side of the limiting conditions used in the patch would probably be useful. From the limited precedent I've exposed myself to, that doesn't seem to be required here, but I'm open to contrary input from other reviewers. In the meantime, I'm marking this approved. Thanks for the detailed background and comments, Joel! The new status of this patch is: Ready for Committer
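For reference, the textbook Karatsuba identity that such a mul_var() split relies on, stated generically for a split at digit position m in base B (the patch's actual split point and thresholds are discussed earlier in the thread and not restated here):

$$x = x_1 B^m + x_0, \qquad y = y_1 B^m + y_0$$

$$xy = x_1 y_1\,B^{2m} + \bigl[(x_1 + x_0)(y_1 + y_0) - x_1 y_1 - x_0 y_0\bigr] B^m + x_0 y_0$$

Only three half-length multiplications are needed instead of four, giving the $T(n) = 3T(n/2) + O(n)$ recurrence, i.e. $O(n^{\log_2 3}) \approx O(n^{1.58})$; the extra additions and bookkeeping are why it only pays off above a size threshold, which is what the "Karatsuba condition" mentioned above guards.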
Re: Logical Replication of sequences
On Fri, Jun 14, 2024 at 5:16 AM Michael Paquier wrote: > > On Thu, Jun 13, 2024 at 03:36:05PM +0530, Amit Kapila wrote: > > Fair enough. However, this raises the question Dilip and Vignesh are > > discussing whether we need a new relfilenode for sequence update even > > during initial sync? As per my understanding, the idea is that similar > > to tables, the CREATE SUBSCRIPTION command (with copy_data = true) > > will create the new sequence entries in pg_subscription_rel with the > > state as 'i'. Then the sequence-sync worker would start a transaction > > and one-by-one copy the latest sequence values for each sequence (that > > has state as 'i' in pg_subscription_rel) and mark its state as ready > > 'r' and commit the transaction. Now if there is an error during this > > operation it will restart the entire operation. > > Hmm. You mean to use only one transaction for all the sequences? > I've heard about deployments with a lot of them. Could it be a > problem to process them in batches, as well? I don't think so. We can even sync one sequence per transaction but then it would be resource and time consuming without much gain. As mentioned in a previous email, we might want to sync 100 or some other threshold number of sequences per transaction. The other possibility is to make a subscription-level option for this batch size but I don't see much advantage in doing so as it won't be convenient for users to set it. I feel we should pick some threshold number that is neither too low nor too high and if we later see any problem with it, we can make it a configurable knob. > > > The idea of creating a > > new relfilenode is to handle the error so that if there is a rollback, > > the sequence state will be rolled back to 'i' and the sequence value > > will also be rolled back. The other option could be that we update the > > sequence value without a new relfilenode and if the transaction rolled > > back then only the sequence's state will be rolled back to 'i'. This > > would work with a minor inconsistency that sequence values will be > > up-to-date even when the sequence state is 'i' in pg_subscription_rel. > > I am not sure if that matters because anyway, they can quickly be > > out-of-sync with the publisher again. > > Seeing a mention to relfilenodes specifically for sequences freaks me > out a bit, because there's some work I have been doing in this area > and sequences may not have a need for a physical relfilenode at all. > But I guess that you refer to the fact that like tables, relfilenodes > would only be created as required because anything you'd do in the > apply worker path would just call some of the routines of sequence.h, > right? > Yes, I think so. The only thing the patch expects is a way to rollback the sequence changes if the transaction rolls back during the initial sync. But I am not sure if we need such a behavior. The discussion for the same is in progress. Let's wait for the outcome. > > Now, say we don't want to maintain the state of sequences for initial > > sync at all then after the error how will we detect if there are any > > pending sequences to be synced? One possibility is that we maintain a > > subscription level flag 'subsequencesync' in 'pg_subscription' to > > indicate whether sequences need sync. This flag would indicate whether > > to sync all the sequences in pg_susbcription_rel. This would mean that > > if there is an error while syncing the sequences we will resync all > > the sequences again. 
This could be acceptable considering the chances > > of error during sequence sync are low. > > There could be multiple subscriptions to a single database that point > to the same set of sequences. Is there any conflict issue to worry > about here? > I don't think so. In the worst case, the same value would be copied twice. The same scenario in case of tables could lead to duplicate data or unique key violation ERRORs which is much worse. So, I expect users to be careful about the same. > > The benefit is that both the > > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same > > idea and sync all sequences without needing a new relfilenode. Users > > can always refer 'subsequencesync' flag in 'pg_subscription' to see if > > all the sequences are synced after executing the command. > > That would be cheaper, indeed. Isn't a boolean too limiting? > In this idea, we only need a flag to say whether the sequence sync is required or not. > Isn't that something you'd want to track with a LSN as "the point in > WAL where all the sequences have been synced"? > It won't be any better for the required purpose because after CREATE SUBSCRIPTION, if REFERESH wants to toggle the flag to indicate the sequences need sync again then using LSN would mean we need to set it to Invalid value. > The approach of doing all the sync work from the subscriber, while > having a command that can be kicked from the subscriber side is a good > user experience. > Thank you for
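To make the batching idea above concrete, here is a rough sketch of a sequence-sync worker loop that commits every N sequences, so an error only forces a redo of the current chunk. SEQ_SYNC_BATCH_SIZE, fetch_remote_sequence_value() and apply_remote_sequence_value() are hypothetical names used purely for illustration; the catalog update and locking details are heavily simplified.

```c
#define SEQ_SYNC_BATCH_SIZE 100		/* hypothetical threshold, per the discussion above */

static void
sync_pending_sequences(List *pending_seqs)	/* sequences still in state 'i' */
{
	int			done_in_xact = 0;
	ListCell   *lc;

	StartTransactionCommand();
	foreach(lc, pending_seqs)
	{
		Oid			seqoid = lfirst_oid(lc);

		/* hypothetical helpers: pull the remote value and apply it locally */
		apply_remote_sequence_value(seqoid, fetch_remote_sequence_value(seqoid));

		/* flip the pg_subscription_rel state from 'i' to ready */
		UpdateSubscriptionRelState(MySubscription->oid, seqoid,
								   SUBREL_STATE_READY, InvalidXLogRecPtr);

		/* commit every batch so a failure only redoes the current chunk */
		if (++done_in_xact >= SEQ_SYNC_BATCH_SIZE)
		{
			CommitTransactionCommand();
			StartTransactionCommand();
			done_in_xact = 0;
		}
	}
	CommitTransactionCommand();
}
```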
Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello, Michael. > Isn't the issue that we may select as arbiter indexes stuff that's !indisvalid? As far as I can see (1) !indisvalid indexes are filtered out. But... It looks like this choice is not locked in any way (2), so index_concurrently_swap or index_concurrently_set_dead can change this index after the decision is made, even despite WaitForLockersMultiple (3). In some cases, it may cause a data loss... But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption. [1]: https://github.com/postgres/postgres/blob/915de706d28c433283e9dc63701e8f978488a2b9/src/backend/optimizer/util/plancat.c#L804 [2]: https://github.com/postgres/postgres/blob/915de706d28c433283e9dc63701e8f978488a2b9/src/backend/optimizer/util/plancat.c#L924-L928 [3]: https://github.com/postgres/postgres/blob/8aee330af55d8a759b2b73f5a771d9d34a7b887f/src/backend/commands/indexcmds.c#L4153
Re: libpq contention due to gss even when not using gss
> On 14 Jun 2024, at 10:46, Dmitry Dolgov <9erthali...@gmail.com> wrote: > >> On Thu, Jun 13, 2024 at 10:30:24AM GMT, Andres Freund wrote: To investigate a report of both postgres and pgbouncer having issues when a lot of new connections aree established, I used pgbench -C. Oddly, on an early attempt, the bottleneck wasn't postgres+pgbouncer, it was pgbench. But only when using TCP, not with unix sockets. c=40;pgbench -C -n -c$c -j$c -T5 -f <(echo 'select 1') 'port=6432 host=127.0.0.1 user=test dbname=postgres password=fake' host=127.0.0.1: 16465 host=127.0.0.1,gssencmode=disable 20860 host=/tmp:49286 Note that the server does *not* support gss, yet gss has a substantial performance impact. Obviously the connection rates here absurdly high and outside of badly written applications likely never practically relevant. However, the number of cores in systems are going up, and this quite possibly will become relevant in more realistic scenarios (lock contention kicks in earlier the more cores you have). >>> >>> By not supporting gss I assume you mean having built with --with-gssapi, >>> but only host (not hostgssenc) records in pg_hba, right? >> >> Yes, the latter. Or not having kerberos set up on the client side. > > I've been experimenting with both: > > * The server is built without gssapi, but the client does support it. > This produces exactly the contention you're talking about. > > * The server is built with gssapi, but do not use it in pg_hba, the > client does support gssapi. In this case the difference between > gssencmode=disable/prefer is even more dramatic in my test case > (milliseconds vs seconds) due to the environment with configured > kerberos (for other purposes, thus gss_init_sec_context spends huge > amount of time to still return nothing). > > At the same time after quick look I don't see an easy way to avoid that. > Current implementation tries to initialize gss before getting any > confirmation from the server whether it's supported. Doing this other > way around would probably just shift overhead to the server side. The main problem seems to be that we check whether or not there is a credential cache when we try to select encryption but not yet authentication, as a way to figure out if gssenc it as all worth trying? I experimented with deferring it with potentially cheaper heuristics in encryption selection, but it seems hard to get around since other methods were even more expensive. -- Daniel Gustafsson
Re: Pgoutput not capturing the generated columns
> Thanks for the updated patch, few comments: > 1) The option name seems wrong here: > In one place include_generated_column is specified and other place > include_generated_columns is specified: > > + else if (IsSet(supported_opts, > SUBOPT_INCLUDE_GENERATED_COLUMN) && > +strcmp(defel->defname, > "include_generated_column") == 0) > + { > + if (IsSet(opts->specified_opts, > SUBOPT_INCLUDE_GENERATED_COLUMN)) > + errorConflictingDefElem(defel, pstate); > + > + opts->specified_opts |= > SUBOPT_INCLUDE_GENERATED_COLUMN; > + opts->include_generated_column = defGetBoolean(defel); > + } Fixed. > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c > index d453e224d9..e8ff752fd9 100644 > --- a/src/bin/psql/tab-complete.c > +++ b/src/bin/psql/tab-complete.c > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end) > COMPLETE_WITH("binary", "connect", "copy_data", "create_slot", > "disable_on_error", > "enabled", "failover", "origin", > "password_required", > "run_as_owner", "slot_name", > - "streaming", > "synchronous_commit", "two_phase"); > + "streaming", > "synchronous_commit", "two_phase","include_generated_columns"); > > 2) This small data table need not have a primary key column as it will > create an index and insertion will happen in the index too. > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); Fixed. > 3) Please add a test case for this: > + set to false. If the subscriber-side > column is also a > + generated column then this option has no effect; the > replicated data will > + be ignored and the subscriber column will be filled as > normal with the > + subscriber-side computed or default data. Added the required test case. > 4) You can use a new style of ereport to remove the brackets around errcode > 4.a) > + else if (!parse_bool(strVal(elem->arg), > &data->include_generated_columns)) > + ereport(ERROR, > + > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > +errmsg("could not > parse value \"%s\" for parameter \"%s\"", > + > strVal(elem->arg), elem->defname))); > > 4.b) similarly here too: > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + /*- translator: both %s are strings of the form > "option = value" */ > + errmsg("%s and %s are mutually > exclusive options", > + "copy_data = true", > "include_generated_column = true"))); > > 4.c) similarly here too: > + if (include_generated_columns_option_given) > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > +errmsg("conflicting > or redundant options"))); Fixed. > 5) These variable names can be changed to keep it smaller, something > like gencol or generatedcol or gencolumn, etc > +++ b/src/include/catalog/pg_subscription.h > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > * slots) in the upstream database are enabled > * to be synchronized to the standbys. 
*/ > > + bool subincludegeneratedcolumn; /* True if generated columns must be > published */ > + > #ifdef CATALOG_VARLEN /* variable-length fields start here */ > /* Connection string to the publisher */ > text subconninfo BKI_FORCE_NOT_NULL; > @@ -157,6 +159,7 @@ typedef struct Subscription > List*publications; /* List of publication names to subscribe to */ > char*origin; /* Only publish data originating from the > * specified origin */ > + bool includegeneratedcolumn; /* publish generated column data */ > } Subscription; Fixed. The attached Patch contains the suggested changes. Thanks and Regards, Shubham Khanna. v6-0001-Enable-support-for-include_generated_columns-opti.patch Description: Binary data
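For readers following the thread, a hedged sketch of the column filter that an include_generated_columns option implies in the output plugin's attribute loop (attisdropped and attgenerated are existing pg_attribute fields; the variable include_generated_columns stands in for wherever the patch keeps the option value):

```c
for (int i = 0; i < desc->natts; i++)
{
	Form_pg_attribute att = TupleDescAttr(desc, i);

	/* dropped columns are never replicated */
	if (att->attisdropped)
		continue;

	/* skip generated columns unless the option asks to publish them */
	if (att->attgenerated && !include_generated_columns)
		continue;

	/* ... emit this column's name and value as usual ... */
}
```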
Re: Conflict Detection and Resolution
On Thu, Jun 13, 2024 at 7:00 PM Alvaro Herrera wrote: > > On 2024-Jun-07, Tomas Vondra wrote: > > > On 6/3/24 09:30, Amit Kapila wrote: > > > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra > > > wrote: > > > >> How is this going to deal with the fact that commit LSN and timestamps > > >> may not correlate perfectly? That is, commits may happen with LSN1 < > > >> LSN2 but with T1 > T2. > > > But as I wrote, I'm not quite convinced this means there are not other > > issues with this way of resolving conflicts. It's more likely a more > > complex scenario is required. > > Jan Wieck approached me during pgconf.dev to reproach me of this > problem. He also said he had some code to fix-up the commit TS > afterwards somehow, to make the sequence monotonically increasing. > Perhaps we should consider that, to avoid any problems caused by the > difference between LSN order and TS order. It might be quite > nightmarish to try to make the system work correctly without > reasonable constraints of that nature. > I agree with this, but the problem Jan was worried about was not directly reproducible with what PostgreSQL provides, at least that is what I understood then. We are also unable to think of a concrete scenario where this is a problem, but we are planning to spend more time deriving a test to reproduce the problem. -- With Regards, Amit Kapila.
Re: Pgoutput not capturing the generated columns
On Tue, Jun 4, 2024 at 8:12 AM Peter Smith wrote: > > Hi, > > Here are some review comments for patch v5-0001. > > == > GENERAL G.1 > > The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g. > maybe before they were always ignored, but now they are not? > > OTOH, when 'include_generated_columns' is false then the PUBLICATION > col-list will ignore any generated cols even when they are present in > a PUBLICATION col-list, right? > > These kinds of points should be noted in the commit message and in the > (col-list?) documentation. Fixed. > == > Commit message > > General 1a. > IMO the commit message needs some background to say something like: > "Currently generated column values are not replicated because it is > assumed that the corresponding subscriber-side table will generate its > own values for those columns." > > ~ > > General 1b. > Somewhere in this commit message, you need to give all the other > special rules --- e.g. the docs says "If the subscriber-side column is > also a generated column then this option has no effect" > > ~~~ Fixed. > 2. > This commit enables support for the 'include_generated_columns' option > in logical replication, allowing the transmission of generated column > information and data alongside regular table changes. This option is > particularly useful for scenarios where applications require access to > generated column values for downstream processing or synchronization. > > ~ > > I don't think the sentence "This option is particularly useful..." is > helpful. It seems like just saying "This commit supports option XXX. > This is particularly useful if you want XXX". > Fixed. > > 3. > CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= > 'publication pub1; > > ~ > > What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of > the new parameter being used in it? > Added the description for this in the Patch. > > 4. > Currently copy_data option with include_generated_columns option is > not supported. A future patch will remove this limitation. > > ~ > > Suggest to single-quote those parameter names for better readability. > Fixed. > > 5. > This commit aims to enhance the flexibility and utility of logical > replication by allowing users to include generated column information > in replication streams, paving the way for more robust data > synchronization and processing workflows. > > ~ > > IMO this paragraph can be omitted. Fixed. > == > .../test_decoding/sql/decoding_into_rel.sql > > 6. > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > +INSERT INTO gencoltable (a) VALUES (4), (5), (6); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > +DROP TABLE gencoltable; > + > > 6a. > I felt some additional explicit comments might help the readabilty of > the output file. > > e.g.1 > -- When 'include-generated=columns' = '1' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes... > > e.g.2 > -- When 'include-generated=columns' = '0' the generated column 'b' > values will not be replicated > SELECT data FROM pg_logical_slot_get_changes... 
Added the required description for this. > 6b. > Suggest adding one more test case (where 'include-generated=columns' > is not set) to confirm/demonstrate the default behaviour for > replicated generated cols. Added the required Test case. > == > doc/src/sgml/protocol.sgml > > 7. > + > + class="parameter">include-generated-columns > + > + > +Boolean option to enable generated columns. > +The include-generated-columns option controls whether generated > +columns should be included in the string representation of tuples > +during logical decoding in PostgreSQL. This allows users to > +customize the output format based on whether they want to include > +these columns or not. The default is false. > + > + > + > > 7a. > It doesn't render properly. e.g. Should not be bold italic (probably > the class is wrong?), because none of the nearby parameters look this > way. > > ~ > > 7b. > The name here should NOT have hyphens. It needs underscores same as > all other nearby protocol parameters. > > ~ > > 7c. > The description seems overly verbose. > > SUGGESTION > Boolean option to enable generated columns. This option controls > whether generated columns should be included in the string > representation of tuples during logical decoding in PostgreSQL. The > default is false. Fixed. > == > doc/src/sgml/ref
Re: Re: may be a buffer overflow problem
>Thanks for looking! I will apply it backpatched all the way down as this has >been wrong since 2006. > >> I wonder how Winter Loo got to see that warning... > I was compiling the source code of postgres version 13, and the build flags are changed in my development environment. >And it would be interesting to know if that was the only warning, since >error.c >in ECPG performs the exact same string copy. > Yes, that was the only warning. I searched for all `sqlstate` occurrences in the ecpg directory; there are no other dubious problems.
Re: Conflict Detection and Resolution
On Thu, Jun 13, 2024 at 11:18 PM Jonathan S. Katz wrote: > > On 6/13/24 7:28 AM, Amit Kapila wrote: > >> > >> I feel that we should log all types of conflict in an uniform way. For > >> example, with detect_conflict being true, the update_differ conflict > >> is reported as "conflict %s detected on relation "%s"", whereas > >> concurrent inserts with the same key is reported as "duplicate key > >> value violates unique constraint "%s"", which could confuse users. > >> Ideally, I think that we log such conflict detection details (table > >> name, column name, conflict type, etc) to somewhere (e.g. a table or > >> server logs) so that the users can resolve them manually. > >> > > > > It is good to think if there is a value in providing in > > pg_conflicts_history kind of table which will have details of > > conflicts that occurred and then we can extend it to have resolutions. > > I feel we can anyway LOG the conflicts by default. Updating a separate > > table with conflicts should be done by default or with a knob is a > > point to consider. > > +1 for logging conflicts uniformly, but I would +100 to exposing the log > in a way that's easy for the user to query (whether it's a system view > or a stat table). Arguably, I'd say that would be the most important > feature to come out of this effort. > We can have both the system view and a stats table. The system view could have some sort of cumulative stats data like how many times a particular conflict had occurred and the table would provide detailed information about the conflict. The one challenge I see in providing a table is in its cleanup mechanism. We could prove a partitioned table such that users can truncate/drop the not needed partitions or provide a non-partitioned table where users can delete the old data in which case they generate a work for auto vacuum. > Removing how conflicts are resolved, users want to know exactly what row > had a conflict, and users from other database systems that have dealt > with these issues will have tooling to be able to review and analyze if > a conflicts occur. This data is typically stored in a queryable table, > with data retained for N days. When you add in automatic conflict > resolution, users then want to have a record of how the conflict was > resolved, in case they need to manually update it. > > Having this data in a table also gives the user opportunity to > understand conflict stats (e.g. conflict rates) and potentially identify > portions of the application and other parts of the system to optimize. > It also makes it easier to import to downstream systems that may perform > further analysis on conflict resolution, or alarm if a conflict rate > exceeds a certain threshold. > Agreed those are good use cases to store conflict history. -- With Regards, Amit Kapila.
Re: Re: An implementation of multi-key sort
hi Tomas, So many thanks for your kind response and detailed report. I am working on locating issues based on your report/script and optimizing code, and will update later. Could you please also send me the script to generate report pdf from the test results (explain*.log)? I can try to make one by myself, but I'd like to get a report exactly the same as yours. It's really helpful. Thanks in advance. Yao Wang On Mon, Jun 10, 2024 at 5:09 AM Tomas Vondra wrote: > > Hello Yao, > > I was interested in the patch, considering the promise of significant > speedups of sorting, so I took a quick look and did some basic perf > testing today. Unfortunately, my benchmarks don't really confirm any > peformance benefits, so I haven't looked at the code very much and only > have some very basic feedback: > > 1) The new GUC is missing from the .sample config, triggering a failure > of "make check-world". Fixed by 0002. > > 2) There's a place mixing tabs/spaces in indentation. Fixed by 0003. > > 3) I tried running pgindent, mostly to see how that would affect the > comments, and for most it's probably fine, but a couple are mangled > (usually those with a numbered list of items). Might needs some changes > to use formatting that's not reformatted like this. The changes from > pgindent are in 0004, but this is not a fix - it just shows the changes > after running pgindent. > > Now, regarding the performance tests - I decided to do the usual black > box testing, i.e. generate tables with varying numbers of columns, data > types, different data distribution (random, correlated, ...) and so on. > And then run simple ORDER BY queries on that, measuring timing with and > without mk-sort, and checking the effect. > > So I wrote a simple bash script (attached) that does exactly that - it > generates a table with 1k - 10M rows, fills with with data (with some > basic simple data distributions), and then runs the queries. > > The raw results are too large to attach, I'm only attaching a PDF > showing the summary with a "speedup heatmap" - it's a pivot with the > parameters on the left, and then the GUC and number on columns on top. > So the first group of columns is with enable_mk_sort=off, the second > group with enable_mk_sort=on, and finally the heatmap with relative > timing (enable_mk_sort=on / enable_mk_sort=off). > > So values <100% mean it got faster (green color - good), and values > >100% mean it got slower (red - bad). And the thing is - pretty much > everything is red, often in the 200%-300% range, meaning it got 2x-3x > slower. There's only very few combinations where it got faster. That > does not seem very promising ... but maybe I did something wrong? > > After seeing this, I took a look at your example again, which showed > some nice speedups. But it seems very dependent on the order of keys in > the ORDER BY clause. 
For example consider this: > > set enable_mk_sort = on; > explain (analyze, timing off) > select * from t1 order by c6, c5, c4, c3, c2, c1; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c6, c5, c4, c3, c2, c1 >Sort Method: quicksort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.054 ms > Execution Time: 1095.183 ms > (6 rows) > > set enable_mk_sort = on; > explain (analyze, timing off) > select * from t1 order by c6, c5, c4, c3, c2, c1; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c6, c5, c4, c3, c2, c1 >Sort Method: multi-key quick sort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.130 ms > Execution Time: 633.635 ms > (6 rows) > > Which seems great, but let's reverse the sort keys: > > set enable_mk_sort = off; > explain (analyze, timing off) > select * from t1 order by c1, c2, c3, c4, c5, c6; > > QUERY PLAN > --- > > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c1, c2, c3, c4, c5, c6 >Sort Method: quicksort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.146 ms > Execution Time: 170.085 ms > (6 rows) > > set enable_mk_sort = off; > explain (analyze, timing off) > select * from t1 order by c1, c2, c3, c4, c5, c6; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=4
Re: Conflict Detection and Resolution
On Fri, Jun 14, 2024 at 12:10 AM Robert Haas wrote: > > On Thu, May 23, 2024 at 2:37 AM shveta malik wrote: > > c) update_deleted: The row with the same value as that incoming > > update's key does not exist. The row is already deleted. This conflict > > type is generated only if the deleted row is still detectable i.e., it > > is not removed by VACUUM yet. If the row is removed by VACUUM already, > > it cannot detect this conflict. It will detect it as update_missing > > and will follow the default or configured resolver of update_missing > > itself. > > I think this design is categorically unacceptable. It amounts to > designing a feature that works except when it doesn't. I'm not exactly > sure how the proposal should be changed to avoid depending on the > timing of VACUUM, but I think it's absolutely not OK to depend on the > timing of VACUUm -- or, really, this is going to depend on the timing > of HOT-pruning, which will often happen almost instantly. > Agreed, above Tomas has speculated to have a way to avoid vacuum cleaning dead tuples until the required changes are received and applied. Shveta also mentioned another way to have deads-store (say a table where deleted rows are stored for resolution) [1] which is similar to a technique used by some other databases. There is an agreement to not rely on Vacuum to detect such a conflict but the alternative is not clear. Currently, we are thinking to consider such a conflict type as update_missing (The row with the same value as that incoming update's key does not exist.). This is how the current HEAD code behaves and LOGs the information (logical replication did not find row to be updated ..). [1] - https://www.postgresql.org/message-id/CAJpy0uCov4JfZJeOvY0O21_gk9bcgNUDp4jf8%2BBbMp%2BEAv8cVQ%40mail.gmail.com -- With Regards, Amit Kapila.
TruncateMultiXact() bugs
I was performing tests around multixid wraparound, when I ran into this assertion: TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981 postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e] postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d] postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e] postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb] postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a] postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1] postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b] postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3] postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66] postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d] postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead] postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e] postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb] postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e] /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45] postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31] 2024-06-14 13:11:02.025 EEST [920971] LOG: server process (PID 920981) was terminated by signal 6: Aborted 2024-06-14 13:11:02.025 EEST [920971] DETAIL: Failed process was running: autovacuum: VACUUM pg_toast.pg_toast_13407 (to prevent wraparound) The attached python script reproduces this pretty reliably. It's a reduced version of a larger test script I was working on, it probably could be simplified further for this particular issue. Looking at the code, it's pretty clear how it happens: 1. TruncateMultiXact does START_CRIT_SECTION(); 2. In the critical section, it calls PerformMembersTruncation() -> SlruDeleteSegment() -> SlruInternalDeleteSegment() -> RegisterSyncRequest() -> ForwardSyncRequest() 3. If the fsync request queue is full, it calls CompactCheckpointerRequestQueue(), which calls palloc0. Pallocs are not allowed in a critical section. A straightforward fix is to add a check to CompactCheckpointerRequestQueue() to bail out without compacting, if it's called in a critical section. That would cover any other cases like this, where RegisterSyncRequest() is called in a critical section. I haven't tried searching if any more cases like this exist. But wait there is more! After applying that fix in CompactCheckpointerRequestQueue(), the test script often gets stuck. There's a deadlock between the checkpointer, and the autovacuum backend trimming the SLRUs: 1. TruncateMultiXact does this: MyProc->delayChkptFlags |= DELAY_CHKPT_START; 2. It then makes that call to PerformMembersTruncation() and RegisterSyncRequest(). If it cannot queue the request, it sleeps a little and retries. But the checkpointer is stuck waiting for the autovacuum backend, because of delayChkptFlags, and will never clear the queue. To fix, I propose to add AbsorbSyncRequests() calls to the wait-loops in CreateCheckPoint(). Attached patch fixes both of those issues. I can't help thinking that TruncateMultiXact() should perhaps not have such a long critical section. TruncateCLOG() doesn't do that. 
But it was added for good reasons in commit 4f627f897367, and this fix seems appropriate for the stable branches anyway, even if we come up with something better for master. -- Heikki Linnakangas Neon (https://neon.tech)

Attached reproducer script (truncated):

import time
import logging
import psycopg2
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple, Type, Union, cast
import subprocess

PGDATA = "pgdata-multixid-repro"

log = logging.getLogger(__name__)


def connect():
    return psycopg2.connect(
        database="postgres",
        user="postgres",
        host="/tmp/",
    )


# Constants and macros copied from PostgreSQL multixact.c and headers. These are needed to
# calculate the SLRU segments that a particular multixid or multixid-offsets falls into.
BLCKSZ = 8192
MULTIXACT_OFFSETS_PER_PAGE = int(BLCKSZ / 4)
SLRU_PAGES_PER_SEGMENT = int(32)
MXACT_MEMBER_BITS_PER_XACT = 8
MXACT_MEMBER_FLAGS_PER_BYTE = 1
MULTIXACT_FLAGBYTES_PER_GROUP = 4
MULTIXACT_MEMBERS_PER_MEMBERGROUP = MULTIXACT_FLAGBYTES_PER_GROUP * MXACT_MEMBER_FLAGS_PER_BYTE
MULTIXACT_MEMBERGROUP_SIZE = 4 * MULTIXACT_MEMBERS_PER_MEMBERGROUP + MULTIXACT_FLAGBYTES_PER_GROUP
MULTIXACT_MEMBERGROUPS_PER_PAGE = int(BLCKSZ / MULTIXACT_MEMBERGROUP_SIZE)
MULTIXACT_MEMBERS_PER_PAGE = MULTIXACT_MEMBERGROUPS_PER_PAGE * MULTIXACT_MEMBERS_PER_MEMBERGROUP


def MultiXactIdToOffsetSegment(xid: int):
    return int(xid / (SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE))


def MXOffsetToMemberSegment(off: int):
    return int(off /
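The attached patch itself isn't shown above; what follows is a hedged sketch of the two fixes it describes, not the real function bodies in checkpointer.c and xlog.c, which contain more than this and may place the changes differently.

```c
/*
 * Sketch of fix 1 (checkpointer.c): bail out of queue compaction when called
 * inside a critical section, because compaction needs to palloc a tracking
 * structure and palloc is disallowed there.
 */
static bool
CompactCheckpointerRequestQueue(void)
{
	if (CritSectionCount > 0)
		return false;			/* caller treats this as "queue still full" */

	/* ... existing compaction logic unchanged ... */
}

/*
 * Sketch of fix 2 (CreateCheckPoint): while waiting for backends that have
 * DELAY_CHKPT_START set, keep absorbing fsync requests so that a waiter such
 * as TruncateMultiXact() can get its RegisterSyncRequest() through.
 */
do
{
	AbsorbSyncRequests();
	pg_usleep(10000L);			/* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START));
```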
RE: Conflict Detection and Resolution
On Thursday, June 13, 2024 8:46 PM Peter Eisentraut wrote: > > On 23.05.24 08:36, shveta malik wrote: > > Conflict Resolution > > > > a) latest_timestamp_wins:The change with later commit timestamp > wins. > > b) earliest_timestamp_wins: The change with earlier commit timestamp > wins. > > c) apply: Always apply the remote change. > > d) skip:Remote change is skipped. > > e) error: Error out on conflict. Replication is stopped, manual > > action is needed. > > You might be aware of pglogical, which has similar conflict resolution modes, > but they appear to be spelled a bit different. It might be worth reviewing > this, > so that we don't unnecessarily introduce differences. Right. Some of the proposed resolution names are different from pglogical's while the functionalities are the same. The following is the comparison with pglogical: latest_timestamp_wins(proposal) - last_update_wins(pglogical) earliest_timestamp_wins(proposal) - first_update_wins(pglogical) apply(proposal) - apply_remote(pglogical) skip(proposal)- keep_local(pglogical) I personally think the pglogical's names read more naturally. But others may have different opinions on this. > > https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts > > There might also be other inspiration to be found related to this in pglogical > documentation or code. Another difference is that we allow users to specify different resolutions for different conflicts, while pglogical allows specifying one resolution for all conflict. I think the proposed approach offers more flexibility to users, which seems more favorable to me. Best Regards, Hou zj
Re: Remove dependence on integer wrapping
14.06.2024 05:48, Joseph Koshakow wrote: v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I just incremented the version number. > Also there are several trap-producing cases with date types: > SELECT to_date('1', 'CC'); > SELECT to_timestamp('10,999', 'Y,YYY'); > SELECT make_date(-2147483648, 1, 1); > > And one more with array... > CREATE TABLE t (ia int[]); > INSERT INTO t(ia[2147483647:2147483647]) VALUES ('{}'); I'll try and get patches to address these too in the next couple of weeks unless someone beats me to it. > I think it's not the whole iceberg too. +1 After sending my message, I toyed with -ftrapv a little time more and found other cases: SELECT '[]'::jsonb -> -2147483648; #4 0x7efe232d67f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x55e8fde9f211 in __negvsi2 () #6 0x55e8fdcca62c in jsonb_array_element (fcinfo=0x55e8fec28220) at jsonfuncs.c:948 (gdb) f 6 #6 0x55e14cb9362c in jsonb_array_element (fcinfo=0x55e14d493220) at jsonfuncs.c:948 948 if (-element > nelements) (gdb) p element $1 = -2147483648 --- SELECT jsonb_delete_path('{"a":[]}', '{"a",-2147483648}'); #4 0x7f1873bef7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x564a009d2211 in __negvsi2 () #6 0x564a00807c89 in setPathArray (it=0x7fff865c7380, path_elems=0x564a017baf20, path_nulls=0x564a017baf40, path_len=2, st=0x7fff865c7388, level=1, newval=0x0, nelems=2, op_type=2) at jsonfuncs.c:5407 (gdb) f 6 #6 0x55985e823c89 in setPathArray (it=0x7ffc22258fe0, path_elems=0x559860286f20, path_nulls=0x559860286f40, path_len=2, st=0x7ffc22258fe8, level=1, newval=0x0, nelems=0, op_type=2) at jsonfuncs.c:5407 5407 if (-idx > nelems) (gdb) p idx $1 = -2147483648 --- CREATE FUNCTION check_foreign_key () RETURNS trigger AS .../refint.so' LANGUAGE C; CREATE TABLE t (i int4 NOT NULL); CREATE TRIGGER check_fkey BEFORE DELETE ON t FOR EACH ROW EXECUTE PROCEDURE check_foreign_key (2147483647, 'cascade', 'i', "ft", "i"); INSERT INTO t VALUES (1); DELETE FROM t; #4 0x7f57f0bef7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x7f57f1671351 in __addvsi3 () from .../src/test/regress/refint.so #6 0x7f57f1670234 in check_foreign_key (fcinfo=0x7ffebf523650) at refint.c:321 (gdb) f 6 #6 0x7f3400ef9234 in check_foreign_key (fcinfo=0x7ffd6e16a600) at refint.c:321 321 nkeys = (nargs - nrefs) / (nrefs + 1); (gdb) p nargs $1 = 3 (gdb) p nrefs $2 = 2147483647 --- And the most interesting case to me: SET temp_buffers TO 10; CREATE TEMP TABLE t(i int PRIMARY KEY); INSERT INTO t VALUES(1); #4 0x7f385cdd37f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x5620071c4f51 in __addvsi3 () #6 0x562007143f3c in init_htab (hashp=0x562008facb20, nelem=610070812) at dynahash.c:720 (gdb) f 6 #6 0x560915207f3c in init_htab (hashp=0x560916039930, nelem=10) at dynahash.c:720 720 hctl->high_mask = (nbuckets << 1) - 1; (gdb) p nbuckets $1 = 1073741824 Best regards, Alexander
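For the jsonb cases above, the needed change is essentially to stop negating a value that can be INT32_MIN. A hedged sketch follows (types simplified; nelements comes from JB_ROOT_COUNT() in the real code, and an eventual patch might instead use the overflow-checked helpers in src/include/common/int.h):

```c
/* jsonb_array_element(): handle negative subscripts without negating */
if (element < 0)
{
	/*
	 * The old test "-element > nelements" overflows when element is
	 * INT32_MIN.  Comparing in 64-bit arithmetic is equivalent and never
	 * negates the value.
	 */
	if ((int64) element + (int64) nelements < 0)
		PG_RETURN_NULL();
	element += nelements;
}
```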
Re: POC, WIP: OR-clause support for indexes
On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov wrote: > > I've revised the patch. Did some beautification, improvements for > documentation, commit messages etc. > > I've pushed the 0001 patch without 0002. I think 0001 is good by > itself given that there is the or_to_any_transform_limit GUC option. > The more similar OR clauses are here the more likely grouping them > into SAOP will be a win. But I've changed the default value to 5. > This will make it less invasive and affect only queries with obvious > repeating patterns. That also reduced the changes in the regression > tests expected outputs. > > Regarding 0002, it seems questionable since it could cause a planning > slowdown for SAOP's with large arrays. Also, it might reduce the win > of transformation made by 0001. So, I think we should skip it for > now.

The patch has been reverted from pg17. Let me propose a new version for pg18 based on the valuable feedback from Tom Lane [1][2].

* The transformation is moved to the stage of adding restrictinfos to the base relation (in particular add_base_clause_to_rel()). This leads to interesting consequences. While this allows IndexScans to use transformed clauses, BitmapScans and SeqScans seem unaffected. Therefore, I wasn't able to find a planning regression.
* Since there is no planning regression anymore, I've removed the or_to_any_transform_limit GUC, which was a source of criticism.
* Now, not only Consts but also Params are allowed in the SAOP's list.
* The criticized hash based on expression jumbling has been removed. Now, a plain list is used instead.
* OrClauseGroup now gets a legal node tag. That allows mixing it in a list with other nodes without hacks.

I think this patch won't be as good as before at optimizing the performance of large OR lists, given that BitmapScans and SeqScans still deal with ORs. However, it allows IndexScans to handle more, doesn't seem to cause a planning regression, and therefore introduces no extra GUC. Overall, this seems like a good compromise. This patch could use some polishing, but I'd like to first hear some feedback on the general design.

Links
1. https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
2. https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us

-- Regards, Alexander Korotkov Supabase

v25-0001-Transform-OR-clauses-to-ANY-expression.patch Description: Binary data
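For readers new to this thread, the transformation in question rewrites a chain of equality ORs over the same expression into a single ScalarArrayOpExpr (SAOP). Roughly speaking (the table and column names here are made up for illustration):

    -- a query written like this ...
    SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 3 OR x = 4;

    -- ... is treated by the planner as if it had been written like this,
    -- which an index scan on t(x) can use as a single SAOP condition:
    SELECT * FROM t WHERE x = ANY (ARRAY[1, 2, 3, 4]);

With the new approach the rewrite happens while restrictinfos are added to the base relation, which is why index paths see the SAOP form while sequential and bitmap scans keep evaluating the original OR list.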
Re: RFC: adding pytest as a supported test framework
On Thu, Jun 13, 2024 at 8:12 PM Jacob Champion wrote: > But also: do you have opinions on what to fill in as steps 2 > (something we have no ability to test today) and 3 (something we do > test today, but hate)? > > My vote for step 2 is "client and server separation", perhaps by > testing libpq fallback against a server that claims support for > different build-time options. I don't want to have a vote in step 3, > because part of that step is proving that this framework can provide > value for a part of the project I don't really know much about.

I mean, both Perl and Python are Turing-complete. Anything you can do in one, you can do in the other, especially when you consider that we're not likely to accept too many dependencies on external Perl or Python modules. That's why I see this as nothing more or less than an exercise in letting people use the programming language they prefer. We've talked about a libpq FFI interface, but it hasn't been done; now we're talking about maybe having a Python one. Perhaps we'll end up with both. Then you can imagine porting tests from one language to the other and the only difference is whether you'd rather have line noise before all of your variable names or semantically significant whitespace.

I just don't believe in the idea that we're going to write one category of tests in one language and another category in another language. As soon as we open the door to Python tests, people are going to start writing the TAP tests that they would have written in Perl in Python instead. And if the test utilities that we have for Perl are not there for Python, then they'll either open-code things for which they would have used a module, or they'll write a stripped-down version of the module that will then get built up patch by patch until, 50 or 100 or 200 hours of committer review later, it resembles the existing Perl module. And, if the committer pushes back and says, hey, why not write this test in Perl, which already has all of this test infrastructure in place, then the submitter will wander off muttering about how PostgreSQL committers are hidebound backward individuals who just try to ruin everybody's day.

So as I see it, the only reasonable plan here if we want to introduce testing in Python (or C#, or Ruby, or Go, or JavaScript, or Lua, or LOLCODE) is to try to achieve a reasonable degree of parity between that language and Perl. Because then we can at least review the new infrastructure all at once, instead of incrementally, spread across many patches written, reviewed, and committed by many different people. Now, I completely understand if you're not excited about getting sucked down that rabbit-hole, and maybe some other committer is going to see this differently than I do, and that's fine. But my view is that if you're not interested in doing all the work to let people do more or less the kind of stuff that they currently do in Perl in Python instead, then your alternative is to take the tests that you want to add and rewrite them in Perl. And I am fairly certain that if you choose that option, it will save me, and a bunch of other committers, a whole lot of work, at least in the short term. If we add support for Python, we are going to end up having to do a lot of things twice for the next, let's say, ten to twenty years, until somebody rewrites the remaining Perl tests in Python or whatever language is hot and cool by then. 
My opinion is that we need to be open to enduring that pain because we can't indefinitely hold our breath and insist on using obsolete tools for everything, but that doesn't mean that I don't think it will be painful. Consider the meson build system project. To get that committed, Andres had to make it do pretty much everything MSVC could do and mostly everything that configure could do, and the places where he didn't make it do everything configure could do remain sore spots that I, at least, am not happy about. And in that case, he also managed to get MSVC removed entirely, so that we do not have a larger number of build systems in total than we had before. Furthermore, the amount of code in buildsystem files (makefiles, meson.build) in a typical patch needing review is usually none or very little, whereas the amount of test code in a patch is sometimes quite large. I've come around to the belief that the meson work was worthwhile -- running tests is so much faster and nicer now -- but it was a ton of work to get done and impacted everyone in the development community, and I think the blast radius for this change is likely to be larger for the reasons suggested earlier in this paragraph. -- Robert Haas EDB: http://www.enterprisedb.com
Re: 回复: An implementation of multi-key sort
On 6/14/24 13:20, Yao Wang wrote: > hi Tomas, > > So many thanks for your kind response and detailed report. I am working > on locating issues based on your report/script and optimizing code, and > will update later. > > Could you please also send me the script to generate report pdf > from the test results (explain*.log)? I can try to make one by myself, > but I'd like to get a report exactly the same as yours. It's really > helpful. > I don't have a script for that. I simply load the results into a spreadsheet, do a pivot table to "aggregate and reshuffle" it a bit, and then add a heatmap. I use google sheets for this, but any other spreadsheet should handle this too, I think. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi, Ivan! On Wed, Jun 12, 2024 at 11:36 AM Kartyshov Ivan wrote: > > Hi, Alexander, Here, I made some improvements according to your > discussion with Heikki. > > On 2024-04-11 18:09, Alexander Korotkov wrote: > > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas > > wrote: > >> In a nutshell, it's possible for the loop in WaitForLSN to exit > >> without > >> cleaning up the process from the heap. I was able to hit that by > >> adding > >> a delay after the addLSNWaiter() call: > >> > >> > TRAP: failed Assert("!procInfo->inHeap"), File: > >> > "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152 > >> > postgres: heikki postgres [local] > >> > CALL(ExceptionalCondition+0xab)[0x55da1f68787b] > >> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8] > >> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc] > >> > postgres: heikki postgres [local] > >> > CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5] > >> > postgres: heikki postgres [local] > >> > CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a] > >> > postgres: heikki postgres [local] > >> > CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9] > >> > >> I think there's a similar race condition if the timeout is reached at > >> the same time that the startup process wakes up the process. > > > > Thank you for catching this. I think WaitForLSN() just needs to call > > deleteLSNWaiter() unconditionally after exit from the loop. > > Fix and add injection point test on this race condition. > > > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas > > wrote: > >> The docs could use some-copy-editing, but just to point out one issue: > >> > >> > There are also procedures to control the progress of recovery. > >> > >> That's copy-pasted from an earlier sentence at the table that lists > >> functions like pg_promote(), pg_wal_replay_pause(), and > >> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control > >> the > >> progress of recovery like those functions do, it only causes the > >> calling > >> backend to wait. > > Fix documentation and add extra tests on multi-standby replication > and cascade replication. Thank you for the revised patch. I see couple of items which are not addressed in this revision. * As Heikki pointed, that it's currently not possible in one round trip to call call pg_wal_replay_wait() and do other job. The attached patch addresses this. It milds the requirement. Now, it's not necessary to be in atomic context. It's only necessary to have no active snapshot. This new requirement works for me so far. I appreciate a feedback on this. * As Alvaro pointed, multiple waiters case isn't covered by the test suite. That leads to no coverage of some code paths. The attached patch addresses this by adding a test case with multiple waiters. The rest looks good to me. Links. 1. https://www.postgresql.org/message-id/d1303584-b763-446c-9409-f4516118219f%40iki.fi 2.https://www.postgresql.org/message-id/202404051815.eri4u5q6oj26%40alvherre.pgsql -- Regards, Alexander Korotkov Supabase From 0229f55f8e63688f6660d07600dd6918e0e55b91 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Thu, 13 Jun 2024 00:39:45 +0300 Subject: [PATCH v19] Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed before starting the transaction. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes on standby. 
The queue of waiters is stored in the shared memory array sorted by LSN. During replay of WAL waiters whose LSNs are already replayed are deleted from the shared memory array and woken up by setting of their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working without an active snapshot, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira --- doc/src/sgml/func.sgml| 112 ++ src/backend/access/transam/xact.c | 6 + src/backend/access/transam/xlog.c | 7 + src/backend/access/transam/xlogrecovery.c | 11 + src/backend/catalog/system_functions.sql | 3 + src/backend/commands/Makefile | 3 +- src/backend/commands/meson.build | 1 + src/backend/commands/waitlsn.c| 345 ++ src/backend/lib/pairingheap.c | 18 +- src/backend/storage/ipc/ipci.c| 3 + src/backend/storage/lmgr/proc.c | 6 + .../utils/activity/w
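To make the intended usage concrete, here is a minimal example (the accounts table is hypothetical; the procedure name and behavior are as described in this patch):

    -- on the primary: make a change and capture the resulting WAL position
    INSERT INTO accounts (id, balance) VALUES (42, 100);
    SELECT pg_current_wal_insert_lsn() AS lsn \gset

    -- on the standby: wait for that LSN to be replayed, then read
    CALL pg_wal_replay_wait(:'lsn');
    SELECT balance FROM accounts WHERE id = 42;

Because of the no-active-snapshot requirement described above, the CALL has to be issued outside of a transaction block that already holds a snapshot.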
assertion failure at cost_memoize_rescan()
Hello, I met an assertion failure, and identified the root of the problem, but no idea how to fix it. The location of the problematic Assert() is at cost_memoize_rescan() to check 'hit_ratio' is between 0.0 and 1.0. The 'calls' is provided by the caller, and 'ndistinct' is the result of estimate_num_groups(). #4 0x0084d583 in cost_memoize_rescan (root=0x2e95748, mpath=0x30aece8, rescan_startup_cost=0x7ffd72141260, rescan_total_cost=0x7ffd72141258) at costsize.c:2564 /home/kaigai/source/pgsql-16/src/backend/optimizer/path/costsize.c:2564:83932:beg:0x84d583 (gdb) l 2559 * how many of those scans we expect to get a cache hit. 2560 */ 2561hit_ratio = ((calls - ndistinct) / calls) * 2562(est_cache_entries / Max(ndistinct, est_cache_entries)); 2563 2564Assert(hit_ratio >= 0 && hit_ratio <= 1.0); 2565 2566/* 2567 * Set the total_cost accounting for the expected cache hit ratio. We 2568 * also add on a cpu_operator_cost to account for a cache lookup. This (gdb) bt #0 0x7f3a39aa154c in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x7f3a39a54d06 in raise () from /lib64/libc.so.6 #2 0x7f3a39a287f3 in abort () from /lib64/libc.so.6 #3 0x00b6ff2c in ExceptionalCondition (conditionName=0xd28c28 "hit_ratio >= 0 && hit_ratio <= 1.0", fileName=0xd289a4 "costsize.c", lineNumber=2564) at assert.c:66 #4 0x0084d583 in cost_memoize_rescan (root=0x2e95748, mpath=0x30aece8, rescan_startup_cost=0x7ffd72141260, rescan_total_cost=0x7ffd72141258) at costsize.c:2564 #5 0x00850831 in cost_rescan (root=0x2e95748, path=0x30aece8, rescan_startup_cost=0x7ffd72141260, rescan_total_cost=0x7ffd72141258) at costsize.c:4350 #6 0x0084e333 in initial_cost_nestloop (root=0x2e95748, workspace=0x7ffd721412d0, jointype=JOIN_INNER, outer_path=0x3090058, inner_path=0x30aece8, extra=0x7ffd72141500) at costsize.c:2978 #7 0x00860f58 in try_partial_nestloop_path (root=0x2e95748, joinrel=0x30ae158, outer_path=0x3090058, inner_path=0x30aece8, pathkeys=0x0, jointype=JOIN_INNER, extra=0x7ffd72141500) at joinpath.c:887 #8 0x00862a64 in consider_parallel_nestloop (root=0x2e95748, joinrel=0x30ae158, outerrel=0x308f428, innerrel=0x2eac390, jointype=JOIN_INNER, extra=0x7ffd72141500) at joinpath.c:2083 #9 0x0086273d in match_unsorted_outer (root=0x2e95748, joinrel=0x30ae158, outerrel=0x308f428, innerrel=0x2eac390, jointype=JOIN_INNER, extra=0x7ffd72141500) at joinpath.c:1940 #10 0x008600f0 in add_paths_to_joinrel (root=0x2e95748, joinrel=0x30ae158, outerrel=0x308f428, innerrel=0x2eac390, jointype=JOIN_INNER, sjinfo=0x7ffd721415f0, restrictlist=0x30ae5a8) at joinpath.c:296 #11 0x00864d10 in populate_joinrel_with_paths (root=0x2e95748, rel1=0x308f428, rel2=0x2eac390, joinrel=0x30ae158, sjinfo=0x7ffd721415f0, restrictlist=0x30ae5a8) at joinrels.c:925 #12 0x008649e1 in make_join_rel (root=0x2e95748, rel1=0x308f428, rel2=0x2eac390) at joinrels.c:776 #13 0x00863ec1 in make_rels_by_clause_joins (root=0x2e95748, old_rel=0x308f428, other_rels_list=0x3088ed0, other_rels=0x3088ee8) at joinrels.c:312 #14 0x0086399a in join_search_one_level (root=0x2e95748, level=3) at joinrels.c:123 #15 0x008463f8 in standard_join_search (root=0x2e95748, levels_needed=4, initial_rels=0x3088ed0) at allpaths.c:3454 #16 0x0084636d in make_rel_from_joinlist (root=0x2e95748, joinlist=0x306b4f8) at allpaths.c:3385 #17 0x00841548 in make_one_rel (root=0x2e95748, joinlist=0x306b4f8) at allpaths.c:229 #18 0x008806a9 in query_planner (root=0x2e95748, qp_callback=0x886bcb , qp_extra=0x7ffd72141960) at planmain.c:278 #19 0x00882f5f in grouping_planner (root=0x2e95748, 
tuple_fraction=0) at planner.c:1495 #20 0x0088268c in subquery_planner (glob=0x2e95348, parse=0x2e90e98, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1064 #21 0x00880cdb in standard_planner (parse=0x2e90e98, query_string=0x2e3a0e8 "explain\nselect sum(lo_revenue), d_year, p_brand1\n from lineorder, date1, part, supplier\n where lo_orderdate = d_datekey\nand lo_partkey = p_partkey\nand lo_suppkey = s_suppkey\n and p_brand1"..., cursorOptions=2048, boundParams=0x0) at planner.c:413 I tracked the behavior of estimate_num_groups() using gdb line-by-line to observe how 'input_rows' is changed and how it affects the result value. According to the call trace, the problematic estimate_num_groups() invocation is called with "input_rows=3251872.91667", then it was rounded up to 3251873 by the clamp_row_est(). Eventually, its result value was calculated larger than the upper limit, so the return value was suppressed by 3251873, but it is a tiny bit larger than the input value! Back to the cost_memoize_rescan(). The hit_ratio is calculated as follows: hit_ratio = ((calls - ndistinct) / calls) *
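Given that estimate_num_groups() can hand back an ndistinct that clamp_row_est() has rounded to just above calls, one defensive option (a sketch only, not necessarily the right fix) is to clamp before computing the ratio:

    /*
     * Sketch: ndistinct can come back fractionally larger than calls due to
     * clamp_row_est() rounding inside estimate_num_groups(), which would make
     * hit_ratio slightly negative. Clamp it before use.
     */
    ndistinct = Min(ndistinct, calls);

    hit_ratio = ((calls - ndistinct) / calls) *
        (est_cache_entries / Max(ndistinct, est_cache_entries));

    Assert(hit_ratio >= 0 && hit_ratio <= 1.0);

Whether the clamp belongs here or inside estimate_num_groups() itself is the open question.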
Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Hello. > But I was unable to reproduce that using some random usleep(), however - maybe it is a wrong assumption. It seems like the assumption is correct - we may use an invalid index as arbiter due to race condition. The attached patch adds a check for that case, and now the test fails like this: # pgbench: error: client 16 script 1 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey_ccold" # DETAIL: Key (i)=(42) already exists. # pgbench: error: client 9 script 1 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters # pgbench: error: client 0 script 2 aborted in command 1 query 0: ERROR: duplicate key value violates unique constraint "tbl_pkey" # DETAIL: Key (i)=(69) already exists. # pgbench: error: client 7 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters # pgbench: error: client 10 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters # pgbench: error: client 11 script 0 aborted in command 1 query 0: ERROR: ON CONFLICT does not support invalid indexes as arbiters I think It is even possible to see !alive index in the same situation (it is worse case), but I was unable to reproduce it so far. Best regards, Mikhail. Subject: [PATCH] error report in case of invalid index used as arbiter test for issue with upsert fail --- Index: src/test/modules/test_misc/t/006_concurrently_unique_fail.pl IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 === diff --git a/src/test/modules/test_misc/t/006_concurrently_unique_fail.pl b/src/test/modules/test_misc/t/006_concurrently_unique_fail.pl new file mode 100644 --- /dev/null (revision 9446f944b415306d9e5d5ab98f69938d8f5ee87f) +++ b/src/test/modules/test_misc/t/006_concurrently_unique_fail.pl (revision 9446f944b415306d9e5d5ab98f69938d8f5ee87f) @@ -0,0 +1,158 @@ + +# Copyright (c) 2024, PostgreSQL Global Development Group + +# Test REINDEX CONCURRENTLY with concurrent modifications and HOT updates +use strict; +use warnings; + +use Config; +use Errno; + +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Time::HiRes qw(usleep); +use IPC::SysV; +use threads; +use Time::HiRes qw( time ); +use Test::More; +use Test::Builder; + +if ($@ || $windows_os) +{ + plan skip_all => 'Fork and shared memory are not supported by this platform'; +} + +my ($pid, $shmem_id, $shmem_key, $shmem_size); +eval 'sub IPC_CREAT {0001000}' unless defined &IPC_CREAT; +$shmem_size = 4; +$shmem_key = rand(100); +$shmem_id = shmget($shmem_key, $shmem_size, &IPC_CREAT | 0777) or die "Can't shmget: $!"; +shmwrite($shmem_id, "wait", 0, $shmem_size) or die "Can't shmwrite: $!"; + +my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default); +# +# Test set-up +# +my ($node, $result); +$node = PostgreSQL::Test::Cluster->new('RC_test'); +$node->init; +$node->append_conf('postgresql.conf', + 'lock_timeout = ' . 
(1000 * $PostgreSQL::Test::Utils::timeout_default)); +$node->append_conf('postgresql.conf', 'fsync = off'); +$node->append_conf('postgresql.conf', 'autovacuum = off'); +$node->start; +$node->safe_psql('postgres', q(CREATE UNLOGGED TABLE tbl(i int primary key, updated_at timestamp))); + + +### +### +### + +# IT IS NOT REQUIRED TO REPRODUCE THE ISSUE BUT MAKES IT TO HAPPEN FASTER +$node->safe_psql('postgres', q(CREATE INDEX idx ON tbl(i, updated_at))); + +### +### +### + +my $builder = Test::More->builder; +$builder->use_numbers(0); +$builder->no_plan(); + +my $child = $builder->child("pg_bench"); + +if(!defined($pid = fork())) { + # fork returned undef, so unsuccessful + die "Cannot fork a child: $!"; +} elsif ($pid == 0) { + + $node->pgbench( + '--no-vacuum --client=20 -j 2 --transactions=10', + 0, + [qr{actually processed}], + [qr{^$}], + 'concurrent INSERTs, UPDATES and RC', + { + # Ensure some HOT updates happen + '002_pgbench_concurrent_transaction_updates' => q( +BEGIN; +INSERT INTO tbl VALUES(13,now()) on conflict(i) do update set updated_at = now(); +COMMIT; + ), + '003_pgbench_concurrent_transaction_updates' => q( +
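For reference, the "ON CONFLICT does not support invalid indexes as arbiters" errors in the pgbench output above come from a check of roughly this shape in the arbiter-index lookup path (illustrative only; the exact location and variable names are those of the attached patch, which is not reproduced here):

    /* illustrative: reject a not-yet-valid index chosen as an arbiter */
    if (!idxForm->indisvalid)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("ON CONFLICT does not support invalid indexes as arbiters")));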
Bugfix and improvements in multixact.c
Hi! While working on making multixact offsets 64-bit [0], I've discovered a minor issue. The thing is that the type 'xid' is used in all the macros, but that isn't correct. The appropriate MultiXactId or MultiXactOffset should be used, actually. And the second thing: as Heikki Linnakangas points out, the argument naming is also misleading. Since these problems are not the point of thread [0], I decided to create this discussion. And here is the patch set addressing the mentioned issues (0001 and 0002).

Additionally, I made an optional patch 0003 to switch from macros to inline functions. For me, personally, the use of macro functions is justified if we are dealing with different argument types, to make a polymorphic call, which is not the case here. So, we can have more control over types and still generate the same code in terms of speed. See https://godbolt.org/z/KM8voadhs. Starting from -O1 the function is inlined, thus no overhead is noticeable. Anyway, it's up to the actual committer to decide whether it's worth it or not. Again, this particular patch 0003 is completely optional.

As always, any opinions and reviews are very welcome!

[0] https://www.postgresql.org/message-id/flat/ff143b24-a093-40da-9833-d36b83726bdf%40iki.fi#61d5a0e1cf6ab94b0e8aae8559bc4cf7

-- Best regards, Maxim Orlov.

v1-0003-Switch-from-macro-functions-to-inline-in-multixac.patch Description: Binary data v1-0002-Fix-using-of-MultiXactOffset-type-insted-of-Trans.patch Description: Binary data v1-0001-Get-rid-of-misleading-xid-arg-name-in-multixact.patch Description: Binary data
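To illustrate the kind of change 0003 makes, a hedged sketch (the return type is abbreviated here rather than copied from the patch; the macro and constant names are the existing ones in multixact.c):

    /* before: the argument is really a MultiXactId, not a TransactionId */
    #define MultiXactIdToOffsetEntry(xid) \
        ((xid) % (MultiXactId) MULTIXACT_OFFSETS_PER_PAGE)

    /* after: a typed static inline function documents and enforces the intent,
     * and generates the same code from -O1 upward */
    static inline int
    MultiXactIdToOffsetEntry(MultiXactId multi)
    {
        return multi % MULTIXACT_OFFSETS_PER_PAGE;
    }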
Re: jsonpath: Missing Binary Execution Path?
On Jun 13, 2024, at 22:31, Chapman Flack wrote: > It's baked right into the standard grammar: || can only have a > on its right and a > on its left. > > && can only have a on its right and a > on its left. Wow. > The case for ! is even more limiting: it can't be applied to anything > but a . That can be either the exists predicate, > or, any other but wrapped in parentheses. > > The language seems sort of gappy in the same way XPath 1.0 was. XPath 2.0 > became much more consistent and conceptually unified, only by that time, > XML was old school, and JSON was cool, and apparently started inventing > a path language. I suppose that’s the reason for this design. But if these sorts of limitations were changed in XPath, perhaps SQL-Next could fix them, too. Thanks for citing the standard; super helpful. D
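To make that concrete with PostgreSQL's jsonpath (my reading of the grammar, not examples taken from the thread):

    -- ! applies only to a delimited (parenthesized) predicate, so this parses:
    SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (!(@ < 2))');

    -- ...whereas applying ! directly to a comparison is a syntax error:
    SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (! @ < 2)');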
Re: Shouldn't jsonpath .string() Unwrap?
On Jun 13, 2024, at 21:55, Chapman Flack wrote: > My opinion is yes, that should be done. 9.46, umm, General > Rule 11 g ii 6) A) says just "if MODE is lax and is not > type or size, then let BASE be Unwrap(BASE)." No special exemption > there for string(), nor further below at C) XV) for the operation > of string(). Thank you! Cited that bit in the commit message in the attached patch (also available as a GitHub PR[1]). D [1]: https://github.com/theory/postgres/pull/5 v1-0001-Teach-jsonpath-string-to-unwrap-in-lax-mode.patch Description: Binary data
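For the archives, the behavior change the patch is after looks roughly like this (the outputs are my expectation of the patched behavior, not copied from its regression tests):

    -- lax mode (the default): the array should be unwrapped, so .string()
    -- applies to each element and returns "1" and "2"
    SELECT jsonb_path_query('[1, 2]', '$.string()');

    -- strict mode does not unwrap, so this should still raise an error about
    -- applying .string() to a non-scalar value
    SELECT jsonb_path_query('[1, 2]', 'strict $.string()');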
Re: may be a buffer overflow problem
Daniel Gustafsson writes: > This is indeed buggy and need to take the length into account, as per the > attached. This only happens when in the undocumented regression test debug > mode which may be why it's gone unnoticed. Seeing that this code is exercised thousands of times a day in the regression tests and has had a failure rate of exactly zero (and yes, the tests do check the output), there must be some reason why it's okay. regards, tom lane
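For context, "taking the length into account" here means something like bounding the printf width, since sqlca->sqlstate is a 5-byte array with no NUL terminator. Illustrative only, not the attached patch:

    /* illustrative: print the 5-byte, non-NUL-terminated sqlstate safely */
    fprintf(stderr, "sqlstate %.*s\n",
            (int) sizeof(sqlca->sqlstate), sqlca->sqlstate);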
Re: CI and test improvements
Hi, On Thu, 13 Jun 2024 at 14:56, Justin Pryzby wrote: > > ccache should be installed in the image rather than re-installed on each > invocation. ccache is installed in the Windows VM images now [1]. It can be used as 'set CC=ccache.exe cl.exe' in the Windows CI task. [1] https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e -- Regards, Nazir Bilal Yavuz Microsoft
Re: RFC: adding pytest as a supported test framework
On 2024-06-14 Fr 08:10, Robert Haas wrote: We've talked about a libpq FFI interface, but it hasn't been done; Hold my beer :-) I just posted a POC for that. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: may be a buffer overflow problem
I wrote: > Seeing that this code is exercised thousands of times a day in the > regression tests and has had a failure rate of exactly zero (and > yes, the tests do check the output), there must be some reason > why it's okay.

After looking a little closer, I think the reason why it works in practice is that there's always a few bytes of zero padding at the end of struct sqlca_t. I don't see any value in changing individual code sites that are depending on that, because there are surely many more, both in our own code and end users' code. What I suggest we ought to do is formalize the existence of that zero pad. Perhaps like this:

     char    sqlstate[5];
+    char    sqlstatepad;    /* nul terminator for sqlstate */
 };

Another way could be to change

-    char    sqlstate[5];
+    char    sqlstate[6];

but I fear that could have unforeseen consequences in code that is paying attention to sizeof(sqlstate). Either way there are probably doc adjustments to be made.

regards, tom lane
Re: CI and test improvements
On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Thu, 13 Jun 2024 at 14:56, Justin Pryzby wrote: > > > > ccache should be installed in the image rather than re-installed on each > > invocation. > > ccache is installed in the Windows VM images now [1]. It can be used > as 'set CC=ccache.exe cl.exe' in the Windows CI task. > > [1] > https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e Thanks. I think that works by using a "shim" created by choco in C:\ProgramData\chocolatey\bin. But going through that indirection seems to incur an extra 15sec of compilation time; I think we'll want to do something to avoid that. I'm not sure what the options are, like maybe installing ccache into a fixed path like c:\ccache or installing a custom link, to a "pinned" version of ccache. -- Justin
Using LibPq in TAP tests via FFI
Over at [1] Andres expressed enthusiasm for enabling TAP tests to call LibPQ directly via FFI, and there was some support from others as well. Attached is a very rough POC for just that. There are two perl modules, one which wraps libpq (or almost all of it) in perl, and another which uses that module to create a session object that can be used to run SQL. Also in the patch is a modification of one TAP test (arbitrarily chosen as src/bin/pg_amcheck/t/004_verify_heapam.pl) to use the new interface, so it doesn't use psql at all.

There's a bunch of work to do here, but for a morning's work it's not too bad :-) Luckily I had most of the first file already to hand. Next I plan to look at some of the recovery tests and other uses of background_psql, which might be more challenging, and require extension of the session API. Also there's a lot of error checking and documentation that need to be added.

cheers andrew

[1] https://postgr.es/m/20240612152812.ixz3eiz2p475g...@awork3.anarazel.de

-- Andrew Dunstan EDB: https://www.enterprisedb.com
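For anyone who hasn't played with FFI::Platypus, the wrapper module boils down to bindings along these lines. This is a minimal sketch of the general approach, not the attached module; the handful of functions bound here and the way the library is located are illustrative choices:

    use strict;
    use warnings;
    use FFI::CheckLib qw(find_lib_or_die);
    use FFI::Platypus 2.00;

    my $ffi = FFI::Platypus->new(api => 2);
    $ffi->lib(find_lib_or_die(lib => 'pq'));

    # a few representative libpq calls; the real wrapper covers nearly all of them
    $ffi->attach(PQconnectdb    => ['string'] => 'opaque');
    $ffi->attach(PQstatus       => ['opaque'] => 'int');
    $ffi->attach(PQerrorMessage => ['opaque'] => 'string');
    $ffi->attach(PQexec         => ['opaque', 'string'] => 'opaque');
    $ffi->attach(PQgetvalue     => ['opaque', 'int', 'int'] => 'string');
    $ffi->attach(PQclear        => ['opaque'] => 'void');
    $ffi->attach(PQfinish       => ['opaque'] => 'void');

    my $conn = PQconnectdb("dbname=postgres");
    die "connection failed: " . PQerrorMessage($conn)
      if PQstatus($conn) != 0;    # 0 == CONNECTION_OK

    my $res = PQexec($conn, "SELECT count(*) FROM pg_class");
    print "pg_class has ", PQgetvalue($res, 0, 0), " rows\n";
    PQclear($res);
    PQfinish($conn);

The session module then layers result handling and error checking on top of calls like these.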
Re: Using LibPq in TAP tests via FFI
On 2024-06-14 Fr 11:09, Andrew Dunstan wrote: Over at [1] Andres expressed enthusiasm for enabling TAP tests to call LibPQ directly via FFI, and there was some support from others as well. Attached is a very rough POC for just that.There are two perl modules, one which wraps libpq (or almost all of it) in perl, and another which uses that module to create a session object that can be used to run SQL. Also in the patch is a modification of one TAP test (arbitrarily chosen as src/bin/pg_amcheck/t/004_verify_heapam.p) to use the new interface, so it doesn't use psql at all. There's a bunch of work to do here, but for a morning's work it's not too bad :-) Luckily I had most of the first file already to hand. Next I plan to look at some of the recovery tests and other uses of background_psql, which might be more challenging,a dn require extension of the session API. Also there's a lot of error checking and documentation that need to be added. cheers andrew [1] https://postgr.es/m/20240612152812.ixz3eiz2p475g...@awork3.anarazel.de And here's the patch cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl index f6d2c5f787..c47ff3f939 100644 --- a/src/bin/pg_amcheck/t/004_verify_heapam.pl +++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl @@ -5,6 +5,7 @@ use strict; use warnings FATAL => 'all'; use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Session; use PostgreSQL::Test::Utils; use Test::More; @@ -190,16 +191,17 @@ $node->append_conf('postgresql.conf', 'max_prepared_transactions=10'); $node->start; my $port = $node->port; my $pgdata = $node->data_dir; -$node->safe_psql('postgres', "CREATE EXTENSION amcheck"); -$node->safe_psql('postgres', "CREATE EXTENSION pageinspect"); +my $session = PostgreSQL::Test::Session->new(node => $node); +$session->do("CREATE EXTENSION amcheck"); +$session->do("CREATE EXTENSION pageinspect"); # Get a non-zero datfrozenxid -$node->safe_psql('postgres', qq(VACUUM FREEZE)); +$session->do(qq(VACUUM FREEZE)); # Create the test table with precisely the schema that our corruption function # expects. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT); ALTER TABLE public.test SET (autovacuum_enabled=false); ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL; @@ -209,14 +211,15 @@ $node->safe_psql( # We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze # an otherwise unused table, public.junk, prior to inserting data and freezing # public.test -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column; ALTER TABLE public.junk SET (autovacuum_enabled=false); - VACUUM FREEZE public.junk - )); + ), + 'VACUUM FREEZE public.junk' +); -my $rel = $node->safe_psql('postgres', +my $rel = $session->query_oneval( qq(SELECT pg_relation_filepath('public.test'))); my $relpath = "$pgdata/$rel"; @@ -229,23 +232,24 @@ my $ROWCOUNT_BASIC = 16; # First insert data needed for tests unrelated to update chain validation. # Then freeze the page. These tuples are at offset numbers 1 to 16. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( INSERT INTO public.test (a, b, c) SELECT x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', repeat('w', 1) FROM generate_series(1, $ROWCOUNT_BASIC); - VACUUM FREEZE public.test;) +), + 'VACUUM FREEZE public.test' ); # Create some simple HOT update chains for line pointer validation. 
After # the page is HOT pruned, we'll have two redirects line pointers each pointing # to a tuple. We'll then change the second redirect to point to the same # tuple as the first one and verify that we can detect corruption. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( INSERT INTO public.test (a, b, c) VALUES ( x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', generate_series(1,2)); -- offset numbers 17 and 18 @@ -254,8 +258,8 @@ $node->safe_psql( )); # Create some more HOT update chains. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( INSERT INTO public.test (a, b, c) VALUES ( x'DEADF9F9DEADF9F9'::bigint, 'abcdefg', generate_series(3,6)); -- offset numbers 21 through 24 @@ -264,25 +268,29 @@ $node->safe_psql( )); # Negative test case of HOT-pruning with aborted tuple. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( BEGIN; UPDATE public.test SET c = 'a' WHERE c = '5'; -- offset number 27 ABORT; - VACUUM FREEZE public.test; - )); + ), + 'VACUUM FREEZE public.test;', + ); # Next update on any tuple will be stored at the same place of tuple inserted # by aborted transaction. This should not cause the table to appear corrupt. -$node->safe_psql( - 'postgres', qq( +$session->do( + qq( +BEGIN; UPDATE public.test SET c = 'a' WHERE c = '6'; -- o
Re: Shouldn't jsonpath .string() Unwrap?
On 06/14/24 10:39, David E. Wheeler wrote: > Cited that bit in the commit message in the attached patch (also available as > a GitHub PR[1]). > > [1]: https://github.com/theory/postgres/pull/5 I would s/extepsions/exceptions/ in the added documentation. :) Offhand (as GitHub PRs aren't really The PG Way), if they were The Way, I would find this one a little hard to follow, being based at a point 28 unrelated commits ahead of the ref it's opened against. I suspect 'master' on theory/postgres could be fast-forwarded to f1affb6 and then the PR would look much more approachable. Regards, -Chap
Re: CI and test improvements
Hi, On June 14, 2024 8:22:01 AM PDT, Justin Pryzby wrote: >On Fri, Jun 14, 2024 at 05:36:54PM +0300, Nazir Bilal Yavuz wrote: >> Hi, >> >> On Thu, 13 Jun 2024 at 14:56, Justin Pryzby wrote: >> > >> > ccache should be installed in the image rather than re-installed on each >> > invocation. >> >> ccache is installed in the Windows VM images now [1]. It can be used >> as 'set CC=ccache.exe cl.exe' in the Windows CI task. >> >> [1] >> https://github.com/anarazel/pg-vm-images/commit/03a9225ac962fb30b5c0722c702941e2d7c1e81e > >Thanks. I think that works by using a "shim" created by choco in >C:\ProgramData\chocolatey\bin. > >But going through that indirection seems to incur an extra 15sec of >compilation time; I think we'll want to do something to avoid that. > >I'm not sure what the options are, like maybe installing ccache into a >fixed path like c:\ccache or installing a custom link, to a "pinned" >version of ccache. Hm. There actually already is the mingw ccache installed. The images for mingw and msvc used to be separate (for startup performance when using containers), but we just merged them. So it might be easiest to just explicitly use the ccache from there (also an explicit path might be faster). Could you check if that's fast? If not, we can just install the binaries distributed by the project, it's just more work to keep up2date that way. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Using LibPq in TAP tests via FFI
On Fri, Jun 14, 2024 at 11:11 AM Andrew Dunstan wrote: > And here's the patch

I haven't reviewed the patch, but a big +1 for the idea. Not only might this cut down on the resource costs of running the tests in CI, as Andres has pointed out a few times, but it could also lead to much nicer user interfaces. For instance, right now, we have a number of TAP tests that are parsing psql output to recover the values returned by queries. Perhaps eventually - or maybe already, again I haven't looked at the code - you'll be able to do something like $resultset->[0][0] to pull the first column out of the first row. That kind of thing could substantially improve the readability and maintainability of some of our tests.

-- Robert Haas EDB: http://www.enterprisedb.com
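To illustrate the kind of interface being hoped for above (an entirely hypothetical API, not something in the posted patch):

    # hypothetical: run a query and get the result set back as a Perl structure
    my $rows = $session->query("SELECT datname FROM pg_database ORDER BY 1");
    is($rows->[0][0], 'postgres', 'first database is postgres');

No more regex-scraping of psql output to recover a single value.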
Re: Using LibPq in TAP tests via FFI
Hi, On June 14, 2024 8:09:43 AM PDT, Andrew Dunstan wrote: >Over at [1] Andres expressed enthusiasm for enabling TAP tests to call LibPQ >directly via FFI, and there was some support from others as well. Attached is >a very rough POC for just that.There are two perl modules, one which wraps >libpq (or almost all of it) in perl, and another which uses that module to >create a session object that can be used to run SQL. Also in the patch is a >modification of one TAP test (arbitrarily chosen as >src/bin/pg_amcheck/t/004_verify_heapam.p) to use the new interface, so it >doesn't use psql at all. > >There's a bunch of work to do here, but for a morning's work it's not too bad >:-) Luckily I had most of the first file already to hand. Yay! >Next I plan to look at some of the recovery tests and other uses of >background_psql, which might be more challenging,a dn require extension of the >session API. Also there's a lot of error checking and documentation that need >to be added. I'd suggest trying to convert the various looping constructs first, they're responsible for a large number of spawned shells. And I vaguely recall that there were none/very few that depend on actually being run via psql. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: RFC: adding pytest as a supported test framework
Robert Haas writes: > I mean, both Perl and Python are Turing-complete. Anything you can do > in one, you can do in the other, especially when you consider that > we're not likely to accept too many dependencies on external Perl or > Python modules. That's why I see this as nothing more or less than > exercise in letting people use the programming language they prefer. I think that's an oversimplified analysis. Sure, the languages are both Turing-complete, but for our purposes here they are both simply glue languages around some set of testing facilities. Some of those facilities will be provided by the base languages (plus whatever extension modules we choose to require) and some by code we write. The overall experience of writing tests will be determined by the testing facilities far more than by the language used for glue. That being the case, I do agree with your point that Python equivalents to most of PostgreSQL::Test will need to be built up PDQ. Maybe they can be better than the originals, in features or ease of use, but "not there at all" is not better. But what I'd really like to see is some comparison of the language-provided testing facilities that we're proposing to depend on. Why is pytest (or whatever) better than Test::More? I also wonder about integration of python-based testing with what we already have. A significant part of what you called the meson work had to do with persuading pg_regress, isolationtester, etc to output test results in the common format established by TAP. Am I right in guessing that pytest will have nothing to do with that? Can we even manage to dump perl and python test scripts into the same subdirectory and sort things out automatically? I'm definitely going to be -1 for a python testing feature that cannot integrate with what we have because it demands its own control and result-collection infrastructure. regards, tom lane
Re: libpq contention due to gss even when not using gss
> On Fri, Jun 14, 2024 at 12:12:55PM GMT, Daniel Gustafsson wrote: > > I've been experimenting with both: > > > > * The server is built without gssapi, but the client does support it. > > This produces exactly the contention you're talking about. > > > > * The server is built with gssapi, but do not use it in pg_hba, the > > client does support gssapi. In this case the difference between > > gssencmode=disable/prefer is even more dramatic in my test case > > (milliseconds vs seconds) due to the environment with configured > > kerberos (for other purposes, thus gss_init_sec_context spends huge > > amount of time to still return nothing). > > > > At the same time after quick look I don't see an easy way to avoid that. > > Current implementation tries to initialize gss before getting any > > confirmation from the server whether it's supported. Doing this other > > way around would probably just shift overhead to the server side. > > The main problem seems to be that we check whether or not there is a > credential > cache when we try to select encryption but not yet authentication, as a way to > figure out if gssenc it as all worth trying? Yep, this is my understanding as well. Which other methods did you try for checking that?
Re: libpq contention due to gss even when not using gss
Hi, On 2024-06-14 10:46:04 +0200, Dmitry Dolgov wrote: > At the same time after quick look I don't see an easy way to avoid that. > Current implementation tries to initialize gss before getting any > confirmation from the server whether it's supported. Doing this other > way around would probably just shift overhead to the server side. Initializing the gss cache at all isn't so much the problem. It's that we do it for every connection. And that doing so requires locking inside gss. So maybe we could just globally cache that gss isn't available, instead of rediscovering it over and over for every new connection. Greetings, Andres Freund
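The shape of that could be as simple as the following (a rough sketch with invented names, not a patch; a real version would also have to consider credentials that only appear after the first connection attempt):

    #include <gssapi/gssapi.h>

    /* process-wide negative cache: once we know there is no credential
     * cache, later connections skip the expensive GSS lookup entirely */
    static bool gss_cred_known_absent = false;

    static bool
    have_gss_cred_cache(void)
    {
        OM_uint32       maj_stat,
                        min_stat;
        gss_cred_id_t   cred = GSS_C_NO_CREDENTIAL;

        if (gss_cred_known_absent)
            return false;

        maj_stat = gss_acquire_cred(&min_stat, GSS_C_NO_NAME, 0,
                                    GSS_C_NO_OID_SET, GSS_C_INITIATE,
                                    &cred, NULL, NULL);
        if (maj_stat != GSS_S_COMPLETE)
        {
            gss_cred_known_absent = true;
            return false;
        }

        gss_release_cred(&min_stat, &cred);
        return true;
    }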
Re: Shouldn't jsonpath .string() Unwrap?
> On Jun 14, 2024, at 11:25, Chapman Flack wrote: > > I would s/extepsions/exceptions/ in the added documentation. :) Bah, fixed and attached, thanks. > Offhand (as GitHub PRs aren't really The PG Way), if they were The Way, > I would find this one a little hard to follow, being based at a point > 28 unrelated commits ahead of the ref it's opened against. I suspect > 'master' on theory/postgres could be fast-forwarded to f1affb6 and then > the PR would look much more approachable. Yeah, I pushed the PR and branch before I synced master, and GitHub was taking a while to notice and update the PR. I fixed it with `git commit --all --amend --date now --reedit-message HEAD` and force-pushed (then fixed the typo and fixed again). D v2-0001-Teach-jsonpath-string-to-unwrap-in-lax-mode.patch Description: Binary data
Re: RFC: adding pytest as a supported test framework
Hi, On 2024-06-14 11:49:29 -0400, Tom Lane wrote: > I also wonder about integration of python-based testing with what > we already have. A significant part of what you called the meson > work had to do with persuading pg_regress, isolationtester, etc > to output test results in the common format established by TAP.

FWIW, meson's testrunner doesn't require TAP; the lowest common denominator is just an exit code. However, for things that run many "sub" tests, it's a lot nicer if the failures can be reported more granularly than just "the entire testsuite failed". Meson currently supports:

exitcode: the executable's exit code is used by the test harness to record the outcome of the test
tap: Test Anything Protocol
gtest (since 0.55.0): for Google Tests
rust (since 0.56.0): for native rust tests

> Am I right in guessing that pytest will have nothing to do with that?

Looks like there's a plugin for pytest to support tap as output: https://pypi.org/project/pytest-tap/

However, it's not available as a debian package. I know that some folks just advocate installing dependencies via venv, but I personally don't think that'll fly. For one, it'll basically prevent tests being run by packagers.

> Can we even manage to dump perl and python test scripts into the same > subdirectory and sort things out automatically?

That shouldn't be much of a problem.

> I'm definitely going to be -1 for a python testing feature that cannot > integrate with what we have because it demands its own control and > result-collection infrastructure.

Ditto.

Greetings, Andres Freund
Re: RFC: adding pytest as a supported test framework
On 2024-06-14 09:11:11 -0700, Andres Freund wrote: > On 2024-06-14 11:49:29 -0400, Tom Lane wrote: > > Am I right in guessing that pytest will have nothing to do with that? > > Looks like there's a plugin for pytest to support tap as output: > https://pypi.org/project/pytest-tap/ > > However, it's not available as a debian package. I know that some folks just > advocate installing dependencies via venv, but I personally don't think > that'll fly. For one, it'll basically prevent tests being run by packagers. If this were the blocker, I think we could just ship an output adapter ourselves. pytest-tap is not a lot of code: https://github.com/python-tap/pytest-tap/blob/main/src/pytest_tap/plugin.py So either vendoring it or just writing an even simpler version ourselves seems entirely feasible.
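To give a sense of scale, a stripped-down adapter really is tiny. Here is a hedged sketch of the idea as a conftest.py plugin (this is not pytest-tap, just an illustration of the hooks involved):

    # conftest.py -- emit TAP from pytest results (illustrative sketch only)
    import pytest

    _results = []

    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_makereport(item, call):
        outcome = yield
        report = outcome.get_result()
        # record one line per test: the "call" phase, plus setup-time skips
        if report.when == "call" or (report.when == "setup" and report.skipped):
            _results.append((item.nodeid, report.outcome))

    def pytest_sessionfinish(session, exitstatus):
        # TAP permits the plan line to trail the results
        for i, (nodeid, outcome) in enumerate(_results, start=1):
            if outcome == "passed":
                print(f"ok {i} - {nodeid}")
            elif outcome == "skipped":
                print(f"ok {i} - {nodeid} # SKIP")
            else:
                print(f"not ok {i} - {nodeid}")
        print(f"1..{len(_results)}")

In practice you would also need to keep pytest's normal console report from interleaving with the TAP stream, but the amount of code is clearly not the hard part.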
jsonpath: Missing regex_like && starts with Errors?
Hackers, I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, raise an error when the operand is not a string (or array of strings): david=# select jsonb_path_query('true', '$ like_regex "^hi"'); jsonb_path_query -- null (1 row) david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"'); jsonb_path_query -- null (1 row) This is true in strict and lax mode, and with verbosity enabled (as in these examples). Most other operators raise an error when they can’t operate on the operand: david=# select jsonb_path_query('{"x": "hi"}', '$.integer()'); ERROR: jsonpath item method .integer() can only be applied to a string or numeric value david=# select jsonb_path_query('{"x": "hi"}', '$+$'); ERROR: left operand of jsonpath operator + is not a single numeric value Should `like_regex` and `starts with` adopt this behavior, too? I note that filter expressions seem to suppress these sorts of errors, but I assume that’s by design: david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ starts with "^hi")'); jsonb_path_query -- (0 rows) david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ like_regex "^hi")'); jsonb_path_query -- (0 rows) david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@.integer() == 1)'); jsonb_path_query -- (0 rows) D
Re: Using LibPq in TAP tests via FFI
Hi, On 2024-06-14 11:11:38 -0400, Andrew Dunstan wrote: > On 2024-06-14 Fr 11:09, Andrew Dunstan wrote: > > Over at [1] Andres expressed enthusiasm for enabling TAP tests to call > > LibPQ directly via FFI, and there was some support from others as well. > > Attached is a very rough POC for just that. There are two perl modules, > > one which wraps libpq (or almost all of it) in perl, and another which > > uses that module to create a session object that can be used to run SQL.

What are your current thoughts about a fallback for this? It seems possible to implement the session module on top of BackgroundPsql.pm, if necessary. But I suspect we'll eventually get to a point where that gets less and less convenient.

How much of a dependency is FFI::Platypus, compared to requiring perl headers to be installed? In case FFI::Platypus is a complicated dependency, a small XS wrapper could be an alternative.

Greetings, Andres Freund
Re: libpq contention due to gss even when not using gss
Andres Freund writes: > Initializing the gss cache at all isn't so much the problem. It's that we do > it for every connection. And that doing so requires locking inside gss. So > maybe we could just globally cache that gss isn't available, instead of > rediscovering it over and over for every new connection. I had the impression that krb5 already had such a cache internally. Maybe they don't cache the "failed" state though. I doubt we'd want to either in long-lived processes --- what if the user installs the credential while we're running? regards, tom lane
Re: libpq contention due to gss even when not using gss
Hi, On 2024-06-14 12:27:12 -0400, Tom Lane wrote: > Andres Freund writes: > > Initializing the gss cache at all isn't so much the problem. It's that we do > > it for every connection. And that doing so requires locking inside gss. So > > maybe we could just globally cache that gss isn't available, instead of > > rediscovering it over and over for every new connection. > > I had the impression that krb5 already had such a cache internally. Well, if so, it clearly doesn't seem to work very well, given that it causes contention at ~15k lookups/sec. That's obviously a trivial number for anything cached, even with the worst possible locking regimen. > Maybe they don't cache the "failed" state though. I doubt we'd > want to either in long-lived processes --- what if the user > installs the credential while we're running? If we can come up with something better - cool. But it doesn't seem great that gss introduces contention for the vast majority of folks that use libpq in environments that never use gss. I don't think we should cache the set of credentials when gss is actually available on a process-wide basis, just the fact that gss isn't available at all. I think it's very unlikely for that fact to change while an application is running. And if it happens, requiring a restart in those cases seems an acceptable price to pay for what is effectively a niche feature. Greetings, Andres Freund
Re: Bugfix and improvements in multixact.c
On 14/06/2024 16:56, Maxim Orlov wrote: Hi! While working on a making multix xact offsets 64-bit [0] I've discovered a minor issue. The thing is that type 'xid' is used in all macro, but it doesn't correct. Appropriate MultiXactId or MultiXactOffset should be used, actually. And the second thing, as Heikki Linnakangas points out, args naming is also misleading. Thanks! Looks good to me at a quick glance. I'll try to review and commit these properly by Monday. Additionally, I made an optional patch 0003 to switch from macro to inline functions. For me, personally, use of macro functions is justified if we are dealing with different argument types, to make polymorphic call. Which is not the case here. So, we can have more control over types and still generate the same code in terms of speed. See https://godbolt.org/z/KM8voadhs Starting from O1 function is inlined, thus no overhead is noticeable. Anyway, it's up to the actual commiter to decide does it worth it or not. Again, this particular patch 0003 is completely optional. I agree static inline functions are generally easier to work with than macros. These particular macros were not too bad, though. I'll bite the bullet and commit this one too unless someone objects. It's late in the v17 release cycle, but these are local to multixact.c so there's no risk of breaking extensions, and it seems good to do it now since we're modifying the macros anyway. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Allow non-superuser to cancel superuser tasks.
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote: > This patch looks good to me. Thanks for looking. > Spellchecker is complaining about "signaling" instead of "signalling", > but ISTM it's OK. I think this is an en-US versus en-GB thing. We've standardized on en-US for "cancel" (see commits 8c9da14, 21f1e15, and af26857), so IMO we might as well do so for "signal," too. -- nathan
Re: RFC: adding pytest as a supported test framework
On Thu, Jun 13, 2024 at 1:08 PM Jelte Fennema-Nio wrote: > But Perl is at the next level of unmaintained infrastructure. It is > actually clear how you can contribute to it, but still no new > community members actually want to contribute to it. Also, it's not > only unmaintained by us but it's also pretty much unmaintained by the > upstream community. I am not happy with the state of Perl, as it has made some MAJOR missteps along the way, particularly in the last 5 years. But can we dispel this strawman? There is a difference between "unpopular" and "unmaintained". The latest version of Perl was released May 20, 2024. The latest release of Test::More was April 25, 2024. Both are heavily used. Just not as heavily as they used to be. :) Cheers, Greg
Re: RFC: adding pytest as a supported test framework
On Fri, 14 Jun 2024 at 22:33, Greg Sabino Mullane wrote: > I am not happy with the state of Perl, as it has made some MAJOR missteps > along the way, particularly in the last 5 years. But can we dispel this > strawman? There is a difference between "unpopular" and "unmaintained". The > latest version of Perl was released May 20, 2024. The latest release of > Test::More was April 25, 2024. Both are heavily used. Just not as heavily as > they used to be. :) Sorry, yes I exaggerated here. Looking at the last Perl changelog[1] it's definitely getting more new features and improvements than I had thought. Test::More on the other hand, while indeed still maintained, it's definitely not getting significant new feature development or improvements[2]. Especially when comparing it to pytest[3]. [1]: https://perldoc.perl.org/perldelta [2]: https://github.com/Test-More/test-more/blob/master/Changes [3]: https://docs.pytest.org/en/stable/changelog.html
Re: RFC: adding pytest as a supported test framework
On Fri, Jun 14, 2024 at 9:24 AM Robert Haas wrote: > For example, the fact that > nobody's helping Thomas maintain this cfbot that we all have come to > rely on, or helping him get that integrated into > commitfest.postgresql.org, is a problem. I've been talking to Magnus and Jelte about cfbot and we're hoping to have some good news soon...
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Wed, Jun 12, 2024 at 09:41:01AM -0500, Nathan Bossart wrote: > Actually, I think you are right that we need a manual checkpoint, except I > think we need it to be after prepare_new_globals(). set_frozenxids() > connects to each database (including template0) and updates a bunch of > pg_class rows, and we probably want those on disk before we start copying > the files to create all the user's databases. Here is an updated patch. -- nathan >From c3aeed33b351475896c2b4002cff125f1edeeb42 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 14 Jun 2024 16:25:04 -0500 Subject: [PATCH v3 1/1] Use CREATE DATABASE ... STRATEGY = FILE_COPY in pg_upgrade. While the strategy is considered very costly due to expensive checkpoints when the target system has configured large shared_buffers, the issues that those checkpoints protect against don't apply in a pg_upgrade system. By skipping the checkpoints during CREATE DATABASE ... STRATEGY=FILE_COPY and using that strategy for pg_upgrade, we can save a lot of time and IO that would otherwise be spent on reading pages from disk, generating WAL records, writing those WAL records to disk, and then writing the pages to disk - which has the potential of at least double the IO of FILE_COPY. Co-authored-by: Matthias van de Meent Reviewed-by: Ranier Vilela, Dilip Kumar Discussion: https://postgr.es/m/Zl9ta3FtgdjizkJ5%40nathan --- src/backend/commands/dbcommands.c | 18 +++--- src/bin/pg_dump/pg_dump.c | 9 - src/bin/pg_upgrade/pg_upgrade.c | 12 3 files changed, 35 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index be629ea92c..99d7a9ba3c 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -563,9 +563,14 @@ CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid, * happened while we're copying files, a file might be deleted just when * we're about to copy it, causing the lstat() call in copydir() to fail * with ENOENT. +* +* In binary upgrade mode, we can skip this checkpoint because we are +* careful to ensure that template0 is fully written to disk prior to any +* CREATE DATABASE commands. */ - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | - CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL); + if (!IsBinaryUpgrade) + RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | + CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL); /* * Iterate through all tablespaces of the template database, and copy each @@ -657,10 +662,17 @@ CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid, * seem to be much we can do about that except document it as a * limitation. * +* In binary upgrade mode, we can skip this checkpoint because neither of +* these problems applies: we don't ever replay the WAL generated during +* pg_upgrade, and we don't concurrently modify template0 (not to mention +* that trying to take a backup during pg_upgrade is pointless). +* * See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE * strategy that avoids these problems. */ - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); + if (!IsBinaryUpgrade) + RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | + CHECKPOINT_WAIT); } /* diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index e324070828..1e51a63442 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3123,10 +3123,17 @@ dumpDatabase(Archive *fout) * since those can't be altered later. 
Other DB properties are left to * the DATABASE PROPERTIES entry, so that they can be applied after * reconnecting to the target DB. +* +* For binary upgrade, we use the FILE_COPY strategy because testing has +* shown it to be faster. When the server is in binary upgrade mode, it +* will also skip the checkpoints normally triggered by CREATE DATABASE +* STRATEGY = FILE_COPY. */ if (dopt->binary_upgrade) { - appendPQExpBuffer(creaQry, "CREATE DATABASE %s WITH TEMPLATE = template0 OID = %u", + appendPQExpBuffer(creaQry, + "CREATE DATABASE %s WITH TEMPLATE = template0 " + "OID = %u STRATEGY = FILE_COPY", qdatname, dbCatId.oid); } else diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c index af370768b6..45f095a230 100644 --- a/
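For concreteness, here is the shape of command this patch changes (an illustrative sketch, not taken from the patch itself; the database name and OID are placeholders):

-- What pg_dump --binary-upgrade would emit with the patch applied; because
-- the server is in binary upgrade mode, the two checkpoints that FILE_COPY
-- normally requires are skipped:
CREATE DATABASE "mydb" WITH TEMPLATE = template0 OID = 16384 STRATEGY = FILE_COPY;

-- Outside of pg_upgrade the same strategy can still be requested explicitly,
-- but then the checkpoints are still performed:
CREATE DATABASE mydb_copy TEMPLATE = template0 STRATEGY = FILE_COPY;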
Re: RFC: adding pytest as a supported test framework
On Fri, 14 Jun 2024 at 17:49, Tom Lane wrote:
> But what I'd really like to see is some comparison of the
> language-provided testing facilities that we're proposing
> to depend on. Why is pytest (or whatever) better than Test::More?

Some advantages of pytest over Test::More:

1. It's much easier to debug failing tests using the output that pytest gives. A good example of this is on the pytest homepage[1] (e.g. it shows the value passed to the call to inc in the error output).
2. There's no need to remember a specific comparison DSL (is/isnt/is_deeply/like/ok/cmp_ok/isa_ok); just put assert in front of a boolean expression and the output is great (if you want to add a message for clarity you can use: assert a == b, "the world is ending").
3. It's very easy to print postgres log files to stderr/stdout when a test fails. This might be possible/easy with Perl too, but we currently don't do that. So right now, for many failures, you're forced to traverse the build/testrun/... directory tree to find the logs.
4. Pytest has autodiscovery of test files and functions, so we probably wouldn't have to specify all of the exact test files anymore in the meson.build files.

Regarding 2, there are ~150 checks that are using a suboptimal way of testing for a comparison. Most of them could use "like(..., ...)" instead of "ok(... =~ ...)":

❯ grep '\bok(.*=' **.pl | wc -l
151

[1]: https://docs.pytest.org/en/8.2.x/#a-quick-example

Re: tiny step toward threading: reduce dependence on setlocale()
On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote: > > I think this patch series is a nice cleanup, as well, making libc > more > like the other providers and not dependent on global state. New rebased series attached with additional cleanup. Now that pg_locale_t is never NULL, we can simplify the way the collation cache works, eliminating ~100 lines. -- Jeff Davis PostgreSQL Contributor Team - AWS From d3862b88d8df3372ebdd368489a86142bd11f42c Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Wed, 5 Jun 2024 11:45:55 -0700 Subject: [PATCH v2 1/7] Make database default collation internal to pg_locale.c. --- src/backend/utils/adt/pg_locale.c | 64 ++- src/backend/utils/init/postinit.c | 35 ++--- src/include/utils/pg_locale.h | 3 +- 3 files changed, 66 insertions(+), 36 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 7e5bb2b703..29f16c49cb 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -56,6 +56,7 @@ #include "access/htup_details.h" #include "catalog/pg_collation.h" +#include "catalog/pg_database.h" #include "mb/pg_wchar.h" #include "miscadmin.h" #include "utils/builtins.h" @@ -116,6 +117,8 @@ char *localized_full_months[12 + 1]; /* is the databases's LC_CTYPE the C locale? */ bool database_ctype_is_c = false; +static struct pg_locale_struct default_locale; + /* indicates whether locale information cache is valid */ static bool CurrentLocaleConvValid = false; static bool CurrentLCTimeValid = false; @@ -1443,8 +1446,6 @@ lc_ctype_is_c(Oid collation) return (lookup_collation_cache(collation, true))->ctype_is_c; } -struct pg_locale_struct default_locale; - void make_icu_collator(const char *iculocstr, const char *icurules, @@ -1537,6 +1538,65 @@ pg_locale_deterministic(pg_locale_t locale) return locale->deterministic; } +void +pg_init_database_collation() +{ + HeapTuple tup; + Form_pg_database dbform; + Datum datum; + bool isnull; + + /* Fetch our pg_database row normally, via syscache */ + tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId)); + if (!HeapTupleIsValid(tup)) + elog(ERROR, "cache lookup failed for database %u", MyDatabaseId); + dbform = (Form_pg_database) GETSTRUCT(tup); + + if (dbform->datlocprovider == COLLPROVIDER_BUILTIN) + { + char *datlocale; + + datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); + datlocale = TextDatumGetCString(datum); + + builtin_validate_locale(dbform->encoding, datlocale); + + default_locale.info.builtin.locale = MemoryContextStrdup( + TopMemoryContext, datlocale); + } + else if (dbform->datlocprovider == COLLPROVIDER_ICU) + { + char *datlocale; + char *icurules; + + datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); + datlocale = TextDatumGetCString(datum); + + datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull); + if (!isnull) + icurules = TextDatumGetCString(datum); + else + icurules = NULL; + + make_icu_collator(datlocale, icurules, &default_locale); + } + else + { + Assert(dbform->datlocprovider == COLLPROVIDER_LIBC); + } + + default_locale.provider = dbform->datlocprovider; + + /* + * Default locale is currently always deterministic. Nondeterministic + * locales currently don't support pattern matching, which would break a + * lot of things if applied globally. + */ + default_locale.deterministic = true; + + ReleaseSysCache(tup); +} + /* * Create a locale_t from a collation OID. Results are cached for the * lifetime of the backend. 
Thus, do not free the result with freelocale(). diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 0805398e24..6347efdd5a 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -423,43 +423,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect strcmp(ctype, "POSIX") == 0) database_ctype_is_c = true; - if (dbform->datlocprovider == COLLPROVIDER_BUILTIN) - { - datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); - datlocale = TextDatumGetCString(datum); - - builtin_validate_locale(dbform->encoding, datlocale); - - default_locale.info.builtin.locale = MemoryContextStrdup( - TopMemoryContext, datlocale); - } - else if (dbform->datlocprovider == COLLPROVIDER_ICU) - { - char *icurules; + pg_init_database_collation(); - datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale); + datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datlocale, &isnull); + if (!isnull) datlocale = TextDatumGetCString(datum); - - datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull); - if (!isnull) - icurules = TextDatumGetCString(datum); - else - icurules = NULL; - - make_icu_collator(datlocale, icurules,
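For reference, the catalog fields that the new pg_init_database_collation() consults for the current database can be inspected like this (a query of my own, not part of the patch series; datlocale is the newer name of what was daticulocale):

-- Which provider and locale settings will seed default_locale for this DB?
SELECT datname, datlocprovider, datcollate, datctype, datlocale, daticurules
  FROM pg_database
 WHERE datname = current_database();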
Re: Using LibPq in TAP tests via FFI
On Sat, Jun 15, 2024 at 3:33 AM Robert Haas wrote: > I haven't reviewed the patch, but a big +1 for the idea. Not only this > might cut down on the resource costs of running the tests in CI, as It would be good to keep some context between the threads here. For the archives' sake, here is where the potential savings were reported, and this and other ideas were discussed: https://www.postgresql.org/message-id/flat/CA%2BhUKGJoEO33K%3DZynsH%3DxkiEyfBMZjOoqBK%2BgouBdTGW2-woig%40mail.gmail.com
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Robert Haas writes:
> I think the only thing we absolutely have to fix here is the dangling
> ACL references.

Here's a draft patch that takes care of Hannu's example, and I think it fixes other potential dangling-reference scenarios too. I'm not sure whether I like the direction this is going, but I do think we may not be able to do a lot more than this for v17.

The core change is to install SHARED_DEPENDENCY_INITACL entries in pg_shdepend for all references to non-pinned roles in pg_init_privs, whether they are the object's owner or not. Doing that ensures that we can't drop a role that is still referenced there, and it provides a basis for shdepDropOwned and shdepReassignOwned to take some kind of action for such references. The semantics I've implemented on top of that are:

* ALTER OWNER does not touch pg_init_privs entries.

* REASSIGN OWNED replaces pg_init_privs references with the new role, whether the references are as grantor or grantee.

* DROP OWNED removes pg_init_privs mentions of the doomed role (as grantor or grantee), removing the pg_init_privs entry altogether if it goes to zero clauses. (This is what happened already, but only if a SHARED_DEPENDENCY_INITACL entry existed.)

I'm not terribly thrilled with this, because it's still possible to get into a state where a pg_init_privs entry is based on an owning role that's no longer the owner: you just have to use ALTER OWNER directly rather than via REASSIGN OWNED. While I don't think the backend has much problem with that, it probably will confuse pg_dump to some extent. However, such cases are going to confuse pg_dump anyway for reasons discussed upthread, namely that we don't really support dump/restore of extensions where not all the objects are owned by the extension owner. I'm content to leave that in the pile of unfinished work for now.

An alternative definition could be that ALTER OWNER also replaces the old role with the new one in pg_init_privs entries. That would reduce the surface for confusing pg_dump a little bit, but I don't think that I like it better, for two reasons:

* ALTER OWNER would have to probe pg_init_privs for potential entries every time, which would be wasted work for most ALTERs.

* This gets away from the notion that pg_init_privs should be a historical record of the state that existed at the end of CREATE EXTENSION. Now, maybe that notion is unworkable anyway, but I don't want to let go of it before we're sure about that.

A couple of more-minor points for review:

* As this stands, updateInitAclDependencies() no longer pays any attention to its ownerId argument, and in one place I depend on that to skip doing a rather expensive lookup of the current object owner. Perhaps we should remove that argument altogether, and in consequence simplify some other callers too? However, that would only help much if we were willing to revert 534287403's changes to pass the object's owner ID to recordExtensionInitPriv, which I'm hesitant to do because I suspect we'll end up wanting to record the owner ID explicitly in pg_init_privs entries. On the third hand, maybe we'll never do that, so perhaps we should revert those changes for now; some of them add nontrivial lookup costs.

* In shdepReassignOwned, I refactored to move the switch on sdepForm->classid into a new subroutine. We could have left it where it is, but it would need a couple more tab stops of indentation, which felt like too much. It's in the eye of the beholder though.

Comments?
regards, tom lane diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 2f52e70b48..a63cc71efa 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7182,9 +7182,6 @@ SCRAM-SHA-256$:&l The referenced object (which must be a role) is mentioned in a pg_init_privs entry for the dependent object. - (A SHARED_DEPENDENCY_INITACL entry is not made for - the owner of the object, since the owner will have - a SHARED_DEPENDENCY_OWNER entry anyway.) diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 143876b77f..ad61e2e4a1 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -4846,6 +4846,124 @@ recordExtensionInitPrivWorker(Oid objoid, Oid classoid, int objsubid, table_close(relation, RowExclusiveLock); } +/* + * ReplaceRoleInInitPriv + * + * Used by shdepReassignOwned to replace mentions of a role in pg_init_privs. + */ +void +ReplaceRoleInInitPriv(Oid oldroleid, Oid newroleid, + Oid classid, Oid objid, int32 objsubid) +{ + Relation rel; + ScanKeyData key[3]; + SysScanDesc scan; + HeapTuple oldtuple; + Datum oldAclDatum; + bool isNull; + Acl *old_acl; + Acl *new_acl; + HeapTuple newtuple; + int noldmembers; + int nnewmembers; + Oid *oldmembers; + Oid *newmembers; + + /* Search for existing pg_init_privs entry for the target object. */ + rel = table_open(Init
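As an illustration of the intended semantics (my own sketch, not taken from the patch or its tests; the extension, table, and role names are hypothetical), assume an extension whose install script grants privileges to a non-pinned role:

-- Suppose extension "myext" creates table myext_tab and its script runs:
--   GRANT SELECT ON myext_tab TO alice;
-- so that pg_init_privs records alice as a grantee.

-- With the patch, a SHARED_DEPENDENCY_INITACL ('i') entry now exists for
-- alice even though she is not the table's owner, so dropping her outright
-- would be blocked:
SELECT d.classid::regclass, d.objid, d.deptype
  FROM pg_shdepend d
  JOIN pg_authid a ON a.oid = d.refobjid
 WHERE a.rolname = 'alice' AND d.deptype = 'i';

-- REASSIGN OWNED rewrites alice's mentions in pg_init_privs to bob, while
-- DROP OWNED simply removes them (dropping the row if it becomes empty):
REASSIGN OWNED BY alice TO bob;
-- or
DROP OWNED BY alice;
DROP ROLE alice;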
Speed up collation cache
The blog post here (thank you depesz!): https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/ showed an interesting result where the builtin provider is not quite as fast as "C" for queries like:

SELECT * FROM a WHERE t = '...';

The reason is that it's calling varstr_cmp() many times, which does a lookup in the collation cache for each call. For sorts, it only does a lookup in the collation cache once, so the effect is not significant.

The reason looking up "C" is faster is that there's a special check for C_COLLATION_OID, so it doesn't even need to do the hash lookup. If you create an equivalent collation like:

CREATE COLLATION libc_c (PROVIDER = libc, LOCALE = 'C');

it will perform the same as a collation with the builtin provider.

Attached is a patch to use simplehash.h instead, which speeds things up enough to make them fairly close (from around 15% slower to around 8%). The patch is based on the series here: https://postgr.es/m/f1935bc481438c9d86c2e0ac537b1c110d41a00a.ca...@j-davis.com which does some refactoring in a related area, but I can make them independent.

We can also consider what to do about those special cases:

* add a special case for PG_C_UTF8?
* instead of a hardwired set of special collation IDs, have a single-element "last collation ID" to check before doing the hash lookup?
* remove the special cases entirely if we can close the performance gap enough that it's not important?

(Note: the special case in lc_ctype_is_c() is currently required for correctness because hba.c uses C_COLLATION_OID for regexes before the syscache is initialized. That can be fixed pretty easily a couple different ways, though.)

-- Jeff Davis PostgreSQL Contributor Team - AWS

From 777186a41955da8d05929f5c34e531dfa985b513 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Fri, 14 Jun 2024 15:38:42 -0700 Subject: [PATCH v2 7/7] Change collation cache to use simplehash.h.
--- src/backend/utils/adt/pg_locale.c | 39 +-- 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 435a37a0e3..b71ca2d780 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -57,12 +57,12 @@ #include "access/htup_details.h" #include "catalog/pg_collation.h" #include "catalog/pg_database.h" +#include "common/hashfn.h" #include "mb/pg_wchar.h" #include "miscadmin.h" #include "utils/builtins.h" #include "utils/formatting.h" #include "utils/guc_hooks.h" -#include "utils/hsearch.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/pg_locale.h" @@ -129,10 +129,27 @@ typedef struct { Oid collid; /* hash key: pg_collation OID */ pg_locale_t locale; /* locale_t struct, or 0 if not valid */ -} collation_cache_entry; -static HTAB *collation_cache = NULL; + /* needed for simplehash */ + uint32 hash; + char status; +} collation_cache_entry; +#define SH_PREFIX collation_cache +#define SH_ELEMENT_TYPE collation_cache_entry +#define SH_KEY_TYPE Oid +#define SH_KEY collid +#define SH_HASH_KEY(tb, key) hash_uint32((uint32) key) +#define SH_EQUAL(tb, a, b) (a == b) +#define SH_GET_HASH(tb, a) a->hash +#define SH_SCOPE static inline +#define SH_STORE_HASH +#define SH_DECLARE +#define SH_DEFINE +#include "lib/simplehash.h" + +static MemoryContext CollationCacheContext = NULL; +static collation_cache_hash *CollationCache = NULL; #if defined(WIN32) && defined(LC_MESSAGES) static char *IsoLocaleName(const char *); @@ -1235,18 +1252,16 @@ lookup_collation_cache(Oid collation) Assert(OidIsValid(collation)); Assert(collation != DEFAULT_COLLATION_OID); - if (collation_cache == NULL) + if (CollationCache == NULL) { - /* First time through, initialize the hash table */ - HASHCTL ctl; - - ctl.keysize = sizeof(Oid); - ctl.entrysize = sizeof(collation_cache_entry); - collation_cache = hash_create("Collation cache", 100, &ctl, - HASH_ELEM | HASH_BLOBS); + CollationCacheContext = AllocSetContextCreate(TopMemoryContext, + "collation cache", + ALLOCSET_DEFAULT_SIZES); + CollationCache = collation_cache_create( + CollationCacheContext, 128, NULL); } - cache_entry = hash_search(collation_cache, &collation, HASH_ENTER, &found); + cache_entry = collation_cache_insert(CollationCache, collation, &found); if (!found) { /* -- 2.34.1
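To reproduce the effect being measured (a sketch in the spirit of the blog post's scenario; the table and data are made up), note that the per-row comparison looks the collation up in the cache each time, which is exactly what the simplehash.h conversion makes cheaper:

-- As noted above, an explicit "C" collation under the libc provider does not
-- hit the hardwired C_COLLATION_OID fast path, so it exercises the cache the
-- same way the builtin provider does:
CREATE COLLATION libc_c (PROVIDER = libc, LOCALE = 'C');

CREATE TABLE a (t text COLLATE libc_c);
INSERT INTO a SELECT 'row ' || g FROM generate_series(1, 1000000) g;

-- The comparison below is evaluated once per row, and each evaluation does a
-- collation-cache lookup; with the patch that lookup goes through simplehash:
SELECT * FROM a WHERE t = 'row 123456';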
Re: jsonpath: Missing regex_like && starts with Errors?
On 06/14/24 12:21, David E. Wheeler wrote:
> I noticed that neither `regex_like` nor `starts with`, the jsonpath
> operators, raise an error when the operand is not a string (or array of
> strings):
>
> david=# select jsonb_path_query('true', '$ like_regex "^hi"');
>  jsonb_path_query
> ------------------
>  null
> (1 row)
>
> david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"');
>  jsonb_path_query
> ------------------
>  null
> (1 row)

To begin with, both of those path queries should have been rejected at the parsing stage, just like the one David G. Johnston pointed out:

On 06/13/24 22:14, David G. Johnston wrote:
> On Thursday, June 13, 2024, Chapman Flack wrote:
>> On 06/13/24 21:46, David G. Johnston wrote:
>>> david=# select jsonb_path_query('1', '$ >= 1');
>> Good point. I can't either. No way I can see to parse that as a <JSON path wff>.
> Whether we note it as non-standard or not is an open question then, but it does work and opens up a documentation question.

All of these are appearing where a <JSON path wff> is needed, and that's not allowed in the standard. Strictly speaking, the only place a <JSON path predicate> can appear is within a <JSON filter expression>.

So I should go look at our code to see what grammar we've implemented, exactly. It is beginning to seem as if we have simply added <JSON path predicate> as another choice for an expression, not restricted to only appearing in a filter. If so, and we add documentation about how we diverge from the standard, that's probably the way to say it.

On 06/13/24 22:14, David G. Johnston wrote:
> I don't get why the outcome of a boolean producing operation isn't just
> generally allowed to be produced

I understand; after all, what is a 'predicate' but another 'boolean producing operation'? But the committee (at least in this edition) has stuck us with this clear division in the grammar: there is no <JSON path predicate>, boolean as it may be, that can appear as a <JSON path wff>, and there is no <JSON path predicate> that can appear outside of a filter and be treated as a boolean-valued expression.

As for the error behavior of a <JSON path predicate> (which strictly, again, can only appear inside a <JSON filter expression>), the standard says what seems to be the same thing, in a couple of different ways. In 4.48.5, "Overview of SQL/JSON path language", this is said: "The SQL/JSON path language traps any errors that occur during the evaluation of a <JSON path predicate>. Depending on the precise <JSON path predicate> ... the result may be Unknown, True, or False, ...". Later, in 9.46's General Rules, where the specific semantics of the various predicates are laid out, each predicate has rules spelling out which of Unknown, True, or False results when an error condition is encountered (usually Unknown, except where something already seen allows returning True or False). Finally, the <JSON filter expression> itself collapses the three-valued logic to two; it includes the items for which the predicate returns True, and excludes them for False or Unknown. So that's where the errors went.

The question of what should happen to the errors when a <JSON path predicate> appears outside of a <JSON filter expression> of course isn't answered in the standard, because that's not supposed to be possible. So if we're allowing predicates to appear on their own as expressions, it's also up to us to say what should happen with errors when they do.

Regards,
-Chap
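A small demonstration of the trapping behavior described above, using the same functions as in the quoted examples (the queries here are mine; the expected results reflect current PostgreSQL behavior):

-- Inside a filter, an error in the predicate yields Unknown and the
-- filter simply excludes that item: only "hi" is returned here, with no
-- error reported for the number 1.
SELECT jsonb_path_query('[1, "hi"]', '$[*] ? (@ like_regex "^h")');

-- The same predicate at top level (outside any filter) is the PostgreSQL
-- extension under discussion; today it returns null rather than raising
-- an error:
SELECT jsonb_path_query('true', '$ like_regex "^hi"');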
Re: jsonpath: Missing regex_like && starts with Errors?
On 06/14/24 22:29, Chapman Flack wrote:
> So I should go look at our code to see what grammar we've implemented,
> exactly. It is beginning to seem as if we have simply added <JSON path predicate>
> as another choice for an expression, not restricted
> to only appearing in a filter. If so, and we add documentation about how
> we diverge from the standard, that's probably the way to say it.

That's roughly what we've done:

119 result:
120 	mode expr_or_predicate	{
121 		...
125 	}
126 	| /* EMPTY */			{ *result = NULL; }
127 	;
128
129 expr_or_predicate:
130 	expr			{ $$ = $1; }
131 	| predicate		{ $$ = $1; }
132 	;

Oddly, that's only at the top-level goal production: we allow your entire JSON path query to be a predicate in lieu of an expr, but we still don't allow a predicate to appear in place of an expr within any other production.

Regards,
-Chap
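In other words (these examples are mine, not from the message), the divergence is confined to that top-level production:

-- A predicate as the entire path query is accepted (the PostgreSQL
-- "predicate check" extension):
SELECT jsonb_path_query('1', '$ >= 1');              -- returns true

-- A predicate inside a filter is the standard-conforming placement:
SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (@ >= 2)');

-- But, reading the grammar excerpt above, a predicate cannot serve as an
-- operand of a larger expression; something like '($ >= 1) + 1' should be
-- rejected by the parser (my inference from the excerpt, not a tested claim).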