Re: [PATCH] Add get_bytes() and set_bytes() functions

2025-03-07 Thread Dean Rasheed
On Sat, 1 Mar 2025 at 11:30, Dean Rasheed  wrote:
>
> With those updates, I think this is ready for commit, which I'll do in
> a day or two, if there are no further comments.
>

Committed.

Regards,
Dean




Re: Log connection establishment timings

2025-03-07 Thread Bertrand Drouvot
Hi,

On Thu, Mar 06, 2025 at 11:41:03AM -0500, Melanie Plageman wrote:
> 
> I still need to do some more manual testing and validation.
> 
> On Thu, Mar 6, 2025 at 9:56 AM Bertrand Drouvot
>  wrote:
> 
> > +   /* If an empty string was passed, we're done. */
> > +   if (list_length(elemlist) == 0)
> > +   return 0;
> > +
> < -- snip -->
> > what about storing the list_length(elemlist) at the start to avoid the 
> > multiple
> > list_length(elemlist) calls?
> 
> Since it would only be called twice

Oh right, I was focusing on the top loop but missed the "return" if the
length is > 1.

> and it has the length stored in
> the List struct, I prefer it as is -- without an extra local variable.

Yeah, me too, sorry for the noise.
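
(As an aside, list_length() is already a constant-time accessor; quoting its
definition from src/include/nodes/pg_list.h:)

static inline int
list_length(const List *l)
{
	return l ? l->length : 0;
}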

> How do you find spelling mistakes in patches usually?

Given that I'm not a native English speaker, I would say: by luck ;-)
I just try to do one pass during the review focusing only on those.

> I tried `git
> show | aspell list` -- but of course that includes lots of variables
> and such that aren't actually spelling mistakes.

Trying to "automate" is actually a good idea!

> Well, I actually didn't want to call it "timings" because it implies
> that we only measure the timings when it is specified. But we actually
> get the timings unconditionally for client backends and wal sender.

> Don't you think it is more confusing for the user if we call it
> timings and users think if they don't include that timings aren't
> measured?

That's a good question. I think the expectations are set in the log_connections
GUC documentation. It says "Causes the specified stages of each connection
attempt to the server to be logged". So to me that reads as: log the timings,
or don't.

Of course, to be logged the timings need to be measured, and one could expect
them not to be measured if "timings" is not set.

But in the end, what we're really interested in in this thread, given its
$SUBJECT, is to actually log the timings.

So yeah, from my point of view it would be better if the option name
highlighted the fact that it is about timing (and not only in its description,
as with ready_for_query). One option to address your concern could be to
re-word the doc a bit, saying something like "logs timings that are measured
independently of the GUC value".

That said, if you feel that it's too confusing to use "timings" then I'm fine
too.

> Also, I thought changing the message output to say "ready for query"
> somewhere in it makes it more clear when that message is actually
> emitted in a connection's lifetime. What do you think?

I agree if we keep ready_for_query as the option name.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Selectively invalidate caches in pgoutput module

2025-03-07 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for the comment! PSA new version.

> 
> Few comments:
> 1. Why do we need to invalidate relsync entries when owner of its
> publication changes?
> 
> I think the owner change will impact the future Alter Publication ...
> Add/Drop/Set/Rename operations as that will be allowed only to new
> owner (or super users), otherwise, there shouldn't be an impact on
> RelSyncCache entries. Am, I missing something?

Actually, I did not find a reason to invalidate entries even for the OWNER
command for now. I included it to 1) follow the current rule, and 2) prepare
for future updates.

1) - Currently, RelSyncCache entries are discarded even by ALTER PUBLICATION
 OWNER statements. I wanted to preserve that behavior.
2) - In future versions, privileges might be introduced for publications, so
 some publications would not be visible to other database users. In that
 case I feel we would have to modify RelationSyncEntry::pubactions, columns
 and exprstate, and thus entries would have to be discarded.

I have now removed the invalidations for the OWNER command. Let's revert the
change if we turn out to be missing something.

> 2.
> + if (pubform->puballtables)
> + {
> + CacheInvalidateRelSyncAll();
> + }
> + else
> + {
> + List*relids = NIL;
> + List*schemarelids = NIL;
> +
> + /*
> + * For partition table, when we insert data, get_rel_sync_entry is
> + * called and a hash entry is created for the corresponding leaf table.
> + * So invalidating the leaf nodes would be sufficient here.
> + */
> + relids = GetPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> + schemarelids = GetAllSchemaPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> +
> + relids = list_concat_unique_oid(relids, schemarelids);
> +
> + InvalidateRelSyncCaches(relids);
> + }
> +
> + CatalogTupleUpdate(rel, &tup->t_self, tup);
> 
> Shouldn't we need to update the CatalogTuple before invalidations.

Right, fixed.

> 3.
> + if (pubform->puballtables)
> + {
> + CacheInvalidateRelSyncAll();
> + }
> + else
> + {
> + List*relids = NIL;
> + List*schemarelids = NIL;
> +
> + /*
> + * For partition table, when we insert data, get_rel_sync_entry is
> + * called and a hash entry is created for the corresponding leaf table.
> + * So invalidating the leaf nodes would be sufficient here.
> + */
> + relids = GetPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> + schemarelids = GetAllSchemaPublicationRelations(pubform->oid,
> + PUBLICATION_PART_LEAF);
> +
> + relids = list_concat_unique_oid(relids, schemarelids);
> +
> + InvalidateRelSyncCaches(relids);
> + }
> 
> This is duplicate code. Can we write a function to eliminate this duplication?

Since that part has been removed from the OWNER command, the duplication is
gone. I did not introduce a function for this.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v5-0001-Introduce-a-new-invalidation-message-to-invalidat.patch
Description:  v5-0001-Introduce-a-new-invalidation-message-to-invalidat.patch


v5-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-RENA.patch
Description:  v5-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-RENA.patch


Re: Trigger more frequent autovacuums

2025-03-07 Thread wenhui qiu
Sorry, a wrong debug version of pcnt_visibletuples was attached; kindly check
the v3 attachment.
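
For context, here is a small sketch of the threshold math behind the DEBUG
lines quoted below, modeled on relation_needs_vacanalyze() in
src/backend/postmaster/autovacuum.c and assuming default settings
(autovacuum_vacuum_scale_factor = 0.2, autovacuum_analyze_scale_factor = 0.1,
both base thresholds = 50); an illustration only, not the patched code itself:

	float4		reltuples = 2289001.0;	/* pg_class.reltuples for join1 */
	float4		vacthresh;
	float4		anlthresh;

	vacthresh = (float4) 50 + 0.2 * reltuples;	/* = 457850.2 */
	anlthresh = (float4) 50 + 0.1 * reltuples;	/* = 228950.1 */

These match the logged "vacthresh: 457850.218750, anlthresh: 228950.109375",
and pcnt_visibletuples = n_live_tup / (n_live_tup + n_dead_tup)
= 2289001 / (2289001 + 319802) = 0.877414, as in the log excerpt.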

On Fri, Mar 7, 2025 at 5:37 PM wenhui qiu  wrote:

> Hi,
> The more accurate data I've found is tabentry->live_tuples; the
> second version uses it.
>
> #Here's a simple test I did
>
> test=# select count(*) from join1;
>   count
> ---------
>  2289001
> (1 row)
>
>
> test=# update join1 set name=md5(now()::text) where id<100;
> UPDATE 1938700
> test=# select 1938700/2289001;
>  ?column?
> ----------
> 0
> (1 row)
>
> test=# select 1938700/2289001::float;
>   ?column?
> --------------------
>  0.8469633696097119
> (1 row)
>
> test=#
>
>
>
> test=# select count(*) from join1;
>   count
> ---------
>  2289001
> (1 row)
> test=# update join1 set name=md5(now()::text) where id<=8;
> UPDATE 159901
> test=# select 159901/2289001::float;
>   ?column?
> ---------------------
>  0.06985623859491542
> (1 row)
>
>
> test=# select * from pg_stat_all_tables where relname='join1';
> -[ RECORD 1 ]--+--
> relid  | 16385
> schemaname | public
> relname| join1
> seq_scan   | 17
> last_seq_scan  | 2025-03-07 15:34:02.793659+08
> seq_tup_read   | 14994306
> idx_scan   | 7
> last_idx_scan  | 2025-03-07 15:34:23.404788+08
> idx_tup_fetch  | 500281
> n_tup_ins  | 2289001
> n_tup_upd  | 2268604
> n_tup_del  | 0
> n_tup_hot_upd  | 399
> n_tup_newpage_upd  | 2268205
> n_live_tup | 2286701
> n_dead_tup | 159901
> n_mod_since_analyze| 159901
> n_ins_since_vacuum | 0
> last_vacuum| 2025-03-06 18:18:11.318419+08
> last_autovacuum| 2025-03-07 15:25:53.055576+08
> last_analyze   | 2025-03-06 18:18:11.424253+08
> last_autoanalyze   | 2025-03-07 15:25:53.456656+08
> vacuum_count   | 3
> autovacuum_count   | 3
> analyze_count  | 2
> autoanalyze_count  | 4
> total_vacuum_time  | 205
> total_autovacuum_time  | 2535
> total_analyze_time | 203
> total_autoanalyze_time | 1398
>
> test=#
> test=# update join1 set name=md5(now()::text) where id<=8;
> UPDATE 159901
>
>
> test=# \x
> Expanded display is on.
> test=# select (n_live_tup)/(n_live_tup+n_dead_tup)::float from
> pg_stat_all_tables where relname='join1';
> -[ RECORD 1 ]
> ?column? | 0.8774142777358045
>
> test=# select *  from pg_stat_all_tables where relname='join1';
> -[ RECORD 1 ]--+--
> relid  | 16385
> schemaname | public
> relname| join1
> seq_scan   | 17
> last_seq_scan  | 2025-03-07 15:34:02.793659+08
> seq_tup_read   | 14994306
> idx_scan   | 8
> last_idx_scan  | 2025-03-07 15:46:38.331795+08
> idx_tup_fetch  | 660182
> n_tup_ins  | 2289001
> n_tup_upd  | 2428505
> n_tup_del  | 0
> n_tup_hot_upd  | 424
> n_tup_newpage_upd  | 2428081
> n_live_tup | 2289001
> n_dead_tup | 319802
> n_mod_since_analyze| 0
> n_ins_since_vacuum | 0
> last_vacuum| 2025-03-06 18:18:11.318419+08
> last_autovacuum| 2025-03-07 15:25:53.055576+08
> last_analyze   | 2025-03-06 18:18:11.424253+08
> last_autoanalyze   | 2025-03-07 15:47:35.950932+08
> vacuum_count   | 3
> autovacuum_count   | 3
> analyze_count  | 2
> autoanalyze_count  | 5
> total_vacuum_time  | 205
> total_autovacuum_time  | 2535
> total_analyze_time | 203
> total_autoanalyze_time | 1770
>
> test=#
> tail -n 1000 postgresql-Fri_17.csv |grep join1
> 2025-03-07 17:30:12.782 +08,,,755739,,67cabca4.b881b,3,,2025-03-07
> 17:30:12 +08,2017/2,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
> 228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
> 1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
> 2025-03-07 17:31:12.803 +08,,,756028,,67cabce0.b893c,3,,2025-03-07
> 17:31:12 +08,2003/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
> 228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
> 1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
> 2025-03-07 17:32:12.822 +08,,,756405,,67cabd1c.b8ab5,3,,2025-03-07
> 17:32:12 +08,2006/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
> 228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
> 1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
> 2025-03-07 17:33:12.842 +08,,,757026,,67cabd58.b8d22,3,,2025-03-07
> 17:33:12 +08,2009/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
> 228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
> 1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
>
> On Fri, Mar 7, 2025 at 2:22 PM wenhui qiu  wrote:
>
>> HI Nathan Bossa

Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Laurenz Albe
On Fri, 2025-02-28 at 19:16 +0100, Michael Banck wrote:
> New version 3 attached.

I am fine with v3, and I'll mark it "ready for committer".

I have been wondering about regression tests.
We cannot have them in the core tests, because we cannot rely on any
extension being available.
We could shove a test into a regression test for an existing contrib,
but that would be somewhat off-topic there.  Not sure what would be best.

Yours,
Laurenz Albe




Re: strange valgrind reports about wrapper_handler on 64-bit arm

2025-03-07 Thread Nathan Bossart
On Fri, Mar 07, 2025 at 12:03:47AM +0100, Tomas Vondra wrote:
> while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a
> couple reports like this:
> 
> ==64550== Use of uninitialised value of size 8
> ==64550==at 0xA62FE0: wrapper_handler (pqsignal.c:107)
> ==64550==by 0x580BB9E7: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==64550==  Uninitialised value was created by a stack allocation
> ==64550==at 0x4F94660: strcoll_l (strcoll_l.c:258)
> ==64550==
> {
>
>Memcheck:Value8
>fun:wrapper_handler
>obj:/usr/libexec/valgrind/memcheck-arm64-linux
> }
> **64550** Valgrind detected 1 error(s) during execution of "ANALYZE
> mcv_lists;"
> 
> The exact command varies, I don't think it's necessarily about analyze
> or extended stats.
> 
> The line the report refers to is this:
> 
> (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
> 
> so I guess it can't be about postgres_signal_arg (as that's an int). But
> that leaves just pqsignal_handlers, and why would that be uninitialized?
> 
> The closest thing I found in archives is [1] from about a year ago, but
> we haven't found any clear explanation there either :-(

Hm.  The pointer to strcoll_l makes me wonder if there might be an issue in
one of the handler functions that wrapper_handler calls, and
wrapper_handler is getting the blame.  I don't see how pqsignal_handlers
could be uninitialized.  It's static, and we are careful to set it
appropriately before we call sigaction().

-- 
nathan




Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)

2025-03-07 Thread Shayon Mukherjee
On Sun, Feb 23, 2025 at 3:41 PM Shayon Mukherjee  wrote:

> I have rebased the patch on top of master (resolving some merge
> conflicts), along with the meson changes (thank you for that).
>

Rebased against the latest master and attaching the v13 patch.

Thank you
Shayon


v13-0001-Introduce-the-ability-to-enable-disable-indexes-.patch
Description: Binary data


Re: dblink: Add SCRAM pass-through authentication

2025-03-07 Thread Peter Eisentraut

On 06.03.25 22:58, Jacob Champion wrote:

On Thu, Mar 6, 2025 at 12:33 PM Peter Eisentraut  wrote:

AFAICT, in pgfdw_security_check(), if SCRAM has been used for the
outgoing server connection, then PQconnectionUsedPassword() is true, and
then this check should fail if no "password" parameter was given.  That
check should be expanded to allow alternatively passing the SCRAM key
component parameters.


pgfdw_security_check() is currently not called if SCRAM passthrough is
in use, though:


	/*
	 * Perform post-connection security checks only if scram pass-through
	 * is not being used because the password is not necessary.
	 */
	if (!(MyProcPort->has_scram_keys && UseScramPassthrough(server, user)))
		pgfdw_security_check(keywords, values, user, conn);


Right.  How about the attached?  It checks as an alternative to a 
password whether the SCRAM keys were provided.  That should get us back 
to the same level of checking?
From daa5ff65007ed1cef49020191b50abf226228d95 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 7 Mar 2025 17:20:33 +0100
Subject: [PATCH] WIP: postgres_fdw: Fix SCRAM pass-through security

---
 contrib/postgres_fdw/connection.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8ef9702c05c..5a069b54078 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -447,12 +447,23 @@ pgfdw_security_check(const char **keywords, const char **values, UserMapping *us
 	/* Connected via PW, with PW required true, and provided non-empty PW. */
 	if (PQconnectionUsedPassword(conn))
 	{
-		/* ok if params contain a non-empty password */
+		bool		has_scram_server_key = false;
+		bool		has_scram_client_key = false;
+
+		/* ok if params contain a non-empty password or SCRAM keys */
 		for (int i = 0; keywords[i] != NULL; i++)
 		{
 			if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
 				return;
+
+			if (strcmp(keywords[i], "scram_client_key") == 0 && values[i][0] != '\0')
+				has_scram_client_key = true;
+			if (strcmp(keywords[i], "scram_server_key") == 0 && values[i][0] != '\0')
+				has_scram_server_key = true;
 		}
+
+		if (has_scram_client_key && has_scram_server_key && MyProcPort->has_scram_keys)
+			return;
 	}
 
 	ereport(ERROR,
@@ -586,12 +597,8 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 
 	keywords[n] = values[n] = NULL;
 
-	/*
-	 * Verify the set of connection parameters only if scram pass-through
-	 * is not being used because the password is not necessary.
-	 */
-	if (!(MyProcPort->has_scram_keys && UseScramPassthrough(server, user)))
-		check_conn_params(keywords, values, user);
+	/* verify the set of connection parameters */
+	check_conn_params(keywords, values, user);
 
 	/* first time, allocate or get the custom wait event */
 	if (pgfdw_we_connect == 0)
@@ -609,12 +616,8 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 						server->servername),
 				 errdetail_internal("%s", pchomp(PQerrorMessage(conn))));
 
-	/*
-	 * Perform post-connection security checks only if scram pass-through
-	 * is not being used because the password is not necessary.
-	 */
-	if (!(MyProcPort->has_scram_keys && UseScramPassthrough(server, user)))
-		pgfdw_security_check(keywords, values, user, conn);
+	/* Perform post-connection security checks */
+	pgfdw_security_check(keywords, values, user, conn);
 
 	/* Prepare new session for use */
 	configure_remote_session(conn);
@@ -703,6 +706,8 @@ static void
 check_conn_params(const char **keywords, const char **values, UserMapping *user)
 {
 	int			i;
+	bool		has_scram_server_key = false;
+	bool		has_scram_client_key = false;
 
 	/* no check required if superuser */
 	if (superuser_arg(user->userid))
@@ -714,13 +719,21 @@ check_conn_params(const char **keywords, const char **values, UserMapping *user)
 		return;
 #endif
 
-	/* ok if params contain a non-empty password */
+	/* ok if params contain a non-empty password or SCRAM keys */
 	for (i = 0; keywords[i] != NULL; i++)
 	{

Re: Statistics Import and Export

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-06 17:42:30 -0800, Jeff Davis wrote:
> At minimum, we would need to at least add the option "--with-
> statistics", because right now the only way to explicitly request stats
> is to say "--statistics-only".

+1, this has been annoying me while testing.

I did get confused for a while because I used --statistics, as the opposite of
--no-statistics, while going back and forth between the two. Kinda appears to
work, but actually means --statistics-only, something rather different...


> To generalize this concept: for each of {schema, data, stats} users
> might want "yes", "no", or "only".

> If we use this options scheme, it would be easy to change the default
> for stats independently of the other options, if necessary, without
> surprising consequences.
> 
> Patch attached. This patch does NOT change the default; stats are still
> opt-out. But it makes it easier for users to start specifying what they
> want or not explicitly, or to rely on the defaults if they prefer.
> 
> Note that the patch would mean we go from 2 options in v17:
>   --{schema|data}-only
> 
> to 9 options in v18:
>   --{with|no}-{schema|data|stats} and
>   --{schema|data|stats}-only

Could we, instead of having --with-$foo, just use --$foo?

Greetings,

Andres Freund




Re: TOAST versus toast

2025-03-07 Thread Robert Treat
On Mon, Feb 17, 2025 at 6:27 PM David G. Johnston
 wrote:
>
> On Wed, Jan 15, 2025 at 10:38 PM Peter Smith  wrote:
>>
>> On Thu, Jan 16, 2025 at 3:26 PM Tom Lane  wrote:
>> >
>> > Peter Smith  writes:
>> > > During some recent reviews, I came across some comments mentioning 
>> > > "toast" ...
>> > > TOAST is a PostgreSQL acronym for "The Oversized-Attribute Storage
>> > > Technique" [1].
>> >
>> > It is indeed an acronym, but usages such as "toasting" are all over
>> > our code and docs, as you see.  I question whether changing that
>> > to "TOASTing" improves readability.  I agree that consistently
>> > saying "TOAST table" not "toast table" is a good idea, but I'm
>> > not quite convinced that removing every last lower-case occurrence
>> > is a win, especially in these combined forms.
>> >

I took a look at this a few weeks ago and couldn't get excited about
it. It does seem to me that the cases where we use TOAST as a verb are
more readable when done in lower case, and this is pretty common in
everyday english/grammar; as an example, people would generally write
"the dr. lasered the tumor" not "the dr. LASERed the tumor".  So I am
+1 on the idea of not uppercasing these instances, but the flip side
"should we ensure we are lower casing them" is interesting... we
usually do, but there are a few cases where we don't (typically where
they have been labeled as acronyms). the documentation on
pg_column_toast_chunk_id is a good example:

Shows the chunk_id of an on-disk
TOASTed value.  Returns NULL
if the value is un-TOASTed or not on-disk.  See
<xref linkend="storage-toast"/> for more information about
TOAST.

>
>
> I'm not particularly convinced that "TOAST table" is a good idea; but I don't 
> hate it either.
>
> TOAST is a "technique", design feature, algorithm, process.  When referring 
> to that concept, using TOAST makes sense.  The implementation artifacts are 
> conveniently labelled e.g., "toast tables", and can be used in the same 
> capitalization that one would write "foreign table" or "temporary table".  
> Sure, we can define our made-up label as "TOAST tables" but it just makes it 
> stand out unnecessarily in comparison to "temporary tables" and the like.
>
> I'd be more interested in making sure all TOAST references are in regards to 
> the technique and lower-case the ones that aren't.
>

I kind of wondered about this, because I felt pretty used to seeing
the term "TOAST table", so I did some quick searches, and it looks
like we have about 20 cases where we use TOAST table vs about 10 where
we use toast table, specifically focusing on cases where we don't add
any markup to the word "toast", and about 20 more where we use
"<acronym>TOAST</acronym> table". So ISTM that folks are probably used
to seeing the term with upper case, but not universally so... so I
could probably get onboard with David's suggestion, although tbh I
probably would lean the other way.


Robert Treat
https://xzilla.net




Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Tom Lane
Jelte Fennema-Nio  writes:
> The reason why I walked back my comment was that cloud providers can
> simply choose which extensions they actually add to the image. If an
> extension is marked as not trusted by the author, then with this role
> they can still choose to add it without having to make changes to the
> control file if they think it's "secure enough".

If they think it's "secure enough", they can mark it trusted in their
images.  Why do we need anything beyond that?

regards, tom lane




Re: AIX support

2025-03-07 Thread Robert Haas
On Fri, Mar 7, 2025 at 9:11 AM Srirama Kucherlapati
 wrote:
> Our team has identified couple of issues with the meson build on AIX, 
> primarily focusing on the following areas:
>
> Symbol extraction and resolution in object files during binary creation.
> Dynamic runtime library path resolution in shared libraries.
>
> We have resolved them and we plan to submit these fixes to the meson 
> community shortly.

Nice!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: new commitfest transition guidance

2025-03-07 Thread jian he
hi.

It seems we don't have much info about "Patch Triage" in 2025,
but for 2023 and 2024 we do, e.g.:
https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2024_Developer_Meeting#v17_Patch_Triage
and
https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#v16_Patch_Triage




RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-07 Thread Bykov Ivan
Hello!

Last time, I forgot to attach the patches.
The problem still persists in the 17.3 release.

Solution One

The simplest way to fix the problem is to place the scalar field used in the 
query ID calculation
between similar subnodes.
A patch for this solution is attached below 
(0001-Query-ID-Calculation-Fix-Variant-A.patch).

Solution Two

Alternatively, we can change the hash sum when we encounter an empty node.
This approach may impact performance but will protect us from such errors in 
the future.
A patch for this solution is attached below 
(0001-Query-ID-Calculation-Fix-Variant-B.patch).
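
For readers without the attachments, here is a minimal sketch of the Variant B
idea, assuming the _jumbleNode()/AppendJumble() structure of
src/backend/nodes/queryjumblefuncs.c (the attached patch may differ in detail):

static void
_jumbleNode(JumbleState *jstate, Node *node)
{
	if (node == NULL)
	{
		/*
		 * Mix a marker byte into the jumble so that a missing subnode
		 * still perturbs the hash; otherwise LIMIT-vs-OFFSET or
		 * DISTINCT-vs-ORDER-BY queries that differ only in which field
		 * is NULL can produce identical query IDs.
		 */
		const unsigned char nullmark = 0xFF;

		AppendJumble(jstate, &nullmark, sizeof(nullmark));
		return;
	}

	/* ... existing per-node-type jumbling continues here ... */
}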


==
SELECT version();
 version
------------------------------------------------------------------------------------------
PostgreSQL 17.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 
12.2.0, 64-bit

SET compute_query_id = on;

/* LIMIT / OFFSET */
EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class LIMIT 1;

 QUERY PLAN
-----------------------------------------------------------------------------
Limit  (cost=0.00..0.04 rows=1 width=4)
   Output: oid
   ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.15 rows=415 width=4)
 Output: oid
Query Identifier: 5185884322440896420

EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class OFFSET 1;
 QUERY PLAN
-----------------------------------------------------------------------------
Limit  (cost=0.04..18.15 rows=414 width=4)
   Output: oid
   ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.15 rows=415 width=4)
 Output: oid
Query Identifier: 5185884322440896420

/* DISTINCT / ORDER BY */
EXPLAIN (VERBOSE) SELECT DISTINCT "oid" FROM pg_class;

 QUERY PLAN
--------------------------------------------------------------------------------------------------
Unique  (cost=0.27..23.54 rows=415 width=4)
   Output: oid
   ->  Index Only Scan using pg_class_oid_index on pg_catalog.pg_class  
(cost=0.27..22.50 rows=415 width=4)
 Output: oid
Query Identifier: 751948508603549510

EXPLAIN (VERBOSE) SELECT "oid" FROM pg_class ORDER BY "oid";

  QUERY PLAN
------------------------------------------------------------------------------------------------
Index Only Scan using pg_class_oid_index on pg_catalog.pg_class  
(cost=0.27..22.50 rows=415 width=4)
   Output: oid
Query Identifier: 751948508603549510



0001-Query-ID-Calculation-Fix-Variant-A.patch
Description: 0001-Query-ID-Calculation-Fix-Variant-A.patch


0001-Query-ID-Calculation-Fix-Variant-B.patch
Description: 0001-Query-ID-Calculation-Fix-Variant-B.patch


Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 7, 2025 at 9:37 AM Michael Banck  wrote:
>> Also, I think there is case to be made that a cloud provider (or site
>> admin) would like to delegate the decision whether users with CREATE
>> rights on a particular database are allowed to install some extensions
>> or not. Or rather, assign somebody they believe would make the right
>> call to do that, by granting pg_manage_extensions.

> Hypothetically, somebody could want a feature at various levels of
> granularity. The most fine-grained would be something like: [1] allow
> user X to install extension Y. Then, more broadly, you could have: [2]
> allow any user who can install extensions to install extension Y. Or
> conversely: [3] allow user X to install any extension. This patch
> implements [3], but you could make an argument for any of the others.

It's not apparent to me how [3] is meaningfully different from
giving user X superuser.  If you have the ability to install and
use, say, file_fdw, then nothing except honesty stands between you
and a superuser bit.  Is the argument for this feature that cloud
providers won't realize that?  Or perhaps the argument is that the
provider will only provide pre-vetted extensions to install ---
but then the existing "trusted extension" feature does everything
they need.

While I'm all for chipping away at what superuser privilege is
needed for, we have to tread VERY carefully about chipping away
at things that allow any outside-the-database access.

regards, tom lane




Re: doc: expand note about pg_upgrade's --jobs option

2025-03-07 Thread Magnus Hagander
On Wed, Mar 5, 2025 at 5:28 PM Nathan Bossart 
wrote:

> On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:
> > On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
> >> Another option that I think would also work is to just cut down the details
> >> to just "The --jobs option allows multiple CPU cores to be used".
> >
> > That's fine with me.  It's probably not particularly actionable
> > information, anyway.  If anything, IMHO we should make it clear to users
> > that the parallelization is per-database (except for file transfer, which
> > is per-tablespace).  If you've just got one big database in the default
> > tablespace, --jobs won't help.
> >
> >> I think this is also slightly confusing, but maybe that's a
> >> non-native-english thing: "a good place to start is the maximum of the
> >> number of  CPU cores and tablespaces.". Am I supposed to set it to
> >> max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
> >
> > I've always read it to mean the former.  But I'm not sure that's great
> > advice.  If you have 8 cores and 100 tablespaces, does it make sense to use
> > --jobs=100?  Ordinarily, I'd suggest the number of cores as the starting
> > point.
>
> Here's another attempt at the patch based on the latest discussion.
>

LGTM!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: per backend WAL statistics

2025-03-07 Thread Michael Paquier
On Fri, Mar 07, 2025 at 08:33:04AM +, Bertrand Drouvot wrote:
> But when it's time to flush, then pgstat_backend_have_pending_cb() checks
> for zeros in PendingBackendStats *without* any check on the backend type.
> 
> I think the issue is "masked" on HEAD because PendingBackendStats is
> probably automatically initialized with zeros (as being a static variable at
> file scope).

If this weren't true, we would have a lot of problems in more places
than this one.  It does not hurt to add an initialization at the top
of pgstat_backend.c for PendingBackendStats, to document the
intention, while we're on it.
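
Presumably something like the following at the top of pgstat_backend.c (a
sketch of the intent; the committed hunk may differ cosmetically):

/*
 * Backend statistics counts waiting to be flushed out.  The explicit
 * initializer documents that "all zeros" means "no pending stats".
 */
static PgStat_BackendPending PendingBackendStats = {0};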

Did both things, and applied the result.
--
Michael


signature.asc
Description: PGP signature


Re: strange valgrind reports about wrapper_handler on 64-bit arm

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-07 00:03:47 +0100, Tomas Vondra wrote:
> while running check-world on 64-bit arm (rpi5 with Debian 12.9), I got a
> couple reports like this:
> 
> ==64550== Use of uninitialised value of size 8
> ==64550==at 0xA62FE0: wrapper_handler (pqsignal.c:107)
> ==64550==by 0x580BB9E7: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==64550==  Uninitialised value was created by a stack allocation
> ==64550==at 0x4F94660: strcoll_l (strcoll_l.c:258)
> ==64550==
> {
>
>Memcheck:Value8
>fun:wrapper_handler
>obj:/usr/libexec/valgrind/memcheck-arm64-linux
> }
> **64550** Valgrind detected 1 error(s) during execution of "ANALYZE
> mcv_lists;"

> The exact command varies, I don't think it's necessarily about analyze
> or extended stats.

Do you have a few other examples from where it was triggered?

Is the source of the uninitialized value always strcoll_l?

Can you reliably reproduce it in certain scenarios or is it probabilistic in
some form?

Do you know what signal was delivered (I think that could be detected using
valgrinds --vgdb)?


> The line the report refers to is this:
> 
> (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
> 
> so I guess it can't be about postgres_signal_arg (as that's an int). But
> that leaves just pqsignal_handlers, and why would that be uninitialized?

Is it possible that the signal number we're getting called for is above
PG_NSIG? That'd explain why the source value is something fairly random?

ISTM that we should add an Assert() to wrapper_handler() that ensures that the
signal arg is below PG_NSIG.
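
Something along these lines, based on wrapper_handler() in src/port/pqsignal.c
(a simplified sketch; the real function also handles the frontend case):

static void
wrapper_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	/* proposed sanity check: pqsignal_handlers[] has PG_NSIG entries */
	Assert(postgres_signal_arg >= 0 && postgres_signal_arg < PG_NSIG);

	(*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);

	errno = save_errno;
}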


Might also be worth trying to run without valgrind but with address and
undefined behaviour sanitizers enabled.  I don't currently have access to an
armv8 machine that's not busy doing other stuff...

Greetings,

Andres Freund




Re: Add contrib/pg_logicalsnapinspect

2025-03-07 Thread Masahiko Sawada
On Fri, Mar 7, 2025 at 2:42 AM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Fri, Mar 07, 2025 at 10:26:23AM +0530, Amit Kapila wrote:
> > On Fri, Mar 7, 2025 at 3:19 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Mar 5, 2025 at 4:05 AM Bertrand Drouvot
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Mar 05, 2025 at 02:42:15PM +0530, Amit Kapila wrote:
> > > > > On Wed, Mar 5, 2025 at 12:47 PM Bertrand Drouvot
> > > > >  wrote:
> > > > > >
> > > > > > Agree, PFA a patch doing so.
> > > > > >
> > > > >
> > > > > It would be better if you could add a few comments atop the
> > > > > permutation line to explain the working of the test.
> > > >
> > > > yeah makes sense. Done in the attached, and bonus point I realized that 
> > > > the
> > > > test could be simplified (so, removing useless steps in passing).
> > > >
> > >
> > > Thank you for the patch.
> > >
> > > The new simplified test case can be pretty-formatted as:
> > >
> > > init
> > > begin
> > > savepoint
> > > truncate
> > > checkpoint-1
> > > get_changes-1
> > > commit
> > > checkpoint-2
> > > get_changes-2
> > > info_catchange check
> > > info_committed check
> > > meta check
>
> Yes.
>
> > > IIUC if another checkpoint happens between get_change-2 and the
> > > subsequent checks, the first snapshot would be removed during the
> > > checkpoint, resulting in a test failure.
>
> Good catch! Yeah you're right, thanks!
>
> > I think we could check the
> > > snapshot files while one transaction keeps open. The more simplified
> > > test case would be:
> > >
> > > init
> > > begin
> > > savepoint
> > > insert(cat-change)
> > > begin
> > > insert(cat-change)
> > > commit
> > > checkpoint
> > > get_changes
> > > info_catchange check
> > > info_committed check
> > > meta check
> > > commit
> > >
> > > In this test case, we would have at least one serialized snapshot that
> > > has both cat-changes and committed txns. What do you think?
>
> Indeed, I think that would prevent snapshots to be removed.
>
> The attached ends up doing:
>
> init
> begin
> savepoint
> truncate table1
> create table table2
> checkpoint
> get_changes
> info check
> meta check
> commit
>
> As the 2 ongoing catalog changes and the committed catalog change are part of
> the same snapshot, then I grouped the catchanges and committed changes checks
> in the same "info check".
>
> > Your proposed change in the test sounds better than what we have now
> > but I think we should also avoid autovacuum to perform analyze as that
> > may add additional counts. For test_decoding, we keep
> > autovacuum_naptime = 1d in logical.conf file, we can either use the
> > same here or simply keep autovacuum off.
>
> When writing the attached, I initially added extra paranoia in the tests by
> using ">=", does that also address your autovacuum concern?
>

Thank you for updating the patch. It looks mostly good to me. I've
made some cosmetic changes and attached the updated version.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v4-0001-pg_logicalinspect-Stabilize-isolation-tests.patch
Description: Binary data


Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-03-07 Thread Álvaro Herrera
Hello

I find that this is still quite broken -- both the original, and your
patch.  I have already complained about a fundamental bug in the
original in [1].  In addition to what I reported there, in the unpatched
code I noticed that we're wasting memory and CPU by comparing the
qualified table name, when it would be faster to just store the table
OID and do the comparison based on that.  Because the REINDEX INDEX
query only needs to specify the *index* name, not the table name, we
don't need the table name to be stored in the indices_tables_list: we
can convert it into an OID list and store that instead.  The strcmp
becomes a numeric comparison.  Yay.  This is of course a pretty trivial,
almost meaningless change.

[1] https://postgr.es/m/202503071820.j25zn3lo4hvn@alvherre.pgsql

But your patch also makes the mistake that indices_table_list is passed
as a pointer by value, so the list that is filled by
get_parallel_tabidx_list() can never have any effect.  I wonder if you
tested this patch at all.  With your patch and the sample setup I posted
in [1], the program doesn't do anything.  It never calls REINDEX at all.
It is a dead program.  It's so bogus that in fact, there's no program,
it's just a bug that takes a lot of disk space.

For this to work, we need to pass that list (the output list) as a
pointer reference, so that the caller can _receive_ the output list.
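
(In C terms, a sketch of the calling convention being described, using the
names from the patch below:)

static void get_parallel_tabidx_list(PGconn *conn,
									 SimpleStringList *index_list,
									 SimpleOidList **table_list,	/* output */
									 bool echo);

	SimpleOidList *tableoid_list = NULL;

	get_parallel_tabidx_list(conn, user_list, &tableoid_list, echo);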

We also get this compiler warning:

/pgsql/source/master/src/bin/scripts/reindexdb.c: In function 
‘get_parallel_tabidx_list’:
/pgsql/source/master/src/bin/scripts/reindexdb.c:782:17: warning: this ‘if’ 
clause does not guard... [-Wmisleading-indentation]
  782 | if (cell != index_list->head)
  | ^~
/pgsql/source/master/src/bin/scripts/reindexdb.c:785:25: note: ...this 
statement, but the latter is misleadingly indented as if it were guarded by the 
‘if’
  785 | appendQualifiedRelation(&catalog_query, 
cell->val, conn, echo);
  | ^~~

Not to mention the 'git apply' warnings:

I schmee: master 0$ git apply /tmp/v3-simplifies-reindex-one-database-reindexdb.patch
/tmp/v3-simplifies-reindex-one-database-reindexdb.patch:191: space before tab in indent.
 	fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
/tmp/v3-simplifies-reindex-one-database-reindexdb.patch:192: space before tab in indent.
 	  PQgetvalue(res, i, 0),
/tmp/v3-simplifies-reindex-one-database-reindexdb.patch:193: space before tab in indent.
 	  PQclientEncoding(conn)));
warning: 3 lines add whitespace errors.


Anyway, my version of this is attached.  It fixes the problems with your
patch, but not Orlov's fundamental bug.  Without fixing that bug, this
program does not deserve this supposedly parallel mode that doesn't
actually deliver.  I wonder if we shouldn't just revert 47f99a407de2
completely.

You, on the other hand, really need to be much more careful with the
patches you submit. 

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Selbst das größte Genie würde nicht weit kommen, wenn es
alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
   Ni aún el genio más grande llegaría muy lejos si
quisiera sacarlo todo de su propio interior.




Re: Add column name to error description

2025-03-07 Thread Tom Lane
jian he  writes:
> On Fri, Mar 7, 2025 at 11:05 AM Tom Lane  wrote:
>> Do others agree Erik's version improves readability?

> i think so.

Pushed v4, then.

regards, tom lane




Re: Statistics Import and Export

2025-03-07 Thread Jeff Davis
On Fri, 2025-03-07 at 12:41 -0500, Robert Treat wrote:
> Ugh... this feels like a bit of the combinatorial explosion,
> especially if we ever need to add another option.

Not quite that bad, because ideally the yes/no/only set would not be
expanding as well. But I agree that it feels like a lot of options.

> I wonder if it would
> be possible to do something simple like just providing
> "--include={schema|data|stats}" where you specify only what you want,
> and leave out what you don't.

Can you explain the idea in a bit more detail? Does --include=statistics
mean include statistics also or statistics only? Can
you explicitly request that data be included but rely on the default
for statistics? What options would it override or conflict with?

Regards,
Jeff Davis





Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-03-07 Thread Álvaro Herrera
On 2025-Mar-07, Álvaro Herrera wrote:

> Anyway, my version of this is attached.  It fixes the problems with your
> patch, but not Orlov's fundamental bug.

And of course I forgot to actually attach the patch.  Good grief.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index b00c8112869..c55742daf81 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -12,6 +12,7 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
 
 #include "catalog/pg_class_d.h"
 #include "common.h"
@@ -33,10 +34,14 @@ typedef enum ReindexType
 } ReindexType;
 
 
-static SimpleStringList *get_parallel_object_list(PGconn *conn,
+static SimpleStringList *get_parallel_tables_list(PGconn *conn,
 												  ReindexType type,
 												  SimpleStringList *user_list,
 												  bool echo);
+static void get_parallel_tabidx_list(PGconn *conn,
+									 SimpleStringList *index_list,
+									 SimpleOidList **table_list,
+									 bool echo);
 static void reindex_one_database(ConnParams *cparams, ReindexType type,
 								 SimpleStringList *user_list,
 								 const char *progname,
@@ -276,15 +281,15 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 {
 	PGconn	   *conn;
 	SimpleStringListCell *cell;
-	SimpleStringListCell *indices_tables_cell = NULL;
+	SimpleOidListCell *indices_tables_cell = NULL;
 	bool		parallel = concurrentCons > 1;
-	SimpleStringList *process_list = user_list;
-	SimpleStringList *indices_tables_list = NULL;
+	SimpleStringList *process_list;
+	SimpleOidList *tableoid_list = NULL;
 	ReindexType process_type = type;
 	ParallelSlotArray *sa;
 	bool		failed = false;
-	int			items_count = 0;
-	char	   *prev_index_table_name = NULL;
+	int			items_count;
+	Oid			prev_index_table_oid = InvalidOid;
 	ParallelSlot *free_slot = NULL;
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
@@ -322,6 +327,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 			case REINDEX_INDEX:
 			case REINDEX_SCHEMA:
 			case REINDEX_TABLE:
+				process_list = user_list;
 				Assert(user_list != NULL);
 				break;
 		}
@@ -330,26 +336,14 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	{
 		switch (process_type)
 		{
-			case REINDEX_DATABASE:
-
-				/* Build a list of relations from the database */
-				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
-				process_type = REINDEX_TABLE;
-
-				/* Bail out if nothing to process */
-				if (process_list == NULL)
-				{
-					PQfinish(conn);
-					return;
-				}
-				break;
-
 			case REINDEX_SCHEMA:
 				Assert(user_list != NULL);
+				/* fall through */
 
-				/* Build a list of relations from all the schemas */
-				process_list = get_parallel_object_list(conn, process_type,
+			case REINDEX_DATABASE:
+
+				/* Build a list of relations from the given list */
+				process_list = get_parallel_tables_list(conn, process_type,
 														user_list, echo);
 				process_type = REINDEX_TABLE;
 
@@ -362,45 +356,34 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 				break;
 
 			case REINDEX_INDEX:
-				Assert(user_list != NULL);
 
 				/*
-				 * Build a list of relations from the indices.  This will
-				 * accordingly reorder the list of indices too.
+				 * Generate a list of indexes and a matching list of table
+				 * OIDs, based on the user-specified index names.
 				 */
-				indices_tables_list = get_parallel_object_list(conn, process_type,
-															   user_list, echo);
+				get_parallel_tabidx_list(conn, user_list, &tableoid_list,
+										 echo);
 
-				/*
-				 * Bail out if nothing to process.  'user_list' was modified
-				 * in-place, so check if it has at least one cell.
-				 */
-				if (user_list->head == NULL)
+				/* Bail out if nothing to process */
+				if (tableoid_list == NULL)
 				{
 					PQfinish(conn);
 					return;
 				}
 
-				/*
-				 * Assuming 'user_list' is not empty, 'indices_tables_list'
-				 * shouldn't be empty as well.
-				 */
-				Assert(indices_tables_list != NULL);
-				indices_tables_cell = indices_tables_list->head;
+				indices_tables_cell = tableoid_list->head;
+				process_list = user_list;
 
 				break;
 
 			case REINDEX_SYSTEM:
 				/* not supported */
+				process_list = NULL;
 				Assert(false);
 				break;
 
 			case REINDEX_TABLE:
-
-				/*
-				 * Fall through.  The list of items for tables is already
-				 * created.
-				 */
+				process_list = user_list;
 				break;
 		}
 	}
@@ -410,6 +393,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	 * the list.  We choose the minimum between the number of concurrent
 	 * connections and the number of items in the list.
 	 */
+	items_count = 0;
 	for (cell = process_list->head; cell; cell = cell->next)
 	{

Re: pg_atomic_compare_exchange_*() and memory barriers

2025-03-07 Thread Alexander Korotkov
On Fri, Mar 7, 2025 at 7:07 PM Andres Freund  wrote:
> On 2025-03-07 18:39:42 +0200, Alexander Korotkov wrote:
> > On Fri, Mar 7, 2025 at 6:02 PM Andres Freund  wrote:
> > >
> > > On 2025-03-07 17:47:08 +0200, Alexander Korotkov wrote:
> > > > While investigating a bug in the patch to get rid of WALBufMappingLock, 
> > > > I
> > > > found that the surrounding pg_atomic_compare_exchange_u64() fixes the
> > > > problem for me.
> > >
> > > That sounds more likely to be due to slowing down things enough to make a 
> > > race
> > > less likely to be hit. Or a compiler bug.
> > >
> > >
> > > > That doesn't feel right because, according to the
> > > > comments, both pg_atomic_compare_exchange_u32() and
> > > > pg_atomic_compare_exchange_u64() should provide full memory barrier
> > > > semantics.  So, I decided to investigate this further.
> > > >
> > > > In my case, the implementation of pg_atomic_compare_exchange_u64() is 
> > > > based
> > > > on __atomic_compare_exchange_n().
> > > >
> > > > static inline bool
> > > > pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
> > > > uint64 *expected, uint64 newval)
> > > > {
> > > > AssertPointerAlignment(expected, 8);
> > > > return __atomic_compare_exchange_n(&ptr->value, expected, newval, 
> > > > false,
> > > >__ATOMIC_SEQ_CST, 
> > > > __ATOMIC_SEQ_CST);
> > > > }
> > > >
> > > > According to the docs __ATOMIC_SEQ_CST enforces total ordering with *all
> > > > other __ATOMIC_SEQ_CST operations*.  It only says about other
> > > > __ATOMIC_SEQ_CST operations; nothing is said about regular reads/writes.
> > > > This sounds quite far from our comment, promising full barrier 
> > > > semantics,
> > > > which, in my understanding, means the equivalent of
> > > > both pg_read_barrier()/pg_write_barrier(), which should barrier *all 
> > > > other
> > > > read/writes*.
> > >
> > > A memory barrier in one process/thread always needs be paired with a 
> > > barrier
> > > in another process/thread. It's not enough to have a memory barrier on one
> > > side but not the other.
> >
> > Yes, there surely should be a memory barrier on another side.  But
> > does __ATOMIC_SEQ_CST works as a barrier for the regular read/write
> > operations on the same side?
>
> Yes, if it's paired with another barrier on the other side. The regular
> read/write operations are basically protected transitively, due to
>
> a) An acquire barrier preventing non-atomic reads/writes in the same thread
>from appearing to have been moved to before the barrier
>
> b) A release barrier preventing non-atomic reads/writes in the same thread
>from appearing to have been moved to after the barrier
>
> c) The other thread being guaranteed a) and b) for the other threads'
>non-atomic reads/writes depending on the type of barrier
>
> d) The atomic value itself being guaranteed to be, well, atomic
>
>
> > > > This function first checks if LSE instructions are present.  If so,
> > > > instruction LSE casal is used.  If not (in my case), the loop of
> > > > ldaxr/stlxr is used instead.  The documentation says both ldaxr/stlxr
> > > > provides one-way barriers.  Read/writes after ldaxr will be observed 
> > > > after,
> > > > and read/writes before stlxr will be observed before.  So, a pair of 
> > > > these
> > > > instructions provides a full memory barrier.  However, if we don't 
> > > > observe
> > > > the expected value, only ldaxr gets executed.  So, an unsuccessful
> > > > pg_atomic_compare_exchange_u64() attempt will leave us with a one-way
> > > > barrier, and that caused a bug in my case.
> > >
> > > That has to be a compiler bug.  We specify __ATOMIC_SEQ_CST for both
> > > success_memorder *and* failure_memorder.
> > >
> > > What compiler & version is this?
> >
> > I've tried and got the same results with two compilers.
> > gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
> > Ubuntu clang version 19.1.1 (1ubuntu1~24.04.2)
>
> Thinking more about it I wonder if the behaviour of not doing a release in
> case the atomic fails *might* arguably actually be correct - if the
> compare-exchange fails, there's nothing that the non-atomic values could be
> ordered against.

There is another successful compare-exchange executed by a concurrent
process to get ordered against.
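
To make that concrete, here is a sketch of the pattern under discussion (the
shared variable and the explicit fallback barrier are illustrative only, not
a proposed fix):

	uint64		expected = oldval;

	if (!pg_atomic_compare_exchange_u64(&shared->pos, &expected, newval))
	{
		/*
		 * If the failure path executes only the load-acquire (ldaxr) and
		 * never the store-release (stlxr), the operation degrades to a
		 * one-way barrier.  A conservative workaround is an explicit full
		 * barrier here, pairing with the full barrier implied by the
		 * successful compare-exchange in the concurrent process.
		 */
		pg_memory_barrier();
	}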

--
Regards,
Alexander Korotkov
Supabase




Re: Statistics Import and Export

2025-03-07 Thread Hari Krishna Sunder
To improve the performance of pg_dump, can we add a new SQL function that
can operate more efficiently than the pg_stats view? It could also take in
an optional list of OIDs to filter on.
This would help speed up dump and restore within PG 18 and future
upgrades to later PG versions.

Thanks
Hari Krishna Sunder

On Fri, Mar 7, 2025 at 7:43 PM Corey Huinker 
wrote:

>> I tried to generalize that requirement to all of
>> {schema|data|statistics} for consistency, but that resulted in 9
>> options.
>>
>
> 9 options that resolve to 3 boolean variables. It's not that hard.
>
> And if we add a fourth option set, then we have 12 options. So it's O(3N),
> not O(N^2).
>
> People have scripts now that rely on the existing -only flags, and nearly
> every other potentially optional thing has a -no flag. Let's leverage that.
>


Re: Parallel heap vacuum

2025-03-07 Thread Masahiko Sawada
On Thu, Mar 6, 2025 at 5:33 PM Peter Smith  wrote:
>
> Some minor review comments for patch v10-0001.
>
> ==
> src/include/access/tableam.h
>
> 1.
>  struct IndexInfo;
> +struct ParallelVacuumState;
> +struct ParallelContext;
> +struct ParallelWorkerContext;
>  struct SampleScanState;
>
> Use alphabetical order for consistency with existing code.
>
> ~~~
>
> 2.
> + /*
> + * Estimate the size of shared memory that the parallel table vacuum needs
> + * for AM
> + *
>
> 2a.
> Missing period (.)
>
> 2b.
> Change the wording to be like below, for consistency with the other
> 'estimate' function comments, and for consistency with the comment
> where this function is implemented.
>
> Estimate the size of shared memory needed for a parallel table vacuum
> of this relation.
>
> ~~~
>
> 3.
> + * The state_out is the output parameter so that an arbitrary data can be
> + * passed to the subsequent callback, parallel_vacuum_remove_dead_items.
>
> Typo? "an arbitrary data"
>
> ~~~
>
> 4. General/Asserts
>
> All the below functions have a comment saying "Not called if parallel
> table vacuum is disabled."
> - parallel_vacuum_estimate
> - parallel_vacuum_initialize
> - parallel_vacuum_initialize_worker
> - parallel_vacuum_collect_dead_items
>
> But, it's only a comment. I wondered if they should all have some
> Assert as an integrity check on that.
>
> ~~~
>
> 5.
> +/*
> + * Return the number of parallel workers for a parallel vacuum scan of this
> + * relation.
> + */
>
> "Return the number" or "Compute the number"?
> The comment should match the comment in the fwd declaration of this function.
>
> ~~~
>
> 6.
> +/*
> + * Perform a parallel vacuums scan to collect dead items.
> + */
>
> 6a.
> "Perform" or "Execute"?
> The comment should match the one in the fwd declaration of this function.
>
> 6b.
> Typo "vacuums"
>

Thank you for reviewing the patch. I'll address these comments and
submit the updated version patches soon.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Selectively invalidate caches in pgoutput module

2025-03-07 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> Don't we need to call this invalidation function from
> InvalidateSystemCachesExtended()?

Hmm. I did not call the relsync callback functions in
InvalidateSystemCachesExtended() intentionally, but I have added the call now.
Please see my analysis below and let me know what you think.

In the first place, InvalidateSystemCachesExtended() can be called by
1) InvalidateSystemCaches(), and 2) AcceptInvalidationMessages().

Regarding 1), it can be reached when a) a parallel worker is launched,
b) a decoder starts decoding, or c) a decoder finishes decoding and the
decoding context is freed. However, a parallel worker won't do any decoding,
the relsync cache is initially empty, and the relsync cache would be dropped
when the decoding context is removed. Based on those facts I did not call
the function.

As for 2), that path exists only in debug builds, and only when
debug_discard_caches is set. According to the manual and the code comments,
it must discard all caches every time. So... OK, I decided to follow the rule.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v6-0001-Introduce-a-new-invalidation-message-to-invalidat.patch
Description:  v6-0001-Introduce-a-new-invalidation-message-to-invalidat.patch


v6-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-RENA.patch
Description:  v6-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-RENA.patch


Re: per backend WAL statistics

2025-03-07 Thread Michael Paquier
On Thu, Mar 06, 2025 at 10:33:52AM +, Bertrand Drouvot wrote:
> Indeed, there is no reason for pgstat_backend_have_pending_cb() to return 
> true if 
> pgstat_tracks_backend_bktype() is not satisfied.
> 
> So it deserves a dedicated patch to fix this already existing issue:
> 0001 attached.

 pgstat_backend_have_pending_cb(void)
 {
-	return (!pg_memory_is_all_zeros(&PendingBackendStats,
-									sizeof(struct PgStat_BackendPending)));
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return false;
+	else
+		return (!pg_memory_is_all_zeros(&PendingBackendStats,
+										sizeof(struct PgStat_BackendPending)));

So, if I understand your point correctly, it is not a problem on HEAD
because we are never going to update PendingBackendStats in the
checkpointer as pgstat_count_backend_io_op[_time]() blocks any attempt
to do so.  However, it becomes a problem with the WAL portion patch
because of the dependency to pgWalUsage which may be updated by the
checkpointer and the pending callback would happily report true in
this case.  It would also become a problem if we add in backend stats
a different portion that depends on something external.

An extra check based on pgstat_tracks_backend_bktype() makes sense in
pgstat_backend_have_pending_cb(), yes, forcing the hand of the stats
reports to not do stupid things in a process where we should not
report stats.  Good catch from the sanity check in
pgstat_report_stat(), even if this is only a problem once we begin to
use something else than PendingBackendStats for the pending stats.
This has also the merit of making pgstat_report_stat() do less work.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Robert Haas
On Fri, Mar 7, 2025 at 11:21 AM Tom Lane  wrote:
> While I'm all for chipping away at what superuser privilege is
> needed for, we have to tread VERY carefully about chipping away
> at things that allow any outside-the-database access.

I agree, but I also don't want the security decisions that the core
project takes to become irrelevant in practice. It seems like what's
starting to happen is that all of the cloud providers end up finding
the same issues and working around them in very similar ways and they
don't do anything in PostgreSQL itself, I guess because that would
require winning an argument on the mailing list. I think that dynamic
is bad for us as an open-source project, so I'm trying to be
particularly careful about evaluating proposals that smell like "all
the cloud providers are already doing this, why don't we maybe just
agree that it's needed".

I'm not certain that this is such a case, and I'd like to have more
information about what the cloud providers are actually doing in this
area, but I'm alert to the possibility that it might be the case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Commitfest app release on Feb 17 with many improvements

2025-03-07 Thread Tom Lane
Jelte Fennema-Nio  writes:
> On Fri, 7 Mar 2025 at 17:42, Tom Lane  wrote:
>> Hm, don't you *want* a failure if the patch is already applied?

> It's pretty common that in a larger patchset the first 1-2
> small/trivial patches get committed before the rest. Having to then
> send an additional email, resubmitting the rest of the patchset
> because CFBot fails when re-applying those is kind of silly. This
> avoids that.

Got it.  That is indeed an improvement, though of course it only
helps if those patches were applied verbatim.

regards, tom lane




Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-03-07 Thread Ranier Vilela
Em sex., 7 de mar. de 2025 às 15:54, Álvaro Herrera 
escreveu:

> Hello
>
> I find that this is still quite broken -- both the original, and your
> patch.  I have already complained about a fundamental bug in the
> original in [1].  In addition to what I reported there, in the unpatched
> code I noticed that we're wasting memory and CPU by comparing the
> qualified table name, when it would be faster to just store the table
> OID and do the comparison based on that.  Because the REINDEX INDEX
> query only needs to specify the *index* name, not the table name, we
> don't need the table name to be stored in the indices_tables_list: we
> can convert it into an OID list and store that instead.  The strcmp
> becomes a numeric comparison.  Yay.  This is of course a pretty trivial,
> almost meaningless change.
>
> [1] https://postgr.es/m/202503071820.j25zn3lo4hvn@alvherre.pgsql
>
> But your patch also makes the mistake that indices_table_list is passed
> as a pointer by value, so the list that is filled by
> get_parallel_tabidx_list() can never have any effect.  I wonder if you
> tested this patch at all.  With your patch and the sample setup I posted
> in [1], the program doesn't do anything.  It never calls REINDEX at all.
> It is a dead program.  It's so bogus that in fact, there's no program,
> it's just a bug that takes a lot of disk space.
>
> For this to work, we need to pass that list (the output list) as a
> pointer reference, so that the caller can _receive_ the output list.
>
> We also get this compiler warning:
>
> /pgsql/source/master/src/bin/scripts/reindexdb.c: In function
> ‘get_parallel_tabidx_list’:
> /pgsql/source/master/src/bin/scripts/reindexdb.c:782:17: warning: this
> ‘if’ clause does not guard... [-Wmisleading-indentation]
>   782 | if (cell != index_list->head)
>   | ^~
> /pgsql/source/master/src/bin/scripts/reindexdb.c:785:25: note: ...this
> statement, but the latter is misleadingly indented as if it were guarded by
> the ‘if’
>   785 | appendQualifiedRelation(&catalog_query,
> cell->val, conn, echo);
>   | ^~~
>
> Not to mention the 'git apply' warnings:
>
> I schmee: master 0$ git apply
> /tmp/v3-simplifies-reindex-one-database-reindexdb.patch
> /tmp/v3-simplifies-reindex-one-database-reindexdb.patch:191: space before
> tab in indent.
>
>  fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
> /tmp/v3-simplifies-reindex-one-database-reindexdb.patch:192: space before
> tab in indent.
>
>  PQgetvalue(res, i, 0),
> /tmp/v3-simplifies-reindex-one-database-reindexdb.patch:193: space before
> tab in indent.
>
>  PQclientEncoding(conn)));
warning: 3 lines add whitespace errors.
>
>
> Anyway, my version of this is attached.  It fixes the problems with your
> patch, but not Orlov's fundamental bug.  Without fixing that bug, this
> program does not deserve this supposedly parallel mode that doesn't
> actually deliver.  I wonder if we shouldn't just revert 47f99a407de2
> completely.
>
> You, on the other hand, really need to be much more careful with the
> patches you submit.
>
Yes of course, thank you for making the necessary corrections.

best regards,
Ranier Vilela


Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2

2025-03-07 Thread Joel Jacobson
On Wed, Mar 5, 2025, at 03:32, Andreas Karlsson wrote:
> On 3/4/25 10:24 AM, Andreas Karlsson wrote:
>> Rebased the patch to add support for OLD.* and NEW.*.
>
> Apparently the CI did not like that version.
>
> Andreas
>
> Attachments:
> * v6-0001-Add-support-for-ON-CONFLICT-DO-SELECT-FOR.patch

+1 This patch adds a very useful feature.

I looked over the patch and noted that it touches some areas and concepts that
I don't feel sufficiently familiar with. For that reason, I'm removing myself
as a reviewer, hoping that someone with the appropriate expertise will step in.

That said, I read through the entire patch, and everything—code, comments,
tests, and documentation—appears tidy and well-structured. I didn't spot
any obvious errors or issues.

I did notice a couple of minor nits in the comments:

- The word "strength" is misspelled as "strengh" in a few places.
- There's an extra "if" in the comment "Returns true if if we're done."

Overall, the patch looks solid to me.

/Joel




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-03-07 Thread Jacob Champion
On Thu, Mar 6, 2025 at 9:13 PM Thomas Munro  wrote:
> I don't see that behaviour on my Mac with a simple program, and that
> seems like it couldn't possibly be intended.

What version of macOS?

Just to make sure I'm not chasing ghosts, I've attached my test
program. Here are my CI results for running it:

= FreeBSD =

[  6 us] timer is set
[   1039 us] kqueue is readable
[   1050 us] timer is reset
[   1052 us] kqueue is not readable

= NetBSD =

[  3 us] timer is set
[  14993 us] kqueue is readable
[  15000 us] timer is reset
[  15002 us] kqueue is not readable

= OpenBSD =

[ 24 us] timer is set
[  19660 us] kqueue is readable
[  19709 us] timer is reset
[  19712 us] kqueue is not readable

= macOS Sonoma =

[  4 us] timer is set
[   1282 us] kqueue is readable
[   1286 us] timer is reset
[   1287 us] kqueue is still readable
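Not the attached program, but a minimal sketch of this kind of
set/reset test, assuming only standard kqueue APIs:

#include <sys/types.h>
#include <sys/event.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    int kq = kqueue();
    struct kevent ev;
    struct timespec zero = {0, 0};
    int n;

    /* arm a 1 ms one-shot timer */
    EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 1, NULL);
    kevent(kq, &ev, 1, NULL, 0, NULL);

    usleep(2000);   /* let it fire: the kqueue becomes readable */

    /* reset the timer to a longer timeout */
    EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 1000, NULL);
    kevent(kq, &ev, 1, NULL, 0, NULL);

    /* poll: is the already-fired event still pending after the reset? */
    n = kevent(kq, NULL, 0, &ev, 1, &zero);
    printf("kqueue is %sreadable after reset\n", n > 0 ? "" : "not ");
    return 0;
}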

--Jacob


kqueue_test.c
Description: Binary data


Re: Add Pipelining support in psql

2025-03-07 Thread Daniel Verite
Jelte Fennema-Nio wrote:

> As an example you can copy paste this tiny script:
> 
> \startpipeline
> select pg_sleep(5) \bind \g
> \endpipeline
> 
> And then it will show these "extra argument ... ignored" warnings
> 
> \startpipeline: extra argument "select" ignored
> \startpipeline: extra argument "pg_sleep(5)" ignored

It happens with other metacommands as well, and appears
to depend on a readline option that is "on" by default since
readline-8.1 [1]

 enable-bracketed-paste

When set to ‘On’, Readline configures the terminal to insert each
paste into the editing buffer as a single string of characters,
instead of treating each character as if it had been read from the
keyboard. This is called putting the terminal into bracketed paste
mode; it prevents Readline from executing any editing commands
bound to key sequences appearing in the pasted text. The default
is ‘On’.


This behavior of the metacommand complaining about arguments
on the next line also happens when using \e and typing the same sequence
of commands in the editor. In that case readline is not involved.

There might be something to improve here, because
a metacommand cannot take its argument from the next line,
and yet that's what the error messages somewhat imply.
But that issue is not related to the new pipeline metacommands.


[1]
https://tiswww.case.edu/php/chet/readline/readline.html#index-enable_002dbracketed_002dpaste


Best regards,
-- 
Daniel Vérité 
https://postgresql.verite.pro/




Re: Log connection establishment timings

2025-03-07 Thread Jacob Champion
On Fri, Mar 7, 2025 at 2:17 PM Melanie Plageman
 wrote:
> Yea, "auth_id" is too ambiguous for a GUC option. I went with
> "authorization" because it sounds a bit less like a stage while also
> being a recognizable word. It achieves symmetry -- though it doesn't
> match the prefix used in the logs.

Yeah, that sounds reasonable.

--Jacob




Re: Expanding HOT updates for expression and partial indexes

2025-03-07 Thread Matthias van de Meent
On Thu, 6 Mar 2025 at 13:40, Burd, Greg  wrote:
>
> > On Mar 5, 2025, at 6:39 PM, Matthias van de Meent 
> >  wrote:
> >
> > On Wed, 5 Mar 2025 at 18:21, Burd, Greg  wrote:
> >> * augments IndexInfo only when needed for testing expressions and only once
> >
> > ExecExpressionIndexesUpdated seems to always loop over all indexes,
> > always calling AttributeIndexInfo which always updates the fields in
> > the IndexInfo when the index has only !byval attributes (e.g. text,
> > json, or other such varlena types). You say it happens only once, have
> > I missed something?
>
> There's a test that avoids doing it more than once, [...]

Is this that one?

+    if (indexInfo->ii_IndexAttrByVal)
+        return indexInfo;

I think that test doesn't work consistently: a bitmapset * is NULL
when no bits are set; and for some indexes no attribute will be byval,
thus failing this early-exit even after processing.
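A standalone analogue of that pitfall (invented names, not the patch's
code): when the computed value can legitimately be empty, a NULL check
cannot double as an "already computed" flag, so a separate boolean is
needed.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct cache
{
    bool valid;     /* was the set computed at all? */
    int *items;     /* NULL both before computing and when empty */
    int nitems;
};

static void
compute(struct cache *c, int n)
{
    c->items = (n > 0) ? calloc(n, sizeof(int)) : NULL; /* empty => NULL */
    c->nitems = n;
    c->valid = true;    /* without this flag, n == 0 looks uncomputed */
}

int
main(void)
{
    struct cache c = {0};

    compute(&c, 0);
    printf("computed: %s, items: %p\n", c.valid ? "yes" : "no",
           (void *) c.items);
    return 0;
}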

Another small issue with this approach is that it always calls and
tests in EEIU(), while it's quite likely we would do better if we
pre-processed _all_ indexes at once, so that we can have a path that
doesn't repeatedly get into EEIU only to exit immediately after. It'll
probably be hot enough to not matter much, but it's still cycles spent
on something that we can optimize for in code.

> >> * retains existing summarized index HOT update logic
> >
> > Great, thanks!
> >
> > Kind regards,
> >
> > Matthias van de Meent
> > Neon (https://neon.tech)
>
> I might widen this patch a bit to include support for testing equality of 
> index tuples using custom operators when they exist for the index.  In the 
> use case I'm solving for we use a custom operator for equality that is not 
> the same as a memcmp().  Do you have thoughts on that?

I don't think that's a very great idea. From a certain point of view,
you can see HOT as "deduplicating multiple tuple versions behind a
single TID". Btree doesn't support deduplication for types that can
have more than one representation of the same value so that e.g.
'0.0'::numeric and '0'::numeric are both displayed correctly, even
when they compare as equal according to certain equality operators.

So, I don't think that's worth investing time into right now. Maybe in
the future if there are new discoveries about what we can and cannot
deduplicate, but I don't think it should be part of an MVP or 1.0.


Kind regards,

Matthias van de Meent




Re: Log connection establishment timings

2025-03-07 Thread Melanie Plageman
On Fri, Mar 7, 2025 at 10:09 AM Bertrand Drouvot
 wrote:
>
> Given that conn_timing.ready_for_use is only used here:
>
> +   if (!reported_first_ready_for_query &&
> +       (log_connections & LOG_CONNECTION_READY_FOR_USE) &&
> +       IsConnectionBackend(MyBackendType))
> +   {
> +       uint64      total_duration,
> +                   fork_duration,
> +                   auth_duration;
> +
> +       conn_timing.ready_for_use = GetCurrentTimestamp();
> +
> +       total_duration =
> +           TimestampDifferenceMicroseconds(conn_timing.socket_create,
> +                                           conn_timing.ready_for_use);
>
> I wonder if ready_for_use needs to be part of ConnectionTiming after all.
> I mean we could just:
>
> "
> total_duration = TimestampDifferenceMicroseconds(conn_timing.socket_create, 
> GetCurrentTimestamp())
> "
>
> That said, having ready_for_use part of ConnectionTiming could be
> useful if we decide to create a SQL API on top of this struct though. So,
> that's probably fine and better as it is.

That's what I was thinking. It felt like good symmetry to have it there.

> And if we keep it part of ConnectionTiming, then:
>
> === 02
>
> +   if (!reported_first_ready_for_query &&
> +       (log_connections & LOG_CONNECTION_READY_FOR_USE) &&
> +       IsConnectionBackend(MyBackendType))
>
> What about getting rid of reported_first_ready_for_query and check if
> conn_timing.ready_for_use is zero instead?

Unfortunately, I think a TimestampTz value of 0 corresponds to
2000-01-01 00:00:00 UTC (the PostgreSQL epoch), which is a valid timestamp.
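A standalone sketch of the conversion (the epoch constant is computed
by hand here, not taken from backend headers):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define PG_EPOCH_IN_UNIX_TIME 946684800L    /* 2000-01-01 00:00:00 UTC */

int
main(void)
{
    int64_t ts = 0;             /* a TimestampTz of zero */
    time_t  unix_secs = PG_EPOCH_IN_UNIX_TIME + (time_t) (ts / 1000000);

    /* prints "Sat Jan  1 00:00:00 2000": a perfectly valid timestamp */
    printf("%s", asctime(gmtime(&unix_secs)));
    return 0;
}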

- Melanie




RE: Selectively invalidate caches in pgoutput module

2025-03-07 Thread Hayato Kuroda (Fujitsu)
Dear Amit, hackers,

> Yeah, this is a good improvement and the patch looks safe to me, so I
> pushed with minor changes in the comments.

Thanks! PSA rebased patch. While thinking more about 0002, I found a
conceptual bug - the relsync cache could not be re-built when SQL interfaces
are used. In the old version inval messages were not recorded in the WAL, so
the relsync cache could not be discarded if decoding happened later.
In the new version, the invalidation is handled as transactional and
serialized to the WAL. In more detail, messages are stored in
InvalMessageArrays and consumed at the end of transactions. For
simplification, the subgroup of the message is set to "relcache" when
serializing, but IIUC that handling is OK - snapshot invalidation takes the
same approach.

I did some performance testing with the given patch.
It is mostly the same as the previous test, but uses two publications.

1. Initialize an instance
2. Create a root table and leaf tables. The variable $NUM_PART controls
  how many partitions exist on the instance.
3. Create another table
4. Create publications (p1, p2). One includes the root table, and the other
  includes the independent table.
5. Create a replication slot with the pgoutput plugin.
6. Execute a transaction which would be decoded. In the transaction:
  a. Insert tuples to all the leaf tables
  b. Rename publication p2 to p2_renamed
  c. Rename publication p2_renamed to p2
  d. Insert tuples to all the leaf tables again.
7. decode all the changes via SQL interfaces.

My expectation is that for HEAD, the leaves are cached at a), discarded at
b) and c), then cached again at d).
For the patched case, only the independent table would be discarded at b)
and c), so the decoding time should be reduced.

The results below are the median of five runs. It seems to be around a 15%
improvement.

head [ms]    patched 0001-0003 [ms]
239          200

> Ideally, we need the time of only step-6d with and without the patch.
> Step 6-a is required to build the cache, but in actual workloads via
> walsender, we won't need that, and also, at the end of SQL API
> execution, we will forcibly invalidate all caches, so that is also not
> required. See, if in the performance measurement of the next set of
> patches, you can avoid that.

Hmm. It requires additional debug messages. Let me consider.

Best regards,
Hayato Kuroda
FUJITSU LIMITED 



v4-0001-Introduce-a-new-invalidation-message-to-invalidat.patch
Description:  v4-0001-Introduce-a-new-invalidation-message-to-invalidat.patch


v4-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-OWNE.patch
Description:  v4-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-OWNE.patch


test_r.sh
Description: test_r.sh


Re: jsonb_strip_nulls with arrays?

2025-03-07 Thread Andrew Dunstan



On 2025-03-06 Th 9:17 AM, Shinoda, Noriyoshi (SXD Japan FSI) wrote:
> Hi,
> Thanks for developing the good feature.
> I've attached a small patch for the documentation of the json_strip_nulls
> function. The data type of the 'target' parameter is different between the
> implementation and the documentation. The implementation is
> json_strip_nulls(target JSON, ...), but the current documentation says
> json_strip_nulls(target JSONB, ...).



Argh! My glasses must have been fogged up yesterday.


pushed, thanks


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Statistics Import and Export

2025-03-07 Thread Corey Huinker
>
> I tried to generalize that requirement to all of
> {schema|data|statistics} for consistency, but that resulted in 9
> options.
>

9 options that resolve to 3 boolean variables. It's not that hard.

And if we add a fourth option set, then we have 12 options. So it's O(3N),
not O(N^2).

People have scripts now that rely on the existing -only flags, and nearly
every other potentially optional thing has a -no flag. Let's leverage that.


Re: Statistics Import and Export

2025-03-07 Thread Corey Huinker
>
>
> if you want everything --include=schema,data,statistics (presumably
> redundant with the default behavior)
> if you want schema only --include=schema
> if you want "everything except schema" --include=data,statistics
>

Until we add a fourth option, and then it becomes completely ambiguous as
to whether you wanted data+statistics or did not want schema.



> And if someday, for example, there is ever agreement on including role
> information with normal pg_dump, you add "roles" as an option to be
> parsed via --include without having to create any new flags.
>

This is pushing a burden onto our customers for a parsing convenience.

-1.


Re: [Patch] remove duplicated smgrclose

2025-03-07 Thread Masahiko Sawada
Hi,

On Sun, Oct 27, 2024 at 12:05 PM Kirill Reshke  wrote:
>
> On Wed, 14 Aug 2024 at 11:35, Steven Niu  wrote:
> >
> > Junwang, Kirill,
> >
> > The split work has been done. I created a new patch for removing redundant 
> > smgrclose() function as attached.
> > Please help review it.
> >
> > Thanks,
> > Steven
> >
> > Steven Niu  wrote on Mon, Aug 12, 2024 at 18:11:
> >>
> >> Kirill,
> >>
> >> Good catch!
> >> I will split the patch into two to cover both cases.
> >>
> >> Thanks,
> >> Steven
> >>
> >>
> >> Junwang Zhao  wrote on Fri, Aug 9, 2024 at 18:19:
> >>>
> >>> On Fri, Aug 9, 2024 at 5:20 PM Kirill Reshke  
> >>> wrote:
> >>> >
> >>> > On Thu, 1 Aug 2024 at 17:32, Junwang Zhao  wrote:
> >>> > >
> >>> > > Hi Steven,
> >>> > >
> >>> > > On Wed, Jul 31, 2024 at 11:16 AM Steven Niu  
> >>> > > wrote:
> >>> > > >
> >>> > > > Hello, hackers,
> >>> > > >
> >>> > > > I think there may be some duplicated codes.
> >>> > > > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and 
> >>> > > > smgrclose().
> >>> > > > But both functions would close the SMgrRelation object; isn't that
> >>> > > > duplicated behavior?
> >>> > > >
> >>> > > > So I make this patch. Could someone take a look at it?
> >>> > > >
> >>> > > > Thanks for your help,
> >>> > > > Steven
> >>> > > >
> >>> > > > From Highgo.com
> >>> > > >
> >>> > > >
> >>> > > Your change LGTM, but the patch does not seem to apply to HEAD,
> >>> > > I generate the attached v2 using `git format` with some commit 
> >>> > > message.
> >>> > >
> >>> > > --
> >>> > > Regards
> >>> > > Junwang Zhao
> >>> >
> >>> > Hi all!
> >>> > This change looks good to me. However, i have an objection to these
> >>> > lines from v2:
> >>> >
> >>> > >  /* Close the forks at smgr level */
> >>> > > - for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> >>> > > - smgrsw[which].smgr_close(rels[i], forknum);
> >>> > > + smgrclose(rels[i]);
> >>> >
> >>> > Why do we do this? This seems to be an unrelated change given thread
> >>> > $subj. This is just a pure refactoring job, which deserves a separate
> >>> > patch. There is similar coding in
> >>> > smgrdestroy function:
> >>> >
> >>> > ```
> >>> > for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> >>> > smgrsw[reln->smgr_which].smgr_close(reln, forknum);
> >>> > ```
> >>> >
> >>> > So, I feel like these two places should be either changed together or
> >>> > not altered at all. And it is definitely a separate change.
> >>>
> >>> Yeah, I tend to agree with you, maybe we should split the patch
> >>> into two.
> >>>
> >>> Steven, could you do this?
> >>>
> >>> >
> >>> > --
> >>> > Best regards,
> >>> > Kirill Reshke
> >>>
> >>>
> >>>
> >>> --
> >>> Regards
> >>> Junwang Zhao
>
> Hi!
> Looks like discussion on the subject is completed, and no open items
> left, so I will try to mark commitfest [1] entry as Ready For
> Committer.
>

I've looked at the patch and have some comments:

The patch removes smgrclose() calls following smgrdounlinkall(), for example:

--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -686,9 +686,6 @@ smgrDoPendingDeletes(bool isCommit)
  {
  smgrdounlinkall(srels, nrels, false);

- for (int i = 0; i < nrels; i++)
- smgrclose(srels[i]);
-
  pfree(srels);
  }
 }

While smgrdounlinkall() closes the relation at the smgr level as follows:

/* Close the forks at smgr level */
for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
smgrsw[which].smgr_close(rels[i], forknum);

smgrrelease(), called by smgrclose(), does the same thing but also does
more, as follows:

void
smgrrelease(SMgrRelation reln)
{
for (ForkNumber forknum = 0; forknum <= MAX_FORKNUM; forknum++)
{
smgrsw[reln->smgr_which].smgr_close(reln, forknum);
reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
}
reln->smgr_targblock = InvalidBlockNumber;
}

Therefore, if we remove such smgrclose() calls, we would end up missing
some operations that are done in smgrrelease() but not in
smgrdounlinkall(), no?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: support virtual generated column not null constraint

2025-03-07 Thread Xuneng Zhou
Hi,

jian he  wrote on Thu, Mar 6, 2025 at 21:44:

> On Wed, Feb 26, 2025 at 3:01 PM ego alter  wrote:
> >
> > Hi, I've had a chance to review the patch.  As I am still getting
> familiar with the executor part, my questions and feedback could be
> relatively trivial.
> >
> > There are two minor issues i want to discuss:
> > 1. The way of caching constraint-checking expr for virtual generated not
> null
> > The existing array for caching constraint-checking expr is
> >  /* array of constraint-checking expr states */
> > ExprState **ri_ConstraintExprs;
> >
> > the proposed changes for virtual generated not null in patch
> > +  /* array of virtual generated not null constraint-checking expr
> states */
> > +  ExprState **ri_VirGeneratedConstraintExprs;
> > /*
> > Could you explain the rationale behind adding this new field instead of
> reusing ri_ConstraintExprs? The comment suggests it’s used specifically for
> not null constraint-checking, but could it be extended to handle other
> constraints in the future as well? I assume one benefit is that it
> separates the handling of normal constraints from virtual ones, but I'd
> like more context on the decision.
> >
>
> it's a cache mechanism, just like ResultRelInfo.ri_ConstraintExprs
> you can see comments in ExecRelCheck
> /*
>  * If first time through for this result relation, build expression
>  * nodetrees for rel's constraint expressions.  Keep them in the
> per-query
>  * memory context so they'll survive throughout the query.
>  */
>
> for example:
> create table t(a int, check (a > 3));
> insert into t values (4),(3);
> we need to call ExecRelCheck for values 4 and 3.
> The second time ExecRelCheck called for values 3, if
> ri_ConstraintExprs is not null then
> we didn't need to call ExecPrepareExpr again for the same check
> constraint expression.
>
> +  ExprState **ri_VirGeneratedConstraintExprs;
> is specifically only for virtual generated columns not null constraint
> evaluation.
> Anyway, I renamed it to ri_GeneratedNotNullExprs.
>
> If you want to extend cache for other constraints in the future,
> you can add it to struct ResultRelInfo.
> Currently struct ResultRelInfo, we have ri_GeneratedExprsU,
> ri_GeneratedExprsI for generated expressions;
> ri_ConstraintExprs for check constraints.
>
Thank you for your explanation! This name seems clearer for its usage.

>
> > 2. The naming of variable gen_virtualnn.
> > Gen_virtualnn looks confusing at first glance. Checkconstr seems to be
> more straightforward.
> >
> thanks. I changed it to exprstate.
>
> new patch attached.
> 0001 not null for virtual generated columns, some refactoring happened
> within ExecConstraints
> to avoid duplicated error handling code.
>
> 0002 and 0003 is the domain for virtual generated columns. domain can
> have not-null constraints,
> this is built on top of 0001, so I guess posting here for review should be
> fine?
>
> Currently it works as intended, but adding a virtual generated column
> with domain_with_constraints requires a table rewrite. In some cases a
> table scan should just be enough; in some cases we do need a table
> rewrite. For example:
> create domain d1 as int check(value > random(min=>11::int, max=>12));
> create domain d2 as int check(value > 12);
> create table t(a int);
> insert into t select g from generate_series(1, 10) g;
> alter table t add column b d1 generated always as (a+11) virtual; --we
> do need table rewrite in phase 3.
> alter table t add column c d2 generated always as (a + 12) virtual;
> --we can only do table scan in phase 3.
>


Re: support virtual generated column not null constraint

2025-03-07 Thread Xuneng Zhou
Hi,

forget to add hackers to cc.

Xuneng Zhou  wrote on Sat, Mar 8, 2025 at 12:10:

>
>
> Navneet Kumar  wrote on Sat, Mar 8, 2025 at 02:09:
>
>>
>>
>>> This scenario fails
>>> 1. CREATE TABLE person (
>>> id INT GENERATED BY DEFAULT AS IDENTITY,
>>> first_name VARCHAR(50) NOT NULL,
>>> last_name VARCHAR(50) NOT NULL
>>> );
>>>
>>> 2. INSERT INTO person (first_name, last_name)
>>> VALUES ('first', 'last');
>>>
>>> 3. ALTER TABLE person
>>> ADD COLUMN full_name VARCHAR(100) GENERATED ALWAYS AS (first_name || ' '
>>> || last_name) VIRTUAL;
>>>
>>
>> Forgot to mention NOT NULL constraint in above query.
>>
>> 3. ALTER TABLE person
>> ADD COLUMN full_name VARCHAR(100) NOT NULL GENERATED ALWAYS AS
>> (first_name || ' ' || last_name) VIRTUAL;
>>
>> ERROR:  column "full_name" of relation "person" contains null values
>>
>>
>
> I did some debugging for this error.  It is reported in this function:
>
> /*
>  * ATRewriteTable: scan or rewrite one table
>  *
>  * A rewrite is requested by passing a valid OIDNewHeap; in that case,
>  * caller must already hold AccessExclusiveLock on it.
>  */
> static void
> ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
> {
> ...
>     /* Now check any constraints on the possibly-changed tuple */
>     econtext->ecxt_scantuple = insertslot;
>
>     foreach(l, notnull_attrs)
>     {
>         int         attn = lfirst_int(l);
>
>         if (slot_attisnull(insertslot, attn + 1))
>         {
>             Form_pg_attribute attr = TupleDescAttr(newTupDesc, attn);
>
>             ereport(ERROR,
>                     (errcode(ERRCODE_NOT_NULL_VIOLATION),
>                      errmsg("column \"%s\" of relation \"%s\" contains null values",
>                             NameStr(attr->attname),
>                             RelationGetRelationName(oldrel)),
>                      errtablecol(oldrel, attn + 1)));
>         }
>     }
> ...
> }
>
>
> If this error is unexpected, I think the issue is that when adding a NOT
> NULL constraint to a regular column, pg scans the table to ensure no NULL
> values exist. But for virtual columns, there are no stored values to scan.
> Maybe we should add a condition like this, and check NOT NULL at run
> time instead?
>
>
> /* Skip NOT NULL validation for virtual generated columns during table rewrite */
> if (TupleDescAttr(newTupDesc, attn)->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
>     continue;
>


Re: Statistics Import and Export

2025-03-07 Thread Corey Huinker
On Sat, Mar 8, 2025 at 12:52 AM Hari Krishna Sunder 
wrote:

> To improve the performance of pg_dump can we add a new sql function that
> can operate more efficiently than the pg_stats view? It could also take in
> an optional list of oids to filter on.
> This will help speed up the dump and restore within pg18 and future
> upgrades to higher pg versions.
>
>
We can't install functions on the source database - it might be a read
replica.


Re: per backend WAL statistics

2025-03-07 Thread Bertrand Drouvot
Hi,

On Sat, Mar 08, 2025 at 10:57:38AM +0900, Michael Paquier wrote:
> On Fri, Mar 07, 2025 at 08:33:04AM +, Bertrand Drouvot wrote:
> > But when it's time to flush, then pgstat_backend_have_pending_cb() checks
> > for zeros in PendingBackendStats *without* any check on the backend type.
> > 
> > I think the issue is "masked" on HEAD because PendingBackendStats is
> > probably automatically initialized with zeros (as being a static variable at
> > file scope).
> 
> If this weren't true, we would have a lot of problems in more places
> than this one.

Yeah, I fully agree, and I think that was fine. I just added "probably" as
cautious wording, echoing the "We assume this initializes to zeroes" comments
that were removed in 07e9e28b56d and 88f5ebbbee3, for example.

> It does not hurt to add an initialization at the top
> of pgstat_backend.c for PendingBackendStats, to document the
> intention, while we're on it.

-static PgStat_BackendPending PendingBackendStats;
+static PgStat_BackendPending PendingBackendStats = {0};

Not sure about this change: I think that it would not always work should
PgStat_BackendPending contain padding. I mean, there is no guarantee that
this would initialize the padding bytes (if any) to zeroes.

That would not be an issue if we only accessed the struct fields in the
code, but that's not the case, as we're making use of
pg_memory_is_all_zeros() on it.
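A standalone sketch of the concern (not backend code): "= {0}" zeroes
every member, but guarantees about padding bytes are murky and can be
lost on copies, which matters once raw bytes are compared the way
pg_memory_is_all_zeros() does.

#include <stdio.h>
#include <string.h>

struct padded
{
    char c;     /* 1 byte, typically followed by 7 padding bytes */
    long l;
};

int
main(void)
{
    struct padded a = {0};      /* members zeroed, padding unspecified */
    struct padded b;

    memset(&b, 0, sizeof(b));   /* every byte zeroed, padding included */

    /* A raw byte comparison may or may not report equality for 'a'. */
    printf("memcmp: %d (0 only if padding happened to be zero)\n",
           memcmp(&a, &b, sizeof(a)));
    return 0;
}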

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Statistics Import and Export

2025-03-07 Thread Corey Huinker
Updated and rebased patches.

0001 is the same as v6-0002, but with proper ACL checks on schemas after
cache lookup

0002 attempts to replace all possible ERRORs in the restore/clear functions
with WARNINGs. This is done with an eye towards reducing the set of things
that could potentially cause an upgrade to fail.
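As an illustration of that demotion (invented variables, not the
patch's actual code), a restore-side check might warn and skip instead
of aborting:

if (!found_attname)
{
    ereport(WARNING,
            (errcode(ERRCODE_UNDEFINED_COLUMN),
             errmsg("column \"%s\" of relation \"%s\" does not exist",
                    attname, relname)));
    return false;       /* skip this statistic, let the restore continue */
}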

Spoke with Nathan about how best to batch the pg_stats fetches. I'll be
working on that now. Given that, the patch that optimized out
getAttributeStats() calls on indexes without expressions has been
withdrawn. While it's a clear incremental gain, we're looking for a couple
of orders of magnitude.
From 9cd4b4e0e280d0fd8cb120ac105d6e65a491cd7e Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Tue, 4 Mar 2025 22:16:52 -0500
Subject: [PATCH v7 1/2] Split relation into schemaname and relname.

In order to further reduce potential errors in restores and upgrades,
replace the numerous casts of fully qualified relation names with their
separate schema and relname text components.

Further remove the ::name casts on attname and change the expected
datatype to text.

Add an ACL_USAGE check on the namespace oid after it is looked up.
---
 src/include/catalog/pg_proc.dat|   8 +-
 src/include/statistics/stat_utils.h|   2 +
 src/backend/statistics/attribute_stats.c   |  87 --
 src/backend/statistics/relation_stats.c|  65 +++--
 src/backend/statistics/stat_utils.c|  37 +++
 src/bin/pg_dump/pg_dump.c  |  25 +-
 src/bin/pg_dump/t/002_pg_dump.pl   |   6 +-
 src/test/regress/expected/stats_import.out | 307 +
 src/test/regress/sql/stats_import.sql  | 276 +++---
 doc/src/sgml/func.sgml |  41 +--
 10 files changed, 566 insertions(+), 288 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cede992b6e2..fdd4b8d7dba 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12443,8 +12443,8 @@
   descr => 'clear statistics on relation',
   proname => 'pg_clear_relation_stats', provolatile => 'v', proisstrict => 'f',
   proparallel => 'u', prorettype => 'void',
-  proargtypes => 'regclass',
-  proargnames => '{relation}',
+  proargtypes => 'text text',
+  proargnames => '{schemaname,relname}',
   prosrc => 'pg_clear_relation_stats' },
 { oid => '8461',
   descr => 'restore statistics on attribute',
@@ -12459,8 +12459,8 @@
   descr => 'clear statistics on attribute',
   proname => 'pg_clear_attribute_stats', provolatile => 'v', proisstrict => 'f',
   proparallel => 'u', prorettype => 'void',
-  proargtypes => 'regclass name bool',
-  proargnames => '{relation,attname,inherited}',
+  proargtypes => 'text text text bool',
+  proargnames => '{schemaname,relname,attname,inherited}',
   prosrc => 'pg_clear_attribute_stats' },
 
 # GiST stratnum implementations
diff --git a/src/include/statistics/stat_utils.h b/src/include/statistics/stat_utils.h
index 0eb4decfcac..cad042c8e4a 100644
--- a/src/include/statistics/stat_utils.h
+++ b/src/include/statistics/stat_utils.h
@@ -32,6 +32,8 @@ extern bool stats_check_arg_pair(FunctionCallInfo fcinfo,
 
 extern void stats_lock_check_privileges(Oid reloid);
 
+extern Oid stats_schema_check_privileges(const char *nspname);
+
 extern bool stats_fill_fcinfo_from_arg_pairs(FunctionCallInfo pairs_fcinfo,
 			 FunctionCallInfo positional_fcinfo,
 			 struct StatsArgInfo *arginfo);
diff --git a/src/backend/statistics/attribute_stats.c b/src/backend/statistics/attribute_stats.c
index 6bcbee0edba..f87db2d6102 100644
--- a/src/backend/statistics/attribute_stats.c
+++ b/src/backend/statistics/attribute_stats.c
@@ -36,7 +36,8 @@
 
 enum attribute_stats_argnum
 {
-	ATTRELATION_ARG = 0,
+	ATTRELSCHEMA_ARG = 0,
+	ATTRELNAME_ARG,
 	ATTNAME_ARG,
 	ATTNUM_ARG,
 	INHERITED_ARG,
@@ -58,8 +59,9 @@ enum attribute_stats_argnum
 
 static struct StatsArgInfo attarginfo[] =
 {
-	[ATTRELATION_ARG] = {"relation", REGCLASSOID},
-	[ATTNAME_ARG] = {"attname", NAMEOID},
+	[ATTRELSCHEMA_ARG] = {"schemaname", TEXTOID},
+	[ATTRELNAME_ARG] = {"relname", TEXTOID},
+	[ATTNAME_ARG] = {"attname", TEXTOID},
 	[ATTNUM_ARG] = {"attnum", INT2OID},
 	[INHERITED_ARG] = {"inherited", BOOLOID},
 	[NULL_FRAC_ARG] = {"null_frac", FLOAT4OID},
@@ -80,7 +82,8 @@ static struct StatsArgInfo attarginfo[] =
 
 enum clear_attribute_stats_argnum
 {
-	C_ATTRELATION_ARG = 0,
+	C_ATTRELSCHEMA_ARG = 0,
+	C_ATTRELNAME_ARG,
 	C_ATTNAME_ARG,
 	C_INHERITED_ARG,
 	C_NUM_ATTRIBUTE_STATS_ARGS
@@ -88,8 +91,9 @@ enum clear_attribute_stats_argnum
 
 static struct StatsArgInfo cleararginfo[] =
 {
-	[C_ATTRELATION_ARG] = {"relation", REGCLASSOID},
-	[C_ATTNAME_ARG] = {"attname", NAMEOID},
+	[C_ATTRELSCHEMA_ARG] = {"schemaname", TEXTOID},
+	[C_ATTRELNAME_ARG] = {"relname", TEXTOID},
+	[C_ATTNAME_ARG] = {"attname", TEXTOID},
 	[C_INHERITED_ARG] = {"inherited", BOOLOID},
 	[C_NUM_ATTRIBUTE_STATS_ARGS] = {0}
 };
@@ -133,6 +137,9 @@ static void init_empty_stats_tuple(

Re: Add contrib/pg_logicalsnapinspect

2025-03-07 Thread Bertrand Drouvot
Hi,

On Fri, Mar 07, 2025 at 12:09:35PM -0800, Masahiko Sawada wrote:
> Thank you for updating the patch. It looks mostly good to me. I've
> made some cosmetic changes and attached the updated version.

LGTM, thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pg_dump, pg_dumpall, pg_restore: Add --no-policies option

2025-03-07 Thread Jim Jones
Hi

On 27.02.25 15:37, vignesh C wrote:
> Here is a rebased version along with the test failure fixes, please
> accept the change if you are ok with it.


Patch LGTM. +1

It applies cleanly and works as described:


== pg_dump ==

$ /usr/local/postgres-dev/bin/pg_dump db > dump.out
$ grep "POLICY" dump.out | tee /dev/tty | wc -l
-- Name: t p; Type: POLICY; Schema: public; Owner: jim
CREATE POLICY p ON public.t FOR DELETE;
2

$ /usr/local/postgres-dev/bin/pg_dump db --no-policies > dump.out
$ grep "POLICY" dump.out | tee /dev/tty | wc -l
0

== pg_dumpall ==

$ /usr/local/postgres-dev/bin/pg_dumpall > dumpall.out
$ grep "POLICY" dumpall.out | tee /dev/tty | wc -l
-- Name: t p; Type: POLICY; Schema: public; Owner: jim
CREATE POLICY p ON public.t FOR DELETE;
2

$ /usr/local/postgres-dev/bin/pg_dumpall --no-policies > dumpall.out
$ grep "POLICY" dumpall.out | tee /dev/tty | wc -l
0

== pg_restore ==

$ /usr/local/postgres-dev/bin/pg_dump -Fc db > dump.out
$ /usr/local/postgres-dev/bin/pg_restore -l dump.out | grep POLICY | tee
/dev/tty | wc -l
3375; 3256 16388 POLICY public t p jim
1
$ /usr/local/postgres-dev/bin/pg_restore --no-policies -l dump.out |
grep POLICY | tee /dev/tty | wc -l
0


check-world passes and the documentation is consistent with the existing
"--no*" options of pg_dump, pg_dumpall, and pg_restore.

The new status of this patch is: Ready for Committer

Best regards, Jim





Re: Log connection establishment timings

2025-03-07 Thread Bertrand Drouvot
Hi,

On Thu, Mar 06, 2025 at 02:10:47PM -0500, Andres Freund wrote:
> On 2025-01-20 15:01:38 +, Bertrand Drouvot wrote:
> > /* If start_time is in the future or now, no time has elapsed */
> > if (start_time >= stop_time)
> > return 0;
> > "
> > 
> > I think that it can happen due to time changes.
> 
> > So with TimestampTz, we would:
> > 
> > 1. return 0 if we moved the time backward
> > 2. provide an inflated duration including the time jump (if the time move
> > forward).
> > 
> > But with instr_time (and on systems that support CLOCK_MONOTONIC) then
> > pg_clock_gettime_ns() should not be affected by system time change IIUC.
> 
It still does jump around a bit on some systems, even if it shouldn't. It's
not at all rare to see time discontinuities when getting scheduled on
another socket than before 

Interesting, yeah I can imagine that could happen. 

> or when a VM got migrated. It shouldn't happen, but does.

Agree, those are convincing examples.

> I think it'd be better to use absolute times and store them as such in
> ConnectionTimes or whatever.

It was still not clear to me why using TimestampTz would be better, until I 
read:

> That way we have information about when a
> connection was established for some future SQL functions and for debugging
> problems.
> 

Thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Separate GUC for replication origins

2025-03-07 Thread Euler Taveira
On Fri, Mar 7, 2025, at 11:47 AM, Peter Eisentraut wrote:
> Is that maximum active for the whole system, or maximum active per 
> session, or maximum active per created origin, or some combination of these?
> 

It is a cluster-wide setting. Similar to max_replication_slots. I will make
sure the GUC description is clear about it. It seems the Replication Progress
Tracking chapter needs an update to specify this information too.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Refactoring postmaster's code to cleanup after child exit

2025-03-07 Thread Tomas Vondra
On 3/7/25 15:53, Andres Freund wrote:
> Hi,
> 
> On 2025-03-06 22:49:20 +0200, Heikki Linnakangas wrote:
>> In short, all the 4 patches look good to me. Thanks for picking this up!
>>
>> On 06/03/2025 22:16, Andres Freund wrote:
>>> On 2025-03-05 20:49:33 -0800, Noah Misch wrote:
> This behaviour makes it really hard to debug problems. It'd have been a 
> lot
> easier to understand the problem if we'd seen psql's stderr before the 
> test
> died.
>
> I guess that mean at the very least we'd need to put an eval {} around the
> ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?

 That sounds right.
>>>
>>> In the attached patch I did that for wait_connect().  I did verify that it
>>> works by implementing the wait_connect() fix before fixing
>>> 002_connection_limits.pl, which fails if a sleep(1) is added just before the
>>> proc_exit(1) for FATAL.
>>
>> +1. For the archives sake, I just want to clarify that this pump stuff is
>> all about getting better error messages on a test failure. It doesn't help
>> with the original issue.
> 
> Agreed.
> 

FWIW I keep running into this (and skink seems unhappy too). I ended up
just adding a sleep(1), right before

push(@sessions, background_psql_as_user('regress_superuser'));

and that makes it work on all my machines (including rpi5).


regards

-- 
Tomas Vondra





Re: making EXPLAIN extensible

2025-03-07 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 7, 2025 at 9:38 AM Peter Eisentraut  wrote:
>> Also, benign typedef redefinitions are a C11 feature.  In practice, all
>> compilers currently in play support it, and the only problem you'll get
>> is from the buildfarm members that are explicitly set up to warn about
>> accidental C11 use.  We could probably have a discussion about that, but
>> for this patch set, it's probably better to just deal with the status quo.

> Agreed. +1 for having a discussion at some point, though, because the
> effect of the current rules seems to be that you have to write "struct
> BananaSplit *" in a bunch of places instead of just 'BananaSplit *" to
> avoid redefining the typedef.

I'd be +1 if there's a way to allow that particular thing without
thereby opening the floodgates to every other C11 feature.  I expect
not all of C11 is universal yet, so I think the buildfarm animals
that are using -std=gnu99 are mostly doing us a service.  But yeah,
this particular thing is a pain in the rear.
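A minimal illustration of the point, reusing the hypothetical type name
from above: repeating a typedef is a constraint violation under
-std=gnu99 but allowed in C11, which is why C99-targeting headers fall
back to the struct tag.

typedef struct BananaSplit BananaSplit;     /* first declaration */
typedef struct BananaSplit BananaSplit;     /* OK in C11, error in C99 */

struct BananaSplit
{
    int scoops;
};

/* the C99-safe workaround: refer to the struct tag directly */
static int
count_scoops(struct BananaSplit *b)
{
    return b->scoops;
}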

regards, tom lane




Re: pg_atomic_compare_exchange_*() and memory barriers

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-07 17:47:08 +0200, Alexander Korotkov wrote:
> While investigating a bug in the patch to get rid of WALBufMappingLock, I
> found that the surrounding pg_atomic_compare_exchange_u64() fixes the
> problem for me.

That sounds more likely to be due to slowing down things enough to make a race
less likely to be hit. Or a compiler bug.


> That doesn't feel right because, according to the
> comments, both pg_atomic_compare_exchange_u32() and
> pg_atomic_compare_exchange_u64() should provide full memory barrier
> semantics.  So, I decided to investigate this further.
>
> In my case, the implementation of pg_atomic_compare_exchange_u64() is based
> on __atomic_compare_exchange_n().
>
> static inline bool
> pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
> uint64 *expected, uint64 newval)
> {
> AssertPointerAlignment(expected, 8);
> return __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
>__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> }
>
> According to the docs __ATOMIC_SEQ_CST enforces total ordering with *all
> other __ATOMIC_SEQ_CST operations*.  It only says about other
> __ATOMIC_SEQ_CST operations; nothing is said about regular reads/writes.
> This sounds quite far from our comment, promising full barrier semantics,
> which, in my understanding, means the equivalent of
> both pg_read_barrier()/pg_write_barrier(), which should barrier *all other
> read/writes*.

A memory barrier in one process/thread always needs to be paired with a barrier
in another process/thread. It's not enough to have a memory barrier on one
side but not the other.
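A standalone C11 sketch of that pairing (illustrative, not PostgreSQL
code): the release store only orders anything because the other thread
performs the matching acquire load; drop either side and the guarantee
is gone.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int        payload;      /* plain, non-atomic data */
static atomic_int ready = 0;

static void *
producer(void *arg)
{
    (void) arg;
    payload = 42;                                   /* 1: write data */
    atomic_store_explicit(&ready, 1,
                          memory_order_release);    /* 2: publish */
    return NULL;
}

static void *
consumer(void *arg)
{
    (void) arg;
    while (!atomic_load_explicit(&ready,
                                 memory_order_acquire)) /* pairs with 2 */
        ;                                           /* spin */
    printf("payload = %d\n", payload);              /* guaranteed 42 */
    return NULL;
}

int
main(void)
{
    pthread_t p, c;

    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}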

__ATOMIC_SEQ_CST is used to implement the C11/C++11 barriers, and for those
it's more clearly formulated that they include acquire/release semantics.  The
standard formulation is a bit more complicated IIRC, but here's the
cppreference.com simplification:

https://en.cppreference.com/w/c/atomic/memory_order
>  A load operation with this memory order performs an acquire operation, a
>  store performs a release operation, and read-modify-write performs both an
>  acquire operation and a release operation, plus a single total order exists
>  in which all threads observe all modifications in the same order (see
>  Sequentially-consistent ordering below).


>
> This function first checks if LSE instructions are present.  If so,
> instruction LSE casal is used.  If not (in my case), the loop of
> ldaxr/stlxr is used instead.  The documentation says both ldaxr and stlxr
> provide one-way barriers.  Reads/writes after ldaxr will be observed after it,
> and reads/writes before stlxr will be observed before it.  So, a pair of these
> instructions provides a full memory barrier.  However, if we don't observe
> the expected value, only ldaxr gets executed.  So, an unsuccessful
> pg_atomic_compare_exchange_u64() attempt will leave us with a one-way
> barrier, and that caused a bug in my case.

That has to be a compiler bug.  We specify __ATOMIC_SEQ_CST for both
success_memorder *and* failure_memorder.

What compiler & version is this?


> I have a couple of ideas on how to fix this problem.
> 1. Provide full barrier semantics for pg_atomic_compare_exchange_*().  In
> particular, for __atomic_compare_exchange_n(), we should probably
> use __ATOMIC_ACQ_REL for success and run an explicit memory barrier in the
> case of failure.

I don't follow - __ATOMIC_SEQ_CST is *stronger* than __ATOMIC_ACQ_REL.

Greetings,

Andres Freund




Re: Add Postgres module info

2025-03-07 Thread Andrei Lepikhov

On 2/3/2025 20:35, Andrei Lepikhov wrote:
> On 17/2/2025 04:00, Michael Paquier wrote:
>> No documentation provided.
> Ok, I wasn't sure this idea had a chance of being committed. I will
> introduce the docs in the next version.
This is a new version with bug fixes. It also uses TAP tests instead of
regression tests. Still, no documentation is included.
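A hypothetical usage sketch based on the commit message below; the
exact field syntax of the new macro is an assumption, not taken from
the patch:

/* in a module's main C file, instead of the plain PG_MODULE_MAGIC */
PG_MODULE_MAGIC_EXT(
    .name = "test_slru",
    .version = "1.0.0"
);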


--
regards, Andrei Lepikhov

From 3ef2579bfa3dd0306d63e355ceae65f9f476dea8 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Tue, 19 Nov 2024 18:45:36 +0700
Subject: [PATCH v4] Introduce PG_MODULE_MAGIC_EXT macro.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This macro provides dynamically loaded shared libraries (modules) with a
standard way to incorporate version data (supposedly defined according to the
semantic versioning specification) and a name. The introduced catalog routine
pg_get_modules can be used to find a module by name and check its version. It
makes users independent of file naming conventions.

With a growing number of Postgres core hooks and the introduction of named DSM
segments, the number of modules that don't need to be loaded on startup may
grow fast. Moreover, in many cases related to query tree transformation or
extra path recommendation, such modules might not need database objects except
GUCs - see auto_explain as an example. That means they don't need to execute
the 'CREATE EXTENSION' statement at all and don't have a record in
the pg_extension table. Such a trick provides much flexibility, including
online upgrades, and may become widespread.

In addition, it is also convenient for support to be sure that the
installation (or at least the backend) includes a specific version of the
module. Even if a module has an installation script, it is not rare for it to
provide an implementation for a range of UI routine versions, so it makes
sense to verify which specific version of the code is used.

Discussions [1,2] already mentioned module-info stuff, but at that time,
extensibility techniques and extension popularity were low, and it wasn't
necessary to provide that data.

[1] https://www.postgresql.org/message-id/flat/20060507211705.GB3808%40svana.org
[2] 
https://www.postgresql.org/message-id/flat/20051106162658.34c31d57%40thunder.logicalchaos.org
---
 src/backend/utils/fmgr/dfmgr.c| 66 ++-
 src/include/catalog/pg_proc.dat   |  6 ++
 src/include/fmgr.h| 28 +++-
 src/test/modules/test_misc/Makefile   |  5 +-
 .../test_misc/t/008_check_pg_get_modules.pl   | 65 ++
 src/test/modules/test_shm_mq/test.c   |  5 +-
 src/test/modules/test_slru/test_slru.c|  5 +-
 7 files changed, 172 insertions(+), 8 deletions(-)
 create mode 100644 src/test/modules/test_misc/t/008_check_pg_get_modules.pl

diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 87b233cb887..70d6ced4157 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -21,10 +21,12 @@
 #endif /* !WIN32 */
 
 #include "fmgr.h"
+#include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "utils/builtins.h"
 #include "utils/hsearch.h"
 
 
@@ -51,6 +53,7 @@ typedef struct df_files
 	ino_t		inode;			/* Inode number of file */
 #endif
 	void	   *handle;			/* a handle for pg_dl* functions */
+	const pg_module_info *minfo;
 	char		filename[FLEXIBLE_ARRAY_MEMBER];	/* Full pathname of file */
 } DynamicFileList;
 
@@ -75,7 +78,7 @@ static char *substitute_libpath_macro(const char *name);
 static char *find_in_dynamic_libpath(const char *basename);
 
 /* Magic structure that module needs to match to be accepted */
-static const Pg_magic_struct magic_data = PG_MODULE_MAGIC_DATA;
+static const Pg_magic_struct magic_data = PG_MODULE_MAGIC_DATA(0);
 
 
 /*
@@ -245,8 +248,14 @@ internal_load_library(const char *libname)
 	{
 		const Pg_magic_struct *magic_data_ptr = (*magic_func) ();
 
+		/*
+		 * Check the magic field from the loaded library to be sure it was
+		 * compiled for the same Postgres code.  Skip maintainer fields at
+		 * the end of the struct.
+		 */
 		if (magic_data_ptr->len != magic_data.len ||
-			memcmp(magic_data_ptr, &magic_data, magic_data.len) != 0)
+			memcmp(magic_data_ptr, &magic_data,
+				   offsetof(Pg_magic_struct, module_extra)) != 0)
 		{
 			/* copy data block before unlinking library */
 			Pg_magic_struct module_magic_data = 

Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Robert Haas
On Fri, Mar 7, 2025 at 9:37 AM Michael Banck  wrote:
> Also, I think there is case to be made that a cloud provider (or site
> admin) would like to delegate the decision whether users with CREATE
> rights on a particular database are allowed to install some extensions
> or not. Or rather, assign somebody they believe would make the right
> call to do that, by granting pg_manage_extensions.

Hypothetically, somebody could want a feature at various levels of
granularity. The most fine-grained would be something like: [1] allow
user X to install extension Y. Then, more broadly, you could have: [2]
allow any user who can install extensions to install extension Y. Or
conversely: [3] allow user X to install any extension. This patch
implements [3], but you could make an argument for any of the others.
My previous proposal amounted to allowing [2] via filesystem hacks,
but you could also have a GUC to allow [2], approximately:
artificially_trusted_extensions = foo, bar. That would actually allow
for [1] as well, because you could apply that GUC via ALTER ROLE ..
SET. I'm not saying that's necessarily better than what you're
proposing, but I think it's worth taking the time to think through the
options carefully.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: zstd failing on mipsel (PG 15.12, pg_verifybackup/t/010_client_untar.pl)

2025-03-07 Thread Christoph Berg
Re: To PostgreSQL Hackers
> ... and the file is just fine. The blame is now on the kernel side for
> any machine-level problems. (The kernel on the porter box where I
> tried to reproduce the problem is much older than on the actual
> buildds. A new version that hopefully fixes the problem is being built
> now.)

A new kernel fixed the problem indeed:

https://buildd.debian.org/status/logs.php?pkg=postgresql-15&ver=15.12-0%2Bdeb12u2&arch=mipsel&suite=bookworm

Christoph




Re: Trigger more frequent autovacuums

2025-03-07 Thread wenhui qiu
Hi,
   The more accurate data I've found is tabentry->live_tuples; it provides
the basis for the second version.

# Here's a simple test I did

test=# select count(*) from join1;
  count
-
 2289001
(1 row)


test=# update join1 set name=md5(now()::text) where id<100;
UPDATE 1938700
test=# select 1938700/2289001;
 ?column?
--
0
(1 row)

test=# select 1938700/2289001::float;
  ?column?

 0.8469633696097119
(1 row)

test=#



test=# select count(*) from join1;
  count
-
 2289001
(1 row)
test=# update join1 set name=md5(now()::text) where id<=8;
UPDATE 159901
test=# select 159901/2289001::float;
  ?column?
-
 0.06985623859491542
(1 row)


test=# select * from pg_stat_all_tables where relname='join1';
-[ RECORD 1 ]--+--
relid  | 16385
schemaname | public
relname| join1
seq_scan   | 17
last_seq_scan  | 2025-03-07 15:34:02.793659+08
seq_tup_read   | 14994306
idx_scan   | 7
last_idx_scan  | 2025-03-07 15:34:23.404788+08
idx_tup_fetch  | 500281
n_tup_ins  | 2289001
n_tup_upd  | 2268604
n_tup_del  | 0
n_tup_hot_upd  | 399
n_tup_newpage_upd  | 2268205
n_live_tup | 2286701
n_dead_tup | 159901
n_mod_since_analyze| 159901
n_ins_since_vacuum | 0
last_vacuum| 2025-03-06 18:18:11.318419+08
last_autovacuum| 2025-03-07 15:25:53.055576+08
last_analyze   | 2025-03-06 18:18:11.424253+08
last_autoanalyze   | 2025-03-07 15:25:53.456656+08
vacuum_count   | 3
autovacuum_count   | 3
analyze_count  | 2
autoanalyze_count  | 4
total_vacuum_time  | 205
total_autovacuum_time  | 2535
total_analyze_time | 203
total_autoanalyze_time | 1398

test=#
test=# update join1 set name=md5(now()::text) where id<=8;
UPDATE 159901


test=# \x
Expanded display is on.
test=# select (n_live_tup)/(n_live_tup+n_dead_tup)::float from
pg_stat_all_tables where relname='join1';
-[ RECORD 1 ]
?column? | 0.8774142777358045

test=# select *  from pg_stat_all_tables where relname='join1';
-[ RECORD 1 ]--+--
relid  | 16385
schemaname | public
relname| join1
seq_scan   | 17
last_seq_scan  | 2025-03-07 15:34:02.793659+08
seq_tup_read   | 14994306
idx_scan   | 8
last_idx_scan  | 2025-03-07 15:46:38.331795+08
idx_tup_fetch  | 660182
n_tup_ins  | 2289001
n_tup_upd  | 2428505
n_tup_del  | 0
n_tup_hot_upd  | 424
n_tup_newpage_upd  | 2428081
n_live_tup | 2289001
n_dead_tup | 319802
n_mod_since_analyze| 0
n_ins_since_vacuum | 0
last_vacuum| 2025-03-06 18:18:11.318419+08
last_autovacuum| 2025-03-07 15:25:53.055576+08
last_analyze   | 2025-03-06 18:18:11.424253+08
last_autoanalyze   | 2025-03-07 15:47:35.950932+08
vacuum_count   | 3
autovacuum_count   | 3
analyze_count  | 2
autoanalyze_count  | 5
total_vacuum_time  | 205
total_autovacuum_time  | 2535
total_analyze_time | 203
total_autoanalyze_time | 1770

test=#
tail -n 1000 postgresql-Fri_17.csv |grep join1
2025-03-07 17:30:12.782 +08,,,755739,,67cabca4.b881b,3,,2025-03-07 17:30:12
+08,2017/2,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
2025-03-07 17:31:12.803 +08,,,756028,,67cabce0.b893c,3,,2025-03-07 17:31:12
+08,2003/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
2025-03-07 17:32:12.822 +08,,,756405,,67cabd1c.b8ab5,3,,2025-03-07 17:32:12
+08,2006/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
2025-03-07 17:33:12.842 +08,,,757026,,67cabd58.b8d22,3,,2025-03-07 17:33:12
+08,2009/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0

On Fri, Mar 7, 2025 at 2:22 PM wenhui qiu  wrote:

> Hi Nathan Bossart, Melanie Plageman,
>
> Firstly, congratulations on the submission of this patch:
> https://commitfest.postgresql.org/patch/5320/
>
> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
> anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
> vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor *
> reltuples;
> These three calculations have already been optimised f

Re: Draft for basic NUMA observability

2025-03-07 Thread Jakub Wartak
On Fri, Mar 7, 2025 at 11:20 AM Jakub Wartak
 wrote:
>
> Hi,
> On Wed, Mar 5, 2025 at 10:30 AM Jakub Wartak
>  wrote:
> >Hi,
>
> > > Yeah, that's why I was mentioning to use a "shared" 
> > > populate_buffercache_entry()
> > > or such function: to put the "duplicated" code in it and then use this
> > > shared function in pg_buffercache_pages() and in the new numa related one.
> >
> > OK, so hastily attempted that in 7b , I had to do a larger refactor
> > there to avoid code duplication between those two. I don't know which
> > attempt is better though (7 vs 7b)..
> >
>
> I'm attaching basically the earlier stuff (v7b) as v8 with the
> following minor changes:
> - docs are included
> - changed int8 to int4 in one function definition for numa_zone_id

.. and v9 attached because cfbot partially complained about
.cirrus.tasks.yml being adjusted recently (it seems master is hot
these days).

-J.


v9-0001-Add-optional-dependency-to-libnuma-Linux-only-for.patch
Description: Binary data


v9-0003-Add-pg_shmem_numa_allocations-to-show-NUMA-zones-.patch
Description: Binary data


v9-0002-Extend-pg_buffercache-with-new-view-pg_buffercach.patch
Description: Binary data


Re: Add contrib/pg_logicalsnapinspect

2025-03-07 Thread Bertrand Drouvot
Hi,

On Fri, Mar 07, 2025 at 10:26:23AM +0530, Amit Kapila wrote:
> On Fri, Mar 7, 2025 at 3:19 AM Masahiko Sawada  wrote:
> >
> > On Wed, Mar 5, 2025 at 4:05 AM Bertrand Drouvot
> >  wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Mar 05, 2025 at 02:42:15PM +0530, Amit Kapila wrote:
> > > > On Wed, Mar 5, 2025 at 12:47 PM Bertrand Drouvot
> > > >  wrote:
> > > > >
> > > > > Agree, PFA a patch doing so.
> > > > >
> > > >
> > > > It would be better if you could add a few comments atop the
> > > > permutation line to explain the working of the test.
> > >
> > > yeah makes sense. Done in the attached, and bonus point I realized that 
> > > the
> > > test could be simplified (so, removing useless steps in passing).
> > >
> >
> > Thank you for the patch.
> >
> > The new simplified test case can be pretty-formatted as:
> >
> > init
> > begin
> > savepoint
> > truncate
> > checkpoint-1
> > get_changes-1
> > commit
> > checkpoint-2
> > get_changes-2
> > info_catchange check
> > info_committed check
> > meta check

Yes.

> > IIUC if another checkpoint happens between get_change-2 and the
> > subsequent checks, the first snapshot would be removed during the
> > checkpoint, resulting in a test failure.

Good catch! Yeah you're right, thanks!

> I think we could check the
> > snapshot files while one transaction keeps open. The more simplified
> > test case would be:
> >
> > init
> > begin
> > savepoint
> > insert(cat-change)
> > begin
> > insert(cat-change)
> > commit
> > checkpoint
> > get_changes
> > info_catchange check
> > info_committed check
> > meta check
> > commit
> >
> > In this test case, we would have at least one serialized snapshot that
> > has both cat-changes and committed txns. What do you think?

Indeed, I think that would prevent snapshots from being removed.

The attached ends up doing:

init
begin
savepoint
truncate table1
   create table table2
   checkpoint
   get_changes
   info check
   meta check
commit

As the 2 ongoing catalog changes and the committed catalog change are part
of the same snapshot, I grouped the catchanges and committed-changes checks
into the same "info check".

> Your proposed change in the test sounds better than what we have now
> but I think we should also avoid autovacuum to perform analyze as that
> may add additional counts. For test_decoding, we keep
> autovacuum_naptime = 1d in logical.conf file, we can either use the
> same here or simply keep autovacuum off.

When writing the attached, I initially added extra paranoia in the tests by
using ">=". Does that also address your autovacuum concern?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 415d707187070df88e11e9bb1059309ab102953e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 5 Mar 2025 07:06:58 +
Subject: [PATCH v3] Modify pg_logicalinspect isolation test

The previous version was relying on the fact that the test produces exactly
2 snapshots on disk, while in fact it can produce more. Changing the test knowing
that at least 2 snapshots are generated.

In passing, removing useless steps and adding some comments.

Per buildfarm member skink.
---
 .../expected/logical_inspect.out  | 44 +--
 .../specs/logical_inspect.spec| 18 +---
 2 files changed, 24 insertions(+), 38 deletions(-)
  57.1% contrib/pg_logicalinspect/expected/
  42.8% contrib/pg_logicalinspect/specs/

diff --git a/contrib/pg_logicalinspect/expected/logical_inspect.out b/contrib/pg_logicalinspect/expected/logical_inspect.out
index d95efa4d1e5..c86711e471d 100644
--- a/contrib/pg_logicalinspect/expected/logical_inspect.out
+++ b/contrib/pg_logicalinspect/expected/logical_inspect.out
@@ -1,6 +1,6 @@
 Parsed test spec with 2 sessions
 
-starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_checkpoint s1_get_changes s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes s1_get_logical_snapshot_info s1_get_logical_snapshot_meta
+starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_create_table s1_checkpoint s1_get_changes s1_get_logical_snapshot_info s1_get_logical_snapshot_meta s0_commit
 step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
 ?column?
 
@@ -10,43 +10,23 @@ init
 step s0_begin: BEGIN;
 step s0_savepoint: SAVEPOINT sp1;
 step s0_truncate: TRUNCATE tbl1;
+step s1_create_table: CREATE TABLE tbl2 (val1 integer, val2 integer);
 step s1_checkpoint: CHECKPOINT;
 step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'inclu

Re: Commitfest app release on Feb 17 with many improvements

2025-03-07 Thread Jelte Fennema-Nio
On Fri, 7 Mar 2025 at 11:26, Andreas Karlsson  wrote:
> Out of curiosity: do you track which method works? I would expect
> everything to be applied with either git am or patch which can be
> applied with git apply making git apply technically unnecessary.

I think we need all of them...
- git apply is unable to apply the patch in 5272 on top of main
- git am works with that 5272, but cannot apply any patch not created
by git format-patch (for instance 5522)
- I've been unable to get patch(1) not to return a failure if the
patch is already applied (e.g. applying 5522 twice); git apply
handles that fine.




Re: Draft for basic NUMA observability

2025-03-07 Thread Jakub Wartak
Hi,
On Wed, Mar 5, 2025 at 10:30 AM Jakub Wartak
 wrote:
>Hi,

> > Yeah, that's why I was mentioning to use a "shared" 
> > populate_buffercache_entry()
> > or such function: to put the "duplicated" code in it and then use this
> > shared function in pg_buffercache_pages() and in the new numa related one.
>
> OK, so hastily attempted that in 7b , I had to do a larger refactor
> there to avoid code duplication between those two. I don't know which
> attempt is better though (7 vs 7b)..
>

I'm attaching basically the earlier stuff (v7b) as v8 with the
following minor changes:
- docs are included
- changed int8 to int4 in one function definition for numa_zone_id

-J.


v8-0002-Extend-pg_buffercache-with-new-view-pg_buffercach.patch
Description: Binary data


v8-0003-Add-pg_shmem_numa_allocations-to-show-NUMA-zones-.patch
Description: Binary data


v8-0001-Add-optional-dependency-to-libnuma-Linux-only-for.patch
Description: Binary data


Re: POC: make mxidoff 64 bits

2025-03-07 Thread Maxim Orlov
Here is a rebase, v14.

-- 
Best regards,
Maxim Orlov.
From ee4b3b3c3ad3293460eb1f0418d87a065b9a589b Mon Sep 17 00:00:00 2001
From: Maxim Orlov 
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v14 5/7] TEST: initdb option to initialize cluster with
 non-standard xid/mxid/mxoff

To date, testing database cluster wraparound has not been easy, as initdb has
always initialized clusters with the default xid/mxid/mxoff. The option to
specify any valid xid/mxid/mxoff at cluster startup will make these things
easier.

Author: Maxim Orlov 
Author: Pavel Borisov 
Author: Svetlana Derevyanko 
Discussion: 
https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
 src/backend/access/transam/clog.c  |  21 +
 src/backend/access/transam/multixact.c |  53 
 src/backend/access/transam/subtrans.c  |   8 +-
 src/backend/access/transam/xlog.c  |  15 ++--
 src/backend/bootstrap/bootstrap.c  |  50 +++-
 src/backend/main/main.c|   6 ++
 src/backend/postmaster/postmaster.c|  14 +++-
 src/backend/tcop/postgres.c|  53 +++-
 src/bin/initdb/initdb.c| 107 -
 src/bin/initdb/t/001_initdb.pl |  60 ++
 src/include/access/xlog.h  |   3 +
 src/include/c.h|   4 +
 src/include/catalog/pg_class.h |   2 +-
 13 files changed, 382 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 48f10bec91..eb8a9791ab 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
 {
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+   int64   pageno;
 
LWLockAcquire(lock, LW_EXCLUSIVE);
 
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
 
+   pageno = TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+   if (pageno != 0)
+   {
+   LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+   if (nextlock != lock)
+   {
+   LWLockRelease(lock);
+   LWLockAcquire(nextlock, LW_EXCLUSIVE);
+   lock = nextlock;
+   }
+
+   /* Create and zero the first page of the commit log */
+   slotno = ZeroCLOGPage(pageno, false);
+
+   /* Make sure it's written out */
+   SimpleLruWritePage(XactCtl, slotno);
+   Assert(!XactCtl->shared->page_dirty[slotno]);
+   }
+
LWLockRelease(lock);
 }
 
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 4dae8f4799..ac68becf8b 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
 {
int slotno;
LWLock *lock;
+   int64   pageno;
 
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
+   pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+   if (pageno != 0)
+   {
+   LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+
+   if (nextlock != lock)
+   {
+   LWLockRelease(lock);
+   LWLockAcquire(nextlock, LW_EXCLUSIVE);
+   lock = nextlock;
+   }
+
+   /* Create and zero the first page of the offsets log */
+   slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+   /* Make sure it's written out */
+   SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+   Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+   }
+
LWLockRelease(lock);
 
lock = SimpleLruGetBankLock(MultiXactMemberCtl, 0);
@@ -1978,7 +1999,39 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactMemberCtl, slotno);
Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
 
+   pageno = MXOffsetToMemberPage(MultiXactState->nextOffset);
+   if (pageno != 0)
+   {
+   LWLock *nextlock = SimpleLruGetBankLock(MultiXactMemberCtl, pageno);
+
+   if (nextlock != lock)
+   {
+   LWLockRelease(lock);
+   LWLockAcquire(nextlock, LW_EXCLUSIVE);
+   lock = nextlock;
+   }
+
+   /* Create and zero the first page of the members log */
+   slotno = ZeroMultiXactMemberPage(pageno,

Re: speedup COPY TO for partitioned table.

2025-03-07 Thread jian he
Hi.

Rebased and polished patch attached, with a test case added.
However, there is a case (the following) where
``COPY (partitioned_table)`` is much slower
(around 25% in some cases) than ``COPY (select * from partitioned_table)``.

If a partition's attribute order is not the same as the partitioned table's,
then for each output row we need to create a template TupleTableSlot
and call execute_attr_map_slot;
I didn't find a workaround to reduce the inefficiency.

Since master doesn't support ``COPY (partitioned_table)``,
I guess this slowness is acceptable?


--- the following case is far slower than ``COPY (select * from pp) TO``
drop table if exists pp;
CREATE TABLE pp (id  INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);
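
To illustrate the remapping cost described above, here is a rough sketch
of the per-partition scan loop, based on the attached patch (the exact
slot setup shown here is my assumption, not the patch's literal code):

/*
 * Sketch of the per-partition copy loop.  When a partition's column
 * order differs from the parent's, every row is pushed through
 * execute_attr_map_slot() into a template slot shaped like the parent
 * before being written out -- the source of the ~25% slowdown above.
 */
slot = table_slot_create(scan_rel, NULL);
if (map != NULL)
    root_slot = MakeSingleTupleTableSlot(tupDesc, &TTSOpsVirtual);

while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
{
    TupleTableSlot *outslot = slot;

    CHECK_FOR_INTERRUPTS();

    /* Remap only when the attribute orders differ. */
    if (map != NULL)
        outslot = execute_attr_map_slot(map, slot, root_slot);

    CopyOneRowTo(cstate, outslot);
    processed++;
}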
From eaf3869c4fb5fdacba5efd562f73ca06a0251ac4 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 7 Mar 2025 18:39:56 +0800
Subject: [PATCH v2 1/1] support "COPY partitioned_table TO"

drop table if exists pp;
CREATE TABLE pp (id  INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);

the above case is much slower (around 25% in some cases) than
``COPY (select * from pp) to stdout(header);``,
but this is acceptable for a new feature, since master does not
support ``COPY (partitioned_table)``.

discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=i...@mail.gmail.com
---
 src/backend/commands/copyto.c   | 80 ++---
 src/test/regress/expected/copy2.out | 16 ++
 src/test/regress/sql/copy2.sql  | 11 
 3 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..966b6741530 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
 #include 
 
 #include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
 #include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	List	   *partitions;		/* oid list of partition oid for copy to */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -643,6 +646,8 @@ BeginCopyTo(ParseState *pstate,
 		PROGRESS_COPY_COMMAND_TO,
 		0
 	};
+	List	   *children = NIL;
+	List	   *scan_oids = NIL;
 
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
@@ -670,11 +675,19 @@ BeginCopyTo(ParseState *pstate,
 	 errmsg("cannot copy from sequence \"%s\"",
 			RelationGetRelationName(rel;
 		else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("cannot copy from partitioned table \"%s\"",
-			RelationGetRelationName(rel)),
-	 errhint("Try the COPY (SELECT ...) TO variant.")));
+		{
+			children = find_all_inheritors(RelationGetRelid(rel),
+		   AccessShareLock,
+		   NULL);
+			foreach_oid(childreloid, children)
+			{
+char		relkind = get_rel_relkind(childreloid);
+if (RELKIND_HAS_PARTITIONS(relkind))
+	continue;
+
+scan_oids = lappend_oid(scan_oids, childreloid);
+			}
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +723,7 @@ BeginCopyTo(ParseState *pstate,
 		cstate->rel = rel;
 
 		tupDesc = RelationGetDescr(cstate->rel);
+		cstate->partitions = list_copy(scan_oids);
 	}
 	else
 	{
@@ -1066,7 +1080,61 @@ DoCopyTo(CopyToState cstate)
 
 	cstate->routine->CopyToStart(cstate, tupDesc);
 
-	if (cstate->rel)
+	/*
+	 * if COPY TO source table is a partitioned table, then open each
+	 * partition and process each individual partition.
+	 */
+	if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		processed = 0;
+
+		foreach_oid(scan_oid, cstate->partitions)
+		{
+			TupleTableSlot *slot;
+			TableScanDesc scandesc;
+			Relation		scan_rel;
+			TupleDesc		scan_tupdesc;
+			AttrMap		*map;
+			TupleTableSlot  *root_slot = NULL;
+			TupleTableSlot  *original_slot = NULL;
+
+			scan_rel = table_open(scan_oid, AccessShareLock);
+			scan_tupdesc = RelationGetDescr(scan_rel);
+			map = build_attrmap_by_name_if_req(tupDesc, scan_tupdesc, false);
+
+			scandesc = table_beginscan(scan_rel, GetActiveSnapshot(), 0, NULL);
+		

Re: PoC. The saving of the compiled jit-code in the plan cache

2025-03-07 Thread Vladlen Popolitov

Matheus Alcantara писал(а) 2025-03-07 05:02:

Hi!

 Thanks for your interest in the patch.


1. Changes in jit-code generation.

a) the load of the absolute address (as const) changed to the load of this
address from a struct member:


If I understood correctly, this change is required to avoid making the
jit code point to a memory address that was reallocated (making it
invalid)? If that's the case, would it be possible to split this change
into a separate patch? It would help a lot with review.

Yes, you are correct. In this patch the pointer values are initialised
every time the code is called for a new structure, so that they refer to
the new address.

I did not put it into a separate patch, because all the changes in the
system are interconnected and require changing many files. It is hard to
maintain, at least for now.

Also, I did it to check whether it could be implemented at all and whether
it is really better. During implementation I understood that the design
must be changed and clarified; it looks dirty now.



It's not clear to me what's the difference between jit_context being set
to NULL or CACHED_JITCONTEXT_EMPTY and how it changes from NULL to
CACHED_JITCONTEXT_EMPTY. I think some code comments would help with this,
especially in llvm_compile_expr around the if/else blocks.

 When llvm_compile_expr is called, I need to know whether it is a
compilation for the cache or a usual compilation. I use jit_context for
this. It is always initialised to NULL by palloc0(). If the plan is saved
to the cache, jit_context is assigned CACHED_JITCONTEXT_EMPTY - it means
the jit code must also be saved in the cache, but has not been saved yet.


Updated patch rebased to the current master. Also, I resolved the problems
with the lookup of the compiled expressions.
 Cached jit compiles only expressions from the cached plan - they are
recognized by memory context: if the Expr is not NULL and belongs to the
same memory context as the cached plan, the expression is compiled to a
function with the expression address in its name (the expression address
cast to the Size type).
 All other expressions (generated later than the plan, e.g. expressions in
aggregates, hash joins, hash aggregates) are not compiled and are executed
by the standard expression interpreter.


What's stopping us from going back to the current code generation
without caching in these scenarios? I'm a bit concerned about never
jit-compiling these kinds of expressions and ending up with some kind of
performance issue.


I thought about this. The reason to store the jit code in the cache is to
avoid compiling it every time the query is executed. If we compile the
code anyway, even for one function, we lose the benefits of cached jit
code. We can choose the standard jit compilation for other functions.


What's the relation of this exec-time generated expressions (hash join,
hash agg) problem to the function name lookup issue? They seem like
different problems to me, but I may be missing something.

This question and the next question are related.

If these problems are not fully correlated, I think it would be better
to have some kind of hash map, or to use the number of the call to
connect expressions with functions (as you did in v4) to look up the
cached compiled function. IMHO it sounds a bit strange to embed the
function pointer in the function name and to have this memory context
validation.

 This function lookup is the biggest challenge in this patch, as I see it.
 In the current implementation: a function is generated for the expression,
the name of the function is saved in the ExprState,
then the function is compiled (when the first expression is executed),
then the function is called,
a lookup by the saved address is done to find the compiled code,
the compiled code address is saved in the ExprState as the new execution
code, and the compiled code is called.

 In the cached implementation, when the expression is called the next time,
we need to find the link between the expression and the compiled code.
I tried to find a reliable way to connect them, and currently see only one
option - compile only the expressions saved with the plan. They have the
same address over time and the same memory context as the saved plan.
Expressions generated during execution, like aggregate expressions and
hash join expressions, are allocated in the query execution context and
always have different addresses, and I did not find any way to connect
them with the cached plan. They have an "expr" member, but it can be NULL,
a T_List or a T_Expr (or anything a Custom node can assign to it).
 A function name containing the expression address now looks like a
reliable way to find the compiled function, though it is not elegant. The
standard implementation just stores the generated name, uses it and
pfree()s it.
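
For illustration, a minimal sketch of that naming scheme (the format
string shown is my assumption, not the patch's literal code):

/*
 * Sketch: embed the expression's address in the generated function name,
 * so the compiled code can be found again on the next execution of the
 * cached plan.  This works only for expressions that live in the cached
 * plan's memory context, since only those keep a stable address.
 */
char        funcname[64];

snprintf(funcname, sizeof(funcname), "evalexpr_cached_%zu",
         (Size) state->expr);
/* ...emit the LLVM function under this name and remember it in ExprState... */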


I've also executed make check-world and it seems that
test_misc/003_check_guc is failing:

[11:29:42.995](1.153s) not ok 1 - no parameters missing from
postgresql.conf.sample
[11:29:42.995](0.000s) #   Failed test 'no parameters missing from
postgresql.conf.sample'
#   at src/test/modules/test_misc/t/003_check_guc.pl line 85.
[11:29:42.995](0.000s) #  got: '1'
# expec

Re: Selectively invalidate caches in pgoutput module

2025-03-07 Thread Amit Kapila
On Fri, Mar 7, 2025 at 4:28 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Amit,
>
> Thanks for the comment! PSA new version.
>

Don't we need to call this invalidation function from
InvalidateSystemCachesExtended()?

-- 
With Regards,
Amit Kapila.




Re: strange valgrind reports about wrapper_handler on 64-bit arm

2025-03-07 Thread Nathan Bossart
On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote:
> Is it possible that the signal number we're getting called for is above
> PG_NSIG? That'd explain why the source value is something fairly random?
> 
> ISTM that we should add an Assert() to wrapper_handler() that ensures that the
> signal arg is below PG_NSIG.

We have such an assertion in pqsignal() before we install wrapper_handler
for anything.  Is there another way it could be getting called with a
different signo?

-- 
nathan




Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-06 15:39:44 -0800, Jacob Champion wrote:
> I've reattached the wait event patches, to get the cfbot back to where it was.

FWIW, I continue to think that this is a misuse of wait events. We shouldn't
use them as a poor man's general purpose tracing framework.

Greetings,

Andres Freund




Re: pg_atomic_compare_exchange_*() and memory barriers

2025-03-07 Thread Alexander Korotkov
Hi, Andres.

Thank you for reply.

On Fri, Mar 7, 2025 at 6:02 PM Andres Freund  wrote:
>
> On 2025-03-07 17:47:08 +0200, Alexander Korotkov wrote:
> > While investigating a bug in the patch to get rid of WALBufMappingLock, I
> > found that the surrounding pg_atomic_compare_exchange_u64() fixes the
> > problem for me.
>
> That sounds more likely to be due to slowing down things enough to make a race
> less likely to be hit. Or a compiler bug.
>
>
> > That doesn't feel right because, according to the
> > comments, both pg_atomic_compare_exchange_u32() and
> > pg_atomic_compare_exchange_u64() should provide full memory barrier
> > semantics.  So, I decided to investigate this further.
> >
> > In my case, the implementation of pg_atomic_compare_exchange_u64() is based
> > on __atomic_compare_exchange_n().
> >
> > static inline bool
> > pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
> > uint64 *expected, uint64 newval)
> > {
> > AssertPointerAlignment(expected, 8);
> > return __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
> >__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
> > }
> >
> > According to the docs __ATOMIC_SEQ_CST enforces total ordering with *all
> > other __ATOMIC_SEQ_CST operations*.  It only says about other
> > __ATOMIC_SEQ_CST operations; nothing is said about regular reads/writes.
> > This sounds quite far from our comment, promising full barrier semantics,
> > which, in my understanding, means the equivalent of
> > both pg_read_barrier()/pg_write_barrier(), which should barrier *all other
> > read/writes*.
>
> A memory barrier in one process/thread always needs be paired with a barrier
> in another process/thread. It's not enough to have a memory barrier on one
> side but not the other.

Yes, there surely should be a memory barrier on the other side.  But
does __ATOMIC_SEQ_CST work as a barrier for the regular read/write
operations on the same side?

> __ATOMIC_SEQ_CST is used to implement the C11/C++11 barriers, and for those
> it's more clearly formulated that that include acquire/release semantics.  The
> standard formulation is a bit more complicated IIRC, but here's the
> cppreference.com simplification:
>
> https://en.cppreference.com/w/c/atomic/memory_order
> >  A load operation with this memory order performs an acquire operation, a
> >  store performs a release operation, and read-modify-write performs both an
> >  acquire operation and a release operation, plus a single total order exists
> >  in which all threads observe all modifications in the same order (see
> >  Sequentially-consistent ordering below).


Thank you.  This is not yet clear to me.  I probably need to meditate
more on this :)

>
> > This function first checks if LSE instructions are present.  If so,
> > instruction LSE casal is used.  If not (in my case), the loop of
> > ldaxr/stlxr is used instead.  The documentation says both ldaxr/stlxr
> > provides one-way barriers.  Read/writes after ldaxr will be observed after,
> > and read/writes before stlxr will be observed before.  So, a pair of these
> > instructions provides a full memory barrier.  However, if we don't observe
> > the expected value, only ldaxr gets executed.  So, an unsuccessful
> > pg_atomic_compare_exchange_u64() attempt will leave us with a one-way
> > barrier, and that caused a bug in my case.
>
> That has to be a compiler bug.  We specify __ATOMIC_SEQ_CST for both
> success_memorder *and* failure_memorder.
>
> What compiler & version is this?

I've tried and got the same results with two compilers.
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Ubuntu clang version 19.1.1 (1ubuntu1~24.04.2)

--
Regards,
Alexander Korotkov
Supabase




Re: strange valgrind reports about wrapper_handler on 64-bit arm

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-07 10:36:35 -0600, Nathan Bossart wrote:
> On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote:
> > Is it possible that the signal number we're getting called for is above
> > PG_NSIG? That'd explain why the source value is something fairly random?
> > 
> > ISTM that we should add an Assert() to wrapper_handler() that ensures that 
> > the
> > signal arg is below PG_NSIG.
> 
> We have such an assertion in pqsignal() before we install wrapper_handler
> for anything.  Is there another way it could be getting called with a
> different signo?

Who the hell knows :).

One potential way would be that we got SIGNAL_ARGS wrong for the platform and
are interpreting some random thing as the signal number.  Or something went
wrong in the windows signal emulation code.  Or ...

It seems cheap insurance to add it both places.

Greetings,

Andres Freund




Re: Commitfest app release on Feb 17 with many improvements

2025-03-07 Thread Tom Lane
Jelte Fennema-Nio  writes:
> On Fri, 7 Mar 2025 at 11:26, Andreas Karlsson  wrote:
>> Out of curiosity: do you track which method works? I would expect
>> everything to be applied with either git am or patch which can be
>> applied with git apply making git apply technically unnecessary.

> I think we need all of them...
> - git apply is unable to apply the patch in 5272 on top of main
> - git am, works with that 5272, but cannot apply any patch not created
> by git format-patch (for instance 5522)
> - I've been unable to get patch(1) not to return a failure if the
> patch is already applied. So applying e.g. 5522 twice. git apply
> handles that fine.

Hm, don't you *want* a failure if the patch is already applied?

regards, tom lane




Re: support virtual generated column not null constraint

2025-03-07 Thread Navneet Kumar
Hi

I encountered an issue when trying to add a virtual column to an existing
table using the ALTER command. The operation fails even though the existing
data ensures that the virtual column's value will never be NULL. However,
if I define the same virtual column while creating the table and then
insert the same data, it works as expected.

For example, this scenario fails:
1. CREATE TABLE person (
id INT GENERATED BY DEFAULT AS IDENTITY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL
);

2. INSERT INTO person (first_name, last_name)
VALUES ('first', 'last');

3. ALTER TABLE person
ADD COLUMN full_name VARCHAR(100) GENERATED ALWAYS AS (first_name || ' ' ||
last_name) VIRTUAL;

The above ALTER command works if I use STORED instead. Also, if I define
the virtual generated column while creating the table, it works with the
same data.

1. CREATE TABLE person (
id INT GENERATED BY DEFAULT AS IDENTITY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
full_name VARCHAR(100) NOT NULL GENERATED ALWAYS AS (first_name || '  '
||  last_name) VIRTUAL
);

2. INSERT INTO person(first_name, last_name)

VALUES ('first', 'last');






On Thu, Mar 6, 2025 at 7:15 PM jian he  wrote:

> On Wed, Feb 26, 2025 at 3:01 PM ego alter  wrote:
> >
> > Hi, I’ve had a chance to review the patch.  As I am still getting
> > familiar with the executor part, my questions and feedback may be
> > relatively trivial.
> >
> > There are two minor issues i want to discuss:
> > 1. The way of caching constraint-checking expr for virtual generated
> > not null
> > The existing array for caching constraint-checking expr is
> >  /* array of constraint-checking expr states */
> > ExprState **ri_ConstraintExprs;
> >
> > the proposed changes for virtual generated not null in patch
> > +  /* array of virtual generated not null constraint-checking expr
> > states */
> > +  ExprState **ri_VirGeneratedConstraintExprs;
> > /*
> > Could you explain the rationale behind adding this new field instead of
> > reusing ri_ConstraintExprs? The comment suggests it's used specifically
> > for not null constraint-checking, but could it be extended to handle
> > other constraints in the future as well? I assume one benefit is that it
> > separates the handling of normal constraints from virtual ones, but I'd
> > appreciate more context on the decision.
> >
>
> it's a cache mechanism, just like ResultRelInfo.ri_ConstraintExprs
> you can see comments in ExecRelCheck
> /*
>  * If first time through for this result relation, build expression
>  * nodetrees for rel's constraint expressions.  Keep them in the
> per-query
>  * memory context so they'll survive throughout the query.
>  */
>
> for example:
> create table t(a int, check (a > 3));
> insert into t values (4),(3);
> we need to call ExecRelCheck for values 4 and 3.
> The second time ExecRelCheck is called, for value 3, if
> ri_ConstraintExprs is not null then
> we don't need to call ExecPrepareExpr again for the same check
> constraint expression.
>
> +  ExprState **ri_VirGeneratedConstraintExprs;
> is specifically for evaluating not-null constraints on virtual generated
> columns.
> Anyway, I renamed it to ri_GeneratedNotNullExprs.
>
> If you want to extend cache for other constraints in the future,
> you can add it to struct ResultRelInfo.
> Currently struct ResultRelInfo, we have ri_GeneratedExprsU,
> ri_GeneratedExprsI for generated expressions;
> ri_ConstraintExprs for check constraints.
>
>
> > 2. The naming of variable gen_virtualnn.
> > Gen_virtualnn looks confusing at first glance. Checkconstr seems to be
> more straightforward.
> >
> thanks. I changed it to exprstate.
>
> new patch attached.
> 0001: not null for virtual generated columns; some refactoring happened
> within ExecConstraints
> to avoid duplicated error handling code.
>
> 0002 and 0003 add domain support for virtual generated columns. Domains
> can have not-null constraints;
> this is built on top of 0001, so I guess posting here for review should
> be fine?
>
> Currently it works as intended, but adding a virtual generated column
> with domain_with_constraints
> requires a table rewrite. In some cases a table scan should be enough;
> in some cases we do need a table rewrite.
> for example:
> create domain d1 as int check(value > random(min=>11::int, max=>12));
> create domain d2 as int check(value > 12);
> create table t(a int);
> insert into t select g from generate_series(1, 10) g;
> alter table t add column b d1 generated always as (a+11) virtual; --we
> do need table rewrite in phase 3.
> alter table t add column c d2 generated always as (a + 12) virtual;
> --we can only do table scan in phase 3.
>


Re: strange valgrind reports about wrapper_handler on 64-bit arm

2025-03-07 Thread Nathan Bossart
On Fri, Mar 07, 2025 at 11:41:38AM -0500, Andres Freund wrote:
> On 2025-03-07 10:36:35 -0600, Nathan Bossart wrote:
>> On Fri, Mar 07, 2025 at 11:32:28AM -0500, Andres Freund wrote:
>> > Is it possible that the signal number we're getting called for is above
>> > PG_NSIG? That'd explain why the source value is something fairly random?
>> > 
>> > ISTM that we should add an Assert() to wrapper_handler() that ensures that 
>> > the
>> > signal arg is below PG_NSIG.
>> 
>> We have such an assertion in pqsignal() before we install wrapper_handler
>> for anything.  Is there another way it could be getting called with a
>> different signo?
> 
> Who the hell knows :).
> 
> One potential way would be that we got SIGNAL_ARGS wrong for the platform and
> are interpreting some random thing as the signal number.  Or something went
> wrong in the windows signal emulation code.  Or ...
> 
> It seems cheap insurance to add it both places.

Good enough for me.  I'll commit/back-patch to v17 the attached soon.

-- 
nathan
diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c
index 5dd8b76bae8..79b50486175 100644
--- a/src/port/pqsignal.c
+++ b/src/port/pqsignal.c
@@ -87,6 +87,8 @@ wrapper_handler(SIGNAL_ARGS)
 {
int save_errno = errno;
 
+   Assert(postgres_signal_arg < PG_NSIG);
+
 #ifndef FRONTEND
 
/*


Re: Add column name to error description

2025-03-07 Thread jian he
On Fri, Mar 7, 2025 at 11:05 AM Tom Lane  wrote:
>
> Erik Wienhold  writes:
> > But I don't see the point in keeping variables atttypid and atttypmod
> > around when those values are now available via outatt.  Removing these
> > two variables makes the code easier to read IMO.  Done so in the
> > attached v4.
>
> I think the idea of the original coding was to keep those values in
> registers in the inner loop rather than re-fetching them each time.
> But that's probably an unmeasurably microscopic optimization, if
> real at all (modern compilers might figure it out for themselves).

> Do others agree Erik's version improves readability?
>
I think so.

While looking at it: in build_attrmap_by_position,
I guess errmsg may be better than errmsg_internal,
since there are around 10 related error messages in
src/pl/plpgsql/src/expected/plpgsql_record.out,
so the message is user-visible.


But I am confused by the errmsg_internal comment below about infinite
error recursion caused by message translation.
/*
 * errmsg_internal --- add a primary error message text to the current error
 *
 * This is exactly like errmsg() except that strings passed to errmsg_internal
 * are not translated, and are customarily left out of the
 * internationalization message dictionary.  This should be used for "can't
 * happen" cases that are probably not worth spending translation effort on.
 * We also use this for certain cases where we *must* not try to translate
 * the message because the translation would fail and result in infinite
 * error recursion.
 */




Re: Refactoring postmaster's code to cleanup after child exit

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-06 22:49:20 +0200, Heikki Linnakangas wrote:
> In short, all the 4 patches look good to me. Thanks for picking this up!
> 
> On 06/03/2025 22:16, Andres Freund wrote:
> > On 2025-03-05 20:49:33 -0800, Noah Misch wrote:
> > > > This behaviour makes it really hard to debug problems. It'd have been a 
> > > > lot
> > > > easier to understand the problem if we'd seen psql's stderr before the 
> > > > test
> > > > died.
> > > > 
> > > > I guess that mean at the very least we'd need to put an eval {} around 
> > > > the
> > > > ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?
> > > 
> > > That sounds right.
> > 
> > In the attached patch I did that for wait_connect().  I did verify that it
> > works by implementing the wait_connect() fix before fixing
> > 002_connection_limits.pl, which fails if a sleep(1) is added just before the
> > proc_exit(1) for FATAL.
> 
> +1. For the archives sake, I just want to clarify that this pump stuff is
> all about getting better error messages on a test failure. It doesn't help
> with the original issue.

Agreed.


> This is all annoyingly complicated, but getting good error messages is worth
> it.

Yea. I really look forward to having a way to write stuff like this that
doesn't involve hackily driving psql from 100m away using rubber bands.


> > On 2025-03-05 08:23:32 +0900, Michael Paquier wrote:
> > > Why not adding an injection point with a WARNING or a LOG generated,
> > > then check the server logs for the code path taken based on the
> > > elog() generated with the point name?
> > 
> > I think the log_min_messages approach is a lot simpler. If we need something
> > like this more widely we can reconsider injection points...
> 
> +1. It's a little annoying to depend on a detail like the "client backend
> process exited" debug message, but seems like the best fix for now.

We use the same message for LOG messages too, for other types of backends, so
I think it's not that likely to change.  But still not great.


> While we're at it, attached are a few more cleanups I noticed.

I assume you'll apply that yourself?


Commits with updated commit messages attached.


I wonder if we should apply the polishing of connect_ok()/connect_fails() and
the wait_connect() debuggability improvements to the backbranches? Keeping TAP
infrastructure as similar as possible between branches has proven worthwhile
IME.


Greetings,

Andres Freund
>From d6dbf4c4a1e723a27df8a08b7e75352b8fb29d05 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 7 Mar 2025 09:44:00 -0500
Subject: [PATCH v2 1/4] tests: Improve test names in
 connect_fails()/connect_ok()

connect_fails() didn't mention that stderr matched, whereas connect_ok() did.

Neither connect_fails() nor connect_ok() mentioned what they were checking
when checking psql's return status.

Reviewed-by: Heikki Linnakangas 
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index b105cba05a6..883532e1cd3 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2554,7 +2554,7 @@ sub connect_ok
 		connstr => "$connstr",
 		on_error_stop => 0);
 
-	is($ret, 0, $test_name);
+	is($ret, 0, "$test_name: connect succeeds, as expected");
 
 	if (defined($params{expected_stdout}))
 	{
@@ -2619,11 +2619,11 @@ sub connect_fails
 		extra_params => ['-w'],
 		connstr => "$connstr");
 
-	isnt($ret, 0, $test_name);
+	isnt($ret, 0, "$test_name: connect fails, as expected");
 
 	if (defined($params{expected_stderr}))
 	{
-		like($stderr, $params{expected_stderr}, "$test_name: matches");
+		like($stderr, $params{expected_stderr}, "$test_name: stderr matches");
 	}
 
 	$self->log_check($test_name, $log_location, %params);
-- 
2.48.1.76.g4e746b1a31.dirty

>From e514fe32c2566c524f1f18410266a1e2efdc7644 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 7 Mar 2025 09:44:00 -0500
Subject: [PATCH v2 2/4] tests: Add note if BackgroundPsql::wait_connect()
 fails

If wait_connect() failed due to psql exiting, all that we'd see is a "process
ended prematurely" error thrown by IPC::Run, without ever seeing psql's error
message.

Address that by wrapping the pump() call in eval and taking note of stdout &
stderr in case of failure.

We might want to do that in pump_until() as well, but that seems to require
API changes, so let's do the easily achievable bit first.

Reviewed-by: Heikki Linnakangas 
Discussion: https://postgr.es/m/ggflhkciwdyotpoie323chu2c2idpjk5qimrn462encwx2io7s@thmcxl7i6dpw
---
 .../perl/PostgreSQL/Test/BackgroundPsql.pm| 26 ---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
in

Memory context can be its own parent and child in replication command

2025-03-07 Thread Anthonin Bonnefoy
Hi,

While running some tests with logical replication, I've run in a
situation where a walsender process was stuck in an infinite loop with
the following backtrace:

#0  in MemoryContextDelete (...) at ../src/backend/utils/mmgr/mcxt.c:474
#1  in exec_replication_command (cmd_string=... "BEGIN") at
../src/backend/replication/walsender.c:2005

Which matches the following while loop:
while (curr->firstchild != NULL)
curr = curr->firstchild;

Inspecting the memory context, I have the following:

$9 = (MemoryContext) 0xafb741c35360
(gdb) p *curr
$10 = {type = T_AllocSetContext, isReset = false, allowInCritSection =
false, mem_allocated = 8192, methods = 0xafb7278a82d8
, parent = 0xafb741c35360, firstchild =
0xafb741c35360, prevchild = 0x0, nextchild = 0x0, name =
0xafb7275f68a8 "Replication command context", ident = 0x0, reset_cbs =
0x0}

So the memory context is 0xafb741c35360, which is also the same value
for parent and firstchild. This explains the infinite loop as
MemoryContextDelete tries to find the leaf with no children.

I was able to get a rr recording of the issue and trace how this
happened. This can be reproduced by triggering 2 replication commands,
with the first one doing a snapshot export:

CREATE_REPLICATION_SLOT "test_slot" LOGICAL "test_decoding" ( SNAPSHOT
"export");
DROP_REPLICATION_SLOT "test_slot";

- CreateReplicationSlot will start a new transaction to handle the
snapshot export.
- This transaction will save the replication command context (let's
call it ctx1) in its state.
- ctx1 is deleted at the end of exec_replication_command
- During the next replication command, the transaction is aborted in
SnapBuildClearExportedSnapshot
- The transaction restores ctx1 as the CurrentMemoryContext
- AllocSetContextCreate is called with ctx1 as a parent, and it will
pull ctx1 from the freelist
- ctx1's parent and child will be set to ctx1 and returned
- During ctx1 deletion, it will be stuck in an infinite loop

I've added a tap test to reproduce the issue, along with an assertion
during context creation to check the parent and returned context are
not the same so the test would immediately abort and not stay stuck.

To fix this, it seems like saving and restoring the memory context
around the call to AbortCurrentTransaction is the best approach. It is
similar to what's done with CurrentResourceOwner. I thought of
switching to TopMemoryContext before exporting the snapshot, so that
aborting the transaction would switch back to TopMemoryContext, but
this would still require restoring the memory context after the
transaction is aborted.
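
The shape of the fix is roughly this (a sketch of the idea only, with the
surrounding function body omitted):

/*
 * In SnapBuildClearExportedSnapshot(): aborting the leftover transaction
 * restores the memory context that was current when the transaction
 * started -- a context exec_replication_command() has already deleted.
 * Save and restore the caller's context instead, mirroring what is
 * already done for CurrentResourceOwner.
 */
MemoryContext save_ctx = CurrentMemoryContext;

AbortCurrentTransaction();

MemoryContextSwitchTo(save_ctx);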

Regards,
Anthonin Bonnefoy


v01-0001-Avoid-using-deleted-context-with-replication-com.patch
Description: Binary data


Re: making EXPLAIN extensible

2025-03-07 Thread Robert Haas
On Thu, Mar 6, 2025 at 4:24 PM Robert Haas  wrote:
> On Thu, Mar 6, 2025 at 4:16 PM Tom Lane  wrote:
> > I find a good deal of attraction in getting rid of the IDs and
> > just using names.  Nor do I believe we need a hash table.
> > (1) Surely there will not be so many extensions using this within a
> > single EXPLAIN that a simple loop with strcmp()'s isn't good enough.
> > (2) The IDs aren't free either; where will an extension keep the
> > ID it assigned?  We're trying to move away from global variables.
> >
> > But, if you're convinced otherwise, the current design is OK.
>
> Interesting. I hadn't even considered just iterating every time to
> find the ID, but I agree with you that might be totally fine. As you
> say, we're not expecting there to be many extensions here. I can try
> coding that up and see how it looks (or you can, if you like).

Here's v6, doing it that way. I found that the simplest thing to do
was just push the call to GetExplainExtensionId() inside
Get/SetExplainExtensionState(). With this approach, the backend-scope
IDs still exist, but they are private to explain_state.c. An alternate
design could be to make each individual ExplainState have its own list
of extension names alongside its own list of opaque pointers, so that
the IDs become ExplainState-scoped rather than backend-scoped. At the
moment, that seems to me to be just deciding to make the code more
complicated for no obvious benefit, but maybe I'm missing something.
At any rate, my overall conclusion here is that this is giving up a
probably-insignificant amount of performance for an
also-not-terribly-significant abstraction improvement, so I find it a
little hard to get excited about it one way or the other, but it's
fine.
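
For reference, the name-to-ID lookup amounts to something like this (a
simplified sketch using a List; the committed representation may differ):

/* Backend-lifetime list of extension names; a name's position is its ID. */
static List *ExplainExtensionNameList = NIL;

static int
GetExplainExtensionId(const char *extension_name)
{
    ListCell   *lc;
    int         id = 0;

    /* Simple strcmp() loop; only a handful of extensions are expected. */
    foreach(lc, ExplainExtensionNameList)
    {
        if (strcmp((const char *) lfirst(lc), extension_name) == 0)
            return id;
        id++;
    }

    /* First time we've seen this name: remember it and assign the next ID. */
    ExplainExtensionNameList = lappend(ExplainExtensionNameList,
                                       pstrdup(extension_name));
    return id;
}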

Tom, what do you think?

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v6-0002-Add-some-new-hooks-so-extensions-can-add-details-.patch
Description: Binary data


v6-0001-Make-it-possible-for-loadable-modules-to-add-EXPL.patch
Description: Binary data


v6-0003-pg_overexplain-Additional-EXPLAIN-options-for-deb.patch
Description: Binary data


Re: Get rid of WALBufMappingLock

2025-03-07 Thread Alexander Korotkov
On Sun, Mar 2, 2025 at 1:58 PM Alexander Korotkov  wrote:
>
> On Fri, Feb 28, 2025 at 3:13 PM Álvaro Herrera  
> wrote:
> > On 2025-Feb-28, Michael Paquier wrote:
> >
> > > Saying that, I have also done similar tests with your v12 for a couple
> > > of hours and this looks stable under installcheck-world.  I can see
> > > that you've reworked quite a bit the surroundings of InitializedFrom
> > > in this one.  If you apply that once again at some point, the
> > > buildfarm will be judge in the long-term, but I am rather confident by
> > > saying that the situation looks better here, at least.
> >
> > Heh, no amount of testing can prove lack of bugs; but for sure "it looks
> > different now, so it must be correct" must be the weakest proof of
> > correctness I've heard of!
>
> Michael just volunteered to help Yura and me with testing.  He wasn't
> intended to be a reviewer.  And he reported that the tests look much more
> stable now.  I think he is absolutely correct about this.

Nevertheless, I don't think the bug is gone in v12.  I managed to
reproduce it on my local Raspberry Pi 4.  The attached version of the
patch fixes the bug for me.  It adds memory barriers surrounding
pg_atomic_compare_exchange_u64().  That is certainly not right, given that
this function should already provide full memory barrier semantics.  But
my investigation shows it doesn't.  I'm going to start a separate thread
about this.

Also, the new version of the patch contains a fix for a potential integer
overflow during the OldPageRqstPtr computation, sent to me off-list by Yura.

--
Regards,
Alexander Korotkov
Supabase


v13-0001-Get-rid-of-WALBufMappingLock.patch
Description: Binary data


Re: Log connection establishment timings

2025-03-07 Thread Bertrand Drouvot
Hi,

On Thu, Mar 06, 2025 at 06:16:07PM -0500, Melanie Plageman wrote:
> Attached v12 does this (uses timestamps instead of instr_time).

Thanks for the new version!

> I've done some assorted cleanup. Most notably:
> 
> I removed LOG_CONNECTION_NONE because it could lead to wrong results
> to have a bitmask with a flag value that is 0 (it could be set at the
> same time other flags are set, so you could never use it correctly).

Oh right, and starting with LOG_CONNECTION_NONE/OFF = (1 << 0) does not
make that much sense...

A couple of comments regarding 0002:

=== 01

Given that conn_timing.ready_for_use is only used here:

+   if (!reported_first_ready_for_query &&
+   (log_connections & LOG_CONNECTION_READY_FOR_USE) &&
+   IsConnectionBackend(MyBackendType))
+   {
+   uint64  total_duration,
+   fork_duration,
+   auth_duration;
+
+   conn_timing.ready_for_use = GetCurrentTimestamp();
+
+   total_duration =
+   TimestampDifferenceMicroseconds(conn_timing.socket_create,
+   conn_timing.ready_for_use);

I wonder if ready_for_use needs to be part of ConnectionTiming after all.
I mean we could just:

"
total_duration = TimestampDifferenceMicroseconds(conn_timing.socket_create, GetCurrentTimestamp())
"

That said, having ready_for_use part of ConnectionTiming could be
useful if we decide to create a SQL API on top of this struct though. So,
that's probably fine and better as it is.

And if we keep it part of ConnectionTiming, then:

=== 02

+   if (!reported_first_ready_for_query &&
+   (log_connections & LOG_CONNECTION_READY_FOR_USE) &&
+   IsConnectionBackend(MyBackendType))

What about getting rid of reported_first_ready_for_query and check if
conn_timing.ready_for_use is zero instead?
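
i.e. something along these lines (sketch only, assuming conn_timing is
zero-initialized):

+   if (conn_timing.ready_for_use == 0 &&
+       (log_connections & LOG_CONNECTION_READY_FOR_USE) &&
+       IsConnectionBackend(MyBackendType))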

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: making EXPLAIN extensible

2025-03-07 Thread Robert Haas
On Fri, Mar 7, 2025 at 9:38 AM Peter Eisentraut  wrote:
> Just to clarify this:  Nobody has gone through and used IWYU to clean up
> indirect includes, as you appear to imagine here.  My recent IWYU work
> was, besides putting some infrastructure in place, to clean up includes
> that are completely unneeded.  Indirect includes cleanup is a different
> project that is not currently happening, AFAIK.

OK, thanks. I wonder whether that's a good use of effort or just not
worth worrying about.

> Also, benign typedef redefinitions are a C11 feature.  In practice, all
> compilers currently in play support it, and the only problem you'll get
> is from the buildfarm members that are explicitly set up to warn about
> accidental C11 use.  We could probably have a discussion about that, but
> for this patch set, it's probably better to just deal with the status quo.

Agreed. +1 for having a discussion at some point, though, because the
effect of the current rules seems to be that you have to write "struct
BananaSplit *" in a bunch of places instead of just 'BananaSplit *" to
avoid redefining the typedef. That's worth doing it if it solves a
real problem, but if compilers where it is a real problem are extinct
in the wild, then I think I would prefer not to have to add the
"struct" keyword in a bunch of places just for compliance with
historical compiler behavior.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-03-07 Thread Jacob Champion
On Wed, Mar 5, 2025 at 8:08 PM Michael Paquier  wrote:
> Honestly, I don't see a reason not to introduce that, like in the
> attached.

This new code races against the session timeout. I see this on timer expiration:

[14:19:55.224](0.000s) # issuing query 34 via background psql:
SELECT state FROM pg_stat_activity WHERE pid = ;
[14:19:55.228](0.004s) # pump_until: process terminated
unexpectedly when searching for "(?^:(^|\n)background_psql:
QUERY_SEPARATOR 34:\r?\n)" with stream: ""
process ended prematurely at
/home/jacob/src/postgres/src/test/perl/PostgreSQL/Test/Utils.pm line
439.

Which makes it seem like some sort of crash, IMO. I don't find that as
easily debuggable as the previous log message, which was

[14:21:33.104](0.001s) # issuing query 32 via background psql:
SELECT pid FROM pg_stat_activity
#   WHERE state = 'starting' and wait_event = 'init-pre-auth';
IPC::Run: timeout on timer #1 at
/home/jacob/perl5/lib/perl5/IPC/Run.pm line 3007.

> +  WHERE state = 'starting' and wait_event = 'init-pre-auth';});

Did you have thoughts on expanding the check to backend_type [1]?

> + # Give up.  The output of the last attempt is logged by query(),
> + # so no need to do anything here.
> + return 0;

One of my primary complaints about the poll_query_until()
implementation is that "giving up" in this case means continuing to
run pieces of the test that have no business running after a timeout,
and increasing the log noise after a failure. I'm not sure how loudly
to complain in this particular case, since I know we use it
elsewhere...

Thanks!
--Jacob

[1] 
https://postgr.es/m/CAOYmi%2BnxNCQcTQE-tQ7Lwpe4cYc1u-yxwEe5kt2AVN%2BDXXVVbQ%40mail.gmail.com




Re: what's going on with lapwing?

2025-03-07 Thread Greg Sabino Mullane
On Fri, Mar 7, 2025 at 8:52 AM Robert Haas  wrote:

> There is no "what we'd like them to do" -- we have no policy or preference
> or anything as a group. Everybody's just guessing what other people want
> and care about, and then sometimes we're all grumpy at each other.
>

This is a great point. Many years ago I had some spare hardware and
wanted to set up a buildfarm animal. I looked through the list to see what
unique permutation I could provide, but it was a very manual process! It
would be nice to at least have a list of "wanted" configurations.

Perhaps a summary of the currently running configurations. So you could
see, for example, which compiler versions were run in the last week, and
perhaps how many animals were running it. Basically, some meta information
about the current runs, which could help us drive the "wanted" list.

-- 
Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Jelte Fennema-Nio
On Fri, 7 Mar 2025 at 15:37, Michael Banck  wrote:
> On Fri, Mar 07, 2025 at 09:17:46AM -0500, Robert Haas wrote:
> > Why wouldn't the cloud provider just change add 'trusted = true' to
> > the relevant control files instead of doing this?
>
> That would be possible, but maybe the cloud provider is using
> distribution packages and does not want to muck around in the file
> system (as is usually frowned upon), or, maybe more likely, is using
> container images based on the Debian packages (which is what I've seen
> most of them do) and cannot (or does not want to) muck around in the
> file system easily.

Yeah exactly, having to do this for every extension that you onboard
is quite a hassle to maintain. It seems much nicer to allow people to
assign a single role and be done with it.

Also many cloud providers have some slightly forked/extended postgres
to allow this already.

> Also, I think there is a case to be made that a cloud provider (or site
> admin) would like to delegate the decision whether users with CREATE
> rights on a particular database are allowed to install some extensions
> or not. Or rather, assign somebody they believe would make the right
> call to do that, by granting pg_manage_extensions.

I think this is a really good point. Adding trusted=true gives any
database owner the ability to install these more dangerous extensions,
while using pg_manage_extensions lets you limit this ability to roles
chosen by the cluster administrator.

> On the other hand, maybe trusted should be part of the catalog and not
> (just) the extension control file, so that somebody with appropriate
> permissions (like the cloud provider during instance bootstrap) could do
> "ALTER EXTENSION foo (SET trusted|TRUSTED);" or whatever. ISTR that I
> reviewed the discussion around trusted back then and did not see that
> possibility discussed at all, but I might be misremembering, it's been a
> while.

I think that would be hard because there's no record in pg_extension
for extensions that are not installed, so there's also no way to mark
such an extension as trusted. To be able to do this we'd probably need
a system-wide catalog. If we went this route, then I think what we'd
really want is a way to do:

GRANT INSTALL ON EXTENSION TO user;

And that seems orthogonal to having this pg_manage_extensions role,
because then pg_manage_extensions could grant that permission to
people.




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-07 Thread Aleksander Alekseev
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

Compression dictionaries give a good compression ratio (~50%) and also
increase TPS a bit (5-10%) due to better buffer cache utilization. At
least according to synthetic and not trustworthy benchmarks I did some
years ago [1]. The result may be very dependent on the actual data of
course, not to mention particular implementation of the idea.

[1]: https://github.com/afiskon/zson/blob/master/docs/benchmark.md

-- 
Best regards,
Aleksander Alekseev




Re: Add contrib/pg_logicalsnapinspect

2025-03-07 Thread Amit Kapila
On Fri, Mar 7, 2025 at 4:12 PM Bertrand Drouvot
 wrote:
>
> On Fri, Mar 07, 2025 at 10:26:23AM +0530, Amit Kapila wrote:
>
> > Your proposed change in the test sounds better than what we have now
> > but I think we should also avoid autovacuum to perform analyze as that
> > may add additional counts. For test_decoding, we keep
> > autovacuum_naptime = 1d in logical.conf file, we can either use the
> > same here or simply keep autovacuum off.
>
> When writing the attached, I initially added extra paranoia in the tests by
> using ">=", does that also address your autovacuum concern?
>

Yes, that will address the autovacuum concern.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Dagfinn Ilmari Mannsåker
Laurenz Albe  writes:

> On Fri, 2025-02-28 at 19:16 +0100, Michael Banck wrote:
>> New version 3 attached.
>
> I am fine with v3, and I'll mark it "ready for committer".
>
> I have been wondering about regression tests.
> We cannot have them in the core tests, because we cannot rely on any
> extension being available.

That's what the extensions in src/test/modules/ are for.  The
test_extensions subdirectory seems suitable, it has tests for all sorts
of behaviour around extensions.

- ilmari




pg_atomic_compare_exchange_*() and memory barriers

2025-03-07 Thread Alexander Korotkov
Hi all,

While investigating a bug in the patch to get rid of WALBufMappingLock, I
found that the surrounding pg_atomic_compare_exchange_u64() fixes the
problem for me.  That doesn't feel right because, according to the
comments, both pg_atomic_compare_exchange_u32() and
pg_atomic_compare_exchange_u64() should provide full memory barrier
semantics.  So, I decided to investigate this further.

In my case, the implementation of pg_atomic_compare_exchange_u64() is based
on __atomic_compare_exchange_n().

static inline bool
pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
uint64 *expected, uint64 newval)
{
AssertPointerAlignment(expected, 8);
return __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

According to the docs __ATOMIC_SEQ_CST enforces total ordering with *all
other __ATOMIC_SEQ_CST operations*.  It only says about other
__ATOMIC_SEQ_CST operations; nothing is said about regular reads/writes.
This sounds quite far from our comment, promising full barrier semantics,
which, in my understanding, means the equivalent of
both pg_read_barrier()/pg_write_barrier(), which should barrier *all other
read/writes*.

I've found that __atomic_compare_exchange_n() ends up with usage
of __aarch64_cas8_acq_rel, which disassembles as following.

0860 <__aarch64_cas8_acq_rel>:
 860:   d503245f    bti     c
 864:   9110        adrp    x16, 2 <__data_start>
 868:   3940a210    ldrb    w16, [x16, #40]
 86c:   3470        cbz     w16, 878 <__aarch64_cas8_acq_rel+0x18>
 870:   c8e0fc41    casal   x0, x1, [x2]
 874:   d65f03c0    ret
 878:   aa0003f0    mov     x16, x0
 87c:   c85ffc40    ldaxr   x0, [x2]
 880:   eb10001f    cmp     x0, x16
 884:   5461        b.ne    890 <__aarch64_cas8_acq_rel+0x30>  // b.any
 888:   c811fc41    stlxr   w17, x1, [x2]
 88c:   3591        cbnz    w17, 87c <__aarch64_cas8_acq_rel+0x1c>
 890:   d65f03c0    ret

This function first checks whether LSE instructions are present.  If so,
the LSE instruction casal is used.  If not (as in my case), the loop of
ldaxr/stlxr is used instead.  The documentation says both ldaxr and stlxr
provide one-way barriers: reads/writes after ldaxr will be observed after
it, and reads/writes before stlxr will be observed before it.  So, a pair
of these instructions provides a full memory barrier.  However, if we
don't observe the expected value, only ldaxr gets executed.  So, an
unsuccessful pg_atomic_compare_exchange_u64() attempt leaves us with just
a one-way barrier, and that caused a bug in my case.  In contrast,
__sync_val_compare_and_swap() uses __aarch64_cas8_sync, which is much the
same as __aarch64_cas8_acq_rel but has an explicit memory barrier at the
end.
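
To make the failure-path hazard concrete, here is a hypothetical
illustration (the variables and surrounding logic are mine, not taken
from the WALBufMappingLock patch):

/* assume 'flag' and 'slot' are shared pg_atomic_uint64 variables */
uint64  expected = 0;

pg_atomic_write_u64(&flag, 1);  /* plain store, no barrier of its own */

if (!pg_atomic_compare_exchange_u64(&slot, &expected, 1))
{
    /*
     * Failure path: on ARM without LSE only the ldaxr was executed,
     * i.e. an acquire-only barrier.  The earlier store to 'flag' may
     * thus become visible to other CPUs after later accesses made
     * here, breaking callers that rely on full-barrier semantics.
     */
}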

I have a couple of ideas on how to fix this problem.
1. Provide full barrier semantics for pg_atomic_compare_exchange_*().  In
particular, for __atomic_compare_exchange_n(), we should probably use
__ATOMIC_ACQ_REL for success and run an explicit memory barrier in the
case of failure (a sketch follows below).
2. Document that pg_atomic_compare_exchange_*() doesn't necessarily provide
a memory barrier on failure.  But then we need to carefully check if
existing use-cases need explicit memory barriers now.
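
For idea 1, a minimal sketch of what the fallback implementation could
look like (the memory orders and the explicit fence are my assumption of
the fix, not tested code):

static inline bool
pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
                                    uint64 *expected, uint64 newval)
{
    bool    ret;

    AssertPointerAlignment(expected, 8);
    ret = __atomic_compare_exchange_n(&ptr->value, expected, newval, false,
                                      __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
    if (!ret)
        pg_memory_barrier();    /* restore full-barrier semantics on failure */
    return ret;
}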

Any thoughts?

Links
1.
https://www.postgresql.org/message-id/CAPpHfdsWcQb-u-9K%3DipneBf8CMhoUuBWKYc%2BXWJEHVdtONOepQ%40mail.gmail.com
2. https://developer.arm.com/documentation/100941/0101/Barriers

--
Regards,
Alexander Korotkov
Supabase


Re: Trigger more frequent autovacuums

2025-03-07 Thread Melanie Plageman
On Fri, Mar 7, 2025 at 6:19 AM wenhui qiu  wrote:
>
> Sorry, a wrong version of the pcnt_visibletuples debug patch; kindly
> check the v3 attachment.

I looked at v3. I think I need more than the logging message to
understand your goal here. Could you explain the algorithm and why you
think it makes sense and what scenarios it is meant to handle better?

Thinking about it conceptually, I don't think this makes sense:

pcnt_visibletuples = (float4) (livetuples / (livetuples + vactuples));
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples *
pcnt_visibletuples
do_vacuum = vactuples > vacthresh

livetuples + deadtuples is approximately reltuples (it's a little more
complicated than that, but close enough), so this is basically
livetuples/reltuples * reltuples -> livetuples.

So vactuples > vacthresh is basically just deadtuples > livetuples.
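
Spelling the algebra out (my sketch, assuming reltuples ~= livetuples +
vactuples):

vacthresh ~= vac_base_thresh
           + vac_scale_factor * (livetuples + vactuples)
                              * livetuples / (livetuples + vactuples)
           = vac_base_thresh + vac_scale_factor * livetuples

i.e. the condition degenerates to comparing dead tuples against a fixed
fraction of live tuples.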

Maybe you think that we should be comparing the portion of the table
that is dead to the portion that is live, but that doesn't seem to be
what you intend the algorithm to do, based on the one comment you have.

The anlthresh calculation is a different discussion, since
mod_since_analyze is calculated in a different way (tuples updated +
tuples inserted + tuples deleted). But I am also skeptical of this
one.

I think you need to explain more conceptually why you think these ways
of calculating the thresholds make sense.

- Melanie




Re: Refactoring postmaster's code to cleanup after child exit

2025-03-07 Thread Andres Freund
Hi,

On 2025-03-07 16:25:09 +0100, Tomas Vondra wrote:
> FWIW I keep running into this (and skink seems unhappy too). I ended up
> just adding a sleep(1), right before
> 
> push(@sessions, background_psql_as_user('regress_superuser'));
> 
> and that makes it work on all my machines (including rpi5).

Can you confirm that the fix attached to my prior email suffices to address
the issue on your machine too?  I'm planning to push the fixes soon.

Greetings,

Andres Freund




Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Jelte Fennema-Nio
On Fri, 7 Mar 2025 at 14:58, Robert Haas  wrote:
> I see that Jelte walked this comment back, but I think this issue
> needs more discussion. I'm not intrinsically against having a role
> like pg_execute_server_programs that allows escalation to superuser,
> but I don't see how it would help a cloud provider whose goal is to
> NOT allow administrators to escalate to superuser.
>
> What am I missing?

The reason why I walked back my comment was that cloud providers can
simply choose which extensions they actually add to the image. If an
extension is marked as not trusted by the author, then with this role
they can still choose to add it without having to make changes to the
control file if they think it's "secure enough".




RE: AIX support

2025-03-07 Thread Srirama Kucherlapati
Hi Team,
Our team has identified a couple of issues with the meson build on AIX,
primarily in the following areas:

  1.  Symbol extraction and resolution in object files during binary creation.
  2.  Dynamic runtime library path resolution in shared libraries.

We have resolved them and plan to submit these fixes to the meson
community shortly.

In addition, we are actively working on resolving other test failures. I
will provide updates once these problems have been resolved.

92/301 postgresql:regress / regress/regress
# 4 of 223 tests failed.


Warm regards,
Sriram.



Re: Add arbitrary xid and mxid to pg_resetwal

2025-03-07 Thread Aleksander Alekseev
Hi Daniil,

Thanks for the patch.

> main idea of this patch (for REL_17_STABLE)

Your patch should target the `master` branch. Also please add a
corresponding entry to the nearest open commitfest [1].

> In my opinion, this will be useful primarily to simplify testing, since at 
> the moment you have to create segments manually (as in this article).

In this case you should add a test or two that demonstrate this, perhaps
as separate commits.

If you could also demonstrate that these tests improve code coverage,
for instance, that would be great.

[1]: https://commitfest.postgresql.org/

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Laurenz Albe
On Fri, 2025-03-07 at 09:17 -0500, Robert Haas wrote:
> On Fri, Mar 7, 2025 at 9:02 AM Jelte Fennema-Nio  wrote:
> > The reason why I walked back my comment was that cloud providers can
> > simply choose which extensions they actually add to the image. If an
> > extension is marked as not trusted by the author, then with this role
> > they can still choose to add it without having to make changes to the
> > control file if they think it's "secure enough".
> 
> Hmm. It would be easy to do dumb things here, but I agree there are
> probably a bunch of debatable cases. Maybe it would be smart if we
> labelled our untrusted extensions somehow with why they're untrusted,
> or documented that.
> 
> Why wouldn't the cloud provider just add 'trusted = true' to
> the relevant control files instead of doing this?

That's quite true.  Perhaps the patch should be rejected after all.

Yours,
Laurenz Albe




Re: what's going on with lapwing?

2025-03-07 Thread Julien Rouhaud
On Fri, Mar 07, 2025 at 08:52:26AM -0500, Robert Haas wrote:
> On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud  wrote:
> > Honestly, it's been years of people complaining about one thing or
> > another on lapwing without ever asking for a change.  Was it really
> > hard to ask "can you remove the -Werror, it's not useful anymore" the
> > first time it caused extra work?  Instead I have to guess what people
> > want.  So after a few complaints I removed that flag.  And now after a
> > few more complaints I turned it off.  If that's not what you want,
> > well, too bad, but that's on you, not me.
>
> This is actually much harder for me as a committer than you might
> guess. How is an individual committer working on an individual issue
> supposed to know that removing -Wall is the right thing vs. fixing the
> warning in some way?

That may be true in general, but I don't think it was the case here.

The "repeated extra work" here was replacing {0} with {{0}} or something like
that, due to a compiler bug.  It was outlined multiple time.
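
For the archives, a minimal reproducer of that class of warning (my
reconstruction of the issue; IIRC old GCC releases warn about the first,
idiomatic form under -Wmissing-braces, see gcc bug #53119, which -Werror
then turns into a build failure):

int m[2][2] = {0};      /* gcc 4.x: "missing braces around initializer" */
int n[2][2] = {{0}};    /* extra braces added just to silence the warning */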

The tradeoff is simple: either doing that is so much pain that -Werror
should be removed, or the odds of that compiler and/or 32-bit architecture
revealing an actual bug are high enough that it's a small price to pay.

I was fine either way, and I didn't think it was my choice to make since
I'm not the one who has to deal with it.  But none of that happened;
instead, people just chose the easy path of complaining for the sake of
complaining, without offering an actual solution.

Anyway, it seems that the only remaining use for this animal would be to
test REL_13_STABLE, which will be EOL in a few months, and I don't see the
point of dealing with more complaints just for that given how little
activity is going to happen on that branch, so as I said I turned it off.




Re: [PATCH] New predefined role pg_manage_extensions

2025-03-07 Thread Michael Banck
Hi,

On Fri, Mar 07, 2025 at 09:17:46AM -0500, Robert Haas wrote:
> Why wouldn't the cloud provider just add 'trusted = true' to
> the relevant control files instead of doing this?

That would be possible, but maybe the cloud provider is using
distribution packages and does not want to muck around in the file
system (as is usually frowned upon), or, maybe more likely, is using
container images based on the Debian packages (which is what most of
them are, from what I've seen) and cannot (or does not want to) muck
around in the file system easily.

Also, I think there is a case to be made that a cloud provider (or site
admin) would like to delegate the decision whether users with CREATE
rights on a particular database are allowed to install some extensions
or not. Or rather, assign somebody they believe would make the right
call on that, by granting pg_manage_extensions.

On the other hand, maybe trusted should be part of the catalog and not
(just) the extension control file, so that somebody with appropriate
permissions (like the cloud provider during instance bootstrap) could do
"ALTER EXTENSION foo (SET trusted|TRUSTED);" or whatever. ISTR that I
reviewed the discussion around trusted back then and did not see that
possibility discussed at all, but I might be misremembering, it's been a
while.


Michael




Re: encode/decode support for base64url

2025-03-07 Thread Aleksander Alekseev
Hi,

> > Sometimes support for base64url from RFC 4648 would be useful.
> > Does anyone else need a patch like this?
>
> While not a frequent ask, it has been mentioned in the past.  I think it would
> make sense to add, so please do submit a patch for it for consideration.

IMO it would be nice to have.
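
For context, the delta relative to plain base64 is small; a minimal
sketch of the alphabet mapping (illustrative only, not the proposed
patch):

static void
base64_to_base64url(char *s)
{
    /* RFC 4648: base64url replaces '+' with '-' and '/' with '_' */
    for (; *s; s++)
    {
        if (*s == '+')
            *s = '-';
        else if (*s == '/')
            *s = '_';
    }
    /* RFC 4648 also allows omitting the trailing '=' padding */
}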

Would you like to submit such a patch or are you merely suggesting an
idea for others to implement?

-- 
Best regards,
Aleksander Alekseev




Re: making EXPLAIN extensible

2025-03-07 Thread Peter Eisentraut

On 06.03.25 21:23, Robert Haas wrote:
> On Wed, Mar 5, 2025 at 4:38 PM Tom Lane  wrote:
>> v4 has addressed most of my nitpicks, but you still have typedefs
>> for ExplainState in both header files.  My bet is that at least
>> one buildfarm animal will complain about that.  I could be wrong
>> though, maybe all such compilers are in disuse now.
>
> Ugh, I suck at this, sorry. Adjusted in v5. It's hard to avoid the
> conclusion that our IWYU configuration must be fairly lenient, because
> every change seems to surface more source files that are depending on
> indirect includes.


Just to clarify this:  Nobody has gone through and used IWYU to clean up 
indirect includes, as you appear to imagine here.  My recent IWYU work 
was, besides putting some infrastructure in place, to clean up includes 
that are completely unneeded.  Indirect includes cleanup is a different 
project that is not currently happening, AFAIK.


Also, benign typedef redefinitions are a C11 feature.  In practice, all 
compilers currently in play support it, and the only problem you'll get 
is from the buildfarm members that are explicitly set up to warn about 
accidental C11 use.  We could probably have a discussion about that, but 
for this patch set, it's probably better to just deal with the status quo.






Re: Separate GUC for replication origins

2025-03-07 Thread Peter Eisentraut

On 07.03.25 04:51, Amit Kapila wrote:
>>>> I agree that the originally proposed name max_replication_origins is not
>>>> good, because you can "create" (using pg_replication_origin_create())
>>>> more than the configured maximum.  What is the term for what the setting
>>>> actually controls?  How many are "active"?  "In use"?  Per session?  etc.
>>>
>>> It controls the number of active sessions using origins. The idea is
>>> that to track replication progress via replication_origin we need to
>>> do replorigin_session_setup(). If you look in the code, we have used
>>> the term replorigin_session* in many places, so we thought of naming
>>> this as max_replication_origin_sessions. But the other options could
>>> be max_active_replication_origins or max_replication_origins_in_use.
>>
>> The word "session" is correlated to "replication origin" but requires some
>> knowledge of the replication progress tracking design. The word "active"
>> can express the fact that it was set up and is currently in use. I vote for
>> max_active_replication_origins.
>
> Sounds reasonable. Let's go with max_active_replication_origins then,
> unless people think otherwise.


Is that maximum active for the whole system, or maximum active per 
session, or maximum active per created origin, or some combination of these?






Re: AIO v2.5

2025-03-07 Thread Jakub Wartak
On Thu, Mar 6, 2025 at 2:13 PM Andres Freund  wrote:

> On 2025-03-06 12:36:43 +0100, Jakub Wartak wrote:
> > On Tue, Mar 4, 2025 at 8:00 PM Andres Freund  wrote:
> > > Questions:
> > >
> > > - My current thinking is that we'd set io_method = worker initially - so 
> > > we
> > >   actually get some coverage - and then decide whether to switch to
> > >   io_method=sync by default for 18 sometime around beta1/2. Does that 
> > > sound
> > >   reasonable?
> >
> > IMHO, yes, good idea. Anyway final outcomes partially will depend on
> > how many other stream-consumers be committed, right?
>
> I think it's more whether we find cases where it performs substantially worse
> with the read stream users that exist.  The behaviour for non-read-stream IO
> shouldn't change.

OK, so in order to get the full picture for v18beta, this would mean
$thread plus the following ones?:
- Use read streams in autoprewarm
- BitmapHeapScan table AM violation removal (and use streaming read API)
- Index Prefetching (it seems to have stalled?)

or is there something more planned? (I'm asking what to apply on top
of AIO to minimize the number of potential test runs, which take a lot
of time, so I can do it all in one go.)

> > So, I've taken the aio-2 branch from your GitHub repo for a small ride
> > on legacy RHEL 8.7 with dm-flakey to inject I/O errors. This is more a
> > question: perhaps IO workers should auto-close fds on errors, or should
> > we use SIGUSR2 for it? The scenario is like this:
>
> When you say "auto-close", you mean that one IO error should trigger *all*
> workers to close their FDs?

Yeah, I was somehow thinking about such a thing, but after you bolded
that "*all*", my question sounds much more stupid than it did yesterday.
Sorry for asking a stupid question :)

> The same is already true with bgwriter, checkpointer etc?

Yeah.. I was kind of looking for a way of getting "higher
availability" in the presence of partial IO (tablespace) errors.

> > pg_terminate_backend() on those won't work. The only thing that works seems
> > to be sending SIGUSR2
>
> Sending SIGINT works.

Ugh, ok, it looks like I've been overthinking that, cool.

> > , but is that safe [there could be some errors after pwrite() ]?
>
> Could you expand on that?

It is pure speculation on my side: I'm always concerned about leaving
something around without cleanup after errors and then re-using it for
something else much later, especially on edge cases like NFS or FUSE. In
the backend we could maintain some state, but io_workers are shared
across backends. E.g. if some pwrite() fails on NFS and we don't close
that fd, we might later reuse it for a different backend. AFAIK close()
doesn't guarantee anything either, but it could be that some inode/path
was simply marked dangling, in which case a fresh close()/open() pair
would return an error, whereas here we would just keep pwrite()ing to it?

OK, the only remaining question: does it make sense to try something like
pgbench on NFS over UDP with mountopt=hard,nointr plus intermittent
iptables DROPs from time to time, or is it not worth trying?

> > With
> > io_worker=sync just quitting the backend of course works. Not sure
> > what your thoughts are because any other bgworker could be having open
> > fds there. It's a very minor thing. Otherwise that outage of separate
> > tablespace (rarely used) would potentially cause inability to fsck
> > there and lower the availability of the DB (due to potential restart
> > required).
>
> I think a crash-restart is the only valid thing to get out of a scenario like
> that, independent of AIO:
>
> - If there had been any writes we need to perform crash recovery anyway, to
>   recreate those writes
> - If there just were reads, it's good to restart as well, as otherwise there
>   might be pages in the buffer pool that don't exist on disk anymore, due to
>   the errors.

OK, cool, thanks!

-J.




Re: Commitfest app release on Feb 17 with many improvements

2025-03-07 Thread Andreas Karlsson

On 3/7/25 12:48 AM, Jelte Fennema-Nio wrote:
> Okay, I went for the approach of just trying everything until one
> works. Starting with "git am", then patch(1), and as a final attempt
> "git apply". Patch 5272 applies correctly now. I've removed any
> backoff caused by repeated failures for all existing patches, so all
> should be retried in the next ~48 hours.


Out of curiosity: do you track which method works? I would expect
everything that can be applied with git apply to also apply with either
git am or patch(1), making git apply technically unnecessary.


Andreas




