Re: Statistics Import and Export

2025-02-28 Thread Nathan Bossart
On Fri, Feb 28, 2025 at 04:52:02PM -0500, Corey Huinker wrote:
> On Fri, Feb 28, 2025 at 4:25 PM Jeff Davis  wrote:
>> On Fri, 2025-02-28 at 14:56 -0600, Nathan Bossart wrote:
>>> I'm curious what the use-case is for pg_upgrade's --no-statistics
>>> option.
>>
>> Mostly completeness and paranoia. I don't see a real use case. If we
>> decide we don't need it, that's fine with me.
> 
> Completeness/symmetry and paranoia was how I viewed it. I suppose it might
> be useful as a failsafe if a pg_upgrade failed and you needed a way to
> retry.

Got it.  I have no strong opinion on the matter.

-- 
nathan




Add assertion for failed alloc to palloc0() and palloc_extended()

2025-02-28 Thread Andreas Karlsson

Hi,

I noticed that we have Assert(ret != NULL) in palloc() but not in 
palloc0(), so for consistency I decided to add it. In palloc_extended(), 
I also added an assertion that the MCXT_ALLOC_NO_OOM flag is set 
whenever alloc() returns NULL.

I feel this might be useful: while palloc() is by far the most common 
of the three functions, an OOM that causes alloc() to incorrectly 
return NULL could in theory happen in any of them.


Andreas
From 7165657a4b62ac35e2c67b9e51035be2a5fbb93f Mon Sep 17 00:00:00 2001
From: Andreas Karlsson 
Date: Sat, 1 Mar 2025 00:27:52 +0100
Subject: [PATCH] Add assertion for failed alloc to palloc0() and
 palloc_extended()

The palloc() call asserts that the alloc function does not return NULL,
so do the same in palloc0(). In palloc_extended() we can assert the same
as long as the MCXT_ALLOC_NO_OOM flag is not set, so let's do so.
---
 src/backend/utils/mmgr/mcxt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index aa6da0d0352..ce108788259 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1356,7 +1356,8 @@ palloc0(Size size)
 	context->isReset = false;
 
 	ret = context->methods->alloc(context, size, 0);
-
+	/* We expect OOM to be handled by the alloc function */
+	Assert(ret != NULL);
 	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
 
 	MemSetAligned(ret, 0, size);
@@ -1379,6 +1380,7 @@ palloc_extended(Size size, int flags)
 	ret = context->methods->alloc(context, size, flags);
 	if (unlikely(ret == NULL))
 	{
+		Assert(flags & MCXT_ALLOC_NO_OOM);
 		return NULL;
 	}
 
-- 
2.47.2



Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-28 Thread Jacob Champion
On Fri, Feb 28, 2025 at 5:44 AM Thomas Munro  wrote:
> If you trigger the new optional NetBSD CI task, the oauthvalidator
> tests implode[1].

Oh, thank you for reporting that. I need to pay more attention to the
BSD CI thread.

> Apparently that OS's kevent() doesn't like zero
> relative timeouts for EVFILT_TIMER[2].  I see that you worked around
> the same problem for Linux timerfd already by rounding 0 up to 1, so
> we could just do the same here, and it passes with the attached.

Just from inspection, that looks good to me. I'll look into running
the new BSD tasks on the other patches I posted above, too.

Should we maybe consider just doing that across the board, and put up
with the inefficiency? Admittedly 1ms is a lot more dead time than
1ns...

> A
> cute alternative, not tested, might be to put NOTE_ABSTIME into fflag
> if timeout == 0 (then it's an absolute time in the past, which should
> fire immediately).

That could work. I think I need to stare at the man pages more if we
go that direction, since (IIUC) NOTE_ABSTIME changes up some other
default behavior for the timer.
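
For illustration, a minimal sketch of the two options (hedged: based only
on the kevent(2) interface, not on the actual patch; the variable names
are made up):

struct kevent ev;

/* Option 1: clamp a zero relative timeout up to 1ms, as already done
 * for the Linux timerfd case. */
EV_SET(&ev, timer_id, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0,
       (timeout_ms == 0) ? 1 : timeout_ms, NULL);

/* Option 2 (untested): with NOTE_ABSTIME, data is an absolute time, so
 * passing 0 means "in the past" and the timer should fire immediately. */
EV_SET(&ev, timer_id, EVFILT_TIMER, EV_ADD | EV_ONESHOT,
       NOTE_ABSTIME, 0, NULL);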

> But I'm curious, how hard would it be to do this ↓ instead and not
> have that problem on any OS?
>
>  * There might be an optimization opportunity here: if timeout == 0, we
>  * could signal drive_request to immediately call
>  * curl_multi_socket_action, rather than returning all the way up the
>  * stack only to come right back. But it's not clear that the additional
>  * code complexity is worth it.

I'm not sure if it's hard, so much as it is confusing. My first
attempt at it a while back gave me the feeling that I wouldn't
remember how it worked in a few months.

Here are the things that I think we would have to consider, at minimum:
1. Every call to curl_multi_socket_action/all now has to look for the
new "time out immediately" flag after returning.
2. If it is set, and if actx->running has been set to zero, we have to
decide what that means conceptually. (Is that an impossible case? Or
do we ignore the timeout and assume the request is done/failed?)
3. Otherwise, we need to clear the flag and immediately call
curl_multi_socket_action(CURL_SOCKET_TIMEOUT), and repeat until that
flag is no longer set. That feels brittle to me, because if there's
some misunderstanding in our code or some strange corner case in an
old version of Curl on some platform, and we keep getting a timeout of
zero, we'll hit an infinite loop. (The current behavior instead
returns control to the top level every time, and gives
curl_multi_socket_all() a chance to right the ship by checking the
status of all the outstanding sockets.)
4. Even if 2 and 3 are just FUD, there's another potential call to
set_timer(0) in OAUTH_STEP_TOKEN_REQUEST. The interval can only be
zero in debug mode, so if we add new code for that case alone, the
tests will be mostly exercising a non-production code path.

I prefer your patch, personally.

Thanks!
--Jacob




Re: vacuumdb changes for stats import/export

2025-02-28 Thread Nathan Bossart
On Thu, Feb 27, 2025 at 04:36:04PM +0700, John Naylor wrote:
> On Wed, Feb 5, 2025 at 4:44 AM Nathan Bossart  
> wrote:
>> [v2]
> 
> I started looking just at 0001, and it seems like a fairly
> straightforward rearrangement.

Thanks for taking a look.

> I found this comment quite hard to read:
> 
> + * 'found_objs' should be a fully qualified list of objects to process, as
> + * returned by a previous call to vacuum_one_database().  If *found_objs is
> + * NULL, it is set to the results of the catalog query discussed below.  If
> + * found_objs is NULL, the results of the catalog query are not returned.
> + *
> + * If *found_objs is NULL, this function performs a catalog query to retrieve
> + * the list of tables to process.  When 'objects' is NULL, all tables in the
> 
> I had to read it several times before I noticed the difference between
> "* found_objs" and "*found_objs". Maybe some extra spacing and breaks
> would help, or other reorganization.

Yeah, it's pretty atrocious.  I think the main problem is that the
interface is just too complicated, so I'll take a step back and see if I
can make it more understandable to humans.  In the meantime, here's an
attempt at adjusting the comment:

 * found_objs is a double pointer to a fully qualified list of objects to
 * process, as returned by a previous call to vacuum_one_database().  If
 * *found_objs (the once-dereferenced double pointer) is NULL, it is set to the
 * results of the catalog query discussed below.  If found_objs (the double
 * pointer) is NULL, the results of the catalog query are not returned.
 *
 * If *found_objs (the once-dereferenced double pointer) is NULL, this function
 * performs a catalog query to retrieve the list of tables to process.  When
 * "objects" is NULL, all tables in the database are processed.  Otherwise, the
 * catalog query performs a lookup for the objects listed in "objects".
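
To make the three calling conventions concrete, here's a hedged sketch
(the other parameters of vacuum_one_database() are elided with "...",
and SimpleStringList stands in for whatever list type is actually used):

SimpleStringList *found_objs = NULL;

/* found_objs != NULL, *found_objs == NULL: run the catalog query and
 * return the resulting list through *found_objs. */
vacuum_one_database(..., &found_objs);

/* found_objs != NULL, *found_objs != NULL: reuse the previously
 * returned list; no catalog query. */
vacuum_one_database(..., &found_objs);

/* found_objs == NULL: run the catalog query, discard the list. */
vacuum_one_database(..., NULL);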

-- 
nathan




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-28 Thread Thomas Munro
On Sat, Mar 1, 2025 at 6:37 AM Jacob Champion
 wrote:
> Should we maybe consider just doing that across the board, and put up
> with the inefficiency? Admittedly 1ms is a lot more dead time than
> 1ns...

Last time I checked, NetBSD is still using scheduler ticks (100hz
periodic interrupt) for all this kind of stuff so it's even worse than
that :-)

> I prefer your patch, personally.

Cool, I'll commit it shortly unless someone else comes up with a better idea.




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-28 Thread Thomas Munro
On Sat, Mar 1, 2025 at 10:57 AM Jacob Champion
 wrote:
> After your patch gets us past the zero timeout bug, NetBSD next runs into
>
> failed to fetch OpenID discovery document: Unsupported protocol
> (libcurl: Received HTTP/0.9 when not allowed)'
>
> ...but only for a single test (nonempty oauth_client_secret), which is
> very strange. And it doesn't always fail during the same HTTP request.
> I'll look into it.

In case it's relevant, it was green for me, but I also ran it in
combination with my 3x-go-faster patch on that other thread. . o O {
Timing/race stuff?  Normally the build farm shakes that stuff out a
bit more reliably than CI, but I doubt libcurl is set up on many
animals... }




Re: optimize file transfer in pg_upgrade

2025-02-28 Thread Robert Haas
On Fri, Feb 28, 2025 at 3:01 PM Nathan Bossart  wrote:
> That's exactly where I landed (see v3-0002).  I haven't measured whether
> transferring relfilenodes or dumping the sequence data is faster for the
> existing modes, but for now I've left those alone, i.e., they still dump
> sequence data.  The new "swap" mode just uses the old cluster's sequence
files, and I've disallowed using swap mode for upgrades from versions that
predate the sequence tuple format change (along with other incompatible changes).

Ah. Perhaps I should have read the thread more carefully before
commenting. Sounds good, at any rate.

> I'll admit I'm a bit concerned that this will cause problems if and when
> someone wants to change the sequence tuple format again.  But that hasn't
> happened for a while, AFAIK nobody's planning to change it, and even if it
> does happen, we just need to have my proposed new mode transfer the
> sequence files like it transfers the catalog files.  That will make this
> mode slower, especially if you have a ton of sequences, but maybe it'll
> still be a win in most cases.  Of course, we probably will need to have
> pg_upgrade handle other kinds of format changes, too, but IMHO it's still
> worth trying to speed up pg_upgrade despite the potential future
> complexities.

I think it's fine. If somebody comes along and says "hey, when v23
came out Nathan's feature only sped up pg_upgrade by 2x instead of 3x
like it did for v22, so Nathan is a bad person," I think we can fairly
reply "thanks for sharing your opinion, feel free not to use the
feature and run at 1x speed". There's no rule saying that every
optimization must always produce the maximum possible benefit in every
scenario. We're just concerned about regressions, and "only delivers
some of the speedup if the sequence format has changed on disk" is not
a regression.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: optimize file transfer in pg_upgrade

2025-02-28 Thread Robert Haas
On Wed, Nov 6, 2024 at 5:07 PM Nathan Bossart  wrote:
> these user relation files will have the same name.  Therefore, it can be
> much faster to instead move the entire data directory from the old cluster
> to the new cluster and to then swap the catalog relation files.

This is a cool idea.

> Another interesting problem is that pg_upgrade currently doesn't transfer
> the sequence data files.  Since v10, we've restored these via pg_restore.
> I believe this was originally done for the introduction of the pg_sequence
> catalog, which changed the format of sequence tuples.  In the new
> catalog-swap mode I am proposing, this means we need to transfer all the
> pg_restore-generated sequence data files.  If there are many sequences, it
> can be difficult to determine which transfer mode and synchronization
> method will be faster.  Since sequence tuple modifications are very rare, I
> think the new catalog-swap mode should just use the sequence data files
> from the old cluster whenever possible.

Maybe we should rethink the decision not to transfer relfilenodes for
sequences. Or have more than one way to do it. pg_upgrade
--binary-upgrade --binary-upgrade-even-for-sequences, or whatever.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Metadata and record block access stats for indexes

2025-02-28 Thread Mircea Cadariu

Hi,

For the purpose of writing a blog post I was checking the index stats 
recorded for a workload, but found them rather confusing. Following 
along the code with the debugger, it eventually made sense and I could 
understand what's counted.  Looking around a bit, I discovered an older 
discussion [1] on the mailing lists and learned that the issue is 
known.  The proposal in that thread is to start counting separate 
metadata and record stats depending on what type of index block is 
retrieved.


I realized those would have helped me better understand the collected 
index stats, so I started working on a patch to add these in the system 
views. Attached is a WIP patch file with partial coverage of the B-Tree 
index code. The implementation follows the existing stats collection 
approach and the naming convention proposed in [1].  Let me know if what 
I'm doing is feasible and if there are any concerns I could address. Next 
steps would be to replace all the places where I currently pass in NULL 
with proper counting, as well as update tests and docs.


Looking forward to your feedback! Thanks!

Cheers,
Mircea

[1]: 
https://www.postgresql.org/message-id/flat/CAH2-WzmdZqxCS1widYzjDAM%2BZ-Jz%3DejJoaWXDVw9Qy1UsK0tLA%40mail.gmail.com 

From a540a042ffb0b348254afdfdce39199900f9c7ec Mon Sep 17 00:00:00 2001
From: Mircea Cadariu 
Date: Thu, 20 Feb 2025 13:45:12 +
Subject: [PATCH v1] Preliminary work to capture and expose separate record
 (leaf page) and metadata (non-leaf page) index access statistics in the
 system views, with partial coverage of B-Trees.

---
 contrib/amcheck/verify_heapam.c  |   2 +-
 contrib/amcheck/verify_nbtree.c  |   6 +-
 contrib/bloom/blinsert.c |   6 +-
 contrib/bloom/blscan.c   |   2 +-
 contrib/bloom/blutils.c  |   6 +-
 contrib/bloom/blvacuum.c |   6 +-
 contrib/pageinspect/btreefuncs.c |   8 +-
 contrib/pageinspect/rawpage.c|   2 +-
 contrib/pg_prewarm/autoprewarm.c |   2 +-
 contrib/pg_surgery/heap_surgery.c|   2 +-
 contrib/pg_visibility/pg_visibility.c|   2 +-
 contrib/pgstattuple/pgstatapprox.c   |   2 +-
 contrib/pgstattuple/pgstatindex.c|  10 +-
 contrib/pgstattuple/pgstattuple.c|  10 +-
 doc/src/sgml/monitoring.sgml | 110 +++
 src/backend/access/brin/brin.c   |   4 +-
 src/backend/access/brin/brin_pageops.c   |   4 +-
 src/backend/access/brin/brin_revmap.c|  12 +-
 src/backend/access/gin/ginbtree.c|  11 +-
 src/backend/access/gin/ginfast.c |  14 +--
 src/backend/access/gin/ginget.c  |   6 +-
 src/backend/access/gin/ginutil.c |   6 +-
 src/backend/access/gin/ginvacuum.c   |  22 ++--
 src/backend/access/gist/gist.c   |  10 +-
 src/backend/access/gist/gistbuild.c  |  10 +-
 src/backend/access/gist/gistget.c|   4 +-
 src/backend/access/gist/gistutil.c   |   2 +-
 src/backend/access/gist/gistvacuum.c |   6 +-
 src/backend/access/hash/hash.c   |   3 +-
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/heap/heapam.c |  16 +--
 src/backend/access/heap/heapam_handler.c |   9 +-
 src/backend/access/heap/hio.c|   8 +-
 src/backend/access/heap/vacuumlazy.c |   2 +-
 src/backend/access/heap/visibilitymap.c  |   2 +-
 src/backend/access/nbtree/nbtinsert.c|  32 --
 src/backend/access/nbtree/nbtpage.c  |  65 +++
 src/backend/access/nbtree/nbtree.c   |   2 +-
 src/backend/access/nbtree/nbtsearch.c|  34 --
 src/backend/access/nbtree/nbtutils.c |   5 +-
 src/backend/access/spgist/spgdoinsert.c  |   4 +-
 src/backend/access/spgist/spgscan.c  |   4 +-
 src/backend/access/spgist/spgutils.c |   8 +-
 src/backend/access/spgist/spgvacuum.c|   4 +-
 src/backend/access/transam/xloginsert.c  |   2 +-
 src/backend/catalog/system_views.sql |  30 -
 src/backend/commands/sequence.c  |   2 +-
 src/backend/storage/aio/read_stream.c|   6 +-
 src/backend/storage/buffer/bufmgr.c  |  52 +
 src/backend/storage/freespace/freespace.c|   2 +-
 src/backend/utils/activity/pgstat_database.c |   4 +
 src/backend/utils/activity/pgstat_relation.c |   8 ++
 src/backend/utils/adt/pgstatfuncs.c  |  24 
 src/include/access/nbtree.h  |   4 +-
 src/include/catalog/pg_proc.dat  |  32 ++
 src/include/pgstat.h |  45 
 src/include/storage/bufmgr.h |  12 +-
 src/test/regress/expected/rules.out  |  40 ++-
 58 files changed, 557 insertions(+), 201 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 827312306f

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-28 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 7:57 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 27 Feb 2025 15:24:26 -0800,
>   Masahiko Sawada  wrote:
>
> > Pushed the 0001 patch.
>
> Thanks!
>
> > Regarding the 0002 patch, I realized we stopped exposing
> > NextCopyFromRawFields() function:
> >
> >  --- a/src/include/commands/copy.h
> > +++ b/src/include/commands/copy.h
> > @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
> > *pstate, Relation rel, Node *where
> >  extern void EndCopyFrom(CopyFromState cstate);
> >  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
> >  Datum *values, bool *nulls);
> > -extern bool NextCopyFromRawFields(CopyFromState cstate,
> > - char ***fields, int *nfields);
> >
> > I think that this change is not relevant with the refactoring and
> > probably we should keep it exposed as extension might be using it.
> > Considering that we added pg_attribute_always_inline to the function,
> > does it work even if we omit pg_attribute_always_inline to its
> > function declaration in the copy.h file?
>
> Unfortunately, no. The inline + constant argument
> optimization requires "static".
>
> How about the following?
>
> static pg_attribute_always_inline bool
> NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int 
> *nfields, bool is_csv)
> {
> ...
> }
>
> bool
> NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> {
> return NextCopyFromRawFieldsInternal(cstate, fields, nfields, 
> cstate->opts.csv_mode);
> }
>

Thank you for the confirmation.

I initially thought it would be acceptable to stop exposing
NextCopyFromRawFields() since NextCopyFrom() could serve as an
alternative. For example, the NextCopyFromRawFields() function was
originally exposed in commit 8ddc05fb01ee2c primarily to support
extension modules like file_fdw, but file_fdw wasn't utilizing this
API. I pushed the patch without the above change. Unfortunately, this
commit subsequently broke file_text_array_fdw[1] and made BF animal
crake unhappy[2].

Upon examining file_text_array_fdw more closely, I realized that
NextCopyFrom() may not be a suitable replacement for
NextCopyFromRawFields() in certain scenarios. Specifically,
NextCopyFrom() assumes that the caller has prior knowledge of the
source data's column count, making it inadequate for cases where
extensions like file_text_array_fdw need to construct an array of
source data with an unknown number of columns. In such situations,
NextCopyFromRawFields() proves to be more practical. Given these
considerations, I'm now leaning towards implementing the proposed
change. Thoughts?
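
For context, here is a minimal sketch of that usage pattern, using the
exported signature quoted above (FDW scaffolding and error handling
omitted; cstate is assumed to come from BeginCopyFrom(), and
process_field() is a hypothetical consumer):

char  **fields;
int     nfields;

/* Each call returns one row's worth of raw, unconverted fields. */
while (NextCopyFromRawFields(cstate, &fields, &nfields))
{
    /* nfields can vary from row to row; no prior knowledge needed. */
    for (int i = 0; i < nfields; i++)
        process_field(fields[i]);   /* fields[i] may be NULL */
}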

Regards,

[1] https://github.com/adunstan/file_text_array_fdw
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-02-28%2018%3A47%3A02

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Statistics Import and Export

2025-02-28 Thread Jeff Davis
On Thu, 2025-02-27 at 22:42 -0500, Greg Sabino Mullane wrote:
> I know I'm coming late to this, but I would like us to rethink having
> statistics dumped by default. I was caught by this today, as I was
> doing two dumps in a row, but the output changed between runs solely
> because the stats got updated. It got me thinking about all the use
> cases of pg_dump I've seen over the years. I think this has the
> potential to cause a lot of problems for things like automated
> scripts.

Can you expand on some of those cases?

There are some good reasons to make dumping stats the default:

 * The argument here[1] seemed compelling: pg_dump has always dumped
everything by default, so not doing so for stats could be surprising.

 * When dumping into the custom format, we'd almost certainly want to
include the stats so you can decide later whether to restore them or
not.

 * For most of the cases I'm aware of, if you encounter a diff related
to stats, it would be obvious what the problem is and the fix would be
easy. I can imagine cases where it might not be easy, but I can't
recall any, so if you can then it would be helpful to list them.

Of course, there are downsides too, so we will need to weigh the costs
and benefits.

Unless there's a consensus to change it, I'm inclined to keep it the
default at least into beta, so that we can get feedback from users and
make a more informed decision.

(Aside: I assume everyone here agrees that pg_upgrade should transfer
the stats by default.)

Regards,
Jeff Davis


[1]
https://www.postgresql.org/message-id/3228677.1713844341%40sss.pgh.pa.us





Re: making EXPLAIN extensible

2025-02-28 Thread Sami Imseih
> EXPLAIN output. It wouldn't make sense for core to have an EXPLAIN
> option whose whole purpose is to cater to the needs of some extension,
> so that made me think of providing some extensibility infrastructure.

Making EXPLAIN extensible sounds like a good idea. FWIW, there is a
discussion [0] for showing FDW remote plans (postgres_fdw specifically),
and I think we will need to add some new options to EXPLAIN to make
that possible.

I have not looked at your patches, but I will do so now.


Regards,

Sami Imseih
Amazon Web Services (AWS)


[0] 
https://www.postgresql.org/message-id/CAA5RZ0vXiOiodrNQ-Va4FCAkXMpGA%3DGZDeKjFBRgRvHGuW7s7Q%40mail.gmail.com




Re: Should work_mem be stable for a prepared statement?

2025-02-28 Thread Tom Lane
Sami Imseih  writes:
> However, I think any GUC that can influence the planner
> should be considered for consistency in behavior.
> It was mentioned above with the enable_* GUCs, but another
> one I can think of is the max_parallel_workers_per_gather which
> should then force a re-plan if changed. I have seen users need to turn
> that off in a hurry when it impacts their oltp workload.

The argument for treating work_mem specially is that it has effects at
both plan time and run time, so that the planner's cost assumptions
are invalidated if the executor uses a different value than the
planner did.  I don't believe that argument applies to anything else;
certainly it doesn't apply to the enable_* flags.  I'm also not
convinced that the argument requires us to solve the problem by
re-planning.  It would work just as well to stamp each PlannedStmt
with the value that the planner used and refer to that in the
executor instead of looking directly at the GUC.
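
A hedged sketch of that idea (PlannedStmt and work_mem are the existing
structure and GUC; the new field name is made up):

typedef struct PlannedStmt
{
    /* ... existing fields ... */
    int     planner_work_mem;   /* hypothetical: work_mem at plan time */
} PlannedStmt;

/* in the planner, e.g. standard_planner(): */
result->planner_work_mem = work_mem;

/* in the executor, consult the stamped value instead of the GUC: */
int     effective_work_mem = stmt->planner_work_mem;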

This is all kind of moot though, now that Jeff has revealed that
what he's really interested in is some sort of refactoring.
Maybe that refactoring is one that would conveniently apply to
other GUCs, or maybe it isn't.  I'm content to await details
before arguing about what we "should" do.

regards, tom lane




Re: Log connection establishment timings

2025-02-28 Thread Melanie Plageman
On Fri, Feb 28, 2025 at 12:16 AM Bertrand Drouvot
 wrote:
>
> Yup but my idea was to put all those line:
>
> "
> if (Log_connections &&
> (child_type == B_BACKEND || child_type == B_WAL_SENDER))
> {
> instr_time  fork_time = ((BackendStartupData *) 
> startup_data)->fork_time;
>
> conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
> }
> "
>
> into a dedicated helper function.

I ended up finding a bug, which means that exact code isn't
duplicated twice anymore. For EXEC_BACKEND, I have to wait to calculate
the duration until after reading the GUC values, but I need to get the
fork end time before that.

I tried coming up with an inline helper to replace this, and most
things I tried felt awkward. It has to have backend type and start
time as parameters. And in all of these places we have to be super
careful that the GUCs have already been read before using
log_connections, so it seems a bit unsafe to check log_connections
(the global variable) in a function. Someone could call the function
in a different place and maybe not know that log_connections won't be
set there.

I also wasn't sure if it would be weird to call a function like
"LogConnectionTiming()" which in many cases doesn't log the connection
timing (because it is a different backend type).

But maybe I'm not thinking about it correctly. What were you imagining?

- Melanie




Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-28 Thread Nathan Bossart
On Wed, Feb 26, 2025 at 04:48:20PM -0500, Melanie Plageman wrote:
> Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
> wording and is rebased.

0001 LGTM.

>  
> - Specifies a fraction of the table size to add to
> - autovacuum_vacuum_insert_threshold
> - when deciding whether to trigger a VACUUM.
> - The default is 0.2 (20% of table size).
> - This parameter can only be set in the 
> postgresql.conf
> - file or on the server command line;
> - but the setting can be overridden for individual tables by
> - changing table storage parameters.
> +Specifies a fraction of the active (unfrozen) table size to add to
> +autovacuum_vacuum_insert_threshold
> +when deciding whether to trigger a VACUUM.
> +The default is 0.2 (20% of active table size).
> +This parameter can only be set in the 
> postgresql.conf
> +file or on the server command line;
> +but the setting can be overridden for individual tables by
> +changing table storage parameters.
>  

nitpick: There might be an unintentional indentation change here.

I'm wondering about the use of the word "active," too.  While it's
qualified by the "(unfrozen)" after it, I'm worried it might not be
descriptive enough.  For example, I might consider a frozen page that's in
the buffer cache and is being read by queries to be "active."  And it
doesn't seem clear to me that it's referring to unfrozen pages and not
unfrozen tuples.  Perhaps we should say something like "a fraction of the
unfrozen pages in the table to add...".

> + /*
> +  * If we have data for relallfrozen, calculate the unfrozen 
> percentage
> +  * of the table to modify insert scale factor. This helps us 
> decide
> +  * whether or not to vacuum an insert-heavy table based on the 
> number
> +  * of inserts to the "active" part of the table.
> +  */
> + if (relpages > 0 && relallfrozen > 0)

So, if we don't have this data, we just use reltuples, which is the
existing behavior and should trigger vacuums less aggressively than if we
_did_ have the data.  That seems like the correct choice to me.

> + /*
> +  * It could be the stats were updated manually and 
> relallfrozen >
> +  * relpages. Clamp relallfrozen to relpages to avoid 
> nonsensical
> +  * calculations.
> +  */
> + relallfrozen = Min(relallfrozen, relpages);
> + pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);

Makes sense.

-- 
nathan




Re: Proposal: Limitations of palloc inside checkpointer

2025-02-28 Thread Maxim Orlov
After doing some testing, I found a bug in the patch: if more requests
are pushed while we have released the lock, num_requests must not simply
be reset to zero.

Here is a fixed version.
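
For readers following along, the shape of the fix is roughly this (a
hedged sketch, not the patch itself; CheckpointerShmem and
CheckpointerCommLock are the existing structures, and the batch size
constant is made up):

LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
n = Min(CheckpointerShmem->num_requests, CKPT_REQ_BATCH_SIZE);
memcpy(requests, CheckpointerShmem->requests,
       n * sizeof(CheckpointerRequest));
/* Requests may have been added while the lock was released earlier, so
 * subtract what we consumed instead of blindly zeroing num_requests,
 * and slide the remainder to the front of the queue. */
CheckpointerShmem->num_requests -= n;
memmove(CheckpointerShmem->requests,
        CheckpointerShmem->requests + n,
        CheckpointerShmem->num_requests * sizeof(CheckpointerRequest));
LWLockRelease(CheckpointerCommLock);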

-- 
Best regards,
Maxim Orlov.


v2-0001-AbsorbSyncRequests-incrementally-instead-of-doing.patch
Description: Binary data


Re: [BUG]: the walsender does not update its IO statistics until it exits

2025-02-28 Thread Michael Paquier
On Fri, Feb 28, 2025 at 02:41:34PM +0900, Michael Paquier wrote:
> With smaller records, the loop can become hotter, can't it?  Also,
> there can be a high number of WAL senders on a single node, and I've
> heard of some customers with complex logical decoding deployments with
> dozens of logical WAL senders.  Isn't there a risk of having this code
> path become a point of contention?  It seems to me that we should
> benchmark this change more carefully, perhaps even reduce the
> frequency of the report calls.

One idea here would be to have, on a single host, one server with a set
of N pg_receivewal processes dumping their WAL segments into a tmpfs,
while a single session generates a bunch of records with a minimal
size using pg_logical_emit_message().  Monitoring the maximum
replication lag with pg_stat_replication and looking at some perf
profiles of the cluster should show how these stats reports affect the
replication setup efficiency. 
--
Michael


signature.asc
Description: PGP signature


Re: Update docs for UUID data type

2025-02-28 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 5:50 PM Andy Alsup  wrote:
>
> Masahiko,
>
> I have combined the gen_random_uuid() and uuidv4() into a single row, as you 
> suggested. Please find the v5 patch, which has been squashed into a single 
> commit.

Thank you for updating the patch!

I like that the patch adds the reference to the uuid data type. But I
think we might want to adjust terminology:

+  
+   See  for how details on the UUID datatype in
+PostgreSQL.
+  

In the 9.14. UUID Functions section, we use the word 'UUID' for data
generated based on algorithms defined by RFC 9562, whereas we use
uuid (i.e., uuid in func.sgml) for the PostgreSQL data type.
IIUC you want 'UUID datatype' in the above change to refer to the
latter, PostgreSQL's uuid data type. Is that correct? If so, how about
the following change?

   See  for details on the data type
   uuid in PostgreSQL.

I've attached the updated patch that incorporates the above change,
and updated the commit message too.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


0001-doc-Convert-UUID-functions-list-to-table-format.patch
Description: Binary data


Re: Enhancing Memory Context Statistics Reporting

2025-02-28 Thread Daniel Gustafsson
> On 24 Feb 2025, at 13:46, Rahila Syed  wrote:

> PFA the updated and rebased patches.

Thanks for the rebase, a few mostly superficial comments from a first
read-through.  I'll do some more testing and playing around with it for
functional comments.

+  ...
+  child contexts' statistics, with num_agg_contexts indicating the number
+  of these aggregated child contexts.
The documentation refers to attributes in the return row, but the format of
that row isn't displayed, which makes following along hard.  I think we should
include a table or a programlisting showing the return data before this
paragraph.


+const char *
+AssignContextType(NodeTag type)
This function doesn't actually assign anything so the name is a bit misleading,
it would be better with ContextTypeToString or something similar.


+ * By default, only superusers or users with PG_READ_ALL_STATS are allowed to
This sentence is really long and should probably be broken up.


+ * The shared memory buffer has a limited size - it the process has too many
s/it/if/


+ * If the publishing backend does not respond before the condition variable
+ * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry for max_tries
+ * number of times, which is defined by user, before giving up and
+ * returning previously published statistics, if any.
This comment should mention what happens if the process gives up and there is
no previously published stats.


+   int i;
...
+   for (i = 0; i < memCtxState[procNumber].total_stats; i++)
This can be rewritten as "for (int i = 0; .." since we allow C99.


+* process running and consuming lots of memory, that it might end on its
+* own first and its memory contexts are not logged is not a problem.
This comment is copy/pasted from pg_log_backend_memory_contexts and while it
mostly still apply it should at least be rewritten to not refer to logging as
this function doesn't do that.


+   ereport(WARNING,
+   (errmsg("PID %d is not a PostgreSQL server process",
No need to add the extra parenthesis around errmsg anymore, so I think new code
should omit those.


+  errhint("Use pg_backend_memory_contexts view instead")));
Super nitpick, but errhints should be complete sentences ending with a period.


+* statitics have previously been published by the backend. In which case,
s/statitics/statistics/


+* statitics have previously been published by the backend. In which case,
+* check if statistics are not older than curr_timestamp, if they are wait
I think the sentence around the time check is needlessly confusing, could it be
rewritten into something like:
"A valid DSA pointer isn't proof that statistics are available, it can be
valid due to previously published stats.  Check if the stats are updated by
comparing the timestamp, if the stats are newer than our previously
recorded timestamp from before sending the procsignal they must by
definition be updated."


+   /* Assert for dsa_handle to be valid */
Was this intended to be turned into an Assert call? Else it seems better to 
remove.


+   if (print_location != PRINT_STATS_NONE)
This warrants a comment stating why it makes sense.


+* Do not print the statistics if print_to_stderr is PRINT_STATS_NONE,
s/print_to_stderr/print_location/.  Also, do we really need print_to_stderr in
this function at all?  It seems a bit awkward to combine a boolean and a
paramter for a tri-state value when the parameter holds the tri_state already.
For readability I think just checking print_location will be better since the
value will be clear, where print_to_stderr=false is less clear in a tri-state
scenario.


+ * its ancestors to a list, inorder to compute a path.
s/inorder/in order/


+   elog(LOG, "hash table corrupted, can't construct path value");
+   break;
This will return either a NIL list or a partial path, but PublishMemoryContext
doesn't really take into consideration that it might be so.  Is this really
benign to the point that we can blindly go on?  Also, elog(LOG..) is mostly for
tracing or debugging as elog() isn't intended for user facing errors.


+static void
+compute_num_of_contexts(List *contexts, HTAB *context_id_lookup,
+   int *stats_count, bool get_summary)
This function does a lot more than compute the number of contexts, so the
name seems a bit misleading.  Perhaps a rename to compute_contexts() or
something similar?


+   memctx_info[curr_id].name = dsa_allocate0(area,
+ strlen(clipped_ident) + 1);
These calls can use idlen instead of more strlen() calls, no?  While there is
no performance benefit, it would increase readability IMHO if the code first
calculates a value and then uses it.


+   strncpy(name,
+   clipped_ident, strlen(clipped_ident));
Since clipped_ident has been nul terminated earlier there is no need to use
strncpy; we can instead use strlcpy and take the target buffer size into
consideration rather than the input string.

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-28 Thread Masahiko Sawada
On Fri, Feb 28, 2025 at 1:58 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 28 Feb 2025 11:50:39 -0800,
>   Masahiko Sawada  wrote:
>
> > I initially thought it would be acceptable to stop exposing
> > NextCopyFromRawFields() since NextCopyFrom() could serve as an
> > alternative. For example, the NextCopyFromRawFields() function was
> > originally exposed in commit 8ddc05fb01ee2c primarily to support
> > extension modules like file_fdw, but file_fdw wasn't utilizing this
> > API. I pushed the patch without the above change. Unfortunately, this
> > commit subsequently broke file_text_array_fdw[1] and made BF animal
> > crake unhappy[2].
> >
> > [1] https://github.com/adunstan/file_text_array_fdw
> > [2] 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-02-28%2018%3A47%3A02
>
> Thanks for the try!
>
> > Upon examining file_text_array_fdw more closely, I realized that
> > NextCopyFrom() may not be a suitable replacement for
> > NextCopyFromRawFields() in certain scenarios. Specifically,
> > NextCopyFrom() assumes that the caller has prior knowledge of the
> > source data's column count, making it inadequate for cases where
> > extensions like file_text_array_fdw need to construct an array of
> > source data with an unknown number of columns. In such situations,
> > NextCopyFromRawFields() proves to be more practical. Given these
> > considerations, I'm now leaning towards implementing the proposed
> > change. Thoughts?
>
> You suggest that we re-export NextCopyFromRawFields() (as a
> wrapper of the static inline version) for backward
> compatibility, right? It makes sense. We should keep
> backward compatibility because there is a use case for
> NextCopyFromRawFields().

Yes, I've submitted the patch to re-export that function[1]. Could you
review it?

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoBA414Q76LthY65NJfWbjOxXn1bdFFsD_NBhT2wPUS1SQ%40mail.gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-28 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 28 Feb 2025 11:50:39 -0800,
  Masahiko Sawada  wrote:

> I initially thought it would be acceptable to stop exposing
> NextCopyFromRawFields() since NextCopyFrom() could serve as an
> alternative. For example, the NextCopyFromRawFields() function was
> originally exposed in commit 8ddc05fb01ee2c primarily to support
> extension modules like file_fdw, but file_fdw wasn't utilizing this
> API. I pushed the patch without the above change. Unfortunately, this
> commit subsequently broke file_text_array_fdw[1] and made BF animal
> crake unhappy[2].
>
> [1] https://github.com/adunstan/file_text_array_fdw
> [2] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-02-28%2018%3A47%3A02

Thanks for the try!

> Upon examining file_text_array_fdw more closely, I realized that
> NextCopyFrom() may not be a suitable replacement for
> NextCopyFromRawFields() in certain scenarios. Specifically,
> NextCopyFrom() assumes that the caller has prior knowledge of the
> source data's column count, making it inadequate for cases where
> extensions like file_text_array_fdw need to construct an array of
> source data with an unknown number of columns. In such situations,
> NextCopyFromRawFields() proves to be more practical. Given these
> considerations, I'm now leaning towards implementing the proposed
> change. Thoughts?

You suggest that we re-export NextCopyFromRawFields() (as a
wrapper of the static inline version) for backward
compatibility, right? It makes sense. We should keep
backward compatibility because there is a use case for
NextCopyFromRawFields().


Thanks,
-- 
kou




Re: Space missing from EXPLAIN output

2025-02-28 Thread Fabrízio de Royes Mello
On Fri, Feb 28, 2025 at 4:48 PM Ilia Evdokimov <
ilya.evdoki...@tantorlabs.com> wrote:

>
> On 28.02.2025 21:08, Robert Haas wrote:
> > On Fri, Feb 28, 2025 at 12:12 PM Thom Brown  wrote:
> >> Rebased and attached.
> > Thanks, committed. Sorry for the mistake and thanks for the patch.
> >
>
> Hi hackers,
>
> First of all, sorry about the space issue - that was my oversight.
>
> I also just noticed another documentation mistake on my part regarding
> the fractional rows display [0]. In one place, I forgot to append '.00'.
> I overlooked this because, in my local branch, this change was already
> committed as part of my local previous patches, so it didn't show up in
> the diff.
>
> Apologies for the oversight! I've attached a fix for this on last commit
> 7717f63.
>
> [0]:
>
> https://www.postgresql.org/message-id/40663fc5-edac-4b45-a2aa-a76976700ed9%40tantorlabs.com
>
>
LGTM

-- 
Fabrízio de Royes Mello


Re: Statistics Import and Export: difference in statistics dumped

2025-02-28 Thread Jeff Davis
On Fri, 2025-02-28 at 14:51 +0530, Ashutosh Bapat wrote:
> 2. We aren't restoring the statistics faithfully - as mentioned in
> Greg's reply. If users dump and restore with autovacuum turned off,
> they will be surprised to see that the statistics are different on the
> original and restored database - which may have other effects like
> changes in plans.

Then let's just address that concern directly: disable updating stats
implicitly if autovacuum is off. If autovacuum is on, the user
shouldn't have an expectation of stable stats anyway. Patch attached.

Regards,
Jeff Davis

From a8945b9ce4e358f4a79d3065c07f3b42a94fd387 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 27 Feb 2025 17:06:00 -0800
Subject: [PATCH v2] During CREATE INDEX, don't update stats if autovacuum is
 off.

We previously fixed this for binary upgrade in 71b66171d0, but a
similar problem existed when using pg_dump --no-data without
pg_upgrade involved.

Fix by not implicitly updating stats during create index when
autovacuum is disabled.

Reported-by: Ashutosh Bapat 
Discussion: https://postgr.es/m/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=c+fnxdvfprwv...@mail.gmail.com
---
 src/backend/catalog/index.c| 31 +++--
 src/test/regress/expected/stats_import.out | 39 ++
 src/test/regress/sql/stats_import.sql  | 22 ++--
 3 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f37b990c81d..318a44e1e1d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "optimizer/optimizer.h"
 #include "parser/parser.h"
 #include "pgstat.h"
+#include "postmaster/autovacuum.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -121,6 +122,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
 bool isready);
 static void index_update_stats(Relation rel,
 			   bool hasindex,
+			   bool update_stats,
 			   double reltuples);
 static void IndexCheckExclusion(Relation heapRelation,
 Relation indexRelation,
@@ -1261,6 +1263,17 @@ index_create(Relation heapRelation,
 	}
 	else if ((flags & INDEX_CREATE_SKIP_BUILD) != 0)
 	{
+		bool update_stats = true;
+
+		/*
+		 * If autovacuum is disabled, don't implicitly update stats as a part
+		 * of index creation.
+		 */
+		if (!AutoVacuumingActive() ||
+			(heapRelation->rd_options != NULL &&
+			 !((StdRdOptions *) heapRelation->rd_options)->autovacuum.enabled))
+			update_stats = false;
+
 		/*
 		 * Caller is responsible for filling the index later on.  However,
 		 * we'd better make sure that the heap relation is correctly marked as
@@ -1268,6 +1281,7 @@ index_create(Relation heapRelation,
 		 */
 		index_update_stats(heapRelation,
 		   true,
+		   update_stats,
 		   -1.0);
 		/* Make the above update visible */
 		CommandCounterIncrement();
@@ -2807,9 +2821,9 @@ FormIndexDatum(IndexInfo *indexInfo,
 static void
 index_update_stats(Relation rel,
    bool hasindex,
+   bool update_stats,
    double reltuples)
 {
-	bool		update_stats;
 	BlockNumber relpages = 0;	/* keep compiler quiet */
 	BlockNumber relallvisible = 0;
 	Oid			relid = RelationGetRelid(rel);
@@ -2837,7 +2851,8 @@ index_update_stats(Relation rel,
 	 * Don't update statistics during binary upgrade, because the indexes are
 	 * created before the data is moved into place.
 	 */
-	update_stats = reltuples >= 0 && !IsBinaryUpgrade;
+	if (reltuples < 0 || IsBinaryUpgrade)
+		update_stats = false;
 
 	/*
 	 * Finish I/O and visibility map buffer locks before
@@ -2981,6 +2996,7 @@ index_build(Relation heapRelation,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	bool		update_stats = true;
 
 	/*
 	 * sanity checks
@@ -3121,15 +3137,26 @@ index_build(Relation heapRelation,
 		table_close(pg_index, RowExclusiveLock);
 	}
 
+	/*
+	 * If autovacuum is disabled, don't implicitly update stats as a part
+	 * of index creation.
+	 */
+	if (!AutoVacuumingActive() ||
+		(heapRelation->rd_options != NULL &&
+		 !((StdRdOptions *) heapRelation->rd_options)->autovacuum.enabled))
+		update_stats = false;
+
 	/*
 	 * Update heap and index pg_class rows
 	 */
 	index_update_stats(heapRelation,
 	   true,
+	   update_stats,
 	   stats->heap_tuples);
 
 	index_update_stats(indexRelation,
 	   false,
+	   update_stats,
 	   stats->index_tuples);
 
 	/* Make the updated catalog row versions visible */
diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index 1f150f7b08d..abd391181e4 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -12,15 +12,42 @@ CREATE TABLE stats_import.test(
 arange int4range,
 tags text[]
 ) WITH (autovacuum_enabled = false);
+SELECT
+pg_catalog.pg_restore_relation_stats(
+'relation', 'stats_impor

Re: optimize file transfer in pg_upgrade

2025-02-28 Thread Nathan Bossart
On Fri, Feb 28, 2025 at 03:37:49PM -0500, Robert Haas wrote:
> On Fri, Feb 28, 2025 at 3:01 PM Nathan Bossart  
> wrote:
>> That's exactly where I landed (see v3-0002).  I haven't measured whether
>> transferring relfilenodes or dumping the sequence data is faster for the
>> existing modes, but for now I've left those alone, i.e., they still dump
>> sequence data.  The new "swap" mode just uses the old cluster's sequence
>> files, and I've disallowed using swap mode for upgrades from versions that
>> predate the sequence tuple format change (along with other incompatible changes).
> 
> Ah. Perhaps I should have read the thread more carefully before
> commenting. Sounds good, at any rate.

On the contrary, I'm glad you independently came to the same conclusion.

>> I'll admit I'm a bit concerned that this will cause problems if and when
>> someone wants to change the sequence tuple format again.  But that hasn't
>> happened for a while, AFAIK nobody's planning to change it, and even if it
>> does happen, we just need to have my proposed new mode transfer the
>> sequence files like it transfers the catalog files.  That will make this
>> mode slower, especially if you have a ton of sequences, but maybe it'll
>> still be a win in most cases.  Of course, we probably will need to have
>> pg_upgrade handle other kinds of format changes, too, but IMHO it's still
>> worth trying to speed up pg_upgrade despite the potential future
>> complexities.
> 
> I think it's fine. If somebody comes along and says "hey, when v23
> came out Nathan's feature only sped up pg_upgrade by 2x instead of 3x
> like it did for v22, so Nathan is a bad person," I think we can fairly
> reply "thanks for sharing your opinion, feel free not to use the
> feature and run at 1x speed". There's no rule saying that every
> optimization must always produce the maximum possible benefit in every
> scenario. We're just concerned about regressions, and "only delivers
> some of the speedup if the sequence format has changed on disk" is not
> a regression.

Cool.  I appreciate the design feedback.

-- 
nathan




Re: jsonb_strip_nulls with arrays?

2025-02-28 Thread Florents Tselai


> On 20 Feb 2025, at 12:18 AM, Andrew Dunstan  wrote:
> 
> 
> 
> On 2025-02-19 We 4:23 PM, Florents Tselai wrote:
>> 
>> 
>>> On 18 Jan 2025, at 11:51 AM, Florents Tselai  
>>>  wrote:
>>> 
>>> 
>>> 
 On 8 Jan 2025, at 6:45 PM, Andrew Dunstan  
  wrote:
 
 
 
 On 2024-09-17 Tu 4:53 PM, Florents Tselai wrote:
> 
>> We could, if we're going to do anything at all in this area. Another 
>> possibility would be to provide a second optional parameter for 
>> json{b}_strip_nulls. That's probably a better way to go.
>> 
> Here's a patch that adds that argument (only for jsonb; no json 
> implementation yet)
> 
> 
 
 I think it looks sane. We're not stripping a top level null, which is one 
 thing I looked out for.
 
 I think we need a json implementation as well, though.
 
 
 
>>> Thanks for having a look, Andrew; 
>>> if there aren’t any other objections, I’ll come back with a json 
>>> implementation too. 
>> 
>> Attached is a v2 patch with the missing json implementation. 
>> jsonb one remains the same.
> 
> 
> Please add this to the next Commitfest at 
> https://commitfest.postgresql.org/52/
> 

Added; thanks:
https://commitfest.postgresql.org/patch/5260/
> 
> cheers
> 
> 
> 
> andrew
> 
> 
> 
>> 
>> 
>> 
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com 



Re: Statistics Import and Export

2025-02-28 Thread Nathan Bossart
On Fri, Feb 28, 2025 at 12:54:03PM -0800, Jeff Davis wrote:
> (Aside: I assume everyone here agrees that pg_upgrade should transfer
> the stats by default.)

That feels like a safe assumption to me...  I'm curious what the use-case
is for pg_upgrade's --no-statistics option.

-- 
nathan




Re: Restrict copying of invalidated replication slots

2025-02-28 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 7:26 PM Amit Kapila  wrote:
>
> On Fri, Feb 28, 2025 at 5:10 AM Masahiko Sawada  wrote:
> >
> > On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila  
> > wrote:
> > >
> > > On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with
> > > > > this operation. So, isn't it possible that the source slot exists at
> > > > > the later position in ReplicationSlotCtl->replication_slots and the
> > > > > loop traversing slots is already ahead from the point where the newly
> > > > > copied slot is created?
> > > >
> > > > Good point. I think it's possible.
> > > >
> > > > > If so, the newly created slot won't be marked
> > > > > as invalid whereas the source slot will be marked as invalid. I agree
> > > > > that even in such a case, at a later point, the newly created slot
> > > > > will also be marked as invalid.
> > > >
> > > > The wal_status of the newly created slot would immediately become
> > > > 'lost' and the next checkpoint will invalidate it. Do we need to do
> > > > something to deal with this case?
> > > >
> > >
> > > + /* Check if source slot became invalidated during the copy operation */
> > > + if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
> > > + ereport(ERROR,
> > > + errmsg("cannot copy replication slot \"%s\"",
> > > +NameStr(*src_name)),
> > > + errdetail("The source replication slot was invalidated during the
> > > copy operation."));
> > >
> > > How about adding a similar test as above for MyReplicationSlot? That
> > > should suffice the need because we have already acquired the new slot
> > > by this time and invalidation should signal this process before
> > > marking the new slot as invalid.
> >
> > IIUC in the scenario you mentioned, the loop traversing slots already
> > passed the position of newly created slot in
> > ReplicationSlotCtl->replication_slots array, so
> > MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no?
> >
>
> Right, I don't have a simpler solution for this apart from making it
> somehow serialize this operation with
> InvalidateObsoleteReplicationSlots(). I don't think we need to go that
> far to handle the scenario being discussed.

Agreed.

>  It is better to add a
> comment explaining this race and why it won't cause harm.

+1

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Statistics Import and Export

2025-02-28 Thread Corey Huinker
On Fri, Feb 28, 2025 at 4:25 PM Jeff Davis  wrote:

> On Fri, 2025-02-28 at 14:56 -0600, Nathan Bossart wrote:
> > On Fri, Feb 28, 2025 at 12:54:03PM -0800, Jeff Davis wrote:
> > > (Aside: I assume everyone here agrees that pg_upgrade should
> > > transfer
> > > the stats by default.)
> >
> > That feels like a safe assumption to me...  I'm curious what the use-
> > case
> > is for pg_upgrade's --no-statistics option.
>
> Mostly completeness and paranoia. I don't see a real use case. If we
> decide we don't need it, that's fine with me.
>

Completeness/symmetry and paranoia was how I viewed it. I suppose it might
be useful as a failsafe if a pg_upgrade failed and you needed a way to
retry.


Re: Log connection establishment timings

2025-02-28 Thread Melanie Plageman
On Fri, Feb 28, 2025 at 12:54 AM Bertrand Drouvot
 wrote:
>
> On Thu, Feb 27, 2025 at 05:55:19PM -0500, Melanie Plageman wrote:
> > It still needs polishing (e.g. I need to figure out where to put the new 
> > guc hook
> > functions and enums and such)
>
> yeah, I wonder if it would make sense to create new dedicated 
> "connection_logging"
> file(s).

I think for now there isn't enough for a new file. I think these are
probably okay in postmaster.c since this has to do with postmaster
starting new connections.

> > I'm worried the list of possible connection log messages could get
> > unwieldy if we add many more.
>
> Thinking out loud, I'm not sure we'd add "many more" but maybe what we could 
> do
> is to introduce some predefined groups like:
>
> 'basic' (that would be equivalent to 'received, + timings introduced in 0002')
> 'security' (equivalent to 'authenticated,authorized')
>
> Not saying we need to create this kind of predefined groups now, just an idea
> in case we'd add many more: that could "help" the user experience, or are you
> more concerned about the code maintenance?

I was more worried about the user having to provide very long lists.
But groupings could be a sensible solution in the future.
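
If groups ever happen, the list GUC's check hook could simply expand the
aliases into the same underlying flags, along these lines (a hedged
sketch; the flag names and the alias set are made up):

/* for each comma-separated element "item" of the GUC value: */
if (pg_strcasecmp(item, "basic") == 0)
    flags |= LOG_CONNECTION_RECEIVED | LOG_CONNECTION_TIMINGS;
else if (pg_strcasecmp(item, "security") == 0)
    flags |= LOG_CONNECTION_AUTHENTICATED | LOG_CONNECTION_AUTHORIZED;
else if (pg_strcasecmp(item, "all") == 0)
    flags |= LOG_CONNECTION_ALL;
else
    flags |= lookup_connection_log_option(item);    /* hypothetical */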

> Just did a quick test, (did not look closely at the code), and it looks like
> that:
>
> 'all': does not report the "received" (but does report authenticated and 
> authorized)
> 'received': does not work?
> 'authenticated': works
> 'authorized': works

Thanks for testing! Ouch, there were many bugs in that prototype. My
enums were broken, among other things.

I found some bugs in 0002 as well.  Attached v8 fixes the issues I found so far.

We have to be really careful about when log_connections is actually
set before we use it -- I fixed some issues with that.

I decided to move the creation_time out of ClientSocket and into
BackendStartupData, but that meant I had to save it in the
ConnectionTiming because we don't have access to the
BackendStartupData anymore in PostgresMain(). That means
ConnectionTiming is now a mixture of times and durations. I think that
should be okay (i.e. not too confusing and gross), but I'm not sure.
It is called ConnectionTiming and not ConnectionDuration, after all.

- Melanie
From 0ee1cd306a4c27e5ec0d1860ce7cd219bb761fed Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 27 Feb 2025 17:32:17 -0500
Subject: [PATCH v8 1/2] Modularize log_connections output

Convert the boolean log_connections GUC into a list GUC of the
connection stages to log. This obsoletes log_disconnections, as
'disconnected' is now a log_connections option.

The current log_connections options are 'received', 'authenticated',
'authorized', and 'disconnected'. The empty string disables all
connection logging. 'all' enables all available connection logging.

This gives users more control over the volume of connection logging.

This patch has the same behavior as [1] but was developed independently.
The idea to replace log_disconnections with a log_connections option
does, however, come from that thread.

[1] https://www.postgresql.org/message-id/flat/CAA8Fd-qCB96uwfgMKrzfNs32mqqysi53yZFNVaRNJ6xDthZEgA%40mail.gmail.com

Discussion: https://postgr.es/m/flat/Z8FPpJPDLHWNuV2c%40ip-10-97-1-34.eu-west-3.compute.internal#35525b4425c5f8ecfe52ec6ad859ef9a
---
 doc/src/sgml/config.sgml  | 48 +---
 src/backend/libpq/auth.c  |  8 +-
 src/backend/postmaster/postmaster.c   | 77 ++-
 src/backend/tcop/backend_startup.c|  4 +-
 src/backend/tcop/postgres.c   | 13 +---
 src/backend/utils/init/postinit.c |  2 +-
 src/backend/utils/misc/guc_tables.c   | 29 +++
 src/backend/utils/misc/postgresql.conf.sample |  5 +-
 src/include/postmaster/postmaster.h   | 16 +++-
 src/include/tcop/tcopprot.h   |  1 -
 src/include/utils/guc_hooks.h |  2 +
 .../libpq/t/005_negotiate_encryption.pl   |  3 +-
 src/test/authentication/t/001_password.pl |  2 +-
 src/test/authentication/t/003_peer.pl |  2 +-
 src/test/authentication/t/005_sspi.pl |  2 +-
 src/test/kerberos/t/001_auth.pl   |  2 +-
 src/test/ldap/t/001_auth.pl   |  2 +-
 src/test/ldap/t/002_bindpasswd.pl |  2 +-
 .../t/001_mutated_bindpasswd.pl   |  2 +-
 .../modules/oauth_validator/t/001_server.pl   |  2 +-
 .../modules/oauth_validator/t/002_client.pl   |  2 +-
 .../postmaster/t/002_connection_limits.pl |  2 +-
 src/test/postmaster/t/003_start_stop.pl   |  2 +-
 src/test/recovery/t/013_crash_restart.pl  |  2 +-
 src/test/recovery/t/022_crash_temp_files.pl   |  2 +-
 src/test/recovery/t/032_relfilenode_reuse.pl  |  2 +-
 src/test/recovery/t/037_invalid_database.pl   |  3 +-
 src/test/ssl/t/SSL/Server.pm  |  2 +-
 src/tools/ci/pg_ci_base.conf  |  3 +-
 src/tools/pgindent/typedefs.list 

Re: Should work_mem be stable for a prepared statement?

2025-02-28 Thread Sami Imseih
> The argument for treating work_mem specially is that it has effects at
> both plan time and run time, so that the planner's cost assumptions
> are invalidated if the executor uses a different value than the
> planner did.

I see that now. Thanks!

> Maybe that refactoring is one that would conveniently apply to
> other GUCs, or maybe it isn't.  I'm content to await details
> before arguing about what we "should" do.

Agree.

--
Sami




Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints

2025-02-28 Thread Alvaro Herrera
On 2025-Feb-21, Suraj Kharage wrote:

> Thanks, Alvaro.
> 
> I have revised the patch as per your last update.
> Please find attached v5 for further review.

Hello

I noticed two issues.  One is that we currently allow modifying a
constraint that's defined in our parent, which breaks everything.  We
can only allow a top-level constraint to be modified.  I added a check
in ATExecAlterConstraint() for this.  Please add a test case for this.

The other one is that when we set a constraint to NO INHERIT on a table
with children and grandchildren, we must only mark the directly
inheriting constraints as not having a parent -- we must not recurse to
also do that in the grandchildren!  Otherwise they become disconnected,
which makes no sense.  So we just want to locate the constraint for each
child, modify it by subtracting 1 from coninhcount and setting
conislocal, and we're done.  The whole ATExecSetNotNullNoInherit()
function is based on the wrong premise that this requires recursion.  I
chose to remove it to keep things simple.
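
For reference, the per-child adjustment then boils down to something
like the following fragment.  It uses the usual catalog-update idioms;
the variable names are mine, not the patch's:

	HeapTuple	copytup = heap_copytuple(childContup);
	Form_pg_constraint conform = (Form_pg_constraint) GETSTRUCT(copytup);

	conform->coninhcount--;		/* one fewer parent */
	conform->conislocal = true;	/* keep the constraint defined locally */
	CatalogTupleUpdate(conrel, &copytup->t_self, copytup);
	heap_freetuple(copytup);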

Stylistically, in tablecmds.c we always have the 'List **wqueue'
argument as the first one in functions that take it.  So when adding it
to a function that doesn't have it, don't put it last.

This error message:
-errmsg("ALTER TABLE \"%s\" ALTER CONSTRAINT \"%s\" SET %s only 
supports not null constraint",
-   RelationGetRelationName(rel), cmdcon->conname,
-   cmdcon->noinherit ? "NO INHERIT" : "INHERIT")));
seems a bit excessive.  Looking at other examples, it doesn't look like
we need to cite the complete command in so much detail (if, say, the
user specified a schema-qualified table name in the command, it won't
show up in the error message, and the result would look just weird).
I simplified it.

Please verify that the tests are still working correctly and resubmit.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"E pur si muove" (Galileo Galilei)
>From 3113c30f1e09f084f9f33b4006bed145c52c8573 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Fri, 28 Feb 2025 22:45:59 +0100
Subject: [PATCH] Alvaro's review

---
 doc/src/sgml/ref/alter_table.sgml |   2 +-
 src/backend/commands/tablecmds.c  | 199 +-
 src/backend/parser/gram.y |   4 +-
 src/include/nodes/parsenodes.h|   2 +-
 4 files changed, 60 insertions(+), 147 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index fd02c3ca370..a397cdc0792 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -574,7 +574,7 @@ WITH ( MODULUS numeric_literal, REM
   In addition to changing the inheritability status of the constraint,
   in the case where a non-inheritable constraint is being marked
   inheritable, if the table has children, an equivalent constraint
-  is added to them. If marking an inheritable constraint as
+  will be added to them. If marking an inheritable constraint as
   non-inheritable on a table with children, then the corresponding
   constraint on children will be marked as no longer inherited,
   but not removed.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3b78b980d47..0e3a9d47720 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -392,15 +392,12 @@ static void AlterSeqNamespaces(Relation classRel, 
Relation rel,
 static ObjectAddress ATExecAlterConstraint(List **wqueue, Relation rel,

   ATAlterConstraint *cmdcon,

   bool recurse, LOCKMODE lockmode);
-static bool ATExecAlterConstraintInternal(ATAlterConstraint *cmdcon, Relation 
conrel,
+static bool ATExecAlterConstraintInternal(List **wqueue, ATAlterConstraint 
*cmdcon, Relation conrel,

  Relation tgrel, Relation rel, HeapTuple contuple,
-   
  bool recurse, List **otherrelids, LOCKMODE lockmode,
-   
  List **wqueue);
+   
  bool recurse, List **otherrelids, LOCKMODE lockmode);
 static void AlterConstrTriggerDeferrability(Oid conoid, Relation tgrel, 
Relation rel,

bool deferrable, bool initdeferred,

List **otherrelids);
-static ObjectAddress ATExecSetNotNullNoInherit(Relation rel, char *conName,
-   
   char *colName, LO

Re: [18] CREATE SUBSCRIPTION ... SERVER

2025-02-28 Thread Jeff Davis
On Mon, 2024-12-16 at 20:05 -0800, Jeff Davis wrote:
> On Wed, 2024-10-30 at 08:08 -0700, Jeff Davis wrote:
> 

Rebased v14.

The approach has changed multiple times. It started off with more in-
core code, but in response to review feedback, it has become more
decoupled from core and more coupled to postgres_fdw.

But the patch has been about the same (just rebases) since March of
last year, and hasn't gotten feedback since. I still think it's a nice
feature, but I'd like some feedback on the externals of the feature.

As a note, this will require a version bump for postgres_fdw for the
new connection method.

Regards,
Jeff Davis

From e63b42acfb4d4d8241b4453520a7fe52195c0f99 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 2 Jan 2024 13:42:48 -0800
Subject: [PATCH v14] CREATE SUBSCRIPTION ... SERVER.

Allow specifying a foreign server for CREATE SUBSCRIPTION, rather than
a raw connection string with CONNECTION.

Using a foreign server as a layer of indirection improves management
of multiple subscriptions to the same server. It also provides
integration with user mappings in case different subscriptions have
different owners or a subscription changes owners.

Discussion: https://postgr.es/m/61831790a0a937038f78ce09f8dd4cef7de7456a.ca...@j-davis.com
Reviewed-by: Ashutosh Bapat
---
 contrib/postgres_fdw/Makefile |   2 +
 contrib/postgres_fdw/connection.c |  73 
 .../postgres_fdw/expected/postgres_fdw.out|   8 +
 contrib/postgres_fdw/meson.build  |   1 +
 .../postgres_fdw/postgres_fdw--1.1--1.2.sql   |   8 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |   7 +
 contrib/postgres_fdw/t/010_subscription.pl|  71 
 doc/src/sgml/ref/alter_subscription.sgml  |  18 +-
 doc/src/sgml/ref/create_subscription.sgml |  11 +-
 src/backend/catalog/pg_subscription.c |  38 +++-
 src/backend/commands/foreigncmds.c|  58 +-
 src/backend/commands/subscriptioncmds.c   | 168 --
 src/backend/foreign/foreign.c |  66 +++
 src/backend/parser/gram.y |  22 +++
 src/backend/replication/logical/worker.c  |  16 +-
 src/bin/pg_dump/pg_dump.c |  35 +++-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/tab-complete.in.c|   2 +-
 src/include/catalog/pg_foreign_data_wrapper.h |   3 +
 src/include/catalog/pg_subscription.h |   7 +-
 src/include/foreign/foreign.h |   3 +
 src/include/nodes/parsenodes.h|   3 +
 src/test/regress/expected/oidjoins.out|   1 +
 23 files changed, 587 insertions(+), 35 deletions(-)
 create mode 100644 contrib/postgres_fdw/t/010_subscription.pl

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index adfbd2ef758..59b805656c1 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -19,6 +19,8 @@ DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql postgres_fdw--1.1--1.2.s
 REGRESS = postgres_fdw query_cancel
 TAP_TESTS = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8a8d3b4481f..961368a919a 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -131,6 +131,7 @@ PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
 PG_FUNCTION_INFO_V1(postgres_fdw_get_connections_1_2);
 PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
 PG_FUNCTION_INFO_V1(postgres_fdw_disconnect_all);
+PG_FUNCTION_INFO_V1(postgres_fdw_connection);
 
 /* prototypes of private functions */
 static void make_new_connection(ConnCacheEntry *entry, UserMapping *user);
@@ -2279,6 +2280,78 @@ postgres_fdw_get_connections_internal(FunctionCallInfo fcinfo,
 	}
 }
 
+/*
+ * Values in connection strings must be enclosed in single quotes. Single
+ * quotes and backslashes must be escaped with backslash. NB: these rules are
+ * different from the rules for escaping a SQL literal.
+ */
+static void
+appendEscapedValue(StringInfo str, const char *val)
+{
+	appendStringInfoChar(str, '\'');
+	for (int i = 0; val[i] != '\0'; i++)
+	{
+		if (val[i] == '\\' || val[i] == '\'')
+			appendStringInfoChar(str, '\\');
+		appendStringInfoChar(str, val[i]);
+	}
+	appendStringInfoChar(str, '\'');
+}
+
+Datum
+postgres_fdw_connection(PG_FUNCTION_ARGS)
+{
+	Oid			userid = PG_GETARG_OID(0);
+	Oid			serverid = PG_GETARG_OID(1);
+	ForeignServer *server = GetForeignServer(serverid);
+	UserMapping *user = GetUserMapping(userid, serverid);
+	StringInfoData str;
+	const char **keywords;
+	const char **values;
+	int			n;
+
+	/*
+	 * Construct connection params from generic options of ForeignServer and
+	 * UserMapping.  (Some of them might not be libpq options, in which case
+	 * we'll just waste a few array slots.)  Add 4 extra slots for
+	 * application_name, fallback_application_name, client_encoding, 

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-28 Thread Jacob Champion
On Fri, Feb 28, 2025 at 9:37 AM Jacob Champion
 wrote:
> Just from inspection, that looks good to me. I'll look into running
> the new BSD tasks on the other patches I posted above, too.

After your patch gets us past the zero timeout bug, NetBSD next runs into

failed to fetch OpenID discovery document: Unsupported protocol
(libcurl: Received HTTP/0.9 when not allowed)'

...but only for a single test (nonempty oauth_client_secret), which is
very strange. And it doesn't always fail during the same HTTP request.
I'll look into it.

--Jacob




Re: Space missing from EXPLAIN output

2025-02-28 Thread Ilia Evdokimov


On 28.02.2025 21:08, Robert Haas wrote:

On Fri, Feb 28, 2025 at 12:12 PM Thom Brown  wrote:

Rebased and attached.

Thanks, committed. Sorry for the mistake and thanks for the patch.



Hi hackers,

First of all, sorry about the space issue - that was my oversight.

I also just noticed another documentation mistake on my part regarding 
the fractional rows display [0]. In one place, I forgot to append '.00'. 
I overlooked this because, in my local branch, this change was already 
committed as part of my previous local patches, so it didn't show up in 
the diff.


Apologies for the oversight! I've attached a fix for this on top of 
commit 7717f63.


[0]: 
https://www.postgresql.org/message-id/40663fc5-edac-4b45-a2aa-a76976700ed9%40tantorlabs.com


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index be4b49f62b..91feb59abd 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -730,7 +730,7 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
  ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.36 rows=10 width=0) (actual time=0.004..0.004 rows=10.00 loops=1)
Index Cond: (unique1 < 10)
Buffers: shared hit=2
-   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1 loops=10)
+   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1.00 loops=10)
  Index Cond: (unique2 = t1.unique2)
  Buffers: shared hit=24 read=6
  Planning:


making EXPLAIN extensible

2025-02-28 Thread Robert Haas
Prior to PostgreSQL 10, EXPLAIN had just 2 options: ANALYZE and
VERBOSE. Now, we're up to 12 options, which is already quite a lot,
and there are plenty more things that somebody might like to do.
However, not all of those things necessarily need to be part of the
core code. My original reason for wanting to extend EXPLAIN was that I
was thinking about an extension that would want to do a bunch of
things and one of those things would be to add some information to the
EXPLAIN output. It wouldn't make sense for core to have an EXPLAIN
option whose whole purpose is to cater to the needs of some extension,
so that made me think of providing some extensibility infrastructure.

However, there are other use cases, too, basically any of the normal
reasons why extensibility is useful and desirable. You might need to
get some information out of a query plan that 99% of people don't care
about. You could come up with your own way of formatting a query plan,
but that's a big pain. It's a lot nicer if you can just add the detail
that you care about to the EXPLAIN output without needing to modify
PostgreSQL itself. Even if you think of something that really ought to
be included in the EXPLAIN output by PostgreSQL, you can roll an
extension out much quicker than you can get a change upstreamed and
released. So I think EXPLAIN extensibility is, as a general concept,
useful.

So here are some patches.

0001 allows a loadable module to register new EXPLAIN options.
Currently, EXPLAIN (FUNGUS) will error out, but if you want to make it
work, this patch is for you. This patch also allows you to stash some
state related to your new option, or options, in the ExplainState.
Core options have hard-coded structure members; e.g. EXPLAIN (BUFFERS)
sets es->buffers. If you add EXPLAIN (FUNGUS), there won't be an
es->fungus, but you can get about the same effect using the new
facilities provided here.
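
To make that concrete, registration from an extension might look
something like the sketch below. The function names
(RegisterExtensionExplainOption and friends) are my shorthand for what
the patch provides, so check the actual patch for the real API:

static int	fungus_extension_id;

static void
fungus_option_handler(ExplainState *es, DefElem *opt, ParseState *pstate)
{
	/* remember that EXPLAIN (FUNGUS) was requested for this ExplainState */
	SetExplainExtensionState(es, fungus_extension_id, (void *) 1);
}

void
_PG_init(void)
{
	fungus_extension_id = GetExplainExtensionId("fungus");
	RegisterExtensionExplainOption("fungus", fungus_option_handler);
}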

0002 provides hooks that you can use to make your new EXPLAIN options
actually do something. In particular, this adds a new hook that is
called once per PlanState node, and a new hook that is called once per
PlannedStmt. Each is called at an appropriate point for you to tack on
more output after what EXPLAIN would already produce.
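
As a sketch, a per-node hook installed by the same extension could look
like this -- again, the hook name and signature are placeholders
inferred from the description, not a statement of the actual API:

static explain_per_node_hook_type prev_per_node_hook = NULL;

static void
fungus_per_node_hook(PlanState *planstate, List *ancestors,
					 const char *relationship, const char *plan_name,
					 ExplainState *es)
{
	if (prev_per_node_hook)
		prev_per_node_hook(planstate, ancestors, relationship,
						   plan_name, es);

	/* tack one extra property onto every plan node */
	if (GetExplainExtensionState(es, fungus_extension_id))
		ExplainPropertyText("Fungus", "present", es);
}

The hook would be installed with the usual save-previous-and-chain
dance in _PG_init().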

0003 adds a new contrib module called pg_overexplain, which adds
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE). I actually think this is
quite useful for planner hacking, and maybe a few more options would
be, too. Right now, if you want to see stuff that EXPLAIN doesn't
clearly show, you have to use SET debug_print_plan = true, and that
output is so verbose that finding the parts you actually want to see
is quite difficult. Assuming it gives you the details you need,
EXPLAIN (RANGE_TABLE) looks way, way better to me, and if we end up
committing these patches I anticipate using this semi-regularly.

There are plenty of debatable things in this patch set, and I mention
some of them in the commit messages. The hook design in 0002 is a bit
simplistic and could be made more complex; there's lots of stuff that
could be added to or removed from 0003, much of which comes down to
what somebody hacking on the planner would actually want to see. I'm
happy to bikeshed all of that stuff; this is all quite preliminary and
I'm not committed to the details. The only thing that would disappoint
me is if somebody said "this whole idea of making EXPLAIN extensible
is stupid and pointless and we shouldn't ever do it." I will argue
against that vociferously. I think even what I have here is enough to
disprove that hypothesis, but I have a bunch of ideas about how to do
more. Some of those require additional infrastructure and are best
proposed with that other infrastructure; some can be done with just
this, but I ran out of time to code up examples so here is what I have
got so far.

Hope you like it, sorry if you don't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0003-pg_overexplain-Additional-EXPLAIN-options-for-deb.patch
Description: Binary data


v1-0002-Add-some-new-hooks-so-extensions-can-add-details-.patch
Description: Binary data


v1-0001-Make-it-possible-for-loadable-modules-to-add-EXPL.patch
Description: Binary data


Re: optimize file transfer in pg_upgrade

2025-02-28 Thread Nathan Bossart
On Fri, Feb 28, 2025 at 02:41:22PM -0500, Robert Haas wrote:
> On Fri, Feb 28, 2025 at 2:40 PM Robert Haas  wrote:
>> Maybe we should rethink the decision not to transfer relfilenodes for
>> sequences. Or have more than one way to do it. pg_upgrade
>> --binary-upgrade --binary-upgrade-even-for-sequences, or whatever.
> 
> Sorry, I meant: pg_dump --binary-upgrade --binary-upgrade-even-for-sequences
> 
> i.e. pg_upgrade could decide which way to ask pg_dump to do it,
> depending on versions and flags.

That's exactly where I landed (see v3-0002).  I haven't measured whether
transferring relfilenodes or dumping the sequence data is faster for the
existing modes, but for now I've left those alone, i.e., they still dump
sequence data.  The new "swap" mode just uses the old cluster's sequence
files, and I've disallowed using swap mode for upgrades from 

Re: Should work_mem be stable for a prepared statement?

2025-02-28 Thread Sami Imseih
> It sounds like the behavior change would be desirable or at least
> neutral. I will have to try it out and see if the refactoring is a net
> improvement or turns into a mess.

I think this is a good operational improvement, particularly if
someone wants to change work_mem in a pinch; the only option they
have now is to somehow get the application to re-prepare, by
deallocating all prepared statements or by reconnecting.
This is even worse with extended-query-protocol prepared statements,
for which there is no visibility in pg_prepared_statements. So one may
be forced to use DEALLOCATE ALL.

However, I think any GUC that can influence the planner
should be considered, for consistency of behavior.
The enable_* GUCs were mentioned above, but another
one I can think of is max_parallel_workers_per_gather, which
should likewise force a re-plan if changed. I have seen users need to turn
that off in a hurry when it impacts their OLTP workload.
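
As a rough sketch of what a consistent rule could look like in
plancache terms -- the struct and helper below are invented for
illustration, only the two GUC variables are real:

typedef struct CachedPlanGucs
{
	int			work_mem;
	int			max_parallel_workers_per_gather;
} CachedPlanGucs;

static bool
plan_gucs_changed(const CachedPlanGucs *saved)
{
	/* re-plan whenever any planner-affecting GUC moved since plan time */
	return saved->work_mem != work_mem ||
		saved->max_parallel_workers_per_gather !=
		max_parallel_workers_per_gather;
}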


--
Sami Imseih
Amazon Web Services (AWS)




Re: Update docs for UUID data type

2025-02-28 Thread Andy Alsup
Masahiko,

I like the change you've made.

Thanks,
Andy Alsup

On Fri, Feb 28, 2025 at 2:05 PM Masahiko Sawada 
wrote:

> On Thu, Feb 27, 2025 at 5:50 PM Andy Alsup  wrote:
> >
> > Masahiko,
> >
> > I have combined the gen_random_uuid() and uuidv4() into a single row, as
> you suggested. Please find the v5 patch, which has been squashed into a
> single commit.
>
> Thank you for updating the patch!
>
> I like that the patch adds the reference to the uuid data type. But I
> think we might want to adjust terminology:
>
> +  
> +   See  for how details on the UUID
> datatype in
> +PostgreSQL.
> +  
>
> In the 9.14. UUID Functions section, we use the word 'UUID' for data
> generated by the algorithms defined in RFC 9562, whereas we use
> uuid (i.e., uuid in func.sgml) to refer to a PostgreSQL data type.
> IIUC, by 'UUID datatype' in the above change you mean the
> latter, PostgreSQL's uuid data type. Is that correct? If so, how about
> the following change?
>
>See  for details on the data type
>uuid in PostgreSQL.
>
> I've attached the updated patch that incorporates the above change,
> and updated the commit message too.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>


Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-28 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 28 Feb 2025 14:00:18 -0800,
  Masahiko Sawada  wrote:

> Yes, I've submitted the patch to re-export that function[1]. Could you
> review it?
> 
> [1] 
> https://www.postgresql.org/message-id/CAD21AoBA414Q76LthY65NJfWbjOxXn1bdFFsD_NBhT2wPUS1SQ%40mail.gmail.com

Sure! Done!

https://www.postgresql.org/message-id/20250301.071641.1257013931056303227.kou%40clear-code.com


Thanks,
-- 
kou




Re: making EXPLAIN extensible

2025-02-28 Thread Isaac Morland
On Fri, 28 Feb 2025 at 15:09, Thom Brown  wrote:


> One thing I am wondering is whether extensions should be required to
> prefix their EXPLAIN option with the extension name to avoid
> collisions.
>
> If two extensions happen to choose the same name, it won't be possible
> to use both simultaneously.


Could the call that processes the registration automatically prepend the
extension name to the supplied explain option name? So if extension X
registers option O it would be registered as X_O rather than returning an
error if O doesn't follow the proper pattern.
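
A sketch of that idea, assuming a registration call shaped like the one
in Robert's 0001 patch (both function names below are placeholders):

void
RegisterExtensionExplainOption(const char *extname, const char *optname,
							   ExplainOptionHandler handler)
{
	/* register under "extname_optname" so two extensions cannot collide */
	char	   *qualified = psprintf("%s_%s", extname, optname);

	register_explain_option_internal(qualified, handler);
}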


Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-28 Thread wenhui qiu
Hi
> + * It could be the stats were updated manually and relallfrozen >
> + * relpages. Clamp relallfrozen to relpages to avoid nonsensical
> + * calculations.
> + */
> + relallfrozen = Min(relallfrozen, relpages);
> + pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
> + }
> +
Based on the comments, the pcnt_unfrozen value could potentially be 0,
which would indicate that everything is frozen. Therefore, is it necessary
to handle the case where the value is 0?





On Sat, Mar 1, 2025 at 1:54 AM Nathan Bossart 
wrote:

> On Wed, Feb 26, 2025 at 04:48:20PM -0500, Melanie Plageman wrote:
> > Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
> > wording and is rebased.
>
> 0001 LGTM.
>
> >  
> > - Specifies a fraction of the table size to add to
> > - autovacuum_vacuum_insert_threshold
> > - when deciding whether to trigger a VACUUM.
> > - The default is 0.2 (20% of table size).
> > - This parameter can only be set in the
> postgresql.conf
> > - file or on the server command line;
> > - but the setting can be overridden for individual tables by
> > - changing table storage parameters.
> > +Specifies a fraction of the active (unfrozen) table size to add
> to
> > +autovacuum_vacuum_insert_threshold
> > +when deciding whether to trigger a VACUUM.
> > +The default is 0.2 (20% of active table
> size).
> > +This parameter can only be set in the
> postgresql.conf
> > +file or on the server command line;
> > +but the setting can be overridden for individual tables by
> > +changing table storage parameters.
> >  
>
> nitpick: There might be an unintentional indentation change here.
>
> I'm wondering about the use of the word "active," too.  While it's
> qualified by the "(unfrozen)" after it, I'm worried it might not be
> descriptive enough.  For example, I might consider a frozen page that's in
> the buffer cache and is being read by queries to be "active."  And it
> doesn't seem clear to me that it's referring to unfrozen pages and not
> unfrozen tuples.  Perhaps we should say something like "a fraction of the
> unfrozen pages in the table to add...".
>
> > + /*
> > +  * If we have data for relallfrozen, calculate the
> unfrozen percentage
> > +  * of the table to modify insert scale factor. This helps
> us decide
> > +  * whether or not to vacuum an insert-heavy table based on
> the number
> > +  * of inserts to the "active" part of the table.
> > +  */
> > + if (relpages > 0 && relallfrozen > 0)
>
> So, if we don't have this data, we just use reltuples, which is the
> existing behavior and should trigger vacuums less aggressively than if we
> _did_ have the data.  That seems like the correct choice to me.
>
> > + /*
> > +  * It could be the stats were updated manually and
> relallfrozen >
> > +  * relpages. Clamp relallfrozen to relpages to
> avoid nonsensical
> > +  * calculations.
> > +  */
> > + relallfrozen = Min(relallfrozen, relpages);
> > + pcnt_unfrozen = 1 - ((float4) relallfrozen /
> relpages);
>
> Makes sense.
>
> --
> nathan
>
>
>


Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-02-28 Thread Jacob Champion
On Fri, Feb 14, 2025 at 5:34 PM Jacob Champion
 wrote:
> (But it doesn't
> seem like we're going to agree on this for now; in the meantime I'll
> prepare a version of the patch that only calls
> pgstat_bestart_security() once.)

v9 removes the first call, and moves the second (now only) call up and
out of the if/else chain, just past client authentication. The SSL
pre-auth tests have been removed.

Thanks!
--Jacob
1:  81a61854bdf ! 1:  d8de39bc076 pgstat: report in earlier with STATE_STARTING
@@ Commit message
 
 2) pgstat_bestart_security() reports the SSL/GSS status of the
connection.  Some backends don't call this at all; others call it
-   twice, once after transport establishment and once after client
-   authentication.
+   after client authentication.
 
 3) pgstat_bestart_final() reports the user and database IDs, takes the
entry out of STATE_STARTING, and reports the application_name.
@@ src/backend/utils/activity/backend_status.c: pgstat_bestart(void)
 +}
 +
 +/*
-+ * Fill in SSL and GSS information for the pgstat entry. This is separate 
from
-+ * pgstat_bestart_initial() so that backends may call it multiple times as
-+ * security details are filled in.
++ * Fill in SSL and GSS information for the pgstat entry.
 + *
 + * This should only be called from backends with a MyProcPort.
 + */
@@ src/backend/utils/init/postinit.c: InitPostgres(const char *in_dbname, 
Oid dboid
 +  if (!bootstrap)
 +  {
 +  pgstat_bestart_initial();
-+  if (MyProcPort)
-+  pgstat_bestart_security();  /* fill in any SSL/GSS 
info too */
-+
 +  INJECTION_POINT("init-pre-auth");
 +  }
 +
@@ src/backend/utils/init/postinit.c: InitPostgres(const char *in_dbname, 
Oid dboid
InitializeSessionUserId(username, useroid, false);
/* ensure that auth_method is actually valid, aka authn_id is 
not NULL */
 @@ src/backend/utils/init/postinit.c: InitPostgres(const char *in_dbname, 
Oid dboid,
-   InitializeSystemUser(MyClientConnectionInfo.authn_id,
-
hba_authname(MyClientConnectionInfo.auth_method));
am_superuser = superuser();
-+
-+  /*
-+   * Authentication may have changed SSL/GSS details for the 
session, so
-+   * report it again.
-+   */
-+  pgstat_bestart_security();
}
  
++  /* Report any SSL/GSS details for the session. */
++  if (MyProcPort != NULL)
++  {
++  Assert(!bootstrap);
++
++  pgstat_bestart_security();
++  }
++
/*
+* Binary upgrades only allowed super-user connections
+*/
 @@ src/backend/utils/init/postinit.c: InitPostgres(const char *in_dbname, 
Oid dboid,
/* initialize client encoding */
InitializeClientEncoding();
@@ src/test/authentication/t/007_pre_auth.pl (new)
 +$conn->quit();
 +
 +done_testing();
-
- ## src/test/ssl/Makefile ##
-@@
- #-
- 
- EXTRA_INSTALL = contrib/sslinfo
-+EXTRA_INSTALL += src/test/modules/injection_points
- 
- subdir = src/test/ssl
- top_builddir = ../../..
- include $(top_builddir)/src/Makefile.global
- 
--export OPENSSL with_ssl
-+export OPENSSL enable_injection_points with_ssl
- 
- # The sslfiles targets are separated into their own file due to 
interactions
- # with settings in Makefile.global.
-
- ## src/test/ssl/meson.build ##
-@@ src/test/ssl/meson.build: tests += {
-   'bd': meson.current_build_dir(),
-   'tap': {
- 'env': {
-+  'enable_injection_points': get_option('injection_points') ? 'yes' : 
'no',
-   'with_ssl': ssl_library,
-   'OPENSSL': openssl.found() ? openssl.path() : '',
- },
-
- ## src/test/ssl/t/001_ssltests.pl ##
-@@ src/test/ssl/t/001_ssltests.pl: use Config qw ( %Config );
- use PostgreSQL::Test::Cluster;
- use PostgreSQL::Test::Utils;
- use Test::More;
-+use Time::HiRes qw(usleep);
- 
- use FindBin;
- use lib $FindBin::RealBin;
-@@ src/test/ssl/t/001_ssltests.pl: $node->start;
- my $result = $node->safe_psql('postgres', "SHOW ssl_library");
- is($result, $ssl_server->ssl_library(), 'ssl_library parameter');
- 
-+my $injection_points_unavailable = '';
-+if ($ENV{enable_injection_points} ne 'yes')
-+{
-+  $injection_points_unavailable =
-+'Injection points not supported by this build';
-+}
-+elsif (!$node->check_extension('injection_points'))
-+{
-+  $injection_points_unavailable =
-+'Extension injection_points not installed';
-

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-28 Thread Sutou Kouhei
Hi,

Our 0001/0002 patches were merged into master. I've rebased
on master. Can we discuss how to proceed with the rest of the patches?

Their contents are unchanged, but here is a summary of them again:

0001-0003 are for COPY TO and 0004-0007 are for COPY FROM.

For COPY TO:

0001: Add support for adding a custom COPY TO format. This
uses a tablesample-like handler approach. We've discussed
other approaches, such as a USING+CREATE XXX approach, but
they seem to be overkill for this case.

See also: 
https://www.postgresql.org/message-id/flat/d838025aceeb19c9ff1db702fa55cabf%40postgrespro.ru#caca2799effc859f82f40ee8bec531d8

0002: Export CopyToStateData to implement custom COPY TO
format as extension.

0003: Export a function and add a private space to
CopyToStateData to implement custom COPY TO format as
extension.

We may want to squash 0002 and 0003, but keeping them split
makes review easier, because 0002 just moves existing code
(with some renaming) and 0003 just adds new code. If we
squash 0002 and 0003, the moves and additions are mixed.

For COPY FROM:

0004: This is COPY FROM version of 0001.

0005: 0002 changes the COPY_ prefix to COPY_DEST_ for
enum CopyDest. This is the corresponding change for enum CopySource.

0006: This is COPY FROM version of 0003.

0007: This makes it easy to implement "ON_ERROR stop" and
"LOG_VERBOSITY verbose" in an extension.

We may want to squash 0005-0007, as with 0002-0003.
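
For readers catching up on the thread, the handler approach in 0001
amounts to the extension providing a function shaped roughly like the
one below. This is a simplified sketch; the exact CopyToRoutine
callback names should be checked against the latest patch:

PG_FUNCTION_INFO_V1(my_copy_to_handler);

Datum
my_copy_to_handler(PG_FUNCTION_ARGS)
{
	bool		is_from = PG_GETARG_BOOL(0);
	CopyToRoutine *routine;

	if (is_from)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("COPY FROM is not supported by this handler")));

	routine = makeNode(CopyToRoutine);
	routine->CopyToStart = my_copy_to_start;	/* emit any header */
	routine->CopyToOneRow = my_copy_to_one_row;	/* emit one row */
	routine->CopyToEnd = my_copy_to_end;		/* emit any trailer */

	PG_RETURN_POINTER(routine);
}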


Thanks,
-- 
kou
>From 5ccc5d1a54d0f6c7c47381533c879a9432fb925f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v35 1/7] Add support for adding custom COPY TO format

This uses a handler approach like tablesample's. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.

This also adds a test module for a custom COPY TO handler.
---
 src/backend/commands/copy.c   | 79 ---
 src/backend/commands/copyto.c | 36 +++--
 src/backend/nodes/Makefile|  1 +
 src/backend/nodes/gen_node_support.pl |  2 +
 src/backend/utils/adt/pseudotypes.c   |  1 +
 src/include/catalog/pg_proc.dat   |  6 ++
 src/include/catalog/pg_type.dat   |  6 ++
 src/include/commands/copy.h   |  1 +
 src/include/commands/copyapi.h|  2 +
 src/include/nodes/meson.build |  1 +
 src/test/modules/Makefile |  1 +
 src/test/modules/meson.build  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile| 23 ++
 .../expected/test_copy_format.out | 17 
 src/test/modules/test_copy_format/meson.build | 33 
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql |  8 ++
 .../test_copy_format/test_copy_format.c   | 63 +++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 273 insertions(+), 21 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..8d94bc313eb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
 	return COPY_LOG_VERBOSITY_DEFAULT;	/* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+		CopyFormatOptions *opts_out,
+		bool is_from,
+		DefElem *defel)
+{
+	char	   *format;
+	Oid			funcargtypes[1];
+	Oid			handlerOid = InvalidOid;
+
+	format = defGetString(defel);
+
+	opts_out->csv_mode = false;
+	opts_out->binary = false;
+	/* b

Re: Orphaned users in PG16 and above can only be managed by Superusers

2025-02-28 Thread Ashutosh Sharma
Added a commitfest entry for this here:

https://commitfest.postgresql.org/patch/5608/

--
With Regards,
Ashutosh Sharma.

On Tue, Feb 18, 2025 at 2:54 PM Ashutosh Sharma  wrote:
>
> Hi Robert,
>
> On Tue, Feb 11, 2025 at 9:48 PM Ashutosh Sharma  wrote:
> >
> > Hi Robert,
> >
> > On Tue, Feb 4, 2025 at 10:54 PM Robert Haas  wrote:
> > >
> > > On Thu, Jan 30, 2025 at 8:45 AM Ashutosh Sharma  
> > > wrote:
> > > > Imagine a superuser creates role u1. Since the superuser is creating
> > > > u1, it won't have membership in any role. Now, suppose u1 creates a
> > > > new role, u2. In this case, u1 automatically becomes a member of u2
> > > > with the admin option. However, at this point, there is no dependency
> > > > between u1 and u2, meaning that dropping u2 shouldn't impact u1. Now,
> > > > if u2 creates yet another role, u3, that's when u1 will start
> > > > depending on u2. This is because if u2 were dropped, u1 would lose the
> > > > ability to administer u3. At this stage, a dependency between u1 and
> > > > u2 is recorded.
> > >
> > > This seems to me to be assuming that who can administer which roles
> > > won't change, but it can. The superuser is free to grant the ability
> > > to administer u3 to some new user u4 or an existing user such as u1,
> > > and is also free to remove that ability from u2.
> > >
> >
> > You're right, I'll fix that in the next patch.
> >
> > > I think if you want to legislate that attempting to drop u2 fails, you
> > > have to do that by testing what the situation is at the time of the
> > > drop command, not by adding dependencies in advance.
> >
> > I agree, and I will give this some thought to see if we can find a way
> > forward along these lines.
> >
>
> Attached is a patch that checks for role dependencies when the DROP
> ROLE command is executed. If dependencies are found, the command is
> prevented from succeeding. Please review the attached patch and share
> your feedback. Thanks!
>
> --
> With Regards,
> Ashutosh Sharma.




Re: Introduce "log_connection_stages" setting.

2025-02-28 Thread Sergey Dudoladov
Hello hackers,

here is the new rebased version of the patch.

Regards,
Sergey
From 8a695aacfd9590737a6e10aca8ddf33181806937 Mon Sep 17 00:00:00 2001
From: Sergey Dudoladov 
Date: Fri, 28 Feb 2025 10:33:11 +0100
Subject: [PATCH] Introduce log_connection_messages

This patch removes 'log_connections' and 'log_disconnections'
in favor of 'log_connection_messages', thereby introducing an
incompatibility.

Author: Sergey Dudoladov

Reviewed-by: Justin Pryzby, Jacob Champion

Discussion:
https://www.postgresql.org/message-id/flat/CAA8Fd-qCB96uwfgMKrzfNs32mqqysi53yZFNVaRNJ6xDthZEgA%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 90 +--
 src/backend/commands/variable.c   | 81 +
 src/backend/libpq/auth.c  |  9 +-
 src/backend/postmaster/postmaster.c   |  1 -
 src/backend/tcop/backend_startup.c|  4 +-
 src/backend/tcop/postgres.c   | 11 ++-
 src/backend/utils/init/postinit.c |  2 +-
 src/backend/utils/misc/guc_tables.c   | 29 +++---
 src/backend/utils/misc/postgresql.conf.sample |  5 +-
 src/include/postgres.h|  9 ++
 src/include/postmaster/postmaster.h   |  1 -
 src/include/tcop/tcopprot.h   |  1 -
 src/include/utils/guc_hooks.h |  2 +
 .../libpq/t/005_negotiate_encryption.pl   |  3 +-
 src/test/authentication/t/001_password.pl |  2 +-
 src/test/authentication/t/003_peer.pl |  2 +-
 src/test/authentication/t/005_sspi.pl |  2 +-
 src/test/kerberos/t/001_auth.pl   |  2 +-
 src/test/ldap/t/001_auth.pl   |  2 +-
 src/test/ldap/t/002_bindpasswd.pl |  2 +-
 .../t/001_mutated_bindpasswd.pl   |  2 +-
 .../modules/oauth_validator/t/001_server.pl   |  2 +-
 .../modules/oauth_validator/t/002_client.pl   |  2 +-
 .../postmaster/t/002_connection_limits.pl |  2 +-
 src/test/postmaster/t/003_start_stop.pl   |  2 +-
 src/test/recovery/t/013_crash_restart.pl  |  2 +-
 src/test/recovery/t/022_crash_temp_files.pl   |  2 +-
 src/test/recovery/t/032_relfilenode_reuse.pl  |  2 +-
 src/test/recovery/t/037_invalid_database.pl   |  3 +-
 src/test/ssl/t/SSL/Server.pm  |  2 +-
 src/tools/ci/pg_ci_base.conf  |  3 +-
 31 files changed, 197 insertions(+), 87 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e55700f35b..0428733463 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -140,7 +140,7 @@
  An example of what this file might look like is:
 
 # This is a comment
-log_connections = yes
+log_connection_messages = all
 log_destination = 'syslog'
 search_path = '"$user", public'
 shared_buffers = 128MB
@@ -337,7 +337,7 @@ UPDATE pg_settings SET setting = reset_val WHERE name = 'configuration_parameter
-c name=value command-line parameter, or its equivalent
--name=value variation.  For example,
 
-postgres -c log_connections=yes --log-destination='syslog'
+postgres -c log_connection_messages=all --log-destination='syslog'
 
Settings provided in this way override those set via
postgresql.conf or ALTER SYSTEM,
@@ -7314,23 +7314,75 @@ local0.*/var/log/postgresql
   
  
 
- 
-  log_connections (boolean)
+ 
+  log_connection_messages (string)
   
-   log_connections configuration parameter
+   log_connection_messages configuration parameter
   
   
   

-Causes each attempted connection to the server to be logged,
-as well as successful completion of both client authentication (if
-necessary) and authorization.
+Causes the specified stages of each connection attempt to be logged.
+Example: authorized,disconnected.
+The default is the empty string, which means that nothing is logged.
 Only superusers and users with the appropriate SET
 privilege can change this parameter at session start,
 and it cannot be changed at all within a session.
-The default is off.

 
+
+ Connection messages
+ 
+  
+  
+  
+   
+Name
+Description
+   
+  
+  
+   
+received
+Logs receipt of a connection. At this point a connection has been
+received, but no further work has been done: PostgreSQL is about to start
+interacting with a client.
+   
+
+   
+authenticated
+Logs the original identity used by an authentication method
+to identify a user. In most cases, the identity string matches the
+PostgreSQL username, but some third-party authentication methods may
+alter the original user identifier before the server stores it. Failed
+authentication is always logged regardless of the value of this setting.
+
+   
+
+   
+authorized
+Logs succe

Re: Statistics Import and Export: difference in statistics dumped

2025-02-28 Thread Ashutosh Bapat
Hi Jeff,
I am changing the subject on this email and thus creating a new thread
to discuss this issue.

On Fri, Feb 28, 2025 at 8:02 AM Jeff Davis  wrote:
>
> On Tue, 2025-02-25 at 11:11 +0530, Ashutosh Bapat wrote:
> > So the dumped statistics are not restored exactly. The reason for
> > this
> > is the table statistics is dumped before dumping ALTER TABLE ... ADD
> > CONSTRAINT command which changes the statistics. I think all the
> > pg_restore_relation_stats() calls should be dumped after all the
> > schema and data modifications have been done. OR what's the point in
> > dumping statistics only to get rewritten even before restore
> > finishes.
>
> In your example, it's not so bad because the stats are actually better:
> the index is built after the data is present, and therefore relpages
> and reltuples are correct.
>
> The problem is more clear if you use --no-data. If you load data,
> ANALYZE, pg_dump --no-data, then reload the sql file, then the stats
> are lost.
>
> That workflow is very close to what pg_upgrade does. We solved the
> problem for pg_upgrade in commit 71b66171d0 by simply not updating the
> statistics when building an index and IsBinaryUpgrade.
>
> To solve the issue with dump --no-data, I propose that we change the
> test in 71b66171d0 to only update the stats if the physical relpages is
> non-zero.

I don't think I understand the patch well, but here's one question: if
a table is truncated and its index is rebuilt, would the code in the
patch stop it from updating the stats? If yes, that looks problematic.

>
> Patch attached:
>
>  * If the dump is --no-data, or during pg_upgrade, the table will be
> empty, so the physical relpages will be zero and the restored stats
> won't be overwritten.
>
>  * If (like in your example) the dump includes data, the new stats are
> based on real data, so they are better anyway. This is sort of like the
> case where autoanalyze kicks in.
>
>  * If the dump is --statistics-only, then there won't be any indexes
> created in the SQL file, so when you restore the stats, they will
> remain until you do something else to change them.
>
>  * If your example really is a problem, you'd need to dump first with -
> -no-statistics, and then with --statistics-only, and restore the two
> SQL files in order.

There are a few problems:

1. If there are thousands of tables with primary key constraints, we
have twice the number of calls to pg_restore_relation_stats(), of which
only half will be useful: the stats written by the first set of calls
will be overwritten by the second set of calls. The time spent
executing the first set of calls can be saved completely, and to some
extent the time spent dumping the calls as well. I think that would be
a measurable improvement.

2. We aren't restoring the statistics faithfully, as mentioned in
Greg's reply. If users dump and restore with autovacuum turned off,
they will be surprised to see the statistics differ between the
original and restored databases, which may have other effects, such as
plan changes.

3. The test I am building over at [1] is aimed at testing whether the
dumped objects get restored faithfully by comparing dumps from the
original and restored databases. That's a somewhat crude method, but it
is already used by some of our tests. I think it would be good to test
statistics in that test as well. But if the statistics are not going to
be the same on the original and restored databases, we cannot test
them. For now, I have used --no-statistics.

>
>
> Alternatively, we could put stats into SECTION_POST_DATA, which was
> already discussed[*], and we decided against it (though there was not a
> clear consensus).

I haven't looked at the code which dumps the statistics, but it does
seem simple to dump the statistics after the constraint creation command
for the tables with a primary key constraint. That will dump
not-up-to-date statistics and might overwrite the statistics


[1] 
https://www.postgresql.org/message-id/CAExHW5sBbMki6Xs4XxFQQF3C4Wx3wxkLAcySrtuW3vrnOxXDNQ%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat




Re: Small memory fixes for pg_createsubcriber

2025-02-28 Thread Ranier Vilela
On Thu, Feb 27, 2025 at 10:19 PM, Michael Paquier 
wrote:

> On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> > Yeah, I also think it would look good like this.
>
> It's the least confusing option, indeed.  I've reduced a bit the diffs
> and done that down to v16 for the pg_upgrade part where this exists.
>
Thanks Michael.


> Double-checking the tree, it does not seem that we have similar holes
> now..  I hope that I'm not wrong.
>
I think so.

best regards,
Ranier Vilela


Re: Proposal: Limitations of palloc inside checkpointer

2025-02-28 Thread Maxim Orlov
Here is an alternative solution. We can limit the number of processed
requests so that each pass fits in a 1GB memory chunk. Obviously, this
may leave some requests in the queue to be processed in the next call.
We can overcome that with multi-pass processing: estimate the number of
passes at the beginning and process the remaining requests again.
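
In outline, each call would then look something like the loop below;
the batch constant and the queue-compaction details are illustrative,
not taken verbatim from the patch:

#define ABSORB_BATCH_BYTES	(1024L * 1024L * 1024L)	/* ~1GB per pass */

void
AbsorbSyncRequests(void)
{
	int			max_requests = ABSORB_BATCH_BYTES / sizeof(CheckpointerRequest);
	bool		more;

	do
	{
		CheckpointerRequest *requests;
		int			n;

		LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
		n = Min(CheckpointerShmem->num_requests, max_requests);
		requests = palloc(n * sizeof(CheckpointerRequest));
		memcpy(requests, CheckpointerShmem->requests,
			   n * sizeof(CheckpointerRequest));
		/* shift any leftover entries to the front of the queue */
		memmove(CheckpointerShmem->requests,
				CheckpointerShmem->requests + n,
				(CheckpointerShmem->num_requests - n) * sizeof(CheckpointerRequest));
		CheckpointerShmem->num_requests -= n;
		more = (CheckpointerShmem->num_requests > 0);
		LWLockRelease(CheckpointerCommLock);

		for (int i = 0; i < n; i++)
			RememberSyncRequest(&requests[i].ftag, requests[i].type);
		pfree(requests);
	} while (more);
}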

I'd like to hear your opinion on the subject.

-- 
Best regards,
Maxim Orlov.


v3-0001-Limit-AbsorbSyncRequests-to-1Gb-at-once.patch
Description: Binary data


Re: Extend postgres_fdw_get_connections to return remote backend pid

2025-02-28 Thread Fujii Masao



On 2025/02/28 15:10, Sagar Shedge wrote:

For now, I think it makes sense to keep postgres_fdw_get_connections()
aligned with the current implementation. Otherwise, explaining what
remote_backend_pid = NULL means could be confusing, especially since
pipeline mode isn't supported yet in postgres_fdw.


Makes sense. I have created a patch according to the above suggestions.
Please review.


Thanks for updating the patch!

I made a few cosmetic adjustments and attached the revised version.
Unless there are any objections, I plan to commit this.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From e510af4e2e04d4d9500fa7d1ddc9ed0386bdca64 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Fri, 28 Feb 2025 18:37:05 +0900
Subject: [PATCH v3] postgres_fdw: Extend postgres_fdw_get_connections to
 return remote backend PID.

This commit adds a new "remote_backend_pid" output column to
the postgres_fdw_get_connections function. It returns the process ID of
the remote backend, on the foreign server, handling the connection.

This enhancement is useful for troubleshooting, monitoring, and reporting.
For example, if a connection is unexpectedly closed by the foreign server,
the remote backend's PID can help diagnose the cause.

No extension version bump is needed, as commit c297a47c5f already
handled it for v18~.

Author: Sagar Dilip Shedge 
Reviewed-by: Fujii Masao 
Discussion: 
https://postgr.es/m/caphyiff25q5xuqwxetfkwhc0yva_6+tfg9kw4bcvcjpcwxy...@mail.gmail.com
---
 contrib/postgres_fdw/connection.c | 11 +++--
 .../postgres_fdw/expected/postgres_fdw.out| 46 ---
 .../postgres_fdw/postgres_fdw--1.1--1.2.sql   |  2 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql | 26 ---
 doc/src/sgml/postgres-fdw.sgml| 23 +++---
 5 files changed, 76 insertions(+), 32 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c 
b/contrib/postgres_fdw/connection.c
index 8a8d3b4481f..8ef9702c05c 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -2111,8 +2111,8 @@ pgfdw_finish_abort_cleanup(List *pending_entries, List 
*cancel_requested,
 
 /* Number of output arguments (columns) for various API versions */
 #define POSTGRES_FDW_GET_CONNECTIONS_COLS_V1_1 2
-#define POSTGRES_FDW_GET_CONNECTIONS_COLS_V1_2 5
-#define POSTGRES_FDW_GET_CONNECTIONS_COLS  5   /* maximum of above */
+#define POSTGRES_FDW_GET_CONNECTIONS_COLS_V1_2 6
+#define POSTGRES_FDW_GET_CONNECTIONS_COLS  6   /* maximum of above */
 
 /*
  * Internal function used by postgres_fdw_get_connections variants.
@@ -2128,13 +2128,15 @@ pgfdw_finish_abort_cleanup(List *pending_entries, List 
*cancel_requested,
  *
  * For API version 1.2 and later, this function takes an input parameter
  * to check a connection status and returns the following
- * additional values along with the three values from version 1.1:
+ * additional values along with the four values from version 1.1:
  *
  * - user_name - the local user name of the active connection. In case the
  *   user mapping is dropped but the connection is still active, then the
  *   user name will be NULL in the output.
  * - used_in_xact - true if the connection is used in the current transaction.
  * - closed - true if the connection is closed.
+ * - remote_backend_pid - process ID of the remote backend, on the foreign
+ *   server, handling the connection.
  *
  * No records are returned when there are no cached connections at all.
  */
@@ -2273,6 +2275,9 @@ postgres_fdw_get_connections_internal(FunctionCallInfo 
fcinfo,
values[i++] = 
BoolGetDatum(pgfdw_conn_check(entry->conn) != 0);
else
nulls[i++] = true;
+
+   /* Return process ID of remote backend */
+   values[i++] = Int32GetDatum(PQbackendPID(entry->conn));
}
 
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, 
values, nulls);
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 8447b289cb7..c68119030ab 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10589,13 +10589,18 @@ drop cascades to foreign table ft7
 -- List all the existing cached connections. loopback and loopback3
 -- should be output as invalid connections. Also the server name and user name
 -- for loopback3 should be NULL because both server and user mapping were
--- dropped.
-SELECT server_name, user_name = CURRENT_USER as "user_name = CURRENT_USER", 
valid, used_in_xact, closed
-FROM postgres_fdw_get_connections() ORDER BY 1;
- server_name | user_name = CURRENT_USER | valid | used_in_xact | closed 
--+--+---+--+
- loopback| t| f | t  

Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica

2025-02-28 Thread m . litsarev

Hi,

This version fixes an error in the patch.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com
From d96d322e9c146e35ef1a5c3168109d3b059585f7 Mon Sep 17 00:00:00 2001
From: Mikhail Litsarev 
Date: Fri, 10 Jan 2025 21:23:02 +0300
Subject: [PATCH v6 1/2] Replace recovery boolean flags with a bits32 set.

Move local booleans ArchiveRecoveryRequested, InArchiveRecovery,
StandbyModeRequested, StandbyMode, LocalHotStandbyActive,
LocalPromoteIsTriggered into localRecoveryFlags bitset.

Move SharedHotStandbyActive, SharedPromoteIsTriggered members of
XLogRecoveryCtlData into sharedRecoveryFlags bitset.

Refactor the code according to the changes.
---
 src/backend/access/transam/timeline.c  |   6 +-
 src/backend/access/transam/xlog.c  |  26 +--
 src/backend/access/transam/xlogarchive.c   |   4 +-
 src/backend/access/transam/xlogrecovery.c  | 221 ++---
 src/backend/replication/logical/slotsync.c |   2 +-
 src/backend/replication/slot.c |   2 +-
 src/include/access/xlog_internal.h |   7 +-
 src/include/access/xlogrecovery.h  |  34 +++-
 8 files changed, 162 insertions(+), 140 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index a27f27cc037..c9f53c4b667 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -93,7 +93,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 		return list_make1(entry);
 	}
 
-	if (ArchiveRecoveryRequested)
+	if (ArchiveRecoveryRequested())
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
@@ -229,7 +229,7 @@ existsTimeLineHistory(TimeLineID probeTLI)
 	if (probeTLI == 1)
 		return false;
 
-	if (ArchiveRecoveryRequested)
+	if (ArchiveRecoveryRequested())
 	{
 		TLHistoryFileName(histfname, probeTLI);
 		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
@@ -331,7 +331,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	/*
 	 * If a history file exists for the parent, copy it verbatim
 	 */
-	if (ArchiveRecoveryRequested)
+	if (ArchiveRecoveryRequested())
 	{
 		TLHistoryFileName(histfname, parentTLI);
 		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 799fc739e18..f3af732b229 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5434,7 +5434,7 @@ CheckRequiredParameterValues(void)
 	 * For archive recovery, the WAL must be generated with at least 'replica'
 	 * wal_level.
 	 */
-	if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
+	if (ArchiveRecoveryRequested() && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
 	{
 		ereport(FATAL,
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -5447,7 +5447,7 @@ CheckRequiredParameterValues(void)
 	 * For Hot Standby, the WAL must be generated with 'replica' mode, and we
 	 * must have at least as many backend slots as the primary.
 	 */
-	if (ArchiveRecoveryRequested && EnableHotStandby)
+	if (ArchiveRecoveryRequested() && EnableHotStandby)
 	{
 		/* We ignore autovacuum_worker_slots when we make this test. */
 		RecoveryRequiresIntParameter("max_connections",
@@ -5607,8 +5607,8 @@ StartupXLOG(void)
 	 *
 	 * InitWalRecovery analyzes the control file and the backup label file, if
 	 * any.  It updates the in-memory ControlFile buffer according to the
-	 * starting checkpoint, and sets InRecovery and ArchiveRecoveryRequested.
-	 * It also applies the tablespace map file, if any.
+	 * starting checkpoint, and sets SX_ARCHIVE_RECOVERY_REQUESTED and
+	 * InRecovery. It also applies the tablespace map file, if any.
 	 */
 	InitWalRecovery(ControlFile, &wasShutdown,
 	&haveBackupLabel, &haveTblspcMap);
@@ -5740,7 +5740,7 @@ StartupXLOG(void)
 	{
 		/* Initialize state for RecoveryInProgress() */
 		SpinLockAcquire(&XLogCtl->info_lck);
-		if (InArchiveRecovery)
+		if (InArchiveRecovery())
 			XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
 		else
 			XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
@@ -5793,7 +5793,7 @@ StartupXLOG(void)
 		 * startup process to think that there are still invalid page
 		 * references when checking for data consistency.
 		 */
-		if (InArchiveRecovery)
+		if (InArchiveRecovery())
 		{
 			LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
 			LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
@@ -5827,7 +5827,7 @@ StartupXLOG(void)
 		 * control file and we've established a recovery snapshot from a
 		 * running-xacts WAL record.
 		 */
-		if (ArchiveRecoveryRequested && EnableHotStandby)
+		if (ArchiveRecoveryRequested() && EnableHotStandby)
 		{
 			TransactionId *xids;
 			int			nxids;
@@ -5940,7 +5940,7 @@ StartupXLOG(void)
 		 * recover from an online backup but never called pg_backup_stop(), or
 		 * you didn't archive all the WAL needed.
 		 */
-		if (ArchiveRecoveryRequested || ControlFile->backupE

Re: doc: create table improvements

2025-02-28 Thread Laurenz Albe
On Wed, 2024-04-24 at 07:45 -0700, David G. Johnston wrote:
> On Wed, Apr 24, 2024 at 3:30 AM Peter Eisentraut  wrote:
> >  > +   The reliability characteristics of a table are governed by its
> >  > +   persistence mode.  The default mode is described
> >  > +   here
> >  > +   There are two alternative modes that can be specified during
> >  > +   table creation:
> >  > +   temporary and
> >  > +   unlogged.
> > 
> > Not sure reliability is the best word here.  I mean, a temporary table 
> > isn't any less reliable than any other table.  It just does different 
> > things.
> 
> Given the name of the section where this is all discussed I'm having trouble
> going with a different word. 

This patch has rotted somewhat, and parts of it have become obsolete
with commit e2bab2d792.  Still, I think that it is a good idea to shorten
the lines in the synopsis.

A detailed review:

> diff --git a/doc/src/sgml/ref/create_table.sgml 
> b/doc/src/sgml/ref/create_table.sgml
> index 02f31d2d6f..9a5dafb9af 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> [...]
> +and column_storage is:
> +
> +STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN | DEFAULT } [ COMPRESSION 
> compression_method ]
> [...]

I don't know if "column_storage" is descriptive.  After all, this is solely
about TOAST details, which is only one aspect of storage.
I have renamed it to "oversize_storage".


@@ -118,11 +127,21 @@ WITH ( MODULUS numeric_literal, REM
   Description
 
   
-   CREATE TABLE will create a new, initially empty table
+   CREATE TABLE will create a new, initially empty, table
in the current database. The table will be owned by the user issuing the
command.
   

I am not a native speaker, but the sentence feels better to me without
the extra comma.  I took the liberty of undoing this change, partly because
it is unrelated to the topic of the patch.

+  
+   The reliability characteristics of a table are governed by its
+   persistence mode.  The default mode is described
+   here
+   There are two alternative modes that can be specified during
+   table creation:
+   temporary and
+   unlogged.
+  

I agree with Peter that "reliability" is not ideal.  I went with
"durability" instead.  I removed the link to the reliability discussion
and rephrased the sentence somewhat.

+
+ 
+  If specified on a partitioned table the property is recorded but ignored:
+  the entire partitioned table is not automatically truncated after a crash
+  or unclean shutdown.
+ 

This has become obsolete with e2bab2d792, so I removed it.

Attached is an updated patch.

Yours,
Laurenz Albe
From 338dfb52d83d0f9c07421641a0111a2f8e63b34e Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 28 Feb 2025 14:36:25 +0100
Subject: [PATCH v2] Unclutter CREATE TABLE synopsis

Factor out the "persistence mode" and "oversize storage" parts
of the syntax synopsis to reduce the line length and increase
the readability.

Author: David G. Johnston
Reviewed-by: Laurenz Albe
Discussion: https://postgr.es/m/CAKFQuwYfMV-2SdrP-umr5SVNSqTn378BUvHsebetp5%3DDhT494w%40mail.gmail.com
---
 doc/src/sgml/ref/create_table.sgml | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 0a3e520f215..73f18d6b331 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -21,8 +21,8 @@ PostgreSQL documentation
 
  
 
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
-  { column_name data_type [ STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN | DEFAULT } ] [ COMPRESSION compression_method ] [ COLLATE collation ] [ column_constraint [ ... ] ]
+CREATE [ persistence_mode ] TABLE [ IF NOT EXISTS ] table_name ( [
+  { column_name data_type [ oversize_storage ] [ COLLATE collation ] [ column_constraint [ ... ] ]
 | table_constraint
 | LIKE source_table [ like_option ... ] }
 [, ... ]
@@ -34,7 +34,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE tablespace_name ]
 
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name
+CREATE [ persistence_mode ] TABLE [ IF NOT EXISTS ] table_name
 OF type_name [ (
   { column_name [ WITH OPTIONS ] [ column_constraint [ ... ] ]
 | table_constraint }
@@ -46,7 +46,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE tablespace_name ]
 
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name
+CREATE [ persistence_mode ] TABLE [ IF NOT EXISTS ] table_name
 PARTITION OF parent_table [ (
   { column_name [ WITH OPTIONS ] [ column_constraint [ ... ] ]
 | table_constraint }
@@ -58,7 +58,16 @@ CREAT

Re: [PATCH] Add regression tests of ecpg command notice (error / warning)

2025-02-28 Thread Fujii Masao




On 2025/02/28 9:24, Ryo Kanbayashi wrote:

I have rewritten my patch in TAP test style :)
Files for the build are also updated.


Thanks for updating the patch!

+'tests': [
+  't/001_ecpg_notice.pl',
+  't/002_ecpg_notice_informix.pl',

Since neither test emits "notice" messages, shouldn't the test file
names be revised to reflect this?

Also, I'm unsure if it's ideal to place input files directly under
the "t" directory. I looked for similar TAP tests with input files,
but I couldn't find any examples to guide this decision...

+program_help_ok('ecpg');
+program_version_ok('ecpg');
+program_options_handling_ok('ecpg');
+command_fails(['ecpg'], 'ecpg without arguments fails');

These checks seem unnecessary in 002 since they're already covered in 001.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Adding NetBSD and OpenBSD to Postgres CI

2025-02-28 Thread Thomas Munro
On Tue, Feb 18, 2025 at 11:33 AM Thomas Munro  wrote:
> Maybe we could try this?
>
> https://man.netbsd.org/mount_tmpfs.8
> https://man.openbsd.org/mount_tmpfs.8

NetBSD's test_world: 10:30 -> 3:23
OpenBSD's test_world: 15:45 -> 9:10

I think NetBSD would finish around 2nd place if turned on by default
like that.  The OpenBSD part needs more work though, see attached...
From f28ad02574090c4690bce639b47ee301a3d857cc Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 28 Feb 2025 22:19:07 +1300
Subject: [PATCH 1/2] ci: Use a RAM disk for NetBSD and OpenBSD.

Run all the tests on a RAM disk, because it goes faster.  Harmonize the
setup for all three BSD CI tasks into one script, replacing the old
FreeBSD-specific one from commit 0265e5c1.

XXX OpenBSD not good enough yet

ci-os-only: freebsd netbsd openbsd
---
 .cirrus.tasks.yml   |  4 +--
 src/tools/ci/gcp_freebsd_repartition.sh | 26 --
 src/tools/ci/prepare_filesystem.sh  | 36 +
 3 files changed, 38 insertions(+), 28 deletions(-)
 delete mode 100755 src/tools/ci/gcp_freebsd_repartition.sh
 create mode 100755 src/tools/ci/prepare_filesystem.sh

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 91b51142d2e..2858dc1194e 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -155,8 +155,7 @@ task:
 
   ccache_cache:
 folder: $CCACHE_DIR
-  # Work around performance issues due to 32KB block size
-  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
+  prepare_filesystem_script: src/tools/ci/prepare_filesystem.sh
   create_user_script: |
 pw useradd postgres
 chown -R postgres:postgres .
@@ -277,6 +276,7 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
 
+  prepare_filesystem_script: src/tools/ci/prepare_filesystem.sh
   create_user_script: |
 useradd postgres
 chown -R postgres:users /home/postgres
diff --git a/src/tools/ci/gcp_freebsd_repartition.sh b/src/tools/ci/gcp_freebsd_repartition.sh
deleted file mode 100755
index 3adb8fb88ec..000
--- a/src/tools/ci/gcp_freebsd_repartition.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/bin/sh
-
-set -e
-set -x
-
-# fix backup partition table after resize
-gpart recover da0
-gpart show da0
-
-# delete and re-add swap partition with expanded size
-swapoff -a
-gpart delete -i 3 da0
-gpart add -t freebsd-swap -l swapfs -a 4096 da0
-gpart show da0
-swapon -a
-
-# create a file system on a memory disk backed by swap, to minimize I/O
-mdconfig -a -t swap -s20g -u md1
-newfs -b 8192 -U /dev/md1
-
-# migrate working directory
-du -hs $CIRRUS_WORKING_DIR
-mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig
-mkdir $CIRRUS_WORKING_DIR
-mount -o noatime /dev/md1 $CIRRUS_WORKING_DIR
-cp -a $CIRRUS_WORKING_DIR.orig/ $CIRRUS_WORKING_DIR/
diff --git a/src/tools/ci/prepare_filesystem.sh b/src/tools/ci/prepare_filesystem.sh
new file mode 100755
index 000..dd203de4915
--- /dev/null
+++ b/src/tools/ci/prepare_filesystem.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+set -e
+set -x
+
+# migrate working directory to tmpfs
+du -hs $CIRRUS_WORKING_DIR
+mv $CIRRUS_WORKING_DIR $CIRRUS_WORKING_DIR.orig
+mkdir $CIRRUS_WORKING_DIR
+case "`uname`" in
+
+  FreeBSD|NetBSD)
+mount -t tmpfs tmpfs $CIRRUS_WORKING_DIR
+;;
+
+  OpenBSD)
+# tmpfs is broken on OpenBSD.  mfs is similar but you need a size up front
+# and that needs to be backed by swap, so we'll somehow need to add swap,
+# either here or on the image.
+
+#du -h
+#umount /usr/obj  # unused 5G on /dev/sd0j
+#swapon /dev/sd0j # fails with ENXIO... what am I missing?
+
+# XXX As a temporary hack, mount a 4GB file as swap.
+dd if=/dev/zero of=/usr/obj/big.swap bs=1M count=4096
+swapon /usr/obj/big.swap
+
+mount -t mfs -o rw,noatime,nodev,-s=800 swap $CIRRUS_WORKING_DIR
+;;
+
+  *)
+echo "I don't know how to create a RAM disk on this computer"
+exit 1
+esac
+cp -a $CIRRUS_WORKING_DIR.orig/. $CIRRUS_WORKING_DIR/
-- 
2.48.1



Re: Licence preamble update

2025-02-28 Thread Tom Lane
Noah Misch  writes:
> On Thu, Feb 27, 2025 at 04:56:05PM +, Dave Page wrote:
>> --- a/COPYRIGHT
>> +++ b/COPYRIGHT
>> @@ -1,5 +1,5 @@
>> PostgreSQL Database Management System
>> -(formerly known as Postgres, then as Postgres95)
>> +(also known as Postgres, formerly as Postgres95)
>> 
>> Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group

> I'm not seeing this change as aligned with
> https://www.postgresql.org/about/policies/project-name/, which says Postgres
> "is an alias or nickname and is not the official name of the project."  The
> official product name did change Postgres -> Postgres95 -> PostgreSQL, with
> "Postgres" holding the status of a nickname since Postgres95 became the
> official name.  Today's text matches that history, and the proposed text
> doesn't.  Can you share more from the brief discussion?  Changing a license
> file is an eyebrow-raising event, so we should do it only if the win is clear.
> There may be an argument for making this change, but I'm missing it currently.

PGCAC holds trademarks on both "PostgreSQL" and "Postgres".  We've
been given legal advice that both terms need to be in current use
to preserve the trademarks.  Which they are and have been, but the
present wording in COPYRIGHT doesn't align with that.  The website
phrasing will be adjusted as well.

regards, tom lane




Re: Space missing from EXPLAIN output

2025-02-28 Thread Thom Brown
On Fri, 28 Feb 2025 at 16:54, Fabrízio de Royes Mello
 wrote:
>
>
> On Fri, Feb 28, 2025 at 1:38 PM Thom Brown  wrote:
>>
>> Hi,
>>
>> Commit ddb17e387aa introduced fractional row counts, but the rejigging
>> has introduced a formatting issue:
>>
>> Worker 0:  actual time=34.779..34.780rows=0 loops=1
>>   Buffers: shared hit=1200
>> Worker 1:  actual time=39.737..39.738rows=0 loops=1
>>   Buffers: shared hit=1084
>>
>> A space is missing between the time values and the "rows" label.
>>
>
> Are you sure your main is updated? The current main is 
> 424ededc580b03e1bcf8aff18a735e519c80061f.
>
> Because your patch is not applying:
> main on  main [$]
> ➜ git apply /tmp/fix_explain_analyze_spacing.diff
> error: patch failed: src/backend/commands/explain.c:2075
> error: src/backend/commands/explain.c: patch does not apply
>
> On the current main your change should be on line 2041 and not 2075 according 
> to your patch.

Erk, yes, my main wasn't up-to-date. Thanks for pointing that out.

Rebased and attached.

Thom
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e4432f080..d8a7232ced 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2038,7 +2038,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 ExplainIndentText(es);
 appendStringInfo(es->str, "actual ");
 if (es->timing)
-	appendStringInfo(es->str, "time=%.3f..%.3f", startup_ms, total_ms);
+	appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
 
 appendStringInfo(es->str, "rows=%.2f loops=%.0f\n", rows, nloops);
 			}


Re: moving some code out of explain.c

2025-02-28 Thread Robert Haas
On Fri, Feb 28, 2025 at 9:52 AM Robert Haas  wrote:
> On Thu, Feb 27, 2025 at 4:48 PM Tom Lane  wrote:
> > Oversight I assume?
>
> Yeah, that was dumb. Thanks for catching it. Here's v3.

Committed so I can see what the buildfarm thinks before it gets too
late in the day.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Amcheck verification of GiST and GIN

2025-02-28 Thread Mark Dilger
> So, we have an entry tree page, where there is a tuple at offset 80,
> with gin tuple category = 3, and then it goes back to category 0 again.
> And one more such pattern on the same page.
> The ginCompareEntries function compares the gin tuple categories first.
> I do not understand how this would be a valid order on the page, given
> that ginCompareEntries is used in the `ginget.c` logic.  Maybe I'm missing
> something vital about GIN.
>

The only obvious definition of "wrong" for this is that gin index scans
return different result sets than table scans over the same data.  Using
your much smaller reproducible test case, and adding rows like:

SELECT COUNT(*) FROM tbl WHERE j @>
'"1129BBCABFFAACA9VGVKipnwohaccc9TSIMTOQKHmcGYVeFE_PWKLHmpyj60137672qugtsstugg"'::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @> '{"": "r", "hji4124": "",
"HTJP_DAptxn6": 9}'::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @> '[]'::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @> NULL::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @> '{"": -6, "__": [""], "YMb":
-22}'::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @> '{"853": -60, "pjx": "",
"TGLUG_jxmrggv": null}'::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @>
'"D3BDA069074174vx48A37IVHWVXLUP9382542ypsl1465pixtryzCBgrkkhrvCC_BDDFatkyXHLIe"'::jsonb;
SELECT COUNT(*) FROM tbl WHERE j @> '{"F18s": {"": -84194}, "ececab2":
[""]}'::jsonb;

The results are the same with or without the index.  Can you find any
examples where they differ?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Space missing from EXPLAIN output

2025-02-28 Thread Thom Brown
Hi,

Commit ddb17e387aa introduced fractional row counts, but the rejigging
has introduced a formatting issue:

Worker 0:  actual time=34.779..34.780rows=0 loops=1
  Buffers: shared hit=1200
Worker 1:  actual time=39.737..39.738rows=0 loops=1
  Buffers: shared hit=1084

A space is missing between the time values and the "rows" label.

Patch attached to fix.

Regards

Thom
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c0d614866a..a1395edc8c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2075,7 +2075,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 ExplainIndentText(es);
 appendStringInfo(es->str, "actual ");
 if (es->timing)
-	appendStringInfo(es->str, "time=%.3f..%.3f", startup_ms, total_ms);
+	appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms);
 
 if (nloops > 1)
 	appendStringInfo(es->str, "rows=%.2f loops=%.0f\n", rows, nloops);


Re: Proposal: Limitations of palloc inside checkpointer

2025-02-28 Thread Maxim Orlov
I think I figured it out. Here is v4.

If the total size of the requests is less than 1 GB, the algorithm stays the
same as before. If we need to process more, we will do it incrementally in
slices of 1 GB.
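
For illustration, a rough sketch of that incremental scheme (the helper
names below are hypothetical; CheckpointerRequest, RememberSyncRequest()
and CheckpointerCommLock are the existing ones):

/*
 * Hypothetical sketch: absorb sync requests in bounded slices instead
 * of a single palloc(), which is capped at MaxAllocSize (~1 GB).
 */
#define ABSORB_SLICE_BYTES (1024L * 1024L * 1024L)

while (have_pending_requests())     /* hypothetical */
{
    int         nslice = Min(num_pending_requests(),   /* hypothetical */
                             ABSORB_SLICE_BYTES / sizeof(CheckpointerRequest));
    CheckpointerRequest *slice;

    LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
    slice = palloc(nslice * sizeof(CheckpointerRequest));
    copy_and_remove_requests(slice, nslice);    /* hypothetical */
    LWLockRelease(CheckpointerCommLock);

    /* Process this slice outside the lock, then move on to the next. */
    for (int i = 0; i < nslice; i++)
        RememberSyncRequest(&slice[i].ftag, slice[i].type);

    pfree(slice);
}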

Best regards,
Maxim Orlov.


v4-0001-Process-sync-requests-incrementally-in-AbsorbSync.patch
Description: Binary data


Re: Space missing from EXPLAIN output

2025-02-28 Thread Fabrízio de Royes Mello
On Fri, Feb 28, 2025 at 1:38 PM Thom Brown  wrote:

> Hi,
>
> Commit ddb17e387aa introduced fractional row counts, but the rejigging
> has introduced a formatting issue:
>
> Worker 0:  actual time=34.779..34.780rows=0 loops=1
>   Buffers: shared hit=1200
> Worker 1:  actual time=39.737..39.738rows=0 loops=1
>   Buffers: shared hit=1084
>
> A space is missing between the time values and the "rows" label.
>
>
Are you sure your main is updated? The current main
is 424ededc580b03e1bcf8aff18a735e519c80061f.

Because your patch is not applying:
main on  main [$]
➜ git apply /tmp/fix_explain_analyze_spacing.diff
error: patch failed: src/backend/commands/explain.c:2075
error: src/backend/commands/explain.c: patch does not apply

On the current main your change should be on line 2041 and not 2075
according to your patch.

Regards,

-- 
Fabrízio de Royes Mello


Re: [PATCH] New predefined role pg_manage_extensions

2025-02-28 Thread Michael Banck
Hi,

On Fri, Jan 17, 2025 at 10:03:17AM +0100, Laurenz Albe wrote:
> On Thu, 2024-10-31 at 22:47 +0100, Michael Banck wrote:
> > Even though there has not been a lot of discussion on this, here is a
> > rebased patch.  I have also added it to the upcoming commitfest.
> 
> I had a look at the patch.

Thanks! And sorry for the long delay...
 
> > --- a/doc/src/sgml/user-manag.sgml
> > +++ b/doc/src/sgml/user-manag.sgml
> > @@ -669,6 +669,17 @@ GRANT pg_signal_backend TO admin_user;
> >   
> >  
> >  
> > + > xreflabel="pg_manage_extensions">
> > + pg_manage_extensions
> > + 
> > +  
> > +   pg_manage_extensions allows creating, removing or
> > +   updating extensions, even if the extensions are untrusted or the 
> > user is
> > +   not the database owner.
> > +  
> > + 
> > +
> > +
> 
> The inaccuracy of the database owner has already been mentioned.

Right, I had already changed that in my tree but never sent out an
updated version with this.
 
> Should we say "creating, altering or dropping extensions" to make the 
> connection to
> the respective commands obvious?

Alright, did so.

> > --- a/src/backend/commands/extension.c
> > +++ b/src/backend/commands/extension.c
> > @@ -994,13 +994,14 @@ execute_extension_script(Oid extensionOid, 
> > ExtensionControlFile *control,
> > ListCell   *lc2;
> >  
> > /*
> > -* Enforce superuser-ness if appropriate.  We postpone these checks 
> > until
> > -* here so that the control flags are correctly associated with the 
> > right
> > +* Enforce superuser-ness/membership of the pg_manage_extensions
> > +* predefined role if appropriate.  We postpone these checks until here
> > +* so that the control flags are correctly associated with the right
> >  * script(s) if they happen to be set in secondary control files.
> >  */
> 
> This is just an attempt to improve the English:
> 
>   If appropriate, enforce superuser-ness or membership of the predefined role
>   pg_manage_extensions.

Done.

> > -: errhint("Must be superuser to create this 
> > extension.")));
> > +: errhint("Only roles with privileges of the \"%s\" 
> > role are allowed to create this extension.", "pg_manage_extensions")));
> 
> I don't see the point of breaking out the role name from the message.
> We don't do that in other places.

We actually do, I think I modelled it after other predefined roles,
e.g.:

|src/backend/commands/subscriptioncmds.c:
errdetail("Only roles with privileges of the \"%s\" role may create 
subscriptions.",
|src/backend/commands/subscriptioncmds.c-   
   "pg_create_subscription")));
|--
|src/backend/commands/copy.c:
errdetail("Only roles with privileges of the \"%s\" role may COPY to or from an 
external program.",
|src/backend/commands/copy.c-   
   "pg_execute_server_program"),
|--
|src/backend/commands/copy.c:
errdetail("Only roles with privileges of the \"%s\" role may COPY from a file.",
|src/backend/commands/copy.c-   
   "pg_read_server_files"),
|--
|src/backend/commands/copy.c:
errdetail("Only roles with privileges of the \"%s\" role may COPY to a file.",
|src/backend/commands/copy.c-   
   "pg_write_server_files"),

However, those are all errdetail, while the previous "Must be superuser
to [...]" is errhint, and that error message has different hints based
on whether the extension is trusted or not...

> Besides, I think that the mention of the superuser should be retained.
> Moreover, I think that commit 8d9978a717 pretty much established that we
> should not quote names if they contain underscores.
> Perhaps:
> 
>   errhint("Must be superuser or member of pg_manage_extensions to create this 
> extension.")));

Alright, I think it makes more sense to have it like that than the
above, so I changed it to that.

New version 3 attached.


Michael
>From 57e02d95ace2a390ea1e40266735529f313b31ec Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Thu, 31 Oct 2024 22:36:12 +0100
Subject: [PATCH v3] Add new pg_manage_extensions predefined role.

This allows any role that is granted this new predefined role to CREATE, UPDATE
or DROP extensions, no matter whether they are trusted or not.
---
 doc/src/sgml/user-manag.sgml  | 11 +++
 src/backend/commands/extension.c  | 11 ++-
 src/include/catalog/pg_authid.dat |  5 +
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index ed18704a9c2..2a44f41da4f 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -669,6 +669,17 @@ GRANT

Re: Disabling vacuum truncate for autovacuum

2025-02-28 Thread Nathan Bossart
On Thu, Feb 27, 2025 at 08:29:16PM -0800, Gurjeet Singh wrote:
> On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe  wrote:
>> I hope it is possible to override the global setting with the 
>> "vacuum_truncate"
>> option on an individual table.
> 
> Current patch behaviour is that if the autovacuum_vacuum_truncate is false, 
> then
> autovacuum will _not_ truncate any relations. If the parameter's value is true
> (the default), then the relation's reloption will be honored.
> 
> A table-owner, or the database-owner, may enable truncation of a table, as 
> they
> may be trying to be nice and return the unused disk space back to the
> OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the
> health of the entire db instance, as well as of any replicas of the db
> instance),
> wants to disable truncation across all databases to ensure that the 
> replication
> does not get stuck, then IMHO Postgres should empower the sysadmin to make
> that decision, and override the relation-level setting enabled by the table-
> owner or the database-owner.

IIUC reloptions with corresponding GUCs typically override the GUC setting,
although autovacuum_enabled is arguably an exception.  If setting the GUC
to false overrides the relation-specific settings, it also becomes more
difficult to enable truncation for just a couple of tables, although that
might not be a popular use-case.  Furthermore, even if we do want the GUC
to override the reloption, it won't override VACUUM (TRUNCATE).

>> > One additional improvement I can think of is to emit a WARNING or NOTICE 
>> > message
>> > that truncate operation is being skipped, perhaps only if the truncation
>> > would've freed up space over a certain threshold.
>>
>> Interesting idea, but I think it is independent from this patch.
> 
> Agreed. I'll consider writing a separate patch for this.

Perhaps it would be useful to say whether truncation was attempted in the
output of VACUUM (VERBOSE) and the autovacuum logs.

>> > Perhaps there's value in letting this parameter be specified at database 
>> > level,
>> > but I'm not able to think of a reason why someone would want to disable 
>> > this
>> > behaviour on just one database. So leaving the parameter context to be the 
>> > same
>> > as most other autovacuum parameters: SIGHUP.
>>
>> I can imagine setting that on only a certain database. Different databases
>> typically have different applications, which have different needs.
> 
> Makes sense. I don't think anything special needs to be done in the patch to
> address this.

Hm.  I was thinking PGC_USERSET might make sense for this one, but that was
only because I didn't see any technical reason to restrict it.  I don't
know whether limiting it accomplishes anything beyond making it more
cumbersome for users to choose their desired default truncation setting.

> PS: Nathan, your latest email arrived as I was preparing this email and patch,
> so this email and patch does not address concerns, if any, in your latest 
> email.
> I will try to respond to it soon.

Oops, sorry for the conflict.  I'm happy to take a step back and be the
reviewer/committer for this one.

-- 
nathan




Re: Lowering temp_buffers minimum

2025-02-28 Thread Matthias van de Meent
On Tue, 25 Feb 2025 at 15:33, Andres Freund  wrote:
> It seems rather odd that our minimum for temp_buffers is 100 while the
minimum
> for shared_buffers, which is shared across connections!, is 16.

Hmm, given that, I'd say we should also increase the minimum shared_buffers to
a value >= 33, the highest number of pages that can be addressed in one WAL
record: we allow users to write WAL records with 33 pages without pinning
the relevant buffers, but recovery doesn't do direct-to-disk operations. So I
think it's better to increase this limit.

> Does anybody see a reason we shouldn't lower temp_buffers to match
> shared_buffers?

None that I can think of. As Robert said, go for it.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


Re: moving some code out of explain.c

2025-02-28 Thread Robert Haas
On Thu, Feb 27, 2025 at 4:48 PM Tom Lane  wrote:
> Oversight I assume?

Yeah, that was dumb. Thanks for catching it. Here's v3.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Avoid-including-explain.h-in-explain_format.h-and.patch
Description: Binary data


Re: Introduce "log_connection_stages" setting.

2025-02-28 Thread Melanie Plageman
On Fri, Feb 28, 2025 at 5:04 AM Bertrand Drouvot
 wrote:
>
> Did not realized that before but FWIW, it looks "very close" to what Melanie 
> is
> doing in [1] (0001 patch). Idea looks similar excepts, among other things, 
> that
> log_connections is kept.

Oh wow, yes, I never saw this thread. I prefer keeping the original
guc name log_connections. I think it makes sense to also include
disconnected as an option in the list and remove that separate guc
log_disconnections. I was thinking it should be okay to include "all"
in the list with other values, and it just sets the value to all.

- Melanie




Re: Add Pipelining support in psql

2025-02-28 Thread Daniel Verite
Anthonin Bonnefoy wrote:

> > What is the reasoning here behind this restriction?  \gx is a wrapper
> > of \g with expanded mode on, but it is also possible to call \g with
> > expanded=on, bypassing this restriction.
> 
> The issue is that \gx enables expanded mode for the duration of the
> query and immediately resets it in sendquery_cleanup. With pipelining,
> the command is piped and displaying is done by either \endpipeline or
> \getresults, so the flag change has no impact. Forbidding it was a way
> to make it clearer that it won't have the expected effect.

But it's not just \gx.

The following invocations don't respect the desired output destination
and formats (ignoring them), when a pipeline is active:

select ... \bind \g filename
select ... \bind \g |program
select ... \bind \g (format=unaligned tuples_only=on)

Just like for \gx the problem is that in a pipeline, sending the query
is not followed by getting the results, and the output properties
of a query are lost in between.


Best regards,
-- 
Daniel Vérité 
https://postgresql.verite.pro/




Re: Incorrect result of bitmap heap scan.

2025-02-28 Thread Matthias van de Meent
On Tue, 7 Jan 2025 at 18:46, Matthias van de Meent
 wrote:
>
> Hi,
>
> I've rebased my earlier patch to fix some minor conflicts with the
> work done on bitmaps in December last year. I've also included Andres'
> cursor-based isolation test as 0002; which now passes.

Rebased again, now on top of head due to f7a8fc10.

> The patches for the back-branches didn't need updating, as those
> branches have not diverged enough for those patches to have gotten
> stale. They're still available in my initial mail over at [0].

Same again.

> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>
> [0] 
> https://www.postgresql.org/message-id/CAEze2Wg1Q4gWzm9RZ0yXydm23Mj3iScu8LA__Zz3JJEgpnoGPQ%40mail.gmail.com


v4-0002-isolationtester-showing-broken-index-only-bitmap-.patch
Description: Binary data


v4-0001-Remove-HeapBitmapScan-s-skip_fetch-optimization.patch
Description: Binary data


Re: Changing shared_buffers without restart

2025-02-28 Thread Ashutosh Bapat
On Tue, Feb 25, 2025 at 3:22 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
> > TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
> > changing shared memory mapping layout. Any feedback is appreciated.
>
> Hi,
>
> Here is a new version of the patch, which contains a proposal about how to
> coordinate shared memory resizing between backends. The rest is more or less
> the same; feedback about coordination is appreciated. It's a lot to read,
> but the main difference is about:

Thanks Dmitry for the summary.

>
> 1. Allowing to decouple a GUC value change from actually applying it, sort
> of a "pending" change. The idea is to let custom logic be triggered on an
> assign hook, and then take responsibility for what happens later and how
> it's going to be applied. This allows using the regular GUC infrastructure
> in cases where a value change requires some complicated processing. I was
> trying to make the change not too invasive; also, it's still missing GUC
> reporting.
>
> 2. The shared memory resizing patch became more complicated thanks to some
> coordination between backends. The current implementation was chosen from a
> few more or less equal alternatives, which evolve along the following lines:
>
> * There should be one "coordinator" process overseeing the change. Having
> the postmaster fulfill this role, like in this patch, seems like a natural
> idea, but it poses certain challenges since it doesn't have locking
> infrastructure. Another option would be to elect a single backend to be the
> coordinator, which would handle the postmaster as a special case. If there
> is ever a "coordinator" worker in Postgres, it would be useful here.
>
> * The coordinator uses EmitProcSignalBarrier to reach out to all other
> backends and trigger the resize process. Backends join a Barrier to
> synchronize and wait until everyone is finished.
>
> * There is some resizing state stored in shared memory, which is there to
> handle backends that were for some reason late or didn't receive the signal.
> What to store there is open for discussion.
>
> * Since we want to make sure all processes share the same understanding of
> what the NBuffers value is, any failure is mostly a hard stop, since rolling
> back the change needs coordination as well and sounds a bit too complicated
> for now.
>
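
For readers following along, a rough sketch of that handshake (the
barrier type, wait event and helpers below are hypothetical, not from
the patch; EmitProcSignalBarrier(), WaitForProcSignalBarrier() and the
Barrier API are the existing primitives):

/* Coordinator: signal all backends, then wait for them to absorb it. */
uint64      gen;

gen = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SHMEM_RESIZE);   /* hypothetical */
WaitForProcSignalBarrier(gen);

/* Each backend, in its barrier-absorb callback. */
static bool
ProcessBarrierShmemResize(void)     /* hypothetical */
{
    Barrier    *b = &ShmemResizeCtl->barrier;   /* hypothetical */

    /* Join the group, apply the change, and wait for everyone else. */
    BarrierAttach(b);
    remap_shared_memory(ShmemResizeCtl->new_nbuffers);  /* hypothetical */
    BarrierArriveAndWait(b, WAIT_EVENT_SHMEM_RESIZE);   /* hypothetical */
    BarrierDetach(b);
    return true;
}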

I think we should add a way to monitor the progress of resizing; at
least whether resizing is complete and whether the new GUC value is in
effect.

> We've tested this change manually for now, although it might be useful to try
> out injection points. The testing strategy, which has caught plenty of bugs,
> was simply to run a pgbench workload against a running instance and change
> shared_buffers on the fly. Some more subtle cases were verified by manually
> injecting delays to trigger expected scenarios.

I have shared a script with my changes, but it's far from full
testing. We will need to use injection points to test specific
scenarios.

-- 
Best Wishes,
Ashutosh Bapat




Re: Get rid of WALBufMappingLock

2025-02-28 Thread Yura Sokolov
26.02.2025 11:52, Andrey Borodin wrote:
>> On 25 Feb 2025, at 20:19, Alexander Korotkov  wrote:
>>
>>
> 
> 
> Hi!
> 
> One little piece of code looks suspicious to me. But I was not raising a
> concern because I see similar code everywhere in the codebase. But now
> Kirill asked me to explain what is going on and I cannot.
> 
> This seems to be relevant… so.
> 
> + while (upto >= pg_atomic_read_u64(&XLogCtl->InitializedUpTo))
>// Assume ConditionVariableBroadcast() happened here, but before next line
> + ConditionVariableSleep(&XLogCtl->InitializedUpToCondVar, 
> WAIT_EVENT_WAL_BUFFER_INIT);
> + ConditionVariableCancelSleep();
> 
> Won’t this sleep wait forever?

Because ConditionVariableSleep doesn't sleep the first time it is called.
It just performs ConditionVariablePrepareToSleep and immediately returns.
So the actual condition of the `while` loop is checked at least twice before
going to sleep.
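
To spell the pattern out, here is a rough sketch of what
ConditionVariableSleep() does internally (paraphrased, not the actual
condition_variable.c code):

if (cv_sleep_target != cv)
{
    /*
     * First call in the loop: just register ourselves on the CV's wait
     * list and return without sleeping.  The caller then re-tests its
     * exit condition, so a broadcast sent before we were enqueued
     * cannot be missed.
     */
    ConditionVariablePrepareToSleep(cv);
    return;
}

/*
 * Later calls: we are already on the wait list, so any broadcast issued
 * after the caller's previous condition check sets our latch, and the
 * wait below returns promptly.
 */
WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1, wait_event_info);
ResetLatch(MyLatch);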

> 
> I see about 20 other occurrences of similar code, so, perhaps, everything is
> fine. But I would greatly appreciate some pointers on why it works.

---
regards
Yura Sokolov aka funny-falcon




Re: RFC: Additional Directory for Extensions

2025-02-28 Thread Matheus Alcantara
Hi

On Tue, Feb 25, 2025 at 5:29 PM Matheus Alcantara
 wrote:
> Fixed on the attached v3.
>
After I sent the v3 patch, I noticed that the tests were failing on Windows.
The problem was in the TAP test, which was using ":" as the separator in
extension_control_path; also, the path was not being escaped correctly,
resulting in a wrong path configuration.

Attached v4 with all these fixes.
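
For context, the portability rule at play is just the list separator
itself; a tiny sketch (the macro name is hypothetical):

/*
 * On Windows ':' appears in drive letters such as "C:\...", so
 * PATH-style GUCs like extension_control_path must be split on ';'
 * there (this is what first_path_var_separator() already encodes).
 */
#ifdef WIN32
#define PATH_LIST_SEPARATOR ';'
#else
#define PATH_LIST_SEPARATOR ':'
#endif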

-- 
Matheus Alcantara


v4-0001-extension_control_path.patch
Description: Binary data


Re: [BUG]: the walsender does not update its IO statistics until it exits

2025-02-28 Thread Bertrand Drouvot
Hi,

On Fri, Feb 28, 2025 at 02:41:34PM +0900, Michael Paquier wrote:
> On Wed, Feb 26, 2025 at 09:48:50AM +, Bertrand Drouvot wrote:
> > Yeah I think that makes sense, done that way in the attached.
> > 
> > Speaking about physical walsender, I moved the test to 001_stream_rep.pl 
> > instead
> > (would also fail without the fix).
> 
> Hmm.  I was doing some more checks with this patch, and on closer look
> I am wondering if the location you have chosen for the stats reports
> is too aggressive: this requires a LWLock for the WAL sender backend
> type taken in exclusive mode, with each step of WalSndLoop() taken
> roughly each time a record or a batch of records is sent.  A single
> installcheck with a primary/standby setup can lead to up to 50k stats
> report calls.

Yeah, what I can observe (installcheck with a primary/standby):

- extra flush calls are made about 70K times.

Among those 70K:

- about 575 get past the "have_iostats" check in pgstat_io_flush_cb()
- about 575 get past the PendingBackendStats pg_memory_is_all_zeros()
check in pgstat_flush_backend() (makes sense due to the above)
- about 575 get past the PendingBackendStats.pending_io
pg_memory_is_all_zeros() check in pgstat_flush_backend_entry_io() (makes
sense due to the above)

It means that only a very few of them are "really" flushing IO stats.
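
In other words, the overwhelming majority of those calls bail out on a
cheap guard before taking any lock, shaped roughly like this (a sketch,
not the actual pgstat code):

static bool
flush_backend_stats_sketch(void)
{
    /* Nothing pending: no lock taken, essentially free. */
    if (pg_memory_is_all_zeros(&PendingBackendStats,
                               sizeof(PendingBackendStats)))
        return false;

    /* ... take the lock and really flush only in the rare case ... */
    return true;
}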

> With smaller records, the loop can become hotter, can't it?  Also,
> there can be a high number of WAL senders on a single node, and I've
> heard of some customers with complex logical decoding deployments with
> dozens of logical WAL senders.  Isn't there a risk of having this code
> path become a point of contention?  It seems to me that we should
> benchmark this change more carefully, perhaps even reduce the
> frequency of the report calls.

That sounds like a good idea, to measure the impact of those extra calls and
see if we'd need to mitigate it. I'll collect some data.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: long-standing data loss bug in initial sync of logical replication

2025-02-28 Thread Benoit Lobréau

Hi,

It took me a while but I ran the test on my laptop with 20 runs per 
test. I asked for a dedicated server and will re-run the tests if/when I 
have it.


count of partitions | Head (sec) | Fix (sec) | Degradation (%)
--------------------------------------------------------------
  1000              |   0.0265   |   0.028   |  5.66037735849054
  5000              |   0.091    |   0.0945  |  3.84615384615385
 10000              |   0.1795   |   0.1815  |  1.11420612813371

 Concurrent Txn | Head (sec)  | Patch (sec) | Degradation in %
 --------------------------------------------------------------
    50          |  0.1797647  |  0.1920949  |  6.85907744957
   100          |  0.3693029  |  0.3823425  |  3.53086856344
   500          |  1.62265755 |  1.91427485 | 17.97158617972
  1000          |  3.01388635 |  3.57678295 | 18.67676928162
  2000          |  7.0171877  |  6.4713304  |  8.43500897435

I'll try to run test2.pl later (right now it fails).

hope this helps.

--
Benoit Lobréau
Consultant
http://dalibo.com


cache_inval_bug_v1.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: bug: ALTER TABLE ADD VIRTUAL GENERATED COLUMN with table rewrite

2025-02-28 Thread Srinath Reddy
Hi,
I have applied the patch and verified it, and the patch LGTM.

Thanks and regards
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/


Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica

2025-02-28 Thread m . litsarev

Hi!

Rebased the patch.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.comFrom 67b190ce15f8ba7480ba1691b804b27f96fd5540 Mon Sep 17 00:00:00 2001
From: Mikhail Litsarev 
Date: Fri, 10 Jan 2025 21:23:02 +0300
Subject: [PATCH v5 1/2] Replace recovery boolean flags with a bits32 set.

Move local booleans ArchiveRecoveryRequested, InArchiveRecovery,
StandbyModeRequested, StandbyMode, LocalHotStandbyActive,
LocalPromoteIsTriggered into localRecoveryFlags bitset.

Move SharedHotStandbyActive, SharedPromoteIsTriggered members of
XLogRecoveryCtlData into sharedRecoveryFlags bitset.

Refactor the code according to the changes.
---
 src/backend/access/transam/timeline.c  |   6 +-
 src/backend/access/transam/xlog.c  |  26 +--
 src/backend/access/transam/xlogarchive.c   |   4 +-
 src/backend/access/transam/xlogrecovery.c  | 221 ++---
 src/backend/replication/logical/slotsync.c |   2 +-
 src/include/access/xlog_internal.h |   7 +-
 src/include/access/xlogrecovery.h  |  34 +++-
 7 files changed, 161 insertions(+), 139 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index a27f27cc037..c9f53c4b667 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -93,7 +93,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 		return list_make1(entry);
 	}
 
-	if (ArchiveRecoveryRequested)
+	if (ArchiveRecoveryRequested())
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
@@ -229,7 +229,7 @@ existsTimeLineHistory(TimeLineID probeTLI)
 	if (probeTLI == 1)
 		return false;
 
-	if (ArchiveRecoveryRequested)
+	if (ArchiveRecoveryRequested())
 	{
 		TLHistoryFileName(histfname, probeTLI);
 		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
@@ -331,7 +331,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	/*
 	 * If a history file exists for the parent, copy it verbatim
 	 */
-	if (ArchiveRecoveryRequested)
+	if (ArchiveRecoveryRequested())
 	{
 		TLHistoryFileName(histfname, parentTLI);
 		RestoreArchivedFile(path, histfname, "RECOVERYHISTORY", 0, false);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 799fc739e18..f3af732b229 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5434,7 +5434,7 @@ CheckRequiredParameterValues(void)
 	 * For archive recovery, the WAL must be generated with at least 'replica'
 	 * wal_level.
 	 */
-	if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
+	if (ArchiveRecoveryRequested() && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
 	{
 		ereport(FATAL,
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -5447,7 +5447,7 @@ CheckRequiredParameterValues(void)
 	 * For Hot Standby, the WAL must be generated with 'replica' mode, and we
 	 * must have at least as many backend slots as the primary.
 	 */
-	if (ArchiveRecoveryRequested && EnableHotStandby)
+	if (ArchiveRecoveryRequested() && EnableHotStandby)
 	{
 		/* We ignore autovacuum_worker_slots when we make this test. */
 		RecoveryRequiresIntParameter("max_connections",
@@ -5607,8 +5607,8 @@ StartupXLOG(void)
 	 *
 	 * InitWalRecovery analyzes the control file and the backup label file, if
 	 * any.  It updates the in-memory ControlFile buffer according to the
-	 * starting checkpoint, and sets InRecovery and ArchiveRecoveryRequested.
-	 * It also applies the tablespace map file, if any.
+	 * starting checkpoint, and sets SX_ARCHIVE_RECOVERY_REQUESTED and
+	 * InRecovery. It also applies the tablespace map file, if any.
 	 */
 	InitWalRecovery(ControlFile, &wasShutdown,
 	&haveBackupLabel, &haveTblspcMap);
@@ -5740,7 +5740,7 @@ StartupXLOG(void)
 	{
 		/* Initialize state for RecoveryInProgress() */
 		SpinLockAcquire(&XLogCtl->info_lck);
-		if (InArchiveRecovery)
+		if (InArchiveRecovery())
 			XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
 		else
 			XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
@@ -5793,7 +5793,7 @@ StartupXLOG(void)
 		 * startup process to think that there are still invalid page
 		 * references when checking for data consistency.
 		 */
-		if (InArchiveRecovery)
+		if (InArchiveRecovery())
 		{
 			LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
 			LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
@@ -5827,7 +5827,7 @@ StartupXLOG(void)
 		 * control file and we've established a recovery snapshot from a
 		 * running-xacts WAL record.
 		 */
-		if (ArchiveRecoveryRequested && EnableHotStandby)
+		if (ArchiveRecoveryRequested() && EnableHotStandby)
 		{
 			TransactionId *xids;
 			int			nxids;
@@ -5940,7 +5940,7 @@ StartupXLOG(void)
 		 * recover from an online backup but never called pg_backup_stop(), or
 		 * you didn't archive all the WAL needed.
 		 */
-		if (ArchiveRecoveryRequested || ControlFile->backupEndRequired)
+		if (ArchiveRecoveryRequested() || ControlFile->

Re: Get rid of WALBufMappingLock

2025-02-28 Thread Michael Paquier
On Fri, Feb 28, 2025 at 02:13:23PM +0100, Álvaro Herrera wrote:
> On 2025-Feb-28, Michael Paquier wrote:
>> Saying that, I have also done similar tests with your v12 for a couple
>> of hours and this looks stable under installcheck-world.  I can see
>> that you've reworked quite a bit the surroundings of InitializedFrom
>> in this one.  If you apply that once again at some point, the
>> buildfarm will be judge in the long-term, but I am rather confident by
>> saying that the situation looks better here, at least.
> 
> Heh, no amount of testing can prove lack of bugs; but for sure "it looks
> different now, so it must be correct" must be the weakest proof of
> correctness I've heard of!

Err, okay.  I did use the word "stable" with tests rather than
"correct", and I implied upthread that I did not check the correctness
nor the internals of the patch.  If my words held the meaning you
are implying, well, my apologies for the confusion, I guess.  I only
tested the patch and it was stable while I've noticed a few diffs with
the previous version, but I did *not* check its internals at all, nor
do I mean that I endorse its logic.  I hope that's clear now.
--
Michael


signature.asc
Description: PGP signature


Re: per backend WAL statistics

2025-02-28 Thread Bertrand Drouvot
Hi,

On Fri, Feb 28, 2025 at 11:34:14AM +0900, Michael Paquier wrote:
> On Thu, Feb 27, 2025 at 07:47:09AM +, Bertrand Drouvot wrote:
> > That's how I did it initially but decided to move it to pgstat_backend.c. 
> > The
> > reason was that it's fully linked to "per backend" stats and that there is
> > no SQL api on top of it (while I think that's the case for almost all the 
> > ones
> > in pgstatfuncs.c). Thoughts?
> 
> Okay by me with pgstat_fetch_stat_backend in parallel, why not
> expose this part as well...  Perhaps that could be useful for some
> extension?  I'd rather have out-of-core code do these lookups with the
> same sanity checks in place for the procnumber and slot lookups.

Yeah, that's also a pro for it.

> The name was inconsistent with the rest of the file, so I have settled
> on pgstat_fetch_stat_backend_by_pid() to be more consistent.

Sounds good, thanks!

> A
> second thing is to properly initialize bktype if defined by the
> caller.

Saw that in c2a50ac678e, makes sense.

> Attached is a rebased version of the rest.

The rebased version looks ok.

Also attaching the patch I mentioned up-thread to address some of Rahila's
comments ([1]): it adds an AuxiliaryPidGetProc() call in
pgstat_fetch_stat_backend_by_pid() and pg_stat_reset_backend_stats(). I think
that fully makes sense since a051e71e28a
modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER.

It looks like it does not need doc updates. Attached as 0002 as it's somewhat
unrelated to this thread (but not sure it deserves its own dedicated thread).

[1]: 
https://www.postgresql.org/message-id/CAH2L28v9BwN8_y0k6FQ591=0g2hj_eshlgj3bp38c9nmvyk...@mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/flat/Z8FMjlyNpNicucGa%40paquier.xyz#92d3e2b9ad860708f76b8f9c6f49f32d

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 38ca4b60869a6c96c22e6bdfd33cac07827cef88 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 6 Jan 2025 10:00:00 +
Subject: [PATCH v12 1/2] per backend WAL statistics

Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.

This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.

The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.

XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
 doc/src/sgml/monitoring.sgml| 19 ++
 src/backend/utils/activity/pgstat_backend.c | 64 +
 src/backend/utils/activity/pgstat_wal.c |  1 +
 src/backend/utils/adt/pgstatfuncs.c | 26 -
 src/include/catalog/pg_proc.dat |  7 +++
 src/include/pgstat.h| 37 ++--
 src/include/utils/pgstat_internal.h |  3 +-
 src/test/regress/expected/stats.out | 14 +
 src/test/regress/sql/stats.sql  |  6 ++
 9 files changed, 156 insertions(+), 21 deletions(-)
  15.9% doc/src/sgml/
  39.3% src/backend/utils/activity/
  15.6% src/backend/utils/adt/
   8.8% src/include/catalog/
   4.5% src/include/utils/
   8.4% src/test/regress/expected/
   6.4% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..f4c37c811ba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4860,6 +4860,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage

   
 
+  
+   
+
+ pg_stat_get_backend_wal
+
+pg_stat_get_backend_wal ( integer )
+record
+   
+   
+Returns WAL statistics about the backend with the specified
+process ID. The output fields are exactly the same as the ones in the
+pg_stat_wal view.
+   
+   
+The function does not return WAL statistics for the checkpointer,
+the background writer, the startup process and the autovacuum launcher.
+   
+  
+
   

 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 3c9ebbcd69c..641ba27c95b 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
  */
 static PgStat_BackendPending PendingBackendStats;
 
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_w

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2025-02-28 Thread Tatsuo Ishii
>> BTW, I noticed that in the code path where
>> ignorenulls_getfuncarginframe() is called, WinSetMarkPosition() is
>> never called?
>>
>> Attached version uses the mark_pos at the end.

I did a simple performance test against v8.

EXPLAIN ANALYZE
SELECT
  x,
nth_value(x,2) IGNORE NULLS OVER w
FROM generate_series(1,$i) g(x)
WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING);

I changed $i = 1k, 2k, 3k, 4k, 5k... 10k and got this:

Number    Time (ms)
of rows
-------------------
  1000      28.977
  2000      96.556
  3000     212.019
  4000     383.615
  5000     587.05
  6000     843.23
  7000    1196.177
  8000    1508.52
  9000    1920.593
 10000    2514.069

As you can see, when the number of rows = 1k, it took 28 ms. For 10k
rows, it took 2514 ms, which is 86 times slower than the 1k case (close
to quadratic growth for a 10x increase in rows). Can we enhance this?

Graph attached.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-28 Thread Shubham Khanna
On Thu, Feb 20, 2025 at 4:56 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch quickly!
>
> > > 04.
> > > ```
> > > +# Verify that only user databases got subscriptions (not template 
> > > databases)
> > > +my @user_dbs = ($db1, $db2);
> > > +foreach my $dbname (@user_dbs)
> > > +{
> > > +   my $sub_exists =
> > > + $node_s1->safe_psql($dbname, "SELECT count(*) FROM
> > pg_subscription;");
> > > +   is($sub_exists, '3', "Subscription created successfully for 
> > > $dbname");
> > > +}
> > > ```
> > >
> > > Hmm, what do you want to check here? pg_subscription is a global catalog 
> > > so
> > that
> > > very loop will see exactly the same content. Also, 'postgres' is also the 
> > > user
> > database.
> > > I feel you must ensure that all three databases (postgres, $db1, and 
> > > $db2) have
> > a
> > > subscription here.
> > >
> >
> > Fixed.
>
> My point was that the loop does not have meaning because pg_subscription
> is a global one. Peter and I considered that changes like [1] are needed. It
> ensures that each database has a subscription.
> Note: [1] cannot pass the test as-is because $db1 and $db2 contain special
> characters. Please escape appropriately.
>

Fixed.

> Other comments are listed in below.
>
> 01.
> ```
> + -a
> + --all-dbs
> ```
>
> Peter's comment [2] does not say that option name should be changed.
> The scope of his comment is only in the .c file.
>

Fixed.

> 02.
> ```
> +# Set up node S2 as standby linking to node P
> +$node_p->backup('backup_3');
> +my $node_s2 = PostgreSQL::Test::Cluster->new('node_s2');
> ```
>
> I feel $node_s should be renamed to $node_s1, then you can use $node_s2.
>
> Note that this change may not be needed based on the comment [3].

Fixed.

> We may have to separate the test file.
>
> [1]:
> ```
> # Verify that only user databases got subscriptions (not template databases)
> my @user_dbs = ('postgres', $db1, $db2);
> foreach my $dbname (@user_dbs)
> {
> $result =
> $node_s2->safe_psql('postgres',
> "SELECT count(*) FROM pg_subscription, pg_database 
> WHERE subdbid = pg_database.oid and datname = '$dbname';");
>
> is($result, '1', "Subscription created successfully for $dbname");
> }
> ```

Fixed.

> [2]: 
> https://www.postgresql.org/message-id/CAHut%2BPsatTfk9-F4JBrX_yYE0QGh4wQiTmvS4%3DdnBxcL%3DAK2HA%40mail.gmail.com
> [3]: 
> https://www.postgresql.org/message-id/CAExHW5vmMs5nZ6%3DXcCYAXMJrhVrsW7hOovyg%2BP%2BT9Pkuc7DykA%40mail.gmail.com
>

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v11-0001-Enhance-pg_createsubscriber-to-fetch-and-append-.patch
Description: Binary data


Re: SIMD optimization for list_sort

2025-02-28 Thread John Naylor
On Fri, Feb 28, 2025 at 12:43 PM R, Rakshit  wrote:

> > I don't think "another extension might use it someday" makes a very strong 
> > case,
> > particularly for something that requires a new dependency.
>
> The x86-simdsort library is an optional dependency in Postgres. Also the new 
> list sort implementation which uses the x86-simdsort library does not impact 
> any of the existing workflows in Postgres.

"Optional" and "Does not impact" are not great selling points to get
us to take a 1500 line patch. As we told you in November, list_sort
isn't critical for us. You need to start with the user and work
backwards to the technology. Don't pick a technology and try to sell
people on using it.

> We ran our extension to stress list sort with low cardinality inputs. For eg, 
> for an array of size 100k having repeated values in the range 1-10 we still 
> see a gain of around 20% in throughput.
> We will collect more data for low cardinality inputs and with AVX2 too.

Thanks for the news, those are encouraging results.

-- 
John Naylor
Amazon Web Services




Re: explain analyze rows=%.0f

2025-02-28 Thread Ilia Evdokimov



On 27.02.2025 19:51, Robert Haas wrote:

On Mon, Feb 24, 2025 at 2:16 PM Robert Haas  wrote:

On Mon, Feb 24, 2025 at 12:12 PM Ilia Evdokimov
 wrote:

If no one is concerned about rows being a non-integer, then I support
this, as it seems quite strange to me for the average rows to be integer
only. If we go with this approach, we should also update all examples
in the documentation accordingly. I attached a patch with the changes in
the documentation.

Thanks, that's very helpful. If we go forward with this, I'll commit
your patch and mine together.

Since Tom doesn't seem to want to further object to trying it this
way, I've gone ahead and done this for now. Granted, it's not
altogether clear from the buildfarm that we would have ever gotten any
more failures after 44cbba9a7f51a3888d5087fc94b23614ba2b81f2, but it
seems to me that there's no way to rule out the possibility, and
low-frequency buildfarm failures that might only occur once a year are
in some ways more annoying than high-frequency ones, especially to
people who put a lot of energy into tracking down those failures. It
just seems unprincipled to me to leave things in a state where we know
that such failures are possible, and wonky to have things in a state
where whether parallel query gets used or not could make the
difference between getting a report of X rows and getting a report of
X.00 rows. If a user notices that difference in their own environment
-- regression tests aside -- they're not likely to guess what has
happened.

Of course, if this commit draws objections or runs into problems with
the buildfarm or with users, we might have to reconsider. I'm not dead
set on having it this way rather than any other. But I think it's more
principled than making 0-vs-2 digits predicated on a random event; and
it's a lot simpler than any of the other options that have been
proposed.



Then I am moving the commitfest entry [0] to the Committed status.

[0]: https://commitfest.postgresql.org/patch/5501/

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-28 Thread Shubham Khanna
On Fri, Feb 21, 2025 at 9:44 AM Ashutosh Bapat
 wrote:
>
> On Fri, Feb 21, 2025 at 5:18 AM Peter Smith  wrote:
> >
> > On Thu, Feb 20, 2025 at 10:26 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear Shubham,
> > >
> > > Thanks for updating the patch quickly!
> > >
> > > > > 04.
> > > > > ```
> > > > > +# Verify that only user databases got subscriptions (not template 
> > > > > databases)
> > > > > +my @user_dbs = ($db1, $db2);
> > > > > +foreach my $dbname (@user_dbs)
> > > > > +{
> > > > > +   my $sub_exists =
> > > > > + $node_s1->safe_psql($dbname, "SELECT count(*) FROM
> > > > pg_subscription;");
> > > > > +   is($sub_exists, '3', "Subscription created successfully for 
> > > > > $dbname");
> > > > > +}
> > > > > ```
> > > > >
> > > > > Hmm, what do you want to check here? pg_subscription is a global 
> > > > > catalog so
> > > > that
> > > > > very loop will see exactly the same content. Also, 'postgres' is also 
> > > > > the user
> > > > database.
> > > > > I feel you must ensure that all three databases (postgres, $db1, and 
> > > > > $db2) have
> > > > a
> > > > > subscription here.
> > > > >
> > > >
> > > > Fixed.
> > >
> > > My point was that the loop does not have meaning because pg_subscription
> > > is a global one. Peter and I considered that changes like [1] are needed.
> > > It ensures that each database has a subscription.
> > > Note: [1] cannot pass the test as-is because $db1 and $db2 contain
> > > special
> > > characters. Please escape appropriately.
> >
> > Yes. Some test is still needed to confirm the expected subscriptions
> > all get created for respective dbs. But, the current test loop just
> > isn't doing it properly.
> >
> > >
> > > Other comments are listed in below.
> > >
> > > 01.
> > > ```
> > > + -a
> > > + --all-dbs
> > > ```
> > >
> > > Peter's comment [2] does not say that option name should be changed.
> > > The scope of his comment is only in the .c file.
> >
> > Yes, that's correct. My v9 suggestion to change 'all' to 'all_dbs' was
> > referring only to the field name of struct CreateSubscriberOptions,
> > nothing else. Not the usage help, not the error messages, not the
> > docs, not the tests, not the commit message.
>
> +1. We don't want yet another option to specify all databases. :)
>

Fixed.

The attached patch at [1] contains the suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8RjJ9BsCg%2Bpur307b1JnfQebnmxFZLw4LdcGX7f-%3D6OK1vw%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-28 Thread Thomas Munro
Hi,

If you trigger the new optional NetBSD CI task, the oauthvalidator
tests implode[1].  Apparently that OS's kevent() doesn't like zero
relative timeouts for EVFILT_TIMER[2].  I see that you worked around
the same problem for Linux timerfd already by rounding 0 up to 1, so
we could just do the same here, and it passes with the attached.  A
cute alternative, not tested, might be to put NOTE_ABSTIME into fflag
if timeout == 0 (then it's an absolute time in the past, which should
fire immediately).
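
For the record, an untested sketch of that alternative (assuming
NOTE_ABSTIME and NOTE_MSECONDS are available and behave as documented):

/* Untested: an absolute time in the past should fire immediately. */
if (timeout == 0)
    EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT,
           NOTE_ABSTIME | NOTE_MSECONDS, 1, 0);
else
    EV_SET(&ev, 1, EVFILT_TIMER,
           timeout < 0 ? EV_DELETE : (EV_ADD | EV_ONESHOT),
           0, timeout, 0);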

But I'm curious, how hard would it be to do this ↓ instead and not
have that problem on any OS?

 * There might be an optimization opportunity here: if timeout == 0, we
 * could signal drive_request to immediately call
 * curl_multi_socket_action, rather than returning all the way up the
 * stack only to come right back. But it's not clear that the additional
 * code complexity is worth it.

[1] https://cirrus-ci.com/task/6354435774873600
[2] 
https://github.com/NetBSD/src/blob/67c7c4658e77aa4534b6aac8c041d77097c5e722/sys/kern/kern_event.c#L1375
From 7deb153caf552c9bcb380f88eddbca94be33a0c2 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 1 Mar 2025 00:27:01 +1300
Subject: [PATCH] Fix OAUTH on NetBSD.

NetBSD's EVFILT_TIMER doesn't like zero relative timeouts and fails with
EINVAL.  Steal the workaround for the same problem on Linux from a few
lines up: round 0 up to 1.

This could be seen on the optional NetBSD CI task, but not on the NetBSD
build farm animals because they aren't finding curl and testing this
code.
---
 src/interfaces/libpq/fe-auth-oauth-curl.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/interfaces/libpq/fe-auth-oauth-curl.c b/src/interfaces/libpq/fe-auth-oauth-curl.c
index ae339579f88..6e60a81574d 100644
--- a/src/interfaces/libpq/fe-auth-oauth-curl.c
+++ b/src/interfaces/libpq/fe-auth-oauth-curl.c
@@ -1363,6 +1363,16 @@ set_timer(struct async_ctx *actx, long timeout)
 #ifdef HAVE_SYS_EVENT_H
 	struct kevent ev;
 
+#ifdef __NetBSD__
+
+	/*
+	 * Work around NetBSD's rejection of zero timeouts (EINVAL), a bit like
+	 * timerfd above.
+	 */
+	if (timeout == 0)
+		timeout = 1;
+#endif
+
 	/* Enable/disable the timer itself. */
 	EV_SET(&ev, 1, EVFILT_TIMER, timeout < 0 ? EV_DELETE : (EV_ADD | EV_ONESHOT),
 		   0, timeout, 0);
-- 
2.48.1



Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-02-28 Thread Ranier Vilela
Hi Álvaro.

Em qui., 27 de fev. de 2025 às 16:50, Álvaro Herrera <
alvhe...@alvh.no-ip.org> escreveu:

> On 2025-Feb-14, Ranier Vilela wrote:
>
> > Attached is the prototype version v1.
> > What do you think?
>
> I think this is still a bit confused.  The new function's comment says
> "prepare the list of tables to ..." but what it really receives is a
> list of _indexes_ (as evidenced by the fact that they're compared to
> pg_index.indexrelid).  So on input the user_list is an index list, and
> on output it has been changed into a list of tables, and the list of
> indexes is the function's return value?  That seems quite weird.

Yeah, I think that is confusing.

> I propose to change the API of this new function thus:
>
> static void
> get_parallel_tabidx_list(PGconn *conn,
>                          SimpleStringList *index_list,
>                          SimpleStringList *table_list,
>                          bool echo);
>
> where index_list is inout and the table_list is just an output argument.
>
Ok.


>
> I would also remove the 'type' argument, because I don't see a need to
> keep it.
>
Ok.

Thanks for your guidance.

v2 attached, please comment if you have any further objections.

best regards,
Ranier Vilela


v2-simplifies-reindex-one-database-reindexdb.patch
Description: Binary data


Re: Implement waiting for wal lsn replay: reloaded

2025-02-28 Thread Yura Sokolov
28.02.2025 16:03, Yura Sokolov wrote:
> 17.02.2025 00:27, Alexander Korotkov wrote:
>> On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov  
>> wrote:
>>> I briefly looked into patch and have couple of minor remarks:
>>>
>>> 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
>>> problems, but still don't like it. I'd prefer to see local fixed array, say
>>> of 16 elements, and loop around remaining function body acting in batch of
>>> 16 wakeups. Doubtfully there will be more than 16 waiting clients often,
>>> and even then it wont be much heavier than fetching all at once.
>>
>> OK, I've refactored this to use static array of 16 size.  palloc() is
>> used only if we don't fit static array.
> 
> I've rebased patch and:
> - fixed compiler warning in wait.c ("maybe uninitialized 'result'").
> - made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to
> keep indentation, perhaps `do {} while` would be better?
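
For illustration, the batch-of-16 shape looks roughly like this (the
collect_ready_waiters() helper is hypothetical, standing in for the
locked scan of the pairing heap; it is not the patch's actual code):

	#define WAKEUP_BATCH 16

	PGPROC	   *batch[WAKEUP_BATCH];
	int			n;

	do
	{
		/* under the lock, pop up to WAKEUP_BATCH satisfied waiters */
		n = collect_ready_waiters(batch, WAKEUP_BATCH);

		for (int i = 0; i < n; i++)
			SetLatch(&batch[i]->procLatch);
	} while (n == WAKEUP_BATCH);	/* a full batch may mean more remain */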

And fixed:
   'WAIT' is marked as BARE_LABEL in kwlist.h, but it is missing from
gram.y's bare_label_keyword rule

---
regards
Yura Sokolov aka funny-falcon

From d9c44427a4cbecd6dd27edae48ea42d933756ff9 Mon Sep 17 00:00:00 2001
From: Yura Sokolov 
Date: Fri, 28 Feb 2025 15:40:18 +0300
Subject: [PATCH v4] Implement WAIT FOR command

WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed.  This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top.  During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

WAIT FOR needs to wait without any snapshot held.  Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why a separate utility command appears to be the most robust
way to implement this functionality.  It's not possible to implement this as
a function.  Previous experience shows that stored procedures also have
limitations in this respect.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
---
 doc/src/sgml/ref/allfiles.sgml|   1 +
 doc/src/sgml/reference.sgml   |   1 +
 src/backend/access/transam/Makefile   |   3 +-
 src/backend/access/transam/meson.build|   1 +
 src/backend/access/transam/xact.c |   6 +
 src/backend/access/transam/xlog.c |   7 +
 src/backend/access/transam/xlogrecovery.c |  11 +
 src/backend/access/transam/xlogwait.c | 347 ++
 src/backend/commands/Makefile |   3 +-
 src/backend/commands/meson.build  |   1 +
 src/backend/commands/wait.c   | 185 ++
 src/backend/lib/pairingheap.c |  18 +-
 src/backend/parser/gram.y |  15 +-
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/lmgr/proc.c   |   6 +
 src/backend/tcop/pquery.c |  12 +-
 src/backend/tcop/utility.c|  22 ++
 .../utils/activity/wait_event_names.txt   |   2 +
 src/include/access/xlogwait.h |  89 +
 src/include/commands/wait.h   |  21 ++
 src/include/lib/pairingheap.h |   3 +
 src/include/nodes/parsenodes.h|   7 +
 src/include/parser/kwlist.h   |   1 +
 src/include/storage/lwlocklist.h  |   1 +
 src/include/tcop/cmdtaglist.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/045_wait_for_lsn.pl   | 217 +++
 src/tools/pgindent/typedefs.list  |   4 +
 28 files changed, 978 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/access/transam/xlogwait.c
 create mode 100644 src/backend/commands/wait.c
 create mode 100644 src/include/access/xlogwait.h
 create mode 100644 src/include/commands/wait.h
 create mode 100644 src/test/recovery/t/045_wait_for_lsn.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..8b585cba751 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..bd14ec00d2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -216,6 +216,7 @@
&update;
&vacuum;
&values;
+   &wait;
 
  
 
diff --git a/

Re: long-standing data loss bug in initial sync of logical replication

2025-02-28 Thread Amit Kapila
On Fri, Feb 28, 2025 at 6:15 AM Masahiko Sawada  wrote:
>
> On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Monday, February 24, 2025 5:50 PM Amit Kapila  
> > wrote:
> > >
> > > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > > I confirmed that the proposed patch fixes these issues. I have one
> > > > question about the patch:
> > > >
> > > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> > > > following code:
> > > >
> > > >/*
> > > > * If we don't have a base snapshot yet, there are no changes in 
> > > > this
> > > > * transaction which in turn implies we don't yet need a 
> > > > snapshot at
> > > > * all. We'll add a snapshot when the first change gets queued.
> > > > *
> > > > * NB: This works correctly even for subtransactions because
> > > > * ReorderBufferAssignChild() takes care to transfer the base
> > > snapshot
> > > > * to the top-level transaction, and while iterating the 
> > > > changequeue
> > > > * we'll get the change from the subtxn.
> > > > */
> > > >if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> > > >continue;
> > > >
> > > > Is there any case where we need to distribute inval messages to
> > > > transactions that don't have the base snapshot yet but eventually need
> > > > the inval messages?
> > > >
> > >
> > > Good point. It is mentioned that for snapshots: "We'll add a snapshot
> > > when the first change gets queued.". I think we achieve this via
> > > builder->committed.xip array such that when we set a base snapshot for
> > > a transaction, we use that array to form a snapshot. However, I don't
> > > see any such consideration for invalidations. Now, we could either
> > > always add invalidations to xacts that don't have base_snapshot yet or
> > > have a mechanism similar committed.xid array. But it is better to
> > > first reproduce the problem.
> >
> > I think distributing invalidations to a transaction that has not yet built a
> > base snapshot is un-necessary. This is because, during the process of 
> > building
> > its base snapshot, such a transaction will have already recorded the XID of 
> > the
> > transaction that altered the publication information into its array of
> > committed XIDs. Consequently, it will reflect the latest changes in the 
> > catalog
> > from the beginning. In the context of logical decoding, this scenario is
> > analogous to decoding a new transaction initiated after the catalog-change
> > transaction has been committed.
> >
> > The original issue arises because the catalog cache was constructed using an
> > outdated snapshot that could not reflect the latest catalog changes. 
> > However,
> > this is not a problem in cases without a base snapshot. Since the existing
> > catalog cache should have been invalidated upon decoding the committed
> > catalog-change transaction, the subsequent transactions will construct a new
> > cache with the latest snapshot.
>
> I've also concluded it's not necessary but the reason and analysis
> might be somewhat different. IIUC in the original issue (looking at
> Andres's reproducer[1]), the fact that when replaying a
> non-catalog-change transaction, the walsender constructed the snapshot
> that doesn't reflect the catalog change is fine because the first
> change of that transaction was made before the catalog change. The
> problem is that the walsender process absorbed the invalidation
> message when replaying the change that happened before the catalog
> change, and ended up keeping replaying the subsequent changes with
> that snapshot. That is why we concluded that we need to distribute the
> invalidation messages to concurrently decoded transactions so that we
> can invalidate the cache again at that point. As the comment
> mentioned, the base snapshot is set before queuing any changes, so if
> the transaction doesn't have the base snapshot yet, there must be no
> queued change that happened before the catalog change. The
> transactions that initiated after the catalog change don't have this
> issue.
>

I think both of you are saying the same thing in slightly different
words. Hou-San's explanation goes into more detail at the code level,
while yours takes a slightly higher-level view.
Additionally, for streaming transactions where we would have already
sent one or more streams, we don't need anything special since they
behave similarly to a transaction having a base snapshot because we
save the snapshot after sending each stream.

-- 
With Regards,
Amit Kapila.




Re: long-standing data loss bug in initial sync of logical replication

2025-02-28 Thread Amit Kapila
On Fri, Feb 28, 2025 at 9:45 AM Amit Kapila  wrote:
>
> On Fri, Feb 28, 2025 at 6:15 AM Masahiko Sawada  wrote:
> >
> > On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> > >
> > > I think distributing invalidations to a transaction that has not yet 
> > > built a
> > > base snapshot is un-necessary. This is because, during the process of 
> > > building
> > > its base snapshot, such a transaction will have already recorded the XID 
> > > of the
> > > transaction that altered the publication information into its array of
> > > committed XIDs. Consequently, it will reflect the latest changes in the 
> > > catalog
> > > from the beginning. In the context of logical decoding, this scenario is
> > > analogous to decoding a new transaction initiated after the catalog-change
> > > transaction has been committed.
> > >
> > > The original issue arises because the catalog cache was constructed using 
> > > an
> > > outdated snapshot that could not reflect the latest catalog changes. 
> > > However,
> > > this is not a problem in cases without a base snapshot. Since the existing
> > > catalog cache should have been invalidated upon decoding the committed
> > > catalog-change transaction, the subsequent transactions will construct a 
> > > new
> > > cache with the latest snapshot.
> >
> > I've also concluded it's not necessary but the reason and analysis
> > might be somewhat different.
>

Based on the discussion on this point and Hou-San's proposed comment,
I have tried to add/edit a few comments in the 0001 patch. See if those
make sense to you; it is important to capture the reason and theory we
discussed here in the form of comments so that it will be easy to
recall the reasoning in the future.

-- 
With Regards,
Amit Kapila.


v17-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch
Description: Binary data


Re: Improve CRC32C performance on SSE4.2

2025-02-28 Thread John Naylor
Hi Raghuveer,

You raised some interesting points, which deserve a thoughtful
response. After sleeping on it, however, I came to the conclusion that
a sweeping change in runtime checks, with either of our approaches,
has downsides and unresolved questions. Perhaps we can come back to it
at a later time. For this release cycle, I took a step back and tried
to think of the least invasive way to solve the immediate problem,
which is: How to allow existing builds with "-msse4.2" to take
advantage of CLMUL while not adding overhead. Here's what I came up
with in v11:

0001: same benchmark module as before
0002: For SSE4.2 builds, arrange so that constant input uses an
inlined path so that the compiler can emit unrolled loops anywhere.
This is particularly important for the WAL insertion lock, so this is
possibly committable on its own just for that.
0003: the PCLMUL path, only for runtime-check builds
0004: the PCLMUL path for SSE4.2 builds. This uses a function pointer
for long-ish input and the same inlined path as above for short input
(whether constant or not), so it gets the best of both worlds.
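
To make 0002's inline-dispatch idea concrete, the shape is roughly this
(simplified from the patch; pg_comp_crc32c_sse42_inline is the header's
inlineable implementation):

	static inline pg_crc32c
	pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
	{
		/* compile-time-constant short lengths: let the compiler unroll */
		if (__builtin_constant_p(len) && len < 64)
			return pg_comp_crc32c_sse42_inline(crc, data, len);
		else
			return pg_comp_crc32c_sse42(crc, data, len);
	}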

There is also a separate issue:

On Tue, Feb 25, 2025 at 6:05 PM John Naylor  wrote:
>
> Another thing I found in Agner's manuals: AMD Zen, even as recently as
> Zen 4, don't have as good a microarchitecture for PCLMUL, so if anyone
> with such a machine would like to help test the cutoff

David Rowley shared some results off-list, which are: Zen 4 is very
good with this algorithm even at 64 bytes input length, but Zen 2
regresses up to maybe 256 bytes. Having a large cutoff to cover all
bases makes this less useful, and that was one of my reservations
about AVX-512. However, with the corsix generator I found it's
possible to specify AVX-512 with a single accumulator (rather than 4),
which still gives a minimum input of 64 bytes, so I'll plan to put
something together to demonstrate.

(Older machines could use the 3-way stream algorithm as a fallback on
long inputs, as I've mentioned elsewhere, assuming that's legally
unencumbered.)

-- 
John Naylor
Amazon Web Services
From f96ae8bf6913796c5d724ff9da25bbc79927ff1c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 28 Feb 2025 18:27:40 +0700
Subject: [PATCH v11 4/4] Use runtime check even when we have SSE 4.2 at
 compile time

This allows us to use PCLMUL for longer inputs. Short inputs are
inlined to avoid the indirection through a function pointer.
---
 configure |  2 +-
 configure.ac  |  2 +-
 src/include/port/pg_crc32c.h  | 15 +++
 src/port/meson.build  |  1 +
 src/port/pg_crc32c_sse42_choose.c |  2 ++
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/configure b/configure
index 93fddd69981..91c0ffc8272 100755
--- a/configure
+++ b/configure
@@ -17684,7 +17684,7 @@ if test x"$USE_SSE42_CRC32C" = x"1"; then
 
 $as_echo "#define USE_SSE42_CRC32C 1" >>confdefs.h
 
-  PG_CRC32C_OBJS="pg_crc32c_sse42.o"
+  PG_CRC32C_OBJS="pg_crc32c_sse42.o pg_crc32c_sse42_choose.o"
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: SSE 4.2" >&5
 $as_echo "SSE 4.2" >&6; }
 else
diff --git a/configure.ac b/configure.ac
index b6d02f5ecc7..a85bdbd4ff6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2151,7 +2151,7 @@ fi
 AC_MSG_CHECKING([which CRC-32C implementation to use])
 if test x"$USE_SSE42_CRC32C" = x"1"; then
   AC_DEFINE(USE_SSE42_CRC32C, 1, [Define to 1 use Intel SSE 4.2 CRC instructions.])
-  PG_CRC32C_OBJS="pg_crc32c_sse42.o"
+  PG_CRC32C_OBJS="pg_crc32c_sse42.o pg_crc32c_sse42_choose.o"
   AC_MSG_RESULT(SSE 4.2)
 else
   if test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"1"; then
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index fe0e1b6b275..26b676dddc9 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -55,22 +55,29 @@ typedef uint32 pg_crc32c;
 	((crc) = pg_comp_crc32c_dispatch((crc), (data), (len)))
 #define FIN_CRC32C(crc) ((crc) ^= 0x)
 
+extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
 extern pg_crc32c pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len);
+#ifdef USE_PCLMUL_WITH_RUNTIME_CHECK
+extern pg_crc32c pg_comp_crc32c_pclmul(pg_crc32c crc, const void *data, size_t len);
+#endif
 
 static inline
 pg_crc32c
 pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
 {
-	if (__builtin_constant_p(len) && len < 64)
+	if (len < 64)
 	{
 		/*
-		 * For small constant inputs, inline the computation. This allows the
-		 * compiler to unroll loops.
+		 * For small inputs, inline the computation to avoid the runtime
+		 * check. This also allows the compiler to unroll loops for constant
+		 * input.
 		 */
 		return pg_comp_crc32c_sse42_inline(crc, data, len);
 	}
 	else
-		return pg_comp_crc32c_sse42(crc, data, len);
+		/* For larger inputs, use a runtime check for PCLMUL instruct

Reduce the instruction overhead of OpenSSL calls

2025-02-28 Thread ryanewang(王蕾)
Hi,
According to the OpenSSL documentation, when the value of ret is greater
than 0, SSL_get_error() returns SSL_ERROR_NONE.
So it seems that when the return value of SSL_read() or SSL_write() is
greater than 0, we don't need to check for an error at all.
The attached patch attempts to remove these unnecessary error checks.
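
In other words (a minimal sketch of the idea, not the attached patch
itself; the read_tls() wrapper name is made up for illustration):

	#include <openssl/ssl.h>

	static int
	read_tls(SSL *ssl, void *buf, int len)
	{
		int			n = SSL_read(ssl, buf, len);

		if (n > 0)
			return n;			/* success: no SSL_get_error() call needed */

		switch (SSL_get_error(ssl, n))
		{
			case SSL_ERROR_WANT_READ:
			case SSL_ERROR_WANT_WRITE:
				return -2;		/* would block: retry later */
			default:
				return -1;		/* hard failure */
		}
	}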


I would be glad to get feedback and reviews!






Tencent
ryanewang
R&D Group 4

0001-Patch-Remove-unnecessary-OpenSSL-error-judgment.patch
Description: Binary data


Re: Add “FOR UPDATE NOWAIT” lock details to the log.

2025-02-28 Thread Yuki Seino


On 2025/02/27 15:44, Yuki Seino wrote:
> (4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
>
> ...I don't know how to reproduce it.

I have confirmed that (4) can be reproduced using the following procedure.

(4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463

tx1=# SELECT pg_advisory_lock(0);
tx2=# SELECT * FROM pgbench_accounts WHERE pg_advisory_lock(0) IS NOT NULL FOR UPDATE NOWAIT;

tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
tx1=# BEGIN;
tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
tx1=# SELECT pg_advisory_unlock(0);

Send the modified patch.

Regard,diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e55700f35b8..ecd5e36ffbc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7678,6 +7678,25 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_lock_failure (boolean)
+  
+   log_lock_failure configuration parameter
+  
+  
+  
+   
+Controls whether a log message is produced when a lock acquisition
+fails.  This is useful for analyzing the causes of lock failures.
+Currently, only lock failures due to SELECT NOWAIT
+are supported.  The default is off.  Only superusers
+and users with the appropriate SET privilege can change
+this setting.
+   
+  
+ 
+
+
  
   log_recovery_conflict_waits (boolean)
   
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..e640e55a285 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -98,7 +98,8 @@ static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 in
                             Relation rel, ItemPointer ctid, XLTW_Oper oper,
                             int *remaining);
 static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
-                                       uint16 infomask, Relation rel, int *remaining);
+                                       uint16 infomask, Relation rel, int *remaining,
+                                       LockHoldersAndWaiters *lockHoldersAndWaiters);
 static void index_delete_sort(TM_IndexDeleteOp *delstate);
 static int  bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
@@ -162,8 +163,8 @@ static const struct
     LockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
 #define UnlockTupleTuplock(rel, tup, mode) \
     UnlockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
-#define ConditionalLockTupleTuplock(rel, tup, mode) \
-    ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
+#define ConditionalLockTupleTuplock(rel, tup, mode, buf) \
+    ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock, (buf))
 
 #ifdef USE_PREFETCH
 /*
@@ -4894,7 +4895,7 @@ l3:
                 case LockWaitSkip:
                     if (!ConditionalMultiXactIdWait((MultiXactId) xwait,
                                                     status, infomask, relation,
-                                                    NULL))
+                                                    NULL, NULL))
                     {
                         result = TM_WouldBlock;
                         /* recovery code expects to have buffer lock held */
@@ -4903,13 +4904,25 @@ l3:
                     }
                     break;
                 case LockWaitError:
+                    LockHoldersAndWaiters *lockHoldersAndWaiters = CreateLockHoldersAndWaiters();
                     if (!ConditionalMultiXactIdWait((MultiXactId) xwait,
                                                     status, infomask, relation,
-                                                    NULL))
+                                                    NULL, lockHoldersAndWaiters))
                         ereport(ERROR,
-                                (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

2025-02-28 Thread Alena Rybakina

Hi!

On 21.02.2025 00:09, Alena Rybakina wrote:


Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed 
too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), 
(1)));


I added it and attached a patch with a diff file. To be honest, apart
from Vars wrapped in volatile functions, I didn't find any queries where
the transform can't be applied.

I'm only unsure about cases where a Var could refer to something outside
the available_rels list, but I couldn't come up with an example where
that's possible. What do you think?


--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 08c4648f936..7c1b2f4cfe6 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1215,8 +1215,8 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 }
 
 /*
- * The function traverses the tree looking for elements of type var.
- * If it finds it, it returns true.
+ * The function traverses the tree looking for subqueries, particularly
+ * elements of type RangeTblEntry.  If it finds one, it returns true.
  */
 static bool
 values_simplicity_check_walker(Node *node, void *ctx)
@@ -1225,7 +1225,7 @@ values_simplicity_check_walker(Node *node, void *ctx)
 	{
 		return false;
 	}
-	else if(IsA(node, Var))
+	else if(IsA(node, RangeTblEntry))
 		return true;
 	else if(IsA(node, Query))
 		return query_tree_walker((Query *) node,
@@ -1302,6 +1302,22 @@ convert_VALUES_to_ANY(Query *query, Node *testexpr)
 		List *elem = lfirst(lc);
 		Node *value = linitial(elem);
 
+		if(IsA(value, Var))
+		{
+			/*
+			 * We don't risk optimizing if the var is volatile, either.
+			 */
+			if(contain_volatile_functions(value))
+				return NULL;
+
+			/*
+			 * Upper-level vars will now be one level closer to their
+			 * parent than before; in particular, anything that had been level 1
+			 * becomes level zero.
+			 */
+			IncrementVarSublevelsUp(value, -1, 1);
+		}
+
 		value = eval_const_expressions(NULL, value);
 
 		if (!IsA(value, Const))
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 5138a00349c..63e0114f177 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -3095,3 +3095,14 @@ SELECT ten FROM onek t WHERE 1.0 IN (VALUES (1), (3));
  ->  Values Scan on "*VALUES*"
 (5 rows)
 
+EXPLAIN (COSTS OFF)
+SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t1.ten), (1)));
+                    QUERY PLAN                    
+--------------------------------------------------
+ Nested Loop
+   Join Filter: (t2.ten = ANY (ARRAY[t1.ten, 1]))
+   ->  Seq Scan on onek t1
+   ->  Materialize
+         ->  Seq Scan on onek t2
+(5 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 480f8d7b852..088619f0ffc 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1357,3 +1357,6 @@ SELECT ten FROM onek t WHERE 1.0 IN ((VALUES (1), (3))::integer);
 
 EXPLAIN (COSTS OFF)
 SELECT ten FROM onek t WHERE 1.0 IN (VALUES (1), (3));
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t1.ten), (1)));
From 381b9aefaa08c5ea21145339fd56cbfbf08bcfe4 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 20 Feb 2025 23:30:01 +0300
Subject: [PATCH 2/2] Add an implementation of the x IN VALUES to 'x = ANY
 [...]', a special case of the ScalarArrayOpExpr operation.

It will simplify the query tree, eliminating the appearance of
an unnecessary join.

Since VALUES describes a relational table, and the value of such
a list is a table row, the optimizer will likely face an underestimation
problem due to the inability to estimate cardinality through
MCV statistics. The cardinality evaluation mechanism
can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform
a statistical evaluation element by element.

We perform the transformation in convert_ANY_sublink_to_join
if the VALUES RTE is suitable and the expression is convertible.
The conversion is only possible for operations on scalar values.

Authors: Andrei Lepikhov ,
	 Alena Rybakina 
Reviewed-by: Ivan Kush ,
	 Alexander Korotkov 
---
 src/backend/optimizer/plan/subselect.c| 122 ++
 src/backend/optimizer/

Re: per backend WAL statistics

2025-02-28 Thread Michael Paquier
On Fri, Feb 28, 2025 at 09:26:08AM +, Bertrand Drouvot wrote:
> Also attaching the patch I mentioned up-thread to address some of Rahila's
> comments ([1]): It adds a AuxiliaryPidGetProc() call in 
> pgstat_fetch_stat_backend_by_pid()
> and pg_stat_reset_backend_stats(). I think that fully makes sense since 
> a051e71e28a
> modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> and B_WAL_WRITER.

Oops, yes, you are right on this one.  This change should have
happened earlier.  The flow you are using in 0002 is similar to
pg_log_backend_memory_contexts(), which looks OK at a quick glance.
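
For reference, the fallback shape under discussion is roughly this (a
sketch of the lookup, not the patch itself):

	PGPROC	   *proc = BackendPidGetProc(pid);

	/* regular backends first, then auxiliary ones such as the WAL writer */
	if (proc == NULL)
		proc = AuxiliaryPidGetProc(pid);

	if (proc == NULL)
		ereport(WARNING,
				(errmsg("PID %d is not a PostgreSQL server process", pid)));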
--
Michael


signature.asc
Description: PGP signature


Re: Get rid of WALBufMappingLock

2025-02-28 Thread Yura Sokolov
26.02.2025 14:48, Alexander Korotkov wrote:
> Hi, Michael!
> 
> On Wed, Feb 26, 2025 at 3:04 AM Michael Paquier  wrote:
>>
>> On Tue, Feb 25, 2025 at 05:19:29PM +0200, Alexander Korotkov wrote:
>>> It seems that I managed to reproduce the issue on my Raspberry PI 4.
>>> After running our test suite in a loop for 2 days I found one timeout.
>>
>> Hmm.  It's surprising to not see a higher occurence.  My buildfarm
>> host has caught that on its first run after the patch, for two
>> different animals which are both on the same machine.
>>
>>> One way or another, we need protection against this situation any way.
>>> The updated patch is attached.  Now, after acquiring ReservedPtr it
>>> waits till OldPageRqstPtr gets initialized.  Additionally I've to
>>> implement more accurate calculation of OldPageRqstPtr.  I run tests
>>> with new patch on my Raspberry in a loop.  Let's see how it goes.
>>
>> Perhaps you'd prefer that I do more tests with your patch?  This is
>> time-consuming for you.  This is not a review of the internals of the
>> patch, and I cannot give you access to the host, but if my stuff is
>> the only place where we have a good reproducibility of the issue, I'm
>> OK to grab some time and run a couple of checks to avoid again a
>> freeze of the buildfarm.
> 
> Thank you for offering the help.  Updated version of patch is attached
> (I've added one memory barrier there just in case).  I would
> appreciate if you could run it on batta, hachi or similar hardware.

Good day, Alexander.

Checked your additions to the patch. They're clear and robust.

---
regards
Yura Sokolov aka funny-falcon




Re: Extend postgres_fdw_get_connections to return remote backend pid

2025-02-28 Thread Sagar Shedge
Hi Fujii,

Thanks for the updates.
Looks good to me. We can proceed with the latest patch.

-- 
Sagar Dilip Shedge,
SDE AWS




Re: long-standing data loss bug in initial sync of logical replication

2025-02-28 Thread Amit Kapila
On Mon, Feb 24, 2025 at 4:49 PM Shlok Kyal  wrote:
>
> Patches need a rebase. Attached the rebased patch.
>

I would like to discuss 0002 patch:
 publication_invalidation_cb(Datum arg, int cacheid, uint32 hashvalue)
 {
  publications_valid = false;
-
- /*
- * Also invalidate per-relation cache so that next time the filtering info
- * is checked it will be updated with the new publication settings.
- */
- rel_sync_cache_publication_cb(arg, cacheid, hashvalue);
 }

 /*
@@ -1970,18 +1964,6 @@ init_rel_sync_cache(MemoryContext cachectx)
rel_sync_cache_publication_cb,
(Datum) 0);

- /*
- * Flush all cache entries after any publication changes.  (We need no
- * callback entry for pg_publication, because publication_invalidation_cb
- * will take care of it.)
- */
- CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
-   rel_sync_cache_publication_cb,
-   (Datum) 0);
- CacheRegisterSyscacheCallback(PUBLICATIONNAMESPACEMAP,
-   rel_sync_cache_publication_cb,
-   (Datum) 0);

In the 0002 patch, we are improving performance by avoiding
invalidation processing in a number of cases. Basically, the claim is
that we are unnecessarily invalidating all the RelSyncCache entries
when only a particular relation's entry needs to be invalidated. I have
not verified it, but IIUC, this should be an independent improvement
atop HEAD; if so, then we should start a separate thread to discuss it.

Thoughts?

-- 
With Regards,
Amit Kapila.




RE: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-28 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for updating the patch.

I think the modification [1] is not correct - the loop is meaningless
because the same query would be executed every time. How about an idea
like the attached? Here, instead of trying to escape the dbname, the
dbname is directly obtained from the instance and the two are compared.

What do you think?

[1]:
```
+# Verify that only user databases got subscriptions (not template databases)
+my @user_dbs = ('postgres', $db1, $db2);
+foreach my $dbname (@user_dbs)
+{
+   $result = $node_s2->safe_psql('postgres',
+   "SELECT count(*) FROM pg_subscription, pg_database WHERE 
subdbid = pg_database.oid and datistemplate = 'f';"
+   );
+   is($result, '3', "Subscription created successfully for $dbname");
+   $result = $node_s2->safe_psql('postgres',
+   "SELECT count(*) FROM pg_subscription, pg_database WHERE 
subdbid = pg_database.oid and datistemplate = 't';"
+   );
+   is($result, '0', "Subscription created successfully for $dbname");
+}
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED



kuroda_fix.diffs
Description: kuroda_fix.diffs


Re: Implement waiting for wal lsn replay: reloaded

2025-02-28 Thread Yura Sokolov
17.02.2025 00:27, Alexander Korotkov wrote:
> On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov  wrote:
>> I briefly looked into patch and have couple of minor remarks:
>>
>> 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue
>> problems, but still don't like it. I'd prefer to see local fixed array, say
>> of 16 elements, and loop around remaining function body acting in batch of
>> 16 wakeups. Doubtfully there will be more than 16 waiting clients often,
>> and even then it wont be much heavier than fetching all at once.
> 
> OK, I've refactored this to use static array of 16 size.  palloc() is
> used only if we don't fit static array.

I've rebased patch and:
- fixed compiler warning in wait.c ("maybe uninitialized 'result'").
- made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to
keep indentation, perhaps `do {} while` would be better?

---
regards
Yura Sokolov aka funny-falcon

From fa107e15eab3ec2493f0663f03b563d49979e0b5 Mon Sep 17 00:00:00 2001
From: Yura Sokolov 
Date: Fri, 28 Feb 2025 15:40:18 +0300
Subject: [PATCH v3] Implement WAIT FOR command

WAIT FOR is to be used on standby and specifies waiting for
the specific WAL location to be replayed.  This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top.  During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

WAIT FOR needs to wait without any snapshot held.  Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why a separate utility command appears to be the most robust
way to implement this functionality.  It's not possible to implement this as
a function.  Previous experience shows that stored procedures also have
limitations in this respect.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
---
 doc/src/sgml/ref/allfiles.sgml|   1 +
 doc/src/sgml/reference.sgml   |   1 +
 src/backend/access/transam/Makefile   |   3 +-
 src/backend/access/transam/meson.build|   1 +
 src/backend/access/transam/xact.c |   6 +
 src/backend/access/transam/xlog.c |   7 +
 src/backend/access/transam/xlogrecovery.c |  11 +
 src/backend/access/transam/xlogwait.c | 347 ++
 src/backend/commands/Makefile |   3 +-
 src/backend/commands/meson.build  |   1 +
 src/backend/commands/wait.c   | 185 ++
 src/backend/lib/pairingheap.c |  18 +-
 src/backend/parser/gram.y |  14 +-
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/lmgr/proc.c   |   6 +
 src/backend/tcop/pquery.c |  12 +-
 src/backend/tcop/utility.c|  22 ++
 .../utils/activity/wait_event_names.txt   |   2 +
 src/include/access/xlogwait.h |  89 +
 src/include/commands/wait.h   |  21 ++
 src/include/lib/pairingheap.h |   3 +
 src/include/nodes/parsenodes.h|   7 +
 src/include/parser/kwlist.h   |   1 +
 src/include/storage/lwlocklist.h  |   1 +
 src/include/tcop/cmdtaglist.h |   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/045_wait_for_lsn.pl   | 217 +++
 src/tools/pgindent/typedefs.list  |   4 +
 28 files changed, 977 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/access/transam/xlogwait.c
 create mode 100644 src/backend/commands/wait.c
 create mode 100644 src/include/access/xlogwait.h
 create mode 100644 src/include/commands/wait.h
 create mode 100644 src/test/recovery/t/045_wait_for_lsn.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..8b585cba751 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..bd14ec00d2d 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -216,6 +216,7 @@
&update;
&vacuum;
&values;
+   &wait;
 
  
 
diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db7..a32f473e0a2 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

2025-02-28 Thread Fujii Masao




On 2025/02/28 21:08, Yuki Seino wrote:
> On 2025/02/27 15:44, Yuki Seino wrote:
>> (4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
>> ...I don't know how to reproduce it.
>
> I have confirmed that (4) can be reproduced using the following procedure.
>
> (4) https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam_handler.c#L463
> tx1=# SELECT pg_advisory_lock(0);
> tx2=# SELECT * FROM pgbench_accounts WHERE pg_advisory_lock(0) IS NOT NULL FOR UPDATE NOWAIT;
> tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
> tx1=# BEGIN;
> tx1=# UPDATE pgbench_accounts SET aid = aid WHERE aid = '1';
> tx1=# SELECT pg_advisory_unlock(0);
>
> Send the modified patch.


Thanks for updating the patch!

I encountered a compilation error with the patch. You can also see the error in 
the patch tester.
https://cirrus-ci.com/task/5070779370438656

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-28 Thread Shubham Khanna
On Fri, Feb 21, 2025 at 5:18 AM Peter Smith  wrote:
>
> On Thu, Feb 20, 2025 at 10:26 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Shubham,
> >
> > Thanks for updating the patch quickly!
> >
> > > > 04.
> > > > ```
> > > > +# Verify that only user databases got subscriptions (not template 
> > > > databases)
> > > > +my @user_dbs = ($db1, $db2);
> > > > +foreach my $dbname (@user_dbs)
> > > > +{
> > > > +   my $sub_exists =
> > > > + $node_s1->safe_psql($dbname, "SELECT count(*) FROM
> > > pg_subscription;");
> > > > +   is($sub_exists, '3', "Subscription created successfully for 
> > > > $dbname");
> > > > +}
> > > > ```
> > > >
> > > > Hmm, what do you want to check here? pg_subscription is a global 
> > > > catalog so
> > > that
> > > > very loop will see exactly the same content. Also, 'postgres' is also 
> > > > the user
> > > database.
> > > > I feel you must ensure that all three databases (postgres, $db1, and 
> > > > $db2) have
> > > a
> > > > subscription here.
> > > >
> > >
> > > Fixed.
> >
> > My point was that the loop does not have meaning because pg_subscription
> > is a global one. I and Peter considered changes like [1] is needed. It 
> > ensures
> > that each databases have a subscription.
> > Note: [1] cannot pass the test as-is because $db1 and $db2 contains special
> > characters. Please escape appropriately.
>
> Yes. Some test is still needed to confirm the expected subscriptions
> all get created for respective dbs. But, the current test loop just
> isn't doing it properly.
>

Fixed.

> >
> > Other comments are listed in below.
> >
> > 01.
> > ```
> > + -a
> > + --all-dbs
> > ```
> >
> > Peter's comment [2] does not say that option name should be changed.
> > The scope of his comment is only in the .c file.
>
> Yes, that's correct. My v9 suggestion to change 'all' to 'all_dbs' was
> referring only to the field name of struct CreateSubscriberOptions,
> nothing else. Not the usage help, not the error messages, not the
> docs, not the tests, not the commit message.
>

Fixed.

The attached patch at [1] contains the suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8RjJ9BsCg%2Bpur307b1JnfQebnmxFZLw4LdcGX7f-%3D6OK1vw%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Get rid of WALBufMappingLock

2025-02-28 Thread Álvaro Herrera
On 2025-Feb-28, Michael Paquier wrote:

> Saying that, I have also done similar tests with your v12 for a couple
> of hours and this looks stable under installcheck-world.  I can see
> that you've reworked quite a bit the surroundings of InitializedFrom
> in this one.  If you apply that once again at some point, the
> buildfarm will be judge in the long-term, but I am rather confident by
> saying that the situation looks better here, at least.

Heh, no amount of testing can prove lack of bugs; but for sure "it looks
different now, so it must be correct" must be the weakest proof of
correctness I've heard of!

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."   (Fotis)
  https://postgr.es/m/200606261359.k5qdxe2p004...@auth-smtp.hol.gr




Re: Introduce "log_connection_stages" setting.

2025-02-28 Thread Bertrand Drouvot
Hi,

On Fri, Feb 28, 2025 at 10:40:37AM +0100, Sergey Dudoladov wrote:
> Hello hackers,
> 
> here is the new rebased version of the patch.
> 

Thanks for the patch!

I had not realized that before, but FWIW, it looks "very close" to what Melanie
is doing in [1] (the 0001 patch). The idea looks similar except, among other
things, that log_connections is kept.

[1]: 
https://www.postgresql.org/message-id/CAAKRu_asMtUpxDjts1J60batVqLsSGRodMOq4E9ryg1XdO8EZw%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



