Re: New committer: Jacob Champion

2025-04-14 Thread Euler Taveira
On Fri, Apr 11, 2025, at 5:26 PM, Jonathan S. Katz wrote:
> Please join us in wishing Jacob much success and few reverts!

Keep up the good work, Jacob!


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

2025-04-14 Thread Tom Lane
Dimitrios Apostolou  writes:
> Any feedback on this one-liner? Or is the lack of feedback a clue that I 
> have been missing something important in my patch submission? :-)

The calendar ;-).  At this point we're in feature freeze for v18,
so things that aren't bugs aren't likely to get much attention
until v19 development opens up (in July, unless things are really
going badly with v18).

You should add your patch to the July commitfest [1] to make sure
we don't lose track of it.

regards, tom lane

[1] https://commitfest.postgresql.org/53/




Re: Built-in Raft replication

2025-04-14 Thread Kirill Reshke
On Mon, 14 Apr 2025 at 22:15, Konstantin Osipov  wrote:
>
> Hi,

Hi

> I am considering starting work on implementing a built-in Raft
> replication for PostgreSQL.
>

Just some thoughts off the top of my head, if you need my voice here:

I have a hard time believing the community will be positive about this
change in-core. It has a better chance as a contrib extension. In fact,
if we want a built-in consensus algorithm, Paxos is a better option,
because you can use PostgreSQL as local crash-safe storage for
single-decree Paxos, just storing your state (ballot number, last vote)
in a heap table.
OTOH Raft needs to write its own log, and what's worse, it sometimes
needs to remove already-written parts of it (so it is not append-only,
unlike WAL). If you have a production system which maintains two kinds
of logs with different semantics, it is a very hard system to maintain.

There is actually a production-ready (non-open-source) implementation of
Raft as an extension, called BiHA, by pgpro.


-- 
Best regards,
Kirill Reshke




Fixup some appendPQExpBuffer() calls

2025-04-14 Thread David Rowley
In a similar effort to what I did in [1], there's a bunch of
appendPQExpBuffer() calls that could use appendPQExpBufferStr() or
appendPQExpBufferChar(). With [1], I've been keeping those
appendStringInfo calls in check at this time of year for a few years
now. With appendPQExpBuffer I've not been, so a bunch of the changes
in 0002 touch code that's older than master. In one case as old as
v11.

See the attached summary.txt for details on when each of these first
appeared. The 0001 patch is the method I used to find these (which is
a bit cruddy and not for commit).
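
For anyone curious, the changes are mechanical and look roughly like
this (the buffer and strings here are made-up examples, not actual
lines from 0002):

#include "pqexpbuffer.h"

static void
example(PQExpBuffer buf)
{
	/*
	 * Before: a constant format string with no % conversions still pays
	 * for a trip through the printf machinery.
	 */
	appendPQExpBuffer(buf, " ORDER BY 1");

	/* After: append the constant string or single character directly. */
	appendPQExpBufferStr(buf, " ORDER BY 1");
	appendPQExpBufferChar(buf, ';');
}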

I'm now wondering if:

1) I should commit the attached 0002 patch now, or;
2) Should commit only the new-to-v18 ones now, or;
3) do nothing.

I think #3 isn't a good option, as we (or I) have made efforts in the
past to keep these in check. Also, for me, #2 isn't ideal: if I do
this next year for v19, I'll need to ignore the older inconsistencies,
and I'd rather not have to do that. So I vote for #1.

Any other opinions?

David

[1] 
https://postgr.es/m/CAApHDvqJnNjueb=eoj8k+8n0g7nj_acpwsicj5rnv4fdeja...@mail.gmail.com
initdb.c
cccdbc5d95ac REL_17_STABLE

pg_createsubscriber.c:
d44032d01463 REL_17_STABLE
b3f5ccebd79d REL_17_STABLE
ac0e33136abc master

pg_dump.c:
a563c24c9574 REL_16_STABLE
b29cbd3da4e3 REL_17_STABLE
9a17be1e244a REL_17_STABLE
5ba4cc309095 REL_17_STABLE
a379061a22a8 master
650ab8aaf195 master

pg_dumpall.c:
3d14e171e9e2 REL_16_STABLE

pg_upgrade.c:
29d0a77fa660 REL_17_STABLE

psql/common.c:
49ca462eb165 REL_11_STABLE

vacuumdb.c:
fb56a181175b REL_17_STABLE

fe-auth-oauth-curl.c:
b3f0be788afc master

test_escape.c:
ac00ff1c9608 master


v1-0001-Adjust-code-to-highlight-appendStringInfo-misusag.patch
Description: Binary data


v1-0002-Fixup-various-usages-of-appendPQExpBuffer.patch
Description: Binary data


RE: Rename injection point names in test_aio

2025-04-14 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> 93bc3d75d8e1 has introduced a couple of new injection points, but
> these don't follow the convention in-place where points are named
> more-or-less-like-that.  Please find attached a patch to make all
> these more consistent.

I have no objections to the patch, but I feel there are no concrete naming
rules (I was confused while creating patches). Can we clarify that? E.g.,
the first term should be a module or process, or something like that.

Best regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-14 Thread Ilia Evdokimov
There seems to be general support for the idea behind this patch, and
no one has raised strong objections recently, but no clear consensus
on the exact final shape of the output has been reached yet. Since I
understand other priorities might be taking precedence right now, I've
registered the patch in the upcoming CommitFest 2025-07 to ensure it
stays in the review process.


https://commitfest.postgresql.org/patch/5697/

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: POC: make mxidoff 64 bits

2025-04-14 Thread wenhui qiu
Hi Heikki,
Please kindly help to create a task in https://commitfest.postgresql.org/53/ ;
I cannot find this patch in Commitfest 2025-07.


Thanks

On Wed, Apr 2, 2025 at 2:26 AM Heikki Linnakangas  wrote:

> On 01/04/2025 21:25, Heikki Linnakangas wrote:
> > On 07/03/2025 13:30, Maxim Orlov wrote:
> >> Here is a rebase, v14.
> >
> > Thanks! I did some manual testing of this. I created a little helper
> > function to consume multixids, to test the autovacuum behavior, and
> > found one issue:
>
> Forgot to attach the test function I used, here it is.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)


Support for runtime parameters in injection points, for AIO tests

2025-04-14 Thread Michael Paquier
Hi all,
(Andres in CC.)

While reading the AIO code I have noticed that the way it uses
injection points is limited by the fact that we don't have support for
runtime parameters in the existing injection point facility.  The code
is shaped with two set/get functions, pgaio_io_call_inj() and
pgaio_inj_io_get(), that stash a static parameter
(pgaio_inj_cur_handle) which the injection callback then reuses
internally.  Relying on a static variable for that is not a good idea,
IMO, even if the stack is reset with a TRY/CATCH block on error in the
callback run.

Supporting runtime parameters in injection points is something that I
have mentioned as wanted a couple of times, but I was waiting for an
actual use-case in core before adding support for it, as mentioned
around here:
https://www.postgresql.org/message-id/z_yn_galw-pe2...@paquier.xyz

test_aio is bringing one.

We are still in early April, and I'd like to propose a cleanup of the
AIO code on HEAD, even if we are post-feature freeze, to not release
this new code in this state, implying an open item.  Please note that
I'm OK with taking responsibility for this patch set in the end;
reviews are welcome.

Anyway, I have spent some time thinking about that, and finished
with the attached patch set:
- 0001 is the addition of runtime parameters in the backend code.  I
have made the choice of extending the existing INJECTION_POINT() and
INJECTION_POINT_CACHED() instead of introducing new macros.  That's a
matter of taste, perhaps, but increasing the number of these macros
leads to a confusing result.  This one is more invasive, but that's OK
for me.  The code is shaped so that we can rely on
InjectionPointCallback to define the shape of a callback.  It is
possible to pass down a full structure if one wants, which the callback
is then responsible for translating back.  The AIO code only wants an
AIO handle, which is simple (see the sketch after this list).
- 0002 introduces a few updates to the module injection_points, adding
support for runtime parameters and some tests.
- 0003 cleans up the AIO test code, relying on 0001.
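
To illustrate what 0001 enables at a call site, here is a minimal
sketch; the point name and callback body are made up, and the callback
signature follows the shape 0001 gives to InjectionPointCallback:

/* Hypothetical callback: receives the runtime argument back and casts it. */
static void
aio_example_callback(const char *name, const void *private_data, void *arg)
{
	PgAioHandle *ioh = (PgAioHandle *) arg;

	/* ... inject the desired behavior using the handle ... */
}

/*
 * Hypothetical call site: the extra macro argument reaches the callback
 * directly, with no static variable in between.
 */
INJECTION_POINT("aio-example-point", ioh);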

The CI is passing.  Thoughts and comments are welcome.
--
Michael
From 7e8453987d127144477a8e7e9fa7ae61e48091be Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 14 Apr 2025 16:24:18 +0900
Subject: [PATCH 1/3] Add support for runtime arguments in injection points

The macros INJECTION_POINT() and INJECTION_POINT_CACHED() are extended
with an optional argument that can be passed down to the callback
defined.
---
 src/include/utils/injection_point.h   | 15 ---
 src/backend/access/gin/ginbtree.c |  6 +++---
 src/backend/access/heap/heapam.c  |  2 +-
 src/backend/access/index/genam.c  |  2 +-
 src/backend/access/transam/multixact.c|  4 ++--
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/commands/indexcmds.c  |  4 ++--
 src/backend/executor/nodeAgg.c| 10 +-
 src/backend/libpq/be-secure-gssapi.c  |  2 +-
 src/backend/libpq/be-secure.c |  2 +-
 src/backend/postmaster/autovacuum.c   |  2 +-
 src/backend/storage/aio/aio.c |  2 +-
 src/backend/tcop/backend_startup.c|  2 +-
 src/backend/tcop/postgres.c   |  6 +++---
 src/backend/utils/cache/catcache.c|  2 +-
 src/backend/utils/cache/inval.c   |  2 +-
 src/backend/utils/cache/typcache.c|  2 +-
 src/backend/utils/init/postinit.c |  2 +-
 src/backend/utils/misc/injection_point.c  |  8 
 .../modules/injection_points/injection_points.c   |  4 ++--
 src/test/modules/test_slru/test_multixact.c   |  2 +-
 doc/src/sgml/xfunc.sgml   | 11 +++
 22 files changed, 49 insertions(+), 45 deletions(-)

diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index 6ba64cd1ebc6..470e1fc582d8 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -16,13 +16,13 @@
  */
 #ifdef USE_INJECTION_POINTS
 #define INJECTION_POINT_LOAD(name) InjectionPointLoad(name)
-#define INJECTION_POINT(name) InjectionPointRun(name)
-#define INJECTION_POINT_CACHED(name) InjectionPointCached(name)
+#define INJECTION_POINT(name, arg) InjectionPointRun(name, arg)
+#define INJECTION_POINT_CACHED(name, arg) InjectionPointCached(name, arg)
 #define IS_INJECTION_POINT_ATTACHED(name) IsInjectionPointAttached(name)
 #else
 #define INJECTION_POINT_LOAD(name) ((void) name)
-#define INJECTION_POINT(name) ((void) name)
-#define INJECTION_POINT_CACHED(name) ((void) name)
+#define INJECTION_POINT(name, arg) ((void) name)
+#define INJECTION_POINT_CACHED(name, arg) ((void) name)
 #define IS_INJECTION_POINT_ATTACHED(name) (false)
 #endif
 
@@ -30,7 +30,8 @@
  * Typedef for callback function launched by an injection point.
  */
 typedef void (*Injec

rounding_up

2025-04-14 Thread Daria Shanina
Hello everyone!

I noticed that when we parse and validate values (in particular, the int
type), we use the *rint* function, but unfortunately it does not follow
the usual rounding rules, although the website
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/rint-rintf-rintl?view=msvc-170
says otherwise.

I tested on several OSes:


Lubuntu

daria-shanina@lnv-dshanina:~/projects/test$ ./rounding_up

rint(2.00) is 2.0 | round(2.00) is 2.0 | ceil(2.00) is 2.0 | floor(2.00) is 2.0
rint(2.10) is 2.0 | round(2.10) is 2.0 | ceil(2.10) is 3.0 | floor(2.10) is 2.0
rint(2.20) is 2.0 | round(2.20) is 2.0 | ceil(2.20) is 3.0 | floor(2.20) is 2.0
rint(2.30) is 2.0 | round(2.30) is 2.0 | ceil(2.30) is 3.0 | floor(2.30) is 2.0
rint(2.40) is 2.0 | round(2.40) is 2.0 | ceil(2.40) is 3.0 | floor(2.40) is 2.0
rint(2.50) is 2.0 | round(2.50) is 3.0 | ceil(2.50) is 3.0 | floor(2.50) is 2.0
rint(2.60) is 3.0 | round(2.60) is 3.0 | ceil(2.60) is 3.0 | floor(2.60) is 2.0
rint(2.70) is 3.0 | round(2.70) is 3.0 | ceil(2.70) is 3.0 | floor(2.70) is 2.0
rint(2.80) is 3.0 | round(2.80) is 3.0 | ceil(2.80) is 3.0 | floor(2.80) is 2.0
rint(2.90) is 3.0 | round(2.90) is 3.0 | ceil(2.90) is 3.0 | floor(2.90) is 2.0


FreeBSD

daria@2ndfreebsd:~/projects/test$ ./rounding_up

rint(2.00) is 2.0 | round(2.00) is 2.0 | ceil(2.00) is 2.0 | floor(2.00) is 2.0
rint(2.10) is 2.0 | round(2.10) is 2.0 | ceil(2.10) is 3.0 | floor(2.10) is 2.0
rint(2.20) is 2.0 | round(2.20) is 2.0 | ceil(2.20) is 3.0 | floor(2.20) is 2.0
rint(2.30) is 2.0 | round(2.30) is 2.0 | ceil(2.30) is 3.0 | floor(2.30) is 2.0
rint(2.40) is 2.0 | round(2.40) is 2.0 | ceil(2.40) is 3.0 | floor(2.40) is 2.0
rint(2.50) is 2.0 | round(2.50) is 3.0 | ceil(2.50) is 3.0 | floor(2.50) is 2.0
rint(2.60) is 3.0 | round(2.60) is 3.0 | ceil(2.60) is 3.0 | floor(2.60) is 2.0
rint(2.70) is 3.0 | round(2.70) is 3.0 | ceil(2.70) is 3.0 | floor(2.70) is 2.0
rint(2.80) is 3.0 | round(2.80) is 3.0 | ceil(2.80) is 3.0 | floor(2.80) is 2.0
rint(2.90) is 3.0 | round(2.90) is 3.0 | ceil(2.90) is 3.0 | floor(2.90) is 2.0


Windows

C:\Users\Daria\projects\test>rounding_up.exe

rint(2.00) is 2.0 | round(2.00) is 2.0 | ceil(2.00) is 2.0 | floor(2.00) is 2.0
rint(2.10) is 2.0 | round(2.10) is 2.0 | ceil(2.10) is 3.0 | floor(2.10) is 2.0
rint(2.20) is 2.0 | round(2.20) is 2.0 | ceil(2.20) is 3.0 | floor(2.20) is 2.0
rint(2.30) is 2.0 | round(2.30) is 2.0 | ceil(2.30) is 3.0 | floor(2.30) is 2.0
rint(2.40) is 2.0 | round(2.40) is 2.0 | ceil(2.40) is 3.0 | floor(2.40) is 2.0
rint(2.50) is 2.0 | round(2.50) is 3.0 | ceil(2.50) is 3.0 | floor(2.50) is 2.0
rint(2.60) is 3.0 | round(2.60) is 3.0 | ceil(2.60) is 3.0 | floor(2.60) is 2.0
rint(2.70) is 3.0 | round(2.70) is 3.0 | ceil(2.70) is 3.0 | floor(2.70) is 2.0
rint(2.80) is 3.0 | round(2.80) is 3.0 | ceil(2.80) is 3.0 | floor(2.80) is 2.0
rint(2.90) is 3.0 | round(2.90) is 3.0 | ceil(2.90) is 3.0 | floor(2.90) is 2.0


As you can see in the output, the *round* function works according to the
rules. Maybe we should use it instead?


Thank you for your attention!


--

Best regards,

Daria Shanina


Re: New committer: Jacob Champion

2025-04-14 Thread Jelte Fennema-Nio
On Fri, 11 Apr 2025 at 22:26, Jonathan S. Katz  wrote:
>
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!


Congrats! Well deserved!




RE: doc patch: clarify the naming rule for injection_points

2025-04-14 Thread Hayato Kuroda (Fujitsu)
Dear Aleksander,


> ```
> - their own code using the same macro.
> + their own code using the same macro.  The name of injection points must be
> + lower characters, and dashes must separate its terms.
> ```
> 
> Perhaps "must" is too strong a statement. I suggest something like:
> 
> """
> By convention, the name of injection points ...
> """

It is always difficult for me to distinguish these, thanks for the comments.
PSA the updated version.

> 
> I have mixed feelings about the patch overall though. This would mean
> that we need to rename injection points like this:
> 
> ```
> AtEOXact_Inval-with-transInvalInfo
> heap_update-before-pin
> ```
> 
> Otherwise we would contradict our own documentation.

Thanks for pointing out these counterexamples. I grepped for "INJECTION_POINTS"
and "IS_INJECTION_POINT_ATTACHED", and no others were found.

> I don't mind heap-update-before-pin, but not 100% sure if:
> 
> at-eo-xact-inval-with-trans-inval-info
> 
> ... would be a better name than the current one.

I also feel that just converting to lower case is not good. The point seems
to be located in the end-of-transaction callback, which accepts invalidation
messages. Based on that, how about "inval-process-invalidation-messages"?
0002 does simple replacements of these words.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v2-0002-Follow-naming-rules-for-injection-points.patch
Description: v2-0002-Follow-naming-rules-for-injection-points.patch


v2-0001-Doc-clarify-the-naming-rule-for-injection-points.patch
Description:  v2-0001-Doc-clarify-the-naming-rule-for-injection-points.patch


Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

2025-04-14 Thread Andrew Dunstan



On 2025-04-14 Mo 8:20 AM, Álvaro Herrera wrote:
> On 2025-Apr-04, Andrew Dunstan wrote:
>> Non text modes for pg_dumpall, correspondingly change pg_restore
>
> I think the new oid_string_list stuff in this commit is unnecessary, and
> we can remove a bunch of lines by simplifying that to using
> SimplePtrList, as the attached illustrates.  (Perhaps the names of
> members of the proposed struct can be improved.)

Sure, we can do it that way.

> I also propose to remove the open-coded implementation of pg_get_line.
>
> WDYT?

seems reasonable

> I'm also not sure about the double sscanf() business there ... There
> must be a better way to do this.

Yes, probably. I'll look into that if you like.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





[PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

2025-04-14 Thread Dimitrios Apostolou

On Fri, 4 Apr 2025, Dimitrios Apostolou wrote:

> I just managed to run pgindent, here is v2 with the comment style fixed.


Any feedback on this one-liner? Or is the lack of feedback a clue that I 
have been missing something important in my patch submission? :-)


Should I CC people that are frequent committers to the file?


Thanks,
Dimitris





Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-14 Thread Robert Haas
On Tue, Apr 1, 2025 at 5:07 PM David Rowley  wrote:
> I tried to move things along to address Tom's concern about not
> wanting to show this in EXPLAIN's standard output. I suggested in [1]
> that we could use EXPLAIN ANALYZE, but nobody commented on that.
> EXPLAIN ANALYZE is much more verbose than EXPLAIN already, and if we
> put these in EXPLAIN ANALYZE then the viewer can more easily compare
> planned vs actual. I had mentioned that Hash Join does something like
> this for buckets.

I don't think we should use ANALYZE for this because, IME, that should
just be about whether the query gets executed. Since this looks like
information that is available at plan time, I think it should be
displayed all the time or made contingent on VERBOSE. It would be sad
if you had to run the query to get information that was available
without needing to run the query. Personally, I think we can probably
just display it all the time. I mean, the typical plan is not going to
contain enough Memoize nodes that a little extra chatter for each one
impedes readability significantly.

As far as what to display, I have sympathy with Lukas's initial
complaint that it was hard to understand the costing of Memoize. I
haven't faced this exact problem, I think because Memoize isn't that
often used in plans I have seen, but I've faced a lot of similar
problems where you have to try to work backwards painfully to figure
out the chain of events that led to the value you're actually seeing,
and I'm +1 for efforts to make it more clear.

Having looked at v6, I think it would help, at least if the reader is
sufficiently knowledgeable. From the values displayed, it looks like
someone could reconstruct the evict_ratio value in
cost_memoize_rescan(), and if they know the loop count, also the
hit_ratio. But that seems hard: if you weren't reading the code, how
would you know how to do it? Even if you are reading the code, are you
sure you'd reconstruct it correctly? I wonder why we think it's better
to display this than a more "cooked" number like the estimated hit
ratio that was proposed in v1?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

2025-04-14 Thread Álvaro Herrera
On 2025-Apr-04, Andrew Dunstan wrote:

> Non text modes for pg_dumpall, correspondingly change pg_restore

I think the new oid_string_list stuff in this commit is unnecessary, and
we can remove a bunch of lines by simplifying that to using
SimplePtrList, as the attached illustrates.  (Perhaps the names of
members of the proposed struct can be improved.)

I also propose to remove the open-coded implementation of pg_get_line.

WDYT?

I'm also not sure about the double sscanf() business there ... There
must be a better way to do this.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
>From 91d1fcee443e96d09fa6c23f30bdee7da279ada5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Mon, 14 Apr 2025 13:57:48 +0200
Subject: [PATCH] remove unnecessary oid_string list stuff

Also use pg_get_line_buf() instead of open-coding it
---
 src/bin/pg_dump/pg_restore.c   | 93 ++
 src/fe_utils/simple_list.c | 41 -
 src/include/fe_utils/simple_list.h | 16 -
 src/tools/pgindent/typedefs.list   |  1 +
 4 files changed, 57 insertions(+), 94 deletions(-)

diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index ff4bb320fc9..4ace56891e7 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -67,10 +67,20 @@ static int	process_global_sql_commands(PGconn *conn, const char *dumpdirpath,
 		const char *outfile);
 static void copy_or_print_global_file(const char *outfile, FILE *pfile);
 static int	get_dbnames_list_to_restore(PGconn *conn,
-		SimpleOidStringList *dbname_oid_list,
+		SimplePtrList *dbname_oid_list,
 		SimpleStringList db_exclude_patterns);
 static int	get_dbname_oid_list_from_mfile(const char *dumpdirpath,
-		   SimpleOidStringList *dbname_oid_list);
+		   SimplePtrList *dbname_oid_list);
+
+/*
+ * Stores a database OID and the corresponding name.
+ */
+typedef struct DbOidName
+{
+	Oid			oid;
+	char		str[FLEXIBLE_ARRAY_MEMBER]; /* null-terminated string here */
+} DbOidName;
+
 
 int
 main(int argc, char **argv)
@@ -927,7 +937,7 @@ read_one_statement(StringInfo inBuf, FILE *pfile)
  */
 static int
 get_dbnames_list_to_restore(PGconn *conn,
-			SimpleOidStringList *dbname_oid_list,
+			SimplePtrList *dbname_oid_list,
 			SimpleStringList db_exclude_patterns)
 {
 	int			count_db = 0;
@@ -943,13 +953,14 @@ get_dbnames_list_to_restore(PGconn *conn,
 	 * Process one by one all dbnames and if specified to skip restoring, then
 	 * remove dbname from list.
 	 */
-	for (SimpleOidStringListCell * db_cell = dbname_oid_list->head;
+	for (SimplePtrListCell *db_cell = dbname_oid_list->head;
 		 db_cell; db_cell = db_cell->next)
 	{
+		DbOidName  *dbidname = (DbOidName *) db_cell->ptr;
 		bool		skip_db_restore = false;
 		PQExpBuffer db_lit = createPQExpBuffer();
 
-		appendStringLiteralConn(db_lit, db_cell->str, conn);
+		appendStringLiteralConn(db_lit, dbidname->str, conn);
 
 		for (SimpleStringListCell *pat_cell = db_exclude_patterns.head; pat_cell; pat_cell = pat_cell->next)
 		{
@@ -957,7 +968,7 @@ get_dbnames_list_to_restore(PGconn *conn,
 			 * If there is an exact match then we don't need to try a pattern
 			 * match
 			 */
-			if (pg_strcasecmp(db_cell->str, pat_cell->val) == 0)
+			if (pg_strcasecmp(dbidname->str, pat_cell->val) == 0)
 skip_db_restore = true;
 			/* Otherwise, try a pattern match if there is a connection */
 			else if (conn)
@@ -972,7 +983,7 @@ get_dbnames_list_to_restore(PGconn *conn,
 if (dotcnt > 0)
 {
 	pg_log_error("improper qualified name (too many dotted names): %s",
- db_cell->str);
+ dbidname->str);
 	PQfinish(conn);
 	exit_nicely(1);
 }
@@ -982,7 +993,7 @@ get_dbnames_list_to_restore(PGconn *conn,
 if ((PQresultStatus(res) == PGRES_TUPLES_OK) && PQntuples(res))
 {
 	skip_db_restore = true;
-	pg_log_info("database \"%s\" matches exclude pattern: \"%s\"", db_cell->str, pat_cell->val);
+	pg_log_info("database \"%s\" matches exclude pattern: \"%s\"", dbidname->str, pat_cell->val);
 }
 
 PQclear(res);
@@ -1001,8 +1012,8 @@ get_dbnames_list_to_restore(PGconn *conn,
 		 */
 		if (skip_db_restore)
 		{
-			pg_log_info("excluding database \"%s\"", db_cell->str);
-			db_cell->oid = InvalidOid;
+			pg_log_info("excluding database \"%s\"", dbidname->str);
+			dbidname->oid = InvalidOid;
 		}
 		else
 		{
@@ -1024,13 +1035,14 @@ get_dbnames_list_to_restore(PGconn *conn,
  * Returns, total number of database names in map.dat file.
  */
 static int
-get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimpleOidStringList *dbname_oid_list)
+get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimplePtrList *dbname_oid_list)
 {
+	StringInfoData linebuf;
 	FILE	   *pfile;
 	char		map_file_path[MAXPGPATH];
-	char		line[MAXPGPATH];
 	int			count = 0;
 
+
 	/*
 	 * If there is only global.dat file in dump, then return from h

Re: BitmapHeapScan streaming read user and prelim refactoring

2025-04-14 Thread James Hunter
On Thu, Apr 10, 2025 at 8:15 PM Thomas Munro  wrote:
>
> On Fri, Apr 11, 2025 at 5:50 AM James Hunter  
> wrote:
> > I am looking at the pre-streaming code, in PG 17, as I am not familiar
> > with the PG 18 "streaming" code. Back in PG 17, nodeBitmapHeapscan.c
> > maintained two shared TBM iterators, for PQ. One of the iterators was
> > the actual, "fetch" iterator; the other was the "prefetch" iterator,
> > which kept some distance ahead of the "fetch" iterator (to hide read
> > latency).
>
> We're talking at cross-purposes.
>
> The new streaming BHS isn't just issuing probabilistic hints about
> future access obtained from a second iterator.  It has just one shared
> iterator connected up to the workers' ReadStreams.  Each worker pulls
> a disjoint set of blocks out of its stream, possibly running a bunch
> of IOs in the background as required.  The stream replaces the old
> ReadBuffer() call, and the old PrefetchBuffer() call and a bunch of
> dubious iterator synchronisation logic are deleted.

Thanks for the clarification -- I realize I am very late to this
conversation (as I rejoined a PostgreSQL-related team only within the
past year), but I think this whole "ReadStream" project is misguided.

Of course, late feedback is a burden, but I think our discussion here
(and, in the future, if you try to "ReadStream" BTree index pages,
themselves) illustrates my point.  In this specific case, you are
proposing a lot of complexity, but it's not at all clear to me: why?

I see two orthogonal problems, in processing Bitmap Heap pages in
parallel: (1) we need to prefetch enough pages, far enough in advance,
to hide read latency; (2) later, every parallel worker needs to be
given a set of pages to process, in a way that minimizes contention.

The easiest way to hand out work to parallel workers (and often the
best) is to maintain a single, shared, global work queue. Just put
whatever pages you prefetch into a FIFO queue, and let each worker
pull one piece of "work" off that queue. In this way, there's no
"ramp-down" problem.

If you find that contention on this shared queue becomes a bottleneck,
then you just pull *n* pieces of work, in a batch. Then the
"ramp-down" problem is limited to "n", instead of just 1. Often, one
can find a suitable value of n that simultaneously makes contention
effectively zero, while avoiding "ramp-down" problems; say n = 10.

So much for popping from the shared queue. Pushing to the shared queue
is also easy, because you have async reads. Anytime a worker pops a
(batch of) work item(s) off the shared queue, it checks to see if the
queue is still large enough. If not, it issues the appropriate
prefetch / "ReadStream" calls.
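
To make the batching scheme above concrete, here is a minimal sketch;
the queue type, capacity, and function names are hypothetical (only the
spinlock API and BlockNumber are real PostgreSQL primitives):

#include "postgres.h"
#include "storage/block.h"
#include "storage/spin.h"

#define QUEUE_CAPACITY 1024		/* hypothetical fixed ring size */
#define BATCH_SIZE 10			/* n, balancing contention vs. ramp-down */

/* Hypothetical shared FIFO of prefetched blocks, living in shared memory. */
typedef struct SharedBlockQueue
{
	slock_t		mutex;
	int			head;			/* next slot to pop */
	int			tail;			/* next slot to push */
	BlockNumber blocks[QUEUE_CAPACITY];
} SharedBlockQueue;

/*
 * Pop up to BATCH_SIZE blocks in one locked section.  Returns the number
 * of blocks handed to this worker; 0 means the queue is empty.
 */
static int
queue_pop_batch(SharedBlockQueue *q, BlockNumber *out)
{
	int			n = 0;

	SpinLockAcquire(&q->mutex);
	while (n < BATCH_SIZE && q->head != q->tail)
	{
		out[n++] = q->blocks[q->head];
		q->head = (q->head + 1) % QUEUE_CAPACITY;
	}
	SpinLockRelease(&q->mutex);

	/* Caller checks the remaining depth and issues more prefetches if low. */
	return n;
}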

A single shared queue is easiest, but sometimes there's no way to
prevent it from becoming a bottleneck. In that case, one typically
partitions the input at startup, gives each worker its own partition,
and waits for all workers to complete. In this second model, workers
are entirely independent, so there is no contention: we scale out
perfectly. The problem, as you've pointed out, is that one worker
might finish its own work long before another; and then the worker
that finished its work is idle and therefore wasted.

This is why a single shared queue is so nice, because it avoids
workers being idle. But I am confused by your proposal, which seems to
be trying to get the behavior of a single shared queue, but
implemented with the added complexity of multiple queues.

Why not just use a single queue?

> These are now
> real IOs running in the background and for the *exact* blocks you will
> consume; posix_fadvise() was just a stepping towards AIO that
> tolerated sloppy synchronisation including being entirely wrong.

It has never been clear to me why prefetching the exact blocks you'll
later consume is seen as a *benefit*, rather than a *cost*. I'm not
aware of any prefetch interface, other than PG's "ReadStream," that
insists on this. But that's a separate discussion...

>  If
> you additionally teach the iterator to work in batches, as my 0001
> patch (which I didn't propose for v18) showed, then one worker might
> end up processing (say) 10 blocks at end-of-scan while all the other
> workers have finished the node, and maybe the whole query.

The standard solution to the problem you describe here is to pick a
batch size small enough that you don't care. For example, no matter
what you do, it will always be possible for one worker to end up
processing *1* extra block, end of scan. If 1 is OK, but 10 is too
large, then I would try 5. AFAIU, the argument for larger batches is
to reduce contention; and the argument for smaller batches is to
reduce amount of time workers are idle, at end of query. The problem
seems amenable to a static solution -- there should be a value of n
that satisfies both restrictions.

Otherwise, we'd implicitly be saying that contention is very slow,
compared to the time it takes a worker to process a block; while also
saying that it tak

Re: ci: Allow running mingw tests by default via environment variable

2025-04-14 Thread Nathan Bossart
On Sat, Apr 12, 2025 at 08:18:35PM +1200, Thomas Munro wrote:
> Is it safe to assume that CI changes come under the same code-freeze
> exception as tests and docs?  I would assume so, it's just scripting
> glue in .cirrus.XXX files not affecting one bit of the software, but I
> don't recall that we specifically clarified that before, so can
> someone from the RMT please +1 this?  Then I'll go ahead and push
> these.

That seems reasonable to me.  But to play devil's advocate, is there a
strong reason for doing this before July?

-- 
nathan




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-14 Thread Jacob Champion
On Mon, Apr 14, 2025 at 11:27 AM Wolfgang Walther
 wrote:
>src/interfaces/libpq-oauth/meson.build:18:22: ERROR: File
> oauth-curl.c does not exist.
>
> This... clears it up, because that file is indeed missing for me on disk.

Aha! Okay, glad I don't need to track that down.

> libpq.a
> libpq-oauth-18.a
>
> The libpq.a file has no references to dlopen, but plenty of references
> to curl stuff.

Which references? libpq-oauth should be the only thing using Curl symbols:

$ nm src/interfaces/libpq/libpq.a | grep --count curl
0
$ nm src/interfaces/libpq-oauth/libpq-oauth-18.a | grep --count curl
116

> I'm not sure what the libpq-oauth-18.a file is for.

That implements the flow. You'll need to link that into your
application or it will complain about missing flow symbols. (I don't
think there's an easy way to combine the two libraries in our Autoconf
setup; the only ways I can think of right now would introduce a
circular dependency between libpq and libpq-oauth...)

Thanks!
--Jacob




Re: ci: Allow running mingw tests by default via environment variable

2025-04-14 Thread Andres Freund
Hi, 

On April 14, 2025 9:22:40 PM GMT+02:00, Nathan Bossart 
 wrote:
>On Sat, Apr 12, 2025 at 08:18:35PM +1200, Thomas Munro wrote:
>> Is it safe to assume that CI changes come under the same code-freeze
>> exception as tests and docs?  I would assume so, it's just scripting
>> glue in .cirrus.XXX files not affecting one bit of the software, but I
>> don't recall that we specifically clarified that before, so can
>> someone from the RMT please +1 this?  Then I'll go ahead and push
>> these.
>
>That seems reasonable to me.  But to play devil's advocate, is there a
>strong reason for doing this before July?

We get more coverage of the changes we merge until then. 

Greetings, 

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: doc patch: clarify the naming rule for injection_points

2025-04-14 Thread Aleksander Alekseev
Hi,

> The naming rule for points is not determined yet, but most of them use lower
> case and terms are divided by a dash "-". I think this is a good chance to
> formally clarify it. PSA a patch adding the description.
>
> I was confused about the correct place for the description. I added it at the
> end of the first paragraph, because it describes how we add and use it.
> Suggestions are very welcome.

```
- their own code using the same macro.
+ their own code using the same macro.  The name of injection points must be
+ lower characters, and dashes must separate its terms.
```

Perhaps "must" is a too strong statement. I suggest something like:

"""
By convention, the name of injection points ...
"""

I have mixed feelings about the patch overall though. This would mean
that we need to rename injection points like this:

```
AtEOXact_Inval-with-transInvalInfo
heap_update-before-pin
```

Otherwise we would contradict our own documentation.

I don't mind heap-update-before-pin, but not 100% sure if:

at-eo-xact-inval-with-trans-inval-info

... would be a better name than the current one.

--
Best regards,
Aleksander Alekseev




Re: rounding_up

2025-04-14 Thread Andrey Borodin
Hi Daria!

> On 14 Apr 2025, at 13:24, Daria Shanina  wrote:
> 
> when we parse and validate values (in particular, the int type), we use the 
> rint method, but unfortunately it does not work according to the round rules.

Can these concerns be demonstrated with an SQL query?

As far as I can see from your data, rint() is consistent across OSes. Can a
user observe any inconsistency caused by rint() behavior in PostgreSQL?

Thanks!


Best regards, Andrey Borodin.



Re: rounding_up

2025-04-14 Thread Christoph Moench-Tegeder
Hi,

## Daria Shanina (vilensip...@gmail.com):

> I noticed, when we parse and validate values (in particular, the int type),
> we use the *rint* method, but unfortunately it does not work according to
> the round rules.

The first question would be "which rounding rule?" as (of course) there are
multiple to choose from.

Second, the rules in use are consistent with the documentation of
round(double precision)
: Rounds to nearest integer. For numeric, ties are broken by rounding
: away from zero. For double precision, the tie-breaking behavior is
: platform dependent, but “round to nearest even” is the most common rule.
https://www.postgresql.org/docs/current/functions-math.html
and I think it makes sense to have round() and implicit rounding
behave the same.

Third, rint() works the way you set it with fesetround() (see the man
page). And that works on the nearest Linux and FreeBSD I
could grab :)
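
A minimal C illustration of that last point (assuming the default
round-to-nearest-even mode at program startup):

#include <fenv.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
	/* Default mode: round-to-nearest-even, so the 2.5 tie goes to 2. */
	printf("rint(2.5)  = %.1f\n", rint(2.5));	/* 2.0 */

	/* rint() follows the current rounding mode... */
	fesetround(FE_UPWARD);
	printf("rint(2.5)  = %.1f\n", rint(2.5));	/* 3.0 */

	/* ...while round() always breaks ties away from zero. */
	printf("round(2.5) = %.1f\n", round(2.5));	/* 3.0 */
	return 0;
}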

Regards,
Christoph

-- 
Spare Space.




Re: Adding error messages to a few slash commands

2025-04-14 Thread Abhishek Chanda
Thanks Pavel, done now https://commitfest.postgresql.org/patch/5699/

Thanks

On Mon, Apr 14, 2025 at 4:27 AM Pavel Luzanov  wrote:
>
> On 13.04.2025 19:40, Abhishek Chanda wrote:
>
> Thanks for the feedback, attached an updated patch that changes most
> of those "Did not find" messages to empty tables. I did not change two
> sets, listDbRoleSettings and listTables both have comments describing
> that these are deliberately this way.
>
>
> Thanks.
>
> I wanted to post this update to close this loop for now. I will follow
> up once the merge window opens again.
>
>
> I recommend creating a new entry for this patch in the open July commitfest.
> I will be able to review this patch in a couple of months, in June,
> if no one wants to review it earlier.
>
> --
> Pavel Luzanov
> Postgres Professional: https://postgrespro.com



-- 
Thanks and regards
Abhishek Chanda




RE: Rename injection point names in test_aio

2025-04-14 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> > I have no objections to the patch, but I feel there are no concrete
> > naming rules (I was confused while creating patches).
> 
> There has been a sort of implied rule in the code to use lower
> characters, with terms separated by dashes.  Perhaps we could make
> that more official with an extra sentence in the docs about that.

Agreed.

> > Can we clarify that? E.g., first term should be a module or process,
> > or something like that.
> 
> Not sure that it would be a good thing to put context-specific
> restrictions here.

My main concern is that someday the names of injection points might conflict,
and it might be painful to sort out after the number of points has increased.
But it's OK to leave it for now.

> Anyway, would you like to propose a patch for the documentation?

Sure, I did. Please see [1].

[1]: 
https://www.postgresql.org/message-id/OSCPR01MB14966E14C1378DEE51FB7B7C5F5B32%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED





HELP: SAVEPOINT feature cases

2025-04-14 Thread itumonohito
Dear hackers,

I hope this email finds you well.

I am writing to inquire about the use cases for the SAVEPOINT feature.

I would appreciate it if you could provide some examples of scenarios where
this functionality proves particularly useful.

Thank you for your time and assistance. I look forward to your response.

Sincerely,

jack.


Re: HELP: SAVEPOINT feature cases

2025-04-14 Thread Kirill Reshke
On Mon, 14 Apr 2025 at 21:54, itumonohito  wrote:
>
> Dear hackers,
>
> I hope this email finds you well.
>
> I am writing to inquire about the use cases for the SAVEPOINT feature.
>
> I would appreciate it if you could provide some examples of scenarios where 
> this functionality proves particularly useful.
>
> Thank you for your time and assistance. I look forward to your response.
>
> Sincerely,
>
> jack.

Have you read https://www.postgresql.org/docs/current/sql-savepoint.html
? There are a few examples here.

-- 
Best regards,
Kirill Reshke




Re: Conflicting updates of command progress

2025-04-14 Thread Sami Imseih
> While working on [1] I realized that some fields of pg_stat_progress_cluster
> have weird values.

Is there a repro that you can share that shows the weird values? It sounds like
the repro is on top of [1]. Is that right?

> AFAICS the current design does not consider that one progress-reporting
> command can be called by another one.

pgstat_progress_start_command should only be called once, by the entry
point for the command. In theory, we could end up in a situation where
start_command is called multiple times during the same top-level command.

> Not sure what the correct fix is. We can
> either ignore update requests from the "nested" commands, or display the

There is a pattern where we do

```
if (progress)
  pgstat_progress_update_param
```

cluster_rel could pass down a flag to index_build and other functions that
update progress, telling them not to report it. That way, only the top-level
command updates progress (a rough sketch follows).
What do you think?
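
A rough sketch of the pattern I have in mind; the function and parameter
names here are illustrative, not the actual signatures:

```
#include "postgres.h"
#include "commands/progress.h"
#include "utils/backend_progress.h"

/* Hypothetical nested build step: reports progress only when asked to. */
static void
build_index_step(bool report_progress)
{
	if (report_progress)
		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
									 PROGRESS_CREATEIDX_PHASE_BUILD);

	/* ... do the actual work ... */
}
```

A nested caller such as cluster_rel would pass false, so only the
top-level command's numbers show up in the progress view.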


[1] https://commitfest.postgresql.org/patch/5117/

--
Sami Imseih
Amazon Web Services (AWS)




Built-in Raft replication

2025-04-14 Thread Konstantin Osipov
Hi,

I am considering starting work on implementing a built-in Raft
replication for PostgreSQL.

Raft's advantage is that it unifies log replication, cluster
configuration/membership/topology management and initial state transfer 
into a single protocol. 

Currently the cluster configuration/topology is often managed by
Patroni or similar tools; however, it seems there are certain
usability drawbacks to this approach:

- it's a separate tool, requiring an external state provider like etcd;
  raft could store its configuration in system tables; this is
  also an observability improvement since everyone could look up 
  cluster state the same way as everything else

- same for watchdog; raft has a built-in failure detector that's
  configuration aware;

- flexible quorums; currently the quorum size is a configuration setting;
  with a built-in raft, extending the quorum could be a matter
  of starting a new node and pointing it to an existing cluster

Going forward I can see PostgreSQL providing transparent bouncing
at the pg_wire level, given that Raft state would then be part of the
system, so drivers and all cluster nodes could easily see where
the leader is.

If anyone is working on Raft already I'd be happy to discuss
the details. I am fairly new to the PostgreSQL hackers ecosystem,
so I am cautious about starting work in isolation, or finding out
there is no interest in accepting the feature into the trunk.

Thanks,

-- 
Konstantin Osipov




Re: An incorrect check in get_memoize_path

2025-04-14 Thread wenhui qiu
Hi,
No objections. It's a pity that the PostgreSQL 18 version has been
code-frozen.


Thanks

On Mon, Apr 14, 2025 at 4:21 PM Andrei Lepikhov  wrote:

> On 4/14/25 08:49, Richard Guo wrote:
> > On Mon, Apr 7, 2025 at 4:50 PM Richard Guo 
> wrote:
> >> Hence, I propose the attached patch for the fix.
> >>
> >> BTW, there is a XXX comment there saying that maybe we can make the
> >> remaining join quals part of the inner scan's filter instead of the
> >> join filter.  I don't think this is possible in all cases.  In the
> >> above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
> >> filter, according to join_clause_is_movable_into.  So I removed that
> >> comment in the patch while we're here.
> >>
> >> Any thoughts?
> >
> > Here is an updated patch with a commit message.  Regarding
> > backporting, I'm inclined not to, given the lack of field reports.
> > Any objections to pushing it?
> No objections.
> By the way, I think you should be less shy about adding people to CC who
> have been involved in the discussion (at least David should have a
> chance to know). Also, don't hesitate to use the 'Reviewed-by' tag to
> let people know who else may remember the context of the problem, just
> in case.
>
> --
> regards, Andrei Lepikhov
>
>
>


Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-14 Thread Wolfgang Walther

Wolfgang Walther:
> So yes, not related to your patch. I do understand that PostgreSQL's
> autoconf build system is not designed for "static only", I am certainly
> not expecting you to fix that.
>
> I think meson will do better here, but I was not able to make that work,
> yet.
I did a basic meson build. Full postgresql package, not libpq-only.

The static-only build just works. On master that is. Same as the regular 
build.


So yes, meson will handle the static stuff much better.


> I just tried the same thing on the bigger postgresql package, where the
> full build is run and not only libpq / libpq-oauth. It fails with the
> same error. No rule for oauth-curl.o.
Applying the v5 patch to the above meson build gives me a different
error, this time for both the static-only and the regular builds:


  src/interfaces/libpq-oauth/meson.build:18:22: ERROR: File 
oauth-curl.c does not exist.


This... clears it up, because that file is indeed missing for me on disk.
I assume that's because this file is tracked as a rename in the v5
patch. I can apply it with git, but not directly in the nix build
system. TIL I need to use "fetchpatch2" instead of "fetchpatch" for
that. Sure thing.



So, with the patch applied correctly, I get the following:

1. Meson regular build:

libpq-oauth-18.so
libpq.so
libpq.so.5
libpq.so.5.18

The libpq.so file has references to dlopen and libpq-auth-18.so, cool.

2. Meson static-only build:

libpq.a
libpq-oauth-18.a

The libpq.a file has no references to dlopen, but plenty of references 
to curl stuff.


I'm not sure what the libpq-oauth-18.a file is for.

3. Back to the libpq-only build with autoconf, from where I started. I
only need to add the following line:


make -C src/interfaces/libpq-oauth install

and get this:

libpq-oauth-18.so
libpq.so
libpq.so.5
libpq.so.5.18

Sweet!

4. Of course the static-only build does not work with autoconf, but 
that's expected.


So, sorry for the noise before. Now that I know how to apply patches
with renames, I will try your next patch as well.


Best,

Wolfgang




Re: An incorrect check in get_memoize_path

2025-04-14 Thread Andrei Lepikhov

On 4/14/25 08:49, Richard Guo wrote:
> On Mon, Apr 7, 2025 at 4:50 PM Richard Guo  wrote:
>> Hence, I propose the attached patch for the fix.
>>
>> BTW, there is a XXX comment there saying that maybe we can make the
>> remaining join quals part of the inner scan's filter instead of the
>> join filter.  I don't think this is possible in all cases.  In the
>> above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
>> filter, according to join_clause_is_movable_into.  So I removed that
>> comment in the patch while we're here.
>>
>> Any thoughts?
>
> Here is an updated patch with a commit message.  Regarding
> backporting, I'm inclined not to, given the lack of field reports.
> Any objections to pushing it?

No objections.
By the way, I think you should be less shy about adding people to CC who
have been involved in the discussion (at least David should have a
chance to know). Also, don't hesitate to use the 'Reviewed-by' tag to
let people know who else may remember the context of the problem, just
in case.


--
regards, Andrei Lepikhov




[WIP] Implement "pg_restore --data-only --clean" as a way to skip WAL

2025-04-14 Thread Dimitrios Apostolou

Hello list,

I implemented --clean support for --data-only, in order to avoid logging
to the WAL while populating the database. The attached patch issues a
TRUNCATE before COPY on each worker process, and provides a significant
speed advantage if the cluster is configured with wal_level=minimal.

It also provides a safer way to load the database, as avoiding WAL logging
also avoids a potential and painful ENOSPC on the WAL partition, as I
experienced in [1]. In other words, it makes things much better for my
use case.


[1] 
https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net

But it has some rough edges. I would appreciate guidance and feedback.

* When the table to be TRUNCATEd is referenced by a foreign key from
  another table, the whole transaction fails with:

ERROR: cannot truncate a table referenced in a foreign key constraint

  1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
     instead, which has a better chance of succeeding, and continue with
     the COPY. How can I detect the failure of ahprintf("TRUNCATE") and
     try the alternative without failing the whole transaction? (See the
     sketch after this list.)

  2. Why doesn't --disable-triggers help?
     To test this, I have manually issued

       ALTER TABLE x DISABLE TRIGGER ALL

     to every table, and a manual TRUNCATE still fails. Shouldn't
     postgres skip the referential integrity checks?

  3. In my tests, all my tables start empty since I have just created the
 schema. Then pg_restore --data-only --clean first populates
 the /referencing/ tables, which is allowed because of disabled
 triggers, and then it tries to load the /referenced/ table.

 At this point the referential integrity is already broken. Getting an
 error when TRUNCATing the empty /referenced/ table doesn't make
 sense.
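
For question 1, here is a rough sketch of one possible approach, using
plain libpq calls rather than the archiver's ahprintf() path: wrap the
TRUNCATE in a savepoint so its failure doesn't abort the surrounding
transaction, and fall back to DELETE.

#include <stdio.h>
#include <libpq-fe.h>

/* Sketch only: assumes we are already inside a transaction block. */
static void
clean_table(PGconn *conn, const char *qualified_name)
{
	PGresult   *res;
	char		sql[1024];

	PQclear(PQexec(conn, "SAVEPOINT clean_attempt"));

	snprintf(sql, sizeof(sql), "TRUNCATE TABLE %s", qualified_name);
	res = PQexec(conn, sql);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		/* TRUNCATE failed (e.g. FK reference): undo it and DELETE instead. */
		PQclear(res);
		PQclear(PQexec(conn, "ROLLBACK TO SAVEPOINT clean_attempt"));
		snprintf(sql, sizeof(sql), "DELETE FROM %s", qualified_name);
		res = PQexec(conn, sql);
	}
	PQclear(res);
	PQclear(PQexec(conn, "RELEASE SAVEPOINT clean_attempt"));
}

Note that the DELETE fallback loses the wal_level=minimal optimization
for that table, since only a TRUNCATE (or CREATE) in the same
transaction allows skipping WAL.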


What do you think?

Thank you in advance,
Dimitris
From ee28b0b9c1d3b78f318f259f4907785db98f31e6 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou 
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v1] pg_restore --clean --data-only

In parallel restore, it issues a TRUNCATE before COPYing the data into
the tables, within a transaction.

As a result it avoids logging to the WAL when combined with
wal_level=minimal.
---
 doc/src/sgml/ref/pg_restore.sgml | 10 ++
 src/bin/pg_dump/pg_backup.h  |  1 +
 src/bin/pg_dump/pg_backup_archiver.c |  3 ++-
 src/bin/pg_dump/pg_restore.c |  6 +++---
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 2e3ba802581..697ca25d70a 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -112,18 +112,28 @@ PostgreSQL documentation
   

 Before restoring database objects, issue commands
 to DROP all the objects that will be restored.
 This option is useful for overwriting an existing database.
 If any of the objects do not exist in the destination database,
 ignorable error messages will be reported,
 unless --if-exists is also specified.

+   
+ In combination with --data-only a TRUNCATE will be
+ attempted instead of DROP, before COPYing the data.  So if you want
+ to overwrite an existing database without re-writing the schema, then
+ issue --data-only --clean.  Together
+ with --parallel it is a high performance way to load
+ the tables, as it avoids logging to the WAL (if the server is
+ configured with wal_level=minimal).  Warning:
+ foreign key constraints might cause table truncation to fail.
+   
   
  
 
  
   -C
   --create
   

 Create the database before restoring into it.
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fbf5f1c515e..642795f0223 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -97,18 +97,19 @@ typedef struct _restoreOptions
 	int			noOwner;		/* Don't try to match original object owner */
 	int			noTableAm;		/* Don't issue table-AM-related commands */
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;	/* disable triggers during data-only
 	 * restore */
 	int			use_setsessauth;	/* Use SET SESSION AUTHORIZATION commands
 	 * instead of OWNER TO */
 	char	   *superuser;		/* Username to use as superuser */
 	char	   *use_role;		/* Issue SET ROLE to this */
+	int			clean;
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
 	int			no_publications;	/* Skip publication entries */
 	int			no_security_labels; /* Skip security label entries */
 	int			no_subscriptions;	/* Skip subscription entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_du

Re: Logging parallel worker draught

2025-04-14 Thread Benoit Lobréau

On 4/7/25 6:41 PM, Melanie Plageman wrote:
> On Mon, Feb 3, 2025 at 12:37 AM Sami Imseih  wrote:
>> I started looking at this, and I like the idea.

Thanks for taking a look!

> A few comments: I don't understand what 0002 is. For starters, the
> commit message says something about pg_stat_database, and there are no
> changes related to that.


I had originally split this part out while working on the patch to add 
parallel worker stats in pg_stat_database [1], in order to isolate the 
common components. In the end, that patch only accounted for user queries.


I merged it into "Implements logging for parallel worker usage in 
utilities" for v9.



Also, we already have basically identical logging coming from
parallel_vacuum_process_all_indexes() and viewable in existing output.
Not only does your implementation not replace this, it is odd that
setting your new guc to none does not disable this. It seems a bit
inconsistent. I'm not sure what the exact right behavior is here,
though.


That logging is used for the VERBOSE mode of VACUUM. There was also
discussion to add similar info for parallel index creation.


The use case here is different — the goal is to audit parallel worker 
usage across the entire instance, without needing every call site to use 
VACUUM (VERBOSE) along with SET log_min_messages = info.


I avoided reusing that part of the code because I thought the
expectation was to aggregate worker counts and display them in
parallel_vacuum_end(). Sami also mentioned that using the same log
line everywhere in the patch would make parsing easier, which I thought
was a good idea.



> Since your last update, it seems parallel gin index build has been
> committed, so perhaps you want to add that.


Thanks for the heads-up! I've added logging in _gin_end_parallel().

You’ll find the updated patch attached.

[1] https://commitfest.postgresql.org/patch/5212/

--
Benoit
From 59cd090e78a5833b901ad662d6ae1ae936c3172d Mon Sep 17 00:00:00 2001
From: benoit 
Date: Tue, 8 Oct 2024 12:39:41 +0200
Subject: [PATCH 1/3] Add a guc for parallel worker logging

The new guc log_parallel_workers controls whether a log message is
produced to display information on the number of workers spawned when a
parallel query or utility is executed.

The default value is `none` which disables logging. `all` displays
information for all parallel queries, whereas `shortage` displays
information only when the number of workers launched is lower than the
number of planned workers.

This new parameter can help database administrators and developers
diagnose performance issues related to parallelism and optimize the
configuration of the system accordingly.
---
 doc/src/sgml/config.sgml  | 18 ++
 src/backend/access/transam/parallel.c | 19 +++
 src/backend/utils/misc/guc_tables.c   | 19 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/parallel.h | 10 ++
 src/include/utils/guc.h   |  1 +
 6 files changed, 68 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1674c22cb2..b1345d15a84 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7950,6 +7950,24 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+
+  log_parallel_workers (enum)
+  
+   log_parallel_workers configuration parameter
+  
+  
+  
+   
+Controls whether a log message about the number of workers is emitted during the
+execution of a parallel query or utility statement. The default value is
+none which disables logging. all emits
+information for all parallel queries or utilities, whereas shortage
+emits information only when the number of workers launched is lower than the number
+of planned workers.
+   
+  
+ 
+
  
   log_parameter_max_length (integer)
   
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 94db1ec3012..77a8deff30d 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1663,3 +1663,22 @@ LookupParallelWorkerFunction(const char *libraryname, const char *funcname)
 	return (parallel_worker_main_type)
 		load_external_function(libraryname, funcname, true, NULL);
 }
+
+/*
+ * If required, emit information about parallel workers usage in
+ * the logs.
+ */
+void
+LogParallelWorkersIfNeeded(int log_parallel_workers,
+		   int parallel_workers_to_launch,
+		   int parallel_workers_launched)
+{
+	if ((log_parallel_workers == LOG_PARALLEL_WORKERS_ALL &&
+		parallel_workers_to_launch > 0) ||
+		(log_parallel_workers == LOG_PARALLEL_WORKERS_SHORTAGE &&
+		parallel_workers_to_launch != parallel_workers_launched))
+		ereport(LOG,
+(errmsg("launched %i parallel workers (planned: %i)",
+ parallel_workers_launched,
+ parallel_workers_to_launch)));
+}
di

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-14 Thread Jacob Champion
On Fri, Apr 11, 2025 at 9:21 AM Wolfgang Walther
 wrote:
> I tried to apply this patch to nixpkgs' libpq build [1]. First, I pinned
> a recent commit from master (one where the v5 patch will apply cleanly
> later) and enabled --with-libcurl [2].

(The [2] link is missing, I think.)

> 2. The statically linked build fails during configure:

I'm confused by this -- the build produces staticlibs alongside the
dynamically linked ones, so that's what I've been testing against.
What different options do you pass to configure for a "statically
linked build"?

>undefined reference to `psl_is_cookie_domain_acceptable'
>undefined reference to `nghttp2_session_check_request_allowed'
>
> I assume the many libs listed in Libs.private in libcurl.pc are not
> added automatically for this check?

Not unless there is some magic in PKG_CHECK_MODULES I've never heard
of (which is entirely possible!). Furthermore I imagine that the
transitive dependencies of all its dependencies are not added either.

Does your build method currently work for dependency forests like
libgssapi_krb5 and libldap? (I want to make sure I'm not accidentally
doing less work than we currently support for those other deps, but
I'm also not planning to add more feature work as part of this
particular open item.)

> I tried adding "make submake-libpq-oauth", but that doesn't exist.

There is no submake for this because no other targets depend on it.
Currently I don't have any plans to add one (but -C should work).

> When I do "make -C src/interfaces/libpq-oauth", I get this error:
>
>make: *** No rule to make target 'oauth-curl.o', needed by
> 'libpq-oauth-18.so'.  Stop.

I cannot reproduce this. The CI seems happy, too. Is this patch the
only modification you've made to our build system, or are there more
changes?

I'm about to rewrite this part somewhat, so a deep dive may not be very helpful.

Thanks,
--Jacob




Re: Fundamental scheduling bug in parallel restore of partitioned tables

2025-04-14 Thread Tom Lane
I wrote:
> Here's a draft patch for this.  It seems to fix the problem in
> light testing.

I realized that the "repro" I had for this isn't testing the same
thing that Dimitrios is seeing; what it is exposing looks more like
a bug or at least a behavioral change due to the v18 work to record
not-null constraints in pg_constraint [1].  So my patch may fix his
problem or it may not.  It would be good to have a reproducer that
fails (not necessarily every time) in v17 or earlier.

In addition to that uncertainty, pushing the patch now would get in
the way of identifying what's really going on at [1].  So I'm going
to sit on it for now, and maybe it's going to turn into v19 material.

regards, tom lane

[1] https://www.postgresql.org/message-id/1280408.1744650810%40sss.pgh.pa.us




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-04-14 Thread Robert Haas
On Thu, Apr 10, 2025 at 11:15 PM Thomas Munro  wrote:
> The new streaming BHS isn't just issuing probabilistic hints about
> future access obtained from a second iterator.  It has just one shared
> iterator connected up to the workers' ReadStreams.  Each worker pulls
> a disjoint set of blocks out of its stream, possibly running a bunch
> of IOs in the background as required.

It feels to me like the problem here is that the shared iterator is
connected to unshared read-streams. If you make a shared read-stream
object and connect the shared iterator to that instead, does that
solve this whole problem, or is there more to it?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Using pg_bitutils.h in tidbitmap.c.

2025-04-14 Thread Thomas Munro
tidbitmap.c's operations loop over all the bits, but could leap over
zeros with bitmap instructions, as bitmapset.c does.  Hard to imagine that
it matters, but I think it also comes out easier to read?
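
To make the trick concrete, here is a tiny standalone sketch of the loop
shape the patch uses (with GCC/Clang's __builtin_ctzll standing in for
pg_bitutils.h's bmw_rightmost_one_pos, and clearing the low bit with
w & (w - 1) instead of the explicit mask the patch uses):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t bitmapword;

/* Return the position of the rightmost one bit of *w and clear it.
 * *w must not be zero. */
static inline int
pop_rightmost_one(bitmapword *w)
{
	int			pos = __builtin_ctzll(*w);

	*w &= *w - 1;				/* clears the lowest set bit */
	return pos;
}

int
main(void)
{
	bitmapword	w = 0x90005;	/* bits 0, 2, 16, 19 */

	while (w != 0)
		printf("bit %d is set\n", pop_rightmost_one(&w));
	return 0;
}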
From 7eec670224270c098c2fb3ad4b7748b0769a1e95 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 19 Mar 2025 02:38:26 +1300
Subject: [PATCH] Use pg_bitutils.h in tidbitmap.c.

Instead of looping over every bit, skip the zeros.
---
 src/backend/nodes/tidbitmap.c | 108 ++
 1 file changed, 57 insertions(+), 51 deletions(-)

diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 41031aa8f2f..6ca01bdb772 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -44,6 +44,7 @@
 #include "common/int.h"
 #include "nodes/bitmapset.h"
 #include "nodes/tidbitmap.h"
+#include "port/pg_bitutils.h"
 #include "storage/lwlock.h"
 #include "utils/dsa.h"
 
@@ -242,6 +243,20 @@ static int	tbm_shared_comparator(const void *left, const void *right,
 #include "lib/simplehash.h"
 
 
+/*
+ * Return the position of the rightmost bit of *word, and also clear that bit.
+ * Useful for looping over the ones in a word.  *word must not be zero.
+ */
+static inline int
+bmw_pop_rightmost_one(bitmapword *word)
+{
+	int			position = bmw_rightmost_one_pos(*word);
+
+	*word &= ~((bitmapword) 1 << position);
+
+	return position;
+}
+
 /*
  * tbm_create - create an initially-empty bitmap
  *
@@ -474,24 +489,20 @@ tbm_union_page(TIDBitmap *a, const PagetableEntry *bpage)
 
 	if (bpage->ischunk)
 	{
+		BlockNumber blockno = bpage->blockno;
+
 		/* Scan b's chunk, mark each indicated page lossy in a */
 		for (wordnum = 0; wordnum < WORDS_PER_CHUNK; wordnum++)
 		{
 			bitmapword	w = bpage->words[wordnum];
 
-			if (w != 0)
+			while (w != 0)
 			{
-				BlockNumber pg;
+				int			bitnum = bmw_pop_rightmost_one(&w);
 
-				pg = bpage->blockno + (wordnum * BITS_PER_BITMAPWORD);
-				while (w != 0)
-				{
-					if (w & 1)
-						tbm_mark_page_lossy(a, pg);
-					pg++;
-					w >>= 1;
-				}
+				tbm_mark_page_lossy(a, blockno + bitnum);
 			}
+			blockno += BITS_PER_BITMAPWORD;
 		}
 	}
 	else if (tbm_page_is_lossy(a, bpage->blockno))
@@ -584,38 +595,29 @@ tbm_intersect_page(TIDBitmap *a, PagetableEntry *apage, const TIDBitmap *b)
 	{
 		/* Scan each bit in chunk, try to clear */
 		bool		candelete = true;
+		BlockNumber blockno = apage->blockno;
 
 		for (wordnum = 0; wordnum < WORDS_PER_CHUNK; wordnum++)
 		{
 			bitmapword	w = apage->words[wordnum];
+			bitmapword	neww = w;
 
-			if (w != 0)
+			while (w != 0)
 			{
-				bitmapword	neww = w;
-				BlockNumber pg;
-				int			bitnum;
+				int			bitnum = bmw_pop_rightmost_one(&w);
+				BlockNumber pg = blockno + bitnum;
 
-				pg = apage->blockno + (wordnum * BITS_PER_BITMAPWORD);
-				bitnum = 0;
-				while (w != 0)
+				if (!tbm_page_is_lossy(b, pg) &&
+					tbm_find_pageentry(b, pg) == NULL)
 				{
-					if (w & 1)
-					{
-						if (!tbm_page_is_lossy(b, pg) &&
-							tbm_find_pageentry(b, pg) == NULL)
-						{
-							/* Page is not in b at all, lose lossy bit */
-							neww &= ~((bitmapword) 1 << bitnum);
-						}
-					}
-					pg++;
-					bitnum++;
-					w >>= 1;
+					/* Page is not in b at all, lose lossy bit */
+					neww &= ~((bitmapword) 1 << bitnum);
 				}
-				apage->words[wordnum] = neww;
-				if (neww != 0)
-					candelete = false;
 			}
+			apage->words[wordnum] = neww;
+			if (neww != 0)
+				candelete = false;
+			blockno += BITS_PER_BITMAPWORD;
 		}
 		return candelete;
 	}
@@ -910,21 +912,14 @@ tbm_extract_page_tuple(TBMIterateResult *iteritem,
 	{
 		bitmapword	w = page->words[wordnum];
 
-		if (w != 0)
+		while (w != 0)
 		{
-			int			off = wordnum * BITS_PER_BITMAPWORD + 1;
+			int			bitnum = bmw_pop_rightmost_one(&w);
+			int			off = wordnum * BITS_PER_BITMAPWORD + bitnum + 1;
 
-			while (w != 0)
-			{
-				if (w & 1)
-				{
-					if (ntuples < max_offsets)
-						offsets[ntuples] = (OffsetNumber) off;
-					ntuples++;
-				}
-				off++;
-				w >>= 1;
-			}
+			if (ntuples < max_offsets)
+				offsets[ntuples] = off;
+			ntuples++;
 		}
 	}
 
@@ -938,18 +933,29 @@ static inline void
 tbm_advance_schunkbit(PagetableEntry *chunk, int *schunkbitp)
 {
 	int			schunkbit = *schunkbitp;
+	int			wordnum = WORDNUM(schunkbit);
+	int			bitnum = BITNUM(schunkbit);
+	bitmapword	w;
+
+	/* Chop off bits to the right of bitnum in starting word. */
+	w = chunk->words[wordnum] & ~((bitmapword) 0) << bitnum;
 
-	while (schunkbit < PAGES_PER_CHUNK)
+	for (;;)
 	{
-		int			wordnum = WORDNUM(schunkbit);
-		int			bitnum = BITNUM(schunkbit);
+		if (w != 0)
+		{
+			*schunkbitp =
+				wordnum * BITS_PER_BITMAPWORD + bmw_rightmost_one_pos(w);
+			return;
+		}
 
-		if ((chunk->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0)
+		if (++wordnum == WORDS_PER_CHUNK)
 			break;
-		schunkbit++;
+		w = chunk->words[wordnum];
 	}
 
-	*schunkbitp = schunkbit;
+	/* No bits found.  Point past end. */
+	*schunkbitp = WORDS_PER_CHUNK *

Re: New committer: Jacob Champion

2025-04-14 Thread Nazir Bilal Yavuz
On Fri, 11 Apr 2025 at 23:26, Jonathan S. Katz  wrote:
>
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!

Congratulations Jacob!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Aleksander Alekseev
Hi,

> > Also, should we define pg_get_injection_points in the injection_points
> > extension maybe?
>
> As this is global for all libraries, that's not something I would add
> there.

If I didn't miss anything, all SQL functions dealing with injection
points are gathered in injection_points extension so IMO
pg_get_injection_points() belongs there. It would be awkward to have
it in the core considering the fact that injection points presumably
should be disabled in release builds. Users will see a function in \df
that does nothing.

-- 
Best regards,
Aleksander Alekseev




Re: Changing shared_buffers without restart

2025-04-14 Thread Dmitry Dolgov
> On Mon, Apr 14, 2025 at 10:40:28AM GMT, Ashutosh Bapat wrote:
>
> However, when we put back the patches to shrink buffers, we will evict
> the extra buffers, and shrink - if all the processes haven't
> participated in the barrier by then, some of them may try to access
> those buffers - re-installing them and then bad things can happen.

As I've mentioned above, I don't see how a process could try to access a
buffer, if it's on the path between receiving the ProcSignalBarrier and
attaching to the global shmem Barrier, even if we shrink buffers.
AFAICT interrupt handlers should not touch buffers, and otherwise the
process doesn't have any point within this window where it might do
this. Do you have some particular scenario in mind?

> I might have not noticed it, but are we putting two mappings one
> reserved and one allocated in the same address space, so that when the
> allocated mapping shrinks or expands, the reserved mapping continues
> to prohibit any other mapping from appearing there? I looked at some
> of the previous emails, but didn't find anything that describes how
> the reserved mapped space is managed.

I thought so, but this turns out to be incorrect. I have just done a small
experiment -- it looks like when reserving some space, mapping and then
unmapping a small segment from it leaves a non-mapped gap. That would
mean that for shrinking, the newly available space has to be reserved again.
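
The experiment was essentially along these lines (a Linux-flavored sketch
with arbitrary sizes, not the exact code I ran):

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
	size_t		reserve_size = 1UL << 30;	/* 1 GB of address space */
	size_t		seg_size = 1UL << 24;		/* 16 MB live segment */

	/* reserve address space without backing it */
	char	   *base = mmap(NULL, reserve_size, PROT_NONE,
							MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
							-1, 0);

	if (base == MAP_FAILED)
		return 1;

	/* carve an accessible segment out of the reservation */
	if (mmap(base, seg_size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		return 1;

	/* shrinking: this leaves a non-mapped gap in the reservation ... */
	munmap(base, seg_size);

	/* ... so the gap has to be re-reserved explicitly */
	if (mmap(base, seg_size, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
			 -1, 0) == MAP_FAILED)
		return 1;

	printf("reservation restored at %p\n", (void *) base);
	return 0;
}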




Re: Changing shared_buffers without restart

2025-04-14 Thread Ashutosh Bapat
On Mon, Apr 14, 2025 at 12:50 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Mon, Apr 14, 2025 at 10:40:28AM GMT, Ashutosh Bapat wrote:
> >
> > However, when we put back the patches to shrink buffers, we will evict
> > the extra buffers, and shrink - if all the processes haven't
> > participated in the barrier by then, some of them may try to access
> > those buffers - re-installing them and then bad things can happen.
>
> As I've mentioned above, I don't see how a process could try to access a
> buffer, if it's on the path between receiving the ProcSignalBarrier and
> attaching to the global shmem Barrier, even if we shrink buffers.
> AFAICT interrupt handlers should not touch buffers, and otherwise the
> process doesn't have any point within this window where it might do
> this. Do you have some particular scenario in mind?

ProcessProcSignalBarrier() is not within an interrupt handler but it
responds to a flag set by an interrupt handler. After calling
pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration,
shared_gen); it will enter the loop

while (flags != 0)
where it may process many barriers before processing
PROCSIGNAL_BARRIER_SHMEM_RESIZE. Nothing stops the other barrier
processing code from touching buffers. Right now it's just smgrrelease
that gets called in the other barrier. But that's not guaranteed in
future.
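
For context, the loop in question looks roughly like this (paraphrased
from procsignal.c; interrupt handling and the retry bookkeeping are
trimmed):

	while (flags != 0)
	{
		bool		processed = true;
		ProcSignalBarrierType type;

		type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
		switch (type)
		{
			case PROCSIGNAL_BARRIER_SMGRRELEASE:
				processed = ProcessBarrierSmgrRelease();
				break;
				/* a future PROCSIGNAL_BARRIER_SHMEM_RESIZE case would run
				 * here, after any earlier barrier types already did their
				 * work */
		}

		/* the bit is always cleared; unprocessed barriers are retried */
		BARRIER_CLEAR_BIT(flags, type);
	}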

>
> > I might have not noticed it, but are we putting two mappings one
> > reserved and one allocated in the same address space, so that when the
> > allocated mapping shrinks or expands, the reserved mapping continues
> > to prohibit any other mapping from appearing there? I looked at some
> > of the previous emails, but didn't find anything that describes how
> > the reserved mapped space is managed.
>
> I thought so, but this turns out to be incorrect. I have just done a small
> experiment -- it looks like when reserving some space, mapping and then
> unmapping a small segment from it leaves a non-mapped gap. That would
> mean that for shrinking, the newly available space has to be reserved again.

Right. That's what I thought. But I didn't see the corresponding code.
So we have to keep track of two mappings for every segment, one for
allocation and one for reserving space, and resize those two while
shrinking and expanding buffers. Am I correct?


--
Best Wishes,
Ashutosh Bapat




doc patch: clarify the naming rule for injection_points

2025-04-14 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

This is a forked thread from [1]. 

The naming rule for points is not determined yet, but most of them use lower
case, with terms separated by a dash "-". I think this is a good chance to
formally clarify it. PSA a patch adding the description.

I was unsure about the correct place for the description. I added it at the
end of the first paragraph, because that paragraph describes how we add and
use injection points. Suggestions are very welcome.

[1]: 
https://www.postgresql.org/message-id/OSCPR01MB14966B78F3AF15C252EB9E02FF5B32%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED



0001-Doc-clarify-the-naming-rule-for-injection-points.patch
Description:  0001-Doc-clarify-the-naming-rule-for-injection-points.patch


Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2025-04-14 Thread Tender Wang
Hi,

While debugging the hashjoin code, I found that in
estimate_multivariate_bucketsize() the list_copy(hashclauses) below is
unnecessary if we have a single join clause.

List   *clauses = list_copy(hashclauses);
...

The attached patch moves the list_copy() call to after the early return.
This saves some function-call and memory-copy overhead.

Any thoughts?

-- 
Thanks, Tender Wang
From d4a29af25ef276a145b46ecc16d631909fb1891a Mon Sep 17 00:00:00 2001
From: Tender Wang 
Date: Mon, 14 Apr 2025 13:53:15 +0800
Subject: [PATCH] Adjust the place of list_copy() in
 estimate_multivariate_bucketsize.

If we have a single hashclause, list_copy(hashclauses) is unnecessary.
This adjustment can save the overhead of function calls and memory
copies.
---
 src/backend/utils/adt/selfuncs.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 588d991fa57..786fea16bc5 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3799,18 +3799,18 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
 								 List *hashclauses,
 								 Selectivity *innerbucketsize)
 {
-	List	   *clauses = list_copy(hashclauses);
+	List	   *clauses = NIL;
 	List	   *otherclauses = NIL;
 	double		ndistinct = 1.0;
 
+	/*
+	 * Nothing to do for a single clause.  Could we employ univariate
+	 * extended stat here?
+	 */
 	if (list_length(hashclauses) <= 1)
-
-		/*
-		 * Nothing to do for a single clause.  Could we employ univariate
-		 * extended stat here?
-		 */
 		return hashclauses;
 
+	clauses = list_copy(hashclauses);
 	while (clauses != NIL)
 	{
 		ListCell   *lc;
-- 
2.34.1



Re: New committer: Jacob Champion

2025-04-14 Thread Matheus Alcantara

On 11/04/25 17:26, Jonathan S. Katz wrote:
The Core Team would like to extend our congratulations to Jacob 
Champion, who has accepted an invitation to become our newest 
PostgreSQL committer.


Please join us in wishing Jacob much success and few reverts!



Congrats Jacob! Well deserved.

--
Matheus Alcantara




Re: Assert failure in base_yyparse

2025-04-14 Thread Richard Guo
On Fri, Mar 28, 2025 at 7:12 PM Евгений Горбанев  wrote:
> If you replace is_not_null with NOT NULL in the query, everything works
> correctly.
> It seems that is_not_null is an incorrect keyword and it should not be
> used, but I don't understand how it gets here.

It seems what happens is that internally in gram.y (~line 14274), the
DefElem for the not-null option is assigned the name "is_not_null".
As a result, this allows users to explicitly use "is_not_null" as the
option name.  However, the value provided for the is_not_null option
in this way might not be a Boolean as expected, which triggers the
assertion.
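
To illustrate (paraphrased; the exact node-building calls depend on the
branch), the NOT NULL / NULL shorthands in the xmltable column option list
are built as

	defel = makeDefElem("is_not_null", (Node *) makeBoolean(true), location);
	defel = makeDefElem("is_not_null", (Node *) makeBoolean(false), location);

while a user-written option name goes through the generic path

	defel = makeDefElem(option_name, option_value, location);

so spelling out is_not_null with an arbitrary value builds the same DefElem
with a non-Boolean argument, and the later boolVal(defel->arg) trips the
assertion.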

I kind of doubt we should allow the use of the "is_not_null" keyword
in the xmltable function.

Hi Álvaro, what do you think about this?

Thanks
Richard




Re: New committer: Jacob Champion

2025-04-14 Thread Aleksander Alekseev
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.

Congrats, Jacob!

-- 
Best regards,
Aleksander Alekseev




Re: someone else to do the list of acknowledgments

2025-04-14 Thread Aleksander Alekseev
Hi,

> >> The whole thing might take about 20 to 30 hours wall-clock time.
> >
> > After this dev cycle, things with a defined end to them hold a greater 
> > attraction than they did previously.
> >
> >>
> >> So, there is some time to think about this.  Please discuss here if
> >> you're interested or have questions.
> >
> > I am interested.
>
> +1

Sorry, I've just realized that my +1 can be interpreted as both voting
for Corey and as volunteering for making the list of acknowledgments.
To clarify, I meant that I'm interested too :)

-- 
Best regards,
Aleksander Alekseev




Re: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Ranier Vilela
On Mon, Apr 14, 2025 at 09:46, Ranier Vilela wrote:

> Hi Michael.
>
> On Sun, Apr 13, 2025 at 21:36, Michael Paquier wrote:
>
>> Hi all,
>>
>> One thing that's been lacking in injection points is the possibility
>> to look at the state of the injection points in shared memory through
>> some SQL.  This was on my tablets for some time, but I have not taken
>> the time to do the actual legwork.
>>
>> The attached patch adds a SRF that returns a set of tuples made of the
>> name, library and function for all the injection points attached to
>> the system.  This implementation relies on a new function added in
>> injection_point.c, called InjectionPointList(), that retrieves a
>> palloc()'d array of the injection points, hiding from the callers the
>> internals of what this stuff does with the shmem array lookup.
>>
>> This is useful for monitoring or in tests, to make sure for example
>> that nothing is left around at the end of a script.  I have a
>> different proposal planned for this area of the code, where this
>> function would be good to have, but I am sending an independent patch
>> as this stuff is useful on its own.
>>
>> The patch includes a couple of tests and some documentation.
>>
> I think that it would be more productive to use the "int idx" to store
> *num_points, avoiding counting inside the loop, no?
>
> Function InjectionPointList:
> + uint32 max_inuse;
> + int idx;
> + for (idx = 0; idx < max_inuse; idx++)
> - (*num_points)++;
> + *num_points = idx;
> + return result;
>
Nevermind, this is wrong, sorry for the noise.

best regards,
Ranier Vilela


Re: COALESCE with single argument looks like identity function

2025-04-14 Thread Maksim Milyutin

On 4/11/25 17:00, Tom Lane wrote:

Maksim Milyutin writes:

I've noticed that the COALESCE function doesn't collapse to its argument
expression when it is alone in the COALESCE argument list, as part of the
expression simplification routine in the planner. This might suppress
further useful transformations when non-strict ops are required from
some expression, like converting an OUTER JOIN to an INNER one with a WHERE
qual containing COALESCE over a single column from the inner side.

Seems like a reasonable idea --- it's probably a rare case, but the
check is cheap enough.  I'd add some comments though.



Thanks for your comments.



Please add this to the open commitfest so we don't lose track of it.



Done. In the regression tests I've replaced all single-argument COALESCEs
with ones having a dummy second argument, to preserve the coalesce calls:
AFAICS their usages are intentional, wrapping attributes to generate PHVs
above.
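
For reference, the shape of the change is tiny (a paraphrased sketch of
the CoalesceExpr handling in eval_const_expressions_mutator(), after the
argument list has been processed):

	if (list_length(newargs) == 1)
	{
		/* COALESCE(expr) is equivalent to expr itself, drop the wrapper */
		return (Node *) linitial(newargs);
	}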



Also I've noticed an issue in this query (in the join.sql test suite):

SELECT 1 FROM group_tbl t1
    LEFT JOIN (SELECT a c1, *COALESCE(a)* c2 FROM group_tbl t2) s ON TRUE
GROUP BY s.c1, s.c2

the repeated t2.a in the GROUP BY clause is not collapsed to a single appearance:

 QUERY PLAN

 Group
   Group Key: t2.a, *t2.a*
   ->  Sort
 Sort Key: t2.a, *t2.a*
 ->  Nested Loop Left Join
   ->  Seq Scan on group_tbl t1
   ->  Seq Scan on group_tbl t2

IMO the cause is the PHV surrounding s.c2, which differentiates its
internal expression from the identical first grouping key.


--
Best regards,
Maksim Milyutin


Re: rounding_up

2025-04-14 Thread Tom Lane
Christoph Moench-Tegeder  writes:
> ## Daria Shanina (vilensip...@gmail.com):
>> I noticed, when we parse and validate values (in particular, the int type),
>> we use the *rint* method, but unfortunately it does not work according to
>> the round rules.

> First question would be "which round rule?" as (of course) there're
> multiple to chose from.

Yeah.  Round-to-nearest-even is a well-respected rule, which is why
it's the default per IEEE 754.  I don't see a good reason for us
to switch.  Even if someone advanced an argument, it would have
to be a *mighty* convincing argument to justify breaking backwards
compatibility here.

I do find it a little unfortunate that our numeric type does it
differently than our float types.  Again though, there's a huge
compatibility argument against changing that now.  It does give
you an "out" if you really need one or the other behavior for
a particular application: you can cast to numeric or float8
before casting to int.
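
For anyone wanting to see the rule in action, a trivial standalone check
(assuming the default IEEE 754 rounding mode is in effect):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	/* rint() rounds to nearest, ties to even */
	printf("%.0f %.0f %.0f %.0f\n",
		   rint(0.5), rint(1.5), rint(2.5), rint(3.5));
	/* prints: 0 2 2 4 */
	return 0;
}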

regards, tom lane




Re: support create index on virtual generated column.

2025-04-14 Thread jian he
On Wed, Mar 26, 2025 at 5:36 PM Kirill Reshke  wrote:
> reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
> CREATE TABLE
> reshke=# create index on xx (b);
> CREATE INDEX
> reshke=#
> reshke=# \d+ xx
>  Table "public.xx"
>  Column |  Type   | Collation | Nullable |   Default
> | Storage | Compression | Stats target | Description
> +-+---+--+-+-+-+--+-
>  a  | integer |   |  |
> | plain   | |  |
>  b  | integer |   |  | generated always as (a * 2)
> | plain   | |  |
> Indexes:
> "xx_b_idx" btree (b)
> Access method: heap
>
> reshke=# alter table xx drop column b;
> ALTER TABLE
> reshke=# \d+ xx
>Table "public.xx"
>  Column |  Type   | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> +-+---+--+-+-+-+--+-
>  a  | integer |   |  | | plain   |
> |  |
> Indexes:
> "xx_b_idx" btree ("pg.dropped.2" int4_ops)
> Access method: heap
>
> reshke=#
> ```
>
> with regular columns we have different behaviour - with drop column we
> drop the index
>

I was wrong about the dependency.
When creating an index on a virtual generated column, the index will have
dependencies on the virtual generated column attribute and on the
attribute associated with the generation expression.

New patch attached. Now:
ALTER TABLE DROP COLUMN works fine.
ALTER INDEX ATTACH PARTITION works fine.
Creating such an index on a partitioned table works just fine.
For table inheritance, CREATE INDEX on the parent table does not cascade
to child tables, so we don't need to worry about that case.
From 28852a6df9ebcfc68f8b97248eb811ec22307065 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 14 Apr 2025 19:07:48 +0800
Subject: [PATCH v2 1/1] index on virtual generated column

* internally such index transformed into expression index.
  For example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL)
  internal representation: an expression index on ((a * 2)).
* in pageinspect module, add some test to check the index content of virtual generated column.
* primary key, unique index over virtual generated column are not supported.
* expression index and predicate index over virtual generated columns are
  currently not supported.
* virtual generated column can not be in "include column"
* all types of indexes are supported, hash index, gist index regress test added.
* To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track the original
  virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can
  identify which index needs to be rebuilt.
* ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really
  need to track the virtual generated column attribute number that index was built on.

* if the partitioned table and partition both have an index, then the index over the virtual
  generated column should be the same expression. For example, the following last
  command should error out.

CREATE TABLE parted (b integer,c integer,a integer GENERATED ALWAYS AS (c+1))PARTITION BY RANGE (b);
CREATE TABLE part (b integer,c integer,a integer GENERATED ALWAYS AS (c));
create index on part(a);
create index on parted(a);
alter table parted ATTACH partition part for values from (1) to (10);

discussion: https://postgr.es/m/CACJufxGao-cypdNhifHAdt8jHfK6-HX=trbovbkgruxw063...@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/5667/
---
 contrib/pageinspect/expected/btree.out|  33 ++
 contrib/pageinspect/sql/btree.sql |  21 ++
 doc/src/sgml/catalogs.sgml|  15 +
 src/backend/catalog/index.c   |  69 
 src/backend/commands/indexcmds.c  | 301 +++---
 src/backend/commands/tablecmds.c  |  61 
 src/backend/parser/parse_utilcmd.c|  25 +-
 src/backend/utils/adt/ruleutils.c |  28 +-
 src/include/catalog/index.h   |   6 +
 src/include/catalog/pg_index.h|   1 +
 src/include/nodes/execnodes.h |   1 +
 src/test/regress/expected/fast_default.out|   8 +
 .../regress/expected/generated_virtual.out| 199 ++--
 src/test/regress/sql/fast_default.sql |   6 +
 src/test/regress/sql/generated_virtual.sql| 111 +--
 15 files changed, 803 insertions(+), 82 deletions(-)

diff --git a/contrib/pageinspect/expected/btree.out b/contrib/pageinspect/expected/btree.out
index 0aa5d73322f..56d57848cf7 100644
--- a/contrib/pageinspect/expected/btree.out
+++ b/contrib/pageinspect/expected/btree.out
@@ -183,6 +183,39 @@ tids   |
 
 SELECT * FROM bt_page

Re: Adding facility for injection points (or probe points?) for more advanced tests

2025-04-14 Thread Michael Paquier
On Mon, Apr 14, 2025 at 01:24:30PM +0900, Michael Paquier wrote:
> IMO, we ought to clean up the way the AIO code does its tests with
> injection point with something like the patch of this thread.  And
> perhaps even do it in v18 to have the code in a cleaner state at
> release.  I'll start a new thread after hacking my way through that.
> The core injection point patch still needs a bit of work compared to
> what was sent previously.

Please see here, on a new thread:
https://www.postgresql.org/message-id/z_y9ttnxubvya...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: Rename injection point names in test_aio

2025-04-14 Thread Michael Paquier
On Mon, Apr 14, 2025 at 08:13:44AM +, Hayato Kuroda (Fujitsu) wrote:
> I have no objections to the patch, but I feel there are no concrete naming
> rules (I was confused while creating patches).

There has been a sort of implied rule in the code to use lower-case
characters, with terms separated by dashes.  Perhaps we could make
that more official with an extra sentence in the docs about that.

> Can we clarify that? E.g., first term should be a module or process,
> or something like that.

Not sure that it would be a good thing to put context-specific
restrictions here.

Anyway, would you like to propose a patch for the documentation?
--
Michael


signature.asc
Description: PGP signature


Re: support create index on virtual generated column.

2025-04-14 Thread Kirill Reshke
On Mon, 14 Apr 2025 at 16:10, jian he  wrote:
>
> new patch attached. Now,
> ALTER TABLE DROP COLUMN works fine.
> ALTER INDEX ATTACH PARTITION works fine.
> creating such an index on a partitioned table works just fine.
> for table inheritance: create index on parent table will not cascade
> to child table,
> so we don't need to worry about this.

Hi! I reviewed v2, and it seems to be working now.

But there are tests that are commented out; what is their purpose? I
note that commit 83ea6c5 also included some commented-out tests, so
perhaps there's a reason I'm not aware of.

```
ALTER TABLE gtest22c DROP COLUMN e;
\d gtest22c

-- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6;
-- SELECT * FROM gtest22c WHERE b * 3 = 6;
-- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
-- SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
```


-- 
Best regards,
Kirill Reshke




Re: New committer: Jacob Champion

2025-04-14 Thread wenhui qiu
Congratulations

On Mon, Apr 14, 2025 at 6:56 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> > The Core Team would like to extend our congratulations to Jacob
> > Champion, who has accepted an invitation to become our newest PostgreSQL
> > committer.
>
> Congrats, Jacob!
>
> --
> Best regards,
> Aleksander Alekseev
>
>
>


Re: Adding error messages to a few slash commands

2025-04-14 Thread Pavel Luzanov

On 13.04.2025 19:40, Abhishek Chanda wrote:

Thanks for the feedback, attached an updated patch that changes most
of those "Did not find" messages to empty tables. I did not change two
sets, listDbRoleSettings and listTables both have comments describing
that these are deliberately this way.


Thanks.


I wanted to post this update to close this loop for now. I will follow
up once the merge window opens again.


I recommend creating a new entry for this patch in the open July
commitfest.

I will be able to review this patch a couple of months from now, in June,
if no one wants to review it earlier.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com


Re: COALESCE with single argument looks like identity function

2025-04-14 Thread Maksim Milyutin

Updated patchset is attached


On 4/14/25 17:25, Maksim Milyutin wrote:

On 4/11/25 17:00, Tom Lane wrote:

Maksim Milyutin writes:

I've noticed that the COALESCE function doesn't collapse to its argument
expression when it is alone in the COALESCE argument list, as part of the
expression simplification routine in the planner. This might suppress
further useful transformations when non-strict ops are required from
some expression, like converting an OUTER JOIN to an INNER one with a WHERE
qual containing COALESCE over a single column from the inner side.

Seems like a reasonable idea --- it's probably a rare case, but the
check is cheap enough.  I'd add some comments though.



Thanks for your comments.



Please add this to the open commitfest so we don't lose track of it.



Done. In the regression tests I've replaced all single-argument COALESCEs
with ones having a dummy second argument, to preserve the coalesce calls:
AFAICS their usages are intentional, wrapping attributes to generate PHVs
above.



Also I've noticed an issue in this query (in the join.sql test suite):

SELECT 1 FROM group_tbl t1
    LEFT JOIN (SELECT a c1, *COALESCE(a)* c2 FROM group_tbl t2) s ON TRUE
GROUP BY s.c1, s.c2

the repeated t2.a in the GROUP BY clause is not collapsed to a single
appearance:


 QUERY PLAN

 Group
   Group Key: t2.a, *t2.a*
   ->  Sort
 Sort Key: t2.a, *t2.a*
 ->  Nested Loop Left Join
   ->  Seq Scan on group_tbl t1
   ->  Seq Scan on group_tbl t2

IMO the cause is the PHV surrounding s.c2, which differentiates its
internal expression from the identical first grouping key.



--
Best regards,
Maksim Milyutin
From 294d7fdf105dac6a17a6ac94a832f4cdf02e6060 Mon Sep 17 00:00:00 2001
From: Maksim Milyutin 
Date: Mon, 14 Apr 2025 17:06:50 +0300
Subject: [PATCH v1 2/2] Adjust regression tests

---
 src/test/regress/expected/join.out  | 32 -
 src/test/regress/expected/subselect.out | 46 -
 src/test/regress/sql/join.sql   | 10 +++---
 src/test/regress/sql/subselect.sql  |  8 ++---
 4 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 14da5708451..6c55eeda6be 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5384,14 +5384,14 @@ select * from
   (select 1 as id) as xx
   left join
 (tenk1 as a1 full join (select 1 as id) as yy on (a1.unique1 = yy.id))
-  on (xx.id = coalesce(yy.id));
-  QUERY PLAN   

+  on (xx.id = coalesce(yy.id, 0));
+   QUERY PLAN   
+
  Nested Loop Left Join
->  Result
->  Hash Full Join
  Hash Cond: (a1.unique1 = (1))
- Filter: (1 = COALESCE((1)))
+ Filter: (1 = COALESCE((1), 0))
  ->  Seq Scan on tenk1 a1
  ->  Hash
->  Result
@@ -5401,7 +5401,7 @@ select * from
   (select 1 as id) as xx
   left join
 (tenk1 as a1 full join (select 1 as id) as yy on (a1.unique1 = yy.id))
-  on (xx.id = coalesce(yy.id));
+  on (xx.id = coalesce(yy.id, 0));
  id | unique1 | unique2 | two | four | ten | twenty | hundred | thousand | twothousand | fivethous | tenthous | odd | even | stringu1 | stringu2 | string4 | id 
 +-+-+-+--+-++-+--+-+---+--+-+--+--+--+-+
   1 |   1 |2838 |   1 |1 |   1 |  1 |   1 |1 |   1 | 1 |1 |   2 |3 | BA   | EFEAAA   | xx  |  1
@@ -8123,20 +8123,20 @@ select * from int4_tbl i left join
 
 explain (verbose, costs off)
 select * from int4_tbl i left join
-  lateral (select coalesce(i) from int2_tbl j where i.f1 = j.f1) k on true;
- QUERY PLAN  
--
+  lateral (select coalesce(i, row(0)::int4_tbl) from int2_tbl j where i.f1 = j.f1) k on true;
+  QUERY PLAN  
+--
  Nested Loop Left Join
-   Output: i.f1, (COALESCE(i.*))
+   Output: i.f1, (COALESCE(i.*, '(0)'::int4_tbl))
->  Seq Scan on public.int4_tbl i
  Output: i.f1, i.*
->  Seq Scan on public.int2_tbl j
- Output: j.f1, COALESCE(i.*)
+ Output: j.f1, COALESCE(i.*, '(0)'::int4_tbl)
  Filter: (i.f1 = j.f1)
 (7 rows)
 
 select * from int4_tbl i left join
-  lateral (select coalesce(i) from int2_tbl j where i.f1 = j.f1) k on true;
+  lateral (select coalesce(i, row(0)::int4_tbl) from int2_tbl j where i.f1 = j.f1) k on true;
  f1  | coalesce 
 -+--
0 | (0)
@@ -9305,14 +9305,14 @@ CREATE STATISTICS group_tbl_stat (ndistinct) ON a, b FROM group_tbl;
 ANALYZE group_tbl;
 EXPLAIN (COSTS OFF)
 SELECT 1 FROM group_tbl 

Re: Performance issues with v18 SQL-language-function changes

2025-04-14 Thread Robert Haas
On Sun, Apr 13, 2025 at 3:23 PM Tom Lane  wrote:
> create function fx(p_summa bigint) returns text immutable strict
> return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));
>
> explain analyze select fx(i) from generate_series(1,100) as i(i);
>
> you arrive at the rude discovery that 0dca5d68d is about 50% slower
> than 0dca5d68d^, because the old implementation builds a plan for fx()
> only once and then re-uses it throughout the query.

I agree that we should do something about this. I haven't reviewed
your patches but the approach sounds broadly reasonable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Michael Paquier
On Tue, Apr 15, 2025 at 02:49:24AM +, Hayato Kuroda (Fujitsu) wrote:
> Should cur_pos be uint32 instead of int? Either of them can work because
> cur_pos can be [0, 128], but it may be clearer.
> 
> Apart from above, LGTM.

Sure, why not.  Thanks for the review.

An important thing may be to split the change into two:
- First one for the API in injection_point.c.
- Second one for the new function.

The second patch is still something folks seem to be meh about, but if
the end consensus is to put the new SQL function in the module
injection_points we are going to need the first patch anyway.
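
For reference, a consumer of the new API would look about like this (a
sketch only; the elog() reporting is just for illustration):

	static void
	report_injection_points(void)
	{
		uint32		num_points;
		InjectionPointData *points = InjectionPointList(&num_points);

		for (uint32 i = 0; i < num_points; i++)
			elog(LOG, "injection point \"%s\" (%s:%s)",
				 points[i].name, points[i].library, points[i].function);

		if (points)
			pfree(points);
	}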
--
Michael
From d738943616f1af8fc2ce1d309a3b7a0b77f29a22 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 15 Apr 2025 14:40:30 +0900
Subject: [PATCH v3 1/2] Add InjectionPointList() to retrieve list of injection
 points

This hides the internals of the shmem array lookup, allocating the
result in a palloc'd array usable by the caller.
---
 src/include/utils/injection_point.h  | 14 +++
 src/backend/utils/misc/injection_point.c | 52 
 src/tools/pgindent/typedefs.list |  1 +
 3 files changed, 67 insertions(+)

diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index 6ba64cd1ebc6..c5c78914b848 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -11,6 +11,17 @@
 #ifndef INJECTION_POINT_H
 #define INJECTION_POINT_H
 
+/*
+ * Injection point data, used when retrieving a list of all the attached
+ * injection points.
+ */
+typedef struct InjectionPointData
+{
+   const char *name;
+   const char *library;
+   const char *function;
+} InjectionPointData;
+
 /*
  * Injection points require --enable-injection-points.
  */
@@ -46,6 +57,9 @@ extern void InjectionPointCached(const char *name);
 extern bool IsInjectionPointAttached(const char *name);
 extern bool InjectionPointDetach(const char *name);
 
+/* Get the current set of injection points attached */
+extern InjectionPointData *InjectionPointList(uint32 *num_points);
+
 #ifdef EXEC_BACKEND
 extern PGDLLIMPORT struct InjectionPointsCtl *ActiveInjectionPoints;
 #endif
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index 97ab851f0a63..1f76f25aafb9 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -584,3 +584,55 @@ IsInjectionPointAttached(const char *name)
return false;   /* silence compiler */
 #endif
 }
+
+/*
+ * Retrieve a list of all the injection points currently attached.
+ *
+ * This list is palloc'd in the current memory context.
+ */
+InjectionPointData *
+InjectionPointList(uint32 *num_points)
+{
+#ifdef USE_INJECTION_POINTS
+   InjectionPointData *result;
+   uint32  max_inuse;
+   int cur_pos = 0;
+
+   LWLockAcquire(InjectionPointLock, LW_SHARED);
+
+   max_inuse = pg_atomic_read_u32(&ActiveInjectionPoints->max_inuse);
+
+   /*
+* This overestimates the allocation size, including slots that may be
+* free, for simplicity.
+*/
+   result = palloc0(sizeof(InjectionPointData) * max_inuse);
+
+   for (uint32 idx = 0; idx < max_inuse; idx++)
+   {
+   InjectionPointEntry *entry;
+   uint64  generation;
+
+   entry = &ActiveInjectionPoints->entries[idx];
+   generation = pg_atomic_read_u64(&entry->generation);
+
+   /* skip free slots */
+   if (generation % 2 == 0)
+   continue;
+
+   result[cur_pos].name = pstrdup(entry->name);
+   result[cur_pos].library = pstrdup(entry->library);
+   result[cur_pos].function = pstrdup(entry->function);
+   cur_pos++;
+   }
+
+   LWLockRelease(InjectionPointLock);
+
+   *num_points = cur_pos;
+   return result;
+
+#else
+   *num_points = 0;
+   return NULL;
+#endif
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d16bc208654b..e7df2dc10684 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1281,6 +1281,7 @@ InjectionPointCacheEntry
 InjectionPointCallback
 InjectionPointCondition
 InjectionPointConditionType
+InjectionPointData
 InjectionPointEntry
 InjectionPointsCtl
 InjectionPointSharedState
-- 
2.49.0

From 0e0fb1cfbc2ab863f5e3ff38acd6a29ba1656161 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 15 Apr 2025 14:40:55 +0900
Subject: [PATCH v3 2/2] Add pg_get_injection_points()

This is a system function that displays the information about the
injection points currently attached to the system, feeding from the
states of things in shared memory.
---
 src/include/catalog/pg_proc.dat   |  8 +++
 src/backend/utils/misc/Makefile   |  1 +
 .../utils/misc/injection_point_funcs.c| 59 

Re: pg_dump --if-exists --clean when drop index that is partition of a partitioned index

2025-04-14 Thread Pavel Stehule
On Tue, Apr 15, 2025 at 7:33, jian he wrote:

> On Mon, Apr 14, 2025 at 2:09 PM Pavel Stehule 
> wrote:
> >
> > Hi
> >
> > On Mon, Apr 14, 2025 at 7:54, jian he wrote:
> >>
> >> hi.
> >>
> >> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
> >> CREATE TABLE tp_1(c int, a int, b int);
> >> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
> >> CREATE INDEX t_a_idx ON tp_1(a);
> >> CREATE INDEX tp_a_idx ON tp(a);
> >>
> >> pg_dump  --schema=public --if-exists --clean --no-statistics
> >> --no-owner --no-data --table-and-children=tp > 1.sql
> >> pg_dump output file 1.sql excerpts:
> >> 
> >> DROP INDEX IF EXISTS public.t_a_idx;
> >> DROP INDEX IF EXISTS public.tp_a_idx;
> >> DROP TABLE IF EXISTS public.tp_1;
> >> DROP TABLE IF EXISTS public.tp;
> >> 
> >> if you execute the
> >> DROP INDEX IF EXISTS public.t_a_idx;
> >>
> >> ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
> >> HINT:  You can drop index tp_a_idx instead.
> >>
> >> Is this pg_dump output what we expected?
> >>
> >
> > It is a bug, I think; the implementation of these parts of the code is
> > older than partitioning support and doesn't do the necessary detach.
> >
>
>
> seems pretty easy to fix.
> we only need dropStmt when IndxInfo->parentidx oid is invalid.
>
> +if (!OidIsValid(indxinfo->parentidx))
> +appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);
>

I don't think it is the correct fix,

because then CREATE INDEX qqindxname fails.

The motivation for using the --clean and --if-exists options is the
possibility of calling restore idempotently. It should not fail when I
restore into an empty database, and it shouldn't fail if I restore into
an existing database.

Regards

Pavel



>
> I have tested the above changes on PG11, master.
>


Re: Recent pg_rewind test failures in buildfarm

2025-04-14 Thread Alexander Lakhin

Hello Tom,

15.04.2025 05:58, Tom Lane wrote:

In the last day or so, both skink and mamba have hit this
in the pg_rewind test suite [1][2]:

#3  0x01f03f7c in ExceptionalCondition (conditionName=conditionName@entry=0x2119c4c 
"pending_since == 0", fileName=fileName@entry=0x2119454 "pgstat.c", 
lineNumber=lineNumber@entry=734) at assert.c:66
...

That assert appears to be several years old, and the
008_min_recovery_point.pl test script that's triggering it hasn't
changed very recently either, so I'm baffled where to start digging.
It has the odor of a timing problem, so maybe we just started hitting
this by chance.  Still ... anybody have an idea?

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-04-13%2018%3A55%3A03
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-04-15%2001%3A00%3A04



FWIW, there was also another test (028_row_filter.pl) that failed due to
that assert being triggered: [3].

I've managed to reproduce this failure locally when running 10 instances
of 008_min_recovery_point.pl in parallel under Valgrind.

Will try to investigate the issue during this week.

[3] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-04-11%2007%3A41%3A36

Best regards,
Alexander Lakhin
Neon (https://neon.tech)




Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-14 Thread David Rowley
On Tue, 15 Apr 2025 at 11:14, Ilia Evdokimov
 wrote:
> On 14.04.2025 23:53, David Rowley wrote:
> If we can't get consensus for everything people want to add at once
> then maybe the patch could be broken into two, with 0001 being pretty
> much the v4 patch and then have 0002 add the Estimated Hit Ratio.
> Having the physical patches and being able to try it out or view the
> regression test changes might lower the bar on people chipping in with
> their views.
>
>
> Agreed - I’ll split the patch into two: one adds Estimated Capacity and 
> Estimated Distinct Keys, and the second adds Estimated Hit Ratio. I’ll also 
> add regression test differences to make it easier to evaluate the usefulness 
> of each part.

Thanks for the patches.

I'm just looking for ways to allow us to group all three of these
together for the TEXT format type. I see the BUFFERS are displayed
quite compactly, e.g. "Buffers: shared hit=17 read=4".
show_wal_usage() does something similar and so does
show_modifytable_info() for MERGE.

Maybe we could compress the text output a bit with:

"Estimates: capacity=N distinct keys=N hit ratio=N.N%"

If we get it that compact, maybe lookups could fit in there too, i.e.:

"Estimates: capacity=N distinct keys=N lookups=N hit ratio=N.N%"

I'd also like to vote that you modify explain.c and multiply the
hit_ratio by 100 and use "%" as the unit in ExplainPropertyFloat().
Just looking at the raw number of "1.00" in the expected output, it
isn't obvious if the planner expects every lookup to be a cache hit or
just 1% of them.
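
Putting both together, roughly this shape in explain.c is what I have in
mind (a sketch; the est_* parameters are placeholders for wherever the
Memoize estimates end up being stored):

	static void
	show_memoize_estimates(ExplainState *es, uint32 est_entries,
						   double est_unique_keys, double est_lookups,
						   double est_hit_ratio)
	{
		if (es->format == EXPLAIN_FORMAT_TEXT)
		{
			ExplainIndentText(es);
			appendStringInfo(es->str,
							 "Estimates: capacity=%u distinct keys=%.0f lookups=%.0f hit ratio=%.2f%%\n",
							 est_entries, est_unique_keys, est_lookups,
							 est_hit_ratio * 100.0);
		}
		else
		{
			ExplainPropertyUInteger("Estimated Capacity", NULL,
									est_entries, es);
			ExplainPropertyFloat("Estimated Distinct Lookup Keys", NULL,
								 est_unique_keys, 0, es);
			ExplainPropertyFloat("Estimated Lookups", NULL, est_lookups,
								 0, es);
			ExplainPropertyFloat("Estimated Hit Ratio", "%",
								 est_hit_ratio * 100.0, 2, es);
		}
	}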

Also, to get something commitable, you'll also need to modify
explain_memoize() at the top of memoize.sql and add handling to mask
out the value of these new properties. These are not going to be
anywhere near stable enough across platforms to have these shown in
the expected output files.

David




Re: not null constraints, again

2025-04-14 Thread Tender Wang
On Tue, Apr 15, 2025 at 11:20, Tender Wang wrote:

>
>
> On Tue, Apr 15, 2025 at 05:39, Tom Lane wrote:
>
>> Alvaro Herrera  writes:
>> > On 2025-Apr-14, Tom Lane wrote:
>> >> I would not have expected that adding pg_constraint rows implies
>> >> stronger locks than what ALTER ADD PRIMARY KEY was using before,
>> >> and I suspect that doing so will cause more problems than just
>> >> breaking parallel restore.
>>
>> > I wasn't aware of this side effect.  I'll investigate this in more
>> > depth.  I suspect it might be a bug in the way we run through ALTER
>> > TABLE for the primary key.
>>
>> After further thought it occurs to me that it might not be a case
>> of "we get stronger locks", but a case of "we accidentally get a
>> weaker lock earlier and then try to upgrade it", thus creating a
>> possibility of deadlock where before we'd just have blocked till
>> the other statement cleared.  Still worthy of being fixed if that's
>> true, though.
>>
>
> I added sleep(1) in the DeadLockReport() before the error report to display
> the status when a deadlock happened.
> bool continue_sleep = true;
> do
> {
>     sleep(1);
> } while (continue_sleep);
> ereport(ERROR,
> (errcode(ERRCODE_T_R_DEADLOCK_DETECTED),
> errmsg("deadlock detected"),
> errdetail_internal("%s", clientbuf.data),
> errdetail_log("%s", logbuf.data),
> errhint("See server log for query details.")));
>
>
>
> ubuntu@VM-0-17-ubuntu:/workspace/postgres$ ps -ef|grep postgres
> ubuntu   2911109   1  0 10:34 ?00:00:00
> /workspace/pgsql/bin/postgres -D ../data
> ubuntu   290 2911109  0 10:34 ?00:00:00 postgres: io worker 0
> ubuntu   291 2911109  0 10:34 ?00:00:00 postgres: io worker 1
> ubuntu   292 2911109  0 10:34 ?00:00:00 postgres: io worker 2
> ubuntu   293 2911109  0 10:34 ?00:00:00 postgres: checkpointer
> ubuntu   294 2911109  0 10:34 ?00:00:00 postgres: background
> writer
> ubuntu   296 2911109  0 10:34 ?00:00:00 postgres: walwriter
> ubuntu   297 2911109  0 10:34 ?00:00:00 postgres: autovacuum
> launcher
> ubuntu   298 2911109  0 10:34 ?00:00:00 postgres: logical
> replication launcher
> ubuntu   2911180 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] COPY waiting
> ubuntu   2911184 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] idle
> ubuntu   2911187 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] idle
> ubuntu   2911188 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] ALTER TABLE
> ubuntu   2911189 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] SELECT waiting
> ubuntu   2911190 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] idle
> ubuntu   2911191 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] TRUNCATE TABLE waiting
> ubuntu   2911192 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] idle
> ubuntu   2911193 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] SELECT waiting
> ubuntu   2911194 2911109  0 10:34 ?00:00:00 postgres: ubuntu
> target [local] idle
>
> gdb -p 2911188   // ALTER TABLE ONLY public.parent2  ADD CONSTRAINT
> parent2_pkey PRIMARY KEY (id);
> (gdb) bt
> #0  0x7f2f6910878a in __GI___clock_nanosleep (clock_id=clock_id@entry=0,
> flags=flags@entry=0, req=req@entry=0x7ffc0e560e60, 
> rem=rem@entry=0x7ffc0e560e60)
> at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> #1  0x7f2f6910d677 in __GI___nanosleep (req=req@entry=0x7ffc0e560e60,
> rem=rem@entry=0x7ffc0e560e60) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
> #2  0x7f2f6910d5ae in __sleep (seconds=0) at
> ../sysdeps/posix/sleep.c:55
> #3  0x561cd9386100 in DeadLockReport () at deadlock.c:1136
> #4  0x561cd9389df8 in LockAcquireExtended (locktag=0x7ffc0e5610b0,
> lockmode=8, sessionLock=false, dontWait=false, reportMemoryError=true,
> locallockp=0x7ffc0e5610a8, logLockFailure=false) at lock.c:1232
> #5  0x561cd93864bc in LockRelationOid (relid=16473, lockmode=8) at
> lmgr.c:115
> #6  0x561cd8f21b20 in find_inheritance_children_extended
> (parentrelId=16463, omit_detached=true, lockmode=8, detached_exist=0x0,
> detached_xmin=0x0) at pg_inherits.c:213
> #7  0x561cd8f217c1 in find_inheritance_children (parentrelId=16463,
> lockmode=8) at pg_inherits.c:60
> #8  0x561cd904dd73 in ATPrepAddPrimaryKey (wqueue=0x7ffc0e561348,
> rel=0x7f2f5d7d6240, cmd=0x561cf1d2ee38, recurse=false, lockmode=8,
> context=0x7ffc0e561540) at tablecmds.c:9463
> #9  0x561cd9043906 in ATPrepCmd (wqueue=0x7ffc0e561348,
> rel=0x7f2f5d7d6240, cmd=0x561cf1d2ee38, recurse=false, recursing=false,
> lockmode=8, context=0x7ffc0e561540) at tablecmds.c:5079
> #10 0x561cd90432aa in ATController (parsetree=0x561cf1d062e0,
> rel=0x7f2f5d7d6240, cmds=0x561cf1d06290, recurse=false, lockmode=8,
> context=0x7ffc0e561540) at tablecmds.c:4871
> #11 0x561cd9042f3b in AlterTable (stmt=0x561cf1d062e0, lockmode=8

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

2025-04-14 Thread Pavel Stehule
Hi

On Tue, Apr 15, 2025 at 2:32, Chapman Flack wrote:

> On 04/06/25 22:37, Tom Lane wrote:
> > Here's a draft patch to fix the bogus dependencies.  As given this'd
> > only be okay for HEAD, since I doubt we can get away with changing
> > ProcedureCreate()'s signature in stable branches ... In the back branches
> > we could make ProcedureCreate() deconstruct the OID array it's given and
> > then repeat the transform lookups, but ugh. ...
> >
> > BTW, I feel a little uncomfortable with the idea that we're adding
> > dependencies on objects that are explicitly mentioned nowhere in the
> > pg_proc entry.
>
> Pursuing that idea a bit further, was there a compelling original reason
> the column in pg_proc had to be protrftypes and not just protransforms?
>
> The transforms have been looked up at ProcedureCreate time.
>
> Any PL that supports transforms has to repeat the transform lookups
> if there are protrftypes present.
>
> Any PL that doesn't support transforms needs only to check for presence
> and say "sorry, I don't support those", which it could do just as easily
> with a list of transforms as with a list of types.
>

There was a long discussion, and I think the main reason for this design
was the possibility of using old code without change even when transforms
are installed. And secondly, there was the possibility of using a
transform once it was installed, even if it was installed after the
function was created. For example, PL/pgSQL code has no dependency on the
types or casts that are used internally. And I can write code in PL/Python
that works with or without a transform, and it would be unfriendly to
force recreating functions in order to use a transform. Today we have more
tools for processing dependencies in code, but 13 years ago the
possibility of working with PL code and transforms independently (as far
as that was possible) made sense. Today it is not a problem to build a new
production server and migrate data there, and there are fewer reasons to
maintain old servers. There were also more expectations about PLs other
than PL/pgSQL and about transforms (a lot of work where PL/Perl was used
before can be done now by SQL/XML, SQL/JSON, or FDWs).

https://www.postgresql.org/message-id/flat/1339713732.11971.79.camel%40vanquo.pezone.net

Regards

Pavel


>
> Regards,
> -Chap
>
>
>


Re: pg_dump --if-exists --clean when drop index that is partition of a partitioned index

2025-04-14 Thread jian he
On Mon, Apr 14, 2025 at 2:09 PM Pavel Stehule  wrote:
>
> Hi
>
> On Mon, Apr 14, 2025 at 7:54, jian he wrote:
>>
>> hi.
>>
>> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
>> CREATE TABLE tp_1(c int, a int, b int);
>> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
>> CREATE INDEX t_a_idx ON tp_1(a);
>> CREATE INDEX tp_a_idx ON tp(a);
>>
>> pg_dump  --schema=public --if-exists --clean --no-statistics
>> --no-owner --no-data --table-and-children=tp > 1.sql
>> pg_dump output file 1.sql excerpts:
>> 
>> DROP INDEX IF EXISTS public.t_a_idx;
>> DROP INDEX IF EXISTS public.tp_a_idx;
>> DROP TABLE IF EXISTS public.tp_1;
>> DROP TABLE IF EXISTS public.tp;
>> 
>> if you execute the
>> DROP INDEX IF EXISTS public.t_a_idx;
>>
>> ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
>> HINT:  You can drop index tp_a_idx instead.
>>
>> Is this pg_dump output what we expected?
>>
>
> It is a bug, I think; the implementation of these parts of the code is
> older than partitioning support and doesn't do the necessary detach.
>


seems pretty easy to fix.
we only need dropStmt when IndxInfo->parentidx oid is invalid.

+if (!OidIsValid(indxinfo->parentidx))
+appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);

I have tested the above changes on PG11, master.
From 24cca288498cbbe7fb831e92567205fe13db5e19 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Tue, 15 Apr 2025 13:32:24 +0800
Subject: [PATCH v1 1/1] fix pg_dump --clean on index inheritance hierarchies

CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
CREATE TABLE tp_1(c int, a int, b int);
ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
CREATE INDEX tp_1_a_idx ON tp_1(a);
CREATE INDEX tp_a_idx ON tp(a);

now the pg_dump output is
```
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_1;
```

discussion: https://postgr.es/m/CACJufxF0QSdkjFKF4di-JGWN6CSdQYEAhGPmQJJCdkSZtd=o...@mail.gmail.com
commitfest entry:
---
 src/bin/pg_dump/pg_dump.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c6e6d3b2b86..4eb33fefb56 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -17953,7 +17953,14 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 			  qindxname);
 		}
 
-		appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);
+		/*
+		 * For the DROP statement, we need not dump indexes that are
+		 * partitions of a partitioned index, since dropping the partitioned
+		 * index will drop them as well.  Index inheritance hierarchies are
+		 * supported since version 11; for older versions parentidx is
+		 * unconditionally set to 0.
+		 */
+		if (!OidIsValid(indxinfo->parentidx))
+			appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname);
 
 		if (indxinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 			ArchiveEntry(fout, indxinfo->dobj.catId, indxinfo->dobj.dumpId,
-- 
2.34.1



Re: Recent pg_rewind test failures in buildfarm

2025-04-14 Thread Michael Paquier
On Mon, Apr 14, 2025 at 10:58:28PM -0400, Tom Lane wrote:
> That assert appears to be several years old, and the
> 008_min_recovery_point.pl test script that's triggering it hasn't
> changed very recently either, so I'm baffled where to start digging.
> It has the odor of a timing problem, so maybe we just started hitting
> this by chance.  Still ... anybody have an idea?

Likely 039549d70f6a.  The proc_exit(0) calls done at this stage are
reporting that we have non-flushed stats, but we should not have any
remaining around.  This sequence is part of the pgstat_shutdown_hook
callback, which has been initialized by the pgstat_initialize() done for
the WAL senders in BaseInit().  It looks like we are going to need to
be more aggressive with the stats flushes, so that we have nothing left
when reaching this stage of the shutdown in ProcessRepliesIfAny().

(Sorry for the fuzzy reply, I cannot dive into all the details now,
that's just a quick take on the matter so I may be wrong with my
assumptions.)
--
Michael


signature.asc
Description: PGP signature


Re: Recent pg_rewind test failures in buildfarm

2025-04-14 Thread Bertrand Drouvot
Hi,

On Tue, Apr 15, 2025 at 08:00:00AM +0300, Alexander Lakhin wrote:
> Hello Tom,
> 
> 15.04.2025 05:58, Tom Lane wrote:
> > In the last day or so, both skink and mamba have hit this
> > in the pg_rewind test suite [1][2]:
> > 
> > #3  0x01f03f7c in ExceptionalCondition 
> > (conditionName=conditionName@entry=0x2119c4c "pending_since == 0", 
> > fileName=fileName@entry=0x2119454 "pgstat.c", 
> > lineNumber=lineNumber@entry=734) at assert.c:66
> > ...
> > 
> > That assert appears to be several years old, and the
> > 008_min_recovery_point.pl test script that's triggering it hasn't
> > changed very recently either, so I'm baffled where to start digging.
> > It has the odor of a timing problem, so maybe we just started hitting
> > this by chance.  Still ... anybody have an idea?

Sorry, I can't look at the details right now, but it might be linked to
039549d70f6, which is recent enough and in this area. I will give it a look
once I have time.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-14 Thread Wolfgang Walther

Jacob Champion:

>> libpq.a
>> libpq-oauth-18.a
>>
>> The libpq.a file has no references to dlopen, but plenty of references
>> to curl stuff.
>
> Which references? libpq-oauth should be the only thing using Curl symbols:
>
>  $ nm src/interfaces/libpq/libpq.a | grep --count curl
>  0
>  $ nm src/interfaces/libpq-oauth/libpq-oauth-18.a | grep --count curl
>  116

Not sure what I was looking at earlier, probably too many different
builds at the same time. Now I can't find the curl symbols in libpq.a
either...

>> I'm not sure what the libpq-oauth-18.a file is for.
>
> That implements the flow. You'll need to link that into your
> application or it will complain about missing flow symbols. (I don't
> think there's an easy way to combine the two libraries in our Autoconf
> setup; the only ways I can think of right now would introduce a
> circular dependency between libpq and libpq-oauth...)

... which immediately explains what the libpq-oauth-18.a file is for, yes.

But that means we'd need a -lpq-oauth-18 or something like that in 
Libs.private in libpq.pc, right?


This seems to be missing, I checked both the autoconf and meson builds.
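
Something along these lines is what I'd expect (hypothetical; the exact
flags are whatever the static build ends up needing):

    # libpq.pc
    Libs: -L${libdir} -lpq
    Libs.private: -lpq-oauth-18 -lcurl

so that "pkg-config --libs --static libpq" pulls in the flow library and
its Curl dependency.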

Best,

Wolfgang




Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

2025-04-14 Thread Andrew Dunstan


On 2025-04-14 Mo 12:28 PM, Andrew Dunstan wrote:

>> I'm also not sure about the double sscanf() business there ... There
>> must be a better way to do this.
>
> Yes, probably. I'll look into that if you like.

something like this?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index ff4bb320fc9..a600f7d9a3b 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -1053,26 +1053,32 @@ get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimpleOidStringList *dbn
 	while ((fgets(line, MAXPGPATH, pfile)) != NULL)
 	{
 		Oid			db_oid = InvalidOid;
-		char		db_oid_str[MAXPGPATH + 1] = "";
 		char	   *dbname;
+		char   *p = line;
 
 		/* Extract dboid. */
-		sscanf(line, "%u", &db_oid);
-		sscanf(line, "%20s", db_oid_str);
+		while (isdigit((unsigned char) *p))
+			p++;
+		if (p > line && *p == ' ')
+		{
+			sscanf(line, "%u", &db_oid);
+			p++;
+		}
 
 		/* dbname is the rest of the line */
-		dbname = line + strlen(db_oid_str) + 1;
+		dbname = p;
 
 		/* Remove \n from dbname. */
 		dbname[strlen(dbname) - 1] = '\0';
 
-		pg_log_info("found database \"%s\" (OID: %u) in \"%s\"",
-	dbname, db_oid, map_file_path);
-
 		/* Report error and exit if the file has any corrupted data. */
 		if (!OidIsValid(db_oid) || strlen(dbname) == 0)
 			pg_fatal("invalid entry in \"%s\" at line : %d", map_file_path,
 	 count + 1);
+		else
+			pg_log_info("found database \"%s\" (OID: %u) in \"%s\"",
+		dbname, db_oid, map_file_path);
+
 
 		simple_oid_string_list_append(dbname_oid_list, db_oid, dbname);
 		count++;

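FWIW, an alternative shape for that parse is strtoul(3), which makes the
"no digits" and overflow cases explicit. A self-contained sketch (helper
name and error handling are illustrative only):

    /* Parse one "<oid> <dbname>\n" line; returns false on corrupted input. */
    static bool
    parse_dbname_oid_line(char *line, Oid *db_oid, char **dbname)
    {
        char       *endp;
        unsigned long val;

        errno = 0;
        val = strtoul(line, &endp, 10);
        if (errno != 0 || endp == line || *endp != ' ' ||
            val == 0 || val > PG_UINT32_MAX)
            return false;

        *db_oid = (Oid) val;
        *dbname = endp + 1;
        (*dbname)[strcspn(*dbname, "\n")] = '\0';   /* strip trailing newline */
        return true;
    }
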

Re: Fix a resource leak (src/backend/utils/adt/rowtypes.c)

2025-04-14 Thread Tom Lane
Robert Haas  writes:
> On Sun, Apr 13, 2025 at 7:34 PM Ranier Vilela  wrote:
>> The function *record_in* has a new report about resource leak.
>> I think Coverity is right.

> I agree, for small values of "right". The memory isn't formally leaked
> because it will be eventually released when the containing memory
> context is deleted, but it's unclear why we should bother to clean up
> the memory in the normal path yet skip it here. I wondered whether the
> existing pfree calls were added in response to some specific
> complaint, but it doesn't appear so: they date back to Tom's 2004-era
> commit a3704d3deca6d08013a6b1db0432b75dc6b78d28, the commit message
> for which is rather more brief than what is typical today. Still, it
> seems safer to bet on the pfree being a good idea than on the reverse,
> because record_in() can be called lots of times in a single
> transaction.

Well, the pfree's in the main path do, but the "fail:" code path is
considerably more recent (ccff2d20e).  In the latter, I believe I
thought it wasn't worth the bookkeeping to keep track of whether these
allocations had been made yet.  Note the comment

/* exit here once we've done lookup_rowtype_tupdesc */

i.e. there is (was) exactly one criterion for whether to "goto fail"
or just return.  I don't love the proposed patch, partly because it
doesn't bother to update that comment after falsifying it, but mostly
because it makes the code more fragile.  Put a "goto fail" in the
wrong place, and kaboom!

We could do something like initialize all these variables to NULL
at the top and then write

if (values)
pfree(values);

et cetera.  But I don't see the point.  Frankly, if I wanted to make
this more consistent I would delete the pfrees in the main path.
Exactly zero evidence has been offered that that would create a leak
of significance, and the existing code goes against our general advice
that retail pfree's are more expensive than letting context reset
clean up.  (The fact that Coverity doesn't understand that doesn't
make it not true.)

regards, tom lane




use correct variable in error message in _allocAH function (pg_backup_archiver.c)

2025-04-14 Thread Mahendra Singh Thalor
Hi hackers,

In _allocAH function(pg_backup_archiver.c), we were using the *fmt*
variable in switch case for *default case*, but correct variable is
AH->format.

Here, I am attaching a patch for the same.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v01-use-correct-variable-in-error-message-in-_allocAH-fun.patch
Description: Binary data


RE: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> Yes.  The function could be changed to return an ERROR if the build
> does not support this option.  It's more portable to return NULL if
> you have some queries that do joins.  This could be used with
> pg_stat_activity and wait events for example, and some points are in
> positions strategic enough that they could be used across more than
> one library, like the restart point one or the autovacuum startup one.

Thanks for the description. +1.

> > I'm not sure it is directly related, but ISTM there are no direct ways to 
> > check
> > whether the injection_points is enabled or not. How do you think adding the
> > function?
> 
> Yeah, we could use something like that, not sure if that's worth it.

We can fork new thread when it is required...

> > Regarding the internal of the patch, it could be crashed when two points are
> > attached and then first one is detached [1]. I think we should not use 
> > "idx" for
> > the result array - PSA the fix.
> 
> Oops.  That's a brain fart from my side.  Thanks.

Confirmed v2 could fix the issue. One minor comment related with my part:

Should cur_pos be uint32 instead of int? Either of them can work because
cur_pos can be [0, 128], but it may be clearer.

Apart from above, LGTM.

Best regards,
Hayato Kuroda
FUJITSU LIMITED





Recent pg_rewind test failures in buildfarm

2025-04-14 Thread Tom Lane
In the last day or so, both skink and mamba have hit this
in the pg_rewind test suite [1][2]:

#3  0x01f03f7c in ExceptionalCondition 
(conditionName=conditionName@entry=0x2119c4c "pending_since == 0", 
fileName=fileName@entry=0x2119454 "pgstat.c", lineNumber=lineNumber@entry=734) 
at assert.c:66
#4  0x01d7994c in pgstat_report_stat (force=force@entry=true) at pgstat.c:734
#5  0x01d7a248 in pgstat_shutdown_hook (code=<optimized out>, arg=<optimized out>) at pgstat.c:615
#6  0x01d1e7c0 in shmem_exit (code=code@entry=0) at ipc.c:243
#7  0x01d1e92c in proc_exit_prepare (code=code@entry=0) at ipc.c:198
#8  0x01d1e9fc in proc_exit (code=0) at ipc.c:111
#9  0x01cd5bd0 in ProcessRepliesIfAny () at walsender.c:2294
#10 0x01cd8084 in WalSndLoop (send_data=send_data@entry=0x1cd7628
<XLogSendPhysical>) at walsender.c:2790
#11 0x01cd8f40 in StartReplication (cmd=0xfd048210) at walsender.c:960
#12 exec_replication_command (cmd_string=cmd_string@entry=0xfd53c098 
"START_REPLICATION 0/300 TIMELINE 2") at walsender.c:2135
#13 0x01d5bd84 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4767

That assert appears to be several years old, and the
008_min_recovery_point.pl test script that's triggering it hasn't
changed very recently either, so I'm baffled where to start digging.
It has the odor of a timing problem, so maybe we just started hitting
this by chance.  Still ... anybody have an idea?

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-04-13%2018%3A55%3A03
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-04-15%2001%3A00%3A04




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-14 Thread Wolfgang Walther

Jacob Champion:

> (The [2] link is missing, I think.)

Ah, sry. This is the link:

https://github.com/wolfgangwalther/nixpkgs/commits/postgresql-libpq-curl/

It's the last two commits on that branch.

> I'm confused by this -- the build produces staticlibs alongside the
> dynamically linked ones, so that's what I've been testing against.
> What different options do you pass to configure for a "statically
> linked build"?

It's not so much the options, but more that for this build there are no
shared libs available at build time at all. You can consider it a "fully
static system". So in your case, you'd always do the configure test with
shared libs, but I can't.

The build system passes --enable-static and --disable-shared to
configure, but both of those are ignored by configure, as indicated by
an immediate WARNING.

> Not unless there is some magic in PKG_CHECK_MODULES I've never heard
> of (which is entirely possible!). Furthermore I imagine that the
> transitive dependencies of all its dependencies are not added either.

IIUC, the transitive dependencies would be part of libcurl's
Libs.private / Requires.private (assuming that file is correctly
created). So that would be taken care of, I guess.

> Does your build method currently work for dependency forests like
> libgssapi_krb5 and libldap? (I want to make sure I'm not accidentally
> doing less work than we currently support for those other deps, but
> I'm also not planning to add more feature work as part of this
> particular open item.)

We currently build libpq with neither libldap nor libkrb5, at least for
the static case. But I just tried the bigger postgresql package and
force-enabled libldap there for the static build - it fails in exactly
the same way.

So yes, not related to your patch. I do understand that PostgreSQL's
autoconf build system is not designed for "static only"; I am certainly
not expecting you to fix that.

I think meson will do better here, but I was not able to make that work,
yet.

>> When I do "make -C src/interfaces/libpq-oauth", I get this error:
>>
>> make: *** No rule to make target 'oauth-curl.o', needed by
>> 'libpq-oauth-18.so'.  Stop.
>
> I cannot reproduce this. The CI seems happy, too. Is this patch the
> only modification you've made to our build system, or are there more
> changes?

We apply another patch to change the default socket directory to /run,
but that's certainly unrelated. All the other custom stuff only kicks in
afterwards, in the installPhase, so it is unrelated as well.

I just tried the same thing on the bigger postgresql package, where the
full build is run and not only libpq / libpq-oauth. It fails with the
same error. No rule for oauth-curl.o.

> I'm about to rewrite this part somewhat, so a deep dive may not be very helpful.

OK. I will try to get meson running, at least enough to try this patch
again. Maybe that gives better results.


Thanks,

Wolfgang


Re: Adding support for SSLKEYLOGFILE in the frontend

2025-04-14 Thread Daniel Gustafsson
> On 9 Apr 2025, at 20:45, Daniel Gustafsson  wrote:
> 
>> On 9 Apr 2025, at 20:41, Jacob Champion  
>> wrote:
>> 
>> Hello,
>> 
>> On Thu, Apr 3, 2025 at 8:51 AM Daniel Gustafsson  wrote:
>>> Committed, after another round of testing and looking.
>> 
>> I think we may want to consider marking sslkeylogfile as a debug
>> option (that is, opt->dispchar = "D") in fe-connect.c. Besides being a
>> true "debug option", this would also prevent a relatively unprivileged
>> user of postgres_fdw or dblink from logging the backend connection's
>> keys. WDYT?
> 
> I think that sounds like a good idea, unless anyone thinks otherwise I'll go
> ahead and make it so.

Just to close the loop, this was done yesterday as 2970c75dd982.

--
Daniel Gustafsson





Re: Fix a resource leak (src/backend/utils/adt/rowtypes.c)

2025-04-14 Thread Ranier Vilela
Em seg., 14 de abr. de 2025 às 16:59, Robert Haas 
escreveu:

> On Sun, Apr 13, 2025 at 7:34 PM Ranier Vilela  wrote:
> > CID 1608916: (#1 of 1): Resource leak (RESOURCE_LEAK)
> > 52. leaked_storage: Variable buf going out of scope leaks the storage
> buf.data points to.
> >
> > The function *record_in* has a new report about resource leak.
> > I think Coverity is right.
>
> I agree, for small values of "right".

Thanks.


> The memory isn't formally leaked
> because it will be eventually released when the containing memory
> context is deleted, but it's unclear why we should bother to clean up
> the memory in the normal path yet skip it here. I wondered whether the
> existing pfree calls were added in response to some specific
> complaint, but it doesn't appear so: they date back to Tom's 2004-era
> commit a3704d3deca6d08013a6b1db0432b75dc6b78d28,

Thanks for researching.


> the commit message
> for which is rather more brief than what is typical today. Still, it
> seems safer to bet on the pfree being a good idea than on the reverse,
> because record_in() can be called lots of times in a single
> transaction.
>
I think that is material for v18, although there were no reported concerns.

best regards,
Ranier Vilela


Re: not null constraints, again

2025-04-14 Thread Alvaro Herrera
On 2025-Apr-14, Tom Lane wrote:

> The patch I propose there seems to prevent this, but I wonder if we
> shouldn't look closer into why it's failing in the first place.
> I would not have expected that adding pg_constraint rows implies
> stronger locks than what ALTER ADD PRIMARY KEY was using before,
> and I suspect that doing so will cause more problems than just
> breaking parallel restore.

I wasn't aware of this side effect.  I'll investigate this in more
depth.  I suspect it might be a bug in the way we run through ALTER
TABLE for the primary key.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los cuentos de hadas no dan al niño su primera idea sobre los monstruos.
Lo que le dan es su primera idea de la posible derrota del monstruo."
   (G. K. Chesterton)




Re: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Michael Paquier
On Mon, Apr 14, 2025 at 04:29:37PM +0300, Aleksander Alekseev wrote:
> If I didn't miss anything, all SQL functions dealing with injection
> points are gathered in injection_points extension so IMO
> pg_get_injection_points() belongs there. It would be awkward to have
> it in the core considering the fact that injection points presumably
> should be disabled in release builds.

There are two more in test_aio, and by design out-of-core extensions
can define their own.

> Users will see a function in \df that does nothing.

Yeah, this one's true if --enable-injection-points is not used.
--
Michael


signature.asc
Description: PGP signature


Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-14 Thread Andrei Lepikhov

On 4/14/25 18:31, Robert Haas wrote:

> On Tue, Apr 1, 2025 at 5:07 PM David Rowley  wrote:
>> I tried to move things along to address Tom's concern about not
>> wanting to show this in EXPLAIN's standard output. I suggested in [1]
>> that we could use EXPLAIN ANALYZE, but nobody commented on that.
>> EXPLAIN ANALYZE is much more verbose than EXPLAIN already, and if we
>> put these in EXPLAIN ANALYZE then the viewer can more easily compare
>> planned vs actual. I had mentioned that Hash Join does something like
>> this for buckets.
>
> Having looked at v6, I think it would help, at least if the reader is
> sufficiently knowledgeable. From the values displayed, it looks like
> someone could reconstruct the evict_ratio value in
> cost_memoize_rescan(), and if they know the loop count, also the
> hit_ratio. But that seems hard: if you weren't reading the code, how
> would you know how to do it? Even if you are reading the code, are you
> sure you'd reconstruct it correctly? I wonder why we think it's better
> to display this than a more "cooked" number like the estimated hit
> ratio that was proposed in v1?

+1.
My reasons: a) it is more "physical"; b) we don't need to look for the
corresponding NestLoop JOIN node - my EXPLAIN outputs usually contain
150+ lines, and it is hard to bring something outside the current
screen into the analysis.


--
regards, Andrei Lepikhov




Re: [18] Unintentional behavior change in commit e9931bfb75

2025-04-14 Thread Jeff Davis
On Sat, 2025-04-12 at 05:34 -0700, Noah Misch wrote:
> I think the code for (2) and for "I/i in Turkish" haven't returned. 
> Given
> commit e3fa2b0 restored the v17 "I/i in Turkish" treatment for plain
> lower(),
> the regex code likely needs a similar restoration.  If not, the regex
> comments
> would need to change to match the code.

Great find, thank you! I'm curious how you came across this difference -
was it through testing or code inspection?

Patch attached. I also updated the top of the comment so that it's
clear that it's referring to the libc provider specifically, and that
ICU still has an issue with non-UTF8 encodings.

Also, the force-to-ASCII-behavior special case is different for
pg_wc_tolower/toupper vs LOWER()/UPPER(): the former depends only on
whether it's the default locale, whereas the latter depends on whether
it's the default locale and the encoding is single-byte. Therefore the
results in the tr_TR.UTF-8 locale for the libc provider are
inconsistent:

  => select 'i' ~* 'I', 'I' ~* 'i', lower('I') = 'i', upper('i') = 'I';
   ?column? | ?column? | ?column? | ?column? 
  ----------+----------+----------+----------
   t        | t        | f        | f

That behavior goes back a long way, so I'm not suggesting that we
change it.

Regards,
Jeff Davis

From e8a68f42f5802d138ba04043b25b7d42862be29d Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 14 Apr 2025 11:34:11 -0700
Subject: [PATCH v1] Another unintentional behavior change in commit
 e9931bfb75.

Reported-by: Noah Misch 
Discussion: https://postgr.es/m/20250412123430.8c.nmi...@google.com
---
 src/backend/regex/regc_pg_locale.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index ed7411df83d..41b993ad773 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -21,9 +21,10 @@
 #include "utils/pg_locale.h"
 
 /*
- * To provide as much functionality as possible on a variety of platforms,
- * without going so far as to implement everything from scratch, we use
- * several implementation strategies depending on the situation:
+ * For the libc provider, to provide as much functionality as possible on a
+ * variety of platforms without going so far as to implement everything from
+ * scratch, we use several implementation strategies depending on the
+ * situation:
  *
  * 1. In C/POSIX collations, we use hard-wired code.  We can't depend on
  * the <ctype.h> functions since those will obey LC_CTYPE.  Note that these
@@ -33,8 +34,9 @@
  *
  * 2a. When working in UTF8 encoding, we use the <wctype.h> functions.
  * This assumes that every platform uses Unicode codepoints directly
- * as the wchar_t representation of Unicode.  On some platforms
- * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
+ * as the wchar_t representation of Unicode.  (XXX: This could be a problem
+ * for ICU in non-UTF8 encodings.)  On some platforms wchar_t is only 16 bits
+ * wide, so we have to punt for codepoints > 0xFFFF.
  *
  * 2b. In all other encodings, we use the <ctype.h> functions for pg_wchar
  * values up to 255, and punt for values above that.  This is 100% correct
@@ -562,10 +564,16 @@ pg_wc_toupper(pg_wchar c)
 		case PG_REGEX_STRATEGY_BUILTIN:
 			return unicode_uppercase_simple(c);
 		case PG_REGEX_STRATEGY_LIBC_WIDE:
+			/* force C behavior for ASCII characters, per comments above */
+			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
+				return pg_ascii_toupper((unsigned char) c);
 			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
 				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
 			/* FALL THRU */
 		case PG_REGEX_STRATEGY_LIBC_1BYTE:
+			/* force C behavior for ASCII characters, per comments above */
+			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
+				return pg_ascii_toupper((unsigned char) c);
 			if (c <= (pg_wchar) UCHAR_MAX)
 				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
 			return c;
@@ -590,10 +598,16 @@ pg_wc_tolower(pg_wchar c)
 		case PG_REGEX_STRATEGY_BUILTIN:
 			return unicode_lowercase_simple(c);
 		case PG_REGEX_STRATEGY_LIBC_WIDE:
+			/* force C behavior for ASCII characters, per comments above */
+			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
+				return pg_ascii_tolower((unsigned char) c);
 			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
 				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
 			/* FALL THRU */
 		case PG_REGEX_STRATEGY_LIBC_1BYTE:
+			/* force C behavior for ASCII characters, per comments above */
+			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
+				return pg_ascii_tolower((unsigned char) c);
 			if (c <= (pg_wchar) UCHAR_MAX)
 				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
 			return c;
-- 
2.34.1



Re: Performance issues with v18 SQL-language-function changes

2025-04-14 Thread Pavel Stehule
Hi

po 14. 4. 2025 v 16:38 odesílatel Robert Haas 
napsal:

> On Sun, Apr 13, 2025 at 3:23 PM Tom Lane  wrote:
> > create function fx(p_summa bigint) returns text immutable strict
> > return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));
> >
> > explain analyze select fx(i) from generate_series(1,100) as i(i);
> >
> > you arrive at the rude discovery that 0dca5d68d is about 50% slower
> > than 0dca5d68d^, because the old implementation builds a plan for fx()
> > only once and then re-uses it throughout the query.
>
> I agree that we should do something about this. I haven't reviewed
> your patches but the approach sounds broadly reasonable.
>

I can confirm that all tests passed, and the patched code is about 5% faster
than the current master (tested on my slower notebook). So it should fix the
performance regression against pg17 where there was one (it was about 2%;
tested without assertions).

Regards

Pavel




> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-14 Thread David Rowley
On Tue, 15 Apr 2025 at 04:31, Robert Haas  wrote:
> I don't think we should use ANALYZE for this because, IME, that should
> just be about whether the query gets executed. Since this looks like
> information that is available at plan time, I think it should be
> displayed all the time or made contingent on VERBOSE. It would be sad
> if you had to run the query to get information that was available
> without needing to run the query. Personally, I think we can probably
> just display it all the time. I mean, the typical plan is not going to
> contain enough Memoize nodes that a little extra chatter for each one
> impedes readability significantly.

You might be right there. I just offered that as a possible solution
to the concern of bloating the EXPLAIN output.

> Having looked at v6, I think it would help, at least if the reader is
> sufficiently knowledgeable. From the values displayed, it looks like
> someone could reconstruct the evict_ratio value in
> cost_memoize_rescan(), and if they know the loop count, also the
> hit_ratio. But that seems hard: if you weren't reading the code, how
> would you know how to do it? Even if you are reading the code, are you
> sure you'd reconstruct it correctly? I wonder why we think it's better
> to display this than a more "cooked" number like the estimated hit
> ratio that was proposed in v1?

I think the problem with only showing the estimated hit ratio is that
it's difficult to then know what you might change to improve that in
cases where you might have other join types disabled and are
experimenting to figure out why Nested Loop / Memoize isn't being
chosen.  If you see the estimated capacity isn't big enough to store
all the rows for all distinct keys then you know you can increase
work_mem to assist. If the Estimated Unique Keys is too high, then you
might be able to adjust some ndistinct estimates or increase stats
targets somewhere to improve that.  If we only showed the estimated
hit ratio, then you wouldn't know where to start to get the planner to
pick the Memoize plan.

As I mentioned in [1], I wouldn't object to having the estimated hit
ratio shown in addition to the components that are used to calculate
it. Per [2], Andrei didn't seem to like the idea due to the
duplication. FWIW, I do agree with you that the hit ratio isn't
exactly easy to calculate on the fly when looking at the input numbers
that are used to calculate it. I just wasn't 100% certain that it
would be needed in its calculated form. Maybe it depends on the reason
you're looking at EXPLAIN. Maybe the reason I mentioned above for
looking at EXPLAIN is just one of many and I'm not thinking hard
enough to consider the other ones.
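
For anyone following along, the numbers in question come out of
cost_memoize_rescan() roughly like this (paraphrased from costsize.c, so
treat this as a sketch of the logic rather than the exact code):

    est_cache_entries = floor(hash_mem_bytes / est_entry_bytes);
    est_entries = Min(Min(ndistinct, est_cache_entries), PG_UINT32_MAX);
    evict_ratio = 1.0 - Min(est_cache_entries, ndistinct) / ndistinct;
    hit_ratio = ((calls - ndistinct) / calls) *
                (est_cache_entries / Max(ndistinct, est_cache_entries));

so given Estimated Capacity (est_entries), the distinct key estimate
(ndistinct) and the loop count (calls), a reader can in principle
reconstruct both ratios.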

If we can't get consensus for everything people want to add at once
then maybe the patch could be broken into two, with 0001 being pretty
much the v4 patch and then have 0002 add the Estimated Hit Ratio.
Having the physical patches and being able to try it out or view the
regression test changes might lower the bar on people chipping in with
their views.

David

[1] 
https://postgr.es/m/CAApHDvoE5S5FkvEq+N3-J9LfaVUpWLOnczOYAOEvBMCY20=p...@mail.gmail.com
[2] https://postgr.es/m/e76f1635-a933-47bf-a826-045e20d5c...@gmail.com




Re: bug in stored generated column over domain with constraints.

2025-04-14 Thread Tom Lane
jian he  writes:
> new patch attached.

I looked this over.  It's kind of astonishing that nobody has reported
this before, because AFAICT it's been broken since we invented
generated columns.

> rewriteTargetListIU, expand_insert_targetlist these two places can
> make a null Const TargetEntry for the generated column in an INSERT
> operation.

rewriteTargetListIU will *not* do that: it explicitly does nothing
for an attgenerated column, except apply a bunch of error checks.
See about line 988 in HEAD.  So it seems sufficient to fix
expand_insert_targetlist as you've done here.  And then we have to
make ExecCheckPlanOutput cope, too.

I think that this patch is technically correct, but I don't like
it much because it pays little attention to keeping
expand_insert_targetlist and ExecCheckPlanOutput readable, and no
attention at all to keeping their logic parallel.  I think we can
make the code cleaner by moving the default case to the end, as
in the attached.

The test cases seemed a bit overdone, especially in comparison
to the adjacent existing tests.  Fine for development maybe,
but I don't see the value in carrying them forever.  On the other
hand, there's a comment at the top of the test script:
-- keep these tests aligned with generated_virtual.sql
The VIRTUAL case is currently rejecting domains altogether.
But perhaps that won't be true forever, so I think we ought
to follow that advice.

So I end with the attached v4.

regards, tom lane

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 309e27f8b5f..333cbf78343 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -212,19 +212,10 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 		attr = TupleDescAttr(resultDesc, attno);
 		attno++;
 
-		if (!attr->attisdropped)
-		{
-			/* Normal case: demand type match */
-			if (exprType((Node *) tle->expr) != attr->atttypid)
-ereport(ERROR,
-		(errcode(ERRCODE_DATATYPE_MISMATCH),
-		 errmsg("table row type and query-specified row type do not match"),
-		 errdetail("Table has type %s at ordinal position %d, but query expects %s.",
-   format_type_be(attr->atttypid),
-   attno,
-   format_type_be(exprType((Node *) tle->expr)))));
-		}
-		else
+		/*
+		 * Special cases here should match planner's expand_insert_targetlist.
+		 */
+		if (attr->attisdropped)
 		{
 			/*
 			 * For a dropped column, we can't check atttypid (it's likely 0).
@@ -239,6 +230,35 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 		 errdetail("Query provides a value for a dropped column at ordinal position %d.",
    attno)));
 		}
+		else if (attr->attgenerated)
+		{
+			/*
+			 * For a generated column, the planner will have inserted a null
+			 * of the column's base type (to avoid possibly failing on domain
+			 * not-null constraints).  It doesn't seem worth insisting on that
+			 * exact type though, since a null value is type-independent.  As
+			 * above, just insist on *some* NULL constant.
+			 */
+			if (!IsA(tle->expr, Const) ||
+!((Const *) tle->expr)->constisnull)
+ereport(ERROR,
+		(errcode(ERRCODE_DATATYPE_MISMATCH),
+		 errmsg("table row type and query-specified row type do not match"),
+		 errdetail("Query provides a value for a generated column at ordinal position %d.",
+   attno)));
+		}
+		else
+		{
+			/* Normal case: demand type match */
+			if (exprType((Node *) tle->expr) != attr->atttypid)
+ereport(ERROR,
+		(errcode(ERRCODE_DATATYPE_MISMATCH),
+		 errmsg("table row type and query-specified row type do not match"),
+		 errdetail("Table has type %s at ordinal position %d, but query expects %s.",
+   format_type_be(attr->atttypid),
+   attno,
+   format_type_be(exprType((Node *) tle->expr)))));
+		}
 	}
 	if (attno != resultDesc->natts)
 		ereport(ERROR,
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index c2a77503d4b..ffc9d6c3f30 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -44,6 +44,7 @@
 #include "optimizer/tlist.h"
 #include "parser/parse_coerce.h"
 #include "parser/parsetree.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 
 static List *expand_insert_targetlist(PlannerInfo *root, List *tlist,
@@ -419,9 +420,8 @@ expand_insert_targetlist(PlannerInfo *root, List *tlist, Relation rel)
 			 *
 			 * INSERTs should insert NULL in this case.  (We assume the
 			 * rewriter would have inserted any available non-NULL default
-			 * value.)  Also, if the column isn't dropped, apply any domain
-			 * constraints that might exist --- this is to catch domain NOT
-			 * NULL.
+			 * value.)  Also, normally we must apply any domain constraints
+			 * that might exist --- this is to catch domain NOT NULL.
 			 *
 			 * When generating a NULL constant for 

Re: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Kirill Reshke
On Mon, 14 Apr 2025 at 17:32, Michael Paquier  wrote:
>
> As this is global for all libraries, that's not something I would add
> there.


The same applies to pg_stat_statements, but it is defined in an extension.
Also, injection points are tests-only, so maybe there is no need to have
this function callable in production systems?
One more concern is about predefined OIDs: they are limited. Maybe we
should not consume a predefined OID when this is avoidable.


-- 
Best regards,
Kirill Reshke




Re: Call for Posters: PGConf.dev 2025

2025-04-14 Thread Noah Misch
Thanks for running this.  Four logistical questions:

1. By what timestamp should one submit?
2. Where does one send the submission?

On Sun, Apr 13, 2025 at 04:01:56PM +0500, Andrey Borodin wrote:
> The goal of the poster session is to visually present your patch or project 
> on an A2-sized poster (ANSI C, 17" × 22"). We encourage you to include:

3. Should it be portrait orientation, landscape orientation, or no preference?

> /*
>  * A visual representation of your patch or project.
>  * The conference logo and year.

4. Is there an official way to place the logo on a white page?
   https://2025.pgconf.dev/ puts it on different dark backgrounds.  If there's
   a preferred way (or a desire for all posters to have dark backgrounds),
   that would be good to know.




Re: Call for Posters: PGConf.dev 2025

2025-04-14 Thread Andrey Borodin


> On 15 Apr 2025, at 01:52, Noah Misch  wrote:
> 
> Thanks for running this.  Four logistical questions:
> 
> 1. By what timestamp should one submit?

If a prospective participant is going to bring their poster to Montreal
themselves (or hand it over via friends), it’s good to have it at the
conference venue on May 13, so I can put it on the walls before the
conference starts. This is the most preferred way.

For participants who cannot get their posters to Montreal, I’m going to
print their posters on May 8th before my flight to Canada. If something
does not go smoothly, I’ll probably be able to print in Montreal, but I’m
treating that as an edge case.

Anyway, it’s good to express interest and the intention to make a poster
by next week, so I can estimate demand.

> 2. Where does one send the submission?

An off-list response to this thread is fine. Or, perhaps, any email to me
would work too. I think I’ll create a Telegram chat for quick feedback and
so that I can show how the posters look on the wall.

> 
> On Sun, Apr 13, 2025 at 04:01:56PM +0500, Andrey Borodin wrote:
>> The goal of the poster session is to visually present your patch or project 
>> on an A2-sized poster (ANSI C, 17" × 22"). We encourage you to include:
> 
> 3. Should it be portrait orientation, landscape orientation, or no preference?

I made my poster in portrait orientation. I do not know exactly how I’ll
be able to display posters at the conference venue, so there is no clear
preference. I hope I won’t need to hang posters rotated by 90 degrees.

Speaking of the poster format, I’ve chosen A2 as a very safe option. A2s
are easy to bring and hang, unlike A1 and A0. But I’m afraid they are too
small to accommodate all the details in big letters and pictures...

>> /*
>> * A visual representation of your patch or project.
>> * The conference logo and year.
> 
> 4. Is there an official way to place the logo on a white page?
>   https://2025.pgconf.dev/ puts it on different dark backgrounds.  If there's
>   a preferred way (or a desire for all posters to have dark backgrounds),
>   that would be good to know.

I’d suppose a white background can save a lot of ink :) I’ve inverted the
colors of the logo from the website, PFA. But I’m too far from design to
call any graphics I produce “official”. Also, the logo from the website
lacks the year, and I do not know which typeface was used to produce it...

Thank you for your questions!


Best regards, Andrey Borodin.






Re: Proposal: Limitations of palloc inside checkpointer

2025-04-14 Thread Xuneng Zhou
Hi,

I've moved this entry to the July CommitFest. The CI reported an unused
variable warning in patch v5, so I've addressed it by removing the unused
variable. Sorry for not catching the issue locally.

Xuneng Zhou  wrote on Thu, Apr 10, 2025 at 23:43:

> Hi,
>
> I’ve updated and rebased Maxim's patch version 4 with optimizations like
> ring buffer and capped max_requests proposed by Heikki.  These are the
> summary of Changes in v5:
>
> • Replaced the linear array model in AbsorbSyncRequests() with a bounded
> ring buffer to avoid large memmove() operations when processing sync
> requests.
>
> • Introduced a tunable batch size (CKPT_REQ_BATCH_SIZE, default: 10,000)
> to incrementally absorb requests while respecting MaxAllocSize.
>
> • Capped max_requests with a hard upper bound (MAX_CHECKPOINT_REQUESTS =
> 10,000,000) to avoid pathological memory growth when shared_buffers is
> very large.
>
> • Updated CompactCheckpointerRequestQueue() to support the ring buffer
> layout.
>
> • Retained the existing compacting logic but modified it to operate
> in-place over the ring.
>
>> >> The patch itself looks ok to me. I'm curious about the trade-offs
>> >> between this incremental approach and the alternative of
>> >> using palloc_extended() with the MCXT_ALLOC_HUGE flag. The approach
>> >> of splitting the requests into fixed-size slices  avoids OOM
>> >> failures or process termination by the OOM killer, which is good.
>>
>> +1
>>
>> >> However, it does add some overhead with additional lock acquisition/
>> >> release cycles and memory movement operations via memmove(). The
>> >> natural question is whether the security justify the cost.
>>
>> Hmm, if you turned the array into a ring buffer, you could absorb part
>> of the queue without the memmove().
>>
>> With that, I'd actually suggest using a much smaller allocation, maybe
>> 1 entries or even less. That should be enough to keep the lock
>> acquire/release overhead acceptable
>>
>
>
>> >> Regarding the slice size of 1 GB,  is this derived from
>> >> MaxAllocSize limit, or was it chosen for other performance
>> >> reasons? whether a different size might offer better performance
>> >> under typical workloads?
>> >
>> > I think 1 GB is derived purely from MaxAllocSize. This "palloc" is a
>> > relatively old one, and no one expected the number of requests to
>> exceed
>> > 1 GB. Now we have the ability to set the shared_buffers to a huge
>> number
>> > (without discussing now whether this makes any real sense), thus this
>> > limit for palloc becomes a problem.
>>
>> Yeah. Frankly even 1 GB seems excessive for this. We set max_requests =
>> NBuffers, which makes some sense so that you can fit the worst case
>> scenario of quickly evicting all pages from the buffer cache. But even
>> then I'd expect the checkpointer to be able to absorb the changes before
>> the queue fills up. And we have the compaction logic now too.
>>
>> So I wonder if we should cap max_requests at, say, 10 million requests
>> now?
>>
>> CompactCheckpointerRequestQueue() makes a pretty large allocation too,
>> BTW, although much smaller than the one in AbsorbSyncRequests(). The
>> hash table it builds could grow quite large though.
>>
> I’m not certain what the optimal cap for max_requests should be - whether
> it’s 10 million (setting it to 10 million would result in a peak temporary
> allocation of roughly 700 MB within CompactCheckpointerRequestQueue), 1
> million, or some other value - but introducing an upper bound seems
> important. Without a cap, max_requests could theoretically scale with
> NBuffers, potentially resulting in excessive temporary memory allocations.
>
> As this is my first patch submission, it might be somewhat naive and
> experimental; I appreciate your patience and feedback.
>
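
For readers who haven't opened the patch, the core of the ring-buffer idea
is small. A sketch with illustrative names (not the patch itself):

    /* Absorb up to one batch of sync requests without any memmove():
     * consume from the head of a bounded ring, modulo its capacity. */
    #define CKPT_REQ_BATCH_SIZE 10000

    static int
    absorb_one_batch(CheckpointerRequest *requests, int capacity,
                     int *head, int *num_requests)
    {
        int     n = Min(*num_requests, CKPT_REQ_BATCH_SIZE);

        for (int i = 0; i < n; i++)
        {
            CheckpointerRequest *req = &requests[(*head + i) % capacity];

            RememberSyncRequest(&req->ftag, req->type);
        }
        *head = (*head + n) % capacity;
        *num_requests -= n;
        return n;
    }

(The real code must of course do this under the checkpointer's lock, or
copy the batch out before releasing it.)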


v6-0001-Process-sync-requests-incrementally-in-AbsorbSyncReq.patch
Description: Binary data


pg_combinebackup: correct code comment.

2025-04-14 Thread Amul Sul
Hi,
Cc: Robert

Attached is a patch that corrects the code comment for
process_directory_recursively() in pg_combinebackup, where the comment
incorrectly refers to "output_directory" instead of the intended
"input_directory".

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
From 80765206dcdc5e474cf49b92846a57b61c50e006 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Tue, 15 Apr 2025 09:42:52 +0530
Subject: [PATCH] pg_combinebackup: Fix incorrect code comment.

The comment in the process_directory_recursively() prologue
incorrectly refers to output_directory instead of input_directory.
---
 src/bin/pg_combinebackup/pg_combinebackup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index 4f8ba156336..7544c6b33ee 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -826,7 +826,7 @@ parse_oid(char *s, Oid *result)
  *
  * n_prior_backups is the number of prior backups that we have available.
  * This doesn't count the very last backup, which is referenced by
- * output_directory, just the older ones. prior_backup_dirs is an array of
+ * input_directory, just the older ones. prior_backup_dirs is an array of
  * the locations of those previous backups.
  */
 static void
-- 
2.43.5



Re: not null constraints, again

2025-04-14 Thread Tom Lane
The attached script simply creates two partitioned tables that
are connected by a foreign key constraint, then pg_dumps that
setup and tries to do a parallel restore.  This works up until

14e87ffa5c543b5f30ead7413084c25f7735039f is the first bad commit
commit 14e87ffa5c543b5f30ead7413084c25f7735039f
Author: Álvaro Herrera 
Date:   Fri Nov 8 13:28:48 2024 +0100

Add pg_constraint rows for not-null constraints

Since that commit, it fails every time (for me, anyway, on a couple
of different machines) with a deadlock error, typically between
ALTER ADD PRIMARY KEY and one of the table COPY commands:

2025-04-14 12:54:49.892 EDT [1278062] ERROR:  deadlock detected
2025-04-14 12:54:49.892 EDT [1278062] DETAIL:  Process 1278062 waits for 
AccessExclusiveLock on relation 47164 of database 47159; blocked by process 
1278059.
Process 1278059 waits for AccessShareLock on relation 47160 of database 
47159; blocked by process 1278062.
Process 1278062: ALTER TABLE ONLY public.parent1
ADD CONSTRAINT parent1_pkey PRIMARY KEY (id);
Process 1278059: COPY public.c11 (id, b) FROM stdin;

I stumbled across this result after wondering why the repro
I'd devised at [1] didn't fail in v17.

The patch I propose there seems to prevent this, but I wonder if we
shouldn't look closer into why it's failing in the first place.
I would not have expected that adding pg_constraint rows implies
stronger locks than what ALTER ADD PRIMARY KEY was using before,
and I suspect that doing so will cause more problems than just
breaking parallel restore.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/2045026.1743801...@sss.pgh.pa.us

psql postgres <

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-14 Thread Jacob Champion
On Mon, Apr 14, 2025 at 12:46 PM Wolfgang Walther
 wrote:
> But that means we'd need a -lpq-oauth-18 or something like that in
> Libs.private in libpq.pc, right?

I believe so. I'm in the middle of the .pc stuff right now; v6 should
have the fixes as long as I don't get stuck.

Thanks,
--Jacob




Re: Built-in Raft replication

2025-04-14 Thread Konstantin Osipov
* Kirill Reshke  [25/04/14 20:48]:
> > I am considering starting work on implementing a built-in Raft
> > replication for PostgreSQL.
> >
> 
> Just some thought on top of my mind, if you need my voice here:
> 
> I have a hard time believing the community will be positive about this
> change in-core. It has more changes as contrib extension. In fact, if
> we want a built-in consensus algorithm, Paxos is a better option,
> because you can use postgresql as local crash-safe storage for single
> decree paxos, just store your state (ballot number, last voice) in a
> heap table.

But Raft is a log replication algorithm, not a consensus
algorithm. It does use consensus, but that's for leader election.
Paxos could be used for log replication, but that would be
expensive. In fact etcd uses Raft, and etcd is used by Patroni. So
I completely lost your line of thought here.

> OTOH Raft needs to write its own log, and what's worse, it sometimes
> needs to remove already written parts of it (so, it is not appended
> only, unlike WAL). If you have a production system which maintains two
> kinds of logs with different semantics, it is a very hard system to
> maintain..

My proposal is exactly to replace (or rather, extend) the current
synchronous log replication with Raft. Entry removal can be stacked on
top of an append-only format, and production implementations exist that
do exactly that.

So, no, it's a single log, and in fact the current WAL will do.
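
To make the single-log point concrete, here is a sketch of how suffix
truncation can sit on top of an append-only record format (a hypothetical
layout for illustration, not actual WAL records):

    typedef struct RaftLogRecord
    {
        uint64      term;           /* Raft term of the entry */
        uint64      index;          /* position in the replicated log */
        bool        is_truncate;    /* if set, entries with a higher index
                                     * are logically void */
        uint32      payload_len;
        char        payload[FLEXIBLE_ARRAY_MEMBER];
    } RaftLogRecord;

A replaying reader tracks the latest truncation marker it has seen and
skips the records that marker invalidates, so the file itself is never
rewritten in place.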

> There is actually a prod-ready (non open source) implementation of
> RAFT as extension, called BiHA, by pgpro.

My guess is that biha is an extension because proprietary code is easier
to maintain that way. I'd rather say the fact that there is a proprietary
implementation out in the field confirms it could be a good idea to have
it in PostgreSQL trunk.

In any case I'm interested in contributing to the trunk, not
building a proprietary module/fork.

-- 
Konstantin Osipov




Typos in the comment for the estimate_multivariate_ndistinct()

2025-04-14 Thread Tender Wang
Hi,

While reading estimate_multivariate_ndistinct(), I think

  "If a match it found, *varinfos is updated to remove the list of
   matched varinfos"

should be

  "If a match is found, *varinfos is updated to remove the list of
   matched varinfos".

I've attached a patch for that.

-- 
Thanks,
Tender Wang
From 55ac0d72303b7ea431d58cbbc242dc5215e13102 Mon Sep 17 00:00:00 2001
From: Tender Wang 
Date: Mon, 14 Apr 2025 21:28:53 +0800
Subject: [PATCH] Fix a typo

---
 src/backend/utils/adt/selfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 588d991fa57..23abfafb165 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4148,7 +4148,7 @@ estimate_hashagg_tablesize(PlannerInfo *root, Path *path,
 /*
  * Find applicable ndistinct statistics for the given list of VarInfos (which
  * must all belong to the given rel), and update *ndistinct to the estimate of
- * the MVNDistinctItem that best matches.  If a match it found, *varinfos is
+ * the MVNDistinctItem that best matches.  If a match is found, *varinfos is
  * updated to remove the list of matched varinfos.
  *
  * Varinfos that aren't for simple Vars are ignored.
-- 
2.34.1



Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

2025-04-14 Thread Mahendra Singh Thalor
On Mon, 14 Apr 2025 at 21:39, Álvaro Herrera  wrote:
>
> On 2025-Apr-04, Andrew Dunstan wrote:
>
> > Non text modes for pg_dumpall, correspondingly change pg_restore
>
> I think the new oid_string_list stuff in this commit is unnecessary, and
> we can remove a bunch of lines by simplifying that to using
> SimplePtrList, as the attached illustrates.  (Perhaps the names of
> members of the proposed struct can be improved.)

Thanks Álvaro for the patch.

I took this patch and did some testing. The patch looks good to me. I was
not able to use "git am" in my setup due to CRs in the patch, but there
were no warnings when I applied it with "patch -p".

*One review comment:*
We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In
many places in the code we use this hard-coded size of 64 for names.
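
For reference, the flexible-array shape under discussion is roughly the
following (reconstructed from the thread, so the member names may differ):

    typedef struct DbOidName
    {
        Oid     oid;
        char    str[FLEXIBLE_ARRAY_MEMBER]; /* dbname, NUL-terminated */
    } DbOidName;

With this layout each node carries exactly strlen(dbname) + 1 bytes for
the name, versus a fixed 64-byte buffer with NAMEDATALEN.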

On Tue, 15 Apr 2025 at 01:44, Andrew Dunstan  wrote:
>
>
> On 2025-04-14 Mo 12:28 PM, Andrew Dunstan wrote:
> >
> >
> >>
> >> I'm also not sure about the double sscanf() business there ... There
> >> must be a better way to do this.
> >
> >
> >
> > Yes, probably. I'll look into that if you like.
> >
> >
> >
>
> something like this?
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Thanks Andrew.

Your patch looks good to me. I took your patch and made a couple of
one-line adjustments, as below. Please have a look.

--- a/src/bin/pg_dump/pg_restore.c
> +++ b/src/bin/pg_dump/pg_restore.c
> @@ -1053,15 +1053,19 @@ get_dbname_oid_list_from_mfile(const char
> *dumpdirpath, SimpleOidStringList *dbn
> while ((fgets(line, MAXPGPATH, pfile)) != NULL)
> {
> Oid db_oid = InvalidOid;
> -   chardb_oid_str[MAXPGPATH + 1] = "";
> char   *dbname;
> +   char*p = line;
>
> /* Extract dboid. */
> sscanf(line, "%u", &db_oid);
> -   sscanf(line, "%20s", db_oid_str);
> +
> +   while(isdigit(*p))
> +   p++;
> +
> +   Assert(*p == ' ');
>
> /* dbname is the rest of the line */
> -   dbname = line + strlen(db_oid_str) + 1;
> +   dbname = ++p;
>
> /* Remove \n from dbname. */
> dbname[strlen(dbname) - 1] = '\0';
>

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: FmgrInfo allocation patterns (and PL handling as staged programming)

2025-04-14 Thread Chapman Flack
On 04/06/25 22:37, Tom Lane wrote:
> Here's a draft patch to fix the bogus dependencies.  As given this'd
> only be okay for HEAD, since I doubt we can get away with changing
> ProcedureCreate()'s signature in stable branches ... In the back branches
> we could make ProcedureCreate() deconstruct the OID array it's given and
> then repeat the transform lookups, but ugh. ...
> 
> BTW, I feel a little uncomfortable with the idea that we're adding
> dependencies on objects that are explicitly mentioned nowhere in the
> pg_proc entry.

Pursuing that idea a bit further, was there a compelling original reason
the column in pg_proc had to be protrftypes and not just protransforms?

The transforms have been looked up at ProcedureCreate time.

Any PL that supports transforms has to repeat the transform lookups
if there are protrftypes present.

Any PL that doesn't support transforms needs only to check for presence
and say "sorry, I don't support those", which it could do just as easily
with a list of transforms as with a list of types.

Regards,
-Chap




Log prefix missing for subscriber log messages received from publisher

2025-04-14 Thread vignesh C
Hi,

Currently, when a warning is emitted by the publisher, the
corresponding log message does not include the log prefix. This makes
it harder to correlate such messages with other log entries. For
example, in a simulated error scenario where directory removal fails,
the notice message lacks the standard log prefix, as shown below:
2025-03-18 16:44:36.071 IST [196901] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has
finished
WARNING:  could not remove directory
"pg_replslot/pg_16398_sync_16387_7483106341004194035.tmp"

In this case, the WARNING line does not include the usual timestamp
information, making it harder to trace.

To address this, we can have a custom notice processor for WAL
receiver connections—similar to what's done in the attached patch.
This ensures that notices received during both streaming and logical
replication include the appropriate log prefix. Since this issue is
present in both replication modes, the patch sets the notice processor
for all WAL receiver connections.

Regards,
Vignesh
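
For anyone skimming, the gist of the approach is the following (a sketch
along the lines of the patch, inside libpqwalreceiver.c which already has
the needed headers; exact names may differ):

    /* Re-emit notices from the remote server through ereport() so they
     * pick up log_line_prefix like every other backend log line. */
    static void
    libpqrcv_notice_processor(void *arg, const char *message)
    {
        int     len = strlen(message);

        /* libpq messages end with a newline; trim it for ereport() */
        if (len > 0 && message[len - 1] == '\n')
            len--;

        ereport(LOG,
                (errmsg_internal("%.*s", len, message)));
    }

    /* ... and after the connection is established: */
    PQsetNoticeProcessor(conn->streamConn, libpqrcv_notice_processor, NULL);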


register_notice_process.patch
Description: Binary data


Re: not null constraints, again

2025-04-14 Thread Tom Lane
Alvaro Herrera  writes:
> On 2025-Apr-14, Tom Lane wrote:
>> I would not have expected that adding pg_constraint rows implies
>> stronger locks than what ALTER ADD PRIMARY KEY was using before,
>> and I suspect that doing so will cause more problems than just
>> breaking parallel restore.

> I wasn't aware of this side effect.  I'll investigate this in more
> depth.  I suspect it might be a bug in the way we run through ALTER
> TABLE for the primary key.

After further thought it occurs to me that it might not be a case
of "we get stronger locks", but a case of "we accidentally get a
weaker lock earlier and then try to upgrade it", thus creating a
possibility of deadlock where before we'd just have blocked till
the other statement cleared.  Still worthy of being fixed if that's
true, though.
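
For illustration, the textbook shape of that hazard (a generic sketch,
nothing pg_dump-specific):

    -- session 1                            -- session 2
    BEGIN;                                  BEGIN;
    LOCK TABLE t IN ACCESS SHARE MODE;      LOCK TABLE t IN ACCESS SHARE MODE;
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
    -- each upgrade waits for the other session's share lock: deadlock

whereas taking the exclusive lock up front would simply have made the
second session block until the first finished.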

regards, tom lane




Re: Add pg_get_injection_points() for information of injection points

2025-04-14 Thread Michael Paquier
On Mon, Apr 14, 2025 at 06:15:07AM +, Hayato Kuroda (Fujitsu) wrote:
> Apart from functions related with injection_points, this can be called even 
> when
> postgres has been built with -Dinjection_points=false. This is intentional 
> because
> this function does not have any side-effect and just refers the status. Is my
> understanding correct?

Yes.  The function could be changed to return an ERROR if the build
does not support this option.  It's more portable to return NULL if
you have some queries that do joins.  This could be used with
pg_stat_activity and wait events for example, and some points are in
positions strategic enough that they could be used across more than
one library, like the restart point one or the autovacuum startup one.

> I'm not sure it is directly related, but ISTM there are no direct ways to 
> check
> whether the injection_points is enabled or not. How do you think adding the
> function?

Yeah, we could use something like that, not sure if that's worth it.

> Regarding the internal of the patch, it could be crashed when two points are
> attached and then first one is detached [1]. I think we should not use "idx" 
> for
> the result array - PSA the fix.

Oops.  That's a brain fart from my side.  Thanks.
--
Michael
From ee033a0f883d52638cc9c90b9c0eff29b9dde5a5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 15 Apr 2025 08:39:36 +0900
Subject: [PATCH v2] Add pg_get_injection_points()

This is a system function that displays the information about the
injection points currently attached to the system, feeding from the
states of things in shared memory.
---
 src/include/catalog/pg_proc.dat   |  8 +++
 src/include/utils/injection_point.h   | 14 +
 src/backend/utils/misc/Makefile   |  1 +
 src/backend/utils/misc/injection_point.c  | 53 +
 .../utils/misc/injection_point_funcs.c| 59 +++
 src/backend/utils/misc/meson.build|  1 +
 .../expected/injection_points.out | 16 +
 .../injection_points/sql/injection_points.sql |  7 +++
 src/test/regress/expected/misc_functions.out  |  7 +++
 src/test/regress/sql/misc_functions.sql   |  3 +
 doc/src/sgml/func.sgml| 46 +++
 src/tools/pgindent/typedefs.list  |  1 +
 12 files changed, 216 insertions(+)
 create mode 100644 src/backend/utils/misc/injection_point_funcs.c

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62beb71da288..d087d9fda1e8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12566,4 +12566,12 @@
   proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
   prosrc => 'pg_get_aios' },
 
+# Injection point functions
+{ oid => '8490', descr => 'information about injection points attached',
+  proname => 'pg_get_injection_points', prorows => '3', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => '', proallargtypes => '{text,text,text}',
+  proargmodes => '{o,o,o}', proargnames => '{name,library,function}',
+  prosrc => 'pg_get_injection_points' },
+
 ]
diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index 6ba64cd1ebc6..c5c78914b848 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -11,6 +11,17 @@
 #ifndef INJECTION_POINT_H
 #define INJECTION_POINT_H
 
+/*
+ * Injection point data, used when retrieving a list of all the attached
+ * injection points.
+ */
+typedef struct InjectionPointData
+{
+	const char *name;
+	const char *library;
+	const char *function;
+} InjectionPointData;
+
 /*
  * Injection points require --enable-injection-points.
  */
@@ -46,6 +57,9 @@ extern void InjectionPointCached(const char *name);
 extern bool IsInjectionPointAttached(const char *name);
 extern bool InjectionPointDetach(const char *name);
 
+/* Get the current set of injection points attached */
+extern InjectionPointData *InjectionPointList(uint32 *num_points);
+
 #ifdef EXEC_BACKEND
 extern PGDLLIMPORT struct InjectionPointsCtl *ActiveInjectionPoints;
 #endif
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index b362ae437710..93703633f69a 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	guc_tables.o \
 	help_config.o \
 	injection_point.o \
+	injection_point_funcs.o \
 	pg_config.o \
 	pg_controldata.o \
 	pg_rusage.o \
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index 97ab851f0a63..ea7e6dba5211 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -584,3 +584,56 @@ IsInjectionPointAttached(const char *name)
 	return false;			/* silence compiler */
 #endif
 }
+
+/*
+ * Retrieve a list of 

Re: ci: Allow running mingw tests by default via environment variable

2025-04-14 Thread Nathan Bossart
On Mon, Apr 14, 2025 at 09:26:59PM +0200, Andres Freund wrote:
> On April 14, 2025 9:22:40 PM GMT+02:00, Nathan Bossart 
>  wrote:
>>On Sat, Apr 12, 2025 at 08:18:35PM +1200, Thomas Munro wrote:
>>> Is it safe to assume that CI changes come under the same code-freeze
>>> exception as tests and docs?  I would assume so, it's just scripting
>>> glue in .cirrus.XXX files not affecting one bit of the software, but I
>>> don't recall that we specifically clarified that before, so can
>>> someone from the RMT please +1 this?  Then I'll go ahead and push
>>> these.
>>
>>That seems reasonable to me.  But to play devil's advocate, is there a
>>strong reason for doing this before July?
> 
> We get more coverage of the changes we merge until then. 

Sure.  To be clear, I think this change is fine to proceed, especially
given we're less than a week into feature freeze and have plenty of time to
stabilize any cfbot breakage.  If anything, this change should _help_
stabilize v18.  I do wonder if we should be more cautious about these kinds
of changes closer to release dates, but that's not a concern here.

-- 
nathan




Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-04-14 Thread Ilia Evdokimov


On 14.04.2025 23:53, David Rowley wrote:

If we can't get consensus for everything people want to add at once
then maybe the patch could be broken into two, with 0001 being pretty
much the v4 patch and then have 0002 add the Estimated Hit Ratio.
Having the physical patches and being able to try it out or view the
regression test changes might lower the bar on people chipping in with
their views.



Agreed - I’ll split the patch into two: the first adds Estimated Capacity 
and Estimated Distinct Lookup Keys, and the second adds Estimated Hit 
Ratio. I’ll also include the regression test differences, to make it 
easier to evaluate the usefulness of each part.
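
For reference, with 0001 applied the new TEXT-format lines come out like
this (plan fragment and numbers are illustrative, shaped after the memoize
regression tests):

  ->  Memoize
        Cache Key: t1.twenty
        Cache Mode: logical
        Estimated Capacity: 20  Estimated Distinct Lookup Keys: 20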


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From 99a23dfae4415a3eb88c3847deefae3c2103077d Mon Sep 17 00:00:00 2001
From: Evdokimov Ilia 
Date: Tue, 15 Apr 2025 00:47:26 +0300
Subject: [PATCH v7 1/2] Show ndistinct and est_entries in EXPLAIN for Memoize

---
 src/backend/commands/explain.c   | 13 
 src/backend/optimizer/path/costsize.c|  3 +
 src/backend/optimizer/plan/createplan.c  | 10 ++-
 src/backend/optimizer/util/pathnode.c|  3 +
 src/include/nodes/pathnodes.h|  2 +
 src/include/nodes/plannodes.h|  6 ++
 src/test/regress/expected/create_index.out   |  3 +-
 src/test/regress/expected/join.out   | 64 +++-
 src/test/regress/expected/memoize.out| 54 +++--
 src/test/regress/expected/partition_join.out | 18 --
 src/test/regress/expected/subselect.out  | 10 +--
 11 files changed, 126 insertions(+), 60 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 786ee865f14..0a4ce5ee7d7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3628,6 +3628,19 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 	ExplainPropertyText("Cache Key", keystr.data, es);
 	ExplainPropertyText("Cache Mode", mstate->binary_mode ? "binary" : "logical", es);
 
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainIndentText(es);
+		appendStringInfo(es->str, "Estimated Capacity: %u  Estimated Distinct Lookup Keys: %0.0f\n",
+		((Memoize *) plan)->est_entries,
+		((Memoize *) plan)->est_unique_keys);
+	}
+	else
+	{
+		ExplainPropertyUInteger("Estimated Capacity", "", ((Memoize *) plan)->est_entries, es);
+		ExplainPropertyFloat("Estimated Distinct Lookup Keys", "", ((Memoize *) plan)->est_unique_keys, 0, es);
+	}
+
 	pfree(keystr.data);
 
 	if (!es->analyze)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 60b0fcfb6be..f72319d903c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2604,6 +2604,9 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath,
 	mpath->est_entries = Min(Min(ndistinct, est_cache_entries),
 			 PG_UINT32_MAX);
 
+	/* Remember ndistinct for a potential EXPLAIN later */
+	mpath->est_unique_keys = ndistinct;
+
 	/*
 	 * When the number of distinct parameter values is above the amount we can
 	 * store in the cache, then we'll have to evict some entries from the
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index a8f22a8c154..a1456c9014d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -284,7 +284,8 @@ static Material *make_material(Plan *lefttree);
 static Memoize *make_memoize(Plan *lefttree, Oid *hashoperators,
 			 Oid *collations, List *param_exprs,
 			 bool singlerow, bool binary_mode,
-			 uint32 est_entries, Bitmapset *keyparamids);
+			uint32 est_entries, Bitmapset *keyparamids,
+			double est_unique_keys);
 static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
  int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
  int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -1703,7 +1704,8 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
 
 	plan = make_memoize(subplan, operators, collations, param_exprs,
 		best_path->singlerow, best_path->binary_mode,
-		best_path->est_entries, keyparamids);
+		best_path->est_entries, keyparamids,
+		best_path->est_unique_keys);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
@@ -6636,7 +6638,8 @@ materialize_finished_plan(Plan *subplan)
 static Memoize *
 make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations,
 			 List *param_exprs, bool singlerow, bool binary_mode,
-			 uint32 est_entries, Bitmapset *keyparamids)
+			uint32 est_entries, Bitmapset *keyparamids,
+			double est_unique_keys)
 {
 	Memoize*node = makeNode(Memoize);
 	Plan	   *plan = &node->plan;
@@ -6654,6 +6657,7 @@ make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations,
 	node->binary_mode = binary_mode;
 	node->est_entries = est_entries;
 	node->keyparamids 

Re: not null constraints, again

2025-04-14 Thread Tender Wang
On Tue, Apr 15, 2025 at 05:39, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2025-Apr-14, Tom Lane wrote:
> >> I would not have expected that adding pg_constraint rows implies
> >> stronger locks than what ALTER ADD PRIMARY KEY was using before,
> >> and I suspect that doing so will cause more problems than just
> >> breaking parallel restore.
>
> > I wasn't aware of this side effect.  I'll investigate this in more
> > depth.  I suspect it might be a bug in the way we run through ALTER
> > TABLE for the primary key.
>
> After further thought it occurs to me that it might not be a case
> of "we get stronger locks", but a case of "we accidentally get a
> weaker lock earlier and then try to upgrade it", thus creating a
> possibility of deadlock where before we'd just have blocked till
> the other statement cleared.  Still worthy of being fixed if that's
> true, though.
>
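
To make the quoted scenario concrete, the classic upgrade deadlock looks
like this (a hypothetical two-session sketch, not the exact locks
pg_restore takes):

  S1: BEGIN; LOCK TABLE child IN ACCESS SHARE MODE;
  S2: BEGIN; LOCK TABLE child IN ACCESS SHARE MODE;
  S1: LOCK TABLE child IN ACCESS EXCLUSIVE MODE;  -- waits on S2's lock
  S2: LOCK TABLE child IN ACCESS EXCLUSIVE MODE;  -- waits on S1's: deadlock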

I added a sleep loop in DeadLockReport(), just before the error report, to
hold the backends so I could inspect their state when a deadlock happened:

    /* park here; flip continue_sleep from a debugger to resume */
    bool continue_sleep = true;

    do
    {
        sleep(1);
    } while (continue_sleep);

    ereport(ERROR,
            (errcode(ERRCODE_T_R_DEADLOCK_DETECTED),
             errmsg("deadlock detected"),
             errdetail_internal("%s", clientbuf.data),
             errdetail_log("%s", logbuf.data),
             errhint("See server log for query details.")));



ubuntu@VM-0-17-ubuntu:/workspace/postgres$ ps -ef|grep postgres
ubuntu   2911109   1  0 10:34 ?00:00:00
/workspace/pgsql/bin/postgres -D ../data
ubuntu   290 2911109  0 10:34 ?00:00:00 postgres: io worker 0
ubuntu   291 2911109  0 10:34 ?00:00:00 postgres: io worker 1
ubuntu   292 2911109  0 10:34 ?00:00:00 postgres: io worker 2
ubuntu   293 2911109  0 10:34 ?00:00:00 postgres: checkpointer
ubuntu   294 2911109  0 10:34 ?00:00:00 postgres: background
writer
ubuntu   296 2911109  0 10:34 ?00:00:00 postgres: walwriter
ubuntu   297 2911109  0 10:34 ?00:00:00 postgres: autovacuum
launcher
ubuntu   298 2911109  0 10:34 ?00:00:00 postgres: logical
replication launcher
ubuntu   2911180 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] COPY waiting
ubuntu   2911184 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] idle
ubuntu   2911187 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] idle
ubuntu   2911188 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] ALTER TABLE
ubuntu   2911189 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] SELECT waiting
ubuntu   2911190 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] idle
ubuntu   2911191 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] TRUNCATE TABLE waiting
ubuntu   2911192 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] idle
ubuntu   2911193 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] SELECT waiting
ubuntu   2911194 2911109  0 10:34 ?00:00:00 postgres: ubuntu target
[local] idle

gdb -p 2911188   // ALTER TABLE ONLY public.parent2  ADD CONSTRAINT
parent2_pkey PRIMARY KEY (id);
(gdb) bt
#0  0x7f2f6910878a in __GI___clock_nanosleep (clock_id=clock_id@entry=0,
flags=flags@entry=0, req=req@entry=0x7ffc0e560e60,
rem=rem@entry=0x7ffc0e560e60)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1  0x7f2f6910d677 in __GI___nanosleep (req=req@entry=0x7ffc0e560e60,
rem=rem@entry=0x7ffc0e560e60) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#2  0x7f2f6910d5ae in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#3  0x561cd9386100 in DeadLockReport () at deadlock.c:1136
#4  0x561cd9389df8 in LockAcquireExtended (locktag=0x7ffc0e5610b0,
lockmode=8, sessionLock=false, dontWait=false, reportMemoryError=true,
locallockp=0x7ffc0e5610a8, logLockFailure=false) at lock.c:1232
#5  0x561cd93864bc in LockRelationOid (relid=16473, lockmode=8) at
lmgr.c:115
#6  0x561cd8f21b20 in find_inheritance_children_extended
(parentrelId=16463, omit_detached=true, lockmode=8, detached_exist=0x0,
detached_xmin=0x0) at pg_inherits.c:213
#7  0x561cd8f217c1 in find_inheritance_children (parentrelId=16463,
lockmode=8) at pg_inherits.c:60
#8  0x561cd904dd73 in ATPrepAddPrimaryKey (wqueue=0x7ffc0e561348,
rel=0x7f2f5d7d6240, cmd=0x561cf1d2ee38, recurse=false, lockmode=8,
context=0x7ffc0e561540) at tablecmds.c:9463
#9  0x561cd9043906 in ATPrepCmd (wqueue=0x7ffc0e561348,
rel=0x7f2f5d7d6240, cmd=0x561cf1d2ee38, recurse=false, recursing=false,
lockmode=8, context=0x7ffc0e561540) at tablecmds.c:5079
#10 0x561cd90432aa in ATController (parsetree=0x561cf1d062e0,
rel=0x7f2f5d7d6240, cmds=0x561cf1d06290, recurse=false, lockmode=8,
context=0x7ffc0e561540) at tablecmds.c:4871
#11 0x561cd9042f3b in AlterTable (stmt=0x561cf1d062e0, lockmode=8,
context=0x7ffc0e561540) at tablecmds.c:4533
#12 0x561cd93bb7a8 in ProcessUtilitySlow (pstate=0x561cf1d2f9e0,
pstmt=0x561cf1d06390, queryString=0x561cf1d05570 "ALTER TABLE ONLY
public.parent2\nADD CONSTRAINT parent2_p
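
While the backends are parked like this, the wait graph can also be
inspected from another session. A generic diagnostic, nothing specific to
this patch:

  SELECT pid, locktype, relation::regclass AS rel, mode, granted
  FROM pg_locks WHERE NOT granted;

  SELECT pid, pg_blocking_pids(pid) AS blocked_by
  FROM pg_stat_activity WHERE wait_event_type = 'Lock';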
