Re: New committer: Jacob Champion

2025-04-13 Thread Richard Guo
On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz  wrote:
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!

Congratulations Jacob!  Well deserved!

Thanks
Richard




Re: pg_dump --if-exists --clean when drop index that is partition of a partitioned index

2025-04-13 Thread Pavel Stehule
Hi

On Mon, Apr 14, 2025 at 7:54, jian he wrote:

> hi.
>
> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
> CREATE TABLE tp_1(c int, a int, b int);
> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
> CREATE INDEX t_a_idx ON tp_1(a);
> CREATE INDEX tp_a_idx ON tp(a);
>
> pg_dump  --schema=public --if-exists --clean --no-statistics
> --no-owner --no-data --table-and-children=tp > 1.sql
> pg_dump output file 1.sql excerpts:
> 
> DROP INDEX IF EXISTS public.t_a_idx;
> DROP INDEX IF EXISTS public.tp_a_idx;
> DROP TABLE IF EXISTS public.tp_1;
> DROP TABLE IF EXISTS public.tp;
> 
> if you execute the
> DROP INDEX IF EXISTS public.t_a_idx;
>
> ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
> HINT:  You can drop index tp_a_idx instead.
>
> Is this pg_dump output what we expected?
>
>
I think it is a bug; the implementation of these parts of the code is older
than partitioning support and doesn't do the necessary detach.
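
For reference, a sketch of the drop sequence that does work (following the
HINT): a partition of a partitioned index cannot be dropped on its own, so
only the parent's DROP should be emitted, which removes the child
recursively:

DROP INDEX IF EXISTS public.tp_a_idx;   -- also drops t_a_idx
DROP TABLE IF EXISTS public.tp_1;
DROP TABLE IF EXISTS public.tp;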

regards

Pavel



>
> SELECT * FROM pg_partition_tree('tp_a_idx');
>   relid   | parentrelid | isleaf | level
> ----------+-------------+--------+-------
>  tp_a_idx |             | f      |     0
>  t_a_idx  | tp_a_idx    | t      |     1
> (2 rows)
>
>
>


Re: An incorrect check in get_memoize_path

2025-04-13 Thread Richard Guo
On Mon, Apr 7, 2025 at 4:50 PM Richard Guo  wrote:
> Hence, I propose the attached patch for the fix.
>
> BTW, there is a XXX comment there saying that maybe we can make the
> remaining join quals part of the inner scan's filter instead of the
> join filter.  I don't think this is possible in all cases.  In the
> above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
> filter, according to join_clause_is_movable_into.  So I removed that
> comment in the patch while we're here.
>
> Any thoughts?

Here is an updated patch with a commit message.  Regarding
backporting, I'm inclined not to, given the lack of field reports.
Any objections to pushing it?

Thanks
Richard


v2-0001-Fix-an-incorrect-check-in-get_memoize_path.patch
Description: Binary data


Re: Adding error messages to a few slash commands

2025-04-13 Thread Abhishek Chanda
Thanks for the feedback; attached is an updated patch that changes most
of those "Did not find" messages to empty tables. I did not change two
sets: listDbRoleSettings and listTables both have comments describing
that they are deliberately this way.

I wanted to post this update to close this loop for now. I will follow
up once the merge window opens again.

Thanks

On Sun, Apr 13, 2025 at 1:47 AM Pavel Luzanov  wrote:
>
> On 13.04.2025 08:29, Tom Lane wrote:
>
> Abhishek Chanda  writes:
>
> Currently, some slash commands in psql return an error saying "Did not
> find any  named " while some return an empty table. This patch
> changes a few of the slash commands to return a similar error message.
>
>
> +1 for this patch.
>
> Personally, if I were trying to make these things consistent, I'd have
> gone in the other direction (ie return empty tables).
>
>
> +1
> Returning empty tables is a more appropriate behavior.
>
> --
> Pavel Luzanov
> Postgres Professional: https://postgrespro.com



-- 
Thanks and regards
Abhishek Chanda


v2-0001-Print-empty-table-when-a-given-object-is-not-foun.patch
Description: Binary data


Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)

2025-04-13 Thread Wolfgang Walther

Andrew Dunstan:
>
> On 2025-04-12 Sa 10:10 PM, Noah Misch wrote:
>> On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote:
>>> With injection points enabled, I get the following errors in test_aio:
>>>
>>> [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is
>>> reported: expected stderr
>>> [15:14:45.409](0.000s)
>>> [15:14:45.409](0.000s) #   Failed test 'worker: first hard IO error is
>>> reported: expected stderr'
>>> #   at t/001_aio.pl line 810.
>>> [15:14:45.409](0.000s) # 'psql::88: ERROR:  could
>>> not read blocks 2..2 in file "base/5/16408": I/O error'
>>> # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file
>>> \"base/.*\": Input/output error)'
>>>
>>> It seems like it's just the error message that is different and has
>>> "I/O" instead of "Input/output"?
>>
>> Looks like it.
>>
>>> On a more general note, does enabling injection points make any sense
>>> here?
>>
>> Yes, it does.
>>
>>> I see that coverage in the build farm is not very big. IIUC, those are
>>> a development tool, so might not be relevant, because nobody is
>>> developing on Alpine / musl?
>>
>> No, whether anyone develops on the platform is not a factor. One hasn't
>> fully tested PostgreSQL until one builds with injection points.
>
> Here's a simple fix ... also removes some unnecessary escaping and
> leaning toothpick syndrome.

Confirmed - this works!

Thanks,

Wolfgang





Re: Automatically sizing the IO worker pool

2025-04-13 Thread Jose Luis Tallon

On 12/4/25 18:59, Thomas Munro wrote:

It's hard to know how to set io_workers=3.


Hmmm... enable the below behaviour if "io_workers=auto" (default)?

Sometimes being able to set this kind of parameter manually helps 
tremendously with specific workloads... :S



[snip]
Here's a patch to replace that GUC with:

   io_min_workers=1
   io_max_workers=8
   io_worker_idle_timeout=60s
   io_worker_launch_interval=500ms


Great as defaults / backwards compat with io_workers=auto. Sounds more 
user-friendly to me, at least.



[snip]

Ideas, testing, flames etc welcome.


Logic seems sound, if a bit daunting for inexperienced users --- well, 
maybe just a bit more than it is now, but ISTM evolution should try and 
flatten novices' learning curve, right?



Just .02€, though.


Thanks,

--
Parkinson's Law: Work expands to fill the time allotted to it.





Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)

2025-04-13 Thread Andres Freund
Hi, 

On April 13, 2025 7:27:33 PM GMT+02:00, Wolfgang Walther 
 wrote:
>Andrew Dunstan:
>> 
>> On 2025-04-12 Sa 10:10 PM, Noah Misch wrote:
>>> On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote:
 With injection points enabled, I get the following errors in test_aio:
 
 
 [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is 
 reported:
 expected stderr
 [15:14:45.409](0.000s)
 [15:14:45.409](0.000s) #   Failed test 'worker: first hard IO error is
 reported: expected stderr'
 #   at t/001_aio.pl line 810.
 [15:14:45.409](0.000s) # 'psql::88: ERROR:  could
 not read blocks 2..2 in file "base/5/16408": I/O error'
 # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file
 \"base/.*\": Input/output error)'
 It seems like it's just the error message that is different and has "I/O"
 instead of "Input/output"?
>>> Looks like it.
>>> 
 On a more general note, does enabling injection points make any sense here?
>>> Yes, it does.
>>> 
 I see that coverage in the build farm is not very big. IIUC, those are a
 development tool, so might not be relevant, because nobody is developing on
 Alpine / musl?
>>> No, whether anyone develops on the platform is not a factor. One hasn't 
>>> fully
>>> tested PostgreSQL until one builds with injection points.
>>> 
>>> 
>> 
>> Here's a simple fix ... also removes some unnecessary escaping and leaning 
>> toothpick syndrome.
>
>Confirmed - this works!

Thanks for testing and writing up a fix. Andrew, would you be ok applying it? 
I've been traveling the last 24h and should probably not be handling sharp 
commit bits tonight. 

I'm not too surprised about failures like this; when writing the tests I was 
worried about different formulations. But after seeing FreeBSD, glibc Linux, 
NetBSD, OpenBSD, and Windows all working the same I thought we were in the clear.

Greetings, 

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Get rid of integer divide in FAST_PATH_REL_GROUP() macro

2025-04-13 Thread David Rowley
I noticed a while ago that the new fast-path locking code uses integer
division to figure out the fast-path locking group slot.  To me, this
seems a bit unnecessary as FastPathLockGroupsPerBackend is always a
power-of-two value, so we can use bitwise-AND instead.
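
As a quick SQL sanity check of the trick (illustrative only): for any
power-of-two divisor, x % 2^k equals x & (2^k - 1), which is what lets
the divide/modulo be replaced by a bitwise-AND:

SELECT g AS x, g % 8 AS with_modulo, g & (8 - 1) AS with_bitwise_and
FROM generate_series(0, 16) AS g;
-- the two computed columns always match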

I don't think FAST_PATH_REL_GROUP() is in any particularly hot code
paths, but still, having the divide in there isn't sitting well with
me. Can we get rid of it?

I've attached a patch for that. I also adjusted the method used to
calculate FastPathLockGroupsPerBackend. Also, the Assert at the end of
the loop in InitializeFastPathLocks() looked a little odd, as it seems
to verify something the loop condition already checked. I thought it
was better to check that we end up with a power-of-two.

Please see the attached patch.

David


v1-0001-Eliminate-integer-divide-in-fastpath-locking-code.patch
Description: Binary data


Re: Accessing an invalid pointer in BufferManagerRelation structure

2025-04-13 Thread Daniil Davydov
Hi,

On Wed, Mar 26, 2025 at 2:14 PM Stepan Neretin  wrote:
>
> The first thing we both noticed is that the macro calls a function that won't 
> be available without an additional header. This seems a bit inconvenient.

Well, I rebased the patch onto the latest `master`
(b51f86e49a7f119004c0ce5d0be89cdf98309141) and noticed that we don't
need to include `rel.h` in `localbuf.c` directly anymore, because
`#include lmgr.h` was added in memutils.h. I guess that solves this
issue. Please see the v3 patch.

> I also have a question: is the logic correct that if the relation is valid, 
> we should fetch it rather than the other way around? Additionally, is 
> checking only the `rd_isvalid` flag sufficient, or should we also consider 
> the flag below?
>
> ```
> bool rd_isvalid; /* relcache entry is valid */
>

I don't think that we should check any Relation's flags here. We are
checking `RelationIsValid((bmr).rel) ?` to decide whether
BufferManagerRelation was created via BMR_REL or BMR_SMGR.
If the `rel` field is not NULL, we can definitely say that BMR_REL was
used, so we should call RelationGetSmgr in order to access smgr.

--
Best regards,
Daniil Davydov
From a1c572652e69f13b954ec3a915ed55c1ec11c772 Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Mon, 14 Apr 2025 13:07:38 +0700
Subject: [PATCH v3] Add macros for safety access to smgr

---
 src/backend/storage/buffer/bufmgr.c   | 55 ---
 src/backend/storage/buffer/localbuf.c |  9 +++--
 src/include/storage/bufmgr.h  | 13 +++
 3 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0b0e056eea2..f2b117088a8 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -892,11 +892,8 @@ ExtendBufferedRelBy(BufferManagerRelation bmr,
 	Assert(bmr.smgr == NULL || bmr.relpersistence != 0);
 	Assert(extend_by > 0);
 
-	if (bmr.smgr == NULL)
-	{
-		bmr.smgr = RelationGetSmgr(bmr.rel);
+	if (bmr.relpersistence == 0)
 		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
-	}
 
 	return ExtendBufferedRelCommon(bmr, fork, strategy, flags,
    extend_by, InvalidBlockNumber,
@@ -928,11 +925,8 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	Assert(bmr.smgr == NULL || bmr.relpersistence != 0);
 	Assert(extend_to != InvalidBlockNumber && extend_to > 0);
 
-	if (bmr.smgr == NULL)
-	{
-		bmr.smgr = RelationGetSmgr(bmr.rel);
+	if (bmr.relpersistence == 0)
 		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
-	}
 
 	/*
 	 * If desired, create the file if it doesn't exist.  If
@@ -940,15 +934,15 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 * an smgrexists call.
 	 */
 	if ((flags & EB_CREATE_FORK_IF_NEEDED) &&
-		(bmr.smgr->smgr_cached_nblocks[fork] == 0 ||
-		 bmr.smgr->smgr_cached_nblocks[fork] == InvalidBlockNumber) &&
-		!smgrexists(bmr.smgr, fork))
+		(BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] == 0 ||
+		 BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] == InvalidBlockNumber) &&
+		!smgrexists(BMR_GET_SMGR(bmr), fork))
 	{
 		LockRelationForExtension(bmr.rel, ExclusiveLock);
 
 		/* recheck, fork might have been created concurrently */
-		if (!smgrexists(bmr.smgr, fork))
-			smgrcreate(bmr.smgr, fork, flags & EB_PERFORMING_RECOVERY);
+		if (!smgrexists(BMR_GET_SMGR(bmr), fork))
+			smgrcreate(BMR_GET_SMGR(bmr), fork, flags & EB_PERFORMING_RECOVERY);
 
 		UnlockRelationForExtension(bmr.rel, ExclusiveLock);
 	}
@@ -958,13 +952,13 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 * kernel.
 	 */
 	if (flags & EB_CLEAR_SIZE_CACHE)
-		bmr.smgr->smgr_cached_nblocks[fork] = InvalidBlockNumber;
+		BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] = InvalidBlockNumber;
 
 	/*
 	 * Estimate how many pages we'll need to extend by. This avoids acquiring
 	 * unnecessarily many victim buffers.
 	 */
-	current_size = smgrnblocks(bmr.smgr, fork);
+	current_size = smgrnblocks(BMR_GET_SMGR(bmr), fork);
 
 	/*
 	 * Since no-one else can be looking at the page contents yet, there is no
@@ -1008,7 +1002,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	if (buffer == InvalidBuffer)
 	{
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.rel, bmr.smgr, bmr.relpersistence,
+		buffer = ReadBuffer_common(bmr.rel, BMR_GET_SMGR(bmr), bmr.relpersistence,
    fork, extend_to - 1, mode, strategy);
 	}
 
@@ -2568,10 +2562,10 @@ ExtendBufferedRelCommon(BufferManagerRelation bmr,
 	BlockNumber first_block;
 
 	TRACE_POSTGRESQL_BUFFER_EXTEND_START(fork,
-		 bmr.smgr->smgr_rlocator.locator.spcOid,
-		 bmr.smgr->smgr_rlocator.locator.dbOid,
-		 bmr.smgr->smgr_rlocator.locator.relNumber,
-		 bmr.smgr->smgr_rlocator.backend,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.locator.spcOid,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.locator.dbOid,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.locator.relNumber,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.backend,
 		 extend_by);
 

RE: Add pg_get_injection_points() for information of injection points

2025-04-13 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

Thanks for creating the patch! Let me confirm two points:

Apart from the functions related to injection_points, this one can be called
even when postgres has been built with -Dinjection_points=false. This is
intentional, because the function does not have any side effects and just
reports the status. Is my understanding correct?

I'm not sure it is directly related, but ISTM there is no direct way to check
whether injection_points is enabled or not. What do you think about adding
such a function?

Regarding the internals of the patch, it can crash when two points are
attached and then the first one is detached [1]. I think we should not use
"idx" to index the result array - PSA the fix.

[1]:
```
SELECT injection_points_attach('TestInjectionLog', 'notice');
 injection_points_attach 
-------------------------
 
(1 row)

SELECT injection_points_attach('TestInjectionError', 'error');
 injection_points_attach 
-------------------------
 
(1 row)

SELECT injection_points_detach('TestInjectionLog');
 injection_points_detach 
-------------------------
 
(1 row)

SELECT name, library, function FROM pg_get_injection_points()
  ORDER BY name COLLATE "C";
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index bf1a49f7472..ea7e6dba521 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -596,6 +596,7 @@ InjectionPointList(uint32 *num_points)
 #ifdef USE_INJECTION_POINTS
InjectionPointData *result;
uint32  max_inuse;
+   int cur_pos = 0;
 
LWLockAcquire(InjectionPointLock, LW_SHARED);
 
@@ -619,14 +620,16 @@ InjectionPointList(uint32 *num_points)
if (generation % 2 == 0)
continue;
 
-   result[idx].name = pstrdup(entry->name);
-   result[idx].library = pstrdup(entry->library);
-   result[idx].function = pstrdup(entry->function);
-   (*num_points)++;
+   result[cur_pos].name = pstrdup(entry->name);
+   result[cur_pos].library = pstrdup(entry->library);
+   result[cur_pos].function = pstrdup(entry->function);
+   cur_pos++;
}
 
LWLockRelease(InjectionPointLock);
 
+   *num_points = cur_pos;
+
return result;
 
 #else


Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

2025-04-13 Thread Michael Paquier
On Sun, Apr 13, 2025 at 11:51:57AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > Tom and Michael, do you still object to the test addition, or not?  If there
> > are no new or renewed objections by 2025-04-20, I'll proceed to add the 
> > test.
> 
> While I still don't love it, I don't have a better proposal.

No objections from here to use your proposal for the time being on all
the branches.  I am planning to spend some cycles thinking about a
better alternative, but I doubt that this will be portable down to
v13.
--
Michael


signature.asc
Description: PGP signature


Re: Automatically sizing the IO worker pool

2025-04-13 Thread Thomas Munro
On Mon, Apr 14, 2025 at 5:45 AM Jose Luis Tallon
 wrote:
> On 12/4/25 18:59, Thomas Munro wrote:
> > It's hard to know how to set io_workers=3.
>
> Hmmm enable the below behaviour if "io_workers=auto" (default) ?

Why not just delete io_workers?  If you really want a fixed number,
you can set io_min_workers==io_max_workers.
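
For example, a sketch using the GUCs proposed in this patch (nothing
released supports them), pinning the pool to the old fixed size of three:

ALTER SYSTEM SET io_min_workers = 3;
ALTER SYSTEM SET io_max_workers = 3;
SELECT pg_reload_conf();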

What should io_max_workers default to?  I guess it could be pretty
large without much danger, but I'm not sure.  If it's a small value,
an overloaded storage system goes through two stages: first it fills
the queue up with a backlog of requests until it overflows because the
configured maximum of workers isn't keeping up, and then new
submissions start falling back to synchronous IO, sort of jumping
ahead of the queued backlog, but also stalling if the real reason is
that the storage itself isn't keeping up.  Whether it'd be better for
the IO worker pool to balloon all the way up to 32 processes (an
internal limit) if required to try to avoid that with default
settings, I'm not entirely sure.  Maybe?  Why not at least try to get
all the concurrency possible, before falling back to synchronous?
Queued but not running IOs seem to be strictly worse than queued but
not even trying to run.  I'd be interested to hear people's thoughts
and experiences actually trying different kinds of workloads on
different kinds of storage.  Whether adding more concurrency actually
helps or just generates a lot of useless new processes before the
backpressure kicks in depends on why it's not keeping up, eg hitting
IOPS, throughput or concurrency limits in the storage.  In later work
I hope we can make higher levels smarter about understanding whether
requesting more concurrency helps or hurts with feedback (that's quite
a hard problem that some of my colleagues have been looking into), but
the simpler question here seems to be: should this fairly low level
system-wide setting ship with a default that includes any preconceived
assumptions about that?

It's superficially like max_parallel_workers, which ships with a
default of 8, and that's basically where I plucked that 8 from in the
current patch for lack of a serious idea to propose yet.  But it's
also more complex than CPU: you know how many cores you have and you
know things about your workload, but even really small "on the metal"
systems probably have a lot more concurrent I/O capacity -- perhaps
depending on the type of operation! (and so far we only have reads) --
than CPU cores.  Especially once you completely abandon the idea that
anyone runs databases on spinning rust in modern times, even on low
end systems, which I think we've more or less agreed to assume these
days with related changes such as the recent *_io_concurrency default
change (1->16).  It's actually pretty hard to drive a laptop up to
needing more than half a dozen or a dozen or so workers with this
patch, especially without debug_io_direct=data, i.e. with fast
double-buffered I/O.  But cloud environments may also be where most
databases run these days, and low-end cloud configurations have
arbitrary made-up limits that may be pretty low, so it all depends.
I really don't know, but one idea is that we could leave it as open as
possible, and let users worry about that with higher-level settings
and the query concurrency they choose to generate...
io_method=io_uring is effectively open, so why should io_method=worker
be any different by default?  Just some thoughts.  I'm not sure.




Add pg_get_injection_points() for information of injection points

2025-04-13 Thread Michael Paquier
Hi all,

One thing that's been lacking in injection points is the possibility
to look at the state of the injection points in shared memory through
some SQL.  This has been on my to-do list for some time, but I have not
taken the time to do the actual legwork.

The attached patch adds a SRF that returns a set of tuples made of the
name, library and function for all the injection points attached to
the system.  This implementation relies on a new function added in
injection_point.c, called InjectionPointList(), that retrieves a
palloc()'d array of the injection points, hiding from the callers the
internals of what this stuff does with the shmem array lookup.

This is useful for monitoring or in tests, to make sure for example
that nothing is left around at the end of a script.  I have a
different proposal planned for this area of the code, where this
function would be good to have, but I am sending an independent patch
as this stuff is useful on its own.

The patch includes a couple of tests and some documentation.
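
For example, the tests added by the patch query it like this (one row per
attached point; an empty set if none are attached):

SELECT name, library, function FROM pg_get_injection_points()
  ORDER BY name COLLATE "C";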

Thanks,
--
Michael
From 8865e2ca3b1b6291e94d2a5476926095245a4e2f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 14 Apr 2025 09:26:52 +0900
Subject: [PATCH] Add pg_get_injection_points()

This is a system function that displays the information about the
injection points currently attached to the system, feeding from the
states of things in shared memory.
---
 src/include/catalog/pg_proc.dat   |  8 +++
 src/include/utils/injection_point.h   | 14 +
 src/backend/utils/misc/Makefile   |  1 +
 src/backend/utils/misc/injection_point.c  | 50 
 .../utils/misc/injection_point_funcs.c| 59 +++
 src/backend/utils/misc/meson.build|  1 +
 .../expected/injection_points.out | 16 +
 .../injection_points/sql/injection_points.sql |  7 +++
 src/test/regress/expected/misc_functions.out  |  7 +++
 src/test/regress/sql/misc_functions.sql   |  3 +
 doc/src/sgml/func.sgml| 46 +++
 src/tools/pgindent/typedefs.list  |  1 +
 12 files changed, 213 insertions(+)
 create mode 100644 src/backend/utils/misc/injection_point_funcs.c

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62beb71da288..d087d9fda1e8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12566,4 +12566,12 @@
   proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
   prosrc => 'pg_get_aios' },
 
+# Injection point functions
+{ oid => '8490', descr => 'information about injection points attached',
+  proname => 'pg_get_injection_points', prorows => '3', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => '', proallargtypes => '{text,text,text}',
+  proargmodes => '{o,o,o}', proargnames => '{name,library,function}',
+  prosrc => 'pg_get_injection_points' },
+
 ]
diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index 6ba64cd1ebc6..c5c78914b848 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -11,6 +11,17 @@
 #ifndef INJECTION_POINT_H
 #define INJECTION_POINT_H
 
+/*
+ * Injection point data, used when retrieving a list of all the attached
+ * injection points.
+ */
+typedef struct InjectionPointData
+{
+	const char *name;
+	const char *library;
+	const char *function;
+} InjectionPointData;
+
 /*
  * Injection points require --enable-injection-points.
  */
@@ -46,6 +57,9 @@ extern void InjectionPointCached(const char *name);
 extern bool IsInjectionPointAttached(const char *name);
 extern bool InjectionPointDetach(const char *name);
 
+/* Get the current set of injection points attached */
+extern InjectionPointData *InjectionPointList(uint32 *num_points);
+
 #ifdef EXEC_BACKEND
 extern PGDLLIMPORT struct InjectionPointsCtl *ActiveInjectionPoints;
 #endif
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index b362ae437710..93703633f69a 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -22,6 +22,7 @@ OBJS = \
 	guc_tables.o \
 	help_config.o \
 	injection_point.o \
+	injection_point_funcs.o \
 	pg_config.o \
 	pg_controldata.o \
 	pg_rusage.o \
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index 97ab851f0a63..bf1a49f7472a 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -584,3 +584,53 @@ IsInjectionPointAttached(const char *name)
 	return false;/* silence compiler */
 #endif
 }
+
+/*
+ * Retrieve a list of all the injection points currently attached.
+ *
+ * This list is palloc'd in the current memory context.
+ */
+InjectionPointData *
+InjectionPointList(uint32 *num_points)
+{
+#ifdef USE_INJECTION_POINTS
+	InjectionPointData *re

Re: Changing shared_buffers without restart

2025-04-13 Thread Ashutosh Bapat
On Fri, Apr 11, 2025 at 8:31 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > > I think a relatively elegant solution is to extend ProcSignalBarrier
> > > mechanism to track not only pss_barrierGeneration, as a sign that
> > > everything was processed, but also something like
> > > pss_barrierReceivedGeneration, indicating that the message was received
> > > everywhere but not processed yet. That would be enough to allow
> > > processes to wait until the resize message was received everywhere, then
> > > use a global Barrier to wait until all processes are finished.  It's
> > > somehow similar to your proposal to use two signals, but has less
> > > implementation overhead.
> >
> > The way it's implemented in v4 still has the disjoint group problem.
> > Assume backends p1, p2, p3. All three of them are executing
> > ProcessProcSignalBarrier(). All three of them updated
> > pss_barrierReceivedGeneration
> >
> > /* The message is observed, record that */
> > pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration,
> > shared_gen);
> >
> > p1, p2 moved faster and reached following code from 
> > ProcessBarrierShmemResize()
> > if (BarrierAttach(barrier) == SHMEM_RESIZE_REQUESTED)
> >   
> > WaitForProcSignalBarrierReceived(pg_atomic_read_u64(&ShmemCtrl->Generation));
> >
> > Since all the processes have received the barrier message, p1, p2 move
> > ahead and go through all the next phases and finish resizing even
> > before p3 gets a chance to call ProcessBarrierShmemResize() and attach
> > itself to Barrier. This could happen because it processed some other
> > ProcSignalBarrier message. p1 and p2 won't wait for p3 since it has
> > not attached itself to the barrier. Once p1, p2 finish, p3 will attach
> > itself to the barrier and resize buffers again - reinitializing the
> > shared memory, which might has been already modified by p1 or p2. Boom
> > - there's memory corruption.
>
> It won't reinitialize anything, since this logic is controlled by the
> ShmemCtrl->NSharedBuffers, if it's already updated nothing will be
> changed.

Ah, I see it now:

if (pg_atomic_read_u32(&ShmemCtrl->NSharedBuffers) != NBuffers)
{

Thanks for the clarification.

However, when we put back the patches to shrink buffers, we will evict
the extra buffers and shrink; if all the processes haven't participated
in the barrier by then, some of them may try to access those buffers,
re-installing them, and then bad things can happen.

>
> About the race condition you mention, there is indeed a window between
> receiving the ProcSignalBarrier and attaching to the global Barrier in
> resize, but I don't think any process will be able to touch buffer pool
> while inside this window. Even if it happens that the remapping itself
> was blazing fast that this window was enough to make one process late
> (e.g. if it was busy handling some other signal as you mention), as I've
> showed above it shouldn't be a problem.
>
> I can experiment with this case though, maybe there is a way to
> completely close this window to not thing about even potential
> scenarios.

The window may be small today, but we have to make this future-proof.
Multiple ProcSignalBarrier messages may be processed in a single call
to ProcessProcSignalBarrier() and if each of those takes as long as
buffer resizing, the window will get bigger and bigger. So we have to
close this window.

>
> > > * Shared memory address space is now reserved for future usage, making
> > >   shared memory segments clash (e.g. due to memory allocation)
> > >   impossible.  There is a new GUC to control how much space to reserve,
> > >   which is called max_available_memory -- on the assumption that most of
> > >   the time it would make sense to set its value to the total amount of
> > >   memory on the machine. I'm open for suggestions regarding the name.
> >
> > With 0006 applied
> > + /* Clean up some reserved space to resize into */
> > + if (munmap(m->shmem + m->shmem_size, new_size - m->shmem_size) == -1)
> > ze, m->shmem)));
> > ... snip ...
> > + ptr = mremap(m->shmem, m->shmem_size, new_size, 0);
> >
> > We unmap the portion of reserved address space where the existing
> > segment would expand into. As long as we are just expanding this will
> > work. I am wondering how would this work for shrinking buffers? What
> > scheme do you have in mind?
>
> I didn't like this part originally, and after changes to support hugetlb
> I think it's worth it just to replace mremap with munmap/mmap. That way
> there will be no such question, e.g. if a segment is getting shrinked
> the unmapped area will again become a part of the reserved space.
>

I might not have noticed it, but are we putting two mappings, one
reserved and one allocated, in the same address space, so that when the
allocated mapping shrinks or expands, the reserved mapping continues
to prohibit any other mapping from appearing there? I looked at some
of the previous emails, but didn't find anything that describes how
the res

Re: Logical Replication of sequences

2025-04-13 Thread Peter Smith
Hi Vignesh,

FYI, the patch v20250325-0004 failed to apply (atop 0001, 0002, 0003)
due to recent master changes.

Checking patch src/backend/commands/sequence.c...
error: while searching for:
(long long) minv, (long long) maxv)));

/* Set the currval() state only if iscalled = true */
if (iscalled)
{
elm->last = next; /* last returned number */
elm->last_valid = true;

error: patch failed: src/backend/commands/sequence.c:994
error: src/backend/commands/sequence.c: patch does not apply

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: New committer: Jacob Champion

2025-04-13 Thread Kashif Zeeshan
Congrats Jacob.

On Mon, Apr 14, 2025 at 8:32 AM Ashutosh Bapat 
wrote:

> Hearty congratulations Jacob.
>
> On Mon, Apr 14, 2025 at 6:55 AM Richard Guo 
> wrote:
> >
> > On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz 
> wrote:
> > > The Core Team would like to extend our congratulations to Jacob
> > > Champion, who has accepted an invitation to become our newest
> PostgreSQL
> > > committer.
> > >
> > > Please join us in wishing Jacob much success and few reverts!
> >
> > Congratulations Jacob!  Well deserved!
> >
> > Thanks
> > Richard
> >
> >
>
>
> --
> Best Wishes,
> Ashutosh Bapat
>
>
>


Re: New committer: Jacob Champion

2025-04-13 Thread Ashutosh Sharma
Many Congratulations, Jacob.

On Mon, Apr 14, 2025 at 11:00 AM Kashif Zeeshan  wrote:
>
> Congrats Jacob.
>
> On Mon, Apr 14, 2025 at 8:32 AM Ashutosh Bapat  
> wrote:
>>
>> Hearty congratulations Jacob.
>>
>> On Mon, Apr 14, 2025 at 6:55 AM Richard Guo  wrote:
>> >
>> > On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz  
>> > wrote:
>> > > The Core Team would like to extend our congratulations to Jacob
>> > > Champion, who has accepted an invitation to become our newest PostgreSQL
>> > > committer.
>> > >
>> > > Please join us in wishing Jacob much success and few reverts!
>> >
>> > Congratulations Jacob!  Well deserved!
>> >
>> > Thanks
>> > Richard
>> >
>> >
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>>
>>




Fix bug with accessing to temporary tables of other sessions

2025-04-13 Thread Daniil Davydov
Hi,

During the previous commitfest this topic was already discussed in the
"Forbid to DROP temp tables of other sessions" thread [1].
Unfortunately its name doesn't reflect the real problem, so I decided
to start a new thread, as David G. Johnston advised.

Here are the summary results of the discussion [1]:
The superuser is only allowed to DROP temporary relations of other
sessions; other commands (like SELECT, INSERT, UPDATE, DELETE, ...)
must be forbidden. The error message for this case will look like
this: `could not access temporary relations of other sessions`.
For now, the superuser can still perform such operations because of a
bug in the code that mistakenly recognizes another session's temp
table as permanent (I've covered this topic in more detail in [2]).
The attached patch fixes this bug (targeted at
b51f86e49a7f119004c0ce5d0be89cdf98309141).

Open issue:
Not everyone liked the idea that a table's persistence can be assigned
to it during makeRangeVarXXX calls (inside gram.y).
My opinion: since the "pg_temp_" prefix is reserved by the postgres
kernel, we can definitely say that we have encountered a temporary
table when we see this prefix.

I will be glad to hear your opinion.
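
To illustrate the intended behavior (names hypothetical; pg_temp_3
standing in for another backend's temp schema), as superuser:

SELECT * FROM pg_temp_3.tmp_tbl;
-- ERROR:  could not access temporary relations of other sessions
DROP TABLE pg_temp_3.tmp_tbl;   -- still allowed; DROP is the exception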

--
Best regards,
Daniil Davydov

[1] 
https://www.postgresql.org/message-id/CAJDiXgj72Axj0d4ojKdRWG_rnkfs4uWY414NL%3D15sCvh7-9rwg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAJDiXgj%2B5UKLWSUT5605rJhuw438NmEKecvhFAF2nnrMsgGK3w%40mail.gmail.com
From c1415e457edd8e1cf38fd78ac55c93dbd8617d55 Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Mon, 14 Apr 2025 11:01:56 +0700
Subject: [PATCH v5] Fix accessing other sessions temp tables

---
 src/backend/catalog/namespace.c  | 56 
 src/backend/commands/tablecmds.c |  3 +-
 src/backend/nodes/makefuncs.c|  6 +++-
 src/backend/parser/gram.y| 11 ++-
 src/include/catalog/namespace.h  |  2 ++
 5 files changed, 55 insertions(+), 23 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index d97d632a7ef..f407efd9447 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -499,28 +499,44 @@ RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode,
 		 */
 		if (relation->relpersistence == RELPERSISTENCE_TEMP)
 		{
-			if (!OidIsValid(myTempNamespace))
-relId = InvalidOid; /* this probably can't happen? */
-			else
-			{
-if (relation->schemaname)
-{
-	Oid			namespaceId;
+			Oid	namespaceId;
 
-	namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok);
+			if (relation->schemaname)
+			{
+namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok);
 
+/*
+ * If the user has specified an existing temporary schema
+ * owned by another user.
+ */
+if (OidIsValid(namespaceId) && namespaceId != myTempNamespace)
+{
 	/*
-	 * For missing_ok, allow a non-existent schema name to
-	 * return InvalidOid.
+	 * We don't allow users to access temp tables of other
+	 * sessions except for the case of dropping tables.
 	 */
-	if (namespaceId != myTempNamespace)
+	if (!(flags & RVR_OTHER_TEMP_OK))
 		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("temporary tables cannot specify a schema name")));
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("could not access temporary relations of other sessions")));
 }
+			}
+			else
+			{
+namespaceId = myTempNamespace;
 
-relId = get_relname_relid(relation->relname, myTempNamespace);
+/*
+ * If this table was recognized as temporary, it means that we
+ * found it because backend's temporary namespace was specified
+ * in search_path. Thus, MyTempNamespace must contain valid oid.
+ */
+Assert(OidIsValid(namespaceId));
 			}
+
+			if (missing_ok && !OidIsValid(namespaceId))
+relId = InvalidOid;
+			else
+relId = get_relname_relid(relation->relname, namespaceId);
 		}
 		else if (relation->schemaname)
 		{
@@ -3553,21 +3569,19 @@ get_namespace_oid(const char *nspname, bool missing_ok)
 RangeVar *
 makeRangeVarFromNameList(const List *names)
 {
-	RangeVar   *rel = makeRangeVar(NULL, NULL, -1);
+	RangeVar   *rel;
 
 	switch (list_length(names))
 	{
 		case 1:
-			rel->relname = strVal(linitial(names));
+			rel = makeRangeVar(NULL, strVal(linitial(names)), -1);
 			break;
 		case 2:
-			rel->schemaname = strVal(linitial(names));
-			rel->relname = strVal(lsecond(names));
+			rel = makeRangeVar(strVal(linitial(names)), strVal(lsecond(names)), -1);
 			break;
 		case 3:
+			rel = makeRangeVar(strVal(lsecond(names)), strVal(lthird(names)), -1);
 			rel->catalogname = strVal(linitial(names));
-			rel->schemaname = strVal(lsecond(names));
-			rel->relname = strVal(lthird(names));
 			break;
 		default:
 			ereport(ERROR,
@@ -3774,6 +3788,8 @@ GetTempNamespaceProcNumber(Oid namespaceId)
 		return INVALID_PROC_NUMBER

Re: New committer: Jacob Champion

2025-04-13 Thread vignesh C
On Sat, 12 Apr 2025 at 01:56, Jonathan S. Katz  wrote:
>
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!

Many Congratulations, Jacob

Regards,
Vignesh




Re: Silence resource leaks alerts

2025-04-13 Thread Ranier Vilela
On Fri, Apr 11, 2025 at 08:27, Ranier Vilela wrote:

> Thanks Michael, for looking at this.
>
>
> On Fri, Apr 11, 2025 at 02:09, Michael Paquier wrote:
>
>> On Thu, Apr 10, 2025 at 03:10:02PM -0300, Ranier Vilela wrote:
>> > While it is arguable that this is a false warning, there is a benefit in
>> > moving the initialization of the string buffer, silencing the warnings
>> that
>> > are presented in this case.
>> >
>> > 1. pg_overexplain.c
>> > 2. ruleutils.c
>>
>> These code paths are far from being critical and the two ones in
>> ruleutils.c are older, even if it is a practice that had better be
>> discouraged particularly as initStringInfo() can allocate some memory
>> for nothing.  So it could bloat the current memory context if these
>> code paths are repeatedly taken.
>>
> Yeah, it's a bit annoying to do unnecessary work.
> Plus there's a small gain from delaying memory allocation until it is
> actually needed.
>
Attached is a new example of moving the stringinfo creation, per Coverity.

best regards,
Ranier Vilela


postpone_string_buffer_creation_dblink.patch
Description: Binary data


Re: Logical Replication of sequences

2025-04-13 Thread Peter Smith
Hi Vignesh,

Here are some review comments for patch v20250325-0001.

==
src/test/regress/expected/sequence.out

1.
+SELECT last_value, log_cnt, is_called  FROM
pg_sequence_state('sequence_test');
+ last_value | log_cnt | is_called
+------------+---------+-----------
+         99 |      32 | t
+(1 row)
+

I think 32 may seem like a surprising value to anybody reading these
results. Perhaps it would help to add a comment to this .sql test
explaining why this is the expected value.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Horribly slow pg_upgrade performance with many Large Objects

2025-04-13 Thread Hannu Krosing
And in case there *is* an ACL present, each user mentioned in the ACL
adds more overhead.

Also, the separate GRANT calls cause bloat, as the
pg_largeobject_metadata row gets updated for each ALTER ... OWNER or GRANT.

The following is for 10 million LOs with 1 and 3 users being GRANTed
SELECT on each object (with no grants, the pg_restore run took 10
minutes).

Nr of GRANTS | pg_dump time | pg_restore time
-------------+--------------+----------------
           0 |       0m 10s |        10m  5s
           1 |       0m 17s |        15m  3s
           3 |       0m 21s |        27m 15s

NB! - I left out the --verbose flag from pg_dump as used by
pg_upgrade, as it will emit one line per LO dumped

## 1 GRANT / LO

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb10m.dump -p 5433 lodb10m
real0m17.022s
user0m2.956s
sys 0m1.453s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m.dump
real15m3.136s
user0m28.991s
sys 2m54.164s

## 3 GRANTs / LO

Make a sample LO with 3 grants:

ALTER LARGE OBJECT 1 OWNER TO "hannuk";
GRANT SELECT ON LARGE OBJECT 1 TO "bob";
GRANT SELECT ON LARGE OBJECT 1 TO "joe";
GRANT SELECT ON LARGE OBJECT 1 TO "tom";
lodb10m=# select * from pg_shdepend where objid = 1;
┌───────┬─────────┬───────┬──────────┬────────────┬──────────┬─────────┐
│ dbid  │ classid │ objid │ objsubid │ refclassid │ refobjid │ deptype │
├───────┼─────────┼───────┼──────────┼────────────┼──────────┼─────────┤
│ 16406 │    2613 │     1 │        0 │       1260 │    16384 │ o       │
│ 16406 │    2613 │     1 │        0 │       1260 │    16393 │ a       │
│ 16406 │    2613 │     1 │        0 │       1260 │    16394 │ a       │
│ 16406 │    2613 │     1 │        0 │       1260 │    16395 │ a       │
└───────┴─────────┴───────┴──────────┴────────────┴──────────┴─────────┘

lodb10m=# select * from pg_largeobject_metadata;
┌─────┬──────────┬───────────────────────────────────────────────────────────┐
│ oid │ lomowner │                          lomacl                           │
├─────┼──────────┼───────────────────────────────────────────────────────────┤
│   1 │    16384 │ {hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk} │
└─────┴──────────┴───────────────────────────────────────────────────────────┘

Make the remaining 10M-1 LOs:

lodb10m=# insert into pg_largeobject_metadata(oid, lomowner, lomacl)
SELECT i, 16384,
'{hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk}' FROM
generate_series(2, 10_000_000) g(i);
INSERT 0 9999999
Time: 18859.341 ms (00:18.859)

And add their pg_shdepend entries:

lodb10m=# WITH refdeps (robj, rdeptype)
AS ( VALUES
  (16384, 'o'),
  (16393, 'a'),
  (16394, 'a'),
  (16395, 'a')
)
INSERT INTO pg_shdepend SELECT 16396, 2613, i, 0, 1260, robj, rdeptype
  FROM generate_series(2, 10_000_000) g(i)
 , refdeps
;
INSERT 0 39999996
Time: 116697.342 ms (01:56.697)

Time pg_upgrade's pg_dump and pg_restore:

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb10m-3grants.dump -p 5433 lodb10m
real0m21.519s
user0m2.951s
sys 0m1.723s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb10m
lodb10m-3grants.dump
real27m15.372s
user0m45.157s
sys 4m57.513s

On Fri, Apr 11, 2025 at 10:11 PM Nathan Bossart
 wrote:
>
> On Tue, Apr 08, 2025 at 12:22:00PM -0500, Nathan Bossart wrote:
> > On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote:
> >> Nathan Bossart  writes:
> >>> I do think it's worth considering going back to copying
> >>> pg_largobject_metadata's files for upgrades from v16 and newer.
> >>
> >> (If we do this) I don't see why we'd need to stop at v16.  I'm
> >> envisioning that we'd use COPY, which will be dealing in the
> >> text representation of aclitems, and I don't think that's changed
> >> in a long time.  The sort of thing that would break it is changes
> >> in the set of available/default privilege bits for large objects.
> >
> > I was thinking of actually reverting commit 12a53c7 for upgrades from v16,
> > which AFAICT is the last release where any relevant storage formats changed
> > (aclitem changed in v16).  But if COPY gets us pretty close to that and is
> > less likely to be disrupted by future changes, it could be a better
> > long-term approach.
> >
> >> That is, where the dump currently contains something like
> >>
> >> SELECT pg_catalog.lo_create('2121');
> >> ALTER LARGE OBJECT 2121 OWNER TO postgres;
> >> GRANT ALL ON LARGE OBJECT 2121 TO joe;
> >>
> >> we'd have
> >>
> >> COPY pg_largeobject_metadata FROM STDIN;
> >> ...
> >> 2121 10  {postgres=rw/postgres,joe=rw/postgres}
> >> ...
> >>
> >> and some appropriate COPY data for pg_shdepend too.
>
> I did some more research here.  For many large objects without ACLs to
> dump, I noticed that the vast majority o

Performance issues with v18 SQL-language-function changes

2025-04-13 Thread Tom Lane
In the thread that led up to commit 0dca5d68d [1], we'd convinced
ourselves that the new implementation was faster than the old.
So I was sad to discover that there are common cases where it's
a good bit slower.  We'd focused too much on test methods like

do $$
begin
  for i in 1..1000 loop
perform fx((random()*100)::int);
  end loop;
end;
$$;

The thing about this test case is that the SQL function under
test is executed only once per calling query (i.e., per PERFORM).
That doesn't allow the old implementation to amortize anything.
If you instead test cases like

create function fx(p_summa bigint) returns text immutable strict
return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999'));

explain analyze select fx(i) from generate_series(1,100) as i(i);

you arrive at the rude discovery that 0dca5d68d is about 50% slower
than 0dca5d68d^, because the old implementation builds a plan for fx()
only once and then re-uses it throughout the query.  So does the new
implementation, but it has added GetCachedPlan overhead.  Moreover,
I made the unforced error of deciding that we could tear down and
rebuild the SQLFunctionCache and subsidiary data structures for
each call.  That overhead is relatively minor in comparison to the
cost of parsing and planning the function; but when comparing cases
where there's no repeated parsing and planning, it becomes
significant.

Hence, the attached patch series reverts that decision and goes back
to the old method of having the SQLFunctionCache and some associated
objects live as long as the FmgrInfo does (without, however, the
poorly-controlled memory leaks of the old code).  In my testing
this gets us from a 50% penalty down to about 5%, which I think is
acceptable given the other benefits 0dca5d68d brought us.

I'm inclined to argue that, seeing that 0dca5d68d was mainly intended
as a performance feature, this performance loss is a bug that we
need to do something about even though we're post-feature-freeze.
We could either revert 0dca5d68d or apply the attached.  I'd prefer
the latter.

(There are other things we could do to try to reduce the overhead
further, such as trying to not build a Tuplestore or JunkFilter in
simple cases.  But that seems like new work not a fix for a bad
decision in existing work, so I think it's out of scope for v18.)

Comments?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/8216639.NyiUUSuA9g%40aivenlaptop

From c87a37d00b847e86e5286c9c46668410019b0043 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Sun, 13 Apr 2025 13:37:46 -0400
Subject: [PATCH v1 1/5] Minor performance improvement for SQL-language
 functions.

Late in the development of commit 0dca5d68d, I added a step to copy
the result tlist we extract from the cached final query, because
I was afraid that that might not last as long as the JunkFilter that
we're passing it off to.  However, that turns out to cost a noticeable
number of cycles, and it's really quite unnecessary because the
JunkFilter will not examine that tlist after it's been created.
(ExecFindJunkAttribute would use it, but we don't use that function
on this JunkFilter.)  Hence, remove the copy step.  For safety,
reset the might-become-dangling jf_targetList pointer to NIL.

In passing, remove DR_sqlfunction.cxt, which we don't use anymore;
it's confusing because it's not entirely clear which context it
ought to point at.
---
 src/backend/executor/functions.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 53ff614d87b..d3f05c7d2c7 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -45,7 +45,6 @@ typedef struct
 {
 	DestReceiver pub;			/* publicly-known function pointers */
 	Tuplestorestate *tstore;	/* where to put result tuples */
-	MemoryContext cxt;			/* context containing tstore */
 	JunkFilter *filter;			/* filter to convert tuple type */
 } DR_sqlfunction;
 
@@ -787,12 +786,6 @@ init_execution_state(SQLFunctionCachePtr fcache)
 		 */
 		resulttlist = get_sql_fn_result_tlist(plansource->query_list);
 
-		/*
-		 * We need to make a copy to ensure that it doesn't disappear
-		 * underneath us due to plancache invalidation.
-		 */
-		resulttlist = copyObject(resulttlist);
-
 		/*
 		 * If the result is composite, *and* we are returning the whole tuple
 		 * result, we need to insert nulls for any dropped columns.  In the
@@ -807,6 +800,17 @@ init_execution_state(SQLFunctionCachePtr fcache)
 			  slot);
 		else
 			fcache->junkFilter = ExecInitJunkFilter(resulttlist, slot);
+
+		/*
+		 * The resulttlist tree belongs to the plancache and might disappear
+		 * underneath us due to plancache invalidation.  While we could
+		 * forestall that by copying it, that'd just be a waste of cycles,
+		 * because the junkfilter doesn't need it anymore.  (It'd only be used
+		 * by ExecFindJunkAttribute(), which we 

Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)

2025-04-13 Thread Andrew Dunstan



On 2025-04-13 Su 1:51 PM, Andres Freund wrote:
> Hi,
>
> On April 13, 2025 7:27:33 PM GMT+02:00, Wolfgang Walther
>  wrote:
>> Andrew Dunstan:
>>> On 2025-04-12 Sa 10:10 PM, Noah Misch wrote:
>>>> On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote:
>>>>> With injection points enabled, I get the following errors in test_aio:
>>>>>
>>>>> [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is
>>>>> reported: expected stderr
>>>>> [15:14:45.409](0.000s)
>>>>> [15:14:45.409](0.000s) #   Failed test 'worker: first hard IO error is
>>>>> reported: expected stderr'
>>>>> #   at t/001_aio.pl line 810.
>>>>> [15:14:45.409](0.000s) # 'psql::88: ERROR:  could
>>>>> not read blocks 2..2 in file "base/5/16408": I/O error'
>>>>> # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file
>>>>> \"base/.*\": Input/output error)'
>>>>>
>>>>> It seems like it's just the error message that is different and has
>>>>> "I/O" instead of "Input/output"?
>>>>
>>>> Looks like it.
>>>>
>>>>> On a more general note, does enabling injection points make any sense
>>>>> here?
>>>>
>>>> Yes, it does.
>>>>
>>>>> I see that coverage in the build farm is not very big. IIUC, those are
>>>>> a development tool, so might not be relevant, because nobody is
>>>>> developing on Alpine / musl?
>>>>
>>>> No, whether anyone develops on the platform is not a factor. One hasn't
>>>> fully tested PostgreSQL until one builds with injection points.
>>>
>>> Here's a simple fix ... also removes some unnecessary escaping and
>>> leaning toothpick syndrome.
>>
>> Confirmed - this works!
>
> Thanks for testing and writing up a fix. Andrew, would you be ok applying
> it? I've been traveling the last 24h and should probably not be handling
> sharp commit bits tonight.
>
> I'm not too surprised about failures like this; when writing the tests I
> was worried about different formulations. But after seeing FreeBSD, glibc
> Linux, NetBSD, OpenBSD, and Windows all working the same I thought we were
> in the clear.

pushed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





pg_dump --if-exists --clean when drop index that is partition of a partitioned index

2025-04-13 Thread jian he
hi.

CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
CREATE TABLE tp_1(c int, a int, b int);
ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
CREATE INDEX t_a_idx ON tp_1(a);
CREATE INDEX tp_a_idx ON tp(a);

pg_dump  --schema=public --if-exists --clean --no-statistics
--no-owner --no-data --table-and-children=tp > 1.sql
pg_dump output file 1.sql excerpts:

DROP INDEX IF EXISTS public.t_a_idx;
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_1;
DROP TABLE IF EXISTS public.tp;

if you execute the
DROP INDEX IF EXISTS public.t_a_idx;

ERROR:  cannot drop index t_a_idx because index tp_a_idx requires it
HINT:  You can drop index tp_a_idx instead.

Is this pg_dump output what we expected?


SELECT * FROM pg_partition_tree('tp_a_idx');
  relid   | parentrelid | isleaf | level
----------+-------------+--------+-------
 tp_a_idx |             | f      |     0
 t_a_idx  | tp_a_idx    | t      |     1
(2 rows)




Re: datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE

2025-04-13 Thread Noah Misch
On Sun, Apr 06, 2025 at 11:00:54AM -0700, Noah Misch wrote:
> I pushed that as commit 8e7e672 (2024-10-25).  I now think DELAY_CHKPT_START
> is superfluous here, per this proc.h comment:
> 
>  * (In the
>  * extremely common case where the data being modified is in shared buffers
>  * and we acquire an exclusive content lock on the relevant buffers before
>  * writing WAL, this mechanism is not needed, because phase 2 will block
>  * until we release the content lock and then flush the modified data to
>  * disk.)
> 
> heap_inplace_update_and_unlock() meets those conditions.  Its closest
> precedent is XLogSaveBufferForHint(), which does need DELAY_CHKPT_START due to
> having only BUFFER_LOCK_SHARED.

True so far, but ...

> heap_inplace_update_and_unlock() has
> BUFFER_LOCK_EXCLUSIVE, so it doesn't need DELAY_CHKPT_START.

... not so, because we've not yet done MarkBufferDirty().  transam/README
cites SyncOneBuffer(), which says:

 * Check whether buffer needs writing.
 *
 * We can make this check without taking the buffer content lock so long
 * as we mark pages dirty in access methods *before* logging changes with
 * XLogInsert(): if someone marks the buffer dirty just after our check we
 * don't worry because our checkpoint.redo points before log record for
 * upcoming changes and so we are not required to write such dirty buffer.

The attached patch updates the aforementioned proc.h comment and the
heap_inplace_update_and_unlock() comment that my last message proposed.
From: Noah Misch 

Comment on need to MarkBufferDirty() if omitting DELAY_CHKPT_START.

Blocking checkpoint phase 2 requires MarkBufferDirty() and
BUFFER_LOCK_EXCLUSIVE; neither suffices by itself.  transam/README documents
this, citing SyncOneBuffer().  Update the DELAY_CHKPT_START documentation to
say this.  Expand the heap_inplace_update_and_unlock() comment that cites
XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock()
could have opted not to use DELAY_CHKPT_START.

Commit 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 added DELAY_CHKPT_START to
heap_inplace_update_and_unlock().  Since commit
bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches,
no back-patch.

Discussion: https://postgr.es/m/20250406180054.26.nmi...@google.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed2e302..c1a4de1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6507,9 +6507,17 @@ heap_inplace_update_and_unlock(Relation relation,
 * [crash]
 * [recovery restores datfrozenxid w/o relfrozenxid]
 *
-	 * Like in MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(), copy
-	 * the buffer to the stack before logging.  Here, that facilitates a FPI
-	 * of the post-mutation block before we accept other sessions seeing it.
+	 * Mimic MarkBufferDirtyHint() subroutine XLogSaveBufferForHint().
+	 * Specifically, use DELAY_CHKPT_START, and copy the buffer to the stack.
+	 * The stack copy facilitates a FPI of the post-mutation block before we
+	 * accept other sessions seeing it.  DELAY_CHKPT_START allows us to
+	 * XLogInsert() before MarkBufferDirty().  Since XLogSaveBufferForHint()
+	 * can operate under BUFFER_LOCK_SHARED, it can't avoid DELAY_CHKPT_START.
+	 * This function, however, likely could avoid it with the following order
+	 * of operations: MarkBufferDirty(), XLogInsert(), memcpy().  Opt to use
+	 * DELAY_CHKPT_START here, too, as a way to have fewer distinct code
+	 * patterns to analyze.  Inplace update isn't so frequent that it should
+	 * pursue the small optimization of skipping DELAY_CHKPT_START.
 */
Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
START_CRIT_SECTION();
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index f51b03d..86c5f99 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -110,10 +110,10 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
  * is inserted prior to the new redo point, the corresponding data changes will
  * also be flushed to disk before the checkpoint can complete. (In the
  * extremely common case where the data being modified is in shared buffers
- * and we acquire an exclusive content lock on the relevant buffers before
- * writing WAL, this mechanism is not needed, because phase 2 will block
- * until we release the content lock and then flush the modified data to
- * disk.)
+ * and we acquire an exclusive content lock and MarkBufferDirty() on the
+ * relevant buffers before writing WAL, this mechanism is not needed, because
+ * phase 2 will block until we release the content lock and then flush the
+ * modified data to disk.  See transam/README and SyncOneBuffer().)
  *
  * Setting DELAY_CHKPT_COMPLETE prevents the system from moving fr

Re: Fix replica identity checks for MERGE command on published table.

2025-04-13 Thread Ashutosh Bapat
On Fri, Apr 11, 2025 at 10:54 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Hi,
>
> While testing publication DDLs, I noticed an unexpected behavior where the
> MERGE command can be executed on tables lacking replica identity keys,
> regardless of whether they are part of a publication that publishes updates 
> and
> deletes.
>
> Replication and application of the updates and deletes generated by MERGE
> command require replica identity keys in the WAL record, which are essential
> for the apply worker on the subscriber to find local tuples for updating or
> deletion. Furthermore, publications require specific columns to be part of the
> replica identity key if the table specifies a publication row filter or column
> list.
>
> We already have restrictions on executing UPDATE and DELETE commands for 
> tables
> without replica identity keys under similar conditions. So, I think the same
> restriction should apply to the MERGE command as well.
>

+-- fail - missing REPLICA IDENTITY
+MERGE INTO testpub_merge_no_ri USING testpub_merge_pk s ON s.a >= 1
+ WHEN MATCHED THEN UPDATE SET b = s.b;
+ERROR:  cannot update table "testpub_merge_no_ri" because it does not have a replica identity and publishes updates
+HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
+-- fail - missing REPLICA IDENTITY
+MERGE INTO testpub_merge_no_ri USING testpub_merge_pk s ON s.a >= 1
+ WHEN MATCHED THEN DELETE;
+ERROR:  cannot delete from table "testpub_merge_no_ri" because it does not have a replica identity and publishes deletes
+HINT:  To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

I was wondering whether we should mention MERGE somewhere in the error
message like "cannot merge into table ...". But the error message is
reported depending upon the actual operation being performed and
whether it's being published by the publication, so mentioning
specific operations is better than mentioning just MERGE. So I think
the current error message is ok; and it will help point out the
operations that caused it.
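
For illustration, a hedged sketch of one possible shape of the check;
CheckCmdReplicaIdentity() is the real function already used for UPDATE and
DELETE, while the call site, mergeActionList, and targetRel are assumptions
of mine, not the actual patch:

ListCell   *lc;

foreach(lc, mergeActionList)
{
	MergeAction *action = lfirst_node(MergeAction, lc);

	switch (action->commandType)
	{
		case CMD_UPDATE:
		case CMD_DELETE:
			/* raises the per-operation errors shown above */
			CheckCmdReplicaIdentity(targetRel, action->commandType);
			break;
		default:
			/* INSERT and DO NOTHING need no replica identity key */
			break;
	}
}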

But that opens up another question: some merge operations (on the same
table) will go through and some won't if the publication publishes
only some of the operations. I am wondering, albeit quite late after
the feature has sailed, whether MERGE should be considered a separate
operation as far as publication is concerned. This topic may have been
discussed either when MERGE was implemented or when operation filters
were implemented. Sorry for the noise in that case.

This isn't a detailed review.

--
Best Wishes,
Ashutosh Bapat




Re: Adding facility for injection points (or probe points?) for more advanced tests

2025-04-13 Thread Michael Paquier
On Wed, Feb 05, 2025 at 09:19:22AM +0900, Michael Paquier wrote:
> On Mon, Feb 03, 2025 at 09:30:33PM -0800, Jeff Davis wrote:
>> One extra benefit of supporting arguments is that it would be a more
>> flexible way to change the local state around the injection point.
>> Right now the only way is by using IS_INJECTION_POINT_ATTACHED(), which
>> doesn't permit callback-defined conditions, etc.
> 
> Agreed.  The line I'm drawing here (mentioned upthread as well) is
> that any changes done in the core backend for injection_point.c should
> have one or more use cases in the tree.

As a matter of fact, this argument is now void on HEAD as of
93bc3d75d8e1 (addition of test_aio) and more specifically da7226993fd4
(core AIO infra).  See pgaio_inj_io_get() that is used in an injection
point callback to retrieve a specific value that had better be given
as an argument of the callback at runtime.  There is also
pgaio_inj_cur_handle at the top of aio.c with a comment on the matter.
So the AIO testing is using a set of hacks to work around the current
limitations.
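
To sketch what the cleanup could look like (INJECTION_POINT_ARG and the
extra callback argument are assumptions from this thread, not committed
APIs):

/* today: value smuggled through a global so the callback can see it */
pgaio_inj_cur_handle = ioh;
INJECTION_POINT("aio-process-completion-before-shared");
pgaio_inj_cur_handle = NULL;

/* with runtime arguments, the global and pgaio_inj_io_get() go away */
INJECTION_POINT_ARG("aio-process-completion-before-shared", ioh);

/* ...and the test_aio callback gets the in-flight handle directly */
static void
inj_io_short_read(const char *name, const void *private_data, void *arg)
{
	PgAioHandle *ioh = (PgAioHandle *) arg;

	/* adjust ioh's result here to simulate a short read */
}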

IMO, we ought to clean up the way the AIO code does its injection point
tests with something like the patch of this thread.  And perhaps even do
it in v18 to have the code in a cleaner state at release.  I'll start a
new thread after hacking my way through that.
The core injection point patch still needs a bit of work compared to
what was sent previously.
--
Michael


signature.asc
Description: PGP signature


Rename injection point names in test_aio

2025-04-13 Thread Michael Paquier
Hi all,
(well, Andres)

93bc3d75d8e1 has introduced a couple of new injection points, but
these don't follow the existing convention where points are named
more-or-less-like-that.  Please find attached a patch to make all
these more consistent.

Thoughts or comments?
--
Michael
diff --git a/src/backend/storage/aio/aio.c b/src/backend/storage/aio/aio.c
index 86f7250b7a5f..64e8e81e1849 100644
--- a/src/backend/storage/aio/aio.c
+++ b/src/backend/storage/aio/aio.c
@@ -507,7 +507,7 @@ pgaio_io_process_completion(PgAioHandle *ioh, int result)
 
 	pgaio_io_update_state(ioh, PGAIO_HS_COMPLETED_IO);
 
-	pgaio_io_call_inj(ioh, "AIO_PROCESS_COMPLETION_BEFORE_SHARED");
+	pgaio_io_call_inj(ioh, "aio-process-completion-before-shared");
 
 	pgaio_io_call_complete_shared(ioh);
 
diff --git a/src/backend/storage/aio/method_worker.c b/src/backend/storage/aio/method_worker.c
index 8ad17ec1ef73..0fde2a5b30da 100644
--- a/src/backend/storage/aio/method_worker.c
+++ b/src/backend/storage/aio/method_worker.c
@@ -525,7 +525,7 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
 			 * To be able to exercise the reopen-fails path, allow injection
 			 * points to trigger a failure at this point.
 			 */
-			pgaio_io_call_inj(ioh, "AIO_WORKER_AFTER_REOPEN");
+			pgaio_io_call_inj(ioh, "aio-worker-after-reopen");
 
 			error_errno = 0;
 			error_ioh = NULL;
diff --git a/src/test/modules/test_aio/test_aio.c b/src/test/modules/test_aio/test_aio.c
index 1d776010ef41..7745244b0e24 100644
--- a/src/test/modules/test_aio/test_aio.c
+++ b/src/test/modules/test_aio/test_aio.c
@@ -86,19 +86,19 @@ test_aio_shmem_startup(void)
 		inj_io_error_state->enabled_reopen = false;
 
 #ifdef USE_INJECTION_POINTS
-		InjectionPointAttach("AIO_PROCESS_COMPLETION_BEFORE_SHARED",
+		InjectionPointAttach("aio-process-completion-before-shared",
 			 "test_aio",
 			 "inj_io_short_read",
 			 NULL,
 			 0);
-		InjectionPointLoad("AIO_PROCESS_COMPLETION_BEFORE_SHARED");
+		InjectionPointLoad("aio-process-completion-before-shared");
 
-		InjectionPointAttach("AIO_WORKER_AFTER_REOPEN",
+		InjectionPointAttach("aio-worker-after-reopen",
 			 "test_aio",
 			 "inj_io_reopen",
 			 NULL,
 			 0);
-		InjectionPointLoad("AIO_WORKER_AFTER_REOPEN");
+		InjectionPointLoad("aio-worker-after-reopen");
 
 #endif
 	}
@@ -109,8 +109,8 @@ test_aio_shmem_startup(void)
 		 * critical section.
 		 */
 #ifdef USE_INJECTION_POINTS
-		InjectionPointLoad("AIO_PROCESS_COMPLETION_BEFORE_SHARED");
-		InjectionPointLoad("AIO_WORKER_AFTER_REOPEN");
+		InjectionPointLoad("aio-process-completion-before-shared");
+		InjectionPointLoad("aio-worker-after-reopen");
 		elog(LOG, "injection point loaded");
 #endif
 	}


signature.asc
Description: PGP signature


Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup

2025-04-13 Thread Jelte Fennema-Nio

On Fri Apr 4, 2025 at 7:34 PM CEST, Heikki Linnakangas wrote:
> Let's move that 'in_restore_command' business to the caller. It's weird
> modularity for the function to implicitly behave differently for some
> callers.


I definitely agree with the sentiment, and it was what I originally
planned to do. But then I went for this approach anyway because commit
8fb13dd6ab5b explicitly moved all code except for the actual call to
system() out of the PreRestoreCommand()/PostRestoreCommand() section
(which is also described in the code comment). 


So I didn't move the in_restore_command stuff to the caller, and
improved the function comment to call out this unfortunate coupling.

> And 'wait_event_info' should only affect pgstat reporting, not
> actual behavior.


Given that we need to keep the restore command stuff in this function, I
think the only other option is to add a dedicated argument for the
restore command stuff, like "bool is_restore_command". But that would
require every caller, except for the restore command, to pass an
additional "false" as an argument. To me the additionaly noise that that
adds seems like a worse issue than the non-purity we get by
piggy-backing on the wait_event_info argument.

> I don't feel good about the function name. How about pg_system() or
> something?


Done 

> postmaster/startup.c also seems like a weird place for it;
> not sure where it belongs but probably not there. Maybe next to
> OpenPipeStream() in fd.c, or move both to a new file.


Moved it to fd.c

> Looks a bit funny that both functions are called Restore().
> Not sure what to suggest instead though. Maybe RaiseOpenFileLimit() and
> LowerOpenFileLimit().


Changed them to UseOriginalOpenFileLimit() and
UseOriginalOpenFileLimit()

> What would it take to re-implement popen() with fork+exec? Genuine
> question; I have a feeling it might be complicated to do correctly on
> all platforms, but I don't know.


I initially attempted to re-implement it, but after looking at the
fairly complex FreeBSD implementation of popen[1] that suddenly seemed
more trouble than it's worth.

[1]: 
https://github.com/freebsd/freebsd-src/blob/c98367641991019bac0e8cd55b70682171820534/lib/libc/gen/popen.c#L63-L181
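
For the record, a minimal sketch of the read side (my own, untested); even
this much hints at the parts that make a full reimplementation painful --
libc's version additionally tracks all open pipes so pclose() waits on the
right child, closes inherited pipe FDs in the child, and papers over
platform differences:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static FILE *
popen_read(const char *command, pid_t *child)
{
	int			fds[2];
	pid_t		pid;

	if (pipe(fds) < 0)
		return NULL;

	pid = fork();
	if (pid < 0)
	{
		close(fds[0]);
		close(fds[1]);
		return NULL;
	}
	if (pid == 0)
	{
		/* child: make the pipe's write end stdout, then exec the shell */
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		close(fds[1]);
		execl("/bin/sh", "sh", "-c", command, (char *) NULL);
		_exit(127);				/* exec failed */
	}

	/* parent: keep only the read end */
	close(fds[1]);
	*child = pid;
	return fdopen(fds[0], "r");
}

static int
pclose_read(FILE *fp, pid_t child)
{
	int			status;

	fclose(fp);
	if (waitpid(child, &status, 0) < 0)
		return -1;
	return status;
}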
From 5dbab2ccf0d74313dbc2cbaeb418346de8cc2f48 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Sun, 23 Feb 2025 16:52:29 +0100
Subject: [PATCH v8 1/3] Adds a helper for places that shell out to system()

We need to call system() in a few places, and to do so safely we need
some pre/post-fork logic. This encapsulates that logic into a single
System helper function. The main reason to do this is that in a follow
up commit we want to add even more logic pre and post-fork.
---
 src/backend/access/transam/xlogarchive.c | 29 ++--
 src/backend/archive/shell_archive.c  |  5 +--
 src/backend/postmaster/startup.c |  1 +
 src/backend/storage/file/fd.c| 42 
 src/include/storage/fd.h |  1 +
 5 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 1ef1713c91a..c7640ec5025 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -158,27 +158,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
-
-	/*
-	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
-	 * that it should proc_exit() right away.  This is done for the duration
-	 * of the system() call because there isn't a good way to break out while
-	 * it is executing.  Since we might call proc_exit() in a signal handler,
-	 * it is best to put any additional logic before or after the
-	 * PreRestoreCommand()/PostRestoreCommand() section.
-	 */
-	PreRestoreCommand();
-
-	/*
-	 * Copy xlog from archival storage to XLOGDIR
-	 */
-	rc = system(xlogRestoreCmd);
-
-	PostRestoreCommand();
-
-	pgstat_report_wait_end();
+	/* Copy xlog from archival storage to XLOGDIR */
+	rc = pg_system(xlogRestoreCmd, WAIT_EVENT_RESTORE_COMMAND);
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
@@ -325,11 +306,7 @@ ExecuteRecoveryCommand(const char *command, const char *commandName,
 	/*
 	 * execute the constructed command
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(wait_event_info);
-	rc = system(xlogRecoveryCmd);
-	pgstat_report_wait_end();
-
+	rc = pg_system(xlogRecoveryCmd, wait_event_info);
 	pfree(xlogRecoveryCmd);
 
 	if (rc != 0)
diff --git a/src/backend/archive/shell_archive.c b/src/backend/archive/shell_archive.c
index 828723afe47..64c2c6aa760 100644
--- a/src/backend/archive/shell_archive.c
+++ b/src/backend/archive/shell_archive.c
@@ -75,10 +75,7 @@ shell_archive_file(ArchiveModuleState *state, const char *file,
 			(errmsg_internal("executing archive command \"%s\"",
 			 xlogarchcmd)));
 
-	fflu

Fix a resource leak (src/backend/utils/adt/rowtypes.c)

2025-04-13 Thread Ranier Vilela
Hi.

Per Coverity.

CID 1608916: (#1 of 1): Resource leak (RESOURCE_LEAK)
52. leaked_storage: Variable buf going out of scope leaks the storage
buf.data points to.

The function *record_in* has a new report about a resource leak.
I think Coverity is right.
The normal path of the function frees the memory of several variables used,
so the failure path should also free them.
A quick search on the web shows several occurrences of "malformed record
literal", so this failure path is commonly reached in practice.

Although Coverity reports the leak of only buf.data, the variables *values*
and *nulls* should also be released.

While there, move the creation of the string data, to ensure that in case
of failure, the buf.data variable is released correctly.
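
To illustrate the pattern, a hedged sketch of my own (not the attached
patch itself): free the locals before bailing out on a malformed literal,
mirroring what the success path already does. The names match record_in()'s
locals; the exact placement is an assumption.

if (*ptr++ != '(')
{
	/* release everything the success path would have released */
	pfree(buf.data);
	pfree(values);
	pfree(nulls);
	ereturn(escontext, (Datum) 0,
			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
			 errmsg("malformed record literal: \"%s\"", string),
			 errdetail("Missing left parenthesis.")));
}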

Attached a patch.

best regards,
Ranier Vilela


fix-resource-leak-rowtypes.patch
Description: Binary data


Re: New committer: Jacob Champion

2025-04-13 Thread Ashutosh Bapat
Hearty congratulations Jacob.

On Mon, Apr 14, 2025 at 6:55 AM Richard Guo  wrote:
>
> On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz  wrote:
> > The Core Team would like to extend our congratulations to Jacob
> > Champion, who has accepted an invitation to become our newest PostgreSQL
> > committer.
> >
> > Please join us in wishing Jacob much success and few reverts!
>
> Congratulations Jacob!  Well deserved!
>
> Thanks
> Richard
>
>


-- 
Best Wishes,
Ashutosh Bapat




Re: Logical Replication of sequences

2025-04-13 Thread Peter Smith
Hi Vignesh,

Some review comments for patch v20250325-0002

==
Commit message

1.
Furthermore, enhancements to psql commands (\d and \dRp) now allow for better
display of publications containing specific sequences or sequences included
in a publication.

~

That doesn't seem as clear as it might be. Also, IIUC the "sequences
included in a publication" is not actually implemented yet -- there is
only the "all sequences" flag.

SUGGESTION
Furthermore, enhancements to psql commands now display which
publications contain the specified sequence (\d command), and if a
specified publication includes all sequences (\dRp command)

==
doc/src/sgml/ref/create_publication.sgml

2.
   
To add a table to a publication, the invoking user must have ownership
-   rights on the table.  The FOR ALL TABLES and
-   FOR TABLES IN SCHEMA clauses require the invoking
+   rights on the table.  The FOR TABLES IN SCHEMA,
+   FOR ALL TABLES and
+   FOR ALL SEQUENCES clauses require the invoking
user to be a superuser.

IMO these should all be using  SGML markup same as elsewhere
on this page, not  markup.

==
src/backend/commands/publicationcmds.c

3.
if (!superuser_arg(newOwnerId))
{
  if (form->puballtables)
ereport(ERROR,
errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of publication \"%s\"",
   NameStr(form->pubname)),
errhint("The owner of a FOR ALL TABLES publication must be
a superuser."));
  if (form->puballsequences)
ereport(ERROR,
errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of publication \"%s\"",
   NameStr(form->pubname)),
errhint("The owner of a FOR ALL SEQUENCES publication must
be a superuser."));
  if (is_schema_publication(form->oid))
ereport(ERROR,
errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of publication \"%s\"",
   NameStr(form->pubname)),
errhint("The owner of a FOR TABLES IN SCHEMA publication
must be a superuser."));
}

I wondered if there's too much duplicated code here. Maybe it's better
to share a common ereport?

SUGGESTION

if (!superuser_arg(newOwnerId))
{
  char *hint_msg = NULL;

  if (form->puballtables)
hint_msg = _("The owner of a FOR ALL TABLES publication must be a
superuser.");
  else if (form->puballsequences)
hint_msg = _("The owner of a FOR ALL SEQUENCES publication must be
a superuser.");
  else if (is_schema_publication(form->oid))
hint_msg = _("The owner of a FOR TABLES IN SCHEMA publication must
be a superuser.");
  if (hint_msg)
ereport(ERROR,
errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of publication \"%s\"",
   NameStr(form->pubname)),
errhint("%s", hint_msg));
}

==
src/bin/psql/describe.c

describeOneTableDetails:

4.
+ res = PSQLexec(buf.data);
+ if (!res)
+ goto error_return;
+
+ numrows = PQntuples(res);
+

Isn't this same code already done a few lines above in the same
function? Maybe I misread something.

==
src/test/regress/sql/publication.sql

5.
+-- check that describe sequence lists all publications the sequence belongs to

Might be clearer to say:  "lists both" instead of "lists all"

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Proposal: Adding compression of temporary files

2025-04-13 Thread Dmitry Dolgov
> On Fri, Mar 28, 2025 at 09:23:13AM GMT, Filip Janus wrote:
> > +else
> > +if (nbytesOriginal - file->pos != 0)
> > +/* curOffset must be corrected also if compression is
> > + * enabled, nbytes was changed by compression but we
> > + * have to use the original value of nbytes
> > + */
> > +file->curOffset-=bytestowrite;
> >
> > It's not something introduced by the compression patch - the first part
> > is what we used to do before. But I find it a bit confusing - isn't it
> > mixing the correction of "logical file position" adjustment we did
> > before, and also the adjustment possibly needed due to compression?
> >
> > In fact, isn't it going to fail if the code gets multiple loops in
> >
> > while (wpos < file->nbytes)
> > {
> >...
> > }
> >
> > because bytestowrite will be the value from the last loop? I haven't
> > tried, but I guess writing wide tuples (more than 8k) might fail.
> >
>
> I will definitely test it with larger tuples than 8K.
>
> Maybe I don't understand it correctly,
> the adjustment is performed in the case that file->nbytes and file->pos
> differ.
> So it must persist also if we are working with the compressed data, but the
> problem is that data stored and compressed on disk has different sizes than
> data incoming uncompressed ones, so what should be the correction value.
> By debugging, I realized that the correction should correspond to the
> size of bytestowrite from the last iteration of the loop.

I agree, this looks strange. If the idea is to set curOffset to its
original value + pos, and the original value was advanced multiple times
by bytestowrite, it seems incorrect to adjust it by bytestowrite only
once. From what I see, the current tests do not exercise a case where the
while loop runs multiple times, so it merely appears to work.
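
To make that concrete, a hedged sketch of what I'd expect instead (variable
names from BufFileDumpBuffer, untested): accumulate everything the loop
wrote, so the correction does not depend on the iteration count.

int		total_written = 0;

while (wpos < file->nbytes)
{
	/* ... FileWrite() of up to bytestowrite bytes, error checks ... */
	wpos += bytestowrite;
	total_written += bytestowrite;
}

if (nbytesOriginal - file->pos != 0)
{
	/* the compressed size on disk differs from the logical buffer size */
	file->curOffset -= total_written;	/* not just the last bytestowrite */
}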

At the same time maybe I'm missing something, but how exactly such test
for 8k tuples and multiple loops in the while block should look like?
E.g. when I force a hash join on a table with a single wide text column,
the minimal tuple that is getting written to the temporary file still
has rather small length, I assume due to toasting. Is there some other
way to achieve that?




Re: Things I don't like about \du's "Attributes" column

2025-04-13 Thread Pavel Luzanov

On 11.04.2025 20:09, David G. Johnston wrote:
> However, I do think we are at something committable, though I'd make
> one, maybe two, changes to v8.


Thank you for supporting this patch.

> Valid until -> Password valid until: the timestamp value already
> forces a wide column, adding the word Password to the header to
> clarify what is valid simply provides the same context that the create
> role page provides when it shows the valid until attribute immediately
> below the password attribute. Leaving "valid until" alone retains the
> attribute name tieback.


> Connection limit -> Con. limit: maybe this gets rejected on
> translation grounds but the abbreviation here seems obvious and shaves
> 7 characters off the mandatory width for a column that occupies 12
> characters more than the values require.


Yes, it can be done.


I still have some doubts about these columns. Their values probably
change rarely, not very often. So, maybe a better solution is to show
them only in \du+. In that case the width of the columns is not a
problem. But such a change requires turning the "Attributes" column back
into one line.


> Even without those changes I as reviewer would concur with the
> proposal and try to move this on from bike-shedding to a go/no-go
> decision (by marking it Ready to Commit) as to whether this is an
> improvement over the status quo.


Since the patch requires a rebase and implementation of your suggestions,
I decided to move the entry to the next (July) commitfest and set the
status to "Waiting on Author". I hope that I will return to this work in
June. Robert's idea (a detailed role permissions report with \du rolename)
can be implemented as separate work.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com


Call for Posters: PGConf.dev 2025

2025-04-13 Thread Andrey Borodin
Dear pgsql-hackers,

As we prepare for the PostgreSQL Development Conference 2025 (PGConf.dev 2025), 
scheduled for May 13–16 in Montreal, Canada, we are excited to host a poster 
session showcasing the contributions and projects within our community.

If you’re planning to attend PGConf.dev 2025, your participation would be an 
invaluable addition, and we’d love to showcase your work. 

I've contacted several authors of patches that have been rescheduled multiple
times in the CommitFests, but then decided to make a broader call for posters.

The goal of the poster session is to visually present your patch or project on 
an A2-sized poster (ANSI C, 17" × 22"). We encourage you to include:
/*
 * A visual representation of your patch or project.
 * The conference logo and year.
 * Information on what attention or collaboration your project requires.
 * Contact details for potential collaborators to reach you or your team.
 * (E.g. a QR code to your patch or project, email, or messenger)
 */

If you're unsure where to start, I'll be happy to send you my own poster soon
(created in PowerPoint) as a starting point. It would also be greatly
appreciated if your poster could serve as an example for others in their work.

We hope to print and bring all the posters to the conference, but if you prefer 
to handle your poster's printing and transportation, it would certainly be 
appreciated. Posters will be displayed in dedicated conference spaces and 
removed after the event, and you are welcome to take them home if you wish.

If you want to read about typical poster sessions, you can refer to this wiki
page [0]. However, there are some differences: our poster session will be
somewhat experimental. The main focus is on improving collaboration:
attracting coauthors, reviewers, maybe committers, improving the overall
discussion, etc. Let me know if you would like to present your poster or have
any questions.

Thank you for your ongoing contributions to PostgreSQL, and I look forward to 
hearing from you.

Best regards,
Andrey Borodin

[0] https://en.wikipedia.org/wiki/Poster_session



Re: [PATCH] PGSERVICEFILE as part of a normal connection string

2025-04-13 Thread Ryo Kanbayashi
On Mon, Apr 7, 2025 at 1:10 PM Michael Paquier  wrote:
>
> On Thu, Apr 03, 2025 at 12:36:59AM +0900, Ryo Kanbayashi wrote:
> > I'll reflect your notice and suggestion to the patch current I'm
> > working on :)
>
> Thanks for that.
>
> And I have forgotten to add you as a reviewer of what has been
> committed as 2c7bd2ba507e.  Sorry for that :/

No problem :)

I rebased our patch according to  2c7bd2ba507e.
https://commitfest.postgresql.org/patch/5387/
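
For anyone following along, a minimal usage sketch; the "servicefile"
keyword spelling is taken from the patch title and is an assumption until
the patch lands:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* service file named directly in the connection string instead of
	 * via the PGSERVICEFILE environment variable */
	PGconn	   *conn = PQconnectdb(
		"servicefile=/home/app/pg_service.conf service=mydb");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}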

---
Great regards,
Ryo Kanbayashi


v7-0001-add-servicefile-connection-option-feature.patch
Description: Binary data


Re: remove unnecessary explicit type conversion (to char) for appendStringInfoChar function calls

2025-04-13 Thread Andrew Dunstan



On 2025-04-13 Su 8:12 AM, Álvaro Herrera wrote:
> On 2025-Apr-12, Andrew Dunstan wrote:
>
>> Seems odd that this one is necessary at all. Doesn't a StringInfo always
>> have a trailing null byte?
>
> AFAICT what this is doing that's significant, is incrementing StringInfo->len.
> Before doing it, the NUL byte is not part of the message; afterwards it
> is.

That makes sense, but then it would be nice to have the accompanying
comment a bit clearer.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)

2025-04-13 Thread Andrew Dunstan


On 2025-04-12 Sa 10:10 PM, Noah Misch wrote:
> On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote:
>> With injection points enabled, I get the following errors in test_aio:
>>
>> [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is reported:
>> expected stderr
>> [15:14:45.409](0.000s)
>> [15:14:45.409](0.000s) #   Failed test 'worker: first hard IO error is
>> reported: expected stderr'
>> #   at t/001_aio.pl line 810.
>> [15:14:45.409](0.000s) #   'psql::88: ERROR:  could
>> not read blocks 2..2 in file "base/5/16408": I/O error'
>> # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file
>> \"base/.*\": Input/output error)'
>>
>> It seems like it's just the error message that is different and has "I/O"
>> instead of "Input/output"?
>
> Looks like it.
>
>> On a more general note, does enabling injection points make any sense here?
>
> Yes, it does.
>
>> I see that coverage in the build farm is not very big. IIUC, those are a
>> development tool, so might not be relevant, because nobody is developing on
>> Alpine / musl?
>
> No, whether anyone develops on the platform is not a factor.  One hasn't fully
> tested PostgreSQL until one builds with injection points.




Here's a simple fix ... also removes some unnecessary escaping and 
leaning toothpick syndrome.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/modules/test_aio/t/001_aio.pl b/src/test/modules/test_aio/t/001_aio.pl
index c136d8ee8f5..ef4e5247e5b 100644
--- a/src/test/modules/test_aio/t/001_aio.pl
+++ b/src/test/modules/test_aio/t/001_aio.pl
@@ -813,7 +813,7 @@ SELECT invalidate_rel_block('tbl_ok', 2);
 		"first hard IO error is reported",
 		qq(SELECT count(*) FROM tbl_ok),
 		qr/^$/,
-		qr/ERROR:.*could not read blocks 2\.\.2 in file \"base\/.*\": Input\/output error/
+		qr!ERROR:.*could not read blocks 2\.\.2 in file "base/.*": (?:I/O|Input/output) error!
 	);
 
 	psql_like(
@@ -822,7 +822,7 @@ SELECT invalidate_rel_block('tbl_ok', 2);
 		"second hard IO error is reported",
 		qq(SELECT count(*) FROM tbl_ok),
 		qr/^$/,
-		qr/ERROR:.*could not read blocks 2\.\.2 in file \"base\/.*\": Input\/output error/
+		qr!ERROR:.*could not read blocks 2\.\.2 in file "base/.*": (?:I/O|Input/output) error!
 	);
 
 	$psql->query_safe(qq(SELECT inj_io_short_read_detach()));


Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

2025-04-13 Thread Noah Misch
On Sat, Apr 05, 2025 at 07:09:58PM -0700, Noah Misch wrote:
> On Sun, Apr 06, 2025 at 07:42:02AM +0900, Michael Paquier wrote:
> > On Sat, Apr 05, 2025 at 12:13:39PM -0700, Noah Misch wrote:
> > > Since the 2025-02 releases made non-toy-size archive recoveries fail 
> > > easily,
> > > that's not enough.  If the proposed 3-second test is the wrong thing, what
> > > instead?
> > 
> > I don't have a good idea about that in ~16, TBH, but I am sure to not
> > be a fan of the low reproducibility rate of this test as proposed.
> > It's not perfect, but as the design to fix the original race condition
> > has been introduced in v15, why not begin with a test in 17~ using
> > some injection points?
> 
> Two reasons:
> 
> a) The fix ended calls to the whole range of relevant code.  Hence, the
>injection point placement that would have been relevant before the fix
>isn't reached.  In other words, there's no right place for the injection
>point.  (The place for the injection point would be in durable_rename(), in
>the checkpointer.  After the fix, the checkpointer just doesn't call
>durable_rename().)
> 
> b) Stochastic tests catch defects beyond the specific one the test author
>targeted.  An injection point test is less likely to do that.  (That said,
>with reason (a), there's no known injection point test design to compete
>with the stochastic design.)

Tom and Michael, do you still object to the test addition, or not?  If there
are no new or renewed objections by 2025-04-20, I'll proceed to add the test.

As another data point, raising the runtime from 3s to 17s makes it reproduce
the problem 25% of the time.  You can imagine a plot with axes of runtime and
percent detection.  One can pick any point on that plot's curve.  Given how
little wall time it takes for the buildfarm and CI to reach a few hundred
runs, I like the trade-off of 3s runtime and 1% detection.  In particular, I
like it better than 17s runtime for 25% detection.  How do you see it?
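
For a back-of-envelope view of that curve (my arithmetic, using the rates
above): with per-run detection probability p, the chance of at least one
detection in n runs is 1 - (1 - p)^n.  At p = 0.01 (3s runtime) that
reaches about 95% after 300 runs; at p = 0.25 (17s runtime) it reaches
about 95% after roughly 11 runs.  Either way the farm gets there quickly,
but the 3s variant charges each individual run far less.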




Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

2025-04-13 Thread Tom Lane
Noah Misch  writes:
> Tom and Michael, do you still object to the test addition, or not?  If there
> are no new or renewed objections by 2025-04-20, I'll proceed to add the test.

While I still don't love it, I don't have a better proposal.

regards, tom lane




Re: TOAST versus toast

2025-04-13 Thread wenhui qiu
Hi,
I think this point is of no significance at all. Besides, this is a
document that has been around for over ten years, and everyone has become
accustomed to this kind of expression. Changing it now would be effort
spent with nothing to show for it.


On Sat, 12 Apr 2025 at 10:31, David G. Johnston 
wrote:

> On Sun, Mar 16, 2025 at 8:33 PM Peter Smith  wrote:
>
>> Thanks for your suggestions. At this point option (1) is looking most
>> attractive. Probably, I will just withdraw the CF entry soon unless
>> there is some new interest. Just chipping away fixing a few places
>> isn't going to achieve the consistency this thread was aiming for.
>>
>>
> I've moved this back to waiting on author pending a final decision.
> Interested parties might still chime in but it doesn't seem like it is
> actively looking for reviewers at this point.
>
> David J.
>
>


Re: CSN snapshots in hot standby

2025-04-13 Thread 贾明伟
Hi all,

Apologies — the patch I sent earlier did not appear as expected in the
mailing list archives because of a wrong attachment style.

I'll resend it properly as an inline patch shortly.

Thanks for your understanding!

Best regards,  
Mingwei Jia



v7-0013-use-clog-in-XidInMVCCSnapshot.patch
Description: Binary data


Re: remove unnecessary explicit type conversion (to char) for appendStringInfoChar function calls

2025-04-13 Thread Álvaro Herrera
On 2025-Apr-12, Andrew Dunstan wrote:

> Seems odd that this one is necessary at all. Doesn't a StringInfo always
> have a trailing null byte?

AFAICT what this is doing that's significant, is incrementing StringInfo->len.
Before doing it, the NUL byte is not part of the message; afterwards it
is.
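
A minimal illustration of that point (my own example, not from any patch):

#include "postgres.h"
#include "lib/stringinfo.h"

static void
nul_demo(void)
{
	StringInfoData buf;

	initStringInfo(&buf);
	appendStringInfoString(&buf, "hello");
	/* buf.data[buf.len] is already '\0', but buf.len == 5: the
	 * terminator is not counted as part of the message */
	appendStringInfoChar(&buf, '\0');
	/* now buf.len == 6, so the NUL byte travels with the message */
	pfree(buf.data);
}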

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."(Freeman Dyson)




pg_restore --format= option(without any value) should report an error as pg_dump is reporting an error

2025-04-13 Thread Mahendra Singh Thalor
Hi,
With "pg_restore --format=", we are not giving any error because in code,
we are checking length of arg but pg_dump is reporting an error for the
same option.

For the consistency purpose, pg_dump and pg_restore both should report an
error for the test case below.

*Ex: (output after this patch)but before this patch, below command is
passing.*
/pg_restore  x1 -d postgres -j 10 -C --verbose --format=
pg_restore: error: unrecognized archive format ""; please specify "c", "d",
or "t"

I am attaching a patch that fixes this. I also added 2 TAP tests for
invalid options.
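
For reference, a sketch of what I'd expect the fix to look like in
pg_restore.c's option handling (the attached patch may differ):

--- a/src/bin/pg_restore/pg_restore.c
+++ b/src/bin/pg_restore/pg_restore.c
 			case 'F':
-				if (strlen(optarg) != 0)
-					opts->formatName = pg_strdup(optarg);
+				/* always record the value, so the later format check
+				 * rejects "" instead of silently ignoring it */
+				opts->formatName = pg_strdup(optarg);
 				break;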

*Note:* The pg_restore code has 2 more options that are validated only when
the argument has non-zero length. I will prepare patches for both (--host
and --port); we need to add some validation for these options as well.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v01-pg_restore-format-option-should-validate-all-values.patch
Description: Binary data