Re: Conflict detection for update_deleted in logical replication

2025-05-24 Thread Amit Kapila
On Sat, May 24, 2025 at 3:58 PM Dilip Kumar  wrote:
>
> On Sat, May 24, 2025 at 11:00 AM Amit Kapila  wrote:
> >
> > On Sat, May 24, 2025 at 10:29 AM Dilip Kumar  wrote:
> > >
> > > On Sat, May 24, 2025 at 10:04 AM Dilip Kumar  
> > > wrote:
> > > >
> > > > On Fri, May 23, 2025 at 9:21 PM Xuneng Zhou  
> > > > wrote:
> > > > >
> > > > Looking at v31-0001 again, most of it looks fine except this logic of
> > > > getting the commit_ts after marking the transaction in commit.  I see
> > > > in RecordTransactionCommit(), we are setting this flag
> > > > (DELAY_CHKPT_IN_COMMIT) to put the transaction in commit state[1], and
> > > > after that we insert the commit log[2], but I noticed that there we
> > > > call GetCurrentTransactionStopTimestamp() for acquiring the commit-ts
> > > > and IIUC we want to ensure that commit-ts timestamp should be after we
> > > > set the transaction in commit with (DELAY_CHKPT_IN_COMMIT), but
> > > > the question is, is it guaranteed that the place where we call
> > > > GetCurrentTransactionStopTimestamp() will always give us the current
> > > > timestamp? Because if you see this function, it may return
> > > > 'xactStopTimestamp' as well if that is already set.  I am still
> > > > digging a bit more. Is there a possibility that 'xactStopTimestamp' is
> > > > already set during some interrupt handling when
> > > > GetCurrentTransactionStopTimestamp() is already called by
> > > > pgstat_report_stat(), or is it guaranteed that during
> > > > RecordTransactionCommit we will call this first time?
> > > >
> > > > If we have already ensured this, then I think adding a comment to
> > > > explain it would be really useful.
> > > >
> > ...
> > >
> > > IMHO, this should not be an issue as the only case where
> > > 'xactStopTimestamp' is set for the current process is from
> > > ProcessInterrupts->pgstat_report_stat->
> > > GetCurrentTransactionStopTimestamp, and this call sequence is only
> > > possible when transaction blockState is TBLOCK_DEFAULT.  And that is
> > > only set after RecordTransactionCommit() is called, so logically,
> > > RecordTransactionCommit() should always be the first one to set the
> > > 'xactStopTimestamp'.  But I still think this is a candidate for
> > > comments, or even better, if somehow it can be ensured by some
> > > assertion, maybe by passing a parameter in
> > > GetCurrentTransactionStopTimestamp() that if this is called from
> > > RecordTransactionCommit() then 'xactStopTimestamp' must not already be
> > > set.
> > >
> >
> > We can add an assertion as you are suggesting, but I feel that adding
> > a parameter for this purpose looks slightly odd.
>
>
> Yeah, that's true. Another option is to add an assert as
> Assert(xactStopTimestamp == 0) right before calling
> XactLogCommitRecord()?  With that, we don't need to pass an extra
> parameter, and since we are in a critical section, this process can
> not be interrupted; so once we have ensured that 'xactStopTimestamp'
> is 0 before calling the API, it cannot change afterwards.  And we can
> add a comment atop this assertion.
>

This sounds reasonable to me. Let us see what others think.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Extending FK check skipping on replicas to ADD FK and TRUNCATE

2025-05-24 Thread Hannu Krosing
I would also argue for treating this as a bug and back-porting to all
supported versions - a quick look at v13 seems to confirm that the
wrapped code has not changed at least since then.

I don't think we can claim the current state is Working As Intended
unless someone stands up and says they really intended it to work this
way :)

On Sat, May 24, 2025 at 3:47 PM Hannu Krosing  wrote:
>
> Hello Everybody,
>
> Currently setting `session_replication_role` to `replica` disables
> foreign key checks allowing, among other things, free table copy order
> and faster CDC apply in logical replication.
>
> But two other cases of foreign keys are still restricted or blocked
> even with this setting
>
>
> 1. Foreign Key checks during `ALTER TABLE  ADD FOREIGN KEY
> () REFERENCES ();`
>
> These can be really, Really, REALLY slow when the PK table is huge
> (hundreds of billions of rows). And here I mean taking-tens-of-days
> slow in extreme cases.
>
> And they are also completely unnecessary when the table data comes
> from a known good source.
>
>
> 2. `TRUNCATE ` is blocked when there are any foreign
> keys referencing the 
>
> But you still can mess up your database in exactly the same way by
> running `DELETE FROM `, just a little (or a lot)
> slower.
>
>
> The attached patch fixes this, allowing both cases to work when `SET
> session_replication_role=replica;` is in force:
>
> ### Test for `ALTER TABLE  ADD FOREIGN KEY () REFERENCES
> ();`
>
> CREATE TABLE pkt(id int PRIMARY KEY);
> CREATE TABLE fkt(fk int);
> INSERT INTO fkt VALUES(1);
>
> ALTER TABLE fkt ADD FOREIGN KEY (fk) REFERENCES pkt(id); -- fails
>
> SET session_replication_role=replica;
> ALTER TABLE fkt ADD FOREIGN KEY (fk) REFERENCES pkt(id); -- now succeeds
>
> ### Test for `TRUNCATE `
>
> CREATE TABLE pkt(id int PRIMARY KEY);
> CREATE TABLE fkt(fk int REFERENCES pkt(id));
>
> TRUNCATE pkt; -- fails
>
> SET session_replication_role=replica;
> TRUNCATE pkt; -- now succeeds
>
>
>
> The patch just wraps two sections of code in
>
> if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) {
>...
> }
>
> and the rest is added indentation for the wrapped code.
>
> Plus I added tests, including the until-now-missing test that
> explicitly exercises the foreign key checks (which could be argued to
> be covered already by the generic trigger tests)
>
>
> I am not sure if we need to add anything to the current documentation[*]
> which says just this about foreign keys and
> `session_replication_role=replica`
>
> > Since foreign keys are implemented as triggers, setting this parameter to 
> > replica also disables all foreign key checks, which can leave data in an 
> > inconsistent state if improperly used.
>
> [*] 
> https://www.postgresql.org/docs/17/runtime-config-client.html#GUC-SESSION-REPLICATION-ROLE
>
> Any comments and suggestions are welcome




Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread David E. Wheeler
On May 23, 2025, at 13:52, Tom Lane  wrote:

>> I assume you mean that they’re set at initdb time, so there’s no mutability 
>> concern? 
> 
> Yeah, I think Peter's right and I'm wrong.  Obviously this ties into
> our philosophical debate about how immutable is immutable.  But as
> long as the functions only depend on locale settings that are fixed
> at database creation, I think it's okay to consider them immutable.
> 
> If you were, say, depending on LC_NUMERIC, it would clearly be unsafe
> to consider that immutable, so I'm not quite sure if this is the end
> of the discussion.  But for what's mentioned in the thread title,
> I think we only care about LC_CTYPE.

Oh, so maybe all this is moot, and Florents can go ahead and add support for 
the functions to the non-_tz functions?

Should there be some sort of inventory of what functions can be used in what 
contexts?

D





Re: Retiring some encodings?

2025-05-24 Thread DEVOPS_WwIT

Hi Michael


> Yeah, that's a good point.  I would also question what's the benefit
> in using GB18030 over UTF-8, though.  An obvious one I can see is
> because legacy applications never get updated.

The GB18030 encoding standard is a mandatory Chinese character encoding 
standard required by regulation: software sold and used in China must 
support GB18030, with its latest version being the 2023 edition. The 
primary advantage of GB18030 is that most Chinese characters require 
only 2 bytes of storage, whereas UTF-8 needs 3 bytes for the same 
characters, which makes GB18030 significantly more storage-efficient 
than UTF-8 for Chinese text.


Tony


Re: Non-reproducible AIO failure

2025-05-24 Thread Thomas Munro
On Sun, May 25, 2025 at 9:00 AM Alexander Lakhin  wrote:
> Hello Thomas,
> 24.05.2025 14:42, Thomas Munro wrote:
> > On Sat, May 24, 2025 at 3:17 PM Tom Lane  wrote:
> >> So it seems that "very low-probability issue in our Mac AIO code" is
> >> the most probable description.
> > There isn't any macOS-specific AIO code so my first guess would be
> > that it might be due to aarch64 weak memory reordering (though Andres
> > speculated that it should all be one backend, huh), if it's not just
> > a timing luck thing.  Alexander, were the other OSes you tried all on
> > x86?
>
> As I wrote off-list before, I had tried x86_64 only, but since then I
> tried to reproduce the issue on an aarch64 server with Ubuntu 24.04,
> running 10, then 40 instances of t/027_stream_regress.pl in parallel. I've
> also multiplied "test: brin ..." line x10. But the issue is still not
> reproduced (in 8+ hours).

Hmm.  And I see now that this really is all in one backend.  Could it
be some variation of the interrupt processing stuff from acad9093?

> However, I've managed to get an AIO-related assertion failure on macOS 14.5
...
> TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 
> 161, PID: 32355

Can you get a core and print *ioh in the debugger?




Re: Relstats after VACUUM FULL and CLUSTER

2025-05-24 Thread Sami Imseih
> If you subtract recently dead from that number within the heap 
> implementation, then it will no longer
> reflect non-removable tuples and the log message in the cluster
> function "found %.0f removable, %.0f nonremovable row versions" will no 
> longer be correct.

Yes, that's correct. I did not pay attention to the logging aspect.

Here is a test with and without the patch. While there is a long-running
serializable transaction in another session, running an update followed
by a normal vacuum sets the reltuples value correctly to 10. A follow-up
VACUUM FULL then sets it incorrectly to 14. The patch, which sets
the num_tuples after the logging, ensures that the logging is correct
and that pg_class.reltuples matches the actual number of live tuples.


-- without the patch

test=# UPDATE stats SET a=1 WHERE a > 6;
UPDATE 4
test=# SELECT reltuples FROM pg_class WHERE relname = 'stats';
 reltuples
---
-1
(1 row)

test=# vacuum verbose stats;
INFO:  vacuuming "test.public.stats"
INFO:  finished vacuuming "test.public.stats": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 14 remain, 4 are dead but not yet removable
removable cutoff: 789, which was 1 XIDs old when operation ended
new relfrozenxid: 788, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0
were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 9.159 MB/s
buffer usage: 14 hits, 0 reads, 3 dirtied
WAL usage: 4 records, 3 full page images, 24981 bytes, 0 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
test=# SELECT reltuples FROM pg_class WHERE relname = 'stats';
 reltuples
---
10
(1 row)

test=# VACUUM (verbose, FULL) stats;
INFO:  vacuuming "public.stats"
INFO:  "public.stats": found 0 removable, 14 nonremovable row versions
in 1 pages
DETAIL:  4 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
test=# SELECT reltuples FROM pg_class WHERE relname = 'stats';
 reltuples
---
14
(1 row)



-- with the patch

test=# UPDATE stats SET a=1 WHERE a > 6;
UPDATE 4
test=#
test=# SELECT reltuples FROM pg_class WHERE relname = 'stats';
 reltuples
---
-1
(1 row)

test=# VACUUM verbose stats;
INFO:  vacuuming "test.public.stats"
INFO:  finished vacuuming "test.public.stats": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 14 remain, 4 are dead but not yet removable
removable cutoff: 794, which was 1 XIDs old when operation ended
new relfrozenxid: 793, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0
were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 9.195 MB/s
buffer usage: 17 hits, 0 reads, 3 dirtied
WAL usage: 4 records, 3 full page images, 24981 bytes, 0 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
test=# SELECT reltuples FROM pg_class WHERE relname = 'stats';
 reltuples
---
10
(1 row)

test=# VACUUM (verbose, FULL) stats;
INFO:  vacuuming "public.stats"
INFO:  "public.stats": found 0 removable, 14 nonremovable row versions
in 1 pages
DETAIL:  4 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
test=# SELECT reltuples FROM pg_class WHERE relname = 'stats';
 reltuples
---
10
(1 row)


--
Sami




Re: Non-reproducible AIO failure

2025-05-24 Thread Tom Lane
Thomas Munro  writes:
> Can you get a core and print *ioh in the debugger?

So far, I've failed to get anything useful out of core files
from this failure.  The trace goes back no further than

(lldb) bt
* thread #1
  * frame #0: 0x00018de39388 libsystem_kernel.dylib`__pthread_kill + 8

That's quite odd in itself: while I don't find the debugging
environment on macOS to be the greatest, it's not normally
this unhelpful.

I have managed to reproduce the "aio_ret->result.status !=
PGAIO_RS_UNKNOWN" crash about five times in total on three different
Apple Silicon machines (an M1 mini, an M4 Pro mini, and an M4 Pro
laptop).  So there is definitely something wrong.  The failure rate
is pretty tiny for me, though I've not tried running several test
instances concurrently.

regards, tom lane




Re: Add pg_get_injection_points() for information of injection points

2025-05-24 Thread Rustam ALLAKOV
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:tested, failed

Hi Michael,
Thank you for the patch.

I suggest adding to the documentation
> if the server was built with  --enable-injection-points configuration;
> otherwise returns ...

to match how `unicode_version` function is documented
> Returns a string representing the version of Unicode used by ICU, if
> the server was built with ICU support; otherwise returns

Regards.
Rustam Allakov.

Re: mention unused_oids script in pg_proc.dat

2025-05-24 Thread Florents Tselai


> On 24 May 2025, at 12:24 PM, Álvaro Herrera  wrote:
> 
> On 2025-May-24, Florents Tselai wrote:
> 
>> I agree with having more READMEs around the src tree.
>> It’s true that a lot of doc content found in VII. Internals is dev-oriented,
>> but imho it should be closer to the src (and more succinct/less verbose too).
> 
> Maybe these READMEs can simply contain little more than links to the
> built HTML docs, to avoid having multiple sources of info.
> 
>> Looking at some of the text in 67.2.2. OID Assignment for example,
>> a few things look outdated wrt choosing OIDs.
>> 
>> One can look through the history of similar commits to see what
>> needs to be changed; So it’s not THAT bad.
> 
> Sure, outdated docs is something that happens.  Patches welcome.
> 

Something like this?



v1-catalog-readme.patch
Description: Binary data




Re: Conflict detection for update_deleted in logical replication

2025-05-24 Thread Dilip Kumar
On Sat, May 24, 2025 at 11:00 AM Amit Kapila  wrote:
>
> On Sat, May 24, 2025 at 10:29 AM Dilip Kumar  wrote:
> >
> > On Sat, May 24, 2025 at 10:04 AM Dilip Kumar  wrote:
> > >
> > > On Fri, May 23, 2025 at 9:21 PM Xuneng Zhou  wrote:
> > > >
> > > Looking at v31-0001 again, most of it looks fine except this logic of
> > > getting the commit_ts after marking the transaction in commit.  I see
> > > in RecordTransactionCommit(), we are setting this flag
> > > (DELAY_CHKPT_IN_COMMIT) to put the transaction in commit state[1], and
> > > after that we insert the commit log[2], but I noticed that there we
> > > call GetCurrentTransactionStopTimestamp() for acquiring the commit-ts
> > > and IIUC we want to ensure that commit-ts timestamp should be after we
> > > set the transaction in commit with (DELAY_CHKPT_IN_COMMIT), but
> > > the question is, is it guaranteed that the place where we call
> > > GetCurrentTransactionStopTimestamp() will always give us the current
> > > timestamp? Because if you see this function, it may return
> > > 'xactStopTimestamp' as well if that is already set.  I am still
> > > digging a bit more. Is there a possibility that 'xactStopTimestamp' is
> > > already set during some interrupt handling when
> > > GetCurrentTransactionStopTimestamp() is already called by
> > > pgstat_report_stat(), or is it guaranteed that during
> > > RecordTransactionCommit we will call this first time?
> > >
> > > If we have already ensured this, then I think adding a comment to
> > > explain it would be really useful.
> > >
> ...
> >
> > IMHO, this should not be an issue as the only case where
> > 'xactStopTimestamp' is set for the current process is from
> > ProcessInterrupts->pgstat_report_stat->
> > GetCurrentTransactionStopTimestamp, and this call sequence is only
> > possible when transaction blockState is TBLOCK_DEFAULT.  And that is
> > only set after RecordTransactionCommit() is called, so logically,
> > RecordTransactionCommit() should always be the first one to set the
> > 'xactStopTimestamp'.  But I still think this is a candidate for
> > comments, or even better, if somehow it can be ensured by some
> > assertion, maybe by passing a parameter in
> > GetCurrentTransactionStopTimestamp() that if this is called from
> > RecordTransactionCommit() then 'xactStopTimestamp' must not already be
> > set.
> >
>
> We can add an assertion as you are suggesting, but I feel that adding
> a parameter for this purpose looks slightly odd.


Yeah, that's true. Another option is to add an assert as
Assert(xactStopTimestamp == 0) right before calling
XactLogCommitRecord()?  With that, we don't need to pass an extra
parameter, and since we are in a critical section, this process can
not be interrupted; so once we have ensured that 'xactStopTimestamp'
is 0 before calling the API, it cannot change afterwards.  And we can
add a comment atop this assertion.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-24 Thread Alexander Korotkov
On Fri, May 23, 2025 at 12:10 AM Alexander Korotkov
 wrote:
>
> Hi, Vitaly!
>
> On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov  
> wrote:
> >
> > Thank you very much for the review!
> >
> > > The patchset doesn't seem to build after 371f2db8b0, which adjusted
> > > the signature of the INJECTION_POINT() macro.  Could you, please,
> > > update the patchset accordingly.
> >
> > I've updated the patch (see attached). Thanks.
> >
> > > I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
> > > before slots synchronization then use it for WAL truncation.
> > > Generally looks good.  But what about the "if
> > > (InvalidateObsoleteReplicationSlots(...))" branch?  It calls
> > > XLogGetReplicationSlotMinimumLSN() again.  Why would the value
> > > obtained from the latter call reflect slots as they are synchronized
> > > to the disk?
> >
> > In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the 
> > old
> > behaviour - this function was called in KeepLogSeg prior to my change. I 
> > also
> > call CheckPointReplicationSlots at the next line to save the invalidated and
> > other dirty slots on disk again to make sure, the new oldest LSN is in sync.
> >
> > The problem I tried to solve in this if-branch is to fix test
> > src/test/recovery/t/019_replslot_limit.pl, which failed because the WAL
> > was not truncated enough for the test to pass. In general, this branch
> > is not
> > necessary and we may fix the test by calling checkpoint twice (please, see 
> > the
> > alternative.rej patch for this case). If you think, we should incorporate 
> > this
> > new change, I'm ok to do it. But the WAL will be truncated more lazily.
> >
> > Furthermore, I think we can save slots on disk right after invalidation, 
> > not in
> > CheckPointGuts to avoid saving invalidated slots twice.
>
> Thank you for the clarification.  It's all good.  I just missed that
> CheckPointReplicationSlots() syncs slots inside the "if" branch.
>
> I've reordered the patchset.  Fix should come first, tests comes
> second.  So, tests pass after each commit.  Also I've joined both
> tests and injection points into single commit.  I don't see reason to
> place tests into src/test/modules, because there is no module.  I've
> moved them into src/test/recovery.
>
> I also improved some comments and commit messages.  I think 0001
> should go to all supported releases as it fixes material bug, while
> 0002 should be backpatched to 17, where injection points first appeared.
> 0003 should go to pg19 after branching.  I'm continuing to review
> this.

I spent more time on this.  The next revision is attached.  It
contains revised comments and other cosmetic changes.  I'm going to
backpatch 0001 to all supported branches, and 0002 to 17 where
injection points were introduced.

--
Regards,
Alexander Korotkov
Supabase


v5-0003-Remove-redundant-ReplicationSlotsComputeRequiredL.patch
Description: Binary data


v5-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
Description: Binary data


v5-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slo.patch
Description: Binary data


Re: I/O worker and ConfigReload

2025-05-24 Thread Thomas Munro
On Sun, May 25, 2025 at 1:56 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I've been rebasing the patch for online resizing of shared memory, and
> noticed something strange about IoWorkerMain: although it sets the
> handler SignalHandlerForConfigReload, it doesn't look like it acts upon
> ConfigReloadPending. From what I see it happens because it only does
> CHECK_FOR_INTERRUPTS in the main worker loop, which doesn't handle
> ConfigReloadPending.
>
> In the context of shared memory resizing patch it means I/O workers are
> not receiving the new value of NBuffers and crash. Adding something like
> pgaio_worker_process_interrupts to deal with ConfigReloadPending at the
> beginning of the main worker loop seems to solve the issue. But I
> haven't found any discussion about config reload in I/O workers, was
> this omission intentional?

You're right, and I noticed the same thing and fixed it in
0004-aio-Adjust-IO-worker-pool-size-automatically.patch in
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com.
(That patch also introduces a reason for the workers to care about
config reload.)  I'm not sure how that happened, i.e. the history of
adding/removing reasons to worry about reload; I don't remember, as this
code stewed for a long time, but it's at least inconsistent.  We
could fix it as you suggested or as shown in that patch.




I/O worker and ConfigReload

2025-05-24 Thread Dmitry Dolgov
Hi,

I've been rebasing the patch for online resizing of shared memory, and
noticed something strange about IoWorkerMain: although it sets the
handler SignalHandlerForConfigReload, it doesn't look like it acts upon
ConfigReloadPending. From what I see it happens because it only does
CHECK_FOR_INTERRUPTS in the main worker loop, which doesn't handle
ConfigReloadPending.

In the context of shared memory resizing patch it means I/O workers are
not receiving the new value of NBuffers and crash. Adding something like
pgaio_worker_process_interrupts to deal with ConfigReloadPending at the
beginning of the main worker loop seems to solve the issue. But I
haven't found any discussion about config reload in I/O workers, was
this omission intentional?




Re: pg18: Virtual generated columns are not (yet) safe when superuser selects from them

2025-05-24 Thread David G. Johnston
On Saturday, May 24, 2025, jian he  wrote:

> On Sat, May 24, 2025 at 2:39 PM Feike Steenbergen
>  wrote:
> >
> > The loophole is this:
> >
> > - the generated virtual column can use a user-defined function
> > - when running SELECT against that column by a superuser
> >   the function is called within the context of a superuser
> > - this in turn allows the regular user to run any code within
> >   the context of superuser
>
> sorry, I am not fully sure what this means.
> a minimum sql reproducer would be great.
>

This is the same complaint being made against “security invoker” triggers
existing/being the default.  Or the general risk in higher privileged users
running security invoker functions written by lesser privileged users.

The features conform to our existing security model design.  Discussions
are happening as pertains to that model and the OP should chime in there to
contribute to the overall position of the project and not relegate the
complaint to any one particular feature.

David J.


Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread Florents Tselai



> On 25 May 2025, at 1:01 AM, David E. Wheeler  wrote:
> 
> On May 24, 2025, at 17:55, David E. Wheeler  wrote:
> 
>> And now I see my patch broke the grammar because I left some of my fiddling 
>> in there. Apologies. Here’s an updated patch with the updated keyword map, 
>> too.
> 
> No, really :sigh:
> 
> D
> 
> 

The most important problem in jsonpath_scan.l now is the fact that I broke the 
alphabetical ordering of keywords in v2, and you followed that too.
and you followed that too.


> I'm curious why you added the `arg0` and `arg1` fields to the `method_args` 
> union. Is there some reason that the existing `left` and `right` fields 
> wouldn't work? 

The left-right approach ended up being more of a brain teaser to work with in 
jsonpath_exec.
Until these methods, opt_datetime_template was the only argument 
passed in existing jsonpath functions, 
so initially I used that as a template when adding to the scanner-parser infra,
but then realized it may make more sense to have a way to access indexed args.
IIRC, with an eye to the future I found indexed args much more convenient 
(less of a brain teaser) to work with.
I should have gone back and used them for *TRIM_P too.
But you may be onto something with the split_part thing.

> The existing string() method operates on a "JSON boolean, number, string, or 
> datetime"; should these functions also operate on all those data types?

You mean implicit conversion to string first?
I don’t think so: I’d expect it to work like ‘$…string().replace()…'

> I'm not sure how well these functions comply with the SQL spec.

The fact that Peter hasn’t raised this as an issue makes me think it's not one.





Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread David E. Wheeler
On May 24, 2025, at 17:46, David E. Wheeler  wrote:

> Oh, and I forgot to mention, src/backend/utils/adt/jsonpath_scan.l looks like 
> it has a ton of duplication in it. Shouldn’t it just add the new keywords, 
> something like:

And now I see my patch broke the grammar because I left some of my fiddling in 
there. Apologies. Here’s an updated patch with the updated keyword map, too.

Best,

David



v4-0001-Add-additional-jsonpath-string-methods.patch
Description: Binary data




Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread David E. Wheeler
On May 24, 2025, at 17:55, David E. Wheeler  wrote:

> And now I see my patch broke the grammar because I left some of my fiddling 
> in there. Apologies. Here’s an updated patch with the updated keyword map, 
> too.

No, really :sigh:

D



v5-0001-Add-additional-jsonpath-string-methods.patch
Description: Binary data




Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-24 Thread Alexander Korotkov
Hi, Amit!

Thank you for your attention to this patchset!

On Sat, May 24, 2025 at 2:15 PM Amit Kapila  wrote:
> On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov  
> wrote:
> >
> > I spent more time on this.  The next revision is attached.  It
> > contains revised comments and other cosmetic changes.  I'm going to
> > backpatch 0001 to all supported branches,
> >
>
> Is my understanding correct that we need 0001 because
> PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
> changing the slot's restart_lsn?

Yes.  Also, even if it did save the slot to disk, there would still be
a race condition: a concurrent checkpoint could use the updated value
from shared memory to clean old WAL segments, and then a crash could
happen before we managed to write the slot to disk.

> If so, shouldn't the comments (One
> could argue that the slot should be saved to disk now, but that'd be
> energy wasted - the worst thing lost information could cause here is
> to give wrong information in a statistics view) in
> PhysicalConfirmReceivedLocation() be changed to mention the risk of
> not saving the slot?
>
> Also, after 0001, even the same solution will be true for logical
> slots as well, where we are already careful to save the slot to disk
> if its restart_lsn is changed, see LogicalConfirmReceivedLocation().
> So, won't that effort be wasted? Even if we don't want to do anything
> about it (which doesn't sound like a good idea), we should note that
> in comments somewhere.

I have added the comments about both points in the attached revision
of the patchset.

--
Regards,
Alexander Korotkov
Supabase


v6-0003-Remove-redundant-ReplicationSlotsComputeRequiredL.patch
Description: Binary data


v6-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch
Description: Binary data


v6-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slo.patch
Description: Binary data


Re: Automatically sizing the IO worker pool

2025-05-24 Thread Dmitry Dolgov
> On Sun, Apr 13, 2025 at 04:59:54AM GMT, Thomas Munro wrote:
> It's hard to know how to set io_workers=3.  If it's too small,
> io_method=worker's small submission queue overflows and it silently
> falls back to synchronous IO.  If it's too high, it generates a lot of
> pointless wakeups and scheduling overhead, which might be considered
> an independent problem or not, but having the right size pool
> certainly mitigates it.  Here's a patch to replace that GUC with:
>
>   io_min_workers=1
>   io_max_workers=8
>   io_worker_idle_timeout=60s
>   io_worker_launch_interval=500ms
>
> It grows the pool when a backlog is detected (better ideas for this
> logic welcome), and lets idle workers time out.

I like the idea. In fact, I've been pondering about something like a
"smart" configuration for quite some time, and convinced that a similar
approach needs to be applied to many performance-related GUCs.

Idle timeout and launch interval serving as a measure of sensitivity
makes sense to me, but growing the pool when a backlog (queue_depth >
nworkers, so even the slightest backlog?) is detected seems somewhat
arbitrary. From what I understand the pool-growing velocity is constant
and does not depend on the worker demand (i.e. queue_depth)? It may sound
fancy, but I've got an impression it should be possible to apply what's
called a "low-pass filter" in control theory (sort of a transfer
function with an exponential decay) to smooth out the demand and adjust
the worker pool based on that.

As a side note, it might be far-fetched, but there are instruments in
queueing theory to figure out how many workers are needed to guarantee a
certain low queueing probability, but for that one needs to have an
average arrival rate (in our case, the average number of IO operations
dispatched to workers) and an average service rate (the average number of
IO operations performed by workers).
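The standard instrument here is the Erlang C formula for an M/M/c queue. A sketch of how it could size the pool, assuming the arrival and service rates were actually measured (the 5% queueing-probability target is an arbitrary example):

```python
import math

def erlang_c(c, a):
    """Erlang C: probability that an arriving request must queue, given
    c servers and offered load a = arrival_rate / service_rate (a < c)."""
    if a >= c:
        return 1.0  # unstable regime: the queue grows without bound
    top = (a ** c / math.factorial(c)) * (c / (c - a))
    bottom = sum(a ** k / math.factorial(k) for k in range(c)) + top
    return top / bottom

def workers_needed(arrival_rate, service_rate, max_queue_prob=0.05, cap=64):
    """Smallest pool size keeping the queueing probability below target."""
    a = arrival_rate / service_rate
    for c in range(max(1, math.ceil(a)), cap + 1):
        if erlang_c(c, a) < max_queue_prob:
            return c
    return cap
```

For example, 80 IOs/s dispatched to workers that each complete 100 IOs/s needs 4 workers to keep the chance of an IO having to queue under 5%.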




Re: Non-reproducible AIO failure

2025-05-24 Thread Alexander Lakhin

Hello Thomas,

24.05.2025 14:42, Thomas Munro wrote:

On Sat, May 24, 2025 at 3:17 PM Tom Lane  wrote:

So it seems that "very low-probability issue in our Mac AIO code" is
the most probable description.

There isn't any macOS-specific AIO code so my first guess would be
that it might be due to aarch64 weak memory reordering (though Andres
speculated that it should all be one backend, huh), if it's not just
a timing luck thing.  Alexander, were the other OSes you tried all on
x86?


As I wrote off-list before, I had tried x86_64 only, but since then I
tried to reproduce the issue on an aarch64 server with Ubuntu 24.04,
running 10, then 40 instances of t/027_stream_regress.pl in parallel. I've
also multiplied the "test: brin ..." line x10. But the issue is still not
reproduced (in 8+ hours).

However, I've managed to get an AIO-related assertion failure on macOS 14.5
(Darwin Kernel Version 23.5.0) quite easily, running 027_stream_regress.pl
the same way:
2   # +++ tap check in src/test/recovery_2 +++
3   # +++ tap check in src/test/recovery_3 +++
4   # +++ tap check in src/test/recovery_4 +++
5   # +++ tap check in src/test/recovery_5 +++
1   # +++ tap check in src/test/recovery_1 +++
7   # +++ tap check in src/test/recovery_7 +++
6   # +++ tap check in src/test/recovery_6 +++
8   # +++ tap check in src/test/recovery_8 +++
1   [16:07:31] t/027_stream_regress.pl ..
1   Dubious, test returned 29 (wstat 7424, 0x1d00)
1   All 1 subtests passed
1   [16:07:31]
1
1   Test Summary Report
1   ---
1   t/027_stream_regress.pl (Wstat: 7424 Tests: 1 Failed: 0)
1 Non-zero exit status: 29
1 Parse errors: No plan found in TAP output
1   Files=1, Tests=1, 197 wallclock secs ( 0.01 usr  0.01 sys + 4.47 cusr  5.79 csys = 10.28 CPU)
1   Result: FAIL

TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 161, PID: 32355
0   postgres    0x000104f078f4 ExceptionalCondition + 236
1   postgres    0x000104c0ebd4 pgaio_io_before_start + 260
2   postgres    0x000104c0ea94 pgaio_io_start_readv + 36
3   postgres    0x000104c2d4e8 FileStartReadV + 252
4   postgres    0x000104c807c8 mdstartreadv + 668
5   postgres    0x000104c83db0 smgrstartreadv + 116
6   postgres    0x000104c1675c AsyncReadBuffers + 1444
7   postgres    0x000104c15868 StartReadBuffersImpl + 1196
8   postgres    0x000104c153ac StartReadBuffers + 64
9   postgres    0x000104c129b4 read_stream_start_pending_read + 1204
10  postgres    0x000104c12018 read_stream_look_ahead + 812
11  postgres    0x000104c11cd0 read_stream_next_buffer + 1396
12  postgres    0x000104606c38 heap_fetch_next_buffer + 284
13  postgres    0x0001045f79d4 heapgettup_pagemode + 192
14  postgres    0x0001045f7fa4 heap_getnextslot + 84
15  postgres    0x00010497a404 table_scan_getnextslot + 340
16  postgres    0x00010497a210 SeqNext + 148
17  postgres    0x00010497a8e4 ExecScanFetch + 772
18  postgres    0x00010497a4b8 ExecScanExtended + 164
19  postgres    0x000104979bf0 ExecSeqScanWithProject + 244
20  postgres    0x000104953ce0 ExecProcNode + 68
21  postgres    0x00010494f148 MultiExecPrivateHash + 64
22  postgres    0x00010494ed48 MultiExecHash + 100
23  postgres    0x000104925a18 MultiExecProcNode + 180
24  postgres    0x000104957220 ExecHashJoinImpl + 628
25  postgres    0x0001049566dc ExecHashJoin + 32
26  postgres    0x000104925958 ExecProcNodeFirst + 100
27  postgres    0x00010491c08c ExecProcNode + 68
28  postgres    0x00010491692c ExecutePlan + 268
29  postgres    0x00010491679c standard_ExecutorRun + 504
30  pg_stat_statements.dylib    0x0001055e132c pgss_ExecutorRun + 212
31  postgres    0x000104916580 ExecutorRun + 72
32  postgres    0x00010491f59c ParallelQueryMain + 508
33  postgres    0x0001046968d4 ParallelWorkerMain + 1712
34  postgres    0x000104b476f4 BackgroundWorkerMain + 824
35  postgres    0x000104b4b87c postmaster_child_launch + 492
36  postgres

Re: I/O worker and ConfigReload

2025-05-24 Thread Dmitry Dolgov
> On Sun, May 25, 2025 at 02:15:12AM GMT, Thomas Munro wrote:
> On Sun, May 25, 2025 at 1:56 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > I've been rebasing the patch for online resizing of shared memory, and
> > noticed something strange about IoWorkerMain: although it sets the
> > handler SignalHandlerForConfigReload, it doesn't look like it acts upon
> > ConfigReloadPending. From what I see it happens because it only does
> > CHECK_FOR_INTERRUPTS in the main worker loop, which doesn't handle
> > ConfigReloadPending.
> >
> > In the context of shared memory resizing patch it means I/O workers are
> > not receiving the new value of NBuffers and crash. Adding something like
> > pgaio_worker_process_interrupts to deal with ConfigReloadPending at the
> > beginning of the main worker loop seems to solve the issue. But I
> > haven't found any discussion about config reload in I/O workers, was
> > this omission intentional?
>
> You're right, and I noticed the same thing and fixed it in
> 0004-aio-Adjust-IO-worker-pool-size-automatically.patch in
> https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Bm4xV0LMoH2c%3DoRAdEXuCnh%2BtGBTWa7uFeFMGgTLAw%2BQ%40mail.gmail.com.
> (That patch also introduces a reason for the workers to care about
> config reload.)  I'm not sure how that happened, i.e. the history of
> adding/removing reasons to worry about reload; I don't remember, as this
> code stewed for a long time, but it's at least inconsistent...  We
> could fix it as you suggested or as shown in that patch.

I see, thanks. Indeed, there isn't much difference between what I had in
mind and the relevant bits in 0004, so that's probably the way to go.




Re: queryId constant squashing does not support prepared statements

2025-05-24 Thread Sami Imseih
> In v17, we are a bit smarter with the numbering, with a normalization
> giving the following, starting at $1:
> select where $5 in ($1, $2, $3) and $6 = $4
>
> So your argument about the $n parameters is kind of true, but I think
> the numbering logic in v17 to start at $1 is a less-confusing result.
> I would imagine that the squashed logic should give the following
> result on HEAD in this case if we want a maximum of consistency with
> the squashing of the IN elements taken into account:
> select where $3 in ($1 /*, ... */) and $4 = $2
>
> Starting the count of the parameters at $4 would be strange.

yeah, I think the correct answer is we need to handle 2 cases.

1. If we don't have a squashed list, then we just do what we do now.

2. If we have 1 or more squashed lists, then we can't preserve the
$n parameter numbers as supplied by the user, and we simply renumber
the $n starting from 1.

therefore, a user supplied query like this:
```
select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
```

will be normalized to:
```
select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
```

To accomplish this, we will need to track the locations of
external parameters to support the 2nd case, because we need
to re-write the original location of the parameter with
the new value. I played around with this earlier today, and
it works as described above. Any concerns with this behavior?
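Case 2 above can be illustrated with a small sketch. This is hedged: it assumes each $n placeholder occurs exactly once, and the real implementation would rewrite using the tracked parameter locations rather than a regex:

```python
import re

def renumber_params(normalized_query):
    """Renumber $n placeholders left to right starting from $1, as
    proposed for queries containing at least one squashed IN-list."""
    counter = 0

    def repl(match):
        nonlocal counter
        counter += 1
        return f"${counter}"  # next placeholder gets the next number

    return re.sub(r"\$\d+", repl, normalized_query)
```

With the example above, `select where $5 in ($1 /*...*/) and $6 = $4 and $7 = $8` (once the constants have also been replaced) renumbers to `select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6`, matching the expected normalization.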

--
Sami




Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread Florents Tselai


> On 24 May 2025, at 7:08 PM, David E. Wheeler  wrote:
> 
> On May 23, 2025, at 13:52, Tom Lane  wrote:
> 
>>> I assume you mean that they’re set at initdb time, so there’s no mutability 
>>> concern?
>> 
>> Yeah, I think Peter's right and I'm wrong.  Obviously this ties into
>> our philosophical debate about how immutable is immutable.  But as
>> long as the functions only depend on locale settings that are fixed
>> at database creation, I think it's okay to consider them immutable.
>> 
>> If you were, say, depending on LC_NUMERIC, it would clearly be unsafe
>> to consider that immutable, so I'm not quite sure if this is the end
>> of the discussion.  But for what's mentioned in the thread title,
>> I think we only care about LC_CTYPE.
> 
> Oh, so maybe all this is moot, and Florents can go ahead and add support for 
> the functions to the non-_tz functions?
> 


I think the patch is still in reasonably good shape and hasn’t changed much 
since September 24.
So yes, I’d hope there are still some valid points to consider or improve.
Otherwise, I’ll have only myself to blame for not pushing harder before the 
feature freeze. 😅

Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread David E. Wheeler
On May 24, 2025, at 12:51, Florents Tselai  wrote:

> I think the patch is still in reasonably good shape and hasn’t changed much 
> since September 24. So yes, I’d hope there are still some valid points to 
> consider or improve.

Okay, here’s a review.

Patch applies cleanly.

All tests pass.

I'm curious why you added the `arg0` and `arg1` fields to the `method_args` 
union. Is there some reason that the existing `left` and `right` fields 
wouldn't work? Admittedly these are not formally binary operators, but I don't 
see that it matters much.

The existing string() method operates on a "JSON boolean, number, string, or 
datetime"; should these functions also operate on all those data types?

The argument to the trim methods appears to be ignored:

```
postgres=# select jsonb_path_query('"zzzytest"', '$.ltrim("xyz")');
 jsonb_path_query 
------------------
 "zzzytest"
```

I'm wondering if the issue is the use of the opt_datetime_template in the 
grammar?

```
| '.' STR_LTRIM_P '(' opt_datetime_template ')'
{ $$ = makeItemUnary(jpiStrLtrimFunc, $4); }
| '.' STR_RTRIM_P '(' opt_datetime_template ')'
{ $$ = makeItemUnary(jpiStrRtrimFunc, $4); }
| '.' STR_BTRIM_P '(' opt_datetime_template ')'
{ $$ = makeItemUnary(jpiStrBtrimFunc, $4); }
```

I realize it resolves to a string, but for some reason it doesn't get picked 
up. But also, do you want to support variables for either of these arguments? 
If so, maybe rename and use starts_with_initial:

```
starts_with_initial:
STRING_P{ $$ = 
makeItemString(&$1); }
| VARIABLE_P{ $$ = 
makeItemVariable(&$1); }
;
```

split_part() does not support a negative n value:

```
postgres=# select split_part('abc,def,ghi,jkl', ',', -2) ;
 split_part 
------------
 ghi

select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part(",", -2)');
ERROR:  syntax error at or near "-" of jsonpath input
LINE 1: select jsonb_path_query('"abc,def,ghi,jkl"', '$.split_part("...
```

Nor does a `+` work. I think you’d be better served using `csv_elem`, something 
like:


```
| '.' STR_SPLIT_PART_P '(' STRING_P csv_elem ')'
```

I'm not sure how well these functions comply with the SQL spec. Does it have a 
provision for implementation-specific methods? I *think* all existing methods 
are defined by the spec, but someone with access to its contents would have to 
say for sure. And maybe we don't care, consider this a natural extension?

I’ve attached a new patch with docs.

Best,

David



v3-0001-Add-additional-jsonpath-string-methods.patch
Description: Binary data





signature.asc
Description: Message signed with OpenPGP


Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

2025-05-24 Thread David E. Wheeler
On May 24, 2025, at 17:39, David E. Wheeler  wrote:

> I’ve attached a new patch with docs.

Oh, and I forgot to mention, src/backend/utils/adt/jsonpath_scan.l looks like 
it has a ton of duplication in it. Shouldn’t it just add the new keywords, 
something like:

```
@@ -415,6 +415,11 @@ static const JsonPathKeyword keywords[] = {
{4, false, WITH_P, "with"},
{5, true, FALSE_P, "false"},
{5, false, FLOOR_P, "floor"},
+   {5, false, STR_BTRIM_P, "btrim"},
+   {5, false, STR_LOWER_P, "lower"},
+   {5, false, STR_LTRIM_P, "ltrim"},
+   {5, false, STR_RTRIM_P, "rtrim"},
+   {5, false, STR_UPPER_P, "upper"},
{6, false, BIGINT_P, "bigint"},
{6, false, DOUBLE_P, "double"},
{6, false, EXISTS_P, "exists"},
@@ -428,10 +433,13 @@ static const JsonPathKeyword keywords[] = {
{7, false, INTEGER_P, "integer"},
{7, false, TIME_TZ_P, "time_tz"},
{7, false, UNKNOWN_P, "unknown"},
+   {7, false, STR_INITCAP_P, "initcap"},
+   {7, false, STR_REPLACEFUNC_P, "replace"},
{8, false, DATETIME_P, "datetime"},
{8, false, KEYVALUE_P, "keyvalue"},
{9, false, TIMESTAMP_P, "timestamp"},
{10, false, LIKE_REGEX_P, "like_regex"},
+   {10, false, STR_SPLIT_PART_P, "split_part"},
{12, false, TIMESTAMP_TZ_P, "timestamp_tz"},
```

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: Non-reproducible AIO failure

2025-05-24 Thread Thomas Munro
On Sat, May 24, 2025 at 3:17 PM Tom Lane  wrote:
> So it seems that "very low-probability issue in our Mac AIO code" is
> the most probable description.

There isn't any macOS-specific AIO code so my first guess would be
that it might be due to aarch64 weak memory reordering (though Andres
speculated that it should all be one backend, huh), if it's not just
a timing luck thing.  Alexander, were the other OSes you tried all on
x86?




Please update your contact list: It must be pgsql-hackers@lists.postgresql.org

2025-05-24 Thread * Neustradamus *
Dear all users of the pgsql-hackers @ lists.postgresql.org mailing list,

A lot of people keep sending e-mails to the wrong mailing list address.

Can you please ONLY send to pgsql-hackers @ lists.postgresql.org?
The same applies to the other lists on lists.postgresql.org.

This address must not be used: pgsql-hackers @ postgresql.org

If you have a contact in your mailbox with the old e-mail address (without 
"lists"), please update it to use "lists.postgresql.org".

This will permit better management and fewer e-mails being flagged as spam, etc.

At the same time, I propose that the PostgreSQL team stop delivering e-mails 
addressed without "lists".

Thanks in advance.

Regards,

Neustradamus



Re: Options to control remote transactions’ access/deferrable modes in postgres_fdw

2025-05-24 Thread Etsuro Fujita
On Thu, May 8, 2025 at 5:50 PM Etsuro Fujita  wrote:
> Attached is an updated version of the patch.

Here is a new version of the patch where I added a comment for a new
function, fixed indentation, and added the commit message.  If there
are no objections, I will push this as a master-only fix, as noted in
the commit message.

Best regards,
Etsuro Fujita


v5-0001-postgres_fdw-Inherit-the-local-transaction-s-access-.patch
Description: Binary data


Re: Fixing memory leaks in postgres_fdw

2025-05-24 Thread Etsuro Fujita
Hi,

On Sat, May 24, 2025 at 10:10 AM Tom Lane  wrote:
> The DirectModify code path relies on PG_TRY blocks to ensure that it
> releases the PGresult for the foreign modify command, but that can't
> work because (at least in cases with RETURNING data) the PGresult has
> to survive across successive calls to postgresIterateDirectModify.
> If an error occurs in the query in between those steps, we have no
> opportunity to clean up.

Ugh.

> I thought of fixing this by using a memory context reset callback
> to ensure that the PGresult is cleaned up when the executor's context
> goes away, and that seems to work nicely (see 0001 attached).
> However, I feel like this is just a POC, because now that we have that
> concept we might be able to use it elsewhere in postgres_fdw to
> eliminate most or even all of its reliance on PG_TRY.  That should be
> faster as well as much less bug-prone.  But I'm putting it up at this
> stage for comments, in case anyone thinks this is not the direction to
> head in.

I think that that is a good idea; +1 for removing the reliance not
only in DirectModify but in other places.  I think that that would be
also useful if extending batch INSERT to cases with RETURNING data in
postgres_fdw.

Thanks for working on this!

Best regards,
Etsuro Fujita




Re: pg18: Virtual generated columns are not (yet) safe when superuser selects from them

2025-05-24 Thread jian he
On Sat, May 24, 2025 at 2:39 PM Feike Steenbergen
 wrote:
>
> The loophole is this:
>
> - the generated virtual column can use a user-defined function
> - when running SELECT against that column by a superuser
>   the function is called within the context of a superuser
> - this in turn allows the regular user to run any code within
>   the context of superuser

Sorry, I am not fully sure what this means.
A minimal SQL reproducer would be great.

you may check virtual generated column function privilege regress tests on
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/generated_virtual.sql#n284
(from line 284 to line 303)

also see [1].
PostgreSQL grants EXECUTE privilege for functions and procedures to
PUBLIC *by default* when the objects are created.

[1]: 
https://www.postgresql.org/docs/current/ddl-priv.html#PRIVILEGES-SUMMARY-TABLE




[PATCH] Extending FK check skipping on replicas to ADD FK and TRUNCATE

2025-05-24 Thread Hannu Krosing
Hello Everybody,

Currently setting `session_replication_role` to `replica` disables
foreign key checks allowing, among other things, free table copy order
and faster CDC apply in logical replication.

But two other cases of foreign keys are still restricted or blocked
even with this setting


1. Foreign key checks during `ALTER TABLE <table> ADD FOREIGN KEY
(<columns>) REFERENCES <reftable>(<refcolumns>);`

These can be really, Really, REALLY slow when the PK table is huge
(hundreds of billions of rows). And here I mean taking-tens-of-days
slow in extreme cases.

And they are also completely unnecessary when the table data comes
from a known good source.


2. `TRUNCATE <table>` is blocked when there are any foreign
keys referencing the <table>

But you still can mess up your database in exactly the same way by
running `DELETE FROM <table>`, just a little (or a lot)
slower.


The attached patch fixes this, allowing both cases to work when `SET
session_replication_role=replica;` is in force:

### Test for `ALTER TABLE <table> ADD FOREIGN KEY (<columns>) REFERENCES
<reftable>(<refcolumns>);`

CREATE TABLE pkt(id int PRIMARY KEY);
CREATE TABLE fkt(fk int);
INSERT INTO fkt VALUES(1);

ALTER TABLE fkt ADD FOREIGN KEY (fk) REFERENCES pkt(id); -- fails

SET session_replication_role=replica;
ALTER TABLE fkt ADD FOREIGN KEY (fk) REFERENCES pkt(id); -- now succeeds

### Test for `TRUNCATE `

CREATE TABLE pkt(id int PRIMARY KEY);
CREATE TABLE fkt(fk int REFERENCES pkt(id));

TRUNCATE pkt; -- fails

SET session_replication_role=replica;
TRUNCATE pkt; -- now succeeds



The patch just wraps two sections of code in

if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) {
   ...
}

and the rest is added indentation for the wrapped.

Plus I added tests, including the previously missing test that
explicitly exercises the foreign key checks (which can be argued to
already be covered by the generic trigger tests)


I am not sure if we need to add anything to current documentation[*]
which says just this about foreign keys and
`session_replication_role=replica`

> Since foreign keys are implemented as triggers, setting this parameter to 
> replica also disables all foreign key checks, which can leave data in an 
> inconsistent state if improperly used.

[*] 
https://www.postgresql.org/docs/17/runtime-config-client.html#GUC-SESSION-REPLICATION-ROLE

Any comments and suggestions are welcome
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e6606e5e57a..592a12fa95b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1975,13 +1975,19 @@ ExecuteTruncateGuts(List *explicit_rels,
 	 * Check foreign key references.  In CASCADE mode, this should be
 	 * unnecessary since we just pulled in all the references; but as a
 	 * cross-check, do it anyway if in an Assert-enabled build.
+	 * 
+	 * Skip foreign key checks when `session_replication_role = replica` to
+	 * match the behaviour of disabling FK triggers in the same situation
 	 */
+	if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA)
+	{
 #ifdef USE_ASSERT_CHECKING
-	heap_truncate_check_FKs(rels, false);
-#else
-	if (behavior == DROP_RESTRICT)
 		heap_truncate_check_FKs(rels, false);
+#else
+		if (behavior == DROP_RESTRICT)
+			heap_truncate_check_FKs(rels, false);
 #endif
+	}
 
 	/*
 	 * If we are asked to restart sequences, find all the sequences, lock them
@@ -5918,49 +5924,55 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
 	 * theoretically possible that we have changed both relations of the
 	 * foreign key, and we'd better have finished both rewrites before we try
 	 * to read the tables.
+	 * 
+	 * Skip the check when `session_replication_role = replica` to save time
+	 * and to match the FK trigger behaviour in the same situation
 	 */
-	foreach(ltab, *wqueue)
+	if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA)
 	{
-		AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
-		Relation	rel = NULL;
-		ListCell   *lcon;
-
-		/* Relations without storage may be ignored here too */
-		if (!RELKIND_HAS_STORAGE(tab->relkind))
-			continue;
-
-		foreach(lcon, tab->constraints)
+		foreach(ltab, *wqueue)
 		{
-			NewConstraint *con = lfirst(lcon);
+			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
+			Relation	rel = NULL;
+			ListCell   *lcon;
 
-			if (con->contype == CONSTR_FOREIGN)
+			/* Relations without storage may be ignored here too */
+			if (!RELKIND_HAS_STORAGE(tab->relkind))
+continue;
+
+			foreach(lcon, tab->constraints)
 			{
-Constraint *fkconstraint = (Constraint *) con->qual;
-Relation	refrel;
+NewConstraint *con = lfirst(lcon);
 
-if (rel == NULL)
+if (con->contype == CONSTR_FOREIGN)
 {
-	/* Long since locked, no need for another */
-	rel = table_open(tab->relid, NoLock);
-}
+	Constraint *fkconstraint = (Constraint *) con->qual;
+	Relation	refrel;
 
-refrel = table_open(con->refrelid, RowShareLock);
+	if (rel == NULL)
+	{
+		/* Long since locked, no need fo

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-24 Thread Amit Kapila
On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov  wrote:
>
> I spend more time on this.  The next revision is attached.  It
> contains revised comments and other cosmetic changes.  I'm going to
> backpatch 0001 to all supported branches,
>

Is my understanding correct that we need 0001 because
PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
changing the slot's restart_lsn? If so, shouldn't the comments (One
could argue that the slot should be saved to disk now, but that'd be
energy wasted - the worst thing lost information could cause here is
to give wrong information in a statistics view) in
PhysicalConfirmReceivedLocation() be changed to mention the risk of
not saving the slot?

Also, after 0001, the same solution will hold for logical slots as
well, where we are already careful to save the slot to disk if its
restart_lsn is changed; see LogicalConfirmReceivedLocation(). So, won't
that effort be wasted? Even if we don't want to do anything about it
(which doesn't sound like a good idea), we should note that in comments
somewhere.

-- 
With Regards,
Amit Kapila.




Copy Tuple Desc in internal_get_result_type

2025-05-24 Thread Lakshmi Narayana Velayudam
Hello,
I would like to propose a modification to the internal_get_result_type
function, which is called from *get_call_result_type*. Specifically, I
recommend altering it to return a copy of *rsinfo->expectedDesc*.

Currently, callers are responsible for copying the tuple descriptor
returned from *get_call_result_type* before assigning it to rsinfo->setDesc
as part of set-returning functions in mode *SFRM_Materialize*. If it is not
copied, this can lead to unexpected behavior, because rsinfo->expectedDesc
will be unintentionally freed at the end of ExecMakeTableFunctionResult, as
rsinfo->setDesc holds a reference to it. This can be overlooked unless a
user-defined function (UDF) is invoked multiple times within the same query.

*Example*: Queries involving lateral joins with user-defined functions
(UDFs).

For your convenience, I have created a GitHub repository demonstrating the
issue with a C extension: https://github.com/narayana-dev/srfbug

Additionally, I have attached the patch file for your review.

I look forward to your feedback.

Best regards,
Lakshmi Narayana Velayudam


copy_tup_desc.patch
Description: Binary data


Testing PostgreSQL 18 beta ...

2025-05-24 Thread André Verwijs




I want to test PostgreSQL 18 beta on Debian 13 (trixie),
but "libpq5" (18-beta) is missing


My sources configuration:
~


Types: deb
URIs: https://apt.postgresql.org/pub/repos/apt/
Suites: trixie-pgdg
Components: main 17 18
Enabled: yes
Architectures: amd64
Signed-By: /etc/apt/trusted.gpg.d/PostgreSQL.asc

Types: deb
URIs: https://apt.postgresql.org/pub/repos/apt/
Suites: trixie-pgdg-snapshot
Components: main 17 18
Enabled: yes
Architectures: amd64
Signed-By: /etc/apt/trusted.gpg.d/PostgreSQL.asc

Types: deb
URIs: https://apt.postgresql.org/pub/repos/apt/
Suites: trixie-pgdg-testing
Components: main 17 18
Enabled: yes
Architectures: amd64
Signed-By: /etc/apt/trusted.gpg.d/PostgreSQL.asc

Types: deb
URIs: https://apt.postgresql.org/pub/repos/apt/
Suites: sid-pgdg
Components: main 17 18
Enabled: yes
Architectures: amd64
Signed-By: /etc/apt/trusted.gpg.d/PostgreSQL.asc


please add "libpq5" (18-beta) to (1 or more) sources..







--

Dank U - Thank You

* X (twitter): https://x.com/opensimfan
* BlueSky: https://bsky.app/profile/opensimfan.bsky.social
* Facebook: http://www.facebook.com/andre.verwijs
* Instagram: https://instagram.com/dutchglory








Re: MERGE issues around inheritance

2025-05-24 Thread Álvaro Herrera
On 2025-May-21, Andres Freund wrote:

> Hi,
> 
> In [1] I added some verification to projection building, to check if the
> tupleslot passed to ExecBuildProjectionInfo() is compatible with the target
> list.  One query in merge.sql [2] got flagged.
> 
> Trying to debug that issue, I found another problem. This leaves us with:
> 
> 1) w/ inheritance INSERTed rows are not returned by RETURNING. This seems to
>just generally not work with MERGE

Hmm, curious.  One thing to observe is that the original source tuple is
in the child table, but the tuple inserted by MERGE ends up in the
parent table.  I'm guessing that the code gets confused as to the
relation that the tuple in the returned slot comes from, and that
somehow makes it not "see" the tuple to process for RETURNING?
I dunno.  CC'ing Dean, who is more familiar with this code than I am.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)




Re: mention unused_oids script in pg_proc.dat

2025-05-24 Thread Álvaro Herrera
On 2025-May-24, Florents Tselai wrote:

> I aggree with having more READMEs around the src tree.
> It’s true that a lot of doc content found in VII. Internals is dev-oriented,
> but imho it should be closer to the src (and more succinct/less verbose too).

Maybe these READMEs can simply contain little more than links to the
built HTML docs, to avoid having multiple sources of info.

> Looking at some of the text in 67.2.2. OID Assignment for example,
> a few things look outdated wrt choosing OIDs.
> 
> One can look through the history of similar commits to see what
> needs to be changed; So it’s not THAT bad.

Sure, outdated docs is something that happens.  Patches welcome.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)