Re: Windows CFBot is broken because ecpg dec_test.c error

2025-01-28 Thread Andres Freund
Hi, 

On January 28, 2025 7:13:16 AM EST, Jelte Fennema-Nio  
wrote:
>Since about ~11 hours ago the ecpg test is consistently failing on
>Windows with this error[1]:
>
>> Could not open file 
>> C:/cirrus/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for 
>> reading
>
>I took a quick look at possible causes but couldn't find a clear
>winner. My current guess is that there's some dependency rule missing
>in the meson file and due to some infra changes files now get compiled
>in the wrong order.
>
>One recent suspicious commit seems to be:
>7819a25cd101b574f5422edb00fe3376fbb646da
>But there are a bunch of successful changes that include that commit,
>so it seems to be a red herring. (CC-ed Noah anyway)

I think it's due to a new version of meson. Seems we under-specified the test
dependencies. I'll write up a patch.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Convert sepgsql tests to TAP

2025-01-28 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 27.08.24 10:12, Peter Eisentraut wrote:
>> Here is a new patch version.
>> I simplified the uses of sed and awk inside the Perl script.  I also 
>> fixed "make installcheck".  I noticed that meson installs sepgsql.sql
>> into the wrong directory, so that's fixed also.  (Many of the 
>> complications in this patch set are because sepgsql is not an
>> extension but a loose SQL script, of which it is now the only one. 
>> Maybe something to address separately.)
>> I did end up deciding to keep the old test_sepgsql script, because it 
>> does have the documented purpose of testing existing installations.  I
>> did change it so that it calls pg_regress directly, without going via 
>> make, so that the dependency on make is removed.
>
> This has been committed.  And I understand there is a buildfarm client
> update available for the affected buildfarm members.

This patch passed the TAP command invocation cleanup patch mid-flight,
so it didn't get the memo about using the fat comma for command-line option
arguments.  Here's a patch bringing it in line with the new
convention.  I don't have any machines with SELinux enabled, so either
someone who does would need to test it, or we can rely on the buildfarm.

- ilmari

From bc899fbe7a89fcdf198421a9abf608772748c1ba Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker
Date: Tue, 28 Jan 2025 13:32:35 +
Subject: [PATCH] sepgsql: update TAP test to use fat comma style

---
 contrib/sepgsql/t/001_sepgsql.pl | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/contrib/sepgsql/t/001_sepgsql.pl b/contrib/sepgsql/t/001_sepgsql.pl
index cba51403518..c5fd7254841 100644
--- a/contrib/sepgsql/t/001_sepgsql.pl
+++ b/contrib/sepgsql/t/001_sepgsql.pl
@@ -211,10 +211,10 @@
 
 	my $result = run_log(
 		[
-			'postgres', '--single',
-			'-F', '-c',
-			'exit_on_error=true', '-D',
-			$node->data_dir, 'template0'
+			'postgres', '--single', '-F',
+			'-c' => 'exit_on_error=true',
+			'-D' => $node->data_dir,
+			'template0'
 		],
 		'<',
 		$ENV{share_contrib_dir} . '/sepgsql.sql');
@@ -238,8 +238,11 @@
 
 $node->command_ok(
 	[
-		$ENV{PG_REGRESS}, '--bindir=', '--inputdir=.', '--launcher',
-		'./launcher', @tests
+		$ENV{PG_REGRESS},
+		'--bindir' => '',
+		'--inputdir' => '.',
+		'--launcher' => './launcher',
+		@tests
 	],
 	'sepgsql tests');
 
-- 
2.48.1



Re: Interrupts vs signals

2025-01-28 Thread Andres Freund
Hi,

On 2024-12-02 16:39:28 +0200, Heikki Linnakangas wrote:
> From eff8de11fbfea4e2aadce9c1d71452b0f5a1b80b Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Mon, 2 Dec 2024 15:51:54 +0200
> Subject: [PATCH v5 1/4] Replace Latches with Interrupts

Your email has only three attachments - I assume the fourth is something
local that didn't need to be shared?


> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The Latch was a flag, with an inter-process signalling mechanism that
> allowed a process to wait for the Latch to be set. My original vision
> for Latches was that you could have different latches for different
> things that you need to wait for, but in practice, almost all code
> used one "process latch" that was shared for all wakeups. The only
> exception was the "recoveryWakeupLatch". I think the reason that we
> ended up just sharing the same latch for all wakeups is that it was
> cumbersome in practice to deal with multiple latches. For starters,
> there was no machinery for waiting for multiple latches at the same
> time. Secondly, changing the "ownership" of a latch needed extra
> locking or other coordination. Thirdly, each inter-process latch
> needed to be initialized at postmaster startup time.
> 
> This patch embraces the reality of how Latches were used, and replaces
> the Latches with a per-process interrupt mask. You can no longer
> allocate Latches in arbitrary shared memory structs and an interrupt
> is always directed at a particular process, addressed by its
> ProcNumber.  Each process has a bitmask of pending interrupts in
> PGPROC.

That doesn't really remove the need for inter-process coordination to know
what ProcNumber corresponds to what...


> This commit introduces two interrupt bits. INTERRUPT_GENERAL replaces
> the general-purpose per-process latch. All code that previously set a
> process's process latch now sets its INTERRUPT_GENERAL interrupt bit
> instead.

Half-formed-at-best thought: I wonder if we should split the "interrupt
yourself in response to a signal" cases from "another process wakes you up",
even in the initial commit. They seem rather different to me.


> This also moves the WaitEventSet functions to a different source file,
> waiteventset.c. This separates the platform-dependent waiting and
> signalling code from the platform-independent parts.

I think this should be split into a separate commit. It's painful to verify
that code-movement didn't change anything else if there's lots of other
changes in the same commit.


> diff --git a/src/include/storage/interrupt.h b/src/include/storage/interrupt.h
> new file mode 100644
> index 00..1da05369c3
> --- /dev/null
> +++ b/src/include/storage/interrupt.h

So we now have postmaster/interrupt.[ch] and storage/interrupt.h /
storage/ipc/interrupt.c.


> @@ -0,0 +1,171 @@
> +/*-
> + *
> + * interrupt.h
> + * Interrupt handling routines.

And all of them use the same "short description".

That somehow doesn't seem optimal, although I don't really have a good
solution.



> + * "Interrupts" are a set of flags that represent conditions that should be
> + * handled at a later time.  They are roughly analogous to Unix signals,
> + * except that they are handled cooperatively by checking for them at many
> + * points in the code.
> + *
> + * Interrupt flags can be "raised" synchronously by code that wants to defer
> + * an action, or asynchronously by timer signal handlers, other signal
> + * handlers or "sent" by other backends setting them directly.

I think it might be worth explicitly stating here that multiple "arriving"
interrupts can be coalesced.
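
As a rough illustration of that coalescing behaviour (the names and types here
are invented for the sketch, not taken from the patch): sending an interrupt
boils down to atomically OR-ing one bit into the recipient's pending mask, so
several sends of the same interrupt before it is serviced collapse into a
single pending bit.

#include <stdint.h>
#include <stdatomic.h>

typedef enum { INTERRUPT_GENERAL = 0 } InterruptType;   /* placeholder enum */

/* Raising the same interrupt twice before it is serviced leaves exactly one
 * pending bit, i.e. deliveries coalesce. */
static void
send_interrupt(_Atomic uint32_t *pending_mask, InterruptType i)
{
    atomic_fetch_or(pending_mask, (uint32_t) 1 << i);
}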


> + * Most code currently deals with the INTERRUPT_GENERAL interrupt. It is
> + * raised by any of the events checked by CHECK_FOR_INTERRUPTS(), like query
> + * cancellation or idle session timeout. Well behaved backend code performs
> + * CHECK_FOR_INTERRUPTS() periodically in long computations, and should never
> + * sleep using mechanisms other than the WaitEventSet mechanism or the more
> + * convenient WaitInterrupt/WaitSocketOrInterrupt functions (except for
> + * bounded short periods, eg LWLock waits), so they should react in good 
> time.




> + * The "standard" set of interrupts is handled by CHECK_FOR_INTERRUPTS(), and
> + * consists of tasks that are safe to perform at most times.  They can be
> + * suppressed by HOLD_INTERRUPTS()/RESUME_INTERRUPTS().
> + *
> + *
> + * The correct pattern to wait for event(s) using INTERRUPT_GENERAL is:
> + *
> + * for (;;)
> + * {
> + *  ClearInterrupt(INTERRUPT_GENERAL);
> + *  if (work to do)
> + *  Do Stuff();
> + *  WaitInterrupt(1 << INTERRUPT_GENERAL, ...);
> + * }

I don't particularly like that there's now dozens of places that need to do
bit-shifting.

One reason I don't like it is that, for example, this should actually be 1U <<
INTERRUPT_GENERAL, so that a signed 1 never gets shifted into the sign bit for
high interrupt numbers.
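
One way to keep those shifts (and the 1 vs 1U distinction) out of call sites
would be a small helper along these lines; the macro is a sketch of mine, not
something from the posted patch:

#include <stdint.h>

/* Hypothetical helper: build an interrupt mask without repeating raw shifts,
 * always starting from an unsigned one so the high bit is safe. */
#define INTERRUPT_MASK(i)   ((uint32_t) 1 << (i))

/* a call site would then read, for example:
 *     WaitInterrupt(INTERRUPT_MASK(INTERRUPT_GENERAL), ...);
 */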

Re: Compression of bigger WAL records

2025-01-28 Thread Fujii Masao




On 2025/01/22 3:24, Andrey M. Borodin wrote:




On 12 Jan 2025, at 17:43, Andrey M. Borodin  wrote:

I attach a prototype patch.


Here's v2, now it passes all the tests with wal_debug.


I like the idea of WAL compression more.

With the current approach, each backend needs to allocate memory twice
the size of the total WAL record. Right? One area is for the gathered
WAL record data (from rdt and registered_buffers), and the other is for
storing the compressed data. Could this lead to potential memory usage
concerns? Perhaps we should consider setting a limit on the maximum
memory each backend can use for WAL compression?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Eagerly scan all-visible pages to amortize aggressive vacuum

2025-01-28 Thread Robert Treat
On Mon, Jan 27, 2025 at 12:45 PM Melanie Plageman
 wrote:
>
> On Fri, Jan 24, 2025 at 3:43 PM Robert Haas  wrote:
> >
> > On Fri, Jan 24, 2025 at 3:02 PM Melanie Plageman
> >  wrote:
> attached v11 uses a fraction with this name. It follows the
> conventions and I find it descriptive.
>
> Changing the configuration to a fraction does mean that quite a few of
> the comments are different in this version. I think the description of
> the guc in
> config.sgml could use some review in particular. I tried to make it
> clear that the percentage is not the maximum number of blocks that
> could be eager scanned, because you may also eagerly scan blocks and
> succeed.
>

Specifies the maximum fraction of pages that VACUUM
may scan and fail to set all-frozen in the
visibility map before disabling eager scanning. A value of 0 disables
eager scanning altogether. The default is 0.03 (3%).

Note that when eager scanning is enabled, successful page freeze
attempts will not count against this limit, although they are
internally capped at 20% of the all-visible but not all-frozen pages
within the relation, in an effort to amortize overhead for future
aggressive vacuums.

This parameter can only be set in the
postgresql.conf file or on the server command
line, but the setting can be overridden for individual tables by
changing the corresponding
table storage parameter. For more information, see the description of
storage parameters.


> Other note: I noticed AutoVacOpts that are floating point numbers (like
> vacuum_cost_delay) are float8s but their associated GUCs (like
> autovacuum_vacuum_cost_delay) are doubles. These are going to be equivalent,
> but I wanted to note the inconsistency in case I was missing something.
>

::thinking emoji::

I think what you have is right, though certainly better to have a more
qualified opinion.
On a not entirely related item, I find it interesting that you set
your max value to 1.0, but vacuum_scale_factor (and similar) are set to
100.0.
I think you have it right though; what does it mean to set a table's
scale factor to 10,000%?

> > >
> > > > + ereport(INFO,
> > > > + (errmsg("Vacuum successfully froze %u eager scanned blocks of
> > > > \"%s.%s.%s\". Now disabling eager scanning.",
> > > >
> > > > I predict that if Tom sees this, you will get complaints about both
> > > > the wording of the message, which pretty clearly does not conform to
> > > > style guidelines for a primary error message, and also about the use
> > > > of the INFO level. Allow me to suggest DEBUG1 or DEBUG2 and "disabling
> > > > eager scanning after freezing %u eagerly scanned blocks".
> > >
> > > I've used your wording. Just for future reference, are the style
> > > guidelines I was violating the capitalization and punctuation? Are
> > > these documented somewhere? Also, what is a primary error message?
> > > INFO level? Or ones that use ereport and are translated? I looked at
> > > other messages and saw that they don't capitalize the first word or
> > > use punctuation, so I assume that those were problems.
> >
> > Yes. Primary error messages, i.e. errmsg(), are not capitalized and
> > punctuated, unlike errdetail() and errhint() messages, which are.
> >
> > See https://www.postgresql.org/docs/current/error-style-guide.html
> >
> > INFO level is used for VERY few things. I can't tell you off the top
> > of my head when it's appropriate, but I think the answer is "almost
> > never".
>
> I have changed it to DEBUG2. I started it as INFO because the other vacuum
> messages about whether or not the vacuum is an aggressive vacuum were at INFO
> level. I don't know what a user might prefer. Treat said this downthread:
>
> > Maybe, but one of the areas that INFO is used for is providing
> > additional details in VACUUM VERBOSE output, and this seems like it
> > would be pretty useful information to have if you are trying to
> > discern changes in i/o rate during a vacuum, or trying to tune the
> > failure rate setting, or several other related fields (including
> > automated capture by tools like pganalyze), so I believe INFO is the
> > right choice for this.
>
> I'm happy to go either way. I don't want users mad about verbosity, but if 
> they
> think it's helpful, then that seems good.
>

If there is a configurable, people will want to tune it, and DEBUG
level messages aren't a usable solution.

Robert Treat
https://xzilla.net




Re: EDB Installer initcluster script changes - review requested

2025-01-28 Thread Vladlen Popolitov

Manika Singhal писал(а) 2025-01-28 16:42:

Hi,



"C:\Users\Administrator\AppData\Local\Temp\2\postgresql_installer_cd746730f8"

"C:\Program Files\PostgreSQL\17" "C:\Program
Files\PostgreSQL\17\data" 5432 "Turkish,Türkiye" 0


Hi!

Your example has non-ASCII characters on the command line, which is not good
practice. If the user has switched on the "Use UTF-8" flag in the locale
settings (which is the user's right on their own computer), it forces all
commands to use UTF-8 characters on the command line, and "Türkiye" will be
transformed into garbled text by the exec system call.

The "Use UTF-8" flag appeared in Windows 10. It helps to correctly input and
output UTF-8 text (including correct behaviour of the psql.exe utility under
Windows).

As a consequence, it effectively makes obsolete all programs that use national
characters on the command line or do not expect UTF-8 on the command line.



Thanks & Regards
Manika Singhal
EDB Pune



Links:
--
[1] https://github.com/EnterpriseDB/edb-installers/issues/98
[2] 
https://www.postgresql.org/message-id/ca+hukgl5mbn3jquebapbx0yxdntpui04j+ksy2f7kbbhlga...@mail.gmail.com


--
Best regards,

Vladlen Popolitov.




Re: XMLDocument (SQL/XML X030)

2025-01-28 Thread Jim Jones
Hi Robert

On 28.01.25 05:54, Robert Treat wrote:
> Was playing around with the patch and was thinking about this wording:
> "The xmldocument function returns the input argument
>  unchanged... and is provided for compatibility."
>
> When I run an example similar to the db2 example you gave earlier:
>
> pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y));
>     xmldocument
> --------------------
>  <x>10</x><y>20</y>
>
> In the db2 case, this is preserved as UPPER (which is to say, db2 case
> folds UPPER, and the input happens to match that), but we case fold
> lower, because we do; presumably you'd get the opposite effect in db2
> running the input with unquoted lower field names(?). 


Yes.

SELECT 42 AS foo FROM SYSIBM.SYSDUMMY1


FOO    
---
 42

  1 record(s) selected.


> In any case (no
> pun intended), SQL folks probably don't care much about that
> discrepancy, but given xml is case sensitive, maybe xml people do?


That's a good point. DB2 converts unquoted identifiers to uppercase by
default, which, if memory serves, aligns with the SQL standard. In the
case of this xmlforest example, my guess is that DB2 treats the elements
as identifiers and normalizes them to uppercase as well, as DB2 does not
handle XML documents as text like we do. To preserve case, you'd need to
explicitly quote the identifiers:

SELECT xmlforest(10 AS "x", 20 AS "y") FROM SYSIBM.SYSDUMMY1


<x>10</x><y>20</y>

  1 record(s) selected.


Things look different when constructing the xml document directly from a
string:


SELECT xmlparse(DOCUMENT 'bar') FROM
SYSIBM.SYSDUMMY1


bar

  1 record(s) selected.


I'd say the difference is due to how the two systems handle the XML data
type and unquoted identifiers in general, rather than a difference in
the behaviour of the function itself. Sticking to quoted identifiers in
both systems helps:


SELECT xmlforest(42 AS "foo", 73 AS "bar");
         xmlforest
----------------------------
 <foo>42</foo><bar>73</bar>


Probably that's why most DB2 examples in their documentation use quoted
identifiers :)


Best regards, Jim






Re: NOT ENFORCED constraint feature

2025-01-28 Thread Peter Eisentraut

On 20.01.25 17:53, Amul Sul wrote:

Attached is a new set of patches. Please ignore patch 0001 here, which
was posted separately [1] -- proposes allowing invalid foreign key
constraints on partitioned tables. Patch 0002 refactors
tryAttachPartitionForeignKey(), primarily needed by the new patch
v9-0006 included in this set. This patch handles merging constraints
with different enforceability. Without it, a new constraint would be
created on the child. However, this patch introduces additional code
that may appear somewhat messy or confusing. I've added plenty of
comments to clarify the logic. While I’ve done some testing, it hasn’t
been extensive. I plan to do more testing in the next week.

Please let me know if we should continue with patch 0006 improvement,
or if the feature up to patch 0005, which enforces matching
enforceability before merging constraints, is sufficient.

[1] https://postgr.es/m/CAAJ_b96Bp=-ZwihPPtuaNX=srz0u6zsxd3+fgaro0juka8v...@mail.gmail.com



I made minor updates to the attached version, particularly ensuring
that the order of relation opening and closing remains the same as
before in ATExecAlterConstrRecurse(). Additionally, I’ve added a
refactoring patch to make createForeignKeyActionTriggers() accept a
relation OID instead of a Relation, making this function consistent
with others like createForeignKeyCheckTriggers().


I think v10-0001 has been committed separately now.  I can't 
successfully apply the remaining patches though.  Could you supply an 
updated patch set?


Just from looking at them, the refactoring patches 0002, 0003, 0004 seem ok.

0005 also seems ok.


In 0006, this change in the test output should be improved:

 -- XXX: error message is misleading here
 ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key ENFORCED;
-ERROR:  ALTER CONSTRAINT statement constraints cannot be marked ENFORCED
-LINE 1: ...TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key ENFORCED;
-  ^
+ERROR:  constraint "unique_tbl_i_key" of relation "unique_tbl" is not a 
foreign key constraint


Maybe this should be along the lines of "ALTER CONSTRAINT ... ENFORCED 
is not supported for %s constraints" or something like that.



This behavior is not correct:

+-- Changing it back to ENFORCED will leave the constraint in the NOT 
VALID state

+ALTER TABLE FKTABLE ALTER CONSTRAINT fktable_ftest1_fkey ENFORCED;
+-- Which needs to be explicitly validated.
+ALTER TABLE FKTABLE VALIDATE CONSTRAINT fktable_ftest1_fkey;

Setting the constraint to enforced should enforce it immediately.  This 
SQL statement is covered by the SQL standard.  Also, I think it's a 
better user experience if you don't require two steps.






Feature Request: Add AES-128-CFB Mode Support to pgcrypto

2025-01-28 Thread Vladyslav Nebozhyn
Dear PostgreSQL Development Team,

I would like to request the addition of support for the *AES-128-CFB* mode
in the pgcrypto extension. Currently, pgcrypto supports AES encryption
modes like ECB and CBC, but it does not include CFB mode, which is
essential for certain use cases.

In managed environments such as *Azure Database for PostgreSQL - Flexible
Server*, users are unable to create or install custom extensions. This
restriction makes it challenging to work with encrypted data that relies on
AES-128-CFB, as we cannot use custom solutions to handle this algorithm.
Adding CFB mode support to pgcrypto would address this limitation and
expand its usability in managed PostgreSQL environments.

Implementing AES-128-CFB in pgcrypto should require only minimal changes,
as it is already built on OpenSSL, which supports the CFB mode natively.
Including this functionality would also align pgcrypto with the principle
of providing robust cryptographic support, similar to other database
solutions.
Why This Matters:

   1. *Compatibility*: Many existing systems encrypt data using
   AES-128-CFB, and without native support in pgcrypto, PostgreSQL users
   must resort to inefficient workarounds.
   2. *Consistency*: pgcrypto already supports other AES modes (e.g., ECB,
   CBC). Including CFB would ensure that its AES capabilities are complete.
   3. *Ease of Implementation*: OpenSSL already provides a straightforward
   API for AES-128-CFB (see the sketch below), so adding it to pgcrypto
   should require only a few lines of code.
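
As a sketch of how little OpenSSL itself requires for this (an illustration
only, not pgcrypto code; error handling omitted):

#include <openssl/evp.h>

/* Encrypt inlen bytes of in into out with AES-128-CFB and return the number
 * of bytes written.  EVP_aes_128_cfb128() is the OpenSSL primitive that
 * pgcrypto would need to expose under a new cipher-mode name. */
static int
encrypt_aes128_cfb(const unsigned char *key, const unsigned char *iv,
                   const unsigned char *in, int inlen, unsigned char *out)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int         len = 0, total = 0;

    EVP_EncryptInit_ex(ctx, EVP_aes_128_cfb128(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, out, &len, in, inlen);
    total = len;
    EVP_EncryptFinal_ex(ctx, out + len, &len);
    total += len;
    EVP_CIPHER_CTX_free(ctx);
    return total;
}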

This enhancement would greatly benefit users in managed environments and
improve the flexibility of PostgreSQL as a whole.

Thank you for considering this request. I would be happy to assist in
testing or providing further information if needed.

Best regards,
Vladyslav Nebozhyn


Re: POC, WIP: OR-clause support for indexes

2025-01-28 Thread Andrei Lepikhov

On 1/28/25 11:36, Andrei Lepikhov wrote:

On 1/27/25 16:50, Alexander Korotkov wrote:
qsort(matches, n, sizeof(OrArgIndexMatch), or_arg_index_match_cmp);

To fit an index, the order of elements in the target array of the 
`ScalarArrayOpExpr` may change compared to the initial list of OR 
expressions. If there are indexes that cover the same set of columns but 
in reverse order, this could potentially alter the position of a 
Subplan. However, I believe this is a rare case; it is supported by the 
initial OR path and should be acceptable.
I beg your pardon - I forgot that we've restricted the feature's scope 
and can't combine OR clauses into ScalarArrayOpExpr if the args list 
contains references to different columns.

So, my note can't be applied here.

--
regards, Andrei Lepikhov




Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-01-28 Thread Amit Kapila
On Mon, Dec 30, 2024 at 11:05 AM Peter Smith  wrote:
>
> I think we are often too quick to throw out perfectly good tests.
> Citing that some similar GUCs don't do testing as a reason to skip
> them just seems to me like an example of "two wrongs don't make a
> right".
>
> There is a third option.
>
> Keep the tests. Because they take excessive time to run, that simply
> means you should run them *conditionally* based on the PG_TEST_EXTRA
> environment variable so they don't impact the normal BF execution. The
> documentation [1] says this env var is for "resource intensive" tests
> -- AFAIK this is exactly the scenario we find ourselves in, so is
> exactly what this env var was meant for.
>
> Search other *.pl tests for PG_TEST_EXTRA to see some examples.
>

I don't see the long-running tests to be added under PG_TEST_EXTRA as
that will make it unusable after some point. Now, if multiple senior
members feel it is okay to add long-running tests under PG_TEST_EXTRA
then I am open to considering it. We can keep this test as a separate
patch so that the patch is being tested in CI or in manual tests
before commit.

-- 
With Regards,
Amit Kapila.




Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done

2025-01-28 Thread Andrey Borodin



> On 21 Jan 2025, at 16:47, Roman Eskin  wrote:
> 
>> 
>> Persisting recovery signal file for some _timeout_ seems super dangerous to 
>> me. In distributed systems every extra _timeout_ is a source of complexity, 
>> uncertainty and despair.
> 
> The approach is not about persisting the signal files for some timeout. 
> Currently the files are removed in StartupXLOG() before 
> writeTimeLineHistory() and PerformRecoveryXLogAction() are called. The 
> suggestion is to move the file removal after PerformRecoveryXLogAction() 
> inside StartupXLOG().

Sending a node into a repeated promote-fail cycle without resolving the root
cause seems like an even less appealing idea.
If something prevented promotion, why should we retry with this particular method?

Even in the case of the transient failure you described - power loss - it does
not sound like a very good idea to retry promotion after coming back online. The
user will get an unexpected split-brain.


Best regards, Andrey Borodin.



Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-01-28 Thread Nisha Moond
On Mon, Jan 27, 2025 at 4:20 PM Amit Kapila  wrote:
>
> On Mon, Jan 27, 2025 at 11:00 AM Nisha Moond  wrote:
> >
> > I discussed the above comments further with Peter off-list, and here
> > are the v63 patches with the following changes:
> >  patch-001: The Assert and related comments have been updated for clarity.
> >
>
> The 0001 patch should be discussed in a separate thread as those are
> general improvements that are useful even without the main patch we
> are trying to achieve in this thread. I suggest we break it into three
> patches (a) Ensure the same inactive_since time for all slots, (b)
> Raise an error for invalid slots during ReplicationSlotAcquire(); tell
> in the commit message, without this patch when such an ERROR would
> have otherwise occurred, and (c) Changes in
> InvalidatePossiblyObsoleteSlot(), I suggest to leave this change for
> later as this impacts the core logic of invalidation.
>

I have started a new thread for these general improvements and have
separated the changes (a) and (b) into different patches.

You can find the new thread at [1].

> *
> @@ -812,7 +823,7 @@ ReplicationSlotAlter(const char *name, const bool 
> *failover,
>   Assert(MyReplicationSlot == NULL);
>   Assert(failover || two_phase);
>
> - ReplicationSlotAcquire(name, false);
> + ReplicationSlotAcquire(name, false, false);
>
> Why don't we want to give ERROR during Alter? I think it is okay to
> not give ERROR for invalid slots during Drop as we are anyway removing
> such slots.
>

Because ReplicationSlotAlter() already handles errors immediately
after acquiring the slot. It raises errors for invalidated slots and
also raises a different error message if the slot is a physical one.
So, in the case of ALTER, I feel it is okay to acquire the slot first
without raising errors and then handle errors in the pre-defined way.
Similar immediate error handling is not available in other places.

[1] 
https://www.postgresql.org/message-id/CABdArM6pBL5hPnSQ%2B5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw%40mail.gmail.com

--
Thanks,
Nisha




Re: Interrupts vs signals

2025-01-28 Thread Andres Freund
Hi,

I was chatting with Heikki about this patch and he mentioned that he recalls a
patch that did some work to unify the signal replacement, procsignal.h and
CHECK_FOR_INTERRUPTS(). Thomas, that was probably from you? Do you have a
pointer, if so?

It does seem like we're going to have to do some unification here. We have too
many different partially overlapping, partially collaborating systems here.


- procsignals - kinda like the interrupts here, but not quite. Probably can't
  just merge them 1:1 into the proposed mechanism, or we'll run out of bits
  rather soon.  I don't know if we want the relevant multiplexing to be built
  into interrupt.h or not.

  Or we ought to redesign this mechanism to deal with more than 32 types of
  interrupt.


- pmsignal.h - basically the same thing as here, except for signals sent *to*
  the postmaster.

  It might or might not be required to keep this separate. There are different
  "reliability" requirements...


- CHECK_FOR_INTERRUPTS(), which uses the mechanism introduced here to react to
  signals while blocked, using RaiseInterrupt() (whereas it did SetLatch()
  before).

  A fairly simple improvement would be to use different InterruptType for
  e.g. termination and query cancellation.

  But even with this infrastructure in place, we can't *quite* get away from
  signal handlers, because we need some way to set at the very least
  InterruptPending (and perhaps ProcDiePending, QueryCancelPending etc,
  although those could be replaced with InterruptIsPending()). The
  infrastructure introduced with these patches provides neither a way to set
  those variables in response to delivery of an interrupt, nor can we
  currently set them from another backend, as they are global variables.

  We could of course do InterruptsPending(~0) in
  INTERRUPTS_PENDING_CONDITION(), but it would require analyzing the increased
  indirection overhead as well as the "timeliness" guarantees.

  Today we don't need a memory barrier around checking InterruptPending,
  because delivery of a signal (via SetLatch()) ensures that. But I
  think we would need one if we were to not send signals for
  cancellation/termination etc. I.e. right now the overhead of delivery is
  bigger, but checking whether there are pending signals is cheaper.


  Another aspect of this is that we're cramming more and more code into
  ProcessInterrupts(), in a long weave of ifs.  I wonder if somewhere along
  ipc/interrupt.h there should be a facility to register callbacks to react to
  delivered interrupts.  It'd need to be a bit more than just a mapping of
  InterruptType to callback, because of things like InterruptHoldoffCount,
  CritSectionCount, QueryCancelHoldoffCount.
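
  To make that idea concrete, a very rough sketch of such a registry (all
  names here are invented, and it glosses over the holdoff and
  critical-section interactions just mentioned):

  #include <stdint.h>

  typedef void (*InterruptCallback) (void);

  /* one optional callback per interrupt bit; purely illustrative */
  static InterruptCallback interrupt_callbacks[32];

  static void
  RegisterInterruptCallback(int interrupt, InterruptCallback cb)
  {
      interrupt_callbacks[interrupt] = cb;
  }

  /* ProcessInterrupts() could then walk the pending bitmask and dispatch,
   * instead of growing one long chain of ifs. */
  static void
  DispatchPendingInterrupts(uint32_t pending)
  {
      while (pending != 0)
      {
          int     i = __builtin_ctz(pending);     /* lowest set bit */

          pending &= pending - 1;                 /* clear it */
          if (interrupt_callbacks[i])
              interrupt_callbacks[i] ();
      }
  }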


- timeout.c - will need a fairly big renovation for threading.  IIUC Heikki is
  thinking that we'll have a dedicated timer thread.

  It's a lot more expensive to wake up another thread than your own. A
  process-local SIGALRM delivery does not require an inter-processor
  interrupt, but doing a pthread_sigqueue() or whatnot does. Which would be a
  bit silly, because what most of the timeout handlers do is set some
  variables to true and then call SetLatch().  It might not matter *that*
  much, because we don't expect timeout "delivery" that frequently, but I'm
  also not convinced it won't be noticeable.

  Nearly all the timeouts we currently have could actually just specify the
  interrupt that ought to be raised, to be sent by that timeout thread. The
  one exception to that is StartupPacketTimeoutHandler() (which does
  _exit(1)).

  Without that one exception, a renovated timeout.c would not need to support
  handlers, which would be rather nice.  It's not at all obvious to me that
  StartupPacketTimeoutHandler() and process_startup_packet_die()
  actually need to do _exit(); we should be able to do non-blocking network IO
  around these parts - at least for SSL we largely do, we just don't check for
  interrupts in the right places.


Other questions:

- Can ipc/interrupts.c style interrupts be sent by postmaster? I think they
  ought to be, before long.


Greetings,

Andres Freund




Re: Eagerly scan all-visible pages to amortize aggressive vacuum

2025-01-28 Thread Masahiko Sawada
On Mon, Jan 27, 2025 at 1:22 PM Melanie Plageman
 wrote:
>
> On Mon, Jan 27, 2025 at 12:52 PM Masahiko Sawada  
> wrote:
> >
> > Thank you for updating the patch. I was reviewing the v10 patch and
> > had some comments. I believe these comments are still valid for v11,
> > but please ignore them if outdated.
>
> Thanks so much for the review!
>
> > +   if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
> > +   TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
> > +
> >vacrel->cutoffs.FreezeLimit))
> > +   oldest_unfrozen_before_cutoff = true;
> > +
> > +   if (!oldest_unfrozen_before_cutoff &&
> > +   MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
> > +   MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
> > +
> >  vacrel->cutoffs.MultiXactCutoff))
> > +   oldest_unfrozen_before_cutoff = true;
> >
> > Given that our freeze check strictly checks if an xid is older than
> > the cutoff (exclusive bound), I think we should check if the
> > relfrozenxid and relminmxid strictly precede the cutoff values.
>
> Makes sense. I've changed that.
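
(As a sketch, the strict-bound variant being described would look like this;
not necessarily the exact code that ended up in v12:)

if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
    TransactionIdPrecedes(vacrel->cutoffs.relfrozenxid,
                          vacrel->cutoffs.FreezeLimit))
    oldest_unfrozen_before_cutoff = true;

if (!oldest_unfrozen_before_cutoff &&
    MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
    MultiXactIdPrecedes(vacrel->cutoffs.relminmxid,
                        vacrel->cutoffs.MultiXactCutoff))
    oldest_unfrozen_before_cutoff = true;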
>
> > ---
> > if (*was_eager_scanned)
> > vacrel->eager_scanned_pages++;
> >
> > How about incrementing this counter near the place where incrementing
> > scanned_pages (i.e., at the beginning of the loop of
> > heap_vac_scan_next_block())? It would make it easy to understand the
> > difference between eager_scanned_pages and scanned_pages.
>
> Right. That makes sense. I've changed that in the attached v12.
>
> > ---
> > + * No vm_page_frozen output parameter (like what is passed to
> > + * lazy_scan_prune()) is passed here because empty pages are always frozen 
> > and
> > + * thus could never be eagerly scanned.
> >
> > The last part "thus could never be eagerly scanned" confused me a bit;
> > IIUC we count all pages that are scanned because of this new eager
> > scan feature as "eagerly scanned pages". We increment
> > eager_scanned_pages counter even if the page is either new or empty.
> > This fact seems to contradict the sentence "empty pages could never be
> > eagerly scanned".
>
> Ah, so what I mean by this is that the first time an empty page is
> vacuumed, it is set all-visible and all-frozen in the VM. We only
> eagerly scan pages that are _only_ all-visible (not also all-frozen).
> So, no empty pages will ever be eligible for eager scanning. I've
> updated the comment, because I agree it was confusing. See what you
> think now.

Thank you for updating the patch! These updates look good to me.

BTW I realized that we need to update tab-complete.c too to support
the tab-completion for vacuum_max_eager_freeze_failure_rate.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Interrupts vs signals

2025-01-28 Thread Thomas Munro
On Wed, Jan 29, 2025 at 10:15 AM Andres Freund  wrote:
> I was chatting with Heikki about this patch and he mentioned that he recalls a
> patch that did some work to unify the signal replacement, procsignal.h and
> CHECK_FOR_INTERRUPTS(). Thomas, that was probably from you? Do you have a
> pointer, if so?

Yeah, attempts at that were included in earlier versions in this very
thread, but then Heikki came up with the
let's-just-replace-latches-completely concept and rejiggered the lower
level patches around that (which I liked and support).  I will try to
rebase/reorganise the accompanying procsignal removal part on top of
this version today, more soon...  (I had meant to do that earlier but
got a bit distracted by summer holiday season down here and some
personal stuff, sorry for the delay on that).

> It does seem like we're going to have to do some unification here. We have too
> many different partially overlapping, partially collaborating systems here.

Right, my goal from the start of this thread was always a full
unification leaving just one single system for this type of IPC, and
Heikki's latest version is a transitional point; hopefully with the
other stuff rebased on top we'll be getting pretty close to that, at
least for the procsignal part.  More soon.




Re: Non-text mode for pg_dumpall

2025-01-28 Thread Mahendra Singh Thalor
On Fri, 24 Jan 2025 at 20:50, jian he  wrote:
>
> On Thu, Jan 23, 2025 at 6:35 PM Mahendra Singh Thalor
>  wrote:
>
> hi.
> After some tests and thinking about your reply, I admit that using
> expand_dbname_patterns
> in pg_restore will not work.
> We need to do pattern matching against the map.dat file.
> Please check the attached v12 series based on your
> v11_pg_dumpall-with-directory-tar-custom-format-21-jan.patch
>
> v12-0001 cosmetic change.
> v12-0002 implement pg_resore --exclude-database=PATTERN.
> main gist of implementation:
> for each database name in map.dat file,
> check if this database name pattern matches with PATTERN or not.
> pattern matching is using processSQLNamePattern.
>
> your substring will not work.
> some of the test cases.
> $BIN10/pg_restore --exclude-database=* -Cd template1 --verbose dir10 >
> dir_format 2>&1
> $BIN10/pg_restore --exclude-database=*x* -Cd template1 --verbose dir10
> > dir_format 2>&1
> $BIN10/pg_restore --exclude-database=?* -Cd template1 --verbose dir10
> > dir_format 2>&1

I merged v12_0001 into the latest patch. There was one bug in v12_001*
which was fixed in v12_0002*.

-extern PGconn **connectDatabase*(const char *dbname,
> - const char *connection_string, const char *pghost,
> - const char *pgport, const char *pguser,
> - trivalue prompt_password, bool fail_on_error,
> - const char *progname, const char **connstr, int *server_version);
> +extern PGconn *v(const char *dbname, const char *connection_string, const
> char *pghost,


As per v12_0002*, I made some changes to the current patch to avoid using
the substring function.

On Tue, 28 Jan 2025 at 11:57, Mahendra Singh Thalor 
wrote:
>
> On Tue, 28 Jan 2025 at 10:19, Srinath Reddy  wrote:
> >
> >
> > Hi mahendra,
> >
> > I have reviewed the code in the v11 patch and it looks good to me.
> >
> > But in common_dumpall_restore.c there's parseDumpFormat, which is
common between pg_dumpall and pg_restore. As per the discussion in the [1]
thread, I don't think we should create a common API; as discussed in that
thread, there is a chance that in the future we might decide some format
is obsolete and desupport it in pg_dumpall while still supporting it in
pg_restore for compatibility reasons.

Fixed. In the latest patch, I removed the common parseDumpFormat function.
In older versions, I was using the same function for pg_dumpall and
pg_restore, but some common code from this patch has already been committed,
and as per the discussion we will keep the parsing handled separately, so the
parseDumpFormat function is now added only in the pg_dumpall.c file.

On Sun, 26 Jan 2025 at 20:17, jian he  wrote:
>
> hi.
> attached patching trying to refactor ReadOneStatement
> for properly handling the single and double quotes.
> the commit message also has some tests on it.
>
> it is based on your
> v11_pg_dumpall-with-directory-tar-custom-format-21-jan.patch.

Okay. I am doing some more testing and code review for these types of test
cases. I will merge this delta into the next version.

> >
>
> Okay. Thanks for the review. I will make changes as per the discussion in
> another thread.
>
>
> On Tue, 28 Jan 2025 at 11:52, Srinath Reddy  wrote:
> >
> > make check-world fails. I think we don't need $port and $filename;
instead we can use something like 'xxx', so I fixed that in the below patch.
>
> In an offline discussion, Andrew already reported this test case. I will
> fix this in the next version.
>
Fixed.

Thanks Jian and Srinath for the testing and review.

Here, I am attaching an updated patch for review and testing.

I merged some of the delta patches that were shared by Jian and also made
some fixes.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v13_pg_dumpall-with-directory-tar-custom-format-28-jan.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-01-28 Thread Nisha Moond
On Tue, Jan 28, 2025 at 3:26 PM Amit Kapila  wrote:
>
> On Mon, Dec 30, 2024 at 11:05 AM Peter Smith  wrote:
> >
> > I think we are often too quick to throw out perfectly good tests.
> > Citing that some similar GUCs don't do testing as a reason to skip
> > them just seems to me like an example of "two wrongs don't make a
> > right".
> >
> > There is a third option.
> >
> > Keep the tests. Because they take excessive time to run, that simply
> > means you should run them *conditionally* based on the PG_TEST_EXTRA
> > environment variable so they don't impact the normal BF execution. The
> > documentation [1] says this env var is for "resource intensive" tests
> > -- AFAIK this is exactly the scenario we find ourselves in, so is
> > exactly what this env var was meant for.
> >
> > Search other *.pl tests for PG_TEST_EXTRA to see some examples.
> >
>
> I don't see the long-running tests to be added under PG_TEST_EXTRA as
> that will make it unusable after some point. Now, if multiple senior
> members feel it is okay to add long-running tests under PG_TEST_EXTRA
> then I am open to considering it. We can keep this test as a separate
> patch so that the patch is being tested in CI or in manual tests
> before commit.
>

Please find the attached v64 patches. The changes in this version
w.r.t. the older v63 patch are as follows:
- The changes from the v63-0001 patch have been moved to a separate thread [1].
- The v63-0002 patch has been split into two parts in v64:
  1) 001 patch: Implements the main feature - inactive timeout-based
slot invalidation.
  2) 002 patch: Separates the TAP test "044_invalidate_inactive_slots"
as suggested above.

[1] 
https://www.postgresql.org/message-id/CABdArM6pBL5hPnSQ%2B5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw%40mail.gmail.com

--
Thanks,
Nisha


v64-0001-Introduce-inactive_timeout-based-replication-slo.patch
Description: Binary data


v64-0002-Add-TAP-test-for-slot-invalidation-based-on-inac.patch
Description: Binary data


Re: Feature Request: Add AES-128-CFB Mode Support to pgcrypto

2025-01-28 Thread Daniel Gustafsson
> On 28 Jan 2025, at 11:46, Vladyslav Nebozhyn  wrote:

> • Ease of Implementation: OpenSSL already provides a straightforward API 
> for AES-128-CFB, so adding it to pgcrypto should require only a few lines of 
> code.

IIRC we already support CFB for Blowfish so I think it would be quite easy to
add.  If you propose a patch for adding this I can volunteer to review it.

--
Daniel Gustafsson





Windows CFBot is broken because ecpg dec_test.c error

2025-01-28 Thread Jelte Fennema-Nio
Since about ~11 hours ago the ecpg test is consistently failing on
Windows with this error[1]:

> Could not open file 
> C:/cirrus/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for 
> reading

I took a quick look at possible causes but couldn't find a clear
winner. My current guess is that there's some dependency rule missing
in the meson file and due to some infra changes files now get compiled
in the wrong order.

One recent suspicious commit seems to be:
7819a25cd101b574f5422edb00fe3376fbb646da
But there are a bunch of successful changes that include that commit,
so it seems to be a red herring. (CC-ed Noah anyway)

[1]: https://cirrus-ci.com/task/6305422665056256?logs=check_world#L143




RE: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-01-28 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san,

I love the idea. I've roughly tested the patch and it worked on my env.
Here are some initial comments...

1. xloglevelworker.c
```
+#include "replication/logicalxlog.h"
```

xloglevelworker.c includes replication/logicalxlog.h, but it does not exist.
The line had to be removed to build and test it.

2.
```
+static void
+writeUpdateWalLevel(int new_wal_level)
+{
+   XLogBeginInsert();
+   XLogRegisterData((char *) (&new_wal_level), sizeof(bool));
+   XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
+}
```

IIUC the data length should be sizeof(int) instead of sizeof(bool).
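
As a sketch, the registration would then become:

XLogBeginInsert();
XLogRegisterData((char *) &new_wal_level, sizeof(int));    /* int, not bool */
XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);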

3.
Is there a reason why the process does not wait till the archiver exits?

4.
When I dumped wal files, I found that XLOG_UPDATE_WAL_LEVEL cannot be 
recognized:

```
rmgr: XLOGlen (rec/tot): 27/27, tx:  0, lsn: 
0/03050838, prev 0/03050800, desc: UNKNOWN (f0) wal_level logical
```

xlog_identify() must be updated as well.

5.
When I changed "logical" to "replica", postgres outputs like below:

```
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "wal_level" changed to "replica"
LOG:  wal_level control worker started
LOG:  changing wal_level from "logical" to "replica"
LOG:  wal_level has been decreased to "replica"
LOG:  successfully changed wal_level from "logical" to "replica"
```

ISTM that both the postmaster and the wal_level control worker say something like
"wal_level changed", which is a bit strange to me. Since the GUC can't be renamed,
can we use another name for the wal_level control state?

6.
With the patch applied, wal_level can be changed to minimal even while
streaming replication is running. If we do that, the walsender exits immediately
and the FATAL below appears periodically until the standby stops. The same can
be said for logical replication:

```
FATAL:  streaming replication receiver "walreceiver" could not connect to the 
primary server:
connection to server on socket "/tmp/.s.PGSQL." failed:
FATAL:  WAL senders require "wal_level" to be "replica" or "logical
```

I know this is not perfect, but can we avoid the issue by rejecting the GUC
update if a walsender exists? Another approach is not to update the value when
replication slots would need to be invalidated.

--
Best regards,
Hayato Kuroda



Re: Virtual generated columns

2025-01-28 Thread Shlok Kyal
On Mon, 27 Jan 2025 at 15:29, Peter Eisentraut  wrote:
>
> On 15.01.25 20:37, Peter Eisentraut wrote:
> > On 15.01.25 15:12, Dean Rasheed wrote:
> >> On Tue, 14 Jan 2025 at 13:37, Peter Eisentraut 
> >> wrote:
> >>>
> >>> Here is a new patch with that fixed and also a few
> >>> tweaks suggested by Jian.
> >>>
> >>
> >> I'm hoping to push my RETURNING OLD/NEW patch [1] soon, so I thought
> >> that I would check how it works together with this patch. The good
> >> news is that AFAICS everything just works, and it's possible to return
> >> old/new virtual generated columns in DML queries as expected.
> >>
> >> It did require a minor update, because my patch adds a new
> >> "result_relation" argument to ReplaceVarsFromTargetList() -- needed in
> >> DML queries because, when propagating a Var's old/new
> >> varreturningtype, replacement Vars need to be handled differently
> >> depending on whether or not they refer to the result relation. So that
> >> affects expand_generated_columns_internal(), when called from
> >> fireRIRrules(). OTOH, from expand_generated_columns_in_expr() it's OK
> >> to just pass 0 as the result relation index, because there won't be
> >> any old/new Vars in an expression that's not part of a DML query.
> >>
> >> Attached is the delta patch I used to handle this, along with a couple
> >> of simple test cases. It doesn't really matter which feature makes it
> >> in first, but the one that comes second will need to do something like
> >> this.
> >
> > Ok, I'll wait if you want to go ahead with yours soon.
>
> Here is an updated patch that integrates the above changes and also
> makes some adjustments now that the logical replication configuration
> questions are resolved.  I think this is complete now.
>
> But I'm seeing mysterious CI failures that have me stumped.  For example:
>
> https://cirrus-ci.com/task/5924251028422656
>
> I have seen this particular pgbench test failure sporadically but
> several times, and I have not seen it anywhere without this patch, and
> never locally.  The macOS task on the cfbot CI is very flaky right now,
> so it's hard to get a good baseline.  Also, it seems to me that this
> failing test could not possibly be further away from the code that the
> patch changes, so I'm thinking timing problems, but it only happens on
> the macOS task.  Really weird.

Hi,

I did some testing related to logical replication on the patch:

Test1: With row filter on publisher

-- publisher:
CREATE TABLE t1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
create publication pub1 for table t1 where (b > 50);
INSERT INTO t1 values(1);
INSERT INTO t1 values(32);

-- subscriber
CREATE TABLE t1 (a int, b int);
create subscription test1 connection 'dbname=postgres host=localhost
port=5432' publication pub1;
select * from t1;
 a  | b
----+---
 32 |
(1 row)

Only records where b>50 are replicated to the subscriber.



Test 2: Replication of virtual generated column using user defined operator

-- publisher
CREATE OPERATOR === (
leftarg = integer,
rightarg = integer,
procedure = int4eq
);
CREATE TABLE t1 (a int, b bool GENERATED ALWAYS AS (a === 10)
VIRTUAL); INSERT INTO t1 values(1);
INSERT INTO t1 values(10);

-- create publication with row filter with user defined operator
create publication pub1 for table t1 where (a === 10);

ERROR:  invalid publication WHERE expression LINE 1: create
publication pub1 for table t1 where (a === 10)
^
DETAIL:  User-defined operators are not allowed.

-- create publication on virtual generated column using user defined operator
create publication pub1 for table t1 where (b = 't');
ERROR:  invalid publication WHERE expression
DETAIL:  User-defined operators are not allowed.



Test 3: CREATE PUBLICATION on column list with Virtual generated column

CREATE TABLE t1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
create publication pub1 for table t1 (a, b);

It is failing with error:
ERROR:  cannot use virtual generated column "b" in publication column list.



Test 4: Update publication on non virtual gen

CREATE TABLE t1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
create publication pub1 for table t1 (a);
alter table t1 replica identity full;
update t1 set a = 10;

ERROR:  cannot update table "t1"
DETAIL:  Column list used by the publication does not cover the
replica identity.



Test 5: Update publication on non virtual gen with no column list specified

CREATE TABLE t1 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
create pu

Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots

2025-01-28 Thread Nisha Moond
Hello Hackers,
(CC people involved in the earlier discussion)

While implementing slot invalidation based on inactive(idle) timeout
(see [1]), several general optimizations and improvements were
identified.

This thread is a spin-off from [1], intended to address these
optimizations separately from the main feature. As suggested in [2],
the improvements are divided into two parts:

Patch-001: Update the logic to ensure all inactive slots have the same
'inactive_since' time when restoring the slots from disk in
RestoreSlotFromDisk() and when updating the synced slots on standby in
update_synced_slots_inactive_since().

Patch-002: Raise an error for invalid slots while acquiring them in
ReplicationSlotAcquire().
Currently, a process can acquire an invalid slot but may eventually
error out at later stages. For example, if a process acquires a slot
invalidated due to wal_removed, it will later fail in
CreateDecodingContext() when trying to access the removed WAL. The
idea here is to improve error handling by detecting invalid slots
earlier.
A new parameter, "error_if_invalid", is introduced in
ReplicationSlotAcquire(). If the caller specifies
error_if_invalid=true, an error is raised immediately instead of
letting the process acquire the invalid slot first and then fail later
due to the invalidated slot.
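
For illustration, the resulting signature would look roughly like this (a
sketch based on the description above, not necessarily the exact patch):

/* existing callers pass error_if_invalid = false to keep today's behavior */
extern void ReplicationSlotAcquire(const char *name, bool nowait,
                                   bool error_if_invalid);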

The v1 patches are attached, any feedback would be appreciated!


[1] 
https://www.postgresql.org/message-id/flat/calj2acw4aue-_ufqojdwcen-xxolghmvrfnl8snw_tz5nje...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAA4eK1LiDjz%2BF8hEYG0_ux%3DrqwhxnTuWpT-RKNDWaac3w3bWNw%40mail.gmail.com

--
Thanks,
Nisha


v1-0001-Ensure-same-inactive_since-time-for-all-inactive-.patch
Description: Binary data


v1-0002-Raise-Error-for-Invalid-Slots-in-ReplicationSlotA.patch
Description: Binary data


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2025-01-28 Thread Tatsuo Ishii
>> +/*
>> + * ignorenulls_getfuncarginframe
>> + * For IGNORE NULLS, get the next nonnull value in the frame, moving 
>> forward or backward
>> + * until we find a value or reach the frame's end.
>> + */
>> +static Datum
>> +ignorenulls_getfuncarginframe(WindowObject winobj, int argno,
>>
>> Do you assume that win_nonnulls is sorted by pos? I don't think it's
>> necessarily true that pos in the win_nonnulls array is sorted. Is that ok?
> 
> Yes it must be sorted on my understanding of the code.

Then the patch has a problem. I ran the query below and examined
win_nonnulls.  It seems it was not sorted.

SELECT
  x,y,
  nth_value(y,1) IGNORE NULLS OVER w
FROM (VALUES (1,1), (2,2), (3,NULL), (4,4), (5,NULL), (6,6), (7,7)) AS t(x,y)
WINDOW w AS (ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE 
CURRENT ROW);


(gdb) p *winobj->win_nonnulls @ winobj->nonnulls_len
$8 = {1, 0, 3, 6, 5}

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Pgoutput not capturing the generated columns

2025-01-28 Thread Amit Kapila
On Thu, Jan 23, 2025 at 9:58 AM vignesh C  wrote:
>

I have pushed the remaining part of this patch. Now, we can review the
proposed documentation part.

I feel we don't need the Examples sub-section for this part of the
docs. The behavior is quite clear from the "Behavior Summary"
sub-section table. Also, I don't think the sub-sections 29.6.1 and
29.6.2 are worth having as separate sub-chapters.

*
+ non-PostgreSQL database via plugin output,

In the above sentence, shouldn't it be 'output plugin' instead of
'plugin output'? We use that way at other places in the docs.

-- 
With Regards,
Amit Kapila.




Re: SQL Property Graph Queries (SQL/PGQ)

2025-01-28 Thread Junwang Zhao
On Tue, Jan 28, 2025 at 7:03 PM Ashutosh Bapat
 wrote:
>
> On Tue, Jan 28, 2025 at 4:31 PM Junwang Zhao  wrote:
> >
> > On Tue, Jan 28, 2025 at 6:17 PM Ashutosh Bapat
> >  wrote:
> > >
> > > On Fri, Jan 24, 2025 at 9:16 PM Junwang Zhao  wrote:
> > > >
> > > > I figured out it's because I have the `-DWRITE_READ_PARSE_PLAN_TREES`
> > > > option; removing this option clears the warning, but I'm not sure if
> > > > this is a hidden issue.
> > > >
> > >
> > > Thanks for the hint. I found the issue and the fix appears in 0004.
> > > Some members of RangeTblEntry were not being handled in read/out
> > > functions.
> >
> > Thanks for the fix, I will apply and test again.
> >
> > >
> > > > >
> > > > > 0001-0009 patches are the same as the previous set sans mergeconflict 
> > > > > fixes.
> > > > > 0010 - is test for \dGx - to make sure that the extended format output
> > > > > works with \dG. We may decide not to have this patch in the final
> > > > > commit. But no harm in being there til that point.
> > > >
> > > > I have some changes based on the latest patch set. I attached two 
> > > > patches with
> > > > the following summary.
> > > >
> > > > 0001:
> > > > - doc: some of the query in ddl.sgml can not be executed
> > >
> > > The standard seems to require a property reference to be qualified by
> > > element pattern variable, even if the property is referenced within
> > > the same element pattern bound to the variable. Hence I accepted your
> > > document fixes which qualify property reference with element pattern
> > > variable.
> > >
> > > However, I didn't understand why you changed a LABEL name from order to 
> > > order_.
> >
> > Ah, that's because `order` is a keyword in SQL (ORDER BY), so the alias
> > conflicts.
> > I'm not saying `order_` is a good name though.
>
> I didn't realize that. Thanks for catching. In that case just orders
> or cust_orders?

Yeah, I think explicitly LABELed as `orders` makes sense.

>
> >
> > >
> > > > - after path factor was introduced, some comments doesn't apply
> > >
> > > Thanks pointing those out. Accepted after rephrasing those and also
> > > correcting some related comments.
> > >
> > > >
> > > > 0002:
> > > > - add a get_propgraph_element_alias_name function and do some trivial 
> > > > refactor
> > > >
> > >
> > > I see you have added some negative tests - object not found tests, but
> > > I didn't see positive tests. Hence I haven't added those changes in
> > > the attached patchset. But we certainly need those changes. You may
> > > want to submit a patch with positive tests. That code needs to be
> > > fixed before committing anyway.
> >
> > Ok, I'll add positive tests.
>
> Thanks.
>
> --
> Best Wishes,
> Ashutosh Bapat



-- 
Regards
Junwang Zhao




Re: NOT ENFORCED constraint feature

2025-01-28 Thread Amul Sul
On Tue, Jan 28, 2025 at 4:01 PM Peter Eisentraut  wrote:
>
> On 20.01.25 17:53, Amul Sul wrote:
> >> Attached is a new set of patches. Please ignore patch 0001 here, which
> >> was posted separately [1] -- proposes allowing invalid foreign key
> >> constraints on partitioned tables. Patch 0002 refactors
> >> tryAttachPartitionForeignKey(), primarily needed by the new patch
> >> v9-0006 included in this set. This patch handles merging constraints
> >> with different enforceability. Without it, a new constraint would be
> >> created on the child. However, this patch introduces additional code
> >> that may appear somewhat messy or confusing. I've added plenty of
> >> comments to clarify the logic. While I’ve done some testing, it hasn’t
> >> been extensive. I plan to do more testing in the next week.
> >>
> >> Please let me know if we should continue with patch 0006 improvement,
> >> or if the feature up to patch 0005, which enforces matching
> >> enforceability before merging constraints, is sufficient.
> >>
> >> 1]  
> >> https://postgr.es/m/CAAJ_b96Bp=-ZwihPPtuaNX=srz0u6zsxd3+fgaro0juka8v...@mail.gmail.com
> >>
> >
> > I made minor updates to the attached version, particularly ensuring
> > that the order of relation opening and closing remains the same as
> > before in ATExecAlterConstrRecurse(). Additionally, I’ve added a
> > refactoring patch to make createForeignKeyActionTriggers() accept a
> > relation OID instead of a Relation, making this function consistent
> > with others like createForeignKeyCheckTriggers().
>
> I think v10-0001 has been committed separately now.  I can't
> successfully apply the remaining patches though.  Could you supply an
> updated patch set?
>

Sure, I plan to work on that tomorrow.

> Just from looking at them, the refactoring patches 0002, 0003, 0004 seem ok.
>
> 0005 also seems ok.
>

Thank you.

>
> In 0006, this change in the test output should be improved:
>
>   -- XXX: error message is misleading here
>   ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key ENFORCED;
> -ERROR:  ALTER CONSTRAINT statement constraints cannot be marked ENFORCED
> -LINE 1: ...TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key ENFORCED;
> -  ^
> +ERROR:  constraint "unique_tbl_i_key" of relation "unique_tbl" is not a
> foreign key constraint
>
> Maybe this should be along the lines of "ALTER CONSTRAINT ... ENFORCED
> is not supported for %s constraints" or something like that.
>

Ok, let me see what can be done here.

>
> This behavior is not correct:
>
> +-- Changing it back to ENFORCED will leave the constraint in the NOT
> VALID state
> +ALTER TABLE FKTABLE ALTER CONSTRAINT fktable_ftest1_fkey ENFORCED;
> +-- Which needs to be explicitly validated.
> +ALTER TABLE FKTABLE VALIDATE CONSTRAINT fktable_ftest1_fkey;
>
> Setting the constraint to enforced should enforce it immediately.  This
> SQL statement is covered by the SQL standard.  Also, I think it's a
> better user experience if you don't require two steps.
>

Let me clarify: the constraint will be enforced for new inserts and
updates, but it won't be validated against existing data, so the
constraint will remain marked as not valid.

Regards,




Re: SQL Property Graph Queries (SQL/PGQ)

2025-01-28 Thread Junwang Zhao
On Tue, Jan 28, 2025 at 6:17 PM Ashutosh Bapat
 wrote:
>
> On Fri, Jan 24, 2025 at 9:16 PM Junwang Zhao  wrote:
> >
> > I figured out it's because I have `-DWRITE_READ_PARSE_PLAN_TREES` this 
> > option,
> > removing this option clears the warning, but I'm not sure if this is a
> > hidden issue.
> >
>
> Thanks for the hint. I found the issue and the fix appears in 0004.
> Some members of RangeTblEntry were not being handled in read/out
> functions.

Thanks for the fix, I will apply and test again.

>
> > >
> > > 0001-0009 patches are the same as the previous set sans mergeconflict 
> > > fixes.
> > > 0010 - is test for \dGx - to make sure that the extended format output
> > > works with \dG. We may decide not to have this patch in the final
> > > commit. But no harm in being there til that point.
> >
> > I have some changes based on the latest patch set. I attached two patches 
> > with
> > the following summary.
> >
> > 0001:
> > - doc: some of the query in ddl.sgml can not be executed
>
> The standard seems to require a property reference to be qualified by
> element pattern variable, even if the property is referenced within
> the same element pattern bound to the variable. Hence I accepted your
> document fixes which qualify property reference with element pattern
> variable.
>
> However, I didn't understand why you changed a LABEL name from order to 
> order_.

Ah, that's because `order` is an SQL keyword (ORDER BY), so the alias
conflicts with it.
I'm not saying `order_` is a good name, though.

>
> > - after path factor was introduced, some comments doesn't apply
>
> Thanks pointing those out. Accepted after rephrasing those and also
> correcting some related comments.
>
> >
> > 0002:
> > - add a get_propgraph_element_alias_name function and do some trivial 
> > refactor
> >
>
> I see you have added some negative tests - object not found tests, but
> I didn't see positive tests. Hence I haven't added those changes in
> the attached patchset. But we certainly need those changes. You may
> want to submit a patch with positive tests. That code needs to be
> fixed before committing anyway.

Ok, I'll add positive tests.

>
> --
> Best Wishes,
> Ashutosh Bapat



-- 
Regards
Junwang Zhao




Re: SQL Property Graph Queries (SQL/PGQ)

2025-01-28 Thread Ashutosh Bapat
On Tue, Jan 28, 2025 at 4:31 PM Junwang Zhao  wrote:
>
> On Tue, Jan 28, 2025 at 6:17 PM Ashutosh Bapat
>  wrote:
> >
> > On Fri, Jan 24, 2025 at 9:16 PM Junwang Zhao  wrote:
> > >
> > > I figured out it's because I have `-DWRITE_READ_PARSE_PLAN_TREES` this 
> > > option,
> > > removing this option clears the warning, but I'm not sure if this is a
> > > hidden issue.
> > >
> >
> > Thanks for the hint. I found the issue and the fix appears in 0004.
> > Some members of RangeTblEntry were not being handled in read/out
> > functions.
>
> Thanks for the fix, I will apply and test again.
>
> >
> > > >
> > > > 0001-0009 patches are the same as the previous set sans mergeconflict 
> > > > fixes.
> > > > 0010 - is test for \dGx - to make sure that the extended format output
> > > > works with \dG. We may decide not to have this patch in the final
> > > > commit. But no harm in being there til that point.
> > >
> > > I have some changes based on the latest patch set. I attached two patches 
> > > with
> > > the following summary.
> > >
> > > 0001:
> > > - doc: some of the query in ddl.sgml can not be executed
> >
> > The standard seems to require a property reference to be qualified by
> > element pattern variable, even if the property is referenced within
> > the same element pattern bound to the variable. Hence I accepted your
> > document fixes which qualify property reference with element pattern
> > variable.
> >
> > However, I didn't understand why you changed a LABEL name from order to 
> > order_.
>
> Ah, that's because `order` is a keyword of SQL(order by), so the alias
> is a conflict.
> I'm not saying `order_` is a good name though.

I didn't realize that. Thanks for catching. In that case just orders
or cust_orders?

>
> >
> > > - after path factor was introduced, some comments doesn't apply
> >
> > Thanks pointing those out. Accepted after rephrasing those and also
> > correcting some related comments.
> >
> > >
> > > 0002:
> > > - add a get_propgraph_element_alias_name function and do some trivial 
> > > refactor
> > >
> >
> > I see you have added some negative tests - object not found tests, but
> > I didn't see positive tests. Hence I haven't added those changes in
> > the attached patchset. But we certainly need those changes. You may
> > want to submit a patch with positive tests. That code needs to be
> > fixed before committing anyway.
>
> Ok, I'll add positive tests.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat




Re: Modern SHA2- based password hashes for pgcrypto

2025-01-28 Thread Bernd Helmle
Am Freitag, dem 24.01.2025 um 19:06 +0100 schrieb Alvaro Herrera:
> > So we behave exactly the same way as px_crypt_md5(): It stops after
> > the
> > first '$' after the magic byte preamble. For shacrypt, this could
> > be
> > the next '$' after the closing one of the non-mandatory 'rounds'
> > option, but with your example this doesn't happen since it gets
> > never
> > parsed. The salt length will be set to 0.
> 
> IMO silently using no salt or 0 iterations because the input is
> somewhat
> broken is bad security and should be rejected.  If we did so in the
> past
> without noticing, that's bad already, but we should not replicate
> that
> behavior any further.

Following Japin's example we parse the broken input after the magic
byte into

$6$rounds=1$/Zg436s2vmTwsoSz

So this is what gets passed over to the next steps after extracting the
magic byte.

We then try to parse the optional rounds option, which is expected to
appear at the very beginning of the string at this stage; that isn't the
case here, so we fall back to the default of 5000 iterations. No problem with
iterations here. Note that we don't output the rounds option in that
case. I adapted this from Drepper's code to be compatible with the
tests (it suppresses the rounds option in case the default is used). I
forgot to adjust the docs about this change.

But we don't extract the salt because of the first '$', which makes the
input handling conclude that there is nothing of interest for it. That
currently goes unnoticed, which might be a problem for the caller: you
can see that something is wrong when looking at the result, but if this
happens programmatically it might stay hidden.

I've looked what others do:

Drepper's code with the input above produces the following:

$5$=1$cWSkAaJa1mL912.HQqjIhODJ9b3S7hmgsb/k9Efp7.7

It parses the salt from the input as "=1".

Python's passlib is very strict when it comes to supported characters
within a salt string. It rejects everything thats not matching '[./0-
9A-Za-z]'. So when you provide the example above you get

  File "/usr/lib/python3.13/site-packages/passlib/utils/handlers.py",
line 1459, in _norm_salt
raise ValueError("invalid characters in %s salt" % cls.name)
ValueError: invalid characters in sha256_crypt salt

openssl-passwd does this:

echo test | openssl passwd -5 -stdin -salt
'$6$rounds=1$/Zg436s2vmTwsoSz'

$5$$6$rounds=1$$BFtTxJrvpb/ra8SnnXkiNCJ1HGZ3OOmwvyQ2TJZx44B

It absorbs everything up to 16 bytes and uses that as the salt string.

Interestingly, both python-passlib and Drepper's code accept an empty
salt; openssl-passwd seems to be buggy:

echo test | openssl passwd -5 -stdin -salt '' 


So at least, they do *something* with the input and don't silently omit
salting the input passphrase, but the results aren't the same hash.

So I think we have two options: adapt Drepper's code and throw away
everything, or be very strict like python-passlib. I can find support
for both myself, so I'm not sure.
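
For illustration, the strict variant could be as simple as rejecting any
salt character outside the crypt alphabet before doing anything else.
This is only a sketch of the python-passlib-style behaviour; the function
name and the exact place where it would be called from the shacrypt input
handling are assumptions, not part of the current patch:

#include "postgres.h"
#include <string.h>

/*
 * Sketch only: return true if the salt consists solely of characters from
 * the crypt alphabet [./0-9A-Za-z] and is not empty.  Where exactly this
 * would hook into the shacrypt code is an open question.
 */
static bool
shacrypt_salt_is_valid(const char *salt, size_t len)
{
	static const char alphabet[] =
		"./0123456789"
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz";

	if (len == 0)
		return false;			/* reject empty salt outright */

	for (size_t i = 0; i < len; i++)
	{
		if (salt[i] == '\0' || strchr(alphabet, salt[i]) == NULL)
			return false;
	}
	return true;
}

The caller would then raise an error instead of silently hashing with
whatever was left over after the broken parse.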


Bernd







Re: Skip collecting decoded changes of already-aborted transactions

2025-01-28 Thread Masahiko Sawada
On Mon, Jan 27, 2025 at 7:01 PM Peter Smith  wrote:
>
> On Tue, Jan 28, 2025 at 4:31 AM Masahiko Sawada  wrote:
> >
> > On Sun, Jan 26, 2025 at 10:26 PM Amit Kapila  
> > wrote:
> > >
> > > On Fri, Jan 24, 2025 at 12:38 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Wed, Jan 22, 2025 at 7:35 PM Peter Smith  
> > > > wrote:
> > > > >
> > > > > On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jan 22, 2025 at 9:21 AM Peter Smith  
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > ==
> > > > > > > Commit message
> > > > > > >
> > > > > > > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> > > > > > >
> > > >
> > > > Will fix.
> > > >
> > > > > > >
> > > > > > > Also, this code (below) seems to be treating those macros as
> > > > > > > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > > > > > > possible unless rbtxn_is_prepared(txn) is true.
> > > > > > >
> > > > > > > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > > > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > > >   continue;
> > > >
> > > > Right. We no longer need to check rbtxn_skip_prepared() here.
> > > >
> > > > > > >
> > > > > > > ~~
> > > > > > >
> > > > > > > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* 
> > > > > > > also
> > > > > > > be a prepared transaction, then why aren't the macros changed to 
> > > > > > > match
> > > > > > > that interpretation?
> > > > > > >
> > > > > > > e.g.
> > > > > > >
> > > > > > > /* prepare for this transaction skipped? */
> > > > > > > #define rbtxn_skip_prepared(txn) \
> > > > > > > ( \
> > > > > > > ((txn)->txn_flags & RBTXN_IS_PREPARED != 0) && \
> > > > > > > ((txn)->txn_flags & RBTXN_SKIPPED_PREPARE != 0) \
> > > > > > > )
> > > > > > >
> > > > > > > /* Has a prepare or stream_prepare already been sent? */
> > > > > > > #define rbtxn_sent_prepare(txn) \
> > > > > > > ( \
> > > > > > > ((txn)->txn_flags & RBTXN_IS_PREPARED != 0) && \
> > > > > > > ((txn)->txn_flags & RBTXN_SENT_PREPARE != 0) \
> > > > > > > )
> > > > > > >
> > > > > > > ~~~
> > > > > > >
> > > > > > > I think a to fix all this might be to enforce the 
> > > > > > > RBTXN_IS_PREPARED
> > > > > > > bitflag is set also for RBTXN_SKIPPED_PREPARE and 
> > > > > > > RBTXN_SENT_PREPARE
> > > > > > > constants, removing the ambiguity about how exactly to interpret 
> > > > > > > those
> > > > > > > two constants.
> > > > > > >
> > > > > > > e.g. something like
> > > > > > >
> > > > > > > #define RBTXN_IS_PREPARED 0x0040
> > > > > > > #define RBTXN_SKIPPED_PREPARE(0x0080 | RBTXN_IS_PREPARED)
> > > > > > > #define RBTXN_SENT_PREPARE(0x0200 | RBTXN_IS_PREPARED)
> > > > > > >
> > > > > >
> > > > > > I think the better way would be to ensure that where we set
> > > > > > RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> > > > > > prepared one (RBTXN_IS_PREPARED must be already set). It should be
> > > > > > already the case for RBTXN_SENT_PREPARE but we can ensure the same 
> > > > > > for
> > > > > > RBTXN_SKIPPED_PREPARE as well.
> > > >
> > > > Since the patch already does "txn->txn_flags |= (RBTXN_IS_PREPARED |
> > > > RBTXN_SKIPPED_PREPARE);", it's already ensured, no?
> > > >
> > >
> > > I mean to say that we add assert to ensure the same.
> > >
> > > > I think we need to add both flags in ReorderBufferSkipPrepare(),
> > > > because there is a case where a transaction might not be marked as
> > > > RBTXN_IS_PREPARED here.
> > > >
> > >
> > > Are you talking about the case when it is invoked from
> > > DecodePrepare()?
> >
> > Yes. IIUC ReorderBufferSkipPrepare() is called only from DecodePrepare().
> >
> > > I thought we would set the flag in that code path.
> >
> > I agree that it makes sense to add the flag before calling
> > ReorderBufferSkipPrepare().
> >
> > >
> > > > > >
> > > > > > Will that address your concern? Does anyone else have an opinion on 
> > > > > > this matter?
> > > > >
> > > > > Yes that would be OK, but should also add some clarifying comments in
> > > > > the "reorderbuffer.h" like:
> > > > >
> > > > >  #define RBTXN_SKIPPED_PREPARE   0x0080 /* this flag can only be set
> > > > > for RBTXN_IS_PREPARED transactions */
> > > > >  #define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
> > > > > RBTXN_IS_PREPARED transactions */
> > > >
> > > > I think the same is true for RBTXN_IS_SERIALIZED and
> > > > RBTXN_IS_SERIALIZED_CLEAR; RBTXN_IS_SERIALIZED_CLEAR can only be set
> > > > for RBTXN_IS_SERIALIZED transaction. Should we add some comments to
> > > > them too? But I'm concerned about having too much explanation if we
> > > > add descriptions to flags too while already having comments for
> > > > corresponding macros.
>
> Hm That RBTXN_IS_SERIALIZED / RBTXN_IS_SERIALIZED_CLEAR is used
> differently -- it seems more tricky because RBTXN_IS_SERIALIZED flag
> is turned OFF again when RBTXN_IS_SERIALIZED_CLEAR is turned ON.
> (Whereas setting SKIPPED_PREP

Proposal to make temporary change of memory context a bit safer

2025-01-28 Thread Antonin Houska
When dealing with a tricky bug in an extension [1] last week, I found out that
one should not rely on PG core functions to leave CurrentMemoryContext
always unchanged. While the change of memory context makes sense for functions
involved in error handling or transaction control, it's less obvious in other
cases.

In particular, I realized that DecodingContextFindStartpoint() can switch to
TopTransactionContext, but it only does so in special cases (i.e. when dealing
with catalog cache invalidations), so it's not trivial to detect this behavior
during testing.

I think that the risk of problems like this would be smaller if we had a macro
that both calls MemoryContextSwitchTo() and checks that the context has not
changed afterwards. For example

DO_IN_MEMORY_CONTEXT(myContext)
{
...
}

which expands to

{
MemoryContext oldcontext = MemoryContextSwitchTo(myContext);

...

if (CurrentMemoryContext != myContext)
ereport(ERROR, ...);
MemoryContextSwitchTo(oldcontext);
}

Is this worth a patch? (I don't mean that a macro like this should automatically
replace all the existing uses of MemoryContextSwitchTo() across the tree.)
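
A minimal sketch of how such a macro could be written, using the usual
for-loop trick so that it accepts a braced block. The names are purely
illustrative, and note the caveat that a "break" or "return" inside the
block would skip the final check, much like "return" inside PG_TRY():

#include "postgres.h"

static inline void
SwitchBackAndCheck(MemoryContext expected, MemoryContext oldcontext)
{
	if (CurrentMemoryContext != expected)
		elog(ERROR, "CurrentMemoryContext changed unexpectedly");
	MemoryContextSwitchTo(oldcontext);
}

#define DO_IN_MEMORY_CONTEXT(ctx) \
	for (MemoryContext _do_old = MemoryContextSwitchTo(ctx), _do_once = (ctx); \
		 _do_once != NULL; \
		 SwitchBackAndCheck((ctx), _do_old), _do_once = NULL)

Usage would then look like the example above:

	DO_IN_MEMORY_CONTEXT(myContext)
	{
		ptr = palloc(size);
	}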

[1] https://www.postgresql.org/about/news/pg_squeeze-18-released-3005/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [PATCH] Improve code coverage of network address functions

2025-01-28 Thread Aleksander Alekseev
Hi,

> The correct version is as follows.
>
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: tested, passed
> Documentation: tested, passed

Thanks for your feedback!

> About the tests pushed to the SSL test suite, I'm +-0.  003_sslinfo.pl
> is a bit better than the two others in the SSL test suite, still it
> does not really fit into this file.

So to clarify: do you propose creating a new file for the test (still in
the ssl/ suite), or keeping it as is?

I agree that this is not exactly the best place for the test. However,
I'm not sure whether creating a new one, e.g.
ssl/t/004_code_coverage.pl, would be much better, considering that the
test still has little (or nothing) to do with SSL.

Personally I'm fine with either option though.

-- 
Best regards,
Aleksander Alekseev




Re: Logging parallel worker draught

2025-01-28 Thread Sami Imseih
> I feel that observability is important, and I don't understand why we
> would want to have the information for only a portion of the
> functionality's usage (even if it's the most important).

In my opinion, the requirement for parallel usage in
the utility statement is different. In fact, I think the story
there needs improvement as is being discussed in [1]
by introducing something like a VERBOSE.

Here is the kind of case I think will be a bit odd if we
introduce the parallel logging for vacuum in V6_0003
and V6_0004.

postgres=# set client_min_messages = LOG;
SET
postgres=# set log_parallel_workers = "all";
SET
postgres=# vacuum (verbose, parallel 4) t ;
INFO:  vacuuming "postgres.public.t"
INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
LOG:  launched 2 parallel workers (planned: 2)
INFO:  table "t": truncated 17242 to 0 pages
INFO:  finished vacuuming "postgres.public.t": index scans: 1
pages: 17242 removed, 0 remain, 17242 scanned (100.00% of total)

There will be both an INFO (existing behavior) and a LOG (new behavior).
This seems wrong to me; there should really be only one
mechanism to log parallel workers for utility statements.
Others may have different opinions.

I also have a few comments on V6

1/
For the new comments.

Change:
  "workers is produced"
To
  "workers is emitted"

"emit" is used mainly to describe logs.

Also, change "displays" to "emits"

2/
Should the function LoggingParallelWorkers do all the work,
including logging? That way the callers would just need to
call the function without checking the return value.
Currently, the patch just repeats the same
logging calls everywhere they're needed.

Maybe use a void returning function called
LogParallelWorkersIfNeeded for this purpose?
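
Something along these lines is what I had in mind -- a sketch only; the
log_parallel_workers GUC and its enum values are taken from the patch
under discussion, so treat the exact names as assumptions:

/*
 * Sketch: emit the "launched N parallel workers" message in one place,
 * based on the (hypothetical) log_parallel_workers setting.
 */
static void
LogParallelWorkersIfNeeded(int nworkers_launched, int nworkers_planned)
{
	if (log_parallel_workers == LOG_PARALLEL_WORKERS_NONE)
		return;

	/* assumed "failure only" mode: skip when everything was launched */
	if (log_parallel_workers == LOG_PARALLEL_WORKERS_FAILURE &&
		nworkers_launched >= nworkers_planned)
		return;

	ereport(LOG,
			(errmsg("launched %d parallel workers (planned: %d)",
					nworkers_launched, nworkers_planned)));
}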

Regards,

Sami

[1] 
https://www.postgresql.org/message-id/caa5rz0trtul6_vpvw79daggkp7b-ztwuc5yrpz5sjm8ns4k...@mail.gmail.com




Re: [PATCH] Improve code coverage of network address functions

2025-01-28 Thread Michael Paquier
On Tue, Jan 28, 2025 at 01:33:58PM +0300, Aleksander Alekseev wrote:
> Thanks for your feedback!

set_masklen(inet) could be covered for the -1 case, and it was missing
in the patch submitted.  For consistency with the other queries,
moving the call of abbrev(inet) with the existing abbrev(cidr) makes
more sense, I guess, and we could expand a bit the use of aliases in
the attributes even for the existing queries with broadcast() and
abbrev() to make the output nicer.

> So to clarify, you propose creating a new file for the test (still in
> the ssl/ suite) or keep it as is?
> 
> I agree that this is not exactly the best place for the test. However
> I'm not sure whether creating a new one, e.g.
> ssl/t/004_code_coverage.pl will be much better considering the fact
> that the test still has little (nothing) to do with SSL.
> 
> Personally I'm fine with either option though.

To be honest, I don't know what's the best course of action here :)

Sticking that into the SSL tests looks incorrect to me.  If we care
only about the execution without the output, a SQL test is the most
common practice.  People can also use pg_regress with custom
connection strings, but I agree that it limits the impact of these
additions in the default cases where the tests are run using a unix
domain socket and these return NULL.

The SQL tests don't fall into that category and they are nice
additions, so I have applied this part with the tweaks mentioned
above.
--
Michael


signature.asc
Description: PGP signature


Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2025-01-28 Thread Peter Smith
Hi Ajin,

Some review comments for patch v12-0001.

==
Commit message

1.
Track transactions which have snapshot changes with a new flag
RBTXN_HAS_SNAPSHOT_CHANGES

~

The commit message only says *what* it does, but not *why* this patch
even exists.  TBH, I don't understand why this patch needs to be
separated from your patch 0002, because 0001 makes no independent use
of the flag, nor is it separately tested.

Anyway, if it is going to remain separated, then IMO at least the
message should explain the intended purpose, e.g. why the subsequent
patches require this flagged info and how they will use it.

==
src/include/replication/reorderbuffer.h

2.
+/* Does this transaction have snapshot changes? */
+#define rbtxn_has_snapshot_changes(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_HAS_SNAPSHOT_CHANGES) != 0 \
+)
+

Is the below wording maybe a more plain way to say that:

/* Does this transaction make changes to the current snapshot? */

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-01-28 Thread Masahiko Sawada
On Thu, Jan 23, 2025 at 1:12 AM Sutou Kouhei  wrote:
>
> Hi,
>
> I noticed that the last patch set (v27) can't be applied to
> the current master. I've rebased on the current master and
> created v28 patch set. No code change.

Thank you for updating the patch!

While 0001 and 0002 look good to me overall, we still need to polish
subsequent patches. Here are review comments:

---
I still think it would not be a good idea to move all copy-related
struct definitions to copyapi.h, because we would need to include the
copyapi.h file in a .c file even if that file is not related to the
custom copy format routines. I think that copyapi.h should have only the
definitions of CopyToRoutine and CopyFromRoutine as well as some
functions related to the custom copy format. Here is an idea:

- CopyToState and CopyFromState are defined in copyto_internal.h (new
file) and copyfrom_internal.h, respectively.
- These two files #include's copy.h and other necessary header files.
- copyapi.h has only CopyToRoutine and CopyFromRoutine and #include's
both copyfrom_internal.h and copyto_internal.h.
- copyto.c, copyfrom.c and copyfromparse.c #include copyapi.h

Some advantages of this idea:

- we can keep both CopyToState and CopyFromState private in _internal.h files.
- custom format extension can include copyapi.h to provide a custom
copy format routine and to access the copy state data.
- copy-related .c files won't need to include copyapi.h if they don't
use custom copy format routines.

---
The 0008 patch introduces CopyFromStateRead(). While it would be a
good start, I think we can consider sorting out low-level
communication functions more. For example, CopyReadBinaryData() uses
the internal 64kB buffer but some custom copy format extensions might
want to use a larger buffer in its own implementation, which would
require exposing CopyGetData() etc. Given that we might expose more
functions to provide more ways for extensions, we might want to rename
CopyFromStateRead().

---
While we get the format routines for custom formats in
ProcessCopyOptionFormat(), we do that for built-in formats in
BeginCopyTo(), which seems odd to me. I think we can have
CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
formats as well as custom format. The same is true for
CopyFromRoutine.

---
The Copy[To|From]Routine definitions for built-in formats do not set the node type.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Pgoutput not capturing the generated columns

2025-01-28 Thread Peter Smith
On Tue, Jan 28, 2025 at 7:59 PM Amit Kapila  wrote:
>
> On Thu, Jan 23, 2025 at 9:58 AM vignesh C  wrote:
> >
>
> I have pushed the remaining part of this patch. Now, we can review the
> proposed documentation part.
>
> I feel we don't need the Examples sub-section for this part of the
> docs. The behavior is quite clear from the "Behavior Summary"
> sub-section table.

It is good to hear that the "Behavior Summary" matrix is clear, but it
is never the purpose of examples to show behaviour that is not already
clearly documented. The examples are simply to demonstrate some real
usage. Personally, I find it far easier to understand this (or any
other) feature by working through a few examples in conjunction with
the behaviour matrix, instead of just having the matrix and nothing
else.

Before removing the examples section I'd like to know if other people
also think it has no value.

> Also, I don't find the sub-sections 29.6.1 and
> 29.6.2 are worth mentioning as separate sub-chapters.
>

OK. Removed these as requested.

> *
> + non-PostgreSQL database via plugin output,
>
> In the above sentence, shouldn't it be 'output plugin' instead of
> 'plugin output'? We use that way at other places in the docs.
>

Fixed.

~~~

I also modified some whitespace indentations in the SGML file.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v57-0001-DOCS-Generated-Column-Replication.patch
Description: Binary data


Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-01-28 Thread Peter Geoghegan
On Fri, Jan 24, 2025 at 10:38 PM Peter Geoghegan  wrote:
> I can reproduce this. However, it should be noted that the regression
> completely goes away if I make one small change to your test case: all
> I need to do is make sure that the CREATE INDEX happens *before*
> inserting any rows into the table. Once I do that, suffix truncation
> tends to be quite a bit more effective. This makes all the difference
> with your test case, since it encourages the existing heuristics
> within _bt_advance_array_keys to do the right thing and stick on the
> leaf level. That allows the "skipskip" mechanism to kick in as
> expected, which doesn't seem to be happening when the index is built
> by CREATE INDEX.

I think that I could have done better at explaining myself here. I'll
have another go at that:

Your test case showed an excessive number of primitive index scans:
EXPLAIN ANALYZE showed "Index Searches: 21859", even though the ideal
number of index searches is 1, given all these specifics. It would be
understandable if a person saw that and concluded that the added
overhead/regression comes from descending the index many more times
than is truly necessary -- blaming the added overhead that comes from
all those extra _bt_search calls is a good guess. But that's not it.
Not really.

You (Heikki) didn't actually make any claims about _bt_search being
too hot during profiling of this test case. You didn't actually see
_bt_search show up prominently when you ran "perf". What you actually
saw (and actually complained about) was stuff that is called from
_bt_readpage, to deal with array maintenance. Even still, I want to
avoid making this any more confusing than it needs to be. The nature
of the problem needs to be carefully teased apart.

There is an important sense in which the issue of excessive primitive
index scans *is* related to the regression from wasting cycles on
array maintenance, though only indirectly: the heuristics that decide
if the skipskip mechanism should be enabled during a _bt_readpage call
are indirectly influenced by certain other heuristics, in
_bt_advance_array_keys.
These other heuristics are the heuristics that determine
whether or not we'll start another primitive index scan (all of which
are in _bt_advance_array_keys, and were added in Postgres 17, and
haven't been touched by this patch series).

More concretely: if we choose to start another primitive index scan,
and make a bad choice (because we land on the very next sibling leaf
page, which we could have gotten to by simply stepping right without
calling _bt_first/_bt_search again), then we also won't have an opportunity to
apply the skipskip mechanism when on that same sibling leaf page.
That's because in practice every leaf page read within _bt_readpage
will be the first leaf page of the ongoing primitive index scan with
this test case. Being the first leaf page of a primscan supposedly
makes a leaf page a bad target for the skipskip optimization, and so
we never actually apply the skipskip optimization in practice here.

Again, the real problem is simply that we're not applying the skipskip
optimization at all -- even though it was specifically written with
cases like Heikki's adversarial case in mind, and even though it
actually works as designed once it is forced to activate with Heikki's
test case. It may also be a bit of a problem that there's 21859 calls to
_bt_search instead of just 1, but that's a surprisingly small
contributor to the added overhead. (I'll probably fix the problem by
avoiding useless primitive index scans in the first place, rather than
by changing the heuristics that activate skipskip, which condition the
use of skipskip on firstPage=false. But, again, that doesn't mean that
the problem is excessive primscan overhead from all of the extra
_bt_first/_bt_search calls. The problem is indirect, and so my solution
can be indirect, too.)

--
Peter Geoghegan




Re: Having problems generating a code coverage report

2025-01-28 Thread Michael Paquier
On Wed, Jan 15, 2025 at 07:51:39PM -0500, Peter Geoghegan wrote:
> I was able to get the lcov git master branch's tip to produce a
> Postgres coverage report (same compiler version and OS as before).
> Unfortunately, I had to use an even grottier incantation to get this
> to work:
> 
> make -s coverage-html GENHTML_FLAGS="-q --legend --ignore-errors
> unmapped,unmapped,empty,empty,inconsistent,inconsistent,corrupt,corrupt,range,range"
> LCOVFLAGS="--ignore-errors
> empty,empty,negative,negative,inconsistent,inconsistent"
> 
> (As you can imagine, I came up with this through pure trial and error.)

Interesting.  These tricks do not seem to work here with a Debian GID
host, unfortunately.  I'll try to dig a bit more, this is getting very
annoying :(
--
Michael


signature.asc
Description: PGP signature


preptlist.c can insert unprocessed expression trees

2025-01-28 Thread Tom Lane
I happened across a not-great behavior in expand_insert_targetlist.
If a column is omitted in an INSERT, and there's no column
default, the code generates a NULL Const to be inserted.
Furthermore, if the column is of a domain type, we wrap the
Const in CoerceToDomain, so as to throw a run-time error if the
domain has a NOT NULL constraint.  That's fine as far as
it goes, but there are two problems:

1. We're being sloppy about the type/typmod that the Const is
labeled with.  It really should have the domain's base type/typmod,
since it's the input to CoerceToDomain not the output.  This can
result in coerce_to_domain inserting a useless length-coercion
function (useless because it's being applied to a null).

2. We're not applying expression preprocessing (specifically,
eval_const_expressions) to the resulting expression tree.
The planner's primary expression-preprocessing pass already happened,
so that means the length coercion step and CoerceToDomain node miss
preprocessing altogether.

You can observe that there is a problem with this script:

-
create domain d11 as varchar(11);
create table td11 (f1 int, f2 d11);
set debug_print_plan = 1;
insert into td11 values(0, null);
insert into td11 values(0);
-

Comparing the plan tree dumps for the two INSERTs, the first just
shows a simple NULL Const of the domain type as the source for f2.
But the second shows a NULL Const of the domain type that is fed
to a varchar length-checking function and then to CoerceToDomain.
Of course they really ought to look the same.

When this code was last touched --- over twenty years ago, looks like
--- neither of these oversights meant anything more than a little
inefficiency.  However, as we've loaded more and more responsibility
onto eval_const_expressions, it's turned into what's probably a live
bug.  Specifically, if the length-coercion function call needed
default-argument insertion or named-argument reordering, things would
blow up pretty good.  That's not the case for any in-core data types,
but I wonder whether any extensions create such things.  The authors
wouldn't find out about the issue unless they tried making a domain
on top of their type, so it could have gone unreported.  So I think
this is something that needs to be fixed, and probably back-patched,
even though I don't have a test case that exhibits a crash.
(I actually found this while working on a patch that adds some
more work to eval_const_expressions, so we'll need to deal with
it going forward even if we opt not to back-patch.)

There are a few places in the rewriter that do the same sort of
thing (probably copied-and-pasted from preptlist at some point).
Those are before the planner so the results will get preprocessed
later, but it's still not great if they insert useless length-
coercion calls.  So I felt it was worth writing a utility function to
consolidate all those usages into one copy.  I'm not quite sure about
what to call it though.  In the attached 0001 patch I called it
coerce_null_to_domain and put it in parse_coerce.c.  Another idea
I considered was to consider it as a variant of makeConst and
put it in makefuncs.c.  But that would require makefuncs.c to call
parse_coerce.c which seems like a layering violation.  Anyone have
a better idea?
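
For illustration only (this is a sketch of the idea, not the attached
0001 patch, and it leaves out the separate question of running
eval_const_expressions on the result), such a helper might look roughly
like this, using the existing makeNullConst, getBaseTypeAndTypmod,
get_typcollation and coerce_to_domain APIs:

static Node *
coerce_null_to_domain(Oid typid, int32 typmod, int location)
{
	Node	   *result;
	Oid			baseTypeId;
	int32		baseTypeMod = typmod;

	/* Look through the domain, if any, to find the base type/typmod */
	baseTypeId = getBaseTypeAndTypmod(typid, &baseTypeMod);

	/* Label the null Const with the *base* type/typmod ... */
	result = (Node *) makeNullConst(baseTypeId, baseTypeMod,
									get_typcollation(baseTypeId));

	/* ... and wrap it in CoerceToDomain only if the type is a domain */
	if (baseTypeId != typid)
		result = coerce_to_domain(result,
								  baseTypeId, baseTypeMod, typid,
								  COERCION_IMPLICIT, COERCE_IMPLICIT_CAST,
								  location, false);

	return result;
}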

0001 does result in some cosmetic changes in postgres_fdw's
regression output.  That's because we're now careful to label
the null Const with the column's typmod, which we were not
before.

It seems to me that part of the problem here is that coerce_to_domain
is willing to look up the domain's base type/typmod if the caller
doesn't want to supply it.  With these changes, there's basically
noplace where the caller hasn't already looked that up, and I think
that that's probably required for correct usage.  (The caller has
to produce an input that's of the base type, after all.)  So it seems
like that's not a convenience so much as an encouragement to incorrect
coding.  I propose, for HEAD only, 0002 which removes that misfeature
and requires callers to supply the info.

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 85252cbdbc..7e487cff2a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4273,13 +4273,13 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6;
 
 PREPARE st7 AS INSERT INTO ft1 (c1,c2,c3) VALUES (1001,101,'foo');
 EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
-   QUERY PLAN
--
+   

Fwd: Why we need to check for local buffers in BufferIsExclusiveLocked and BufferIsDirty?

2025-01-28 Thread Srinath Reddy
On Sun, Jan 26, 2025 at 9:49 PM Tom Lane  wrote:

> Srinath Reddy  writes:
> > as suggested did the changes and attached the patch for the same.
>
> Uh ... what in the world is the point of changing
> BufferIsExclusiveLocked's signature?
>
>
As suggested, I did not change BufferIsExclusiveLocked's signature;
here's the updated patch.

Regards,
Srinath Reddy Sadipiralla,
EDB: https://www.enterprisedb.com 
From 3d232e3f5d94acd0493dce1a51a3c8d51e49a1c8 Mon Sep 17 00:00:00 2001
From: Srinath Reddy Sadipiralla 
Date: Wed, 29 Jan 2025 07:04:18 +0530
Subject: [PATCH 1/1] 	Handle local buffer cases properly in  
 BufferIsExclusiveLocked and BufferIsDirty

---
 src/backend/storage/buffer/bufmgr.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0d8849bf89..23c9edfb71 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2463,7 +2463,8 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 /*
  * BufferIsExclusiveLocked
  *
- *  Checks if buffer is exclusive-locked.
+ *  Checks if buffer is exclusive-locked.  For local buffers we act as if an exclusive
+ * lock is always held, because we don't initialize content locks for local buffers.
  *
  * Buffer must be pinned.
  */
@@ -2472,20 +2473,22 @@ BufferIsExclusiveLocked(Buffer buffer)
 {
 	BufferDesc *bufHdr;
 
+	Assert(BufferIsPinned(buffer));
+
 	if (BufferIsLocal(buffer))
 	{
 		int			bufid = -buffer - 1;
 
 		bufHdr = GetLocalBufferDescriptor(bufid);
+		return true;
 	}
 	else
 	{
 		bufHdr = GetBufferDescriptor(buffer - 1);
-	}
-
-	Assert(BufferIsPinned(buffer));
-	return LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
+		return LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
 LW_EXCLUSIVE);
+	}
+	
 }
 
 /*
@@ -2501,6 +2504,8 @@ BufferIsDirty(Buffer buffer)
 {
 	BufferDesc *bufHdr;
 
+	Assert(BufferIsPinned(buffer));
+
 	if (BufferIsLocal(buffer))
 	{
 		int			bufid = -buffer - 1;
@@ -2510,11 +2515,9 @@ BufferIsDirty(Buffer buffer)
 	else
 	{
 		bufHdr = GetBufferDescriptor(buffer - 1);
-	}
-
-	Assert(BufferIsPinned(buffer));
-	Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
+		Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
 LW_EXCLUSIVE));
+	}
 
 	return pg_atomic_read_u32(&bufHdr->state) & BM_DIRTY;
 }
-- 
2.43.0



Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-01-28 Thread Masahiko Sawada
On Tue, Jan 28, 2025 at 1:39 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Sawada-san,
>
> I love the idea. I've roughly tested the patch and worked on my env.
> Here are initial comments...

Thank you for looking at the patch!

>
> 1. xloglevelworker.c
> ```
> +#include "replication/logicalxlog.h"
> ```
>
> xloglevelworker.c includes replication/logicalxlog.h, but it does not exist.
> The line had to be removed to build and test it.
>
> 2.
> ```
> +static void
> +writeUpdateWalLevel(int new_wal_level)
> +{
> +   XLogBeginInsert();
> +   XLogRegisterData((char *) (&new_wal_level), sizeof(bool));
> +   XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
> +}
> ```
>
> IIUC the data length should be sizeof(int) instead of sizeof(bool).

Agreed to fix them.
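
For the second one, the registration presumably just becomes the
following (a sketch; XLOG_UPDATE_WAL_LEVEL is the record type added by
the patch under discussion):

static void
writeUpdateWalLevel(int new_wal_level)
{
	XLogBeginInsert();
	XLogRegisterData((char *) &new_wal_level, sizeof(int));
	XLogInsert(RM_XLOG_ID, XLOG_UPDATE_WAL_LEVEL);
}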

>
> 3.
> Is there a reason why the process does not wait till the archiver exits?

No. I didn't implement this part as the patch was just for
proof-of-concept. I think it would be better to wait for it to exit.

>
> 4.
> When I dumped wal files, I found that XLOG_UPDATE_WAL_LEVEL cannot be 
> recognized:
>
> ```
> rmgr: XLOGlen (rec/tot): 27/27, tx:  0, lsn: 
> 0/03050838, prev 0/03050800, desc: UNKNOWN (f0) wal_level logical
> ```
>
> xlog_identify() must be updated as well.

Will fix.

>
> 5.
> When I changed "logical" to "replica", postgres outputs like below:
>
> ```
> LOG:  received SIGHUP, reloading configuration files
> LOG:  parameter "wal_level" changed to "replica"
> LOG:  wal_level control worker started
> LOG:  changing wal_level from "logical" to "replica"
> LOG:  wal_level has been decreased to "replica"
> LOG:  successfully changed wal_level from "logical" to "replica"
> ```
>
> ISTM that both postmaster and the wal_level control worker said something like
> "wal_level changed", which is bit strange for me. Since GUC can't be renamed,
> can we use another name for the wal_level control state?

I'm concerned that users could be confused if two different names
refer to substantially the same thing.

Having said that, I guess that we need to drastically change the
messages. For example, I think that the wal_level worker should say
something like "successfully made 'logical' wal_level effective"
instead of saying something like "changed wal_level value". Also,
users might not need gradual messages when increasing 'minimal' to
'logical' or decreasing 'logical' to 'minimal'.

>
> 6.
> With the patch present, the wal_level can be changed to the minimal even when 
> the
> streaming replication is going. If we do that, the walsender exits 
> immediately and
> the below FATAL appears periodically until the standby stops. Same things can 
> be
> said for the logical replication:
>
> ```
> FATAL:  streaming replication receiver "walreceiver" could not connect to the 
> primary server:
> connection to server on socket "/tmp/.s.PGSQL." failed:
> FATAL:  WAL senders require "wal_level" to be "replica" or "logical
> ```
>
> I know this is not a perfect, but can we avoid the issue by reject the GUC 
> update
> if the walsender exists? Another approach is not to update the value when 
> replication
> slots need to be invalidated.

Does it mean that we reject the config file from being reloaded in
that case? I have no idea how to reject it in a case where the
wal_level in postgresql.conf changed and the user did 'pg_ctl reload'.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots

2025-01-28 Thread Peter Smith
Hi Nisha,

Some review comments for patch v1-0001.

==
src/backend/replication/logical/slotsync.c

ReplSlotSyncWorkerMain:

1.
+ /* Use same inactive_since time for all slots */
+ now = GetCurrentTimestamp();
+
  LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

  for (int i = 0; i < max_replication_slots; i++)
@@ -1537,10 +1540,6 @@ update_synced_slots_inactive_since(void)
  /* The slot must not be acquired by any process */
  Assert(s->active_pid == 0);

- /* Use the same inactive_since time for all the slots. */
- if (now == 0)
- now = GetCurrentTimestamp();
-

AFAICT, this code was *already* ensuring that the same
'inactive_since' was used, even before your patch. The only difference
is that now you are getting the timestamp value up-front instead of via
a deferred assignment.

So why did you change this (and the code of RestoreSlotFromDisk) to do
the up-front assignment? Instead, you could have chosen to just leave
this code as-is, and then modify the RestoreSlotFromDisk code to match
it.

FWIW, I do prefer what you have done here because it is simpler, but I
just wondered about the choice because I think some people worry about
GetCurrentTimestamp overheads and try to avoid calling that wherever
possible.

==
src/backend/replication/slot.c

2. What about other loops?

AFAICT there are still some other loops where the inactive_since
timestamps might differ.

e.g. How about this logic in slot.c:

InvalidateObsoleteReplicationSlots:

LOOP:
for (int i = 0; i < max_replication_slots; i++)
{
  ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

  calls InvalidatePossiblyObsoleteSlot(...)
  which calls ReplicationSlotRelease(...)
  which assigns now = GetCurrentTimestamp();
  then slot->inactive_since = now;
}

~

So, should you also assign a 'now' value outside this loop and pass
that timestamp down the call chain so they all eventually get assigned
the same value? I don't know, but I guess at least it would require far
fewer unnecessary calls to GetCurrentTimestamp, so that may be a good thing.
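
To make the suggestion concrete, here is a simplified sketch of that
pattern. This is not the real InvalidateObsoleteReplicationSlots call
chain, just an illustration of capturing one timestamp and reusing it
for every slot in the pass:

	TimestampTz	now = GetCurrentTimestamp();	/* one value for the whole pass */

	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
	for (int i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		if (!s->in_use)
			continue;

		/* ... invalidation checks elided ... */

		SpinLockAcquire(&s->mutex);
		s->inactive_since = now;	/* same value for every slot */
		SpinLockRelease(&s->mutex);
	}
	LWLockRelease(ReplicationSlotControlLock);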

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Improve error handling for invalid slots and ensure a same 'inactive_since' time for inactive slots

2025-01-28 Thread Peter Smith
Hi Nisha,

Some review comments for the patch v1-0002.

==
src/backend/replication/slot.c

1.
+
+/*
+ * Raise an error based on the slot's invalidation cause.
+ */
+static void
+RaiseSlotInvalidationError(ReplicationSlot *slot)
+{
+ StringInfo err_detail = makeStringInfo();
+
+ Assert(slot->data.invalidated != RS_INVAL_NONE);
+
+ switch (slot->data.invalidated)
+ {
+ case RS_INVAL_WAL_REMOVED:
+ appendStringInfoString(err_detail, _("This slot has been invalidated
because the required WAL has been removed."));
+ break;
+
+ case RS_INVAL_HORIZON:
+ appendStringInfoString(err_detail, _("This slot has been invalidated
because the required rows have been removed."));
+ break;
+
+ case RS_INVAL_WAL_LEVEL:
+ /* translator: %s is a GUC variable name */
+ appendStringInfo(err_detail, _("This slot has been invalidated
because \"%s\" is insufficient for slot."),
+ "wal_level");
+ break;
+
+ case RS_INVAL_NONE:
+ pg_unreachable();
+ }
+
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+NameStr(slot->data.name)),
+ errdetail_internal("%s", err_detail->data));
+}

AFAIK the _() is a macro for gettext(). So those strings are intended
for translation, right? There's also a "/* translator: ..." comment
implying the same.

OTOH, those strings are only being used by errdetail_internal, whose
function comment says "This is exactly like errdetail() except that
strings passed to errdetail_internal are not translated...".

Isn't there some contradiction here?

Perhaps the _() macro is not needed, or perhaps the
errdetail_internal() should be errdetail().

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2025-01-28 Thread Amit Kapila
On Wed, Jan 29, 2025 at 6:03 AM Peter Smith  wrote:
>
> On Tue, Jan 28, 2025 at 7:59 PM Amit Kapila  wrote:
> >
> > On Thu, Jan 23, 2025 at 9:58 AM vignesh C  wrote:
> > >
> >
> > I have pushed the remaining part of this patch. Now, we can review the
> > proposed documentation part.
> >
> > I feel we don't need the Examples sub-section for this part of the
> > docs. The behavior is quite clear from the "Behavior Summary"
> > sub-section table.
>
> It is good to hear that the "Behavior Summary" matrix is clear, but it
> is never the purpose of examples to show behaviour that is not already
> clearly documented. The examples are simply to demonstrate some real
> usage. Personally, I find it far easier to understand this (or any
> other) feature by working through a few examples in conjunction with
> the behaviour matrix, instead of just having the matrix and nothing
> else.
>

I am not against giving examples in the docs to make the topic easy to
understand but in this particular case, I am not sure if additional
examples are useful. You already gave one example in the beginning:
"For example, note below that subscriber table generated column value
comes from the subscriber column's calculation." The remaining text is
clear enough to understand the feature.

If you still want to make a case for additional examples, divide this
patch into two parts. The first part without examples could be part of
this thread and I can commit that. Then you can start a separate
thread just for the examples and then we can see what others think and
make a decision based on that.

-- 
With Regards,
Amit Kapila.




Re: Extended Statistics set/restore/clear functions.

2025-01-28 Thread jian he
hi.
I reviewed 0001 only.

in src/backend/statistics/mvdistinct.c

No need to #include "nodes/pg_list.h", since
src/include/statistics/statistics.h already includes "nodes/pg_list.h"
at a lower level.

No need to #include "utils/palloc.h", since #include "postgres.h"
already includes it.


 select '[{"6, -32768,,": -11}]'::pg_ndistinct;
ERROR:  malformed pg_ndistinct: "[{"6, -32768,,": -11}]"
LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct;
   ^
DETAIL:  All ndistinct count values are scalar doubles.
IMHO, this errdetail message is not good.


select '{}'::pg_ndistinct ;
segfault


select '{"1,":"1"}'::pg_ndistinct ;
ERROR:  malformed pg_ndistinct: "{"1,":"1"}"
LINE 1: select '{"1,":"1"}'::pg_ndistinct ;
   ^
DETAIL:  All ndistinct attnum lists must be a comma separated list of attnums.

IMHO, this errdetail message is not good. It would be better to say that
the length of the list of attnums must be larger than 1.



select  pt1.typnamespace, pt1.typarray, pt1.typcategory, pt1.typname
from pg_type pt1
where pt1.typname ~*'distinct';

 typnamespace | typarray | typcategory |   typname
--+--+-+--
   11 |0 | Z   | pg_ndistinct

typcategory (Z) is marked as an internal-use type, and there is no
pg_ndistinct array type; I'm not sure this is fine.


In all of the ereport() calls, one pair of parentheses is unnecessary.
for example
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
 errdetail("Must begin with \"{\"")));
can change to
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
 errdetail("Must begin with \"{\""));

see https://www.postgresql.org/docs/current/error-message-reporting.html




Extensibility proposal - transaction start callbacks

2025-01-28 Thread Alastair Turner
I see that various comments in xact.c talk about start and end callbacks,
but only end seems to be implemented at the moment. I also can't find any
discussion about start of transaction callbacks in the mailing list
archives. Would a patch in this direction be of interest?

I was looking at this based on the discussion around extended transaction
logging for audit tools [1]; there is another possible use of this session
information - in logical replication. There is a pattern among the users of
various logical replication / CDC tools of using user identity, application
identity or session flags as a way to suppress the logical replication of
some transactions. The primary interest seems to be in suppressing the
logical replication of bulk load or bulk update operations which are
dual-loaded into the two databases from a common source.

The existing infrastructure of XactCallback and pg_logical_emit_message()
seems to provide what's needed to tie this together, but only at the end
(COMMIT, ROLLBACK, PREPARE) of a transaction. It would be great for the
logical replication plugin to be able to make a decision at the start of a
large transaction to discard the transaction's content before incurring the
cost of staging the content and/or rolling back.
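
For reference, here is a minimal sketch of how an extension registers a
transaction callback today, with the hypothetical start event this
proposal is about shown only in a comment (RegisterXactCallback and the
existing XACT_EVENT_* values are real; XACT_EVENT_START does not exist
and is purely illustrative):

#include "postgres.h"
#include "fmgr.h"
#include "access/xact.h"

PG_MODULE_MAGIC;

static void
my_xact_callback(XactEvent event, void *arg)
{
	switch (event)
	{
		case XACT_EVENT_COMMIT:
		case XACT_EVENT_ABORT:
			/* end-of-transaction processing: possible today */
			break;

		/*
		 * case XACT_EVENT_START:
		 *     A hypothetical start event could decide here whether this
		 *     session's transactions should be flagged for the logical
		 *     decoding plugin, e.g. via pg_logical_emit_message().
		 *     break;
		 */
		default:
			break;
	}
}

void
_PG_init(void)
{
	RegisterXactCallback(my_xact_callback, NULL);
}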

Thanks
Alastair

[1]
https://www.postgresql.org/message-id/flat/678FE75B.2020101%40acm.org#122d68fd45ea3012b7b5e49519e77b48


Re: Sample rate added to pg_stat_statements

2025-01-28 Thread Sami Imseih
> All the changes mentioned above are included in the v13 patch. Since the
> patch status is 'Ready for Committer,' I believe it is now better for
> upstream inclusion, with improved details in tests and documentation. Do
> you have any further suggestions?

I am not quite clear on the sample_1.out. I do like the idea of separating
the sample tests, but I was thinking of something a bit more simple.
What do you think of my attached sampling.sql test? It tests the sample
rate in both simple and extended query protocols, and at both the top
level and nested levels.

> If anyone has the capability to run this benchmark on machines with more
> CPUs or with different queries, it would be nice. I’d appreciate any
> suggestions or feedback.

I wanted to share some additional benchmarks I ran as well
on an r8g.48xlarge (192 vCPUs, 1,536 GiB of memory) configured
with 16GB of shared_buffers. I also attached the benchmark.sh
script used to generate the output.
The benchmark is running the select-only pgbench workload,
so we have a single heavily contentious entry, which is the
worst case.

The test shows that the spinlock (SpinDelay waits)
becomes an issue at high connection counts and will
get worse on larger machines. A sample_rate going from
1 to .75 shows a 60% improvement, but this is on a single
contentious entry; most workloads will likely not see this kind
of improvement. I also could not really observe
this kind of difference on smaller machines (i.e. 32 vCPUs),
as expected.

## init
pgbench -i -s500

### 192 connections
pgbench -c192 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 484338.769799 (without initial connection time)
waits
-
  11107  SpinDelay
   9568  CPU
929  ClientRead
 13  DataFileRead
  3  BufferMapping

sample_rate = .75
tps = 909547.562124 (without initial connection time)
waits
-
  12079  CPU
   4781  SpinDelay
   2100  ClientRead

sample_rate = .5
tps = 1028594.555273 (without initial connection time)
waits
-
  13253  CPU
   3378  ClientRead
174  SpinDelay

sample_rate = .25
tps = 1019507.126313 (without initial connection time)
waits
-
  13397  CPU
   3423  ClientRead

sample_rate = 0
tps = 1015425.288538 (without initial connection time)
waits
-
  13106  CPU
   3502  ClientRead

### 32 connections
pgbench -c32 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 620667.049565 (without initial connection time)
waits
-
   1782  CPU
560  ClientRead

sample_rate = .75
tps = 620663.131347 (without initial connection time)
waits
-
   1736  CPU
554  ClientRead

sample_rate = .5
tps = 624094.688239 (without initial connection time)
waits
-
   1741  CPU
648  ClientRead

sample_rate = .25
tps = 628638.538204 (without initial connection time)
waits
-
   1702  CPU
576  ClientRead

sample_rate = 0
tps = 630483.464912 (without initial connection time)
waits
-
   1638  CPU
574  ClientRead

Regards,

Sami


sampling.sql
Description: Binary data


benchmark.sh
Description: Bourne shell script


Re: Increase of maintenance_work_mem limit in 64-bit Windows

2025-01-28 Thread Vladlen Popolitov

Tom Lane wrote on 2025-01-24 06:13:

v.popoli...@postgrespro.ru writes:

[ v2-0001-work_mem_vars-limit-increased-in-64bit-Windows.patch ]

Hi Tom,

Thank you for your interest in the patch and for the comments.

Comment A.

Is there a reason not to do them all the same way?  I'd suggest
"(Size) 1024", losing the "L" which has no purpose any more.
(And note project style is with a space after the cast.)
I don't like use of INT64CONST here because that forces an
unnecessarily expensive calculation in 32-bit machines.


While trying to fix the 64-bit code I did not consider the 32-bit code
seriously, but your comment changed my mind. I had tried to cast
everything to the lvalue's type (which is why there were so many
different kinds of casts). Now I have changed almost everything to
(Size) 1024, which is friendly to both 32-bit and 64-bit builds.

The only exception is in src/backend/commands/vacuumparallel.c:378
shared->dead_items_info.max_bytes = vac_work_mem * (size_t)1024;
max_bytes is declared as size_t, so I cast to the same type (I cannot
guarantee that size_t and Size are equivalent on all systems).

Comment B.

I wonder if it'd be worth creating a macro rather than repeating
"* (Size) 1024" everywhere.  KILOBYTES_TO_BYTES(work_mem) seems
too wordy, but maybe we can think of a shorter name.

I looked through the source for all overflows connected to the "work_mem"
variables and fixed them. I did not want to introduce additional code
and tried to keep the patch as small as possible. The source has other
locations with KB-to-bytes conversions, but they were not the goal of
this patch.
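
That said, if such a macro were added later it could be as small as the
following (the name and header placement here are just an illustration,
not part of the patch):

/* hypothetical helper, e.g. in miscadmin.h */
#define WORK_MEM_KB_TO_BYTES(kb)	((Size) (kb) * 1024)

	maxentries = tbm_calculate_entries(WORK_MEM_KB_TO_BYTES(work_mem));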

Comment C.

-   longsort_threshold;
+   uint64  sort_threshold;

(1) Are you sure the related code doesn't need this to be signed?
(2) Even if it can be unsigned, why not Size or size_t?

Changed from uint64 to Size. It should not be signed: it is compared
with positive values later and finally cast to uint32.

Comment D.

-   longsort_mem_bytes = sort_mem * 1024L;
+   int64   sort_mem_bytes = sort_mem * INT64CONST(1024);

Wrong type surely, and even more so here:

-   longwork_mem_bytes = work_mem * 1024L;
+   double  work_mem_bytes = work_mem * INT64CONST(1024);

If there's actually a reason for this scattershot approach to
new data types, you need to explain what the plan is.  I'd
have expected a push to replace "long" with "Size", or maybe
use size_t (or ssize_t when we need a signed type).

INT64CONST(1024) is changed to (Size) 1024, but I kept the new types
"int64" and "double" here.
1) sort_mem_bytes is used as an int64 variable later and passed to a
function with an int64 parameter, so the "Size" type is not used in
this case.
2) work_mem_bytes is compared only with double values later, so it
does not need additional casts; the "Size" type is not used in this
case either.

Comment E.

-tbm_create(long maxbytes, dsa_area *dsa)
+tbm_create(double maxbytes, dsa_area *dsa)

Why "double"??

Also, do we need to widen the result of tbm_calculate_entries?
I see the clamp to INT_MAX-1, but should we try to get rid of
that?  (Or if not, should we reduce its result to "int"?)

Agree. It was "double" due to usage of this variable as parameter
in tbm_calculate_entries(double maxbytes), but really
tbm_calculate_entries() does not need this type, it again
converted to "long" local variable. I changed parameter,
local variable and return value of tbm_calculate_entries()
to Size.

New version of diff:
-tbm_create(long maxbytes, dsa_area *dsa)
+tbm_create(Size maxbytes, dsa_area *dsa)

-long
-tbm_calculate_entries(double maxbytes)
+Size
+tbm_calculate_entries(Size maxbytes)
 {
-   longnbuckets;
+   Sizenbuckets;

The result of tbm_calculate_entries() is also assigned to the "long"
variable maxentries. That variable is part of other code, so I did not
touch it and only added an explicit cast to long:

-   maxentries = tbm_calculate_entries(work_mem * 1024L);
+   maxentries = (long)tbm_calculate_entries(work_mem * (Size)1024);


Summary:
I made fixes in the patch for comments A, B, C and E.
Comment D - I gave an explanation of why I changed long to int64 and double
instead of Size; I hope it is logical and you will be satisfied.

--
Best regards,

Vladlen Popolitov.
From 015e60140e62439fe6a45afc2c2fcdf464514219 Mon Sep 17 00:00:00 2001
From: Vladlen Popolitov 
Date: Tue, 28 Jan 2025 18:34:17 +0700
Subject: [PATCH v3] work_mem_vars limit increased in 64bit Windows

---
 src/backend/access/gin/ginfast.c|  2 +-
 src/backend/access/gin/ginget.c |  2 +-
 src/backend/access/hash/hash.c  |  4 ++--
 src/backend/access/heap/vacuumlazy.c|  2 +-
 src/backend/access/nbtree/nbtpage.c |  2 +-
 src/backend/commands/vacuumparallel.c   |  2 +-
 src/backend/executor/execUtils.c|  2 +-
 src/backend/executor/nodeBitmapIndexscan.c  |  2 +-
 src/backend/executor/nodeBitmapOr.c |  2 +-
 src/backend/nodes/tidbitmap.c   |  8 
 src/backend/optimizer

Re: XMLDocument (SQL/XML X030)

2025-01-28 Thread Chapman Flack
On 01/28/25 03:14, Jim Jones wrote:
> I'd say the difference is due to how the two systems handle the XML data
> type and unquoted identifiers in general, rather than a difference in
> the behaviour of the function itself. 

I'd go further and say it's entirely down to how the two systems
handle unquoted identifiers. In neither case was there ever any
XML value created with XML names in one case and then changed to
the other. The SQL names were already in their (DB2- or PostgreSQL-
specific) folded form by the first moment any XML library code ever
saw them. The XML code handled them faithfully ever after, whether
in serialized or in node-tree form.

Presumably both DB2 and PostgreSQL users soon know in their sleep
what their respective systems do to unquoted identifiers, and know
that quoting is the way to control that when it matters.

Regards,
-Chap





Re: NOT ENFORCED constraint feature

2025-01-28 Thread Peter Eisentraut

On 28.01.25 11:58, Amul Sul wrote:

This behavior is not correct:

+-- Changing it back to ENFORCED will leave the constraint in the NOT
VALID state
+ALTER TABLE FKTABLE ALTER CONSTRAINT fktable_ftest1_fkey ENFORCED;
+-- Which needs to be explicitly validated.
+ALTER TABLE FKTABLE VALIDATE CONSTRAINT fktable_ftest1_fkey;

Setting the constraint to enforced should enforce it immediately.  This
SQL statement is covered by the SQL standard.  Also, I think it's a
better user experience if you don't require two steps.


Let me clarify: the constraint will be enforced for new inserts and
updates, but it won't be validated against existing data, so those
will remain marked as invalid.


Yes, I understand, but that is not the correct behavior of this
command per the SQL standard.





Re: Sample rate added to pg_stat_statements

2025-01-28 Thread Ilia Evdokimov


On 28.01.2025 20:21, Sami Imseih wrote:

All the changes mentioned above are included in the v13 patch. Since the
patch status is 'Ready for Committer,' I believe it is now better for
upstream inclusion, with improved details in tests and documentation. Do
you have any further suggestions?

I am not quite clear on the sample_1.out. I do like the idea of separating
the sample tests, but I was thinking of something a bit more simple.
What do you think of my attached, sampling.sql, test? It tests sample
rate in both
simple and extended query protocols and for both top level and
nested levels?



That sounds great! I've added your sampling.sql file to my v14 
patch. However, I was focused on testing sample_rate values between 0 
and 1. The approach I came up with was using the sample{_1}.out files.  
I've removed the test involving those files for now, but if the 
committer prefers to keep them, I can reintroduce them.






If anyone has the capability to run this benchmark on machines with more
CPUs or with different queries, it would be nice. I’d appreciate any
suggestions or feedback.

I wanted to share some additional benchmarks I ran as well
on a r8g.48xlarge ( 192 vCPUs, 1,536 GiB of memory) configured
with 16GB of shared_buffers. I also attached the benchmark.sh
script used to generate the output.
The benchmark is running the select-only pgbench workload,
so we have a single heavily contentious entry, which is the
worst case.

The test shows that the spinlock (SpinDelay waits)
becomes an issue at high connection counts and will
become worse on larger machines. A sample_rate going from
1 to .75 shows a 60% improvement; but this is on a single
contentious entry. Most workloads will likely not see this type
of improvement. I also could not really observe
this type of difference on smaller machines ( i.e. 32 vCPUs),
as expected.
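
The mechanics behind these numbers are straightforward: with sample_rate
below 1, a backend skips the shared hashtable entry (and its spinlock) for a
fraction of executions. A rough sketch of that per-query decision, assuming
it is a plain random draw against the GUC (the variable and function names
here are placeholders, not necessarily what the patch uses):

#include "common/pg_prng.h"

static double pgss_sample_rate = 1.0;	/* the new GUC, name assumed */

static bool
pgss_sample_query(void)
{
	if (pgss_sample_rate >= 1.0)
		return true;
	if (pgss_sample_rate <= 0.0)
		return false;
	return pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
}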

## init
pgbench -i -s500

### 192 connections
pgbench -c192 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 484338.769799 (without initial connection time)
waits
-
   11107  SpinDelay
9568  CPU
 929  ClientRead
  13  DataFileRead
   3  BufferMapping

sample_rate = .75
tps = 909547.562124 (without initial connection time)
waits
-
   12079  CPU
4781  SpinDelay
2100  ClientRead

sample_rate = .5
tps = 1028594.555273 (without initial connection time)
waits
-
   13253  CPU
3378  ClientRead
 174  SpinDelay

sample_rate = .25
tps = 1019507.126313 (without initial connection time)
waits
-
   13397  CPU
3423  ClientRead

sample_rate = 0
tps = 1015425.288538 (without initial connection time)
waits
-
   13106  CPU
3502  ClientRead

### 32 connections
pgbench -c32 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 620667.049565 (without initial connection time)
waits
-
1782  CPU
 560  ClientRead

sample_rate = .75
tps = 620663.131347 (without initial connection time)
waits
-
1736  CPU
 554  ClientRead

sample_rate = .5
tps = 624094.688239 (without initial connection time)
waits
-
1741  CPU
 648  ClientRead

sample_rate = .25
tps = 628638.538204 (without initial connection time)
waits
-
1702  CPU
 576  ClientRead

sample_rate = 0
tps = 630483.464912 (without initial connection time)
waits
-
1638  CPU
 574  ClientRead

Regards,

Sami



Thank you so much for benchmarking this on a pretty large machine with a 
large number of CPUs. The results look fantastic, and I truly appreciate 
your effort.


BTW, I realized that the 'sampling' test needs to be added not only to 
the Makefile but also to meson.build. I've included that in the v14 patch.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From 7c9d45325e29a65259740d5255d39f9f57ee6fba Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Tue, 28 Jan 2025 23:04:34 +0300
Subject: [PATCH] Allow setting sample rate for pg_stat_statements

New configuration parameter pg_stat_statements.sample_rate makes it
possible to track just a fraction of the queries meeting the configured
threshold, to reduce the amount of tracking.
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/sampling.out  | 174 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  59 +-
 contrib/pg_stat_statements/sql/sampling.sql   |  50 +
 doc/src/sgml/pgstatstatements.sgml|  19 ++
 6 files changed, 297 insertions(+), 8 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/sampling.out
 create mode 100644 contrib/pg_stat_statements/sql/sampling.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 241c02587b..b70bdfaf2d 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf

Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-01-28 Thread Shubham Khanna
Hi all,

I propose adding the --clean-publisher-objects option to the
pg_createsubscriber utility. As discussed in [1], this feature ensures
a clean and streamlined setup of logical replication by removing stale
or unnecessary publications from the subscriber node. These
publications, replicated during streaming replication, become
redundant after converting to logical replication and serve no further
purpose. This patch introduces the drop_all_publications() function,
which efficiently fetches and drops all publications on the subscriber
node within a single transaction. Since this cleanup is not required
when upgrading streaming replication clusters, as mentioned in [2],
this feature is supported only when the --clean-publisher-objects
option is specified, allowing users to choose accordingly.
Additionally, other related objects, such as subscriptions and
replication slots, may also require cleanup. I plan to analyze this
further and include them in subsequent patches.
The attached patch includes the necessary changes for this feature.
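
To give an idea of the shape such a helper could take (only a sketch written
for this description, not the exact code in the attached patch; error handling
and integration with the existing connection helpers are omitted):

/* Fetch all publications on the target database and drop them. */
static void
drop_all_publications(PGconn *conn)
{
	PGresult   *res;

	PQclear(PQexec(conn, "BEGIN"));
	res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
	{
		for (int i = 0; i < PQntuples(res); i++)
		{
			const char *name = PQgetvalue(res, i, 0);
			char	   *ident = PQescapeIdentifier(conn, name, strlen(name));
			char	   *query = psprintf("DROP PUBLICATION %s", ident);

			PQclear(PQexec(conn, query));
			PQfreemem(ident);
			pfree(query);
		}
	}
	PQclear(res);
	PQclear(PQexec(conn, "COMMIT"));
}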

[1] - 
https://www.postgresql.org/message-id/CAExHW5t4ew7ZrgcDdTv7YmuG7LVQT1ZaEny_EvtngHtEBNyjcQ%40mail.gmail.com
[2] - 
https://amitkapila16.blogspot.com/2024/09/online-upgrading-logical-and-physical.html

Thanks and regards,
Shubham Khanna.


v1-0001-Support-for-dropping-all-publications-in-pg_creat.patch
Description: Binary data


RE: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-01-28 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

> I propose adding the --clean-publisher-objects option to the
> pg_createsubscriber utility. As discussed in [1], this feature ensures
> a clean and streamlined setup of logical replication by removing stale
> or unnecessary publications from the subscriber node. These
> publications, replicated during streaming replication, become
> redundant after converting to logical replication and serve no further
> purpose. This patch introduces the drop_all_publications() function,
> which efficiently fetches and drops all publications on the subscriber
> node within a single transaction.

I think replication slots are also a type of 'publisher object', but they are not
removed for now, so the option name may not be accurate. And...

> Additionally, other related objects, such as subscriptions and
> replication slots, may also require cleanup. I plan to analyze this
> further and include them in subsequent patches.

I'm not sure replication slots should be cleaned up. Unlike other items such as
publications and subscriptions, replication slots are not replicated when they
are created on the primary instance. This means they are intentionally created
by DBAs, and there may be no strong reason to drop them after the conversion.

Another question is the style of APIs. Do you plan to provide APIs like
'cleanup-subscriber-objects' and 'cleanup-publisher-objects', or just one
'cleanup-logical-replication-objects'?

Regarding the patch:

1.
```
+   The pg_createsubscriber now supports the
+   --clean-publisher-objects to remove all publications on
+   the subscriber node before creating a new subscription.
```

This description is not suitable for the documentation. Something like:

Remove all publications on the subscriber node.

2.
```
+   /* Drop publications from the subscriber if requested */
+   drop_all_publications(dbinfo);
```

This should be called when `opts.clean_publisher_objects` is true.
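i.e. something along the lines of:

```
+   if (opts.clean_publisher_objects)
+       drop_all_publications(dbinfo);
```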

3.
You said publications are dropped within a single transaction, but the
patch does not do that. Which is correct?

4.
```
+# Set up node A as primary
+my $node_a = PostgreSQL::Test::Cluster->new('node_a');
+my $aconnstr = $node_a->connstr;
+$node_a->init(allows_streaming => 'logical');
+$node_a->append_conf('postgresql.conf', 'autovacuum = off');
+$node_a->start;
+
+# Set up node B as standby linking to node A
+$node_a->backup('backup_3');
+my $node_b = PostgreSQL::Test::Cluster->new('node_b');
+$node_b->init_from_backup($node_a, 'backup_3', has_streaming => 1);
+$node_b->append_conf(
+   'postgresql.conf', qq[
+   primary_conninfo = '$aconnstr'
+   hot_standby_feedback = on
+   max_logical_replication_workers = 5
+   ]);
+$node_b->set_standby_mode();
+$node_b->start;
```

I don't think new nodes are needed in the test. Can you reuse node_p/node_s for 
the purpose?

--
Best regards,
Hayato Kuroda



Restrict publishing of partitioned table with a foreign table as partition

2025-01-28 Thread Shlok Kyal
Hi,

As part of a discussion in [1], I am starting this thread to address
the issue reported for foreign tables.

Logical replication of foreign tables is not supported, and we throw
an error in this case. But when we create a publication on a
partitioned table that has a foreign table as a partition, the initial
sync of such a table is successful. We should also throw an error in
such cases.
With this patch, we will throw an error when we try to create a
publication on (or add to an existing publication) a partitioned table
with a foreign table as its partition or attach such a table to
existing published tables.

[1] : 
https://www.postgresql.org/message-id/CAA4eK1Lhh4SgiYQLNiWSNKGdVSzbd53%3Dsr2tQCKooEphDkUtgw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal


v1-0001-Restrict-publishing-of-partitioned-table-with-a-f.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-01-28 Thread vignesh C
On Tue, 28 Jan 2025 at 17:28, Nisha Moond  wrote:
>
> On Tue, Jan 28, 2025 at 3:26 PM Amit Kapila  wrote:
> >
> > On Mon, Dec 30, 2024 at 11:05 AM Peter Smith  wrote:
> > >
> > > I think we are often too quick to throw out perfectly good tests.
> > > Citing that some similar GUCs don't do testing as a reason to skip
> > > them just seems to me like an example of "two wrongs don't make a
> > > right".
> > >
> > > There is a third option.
> > >
> > > Keep the tests. Because they take excessive time to run, that simply
> > > means you should run them *conditionally* based on the PG_TEST_EXTRA
> > > environment variable so they don't impact the normal BF execution. The
> > > documentation [1] says this env var is for "resource intensive" tests
> > > -- AFAIK this is exactly the scenario we find ourselves in, so is
> > > exactly what this env var was meant for.
> > >
> > > Search other *.pl tests for PG_TEST_EXTRA to see some examples.
> > >
> >
> > I don't see the long-running tests to be added under PG_TEST_EXTRA as
> > that will make it unusable after some point. Now, if multiple senior
> > members feel it is okay to add long-running tests under PG_TEST_EXTRA
> > then I am open to considering it. We can keep this test as a separate
> > patch so that the patch is being tested in CI or in manual tests
> > before commit.
> >
>
> Please find the attached v64 patches. The changes in this version
> w.r.t. older patch v63 are as -
> - The changes from the v63-0001 patch have been moved to a separate thread 
> [1].
> - The v63-0002 patch has been split into two parts in v64:
>   1) 001 patch: Implements the main feature - inactive timeout-based
> slot invalidation.
>   2) 002 patch: Separates the TAP test "044_invalidate_inactive_slots"
> as suggested above.
>
> [1] 
> https://www.postgresql.org/message-id/CABdArM6pBL5hPnSQ%2B5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw%40mail.gmail.com

Few comments:
1) We can mention about the slot that do not reserve WAL is also not applicable:
+   
+Note that the idle timeout invalidation mechanism is not
+applicable for slots on the standby server that are being synced
+from the primary server (i.e., standby slots having
+pg_replication_slots.synced
+value true).
+Synced slots are always considered to be inactive because they don't
+perform logical decoding to produce changes.
+   

2) Similarly we can mention in the commit message also that it will
not be considered for slot that do not reserve WAL:
Note that the idle timeout invalidation mechanism is not
applicable for slots on the standby server that are being synced
from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because
they don't perform logical decoding to produce changes.

3) Since idle_replication_slot_timeout is somewhat similar to
max_slot_wal_keep_size, we can move idle_replication_slot_timeout
after max_slot_wal_keep_size instead of keeping it after
wal_sender_timeout.
+ 
+  idle_replication_slot_timeout
(integer)
+  
+   idle_replication_slot_timeout
configuration parameter
+  
+  
+  
+   
+Invalidate replication slots that have remained idle longer than this
+duration. If this value is specified without units, it is taken as
+minutes. A value of zero disables the idle timeout
invalidation mechanism.
+The default is one day. This parameter can only be set in the
+postgresql.conf file or on the server
command line.
+   

4) We can try to keep it to less than 80 char wherever possible:
a) Like in this case, "mechanism" can be moved to the next line:
+duration. If this value is specified without units, it is taken as
+minutes. A value of zero disables the idle timeout
invalidation mechanism.

b) Similarly here too, "slot's" can be moved to the next line:
+inactive slots. The duration of slot inactivity is calculated
using the slot's
+pg_replication_slots.inactive_since

5) You can use new ereport style to exclude brackets around errcode:
+   ereport(ERROR,
+
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("can no longer get changes
from replication slot \"%s\"",
+   NameStr(s->data.name)),
+errdetail("This slot has been
invalidated because it has remained idle longer than the configured
\"%s\" duration.",
+
"idle_replication_slot_timeout")));

Regards,
Vignesh




CREATE PUBLICATION and foreign partitions

2025-01-28 Thread Sergey Tatarintsev

Hello all!

Currently, according to the logical replication restrictions section at [1], 
"Attempts to replicate ... foreign tables, will result in an error."

But this check does not cover partitioned tables with foreign partitions.

As we discussed here 
https://www.postgresql.org/message-id/CAA4eK1Lhh4SgiYQLNiWSNKGdVSzbd53%3Dsr2tQCKooEphDkUtgw%40mail.gmail.com

we decided to discuss this issue separately.
I have attached patch for this issue.

The correction only applies to the "CREATE PUBLICATION ... FOR TABLE" case, 
but not to "FOR ALL TABLES" or "TABLES IN SCHEMA", because for those the 
table list is built at subscription time.
I also have another patch that filters such partitioned tables at subscription 
time, but we cannot throw an error in that case. I'm not sure whether it is 
interesting, because it changes the behavior.


[1] - 
https://www.postgresql.org/docs/devel/logical-replication-restrictions.html
From c69e26ef8519dfbfee9edf586ec8742f79e92269 Mon Sep 17 00:00:00 2001
From: Sergey Tatarintsev 
Date: Wed, 29 Jan 2025 12:40:54 +0700
Subject: [PATCH] Attempts to replicate partitioned tables with foreign
 partitions, will result in an error

---
 src/backend/catalog/pg_publication.c  | 36 +++
 src/test/regress/expected/publication.out | 16 ++
 src/test/regress/sql/publication.sql  | 13 
 3 files changed, 65 insertions(+)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 09e2dbdd10..a426986c7f 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -48,6 +48,34 @@ typedef struct
  * table. */
 } published_rel;
 
+/*
+ * Check if the relation has foreign partitions or
+ * is a foreign table itself.
+ */
+static bool IsRelationWithForeignData(Oid relid)
+{
+	List	   *all_parts;
+	ListCell   *lc;
+	char		relkind = get_rel_relkind(relid);
+
+	if (relkind == RELKIND_FOREIGN_TABLE)
+		return true;
+	else if (relkind != RELKIND_PARTITIONED_TABLE)
+		return false;
+
+	all_parts = find_all_inheritors(relid, NoLock, NULL);
+
+	foreach(lc, all_parts)
+	{
+		Oid			partOid = lfirst_oid(lc);
+
+		if (get_rel_relkind(partOid) == RELKIND_FOREIGN_TABLE)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Check if relation can be in given publication and throws appropriate
  * error if not.
@@ -64,6 +92,14 @@ check_publication_add_relation(Relation targetrel)
 		RelationGetRelationName(targetrel)),
  errdetail_relkind_not_supported(RelationGetForm(targetrel)->relkind)));
 
+	/* Can't be partitioned table with foreign partitions */
+	if (IsRelationWithForeignData(RelationGetForm(targetrel)->oid))
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add relation \"%s\" to publication",
+		RelationGetRelationName(targetrel)),
+ errdetail("This operation is not supported for partitioned tables with foreign partitions")));
+
 	/* Can't be system table */
 	if (IsCatalogRelation(targetrel))
 		ereport(ERROR,
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 5de2d64d01..a9c02a1503 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -1847,6 +1847,22 @@ Tables:
 DROP PUBLICATION pub1;
 DROP PUBLICATION pub2;
 DROP TABLE gencols;
+-- Attempts to replicate foreign tables, will result in an error
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER fdw FOREIGN DATA WRAPPER dummy;
+CREATE TABLE ptab(id int) PARTITION BY RANGE(id);
+CREATE TABLE part1 PARTITION OF ptab FOR VALUES FROM (0) TO (5);
+CREATE TABLE part2 PARTITION OF ptab FOR VALUES FROM (5) TO (15) PARTITION BY RANGE(id);
+CREATE TABLE part21 PARTITION OF part2 FOR VALUES FROM (5) TO (10);
+CREATE FOREIGN TABLE part22 PARTITION OF part2 FOR VALUES FROM (10) TO (15) SERVER fdw;
+CREATE PUBLICATION fdwpub FOR TABLE ptab;
+ERROR:  cannot add relation "ptab" to publication
+DETAIL:  This operation is not supported for partitioned tables with foreign partitions
+CREATE PUBLICATION fdwpub_via_root FOR TABLE ptab WITH (publish_via_partition_root);
+ERROR:  cannot add relation "ptab" to publication
+DETAIL:  This operation is not supported for partitioned tables with foreign partitions
+DROP TABLE ptab;
+DROP FOREIGN DATA WRAPPER dummy CASCADE;
 RESET client_min_messages;
 RESET SESSION AUTHORIZATION;
 DROP ROLE regress_publication_user, regress_publication_user2;
diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql
index 48e68bcca2..c69ad9682b 100644
--- a/src/test/regress/sql/publication.sql
+++ b/src/test/regress/sql/publication.sql
@@ -1153,6 +1153,19 @@ DROP PUBLICATION pub1;
 DROP PUBLICATION pub2;
 DROP TABLE gencols;
 
+-- Attempts to replicate foreign tables, will result in an error
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER fdw FOREIGN DATA WRAPPER dummy;
+CREATE TABLE ptab(id int) PARTITION BY RANGE(id);
+CREATE TABLE part1 PARTITION OF ptab FOR VALUES FROM (0) TO (5);
+CREAT

Re: create subscription with (origin = none, copy_data = on)

2025-01-28 Thread Sergey Tatarintsev

23.01.2025 09:46, Amit Kapila пишет:

On Wed, Jan 22, 2025 at 9:44 PM Sergey Tatarintsev
 wrote:

22.01.2025 18:41, Shlok Kyal пишет:

Also we still don't care about foreign partitions  (as I wrote earlier
we should raise an ERROR for such publications).


I think dealing with this separately from the origin vs. partitioned
table issue is better.


ok! let it be so
I made a patch for such cases. You can see it here:

https://www.postgresql.org/message-id/c78766fa-4eff-4805-ad9c-868f02954ad4%40postgrespro.ru





Re: SQL:2011 application time

2025-01-28 Thread Paul Jungwirth

On 1/27/25 07:47, Peter Eisentraut wrote:
> On 24.01.25 03:55, Tom Lane wrote:
>> I've now run an exhaustive search through the last three months of
>> buildfarm runs, and found just one additional instance of the same
>> failure.  The three matches are
>>
>> 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2025-01-22%2005%3A49%3A08
>> 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2025-01-22%2001%3A29%3A35
>> 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2025-01-22%2001%3A17%3A14
>>
>> Since those are all post-1772d554b0, it's difficult to avoid the
>> conclusion that that either introduced the error or allowed a
>> pre-existing problem to become visible.
>
> I found a few more in the bowels of various Cirrus CI jobs:
>
> https://cirrus-ci.com/task/5125479033733120
> -> 
https://api.cirrus-ci.com/v1/artifact/task/5125479033733120/testrun/build/testrun/regress/
> regress/regression.diffs
>
> https://cirrus-ci.com/task/4562529080311808
> -> 
https://api.cirrus-ci.com/v1/artifact/task/4562529080311808/log/src/test/recovery/tmp_check/
> regression.diffs
>
> https://cirrus-ci.com/task/5985049025183744
> -> 
https://api.cirrus-ci.com/v1/artifact/task/5985049025183744/log/src/bin/pg_upgrade/tmp_check/
> regression.diffs
>
> https://cirrus-ci.com/task/4702566639992832
> -> 
https://api.cirrus-ci.com/v1/artifact/task/4702566639992832/log/src/test/recovery/tmp_check/
> regression.diffs

Thanks to you both for finding some more examples! This answers one question I've been wondering 
about: Why do we get this failure for range types . . .:


```
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/without_overlaps.out 
/tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/without_overlaps.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/without_overlaps.out	2025-01-25 
03:14:11.906404866 +
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/without_overlaps.out	2025-01-25 
03:21:08.218167913 +

@@ -1792,8 +1792,6 @@
   SET valid_at = CASE WHEN lower(valid_at) = '2018-01-01' THEN 
daterange('2018-01-01', '2018-01-05')
   WHEN lower(valid_at) = '2018-02-01' THEN daterange('2018-01-05', 
'2018-03-01') END

   WHERE id = '[6,7)';
-ERROR:  update or delete on table "temporal_rng" violates RESTRICT setting of foreign key 
constraint "temporal_fk_rng2rng_fk" on table "temporal_fk_rng2rng"
-DETAIL:  Key (id, valid_at)=([6,7), [2018-01-01,2018-02-01)) is referenced from table 
"temporal_fk_rng2rng".

 -- a PK update that fails because both are referenced (even before commit):
 BEGIN;
   ALTER TABLE temporal_fk_rng2rng
```

. . . but never a failure later in the file for the same scenario with multiranges? But Peter's 
links [1] now include an example of that too:


```
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/without_overlaps.out 
/tmp/cirrus-ci-build/src/bin/pg_upgrade/tmp_check/results/without_overlaps.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/without_overlaps.out	2025-01-25 
04:31:18.353128126 +
+++ /tmp/cirrus-ci-build/src/bin/pg_upgrade/tmp_check/results/without_overlaps.out	2025-01-25 
04:35:22.020363327 +

@@ -2311,8 +2311,6 @@
   SET valid_at = CASE WHEN lower(valid_at) = '2018-01-01' THEN 
datemultirange(daterange('2018-01-01', '2018-01-05'))
   WHEN lower(valid_at) = '2018-02-01' THEN 
datemultirange(daterange('2018-01-05', '2018-03-01')) END

   WHERE id = '[6,7)';
-ERROR:  update or delete on table "temporal_mltrng" violates RESTRICT setting of foreign key 
constraint "temporal_fk_mltrng2mltrng_fk" on table "temporal_fk_mltrng2mltrng"
-DETAIL:  Key (id, valid_at)=([6,7), {[2018-01-01,2018-02-01)}) is referenced from table 
"temporal_fk_mltrng2mltrng".

 -- a PK update that fails because both are referenced (even before commit):
 BEGIN;
   ALTER TABLE temporal_fk_mltrng2mltrng
```

So that is relieving. Still it's interesting that it's a 6:1 ratio.

I've attached a patch that causes both failures to appear every time (v48.0). It shows that if the 
RESTRICT constraint accidentally loaded the cached query plan from the most recently cached NO 
ACTION constraint (which we test just before testing RESTRICT), it would create matching failures. 
So some kind of oid conflict could cause that.


On 1/27/25 07:56, Peter Eisentraut wrote:
> I think this call in ri_restrict()
>
>  ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT);
>
> needs to use a different third argument for NO ACTION vs. RESTRICT, since we 
are now sometimes using
> different queries for them.
>
> However, the RI_QueryKey also uses the constraint OID as part of the hash 
key, so even this mistake
> would not trigger any bad effect unless we also have OID collisions?

That is my take too. I haven't worked out how an OID collision could happen though. Since we are 
running the tests in parallel could the other tests generate enough oids to roll the counter around? 
Surely not. And I 

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2025-01-28 Thread John Naylor
On Sat, Jan 25, 2025 at 3:35 AM Devulapalli, Raghuveer
 wrote:

> > #1 - The choice of AVX-512. There is no such thing as a "CRC instruction 
> > operating
> > on 8 bytes", and the proposed algorithm is a multistep process using 
> > carryless
> > multiplication and requiring at least 256 bytes of input. The Chromium 
> > sources
> > cited as the source for this patch also contain an implementation using 
> > 128-bit
> > instructions, and which only requires at least 64 bytes of input. Is there 
> > a reason
> > that not tested or proposed as well? That would be much easier to 
> > read/maintain,
> > work on more systems, and might give a speed boost on smaller inputs. These 
> > are
> > useful properties to have.
> >
> > https://github.com/chromium/chromium/blob/main/third_party/zlib/crc32_simd
> > .c#L215
>
> Agreed. postgres already has the SSE42 version pg_comp_crc32c_sse42, but I 
> didn’t
> realize it uses the crc32 instruction which processes only 8 bytes at a time. 
> This can
> certainly be upgraded to process 64bytes at a time and should be faster. 
> Since most
> of the AVX-512 stuff is almost ready, I propose to do this in a follow up 
> patch immediately.

It doesn't make sense to me that more limited/difficult hardware
support (and more complex coding for that) and a larger input
threshold should be a prerequisite for something that doesn't have
these disadvantages.

> Let me know if you disagree. The AVX512 version processes 256 bytes at a time 
> and will
> most certainly be faster than the improved SSE42 version, which is why the 
> chromium
> library has both AVX512 and SSE42.

It looks like chromium simply vendored the zlib library. Input
destined for compression is always going to be "large". That's not
true in general for our use case, and we mentioned that fact seven
months ago, when Andres said upthread [1]: "This is extremely workload
dependent, it's not hard to find workloads with lots of very small
record and very few big ones...". Given that feedback, it would have
made a lot of sense to mention the 64-byte alternative back then,
especially since it's the exact same pclmull algorithm based on the
same paper, and is found in the same zlib .c file, but for some reason
that was not done.

More broadly, the best strategy is to start with the customer and work
backward to the technology. It's more risky to pick the technology
upfront and try to find ways to use it. My goal here is to help you
make the right tradeoffs. Here's my view:

1. If we can have a relatively low input size threshold for
improvement, it's possibly worth a bit of additional complexity in
configure and run-time checks. There is a complicating factor in
testing that though: the latency of carryless multiplication
instructions varies drastically on different microarchitectures.
2. If we can improve large inputs in a simple fashion, with no
additional hardware support, that's worth doing in any case.
3. Complex hardware support (6 CPUIDs!) that only works on large
inputs (a minority of workloads) looks to be the worst of both worlds
and it's not the tradeoff we should make.
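
To make that tradeoff concrete, what is being weighed is essentially a
length-gated runtime dispatch along the lines of the sketch below (the
cutoff, the availability flag, and the pg_comp_crc32c_pclmul name are
placeholders for illustration, not anything settled in this thread):

static bool use_pclmul;		/* would be set once from CPUID checks */

pg_crc32c
pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
{
	/* the folding path only pays off above some minimum input size */
	if (use_pclmul && len >= 64)
		return pg_comp_crc32c_pclmul(crc, data, len);	/* hypothetical */

	return pg_comp_crc32c_sse42(crc, data, len);	/* existing 8-bytes-at-a-time loop */
}

The lower that cutoff can be, the more of our (often small) WAL records
benefit, which is why a 64-byte variant is attractive compared to one that
needs at least 256 bytes of input.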

Further, we verified upthread that Intel's current and near-future
product line includes server chips (some with over 100 cores, so not
exactly low-end) that don't support AVX-512 at all. I have no idea how
common they will be, but they will certainly be found in cloud
datacenters somewhere. Shouldn't we have an answer for them as well?

> > #2 - The legal status of the algorithm from following Intel white paper, 
> > which is
> > missing from its original location, archived here:
> >
> > https://web.archive.org/web/20220802143127/https://www.intel.com/content/
> > dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-
> > instruction-paper.pdf
> >
> > https://github.com/torvalds/linux/blob/master/arch/x86/crypto/crc32c-pcl-intel-
> > asm_64.S
> >
> > ...so I'm unclear if these patents are applicable to software 
> > implementations.
> > They also seem to be expired, but I am not a lawyer.
> > Could you look into this please? Even if we do end up with AVX-512, this 
> > would be
> > a good fallback.
>
> Given that SSE42 is pretty much available in all x86 processors at this 
> point, do we need a
> fallback C version specially after we improve the SSE42 version.

I know you had extended time off work, but I've already shared my
findings and explained my reasoning [2]. The title of the paper is
"Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction",
so unsurprisingly it does improve the SSE42 version. With a few dozen
lines of code, I can get ~3x speedup on page-sized inputs. At the very
least we want to use this technique on Arm [3], and the only blocker
now is the question regarding the patents. I'm interested to hear the
response on this.

[1] 
https://www.postgresql.org/message-id/20240612201135.kk77tiqcux77lgev%40awork3.anarazel.de
[2] 
https://www.postgresql.org/message-id/CANWCAZbr4sO1bPoS%2BE%3DiRWnrBZp7zUKZEJ

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-01-28 Thread Amit Kapila
On Tue, Jan 28, 2025 at 12:01 PM Ashutosh Bapat
 wrote:
>
> If we want to stick to --database= supporting a pattern looks better
> than just a single special pattern *.
>

This sounds reasonable to me as well. Note that the interaction with
other parameters like --replication-slot is not yet discussed. I think
that if the number of slots given matches the number of databases
fetched based on the pattern match, then we can use them; otherwise,
return an ERROR. The other option could be that we don't allow
options like --replication-slot along with the pattern-matching option.

-- 
With Regards,
Amit Kapila.




Comment cleanup - it's vs its

2025-01-28 Thread Jacob Brazeal
I noticed a few incorrect usages of it's vs its and decided to search
systematically for any mistakes. Overall, we use it right more than 99% of
the time but there were still about 20 mistakes.

Jacob


v1-its.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-01-28 Thread Peter Smith
On Tue, Jan 28, 2025 at 10:58 PM Nisha Moond  wrote:
> Please find the attached v64 patches. The changes in this version
> w.r.t. older patch v63 are as -
> - The changes from the v63-0001 patch have been moved to a separate thread 
> [1].
> - The v63-0002 patch has been split into two parts in v64:
>   1) 001 patch: Implements the main feature - inactive timeout-based
> slot invalidation.
>   2) 002 patch: Separates the TAP test "044_invalidate_inactive_slots"
> as suggested above.
>

Hi Nisha.

Some review comments for patch v64-0001.

==
1. General

Too much of this patch v64-0001 is identical/duplicated code with the
recent "spin-off" patch v1-0002 [1]. e.g. Most of v1-0001 is now also
embedded in the v64-0001.

This is making for an unnecessarily tricky 2 x review of all the same
code, and it will also cause rebase hassles later.

Even if you wanted the 'error_if_invalid' stuff to be discussed and
pushed separately, I think it will be much easier to keep a "COPY" of
that v1-0002 patch here as a prerequisite for v64-0001, so that all of
the current code duplication can be removed.

==
src/backend/replication/slot.c

ReplicationSlotAcquire:

2.
+ *
+ * An error is raised if error_if_invalid is true and the slot is found to
+ * be invalid.
  */

and

+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * previously invalidated due to inactive timeout.
+ */
+ if (error_if_invalid && s->data.invalidated == RS_INVAL_IDLE_TIMEOUT)
+ {

Although those comments are correct for v1-0001 [1], they are misleading
as copied into v64-0001, because here you are only checking the
invalidation cause RS_INVAL_IDLE_TIMEOUT and none of the other
possible causes.

~~~

ReportSlotInvalidation:

3.
+ case RS_INVAL_IDLE_TIMEOUT:
+ Assert(inactive_since > 0);
+ /* translator: second %s is a GUC variable name */
+ appendStringInfo(&err_detail,
+ _("The slot has remained idle since %s, which is longer than the
configured \"%s\" duration."),
+ timestamptz_to_str(inactive_since),
+ "idle_replication_slot_timeout");

I have the same question I already asked in my review of patch v1-0002
[1]: isn't there some mismatch between using the _() macro, which is
for translations, and using errdetail_internal(), which is for
strings *not* requiring translation?

~~~

InvalidatePossiblyObsoleteSlot:

4.
/*
 * The logical replication slots shouldn't be invalidated as GUC
 * max_slot_wal_keep_size is set to -1 during the binary upgrade. See
 * check_old_cluster_for_valid_slots() where we ensure that no
 * invalidated slots exist before the upgrade.
 */
Assert(!(*invalidated && SlotIsLogical(s) && IsBinaryUpgrade));

Unless I am mistaken, all of the v63 cleanups of the above binary
upgrade code assert stuff have vanished somewhere between v63 and v64.
I cannot find them in the spin-off thread. All accidentally lost? (in
2 places)

Not only that but the accompanying comment modification (to mention
"and idle_replication_slot_timeout is set to 0") is also MIA last seen
in v63 (??)

==
[1] 
https://www.postgresql.org/message-id/CABdArM6pBL5hPnSQ%2B5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Change GUC hashtable to use simplehash?

2025-01-28 Thread John Naylor
On Thu, Jan 23, 2025 at 8:52 AM Anton A. Melnikov
 wrote:
>
> Hi!
>
> On 22.01.2025 11:37, John Naylor wrote:
> > On Fri, Jan 17, 2025 at 4:50 PM John Naylor  wrote:
> >>
> >> It would be a lot more readable to revert the offending commit
> >> instead, since its predecessor had a much simpler bytewise loop.
>
> Agreed that reverting seems as a preferable way, and here's why.

This is done -- thanks for the report, and for testing.

-- 
John Naylor
Amazon Web Services




Re: Introduce XID age and inactive timeout based replication slot invalidation

2025-01-28 Thread vignesh C
On Tue, 28 Jan 2025 at 17:28, Nisha Moond  wrote:
>
> Please find the attached v64 patches. The changes in this version
> w.r.t. older patch v63 are as -
> - The changes from the v63-0001 patch have been moved to a separate thread 
> [1].
> - The v63-0002 patch has been split into two parts in v64:
>   1) 001 patch: Implements the main feature - inactive timeout-based
> slot invalidation.
>   2) 002 patch: Separates the TAP test "044_invalidate_inactive_slots"
> as suggested above.

Currently the test takes around 220 seconds for me. We could do the
following changes to bring it down to around 70 to 80 seconds:
1) Set idle_replication_slot_timeout to 70 seconds
+# Avoid unpredictability
+$primary->append_conf(
+   'postgresql.conf', qq{
+checkpoint_timeout = 1h
+});
+$primary->start;

2) I felt just 1 second more is enough, unless you anticipate a random
failure; the test passes for me:
+# Give enough time for inactive_since to exceed the timeout
+sleep($idle_timeout_1min * 60 + 10);

3) Since we will be setting it to 70 seconds above, changing the
configuration and reload is not required:
+# Set timeout GUC so that the next checkpoint will invalidate inactive slots
+$primary->safe_psql(
+   'postgres', qq[
+ALTER SYSTEM SET idle_replication_slot_timeout TO
'${idle_timeout_1min}min';
+]);
+$primary->reload;

4) Here you can add some comments that 60s has elapsed and the slot
will get invalidated in another 10 seconds, and pass timeout as 10s to
wait_for_slot_invalidation:
+# Wait for logical failover slot to become inactive on the primary. Note that
+# nobody has acquired the slot yet, so it must get invalidated due to
+# idle timeout.
+wait_for_slot_invalidation($primary, 'sync_slot1', $logstart,
+   $idle_timeout_1min);

5) We can have another streaming replication cluster set up, maybe
primary2 and standby2 nodes, and stop standby2 immediately along
with the first streaming replication cluster itself:
+# Make the standby slot on the primary inactive and check for invalidation
+$standby1->stop;
+wait_for_slot_invalidation($primary, 'sb_slot1', $logstart,
+   $idle_timeout_1min);

6) We can rename primary to primary1, or standby1 to standby, to keep the
names consistent:
+# Create standby slot on the primary
+$primary->safe_psql(
+   'postgres', qq[
+SELECT pg_create_physical_replication_slot(slot_name :=
'sb_slot1', immediately_reserve := true);
+]);
+
+# Create standby
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup($primary, $backup_name, has_streaming => 1);

Regards,
Vignesh




Re: Comment cleanup - it's vs its

2025-01-28 Thread John Naylor
On Wed, Jan 29, 2025 at 1:46 PM Jacob Brazeal  wrote:
>
> I noticed a few incorrect usages of it's vs its and decided to search 
> systematically for any mistakes. Overall, we use it right more than 99% of 
> the time but there were still about 20 mistakes.

LGTM, except maybe this one:

- * try read non-locale sign, it's happen only if format is not exact
+ * try read non-locale sign, it happens only if format is not exact

...sounds better to me with "which happens".

-- 
John Naylor
Amazon Web Services




Re: Comment cleanup - it's vs its

2025-01-28 Thread John Naylor
On Wed, Jan 29, 2025 at 1:46 PM Jacob Brazeal  wrote:
>
> I noticed a few incorrect usages of it's vs its and decided to search 
> systematically for any mistakes. Overall, we use it right more than 99% of 
> the time but there were still about 20 mistakes.

I just noticed this, too:

-# DROP DATABASE should drops it's slots, including active slots.
+# DROP DATABASE should drops its slots, including active slots.

"should drop"

-- 
John Naylor
Amazon Web Services




Re: per backend WAL statistics

2025-01-28 Thread Rahila Syed
Hi,

Thank you for the patchset. Having per-backend WAL statistics,
in addition to cluster-wide ones, is useful.

I had a few comments while looking at v6-0003-* patch.

+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();

Maybe an explicit call to AuxiliaryPidGetProc() followed by a check
for pgstat_tracks_backend_bktype() would be more maintainable,
since the processes tracked by AuxiliaryPidGetProc() and
pgstat_tracks_backend_bktype() might diverge in the future.

On that note, it is not clear to me why the WAL writer statistics are not
included in the per-backend WAL statistics. I understand the same
limitation currently exists in pgstat_tracks_io_bktype(), but why does
that need to be extended to WAL statistics?

+ pg_stat_get_backend_wal
+ pg_stat_get_backend_wal ( integer ) -> record
Should the naming describe what is being returned more clearly?
Something like pg_stat_get_backend_wal_activity()? Currently it
suggests that it returns a backend's WAL, which is not the case.

+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendStats.pending_wal.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_write_time,
+  end, start);
+ }
At the risk of nitpicking, may I suggest moving the above code, which is
under the
track_wal_io_timing check, to the existing check before this added chunk?
This way, all code related to track_wal_io_timing will be grouped together,
closer to where the "end" variable is computed.


Thank you,
Rahila Syed



On Tue, Jan 21, 2025 at 12:50 PM Bertrand Drouvot <
bertranddrouvot...@gmail.com> wrote:

> Hi,
>
> On Fri, Jan 17, 2025 at 08:43:57AM +0900, Michael Paquier wrote:
> > On Thu, Jan 16, 2025 at 12:44:20PM -0500, Andres Freund wrote:
> > > On 2025-01-16 17:11:09 +, Bertrand Drouvot wrote:
> > >> So, do you think that the initial proposal that has been made here
> (See R1. in
> > >> [2]) i.e make use of a new PendingBackendWalStats variable:
> > >
> > > Well, I think this first needs be fixed for for the IO stats change
> made in
> > >
> > > Once we have a pattern to model after, we can apply the same scheme
> here.
> >
> > Okay, thanks for the input.  I was not sure what you intended
> > originally with all this part of the backend code, and how much would
> > be acceptable.  The line is clear now.
> >
> > >> 0003 does not rely on pgstat_prep_backend_pending() for its pending
> statistics
> > >> but on a new PendingBackendWalStats variable. The reason is that the
> pending wal
> > >> statistics are incremented in a critical section (see XLogWrite(),
> and so
> > >> a call to pgstat_prep_pending_entry() could trigger a failed
> assertion:
> > >> MemoryContextAllocZero()->"CritSectionCount == 0 ||
> (context)->allowInCritSection"
> > >> "
> > >>
> > >> and implemented up to v4 is a viable approach?
> > >
> > > Yes-ish.  I think it would be better to make it slightly more general
> than
> > > that, handling this for all types of backend stats, not just for WAL.
> >
> > Agreed to use the same concept for all these parts of the backend
> > stats kind rather than two of them.  Will send a reply on the original
> > backend I/O thread as well.
>
> PFA v6 that now relies on the new PendingBackendStats variable introduced
> in
> 4feba03d8b9.
>
> Remark: I moved PendingBackendStats back to pgstat.h because I think that
> the
> "simple" pending stats increment that we are adding in xlog.c are not worth
> an extra function call overhead (while it made more sense for the more
> complex IO
> stats handling). So PendingBackendStats is now visible to the outside
> world like
> PendingWalStats and friends.
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: Comment cleanup - it's vs its

2025-01-28 Thread John Naylor
On Wed, Jan 29, 2025 at 2:26 PM Jacob Brazeal  wrote:
>
> > I just noticed this, too:
> >
> > -# DROP DATABASE should drops it's slots, including active slots.
> > +# DROP DATABASE should drops its slots, including active slots.
> >
> > "should drop"
>
> Ah, good catch. Fixed in v3.

(Note: You forgot reply-all for your later messages). I ran pgindent
after applying and found it re-indented an entire paragraph. That
seemed like an unfortunate side-effect for a typo fix, so I took the
liberty of adding an additional space after a period. Some comments
use the two-space style anyway. Pushed, thanks for the patch!

--
John Naylor
Amazon Web Services




Re: Extended Statistics set/restore/clear functions.

2025-01-28 Thread jian he
hi.

select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
  pg_ndistinct

 {"1, 37": -2147483648}
(1 row)

this is not what we expected?
For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN.

For the key part of pg_ndistinct, see example.
select '{"1, 16\t":"1"}'::pg_ndistinct;
Here \t is not the tab character (ASCII 9); it is two characters, a backslash
and the letter "t".
So shouldn't this error out?
(The same applies to \n, \r, \b.)


The ending part of pg_ndistinct_in(PG_FUNCTION_ARGS) should be:

	freeJsonLexContext(lex);
	if (result == JSON_SUCCESS)
	{
		...
	}
	else
	{
		ereturn(parse_state.escontext, (Datum) 0,
				errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
				errmsg("malformed pg_ndistinct: \"%s\"", str),
				errdetail("Must be valid JSON."));
		PG_RETURN_NULL();
	}

result is either JSON_SUCCESS or something else, so the error should be
reported in the else branch.



all these functions:
ndistinct_object_start, ndistinct_array_start,
ndistinct_object_field_start, ndistinct_array_element_start
have
ndistinctParseState *parse = state;

do we need to change it to
ndistinctParseState *parse = (ndistinctParseState *)state;
?



ndistinctParseState needs to be added to src/tools/pgindent/typedefs.list,
and should probably be changed to "typedef struct ndistinctParseState".
Also, shouldn't struct ndistinctParseState be placed at the top of
src/backend/statistics/mvdistinct.c rather than at line 340?