Re: Disallow redundant indexes

2025-04-24 Thread David Rowley
On Thu, 24 Apr 2025 at 21:27, Japin Li  wrote:
> I propose that PostgreSQL prevent redundant index creation by:
>
> - Checking for identical existing indexes during CREATE INDEX.
> - Failing with an error (like Oracle's ORA-01408) if a duplicate is found.
> - Providing a GUC parameter (allow_redundant_indexes) to control this.

> I’d love to hear your feedback or suggestions for improvement.

Sounds like pointless nannying to me. Also, we don't want GUCs that
change the way things work. It's too annoying for application
developers. I don't have a complete picture of the history of it, but
I believe we still have quite a mess as a result of some GUC-dependent
behaviour. Check the mess around backslash_quote,
standard_conforming_strings and escape_string_warning.

In any case, who are we to define what a duplicate index is? Would
creating a hash index on a column that's already part of a btree index
be disallowed? How about creating an index on (col1) when there's
already an index on (col1,col2)? One person might think that's a waste
of space, and someone else might think it's useful to have the (col1)
index to support faster index-only scans when only that column is
needed.

I get that we have REINDEX CONCURRENTLY now and there's less of a need
to recreate another duplicate index CONCURRENTLY before dropping the
old one, but aren't there still reasons to create a duplicate index
concurrently, e.g. to move an index to another tablespace without
blocking queries?
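
For the archives, that recipe looks roughly like this (index and tablespace
names are made up):

    CREATE INDEX CONCURRENTLY idx_t_col_new ON t (col) TABLESPACE fast_ts;
    DROP INDEX CONCURRENTLY idx_t_col;
    ALTER INDEX idx_t_col_new RENAME TO idx_t_col;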

If you want to do this, then maybe just write some query that looks at
pg_index to find duplicates and stick it on the wiki. Anyone who cares
enough about this can run that to check. Oh wait, someone did that
already, see [1].
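
A rough sketch of such a query (the wiki version at [1] differs in detail,
e.g. it also reports index sizes) just groups indexes by their definition:

    SELECT indrelid::regclass AS table_name,
           array_agg(indexrelid::regclass) AS duplicate_indexes
    FROM pg_index
    GROUP BY indrelid, indkey::text, indclass::text, indcollation::text,
             indexprs::text, indpred::text
    HAVING count(*) > 1;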

David

[1] https://wiki.postgresql.org/wiki/Index_Maintenance#Duplicate_indexes




Re: pg_upgrade-breaking release

2025-04-24 Thread Greg Sabino Mullane
On Thu, Apr 24, 2025 at 8:12 AM Bruce Momjian  wrote:

> Do we think most people are _not_ going to use pg_upgrade now that we
> are defaulting to checksums being enabled by default in PG 18?


I cannot imagine this would stop anyone from upgrading. It's one additional
flag, which was already a requirement for those going the other direction
in the past (checksums -> no checksums). And I also assume that "most
people" are already running with checksums enabled.

> And if so, do we think we are ever going to have a
> storage-format-changing release where pg_upgrade cannot be used?
>

Seems very unlikely, that would kind of go against the whole purpose of
pg_upgrade.

-- 
Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: RFC: Logging plan of the running query

2025-04-24 Thread torikoshia

Hi,

Attached a rebased version of the patch.

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

From 112467238f585bb3398d86ac6e1a71caa6549fb4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 24 Apr 2025 20:50:41 +0900
Subject: [PATCH v44] Add function to log the plan of the currently running
 query

Currently, we have to wait for the query execution to finish
to learn its plan, either using EXPLAIN ANALYZE or auto_explain.
This is not so convenient, for example when investigating
long-running queries in production environments.
To improve this situation, this patch adds a pg_log_query_plan()
function that requests logging of the plan of the currently
executing query.

On receipt of the request, the plan-logging code is wrapped
around every ExecProcNode, and when the executor runs one of the
wrapped ExecProcNodes, the plan is actually logged. The wrappers
are removed once the plan has been logged.
In this way, we safely avoid the overhead we would face if we
added CHECK_FOR_INTERRUPTS()-like mechanisms somewhere in the
executor code.

By default, only superusers are allowed to request logging of the
plans, because allowing any user to issue this request at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

---
 contrib/auto_explain/auto_explain.c  |  24 +-
 doc/src/sgml/func.sgml   |  55 +++
 src/backend/access/transam/xact.c|   7 +
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/Makefile|   1 +
 src/backend/commands/dynamic_explain.c   | 368 +++
 src/backend/commands/explain.c   |  38 +-
 src/backend/commands/meson.build |   1 +
 src/backend/executor/execMain.c  |  10 +
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/init/globals.c |   2 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/dynamic_explain.h   |  26 ++
 src/include/commands/explain.h   |   5 +
 src/include/commands/explain_state.h |   1 +
 src/include/miscadmin.h  |   1 +
 src/include/nodes/execnodes.h|   3 +
 src/include/storage/procsignal.h |   2 +
 src/include/tcop/pquery.h|   1 -
 src/test/regress/expected/misc_functions.out |  54 ++-
 src/test/regress/sql/misc_functions.sql  |  41 ++-
 22 files changed, 613 insertions(+), 43 deletions(-)
 create mode 100644 src/backend/commands/dynamic_explain.c
 create mode 100644 src/include/commands/dynamic_explain.h

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index cd6625020a..e720ddf39f 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -15,6 +15,7 @@
 #include 
 
 #include "access/parallel.h"
+#include "commands/dynamic_explain.h"
 #include "commands/explain.h"
 #include "commands/explain_format.h"
 #include "commands/explain_state.h"
@@ -412,26 +413,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->format = auto_explain_log_format;
 			es->settings = auto_explain_log_settings;
 
-			ExplainBeginOutput(es);
-			ExplainQueryText(es, queryDesc);
-			ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length);
-			ExplainPrintPlan(es, queryDesc);
-			if (es->analyze && auto_explain_log_triggers)
-ExplainPrintTriggers(es, queryDesc);
-			if (es->costs)
-ExplainPrintJITSummary(es, queryDesc);
-			ExplainEndOutput(es);
-
-			/* Remove last line break */
-			if (es->str->len > 0 && es->str->data[es->str->len - 1] == '\n')
-es->str->data[--es->str->len] = '\0';
-
-			/* Fix JSON to output an object */
-			if (auto_explain_log_format == EXPLAIN_FORMAT_JSON)
-			{
-es->str->data[0] = '{';
-es->str->data[es->str->len - 1] = '}';
-			}
+			ExplainStringAssemble(es, queryDesc, auto_explain_log_format,
+  auto_explain_log_triggers,
+  auto_explain_log_parameter_max_length);
 
 			/*
 			 * Note: we rely on the existing logging of context or
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 574a544d9f..13dc0a7745 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28822,6 +28822,29 @@ acl  | {postgres=arwdDxtm/postgres,foo=r/postgres}

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+   
+This function is restricted to sup

Re: pg_upgrade-breaking release

2025-04-24 Thread Greg Sabino Mullane
On Thu, Apr 24, 2025 at 8:37 AM Bruce Momjian  wrote:

> When I wrote pg_upgrade, I assumed at some point the value of changing the
> storage format would outweigh the value of allowing in-place upgrades.  I
> guess that hasn't happened yet.
>

It reminds me of TDE, which is a good example of that, in a way. It
requires pg_dump or logical replication to go from unencrypted ->
encrypted. If core were to have something like that, I guess pg_upgrade
could technically support it, but it would be equivalent to pg_dump. We've
all been spoiled by that awesome --link option! :)

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: pg_upgrade-breaking release

2025-04-24 Thread Bruce Momjian
On Thu, Apr 24, 2025 at 08:51:41AM -0400, Greg Sabino Mullane wrote:
> On Thu, Apr 24, 2025 at 8:37 AM Bruce Momjian  wrote:
> 
> When I wrote pg_upgrade, I assumed at some point the value of changing the
> storage format would outweigh the value of allowing in-place upgrades.  I
> guess that hasn't happened yet.
> 
> 
> It reminds me of TDE, which is a good example of that, in a way. It requires
> pg_dump or logical replication to go from unencrypted -> encrypted. If core
> were to have something like that, I guess pg_upgrade could technically support
> it, but it would be equivalent to pg_dump. We've all been spoiled by that
> awesome --link option! :) 

True on spoiled, but I think TDE is being held back by code complexity,
not upgrade complexity.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: Improve verification of recovery_target_timeline GUC.

2025-04-24 Thread David Steele

On 2/14/25 02:42, Michael Paquier wrote:

On Fri, Jan 24, 2025 at 01:36:45PM +, David Steele wrote:


+   timeline = strtoull(*newval, &endp, 0);
+
+   if (*endp != '\0' || errno == EINVAL || errno == ERANGE)
{
GUC_check_errdetail("\"recovery_target_timeline\" is not a 
valid number.");
return false;
}


Why not using strtou64() here?  That's more portable depending on
SIZEOF_LONG (aka the 4 bytes on WIN32 vs 8 bytes for the rest of the
world).


Done.


+
+   if (timeline < 2 || timeline > UINT_MAX)
+   {


A recovery_target_timeline of 1 is a valid value, isn't it?


+   GUC_check_errdetail(
+   "\"recovery_target_timeline\" must be between 2 and 
4,294,967,295.");
+   return false;
+   }


I would suggest to rewrite that as \"%s\" must be between %u and %u,
which would be more generic for translations in case there is an
overlap with something else.


Done. This means there are no commas in the upper bound but I don't 
think it's a big deal and it more closely matches other GUC messages.



+$logfile = slurp_file($node_standby->logfile());
+ok($logfile =~ qr/must be between 2 and 4,294,967,295/,
+   'target timelime out of max range');


These sequences of tests could be improved:
- Let's use start locations for the logs slurped, reducing the scope
of the logs scanned.


Done.


- Perhaps it would be better to rely on wait_for_log() to make sure
that the expected log contents are generated without any kind of race
condition?


Done.

I'm also now using a single cluster for all three tests rather than 
creating a new one for each test. This is definitely more efficient.


Having said that, I think these tests are awfully expensive for a single 
GUC. Unit tests would definitely be preferable but that's not an option 
for GUCs, so far as I know.
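
For reference, the wait_for_log() pattern now used looks roughly like this
(a sketch only; the exact message and names are per the attached patch):

    # remember the current end of the log so only new output is scanned
    my $log_offset = -s $node_standby->logfile;
    # ... set the bogus recovery_target_timeline and restart the node ...
    $node_standby->wait_for_log(
        qr/"recovery_target_timeline" must be between 1 and 4294967295/,
        $log_offset);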


Thanks,
-David

From 2ccf6f207a14dafe1a426208618ae7690c4513a2 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 24 Apr 2025 15:11:51 +
Subject: Improve verification of recovery_target_timeline GUC.

Currently check_recovery_target_timeline() converts any value that is not
current, latest, or a valid integer to 0. So for example:

recovery_target_timeline = 'currrent'

results in the following error:

FATAL:  22023: recovery target timeline 0 does not exist

Since there is no range checking for uint32 (but there is a cast from unsigned
long) this setting:

recovery_target_timeline = '9999999999'

results in the following error:

FATAL:  22023: recovery target timeline 1410065407 does not exist

Improve this by adding endptr checking to catch conversion errors and range
checking to exclude values < 1 and greater than UINT_MAX.
---
 src/backend/access/transam/xlogrecovery.c   | 17 +--
 src/test/recovery/t/003_recovery_targets.pl | 50 +
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index 6ce979f2d8b..e82aa739d79 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4994,13 +4994,24 @@ check_recovery_target_timeline(char **newval, void 
**extra, GucSource source)
rttg = RECOVERY_TARGET_TIMELINE_LATEST;
else
{
+   char *endp;
+   uint64  timeline;
rttg = RECOVERY_TARGET_TIMELINE_NUMERIC;
 
errno = 0;
-   strtoul(*newval, NULL, 0);
-   if (errno == EINVAL || errno == ERANGE)
+   timeline = strtou64(*newval, &endp, 0);
+
+   if (*endp != '\0' || errno == EINVAL || errno == ERANGE)
+   {
+   GUC_check_errdetail("\"%s\" is not a valid number.",
+   
"recovery_target_timeline");
+   return false;
+   }
+
+   if (timeline < 1 || timeline > UINT_MAX)
{
-   GUC_check_errdetail("\"recovery_target_timeline\" is 
not a valid number.");
+   GUC_check_errdetail("\"%s\" must be between %u and %u.",
+   
"recovery_target_timeline", 1, UINT_MAX);
return false;
}
}
diff --git a/src/test/recovery/t/003_recovery_targets.pl 
b/src/test/recovery/t/003_recovery_targets.pl
index 0ae2e982727..bd701d04f99 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -187,4 +187,54 @@ ok( $logfile =~
  qr/FATAL: .* recovery ended before configured recovery target was 
reached/,
'recovery end before target reached is a fatal error');
 
+# Invalid timeline target
+$node_standby = Postg

Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-24 Thread Christoph Berg
Re: George MacKerron
> > Before we can make this change, I think we would have to improve the
> > UX. psql does not even have any --switch for it. PostgreSQL serving
> > non-SSL and SSL on the same port doesn't make the UX better... :-/
> 
> How do you think the UX could be improved? Maybe by using a psql switch 
> and/or an env var to opt out of (or initially even to opt into) the new 
> sslmode treatment?

The env var is already there (PGSSLMODE).

Now you can say `psql -h db.example.com -p 5433 dbfoo`, but for
specifying the sslmode, you have to rewrite at least the last argument
to use connection string syntax, `psql "dbname=dbfoo sslmode=verify-full"`.
This needs to be less cumbersome. (And the names of the options make
me want to stay away from them, require/verify-ca/verify-full/verify-confusing.
Your sslmode=secure idea is really good.)

It should be as simple as
psql --ssl (= sslmode=secure)
psql --insecure (the old sslmode=require)
psql --no-ssl (= sslmode=disable)

psql -s and -S are unfortunately already taken :-/

For connection strings, perhaps the best action is to tell people that
always including "sslmode=something" is best practise. For libpq-style
key=value connection strings, that wouldn't even be ugly. For
postgresql://-style strings, we would ideally have something like http://
vs https://, but I am not sure how to squeeze that into the syntax.
(Appending ?sslmode= works, but meh.)
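
For illustration, both string styles today carry the setting the same way,
the URI form only as a query parameter:

    psql "host=db.example.com port=5433 dbname=dbfoo sslmode=verify-full"
    psql "postgresql://db.example.com:5433/dbfoo?sslmode=verify-full"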

Christoph




pg_upgrade-breaking release

2025-04-24 Thread Bruce Momjian
Do we think most people are _not_ going to use pg_upgrade now that we
are defaulting to checksums being enabled by default in PG 18?  And if
so, do we think we are ever going to have a storage-format-changing
release where pg_upgrade cannot be used?

Maybe it is too late to ask this, but I am asking anyway.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: MergeAppend could consider sorting cheapest child path

2025-04-24 Thread Andrei Lepikhov

On 3/28/25 09:19, Alexander Pyhalov wrote:

Andy Fan wrote on 2024-10-17 03:33:
I've updated the patch. One more interesting case which we found - when a
fractional path is selected, it can still be more expensive than the sorted
cheapest total path (as we look only at indexes with the necessary
pathkeys, not at indexes which allow fetching the data efficiently).
So far I couldn't find an artificial example, but we've seen inadequate index
selection due to this issue - instead of using an index suited for the filters
in WHERE, an index suitable for sorting was selected as the one having the
cheapest fractional cost.

I think it is necessary to generalise the approach a little.

Each MergeAppend subpath candidate that fits the pathkeys should be compared
to the overall-optimal path + explicit Sort node. Let's label this
two-node composition as base_path. Three cases exist:
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path; in the
total-optimal case, cheapest_total_path; and for the fractional case, we
may employ the get_cheapest_fractional_path routine to detect the proper
base_path.


It may provide a more effective plan in full, fractional and
partial scan cases alike:
1. The Sort node may be pushed down to subpaths under a parallel or 
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is 
profitable to sort them instead of switching to plain Append.


In general, analysing the regression tests changed by this code, I see
that the cost model prefers a series of small sorts instead of a
single massive one. Maybe it will show some profit in execution time.


I am not afraid of any palpable planning overhead here because we just 
do cheap cost estimation and comparison operations that don't need 
additional memory allocations. The caller is responsible for building a 
proper Sort node if this method is chosen as optimal.


In the attachment, see the patch written according to this idea. There
I introduced two new routines:

get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

I have designed the code that way to reduce duplicated code in the 
generate_orderedappend_paths routine. But the main point is that I 
envision these new routines may be reused elsewhere.


This feature looks close to the one we discussed before [1]. So, let me
CC the people from the previous conversation on this discussion.


[1] 
https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com


--
regards, Andrei Lepikhov

From 98306f0e14c12b6dee92ef5977d85fc1dd324898 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Thu, 24 Apr 2025 14:03:02 +0200
Subject: [PATCH v0] Consider explicit sort of the MergeAppend subpaths.

Expand the optimiser's search scope a little: when fetching the optimal subpath
matching the pathkeys of the MergeAppend being planned, also consider the extra
case of the overall-optimal path plus an explicit Sort node on top.

It may provide a more effective plan in both full and fractional scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or
async Append.
2. When a minor set of subpaths doesn't have a proper index and it is
profitable to sort them instead of switching to plain Append.

In general, analysing the regression tests changed by this code, I see that
the cost model prefers a series of small sortings instead of a single massive
one.

Overhead:
It seems multiple subpaths may be encountered, as well as many pathkeys.
So, to be as careful as possible here, only cost estimation is performed.
---
 .../postgres_fdw/expected/postgres_fdw.out|   6 +-
 src/backend/optimizer/path/allpaths.c |  35 ++---
 src/backend/optimizer/path/pathkeys.c | 112 ++
 src/include/optimizer/paths.h |  10 ++
 .../regress/expected/collate.icu.utf8.out |   9 +-
 src/test/regress/expected/inherit.out |  19 ++-
 src/test/regress/expected/partition_join.out  | 139 +++---
 src/test/regress/expected/partition_prune.out |  16 +-
 src/test/regress/expected/union.out   |   6 +-
 src/test/regress/sql/inherit.sql  |   4 +-
 10 files changed, 255 insertions(+), 101 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d1acee5a5fa..11b42f18cb6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10277,13 +10277,15 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
->  Nested Loop
  Join Filter: (t1.a = t2.b)
  ->  Append
-   ->  Foreign Scan on ftprt1_p1 t1_1
+   ->  Sort
+ Sort Key: t1_1.a
+ ->  Foreign Scan on ftprt1_p1 t1_1
->  Foreign Scan on ftprt1_p2 t1_2
  ->  Materialize
->  Append
 

Re: extension_control_path and "directory"

2025-04-24 Thread Matheus Alcantara
On Thu, Apr 24, 2025 at 7:21 AM Christoph Berg  wrote:
>
> Re: Matheus Alcantara
> > I've tested with the semver extension and it seems to work fine with
> > this patch. Can you please check on your side to see if it's also
> > working?
>
> Hi Matheus,
>
> thanks for the patch, it does indeed work.
>
Thanks for testing and for reviewing.

> diff --git a/src/backend/commands/extension.c 
> b/src/backend/commands/extension.c
> index 180f4af9be3..d68efd59118 100644
> --- a/src/backend/commands/extension.c
> +++ b/src/backend/commands/extension.c
> @@ -376,6 +376,14 @@ get_extension_control_directories(void)
>
> /* Substitute the path macro if needed */
> mangled = substitute_path_macro(piece, "$system", 
> system_dir);
> +
> +   /*
> +* Append "extension" suffix in case is a custom 
> extension control
> +* path.
> +*/
> +   if (strcmp(piece, "$system") != 0)
> +   mangled = psprintf("%s/extension", mangled);
>
> This would look prettier if it was something like
>
> mangled = substitute_path_macro(piece, "$system", 
> system_dir "/extension");
>
> ... but I'm wondering if it wouldn't be saner if the control path
> should be stored without "extension" in that struct. Then opening the
> control file would be path + "extension/" + filename and the extra
> directory would be path + directory, without any on-the-fly stripping
> of trailing components.
>
> The extension_control_path GUC could also be adjusted to refer to the
> directory one level above the extension/foo.control location.
>
Storing the control path directly, without any code to remove the
/extension at the end, would be trickier I think, because we would need
to change the find_in_path() function to return the path without the
suffix.

In this new version I've introduced a new "basedir" field on
ExtensionControlFile so that we can save the base directory to search
for .control files and scripts. With this new field, in
get_extension_script_directory() we just need to join control->basedir
with control->directory. Note that we still need to handle the removal
of the /extension at the end of the control path, but I think that in this
new version the code looks a bit better (IMHO) since we only need to
handle it in find_extension_control_filename(). WYT?
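
In rough terms the join looks like this (a sketch only, not the exact v2
code):

    /* sketch: build the script directory from the saved base directory */
    static char *
    get_extension_script_directory(ExtensionControlFile *control)
    {
        if (control->directory == NULL)
            return psprintf("%s/extension", control->basedir);

        return psprintf("%s/%s", control->basedir, control->directory);
    }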

>
> +   /*
> +* Assert that the control->control_dir end with /extension suffix so 
> that
> +* we can replace with the value from control->directory.
> +*/
> +   Assert(ctrldir_len >= suffix_len &&
> +  strcmp(control->control_dir + ctrldir_len - suffix_len, 
> "extension") == 0);
>
> If control_dir is coming from extension_control_path, it might have a
> different suffix. Replace the Assert by elog(ERROR). (Or see above.)
>
In v2 I've moved the logic to remove the /extension to
parse_extension_control_file(); do you think that the Assert in this
function would still be wrong? IIUC we should always have /extension at
the end of "control_dir" at this point, because the
extension_control_path GUC will omit the /extension at the end and we
will force the path to have the suffix in the
find_extension_control_filename() and
get_extension_control_directories() functions. Am I missing something
here?

I've also included some more TAP tests on this new version.

-- 
Matheus Alcantara


v2-0001-Make-directory-work-with-extension-control-path.patch
Description: Binary data


Re: Enable data checksums by default

2025-04-24 Thread Peter Eisentraut

On 23.04.25 00:24, Tomas Vondra wrote:

The patch that flips the default has been committed.

I also started a PG18 open items page and made a note that we follow up
on the upgrade experience, as was discussed in this thread.

https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items


Regarding the open item, can someone explain what exactly we are
planning to evaluate mid-beta?


If you have a PG <=17 installation without checksums (the default), and 
you do the usual upgrade procedure to PG 18 involving initdb + 
pg_upgrade, then pg_upgrade will reject the upgrade, because the 
checksum settings don't match.  The workaround is to run initdb with 
--no-data-checksums and try again.
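
Concretely, the workaround amounts to something like (paths are
illustrative):

    initdb --no-data-checksums -D /path/to/new/cluster
    pg_upgrade -b /usr/lib/postgresql/17/bin -B /usr/lib/postgresql/18/bin \
               -d /path/to/old/cluster -D /path/to/new/cluster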


That's probably not all that bad, but if this is all below a bunch of 
layers of scripts, users will have to do some work on their end to get 
this working smoothly.


The point of the open item was (a) to make sure this is adequately 
documented, for instance in the release notes, (b) to think about 
technological solutions to simplify this, such as [0], and (c) to just 
check the general feedback.


Nothing from [0] ended up being committed, so that part is obsolete.
The action for beta1 is (a).  And then for (c), perhaps monitor the
feedback between beta1 and beta2.



[0]: 
https://www.postgresql.org/message-id/flat/57957aca-3eae-4106-afb2-3008122b9950%40eisentraut.org






Re: Disallow redundant indexes

2025-04-24 Thread Greg Sabino Mullane
On Thu, Apr 24, 2025 at 7:31 AM David Rowley  wrote:

> On Thu, 24 Apr 2025 at 21:27, Japin Li  wrote:
> > I propose that PostgreSQL prevent redundant index creation by:



> In any case, who are we to define what a duplicate index is?


I think this part is easier than you make it sound: everything (except the
name) is exactly the same as an existing index. That's the 99% case we are
trying to catch here.

I've had this idea before, and even wrote a quick POC at one point, but I
had it simply throw a warning rather than an error. That avoids the need
for any GUC, which I agree is not a good idea. And it still allows people
to create a duplicate index if they really want to.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: What's our minimum supported Python version?

2025-04-24 Thread Jelte Fennema-Nio
On Thu, 24 Apr 2025 at 10:54, Peter Eisentraut  wrote:
> The cut-off in practice for these things is usually RHEL.  PG18
> currently still supports RHEL 7, which appears to come with Python 3.6.
> Seeing that the small problem with the test script was easily fixed, I
> think we should stick with that for now.  There might be value in
> dropping support for RHEL 7 sometime, but that should be done across the
> board (meson version, openssl version, perl version, etc.), and requires
> some buildfarm changes, so I wouldn't do this right now.

RHEL7 is not supported by Devrim for PG16 and PG17 already[1]. Did you
mean something different with "PG18 currently still supports RHEL 7"?

[1]: https://yum.postgresql.org/news/rhel7-postgresql-rpms-end-of-life/




Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-24 Thread Peter Eisentraut

On 24.04.25 12:53, Christoph Berg wrote:

Now you can say `psql -h db.example.com -p 5433 dbfoo`, but for
specifying the sslmode, you have to rewrite at least the last argument
to use connection string syntax, `psql "dbname=dbfoo sslmode=verify-full`.
This needs be be less cumbersome. (And the names of the options make
me want to stay away from them, require/verify-ca/verify-full/verify-confusing.
Your sslmode=secure idea is really good.)


I'm generally in favor of making sslmode=verify-full the effective 
default somehow.


Another detail to think about is how this affects psql -h localhost.  In 
principle, this should require full SSL, but you're probably not going 
to have certificates that allow "localhost".  And connections to 
localhost are the default on Windows.  We could also switch the Windows 
default to Unix-domain sockets.  But there are probably still other 
reasons why connections to TCP/IP localhost are made.  Some things to 
think about.






Re: Disallow redundant indexes

2025-04-24 Thread Japin Li


Hi, Greg and David

Thank you for your feedback.

On Thu, 24 Apr 2025 at 08:26, Greg Sabino Mullane  wrote:
> On Thu, Apr 24, 2025 at 7:31 AM David Rowley  wrote:
>
>  On Thu, 24 Apr 2025 at 21:27, Japin Li  wrote:
>  > I propose that PostgreSQL prevent redundant index creation by:
>
>  
>  In any case, who are we to define what a duplicate index is?
>

You're absolutely right. Defining a duplicate index is indeed simpler than
I initially described.

> I think this part is easier than you make it sound: everything (except the 
> name) is exactly the same as an existing
> index. That's the 99% case we are trying to catch here.
>

As Greg pointed out, if everything except the name is identical to an existing
index, it should be considered a duplicate in most cases (the 99% case).
This is precisely the scenario I'm aiming to prevent.

> I've had this idea before, and even wrote a quick POC at one point, but I had 
> it simply throw a warning rather than an
> error. That avoids the need for any GUC, which I agree is not a good idea. 
> And it still allows people to create a
> duplicate index if they really want to.
>

I also appreciate your suggestion regarding the GUC parameter.  You've
convinced me that a warning might be a more appropriate approach.  A warning
would still alert users to the potential issue of creating a redundant index,
while allowing them to proceed if they have a specific reason to do so.

-- 
Regards,
Japin Li




Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2025-04-24 Thread Tender Wang
Tender Wang  wrote on Mon, 14 Apr 2025 at 14:17:

> Hi,
>
> While debugging the hash join code in estimate_multivariate_bucketsize(), I
> found that
> the list_copy(hashclauses) below is unnecessary if we have a single join
> clause.
>
> List   *clauses = list_copy(hashclauses);
> ...
>
> I adjusted the placement of the list_copy() call as in the attached patch.
> This can save some overhead from function calls and memory copies.
>
> Any thoughts?
>
>
Hi Alexander,

In the last thread, I found a minor optimization for the code in
estimate_multivariate_bucketsize().
By adjusting the placement of the list_copy() call at the start of
estimate_multivariate_bucketsize(), we can avoid unnecessarily creating a
new list and copying memory when we have only a single hash clause.

Do you think it's worth doing this?
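
To make the idea concrete, the pattern is roughly the following (a sketch
only; the attached patch is authoritative):

    List   *clauses = hashclauses;

    /*
     * With a single hash clause there is nothing for multivariate
     * statistics to group, so the defensive copy (which only exists so
     * that cells can be deleted from the list later) can be skipped.
     */
    if (list_length(hashclauses) > 1)
        clauses = list_copy(hashclauses);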
-- 
Thanks,
Tender Wang


Re: Showing applied extended statistics in explain Part 2

2025-04-24 Thread Andrei Lepikhov

On 2/10/25 16:56, Tomas Vondra wrote:

On 2/10/25 10:09, Andrei Lepikhov wrote:

On 8/2/2025 20:50, Tomas Vondra wrote:


The main profit here - you see all the stats involved in estimations
(and their state), even if final plan doesn't contain estimated stuff at
all.


OK, that seems very underwhelming. I still think we should show which
clauses were estimated using which statistics object.
To understand how it may work, I employed the EXPLAIN extensibility 
introduced in PG 18 to show the use of plain statistics [1]. It looks 
like the following:


EXPLAIN (COSTS OFF, STAT ON)
SELECT * FROM sc_a WHERE x=1 AND y LIKE 'a';

 Seq Scan on sc_a
   Filter: ((y ~~ 'a'::text) AND (x = 1))
 Statistics:
   "sc_a.y: 1 times, stats: { MCV: 10 values, Correlation,
ndistinct: 10., nullfrac: 0., width: 5 }
   "sc_a.x: 1 times, stats: { Histogram: 0 values, Correlation,
 ndistinct: -1., nullfrac: 0., width: 4 }

As you can see, stat usage is summarised at the end of the EXPLAIN. It
contains information about the column, how many times it was used and
the parameters of the statistics slots.
Of course, being an extension it is constrained a lot, but even so there
is profit:

1. You can see which types of statistics exist on the column.
2. You can see the precision of the histogram or MCV (statistics target) on a
specific table - some users forget to increase it on large (or partitioned)
tables.
3. You get basic stats like nullfrac and ndistinct without having to
teach personnel how to gather them safely on a production instance.


Also, using it in real cases, I realised that it would be highly 
profitable to specify which statistic type was used to estimate this 
specific clause.


Of course, extended statistics have their own specifics, which may 
require another output format. Just consider this example too.


[1] https://github.com/danolivo/pg_index_stats

--
regards, Andrei Lepikhov




Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-04-24 Thread Vitaly Davydov
Hi Alexander,

Thank you for the review. I apologize for the late reply. I missed your email.

> 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need
> to advance the position of WAL needed by replication slots, the usage
> pattern probably could be changed.  Thus, we probably need to call
> ReplicationSlotsComputeRequiredLSN() somewhere after change of
> restart_lsn_flushed while restart_lsn is not changed.  And probably
> can skip ReplicationSlotsComputeRequiredLSN() in some cases when only
> restart_lsn is changed.

Yes, it is a good idea for investigation, thank you! I guess it may work for
persistent slots, but I'm not sure about other types of slots (ephemeral and
temporary). I have no clear understanding of the consequences at the moment. I
propose to postpone it for the future, because the proposed changes would be
more invasive.

> 2) I think it's essential to include in the patch test cases which
> fail without the patch.  You could start by integrating the [1] test into
> your patch, and then add more similar tests for different situations.

The problem with TAP tests is that the issue is hard to reproduce without
injection points. Tomas Vondra's tests create two new injection points, and I
would have to add more injection points for new tests as well. Injection points
help to test the code but make it unreadable. I'm not sure what the policy
on creating injection points is. Personally, I would not like to add new
injection points only to check this particular rare case. I'm trying to create
a test without injection points that fails occasionally, but I haven't
succeeded so far.

I have a question - is there any interest in backporting the solution to the
existing major releases? I can prepare a patch where restart_lsn_flushed is
stored outside of the ReplicationSlot structure and doesn't affect the existing API.

With best regards,
Vitaly

On Friday, April 04, 2025 06:22 MSK, Alexander Korotkov  
wrote:

> Hi, Vitaly!
> 
> On Mon, Mar 3, 2025 at 5:12 PM Vitaly Davydov  
> wrote:
> > The slot data is flushed to the disk at the beginning of checkpoint. If
> > an existing slot is advanced in the middle of checkpoint execution, its
> > advanced restart LSN is taken to calculate the oldest LSN for WAL
> > segments removal at the end of checkpoint. If the node is restarted just
> > after the checkpoint, the slots data will be read from the disk at
> > recovery with the oldest restart LSN which can refer to removed WAL
> > segments.
> >
> > The patch introduces a new in-memory state for slots -
> > flushed_restart_lsn which is used to calculate the oldest LSN for WAL
> > segments removal. This state is updated every time with the current
> > restart_lsn at the moment, when the slot is saving to disk.
> 
> Thank you for your work on this subject.   I think generally your
> approach is correct.  When we're truncating the WAL log, we need to
> rely on the position that would be used in the case of a server crash.
> That is the position flushed to the disk.
> 
> While your patch generally looks good, I'd like to make the following notes:
> 
> 1) As ReplicationSlotsComputeRequiredLSN() is called each time we need
> to advance the position of WAL needed by replication slots, the usage
> pattern probably could be changed.  Thus, we probably need to call
> ReplicationSlotsComputeRequiredLSN() somewhere after change of
> restart_lsn_flushed while restart_lsn is not changed.  And probably
> can skip ReplicationSlotsComputeRequiredLSN() in some cases when only
> restart_lsn is changed.
> 2) I think it's essential to include in the patch test cases which
> fail without the patch.  You could start by integrating the [1] test into
> your patch, and then add more similar tests for different situations.
> 
> Links.
> 1.  
> https://www.postgresql.org/message-id/e3ac0535-e7a2-4a96-9b36-9f765e9cfec5%40vondra.me
> 
> --
> Regards,
> Alexander Korotkov
> Supabase





Re: What's our minimum supported Python version?

2025-04-24 Thread Peter Eisentraut

On 24.04.25 13:16, Jelte Fennema-Nio wrote:

On Thu, 24 Apr 2025 at 10:54, Peter Eisentraut  wrote:

The cut-off in practice for these things is usually RHEL.  PG18
currently still supports RHEL 7, which appears to come with Python 3.6.
Seeing that the small problem with the test script was easily fixed, I
think we should stick with that for now.  There might be value in
dropping support for RHEL 7 sometime, but that should be done across the
board (meson version, openssl version, perl version, etc.), and requires
some buildfarm changes, so I wouldn't do this right now.


RHEL7 is not supported by Devrim for PG16 and PG17 already[1]. Did you
mean something different with "PG18 currently still supports RHEL 7"?

[1]: https://yum.postgresql.org/news/rhel7-postgresql-rpms-end-of-life/


There are approximately 6 buildfarm members with RHEL 7 or CentOS 7, so 
in that sense, "support" means that everything is expected to work 
there.  And some amount of work has been done recently to keep that 
support alive.


(But I'm confused now because I see Python 3.6 on both the RHEL 7 and
the RHEL 8 animals.  So it's not clear how representative these animals
are, especially with respect to the Python versions.)





Postmaster fails to shut down right after crash restart

2025-04-24 Thread Sergey Shinderuk

Hello,

While developing a patch and running regression tests I noticed that the 
postmaster could fail to shut down right after crash restart. It could 
get stuck in the PM_WAIT_BACKENDS state forever.


As far as I understand, the problem occurs when a shutdown signal is 
received before getting PMSIGNAL_RECOVERY_STARTED from the startup 
process. In that case the FatalError flag is not cleared, and the 
postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer, 
which ignores SIGTERM.


To easily reproduce the problem I added pg_usleep in xlogrecovery.c just 
before SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED). See the patch 
attached.


Then I run a script that simulates a crash and does pg_ctl stop:

$ ./init.sh
[...]

$ ./stop-after-crash.sh
waiting for server to start done
server started
waiting for server to shut down... failed

pg_ctl: server does not shut down


Some processes are still alive:

$ ps uf -C postgres
USER PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
sergey279874  0.0  0.0 222816 28560 ?Ss   14:25   0:00 
/home/sergey/pgwork/devel/install/bin/postgres -D data
sergey279887  0.0  0.0 222772  5664 ?Ss   14:25   0:00  \_ 
postgres: io worker 0
sergey279888  0.0  0.0 222772  5664 ?Ss   14:25   0:00  \_ 
postgres: io worker 1
sergey279889  0.0  0.0 222772  5664 ?Ss   14:25   0:00  \_ 
postgres: io worker 2
sergey279891  0.0  0.0 222884  8480 ?Ss   14:25   0:00  \_ 
postgres: checkpointer



Here is an excerpt from the debug log:

postmaster[279874] LOG:  all server processes terminated; reinitializing
startup[279890] LOG:  database system was interrupted; last known up at 
2025-04-24 14:25:58 MSK
startup[279890] LOG:  database system was not properly shut down; 
automatic recovery in progress


postmaster[279874] DEBUG:  postmaster received shutdown request signal
postmaster[279874] LOG:  received fast shutdown request
postmaster[279874] DEBUG:  updating PMState from PM_STARTUP to 
PM_STOP_BACKENDS
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to background 
writer process with pid 279892
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to checkpointer 
process with pid 279891
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to startup process 
with pid 279890
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker 
process with pid 279889
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker 
process with pid 279888
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker 
process with pid 279887
postmaster[279874] DEBUG:  updating PMState from PM_STOP_BACKENDS to 
PM_WAIT_BACKENDS


startup[279890] LOG:  invalid record length at 0/175A4D8: expected at 
least 24, got 0

postmaster[279874] DEBUG:  postmaster received pmsignal signal
startup[279890] LOG:  redo is not required

checkpointer[279891] LOG:  checkpoint starting: end-of-recovery 
immediate wait
checkpointer[279891] LOG:  checkpoint complete: wrote 0 buffers (0.0%), 
wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; 
write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 
s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo 
lsn=0/175A4D8


startup[279890] DEBUG:  exit(0)
postmaster[279874] DEBUG:  updating PMState from PM_WAIT_BACKENDS to 
PM_WAIT_BACKENDS


checkpointer[279891] DEBUG:  checkpoint skipped because system is idle
checkpointer[279891] DEBUG:  checkpoint skipped because system is idle


I don't know how to fix this, but thought it's worth reporting.

Best regards,

--
Sergey Shinderuk  https://postgrespro.com/

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 6ce979f2d8b..19ee8b09685 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1696,7 +1696,10 @@ PerformWalRecovery(void)
 	 * archiver if necessary.
 	 */
 	if (IsUnderPostmaster)
+	{
+		pg_usleep(300);
 		SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
+	}
 
 	/*
 	 * Allow read-only connections immediately if we're consistent already.


init.sh
Description: application/shellscript


stop-after-crash.sh
Description: application/shellscript


logfile.gz
Description: application/gzip


Re: pg_upgrade-breaking release

2025-04-24 Thread Bruce Momjian
On Thu, Apr 24, 2025 at 08:35:10AM -0400, Greg Sabino Mullane wrote:
> On Thu, Apr 24, 2025 at 8:12 AM Bruce Momjian  wrote:
> 
> Do we think most people are _not_ going to use pg_upgrade now that we
> are defaulting to checksums being enabled by default in PG 18?
> 
> 
> I cannot imagine this would stop anyone from upgrading. It's one additional
> flag, which was already a requirement for those going the other direction in
> the past (checksums -> no checksums). And I also assume that "most people" are
> already running with checksums enabled.
> 
> 
>   And if so, do we think we are ever going to have a
> storage-format-changing release where pg_upgrade cannot be used?
> 
> 
> Seems very unlikely, that would kind of go against the whole purpose of
> pg_upgrade.

When I wrote pg_upgrade, I assumed at some point the value of changing
the storage format would outweigh the value of allowing in-place
upgrades.  I guess that hasn't happened yet.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-04-24 Thread Amit Kapila
On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada  wrote:
>
> On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila  wrote:
> >
> > BTW, did we consider the idea to automatically transition to 'logical'
> > when the first logical slot is created and transition back to
> > 'replica' when last logical slot gets dropped? I see some ideas around
> > this last time we discussed this topic.
>
> Yes. Bertrand pointed out that a drawback is that the primary server
> needs to create a logical slot in order to execute logical decoding on
> the standbys[1].
>

True, but if we want to avoid that, we can still keep 'logical' as a
wal_level setting for the ease of users. We can also have another API
like the one you originally proposed (pg_activate_logical_decoding).
But the minimum requirement would be that one creates a
logical slot to enable logical decoding/replication.

Additionally, shall we do some benchmarking, if not done already, to
show the cases where the performance and WAL volume can hurt users if
we make wal_level 'logical'?

-- 
With Regards,
Amit Kapila.




Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

2025-04-24 Thread Christoph Berg
Re: Nathan Bossart
> Here is what I have staged for commit.  I'll aim to commit these patches
> sometime next week to give time for additional feedback.

I confirm my PG13 test table gets analyzed now with the patch.

Christoph




Re: pg_upgrade-breaking release

2025-04-24 Thread Álvaro Herrera
On 2025-Apr-24, Bruce Momjian wrote:

> Do we think most people are _not_ going to use pg_upgrade now that we
> are defaulting to checksums being enabled by default in PG 18?  And if
> so, do we think we are ever going to have a storage-format-changing
> release where pg_upgrade cannot be used?

Peter E posted a patch that allowed pg_upgrade to migrate (rewrite)
files from non-checksummed to checksummed, but he appears to have given
up on it for this release given an apparent lack of interest.
https://postgr.es/m/57957aca-3eae-4106-afb2-3008122b9...@eisentraut.org

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: pg_upgrade-breaking release

2025-04-24 Thread Bruce Momjian
On Thu, Apr 24, 2025 at 04:51:08PM +0200, Álvaro Herrera wrote:
> On 2025-Apr-24, Bruce Momjian wrote:
> 
> > Do we think most people are _not_ going to use pg_upgrade now that we
> > are defaulting to checksums being enabled by default in PG 18?  And if
> > so, do we think we are ever going to have a storage-format-changing
> > release where pg_upgrade cannot be used?
> 
> Peter E posted a patch that allowed pg_upgrade to migrate (rewrite)
> files from non-checksummed to checksummed, but he appears to have given
> up on it for this release given an apparent lack of interest.
> https://postgr.es/m/57957aca-3eae-4106-afb2-3008122b9...@eisentraut.org

Yeah, I saw that, and I think we could do something similar for TDE if
we ever had it.  I think we are suggesting people just do offline
addition of checksums rather than trying to do something fancy with
pg_upgrade.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: [PATCH] dynahash: add memory allocation failure check

2025-04-24 Thread Andrey Borodin



> On 24 Apr 2025, at 00:42, Tom Lane  wrote:
> 
> - hashp = (HTAB *) DynaHashAlloc(sizeof(HTAB) + strlen(tabname) + 1);
> + hashp = (HTAB *) MemoryContextAlloc(CurrentDynaHashCxt,
> + sizeof(HTAB) + strlen(tabname) + 1);

This seems correct to me.

While fixing this, maybe use MemoryContextAllocZero() instead of the subsequent
MemSet()?

But this might start a loop of unnecessary beautifications, like DynaHashAlloc()
calling Assert(MemoryContextIsValid(CurrentDynaHashCxt)) just before
MemoryContextAllocExtended() repeats the same exercise.


Best regards, Andrey Borodin.



Re: pg_upgrade-breaking release

2025-04-24 Thread Noah Misch
On Thu, Apr 24, 2025 at 08:37:56AM -0400, Bruce Momjian wrote:
> On Thu, Apr 24, 2025 at 08:35:10AM -0400, Greg Sabino Mullane wrote:
> > On Thu, Apr 24, 2025 at 8:12 AM Bruce Momjian  wrote:
> > 
> > Do we think most people are _not_ going to use pg_upgrade now that we
> > are defaulting to checksums being enabled by default in PG 18?
> > 
> > 
> > I cannot imagine this would stop anyone from upgrading. It's one additional
> > flag

Yeah.  We've had this before, with integer datetimes and others.  People will
notice the friction, but v18 won't be a shock in this respect.

> >   And if so, do we think we are ever going to have a
> > storage-format-changing release where pg_upgrade cannot be used?
> > 
> > 
> > Seems very unlikely, that would kind of go against the whole purpose of
> > pg_upgrade.
> 
> When I wrote pg_upgrade, I assumed at some point the value of changing
> the storage format would outweigh the value of allowing in-place
> upgrades.  I guess that hasn't happened yet.

Having pg_upgrade has made PostgreSQL popular for applications with high
availability and high data volumes, applications that would have ruled out
PostgreSQL before pg_upgrade.  In other words, adding pg_upgrade changed the
expectations of PostgreSQL.  A storage format break is less plausible now than
it was in the early years of having pg_upgrade.

I think a prerequisite for a storage format break would be a foolproof upgrade
recipe based on logical replication techniques.  The step having downtime
would need to be no slower than pg_upgrade.  Even with that, the double
storage requirement isn't negligible.  Hence, it wouldn't be a given that we'd
regard the storage format break as sufficiently-valuable.




Re: [PATCH] dynahash: add memory allocation failure check

2025-04-24 Thread Tom Lane
Andrey Borodin  writes:
> While fixing this maybe use MemoryContextAllocZero() instead of subsequent 
> MemSet()?

I thought about that but intentionally left it as-is, because that
would force zeroing of the space reserved for the hashtable name too.
That's unnecessary, and since it'd often be odd-sized it might result
in a less efficient fill loop.

regards, tom lane




Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-04-24 Thread Masahiko Sawada
On Thu, Apr 24, 2025 at 5:30 AM Amit Kapila  wrote:
>
> On Wed, Apr 23, 2025 at 9:35 PM Masahiko Sawada  wrote:
> >
> > On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila  wrote:
> > >
> > > BTW, did we consider the idea to automatically transition to 'logical'
> > > when the first logical slot is created and transition back to
> > > 'replica' when last logical slot gets dropped? I see some ideas around
> > > this last time we discussed this topic.
> >
> > Yes. Bertrand pointed out that a drawback is that the primary server
> > needs to create a logical slot in order to execute logical decoding on
> > the standbys[1].
> >
>
> True, but if we want to avoid that, we can still keep 'logical' as
> wal_level for the ease of users.

I think we'd like to cover the use case where users start with
'replica' on the primary and execute logical decoding on the standby
without either creating a logical slot on the primary or restarting
the primary.

> We can also have another API like the
> one you originally proposed (pg_activate_logical_decoding) for the
> ease of users. But the minimum requirement would be that one creates a
> logical slot to enable logical decoding/replication.

I think we want to avoid the runtime WAL level being automatically decreased
to 'replica' once all logical slots are removed, if users still want
to execute logical decoding only on the standby. One idea is that if
users enable logical decoding using pg_activate_logical_decoding(),
the runtime WAL level doesn't decrease to 'replica' even if all
logical slots are removed. But it would require us to remember, in a
permanent way, how logical decoding was enabled. Also, I'm
concerned that having three ways to enable logical decoding could
confuse users: the wal_level GUC parameter, creating at least one logical
slot, and pg_activate_logical_decoding().

> Additionally, shall we do some benchmarking, if not done already, to
> show the cases where the performance and WAL volume can hurt users if
> we make wal_level as 'logical'?

I believe it would be significant, especially for REPLICA IDENTITY FULL
tables. I agree it's worth benchmarking, but I guess the result
would not convince us to make 'logical' the default.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Making sslrootcert=system work on Windows psql

2025-04-24 Thread Jacob Champion
On Wed, Apr 23, 2025 at 8:47 AM George MacKerron  wrote:
> I’d suggest two new special sslrootcert values:
>
> (1) sslrootcert=openssl
>
> This does exactly what sslrootcert=system does now, but is less confusingly 
> named for Windows users. sslrootcert=system becomes a deprecated synonym for 
> this option.

Stealing the word "system" from the existing sslrootcert domain had at
least two hazards: a) existing users might have a file named "system"
that would now be ignored, and b) users might accidentally use
sslrootcert=system on older versions of libpq, picking up an
unexpected file named "system" and doing the Wrong Thing. Problem (a)
can be worked around by saying "./system" instead, so honestly I
wasn't too concerned about that, and I considered (b) to be more of a
theoretical problem that was outweighed by the benefit of getting
OpenSSL to just Do The Thing people wanted it to do.

A couple years on, I think (b) is less theoretical than I had
originally hoped. As evidence I point to Stack Overflow questions like
[1], where both the asker and the answerer are a bit confused about
how connection string versioning works. If we steal more words, I
think that problem is going to get worse. So I'm leaning toward
Daniel's earlier position that sslrootcert has kind of run its course,
and if you want to select OpenSSL stores, we need a more fully
featured syntax and probably a completely new option to be able to
pass that through safely.

> (2) sslrootcert=os
>
> This does what I was proposing in my patch: it uses winstore on Windows and 
> behaves the same as sslrootcert=openssl elsewhere, where openssl *is* the 
> operating system SSL provider.

Falling back to standard OpenSSL introduces the same hazard we're
running into today, though -- what if someone creates a macstore [2]
for OpenSSL, so that its behavior matches Safari's or whatever, and
then everyone wonders why sslrootcert=os doesn't use that? If the
abstraction must leak the details anyway, I think we should expose
them directly instead.

(As a small soapbox, I think "application-level" fallback for a trust
chain is frequently going to lead to regret. You should ideally tell
us what you want, and either get it or fail.)

Thanks,
--Jacob

[1] 
https://stackoverflow.com/questions/77989772/psql-root-certificate-file-system-does-not-exist-why-sslrootcert-system-do
[2] https://github.com/openssl/openssl/issues/23460




Re: Fix slot synchronization with two_phase decoding enabled

2025-04-24 Thread Masahiko Sawada
On Thu, Apr 24, 2025 at 2:24 AM Nisha Moond  wrote:
>
> On Thu, Apr 24, 2025 at 12:28 PM Amit Kapila  wrote:
> >
> > On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
> > > >  wrote:
> > > > >
> > > > > On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:
> > > > > >
> > > > > > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Approach 2
> > > > > > > --
> > > > > > >
> > > > > > > Instead of disallowing the use of two-phase and failover 
> > > > > > > together, a more
> > > > > > > flexible strategy could be only restrict failover for slots with 
> > > > > > > two-phase
> > > > > > > enabled when there's a possibility of existing prepared 
> > > > > > > transactions before
> > > > > > the
> > > > > > > two_phase_at that are not yet replicated. During slot creation 
> > > > > > > with
> > > > > > two-phase
> > > > > > > and failover, we could check for any decoded prepared 
> > > > > > > transactions when
> > > > > > > determining the decoding start point 
> > > > > > > (DecodingContextFindStartpoint). For
> > > > > > > subsequent attempts to alter failover to true, we ensure that 
> > > > > > > two_phase_at is
> > > > > > > less than restart_lsn, indicating that all prepared transactions 
> > > > > > > have been
> > > > > > > committed and replicated, thus the bug would not happen.
> > > > > > >
> > > > > > > pros:
> > > > > > >
> > > > > > > This method minimizes restrictions for users. Especially during 
> > > > > > > slot creation
> > > > > > > with (two_phase=on, failover=on), as it’s uncommon for 
> > > > > > > transactions to
> > > > > > prepare
> > > > > > > during consistent snapshot creation, the restriction becomes 
> > > > > > > almost
> > > > > > > unnoticeable.
> > > > > >
> > > > > > I think this approach can work for the transactions that are 
> > > > > > prepared
> > > > > > while the slot is created. But if I understand the problem 
> > > > > > correctly,
> > > > > > while the initial table sync is performing, the slot's two_phase is
> > > > > > still false, so we need to deal with the transactions that are
> > > > > > prepared during the initial table sync too. What do you think?
> > > > > >
> > > > >
> > > > > Yes, I agree that we need to restrict this case too. Given that we 
> > > > > haven't
> > > > > started decoding when setting two_phase=true during 
> > > > > CreateDecodingContext()
> > > > > after tablesync, we could check prepared transactions afterwards 
> > > > > during
> > > > > decoding. This could involve reporting an ERROR when skipping a 
> > > > > prepared
> > > > > transaction during decoding if its prepare LSN is less than 
> > > > > two_phase_at.
> > > > >
> > > >
> > > > It will make it difficult for users to detect it as this happens at a
> > > > later point of time.
> > > >
> > > > > Alternatively, a simpler method would be to prevent this situation 
> > > > > entirely
> > > > > during the CREATE SUBSCRIPTION command. For example, we could 
> > > > > restrict slots
> > > > > created with failover set to true and twophase is later modified to 
> > > > > true after
> > > > > tablesync. Although the simpler check is more user-visible, it may 
> > > > > offer less
> > > > > flexibility.
> > > > >
> > > >
> > > > I agree with your point, but OTOH, I am also afraid of adding too many
> > > > smart checks in the back-branch. If we follow what you say here, then
> > > > users have the following ways in PG17 to enable both failover and
> > > > two_phase. (a) During Create Subscription, users can set both
> > > > 'failover' and 'two_phase', if 'copy_data' is false, or (b), if
> > > > 'copy_data' is true, during Create Subscription, then users can enable
> > > > 'two_phase' and wait for it to be enabled. Then use Alter Subscription
> > > > to set 'failover'.
> > >
> > > Yet another idea would be to disallow enabling both two_phase and
> > > failover at CREATE SUBSCRIPTION regardless of copy_data value and to
> > > add check when enabling failover for the two_phase-enabled-slots. For
> > > example, users who want to enable both need to do two steps:
> > >
> > > 1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
> > > 2. ALTER SUBSCRIPTION with failover = true.
> > >
> > > At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
> > > the two_phase states is ready (and possibly check if the slot's
> > > two_phase has been enabled too), otherwise raises an ERROR. Then, when
> > > the publisher enables the failover for the two_phase-enabled-slot up
> > > on walrcv_alter_slot() request, it checks the slot's restart_lsn has
> > > passed slot's two_phase_at, otherwise raise an error with the message
> > > like "the slot need to consume change upto %X/%X to enable failover".
> > >
> >
> > This should further simplify the checks with an additional restriction
> > during the CREATE SUBSCRIPTION time. I am in favor of it because I
> > want the code in this area to be as much same in HEAD and backbranches
> > as possible.

Cancel problems of query to pg_stat_statements

2025-04-24 Thread Roman Khapov
Hi! Recently we faced a problem in our production PostgreSQL installation: we had to
cancel all requests to the db, including the performance monitoring requests that use
pg_stat_statements. But we could not cancel the request in the usual way, and had to
kill -9 its backend process. We noticed that the query execution was stuck reading the
PGSS_TEXT_FILE file in the function qtext_load_file, which doesn't have
CHECK_FOR_INTERRUPTS in its read cycle. In addition to our case with a large
PGSS_TEXT_FILE (and possibly problems with virtual disk I/O), this can explain
uncancellable pg_stat_statements queries.

Also, the read block size can be reduced from 1GB to 32MB in order to increase the
frequency of CHECK_FOR_INTERRUPTS calls without degrading qtext_load_file performance.
To check that, I did some little testing with fio like:

fio --name=readtest --filename=./random-bytes-file --rw=read --bs=32m --size=10G --ioengine=libaio --direct=1 --numjobs=1

So, I made a simple patch that adds a CHECK_FOR_INTERRUPTS call in the read cycle of
qtext_load_file and changes the max value of toread from 1GB to 32MB. I would
appreciate your feedback on these changes.

From eab041fcd1f4973b03f6bebd37bef3395fe64b4b Mon Sep 17 00:00:00 2001
From: rkhapov 
Date: Thu, 24 Apr 2025 17:01:54 +
Subject: [PATCH] pg_stat_statements.c: cancelable qtext_load_file

In the case of a large PGSS_TEXT_FILE, the work time of the qtext_load_file
function will be quite long, and the query to the pg_stat_statements table
will not be cancellable, as there is no CHECK_FOR_INTERRUPT in the function.

Also, the amount of bytes read can reach 1 GB, which leads to a slow read
system call that does not allow cancellation of the query. Testing the speed
of sequential read using fio with different block sizes shows that there is
no significant difference between 16 MB blocks and 1 GB blocks.

Therefore, this patch changes the maximum read value from 1 GB to 32 MB and
adds CHECK_FOR_INTERRUPTS in the read loop of qtext_load_file to make it cancellable.

Signed-off-by: rkhapov 
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..cd34f1ce248 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2365,7 +2365,9 @@ qtext_load_file(Size *buffer_size)
 	nread = 0;
 	while (nread < stat.st_size)
 	{
-		int			toread = Min(1024 * 1024 * 1024, stat.st_size - nread);
+		int			toread = Min(32 * 1024 * 1024, stat.st_size - nread);
+
+		CHECK_FOR_INTERRUPTS();
 
 		/*
 		 * If we get a short read and errno doesn't get set, the reason is
-- 
2.43.0




Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-24 Thread Jacob Champion
On Thu, Apr 24, 2025 at 5:00 AM Peter Eisentraut  wrote:
> I'm generally in favor of making sslmode=verify-full the effective
> default somehow.

+many

On Thu, Apr 24, 2025 at 3:53 AM Christoph Berg  wrote:
> For
> postgresql://-style strings, we would ideally have something like http://
> vs https://, but I am not sure how to squeeze that into the syntax.

Not to derail things too much, but I'd also like a postgress://
scheme, and I've put a little bit of idle thought into it. I think
we'd want it to imply sslnegotiation=direct and sslrootcert=system
(modulo the Windows discussion already in progress), and potentially
make a bunch of stricter decisions about TLS settings to better match
modern practice. The intent would be to have a "browser-strength"
scheme for people who care more about security than about raw
compatibility with older systems, because they're connecting to
someone else's servers on the open Web.

The hardest part, in my opinion, is that we'd have to start following
the RFC concept of "authority". A URL of
"postgress://example.com/db?host=evil.com&hostaddr=..." is outright
dangerous, as is "postgress://example.com/db?sslmode=disable". So if
there's interest in that scheme, I think it should remain a separate
feature from "verify-full by default", because there's a lot more to
figure out.

--Jacob




Re: Available disk space per tablespace

2025-04-24 Thread said assemlal

Hi,

I also tested the patch on Linux mint 22.1 with the btrfs and ext4 
partitions. I generated some data and the outcome looks good:


postgres=# \db+
                                             List of tablespaces
       Name       |  Owner   |         Location          | Access privileges | Options |  Size   |  Free   | Description
------------------+----------+---------------------------+-------------------+---------+---------+---------+-------------
 pg_default       | postgres |                           |                   |         | 1972 MB | 29 GB   |
 pg_global        | postgres |                           |                   |         | 556 kB  | 29 GB   |
 tablespace_test2 | postgres | /media/said/queryme/pgsql |                   |         | 3147 MB | 1736 GB |



Numbers are the same as if I were executing the command: df -h

tablespace_test2 was the ext4 partition on usb stick.

Numbers are correct.

Said

On 2025-03-13 14 h 10, Christoph Berg wrote:

Hi,

I'm picking up a 5 year old patch again:
https://www.postgresql.org/message-id/flat/20191108132419.GG8017%40msg.df7cb.de

Users will be interested in knowing how much extra data they can load
into a database, but PG currently does not expose that number. This
patch introduces a new function pg_tablespace_avail() that takes a
tablespace name or oid, and returns the number of bytes "available"
there. This is the number without any reserved blocks (Unix, f_avail)
or available to the current user (Windows).

(This is not meant to replace a full-fledged OS monitoring system that
has much more numbers about disks and everything, it is filling a UX
gap.)

Compared to the last patch, this just returns a single number so it's
easier to use - total space isn't all that interesting, we just return
the number the user wants.

The free space is included in \db+ output:

postgres=# \db+
                                  List of tablespaces
    Name    │ Owner │ Location │ Access privileges │ Options │  Size   │  Free  │ Description
────────────┼───────┼──────────┼───────────────────┼─────────┼─────────┼────────┼─────────────
 pg_default │ myon  │          │ ∅                 │ ∅       │ 23 MB   │ 538 GB │ ∅
 pg_global  │ myon  │          │ ∅                 │ ∅       │ 556 kB  │ 538 GB │ ∅
 spc        │ myon  │ /tmp/spc │ ∅                 │ ∅       │ 0 bytes │ 31 GB  │ ∅
(3 rows)

The patch has also been tested on Windows.
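For scripting, the same number can be queried directly. A minimal usage sketch
(pg_tablespace_avail() is the function proposed here; pg_tablespace_size() already
exists):

SELECT spcname,
       pg_size_pretty(pg_tablespace_size(oid))  AS used,
       pg_size_pretty(pg_tablespace_avail(oid)) AS free
  FROM pg_tablespace;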

TODO: Figure out which systems need statfs() vs statvfs()

Christoph




Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX

2025-04-24 Thread jian he
hi, two more minor issues.

src/bin/pg_dump/pg_dump.c
if (fout->remoteVersion >= 18)
need change to
if (fout->remoteVersion >= 19)


+-- Test index visibility with partitioned tables
+CREATE TABLE part_tbl(id int, data text) PARTITION BY RANGE(id);
+CREATE TABLE part1 PARTITION OF part_tbl FOR VALUES FROM (1) TO (100);
+CREATE TABLE part2 PARTITION OF part_tbl FOR VALUES FROM (100) TO (200);
+INSERT INTO part_tbl
+SELECT g, 'data ' || g
+FROM generate_series(1, 199) g;
+CREATE INDEX idx_part_tbl ON part_tbl(data);
+SELECT c.relname, i.indisvisible
+FROM pg_index i
+JOIN pg_class c ON i.indexrelid = c.oid
+WHERE c.relname LIKE 'idx_part_tbl%'
+ORDER BY c.relname;
+   relname| indisvisible
+--+--
+ idx_part_tbl | t
+(1 row)
+

This test doesn't seem sufficient: "idx_part_tbl" is the partitioned (parent)
index, so shouldn't we also show the pg_index.indisvisible value of each
partition's index?

we can change it to

CREATE TABLE part_tbl(id int, data text) PARTITION BY RANGE(id);
CREATE TABLE part_tbl_1 PARTITION OF part_tbl FOR VALUES FROM (1) TO (100);
CREATE TABLE part_tbl_2 PARTITION OF part_tbl FOR VALUES FROM (100) TO (200);
CREATE INDEX ON part_tbl(data);
SELECT c.relname, i.indisvisible
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE c.relname LIKE 'part_tbl%'
ORDER BY c.relname;
-




Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-24 Thread Frédéric Yhuel




On 4/23/25 18:13, Sami Imseih wrote:

Also, another strange behavior of the way portal cleanup occurs is that
in extended-query-protocol and within a transaction, ExecutorEnd for the
last query is not actually called until the next command. This just seems
odd to me especially for extensions that rely on ExecutorEnd.

So, Can we do something like this? This drops the portal as soon as
execution completes ( the portal is fetched to completion ). This will
ensure that there is no delay in ExecutorEnd getting called and in the
case of log_temp_files, the message will be logged while debug_query_string
is still pointing to the correct query.


diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dc4c600922d..efe0151ca8f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2327,6 +2327,9 @@ exec_execute_message(const char *portal_name,
long max_rows)

 /* Send appropriate CommandComplete to client */
 EndCommand(&qc, dest, false);
+
+   if (!portal->portalPinned)
+   PortalDrop(portal, false);
 }
 else
 {


I don't know if it is the correct solution, but it seems good to me 
(FWIW), and I've tested it and it works well in all the following cases:


* Java program: extended protocol used for the two queries (the one that 
use the temp file and the SELECT 1).


* Python program: the SELECT 1 is using the simple protocol.

* SQL PREPARE / EXECUTE

* Another version of the Java program using the
setFetchSize() method (and named portals).
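For reference, the SQL PREPARE / EXECUTE case above can be reproduced with a
sketch like the following (the exact sizes are only illustrative; the point is
that work_mem is small enough to force the sort to spill to a temporary file):

SET log_temp_files = 0;   -- log every temporary file
SET work_mem = '64kB';    -- force the sort below to spill to disk

PREPARE q AS SELECT * FROM generate_series(1, 1000000) ORDER BY 1 DESC;
EXECUTE q;    -- creates a temp file; its log entry should cite this statement
SELECT 1;     -- before the fix, the temp-file message could be logged under this query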




Re: What's our minimum supported Python version?

2025-04-24 Thread Peter Eisentraut

On 22.04.25 18:04, Tom Lane wrote:

Jacob Champion  writes:

As for picking a version... 3.6 will have been EOL for almost three
years by the time 18 releases. It seems like we would drop it happily,
were it not for RHEL8.


Agreed, but RHEL8 is out there and I don't think we can just drop
support for it.  I'm also not excited by the idea that an incidental
test script gets to dictate what the cutoff is.

I do think we should stop claiming that Python 3.2 will work.
(Maybe it does, but we don't know that.)  I see that the configure
script only checks for Python >= 3, and it doesn't look like the
meson scripts check explicitly at all, although there's a comment
saying that our meson version cutoff is intended to allow working
with Python 3.5.

Maybe it's sufficient to make a documentation change here, and
say we support Python >= 3.5?  I'd be okay with saying 3.6.8
too, on the grounds that if anything older fails to work we'd
almost certainly just say "too bad".  But RHEL8 is widespread
enough that I think we need to keep making the effort for 3.6.8.


There are a lot of different things that are meant by Python support 
nowadays.


It used to be the case that the minimum Python version was (1) the 
oldest version that was required to get plpython to compile, and (2) the 
range of versions for which we would maintain "expected" files for the 
plpython tests (see history of src/pl/plpython/expected/README).


#2 isn't really a problem anymore, it seems.  It used to require yearly 
attention, but not anymore.  (Maybe Python itself has gotten more 
stable, or we have changed our tests to be less sensitive to this, 
probably a combination of both.)


#1 is also less of a hot-spot, as indicated that we still claim to 
support back to Python 3.2.  Also, starting in PG18, we are using the 
Python "Limited API", so this is being enforced on the Python side. 
That means, in principle, if plpython compiles successfully right now on 
Python 3.13, then it should also compile successfully on Python 3.2.


But now the new meanings are also, (3) what version of Python is 
required by the oldest supported Meson version, and (4) what version of 
Python can be assumed by various build and test scripts.  There is 
actually (4a) scripts that are only run from a meson build, which should 
surely align with #3, and (4b) scripts that are also run by a make 
build, which is what oauth_server.py is, for which we can pick anything. 
 The answer to #3 is currently Python 3.5 (see top of top-level 
meson.build).


And then there is meaning (5) what version has anyone actually tested.

Note that #2, #3, and #4 are build-time and #1 also has a run-time 
component.  It would be plausible to say that you need a certain newer 
Python to build, but plpython can still run on older versions.  If we 
were to make any changes, we need to make sure that the documentation is 
precise about this and matches the code (see #define Py_LIMITED_API).


The cut-off in practice for these things is usually RHEL.  PG18 
currently still supports RHEL 7, which appears to come with Python 3.6. 
Seeing that the small problem with the test script was easily fixed, I 
think we should stick with that for now.  There might be value in 
dropping support for RHEL 7 sometime, but that should be done across the 
board (meson version, openssl version, perl version, etc.), and requires 
some buildfarm changes, so I wouldn't do this right now.






Re: Fix slot synchronization with two_phase decoding enabled

2025-04-24 Thread Nisha Moond
On Thu, Apr 24, 2025 at 12:28 PM Amit Kapila  wrote:
>
> On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada  
> wrote:
> >
> > On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila  wrote:
> > >
> > > On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:
> > > > >
> > > > > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
> > > > >  wrote:
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Approach 2
> > > > > > --
> > > > > >
> > > > > > Instead of disallowing the use of two-phase and failover together, 
> > > > > > a more
> > > > > > flexible strategy could be only restrict failover for slots with 
> > > > > > two-phase
> > > > > > enabled when there's a possibility of existing prepared 
> > > > > > transactions before
> > > > > the
> > > > > > two_phase_at that are not yet replicated. During slot creation with
> > > > > two-phase
> > > > > > and failover, we could check for any decoded prepared transactions 
> > > > > > when
> > > > > > determining the decoding start point 
> > > > > > (DecodingContextFindStartpoint). For
> > > > > > subsequent attempts to alter failover to true, we ensure that 
> > > > > > two_phase_at is
> > > > > > less than restart_lsn, indicating that all prepared transactions 
> > > > > > have been
> > > > > > committed and replicated, thus the bug would not happen.
> > > > > >
> > > > > > pros:
> > > > > >
> > > > > > This method minimizes restrictions for users. Especially during 
> > > > > > slot creation
> > > > > > with (two_phase=on, failover=on), as it’s uncommon for transactions 
> > > > > > to
> > > > > prepare
> > > > > > during consistent snapshot creation, the restriction becomes almost
> > > > > > unnoticeable.
> > > > >
> > > > > I think this approach can work for the transactions that are prepared
> > > > > while the slot is created. But if I understand the problem correctly,
> > > > > while the initial table sync is performing, the slot's two_phase is
> > > > > still false, so we need to deal with the transactions that are
> > > > > prepared during the initial table sync too. What do you think?
> > > > >
> > > >
> > > > Yes, I agree that we need to restrict this case too. Given that we 
> > > > haven't
> > > > started decoding when setting two_phase=true during 
> > > > CreateDecodingContext()
> > > > after tablesync, we could check prepared transactions afterwards during
> > > > decoding. This could involve reporting an ERROR when skipping a prepared
> > > > transaction during decoding if its prepare LSN is less than 
> > > > two_phase_at.
> > > >
> > >
> > > It will make it difficult for users to detect it as this happens at a
> > > later point of time.
> > >
> > > > Alternatively, a simpler method would be to prevent this situation 
> > > > entirely
> > > > during the CREATE SUBSCRIPTION command. For example, we could restrict 
> > > > slots
> > > > created with failover set to true and twophase is later modified to 
> > > > true after
> > > > tablesync. Although the simpler check is more user-visible, it may 
> > > > offer less
> > > > flexibility.
> > > >
> > >
> > > I agree with your point, but OTOH, I am also afraid of adding too many
> > > smart checks in the back-branch. If we follow what you say here, then
> > > users have the following ways in PG17 to enable both failover and
> > > two_phase. (a) During Create Subscription, users can set both
> > > 'failover' and 'two_phase', if 'copy_data' is false, or (b), if
> > > 'copy_data' is true, during Create Subscription, then users can enable
> > > 'two_phase' and wait for it to be enabled. Then use Alter Subscription
> > > to set 'failover'.
> >
> > Yet another idea would be to disallow enabling both two_phase and
> > failover at CREATE SUBSCRIPTION regardless of copy_data value and to
> > add check when enabling failover for the two_phase-enabled-slots. For
> > example, users who want to enable both need to do two steps:
> >
> > 1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
> > 2. ALTER SUBSCRIPTION with failover = true.
> >
> > At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
> > the two_phase states is ready (and possibly check if the slot's
> > two_phase has been enabled too), otherwise raises an ERROR. Then, when
> > the publisher enables the failover for the two_phase-enabled-slot up
> > on walrcv_alter_slot() request, it checks the slot's restart_lsn has
> > passed slot's two_phase_at, otherwise raise an error with the message
> > like "the slot need to consume change upto %X/%X to enable failover".
> >
>
> This should further simplify the checks with an additional restriction
> during the CREATE SUBSCRIPTION time. I am in favor of it because I
> want the code in this area to be as much same in HEAD and backbranches
> as possible.
>

Please find the updated patch for Approach 3, which implements the
idea suggested by Sawada-san above.
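To make the intended user-facing flow concrete, the two-step usage under this
approach would look roughly like the sketch below (subscription, connection and
publication names are just examples; the exact ALTER requirements, e.g. having
to disable the subscription first, follow the existing documentation):

-- Step 1: create the subscription with two_phase on, failover off
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub1
    WITH (two_phase = true, failover = false);

-- wait until the two_phase state is fully enabled ('e')
SELECT subtwophasestate FROM pg_subscription WHERE subname = 'sub1';

-- Step 2: only then enable failover
ALTER SUBSCRIPTION sub1 DISABLE;
ALTER SUBSCRIPTION sub1 SET (failover = true);
ALTER SUBSCRIPTION sub1 ENABLE;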

--
Thanks,
Nisha


v7-0001-PG17-Appro

Disallow redundant indexes

2025-04-24 Thread Japin Li


Hi, hackers

Currently, PostgreSQL permits creating multiple indexes on the same columns
in the same order for a table, potentially leading to redundant indexes.
For example:

CREATE INDEX ON t(id);
CREATE INDEX ON t(id);

While permitted, this leads to:

- Increased storage consumption
- Performance degradation (for data modification)
- Maintenance overhead
- Potential query optimizer confusion

Oracle prevents this with an error like ORA-01408: such column list already
indexed [1].

I propose that PostgreSQL prevent redundant index creation by:

- Checking for identical existing indexes during CREATE INDEX.
- Failing with an error (like Oracle's ORA-01408) if a duplicate is found.
- Providing a GUC parameter (allow_redundant_indexes) to control this.

This change would:

- Prevent accidental redundancy
- Optimize storage
- Improve performance
- Simplify maintenance
- Enhance efficiency and user flexibility

I’d love to hear your feedback or suggestions for improvement.

[1] https://docs.oracle.com/en/error-help/db/ora-01408/?r=19c
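As a starting point for what "identical" could mean, existing duplicates can
already be spotted with a catalog query along these lines (a rough sketch only;
it compares key columns, operator classes, collations, expressions and
predicates):

SELECT indrelid::regclass AS table_name,
       array_agg(indexrelid::regclass) AS identical_indexes
  FROM pg_index
 GROUP BY indrelid, indkey::text, indclass::text, indcollation::text,
          indoption::text, coalesce(indexprs::text, ''),
          coalesce(indpred::text, ''), indisunique
HAVING count(*) > 1;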

-- 
Regards,
Japin Li




Re: Conflicting updates of command progress

2025-04-24 Thread Antonin Houska
Sami Imseih  wrote:

> >> pgstat_progress_start_command should only be called once by the entry
> >> point for the
> >> command. In theory, we could end up in a situation where start_command
> >> is called multiple times during the same top-level command;
> 
> > Not only in theory - it actually happens when CLUSTER is rebuilding indexes.
> 
> In the case of CLUSTER, pgstat_progress_start_command is only called once,
> but pgstat_progress_update_param is called in the context of both CLUSTER
> and CREATE INDEX.

pgstat_progress_start_command() is called twice: First with
cmdtype=PROGRESS_COMMAND_CLUSTER, second with
PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
reindex_relation() -> reindex_index().

It does not matter though, the current design only expects one command.

> > That's a possible approach. However, if the index build takes long time in 
> > the
> > CLUSTER case, the user will probably be interested in details about the 
> > index
> > build.
> 
> I agree,
> 
> >> Is there a repro that you can share that shows the weird values? It sounds 
> >> like
> >> the repro is on top of [1]. Is that right?
> 
> >> You can reproduce the similar problem by creating a trigger function that
> >> runs a progress-reporting command like COPY, and then COPY data into
> >> a table that uses that trigger.
> 
> >> [2] https://commitfest.postgresql.org/patch/5282/
> 
> In this case, pgstat_progress_start_command is actually called
> twice in the life of a single COPY command; the upper-level COPY
> command calls pgstat_progress_start_command and then the nested COPY
> command also does calls pgstat_progress_start_command.
> 
> > I think that can be implemented by moving the progress related fields from
> > PgBackendStatus into a new structure and by teaching the backend to insert a
> > new instance of that structure into a shared hash table (dshash.c)
> 
> I think this is a good idea in general to move the backend progress to
> shared memory.
> and with a new API that will deal with scenarios as described above.
> 1/ an (explicit) nested
> command was started by a top-level command, such as the COPY case above.
> 2/ a top-level command triggered some other progress code implicitly, such as
> CLUSTER triggering CREATE INDEX code.

Yes, I mean a new API. I imagine pgstat_progress_start_command() to initialize
the shared memory to track the "nested" command and to put the existing value
of MyBEEntry onto a stack. pgstat_progress_end_command() would then restore
the original value.

However, special care is needed for [2] because that's not necessarily
nesting: consider merge-joining two foreign tables, both using file_fdw. In
this case the pointers to the progress tracking shared memory would probably
have to be passed as function arguments.

(Note that the current progress tracking system also uses shared memory,
however in a less flexible way.)

> I also like the shared memory approach because we can then not have to use
> a message like the one introduced in f1889729dd3ab0 to support parallel index
> vacuum progress 46ebdfe164c61.

I didn't know about these patches. I'm not sure though if this needs to be
removed. Even if each worker updated the progress information separately
(would users appreciate that?), it should still send the progress information
to the leader before it (i.e. the worker) exits.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: long-standing data loss bug in initial sync of logical replication

2025-04-24 Thread Amit Kapila
On Wed, Apr 23, 2025 at 10:28 PM Masahiko Sawada  wrote:
>
> >
> > Fair enough. OTOH, we can leave the 13 branch considering following:
> > (a) it is near EOL, (b) bug happens in rare cases (when the DDLs like
> > ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ...  that don't take
> > a strong lock on table happens concurrently to DMLs on the tables
> > involved in the DDL.), and (c) the complete fix is invasive, even
> > partial fix is not simple. I have a slight fear that if we make any
> > mistake in fixing it partially (of course, we can't see any today), we
> > may not even get a chance to fix it.
> >
> > Now, if the above convinces you or someone else not to push the
> > partial fix in 13, then fine; otherwise, I'll push the 0001 to 13 day
> > after tomorrow.
>
> I've considered the above points. I guess (b), particularly executing
> ALTER PUBLICATION .. ADD TABLE while the target table is being
> updated, might not be rare depending on systems. Given that this bug
> causes a silent data-loss on the subscriber that is hard for users to
> realize, it could ultimately depend on to what extent we can mitigate
> the problem with only 0001 and there is a workaround when the problem
> happens.
>
> Kuroda-san already shared[1] the analysis of what happens with and
> without 0002 patch, but let me try with the example close to the
> original data-loss problem[2]:
>
> Consider the following scenario:
>
> S1: CREATE TABLE d(data text not null);
> S1: INSERT INTO d VALUES('d1');
> S2: BEGIN;
> S2: INSERT INTO d VALUES('d2');
> S1: ALTER PUBLICATION pb ADD TABLE d;
> S2: INSERT INTO d VALUES('d3');
> S2: COMMIT
> S2: INSERT INTO d VALUES('d4');
> S1: INSERT INTO d VALUES('d5');
>
> Without 0001 and 0002 (i.e. as of today), the walsender fails to send
> all changes to table 'd' until it invalidates its caches for some
> reasons.
>
> With only 0001, the walsender sends 'd4' insertion or later.
>
> WIth both 0001 and 0002, the wansender sends 'd3' insertion or later.
>
> ISTM the difference between without both 0001 and 0002 and with 0001
> is significant. So I think it's worth applying 0001 for v13.
>

Pushed to v13 as well, thanks for sharing the feedback.

-- 
With Regards,
Amit Kapila.




Re: extension_control_path and "directory"

2025-04-24 Thread Christoph Berg
Re: Matheus Alcantara
> I've tested with the semver extension and it seems to work fine with
> this patch. Can you please check on your side to see if it's also
> working?

Hi Matheus,

thanks for the patch, it does indeed work.

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 180f4af9be3..d68efd59118 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -376,6 +376,14 @@ get_extension_control_directories(void)
 
 			/* Substitute the path macro if needed */
 			mangled = substitute_path_macro(piece, "$system", system_dir);
+
+			/*
+			 * Append "extension" suffix in case is a custom extension control
+			 * path.
+			 */
+			if (strcmp(piece, "$system") != 0)
+				mangled = psprintf("%s/extension", mangled);

This would look prettier if it was something like

mangled = substitute_path_macro(piece, "$system", 
system_dir "/extension");

... but I'm wondering if it wouldn't be saner if the control path
should be stored without "extension" in that struct. Then opening the
control file would be path + "extension/" + filename and the extra
directory would be path + directory, without any on-the-fly stripping
of trailing components.

The extension_control_path GUC could also be adjusted to refer to the
directory one level above the extension/foo.control location.


+	/*
+	 * Assert that the control->control_dir end with /extension suffix so that
+	 * we can replace with the value from control->directory.
+	 */
+	Assert(ctrldir_len >= suffix_len &&
+		   strcmp(control->control_dir + ctrldir_len - suffix_len, "extension") == 0);

If control_dir is coming from extension_control_path, it might have a
different suffix. Replace the Assert by elog(ERROR). (Or see above.)

Christoph




Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-24 Thread George MacKerron
> On Linux/*ix, there would be 3 things that are all the same.
> 
> If the Windows Openssl store is that bad, wouldn't the smarter thing
> to do for PG19 to use winstore by default? The Openssl one would still
> be available when requested explicitly. This would avoid the
> proliferation of default values.

I agree ... but I think that looks rather like my most recent (rejected) patch?

However, perhaps we could extend that patch for greater 
backwards-compatibility, checking not only that the SSL_CERT_DIR and 
SSL_CERT_FILE environment variables are not set, but *also* that there is no 
cert.pem file and no certs/ directory inside OPENSSLDIR.

I think that should make the behaviour backwards-compatible for all scenarios 
*except* those that would otherwise be guaranteed to fail certificate 
verification because we are on Windows and there are no OpenSSL certificates 
configured on the system. It seems fairly safe to assume that people who are 
using sslrootcert=system on Windows and without any configured OpenSSL certs 
are not doing so with the deliberate intention that all connections should fail!

I attach a patch that would do this (side-by-side view at 
https://github.com/postgres/postgres/compare/master...jawj:postgres:jawj-sslrootcert-system-windows).

An advantage of this approach would be that people building Postgres who want 
this behaviour sooner than next year could also patch it into versions 16 – 18 
without much trouble.


>> BIGGER IDEA

>> In summary, you end up with these as sslmode values:
>> 
>> * disabled
>> * insecure (formerly known as require)
>> * verify-ca
>> * verify-full
>> * secure (the new default, meaning sslmode=verify-full + sslrootcert=os)
>> 
>> Obviously this would need to be well-trailed ahead of time, as some people 
>> would need to make changes to how they use psql/libpq. But it would peg the 
>> default security of a Postgres connection at the same level as the security 
>> of any random blog page (which I think is a bare minimum one might aspire 
>> to).
> 
> I agree that this would be a good change for SSL users, and also one
> that people would likely be willing to buy.
> 
> The big problem here is that a lot of installations are not using SSL
> at all (default on RPM), and another big chunk is using SSL, but
> relying on the default snakeoil certificates to just work (default on
> Debian), so this would not be "some people" but more like "everyone
> except the few who have already configured certificates properly".
> 
> These people would have to change every single connection string to
> include "sslmode=disabled" or the like. This will likely not be
> received well.
> 
> Before we can make this change, I think we would have to improve the
> UX. psql does not even have any --switch for it. PostgreSQL serving
> non-SSL and SSL on the same port doesn't make the UX better... :-/

How do you think the UX could be improved? Maybe by using a psql switch and/or 
an env var to opt out of (or initially even to opt into) the new sslmode 
treatment?



sslrootcert-system-windows.diff
Description: Binary data




Re: Fix slot synchronization with two_phase decoding enabled

2025-04-24 Thread shveta malik
On Thu, Apr 24, 2025 at 2:54 PM Nisha Moond  wrote:
>
> Please find the updated patch for Approach 3, which implements the
> idea suggested by Swada-san above.
>

Thank You for the patch.

1)

CreateDecodingContext:

  if (ctx->twophase && !slot->data.two_phase)
  {
+ /*
+ * Do not allow two-phase decoding for failover enabled slots.
+ *
+ * See comments atop the similar check in ReplicationSlotCreate() for
+ * a detailed reason.
+ */
+ if (slot->data.failover)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
+ NameStr(slot->data.name))));
+

Can you please add a testcase to cover this scenario?

2)
We shall update create-sub documents as well for these mutually
exclusive options. Review other pages (alter-sub, create-slot) as well
for any required change.

3)
+##
+# Test that the failover option can be enabled for a two_phase enabled
+# subscription.
+##

Suggestion: 'Test that the failover option can be enabled for a two_phase
enabled subscription only through Alter Subscription (failover=true)'


thanks
Shveta




Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details

2025-04-24 Thread Robin Haberkorn
On Thu Apr 24, 2025 at 01:59:10 GMT +03, Tom Lane wrote:
> Hm.  I wonder if we couldn't get rid of the
> HAVE_XMLSTRUCTUREDERRORCONTEXT conditionalization.  If the
> libxml2 versions that lacked that were "old" in 2011, surely
> they are extinct in the wild by now.  I'm thinking that we
> really don't want to be using the generic handler at all, given
> the commit log's comment that it can't distinguish between
> error, warning, and notice cases.

libxml 2.7.4 was released 15 years ago, so yes these #ifdefs can
probably be removed already. What's the oldest distribution/OS you
want to support PG on?
Even Ubuntu 14.04 from 2014 already had libxml 2.9.1.

If you like, I will prepare another patch to remove
HAVE_XMLSTRUCTUREDERRORCONTEXT in a separate thread.

PS: I added the libxslt error handling patch to the next commitfest:
https://commitfest.postgresql.org/patch/5718/

-- 
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537




Re: Does RENAME TABLE rename associated identity sequence?

2025-04-24 Thread Ashutosh Bapat
Hi Jason,

On Wed, Apr 23, 2025 at 6:06 PM Jason Song  wrote:

> Hi hackers,
>
> I was wondering if there's any built-in functionality in PostgreSQL where
> renaming a table with an identity column would also rename the
> auto-generated sequence associated with that identity column.
>
> In my case, I renamed a table that used `GENERATED BY DEFAULT AS
> IDENTITY`, and later when I ran `pg_dump`, I noticed that the sequence name
> was unchanged (e.g., still `old_table_id_seq`). As a result, any `setval()`
> or sequence-related operations referenced the old sequence name, even
> though the table name had changed.
>

Is it causing a problem in your application or environment?

As long as the system uses the given name consistently and does not try to
derive the sequence name from the new table name (which would end in a
non-existent-sequence error), it should be fine. Internally we
use OID and not name. Identity sequences are not expected to be manipulated
by users anyway, so the name shouldn't matter.

I agree that finding an identity sequence associated with a table is not
straightforward - that association is stored in pg_depend, which is not
intuitive. Take a look at getIdentitySequence().
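For anyone needing to do this by hand today, a minimal sketch (the table and
column names are just examples matching the scenario above):

-- pg_get_serial_sequence() also works for identity columns, despite its name:
SELECT pg_get_serial_sequence('new_table', 'id');
-- e.g. returns public.old_table_id_seq after the table was renamed

-- rename the sequence manually so it follows the table's new name:
ALTER SEQUENCE old_table_id_seq RENAME TO new_table_id_seq;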


>
> I realize this can be worked around — for example, by using
> `--exclude-table-data` to skip the `setval()` or manually renaming the
> sequence after the table rename. But I'm curious if there are any plans (or
> technical reasons against) supporting something like `ALTER TABLE ...
> RENAME ... WITH SEQUENCE`, or having the sequence name automatically follow
> the table rename when it was originally auto-generated by an identity
> column.
>
>
If there's any problem, IMO, ALTER TABLE ... RENAME ... should rename the
sequence too since the identity sequences are created implicitly when the
table is created, so they should be renamed implicitly. We should not
require WITH SEQUENCE clause.

-- 
Best Wishes,
Ashutosh Bapat


Possible incorrect comment in ginget.c

2025-04-24 Thread Arseniy Mukhin
Hi,

I was a bit confused by this comment in keyGetItem() (ginget.c)

/*
* Normally, none of the items in additionalEntries can have a curItem
* larger than minItem. But if minItem is a lossy page, then there
* might be exact items on the same page among additionalEntries.
*/

AFAIS the code before the comment is opposite to the comment's first sentence.
We set advancePast right before the minItem, and if an additional
entry's curItem is smaller than or equals to advancePast, we call
entryGetItem(), so afterwards it should be larger than advancePast and
larger than or equal to minItem.
So it seems that normally all items in additionalEntries have a
curItem larger than or equal to minItem.

It seems the correct first sentence would be:
"Normally, none of the items in additionalEntries can have a curItem
LESS than minItem"


Best regards,
Arseniy Mukhin




Re: Fix premature xmin advancement during fast forward decoding

2025-04-24 Thread Amit Kapila
On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
 wrote:
>
> When analyzing some issues in another thread[1], I found a bug that fast 
> forward
> decoding could lead to premature advancement of catalog_xmin, resulting in
> required catalog data being removed by vacuum.
>
> The issue arises because we do not build a base snapshot when decoding changes
> during fast forward mode, preventing reference to the minimum transaction ID
> that remains visible in the snapshot when determining the candidate for
> catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest
> running transaction ID found in the latest running_xacts record.
>
> In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could 
> not
> be reached during fast forward decoding, resulting in
> rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
> system attempts to refer to rb->txns_by_base_snapshot_lsn via
> ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
> NULL, it defaults to directly using running->oldestRunningXid as the candidate
> for catalog_xmin. For reference, see the details in
> SnapBuildProcessRunningXacts.
>
> See the attachment for a test(0002) to prove that the catalog data that are
> still required would be removed after premature catalog_xmin advancement 
> during
> fast forward decoding.
>
> To fix this, I think we can allow the base snapshot to be built during fast
> forward decoding, as implemented in the patch 0001 (We already built base
> snapshot in fast-forward mode for logical message in logicalmsg_decode()).
>

The same bug was fixed for non-fast_forward mode in commit f49a80c4.
See the following code in that commit:

-   LogicalIncreaseXminForSlot(lsn, running->oldestRunningXid);
+   xmin = ReorderBufferGetOldestXmin(builder->reorder);
+   if (xmin == InvalidTransactionId)
+   xmin = running->oldestRunningXid;

+   LogicalIncreaseXminForSlot(lsn, xmin);

Is my understanding correct? If so, then I think it is an oversight in
commit f49a80c4 that fast_forward mode wasn't considered. Please refer to
that commit in the commit message of the fix.

The code changes look correct to me. I have some suggestions related
to comments in the code. See attached. Please prepare patches for back
branches as well.

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 8dc467a8bb3..cc03f0706e9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -413,10 +413,10 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	/*
 	 * If we don't have snapshot or we are just fast-forwarding, there is no
 	 * point in decoding data changes. However, it's crucial to build the base
-	 * snapshot during fast-forward mode (as done in SnapBuildProcessChange())
-	 * because the snapshot's minimum visible transaction ID (xmin) must be
-	 * referenced when determining the candidate catalog_xmin for the
-	 * replication slot.
+	 * snapshot during fast-forward mode (as is done in
+	 * SnapBuildProcessChange()) because we require the snapshot's xmin when
+	 * determining the candidate catalog_xmin for the replication slot. See
+	 * SnapBuildProcessRunningXacts().
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;
@@ -477,10 +477,10 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	/*
 	 * If we don't have snapshot or we are just fast-forwarding, there is no
 	 * point in decoding data changes. However, it's crucial to build the base
-	 * snapshot during fast-forward mode (as done in SnapBuildProcessChange())
-	 * because the snapshot's minimum visible transaction ID (xmin) must be
-	 * referenced when determining the candidate catalog_xmin for the
-	 * replication slot.
+	 * snapshot during fast-forward mode (as is done in
+	 * SnapBuildProcessChange()) because we require the snapshot's xmin when
+	 * determining the candidate catalog_xmin for the replication slot. See
+	 * SnapBuildProcessRunningXacts().
 	 */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
 		return;


Re: Does RENAME TABLE rename associated identity sequence?

2025-04-24 Thread Tom Lane
Isaac Morland  writes:
> On Thu, 24 Apr 2025 at 05:53, Ashutosh Bapat 
> wrote:
>> If there's any problem, IMO, ALTER TABLE ... RENAME ... should rename the
>> sequence too since the identity sequences are created implicitly when the
>> table is created, so they should be renamed implicitly. We should not
>> require WITH SEQUENCE clause.

> My concern would be what happens if the new sequence name is not available.
> I suppose the simplest behaviour might be to skip renaming the sequence in
> that case, perhaps raising a warning.

We do not rename any other subsidiary objects such as indexes.
Why would we rename a sequence (which has a lot more reason
to be considered an independent object than an index does)?

regression=# create table foo (i int primary key);
CREATE TABLE
regression=# \d+ foo
   Table "public.foo"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | 
Stats target | Description 
+-+---+--+-+-+-+--+-
 i  | integer |   | not null | | plain   | |
  | 
Indexes:
"foo_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"foo_i_not_null" NOT NULL "i"
Access method: heap

regression=# alter table foo rename to bar;
ALTER TABLE
regression=# \d+ bar
   Table "public.bar"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | 
Stats target | Description 
+-+---+--+-+-+-+--+-
 i  | integer |   | not null | | plain   | |
  | 
Indexes:
"foo_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"foo_i_not_null" NOT NULL "i"
Access method: heap

I think it's up to the user to rename subsidiary objects if
they wish to do so.

regards, tom lane




Re: What's our minimum supported Python version?

2025-04-24 Thread Tom Lane
Peter Eisentraut  writes:
> There are approximately 6 buildfarm members with RHEL 7 or CentOS 7, so 
> in that sense, "support" means that everything is expected to work 
> there.  And some amount of work has been done recently to keep that 
> support alive.

Yeah.  Devrim's choice of what to package is not project policy,
it's just his own personal allocation of resources.

> (But I'm confused now because I see Python 3.6 on both the RHEL 7 and 
> the RHEL 8 animals.  So it's not clear how representative these animals 
> are especially with respect to the Python versions.)

Per wikipedia, RHEL7 was released mid-2014, so the latest Python
version 7.0 could possibly have shipped with is 3.4 (released
2014-03) and much more likely they shipped 3.3 (2012-09).  However,
you can if you choose install later Python versions, and you
have control over which version /usr/bin/python3 points at.
(At least this is true on the RHEL8 box I am looking at, and
I'm fairly sure it was so on RHEL7 too.)  Similarly, Python 3.6
seems to be the baseline default on RHEL8 --- and the timing
more or less matches up, as 3.7 was released not long before
they would have been freezing the RHEL8 feature set.  But you can
install 3.8, 3.9, 3.11, and/or 3.12 without going outside the
universe of Red-Hat-supported packages.

So what's that mean for us?  "We still want to support RHEL7" turns
out to be squishier than one might think.  But I don't think we
really want to promise to work with Python 3.3, especially given
the lack of any buildfarm representation of that.  In the other
direction, yeah we could insist that RHEL users install some
later Python version, but I think we'd get push-back.  The point
of using an LTS distro is, well, stability.  The users want to
decide which packages they are comfortable with updating.

I'm still content with the idea of deciding that 3.6 is now our
cutoff.  If someone comes along and complains because
oauth_server.py doesn't work on an older version, it's up to them
to provide a patch.  If/when we want to include some Python code
that can't reasonably be made to work on 3.6, we can reconsider.

regards, tom lane




RE: Fix premature xmin advancement during fast forward decoding

2025-04-24 Thread Zhijie Hou (Fujitsu)
On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:

(Resending this email after compressing the attachment)

> On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > Hi,
> >
> > When analyzing some issues in another thread[1], I found a bug that
> > fast forward decoding could lead to premature advancement of
> > catalog_xmin, resulting in required catalog data being removed by vacuum.
> >
> > The issue arises because we do not build a base snapshot when decoding
> > changes during fast forward mode, preventing reference to the minimum
> > transaction ID that remains visible in the snapshot when determining
> > the candidate for catalog_xmin. As a result, catalog_xmin was directly
> > advanced to the oldest running transaction ID found in the latest
> running_xacts record.
> >
> > In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
> > could not be reached during fast forward decoding, resulting in
> > rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
> > system attempts to refer to rb->txns_by_base_snapshot_lsn via
> > ReorderBufferGetOldestXmin(). However, since
> > rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
> > running->oldestRunningXid as the candidate for catalog_xmin. For
> > reference, see the details in SnapBuildProcessRunningXacts.
> 
> I agree with your analysis.
> 
> > To fix this, I think we can allow the base snapshot to be built during
> > fast forward decoding, as implemented in the patch 0001 (We already
> > built base snapshot in fast-forward mode for logical message in
> logicalmsg_decode()).
> >
> > I also explored the possibility of further optimizing the fix to
> > reduce the overhead associated with building a snapshot in
> > fast-forward mode. E.g., to maintain a single base_snapshot_xmin
> > rather than generating a full snapshot for each transaction, and use
> > this base_snapshot_xmin when determining the candidate catalog_xmin.
> > However, I am not feeling confident about maintaining some
> > fast-forward dedicated fields in common structures and perhaps employing
> different handling for catalog_xmin.
> >
> > Moreover, I conducted a basic test[2] to test the patch's impact,
> > noting that advancing the slot incurs roughly a 4% increase in
> > processing time after applying the patch, which appears to be
> > acceptable.  Additionally, the cost associated with building the
> > snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
> 
> Where did the regression come from? I think that even with your patch
> SnapBuildBuildSnapshot() would be called only a few times in the workload.

AFAICS, the primary cost arises from the additional function
invocations/executions. Functions such as SnapBuildProcessChange,
ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are invoked
more frequently after the patch. Although these functions aren't inherently
expensive, the scenario I tested involved numerous transactions starting with
just a single insert each. This resulted in a high frequency of calls to these
functions.

Attaching the profile for both HEAD and PATCHED for reference.

> With the patch, I think that we need to create ReorderBufferTXN entries for
> each transaction even during fast_forward mode, which could lead to
> overheads.

I think ReorderBufferTXN was created in fast_forward even without the patch.
See heap_decode-> ReorderBufferProcessXid.

> I think that 4% degradation is something that we want to avoid.

As mentioned above, these costs arise from essential function executions
required to maintain txns_by_base_snapshot_lsn. So I think the cost is
reasonable to ensure correctness. Thoughts ?

Best Regards,
Hou zj


Re: Making sslrootcert=system work on Windows psql

2025-04-24 Thread Jelte Fennema-Nio
On Wed, 23 Apr 2025 at 17:47, George MacKerron  wrote:
> I’d suggest two new special sslrootcert values:
>
> (1) sslrootcert=openssl
>
> This does exactly what sslrootcert=system does now, but is less confusingly 
> named for Windows users. sslrootcert=system becomes a deprecated synonym for 
> this option.
>
> (2) sslrootcert=os

Okay I have some time to respond to this thread now. My main thought is this:

I think we should try as hard as possible for sslrootcert=system to do
"the right thing" on all platforms. "the right thing" being: allowing
users to connect to a Postgres server if the cert of that server is
signed by a CA that the operating system trusts. It's
a huge shame that MANY users connect to their production Postgres
databases over unauthenticated TLS. I believe the only way to fix that
is by having a *standard* & *secure* connection string that people can
copy paste from their database service provider's portal. Adding new
options that users need to choose between makes it impossible for
database providers to provide such a *standard* connection string
because the exact string will depend on the platform. Which means
they'll continue to only provide sslmode=require in their provided
connstrings. And even if we can somehow avoid that, it will reset the
clock on when most clients will actually support that *standard*
connection string. Thus increasing the time (by at least two years)
before database providers dare to put these options in their default
connection strings, in fear of their customers not being able to
connect and opening support requests or closing their accounts.

But yeah, the discussed situation is problematic for this: Windows
machines have multiple cert stores that could be reasonably considered
the system store.

So I'd like to propose a different way around that problem: Instead
adding more connection options. How about we add a *compile time*
option that allows the person that compiles libpq to choose which cert
store it should use if sslrootcert=system is provided. Something like
--system-cert-store=openssl and --system-cert-store=winstore flags for
./configure. This way users don't have to choose between the various
system stores to get behaviour that is sensible. Which one should be
the default, requires discussion, and maybe we'd want to let that
depend on the OpenSSL version or change it in the future. We could
even make it required for people compiling libpq on Windows (with an
OpenSSl version that supports winstore) to choose between these two
options, by making it a required flag.

Note: This doesn't mean we couldn't allow people to override the
compile time systemstore at runtime with e.g. sslsystemstore=winstore,
but the default would be the one that was chosen at compile time.

> BIGGER IDEA

I would really like to get to a point where libpq by default fails to
connect if you're not connecting to Postgres in a secure way, i.e. if
you're not both encrypting traffic and authenticating the host
(unless you're connecting over unix sockets and maybe local loopback).
I think there's no point in actually working on/proposing that until
we have a secure connection string that works on all systems (i.e.
what sslrootcert=system is supposed to do)




Re: Fix premature xmin advancement during fast forward decoding

2025-04-24 Thread Masahiko Sawada
Hi,

On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Hi,
>
> When analyzing some issues in another thread[1], I found a bug that fast 
> forward
> decoding could lead to premature advancement of catalog_xmin, resulting in
> required catalog data being removed by vacuum.
>
> The issue arises because we do not build a base snapshot when decoding changes
> during fast forward mode, preventing reference to the minimum transaction ID
> that remains visible in the snapshot when determining the candidate for
> catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest
> running transaction ID found in the latest running_xacts record.
>
> In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could 
> not
> be reached during fast forward decoding, resulting in
> rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the
> system attempts to refer to rb->txns_by_base_snapshot_lsn via
> ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is
> NULL, it defaults to directly using running->oldestRunningXid as the candidate
> for catalog_xmin. For reference, see the details in
> SnapBuildProcessRunningXacts.

I agree with your analysis.

> To fix this, I think we can allow the base snapshot to be built during fast
> forward decoding, as implemented in the patch 0001 (We already built base
> snapshot in fast-forward mode for logical message in logicalmsg_decode()).
>
> I also explored the possibility of further optimizing the fix to reduce the
> overhead associated with building a snapshot in fast-forward mode. E.g., to
> maintain a single base_snapshot_xmin rather than generating a full snapshot 
> for
> each transaction, and use this base_snapshot_xmin when determining the
> candidate catalog_xmin. However, I am not feeling confident about maintaining
> some fast-forward dedicated fields in common structures and
> perhaps employing different handling for catalog_xmin.
>
> Moreover, I conducted a basic test[2] to test the patch's impact, noting that
> advancing the slot incurs roughly a 4% increase in processing time after
> applying the patch, which appears to be acceptable.  Additionally, the cost
> associated with building the snapshot via SnapBuildBuildSnapshot() did not 
> show
> up in the profile.

Where did the regression come from? I think that even with your patch
SnapBuildBuildSnapshot() would be called only a few times in the
workload. With the patch, I think that we need to create
ReorderBufferTXN entries for each transaction even during fast_forward
mode, which could lead to overheads. I think that 4% degradation is
something that we want to avoid.

Regards

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-24 Thread Jelte Fennema-Nio
On Thu, 24 Apr 2025 at 18:46, Jacob Champion
 wrote:
>
> On Thu, Apr 24, 2025 at 5:00 AM Peter Eisentraut  wrote:
> > I'm generally in favor of making sslmode=verify-full the effective
> > default somehow.
>
> +many

Yes, +many

> Not to derail things too much, but I'd also like a postgress://
> scheme

Sounds great.

Let me derail some more: while we're at it, I think it would be good to
add tls-prefixed aliases for all our ssl options, like tlscert/tlskey.
Since a postgress:// scheme would be totally new, maybe we can even
disallow the ssl-prefixed ones there.

> The hardest part, in my opinion, is that we'd have to start following
> the RFC concept of "authority". A URL of
> "postgress://example.com/db?host=evil.com&hostaddr=..." is outright
> dangerous

Why is this dangerous? As long as we validate that the cert provided
by the server is for example.com, I don't see any security problem in
having DNS resolution happen for evil.com, nor in having the IP
addresses hardcoded using hostaddr.

> as is "postgress://example.com/db?sslmode=disable"

Yeah, that should be addressed, but it seems like we mainly need to
disallow specifying sslmode there entirely (or error if it's not
verify-full). And maybe there are some other options that we'd want to
disallow.

On Thu, 24 Apr 2025 at 18:46, Jacob Champion
 wrote:
>
> On Thu, Apr 24, 2025 at 5:00 AM Peter Eisentraut  wrote:
> > I'm generally in favor of making sslmode=verify-full the effective
> > default somehow.
>
> +many
>
> On Thu, Apr 24, 2025 at 3:53 AM Christoph Berg  wrote:
> > For
> > postgresql://-style strings, we would ideally have something like http://
> > vs https://, but I am not sure how to squeeze that into the syntax.
>
> Not to derail things too much, but I'd also like a postgress://
> scheme, and I've put a little bit of idle thought into it. I think
> we'd want it to imply sslnegotiation=direct and sslrootcert=system
> (modulo the Windows discussion already in progress), and potentially
> make a bunch of stricter decisions about TLS settings to better match
> modern practice. The intent would be to have a "browser-strength"
> scheme for people who care more about security than about raw
> compatibility with older systems, because they're connecting to
> someone else's servers on the open Web.
>
> The hardest part, in my opinion, is that we'd have to start following
> the RFC concept of "authority". A URL of
> "postgress://example.com/db?host=evil.com&hostaddr=..." is outright
> dangerous, as is "postgress://example.com/db?sslmode=disable". So if
> there's interest in that scheme, I think it should remain a separate
> feature from "verify-full by default", because there's a lot more to
> figure out.
>
> --Jacob
>
>




Re: Making sslrootcert=system work on Windows psql

2025-04-24 Thread Jelte Fennema-Nio
On Thu, 24 Apr 2025 at 23:52, Jelte Fennema-Nio  wrote:
> How about we add a *compile time*
> option that allows the person that compiles libpq to choose which cert
> store it should use if sslrootcert=system is provided. Something like
> --system-cert-store=openssl and --system-cert-store=winstore flags for
> ./configure.

@George So basically my suggestion is to make the behaviour that your
patch introduces configurable at compile time. FWIW my vote would
probably be to default to --system-cert-store=winstore if it's
available. And then --system-cert-store=openssl would be a way out for
people that took the effort to configure openssl correctly on Windows.




Re: extension_control_path and "directory"

2025-04-24 Thread David E. Wheeler
On Apr 24, 2025, at 11:18, Matheus Alcantara  wrote:

> In v2 I've moved the logic to remove the /extension to
> parse_extension_control_file(), do you think that this Assert on this
> function would still be wrong? IIUC we should always have /extension at
> the end of "control_dir" at this place, because the
> extension_control_path GUC will omit the /extension at the end and we
> will force it to have the suffix on the path at
> find_extension_control_filename() and
> get_extension_control_directories() functions. I'm missing something
> here?

I took this patch for a spin and managed to make it core dump. How? Well I 
installed semver with this command:

```sh
make prefix=/Users/david/Downloads install
```

Then set the search paths and restarted:

```ini
extension_control_path = '/Users/david/Downloads/share/extension:$system'
dynamic_library_path = '/Users/david/Downloads/lib:$libdir'
```

Then I connected and ran `CREATE EXTENSION semver` and it segfaulted. I poked 
around for a few minutes and realized that my prefix is not what I expected. 
Because it doesn’t contain the string “postgres”, PGXS helpfully adds it. The 
actual paths are:

```ini
extension_control_path = '/Users/david/Downloads/share/postgresql/extension:$system'
dynamic_library_path = '/Users/david/Downloads/lib/postgresql:$libdir'
```

With that fix it no longer segfaulted.

So I presume something crashes when a directory or file doesn’t exist.
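
(A hedged aside: one way to spot this kind of path mismatch without crashing the backend is to ask the server what it can see before running CREATE EXTENSION — assuming the patched build consults extension_control_path for pg_available_extensions as well; the extension name matches the example above.)

```sql
-- Confirm the effective search paths and whether the control file is visible.
SHOW extension_control_path;
SHOW dynamic_library_path;
SELECT name, default_version, comment
FROM pg_available_extensions
WHERE name = 'semver';   -- returns no row if the control file isn't found
```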

But I am not at all sure we want this prefix behavior for installing
extensions. I get that this has been the behavior for setting the main sharedir
and libdir for Postgres, but I don’t know that it makes sense for extension
prefixes.

Best,

David




signature.asc
Description: Message signed with OpenPGP


Re: long-standing data loss bug in initial sync of logical replication

2025-04-24 Thread Shlok Kyal
On Thu, 24 Apr 2025 at 14:39, Amit Kapila  wrote:
>
> On Wed, Apr 23, 2025 at 10:28 PM Masahiko Sawada  
> wrote:
> >
> > >
> > > Fair enough. OTOH, we can leave the 13 branch considering following:
> > > (a) it is near EOL, (b) bug happens in rare cases (when the DDLs like
> > > ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ...  that don't take
> > > a strong lock on table happens concurrently to DMLs on the tables
> > > involved in the DDL.), and (c) the complete fix is invasive, even
> > > partial fix is not simple. I have a slight fear that if we make any
> > > mistake in fixing it partially (of course, we can't see any today), we
> > > may not even get a chance to fix it.
> > >
> > > Now, if the above convinces you or someone else not to push the
> > > partial fix in 13, then fine; otherwise, I'll push the 0001 to 13 day
> > > after tomorrow.
> >
> > I've considered the above points. I guess (b), particularly executing
> > ALTER PUBLICATION .. ADD TABLE while the target table is being
> > updated, might not be rare depending on systems. Given that this bug
> > causes a silent data-loss on the subscriber that is hard for users to
> > realize, it could ultimately depend on to what extent we can mitigate
> > the problem with only 0001 and there is a workaround when the problem
> > happens.
> >
> > Kuroda-san already shared[1] the analysis of what happens with and
> > without 0002 patch, but let me try with the example close to the
> > original data-loss problem[2]:
> >
> > Consider the following scenario:
> >
> > S1: CREATE TABLE d(data text not null);
> > S1: INSERT INTO d VALUES('d1');
> > S2: BEGIN;
> > S2: INSERT INTO d VALUES('d2');
> > S1: ALTER PUBLICATION pb ADD TABLE d;
> > S2: INSERT INTO d VALUES('d3');
> > S2: COMMIT
> > S2: INSERT INTO d VALUES('d4');
> > S1: INSERT INTO d VALUES('d5');
> >
> > Without 0001 and 0002 (i.e. as of today), the walsender fails to send
> > all changes to table 'd' until it invalidates its caches for some
> > reasons.
> >
> > With only 0001, the walsender sends 'd4' insertion or later.
> >
> > With both 0001 and 0002, the walsender sends 'd3' insertion or later.
> >
> > ISTM the difference between without both 0001 and 0002 and with 0001
> > is significant. So I think it's worth applying 0001 for v13.
> >
>
> Pushed to v13 as well, thanks for sharing the feedback.
>

In the commits, I saw that the filenames are misspelled for files
invalidation_distrubution.out and invalidation_distrubution.spec.
This is present in branches from REL_13 to HEAD. I have attached
patches to fix the same.

Thanks and Regards,
Shlok Kyal


v1_HEAD-0001-Fix-spelling-for-file-names.patch
Description: Binary data


v1_REL_15-0001-Fix-spelling-for-file-names.patch
Description: Binary data


v1_REL_13-0001-Fix-spelling-for-file-names.patch
Description: Binary data


v1_REL_16-0001-Fix-spelling-for-file-names.patch
Description: Binary data


v1_REL_14-0001-Fix-spelling-for-file-names.patch
Description: Binary data


v1_REL_17-0001-Fix-spelling-for-file-names.patch
Description: Binary data


Re: Fix premature xmin advancement during fast forward decoding

2025-04-24 Thread Masahiko Sawada
On Thu, Apr 24, 2025 at 9:56 PM Amit Kapila  wrote:
>
> On Fri, Apr 25, 2025 at 8:14 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:
> > > On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > When analyzing some issues in another thread[1], I found a bug that
> > > > fast forward decoding could lead to premature advancement of
> > > > catalog_xmin, resulting in required catalog data being removed by 
> > > > vacuum.
> > > >
> > > > The issue arises because we do not build a base snapshot when decoding
> > > > changes during fast forward mode, preventing reference to the minimum
> > > > transaction ID that remains visible in the snapshot when determining
> > > > the candidate for catalog_xmin. As a result, catalog_xmin was directly
> > > > advanced to the oldest running transaction ID found in the latest
> > > running_xacts record.
> > > >
> > > > In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
> > > > could not be reached during fast forward decoding, resulting in
> > > > rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin,
> > > > the system attempts to refer to rb->txns_by_base_snapshot_lsn via
> > > > ReorderBufferGetOldestXmin(). However, since
> > > > rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
> > > > running->oldestRunningXid as the candidate for catalog_xmin. For
> > > > reference, see the details in SnapBuildProcessRunningXacts.
> > >
> > > I agree with your analysis.
> > >
> > > > To fix this, I think we can allow the base snapshot to be built during
> > > > fast forward decoding, as implemented in the patch 0001 (We already
> > > > built base snapshot in fast-forward mode for logical message in
> > > logicalmsg_decode()).
> > > >
> > > > I also explored the possibility of further optimizing the fix to
> > > > reduce the overhead associated with building a snapshot in
> > > > fast-forward mode. E.g., to maintain a single base_snapshot_xmin
> > > > rather than generating a full snapshot for each transaction, and use
> > > > this base_snapshot_xmin when determining the candidate catalog_xmin.
> > > > However, I am not feeling confident about maintaining some
> > > > fast-forward dedicated fields in common structures and perhaps employing
> > > different handling for catalog_xmin.
> > > >
> > > > Moreover, I conducted a basic test[2] to test the patch's impact,
> > > > noting that advancing the slot incurs roughly a 4% increase in
> > > > processing time after applying the patch, which appears to be
> > > > acceptable.  Additionally, the cost associated with building the
> > > > snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
> > >
> > > Where did the regression come from? I think that even with your patch
> > > SnapBuildBuildSnapshot() would be called only a few times in the workload.
> >
> > AFAICS, the primary cost arises from the additional function
> > invocations/executions. Functions such as SnapBuildProcessChange,
> > ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are 
> > invoked
> > more frequently after the patch. Although these functions aren't inherently
> > expensive, the scenario I tested involved numerous transactions starting 
> > with
> > just a single insert each. This resulted in a high frequency of calls to 
> > these
> > functions.

Ah, you're right. I looked at the wrong profile.

> >
>
> Yeah, I see below from the patched profile.
>
> |--4.65%--SnapBuildProcessChange
> |  |
> |  |
> |  |
> |  |--2.08%--ReorderBufferSetBaseSnapshot
> |  |
> |  |  |
> |  |
> |  |   --1.32%--__mcount_internal
> |  |
> |  |
> |  |
> |  |--1.29%--__mcount_internal
> |  |
> |  |
> |  |
> |   --0.72%--ReorderBufferXidHasBaseSnapshot
>

It looks like you've run the benchmark test with assertions enabled,
is that right? I can see AssertTXNLsnOrder() in the profiles.

> As the number of operations per transaction increases, this cost
> should reduce further as we will set base_snapshot only once per
> transaction. Now, we can try to micro-optimize this by passing
> ReorderBufferTXN, as that is computed before we invoke
> SnapBuildProcessChange, but not sure how much it will change. However,
> the bigger question is, is it really worth optimizing this code path?
>
> If we really want to optimize this code path for the test Hou-San did,
> I see that SnapBuildCommitT

Re: Conflict detection for update_deleted in logical replication

2025-04-24 Thread shveta malik
On Thu, Apr 24, 2025 at 6:11 PM Zhijie Hou (Fujitsu)
 wrote:

> > Few comments for patch004:
> > Config.sgml:
> > 1)
> > +   
> > +Maximum duration (in milliseconds) for which conflict
> > +information can be retained for conflict detection by the apply 
> > worker.
> > +The default value is 0, indicating that conflict
> > +information is retained until it is no longer needed for detection
> > +purposes.
> > +   
> >
> > IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> > catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> > slot. After this, the apply worker takes a nap and begins a new xid update 
> > cycle.
> > Now, let’s say the next candidate_xid is 200, but this time the subscriber 
> > fails
> > to keep up and exceeds max_conflict_retention_duration. As a result, it sets
> > oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips
> > updating the slot’s xmin.
>
> If the time exceeds the max_conflict_retention_duration, the launcher would
> Invalidate the slot, instead of skipping updating it. So the conflict 
> info(e.g.,
> dead tuples) would not be retained anymore.
>

The launcher will not invalidate the slot until all subscriptions have
stopped conflict_info retention. So the dead tuples needed for a
particular apply worker's oldest_xmin could be retained for much longer
than the configured duration. If the other apply workers are actively
catching up with the primary, they should keep advancing the shared
slot's xmin; but if the slot's xmin stays the same for, say,
15min+15min+15min across 3 apply workers (assuming they mark themselves
with stop_conflict_retention one after another and the slot's xmin has
not been advanced), then the first apply worker that marked itself with
stop_conflict_retention still has access to its oldest_xmin's data for
45 minutes instead of 15 (where max_conflict_retention_duration = 15
minutes). Please let me know if my understanding is wrong.

> > However, the previous xmin value (100) is still there
> > in the slot, causing its data to be retained beyond the
> > max_conflict_retention_duration. The xid 200 which actually honors
> > max_conflict_retention_duration was never marked for retention. If my
> > understanding is correct, then the documentation doesn’t fully capture this
> > scenario.
>
> As mentioned above, the strategy here is to invalidate the slot.

Please consider the case with multiple subscribers. Sorry if I failed
to mention in my previous email that it was a multi-sub case.

thanks
Shveta




Re: Fix premature xmin advancement during fast forward decoding

2025-04-24 Thread Amit Kapila
On Fri, Apr 25, 2025 at 8:14 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Fri, Apr 25, 2025 at 5:44 AM Masahiko Sawada wrote:
> > On Tue, Apr 22, 2025 at 12:06 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > Hi,
> > >
> > > When analyzing some issues in another thread[1], I found a bug that
> > > fast forward decoding could lead to premature advancement of
> > > catalog_xmin, resulting in required catalog data being removed by vacuum.
> > >
> > > The issue arises because we do not build a base snapshot when decoding
> > > changes during fast forward mode, preventing reference to the minimum
> > > transaction ID that remains visible in the snapshot when determining
> > > the candidate for catalog_xmin. As a result, catalog_xmin was directly
> > > advanced to the oldest running transaction ID found in the latest
> > running_xacts record.
> > >
> > > In code-level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot
> > > could not be reached during fast forward decoding, resulting in
> > > rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin,
> > > the system attempts to refer to rb->txns_by_base_snapshot_lsn via
> > > ReorderBufferGetOldestXmin(). However, since
> > > rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using
> > > running->oldestRunningXid as the candidate for catalog_xmin. For
> > > reference, see the details in SnapBuildProcessRunningXacts.
> >
> > I agree with your analysis.
> >
> > > To fix this, I think we can allow the base snapshot to be built during
> > > fast forward decoding, as implemented in the patch 0001 (We already
> > > built base snapshot in fast-forward mode for logical message in
> > logicalmsg_decode()).
> > >
> > > I also explored the possibility of further optimizing the fix to
> > > reduce the overhead associated with building a snapshot in
> > > fast-forward mode. E.g., to maintain a single base_snapshot_xmin
> > > rather than generating a full snapshot for each transaction, and use
> > > this base_snapshot_xmin when determining the candidate catalog_xmin.
> > > However, I am not feeling confident about maintaining some
> > > fast-forward dedicated fields in common structures and perhaps employing
> > different handling for catalog_xmin.
> > >
> > > Moreover, I conducted a basic test[2] to test the patch's impact,
> > > noting that advancing the slot incurs roughly a 4% increase in
> > > processing time after applying the patch, which appears to be
> > > acceptable.  Additionally, the cost associated with building the
> > > snapshot via SnapBuildBuildSnapshot() did not show up in the profile.
> >
> > Where did the regression come from? I think that even with your patch
> > SnapBuildBuildSnapshot() would be called only a few times in the workload.
>
> AFAICS, the primary cost arises from the additional function
> invocations/executions. Functions such as SnapBuildProcessChange,
> ReorderBufferSetBaseSnapshot, and ReorderBufferXidHasBaseSnapshot are invoked
> more frequently after the patch. Although these functions aren't inherently
> expensive, the scenario I tested involved numerous transactions starting with
> just a single insert each. This resulted in a high frequency of calls to these
> functions.
>

Yeah, I see below from the patched profile.

|--4.65%--SnapBuildProcessChange
|  |
|  |
|  |
|  |--2.08%--ReorderBufferSetBaseSnapshot
|  |
|  |  |
|  |
|  |   --1.32%--__mcount_internal
|  |
|  |
|  |
|  |--1.29%--__mcount_internal
|  |
|  |
|  |
|   --0.72%--ReorderBufferXidHasBaseSnapshot

As the number of operations per transaction increases, this cost
should reduce further as we will set base_snapshot only once per
transaction. Now, we can try to micro-optimize this by passing
ReorderBufferTXN, as that is computed before we invoke
SnapBuildProcessChange, but not sure how much it will change. However,
the bigger question is, is it really worth optimizing this code path?

If we really want to optimize this code path for the test Hou-San did,
I see that SnapBuildCommitTxn() has a bigger overhead, so we should
investigate whether we really need it for the fast-forward mode, where
we won't process changes. I am of the opinion that we need to pay this
additional cost of setting base_snapshot for the correctness of the
fast-forward mode.

-- 
With Regards,
Amit Kapila.




Re: Conflicting updates of command progress

2025-04-24 Thread Sami Imseih
> pgstat_progress_start_command() is called twice: First with
> cmdtype=PROGRESS_COMMAND_CLUSTER, second with
> PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
> in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
> reindex_relation() -> reindex_index().
>
> It does not matter though, the current design only expects one command.

When I set a breakpoint on pgstat_progress_start_command and ran a
CLUSTER on a table with a few indexes, I only saw start_command being
called once for PROGRESS_COMMAND_CLUSTER. Then I went back in the
code and realized that reindex_index when called via the CLUSTER command
has progress set to false. So as it stands now, only PROGRESS_COMMAND_CLUSTER
is effectively started when CLUSTER is called.

```
if (progress)
{
const int progress_cols[] = {
PROGRESS_CREATEIDX_COMMAND,
PROGRESS_CREATEIDX_INDEX_OID
};
const int64 progress_vals[] = {
PROGRESS_CREATEIDX_COMMAND_REINDEX,
indexId
};

pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
heapId);
pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
}
```
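
(For anyone wanting to confirm this from SQL rather than a debugger, a minimal check — run from a second session while CLUSTER is in progress — would be something like the following; only the CLUSTER progress view should report the backend.)

```sql
-- Only pg_stat_progress_cluster should show the backend running CLUSTER;
-- pg_stat_progress_create_index stays empty for the CLUSTER-driven reindex.
SELECT pid, command, phase FROM pg_stat_progress_cluster;
SELECT pid, command, phase FROM pg_stat_progress_create_index;
```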

--
Sami




Re: RFC: Additional Directory for Extensions

2025-04-24 Thread David E. Wheeler
On Mar 19, 2025, at 13:29, David E. Wheeler  wrote:

> I think this should at least be documented, but generally feels unexpected to 
> me. I’ve attached a patch that fleshes out the docs, along with an example of 
> setting `extension_control_path` and `dynamic_library_path` to use the 
> locations. It might not have the information right about the need for 
> “postgresql” or “pgsql” in the path. Back in 2003[1] it was just “postgres”, 
> but I couldn’t find the logic for it just now.

Here’s a rebase.

Best,

David



v2-0001-Flesh-out-docs-for-the-prefix-make-variable.patch
Description: Binary data


signature.asc
Description: Message signed with OpenPGP


Re: What's our minimum supported Python version?

2025-04-24 Thread Jacob Champion
On Thu, Apr 24, 2025 at 7:59 AM Tom Lane  wrote:
> I'm still content with the idea of deciding that 3.6 is now our
> cutoff.

Seems like no one is pushing hard for an earlier version, yet, so
here's a patch with your suggested wording from upthread. I'm not sure
if this meets Peter's request for precision. (Though I'm not really
excited about documenting more precision than we are testing for...)

> If someone comes along and complains because
> oauth_server.py doesn't work on an older version, it's up to them
> to provide a patch.

As an aside, EL7's standard libcurl is too old to pass our current
configure checks, so I think someone would really have to go out of
their way to get to that point.

--Jacob


0001-Bump-the-minimum-supported-Python-version-to-3.6.8.patch
Description: Binary data


Re: Does RENAME TABLE rename associated identity sequence?

2025-04-24 Thread Isaac Morland
On Thu, 24 Apr 2025 at 05:53, Ashutosh Bapat 
wrote:


> If there's any problem, IMO, ALTER TABLE ... RENAME ... should rename the
> sequence too since the identity sequences are created implicitly when the
> table is created, so they should be renamed implicitly. We should not
> require WITH SEQUENCE clause.
>

My concern would be what happens if the new sequence name is not available.
I suppose the simplest behaviour might be to skip renaming the sequence in
that case, perhaps raising a warning.
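
(Until something like this is implemented, the workaround is a manual rename; a minimal sketch with illustrative table/column/sequence names — the sequence name can be looked up first:)

```sql
-- Find the implicit sequence backing the identity column, then rename it
-- to follow the table's new name.
SELECT pg_get_serial_sequence('orders_new', 'id');   -- e.g. public.orders_id_seq
ALTER SEQUENCE orders_id_seq RENAME TO orders_new_id_seq;
```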


Re: Improve verification of recovery_target_timeline GUC.

2025-04-24 Thread Michael Paquier
On Thu, Apr 24, 2025 at 03:34:29PM +, David Steele wrote:
> Done. This means there are no commas in the upper bound but I don't think
> it's a big deal and it more closely matches other GUC messages.

One thing that I have double-checked is if we have similar messages
for GUCs that are defined as strings while requiring some range
checks.  While we have similar messages (tablesample code is one,
opclass a second), we don't have a similar thing for GUCs.  I'd bet that
this would be a win in the long-run anyway.

> I'm also now using a single cluster for all three tests rather than creating
> a new one for each test. This is definitely more efficient.
> 
> Having said that, I think these tests are awfully expensive for a single
> GUC. Unit tests would definitely be preferable but that's not an option for
> GUCs, so far as I know.

On my laptop, 003_recovery_targets.pl increases from 8.2s to 8.7s,
which seems OK here for the coverage we have.

Undoing the changes in xlogrecovery.c causes the tests to fail, so we
are good.

"invalid recovery startup fails" is used three times repeatedly for
the tests with bogus and out-of-bound values.  I'd suggest to make
these more verbose, with three extras "bogus value", "lower bound
check" and "upper bound check" added.

Side thing noted while reading the area: check_recovery_target_xid()
does not have any GUC_check_errdetail() giving details for the EINVAL
and ERANGE cases.  Your addition for recovery_target_timeline is a
good idea, so perhaps we could do the same there?  No need to do that,
that's just something I've noticed in passing..

And you are following the fat-comma convention for the command lines,
thanks for doing that.  Note that I'm not planning to do anything here
for v18 now that we are in feature freeze, these would be for v19 once
the branch opens.
--
Michael


signature.asc
Description: PGP signature


Re: Fix slot synchronization with two_phase decoding enabled

2025-04-24 Thread Masahiko Sawada
On Thu, Apr 24, 2025 at 10:48 AM Masahiko Sawada  wrote:
>
> On Thu, Apr 24, 2025 at 2:24 AM Nisha Moond  wrote:
> >
> > On Thu, Apr 24, 2025 at 12:28 PM Amit Kapila  
> > wrote:
> > >
> > > On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
> > > > >  wrote:
> > > > > >
> > > > > > On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:
> > > > > > >
> > > > > > > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Approach 2
> > > > > > > > --
> > > > > > > >
> > > > > > > > Instead of disallowing the use of two-phase and failover 
> > > > > > > > together, a more
> > > > > > > > flexible strategy could be only restrict failover for slots 
> > > > > > > > with two-phase
> > > > > > > > enabled when there's a possibility of existing prepared 
> > > > > > > > transactions before
> > > > > > > the
> > > > > > > > two_phase_at that are not yet replicated. During slot creation 
> > > > > > > > with
> > > > > > > two-phase
> > > > > > > > and failover, we could check for any decoded prepared 
> > > > > > > > transactions when
> > > > > > > > determining the decoding start point 
> > > > > > > > (DecodingContextFindStartpoint). For
> > > > > > > > subsequent attempts to alter failover to true, we ensure that 
> > > > > > > > two_phase_at is
> > > > > > > > less than restart_lsn, indicating that all prepared 
> > > > > > > > transactions have been
> > > > > > > > committed and replicated, thus the bug would not happen.
> > > > > > > >
> > > > > > > > pros:
> > > > > > > >
> > > > > > > > This method minimizes restrictions for users. Especially during 
> > > > > > > > slot creation
> > > > > > > > with (two_phase=on, failover=on), as it’s uncommon for 
> > > > > > > > transactions to
> > > > > > > prepare
> > > > > > > > during consistent snapshot creation, the restriction becomes 
> > > > > > > > almost
> > > > > > > > unnoticeable.
> > > > > > >
> > > > > > > I think this approach can work for the transactions that are 
> > > > > > > prepared
> > > > > > > while the slot is created. But if I understand the problem 
> > > > > > > correctly,
> > > > > > > while the initial table sync is performing, the slot's two_phase 
> > > > > > > is
> > > > > > > still false, so we need to deal with the transactions that are
> > > > > > > prepared during the initial table sync too. What do you think?
> > > > > > >
> > > > > >
> > > > > > Yes, I agree that we need to restrict this case too. Given that we 
> > > > > > haven't
> > > > > > started decoding when setting two_phase=true during 
> > > > > > CreateDecodingContext()
> > > > > > after tablesync, we could check prepared transactions afterwards 
> > > > > > during
> > > > > > decoding. This could involve reporting an ERROR when skipping a 
> > > > > > prepared
> > > > > > transaction during decoding if its prepare LSN is less than 
> > > > > > two_phase_at.
> > > > > >
> > > > >
> > > > > It will make it difficult for users to detect it as this happens at a
> > > > > later point of time.
> > > > >
> > > > > > Alternatively, a simpler method would be to prevent this situation 
> > > > > > entirely
> > > > > > during the CREATE SUBSCRIPTION command. For example, we could 
> > > > > > restrict slots
> > > > > > created with failover set to true and twophase is later modified to 
> > > > > > true after
> > > > > > tablesync. Although the simpler check is more user-visible, it may 
> > > > > > offer less
> > > > > > flexibility.
> > > > > >
> > > > >
> > > > > I agree with your point, but OTOH, I am also afraid of adding too many
> > > > > smart checks in the back-branch. If we follow what you say here, then
> > > > > users have the following ways in PG17 to enable both failover and
> > > > > two_phase. (a) During Create Subscription, users can set both
> > > > > 'failover' and 'two_phase', if 'copy_data' is false, or (b), if
> > > > > 'copy_data' is true, during Create Subscription, then users can enable
> > > > > 'two_phase' and wait for it to be enabled. Then use Alter Subscription
> > > > > to set 'failover'.
> > > >
> > > > Yet another idea would be to disallow enabling both two_phase and
> > > > failover at CREATE SUBSCRIPTION regardless of copy_data value and to
> > > > add check when enabling failover for the two_phase-enabled-slots. For
> > > > example, users who want to enable both need to do two steps:
> > > >
> > > > 1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
> > > > 2. ALTER SUBSCRIPTION with failover = true.
> > > >
> > > > At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
> > > > the two_phase states is ready (and possibly check if the slot's
> > > > two_phase has been enabled too), otherwise raises an ERROR. Then, when
> > > > the publisher enables the fai

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-24 Thread Michael Paquier
On Wed, Apr 23, 2025 at 11:59:26AM -0400, Robert Haas wrote:
> That's nice to know, but I think the key question is not so much what
> the feature costs when it is used but what it costs when it isn't
> used. If we implement a system where we don't let
> dictionary-compressed zstd datums leak out of tables, that's bound to
> slow down a CTAS from a table where this feature is used, but that's
> kind of OK: the feature has pros and cons, and if you don't like those
> tradeoffs, you don't have to use it. However, it sounds like this
> could also slow down inserts and updates in some cases even for users
> who are not making use of the feature, and that's going to be a major
> problem unless it can be shown that there is no case where the impact
> is at all significant. Users hate paying for features that they aren't
> using.

The cost of digesting a dictionary when decompressing sets of values
is also something I think we should worry about, FWIW (see [1]), as
digesting is documented as costly, so I think there is also an argument
for making the feature efficient when it is used.  That would hurt if a
sequential scan needs to detoast multiple blobs with the same dict.  If
we attach that on a per-value basis, wouldn't it imply that we need to
digest the dictionary every time a blob is decompressed?  This
information could be cached, but it seems a bit weird to me to invent a
new level of relation caching for what could be attached as a relation
attribute option in the relcache.  If a dictionary gets trained with a
new sample of values, we could rely on the invalidation to pass the new
information.

Based on what I'm reading (and I know very little about the topic, so I
may be wrong), does it even make sense to allow multiple dictionaries
to be used in a single attribute?  Of course that may depend on the
JSON blob patterns a single attribute is dealing with, but I'm not sure
that this is worth the extra complexity it creates.

> I wonder if there's a possible design where we only allow
> dictionary-compressed datums to exist as top-level attributes in
> designated tables to which those dictionaries are attached; and any
> time you try to bury that Datum inside a container object (row, range,
> array, whatever) detoasting is forced. If there's a clean and
> inexpensive way to implement that, then you could avoid having
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.

Interesting, not sure.

FWIW, I'd still try to focus on making varatt more extensible with
plain zstd support first, rather than diving into all these details.
We are going to need it anyway.

[1]: https://facebook.github.io/zstd/zstd_manual.html#Chapter10
--
Michael


signature.asc
Description: PGP signature


Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-04-24 Thread Matthias van de Meent
On Fri, 21 Mar 2025 at 17:14, Matthias van de Meent
 wrote:
> Attached is v10, which polishes the previous patches, and adds a patch
> for nbtree to use the new visibility checking strategy so that it too
> can release its index pages much earlier, and adds a similar
> visibility check test to nbtree.

And here's v12. v11 (skipped) would've been a rebase, but after
finishing the rebase I noticed a severe regression in btree's IOS with
the new code, so v12 here applies some optimizations which reduce the
overhead of the new code.

Given its TableAM api changes it'd be nice to have a review on 0001,
though the additions could be rewritten to not (yet) add
TableAMRoutine.

I think patches 1, 2 and 3 are relevant to PG18 (as long as we don't
have a beta, and this is only a bit more than a bugfix). Patch 4 is
for PG19 to get btree to implement the new API, too, and patch 5
contains tests similar to the bitmap scan tests, validating that IOS
doesn't block VACUUM but still returns correct results.

I'll try to figure out a patch that's backpatchable, as alternative to
patches 2 and 3, or at least for back-patching into PG17-. That will
arrive separately, though.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


v12-0002-GIST-Fix-visibility-issues-in-IOS.patch
Description: Binary data


v12-0005-Test-for-IOS-Vacuum-race-conditions-in-index-AMs.patch
Description: Binary data


v12-0001-IOS-TableAM-Support-AM-specific-fast-visibility-.patch
Description: Binary data


v12-0004-NBTree-Reduce-Index-Only-Scan-pin-duration.patch
Description: Binary data


v12-0003-SP-GIST-Fix-visibility-issues-in-IOS.patch
Description: Binary data


Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-04-24 Thread Alvaro Herrera
On 2025-Apr-09, jian he wrote:

> hi.
> 
> attached patch is for address pg_dump inconsistency
> when parent is "not null not valid" while child is "not null".

Here's my take on this patch.  We don't really need the
notnull_parent_invalid flag; in flagInhAttrs we can just set "islocal"
to convince getTableAttrs to print the constraint.  This also fixes the
bug that, in getTableAttrs(), you handled only the case where
shouldPrintColumn() is true and not the other one.

I also added test cases to pg_dump/t/002_pg_dump.pl to verify that the
output was correct in those cases.  In constraints.sql I added a couple
of tables to ensure that the pg_upgrade handling (the pg_dump
--binary-upgrade mode) is tested as well.

Looking at the surrounding code in flagInhAttrs I noticed that we're
mishandling this case:

create table parent1 (a int);
create table parent2 (a int);
create table child () inherits (parent1, parent2);
alter table parent1 add not null a;
alter table parent2 add not null a not valid;

We print the constraint for table child for no apparent reason.
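
(A quick way to see what pg_dump is working from in that case — assuming a current master build where not-null constraints are catalogued in pg_constraint with contype = 'n':)

```sql
-- Inspect the child's not-null constraint flags that drive pg_dump's decision.
SELECT conname, convalidated, conislocal, coninhcount
FROM pg_constraint
WHERE conrelid = 'child'::regclass AND contype = 'n';
```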

Patch 0002 is a part of your proposed patch that I don't think we need,
but I'm open to hearing arguments about why we do, preferably with some
test cases.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay que recordar que la existencia en el cosmos, y particularmente la
elaboración de civilizaciones dentro de él no son, por desgracia,
nada idílicas" (Ijon Tichy)
>From 00c46ba90a24a99857cdc849943cf3fd7b96bd8e Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 9 Apr 2025 13:07:58 +0800
Subject: [PATCH 1/2] Fix pg_dump for inherited validated not-null constraints

When a child constraint is validated and its parent constraint isn't,
pg_dump requires special handling.
---
 src/bin/pg_dump/common.c  | 12 +++
 src/bin/pg_dump/pg_dump.c | 12 ++-
 src/bin/pg_dump/pg_dump.h |  1 +
 src/bin/pg_dump/t/002_pg_dump.pl  | 41 +--
 src/test/regress/expected/constraints.out | 24 +
 src/test/regress/sql/constraints.sql  | 14 
 6 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 56b6c368acf..b0973d44f32 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -546,6 +546,18 @@ flagInhAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTables
 		parent->notnull_constrs[inhAttrInd] != NULL)
 		foundNotNull = true;
 
+	/*
+	 * For validated not-null constraints in child tables which
+	 * derive from a parent constraint marked NOT VALID, we
+	 * artificially mark the child constraint as local so that
+	 * it is printed independently.  Failing to do this would
+	 * result in the child constraint being restored as NOT
+	 * VALID.
+	 */
+	if (fout->remoteVersion >= 18 &&
+		parent->notnull_invalid[inhAttrInd])
+		tbinfo->notnull_islocal[j] = true;
+
 	foundDefault |= (parentDef != NULL &&
 	 strcmp(parentDef->adef_expr, "NULL") != 0 &&
 	 !parent->attgenerated[inhAttrInd]);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 105e917aa7b..0e2485479f8 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -9255,6 +9255,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 		tbinfo->attfdwoptions = (char **) pg_malloc(numatts * sizeof(char *));
 		tbinfo->attmissingval = (char **) pg_malloc(numatts * sizeof(char *));
 		tbinfo->notnull_constrs = (char **) pg_malloc(numatts * sizeof(char *));
+		tbinfo->notnull_invalid = (bool *) pg_malloc(numatts * sizeof(bool));
 		tbinfo->notnull_noinh = (bool *) pg_malloc(numatts * sizeof(bool));
 		tbinfo->notnull_islocal = (bool *) pg_malloc(numatts * sizeof(bool));
 		tbinfo->attrdefs = (AttrDefInfo **) pg_malloc(numatts * sizeof(AttrDefInfo *));
@@ -9280,6 +9281,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 			tbinfo->attlen[j] = atoi(PQgetvalue(res, r, i_attlen));
 			tbinfo->attalign[j] = *(PQgetvalue(res, r, i_attalign));
 			tbinfo->attislocal[j] = (PQgetvalue(res, r, i_attislocal)[0] == 't');
+			tbinfo->notnull_invalid[j] = false; /* it only change in
+ * determineNotNullFlags */
 
 			/* Handle not-null constraint name and flags */
 			determineNotNullFlags(fout, res, r,
@@ -9756,6 +9759,12 @@ determineNotNullFlags(Archive *fout, PGresult *res, int r,
 		else
 			appendPQExpBuffer(*invalidnotnulloids, ",%s", constroid);
 
+		/*
+		 * Track when a parent constraint is invalid for the cases where a
+		 * child constraint has been validated independenly.
+		 */
+		tbinfo->notnull_invalid[j] = true;
+
 		/* nothing else to do */
 		tbinfo->notnull_constrs[j] = NULL;
 		return;
@@ -9763,10 +9772,11 @@ determineNotNullFlags(Archive *fout, PGresult *res, int r,
 
 	/*
 	 * notnull_noinh is straight from the query result. notnull_islocal al

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-24 Thread Jacob Champion
On Wed, Apr 23, 2025 at 1:13 PM Christoph Berg  wrote:
> Uhm, so far the plan was to have one "libpq-oauth" package, not several.

I think the system is overconstrained at that point. If you want to
support clients that delay-load the ABI they're compiled against,
_and_ have them continue to work seamlessly after the system has
upgraded the ABI underneath them, without restarting the client... is
there any option other than side-by-side installation?

> Since shipping a single libpq5.deb package for all PG majors has worked well
> for the past decades, I wouldn't want to complicate that now.

I'm not sure if it's possible to ship a client-side module system
without something getting more complicated, though... I'm trying hard
not to overcomplicate it for you, but I also don't think the
complexity is going to remain the same.

--Jacob




Re: [PATCH] dynahash: add memory allocation failure check

2025-04-24 Thread Andrey Borodin



> On 24 Apr 2025, at 19:10, Tom Lane  wrote:
> 
> I thought about that but intentionally left it as-is, because that
> would force zeroing of the space reserved for the hashtable name too.
> That's unnecessary, and since it'd often be odd-sized it might result
> in a less efficient fill loop.

Well, that's just a few hundred bytes at most. But I agree that makes sense.


Best regards, Andrey Borodin.







Re: Typos in the comment for the estimate_multivariate_ndistinct()

2025-04-24 Thread Tender Wang
Daniel Gustafsson  wrote on Wed, Apr 16, 2025 at 22:20:

> > On 14 Apr 2025, at 15:34, Tender Wang  wrote:
> >
> > Hi,
> >
> > While reading the estimate_multivariate_ndistinct(),
> >  I think "If a match it found, *varinfos is
> >  * updated to remove the list of matched varinfos"
> > should be "If a match is found, *varinfos is
> >  * updated to remove the list of matched varinfos"
> > I've attached a patch for that.
>
> Seems like a correct change.
>

Thanks for checking.


-- 
Thanks,
Tender Wang


Re: AIX support

2025-04-24 Thread Thomas Munro
On Mon, Apr 7, 2025 at 10:04 PM Heikki Linnakangas  wrote:
> I'm surprised how big the difference is, because I actually expected the
> compiler to detect the memory-zeroing loop and replace it with some
> fancy vector instructions (does powerpc have any?).

It certainly does, and we've played with explicit AltiVec
vectorisation before with the idea that our abstraction over x86 and
ARM instructions could be extended to POWER, but that was just trying
stuff and learning while wondering if those abstractions are general
enough and how well all the instructions match up.  No one with modern
hardware and a vested interest has seriously investigated it.  That's
independent of automatic vectorisation done by the compiler or libc,
and I guess it would not really be AIX-specific since it would also
apply to Linux on POWER.  One of those patches is linked from this
small list of "ideas for a future PostgreSQL/AIX maintainer":

https://wiki.postgresql.org/wiki/AIX