Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-22 Thread Andres Freund
Hi,

On 2025-03-22 16:14:11 -0400, Andres Freund wrote:
> On 2025-03-22 22:00:00 +0200, Alexander Lakhin wrote:
> > @@ -24,14 +24,14 @@
> >  SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1;
> >   count
> >  ---
> > -    23
> > +    18
> >  (1 row)
> 
> Is it possible that this is the bug reported here:
> https://postgr.es/m/873c33c5-ef9e-41f6-80b2-2f5e11869f1c%40garret.ru
> 
> I.e. that the all-visible logic in bitmap heapscans is fundamentally broken?
> 
> I can reproduce different results on a fast machine by just putting a VACUUM
> FREEZE bmscantest after the CREATE INDEXes in bitmapops.sql. After I disable
> the all-visible logic in bitmapheap_stream_read_next(), I can't observe such a
> difference anymore.

Hm, it's clearly related to the all-visible path, but I think the bug is
actually independent of what was reported there. The bug you reported is
perfectly reproducible, without any concurrency, after all.

Here's a smaller reproducer:

CREATE TABLE bmscantest (a int, t text);

INSERT INTO bmscantest
  SELECT (r%53), 'foo'
  FROM generate_series(1,7) r;

CREATE INDEX i_bmtest_a ON bmscantest(a);
set enable_indexscan=false;
set enable_seqscan=false;

-- Lower work_mem to trigger use of lossy bitmaps
set work_mem = 64;

SELECT count(*) FROM bmscantest WHERE a = 1;
vacuum freeze bmscantest;
SELECT count(*) FROM bmscantest WHERE a = 1;

-- clean up
DROP TABLE bmscantest;


The first SELECT reports 1321, the second 572 tuples.
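To double-check that work_mem = 64 really forces lossy bitmaps here, EXPLAIN
(ANALYZE) can be inspected for a nonzero lossy block count (a sketch; the
actual counts will vary with the data):

```sql
-- The Bitmap Heap Scan node should report a line like
-- "Heap Blocks: exact=N lossy=M" with M > 0 when the bitmap is lossy.
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT count(*) FROM bmscantest WHERE a = 1;
```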

Greetings,

Andres Freund




Re: Update LDAP Protocol in fe-connect.c to v3

2025-03-22 Thread Tom Lane
Andrew Jackson  writes:
> Currently the LDAP usage in fe-connect.c does not explicitly set the
> protocol version to v3. This causes issues with many LDAP servers as they
> will often require clients to use the v3 protocol and disallow any use of
> the v2 protocol.

This is the first complaint I can recall hearing about that, so
exactly which ones are "many"?  Also, are we really sufficiently
compliant with v3 that just adding this bit is enough?

> One further note is that I do not currently see any test coverage over the
> LDAP functionality in `fe-connect.c`. I am happy to add that to this patch
> if needed.

src/test/ldap/ doesn't do it for you?

regards, tom lane




Re: Proposal - Allow extensions to set a Plan Identifier

2025-03-22 Thread Sami Imseih
> On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
> > planId actually looks less controversial than queryId or plan_node_id. At
> > the same time, it is not free from the different logic that extensions may
> > incorporate into this value: I can imagine, for example, the attempt of
> > uniquely numbering plans related to the same queryId, plain hashing of the
> > plan tree, hashing without constants, etc.

I think plan_node_id is probably the least controversial because that value
comes straight from core, and different extensions cannot have their own
interpretation of what that value could be.

Computing a plan_id is perhaps even more open to interpretation than
computing a query_id, which is why giving this ability to extensions
will be useful. Different extensions may choose to compute a plan_id
in different ways, but they should all be able to push this value into core,
and that is what this patch allows for.

FWIW, Lukas did start a Wiki [0] to open the discussion about what parts
of the plan should be used to compute a plan_id, and maybe in the
future we can compute a plan_id in core by default.

> > So, it seems that extensions may conflict with the same field. Are we sure
> > that will happen if they are loaded in different orders in neighbouring
> > backends?
>
> Depends on what kind of computation one wants to do, and surely
> without some coordination across the extensions you are using these
> cannot be shared, but it's no different from the concept of a query
> ID.

correct. This was also recently raised in the "making Explain extensible" [1]
thread. That is the nature of extensions, and coordination is required.

[0] https://wiki.postgresql.org/wiki/Plan_ID_Jumbling
[1] 
https://www.postgresql.org/message-id/CA%2BTgmoZ8sTodL-orXQm51_npNxuDAS86BS5kC8t0LULneShRbg%40mail.gmail.com
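As a sketch of how an extension could use this facility, a planner_hook can
compute whatever plan identifier it likes and store it in the PlannedStmt,
letting core publish it to the backend entry. Field and function names below
follow the patch under discussion and may differ from what is eventually
committed; hash_plan() is a placeholder for whichever jumbling policy the
extension chooses:

```c
#include "postgres.h"

#include "nodes/params.h"
#include "optimizer/planner.h"

/* Extension-defined jumbling policy; a placeholder here. */
static uint64 hash_plan(PlannedStmt *stmt);

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
my_planner(Query *parse, const char *query_string,
		   int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *result;

	if (prev_planner_hook)
		result = prev_planner_hook(parse, query_string,
								   cursorOptions, boundParams);
	else
		result = standard_planner(parse, query_string,
								  cursorOptions, boundParams);

	/*
	 * Push the extension's plan identifier into the PlannedStmt; per the
	 * discussion above, core (not the extension) then reports it via
	 * pgstat_report_plan_id() in non-force mode.
	 */
	result->planId = hash_plan(result);

	return result;
}
```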

--
Sami Imseih
Amazon Web Services (AWS)




Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

2025-03-22 Thread Ilia Evdokimov
After considering the opinions expressed in this discussion, I tend to 
agree more with David. If we add info about estimated unique keys and 
estimated capacity, then any additional information - such as 
evict_ratio and hit_ratio - can also be calculated, as EXPLAIN ANALYZE 
provides all the necessary details to compute these values.


For now, I’m attaching a rebased v4 patch, which includes the fix for 
ExplainPropertyText.


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From b60e508583b1cb7f30814cef6f39c4b570693350 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Fri, 21 Mar 2025 11:52:29 +0300
Subject: [PATCH v4] Show ndistinct and est_entries in EXPLAIN for Memoize

---
 src/backend/commands/explain.c  | 16 
 src/backend/optimizer/path/costsize.c   |  3 +++
 src/backend/optimizer/plan/createplan.c | 10 +++---
 src/backend/optimizer/util/pathnode.c   |  3 +++
 src/include/nodes/pathnodes.h   |  2 ++
 src/include/nodes/plannodes.h   |  6 ++
 6 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 391b34a2af2..cb7e295ca8a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3628,6 +3628,22 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 	ExplainPropertyText("Cache Key", keystr.data, es);
 	ExplainPropertyText("Cache Mode", mstate->binary_mode ? "binary" : "logical", es);
 
+	if (es->costs)
+	{
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Estimated Capacity: %u  Estimated Distinct Lookup Keys: %0.0f\n",
+			((Memoize *) plan)->est_entries,
+			((Memoize *) plan)->est_unique_keys);
+		}
+		else
+		{
+			ExplainPropertyUInteger("Estimated Capacity", "", ((Memoize *) plan)->est_entries, es);
+			ExplainPropertyFloat("Estimated Distinct Lookup Keys", "", ((Memoize *) plan)->est_unique_keys, 0, es);
+		}
+	}
+
 	pfree(keystr.data);
 
 	if (!es->analyze)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f6f77b8fe19..e3abf9ccc26 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2604,6 +2604,9 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath,
 	mpath->est_entries = Min(Min(ndistinct, est_cache_entries),
 			 PG_UINT32_MAX);
 
+	/* Remember ndistinct for a potential EXPLAIN later */
+	mpath->est_unique_keys = ndistinct;
+
 	/*
 	 * When the number of distinct parameter values is above the amount we can
 	 * store in the cache, then we'll have to evict some entries from the
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 75e2b0b9036..596b82c5dc9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -284,7 +284,8 @@ static Material *make_material(Plan *lefttree);
 static Memoize *make_memoize(Plan *lefttree, Oid *hashoperators,
 			 Oid *collations, List *param_exprs,
 			 bool singlerow, bool binary_mode,
-			 uint32 est_entries, Bitmapset *keyparamids);
+			uint32 est_entries, Bitmapset *keyparamids,
+			double est_unique_keys);
 static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
  int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations,
  int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations,
@@ -1703,7 +1704,8 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
 
 	plan = make_memoize(subplan, operators, collations, param_exprs,
 		best_path->singlerow, best_path->binary_mode,
-		best_path->est_entries, keyparamids);
+		best_path->est_entries, keyparamids,
+		best_path->est_unique_keys);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
@@ -6636,7 +6638,8 @@ materialize_finished_plan(Plan *subplan)
 static Memoize *
 make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations,
 			 List *param_exprs, bool singlerow, bool binary_mode,
-			 uint32 est_entries, Bitmapset *keyparamids)
+			uint32 est_entries, Bitmapset *keyparamids,
+			double est_unique_keys)
 {
 	Memoize*node = makeNode(Memoize);
 	Plan	   *plan = &node->plan;
@@ -6654,6 +6657,7 @@ make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations,
 	node->binary_mode = binary_mode;
 	node->est_entries = est_entries;
 	node->keyparamids = keyparamids;
+	node->est_unique_keys = est_unique_keys;
 
 	return node;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 93e73cb44db..1fbcda99067 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1701,6 +1701,9 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 	Assert(enable_memoize);
 	pathnode->path.disabled_nodes = subpath->disabled_nodes;
 
+	/* Estimated numb

Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Michael Paquier
On Sat, Mar 22, 2025 at 09:04:19PM -0400, Tom Lane wrote:
> Right.  I'm arguing that that's good.  The proposed patch already
> obscures the difference between similar table names in different
> (temp) schemas, and I'm suggesting that taking that a bit further
> would be fine.
> 
> Note that if the tables we're considering don't have identical
> rowtypes, the queries would likely jumble differently anyway
> due to differences in Vars' varattno and vartype.

Not for the types AFAIK, the varattnos count in, but perhaps for the
same argument as previously it's just kind of OK?  Please see the
tests in the attached about that.

I've spent a few hours looking at the diffs of a pgss dump before and 
after the fact.  The reduction in the number of entries seems to come
mainly from tests where we do successive creates and drops of the
same table name.  There are quite a few of them for updatable views,
but it's not the only one.  A good chunk comes from tables with
simpler and rather generic names.

So your idea to use the relation name in eref while skipping the
column list looks kind of promising.  See the attached.  Thoughts?
--
Michael
From fd2bc0e26134b7d74fe33da99eab6623a9859f22 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 23 Mar 2025 15:33:05 +0900
Subject: [PATCH v4 1/2] Add support for custom_query_jumble at field-level

This option gives the possibility for query jumbling to force custom
jumbling policies specific to a field of a node.  When dealing with
complex node structures, this is simpler than having to enforce a custom
function across the full node.

Custom functions need to be defined as _jumble${node}_${field}, and are
given as input the parent node and the field value.  The field value is
not really required if we have the parent Node, but it makes custom
implementations easier to follow with field-specific definitions and
values.  The code generated by gen_node_support.pl uses a macro called
JUMBLE_CUSTOM(), that hides the objects specific to queryjumblefuncs.c.
---
 src/include/nodes/nodes.h |  4 
 src/backend/nodes/gen_node_support.pl | 15 +--
 src/backend/nodes/queryjumblefuncs.c  |  6 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d18044b4e650..adec68a45ab8 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -54,6 +54,7 @@ typedef enum NodeTag
  *   readfuncs.c.
  *
  * - custom_query_jumble: Has custom implementation in queryjumblefuncs.c.
+ *   Available as well as a per-field attribute.
  *
  * - no_copy: Does not support copyObject() at all.
  *
@@ -101,6 +102,9 @@ typedef enum NodeTag
  * - equal_ignore_if_zero: Ignore the field for equality if it is zero.
  *   (Otherwise, compare normally.)
  *
+ * - custom_query_jumble: Has custom implementation in queryjumblefuncs.c
+ *   applied for the field of a node.  Available as well as a node attribute.
+ *
  * - query_jumble_ignore: Ignore the field for the query jumbling.  Note
  *   that typmod and collation information are usually irrelevant for the
  *   query jumbling.
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 7e3f335ac09d..2d62ecb1d9d6 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -471,6 +471,7 @@ foreach my $infile (@ARGV)
 && $attr !~ /^read_as\(\w+\)$/
 && !elem $attr,
 qw(copy_as_scalar
+custom_query_jumble
 equal_as_scalar
 equal_ignore
 equal_ignore_if_zero
@@ -1283,12 +1284,17 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		my $t = $node_type_info{$n}->{field_types}{$f};
 		my @a = @{ $node_type_info{$n}->{field_attrs}{$f} };
 		my $query_jumble_ignore = $struct_no_query_jumble;
+		my $query_jumble_custom = 0;
 		my $query_jumble_location = 0;
 		my $query_jumble_squash = 0;
 
 		# extract per-field attributes
 		foreach my $a (@a)
 		{
+			if ($a eq 'custom_query_jumble')
+			{
+$query_jumble_custom = 1;
+			}
 			if ($a eq 'query_jumble_ignore')
 			{
 $query_jumble_ignore = 1;
@@ -1304,8 +1310,13 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		}
 
 		# node type
-		if (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
-			and elem $1, @node_types)
+		if ($query_jumble_custom)
+		{
+			# Custom function that applies to one field of a node.
+			print $jff "\tJUMBLE_CUSTOM($n, $f);\n";
+		}
+		elsif (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
+			   and elem $1, @node_types)
 		{
 			# Squash constants if requested.
 			if ($query_jumble_squash)
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 189bfda610aa..9b9cf6f34381 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -333,6 +333,12 @@ do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
 } while(0)

Re: Add Postgres module info

2025-03-22 Thread Tom Lane
I spent awhile reviewing the v5 patch, and here's a proposed v6.
Some notes:

* I didn't like depending on offsetof(Pg_magic_struct, module_extra)
to determine which parts of the struct are checked for compatibility.
It just seems way too easy to break that with careless insertion
of new fields, and such breakage might not cause obvious failures.
I think the right thing is to break out the ABI-checking fields as
their own sub-struct, rather than breaking out the new fields as a
sub-struct.

* I renamed the inquiry function to pg_get_loaded_modules, since
it only works on loaded modules but that's hardly clear from the
previous name.

* It is not clear to me what permission restrictions we should put
on pg_get_loaded_modules, but it is clear that "none" is the wrong
answer.  In particular, exposing the full file path of loaded modules
is against our rules: unprivileged users are not supposed to be able
to learn anything about the filesystem underneath the server.  (This
is why for instance an unprivileged user can't read the data_directory
GUC.)  In the attached I made the library path read as NULL unless the
user has pg_read_server_files, but I'm not attached to that specific
solution.  One thing not to like is that it's very likely that you'd
just get a row of NULLs and no useful info about a module at all.
Another idea perhaps could be to strip off the directory path and
maybe the filename extension if the user doesn't have privilege.
Or we could remove the internal permission check and instead gate
access to the function altogether with grantable EXECUTE privilege.
(This might be the right answer, since it's not clear that Joe
Unprivileged User should be able to know what modules are loaded; some
of them might have security implications.)  In any case, requiring
pg_read_server_files feels a little too strong, but I don't see an
alternative role I like better.  The EXECUTE-privilege answer would at
least let installations adjust the function's availability to their
liking.

* I didn't like anything about the test setup.  Making test_misc
dependent on other modules is a recipe for confusion, and perhaps for
failures in parallel builds.  (Yes, I see somebody already made it
depend on injection_points.  But doubling down on a bad idea doesn't
make it less bad.)  Also, the test would fail completely in an
installation that came with any preloaded modules, which hardly seems
like an improbable future situation.  I think we need to restrict what
modules we're looking at with a WHERE clause to prevent that.  After
some thought I went back to the upthread idea of just having
auto_explain as a test case.

Still TBD:

* I'm not happy with putting pg_get_loaded_modules into dfmgr.c.
It feels like the wrong layer to have a SQL-callable function,
and the large expansion in its #include list is evidence that we're
adding functionality that doesn't belong there.  But I'm not quite
sure where to put it instead.  Also, the naive way to do that would
require exporting DynamicFileList which doesn't feel nice either.
Maybe we could make dfmgr.c export some sort of iterator function?

* Should we convert our existing modules to use PG_MODULE_MAGIC_EXT?
I'm mildly in favor of that, but I think we'd need some automated way
to manage their version strings, and I don't know what that ought to
look like.  Maybe it'd be enough to make all the in-core modules use
PG_VERSION as their version string, but I think that might put a dent
in the idea of the version strings following semantic versioning
rules.

regards, tom lane

From 2f8e182dbd26e03a9568393307b64cb26b3842fd Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Sat, 22 Mar 2025 18:39:15 -0400
Subject: [PATCH v6] Introduce PG_MODULE_MAGIC_EXT macro.

This macro allows dynamically loaded shared libraries (modules) to
provide a wired-in module name and version, and possibly other
compile-time-constant fields in future.  This information can be
retrieved with the new pg_get_loaded_modules() function.

This feature is expected to be particularly useful for modules
that do not have any exposed SQL functionality and thus are
not associated with a SQL-level extension object.  But even for
modules that do belong to extensions, being able to verify the
actual code version can be useful.

Author: Andrei Lepikhov 
Reviewed-by: Yurii Rashkovskii 
Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68f...@gmail.com
---
 contrib/auto_explain/auto_explain.c|  5 +-
 contrib/auto_explain/t/001_auto_explain.pl | 11 +++
 doc/src/sgml/func.sgml | 24 +++
 doc/src/sgml/xfunc.sgml| 24 +++
 src/backend/utils/fmgr/dfmgr.c | 81 --
 src/include/catalog/pg_proc.dat|  7 ++
 src/include/fmgr.h | 62 ++---
 src/tools/pgindent/typedefs.list   |  1 +
 8 files changed, 196 insertions(+), 19 deletions(-)

diff --git a/

Update LDAP Protocol in fe-connect.c to v3

2025-03-22 Thread Andrew Jackson
Currently the LDAP usage in fe-connect.c does not explicitly set the
protocol version to v3. This causes issues with many LDAP servers as they
will often require clients to use the v3 protocol and disallow any use of
the v2 protocol. Further the other usage of LDAP in postgres (in
`backend/libpq/auth.c`) uses the v3 protocol.

This patch changes fe-connect.c so that it uses the v3 protocol similar to
`backend/libpq/auth.c`.

One further note is that I do not currently see any test coverage over the
LDAP functionality in `fe-connect.c`. I am happy to add that to this patch
if needed.
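The change in question amounts to a single ldap_set_option() call before any
operation is issued, mirroring what src/backend/libpq/auth.c already does. A
minimal standalone sketch (not the actual patch; the URL and error handling
are illustrative):

```c
#include <stdio.h>
#include <ldap.h>

/*
 * Force LDAPv3 before issuing any operation, since many servers reject
 * LDAPv2 binds.  fe-connect.c would do the same after ldap_init() /
 * ldap_initialize(), as the backend already does.
 */
int
main(void)
{
	LDAP	   *ld;
	int			ldapversion = LDAP_VERSION3;

	/* ldap_initialize() only parses the URL; no connection yet. */
	if (ldap_initialize(&ld, "ldap://localhost:389") != LDAP_SUCCESS)
	{
		fprintf(stderr, "ldap_initialize failed\n");
		return 1;
	}
	if (ldap_set_option(ld, LDAP_OPT_PROTOCOL_VERSION,
						&ldapversion) != LDAP_SUCCESS)
	{
		fprintf(stderr, "could not set LDAP protocol version\n");
		ldap_unbind_ext(ld, NULL, NULL);
		return 1;
	}
	/* ... proceed with ldap_search_ext_s() etc., as fe-connect.c does ... */
	ldap_unbind_ext(ld, NULL, NULL);
	return 0;
}
```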


Re: Proposal - Allow extensions to set a Plan Identifier

2025-03-22 Thread Michael Paquier
On Fri, Mar 21, 2025 at 09:25:24AM -0400, Sami Imseih wrote:
>> planner() is the sole place in the core code where the planner hook
>> can be called.  Shouldn't we have at least a call to
>> pgstat_report_plan_id() after planning a query?  At least that should
>> be the behavior I'd expect, where a module pushes a planId to a
>> PlannedStmt, then core publishes it to the backend entry in non-force
>> mode.
> 
> I agree. I was just thinking we rely on the exec_ routines to report the 
> plan_id
> at the start. But, besides the good reason you give, reporting
> (slightly) earlier is
> better for monitoring tools; as it reduces the likelihood they find an empty
> plan_id.

Yep.  pgstat_report_plan_id() is not something that extensions should
do, but they should report the planId in the PlannedStmt and let the
backend do the rest.

> Overall, v3 LGTM

Thanks.  If anybody has any objections and/or comments, now would be a
good time.  I'll revisit this thread at the beginning of next week.
--
Michael


signature.asc
Description: PGP signature


Re: meson and check-tests

2025-03-22 Thread Rustam ALLAKOV
Hi, 

please note that the same file `/src/tools/testwrap` is modified at the
same line number by this patch [1] in the same commitfest.

[1] https://commitfest.postgresql.org/patch/5602/

Regards,
Rustam Allakov

Re: speedup COPY TO for partitioned table.

2025-03-22 Thread vignesh C
On Tue, 11 Mar 2025 at 18:24, jian he  wrote:
>
> after my change:
> 
> COPY TO can be used only with plain tables, not views, and does not
> copy rows from child tables,
> however COPY TO can be used with partitioned tables.
> For example, in a table inheritance hierarchy, COPY table TO copies
> the same rows as SELECT * FROM ONLY table.
> The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
> of the rows in an inheritance hierarchy, or view.
> 

I found an issue with the patch:

-- Setup
CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'testdb', port '5432');
CREATE TABLE t1(id int) PARTITION BY RANGE(id);
CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
PARTITION BY RANGE(id);
CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
TO (15) SERVER myserver;

-- Create table in testdb
create table part2_1(id int);

-- Copy partitioned table data
postgres=#  copy t1 to stdout(header);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Stack trace for the same is:
#0  table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
#1  0x5daadf89eb9b in DoCopyTo (cstate=0x5daafa71e278) at copyto.c:1105
#2  0x5daadf8913f4 in DoCopy (pstate=0x5daafa6c5fc0,
stmt=0x5daafa6f20c8, stmt_location=0, stmt_len=25,
processed=0x7ffd3799c2f0) at copy.c:316
#3  0x5daadfc7a770 in standard_ProcessUtility
(pstmt=0x5daafa6f21e8, queryString=0x5daafa6f15c0 "copy t1 to
stdout(header);", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x5daafa6f25a8, qc=0x7ffd3799c660)
at utility.c:738

(gdb) f 0
#0  table_beginscan (rel=0x72b109f9aad8, snapshot=0x5daafa77e000,
nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:883
883 return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);

The table access method is not available in this case
(gdb) p *rel->rd_tableam
Cannot access memory at address 0x0

This failure happens when we do table_beginscan on the part2_1 table

Regards,
Vignesh




Re: Next commitfest app release is planned for March 18th

2025-03-22 Thread Jelte Fennema-Nio
On Fri, 21 Mar 2025 at 18:53, Tom Lane  wrote:
>
> Peter Eisentraut  writes:
> > - If I'm the committer for a patch but not a reviewer, and the patch is
> > in "needs review" status, then the patch is formally speaking not
> > actionable by me and should not be under "Patches that are ready for
> > your review".  Perhaps it should be under "Blocked on others" [namely
> > the reviewers], or in a different category.

I think I agree with this... but is that actually a common situation?
I'd expect people to mostly "claim a patch as committer" when it's in
the "ready for committer" state.

> > - Conversely, if I'm the reviewer for a patch but not the committer, and
> > the patch is in "ready for committer" status, then it's also not
> > "Patches that are ready for your review".  This might similarly be
> > "Blocked on others" [namely the committer].
>
> Both of these things would work correctly only for patches that you
> have already claimed as committer.  I'm not sure about other people,
> but I rarely claim a patch as committer until I'm actually on the
> verge of committing it.  I might mark myself as reviewer sometime
> sooner than that, in which case your second proposal would be a
> net negative for me.

I think a nice intermediate solution would probably be to handle
patches with an assigned committer differently. If a committer
assigned themselves as committer and it's "ready for committer", then
I don't think it should be marked as "ready for your review" for other
committers. If there's no committer, I think it should be, because
*some committer* needs to review the patch but it's unknown which one.

> I don't think we should encourage committers
> to claim patches early, because then they are a single point of
> failure (work-stoppage) in a way that a reviewer is not.  Maybe
> we need a way for committers to mark patches as things they want
> to pay attention to, without thereby blocking other committers
> from taking up the patch?

Yeah I think we need a way to "star" a patch (not just for committers,
but for anyone). After creating this initial version of this dashboard
I realized it's also possible to "subscribe to updates" for a patch,
which means you'll get emails about it. I think if you "subscribe for
updates" you should also see it in your dashboard (which you currently
don't). But I also think there should be a way to "star" something, so
you see it in your dashboard without receiving email updates about it.

> > - Also, my dashboard shows patches from past and future commitfests.  I
> > don't know what I'm supposed to do with that.  The purpose of having a
> > "current" commitfest is that you work on that one when it's current.
>
> This might make sense for the patch author, but I agree it's not
> appropriate for anyone else.

I'd love to hear some more thoughts on this. Especially from Peter G,
because he explicitly requested to see patches from different
commitfests in this dashboard. So I'm curious about his reasoning.
Of course we can make this configurable or have two separate pages, but
I'm wondering if there's a way to change it in a way that works for
everyone.

Instead of showing all patches from all commitfests, how about we do
the following:
1. Show all open patches where you are the author, no matter which
commitfest they are in.
2. If there's currently no commitfest "In Progress", then we show all
patches (where you are not the author) both from the Previous
commitfest and the Open commitfest. I think it would be a shame if
everyone's dashboard is empty in the first week after a commitfest,
just because authors haven't moved their patches over to the Open
commitfest.
3. If there's a commitfest "In Progress", then we only show patches
(where you are not the author) from the "In Progress" commitfest to
keep reviews focussed.
4. Have a separate section at the bottom of the page with all the
patches that you are tracking, but are not matching one of the
criteria from the three rules above.




Improve error reporting for few options in pg_createsubscriber

2025-03-22 Thread vignesh C
Hi,

Currently, error reports for the database, publication, subscription, and
replication slot options do not include the option name. This has been
addressed by including the option name in the error messages, ensuring
consistency similar to the remove option. Additionally, pg_log_error
followed by exit(1) has been replaced with a single equivalent pg_fatal
call. The attached patch implements these changes.

Regards,
Vignesh
From 4d1c8ebcb5ad8c2f17b027d5bf23d063da419c9d Mon Sep 17 00:00:00 2001
From: Vignesh 
Date: Sat, 22 Mar 2025 18:56:15 +0530
Subject: [PATCH] Improve error reporting consistency in pg_createsubscriber

Ensure consistent error reporting in pg_createsubscriber
by including the option name in error messages. Additionally,
replace pg_log_error and exit calls with pg_fatal.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index e2d6b7544bf..2b9ec24efc2 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -2063,10 +2063,7 @@ main(int argc, char **argv)
 	num_dbs++;
 }
 else
-{
-	pg_log_error("database \"%s\" specified more than once", optarg);
-	exit(1);
-}
+	pg_fatal("database \"%s\" specified more than once for --database", optarg);
 break;
 			case 'D':
 subscriber_dir = pg_strdup(optarg);
@@ -2113,10 +2110,7 @@ main(int argc, char **argv)
 	num_pubs++;
 }
 else
-{
-	pg_log_error("publication \"%s\" specified more than once", optarg);
-	exit(1);
-}
+	pg_fatal("publication \"%s\" specified more than once for --publication", optarg);
 break;
 			case 3:
 if (!simple_string_list_member(&opt.replslot_names, optarg))
@@ -2125,10 +2119,7 @@ main(int argc, char **argv)
 	num_replslots++;
 }
 else
-{
-	pg_log_error("replication slot \"%s\" specified more than once", optarg);
-	exit(1);
-}
+	pg_fatal("replication slot \"%s\" specified more than once for --replication-slot", optarg);
 break;
 			case 4:
 if (!simple_string_list_member(&opt.sub_names, optarg))
@@ -2137,10 +2128,7 @@ main(int argc, char **argv)
 	num_subs++;
 }
 else
-{
-	pg_log_error("subscription \"%s\" specified more than once", optarg);
-	exit(1);
-}
+	pg_fatal("subscription \"%s\" specified more than once for --subscription", optarg);
 break;
 			default:
 /* getopt_long already emitted a complaint */
-- 
2.43.0



Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-03-22 Thread Shubham Khanna
On Sat, Mar 22, 2025 at 6:23 PM vignesh C  wrote:
>
> On Fri, 21 Mar 2025 at 18:59, Shubham Khanna
>  wrote:
> >
> >
> > During the recent testing, I observed that the tests were failing when
> > using wait_for_slot_catchup(). To address this, I reverted to using
> > wait_for_subscription_sync(), which was employed previously and has
> > proven to be more reliable in ensuring test stability.
> > Please let me know if there are any additional adjustments you would
> > suggest or if you would like me to investigate further into
> > wait_for_slot_catchup().
> >
> > I have created a separate patch for the synopsis of '--all' option as
> > suggested by Amit at [1]. The attached patch contains the suggested
> > changes.
>
> I believe you added the following because pattern matching is
> difficult for the $db1 and $db2 variables having special names:
> +# Create a new database on node_p
> +$node_p->safe_psql(
> +   "postgres", qq(
> +   CREATE DATABASE db1;
> +));
>
> How about we change it to verify the count of occurrences instead for
> this case like below:
> # Verify that the required logical replication objects are created. The
> # expected count 3 refers to postgres, $db1 and $db2 databases.
> is(scalar(() = $stderr =~ /creating publication/g),
> 3, "verify publications are created for all databases");
> is(scalar(() = $stderr =~ /creating the replication slot/g),
> 3, "verify replication slots are created for all databases");
> is(scalar(() = $stderr =~ /creating subscription/g),
> 3, "verify subscriptions are created for all databases");
>
> If you are ok, you can merge the changes attached.
>

I agree that verifying the count of occurrences is a more
straightforward and effective approach, especially given the
challenges with pattern matching for $db1 and $db2 variables with
special names. This method simplifies validation and enhances
robustness by explicitly ensuring the expected number of logical
replication objects are created.

I have reviewed and merged the proposed changes into the patch. The
attached patches contain the suggested changes.

Thanks and regards,
Shubham Khanna.


v18-0003-Synopsis-for-all-option.patch
Description: Binary data


v18-0001-Enhance-pg_createsubscriber-to-fetch-and-append-.patch
Description: Binary data


v18-0002-Additional-test-cases.patch
Description: Binary data


Re: Next commitfest app release is planned for March 18th

2025-03-22 Thread Tom Lane
Peter Geoghegan  writes:
> On Sat, Mar 22, 2025 at 7:15 AM Jelte Fennema-Nio  wrote:
>> Yeah I think we need a way to "star" a patch (not just for committers,
>> but for anyone).

> Right. In my view "starring" something should definitely not appear in
> the "History Log" of the patch; it should be private.

+1

> Personally, I think it'd be most useful for the dashboard to show all
> open patches.

Surely not "all"?  We have that display already.  Should be more like
"patches that I have expressed an interest in or have reason to take
current or future action on".

In the case of stale patches that haven't been moved forward from a
closed commitfest, perhaps we could compromise on Jelte's suggestion
of putting those in a separate section at the bottom of the page.

However, I still think such patches should be treated differently for
the author than other people.  The author does have current action to
take on the patch, namely moving it forward or withdrawing it.

regards, tom lane




Re: Remove redundant if-else in EXPLAIN by using ExplainPropertyText

2025-03-22 Thread David Rowley
On Fri, 21 Mar 2025 at 09:24, Ilia Evdokimov
 wrote:
> Thanks to David [0], we found that the if and else branches contained
> equivalent code. Since ExplainPropertyText already handles non-text
> formats, the extra condition is unnecessary.
>
> I reviewed other files related to EXPLAIN where similar patterns might
> exist, but this was the only instance I found.

I looked at; git grep -E "appendStringInfo.*\\n" --
src/backend/commands/explain.c and all those results are appending
multiple options. No single option ones.

The patch looks good to me and seems worth applying to master. I feel
this should be fixed to avoid misleading future patches that add new
properties to Memoize into copying the same pattern.

David




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-22 Thread Alexander Lakhin

Hello Melanie,

15.03.2025 16:43, Melanie Plageman wrote:

On Thu, Mar 13, 2025 at 5:41 PM Melanie Plageman
 wrote:

Overall, I feel pretty good about merging this once Thomas merges the
read stream patches.

This was committed in 944e81bf99db2b5b70b, 2b73a8cd33b745c, and
c3953226a07527a1e2.
I've marked it as committed in the CF app.


It looks like that change made the bitmapops test unstable (on slow animals?): 
[1], [2], [3].
I've reproduced such test failures when running
TESTS=$(printf 'bitmapops %.0s' `seq 50`) make -s check-tests
under Valgrind:
...
ok 17    - bitmapops   52719 ms
not ok 18    - bitmapops   57566 ms
ok 19    - bitmapops   60179 ms
ok 20    - bitmapops   32927 ms
ok 21    - bitmapops   45127 ms
ok 22    - bitmapops   42924 ms
ok 23    - bitmapops   61035 ms
ok 24    - bitmapops   56316 ms
ok 25    - bitmapops   52874 ms
not ok 26    - bitmapops   67468 ms
ok 27    - bitmapops   55605 ms
ok 28    - bitmapops   24021 ms
...

diff -U3 /home/vagrant/postgresql/src/test/regress/expected/bitmapops.out 
/home/vagrant/postgresql/src/test/regress/results/bitmapops.out

--- /home/vagrant/postgresql/src/test/regress/expected/bitmapops.out 2025-03-16 
01:37:52.716885600 -0700
+++ /home/vagrant/postgresql/src/test/regress/results/bitmapops.out 2025-03-22 
03:47:54.014702406 -0700
@@ -24,14 +24,14 @@
 SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1;
  count
 ---
-    23
+    18
 (1 row)

 -- Test bitmap-or.
 SELECT count(*) FROM bmscantest WHERE a = 1 OR b = 1;
  count
 ---
-  2485
+  1044
 (1 row)

 -- clean up
diff -U3 /home/vagrant/postgresql/src/test/regress/expected/bitmapops.out 
/home/vagrant/postgresql/src/test/regress/results/bitmapops.out

--- /home/vagrant/postgresql/src/test/regress/expected/bitmapops.out 2025-03-16 
01:37:52.716885600 -0700
+++ /home/vagrant/postgresql/src/test/regress/results/bitmapops.out 2025-03-22 
03:54:53.129549597 -0700
@@ -31,7 +31,7 @@
 SELECT count(*) FROM bmscantest WHERE a = 1 OR b = 1;
  count
 ---
-  2485
+  1044
 (1 row)

git bisect for this deviation has pointed at 2b73a8cd3.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-16%2010%3A34%3A17
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2025-03-17%2020%3A07%3A43
[3] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-17%2001%3A16%3A02

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Re: Parallel heap vacuum

2025-03-22 Thread Andres Freund
Hi,

On 2025-03-20 01:35:42 -0700, Masahiko Sawada wrote:
> One plausible solution would be that we don't use ReadStream in
> parallel heap vacuum cases but directly use
> table_block_parallelscan_xxx() instead. It works but we end up having
> two different scan methods for parallel and non-parallel lazy heap
> scan. I've implemented this idea in the attached v12 patches.

I think that's a bad idea - this means we'll never be able to use direct IO
for parallel VACUUMs, despite

a) The CPU overhead of buffered reads being a problem for VACUUM

b) Data ending up in the kernel page cache is rather wasteful for VACUUM, as
   that's often data that won't otherwise be used again soon. I.e. these reads
   would particularly benefit from using direct IO.

c) Even disregarding DIO, losing the ability to do larger reads, as provided
   by read streams, loses a fair bit of efficiency (just try doing a
   pg_prewarm of a large relation with io_combine_limit=1 vs the default
   io_combine_limit).

Greetings,

Andres Freund




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-22 Thread Andres Freund
Hi,

On 2025-03-22 22:00:00 +0200, Alexander Lakhin wrote:
> 15.03.2025 16:43, Melanie Plageman wrote:
> > On Thu, Mar 13, 2025 at 5:41 PM Melanie Plageman
> >  wrote:
> > > Overall, I feel pretty good about merging this once Thomas merges the
> > > read stream patches.
> > This was committed in 944e81bf99db2b5b70b, 2b73a8cd33b745c, and
> > c3953226a07527a1e2.
> > I've marked it as committed in the CF app.
> 
> It looks like that change made the bitmapops test unstable (on slow 
> animals?): [1], [2], [3].
> I've reproduced such test failures when running
> TESTS=$(printf 'bitmapops %.0s' `seq 50`) make -s check-tests
> under Valgrind:
> ...
> ok 17    - bitmapops   52719 ms
> not ok 18    - bitmapops   57566 ms
> ok 19    - bitmapops   60179 ms
> ok 20    - bitmapops   32927 ms
> ok 21    - bitmapops   45127 ms
> ok 22    - bitmapops   42924 ms
> ok 23    - bitmapops   61035 ms
> ok 24    - bitmapops   56316 ms
> ok 25    - bitmapops   52874 ms
> not ok 26    - bitmapops   67468 ms
> ok 27    - bitmapops   55605 ms
> ok 28    - bitmapops   24021 ms
> ...
> 
> diff -U3 /home/vagrant/postgresql/src/test/regress/expected/bitmapops.out
> /home/vagrant/postgresql/src/test/regress/results/bitmapops.out
> --- /home/vagrant/postgresql/src/test/regress/expected/bitmapops.out 
> 2025-03-16 01:37:52.716885600 -0700
> +++ /home/vagrant/postgresql/src/test/regress/results/bitmapops.out 
> 2025-03-22 03:47:54.014702406 -0700
> @@ -24,14 +24,14 @@
>  SELECT count(*) FROM bmscantest WHERE a = 1 AND b = 1;
>   count
>  ---
> -    23
> +    18
>  (1 row)

Is it possible that this is the bug reported here:
https://postgr.es/m/873c33c5-ef9e-41f6-80b2-2f5e11869f1c%40garret.ru

I.e. that the all-visible logic in bitmap heapscans is fundamentally broken?

I can reproduce different results on a fast machine by just putting a VACUUM
FREEZE bmscantest after the CREATE INDEXes in bitmapops.sql. After I disable
the all-visible logic in bitmapheap_stream_read_next(), I can't observe such a
difference anymore.

Greetings,

Andres Freund




Re: RFC: Allow EXPLAIN to Output Page Fault Information

2025-03-22 Thread Jelte Fennema-Nio
On Wed, 19 Mar 2025 at 14:15, torikoshia  wrote:
> BTW based on your discussion, I thought this patch could not be merged
> anytime soon. Does that align with your understanding?

Yeah, that aligns with my understanding. I don't think it's realistic
to get this merged before the code freeze, but I think both of the
below issues could be resolved.

> - With bgworker-based AIO, this patch could mislead users into
> underestimating the actual storage I/O load, which is undesirable.

To resolve this, I think the patch would need to change to not report
anything if bgworker-based AIO is used. So I moved this patch to the
next commitfest, and marked it as "waiting for author" there.

> - With io_uring-based AIO, this patch could provide meaningful values,
> but it may take some time before io_uring sees widespread adoption.

I submitted this patch to help make io_uring-based AIO more of a reality:
https://commitfest.postgresql.org/patch/5570/




Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Mar 22, 2025 at 10:43:00AM +0100, Christoph Berg wrote:
>> Are we at the point where the patch is already Ready for Committer?

> I'll think a bit more about how to document all that.  Anyway, yes,
> I'm OK with the per-field custom_query_jumble, so let's move on with
> that, so I will do something about that.

I'm not terribly happy with the entire proposal.

(1) I know it was asserted upthread that there was no performance
impact, but I find that hard to believe.

(2) This patch inserts catalog lookups into query ID computation,
which AFAIK there never were before.  This means you can't compute a
query ID outside a transaction or in an already-failed transaction.
Surely that's going to bite us eventually.

(3) I think having query jumbling work differently for temp tables
than other tables is a fundamentally bad idea.

So my feeling is: if we think this is the behavior we want, let's do
it across the board.  I suggest that we simply drop the relid from the
jumble and use the table alias (that is, eref->aliasname) instead.
ISTM this fits well with the general trend in pg_stat_statements
to merge statements together more aggressively than the original
concept envisioned.
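The difference between jumbling the relation OID and jumbling the alias name can be sketched with a toy hash (Python; the OID values and the hashing scheme are illustrative only, not the real jumbling code): a temp table recreated by each run of a script gets a fresh OID, so OID-based jumbling splits statistics per run, while name-based jumbling merges them.

```python
import hashlib

def jumble(tables):
    """Toy query-ID computation: hash the per-RTE table identity,
    standing in for the real query jumble."""
    h = hashlib.sha256()
    for t in tables:
        h.update(repr(t).encode())
    return h.hexdigest()[:8]

# Two runs of "CREATE TEMP TABLE t ...; SELECT ... FROM t" assign t
# a different OID each time (16401 and 16425 are made-up examples).
run1_oid, run2_oid = 16401, 16425

print(jumble([("rte", run1_oid)]) == jumble([("rte", run2_oid)]))  # False
print(jumble([("rte", "t")]) == jumble([("rte", "t")]))            # True
```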

regards, tom lane




Re: making EXPLAIN extensible

2025-03-22 Thread Robert Haas
On Sat, Mar 22, 2025 at 4:46 AM Andrei Lepikhov  wrote:
> I skimmed through the code and tested how it works.
> It looks good and has no apparent architectural dependencies.
> But I haven't scrutinised it line-by-line and do not intend to do so.
> I wanna say I hate the design of this module. Having a strong necessity
> for extra explain tools (in the daily routine, all I have is the only
> single explain analyse verbose output to find out planner/executor bug,
> reproduce it and make a patch), I don't see a single case when I would
> use this module. It adds a burden to fix its output on a node change
> (you don't care, but it adds work to Postgres fork maintainers, too, for
> nothing). Also, it breaks my understanding of the principles of the
> Postgres code design - to start the discussion on how to show more, we
> need only the bare minimum of code and output lines.
> In my opinion, it should show as few parameters as possible to
> demonstrate principles and test the code on a single planner node. It
> only deserves src/test/modules because it is not helpful for a broad
> audience.

Gee, thanks for the ringing endorsement. :-)

I think *I* will use it pretty regularly; I already have. In my
experience, using debug_query_plan for this sort of thing sucks quite
a lot. Finding the information you need in the output takes a long
time because the output is quite long. This is much more
understandable, at least for me. I agree with you that a trivial test
module could demonstrate that the hook works, but a trivial example
would not demonstrate that the hook can be used to do something
actually useful. It sounds like what I've written also fails the
"actually useful" test for you, but it doesn't for me. I'm not going
to insist on shipping this if I'm the ONLY one who would ever get any
use out of it, but I doubt that's the case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Reduce TupleHashEntryData struct size by half

2025-03-22 Thread Jeff Davis
On Tue, 2025-03-04 at 17:28 -0800, Jeff Davis wrote:
> My results (with wide tables):
> 
>                         GROUP BY  EXCEPT
>    master:                  2151    1732
>    entire v8 series:        2054    1740

I'm not sure what I did with the EXCEPT test, above, perhaps I had
filled the test table twice by mistake or something?

New numbers below:

Data:

  create table c (a int not null);
  insert into c select a from generate_Series(1,1000) a;

  create table big(i int, j int);
  insert into big select g, 1 from generate_series(1,1000) g;

  create table wide (t text not null);
  insert into wide
select repeat(md5(a::text),10)
from generate_Series(1,1000) a;

  create table wide_copy (t text not null);
  insert into wide_copy select * from wide;

  vacuum freeze analyze;


Results:

             c (tps)  wide (tps)  except (tps)  big (ms)  big (MiB)

91ecb5e0bc      4768        2091          2317      3853        768
master          4942        2063          2323      3831        768
0001            4932        2038          2361      3712        616
0002            4601        2034          2365      3753        616
0003            4850        2040          2418      3749        616
0004            4744        2065          2282      3665        488
0005            4630        1994          2307      3665        488
0006            4809        2063          2394      3646        488
0007            4752        2070          2479      3659        488

Note: c.sql, wide.sql, and except.sql are measured in tps, higher is
better. big.sql is measured in ms and MiB, lower is better.

For some reason I'm getting a decline of about 3% in the c.sql test
that seems to be associated with the accessor functions, even when
inlined. I'm also not seeing as much benefit from the inlining of the
MemoryContextMemAllocated(). But these mixed test results are minor
compared with the memory savings of 35% and the more consistently-
improved performance of 5% on the larger test (which is also integers),
so I plan to commit it.

Regards,
Jeff Davis

From f19e8363d52755d54d31f34778c841f97e6edfe2 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 8 Jan 2025 11:46:35 -0800
Subject: [PATCH v9 1/7] HashAgg: use Bump allocator for hash TupleHashTable
 entries.

The entries aren't freed until the entire hash table is destroyed, so
use the Bump allocator to improve allocation speed, avoid wasting
space on the chunk header, and avoid wasting space due to the
power-of-two allocations.

Discussion: https://postgr.es/m/caaphdvqv1anb4cm36fzrwivxrevbo_lsg_eq3nqdxtjecaa...@mail.gmail.com
Reviewed-by: David Rowley
---
 src/backend/executor/execUtils.c |  17 +++--
 src/backend/executor/nodeAgg.c   | 111 ++-
 src/include/nodes/execnodes.h|   5 +-
 3 files changed, 104 insertions(+), 29 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 55ab18fb826..772c86e70e9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -322,19 +322,18 @@ CreateExprContext(EState *estate)
 ExprContext *
 CreateWorkExprContext(EState *estate)
 {
-	Size		minContextSize = ALLOCSET_DEFAULT_MINSIZE;
-	Size		initBlockSize = ALLOCSET_DEFAULT_INITSIZE;
 	Size		maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
 
-	/* choose the maxBlockSize to be no larger than 1/16 of work_mem */
-	while (maxBlockSize > work_mem * (Size) 1024 / 16)
-		maxBlockSize >>= 1;
+	maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
 
-	if (maxBlockSize < ALLOCSET_DEFAULT_INITSIZE)
-		maxBlockSize = ALLOCSET_DEFAULT_INITSIZE;
+	/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
+	maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
 
-	return CreateExprContextInternal(estate, minContextSize,
-	 initBlockSize, maxBlockSize);
+	/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
+	maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
+
+	return CreateExprContextInternal(estate, ALLOCSET_DEFAULT_MINSIZE,
+	 ALLOCSET_DEFAULT_INITSIZE, maxBlockSize);
 }
 
 /* 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index b4a7698a0b3..fb73e6f991d 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -406,6 +406,7 @@ static void build_hash_tables(AggState *aggstate);
 static void build_hash_table(AggState *aggstate, int setno, long nbuckets);
 static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
 		  bool nullcheck);
+static void hash_create_memory(AggState *aggstate);
 static long hash_choose_num_buckets(double hashentrysize,
 	long ngroups, Size memory);
 static int	hash_choose_num_partitions(double input_groups,
@@ -1509,7 +1510,7 @@ build_hash_table(AggState *aggstate, int setno, long nbuckets)
 {
 	AggStatePerHash perhash = &aggstate->perhash[setno];
 	MemoryContext metacxt = aggstate->hash_metacxt;
-	MemoryContext hashcxt = aggstate->hashcontext->ecxt_per_tuple_memory;
+	Mem

Bug - DoS - Handler function lookups consider non-handler functions

2025-03-22 Thread David G. Johnston
Today:

create extension tsm_system_rows ;
create schema s1;
create function s1.system_rows(internal) returns void language c as
'tsm_system_rows.so', 'tsm_system_rows_handler';
set search_path to s1,public,pg_catalog;
select count(*) from pg_class tablesample system_rows(0);
ERROR:  function system_rows must return type tsm_handler
LINE 1: select count(*) from pg_class tablesample system_rows(0);

The above is a denial-of-service due to our decision to lookup handler
functions by name regardless of return type and consider it an error if a
function with the wrong return type shows up (in particular, even though
one with the correct signature exists and otherwise would have been found).
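The lookup behavior at issue can be modeled in miniature (Python; the catalog contents and function names are illustrative, not real catalog code): today the first name/signature match along the search_path wins and the caller then errors on the wrong return type, whereas filtering on the required return type during the walk lets the search fall through to the real handler.

```python
# (schema, function name) -> declared return type
CATALOG = {
    ("s1", "system_rows"): "void",              # wrong-return-type shadow
    ("public", "system_rows"): "tsm_handler",   # the real handler
}

def lookup(name, search_path, required_rettype=None):
    """Resolve an unqualified name along the search path.  With
    required_rettype set, non-handler candidates are skipped instead of
    being returned (and then rejected with an error) to the caller."""
    for schema in search_path:
        rettype = CATALOG.get((schema, name))
        if rettype is None:
            continue
        if required_rettype is not None and rettype != required_rettype:
            continue
        return schema, rettype
    return None

path = ["s1", "public", "pg_catalog"]
print(lookup("system_rows", path))                 # ('s1', 'void') -> error today
print(lookup("system_rows", path, "tsm_handler"))  # ('public', 'tsm_handler')
```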

The attached POC fixes this by allowing callers to also specify the OID of
the handler type as part of their function lookup criteria.  Tablesample is
fixed to use this new call though possibly others exist.  I'm not
particularly fond of what I ended up with naming conventions but figure
it's good enough for now.

Patch applied and re-running the above:

select count(*) from pg_class tablesample system_rows(0);
 count
---
 0
(1 row)

I noticed this when reviewing the extensible copy formats patch which used
tablesample as a reference.

David J.
From 09fd47f241859a716374f5b7926f758a5ca8fd3e Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Sat, 22 Mar 2025 09:45:49 -0700
Subject: [PATCH] Handler function lookups ignore non-handler functions

---
 src/backend/catalog/namespace.c   | 35 +++
 src/backend/parser/parse_clause.c | 10 +--
 src/backend/parser/parse_func.c   | 47 +++
 src/include/catalog/namespace.h   |  7 +
 src/include/parser/parse_func.h   |  5 
 5 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index d97d632a7e..0cb66261a3 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -1115,9 +1115,20 @@ TypeIsVisibleExt(Oid typid, bool *is_missing)
 	return visible;
 }
 
-
+FuncCandidateList
+FuncnameGetCandidates(List *names, int nargs, List *argnames,
+	  bool expand_variadic, bool expand_defaults,
+	  bool include_out_arguments,
+	  bool missing_ok)
+{
+	return HandlerGetCandidates(names, nargs, argnames,
+ expand_variadic, expand_defaults,
+ include_out_arguments,
+ NULL,
+ missing_ok);
+}
 /*
- * FuncnameGetCandidates
+ * HandlerGetCandidates
  *		Given a possibly-qualified function name and argument count,
  *		retrieve a list of the possible matches.
  *
@@ -1184,14 +1195,20 @@ TypeIsVisibleExt(Oid typid, bool *is_missing)
  * The caller might end up discarding such an entry anyway, but if it selects
  * such an entry it should react as though the call were ambiguous.
  *
+ * The presence of a handlerkey further restricts the search to functions whose
+ * return data type matches the handlerkey.  These are pseudo-types known to
+ * the system and are never schema-qualified.
+ *
  * If missing_ok is true, an empty list (NULL) is returned if the name was
  * schema-qualified with a schema that does not exist.  Likewise if no
  * candidate is found for other reasons.
  */
 FuncCandidateList
-FuncnameGetCandidates(List *names, int nargs, List *argnames,
-	  bool expand_variadic, bool expand_defaults,
-	  bool include_out_arguments, bool missing_ok)
+HandlerGetCandidates(List *names, int nargs, List *argnames,
+	 bool expand_variadic, bool expand_defaults,
+	 bool include_out_arguments,
+	 Oid *handlerkey,
+	 bool missing_ok)
 {
 	FuncCandidateList resultList = NULL;
 	bool		any_special = false;
@@ -1374,6 +1391,14 @@ FuncnameGetCandidates(List *names, int nargs, List *argnames,
 continue;
 		}
 
+		if (handlerkey)
+		{
+			{
+if (procform->prorettype != handlerkey)
+	continue;
+			}
+		}
+
 		/*
 		 * We must compute the effective argument list so that we can easily
 		 * compare it to earlier results.  We waste a palloc cycle if it gets
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 2e64fcae7b..dbea8c2fda 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -924,7 +924,7 @@ transformRangeTableSample(ParseState *pstate, RangeTableSample *rts)
 	 */
 	funcargtypes[0] = INTERNALOID;
 
-	handlerOid = LookupFuncName(rts->method, 1, funcargtypes, true);
+	handlerOid = LookupHandlerName(rts->method, 1, funcargtypes, (Oid *) TSM_HANDLEROID, true);
 
 	/* we want error to complain about no-such-method, not no-such-function */
 	if (!OidIsValid(handlerOid))
@@ -934,14 +934,6 @@ transformRangeTableSample(ParseState *pstate, RangeTableSample *rts)
 		NameListToString(rts->method)),
  parser_errposition(pstate, rts->location)));
 
-	/* check that handler has correct return type */
-	if (get_func_rettype(handlerOid) != TSM_HANDLEROID)
-		ereport(ERROR,
-(errcode(ERRCODE_WRONG_

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-03-22 Thread David G. Johnston
On Fri, Mar 21, 2025 at 10:23 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Then this would benefit from the new function I suggest creating since it
> apparently has the same, IMO, bug.
>
>
Concretely like I posted here:
https://www.postgresql.org/message-id/cakfquwybtck+uw-byfchhp8hyj0r5+upytgmdqevp9phcsz...@mail.gmail.com

David J.


Re: Bug - DoS - Handler function lookups consider non-handler functions

2025-03-22 Thread Tom Lane
"David G. Johnston"  writes:
> create extension tsm_system_rows ;
> create schema s1;
> create function s1.system_rows(internal) returns void language c as
> 'tsm_system_rows.so', 'tsm_system_rows_handler';
> set search_path to s1,public,pg_catalog;
> select count(*) from pg_class tablesample system_rows(0);
> ERROR:  function system_rows must return type tsm_handler
> LINE 1: select count(*) from pg_class tablesample system_rows(0);

> The above is a denial-of-service due to our decision to lookup handler
> functions by name regardless of return type and consider it an error if a
> function with the wrong return type shows up (in particular, even though
> one with the correct signature exists and otherwise would have been found).

I do not think this is a bug: it's superuser error.  Yeah, it'd be
great if we could prevent people from creating incorrect support
functions, but that'd require solving the halting problem.

> The attached POC fixes this by allowing callers to also specify the OID of
> the handler type as part of their function lookup criteria.

I'm not on board with that at all.  The law of unintended consequences
comes into play the moment you start messing with function lookup
rules.  And I'm not at all convinced that "ignore it" is an
improvement over "throw an error".

And please, let's stop with the scare tactics of calling this sort
of thing a denial-of-service.

regards, tom lane




Re: Next commitfest app release is planned for March 18th

2025-03-22 Thread Peter Geoghegan
On Sat, Mar 22, 2025 at 10:49 AM Tom Lane  wrote:
> > Personally, I think it'd be most useful for the dashboard to show all
> > open patches.
>
> Surely not "all"?  We have that display already.  Should be more like
> "patches that I have expressed an interest in or have reason to take
> current or future action on".

What I meant was "patches that I have expressed an interest in or have
reason to take current or future action on" should be interpreted in
the broadest/most permissive way possible by the dashboard. In
particular, entries should only *fully* disappear when somebody (some
individual human) has actively chosen to make a representation that
the patch is closed out. The rest (how we present that information) is
details.

> In the case of stale patches that haven't been moved forward from a
> closed commitfest, perhaps we could compromise on Jelte's suggestion
> of putting those in a separate section at the bottom of the page.

If there is any gray area, then I very much want us to err on the side
of still showing *something*. I really strongly object to having
things vanish, absent a 100% unambiguous signal that that's what
should happen.

In general, the personal dashboard is (quite usefully) oriented around
what actions you as a user (or some other CF app user) needs to take
-- it is workflow oriented. It seems natural to me to handle stale/in
limbo patches (patches that are not officially closed but also aren't
in the current or next CF) in just the same way -- by presenting the
information in terms of actions that need to be taken by some
individual stakeholder.

> However, I still think such patches should be treated differently for
> the author than other people.  The author does have current action to
> take on the patch, namely moving it forward or withdrawing it.

Right. So maybe for the patch author it appears either under "Your
patches that need changes from you", or in a similar section that's
just for stale patches that need to either be officially dropped or
officially moved forward to an open CF. Whereas it'd be a little
different (but not too different) for somebody who sees a patch
because they're the reviewer/committer of record (the action item for
such a person, if any, is to actually fully close the patch, or to nag
the patch author).

It probably makes sense to make stale/in limbo entries stick out like
a sore thumb. They're *supposed* to be annoying.

-- 
Peter Geoghegan




Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Tom Lane
I wrote:
> So my feeling is: if we think this is the behavior we want, let's do
> it across the board.  I suggest that we simply drop the relid from the
> jumble and use the table alias (that is, eref->aliasname) instead.

I experimented with this trivial fix (shown in-line to keep the cfbot
from thinking this is the patch-of-record):

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..a54bbdc18b7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1051,7 +1051,7 @@ typedef struct RangeTblEntry
/* user-written alias clause, if any */
Alias  *alias pg_node_attr(query_jumble_ignore);
/* expanded reference names */
-   Alias  *eref pg_node_attr(query_jumble_ignore);
+   Alias  *eref;
 
RTEKind rtekind;/* see above */
 
@@ -1094,7 +1094,7 @@ typedef struct RangeTblEntry
 * tables to be invalidated if the underlying table is altered.
 */
/* OID of the relation */
-   Oid relid;
+   Oid relid pg_node_attr(query_jumble_ignore);
/* inheritance requested? */
boolinh;
/* relation kind (see pg_class.relkind) */

This caused just one diff in existing regression test cases:

diff --git a/contrib/pg_stat_statements/expected/planning.out 
b/contrib/pg_stat_statements/expected/planning.out
index 3ee1928cbe9..c25b8b946fd 100644
--- a/contrib/pg_stat_statements/expected/planning.out
+++ b/contrib/pg_stat_statements/expected/planning.out
@@ -75,8 +75,9 @@ SELECT plans >= 2 AND plans <= calls AS plans_ok, calls, 
rows, query FROM pg_sta
   WHERE query LIKE 'SELECT COUNT%' ORDER BY query COLLATE "C";
  plans_ok | calls | rows |query 
 --+---+--+--
- t| 4 |4 | SELECT COUNT(*) FROM stats_plan_test
-(1 row)
+ f| 1 |1 | SELECT COUNT(*) FROM stats_plan_test
+ f| 3 |3 | SELECT COUNT(*) FROM stats_plan_test
+(2 rows)
 
 -- Cleanup
 DROP TABLE stats_plan_test;

What's happening there is that there's an ALTER TABLE ADD COLUMN in
the test, so the executions after the first one see more entries
in eref->colnames and come up with a different jumble.  I think
we probably don't want that behavior; we only want to jumble the
table name.  So we'd still need the v3-0001 patch in some form to
allow annotating RangeTblEntry.eref with a custom jumble method
that'd only jumble the aliasname.
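The colnames sensitivity can be sketched with a toy jumble (Python; the hash and RTE representation are illustrative only): hashing eref->colnames makes the query ID change as soon as ALTER TABLE ADD COLUMN grows the expanded column list, while hashing only the aliasname keeps it stable across the DDL.

```python
import hashlib

def jumble_eref(eref, include_colnames):
    """Toy jumble of an RTE's eref: optionally mix in the expanded
    column list, the behavior the trivial fix above would get."""
    h = hashlib.sha256()
    h.update(eref["aliasname"].encode())
    if include_colnames:
        for c in eref["colnames"]:
            h.update(c.encode())
    return h.hexdigest()[:8]

before = {"aliasname": "stats_plan_test", "colnames": ["a"]}
after = {"aliasname": "stats_plan_test", "colnames": ["a", "x"]}  # post-ALTER

print(jumble_eref(before, True) == jumble_eref(after, True))    # False: entries split
print(jumble_eref(before, False) == jumble_eref(after, False))  # True: stays merged
```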

regards, tom lane




Re: Proposal: Progressive explain

2025-03-22 Thread Robert Haas
On Wed, Mar 19, 2025 at 6:53 PM Rafael Thofehrn Castro
 wrote:
> The strategy I used here is to use a MemoryContextCallback
> (ProgressiveExplainReleaseFunc), configured in the memory context
> where the query is being executed, being responsible for calling
> ProgressiveExplainCleanup() if the query doesn't end gracefully.

Thanks for the pointer. I'm a bit skeptical about what's going on here
in ProgressiveExplainReleaseFunc(). It seems like we're depending on
shared memory to tell us whether we need to do purely backend-local
cleanup, like calling disable_timeout() and resetting
ProgressiveExplainPending and activeQueryDesc. I would have expected
us to keep track in local memory of whether this kind of work needs to
be done. It seems roundabout to take an LWLock, do a hash table lookup
to see if there's an entry there, release the LWLock, and then very
shortly thereafter take the lock a second time to release the entry
that we now know is there.

The comment in ProgressiveExplainCleanup about only detaching from the
DSA if the query ended gracefully is not ideal from my point of view
because it says what the code does instead of why the code does that
thing. Also, the function is seemingly called with queryDesc as an
argument not because you need it for anything but because you're going
to test whether it's null. In that case, you could just pass a
Boolean. Even then, something seems odd about this: why do we have to
be holding ProgressiveExplainHashLock to dsa_detach the
somewhat-inscrutably named area pe_a? And why are we not detaching it
in case of error?

I am wondering why you chose this relatively unusual error cleanup
strategy. What I would have done is invent AtEOXact_ProgressiveExplain
and AtSubEOXact_ProgressiveExplain. In some sense this looks simpler,
because it doesn't need separate handling for transactions and
subtransactions, but it's so unlike what we do in most places that
it's hard for me to tell whether it's correct. I feel like the
approach you've chosen here would make sense if what we wanted to do
was basically release memory or some memory-like resource associated
closely with the context -- e.g. expandedrecord.c releases a
TupleDesc, but this is doing more than that.

I think the effect of this choice is that cleanup of the
progressive-EXPLAIN stuff happens much later than it normally would.
Most of the time, in the case of an abort, we would want
AbortTransaction() to release as many resources as possible, leaving
basically no work to do at CleanupTransaction() time. This is so that
if a user types "SELECT 1/0;" we release resources, as far as
possible, right away, and don't wait for them to subsequently type
"ROLLBACK;". The transaction lives on only as a shell. But these
resources, if I'm reading this right, are going to stick around until
the transaction is actually rolled back, because memory is not freed
until CleanupTransaction() time. I wonder what happens if a query
inside of an explicit transaction aborts after putting an entry in the
progressive-explain view. My guess is that the entry will stick around
until the actual rollback happens.

In fact, now that I think about it, this is probably why we don't
dsa_detach() in ProgressiveExplainCleanup() in cases of error -- the
resource owner cleanup will have already released the DSA segments
long before the memory context is deleted.

I'm sort of inclined to say that this should be rewritten to do error
cleanup in a more normal way. It's probably more code to do it that
way, but I think having it be more similar to other subsystems is
probably worth quite a bit.

> > It seems like when we replace a longer entry with a shorter one, we
> > forget that it was originally longer. Hence, if the length of a
> > progressive EXPLAIN is alternately 2922 characters and 2923
> > characters, we'll reallocate on every other progressive EXPLAIN
> > instead of at most once.
>
> Are you talking about re-printing the plan in the same query execution?

Yes.

> The logic for the code, using your example, would be to allocate 2922 +
> PROGRESSIVE_EXPLAIN_FREE_SIZE (4096, currently) initially. If next plans
> alternate between 2922 and 2923 no additional allocation will be done.
> A reallocation will be needed only if the plan length ends up exceeding
> 2922+4096. At the end of query execution (or cancellation) that DSA will
> be freed and a next query execution will have to allocate again using the
 > same logic.

It seems to me that if ProgressiveExplainPrint() reaches /* Update
shared memory with new data */ without reallocating,
strlen(pe_data->plan) can be reduced. On the next trip through the
function, we don't know whether the string we're seeing is the
original string -- for which strlen()+PROGRESSIVE_EXPLAIN_FREE_SIZE)
gives us the original allocation size -- or whether the string we're
seeing is a shorter one that was copied over the original, longer
string. PROGRESSIVE_EXPLAIN_FREE_SIZE is big enough that this probably
isn't much of a problem in practice.
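Robert's concern about inferring the allocation size from strlen() can be sketched with a toy model (illustrative Python, not PostgreSQL code; `FREE_SIZE` stands in for PROGRESSIVE_EXPLAIN_FREE_SIZE). With the 2922/2923-character example the 4096-byte slack hides the effect, so this sketch uses a length swing larger than the slack to make the re-allocation pattern visible:

```python
FREE_SIZE = 4096  # stand-in for PROGRESSIVE_EXPLAIN_FREE_SIZE

class InferredBuf:
    """Capacity re-derived from the length of the *current* plan string."""
    def __init__(self):
        self.plan = None
        self.reallocs = 0

    def store(self, plan):
        # inferred capacity: length of whatever is stored now, plus slack
        cap = len(self.plan) + FREE_SIZE if self.plan is not None else 0
        if len(plan) > cap:
            self.reallocs += 1  # would re-allocate len(plan) + FREE_SIZE
        self.plan = plan

class ExplicitBuf:
    """Capacity remembered at allocation time."""
    def __init__(self):
        self.plan = None
        self.capacity = 0
        self.reallocs = 0

    def store(self, plan):
        if len(plan) > self.capacity:
            self.capacity = len(plan) + FREE_SIZE
            self.reallocs += 1
        self.plan = plan

# a swing larger than FREE_SIZE makes the difference visible
long_plan, short_plan = "x" * 5000, "x" * 100
inferred, explicit = InferredBuf(), ExplicitBuf()
for plan in [long_plan, short_plan] * 3:
    inferred.store(plan)
    explicit.store(plan)

assert inferred.reallocs == 3  # re-allocates each time the long plan returns
assert explicit.reallocs == 1  # allocates once, at most
```

After a shorter plan overwrites a longer one, the inferred capacity shrinks below the true allocation, so the next long plan triggers a needless re-allocation; remembering the capacity explicitly allocates once.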

Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-22 Thread Andres Freund
Hi,

On 2025-03-22 16:42:42 -0400, Andres Freund wrote:
> Hm, it's clearly related to the all-visible path, but I think the bug is
> actually independent of what was reported there. The bug you reported is
> perfectly reproducible, without any concurrency, after all.
> 
> Here's a smaller reproducer:
> 
> CREATE TABLE bmscantest (a int, t text);
> 
> INSERT INTO bmscantest
>   SELECT (r%53), 
> 'foo'
>   FROM generate_series(1,7) r;
> 
> CREATE INDEX i_bmtest_a ON bmscantest(a);
> set enable_indexscan=false;
> set enable_seqscan=false;
> 
> -- Lower work_mem to trigger use of lossy bitmaps
> set work_mem = 64;
> 
> SELECT count(*) FROM bmscantest WHERE a = 1;
> vacuum freeze bmscantest;
> SELECT count(*) FROM bmscantest WHERE a = 1;
> 
> -- clean up
> DROP TABLE bmscantest;
> 
> 
> The first SELECT reports 1321, the second 572 tuples.

The problem is that sometimes recheck is performed for pending empty
tuples. The comment about empty tuples says:

/*
 * Bitmap is exhausted. Time to emit empty tuples if relevant. We emit
 * all empty tuples at the end instead of emitting them per block we
 * skip fetching. This is necessary because the streaming read API
 * will only return TBMIterateResults for blocks actually fetched.
 * When we skip fetching a block, we keep track of how many empty
 * tuples to emit at the end of the BitmapHeapScan. We do not recheck
 * all NULL tuples.
 */
*recheck = false;
return bscan->rs_empty_tuples_pending > 0;

But we actually emit empty tuples at each page boundary:

/*
 * Out of range?  If so, nothing more to look at on this page
 */
while (hscan->rs_cindex >= hscan->rs_ntuples)
{
/*
 * Emit empty tuples before advancing to the next block
 */
if (bscan->rs_empty_tuples_pending > 0)
{
/*
 * If we don't have to fetch the tuple, just return nulls.
 */
ExecStoreAllNullTuple(slot);
bscan->rs_empty_tuples_pending--;
return true;
}

/*
 * Returns false if the bitmap is exhausted and there are no further
 * blocks we need to scan.
 */
if (!BitmapHeapScanNextBlock(scan, recheck, lossy_pages, exact_pages))
return false;
}

So we don't just emit empty tuples at the end of the scan, but at each page
boundary.

This leads to wrong results whenever there is a rs_empty_tuples_pending > 0
after a previous page that needed rechecks, because nothing will reset
*recheck = false.

And indeed, if I add a *recheck = false inside the
/*
 * Emit empty tuples before advancing to the next block
 */
if (bscan->rs_empty_tuples_pending > 0)
{
branch, the problem goes away.


A fix should do slightly more, given that the "at the end" comment is wrong.
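The effect of the stale flag can be modeled with a toy scan loop (illustrative Python, not the actual executor code): empty tuples owed by a skipped all-visible page get discarded whenever the previously fetched page left *recheck set, unless the flag is reset first.

```python
def scan(pages, reset_recheck_for_empties):
    """Toy bitmap heap scan.  pages is a list of either
    ('fetch', ntuples, recheck) for a page actually read, or
    ('skip', ntuples) for an all-visible page whose tuples are owed as
    all-NULL "empty" tuples.  Rechecking an all-NULL tuple cannot match
    the qual, so a recheck discards the owed empty tuples."""
    emitted = 0
    recheck = False
    pending = 0
    for page in pages:
        # emit empty tuples owed from previously skipped pages first
        if pending:
            if reset_recheck_for_empties:
                recheck = False          # the missing reset
            if not recheck:
                emitted += pending
            pending = 0
        if page[0] == "skip":
            pending += page[1]
        else:
            _, ntuples, recheck = page   # flag left over for the next page
            emitted += ntuples
    # bitmap exhausted: emit any remaining empty tuples
    if pending and (reset_recheck_for_empties or not recheck):
        emitted += pending
    return emitted

pages = [("fetch", 10, True), ("skip", 5), ("fetch", 3, False)]
assert scan(pages, reset_recheck_for_empties=False) == 13  # 5 rows silently lost
assert scan(pages, reset_recheck_for_empties=True) == 18   # correct count
```

A lossy page followed by a skipped all-visible page is enough to lose rows in the buggy variant.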


But I'm wondering if it's worth fixing it, or if we should just rip the logic
out altogether, due to the whole VM checking being unsound as we learned in
the thread I referenced earlier, without even bothering to fix the bug here.

Greetings,

Andres Freund




Re: Using read_stream in index vacuum

2025-03-22 Thread Andrey Borodin



> On 22 Mar 2025, at 00:23, Melanie Plageman  wrote:
> 
> 
> I've committed the btree and gist read stream users.

Cool! Thanks!

> I think we can
> come back to the test after feature freeze and make sure it is super
> solid.

+1.


> On 22 Mar 2025, at 02:54, Melanie Plageman  wrote:
> 
> On Fri, Mar 21, 2025 at 3:23 PM Melanie Plageman
>  wrote:
>> 
>> I've committed the btree and gist read stream users. I think we can
>> come back to the test after feature freeze and make sure it is super
>> solid.
> 
> I've now committed the spgist vacuum user as well. I'll mark the CF
> entry as completed.

That's great! Thank you!

> I wonder if we should do GIN?

GIN vacuum is a logical scan. Back in 2017 I was starting to work on it, but 
made some mistakes that were reverted by fd83c83 from the released version. 
And I decided to back off for some time. Perhaps now I can implement a physical 
scan for GIN that could benefit from the read stream. But I doubt I will find a 
committer for this in 19, let alone 18.

We can add some read stream support for hashbulkdelete(): it's not as 
linear as B-tree, GiST and SP-GiST, since it scans only the beginnings of hash 
buckets, but if the buckets are small it might be more efficient.

>> Looking at the spgist read stream user, I see you didn't convert
>> spgprocesspending(). It seems like you could write a callback that
>> uses the posting list and streamify this as well.
> 
> It's probably not worth it -- since we process the pending list for
> each page of the index.

My understanding is that pending lists should be small on real workloads.

Thank you!


Best regards, Andrey Borodin.



Re: [PATCH] PGSERVICEFILE as part of a normal connection string

2025-03-22 Thread Michael Paquier
On Thu, Mar 20, 2025 at 06:16:44PM +0900, Ryo Kanbayashi wrote:
> Sorry, I found a miss on 006_service.pl.
> Fixed patch is attached...

Please note that the commit fest app needs all the patches of a set
to be posted in the same message.  In this case, v2-0001 is not going
to get automatic test coverage.

Your patch naming policy is also a bit confusing.  I would suggest to
use `git format-patch -vN -2`, where N is your version number.  0001
would be the new tests for service files, and 0002 the new feature,
with its own tests.

+if ($windows_os) {
+
+# Windows: use CRLF
+print $fh "[my_srv]",   "\r\n";
+print $fh join( "\r\n", split( ' ', $node->connstr ) ), "\r\n";
+}
+else {
+# Non-Windows: use LF
+print $fh "[my_srv]", "\n";
+print $fh join( "\n", split( ' ', $node->connstr ) ), "\n";
+}
+close $fh;

That's duplicated.  Let's perhaps use a $newline variable and print
into the file using the $newline?

Question: you are doing things this way in the test because fgets() is
what is used by libpq to retrieve the lines of the service file, is
that right?

Please note that the CI is failing.  It seems to me that you are
missing a done_testing() at the end of the script.  If you have a
github account, I'd suggest to set up a CI in your own fork of
Postgres, this is really helpful to double-check the correctness of a
patch before posting it to the lists, and saves in round trips between
author and reviewer.  Please see src/tools/ci/README in the code tree
for details.

+# Copyright (c) 2023-2024, PostgreSQL Global Development Group

These dates are incorrect.  Should be 2025, as it's a new file.

+++ b/src/interfaces/libpq/t/007_servicefile_opt.pl
@@ -0,0 +1,100 @@
+# Copyright (c) 2023-2024, PostgreSQL Global Development Group

Incorrect date again in the second path with the new feature.  I'd
suggest to merge all the tests in a single script, with only one node
initialized and started.
--
Michael


signature.asc
Description: PGP signature


Re: why there is not VACUUM FULL CONCURRENTLY?

2025-03-22 Thread Antonin Houska
Alvaro Herrera  wrote:

> I rebased this patch series; here's v09.  No substantive changes from v08.
> I made sure the tree still compiles after each commit.

Thanks.

> I did look at 0002 again (and renamed the members of the new struct by
> adding a p_ prefix, as well as fixing the references to the old names
> that were in a few code comments here and there; I don't think these
> changes are "substantive"), and ended up wondering why do we need that
> change in the first place.  According to the comment where the progress
> restore function is called, it's because reorderbuffer.c uses a
> subtransaction internally.  But I went to look at reorderbuffer.c and
> noticed that the subtransaction is only used "when using the SQL
> function interface, because that creates a transaction already".  So
> maybe we should look into making REPACK use reorderbuffer without having
> to open a transaction block.

Which part of reorderbuffer.c do you mean? ISTM that the use of subtransactions
is more extensive. At least ReorderBufferImmediateInvalidation() appears to
rely on it, which in turn is called by xact_decode().

(I don't claim that saving and restoring the progress state is perfect, but I
don't have better idea right now.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Christoph Berg
Re: To Michael Paquier
> > >> +#define JUMBLE_CUSTOM(nodetype, item) \
> > >> +_jumble##nodetype##_##item(jstate, expr, expr->item)
> > 
> > In this one, I want to mean that we require a custom per-field
> > function to look like that:
> > _jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);
> 
> Perhaps this:

Or actually more explicit:

/*
 * Per-field custom jumble functions have this signature:
 * _jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);
 */

Christoph




Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Christoph Berg
Re: Michael Paquier
> >> + * Note that the argument types are enforced for the per-field custom
> >> + * functions.
> >> + */
> >> +#define JUMBLE_CUSTOM(nodetype, item) \
> >> +  _jumble##nodetype##_##item(jstate, expr, expr->item)
> 
> In this one, I want to mean that we require a custom per-field
> function to look like that:
> _jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);
> 
> Rather than having more generic shape like that:
> _jumbleNodefoo_field(JumbleState *jstate, Node *exp,
>  const unsigned char *item, Size size);
> 
> So a custom function is defined so as the node type and field type are
> arguments.  Perhaps this comment would be better if reworded like
> that:
> "The arguments of this function use the node type and the field type,
> rather than a generic argument like AppendJumble() and the other
> _jumble() functions."

Perhaps this:

/*
 * The per-field custom jumble functions get jstate, the node, and the
 * field as arguments.
 */

They are not actually different from _jumbleList and _jumbleA_Const
which also get the node (and just not the field). AppendJumble is a
different layer, the output, so it's not surprising its signature is
different.

Are we at the point where the patch is already Ready for Committer?

Thanks,
Christoph




Re: why there is not VACUUM FULL CONCURRENTLY?

2025-03-22 Thread Antonin Houska
Robert Haas  wrote:

> On Thu, Mar 20, 2025 at 2:09 PM Antonin Houska  wrote:
> > Robert Haas  wrote:
> > > Is there a README or a long comment in here someplace that is a good
> > > place to read to understand the overall design of this feature?
> >
> > I tried to explain how it works in the commit messages. The one in 0004 is
> > probably the most important one.
> 
> Thanks. A couple of comments/questions:
> 
> - I don't understand why this commit message seems to think that we
> can't acquire a stronger lock while already holding a weaker one. We
> can do that, and in some cases we do precisely that.

Can you please give me an example? I don't recall seeing a lock upgrade in the
tree. That's the reason I tried rather hard to avoid that.

> Such locking
> patterns can result in deadlock e.g. if I take AccessShareLock and you
> take AccessShareLock and then I tried to upgrade to
> AccessExclusiveLock and then you try to upgrade to
> AccessExclusiveLock, somebody is going to have to ERROR out. But that
> doesn't keep us from doing that in some places where it seems better
> than the alternatives, and the alternative chosen by the patch
> (possibly discovering at the very end that all our work has been in
> vain) does not seem better than risking a deadlock.

I see. Only the backends that do upgrade their lock are exposed to the risk of
deadlock, e.g. two backends running REPACK CONCURRENTLY on the same table, and
that should not happen too often.

I'll consider your objection - it should make the patch a bit simpler.
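For readers following along, the upgrade deadlock Robert describes can be sketched as a wait-for graph (illustrative Python, not PostgreSQL's lock manager): an exclusive grant must wait out every *other* share holder, so two concurrent upgraders end up waiting on each other.

```python
def waits_for(share_holders, upgraders):
    """An exclusive-lock grant must wait for every *other* share holder,
    so each upgrader's wait-for edges point at the remaining holders."""
    return {u: share_holders - {u} for u in upgraders}

# both backends hold AccessShareLock and then request AccessExclusiveLock
edges = waits_for({"A", "B"}, {"A", "B"})
assert edges == {"A": {"B"}, "B": {"A"}}        # A waits for B, B waits for A
assert "A" in edges["B"] and "B" in edges["A"]  # a cycle: deadlock

# a sole share holder upgrades without waiting for anyone
assert waits_for({"A"}, {"A"}) == {"A": set()}
```

This is why only backends that actually upgrade their lock (e.g. two concurrent REPACK CONCURRENTLY runs on the same table) are exposed to the risk.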

> - On what basis do you make the statement in the last paragraph that
> the decoding-related lag should not exceed one WAL segment? I guess
> logical decoding probably keeps up pretty well most of the time but
> this seems like a very strong guarantee for something I didn't know we
> had any kind of guarantee about.

The patch itself does guarantee that by checking the amount of unprocessed WAL
regularly when it's copying the data into the new table. If too much WAL
appears to be unprocessed, it enforces the decoding before the copying is
resumed.

The WAL decoding during the "initial load" phase can actually be handled by a
background worker (not sure it's necessary in the initial implementation),
which would make a significant lag rather unlikely. But even then we should
probably enforce a certain limit on the lag (e.g. because a background worker
is not guaranteed to start).

> - What happens if we crash?

The replication slot we create is RS_TEMPORARY, so it disappears after
restart. Everything else is as if the current implementation of CLUSTER ends
due to crash.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Adding extension default version to \dx

2025-03-22 Thread Jelte Fennema-Nio
On Tue, 25 Feb 2025 at 17:11, Nathan Bossart  wrote:
>
> On Tue, Feb 25, 2025 at 04:42:37PM +0100, Magnus Hagander wrote:
> > Thanks goes to  both you and the previous responders - I did manage to mute
> > this thread away and missed the early replies, but got Jeltes the other day
> > which brought it back up on my list to get to within the Not Too Long
> > Future (TM).
>
> Got it, sounds good.

The commitfest is close to ending, but this hasn't been committed yet.
I think it would be great if either of you could commit it before the
commitfest ends. It would be a shame to have this quality of life
improvement wait another year.




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-03-22 Thread vignesh C
On Fri, 21 Mar 2025 at 18:59, Shubham Khanna
 wrote:
>
>
> During the recent testing, I observed that the tests were failing when
> using wait_for_slot_catchup(). To address this, I reverted to using
> wait_for_subscription_sync(), which was employed previously and has
> proven to be more reliable in ensuring test stability.
> Please let me know if there are any additional adjustments you would
> suggest or if you would like me to investigate further into
> wait_for_slot_catchup().
>
> I have created a separate patch for the synopsis of '--all' option as
> suggested by Amit at [1]. The attached patch contains the suggested
> changes.

I believe you added the following because pattern matching is
difficult when the $db1 and $db2 variables have special names:
+# Create a new database on node_p
+$node_p->safe_psql(
+   "postgres", qq(
+   CREATE DATABASE db1;
+));

How about we change it to verify the count of occurrences instead for
this case like below:
# Verify that the required logical replication objects are created. The
# expected count 3 refers to postgres, $db1 and $db2 databases.
is(scalar(() = $stderr =~ /creating publication/g),
3, "verify publications are created for all databases");
is(scalar(() = $stderr =~ /creating the replication slot/g),
3, "verify replication slots are created for all databases");
is(scalar(() = $stderr =~ /creating subscription/g),
3, "verify subscriptions are created for all databases");

If you are ok, you can merge the changes attached.

Regards,
Vignesh
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 7e275473eb9..9355cb0ade9 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -419,12 +419,6 @@ command_fails_like(
 	qr/--publication cannot be used with --all/,
 	'fail if --publication is used with --all');
 
-# Create a new database on node_p
-$node_p->safe_psql(
-	"postgres", qq(
-	CREATE DATABASE db1;
-));
-
 # run pg_createsubscriber with '--all' option
 my ($stdout, $stderr) = run_command(
 	[
@@ -440,20 +434,14 @@ my ($stdout, $stderr) = run_command(
 	],
 	'run pg_createsubscriber with --all');
 
-like(
-	$stderr,
-	qr/.*pg_createsubscriber: creating publication .* in database \"postgres\"/,
-	"expanded commands for all databases");
-like(
-	$stderr,
-	qr/.*pg_createsubscriber: creating publication .* in database \"db1\"/,
-	"expanded commands for all databases");
-
-# Drop the newly created database on node_p
-$node_p->safe_psql(
-	"postgres", qq(
-	DROP DATABASE db1;
-));
+# Verify that the required logical replication objects are created. The
+# expected count 3 refers to postgres, $db1 and $db2 databases.
+is(scalar(() = $stderr =~ /creating publication/g),
+	3, "verify publications are created for all databases");
+is(scalar(() = $stderr =~ /creating the replication slot/g),
+	3, "verify replication slots are created for all databases");
+is(scalar(() = $stderr =~ /creating subscription/g),
+	3, "verify subscriptions are created for all databases");
 
 # Run pg_createsubscriber on node S.  --verbose is used twice
 # to show more information.


Re: RFC: Additional Directory for Extensions

2025-03-22 Thread Andrew Dunstan


On 2025-03-21 Fr 12:42 PM, Tom Lane wrote:

Matheus Alcantara writes:

On Thu, Mar 20, 2025 at 7:38 PM Andrew Dunstan wrote:

(wondering if this is another of these cases where the "path includes postgres" 
thing bites us, and we're looking in the wrong place)

Nope, testing shows it's not that, so I am rather confused about what was going 
on.

I'm not sure if I'm checking in the right place [1] but it seems that the
Contrib and ContribInstall steps are executed after the Check step, which
causes this test failure?

No, this is not failing in Check.

I did just notice a clue though: on snakefly, the failing step's
log [1] includes

make[1]: Leaving directory 
`/opt/postgres/build-farm-18/HEAD/pgsql.build/src/backend'
rm -rf '/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install
/usr/bin/mkdir -p '/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/log
make -C '../../../..' DESTDIR='/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install 
install >'/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/log/install.log 
2>&1
make -j1  checkprep 
>>'/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/log/install.log 
2>&1
PATH="/opt/postgres/build-farm-18/HEAD/pgsql.build/tmp_install/opt/postgres/build-farm-18/HEAD/inst/bin:/opt/postgres/build-farm-18/HEAD/pgsql.build/src/test/modules/test_extensions:$PATH"
 
LD_LIBRARY_PATH="/opt/postgres/build-farm-18/HEAD/pgsql.build/tmp_install/opt/postgres/build-farm-18/HEAD/inst/lib:$LD_LIBRARY_PATH"
 INITDB_TEMPLATE='/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/initdb-template  initdb --auth trust 
--no-sync --no-instructions --lc-messages=C --no-clean 
'/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/initdb-template 
>>'/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/log/initdb-template.log 2>&1
echo "# +++ regress check in src/test/modules/test_extensions +++" && 
PATH="/opt/postgres/build-farm-18/HEAD/pgsql.build/tmp_install/opt/postgres/build-farm-18/HEAD/inst/bin:/opt/postgres/build-farm-18/HEAD/pgsql.build/src/test/modules/test_extensions:$PATH"
 
LD_LIBRARY_PATH="/opt/postgres/build-farm-18/HEAD/pgsql.build/tmp_install/opt/postgres/build-farm-18/HEAD/inst/lib:$LD_LIBRARY_PATH"
 INITDB_TEMPLATE='/opt/postgres/build-farm-18/HEAD/pgsql.build'/tmp_install/initdb-template  
../../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir=  
--temp-config=/opt/postgres/build-farm-18/tmp/buildfarm-C9Iy3s/bfextra.conf  --no-locale --port=5678 
--dbname=contrib_regression test_extensions test_extdepend
# +++ regress check in src/test/modules/test_extensions +++
# initializing database system by running initdb

showing that the step made its own tmp_install, and that only the core
"install" process was executed, so the lack of amcheck in that install
tree is not surprising.  But concurrent runs on other animals, eg [2],
don't show a tmp_install rebuild happening.  So those are using an
installation tree that *does* include contrib modules.

So what this comes down to is "why is snakefly doing a fresh install
here?".  I don't know the buildfarm client well enough to identify
probable causes.  I do note that Makefile.global.in conditionalizes
tmp_install rebuild on several variables:

temp-install: | submake-generated-headers
ifndef NO_TEMP_INSTALL
ifneq ($(abs_top_builddir),)
ifeq ($(MAKELEVEL),0)
rm -rf '$(abs_top_builddir)'/tmp_install
$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
$(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1



Good catch. This is happening because the owner hasn't updated the 
animal to REL_19_1. In 19 and 19.1 we fixed detection of existing 
installs to take account of the 'Is there postgres or pgsql in the 
prefix' issue. So it was looking in the wrong place.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: making EXPLAIN extensible

2025-03-22 Thread Andrei Lepikhov

On 3/21/25 20:54, Robert Haas wrote:

On Fri, Mar 21, 2025 at 2:37 PM Tom Lane  wrote:
Here's v9, which also adds 'SET debug_parallel_query = off' to the
pg_overexplain tests, per CI, because the test results are not (and
cannot realistically be made) stable under under that option.

I skimmed through the code and tested how it works.
It looks good and has no apparent architectural dependencies.
But I haven't scrutinised it line-by-line and do not intend to do so.
I wanna say I hate the design of this module. Having a strong need for 
extra explain tools (in my daily routine, often all I have is a single 
explain analyse verbose output to find a planner/executor bug, 
reproduce it and make a patch), I don't see a single case where I would 
use this module. It adds the burden of fixing its output on every node 
change (you don't care, but it adds work to Postgres fork maintainers, 
too, for nothing). Also, it breaks my understanding of the principles 
of Postgres code design - to start the discussion on how to show more, 
we need only the bare minimum of code and output lines.
In my opinion, it should show as few parameters as possible to 
demonstrate principles and test the code on a single planner node. It 
only deserves src/test/modules because it is not helpful for a broad 
audience.


--
regards, Andrei Lepikhov




Re: Parallel heap vacuum

2025-03-22 Thread Melanie Plageman
On Thu, Mar 20, 2025 at 4:36 AM Masahiko Sawada  wrote:
>
> When testing the multi passes of table vacuuming, I found an issue.
> With the current patch, both leader and parallel workers process stop
> the phase 1 as soon as the shared TidStore size reaches to the limit,
> and then the leader resumes the parallel heap scan after heap
> vacuuming and index vacuuming. Therefore, as I described in the patch,
> one tricky part of this patch is that it's possible that we launch
> fewer workers than the previous time when resuming phase 1 after phase
> 2 and 3. In this case, since the previous parallel workers might have
> already allocated some blocks in their chunk, newly launched workers
> need to take over their parallel scan state. That's why in the patch
> we store workers' ParallelBlockTableScanWorkerData in shared memory.
> However, I found my assumption is wrong; in order to take over the
> previous scan state properly we need to take over not only
> ParallelBlockTableScanWorkerData but also ReadStream data as parallel
> workers might have already queued some blocks for look-ahead in their
> ReadStream. Looking at ReadStream codes, I find that it's not
> realistic to store it into the shared memory.

It seems like one way to solve this would be to add functionality to
the read stream to unpin the buffers it has in the buffers queue
without trying to continue calling the callback until the stream is
exhausted.

We have read_stream_reset(), but that is to restart streams that have
already been exhausted. Exhausted streams are where the callback has
returned InvalidBlockNumber. In the read_stream_reset() cases, the
read stream user knows there are more blocks it would like to scan or
that it would like to restart the scan from the beginning.

Your case is you want to stop trying to exhaust the read stream and
just unpin the remaining buffers. As long as the worker which paused
phase I knows exactly the last block it processed and can communicate
this to whatever worker resumes phase I later, it can initialize
vacrel->current_block to the last block processed.

> One plausible solution would be that we don't use ReadStream in
> parallel heap vacuum cases but directly use
> table_block_parallelscan_xxx() instead. It works but we end up having
> two different scan methods for parallel and non-parallel lazy heap
> scan. I've implemented this idea in the attached v12 patches.

One question is in which scenarios parallel vacuum phase I without
AIO will be faster than read-AIO-ified vacuum phase I. Without AIO writes,
I suppose it would be trivial for phase I parallel vacuum to be faster
without using read AIO. But it's worth thinking about the tradeoff.

- Melanie




Re: Next commitfest app release is planned for March 18th

2025-03-22 Thread Peter Geoghegan
On Sat, Mar 22, 2025 at 7:15 AM Jelte Fennema-Nio  wrote:
> Yeah I think we need a way to "star" a patch (not just for committers,
> but for anyone). After creating this initial version of this dashboard
> I realized it's also possible to "subscribe to updates" for a patch,
> which means you'll get emails about it. I think if you "subscribe for
> updates" you should also see it in your dashboard (which you currently
> don't). But I also think there should be a way to "star" something, so
> you see it in your dashboard without receiving email updates about it.

Right. In my view "starring" something should definitely not appear in
the "History Log" of the patch; it should be private.

> > > - Also, my dashboard shows patches from past and future commitfests.  I
> > > don't know what I'm supposed to do with that.  The purpose of having a
> > > "current" commitfest is that you work on that one when it's current.
> >
> > This might make sense for the patch author, but I agree it's not
> > appropriate for anyone else.
>
> I'd love to hear some more thoughts on this. Especially from Peter G,
> because he explicitly requested to see patches from different
> commitfests in this dashboard. So I'm curious about his reasoning.
> Ofcourse we can make this configurable or have two separate pages, but
> I'm wondering if there's a way to change it in a way that works for
> everyone.

Personally, I think it'd be most useful for the dashboard to show all
open patches. I believe that that implies that it'll only show me
patches from either the current CF, or the next CF. But thinking about
it some more...I guess that there are edge-cases, where patches can be
in limbo between being "fully open" and "fully closed". Like when a
patch from an old CF isn't brought forward, and also isn't
withdrawn/returned.

I think that we should apply an expansive definition of "active", that
still shows me things that are less than 100% officially closed.
Possibly with additional visual cues that they're in this
in-between/limbo state.

> Instead of showing all patches from all commitfests, how about we do
> the following:

I think that we should show all patches from all commitfests in the
"Personal Dashboard", provided that they're not closed. I probably
don't have enough experience with the current "Personal Dashboard"
yet, but so far it seems like exactly what I had in mind.

If I see something in the "Personal Dashboard" that seems like it
shouldn't be there, due to not being in either the current or next CF,
and am bothered by that, then maybe I should then take it as an
opportunity to fix the problem. As I said, maybe there should be some
additional visual cue that shows certain "Personal Dashboard" entries
are "in limbo". But please don't make entries in this state just
vanish, on the grounds that they're theoretically closed -- experience
suggests that they're probably not really closed at all.

If something appears on my "Personal Dashboard" it appears there
because I actively chose to participate, and hiding things from me on
bureaucratic grounds seems paternalistic.

-- 
Peter Geoghegan




Re: [PATCH] PGSERVICEFILE as part of a normal connection string

2025-03-22 Thread Ryo Kanbayashi
On Sat, Mar 22, 2025 at 4:46 PM Michael Paquier  wrote:
>
> On Thu, Mar 20, 2025 at 06:16:44PM +0900, Ryo Kanbayashi wrote:
> > Sorry, I found a miss on 006_service.pl.
> > Fixed patch is attached...
>
> Please note that the commit fest app needs all the patches of a a set
> to be posted in the same message.  In this case, v2-0001 is not going
> to get automatic test coverage.
>
> Your patch naming policy is also a bit confusing.  I would suggest to
> use `git format-patch -vN -2`, where N is your version number.  0001
> would be the new tests for service files, and 0002 the new feature,
> with its own tests.

All right.
I attached patches generated with your suggested command :)

> +if ($windows_os) {
> +
> +# Windows: use CRLF
> +print $fh "[my_srv]",   "\r\n";
> +print $fh join( "\r\n", split( ' ', $node->connstr ) ), "\r\n";
> +}
> +else {
> +# Non-Windows: use LF
> +print $fh "[my_srv]", "\n";
> +print $fh join( "\n", split( ' ', $node->connstr ) ), "\n";
> +}
> +close $fh;
>
> That's duplicated.  Let's perhaps use a $newline variable and print
> into the file using the $newline?

OK.
I reflected above comment.

> Question: you are doing things this way in the test because fgets() is
> what is used by libpq to retrieve the lines of the service file, is
> that right?

No. I'm doing it this way simply because the line endings of service files
written by users may be CRLF on the Windows platform.
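For readers following along, a service file is just an INI-style list of libpq connection parameters. A minimal sketch (hypothetical values; the PGSERVICEFILE environment variable is the lookup mechanism discussed in this thread):

```shell
# Sketch only: the connection parameters below are made-up examples.
svcfile="$(mktemp -d)/pgsrv.conf"
cat > "$svcfile" <<'EOF'
[my_srv]
host=localhost
port=5432
dbname=postgres
EOF

# A libpq client would pick the file up via PGSERVICEFILE, e.g.:
#   PGSERVICEFILE="$svcfile" psql "service=my_srv"
cat "$svcfile"
```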

> Please note that the CI is failing.  It seems to me that you are
> missing a done_testing() at the end of the script.  If you have a
> github account, I'd suggest to set up a CI in your own fork of
> Postgres, this is really helpful to double-check the correctness of a
> patch before posting it to the lists, and saves in round trips between
> author and reviewer.  Please see src/tools/ci/README in the code tree
> for details.

Sorry.
I'm using Cirrus CI with GitHub and I checked that the CI passes.
But there were mistakes when I created the patch files...

> +# Copyright (c) 2023-2024, PostgreSQL Global Development Group
>
> These dates are incorrect.  Should be 2025, as it's a new file.

OK.

> +++ b/src/interfaces/libpq/t/007_servicefile_opt.pl
> @@ -0,0 +1,100 @@
> +# Copyright (c) 2023-2024, PostgreSQL Global Development Group
>
> Incorrect date again in the second path with the new feature.  I'd
> suggest to merge all the tests in a single script, with only one node
> initialized and started.

OK.
The additional test scripts have been merged into a single script ^^ b

---
Great regards,
Ryo Kanbayashi
From 69c4f4eb8e0ed40c434867fbb740a68383623da9 Mon Sep 17 00:00:00 2001
From: Ryo Kanbayashi 
Date: Sun, 23 Mar 2025 11:41:06 +0900
Subject: [PATCH v3 1/2] add regression test of service connection option.

---
 src/interfaces/libpq/meson.build  |  1 +
 src/interfaces/libpq/t/006_service.pl | 79 +++
 2 files changed, 80 insertions(+)
 create mode 100644 src/interfaces/libpq/t/006_service.pl

diff --git a/src/interfaces/libpq/meson.build b/src/interfaces/libpq/meson.build
index 19f4a52a97a..292fecf3320 100644
--- a/src/interfaces/libpq/meson.build
+++ b/src/interfaces/libpq/meson.build
@@ -122,6 +122,7 @@ tests += {
   't/003_load_balance_host_list.pl',
   't/004_load_balance_dns.pl',
   't/005_negotiate_encryption.pl',
+  't/006_service.pl',
 ],
 'env': {
   'with_ssl': ssl_library,
diff --git a/src/interfaces/libpq/t/006_service.pl b/src/interfaces/libpq/t/006_service.pl
new file mode 100644
index 000..045e25a05d3
--- /dev/null
+++ b/src/interfaces/libpq/t/006_service.pl
@@ -0,0 +1,79 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+use strict;
+use warnings FATAL => 'all';
+use Config;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+# This tests "service" connection options.
+
+# Cluster setup which is shared for testing both load balancing methods
+my $node = PostgreSQL::Test::Cluster->new('node');
+
+# Create a data directory with initdb
+$node->init();
+
+# Start the PostgreSQL server
+$node->start();
+
+my $td  = PostgreSQL::Test::Utils::tempdir;
+my $srvfile = "$td/pgsrv.conf";
+
+# Windows: use CRLF
+# Non-Windows: use LF
+my $newline = $windows_os ? "\r\n" : "\n";
+
+# Create a service file
+open my $fh, '>', $srvfile or die $!;
+print $fh "[my_srv]", $newline;
+print $fh join( $newline, split( ' ', $node->connstr ) ), $newline;
+
+close $fh;
+
+# Check that service option works as expected
+{
+local $ENV{PGSERVICEFILE} = $srvfile;
+$node->connect_ok(
+'service=my_srv',
+'service=my_srv',
+sql => "SELECT 'connect1'",
+expected_stdout => qr/connect1/
+);
+
+$node->connect_ok(
+'postgres://?service=my_srv',
+'postgres://?service=my_srv',
+sql => "SELECT 'connect2'",
+expected_stdout =

Re: Proposal - Allow extensions to set a Plan Identifier

2025-03-22 Thread Michael Paquier
On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
> planId actually looks less controversial than queryId or plan_node_id. At
> the same time, it is not free from the different logic that extensions may
> incorporate into this value: I can imagine, for example, the attempt of
> uniquely numbering plans related to the same queryId, plain hashing of the
> plan tree, hashing without constants, etc.

One plan ID should have one single query ID, but the opposite is not
necessarily true: a query ID could be associated with multiple
plan patterns (aka multiple plan IDs).  What this offers is that we
know which plan a query is currently using live, for a given query ID.

> So, it seems that extensions may conflict over the same field. Are we sure
> that won't happen if they are loaded in different orders in neighbouring
> backends?

Depends on what kind of computation one wants to do, and surely
without some coordination across the extensions you are using these
cannot be shared, but it's no different from the concept of a query
ID.  The use cases I've seen where this field is useful are when
splitting code that uses the planner hook for several purposes across
more than one extension.  Three of them, which are independent, still
want to know about a plan ID: 
- Set/get an existing node tree plan based on a specific ID.
- Hash calculation of a tree (like Lukas proposal).
- Statistics related to the plan tree used (aka custom cumulative
stats play here).

> I'm not very good at stat collector guts, but would it be possible to allow
> extensions to report their state in standard tuple format? In that case,
> doubts about queryId or planId may be resolved.

I am not exactly sure what you mean here.  We are only going to have
one plan ID set for each query execution.  Any plan stats data related
to a plan ID surely needs to be upper-bounded if you don't want to
bloat pgstats, with the query ID of the query string it relates to
stored in it and the plan ID used as hash key, but it does not change
the fact that we're always going to have one single plan ID in a
PlannedStmt or in a backend entry doing a query execution, like a
query ID.
--
Michael


signature.asc
Description: PGP signature


Re: pg_stat_statements and "IN" conditions

2025-03-22 Thread Sami Imseih
> Sure, it's important and I'm planning to tackle this next. If you want,
> we can collaborate on a patch for that.

I spent some more time looking at this, and I believe all that needs
to be done is to check for a Param node of kind PARAM_EXTERN.

During planning, the planner turns the Param into a Const in
eval_const_expressions_mutator.

If it's as simple as I think it is, I hope we can get this committed for 18.
If not, and a longer discussion is needed, a new thread can be started
for this.

--
Sami Imseih
Amazon Web Services (AWS)


v1-0001-Allow-query-jumble-to-squash-a-list-external-paramet.patch
Description: Binary data


Re: AIO v2.5

2025-03-22 Thread Noah Misch
On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> Attached v2.11, with the following changes:

> - Added an error check for FileStartReadV() failing
> 
>   FileStartReadV() actually can fail, if the file can't be re-opened. I
>   thought it'd be important for the error message to differ from the one
>   that's issued for read actually failing, so I went with:
> 
>   "could not start reading blocks %u..%u in file \"%s\": %m"
> 
>   but I'm not sure how good that is.

Message looks good.

> - Improved error message if io_uring_queue_init() fails
> 
>   Added errhint()s for likely cases of failure.
> 
>   Added errcode().  I was tempted to use errcode_for_file_access(), but that
>   doesn't support ENOSYS - perhaps I should add that instead?

Either way is fine with me.  ENOSYS -> ERRCODE_FEATURE_NOT_SUPPORTED is a good
general mapping to have in errcode_for_file_access(), but it's also not a
problem to keep it the way v2.11 has it.

> - Disable io_uring method when using EXEC_BACKEND, they're not compatible
> 
>   I chose to do this with a define aio.h, but I guess we could also do it at
>   configure time? That seems more complicated though - how would we even know
>   that EXEC_BACKEND is used on non-windows?

Agreed, "make PROFILE=-DEXEC_BACKEND" is a valid way to get EXEC_BACKEND.

>   Not sure yet how to best disable testing io_uring in this case. We can't
>   just query EXEC_BACKEND from pg_config.h unfortunately.  I guess making the
>   initdb not fail and checking the error log would work, but that doesn't work
>   nicely with Cluster.pm.

How about "postgres -c io_method=io_uring -C ":

--- a/src/test/modules/test_aio/t/001_aio.pl
+++ b/src/test/modules/test_aio/t/001_aio.pl
@@ -29,7 +29,13 @@ $node_worker->stop();
 # Test io_method=io_uring
 ###
 
-if ($ENV{with_liburing} eq 'yes')
+sub have_io_uring
+{
+   local %ENV = $node_worker->_get_env();  # any node works
+   return run_log [qw(postgres -c io_method=io_uring -C io_method)];
+}
+
+if (have_io_uring())
 {
my $node_uring = create_node('io_uring');
$node_uring->start();

> Questions:
> 
> 
> - We only "look" at BM_IO_ERROR for writes, isn't that somewhat weird?
> 
>   See AbortBufferIO(Buffer buffer)
> 
>   It doesn't really matter for the patchset, but it just strikes me as an 
> oddity.

That caught my attention in an earlier review round, but I didn't find it
important enough to raise.  It's mildly unfortunate to be setting BM_IO_ERROR
for reads when the only thing BM_IO_ERROR drives is message "Multiple failures
--- write error might be permanent."  It's minor, so let's leave it that way
for the foreseeable future.

> Subject: [PATCH v2.11 01/27] aio, bufmgr: Comment fixes

Ready to commit, though other comment fixes might come up in later reviews.
One idea so far is to comment on valid states after some IoMethodOps
callbacks:

--- a/src/include/storage/aio_internal.h
+++ b/src/include/storage/aio_internal.h
@@ -310,6 +310,9 @@ typedef struct IoMethodOps
/*
 * Start executing passed in IOs.
 *
+* Shall advance state to PGAIO_HS_SUBMITTED.  (By the time this 
returns,
+* other backends might have advanced the state further.)
+*
 * Will not be called if ->needs_synchronous_execution() returned true.
 *
 * num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
@@ -321,6 +324,12 @@ typedef struct IoMethodOps
/*
 * Wait for the IO to complete. Optional.
 *
+* On return, state shall be PGAIO_HS_COMPLETED_IO,
+* PGAIO_HS_COMPLETED_SHARED or PGAIO_HS_COMPLETED_LOCAL.  (The callback
+* need not change the state if it's already one of those.)  If state is
+* PGAIO_HS_COMPLETED_IO, state will reach PGAIO_HS_COMPLETED_SHARED
+* without further intervention.
+*
 * If not provided, it needs to be guaranteed that the IO method calls
 * pgaio_io_process_completion() without further interaction by the
 * issuing backend.

> Subject: [PATCH v2.11 02/27] aio: Change prefix of PgAioResultStatus values to
>  PGAIO_RS_

Ready to commit

> Subject: [PATCH v2.11 03/27] Redefine max_files_per_process to control
>  additionally opened files

Ready to commit

> Subject: [PATCH v2.11 04/27] aio: Add liburing dependency

> --- a/meson.build
> +++ b/meson.build
> @@ -944,6 +944,18 @@ endif
>  
>  
>  
> +###
> +# Library: liburing
> +###
> +
> +liburingopt = get_option('liburing')
> +liburing = dependency('liburing', required: liburingopt)
> +if liburing.found()
> +  cdata.set('USE_LIBURING', 1)
> +endif

This is a different style from other deps; is it equivalent to our standard
style?  Example for lz4:

lz4opt = get_option('lz4')
if not lz4opt.disabled()
  lz4 = dependency('liblz4', required: false)
  # Unfortunately the dependency is named differently wit

Re: Update LDAP Protocol in fe-connect.c to v3

2025-03-22 Thread Andrew Jackson
> This is the first complaint I can recall hearing about that, so
exactly which ones are "many"?

I've tested 2 before figuring out the v3 issue: lldap[0] and the
docker image osixia/docker-openldap[1].
- lldap gives the following error message when I attempt to connect
without the patch: "Service Error: while handling incoming messages: while
receiving LDAP op: Bind request version is not equal to 3. This is a
serious client bug.". With the attached patch this error message does not
appear.
- osixia/docker-openldap gives the following error message without the
patch: "67df745e conn=1001 op=0 RESULT tag=97 err=2 text=historical protocol
version requested, use LDAPv3 instead".

> Also, are we really sufficiently compliant with v3 that just adding this
bit is enough?

I believe that this bit is all that is needed. Per the man page for
ldap_set_option [2]: "The protocol version used by the library defaults to
LDAPv2 (now historic), which corresponds to the LDAP_VERSION2 macro.
Application developers are encouraged to explicitly set
LDAP_OPT_PROTOCOL_VERSION to LDAPv3, using the LDAP_VERSION3 macro, or to
allow users to select the protocol version."

> src/test/ldap/ doesn't do it for you?

Looking through the tests here, it seems like they are all tests for the
server-side auth functionality that is configurable in pg_hba.conf. I don't
see any tests that test the client-side "LDAP Lookup of Connection
Parameters" described in [3].

[0] https://github.com/lldap/lldap
[1] https://github.com/osixia/docker-openldap
[2] https://linux.die.net/man/3/ldap
[3] https://www.postgresql.org/docs/current/libpq-ldap.html


On Sat, Mar 22, 2025 at 6:10 PM Tom Lane  wrote:

> Andrew Jackson  writes:
> > Currently the LDAP usage in fe-connect.c does not explicitly set the
> > protocol version to v3. This causes issues with many LDAP servers as they
> > will often require clients to use the v3 protocol and disallow any use of
> > the v2 protocol.
>
> This is the first complaint I can recall hearing about that, so
> exactly which ones are "many"?  Also, are we really sufficiently
> compliant with v3 that just adding this bit is enough?
>
> > One further note is that I do not currently see any test coverage over
> the
> > LDAP functionality in `fe-connect.c`. I am happy to add that to this
> patch
> > if needed.
>
> src/test/ldap/ doesn't do it for you?
>
> regards, tom lane
>


Re: Update Unicode data to Unicode 16.0.0

2025-03-22 Thread Jeremy Schneider
On Fri, 21 Mar 2025 13:45:24 -0700
Jeff Davis  wrote:

> > Maybe we should actually move in the direction of having encodings
> > that are essentially specific versions of Unicode. Instead of just
> > having UTF-8 that accepts whatever, you could have UTF-8.v16.0.0 or
> > whatever, which would only accept code points known to that version
> > of
> > Unicode. Or maybe this shouldn't be entirely new encodings but
> > something vaguely akin to a typmod, so that you could have columns
> > of type text[limited_to_unicode_v16_0_0] or whatever. If we actually
> > exclude unassigned code points, then we know they aren't there, and
> > we
> > can make deductions about what is safe to do based on that
> > information.  
> 
> I like this line of thinking, vaguely similar to my STRICT_UNICODE
> database option proposal. Maybe these aren't exactly the right things
> to do, but I think there are some possibilities here, and we shouldn't
> give up and assume there's a problem when usually there is not.

There is "the iPhone paradox" here; if we reject unassigned code
points, then websites are going to start throwing database errors for
anyone with the latest iPhone who uses a new emoji.

(Unless the database is updated very quickly, which is atypical.) Apple
tends to get new emojis into consumers' hands a year or less after a
new Unicode release.

-Jeremy




Re: Proposal - Allow extensions to set a Plan Identifier

2025-03-22 Thread Andrei Lepikhov

On 3/22/25 06:48, Michael Paquier wrote:

On Fri, Mar 21, 2025 at 09:25:24AM -0400, Sami Imseih wrote:

planner() is the sole place in the core code where the planner hook
can be called.  Shouldn't we have at least a call to
pgstat_report_plan_id() after planning a query?  At least that should
be the behavior I'd expect, where a module pushes a planId to a
PlannedStmt, then core publishes it to the backend entry in non-force
mode.


I agree. I was just thinking we rely on the exec_ routines to report the
plan_id at the start. But, besides the good reason you give, reporting
(slightly) earlier is better for monitoring tools, as it reduces the
likelihood that they find an empty plan_id.


Yep.  pgstat_report_plan_id() is not something that extensions should
do, but they should report the planId in the PlannedStmt and let the
backend do the rest.


Overall, v3 LGTM


Thanks.  If anybody has any objections and/or comments, now would be a
good time.  I'll revisit this thread at the beginning of next week.
In my last email, I mistakenly thought about 
plan_node_id. I apologize for that.
planId actually looks less controversial than queryId or plan_node_id. 
At the same time, it is not free from the different logic that 
extensions may incorporate into this value: I can imagine, for example, 
the attempt of uniquely numbering plans related to the same queryId, 
plain hashing of the plan tree, hashing without constants, etc.
So, it seems that extensions may conflict over the same field. Are we 
sure that won't happen if they are loaded in different orders in 
neighbouring backends?
I'm not very good at stat collector guts, but would it be possible to 
allow extensions to report their state in standard tuple format? In that 
case, doubts about queryId or planId may be resolved.


--
regards, Andrei Lepikhov




Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Tom Lane
Michael Paquier  writes:
> Alias.aliasname is not qualified, so it means that we'd begin to
> assign the same query ID even if using two relations from two schemas
> depending on what search_path assigns, no?

Right.  I'm arguing that that's good.  The proposed patch already
obscures the difference between similar table names in different
(temp) schemas, and I'm suggesting that taking that a bit further
would be fine.

Note that if the tables we're considering don't have identical
rowtypes, the queries would likely jumble differently anyway
due to differences in Vars' varattno and vartype.

regards, tom lane




Re: query_id: jumble names of temp tables for better pg_stat_statement UX

2025-03-22 Thread Michael Paquier
On Sat, Mar 22, 2025 at 12:24:43PM -0400, Tom Lane wrote:
> I experimented with this trivial fix (shown in-line to keep the cfbot
> from thinking this is the patch-of-record):
> 
> What's happening there is that there's an ALTER TABLE ADD COLUMN in
> the test, so the executions after the first one see more entries
> in eref->colnames and come up with a different jumble.  I think
> we probably don't want that behavior; we only want to jumble the
> table name.  So we'd still need the v3-0001 patch in some form to
> allow annotating RangeTblEntry.eref with a custom jumble method
> that'd only jumble the aliasname.

Alias.aliasname is not qualified, so it means that we'd begin to
assign the same query ID even if using two relations from two schemas
depending on what search_path assigns, no?  Say:
create schema popo1;
create schema popo2;
create table popo1.aa (a int, b int);
create table popo2.aa (a int, b int);
set search_path = 'popo1';
select count(*) from aa;
set search_path = 'popo2';
select count(*) from aa;

=# select query, calls from pg_stat_statements where
 query ~ 'select count';
  query  | calls
-+---
 select count(*) from aa | 2
(1 row)

Perhaps that's OK because such queries use the same query string, but
just silencing the relid means that we'd lose the namespace reference
entirely, making the stats potentially fuzzier depending on the
workload.  On HEAD, one can guess the query ID with an EXPLAIN and a
search_path, as well, so currently it's possible to cross-check the 
contents of pgss.  But we'd lose this possibility here..
--
Michael


signature.asc
Description: PGP signature


Re: AIO v2.5

2025-03-22 Thread Noah Misch
On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> Attached v2.11

> Subject: [PATCH v2.11 05/27] aio: Add io_method=io_uring

Apart from some isolated cosmetic points, this is ready to commit:

> + ereport(ERROR,
> + errcode(err),
> + errmsg("io_uring_queue_init failed: 
> %m"),
> + hint != NULL ? errhint("%s", hint) : 0);

https://www.postgresql.org/docs/current/error-style-guide.html gives the 
example:

BAD:open() failed: %m
BETTER: could not open file %s: %m

Hence, this errmsg should change, perhaps to:
"could not set up io_uring queues: %m".

> + pgaio_debug_io(DEBUG3, ioh,
> +"wait_one io_gen: %llu, ref_gen: 
> %llu, cycle %d",
> +(long long unsigned) ref_generation,
> +(long long unsigned) ioh->generation,

In the message string, io_gen appears before ref_gen.  In the subsequent args,
the order is swapped relative to the message string.

> --- a/src/backend/utils/activity/wait_event_names.txt
> +++ b/src/backend/utils/activity/wait_event_names.txt
> @@ -192,6 +192,8 @@ ABI_compatibility:
>  
>  Section: ClassName - WaitEventIO
>  
> +AIO_IO_URING_SUBMIT  "Waiting for IO submission via io_uring."
> +AIO_IO_URING_COMPLETION  "Waiting for IO completion via io_uring."
>  AIO_IO_COMPLETION"Waiting for IO completion."

I'm wondering if there's an opportunity to enrich the last two wait event
names and/or descriptions.  The current descriptions suggest to me more
similarity than is actually there.  Inputs to the decision:

- AIO_IO_COMPLETION waits for an IO in PGAIO_HS_DEFINED, PGAIO_HS_STAGED, or
  PGAIO_HS_COMPLETED_IO to reach PGAIO_HS_COMPLETED_SHARED.  The three
  starting states are the states where some other backend owns the next
  action, so the current backend can only wait to be signaled.

- AIO_IO_URING_COMPLETION waits for the kernel to do enough so we can move
  from PGAIO_HS_SUBMITTED to PGAIO_HS_COMPLETED_IO.

Possible names and descriptions, based on PgAioHandleState enum names and
comments:

AIO_IO_URING_COMPLETED_IO   "Waiting for IO result via io_uring."
AIO_COMPLETED_SHARED"Waiting for IO shared completion callback."

If "shared completion callback" is too internals-focused, perhaps this:

AIO_IO_URING_COMPLETED_IO   "Waiting for IO result via io_uring."
AIO_COMPLETED_SHARED"Waiting for IO completion to update shared memory."

> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2710,6 +2710,12 @@ include_dir 'conf.d'
>  worker (execute asynchronous I/O using worker 
> processes)
> 
>
> +  
> +   
> +io_uring (execute asynchronous I/O using
> +io_uring, if available)

I feel the "if available" doesn't quite fit, since we'll fail if unavailable.
Maybe just "(execute asynchronous I/O using Linux io_uring)" with "Linux"
there to reduce surprise on other platforms.

> Subject: [PATCH v2.11 06/27] aio: Implement support for reads in smgr/md/fd

(Still reviewing this one.)




Re: Test to dump and restore objects left behind by regression

2025-03-22 Thread Ashutosh Bapat
On Fri, Mar 21, 2025 at 6:04 PM Alvaro Herrera  wrote:
>
> On 2025-Mar-21, Ashutosh Bapat wrote:
>
> > On Thu, Mar 20, 2025 at 8:37 PM vignesh C  wrote:
>
> > > Should the copyright be only 2025 in this case:
>
> > The patch was posted in 2024 to this mailing list. So we better
> > protect the copyright since then. I remember a hackers discussion
> > where a senior member of the community mentioned that there's not harm
> > in mentioning longer copyright periods than being stricter about it. I
> > couldn't find the discussion though.
>
> On the other hand, my impression is that we do update copyright years to
> current year, when committing new files of patches that have been around
> for long.
>
> And there's always
> https://liferay.dev/blogs/-/blogs/how-and-why-to-properly-write-copyright-statements-in-your-code

Right. So shouldn't the copyright notice be 2024-2025 and not just
2025? Next year it would then be changed to 2024-2026.


-- 
Best Wishes,
Ashutosh Bapat