Re: Memory context can be its own parent and child in replication command
I've updated the patch with another approach:

0001: This sets the MemoryContext type to T_Invalid just before adding it to the aset freelist, allowing us to distinguish between a MemoryContext that was placed on the freelist and shouldn't be used, and a valid one. The existing MemoryContextIsValid checks will be triggered if a MemoryContext on the freelist is used. I've also added additional MemoryContextIsValid checks to make sure the MemoryContext restored by a transaction is valid.

0002: Keep the replication command alive when there's an ongoing snapshot export. This way, the context that is restored is valid and can be reused. This requires the replication command context to be created under TopMemoryContext.

Regards,
Anthonin Bonnefoy

v02-0001-Add-additional-memory-context-checks.patch
Description: Binary data

v02-0002-Avoid-using-deleted-context-with-replication-com.patch
Description: Binary data
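To illustrate the freelist-poisoning idea in 0001, here is a minimal standalone C sketch of the general pattern. Everything below is invented for the example (Ctx, ctx_create, ctx_delete and the simplified CtxIsValid macro); it only mirrors the aset.c freelist and MemoryContextIsValid behavior described above and is not the patch itself.

#include <assert.h>
#include <stdlib.h>

typedef enum NodeTag { T_Invalid = 0, T_AllocSetContext } NodeTag;

typedef struct Ctx
{
	NodeTag		type;			/* tag doubles as a validity marker */
	struct Ctx *next;			/* freelist link */
} Ctx;

static Ctx *freelist;

#define CtxIsValid(c)	((c) != NULL && (c)->type == T_AllocSetContext)

static void
ctx_delete(Ctx *c)
{
	c->type = T_Invalid;		/* poison: any later use trips CtxIsValid */
	c->next = freelist;
	freelist = c;
}

static Ctx *
ctx_create(void)
{
	Ctx		   *c;

	if (freelist != NULL)
	{
		c = freelist;
		freelist = c->next;
	}
	else
		c = malloc(sizeof(Ctx));
	c->type = T_AllocSetContext;	/* becomes valid again only here */
	c->next = NULL;
	return c;
}

int
main(void)
{
	Ctx		   *stale = ctx_create();

	ctx_delete(stale);
	assert(!CtxIsValid(stale));		/* a dangling reference is now detectable */
	return 0;
}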
Re: DOCS - Generated Column Replication Examples
On Sat, 1 Mar 2025 at 19:01, Amit Kapila wrote: > > On Mon, Feb 3, 2025 at 4:23 AM Peter Smith wrote: > > > > A recent commit [1] added a new section "29.6. Generated Column > > Replication" to the documentation [2]. But, no "Examples" were > > included. > > > > I've created this new thread to make a case for adding an "Examples" > > subsection to that page. > > > > (the proposed examples patch is attached) > > > > == > > > > 1. Reasons AGAINST Adding Examples to this section (and why I disagree): > > > > 1a. The necessary information to describe the feature can already be > > found in the text. > > > > I still feel that the current text and example given on the page are > sufficient for now. This is a new feature, and I think it is better to > get some feedback from users. We can add more examples or information > based on real user feedback. Also, we haven't got any other response > on this thread yet. So, I think we can wait for a week or so more to > see if anyone else also thinks that the additional examples will be > useful; if not, then return this patch for now. It seems there is currently little interest in this, so I’ve closed the commitfest entry for now. We can revisit this later if we receive feedback from users indicating it's needed. Regards, Vignesh
Re: Non-text mode for pg_dumpall
On Wed, Mar 12, 2025 at 1:06 AM Álvaro Herrera wrote:
>
> Hello,
>
> On 2025-Mar-11, Mahendra Singh Thalor wrote:
>
> > In map.dat file, I tried to fix this issue by adding number of characters
> > in dbname but as per code comments, as of now, we are not supporting \n\r
> > in dbnames so i removed handling.
> > I will do some more study to fix this issue.
>
> Yeah, I think this is saying that you should not consider the contents
> of map.dat as a shell string. After all, you're not going to _execute_
> that file via the shell.
>
> Maybe for map.dat you need to escape such characters somehow, so that
> they don't appear as literal newlines/carriage returns.
>

I am confused. Currently, pg_dumpall in plain format will abort when it encounters a dbname containing a newline, and the plain-format file dumped up to that point does not contain data for all of the cluster's databases. If pg_dumpall in non-text format aborts earlier, isn't that aligned with the plain-format behavior? It's arguably also an improvement: since it aborts earlier, nothing will have been dumped. Am I missing something?
Re: Selectively invalidate caches in pgoutput module
On Tue, Mar 11, 2025 at 5:47 PM Hayato Kuroda (Fujitsu) wrote:
>
> Attached patch address comments by you and Amit [1]. What's new;
>

Thanks, the patch looks mostly good to me. I have made a few cosmetic changes in the attached and combined both patches.

The existing Alter Publication ... Rename tests don't test invalidations arising from that command. As this patch changes that code path, it would be good to add a few tests for the same. We can add one for individual relations and another for an ALL TABLES publication.

--
With Regards,
Amit Kapila.

v11-0001-Avoid-invalidating-all-RelationSyncCache-entries.patch
Description: Binary data
Re: Wrong results with subquery pullup and grouping sets
On Wed, 12 Mar 2025 at 07:45, Richard Guo wrote: > > I refined the comment in v2 and ended up with: > > * This analysis could be tighter: in particular, a non-strict > * construct hidden within a lower-level PlaceHolderVar is not > * reason to add another PHV. But for now it doesn't seem > - * worth the code to be more exact. > + * worth the code to be more exact. This is also why it's > + * preferable to handle bare PHVs in the above branch, rather > + * than this branch. We also prefer to handle bare Vars in a > + * separate branch, as it's cheaper this way and parallels the > + * handling of PHVs. > LGTM. > For backpatching, 0001 seems more like a cosmetic change, and I don't > think it needs to be backpatched. 0002 is a bug fix, but I'm not > confident enough to commit it to the back branches. Given that there > are other wrong result issues with grouping sets fixed in version 18 > but not in earlier versions, and that there are no field reports about > this issue, perhaps it's OK to refrain from backpatching this one? > Yes, I was thinking along the same lines. This particular issue feels a bit manufactured, and unlikely to occur in practice, but I'm happy to go with whatever you decide. Thanks for taking care of this. Regards, Dean
Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs
On 2025-Mar-11, Alexander Korotkov wrote:

> Thank you for your feedback! I also think that pipelining would be a
> better option, but it's too invasive for backpatching. I've written
> comments for gen_reindex_command() and run_reindex_command(). I'm
> going to push this patch to master and 17 if no objections.

Looks good.

Actually, I don't understand why run_reindex_command() is calling PQfinish(). Hmm, also, why is the 'async' parameter always given as true? Is that PQfinish() actually dead code?

[... Looks at git history ...]

Ah! this is actually a problem older than your patch -- it appears that the last (only?) caller with async=false was removed by commit 2cbc3c17a5c1. I think it's worth removing that while you're at it. We no longer need executeMaintenanceCommand() in reindexdb.c either, I think.

--
Álvaro Herrera               Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: AIO v2.5
On Tue, Mar 11, 2025 at 07:55:35PM -0400, Andres Freund wrote:
> On 2025-03-11 12:41:08 -0700, Noah Misch wrote:
> > On Mon, Sep 16, 2024 at 01:51:42PM -0400, Andres Freund wrote:
> > > On 2024-09-16 07:43:49 -0700, Noah Misch wrote:
> What do we want to do for ConditionalLockBufferForCleanup() (I don't think
> IsBufferCleanupOK() can matter)? I suspect we should also make it wait for
> the IO. See below:

I agree IsBufferCleanupOK() can't matter. It asserts that the caller already holds the exclusive buffer lock, and it doesn't loop or wait.

> [...] leads me to think that causing
> the IO to complete is probably the safest bet. Triggering IO completion never
> requires acquiring new locks that could participate in a deadlock, so it'd be
> safe.

Yes, that decision looks right to me. I scanned the callers, and none of them clearly prefers a different choice. If we someday find one caller prefers a false return over blocking on I/O completion, we can always introduce a new ConditionalLock* variant for that.

> > > - To allow io_workers to be PGC_SIGHUP, and to eventually allow to
> > > automatically in/decrease active workers, the max number of workers
> > > (32) is
> > > always allocated. That means we use more semaphores than before. I think
> > > that's ok, it's not 1995 anymore. Alternatively we can add a
> > > "io_workers_max" GUC and probe for it in initdb.
> >
> > Let's start as you have it. If someone wants to make things perfect for
> > non-root BSD users, they can add the GUC later. io_method=sync is a
> > sufficient backup plan indefinitely.
>
> Cool.
>
> I think we'll really need to do something about this for BSD users regardless
> of AIO. Or maybe those OSs should fix something, but somehow I am not having
> high hopes for an OS that claims to have POSIX conforming unnamed semaphores
> due to having a syscall that always returns EPERM... [1].

I won't mind a project making things better for non-root BSD users. I do think such a project should not block other projects making things better for everything else (like $SUBJECT).

> > > - Check if documentation for track_io_timing needs to be adjusted, after
> > > the
> > > bufmgr.c changes we only track waiting for an IO.
> >
> > Yes.
>
> The relevant sentences seem to be:
>
> - "Enables timing of database I/O calls."
>
> s/calls/waits/
>
> - "Time spent in {read,write,writeback,extend,fsync} operations"
>
> s/in/waiting for/
>
> Even though not all of these will use AIO, the "waiting for" formulation
> seems just as accurate.
>
> - "Columns tracking I/O time will only be non-zero when <xref linkend="guc-track-io-timing"/> is enabled."
>
> s/time/wait time/

Sounds good.

> > On Mon, Mar 10, 2025 at 02:23:12PM -0400, Andres Freund wrote:
> > > Attached is v2.6 of the AIO patchset.
> >
> > > - 0005, 0006 - io_uring support - close, but we need to do something about
> > > set_max_fds(), which errors out spuriously in some cases
> >
> > What do we know about those cases? I don't see a set_max_fds(); is that
> > set_max_safe_fds(), or something else?
>
> Sorry, yes, set_max_safe_fds(). The problem basically is that with io_uring we
> will have a large number of FDs already allocated by the time
> set_max_safe_fds() is called. set_max_safe_fds() subtracts already_open from
> max_files_per_process allowing few, and even negative, IOs.
>
> I think we should redefine max_files_per_process to be about the number of
> files each *backend* will additionally open. Jelte was working on related
> patches, see [2]

Got it.
max_files_per_process is a quaint setting, documented as follows (I needed the reminder): If the kernel is enforcing a safe per-process limit, you don't need to worry about this setting. But on some platforms (notably, most BSD systems), the kernel will allow individual processes to open many more files than the system can actually support if many processes all try to open that many files. If you find yourself seeing Too many open files failures, try reducing this setting. I could live with v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but would lean against it since it feels unduly novel to have a setting where we use the postgresql.conf value to calculate a value that becomes the new SHOW-value of the same setting. Options I'd consider before that: - Like you say, "redefine max_files_per_process to be about the number of files each *backend* will additionally open". It will become normal that each backend's actual FD list length is max_files_per_process + MaxBackends if io_method=io_uring. Outcome is not unlike v6-0002-Bump-postmaster-soft-open-file-limit-RLIMIT_NOFIL.patch + v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but we don't mutate max_files_per_process. Benchmark results should not change beyond the inter-major-version noise level unless one sets io_method=io_uring. I'm fee
Re: Special-case executor expression steps for common combinations
> On 9 Mar 2025, at 23:35, Daniel Gustafsson wrote: >> On 8 Mar 2025, at 17:15, Andreas Karlsson wrote: >> +1 Still seems like a nice change to me too. > > Thanks, I actually had it staged to go in early next week. ..which was done yesterday, thanks for review! -- Daniel Gustafsson
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
On 12.03.25 09:00, jian he wrote:
>> 1) WARNING might be a better fit than NOTICE here.
>>
> but NOTICE, on_error set_to_null is aligned with on_error ignore.
>
>> I would still leave the extra messages from "log_verbosity verbose" as
>> NOTICE though. What do you think?
>>
>>
> When LOG_VERBOSITY option is set to verbose,
> for ignore option, a NOTICE message containing the line of the input
> file and the column name
> whose input conversion has failed is emitted for each discarded row;
> for set_to_null option, a NOTICE message containing the line of the
> input file and the column name
> where value was replaced with NULL for each input conversion failure.
>
> see the above description,
> on_error set_to_null is aligned with on_error ignore.
> it's just on_error ignore is per row, on_error set_to_null is per
> column/field.
> so NOTICE is aligned with other on_error option.

I considered using a WARNING due to the severity of the issue - the failure to import data - but either NOTICE or WARNING works for me.

>> 2) Inconsistent terminology. Invalid values in "on_error set_to_null"
>> mode are named "erroneous", but as "invalid" in "on_error stop" mode.
>> I don't want to get into the semantics of erroneous or invalid, but
>> sticking to one terminology would IMHO look better.
>>
> I am open to changing it.
> what do you think "invalid values in %llu row was replaced with null"?

LGTM: "invalid values in %llu rows were replaced with null"

Thanks for the patch!

Best, Jim
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Tue, Mar 11, 2025 at 3:46 PM Rushabh Lathia wrote: >> >> > JFYI, I can reproduce the failure Ashutosh Bapat reported, while >> > running the pg_upgrade test through meson commands. I am >> > investigating that further. >> >> Ah good, thanks. > > > Somehow, I am now not able to reproduce after the clean build. Yesterday > I was able to reproduce, so I was happy, but again trying to analyze the issue > when I start with the If the test passes for you, can you please try the patches at [1] on top of your patches? Please apply those, set and export environment variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? I intended to do this but can not do it since the test always fails with your patches applied. [1] https://www.postgresql.org/message-id/CAExHW5uQoyOddBKLBBJpfxXqqok%3DBTeMvt5OpnM6gw0SroiUUw%40mail.gmail.com -- Best Wishes, Ashutosh Bapat
Re: Wrong results with subquery pullup and grouping sets
On Wed, Mar 12, 2025 at 1:32 AM Dean Rasheed wrote: > On Mon, 10 Mar 2025 at 13:05, Richard Guo wrote: > > Attached are the patches. > These 2 comment changes from 0002 could be made part of 0001: > > 1). In pull_up_simple_subquery(), removing the word "Again" from the > comment following the deleted block, since this is now the only place > that sets wrap_non_vars there. > > 2). In pull_up_constant_function(), moving "(See comments in > pull_up_simple_subquery().)" to the next comment. Nice catch. > > Another thing is that when deciding whether to wrap the newnode in > > pullup_replace_vars_callback(), I initially thought that we didn't > > need to handle Vars/PHVs specifically, and could instead merge them > > into the branch for handling general expressions. However, doing so > > causes a plan diff in the regression tests. The reason is that a > > non-strict construct hidden within a lower-level PlaceHolderVar can > > lead the code for handling general expressions to mistakenly think > > that another PHV is needed, even when it isn't. Therefore, the > > branches for handling Vars/PHVs are retained in 0002. > Right. The comment addition in 0002, relating to that, confused me at first: > > * This analysis could be tighter: in particular, a non-strict > * construct hidden within a lower-level PlaceHolderVar is not > * reason to add another PHV. But for now it doesn't seem > -* worth the code to be more exact. > +* worth the code to be more exact. This is also why we need > +* to handle Vars and PHVs in the above branches, rather than > +* merging them into this one. > > AIUI, it's not really that it *needs* to handle Vars and PHVs > separately, it's just better if it does, since that avoids > unnecessarily wrapping a PHV again, if it contains non-strict > constructs. Also, AFAICS there's no technical reason why simple Vars > couldn't be handled here (all the tests pass if that branch is > removed), but as a comment higher up says, that would be more > expensive. So perhaps this new comment should say "This is why it's > preferable to handle simple PHVs in the branch above, rather than this > branch." Exactly. Technically, we could remove the branch for Vars. However, I chose to keep it because: 1) it's more efficient this way, and 2) it's better to keep the handling of Vars and PHVs aligned. I refined the comment in v2 and ended up with: * This analysis could be tighter: in particular, a non-strict * construct hidden within a lower-level PlaceHolderVar is not * reason to add another PHV. But for now it doesn't seem - * worth the code to be more exact. + * worth the code to be more exact. This is also why it's + * preferable to handle bare PHVs in the above branch, rather + * than this branch. We also prefer to handle bare Vars in a + * separate branch, as it's cheaper this way and parallels the + * handling of PHVs. > Finally, ReplaceWrapOption should be added to pgindent's typedefs.list. Nice catch. For backpatching, 0001 seems more like a cosmetic change, and I don't think it needs to be backpatched. 0002 is a bug fix, but I'm not confident enough to commit it to the back branches. Given that there are other wrong result issues with grouping sets fixed in version 18 but not in earlier versions, and that there are no field reports about this issue, perhaps it's OK to refrain from backpatching this one? 
Thanks
Richard

v2-0001-Remove-code-setting-wrap_non_vars-to-true-for-UNION-ALL-subqueries.patch
Description: Binary data

v2-0002-Fix-incorrect-handling-of-subquery-pullup.patch
Description: Binary data
Re: Conflict detection for update_deleted in logical replication
On Thu, 20 Feb 2025 at 12:50, Zhijie Hou (Fujitsu) wrote: > > > Here is the v28 patch set, which converts the subscription option > max_conflict_retention_duration into a GUC. Other logic remains unchanged. After discussing with Hou internally, I have moved this to the next CommitFest since it will not be committed in the current release. This also allows reviewers to focus on the remaining patches in the current CommitFest. Regards, Vignesh
Re: Question about duplicate JSONTYPE_JSON check
On 2025-Mar-12, Amit Langote wrote:

> I was able to construct a test case that crashes due to this bug:
>
> CREATE TYPE mood AS ENUM ('happy', 'sad', 'neutral');
> CREATE FUNCTION mood_to_json(mood) RETURNS json AS $$
> SELECT to_json($1::text);
> $$ LANGUAGE sql IMMUTABLE;
> CREATE CAST (mood AS json) WITH FUNCTION mood_to_json(mood) AS IMPLICIT;
>
> SELECT JSON_OBJECT('happy'::mood: '123'::jsonb);
> server closed the connection unexpectedly

Good reaction time :-)

I see that that line shows as not even uncovered in the report, but as non-existent (no background color as opposed to red):
https://coverage.postgresql.org/src/backend/utils/adt/jsonb.c.gcov.html#660

Evidently the compiler must be optimizing it out as dead code.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion. The wonders of
open-source !"
https://twitter.com/gunnarmorling/status/1596080409259003906
Re: Parallel heap vacuum
On Wed, Mar 12, 2025 at 3:40 PM Amit Kapila wrote: > > On Wed, Mar 12, 2025 at 11:24 AM Dilip Kumar wrote: > > > > On Wed, Mar 12, 2025 at 3:17 AM Masahiko Sawada > > wrote: > > > > > > On Tue, Mar 11, 2025 at 5:51 AM Amit Kapila > > > wrote: > > > > > > > > Some thoughts/questions on the idea > > > > I notice that we are always considering block-level parallelism for > > heaps and object-level parallelism for indexes. I'm wondering, when > > multiple tables are being vacuumed together—either because the user > > has provided a list of tables or has specified a partitioned table > > with multiple children—does it still make sense to default to > > block-level parallelism? Or could we consider table-level parallelism > > in such cases? For example, if there are 4 tables and 6 workers, with > > 2 tables being small and the other 2 being large, perhaps we could > > allocate 4 workers to vacuum all 4 tables in parallel. For the larger > > tables, we could apply block-level parallelism, using more workers for > > internal parallelism. On the other hand, if all tables are small, we > > could just apply table-level parallelism without needing block-level > > parallelism at all. This approach could offer more flexibility, isn't > > it? > > > > I have not thought from this angle, but it seems we can build this > even on top of block-level vacuum for large tables. Yes, that can be built on top of block-level vacuum. In that case, we can utilize the workers more efficiently, depending on how many workers we have and how many tables need to be vacuumed. And yes, that could also be discussed separately. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
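To make the split being discussed concrete, here is a toy standalone sketch of one way to allocate workers: one worker per table first (table-level parallelism), then leftover workers to the tables with the largest per-worker block counts (block-level parallelism). The table sizes, worker count and names are purely illustrative, not from the patch, and the sketch assumes there are at least as many workers as tables.

#include <stdio.h>

typedef struct
{
	const char *name;
	long		blocks;
	int			workers;
} TableInfo;

int
main(void)
{
	TableInfo	tabs[] = {
		{"small_1", 100, 0}, {"small_2", 200, 0},
		{"large_1", 500000, 0}, {"large_2", 900000, 0},
	};
	int			ntabs = 4;
	int			nworkers = 6;

	/* table-level parallelism: give every table one worker first */
	for (int i = 0; i < ntabs && nworkers > 0; i++, nworkers--)
		tabs[i].workers = 1;

	/* block-level parallelism: spend leftovers on the largest per-worker load */
	while (nworkers-- > 0)
	{
		int			big = 0;

		for (int i = 1; i < ntabs; i++)
			if (tabs[i].blocks / tabs[i].workers >
				tabs[big].blocks / tabs[big].workers)
				big = i;
		tabs[big].workers++;
	}

	for (int i = 0; i < ntabs; i++)
		printf("%s: %d worker(s)\n", tabs[i].name, tabs[i].workers);
	return 0;
}

With 4 tables and 6 workers as in the example above, the two small tables end up with one worker each and the two large tables with two workers each.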
Re: Test mail for pgsql-hackers
Hi Blessy,

On Tue, Mar 11, 2025 at 6:03 PM BharatDB wrote:
>>
>> Hi ,
>
>
>>
>> I've been exploring logical replication and noticed that if the column
>> datatypes don't match between the publisher and subscriber, PostgreSQL
>> doesn't give a warning. This can cause unexpected behavior, and I thought it
>> might be helpful to alert users when this happens.
>
>
>>
>> ### **What This Patch Does:**
>
>
>>
>> - Adds a warning when a column's datatype in the subscriber doesn't match
>> the publisher.
>>
>> - Helps users catch issues early instead of running into silent errors later.
>
>
>>
>> Why I Think It's Useful:- Avoids confusion when replication doesn't work as
>> expected. - Makes debugging easier by pointing out potential problems. I'd
>> love to get feedback on whether this is a good idea and if I've approached
>> it correctly. Since I'm still learning, any suggestions for improvement
>> would be really helpful. I've attached the patch—please let me know what you
>> think!

A large part of your patch is renaming files, which is not related to this patch. Can you please clean that up?

The code also looks problematic:

+Oid local_typid = TupleDescAttr(tupdesc, i)->atttypid;
+Oid remote_typid = lrel.atttyps[i];

AFAIK, the attribute positions on the publisher and subscriber need not match, but this code assumes that the attribute at the ith position on the publisher has to be mapped to the attribute at the same position on the subscriber.

+
+/* Get human-readable type names */
+char *local_typname = format_type_be(local_typid);
+char *remote_typname = format_type_be(remote_typid);

format_type_be() will give the name of the type on the subscriber, not on the publisher. You will need to look up the logical replication type cache for the name of the type on the remote. It further assumes that the OIDs of the same types on the publisher and the subscriber are the same, which may not be the case.

My memory is hazy as to whether we consider types with the same name on the publisher and subscriber to be equivalent. If not, then I see no way to make sure that the mapped columns are of the same type.

--
Best Wishes,
Ashutosh Bapat
Re: Index AM API cleanup
While studying the patch "[PATCH v21 08/12] Allow non-btree indexes to participate in mergejoin", I figured we could do the sortsupport.c piece much simpler. The underlying issue there is similar to commits 0d2aa4d4937 and c594f1ad2ba: We're just using the btree strategy numbers to pass information about the sort direction. We can do this simpler by just passing a bool. See attached patch.

From f1855169b20ecfa12016c89abac52913cedf6688 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 12 Mar 2025 11:02:33 +0100
Subject: [PATCH v21.1] Simplify and generalize PrepareSortSupportFromIndexRel()

PrepareSortSupportFromIndexRel() was accepting btree strategy numbers
purely for the purpose of comparing it later against btree strategies
to determine if the sort direction was forward or reverse. Change
that. Instead, pass a bool directly, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy. (This is similar in spirit to commits 0d2aa4d4937 and
c594f1ad2ba.)

(This could arguably be simplfied further by having the callers fill
in ssup_reverse directly. But this way, it preserves consistency by
having all PrepareSortSupport*() variants be responsible for filling
in ssup_reverse.)

Moreover, remove the hardcoded check against BTREE_AM_OID, and check
against amcanorder instead, which is the actual requirement.

Discussion: https://www.postgresql.org/message-id/flat/e72eaa49-354d-4c2e-8eb9-255197f55...@enterprisedb.com
---
 src/backend/access/nbtree/nbtsort.c        |  7 +++
 src/backend/utils/sort/sortsupport.c       | 15 ++-
 src/backend/utils/sort/tuplesortvariants.c | 14 ++
 src/include/utils/sortsupport.h            |  2 +-
 4 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index fa336ba0096..3794cc924ad 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1171,7 +1171,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 		{
 			SortSupport sortKey = sortKeys + i;
 			ScanKey		scanKey = wstate->inskey->scankeys + i;
-			int16		strategy;
+			bool		reverse;
 
 			sortKey->ssup_cxt = CurrentMemoryContext;
 			sortKey->ssup_collation = scanKey->sk_collation;
@@ -1183,10 +1183,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 
 			Assert(sortKey->ssup_attno != 0);
 
-			strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ?
-				BTGreaterStrategyNumber : BTLessStrategyNumber;
+			reverse = (scanKey->sk_flags & SK_BT_DESC) != 0;
 
-			PrepareSortSupportFromIndexRel(wstate->index, strategy, sortKey);
+			PrepareSortSupportFromIndexRel(wstate->index, reverse, sortKey);
 		}
 
 		for (;;)
diff --git a/src/backend/utils/sort/sortsupport.c b/src/backend/utils/sort/sortsupport.c
index 6037031eaa3..0b4f08ed2ec 100644
--- a/src/backend/utils/sort/sortsupport.c
+++ b/src/backend/utils/sort/sortsupport.c
@@ -150,15 +150,15 @@ PrepareSortSupportFromOrderingOp(Oid orderingOp, SortSupport ssup)
 }
 
 /*
- * Fill in SortSupport given an index relation, attribute, and strategy.
+ * Fill in SortSupport given an index relation and attribute.
  *
  * Caller must previously have zeroed the SortSupportData structure and then
 * filled in ssup_cxt, ssup_attno, ssup_collation, and ssup_nulls_first. This
- * will fill in ssup_reverse (based on the supplied strategy), as well as the
+ * will fill in ssup_reverse (based on the supplied argument), as well as the
 * comparator function pointer.
 */
 void
-PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
+PrepareSortSupportFromIndexRel(Relation indexRel, bool reverse,
 							   SortSupport ssup)
 {
 	Oid			opfamily = indexRel->rd_opfamily[ssup->ssup_attno - 1];
@@ -166,12 +166,9 @@ PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
 
 	Assert(ssup->comparator == NULL);
 
-	if (indexRel->rd_rel->relam != BTREE_AM_OID)
-		elog(ERROR, "unexpected non-btree AM: %u", indexRel->rd_rel->relam);
-	if (strategy != BTGreaterStrategyNumber &&
-		strategy != BTLessStrategyNumber)
-		elog(ERROR, "unexpected sort support strategy: %d", strategy);
-	ssup->ssup_reverse = (strategy == BTGreaterStrategyNumber);
+	if (!indexRel->rd_indam->amcanorder)
+		elog(ERROR, "unexpected non-amcanorder AM: %u", indexRel->rd_rel->relam);
+	ssup->ssup_reverse = reverse;
 
 	FinishSortSupportFunction(opfamily, opcintype, ssup);
 }
di
Re: meson vs. llvm bitcode files
Hi, On Tue, 11 Mar 2025 at 01:04, Diego Fronza wrote: > I did a full review on the provided patches plus some tests, I was able to > validate that the loading of bitcode modules is working also JIT works for > both backend and contrib modules. Thank you! > To test JIT on contrib modules I just lowered the costs for all jit settings > and used the intarray extension, using the data/test__int.data: > CREATE EXTENSION intarray; > CREATE TABLE test__int( a int[] );1 > \copy test__int from 'data/test__int.data' > > For queries any from line 98+ on contrib/intarray/sql/_int.sql will work. > > Then I added extra debug messages to llvmjit_inline.cpp on > add_module_to_inline_search_path() function, also on > llvm_build_inline_plan(), I was able to see many functions in this module > being successfully inlined. > > I'm attaching a new patch based on your original work which add further > support for generating bitcode from: Thanks for doing that! > - Generated backend sources: processed by flex, bison, etc. > - Generated contrib module sources, I think we do not need to separate these two. foreach srcfile : bitcode_module['srcfiles'] -if meson.version().version_compare('>=0.59') +srcfilename = '@0@'.format(srcfile) +if srcfilename.startswith('=0.59') Also, checking if the string starts with ' On this patch I just included fmgrtab.c and src/backend/parser for the > backend generated code. > For contrib generated sources I added contrib/cube as an example. I applied your contrib/cube example and did the same thing for the contrib/seg. > All relevant details about the changes are included in the patch itself. > > As you may know already I also created a PR focused on llvm bitcode emission > on meson, it generates bitcode for all backend and contribution modules, > currently under review by some colleagues at Percona: > https://github.com/percona/postgres/pull/103 > I'm curious if we should get all or some of the generated backend sources > compiled to bitcode, similar to contrib modules. I think we can do this. I added other backend sources like you did in the PR but attached it as another patch (0007) because I wanted to hear other people's opinions on that first. v3 is attached. -- Regards, Nazir Bilal Yavuz Microsoft From 62c7c05b19476939941b656a6eab43dc3661ef09 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 24 Oct 2024 07:23:05 -0400 Subject: [PATCH v3 1/7] meson: Add generated header stamps Otherwise build commands become too long and this has visible effect on creation time of meson build files. Author: Andres Freund Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/206b001d-1884-4081-bd02-bed5c92f02ba%40eisentraut.org --- src/include/meson.build | 18 ++ src/backend/meson.build | 2 +- src/fe_utils/meson.build | 2 +- meson.build | 16 +--- 4 files changed, 29 insertions(+), 9 deletions(-) diff --git a/src/include/meson.build b/src/include/meson.build index 2e4b7aa529e..c488a5dc4c9 100644 --- a/src/include/meson.build +++ b/src/include/meson.build @@ -177,3 +177,21 @@ install_subdir('catalog', # autoconf generates the file there, ensure we get a conflict generated_sources_ac += {'src/include': ['stamp-h']} + +# Instead of having targets depending directly on the generated headers, have +# them depend on a stamp files for all of them. Dependencies on headers are +# implemented as order-only dependencies in meson (later using compiler +# generated dependencies). The benefit of using a stamp file is that it makes +# ninja.build smaller and meson setup faster. 
+generated_headers_stamp = custom_target('generated-headers-stamp.h', + output: 'generated-headers-stamp.h', + input: generated_headers, + command: stamp_cmd, +) + +generated_backend_headers_stamp = custom_target('generated-backend-headers-stamp.h', + output: 'generated-backend-headers-stamp.h', + input: generated_backend_headers, + depends: generated_headers_stamp, + command: stamp_cmd, +) diff --git a/src/backend/meson.build b/src/backend/meson.build index 2b0db214804..7fc649c3ebd 100644 --- a/src/backend/meson.build +++ b/src/backend/meson.build @@ -169,7 +169,7 @@ backend_mod_code = declare_dependency( compile_args: pg_mod_c_args, include_directories: postgres_inc, link_args: pg_mod_link_args, - sources: generated_headers + generated_backend_headers, + sources: [generated_backend_headers_stamp], dependencies: backend_mod_deps, ) diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build index a18cbc939e4..5a9ddb73463 100644 --- a/src/fe_utils/meson.build +++ b/src/fe_utils/meson.build @@ -29,7 +29,7 @@ generated_sources += psqlscan fe_utils_sources += psqlscan fe_utils = static_library('libpgfeutils', - fe_utils_sources + generated_headers, + fe_utils_sources, c_pch: pch_postgres_fe_h, include_directories: [postgres_inc, libpq_inc], c_args: host_system == 'windows' ? ['-DFD_SETSIZE=1024'] : [], di
Re: Add an option to skip loading missing publication to avoid logical replication failure
On Tue, Mar 11, 2025 at 4:01 PM vignesh C wrote: > > On Mon, 10 Mar 2025 at 09:33, Amit Kapila wrote: > > > > On Tue, Mar 4, 2025 at 6:54 PM vignesh C wrote: > > > > > > On further thinking, I felt the use of publications_updated variable > > > is not required we can use publications_valid itself which will be set > > > if the publication system table is invalidated. Here is a patch for > > > the same. > > > > > > > The patch relies on the fact that whenever a publication's data is > > invalidated, it will also invalidate all the RelSyncEntires as per > > publication_invalidation_cb. But note that we are discussing removing > > that inefficiency in the thread [1]. So, we should try to rebuild the > > entry when we have skipped the required publication previously. > > > > Apart from this, please consider updating the docs, as mentioned in my > > response to Sawada-San's email. > > The create subscription documentation already has "We allow > non-existent publications to be specified so that users can add those > later. This means pg_subscription can have non-existent publications." > and should be enough at [1]. Let me know if we need to add more > documentation. > > Apart from this I have changed the log level that logs "skipped > loading publication" to WARNING as we log a warning "WARNING: > publications "pub2", "pub3" do not exist on the publisher" in case of > CREATE SUBSCRIPTION and looked similar to this. I can change it to a > different log level in case you feel this is not the right level. > > Also I have added a test case for dilip's comment from [2]. > The attached v7 version patch has the changes for the same. > Thanks, Vignesh, for adding the test. I believe you've tested the effect of DROP PUBLICATION. However, I think we should also test the behavior of ALTER SUBSCRIPTION...SET PUBLICATION before creating the PUBLICATION, and then create the PUBLICATION at a later stage. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Wed, Mar 12, 2025 at 3:20 PM Alvaro Herrera wrote: > > On 2025-Mar-12, Ashutosh Bapat wrote: > > > If the test passes for you, can you please try the patches at [1] on > > top of your patches? Please apply those, set and export environment > > variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? > > I intended to do this but can not do it since the test always fails > > with your patches applied. > > Oh, I need to enable a PG_TEST_EXTRA option in order for the test to > run? FFS. That explains why the tests passed just fine for me. > I'll re-run. You need PG_TEST_EXTRA to run the testcase I am adding there. Rest of the testcases run without PG_TEST_EXTRA. -- Best Wishes, Ashutosh Bapat
Re: [PATCH] SVE popcount support
On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote:
> v5-no-sve is the result of using a function pointer, but pointing to the
> "slow" versions instead of the SVE version. v5-sve is the result of the
> latest patch in this thread on a machine with SVE support, and v5-4reg is
> the result of the latest patch in this thread modified to process 4
> register's worth of data at a time.

Nice, I wonder why I did not observe any performance gain in the 4reg version. Did you modify the 4reg version code? One possible explanation is that you used Graviton4 based instances whereas I used Graviton3 instances.

> For the latter point, I think we should consider trying to add a separate
> Neon implementation that we use as a fallback for machines that don't have
> SVE. My understanding is that Neon is virtually universally supported on
> 64-bit Arm gear, so that will not only help offset the function pointer
> overhead but may even improve performance for a much wider set of machines.

I have added the NEON implementation in the latest patch. Here are the numbers for drive_popcount(100, 1024) on m7g.8xlarge:

Scalar - 692ms
Neon - 298ms
SVE - 112ms

-Chiranmoy

v6-0001-SVE-and-NEON-support-for-popcount.patch
Description: v6-0001-SVE-and-NEON-support-for-popcount.patch
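For readers who haven't looked at the patch, a byte-wise NEON popcount loop generally looks like the standalone sketch below. This is an illustration only, not the code from the attached patch; it assumes an AArch64 compiler providing arm_neon.h and uses __builtin_popcountll for the scalar tail.

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count set bits in an array of 64-bit words, 16 bytes per iteration. */
static uint64_t
neon_popcount(const uint64_t *words, size_t nwords)
{
	uint64_t	count = 0;
	size_t		i = 0;

	for (; i + 2 <= nwords; i += 2)
	{
		uint8x16_t	v = vld1q_u8((const uint8_t *) (words + i));

		/* vcntq_u8 gives per-byte popcounts; vaddlvq_u8 sums them across the vector */
		count += vaddlvq_u8(vcntq_u8(v));
	}
	for (; i < nwords; i++)
		count += __builtin_popcountll(words[i]);	/* scalar tail */
	return count;
}

int
main(void)
{
	uint64_t	sample[4] = {0xffULL, 0, ~0ULL, 1};

	printf("%llu\n", (unsigned long long) neon_popcount(sample, 4));	/* 8 + 0 + 64 + 1 = 73 */
	return 0;
}

In the scheme discussed above, the function-pointer dispatch would then choose between this kind of NEON fallback and the SVE version at runtime.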
Re: Add an option to skip loading missing publication to avoid logical replication failure
On Tue, Mar 11, 2025 at 9:48 AM Dilip Kumar wrote: > > On Mon, Mar 10, 2025 at 10:54 AM Amit Kapila wrote: > > > > > >> BTW, I am planning to commit this only on HEAD as this is a behavior > > >> change. Please let me know if you guys think otherwise. > > > > > > > > > Somehow this looks like a bug fix which should be backported no? Am I > > > missing something? > > > > > > > We can consider this a bug-fix and backpatch it, but I am afraid that > > because this is a behavior change, some users may not like it. Also, I > > don't remember seeing public reports for this behavior; that is > > probably because it is hard to hit. FYI, we found this via BF > > failures. So, I thought it would be better to get this field tested > > via HEAD only at this point in time. > > At the moment, I don't have a strong opinion on this. Since no one has > encountered or reported this issue, it might be the case that it's not > affecting anyone, and we could simply backpatch without causing any > dissatisfaction. However, I'm fine with whatever others decide. > Sawada-San, others, do you have an opinion on whether to backpatch this change? -- With Regards, Amit Kapila.
Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
On Fri, 28 Feb 2025 at 19:37, wrote: > > Hi, > > Fix an error in the patch. I felt you might have missed attaching the test patches added at [1]. Also the test from [1] is failing with the latest v6 version patch. This change is not required: extern void XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty, bool detailed_format, StringInfo buf, uint32 *fpi_len); - /* [1] - https://www.postgresql.org/message-id/4ba66566b84df983c881b996eb8831f1%40postgrespro.ru Regards, Vignesh
Re: Vacuum timing in pg_stat_all_tables
Hi, On Tue, Mar 04, 2025 at 03:24:26PM +, Bertrand Drouvot wrote: > Like "more stats are always nice" I think that "more explanations in the doc" > are > always nice, so I don't see any reason why not to add this extra explanation. Attached an attempt to do so. Note that it does not add extra explanation to "cost-based delay". If we feel the need we could add a link to "" like it has been done for delay_time in bb8dff9995f. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 8c333bcb97541b1238602ae84ea85d4792a5577f Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Wed, 12 Mar 2025 10:36:13 + Subject: [PATCH v1] Add more details for pg_stat_all_tables 30a6ed0ce4b added the [auto]vacuum_time and [auto]analyze _time fields to pg_stat_all_tables and bb8dff9995f added cost-based vacuum delay time to progress views. This commit highlights the fact that the fields added in 30a6ed0ce4b include the time spent sleeping due to cost-based delay. Suggested-by: Magnus Hagander --- doc/src/sgml/monitoring.sgml | 12 1 file changed, 8 insertions(+), 4 deletions(-) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index aaa6586d3a4..a37ae4d2d1d 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -4047,7 +4047,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage total_vacuum_time double precision - Total time this table has been manually vacuumed, in milliseconds + Total time this table has been manually vacuumed, in milliseconds. (This + includes the time spent sleeping due to cost-based delay.) @@ -4057,7 +4058,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage Total time this table has been vacuumed by the autovacuum daemon, - in milliseconds + in milliseconds. (This includes the time spent sleeping due to cost-based + delay.) @@ -4066,7 +4068,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage total_analyze_time double precision - Total time this table has been manually analyzed, in milliseconds + Total time this table has been manually analyzed, in milliseconds. (This + includes the time spent sleeping due to cost-based delay.) @@ -4076,7 +4079,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage Total time this table has been analyzed by the autovacuum daemon, - in milliseconds + in milliseconds. (This includes the time spent sleeping due to cost-based + delay.) -- 2.34.1
Re: Improve monitoring of shared memory allocations
Hi Andres, >> > + if (hashp->isshared) >> > + { >> > + int nsegs; >> > + int nbuckets; >> > + nsegs = find_num_of_segs(nelem, &nbuckets, >> hctl->num_partitions, hctl->ssize); >> > + >> > + curr_offset = (((char *) hashp->hctl) + >> sizeof(HASHHDR) + (info->dsize * sizeof(HASHSEGMENT)) + >> > ++ (sizeof(HASHBUCKET) * hctl->ssize * nsegs)); >> > + } >> > + >> >> Why only do this for shared hashtables? Couldn't we allocate the elments >> together with the rest for non-share hashtables too? >> > > I have now made the changes uniformly across shared and non-shared hash tables, making the code fix look cleaner. I hope this aligns with your suggestions. Please find attached updated and rebased versions of both patches. Kindly let me know your views. Thank you, Rahila Syed v3-0001-Account-for-initial-shared-memory-allocated-by-hash_.patch Description: Binary data v3-0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch Description: Binary data
Re: Proposal: Limitations of palloc inside checkpointer
On Wed, 12 Mar 2025 at 10:27, Xuneng Zhou wrote: > Hi, > The patch itself looks ok to me. I'm curious about the trade-offs between > this incremental approach and the alternative of > using palloc_extended() with the MCXT_ALLOC_HUGE flag. The approach of > splitting the requests into fixed-size slices avoids OOM failures or > process termination by the OOM killer, which is good. However, it does add > some overhead with additional lock acquisition/release cycles and memory > movement operations via memmove(). The natural question is whether the > security justify the cost. Regarding the slice size of 1 GB, is this > derived from MaxAllocSize limit, or was it chosen for other performance > reasons? whether a different size might offer better performance under > typical workloads? > I think 1 GB is derived purely from MaxAllocSize. This "palloc" is a relatively old one, and no one expected the number of requests to exceed 1 GB. Now we have the ability to set the shared_buffers to a huge number (without discussing now whether this makes any real sense), thus this limit for palloc becomes a problem. -- Best regards, Maxim Orlov.
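A minimal standalone sketch of the slice-based draining being discussed, to make the extra lock acquisition/release cycles and the memmove() visible. The queue, the no-op lock()/unlock(), process() and the slice size are stand-ins invented for the example, not the actual checkpointer structures; in the real code the cap would follow from MaxAllocSize, which is where the 1 GB figure comes from.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Request { int page; } Request;

#define SLICE_NELEMS	8192	/* in reality: MaxAllocSize / sizeof(Request) */

static Request shared_queue[100000];	/* stand-in for the shared-memory queue */
static size_t  shared_count = 100000;

static void lock(void)   { /* LWLockAcquire(...) in the real code */ }
static void unlock(void) { /* LWLockRelease(...) in the real code */ }
static void process(Request *r) { (void) r; /* absorb one request */ }

int
main(void)
{
	Request    *slice = malloc(SLICE_NELEMS * sizeof(Request));

	for (;;)
	{
		size_t		n;

		lock();
		n = shared_count < SLICE_NELEMS ? shared_count : SLICE_NELEMS;
		memcpy(slice, shared_queue, n * sizeof(Request));
		/* shift the remainder down so producers keep appending at the tail */
		memmove(shared_queue, shared_queue + n,
				(shared_count - n) * sizeof(Request));
		shared_count -= n;
		unlock();

		if (n == 0)
			break;
		for (size_t i = 0; i < n; i++)
			process(&slice[i]);
	}
	free(slice);
	printf("drained\n");
	return 0;
}

The trade-off mentioned above is visible here: each slice costs one lock/unlock pair and one memmove() of the remaining entries, in exchange for never allocating more than one slice's worth of local memory.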
Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2
> On 3/4/25 10:24 AM, Andreas Karlsson wrote:
> Rebased the patch to add support for OLD.* and NEW.*.

create table t (key int primary key, val text);
insert into t values (1, 'old');
insert into t values (1, 'new') on conflict (key) do select for update returning old.*, new.*;

 key | val | key | val
-----+-----+-----+-----
   1 | old |     |
(1 row)

IMO this should cause new.* to be the same as old.*. This is kind of like a "do update" that changes nothing, with the end result being that old and new are the same, because nothing was changed. Currently, the only command that can cause new to be NULL is a DELETE, because the new state of the table is that the row no longer exists, which isn't the case here.

On Mon, 10 Mar 2025 at 13:11, Kirill Reshke wrote:
>
> On Mon, 10 Mar 2025 at 18:05, Kirill Reshke wrote:
> >
> > 1) Should we answer INSERT 0 10 here?
>
> Sorry, i mean: INSERT 0 0

Hmm, I would say that the patch is correct -- the count should be the number of rows inserted, updated or selected for return (and the "Outputs" section of the doc page for INSERT should be updated to say that). That way, the count always matches the number of rows returned when there's a RETURNING clause, which I think is true of all other DML commands.

Regards,
Dean
Re: Implement waiting for wal lsn replay: reloaded
10.03.2025 14:30, Alexander Korotkov пишет: > On Fri, Feb 28, 2025 at 3:55 PM Yura Sokolov wrote: >> 28.02.2025 16:03, Yura Sokolov пишет: >>> 17.02.2025 00:27, Alexander Korotkov wrote: On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov wrote: > I briefly looked into patch and have couple of minor remarks: > > 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue > problems, but still don't like it. I'd prefer to see local fixed array, > say > of 16 elements, and loop around remaining function body acting in batch of > 16 wakeups. Doubtfully there will be more than 16 waiting clients often, > and even then it wont be much heavier than fetching all at once. OK, I've refactored this to use static array of 16 size. palloc() is used only if we don't fit static array. >>> >>> I've rebased patch and: >>> - fixed compiler warning in wait.c ("maybe uninitialized 'result'"). >>> - made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to >>> keep indentation, perhaps `do {} while` would be better? >> >> And fixed: >>'WAIT' is marked as BARE_LABEL in kwlist.h, but it is missing from >> gram.y's bare_label_keyword rule > > Thank you, Yura. I've further revised the patch. Mostly added the > documentation including SQL command reference and few paragraphs in > the high availability chapter explaining the read-your-writes > consistency concept. Good day, Alexander. Looking "for the last time" to the patch I found there remains `pg_wal_replay_wait` function in documentation and one comment. So I fixed it in documentation, and removed sentence from comment. Otherwise v6 is just rebased v5. --- regards Yura Sokolov aka funny-falconFrom 80b4cb8c0ac75168ab1fce55feccc4f08f32ce34 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Mon, 10 Mar 2025 12:59:38 +0200 Subject: [PATCH v6] Implement WAIT FOR command WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. The queue of waiters is stored in the shared memory as an LSN-ordered pairing heap, where the waiter with the nearest LSN stays on the top. During the replay of WAL, waiters whose LSNs have already been replayed are deleted from the shared memory pairing heap and woken up by setting their latches. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. 
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan Author: Alexander Korotkov Reviewed-by: Michael Paquier Reviewed-by: Peter Eisentraut Reviewed-by: Dilip Kumar Reviewed-by: Amit Kapila Reviewed-by: Alexander Lakhin Reviewed-by: Bharath Rupireddy Reviewed-by: Euler Taveira Reviewed-by: Heikki Linnakangas Reviewed-by: Kyotaro Horiguchi Reviewed-by: Yura Sokolov --- doc/src/sgml/high-availability.sgml | 54 +++ doc/src/sgml/ref/allfiles.sgml| 1 + doc/src/sgml/ref/wait_for.sgml| 216 +++ doc/src/sgml/reference.sgml | 1 + src/backend/access/transam/Makefile | 3 +- src/backend/access/transam/meson.build| 1 + src/backend/access/transam/xact.c | 6 + src/backend/access/transam/xlog.c | 7 + src/backend/access/transam/xlogrecovery.c | 11 + src/backend/access/transam/xlogwait.c | 347 ++ src/backend/commands/Makefile | 3 +- src/backend/commands/meson.build | 1 + src/backend/commands/wait.c | 184 ++ src/backend/lib/pairingheap.c | 18 +- src/backend/parser/gram.y | 15 +- src/backend/storage/ipc/ipci.c| 3 + src/backend/storage/lmgr/proc.c | 6 + src/backend/tcop/pquery.c | 12 +- src/backend/tcop/utility.c| 22 ++ .../utils/activity/wait_event_names.txt | 2 + src/include/access/xlogwait.h | 89 + src/include/commands/wait.h | 21 ++ src/include/lib/pairingheap.h | 3 + src/include/nodes/parsenodes.h| 7 + src/include/parser/kwlist.h | 1 + src/include/storage/lwlocklist.h | 1 + src/include/tcop/cmdtaglist.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/045_wait_for_lsn.pl | 217 +++ src/tools/pgindent/typedefs.list
Re: dblink: Add SCRAM pass-through authentication
On Mon, Mar 10, 2025 at 11:25 AM Jacob Champion wrote: > > On Fri, Mar 7, 2025 at 8:22 AM Peter Eisentraut wrote: > > Right. How about the attached? It checks as an alternative to a > > password whether the SCRAM keys were provided. That should get us back > > to the same level of checking? > > Yes, I think so. Attached is a set of tests to illustrate, mirroring > the dblink tests added upthread; they fail without this patch. In an offline discussion with Peter and Matheus, we figured out that this is still not enough. The latest patch checks that a password was used, but it doesn't ensure that the password material came from the SCRAM keys. Attached is an updated test to illustrate. Thanks, --Jacob commit 4a41754eaa41f2db285e68ff8140d6932c299358 Author: Jacob Champion Date: Mon Mar 10 11:18:27 2025 -0700 WIP diff --git a/contrib/postgres_fdw/t/001_auth_scram.pl b/contrib/postgres_fdw/t/001_auth_scram.pl index 047840cc914..464492948b4 100644 --- a/contrib/postgres_fdw/t/001_auth_scram.pl +++ b/contrib/postgres_fdw/t/001_auth_scram.pl @@ -68,6 +68,45 @@ test_fdw_auth($node1, $db0, "t2", $fdw_server2, test_auth($node2, $db2, "t2", "SCRAM auth directly on foreign server should still succeed"); +# Ensure that trust connections fail without superuser opt-in. +unlink($node1->data_dir . '/pg_hba.conf'); +unlink($node2->data_dir . '/pg_hba.conf'); + +$node1->append_conf( + 'pg_hba.conf', qq{ +local db0 all scram-sha-256 +local db1 all trust +}); +$node2->append_conf( + 'pg_hba.conf', qq{ +local all all password +}); + +$node1->restart; +$node2->restart; + +my ($ret, $stdout, $stderr) = $node1->psql( + $db0, + "SELECT count(1) FROM t", + connstr => $node1->connstr($db0) . " user=$user"); + +is($ret, 3, 'loopback trust fails on the same cluster'); +like( + $stderr, + qr/password or GSSAPI delegated credentials required/, + 'expected error from loopback trust (same cluster)'); + +($ret, $stdout, $stderr) = $node1->psql( + $db0, + "SELECT count(1) FROM t2", + connstr => $node1->connstr($db0) . " user=$user"); + +is($ret, 3, 'loopback password fails on a different cluster'); +like( + $stderr, + qr/password or GSSAPI delegated credentials required/, + 'expected error from loopback password (different cluster)'); + # Helper functions sub test_auth
remove open-coded popcount in acl.c
There's a count_one_bits() function in acl.c that can be replaced with a call to pg_popcount64(). This isn't performance-critical code, but IMHO we might as well use the centralized implementation.

--
nathan

>From 932fac13bf168571b145a54c29d9ac28ca2a070f Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 12 Mar 2025 10:45:12 -0500
Subject: [PATCH v1 1/1] Remove open-coded popcount in acl.c.

---
 src/backend/utils/adt/acl.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 6a76550a5e2..ba14713fef2 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -5432,24 +5432,6 @@ select_best_admin(Oid member, Oid role)
 	return admin_role;
 }
 
-
-/* does what it says ... */
-static int
-count_one_bits(AclMode mask)
-{
-	int			nbits = 0;
-
-	/* this code relies on AclMode being an unsigned type */
-	while (mask)
-	{
-		if (mask & 1)
-			nbits++;
-		mask >>= 1;
-	}
-	return nbits;
-}
-
-
 /*
  * Select the effective grantor ID for a GRANT or REVOKE operation.
  *
@@ -5532,7 +5514,7 @@ select_best_grantor(Oid roleId, AclMode privileges,
 	 */
 	if (otherprivs != ACL_NO_RIGHTS)
 	{
-		int			nnewrights = count_one_bits(otherprivs);
+		int			nnewrights = pg_popcount64(otherprivs);
 
 		if (nnewrights > nrights)
 		{
-- 
2.39.5 (Apple Git-154)
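As a quick sanity check that the replacement is behavior-preserving: the removed loop and a 64-bit popcount agree for any unsigned value. A tiny standalone test like the following (illustrative only, using the GCC/Clang builtin in place of pg_popcount64(), and uint64_t in place of AclMode) confirms it.

#include <assert.h>
#include <stdint.h>

/* the open-coded version, essentially as it appeared in acl.c */
static int
count_one_bits(uint64_t mask)
{
	int			nbits = 0;

	while (mask)
	{
		if (mask & 1)
			nbits++;
		mask >>= 1;
	}
	return nbits;
}

int
main(void)
{
	uint64_t	samples[] = {0, 1, 0xff, 0x8000000000000000ULL, ~0ULL};

	for (int i = 0; i < 5; i++)
		assert(count_one_bits(samples[i]) ==
			   __builtin_popcountll(samples[i]));
	return 0;
}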
Re: Orphaned users in PG16 and above can only be managed by Superusers
On Wed, Mar 12, 2025 at 12:10:30PM +0530, Ashutosh Sharma wrote:
> I think moving the check to the second pass won't work in this case.
> The reason is that we rely on entries in the pg_auth_members table. By
> the time the check occurs in the second pass, the first pass will have
> already removed all entries related to the roles being dropped from
> pg_auth_members and incremented the command counter. As a result, when
> check_drop_role_dependency() is called in the second pass, it won't
> find any entries in the table and won't be able to detect any ADMIN
> privilege-related dependencies.

Oh, good catch.

> So, we need to explore alternative approaches to handle this better.
> I'll spend some more time today to investigate other possibilities for
> addressing this issue.

I think this approach has other problems. For example, even if a role has admin directly on the dropped role, we'll block DROP ROLE if it also has admin indirectly:

postgres=# create role a createrole;
CREATE ROLE
postgres=# set role a;
SET
postgres=> create role b createrole;
CREATE ROLE
postgres=> set role b;
SET
postgres=> create role c admin a;
CREATE ROLE
postgres=> reset role;
RESET
postgres=# drop role b;
ERROR:  role "b" cannot be dropped because some objects depend on it
DETAIL:  role a inherits ADMIN privileges on role c through role b

There are also other ways besides DROP ROLE that roles can end up without any admins:

postgres=# create role a createrole;
CREATE ROLE
postgres=# set role a;
SET
postgres=> create role b createrole;
CREATE ROLE
postgres=> set role b;
SET
postgres=> create role c;
CREATE ROLE
postgres=> reset role;
RESET
postgres=# revoke c from b;
REVOKE ROLE

I wonder if we should adjust the requirements here to be "an operation cannot result in a role with 0 admins."  That does mean you might lose indirect admin privileges, but I'm not sure that's sustainable, anyway.  For example, if a role has 2 admins, should we block REVOKE from one of the admins because another role inherited its privileges?  If nothing else, it sounds like a complicated user experience.

If we just want to make sure that roles don't end up without admins, I think we could just collect the set of roles losing an admin during DROP ROLE, REVOKE, etc., and then once all removals have completed, we verify that each of the collected roles has an admin.

--
nathan
Re: Elimination of the repetitive code at the SLRU bootstrap functions.
> On 12 Mar 2025, at 20:02, Evgeny Voropaev wrote: > v6 looks good to me. I'll flip the CF entry. Thanks! Best regards, Andrey Borodin.
Re: Log connection establishment timings
On Tue, Mar 11, 2025 at 6:27 PM Melanie Plageman wrote: > > I did more manual testing of my patches, and I think they are mostly > ready for commit except for the IsConnectionBackend() macro (if we > have something to change it to). I've committed this and marked it as such in the CF app. Thanks to everyone for the review. - Melanie
Re: Introduce "log_connection_stages" setting.
On Mon, Mar 3, 2025 at 6:43 PM Melanie Plageman wrote: > > Okay, I got cold feet today working on this. I actually think we > probably want some kind of guc set (set like union/intersection/etc > not set like assign a value) infrastructure for gucs that can be equal > to any combination of predetermined values. I ended up with a patch set that I was happy with which used the list of string GUCs. I committed it in 9219093cab26. It changes log_connections to a list of strings for the different aspects of connection setup and supports the most common inputs for the boolean version of log_connections for backwards compatibility. I did not end up deprecating log_disconnections since it is rather late in the cycle and that change could use more visibility. I suggest proposing that at the beginning of the next release. Thanks for your interest in this topic. - Melanie
Re: Index AM API cleanup
On Wed, Mar 12, 2025 at 7:00 AM Peter Eisentraut wrote:
> And another small patch set extracted from the bigger one, to keep
> things moving along:
>
> 0001: Add get_opfamily_method() in lsyscache.c, from your patch set.
>

Right, this came from v21-0006-*, with a slight code comment change and one variable renaming. It is ready for commit.

> 0002: Add get_opfamily_member_for_cmptype(). This was called
> get_opmethod_member() in your patch set, but I think that name wasn't
> quite right. I also removed the opmethod argument, which was rarely
> used and is somewhat redundant.
>

This is also taken from v21-0006-*. The reason I had an opmethod argument was that some of the callers of this function already know the opmethod, and without the argument, this function has to look up the opmethod from the syscache again. Whether that makes a measurable performance difference is an open question.

Your version is ready for commit. If we want to reintroduce the opmethod argument for performance reasons, we can always circle back to that in a later commit.

> 0003 and 0004 are enabling non-btree unique indexes for partition keys
>

You refactored v21-0011-* into v21.2-0003-*, in which an error gets raised about a missing operator in a slightly different part of the logic. I am concerned that the new positioning of the check-and-error might allow the flow of execution to reach the Assert(idx_eqop) statement in situations where the user has defined an incomplete opfamily or opclass. Such a condition should raise an error about the missing operator rather than asserting. In particular, looking at the control logic:

if (stmt->unique && !stmt->iswithoutoverlaps)
{
}
else if (exclusion)
    ;
Assert(idx_eqop);

I cannot prove to myself that the assertion cannot trigger, because the upstream logic before we reach this point *might* be filtering out all cases where this could be a problem, but it is too complicated to prove. Even if it is impossible now, this is a pretty brittle piece of code after applying the patch. Any chance you'd like to keep the patch closer to how I had it in v21-0011-* ?

> and materialized views. These were v21-0011 and v21-0012, except that
> I'm combining the switch from strategies to compare types (which was in
> v21-0006 or so) with the removal of the btree requirements.
>

v21.2-0004-* is ready for commit.

— Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Test to dump and restore objects left behind by regression
On Wed, Mar 12, 2025 at 5:35 PM Alvaro Herrera wrote: > > Hello > > When running these tests, I encounter this strange diff in the dumps, > which seems to be that the locale for type money does not match. I > imagine the problem is that the locale is not set correctly when > initdb'ing one of them? Grepping the regress_log for initdb, I see > this: > > $ grep -B1 'Running: initdb' tmp_check/log/regress_log_002_pg_upgrade > [13:00:57.580](0.003s) # initializing database system by running initdb > # Running: initdb -D > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata > -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 > --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale > C.UTF-8 -k > -- > [13:01:12.879](0.044s) # initializing database system by running initdb > # Running: initdb -D > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_dst_node_data/pgdata > -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 > --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale > C.UTF-8 -k > -- > [13:01:28.000](0.033s) # initializing database system by running initdb > # Running: initdb -D > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata > -A trust -N --wal-segsize 1 --allow-group-access --encoding SQL_ASCII > --locale-provider libc > The original node and the node where dump is restored have the same initdb commands. It's the upgraded node which has different initdb command. But that's how the test is written originally. > > > [12:50:31.838](0.102s) not ok 15 - dump outputs from original and restored > regression database (using plain format) match > [12:50:31.839](0.000s) > [12:50:31.839](0.000s) # Failed test 'dump outputs from original and > restored regression database (using plain format) match' > # at /pgsql/source/master/src/test/perl/PostgreSQL/Test/Utils.pm line 797. > [12:50:31.839](0.000s) # got: '1' > # expected: '0' > === diff of > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted > and > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted > === stdout === > --- > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted > 2025-03-12 12:50:27.674918597 +0100 > +++ > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted > 2025-03-12 12:50:31.778840338 +0100 > @@ -208972,7 +208972,7 @@ > -- Data for Name: money_data; Type: TABLE DATA; Schema: public; Owner: > alvherre > -- > COPY public.money_data (m) FROM stdin; > -$123.46 > +$ 12.346,00 > \. 
> -- > -- Data for Name: mvtest_t; Type: TABLE DATA; Schema: public; Owner: alvherre > @@ -376231,7 +376231,7 @@ > -- Data for Name: tab_core_types; Type: TABLE DATA; Schema: public; Owner: > alvherre > -- > COPY public.tab_core_types (point, line, lseg, box, openedpath, closedpath, > polygon, circle, date, "time", "timestamp", timetz, timestamptz, "interval", > "json", jsonb, jsonpath, inet, cidr, macaddr8, macaddr, int2, int4, int8, > float4, float8, pi, "char", bpchar, "varchar", name, text, bool, bytea, > "bit", varbit, money, refcursor, int2vector, oidvector, aclitem, tsvector, > tsquery, uuid, xid8, regclass, type, regrole, oid, tid, xid, cid, > txid_snapshot, pg_snapshot, pg_lsn, cardinal_number, character_data, > sql_identifier, time_stamp, yes_or_no, int4range, int4multirange, int8range, > int8multirange, numrange, nummultirange, daterange, datemultirange, tsrange, > tsmultirange, tstzrange, tstzmultirange) FROM stdin; > -(11,12){1,-1,0}[(11,11),(12,12)] (13,13),(11,11) > ((11,12),(13,13),(14,14)) [(11,12),(13,13),(14,14)] > ((11,12),(13,13),(14,14)) <(1,1),1> 2025-03-12 > 04:50:14.125899 2025-03-12 04:50:14.125899 04:50:14.125899-07 > 2025-03-12 12:50:14.125899+01 00:00:12{"reason":"because"} > {"when": "now"} $."a"[*]?(@ > 2)127.0.0.1 127.0.0.0/8 > 00:01:03:ff:fe:86:1c:ba 00:01:03:86:1c:ba 2 4 8 4 > 8 3.14159265358979f c abc nametxt t > \\xdeadbeef 1 10001 $12.34 abc 1 2 1 2 > alvherre=UC/alvherre'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat' > 'fat' & 'rat' a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1111 pg_class > regtype pg_monitor 1259(1,1) 2 3 10:20:10,14,15 > 10:20:10,14,15 16/B374D848 1 l n 2025-03-12 > 12:50:14.13+01 YES empty {} empty {} (3,4) {(3,4)} > [2020-01-03,2021-02-03) {[2020-01-03,2021-02-03)} ("2020-01-02 > 03:04:05","2021-02-03 06:07:08") {("
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Wed, Mar 12, 2025 at 3:20 PM Alvaro Herrera wrote: > On 2025-Mar-12, Ashutosh Bapat wrote: > > > If the test passes for you, can you please try the patches at [1] on > > top of your patches? Please apply those, set and export environment > > variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? > > I intended to do this but can not do it since the test always fails > > with your patches applied. > > Oh, I need to enable a PG_TEST_EXTRA option in order for the test to > run? FFS. That explains why the tests passed just fine for me. > I'll re-run. > > -- > Álvaro HerreraBreisgau, Deutschland — > https://www.EnterpriseDB.com/ > "La conclusión que podemos sacar de esos estudios es que > no podemos sacar ninguna conclusión de ellos" (Tanenbaum) > I can reproduce the issue and will be sending the patch soon. Thanks Alvaro, Ashutosh. Rushabh Lathia
Re: Elimination of the repetitive code at the SLRU bootstrap functions.
Hello Hackers! Andrey, thank you for your review and remarks. > Patch adds whitespace errors Cleaned. > if (writePage != 0) should be if (writePage) Done. > XLogSimpleInsert(int64 simpledata, RmgrId rmid, uint8 info) > I’d rename function XLogSimpleInsert() to something more descriptive > and changed arguments order from generic to specific. The function has been renamed and the parameters have been reordered. Now we have: XLogInsertInt64(RmgrId rmid, uint8 info, int64 simpledata) > Probably, committer has broader view on XLog routines and can decide > if this function would better belong to SLRU than common XLog stuff. In accordance with Álvaro's proposal, we want to enclose this function in the "xloginsert.c" module. Best regards, Evgeny Voropaev.From 8c9aa484dc96cdf23c7fa524d88d67ce3c4cc6fc Mon Sep 17 00:00:00 2001 From: Evgeny Voropaev Date: Wed, 12 Mar 2025 22:35:19 +0800 Subject: [PATCH v6] Elimination of the repetitive code at the SLRU bootstrapping and nullifying functions. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The functions bootstrapping and nullifying SLRU pages used to have a lot of repetitive code. New functions realized by this commit (BootStrapSlruPage, XLogInsertInt64) have allowed to remove these repetitions. Author: Evgeny Voropaev Reviewed by: Álvaro Herrera Reviewed by: Andrey Borodin Reviewed by: Aleksander Alekseev Discussion: https://www.postgresql.org/message-id/flat/97820ce8-a1cd-407f-a02b-47368fadb14b%40tantorlabs.com --- src/backend/access/transam/clog.c | 72 ++- src/backend/access/transam/commit_ts.c | 68 ++- src/backend/access/transam/multixact.c | 155 ++-- src/backend/access/transam/slru.c | 45 ++- src/backend/access/transam/subtrans.c | 48 ++-- src/backend/access/transam/xloginsert.c | 14 +++ src/include/access/slru.h | 2 + src/include/access/xloginsert.h | 1 + 8 files changed, 137 insertions(+), 268 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 48f10bec91e..aca8c31d680 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -110,7 +110,6 @@ static SlruCtlData XactCtlData; #define XactCtl (&XactCtlData) -static int ZeroCLOGPage(int64 pageno, bool writeXlog); static bool CLOGPagePrecedes(int64 page1, int64 page2); static void WriteZeroPageXlogRec(int64 pageno); static void WriteTruncateXlogRec(int64 pageno, TransactionId oldestXact, @@ -832,41 +831,11 @@ check_transaction_buffers(int *newval, void **extra, GucSource source) void BootStrapCLOG(void) { - int slotno; - LWLock *lock = SimpleLruGetBankLock(XactCtl, 0); - - LWLockAcquire(lock, LW_EXCLUSIVE); - - /* Create and zero the first page of the commit log */ - slotno = ZeroCLOGPage(0, false); - - /* Make sure it's written out */ - SimpleLruWritePage(XactCtl, slotno); - Assert(!XactCtl->shared->page_dirty[slotno]); - - LWLockRelease(lock); -} - -/* - * Initialize (or reinitialize) a page of CLOG to zeroes. - * If writeXlog is true, also emit an XLOG record saying we did this. - * - * The page is not actually written, just set up in shared memory. - * The slot number of the new page is returned. - * - * Control lock must be held at entry, and will be held at exit. 
- */ -static int -ZeroCLOGPage(int64 pageno, bool writeXlog) -{ - int slotno; - - slotno = SimpleLruZeroPage(XactCtl, pageno); - - if (writeXlog) - WriteZeroPageXlogRec(pageno); - - return slotno; + /* + * Nullify the page (pageno = 0), don't insert an XLog record, + * save the page on a disk + */ + BootStrapSlruPage(XactCtl, 0, 0, true); } /* @@ -959,7 +928,6 @@ void ExtendCLOG(TransactionId newestXact) { int64 pageno; - LWLock *lock; /* * No work except at first XID of a page. But beware: just after @@ -970,14 +938,12 @@ ExtendCLOG(TransactionId newestXact) return; pageno = TransactionIdToPage(newestXact); - lock = SimpleLruGetBankLock(XactCtl, pageno); - LWLockAcquire(lock, LW_EXCLUSIVE); - - /* Zero the page and make an XLOG entry about it */ - ZeroCLOGPage(pageno, true); - - LWLockRelease(lock); + /* + * Zero the page, make an XLOG entry about it, + * don't write the page on a disk + */ + BootStrapSlruPage(XactCtl, pageno, WriteZeroPageXlogRec, false); } @@ -1073,9 +1039,7 @@ CLOGPagePrecedes(int64 page1, int64 page2) static void WriteZeroPageXlogRec(int64 pageno) { - XLogBeginInsert(); - XLogRegisterData(&pageno, sizeof(pageno)); - (void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE); + XLogInsertInt64(RM_CLOG_ID, CLOG_ZEROPAGE, pageno); } /* @@ -1114,19 +1078,9 @@ clog_redo(XLogReaderState *record) if (info == CLOG_ZEROPAGE) { int64 pageno; - int slotno; - LWLock *lock; - memcpy(&pageno, XLogRecGetData(record), sizeof(pageno)); - - lock = SimpleLruGetBankLock(XactCtl, pageno); - LWLockAcquire(lock, LW_EXCLUSIVE); - - sl
Re: Proposal: Progressive explain
On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote: > Did additional benchmarks and found issues with the patch that doesn't do > execProcNode > wrapping. There are sporadic crashes with **double free or corruption (top)** > > So making the patch that uses the wrapper the current one. Again, giving the > credits to > torikoshia as being the owner of that section of the code. > Rafael, thanks for working on it. It is a step forward in observability. I started with some performance tests and the last improvements seem to fix the overhead imposed in the initial patch version. I didn't notice any of these new function in the perf report while executing fast queries. I found a crash. It is simple to reproduce. Session A: select * from pg_stat_progress_explain; \watch 2 Session B: explain select pg_sleep(30); \watch 2 8<8< Backtrace: Core was generated by `postgres: euler postgres [lo'. Program terminated with signal SIGSEGV, Segmentation fault. #0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401 5401 if (ps->ExecProcNodeOriginal != NULL) #0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401 #1 0x5624173829aa in handle_sig_alarm (postgres_signal_arg=) at timeout.c:414 #2 0x5624173ba02c in wrapper_handler (postgres_signal_arg=14) at pqsignal.c:110 #3 #4 0x7f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 #5 0x5624171fb02f in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at waiteventset.c:1190 #6 WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffdd9e62080, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663296) at waiteventset.c:1138 #7 0x56241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 , len=8192) at be-secure.c:218 #8 0x56241709bf2e in pq_recvbuf () at pqcomm.c:924 #9 0x56241709ceb5 in pq_getbyte () at pqcomm.c:970 #10 0x56241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361 #11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484 #12 PostgresMain (dbname=, username=) at postgres.c:4625 #13 0x5624172167ed in BackendMain (startup_data=, startup_data_len=) at backend_startup.c:107 #14 0x56241717519b in postmaster_child_launch (child_type=, child_slot=2, startup_data=startup_data@entry=0x7ffdd9e6253c, startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffdd9e62540) at launch_backend.c:274 #15 0x562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at postmaster.c:3519 #16 ServerLoop () at postmaster.c:1688 #17 0x56241717a6da in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x56244eeb81b0) at postmaster.c:1386 #18 0x562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230 8<8< You call this feature "progressive explain". My first impression is that it will only provide the execution plans for EXPLAIN commands. Instead of "progressive explain", I would suggest "query progress" that is a general database terminology. It seems natural to use "progressive explain" since you are using the explain infrastructure (including the same options -- format, settings, wal, ...) -- to print the execution plan. +CREATE VIEW pg_stat_progress_explain AS +SELECT +* +FROM pg_stat_progress_explain(true); + There is no use for the function argument. If you decide to keep this function, remove it. Why don't you use the pgstat_progress_XXX() API? 
Since you are using a pg_stat_progress_XXX view name I would expect it to use the command progress reporting infrastructure (see backend_progress.c). Maybe you could include datid and datname as the other progress reporting views do. It would avoid a join to figure out what the database is. +static const struct config_enum_entry explain_format_options[] = { + {"text", EXPLAIN_FORMAT_TEXT, false}, + {"xml", EXPLAIN_FORMAT_XML, false}, + {"json", EXPLAIN_FORMAT_JSON, false}, + {"yaml", EXPLAIN_FORMAT_YAML, false}, + {NULL, 0, false} +}; Isn't it the same definition as in auto_explain.c? Use only one definition for auto_explain and this feature. You can put this struct into explain.c, use an extern definition for guc_tables.c and put an extern PGDLLIMPORT definition into guc.h. See wal_level_options for an example. +static const struct config_enum_entry progressive_explain_options[] = { + {"off", PROGRESSIVE_EXPLAIN_NONE, false}, + {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false}, + {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false}, + {"false", PROGRESSIVE_EXPLAIN_NONE, true}, + {NULL, 0, false} +}; The "analyze" is a separate option in auto_explain. Should we have 2 options? One that enable/disable
Re: Index AM API cleanup
Peter Eisentraut writes: > 0002: Add get_opfamily_member_for_cmptype(). This was called > get_opmethod_member() in your patch set, but I think that name wasn't > quite right. I also removed the opmethod argument, which was rarely > used and is somewhat redundant. Hm, that will throw an error if IndexAmTranslateCompareType fails. Shouldn't it be made to return InvalidOid instead? regards, tom lane
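A minimal sketch of that alternative, assuming the missing_ok form of IndexAmTranslateCompareType reports an unknown mapping as InvalidStrategy rather than raising an error:

    Oid
    get_opfamily_member_for_cmptype(Oid opfamily, Oid lefttype, Oid righttype,
                                    CompareType cmptype)
    {
        Oid             opmethod = get_opfamily_method(opfamily);
        StrategyNumber  strategy;

        /* missing_ok = true: soft failure instead of an error */
        strategy = IndexAmTranslateCompareType(cmptype, opmethod, opfamily, true);
        if (strategy == InvalidStrategy)
            return InvalidOid;

        return get_opfamily_member(opfamily, lefttype, righttype, strategy);
    }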
Re: Optimizing FastPathTransferRelationLocks()
On Tue, Mar 11, 2025 at 8:46 PM Fujii Masao wrote: > > > > On 2025/03/11 20:55, Ashutosh Bapat wrote: > > Hi Fujii-san, > > > > It seems that this was forgotten somehow. > > > > The patch still applies. > > > > Examining c4d5cb71d229095a39fda1121a75ee40e6069a2a, it seems that this patch > > could have been part of that commit as well. But may be it wasn't so > > apparent > > that time. I think it's a good improvement. > > Thanks for reviewing the patch! > > > > On Tue, Nov 19, 2024 at 10:14 PM Fujii Masao > > wrote: > >> > >> The latest version of the patch is attached. > > > > @@ -2773,6 +2773,10 @@ FastPathTransferRelationLocks(LockMethod > > lockMethodTable, const LOCKTAG *locktag > > LWLock *partitionLock = LockHashPartitionLock(hashcode); > > Oid relid = locktag->locktag_field2; > > uint32 i; > > + uint32 group; > > + > > + /* fast-path group the lock belongs to */ > > + group = FAST_PATH_REL_GROUP(relid); > > > > We could just combine variable declaration and initialization; similar > > to partitionLock. > > I’ve updated the patch as suggested. Updated patch is attached. > Thanks. > > > The performance gain from this patch might be tiny and may not be > > visible. Still, have you tried to measure the performance improvement? > > I haven’t measured the actual performance gain since the patch optimizes > the logic in a clear and logical way. While we might see some improvement > in artificial scenarios — like with a large max_connections and > all backends slots having their databaseIds set, I’m not sure > how meaningful that would be. Fair enough. The code is more readable this way. That itself is an improvement. I stared at c4d5cb71d229095a39fda1121a75ee40e6069a2a to see whether there's any reason this was left aside at that time. I am convinced that it was just missed. I think this patch is good to be committed. -- Best Wishes, Ashutosh Bapat
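For reference, the combined declaration-and-initialization suggested above amounts to something like this (a sketch based on the quoted hunk rather than the committed text):

    LWLock     *partitionLock = LockHashPartitionLock(hashcode);
    Oid         relid = locktag->locktag_field2;

    /* fast-path group the lock belongs to */
    uint32      group = FAST_PATH_REL_GROUP(relid);
    uint32      i;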
Re: Draft for basic NUMA observability
On Mon, Mar 10, 2025 at 11:14 AM Bertrand Drouvot wrote: > Thanks for the new version! v10 is attached with most fixes after review and one new thing introduced: pg_numa_available() for run-time decision inside tests which was needed after simplifying code a little bit as you wanted. I've also fixed -Dlibnuma=disabled as it was not properly implemented. There are couple open items (minor/decision things), but most is fixed or answered: > Some random comments on 0001: > > === 1 > > It does not compiles "alone". It's missing: > [..] > +extern PGDLLIMPORT int huge_pages_status; [..] > -static int huge_pages_status = HUGE_PAGES_UNKNOWN; > +inthuge_pages_status = HUGE_PAGES_UNKNOWN; > > That come with 0002. So it has to be in 0001 instead. Ugh, good catch, I haven't thought about it in isolation, they are separate to just ease review, but should be committed together. Fixed. > === 2 > > +else > + as_fn_error $? "header file is required for --with-libnuma" > "$LINENO" 5 > +fi > > Maybe we should add the same test (checking for numa.h) for meson? TBH, I have no idea, libnuma.h may exist but it may not link e.g. when cross-compiling 32-bit on 64-bit. Or is this more about keeping sync between meson and autoconf? > === 3 > > +# FIXME: filter-out / with/without with_libnuma? > +LIBS += $(LIBNUMA_LIBS) > > It looks to me that we can remove those 2 lines. Done. > === 4 > > + Only supported on Linux. > > s/on Linux/on platforms for which the libnuma library is implemented/? > > I did a quick grep on "Only supported on" and it looks like that could be > a more consistent wording. Fixed. > === 5 > > +#include "c.h" > +#include "postgres.h" > +#include "port/pg_numa.h" > +#include "storage/pg_shmem.h" > +#include > > I had a closer look to other header files and it looks like it "should" be: > > #include "c.h" > #include "postgres.h" > > #include > #ifdef WIN32 > #include > #endif > > #include "port/pg_numa.h" > #include "storage/pg_shmem.h" > > And is "#include "c.h"" really needed? Fixed both. It seems to compile without c.h. > === 6 > > +/* FIXME not tested, might crash */ > > That's a bit scary. When you are in support for long enough it is becoming the norm ;) But on serious note Andres wanted have numa error/warning handlers (used by libnuma), but current code has no realistic code-path to hit it from numa_available(3), numa_move_pages(3) or numa_max_node(3). The situation won't change until one day in future (I hope!) we start using some more advanced libnuma functionality for interleaving memory, please see my earlier reply: https://www.postgresql.org/message-id/CAKZiRmzpvBtqrz5Jr2DDcfk4Ar1ppsXkUhEX9RpA%2Bs%2B_5hcTOg%40mail.gmail.com E.g. numa_available(3) is tiny wrapper , see https://github.com/numactl/numactl/blob/master/libnuma.c#L872 For now, I've adjusted that FIXME to XXX, but still don't know we could inject failure to see this triggered... > === 7 > > +* XXX: for now we issue just WARNING, but long-term that might > depend on > +* numa_set_strict() here > > s/here/here./ Done. > Did not look carefully at all the comments in 0001, 0002 and 0003 though. 
> > A few random comments regarding 0002: > > === 8 > > # create extension pg_buffercache; > ERROR: could not load library > "/home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so": > /home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so: undefined > symbol: pg_numa_query_pages > CONTEXT: SQL statement "CREATE FUNCTION pg_buffercache_pages() > RETURNS SETOF RECORD > AS '$libdir/pg_buffercache', 'pg_buffercache_pages' > LANGUAGE C PARALLEL SAFE" > extension script file "pg_buffercache--1.2.sql", near line 7 > > While that's ok if 0003 is applied. I think that each individual patch should > "fully" work individually. STILL OPEN QUESTION: Not sure I understand: you need to have 0001 + 0002 or 0001 + 0003, but here 0002 is complaining about lack of pg_numa_query_pages() which is part of 0001 (?) Should I merge those patches or keep them separate? > === 9 > > + do > + { > + if (os_page_ptrs[blk2page + j] == 0) > > blk2page + j will be repeated multiple times, better to store it in a local > variable instead? Done. > === 10 > > + if (firstUseInBackend == true) > > > if (firstUseInBackend) instead? Done everywhere. > === 11 > > + int j = 0, > + blk2page = (int) i * pages_per_blk; > > I wonder if size_t is more appropriate for blk2page: > > size_t blk2page = (size_t)(i * pages_per_blk) maybe? Sure, done. > === 12 > > as we know that we'll iterate until pages_per_blk then would a for loop be > more > appropriate here, something like? > > " > for (size_t j = 0; j < pages_per_blk; j++) > { > if (os_page_ptrs[blk2page + j] == 0) > { > " Sure. > === 13 > > + if (buf_state & BM_DIRTY) > + fctx->record[i].isdirty = true; > + else > +
Re: Separate GUC for replication origins
On Fri, 7 Mar 2025 at 21:21, Euler Taveira wrote: > > On Fri, Mar 7, 2025, at 11:47 AM, Peter Eisentraut wrote: > > Is that maximum active for the whole system, or maximum active per > session, or maximum active per created origin, or some combination of these? > > > It is a cluster-wide setting. Similar to max_replication_slots. I will make > sure the GUC description is clear about it. It seems the Replication Progress > Tracking chapter needs an update to specify this information too. I reviewed the discussion on this thread and believe we now have an agreement on the design and GUC names. However, the patch still needs updates and extensive testing, especially considering its impact on backward compatibility. I'm unsure if this feature can be committed in the current CommitFest. If you're okay with it, we can move it to the next CommitFest. However, if you prefer to keep it, we can do our best to complete it and make a final decision at the end of the CommitFest. Regards, Vignesh
Re: Question about duplicate JSONTYPE_JSON check
On 2025-Mar-12, Amit Langote wrote: > Patch look good for committing? Ah sorry, I should have said so -- yes, it looks good to me. I feel a slight dislike for using URL-escaped characters in the mailing list link you added, because it means I cannot directly copy/paste the message-id string into my email client program. Not a huge issue for sure, and it seems a majority of links in the source tree are already like that anyway, but this seems an inclusive, safe, welcoming nitpicking space :-) -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
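For illustration (the message-id below is made up): a link written as https://www.postgresql.org/message-id/flat/abc123%40example.com has to have the %40 decoded back to @ before abc123@example.com can be pasted into a mail client's message-id search, whereas a link spelled with the literal @ can be copied directly.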
Re: Changing the state of data checksums in a running cluster
On 3/11/25 14:07, Dagfinn Ilmari Mannsåker wrote: > As the resident perl style pedant, I'd just like to complain about the > below: > > Tomas Vondra writes: > >> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm >> b/src/test/perl/PostgreSQL/Test/Cluster.pm >> index 666bd2a2d4c..1c66360c16c 100644 >> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm >> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm >> @@ -3761,7 +3761,8 @@ sub checksum_enable_offline >> my ($self) = @_; >> >> print "### Enabling checksums in \"$self->data_dir\"\n"; >> -PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >> $self->data_dir, '-e'); >> +PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >> +$self->data_dir, '-e'); >> return; >> } > > > This breaking between the command line options and its arguments is why > we're switching to using fat commas. We're also using long options for > improved self-documentation, so this should be written as: > > PostgreSQL::Test::Utils::system_or_bail('pg_checksums', > '--pgdata' => $self->data_dir, > '--enable'); > > And likewise below in the disable method. > I don't know what fat comma is, but that's simply what perltidy did. I don't mind formatting it differently, if there's a better way. thanks -- Tomas Vondra
Re: Index AM API cleanup
And another small patch set extracted from the bigger one, to keep things moving along: 0001: Add get_opfamily_method() in lsyscache.c, from your patch set. 0002: Add get_opfamily_member_for_cmptype(). This was called get_opmethod_member() in your patch set, but I think that name wasn't quite right. I also removed the opmethod argument, which was rarely used and is somewhat redundant. 0003 and 0004 are enabling non-btree unique indexes for partition keys and materialized views. These were v21-0011 and v21-0012, except that I'm combining the switch from strategies to compare types (which was in v21-0006 or so) with the removal of the btree requirements. From bca4cfd1b8837ced9b74a04267861cac8003fbc6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Mar 2025 13:45:39 +0100 Subject: [PATCH v21.2 1/4] Add get_opfamily_method() --- src/backend/utils/cache/lsyscache.c | 22 ++ src/include/utils/lsyscache.h | 1 + 2 files changed, 23 insertions(+) diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 80c5a3fcfb7..9f7c0315052 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1288,6 +1288,28 @@ get_opclass_method(Oid opclass) /* -- OPFAMILY CACHE -- */ +/* + * get_opfamily_method + * + * Returns the OID of the index access method the opfamily is for. + */ +Oid +get_opfamily_method(Oid opfid) +{ + HeapTuple tp; + Form_pg_opfamily opfform; + Oid result; + + tp = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfid)); + if (!HeapTupleIsValid(tp)) + elog(ERROR, "cache lookup failed for operator family %u", opfid); + opfform = (Form_pg_opfamily) GETSTRUCT(tp); + + result = opfform->opfmethod; + ReleaseSysCache(tp); + return result; +} + char * get_opfamily_name(Oid opfid, bool missing_ok) { diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 6fab7aa6009..271eac2f0e1 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -108,6 +108,7 @@ extern Oid get_opclass_input_type(Oid opclass); extern bool get_opclass_opfamily_and_input_type(Oid opclass, Oid *opfamily, Oid *opcintype); extern Oid get_opclass_method(Oid opclass); +extern Oid get_opfamily_method(Oid opfid); extern char *get_opfamily_name(Oid opfid, bool missing_ok); extern RegProcedure get_opcode(Oid opno); extern char *get_opname(Oid opno); -- 2.48.1 From d6788595aa64dbb8fe55703cfec1b9368fe3c245 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Mar 2025 14:07:34 +0100 Subject: [PATCH v21.2 2/4] Add get_opfamily_member_for_cmptype() --- src/backend/utils/cache/lsyscache.c | 20 src/include/utils/lsyscache.h | 3 +++ 2 files changed, 23 insertions(+) diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 9f7c0315052..2151d8d0915 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -184,6 +184,26 @@ get_opfamily_member(Oid opfamily, Oid lefttype, Oid righttype, return result; } +/* + * get_opfamily_member_for_cmptype + * Get the OID of the operator that implements the specified comparison + * type with the specified datatypes for the specified opfamily. + * + * Returns InvalidOid if there is no pg_amop entry for the given keys. 
+ */ +Oid +get_opfamily_member_for_cmptype(Oid opfamily, Oid lefttype, Oid righttype, + CompareType cmptype) +{ + Oid opmethod; + StrategyNumber strategy; + + opmethod = get_opfamily_method(opfamily); + strategy = IndexAmTranslateCompareType(cmptype, opmethod, opfamily, false); + + return get_opfamily_member(opfamily, lefttype, righttype, strategy); +} + /* * get_ordering_op_properties * Given the OID of an ordering operator (a btree "<" or ">" operator), diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 271eac2f0e1..d42380a0d46 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -14,6 +14,7 @@ #define LSYSCACHE_H #include "access/attnum.h" +#include "access/cmptype.h" #include "access/htup.h" #include "nodes/pg_list.h" @@ -74,6 +75,8 @@ extern void get_op_opfamily_properties(Oid opno, Oid opfamily, bool ordering_op, Oid *righttype); extern Oid get_opfamily_member(Oid opfamily, Oid lefttype, Oid righttype, int16 strategy); +extern Oid get_opfamily_member_fo
Re: AIO v2.5
Hi, On 2025-03-11 19:55:35 -0400, Andres Freund wrote: > On 2025-03-11 12:41:08 -0700, Noah Misch wrote: > > On Mon, Sep 16, 2024 at 01:51:42PM -0400, Andres Freund wrote: > > > On 2024-09-16 07:43:49 -0700, Noah Misch wrote: > > > > For non-sync IO methods, I gather it's essential that a process other > > > > than the > > > > IO definer be scanning for incomplete IOs and completing them. > > > > > > Otherwise, deadlocks like this would happen: > > > > > > > backend1 locks blk1 for non-IO reasons > > > > backend2 locks blk2, starts AIO write > > > > backend1 waits for lock on blk2 for non-IO reasons > > > > backend2 waits for lock on blk1 for non-IO reasons > > > > > > > > If that's right, in worker mode, the IO worker resolves that deadlock. > > > > What > > > > resolves it under io_uring? Another process that happens to do > > > > pgaio_io_ref_wait() would dislodge things, but I didn't locate the code > > > > to > > > > make that happen systematically. > > > > > > Yea, it's code that I haven't forward ported yet. I think basically > > > LockBuffer[ForCleanup] ought to call pgaio_io_ref_wait() when it can't > > > immediately acquire the lock and if the buffer has IO going on. > > > > I'm not finding that code in v2.6. What function has it? > > My local version now has it... Sorry, I was focusing on the earlier patches > until now. Looking more at my draft, I don't think it was race-free. I had a race-free way of doing it in the v1 patch (by making lwlocks extensible, so the check for IO could happen between enqueueing on the lwlock wait queue and sleeping on the semaphore), but that obviously requires that infrastructure. I want to focus on reads for now, so I'll add FIXMEs to the relevant places in the patch to support AIO writes and focus on the rest of the patch for now. Greetings, Andres Freund
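To make the idea concrete, here is a very rough sketch of the non-race-free shape described above; pgaio_io_ref_wait() comes from the patch series, while the PgAioIoRef type name and the buf_get_inflight_io() accessor are purely hypothetical placeholders:

    LWLock     *content_lock = BufferDescriptorGetContentLock(buf_hdr);

    while (!LWLockConditionalAcquire(content_lock, mode))
    {
        PgAioIoRef  io_ref;         /* hypothetical type name */

        /*
         * If the buffer has AIO in flight, complete that IO ourselves instead
         * of sleeping on the content lock, so we cannot deadlock against the
         * backend that issued it.  As noted above, this is racy: the IO could
         * be started after the check but before we block on the lock.
         */
        if (buf_get_inflight_io(buf_hdr, &io_ref))      /* hypothetical */
            pgaio_io_ref_wait(&io_ref);
        else
        {
            LWLockAcquire(content_lock, mode);
            break;
        }
    }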
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
Hi, On Mon, 17 Feb 2025 at 19:59, Andres Freund wrote: > > Hi, Thanks for the review! And sorry for the late reply. > On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote: > > So, this patchset extends pg_buffercache with 3 functions: > > > > 1- pg_buffercache_evict_all(): This is very similar to the already > > existing pg_buffercache_evict() function. The difference is > > pg_buffercache_evict_all() does not take an argument. Instead it just > > loops over the shared buffers and tries to evict all of them. It > > returns the number of buffers evicted and flushed. > > I do think that'd be rather useful. Perhaps it's also worth having a version > that just evicts a specific relation? That makes sense. I will work on this. > > 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an > > argument and tries to mark this buffer as dirty. Returns true on > > success. This returns false if the buffer is already dirty. Do you > > think this makes sense or do you prefer it to return true if the > > buffer is already dirty? > > I don't really have feelings about that ;) I feel the same. I just wanted to have a symmetry between dirty and evict functions. If you think this would be useless, I can remove it. > > From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz > > Date: Fri, 20 Dec 2024 14:06:47 +0300 > > Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing > > > > This new function provides a mechanism to evict all shared buffers at > > once. It is designed to address the inefficiency of repeatedly calling > > pg_buffercache_evict() for each individual buffer, which can be > > time-consuming when dealing with large shared buffer pools > > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > > > */ > > bool > > -EvictUnpinnedBuffer(Buffer buf) > > +EvictUnpinnedBuffer(Buffer buf, bool *flushed) > > { > > BufferDesc *desc; > > uint32 buf_state; > > boolresult; > > > > + *flushed = false; > > + > > /* Make sure we can pin the buffer. */ > > ResourceOwnerEnlarge(CurrentResourceOwner); > > ReservePrivateRefCountEntry(); > > @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf) > > LWLockAcquire(BufferDescriptorGetContentLock(desc), > > LW_SHARED); > > FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL); > > LWLockRelease(BufferDescriptorGetContentLock(desc)); > > + *flushed = true; > > } > > > > /* This will return false if it becomes dirty or someone else pins > > it. */ > > I don't think *flushed is necessarily accurate this way, as it might detect > that it doesn't need to flush the data (via StartBufferIO() returning false). You are right. It seems there is no way to get that information without editing the FlushBuffer(), right? And I think editing FlushBuffer() is not worth it. So, I can either remove it or explain in the docs that #buffers_flushed may not be completely accurate. 
> > + */ > > +Datum > > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > > +{ > > + Datum result; > > + TupleDesc tupledesc; > > + HeapTuple tuple; > > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > > + boolnulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > > + > > + int32 buffers_evicted = 0; > > + int32 buffers_flushed = 0; > > + boolflushed; > > + > > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != > > TYPEFUNC_COMPOSITE) > > + elog(ERROR, "return type must be a row type"); > > + > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to use > > pg_buffercache_evict_all function"))); > > + > > + for (int buf = 1; buf < NBuffers; buf++) > > + { > > + if (EvictUnpinnedBuffer(buf, &flushed)) > > + buffers_evicted++; > > + if (flushed) > > + buffers_flushed++; > > I'd probably add a pre-check for the buffer not being in use. We don't need an > external function call, with an unconditional LockBufHdr() inside, for that > case. I did not understand why we would need this. Does not EvictUnpinnedBuffer() already check that the buffer is not in use? > > +/* > > + * MarkUnpinnedBufferDirty > > + * > > + * This function is intended for testing/development use only! > > + * > > + * To succeed, the buffer must not be pinned on entry, so if the caller > > had a > > + * particular block in mind, it might already have been replaced by some > > other > > + * block by the time this function runs. It's also unpinned on return, so > > the > > + * buffer might be occupied and flushed by the time control is returned. > > This > > + * inherent raciness without other inter
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
On Wed, Mar 12, 2025 at 4:28 PM Alena Rybakina wrote: > Thank you for the explanation! > > Now I see why these changes were made. > > After your additional explanations, everything really became clear and I > fully agree with the current code regarding this part. Cool. > However I did not see an explanation to the commit regarding this place, > as well as a comment next to the assert and the parallel_aware check and > why BitmapIndexScanState was added in the ExecParallelReInitializeDSM. I added BitmapIndexScanState to the switch statement in ExecParallelReInitializeDSM because it is in the category of planstates that never need their shared memory reinitialized -- that's just how we represent such a plan state there. I think that this is supposed to serve as a kind of documentation, since it doesn't really affect how things behave. That is, it wouldn't actually affect anything if I had forgotten to add BitmapIndexScanState to the ExecParallelReInitializeDSM switch statement "case" that represents that it is in this "plan state category": the switch ends with catch-all "default: break;". > In my opinion, there is not enough additional explanation about this in > the form of comments, although I think that it has already been > explained here enough for someone who will look at this code. What can be done to improve the situation? For example, would adding a comment next to the new assertions recently added to ExecIndexScanReInitializeDSM and ExecIndexOnlyScanReInitializeDSM be an improvement? And if so, what would the comment say? -- Peter Geoghegan
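To spell out why the new entry is documentation-only, the relevant switch in ExecParallelReInitializeDSM() has roughly this shape (case list abbreviated from memory), so an explicit case whose body is just "break" behaves exactly like the catch-all default:

    switch (nodeTag(planstate))
    {
        case T_SeqScanState:
            if (planstate->plan->parallel_aware)
                ExecSeqScanReInitializeDSM((SeqScanState *) planstate, pcxt);
            break;

            /* ... other parallel-aware scan and join nodes ... */

        case T_HashState:
        case T_SortState:
        case T_BitmapIndexScanState:    /* the addition being discussed */
            /* these nodes have DSM state, but no reinitialization is required */
            break;

        default:
            break;
    }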
Re: PGSERVICEFILE as part of a normal connection string
On Mon, Jan 27, 2025 at 2:01 PM Michael Paquier wrote: > > On Wed, Nov 20, 2024 at 02:58:43AM -0500, Corey Huinker wrote: > > Currently, a lot of our utility scripts (anything that uses > > connectDatabase) don't support service=name params or PGSERVICE=name env > > vars, which is really too bad. I previously thought that this was because > > of a lack of interest, but perhaps people do want it? > > I'm all for more test coverage, FWIW. > > Torsten, the patch has been waiting on input from you based on my > latest review for some time, so I have marked it as returned with > feedback in the CP app. Feel free to resubmit a new version if you > are planning to work on that. TO: Torsten, CC: Michael and other hackers If you can't work on the patch for a while because you are busy or for some other reason, I can become an additional reviewer and apply Michael's review comments to the patch instead of you. If you don't want me to do that, please reply and let me know, if possible within a week :) Just to let you know, my intention is not to steal your contribution but to prevent your good idea from being lost. TO: Michael and other hackers, is there any problem with this in light of community customs? --- Great regards, Ryo Kanbayashi
RE: CRC32C Parallel Computation Optimization on ARM
> > Intel has contributed SSE4.2 CRC32C [1] and AVX-512 CRC32C [2] based on > similar techniques to postgres. > > ...this is a restatement of facts we already know. I'm guessing the intended > takeaway is "since Intel submitted an implementation to us based on paper A, > then we are free to separately also use a technique from paper B (which cites > patents)". Yes. > The original proposal that started this thread is below, and I'd like to give > that > author credit for initiating that work Yup, that should be fine. Raghuveer
Re: Separate GUC for replication origins
On Thu, 13 Mar 2025 at 05:59, Euler Taveira wrote: > > On Wed, Mar 12, 2025, at 8:47 AM, vignesh C wrote: > > I reviewed the discussion on this thread and believe we now have an > agreement on the design and GUC names. However, the patch still needs > updates and extensive testing, especially considering its impact on > backward compatibility. I'm unsure if this feature can be committed in > the current CommitFest. If you're okay with it, we can move it to the > next CommitFest. However, if you prefer to keep it, we can do our best > to complete it and make a final decision at the end of the CommitFest. > > > This is a mechanical patch. I was waiting if someone would object or suggest a > better GUC name. It seems to me it isn't. The pre existing GUC > (max_replication_slots) already has coverage. I don't know what additional > tests you have in mind. Could you elaborate? I was considering any potential impact on pg_upgrade and pg_createsubscriber. I will run tests with the latest posted patch to verify. > I'm biased to say that it is one of the easiest patches to commit because it > is > just assigning a new GUC name for a pre existing functionality. If there is no > interested in it, it will be moved to the next CF. Sounds like a plan! Let's verify it and work towards getting it committed. Regards, Vignesh
Re: DOCS: Make the Server Application docs synopses more consistent
On Wed, Mar 12, 2025 at 3:18 AM David G. Johnston wrote: > > On Thu, Dec 12, 2024 at 8:25 PM Peter Smith wrote: >> >> [1] initdb [option...] [ --pgdata | -D ] directory >> [2] pg_archivecleanup [option...] archivelocation oldestkeptwalfile >> [3] pg_checksums [option...] [[ -D | --pgdata ]datadir] >> [4] pg_controldata [option] [[ -D | --pgdata ]datadir] >> [5] pg_createsubscriber [option...] { -d | --database }dbname { -D | >> --pgdata }datadir { -P | --publisher-server }connstr >> [6] pg_ctl (the many synopses here don't give all the switch >> alternatives, it would be too much...) >> [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D | >> --pgdata ]datadir >> [8] pg_rewind [option...] { -D | --target-pgdata } directory { >> --source-pgdata=directory | --source-server=connstr } >> [9] pg_test_fsync [option...] >> [10] pg_test_timing [option...] >> [11] pg_upgrade -b oldbindir [-B newbindir] -d oldconfigdir -D >> newconfigdir [option...] >> [12] pg_waldump [option...] [startseg [endseg]] >> [13] pg_walsummary [option...] [file...] >> > > Here are some guidelines we should add to [1]. > > I don't feel listing both short and long form args is a good use of space - > and the { | } impairs readability. The short form combined with, usually, a > decent replaceable name, shows the reader the common interactive usage and > they can find the long forms in the Options section. Maybe use long forms > for value-less options since those don't get the argname hint. > > The non-space between option and its value reduces readability. IIUC a space > in between doesn't impact correctness so I say go for readability. This > becomes more pronounced with the first items since it is only marginally > readable now because there is always a } or ] before the replaceable. Though > this may involve fighting the markup a bit...I haven't experimented yet > (pg_rewind managed it...). > > The first listed command should probably only include mandatory options. > When there are multiple combinations of mandatory options show multiple > commands, one for each variant. Use examples to showcase idiomatic usage > patterns with descriptions. There is room to argue exceptions to be listed > also in Synopsis. > > Establish the convention of datadir as the replaceable name. Possibly do the > same for other common items. > > We should have a reference page (near [1] listing DocBook elements and our > guidelines for how/where to use them. > > In any case, details aside, modifying [1] with whatever is agreed to (and > making changes to conform) is something I agree should happen. > Hi David. Thanks for taking an interest in this thread. I've made some updates and attached the v2 patch. == CHANGE SUMMARY There are some discussion points among these changes as well as other TODO items to check I haven't broken anything. [1] initdb [option...] [ --pgdata | -D ] directory - removed the short/long option variations - cleanup the tags - changed replaceable 'directory' to 'datadir' in the synopsis and in the options list - TODO. Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" [2] pg_archivecleanup [option...] archivelocation oldestkeptwalfile - no changes [3] pg_checksums [option...] [[ -D | --pgdata ]datadir] - removed the short/long option variations - cleanup the tags - TODO. 
Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" (see also initdb etc) [4] pg_controldata [option] [[ -D | --pgdata ]datadir] - removed the short/long option variations - cleanup the tags - some should be - the "[option]" should be "[option...]" - TODO. Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" (see also initdb etc) - TODO. The page structure looks strange. There should be an "Options" section to describe -D, -V, and -? [5] pg_createsubscriber [option...] { -d | --database }dbname { -D | --pgdata }datadir { -P | --publisher-server }connstr - removed the short/long option variations - cleanup the tags - rearranged so -D comes first (same as most other commands) - make the "-D datadir" optional "[-D datadir]" - change 'directory' to 'datadir' in the options list so it matches the synopsis - change the order of the options list to match - TODO. Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" (see also initdb etc) [6] pg_ctl (the many synopses here don't give all the switch alternatives, it would be too much...) - no changes [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D | --pgdata ]datadir - removed the short/long option variations - cleanup the tags - TODO. Looks a bit strange with "[options...]" not shown first. [8] pg_rewind [option...] { -D | --target-pgdata } directory { --source-pgdata=directory | --source-server=connstr } - removed the short/long option variations - cleanup the tags - make the "{-D datadir}
Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
On Mon, Mar 3, 2025 at 9:10 PM Ajin Cherian wrote: > > On Thu, Feb 20, 2025 at 8:42 PM Hayato Kuroda (Fujitsu) > wrote: > > I confirmed with tests that with the patch, a lot less disk space is used when there are lots of unpublished changes and the subscription is not configured to be streaming. HEAD code: Initial disk usage in replication slot directory: 4.0K/home/ajin/dataoss/pg_replslot/mysub BEGIN INSERT 0 100 COMMIT Final disk usage in replication slot directory: 212M/home/ajin/dataoss/pg_replslot/mysub Patched code: Initial disk usage in replication slot directory: 4.0K/home/ajin/dataoss/pg_replslot/mysub BEGIN INSERT 0 100 COMMIT Final disk usage in replication slot directory: 4.0K/home/ajin/dataoss/pg_replslot/mysub Attaching the test script used. Regards, Ajin Cherian Fujitsu Australia test_disk_usage.sh Description: Binary data
Re: Vacuum statistics
On Wed, Mar 12, 2025 at 2:41 PM Alena Rybakina wrote: > Hi! > > On 10.03.2025 12:13, Ilia Evdokimov wrote: > > Hi, > > > > After commit eaf5027 we should add information about wal_buffers_full. > > > > Any thoughts? > > > > -- > > Best regards, > > Ilia Evdokimov, > > Tantor Labs LLC. > > > I think I can add it. To be honest, I haven't gotten to know this > statistic yet, haven't had time to get to know this commit yet. How will > this statistic help us analyze the work of the vacuum for relations? > The usecase I can see here is that we don't want autovac creating so much WAL traffic that it starts forcing other backends to have to write WAL out. But tracking how many times autovac writes WAL buffers won't help with that (though we also don't want any WAL buffers written by autovac to be counted in the system-wide wal_buffers_full: autovac is a BG process and it's fine if it's spending time writing WAL out). I think the same is true of a manual vacuum as well. What would be helpful would be a way to determine if autovac was causing enough traffic to force other backends to write WAL. Offhand I'm not sure how practical that actually is though. BTW, there's also an argument to be made that autovac should throttle itself if we're close to running out of available WAL buffers...
DOCS: Make the Server Application docs synopses more consistent
On Wednesday, March 12, 2025, Peter Smith wrote: > > > I've made some updates and attached the v2 patch. Thanks. Full review later. Pg_controldata - TODO. The page structure looks strange. There should be an "Options" > section to describe -D, -V, and -? Agreed. > > [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D | > > - TODO. Looks a bit strange with "[options...]" not shown first. Yeah, need to fix that. > > [8] pg_rewind [option...] { -D | --target-pgdata } directory > > - TODO. Was it OK to make the "[-D datadir]" optional like all others? > The page does not mention PGDATA. My take on this is that the presence of environment variables doesn’t impact whether a given CLI option is shown optional or mandatory. We are, in effect, communicating that “datadir” must be specified for this command. And then choosing one of the ways of specifying it. The reader can use the indicated short-form -D, the long-form —pgdata if it exists, or the DATADIR environment variable (this applies to any option, not just datadir) as they desire. In short, most/all of these should show “-D datadir” without [ ]. [11] pg_upgrade -b oldbindir [-B newbindir] -d oldconfigdir -D > newconfigdir [option...] > - made all the option to be optional because the page says they can > all use environment variable defaults. See note above regarding environment variables. > - TODO. Looks a bit strange with "[options...]" not shown first. Should be moved. > > OTHER QUESTIONS > - Should we use class="parameter" for all the tags? > Currently, some do, but most do not. > > I’d probably document that they should have that class but leave the bulk cleanup of existing content for a separate patch. ~~~ > > BTW, the scope of this thread is originally only the *server* > application pages, but all the *client* applications might be impacted > if we applied the same rules there. > Yeah, noticed that yesterday. I don’t see a reason why the same policy shouldn’t apply to both subsets. I wouldn’t require everything has to be changed right now though. Perfectly happy to ask for people specifically familiar with these applications to modify the pages under the new policy and try to divvy up the work. Then pickup what doesn’t get done. David J.
Re: DOCS: Make the Server Application docs synopses more consistent
On Wednesday, March 12, 2025, David G. Johnston wrote: > > > My take on this is that the presence of environment variables doesn’t > impact whether a given CLI option is shown optional or mandatory. We are, > in effect, communicating that “datadir” must be specified for this > command. And then choosing one of the ways of specifying it. The reader > can use the indicated short-form -D, the long-form —pgdata if it exists, or > the DATADIR environment variable (this applies to any option, not just > datadir) as they desire. In short, most/all of these should show “-D > datadir” without [ ]. > Expanding on this a bit. Write the synopsis as if no environment variables are set. Generally, if the application itself defines the default value do not list the option in the primary synopsis entry at all. If the application takes an option’s default value from the OS environment by a means other than an optional environment variable then show the option in the primary synopsis entry but mark it as optional. Connection options can have their own rules; which means we don’t need to be specifying -U and -d everywhere when we deal with client applications. Though maybe we should? David J.
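To make the policy concrete with one of the pages listed earlier: under these rules the primary pg_resetwal synopsis would read roughly "pg_resetwal [option...] -D datadir", with -f/--force and -n/--dry-run folded into [option...] and datadir shown as required even though the PGDATA environment variable can supply it (assuming pg_resetwal keeps honoring PGDATA, as it does today).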
Re: Add Postgres module info
On 7/3/2025 16:56, Andrei Lepikhov wrote: On 2/3/2025 20:35, Andrei Lepikhov wrote: On 17/2/2025 04:00, Michael Paquier wrote: No documentation provided. Ok, I haven't been sure this idea has a chance to be committed. I will introduce the docs in the next version. This is a new version with bug fixes. Also, use TAP tests instead of regression tests. Still, no documentation is included. v5 contains documentation entries for the pg_get_modules function and the PG_MODULE_MAGIC_EXT macro. Also, commit comment is smoothed a little. -- regards, Andrei LepikhovFrom b5072ce898e0760a717411535c0429303f4e5a45 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Tue, 19 Nov 2024 18:45:36 +0700 Subject: [PATCH v5] Introduce PG_MODULE_MAGIC_EXT macro. This macro provides dynamically loaded shared libraries (modules) with standard way to incorporate version (supposedly, defined according to semantic versioning specification) and name data. The introduced catalogue routine pg_get_modules can be used to find this module by name and check the version. It makes users independent from file naming conventions. With a growing number of Postgres core hooks and the introduction of named DSM segments, the number of modules that don't need to be loaded on startup may grow fast. Moreover, in many cases related to query tree transformation or extra path recommendation, such modules might not need database objects except GUCs - see auto_explain as an example. That means they don't need to execute the 'CREATE EXTENSION' statement at all and don't have a record in the pg_extension table. Such a trick provides much flexibility, including an online upgrade and may become widespread. In addition, it is also convenient in support to be sure that the installation (or at least the backend) includes a specific version of the module. Even if a module has an installation script, it is not rare that it provides an implementation for a range of UI routine versions. It makes sense to ensure which specific version of the code is used. Discussions [1,2] already mentioned module-info stuff, but at that time, extensibility techniques and extension popularity were low, and it wasn't necessary to provide that data. [1] https://www.postgresql.org/message-id/flat/20060507211705.GB3808%40svana.org [2] https://www.postgresql.org/message-id/flat/20051106162658.34c31d57%40thunder.logicalchaos.org --- doc/src/sgml/xfunc.sgml | 44 + src/backend/utils/fmgr/dfmgr.c| 66 ++- src/include/catalog/pg_proc.dat | 6 ++ src/include/fmgr.h| 28 +++- src/test/modules/test_misc/Makefile | 5 +- .../test_misc/t/008_check_pg_get_modules.pl | 65 ++ src/test/modules/test_shm_mq/test.c | 5 +- src/test/modules/test_slru/test_slru.c| 5 +- 8 files changed, 216 insertions(+), 8 deletions(-) create mode 100644 src/test/modules/test_misc/t/008_check_pg_get_modules.pl diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index 9f22dacac7d..22e26258b15 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -1967,6 +1967,28 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision PG_MODULE_MAGIC; + +or + +PG_MODULE_MAGIC_EXT(name, version); + + + + +extended magic block + + +Extended magic block provides the same functionality like PG_MODULE_MAGIC, +allowing the extension provider to hard code into the binary file +information about the +version and module name that can be read and processed later by the +pg_get_modules function. 
+ + +PG_MODULE_MAGIC_EXT( +.name = "my_module_name", +.version = "1.2.3" +); @@ -1993,6 +2015,28 @@ PG_MODULE_MAGIC; return void. There is presently no way to unload a dynamically loaded file. + +pg_get_modules + + +List of loaded modules + + + +To see the status of loaded modules a function named +pg_get_modules is used. +The function receives no parameters and returns whole set of loaded modules +in current backend. The result can differ among backends because some +modules might be loaded dynamically by the LOAD command. +Returns a set of records describing the module. +The module_name column of text type +contains the name the maintainer provided at the compilation stage or NULL +if not defined. +The version column of text type contains +version info hard coded during the compilation or NULL if not defined. +The libname column of text type contains +full path to the file of loaded module. + diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c index 87b233cb887..70d6ced4157 100644 --- a/src/backend/utils/fmgr/dfmgr.c +++ b/src/backend/utils/fmgr/dfmgr.c @@ -21,10 +21,12 @@ #endif /* !WIN32 */
Re: remove open-coded popcount in acl.c
On 2025-Mar-12, Nathan Bossart wrote: > On Wed, Mar 12, 2025 at 05:23:25PM +0100, Álvaro Herrera wrote: > > Strange: this code is not covered by any tests. > > > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5438 > > Huh. Well, it's easy enough to add some basic tests for the grantor > selection machinery. Here's a first try. Thanks :-) I confirm that this covers the code in select_best_grantor that you're modifying. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ed is the standard text editor." http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
Re: remove open-coded popcount in acl.c
On Wed, Mar 12, 2025 at 05:23:25PM +0100, Álvaro Herrera wrote: > Strange: this code is not covered by any tests. > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5438 Huh. Well, it's easy enough to add some basic tests for the grantor selection machinery. Here's a first try. -- nathan >From d3cf9ca237f647ebcca20c55c8302f00f716c459 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 12 Mar 2025 10:45:12 -0500 Subject: [PATCH v2 1/1] Remove open-coded popcount in acl.c. --- src/backend/utils/adt/acl.c | 20 +- src/test/regress/expected/privileges.out | 27 src/test/regress/sql/privileges.sql | 22 +++ 3 files changed, 50 insertions(+), 19 deletions(-) diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index 6a76550a5e2..ba14713fef2 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -5432,24 +5432,6 @@ select_best_admin(Oid member, Oid role) return admin_role; } - -/* does what it says ... */ -static int -count_one_bits(AclMode mask) -{ - int nbits = 0; - - /* this code relies on AclMode being an unsigned type */ - while (mask) - { - if (mask & 1) - nbits++; - mask >>= 1; - } - return nbits; -} - - /* * Select the effective grantor ID for a GRANT or REVOKE operation. * @@ -5532,7 +5514,7 @@ select_best_grantor(Oid roleId, AclMode privileges, */ if (otherprivs != ACL_NO_RIGHTS) { - int nnewrights = count_one_bits(otherprivs); + int nnewrights = pg_popcount64(otherprivs); if (nnewrights > nrights) { diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out index a76256405fe..954f549555e 100644 --- a/src/test/regress/expected/privileges.out +++ b/src/test/regress/expected/privileges.out @@ -3293,3 +3293,30 @@ DROP SCHEMA reindex_test; DROP ROLE regress_no_maintain; DROP ROLE regress_maintain; DROP ROLE regress_maintain_all; +-- grantor selection +CREATE ROLE regress_grantor1; +CREATE ROLE regress_grantor2 ROLE regress_grantor1; +CREATE ROLE regress_grantor3; +CREATE TABLE grantor_test1 (); +CREATE TABLE grantor_test2 (); +CREATE TABLE grantor_test3 (); +GRANT SELECT ON grantor_test2 TO regress_grantor1 WITH GRANT OPTION; +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor2 WITH GRANT OPTION; +SET ROLE regress_grantor1; +GRANT SELECT, UPDATE ON grantor_test1 TO regress_grantor3; +ERROR: permission denied for table grantor_test1 +GRANT SELECT, UPDATE ON grantor_test2 TO regress_grantor3; +WARNING: not all privileges were granted for "grantor_test2" +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor3; +RESET ROLE; +SELECT * FROM information_schema.table_privileges t + WHERE grantor LIKE 'regress_grantor%' ORDER BY ROW(t.*); + grantor | grantee | table_catalog | table_schema | table_name | privilege_type | is_grantable | with_hierarchy +--+--+---+--+---++--+ + regress_grantor1 | regress_grantor3 | regression| public | grantor_test2 | SELECT | NO | YES + regress_grantor2 | regress_grantor3 | regression| public | grantor_test3 | SELECT | NO | YES + regress_grantor2 | regress_grantor3 | regression| public | grantor_test3 | UPDATE | NO | NO +(3 rows) + +DROP TABLE grantor_test1, grantor_test2, grantor_test3; +DROP ROLE regress_grantor1, regress_grantor2, regress_grantor3; diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql index d195aaf1377..b81694c24f2 100644 --- a/src/test/regress/sql/privileges.sql +++ b/src/test/regress/sql/privileges.sql @@ -2042,3 +2042,25 @@ 
DROP SCHEMA reindex_test; DROP ROLE regress_no_maintain; DROP ROLE regress_maintain; DROP ROLE regress_maintain_all; + +-- grantor selection +CREATE ROLE regress_grantor1; +CREATE ROLE regress_grantor2 ROLE regress_grantor1; +CREATE ROLE regress_grantor3; +CREATE TABLE grantor_test1 (); +CREATE TABLE grantor_test2 (); +CREATE TABLE grantor_test3 (); +GRANT SELECT ON grantor_test2 TO regress_grantor1 WITH GRANT OPTION; +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor2 WITH GRANT OPTION; + +SET ROLE regress_grantor1; +GRANT SELECT, UPDATE ON grantor_test1 TO regress_grantor3; +GRANT SELECT, UPDATE ON grantor_test2 TO regress_grantor3; +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor3; +RESET ROLE; + +SELECT * FROM information_schema.table_privileges t + WHERE grantor LIKE 'regress_g
Re: remove open-coded popcount in acl.c
On Wed, Mar 12, 2025 at 07:34:16PM +0100, Álvaro Herrera wrote: > Thanks :-) I confirm that this covers the code in select_best_grantor > that you're modifying. Thanks for the quick review. I'll plan on committing this shortly if CI is happy. -- nathan
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On 2025-Mar-12, Rushabh Lathia wrote: > Hi Alvaro, > > Here are the latest patches, which includes the regression fix. Thank you. Taking a step back after discussing this with some colleagues, I need to contradict what I said at the start of this thread. There's a worry that changing pg_attribute.attnotnull in the way I initially suggested might not be such a great idea after all. I did a quick search using codesearch.debian.net for applications reading that column and thinking about how they would react to this change; I think in the end it's going to be quite disastrous. We would break a vast number of these apps, and there are probably countless other apps and frameworks that we would also break. Everybody would hate us forever. Upgrading to Postgres 18 would become as bad an experience as the drastic change of implicit casts to text in 8.3. Nothing else in the intervening 17 years strikes me as so problematic as this change would be. So I think we may need to go back and somehow leave pg_attribute alone, to avoid breaking the whole world. Maybe it would be sufficient to change the CompactAttribute->attnotnull to no longer be a boolean, but the new char representation instead. I'm not sure if this would actually work. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
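To illustrate the kind of breakage being worried about (example mine, not from the thread): drivers and ORMs commonly issue catalog queries that treat attnotnull as a boolean, and those would start failing if the column's type changed to "char".

-- 'my_table' is a placeholder
SELECT a.attname, a.attnotnull
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'my_table'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND a.attnotnull;   -- boolean usage like this breaks if attnotnull becomes "char"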
Re: [PATCH] SVE popcount support
On Wed, Mar 12, 2025 at 10:34:46AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote: > >> v5-no-sve is the result of using a function pointer, but pointing to the >> "slow" versions instead of the SVE version. v5-sve is the result of the >> latest patch in this thread on a machine with SVE support, and v5-4reg is >> the result of the latest patch in this thread modified to process 4 >> register's worth of data at a time. > > Nice, I wonder why I did not observe any performance gain in the 4reg > version. Did you modify the 4reg version code? > > One possible explanation is that you used Graviton4 based instances > whereas I used Graviton3 instances. Yeah, it looks like the number of vector registers is different [0]. >> For the latter point, I think we should consider trying to add a separate >> Neon implementation that we use as a fallback for machines that don't have >> SVE. My understanding is that Neon is virtually universally supported on >> 64-bit Arm gear, so that will not only help offset the function pointer >> overhead but may even improve performance for a much wider set of machines. > > I have added the NEON implementation in the latest patch. > > Here are the numbers for drive_popcount(100, 1024) on m7g.8xlarge: > Scalar - 692ms > Neon - 298ms > SVE - 112ms Those are nice results. I'm a little worried about the Neon implementation for smaller inputs since it uses a per-byte loop for the remaining bytes, though. If we can ensure there's no regression there, I think this patch will be in decent shape. [0] https://github.com/aws/aws-graviton-getting-started?tab=readme-ov-file#building-for-graviton -- nathan
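For readers unfamiliar with Neon, below is a rough sketch of the shape of implementation being discussed (not the actual patch); it shows a vectorized main loop plus the per-byte tail loop that the small-input concern refers to. The tail uses a compiler builtin purely for illustration.

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t
neon_popcount(const unsigned char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	/* main loop: 16 bytes per iteration */
	while (bytes >= 16)
	{
		uint8x16_t	v = vld1q_u8((const uint8_t *) buf);

		/* per-byte popcount, then a horizontal add across the vector */
		popcnt += vaddvq_u8(vcntq_u8(v));
		buf += 16;
		bytes -= 16;
	}

	/* per-byte tail; this is the part that could regress small inputs */
	for (size_t i = 0; i < bytes; i++)
		popcnt += __builtin_popcount(buf[i]);

	return popcnt;
}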
Re: Handle interrupts while waiting on Append's async subplans
On 03/03/2025 19:44, Heikki Linnakangas wrote: In any case, the attached patch works and seems like an easy fix for stable branches at least. Committed that to master and all affected stable branches. -- Heikki Linnakangas Neon (https://neon.tech)
Re: remove open-coded popcount in acl.c
On 2025-Mar-12, Nathan Bossart wrote: > There's a count_one_bits() function in acl.c that can be replaced with a > call to pg_popcount64(). This isn't performance-critical code, but IMHO we > might as well use the centralized implementation. Makes sense. Patch looks good to me. > @@ -5532,7 +5514,7 @@ select_best_grantor(Oid roleId, AclMode privileges, >*/ > if (otherprivs != ACL_NO_RIGHTS) > { > - int nnewrights = > count_one_bits(otherprivs); > + int nnewrights = > pg_popcount64(otherprivs); Strange: this code is not covered by any tests. https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5438 -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
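As a stand-alone illustration (not PostgreSQL code; a compiler builtin stands in for pg_popcount64 here), the loop being removed and a hardware popcount agree on any mask:

#include <stdint.h>
#include <stdio.h>

/* the loop being removed from acl.c, transcribed for comparison */
static int
count_one_bits(uint64_t mask)
{
	int			nbits = 0;

	while (mask)
	{
		if (mask & 1)
			nbits++;
		mask >>= 1;
	}
	return nbits;
}

int
main(void)
{
	uint64_t	mask = 0x2A05;	/* arbitrary privilege bitmask */

	printf("%d %d\n", count_one_bits(mask), __builtin_popcountll(mask));
	return 0;
}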
Re: vacuumdb changes for stats import/export
On Wed, Mar 12, 2025 at 01:56:04PM +0700, John Naylor wrote: > The change here seems fine. My only quibble is that this sentence now > seems out of place: "Option --analyze-in-stages can be used to > generate minimal statistics quickly." I'm thinking we should make a > clearer separation for with and without the --no-statistics option > specified. I think the recommendation stays the same regardless of whether --no-statistics was used. --missing-only should do the right think either way. But I do agree that this paragraph feels a bit haphazard. I modified it to call out the full command for both --analyze-only and --analyze-in-stages, and I moved the note about --jobs to its own sentence and made it clear that it is applicable to both of the suggested vacuumdb commands. -- nathan >From 57329fe78042b12331852c4b8a121e73020e6a90 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 11 Mar 2025 11:18:36 -0500 Subject: [PATCH v6 1/3] vacuumdb: Teach vacuum_one_database() to reuse query results. Presently, each call to vacuum_one_database() performs a catalog query to retrieve the list of tables to process. A proposed follow-up commit would add a "missing only" feature to --analyze-in-stages, which requires us to save the results of the catalog query (since tables without statistics would have them after the first stage). This commit adds this ability via a new parameter for vacuum_one_database() that specifies either a previously-retrieved list to process or a place to store the results of the catalog query for later use. Note that this commit does not make use of this new parameter for anything. That is left as a future exercise. The trade-offs of this approach are increased memory usage and less responsiveness to concurrent catalog changes in later stages, neither of which is expected to bother anyone. Author: Corey Huinker Co-authored-by: Nathan Bossart Reviewed-by: John Naylor Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan --- src/bin/scripts/vacuumdb.c | 330 ++--- 1 file changed, 194 insertions(+), 136 deletions(-) diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 982bf070be6..e28f82a0eba 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -62,10 +62,16 @@ typedef enum static VacObjFilter objfilter = OBJFILTER_NONE; +static SimpleStringList *retrieve_objects(PGconn *conn, + vacuumingOptions *vacopts, + SimpleStringList *objects, + bool echo); + static void vacuum_one_database(ConnParams *cparams, vacuumingOptions *vacopts, int stage, SimpleStringList *objects, + SimpleStringList **found_objs, int concurrentCons, const char *progname, bool echo, bool quiet); @@ -405,7 +411,7 @@ main(int argc, char *argv[]) { vacuum_one_database(&cparams, &vacopts, stage, - &objects, + &objects, NULL, concurrentCons, progname, echo, quiet); } @@ -413,7 +419,7 @@ main(int argc, char *argv[]) else vacuum_one_database(&cparams, &vacopts, ANALYZE_NO_STAGE, - &objects, + &objects, NULL, concurrentCons, progname, echo, quiet); } @@ -461,8 +467,36 @@ escape_quotes(const char *src) /* * vacuum_one_database * - * Process tables in the given database. If the 'objects' list is empty, - * process all tables in the database. + * Process tables in the given database. + * + * There are two ways to specify the list of objects to process: + * + * 1) The "found_objs" parameter is a double pointer to a fully qualified list + *of objects to process, as returned by a previous call to + *vacuum_one_dat
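The retrieve-or-reuse contract that the commit message describes can be sketched in isolation as follows; the types and helper names here (ObjectList, fetch_tables_from_catalog) are stand-ins for illustration, not the actual vacuumdb code.

#include <stdio.h>
#include <stdlib.h>

typedef struct ObjectList
{
	int			ntables;		/* stand-in for a list of table names */
} ObjectList;

static ObjectList *
fetch_tables_from_catalog(void)
{
	ObjectList *list = malloc(sizeof(ObjectList));

	list->ntables = 42;			/* pretend we ran the catalog query here */
	return list;
}

static void
vacuum_one_database(ObjectList **found_objs)
{
	ObjectList *objects;

	if (found_objs && *found_objs)
		objects = *found_objs;	/* later stage: reuse the saved list */
	else
	{
		objects = fetch_tables_from_catalog();	/* first stage: query once */
		if (found_objs)
			*found_objs = objects;	/* remember it for subsequent stages */
	}
	printf("processing %d tables\n", objects->ntables);
}

int
main(void)
{
	ObjectList *saved = NULL;

	/* e.g. the three --analyze-in-stages passes share one catalog query */
	for (int stage = 0; stage < 3; stage++)
		vacuum_one_database(&saved);
	free(saved);
	return 0;
}

In this sketch, passing NULL instead of &saved gives the old behaviour of running the catalog query on every call.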
Re: Vacuum statistics
Hi! On 10.03.2025 16:33, Kirill Reshke wrote: On Thu, 27 Feb 2025 at 23:00, Alena Rybakina wrote: Hi! On 17.02.2025 17:46, Alena Rybakina wrote: On 04.02.2025 18:22, Alena Rybakina wrote: Hi! Thank you for your review! On 02.02.2025 23:43, Alexander Korotkov wrote: On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina wrote: I noticed that the cfbot is bad, the reason seems to be related to the lack of a parameter in src/backend/utils/misc/postgresql.conf.sample. I added it, it should help. The patch doesn't apply cleanly. Please rebase. I rebased them. The patch needed a rebase again. There is nothing new since version 18, only a rebase. The patch needed a new rebase. Sorry, but due to feeling unwell I picked up a patch from another thread and squashed the patch for vacuum statistics for indexes and heaps in the previous version. Now I fixed everything together with the rebase. -- Regards, Alena Rybakina Postgres Professional Hi! CI fails on this one[0] Is it a flap or a real problem caused by v20? ``` SELECT relpages AS irp diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out --- /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out 2025-03-10 09:36:10.274799176 + +++ /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out 2025-03-10 09:49:35.796596462 + @@ -65,7 +65,7 @@ WHERE vt.relname = 'vestat' AND vt.relid = c.oid; relname | vm_new_frozen_pages | tuples_deleted | relpages | pages_scanned | pages_removed -+-++--+---+--- - vestat | 0 | 0 | 455 | 0 | 0 + vestat | 0 | 0 | 417 | 0 | 0 (1 row) SELECT relpages AS rp === EOF === ``` [0]https://api.cirrus-ci.com/v1/artifact/task/5336493629112320/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress Thank you for your help, I'll fix it soon. -- Regards, Alena Rybakina Postgres Professional
Re: Vacuum statistics
Hi! On 10.03.2025 12:13, Ilia Evdokimov wrote: Hi, After commit eaf5027 we should add information about wal_buffers_full. Any thoughts? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. I think I can add it. To be honest, I haven't gotten to know this statistic yet, haven't had time to get to know this commit yet. How will this statistic help us analyze the work of the vacuum for relations? -- Regards, Alena Rybakina Postgres Professional
Re: Optimization for lower(), upper(), casefold() functions.
12.03.2025 19:55, Alexander Borisov wrote: [...] A couple questions: * Is there a reason the fast-path for codepoints < 0x80 is in unicode_case.c rather than unicode_case_func.h? Yes, this is an important optimization, below are benchmarks that [...] I forgot to add the benchmark: Benchmark Anatation: Patch v3j_0001: patch v3j_0001 without any changes. Patch v3j_0001 + static: patch v3j_0001 with adding static to casemap(). Patch v3j_0001 + static + fast path: patch v3j_0001 with adding static to casemap() and fast path for codepoint < 0x80. v4_0001: v3j_0001 + static + fast path v4_0001 + v3j_0002: v3j_0002 patch unchanged, but with static for casemap() (inherited from v4_0001 patch) v4_0001 + v4_0002: changed fast path for codepoint < 0x80. Made fast return to avoid unnecessary checks. Without: current version of the code without any patches. All results are in tps. * macOS 15.1 (Apple M3 Pro) (Apple clang version 16.0.0) ASCII: Repeated characters (700kb) in the range from 0x20 to 0x7E. Patch v3j_0001: 201.029609 Patch v3j_0001 + static: 247.155438 Patch v3j_0001 + static + fast path: 267.941047 Patch v4_0001 + v3j_0002: 260.737601 Patch v4_0001 + v4_0002: 268.913658 Without: 260.437508 Cyrillic: Repeated characters (1MB) in the range from 0x0410 to 0x042F. Patch v3j_0001: 48.130350 Patch v3j_0001 + static: 49.365897 Patch v3j_0001 + static + fast path: 48.891842 Patch v4_0001 + v3j_0002: 88.833061 Patch v4_0001 + v4_0002: 88.329603 Without: 47.869687 Unicode: A query consisting of all Unicode characters from 0xA0 to 0x2FA1D (excluding 0xD800..0xDFFF). Patch v3j_0001: 91.333557 Patch v3j_0001 + static: 92.464786 Patch v3j_0001 + static + fast path: 91.359428 Patch v4_0001 + v3j_0002: 103.390609 Patch v4_0001 + v4_0002: 102.819763 Without: 92.279652 Regards, Alexander Borisov
Re: Rename functions to alloc/free things in reorderbuffer.c
Heikki Linnakangas writes: > ReorderBufferGetRelids allocates an array with MemoryContextAlloc, and > ReorderBufferReturnRelids just calls pfree. The pools are long gone, and > now the naming looks weird. > Attached patch renames those functions and other such functions to use > the terms Alloc/Free. I actually wonder if we should go further and > remove these functions altogether, and change the callers to call > MemoryContextAlloc directly. But I didn't do that yet. Yeah, that is very confusing, especially since nearby code uses names like ReorderBufferGetFoo for functions that are lookups not allocations. +1 for Alloc/Free where that's an accurate description. Given that a lot of these are not just one-liners, I'm not sure that getting rid of the ones that are would help much. regards, tom lane
Re: making EXPLAIN extensible
Hi, >>> EXPLAIN output. It wouldn't make sense for core to have an EXPLAIN >>> option whose whole purpose is to cater to the needs of some extension, >>> so that made me think of providing some extensibility infrastructure. >> Making EXPLAIN extensible sounds like a good idea.. FWIW, There is a >> discussion [0] >> for showing FDW remote plans ( postgres_fdw specifically), and I think >> we will need to >> add some new options to EXPLAIN to make that possible. > Have not looked at your patches, but I will do so now. Over the past few days I have had a chance to experiment with these patches, and the good news is that it has allowed me to extend EXPLAIN for postgres_fdw to show remote plans. I will share the update and patch for this in [0], but thought it would be good to share here as well. postgres=# explain (remote_plans) select * from t_r1, t1_r1; QUERY PLAN  Nested Loop (cost=200.00..83272.80 rows=6553600 width=16) Plan Node ID: 0 -> Foreign Scan on t_r1 (cost=100.00..673.20 rows=2560 width=8) Plan Node ID: 1 -> Materialize (cost=100.00..686.00 rows=2560 width=8) Plan Node ID: 2 -> Foreign Scan on t1_r1 (cost=100.00..673.20 rows=2560 width=8) Plan Node ID: 3 Remote Plans: Seq Scan on t (cost=0.00..32.60 rows=2260 width=8) Statement Name: Plan Node ID = 1 Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) Statement Name: Plan Node ID = 3 (14 rows) I do have some findings/suggestions: 1/ As you can see from the output above, I used explain_per_node_hook to append a "Plan Node ID" to the explain output. I really don't like having it there, and would prefer that it gets added to the top line of the node, i.e. -> Foreign Scan on t_r1 (cost=100.00..673.20 rows=2560 width=8) (node_id=1) -> Materialize (cost=100.00..686.00 rows=2560 width=8) (node_id=2) -> Foreign Scan on t1_r1 (cost=100.00..673.20 rows=2560 width=8) (node_id=3) Can we add a hook at that point [1] which will allow an extension to modify the first line of a node? I think this is not just useful for my case, but also for other use-cases in which some high-level node details could be placed. What do you think? 2/ I registered an options handler, and I wanted this options handler to validate that my new extension option is not used with the analyze option. The behavior is: if the core EXPLAIN option comes first in the list, the validation works, but if the extension option comes first, it does not. postgres=# explain (analyze, remote_plans) select from t_r1, t1_r1; ERROR: EXPLAIN options REMOTE_PLANS and ANALYZE cannot be used together postgres=# explain (remote_plans, analyze) select from t_r1, t1_r1; QUERY PLAN - Nested Loop (cost=200.00..147337.37 rows=11648569 width=0) (actual time=3.222..3.236 rows=0.00 loops=1) ... This is because ApplyExtensionExplainOption is called inside the main loop that parses the options, so when ApplyExtensionExplainOption runs, the core option may or may not have been set yet. I also think this will break if there are multiple extension options that need to be validated together. One way I thought to fix this is to allow the user to register another handler for validation, which can then be called after the parse-option loop, and after all the in-core options have been validated against each other. Right after this line. +/* if the summary was not set explicitly, set default value */ +es->summary = (summary_set) ? es->summary : es->analyze; What do you think?
[0] https://www.postgresql.org/message-id/flat/CAP%2BB4TD%3Diy-C2EnsrJgjpwSc7_4pd3Xh-gFzA0bwsw3q8u860g%40mail.gmail.com [1] https://github.com/postgres/postgres/blob/master/src/backend/commands/explain.c#L2013 Thanks -- Sami Imseih Amazon Web Services (AWS)
Re: making EXPLAIN extensible
Sami Imseih writes: > 1/ As you can see form the output above, I used explain_per_node_hook > to append a "Plan Node ID" to the explain output. I really don't like having > it > there, and prefer that it gets added to the top line of the node. > i.e. >-> Foreign Scan on t_r1 (cost=100.00..673.20 rows=2560 width=8) > (node_id=1) >-> Materialize (cost=100.00..686.00 rows=2560 width=8) (node_id=2) > -> Foreign Scan on t1_r1 (cost=100.00..673.20 rows=2560 > width=8) (node_id=3) > Can we add a hook at that point [1] which will allow an extension to modify > the first line of a node? I think this is not just useful for my case, but > also > for other use-cases in which some high level node details could be placed. > what do you think? I think this is a seriously bad idea. The first line is already overloaded; we don't need several different extensions adding more stuff to it. Plus, this doesn't consider what to do in non-text output formats. ISTM you should be good with adding a new "Plan Node ID" property to each node. No, you don't get to put it first. Too bad. (If we did try to cater to that, what shall we do with multiple extensions that all think they should be first?) The validation point is an interesting one. I agree that we don't want the behavior to depend on the order in which options are written. regards, tom lane
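A rough sketch of what Tom suggests (emitting the node ID as an ordinary per-node property from the extension's callback) might look like the fragment below. The hook name and signature are taken from the extensibility patchset under discussion, so treat them as assumptions rather than settled API; ExplainPropertyInteger is the existing explain.c helper, which is what makes the property come out sensibly in non-text formats too.

/* assumed callback shape, per the proposed patchset; not settled API */
static explain_per_node_hook_type prev_explain_per_node_hook = NULL;

static void
remote_plans_per_node_hook(PlanState *planstate, List *ancestors,
						   const char *relationship, const char *plan_name,
						   ExplainState *es)
{
	if (prev_explain_per_node_hook)
		prev_explain_per_node_hook(planstate, ancestors, relationship,
								   plan_name, es);

	/* remote_plans_enabled() is a hypothetical check for the custom option */
	if (remote_plans_enabled(es))
		ExplainPropertyInteger("Plan Node ID", NULL,
							   planstate->plan->plan_node_id, es);
}

/* in _PG_init(): prev_explain_per_node_hook = explain_per_node_hook;
 *                explain_per_node_hook = remote_plans_per_node_hook; */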
Re: Non-text mode for pg_dumpall
On 2025-03-12 We 3:03 AM, jian he wrote: On Wed, Mar 12, 2025 at 1:06 AM Álvaro Herrera wrote: Hello, On 2025-Mar-11, Mahendra Singh Thalor wrote: In map.dat file, I tried to fix this issue by adding number of characters in dbname but as per code comments, as of now, we are not supporting \n\r in dbnames so i removed handling. I will do some more study to fix this issue. Yeah, I think this is saying that you should not consider the contents of map.dat as a shell string. After all, you're not going to _execute_ that file via the shell. Maybe for map.dat you need to escape such characters somehow, so that they don't appear as literal newlines/carriage returns. I am confused. currently pg_dumpall plain format will abort when encountering dbname containing newline. the left dumped plain file does not contain all the cluster databases data. if pg_dumpall non-text format aborts earlier, it's aligned with pg_dumpall plain format? it's also an improvement since aborts earlier, nothing will be dumped? am i missing something? I think we should fix that. But for the current proposal, Álvaro and I were talking this morning, and we thought the simplest thing here would be to have the one line format and escape NL/CRs in the database name. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Optimization for lower(), upper(), casefold() functions.
On Wed, 2025-03-12 at 19:55 +0300, Alexander Borisov wrote: > 1. Added static for casemap() function. Otherwise the compiler could > not > optimize the code and the performance dropped significantly. Oops, it was static, but I made it external just to see what code it generated. I didn't intend to publish it as an external function -- thank you for catching that! > 2. Added a fast path for codepoint < 0x80. > > v3j-0002: > In the fast path for codepoints < 0x80, I added a premature return. > This avoided additional insertions, which increased performance. What do you mean "additional insertions"? Also, should we just compute the results in the fast path? We don't even need a table. Rough patch attached to go on top of v4-0001. Should we properly return CASEMAP_SELF when *simple == u1, or is it ok to return CASEMAP_SIMPLE? It probably doesn't matter performance-wise, but it feels more correct to return CASEMAP_SELF. > > Perhaps for general > beauty it should be made static inline, I don't have a rigid position > here. We ordinarily use "static inline" if it's in a header file, and "static" if it's in a .c file, so I'll do it that way. > I was purely based on existing approaches in Postgres, the > Normalization Forms have them separated into different headers. Just > trying to be consistent with existing approaches. I think that was done for normalization primarily because it's not used #ifndef FRONTEND (see unicode_norm.c), and perhaps also because it's just a more complex function worthy of its own file. I looked into the history, and commit 783f0cc64d explains why perfect hashing is not used in the frontend: "The decomposition table remains the same, getting used for the binary search in the frontend code, where we care more about the size of the libraries like libpq over performance..." > Regards, Jeff Davis From ed4d2803aa32add7c05726286b94e78e49bb1257 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Wed, 12 Mar 2025 11:56:59 -0700 Subject: [PATCH vtmp] fastpath --- src/common/unicode_case.c | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/src/common/unicode_case.c b/src/common/unicode_case.c index 2b3b4cdc2e7..b1fc1651043 100644 --- a/src/common/unicode_case.c +++ b/src/common/unicode_case.c @@ -390,11 +390,41 @@ casemap(pg_wchar u1, CaseKind casekind, bool full, { const pg_case_map *map; + /* + * Fast path for early codepoints. The only interesting characters are + * [a-zA-Z]. + */ if (u1 < 0x80) { - *simple = case_map[u1].simplemap[casekind]; + /* fast-path codepoints do not have special casing */ + Assert(find_case_map(u1)->special_case == NULL); - return CASEMAP_SIMPLE; + switch (casekind) + { + case CaseLower: + case CaseFold: +if (u1 >= 'A' && u1 <= 'Z') +{ + *simple = u1 + 0x20; + Assert(case_map[u1].simplemap[casekind] == *simple); + return CASEMAP_SIMPLE; +} +break; + case CaseTitle: + case CaseUpper: +if (u1 >= 'a' && u1 <= 'z') +{ + *simple = u1 - 0x20; + Assert(case_map[u1].simplemap[casekind] == *simple); + return CASEMAP_SIMPLE; +} +break; + default: +Assert(false); + } + + Assert(case_map[u1].simplemap[casekind] == u1); + return CASEMAP_SELF; } map = find_case_map(u1); -- 2.34.1
Re: Changing the state of data checksums in a running cluster
Tomas Vondra writes: > On 3/11/25 14:07, Dagfinn Ilmari Mannsåker wrote: >> As the resident perl style pedant, I'd just like to complain about the >> below: >> >> Tomas Vondra writes: >> >>> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm >>> b/src/test/perl/PostgreSQL/Test/Cluster.pm >>> index 666bd2a2d4c..1c66360c16c 100644 >>> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm >>> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm >>> @@ -3761,7 +3761,8 @@ sub checksum_enable_offline >>> my ($self) = @_; >>> >>> print "### Enabling checksums in \"$self->data_dir\"\n"; >>> - PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >>> $self->data_dir, '-e'); >>> + PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >>> + $self->data_dir, '-e'); >>> return; >>> } >> >> >> This breaking between the command line options and its arguments is why >> we're switching to using fat commas. We're also using long options for >> improved self-documentation, so this should be written as: >> >> PostgreSQL::Test::Utils::system_or_bail('pg_checksums', >> '--pgdata' => $self->data_dir, >> '--enable'); >> >> And likewise below in the disable method. >> > > I don't know what fat comma is, but that's simply what perltidy did. I > don't mind formatting it differently, if there's a better way. Fat comma is the perlish name for the => arrow, which is semantically equivalent to a comma (except it auto-quotes any immediately preceding bareword), but looks fatter. Perltidy knows to not wrap lines around them, keeping the key and value (or option and argument in this case) together. See commit ce1b0f9da03 for a large (but not complete, I have more patches pending) conversion to this new style. - ilmari
Re: meson vs. llvm bitcode files
Hi, The v7 patch looks good to me, handling the bitcode modules in a uniform way and also avoiding the hacky code and warnings, much better now. A small note about the bitcode emission for generated sources in contrib, using cube as example, currently it creates two dict entries in a list: bc_seg_gen_sources = [{'srcfiles': [seg_scan]}] bc_seg_gen_sources += {'srcfiles': [seg_parse[0]]} Then pass it to the bitcode_modules: bitcode_modules += { ... 'gen_srcfiles': bc_seg_gen_sources, } It could be passed as a list with a single dict, since both generated sources share the same compilation flags: bitcode_modules += { ... 'gen_srcfiles': [ { 'srcfiles': [cube_scan, cube_parse[0]] }. ] } Both approaches work, the first one has the advantage of being able to pass separate additional_flags per generated source. Thanks for your reply Nazir, also waiting for more opinions on this. Regards, Diego On Wed, Mar 12, 2025 at 7:27 AM Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 11 Mar 2025 at 01:04, Diego Fronza > wrote: > > I did a full review on the provided patches plus some tests, I was able > to validate that the loading of bitcode modules is working also JIT works > for both backend and contrib modules. > > Thank you! > > > To test JIT on contrib modules I just lowered the costs for all jit > settings and used the intarray extension, using the data/test__int.data: > > CREATE EXTENSION intarray; > > CREATE TABLE test__int( a int[] );1 > > \copy test__int from 'data/test__int.data' > > > > For queries any from line 98+ on contrib/intarray/sql/_int.sql will work. > > > > Then I added extra debug messages to llvmjit_inline.cpp on > add_module_to_inline_search_path() function, also on > llvm_build_inline_plan(), I was able to see many functions in this module > being successfully inlined. > > > > I'm attaching a new patch based on your original work which add further > support for generating bitcode from: > > Thanks for doing that! > > > - Generated backend sources: processed by flex, bison, etc. > > - Generated contrib module sources, > > I think we do not need to separate these two. > >foreach srcfile : bitcode_module['srcfiles'] > -if meson.version().version_compare('>=0.59') > +srcfilename = '@0@'.format(srcfile) > +if srcfilename.startswith(' + srcfilename = srcfile.full_path().split(meson.build_root() + '/')[1] > +elif meson.version().version_compare('>=0.59') > > Also, checking if the string starts with ' hacky, and 'srcfilename = '@0@'.format(srcfile)' causes a deprecation > warning. So, instead of this we can process all generated sources like > how generated backend sources are processed. I updated the patch with > that. > > > On this patch I just included fmgrtab.c and src/backend/parser for the > backend generated code. > > For contrib generated sources I added contrib/cube as an example. > > I applied your contrib/cube example and did the same thing for the > contrib/seg. > > > All relevant details about the changes are included in the patch itself. > > > > As you may know already I also created a PR focused on llvm bitcode > emission on meson, it generates bitcode for all backend and contribution > modules, currently under review by some colleagues at Percona: > https://github.com/percona/postgres/pull/103 > > I'm curious if we should get all or some of the generated backend > sources compiled to bitcode, similar to contrib modules. > > I think we can do this. 
I added other backend sources like you did in > the PR but attached it as another patch (0007) because I wanted to > hear other people's opinions on that first. > > v3 is attached. > > -- > Regards, > Nazir Bilal Yavuz > Microsoft >
Re: dblink query interruptibility
On 3/12/25 12:48 AM, Noah Misch wrote: The CREATE DATABASE hang is indeed new in v15. The general dblink missed interrupt processing (e.g. pg_cancel_backend response delay) is an old bug. Aha, that was what you were referring to! My apologies, was reading your mail a bit too quickly. :) Commit d3c5f37 used the new functions for postgres_fdw, not just dblink. That caused message changes detailed in postgr.es/m/CAHGQGwGpDTXeg8K1oTmDv9nankaKTrCD-XW-tnkzo6%3DE9p5%3Duw%40mail.gmail.com so I'm inclined to omit postgres_fdw changes in back branches. postgres_fdw was already interruptible, so the point of making postgres_fdw adopt the functions was to reduce code duplication. Makes sense. Overall, in the absence of objections, I will queue a task to back-patch the non-postgres_fdw portion of commit d3c5f37 to v13-v16. Thanks! Andreas
Re: Parallel heap vacuum
On Wed, Mar 12, 2025 at 11:24 AM Dilip Kumar wrote: > > On Wed, Mar 12, 2025 at 3:17 AM Masahiko Sawada wrote: > > > > On Tue, Mar 11, 2025 at 5:51 AM Amit Kapila wrote: > > > > > Some thoughts/questions on the idea > > I notice that we are always considering block-level parallelism for > heaps and object-level parallelism for indexes. I'm wondering, when > multiple tables are being vacuumed together—either because the user > has provided a list of tables or has specified a partitioned table > with multiple children—does it still make sense to default to > block-level parallelism? Or could we consider table-level parallelism > in such cases? For example, if there are 4 tables and 6 workers, with > 2 tables being small and the other 2 being large, perhaps we could > allocate 4 workers to vacuum all 4 tables in parallel. For the larger > tables, we could apply block-level parallelism, using more workers for > internal parallelism. On the other hand, if all tables are small, we > could just apply table-level parallelism without needing block-level > parallelism at all. This approach could offer more flexibility, isn't > it? > I have not thought from this angle, but it seems we can build this even on top of block-level vacuum for large tables. -- With Regards, Amit Kapila.
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On 2025-Mar-12, Ashutosh Bapat wrote: > The 002_pg_upgrade test passes with and without my patches now. But > then the tests added here do not leave behind any parent-child table. > Previously we have found problems in dumping and restoring constraints > in an inheritance hierarchy. I think the test should leave behind all > the combinations of parent and child NOT NULL constraints so that > 0002_pg_upgrade can test those. I agree. > Is it expected that a child may have VALID constraint but parent has > not valid constraint? Sure. Consider: if the parent has an unvalidated constraint, we cannot make any assertions about the state of its children. The children may have additional constraints of their own -- in this case, a child can have a validated constraint even though the parent has none, or only an unvalidated constraint. But the opposite is not allowed: if you know something to be true about a parent table (to wit: that no row in it is NULL), then this must necessarily apply to its children as well. Therefore, if there's a valid constraint in the parent, then all children must have the same constraint, and all such constraints must be known valid. > Same case with partitioned table. We should leave partitioned table > hierarchy behind for 002_pg_upgrade to test. And we need tests to test > scenarios where a partitioned table has valid constraint but we try to > change constraint on a partition to not valid and vice versa. I think > we shouldn't allow such assymetry in partitioned table hierarchy and > having a test would be better. Agreed. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: [PoC] Reducing planning time when tables have many partitions
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Hi Yuya, I tested this patch and noted that it significantly improves query planning time, especially as the number of partitions increases. While the impact is minimal for small partition counts (2–8), the improvement becomes substantial from 16 partitions onward, reaching up to ~86.6% reduction at 768 partitions. Larger partition counts (512–1024) see a dramatic speedup, cutting planning time by over 2.7 seconds. The results confirm that the patch makes planning for partitioned tables markedly more efficient. This enhancement is crucial for databases handling large partitioned tables, leading to better performance and scalability. Regards, NewtGlobal PostgreSQL contributors
Re: Test mail for pgsql-hackers
On Wednesday, March 12, 2025, Ashutosh Bapat wrote: > Hi Blessy, > > On Tue, Mar 11, 2025 at 6:03 PM BharatDB wrote: > >> > >> Hi , > > > > > >> > >> I’ve been exploring logical replication and noticed that if the column > datatypes don’t match between the publisher and subscriber, PostgreSQL > doesn’t give a warning. This can cause unexpected behavior, and I thought > it might be helpful to alert users when this happens. > > > > > >> > >> ### **What This Patch Does:** > > > > > >> > >> - Adds a warning when a column's datatype in the subscriber doesn’t > match the publisher. > >> > >> - Helps users catch issues early instead of running into silent errors > later. > > > > > >> > >> Why I Think It’s Useful:- Avoids confusion when replication doesn’t > work as expected. - Makes debugging easier by pointing out potential > problems. I’d love to get feedback on whether this is a good idea and if > I’ve approached it correctly. Since I’m still learning, any suggestions for > improvement would be really helpful. I’ve attached the patch—please let me > know what you think! > > Large part of your patch is renaming files which are not related to > this patch. Can you please clean that up. > If there is really a patch submission happening here I suggest you start a new thread with a meaningful subject line and proper recipients (one list, not -owners) David J.
Re: Document NULL
On Tue, Mar 11, 2025 at 17:43, Tom Lane wrote: > I think that idea (changing all the docs) is a complete nonstarter > because people would not understand why the results they get don't > look like what it says in the docs. Of course, we could fix that > by changing the factory default for \pset null, but that seems like > even more of a nonstarter. And if we use a tag to display nulls, like NULL, then all null values would look the same, and the user would understand what they mean, because they are displayed differently. It would then be a single, consistent way of presenting nulls, on this page and all others. Regards Marcos
Re: per backend WAL statistics
Michael Paquier writes: > And I guess that we're OK here, so applied. That should be the last > one. Quite a few buildfarm members are not happy about the initialization that 9a8dd2c5a added to PendingBackendStats. For instance [1]: gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I. -I. -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pgstat_backend.o pgstat_backend.c pgstat_backend.c:39:1: warning: missing braces around initializer [-Wmissing-braces] static PgStat_BackendPending PendingBackendStats = {0}; ^ pgstat_backend.c:39:1: warning: (near initialization for \342\200\230PendingBackendStats.pending_io\342\200\231) [-Wmissing-braces] I guess that more than one level of braces is needed for this to be fully correct? Similar from ayu, batfish, boa, buri, demoiselle, dhole, rhinoceros, shelduck, siskin. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2025-03-11%2004%3A59%3A16&stg=build
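For anyone who has not hit this warning before, here is a stand-alone reproduction (the struct names are stand-ins, not the real pgstat definitions): older GCC versions emit -Wmissing-braces for the single-brace initializer even though it is valid C, while a fully braced initializer keeps them quiet.

#include <stdint.h>

typedef struct PendingIO
{
	uint64_t	counts[8];
} PendingIO;

typedef struct BackendPending
{
	PendingIO	pending_io;
	uint64_t	wal_bytes;
} BackendPending;

/* old gcc: "warning: missing braces around initializer" */
static BackendPending a = {0};

/* fully braced; remaining members are still zero-initialized, no warning */
static BackendPending b = {{{0}}};

int
main(void)
{
	return (int) (a.wal_bytes + b.wal_bytes);
}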
Re: RFC: Additional Directory for Extensions
On Tue, Mar 11, 2025 at 12:59 PM Peter Eisentraut wrote: > Yes, that structure looks ok. But you can remove one level of block in > get_extension_control_directories(). > Sorry, missed during debugging. Fixed > I found a bug that was already present in my earlier patch versions: > > @@ -423,7 +424,7 @@ find_extension_control_filename(ExtensionControlFile > *control) > ecp = Extension_control_path; > if (strlen(ecp) == 0) > ecp = "$system"; > - result = find_in_path(basename, Extension_control_path, > "extension_control_path", "$system", system_dir); > + result = find_in_path(basename, ecp, "extension_control_path", > "$system", system_dir); > > Without this, it won't work if you set extension_control_path empty. > (Maybe add a test for that?) > Fixed, and also added a new test case for this. > I think this all works now, but I think the way > pg_available_extensions() works is a bit strange and inefficient. After > it finds a candidate control file, it calls > read_extension_control_file() with the extension name, that calls > parse_extension_control_file(), that calls > find_extension_control_filename(), and that calls find_in_path(), which > searches that path again! > > There should be a simpler way into this. Maybe > pg_available_extensions() should fill out the ExtensionControlFile > structure itself, set ->control_dir with where it found it, then call > directly to parse_extension_control_file(), and that should skip the > finding if the directory is already set. Or something similar. > Good catch. I fixed this by creating a new function to construct the ExtensionControlFile and changed the pg_available_extensions to set the control_dir. The read_extension_control_file was also changed to just call this new function constructor. I implemented the logic to check if the control_dir is already set on parse_extension_control_file because it seems to me that make more sense to not call find_extension_control_filename instead of putting this logic there since we already set the control_dir when we find the control file, and having the logic to set the control_dir or skip the find_in_path seems more confusing on this function instead of on parse_extension_control_file. Please let me know what you think. -- Matheus Alcantara v7-0001-extension_control_path.patch Description: Binary data
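For context, a hypothetical postgresql.conf fragment using the GUC under discussion might look like this; the list separator is assumed to follow the dynamic_library_path convention, and '$system' stands for the built-in control-file directory (as the fix above shows, an empty setting is treated the same as '$system').

# search a site-local directory first, then the built-in one
extension_control_path = '/opt/site/postgresql/extension:$system'

# equivalent to the default, and (per the fix above) to an empty setting
extension_control_path = '$system'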
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Wed, Mar 12, 2025 at 3:52 PM Rushabh Lathia wrote: > > > Hi Alvaro, > > Here are the latest patches, which includes the regression fix. > The 002_pg_upgrade test passes with and without my patches now. But then the tests added here do not leave behind any parent-child table. Previously we have found problems in dumping and restoring constraints in an inheritance hierarchy. I think the test should leave behind all the combinations of parent and child NOT NULL constraints so that 0002_pg_upgrade can test those. We should add more scenarios for constraint inheritance. E.g. #CREATE TABLE notnull_tbl1 (a int); #ALTER TABLE notnull_tbl1 ADD CONSTRAINT nn_parent not null a not valid; #CREATE TABLE notnull_chld (a int); #ALTER TABLE notnull_chld ADD CONSTRAINT nn_child not null a; #ALTER TABLE notnull_chld INHERIT notnull_tbl1; #SELECT conname, convalidated FROM pg_catalog.pg_constraint WHERE conrelid in ('notnull_tbl1'::regclass, 'notnull_chld'::regclass); conname | convalidated ---+-- nn_parent | f nn_child | t (2 rows) Is it expected that a child may have VALID constraint but parent has not valid constraint? Same case with partitioned table. We should leave partitioned table hierarchy behind for 002_pg_upgrade to test. And we need tests to test scenarios where a partitioned table has valid constraint but we try to change constraint on a partition to not valid and vice versa. I think we shouldn't allow such assymetry in partitioned table hierarchy and having a test would be better. -- Best Wishes, Ashutosh Bapat
RE: Selectively invalidate caches in pgoutput module
Dear Amit, > Thanks, the patch looks mostly good to me. I have made few cosmetic > changes in the attached and combined both the patches. Thanks, it looks good to me. > The existing > Alter Publication ... Rename tests don't test invalidations arising > from that command. As this patch changes that code path, it would be > good to add a few tests for the same. We can add one for individual > relations and another for ALL Tables publication. I created a new patch which adds a test for this. I added it to 007_ddl.pl, but I would also be fine with introducing a new test file. What do you think? Best regards, Hayato Kuroda FUJITSU LIMITED v12-0001-Avoid-invalidating-all-RelationSyncCache-entries.patch Description: v12-0001-Avoid-invalidating-all-RelationSyncCache-entries.patch v12-0002-Add-testcase.patch Description: v12-0002-Add-testcase.patch
Re: Question about duplicate JSONTYPE_JSON check
On Wed, Mar 12, 2025 at 7:09 PM Álvaro Herrera wrote: > On 2025-Mar-12, Amit Langote wrote: > > I was able to construct a test case that crashes due to this bug: > > > > CREATE TYPE mood AS ENUM ('happy', 'sad', 'neutral'); > > CREATE FUNCTION mood_to_json(mood) RETURNS json AS $$ > > SELECT to_json($1::text); > > $$ LANGUAGE sql IMMUTABLE; > > CREATE CAST (mood AS json) WITH FUNCTION mood_to_json(mood) AS IMPLICIT; > > > > SELECT JSON_OBJECT('happy'::mood: '123'::jsonb); > > server closed the connection unexpectedly > > Good reaction time :-) I see that that line shows as not even uncovered > in the report, but as non-existant (no background color as opposed to > red): > > https://coverage.postgresql.org/src/backend/utils/adt/jsonb.c.gcov.html#660 > > Evidently the compiler must be optimizing it out as dead code. Ah, I did wonder about the coverage, thanks for pointing it out. Does the patch look good for committing? -- Thanks, Amit Langote
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
Hi, After an off-list discussion with Fujii-san, I'm now trying to modify the following message that is output when a client attempts to connect instead of changing the log level as the original proposal: $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yet accepting connections DETAIL: Consistent recovery state has not been yet reached. I have now 2 candidates to do this. The 1st one(v1-0001-Change-log-message-when-hot-standby-is-not-access.patch) is a simple update to the existing log messages, explicitly mentioning that snapshot overflow could be a possible cause. The 2nd(v1-0001-Make-it-clear-when-hot-standby-is-inaccessible-du.patch) one introduces new states for pmState and CAC_state (which manages whether connections can be accepted) to represent waiting for a non-overflowed snapshot. The advantage of the 2nd one is that it makes it clear whether the connection failure is due to not reaching a consistent recovery state or a snapshot overflow. However, I haven't found other significant benefits, and I feel it might be overkill. Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback. -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.From 38a9ec23af2dc43ad24d939bb015d28d550d71fd Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 12 Mar 2025 21:47:22 +0900 Subject: [PATCH v1] Make it clear when hot standby is inaccessible due to subtransaction overflow Previously, the log message only assumed that the recovery process had not yet reached a consistent point. However, even after reaching the consistent point, if there is a transaction with an overflowed subtransaction, hot standby becomes inaccessible. Since there was no log message indicating this reason, it was difficult to identify the cause. This patch explicitly handles such cases, making the cause clearer in the logs. 
--- src/backend/postmaster/postmaster.c | 29 ++--- src/backend/storage/ipc/procarray.c | 17 + src/backend/tcop/backend_startup.c | 13 + src/include/storage/pmsignal.h | 2 ++ src/include/tcop/backend_startup.h | 1 + 5 files changed, 55 insertions(+), 7 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d2a7a7add6..5c3de3f97d 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -333,6 +333,8 @@ typedef enum PM_INIT, /* postmaster starting */ PM_STARTUP, /* waiting for startup subprocess */ PM_RECOVERY,/* in archive recovery mode */ + PM_SNAPSHOT_PENDING, /* in snapshot pending because of an + * overflowed subtransaction */ PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_STOP_BACKENDS, /* need to stop remaining backends */ @@ -1814,6 +1816,9 @@ canAcceptConnections(BackendType backend_type) else if (!FatalError && pmState == PM_RECOVERY) return CAC_NOTCONSISTENT; /* not yet at consistent recovery * state */ + else if (!FatalError && pmState == PM_SNAPSHOT_PENDING) + return CAC_SNAPSHOT_PENDING; /* waiting for non-overflowed + * snapshot */ else return CAC_RECOVERY; /* else must be crash recovery */ } @@ -2111,7 +2116,7 @@ process_pm_shutdown_request(void) */ if (pmState == PM_RUN || pmState == PM_HOT_STANDBY) connsAllowed = false; - else if (pmState == PM_STARTUP || pmState == PM_RECOVERY) + else if (pmState == PM_STARTUP || pmState == PM_RECOVERY || pmState == PM_SNAPSHOT_PENDING) { /* There should be no clients, so proceed to stop children */ UpdatePMState(PM_STOP_BACKENDS); @@ -2145,7 +2150,7 @@ process_pm_shutdown_request(void) sd_notify(0, "STOPPING=1"); #endif - if (pmState == PM_STARTUP || pmState == PM_RECOVERY) + if (pmState == PM_STARTUP || pmState == PM_RECOVERY || pmState == PM_SNAPSHOT_PENDING) { /* Just shut down background processes silently */ UpdatePMState(PM_STOP_BACKENDS); @@ -2711,6 +2716,7 @@ HandleFatalError(QuitSignalReason reason, bool consider_sigabrt) /* wait for children to die */ case PM_RECOVERY: + case PM_SNAPSHOT_PENDING: case PM_HOT_STANDBY: case PM_RUN: case PM_STOP_BACKENDS: @@ -3193,6 +3199,7 @@ pmstate_name(PMState state) PM_TOSTR_CASE(PM_INIT); PM_TOSTR_CASE(PM_STARTUP); PM_TOSTR_CASE(PM_RECOVERY); + PM_TOSTR_CASE(PM_SNAPSHOT_PENDING); PM_TOSTR_CASE(PM_HOT_STANDBY); PM_TOSTR_CASE(PM_RUN); PM_TOSTR_CASE(PM_STOP_BACKENDS); @@ -3245,7 +3252,7 @@ LaunchMissingBackgroundProcesses(void) * the shutdown checkpoint. That's done in PostmasterStateMachine(), not * here.) */ - if (pmState == PM_RUN || pmState == PM_RECOVERY || + if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_SNAPSHOT_PENDING || pmState == PM_HOT_STANDBY || pmState ==
Re: Logical Replication of sequences
On Wed, 12 Mar 2025 at 09:14, vignesh C wrote: > > The patch was not applying on top of HEAD because of recent commits, > here is a rebased version. I have moved this to the next CommitFest since it will not be committed in the current release. This also allows reviewers to focus on the remaining patches in the current CommitFest. Regards, Vignesh
Re: Bypassing cursors in postgres_fdw to enable parallel plans
Rafia Sabih writes: Hi, > > At present, in postgres_fdw, if a query which is using a parallel plan is > fired from the remote end fails to use the > parallel plan locally because of the presence of CURSORS. Consider the > following example, ... > > Now, to overcome this limitation, I have worked on this idea (suggested by my > colleague Bernd Helmle) of bypassing the > cursors. Do you know why we can't use parallel plan when cursor is used? Is It related to this code in ExecutePlan? /* * Set up parallel mode if appropriate. * * Parallel mode only supports complete execution of a plan. If we've * already partially executed it, or if the caller asks us to exit early, * we must force the plan to run without parallelism. */ if (queryDesc->already_executed || numberTuples != 0) use_parallel_mode = false; Actually I can't understand the comment as well and I had this confusion for a long time. -- Best Regards Andy Fan
Re: Document NULL
On Wednesday, March 12, 2025, Marcos Pegoraro wrote: > Em ter., 11 de mar. de 2025 às 17:43, Tom Lane > escreveu: > >> I think that idea (changing all the docs) is a complete nonstarter >> because people would not understand why the results they get don't >> look like what it says in the docs. Of course, we could fix that >> by changing the factory default for \pset null, but that seems like >> even more of a nonstarter. >> > > And if we use a tag to display nulls, like NULL > Then all null values would be the same, and the user would understand > what it means, because it's painted differently. > And then would be an unique way of appearance, on this page and all others > I’m not accepting this line of work as part of this patch. Please limit this to the merits of choosing \N as the particular value for \pset null on this page. You can start a new thread regarding a policy change for marking up null values in the whole of the documentation. But as Tom said, I don’t see that going anywhere. David J.
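For anyone following along, the mechanism under discussion is simply psql's display substitution for NULLs, for example as below; the thread's proposal is to use the string \N as that value on the page in question.

\pset null '(null)'
SELECT NULL AS value;   -- the cell now shows (null) instead of being empty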
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
Hi Alvaro, Here are the latest patches, which includes the regression fix. Thanks, On Wed, Mar 12, 2025 at 3:38 PM Ashutosh Bapat wrote: > On Wed, Mar 12, 2025 at 3:20 PM Alvaro Herrera > wrote: > > > > On 2025-Mar-12, Ashutosh Bapat wrote: > > > > > If the test passes for you, can you please try the patches at [1] on > > > top of your patches? Please apply those, set and export environment > > > variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? > > > I intended to do this but can not do it since the test always fails > > > with your patches applied. > > > > Oh, I need to enable a PG_TEST_EXTRA option in order for the test to > > run? FFS. That explains why the tests passed just fine for me. > > I'll re-run. > > You need PG_TEST_EXTRA to run the testcase I am adding there. Rest of > the testcases run without PG_TEST_EXTRA. > > -- > Best Wishes, > Ashutosh Bapat > -- Rushabh Lathia 0001-Convert-pg_attribut.attnotnull-to-char-type.patch Description: Binary data 0002-Support-NOT-VALID-and-VALIDATE-CONSTRAINT-for-named-.patch Description: Binary data 0003-Documentation-changes.patch Description: Binary data
Re: Parallel safety docs for CTEs
On Tue, Nov 19, 2024 at 2:16 PM James Coleman wrote: > > Hello, > > A colleague noticed today that the docs still say that "Scans of > common table expressions (CTEs)" are "always parallel restricted". > > While I think that strictly remains true at the implementation level, > from a user's perspective I think that's not been true since the > change to default to trying to inline CTEs rather than defaulting to > materializing them. > > Attached is a patch to slightly modify the language; would be happy to > hear suggestions on a better way to improve this. > > Regards, > James Coleman I'd forgotten to make a commit fest record for this and stumbled upon it today. See https://commitfest.postgresql.org/patch/5650/ Regards, James Coleman
Re: Statistics import and export: difference in statistics of materialized view dumped
On Wed, Mar 12, 2025 at 4:08 AM Jeff Davis wrote: > > On Tue, 2025-03-11 at 11:26 -0400, Tom Lane wrote: > > Right, that was what I was thinking, but hadn't had time to look in > > detail. The postDataBound dependency isn't real helpful here, we > > could lose that if we had the data dependency. > > Attached a patch. > > It's a bit messier than I expected, so I'm open to other suggestions. > The reason is because materialized view data is also pushed to > RESTORE_PASS_POST_ACL, so we need to do the same for the statistics > (otherwise the dependency is just ignored). I ran my test with this patch (we have to remove 0003 patch in my test which uses --no-statistics option). It failed with following differences @@ -452068,8 +452068,8 @@ SELECT * FROM pg_catalog.pg_restore_relation_stats( 'version', '18'::integer, 'relation', 'public.mvtest_aa'::regclass, - 'relpages', '2'::integer, - 'reltuples', '1'::real, + 'relpages', '0'::integer, + 'reltuples', '-1'::real, 'relallvisible', '0'::integer ); -- @@ -452097,8 +452097,8 @@ SELECT * FROM pg_catalog.pg_restore_relation_stats( 'version', '18'::integer, 'relation', 'public.mvtest_tm_type'::regclass, - 'relpages', '2'::integer, - 'reltuples', '3'::real, + 'relpages', '0'::integer, + 'reltuples', '-1'::real, 'relallvisible', '0'::integer ); -- @@ -452111,8 +452111,8 @@ SELECT * FROM pg_catalog.pg_restore_relation_stats( 'version', '18'::integer, 'relation', 'public.mvtest_tvmm_expr'::regclass, - 'relpages', '2'::integer, - 'reltuples', '1'::real, + 'relpages', '0'::integer, + 'reltuples', '-1'::real, 'relallvisible', '0'::integer ); --=== stderr === === EOF === The previous differences have disappeared but new differences have appeared. -- Best Wishes, Ashutosh Bapat
Re: AIO v2.5
Hi, On 2025-03-11 20:57:43 -0700, Noah Misch wrote: > > I think we'll really need to do something about this for BSD users > > regardless > > of AIO. Or maybe those OSs should fix something, but somehow I am not having > > high hopes for an OS that claims to have POSIX conforming unnamed semaphores > > due to having a syscall that always returns EPERM... [1]. > > I won't mind a project making things better for non-root BSD users. I do > think such a project should not block other projects making things better for > everything else (like $SUBJECT). Oh, I strongly agree. The main reason I would like it to be addressed is that I'm pretty tired of having to think about open/netbsd whenever we update some default setting. > > > On Mon, Mar 10, 2025 at 02:23:12PM -0400, Andres Freund wrote: > > > > Attached is v2.6 of the AIO patchset. > > > > > > > - 0005, 0006 - io_uring support - close, but we need to do something > > > > about > > > > set_max_fds(), which errors out spuriously in some cases > > > > > > What do we know about those cases? I don't see a set_max_fds(); is that > > > set_max_safe_fds(), or something else? > > > > Sorry, yes, set_max_safe_fds(). The problem basically is that with io_uring > > we > > will have a large number of FDs already allocated by the time > > set_max_safe_fds() is called. set_max_safe_fds() subtracts already_open from > > max_files_per_process allowing few, and even negative, IOs. > > > > I think we should redefine max_files_per_process to be about the number of > > files each *backend* will additionally open. Jelte was working on related > > patches, see [2] > > Got it. max_files_per_process is a quaint setting, documented as follows (I > needed the reminder): > > If the kernel is enforcing > a safe per-process limit, you don't need to worry about this setting. > But on some platforms (notably, most BSD systems), the kernel will > allow individual processes to open many more files than the system > can actually support if many processes all try to open > that many files. If you find yourself seeing Too many open > files failures, try reducing this setting. > > I could live with > v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but would lean > against it since it feels unduly novel to have a setting where we use the > postgresql.conf value to calculate a value that becomes the new SHOW-value of > the same setting. I think we may update some other GUCs, but not sure. > Options I'd consider before that: > - Like you say, "redefine max_files_per_process to be about the number of > files each *backend* will additionally open". It will become normal that > each backend's actual FD list length is max_files_per_process + MaxBackends > if io_method=io_uring. Outcome is not unlike > v6-0002-Bump-postmaster-soft-open-file-limit-RLIMIT_NOFIL.patch + > v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but we don't > mutate max_files_per_process. Benchmark results should not change beyond > the inter-major-version noise level unless one sets io_method=io_uring. I'm > feeling best about this one, but I've not been thinking about it long. Yea, I think that's something probably worth doing separately from Jelte's patch. I do think that it'd be rather helpful to have Jelte's patch to increase NOFILE in addition though. > > > > +static void > > > > +maybe_adjust_io_workers(void) > > > > > > This also restarts workers that exit, so perhaps name it > > > start_io_workers_if_missing(). > > > > But it also stops IO workers if necessary? > > Good point.
Maybe just add a comment like "start or stop IO workers to close > the gap between the running count and the configured count intent". It's now /* * Start or stop IO workers, to close the gap between the number of running * workers and the number of configured workers. Used to respond to change of * the io_workers GUC (by increasing and decreasing the number of workers), as * well as workers terminating in response to errors (by starting * "replacement" workers). */ > > > > +{ > > > ... > > > > + /* Try to launch one. */ > > > > + child = StartChildProcess(B_IO_WORKER); > > > > + if (child != NULL) > > > > + { > > > > + io_worker_children[id] = child; > > > > + ++io_worker_count; > > > > + } > > > > + else > > > > + break; /* XXX try > > > > again soon? */ > > > > > > Can LaunchMissingBackgroundProcesses() become the sole caller of this > > > function, replacing the current mix of callers? That would be more > > > conducive > > > to promptly doing the right thing after launch failure. > > > > I'm not sure that'd be a good idea - right now IO workers are started before > > the startup process, as the startup process might need to pe
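To make the set_max_safe_fds() problem described earlier in this message concrete, here is a small standalone sketch of the budget arithmetic: roughly min(usable_fds, max_files_per_process - already_open) minus a small reserve, so a pile of FDs preallocated by io_uring before the call drives the result to zero or below. This is an illustration with made-up numbers, not the actual fd.c code.

#include <stdio.h>

/* Illustrative only: mirrors the shape of the fd.c calculation, not its code. */
static int
fd_budget(int max_files_per_process, int usable_fds, int already_open, int reserve)
{
	int		remaining = max_files_per_process - already_open;
	int		safe = (usable_fds < remaining) ? usable_fds : remaining;

	return safe - reserve;
}

int
main(void)
{
	/* A typical backend has only a handful of FDs open before the call. */
	printf("without io_uring: %d\n", fd_budget(1000, 950, 30, 10));

	/* With io_uring, thousands of FDs can already be registered. */
	printf("with io_uring:    %d\n", fd_budget(1000, 950, 1500, 10));

	return 0;
}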
001_rep_changes.pl succeeds after a long time
Hello, I noticed that the test file 001_repo_changes.pl finished successfully after having taken 180s to run. This seems pretty suspicious -- normally that step takes around one second. The problem is seen in this step: [19:44:49.572](0.262s) ok 24 - update works with dropped subscriber column ### Stopping node "publisher" using mode fast # Running: pg_ctl -D /home/alvherre/Code/pgsql-build/master/src/test/subscription/tmp_check/t_001_rep_changes_publisher_data/pgdata -m fast stop waiting for server to shut down done server stopped # No postmaster PID for node "publisher" ### Starting node "publisher" # Running: pg_ctl -w -D /home/alvherre/Code/pgsql-build/master/src/test/subscription/tmp_check/t_001_rep_changes_publisher_data/pgdata -l /home/alvherre/Code/pgsql-build/master/src/test/subscription/tmp_check/log/001_rep_changes_publisher.log -o --cluster-name=publisher start waiting for server to start done server started # Postmaster PID for node "publisher" is 3502228 Waiting for replication conn tap_sub's replay_lsn to pass 0/17DEBD8 on publisher done [19:47:50.689](181.116s) ok 25 - check replicated inserts after subscription publication change Note that test 25 succeeded but took 3 minutes. If I look at the publisher logs, I see this: 2025-03-11 19:44:49.885 CET [3501892] 001_rep_changes.pl LOG: statement: DELETE FROM tab_rep 2025-03-11 19:44:50.052 CET [3494694] LOG: received fast shutdown request 2025-03-11 19:44:50.052 CET [3494694] LOG: aborting any active transactions 2025-03-11 19:44:50.053 CET [3494694] LOG: background worker "logical replication launcher" (PID 3494772) exited with exit code 1 2025-03-11 19:44:50.053 CET [3494755] LOG: shutting down 2025-03-11 19:44:50.480 CET [3501776] tap_sub LOG: released logical replication slot "tap_sub" 2025-03-11 19:44:50.484 CET [3494755] LOG: checkpoint starting: shutdown immediate 2025-03-11 19:44:50.485 CET [3494755] LOG: checkpoint complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s, average=0.000 s; distance=579 kB, estimate=579 kB; lsn=0/17DEB60, redo lsn=0/17DEB60 2025-03-11 19:44:50.488 CET [3494694] LOG: database system is shut down 2025-03-11 19:44:50.884 CET [3502228] LOG: starting PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit 2025-03-11 19:44:50.884 CET [3502228] LOG: listening on Unix socket "/run/user/1000/alvherre-tmp/i4fQURzady/.s.PGSQL.26529" 2025-03-11 19:44:50.886 CET [3502349] LOG: database system was shut down at 2025-03-11 19:44:50 CET 2025-03-11 19:44:50.889 CET [3502228] LOG: database system is ready to accept connections 2025-03-11 19:44:50.930 CET [3502395] 001_rep_changes.pl LOG: statement: SELECT pg_is_in_recovery() 2025-03-11 19:44:50.945 CET [3502428] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() 2025-03-11 19:44:50.960 CET [3502440] 001_rep_changes.pl LOG: statement: SELECT '0/17DEBD8' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('tap_sub', 'walreceiver') The last query is repeated a number of times for 3 minutes, and finally we get this 2025-03-11 19:47:50.569 CET [3540762] 001_rep_changes.pl LOG: statement: SELECT '0/17DEBD8' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('tap_sub', 'walreceiver') 2025-03-11 19:47:50.596 CET [3540764] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 
2025-03-11 19:47:50.597 CET [3540764] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2025-03-11 19:47:50.597 CET [3540764] tap_sub STATEMENT: IDENTIFY_SYSTEM Meanwhile, the subscriber has this: 2025-03-11 19:44:49.777 CET [3501647] LOG: logical replication worker for subscription "tap_sub" will restart because of a parameter change 2025-03-11 19:44:49.785 CET [3501757] LOG: logical replication apply worker for subscription "tap_sub" has started 2025-03-11 19:44:50.481 CET [3501757] LOG: data stream from publisher has ended 2025-03-11 19:44:50.481 CET [3501757] ERROR: could not send end-of-streaming message to primary: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. no COPY in progress 2025-03-11 19:44:50.484 CET [3495251] LOG: background worker "logical replication apply worker" (PID 3501757) exited with exit code 1 2025-03-11 19:44:50.490 CET [3502088] LOG: logical replication apply worker for subscription "tap_sub" has started 2025-03-11 19:44:50.490 CET [3502088] ERROR: apply worker for subscription "tap_sub" could not connect to the publisher: connection to server on socket "/run/user/1000/alvherre-tmp/i4fQURzady/.s.PGSQL.26529" failed: No such file or directory I
Rename functions to alloc/free things in reorderbuffer.c
I noticed some weird naming conventions in reorderbuffer.c which are leftovers from a long time ago when reorderbuffer.c maintained its own small memory pools to reduce palloc/pfree overhead. For example: extern Oid *ReorderBufferGetRelids(ReorderBuffer *rb, int nrelids); extern void ReorderBufferReturnRelids(ReorderBuffer *rb, Oid *relids); ReorderBufferGetRelids allocates an array with MemoryContextAlloc, and ReorderBufferReturnRelids just calls pfree. The pools are long gone, and now the naming looks weird. Attached patch renames those functions and other such functions to use the terms Alloc/Free. I actually wonder if we should go further and remove these functions altogether, and change the callers to call MemoryContextAlloc directly. But I didn't do that yet. Any objections? -- Heikki Linnakangas Neon (https://neon.tech) From 23b2bfbbbcccb79f586f5b719344fcf1e09325b0 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 12 Mar 2025 19:47:18 +0200 Subject: [PATCH 1/1] Rename alloc/free functions in reorderbuffer.c There used to be bespoken pools for these structs to reduce the palloc/pfree overhead, but that was ripped out a long time ago and replaced by the generic, cheaper generational memory allocator (commit a4ccc1cef5). The Get/Return terminology made sense with the pools, as you "got" an object from the pool and "returned" it later, but now it just looks weird. Rename to Alloc/Free. --- src/backend/replication/logical/decode.c | 26 +++--- .../replication/logical/reorderbuffer.c | 89 +-- src/include/replication/reorderbuffer.h | 15 ++-- 3 files changed, 64 insertions(+), 66 deletions(-) diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 24d88f368d8..78f9a0a11c4 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -915,7 +915,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); if (!(xlrec->flags & XLH_INSERT_IS_SPECULATIVE)) change->action = REORDER_BUFFER_CHANGE_INSERT; else @@ -928,7 +928,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) tuplelen = datalen - SizeOfHeapHeader; change->data.tp.newtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple); @@ -965,7 +965,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_UPDATE; change->origin_id = XLogRecGetOrigin(r); memcpy(&change->data.tp.rlocator, &target_locator, sizeof(RelFileLocator)); @@ -980,7 +980,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) tuplelen = datalen - SizeOfHeapHeader; change->data.tp.newtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple(data, datalen, change->data.tp.newtuple); } @@ -996,7 +996,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) tuplelen = datalen - SizeOfHeapHeader; change->data.tp.oldtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple(data, datalen, change->data.tp.oldtuple); 
} @@ -1031,7 +1031,7 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); if (xlrec->flags & XLH_DELETE_IS_SUPER) change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT; @@ -1051,7 +1051,7 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) Assert(XLogRecGetDataLen(r) > (SizeOfHeapDelete + SizeOfHeapHeader)); change->data.tp.oldtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete, datalen, change->data.tp.oldtuple); @@ -1083,7 +1083,7 @@ DecodeTruncate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_TRUNCATE; change->origin_id = XLogRecGetOrigin(r); if (xlrec->flags & XLH_TRUNCATE_CASCADE) @@ -1091,8 +1091,8 @@ DecodeTruncate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS) change->data.truncate.rest
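To make the "go further" idea above concrete, here is a hedged sketch of what a caller could look like with the wrappers removed entirely. The DecodeTruncate() field names follow the existing decode.c usage, but this fragment is an illustration, not part of the attached patch.

	/* today, via the wrapper (which simply allocates in rb->context) */
	change->data.truncate.relids =
		ReorderBufferGetRelids(ctx->reorder, xlrec->nrelids);
	/* and later, when the change is released */
	ReorderBufferReturnRelids(rb, change->data.truncate.relids);

	/* with the wrappers gone, callers would allocate and free directly */
	change->data.truncate.relids = (Oid *)
		MemoryContextAlloc(ctx->reorder->context,
						   xlrec->nrelids * sizeof(Oid));
	/* and later */
	pfree(change->data.truncate.relids);

Whether that is clearer than keeping thin ReorderBufferAlloc*/Free* wrappers is exactly the open question in the message above.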
Re: Test to dump and restore objects left behind by regression
On 2025-Mar-12, Ashutosh Bapat wrote: > Does the test pass for you if you don't apply my patches? Yes. It also passes if I keep PG_TEST_EXTRA empty. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Index AM API cleanup
On Wed, Mar 12, 2025 at 7:25 AM Tom Lane wrote: > Peter Eisentraut writes: > > 0002: Add get_opfamily_member_for_cmptype(). This was called > > get_opmethod_member() in your patch set, but I think that name wasn't > > quite right. I also removed the opmethod argument, which was rarely > > used and is somewhat redundant. > > Hm, that will throw an error if IndexAmTranslateCompareType fails. > Shouldn't it be made to return InvalidOid instead? > There are two failure modes. In one mode, the AM has a concept of equality, but there is no operator for the given type. In the other mode, the AM simply has no concept of equality. In DefineIndex, the call: if (stmt->unique && !stmt->iswithoutoverlaps) { idx_eqop = get_opfamily_member_for_cmptype(idx_opfamily, idx_opcintype, idx_opcintype, COMPARE_EQ); if (!idx_eqop) ereport(ERROR, errcode(ERRCODE_UNDEFINED_OBJECT), errmsg("could not identify an equality operator for type %s", format_type_be(idx_opcintype)), errdetail("There is no suitable operator in operator family \"%s\" for access method \"%s\".", get_opfamily_name(idx_opfamily, false), get_am_name(get_opfamily_method(idx_opfamily)))); } probably should error about COMPARE_EQ not having a strategy in the AM, rather than complaining later that an equality operator cannot be found for the type. So that one seems fine as it is now. The other caller, refresh_by_match_merge, is similar: op = get_opfamily_member_for_cmptype(opfamily, opcintype, opcintype, COMPARE_EQ); if (!OidIsValid(op)) elog(ERROR, "missing equality operator for (%u,%u) in opfamily %u", opcintype, opcintype, opfamily); seems fine. There are no other callers as yet. You have made me a bit paranoid that, in future, we might add additional callers who are not anticipating the error. Should we solve that with a more complete code comment, or do you think refactoring is in order? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
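For readers following along, a hedged sketch of the lookup shape being discussed may help show where each failure mode surfaces; the exact name, argument order, and flags of IndexAmTranslateCompareType() here are assumptions rather than quotes from the patch.

Oid
get_opfamily_member_for_cmptype_sketch(Oid opfamily, Oid lefttype,
									   Oid righttype, CompareType cmptype)
{
	StrategyNumber strategy;

	/*
	 * Failure mode 1: the AM has no concept of this compare type at all.
	 * In the version under discussion this step reports an error rather
	 * than returning.  (Argument order and flags here are assumptions.)
	 */
	strategy = IndexAmTranslateCompareType(cmptype,
										   get_opfamily_method(opfamily),
										   opfamily, false);

	/*
	 * Failure mode 2: the AM knows the strategy, but the opfamily has no
	 * operator for this type pair; get_opfamily_member() returns
	 * InvalidOid, which both callers quoted above check for.
	 */
	return get_opfamily_member(opfamily, lefttype, righttype, strategy);
}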
Re: Primary and standby setting cross-checks
On Thu, 29 Aug 2024 at 23:52, Heikki Linnakangas wrote: > > Currently, if you configure a hot standby server with a smaller > max_connections setting than the primary, the server refuses to start up: > > LOG: entering standby mode > FATAL: recovery aborted because of insufficient parameter settings > DETAIL: max_connections = 10 is a lower setting than on the primary > server, where its value was 100. > HINT: You can restart the server after making the necessary > configuration changes. > > Or if you change the setting in the primary while the standby is > running, replay pauses: > > WARNING: hot standby is not possible because of insufficient parameter > settings > DETAIL: max_connections = 100 is a lower setting than on the primary > server, where its value was 200. > CONTEXT: WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: > max_connections=200 max_worker_processes=8 max_wal_senders=10 > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical > wal_log_hints=off track_commit_timestamp=off > LOG: recovery has paused > DETAIL: If recovery is unpaused, the server will shut down. > HINT: You can then restart the server after making the necessary > configuration changes. > CONTEXT: WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: > max_connections=200 max_worker_processes=8 max_wal_senders=10 > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical > wal_log_hints=off track_commit_timestamp=off > > Both of these are rather unpleasant behavior. > > I thought I could get rid of that limitation with my CSN snapshot patch > [1], because it gets rid of the fixed-size known-assigned XIDs array, > but there's a second reason for these limitations. It's also used to > ensure that the standby has enough space in the lock manager to hold > possible AccessExclusiveLocks taken by transactions in the primary. > > So firstly, I think that's a bad tradeoff. In vast majority of cases, > you would not run out of lock space anyway, if you just started up the > system. Secondly, that cross-check of settings doesn't fully prevent the > problem. It ensures that the lock tables are large enough to accommodate > all the locks you could possibly hold in the primary, but that doesn't > take into account any additional locks held by read-only queries in the > hot standby. So if you have queries running in the standby that take a > lot of locks, this can happen anyway: > > 2024-08-29 21:44:32.634 EEST [668327] FATAL: out of shared memory > 2024-08-29 21:44:32.634 EEST [668327] HINT: You might need to increase > "max_locks_per_transaction". 
> 2024-08-29 21:44:32.634 EEST [668327] CONTEXT: WAL redo at 2/FD40FCC8 > for Standby/LOCK: xid 996 db 5 rel 154045 > 2024-08-29 21:44:32.634 EEST [668327] WARNING: you don't own a lock of > type AccessExclusiveLock > 2024-08-29 21:44:32.634 EEST [668327] LOG: RecoveryLockHash contains > entry for lock no longer recorded by lock manager: xid 996 database 5 > relation 154045 > TRAP: failed Assert("false"), File: > "../src/backend/storage/ipc/standby.c", Line: 1053, PID: 668327 > postgres: startup recovering > 0001000200FD(ExceptionalCondition+0x6e)[0x556a4588396e] > postgres: startup recovering > 0001000200FD(+0x44156e)[0x556a4571356e] > postgres: startup recovering > 0001000200FD(StandbyReleaseAllLocks+0x78)[0x556a45712738] > postgres: startup recovering > 0001000200FD(ShutdownRecoveryTransactionEnvironment+0x15)[0x556a45712685] > postgres: startup recovering > 0001000200FD(shmem_exit+0x111)[0x556a457062e1] > postgres: startup recovering > 0001000200FD(+0x434132)[0x556a45706132] > postgres: startup recovering > 0001000200FD(proc_exit+0x59)[0x556a45706079] > postgres: startup recovering > 0001000200FD(errfinish+0x278)[0x556a45884708] > postgres: startup recovering > 0001000200FD(LockAcquireExtended+0xa46)[0x556a45719386] > postgres: startup recovering > 0001000200FD(StandbyAcquireAccessExclusiveLock+0x11d)[0x556a4571330d] > postgres: startup recovering > 0001000200FD(standby_redo+0x70)[0x556a45713690] > postgres: startup recovering > 0001000200FD(PerformWalRecovery+0x7b3)[0x556a4547d313] > postgres: startup recovering > 0001000200FD(StartupXLOG+0xac3)[0x556a4546dae3] > postgres: startup recovering > 0001000200FD(StartupProcessMain+0xe8)[0x556a45693558] > postgres: startup recovering > 0001000200FD(+0x3ba95d)[0x556a4568c95d] > postgres: startup recovering > 0001000200FD(+0x3bce41)[0x556a4568ee41] > postgres: startup recovering > 0001000200FD(PostmasterMain+0x116e)[0x556a4568eaae] > postgres: startup recovering > 0001000200FD(+0x2f960e)[0x556a455cb60e] > /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f10ef042c8a] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f10ef042d45] > postgres: startup recovering > 0001000200FD(_start+0x21)[0x556a
Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Hi, Alexander! On 06.03.2025 11:23, Alexander Korotkov wrote: Hi, Alena! On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina wrote: On 09.02.2025 18:38, Alexander Korotkov wrote: Also, aren't we too restrictive while requiring is_simple_values_sequence()? For instance, I believe cases like this (containing Var) could be transformed too. select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1))); I added it and attached a patch with a diff file. To be honest, I didn't find any queries, except for Vars with volatile functions, where the transform can't be applied. I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence. The only case I'm not sure about is where a Var can refer to something outside the available_rels list, but I couldn't come up with an example where that's possible; what do you think? Considering it again, I think we can't face problems like that because we don't work with joins. I attached a diff file showing the difference from the 3rd version of the patch, where we did not consider VALUES containing Vars for the transformation. I took a detailed look at the makeSAOPArrayExpr() function, which is much more complex than the corresponding fragment from match_orclause_to_indexcol(). And I found it to be mostly wrong. We are working at the post-parse-analysis stage. That means it's too late to do type coercion or to look up an operator by name. We already have all the catalog objects nailed down. In connection with that, the second argument of the OpExpr shouldn't be ignored, as it might contain a relevant type cast. I think I've fixed most of those problems in the attached patchset. I agree with your conclusion and changes. -- Regards, Alena Rybakina Postgres Professional