Re: Parallel CREATE INDEX for GIN indexes

2025-03-15 Thread Alexander Korotkov
Hi Matthias,

On Fri, Mar 7, 2025 at 4:08 AM Matthias van de Meent wrote:
>
> On Tue, 4 Mar 2025 at 20:50, Tomas Vondra  wrote:
> >
> > I pushed the two smaller parts today.
> >
> > Here's the remaining two parts, to keep cfbot happy. I don't expect to
> > get these into PG18, though.
>
> As promised on- and off-list, here's the 0001 patch, polished, split,
> and further adapted for performance.
>
> As seen before, it reduces tempspace requirements by up to 50%. I've
> not tested this against HEAD for performance.
>
> It has been split into:
>
> 0001: Some API cleanup/changes that crept into the patch. This
> removes manual length-passing from the gin tuplesort APIs, instead
> relying on GinTuple's tuplen field. It's not critical for anything,
> and could be ignored if so desired.
>
> 0002: Tuplesort changes to allow TupleSort users to buffer and merge
> tuples during the sort operations.
> The patch was pulled directly from [0] (which was derived from earlier
> work in this thread), is fairly easy to understand, and has no other
> moving parts.
>
> 0003: Deduplication in tuplesort's flush-to-disk actions, utilizing
> API introduced with 0002.
> This improves temporary disk usage by deduplicating data even further,
> for when there's a lot of duplicated data but the data has enough
> distinct values to not fit in the available memory.
>
> 0004: Use a single tuplesort. This removes the worker-local tuplesort
> in favor of only storing data in the global one.
>
> This mainly reduces the code size and complexity of parallel GIN
> builds; we already were using that global sort for various tasks.
>
> Open questions and open items for this:
> - I did not yet update the pg_stat_progress systems, nor docs.
> - Maybe 0003 needs further splitting up, one for the optimizations in
> GinBuffer, one for the tuplesort buffering.

Yes, please.  That would simplify the detailed review.

> - Maybe we need to trim the buffer in gin's tuplesort flush?

I didn't get it.  Could you please elaborate more on this?

> - Maybe we should grow the GinBuffer->items array superlinearly rather
> than to the exact size requirement of the merge operation.

+1 for this
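
Just to illustrate what "superlinear" could mean here (a sketch with made-up
types, not the patch's actual GinBuffer code): grow the capacity
geometrically, so a long run of merges reallocates O(log n) times instead of
once per merge.

```c
#include <stdlib.h>

/* Illustrative stand-in for the patch's GinBuffer->items array. */
typedef struct ItemsArray
{
    unsigned int *items;
    int           nitems;
    int           maxitems;     /* allocated capacity */
} ItemsArray;

/* Grow to at least "needed" entries, doubling rather than resizing to the
 * exact requirement of each merge operation. */
static void
items_ensure_capacity(ItemsArray *arr, int needed)
{
    if (arr->maxitems >= needed)
        return;

    if (arr->maxitems == 0)
        arr->maxitems = 64;
    while (arr->maxitems < needed)
        arr->maxitems *= 2;

    /* out-of-memory handling omitted for brevity */
    arr->items = realloc(arr->items, sizeof(unsigned int) * arr->maxitems);
}
```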

> Apart from the complexities in 0003, I think the changes are fairly
> straightforward.

Yes, 0001, 0002 and 0004 are pretty straightforward.

Regarding 0003, having a separate deformed tuple in GinBuffer alongside the
cached tuples looks a bit cumbersome.  Could we simplify this?  I understand
that we need to decompress the items array lazily.  But could we keep just
the items-related fields in GinBuffer, and have the rest always in
GinBuffer.cached?  So, if GinBuffer.items != NULL then we have the items
decompressed already; otherwise we have to decompress them when needed.
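
To sketch the suggested shape (field names and decoding here are made up for
illustration; the real patch works with GinTuple/ItemPointerData and GIN
posting-list decoding):

```c
#include <stdlib.h>

/* Illustrative stand-ins for the compressed representation. */
typedef struct RawTuple
{
    const unsigned char *data;
    int                  nitems;
} RawTuple;

typedef struct GinBufferSketch
{
    /* ... key fields, always kept deformed ... */
    unsigned int *items;    /* decompressed item array, NULL until needed */
    int           nitems;
    RawTuple      cached;   /* raw tuple still holding the compressed items */
} GinBufferSketch;

/* Decompress lazily: only the first caller that actually needs the item
 * array pays for decoding; later calls reuse the result. */
static unsigned int *
gin_buffer_get_items(GinBufferSketch *buf)
{
    if (buf->items == NULL)
    {
        /* OOM handling omitted for brevity */
        buf->items = malloc(sizeof(unsigned int) * buf->cached.nitems);
        for (int i = 0; i < buf->cached.nitems; i++)
            buf->items[i] = buf->cached.data[i];   /* stand-in for real decoding */
        buf->nitems = buf->cached.nitems;
    }
    return buf->items;
}
```

The single NULL check keeps the fast path cheap for callers that never touch
the item list.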

--
Regards,
Alexander Korotkov
Supabase




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-15 Thread Alvaro Herrera
On 2025-Mar-12, Ashutosh Bapat wrote:

> If the test passes for you, can you please try the patches at [1] on
> top of your patches? Please apply those, set and export environment
> variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test?
> I intended to do this but can not do it since the test always fails
> with your patches applied.

Oh, I need to enable a PG_TEST_EXTRA option in order for the test to
run?  FFS.  That explains why the tests passed just fine for me.
I'll re-run.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)




Re: Proposal: manipulating pg_control file from Perl

2025-03-15 Thread Dagfinn Ilmari Mannsåker
Aleksander Alekseev  writes:

> Hi,
>
> Previously it was pointed out [1] that manipulating the pg_control
> file from Cluster.pm may be beneficial under certain conditions.
[...]
> Unfortunately this can't be done at the moment. One of the reasons is
> that the ControlFileData structure stores platform-dependent types
> like `int`. The size of the structure and all the offsets within it
> are variable. The second reason is that integer representation may be
> little- or big-endian depending on the platform. The third reason is
> that the alignment of the fields may differ even if we switch to types
> of the known size like int32 / int64.

Notwithstanding Tom's objections, these are not reasons for not being
able to manipulate these values from Perl.  The `i` and `I` formats for
pack/unpack (see https://perldoc.perl.org/functions/pack) use what the C
compiler calls `int`, in terms of both endianness and size.

> For named reasons, manipulating pg_upgrade from Perl is impractical,
> considering the number of environments we support.
>
> I see two possible solutions:
>
> 1) Provide a tool written in C that allows changing pg_control, e.g.
> `pg_writecontoldata` or maybe a flag like `pg_controldata -w`. The
> tool can be executed from Perl, so it shouldn't need to know about
> sizeof(int), alignment and endianness.

1.5) Use Perl's pack/unpack functions, which are explicitly designed for
 exactly this use case.

- ilmari




PG_CFLAGS rpath Passthrough Issue

2025-03-15 Thread David E. Wheeler
Hello Hackers,

I'm trying to compile an extension with PG_CFLAGS [1]:

```sh
make PG_CFLAGS='-Wl,-rpath,$ORIGIN'
```

This works, but for some reason the rpath value is truncated:

```console
# chrpath -l src/semver.so
src/semver.so: RUNPATH=RIGIN
```

Do I need to do something different to include the missing characters `$O`? Or 
is there an issue with the quoting of these variables in PGXS?

Thanks,

David

[1]: https://github.com/postgres/postgres/blob/c7fc880/src/makefiles/pgxs.mk#L58



signature.asc
Description: Message signed with OpenPGP


Re: Disabling vacuum truncate for autovacuum

2025-03-15 Thread Laurenz Albe
On Fri, 2025-03-14 at 10:42 -0500, Nathan Bossart wrote:
> One other difference in my version of the patch [0] is to call this GUC
> vacuum_truncate and have it apply to both autovacuum and VACUUM.

I agree that that is the way to go.
It makes no sense to restrict the feature to autovacuum.

Yours,
Laurenz Albe




Re: vacuumdb changes for stats import/export

2025-03-15 Thread Nathan Bossart
On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote:
> I have no further comments.

Thanks.  I'll give this a little more time for review before committing.

We'll still need to update the recommendation in pg_upgrade's
documentation.  I'm going to keep that separate because the stats
import/export work is still settling.

-- 
nathan




Re: Improve CRC32C performance on SSE4.2

2025-03-15 Thread John Naylor
On Wed, Mar 12, 2025 at 3:57 AM Devulapalli, Raghuveer wrote:
>
> > I've gone ahead and added the generated AVX-512 algorithm in v14-0005

> I'm a little confused. Why is v14 missing the earlier versions of pclmul 
> implementation that works without avx-512?

As I mentioned upthread, the 128-bit implementation regresses on Zen 2
up to at least 256 bytes.

--
John Naylor
Amazon Web Services




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-15 Thread Rushabh Lathia
On Thu, Feb 27, 2025 at 3:25 PM Rushabh Lathia wrote:

> Thank Alvaro for the fixup patch.
>
>
>
>
> On Fri, Feb 21, 2025 at 11:43 AM Alvaro Herrera wrote:
>
>> Hello,
>>
>> Thanks!
>>
>> I noticed a typo 'constrint' in several places; fixed in the attached.
>>
>> I discovered that this sequence, taken from added regression tests,
>>
>> CREATE TABLE notnull_tbl1 (a int);
>> ALTER TABLE notnull_tbl1 ADD CONSTRAINT nn_parent not null a not valid;
>> CREATE TABLE notnull_chld (a int);
>> ALTER TABLE notnull_chld ADD CONSTRAINT nn_child not null a not valid;
>> ALTER TABLE notnull_chld INHERIT notnull_tbl1;
>> ALTER TABLE notnull_tbl1 VALIDATE CONSTRAINT nn_parent;
>>
>> does mark the constraint as validated in the child, but only in
>> pg_constraint -- pg_attribute continues to be marked as 'i', so if you
>> try to use it for a PK, it fails:
>>
>> alter table notnull_chld add constraint foo primary key (a);
>> ERROR:  column "a" of table "notnull_chld" is marked as NOT VALID NOT
>> NULL constrint
>>
>> I thought that was going to be a quick fix, so I tried to do so; since
>> we already have a function 'set_attnotnull', I thought it was the
>> perfect tool for changing attnotnull.  However, it's not great, because
>> since that part of the code is already doing the validation, I don't
>> want it to queue the validation again, so the API needs a tweak; I
>> changed it to receiving separately which new value to update attnotnull
>> to, and whether to queue validation.  With that change it works
>> correctly, but it is a bit ugly at the callers' side.  Maybe it works to
>> pass two booleans instead?  Please have a look at whether that can be
>> improved.
>>
>
> I haven't given much thought to improving the API, but I'll look into it
> now. Also,
> I'm considering renaming AdjustNotNullInheritance() since it now also
> checks for invalid NOT NULL constraints. What do you think?
>

I adjusted the set_attnotnull() API and removed the added queue_validation
parameter.  Rather, the function now uses the wqueue input parameter as a
check: if wqueue is NULL, skip the queue_validation.  Attaching the patch
here, but I'm not sure how clear it is in terms of understanding the API.
Your thoughts/inputs?

Looking further at AdjustNotNullInheritance(), I don't see a need to rename
this API, as it's just adding a new error check for an existing NOT NULL
constraint.

JFYI, I can reproduce the failure Ashutosh Bapat reported, while running
the pg_upgrade test through meson commands.  I am investigating that
further.


Thanks,
Rushabh Lathia


0004-Adjust-set_attnotnull-API-input-parameters.patch
Description: Binary data


Re: SQL Property Graph Queries (SQL/PGQ)

2025-03-15 Thread Junwang Zhao
Hi Amit,

On Tue, Mar 11, 2025 at 5:06 PM Amit Langote  wrote:
>
> Hi Ashutosh, Junwang,
>
> On Tue, Mar 11, 2025 at 4:22 PM Ashutosh Bapat wrote:
> > On Wed, Feb 26, 2025 at 8:04 PM Junwang Zhao  wrote:
> > > I added a trivial fix(v12-0014) that called table_open/table_close in
> > > rewriteGraphTable, it now passed the regression test and cirrus ci test,
> > > but I'm not sure it's the correct fix.
> > >
> > > I hope Ashutosh can chime in and take a look at this problem.
> >
> > 2. Following Assertion is failing, the assertion was added recently.
> > TRAP: failed Assert("IsParallelWorker() ||
> > CheckRelationOidLockedByMe(rte->relid, AccessShareLock, true)"), File:
> > "../../coderoot/pg/src/backend/executor/execMain.c", Line: 695, PID:
> > 303994
> > postgres: ashutosh regression [local]
> > SELECT(ExceptionalCondition+0xbe)[0x5c838c5d7114]
> > postgres: ashutosh regression [local]
> > SELECT(ExecCheckPermissions+0xf8)[0x5c838c11fb9c]
> > postgres: ashutosh regression [local] SELECT(+0x38223f)[0x5c838c12023f]
> > postgres: ashutosh regression [local]
> > SELECT(standard_ExecutorStart+0x2f8)[0x5c838c11f223]
> > postgres: ashutosh regression [local] 
> > SELECT(ExecutorStart+0x69)[0x5c838c11ef22]
> > postgres: ashutosh regression [local] 
> > SELECT(PortalStart+0x368)[0x5c838c3d991a]
> > postgres: ashutosh regression [local] SELECT(+0x63458e)[0x5c838c3d258e]
> > postgres: ashutosh regression [local] 
> > SELECT(PostgresMain+0x9eb)[0x5c838c3d7cf0]
> > postgres: ashutosh regression [local] SELECT(+0x630178)[0x5c838c3ce178]
> > postgres: ashutosh regression [local]
> > SELECT(postmaster_child_launch+0x137)[0x5c838c2da677]
> > postgres: ashutosh regression [local] SELECT(+0x5431b4)[0x5c838c2e11b4]
> > postgres: ashutosh regression [local] SELECT(+0x54076a)[0x5c838c2de76a]
> > postgres: ashutosh regression [local]
> > SELECT(PostmasterMain+0x15f8)[0x5c838c2de04d]
> > postgres: ashutosh regression [local] SELECT(main+0x3a1)[0x5c838c1b12be]
> > /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7eda9ea29d90]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7eda9ea29e40]
> > postgres: ashutosh regression [local] SELECT(_start+0x25)[0x5c838be7c025]
> > 2025-03-11 11:40:01.696 IST postmaster[303081] LOG: client backend
> > (PID 303994) was terminated by signal 6: Aborted
> > 2025-03-11 11:40:01.696 IST postmaster[303081] DETAIL: Failed process
> > was running: select * from atpgv1;
> > I tried to investigate the Assertion, it's failing for property graph
> > RTE which is turned into subquery RTE. It has the right lock mode set,
> > but I haven't been able to figure out where the lock is supposed to be
> > taken or where it's released. If we just prepare the failing query and
> > execute the prepared statement, it does not trip the assertion. Will
> > investigate it more.
>
> I reproduced the crash using the example Junwang gave.
>
> The problem seems to be that RTEs of rtekind RTE_GRAPH_TABLE are not
> handled in AcquireRewriteLocks().  You'll need to add a case for
> RTE_GRAPH_TABLE similar to RTE_RELATION in the following switch of
> that function:
>
> /*
>  * First, process RTEs of the current query level.
>  */
> rt_index = 0;
> foreach(l, parsetree->rtable)
> {
> RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);
> Relationrel;
> LOCKMODElockmode;
> List   *newaliasvars;
> Index   curinputvarno;
> RangeTblEntry *curinputrte;
> ListCell   *ll;
>
> ++rt_index;
> switch (rte->rtekind)
> {
> case RTE_RELATION:
>
> which could be as simple as the following (fixes the crash!) or
> something that's specific to RTE_GRAPH_TABLE:
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index c51dd3d2ee4..8fa6edb90eb 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -174,6 +174,7 @@ AcquireRewriteLocks(Query *parsetree,
>  switch (rte->rtekind)
>  {
>  case RTE_RELATION:
> +case RTE_GRAPH_TABLE:
>
> --
> Thanks, Amit Langote

I’ve tested the fix, it works and is better than my previous solution.
I will send a new patch set with this improvement, thanks.

-- 
Regards
Junwang Zhao




Re: NOT ENFORCED constraint feature

2025-03-15 Thread Alexandra Wang
Hi Amul,

On Thu, Feb 27, 2025 at 12:57 AM Amul Sul  wrote:

> Attached is the rebased patch set against the latest master head,
> which also includes a *new* refactoring patch (0001). In this patch,
> I’ve re-added ATExecAlterChildConstr(), which is required for the main
> feature patch (0008) to handle recursion from different places while
> altering enforceability.


Thanks for the patches!

I reviewed and ran “make check” on each patch. I appreciate how the
patches are organized; separating the refactors from the
implementations made the review process very straightforward.
Overall, LGTM, and I have minor comments below:

0008
Since we are added "convalidated" in some of the constraints tests,
should we also add a "convalidated" field in the "table_constraints"
system view defined in src/backend/catalog/information_schema.sql? If
we do that, we'd also need to update the documentation for this view.

0009
Comment on top of the function ATExecAlterConstrEnforceability():
s/ATExecAlterConstrRecurse/ATExecAlterConstraintInternal/g

Typo in tablecmds.c: s/droping/dropping, s/ke/key
/* We should be droping trigger related to foreign ke constraint */

Thanks,
Alex


RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-15 Thread Bykov Ivan
Hello!

> Since then, I see that Ivan
> has already submitted a patch that accounts for NULL nodes and adds a
> byte to the jumble buffer to account for NULLs. This seems quite clean
> and simple. However, Sami seems to have concerns about the overhead of
> doing this. Is that warranted at all? Potentially, there could be a
> decent number of NULL fields. It'll probably be much cheaper than the
> offsetof idea I came up with.

To address the issue of it consuming a lot of bytes in the jumble buffer
for NULL marks, I have added a new patch version for Variant B.
(v2-0001-Query-ID-Calculation-Fix-Variant-B.patch).

This patch adds to JumbleState the count of NULL nodes visited before a
non-NULL one appears. The least significant byte of this count is appended
to the jumble buffer every time the count is non-zero (indicating that we have
visited NULL nodes previously), and a new chunk of data is also appended
to the jumble buffer. I intentionally did not add a final append for the NULL
count in the JumbleQuery processing, as NULL tail nodes of the query do not
improve the reliability of query ID collision resolution.
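
Roughly, the bookkeeping looks like this (a simplified illustration, not the
actual JumbleState/AppendJumble code):

```c
#include <stdint.h>
#include <string.h>

#define JUMBLE_BUF_SIZE 1024

/* Simplified jumble state: a pending counter of NULL nodes seen since the
 * last time real data was appended. */
typedef struct JumbleSketch
{
    unsigned char buf[JUMBLE_BUF_SIZE];
    int           len;
    uint64_t      pending_nulls;
} JumbleSketch;

static void
append_raw(JumbleSketch *js, const void *data, int n)
{
    /* (the real code hashes and resets the buffer when it fills up) */
    if (js->len + n > JUMBLE_BUF_SIZE)
        return;
    memcpy(js->buf + js->len, data, n);
    js->len += n;
}

/* NULL nodes only bump the counter; the counter's low byte is flushed right
 * before the next non-NULL data, so "A, NULL, B" and "A, B" jumble
 * differently while trailing NULLs cost nothing. */
static void
jumble_value(JumbleSketch *js, const void *data, int n)
{
    if (data == NULL)
    {
        js->pending_nulls++;
        return;
    }
    if (js->pending_nulls > 0)
    {
        unsigned char lsb = (unsigned char) (js->pending_nulls & 0xFF);

        append_raw(js, &lsb, 1);
        js->pending_nulls = 0;
    }
    append_raw(js, data, n);
}
```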

I believe that my first version, Variant B of the patch, is simple and easy to
understand. I would prefer to use the first version of the Variant B patch,
so I have attached an updated version along with tests
(v3-0001-Query-ID-Calculation-Fix-Variant-B.patch).

As you can see, v2-0001-Query-ID-Calculation-Fix-Variant-B.patch is more
complex and invasive than v3-0001-Query-ID-Calculation-Fix-Variant-B.patch.
I don't think that saving a few bytes in a 1024-byte sized buffer is worth
implementing, even for this simple optimization (v2).

> Can you share your thoughts on Ivan's NULL jumble idea?

I would appreciate any feedback. Thank you.


-Original Message-
From: David Rowley  
Sent: Tuesday, March 11, 2025 12:49 PM
To: Michael Paquier 
Cc: Быков Иван Александрович ; Sami Imseih 
; pgsql-hack...@postgresql.org
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 19:50, Michael Paquier  wrote:
>
> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> > It seems to me that if this fixes the issue, that the next similar 
> > one is already lurking in the shadows waiting to jump out on us.
>
> For how many years will we have to wait for this threat hidden in the
> shadows?  :)
>
> This issue exists since the query jumbling exists in pgss, so it goes 
> really down the road.  I've spent a bit more than a few minutes on 
> that.

I didn't mean to cause offence here. I just wanted to make it clear that I 
don't think fixing these types of issues by swapping the order of the fields is 
going to be a sustainable practice, and it would be good to come up with a 
solution that eliminates the chances of this class of bug from ever appearing 
again.  Even if we were to trawl the entire Query struct and everything that 
could ever be attached to it and we deem that today it's fine and no more such 
bugs exist, the person adding some new SQL feature in the future that adds new 
data to store in the Query probably isn't going to give query jumbling much 
thought, especially now that the code generation is automatic.

> > Couldn't we adjust all this code so that we pass a seed to
> > AppendJumble() that's the offsetof(Struct, field) that we're 
> > jumbling and incorporate that seed into the hash?  I don't think we 
> > could possibly change the offset in a minor version, so I don't 
> > think there's a risk that could cause jumble value instability. 
> > Possibly different CPU architectures would come up with different 
> > offsets through having different struct alignment requirements, but 
> > we don't make any guarantees about the jumble values matching across 
> > different CPU architectures anyway.
>
> Yeah, something like that has potential to get rid of such problems 
> forever, particularly thanks to the fact that we automate the 
> generation of this code now so it is mostly cost-free.  This has a 
> cost for the custom jumbling functions where one needs some 
> imagination, but with models being around while we keep the number of 
> custom functions low, that should be neither complicated nor costly, 
> at least in my experience.

I hadn't really processed this thread fully when I responded yesterday and my 
response was mostly aimed at the latest patch on the thread at the time, which 
I had looked at quickly and wanted to put a stop to changing field orders as a 
fix for this.  Since then, I see that Ivan has already submitted a patch that 
accounts for NULL nodes and adds a byte to the jumble buffer to account for 
NULLs. This seems quite clean and simple. However, Sami seems to have concerns 
about the overhead of doing this. Is that warranted at all? Potentially, there 
could be a decent number of NULL fields. It'll probably be much cheaper than 
the offsetof idea I came up with.

> When I was thinking about the addition 

Re: Question about duplicate JSONTYPE_JSON check

2025-03-15 Thread Maciek Sakrejda
Thanks for the quick fix. I was able to reproduce the assertion
failure and to confirm that it's resolved with the patch. Looks good
to me.




Re: Add pg_accept_connections_start_time() for better uptime calculation

2025-03-15 Thread Fujii Masao




On 2025/03/06 21:55, Robins Tharakan wrote:

Hi,

Thanks for taking a look at the patch, and for your feedback.

On Wed, 5 Mar 2025 at 03:22, Fujii Masao wrote:

On 2025/02/16 16:05, Robins Tharakan wrote:
 > This patch introduces a new function pg_accept_connections_start_time().

Shouldn't this function also handle the time when the postmaster
starts accepting read-only connections? With the patch, it doesn’t
seem to cover that case, and it looks like an unexpected timestamp
is returned when run on a standby server. Maybe the function should
return a record with two columns — one for when the postmaster
starts accepting read-only connections and another for normal
connections?



I am not sure if I understand the question. For a given (re)start, a
database user would either be looking for a read-only or a read-write
start time (based on whether the server is a standby or not). Are you
saying that for a given instance of a start, a database user would be
interested in two timestamps (one for when the database became
ready to accept read-only connections, and another for when it quickly
thereafter also began accepting read-writes)?  Even if possible, that feels
unnecessary - but I may be misunderstanding here.


With the v1 patch, running pg_accept_connections_start_time() on
a standby returned an unexpected timestamp:

=# select * from pg_accept_connections_start_time();
 pg_accept_connections_start_time
--
 2000-01-01 09:00:00+09

So my comment meant that this seems odd and should be fixed.
Since I've not fully understood how this function is used,
I'm not sure what timestamp should be returned in the standby.
But I just thought it seems intuitive to return the timestamp
when the standby started accepting read-only connections, in that case.



But you bring up a good point around standbys. Attached is v2 of
the patch that returns a more accurate time on a standby (ie. it
captures the time just after emitting a message that it's ready for
read-only connections).


Thanks for updating the patch!

With the v2 patch, when the standby is promoted to primary,
the result of pg_postmaster_open_time() appears to switch to
the time when the primary began accepting normal connections.
If this is intentional, it's better to clarify this behavior
in the documentation.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Statistics Import and Export

2025-03-15 Thread Corey Huinker
>
>
> https://www.postgresql.org/docs/current/ddl-priv.html
>
> The above text indicates that we should do the check, but also that
> it's not terribly important for actual security.
>

Ok, I'm convinced.



>
> > If we do, we'll want to downgrade the following errors to
> > warn+return false:
>
> Perhaps we should consider the schemaname/relname change as one patch,
> which maintains relation lookup failures as hard ERRORs, and a
> "downgrade errors to warnings" as a separate patch.
>

+1


Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-15 Thread Alvaro Herrera
On 2025-Mar-10, Rushabh Lathia wrote:

> I adjusted the set_attnotnull() API and removed the added
> queue_validation parameter.  Rather, the function start using wqueue
> input parameter as a check.
> If wqueue is NULL, skip the queue_validation.  Attaching patch here,
> but not sure how clear it is, in term of understanding the API.  Your
> thoughts/inputs?

Yeah, I think this makes sense, if it supports all the cases correctly.
(If in your code you have any calls of set_attnotnull that pass a NULL
wqueue during ALTER TABLE in order to avoid queueing a check, you had
better have comments to explain this.)

Kindly do not send patches standalone, because the CFbot doesn't
understand that it needs to apply it on top of the three previous
patches.  You need to send all 4 patches every time.  You can see the
result here:
https://commitfest.postgresql.org/patch/5554/
If you click on the "need rebase!" label you'll see that it tried to
apply patch 0004 to the top of the master branch, and that obviously
failed.  (FWIW if you click on the "summary" label you'll also see that
the patch has been failing the CI tests since Feb 27.)

> Looking further for AdjustNotNullInheritance() I don't see need of
> rename this API as it's just adding new error check now for an
> existing NOT NULL constraint.

OK.

> JFYI, I can reproduce the failure Ashutosh Bapat reported, while
> running the pg_upgrade test through meson commands.  I am
> investigating that further.

Ah good, thanks.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."  (E. Dijkstra)




Re: Random pg_upgrade 004_subscription test failure on drongo

2025-03-15 Thread Andrew Dunstan



On 2025-03-13 Th 5:04 AM, vignesh C wrote:

Hi,

In one of the recent buildfarm runs, the test
pg_upgrade/004_subscription test fails at [1].
It has failed with the following:
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of
"C:/prog/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_new_sub_data/pgdata/pg_upgrade_output.d/20250310T194018.517/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting

Unfortunately we don't have pg_upgrade_output.d contents in buildfarm
to see what is the exact reason.




That's not supposed to happen. I am testing a fix to see if I can make 
it collect the logs, but for now we'll have to wait till the next failure ..




cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Vacuum statistics

2025-03-15 Thread Kirill Reshke
On Thu, 27 Feb 2025 at 23:00, Alena Rybakina  wrote:
>
> Hi!
> On 17.02.2025 17:46, Alena Rybakina wrote:
> > On 04.02.2025 18:22, Alena Rybakina wrote:
> >> Hi! Thank you for your review!
> >>
> >> On 02.02.2025 23:43, Alexander Korotkov wrote:
> >>> On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
> >>>  wrote:
>  I noticed that the cfbot is bad, the reason seems to be related to
>  the lack of a parameter in
>  src/backend/utils/misc/postgresql.conf.sample. I added it, it
>  should help.
> >>> The patch doesn't apply cleanly.  Please rebase.
> >> I rebased them.
> > The patch needed a rebase again. There is nothing new since version
> > 18, only a rebase.
>
> The patch needed a new rebase.
>
> Sorry, but due to feeling unwell I picked up a patch from another thread
> and squashed the patch for vacuum statistics for indexes and heaps in
> the previous version. Now I fixed everything together with the rebase.
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional

Hi!
CI fails on this one[0]

Is it a flap or a real problem caused by v20?

```

 SELECT relpages AS irp
diff -U3 
/tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out
/tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out
--- 
/tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out
2025-03-10 09:36:10.274799176 +
+++ 
/tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out
2025-03-10 09:49:35.796596462 +
@@ -65,7 +65,7 @@
 WHERE vt.relname = 'vestat' AND vt.relid = c.oid;
  relname | vm_new_frozen_pages | tuples_deleted | relpages | pages_scanned | pages_removed
 ---------+---------------------+----------------+----------+---------------+---------------
- vestat  |                   0 |              0 |      455 |             0 |             0
+ vestat  |                   0 |              0 |      417 |             0 |             0
 (1 row)

 SELECT relpages AS rp
=== EOF ===


```

[0] 
https://api.cirrus-ci.com/v1/artifact/task/5336493629112320/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress

-- 
Best regards,
Kirill Reshke




Re: pg_attribute_noreturn(), MSVC, C11

2025-03-15 Thread Peter Eisentraut

On 10.03.25 14:58, Andres Freund wrote:

diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index ec8eddad863..d32c0d318fb 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -601,8 +601,8 @@ SlabAllocFromNewBlock(MemoryContext context, Size size, int 
flags)
   *want to avoid that.
   */
  pg_noinline
+pg_noreturn
  static void
-pg_attribute_noreturn()
  SlabAllocInvalidSize(MemoryContext context, Size size)
  {
SlabContext *slab = (SlabContext *) context;

Hm, is it good to put the pg_noreturn after the pg_noinline?


I probably just did them alphabetically.  I don't think there would be a 
problem, since pg_noinline is an attribute, and they can generally be 
put everywhere.  At least until we learn otherwise.






Re: Add XMLNamespaces to XMLElement

2025-03-15 Thread newtglobal postgresql_contributors
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi Pavel,

I have tested this patch, and it proves to be highly useful when handling 
XMLNAMESPACES() with both DEFAULT and NO DEFAULT options. The following test 
cases confirm its correctness:


SELECT xmlelement(
NAME "foo",
XMLNAMESPACES('http://x.y' AS xy, 'http://a.b' AS ab, DEFAULT 'http://d.e'),
xmlelement(NAME "foot", 
xmlelement(NAME "xy:shoe"), 
xmlelement(NAME "ab:lace")
)
);

SELECT xmlelement(
NAME "foo",
XMLNAMESPACES('http://x.y' AS xy, 'http://a.b' AS ab, NO DEFAULT),
xmlelement(NAME "foot", 
xmlelement(NAME "xy:shoe"), 
xmlelement(NAME "ab:lace")
)
);
Additionally, I verified that the patch correctly supports multiple namespaces 
when using both DEFAULT and NO DEFAULT, ensuring expected behavior across 
different use cases.

Great work on this improvement!

Re: vacuumdb changes for stats import/export

2025-03-15 Thread John Naylor
On Fri, Mar 7, 2025 at 4:47 AM Nathan Bossart  wrote:
>
> On Thu, Mar 06, 2025 at 06:30:59PM +0700, John Naylor wrote:

> > IIUC correctly, pg_statistic doesn't store stats on itself, so this
> > causes the query result to always contain pg_statistic -- does that
> > get removed elsewhere?
>
> Good catch.  I think the current behavior is to call ANALYZE on
> pg_statistic, too, but that should be mostly harmless (analyze_rel()
> refuses to process it).  I suppose we could try to avoid returning
> pg_statistic from the catalog query, but we don't bother doing that for any
> other vacuumdb modes, so I'm tempted to leave it alone.

Okay, thanks for confirming. I have no further comments.

-- 
John Naylor
Amazon Web Services




how to see the generated nodetags.h

2025-03-15 Thread Hao Zhang
Hello Hackers

It seems nodetags.h is generated by gen_node_support.pl.
Where and how can I see this generated header to get a full list of all
possible node tags?  Thanks


typedef enum NodeTag
{
	T_Invalid = 0,

#include "nodes/nodetags.h"
} NodeTag;


Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

2025-03-15 Thread Shay Rojansky
>
> If the behavior of RETURNING is meant to be identical to that of simply
>> applying a cast, is there any actual advantage in using JSON_VALUE with
>> RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
>> '$')::bytea instead of using RETURNING? I thought the point was precisely
>> for RETURNING to be able to perform JSON-specific conversions (e.g. take
>> into account that the base64 is being converted from a *JSON* string, and
>> therefore apply base64 decoding to it).
>>
>
> Not really…it does seem to just be syntactic sugar.  Not that we’d be
> likely to assume the contents of a JSON string are a base64 encoding as it
> is just, as you claim, a de-facto standard.  Unless we have some standard
> (namely the one defining json_value) telling us that the contents are
> indeed always base64 encoded data we’ll just assume it’s plain text and act
> accordingly - in this case passing it into bytea’s input function.
>

OK. For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type, base64
encoding is assumed). Other databases also have very specific documented
conversion rules for JSON_VALUE RETURNING (Oracle, DB2 (table 1)). I'm
basically trying to show that RETURNING definitely isn't a simple
cast-from-string in other databases, but is a distinct conversion mechanism
that takes into account the fact that the origin data comes from JSON.

JSON is of course very light on formal/official standards, but some very
strong de-facto standards have established themselves (e.g. ISO8601 for
timestamps), and even beyond JSON, base64 seems to be the de-facto standard
for encoding binary data as a string (which is what this is about). I'll
also point out again that if the user really is looking only to get a
string out and apply regular PG convert-from-string casting, they can do
just that (i.e. omit RETURNING and apply regular PG casting); to me that
points to RETURNING doing something beyond that, adding JSON-specific
usefulness.


Re: vacuumdb changes for stats import/export

2025-03-15 Thread John Naylor
On Wed, Mar 12, 2025 at 12:00 AM Nathan Bossart wrote:
>
> On Mon, Mar 10, 2025 at 10:08:49AM -0500, Nathan Bossart wrote:
> > On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote:
> >> I have no further comments.
> >
> > Thanks.  I'll give this a little more time for review before committing.
>
> I realized that we could limit the catalog query reuse to only when
> --missing-only is specified, so I've updated 0001 and 0002 accordingly.
> This avoids changing any existing behavior.
>
> > We'll still need to update the recommendation in pg_upgrade's
> > documentation.  I'm going to keep that separate because the stats
> > import/export work is still settling.
>
> 0003 is a first attempt at this.  Unless I am missing something, there's
> really not much to update.

The change here seems fine. My only quibble is that this sentence now
seems out of place: "Option --analyze-in-stages can be used to
generate minimal statistics quickly."  I'm thinking we should make a
clearer separation between the cases with and without the --no-statistics
option specified.


--
John Naylor
Amazon Web Services




Re: Parallel CREATE INDEX for GIN indexes

2025-03-15 Thread Matthias van de Meent
On Tue, 4 Mar 2025 at 20:50, Tomas Vondra  wrote:
>
> I pushed the two smaller parts today.
>
> Here's the remaining two parts, to keep cfbot happy. I don't expect to
> get these into PG18, though.

As promised on- and off-list, here's the 0001 patch, polished, split,
and further adapted for performance.

As seen before, it reduces tempspace requirements by up to 50%. I've
not tested this against HEAD for performance.

It has been split into:

0001: Some API cleanup/changes that crept into the patch. This
removes manual length-passing from the gin tuplesort APIs, instead
relying on GinTuple's tuplen field. It's not critical for anything,
and could be ignored if so desired.

0002: Tuplesort changes to allow TupleSort users to buffer and merge
tuples during the sort operations.
The patch was pulled directly from [0] (which was derived from earlier
work in this thread), is fairly easy to understand, and has no other
moving parts.

0003: Deduplication in tuplesort's flush-to-disk actions, utilizing
API introduced with 0002.
This improves temporary disk usage by deduplicating data even further,
for when there's a lot of duplicated data but the data has enough
distinct values to not fit in the available memory.

0004: Use a single tuplesort. This removes the worker-local tuplesort
in favor of only storing data in the global one.

This mainly reduces the code size and complexity of parallel GIN
builds; we already were using that global sort for various tasks.

Open questions and open items for this:
- I did not yet update the pg_stat_progress systems, nor docs.
- Maybe 0003 needs further splitting up, one for the optimizations in
GinBuffer, one for the tuplesort buffering.
- Maybe we need to trim the buffer in gin's tuplesort flush?
- Maybe we should grow the GinBuffer->items array superlinearly rather
than to the exact size requirement of the merge operation.

Apart from the complexities in 0003, I think the changes are fairly
straightforward.

I did not include the 0002 of the earlier patch, as it was WIP and its
feature explicitly conflicts with my 0004.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] 
https://www.postgresql.org/message-id/CAEze2WhRFzd=nvh9YevwiLjrS1j1fP85vjNCXAab=iybz2r...@mail.gmail.com


v20250307-0004-Make-Gin-parallel-builds-use-a-single-tupl.patch
Description: Binary data


v20250307-0002-Allow-tuplesort-implementations-to-buffer-.patch
Description: Binary data


v20250307-0001-Remove-size-argument-from-GIN-tuplesort-in.patch
Description: Binary data


v20250307-0003-Merge-GinTuples-during-tuplesort-before-fl.patch
Description: Binary data


Re: protocol support for labels

2025-03-15 Thread Kirill Reshke
On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider  wrote:

> observability frameworks like OpenTelemetry support tracing through all
> layers of a stack, and trace_ids can even be passed into sql with
> extensions like sqlcommenter. however sqlcommenter puts the trace_id
> into a comment which effectively breaks all the utility of
> pg_stat_statements since each query has a unique trace_id.
>
There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint at their needs. Useful when some kind of
proxy (odyssey, pgbouncer, spqr, pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]

However, I don't think the PostgreSQL community is open to such a big
change for relatively minor matters.

[0] g...@github.com:munakoiso/pg_comment_stats.git

-- 
Best regards,
Kirill Reshke




Re: track generic and custom plans in pg_stat_statements

2025-03-15 Thread Sami Imseih
> Thank you for your patch. It is really useful for tracking the history
> of generic and custom plan usage.

Thanks for the review!

> 1. Is there any reason for the double check of cplan != NULL? It seems
> unnecessary, and we could simplify it to:
>
> -if (cplan && cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)
> +if (cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)

No, it's not necessary and an oversight. removed.

> 2. Should we add Assert(kind == PGSS_EXEC) at this place  to ensure that
> generic_plan_calls and custom_plan_calls are only incremented when
> appropriate?
>

I don't think an assert is needed here. There is an assert at the start of
the block for PGSS_EXEC and PGSS_PLAN, but cplan is only available
in the executor.

v4 attached

--
Sami


v4-0001-add-plan_cache-counters-to-pg_stat_statements.patch
Description: Binary data


Re: Proposal: manipulating pg_control file from Perl

2025-03-15 Thread Christoph Berg
Re: Aleksander Alekseev
> 1) Provide a tool written in C that allows changing pg_control, e.g.
> `pg_writecontoldata` or maybe a flag like `pg_controldata -w`.

I thought we already had pg_resetwal for that.

Christoph




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-15 Thread Aleksander Alekseev
Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time
ago [1] but the overall feedback was somewhat mixed, so I chose to
focus on something else. Thanks for picking this up.

> test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> build_zstd_dict_for_attribute
> ---
> t
> (1 row)

Did you have a chance to familiarize yourself with the corresponding
discussion [1] and probably the previous threads? Particularly it was
pointed out that dictionaries should be built automatically during
VACUUM. We also discussed a special syntax for the feature, besides
other things.

[1]: 
https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: Update Unicode data to Unicode 16.0.0

2025-03-15 Thread Jeff Davis
On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> On the other hand, if we keep up with the Joneses by updating the
> Unicode data, we can hopefully put those behavioral changes into
> effect *before* they'd affect any real data.

That's a good point.

Regards,
Jeff Davis





Re: AIO v2.5

2025-03-15 Thread Tom Lane
Andres Freund  writes:
> While looking at the view I felt motivated to tackle the one FIXME in the
> implementation of the view. Namely that the "error_desc" column wasn't
> populated (the view did show that there was an error, but not what the error
> was).

> Which led me down a sad sad rabbit hole, largely independent of AIO.

> ...

> The main reason I wanted to write this up is that it seems that we're just
> lacking some infrastructure here.

Maybe.  The mention of elog.c in the same breath with critical
sections is already enough to scare me; we surely daren't invoke
gettext() in a critical section, for instance.  I feel the most
we could hope for here is to report a constant string that would
not get translated till later, outside the critical section.
That seems less about infrastructure and more about how the AIO
error handling/reporting code is laid out.  In the meantime,
if leaving the error out of this view is enough to make the problem
go away, let's do that.
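
For illustration, a minimal sketch of that kind of deferral (an assumed
pattern, not anything from the AIO patch; it presumes GNU gettext is
available): inside the critical section only a pointer to a constant,
untranslated string is recorded, and translation/formatting happen later.

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Error info that is safe to record in a critical section: no allocation,
 * no gettext(), just a constant string and an errno value. */
typedef struct DeferredError
{
    int         saved_errno;
    const char *msg;            /* constant, untranslated string */
} DeferredError;

static void
record_error_in_critical_section(DeferredError *slot, int err,
                                 const char *constant_msg)
{
    slot->saved_errno = err;
    slot->msg = constant_msg;
}

/* Later, outside the critical section, translating and formatting is safe. */
static void
report_deferred_error(const DeferredError *slot)
{
    fprintf(stderr, "%s: %s\n", gettext(slot->msg),
            strerror(slot->saved_errno));
}
```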

regards, tom lane




Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));

2025-03-15 Thread Tomas Vondra
Hi,

while stress-testing physical replication to test another patch, I've
repeatedly hit this assert in GetSnapshotDataReuse:

  Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));

AFAICS the backtrace always looks exactly the same - a couple frames
from the top:

#4  0x00b5af36 in ExceptionalCondition (conditionName=0xe2a470
"TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)",
fileName=0xe29768 "procarray.c",
lineNumber=2132) at assert.c:66
#5  0x0094c2c3 in GetSnapshotDataReuse (snapshot=0x1079380
) at procarray.c:2132
#6  0x0094c4b0 in GetSnapshotData (snapshot=0x1079380
) at procarray.c:2233
#7  0x00bb31f8 in GetNonHistoricCatalogSnapshot (relid=2615) at
snapmgr.c:412
#8  0x00bb31a2 in GetCatalogSnapshot (relid=2615) at snapmgr.c:385
#9  0x00493073 in systable_beginscan
(heapRelation=0x76cee3dc01d8, indexId=2684, indexOK=true, snapshot=0x0,
nkeys=1, key=0x7ffcaddd2f30) at genam.c:414
#10 0x00b383c0 in SearchCatCacheMiss (cache=0xa99a880, nkeys=1,
hashValue=2798280003, hashIndex=3, v1=130630962093252, v2=0, v3=0, v4=0)
at catcache.c:1533
#11 0x00b38290 in SearchCatCacheInternal (cache=0xa99a880,
nkeys=1, v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1464
#12 0x00b37f69 in SearchCatCache (cache=0xa99a880,
v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1318
#13 0x00b53df5 in SearchSysCache (cacheId=37,
key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:217
#14 0x00b54318 in GetSysCacheOid (cacheId=37, oidcol=1,
key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:459

At first I assumed it's something my patch broke, but then I tried the
stress test on master, and I hit that assert again, so it's a
pre-existing issue.

The stress test is fairly simple - it creates a primary/standby cluster,
and then does pgbench on both nodes (read-write on primary, read-only on
standby), and also randomly restarts both nodes (in fast or immediate
modes). The script I use is attached.

It takes a while to hit it - on my laptop it takes an hour or so, but I
guess it's more about the random sleeps in the script.

I've only ever seen this on the standby, never on the primary.


regards

-- 
Tomas Vondra
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO 
(ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x76cee428b183 in __pthread_kill_internal (threadid=, 
signo=6) at pthread_kill.c:78
#2  0x76cee4231f9e in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#3  0x76cee4219942 in __GI_abort () at abort.c:79
#4  0x00b5af36 in ExceptionalCondition (conditionName=0xe2a470 
"TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)", fileName=0xe29768 
"procarray.c", 
lineNumber=2132) at assert.c:66
#5  0x0094c2c3 in GetSnapshotDataReuse (snapshot=0x1079380 
) at procarray.c:2132
#6  0x0094c4b0 in GetSnapshotData (snapshot=0x1079380 
) at procarray.c:2233
#7  0x00bb31f8 in GetNonHistoricCatalogSnapshot (relid=2615) at 
snapmgr.c:412
#8  0x00bb31a2 in GetCatalogSnapshot (relid=2615) at snapmgr.c:385
#9  0x00493073 in systable_beginscan (heapRelation=0x76cee3dc01d8, 
indexId=2684, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffcaddd2f30) at 
genam.c:414
#10 0x00b383c0 in SearchCatCacheMiss (cache=0xa99a880, nkeys=1, 
hashValue=2798280003, hashIndex=3, v1=130630962093252, v2=0, v3=0, v4=0) at 
catcache.c:1533
#11 0x00b38290 in SearchCatCacheInternal (cache=0xa99a880, nkeys=1, 
v1=130630962093252, v2=0, v3=0, v4=0) at catcache.c:1464
#12 0x00b37f69 in SearchCatCache (cache=0xa99a880, v1=130630962093252, 
v2=0, v3=0, v4=0) at catcache.c:1318
#13 0x00b53df5 in SearchSysCache (cacheId=37, key1=130630962093252, 
key2=0, key3=0, key4=0) at syscache.c:217
#14 0x00b54318 in GetSysCacheOid (cacheId=37, oidcol=1, 
key1=130630962093252, key2=0, key3=0, key4=0) at syscache.c:459
#15 0x0054ba84 in get_namespace_oid (nspname=0x76cee4179cc4 "user", 
missing_ok=true) at namespace.c:3539
#16 0x0054ca87 in preprocessNamespacePath (searchPath=0xa9486b0 
"\"$user\", public", roleid=10, temp_missing=0xa958ebc) at namespace.c:4150
#17 0x0054cdb0 in cachedNamespacePath (searchPath=0xa9486b0 "\"$user\", 
public", roleid=10) at namespace.c:4261
#18 0x0054ce9f in recomputeNamespacePath () at namespace.c:4309
#19 0x00547d24 in RelnameGetRelid (relname=0xa92d708 
"pgbench_accounts") at namespace.c:890
#20 0x005474cc in RangeVarGetRelidExtended (relation=0xa92d738, 
lockmode=1, flags=1, callback=0x0, callback_arg=0x0) at namespace.c:539
#21 0x0041d7d7 in relation_openrv_extended (

Re: Update Unicode data to Unicode 16.0.0

2025-03-15 Thread Jeff Davis
On Fri, 2025-03-14 at 23:54 -0700, Jeremy Schneider wrote:
> From my perspective, the whole point of the builtin collation was to
> have one option that avoids these problems that come with updating
> both ICU and glibc.
> 
> So I guess the main point of the builtin provider is just that it's
> faster than ICU?

It doesn't break primary keys.

Also, it's stable within a major version, we can document and test its
behavior, it solves 99% of the upgrade problem, and what problems
remain are much more manageable.

And yes, collation is way, way faster than ICU.

Regards,
Jeff Davis





Re: PGSERVICEFILE as part of a normal connection string

2025-03-15 Thread Ryo Kanbayashi
On Thu, Mar 13, 2025 at 9:42 AM Michael Paquier  wrote:
>
> On Thu, Mar 13, 2025 at 08:53:49AM +0900, Ryo Kanbayashi wrote:
> > If you can't work on the patch for a while because you are busy or
> > for some other reason,
> > I can become an additional reviewer and apply review comments from Michael
> > to the patch instead of you.
> >
> > If you don't want me to do that, please reply and let me know. If
> > possible, within a week :)
>
> Putting a bit of context here.  Most of the Postgres hackers based in
> Japan had a meeting last Friday, and Kanbayashi-san has asked me about
> patches touching simpler code paths in the tree that could be
> worked on for this release.  I've mentioned this thread to him.
>
> > Just to let you know, my action is not intended to steal your
> > contribution but to prevent your good idea from being lost.
>
> Authors and reviewers get busy because of life and work matters, and
> contributions are listed in the commit logs for everybody who
> participates.  If you can help move this patch forward, thanks a lot
> for the help!  IMO, that would be great.  The patch set still needs
> more reorganization and adjustments, but I think that we can get it
> there.

On Thu, Mar 13, 2025 at 3:07 PM Laurenz Albe  wrote:
>
> On Thu, 2025-03-13 at 08:53 +0900, Ryo Kanbayashi wrote:
> > Just to let you know, my action is not intended to steal your
> > contribution but to prevent your good idea from being lost.
> >
> > TO: Michael and other hackers,
> >
> > Is there any problem in light of community customs?
>
> Anything submitted to the mailing list is no longer private
> intellectual property.  You are free and welcome to start working
> on any patch that you are interested in and that seems neglected
> by the author.  There is no problem with listing more than one
> author.

Michael and Laurenz,

Thank you for the context description and for your comments :)

I will start coding to complete the patch :)

---
Great regards,
Ryo Kanbayashi




Re: Test to dump and restore objects left behind by regression

2025-03-15 Thread Alvaro Herrera
Hello

When running these tests, I encounter this strange diff in the dumps,
which seems to be because the locale for type money does not match.  I
imagine the problem is that the locale is not set correctly when
initdb'ing one of them?  Grepping the regress_log for initdb, I see
this:

$ grep -B1 'Running: initdb' tmp_check/log/regress_log_002_pg_upgrade 
[13:00:57.580](0.003s) # initializing database system by running initdb
# Running: initdb -D 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata
 -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate 
C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
--
[13:01:12.879](0.044s) # initializing database system by running initdb
# Running: initdb -D 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_dst_node_data/pgdata
 -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 --lc-collate 
C --lc-ctype C --locale-provider builtin --builtin-locale C.UTF-8 -k
--
[13:01:28.000](0.033s) # initializing database system by running initdb
# Running: initdb -D 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata
 -A trust -N --wal-segsize 1 --allow-group-access --encoding SQL_ASCII 
--locale-provider libc



[12:50:31.838](0.102s) not ok 15 - dump outputs from original and restored 
regression database (using plain format) match
[12:50:31.839](0.000s) 
[12:50:31.839](0.000s) #   Failed test 'dump outputs from original and restored 
regression database (using plain format) match'
#   at /pgsql/source/master/src/test/perl/PostgreSQL/Test/Utils.pm line 797.
[12:50:31.839](0.000s) #  got: '1'
# expected: '0'
=== diff of 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted
 and 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted
=== stdout ===
--- 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted
 2025-03-12 12:50:27.674918597 +0100
+++ 
/home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted
  2025-03-12 12:50:31.778840338 +0100
@@ -208972,7 +208972,7 @@
 -- Data for Name: money_data; Type: TABLE DATA; Schema: public; Owner: alvherre
 --
 COPY public.money_data (m) FROM stdin;
-$123.46
+$ 12.346,00
 \.
 --
 -- Data for Name: mvtest_t; Type: TABLE DATA; Schema: public; Owner: alvherre
@@ -376231,7 +376231,7 @@
 -- Data for Name: tab_core_types; Type: TABLE DATA; Schema: public; Owner: 
alvherre
 --
 COPY public.tab_core_types (point, line, lseg, box, openedpath, closedpath, 
polygon, circle, date, "time", "timestamp", timetz, timestamptz, "interval", 
"json", jsonb, jsonpath, inet, cidr, macaddr8, macaddr, int2, int4, int8, 
float4, float8, pi, "char", bpchar, "varchar", name, text, bool, bytea, "bit", 
varbit, money, refcursor, int2vector, oidvector, aclitem, tsvector, tsquery, 
uuid, xid8, regclass, type, regrole, oid, tid, xid, cid, txid_snapshot, 
pg_snapshot, pg_lsn, cardinal_number, character_data, sql_identifier, 
time_stamp, yes_or_no, int4range, int4multirange, int8range, int8multirange, 
numrange, nummultirange, daterange, datemultirange, tsrange, tsmultirange, 
tstzrange, tstzmultirange) FROM stdin;
-(11,12){1,-1,0}[(11,11),(12,12)]   (13,13),(11,11) 
((11,12),(13,13),(14,14))   [(11,12),(13,13),(14,14)]   
((11,12),(13,13),(14,14))   <(1,1),1>   2025-03-12  04:50:14.125899 
2025-03-12 04:50:14.125899  04:50:14.125899-07  2025-03-12 
12:50:14.125899+01   00:00:12{"reason":"because"}{"when": "now"} 
$."a"[*]?(@ > 2)127.0.0.1   127.0.0.0/8 00:01:03:ff:fe:86:1c:ba 
00:01:03:86:1c:ba   2   4   8   4   8   
3.14159265358979f   c   abc nametxt t   
\\xdeadbeef 1   10001   $12.34  abc 1 2 1 2 
alvherre=UC/alvherre'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'  
'fat' & 'rat'   a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1111  pg_class
regtype pg_monitor  1259(1,1)   2   3   10:20:10,14,15  
10:20:10,14,15  16/B374D848 1   l   n   2025-03-12 
12:50:14.13+01   YES empty   {}  empty   {}  (3,4)   {(3,4)} 
[2020-01-03,2021-02-03) {[2020-01-03,2021-02-03)}   ("2020-01-02 
03:04:05","2021-02-03 06:07:08")   {("2020-01-02 03:04:05","2021-02-03 
06:07:08")} ("2020-01-02 12:04:05+01","2021-02-03 15:07:08+01") 
{("2020-01-02 12:04:05+01","2021-02-03 15:07:08+01")}
+(11,12){1,-1,0}[(11,11),(12,12)]   (13,13),(11,11) 
((11,12),(13,13),(14,14))   [(11,12),(13,13),(14,14)]   
((11,12),(13,13),(14,14))   <(1,1),1>   2025-03-12  04:50:14.125899 
2025-03-12 04:50:14.125899  04:50:14.125899-07

Re: Changing the state of data checksums in a running cluster

2025-03-15 Thread Tomas Vondra
On 3/15/25 17:26, Andres Freund wrote:
> Jo.
> 
> On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote:
>> Thanks, here's an updated patch version
> 
> FWIW, this fails in CI;
> 
> https://cirrus-ci.com/build/4678473324691456
> On all OSs:
> [16:08:36.331] #   Failed test 'options --locale-provider=icu --locale=und 
> --lc-*=C: no stderr'
> [16:08:36.331] #   at /tmp/cirrus-ci-build/src/bin/initdb/t/001_initdb.pl 
> line 132.
> [16:08:36.331] #  got: '2025-03-15 16:08:26.216 UTC [63153] LOG:  
> XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0
> [16:08:36.331] # 2025-03-15 16:08:26.216 UTC [63153] LOG:  
> XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0 
> (UPDATED)
> 
> Windows & Compiler warnings:
> [16:05:08.723] ../src/backend/storage/page/bufpage.c(25): fatal error C1083: 
> Cannot open include file: 'execinfo.h': No such file or directory
> 
> [16:18:52.385] bufpage.c:25:10: fatal error: execinfo.h: No such file or 
> directory
> [16:18:52.385]25 | #include 
> [16:18:52.385]   |  ^~~~
> 
> Greetings,

Yeah, that's just the "debug stuff" - I don't expect any of that to be
included in the commit, I only posted it for convenience. It adds a lot
of debug logging, which I hope might help others to understand what the
problem with checksums on standby is.

regards

-- 
Tomas Vondra





Re: Commitfest app release on Feb 17 with many improvements

2025-03-15 Thread Jelte Fennema-Nio
On Thu, 6 Mar 2025 at 16:36, vignesh C  wrote:
> But it applies neatly for me and Jim also at [3]. Any idea why patch
> apply fails with CFbot whereas it passes in our environment?

Okay, the cause of this seems to be that the CFbot currently uses "git
apply --3way", not "git am --3way". For some reason "git apply" fails
to apply the patch while "git am" succeeds. I'll check this weekend if
I can change the logic to use "git am" instead.




Re: Statistics Import and Export

2025-03-15 Thread Andres Freund
On 2025-03-06 13:47:51 -0500, Corey Huinker wrote:
> I'm at the same conclusion. This would mean keeping the one
> getAttributeStats query per relation,

Why does it have to mean that? It surely would be easier with separate
queries, but I don't think there's anything inherently blocking us from doing
something in a more batch-y fashion.




Re: Available disk space per tablespace

2025-03-15 Thread Christoph Berg
Re: Thomas Munro
> > Hmm. Is looping on EINTR worth the trouble?
> 
> I was just wondering if it might be one of those oddballs that ignores
> SA_RESTART, but I guess that doesn't seem too likely (I mean, first
> you'd probably have to have a reason to sleep or some other special
> reason, and who knows what some unusual file systems might do).  It
> certainly doesn't on the systems I tried.  So I guess not until we
> have other evidence.

Gnulib's get_fs_usage() (which is what GNU coreutil's df uses)
does not handle EINTR either.

There is some code that does int width expansion, but I believe we
don't need that since the `fst.f_bavail * fst.f_frsize` multiplication
takes care of converting the result to int64 (if it wasn't already
64 bits before).
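
For illustration, a minimal standalone sketch of the computation in question
(POSIX statvfs field names; the explicit cast only marks where a widening
would go if one wanted to be defensive on platforms where both fields are
32 bits):

    #include <stdint.h>
    #include <sys/statvfs.h>

    /* available bytes for unprivileged users at "path", or -1 on error */
    static int64_t
    free_bytes_at(const char *path)
    {
        struct statvfs fst;

        if (statvfs(path, &fst) < 0)
            return -1;

        /* f_bavail is in units of f_frsize; widen before multiplying */
        return (int64_t) fst.f_bavail * fst.f_frsize;
    }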

Christoph




Re: Patch: Cover POSITION(bytea,bytea) with tests

2025-03-15 Thread Peter Smith
+1 for this.

Some minor comments:

1. I thought the new test cases should be located earlier in the file
where there were already a couple of POSITION tests.
(see "-- E021-11 position expression")

2. Maybe one of your test cases can be identical to the example from
the docs [1].
position('\x5678'::bytea in '\x1234567890'::bytea) → 3
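
For instance, the added cases might look something like this (sketch only;
the expected results follow from the documented semantics):

    -- E021-11 position expression, bytea variant
    SELECT position('\x5678'::bytea in '\x1234567890'::bytea);  -- 3
    SELECT position('\x12'::bytea   in '\x1234567890'::bytea);  -- 1
    SELECT position('\x9091'::bytea in '\x1234567890'::bytea);  -- 0, not found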

==
[1] https://www.postgresql.org/docs/17/functions-binarystring.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: New subscriber - looking for an overview of the code

2025-03-15 Thread Roberto Mello
On Sat, Mar 15, 2025 at 2:27 PM Lateef Sagar  wrote:

> Hello devs,
> I am a coding enthusiast. I have downloaded code in a Ubuntu VMWare
> machine and I am able to compile the code.
>
> I am looking for a course / site that can give me an overview of the code
> like:
> 1. How is the code organized and which file/folder has these components?
>  * Core RDBMS engine components,
>  * additional DB components,
>  * DB Rule based security components
>  * DB Storage components
> * Index storage
>

There's probably something like that, offered by someone out on the
internet.

I'd suggest you start with the official information, as described in the
documentation:

https://www.postgresql.org/developer/
https://www.postgresql.org/developer/coding/
https://wiki.postgresql.org/wiki/Development_information
https://github.com/postgres/postgres/tree/master/src

Several prominent Postgres hackers provide guidance on a discord server:
https://discord.gg/9meuMTdt

Roberto


Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Alexander Borisov

15.03.2025 23:07, Jeff Davis wrote:

On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote:

I tried adding a loop to create tables, and everything looks fine
(v7).


[...]


I prefer to generalize when we have the other code in place. As it was,
it was a bit confusing why the extra arguments and subroutines were
there.


Thanks Jeff for the review of my patch!


Also, make_ranges() doesn't seem to do what is described in the
comment: it produces a single range. I changed "> $limit" to ">=
$limit", but then it generates 3 ranges, not two. I rewrote it to be
more clear what's going on:


[...]

I had some comments, but since the patch is already committed,
it's fine.


When looking around, I didn't find a lot of material discussing this
generated-branches approach. While it's mentioned a few places, I
couldn't even find a consistent name for it. If you know of something
where I can read some more analysis, please send a link.


Unfortunately, I can't send any links. I came up with this myself, for
my project https://github.com/lexbor/lexbor. I saw that Postgres could
improve its approach to Normalization Forms, and then I noticed that
case mapping was not in great shape either. That's why I took it up.
In other words, this approach is purely my own idea, and it is working out well.
I really like Postgres and want to help you improve it, and I'm very
good at Unicode.

I feel like my knowledge will be useful for Postgres.

Thank you Jeff again for your time.


Regards,
Alexander Borisov





New subscriber - looking for an overview of the code

2025-03-15 Thread Lateef Sagar
Hello devs,

I am a coding enthusiast. I have downloaded the code in an Ubuntu VMWare
machine and I am able to compile it.

I am looking for a course / site that can give me an overview of the code, like:
1. How is the code organized and which file/folder has these components?
   * Core RDBMS engine components
   * additional DB components
   * DB rule based security components
   * DB storage components
   * Index storage
2. Is there a code sample that can demonstrate DB engine changes (like a sample
to change SELECT to SEL or SLCT, or adding a new mathematical operator)?

Thanks
Lateef

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Jeff Davis
On Sat, Mar 15, 2025 at 1:11 PM Tom Lane  wrote:

> Jeff Davis  writes:
> > Committed. Thank you!
>
> crake doesn't like your perl style:
>
> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not
> lexical at line 638, column 2.  See page 108 of PBP.


I suppose pgperltidy didn't catch that. I will fix it shortly.

Regards,
  Jeff Davis


Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Tom Lane
Jeff Davis  writes:
> On Sat, Mar 15, 2025 at 1:11 PM Tom Lane  wrote:
>> crake doesn't like your perl style:
>> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not
>> lexical at line 638, column 2.  See page 108 of PBP.

> I suppose pgperltidy didn't catch that. I will fix it shortly.

crake is running perlcritic, which is a whole different animal.

regards, tom lane




Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication

2025-03-15 Thread Dean
Hi hackers,

I'd like to propose an enhancement to PostgreSQL's logical replication
system: Deferred Replica Filtering (DRF). The goal of this feature is to
provide more granular control over which rows are replicated by applying
publication filters *after* the WAL decoding process, before sending data
to subscribers.

Currently, PostgreSQL's logical replication filters apply
deterministically. Deferred filtering, however, operates after the WAL has
been decoded, giving it access to the complete row data and making
filtering decisions based on mutable values. Additionally, record columns
may be omitted by the filter.

This opens up several possibilities for granular control. Consider the
following examples:
Alice and Bob subscribe to changes on a table with RLS enabled, allowing
CRUD operations based on user's IDs.
1. Alice needs to know the timestamp at which Bob updated the table. With
DRF, we can omit all columns except for the timestamp.
2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all
columns on any deleted row, potentially exposing complete records he
shouldn't be authorized to view. DRF can filter these rows out.

Deferred replica filtering allows for session-specific, per-row, and
per-column filtering - features currently not supported by existing
replication filters, enhancing security and data privacy.
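
For contrast, here is a sketch of what is already expressible today with
static publication filters (PostgreSQL 15+ row filters and column lists;
table and column names are invented), which are evaluated against the row
contents alone and cannot take the subscriber's identity or other session
state into account:

    CREATE PUBLICATION audit_pub
        FOR TABLE orders (order_id, updated_at)  -- column list
        WHERE (order_id > 1000)                  -- row filter; for UPDATE/DELETE it
                                                 -- may only use replica identity columns
        WITH (publish = 'insert, update, delete');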

I look forward to hearing your thoughts!

Best,
Dean S


Re: PG_CFLAGS rpath Passthrough Issue

2025-03-15 Thread David E. Wheeler
On Mar 15, 2025, at 07:58, Álvaro Herrera  wrote:

> It's make.  From its info manual:

Oh, that explains it. It hadn’t occurred to me that make could eval strings 
passed to it like that, but of course it does.
> 
> I'm surprised that you don't need \$\$ though.  I wonder if
>  make CFLAGS='-Wl,-rpath,$$ORIGIN'
> isn't enough actually?

Oddly, no:

# make CFLAGS='-Wl,-rpath,$$ORIGIN'
# chrpath src/semver.so 
src/semver.so: RUNPATH=

Gotta have the backslash.
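
For the archives, the invocation implied to work here would presumably be
(output reconstructed, not copied from an actual run):

    # make CFLAGS='-Wl,-rpath,\$\$ORIGIN'
    # chrpath src/semver.so
    src/semver.so: RUNPATH=$ORIGIN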

Best,

David





Re: Statistics Import and Export

2025-03-15 Thread Corey Huinker
On Thu, Mar 6, 2025 at 3:48 AM Jeff Davis  wrote:

> On Wed, 2025-03-05 at 23:04 -0500, Corey Huinker wrote:
> >
> > Anyway, here's a rebased set of the existing up-for-consideration
> > patches, plus the optimization of avoiding querying on non-expression
> > indexes.
>
> Comments on 0003:
>
> * All the argument names for pg_restore_attribute_stats match pg_stats,
> except relname vs tablename. There doesn't appear to be a great answer
> here, because "relname" is the natural name to use for
> pg_restore_relation_stats(), so either the two restore functions will
> be inconsistent, or the argument name of one of them will be
> inconsistent with its respective catalog. I assume that's the
> reasoning?
>

Correct, either we use 'tablename' to describe indexes as well, or we
diverge from the system view's naming.


> * Now that it's doing a namespace lookup, we should also check for the
> USAGE privilege on the namespace, right?
>

Unless some check was already being done by the 'foo.bar'::regclass cast,
I agree that we should add one.


> Based on the other changes we've made to this feature, I think 0003
> makes sense, so I'm inclined to move ahead with it, but I'm open to
> opinions.
>

If we do, we'll want to downgrade the following errors to a warning plus
return false:

* stats_check_required_arg()
* stats_lock_check_privileges()
* RecoveryInProgress
* specified both attnum and argnum
* attname/attnum does not exist, or is system column


> 0004 looks straightforward, though perhaps we should move some of the
> code into a static function rather than indenting so many lines.
>

I agree, but the conversation in the thread had already shifted to doing a
single call to pg_stats, so this was just a demonstration.


> Did you collect performance results for 0004?


No, as I wasn't sure that I could replicate Andres' setup, and the
conversation was quickly moving to the aforementioned single-query idea.


Re: dsm_registry: Add detach and destroy features

2025-03-15 Thread Nathan Bossart
On Fri, Mar 14, 2025 at 10:46:58AM +0900, Sungwoo Chang wrote:
> I also made some fix in GetNamedDSMSegment function where it throws an
> exception if the found entry's size field does not match the user input. It
> looks like dshash_release_lock needs to be called and MemoryContext needs
> to be switched back to the old one.

My expectation is that the error handling takes care of these things.  Are
there cases where it does not?  In any case, I would expect these errors to
be extremely rare to the point of virtually never happening in practice.

> It's my first time submitting a patch so please let me know if I missed on
> something.

I'm delighted that folks want to expand on this feature, and I'm very
interested in helping.  Other things folks have asked for are a view into
the DSM registry [0] and a simple way to allocate a DSA or dshash table
with the DSM registry, all of which seem like good improvements to me.

That being said, we are at the very end of the v18 development cycle, and
most folks are probably busy trying to finish their ongoing projects.  So
it may be a while before anyone can give this a meaningful review.  Please
be sure to add a commitfest entry [1] so it doesn't get lost.

[0] https://postgr.es/m/4D445D3E-81C5-4135-95BB-D414204A0AB4%40gmail.com
[1] https://commitfest.postgresql.org/

-- 
nathan




Re: Adding support for SSLKEYLOGFILE in the frontend

2025-03-15 Thread Peter Eisentraut

On 13.03.25 19:31, Tom Lane wrote:

Jacob Champion  writes:

Adding the PG prefix to the envvar name addresses my collision
concern, but I think Tom's comment upthread [1] was saying that we
should not provide any envvar at all:



I think it might be safer if we only accepted it as a connection
parameter and not via an environment variable.



Is the addition of the PG prefix enough to address that concern too?


Indeed, I was advocating for *no* environment variable.  The PG prefix
does not comfort me.


It seems to me that the environment variable would be the most useful 
way to use this feature, for example if you want to debug an existing 
program that you can't or don't want to change.


As was mentioned earlier, libcurl uses an environment variable for this. 
 Moreover, the format originated in the NSS library, which also uses an 
environment variable.
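
For example, this is the workflow the variable enables with tools that
already honor it (the curl invocation is just an illustration):

    SSLKEYLOGFILE=/tmp/tlskeys.log curl -s https://example.org/ > /dev/null
    # then point Wireshark at /tmp/tlskeys.log:
    #   Preferences -> Protocols -> TLS -> (Pre)-Master-Secret log filename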


So here we would be constructing a higher level of security than others
seem to have found necessary.


It's also possible that we should consider the SSLKEYLOGFILE environment 
variable some kind of quasi-standard like PAGER, and we should be using 
exactly that environment variable name like everyone else.






Re: Available disk space per tablespace

2025-03-15 Thread Thomas Munro
On Sun, Mar 16, 2025 at 1:17 AM Laurenz Albe  wrote:
> On Sat, 2025-03-15 at 13:09 +0100, Christoph Berg wrote:
> > Do we care about any of these?
> >
> > AIX
>
> We dropped support for it, but there are efforts to change that.

FWIW AIX does have it, according to its manual, in case it comes back.
The others in the list are defunct or obsolete versions.




Re: Proposal: Adding compression of temporary files

2025-03-15 Thread Alexander Korotkov
On Sun, Jan 5, 2025 at 1:43 AM Filip Janus  wrote:
>
> I apologize for multiple messages, but I found a small bug in the previous 
> version.
>
> -Filip-

Great, thank you for your work.

I think the patches could use a pgindent run.

I don't see a reason why the temp file compression method should be
different from the wal compression methods, which we already have
in-tree.  Perhaps it would be nice to have a 0001 patch, which would
abstract the compression methods we now have for wal into a separate
file containing GUC option values and functions for
compress/decompress. Then, 0002 would apply this to temporary file
compression.
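
To make the suggestion concrete, a rough sketch of what such a shared
compression-method module might expose (all names are invented, just to
illustrate the shape, not an actual API proposal):

    /* hypothetical common/compression_method.h */
    typedef enum CompressionMethod
    {
        COMPRESSION_NONE,
        COMPRESSION_PGLZ,
        COMPRESSION_LZ4,
        COMPRESSION_ZSTD,
    } CompressionMethod;

    /* GUC enum options shared by wal_compression and temp file compression */
    extern const struct config_enum_entry compression_options[];

    /* compress src into dst; returns compressed size, or -1 if not compressible */
    extern int32 compress_buffer(CompressionMethod method,
                                 const char *src, int32 src_len,
                                 char *dst, int32 dst_len);
    extern int32 decompress_buffer(CompressionMethod method,
                                   const char *src, int32 src_len,
                                   char *dst, int32 dst_len);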

--
Regards,
Alexander Korotkov
Supabase




Re: Add an option to skip loading missing publication to avoid logical replication failure

2025-03-15 Thread Amit Kapila
On Thu, Mar 13, 2025 at 7:26 PM Dilip Kumar  wrote:
>
> On Thu, Mar 13, 2025 at 3:20 PM Amit Kapila  wrote:
> >
> >
> > I think that is too much related to pub-sub model, and ideally,
> > pgoutput should not care about it. I have written a comment
> > considering somebody using pgoutput decoding module via APIs.
>

Thanks, Dilip and Sawada-San, for the inputs, and Vignesh for the
patch. I have pushed the change.

-- 
With Regards,
Amit Kapila.




Re: Disabling vacuum truncate for autovacuum

2025-03-15 Thread Fujii Masao




On 2025/03/15 0:42, Nathan Bossart wrote:

I should also mention that we just have a few weeks left in the v18
development cycle.  The code itself seems pretty straightforward, so if we
can agree on behavior and nomenclature, I'll do my darndest to get this
responsibly committed in time.


+1

Here are two minor review comments from me.

+  

This xreflabel should be "vacuum_truncate", not "autovacuum".


-  lock on the table. The TRUNCATE parameter
-  of VACUUM, if 
specified, overrides the value
-  of this option.
+  Per-table value for  parameter.

It was explicitly documented that the TRUNCATE option in the VACUUM
command overrides the vacuum_truncate reloption, but this information
has been removed in the patch. Shouldn't we keep it to clarify
the priority of these settings?
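
For reference, the interaction being documented looks like this in practice
(sketch; the table name is invented):

    ALTER TABLE t SET (vacuum_truncate = off);  -- reloption, also used by autovacuum
    VACUUM (TRUNCATE true) t;                   -- explicit option overrides the reloption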

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Update Unicode data to Unicode 16.0.0

2025-03-15 Thread Joe Conway

On 3/15/25 03:26, Laurenz Albe wrote:

On Fri, 2025-03-14 at 23:54 -0700, Jeremy Schneider wrote:

On Fri, 07 Mar 2025 13:11:18 -0800
It seems the consensus is to update unicode in core... FWIW, I'm still
in favor of leaving it alone because ICU is there for when I need
up-to-date unicode versions.


Me too.


+1


From my perspective, the whole point of the builtin collation was to
have one option that avoids the problems that come with updating both
ICU and glibc.


+1


+1

In the long term I think we should figure out how to support newer
versions of Unicode for the builtin collation, but in my mind that might
require supporting multiple Unicode versions, so that the option to stay
on the older one remains.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-15 Thread David G. Johnston
On Friday, March 14, 2025, Amit Kapila  wrote:

> On Sat, Mar 15, 2025 at 11:35 AM Peter Smith 
> wrote:
> >
> > On Sat, Mar 15, 2025 at 4:52 PM Peter Smith 
> wrote:
> > >
> > > On Fri, Mar 14, 2025 at 5:39 PM David G. Johnston
> > >  wrote:
> > > >
> > > > On Tuesday, March 11, 2025, Peter Smith 
> wrote:
> > > >>
> > > >>
> > > >> Choice 3. Implement some option that has an argument saying what to
> delete
> > > >>
> > > >> Implement an option that takes some argument saying what objects to
> remove.
> > > >>
> > > >> Here, the current patch would be something like:
> > > >> "--remove-target-objects=publications". In future, this option
> could
> > > >> be enhanced to accept other values like
> > > >> "--remove-target-objects=publications,subscriptions" etc.
> > > >
> > > >
> > > > I’m changing my vote to option 3.  With a requirement that dropping
> is done post-recovery by the user via psql - we just provide the commands
> we would have executed in a script.
> > > >
> > > > —prune-file=‘drop-commands.psql’
> > > > —prune=publications
> > > > —prune=subscriptions
> > > > Etc…
> > > >
> > > > I’d rather do multiple specifications than commas separated.  It
> matches what we already do with databases, which was done this way I
> suspect for the same reasons - length of the parameters value when we have:
> > > > Publications
> > > > Slots
> > > > Subscriptions
> > > > Databases
> > > > Roles
> > > > Tablespaces
> > > >
> > >
> >
> > (I'm sorry, my previous post included a confusing typo. Here is the
> correction)
> >
> > OK. Here are some examples...
> >
> > style #1:
> > --prune=Publications --prune=Slots --prune=Subscriptions
> > --prune=Databases --prune=Tablespaces
> >
> > versus
> >
> > style #2:
> > --prune=Publications,Slots,Subscriptions,Databases,Tablespaces
> >
> > David, I understand you are saying that you prefer style #1 because of
> > the potentially cumbersome length of the csv value part of style #2 if
> > there are many future object kinds (e.g.
> > "Publications,Slots,Subscriptions,Databases,Tablespaces" in the above
> > examples).
> >
>
> Style-1 sounds reasonable to me, but how exactly do we want to do it? One
> idea is to have short and long switches like -r and
> --remove_existing_object=publication. The user would be allowed to give
> multiple options like -r publications -r slots, etc.
>

Neither “existing” nor “object” is needed; a one-word long form suffices:
drop, remove, or prune.  If we want a short-form option we should choose
Remove and use -r; both D and P are already taken.

So, I marginally prefer —prune with no short-form option; followed by
—remove/-r

The fact that “prune” communicates in the option name that we aren’t
removing all objects of the given type tips the balance for me.  But that
can just be communicated in the description, so it isn’t a strong preference.

David J.


Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-15 Thread Melanie Plageman
On Thu, Mar 13, 2025 at 5:41 PM Melanie Plageman
 wrote:
>
> Overall, I feel pretty good about merging this once Thomas merges the
> read stream patches.

This was committed in 944e81bf99db2b5b70b, 2b73a8cd33b745c, and
c3953226a07527a1e2.
I've marked it as committed in the CF app.

As we continue to test the feature in the coming weeks, we can enhance
the read stream API with any updates needed.

- Melanie




Re: Assert when executing query on partitioned table

2025-03-15 Thread Joseph Koshakow
On Sat, Mar 8, 2025 at 4:28 PM Joseph Koshakow  wrote:
>
> The assert was introduced in commit f16241
>
> So if I understand correctly, at the time the assert was added, we in
> fact did not support UPDATE of INSERT ON CONFLICT for a partitioned
> table. However, since then we've added support but nobody removed or
> altered the assert.

I tried to validate this by checking out f16241, and running the following:

psql (11devel)
Type "help" for help.

test=# CREATE TABLE t_int (i int PRIMARY KEY, v int, x int) PARTITION
BY RANGE (i);
  CREATE TABLE t_int_1 PARTITION OF t_int FOR VALUES FROM (1) TO (100);
  CREATE TABLE t_int_2 PARTITION OF t_int FOR VALUES FROM (100) TO
(200);

  INSERT INTO t_int VALUES (1, 10, 100);
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
test=# INSERT INTO t_int VALUES (1, 11, 111) ON CONFLICT (i) DO UPDATE
SET x = excluded.x;
INSERT 0 1

So it looks like when that assert was added, we *did* support
INSERT .. ON CONFLICT UPDATE for partitioned tables. So I'm even more
confused about the conversation and assert. You can even update the
partitions directly.

test=# INSERT INTO t_int_1 VALUES (1, 11, 111) ON CONFLICT (i) DO
UPDATE SET x = excluded.x;
INSERT 0 1
test=# INSERT INTO t_int_2 VALUES (150, 11, 111) ON CONFLICT (i) DO
UPDATE SET x = excluded.x;
INSERT 0 1

Thanks,
Joseph Koshakow


Re: Add column name to error description

2025-03-15 Thread Erik Wienhold
On 2025-03-07 04:05 +0100, Tom Lane wrote:
> Erik Wienhold  writes:
> > But I don't see the point in keeping variables atttypid and atttypmod
> > around when those values are now available via outatt.  Removing these
> > two variables makes the code easier to read IMO.  Done so in the
> > attached v4.
> 
> I think the idea of the original coding was to keep those values in
> registers in the inner loop rather than re-fetching them each time.

Could be.  But the main reason was to hold the output column type as the
inner loop repurposed att for the input column.  With the separate
outatt and inatt this is no longer necessary.

-- 
Erik Wienhold




Re: Changing the state of data checksums in a running cluster

2025-03-15 Thread Andres Freund
Jo.

On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote:
> Thanks, here's an updated patch version

FWIW, this fails in CI;

https://cirrus-ci.com/build/4678473324691456
On all OSs:
[16:08:36.331] #   Failed test 'options --locale-provider=icu --locale=und 
--lc-*=C: no stderr'
[16:08:36.331] #   at /tmp/cirrus-ci-build/src/bin/initdb/t/001_initdb.pl line 
132.
[16:08:36.331] #  got: '2025-03-15 16:08:26.216 UTC [63153] LOG:  
XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0
[16:08:36.331] # 2025-03-15 16:08:26.216 UTC [63153] LOG:  
XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0 (UPDATED)

Windows & Compiler warnings:
[16:05:08.723] ../src/backend/storage/page/bufpage.c(25): fatal error C1083: 
Cannot open include file: 'execinfo.h': No such file or directory

[16:18:52.385] bufpage.c:25:10: fatal error: execinfo.h: No such file or 
directory
[16:18:52.385]25 | #include 
[16:18:52.385]   |  ^~~~

Greetings,

Andres Freund




Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Heikki Linnakangas

On 14/03/2025 05:43, Jeff Davis wrote:

On Wed, 2025-03-12 at 23:39 +0300, Alexander Borisov wrote:

v5 attached.


Attached v6j.

* marked arrays as "static const" rather than just "static"
* ran pgindent
* changed data types where appropriate (uint32->pg_wchar)
* modified perl code so that it produces code that's already pgindented
* cleanup of perl code, removing unnecessary subroutines and variables
* added a few comments
* ran pgperltidy

Some of the perl code working with ranges still needs further cleanup
and explanation, though.

Also, I ran some of my own simple tests (mostly ASCII) and it showed
over 10% speedup. That combined with the smaller table sizes makes this
well worth it.


Looks good overall.


static const pg_wchar case_map_lower[1677] =
{
0x00,   /* U+00 */
0x00,   /* U+00 */
0x01,   /* U+01 */
0x02,   /* U+02 */


The duplicated 0x00 looks wrong. I understand that the 0'th entry is 
reserved, and the actual codepoints start at index 1, but the /* 
U+00 */ comment on the 0'th entry is misleading.



static const uint8 case_map_special[1677] =
{
0x00,   /* U+00 */
0x00,   /* U+00 */
...


0x00 implies a 24-bit integer, but these are uint8's. Let's use 
plain base-10 decimals here rather than hex, like in 'case_map'.


Attached are fixes for those and some other minor things.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 63213dce1ee4a5c410f6b473424bee2bed6fdacb Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 14 Mar 2025 12:14:12 +0200
Subject: [PATCH 1/3] minor fixes in the perl script

- casekind_index was unused, remove
- move the 'open', it seemed weirdly placed between building the special
  map and determining it size.
---
 src/common/unicode/generate-unicode_case_table.pl | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/common/unicode/generate-unicode_case_table.pl b/src/common/unicode/generate-unicode_case_table.pl
index 10b2dcc49d3..2b59d4d3e0b 100644
--- a/src/common/unicode/generate-unicode_case_table.pl
+++ b/src/common/unicode/generate-unicode_case_table.pl
@@ -231,17 +231,13 @@ while (my $line = <$FH>)
 close $FH;
 
 # assign sequential array indexes to the special mappings
-# 0 reserved for NULL
+# 0 is reserved for NULL
 my $special_idx = 1;
 foreach my $code (sort { $a <=> $b } (keys %special))
 {
 	$special{$code}{Index} = $special_idx++;
 }
 
-# Start writing out the output files
-open my $OT, '>', $output_table_file
-  or die "Could not open output file $output_table_file: $!\n";
-
 # determine size of array
 my $num_special = scalar(keys %special) + 1;
 
@@ -249,6 +245,10 @@ die
   "special case map contains $num_special entries which cannot be represented in uint8"
   if ($num_special > 256);
 
+# Start writing out the output files
+open my $OT, '>', $output_table_file
+  or die "Could not open output file $output_table_file: $!\n";
+
 print $OT <<"EOS";
 /*-
  *
@@ -420,8 +420,6 @@ die
   "mapping tables contains $index entries which cannot be represented in uint16"
   if ($index > 65536);
 
-my @casekind_index;
-
 print $OT <<"EOS";
 
 static const pg_wchar case_map_lower\[$index\] =
-- 
2.39.5

From b21ea572f18b0e173c1d80024e9c276c62e6e5fb Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 14 Mar 2025 12:11:26 +0200
Subject: [PATCH 2/3] use decimal for case_map_special indexes

---
 .../unicode/generate-unicode_case_table.pl|6 +-
 src/include/common/unicode_case_table.h   | 3382 -
 2 files changed, 1694 insertions(+), 1694 deletions(-)

diff --git a/src/common/unicode/generate-unicode_case_table.pl b/src/common/unicode/generate-unicode_case_table.pl
index 2b59d4d3e0b..1d574d0 100644
--- a/src/common/unicode/generate-unicode_case_table.pl
+++ b/src/common/unicode/generate-unicode_case_table.pl
@@ -485,7 +485,7 @@ EOS
 foreach my $entry (@map_special)
 {
 	print $OT
-	  sprintf("\t0x%06x,\t\t\t\t\t/* U+%06x */\n", @$entry[0], @$entry[1]);
+	  sprintf("\t%d,\t\t\t\t\t/* U+%06x */\n", @$entry[0], @$entry[1]);
 }
 
 print $OT "\n};\n";
@@ -575,7 +575,7 @@ sub branch
 		}
 
 		push @result,
-		  sprintf("%s\treturn case_map[cp - 0x%04X + 0x%04X];",
+		  sprintf("%s\treturn case_map[cp - 0x%04X + %d];",
 			$space, $entry->[0], $entry->[3]);
 	}
 	else
@@ -596,7 +596,7 @@ sub branch
 	if ($idx == $to)
 	{
 		push @result,
-		  sprintf("%s\treturn case_map\[cp - 0x%04X + 0x%04X];",
+		  sprintf("%s\treturn case_map\[cp - 0x%04X + %d];",
 			$space, $entry->[2], $range->[ $idx + 1 ]->[3]);
 	}
 	else
diff --git a/src/include/common/unicode_case_table.h b/src/include/common/un

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-03-15 Thread Yura Sokolov
06.03.2025 19:29, Nikhil Kumar Veldanda пишет:
> Hi,
> 
>> Overall idea is great.
>>
>> I just want to mention LZ4 also have API to use dictionary. Its dictionary
>> will be as simple as "virtually prepended" text (in contrast to complex
>> ZStd dictionary format).
>>
>> I mean, it would be great if "dictionary" will be common property for
>> different algorithms.
>>
>> On the other hand, zstd have "super fast" mode which is actually a bit
>> faster than LZ4 and compresses a bit better. So may be support for
>> different algos is not essential. (But then we need a way to change
>> compression level to that "super fast" mode.)
>>
> 
> zstd compression level and zstd dictionary size is configurable at
> attribute level using ALTER TABLE. Default zstd level is 3 and dict
> size is 4KB. For super fast mode level can be set to 1.

No. Super-fast mode levels are negative. See parsing "--fast" parameter in
`programs/zstdcli.c` in zstd's repository and definition of ZSTD_minCLevel().

So, to support "super-fast" mode you have to accept negative compression
levels. I didn't check; perhaps you already support them?
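
For reference, selecting a fast level with the plain zstd API is just this
(minimal sketch, not the patch's code):

    #include <zstd.h>

    /* compress with a "super fast" negative level, here the fastest one */
    size_t
    compress_fast(void *dst, size_t dst_cap, const void *src, size_t src_len)
    {
        int level = ZSTD_minCLevel();   /* most negative == fastest */

        return ZSTD_compress(dst, dst_cap, src, src_len, level);
    }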


---
regards
Yura Sokolov aka funny-falcon




Re: vacuumdb changes for stats import/export

2025-03-15 Thread Nathan Bossart
On Thu, Mar 06, 2025 at 06:30:59PM +0700, John Naylor wrote:
> +This option can only be used in conjunction with
> +--analyze-only and
> --analyze-in-stages.
> 
> + /* Prohibit --missing-only without --analyze-only or --analyze-in-stages */
> + if (vacopts.missing_only && !vacopts.analyze_only)
> + pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
> + "missing-only", "analyze-only", "analyze-in-stages");
> 
> The first is slightly ambiguous, so maybe "or" is better throughout.

Agreed.

> + " CROSS JOIN LATERAL (SELECT c.relkind IN ('p', 'I')) as p (inherited)\n"
> 
> Looking elsewhere in this file, I think we prefer something like
> "(c.relkind OPERATOR(pg_catalog.=) ANY (array["
>   CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
>   CppAsString2(RELKIND_PARTITIONED_INDEX) "]) as p (inherited)\n"

Fixed.

> + " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
> + " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
> + " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
> + " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
> 
> IIUC correctly, pg_statistic doesn't store stats on itself, so this
> causes the query result to always contain pg_statistic -- does that
> get removed elsewhere?

Good catch.  I think the current behavior is to call ANALYZE on
pg_statistic, too, but that should be mostly harmless (analyze_rel()
refuses to process it).  I suppose we could try to avoid returning
pg_statistic from the catalog query, but we don't bother doing that for any
other vacuumdb modes, so I'm tempted to leave it alone.

-- 
nathan
>From e7e2c9750131458e00f735352f63d69da73aae8b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 3 Mar 2025 09:43:38 -0600
Subject: [PATCH v4 1/2] vacuumdb: Save catalog query results for
 --analyze-in-stages.

Presently, each call to vacuum_one_database() for each stage of
--analyze-in-stages mode performs the catalog query to retrieve the
list of tables to process.  A proposed follow-up commit would add a
"missing only" feature to --analyze-in-stages, which requires us to
save the results of the catalog query (since tables without
statistics would have them after the first stage).  This commit
adds this ability via a new parameter for vacuum_one_database()
that specifies either a previously-retrieved list to process or a
place to store the results of the catalog query for later use.
This commit also makes use of this new parameter for
--analyze-in-stages.

The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.

Co-authored-by: Corey Huinker 
Reviewed-by: John Naylor 
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
 src/bin/scripts/vacuumdb.c | 335 ++---
 1 file changed, 199 insertions(+), 136 deletions(-)

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..52b91837492 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
 
 static VacObjFilter objfilter = OBJFILTER_NONE;
 
+static SimpleStringList *retrieve_objects(PGconn *conn,
+                                          vacuumingOptions *vacopts,
+                                          SimpleStringList *objects,
+                                          bool echo);
+
 static void vacuum_one_database(ConnParams *cparams,
                                 vacuumingOptions *vacopts,
                                 int stage,
                                 SimpleStringList *objects,
+                                SimpleStringList **found_objs,
                                 int concurrentCons,
                                 const char *progname, bool echo, bool quiet);
 
@@ -400,12 +406,13 @@ main(int argc, char *argv[])
 	if (analyze_in_stages)
 	{
 		int			stage;
+		SimpleStringList *found_objs = NULL;
 
 		for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
 		{
 			vacuum_one_database(&cparams, &vacopts,
 								stage,
-								&objects,
+								&objects, &found_objs,
 								concurrentCons,
 								progname, echo, quiet);
 

Re: Available disk space per tablespace

2025-03-15 Thread Thomas Munro
On Sat, Mar 15, 2025 at 4:40 AM Christoph Berg  wrote:
> I'm still unconvinced if we should use statfs() instead of statvfs()
> on *BSD or if their manpage is just trolling us and statvfs is just
> fine.
>
> DESCRIPTION
>The statvfs() and fstatvfs() functions fill the structure pointed to by
>buf  with  garbage.  This garbage will occasionally bear resemblance to
>file system statistics, but portable applications must  not  depend  on
>this.

Hah, I see this in my local FreeBSD man page.  I guess this might be a
reference to POSIX's 100% get-out clause "it is unspecified whether
all members of the statvfs structure have meaningful values on all
file systems".  The statfs() man page doesn't say that (a nonstandard
syscall that originated in 4.4BSD, which POSIX decided to rename
because other systems sprouted incompatible statfs() interfaces?).
It's hard to imagine a system that doesn't track free space and report
it here, and if it doesn't, well so what, that's probably also a
system that can't report free space to the "df" command, so what are
we supposed to do?  We could perhaps add a note to the documentation
that this field relies on the OS providing meaningful "avail" field in
statvfs(), but it's hard to imagine.  Maybe just defer that until
someone shows up with a real report?  So +1 from me, go for it, call
statvfs() and don't worry.

I tried your v3 patch on my FreeBSD 14.2 battle station:

postgres=# \db+
                                      List of tablespaces
    Name    | Owner  | Location | Access privileges | Options |  Size  |  Free  | Description
------------+--------+----------+-------------------+---------+--------+--------+-------------
 pg_default | tmunro |          |                   |         | 22 MB  | 290 GB |
 pg_global  | tmunro |          |                   |         | 556 kB | 290 GB |

That is the correct answer:

tmunro@build1:~/projects/postgresql/build $ df -h .
FilesystemSizeUsed   Avail Capacity  Mounted on
zroot/usr/home331G 41G290G12%/usr/home

I also pushed your patch to CI and triggered the NetBSD and OpenBSD
tasks and they passed your sanity test, though that only checks that
the reported number is > 1MB.

I looked at the source, and on FreeBSD statvfs[1] is just a libc
function that calls statfs() (as does df).  The statfs() man page has
no funny disclaimers.  OpenBSD's[2] too.  NetBSD seems to have a real
statvfs (or statvfs1) syscall but its man page has no funny
disclaimers.

+#ifdef WIN32
+	if (GetDiskFreeSpaceEx(tblspcPath, &lpFreeBytesAvailable, NULL, NULL) == false)
+		return -1;
+
+	return lpFreeBytesAvailable.QuadPart;	/* ULONGLONG part of ULARGE_INTEGER */
+#else
+	if (statvfs(tblspcPath, &fst) < 0)
+		return -1;
+
+	return fst.f_bavail * fst.f_frsize;	/* available blocks times fragment size */
+#endif

What's the rationale for not raising an error if the system call
fails?  If someone complains that it's showing -1, doesn't that mean
we'll have to ask them to trace the system calls to figure out why, or
if it's Windows, likely abandon all hope of ever knowing why?  Should
statvfs() retry on EINTR?

Style nit: maybe ! instead of == false?

Nice feature.

[1] 
https://github.com/freebsd/freebsd-src/blob/36782aaba4f1a7d054aa405357a8fa2bc0f94eb0/lib/libc/gen/statvfs.c#L70
[2] 
https://github.com/openbsd/src/blob/70ab9842eb8b368612eb098db19dcf94c19d673d/lib/libc/gen/statvfs.c#L59




Re: Statistics Import and Export

2025-03-15 Thread Andres Freund
Hi,

On 2025-03-06 12:16:44 -0500, Corey Huinker wrote:
> >
> > To be honest, I am a bit surprised that we decided to enable this by
> > default. It's not obvious to me that statistics should be regarded as
> > part of the database in the same way that table definitions or table
> > data are. That said, I'm not overwhelmingly opposed to that choice.
> > However, even if it's the right choice in theory, we should maybe
> > rethink if it's going to be too slow or use too much memory.
> >
>
> I'm strongly in favor of the choice to make it default. This is reducing
> the impact of a post-upgrade customer footgun wherein heavy workloads are
> applied to a database post-upgrade but before analyze/vacuumdb have had a
> chance to do their magic [1].

To be clear, I think this is a very important improvement that most people
should use. I just don't think it's quite there yet.


> It seems to me that we're fretting over seconds when the feature is
> potentially saving the customer hours of reduced availability if not
> outright downtime.

FWIW, I care about the performance for two reasons:

1) It's a difference of seconds in the regression database, which has a few
   hundred tables, few columns, very little data and thus small stats. In a
   database with a lot of tables and columns with complicated datatypes the
   difference will be far larger.

   And in contrast to analyzing the database in parallel, the pg_dump/restore
   work to restore stats afaict happens single-threaded for each database.


2) The changes initially substantially increased the time a test cycle takes
   for me locally. I run the tests 10s to 100s time a day, that really adds
   up.

   002_pg_upgrade is the test that dominates the overall test time for me, so
   it getting slower by a good bit means the overall test time increased.

   1fd1bd87101^:
   total test time: 1m27.010s
   003_pg_upgrade alone:1m6.309s

   1fd1bd87101:
   total test time: 1m45.945s
   003_pg_upgrade alone:1m24.597s

   master at 0f21db36d66:
   total test time: 1m34.576s
   003_pg_upgrade alone:1m12.550s

   It clearly got a lot better since 1fd1bd87101, but it's still ~9% slower
   than before...


I care about the memory usage effects because I've seen plenty systems where
pg_statistics is many gigabytes (after toast compression!), and I am really
worried that pg_dump having all the serialized strings in memory will cause a
lot of previously working pg_dump invocations and pg_upgrades to fail. That'd
also be a really bad experience.


The more I think about it, the less correct it seems to me to have the
statement to restore statistics tracked via ArchiveOpts->createStmt.  We use
that for DDL, but this really is data, not DDL.  Because we store it in
->createStmt it's stored in-memory for the runtime of pg_dump, which means the
peak memory usage will inherently be quite high.

I think the stats need to be handled much more like we handle the actual table
data, which are obviously *not* stored in memory for the whole run of pg_dump.

Greetings,

Andres Freund




Re: per backend WAL statistics

2025-03-15 Thread Michael Paquier
On Wed, Mar 12, 2025 at 05:37:11AM +, Bertrand Drouvot wrote:
> Thanks for the report! I think that it's better to remove the 
> PendingBackendStats
> initializer (instead of adding extra braces). The reason is that I'm concerned
> about padding bytes (that could be added to the struct in the future) not 
> being
> zeroed (see [1]).

Okay.  I am going to remove this initialization in a couple of minutes
as that's more annoying than I thought.
--
Michael




Re: Update Unicode data to Unicode 16.0.0

2025-03-15 Thread Tom Lane
Jeremy Schneider  writes:
> On Fri, 07 Mar 2025 13:11:18 -0800
> Jeff Davis  wrote:
>> The change in Unicode that I'm focusing on is the addition of U+A7DC,
>> which is unassigned in Unicode 15.1 and assigned in Unicode 16, which
>> lowercases to U+019B. The examples assume that the user is using
>> unassigned code points in PG17/Unicode15.1 and the PG_C_UTF8
>> collation.

> It seems the consensus is to update unicode in core... FWIW, I'm still
> in favor of leaving it alone because ICU is there for when I need
> up-to-date unicode versions.

> From my perspective, the whole point of the builtin collation was to
> have one option that avoids the problems that come with updating both
> ICU and glibc.

I don't really buy this argument.  If we sit on Unicode 15 until that
becomes untenable, which it will, then people will still be faced
with a behavioral change whenever we bow to reality and invent a
"builtin-2.0" or whatever collation.  Moreover, by then they might
well have instances of the newly-assigned code points in their
database, making the changeover real and perhaps painful for them.

On the other hand, if we keep up with the Joneses by updating the
Unicode data, we can hopefully put those behavioral changes into
effect *before* they'd affect any real data.  So it seems to me
that freezing our Unicode data is avoiding hypothetical pain now
at the price of certain pain later.

I compare this to our routine timezone data updates, which certainly
have not been without occasional pain ... but does anyone seriously
want to argue that we should still be running tzdata from 20 years
back?  Or even 5 years back?

In fact, on the analogy of timezones, I think we should not only
adopt newly-published Unicode versions pretty quickly but push
them into released branches as well.  Otherwise the benefit of
staying ahead of real use of the new code points isn't there
for end users.

regards, tom lane




Re: Changing the state of data checksums in a running cluster

2025-03-15 Thread Daniel Gustafsson
> On 14 Mar 2025, at 13:20, Tomas Vondra  wrote:

> I experimented with this a little bit, and unfortunately I ran into not
> one, but two race conditions in this :-( I don't have reproducers, all
> of this was done by manually adding sleep() calls / gdb breakpoints to
> pause the processes for a while, but I'll try to explain what/why ...

Ugh. Thanks for this!

> 1) race #1: SetDataChecksumsOn
> 
> The function (and all the other "SetDataChecksums" funcs) does this
> 
>SpinLockAcquire(&XLogCtl->info_lck);
>XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
>SpinLockRelease(&XLogCtl->info_lck);
> 
>barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
> 
> Now, imagine there's a sleep() before the EmitProcSignalBarrier. A new
> process may start during that, and it'll read the current checksum value
> from XLogCtl. And then the SetDataChecksumsOn() wakes up, and emits the
> barrier. So far so good.
> 
> But the new backend is already registered in ProcSignal, so it'll get
> the barrier too, and will try to set the local version to "on" again.
> And kaboom - that hits the assert in AbsorbChecksumsOnBarrier():
> 
>Assert(LocalDataChecksumVersion ==
>   PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
> 
> The other "SetDataChecksums" have the same issue, except that in those
> cases there are no asserts to trip. Only AbsorbChecksumsOnBarrier() has
> such assert to check the state transition.
> 
> This is "ephemeral" in the sense that setting the value to "on" again
> would be harmless, and indeed a non-assert build will run just fine.

As mentioned off-list, loosening the restriction for the first barrier seen
seems like a good way to keep this assertion.  Removing it is of course the
alternative solution, since it isn't causing any actual problems, but given
how handy it has been for finding real issues it would be good to be able to keep it.

> 2) race #2: InitPostgres
> 
> The InitPostgres does this:
> 
>InitLocalControldata();
> 
>ProcSignalInit(MyCancelKeyValid, MyCancelKey);
> 
> where InitLocalControldata gets the current checksum value from XLogCtl,
> and ProcSignalInit registers the backend into the procsignal (which is
> what barriers are based on).
> 
> Imagine there's a sleep() between these two calls, and the cluster does
> not have checksums enabled. A backend will start, will read "off" from
> XLogCtl, and then gets stuck on the sleep before it gets added to the
> procsignal/barrier array.
> 
> Now, we enable checksums, and the instance goes through 'inprogress-on'
> and 'on' states. This completes, and the backend wakes up and registers
> itself into procsignal - but it won't get any barriers, of course.
> 
> So we end up with an instance with data_checksums="on", but this one
> backend still believes data_checksums="off". This can cause a lot of
> trouble, because it won't write blocks with checksums. I.e. this is
> persistent data corruption.
> 
> I have been thinking about how to fix this. One way would be to
> introduce some sort of locking, so that the two steps (update of the
> XLogCtl version + barrier emit) and (local flag init + procsignal init)
> would always happen atomically. So, something like this:
> 
>SpinLockAcquire(&XLogCtl->info_lck);
>XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
>barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
>SpinLockRelease(&XLogCtl->info_lck);
> 
> and
> 
>SpinLockAcquire(&XLogCtl->info_lck);
>InitLocalControldata();
>ProcSignalInit(MyCancelKeyValid, MyCancelKey);
>SpinLockRelease(&XLogCtl->info_lck);
> 
> But that seems pretty heavy-handed, it's definitely much more work while
> holding a spinlock than I'm comfortable with, and I wouldn't be
> surprised if there were deadlock cases etc. (FWIW I believe it needs to
> use XLogCtl->info_lck, to make the value consistent with checkpoints.)

Yeah, that seems quite likely to introduce a new set of issues.

> Anyway, I think a much simpler solution would be to reorder InitPostgres
> like this:
> 
>ProcSignalInit(MyCancelKeyValid, MyCancelKey);
> 
>InitLocalControldata();

Agreed.

> i.e. to first register into procsignal, and then read the new value.
> AFAICS this guarantees we won't lose any checksum version updates. It
> does mean we still can get a barrier for a value we've already seen, but
> I think we should simply ignore this for the very first update.

Calling functions with side effects that set state before ProcSignalInit has
run seems like a bad idea; that's a thinko on my part in this patch.  Your
solution of reordering seems like the right way to handle this.

--
Daniel Gustafsson





Re: SQL/JSON json_table plan clause

2025-03-15 Thread Nikita Malakhov
Hi Amit!

Thank you for your feedback, I'll check it out in a couple of days and
reply.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

2025-03-15 Thread Shay Rojansky
> > For whatever it's worth, I'll note that SQL Server's OPENJSON does do
> > this (so when a JSON string property is extracted as a binary type,
> > base64 encoding is assumed). Other databases also have very specific
> > documented conversion rules for JSON_VALUE RETURNING (Oracle
> > <docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-used-in-functions-and-conditions-for-json.html#GUID-DE9F29D3-1C23-4271-9DCD-E585866576D2>,
> > DB2 <i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table
> > 1)). I'm basically trying to show that RETURNING definitely isn't a
> > simple cast-from-string in other databases, but is a distinct conversion
> > mechanism that takes into account the fact that the origin data comes
> > from JSON.
>
> According to the SQL standard, once you account for various special
> cases (non-scalar values, null values), it comes down to a cast.
>

OK. I don't have the SQL standard here, but I'll just note that this
doesn't seem to be what most (or all) other databases are doing - maybe
there's room for interpretation there (though again, I can't check the
standard myself). Applying certain transformations where needed, as in
this case, certainly seems like the more useful thing to do.
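
To make the expectation concrete, here is a sketch of the case under
discussion (illustrative only; the key name and literal are invented, and
I'm not asserting what PostgreSQL currently returns here):

    SELECT JSON_VALUE('{"payload": "3q2+7w=="}', '$.payload' RETURNING bytea);
    -- the behavior argued for above would base64-decode the string and
    -- yield \xdeadbeef, rather than applying a plain string-to-bytea cast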


Re: [Doc] Improve hostssl related descriptions and option presentation

2025-03-15 Thread David G. Johnston
On Sunday, March 9, 2025, vignesh C  wrote:

> On Fri, 28 Feb 2025 at 00:08, David G. Johnston
>  wrote:
> >
> > On Mon, Apr 22, 2024 at 2:20 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> >>
> >> Thoughts anyone?
> >>
> >> On Thu, Feb 1, 2024 at 3:47 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> >>>
> >>> Motivated by a recent complaint [1] I found the hostssl related
> material in our docs quite verbose and even repetitive.  Some of that is
> normal since we have both an overview/walk-through section as well as a
> reference section.  But the overview in particular was self-repetitive.
> Here is a first pass at some improvements.  The commit message contains
> both the intent of the patch and some thoughts/questions still being
> considered.
> >>>
> >>> David J.
> >>>
> >>> [1] https://www.postgresql.org/message-id/170672433664.663.
> 11895343120533141715%40wrigleys.postgresql.org
> >
> >
> > Withdrawn as this now has conflicts from recent changes and no one seems
> interested in voicing an opinion on this for or against.
>
> I've updated the commitfest entry status to "withdrawn." Feel free to
> add it again anytime if you change your mind.
>

Thanks.  Apparently I ended up with two CF entries for this and I only
withdrew one of them.

David J.


Re: Clarification on Role Access Rights to Table Indexes

2025-03-15 Thread Nathan Bossart
On Sun, Mar 09, 2025 at 11:48:05AM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> On Sat, Mar 08, 2025 at 05:17:40PM -0500, Tom Lane wrote:
>>> ReindexIndex() faces this same problem and solves it with some
>>> very complex code that manages to get the table's lock first.
> 
>> I noticed that amcheck's bt_index_check_internal() handles this problem,
>> ...
>> stats_lock_check_privileges() does something similar, but it's not as
>> cautious about the "heapid != IndexGetRelation(indrelid, false)" race
>> condition.
> 
> Egad, we've already got three inconsistent implementations of this
> functionality?  I think the first step must be to unify them into
> a common implementation, if at all possible.

Agreed.  I worry that trying to unify each bespoke implementation into a
single function might result in an unwieldy mess, but I'll give it a
shot...

-- 
nathan




Re: [PATCH] Fixed creation of empty .log files during log rotation

2025-03-15 Thread vignesh C
On Tue, 29 Oct 2024 at 13:38, Арсений Косицын  wrote:
>
>
> Hi everyone!
>
> A situation has been discovered in which empty .log files are being created.
>
> --Reproduce:
>
> 1) You need to set the following parameters in the "postgresql.conf" file:
>
> log_destination = 'csvlog'
> logging_collector = on
> log_rotation_age = 1min
> log_directory = '/home/user/builds/postgresql/log'
>
> 2) After starting the server, two files are being created in the directory 
> with logs:
>
> -rw--- 1 user user  1,4K oct 10 13:29 postgresql-2024-10-10_132907.csv
> -rw--- 1 user user  172 oct 10 13:29 postgresql-2024-10-10_132907.log
>
> 3) The .log file is created anyway, regardless of the log_destination 
> parameter.
> The following message is displayed in the .log file (due to the fact that the 
> log_destination
> parameter is set = 'csvlog'):
>
> 2024-10-10 13:05:52.539 MSK [1629065] LOG:  ending log output to stderr
> 2024-10-10 13:05:52.539 MSK [1629065] HINT:  Future log output will go to log 
> destination "csvlog".
>
> Next, the stderr stream is redirected to a .csv file.
>
> 4) Now, logs are rotated once a minute (due to the log_rotation_age 
> setting). Moreover, all open log files are rotated, including the .log 
> file mentioned above. New .csv and .log files are created. The logs 
> themselves go to the .csv file, nothing is written to the .log file, 
> and it remains empty:
>
> -rw--- 1 user user 1,4K oct 10 13:29 postgresql-2024-10-10_132907.csv
> -rw--- 1 user user  172 oct 10 13:29 postgresql-2024-10-10_132907.log
> -rw--- 1 user user 1,4K oct 10 13:30 postgresql-2024-10-10_133000.csv
> -rw--- 1 user user0 oct 10 13:30 postgresql-2024-10-10_133000.log
>
> --Fix:
>
> To correct the situation, you can add a check for the "Log_destination" 
> parameter in the "logfile_rotate()"
> function of the "syslogger.c" module. Then only the logs specified in this 
> parameter will be rotated. This is done in the proposed patch.

Some agreed-upon comments from [1] need to be addressed. I'm changing
the status to "Waiting on Author." Please resolve the comments, submit
a V2 patch, and update the status back to "Needs Review."
[1] - 
https://www.postgresql.org/message-id/4dc16c9a-a546-48a6-a8df-ab7441a008a6%40postgrespro.ru

Regards,
Vignesh




Re: Available disk space per tablespace

2025-03-15 Thread Thomas Munro
On Sun, Mar 16, 2025 at 1:09 AM Christoph Berg  wrote:
> Hmm. Is looping on EINTR worth the trouble?

I was just wondering if it might be one of those oddballs that ignores
SA_RESTART, but I guess that doesn't seem too likely (I mean, first
you'd probably have to have a reason to sleep or some other special
reason, and who knows what some unusual file systems might do).  It
certainly doesn't on the systems I tried.  So I guess not until we
have other evidence.




Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Tom Lane
Jeff Davis  writes:
> Committed. Thank you!

crake doesn't like your perl style:

./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not 
lexical at line 638, column 2.  See page 108 of PBP.  
([Variables::RequireLexicalLoopIterators] Severity: 5)


regards, tom lane




Re: Dubious server log messages after pg_upgrade

2025-03-15 Thread Andres Freund
Hi,

On 2025-03-12 20:41:29 -0400, Tom Lane wrote:
> I happened to notice these entries in a log file on a
> buildfarm member [1]:
> 
> 2025-03-12 15:39:53.265 UTC [7296] WARNING:  found incorrect redo LSN 
> 0/159FB60 (expected 0/4028)
> 2025-03-12 15:39:53.265 UTC [7296] LOG:  corrupted statistics file 
> "pg_stat/pgstat.stat"
> 
> (this is near the end of the pg_upgrade_server.log log file).
> I don't think these are related to that run's subsequent test failure,
> which looks to be good old Windows randomness.  I then looked into the
> logs of a local BF instance that also runs xversion-upgrade tests, and
> darned if I didn't find
> 
> 2025-03-12 14:59:15.792 EDT [2216647] LOG:  database system was shut down at 
> 2025-03-12 14:59:13 EDT
> 2025-03-12 14:59:15.794 EDT [2216647] WARNING:  found incorrect redo LSN 
> 0/46F73F18 (expected 0/4728)
> 2025-03-12 14:59:15.794 EDT [2216647] LOG:  corrupted statistics file 
> "pg_stat/pgstat.stat"
> 2025-03-12 14:59:15.795 EDT [2216644] LOG:  database system is ready to 
> accept connections
> 
> despite that run having completed with no report of trouble.
> So this may have been going on for quite some time without our
> noticing.  The "corrupted statistics file" whine is most likely
> caused by pg_upgrade copying the old system's pgstat.stat file
> into the new installation --- is that a good idea?  I have
> no idea what's causing the redo LSN complaint, but it seems
> like that might deserve closer investigation.

I think the two issues are closely related - this is code that was introduced,
in b860848232aa, as part of work to make pgstats somewhat crashsafe.

Greetings,

Andres Freund




Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Jeff Davis
On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote:
> I tried adding a loop to create tables, and everything looks fine
> (v7).
> Also removed unnecessary (hanging) global variables.

Changed. I used a loop more similar to your first one (hash of arrays),
and I left case_map_special outside of the loop. It a different type of
array: rather than mapping to a codepoint, it maps to another index,
and I think it's a different case (and needs a different comment).

> 
> It's all about not getting too carried away. In my vision these
> subroutines (user-defined functions) will have to be moved to perl
> module (.pm) in the future, after the patch is committed, so that the
> code can be used for Normalization Forms.

I prefer to generalize when we have the other code in place. As it was,
it was a bit confusing why the extra arguments and subroutines were
there.


Also, make_ranges() doesn't seem to do what is described in the
comment: it produces a single range. I changed "> $limit" to ">=
$limit", but then it generates 3 ranges, not two. I rewrote it to be
more clear what's going on:

* I used new looping logic that, at least to me, seems a bit simpler.

* I changed the ranges from an array of 4 elements to a hash with 3
keys, which is more descriptive when using it.

* I changed the terminology of a range so that it's Start and End,
because $from and $to are used by branches() to mean something else.

In branch():

* I got rid of the third element "VALUE" from the range. It used to
be the start of the next range, but there was already code in the
caller to look ahead to the next range, so it didn't serve any purpose.

* If we rely on some compiler optimizations, it might be possible to
simplify branch() a bit, but I'm fine with the way it's done.


When looking around, I didn't find a lot of material discussing this
generated-branches approach. While it's mentioned in a few places, I
couldn't even find a consistent name for it. If you know of something
where I can read some more analysis, please send a link.
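
(For anyone reading along who hasn't seen the technique: the generated code
is essentially a cascade of range comparisons instead of one big lookup
table. A hand-written miniature of the idea, with completely made-up ranges
and offsets, might look like this:)

static inline int
case_index(unsigned int cp)
{
	/* made-up split points, purely for illustration */
	if (cp < 0x0370)
	{
		if (cp < 0x0100)
			return cp;				/* dense range: index == codepoint */
		return cp - 0x0100 + 256;	/* next stored range */
	}
	else
	{
		if (cp < 0x1E00)
			return cp - 0x0370 + 512;
		return 0;					/* unmapped: fall back to identity */
	}
}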


Committed. Thank you!

Regards,
Jeff Davis





Re: Dubious server log messages after pg_upgrade

2025-03-15 Thread Andres Freund
Hi,

On 2025-03-15 16:14:10 -0400, Andres Freund wrote:
> On 2025-03-12 20:41:29 -0400, Tom Lane wrote:
> > I happened to notice these entries in a log file on a
> > buildfarm member [1]:
> > 
> > 2025-03-12 15:39:53.265 UTC [7296] WARNING:  found incorrect redo LSN 
> > 0/159FB60 (expected 0/4028)
> > 2025-03-12 15:39:53.265 UTC [7296] LOG:  corrupted statistics file 
> > "pg_stat/pgstat.stat"
> > 
> > (this is near the end of the pg_upgrade_server.log log file).
> > I don't think these are related to that run's subsequent test failure,
> > which looks to be good old Windows randomness.  I then looked into the
> > logs of a local BF instance that also runs xversion-upgrade tests, and
> > darned if I didn't find
> > 
> > 2025-03-12 14:59:15.792 EDT [2216647] LOG:  database system was shut down 
> > at 2025-03-12 14:59:13 EDT
> > 2025-03-12 14:59:15.794 EDT [2216647] WARNING:  found incorrect redo LSN 
> > 0/46F73F18 (expected 0/4728)
> > 2025-03-12 14:59:15.794 EDT [2216647] LOG:  corrupted statistics file 
> > "pg_stat/pgstat.stat"
> > 2025-03-12 14:59:15.795 EDT [2216644] LOG:  database system is ready to 
> > accept connections
> > 
> > despite that run having completed with no report of trouble.
> > So this may have been going on for quite some time without our
> > noticing.  The "corrupted statistics file" whine is most likely
> > caused by pg_upgrade copying the old system's pgstat.stat file
> > into the new installation --- is that a good idea?  I have
> > no idea what's causing the redo LSN complaint, but it seems
> > like that might deserve closer investigation.
> 
> I think the two issues are closely related - this is code that was introduced,
> in b860848232aa, as part of work to make pgstats somewhat crashsafe.

Ugh, sorry, that went out unintentionally.

Greetings,

Andres Freund




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-15 Thread Rushabh Lathia
On Wed, Mar 12, 2025 at 11:50 PM Alvaro Herrera 
wrote:

> On 2025-Mar-12, Rushabh Lathia wrote:
>
> > Hi Alvaro,
> >
> > Here are the latest patches, which includes the regression fix.
>
> Thank you.
>
> Taking a step back after discussing this with some colleagues, I need to
> contradict what I said at the start of this thread.  There's a worry
> that changing pg_attribute.attnotnull in the way I initially suggested
> might not be such a great idea after all.  I did a quick search using
> codesearch.debian.net for applications reading that column and thinking
> about how they would react to this change; I think in the end it's going
> to be quite disastrous.  We would break a vast number of these apps, and
> there are probably countless other apps and frameworks that we would
> also break.  Everybody would hate us forever.  Upgrading to Postgres 18
> would become as bad an experience as the drastic change of implicit
> casts to text in 8.3.  Nothing else in the intervening 17 years strikes
> me as so problematic as this change would be.
>
> So I think we may need to go back and somehow leave pg_attribute alone,
> to avoid breaking the whole world.  Maybe it would be sufficient to
> change the CompactAttribute->attnotnull to no longer be a boolean, but
> the new char representation instead.  I'm not sure if this would
> actually work.
>

Thank you for your feedback. I understand that this change could be
problematic for existing applications, as attnotnull is a highly generic
catalog column. I will revisit the patch, make the necessary adjustments,
and submit a revised version accordingly.

I appreciate your insights and will try to submit the new patch.



Regards,
Rushabh Lathia
www.EnterpriseDB.com


Re: SQLFunctionCache and generic plans

2025-03-15 Thread Tom Lane
I spent some time today going through the actual code in this patch.
I realized that there's no longer any point in 0001: the later patches
don't move or repeatedly-call that bit of code, so it can be left as-is.

What I think we could stand to split out, though, is the changes in
the plancache support.  The new 0001 attached is just the plancache
and analyze.c changes.  That could be committed separately, although
of course there's little point in pushing it till we're happy with
the rest.

In general, this patch series is paying far too little attention to
updating existing comments that it obsoletes or adding new ones
explaining what's going on.  For example, the introductory comment
for struct SQLFunctionCache still says

 * Note that currently this has only the lifespan of the calling query.
 * Someday we should rewrite this code to use plancache.c to save parse/plan
 * results for longer than that.

and I wonder how much of the para after that is still accurate either.
The new structs aren't adequately documented either IMO.  We now have
about three different structs that have something to do with caches
by their names, but the reader is left to guess how they fit together.
Another example is that the header comment for init_execution_state
still describes an argument list it hasn't got anymore.

I tried to clean up the comment situation in the plancache in 0001,
but I've not done much of anything to functions.c.

I'm fairly confused why 0002 and 0003 are separate patches, and the
commit messages for them do nothing to clarify that.  It seems like
you're expecting reviewers to review a very transitory state of
affairs in 0002, and it's not clear why.  Maybe the code is fine
and you just need to explain the change sequence a bit more
in the commit messages.  0002 could stand to explain the point
of the new test cases, too, especially since one of them seems to
be demonstrating the fixing of a pre-existing bug.

Something is very wrong in 0004: it should not be breaking that
test case in test_extensions.  It seems to me we should already
have the necessary infrastructure for that, in that the plan
ought to have a PlanInvalItem referencing public.dep_req2(),
and the ALTER SET SCHEMA that gets done on that function should
result in an invalidation.  So it looks to me like that patch
has somehow rearranged things so we miss an invalidation.
I've not tried to figure out why.  I'm also sad that 0004
doesn't appear to include any test cases showing it doing
something right: without that, why do it at all?

regards, tom lane

From f975519041cbd278005f1d4035d4fe53a80cb665 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Fri, 14 Mar 2025 15:54:56 -0400
Subject: [PATCH v8 1/4] Support cached plans that work from a parse-analyzed
 Query.

Up to now, plancache.c dealt only with raw parse trees as the
starting point for a cached plan.  However, we'd like to use
this infrastructure for SQL functions, and in the case of a
new-style SQL function we'll only have the stored querytree,
which corresponds to an analyzed-but-not-rewritten Query.

Fortunately, we can make plancache.c handle that scenario
with only minor modifications; the biggest change is in
RevalidateCachedQuery() where we will need to apply only
pg_rewrite_query not pg_analyze_and_rewrite.

This patch just installs the infrastructure; there's no
caller as yet.

Author: Alexander Pyhalov 
Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/8216639.NyiUUSuA9g@aivenlaptop
---
 src/backend/parser/analyze.c|  39 +++
 src/backend/utils/cache/plancache.c | 158 +---
 src/include/parser/analyze.h|   1 +
 src/include/utils/plancache.h   |  23 +++-
 4 files changed, 179 insertions(+), 42 deletions(-)

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 76f58b3aca3..1f4d6adda52 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -591,6 +591,45 @@ analyze_requires_snapshot(RawStmt *parseTree)
 	return stmt_requires_parse_analysis(parseTree);
 }
 
+/*
+ * query_requires_rewrite_plan()
+ *		Returns true if rewriting or planning is non-trivial for this Query.
+ *
+ * This is much like stmt_requires_parse_analysis(), but applies one step
+ * further down the pipeline.
+ *
+ * We do not provide an equivalent of analyze_requires_snapshot(): callers
+ * can assume that any rewriting or planning activity needs a snapshot.
+ */
+bool
+query_requires_rewrite_plan(Query *query)
+{
+	bool		result;
+
+	if (query->commandType != CMD_UTILITY)
+	{
+		/* All optimizable statements require rewriting/planning */
+		result = true;
+	}
+	else
+	{
+		/* This list should match stmt_requires_parse_analysis() */
+		switch (nodeTag(query->utilityStmt))
+		{
+			case T_DeclareCursorStmt:
+			case T_ExplainStmt:
+			case T_CreateTableAsStmt:
+			case T_CallStmt:
+result = true;
+break;
+			default:
+result = false;
+break;
+		}
+	}
+	return result;
+}
+

Re: Draft for basic NUMA observability

2025-03-15 Thread Bertrand Drouvot
Hi,

On Thu, Mar 13, 2025 at 02:15:14PM +, Bertrand Drouvot wrote:
> > > === 19
> > >
> > 
> > Can you please take a look again on this
> 
> Sure, will do.


> I'll have a look at v11 soon.


About 0001:

=== 1

git am produces:

.git/rebase-apply/patch:378: new blank line at EOF.
+
.git/rebase-apply/patch:411: new blank line at EOF.
+
warning: 2 lines add whitespace errors.

=== 2

+  Returns true if a NUMA support is available.

What about "Returns true if the server has been compiled with NUMA support"?

=== 3

+Datum
+pg_numa_available(PG_FUNCTION_ARGS)
+{
+   if(pg_numa_init() == -1)
+   PG_RETURN_BOOL(false);
+   PG_RETURN_BOOL(true);
+}

What about PG_RETURN_BOOL(pg_numa_init() != -1)?

Also I wonder if pg_numa.c would not be a better place for it.
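
For clarity, under that suggestion the whole function would collapse to
something like this (just a sketch of the proposed shape, not the committed
code):

Datum
pg_numa_available(PG_FUNCTION_ARGS)
{
	PG_RETURN_BOOL(pg_numa_init() != -1);
}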

=== 4

+{ oid => '5102', descr => 'Is NUMA compilation available?',
+  proname => 'pg_numa_available', provolatile => 'v', prorettype => 'bool',
+  proargtypes => '', prosrc => 'pg_numa_available' },
+

Not sure that 5102 is a good choice.

src/include/catalog/unused_oids is telling me:

Best practice is to start with a random choice in the range 8000-9999.
Suggested random unused OID: 9685 (23 consecutive OID(s) available starting 
here)

So maybe use 9685 instead?

=== 5

@@ -12477,3 +12481,4 @@
   prosrc => 'gist_stratnum_common' },

 ]
+

garbage?

=== 6

Run pgindent as it looks like it's finding some things to do in 
src/backend/storage/ipc/shmem.c
and src/port/pg_numa.c.

=== 7

+Size
+pg_numa_get_pagesize(void)
+{
+   Size os_page_size = sysconf(_SC_PAGESIZE);
+   if (huge_pages_status == HUGE_PAGES_ON)
+GetHugePageSize(&os_page_size, NULL);
+   return os_page_size;
+}

I think that's more usual to:

Size
pg_numa_get_pagesize(void)
{
   Size os_page_size = sysconf(_SC_PAGESIZE);

   if (huge_pages_status == HUGE_PAGES_ON)
GetHugePageSize(&os_page_size, NULL);

   return os_page_size;
}

I think that makes sense to check huge_pages_status as you did.

=== 8

+extern void numa_warn(int num, char *fmt,...) pg_attribute_printf(2, 3);
+extern void numa_error(char *where);

I wonder whether it would make sense to add a comment here mentioning
github.com/numactl/numactl/blob/master/libnuma.c.
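
(Context for the suggestion: libnuma documents numa_error() and numa_warn()
as hooks the application may override, so a comment pointing at libnuma.c
would explain why the externs are there.  A rough sketch of such an
override, not necessarily what the patch does:)

void
numa_error(char *where)
{
	elog(WARNING, "libnuma: error in %s", where);
}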


I still need to look at 0002 and 0003.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Clarification on Role Access Rights to Table Indexes

2025-03-15 Thread Tom Lane
Nathan Bossart  writes:
> On Sat, Mar 08, 2025 at 05:17:40PM -0500, Tom Lane wrote:
>> ReindexIndex() faces this same problem and solves it with some
>> very complex code that manages to get the table's lock first.

> I noticed that amcheck's bt_index_check_internal() handles this problem,
> ...
> stats_lock_check_privileges() does something similar, but it's not as
> cautious about the "heapid != IndexGetRelation(indrelid, false)" race
> condition.

Egad, we've already got three inconsistent implementations of this
functionality?  I think the first step must be to unify them into
a common implementation, if at all possible.

regards, tom lane




Re: Parallel heap vacuum

2025-03-15 Thread Masahiko Sawada
On Sun, Mar 9, 2025 at 11:12 PM Amit Kapila  wrote:
>
> On Wed, Mar 5, 2025 at 6:25 AM Masahiko Sawada  wrote:
> >
> > On Mon, Mar 3, 2025 at 3:24 PM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > Another performance regression I can see in the results is that heap
> > > vacuum phase (phase III) got slower with the patch. It's weird to me
> > > since I don't touch the code of heap vacuum phase. I'm still
> > > investigating the cause.
> >
> > I have investigated this regression. I've confirmed that in both
> > scenarios (patched and unpatched), the entire table and its associated
> > indexes were loaded into the shared buffer before the vacuum. Then,
> > the 'perf record' analysis, focused specifically on the heap vacuum
> > phase of the patched code, revealed numerous soft page faults
> > occurring:
> >
> > 62.37%  13.90%  postgres  postgres  [.] lazy_vacuum_heap_rel
> > |
> > |--52.44%--lazy_vacuum_heap_rel
> >    |
> >    |--46.33%--lazy_vacuum_heap_page (inlined)
> >       |
> >       |--32.42%--heap_page_is_all_visible (inlined)
> >          |
> >          |--26.46%--HeapTupleSatisfiesVacuum
> >          |          HeapTupleSatisfiesVacuumHorizon
> >          |          HeapTupleHeaderXminCommitted (inlined)
> >          |          |
> >          |           --18.52%--page_fault
> >          |                     do_page_fault
> >          |                     __do_page_fault
> >          |                     handle_mm_fault
> >          |                     __handle_mm_fault
> >          |                     handle_pte_fault
> >          |                     |
> >          |                     |--16.53%--filemap_map_pages
> >          |                     |          |
> >          |                     |           --2.63%--alloc_set_pte
> >          |                     |                     pfn_pte
> >          |                     |
> >          |                      --1.99%--pmd_page_vaddr
> >          |
> >           --1.99%--TransactionIdPrecedes
> >
> > I did not observe these page faults in the 'perf record' results for
> > the HEAD version. Furthermore, when I disabled parallel heap vacuum
> > while keeping parallel index vacuuming enabled, the regression
> > disappeared. Based on these findings, the likely cause of the
> > regression appears to be that during parallel heap vacuum operations,
> > table blocks were loaded into the shared buffer by parallel vacuum
> > workers.
> >
>
> In the previous paragraph, you mentioned that the entire table and its
> associated indexes were loaded into the shared buffer before the
> vacuum. If that is true, then why does the parallel vacuum need to
> reload the table blocks into shared buffers?

Hmm, my above sentences are wrong. All pages are loaded into the
shared buffer before the vacuum test. In the parallel heap vacuum case,
the leader process reads only a part of the table during phase I, whereas
in the single-process vacuum case the process reads the entire table. So
there will be differences in phase III in terms of PTEs, as described
below.

>
> > However, in the heap vacuum phase, the leader process needed
> > to process all blocks, resulting in soft page faults while creating
> > Page Table Entries (PTEs). Without the patch, the backend process had
> > already created PTEs during the heap scan, thus preventing these
> > faults from occurring during the heap vacuum phase.
> >
>
> This part is again not clear to me because I am assuming all the data
> exists in shared buffers before the vacuum, so why the page faults
> will occur in the first place.

IIUC PTEs are process-local data. So even if physical pages are loaded
into PostgreSQL's shared buffers (and the OS page cache), soft page faults
(or minor page faults)[1] can occur in a process if these pages are not
yet mapped in its page table.

Regards,

[1] https://en.wikipedia.org/wiki/Page_fault#Minor

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2025-03-15 Thread Oliver Ford
On Fri, Feb 28, 2025 at 11:49 AM Tatsuo Ishii  wrote:

> >> BTW, I noticed that in the code path where
> >> ignorenulls_getfuncarginframe() is called, WinSetMarkPosition() is
> >> never called?
> >>
> >> Attached version uses the mark_pos at the end.
>
> I did simple performance test against v8.
>
> EXPLAIN ANALYZE
> SELECT
>   x,
> nth_value(x,2) IGNORE NULLS OVER w
> FROM generate_series(1,$i) g(x)
> WINDOW w AS (ORDER BY x ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING);
>
> I changed $i = 1k, 2k, 3k, 4k, 5k... 10k and got this:
>
> Number   Time (ms)
> of rows
> 
> 1000     28.977
> 2000     96.556
> 3000     212.019
> 4000     383.615
> 5000     587.05
> 6000     843.23
> 7000     1196.177
> 8000     1508.52
> 9000     1920.593
> 10000    2514.069
>
> As you can see, when the number of rows = 1k, it took 28 ms. For 10k
> rows, it took 2514 ms, which is 86 times slower than the 1k case. Can
> we enhance this?
>
>
Attached version removes the non-nulls array. That seems to speed
everything up.  Running the above query with 1 million rows averages 450ms,
similar to when using lead/lag.


0009-ignore-nulls.patch
Description: Binary data


Re: PATCH: warn about, and deprecate, clear text passwords

2025-03-15 Thread Nathan Bossart
On Mon, Mar 03, 2025 at 01:54:59PM -0500, Robert Haas wrote:
> Oh, good point. I don't know. I just have heard a LOT of complaining
> about passwords showing up in the log, and I'm not sure insisting that
> they have to all be encrypted is going to make all of the complaining
> stop.

+1.  At this point, IMHO we should consider this v19 material to provide
more time for discussion on the best way to tackle this problem.  Blocking
plain-text passwords in CREATE/ALTER ROLE commands may be part of it, but
as Robert notes, we might need to do more.

-- 
nathan




Re: Update Unicode data to Unicode 16.0.0

2025-03-15 Thread Laurenz Albe
On Fri, 2025-03-14 at 23:54 -0700, Jeremy Schneider wrote:
> On Fri, 07 Mar 2025 13:11:18 -0800
> It seems the consensus is to update unicode in core... FWIW, I'm still
> in favor of leaving it alone because ICU is there for when I need
> up-to-date unicode versions.

Me too.

> From my perspective, the whole point of the builtin collation was to have
> one option that avoids these problems that come with updating both ICU
> and glibc.

+1

Yours,
Laurenz Albe




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-15 Thread Amit Kapila
On Sat, Mar 15, 2025 at 11:35 AM Peter Smith  wrote:
>
> On Sat, Mar 15, 2025 at 4:52 PM Peter Smith  wrote:
> >
> > On Fri, Mar 14, 2025 at 5:39 PM David G. Johnston
> >  wrote:
> > >
> > > On Tuesday, March 11, 2025, Peter Smith  wrote:
> > >>
> > >>
> > >> Choice 3. Implement some option that has an argument saying what to 
> > >> delete
> > >>
> > >> Implement an option that takes some argument saying what objects to 
> > >> remove.
> > >>
> > >> Here, the current patch would be something like:
> > >> "--remove-target-objects=publications". In future, this option could
> > >> be enhanced to accept other values like
> > >> "--remove-target-objects=publications,subscriptions" etc.
> > >
> > >
> > > I’m changing my vote to option 3.  With a requirement that dropping is 
> > > done post-recovery by the user via psql - we just provide the commands we 
> > > would have executed in a script.
> > >
> > > —prune-file=‘drop-commands.psql’
> > > —prune=publications
> > > —prune=subscriptions
> > > Etc…
> > >
> > > I’d rather do multiple specifications than commas separated.  It matches 
> > > what we already do with databases, which was done this way I suspect for 
> > > the same reasons - length of the parameters value when we have:
> > > Publications
> > > Slots
> > > Subscriptions
> > > Databases
> > > Roles
> > > Tablespaces
> > >
> >
>
> (I'm sorry, my previous post included a confusing typo. Here is the 
> correction)
>
> OK. Here are some examples...
>
> style #1:
> --prune=Publications --prune=Slots --prune=Subscriptions
> --prune=Databases --prune=Tablespaces
>
> versus
>
> style #2:
> --prune=Publications,Slots,Subscriptions,Databases,Tablespaces
>
> David, I understand you are saying that you prefer style #1 because of
> the potentially cumbersome length of the csv value part of style #2 if
> there are many future object kinds (e.g.
> "Publications,Slots,Subscriptions,Databases,Tablespaces" in the above
> examples).
>

Style #1 sounds reasonable to me, but how exactly do we want to do it? One
idea is to have short and long switches like -r and
--remove_existing_object=publication. The user would be allowed to give
multiple options like -r publications -r slots, etc.
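
Just to illustrate how style #1 looks on the implementation side, here is a
throwaway sketch with getopt_long; the option name and everything else here
is hypothetical, not taken from pg_createsubscriber:

#include <getopt.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{"remove", required_argument, NULL, 'r'},
		{NULL, 0, NULL, 0}
	};
	int			c;

	/* each -r/--remove occurrence is collected independently */
	while ((c = getopt_long(argc, argv, "r:", long_opts, NULL)) != -1)
	{
		if (c == 'r')
			printf("would remove: %s\n", optarg);	/* publications, slots, ... */
	}
	return 0;
}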

-- 
With Regards,
Amit Kapila.




Re: like pg_shmem_allocations, but fine-grained for DSM registry ?

2025-03-15 Thread Florents Tselai
On Fri, Mar 14, 2025 at 11:44 PM Florents Tselai 
wrote:

>
> On Fri, Mar 14, 2025 at 4:16 PM Nathan Bossart 
> wrote:
>
>> On Thu, Mar 13, 2025 at 06:54:09PM +0200, Florents Tselai wrote:
>> > I've been working with the DSM registry API.
>> > I was wondering if it is possible (it doesn't look like it) or if it has
>> been discussed:
>> > can we expose a view like pg_shmem_allocations, but fine-grained for
>> every named segment (i.e. created by GetNamedDSMSegment )?
>> >
>> > Currently, there is a "DSM Registry Data" entry in that view,
>> > but iiuc, it's only about the top-level hash table the registry uses.
>>
>> This seems like a generally reasonable idea to me.  In theory, it should
>> be
>> easy enough to build something that walks through the DSM registry hash
>> table.
>>
>
>  Here's a first attempt towards a view pg_dsm_registry(name, size) that
> does just that
> So, using the test_dsm_registry module as a test bed,
>
> it would look like this.
>
> CREATE EXTENSION test_dsm_registry;
> SELECT set_val_in_shmem(1236);
>  set_val_in_shmem
> --
>
> (1 row)
>
> -- 20 bytes = int (4 bytes) + LWLock (16 bytes)
> SELECT * FROM pg_dsm_registry;
>name| size
> ---+--
>  test_dsm_registry |   20
> (1 row)
>
> I'll create a cf entry to keep track of this
>
>
Here's an updated v3 that fixes some white spaces v2 had—no other changes.

I'm wondering though if it also makes sense to expose:
- backend_pid (who created the segment)
- is_pinned bool
- created_at

https://commitfest.postgresql.org/patch/5652/
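
For anyone curious what the backing function of such a view roughly amounts
to, here is a minimal sketch of walking the registry's dshash table.  The
variable dsm_registry_table and the entry's name/size fields are assumptions
about dsm_registry.c internals; the attached patch is authoritative.

#include "postgres.h"

#include "funcapi.h"
#include "lib/dshash.h"
#include "utils/builtins.h"
#include "utils/tuplestore.h"

Datum
pg_dsm_registry_sketch(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	dshash_seq_status status;
	DSMRegistryEntry *entry;

	InitMaterializedSRF(fcinfo, 0);

	/* a shared lock is enough to read the entries */
	dshash_seq_init(&status, dsm_registry_table, false);
	while ((entry = dshash_seq_next(&status)) != NULL)
	{
		Datum		values[2];
		bool		nulls[2] = {false, false};

		values[0] = CStringGetTextDatum(entry->name);
		values[1] = Int64GetDatum((int64) entry->size);
		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
	}
	dshash_seq_term(&status);

	return (Datum) 0;
}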


v3-0001-pg_dsm_registry-view.patch
Description: Binary data


Re: Enhancing Memory Context Statistics Reporting

2025-03-15 Thread Alexander Korotkov
Hi, Rahila!

On Thu, Mar 13, 2025 at 3:57 PM Rahila Syed  wrote:
>
> Please find attached updated and rebased patches. It has the following changes
>
> 1. To prevent memory leaks, ensure that the latest statistics published by a 
> process
> are freed before it exits. This can be achieved by calling dsa_free in the
> before_shmem_exit callback.
> 2. Add a level column to maintain consistency with the output of
> pg_backend_memory_contexts.

Thank you for your work on this subject.

v17-0001-Preparatory-changes-for-reporting-memory-context-sta.patch

It looks like we're increasing *num_contexts twice per child memory
context.  First, it gets increased with a recursive
MemoryContextStatsInternal() call, then by adding an ichild.  I might
be wrong, but I think these calculations at least deserve more
comments.

v17-0002-Function-to-report-memory-context-statistics.patch

+   if (procNumber == MyProcNumber)
+   {
+   ereport(WARNING,
+   errmsg("cannot return statistics for local backend"),
+   errhint("Use pg_backend_memory_contexts view instead."));
+   PG_RETURN_NULL();
+   }

Is it worth it to keep this restriction?  Can we fetch info about
local memory context for the sake of generality?

I know there have been discussions in the thread before, but the
mechanism of publishing memory context stats via DSA looks quite
complicated.  Also, the user probably intends to inspect memory
contexts when there is not a huge amount of free memory.  So, failure
is probable on DSA allocation.  Could we do something simpler?  For instance,
allocate some amount of static shared memory and use it as a message
queue between processes.  As a heavy load is not supposed to be here,
I think one queue would be enough.

--
Regards,
Alexander Korotkov
Supabase




Re: Available disk space per tablespace

2025-03-15 Thread Christoph Berg
Re: Thomas Munro
> Hah, I see this in my local FreeBSD man page.  I guess this might be a
> reference to POSIX's 100% get-out clause "it is unspecified whether
> all members of the statvfs structure have meaningful values on all
> file systems".

Yeah I could hear someone being annoyed by POSIX echoed in that
paragraph.

> system that can't report free space to the "df" command, so what are
> we supposed to do?  We could perhaps add a note to the documentation
> that this field relies on the OS providing meaningful "avail" field in
> statvfs(), but it's hard to imagine.  Maybe just defer that until
> someone shows up with a real report?  So +1 from me, go for it, call
> statvfs() and don't worry.

I was looking into gnulib's wrapper around this - it's also
basically calling statvfs() except on assorted older systems.

https://github.com/coreutils/gnulib/blob/master/lib/fsusage.c#L114

Do we care about any of these?

AIX
OSF/1
2.6 < glibc/Linux < 2.6.36
glibc/Linux < 2.6, 4.3BSD, SunOS 4, \
Mac OS X < 10.4, FreeBSD < 5.0, \
NetBSD < 3.0, OpenBSD < 4.4k
SunOS 4.1.2, 4.1.3, and 4.1.3_U1
4.4BSD and older NetBSD
SVR3, old Irix

If not, then statvfs seems safe.

> I also pushed your patch to CI and triggered the NetBSD and OpenBSD
> tasks and they passed your sanity test, though that only checks that
> the reported number is > 1MB.

I thought about making that test "between 1MB and 10PB", but that
seemed silly - it's not testing much, and some day, someone will try
to run the test on a system where it will still fail.

> What's the rationale for not raising an error if the system call
> fails?

That's mirroring the behavior of calculate_tablespace_size() in the
same file. I thought that's to allow \db+ to succeed even if some of
the tablespaces are botched/missing/whatever. But now on closer
inspection, I see that db_dir_size() is erroring out on problems, it
just ignores the top-level directory missing. Fixed in the attached
patch.

\db+
ERROR:  XX000: could not statvfs directory "pg_tblspc/16384/PG_18_202503111": 
Too many levels of symbolic links
LOCATION:  calculate_tablespace_avail, dbsize.c:373

But this is actually something I wanted to address in a follow-up
patch: Currently, non-superusers cannot run \db+ because they lack
CREATE on pg_global (but `\db+ pg_default` works). Should we rather
make pg_database_size and pg_database_avail return NULL for
insufficient permissions instead of throwing an error?

> If someone complains that it's showing -1, doesn't that mean

(-1 is translated to NULL for the SQL level.)

> we'll have to ask them to trace the system calls to figure out why, or
> if it's Windows, likely abandon all hope of ever knowing why?  Should
> statvfs() retry on EINTR?

Hmm. Is looping on EINTR worth the trouble?
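
For reference, the retry wrapper would only be a few lines, something like
this sketch (error handling stays with the caller):

#include <errno.h>
#include <sys/statvfs.h>

static int
statvfs_retry(const char *path, struct statvfs *buf)
{
	int			rc;

	do
	{
		rc = statvfs(path, buf);
	} while (rc < 0 && errno == EINTR);

	return rc;
}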

> Style nit: maybe ! instead of == false?

Changed.

> Nice feature.

Thanks!

Christoph
>From db42fdcc5ee3097b3364ec51602d6d58994b2060 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Fri, 14 Mar 2025 16:29:19 +0100
Subject: [PATCH v4] Add pg_tablespace_avail() functions

This exposes the f_avail value from statvfs() on tablespace directories
on the SQL level, allowing monitoring of free disk space from within the
server. On Windows, GetDiskFreeSpaceEx() is used.

Permissions required match those from pg_tablespace_size().

In psql, include a new "Free" column in \db+ output.

Add test coverage for pg_tablespace_avail() and the previously not
covered pg_tablespace_size() function.
---
 doc/src/sgml/func.sgml   |  21 +
 doc/src/sgml/ref/psql-ref.sgml   |   2 +-
 src/backend/utils/adt/dbsize.c   | 102 +++
 src/bin/psql/describe.c  |  11 ++-
 src/include/catalog/pg_proc.dat  |   8 ++
 src/test/regress/expected/tablespace.out |  21 +
 src/test/regress/sql/tablespace.sql  |  10 +++
 7 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 51dd8ad6571..0b4456ad958 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -30093,6 +30093,27 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

   
 
+  
+   
+
+ pg_tablespace_avail
+
+pg_tablespace_avail ( name )
+bigint
+   
+   
+pg_tablespace_avail ( oid )
+bigint
+   
+   
+Returns the available disk space in the tablespace with the
+specified name or OID. To use this function, you must
+have CREATE privilege on the specified tablespace
+or have privileges of the pg_read_all_stats role,
+unless it is the default tablespace for the current database.
+   
+  
+
   

 
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index cedccc14129..9e1bec0b422 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@

Re: PG_CFLAGS rpath Passthrough Issue

2025-03-15 Thread Álvaro Herrera
On 2025-Mar-14, David E. Wheeler wrote:

> But, just. WAT. 30 years in this business and shell string escaping
> continues to elude me. I tried lots of different combinations of
> escaping, but this weird double-$ never occurred to me.

It's make.  From its info manual:

 6.1 Basics of Variable References

   To substitute a variable’s value, write a dollar sign followed by the name
   of the variable in parentheses or braces: either ‘$(foo)’ or ‘${foo}’ is a
   valid reference to the variable foo. This special significance of
   ‘$’ is why you must write ‘$$’ to have the effect of a single dollar sign in
   a file name or recipe.

I'm surprised that you don't need \$\$ though.  I wonder if
  make CFLAGS='-Wl,-rpath,$$ORIGIN'
isn't enough actually?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
https://twitter.com/thingskatedid/status/1456027786158776329




Re: Available disk space per tablespace

2025-03-15 Thread Laurenz Albe
On Sat, 2025-03-15 at 13:09 +0100, Christoph Berg wrote:
> Do we care about any of these?
> 
> AIX

We dropped support for it, but there are efforts to change that.

Yours,
Laurenz Albe




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-15 Thread Andres Freund
Hi,

On 2025-03-15 10:43:45 -0400, Melanie Plageman wrote:
> On Thu, Mar 13, 2025 at 5:41 PM Melanie Plageman
>  wrote:
> >
> > Overall, I feel pretty good about merging this once Thomas merges the
> > read stream patches.
> 
> This was committed in 944e81bf99db2b5b70b, 2b73a8cd33b745c, and
> c3953226a07527a1e2.
> I've marked it as committed in the CF app.
> 
> As we continue to test the feature in the coming weeks, we can enhance
> the read stream API with any updates needed.

Really excited about this. This work has made read streams way better, making
it much easier to add future read stream users to other places!

Greetings,

Andres Freund




Re: Disabling vacuum truncate for autovacuum

2025-03-15 Thread Gurjeet Singh
+Andres Freund, since an old email of his is quoted here.

On Fri, Mar 14, 2025 at 8:45 AM Nathan Bossart  wrote:
>
> On Thu, Mar 06, 2025 at 08:54:59AM +0900, Fujii Masao wrote:
> > +1 to having the reloption (if specified) override the GUC setting.
> > That is, I think that autovacuum_vacuum_truncate as defining
> > the default behavior for VACUUM truncation, and that the GUC should
> > only apply when neither the TRUNCATE option in VACUUM nor
> > the reloption is set.
>
> One other difference in my version of the patch [0] is to call this GUC
> vacuum_truncate and have it apply to both autovacuum and VACUUM.  I did
> this for the following reasons:

+1 for the GUC name for the reasons you identified. But -1 for the behaviour
where the reloption and vacuum command's options override the GUC.

I did not consider the VACUUM (TRUNCATE) command, and was focused on autovacuum,
since that's what caused the replication stall and the consequent application
outage in the anecdote I was relaying.

I'd like to bring our attention back to how this thread started. Will started
the discussion by asking for a way to disable autovacuum's truncate behaviour.
Even though the production outage he faced was due to manual vacuum, he was
worried about the same behaviour that autovacuum may cause, especially since the
parameter old_snapshot_threshold is no longer available; old_snapshot_threshold
allowed sysadmins like Will to disable the truncation behaviour globally. I
provided an anecdote where autovacuum's truncation behaviour had in fact caused
a replication outage as well as the consequent application outage.

The behaviour that is being proposed here does not prevent that situation from
arising again. A sysadmin who's trying to prevent replication outage and
a consequent application outage won't benefit from tuning vacuum_truncate GUC,
because a reloption or VACUUM (TRUNCATE) command will override its behaviour,
and lead to an outage.

We want this new GUC to give the sysadmin the power to override the per-relation
and per-command settings.

> IIUC reloptions with corresponding GUCs typically override the GUC setting,
> although autovacuum_enabled is arguably an exception.  If setting the GUC
> to false overrides the relation-specific settings, it also becomes more
> difficult to enable truncation for just a couple of tables, although that
> might not be a popular use-case.  Furthermore, even if we do want the GUC
> to override the reloption, it won't override VACUUM (TRUNCATE).

I guess what I'm looking for is a global switch that guarantees no relation
truncation will take place on the instance, so that the relevant replication
record is never emitted, and hence will never lead to a blocked replication on
the replica and never cause a consequent outage of applications connected to the
replica(s). That is, as a sysadmin, I need a global variable that overrides and
disables any relation-level and command-level truncation operations. I guess
that's why naming the GUC *disable*_vacuum_truncate made sense to me when I
initially proposed the autovacuum patch, since it was preventing autovacuum's
default truncation behaviour.

The downside of disabling _all_ truncation operations is that database size can
only grow and never shrink, and it may be frustrating for a regular user who's
trying to shrink their table sizes. Even in those situations, a sysadmin may
prefer that none of the tables ever be shrunk in normal operation, rather than
running the risk of causing a replication outage and a consequent application
outage.

In Will's words:
> I expect I would need to monitor
> tables for problematic growth, but that might be better than a surprise
> outage. I suppose the growth could cause an outage too, but I'm thinking it
> would be more controllable.

The sysadmin can schedule a periodic maintenance window where the GUC is changed
for the duration of the maintenance, allowing truncation operations globally,
and then running VACUUM (TRUNCATE), or relying on autovacuum to truncate
relations within that maintenance period.

With the proposed v2 patch behaviour, a regular user is still in control of
truncation behaviour, and hence can cause a replication outage and application
outage on the replicas. Essentially, this patch fails to help the sysadmin, and
leaves them at the mercy of table owners and database owners.

Perhaps the compromise is that that the sysadmin will run a regular script to
check that none of the relations have the reloption set to truncate the
relation, and also communicate to the application developers that they shouldn't
run the VACUUM (TRUNCATE) command. But anyone who has run a moderately large
fleet of Postgres instances would agree that both those things are not
foolproof, and will tend to be ignored or forgotten in the long run. Especially
the VACUUM (TRUNCATE) option is so enticing for the average user that it's near
impossible to prevent them from using it.

In the message linked by Will upthread, Andr

Re: Update Unicode data to Unicode 16.0.0

2025-03-15 Thread Jeremy Schneider


> On Mar 15, 2025, at 10:22 AM, Jeff Davis  wrote:
> 
> On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
>> On the other hand, if we keep up with the Joneses by updating the
>> Unicode data, we can hopefully put those behavioral changes into
>> effect *before* they'd affect any real data.
> 
> That's a good point.

Jeff - thanks for the reminder that this is just about character semantics and 
not ordering. Obviously C collation by definition (code point ordering) doesn’t 
change sort order… two weeks ago I was working on updating the torture test 
GitHub page with glibc collation changes up through Ubuntu 24.10 so my mind was 
definitely over there. No detected changes in en-US so that’s great news. 🙂

Is the simple answer that functions & clauses related to both time zones and 
character semantics should just all be considered STABLE instead of IMMUTABLE?

I think if that were the case then changes across a minor version would simply 
be allowed by definition right? No need for warnings.

This would impact the ability to create case-insensitive indexes.

-Jeremy

Sent from my TI-83





Re: Dubious server log messages after pg_upgrade

2025-03-15 Thread Michael Paquier
On Sat, Mar 15, 2025 at 04:15:26PM -0400, Andres Freund wrote:
> Ugh, sorry, that went out unintentionally.

No problem.  Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: Disabling vacuum truncate for autovacuum

2025-03-15 Thread Laurenz Albe
On Sat, 2025-03-15 at 17:14 -0700, Gurjeet Singh wrote:
> > One other difference in my version of the patch [0] is to call this GUC
> > vacuum_truncate and have it apply to both autovacuum and VACUUM.  I did
> > this for the following reasons:
> 
> +1 for the GUC name for the reasons you identified. But -1 for the behaviour
> where the reloption and vacuum command's options overrides GUC.
> 
> I'd like to bring our attention back to how this thread started. Will started
> the discussion by asking for a way to disable autovacuum's truncate behaviour.
> Even though the production outage he faced was due to manual vacuum, he was
> worried about the same behaviour that autovacuum may cause, especially since 
> the
> parameter old_snapshot_threshold is no longer available; 
> old_snapshot_threshold
> allowed sysadmins like Will to disable the truncation behaviour globally. I
> provided an anecdote where autovacuum's truncation behaviour had in fact 
> caused
> a replication outage as well as the consequent application outage.
> 
> The behaviour that is being proposed here does not prevent that situation from
> arising again. A sysadmin who's trying to prevent replication outage and
> a consequent application outage won't benefit from tuning vacuum_truncate GUC,
> because a reloption or VACUUM (TRUNCATE) command will override its behaviour,
> and lead to an outage.
> 
> We want this new GUC to give the sysadmin the power to override the 
> per-relation
> and per-command settings.
> 
> 
> I guess what I'm looking for is a global switch that guarantees no relation
> truncation will take place on the instance, so that the relevant replication
> record is never emitted, and hence will never lead to a blocked replication on
> the replica and never cause a consequent outage of applications connected to 
> the
> replica(s).

Essentially, you are looking for something that reinstates the unintended side
effect of "old_snapshot_threshold" that some people relied on.

I understand your reasoning.
What I am worried about, and why I am against that, is the POLA violation this
constitutes.  In PostgreSQL, there usually are global settings that can be
overridden by per-relation settings.  Doing it differently here would surprise
and confuse many users.

This is not the only way a user can do damage to the system by overriding the
administrator's settings.  Users can override all autovacuum settings and even
disable autovacuum on a table.  I don't think these settings are less dangerous
than VACUUM truncation.

Yours,
Laurenz Albe




Re: Add contrib/pg_logicalsnapinspect

2025-03-15 Thread Masahiko Sawada
On Thu, Mar 13, 2025 at 6:20 PM Euler Taveira  wrote:
>
> On Tue, Mar 11, 2025, at 7:34 PM, Masahiko Sawada wrote:
>
> Pushed.
>
>
> pgindent is saying this commit included some extra tabs.
>
> git diff
> diff --git a/contrib/pg_logicalinspect/pg_logicalinspect.c 
> b/contrib/pg_logicalinspect/pg_logicalinspect.c
> index ff6c682679f..5a44718bea8 100644
> --- a/contrib/pg_logicalinspect/pg_logicalinspect.c
> +++ b/contrib/pg_logicalinspect/pg_logicalinspect.c
> @@ -86,7 +86,7 @@ parse_error:
> ereport(ERROR,
> errmsg("invalid snapshot file name \"%s\"", filename));
>
> -   return InvalidXLogRecPtr;   /* keep compiler quiet */
> +   return InvalidXLogRecPtr;   /* keep compiler quiet */
> }

Yes, David fixed it in commit b955df44340.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com