Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY

2024-12-03 Thread Amit Kapila
On Tue, Dec 3, 2024 at 12:31 PM Shlok Kyal  wrote:
>
> The changes look good to me. I have included it in the updated patch.
>

The patch looks mostly good to me. I have updated the docs and comments,
and made some other cosmetic changes. Please see attached and let me know
what you think.

-- 
With Regards,
Amit Kapila.


v15-0001-Disallow-UPDATE-DELETE-on-table-with-generated-c.patch
Description: Binary data


Giving the shared catalogues a defined encoding

2024-12-03 Thread Thomas Munro
Hello hackers,

Here's a WIP patch that started on a bugs thread[1].

Problem #1:  You can have two databases with different encodings, and
they both pretend that pg_database, pg_authid, pg_db_role_setting etc
are in the local database encoding.  That doesn't work too well:
non-ASCII text can be reinterpreted in the wrong encoding.

There's no problem if you only use one encoding everywhere (probably
UTF8).  There's also no problem if you use multiple database
encodings, but put only ASCII in the shared catalogues (because ASCII
is a subset of every supported server encoding).  This patch is about
formalising and enforcing those two working arrangements, hopefully
invisibly to most users.  There's still an escape hatch mode if you
need it, e.g. for a non-conforming pg_upgrade'd system.

The patch invents a new setting CLUSTER CATALOG ENCODING, which can be
inspected with SHOW and changed with ALTER SYSTEM.  It has three
possible values:

DATABASE:  The shared catalogs use the same encoding as this database
and every other database in this cluster, and all databases have to use
the default encoding configured at initdb time.  Database and role
names are free to use any characters you like in that one single
encoding.  This is the default.

ASCII:  The shared catalogs are restricted to 7-bit ASCII, but in
exchange, databases with different encodings are allowed to co-exist.

UNDEFINED:  The old behavior, no restrictions.

There's some documentation in the patch to explain that again in more
words, and a regression transcript showing the behaviour, ie things
you can and can't do in each mode, and how the transitions between
modes can be blocked until you make certain changes.
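
To make that concrete, a session might look roughly like this (the exact
spelling of the commands is whatever the patch implements; this is only an
illustration of the three modes described above, not authoritative syntax):

    SHOW cluster_catalog_encoding;                         -- e.g. DATABASE
    -- allow databases with different encodings to coexist; blocked until
    -- the shared catalogues contain only 7-bit ASCII:
    ALTER SYSTEM SET CLUSTER CATALOG ENCODING TO ASCII;
    -- escape hatch, e.g. for a non-conforming pg_upgrade'd system:
    ALTER SYSTEM SET CLUSTER CATALOG ENCODING TO UNDEFINED;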

Problem #2:  When dealing with new connections, we currently have
trouble with non-ASCII database and role names because the encoding is
undefined for both the catalogue and the network message.  With this
patch, at least the catalogue encoding is defined (unless UNDEFINED),
so there's a pathway to straighten that out.

I am open to better terminology, models, etc.  The command seems
verbose, but I hope you'd almost never need to run it, so being clear
seemed better than being brief. I had just CATALOG ENCODING in the
previous version, but then it's not clear that it only affects a few
special catalogues (pg_class et al are always in database encoding, as
they're not shared).  I tried SHARED CATALOG ENCODING, but that's
not really a SQL word or concept.  CLUSTER is, so here I'm trying
that.  On the other hand CLUSTER is a bit overloaded.  I had explicit
encoding names eg SET ... TO UTF8 in the previous version, but it
seems easier to call it DATABASE encoding given it had to match the
database anyway if not ASCII/UNDEFINED...  There could be other ways
to express all this, though. It does still store the real encoding in
the control file, UTF8 -> 6 or whatever, in case that is useful.  When
I was using real encoding names in the syntax, I also had SQL_ASCII
for ASCII mode, but that was quite confusing because SQL_ASCII is well
documented as accepting anything at all, whereas here we need 7-bit
ASCII.

Feedback welcome.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGKKNAc599Vp7kFAnLE1%3DV%3DceYujz_YQoSNrvNFGaJ6i7w%40mail.gmail.com#dfb55ce46ace161d81d60c44229e7c80


v2-0001-Formalize-the-encoding-of-the-shared-catalogs.patch
Description: Binary data


Re: Cannot find a working 64-bit integer type on Illumos

2024-12-03 Thread Peter Eisentraut

On 29.11.24 20:30, Thomas Munro wrote:

On Fri, Nov 29, 2024 at 11:12 PM Thomas Munro  wrote:
I was thinking about that ECPG stuff: I bet real applications prefer
to use int64_t etc directly too instead of long, the worst type in C.
I wondered if the embedded SQL standard might know about that these
days (ECPGt_int64_t?), but I don't have the standard to hand.  DB2's
embedded SQL seems to have a type sqlint64, but I didn't look too
closely and of course even if we wanted to do something like that as
an optional API option, that'd be a later change.


Interesting:

i) If “long long” is specified, then the <data type> of HV is BIGINT.
ii) If “long” is specified, then the <data type> of HV is INTEGER.
iii) If “short” is specified, then the <data type> of HV is SMALLINT.

[...]

I suppose that makes sense.
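
For illustration, the host-variable declarations those rules are talking
about would look something like this in embedded SQL (just a sketch, not
text taken from the standard):

    EXEC SQL BEGIN DECLARE SECTION;
        long long   big_hv;    /* rule i):   BIGINT   */
        long        int_hv;    /* rule ii):  INTEGER  */
        short       small_hv;  /* rule iii): SMALLINT */
    EXEC SQL END DECLARE SECTION;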


BTW I forgot to mention earlier, I peeked at the source of gettext on
NetBSD and illumos, and both appear to handle those special
<PRId64> tokens when loading message catalogues.


Ah great, thanks for checking that.





Re: Cannot find a working 64-bit integer type on Illumos

2024-12-03 Thread Peter Eisentraut

On 30.11.24 00:42, Thomas Munro wrote:

On Fri, Nov 29, 2024 at 11:12 PM Thomas Munro  wrote:

New idea: let's just redefine PRI...{32,64,PTR} on that platform,
instead of modifying snprintf.c.


D'oh, that's not going to fly.  gettext() would replace %<PRId64> with
the system's PRId64, so we can't avoid teaching our snprintf.c to
understand Windowsian format strings.  Here's a first attempt at that.
Tested a bit by removing the #ifdef WIN32 locally and playing around
with it.  CI passes on Windows, and I think that should be exercising
it via existing [U]INT64_FORMAT in various places that would break if
it didn't work.


This patch looks good to me.

In meson.build, this comment seems to be misplaced by accident:

+# Check if __int128 is a working 128 bit integer type, and if so
+# define PG_INT128_TYPE to that typename.
 cdata.set('SIZEOF_VOID_P', cc.sizeof('void *', args: test_c_args))
 cdata.set('SIZEOF_SIZE_T', cc.sizeof('size_t', args: test_c_args))

In c.h, you include <inttypes.h> instead of <stdint.h>.  Is there a 
reason for that?






Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Amit Kapila
On Tue, Dec 3, 2024 at 4:03 PM Alvaro Herrera  wrote:
>
> If you don't like the idea of a static memcxt in the one block where
> it's needed, I propose to store a new memcxt in PGOutputData, to be used
> exclusively for publications, with a well defined lifetime.
>

+1. This sounds like a way to proceed at least for HEAD. For
back-branches, it is less clear whether changing PGOutputData is a
good idea. Can such a change in back branches break any existing
non-core code (extensions)?

-- 
With Regards,
Amit Kapila.




Re: Incorrect result of bitmap heap scan.

2024-12-03 Thread Matthias van de Meent
On Mon, 2 Dec 2024 at 15:25, Konstantin Knizhnik  wrote:
>
> Hi hackers,
>
> The attached script reproduces the problem of incorrect results from `select 
> count(*)` (it returns a larger number of records than are actually in the 
> table).
> It is not always reproduced, so you may need to repeat it multiple times - on 
> my system it failed 3 times out of 10.
>
> The problem takes place with pg16/17/18 (other versions I have not checked).

I suspect all back branches are affected. As I partially also mentioned offline:

The running theory is that bitmap executor nodes incorrectly assume
that the rows contained in the bitmap all are still present in the
index, and thus assume they're allowed to only check the visibility
map to see if the reference contained in the bitmap is visible.
However, this seems incorrect: Note that index AMs must hold at least
pins on the index pages that contain their results when those results
are returned by amgettuple() [0], and that amgetbitmap() doesn't do
that for all TIDs in the bitmap; thus allowing vacuum to remove TIDs
from the index (and later, heap) that are still present in the bitmap
used in the scan.

Concurrency timeline:

Session 1. amgetbitmap() gets a snapshot of the index contents, containing
references to dead tuples on heap page P1.
Session 2. VACUUM runs on the heap, removes TIDs for P1 from the
index, deletes those TIDs from the heap pages, and finally sets heap
pages' VM bits to ALL_VISIBLE, including the now cleaned page P1
Session 1. Executes the bitmap heap scan that uses the bitmap without
checking tuples on P1 due to ALL_VISIBLE and a lack of output columns.

I think this might be an oversight when the feature was originally
committed in 7c70996e (PG11): we don't know when the VM bit was set,
and the bitmap we're scanning may thus be out-of-date (and should've
had TIDs removed if it had been an index), so I propose disabling this
optimization for now, as attached. Patch v1 is against a recent HEAD,
PG17 applies to the 17 branch, and PG16- should work on all (or at
least, most) active backbranches older than PG17's.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

PS.
I don't think the optimization itself is completely impossible, and we
can probably re-enable an optimization like that if (or when) we find
a way to reliably keep track of when to stop using the optimization. I
don't think that's an easy fix, though, and definitely not for
backbranches.

The solution I could think of to keep most of this optimization requires
the heap bitmap scan to notice that a concurrent process started
removing TIDs from the heap after amgetbitmap was called; i.e. a
"vacuum generation counter" incremented every time heap starts the
cleanup run. This is quite non-trivial, however, as we don't have much
in place regarding per-relation shared structures which we could put
such a value into, nor a good place to signal changes of the value to
bitmap heap-scanning backends.
From 07739e5af83664b67ea02d3db7a461a106d74040 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Mon, 2 Dec 2024 15:59:35 +0100
Subject: [PATCH v1] Disable BitmapScan's skip_fetch optimization

The optimization doesn't take a concurrent vacuum's removal of TIDs into
account: vacuum can remove dead TIDs and mark pages ALL_VISIBLE while the
bitmap still contains pointers to those dead TIDs.

The optimization itself isn't impossible, but would require more work
figuring out that vacuum started removing TIDs and then stop using the
optimization. However, we currently don't have such a system in place,
making the optimization unsound to keep around.

Reported-By: Konstantin Knizhnik 
Backpatch-through: 13
---
 src/include/access/heapam.h   |  9 +++--
 src/include/access/tableam.h  |  3 +-
 src/backend/access/heap/heapam.c  | 13 ---
 src/backend/access/heap/heapam_handler.c  | 28 ---
 src/backend/executor/nodeBitmapHeapscan.c | 42 ++-
 5 files changed, 8 insertions(+), 87 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 65999dd64e..42f28109e2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -92,11 +92,10 @@ typedef struct HeapScanDescData
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
/*
-* These fields are only used for bitmap scans for the "skip fetch"
-* optimization. Bitmap scans needing no fields from the heap may skip
-* fetching an all visible block, instead using the number of tuples per
-* block reported by the bitmap to determine how many NULL-filled tuples
-* to return.
+* These fields were kept around to guarantee ABI stability, but are
+* otherwise unused. They were only used for bitmap scans for the
+* "skip fetch" optimization, which turned out to be unsound when the
+* second phase of vacuum ran concurrently.
 */
Buffer  rs_vmbuff

Re: [PoC] Reducing planning time when tables have many partitions

2024-12-03 Thread Ashutosh Bapat
On Tue, Dec 3, 2024 at 4:08 PM Alvaro Herrera  wrote:
>
> Hello,
>
> On 2024-Dec-03, Ashutosh Bapat wrote:
>
> > For one of the earlier versions, I had reported a large memory
> > consumption in all cases and increase in planning time for Assert
> > enabled builds. How does the latest version perform in those aspects?
>
> I don't think planning time in assert-enabled builds is something we
> should worry about, at all.  Planning time in production builds is the
> important one.
>

This was discussed earlier. See a few emails from [1] going backwards.
The degradation was Nx, if I am reading those emails right. That means
somebody who is working with a large number of partitions has to spend
Nx time in running their tests. Given that the planning time with
thousands of partitions is already in seconds, slowing that further
down, even in an assert build is slowing development down further. My
suggestion of using OPTIMIZER_DEBUG will help us keep the sanity
checks and also not slow down development.

[1] 
https://www.postgresql.org/message-id/CAJ2pMkZrFS8EfvZpkw9CP0iqWk=eaaxzakws7dw+fttqkuo...@mail.gmail.com


-- 
Best Wishes,
Ashutosh Bapat




Re: NOT ENFORCED constraint feature

2024-12-03 Thread Peter Eisentraut

On 03.12.24 13:00, Amul Sul wrote:

create table t(a int);
alter table t ADD CONSTRAINT the_constraint CHECK (a > 0) NOT ENFORCED;
insert into t select -1;
select  conname, contype, condeferrable, condeferred, convalidated,
        conenforced, conkey, connoinherit
from    pg_constraint
where   conrelid = 't'::regclass;

Shouldn't pg_constraint->convalidated be set to false for a NOT ENFORCED constraint?
Am I missing something?


The "validated" status is irrelevant when the constraint is set to not
enforced.  But it's probably still a good idea to make sure the field is
set consistently.  I'm also leaning toward setting it to false.  One
advantage of that would be that if you set the constraint to enforced
later, then it's automatically in the correct "not validated" state.
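
For example, with the table from above, and assuming the patch ends up
with an ALTER form to flip enforcement (the exact commands are up to the
patch; this is only a sketch of the intended state transitions):

    ALTER TABLE t ADD CONSTRAINT the_constraint CHECK (a > 0) NOT ENFORCED;
    -- convalidated would be false while the constraint is not enforced
    ALTER TABLE t ALTER CONSTRAINT the_constraint ENFORCED;
    -- now enforced for new rows but still "not validated", so existing
    -- rows are only checked once the user runs:
    ALTER TABLE t VALIDATE CONSTRAINT the_constraint;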


Let's make it so that ruleutils.c doesn't print the NOT VALID when it's 
already printing NOT ENFORCED.  Otherwise, it gets unnecessarily verbose 
and confusing.



typedef struct Constraint
{
  NodeTagtype;
  ConstrTypecontype;/* see above */
  char   *conname;/* Constraint name, or NULL if unnamed */
  booldeferrable;/* DEFERRABLE? */
  boolinitdeferred;/* INITIALLY DEFERRED? */
  boolskip_validation;/* skip validation of existing rows? */
  boolinitially_valid;/* mark the new constraint as valid? */
  boolis_enforced;/* enforced constraint? */
}
makeNode(Constraint) will default is_enforced to false,
which makes the default value not what we want.
That means we may need to pay more attention along the path from
makeNode(Constraint) to finally inserting the constraint into the catalog.

if we change it to is_not_enforced, makeNode will default it to false.
is_not_enforced being false means the constraint is enforced,
which is not that intuitive...


Yes, it could be safer to make the field so that the default is false.
I guess the skip_validation field is like that for a similar reason, but
I'm not sure.



Ok. Initially, I was doing it the same way, but to maintain consistency
with the pg_constraint column and avoid negation in multiple places, I
chose that approach. However, I agree that having the default be false
would be safer. I’ve renamed the flag to is_not_enforced. Other names
I considered were not_enforced or is_unenforced, but since we already
have existing flags with two underscores, is_not_enforced shouldn't be
a problem.


I was initially thinking about this as well, but after seeing it now, I 
don't think this is a good change.  Because now we have both "enforced" 
and "not_enforced" sprinkled around the code.  If we were to do this 
consistently everywhere, then it might make sense, but this way it's 
just confusing.  The Constraint struct is only initialized in a few 
places, so I think we can be careful there.  Also note that the field 
initially_valid is likewise usually true.


A couple of other notes on patch 0001:

Update information_schema table_constraint.enforced (see 
src/backend/catalog/information_schema.sql and 
doc/src/sgml/information_schema.sgml).


The handling of merging check constraints seems incomplete.  What should 
be the behavior of this:


=> create table p1 (a int check (a > 0) not enforced);
CREATE TABLE
=> create table c1 (a int check (a > 0) enforced) inherits (p1);
CREATE TABLE

Or this?

=> create table p2 (a int check (a > 0) enforced);
CREATE TABLE
=> create table c2 () inherits (p1, p2);
CREATE TABLE

Should we catch these and error?





Re: Virtual generated columns

2024-12-03 Thread Peter Eisentraut

On 28.11.24 10:35, Peter Eisentraut wrote:

On 12.11.24 17:08, Peter Eisentraut wrote:

On 11.11.24 12:37, jian he wrote:
On Wed, Nov 6, 2024 at 12:17 AM Peter Eisentraut 
 wrote:


New patch version.  I've gone through the whole thread again and looked
at all the feedback and various bug reports and test cases and made 
sure

they are all addressed in the latest patch version.  (I'll send some
separate messages to respond to some individual messages, but I'm
keeping the latest patch here.)


just a quick note about the poor error message before you rebase.

src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS 
(2) ;

ERROR:  unrecognized constraint subtype: 4
src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
(2) stored;
ERROR:  unrecognized constraint subtype: 4
src7=# create domain d_fail as int4 constraint cc GENERATED ALWAYS AS
(2) virtual;
ERROR:  unrecognized constraint subtype: 4

reading gram.y, the Constraint struct doesn't seem able to distinguish
whether we are creating a domain or a table.
I cannot find a way to error out in gram.y,

so we have to error out in DefineDomain.


This appears to be a very old problem independent of this patch.  I'll 
take a look at fixing it.


Here is a patch.

I'm on the fence about taking out the default case.  It does catch the 
missing enum values, and I suppose if the struct arrives in 
DefineDomain() with a corrupted contype value that is none of the enum 
values, then we'd just do nothing with it.  Maybe go ahead with this, 
but for backpatching leave the default case in place?


I have committed this, just to master for now.





Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2024-12-03 Thread Matthias van de Meent
On Thu, 28 Nov 2024 at 22:09, Alena Rybakina  wrote:
>
> Hi!
>
> On 27.11.2024 16:36, Matthias van de Meent wrote:
>> On Wed, 27 Nov 2024 at 14:22, Alena Rybakina  
>> wrote:
>>> Sorry it took me so long to answer, I had some minor health complications
>>>
>>> On 12.11.2024 23:00, Peter Geoghegan wrote:
>>>
>>> On Sun, Nov 10, 2024 at 2:00 PM Alena Rybakina
>>>  wrote:
>>>
>>> Or maybe I was affected by fatigue, but I don’t understand this point, to 
>>> be honest. I see from the documentation and your first letter that it 
>>> specifies how many times in total the tuple search would be performed 
>>> during the index execution. Is that not quite right?
>>>
>>> Well, nodes that appear on the inner side of a nested loop join (and
>>> in a few other contexts) generally have their row counts (and a few
>>> other things) divided by the total number of executions. The idea is
>>> that we're showing the average across all executions of the node -- if
>>> the user wants the true absolute number, they're expected to multiply
>>> nrows by nloops themselves. This is slightly controversial behavior,
>>> but it is long established (weirdly, we never divide by nloops for
>>> "Buffers").
>>>
>>> I understand what you mean, and I have faced this situation before, when I saw 
>>> far more actual rows than there could possibly be, caused by the number of 
>>> tuples scanned per cycle. [0]
>>>
>>> [0] 
>>> https://www.postgresql.org/message-id/flat/9f4a159b-f527-465f-b82e-38b4b7df8...@postgrespro.ru
>>>
>>> Initial versions of my patch didn't do this. The latest version does
>>> divide like this, though. In general it isn't all that likely that an
>>> inner index scan would have more than a single primitive index scan,
>>> in any case, so which particular behavior I use here (divide vs don't
>>> divide) is not something that I feel strongly about.
>>>
>>> I think we should divide them because by dividing the total buffer usage by 
>>> the number of loops, user finds the average buffer consumption per loop. 
>>> This gives them a clearer picture of the resource intensity per basic unit 
>>> of work.
>> I disagree; I think the whole "dividing by number of loops and
>> rounding up to integer" was the wrong choice for tuple count, as that
>> makes it difficult if not impossible to determine the actual produced
>> count when it's less than the number of loops. Data is lost in the
>> rounding/processing, and I don't want to have lost that data.
>>
>> Same applies for ~scans~ searches: If we do an index search, we should
>> show it in the count as total sum, not partial processed value. If a
>> user is interested in per-loopcount values, then they can derive that
>> value from the data they're presented with; but that isn't true when
>> we present only the divided-and-rounded value.
>>
> To be honest, I didn't understand how it would be helpful, because there
> is an uneven distribution of buffer usage from cycle to cycle, isn't there?

I'm sorry, I don't quite understand what you mean by cycle here.

> I thought that dividing the memory by the number of cycles helps us to
> normalize the metric to account for the repeated iterations. This gives
> us a clearer picture of the resource intensity per basic unit of work,
> rather than just the overall total. Each loop may consume a different
> amount of buffer space, but by averaging it out, we're smoothing those
> fluctuations into a more representative measure.

The issue I see here is that users can get those numbers from raw
results, but they can't get the raw (more accurate) data from the
current output; if we only show processed data (like the 'rows' metric
in text output, which is a divided-and-rounded value) you can't get
the original data back with good confidence.

E.g., I have a table 'twentyone' with values 1..21, and I left join it
against a table 'ten' with values 1..10. The current text explain output
- once the planner is convinced to execute (nested loop left join
(seqscan 'twentyone'), (index scan 'ten')) - will show that the index
scan path produced 0 rows, which is clearly wrong, and I can't get the
original value back with accuracy by multiplying rows with loops due
to the rounding.
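
A minimal sketch of that setup (the table definitions here are my own, and
additional enable_* settings may be needed to convince the planner to pick
the nested loop with an inner index scan):

    CREATE TABLE twentyone AS SELECT i AS a FROM generate_series(1, 21) i;
    CREATE TABLE ten       AS SELECT i AS a FROM generate_series(1, 10) i;
    CREATE INDEX ON ten (a);
    ANALYZE twentyone, ten;

    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    -- only 10 of the 21 loops find a match, so the per-loop average
    -- reported for the inner index scan rounds down to rows=0
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM twentyone t LEFT JOIN ten USING (a);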

> Moreover, this does not correspond to another metric that is nearby -
> the number of lines processed by the algorithm for the inner node.

It doesn't have much correspondence to that anyway, as we don't count
lines that were accessed but didn't match index quals, nor heap tuples
filtered by rechecks, in the `rows` metric.

> Will
> not the user who evaluates the query plan be confused by such a discrepancy?

I think users will be more confused about a discrepancy between buffer
accesses and index searches (which are more closely related to
each other) than a discrepancy between index searches and
rounded-average-number-of-tuples-produced-per-loop, or the discrepancy
between not-quite-average-tuples-produced-per-loop vs the "heap
fetches" counter of an IndexOnlyScan, etc.

Kind regards,

Matthias van de Meent

Re: Drop back the redundant "Lock" suffix from LWLock wait event names

2024-12-03 Thread Alvaro Herrera
On 2024-Dec-02, Alvaro Herrera wrote:

> Oh, you're right, this was unintentional and unnoticed.  I'll push this
> shortly, to both 17 and master.

Pushed, thanks Christophe and Bertrand.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Re: shared-memory based stats collector - v70

2024-12-03 Thread Andres Freund
Hi,

On 2024-12-03 13:37:48 +0300, Anton A. Melnikov wrote:
> Found a place in the code of this patch that is unclear to me:
> https://github.com/postgres/postgres/blob/1acf10549e64c6a52ced570d712fcba1a2f5d1ec/src/backend/utils/activity/pgstat.c#L1658
> 
> Owing to the assert(), the next if() should never be executed, but the comment above 
> says the opposite.
> Is this assert really needed here? And if so, for what?

It's code that should be unreachable. But in case it is encountered in a
production scenario, it's not worth taking down the server for it.

Greetings,

Andres Freund




Re: code contributions for 2024, WIP version

2024-12-03 Thread Nathan Bossart
On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>> Donning my asbestos underwear, I remain yours faithfully,
> 
> Thanks for taking the time to compile all that.  That's really nice.

+1, I always look forward to the blog post.

-- 
nathan




Re: code contributions for 2024, WIP version

2024-12-03 Thread Robert Haas
On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart  wrote:
> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
> > On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
> >> Donning my asbestos underwear, I remain yours faithfully,
> >
> > Thanks for taking the time to compile all that.  That's really nice.
>
> +1, I always look forward to the blog post.

Thanks, glad it's appreciated.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-03 Thread Devulapalli, Raghuveer
> Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch is
> committed?

Rebased to master branch. 

Raghuveer


v8-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch
Description: v8-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmarki.patch


v8-0002-Refactor-consolidate-x86-ISA-and-OS-runtime-check.patch
Description: v8-0002-Refactor-consolidate-x86-ISA-and-OS-runtime-check.patch


v8-0003-Add-AVX-512-CRC32C-algorithm-with-a-runtime-check.patch
Description: v8-0003-Add-AVX-512-CRC32C-algorithm-with-a-runtime-check.patch


Re: pg_stat_statements and "IN" conditions

2024-12-03 Thread Dmitry Dolgov
> On Thu, Nov 28, 2024 at 08:36:47PM GMT, Kirill Reshke wrote:
>
> Hi! Can you please send a rebased version of this?

Sure, here it is.
>From 2de1af6489d46449b2884a9194515cd1090d5e8c Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 3 Dec 2024 14:55:45 +0100
Subject: [PATCH v22 1/4] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is jumbled. In certain situations it's undesirable, especially
if the list becomes too large.

Make an array of Const expressions contribute only the first/last
elements to the jumble hash. Allow to enable this behavior via the new
pg_stat_statements parameter query_id_const_merge with the default value off.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane,
Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier, Sutou Kouhei
Tested-by: Chengxi Sun, Yasuo Honda
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/merging.out   | 167 ++
 contrib/pg_stat_statements/meson.build|   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  74 +++-
 contrib/pg_stat_statements/sql/merging.sql|  58 ++
 doc/src/sgml/pgstatstatements.sgml|  57 +-
 src/backend/nodes/gen_node_support.pl |  21 ++-
 src/backend/nodes/queryjumblefuncs.c  | 107 ++-
 src/backend/postmaster/launch_backend.c   |   3 +
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/include/nodes/nodes.h |   3 +
 src/include/nodes/primnodes.h |   2 +-
 src/include/nodes/queryjumble.h   |   9 +-
 13 files changed, 479 insertions(+), 26 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/merging.out
 create mode 100644 contrib/pg_stat_statements/sql/merging.sql

diff --git a/contrib/pg_stat_statements/Makefile 
b/contrib/pg_stat_statements/Makefile
index 241c02587b..eef8d69cc4 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config 
$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
user_activity wal entry_timestamp privileges extended \
-   parallel cleanup oldextversions
+   parallel cleanup oldextversions merging
 # Disabled because these tests require 
"shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/merging.out 
b/contrib/pg_stat_statements/expected/merging.out
new file mode 100644
index 00..1e58283afe
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/merging.out
@@ -0,0 +1,167 @@
+--
+-- Const merging functionality
+--
+CREATE EXTENSION pg_stat_statements;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging is performed, as a baseline result
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+query  
  | calls 
+-+---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) 
  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, 
$10)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, 
$10, $11) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t
  | 1
+(4 rows)
+
+-- Normal scenario, too many simple constants for an IN query
+SET pg_stat_statements.query_id_const_merge = on;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+   query| calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1)  | 1
+ SELECT * FROM test_merge WHERE id IN (...) | 1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1

Re: meson missing test dependencies

2024-12-03 Thread Nazir Bilal Yavuz
Hi,

On Mon, 2 Dec 2024 at 22:27, Peter Eisentraut  wrote:
>
> On 02.12.24 11:50, Nazir Bilal Yavuz wrote:
> > I applied your patches and the cube example worked. But when I edited
> > 'test_json_parser_perf.c' and 'test_json_parser_incremental.c', meson
> > did not rebuild. I used the 'meson test -C build --suite setup --suite
> > test_json_parser' command to test test_json_parser. Did I do something
> > wrong?
>
> Seems to work for me. I edited test_json_parser_incremental.c and then ran
>
> $ meson test -C build --suite setup --suite test_json_parser -v
> ninja: Entering directory `/Users/peter/devel/postgresql/postgresql/build'
> [2/2] Linking target
> src/test/modules/test_json_parser/test_json_parser_incremental
> 1/7 postgresql:setup / tmp_install
> RUNNING
> ...
>
> Without my patch, you don't get the "Linking target ..." output.

It is working fine now, I must have done something wrong before. Sorry
for the noise.


--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Amit Kapila
On Tue, Dec 3, 2024 at 12:17 PM Michael Paquier  wrote:
>
> So how about the attached that introduces a FreePublication() matching
> with GetPublication(), used to do the cleanup?  Feel free to comment.
>

As you would have noted I am fine with the fix on these lines but I
suggest holding it till we conclude the memory context point raised by
me today. It is possible that we are still leaking some memory in
other related scenarios.

-- 
With Regards,
Amit Kapila.




Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Michael Paquier
On Tue, Dec 03, 2024 at 02:01:00PM +0530, Amit Kapila wrote:
> As you would have noted I am fine with the fix on these lines but I
> suggest holding it till we conclude the memory context point raised by
> me today. It is possible that we are still leaking some memory in
> other related scenarios.

Sure.  I've not seen anything else, but things are complicated enough
in this code that a path could always have been missed.
--
Michael


signature.asc
Description: PGP signature


Re: shared-memory based stats collector - v70

2024-12-03 Thread Anton A. Melnikov

Hi!

Found a place in the code of this patch that is unclear to me:
https://github.com/postgres/postgres/blob/1acf10549e64c6a52ced570d712fcba1a2f5d1ec/src/backend/utils/activity/pgstat.c#L1658

Owing to the assert(), the next if() should never be executed, but the comment above 
says the opposite.
Is this assert really needed here? And if so, for what?

Would be glad for clarification.


With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Incorrect result of bitmap heap scan.

2024-12-03 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Dec 2, 2024 at 10:15 AM Matthias van de Meent
>  wrote:
>> The running theory is that bitmap executor nodes incorrectly assume
>> that the rows contained in the bitmap all are still present in the
>> index, and thus assume they're allowed to only check the visibility
>> map to see if the reference contained in the bitmap is visible.
>> However, this seems incorrect: Note that index AMs must hold at least
>> pins on the index pages that contain their results when those results
>> are returned by amgettuple() [0], and that amgetbitmap() doesn't do
>> that for all TIDs in the bitmap; thus allowing vacuum to remove TIDs
>> from the index (and later, heap) that are still present in the bitmap
>> used in the scan.

> This theory seems very believable.

I'm not convinced.  I think there are two assumptions underlying
bitmap scan:

1. Regardless of index contents, it's not okay for vacuum to remove
tuples that an open transaction could potentially see.  So the heap
tuple will be there if we look, unless it was already dead (in which
case it could have been replaced, so we have to check visibility of
whatever we find).

2. If the page's all-visible bit is set, there has been no recent
change in its contents, so we don't have to look at the page.
"Recent" is a bit squishily defined, but again it should surely
cover outdating or removal of a tuple that an open transaction
could see.

If this is not working, I am suspicious that somebody made page
freezing too aggressive somewhere along the line.

Whether that's true or not, it seems like it'd be worth bisecting
to see if we can finger a commit where the behavior changed (and
the same goes for the question of why-isnt-it-an-IOS-scan).  However,
the reproducer seems to have quite a low failure probability for me,
which makes it hard to do bisection testing with much confidence.
Can we do anything to make the test more reliable?  If I'm right
to suspect autovacuum, maybe triggering lots of manual vacuums
would improve the odds?

regards, tom lane




Re: Incorrect result of bitmap heap scan.

2024-12-03 Thread Andres Freund
Hi,

On 2024-12-02 11:07:38 -0500, Peter Geoghegan wrote:
> Clearly it would be wildly impractical to do the "buffer pin interlock
> against TID recycling" thing in the context of bitmap scans.

That's certainly true if we do the VM check at the time of the bitmap heap
scan.

What if we did it whenever we first enter a block into the TID bitmap? There's
sufficient padding space in PagetableEntry to store such state without
increasing memory usage.

That'd probably need some API evolution, because we'd only know after entering
into the tidbitmap that we need to check the VM, we'd need to index pins
during the VM checking and then update PagetableEntry with the result of the
VM probe.

I think, but am not certain, that it'd be sufficient to do the VM check the
first time an index scan encounters a block. If a concurrent vacuum would mark
the heap page all-visible after we encountered it first, we'd do "real"
visibility checks, because the first VM lookup won't have it as all
visible. And in the opposite situation, where we find a page all-visible in
the VM, which then subsequently gets marked not-all-visible, normal snapshot
semantics would make it safe to still consider the contents all-visible,
because update/delete can't be visible to "us".

Greetings,

Andres Freund




Re: relfilenode statistics

2024-12-03 Thread Bertrand Drouvot
Hi,

On Fri, Nov 29, 2024 at 08:52:13PM +0500, Kirill Reshke wrote:
> On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
>  wrote:
> > On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:
> > > If we don’t have the relation OID when writing buffers out, can we
> > > just store oid to buffertag mapping somewhere and use it?
> >
> > Do you mean add the relation OID into the BufferTag? While that could 
> > probably
> > be done from a technical point of view (with probably non negligible amount
> > of refactoring), I can see those cons:
> 
> Not exactly, what i had in mind was a separate hashmap into shared
> memory, mapping buffertag<>oid.

I see.

> > 2. Probably lot of refactoring
> > 3. This new member would be there "only" for stats and reporting purpose as
> > it is not needed at all for buffer related operations
> 
> To this design, your points 2&3 apply.

That said, it might also help for DropRelationBuffers() where we need to scan
the entire buffer pool (there is an optimization in place though). We could
imagine the buffertag as the key and the relation OID as the value, and each entry
would have next/prev pointers linking to other BufferTags with the same OID.

That's probably much more refactoring (and more invasive) than the initial idea
in this thread but could lead to multiple benefits. I'm not very familiar with
the "buffer" area of the code and would also need to study the performance
impact of maintaining this new hash map.

Do you and/or others have any thoughts/ideas about it?
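
To make the shape of that idea a bit more concrete, here is a purely
illustrative sketch of such an entry (none of these names exist in the
tree today; it only shows what the key/value and per-relation links could
look like):

    #include "postgres.h"
    #include "storage/buf_internals.h"   /* BufferTag */

    /*
     * Hypothetical shared hash entry: BufferTag as key, owning relation's
     * OID as value, plus links chaining all entries of the same relation
     * so that e.g. DropRelationBuffers() could walk them instead of
     * scanning the whole buffer pool.
     */
    typedef struct BufferTagOidEntry
    {
        BufferTag   tag;            /* hash key: identifies the buffer */
        Oid         reloid;         /* owning relation's OID */
        int         same_rel_next;  /* next entry with this reloid, or -1 */
        int         same_rel_prev;  /* previous entry with this reloid, or -1 */
    } BufferTagOidEntry;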

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Alvaro Herrera
On 2024-Dec-03, Michael Paquier wrote:

> So how about the attached that introduces a FreePublication() matching
> with GetPublication(), used to do the cleanup?  Feel free to comment.

I think this doubles down on bad design in the logical replication code,
or at least it goes against what we do almost everywhere else in backend
code.  We should do less freeing, more context deleting/resetting.
(Storing stuff in CacheMemoryContext was surely a mistake.)

If you don't like the idea of a static memcxt in the one block where
it's needed, I propose to store a new memcxt in PGOutputData, to be used
exclusively for publications, with a well defined lifetime.  I'm against
reusing data->cachecxt, because the lifetime of that is 100% unclear.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: [PoC] Reducing planning time when tables have many partitions

2024-12-03 Thread Alvaro Herrera
Hello,

On 2024-Dec-03, Ashutosh Bapat wrote:

> For one of the earlier versions, I had reported a large memory
> consumption in all cases and increase in planning time for Assert
> enabled builds. How does the latest version perform in those aspects?

I don't think planning time in assert-enabled builds is something we
should worry about, at all.  Planning time in production builds is the
important one.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
https://twitter.com/thingskatedid/status/1456027786158776329




Doc: fix the rewrite condition when executing ALTER TABLE ADD COLUMN

2024-12-03 Thread Masahiro Ikeda

Hi,

The documentation seems to overlook the rewrite condition
when executing ALTER TABLE ADD COLUMN.

The current document states that a volatile DEFAULT will
trigger a rewrite of the table and its indexes. However, the
table and its indexes will also be rewritten when an IDENTITY
column is added, or when a column with a domain data type that
has constraints is added.

What do you think?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

-- psql -f test.sql -e --no-psqlrc

DROP TABLE IF EXISTS test;
CREATE TABLE test (id1 int);
INSERT INTO test (SELECT generate_series(1,100));
DELETE FROM test WHERE id1 < 10;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will not rewrite'
ALTER TABLE test ADD COLUMN id2 int;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will not rewrite'
ALTER TABLE test ADD COLUMN id3 int default 100;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will not rewrite: stable proc'
ALTER TABLE test ADD COLUMN id4 timestamp DEFAULT now();
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will rewrite: volatile proc'
ALTER TABLE test ADD COLUMN id5 timestamp DEFAULT clock_timestamp();
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will rewrite: identity'
ALTER TABLE test ADD COLUMN id6 int GENERATED ALWAYS AS (id1 * 2) STORED;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will rewrite: identity'
ALTER TABLE test ADD COLUMN id7 int GENERATED BY DEFAULT AS IDENTITY;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will rewrite: domain with constraint'
DROP DOMAIN IF EXISTS test_domain;
CREATE DOMAIN test_domain int CONSTRAINT check_test CHECK (VALUE IN (1, 2, 3));
ALTER TABLE test ADD COLUMN id8 test_domain;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

\echo 'will not rewrite: only constraint'
ALTER TABLE test ADD COLUMN id9 int CONSTRAINT check_test CHECK (id9 IN (1, 2, 
3));
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';

SELECT * FROM test LIMIT 10;

DROP TABLE IF EXISTS test;
DROP TABLE
CREATE TABLE test (id1 int);
CREATE TABLE
INSERT INTO test (SELECT generate_series(1,100));
INSERT 0 100
DELETE FROM test WHERE id1 < 10;
DELETE 9
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17038
(1 row)

will not rewrite
ALTER TABLE test ADD COLUMN id2 int;
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17038
(1 row)

will not rewrite
ALTER TABLE test ADD COLUMN id3 int default 100;
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17038
(1 row)

will not rewrite: stable proc
ALTER TABLE test ADD COLUMN id4 timestamp DEFAULT now();
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17038
(1 row)

will rewrite: volatile proc
ALTER TABLE test ADD COLUMN id5 timestamp DEFAULT clock_timestamp();
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17044
(1 row)

will rewrite: identity
ALTER TABLE test ADD COLUMN id6 int GENERATED ALWAYS AS (id1 * 2) STORED;
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17048
(1 row)

will rewrite: identity
ALTER TABLE test ADD COLUMN id7 int GENERATED BY DEFAULT AS IDENTITY;
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17053
(1 row)

will rewrite: domain with constraint
DROP DOMAIN IF EXISTS test_domain;
DROP DOMAIN
CREATE DOMAIN test_domain int CONSTRAINT check_test CHECK (VALUE IN (1, 2, 3));
CREATE DOMAIN
ALTER TABLE test ADD COLUMN id8 test_domain;
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17059
(1 row)

will not rewrite: only constraint
ALTER TABLE test ADD COLUMN id9 int CONSTRAINT check_test CHECK (id9 IN (1, 2, 
3));
ALTER TABLE
SELECT oid, relfilenode FROM pg_class WHERE relname = 'test';
  oid  | relfilenode 
---+-
 17038 |   17059
(1 row)

SELECT * FROM test LIMIT 10;
 id1 | id2 | id3 |id4 |id5 | 
id6 | id7 | id8 | id9 
-+-+-+++-+-+-+-
  10 | | 100 | 2024-12-03 15:08:24.251233 | 2024-12-03 15:08:24.261365 |  
20 |   1 | |
  11 | | 100 | 2024-12-03 15:08:24.251233 | 2024-12-03 15:08:24.261427 |  
22 |   2 | |
  12 | | 100 | 2024-12-03 15:08:24.251233 | 2024-12-03 15:08:24.261432 |  
24 |   3 | |
  13 | | 100 | 2024-12-03 15:08:24.25

Re: Don't overwrite scan key in systable_beginscan()

2024-12-03 Thread Peter Eisentraut

On 27.11.24 16:35, Justin Pryzby wrote:

On Wed, Nov 27, 2024 at 04:33:25PM +0100, Peter Eisentraut wrote:

On 26.11.24 14:56, Justin Pryzby wrote:

Since 811af9786b, the palloc'd idxkey's seem to be leaking/accumulating
throughout the command.

I noticed this on the master branch while running ANALYZE on partitioned
table with 600 attributes, even though only 6 were being analyzed.

LOG:  level: 3; BuildRelationExtStatistics: 1239963512 total in 278 blocks; 
5082984 free (296 chunks); 1234880528 used

Several indexes are being scanned many thousands of times.


Hmm, this patch inserts one additional palloc() call per
systable_beginscan().  So it won't have much of an impact for isolated
calls, but for thousands of scans you get thousands of small chunks of
memory.

Does your test case get better if you insert corresponding pfree() calls?


Yes -- I'd already checked.


Ok, I committed a fix that inserts some pfree() calls.  Thanks for the 
report.






Re: An inefficient query caused by unnecessary PlaceHolderVar

2024-12-03 Thread Andrei Lepikhov

On 12/2/24 10:46, Richard Guo wrote:

On Wed, Nov 27, 2024 at 5:45 PM Richard Guo  wrote:

I ended up using 'under the same lowest nulling outer join' to
keep consistent with the wording used elsewhere.  Please see the
updated patch attached.


Commit e032e4c7d computes the nullingrel data for each leaf RTE, and
we can leverage that to determine if the referenced rel is under the
same lowest nulling outer join: we just need to check if the
nullingrels of the subquery RTE are a subset of those of the lateral
referenced rel.  This eliminates the need to introduce
lowest_nullable_side.  Please see attached.
Thanks for drawing attention to e032e4c7d. It is a really helpful 
structure. I remember last year, we discussed [1] one sophisticated 
subquery pull-up technique, and we needed exactly the same data - it was 
too invasive to commit, and we committed only a small part of it. The 
nullingrel_info structure may give this feature one more chance.


A couple of words about your patch. These few lines of code caused a lot 
of discoveries, but in my opinion, they look fine. But I didn't find 
negative tests, where we need to wrap a Var with PHV like the following:


explain (verbose, costs off)
select t1.q1, x from
  int8_tbl t1 left join
  (int8_tbl t2 left join
   lateral (select t2.q2 as x, * from int8_tbl t3) ss on t2.q2 = ss.q1)
  on t1.q1 = t2.q1
order by 1, 2;

If the regression tests don't contain such a check, it would be nice to add one.

[1] 
https://www.postgresql.org/message-id/35c8a3e8-d080-dfa8-2be3-cf5fe702010a%40postgrespro.ru


--
regards, Andrei Lepikhov




Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2

2024-12-03 Thread Andreas Karlsson

Hi,

Here is an updated version of the patch which fixes a few small bugs, 
including making sure it checks the update permission, plus a bug found 
by Joel Jacobsson when it was called via SPI.
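
For anyone who wants a quick feel for the clause, a minimal usage sketch
(the table here is illustrative; the regression tests in the patch are the
authoritative reference for the syntax):

    CREATE TABLE users (id int PRIMARY KEY, name text);
    INSERT INTO users VALUES (1, 'alice');

    -- on conflict, return (and here lock) the existing row instead of
    -- inserting the new one:
    INSERT INTO users VALUES (1, 'bob')
    ON CONFLICT (id) DO SELECT FOR UPDATE
    RETURNING id, name;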


Andreas
From 21ccc735d9d261278564a98d8d2d8137485cd758 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson 
Date: Mon, 18 Nov 2024 00:29:15 +0100
Subject: [PATCH v3] Add support for ON CONFLICT DO SELECT [ FOR ... ]

Adds support for a DO SELECT action for the ON CONFLICT clause, where we
select the tuples and optionally lock them. If the tuples are locked we
check for conflicts, otherwise not.
---
 doc/src/sgml/ref/insert.sgml  |  17 +-
 src/backend/commands/explain.c|  33 ++-
 src/backend/executor/nodeModifyTable.c| 255 +++---
 src/backend/optimizer/plan/createplan.c   |   2 +
 src/backend/parser/analyze.c  |  26 +-
 src/backend/parser/gram.y |  20 +-
 src/backend/parser/parse_clause.c |   7 +
 src/include/nodes/execnodes.h |   2 +
 src/include/nodes/lockoptions.h   |   3 +-
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/parsenodes.h|   4 +-
 src/include/nodes/plannodes.h |   2 +
 src/include/nodes/primnodes.h |   9 +-
 src/test/regress/expected/insert_conflict.out |  97 ++-
 src/test/regress/sql/insert_conflict.sql  |  37 +++
 15 files changed, 459 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 6f0adee1a12..63ffb0d141c 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -36,6 +36,7 @@ INSERT INTO table_name [ AS and conflict_action is one of:
 
 DO NOTHING
+DO SELECT [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } ]
 DO UPDATE SET { column_name = { expression | DEFAULT } |
 ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
 ( column_name [, ...] ) = ( sub-SELECT )
@@ -87,18 +88,24 @@ INSERT INTO table_name [ AS 
The optional RETURNING clause causes INSERT
-   to compute and return value(s) based on each row actually inserted
-   (or updated, if an ON CONFLICT DO UPDATE clause was
-   used).  This is primarily useful for obtaining values that were
+   to compute and return value(s) based on each row actually inserted.
+   If an ON CONFLICT DO UPDATE clause was used,
+   RETURNING also returns tuples which were updated, and
+   in the presence of an ON CONFLICT DO SELECT clause all
+   input rows are returned.  With a traditional INSERT,
+   the RETURNING clause is primarily useful for obtaining
+   values that were
supplied by defaults, such as a serial sequence number.  However,
any expression using the table's columns is allowed.  The syntax of
the RETURNING list is identical to that of the output
-   list of SELECT.  Only rows that were successfully
+   list of SELECT.  If an ON CONFLICT DO SELECT
+   clause is not present, only rows that were successfully
inserted or updated will be returned.  For example, if a row was
locked but not updated because an ON CONFLICT DO UPDATE
... WHERE clause condition was not satisfied, the
-   row will not be returned.
+   row will not be returned.  ON CONFLICT DO SELECT
+   works similarly, except no update takes place.
   
 
   
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a3f1d53d7a5..012f51d1491 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -4601,10 +4601,35 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 
 	if (node->onConflictAction != ONCONFLICT_NONE)
 	{
-		ExplainPropertyText("Conflict Resolution",
-			node->onConflictAction == ONCONFLICT_NOTHING ?
-			"NOTHING" : "UPDATE",
-			es);
+		const char *resolution;
+
+		if (node->onConflictAction == ONCONFLICT_NOTHING)
+			resolution = "NOTHING";
+		else if (node->onConflictAction == ONCONFLICT_UPDATE)
+			resolution = "UPDATE";
+		else if (node->onConflictAction == ONCONFLICT_SELECT)
+		{
+			switch (node->onConflictLockingStrength)
+			{
+case LCS_NONE:
+	resolution = "SELECT";
+	break;
+case LCS_FORKEYSHARE:
+	resolution = "SELECT FOR KEY SHARE";
+	break;
+case LCS_FORSHARE:
+	resolution = "SELECT FOR SHARE";
+	break;
+case LCS_FORNOKEYUPDATE:
+	resolution = "SELECT FOR NO KEY UPDATE";
+	break;
+case LCS_FORUPDATE:
+	resolution = "SELECT FOR UPDATE";
+	break;
+			}
+		}
+
+		ExplainPropertyText("Conflict Resolution", resolution, es);
 
 		/*
 		 * Don't display arbiter indexes at all when DO NOTHING variant
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1161520f76b..d295f685fd6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -138,12 +138,23 @@ static vo

Re: meson missing test dependencies

2024-12-03 Thread Nazir Bilal Yavuz
Hi,

On Tue, 3 Dec 2024 at 04:05, Michael Paquier  wrote:
>
> On Mon, Dec 02, 2024 at 01:50:57PM +0300, Nazir Bilal Yavuz wrote:
> > On Mon, 2 Dec 2024 at 13:11, Peter Eisentraut  wrote:
> >> Is there any reason this was not done yet?
> >
> > This looks useful. I am not sure why this was not done before.
>
> Do you think that it would be possible to automate the addition of the
> dependency link between the module and the tests in some way?  The
> most common case where the only dependency needed by the test is the
> module itself would cover a lot of ground if this could be enforced in
> some fashion.

I tried something like:

diff --git a/meson.build b/meson.build
index 451c3f6d851..ddf6045d472 100644
--- a/meson.build
+++ b/meson.build
@@ -3366,6 +3366,13 @@ foreach test_dir : tests

 t = test_dir[kind]

+t_dep = [get_variable(test_dir['name'], '')]
+message('Adding test = @0@ and dep = @1@'.format(test_dir['name'], t_dep))
+if t_dep != ['']
+  t += {'deps': t_dep}
+endif
+

The code above creates a dependency between the module (*whose name is
the same as the test's*) and the test. This errors out for 'libpq,
ssl, ldap and icu' because the type of these variables is dependency;
but test.depends can only be one of the '[BuildTarget | CustomTarget |
CustomTargetIndex]' types, it can not be a dependency type.

And this can not create a link for the 'scripts, regress, isolation,
authentication, postmaster, recovery, subscription, brin, commit_ts,
gin, test_extensions, test_json_parser, test_misc, test_pg_dump,
typcache, unsafe_tests, worker_spi, kerberos and ecpg' tests. I think
only 'regress, isolation, test_json_parser, worker_spi and ecpg' are
wrongly in this list, as their module names are not the same as their
tests'.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




RE: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Zhijie Hou (Fujitsu)
On Tuesday, December 3, 2024 4:43 AM Alvaro Herrera  
wrote:
> 
> On 2024-Dec-02, Amit Kapila wrote:
> > > you call anything that loads a Publication depending on how the
> > > caller caches its data.  So I would still choose for modifying the
> > > structure on HEAD removing the pstrdup() for the publication name.
> >
> > BTW, the subscription structure also used the name in a similar way.
> > This will make the publication/subscription names handled differently.
> 
> True (with conninfo, slotname, synccommit, and origin).
...
>
> (Why are we storing a string in Subscription->synccommit?)

I think it's because the primary purpose of sub->synccommit is to serve as a
parameter for SetConfigOption() in the apply worker, which requires a string
value. Additionally, the existing function set_config_option() that validates
this option only accepts a string input. Although we could convert
sub->synccommit to an integer, this would necessitate additional conversion
code before passing it to these functions.
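
For context, the call shape looks roughly like this (a simplified sketch;
the GucContext/GucSource values are illustrative assumptions, not copied
from the real apply worker code):

    #include "postgres.h"
    #include "replication/worker_internal.h"   /* MySubscription */
    #include "utils/guc.h"

    /*
     * SetConfigOption() takes the value as a C string, which is why
     * Subscription->synccommit is kept as text rather than an integer.
     */
    static void
    apply_synchronous_commit_setting(void)
    {
        SetConfigOption("synchronous_commit", MySubscription->synccommit,
                        PGC_BACKEND, PGC_S_OVERRIDE);
    }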

Best Regards,
Hou zj


Wrong results with right-semi-joins

2024-12-03 Thread Richard Guo
I ran into $subject and it can be reproduced with the query below.

create temp table tbl_rs(a int, b int);
insert into tbl_rs select i, i from generate_series(1,10)i;
analyze tbl_rs;

set enable_nestloop to off;
set enable_hashagg to off;

select * from tbl_rs t1
where (select a from tbl_rs t2
   where exists (select 1 from
 (select (b in (select b from tbl_rs t3)) as c
  from tbl_rs t4 where t4.a = 1) s
 where c in
  (select t1.a = 1 from tbl_rs t5 union all select true))
   order by a limit 1) >= 0;
 a | b
---+---
 1 | 1
(1 row)

The expected output should be 10 rows, not 1.

I've traced the root cause to ExecReScanHashJoin, where we neglect to
reset the inner-tuple match flags in the hash table for right-semi
joins when reusing the hash table.  It was my oversight in commit
aa86129e1.  Attached is a patch to fix it.

Thanks
Richard


v1-0001-Fix-right-semi-joins-in-HashJoin-rescans.patch
Description: Binary data


Doc: clarify the log message level of the VERBOSE option

2024-12-03 Thread Masahiro Ikeda

Hi,

I'd like to clarify the log message level of the VERBOSE option,
because I mistakenly believed that VACUUM VERBOSE messages could not be
printed in the server log.

Although the hint is mentioned in "Table 19.2. Message Severity Levels",
it is not addressed in the documentation of the other commands:

    INFO: Provides information implicitly requested by the user, e.g.,
    output from VACUUM VERBOSE.

https://www.postgresql.org/docs/devel/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

IIUC, the following SQL commands support the VERBOSE option:
* VACUUM
* ANALYZE
* REINDEX
* COPY
* EXPLAIN

The VERBOSE option for EXPLAIN is not related to the log message level,
and COPY already specifies the log message level. Therefore, I'd like to
add descriptions for the VERBOSE option in the ANALYZE, VACUUM, and
REINDEX commands.
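
For example (an illustrative session; the relation name is arbitrary):

SET log_min_messages = 'info';   -- superuser only; the default is 'warning'
VACUUM (VERBOSE) pg_class;

The INFO-level report is always sent to the client, and it additionally
appears in the server log once log_min_messages includes INFO.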

What do you think?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From 8534196181b1684a83bf1e47a8b3e0506a589087 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Tue, 3 Dec 2024 18:02:10 +0900
Subject: [PATCH v1] Doc: clarify the log message level of the VERBOSE option

---
 doc/src/sgml/ref/analyze.sgml | 3 ++-
 doc/src/sgml/ref/reindex.sgml | 3 ++-
 doc/src/sgml/ref/vacuum.sgml  | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index a0db56ae74..7b9fc76372 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -65,7 +65,8 @@ ANALYZE [ ( option [, ...] ) ] [ VERBOSE
 
  
-  Enables display of progress messages.
+  Prints a detailed analyze activity report for each table as
+  INFO messages.
  
 

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index dcf70d14bc..8d476751a0 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -194,7 +194,8 @@ REINDEX [ ( option [, ...] ) ] { DA
 VERBOSE
 
  
-  Prints a progress report as each index is reindexed.
+  Prints a detailed REINDEX activity report for
+  each index as INFO messages.
  
 

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 9110938fab..62430730a2 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -129,7 +129,8 @@ VACUUM [ ( option [, ...] ) ] [ VERBOSE
 
  
-  Prints a detailed vacuum activity report for each table.
+  Prints a detailed vacuum activity report for each table as
+  INFO messages.
  
 

-- 
2.34.1



Re: Sequence Access Methods, round two

2024-12-03 Thread Kirill Reshke
On Wed, 2 Oct 2024 at 10:29, Michael Paquier  wrote:
>
> Rebased as v29 due to an OID conflict.
Hi!
Looks like this needs another rebase



-- 
Best regards,
Kirill Reshke




Re: Fix for consume_xids advancing XIDs incorrectly

2024-12-03 Thread Kirill Reshke
On Wed, 30 Oct 2024 at 12:01, Yushi Ogiwara
 wrote:
>
> I made a new patch (skip_xid_correctly.diff) that incorporates the
> points we discussed:
>
> 1. Fix the issue that consume_xids consumes nxids+1 XIDs.
> 2. Update lastxid when calling GetTopFullTransactionId() to support
> nxids==1 case.
> 3. Forbid consume_xids when nxids==0.
> 4. Add comments explaining the return values of consume_xids and
> consume_xids_until, and the rationale for incrementing consumed++ when
> GetTopFullTransactionId() is called.
>
> Also, I made another patch (support_blksz_32k.diff) that supports the
> block size == 32K case.
>
> Best,
> Yushi Ogiwara
>


Hi!

There are review comments that need to be addressed in [1].
Patch status is now Waiting on Author.

[1] 
https://www.postgresql.org/message-id/CAD21AoCthHcSQ5zeeivNpiz7HMi_FPG-dtwDDNYUx2oKG36bCQ%40mail.gmail.com

-- 
Best regards,
Kirill Reshke




Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2

2024-12-03 Thread Joel Jacobson
On Tue, Dec 3, 2024, at 09:52, Andreas Karlsson wrote:
> Hi,
>
> Here is an updated version of the patch which fixes a few small bugs, 
> including making sure it checks the update permission plus a bug found 
> by Joel Jacobsson when it was called by SPI.

+1 for this feature.

This seems especially useful when designing idempotent APIs.
Neat to need only a single statement for what currently
requires two separate statements.

Here is an attempt at a realistic example:

CREATE OR REPLACE FUNCTION get_or_create_license_key(_user_id bigint, 
_product_id bigint)
RETURNS UUID BEGIN ATOMIC
INSERT INTO licenses (user_id, product_id)
VALUES (_user_id, _product_id)
ON CONFLICT (user_id, product_id) DO NOTHING;
SELECT license_key
FROM licenses
WHERE user_id = _user_id
AND product_id = _product_id;
END;

This can be simplified into:

CREATE OR REPLACE FUNCTION get_or_create_license_key(_user_id bigint, 
_product_id bigint)
RETURNS UUID BEGIN ATOMIC
INSERT INTO licenses (user_id, product_id)
VALUES (_user_id, _product_id)
ON CONFLICT (user_id, product_id) DO SELECT RETURNING license_key;
END;
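
For example, repeated calls are then idempotent (assuming the licenses
table above has a unique key on (user_id, product_id) and a generated
license_key; the argument values are arbitrary):

SELECT get_or_create_license_key(42, 7);
SELECT get_or_create_license_key(42, 7);  -- same key returned, no extra row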

I've tested the patch successfully and also looked at the code briefly
and at first glance think it looks nice and clean.

/Joel




Re: on_error table, saving error info to a table

2024-12-03 Thread Kirill Reshke
On Tue, 3 Dec 2024 at 09:29, jian he  wrote:
>
> On Tue, Nov 5, 2024 at 6:30 PM Nishant Sharma
>  wrote:
> >
> > Thanks for the v2 patch!
> >
> > I see v1 review comments got addressed in v2 along with some
> > further improvements.
> >
> > 1) v2 Patch again needs re-base.
> >
> > 2) I think we need not worry whether table name is unique or not,
> > table name can be provided by user and we can check if it does
> > not exists then simply we can create it with appropriate columns,
> > if it exists we use it to check if its correct on_error table and
> > proceed.
>
> "simply we can create it with appropriate columns,"
> that would be more work.
> so i stick to if there is a table can use to
> error saving then use it, otherwise error out.
>
>
> >
> > 3) Using #define in between the code? I don't see that style in
> > copyfromparse.c file. I do see such style in other src file. So, not
> > sure if committer would allow it or not.
> > #define ERROR_TBL_COLUMNS   10
> >
> let's wait and see.
>
> > 4) Below appears redundant to me, it was not the case in v1 patch
> > set, where it had only one return and one increment of error as new
> > added code was at the end of the block:-
> > +   cstate->num_errors++;
> > +   return true;
> > +   }
> > cstate->num_errors++;
> >
> changed per your advice.
>
> > I was not able to test the v2 due to conflicts in v2, I did attempt to
> > resolve but I saw many failures in make world.
> >
> I get rid of all the SPI code.
>
> Instead, now I iterate through AttributeRelationId to check if the
> error saving table is ok or not,
> using DirectFunctionCall3 to do the privilege check.
> removed gram.y change, turns out it is not necessary.
> and other kinds of refactoring.
>
> please check attached.


Hi!

1)
> + switch (attForm->attnum)
> + {
> + case 1:
> + (.)
> + case 2:

case 1, 2, 3 ... is too random. Other parts of core tend to use `#define
Anum_<catalog>_<column>` constants. Can we follow this style?

2)
>+ /*
> + * similar to commit a9cf48a
> + * 
> (https://postgr.es/m/7bcfc39d4176faf85ab317d0c26786953646a411.ca...@cybertec.at)
> + * in COPY FROM keep error saving table locks until the transaction end.
> + */

I rarely see other comments referencing commits, and even fewer
referencing a mail archive thread.
Can we just write a proper comment explaining the reasons?


= overall

The patch design is a little dubious to me. We give users a rather
incomprehensible API: to use the on_error *relation* feature, the user
must create a table with the proper schema beforehand.
Maybe a better design would be to auto-create the on_error table if it
does not exist.


Thoughts?

-- 
Best regards,
Kirill Reshke




Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

2024-12-03 Thread Andrew Dunstan


On 2024-11-18 Mo 9:25 AM, Yan Chengpeng wrote:
> Dear PostgreSQL Hackers,
>
> *Problem Description*
>
> I encountered an issue with the B-Tree ordering of `jsonb` values.
> According to the PostgreSQL documentation[1], the ordering should
> follow this precedence:
>
> `Object > Array > Boolean > Number > String > Null`
>
> However, empty arrays (`[]`) are currently considered smaller than
> `null`, which violates the documented rules. This occurs due to
> improper handling of the `rawScalar` flag when comparing arrays in the
> `compareJsonbContainers()` function in
> `src/backend/utils/adt/jsonb_util.c`.

I agree that this is a (10 year old) bug:


-                        if (va.val.array.nElems != vb.val.array.nElems)
+                        else if (va.val.array.nElems != vb.val.array.nElems)



But I don't think we can fix it, because there could well be indexes 
that would no longer be valid if we change the sort order. Given that, I 
think the best we can do is adjust the documentation to mention the anomaly.


So the actual sort order as implemented is, AIUI,


Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array


which is ugly, but fortunately not many apps rely on jsonb sort order.
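
A minimal way to observe the anomaly with the current code:

select '[]'::jsonb < 'null'::jsonb;   -- true today, contradicting the docs
select '[1]'::jsonb < 'null'::jsonb;  -- false, as documented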


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Virtual generated columns

2024-12-03 Thread jian he
-- check constraints
CREATE TABLE gtest20 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) VIRTUAL CHECK (b < 50));
INSERT INTO gtest20 (a) VALUES (10);  -- ok
INSERT INTO gtest20 (a) VALUES (30);  -- violates constraint

ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 100);  --
violates constraint
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);  -- ok
-
The above content is in src/test/regress/sql/generated_virtual.sql;
the comments on the last two queries
seem to conflict with the actual error messages for now.


I added some regress tests for your v10 changes in
src/backend/commands/statscmds.c.
Please check the attachment.


In the SQL tests, maybe the "sanity check of system catalog" query
should be placed at the end of the SQL file, so it has a better chance
of catching errors.
For virtual columns, we can also check the attnotnull and atthasdef
values, like:
SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE
attgenerated IN ('v') and (attnotnull or not atthasdef);


v10-0001-stats_exts-regress-tests.no-cfbot
Description: Binary data


Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Alvaro Herrera
On 2024-Dec-03, Amit Kapila wrote:

> On Tue, Dec 3, 2024 at 4:03 PM Alvaro Herrera  wrote:
> >
> > If you don't like the idea of a static memcxt in the one block where
> > it's needed, I propose to store a new memcxt in PGOutputData, to be used
> > exclusively for publications, with a well defined lifetime.
> 
> +1. This sounds like a way to proceed at least for HEAD. For
> back-branches, it is less clear whether changing PGOutputData is a
> good idea. Can such a change in back branches break any existing
> non-core code (extensions)?

We can put the new member at the end of the struct, it shouldn't damage
anything even if they're using this struct -- which I find pretty
unlikely.  The only way that could break anything is if somebody is
allocating/using arrays of it, which sounds even more unlikely.

If we don't want to accept that risk (for which I see no argument, but
happy to be proven wrong), I would suggest to use the foreach-pfree
pattern Michael first proposed for the backbranches, and the new memory
context in master.  I think this is conducive to better coding overall
as we clean things up in this area.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Serverside SNI support in libpq

2024-12-03 Thread Daniel Gustafsson
> On 25 Jul 2024, at 19:51, Jacob Champion  
> wrote:

The attached rebased version adds proper list reset, a couple of bugfixes
around cert loading and the ability to set ssl_passphrase_command (and reload)
in the hosts file.

> Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
> fundamentally broken. So it might just be FUD, but I'm wondering if we
> should instead be using the SSL_ flavors of the API to reassign the
> certificate chain on the SSL pointer directly, inside the callback,
> instead of trying to set them indirectly via the SSL_CTX_ API.

Maybe, but I would feel better about changing it if I could reproduce the
issues (see below).

> Have you seen any weird behavior like this on your end? I'm starting
> to doubt my test setup...

I've not been able to reproduce any behaviour like what you describe.

> On the plus side, I now have a handful of
> debugging patches for a future commitfest.

Do you have them handy for running tests on this version?

--
Daniel Gustafsson



v2-0001-Serverside-SNI-support-for-libpq.patch
Description: Binary data


Re: Replace current implementations in crypt() and gen_salt() to OpenSSL

2024-12-03 Thread Daniel Gustafsson
> On 24 Nov 2024, at 14:48, Daniel Gustafsson  wrote:

Any other opinions or should we proceed with the proposed GUC?

--
Daniel Gustafsson





Re: Replace current implementations in crypt() and gen_salt() to OpenSSL

2024-12-03 Thread Joe Conway

On 12/3/24 08:59, Daniel Gustafsson wrote:
>> On 24 Nov 2024, at 14:48, Daniel Gustafsson  wrote:
>
> Any other opinions or should we proceed with the proposed GUC?

I'm good with it. Did you want to commit or would you rather I do it?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Changing shared_buffers without restart

2024-12-03 Thread Robert Haas
On Mon, Dec 2, 2024 at 2:18 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I've asked about that in linux-mm [1]. To my surprise, the
> recommendations were to stick to creating a large mapping in advance,
> and slice smaller mappings out of that, which could be resized later.
> The OOM score should not be affected, and hugetlb could be avoided using
> MAP_NORESERVE flag for the initial mapping (I've experimented with that,
> seems to be working just fine, even if the slices are not using
> MAP_NORESERVE).
>
> I guess that would mean I'll try to experiment with this approach as
> well. But what others think? How much research do we need to do, to gain
> some confidence about large shared mappings and make it realistically
> acceptable?

Personally, I like this approach. It seems to me that this opens up
the possibility of a system where the virtual addresses of data
structures in shared memory never change, which I think will avoid an
absolutely massive amount of implementation complexity. It's obviously
not ideal that we have to specify in advance an upper limit on the
potential size of shared_buffers, but we can live with it. It's better
than what we have today; and certainly cloud providers will have no
issue with pre-setting that to a reasonable value. I don't know if we
can port it to other operating systems, but it seems at least possible
that they offer similar primitives, or will in the future; if not, we
can disable the feature on those platforms.

I still think the synchronization is going to be tricky. For example
when you go to shrink a mapping, you need to make sure that it's free
of buffers that anyone might touch; and when you grow a mapping, you
need to make sure that nobody tries to touch that address space before
they grow the mapping, which goes back to my earlier point about
someone doing a lookup into the buffer mapping table and finding a
buffer number that is beyond the end of what they've already mapped.
But I think it may be doable with sufficient cleverness.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE

2024-12-03 Thread Robert Haas
On Mon, Dec 2, 2024 at 6:42 AM Michael Christofides
 wrote:
> Are you mostly seeing query plans that have stumped other people already (eg 
> second or third line support), so perhaps seeing more complex plans than the 
> average user?

Yes. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Amit Kapila
On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier  wrote:
>
> On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:
> > But that suits the current design more. We allocate PGOutputData and
> > other contexts in that structure in a "Logical decoding context". A
> > few of its members (publications, publication_names) residing in
> > totally unrelated contexts sounds odd. In the first place, we don't
> > need to allocate publications under CacheMemoryContext, they should be
> > allocated in PGOutputData->cachectx. However, because we need to free
> > those entirely at one-shot during invalidation processing, we could
> > use a new context as a child context of PGOutputData->cachectx. Unless
> > I am missing something, the current memory context usage appears more
> > like a coding convenience than a thoughtful design decision.
>
> PGOutputData->cachectx has been introduced in 2022 in commit 52e4f0cd4,
> while the decision to have RelationSyncEntry and the publication list
> in CacheMemoryContext gets down to v10 where this logical replication
> has been introduced.  This was a carefully-thought choice back then
> because this is data that belongs to the process cache, so yes, this
> choice makes sense to me.
>

The parent structure (PGOutputData) was stored in the "Logical
decoding context" even in v11. So, how does storing its member
'publications' in CacheMemoryContext a good idea? It is possible that
we are leaking memory while doing decoding via SQL APIs where we free
decoding context after getting changes though I haven't tested the
same.

-- 
With Regards,
Amit Kapila.




RE: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Zhijie Hou (Fujitsu)
On Tuesday, December 3, 2024 4:28 PM Amit Kapila  
wrote:
> 
> On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier 
> wrote:
> >
> > On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:
> > > But that suits the current design more. We allocate PGOutputData and
> > > other contexts in that structure in a "Logical decoding context". A
> > > few of its members (publications, publication_names) residing in
> > > totally unrelated contexts sounds odd. In the first place, we don't
> > > need to allocate publications under CacheMemoryContext, they should
> > > be allocated in PGOutputData->cachectx. However, because we need to
> > > free those entirely at one-shot during invalidation processing, we
> > > could use a new context as a child context of
> > > PGOutputData->cachectx. Unless I am missing something, the current
> > > memory context usage appears more like a coding convenience than a
> thoughtful design decision.
> >
> > PGOutputData->cachectx has been introduced in 2022 in commit
> > PGOutputData->52e4f0cd4,
> > while the decision to have RelationSyncEntry and the publication list
> > in CacheMemoryContext gets down to v10 where this logical replication
> > has been introduced.  This was a carefully-thought choice back then
> > because this is data that belongs to the process cache, so yes, this
> > choice makes sense to me.
> >
> 
> The parent structure (PGOutputData) was stored in the "Logical decoding
> context" even in v11. So, how does storing its member 'publications' in
> CacheMemoryContext a good idea? It is possible that we are leaking memory
> while doing decoding via SQL APIs where we free decoding context after
> getting changes though I haven't tested the same.

Right. I think I have faced this memory leak recently. It might be true for
walsender that 'publications' is a per-process content. But SQL APIs might use
different publication names each time during execution.

I can reproduce the memory leak due to allocating the publication
names under CacheMemoryContext like the following:

--
CREATE PUBLICATION pub FOR ALL TABLES;
CREATE TABLE stream_test(a int);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 
'pgoutput');
INSERT INTO stream_test SELECT generate_series(1, 2, 1);

-- The backend's memory usage increases with each execution of the following
function:
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('isolation_slot', 
NULL, NULL, 'proto_version', '4', 'publication_names', 'pub,pub, ');

Best Regards,
Hou zj




Building an extension in a separate directory from the source files.

2024-12-03 Thread a . mitrokhin

hi.

1. Does postgresql support building an extension in a separate directory
   from the source files? For example, is this design officially 
supported?


~$ mkdir -p $(BUILDDIR)
~$ make -C $(BUILDDIR) -f $(SRCDIR)/Makefile USE_PGXS=1 all

2. If so, what is the correct way to build object files for .c files that
   are located in their own subdirectories inside $(SRCDIR)? For example,
   in $(SRCDIR)/Makefile it is written:
OBJS = src/a.o src/b.o
   I'm getting:
gcc ... -c -o src/a.o ../src/a.c
Assembler messages:
Fatal error: can't create src/a.o: No such file or directory

Patch which fixes this problem for me:

--- src/Makefile.global.in.orig 2024-11-29 16:26:14.350728880 +0700
+++ src/Makefile.global.in  2024-11-29 16:26:54.362809922 +0700
@@ -991,6 +991,7 @@
 # GCC allows us to create object and dependency file in one invocation.
 %.o : %.c
@if test ! -d $(DEPDIR); then mkdir -p $(DEPDIR); fi
+   @if test ! -d $(@D); then mkdir -p $(@D); fi
$(COMPILE.c) -o $@ $< -MMD -MP -MF $(DEPDIR)/$(*F).Po

 %.o : %.cpp





Re: processes stuck in shutdown following OOM/recovery

2024-12-03 Thread Kirill Reshke
On Wed, 6 Mar 2024 at 02:22, Thomas Munro  wrote:
>
> On Sat, Dec 2, 2023 at 3:30 PM Thomas Munro  wrote:
> > On Sat, Dec 2, 2023 at 2:18 PM Thomas Munro  wrote:
> > > On Fri, Dec 1, 2023 at 6:13 PM Justin Pryzby  wrote:
> > > > $ kill -9 2524495; sleep 0.05; pg_ctl -D ./pgdev.dat1 stop -m fast # 
> > > > 2524495 is a child's pid
> > >
> > > > This affects v15, and fails at ) but not its parent.
> > >
> > > Repro'd here.  I had to make the sleep shorter on my system.  Looking...
> >
> > The PostmasterStateMachine() case for PM_WAIT_BACKENDS doesn't tell
> > the checkpointer to shut down in this race case.  We have
> > CheckpointerPID != 0 (because 7ff23c6d27 starts it earlier than
> > before), and FatalError is true because a child recently crashed and
> > we haven't yet received the PMSIGNAL_RECOVERY_STARTED handler that
> > would clear it.  Hmm.
>
> Here is a first attempt at fixing this.  I am not yet 100% sure if it
> is right, and there may be a nicer/simpler way to express the
> conditions.  It passes the test suite, and it fixes the repro that
> Justin posted.  FYI on my machine I had to use sleep 0.005 where he
> had 0.05, as an FYI if someone else is trying to reproduce the issue.

Hi!
This patch needs a rebase. CF entry status is now Waiting on Author.

-- 
Best regards,
Kirill Reshke




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2024-12-03 Thread Peter Geoghegan
On Mon, Dec 2, 2024 at 8:18 PM Peter Geoghegan  wrote:
> Attached is a refined version of a test case I posted earlier on [2],
> decisively proving that GiST index-only scans are in fact subtly
> broken. Right now it fails, showing a wrong answer to a query. The
> patch adds an isolationtest test case to btree_gist, based on a test
> case of Andres'.

I can confirm that the same bug affects SP-GiST. I modified the
original failing GiST isolation test to make it use SP-GiST instead,
proving what I already strongly suspected.

I have no reason to believe that there are any similar problems in
core index AMs other than GiST and SP-GiST, though. Let's go through
them all now: nbtree already does everything correctly, and all
remaining core index AMs don't support index-only scans *and* don't
support scans that don't just use an MVCC snapshot.

-- 
Peter Geoghegan




checksum verification code breaks backups in v16-

2024-12-03 Thread Robert Haas
In 025584a168a4b3002e19350bb8db0ebf1fd10235, which shipped with v17, I
changed the way that a base backup decides whether files have
checksums. At the time, I thought this was harmless refactoring, but
it turns out that it was better than harmless. The old way can cause
pg_basebackup failures. To reproduce, I believe you just need:

1. A cluster created with v16 or earlier with checksums enabled (i.e.
initdb -k).
2. A file inside "global", "base", or a tablespace directory whose
name contains a period which is followed by something that atoi() is
unable to convert to a positive number.

Then pg_basebackup will fail like this:

pg_basebackup: error: could not get COPY data stream: ERROR:  invalid
segment number 0 in file "pg_control.old"

One way to fix this is to back-port
025584a168a4b3002e19350bb8db0ebf1fd10235. Before that commit, we
assumed all files had checksums unless the filename looked like one of the
files that we know isn't supposed to be checksummed. After that
commit, we assume files don't have checksums unless the file format
looks like something we know should be checksummed. That inherently
avoids trying to checksum things we're not expecting to see, such as a
pg_control.old file.

If we want a narrower and thus less-risky fix, we could consider just
adjusting this code here:

segmentno = atoi(segmentpath + 1);
if (segmentno == 0)
ereport(ERROR,
(errmsg("invalid segment number %d in file \"%s\"",
segmentno, filename)));

I think we could just replace the ereport() with verify_checksum =
false (and update the comments). That would leave open the possibility
of trying to checksum some random file with a name like
this_file_should_not_be_here, which we'll interpret as segment 0
because the name does not contain a dot; but if the file were called
this.file.should.not.be.here, we'd not try to checksum it at all
because "here" is not an integer. That's a little random, but it
avoids taking a position on whether
025584a168a4b3002e19350bb8db0ebf1fd10235 is fully correct. It simply
takes the narrower position that we shouldn't fail the entire backup
because we're unable to perform a checksum calculation that in the
worst case would only produce a warning anyway.

At the moment, I think I slightly prefer the narrower fix, but it
might be a little too narrow, since the resultant behavior as
described in the preceding paragraph is somewhat wonky and
nonsensical.

Opinions?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: code contributions for 2024, WIP version

2024-12-03 Thread Robert Haas
On Tue, Dec 3, 2024 at 11:19 AM Joe Conway  wrote:
> While I know you said "you will do you" when it comes to your annual
> blog, there are a number of similar efforts -- top of mind is the
> analysis done (as I understand it) by Daniel Gustafsson and Claire
> Giordano [1], as well as ongoing/recurring analysis done by the
> contributor committee. And there is the adjacent related discussion
> around commit messages/authors. It makes me wonder if there isn't a way
> to make all of our lives easier going forward.

Yes, I'm game to try to figure out how to combine our efforts. I don't
think it's a bad thing that different people have different takes;
this is complicated and looking at it through just one lens is
limiting. But people duplicating work is, well, not so good.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: meson missing test dependencies

2024-12-03 Thread Andres Freund
Hi,

On 2024-12-02 11:10:56 +0100, Peter Eisentraut wrote:
> I have noticed that under meson many tests don't have dependencies on the
> build artifacts that they are testing.  As an example among many, if you
> make a source code change in contrib/cube/cube.c (see patch 0001 for a demo)
> and then run
>
> make -C contrib/cube check
>
> the test run will reflect the changed code, because the "check" targets
> typically depend on the "all" targets.  But if you do this under meson with
>
> meson test -C build --suite setup --suite cube
>
> the code will not be rebuilt first, and the test run will not reflect the
> changed code.

That's unfortunately a very partial fix - because we insist on tests being run
against a temporary install, we don't just need to rebuild the code, we also
need to re-install it. Without that you don't, e.g., see server changes.

Currently the install is done as part of the setup/tmp_install test.

It'd be *much* nicer to create the temporary installation as a build step, but
unfortunately meson currently builds all test dependencies when you build the
default target. Which unfortunately would mean that we'd reinstall the temp
installation whenever 'ninja' is issued. Hence the weird setup with the test
doing the install.  There's recent progress towards improving this in meson,
luckily.

However, it looks like the tmp_install test *does* miss dependencies too and I
see no reason to not fix that.


Medium term I think we should work on being able to run our tests from the
source tree. That'd substantially improve the time to run individual tests, I
think.

Greetings,

Andres Freund

diff --git i/meson.build w/meson.build
index ff3848b1d85..55b751a0c6b 100644
--- i/meson.build
+++ w/meson.build
@@ -3263,6 +3263,7 @@ test('tmp_install',
 priority: setup_tests_priority,
 timeout: 300,
 is_parallel: false,
+depends: all_built,
 suite: ['setup'])

 test('install_test_files',




Re: simplify regular expression locale global variables

2024-12-03 Thread Peter Eisentraut

On 25.10.24 10:16, Andreas Karlsson wrote:
> On 10/15/24 8:12 AM, Peter Eisentraut wrote:
>> We currently have
>>
>>  static PG_Locale_Strategy pg_regex_strategy;
>>  static pg_locale_t pg_regex_locale;
>>  static Oid  pg_regex_collation;
>>
>> but after the recent improvements to pg_locale_t handling, we don't
>> need all three anymore.  All the information we have is contained in
>> pg_locale_t, so we just need to keep that one.  This allows us to
>> structure the locale-using regular expression code more similar to
>> other locale-using code, mainly by provider, avoiding another layer
>> that is specific only to the regular expression code.  The first patch
>> implements that.
>
> Jeff Davis has a patch which also fixes this while refactoring other
> stuff too which I prefer over your patch since it also cleans up the
> collation code in general.
>
> https://www.postgresql.org/message-id/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com


That patch set looks like a good direction.

But it doesn't remove pg_regex_collation, only pg_regex_strategy.  So I 
have split my v2 into two patches, the first removes pg_regex_collation 
and the second removes pg_regex_strategy.  The first patch is useful on 
its own, I think; the second one will presumably be replaced by the 
other patch series above.

From ab19b4d7ab03ba8c515da3e4b389d41941c5ab27 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 3 Dec 2024 16:58:38 +0100
Subject: [PATCH v3 1/2] Remove pg_regex_collation

We can also use the existing pg_regex_locale as the cache key, which
is the only use of this variable.

Discussion: 
https://www.postgresql.org/message-id/flat/b1b92ae1-2e06-4619-a87a-4b4858e547ec%40eisentraut.org
---
 src/backend/regex/regc_pg_locale.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c 
b/src/backend/regex/regc_pg_locale.c
index b75784b6ce5..e07d4a8868c 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -74,7 +74,6 @@ typedef enum
 
 static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
-static Oid pg_regex_collation;
 
 /*
  * Hard-wired character properties for C locale
@@ -254,7 +253,7 @@ pg_set_regex_collation(Oid collation)
 * pg_newlocale_from_collation().
 */
strategy = PG_REGEX_STRATEGY_C;
-   collation = C_COLLATION_OID;
+   locale = 0;
}
else
{
@@ -273,7 +272,6 @@ pg_set_regex_collation(Oid collation)
 */
strategy = PG_REGEX_STRATEGY_C;
locale = 0;
-   collation = C_COLLATION_OID;
}
else if (locale->provider == COLLPROVIDER_BUILTIN)
{
@@ -298,7 +296,6 @@ pg_set_regex_collation(Oid collation)
 
pg_regex_strategy = strategy;
pg_regex_locale = locale;
-   pg_regex_collation = collation;
 }
 
 static int
@@ -628,7 +625,7 @@ typedef int (*pg_wc_probefunc) (pg_wchar c);
 typedef struct pg_ctype_cache
 {
pg_wc_probefunc probefunc;  /* pg_wc_isalpha or a sibling */
-   Oid collation;  /* collation this entry 
is for */
+   pg_locale_t locale; /* locale this entry is for */
struct cvec cv; /* cache entry contents */
struct pg_ctype_cache *next;/* chain link */
 } pg_ctype_cache;
@@ -697,7 +694,7 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int 
cclasscode)
for (pcc = pg_ctype_cache_list; pcc != NULL; pcc = pcc->next)
{
if (pcc->probefunc == probefunc &&
-   pcc->collation == pg_regex_collation)
+   pcc->locale == pg_regex_locale)
return &pcc->cv;
}
 
@@ -708,7 +705,7 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int 
cclasscode)
if (pcc == NULL)
return NULL;
pcc->probefunc = probefunc;
-   pcc->collation = pg_regex_collation;
+   pcc->locale = pg_regex_locale;
pcc->cv.nchrs = 0;
pcc->cv.chrspace = 128;
pcc->cv.chrs = (chr *) malloc(pcc->cv.chrspace * sizeof(chr));

base-commit: 1ba0782ce90cb4261098de59b49ae5cb2326566b
-- 
2.47.1

From 3406390429bbb27931b1e99f9be410213fe05da4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 3 Dec 2024 16:59:51 +0100
Subject: [PATCH v3 2/2] Remove pg_regex_strategy

We only need to keep pg_regex_locale.  This works now because
pg_locale_t now contains all the required information (such as a
ctype_is_c field).  This allows us to structure the locale-using
regular expression code more similar to other locale-using code,
mainly by provider, avoiding another layer that is specific only to
the regular expression code.

Discussion: 
https://www.postgresql.org/message-id/flat/b1b92ae1-2e06-4619-a87a-4b4858e547ec%40eisentraut

Re: Waits monitoring

2024-12-03 Thread Rahul Pandey
Is this supported on any of the public cloud managed postgres services?

On Tue, Dec 3, 2024 at 10:23 AM Robert Haas  wrote:

> On Thu, Sep 10, 2015 at 3:43 AM, Kyotaro HORIGUCHI
>  wrote:
> > Generated lwlocknames.[ch] don't have header comment because
> > generate-lwlocknames.pl writes them into wrong place.
> >
> > lmgr/Makefile looks to have some mistakes.
>
> Fixed.
>
> >  - lwlocknames.c is not generated from (or using) lwlocknames.c
> >so the entry "lwlocknames.c: lwlocknames.h" doesn't looks to
> >be appropriate.
>
> I think that's a pretty standard way of handling a case where a single
> command generates multiple files.
>
> >  - maintainer-clean in lmgr/Makefile forgets to remove lwlocknames.c.
>
> Fixed.
>
> > Perhaps uncommenting in pg_config_manual.h is left alone.
> > (This is not included in the diff below)
>
> Fixed.
>
> And committed.  Thanks for the review, let's see what the buildfarm thinks.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
>


Re: code contributions for 2024, WIP version

2024-12-03 Thread Joe Conway

On 12/3/24 10:44, Robert Haas wrote:
> On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart  wrote:
>> On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
>>> On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
>>>> Donning my asbestos underwear, I remain yours faithfully,
>>>
>>> Thanks for taking the time to compile all that.  That's really nice.
>>
>> +1, I always look forward to the blog post.
>
> Thanks, glad it's appreciated.

It is definitely appreciated.

While I know you said "you will do you" when it comes to your annual 
blog, there are a number of similar efforts -- top of mind is the 
analysis done (as I understand it) by Daniel Gustafsson and Claire 
Giordano [1], as well as ongoing/recurring analysis done by the 
contributor committee. And there is the adjacent related discussion 
around commit messages/authors. It makes me wonder if there isn't a way 
to make all of our lives easier going forward.


[1] 
https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Better error message when --single is not the first arg to postgres executable

2024-12-03 Thread Nathan Bossart
Here's what I have staged for commit.  I didn't like how v4 added the ERROR
to ParseLongOption(), so in v5 I've moved it to the callers of
ParseLongOption(), which is where the existing option validation lives.
This results in a bit of code duplication, but IMHO that's better than
adding nonobvious behavior to ParseLongOption().

Barring additional feedback or cfbot failures, I'm planning on committing
this shortly.

-- 
nathan
From 95af772ac7106d28db0be4505beebcdc5fd8c902 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 3 Dec 2024 10:59:32 -0600
Subject: [PATCH v5 1/1] Provide a better error message for misplaced
 subprogram options.

Before this patch, specifying a special must-be-first option for
dispatching to a subprogram (e.g., postgres -D . --single) would
fail with an error like

FATAL:  --single requires a value

This patch adjusts this error to more accurately complain that the
special option wasn't listed first.  The previous error message now
looks like

FATAL:  --single must be first argument

The subprogram parsing code has been refactored for reuse wherever
ParseLongOption() is called.  Beyond the obvious advantage of
avoiding code duplication, this should prevent similar problems
from appearing when new subprograms are added.  Note that we assume
that none of the subprogram option names match another valid
command-line argument, such as the name of a configuration
parameter.

Ideally, we'd remove this must-be-first requirement for these
options, but after some investigation, we felt that the added
complexity and behavior changes weren't worth it.

Author: Nathan Bossart, Greg Sabino Mullane
Reviewed-by: Greg Sabino Mullane, Peter Eisentraut
Discussion: 
https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com
---
 src/backend/bootstrap/bootstrap.c   | 15 +-
 src/backend/main/main.c | 84 -
 src/backend/postmaster/postmaster.c | 15 +-
 src/backend/tcop/postgres.c | 15 +-
 src/include/postmaster/postmaster.h | 16 ++
 src/tools/pgindent/typedefs.list|  1 +
 6 files changed, 130 insertions(+), 16 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c 
b/src/backend/bootstrap/bootstrap.c
index d31a67599c..37a0838a90 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -224,8 +224,21 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
case 'B':
SetConfigOption("shared_buffers", optarg, 
PGC_POSTMASTER, PGC_S_ARGV);
break;
-   case 'c':
case '-':
+
+   /*
+* Error if the user misplaced a special 
must-be-first option
+* for dispatching to a subprogram. 
parse_subprogram() returns
+* SUBPROGRAM_POSTMASTER if it doesn't find a 
match, so error
+* for anything else.
+*/
+   if (parse_subprogram(optarg) != 
SUBPROGRAM_POSTMASTER)
+   ereport(ERROR,
+   
(errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("--%s must be 
first argument", optarg)));
+
+   /* FALLTHROUGH */
+   case 'c':
{
char   *name,
   *value;
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index aea93a0229..43380bc7ee 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -43,6 +43,19 @@
 const char *progname;
 static bool reached_main = false;
 
+/* names of special must-be-first options for dispatching to subprograms */
+static const char *const SubprogramNames[] =
+{
+   [SUBPROGRAM_CHECK] = "check",
+   [SUBPROGRAM_BOOT] = "boot",
+   [SUBPROGRAM_FORKCHILD] = "forkchild",
+   [SUBPROGRAM_DESCRIBE_CONFIG] = "describe-config",
+   [SUBPROGRAM_SINGLE] = "single",
+   /* SUBPROGRAM_POSTMASTER has no name */
+};
+
+StaticAssertDecl(lengthof(SubprogramNames) == SUBPROGRAM_POSTMASTER,
+"array length mismatch");
 
 static void startup_hacks(const char *progname);
 static void init_locale(const char *categoryname, int category, const char 
*locale);
@@ -57,6 +70,7 @@ int
 main(int argc, char *argv[])
 {
booldo_check_root = true;
+   Subprogram  subprogram = SUBPROGRAM_POSTMASTER;
 
reached_main = true;
 
@@ -179,21 +193,36 @@ main(int argc, char *argv[])
 * Dispatch to one of various subprograms depending on first argument.
 */
 
-   if (argc > 1 && strcmp(argv[1], "--check")

Re: crash with synchronized_standby_slots

2024-12-03 Thread Alvaro Herrera
On 2024-Nov-29, Amit Kapila wrote:

> I tried it on my Windows machine and noticed that ReplicationSlotCtl
> is NULL for syslogger, so the problem doesn't occur. The reason is
> that we don't attach to shared memory in syslogger, so ideally
> ReplicationSlotCtl should be NULL. Because we inherit everything
> through the fork for Linux systems and then later for processes that
> don't want to attach to shared memory, we call PGSharedMemoryDetach()
> from postmaster_child_launch(). The PGSharedMemoryDetach won't
> reinitialize the memory pointed to by ReplicationSlotCtl, so, it would
> be an invalid memory.

Heh, interesting.  I'm not sure if we should try to do something about
invalid pointers being left around after shmem initialization.  Also, is
this the first GUC check_hook that needs to take an LWLock?

Anyway, I have pushed this.

BTW it occurs to me that there might well be some sort of thundering
herd problem if every process needs to run the check_hook when a SIGHUP
is broadcast, and they'll all be waiting on that particular lwlock and
run the same validation locally again and again.  I bet if you have a
few thousand backends (hi Jakub! [1]) it's problematic.  Maybe we need a
different way to validate the GUC, but I don't know what that might be;
but doing the validation once and storing the result in shmem might be
better.

On 2024-Nov-29, Zhijie Hou (Fujitsu) wrote:

> I can also reproduce this bug and confirmed that the bug is fixed
> after applying the patch. In addition to the regression tests, I also
> manually tested the behavior of the postmaster, walsender, and user
> backend after reloading the configuration, and they all work as
> expected.

Many thanks for testing!

[1] 
https://postgr.es/m/cakzirmwrbjcbcj433wv5zjvwt_ouy7bsvx12mbkibu+enzd...@mail.gmail.com

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I apologize for the confusion in my previous responses.
 There appears to be an error."   (ChatGPT)




Re: Better error message when --single is not the first arg to postgres executable

2024-12-03 Thread Alvaro Herrera
On 2024-Dec-03, Nathan Bossart wrote:

> +Subprogram
> +parse_subprogram(const char *name)
> +{

Please add a comment atop this function.  Also, I don't think it should
go at the end of the file; maybe right after main() is a more
appropriate location?

> +/* special must-be-first options for dispatching to various subprograms */
> +typedef enum Subprogram
> +{

I'm not sure this comment properly explains what this enum is used for.
Maybe add a reference to parse_subprogram to the comment?

Thanks

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".




Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-03 Thread Nathan Bossart
On Tue, Dec 03, 2024 at 03:46:16PM +, Devulapalli, Raghuveer wrote:
>> Raghuveer, would you mind rebasing this patch set now that the SSE4.2 patch 
>> is
>> committed?
> 
> Rebased to master branch. 

Thanks!  cfbot is showing a couple of errors [0] [1] [2].  32-bit Linux is
failing to compile with the 64-bit intrinsics.  I think it'd be fine to
limit this optimization to 64-bit builds unless the code can be easily fixed
to work for both.  The macOS build seems to be trying to include the x86
headers, which is producing many errors.  We'll need to make sure that none
of this code is being compiled on ARM machines.  The Windows build seems to
be unable to resolve the pg_comp_crc32c symbol, but it is not immediately
obvious to me why.

[0] https://cirrus-ci.com/task/6023394207989760
[1] https://cirrus-ci.com/task/5460444254568448
[2] https://cirrus-ci.com/task/6586344161411072

-- 
nathan




Re: Enhancing Memory Context Statistics Reporting

2024-12-03 Thread Tomas Vondra
On 12/3/24 20:09, Rahila Syed wrote:
> 
> Hi,
>  
> 
> 
> 
> >     4) I wonder if the function needs to return PID. I mean, the
> caller
> >     knows which PID it is for, so it seems rather unnecessary.
> >
> > Perhaps it can be used to ascertain that the information indeed
> belongs to 
> > the requested pid.
> >
> 
> I find that a bit ... suspicious. By this logic we'd include the input
> parameters in every result, but we don't. So why is this case different?
> 
>  
> This was added to address a review suggestion. I had left it in case
> anyone found it useful 
> for verification. 
> Previously, I included a check for scenarios where multiple processes
> could write to the same 
> shared memory. Now, each process has a separate shared memory space
> identified by 
> pgprocno, making it highly unlikely for the receiving process to see
> another process's memory 
> dump.
> Such a situation could theoretically occur if another process were
> mapped to the same 
> pgprocno, although I’m not sure how likely that is. That said, I’ve
> added a check in the receiver
> to ensure the PID written in the shared memory matches the PID for which
> the dump is 
> requested. 
> This guarantees that a user will never see the memory dump of another
> process.
> Given this, I’m fine with removing the pid column if it helps to make
> the output more readable.
> 

I'd just remove that. I agree it might have been useful with the single
chunk of shared memory, but I think with separate chunks it's not very
useful. And if we can end up with multiple processed getting the same
pgprocno I guess we have way bigger problems, this won't fix that.

> >     5) In the "summary" mode, it might be useful to include info
> about how
> >     many child contexts were aggregated. It's useful to know
> whether there
> >     was 1 child or 1 children. In the regular (non-summary)
> mode it'd
> >     always be "1", probably, but maybe it'd interact with the
> limit in (1).
> >     Not sure.
> >
> > Sure,  I will add this in the next iteration. 
> >
> 
> OK
> 
>  
> I have added this information as a column named "num_agg_contexts",
> which indicates 
> the number of contexts whose statistics have been aggregated/added for a
> particular output.
> 
> In summary mode, all the child contexts of a given level-1 context are
> aggregated, and 
> their statistics are presented as part of the parent context's
> statistics. In this case, 
> num_agg_contexts  provides the count of all child contexts under a given
> level-1 context.
> 
> In regular (non-summary) mode, this column shows a value of 1, meaning
> the statistics 
> correspond to a single context, with all context statistics displayed
> individually. In this mode
> an aggregate result is displayed if the number of contexts exceed the
> DSA size limit. In
> this case the num_agg_contexts will display the number of the remaining
> contexts.
> 

OK

> >      
> > In the patch, since the timeout was set to a high value, pgbench ended
> > up stuck 
> > waiting for the timeout to occur. The failure happens less frequently
> > after I added an
> > additional check for the process's existence, but it cannot be
> entirely 
> > avoided. This is because a process can terminate after we check
> for its
> > existence but 
> > before it signals the client. In such cases, the client will not
> receive
> > any signal.
> >
> 
> Hmmm, I see. I guess there's no way to know if a process responds to us,
> but I guess it should be possible to wake up regularly and check if the
> process still exists? Wouldn't that solve the case you mentioned?
> 
> I have fixed it accordingly in the attached patch by waking up after
> every 5 seconds
> to check if the process exists and sleeping again if the wake-up condition
> is not satisfied.  The number of such tries is limited to 20. So, the
> total wait 
> time can be 100 seconds. I will make the re-tries configurable, inline
> with your 
> suggestion to be able to override the default waiting time.
>  

Makes sense, although 100 seconds seems a bit weird, it seems we usually
pick "natural" values like 60s, or multiples of that. But if it's
configurable, that's not a huge issue.

Could the process wake up earlier than the timeout, say if the wait is
interrupted by a signal (EINTR)? That'd break the "total timeout is
100 seconds", and it would be
better to check that explicitly. Not sure if this can happen, though.

One thing I'd maybe consider is starting with a short timeout, and
gradually increasing it until e.g. 5 seconds (or maybe just 1 second
would be perfectly fine, IMHO). With the current coding it means we
either get the response right away, or wait 5+ seconds. That's a
huge jump. If we start e.g. with 10ms, and then gradually multiply it by
1.2, it means we only wait "0-20% extra" on average.
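
Just to illustrate that schedule (a rough sketch, not a proposal for the
actual constants; 10ms start, factor 1.2, capped at 1s):

SELECT n, least(round(10 * 1.2 ^ n), 1000) AS wait_ms
FROM generate_series(0, 25) AS n;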

But perhaps this is very unlik

Re: Doc: typo in config.sgml

2024-12-03 Thread Bruce Momjian
On Tue, Dec  3, 2024 at 09:03:37PM +0100, Peter Eisentraut wrote:
> On 03.12.24 04:13, Bruce Momjian wrote:
> > On Mon, Dec  2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Now that we have a warning about non-emittable characters in the PDF
> > > > build, do you want me to put back the Latin1 characters in the SGML
> > > > files or leave them as HTML entities?
> > > 
> > > I think going forward we're going to be putting in people's names
> > > in UTF8 --- I was certainly planning to start doing that.  It doesn't
> > 
> > Yes, I expected that, and added an item to my release checklist to make
> > a PDF file and check for the warning.  I don't normally do that.
> > 
> > > matter that much what we do with existing cases, though.
> > 
> > Okay, I think Peter had an opinion but I wasn't sure what it was.
> 
> I would prefer that the parts of commit 641a5b7a144 that replace non-ASCII
> characters with entities are reverted.

Done.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






Re: checksum verification code breaks backups in v16-

2024-12-03 Thread Nathan Bossart
On Tue, Dec 03, 2024 at 11:39:43AM -0500, Robert Haas wrote:
> If we want a narrower and thus less-risky fix, we could consider just
> adjusting this code here:
> 
> segmentno = atoi(segmentpath + 1);
> if (segmentno == 0)
> ereport(ERROR,
> (errmsg("invalid segment number %d in file 
> \"%s\"",
> segmentno, filename)));

I'm only seeing this code in pg_checksums.  Is that what you are proposing
to change?

> I think we could just replace the ereport() with verify_checksum =
> false (and update the comments). That would leave open the possibility
> of trying to checksum some random file with a name like
> this_file_should_not_be_here, which we'll interpret as segment 0
> because the name does not contain a dot; but if the file were called
> this.file.should.not.be.here, we'd not try to checksum it at all
> because "here" is not an integer. That's a little random, but it
> avoids taking a position on whether
> 025584a168a4b3002e19350bb8db0ebf1fd10235 is fully correct. It simply
> takes the narrower position that we shouldn't fail the entire backup
> because we're unable to perform a checksum calculation that in the
> worst case would only produce a warning anyway.

Hm.

> At the moment, I think I slightly prefer the narrower fix, but it
> might be a little too narrow, since the resultant behavior as
> described in the preceding paragraph is somewhat wonky and
> nonsensical.

Yeah, I'm a little worried that it is too narrow.  I was hoping that the
refactoring commit was older, because then I think we'd be much more
willing to back-patch it.  But there are a couple of v17 releases out there
already, so maybe it's not too much of a stretch.  I guess I'm leaning
towards the back-patching idea...

-- 
nathan




Re: Replace current implementations in crypt() and gen_salt() to OpenSSL

2024-12-03 Thread Daniel Gustafsson
> On 3 Dec 2024, at 15:04, Joe Conway  wrote:
> 
> On 12/3/24 08:59, Daniel Gustafsson wrote:
>>> On 24 Nov 2024, at 14:48, Daniel Gustafsson  wrote:
>> Any other opinions or should we proceed with the proposed GUC?
> 
> I'm good with it. Did you want to commit or would you rather I do it?

No worries, I can make it happen.  Since there have been no objections since
this patch was posted, I'll get it committed shortly.

--
Daniel Gustafsson





Re: Remove remnants of "snapshot too old"

2024-12-03 Thread Andres Freund
Hi,

On 2024-12-03 22:06:59 +0200, Heikki Linnakangas wrote:
> I spotted some more remnants of the "snapshot too old" feature that was
> removed in v17. Barring objections, I will commit the attached patch to tidy
> up.

Most of this I agree with. But I'm not sure just removing the toast snapshot
stuff is good - we've had a bunch of bugs where we don't hold a snapshot for
long enough to actually ensure that toast tuples stay alive. It's not legal to
fetch a toast id in one snapshot, release that, and then fetch the toast tuple
with a fresh snapshot.  I think the removal of

> - if (snapshot == NULL)
> - elog(ERROR, "cannot fetch toast data without an active 
> snapshot");

will make it easier to introduce further such bugs?

Greetings,

Andres Freund




Re: Remove remnants of "snapshot too old"

2024-12-03 Thread Tom Lane
Andres Freund  writes:
> On 2024-12-03 22:06:59 +0200, Heikki Linnakangas wrote:
>> I spotted some more remnants of the "snapshot too old" feature that was
>> removed in v17. Barring objections, I will commit the attached patch to tidy
>> up.

> Most of this I agree with. But I'm not sure just removing the toast snapshot
> stuff is good - we've had a bunch of bugs where we don't hold a snapshot for
> long enough to actually ensure that toast tuples stay alive.

Yeah, the stuff concerned with toast snapshots has nothing to do
with that old user-visible feature.  It's to keep us from writing
incorrect code, and it's still (very) needed.

regards, tom lane




Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Michael Paquier
On Wed, Dec 04, 2024 at 11:05:43AM +0530, Amit Kapila wrote:
> On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu)  
> wrote:
>> It appears there is an additional memory leak caused by allocating 
>> publication
>> names within the CacheMemoryContext, as noted in [1]. And it can also be 
>> fixed by
>> creating a separate memctx for publication names under the logical decoding
>> context. I think the approach makes sense since the lifespan of publication
>> names should ideally align with that of the logical decoding context.
> 
> Yeah, I don't think we can go with the proposed patch for the local
> memory context as it is.

Ah, indeed.  I was missing your point.  Would any of you like to write
a patch to achieve that?
--
Michael


signature.asc
Description: PGP signature


Re: Remove remnants of "snapshot too old"

2024-12-03 Thread Heikki Linnakangas

On 04/12/2024 03:24, Tom Lane wrote:
> Andres Freund  writes:
>> On 2024-12-03 22:06:59 +0200, Heikki Linnakangas wrote:
>>> I spotted some more remnants of the "snapshot too old" feature that was
>>> removed in v17. Barring objections, I will commit the attached patch to tidy
>>> up.
>>
>> Most of this I agree with. But I'm not sure just removing the toast snapshot
>> stuff is good - we've had a bunch of bugs where we don't hold a snapshot for
>> long enough to actually ensure that toast tuples stay alive.
>
> Yeah, the stuff concerned with toast snapshots has nothing to do
> with that old user-visible feature.  It's to keep us from writing
> incorrect code, and it's still (very) needed.

Right. Here's a new attempt that keeps that check.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 8ec645a1182969feb95c15db846e4b294d2a40dd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 4 Dec 2024 09:52:41 +0200
Subject: [PATCH v2 1/1] Remove remants of "snapshot too old"

Remove the 'whenTaken' and 'lsn' fields from SnapshotData. After the
removal of the "snapshot too old" feature, they were never set to a
non-zero value.

This largely revert commit 3e2f3c2e423, which added the
OldestActiveSnapshot tracking, and the init_toast_snapshot()
function. That was only required for setting the 'whenTaken' and 'lsn'
fields. SnapshotToast is now a constant again, like SnapshotSelf and
SnapshotAny. I kept a thin get_toast_snapshot() wrapper around
SnapshotToast though, to check that you have a registered or active
snapshot. That's still a useful sanity check.

Reviewed-by: Nathan Bossart, Andres Freund, Tom Lane
Discussion: https://www.postgresql.org/message-id/cd4b4f8c-e63a-41c0-95f6-6e6cd9b83...@iki.fi
---
 contrib/amcheck/verify_heapam.c |  4 +-
 src/backend/access/common/toast_internals.c | 51 -
 src/backend/access/heap/heaptoast.c |  4 +-
 src/backend/storage/ipc/procarray.c |  4 --
 src/backend/utils/time/snapmgr.c| 47 +--
 src/include/access/toast_internals.h|  2 +-
 src/include/utils/snapmgr.h | 13 ++
 src/include/utils/snapshot.h|  3 --
 8 files changed, 25 insertions(+), 103 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 9c74daacee..e16557aca3 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1767,7 +1767,6 @@ check_tuple_attribute(HeapCheckContext *ctx)
 static void
 check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta)
 {
-	SnapshotData SnapshotToast;
 	ScanKeyData toastkey;
 	SysScanDesc toastscan;
 	bool		found_toasttup;
@@ -1791,10 +1790,9 @@ check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta)
 	 * Check if any chunks for this toasted object exist in the toast table,
 	 * accessible via the index.
 	 */
-	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(ctx->toast_rel,
 		   ctx->valid_toast_index,
-		   &SnapshotToast, 1,
+		   get_toast_snapshot(), 1,
 		   &toastkey);
 	found_toasttup = false;
 	while ((toasttup =
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 90d0654e62..1939cfb4d2 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -393,7 +393,6 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 	HeapTuple	toasttup;
 	int			num_indexes;
 	int			validIndex;
-	SnapshotData SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		return;
@@ -425,9 +424,8 @@ toast_delete_datum(Relation rel, Datum value, bool is_speculative)
 	 * sequence or not, but since we've already locked the index we might as
 	 * well use systable_beginscan_ordered.)
 	 */
-	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-		   &SnapshotToast, 1, &toastkey);
+		   get_toast_snapshot(), 1, &toastkey);
 	while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -631,41 +629,28 @@ toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
 }
 
 /* --
- * init_toast_snapshot
+ * get_toast_snapshot
  *
- *	Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
- *	to initialize the TOAST snapshot; since we don't know which one to use,
- *	just use the oldest one.
+ *	Return the TOAST snapshot. Detoasting *must* happen in the same
+ *	transaction that originally fetched the toast pointer.
  */
-void
-init_toast_snapshot(Snapshot toast_snapshot)
+Snapshot
+get_toast_snapshot(void)
 {
-	Snapshot	snapshot = GetOldestSnapshot();
-
 	/*
-	 * GetOldestSnapshot returns NULL if the session has no active snapshots.
-	 * We can get that if, for example, a procedure fetches a toasted value
-	 * into a local variable, commits, and then tries to detoast the value.
-	 * Such

RE: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Zhijie Hou (Fujitsu)
On Wednesday, December 4, 2024 2:22 PM Michael Paquier  
wrote:
> 
> On Wed, Dec 04, 2024 at 11:05:43AM +0530, Amit Kapila wrote:
> > On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
>  wrote:
> >> It appears there is an additional memory leak caused by allocating
> >> publication names within the CacheMemoryContext, as noted in [1]. And
> >> it can also be fixed by creating a separate memctx for publication
> >> names under the logical decoding context. I think the approach makes
> >> sense since the lifespan of publication names should ideally align with 
> >> that
> of the logical decoding context.
> >
> > Yeah, I don't think we can go with the proposed patch for the local
> > memory context as it is.
> 
> Ah, indeed.  I was missing your point.  Would any of you like to write a patch
> to achieve that?

I can try to write a patch if no one else is working on this.

Best Regards,
Hou zj


Re: Partition-wise join with whole row vars

2024-12-03 Thread Alexander Pyhalov

Dmitry Dolgov писал(а) 2024-12-02 16:39:

On Tue, Oct 08, 2024 at 09:24:15AM GMT, Alexander Pyhalov wrote:

Attaching rebased patches.


Just to let you know, looks like CFBot tests are red again, but this
time there are some unexpected differences in some test query plan.


Hi, updated patches. The differences in plans were likely caused by

commit 161320b4b960ee4fe918959be6529ae9b106ea5a
Author: David Rowley 
Date:   Fri Oct 11 17:19:59 2024 +1300

Adjust EXPLAIN's output for disabled nodes

which changed explain output format.
--
Best regards,
Alexander Pyhalov,
Postgres ProfessionalFrom 6531abffe9b9958db49176c96ffaa7a848622b1d Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Wed, 27 Dec 2023 17:31:23 +0300
Subject: [PATCH 6/6] postgres_fdw: fix partition-wise DML

---
 contrib/postgres_fdw/deparse.c|  12 +-
 .../postgres_fdw/expected/postgres_fdw.out| 160 ++
 contrib/postgres_fdw/postgres_fdw.c   |  98 ++-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  38 +
 src/backend/optimizer/util/appendinfo.c   |  32 +++-
 src/include/optimizer/appendinfo.h|   1 +
 6 files changed, 323 insertions(+), 18 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3d4ecbaeecb..94e87ccce28 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2307,7 +2307,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
 
 	appendStringInfoString(buf, "UPDATE ");
 	deparseRelation(buf, rel);
-	if (foreignrel->reloptkind == RELOPT_JOINREL)
+	if (IS_JOIN_REL(foreignrel))
 		appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex);
 	appendStringInfoString(buf, " SET ");
 
@@ -2334,7 +2334,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
 
 	reset_transmission_modes(nestlevel);
 
-	if (foreignrel->reloptkind == RELOPT_JOINREL)
+	if (IS_JOIN_REL(foreignrel))
 	{
 		List	   *ignore_conds = NIL;
 
@@ -2350,7 +2350,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
 	if (additional_conds != NIL)
 		list_free_deep(additional_conds);
 
-	if (foreignrel->reloptkind == RELOPT_JOINREL)
+	if (IS_JOIN_REL(foreignrel))
 		deparseExplicitTargetList(returningList, true, retrieved_attrs,
   &context);
 	else
@@ -2415,10 +2415,10 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 
 	appendStringInfoString(buf, "DELETE FROM ");
 	deparseRelation(buf, rel);
-	if (foreignrel->reloptkind == RELOPT_JOINREL)
+	if (IS_JOIN_REL(foreignrel))
 		appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex);
 
-	if (foreignrel->reloptkind == RELOPT_JOINREL)
+	if (IS_JOIN_REL(foreignrel))
 	{
 		List	   *ignore_conds = NIL;
 
@@ -2433,7 +2433,7 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 	if (additional_conds != NIL)
 		list_free_deep(additional_conds);
 
-	if (foreignrel->reloptkind == RELOPT_JOINREL)
+	if (IS_JOIN_REL(foreignrel))
 		deparseExplicitTargetList(returningList, true, retrieved_attrs,
   &context);
 	else
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bf58cfe9c57..e099f0cf05c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10182,6 +10182,166 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
 (4 rows)
 
 reset enable_sort;
+-- test partition-wise DML
+EXPLAIN (COSTS OFF, VERBOSE)
+UPDATE fprt1 SET b=fprt1.b+1 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0;
+  QUERY PLAN  
+--
+ Update on public.fprt1
+   Foreign Update on public.ftprt1_p1 fprt1_1
+   Foreign Update on public.ftprt1_p2 fprt1_2
+   ->  Append
+ ->  Foreign Update
+   Remote SQL: UPDATE public.fprt1_p1 r3 SET b = (r3.b + 1) FROM public.fprt2_p1 r5 WHERE ((r3.a = r5.b)) AND (((r5.a % 25) = 0))
+ ->  Foreign Update
+   Remote SQL: UPDATE public.fprt1_p2 r4 SET b = (r4.b + 1) FROM public.fprt2_p2 r6 WHERE ((r4.a = r6.b)) AND (((r6.a % 25) = 0))
+(8 rows)
+
+UPDATE fprt1 SET b=fprt1.b+1 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0;
+EXPLAIN (COSTS OFF, VERBOSE)
+UPDATE fprt1 SET b=(fprt1.a+fprt2.b)/2 FROM fprt2 WHERE fprt1.a = fprt2.b AND fprt2.a % 25 = 0;
+  QUERY PLAN   
+---
+ Update on public.fprt1
+   Foreign Update on public.ftprt1_p1 fprt1_1
+   Foreign Update on public.ftprt1_p2 fprt1

Re: Contradictory behavior of array_agg(distinct) aggregate.

2024-12-03 Thread Tom Lane
Konstantin Knizhnik  writes:
> postgres=# create table t(x integer unique);
> CREATE TABLE
> postgres=# insert into t values (null),(null);
> INSERT 0 2
> postgres=# select count(distinct x) from t;
>   count
> ---
>   0
> (1 row)

> postgres=# select array_agg(distinct x) from t;
>   array_agg
> ---
>   {NULL}
> (1 row)

> postgres=# select array_agg(x) from t;
>    array_agg
> -
>   {NULL,NULL}
> (1 row)

I see nothing contradictory here.  "array_agg(distinct x)"
combines the two NULLs into one, which is the normal
behavior of DISTINCT.  "count(distinct x)" does the same
thing --- but count() only counts non-null inputs, so
you end with zero.
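
A minimal sketch of both pieces, run against the same table:

select distinct x from t;   -- one row, containing NULL: DISTINCT merges them
select count(x) from t;     -- 0: count() skips NULL inputs entirely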

regards, tom lane




Re: Contradictory behavior of array_agg(distinct) aggregate.

2024-12-03 Thread Konstantin Knizhnik



On 04/12/2024 9:03 am, Tom Lane wrote:

Konstantin Knizhnik  writes:

postgres=# create table t(x integer unique);
CREATE TABLE
postgres=# insert into t values (null),(null);
INSERT 0 2
postgres=# select count(distinct x) from t;
   count
---
   0
(1 row)
postgres=# select array_agg(distinct x) from t;
   array_agg
---
   {NULL}
(1 row)
postgres=# select array_agg(x) from t;
    array_agg
-
   {NULL,NULL}
(1 row)

I see nothing contradictory here.  "array_agg(distinct x)"
combines the two NULLs into one, which is the normal
behavior of DISTINCT.



Sorry.
It is actually an inconsistency in the basic SQL model's interpretation of
NULL by UNIQUE and DISTINCT, and array_agg just follows this rule.






Re: Contradictory behavior of array_agg(distinct) aggregate.

2024-12-03 Thread David G. Johnston
On Tuesday, December 3, 2024, Konstantin Knizhnik 
wrote:
>
> Is it only me who consider that current behavior of array_agg(distinct)
> contradicts to interpretation of nulls in other cases ("null" is something
> like "unknown" which means that we can not say weather two nulls are the
> same or not).


Null value handling has a few varied behaviors related to it.  This
particular one is best thought of as desirably being consistent with group
by.


>
> This is why it is allowed to insert multiple nulls in the unique column.


Nowadays the DBA gets to choose which of the two behaviors a unique index
applies, which allows indexes to get on the same page as group by et al.,
thus fixing your inconsistency claim here.
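
A minimal sketch of that choice, assuming PostgreSQL 15 or later (where
UNIQUE accepts a NULLS [NOT] DISTINCT clause; the table name here is made up):

create table t_nnd (x integer unique nulls not distinct);
insert into t_nnd values (null);   -- succeeds
insert into t_nnd values (null);   -- rejected as a duplicate key

The default, NULLS DISTINCT, keeps the historical behavior in which both
inserts are accepted.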


> postgres=# create table t(x integer unique);
> CREATE TABLE
> postgres=# insert into t values (null),(null);
> INSERT 0 2
> postgres=# select count(distinct x) from t;
>  count
> ---
>  0
> (1 row)
>
postgres=# select array_agg(distinct x) from t;
>  array_agg
> ---
>  {NULL}
> (1 row)
>
> postgres=# select array_agg(x) from t;
>   array_agg
> -
>  {NULL,NULL}
> (1 row)
>
>
> So what is the number of distinct "x" values in this case? I think that
> according to SQL model - 0 (as count(distinct) returns).


1, but computing that answer requires a non-trivial expression, since the
count aggregate can't do the counting.
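
One way to compute that 1, as a sketch: de-duplicate in a subquery so the
NULL survives into a plain count(*):

select count(*) from (select distinct x from t) as s;   -- 1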


> Why in this case array_agg(distinct x) returns non-empty array?


I can be convinced to see an inconsistency here.  In count though, not
array_agg.  The inability for count to see and thus count null values
cannot be worked around while you can always apply a filter clause to
ignore null values you don’t want.


> Yes, unlike most other aggregates, `array_agg` is not ignoring null values.
> But is it correct to treat two null values as the same (non-distinct)?


Yes, in most contexts where null values are forced to be compared to each
other they do so by defining all null values to be representationally
identical.  See group by for the most authoritative reference case.


> IMHO correct result in this case should be either {} or NULL, either
> {NULL,NULL}.
>

You have a typo here somewhere…

If you want the empty array use a filter clause to make it behave
“strictly”.  Producing a null output seems indefensible.
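
A sketch of the filter-clause approach, with the caveat that an aggregate
over zero remaining rows yields NULL, so a COALESCE is needed if an empty
array is really wanted:

select array_agg(x) filter (where x is not null) from t;                  -- NULL
select coalesce(array_agg(x) filter (where x is not null), '{}') from t;  -- {}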

A policy that all null values are indistinct from one another, which is
the most prevalent one in SQL, makes the most sense to me.  My gut says
that “group by col nulls [not] distinct” would be an undesirable thing to
add to the language.  It was probably added to unique indexes because they
went the wrong way and needed a way to course-correct.

David J.


Re: not null constraints, again

2024-12-03 Thread jian he
hi.

heap_create_with_catalog argument (cooked_constraints):
passed as NIL in function {create_toast_table, make_new_heap}
passed as list_concat(cookedDefaults,old_constraints) in DefineRelation

in DefineRelation we have function call:
MergeAttributes
heap_create_with_catalog
StoreConstraints

StoreConstraints' second argument is cooked_constraints, some of which comes from
DefineRelation->MergeAttributes' old_constraints:
{
stmt->tableElts = MergeAttributes(stmt->tableElts, inheritOids,
stmt->relation->relpersistence, stmt->partbound != NULL, &old_constraints,
&old_notnulls);
}

My understanding from DefineRelation->MergeAttributes is that old_constraints
will only have CHECK constraints.
That means that in the heap_create_with_catalog->StoreConstraints path,
StoreConstraints never actually handles CONSTR_NOTNULL.

heap_create_with_catalog's comments also say:
* cooked_constraints: list of precooked check constraints and defaults

Coverage at https://coverage.postgresql.org/src/backend/catalog/heap.c.gcov.html
also shows the CONSTR_NOTNULL handling in StoreConstraints, which was added by
this thread, never being reached.


My question is: can we remove the CONSTR_NOTNULL handling from StoreConstraints?
We have 3 functions {StoreConstraints, AddRelationNotNullConstraints,
AddRelationNewConstraints} that will call StoreRelNotNull to store a not-null
constraint. That means if we want to bullet-proof against something conflicting
with not-null, we need to add code to check all 3 of these places.
Removing the not-null handling from StoreConstraints seems helpful.


also comments in MergeAttributes:
 * Output arguments:
 * 'supconstr' receives a list of constraints belonging to the parents,
 *updated as necessary to be valid for the child.
 * 'supnotnulls' receives a list of CookedConstraints that corresponds to
 *constraints coming from inheritance parents.

Can we be explicit that "supconstr" is only about CHECK constraints and
"supnotnulls" is only about NOT NULL constraints?




Re: Cannot find a working 64-bit integer type on Illumos

2024-12-03 Thread Thomas Munro
On Wed, Dec 4, 2024 at 2:24 AM Peter Eisentraut  wrote:
> This patch looks good to me.

Thanks!  Pushed.  Let's see what the build farm says.

> In meson.build, this comment seems to be misplaced by accident:

Oops, fixed.

> In c.h, you include <inttypes.h> instead of <stdint.h>.  Is there a
> reason for that?

<stdint.h> was already there for uintptr_t at least.  <inttypes.h> is
needed to be able to define:

#define INT64_FORMAT "%" PRId64




Re: Doc: typo in config.sgml

2024-12-03 Thread Bruce Momjian
On Tue, Dec  3, 2024 at 03:58:20PM -0500, Bruce Momjian wrote:
> On Tue, Dec  3, 2024 at 09:05:45PM +0100, Peter Eisentraut wrote:
> > On 26.11.24 20:04, Bruce Momjian wrote:
> > >   %.pdf: %.fo $(ALL_IMAGES)
> > > - $(FOP) -fo $< -pdf $@
> > > + LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
> > > + awk 'BEGIN { warn = 0 }  { print }/not available in font/ { warn = 1 }  
> > > \
> > > + END { if (warn != 0) print("\nFound characters that cannot be displayed 
> > > in the PDF document") }' 1>&2
> > 
> > Wouldn't that lose the exit code from the fop execution?
> 
> Yikes, I think it would.  Let me work on a fix now.

Fixed in the attached applied patch.  Glad you saw this mistake.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.


diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 4a08b6f433e..9d52715ff4b 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -156,9 +156,11 @@ XSLTPROC_FO_FLAGS += --stringparam img.src.path '$(srcdir)/'
 	$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_FO_FLAGS) --stringparam paper.type USletter -o $@ $^
 
 %.pdf: %.fo $(ALL_IMAGES)
-	LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
-	awk 'BEGIN { warn = 0 }  { print }  /not available in font/ { warn = 1 }  \
-	END { if (warn != 0) print("\nFound characters that cannot be output in the PDF document;  see README.non-ASCII") }' 1>&2
+	@# There is no easy way to pipe output and capture its return code, so output a special string on failure.
+	{ LANG=C $(FOP) -fo $< -pdf $@ 2>&1; [ "$$?" -ne 0 ] && echo "FOP_ERROR"; } | \
+	awk 'BEGIN { warn = 0 }  ! /^FOP_ERROR$$/ { print }  /not available in font/ { warn = 1 }  \
+	END { if (warn != 0) print("\nFound characters that cannot be output in the PDF document;  see README.non-ASCII"); \
+	if ($$0 ~ /^FOP_ERROR$$/) { exit 1} }' 1>&2
 
 
 ##


Re: Serverside SNI support in libpq

2024-12-03 Thread Jacob Champion
On Tue, Dec 3, 2024 at 5:58 AM Daniel Gustafsson  wrote:
> > Have you seen any weird behavior like this on your end? I'm starting
> > to doubt my test setup...
>
> I've not been able to reproduce any behaviour like what you describe.

Hm, v2 is different enough that I'm going to need to check my notes
and try to reproduce again. At first glance, I am still seeing strange
reload behavior (e.g. issuing `pg_ctl reload` a couple of times in a
row leads to the server disappearing without any log messages
indicating why).

> > On the plus side, I now have a handful of
> > debugging patches for a future commitfest.
>
> Do you have them handy for running tests on this version?

I'll work on cleaning them up. I'd meant to contribute them
individually by now, but I got a bit sidetracked...

Thanks!
--Jacob




Re: Missing LWLock protection in pgstat_reset_replslot()

2024-12-03 Thread Anton A. Melnikov

Hi!

The commit message of b36fbd9f8d says that inconsistency may still be possible
because statistics are not completely consistent for a single scan of
pg_stat_replication_slots under concurrent replication slot drop or
creation activity.

It seems there is a reproduction of such a case via an isolation test.
Please see the repslot_stat.spec attached.

In a build with asserts, performing such a test will result in a
crash in the checkpointer during server shutdown.

Please see postmaster.log and bt.txt. There are
wal_level = logical
max_replication_slots = 4
logical_decoding_work_mem = 64kB
autovacuum_naptime = 1d
log_min_messages = DEBUG
in the postgresql.conf.

The assert occurs due to this stats entry:
(gdb) p *ps
$1 = {key = {kind = PGSTAT_KIND_REPLSLOT, dboid = 0, objoid = 0}, dropped = 
true, refcount = {value = 0}, generation = {value = 1}, body = 1099512025088}

Also the commit message says that such an issue is unlikely to be a problem in
practice.
So I doubt whether this should be treated as a bug and fixed?

Would be glad to figure this out.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

setup {
	SELECT slot_name FROM pg_create_logical_replication_slot('test', 'pgoutput');
}

teardown {
	SELECT pg_drop_replication_slot('test');
}

session s1

step select {
		SELECT data
		FROM
		pg_logical_slot_peek_binary_changes('test', NULL, NULL);
}

permutation
	select

permutation
	select
2024-12-04 02:59:14.764 MSK postmaster[189291] DEBUG:  registering background worker "logical replication launcher"
2024-12-04 02:59:14.764 MSK postmaster[189291] DEBUG:  mmap(150994944) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-12-04 02:59:14.768 MSK postmaster[189291] DEBUG:  dynamic shared memory system will support 674 segments
2024-12-04 02:59:14.768 MSK postmaster[189291] DEBUG:  created dynamic shared memory control segment 2406959938 (26976 bytes)
2024-12-04 02:59:14.769 MSK postmaster[189291] DEBUG:  max_safe_fds = 983, usable_fds = 1000, already_open = 7
2024-12-04 02:59:14.769 MSK postmaster[189291] LOG:  starting PostgreSQL 17.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-12-04 02:59:14.769 MSK postmaster[189291] LOG:  listening on Unix socket "/tmp/pg_regress-wgpIvh/.s.PGSQL.55314"
2024-12-04 02:59:14.770 MSK checkpointer[189292] DEBUG:  checkpointer updated shared memory configuration values
2024-12-04 02:59:14.770 MSK startup[189294] LOG:  database system was shut down at 2024-12-04 02:59:14 MSK
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  checkpoint record is at 0/14DDD90
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  redo record is at 0/14DDD90; shutdown true
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  next transaction ID: 738; next OID: 13541
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  oldest unfrozen transaction ID: 730, in database 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  oldest MultiXactId: 1, in database 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  transaction ID wrap limit is 2147484377, limited by database with OID 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  starting up replication slots
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  xmin required by slots: data 0, catalog 0
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  starting up replication origin progress state
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  reading stats file "pg_stat/pgstat.stat"
2024-12-04 02:59:14.772 MSK startup[189294] DEBUG:  removing permanent stats file "pg_stat/pgstat.stat"
2024-12-04 02:59:14.773 MSK startup[189294] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-12-04 02:59:14.773 MSK startup[189294] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2024-12-04 02:59:14.775 MSK postmaster[189291] DEBUG:  starting background worker process "logical replication launcher"
2024-12-04 02:59:14.776 MSK autovacuum launcher[189296] DEBUG:  autovacuum launcher started
2024-12-04 02:59:14.776 MSK postmaster[189291] LOG:  database system is ready to accept connections
2024-12-04 02:59:14.776 MSK logical replication launcher[189297] DEBUG:  logical replication launcher started
2024-12-04 02:59:14.799 MSK postmaster[189291] DEBUG:  forked new backend, pid=189298 socket=9
2024-12-04 02:59:14.818 MSK postmaster[189291] DEBUG:  server process (PID 189298) exited with exit code 0
2024-12-04 02:59:14.829 MSK postmaster[189291] DEBUG:  forked new backend, pid=189300 socket=9
2024-12-04 02:59:14.868 MSK postmaster[189291] DEBUG:  server process (PID 189300

Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Euler Taveira
On Tue, Dec 3, 2024, at 7:41 PM, Michael Paquier wrote:
> On Tue, Dec 03, 2024 at 02:45:22PM +0100, Alvaro Herrera wrote:
> > If we don't want to accept that risk (for which I see no argument, but
> > happy to be proven wrong), I would suggest to use the foreach-pfree
> > pattern Michael first proposed for the backbranches, and the new memory
> > context in master.  I think this is conducive to better coding overall
> > as we clean things up in this area.
> 
> Is it really worth betting on nobody doing something that does a
> sizeof(PGOutputData) for the stable branches?  People like doing fancy
> things, and we would not hear about such problems except if we push
> the button making it a possibility because compiled code suddenly
> breaks after a minor release update of the core engine.

Although, Debian code search [1] says this data structure is not used outside
PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
upgrade (even if it is known that such data structure is from that particular
output plugin -- pgoutput -- and other output plugins generally have its own
data structure). +1 from Alvaro's proposal.

[1] https://codesearch.debian.net/search?q=PGOutputData&literal=0

--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Using Expanded Objects other than Arrays from plpgsql

2024-12-03 Thread Tom Lane
Michel Pelletier  writes:
> Here's a WIP patch for a pgexpanded example in src/test/modules.

I didn't look at your patch yet, but in the meantime here's an update
that takes the next step towards what I promised.

0001-0003 are the same as before, with a couple of trivial changes
to rebase them up to current HEAD.  0004 adds a support function
request to allow extension functions to perform in-place updates.
You should be able to use that to improve what your extension
is doing.  The new comments in supportnodes.h explain how to
use it (plus see the built-in examples, though they are quite
simple).

regards, tom lane

From 2b5c421f608d77994c1a898c9860b8a28bc2f46f Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Tue, 3 Dec 2024 14:31:04 -0500
Subject: [PATCH v2 1/4] Preliminary refactoring.

This short and boring patch simply moves the responsibility for
initializing PLpgSQL_expr.target_param into plpgsql parsing,
rather than doing it at first execution of the expr as before.
This doesn't save anything in terms of runtime, since the work was
trivial and done only once per expr anyway.  But it makes the info
available during parsing, which will be useful for the next step.

Likewise set PLpgSQL_expr.func during parsing.  According to the
comments, this was once impossible; but it's certainly possible
since we invented the plpgsql_curr_compile variable.  Again, this
saves little runtime, but it seems far cleaner conceptually.

While at it, I reordered stuff in struct PLpgSQL_expr to make it
clearer which fields are filled when, and merged some duplicative
code in pl_gram.y.

Discussion: https://postgr.es/m/CACxu=vjakfnsyxoosnw1wegsao5u_v1xybacfvj14wgjv_p...@mail.gmail.com
---
 src/pl/plpgsql/src/pl_exec.c | 27 ---
 src/pl/plpgsql/src/pl_gram.y | 65 
 src/pl/plpgsql/src/plpgsql.h | 31 +
 3 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index e31206e7f4..1a9c010205 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4174,12 +4174,6 @@ exec_prepare_plan(PLpgSQL_execstate *estate,
 	SPIPlanPtr	plan;
 	SPIPrepareOptions options;
 
-	/*
-	 * The grammar can't conveniently set expr->func while building the parse
-	 * tree, so make sure it's set before parser hooks need it.
-	 */
-	expr->func = estate->func;
-
 	/*
 	 * Generate and save the plan
 	 */
@@ -5016,21 +5010,7 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 	 * If first time through, create a plan for this expression.
 	 */
 	if (expr->plan == NULL)
-	{
-		/*
-		 * Mark the expression as being an assignment source, if target is a
-		 * simple variable.  (This is a bit messy, but it seems cleaner than
-		 * modifying the API of exec_prepare_plan for the purpose.  We need to
-		 * stash the target dno into the expr anyway, so that it will be
-		 * available if we have to replan.)
-		 */
-		if (target->dtype == PLPGSQL_DTYPE_VAR)
-			expr->target_param = target->dno;
-		else
-			expr->target_param = -1;	/* should be that already */
-
 		exec_prepare_plan(estate, expr, 0);
-	}
 
 	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
 	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
@@ -6282,13 +6262,6 @@ setup_param_list(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
 		 * that they are interrupting an active use of parameters.
 		 */
 		paramLI->parserSetupArg = expr;
-
-		/*
-		 * Also make sure this is set before parser hooks need it.  There is
-		 * no need to save and restore, since the value is always correct once
-		 * set.  (Should be set already, but let's be sure.)
-		 */
-		expr->func = estate->func;
 	}
 	else
 	{
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 8182ce28aa..5431977d69 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -64,6 +64,10 @@ static	bool			tok_is_keyword(int token, union YYSTYPE *lval,
 static	void			word_is_not_variable(PLword *word, int location);
 static	void			cword_is_not_variable(PLcword *cword, int location);
 static	void			current_token_is_not_variable(int tok);
+static	PLpgSQL_expr	*make_plpgsql_expr(const char *query,
+		   RawParseMode parsemode);
+static	void			expr_is_assignment_source(PLpgSQL_expr *expr,
+  PLpgSQL_datum *target);
 static	PLpgSQL_expr	*read_sql_construct(int until,
 			int until2,
 			int until3,
@@ -529,6 +533,10 @@ decl_statement	: decl_varname decl_const decl_datatype decl_collate decl_notnull
 	 errmsg("variable \"%s\" must have a default value, since it's declared NOT NULL",
 			var->refname),
 	 parser_errposition(@5)));
+
+		if (var->default_val != NULL)
+			expr_is_assignment_source(var->default_val,
+	  (PLpgSQL_datum *) var);
 	}
 | decl_varname K_ALIAS K_FOR decl_aliasitem ';'
 	{
@@ -

Re: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Michael Paquier
On Tue, Dec 03, 2024 at 09:46:06PM -0300, Euler Taveira wrote:
> Although, Debian code search [1] says this data structure is not used outside
> PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
> upgrade (even if it is known that such data structure is from that particular
> output plugin -- pgoutput -- and other output plugins generally have its own
> data structure). +1 from Alvaro's proposal.

A lookup of the public repos on GitHub did not show anything fancy with the
manipulation of the structure for projects related to Postgres, either.

FWIW, I'm OK with the memory context reset solution as much as the
direct free calls as we are sure that they will be safe.  And at the
end of the day, the problem would be solved with any of these
solutions.  My votes would be +0.6 for the free and +0.5 for the mcxt
manipulation, so let's say that they are tied.

As Alvaro and yourself are in favor of the mcxt approach, then let's
go for it.  Amit has concerns with other code paths that could be
similarly leaking.  I'm not sure if this is worth waiting too long
based on how local the fix for the existing leak is with any of these
solutions.
--
Michael


signature.asc
Description: PGP signature


Re: shared-memory based stats collector - v70

2024-12-03 Thread Anton A. Melnikov



On 03.12.2024 18:07, Andres Freund wrote:

Hi,

On 2024-12-03 13:37:48 +0300, Anton A. Melnikov wrote:

Found a place in the code of this patch that is unclear to me:
https://github.com/postgres/postgres/blob/1acf10549e64c6a52ced570d712fcba1a2f5d1ec/src/backend/utils/activity/pgstat.c#L1658

Owing to the assert(), the next if() should never be executed, but the comment
above says the opposite.
Is this assert really needed here? And if so, what for?


It's code that should be unreachable. But in case it is encountered in a
production scenario, it's not worth taking down the server for it.


Thanks! It's clear.
Although there is a test case that leads to this assertion being triggered,
I doubt whether anything needs to be fixed.
I described this case in what seems to me a suitable thread:
https://www.postgresql.org/message-id/56bf8ff9-dd8c-47b2-872a-748ede82af99%40postgrespro.ru

Would be appreciate if you take a look on it.


With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Contradictory behavior of array_agg(distinct) aggregate.

2024-12-03 Thread Konstantin Knizhnik

Hi hackers!


Is it only me who consider that current behavior of array_agg(distinct) 
contradicts to interpretation of nulls in other cases ("null" is 
something like "unknown" which means that we can not say weather two 
nulls are the same or not). This is why it is allowed to insert multiple 
nulls in the unique column.


postgres=# create table t(x integer unique);
CREATE TABLE
postgres=# insert into t values (null),(null);
INSERT 0 2
postgres=# select count(distinct x) from t;
 count
---
 0
(1 row)

postgres=# select array_agg(distinct x) from t;
 array_agg
---
 {NULL}
(1 row)

postgres=# select array_agg(x) from t;
  array_agg
-
 {NULL,NULL}
(1 row)


So what is the number of distinct "x" values in this case? I think that 
according to SQL model - 0 (as count(distinct) returns).

Why in this case array_agg(distinct x) returns non-empty array?

Yes, unlike most other aggregates, `array_agg` is not ignoring null values.
But is it correct to treat two null values as the same (non-distinct)?
IMHO correct result in this case should be either {} or NULL, either 
{NULL,NULL}.








Re: CREATE SCHEMA ... CREATE DOMAIN support

2024-12-03 Thread Peter Eisentraut

On 30.11.24 20:08, Tom Lane wrote:

2. transformCreateSchemaStmtElements is of the opinion that it's
responsible for ordering the schema elements in a way that will work,
but it just about completely fails at that task.  Ordering the objects
by kind is surely not sufficient, and adding CREATE DOMAIN will make
that worse.  (Example: a domain could be used in a table definition,
but we also allow domains to be created over tables' composite types.)
Yet we have no infrastructure that would allow us to discover the real
dependencies between unparsed DDL commands, nor is it likely that
anyone will ever undertake building such.  I think we ought to nuke
that concept from orbit and just execute the schema elements in the
order presented.  I looked at several iterations of the SQL standard
and cannot find any support for the idea that CREATE SCHEMA needs to
be any smarter than that.  I'd also argue that doing anything else is
a POLA violation.  It's especially a POLA violation if the code
rearranges a valid user-written command order into an invalid order,
which is inevitable if we stick with the current approach.

The notion that we ought to sort the objects by kind appears to go
all the way back to 95ef6a344 of 2002-03-21, which I guess makes it
my fault.  There must have been some prior mailing list discussion,
but I couldn't find much.  There is a predecessor of the committed
patch in
https://www.postgresql.org/message-id/flat/3C7F8A49.CC4EF0BE%40redhat.com
but no discussion of why sorting by kind is a good idea.  (The last
message in the thread suggests that there was more discussion among
the Red Hat RHDB team, but if so it's lost to history now.)


SQL/Framework subclause "Descriptors" says:

"""
The execution of an SQL-statement may result in the creation of many 
descriptors. An SQL object that is created as a result of an 
SQL-statement may depend on other descriptors that are only created as a 
result of the execution of that SQL statement.


NOTE 8 — This is particularly relevant in the case of the <schema
definition> SQL-statement. A <schema definition> can, for example,
contain many <table definition>s that in turn contain <table
constraint>s. A single <table constraint> in one <table definition> can
reference a second table being created by a separate <table definition>
which itself is able to contain a reference to the first table. The
dependencies of each table on the descriptors of the other are valid
provided that all necessary descriptors are created during the execution
of the complete <schema definition>.

"""

So this says effectively that forward references are allowed.  Whether 
reordering the statements is a good way to implement that is dubious, as 
we are discovering.
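
A hedged illustration, with made-up object names and assuming the CREATE
DOMAIN schema element proposed in this thread: executed in the order written,
both of these commands are satisfiable, but no fixed ordering by object kind
can handle both, since the first needs domains created before tables and the
second needs tables created before domains (a domain over the table's
composite type).

CREATE SCHEMA s1
CREATE DOMAIN d1 AS int
CREATE TABLE t1 (a d1);

CREATE SCHEMA s2
CREATE TABLE t2 (a int)
CREATE DOMAIN d2 AS t2;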





Re: code contributions for 2024, WIP version

2024-12-03 Thread Daniel Gustafsson
> On 3 Dec 2024, at 17:41, Robert Haas  wrote:
> 
> On Tue, Dec 3, 2024 at 11:19 AM Joe Conway  wrote:
>> While I know you said "you will do you" when it comes to your annual
>> blog, there are a number of similar efforts -- top of mind is the
>> analysis done (as I understand it) by Daniel Gustafsson and Claire
>> Giordano [1], as well as ongoing/recurring analysis done by the
>> contributor committee. And there is the adjacent related discussion
>> around commit messages/authors. It makes me wonder if there isn't a way
>> to make all of our lives easier going forward.
> 
> Yes, I'm game to try to figure out how to combine our efforts. I don't
> think it's a bad thing that different people have different takes;
> this is complicated and looking at it through just one lens is
> limiting. But people duplicating work is, well, not so good.

If we settled on a meta-data standard for how to identify authors, reviewers,
backpatches etc I think that would go a very long way to lower the complexity
of getting to the data and keep folks focused on doing interesting analysis.

--
Daniel Gustafsson





add support for the old naming libs convention on windows (ssleay32.lib and libeay32.lib)

2024-12-03 Thread Darek Ślusarczyk
Hi,

Summary of Changes:

According to the postgresql-17 requirements
https://www.postgresql.org/docs/17/install-requirements.html
the minimum required version of openssl is 1.0.2.
In that version, the naming convention on windows is still
ssleay32.[lib|dll] and
libeay32.[lib|dll] instead of libssl.[lib|dll] and libcrypto.[lib|dll].
It changed in version 1.1.0
https://github.com/openssl/openssl/issues/10332#issuecomment-549027653
Thus there is a bug in meson.build as it only supports libssl.lib and
libcrypto.lib,
hence a simple patch that fixes the issue and supports both conventions.

I also submitted a pull request on GitHub, which can be found here:
https://github.com/postgres/postgres/pull/188
There are two simple changes in the meson.build file.

Reason for Changes:

Add support for the old naming libs convention on windows (ssleay32.lib and
libeay32.lib).

Testing:

The patch has been tested against postgres-17 and openssl v1.0.2 in our
environment on various platforms (lnx, win, ...)

Patch:

diff --git a/meson.build b/meson.build
index 451c3f6d85..50fa6d865b 100644
--- a/meson.build
+++ b/meson.build
@@ -1337,12 +1337,12 @@ if sslopt in ['auto', 'openssl']

   # via library + headers
   if not ssl.found()
-ssl_lib = cc.find_library('ssl',
+ssl_lib = cc.find_library(['ssl', 'ssleay32'],
   dirs: test_lib_d,
   header_include_directories: postgres_inc,
   has_headers: ['openssl/ssl.h', 'openssl/err.h'],
   required: openssl_required)
-crypto_lib = cc.find_library('crypto',
+crypto_lib = cc.find_library(['crypto', 'libeay32'],
   dirs: test_lib_d,
   required: openssl_required)
 if ssl_lib.found() and crypto_lib.found()

kind regards,
ds
--
marines() {
return Darek_Slusarczyk;
}


Re: simplify regular expression locale global variables

2024-12-03 Thread Jeff Davis
On Tue, 2024-12-03 at 17:06 +0100, Peter Eisentraut wrote:
> But it doesn't remove pg_regex_collation, only pg_regex_strategy.  So
> I 
> have split my v2 into two patches, the first removes
> pg_regex_collation 
> and the second removes pg_regex_strategy.  The first patch is useful
> on 
> its own, I think;

+1, looks committable now.

>  the second one will presumably be replaced by the 
> other patch series above.

Sounds good.

Regards,
Jeff Davis





Re: Remove remnants of "snapshot too old"

2024-12-03 Thread Nathan Bossart
On Tue, Dec 03, 2024 at 10:06:59PM +0200, Heikki Linnakangas wrote:
> -/* --
> - * init_toast_snapshot
> - *
> - *   Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
> - *   to initialize the TOAST snapshot; since we don't know which one to use,
> - *   just use the oldest one.
> - */
> -void
> -init_toast_snapshot(Snapshot toast_snapshot)
> -{
> - Snapshotsnapshot = GetOldestSnapshot();
> -
> - /*
> -  * GetOldestSnapshot returns NULL if the session has no active 
> snapshots.
> -  * We can get that if, for example, a procedure fetches a toasted value
> -  * into a local variable, commits, and then tries to detoast the value.
> -  * Such coding is unsafe, because once we commit there is nothing to
> -  * prevent the toast data from being deleted.  Detoasting *must* happen 
> in
> -  * the same transaction that originally fetched the toast pointer.  
> Hence,
> -  * rather than trying to band-aid over the problem, throw an error.  
> (This
> -  * is not very much protection, because in many scenarios the procedure
> -  * would have already created a new transaction snapshot, preventing us
> -  * from detecting the problem.  But it's better than nothing, and for 
> sure
> -  * we shouldn't expend code on masking the problem more.)
> -  */
> - if (snapshot == NULL)
> - elog(ERROR, "cannot fetch toast data without an active 
> snapshot");
> -
> - /*
> -  * Catalog snapshots can be returned by GetOldestSnapshot() even if not
> -  * registered or active. That easily hides bugs around not having a
> -  * snapshot set up - most of the time there is a valid catalog snapshot.
> -  * So additionally insist that the current snapshot is registered or
> -  * active.
> -  */
> - Assert(HaveRegisteredOrActiveSnapshot());
> -
> - InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
> -}

The ERROR and Assert() here were the subject of some recent commits
(b52adba and 70b9adb) as well as a patch I'm hoping to commit soon [0].
I've only skimmed your patch so far, but I'm wondering if this fixes those
issues a different way.  If not, I'm a little worried about losing these
checks.

[0] https://postgr.es/m/flat/ZyKdCEaUB2lB1rG8%40nathan

-- 
nathan




Re: Remove useless casts to (void *)

2024-12-03 Thread Tom Lane
I wrote:
> Pushed; I'll await hake's next run with interest.

hake didn't like that, but after adding -D__EXTENSIONS__ per
https://illumos.org/man/7/standards
it seems happy again.  Its configure results are the same as
beforehand, and the warning about shmdt() is gone.

regards, tom lane




Re: Remove useless casts to (void *)

2024-12-03 Thread Thomas Munro
On Wed, Dec 4, 2024 at 5:33 PM Tom Lane  wrote:
> hake didn't like that, but after adding -D__EXTENSIONS__ per
> https://illumos.org/man/7/standards
> it seems happy again.  Its configure results are the same as
> beforehand, and the warning about shmdt() is gone.

Cool.  I was wondering if it was going to break on some of our recent
POSIX 2008 stuff (thread-safe  bits and pieces) next, given
_POSIX_C_SOURCE=200112L.  It certainly does know about 2008 too, so it
looks like the man page might be out of date.

https://github.com/illumos/illumos-gate/blob/master/usr/src/boot/sys/sys/cdefs.h#L724




Re: Remove useless casts to (void *)

2024-12-03 Thread Tom Lane
Thomas Munro  writes:
> Cool.  I was wondering if it was going to break on some of our recent
> POSIX 2008 stuff (thread-safe  bits and pieces) next, given
> _POSIX_C_SOURCE=200112L.  It certainly does know about 2008 too, so it
> looks like the man page might be out of date.

Do you want to try setting it to 200809?  But let's wait to see what
margay has to say about the current choices.

regards, tom lane




Re: Virtual generated columns

2024-12-03 Thread jian he
On Fri, Nov 29, 2024 at 6:13 PM Peter Eisentraut  wrote:
>
> - Added support for virtual columns in trigger column lists.  (For that,
> I renamed ExecInitStoredGenerated() to ExecInitGenerated(), which
> handles the computation of ri_extraUpdatedCols.)
>

why not duplicate some code from ExecInitStoredGenerated to
ExecGetExtraUpdatedCols?

* Right now expression state is initialized for the virtual
generated column, which may not be necessary for virtual columns.
Let's make ResultRelInfo->ri_GeneratedExprsI and
ResultRelInfo->ri_GeneratedExprsU be NULL for virtual columns.

currently it may look like this:
(gdb) p resultRelInfo->ri_GeneratedExprsU
$20 = (ExprState **) 0x34f9638
(gdb) p resultRelInfo->ri_GeneratedExprsU[0]
$21 = (ExprState *) 0x0
(gdb) p resultRelInfo->ri_GeneratedExprsU[1]
$22 = (ExprState *) 0x0
(gdb) p resultRelInfo->ri_GeneratedExprsU[2]
$23 = (ExprState *) 0x40

* ExecInitStoredGenerated is mainly used in ExecComputeStoredGenerated.
* We also need to slightly change ExecInitGenerated's comments.
* In InitResultRelInfo, do we need to explicitly set ri_Generated_valid to false?
* Duplicating code won't cause big performance issues, since
build_column_default will take most of the time.


The attached patch keeps ExecInitStoredGenerated as is and
duplicates some code from ExecInitStoredGenerated into ExecGetExtraUpdatedCols.


v10-0001-refactor-ExecGetExtraUpdatedCols.no-cfbot
Description: Binary data


Re: Remove useless casts to (void *)

2024-12-03 Thread Tom Lane
Andres Freund  writes:
> On 2024-12-02 17:42:33 -0500, Tom Lane wrote:
>> I'll proceed with improving that (in master only, don't think we need it in
>> back branches, at least not today) unless somebody pushes back soon.

> +1

Pushed; I'll await hake's next run with interest.

regards, tom lane




Re: Better error message when --single is not the first arg to postgres executable

2024-12-03 Thread Tom Lane
Nathan Bossart  writes:
> Here's what I have staged for commit.

In addition to Alvaro's comments:

+/* special must-be-first options for dispatching to various subprograms */
+typedef enum Subprogram
+{
+   SUBPROGRAM_CHECK,
+   ... etc

"Subprogram" doesn't quite seem like the right name for this enum.
These are not subprograms, they are options.  I'm not feeling
especially inventive today, so this might be a lousy suggestion,
but how about

typedef enum DispatchOption
{
DISPATCH_CHECK,
... etc

Also, I think our usual convention for annotating a special
last entry is more like

+   SUBPROGRAM_SINGLE,
+   SUBPROGRAM_POSTMASTER,  /* must be last */
+} Subprogram;

I don't like the comment with "above" because it's not
very clear above what.

regards, tom lane




Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)

2024-12-03 Thread Robert Haas
On Fri, Nov 22, 2024 at 6:44 PM Thomas Munro  wrote:
> I'm not sure if we'd ever know if we broke MinGW + msvcrt.dll in the
> back branches, ie if any computer actually existing on earth would
> ever compile and run the second branch, and if so, be used for serious
> database work and then be able to reach the failure case.  I think an
> honest but disappointing errno is probably OK in that hypothetical
> case.  It's hard to know how far to go when programming ghost
> computers.  Thoughts, anyone?

Hypothetically speaking, suppose we just used _s_chsize everywhere
i.e. we deleted the #if/#else/#endif in your example and everything
between #else and #endif. I know very little about Windows, but I
suppose what would happen is that if you tried to compile a
thus-modified version of PostgreSQL on a very old MinGW, it would fail
to compile, and if you compiled it on a newer machine and tried to run
it on an older one, you'd get the equivalent of a dynamic linker
failure when you tried to start the server. At least that's what would
happen on Linux or macOS. Would Windows postpone the failure until we
actually tried to jump to the nonexistent entrypoint?

In any case, one idea would be to adopt this approach and, if we heard
of actual failures, we could then decide that the fallback path is
needed, and if not, then we're done. Perhaps that's too relaxed an
attitude; after all, breaking the entire server in a minor release is
not great. However, if you think that systems without msvcr80.dll are
extinct in the wild, maybe it's OK. I don't know the history of these
things, but a quick Google search suggested that maybe msvcr80.dll was
part of the "Microsoft Visual C++ 2005 Redistributable Package". I
don't know if that means all Windows shipped after 2005 would have it,
but that's a long time ago.

> (Obviously an exorcism is overdue in master, working on it...)

How does what need to be done compare to the patches from Jakub and Davinder?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Precheck to consider path costs for partial paths

2024-12-03 Thread Nikita Malakhov
Hi hackers!

I'd stumbled upon the discussion [1] on TPC-DS query performance,
looked into code that caused this behavior and saw that partial hash,
merge and nested loop paths are discarded regardless of costs
if they have more pathkeys.

I've excluded the pathkey chain length condition from the precheck function
and added passing the precalculated startup cost in addition to the total cost,
and it seems to produce a more effective plan for the case in [1].

Please check the attached patch. I'm very interested in whether my assumption
is correct or not.

[1]
https://www.postgresql.org/message-id/flat/SEZPR06MB649422CDEBEBBA3915154EE58A232%40SEZPR06MB6494.apcprd06.prod.outlook.com

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


v1-0001-ppath-precheck.patch
Description: Binary data


RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-12-03 Thread Devulapalli, Raghuveer
> Thanks!  cfbot is showing a couple of errors [0] [1] [2].  

Oh yikes, the CI had passed with an earlier version. Wonder if I made a mess of 
the rebase. I will take a look and fix them. 

Raghuveer





RE: Memory leak in WAL sender with pgoutput (v10~)

2024-12-03 Thread Zhijie Hou (Fujitsu)
On Wednesday, December 4, 2024 8:55 AM Michael Paquier  
wrote:
> 
> On Tue, Dec 03, 2024 at 09:46:06PM -0300, Euler Taveira wrote:
> > Although, Debian code search [1] says this data structure is not used
> outside
> > PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
> > upgrade (even if it is known that such data structure is from that 
> > particular
> > output plugin -- pgoutput -- and other output plugins generally have its own
> > data structure). +1 from Alvaro's proposal.
> 
> A lookup of the public repos of github did not show fancy with the
> manipulation of the structure for peoject related to Postgres, either.
> 
> FWIW, I'm OK with the memory context reset solution as much as the
> direct free calls as we are sure that they will be safe.  And at the
> end of the day, the problem would be solved with any of these
> solutions.  My votes would be +0.6 for the free and +0.5 for the mcxt
> manipulation, so let's say that they are tied.
> 
> As Alvaro and yourself are in favor of the mcxt approach, then let's
> go for it. 

+1

> Amit has concerns with other code paths that could be
> similarly leaking.  I'm not sure if this is worth waiting too long
> based on how local the fix for the existing leak is with any of these
> solutions.

It appears there is an additional memory leak caused by allocating publication
names within the CacheMemoryContext, as noted in [1]. And it can also be fixed 
by
creating a separate memctx for publication names under the logical decoding
context. I think the approach makes sense since the lifespan of publication
names should ideally align with that of the logical decoding context.

[1] 
https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj




Re: Does RelCache/SysCache shrink except when relations are deleted?

2024-12-03 Thread Bowen Shi
Hello,

It has been over a decade since the discussion in this email thread. I would
like to know if there has been any proposal regarding the eviction of
relcache/syscache entries.

We have recently encountered a similar memory problem: relcache/syscache
keeps growing because connections are kept open for a long time (one hour or
more) and the database has more than 10+ tables.
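
One way to watch this from the affected session, assuming PostgreSQL 14 or
later where the pg_backend_memory_contexts view is available:

select name, used_bytes
from pg_backend_memory_contexts
where name = 'CacheMemoryContext';

Per-relation sub-contexts (e.g. "index info") appear as children of this
context and also grow with the number of relations touched.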

Regards
Bowen Shi

On Wed, Dec 4, 2024 at 11:23 AM MauMau  wrote:

> Hello,
>
> Please let me ask you some questions about RelCache/SysCache/CatCache
> design. I know I should post this to pgsql-general, but I decided to post
> here because the content includes design questions.
>
> <>
> My customer is facing a "out of memory" problem during a batch job. I'd
> like
> to know the cause and solutions. PostgreSQL version is 8.2.7 (32-bit on
> Linux).
>
> The batch job consists of two steps in a single psql session:
>
> 1. call some PL/pgSQL function (say "somefunc" here)
> 2. VACUUM tables (at this time, maintenance_work_mem=256MB)
>
> The step 2 emitted the following messages in syslog.
>
> ERROR:  out of memory
> DETAIL:  Failed on request of size 268435452.
> STATEMENT:  VACUUM some_table_name
>
> somefunc copies rows from a single table to 100,000 tables (table_1 -
> table_10) as follows:
>
> [somefunc]
> FOR id in 1 .. 10 LOOP
> check if the table "table_${ID}" exists by searching pg_class
> if the table exists
> INSERT INTO table_${id} SELECT * FROM some_table
> WHERE pk = id;
> else /* the table does not exist */
> CREATE TABLE table_${id} AS SELECT * FROM some_table
> WHERE pk = id;
> END LOOP;
>
> Before starting somefunc, the virtual memory of the backend postgres is
> 1.6GB, as reported by top command as "VIRT" column. When somefunc
> completes,
> it becomes 2.6GB. So, VACUUM cannot allocate 256MB because the virtual
> memory space is full.
>
> This is all the information I have now. I requested the customer to
> collect
> PostgreSQL server log so that memory context statistics can be obtained
> when
> "out of memory" occurs. Plus, I asked for the result of "SHOW ALL" and the
> minimal procedure to reproduce the problem. However, I'd like to ask your
> opinions rather than waiting for the problem to happen again.
>
>
> <>
> I'm guessing that CacheMemoryContext might be using much memory, because
> somefunc accesses as many as 100,000 tables. But I don't understand
> RelCache/SysCache implementation yet.
>
> Q1: When are the RelCache/SysCache entries removed from CacheMemoryContext?
> Are they removed only when the corresponding relations are deleted? If so,
> "many tables and indexes" is not friendly for the current PostgreSQL?
>
> Q2: somefunc increased 1GB of virtual memory after accessing 100,000
> tables.
> This means that one table uses 10KB of local memory.
> Is it common that this much memory is used for RelCache/SysCache or other
> control information?
> Does the number of attributes in a table affect local memory usage much?
>
> Q3: I think one solution is to run VACUUM in a separate psql session.
> Are there any other solutions you can think of?
>
> Q4: The customer says one strange thing. If the 100,000 tables exist
> before
> somefunc starts (i.e., somefunc just copy records), the virtual memory of
> postgres does not increase.
> Is there anything to reason about his comment?
>
> Regards
> MauMau
>
>
>
>


Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)

2024-12-03 Thread Thomas Munro
On Wed, Dec 4, 2024 at 7:04 AM Robert Haas  wrote:
> On Fri, Nov 22, 2024 at 6:44 PM Thomas Munro  wrote:
> > I'm not sure if we'd ever know if we broke MinGW + msvcrt.dll in the
> > back branches, ie if any computer actually existing on earth would
> > ever compile and run the second branch, and if so, be used for serious
> > database work and then be able to reach the failure case.  I think an
> > honest but disappointing errno is probably OK in that hypothetical
> > case.  It's hard to know how far to go when programming ghost
> > computers.  Thoughts, anyone?
>
> Hypothetically speaking, suppose we just used _s_chsize everywhere
> i.e. we deleted the #if/#else/#endif in your example and everything
> between #else and #endif. I know very little about Windows, but I
> suppose what would happen is that if you tried to compile a
> thus-modified version of PostgreSQL on a very old MinGW, it would fail
> to compile, and if you compiled it on a newer machine and tried to run
> it on an older one, you'd get the equivalent of a dynamic linker
> failure when you tried to start the server. At least that's what would
> happen on Linux or macOS. Would Windows postpone the failure until we
> actually tried to jump to the nonexistent entrypoint?

I think it would fail to link, while building the server?

> In any case, one idea would be to adopt this approach and, if we heard
> of actual failures, we could then decide that the fallback path is
> needed, and if not, then we're done. Perhaps that's too relaxed an
> attitude; after all, breaking the entire server in a minor release is
> not great.

As a data point, v17 already made a similar assumption and nobody
squawked.  But that was a major release.  I honestly don't know who is
really using that toolchain outside CI-type systems used by people who
want more Unix/Windows toolchain harmony.  It's a shame that it has
caused so many projects so much Windows/Windows disharmony over the
years, but it seems things have dramatically improved on that front.

AFAIK we don't really have any defined support criteria beyond "keep
the BF and CI green" for new development on master.  Maybe you're
right and we don't need to be so cautious about the back-branches,
though it would irk me to leave the tree full of contradictions in
terms of code that is guarded by macros.  In this case it's a pretty
small amount of extra code so it doesn't bother me much to maintain
it, as long as it's gone in master.

I'll have some feedback on the patch in the next few days.

> However, if you think that systems without msvcr80.dll are
> extinct in the wild, maybe it's OK. I don't know the history of these
> things, but a quick Google search suggested that maybe msvcr80.dll was
> part of the "Microsoft Visual C++ 2005 Redistributable Package". I
> don't know if that means all Windows shipped after 2005 would have it,
> but that's a long time ago.

I think it was more complicated in that time period, but is simple
again as of the past decade.  Something like: all systems to this day
have had a msvcrt.dll from the 90s as part of the OS, and all systems
have ucrt since ~2015 (Windows 10) as part of the OS, but in the
period in between, each MSVC version had its own library that was
*not* shipped with the OS and you had to ship it with your
application.  Visual Studio users could do that, but the MinGW project
didn't like the terms and conditions and stuck to the old thing, and
also couldn't use the new ucrt thing for some reason, at then some
point the project forked, producing "MinGW-w64" (which we usually just
refer to as MinGW around here), and the successor focused on making
ucrt work for them, which took a while.  That's what I gathered from
skim reading, anyway.  A lot of stuff happened in the decades between
those two runtimes...  UTF-8, hello, I mean, that was kind of a big
deal to not have!

> > (Obviously an exorcism is overdue in master, working on it...)
>
> How does what need to be done compare to the patches from Jakub and Davinder?

Already done last week, but that's only for master:

commit 1758d42446161f5dfae9b14791c5640239b86faa
Author: Thomas Munro 
Date:   Wed Nov 27 22:34:11 2024 +1300

Require ucrt if using MinGW.




  1   2   >