Re: Fix for recursive plpython triggers

2024-05-08 Thread Andreas Karlsson

On 5/4/24 10:16 PM, Tom Lane wrote:

This fixes bug #18456 [1].  Since we're in back-branch release freeze,
I'll just park it for the moment.  But I think we should shove it in
once the freeze lifts so it's in 17beta1.
There is a similar issue with the return type (at least if it is a 
generic record) in the code, but it is hard to trigger with sane code, 
so I don't know if it is worth fixing. Still, this and the bug Jacques 
found show the downsides of the hacky fix for recursion that we have in plpython.


I found this issue while reading the code, so am very unclear if there 
is any sane code which could trigger it.


In the example below the recursive call to f('int') changes the return 
type of the f('text') call causing it to fail.


# CREATE OR REPLACE FUNCTION f(t text) RETURNS record LANGUAGE 
plpython3u AS $$

if t == "text":
    plpy.execute("SELECT * FROM f('int') AS (a int)");
    return { "a": "x" }
elif t == "int":
    return { "a": 1 }
$$;
CREATE FUNCTION

# SELECT * FROM f('text') AS (a text);
ERROR:  invalid input syntax for type integer: "x"
CONTEXT:  while creating return value
PL/Python function "f"

Andreas




Re: Pgoutput not capturing the generated columns

2024-05-08 Thread Shubham Khanna
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
 wrote:
>
> Hi PG Hackers.
>
> We are interested in enhancing the functionality of the pgoutput plugin by 
> adding support for generated columns.
> Could you please guide us on the necessary steps to achieve this? 
> Additionally, do you have a platform for tracking such feature requests? Any 
> insights or assistance you can provide on this matter would be greatly 
> appreciated.

The attached patch has the changes to support capturing generated
column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
'include_generated_columns' option is specified, the generated column
information and generated column data will also be sent.

Usage from pgoutput plugin:
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
CREATE PUBLICATION pub1 FOR ALL TABLES;
SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

Currently it is not supported as a subscription option because table
sync for generated columns is not possible, as the COPY command does
not support getting data for generated columns. If this feature is
required we can remove this limitation from the COPY command and then
add it as a subscription option later.
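
For reference, a minimal sketch of that COPY limitation, reusing the
gencoltable definition from the usage example above (behavior as
described in this mail, not taken from the patch):

-- plain COPY simply omits generated columns:
COPY gencoltable TO STDOUT;
-- but naming a generated column explicitly is rejected, which is what
-- blocks table sync:
COPY gencoltable (a, b) TO STDOUT;  -- expected to fail: b is generated
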
Thoughts?

Thanks and Regards,
Shubham Khanna.


v1-0001-Support-capturing-generated-column-data-using-pgo.patch
Description: Binary data


Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Nazir Bilal Yavuz
Hi Ranier,

Thanks for looking into this!

I am not sure why but your reply does not show up in the thread, so I
copied your reply and answered it in the thread for visibility.

On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
>
> I know it's coming from copy-and-paste, but
> I believe the flags could be:
> - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
> + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC | O_EXCL | 
> PG_BINARY);
>
> The flags:
> O_WRONLY | O_TRUNC
>
> Allow the OS to make some optimizations, if you deem it possible.
>
> The flag O_RDWR indicates that the file can be read, which is not true in 
> this case.
> The destination file will just be written.

You may be right about the O_WRONLY flag but I am not sure about the
O_TRUNC flag.

O_TRUNC from the linux man page [1]: If the file already exists and is
a regular file and the access mode allows writing (i.e., is O_RDWR or
O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
terminal device file, the O_TRUNC flag is ignored. Otherwise, the
effect of O_TRUNC is unspecified.

But whether the destination directory already exists should already be
checked in the createdb() function in dbcommands.c [2], so this should
not happen.

[1] https://man7.org/linux/man-pages/man2/open.2.html

[2]
/*
 * If database OID is configured, check if the OID is already in use or
 * data directory already exists.
 */
if (OidIsValid(dboid))
{
	char	   *existing_dbname = get_database_name(dboid);

	if (existing_dbname != NULL)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
				errmsg("database OID %u is already in use by database \"%s\"",
					   dboid, existing_dbname));

	if (check_db_file_conflict(dboid))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
				errmsg("data directory with the specified OID %u already exists",
					   dboid));
}
else
{
	/* Select an OID for the new database if is not explicitly configured. */
	do
	{
		dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
								   Anum_pg_database_oid);
	} while (check_db_file_conflict(dboid));
}

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

2024-05-08 Thread Peter Eisentraut

On 03.05.24 19:13, Nathan Bossart wrote:

This is likely small potatoes compared to some of the other
pg_upgrade-related improvements I've proposed [0] [1] or plan to propose,
but this is easy enough, and I already wrote the patch, so here it is.
AFAICT there's no reason to bother syncing these dump files to disk.  If
someone pulls the plug during pg_upgrade, it's not like you can resume
pg_upgrade from where it left off.  Also, I think we skipped syncing before
v10, anyway, as the --no-sync flag was only added in commit 96a7128, which
added the code to sync dump files, too.


Looks good to me.





RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! Here are updated patches.
I updated patches only for HEAD.

> ==
> Commit message
> 
> 1.
> This patch allows user to alter two_phase option
> 
> /allows user/allows the user/
> 
> /to alter two_phase option/to alter the 'two_phase' option/

Fixed.

> ==
> doc/src/sgml/ref/alter_subscription.sgml
> 
> 2.
> two_phase can be altered only for disabled subscription.
> 
> SUGGEST
> The two_phase parameter can only be altered when
> the subscription is disabled.

Fixed.
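
(For context, a sketch of the flow this wording implies, using the
ALTER SUBSCRIPTION option syntax from the patch; 'sub' is a placeholder
subscription name:)

ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION sub SET (two_phase = off);
ALTER SUBSCRIPTION sub ENABLE;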

> ==
> src/backend/access/transam/twophase.c
> 
> 3. checkGid
> +
> +/*
> + * checkGid
> + */
> +static bool
> +checkGid(char *gid, Oid subid)
> +{
> + int ret;
> + Oid subid_written,
> + xid;
> +
> + ret = sscanf(gid, "pg_gid_%u_%u", &subid_written, &xid);
> +
> + if (ret != 2 || subid != subid_written)
> + return false;
> +
> + return true;
> +}
> 
> 3a.
> The function comment should give more explanation of what it does. I
> think this function is the counterpart of the TwoPhaseTransactionGid()
> function of worker.c so the comment can say that too.

Comments were updated.

> 3b.
> Indeed, perhaps the function name should be similar to
> TwoPhaseTransactionGid. e.g. call it like
> IsTwoPhaseTransactionGidForSubid?

Replaced with IsTwoPhaseTransactionGidForSubid().

> 3c.
> Probably 'xid' should be TransactionId instead of Oid.

Right, fixed.

> 3d.
> Why not have a single return?
> 
> SUGGESTION
> return (ret == 2 && subid == subid_written);

Fixed.

> 3e.
> I am wondering if the existing TwoPhaseTransactionGid function
> currently in worker.c should be moved here because IMO these 2
> functions belong together and twophase.c seems the right place to put
> them.

+1, moved.

> ~~~
> 
> 4.
> +LookupGXactBySubid(Oid subid)
> +{
> + bool found = false;
> +
> + LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
> + for (int i = 0; i < TwoPhaseState->numPrepXacts; i++)
> + {
> + GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
> +
> + /* Ignore not-yet-valid GIDs. */
> + if (gxact->valid && checkGid(gxact->gid, subid))
> + {
> + found = true;
> + break;
> + }
> + }
> + LWLockRelease(TwoPhaseStateLock);
> + return found;
> +}
> 
> AFAIK the gxact also has the 'xid' available, so won't it be better to
> pass BOTH the 'xid' and 'subid' to the checkGid so you can do a full
> comparison instead of comparing only the subid part of the gid?

IIUC, the xid written in the gxact means the transaction id on the subscriber,
but the formatted GID has the xid from the publisher. So the value cannot be used.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 5. AlterSubscription
> 
> + /* XXX */
> + if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
> + {
> 
> The "XXX" comment looks like it is meant to say something more...

This comment was only a marker for myself; removed.

> ~~~
> 
> 6.
> + /*
> + * two_phase can be only changed for disabled
> + * subscriptions
> + */
> + if (form->subenabled)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot set %s for enabled subscription",
> + "two_phase")));
> 
> 6a.
> Should this have a more comprehensive comment giving the reason like
> the 'failover' option has?

Modified, but it is almost the same as the failover one.

> 6b.
> Maybe this should include a "translator" comment to say don't
> translate the option name.

Hmm, but other parts in AlterSubscription() do not have one.
For now, I kept the current style.

> ~~~
> 
> 7.
> + /* Check whether the number of prepared transactions */
> + if (!opts.twophase &&
> + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED
> &&
> + LookupGXactBySubid(subid))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot disable two_phase when uncommitted prepared
> transactions present")));
> +
> 
> 7a.
> The first comment seems to be an incomplete sentence. I think it
> should say something a bit like:
> two_phase cannot be disabled if there are any uncommitted prepared
> transactions present.

Modified, but this part would be replaced by upcoming patches.

> 7b.
> Also, if ereport occurs what is the user supposed to do about it?
> Shouldn't the ereport include some errhint with appropriate advice?

The hint was added, but this part would be replaced by upcoming patches.

> ~~~
> 
> 8.
> + /*
> + * The changed failover option of the slot can't be rolled
> + * back.
> + */
> + PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
> (two_phase)");
> +
> + /* Change system catalog accordingly */
> + values[Anum_pg_subscription_subtwophasestate - 1] =
> + CharGetDatum(opts.twophase ?
> + LOGICALREP_TWOPHASE_STATE_PENDING :
> + LOGICALREP_TWOPHASE_STATE_DISABLED);
> + replaces[Anum_pg_subscription_subtwophasestate - 1] = true;
> + }
> 
> Typo I think: /failover option/two_phase option/

Right, fixed.

> ==
> .../libpqwalreceiver/libpqwalreceiver.c
> 
> 9.
>  static void
>  libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname,
> - bool failover)

Re: A problem about partitionwise join

2024-05-08 Thread Richard Guo
On Fri, May 3, 2024 at 9:31 PM Robert Haas  wrote:

> On Fri, May 3, 2024 at 7:47 AM Richard Guo  wrote:
> > I think one concern regarding performance cost is that the function
> > exprs_known_equal() would be called O(N^2) times, where N is the number
> > of partition key expressions.  But I think this might not be a problem.
> > The number of a joinrel's partition key expressions would only be equal
> > to the join degree, since each base relation within the join contributes
> > only one partition key expression, according to
> > set_joinrel_partition_key_exprs().  This number would not scale with the
> > number of partitions.  But I have not measured the performance in
> > practice by running benchmarks.  Maybe I'm just wrong.
>
> I don't know, but I do think you should do some benchmarking and see
> if you can find cases where this regresses performance. In my opinion,
> this feature is worth having only if it's basically free. There's lots
> of things we could do in the planner that would give better (perhaps
> much better) plans in certain cases, but which we don't do because in
> all other cases we'd pay a certain number of CPU cycles to have them
> and it just doesn't make sense given how often we'd actually get a
> benefit. This might be another such case.


Thank you for the suggestion.  In order to obtain a rough estimation of
how this patch affects planning time, I did the following benchmarking:

* create a partitioned table with 3 keys and 1000 partitions, which
looks like

        Partitioned table "public.t1_parted"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
 c      | integer |           |          |
 d      | integer |           |          |
Partition key: RANGE (a, b, c)
Number of partitions: 1000 (Use \d+ to list them.)
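
Equivalently, as DDL (a sketch; the partition bounds are illustrative):

CREATE TABLE t1_parted (a int, b int, c int, d int)
    PARTITION BY RANGE (a, b, c);
CREATE TABLE t1_parted_0 PARTITION OF t1_parted
    FOR VALUES FROM (0, 0, 0) TO (10, 10, 10);
-- ... and so on, up to 1000 partitions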

* compose a query involving 5-way joins of this partitioned table, which
looks like:

select * from t1_parted t1
 natural join t1_parted t2
 natural join t1_parted t3
 natural join t1_parted t4
 natural join t1_parted t5
where t1.b = 1 and t1.c = 2;

This query is composed in such a way that it could actually generate a
partitionwise join, because there exists an equi-join condition for each
pair of matching partition keys; but currently master is not able
to generate a partitionwise join, because of the filters 't1.b = 1 and
t1.c = 2', which is the issue fixed by this patch.

* run this query 5 times with enable_partitionwise_join set to on, and
collect the average planning time on master and on patched.

To ensure fairness, on master, a little hack is required to enable the
generation of partitionwise join for this query.  This allows us to
eliminate any potential impact on planning partitionwise joins and
evaluate the effects of this patch accurately.

Below is what I got on my local machine.

-- on master

  measurement  | average  | maximum  | minimum  | std_dev | std_dev_as_perc_of_avg
---------------+----------+----------+----------+---------+------------------------
 planning time | 30355.07 | 33148.47 | 29020.82 | 1681.23 | 5.54%


-- on patched

  measurement  | average  | maximum  | minimum  | std_dev | std_dev_as_perc_of_avg
---------------+----------+----------+----------+---------+------------------------
 planning time | 30600.00 | 33523.23 | 28680.75 | 1861.90 | 6.08%


-- without partitionwise join

  measurement  | average | maximum | minimum | std_dev | std_dev_as_perc_of_avg
---------------+---------+---------+---------+---------+------------------------
 planning time | 4840.18 | 5184.05 | 4528.87 | 299.98  | 6.20%


So it seems that the planning time is not significantly affected by this
patch, particularly when compared to the impact caused by partitionwise
join.

BTW, I was using Ashutosh's script [1] for setting up the benchmarking.
I find the script very handy.

[1]
https://www.postgresql.org/message-id/flat/CAExHW5s%3DbCLMMq8n_bN6iU%2BPjau0DS3z_6Dn6iLE69ESmsPMJQ%40mail.gmail.com

Thanks
Richard


Re: Synchronizing slots from primary to standby

2024-05-08 Thread Bertrand Drouvot
Hi,

On Mon, Apr 29, 2024 at 11:58:09AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 29, 2024 5:11 PM shveta malik  wrote:
> > 
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik 
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Since the standby_slot_names patch has been committed, I am
> > > > > > attaching the last doc patch for review.
> > > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > 1 ===
> > > > >
> > > > > +   continue subscribing to publications now on the new primary
> > > > > + server
> > > > > without
> > > > > +   any data loss.
> > > > >
> > > > > I think "without any data loss" should be re-worded in this
> > > > > context. Data loss in the sense "data committed on the primary and
> > > > > not visible on the subscriber in case of failover" can still occur
> > > > > (in case synchronous replication is not used).
> > > > >
> > > > > 2 ===
> > > > >
> > > > > +   If the result (failover_ready) of both above steps is
> > > > > +   true, existing subscriptions will be able to continue without data loss.
> > > > > +  
> > > > >
> > > > > I don't think that's true if synchronous replication is not used.
> > > > > Say,
> > > > >
> > > > > - synchronous replication is not used
> > > > > - primary is not able to reach the standby anymore and
> > > > > standby_slot_names is set
> > > > > - new data is inserted into the primary
> > > > > - then not replicated to subscriber (due to standby_slot_names)
> > > > >
> > > > > Then I think the both above steps will return true but data would
> > > > > be lost in case of failover.
> > > >
> > > > Thanks for the comments, attach the new version patch which reworded
> > > > the above places.

Thanks!

> Here is the V3 doc patch.

Thanks! A few comments:

1 ===

+   losing any data that has been flushed to the new primary server.

Worth adding a few words about possible data loss, something like:

Please note that if synchronous replication is not used and
standby_slot_names is set correctly, it might still be possible to lose
data that would have been committed on the old primary server (for
example, if the standby was not reachable during that time).

2 ===

+test_sub=# SELECT
+   array_agg(slotname) AS slots
+   FROM
+   ((
+   SELECT r.srsubid AS subid,
+  CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slotname
+   FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+   WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+   ) UNION (

I guess this format comes from ReplicationSlotNameForTablesync(). What about
creating a SQL-callable function on top of it and making use of it in the
query above? (That would ensure the doc stays up to date even if the format
changes in ReplicationSlotNameForTablesync().)
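
A hypothetical sketch of that suggestion; the function name
pg_replication_slot_name_for_tablesync() is made up here, no such SQL
function exists today:

SELECT array_agg(pg_replication_slot_name_for_tablesync(r.srsubid, r.srrelid)) AS slots
FROM pg_subscription_rel r
    JOIN pg_subscription s ON s.oid = r.srsubid
WHERE r.srsubstate = 'f' AND s.subfailover;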

3 ===

+test_sub=# SELECT
+   MAX(remote_lsn) AS remote_lsn_on_subscriber
+   FROM
+   ((
+   SELECT (CASE WHEN r.srsubstate = 'f' THEN
+   pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), false)
+   WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn
+   FROM pg_subscription_rel r, pg_subscription s
+   WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover
+   ) UNION (
+   SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn
+   FROM pg_subscription s
+   WHERE s.subfailover
+   ));

What about adding a join to pg_replication_origin to get rid of the "hardcoded"
format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"?

The idea behind 2 === and 3 === is to have the queries as generic as possible
and not rely on a hardcoded format (that would be more difficult to maintain
should those formats change in the future).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Expand applicability of aggregate's sortop optimization

2024-05-08 Thread Matthias van de Meent
Hi,

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for aggregates that themselves
have an ORDER BY clause, such as MIN(unique1 ORDER BY <anything>),
because <anything> can cause reordering of distinguishable values like
1.0 and 1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.
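
To illustrate with the example from above (a sketch; tenk1 is the
regression-test table, and the internal rewrite is shown as its SQL
equivalent):

-- already optimized today:
SELECT MIN(unique1) FROM tenk1;
-- internally planned like:
SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT 1;

-- newly optimizable with the patch: the ORDER BY expression is the
-- aggregated column, and its sort operator shares a btree opclass with
-- the aggregate's sortop, so the same index plan stays correct:
SELECT MIN(unique1 ORDER BY unique1) FROM tenk1;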

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significand/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

Alternatively, the current code could be changed to build indexed
paths that append the SORT BY paths to the aggregate's sort operator,
but that'd significantly increase complexity here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
From 215600d12164f214aae8f0de94b16373bd202d69 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Thu, 25 Apr 2024 15:23:57 +0200
Subject: [PATCH v1] Plan min()/max()-like aggregates with index accesses in
 more cases

When the aggregate's operator is included in the ORDER BY's ordering
opclass, we know they have a common ordering, and thus should get the
same output (or at least same consistency guarantee for that output).

So, check if the ORDER BY orders things by only the aggregated
expression, and check if that ordering shares a btree opclass with
the aggregator's operator. If so, we can use the index.
---
 src/backend/optimizer/plan/planagg.c | 87 
 src/test/regress/expected/aggregates.out | 65 ++
 src/test/regress/sql/aggregates.sql  | 18 +
 3 files changed, 156 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..d8479fe286 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +277,71 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
+		{
+			SortGroupClause *orderClause;
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			if (list_length(aggref->aggorder) > 1)
+				return false;
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				return false;
+
+			aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+			if (!OidIsValid(aggsortop))
+				return false;		/* not a MIN/MAX aggregate */
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *cell;
+				bool		match = false;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(cell, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+					interpretation = (OpBtreeInterpretation *) lfirst(cell);
+
+					if (!match)
+					{
+						if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+							match = true;
+					}
+					/* Now useless */
+					pfree(interpretation);
+				}
+
+				/* Now not useful anymore */
+				pfree(btclasses);
+
+				if (!match)

Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread Dagfinn Ilmari Mannsåker
Matthias van de Meent  writes:

> PFA the small patch that implements this.

I don't have enough knowledge to have an opinion on most of the patch
other than it looks okay at a glance, but the list API usage could be
updated to more modern variants:

> diff --git a/src/backend/optimizer/plan/planagg.c 
> b/src/backend/optimizer/plan/planagg.c
> index afb5445b77..d8479fe286 100644
> --- a/src/backend/optimizer/plan/planagg.c
> +++ b/src/backend/optimizer/plan/planagg.c
> @@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
>   if (list_length(aggref->args) != 1)
>   return false;   /* it couldn't be MIN/MAX */
>  
> + /*
> +  * We might implement the optimization when a FILTER clause is 
> present
> +  * by adding the filter to the quals of the generated subquery. 
>  For
> +  * now, just punt.
> +  */
> + if (aggref->aggfilter != NULL)
> + return false;
> +
> + curTarget = (TargetEntry *) linitial(aggref->args);

This could be linitial_node(TargetEntry, aggref->args).

> + if (list_length(aggref->aggorder) > 1)
> + return false;
> +
> + orderClause = castNode(SortGroupClause, 
> linitial(aggref->aggorder));

This could be linitial_node(SortGroupClause, aggref->aggorder).

> + if (orderClause->sortop != aggsortop)
> + {
> + List   *btclasses;
> + ListCell *cell;
> + bool    match = false;
> +
> + btclasses = 
> get_op_btree_interpretation(orderClause->sortop);
> +
> + foreach(cell, btclasses)
> + {
> + OpBtreeInterpretation *interpretation;
> + interpretation = (OpBtreeInterpretation 
> *) lfirst(cell);

This could be foreach_ptr(OpBtreeInterpretation, interpretation, btclasses),
which also eliminates the need for the explicit `ListCell *` variable
and lfirst() call.

- ilmari




Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Ranier Vilela
On Wed, 8 May 2024 at 04:37, Nazir Bilal Yavuz wrote:

> Hi Ranier,
>
> Thanks for looking into this!
>
> I am not sure why but your reply does not show up in the thread, so I
> copied your reply and answered it in the thread for visibility.
>
> On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
> >
> > I know it's coming from copy-and-paste, but
> > I believe the flags could be:
> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >
> > The flags:
> > O_WRONLY | O_TRUNC
> >
> > Allow the OS to make some optimizations, if you deem it possible.
> >
> > The flag O_RDWR indicates that the file can be read, which is not true
> in this case.
> > The destination file will just be written.
>
> You may be right about the O_WRONLY flag but I am not sure about the
> O_TRUNC flag.
>
> O_TRUNC from the linux man page [1]: If the file already exists and is
> a regular file and the access mode allows writing (i.e., is O_RDWR or
> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> effect of O_TRUNC is unspecified.
>
O_TRUNC is usually used in conjunction with O_WRONLY.
See the example at:
https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html


O_TRUNC signals the OS to forget the current contents of the file, if it
happens to exist.
In other words, there will be no seeks, only and exclusively writes.


> But whether the destination directory already exists should already be
> checked in the createdb() function in dbcommands.c [2], so this should
> not happen.
>
I'm not sure what you're referring to here.

best regards,
Ranier Vilela


Re: AIX support

2024-05-08 Thread Sriram RK
Hi Team, we have the AIX node ready in the OSU lab, and branches 15 and 16
were built successfully on the node. We have raised a request to register
this node as a buildfarm member, but are yet to receive approval.

We would like to understand your inputs/plans on reverting the changes for AIX.

Thanks,
Sriram.


Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Nazir Bilal Yavuz
Hi,

On Wed, 8 May 2024 at 14:16, Ranier Vilela  wrote:
>
>
> On Wed, 8 May 2024 at 04:37, Nazir Bilal Yavuz wrote:
>>
>> Hi Ranier,
>>
>> Thanks for looking into this!
>>
>> I am not sure why but your reply does not show up in the thread, so I
>> copied your reply and answered it in the thread for visibility.
>>
>> On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
>> >
>> > I know it's coming from copy-and-paste, but
>> > I believe the flags could be:
>> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
>> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC | O_EXCL 
>> > | PG_BINARY);
>> >
>> > The flags:
>> > O_WRONLY | O_TRUNC
>> >
>> > Allow the OS to make some optimizations, if you deem it possible.
>> >
>> > The flag O_RDWR indicates that the file can be read, which is not true in 
>> > this case.
>> > The destination file will just be written.
>>
>> You may be right about the O_WRONLY flag but I am not sure about the
>> O_TRUNC flag.
>>
>> O_TRUNC from the linux man page [1]: If the file already exists and is
>> a regular file and the access mode allows writing (i.e., is O_RDWR or
>> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
>> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
>> effect of O_TRUNC is unspecified.
>
> O_TRUNC is usually used in conjunction with O_WRONLY.
> See the example at:
> > https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html
>
> O_TRUNC signals the OS to forget the current contents of the file, if it 
> happens to exist.
> In other words, there will be no seeks, only and exclusively writes.

You are right; the O_TRUNC flag truncates the file, if it happens to
exist. But it should not exist in the first place because the code at
[2] should check this before the OpenTransientFile() call. Even if we
assume somehow the check at [2] does not work, I do not think the
correct operation is to truncate the contents of the existing file.

>>
>> But whether the destination directory already exists should already be
>> checked in the createdb() function in dbcommands.c [2], so this should
>> not happen.
>
> I'm not sure what you're referring to here.

Sorry, I meant that the destination directory / file should not exist
because the code at [2] confirms it does not exist, otherwise it
errors out.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




RE: Pgoutput not capturing the generated columns

2024-05-08 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for creating a patch! Here are high-level comments.

1.
Please document the feature. If it is hard to describe, we should change the 
API.

2.
Currently, the option is implemented as a streaming option. Are there any
reasons to choose that way? Another approach is to implement it as a slot
option, like failover and temporary.

3.
You said that the subscription option is not supported for now. Not sure, does
it mean that the logical replication feature cannot be used for generated
columns? If so, the restriction won't be acceptable. If the combination of this
and the initial sync is problematic, can't we exclude them in CreateSubscription
and AlterSubscription?
E.g., the create_slot option cannot be set if slot_name is NONE.

4.
Regarding the test_decoding plugin, it has already been able to decode
generated columns. So... in the first place, is the proposed option really
needed for the plugin? Why did you include it?
If you want to add the option anyway, the default value should be on, which
keeps the current behavior.

5.
Assuming that the feature becomes usable for logical replication: I am not
sure, should we change the protocol version at that time? Nodes prior to PG17
may not want to receive values for generated columns. Can we control this by
the option alone?

6. logicalrep_write_tuple()

```
-if (!column_in_column_list(att->attnum, columns))
+if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+continue;
```

Hmm, does the above mean that generated columns are decoded even if they are
not in the column list? If so, why? I think such columns should not be sent.

7.

Some functions refer to data->publish_generated_column many times. Can we
store the value in a variable?

The comments below are for the test_decoding part, but they may not be needed.

=

a. pg_decode_startup()

```
+else if (strcmp(elem->defname, "include_generated_columns") == 0)
```

Other options for test_decoding do not have underscore. It should be
"include-generated-columns".

b. pg_decode_change()

data->include_generated_columns is referred to four times in the function.
Can you store the value in a variable?


c. pg_decode_change()

```
-true);
+true, data->include_generated_columns );
```

Please remove the blank.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Ranier Vilela
On Wed, 8 May 2024 at 08:42, Nazir Bilal Yavuz wrote:

> Hi,
>
> On Wed, 8 May 2024 at 14:16, Ranier Vilela  wrote:
> >
> >
> > On Wed, 8 May 2024 at 04:37, Nazir Bilal Yavuz <
> byavu...@gmail.com> wrote:
> >>
> >> Hi Ranier,
> >>
> >> Thanks for looking into this!
> >>
> >> I am not sure why but your reply does not show up in the thread, so I
> >> copied your reply and answered it in the thread for visibility.
> >>
> >> On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
> >> >
> >> > I know it's coming from copy-and-paste, but
> >> > I believe the flags could be:
> >> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> >> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >> >
> >> > The flags:
> >> > O_WRONLY | O_TRUNC
> >> >
> >> > Allow the OS to make some optimizations, if you deem it possible.
> >> >
> >> > The flag O_RDWR indicates that the file can be read, which is not
> true in this case.
> >> > The destination file will just be written.
> >>
> >> You may be right about the O_WRONLY flag but I am not sure about the
> >> O_TRUNC flag.
> >>
> >> O_TRUNC from the linux man page [1]: If the file already exists and is
> >> a regular file and the access mode allows writing (i.e., is O_RDWR or
> >> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
> >> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> >> effect of O_TRUNC is unspecified.
> >
> > O_TRUNC is usually used in conjunction with O_WRONLY.
> > See the example at:
> > https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html
> >
> > O_TRUNC signals the OS to forget the current contents of the file, if it
> happens to exist.
> > In other words, there will be no seeks, only and exclusively writes.
>
> You are right; the O_TRUNC flag truncates the file, if it happens to
> exist. But it should not exist in the first place because the code at
> [2] should check this before the OpenTransientFile() call. Even if we
> assume somehow the check at [2] does not work, I do not think the
> correct operation is to truncate the contents of the existing file.
>
I don't know if there is a communication problem here.
But what I'm trying to suggest refers to the destination file;
does it matter whether it exists or not?

If the destination file does not exist, O_TRUNC is ignored.
If the destination file exists, O_TRUNC truncates the current contents of
the file.
I don't see why you think it's a problem to truncate the current content if
the destination file exists.
Isn't it going to be replaced anyway?

Unless we want to preserve the current content (destination file), in case
the copy/clone fails?

best regards,
Ranier Vilela


Re: PERIOD foreign key feature

2024-05-08 Thread Peter Eisentraut

On 07.05.24 18:43, Paul Jungwirth wrote:

On 5/7/24 08:23, David G. Johnston wrote:
On Tue, May 7, 2024 at 7:54 AM Bruce Momjian wrote:

    In the two marked lines, it says "if one side of the foreign key uses
    PERIOD, the other side must too."  However, looking at the example
    queries, it seems like if the foreign side has PERIOD, the primary side
    must have WITHOUT OVERLAPS, not PERIOD.

    Does this doc text need correcting?


The text is factually correct, though a bit hard to parse.

"the other side" refers to the part after "REFERENCES":

FOREIGN KEY ( column_name [, ... ] [, PERIOD column_name ] ) 
REFERENCES reftable [ ( refcolumn [, ... ] [, PERIOD column_name ] ) ]


***(shouldn't the second occurrence be [, PERIOD refcolumn] ?)

The text is pointing out that since the refcolumn specification is 
optional, you may very well not see a second PERIOD keyword in the 
clause.  Instead it will be inferred from the PK.


Maybe:

Finally, if the foreign key has a PERIOD column_name specification, the 
corresponding refcolumn, if present, must also be marked PERIOD.  If 
the refcolumn clause is omitted, and thus the reftable's primary key 
constraint chosen, the primary key must have its final column marked 
WITHOUT OVERLAPS.
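
For illustration, a minimal sketch of the two variants being described,
following the synopsis quoted above (table and column names are made up):

CREATE TABLE booking (
    room_id  int,
    valid_at daterange,
    PRIMARY KEY (room_id, valid_at WITHOUT OVERLAPS)
);

-- refcolumn list present: PERIOD must appear on both sides
CREATE TABLE booking_guest (
    room_id  int,
    valid_at daterange,
    FOREIGN KEY (room_id, PERIOD valid_at)
        REFERENCES booking (room_id, PERIOD valid_at)
);

-- refcolumn list omitted: no second PERIOD keyword appears, and the
-- referenced PK's final column must be marked WITHOUT OVERLAPS:
-- FOREIGN KEY (room_id, PERIOD valid_at) REFERENCES booking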


Yes, David is correct here on all points. I like his suggestion to 
clarify the language here also. If you need a patch from me let me know, 
but I assume it's something a committer can just make happen?


In principle yes, but it's also very helpful if someone produces an 
actual patch file, with complete commit message, credits, mailing list 
link, etc.






Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Nazir Bilal Yavuz
Hi,

On Wed, 8 May 2024 at 15:23, Ranier Vilela  wrote:
>
> Em qua., 8 de mai. de 2024 às 08:42, Nazir Bilal Yavuz  
> escreveu:
>>
>> Hi,
>>
>> On Wed, 8 May 2024 at 14:16, Ranier Vilela  wrote:
>> >
>> >
>> > On Wed, 8 May 2024 at 04:37, Nazir Bilal Yavuz wrote:
>> >>
>> >> Hi Ranier,
>> >>
>> >> Thanks for looking into this!
>> >>
>> >> I am not sure why but your reply does not show up in the thread, so I
>> >> copied your reply and answered it in the thread for visibility.
>> >>
>> >> On Tue, 7 May 2024 at 16:28, Ranier Vilela  wrote:
>> >> >
>> >> > I know it's coming from copy-and-paste, but
>> >> > I believe the flags could be:
>> >> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | 
>> >> > PG_BINARY);
>> >> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC | 
>> >> > O_EXCL | PG_BINARY);
>> >> >
>> >> > The flags:
>> >> > O_WRONLY | O_TRUNC
>> >> >
>> >> > Allow the OS to make some optimizations, if you deem it possible.
>> >> >
>> >> > The flag O_RDWR indicates that the file can be read, which is not true 
>> >> > in this case.
>> >> > The destination file will just be written.
>> >>
>> >> You may be right about the O_WRONLY flag but I am not sure about the
>> >> O_TRUNC flag.
>> >>
>> >> O_TRUNC from the linux man page [1]: If the file already exists and is
>> >> a regular file and the access mode allows writing (i.e., is O_RDWR or
>> >> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO  or
>> >> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
>> >> effect of O_TRUNC is unspecified.
>> >
>> > O_TRUNC is usually used in conjunction with O_WRONLY.
>> > See the example at:
>> > https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html
>> >
>> > O_TRUNC signals the OS to forget the current contents of the file, if it 
>> > happens to exist.
>> > In other words, there will be no seeks, only and exclusively writes.
>>
>> You are right; the O_TRUNC flag truncates the file, if it happens to
>> exist. But it should not exist in the first place because the code at
>> [2] should check this before the OpenTransientFile() call. Even if we
>> assume somehow the check at [2] does not work, I do not think the
>> correct operation is to truncate the contents of the existing file.
>
> I don't know if there is a communication problem here.
> But what I'm trying to suggest refers to the destination file,
> which doesn't matter if it exists or not?

I do not think there is a communication problem. Actually, it matters
because the destination file should not exist; there is code at [2]
which already checks and confirms that it does not exist.

>
> If the destination file does not exist, O_TRUNC is ignored.
> If the destination file exists, O_TRUNC truncates the current contents of the 
> file.
> I don't see why you think it's a problem to truncate the current content if 
> the destination file exists.
> Isn't it going to be replaced anyway?

'If the destination file does not exist' means the code at [2] works
well and we do not need the O_TRUNC flag.

'If the destination file already exists' means the code at [2] is
broken somehow and there is a high chance that we could truncate
something that we do not want to. For example, there is a foo db and
we want to create bar db, Postgres chose the foo db's location as the
destination of the bar db (which should not happen but let's assume
somehow checks at [2] failed), then we could wrongly truncate the foo
db's contents.

Hence, if Postgres works successfully I think the O_TRUNC flag is
unnecessary but if Postgres does not work successfully, the O_TRUNC
flag could cause harm.

>
> Unless we want to preserve the current content (destination file), in case 
> the copy/clone fails?

Like I said above, Postgres should not choose the existing file as the
destination file.

Also, we have O_CREAT | O_EXCL flags together, from the link [3] you
shared before: If O_CREAT and O_EXCL are set, open() shall fail if the
file exists. So, overwriting the already existing file is already
prevented.

[3] https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Ranier Vilela
On Wed, 8 May 2024 at 10:06, Nazir Bilal Yavuz wrote:

> Hi,
>
> On Wed, 8 May 2024 at 15:23, Ranier Vilela  wrote:
> >
> > On Wed, 8 May 2024 at 08:42, Nazir Bilal Yavuz <
> byavu...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On Wed, 8 May 2024 at 14:16, Ranier Vilela  wrote:
> >> >
> >> >
> >> > On Wed, 8 May 2024 at 04:37, Nazir Bilal Yavuz <
> byavu...@gmail.com> wrote:
> >> >>
> >> >> Hi Ranier,
> >> >>
> >> >> Thanks for looking into this!
> >> >>
> >> >> I am not sure why but your reply does not show up in the thread, so I
> >> >> copied your reply and answered it in the thread for visibility.
> >> >>
> >> >> On Tue, 7 May 2024 at 16:28, Ranier Vilela 
> wrote:
> >> >> >
> >> >> > I know it's coming from copy-and-paste, but
> >> >> > I believe the flags could be:
> >> >> > - dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL |
> PG_BINARY);
> >> >> > + dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC |
> O_EXCL | PG_BINARY);
> >> >> >
> >> >> > The flags:
> >> >> > O_WRONLY | O_TRUNC
> >> >> >
> >> >> > Allow the OS to make some optimizations, if you deem it possible.
> >> >> >
> >> >> > The flag O_RDWR indicates that the file can be read, which is not
> true in this case.
> >> >> > The destination file will just be written.
> >> >>
> >> >> You may be right about the O_WRONLY flag but I am not sure about the
> >> >> O_TRUNC flag.
> >> >>
> >> >> O_TRUNC from the linux man page [1]: If the file already exists and
> is
> >> >> a regular file and the access mode allows writing (i.e., is O_RDWR or
> >> >> O_WRONLY) it will be truncated to length 0.  If the file is a FIFO
> or
> >> >> terminal device file, the O_TRUNC flag is ignored. Otherwise, the
> >> >> effect of O_TRUNC is unspecified.
> >> >
> >> > O_TRUNC is usually used in conjunction with O_WRONLY.
> >> > See the example at:
> >> > https://pubs.opengroup.org/onlinepubs/009696699/functions/open.html
> >> >
> >> > O_TRUNC signals the OS to forget the current contents of the file, if
> it happens to exist.
> >> > In other words, there will be no seeks, only and exclusively writes.
> >>
> >> You are right; the O_TRUNC flag truncates the file, if it happens to
> >> exist. But it should not exist in the first place because the code at
> >> [2] should check this before the OpenTransientFile() call. Even if we
> >> assume somehow the check at [2] does not work, I do not think the
> >> correct operation is to truncate the contents of the existing file.
> >
> > I don't know if there is a communication problem here.
> > But what I'm trying to suggest refers to the destination file,
> > which doesn't matter if it exists or not?
>
> I do not think there is a communication problem. Actually it matters
> because the destination file should not exist, there is a code [2]
> which already checks and confirms that it does not exist.
>
I got it.


>
> >
> > If the destination file does not exist, O_TRUNC is ignored.
> > If the destination file exists, O_TRUNC truncates the current contents
> of the file.
> > I don't see why you think it's a problem to truncate the current content
> if the destination file exists.
> > Isn't it going to be replaced anyway?
>
> 'If the destination file does not exist' means the code at [2] works
> well and we do not need the O_TRUNC flag.
>
True, the O_TRUNC is ignored in this case.


>
> 'If the destination file already exists' means the code at [2] is
> broken somehow and there is a high chance that we could truncate
> something that we do not want to. For example, there is a foo db and
> we want to create bar db, Postgres chose the foo db's location as the
> destination of the bar db (which should not happen but let's assume
> somehow checks at [2] failed), then we could wrongly truncate the foo
> db's contents.
>
Of course, truncating the wrong file would be pretty bad.


>
> Hence, if Postgres works successfully I think the O_TRUNC flag is
> unnecessary but if Postgres does not work successfully, the O_TRUNC
> flag could cause harm.
>
The disaster will happen anyway, but of course we can help in some way.
Even without truncating, the wrong file will be destroyed anyway.


> >
> > Unless we want to preserve the current content (destination file), in
> case the copy/clone fails?
>
> Like I said above, Postgres should not choose the existing file as the
> destination file.
>
> Also, we have O_CREAT | O_EXCL flags together, from the link [3] you
> shared before: If O_CREAT and O_EXCL are set, open() shall fail if the
> file exists. So, overwriting the already existing file is already
> prevented.
>
That said, I agree that not using O_TRUNC helps in some way.

best regards,
Ranier Vilela


Re: AIX support

2024-05-08 Thread Peter Eisentraut

On 08.05.24 13:39, Sriram RK wrote:
We would like to understand your inputs/plans on reverting the changes 
for AIX.


I think the ship has sailed for PG17.  The way forward would be that you 
submit a patch for new, modernized AIX support for PG18.






Re: Fix parallel vacuum buffer usage reporting

2024-05-08 Thread Masahiko Sawada
On Fri, May 3, 2024 at 3:41 PM Anthonin Bonnefoy
 wrote:
>
> On Wed, May 1, 2024 at 5:37 AM Masahiko Sawada  wrote:
>>
>> Thank you for further testing! I've pushed the patch.
>
> Thanks!
>
> Here is the rebased version for the follow-up patch removing VacuumPage 
> variables. Though I'm not sure if I should create a dedicated mail thread 
> since the bug was fixed and the follow-up is more of a refactoring. What do 
> you think?

I'd suggest starting a new thread or changing the subject as the
current subject no longer matches what we're discussing.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: SQL:2011 application time

2024-05-08 Thread Peter Eisentraut

On 30.04.24 18:39, Paul Jungwirth wrote:

On 4/30/24 09:24, Robert Haas wrote:

Peter, could you have a look at
http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
and express an opinion about whether each of those proposals are (a)
good or bad ideas and (b) whether they need to be fixed for the
current release?


Here are the same patches but rebased.


I have committed 
v2-0002-Add-test-for-REPLICA-IDENTITY-with-a-temporal-key.patch.

About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think 
the
ideas are right, but I wonder if we can fine-tune the new conditionals a bit.

--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool 
speculative)
 * If the indexes are to be used for speculative insertion, add 
extra
 * information required by unique index entries.
 */
-   if (speculative && ii->ii_Unique)
+   if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion instead.  So 
we
wouldn't need ii_HasWithoutOverlaps.

Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the 
rest
if an exclusion constraint is passed, on the theory that all the speculative 
index
info is already present in that case.

--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 */
if (indexOidFromConstraint == idxForm->indexrelid)
{
-   if (!idxForm->indisunique && onconflict->action == 
ONCONFLICT_UPDATE)
+   if ((!idxForm->indisunique || idxForm->indisexclusion) && 
onconflict->action == ONCONFLICT_UPDATE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("ON CONFLICT DO UPDATE not supported with exclusion 
constraints")));

Shouldn't this use only idxForm->indisexclusion anyway?  Like

+   if (idxForm->indisexclusion && onconflict->action == 
ONCONFLICT_UPDATE)

That matches what the error message is reporting afterwards.

 * constraints), so index under consideration can be immediately
 * skipped if it's not unique
 */
-   if (!idxForm->indisunique)
+   if (!idxForm->indisunique || idxForm->indisexclusion)
goto next;

Maybe here we need a comment.  Or make that a separate statement, like

/* not supported yet etc. */
if (idxForm->indisexclusion)
    goto next;
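
For reference, a hedged sketch of the user-visible case these conditionals
guard, assuming the temporal-key syntax from this patch series (t_pkey is
the assumed default constraint name; error text as quoted above):

CREATE TABLE t (
    id int,
    valid_at daterange,
    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

INSERT INTO t VALUES (1, daterange('2024-01-01', '2024-02-01'))
ON CONFLICT ON CONSTRAINT t_pkey DO UPDATE SET id = EXCLUDED.id;
-- expected: ERROR:  ON CONFLICT DO UPDATE not supported with exclusion constraints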





Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Robert Haas
On Tue, May 7, 2024 at 8:00 AM Nazir Bilal Yavuz  wrote:
> We had an off-list talk with Thomas and we thought making this option
> GUC instead of SQL command level could solve this problem.
>
> I am posting a new rebased version of the patch with some important changes:
>
> * 'createdb_file_copy_method' GUC is created. Possible values are
> 'copy' and 'clone'. Copy is the default option. Clone option can be
> chosen if the system supports it, otherwise it gives error at the
> startup.

This seems like a smart idea, because the type of file copying that we
do during redo need not be the same as what was done when the
operation was originally performed.

I'm not so sure about the GUC name. On the one hand, it feels like
createdb should be spelled out as create_database, but on the other
hand, the GUC name is quite long already. Then again, why would we
make this specific to CREATE DATABASE in the first place? Would we
also want alter_tablespace_file_copy_method and
basic_archive.file_copy_method?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-08 Thread Jacob Champion
On Tue, May 7, 2024 at 10:31 PM Michael Paquier  wrote:
> But looking closer, I can see that in the JSON_INVALID_TOKEN case,
> when !tok_done, we set token_terminator to point to the end of the
> token, and that would include an incomplete byte sequence like in your
> case.  :/

Ah, I see what you're saying. Yeah, that approach would need some more
invasive changes.

> This situation makes me
> uncomfortable and we should put more effort in printing error messages
> in a readable format, but that could always be tackled later as a
> separate problem..  And I don't see something backpatchable at short
> sight for v16.

Agreed. Fortunately (or unfortunately?) I think the JSON
client-encoding work is now a prerequisite for OAuth in libpq, so
hopefully some improvements can fall out of that work too.

> Thoughts and/or objections?

None here.

Thanks!
--Jacob




Re: AIX support

2024-05-08 Thread Bruce Momjian
On Wed, May  8, 2024 at 03:44:12PM +0200, Peter Eisentraut wrote:
> On 08.05.24 13:39, Sriram RK wrote:
> > We would like to understand your inputs/plans on reverting the changes
> > for AIX.
> 
> I think the ship has sailed for PG17.  The way forward would be that you
> submit a patch for new, modernized AIX support for PG18.

Yes, I think we were clear that someone needs to review the reverted
patch and figure out which parts are still needed, and why.  We have no
"plans" to restore support.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: bug: copy progress reporting of backends which run multiple COPYs

2024-05-08 Thread Robert Haas
On Tue, May 7, 2024 at 9:12 PM Michael Paquier  wrote:
> FWIW, I'm rather annoyed by the fact that we rely on the ParseState to
> decide if reporting should happen or not.  file_fdw tells, even if
> that's accidental, that status reporting can be useful if working on a
> single table.  So, shutting down the whole reporting if a caller if
> BeginCopyFrom() does not give a ParseState is too heavy-handed, IMO.

I think you're hoping for too much. The progress reporting
infrastructure is fundamentally designed around the idea that there
can only be one progress-reporting operation in progress at a time.
For COPY, that is, I believe, true, but for file_fdw, it's false. If
we want to do any kind of progress reporting from within plannable
queries, we need some totally different and much more complex
infrastructure that can report progress for, probably, each plan node
individually. I think it's likely a mistake to try to shoehorn cases
like this into the infrastructure
that we have today. It will just encourage people to try to use the
current infrastructure in ways that are less and less like what it was
actually designed to do; whereas what we should be doing if we want
this kind of functionality, at least IMHO, is building infrastructure
that's actually fit for purpose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CREATE DATABASE with filesystem cloning

2024-05-08 Thread Nazir Bilal Yavuz
Hi,

On Wed, 8 May 2024 at 16:58, Robert Haas  wrote:
>
> On Tue, May 7, 2024 at 8:00 AM Nazir Bilal Yavuz  wrote:
> > We had an off-list talk with Thomas and we thought making this option
> > GUC instead of SQL command level could solve this problem.
> >
> > I am posting a new rebased version of the patch with some important changes:
> >
> > * 'createdb_file_copy_method' GUC is created. Possible values are
> > 'copy' and 'clone'. Copy is the default option. Clone option can be
> > chosen if the system supports it, otherwise it gives error at the
> > startup.
>
> This seems like a smart idea, because the type of file copying that we
> do during redo need not be the same as what was done when the
> operation was originally performed.
>
> I'm not so sure about the GUC name. On the one hand, it feels like
> createdb should be spelled out as create_database, but on the other
> hand, the GUC name is quite long already. Then again, why would we
> make this specific to CREATE DATABASE in the first place? Would we
> also want alter_tablespace_file_copy_method and
> basic_archive.file_copy_method?

I agree that it is already quite long; because of that I chose the
createdb prefix. I did not think that file cloning was planned to
be used in other places. If that is the case, does something like
'preferred_copy_method' work? Then, we would mention in the docs which
places are affected by this GUC.
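
(A minimal usage sketch with the GUC as named in the current patch; this
is a guess at intended usage, and per the patch 'clone' is only accepted
on systems that support it:)

SET createdb_file_copy_method = 'clone';
CREATE DATABASE newdb TEMPLATE = template0 STRATEGY = file_copy;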

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: PERIOD foreign key feature

2024-05-08 Thread Bruce Momjian
On Wed, May  8, 2024 at 02:29:34PM +0200, Peter Eisentraut wrote:
> > > Finally, if the foreign key has a PERIOD column_name specification
> > > the corresponding refcolumn, if present, must also be marked
> > > PERIOD.  If the refcolumn clause is omitted, and thus the reftable's
> > > primary key constraint chosen, the primary key must have its final
> > > column marked WITHOUT OVERLAPS.
> > 
> > Yes, David is correct here on all points. I like his suggestion to
> > clarify the language here also. If you need a patch from me let me know,
> > but I assume it's something a committer can just make happen?
> 
> In principle yes, but it's also very helpful if someone produces an actual
> patch file, with complete commit message, credits, mailing list link, etc.

I am ready to do the work, but waited a day for Peter to reply, since he
was the author of the text.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Trigger violates foreign key constraint

2024-05-08 Thread Tom Lane
Aleksander Alekseev  writes:
>> Perhaps we should leave the system triggers out of the discussion
>> entirely?  More or less like:
>> 
>> If a foreign key constraint specifies referential actions (that
>> is, cascading updates or deletes), those actions are performed via
>> ordinary SQL update or delete commands on the referencing table.
>> In particular, any triggers that exist on the referencing table
>> will be fired for those changes.  If such a trigger modifies or
>> blocks the effect of one of these commands, the end result could
>> be to break referential integrity.  It is the trigger programmer's
>> responsibility to avoid that.

> That's perfect!

Hearing no further comments, done like that.

regards, tom lane




Re: UUID v7

2024-05-08 Thread Andrey M. Borodin



> On 3 May 2024, at 11:18, Andrey M. Borodin  wrote:
> 
> RFC 9562 is not in AUTH48-Done state, it was approved by authors and editor, 
> and now should be published.

It's RFC now.
https://datatracker.ietf.org/doc/rfc9562/


Best regards, Andrey Borodin.



Re: Fix for recursive plpython triggers

2024-05-08 Thread Tom Lane
Andreas Karlsson  writes:
> I found this issue while reading the code, so am very unclear if there 
> is any sane code which could trigger it.

> In the example below the recursive call to f('int') changes the return 
> type of the f('text') call causing it to fail.

> # CREATE OR REPLACE FUNCTION f(t text) RETURNS record LANGUAGE 
> plpython3u AS $$
> if t == "text":
>  plpy.execute("SELECT * FROM f('int') AS (a int)");
>  return { "a": "x" }
> elif t == "int":
>  return { "a": 1 }
> $$;
> CREATE FUNCTION

> # SELECT * FROM f('text') AS (a text);
> ERROR:  invalid input syntax for type integer: "x"
> CONTEXT:  while creating return value
> PL/Python function "f"

Oh, nice one.  I think we can fix this trivially though: the problem
is that RECORD return-type setup was stuck into PLy_function_build_args,
where it has no particular business being in the first place, rather
than being done at the point of use.  We can just move the code.

regards, tom lane

diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 6f7b5e121d..157229e96f 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -231,7 +231,23 @@ PLy_exec_function(FunctionCallInfo fcinfo, PLyProcedure *proc)
 		}
 		else
 		{
-			/* Normal conversion of result */
+			/*
+			 * Normal conversion of result.  However, if the result is of type
+			 * RECORD, we have to set up for that each time through, since it
+			 * might be different from last time.
+			 */
+			if (proc->result.typoid == RECORDOID)
+			{
+TupleDesc	desc;
+
+if (get_call_result_type(fcinfo, NULL, &desc) != TYPEFUNC_COMPOSITE)
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("function returning record called in context "
+	"that cannot accept type record")));
+PLy_output_setup_record(&proc->result, desc, proc);
+			}
+
 			rv = PLy_output_convert(&proc->result, plrv,
 	&fcinfo->isnull);
 		}
@@ -456,21 +472,6 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
 PLy_elog(ERROR, "PyDict_SetItemString() failed, while setting up arguments");
 			arg = NULL;
 		}
-
-		/* Set up output conversion for functions returning RECORD */
-		if (proc->result.typoid == RECORDOID)
-		{
-			TupleDesc	desc;
-
-			if (get_call_result_type(fcinfo, NULL, &desc) != TYPEFUNC_COMPOSITE)
-ereport(ERROR,
-		(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-		 errmsg("function returning record called in context "
-"that cannot accept type record")));
-
-			/* cache the output conversion functions */
-			PLy_output_setup_record(&proc->result, desc, proc);
-		}
 	}
 	PG_CATCH();
 	{


Re: 2024-05-09 release announcement draft

2024-05-08 Thread Jonathan S. Katz

On 5/7/24 12:16 AM, Tom Lane wrote:

David Rowley  writes:

Why not "Fix INSERT with multi-row VALUES clauses ..."?


To my mind, the VALUES clause is the data source for INSERT,
so "from" seems appropriate.  I'm not going to argue hard
about it.


OK, so I've read through this a few times and have sufficiently confused 
myself. So, how about this:


* Fix how 
[`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html) 
handles multiple 
[`VALUES`](https://www.postgresql.org/docs/current/sql-values.html) rows 
into a target column that is a domain over an array or composite type.


Thanks,

Jonathan




Re: New GUC autovacuum_max_threshold ?

2024-05-08 Thread Imseih (AWS), Sami
> This is about how I feel, too. In any case, I +1'd a higher default
> because I think we need to be pretty conservative with these changes, at
> least until we have a better prioritization strategy. While folks may opt
> to set this value super low, I think that's more likely to lead to some
> interesting secondary effects. If the default is high, hopefully these
> secondary effects will be minimized or avoided.


There is also an alternative of making this GUC -1 by default, which
means it has no effect, and any larger value will be used in the
threshold calculation of autovacuum. A user will have to be careful not
to set it too low, but that is going to be a concern either way.


This idea maybe worth considering as it does not change the default
behavior of the autovac threshold calculation, and if a user has cases in 
which they have many tables with a few billion tuples that they wish to 
see autovacuumed more often, they now have a GUC to make 
that possible and potentially avoid per-table threshold configuration.


Also, I think coming up with a good default will be challenging,
and perhaps this idea is a good middle ground.
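
To make the -1 semantics concrete, here is a rough sketch of how the cap
could fold into the existing threshold computation in autovacuum.c
(variable names follow relation_needs_vacanalyze(); the GUC name is the
one proposed in this thread, so treat this as illustration only):

/* Sketch: a value < 0 disables the cap, preserving today's behavior. */
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;

if (autovacuum_max_threshold >= 0 &&
    vacthresh > (float4) autovacuum_max_threshold)
    vacthresh = (float4) autovacuum_max_threshold;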


Regards,

Sami 





Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-08 Thread Alexander Korotkov
On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov  wrote:
> On Fri, May 3, 2024 at 4:23 PM Justin Pryzby  wrote:
> > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote:
> > > 30.04.2024 23:15, Justin Pryzby writes:
> > > > Is this issue already fixed ?
> > > > I wasn't able to reproduce it.  Maybe it only happened with earlier
> > > > patch versions applied ?
> > >
> > > I think this was fixed in commit [1].
> > >
> > > [1] 
> > > https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91
> >
> > I tried to reproduce it at fcf80c5d5f~, but couldn't.
> > I don't see how that patch would fix it anyway.
> > I'm hoping Alexander can confirm what happened.
>
> This problem is only relevant for an old version of fix [1], which
> overrides schemas for new partitions.  That version was never
> committed.

Here are the patches.
0001 Adds permission checks on the partitions before doing MERGE/SPLIT
0002 Skips copying extended statistics while creating new partitions
in MERGE/SPLIT

0001 looks quite simple and trivial for me.  I'm going to push it if
no objections.
For 0002 I'd like to hear some feedback on wordings used in docs and comments.

--
Regards,
Alexander Korotkov
Supabase


v1-0002-Don-t-copy-extended-statistics-during-MERGE-SPLIT.patch
Description: Binary data


v1-0001-Add-permission-check-for-MERGE-SPLIT-partition-op.patch
Description: Binary data


Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread Tom Lane
I traced down the other failure I was seeing in check-world, and
found that it came from deconstruct_distribute doing this:

distribute_quals_to_rels(root, my_quals,
                         jtitem,
                         sjinfo,
                         root->qual_security_level,
                         jtitem->qualscope,
                         ojscope, jtitem->nonnullable_rels,
                         NULL,          /* incompatible_relids */
                         true,          /* allow_equivalence */
                         false, false,  /* not clones */
                         postponed_oj_qual_list);

where jtitem->nonnullable_rels is the same as the jtitem's left_rels,
which ends up as the syn_lefthand of the join's SpecialJoinInfo, and
then when remove_rel_from_query tries to adjust the syn_lefthand it
breaks the outer_relids of whatever RestrictInfos got made here.

I was able to fix that by not letting jtitem->nonnullable_rels be
the same as left_rels.  The attached alternative_1.patch does pass
check-world.  But I find it mighty unprincipled: the JoinTreeItem
data structures are full of shared relid sets, so why is this
particular sharing not OK?  I still don't have any confidence that
there aren't more problems.

Along about here I started to wonder how come we are only seeing
SpecialJoinInfo-vs-RestrictInfo sharing as a problem, when surely
there is plenty of cross-RestrictInfo sharing going on as well.
(The above call is perfectly capable of making a lot of RestrictInfos,
all with the same outer_relids.)  That thought led me to look at
remove_rel_from_restrictinfo, and darn if I didn't find this:

/*
 * The clause_relids probably aren't shared with anything else, but let's
 * copy them just to be sure.
 */
rinfo->clause_relids = bms_copy(rinfo->clause_relids);
...
/* Likewise for required_relids */
rinfo->required_relids = bms_copy(rinfo->required_relids);

So the reason we don't see cross-RestrictInfo breakage is that
analyzejoins.c is careful not to modify the original relid sets
when modifying a RestrictInfo.  (This comment is clearly wrong.)

And that leads to the thought that we can fix our current sharing
problems by similarly avoiding overwriting the original sets
in remove_rel_from_query.  The attached alternative-2.patch
attacks it that way, and also passes check-world.

I like alternative-2.patch a lot better, not least because it
only adds cycles when join removal actually fires.  Basically
this is putting the onus on the data structure modifier to
cope with shared bitmapsets, rather than trying to say that
sharing is disallowed.

Thoughts?

regards, tom lane

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..4198e9c83a 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -979,7 +979,7 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode,
 	right_item->inner_join_rels);
 jtitem->left_rels = left_item->qualscope;
 jtitem->right_rels = right_item->qualscope;
-jtitem->nonnullable_rels = left_item->qualscope;
+jtitem->nonnullable_rels = bms_copy(left_item->qualscope);
 break;
 			case JOIN_SEMI:
 /* This node belongs to parent_domain, as do its children */
@@ -1053,7 +1053,7 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode,
 jtitem->left_rels = left_item->qualscope;
 jtitem->right_rels = right_item->qualscope;
 /* each side is both outer and inner */
-jtitem->nonnullable_rels = jtitem->qualscope;
+jtitem->nonnullable_rels = bms_copy(jtitem->qualscope);
 break;
 			default:
 /* JOIN_RIGHT was eliminated during reduce_outer_joins() */
@@ -1888,7 +1888,7 @@ deconstruct_distribute_oj_quals(PlannerInfo *root,
 	qualscope = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand);
 	qualscope = bms_add_member(qualscope, sjinfo->ojrelid);
 	ojscope = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
-	nonnullable_rels = sjinfo->syn_lefthand;
+	nonnullable_rels = bms_copy(sjinfo->syn_lefthand);
 
 	/*
 	 * If this join can commute with any other ones per outer-join identity 3,
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index aa72592567..1c9acf864c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -390,6 +390,17 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
 	{
 		SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
 
+		/*
+		 * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
+		 * lefthand/righthand relid sets to be shared with other data
+		 * structures.  Ensure that we don't modify the original relid sets.
+		 * (The commute_xxx sets are always per-SpecialJoinInfo though.)
+		 */
+		sjinf->min_lefthand = bms_copy

Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

2024-05-08 Thread Nathan Bossart
On Wed, May 08, 2024 at 10:09:46AM +0200, Peter Eisentraut wrote:
> On 03.05.24 19:13, Nathan Bossart wrote:
>> This is likely small potatoes compared to some of the other
>> pg_upgrade-related improvements I've proposed [0] [1] or plan to propose,
>> but this is easy enough, and I already wrote the patch, so here it is.
>> AFAICT there's no reason to bother syncing these dump files to disk.  If
>> someone pulls the plug during pg_upgrade, it's not like you can resume
>> pg_upgrade from where it left off.  Also, I think we skipped syncing before
>> v10, anyway, as the --no-sync flag was only added in commit 96a7128, which
>> added the code to sync dump files, too.
> 
> Looks good to me.

Thanks for looking.  I noticed that the version check is unnecessary since
we always use the new binary's pg_dump[all], so I removed that in v2.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 265a999ed65bf56491f76ae013f705ab64491486 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 3 May 2024 10:35:21 -0500
Subject: [PATCH v2 1/1] add --no-sync to pg_upgrade's calls to pg_dump[all]

---
 src/bin/pg_upgrade/dump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 29fb45b928..8345f55be8 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -22,7 +22,7 @@ generate_old_dump(void)
 	/* run new pg_dumpall binary for globals */
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers "
-			  "--binary-upgrade %s -f \"%s/%s\"",
+			  "--binary-upgrade %s --no-sync -f \"%s/%s\"",
 			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
 			  log_opts.verbose ? "--verbose" : "",
 			  log_opts.dumpdir,
@@ -53,7 +53,7 @@ generate_old_dump(void)
 
 		parallel_exec_prog(log_file_name, NULL,
 		   "\"%s/pg_dump\" %s --schema-only --quote-all-identifiers "
-		   "--binary-upgrade --format=custom %s --file=\"%s/%s\" %s",
+		   "--binary-upgrade --format=custom %s --no-sync --file=\"%s/%s\" %s",
 		   new_cluster.bindir, cluster_conn_opts(&old_cluster),
 		   log_opts.verbose ? "--verbose" : "",
 		   log_opts.dumpdir,
-- 
2.25.1



Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread Tom Lane
BTW, now that I've wrapped my head around what's happening here,
I believe that -DREALLOCATE_BITMAPSETS is introducing a bug where
there was none before.  The changes that left-join removal makes
won't cause any of these sets to go to empty, so the bms_del_member
calls won't free the sets but just modify them in-place.  And the
same change will/should be made in every relevant relid set, so
the fact that the sets may be shared isn't hurting anything.

This is, of course, pretty fragile and I'm totally on board with
making it safer.  But there's no live bug in released branches,
and not in HEAD either unless you add -DREALLOCATE_BITMAPSETS.
That accounts for the lack of related field reports, and it means
we don't really need to back-patch anything.

This conclusion also reinforces my previously-vague feeling that
we should not consider making -DREALLOCATE_BITMAPSETS the default in
debug builds, as was proposed upthread.  It's really a fundamentally
different behavior, and I strongly suspect that it can mask bugs as
well as introduce them (by hiding sharing in cases that'd be less
benign than this turns out to be).  I'd rather not do development on
top of bitmapset infrastructure that acts entirely different from
production bitmapset infrastructure.

regards, tom lane




Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

2024-05-08 Thread Tom Lane
Nathan Bossart  writes:
> Thanks for looking.  I noticed that the version check is unnecessary since
> we always use the new binary's pg_dump[all], so I removed that in v2.

+1

regards, tom lane




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-08 Thread Alexander Korotkov
On Wed, May 1, 2024 at 12:14 AM Dmitry Koval  wrote:
> 30.04.2024 6:00, Alexander Lakhin writes:
> > Maybe I'm doing something wrong, but the following script:
> > CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
> > CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
> > CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
> >
> > CREATE TABLE t2 (LIKE t INCLUDING ALL);
> > CREATE TABLE tp2 (LIKE tp_0 INCLUDING ALL);
> > creates tables t2, tp2 without not-null constraints.
>
> The "CREATE TABLE ... LIKE ..." command with the "EXCLUDING INDEXES"
> modifier is used to create partitions (to speed up the insertion of
> values).
>
> CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE(i);
> CREATE TABLE t2 (LIKE t INCLUDING ALL EXCLUDING INDEXES EXCLUDING
> IDENTITY);
> \d+ t2;
> ...
> Not-null constraints:
>  "t2_i_not_null" NOT NULL "i"
> Access method: heap

I've explored this a little bit more.

If the parent table has explicit not null constraint than results of
MERGE/SPLIT look the same as result of CREATE TABLE ... PARTITION OF.  In
every case there is explicit not null constraint in all the cases.

# CREATE TABLE t (i int not null, PRIMARY KEY(i)) PARTITION BY RANGE(i);
# \d+ t
  Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           | not null |         | plain   |             |              |
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i"
Number of partitions: 0
# CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
# \d+ tp_0_2
 Table "public.tp_0_2"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           | not null |         | plain   |             |              |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))
Indexes:
"tp_0_2_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i" (inherited)
Access method: heap
# ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
#(PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
# PARTITION tp_1_2 FOR VALUES FROM (1) TO (2))
# \d+ tp_0_1
 Table "public.tp_0_1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           | not null |         | plain   |             |              |
Partition of: t FOR VALUES FROM (0) TO (1)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 1))
Indexes:
"tp_0_1_pkey" PRIMARY KEY, btree (i)
Not-null constraints:
"t_i_not_null" NOT NULL "i" (inherited)
Access method: heap

However, if the not-null constraint is implicit and derived from the
primary key, the situation is different.  The partition created by CREATE
TABLE ... PARTITION OF doesn't have an explicit not-null constraint, just
like the parent.  But the partition created by MERGE/SPLIT has an explicit
not-null constraint.

# CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE(i);
# \d+ t
  Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           | not null |         | plain   |             |              |
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Number of partitions: 0
# CREATE TABLE tp_0_2 PARTITION OF t FOR VALUES FROM (0) TO (2);
# \d+ tp_0_2
 Table "public.tp_0_2"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 i      | integer |           | not null |         | plain   |             |              |
Partition of: t FOR VALUES FROM (0) TO (2)
Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 2))
Indexes:
"tp_0_2_pkey" PRIMARY KEY, btree (i)
Access method: heap
# ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
#(PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
# PARTITION tp_1_2 FOR VALUES FROM (1) TO (2))
# \d+ tp_0_1
 Table "public.tp_0_1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------

Re: cataloguing NOT NULL constraints

2024-05-08 Thread Alvaro Herrera
On 2024-May-07, Kyotaro Horiguchi wrote:

> Hello,
> 
> At Wed, 1 May 2024 19:49:35 +0200, Alvaro Herrera  
> wrote in 
> > Here are two patches that I intend to push soon (hopefully tomorrow).
> 
> This commit added and edited two error messages, resulting in using
> slightly different wordings "in" and "on" for relation constraints.
> 
> +   errmsg("cannot change NO INHERIT status of NOT NULL constraint \"%s\" on 
> relation \"%s\"",
> ===
> +   errmsg("cannot change NO INHERIT status of NOT NULL constraint \"%s\" in 
> relation \"%s\"",

Thank you, I hadn't noticed the inconsistency -- I fix this in the
attached series.

While trying to convince myself that I could mark the remaining open
item for this work closed, I discovered that pg_dump fails to produce
working output for some combinations.  Notably, if I create Andrew
Bille's example in 16:

create table test_0 (id serial primary key);
create table test_1 (id integer primary key) inherits (test_0);

then current master's pg_dump produces output that the current server
fails to restore, failing the PK creation in test_0:

ALTER TABLE ONLY public.test_0
ADD CONSTRAINT test_0_pkey PRIMARY KEY (id);
ERROR:  cannot change NO INHERIT status of NOT NULL constraint "pgdump_throwaway_notnull_0" in relation "test_1"

because we have already created the NOT NULL NO INHERIT constraint in
test_1 when we created it, and because of d45597f72fe5, we refuse to
change it into a regular inheritable constraint, which the PK in its
parent table needs.

I spent a long time trying to think how to fix this, and had all but
despaired, ready to write that I would need to revert the whole NOT NULL
business for pg17 -- but then I realized that we don't actually need
this NOT NULL NO INHERIT business except during pg_upgrade, and that
simplifies things enough to give me confidence that the whole feature
can be kept.

Because, remember: the idea of those NO INHERIT "throwaway" constraints
is that we can skip reading the data when we create the PRIMARY KEY
during binary upgrade.  We don't actually need the NO INHERIT
constraints for anything during regular pg_dump.  So what we can do is
restrict the usage of NOT NULL NO INHERIT so that it occurs only during
pg_upgrade.  I think this will make Justin P. happier, because we no
longer have this unsightly, nonstandard NOT NULL NO INHERIT syntax in
dumps.
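
As a sketch of the idea (using the use_throwaway_nulls flag named below;
the exact condition is an assumption, not the committed change):

/* Sketch: emit throwaway NOT NULL NO INHERIT only for pg_upgrade. */
if (!dopt->binary_upgrade)
    use_throwaway_nulls = false;    /* regular dumps keep standard syntax */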

The attached patch series does that.  Actually, it does a little more,
but it's not really much:

0001: fix the typos pointed out by Kyotaro.

0002: A mechanical code movement that takes some ugly ballast out of
getTableAttrs into its own routine.  I realized that this new code was
far too ugly and messy to be in the middle of filling the tbinfo struct
of attributes.  If you use "git show --color-moved
--color-moved-ws=ignore-all-space" with this commit you can see that
nothing happens apart from the code move.

0003: pgindent, fixes the comments just moved to account for different
indentation depth.

0004: moves again the moved PQfnumber() calls back to getTableAttrs(),
for efficiency (we don't want to search the result for those resnums for
every single attribute of all tables being dumped).

0005: This is the actual code change I describe above.  We restrict
use_throwaway_nulls so that it's only set during binary upgrade mode.
This changes pg_dump output; in the normal case, we no longer have NOT
NULL NO INHERIT.  I added one test stanza to verify that pg_upgrade
retains these clauses, where they are critical.

0006: Tighten up what d45597f72fe5 did, in that outside of binary
upgrade mode, we no longer accept changes to NOT NULL NO INHERIT
constraints so that they become INHERIT.  Previously we accepted that
during recursion, but this isn't really very principled.  (I had
accepted this because pg_dump required it for some other cases).  This
changes some test output, and I also simplify some test cases that were
testing stuff that's no longer interesting.

(To push, I'll squash 0002+0003+0004 as a single one, and perhaps 0005
with them; I produced them like this only to make them easy to see
what's changing.)


I also have a pending patch for 16 that adds tables like the problematic
ones so that they remain for future pg_upgrade testing.  With the
changes in this series, the whole thing finally works AFAICT.

I did notice one more small bit of weirdness, which is that at the end
of the process you may end up with constraints that retain the throwaway
name.  This doesn't seem at all critical, considering that you can't
drop them anyway and such names do not survive a further dump (because
they are marked as inherited constraint without a "local" definition, so
they're not dumped separately).  I would still like to fix it, but it
seems to require undue contortions, so I may end up not doing anything
about it.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
From 5cdec1b6ce61f75d886109d7daafd57b8064a4a6 Mon Sep 17 00:00:00 2001
From: A

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-08 Thread Justin Pryzby
On Wed, May 08, 2024 at 09:00:10PM +0300, Alexander Korotkov wrote:
> On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov  
> wrote:
> > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby  wrote:
> > > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote:
> > > > 30.04.2024 23:15, Justin Pryzby writes:
> > > > > Is this issue already fixed ?
> > > > > I wasn't able to reproduce it.  Maybe it only happened with earlier
> > > > > patch versions applied ?
> > > >
> > > > I think this was fixed in commit [1].
> > > >
> > > > [1] 
> > > > https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91
> > >
> > > I tried to reproduce it at fcf80c5d5f~, but couldn't.
> > > I don't see how that patch would fix it anyway.
> > > I'm hoping Alexander can confirm what happened.
> >
> > This problem is only relevant for an old version of fix [1], which
> > overrides schemas for new partitions.  That version was never
> > committed.
> 
> Here are the patches.
> 0002 Skips copying extended statistics while creating new partitions in 
> MERGE/SPLIT
> 
> For 0002 I'd like to hear some feedback on wordings used in docs and comments.

commit message:

Currenlty => Currently
partiions => partitions
copying => by copying

> However, parent's table extended statistics already covers all its
> children.

=> That's the wrong explanation.  It's not that "stats on the parent
table cover its children".  It's that there are two types of stats:
stats for the "table hierarchy" and stats for the individual table.
That's true for single-column stats as well as for extended stats.
In both cases, that's indicated by the inh flag in the code and in the
catalog.

The right explanation is that extended stats on partitioned tables are
not similar to indexes.  Indexes on parent table are nothing other than
a mechanism to create indexes on the child tables.  That's not true for
stats.

See also my prior messages
ZiJW1g2nbQs9ekwK@pryzbyj2023
Zi5Msg74C61DjJKW@pryzbyj2023

I think EXCLUDING IDENTITY can/should now also be removed - see 509199587.
I'm not able to reproduce that problem anyway, even before that...

-- 
Justin




Re: 2024-05-09 release announcement draft

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 04:17, Jonathan S. Katz  wrote:
> * Fix how
> [`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html)
> handles multiple
> [`VALUES`](https://www.postgresql.org/docs/current/sql-values.html) rows
> into a target column that is a domain over an array or composite type.

Maybe it's only me who thinks the double plural of "VALUES rows" is
hard to parse. If that's the case I'll just drop this as it's not that
important.

FWIW, I'd probably write:

* Fix issue with
[`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html)
with a multi-row
[`VALUES`](https://www.postgresql.org/docs/current/sql-values.html) clause
where a target column is a domain over an array or composite type.

I'll argue no further with this.

David




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-08 Thread Alexander Korotkov
On Thu, May 9, 2024 at 12:37 AM Justin Pryzby  wrote:
>
> On Wed, May 08, 2024 at 09:00:10PM +0300, Alexander Korotkov wrote:
> > On Fri, May 3, 2024 at 4:32 PM Alexander Korotkov  
> > wrote:
> > > On Fri, May 3, 2024 at 4:23 PM Justin Pryzby  wrote:
> > > > On Wed, May 01, 2024 at 10:51:24PM +0300, Dmitry Koval wrote:
> > > > > 30.04.2024 23:15, Justin Pryzby writes:
> > > > > > Is this issue already fixed ?
> > > > > > I wasn't able to reproduce it.  Maybe it only happened with earlier
> > > > > > patch versions applied ?
> > > > >
> > > > > I think this was fixed in commit [1].
> > > > >
> > > > > [1] 
> > > > > https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91
> > > >
> > > > I tried to reproduce it at fcf80c5d5f~, but couldn't.
> > > > I don't see how that patch would fix it anyway.
> > > > I'm hoping Alexander can confirm what happened.
> > >
> > > This problem is only relevant for an old version of fix [1], which
> > > overrides schemas for new partitions.  That version was never
> > > committed.
> >
> > Here are the patches.
> > 0002 Skips copying extended statistics while creating new partitions in 
> > MERGE/SPLIT
> >
> > For 0002 I'd like to hear some feedback on wordings used in docs and 
> > comments.
>
> commit message:
>
> Currenlty => Currently
> partiions => partitios
> copying => by copying


Thank you!

>
> > However, parent's table extended statistics already covers all its
> > children.
>
> => That's the wrong explanation.  It's not that "stats on the parent
> table cover its children".  It's that there are two types of stats:
> stats for the "table hierarchy" and stats for the individual table.
> That's true for single-column stats as well as for extended stats.
> In both cases, that's indicated by the inh flag in the code and in the
> catalog.
>
> The right explanation is that extended stats on partitioned tables are
> not similar to indexes.  Indexes on parent table are nothing other than
> a mechanism to create indexes on the child tables.  That's not true for
> stats.
>
> See also my prior messages
> ZiJW1g2nbQs9ekwK@pryzbyj2023
> Zi5Msg74C61DjJKW@pryzbyj2023

Yes, I understand that the parent's pg_statistic entry with stainherit ==
true includes statistics for the children.  I tried to express this with
the word "covers".  But you're right, this is the wrong explanation.

Can I, please, ask you to revise the patch?

> I think EXCLUDING IDENTITY can/should now also be removed - see 509199587.
> I'm not able to reproduce that problem anyway, even before that...

I will check this.

--
Regards,
Alexander Korotkov
Supabase




ALTER EXTENSION SET SCHEMA versus dependent types

2024-05-08 Thread Tom Lane
I happened to notice that the comment for AlterObjectNamespace_oid
claims that

 * ... it doesn't have to deal with certain special cases
 * such as not wanting to process array types --- those should never
 * be direct members of an extension anyway.

This struck me as probably broken in the wake of e5bc9454e
(Explicitly list dependent types as extension members in pg_depend),
and sure enough a moment's worth of testing showed it is:

regression=# create schema s1;
CREATE SCHEMA
regression=# create extension cube with schema s1;
CREATE EXTENSION
regression=# create schema s2;
CREATE SCHEMA
regression=# alter extension cube set schema s2;
ERROR:  cannot alter array type s1.cube[]
HINT:  You can alter type s1.cube, which will alter the array type as well.

So we need to do something about that; and the fact that this escaped
testing shows that our coverage for ALTER EXTENSION SET SCHEMA is
pretty lame.

The attached patch fixes up the code and adds a new test to
the test_extensions module.  The fix basically is to skip the
pg_depend entries for dependent types, assuming that they'll
get dealt with when we process their parent objects.

regards, tom lane

diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 12802b9d3f..4f99ebb447 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -598,16 +598,16 @@ ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt,
 /*
  * Change an object's namespace given its classOid and object Oid.
  *
- * Objects that don't have a namespace should be ignored.
+ * Objects that don't have a namespace should be ignored, as should
+ * dependent types such as array types.
  *
  * This function is currently used only by ALTER EXTENSION SET SCHEMA,
- * so it only needs to cover object types that can be members of an
- * extension, and it doesn't have to deal with certain special cases
- * such as not wanting to process array types --- those should never
- * be direct members of an extension anyway.
+ * so it only needs to cover object kinds that can be members of an
+ * extension, and it can silently ignore dependent types --- we assume
+ * those will be moved when their parent object is moved.
  *
  * Returns the OID of the object's previous namespace, or InvalidOid if
- * object doesn't have a schema.
+ * object doesn't have a schema or was ignored due to being a dependent type.
  */
 Oid
 AlterObjectNamespace_oid(Oid classId, Oid objid, Oid nspOid,
@@ -631,7 +631,7 @@ AlterObjectNamespace_oid(Oid classId, Oid objid, Oid nspOid,
 			}
 
 		case TypeRelationId:
-			oldNspOid = AlterTypeNamespace_oid(objid, nspOid, objsMoved);
+			oldNspOid = AlterTypeNamespace_oid(objid, nspOid, true, objsMoved);
 			break;
 
 		case ProcedureRelationId:
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 77d8c9e186..1643c8c69a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -2940,7 +2940,7 @@ AlterExtensionNamespace(const char *extensionName, const char *newschema, Oid *o
 
 		/*
 		 * If not all the objects had the same old namespace (ignoring any
-		 * that are not in namespaces), complain.
+		 * that are not in namespaces or are dependent types), complain.
 		 */
 		if (dep_oldNspOid != InvalidOid && dep_oldNspOid != oldNspOid)
 			ereport(ERROR,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5bf5e69c5b..e5e67ed64c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -18018,8 +18018,8 @@ AlterTableNamespaceInternal(Relation rel, Oid oldNspOid, Oid nspOid,
 
 	/* Fix the table's row type too, if it has one */
 	if (OidIsValid(rel->rd_rel->reltype))
-		AlterTypeNamespaceInternal(rel->rd_rel->reltype,
-   nspOid, false, false, objsMoved);
+		AlterTypeNamespaceInternal(rel->rd_rel->reltype, nspOid,
+   false, false, false, objsMoved);
 
 	/* Fix other dependent stuff */
 	AlterIndexNamespaces(classRel, rel, oldNspOid, nspOid, objsMoved);
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 315b0feb8e..bb023c7820 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -4068,7 +4068,7 @@ AlterTypeNamespace(List *names, const char *newschema, ObjectType objecttype,
 	typename = makeTypeNameFromNameList(names);
 	typeOid = typenameTypeId(NULL, typename);
 
-	/* Don't allow ALTER DOMAIN on a type */
+	/* Don't allow ALTER DOMAIN on a non-domain type */
 	if (objecttype == OBJECT_DOMAIN && get_typtype(typeOid) != TYPTYPE_DOMAIN)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -4079,7 +4079,7 @@ AlterTypeNamespace(List *names, const char *newschema, ObjectType objecttype,
 	nspOid = LookupCreationNamespace(newschema);
 
 	objsMoved = new_object_addresses();
-	oldNspOid = AlterTypeNamespace_oid(typeOid, nspOid, objsMoved);
+	oldNspOid = AlterTypeNamespace_oid(typeOid, nspOid, false, objsMoved);
 	

Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 06:49, Tom Lane  wrote:
> BTW, now that I've wrapped my head around what's happening here,
> I believe that -DREALLOCATE_BITMAPSETS is introducing a bug where
> there was none before.  The changes that left-join removal makes
> won't cause any of these sets to go to empty, so the bms_del_member
> calls won't free the sets but just modify them in-place.  And the
> same change will/should be made in every relevant relid set, so
> the fact that the sets may be shared isn't hurting anything.

FWIW, it just feels like we're willing to accept that the
bms_del_member() is not updating all copies of the set in this case as
that particular behaviour is ok for this particular case. I know
you're not proposing this, but I don't think that would warrant
relaxing REALLOCATE_BITMAPSETS to not reallocate Bitmapsets on
bms_del_member() and bms_del_members().

If all we have to do to make -DREALLOCATE_BITMAPSETS builds happy in
make check-world is to add a bms_copy inside the bms_del_member()
calls in remove_rel_from_query(), then I think it's a small price to
pay to maintain the additional coverage that REALLOCATE_BITMAPSETS
provides, especially when the gain is removing an entire join.

> This conclusion also reinforces my previously-vague feeling that
> we should not consider making -DREALLOCATE_BITMAPSETS the default in
> debug builds, as was proposed upthread.  It's really a fundamentally
> different behavior, and I strongly suspect that it can mask bugs as
> well as introduce them (by hiding sharing in cases that'd be less
> benign than this turns out to be).  I'd rather not do development on
> top of bitmapset infrastructure that acts entirely different from
> production bitmapset infrastructure.

My primary interest in this feature is using it to catch bugs that
we're unlikely to ever hit in the regression tests.  For example, the
planner works when there are <= 63 RTEs but falls over when there are
64 because some bms_add_member() must reallocate more memory to store
the 64th RTI in a Bitmapset. I'd like to have something to make it
more likely we'll find bugs like this before the release instead of
someone having a crash when they run some obscure query shape
containing > 63 RTEs 2 or 4 years after the release.

I'm happy Andrew added this to prion. Thanks for doing that.

David




Re: 2024-05-09 release announcement draft

2024-05-08 Thread Jonathan S. Katz

On 5/8/24 5:44 PM, David Rowley wrote:

On Thu, 9 May 2024 at 04:17, Jonathan S. Katz  wrote:

* Fix how
[`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html)
handles multiple
[`VALUES`](https://www.postgresql.org/docs/current/sql-values.html) rows
into a target column that is a domain over an array or composite type.


Maybe it's only me who thinks the double plural of "VALUES rows" is
hard to parse. If that's the case I'll just drop this as it's not that
important.

FWIW, I'd probably write:

* Fix issue with
[`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html)
with a multi-row
[`VALUES`](https://www.postgresql.org/docs/current/sql-values.html) clause
where a target column is a domain over an array or composite type.


I like your wording, and went with that.

Thanks,

Jonathan





Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 06:24, Tom Lane  wrote:
> I like alternative-2.patch a lot better, not least because it
> only adds cycles when join removal actually fires.  Basically
> this is putting the onus on the data structure modifier to
> cope with shared bitmapsets, rather than trying to say that
> sharing is disallowed.
>
> Thoughts?

I'm fine with this one as it's the same as what I already mentioned
earlier.  I had imagined doing bms_del_member(bms_copy ... but maybe
the compiler is able to optimise away the additional store. It likely
does not matter much, as pallocing memory adds far more overhead
anyway.
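
For concreteness, the two spellings being compared are (a sketch, using
the fields touched in remove_rel_from_query()):

/* separate steps, as remove_rel_from_restrictinfo already does */
sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);

/* versus nesting the copy, saving one visible store */
sjinf->min_lefthand = bms_del_member(bms_copy(sjinf->min_lefthand), relid);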

David




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread Tom Lane
David Rowley  writes:
> On Thu, 9 May 2024 at 06:49, Tom Lane  wrote:
>> BTW, now that I've wrapped my head around what's happening here,
>> I believe that -DREALLOCATE_BITMAPSETS is introducing a bug where
>> there was none before.  The changes that left-join removal makes
>> won't cause any of these sets to go to empty, so the bms_del_member
>> calls won't free the sets but just modify them in-place.  And the
>> same change will/should be made in every relevant relid set, so
>> the fact that the sets may be shared isn't hurting anything.

> FWIW, it just feels like we're willing to accept that the
> bms_del_member() is not updating all copies of the set in this case as
> that particular behaviour is ok for this particular case. I know
> you're not proposing this,

No, I'm not.  I was just trying to explain how come there's not
a visible bug.  I quite agree that this is too fragile to leave
as-is going forward.  (One thing I'm wondering about is whether
we should back-patch despite the lack of visible bug, just in
case some future back-patch relies on the safer behavior.)

>> This conclusion also reinforces my previously-vague feeling that
>> we should not consider making -DREALLOCATE_BITMAPSETS the default in
>> debug builds, as was proposed upthread.

> My primary interest in this feature is using it to catch bugs that
> we're unlikely to ever hit in the regression tests.  For example, the
> planner works when there are <= 63 RTEs but falls over when there are
> 64 because some bms_add_member() must reallocate more memory to store
> the 64th RTI in a Bitmapset. I'd like to have something to make it
> more likely we'll find bugs like this before the release instead of
> someone having a crash when they run some obscure query shape
> containing > 63 RTEs 2 or 4 years after the release.

Again, I think -DREALLOCATE_BITMAPSETS adds a valuable testing weapon.
But if we make that the default in debug builds, then we'll get next
door to zero testing of the behavior without it, and that seems like
a really bad idea given how different the behavior is.

(Speaking of which, I wonder how many buildfarm members build without
--enable-cassert.  The answer could well be "zero", and that's likely
not good.)

> I'm happy Andrew added this to prion. Thanks for doing that.

+1

regards, tom lane




Re: Support tid range scan in parallel?

2024-05-08 Thread Cary Huang
Thank you very much for the test and review. Greatly appreciated!
 
> This now calls heap_setscanlimits() for the parallel version, it's
> just that table_block_parallelscan_nextpage() does nothing to obey
> those limits.

yes, you are absolutely right. Though heap_setscanlimits() is now called by
parallel tid range scan,  table_block_parallelscan_nextpage() does nothing
to obey these limits, resulting in more blocks being inefficiently filtered out
by the optimization code you mentioned in heap_getnextslot_tidrange().

> I've not studied exactly how you'd get the rs_numblock information
> down to the parallel scan descriptor.  But when you figure that out,
> just remember that you can't do the --scan->rs_numblocks from
> table_block_parallelscan_nextpage() as that's not parallel safe. You
> might be able to add an or condition to: "if (nallocated >=
> pbscan->phs_nblocks)" to make it "if (nallocated >=
> pbscan->phs_nblocks || nallocated >= pbscan->phs_numblocks)",
> although the field names don't seem very intuitive there. It would be
> nicer if the HeapScanDesc field was called rs_blocklimit rather than
> rs_numblocks.  It's not for this patch to go messing with that,
> however.

rs_numblock was not passed down to the parallel scan context, and
table_block_parallelscan_nextpage() did not seem to have any logic to
limit the block scan range set by heap_setscanlimits() in a parallel
scan. Also, I noticed that rs_startblock was not passed to the parallel
scan context either, which causes the parallel scan to always start from
block 0 even when a lower ctid bound is specified.

So I added logic in heap_set_tidrange() to pass these two values to the
parallel scan descriptor as "phs_startblock" and "phs_numblock". These
will be available in table_block_parallelscan_nextpage() in a parallel
scan.

I followed your recommendation and modified the condition to:

if (nallocated >= pbscan->phs_nblocks ||
    (pbscan->phs_numblock != 0 && nallocated >= pbscan->phs_numblock))

so that the parallel tid range scan will stop when the upper scan limit is
reached. With these changes, I see that the number of buffer reads is
consistent between single and parallel ctid range scans. The v3 patch is
attached.
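
Putting the pieces together, the allocation check in
table_block_parallelscan_nextpage() ends up looking roughly like this (a
simplified sketch that ignores the chunked block allocation the real
function performs, and assumes phs_numblock == 0 means "no limit"):

/* Sketch: stop handing out blocks once the upper scan limit is reached. */
nallocated = pg_atomic_fetch_add_u64(&pbscan->phs_nallocated, 1);
if (nallocated >= pbscan->phs_nblocks ||
    (pbscan->phs_numblock != 0 && nallocated >= pbscan->phs_numblock))
    page = InvalidBlockNumber;  /* no more blocks within the limit */
else
    page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;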


postgres=# explain (analyze, buffers) select count(*) from test where ctid >= '(0,1)' and ctid < '(11,0)';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39.43..39.44 rows=1 width=8) (actual time=1.007..1.008 rows=1 loops=1)
   Buffers: shared read=11
   ->  Tid Range Scan on test  (cost=0.01..34.35 rows=2034 width=0) (actual time=0.076..0.639 rows=2035 loops=1)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared read=11

postgres=# set max_parallel_workers_per_gather=2;
SET
postgres=# explain (analyze, buffers) select count(*) from test where ctid >= '(0,1)' and ctid < '(11,0)';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16.45..16.46 rows=1 width=8) (actual time=14.329..16.840 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Gather  (cost=16.43..16.44 rows=2 width=8) (actual time=3.197..16.814 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=11
         ->  Partial Aggregate  (cost=16.43..16.44 rows=1 width=8) (actual time=0.705..0.706 rows=1 loops=3)
               Buffers: shared hit=11
               ->  Parallel Tid Range Scan on test  (cost=0.01..14.31 rows=848 width=0) (actual time=0.022..0.423 rows=678 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers: shared hit=11

postgres=# set max_parallel_workers_per_gather=3;
SET
postgres=# explain (analyze, buffers) select count(*) from test where ctid >= '(0,1)' and ctid < '(11,0)';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12.74..12.75 rows=1 width=8) (actual time=16.793..19.053 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Gather  (cost=12.72..12.73 rows=3 width=8) (actual time=2.827..19.012 rows=4 loops=1)
         Workers Planned: 3
         Workers Launched: 3
         Buffers: shared hit=11
         ->  Partial Aggregate  (cost=12.72..12.73 rows=1 width=8) (actual time=0.563..0.565 rows=1 loops=4)
               Buffers: shared hit=11
               ->  Parallel Tid Range Scan on test  (cost=0.01..11.08 rows=656 width=0) (actual time=0.018..0.338 rows=509 loops=4)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers: shared hit=11

Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread Tom Lane
David Rowley  writes:
> I'm fine with this one as it's the same as what I already mentioned
> earlier.  I had imagined doing bms_del_member(bms_copy ... but maybe
> the compiler is able to optimise away the additional store. Likely, it
> does not matter much as pallocing memory likely adds far more overhead
> anyway.

I actually wrote it that way to start with, but undid it after
noticing that the existing code in remove_rel_from_restrictinfo
does it in separate steps, and thinking that that was good for
both separation of concerns and a cleaner git history.  I too
can't believe that an extra fetch will be noticeable compared
to the cost of the adjacent bms_xxx operations.

regards, tom lane




Re: ALTER EXTENSION SET SCHEMA versus dependent types

2024-05-08 Thread Nathan Bossart
On Wed, May 08, 2024 at 05:52:31PM -0400, Tom Lane wrote:
> The attached patch fixes up the code and adds a new test to
> the test_extensions module.  The fix basically is to skip the
> pg_depend entries for dependent types, assuming that they'll
> get dealt with when we process their parent objects.

Looks reasonable to me.  The added test coverage seems particularly
valuable.  If I really wanted to nitpick, I might complain about the three
consecutive Boolean parameters for AlterTypeNamespaceInternal(), which
makes lines like

+   AlterTypeNamespaceInternal(arrayOid, nspOid, true, false, true,
+  objsMoved);

difficult to interpret.  But that's not necessarily the fault of this patch
and probably needn't block it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: ALTER EXTENSION SET SCHEMA versus dependent types

2024-05-08 Thread Tom Lane
Nathan Bossart  writes:
> Looks reasonable to me.  The added test coverage seems particularly
> valuable.  If I really wanted to nitpick, I might complain about the three
> consecutive Boolean parameters for AlterTypeNamespaceInternal(), which
> makes lines like

> + AlterTypeNamespaceInternal(arrayOid, nspOid, true, false, true,
> +objsMoved);

> difficult to interpret.  But that's not necessarily the fault of this patch
> and probably needn't block it.

I considered merging ignoreDependent and errorOnTableType into a
single 3-valued enum, but didn't think it was worth the trouble
given the very small number of callers; also it wasn't quite clear
how to map that to AlterTypeNamespace_oid's API.  Perhaps a little
more thought is appropriate though.

One positive reason for increasing the number of parameters is that
that will be a clear API break for any outside callers, if there
are any.  If I just replace a bool with an enum, such callers might
or might not get any indication that they need to take a fresh
look.

regards, tom lane




Re: ALTER EXTENSION SET SCHEMA versus dependent types

2024-05-08 Thread Nathan Bossart
On Wed, May 08, 2024 at 07:42:18PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> Looks reasonable to me.  The added test coverage seems particularly
>> valuable.  If I really wanted to nitpick, I might complain about the three
>> consecutive Boolean parameters for AlterTypeNamespaceInternal(), which
>> makes lines like
> 
>> +AlterTypeNamespaceInternal(arrayOid, nspOid, true, false, true,
>> +   objsMoved);
> 
>> difficult to interpret.  But that's not necessarily the fault of this patch
>> and probably needn't block it.
> 
> I considered merging ignoreDependent and errorOnTableType into a
> single 3-valued enum, but didn't think it was worth the trouble
> given the very small number of callers; also it wasn't quite clear
> how to map that to AlterTypeNamespace_oid's API.  Perhaps a little
> more thought is appropriate though.
> 
> One positive reason for increasing the number of parameters is that
> that will be a clear API break for any outside callers, if there
> are any.  If I just replace a bool with an enum, such callers might
> or might not get any indication that they need to take a fresh
> look.

Agreed.  Another option could be to just annotate the arguments with the
parameter names.
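
For example (the parameter names here are illustrative, not necessarily
the ones in the tree):

AlterTypeNamespaceInternal(arrayOid, nspOid,
                           true,    /* isImplicitArray */
                           false,   /* ignoreDependent */
                           true,    /* errorOnTableType */
                           objsMoved);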

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: ALTER EXTENSION SET SCHEMA versus dependent types

2024-05-08 Thread Tom Lane
Nathan Bossart  writes:
> On Wed, May 08, 2024 at 07:42:18PM -0400, Tom Lane wrote:
>> One positive reason for increasing the number of parameters is that
>> that will be a clear API break for any outside callers, if there
>> are any.  If I just replace a bool with an enum, such callers might
>> or might not get any indication that they need to take a fresh
>> look.

> Agreed.  Another option could be to just annotate the arguments with the
> parameter names.

At the call sites you mean?  Sure, I can do that.

regards, tom lane




Re: bug: copy progress reporting of backends which run multiple COPYs

2024-05-08 Thread Michael Paquier
On Wed, May 08, 2024 at 10:07:15AM -0400, Robert Haas wrote:
> I think you're hoping for too much. The progress reporting
> infrastructure is fundamentally designed around the idea that there
> can only be one progress-reporting operation in progress at a time.
> For COPY, that is, I believe, true, but for file_fdw, it's false. If
> we want to do any kind of progress reporting from within plannable
> queries, we need some totally different and much more complex
> infrastructure that can report progress for, probably, each plan node
> individually. I think it's likely a mistake to try to shoehorn cases
> like this into the infrastructure
> that we have today. It will just encourage people to try to use the
> current infrastructure in ways that are less and less like what it was
> actually designed to do; whereas what we should be doing if we want
> this kind of functionality, at least IMHO, is building infrastructure
> that's actually fit for purpose.

Hmm.  OK.  I have been looking around for cases out there where
BeginCopyFrom() could be called with a pstate where the reporting
could matter, and could not find anything worth worrying about.  It
still makes me a bit uneasy to not have a separate argument in the
function to control that.  Now, if you, Justin and Matthias agree with
this approach, I won't stand in the way either.
--
Michael




Re: ALTER EXTENSION SET SCHEMA versus dependent types

2024-05-08 Thread Nathan Bossart
On Wed, May 08, 2024 at 07:57:55PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> Agreed.  Another option could be to just annotate the arguments with the
>> parameter names.
> 
> At the call sites you mean?  Sure, I can do that.

Yes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

2024-05-08 Thread Michael Paquier
On Wed, May 08, 2024 at 02:49:58PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> Thanks for looking.  I noticed that the version check is unnecessary since
>> we always use the new binary's pg_dump[all], so I removed that in v2.
> 
> +1

+1.  Could there be an argument in favor of a backpatch?  This is a
performance improvement, but one could also argue that the addition of
sync support in pg_dump[all] has made this a regression that we'd
better fix, because the flushes don't matter in this context.  They
also bring costs for no gain.
--
Michael




Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread David Rowley
On Wed, 8 May 2024 at 22:13, Matthias van de Meent
 wrote:
> As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
> rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
> 1; by using the optional sortop field in the aggregator.
> However, this optimization is disabled for clauses that themselves have
> an ORDER BY clause such as `MIN(unique1 ORDER BY <something>)`, because
> <something> can cause reordering of distinguishable values like 1.0 and
> 1.00, which then causes measurable differences in the output. In the
> general case, that's a good reason to not apply this optimization, but
> in some cases, we could still apply the index optimization.

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?
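
A sketch of what such a handler might look like, with the caveat that
SupportRequestOptimizeAggref is a hypothetical request type invented here
for illustration (only SupportRequestOptimizeWindowClause exists today):

/* Hypothetical prosupport handler for min/max on order-insensitive types. */
Datum
minmax_support(PG_FUNCTION_ARGS)
{
    Node       *rawreq = (Node *) PG_GETARG_POINTER(0);

    if (IsA(rawreq, SupportRequestOptimizeAggref))  /* hypothetical */
    {
        SupportRequestOptimizeAggref *req =
            (SupportRequestOptimizeAggref *) rawreq;
        Aggref     *aggref = req->aggref;

        /* input order cannot change the result, so drop the clauses */
        aggref->aggorder = NIL;
        aggref->aggdistinct = NIL;
        PG_RETURN_POINTER(aggref);
    }

    PG_RETURN_POINTER(NULL);
}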

I think it would rule out Min(numeric) and Max(numeric). We were
careful not to affect the number of decimal places in the numeric
output when using the moving aggregate inverse transition
infrastructure for WindowFuncs, so I agree we should maintain an
ability to control the aggregate transition order for numeric. (See
do_numeric_discard's maxScale if check)

I don't think floating point types have the same issues here. At least
+1.0 is greater than -1.0.

Are there any strange collation rules that would cause issues if we
did this with the text types?

David




Re: Weird test mixup

2024-05-08 Thread Michael Paquier
On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
> On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
>> Always resetting
>> condition->name when detaching a point is a simpler flow and saner
>> IMO.
>> 
>> Overall, this switches from one detach behavior to a different one,
> 
> Can you say more about that?  The only behavior change known to me is that a
> given injection point workload uses more of INJ_MAX_CONDITION.  If there's
> another behavior change, it was likely unintended.

As far as I read the previous patch, the conditions stored in
InjectionPointSharedState would never be really gone, even if the
points are removed from InjectionPointHash. 

> The problem I'm trying to tackle in this thread is to make
> src/test/modules/gin installcheck-safe.  $SUBJECT's commit 5105c90 started
> that work, having seen the intarray test suite break when run concurrently
> with the injection_points test suite.  That combination still does break at
> the exit-time race condition.  To reproduce, apply this attachment to add
> sleeps, and run:
> 
> make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C 
> contrib/intarray installcheck USE_MODULE_DB=1

Thanks for that.  I am not really sure how to protect against that without
a small cost in flexibility for the detach vs run paths.  This
comes down to the fact that a custom routine could be run while it is
being detached concurrently, removing anything the callback
depends on in the module.

It was mentioned upthread to add to InjectionPointCacheEntry a fixed
area of memory that modules could use to store some "status" data, but
it would not close the run/detach race because a backend could still
hold a pointer to a callback, with concurrent backends playing with
the contents of InjectionPointCacheEntry (concurrent detaches and
attaches that would cause the same entries to be reused).

One way to close the window entirely would be to hold
InjectionPointLock longer in InjectionPointRun(), until the callback
finishes or until it triggers an ERROR.  This would mean that the case
you've mentioned in [1] would change, by blocking the detach() of s3
until the callback of s2 finishes.  We don't have tests in the tree
that do any of that, so holding InjectionPointLock longer would not
break anything on HEAD.  A detach being possible while the callback is
running is something I considered valid in d86d20f0ba79, but perhaps
that was too cute of me, even more so with the use case of local points.
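
As a sketch, that would mean something like this in InjectionPointRun(),
with the cache-lookup name invented for illustration (an ERROR in the
callback releases the LWLock through the normal abort cleanup path):

void
InjectionPointRun(const char *name)
{
	InjectionPointCallback callback;

	LWLockAcquire(InjectionPointLock, LW_SHARED);
	callback = injection_point_lookup(name);	/* hypothetical lookup */

	/* A concurrent detach now blocks until the callback has returned. */
	if (callback != NULL)
		callback(name);

	LWLockRelease(InjectionPointLock);
}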

> Separately, I see injection_points_attach() populates InjectionPointCondition
> after InjectionPointAttach().  Shouldn't InjectionPointAttach() come last, to
> avoid the same sort of race?  I've not tried to reproduce that one.

Good point.  You could run into the case of a concurrent backend
running an injection point that should be local if it hits the window
between InjectionPointAttach() and the condition getting registered in
injection_points_attach().  That should be reversed.

[1] https://www.postgresql.org/message-id/20240501231214...@rfd.leadboat.com
--
Michael




Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 12:26, David Rowley  wrote:
> I wonder if we should also consider as an alternative to this to just
> have an aggregate support function, similar to
> SupportRequestOptimizeWindowClause that just nullifies the aggorder /
> aggdistinct fields for Min/Max aggregates on types where there's no
> possible difference in output when calling the transition function on
> rows in a different order.
>
> Would that apply in enough cases for you?

One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause,
whereas the can_minmax_aggs optimisation is not applied in that case.

David




Re: Idea Feedback: psql \h misses -> Offers Links?

2024-05-08 Thread Kirk Wolak
On Fri, Apr 19, 2024 at 10:14 AM Euler Taveira  wrote:

> On Wed, Apr 17, 2024, at 2:47 PM, Kirk Wolak wrote:
>
> ...
>
> This is Question 1: Do others see the potential value here?
>
>
> Yes. However, I expect an exact and direct answer. There will be cases
> where the first result is not the one you are looking for. (You are
> expecting the function or parameter description, but another page is at
> the top because it is more relevant.) The referred URL does not point
> you to the direct link. Instead, you have to click again to be able to
> check the content.
>

Again, this does get to the point that the current search feature at
postgresql.org could be better.  I would like to see that improved as
well...


> Question 2: What if we allowed the users to set some extra link templates
> using \pset?
>
> \pset help_assist_link_1 = 'https://www.google.com/search?q={token}'
> \pset help_assist_link_2 = 'https://wiki.postgresql.org/index.php?title=Special%3ASearch&search={token}&go=Go'
>
>
> That's a different idea. Are you proposing to provide URLs if this psql
> variable is set and it doesn't find an entry (say \h foo)? I'm not sure if
> it
> is a good idea to allow third-party URLs (even if it is configurable).
>

If you want, check the patch Andrey published.  We basically set the
default value of the variable, and then allowed the user to override
that value with multiple pipe (|) separated URLs.  It does beg the question
of whether this is acceptable to hackers.  Personally, I like the option,
as there are probably a few resources worth checking against.  But if
someone doesn't change the default, they get a good enough answer.


> IMO we should expand \h to list documentation references for functions
> and GUCs using SGML files. We already did it for SQL commands. Another
> broader idea is to build an inverted index similar to what Index [1]
> provides. The main problem with this approach is that it creates a
> dependency between the documentation build and psql. Maybe there is a
> reasonable way to obtain the links for each term.
>
>
> [1] https://www.postgresql.org/docs/current/bookindex.html
>

I don't want to add more dependencies from psql on the documentation for a
ton of stuff.  To me, if we had a better search page on the website for
finding things, it would be great.  I have been resigned to just googling
"postgresql <search term>" because google does a better job searching
postgresql.org than the postgresql.org site does (even when it is a known
indexed item like a function name).

Thanks for the feedback.


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Peter Smith
Hi Kuroda-san,

Thanks for addressing most of my v6-0001 review comments.

Below are some minor follow-up comments for v7-0001.

==
src/backend/access/transam/twophase.c

1. IsTwoPhaseTransactionGidForSubid

+/*
+ * IsTwoPhaseTransactionGidForSubid
+ * Check whether the given GID is formed by TwoPhaseTransactionGid.
+ */
+static bool
+IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid)

I think the function comment should mention something about 'subid'.

SUGGESTION
Check whether the given GID (as formed by TwoPhaseTransactionGid) is
for the specified 'subid'.

==
src/backend/commands/subscriptioncmds.c

2. AlterSubscription

+ if (!opts.twophase &&
+ form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
+ LookupGXactBySubid(subid))
+ /* Add error message */
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot disable two_phase when uncommitted prepared
transactions present"),
+ errhint("Resolve these transactions and try again")));

The comment "/* Add error message */" seems unnecessary.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Peter Smith
Hi, Here are some review comments for v7-0002

==
Commit message

1.
IIUC there is quite a lot of subtlety and detail about why the slot
option needs to be changed only when altering "true" to "false", but
not when altering "false" to "true".

It also should explain why PreventInTransactionBlock is only needed
when altering two_phase "true" to "false".

Please include a commit message to describe all those tricky details.

==
src/backend/commands/subscriptioncmds.c

2. AlterSubscription

- PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
(two_phase)");
+ if (!opts.twophase)
+ PreventInTransactionBlock(isTopLevel,
+   "ALTER SUBSCRIPTION ... SET (two_phase = off)");

IMO this needs a comment to explain why PreventInTransactionBlock is
only needed when changing the 'two_phase' option from on to off.

~~~

3. AlterSubscription

/*
* Try to acquire the connection necessary for altering slot.
*
* This has to be at the end because otherwise if there is an error while
* doing the database operations we won't be able to rollback altered
* slot.
*/
if (replaces[Anum_pg_subscription_subfailover - 1] ||
replaces[Anum_pg_subscription_subtwophasestate - 1])
{
bool must_use_password;
char*err;
WalReceiverConn *wrconn;
bool failover_needs_to_be_updated;
bool two_phase_needs_to_be_updated;

/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);

/* Try to connect to the publisher. */
must_use_password = sub->passwordrequired && !sub->ownersuperuser;
wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password,
sub->name, &err);
if (!wrconn)
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_FAILURE),
errmsg("could not connect to the publisher: %s", err)));

/*
* Consider which slot option must be altered.
*
* We must alter the failover option whenever subfailover is updated.
* Two_phase, however, is altered only when changing true to false.
*/
failover_needs_to_be_updated =
replaces[Anum_pg_subscription_subfailover - 1];
two_phase_needs_to_be_updated =
(replaces[Anum_pg_subscription_subtwophasestate - 1] &&
!opts.twophase);

PG_TRY();
{
if (two_phase_needs_to_be_updated || failover_needs_to_be_updated)
walrcv_alter_slot(wrconn, sub->slotname,
  failover_needs_to_be_updated ? &opts.failover : NULL,
  two_phase_needs_to_be_updated ? &opts.twophase : NULL);
}
PG_FINALLY();
{
walrcv_disconnect(wrconn);
}
PG_END_TRY();
}

3a.
The block comment "Consider which slot option must be altered..." says
WHEN those options need to be updated, but it doesn't say WHY. e.g.
why only update the 'two_phase' when it is being disabled but not when
it is being enabled? In other words, I think there needs to be more
background/reason details given in this comment.

~~~

3b.
Can't those 2 new variable assignments be done up-front and guard this
entire "if-block" instead of the current replaces[] guarding it? Then
the code is somewhat simplified.

SUGGESTION:
/*
 * 
 */
update_failover = replaces[Anum_pg_subscription_subfailover - 1];
update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate -
1] && !opts.twophase);

/*
 * Try to acquire the connection necessary for altering slot.
 *
 * This has to be at the end because otherwise if there is an error while
 * doing the database operations we won't be able to rollback altered
 * slot.
 */
if (update_failover || update_two_phase)
{
  ...

  /* Load the library providing us libpq calls. */
  load_file("libpqwalreceiver", false);

  /* Try to connect to the publisher. */
  must_use_password = sub->passwordrequired && !sub->ownersuperuser;
  wrconn = walrcv_connect(sub->conninfo, true, true,
must_use_password, sub->name, &err);
  if (!wrconn)
ereport(ERROR, ...);

  PG_TRY();
  {
walrcv_alter_slot(wrconn, sub->slotname,
  update_failover ? &opts.failover : NULL,
  update_two_phase ? &opts.twophase : NULL);
  }
  PG_FINALLY();
  {
walrcv_disconnect(wrconn);
  }
  PG_END_TRY();
}

==
.../libpqwalreceiver/libpqwalreceiver.c

4.
+ appendStringInfo(&cmd, "ALTER_REPLICATION_SLOT %s ( ",
+ quote_identifier(slotname));
+
+ if (failover)
+ appendStringInfo(&cmd, "FAILOVER %s ",
+ (*failover) ? "true" : "false");
+
+ if (two_phase)
+ appendStringInfo(&cmd, "TWO_PHASE %s%s ",
+ (*two_phase) ? "true" : "false",
+ failover ? ", " : "");
+
+ appendStringInfoString(&cmd, ");");

4a.
IIUC the comma logic here was broken in v7 when you swapped the order.
Anyway, IMO it will be better NOT to try combining that comma logic
with the existing appendStringInfo. Doing it separately is both easier
and less error-prone.

Furthermore, the parentheses like "(*two_phase)" instead of just
"*two_phase" seemed a bit overkill.

SUGGESTION:
+ if (failover)
+ appendStringInfo(&cmd, "FAILOVER %s",
+ *failover ? "true" : "false");
+
+   if (failover && two_phase)
+   appendStringInfo(&cmd, ", ");
+
+ if (two_phase)
+ appendStringInfo(&cmd, "TWO_PHASE %s",
+ *two_phase ? "true" : "false");
+
+ appendStringInfoString(&cmd, " );");

~~

4b.

Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-08 Thread Richard Guo
On Thu, May 9, 2024 at 6:40 AM Tom Lane  wrote:

> David Rowley  writes:
> > I'm fine with this one as it's the same as what I already mentioned
> > earlier.  I had imagined doing bms_del_member(bms_copy ... but maybe
> > the compiler is able to optimise away the additional store. Likely, it
> > does not matter much as pallocing memory likely adds far more overhead
> > anyway.
>
> I actually wrote it that way to start with, but undid it after
> noticing that the existing code in remove_rel_from_restrictinfo
> does it in separate steps, and thinking that that was good for
> both separation of concerns and a cleaner git history.  I too
> can't believe that an extra fetch will be noticeable compared
> to the cost of the adjacent bms_xxx operations.


I also think it seems better to do bms_copy in separate steps, not only
because this stays consistent with the existing code in
remove_rel_from_restrictinfo, but also because we need to do
bms_del_member twice for each lefthand/righthand relid set.

Speaking of consistency, do you think it would improve the code's
readability if we rearrange the code in remove_rel_from_query so that
the modifications of the same relid set are grouped together, just like
what we do in remove_rel_from_restrictinfo?  I mean something like:

   sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
   sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
   sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);

   ...

Thanks
Richard


Re: Weird test mixup

2024-05-08 Thread Noah Misch
On Thu, May 09, 2024 at 09:37:54AM +0900, Michael Paquier wrote:
> On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote:
> > The problem I'm trying to tackle in this thread is to make
> > src/test/modules/gin installcheck-safe.  $SUBJECT's commit 5105c90 started
> > that work, having seen the intarray test suite break when run concurrently
> > with the injection_points test suite.  That combination still does break at
> > the exit-time race condition.  To reproduce, apply this attachment to add
> > sleeps, and run:
> > 
> > make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make 
> > -C contrib/intarray installcheck USE_MODULE_DB=1
> 
> Thanks for that.  I am not really sure how to protect that without a
> small cost in flexibility for the cases of detach vs run paths.  This
> comes down to the fact that a custom routine could be run while it
> could be detached concurrently, removing any stuff a callback could
> depend on in the module.
> 
> It was mentioned upthread to add to InjectionPointCacheEntry a fixed
> area of memory that modules could use to store some "status" data, but
> it would not close the run/detach race because a backend could still
> hold a pointer to a callback, with concurrent backends playing with
> the contents of InjectionPointCacheEntry (concurrent detaches and
> attaches that would cause the same entries to be reused).

> One way to close entirely the window would be to hold
> InjectionPointLock longer in InjectionPointRun() until the callback
> finishes or until it triggers an ERROR.  This would mean that the case
> you've mentioned in [1] would change, by blocking the detach() of s3
> until the callback of s2 finishes.

> [1] https://www.postgresql.org/message-id/20240501231214...@rfd.leadboat.com

Yes, that would be a loss for test readability.  Also, I wouldn't be surprised
if some test wants to attach POINT-B while POINT-A is in injection_wait().
Several options would avoid those limitations:

1. The data area formerly called a "status" area is immutable after attach.
   The core code copies it into a stack buffer to pass in a const callback
   argument.

2. InjectionPointAttach() returns an attachment serial number, and the
   callback receives that as an argument.  injection_points.c would maintain
   an InjectionPointCondition for every still-attached serial number,
   including global attachments.  Finding no condition matching the serial
   number argument means detach already happened and callback should do
   nothing.

3. Move the PID check into core code.

4. Separate the concept of "make ineligible to fire" from "detach", with
   stronger locking for detach.  v1 pointed in this direction, though not
   using that terminology.

5. Injection point has multiple callbacks.  At least one runs with the lock
   held and can mutate the "status" data.  At least one runs without the lock.

(1) is, I think, simple and sufficient.  How about that?
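
For (1), a minimal sketch with hypothetical names (the data area is
written once at attach time; the core code copies it while holding the
lock, so the callback only ever sees the stable copy):

#define INJ_DATA_SIZE 128		/* assumed fixed area size */

void
InjectionPointRun(const char *name)
{
	char		data[INJ_DATA_SIZE];
	InjectionPointCallback callback = NULL;
	InjectionPointEntry *entry;	/* hypothetical entry struct */

	LWLockAcquire(InjectionPointLock, LW_SHARED);
	entry = injection_point_lookup(name);	/* hypothetical lookup */
	if (entry != NULL)
	{
		memcpy(data, entry->data, INJ_DATA_SIZE);	/* immutable after attach */
		callback = entry->callback;
	}
	LWLockRelease(InjectionPointLock);

	/* A concurrent detach can no longer mutate what the callback reads. */
	if (callback != NULL)
		callback(name, (const void *) data);
}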




Re: Support tid range scan in parallel?

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 10:23, Cary Huang  wrote:
> The v3 patch is attached.

I've not looked at the patch, but please add it to the July CF.  I'll
try and look in more detail then.

David




Re: Expand applicability of aggregate's sortop optimization

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 13:08, David Rowley  wrote:
> One additional thought is that the above method would also help
> eliminate redundant sorting in queries with a GROUP BY clause.
> Whereas, the can_minmax_aggs optimisation is not applied in that case.

Another argument for using this method is that
SupportRequestOptimizeAggref could allow future unrelated
optimisations, such as swapping count(<col>) for count(*),
where <col> is a NOT NULL column and isn't nullable by
any outer join.  Doing that could speed some queries up quite a bit as
it may mean fewer columns to deform from the tuple. You could imagine
a fact table with many columns and a few dimensions, often the
dimension columns that you'd expect to use in GROUP BY would appear
before the columns you'd aggregate on.
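
As an illustration only, inside the same hypothetical support request
(the nullability test is a made-up helper standing in for whatever the
planner would actually use, and a real patch would also have to fix up
aggargtypes and friends):

	if (aggref->aggfnoid == F_COUNT_ANY &&
		list_length(aggref->args) == 1)
	{
		TargetEntry *tle = (TargetEntry *) linitial(aggref->args);

		if (IsA(tle->expr, Var) &&
			var_is_provably_not_null(req->root, (Var *) tle->expr))	/* hypothetical */
		{
			aggref->args = NIL;	/* count(*) takes no arguments */
			aggref->aggstar = true;
			aggref->aggfnoid = F_COUNT_;	/* count(*) */
		}
	}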

David




Re: PERIOD foreign key feature

2024-05-08 Thread Paul Jungwirth

On 5/8/24 07:44, Bruce Momjian wrote:

On Wed, May  8, 2024 at 02:29:34PM +0200, Peter Eisentraut wrote:

Yes, David is correct here on all points. I like his suggestion to
clarify the language here also. If you need a patch from me let me know,
but I assume it's something a committer can just make happen?


In principle yes, but it's also very helpful if someone produces an actual
patch file, with complete commit message, credits, mailing list link, etc.


I am ready to do the work, but waited a day for Peter to reply, since he
was the author of the text.


Here is a patch for this.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From d97a60b7e56c57a242668609c8fb82e6a6a32506 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 8 May 2024 20:41:05 -0700
Subject: [PATCH] Reword docs for temporal PKs with implicit referent

The old docs had confusing language about "one side" of the foreign key,
which could mean either the referenced primary key itself or the
REFERENCES clause of the FK declaration.

Author: David G. Johnston 
Discussion: https://www.postgresql.org/message-id/ZjpApuq8I9DE5Elv%40momjian.us
---
 doc/src/sgml/ref/create_table.sgml | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 02f31d2d6fd..75f06bc49cc 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1184,11 +1184,13 @@ WITH ( MODULUS numeric_literal, REM
   referent for its entire duration.  This column must be a range or
   multirange type.  In addition, the referenced table must have a primary
   key or unique constraint declared with WITHOUT
-  OVERLAPS.  Finally, if one side of the foreign key uses
-  PERIOD, the other side must too.  If the refcolumn list is omitted, the
-  WITHOUT OVERLAPS part of the primary key is treated
-  as if marked with PERIOD.
+  OVERLAPS. Finally, if the foreign key has a PERIOD
+  column_name specification
+  the corresponding refcolumn,
+  if present, must also be marked PERIOD.  If the
+  refcolumn clause is omitted,
+  and thus the reftable's primary key constraint chosen, the primary key
+  must have its final column marked WITHOUT OVERLAPS.
  
 
  
-- 
2.42.0



Non-systematic handling of EINTR/EAGAIN/EWOULDBLOCK

2024-05-08 Thread Alexander Lakhin

Hello hackers,

Looking at a recent failure on the buildfarm:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2024-04-30%2020%3A48%3A34

# poll_query_until timed out executing this query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
[23:01:41] t/020_archive_status.pl ..
Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests

with the following error in the log:
2024-04-30 22:57:27.931 CEST [83115:1] LOG:  archive command failed with exit 
code 1
2024-04-30 22:57:27.931 CEST [83115:2] DETAIL:  The failed archive command was: cp 
"pg_wal/00010001_does_not_exist" "00010001_does_not_exist"

...
2024-04-30 22:57:28.070 CEST [47962:2] [unknown] LOG:  connection authorized: user=pgbf database=postgres 
application_name=020_archive_status.pl

2024-04-30 22:57:28.072 CEST [47962:3] 020_archive_status.pl LOG: statement: 
SELECT archived_count FROM pg_stat_archiver
2024-04-30 22:57:28.073 CEST [83115:3] LOG:  could not send to statistics 
collector: Resource temporarily unavailable

and the corresponding code (on REL_13_STABLE):
static void
pgstat_send(void *msg, int len)
{
    int rc;

    if (pgStatSock == PGINVALID_SOCKET)
    return;

    ((PgStat_MsgHdr *) msg)->m_size = len;

    /* We'll retry after EINTR, but ignore all other failures */
    do
    {
    rc = send(pgStatSock, msg, len, 0);
    } while (rc < 0 && errno == EINTR);

#ifdef USE_ASSERT_CHECKING
    /* In debug builds, log send failures ... */
    if (rc < 0)
    elog(LOG, "could not send to statistics collector: %m");
#endif
}

I wonder whether this retry should be performed after EAGAIN (Resource
temporarily unavailable) and EWOULDBLOCK as well.
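
For instance (just a sketch; since pgStatSock is put into non-blocking
mode, a real fix might want to bound the retries or sleep briefly, as
EAGAIN here means the socket buffer is full rather than a signal):

	/* Retry on EINTR and on transient buffer-full conditions. */
	do
	{
		rc = send(pgStatSock, msg, len, 0);
	} while (rc < 0 && (errno == EINTR ||
						errno == EAGAIN ||
						errno == EWOULDBLOCK));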

With a simple send() wrapper (PFA) activated with LD_PRELOAD, I could
reproduce this failure easily when running
`make -s check -C src/test/recovery/ PROVE_TESTS="t/020*"` on
REL_13_STABLE:
t/020_archive_status.pl .. 1/16 # poll_query_until timed out executing this 
query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests

I also reproduced another failure (that lacks useful diagnostics, 
unfortunately):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2022-11-10%2015%3A30%3A16
...
t/020_archive_status.pl .. 8/16 # poll_query_until timed out executing this 
query:
# SELECT last_archived_wal FROM pg_stat_archiver
# expecting this output:
# 00010002
# last actual query output:
# 00010001
# with stderr:
# Looks like your test exited with 29 just after 13.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/16 subtests
...

The "n == 64" condition in the cranky send() is needed to aim exactly
these failures. Without this restriction the test (and also `make check`)
just hangs because of:
    if (errno == EINTR)
    continue;   /* Ok if we were interrupted */

    /*
 * Ok if no data writable without blocking, and the socket is in
 * non-blocking mode.
 */
    if (errno == EAGAIN ||
    errno == EWOULDBLOCK)
    {
    return 0;
    }
in internal_flush_buffer().

On the other hand, even with:
int
send(int s, const void *buf, size_t n, int flags)
{
    if (rand() % 1 == 0)
    {
    errno = EINTR;
    return -1;
    }
    return real_send(s, buf, n, flags);
}

`make check` fails with many miscellaneous errors...

Best regards,
Alexander

#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t (*real_send)(int s, const void *buf, size_t n, int flags) = NULL;

__attribute__((constructor))
void
lib_init(void)
{
	real_send = dlsym(RTLD_NEXT,"send");
}

int
send(int s, const void *buf, size_t n, int flags)
{
	if (n == 64 && rand() % 10 == 0)
	{
		errno = EAGAIN;
		return -1;
	}
	return real_send(s, buf, n, flags);
}


First draft of PG 17 release notes

2024-05-08 Thread Bruce Momjian
I have committed the first draft of the PG 17 release notes;  you can
see the results here:

https://momjian.us/pgsql_docs/release-17.html

It will be improved until the final release.  The item count is 188,
which is similar to recent releases:

release-10:  189
release-11:  170
release-12:  180
release-13:  178
release-14:  220
release-15:  184
release-16:  206
release-17:  188

I welcome feedback.  For some reason it was an easier job than usual.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: SQL:2011 application time

2024-05-08 Thread Paul Jungwirth
Here are a couple of new patches, rebased to e305f715, addressing Peter's
feedback.  I'm still working on integrating jian he's suggestions for the
last patch, so I've omitted that one here.


On 5/8/24 06:51, Peter Eisentraut wrote:

About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think 
the
ideas are right, but I wonder if we can fine-tune the new conditionals a bit.

--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool 
speculative)
  * If the indexes are to be used for speculative insertion, 
add extra
  * information required by unique index entries.
  */
-   if (speculative && ii->ii_Unique)
+   if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
     BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion instead.  So 
we
wouldn't need ii_HasWithoutOverlaps.


Okay.


Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the 
rest
if an exclusion constraint is passed, on the theory that all the speculative 
index
info is already present in that case.


I like how BuildSpeculativeIndexInfo starts with an Assert that it's given a unique index, so I've 
left the check outside the function. This seems cleaner anyway: the function stays more focused.



--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
  */
     if (indexOidFromConstraint == idxForm->indexrelid)
     {
-   if (!idxForm->indisunique && onconflict->action == 
ONCONFLICT_UPDATE)
+   if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == 
ONCONFLICT_UPDATE)

     ereport(ERROR,
     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("ON CONFLICT DO UPDATE not supported with exclusion 
constraints")));

Shouldn't this use only idxForm->indisexclusion anyway?  Like

+   if (idxForm->indisexclusion && onconflict->action == 
ONCONFLICT_UPDATE)

That matches what the error message is reporting afterwards.


Agreed.


  * constraints), so index under consideration can be immediately
  * skipped if it's not unique
  */
-   if (!idxForm->indisunique)
+   if (!idxForm->indisunique || idxForm->indisexclusion)
     goto next;

Maybe here we need a comment.  Or make that a separate statement, like


Yes, that is nice. Done.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com

From 0ebe2e6d6cd8dc6f8120fe93b9024cf80472f8cc Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Sun, 24 Mar 2024 21:46:30 -0700
Subject: [PATCH v2 1/2] Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes

A PRIMARY KEY or UNIQUE constraint with WITHOUT OVERLAPS will be a GiST
index, not a B-Tree, but it will still have indisunique set. The code
for ON CONFLICT fails if it sees a non-btree index that has indisunique.
This commit fixes that and adds some tests. But now that we can't just
test indisunique, we also need some extra checks to prevent DO UPDATE
from running against a WITHOUT OVERLAPS constraint (because the conflict
could happen against more than one row, and we'd only update one).
---
 src/backend/catalog/index.c   |   1 +
 src/backend/executor/execIndexing.c   |   2 +-
 src/backend/optimizer/util/plancat.c  |   9 +-
 src/include/nodes/execnodes.h |   1 +
 .../regress/expected/without_overlaps.out | 176 ++
 src/test/regress/sql/without_overlaps.sql | 113 +++
 6 files changed, 300 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c55c9..1fd543cc550 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2443,6 +2443,7 @@ BuildIndexInfo(Relation index)
  &ii->ii_ExclusionOps,
  &ii->ii_ExclusionProcs,
  &ii->ii_ExclusionStrats);
+		ii->ii_HasWithoutOverlaps = ii->ii_Unique;
 	}
 
 	return ii;
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9f05b3654c1..59acf67a36a 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
 		 * If the indexes are to be used for speculative insertion, add extra
 		 * information required by unique index entries.
 		 */
-		if (speculative && ii->ii_Unique)
+		if (speculative && ii->ii_Unique && !indexDesc->rd_index->indisexclusion)
 			BuildSpeculativeIndexInfo(indexDesc, ii);
 
 		relationDescs[i] = indexDesc;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 130f838629f..775c3e26cd8 

Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-08 Thread Michael Paquier
On Wed, May 08, 2024 at 07:01:08AM -0700, Jacob Champion wrote:
> On Tue, May 7, 2024 at 10:31 PM Michael Paquier  wrote:
>> But looking closer, I can see that in the JSON_INVALID_TOKEN case,
>> when !tok_done, we set token_terminator to point to the end of the
>> token, and that would include an incomplete byte sequence like in your
>> case.  :/
> 
> Ah, I see what you're saying. Yeah, that approach would need some more
> invasive changes.

My first feeling was actually to do that, and report the location in
the input string where we are seeing issues.  All code paths playing
with token_terminator would need to track that.

> Agreed. Fortunately (or unfortunately?) I think the JSON
> client-encoding work is now a prerequisite for OAuth in libpq, so
> hopefully some improvements can fall out of that work too.

I'm afraid so.  I don't quite see how this would be OK to tweak on
stable branches, but all areas that could report error states with
partial byte sequence contents would benefit from such a change.

>> Thoughts and/or objections?
> 
> None here.

This is a bit mitigated by the fact that d6607016c738 is recent, but
this has been incorrect since v13, so it is backpatched down to that.
--
Michael




Re: First draft of PG 17 release notes

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 16:04, Bruce Momjian  wrote:
> I welcome feedback.  For some reason it was an easier job than usual.

Thanks for working on that.

> +2023-11-02 [cac169d68] Increase DEFAULT_FDW_TUPLE_COST from 0.01 to 0.2

> +Double the default foreign data wrapper tuple cost (David Rowley, Umair 
> Shahid)

That's 20x rather than 2x.

David




Re: First draft of PG 17 release notes

2024-05-08 Thread Muhammad Ikram
Hi Bruce,

A minor formatting issue at the start, shown below.  The bullet is not required here.

E.1.1. Overview


PostgreSQL 17 contains many new features and enhancements, including:

   -


The above items and other new features of PostgreSQL 17 are explained in
more detail in the sections below.
Regards,
Ikram



On Thu, May 9, 2024 at 9:45 AM David Rowley  wrote:

> On Thu, 9 May 2024 at 16:04, Bruce Momjian  wrote:
> > I welcome feedback.  For some reason it was an easier job than usual.
>
> Thanks for working on that.
>
> > +2023-11-02 [cac169d68] Increase DEFAULT_FDW_TUPLE_COST from 0.01 to 0.2
>
> > +Double the default foreign data wrapper tuple cost (David Rowley, Umair
> Shahid)
>
> That's 20x rather than 2x.
>
> David
>
>
>

-- 
Muhammad Ikram


Re: Weird test mixup

2024-05-08 Thread Michael Paquier
On Wed, May 08, 2024 at 08:15:53PM -0700, Noah Misch wrote:
> Yes, that would be a loss for test readability.  Also, I wouldn't be surprised
> if some test will want to attach POINT-B while POINT-A is in injection_wait().

Not impossible, still annoying with more complex scenarios.

> Various options avoiding those limitations:
> 
> 1. The data area formerly called a "status" area is immutable after attach.
>The core code copies it into a stack buffer to pass in a const callback
>argument.
> 
> 3. Move the PID check into core code.

The PID checks are specific to the module, and there could be many
more conditions, like running only in specific backend types (the first
example coming to mind), so I want to avoid that kind of knowledge
in the backend.

> (1) is, I think, simple and sufficient.  How about that?

That sounds fine to do at the end.  I'm not sure how large this
chunk area added to each InjectionPointEntry should be, though.  Would
128B stored in each InjectionPointEntry be more than enough?
Or more like 1024?  The in-core module does not need much currently,
but larger is likely better for pluggability.
--
Michael




Re: First draft of PG 17 release notes

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 16:47, Muhammad Ikram  wrote:
> A minor formatting issue in the start below. Bullet is not required here.

This is a placeholder where the highlight features of v17 will go.
Bruce tends not to decide what those are all by himself.

David




Re: First draft of PG 17 release notes

2024-05-08 Thread Bertrand Drouvot
Hi,

On Thu, May 09, 2024 at 12:03:50AM -0400, Bruce Momjian wrote:
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
> 
>   https://momjian.us/pgsql_docs/release-17.html

Thanks for working on that!
 
> I welcome feedback.

> Add system view pg_wait_events that reports wait event types (Michael 
> Paquier) 

Michael is the committer for 1e68e43d3f; the author is me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: First draft of PG 17 release notes

2024-05-08 Thread Masahiko Sawada
Hi,

On Thu, May 9, 2024 at 1:03 PM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html

Thank you for working on that!

I'd like to mention some of my work. I think we can add the vacuum
performance improvements from the following commits:

- Add template for adaptive radix tree (ee1b30f1)
- Add TIDStore, to store sets of TIDs (ItemPointerData) efficiently (30e144287)
- Use TidStore for dead tuple TIDs storage during lazy vacuum (667e65aac)

Also, please consider the following item:

- Improve eviction algorithm in ReorderBuffer using max-heap for many
subtransactions (5bec1d6bc)

Finally, should we mention the following commit in the release note?
It's not a user-visible change but added a new regression test module.

- Add tests for XID wraparound (e255b646a)

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: cannot abort transaction 2737414167, it was already committed

2024-05-08 Thread Thomas Munro
On Thu, Dec 28, 2023 at 11:42 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > In CommitTransaction() there is a stretch of code beginning s->state =
> > TRANS_COMMIT and ending s->state = TRANS_DEFAULT, from which we call
> > out to various subsystems' AtEOXact_XXX() functions.  There is no way
> > to roll back in that state, so anything that throws ERROR from those
> > routines is going to get something much like $SUBJECT.  Hmm, we'd know
> > which exact code path got that EIO from your smoldering core if we'd
> > put an explicit critical section there (if we're going to PANIC
> > anyway, it might as well not be from a different stack after
> > longjmp()...).
>
> +1, there's basically no hope of debugging this sort of problem
> as things stand.

I was reminded of this thread by Justin's other file system snafu thread.

Naively defining a critical section to match the extent of the
TRANS_COMMIT state doesn't work, as a bunch of code under there uses
palloc().  That reminds me of the nearby RelationTruncate() thread,
and there is possibly even some overlap, plus more in this case...
ugh.

Hmm, AtEOXact_RelationMap() is one of those steps, but lives just
outside the crypto-critical-section created by TRANS_COMMIT, though
has its own normal CS for logging.  I wonder, given that "updating the
map file is effectively commit of the relocation", why wouldn't it
have a variant of the problem solved by DELAY_CHKPT_START for normal
commit records, under diabolical scheduling?  It's a stretch, but: You
log XLOG_RELMAP_UPDATE, a concurrent checkpoint runs with REDO after
that record, you crash before/during durable_rename(), and then you
perform crash recovery.  Now your catalog is still using the old
relfilenode on the primary, but any replica following along replays
XLOG_RELMAP_UPDATE and is using the new relfilenode, frozen in time,
for queries, while replaying changes to the old relfilenode.  Right?




Re: query_id, pg_stat_activity, extended query protocol

2024-05-08 Thread Andrei Lepikhov

On 5/1/24 10:07, Imseih (AWS), Sami wrote:

Here is a new rev of the patch which deals with the scenario
mentioned by Andrei [1] in which the queryId may change
due to a cached query invalidation.


[1] 
https://www.postgresql.org/message-id/724348C9-8023-41BC-895E-80634E79A538%40amazon.com
I examined the current state of queryId reporting and found that it
may be illogical: Postgres resets queryId right before query execution
in the simple protocol, and doesn't reset it at all in the extended
protocol and other ways of executing queries.
I think we should generally report it when the backend executes a job 
related to the query with that queryId. This means it would reset the 
queryId at the end of the query execution.
However, the process of setting up the queryId is more complex. Should 
we set it at the beginning of query execution? This seems logical, but 
what about the planning process? If an extension plans a query without 
the intention to execute it for speculative reasons, should we still 
show the queryId? Perhaps we should reset the state right after planning 
to accurately reflect the current queryId.
See the attachment for a sketch of that - it still needs a queryId
reset on transaction abort.


--
regards, Andrei Lepikhov
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d63c..a4d38a157f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -470,6 +470,12 @@ ExecutorEnd(QueryDesc *queryDesc)
 		(*ExecutorEnd_hook) (queryDesc);
 	else
 		standard_ExecutorEnd(queryDesc);
+
+	/*
+	 * Report at the end of query execution. Don't change it for nested
+	 * queries.
+	 */
+	pgstat_report_query_id(0, false);
 }
 
 void
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 032818423f..ba29dc5bc7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -58,6 +58,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "pgstat.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
@@ -280,6 +281,13 @@ planner(Query *parse, const char *query_string, int cursorOptions,
 		result = (*planner_hook) (parse, query_string, cursorOptions, boundParams);
 	else
 		result = standard_planner(parse, query_string, cursorOptions, boundParams);
+
+	/*
+	 * Reset queryId at the end of planning job. Executor code will set it up
+	 * again on the ExecutorStart call.
+	 */
+	pgstat_report_query_id(0, false);
+
 	return result;
 }
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2dff28afce..10e2529cf6 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1108,8 +1108,6 @@ exec_simple_query(const char *query_string)
 		const char *cmdtagname;
 		size_t		cmdtaglen;
 
-		pgstat_report_query_id(0, true);
-
 		/*
 		 * Get the command name for use in status display (it also becomes the
 		 * default completion tag, down inside PortalRun).  Set ps_status and


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Peter Smith
Hi, Here are some review comments for the patch v7-0003.

==
Commit Message

1.
The patch needs a commit message to describe the purpose and highlight
any limitations and other details.

==
doc/src/sgml/ref/alter_subscription.sgml

2.
+
+ 
+  The two_phase parameter can only be altered when the
+  subscription is disabled. When altering the parameter from true
+  to false, the backend process checks prepared
+  transactions done by the logical replication worker and aborts them.
+ 

Here, the para refers to "true" and "false", but earlier on this
page it talks about "two_phase = off". IMO it is better to use
consistent terminology like "on|off" everywhere instead of randomly
changing the way it is described each time.

==
src/backend/commands/subscriptioncmds.c

3. AlterSubscription

  if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
  {
+ List *prepared_xacts = NIL;

This 'prepared_xacts' can be declared at a lower scope because it is
only used if (!opts.twophase).

Furthermore, IIUC you don't need to assign NIL in the declaration
because there is no chance for it to be unassigned anyway.

~~~

4. AlterSubscription

+ /*
+ * The changed two_phase option (true->false) of the
+ * slot can't be rolled back.
+ */
  PreventInTransactionBlock(isTopLevel,
"ALTER SUBSCRIPTION ... SET (two_phase = off)");

Here is another example of inconsistent mixing of the terminology
where the comment says "true"/"false" but the message says "off".
Let's keep everything consistent. (I prefer on|off).

~~~

5.
+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
+ (prepared_xacts = GetGidListBySubid(subid)) != NIL)
+ {
+ ListCell *cell;
+
+ /* Abort all listed transactions */
+ foreach(cell, prepared_xacts)
+ FinishPreparedTransaction((char *) lfirst(cell),
+   false);
+
+ list_free(prepared_xacts);
+ }

5A.
IIRC there is a cleaner way to write this loop without needing a
ListCell variable -- e.g. the foreach_ptr() macro?

~

5B.
Shouldn't this be using list_free_deep() so the pstrdup gid gets freed too?
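
For example, something like this (untested):

    /* Abort all prepared transactions found for this subscription */
    foreach_ptr(char, gid, prepared_xacts)
        FinishPreparedTransaction(gid, false);

    /* The GIDs were pstrdup'd, so free them as well */
    list_free_deep(prepared_xacts);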

==
src/test/subscription/t/099_twophase_added.pl

6.
+##
+# Check the case that prepared transactions exist on subscriber node
+##
+

Give some more detailed comments here similar to the review comment of
patch v7-0002 for the other part of this TAP test.

~~~

7. TAP test - comments

Same as for my v7-0002 review comments, I think this test case also
needs a few more one-line comments to describe the sub-steps. e.g.:

# prepare a transaction to insert some rows to the table

# verify the prepared tx is replicated to the subscriber (because
'two_phase = on')

# toggle the two_phase to 'off' *before* the COMMIT PREPARED

# verify the prepared tx got aborted

# do the COMMIT PREPARED (note that now two_phase is 'off')

# verify the inserted rows got replicated ok

~~~

8. TAP test - subscription name

It's better to rename the SUBSCRIPTION in this TAP test so you can
avoid getting log warnings like:

psql::4: WARNING:  subscriptions created by regression test
cases should have names starting with "regress_"
psql::4: NOTICE:  created replication slot "sub" on publisher

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: First draft of PG 17 release notes

2024-05-08 Thread Richard Guo
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:

> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html


Thanks for working on that.

For this item:


> Allow the optimizer to improve CTE plans by using the sort order of
> columns referenced in earlier CTE clauses (Jian Guo)


I think you mean a65724dfa.  The author should be 'Richard Guo'.

And I'm wondering if it is more accurate to state it as "Allow the
optimizer to improve plans for the outer query by leveraging the sort
order of a CTE's output."

I think maybe a similar revision can be applied to the item just above
this one.

Thanks
Richard