Re: Useless field ispartitioned in CreateStmtContext
Hi On Thu, 24 Oct 2024, 18:08 hugo, <2689496...@qq.com> wrote: > Hi! > >When looking at the partition-related code, I found that the > ispartitioned > > field in CreateStmtContext is not used. It looks like we can safely remove > it and > > avoid invalid assignment logic. > > > > Here's a very simple fix, any suggestion? > > > > diff --git a/src/backend/parser/parse_utilcmd.c > b/src/backend/parser/parse_utilcmd.c > > index 1e15ce10b48..6dea85cc2dc 100644 > > --- a/src/backend/parser/parse_utilcmd.c > > +++ b/src/backend/parser/parse_utilcmd.c > > @@ -89,7 +89,6 @@ typedef struct > > List *alist; /* "after list" of things > to do after creating > > * the > table */ > > IndexStmt *pkey; /* PRIMARY KEY index, if > any */ > > - boolispartitioned; /* true if table is partitioned */ > > PartitionBoundSpec *partbound; /* transformed FOR VALUES */ > > boolofType; /* true if statement > contains OF typename */ > > } CreateStmtContext; > > @@ -246,7 +245,6 @@ transformCreateStmt(CreateStmt *stmt, const char > *queryString) > > cxt.blist = NIL; > > cxt.alist = NIL; > > cxt.pkey = NULL; > > - cxt.ispartitioned = stmt->partspec != NULL; > > cxt.partbound = stmt->partbound; > > cxt.ofType = (stmt->ofTypename != NULL); > > @@ -3401,7 +3399,6 @@ transformAlterTableStmt(Oid relid, AlterTableStmt > *stmt, > > cxt.blist = NIL; > > cxt.alist = NIL; > > cxt.pkey = NULL; > > - cxt.ispartitioned = (rel->rd_rel->relkind == > RELKIND_PARTITIONED_TABLE); > > cxt.partbound = NULL; > > cxt.ofType = false; > > > > -- > > Best regards, > > hugozhang > I don't see any cf entry for this patch, so I created one[1]. I added Alena and myself as reviewers. I can't find your account on cf, please add yourself as author [1] https://commitfest.postgresql.org/50/5340/ >
Re: Add isolation test template in injection_points for wait/wakeup/detach
On Wed, Oct 30, 2024 at 07:26:07AM +, Bertrand Drouvot wrote:
> 1 === Nit
>
> s/Permutations like this one/Permutations like the following commented out
> one/ ?
>
> 2 ===
>
> s/back its result/back its result once woken up/ ?
>
> 3 ===
>
> s/faster than the wait/before the SQL function doing the wait returns its
> result/ ?
>
> With 2 === and 3 === I think it's easier to "link" both sentences that way.

What do you think about the updated version attached?
--
Michael

diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
index 7f44e3ddc3..47db676c0c 100644
--- a/src/test/modules/injection_points/specs/basic.spec
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -26,7 +26,10 @@
 step wakeup2 { SELECT injection_points_wakeup('injection-points-wait'); }
 step detach2 { SELECT injection_points_detach('injection-points-wait'); }
 
 # Detach after wait and wakeup.
-# This permutation is proving to be unstable on FreeBSD, so disable for now.
+# Permutations like the following one commented out should be avoided, as
+# the detach may finish before the SQL function doing the wait returns
+# its result. It is recommended to use wakeups as the last permutation
+# should a wait be done within an SQL function.
 #permutation wait1 wakeup2 detach2
 # Detach before wakeup. s1 waits until wakeup, ignores the detach.

signature.asc
Description: PGP signature
Consider pipeline implicit transaction as a transaction block
Hi,

When pipelining is used with an implicit transaction, the transaction starts with the first command and is committed with the Sync message. Functions like IsInTransactionBlock and PreventInTransactionBlock already treat this implicit transaction as a transaction block, relying on the XACT_FLAGS_PIPELINING flag. However, this is not the case in CheckTransactionBlock, the function used for things like warning when a local GUC is set outside of a transaction block.

The attached patch adds detection of implicit transactions started by a pipeline to CheckTransactionBlock, avoiding warnings when commands like `set local` are called within a pipeline and making the detection of transaction blocks coherent with what's done in IsInTransactionBlock and PreventInTransactionBlock.

One thing that's still not fixed by the patch is that the first command won't be seen as part of a transaction block. For example, with pgbench:

\startpipeline
SET LOCAL statement_timeout='200ms';
SELECT * FROM pgbench_accounts where aid=1;
\endpipeline

XACT_FLAGS_PIPELINING will only be set after the first command, so the warning about `set local` happening outside of a transaction block will still be generated. However, I'm not sure whether that is fixable (or worth fixing?): it would require knowing beforehand that there are multiple executes before the sync message, which doesn't seem doable.

Regards,
Anthonin

v01-0001-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data
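For context, a minimal sketch of the kind of check being discussed, assuming it hooks into CheckTransactionBlock() roughly as follows; the surrounding warning/error logic is elided, and the exact placement is an assumption rather than the attached patch:

/*
 * Illustrative sketch only (not the attached patch): treat an implicit
 * transaction opened by a pipeline as a transaction block, the same way
 * IsInTransactionBlock() already does, by checking XACT_FLAGS_PIPELINING.
 * The function shape follows src/backend/access/transam/xact.c; its
 * existing body is elided.
 */
static void
CheckTransactionBlock(bool isTopLevel, bool throwError, const char *stmtType)
{
	if (IsTransactionBlock())
		return;

	/* An implicit transaction driven by a pipeline counts as a block. */
	if (MyXactFlags & XACT_FLAGS_PIPELINING)
		return;

	/* ... existing "should be in a transaction block" warning/error ... */
}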
Re: define pg_structiszero(addr, s, r)
Hi, On Wed, Oct 30, 2024 at 09:09:54AM +0100, Peter Eisentraut wrote: > On 29.10.24 14:58, Bertrand Drouvot wrote: > > hi, > > > > On Tue, Oct 29, 2024 at 10:23:37AM +0100, Peter Eisentraut wrote: > > > On 29.10.24 08:54, Bertrand Drouvot wrote: > > > > +static inline bool > > > > +pg_mem_is_all_zeros(const char *p, size_t len) > > > > > > This should be a void * pointer please. > > > > Thanks for looking at it! > > Yeah better, done in v4 attached. > > Sorry for the confusion. I didn't mean to discourage you from the const > qualifier. That should still be there. doh! Of course we want it to be read only. No problem, I should have put more thoughts on it. Done in v5 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From e80e5cf946813185985bf5339ac996ccc0eab1f0 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Tue, 10 Sep 2024 01:22:02 + Subject: [PATCH v5] New pg_mem_is_all_zeros(p, len) inline function This new function allows to test if a memory region (starting at p) and of size len is full of zeroes. --- src/backend/storage/page/bufpage.c | 13 + src/backend/utils/activity/pgstat_bgwriter.c | 5 +++-- src/backend/utils/activity/pgstat_checkpointer.c | 7 +++ src/backend/utils/activity/pgstat_relation.c | 7 ++- src/include/utils/memutils.h | 16 5 files changed, 25 insertions(+), 23 deletions(-) 19.8% src/backend/storage/page/ 58.6% src/backend/utils/activity/ 21.5% src/include/utils/ diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index be6f1f62d2..ca4c68de1c 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; - int i; bool checksum_failure = false; bool header_sane = false; - bool all_zeroes = false; uint16 checksum = 0; /* @@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) } /* Check all-zeroes case */ - all_zeroes = true; pagebytes = (size_t *) page; - for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++) - { - if (pagebytes[i] != 0) - { - all_zeroes = false; - break; - } - } - if (all_zeroes) + if (pg_mem_is_all_zeros(pagebytes , (BLCKSZ / sizeof(size_t return true; /* diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c index 364a7a2024..398e1d57ca 100644 --- a/src/backend/utils/activity/pgstat_bgwriter.c +++ b/src/backend/utils/activity/pgstat_bgwriter.c @@ -17,6 +17,7 @@ #include "postgres.h" +#include "utils/memutils.h" #include "utils/pgstat_internal.h" @@ -30,7 +31,6 @@ void pgstat_report_bgwriter(void) { PgStatShared_BgWriter *stats_shmem = &pgStatLocal.shmem->bgwriter; - static const PgStat_BgWriterStats all_zeroes; Assert(!pgStatLocal.shmem->is_shutdown); pgstat_assert_is_up(); @@ -39,7 +39,8 @@ pgstat_report_bgwriter(void) * This function can be called even if nothing at all has happened. In * this case, avoid unnecessarily modifying the stats entry. 
*/ - if (memcmp(&PendingBgWriterStats, &all_zeroes, sizeof(all_zeroes)) == 0) + if (pg_mem_is_all_zeros(&PendingBgWriterStats, + sizeof(struct PgStat_BgWriterStats))) return; pgstat_begin_changecount_write(&stats_shmem->changecount); diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c index bbfc9c7e18..fb66a1a11d 100644 --- a/src/backend/utils/activity/pgstat_checkpointer.c +++ b/src/backend/utils/activity/pgstat_checkpointer.c @@ -17,6 +17,7 @@ #include "postgres.h" +#include "utils/memutils.h" #include "utils/pgstat_internal.h" @@ -29,8 +30,6 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0}; void pgstat_report_checkpointer(void) { - /* We assume this initializes to zeroes */ - static const PgStat_CheckpointerStats all_zeroes; PgStatShared_Checkpointer *stats_shmem = &pgStatLocal.shmem->checkpointer; Assert(!pgStatLocal.shmem->is_shutdown); @@ -40,8 +39,8 @@ pgstat_report_checkpointer(void) * This function can be called even if nothing at all has happened. In * this case, avoid unnecessarily modifying the stats entry. */ - if (memcmp(&PendingCheckpointerStats, &all_zeroes, - sizeof(all_zeroes)) == 0) + if (pg_mem_is_all_zeros(&PendingCheckpointerStats, + sizeof(struct PgStat_CheckpointerStats))) return; pgstat_begin_changecount_write(&stats_shmem->changecount); diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index 8a3f7d434c..bf1947f316 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -801,7 +801,6 @@ pgstat_twophase_postabort(Trans
Re: Add isolation test template in injection_points for wait/wakeup/detach
Hi, On Wed, Oct 30, 2024 at 06:21:04PM +0900, Michael Paquier wrote: > On Wed, Oct 30, 2024 at 07:26:07AM +, Bertrand Drouvot wrote: > > 1 === Nit > > > > s/Permutations like this one/Permutations like the following commented out > > one/ ? > > > > 2 === > > > > s/back its result/back its result once woken up/ ? > > > > 3 === > > > > s/faster than the wait/before the SQL function doing the wait returns its > > result/ ? > > > > With 2 === and 3 === I think it's easier to "link" both sentences that way. > > What do you think about the updated version attached? Thanks! LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Index AM API cleanup
I want to call out one particular aspect that is central to this patch series and that needs broader discussion and agreement.

The problem is that btree indexes (and to a lesser extent hash indexes) are hard-coded in various places. This includes various places in the optimizer (see patch v17-0018) that might be harder to understand, but also more easy-to-understand places such as:

- Only hash and btree are supported for replica identity. (see patch v17-0015)
- Only btree is supported for speculative insertion. (see patch v17-0016)
- Only btree is supported for foreign keys. (see patch v17-0017)
- Only btree can participate in merge join. (see patch v17-0020)

The problem in cases such as these, and some of them actually contain code comments about this, is that we somehow need to know what the equality operator (and in some cases ordering operator) for a type is. And the way this is currently done is that the btree and hash strategy numbers are hardcoded, and other index AMs are just not supported because there is no way to find that out from those.

We faced a similar issue for the temporal primary keys feature. One thing this feature had to overcome is that

- Only btree is supported for primary keys.

For that feature, we wanted to add support for gist indexes, and we had to again find a way to get the equals operator (and also the overlaps operator). The solution we ended up with there is to add a support function to gist that translates the strategy numbers used by the operator class to "well-known" strategy numbers that we can understand (commit 7406ab623fe). It later turned out that this problem actually might be a subset of the general problem being discussed here, so we can also change that solution if we find a more general solution here.

So the question is, how do we communicate these fundamental semantics of operators?

The solution we discussed for temporal primary keys is that we just mandate that certain strategy numbers have fixed meanings. This is how btree and hash work anyway, but this was never required for gist. And there are many gist opclass implementations already out there that don't use compatible strategy numbers, so that solution would not have worked (at least without some upgrading pain). Hence the idea of mapping the actual strategy numbers to a fixed set of well-known ones.

For the general case across all index AMs, we could similarly decree that all index AMs use certain strategy numbers for fixed purposes. But again, this already wouldn't work for gist and perhaps others already out there. So here again we need some kind of mapping to numbers with a fixed purpose. Or perhaps new index AMs can start out with the "correct" numbers and only some existing ones would need the mapping.

And then what do you map them to? One idea would be to map them to the existing btree numbers. This could work, but I always found it kind of confusing that you'd then have multiple sets of strategy numbers in play, one native to the index AM or opclass, and one sort of global one. Another idea is to map them to another thing altogether, a new enum. This is what this patch series does, essentially.
But it turns out (so argues the patch series), that this enum already exists, namely

typedef enum RowCompareType
{
	/* Values of this enum are chosen to match btree strategy numbers */
	ROWCOMPARE_LT = 1,			/* BTLessStrategyNumber */
	ROWCOMPARE_LE = 2,			/* BTLessEqualStrategyNumber */
	ROWCOMPARE_EQ = 3,			/* BTEqualStrategyNumber */
	ROWCOMPARE_GE = 4,			/* BTGreaterEqualStrategyNumber */
	ROWCOMPARE_GT = 5,			/* BTGreaterStrategyNumber */
	ROWCOMPARE_NE = 6,			/* no such btree strategy */
} RowCompareType;

which in spite of the name, yes, sounds like exactly that kind of thing. And it's also used in many places for this kind of thing already.

The patch series adds index AM callbacks

    amtranslatestrategy
    amtranslaterctype

to convert between strategy number and RowCompareType, and also adds

    ROWCOMPARE_NONE
    ROWCOMPARE_INVALID

to handle cases where there is no mapping. I suppose for the temporal PK use, we would need to add something like ROWCOMPARE_OVERLAPS.

So this is the idea. To take a step back, I can see the following options:

1. Require all index AMs to use btree-compatible strategy numbers. (Previously rejected, too much upgrading mess.)

2. Provide mapping between index AM strategy numbers and btree strategy numbers. (This doesn't have a space for non-btree operations like overlaps.)

3. Provide mapping between index AM strategy numbers and the existing RT*StrategyNumber numbers. (We can expand the set as we want.) (This is what the existing mapping function for gist does.)

4. Provide mapping between index AM strategy numbers and some completely new set of numbers/symbols.

5. Provide mapping between index AM strategy numbers and RowCompareType (with som
Re: Fix for consume_xids advancing XIDs incorrectly
I made a new patch (skip_xid_correctly.diff) that incorporates the points we discussed: 1. Fix the issue that consume_xids consumes nxids+1 XIDs. 2. Update lastxid when calling GetTopFullTransactionId() to support nxids==1 case. 3. Forbid consume_xids when nxids==0. 4. Add comments explaining the return values of consume_xids and consume_xids_until, and the rationale for incrementing consumed++ when GetTopFullTransactionId() is called. Also, I made another patch (support_blksz_32k.diff) that supports the block size == 32K case. Best, Yushi Ogiwara diff --git a/src/test/modules/xid_wraparound/xid_wraparound.c b/src/test/modules/xid_wraparound/xid_wraparound.c index dce81c0c6d..63f3305ee9 100644 --- a/src/test/modules/xid_wraparound/xid_wraparound.c +++ b/src/test/modules/xid_wraparound/xid_wraparound.c @@ -26,6 +26,7 @@ static FullTransactionId consume_xids_common(FullTransactionId untilxid, uint64 /* * Consume the specified number of XIDs. + * Returns the last XID assigned by this function. */ PG_FUNCTION_INFO_V1(consume_xids); Datum @@ -34,11 +35,8 @@ consume_xids(PG_FUNCTION_ARGS) int64 nxids = PG_GETARG_INT64(0); FullTransactionId lastxid; - if (nxids < 0) + if (nxids <= 0) elog(ERROR, "invalid nxids argument: %lld", (long long) nxids); - - if (nxids == 0) - lastxid = ReadNextFullTransactionId(); else lastxid = consume_xids_common(InvalidFullTransactionId, (uint64) nxids); @@ -47,6 +45,7 @@ consume_xids(PG_FUNCTION_ARGS) /* * Consume XIDs, up to the given XID. + * Returns the last XID assigned by this function. */ PG_FUNCTION_INFO_V1(consume_xids_until); Datum @@ -88,8 +87,15 @@ consume_xids_common(FullTransactionId untilxid, uint64 nxids) * GetNewTransactionId registers them in the subxid cache in PGPROC, until * the cache overflows, but beyond that, we don't keep track of the * consumed XIDs. + * + * If no top-level XID is assigned, a new one is obtained, + * and the consumed XID counter is incremented. */ - (void) GetTopTransactionId(); + if(!FullTransactionIdIsValid(GetTopFullTransactionIdIfAny())) + { + lastxid = GetTopFullTransactionId(); + consumed++; + } for (;;) { diff --git a/src/test/modules/xid_wraparound/xid_wraparound.c b/src/test/modules/xid_wraparound/xid_wraparound.c index dce81c0c6d..95832fbecc 100644 --- a/src/test/modules/xid_wraparound/xid_wraparound.c +++ b/src/test/modules/xid_wraparound/xid_wraparound.c @@ -64,6 +64,14 @@ consume_xids_until(PG_FUNCTION_ARGS) PG_RETURN_FULLTRANSACTIONID(lastxid); } +/* + * These constants copied from .c files, because they're private. + */ +#define COMMIT_TS_XACTS_PER_PAGE (BLCKSZ / 10) +#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId)) +#define CLOG_XACTS_PER_BYTE 4 +#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE) + /* * Common functionality between the two public functions. */ @@ -115,7 +123,7 @@ consume_xids_common(FullTransactionId untilxid, uint64 nxids) * If we still have plenty of XIDs to consume, try to take a shortcut * and bump up the nextXid counter directly. */ - if (xids_left > 2000 && + if (xids_left > COMMIT_TS_XACTS_PER_PAGE && consumed - last_reported_at < REPORT_INTERVAL && MyProc->subxidStatus.overflowed) { @@ -153,14 +161,6 @@ consume_xids_common(FullTransactionId untilxid, uint64 nxids) return lastxid; } -/* - * These constants copied from .c files, because they're private. 
- */ -#define COMMIT_TS_XACTS_PER_PAGE (BLCKSZ / 10) -#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId)) -#define CLOG_XACTS_PER_BYTE 4 -#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE) - /* * All the interesting action in GetNewTransactionId happens when we extend * the SLRUs, or at the uint32 wraparound. If the nextXid counter is not close
Re: Pgoutput not capturing the generated columns
On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote:
>
> Thank you for reporting this issue. The attached v46 patch addresses
> the problem and includes some adjustments to the comments. Thanks to
> Amit for sharing the comment changes offline.
>

Pushed. Kindly rebase and send the remaining patches.

--
With Regards,
Amit Kapila.
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 07:49, Tatsuo Ishii wrote:
> > I think one would have to define that somehow. If it's useful, the
> > additional fields of both extensions could be appended, in some
> > defined order. But this is an interesting question to think about.
>
> I think this kind of extension, which adds a new field to an existing
> message type, should be implemented as v4 protocol.

Could you explain why you think a major version bump is needed? In what
situation do you care about this? Because for my use cases (client
implementations & pgbouncer) I don't think that would be necessary. If a
client doesn't send the _pq_.wait_for_lsn protocol parameter, it will never
receive this new version.

I don't really see a problem with having two protocol parameters change the
same message. Yes, you have to define what the result of their combination
is, but that seems trivial to do for additions of fields. You either define
the first protocol parameter that was added to the spec to add its field
before the second. Or you could do it based on something non-time-dependent,
like the alphabetic order of the protocol parameter, or the alphabetic order
of the fields that they add.

The main guarantees I'd like to uphold are listed here:
https://www.postgresql.org/message-id/cageczqr5pmud4q8atyz0gooj1xnh33g7g-mlxfml1_vrhbz...@mail.gmail.com
RE: Conflict detection for update_deleted in logical replication
Dear Hou,

Thanks for updating the patch! Here are my comments.

01. CreateSubscription

```
+	if (opts.detectupdatedeleted && !track_commit_timestamp)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("detecting update_deleted conflicts requires \"%s\" to be enabled",
+					   "track_commit_timestamp"));
```

I don't think this guard is sufficient. I found two cases:

* Create a subscription with detect_update_deleted = false and track_commit_timestamp = true, then alter detect_update_deleted to true.
* Create a subscription with detect_update_deleted = true and track_commit_timestamp = true, then set track_commit_timestamp to false and restart the instance.

Based on that, how about detecting the inconsistency in the apply worker? It would check the parameters and error out when it starts or re-reads the catalog. If we want to detect this in SQL commands, it can be done in parse_subscription_options().

02. AlterSubscription()

```
+	ApplyLauncherWakeupAtCommit();
```

The reason why the launcher should wake up here is different from other places. Can we add a comment that it is needed to track/untrack the xmin?

03. build_index_column_bitmap()

```
+	for (int i = 0; i < indexinfo->ii_NumIndexAttrs; i++)
+	{
+		int			keycol = indexinfo->ii_IndexAttrNumbers[i];
+
+		index_bitmap = bms_add_member(index_bitmap, keycol);
+	}
```

I feel we can assert that ii_IndexAttrNumbers is valid, because the passed index is a replica identity key.

04. LogicalRepApplyLoop()

Can we move the definition of "phase" into maybe_advance_nonremovable_xid() and make it static? The variable is used only by that function.

05. LogicalRepApplyLoop()

```
+	UpdateWorkerStats(last_received, timestamp, false);
```

The statistics seem incorrect: last_received was not sent at "timestamp", it had already been sent earlier. Do we really have to update here?

06. ErrorOnReservedSlotName()

I feel we should document that the slot name 'pg_conflict_detection' cannot be specified unconditionally.

07. General

update_deleted can happen without DELETE commands. Should we rename the conflict reason, like 'update_target_modified'? E.g., there is a 2-way replication system and the transactions below happen:

Node A:
  T1: INSERT INTO t (id, value) VALUES (1,1);	ts = 10.00
  T2: UPDATE t SET id = 2 WHERE id = 1;		ts = 10.02

Node B:
  T3: UPDATE t SET value = 2 WHERE id = 1;	ts = 10.01

Then T3 arrives at Node A after T2 has been executed. T3 tries to find id = 1 but finds a dead tuple instead. In this case, 'update_deleted' happens without any delete.

08. Others

Also, here is an analysis related to the truncation of commit timestamps. I worried about the case where commit timestamps might be removed, so that the detection would not work well. But it seems that entries can be removed only when they are behind GetOldestNonRemovableTransactionId(NULL), i.e., horizons.shared_oldest_nonremovable. That value is affected by replication slots, so the commit_ts entries that are interesting for us are not removed.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
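As a concrete illustration of the alternative raised in comment 01, a check run by the apply worker itself could look roughly like the fragment below; the detectupdatedeleted field name follows the quoted patch, and the exact error code and placement are assumptions:

/*
 * Sketch only: verify the parameter combination when the apply worker
 * starts (and whenever it re-reads pg_subscription), rather than only at
 * CREATE SUBSCRIPTION time.  MySubscription and track_commit_timestamp are
 * existing backend symbols; detectupdatedeleted comes from the patch
 * under review.
 */
if (MySubscription->detectupdatedeleted && !track_commit_timestamp)
	ereport(ERROR,
			errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			errmsg("detecting update_deleted conflicts requires \"%s\" to be enabled",
				   "track_commit_timestamp"));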
Re: protocol-level wait-for-LSN
>>> With this protocol extension, two things are changed:
>>>
>>> - The ReadyForQuery message sends back the current LSN.
>>
>> If other protocol extension X tries to add something to the
>> ReadyForQuery message too, what would happen?
>
> I think one would have to define that somehow. If it's useful, the
> additional fields of both extensions could be appended, in some
> defined order. But this is an interesting question to think about.

I think this kind of extension, which adds a new field to an existing
message type, should be implemented as a v4 protocol.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: AIO writes vs hint bits vs checksums
On 30/10/2024 04:21, Andres Freund wrote:

Attached is a, unfortunately long, series of patches implementing what I
described upthread.

Review of the preparatory patches:

0001 Add very basic test for kill_prior_tuples

We currently don't exercise this patch for gist and hash, which seems
somewhat criminal.

Interesting to use the isolationtester for this. There's just one session,
so you're just using it to define reusable steps with handy names. I'm fine
with that, but please add a comment to explain it. I wonder if it'd be more
straightforward to make it a regular pg_regress test though. There would be
some repetition, but would it be so bad?

You forgot to add the new test to 'isolation_schedule'.

typos:
"inex" -> "index"
"does something approximately reasonble" -> "do something approximately reasonable"

0002 heapam: Move logic to handle HEAP_MOVED into a helper function

A prep patch, which consolidates HEAP_MOVED handling into a helper. This
isn't strictly necessary, but I got bothered by it while writing this patch
series.

+1

+/*
+ * If HEAP_MOVED_OFF or HEAP_MOVED_IN are set on the tuple, remove them and
+ * adjust hint bits. See the comment for SetHintBits() for more background.
+ *
+ * This helper returns false if the row ought to be invisible, true otherwise.
+ */
+static inline bool
+HeapTupleCleanMoved(HeapTupleHeader tuple, Buffer buffer)
+{
+	TransactionId xvac;
+
+	/* only used by pre-9.0 binary upgrades */
+	if (likely(!(tuple->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN
+		return true;
+
+	...
+}

This is so unlikely these days that this function probably should not be
inlined anymore.

0003 bufmgr: Add BufferLockHeldByMe()

Useful for writing assertions in later patches.

+	if (mode == BUFFER_LOCK_EXCLUSIVE)
+		lwmode = LW_EXCLUSIVE;
+	else if (mode == BUFFER_LOCK_SHARE)
+		lwmode = LW_SHARED;
+	else
+	{
+		Assert(false);
+		pg_unreachable();
+		lwmode = LW_EXCLUSIVE;	/* assuage compiler */
+	}

I just learned a new word :-). This is fine, but to avoid the assuaging, you
could also do this:

if (mode == BUFFER_LOCK_EXCLUSIVE)
	lwmode = LW_EXCLUSIVE;
else
{
	Assert(mode == BUFFER_LOCK_SHARE);
	lwmode = LW_SHARED;
}

0004 heapam: Use exclusive lock on old page in CLUSTER

When I started looking into this I wanted to make sure that it's actually
safe to skip setting hint bits in all places. One place that's *not* safe is
cluster - we assume that all hint bits are set in rewrite_heap_tuple().

This might not strictly be required, but it seemed very fragile as is.

While using an exclusive lock does seem like it makes it a bit safer, it's
not a very good fix, as we still dirty the old buffer, which seems like a
shame when one uses VACUUM FULL.

+	/*
+	 * To be able to guarantee that we can set the hint bit, acquire an
+	 * exclusive lock on the old buffer. We need the hint bits to be set
+	 * as otherwise reform_and_rewrite_tuple() -> rewrite_heap_tuple() ->
+	 * heap_freeze_tuple() will get confused.
+	 *

Hmm, I don't see any comments in the reform_and_rewrite_tuple() ->
rewrite_heap_tuple() -> heap_freeze_tuple() path about requiring hint bits
to be set. How does it get confused?

0005 heapam: Only set tuple's block once per page in pagemode

Minor, but measurable, optimization.

+1, this makes sense independently of all the other patches.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 1748eafa100..d0cb4b1e29b 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -981,6 +981,9 @@ heapgettup_pagemode(HeapScanDesc scan, linesleft = scan->rs_ntuples; lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1; + /* set block once per page, instead of doing so for every tuple */ + BlockIdSet(&tuple->t_self.ip_blkid, scan->rs_cblock); + /* lineindex now references the next or previous visible tid */ continue_page: @@ -995,7 +998,7 @@ continue_page: tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp); tuple->t_len = ItemIdGetLength(lpp); - ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff); + tuple->t_self.ip_posid = lineoff; /* skip any tuples that don't match the scan key */ if (key != NULL && Should probably use ItemPointerSetBlockNumber and ItemPointerSetOffsetNumber. 0006 heapam: Add batch mode mvcc check and use it in page mode This is a good bit faster on its own, but is required to avoid a performance regression later, when setting hint bits only only when no IO going on
Re: Conflict detection for update_deleted in logical replication
Hello Hayato!

> Note that apply workers can stop due to some reasons (e.g., disabling subscriptions,
> error out, deadlock...). In this case, the snapshot cannot eb registered by the
> worker and index can be re-built during the period.

However, the xmin of a slot affects replication_slot_xmin in ProcArrayStruct, so it might be straightforward to wait for it during concurrent index builds. We could consider adding a separate conflict_resolution_replication_slot_xmin to wait only for that.

> Anyway, this topic introduces huge complexity and is not mandatory for update_deleted
> detection. We can work on it in later versions based on the needs.

From my perspective, this is critical for databases. REINDEX CONCURRENTLY is typically run on production databases on a regular basis, so any master-master system should be unaffected by it.

Best regards,
Mikhail.
Re: Avoid detoast overhead when possible
Hi!

Sorry for the long delay; other tasks required my attention and this project was postponed, but we have some recent requests from clients and this work has to be revived.

1) In short, the idea behind the Json improvements is to introduce iterators for Json objects to extract these objects value by value, along with modifications of the in-memory and on-disk representations: in memory, to hold partially-detoasted Json objects (tree nodes with a need-detoast state), and on disk, to store a key-value map so that only the required values are extracted.

2) Backward compatibility would not be a problem, because in the on-disk representation we can distinguish the old version from the new one; there are enough service bits for that. But servers not updated with this patch could not parse the new data, of course.

3) Yes, it definitely has to. These changes are very complex and invasive; that's the reason I haven't finished the patch.

4) It doesn't seem worthwhile at first glance. Do you have any ideas on this?

Thank you for your interest!

On Wed, Oct 30, 2024 at 2:59 AM Andy Fan wrote:
> zhihuifan1...@163.com writes:
> > Hello Nikita,
>
> >> There's a view from the other angle - detoast just attributes that are
> >> needed (partial detoast), with optimized storage mechanics for JSONb.
> >
> > Very glad to know that, looking forward your design & patch!
>
> This is interesting, do you mind to provide a high level design of
> this. I'm not pushing to do this, but you asked me how I think about
> your proposal[1], so I think a high level design is a must for these
> questions. A first set of questions includes:
>
> 1. How to optimize the JSONb storage?
> 2. How to be compatible with older version?
> 3. How to integrate with existing ExprExecutionEngine? Looks like your
> proposal needs to know which "part" of jsonb fields are needed.
> 4. Is there any optimization if user want to get a full JSONB, like
> SELECT jb_col FROM t?
>
> [1] https://www.postgresql.org/message-id/CAN-LCVO3GZAKVTKNwwcezoc%3D9Lq%3DkU2via-BM3MXVdOq4tD9RQ%40mail.gmail.com
>
> --
> Best Regards
> Andy Fan

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Re: Reduce one comparison in binaryheap's sift down
On Tue, Oct 29, 2024 at 11:00 AM Nathan Bossart wrote:
> On Tue, Oct 29, 2024 at 11:21:15AM +0800, cca5507 wrote:
> > Agree, new version patch is attached.
>
> I might editorialize a bit, but overall this looks pretty good to me.
> Robert, would you like to commit this, or shall I?

I would be delighted if you could take care of it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Alias of VALUES RTE in explain plan
On Tue, Oct 29, 2024 at 10:49 PM Tom Lane wrote:
>
> Andrei Lepikhov writes:
> > -- New behavior
> > EXPLAIN (COSTS OFF, VERBOSE)
> > SELECT * FROM (VALUES (4),(2),(3),(1) ORDER BY t1.x LIMIT 2) AS t1(x);
> > SELECT * FROM (VALUES (4),(2),(3),(1) ORDER BY t1.x LIMIT 2) AS t1(x);
>
> After taking a closer look at that, yeah it's new behavior, and
> I'm not sure we want to change it. (The existing behavior is that
> you'd have to write 'column1' or '"*VALUES*".column1' in the
> subquery's ORDER BY.)
>
> This example also violates my argument that the user thinks they
> are attaching the alias directly to VALUES.
>
> So what I now think is that we ought to tweak the patch so that the
> parent alias is pushed down only when the subquery contains just VALUES,
> no other clauses. Per a look at the grammar, ORDER BY, LIMIT, and FOR
> UPDATE could conceivably appear alongside VALUES; although FOR UPDATE
> would draw "FOR UPDATE cannot be applied to VALUES", so maybe we needn't
> worry about it.
>
> Thoughts?

If the user writes it in this manner, I think they intend to attach the alias
to VALUES() since there's no other way to do it. What is weird is that they
can use the alias before it's declared. For the sake of eliminating this
weirdness, your proposed tweak sounds fine to me.

Even if we don't add that tweak, it's not easy for users to find out that
they can write the query this way. But it's better to plug the hole before
somebody starts exploiting it.

--
Best Wishes,
Ashutosh Bapat
Re: Eager aggregation, take 3
On Sat, Oct 5, 2024 at 11:30 PM Richard Guo wrote:
> Here’s a comparison of Execution Time and Planning Time for the seven
> queries with eager aggregation disabled versus enabled (best of 3).
>
> Execution Time:
>
>          EAGER-AGG-OFF     EAGER-AGG-ON
>
> q4       105787.963 ms     34807.938 ms
> q8         1407.454 ms      1654.923 ms
> q11       67899.213 ms     18670.086 ms
> q23       45945.849 ms     42990.652 ms
> q31       10463.536 ms     10244.175 ms
> q33        2186.928 ms      2217.228 ms
> q77        2360.565 ms      2416.674 ms

Could you attach the EXPLAIN ANALYZE output for these queries, with and
without the patch?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: general purpose array_sort
Hi,

Thanks for the updated patch set.

> > > +Datum
> > > +array_sort_order(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> > > +
> > > +Datum
> > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> >
> > Any reason not to specify array_sort in pg_proc.dat?
>
> It is specified in 0001 (see oid => '8810').

What I meant was that I don't think these wrapper functions are needed. I
think you can just do:

```
+{ oid => '8811', descr => 'sort array',
+  proname => 'array_sort', prorettype => 'anyarray',
+  proargtypes => 'anyarray bool', prosrc => 'array_sort'},   <-- array_sort is used directly in `prosrc`
```

... unless I'm missing something.

--
Best regards,
Aleksander Alekseev
Re: detoast datum into the given buffer as a optimization.
Hi!

Andy, I'm truly sorry, I removed the quoted messages just to avoid turning replies into walls of text.

On the detoast_attr_buffer topic - again, I agree with the reply above that proposes adding the required data length as a parameter; at least by doing so we explicitly make the user of this API pay attention to the length of the buffer and the data size, as is done in text_to_cstring_buffer, which you mentioned.

About printtup: I agree that direct detoasting into the buffer could give us some performance improvement, and it seems not too difficult to check, but as you can see there is a function call via fmgr, so it is better to try to find out why such a decision was made.

On Wed, Oct 30, 2024 at 2:47 AM Andy Fan wrote:
> Nikita Malakhov writes:
>
> > Hi!
> >
> > Sorry for misguiding you, I've overlooked va_rawsize with va_extinfo.
> > You're right, va_rawsize holds uncompressed size, and extinfo actual
> > storage size. This was not intentional.
>
> That's OK, so we are in the same page here.
>
> > I'd better not count on caller's do know detoasted data length,
> > and much more the buffer is correctly initialized, because we cannot
> > check that inside and must rely on the caller, which would for sure
> > lead to unexpected segfaults, I agree with Tom Lane's proposal above.
> > Other options seem to be more crude and error-prone here. This is
> > an internal data fetching function and it should not generate new kinds
> > of errors, I think.
>
> Tom's reply depends on the fact I was going to changing the "detoast_attr"
> to "detoast_attr_buffer", as I have expalined in my previous post. I
> don't think a [new] user provided buffer function is so harmful. What
> do you think about function "text_to_string_buffer"? This is also a
> part in my previous reply, but is ignored by your reply?
>
> > In case you're playing with this part of the code - I was always
> > confused with detoast_attr and detoast_external_attr functions
> > both marked as entry points of retrieving toasted data and both
> > look a lot the same. Have you ever thought of making a single
> > entry point by slightly redesigning this part?
>
> You can check [1] for a indepent improvements for this topic.
>
> [1] https://www.postgresql.org/message-id/874j4vcspl.fsf%40163.com
>
> --
> Best Regards
> Andy Fan

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
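To make the API shape being discussed a bit more concrete, a rough sketch follows; the function name and signature are illustrative assumptions, not the posted patch, while toast_raw_datum_size() is the existing helper used here to size the buffer:

/*
 * Hypothetical signature following the text_to_cstring_buffer() pattern:
 * the caller passes both the destination buffer and its size, so the
 * callee can report a too-small buffer instead of silently overrunning it.
 * This is not an existing PostgreSQL function.
 */
extern void detoast_attr_buffer(struct varlena *attr, char *dst, Size dst_len);

/* Possible caller pattern: size the buffer from the raw datum size first. */
Size		rawsize = toast_raw_datum_size(PointerGetDatum(attr));
char	   *buf = palloc(rawsize);

detoast_attr_buffer(attr, buf, rawsize);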
Re: AIO writes vs hint bits vs checksums
Hi,

Thanks for the quick review!

On 2024-10-30 14:16:51 +0200, Heikki Linnakangas wrote:
> On 24/09/2024 18:55, Andres Freund wrote:
> > What I'd instead like to propose is to implement the right to set hint bits as
> > a bit in each buffer's state, similar to BM_IO_IN_PROGRESS. Tentatively I
> > named this BM_SETTING_HINTS. It's only allowed to set BM_SETTING_HINTS when
> > BM_IO_IN_PROGRESS isn't already set and StartBufferIO has to wait for
> > BM_SETTING_HINTS to be unset to start IO.
> >
> > Naively implementing this, by acquiring and releasing the permission to set
> > hint bits in SetHintBits() unfortunately leads to a significant performance
> > regression. While the performance is unaffected for OLTPish workloads like
> > pgbench (both read and write), sequential scans of unhinted tables regress
> > significantly, due to the per-tuple lock acquisition this would imply.
>
> It'd be nice to implement this without having to set and clear
> BM_SETTING_HINTS every time.

True - but it's worth noting that we only need to set BM_SETTING_HINTS if
there are any hint bits that need to be set. So in the most common paths we
won't need to deal with it at all.

> Could we put the overhead on the FlushBuffer()
> instead?
>
> How about something like this:
>
> To set hint bits:
>
> 1. Lock page in SHARED mode.
> 2. Read BM_IO_IN_PROGRESS
> 3. If !BM_IO_IN_PROGRESS, set hint bits, otherwise don't
> 4. Unlock page
>
> To flush a buffer:
>
> 1. Lock page in SHARED mode
> 2. Set BM_IO_IN_PROGRESS
> 3. Read the lock count on the buffer lock, to see if we're the only locker.
> 4. If anyone else is holding the lock, upgrade it to exclusive mode, and
> immediately downgrade back to share mode.
> 5. calculate CRC, flush the buffer
> 6. Clear BM_IO_IN_PROGRESS and unlock page.
>
> This goes back to the idea of adding LWLock support for this, but the amount
> of changes could be pretty small. The missing operation we don't have today
> is reading the share-lock count on the lock in step 3. That seems simple to
> add.

I've played around with a bunch of ideas like this. There are two main
reasons I didn't like them that much in the end:

1) The worst case latency impacts seemed to make them not that interesting. A
buffer that is heavily contended with share locks might not get down to zero
share lockers for quite a while. That's not a problem for individual
FlushBuffer() calls, but it could very well add up to a decent sized delay
for something like a checkpoint that has to flush a lot of buffers.

Waiting for all pre-existing share lockers is easier said than done. We don't
record the acquisition order anywhere and a share-lock release won't wake
anybody if the lockcount doesn't reach zero. Changing that wouldn't exactly
be free and the cost would be borne by all lwlock users.

2) They are based on holding an lwlock. But it's actually quite conceivable
that we'd want to set something hint-bit-like without any lock, as we e.g.
currently do for freespace/. That would work with something like the approach
I chose here, but not if we rely on lwlocks.

E.g. with a bit of work we could actually do sequential scans without
blocking concurrent buffer modifications by carefully ordering when pd_lower
is modified and making sure that tuple header fields are written in a
sensible order.

> Acquiring the exclusive lock in step 4 is just a way to wait for all the
> existing share-lockers to release the lock. You wouldn't need to block new
> share-lockers.
> We have LW_WAIT_UNTIL_FREE, which is almost what we need, but
> it currently ignores share-lockers. So doing this "properly" would require
> more changes to LWLocks. Briefly acquiring an exclusive lock seems
> acceptable though.

I don't think LW_WAIT_UNTIL_FREE is that easily extended to cover something
like this, because share lockers can acquire/release the lock without waking
anybody. And changing that would be rather expensive for everyone...

> > But: We can address this and improve performance over the status quo! Today we
> > determine tuple visibility determination one-by-one, even when checking the
> > visibility of an entire page worth of tuples.
>
> Yes, batching the visibility checks makes sense in any case.

Cool. The wins by doing that are noticeable...

Greetings,

Andres Freund
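To make the buffer-state handshake Andres describes a bit more concrete, here is a very rough sketch of the acquisition side, assuming a BM_SETTING_HINTS flag in the buffer state as proposed in this thread; the helper name is invented for illustration, and the real patches batch this per page rather than per tuple:

/*
 * Illustrative only: BM_SETTING_HINTS is the flag proposed in this thread
 * (it does not exist in bufmgr today), and TryAcquireSettingHints() is an
 * invented helper name.  The idea is simply that hint bits may only be set
 * while BM_SETTING_HINTS is held, and it cannot be taken while
 * BM_IO_IN_PROGRESS is set.
 */
static bool
TryAcquireSettingHints(BufferDesc *buf)
{
	uint32		buf_state = pg_atomic_read_u32(&buf->state);

	for (;;)
	{
		/* IO already running, or someone else is setting hints */
		if (buf_state & (BM_IO_IN_PROGRESS | BM_SETTING_HINTS))
			return false;

		/* on failure, buf_state is refreshed and we retry */
		if (pg_atomic_compare_exchange_u32(&buf->state, &buf_state,
										   buf_state | BM_SETTING_HINTS))
			return true;
	}
}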
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 12:53, Heikki Linnakangas wrote:
> We might have made a mistake by calling this mechanism "protocol
> extensions", because it makes people think of user-defined extensions.

I think this is a real problem, that's probably worth fixing. I created a
separate thread to address this[1]

> So yes, each protocol extension needs to know about all the other
> protocol extensions that it can be used with. In practice we'll avoid
> doing crazy stuff so that the protocol extensions are orthogonal

Just as an example, let's say we add a server-side query time to the protocol
(which honestly seems like a pretty useful feature), so that ReadyForQuery
now returns the query time if the query_time protocol extension is enabled.
For clients it isn't difficult at all to support any combination of
query_time & wait_for_lsn options. As long as we define that the wait_for_lsn
field is before the query_time field if both exist, then two simple if
statements like this would do the trick:

if (wait_for_lsn_enabled) {
    // interpret next field as LSN
}
if (query_time_enabled) {
    // interpret next field as query time
}

[1]: https://www.postgresql.org/message-id/CAGECzQQoc%2BV94TrF-5cMikCMaf-uUnU52euwSCtQBeDYqXnXyA%40mail.gmail.com
Re: general purpose array_sort
Hi,

On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev wrote:
>
> Hi,
>
> Thanks for the updated patch set.
>
> > > > +Datum
> > > > +array_sort_order(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > > > +
> > > > +Datum
> > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > >
> > > Any reason not to specify array_sort in pg_proc.dat?
> >
> > It is specified in 0001 (see oid => '8810').
>
> What I meant was that I don't think these wrapper functions are
> needed. I think you can just do:
>
> ```
> +{ oid => '8811', descr => 'sort array',
> +  proname => 'array_sort', prorettype => 'anyarray',
> +  proargtypes => 'anyarray bool', prosrc => 'array_sort'},   <--
> array_sort is used directly in `prosrc`
> ```
>
> ... unless I'm missing something.

There is an opr_sanity check for this[1]; if we remove these wrapper
functions, the regression test will fail with:

- oid | proname | oid | proname
--+-+-+-
-(0 rows)
+ oid | proname | oid | proname
+--++--+
+ 8811 | array_sort | 8812 | array_sort
+ 8810 | array_sort | 8811 | array_sort
+ 8810 | array_sort | 8812 | array_sort
+(3 rows)

[1]:
-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields). It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of. (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid < p2.oid AND
    p1.prosrc = p2.prosrc AND
    p1.prolang = 12 AND p2.prolang = 12 AND
    (p1.prokind != 'a' OR p2.prokind != 'a') AND
    (p1.prolang != p2.prolang OR
     p1.prokind != p2.prokind OR
     p1.prosecdef != p2.prosecdef OR
     p1.proleakproof != p2.proleakproof OR
     p1.proisstrict != p2.proisstrict OR
     p1.proretset != p2.proretset OR
     p1.provolatile != p2.provolatile OR
     p1.pronargs != p2.pronargs);

>
> --
> Best regards,
> Aleksander Alekseev

--
Regards
Junwang Zhao
Re: Consider pipeline implicit transaction as a transaction block
On Wed, 30 Oct 2024 at 10:15, Anthonin Bonnefoy wrote:
> The attached patch adds the detection of implicit transactions started
> by a pipeline in CheckTransactionBlock, avoiding warnings when
> commands like `set local` are called within a pipeline, and making the
> detection of transaction block coherent with what's done in
> IsInTransactionBlock and PreventInTransactionBlock.

+1, seems like a reasonable change.

> The XACT_FLAGS_PIPELINING will only be set after the first command, so
> the warning about `set local` happening outside of a transaction block
> will still be generated. However, I'm not sure if it's something
> fixable (or worth fixing?). This would require to know beforehand that
> there are multiple executes before the sync message, which doesn't
> seem doable.

Yeah, I don't really see a way around that, apart from not throwing this
warning at all when the client is using the extended protocol. Postgres would
need to be clairvoyant to really know if it should show it for the first
message.
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, 2024-03-08 at 00:20 -0800, Jeff Davis wrote: > Implemented in v11, attached. Rebased, v12 attached. Regards, Jeff Davis From 5c2a8f5cb865becd70b08379d9fc72946be9a32a Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Tue, 2 Jan 2024 13:42:48 -0800 Subject: [PATCH v12] CREATE SUSBCRIPTION ... SERVER. Allow specifying a foreign server for CREATE SUBSCRIPTION, rather than a raw connection string with CONNECTION. Using a foreign server as a layer of indirection improves management of multiple subscriptions to the same server. It also provides integration with user mappings in case different subscriptions have different owners or a subscription changes owners. Discussion: https://postgr.es/m/61831790a0a937038f78ce09f8dd4cef7de7456a.ca...@j-davis.com Reviewed-by: Ashutosh Bapat --- contrib/postgres_fdw/Makefile | 2 + contrib/postgres_fdw/connection.c | 73 .../postgres_fdw/expected/postgres_fdw.out| 8 + contrib/postgres_fdw/meson.build | 5 + .../postgres_fdw/postgres_fdw--1.1--1.2.sql | 8 + contrib/postgres_fdw/sql/postgres_fdw.sql | 7 + contrib/postgres_fdw/t/010_subscription.pl| 71 doc/src/sgml/ref/alter_subscription.sgml | 18 +- doc/src/sgml/ref/create_subscription.sgml | 11 +- src/backend/catalog/pg_subscription.c | 38 +++- src/backend/commands/foreigncmds.c| 58 +- src/backend/commands/subscriptioncmds.c | 168 -- src/backend/foreign/foreign.c | 66 +++ src/backend/parser/gram.y | 22 +++ src/backend/replication/logical/worker.c | 16 +- src/bin/pg_dump/pg_dump.c | 36 +++- src/bin/pg_dump/pg_dump.h | 1 + src/bin/psql/tab-complete.in.c| 2 +- src/include/catalog/pg_foreign_data_wrapper.h | 3 + src/include/catalog/pg_subscription.h | 7 +- src/include/foreign/foreign.h | 3 + src/include/nodes/parsenodes.h| 3 + src/test/regress/expected/oidjoins.out| 1 + 23 files changed, 591 insertions(+), 36 deletions(-) create mode 100644 contrib/postgres_fdw/t/010_subscription.pl diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index 88fdce40d6..a101418d6e 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -18,6 +18,8 @@ DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql postgres_fdw--1.1--1.2.s REGRESS = postgres_fdw query_cancel +TAP_TESTS = 1 + ifdef USE_PGXS PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 2326f391d3..48c77a8de3 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -123,6 +123,7 @@ PG_FUNCTION_INFO_V1(postgres_fdw_get_connections); PG_FUNCTION_INFO_V1(postgres_fdw_get_connections_1_2); PG_FUNCTION_INFO_V1(postgres_fdw_disconnect); PG_FUNCTION_INFO_V1(postgres_fdw_disconnect_all); +PG_FUNCTION_INFO_V1(postgres_fdw_connection); /* prototypes of private functions */ static void make_new_connection(ConnCacheEntry *entry, UserMapping *user); @@ -2161,6 +2162,78 @@ postgres_fdw_get_connections_internal(FunctionCallInfo fcinfo, } } +/* + * Values in connection strings must be enclosed in single quotes. Single + * quotes and backslashes must be escaped with backslash. NB: these rules are + * different from the rules for escaping a SQL literal. 
+ */ +static void +appendEscapedValue(StringInfo str, const char *val) +{ + appendStringInfoChar(str, '\''); + for (int i = 0; val[i] != '\0'; i++) + { + if (val[i] == '\\' || val[i] == '\'') + appendStringInfoChar(str, '\\'); + appendStringInfoChar(str, val[i]); + } + appendStringInfoChar(str, '\''); +} + +Datum +postgres_fdw_connection(PG_FUNCTION_ARGS) +{ + Oid userid = PG_GETARG_OID(0); + Oid serverid = PG_GETARG_OID(1); + ForeignServer *server = GetForeignServer(serverid); + UserMapping *user = GetUserMapping(userid, serverid); + StringInfoData str; + const char **keywords; + const char **values; + int n; + + /* + * Construct connection params from generic options of ForeignServer and + * UserMapping. (Some of them might not be libpq options, in which case + * we'll just waste a few array slots.) Add 4 extra slots for + * application_name, fallback_application_name, client_encoding, end + * marker. + */ + n = list_length(server->options) + list_length(user->options) + 4; + keywords = (const char **) palloc(n * sizeof(char *)); + values = (const char **) palloc(n * sizeof(char *)); + + n = 0; + n += ExtractConnectionOptions(server->options, + keywords + n, values + n); + n += ExtractConnectionOptions(user->options, + keywords + n, values + n); + + /* Set client_encoding so that libpq can convert encoding properly. */ + keywords[n] = "client_encoding"; + values[n] = GetDatabase
Re: AIO writes vs hint bits vs checksums
On 24/09/2024 18:55, Andres Freund wrote:
> What I'd instead like to propose is to implement the right to set hint bits as
> a bit in each buffer's state, similar to BM_IO_IN_PROGRESS. Tentatively I
> named this BM_SETTING_HINTS. It's only allowed to set BM_SETTING_HINTS when
> BM_IO_IN_PROGRESS isn't already set and StartBufferIO has to wait for
> BM_SETTING_HINTS to be unset to start IO.
>
> Naively implementing this, by acquiring and releasing the permission to set
> hint bits in SetHintBits() unfortunately leads to a significant performance
> regression. While the performance is unaffected for OLTPish workloads like
> pgbench (both read and write), sequential scans of unhinted tables regress
> significantly, due to the per-tuple lock acquisition this would imply.

It'd be nice to implement this without having to set and clear
BM_SETTING_HINTS every time. Could we put the overhead on the FlushBuffer()
instead?

How about something like this:

To set hint bits:

1. Lock page in SHARED mode.
2. Read BM_IO_IN_PROGRESS
3. If !BM_IO_IN_PROGRESS, set hint bits, otherwise don't
4. Unlock page

To flush a buffer:

1. Lock page in SHARED mode
2. Set BM_IO_IN_PROGRESS
3. Read the lock count on the buffer lock, to see if we're the only locker.
4. If anyone else is holding the lock, upgrade it to exclusive mode, and
immediately downgrade back to share mode.
5. calculate CRC, flush the buffer
6. Clear BM_IO_IN_PROGRESS and unlock page.

This goes back to the idea of adding LWLock support for this, but the amount
of changes could be pretty small. The missing operation we don't have today
is reading the share-lock count on the lock in step 3. That seems simple to
add.

Acquiring the exclusive lock in step 4 is just a way to wait for all the
existing share-lockers to release the lock. You wouldn't need to block new
share-lockers. We have LW_WAIT_UNTIL_FREE, which is almost what we need, but
it currently ignores share-lockers. So doing this "properly" would require
more changes to LWLocks. Briefly acquiring an exclusive lock seems acceptable
though.

> But: We can address this and improve performance over the status quo! Today we
> determine tuple visibility determination one-by-one, even when checking the
> visibility of an entire page worth of tuples.

Yes, batching the visibility checks makes sense in any case.

--
Heikki Linnakangas
Neon (https://neon.tech)
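A rough sketch of how the flush side of the scheme above could look, assuming the missing share-locker-count primitive; LWLockShareLockerCount() is an invented name (step 3 explicitly calls it out as the missing piece), the BM_IO_IN_PROGRESS handling is reduced to comments, and the "upgrade" in step 4 is emulated by a release/re-acquire since LWLocks cannot be upgraded in place:

/*
 * Illustration only, not real bufmgr code: wait out pre-existing share
 * lockers before checksumming and writing the page.
 */
static void
FlushBufferSketch(BufferDesc *buf)
{
	LWLock	   *lock = BufferDescriptorGetContentLock(buf);

	LWLockAcquire(lock, LW_SHARED);				/* step 1 */
	/* step 2: set BM_IO_IN_PROGRESS (StartBufferIO) */

	if (LWLockShareLockerCount(lock) > 1)		/* step 3: are we alone? */
	{
		/* step 4: wait for the pre-existing share lockers to drain */
		LWLockRelease(lock);
		LWLockAcquire(lock, LW_EXCLUSIVE);
		LWLockRelease(lock);
		LWLockAcquire(lock, LW_SHARED);
	}

	/* step 5: compute the checksum and write the page */
	/* step 6: clear BM_IO_IN_PROGRESS (TerminateBufferIO) */
	LWLockRelease(lock);
}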
Re: [PATCH] Add array_reverse() function
On Tuesday, October 22, 2024 16:27 MSK, Aleksander Alekseev wrote:

Hi everyone,

Thanks for the feedback!

> > But it returns the input array as is. I think it should at least make
> > a new copy of input array.
>
> I don't think that's really necessary. We have other functions that
> will return an input value unchanged without copying it. A
> longstanding example is array_larger. Also, this code looks to be
> copied from array_shuffle.

It was indeed copied from array_shuffle(). Makes me wonder if we should have
an utility function which would contain common code for array_shuffle() and
array_reverse(). Considering inconveniences such an approach caused in case
of common code for text and bytea types [1] I choose to copy the code and
modify it for the task.

> +
> +    /* If the target array is empty, exit fast */
> +    if (ndim < 1 || dims[0] < 1)
> +        return construct_empty_array(elmtyp);
>
> This is taken care by the caller, at least the ndim < 1 case.

Agree. Here is the corrected patch.

[1]: https://postgr.es/m/caj7c6to3x88dgd8c4tb-eq2zdpz+9mp+kowdzk_82bez_cm...@mail.gmail.com

--
Best regards,
Aleksander Alekseev

Hi!

Content. The proposed function has long been awaited and will be very useful. Personally, I used a slow workaround whenever I needed the reverse of an array.

Run. This patch applies cleanly to HEAD. All regression tests pass successfully against the patch.

Format. The code is formatted according to the code guidelines documentation. The documentation is updated; a description of the function is added.

From my point of view, it would be better to mention that the function returns the updated array (it does not update it in place, as a reader might assume), but the other array-returning functions in the documentation have the same style (silently assuming: reverse == return reversed array).

Conclusion. +1 for committer review

Best regards,
Vladlen Popolitov
postgrespro.com

--
Best regards,
Vladlen Popolitov.
Re: Reduce one comparison in binaryheap's sift down
On Wed, Oct 30, 2024 at 08:09:17AM -0400, Robert Haas wrote:
> I would be delighted if you could take care of it.

Committed.

--
nathan
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Hi,

Thanks for looking into this.

On Mon, Oct 28, 2024 at 8:18 PM Jingtang Zhang wrote:
>
> Just found that the initialization
> seems redundant since we have used palloc0?
>
> > +istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
> > +istate->bistate = NULL;
> > +istate->mistate = NULL;

Changed it to palloc() and explicit initializations of the members. With
this, only TupleTableSlot's array in HeapMultiInsertState uses palloc0(); the
rest all use explicit initializations.

> Oh, another comment for the v24-0001 patch: we are in heap AM now, should we use
> something like HEAP_INSERT_BAS_BULKWRITE instead of using table AM option,
> just like other heap AM options do?
>
> > + if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> > +     istate->bistate = GetBulkInsertState();

Defined HEAP_INSERT_BAS_BULKWRITE and used that in heapam.c similar to
INSERT_SKIP_FSM, INSERT_FROZEN, NO_LOGICAL.

> Little question about the v24 0002 patch: would it be better to move the
> implementation of TableModifyIsMultiInsertsSupported to somewhere at the
> table AM level? Seems it is a common function for future use, not a
> specific one for matview.

It's more tailored for CREATE TABLE AS and CREATE/REFRESH MATERIALIZED VIEW
in the sense that no triggers, foreign tables or partitioned tables are
possible here. INSERT INTO SELECT and Logical Replication Apply will have a
lot more conditions (e.g. RETURNING clause, triggers etc.) and they will need
to be handled differently. So, I left TableModifyIsMultiInsertsSupported as-is
in a common place in matview.c.

Please find the attached v25 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v25-0001-Introduce-new-table-AM-for-multi-inserts.patch
Description: Binary data

v25-0002-Optimize-CTAS-CMV-RMV-with-new-multi-inserts-tab.patch
Description: Binary data

v25-0003-Use-new-multi-inserts-table-AM-for-COPY-.-FROM.patch
Description: Binary data
Re: Eager aggregation, take 3
On Tue, Oct 29, 2024 at 3:57 AM Richard Guo wrote: > On Fri, Oct 18, 2024 at 10:22 PM jian he wrote: > > So overall I doubt here BTEQUALIMAGE_PROC flag usage is correct. > > The BTEQUALIMAGE_PROC flag is used to prevent eager aggregation for > types whose equality operators do not imply bitwise equality, such as > NUMERIC. > > After a second thought, I think it should be OK to just check the > equality operator specified by the SortGroupClause for btree equality. > I’m not very sure about this point, though, and would appreciate any > inputs. Well, the key thing here is the reasoning, which you don't really spell out either here or in the patch. The patch's justification for the use of BTEQUALIMAGE_PROC Is that, if we use non-equal-image operators, "we may lose some information that could be needed to evaluate upper qual clauses." But there's no explanation of why that would happen, and I think that makes it difficult to have a good discussion. Here's an example from the optimizer README: # 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc) # = A leftjoin (B leftjoin C on (Pbc)) on (Pab) # # Identity 3 only holds if predicate Pbc must fail for all-null B rows # (that is, Pbc is strict for at least one column of B). If Pbc is not # strict, the first form might produce some rows with nonnull C columns # where the second form would make those entries null. This isn't the clearest justification for a rule that I've ever read, but it's something. Someone reading this can think about whether the two sentences of justification are a correct argument that Pbc must be strict in order for the identity to hold. I think you should be trying to offer similar (or better) justifications, and perhaps similar identities as well. It's a little bit hard to think about how to write the identities for this patch, although you could start with aggregate(X) = finalizeagg(partialagg(X)) and then explain how the partialagg can be commuted with certain joins and what is required for it to do so. The argument needs to be detailed enough that we can evaluate whether it's correct or not. Personally, I'm still stuck on the point I wrote about yesterday. When you partially aggregate a set of rows, the resulting row serves as a proxy for the original set of rows. So, all of those rows must have the same destiny. As I mentioned then, if you ever partially aggregate a row that should ultimately be filtered out with one that shouldn't, that breaks everything. But also, I need all the rows that feed into each partial group to be part of the same set of output rows. For instance, suppose I run a report like this: SELECT o.order_number, SUM(l.quantity) FROM orders o, order_lines l WHERE o.order_id = l.order_id GROUP BY 1; If I'm thinking of executing this as FinalizeAgg(order JOIN PartialAgg(order_lines)), what must be true in order for that to be a valid execution plan? I certainly can't aggregate on some unrelated column, say part_number. If I do, then first of all I don't know how to get an order_id column out so that I can still do the join. But even if that were no issue, you might have two rows with different values of the order_id column and the same value for part_number, and that the partial groups that you've created on the order_lines table don't line up properly with the groups that you need to create on the orders table. Some particular order_id might need some of the rows that went into a certain part_number group and not others; that's no good. 
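For illustration, FinalizeAgg(orders JOIN PartialAgg(order_lines)) corresponds roughly to this manual rewrite (only a sketch: the planner really combines partial aggregate states rather than nesting SUMs, though for SUM the two happen to coincide):

SELECT o.order_number, SUM(l.partial_quantity)
FROM orders o
JOIN (SELECT order_id, SUM(quantity) AS partial_quantity
      FROM order_lines
      GROUP BY order_id) AS l
  ON o.order_id = l.order_id
GROUP BY 1;

Grouping the inner query by order_id, the column that appears in the join clause, is what guarantees that each partial group matches, or fails to match, a given orders row as a unit.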
After some thought, I think the process of deduction goes something like this. We ultimately need to aggregate on a column in the orders table, but we want to partially aggregate the order_lines table. So we need a set of columns in the order_lines table that can act as a proxy for o.order_number. Our proxy should be all the columns that appear in the join clauses between orders and order_lines. That way, all the rows in a single partially aggregate row will have the "same" order_id according to the operator class implementing the = operator, so for a given row in the "orders" table, either every row in the group has o.order_id = l.order_id or none of them do; hence they're all part of the same set of output rows. It doesn't matter, for example, whether o.order_number is unique. If it isn't, then we'll flatten together some different rows from the orders table -- but each of those rows will match all the rows in partial groupings where o.order_id = partially_grouped_l.order_id and none of the rows where that isn't the case, so the optimization is still valid. Likewise it doesn't matter whether o.order_id is unique: if order_id = 1 occurs twice, then both of the rows will match the partially aggregated group with order_id = 1, but that's fine. The only thing that's a problem is if the same row from orders was going to match some but not all of the partially aggregate rows from some order_lines group, and that won't happen here. Now consider this example: SELECT o.or
Re: Forbid to DROP temp tables of other sessions
On Tue, 29 Oct 2024 at 07:22, Daniil Davydov <3daniss...@gmail.com> wrote: > Hi, > Thanks for your comments, I appreciate them. > > As I continued to deal with the topic of working with temp tables of > other sessions, I noticed something like a bug. For example > (REL_17_STABLE): > Session 1: > =# CREATE TEMP TABLE test(id int); > > Session 2: > =# INSERT INTO pg_temp_0.test VALUES (1); > =# INSERT INTO pg_temp_0.test VALUES (2); > > Second INSERT command will end with an error "cannot access temporary > tables of other sessions". I checked why this is happening and found > errors in several places. > Good catch. I agree with this being an unwarranted behaviour. A minor comment from my end is the wording of the error message. Based on the PostgreSQL error message style guide, something like this could be better: "could not access temporary relations of other sessions". > So, I attach two files to this email : > 1) Isolation test, that shows an error in REL_17_STABLE (iso_1.patch) > 2) Patch that fixes code that mistakenly considered temporary tables > to be permanent (I will be glad to receive feedback on these fixes) + > isolation test, which shows that now any action with temp table of > other session leads to error (temp_tbl_fix.patch) > > Tests look kinda ugly, but I think it's inevitable, given that we > don't know exactly what the name of the temporary schema of other > session will be. > > -- > Best regards, > Daniil Davydov > -- Regards, Rafia Sabih CYBERTEC PostgreSQL International GmbH
Re: Eager aggregation, take 3
On Tue, Oct 29, 2024 at 3:51 AM Richard Guo wrote: > > 2. join type is FULL JOIN, (i am not sure about other Semijoins and > > anti-semijoins types). > > The presence of a FULL JOIN does not preclude the use of eager > aggregation. We still can push a partial aggregation down to a level > that is above the FULL JOIN. I think you can also push a partial aggregation step through a FULL JOIN. Consider this: SELECT p.name, string_agg(c.name, ', ') FROM parents p FULL JOIN children c ON p.id = c.parent_id GROUP BY 1; I don't see why it matters here that this is a FULL JOIN. If it were an inner join, we'd have one group for every parent that has at least one child. Since it's a full join, we'll also get one group for every parent with no children, and also one group for the null parent. But that should work fine with a partially aggregated c. -- Robert Haas EDB: http://www.enterprisedb.com
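The FULL JOIN case above, spelled out as a manual rewrite (again only a sketch, and relying on the fact that concatenating the per-group partial strings with the same separator yields the same unordered result):

SELECT p.name, string_agg(c.partial_names, ', ')
FROM parents p
FULL JOIN (SELECT parent_id, string_agg(name, ', ') AS partial_names
           FROM children
           GROUP BY parent_id) AS c
  ON p.id = c.parent_id
GROUP BY 1;

Parents with no children still come out with a NULL aggregate, and children with no matching parent still collapse into the single NULL-name group, just as in the original query.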
Re: Psql meta-command conninfo+
Hi, I have attached a new patch that incorporates the approach suggested by David. The documentation has also been updated. $ bin/psql "port=5430 sslmode=disable dbname=postgres" -x -h localhost psql (18devel) Type "help" for help. postgres=# \conninfo+ Connection Information -[ RECORD 1 ]+-- Database | postgres Client User | hunaid Host | localhost Host Address | 127.0.0.1 Port | 5430 Options | Current Status -[ RECORD 1 ]+-- Protocol Version | 3 Password Used| false GSSAPI Authenticated | false Backend PID | 26268 Server Parameter Settings -[ RECORD 1 ]-+--- Superuser | true Client Encoding | UTF8 Server Encoding | UTF8 Session Authorization | hunaid Connection Encryption -[ RECORD 1 ]--+-- SSL Connection | false $ bin/psql "port=5430 sslmode=require dbname=postgres" -x -h localhost psql (18devel) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql) Type "help" for help. postgres=# \conninfo+ E Connection Encryption -[ RECORD 1 ]--+--- SSL Connection | true Library| OpenSSL Protocol | TLSv1.3 Key Bits | 256 Cipher | TLS_AES_256_GCM_SHA384 Compression| off ALPN | postgresql I’m unsure if we need to expand the documentation further. I would appreciate your suggestions on this. Regards, Hunaid Sohail On Mon, Oct 7, 2024 at 9:31 PM David G. Johnston wrote: > On Sun, Oct 6, 2024 at 11:17 PM Hunaid Sohail > wrote: > >> >> PQpass - no need >> > > I would include this as presence/absence. > > I concur on all of the rest. > > >> >> For PQparameterStatus, some parameters are already used. >> server_version and application_name were already discussed and removed in >> v12 and v29 respectively. Do we need other parameters? >> > > Ok, I'll need to go read the reasoning for why they are deemed unneeded > and form an opinion one way or the other. > > >> >>> Within that framework having \conninfo[+[CSE][…]] be the command - >>> printing out only the table specified would be the behavior (specifying no >>> suffix letters prints all three) - would be an option. >>> >> >> 3 separate tables without suffix? >> > > Yes, the tables need headers specific to their categories. > > I do like the idea of having 4 though, placing settings into their own. > Premised on having all or most of the available parameters being on the > table. If it only ends up being a few of them then keeping those in > the status table makes sense. > > David J. > >> From 5fc7a9b2a80933641b67c30b5909a0aed810f0cd Mon Sep 17 00:00:00 2001 From: Hunaid Sohail Date: Wed, 30 Oct 2024 10:44:21 +0500 Subject: [PATCH v36] Add psql meta command conninfo+ --- doc/src/sgml/ref/psql-ref.sgml | 23 ++- src/bin/psql/command.c | 328 +++-- src/bin/psql/help.c| 2 +- 3 files changed, 328 insertions(+), 25 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index b825ca96a2..c620bcd94e 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1060,11 +1060,26 @@ INSERT INTO tbls1 VALUES ($1, $2) \parse stmt1 -\conninfo +\conninfo[+[C|S|P|E]] - -Outputs information about the current database connection. - + +Outputs information about the current database connection. +When + is appended, all details about the +connection are displayed in table format. +The modifiers can be: + + C: Displays connection information, including: + Database, Client User, Host, Host Address, Port, and Options. + S: Displays the current connection status, including: + Protocol Version, Password Used, GSSAPI Authenticated, and Backend PID. 
+ P: Displays parameter settings of the server, including: + Superuser, Client Encoding, Server Encoding, and Session Authorization. + E: Displays connection encryption details, including: + SSL Connection, Library, Protocol, Key Bits, Cipher, Compression, and ALPN. + +If no modifier is specified, all available details are displayed in separate tables. +If connection is not using SSL, related encryption details are not displayed. + diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 328d78c73f..8a5b8d1ea9 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -72,7 +72,8 @@ static backslashResult exec_command_cd(PsqlScanState scan_state, bool active_bra const char *cmd); static backslashResult exec_command_close(PsqlScanState scan_state, bool active_branch, const char *cmd); -static backslashResult exec_command_conninfo(PsqlScanState scan_state, bool active_b
Re: Interrupts vs signals
One remaining issue with these patches is backwards-compatibility for extensions: 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, ResetInterrupt instead. 2. If an extension is defining its own Latch with InitLatch, rather than using MyLatch, it's out of luck. You could probably rewrite it using the INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a bit more effort. We can provide backwards compatibility macros and a new facility to allocate custom interrupt bits. But how big of a problem is this anyway? In an in-person chat last week, Andres said something like "this will break every extension". I don't think it's quite that bad, and more complicated extensions need small tweaks in every major release anyway. But let's try to quantify that. Analysis Joel compiled a list of all extensions that are on github: https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47. I did "git checkout" on all of them, 1159 repositories in total (some of the links on the list were broken, or required authentication). 86 out of those 1159 extensions contain the word "Latch": $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I Latch FOO && echo FOO" | wc -l 86 For comparison, in PostgreSQL v10, we added a new 'wait_event_info' argument to WaitLatch and WaitLatchForSocket. That breakage was on similar scale: $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO && echo FOO" | wc -l 71 Admittedly that was a long time ago, we might have different priorities now. Deeper analysis --- Most of those calls are simple calls to SetLatch(MyLatch), WaitLatch(MyLatch, ...) etc. that could be handled by backwards compatibility macros. Many of the calls don't use MyLatch, but use MyProc->procLatch directly: rc = WaitLatch(&MyProc->procLatch, ...) That makes sense, since MyLatch was added in 2015, while the initial latch code was added in 2010. Any extensions that lived during that era would use MyProc->procLatch for backwards-compatibility, and there's been no need to change that since. That could also be supported by backwards-compatibility macros, or we could tell the extension authors to search & replace MyProc->procLatch to MyLatch. Excluding those, there are a handful of extensions that would need to be updated. Below is a quick analysis of the remaining results of "grep Latch" in all the repos: These calls use the process latch to wake up a different process (I excluded some duplicated forks of the same extension): 2ndQuadrant/pglogical/pglogical_sync.c: SetLatch(&apply->proc->procLatch); 2ndQuadrant/pglogical/pglogical_apply.c: SetLatch(&worker->proc->procLatch); 2ndQuadrant/pglogical/pglogical_worker.c: SetLatch(&w->proc->procLatch); 2ndQuadrant/pglogical/pglogical_worker.c: SetLatch(&PGLogicalCtx->supervisor->procLatch); heterodb/pg-strom/src/gpu_cache.c: Latch *backend; heterodb/pg-strom/src/gpu_cache.c: cmd->backend = (is_async ? 
NULL : MyLatch); heterodb/pg-strom/src/gpu_cache.c: SetLatch(cmd->backend); heterodb/pg-strom/src/gpu_cache.c: SetLatch(cmd->backend); citusdata/citus/src/backend/distributed/utils/maintenanced.c: Latch *latch; /* pointer to the background worker's latch */ citusdata/citus/src/backend/distributed/utils/maintenanced.c: SetLatch(dbData->latch); liaorc/devel-master/src/cuda_program.c: SetLatch(&proc->procLatch); okbob/generic-scheduler/dbworker.c: SetLatch(&dbworker->parent->procLatch); postgrespro/lsm3/lsm3.c: SetLatch(&entry->merger->procLatch); postgrespro/lsm3/lsm3.c: SetLatch(&entry->merger->procLatch); postgrespro/pg_pageprep/pg_pageprep.c: SetLatch(&starter->procLatch); postgrespro/pg_pageprep/pg_pageprep.c: SetLatch(&starter->procLatch); postgrespro/pg_query_state/pg_query_state.c:Latch *caller; postgrespro/pg_query_state/pg_query_state.c: SetLatch(counterpart_userid->caller); postgrespro/pg_query_state/pg_query_state.c: counterpart_userid->caller = MyLatch; timescale/timescaledb/src/loader/bgw_message_queue.c: SetLatch(&BackendPidGetProc(queue_get_reader(queue))->procLatch); troels/hbase_fdw/process_communication.c: control->latch = MyLatch; troels/hbase_fdw/process_communication.c: SetLatch(control->latch); The above could easily be fairly replaced with the new SendInterrupt() function. A backwards compatibility macro might be possible for some of these, but would be tricky for others. So I think these extensions will need to adapt by switching to SendInterrupt(). At quick glance, it would easy for all of these to do that with a small "#if PG_VERSION_NUM" block. postgrespro/pg_wait_sampling/pg_wait_sampling.c: * null wait events. So instead we make use of DisownLatch() resetting postgrespro/pg_wait_sampling/pg_wait_sampling.c:if (proc->pid == 0 || proc->procLatch.owner_pid == 0 || proc->pid
Re: AIO writes vs hint bits vs checksums
Hi, On 2024-10-30 13:29:01 +0200, Heikki Linnakangas wrote: > On 30/10/2024 04:21, Andres Freund wrote: > > Attached is a, unfortunately long, series of patches implementing what I > > described upthread. > > Review of the preparatory patches: > > > 0001 Add very basic test for kill_prior_tuples > > > > We currently don't exercise this patch for gist and hash, which seems > > somewhat criminal. > > Interesting to use the isolationtester for this. There's just one session, > so you're just using it to define reusable steps with handy names. Yea. I had started out writing it as a pg_regress style test and it quickly got very hard to understand. > I'm fine with that, but please add a comment to explain it. Makes sense. > I wonder if it'd be more straightforward to make it a regular pg_regress > test though. There would be some repetition, but would it be so bad? I found it to be quite bad. If testing just one AM it's ok-ish, but once you test 2-3 it gets very long and repetitive. I guess we could use functions or such to make it a bit less painful - but that point, is it actually simpler? > You forgot to add the new test to 'isolation_schedule'. Oops. > typos: > "inex" -> "index" > "does something approximately reasonble" -> "do something approximately > reasonable" Oops^2. > > +/* > > + * If HEAP_MOVED_OFF or HEAP_MOVED_IN are set on the tuple, remove them and > > + * adjust hint bits. See the comment for SetHintBits() for more background. > > + * > > + * This helper returns false if the row ought to be invisible, true > > otherwise. > > + */ > > +static inline bool > > +HeapTupleCleanMoved(HeapTupleHeader tuple, Buffer buffer) > > +{ > > + TransactionId xvac; > > + > > + /* only used by pre-9.0 binary upgrades */ > > + if (likely(!(tuple->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN > > + return true; > > + > > + ... > > +} > > This is so unlikely these days that this function probably should not be > inlined anymore. Because I put the check for moved in the function it is important that it's inlined. The compiler can understand that the rest of the function is unlikely to be executed and put it together with other less likely to be executed code. > > 0003 bufmgr: Add BufferLockHeldByMe() > > > > Useful for writing assertions in later patches. > > > + if (mode == BUFFER_LOCK_EXCLUSIVE) > > + lwmode = LW_EXCLUSIVE; > > + else if (mode == BUFFER_LOCK_SHARE) > > + lwmode = LW_SHARED; > > + else > > + { > > + Assert(false); > > + pg_unreachable(); > > + lwmode = LW_EXCLUSIVE; /* assuage compiler */ > > + } > > I just learned a new word :-). Heh. > This is fine, but to avoid the assuaging, you could also do this: > > if (mode == BUFFER_LOCK_EXCLUSIVE) > lwmode = LW_EXCLUSIVE; > else > { > Assert(mode == BUFFER_LOCK_SHARE); > lwmode = LW_SHARED; > } I have no opinion about which version is better... > > 0004 heapam: Use exclusive lock on old page in CLUSTER > > > > When I started looking into this I wanted to make sure that it's > > actually > > safe to skip setting hint bits in all places. One place that's *not* > > safe > > is cluster - we assume that all hint bits are set in > > rewrite_heap_tuple(). > > > > This might not strictly be required, but it seemed very fragile as > > is. While using an exclusive lock does seem like it makes it a bit > > safer, > > it's not a very good fix, as we still dirty the old buffer, which seems > > like a shame when one uses VACUUM FULL. 
> > > + /* > > +* To be able to guarantee that we can set the hint bit, > > acquire an > > +* exclusive lock on the old buffer. We need the hint bits to > > be set > > +* as otherwise reform_and_rewrite_tuple() -> > > rewrite_heap_tuple() -> > > +* heap_freeze_tuple() will get confused. > > +* > > Hmm, I don't see any comments in the reform_and_rewrite_tuple() -> > rewrite_heap_tuple() -> heap_freeze_tuple() path about requiring hint bits > to be set. There's this comment is in heap_prepare_freeze_tuple(): * It is assumed that the caller has checked the tuple with * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD * (else we should be removing the tuple, not freezing it). I don't remember the precise details of what all goes wrong, I think there was more than one avenue. One example is stuff like this: /* * If the tuple has been updated, check the old-to-new mapping hash table. */ if (!((old_tuple->t_data->t_infomask & HEAP_XMAX_INVALID) || HeapTupleHeaderIsOnlyLocked(old_tuple->t_data)) && !HeapTupleHeaderIndicatesMovedPartitions(old_tuple->t_data) && !(ItemPointerEquals(&(old_tuple->t_self), &(old_tuple->t_data->t
Re: protocol-level wait-for-LSN
Hi, On 10/30/24 1:45 PM, Jelte Fennema-Nio wrote: On Wed, 30 Oct 2024 at 18:18, Ants Aasma wrote: The idea is great, I have been wanting something like this for a long time. For future proofing it might be a good idea to not require the communicated-waited value to be a LSN. Yours and Matthias' feedback make total sense I think. From an implementation perspective I think there are a few things necessary to enable these wider usecases: 1. The token should be considered opaque for clients (should be documented) 2. The token should be defined as variable length in the protocol 3. We should have a hook to allow postgres extensions to override the default token generation 4. We should have a hook to allow postgres extensions to override waiting until the token "timestamp" Even without sharding LSN might not be a final choice. Right now on the primary the visibility order is not LSN order. So if a connection does synchronous_commit = off commit, the write location is not even going to see the commit. By publishing the end of the commit record it would be better. But I assume at some point we would like to have a consistent visibility order, which quite likely means using something other than LSN as the logical clock. I was going to say that the default could probably still be LSN, but this makes me doubt that. Is there some other token that we can send now that we could "wait" on instead of the LSN, which would work for. If not, I think LSN is still probably a good choice as the default. Or maybe only as a default in case synchronous_commit != off. There are known wish-lists for a protocol v4, like https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md and a lot of clean-room implementations in drivers and embedded in projects/products. Having LSN would be nice, but to break all existing implementations, no. Having to specify with startup parameters how a core message format looks like sounds like a bad idea to me, https://www.postgresql.org/docs/devel/protocol-message-formats.html is it. If we want to start on a protocol v4 thing then that is ok - but there are a lot of feature requests for that one. Best regards, Jesper
small pg_combinebackup improvements
Hi, Here are two small patches to improve pg_combinebackup slightly. 0001: I noticed that there is some logic in reconstruct.c that constructs a pathname of the form a/b//c instead of a/b/c. AFAICT, this still works fine; it just looks ugly. It's possible to get one of these pathnames to show up in an error message if you have a corrupted backup, e.g.: pg_combinebackup: error: could not open file "x/base/1//INCREMENTAL.6229": No such file or directory pg_combinebackup: removing output directory "xy" So 0001 fixes this. 0002: If you try to do something like pg_combinebackup incr1 incr2 -o result, you'll get an error saying that the first backup must be a full backup. This is an implementation restriction that might be good to lift, but right now that's how it is. However, if you do pg_combinebackup full incr -o result, but the full backup happens to contain an INCREMENTAL.* file that also exists in the incremental backup, then you won't get an error. Instead you'll reconstruct a completely bogus full file and write it to the result directory. Now, this should really be impossible unless you're intentionally trying to break pg_combinebackup, but it's also pretty lame, so 0002 fixes things so that you get a proper error, and also adds a test case. Review appreciated. My plan is to commit at least to master and possibly back-patch. Opinions on whether to back-patch are also appreciated. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-pg_combinebackup-When-reconstructing-avoid-double.patch Description: Binary data v1-0002-pg_combinebackup-Error-if-incremental-file-exists.patch Description: Binary data
Re: Shave a few cycles off our ilog10 implementation
On 30/10/2024 21:27, David Fetter wrote: Please find attached a patch to $Subject I've done some preliminary testing, and it appears to shave somewhere between 25-50% off the operations themselves, and these cascade into things like formatting result sets and COPY OUT. Impressive! What did you use to performance test it, to get those results? -- Heikki Linnakangas Neon (https://neon.tech)
Re: Proposal for Integrating Data Masking and anonymization into PostgreSQL
Hi Osman See postgresql_anonymizer https://gitlab.com/dalibo/postgresql_anonymizer/ and pg_anonymize https://github.com/rjuju/pg_anonymize , where the former has more functionality, like synthetic data generation (if you need that). Yours, Stefan Am Di., 29. Okt. 2024 um 10:01 Uhr schrieb Hosney Osman : > > could you please provide me with reference or article or example for this > scenario > for example i have schema include 4 table > and i want to anonymize it in another schema > how to do that > > On Wed, Oct 23, 2024 at 4:25 PM Alastair Turner wrote: >> >> Hi Hosney >> >> On Wed, 23 Oct 2024, 15:17 Hosney Osman, wrote: >>> >>> Dear PostgreSQL Development Team, >>> >>> I am writing to propose the development of native data masking and >>> anonymization features within PostgreSQL. As a long-time user of PostgreSQL, >>> >>> I have observed a growing need for efficient and secure data handling, >>> particularly in compliance with regulations like GDPR >> >> This requirement already seems to be well served by extensions. >> pg_anonymize, for instance, uses the meta data tools available in the >> database, like security labels, to manage the values returned to users. >> >> Is there any functionality, or usability, in this area which you believe >> can't be delivered by an extension? >> >> Regards >> >> Alastair
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 19:04, Jesper Pedersen wrote: > Having LSN would be nice, but to break all existing implementations, no. > Having to specify with startup parameters how a core message format > looks like sounds like a bad idea to me, It would really help if you would explain why you think it's a bad idea to use a startup parameter for that, instead of simply stating that you think it needs a major protocol version bump. The point of enabling it through a startup parameter (aka protocol option) is exactly so it will not break any existing implementations. If clients request the protocol option (which as the name suggests is optional), then they are expected to be able to parse it. If they don't, then they will get the old message format. So no existing implementation will be broken. If some middleware/proxy gets a request for a startup option it does not support it can advertise that to the client using the NegotiateProtocolVersion message. Allowing the client to continue in a mode where the option is not enabled. So, not bumping the major protocol version and enabling this feature through a protocol option actually causes less breakage in practice. Also regarding the wishlist. I think it's much more likely for any of those to happen in a minor version bump and/or protocol option than it is that we'll bump the major protocol version. P.S. Like I said in another email on this thread: I think for this specific case I'd also prefer a separate new message, because that makes it easier to filter that message out when received by PgBouncer. But I'd still like to understand your viewpoint better on this, because adding fields to existing message types is definitely one of the types of changes that I personally think would be fine for some protocol changes.
Re: Shave a few cycles off our ilog10 implementation
On Wed, Oct 30, 2024 at 09:54:20PM +0200, Heikki Linnakangas wrote: > On 30/10/2024 21:27, David Fetter wrote: > > Please find attached a patch to $Subject > > > > I've done some preliminary testing, and it appears to shave somewhere > > between 25-50% off the operations themselves, and these cascade into > > things like formatting result sets and COPY OUT. > > Impressive! What did you use to performance test it, to get those results? In case that wasn't clear, what I've tested so far was the ilog10 implementations themselves, not the general effects on the things they underlie. The testing basically consisted of filling an array with appropriately sized pseudo-random uints ahead of time, running it through a tight loop that called the ilog10 implementations, and averaging the execution times. Any suggestions for more thorough testing would be welcome. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778
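One coarse, end-to-end way to exercise these paths from SQL, for anyone who wants to see the cascading effects rather than the microbenchmark itself (a sketch; the timings obviously include far more than the ilog10 calls):

\timing on
-- int4out calls the 32-bit digit-counting helper once per value
SELECT sum(length(i::text)) FROM generate_series(1, 20000000) AS i;
-- COPY OUT goes through the same conversion; TO PROGRAM requires superuser
COPY (SELECT i FROM generate_series(1, 20000000) AS i) TO PROGRAM 'cat >/dev/null';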
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 18:18, Ants Aasma wrote: > The idea is great, I have been wanting something like this for a long > time. For future proofing it might be a good idea to not require the > communicated-waited value to be a LSN. Yours and Matthias' feedback make total sense I think. From an implementation perspective I think there are a few things necessary to enable these wider usecases: 1. The token should be considered opaque for clients (should be documented) 2. The token should be defined as variable length in the protocol 3. We should have a hook to allow postgres extensions to override the default token generation 4. We should have a hook to allow postgres extensions to override waiting until the token "timestamp" > Even without sharding LSN might not be a final choice. Right now on > the primary the visibility order is not LSN order. So if a connection > does synchronous_commit = off commit, the write location is not even > going to see the commit. By publishing the end of the commit record it > would be better. But I assume at some point we would like to have a > consistent visibility order, which quite likely means using something > other than LSN as the logical clock. I was going to say that the default could probably still be LSN, but this makes me doubt that. Is there some other token that we can send now that we could "wait" on instead of the LSN, which would work for. If not, I think LSN is still probably a good choice as the default. Or maybe only as a default in case synchronous_commit != off.
Re: AIO writes vs hint bits vs checksums
On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote: > What I suspect we might want instead is something inbetween a share > and an > exclusive lock, which is taken while setting a hint bit and which > conflicts > with having an IO in progress. I am starting to wonder if shared content locks are really the right concept at all. They make sense for simple mutexes, but we are already more complex than that, and you are suggesting adding on to that complexity. Which I agree is a good idea, I'm just wondering if we could go even further. The README states that a shared lock is necessary for visibility checking, but can we just be more careful with the ordering and atomicity of visibility changes in the page? * carefully order reads and writes of xmin/xmax/hints (would that create too many read barriers in the tqual.c code?) * write line pointer after tuple is written We would still have pins and cleanup locks to prevent data removal. We'd have the logic you suggest that would prevent modification during IO. And there would still need to be exclusive content locks so that two inserts don't try to allocate the same line pointer, or lock the same tuple. If PD_ALL_VISIBLE is set it's even simpler. Am I missing some major hazards? Regards, Jeff Davis
Re: AIO writes vs hint bits vs checksums
Hi, On 2024-10-30 10:47:35 -0700, Jeff Davis wrote: > On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote: > > What I suspect we might want instead is something inbetween a share > > and an > > exclusive lock, which is taken while setting a hint bit and which > > conflicts > > with having an IO in progress. > > I am starting to wonder if a shared content locks are really the right > concept at all. It makes sense for simple mutexes, but we are already > more complex than that, and you are suggesting adding on to that > complexity. What I am proposing isn't making the content lock more complicated, it's orthogonal to the content lock. > Which I agree is a good idea, I'm just wondering if we could go even > further. > > The README states that a shared lock is necessary for visibility > checking, but can we just be more careful with the ordering and > atomicity of visibility changes in the page? > > * carefully order reads and writes of xmin/xmax/hints (would >that create too many read barriers in the tqual.c code?) > * write line pointer after tuple is written It's possible, but it'd be a lot of work. And you wouldn't need to just do this for heap, but all the indexes too, to make progress on the don't-set-hint-bits-during-io front. So I don't think it makes sense to tie these things together. I do think that it's an argument for not importing all the complexity into lwlock.c though. > We would still have pins and cleanup locks to prevent data removal. As-is cleanup locks only work in coordination with content locks. While cleanup is ongoing we need to prevent anybody from starting to look at the page - without acquiring something like a shared lock that's not easy. > We'd have the logic you suggest that would prevent modification during > IO. And there would still need to be an exclusive content locks so that > two inserts don't try to allocate the same line pointer, or lock the > same tuple. > > If PD_ALL_VISIBLE is set it's even simpler. > > Am I missing some major hazards? I don't think anything fundamental, but it's a decidedly nontrivial amount of work. Greetings, Andres Freund
Re: Increase of maintenance_work_mem limit in 64-bit Windows
v.popoli...@postgrespro.ru wrote on 2024-10-01 00:30: David Rowley wrote on 2024-09-24 01:07: On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov wrote: I agree, it is better to fix all them together. I also do not like this hack, it will be removed from the patch, if I check and change all at once. I think, it will take about 1 week to fix and test all changes. I will estimate the total volume of the changes and think, how to group them in the patch ( I hope, it will be only one patch) There's a few places that do this: Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE; /* choose the maxBlockSize to be no larger than 1/16 of work_mem */ while (16 * maxBlockSize > work_mem * 1024L) I think since maxBlockSize is a Size variable, that the above should probably be: while (16 * maxBlockSize > (Size) work_mem * 1024) Maybe there can be a precursor patch to fix all those to get rid of the 'L' and cast to the type we're comparing to or assigning to rather than trying to keep the result of the multiplication as a long. Hi, I rechecked everything that depends on the MAX_KILOBYTES limit and fixed all casts that are affected by the 4-byte long type on 64-bit Windows. The following variables are now limited to 2TB on all 64-bit systems: maintenance_work_mem, work_mem, logical_decoding_work_mem, max_stack_depth, autovacuum_work_mem, gin_pending_list_limit, and wal_skip_threshold. Also, wal_keep_size_mb, min_wal_size_mb, max_wal_size_mb, and max_slot_wal_keep_size_mb are not affected by the "long" cast. Hi everyone. The patch has been added to the Commitfest: https://commitfest.postgresql.org/50/5343/ -- Best regards, Vladlen Popolitov.
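As a quick sanity check of the new ceiling (a sketch; on 64-bit Windows this only succeeds with the patch applied, and 1TB is just an illustrative value):

SET maintenance_work_mem = '1TB';
SHOW maintenance_work_mem;
-- the hard limit is MAX_KILOBYTES, i.e. INT_MAX kB, just under 2TB
SET maintenance_work_mem = '2147483647kB';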
Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.
On Wed, Oct 30, 2024 at 11:58 AM jian he wrote: > > I missed a case when column collation and partition key collation are > the same and indeterministic. > that should be fine for partition-wise join. > so v2 attached. > > have_partkey_equi_join, match_expr_to_partition_keys didn't do any > collation related check. > propose v2 change disallow partitionwise join for case when > column collation is indeterministic *and* is differ from partition > key's collation. > > the attached partition_wise_join_collation.sql is the test script. > you may use it to compare with the master behavior. What if the partkey collation and column collation are both deterministic, but with different sort orders? I'm not familiar with this part of the code base, but it seems to me that the partitionwise join should use the partkey collation instead of the column collation, because it's the partkey collation that decides which partition a row is dispatched to. What Jian proposed is also reasonable, but it seems like another aspect of $subject? Just some random thoughts, I might be wrong ;( -- Regards Junwang Zhao
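For readers who do not want to open the attached script, the shape of the problematic case is roughly the following (a sketch assuming an ICU-enabled build; not the exact script attached upthread):

CREATE COLLATION case_insensitive
  (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

CREATE TABLE t (a text COLLATE case_insensitive)
  PARTITION BY LIST (a COLLATE "C");
CREATE TABLE t_lower PARTITION OF t FOR VALUES IN ('a');
CREATE TABLE t_upper PARTITION OF t FOR VALUES IN ('A');
INSERT INTO t VALUES ('a'), ('A');

SET enable_partitionwise_join = on;
-- the join condition compares using the column's case-insensitive collation,
-- so 'a' and 'A' should match across partitions; a partition-wise plan that
-- pairs t_lower with t_lower and t_upper with t_upper misses those matches
SELECT * FROM t t1 JOIN t t2 ON t1.a = t2.a;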
Shave a few cycles off our ilog10 implementation
Please find attached a patch to $Subject I've done some preliminary testing, and it appears to shave somewhere between 25-50% off the operations themselves, and these cascade into things like formatting result sets and COPY OUT. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 >From 612c848d8025d318e5b6e0da5bb04d0f7ed22126 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Mon, 28 Oct 2024 04:38:16 -0700 Subject: [PATCH v1] Shave a few instructions off ilog10 by using SWAR and a lookup table. --- src/backend/utils/adt/numutils.c | 73 ++-- 1 file changed, 60 insertions(+), 13 deletions(-) diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 63c2beb6a29..a53b647c054 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -39,30 +39,76 @@ static const char DIGIT_TABLE[200] = "90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; /* - * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + * Adapted from https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/ */ static inline int decimalLength32(const uint32 v) { - int t; - static const uint32 PowersOfTen[] = { - 1, 10, 100, - 1000, 1, 10, - 100, 1000, 1, - 10 + /* Each entry in the table is a 64-bit unsigned integer. + * - The upper 32 bits of this integer is floor(ilog10(v)), a thing we can know from lg(v). + * - The lower 32 bits of the integer is a number n such that v + n = 2^32 when v is a power of 10. + * + * These two parts taken together create a way to put the number of decimal + * digits as the upper 32 bits. + * + */ + static const uint64_t BaseTwoToBaseTenPivot[] = { + 4294967296, 8589934582, 8589934582, 8589934582, 12884901788, + 12884901788, 12884901788, 17179868184, 17179868184, 17179868184, + 21474826480, 21474826480, 21474826480, 21474826480, 25769703776, + 25769703776, 25769703776, 30063771072, 30063771072, 30063771072, + 34349738368, 34349738368, 34349738368, 34349738368, 38554705664, + 38554705664, 38554705664, 41949672960, 41949672960, 41949672960, + 42949672960, 42949672960 }; - /* - * Compute base-10 logarithm by dividing the base-2 logarithm by a - * good-enough approximation of the base-2 logarithm of 10 - */ - t = (pg_leftmost_one_pos32(v) + 1) * 1233 / 4096; - return t + (v >= PowersOfTen[t]); + return (v + BaseTwoToBaseTenPivot[pg_leftmost_one_pos32(v)]) >> 32; } +/* + * Adapted from https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/ + */ static inline int decimalLength64(const uint64 v) { +#ifdef HAS_INT128 + /* Each entry in the table is a 128-bit unsigned integer. + * - The upper 64 bits of this integer is floor(ilog10(v)), a thing we can know from lg(v). + * - The lower 64 bits of the integer is a number n such that v + n = 2^64 when v is a power of 10. + * + * These two parts taken together create a way to put the number of decimal + * digits as the upper 64 bits. 
+ * + */ + static const uint128 BaseTwoToBaseTenPivot[] = { + 18446744073709551616, + 18446744073709551606, 18446744073709551606, 18446744073709551606, + 36893488147419103132, 36893488147419103132, 36893488147419103132, + 55340232221128653848, 55340232221128653848, 55340232221128653848, + 73786976294838196464, 73786976294838196464, 73786976294838196464, + 92233720368547658080, 92233720368547658080, 92233720368547658080, + 110680464442256309696, 110680464442256309696, 110680464442256309696, + 129127208515956861312, 129127208515956861312, 129127208515956861312, + 147573952589576412928, 147573952589576412928, 147573952589576412928, + 166020696662385964544, 166020696662385964544, 166020696662385964544, + 184467440727095516160, 184467440727095516160, 184467440727095516160, + 202914184710805067776, 202914184710805067776, 202914184710805067776, + 221360927884514619392, 221360927884514619392, 221360927884514619392, + 239807662958224171008, 239807662958224171008, 239807662958224171008, + 258254317031933722624, 258254317031933722624, 258254317031933722624, + 276700161105643274240, 276700161105643274240, 276700161105643274240, + 295137905179352825856, 295137905179352825856, 295137905179352825856, + 313494649253062377472, 313494649253062377472, 313494649253062377472, + 331041393326771929088, 331041393326771929088, 331041393326771929088, + 340488137400481480704, 340488137400481480704, 340488137400481480704, + 268934881474191032320, 268934881474191032320, 268934881474191032320 + }; + + return (v + BaseTwoToBaseTenPivot[pg_leftmost_one_pos64(v)]) >> 64; +#else/* !HAVE_INT128 */ + /* + * * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ int t; static const uint64 PowersOfTen[] = { UINT64CONST(1), UINT64CONST(10), @@ -83,6 +129,7 @@ decimalLength64(const uint64 v) */ t = (pg_leftmost
Re: Interrupts vs signals
On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas wrote: > 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be > changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, > ResetInterrupt instead. > > 2. If an extension is defining its own Latch with InitLatch, rather than > using MyLatch, it's out of luck. You could probably rewrite it using the > INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a > bit more effort. > > We can provide backwards compatibility macros and a new facility to > allocate custom interrupt bits. But how big of a problem is this anyway? > In an in-person chat last week, Andres said something like "this will > break every extension". Seems hyperbolic. > For comparison, in PostgreSQL v10, we added a new 'wait_event_info' > argument to WaitLatch and WaitLatchForSocket. That breakage was on > similar scale: > > $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO && > echo FOO" | wc -l > 71 However, that was also pretty easy to fix. This seems like it might be a little more complicated. > We have a few options for how to deal with backwards-compatibility for > extensions: > > A) If we rip off the bandaid in one go and don't provide any > backwards-compatibility macros, we will break 96 extensions. Most of > them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with > corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With > #ifdef for compatibility with older postgres versions) > > B) If we provide backwards-compatibility macros so that simple Latch > calls on MyLatch continue working, we will break about 14 extensions. > They will need some tweaking like in option A). A bit more than the > simple cases in option A), but nothing too difficult. > > C) We could provide "forward-compatibility" macros in a separate header > file, to make the new "SetInterrupt" etc calls work in old PostgreSQL > versions. Many of the extensions already have a header file like this, > see e.g. citusdata/citus/src/include/pg_version_compat.h, > pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea > to provide a semi-official header file like this on the Postgres wiki, > to help extension authors. It would encourage extensions to use the > latest idioms, while still being able to compile for older versions. > > I'm leaning towards option C). Let's rip off the band-aid, but provide > documentation for how to adapt your extension code. And provide a > forwards-compatibility header on the wiki, that extension authors can > use to make the new Interrupt calls work against old server versions. I don't know which of these options is best, but I don't find any of them categorically unacceptable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 17:51, Peter Eisentraut wrote: > This is something I hacked together on the way back from pgconf.eu. > It's highly experimental. > > The idea is to do the equivalent of pg_wal_replay_wait() on the protocol > level, so that it is ideally fully transparent to the application code. > The application just issues queries, and they might be serviced by a > primary or a standby, but there is always a correct ordering of reads > after writes. The idea is great, I have been wanting something like this for a long time. For future proofing it might be a good idea to not require the communicated-waited value to be a LSN. In a sharded database a Lamport timestamp would allow for sequential consistency. Lamport timestamp is just some monotonically increasing value that is eagerly shared between all communicating participants, including clients. For a single cluster LSNs work fine for this purpose. But with multiple shards LSNs will not work, unless arranged as a vector clock which is what I think Matthias proposed. Even without sharding LSN might not be a final choice. Right now on the primary the visibility order is not LSN order. So if a connection does synchronous_commit = off commit, the write location is not even going to see the commit. By publishing the end of the commit record it would be better. But I assume at some point we would like to have a consistent visibility order, which quite likely means using something other than LSN as the logical clock. I see the patch names the field LSN, but on the protocol level and for the client library this is just an opaque 127 byte token. So basically I'm thinking the naming could be more generic. And for a complete Lamport timestamp implementation we would need the capability of extracting the last seen value and another set-if-greater update operation. -- Ants Aasma www.cybertec-postgresql.com
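For reference, the SQL-level dance that the protocol-level mechanism would make transparent looks roughly like this today (a sketch using the pg_wal_replay_wait() procedure the original mail refers to; the LSN shown is made up):

-- on the primary, immediately after the application's write:
SELECT pg_current_wal_insert_lsn();      -- say it returns 0/1A2B3C4D
-- on the standby, before the dependent read:
CALL pg_wal_replay_wait('0/1A2B3C4D');   -- blocks until that LSN has been replayed
-- any query issued after the CALL is guaranteed to see the write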
Re: Shave a few cycles off our ilog10 implementation
On Thu, 31 Oct 2024 at 09:02, David Fetter wrote: > This testing was basically just sending a bunch of appropriately sized > pseudo-random uints in a previously created array sent through a tight > loop that called the ilog10s and getting average execution times. > > Any suggestions for more thorough testing would be welcome. Maybe something similar to what I did in [1]. David [1] https://postgr.es/m/CAApHDvopR=ypgnr4abbn4hmoztuyva+ifyrtvu49pxg9yo_...@mail.gmail.com
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Fri, Oct 18, 2024 at 08:08:55AM +0900, Michael Paquier wrote: > 0002 and 0003 look sane enough to me as shaped. I'd need to spend a > few more hours on 0001 if I were to do a second round on it, but you > don't really need a second opinion, do you? :D I'll manage. 0001 was a doozy to back-patch, and this obviously isn't a terribly pressing issue, so I plan to wait until after the November minor release to apply this. I'm having second thoughts on 0002, so I'll probably leave that one uncommitted, but IMHO we definitely need 0003 to prevent this issue from reoccurring. -- nathan >From 58eefc6c2f90ac7d1f23f5b5f3052042cf0e75d9 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 30 Oct 2024 15:42:29 -0500 Subject: [PATCH v5 1/1] Ensure we have a snapshot when updating various system catalogs. Reported-by: Alexander Lakhin Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan Backpatch-through: 12 --- src/backend/commands/indexcmds.c| 2 +- src/backend/commands/tablecmds.c| 8 +++ src/backend/postmaster/autovacuum.c | 7 ++ src/backend/replication/logical/tablesync.c | 21 ++ src/backend/replication/logical/worker.c| 24 + 5 files changed, 61 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 2f652463e3..fd86f28a35 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -4230,7 +4230,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein false); /* -* Updating pg_index might involve TOAST table access, so ensure we +* Swapping the indexes might involve TOAST table access, so ensure we * have a valid snapshot. */ PushActiveSnapshot(GetTransactionSnapshot()); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 4345b96de5..f1649419d6 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -19383,9 +19383,17 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, tab->rel = rel; } + /* +* Detaching the partition might involve TOAST table access, so ensure we +* have a valid snapshot. +*/ + PushActiveSnapshot(GetTransactionSnapshot()); + /* Do the final part of detaching */ DetachPartitionFinalize(rel, partRel, concurrent, defaultPartOid); + PopActiveSnapshot(); + ObjectAddressSet(address, RelationRelationId, RelationGetRelid(partRel)); /* keep our lock until commit */ diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index dc3cf87aba..e84285b644 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2209,6 +2209,12 @@ do_autovacuum(void) get_namespace_name(classForm->relnamespace), NameStr(classForm->relname; + /* +* Deletion might involve TOAST table access, so ensure we have a +* valid snapshot. +*/ + PushActiveSnapshot(GetTransactionSnapshot()); + object.classId = RelationRelationId; object.objectId = relid; object.objectSubId = 0; @@ -2221,6 +2227,7 @@ do_autovacuum(void) * To commit the deletion, end current transaction and start a new * one. Note this also releases the locks we took. 
*/ + PopActiveSnapshot(); CommitTransactionCommand(); StartTransactionCommand(); diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 118503fcb7..57e1eede41 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -377,6 +377,12 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) replorigin_session_origin_lsn = InvalidXLogRecPtr; replorigin_session_origin_timestamp = 0; + /* +* Updating pg_replication_origin might involve TOAST table access, so +* ensure we have a valid snapshot. +*/ + PushActiveSnapshot(GetTransactionSnapshot()); + /* * Drop the tablesync's origin tracking if exists. * @@ -387,6 +393,8 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) */
Re: Popcount optimization using AVX512
BTW, I just realized function attributes for xsave and avx512 don't work on MSVC (see https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630). Not sure if you care about it. It's an easy fix (see https://gcc.godbolt.org/z/Pebdj3vMx).
Re: [PATCH] Add array_reverse() function
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Content. The proposed function has waited a long time to be implemented and it will be very useful. Personally I used a slow workaround whenever I needed to reverse an array. Run. This patch applies cleanly to HEAD. All regression tests pass successfully against the patch. Format. The code is formatted according to the code guidelines. Documentation. The documentation is updated, and a description of the function is added. From my point of view, it would be better to mention that the function returns an updated array (it does not update it in place, as a reader might understand), but the other array-returning functions in the documentation have the same style (silently assuming: reverse == return reversed array). Conclusion. +1 for committer review
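To make the "slow workaround" point concrete (a sketch; for a one-dimensional array the expected behaviour with the patch applied is shown in the comments):

-- without the patch: reverse via unnest ... WITH ORDINALITY
SELECT array_agg(x ORDER BY ord DESC)
FROM unnest(ARRAY[1, 2, 3]) WITH ORDINALITY AS t(x, ord);   -- {3,2,1}

-- with the patch applied
SELECT array_reverse(ARRAY[1, 2, 3]);                       -- {3,2,1}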
Re: protocol-level wait-for-LSN
On 30/10/2024 13:34, Tatsuo Ishii wrote: On Wed, 30 Oct 2024 at 07:49, Tatsuo Ishii wrote: I think one would have to define that somehow. If it's useful, the additional fields of both extensions could be appended, in some defined order. But this is an interesting question to think about. I think this kind of extension, which adds new filed to an existing message type, should be implemented as v4 protocol. Could you explain why you think a major version bump is needed? In what situation do you care about this. Because for my usecases (client implementations & pgbouncer) I don't think that would be necessary. If a client doesn't send the _pq_.wait_for_lsn protocol parameter, it will never receive this new version. Yes, if there's only one extension for a message type, it would not be a big problem. But if there's more than one extensions that want to change the same type, problem arises as I have already discussed them upthread. I don't really see a problem with having two protocol parameters change the same message. Yes, you have to define what the result of their combination is, but that seems trivial to do for additions of fields. You either define the first protocol parameter that was added to the spec, to add its field before the second. Or you could do it based on something non-time-dependent, like the alphabetic order of the protocol parameter, or the alphabetic order of the fields that they add. That sounds far from trivial. So each extension needs to check if any other extension which modifies the same message type is activated? That requires each extension implementation to have built-in knowledge about any conflicting extension. Moreover each extension may not be added at once. If extension Y is added after extension X is defined, then implementation of X needs to be changed because at the time when X is defined, it did not need to care about Y. Another way to deal with the problem could be defining a new protocol message which describes those conflict information so that each extensions do not need to have such information built-in, but maybe it is too complex. Note that the "protocol extension" mechanism is *not* meant for user-defined extensions. That's not the primary purpose anyway. It allows evolving the protocol in core code in a backwards compatible way, but indeed the different extensions will need to be coordinated so that they don't clash with each other. If they introduced new message types for example, they better use different message type codes. We might have made a mistake by calling this mechanism "protocol extensions", because it makes people think of user-defined extensions. With user-defined extensions, yes, you have exactly the problem you describe.We have no rules on how a protocol extension is allowed to change the protocol. It might add fields, it might add messages, or it might change the meaning of existing messages. Or encapsulate the whole protocol in XML. So yes, each protocol extension needs to know about all the other protocol extensions that it can be used with. In practice we'll avoid doing crazy stuff so that the protocol extensions are orthogonal, but if user-defined extensions get involved, there's not much we can do to ensure that. -- Heikki Linnakangas Neon (https://neon.tech)
Re: protocol-level wait-for-LSN
> On Wed, 30 Oct 2024 at 07:49, Tatsuo Ishii wrote: >> > I think one would have to define that somehow. If it's useful, the >> > additional fields of both extensions could be appended, in some >> > defined order. But this is an interesting question to think about. >> >> I think this kind of extension, which adds new filed to an existing >> message type, should be implemented as v4 protocol. > > Could you explain why you think a major version bump is needed? In > what situation do you care about this. Because for my usecases (client > implementations & pgbouncer) I don't think that would be necessary. If > a client doesn't send the _pq_.wait_for_lsn protocol parameter, it > will never receive this new version. Yes, if there's only one extension for a message type, it would not be a big problem. But if there's more than one extensions that want to change the same type, problem arises as I have already discussed them upthread. > I don't really see a problem with having two protocol parameters > change the same message. Yes, you have to define what the result of > their combination is, but that seems trivial to do for additions of > fields. You either define the first protocol parameter that was added > to the spec, to add its field before the second. Or you could do it > based on something non-time-dependent, like the alphabetic order of > the protocol parameter, or the alphabetic order of the fields that > they add. That sounds far from trivial. So each extension needs to check if any other extension which modifies the same message type is activated? That requires each extension implementation to have built-in knowledge about any conflicting extension. Moreover each extension may not be added at once. If extension Y is added after extension X is defined, then implementation of X needs to be changed because at the time when X is defined, it did not need to care about Y. Another way to deal with the problem could be defining a new protocol message which describes those conflict information so that each extensions do not need to have such information built-in, but maybe it is too complex. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: doc issues in event-trigger-matrix.html
On Tue, Oct 29, 2024 at 7:54 PM Peter Eisentraut wrote:
>
> I made a patch for this. I have expanded the narrative discussion on
> what commands are supported for event triggers, also made a few
> corrections/additions there, based on inspecting the source code. And
> then removed the big matrix, which doesn't provide any additional
> information, I think.
>
> I think this is sufficient and covers everything. The only hand-wavy
> thing I can see is exactly which ALTER commands trigger the sql_drop
> event. But this was already quite imprecise before, and I think also
> not quite correct. This might need a separate investigation.
>
> In any case, we can use this as a starting point to iterate on the right
> wording etc.

Hi. I have a couple of minor issues.

The sentence

    An event trigger fires whenever the event with which it is associated
    occurs in the database in which it is defined.

is hard to follow because of the two "which" clauses. Is it possible to
rewrite it?

Also, consider:

    create role alice;
    create role bob;
    grant alice to bob;

against this part of the text:

    As an exception, this event does not occur for DDL commands targeting
    shared objects:
      databases
      roles
      tablespaces
      parameter privileges
      ALTER SYSTEM
    This event also does not occur for commands targeting event triggers
    themselves.

I am not 100% sure the "roles" item covers a case like "grant alice to bob;".
Here "targeting shared objects" really means role-related metadata, so maybe
a separate item such as "role privileges" would make it easier to distinguish
"grant select on t1 to alice;" from "grant alice to bob;".
Re: [PATCH] Add roman support for to_number function
Hi, I have attached a rebased patch if someone wants to review in the next CF. No changes as compared to v4. Regards, Hunaid Sohail > From 951f3e9620f0d50fcdaff095652a07d6304b490c Mon Sep 17 00:00:00 2001 From: Hunaid Sohail Date: Wed, 30 Oct 2024 12:00:47 +0500 Subject: [PATCH v5] Add roman support for to_number function --- doc/src/sgml/func.sgml| 13 ++- src/backend/utils/adt/formatting.c| 154 +- src/test/regress/expected/numeric.out | 41 +++ src/test/regress/sql/numeric.sql | 21 4 files changed, 224 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 05f630c6a6..5f91f0401b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -8628,7 +8628,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); RN -Roman numeral (input between 1 and 3999) +Roman numeral (valid for numbers 1 to 3999) TH or th @@ -8756,6 +8756,17 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); (e.g., 9.99 is a valid pattern). + + + +In to_number, RN pattern converts +roman numerals to standard numbers. It is case-insensitive (e.g., 'XIV', +'xiv', and 'Xiv' are all seen as 14). +RN cannot be used in combination with any other formatting patterns +or modifiers, with the exception of FM, which is applicable only in +to_char() and is ignored in to_number(). + + diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 85a7dd4561..5e5e763388 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -49,7 +49,6 @@ * - better number building (formatting) / parsing, now it isn't * ideal code * - use Assert() - * - add support for roman number to standard number conversion * - add support for number spelling * - add support for string to string formatting (we must be better * than Oracle :-), @@ -270,6 +269,31 @@ static const char *const rm100[] = {"C", "CC", "CCC", "CD", "D", "DC", "DCC", "D static const char *const numTH[] = {"ST", "ND", "RD", "TH", NULL}; static const char *const numth[] = {"st", "nd", "rd", "th", NULL}; +/* -- + * MACRO: Check if the current and next characters + * form a valid subtraction combination for roman numerals + * -- + */ +#define IS_VALID_SUB_COMB(curr, next) \ + (((curr) == 'I' && ((next) == 'V' || (next) == 'X')) || \ + ((curr) == 'X' && ((next) == 'L' || (next) == 'C')) || \ + ((curr) == 'C' && ((next) == 'D' || (next) == 'M'))) + +/* -- + * MACRO: Roman number value + * -- + */ +#define ROMAN_VAL(r) \ + ((r) == 'I' ? 1 : \ + (r) == 'V' ? 5 : \ + (r) == 'X' ? 10 : \ + (r) == 'L' ? 50 : \ + (r) == 'C' ? 100 : \ + (r) == 'D' ? 500 : \ + (r) == 'M' ? 
1000 : 0) + +#define MAX_ROMAN_LEN 15 + /* -- * Flags & Options: * -- @@ -1074,6 +1098,7 @@ static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, static char *fill_str(char *str, int c, int max); static FormatNode *NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree); static char *int_to_roman(int number); +static int roman_to_int(char* s, int len); static void NUM_prepare_locale(NUMProc *Np); static char *get_last_relevant_decnum(char *num); static void NUM_numpart_from_char(NUMProc *Np, int id, int input_len); @@ -1284,6 +1309,15 @@ NUMDesc_prepare(NUMDesc *num, FormatNode *n) case NUM_rn: case NUM_RN: + if (IS_ROMAN(num)) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot use \"RN\" twice"))); + if (IS_BLANK(num) || IS_LSIGN(num) || IS_BRACKET(num) || +IS_MINUS(num) || IS_PLUS(num) || IS_MULTI(num)) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("\"RN\" is incompatible with other formats"))); num->flag |= NUM_F_ROMAN; break; @@ -5247,7 +5281,107 @@ int_to_roman(int number) return result; } +/* Convert a standard roman numeral to an integer. + * Result is an integer between 1 and 3999. + * + * If input is invalid, return -1. + */ +static int +roman_to_int(char* s, int len) +{ + int repeatCount = 1; + int vCount = 0, lCount = 0, dCount = 0; + bool subtractionEncountered = false; + char lastSubtractedChar = 0; + int total = 0; + + /* Ensure the input is not too long or empty. + * 'MMMDCCCLXXXVIII' (3888) is the longest valid roman numeral (15 chars). + */ + if (len == 0 || len > MAX_ROMAN_LEN) + return -1; + + for (int i = 0; i < len; ++i) + { + char currChar = pg_ascii_toupper(s[i]); + int currValue = ROMAN_VAL(currChar); + + if (currValue == 0) + return -1; + + /* Ensure no character greater than or equal to the subtracted + * character appears after the subtraction. + */ + if (subtractionEncountered && (currValue >= ROMAN_VAL(lastSubtractedChar))) +
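To make the behaviour under discussion easier to follow without reading the
whole diff, here is a minimal, self-contained sketch of case-insensitive
roman-numeral accumulation. It is an illustration only, not the patch code,
and it deliberately omits the strict validity checks (repetition limits,
valid subtraction pairs, the 15-character cap) that the patch's
roman_to_int() enforces:

#include <ctype.h>

/* Value of a single roman digit; 0 means "not a roman digit". */
static int
roman_digit_value(char c)
{
	switch (toupper((unsigned char) c))
	{
		case 'I': return 1;
		case 'V': return 5;
		case 'X': return 10;
		case 'L': return 50;
		case 'C': return 100;
		case 'D': return 500;
		case 'M': return 1000;
		default: return 0;
	}
}

/*
 * Accumulate a roman numeral left to right, treating a smaller digit that
 * precedes a larger one as subtractive (IV, XL, CM, ...).  Because no
 * strictness rules are applied, some non-canonical spellings are accepted.
 * Returns -1 for empty input or any non-roman character.
 */
static int
roman_value_sketch(const char *s, int len)
{
	int		total = 0;

	if (len <= 0)
		return -1;

	for (int i = 0; i < len; i++)
	{
		int		cur = roman_digit_value(s[i]);
		int		next = (i + 1 < len) ? roman_digit_value(s[i + 1]) : 0;

		if (cur == 0)
			return -1;
		total += (cur < next) ? -cur : cur;
	}
	return total;
}

For example, roman_value_sketch("xiv", 3) returns 14 and
roman_value_sketch("MMXXIV", 6) returns 2024.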
Re: Add isolation test template in injection_points for wait/wakeup/detach
Hi, On Wed, Oct 30, 2024 at 12:40:05PM +0900, Michael Paquier wrote: > On Tue, Oct 29, 2024 at 02:06:11PM +, Bertrand Drouvot wrote: > > Yeah, agree that it would make sense to document in the test what has been > > discovered here. > > How about something like the attached? Thanks! A few comments: 1 === Nit s/Permutations like this one/Permutations like the following commented out one/ ? 2 === s/back its result/back its result once woken up/ ? 3 === s/faster than the wait/before the SQL function doing the wait returns its result/ ? With 2 === and 3 === I think it's easier to "link" both sentences that way. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Use "protocol options" name instead of "protocol extensions" everywhere
It was pointed out by Heikki in the thread around protocol-level wait-for-LSN that "protocol extensions" is a pretty confusing name, since it suggests a relation to Postgres extensions. Even though there is no such relationship at all. Attached is a small patch that aligns on the name "protocol options" instead. This terminology is already used in a bunch of the docs. Since no protocol options have been introduced yet, it seems like now is a good time to align on what to call them. It might even be worth backporting this to have our docs of previous versions be consistent. From a06619bcd5c85b6f38cb6a3d03c2a4e6ffbb9b53 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Wed, 30 Oct 2024 14:37:00 +0100 Subject: [PATCH v1] Rename protocol extensions to protocol options in docs Currently we call connection parameters that have the "_pq_." prefix both "protocol options" and "protocol extensions". This standardizes on calling them "protocol options". The primary reason to do this is so people don't associate them with Postgres extensions, with which they have nothing to do. They are completely separate things, and Postgres extensions currently cannot even define their own protocol options. --- doc/src/sgml/protocol.sgml | 2 +- src/backend/backup/basebackup_copy.c | 2 +- src/interfaces/libpq/fe-protocol3.c | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 71b6b2a535..fc1255b7eb 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -5941,7 +5941,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" In addition to the above, other parameters may be listed. Parameter names beginning with _pq_. are - reserved for use as protocol extensions, while others are + reserved for use as protocol options, while others are treated as run-time parameters to be set at backend start time. Such settings will be applied during backend start (after parsing the command-line arguments if any) and will diff --git a/src/backend/backup/basebackup_copy.c b/src/backend/backup/basebackup_copy.c index 0b4ee5f2a2..8cfcae47ad 100644 --- a/src/backend/backup/basebackup_copy.c +++ b/src/backend/backup/basebackup_copy.c @@ -9,7 +9,7 @@ * the course of that single COPY OUT. Each CopyData message begins with a * type byte, allowing us to signal the start of a new archive, or the * manifest, by some means other than ending the COPY stream. This also allows - * for future protocol extensions, since we can include arbitrary information + * for future protocol options, since we can include arbitrary information * in the message stream as long as we're certain that the client will know * what to do with it. * diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c index 8c5ac1729f..a84ea7a786 100644 --- a/src/interfaces/libpq/fe-protocol3.c +++ b/src/interfaces/libpq/fe-protocol3.c @@ -1439,8 +1439,8 @@ pqGetNegotiateProtocolVersion3(PGconn *conn) if (num > 0) { appendPQExpBuffer(&conn->errorMessage, - libpq_ngettext("protocol extension not supported by server: %s", - "protocol extensions not supported by server: %s", num), + libpq_ngettext("protocol option not supported by server: %s", + "protocol options not supported by server: %s", num), buf.data); appendPQExpBufferChar(&conn->errorMessage, '\n'); } -- 2.43.0
Re: freespace.c modifies buffer without any locks
On 29/10/2024 02:50, Andres Freund wrote:
> Hi,
>
> I just noticed that fsm_vacuum_page() modifies a buffer without even
> holding a shared lock. That quite obviously seems like a violation of
> the buffer locking protocol:
>
> 	/*
> 	 * Try to reset the next slot pointer. This encourages the use of
> 	 * low-numbered pages, increasing the chances that a later vacuum can
> 	 * truncate the relation. We don't bother with a lock here, nor with
> 	 * marking the page dirty if it wasn't already, since this is just a hint.
> 	 */
> 	if (BufferPrepareToSetHintBits(buf))
> 	{
> 		((FSMPage) PageGetContents(page))->fp_next_slot = 0;
> 		BufferFinishSetHintBits(buf);
> 	}
>
> In the commit (15c121b3ed7) adding the current freespace code, there
> wasn't even a comment remarking upon that oddity. 10 years later Tom
> added a comment, in 2b1759e2675f.
>
> I noticed this while adding a debug mode in which buffers are mprotected
> PROT_NONE/PROT_READ/PROT_READ|PROT_WRITE depending on the buffer's state.
>
> Is there any good reason to avoid a lock here? Compared to the cost of
> exclusively locking buffers during RecordAndGetPageWithFreeSpace() the
> cost of doing so during FreeSpaceMapVacuum*() seems small?

Agreed. This is a premature optimization, fsm_vacuum_page() should just
take the lock.

> Somewhat relatedly, but I don't think I understand why it's a good idea
> to reset fp_next_slot to 0 in fsm_vacuum_page(). At least doing so
> unconditionally. Per the comment: "This encourages the use of
> low-numbered pages, increasing the chances that a later vacuum can
> truncate the relation".

Yes, the next GetPageWithFreeSpace() call will need to do a little more
work to find the first page that actually has free space, if any. But
that seems insignificant compared to vacuum.

> When extending a relation, it seems we'll constantly reset the search
> back to the start of the range, even though we pretty much know that
> there's no space earlier in the relation - otherwise we'd not have
> extended.

That's a good point. Before commit a063baaced, relation extension used a
separate UpdateFreeSpaceMap() function, which didn't reset fp_next_slot.

> And when called from FreeSpaceMapVacuumRange() we'll reset fp_next_slot
> to somewhere that wasn't actually vacuumed, afaict?

Yeah. In the context of actual VACUUM rather than relation extension,
that seems fine; the next GetPageWithFreeSpace() call will fix it up
quickly.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
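For illustration, a minimal sketch of what "just take the lock" could look
like, using only the long-standing buffer-manager API. This is a sketch of
the idea, not the committed fix, and the helper name is invented:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/fsm_internals.h"

/*
 * Sketch: reset the FSM page's search hint under the ordinary buffer lock.
 * As in the existing code, the page is deliberately not marked dirty,
 * since fp_next_slot is only a hint.
 */
static void
fsm_reset_next_slot(Buffer buf, Page page)
{
	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
	((FSMPage) PageGetContents(page))->fp_next_slot = 0;
	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}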
Re: general purpose array_sort
On Wed, Oct 30, 2024 at 10:17 PM Junwang Zhao wrote: > > Hi, > > On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev > wrote: > > > > Hi, > > > > Thanks for the updated patch set. > > > > > > > +Datum > > > > > +array_sort_order(PG_FUNCTION_ARGS) > > > > > +{ > > > > > +return array_sort(fcinfo); > > > > > +} > > > > > + > > > > > +Datum > > > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS) > > > > > +{ > > > > > +return array_sort(fcinfo); > > > > > +} > > > > > > > > Any reason not to specify array_sort in pg_proc.dat? > > > > > > It is specified in 0001 (see oid => '8810'). > > > > What I meant was that I don't think these wrapper functions are > > needed. I think you can just do: > > > > ``` > > +{ oid => '8811', descr => 'sort array', > > + proname => 'array_sort', prorettype => 'anyarray', > > + proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <-- > > array_sort is used directly in `prosrc` > > ``` > > > > ... unless I'm missing something. > > There is a opr sanity check for this[1], if we remove these wrapper functions, > regression test will fail with: > > - oid | proname | oid | proname > --+-+-+- > -(0 rows) > + oid | proname | oid | proname > +--++--+ > + 8811 | array_sort | 8812 | array_sort > + 8810 | array_sort | 8811 | array_sort > + 8810 | array_sort | 8812 | array_sort > +(3 rows) > > > [1]: > > -- Considering only built-in procs (prolang = 12), look for multiple uses > -- of the same internal function (ie, matching prosrc fields). It's OK to > -- have several entries with different pronames for the same internal > function, > -- but conflicts in the number of arguments and other critical items should > -- be complained of. (We don't check data types here; see next query.) > -- Note: ignore aggregate functions here, since they all point to the same > -- dummy built-in function. > > SELECT p1.oid, p1.proname, p2.oid, p2.proname > FROM pg_proc AS p1, pg_proc AS p2 > WHERE p1.oid < p2.oid AND > p1.prosrc = p2.prosrc AND > p1.prolang = 12 AND p2.prolang = 12 AND > (p1.prokind != 'a' OR p2.prokind != 'a') AND > (p1.prolang != p2.prolang OR > p1.prokind != p2.prokind OR > p1.prosecdef != p2.prosecdef OR > p1.proleakproof != p2.proleakproof OR > p1.proisstrict != p2.proisstrict OR > p1.proretset != p2.proretset OR > p1.provolatile != p2.provolatile OR > p1.pronargs != p2.pronargs); > > > > > -- > > Best regards, > > Aleksander Alekseev > > > > -- > Regards > Junwang Zhao CFbot failed with doc build, v10 fixed that. -- Regards Junwang Zhao v10-0002-support-sort-order-and-nullsfirst-flag.patch Description: Binary data v10-0001-general-purpose-array_sort.patch Description: Binary data
Re: protocol-level wait-for-LSN
>> So yes, each protocol extension needs to know about all the other
>> protocol extensions that it can be used with. In practice we'll avoid
>> doing crazy stuff so that the protocol extensions are orthogonal
>
> Just as an example, let's say we add a server-side query time to the
> protocol (which honestly seems like a pretty useful feature). So that
> ReadyForQuery now returns the query time if the query_time protocol.
> For clients it isn't difficult at all to support any combination of
> query_time & wait_for_lsn options. As long as we define that the
> wait_for_lsn field is before the query_time field if both exist, then
> two simple if statements like this would do the trick:
>
> if (wait_for_lsn_enabled) {
> // interpret next field as LSN
> }
> if (query_time_enabled) {
> // interpret next field as query time
> }

But

if (query_time_enabled) {
// interpret next field as query time
}
if (wait_for_lsn_enabled) {
// interpret next field as LSN
}

doesn't work, right? I don't like that clients need to know the exact
order of the protocol extensions.

BTW,

> Just as an example, let's say we add a server-side query time to the
> protocol (which honestly seems like a pretty useful feature). So that
> ReadyForQuery now returns the query time if the query_time protocol.

Probably it's better for CommandComplete to return the query time,
because there could be multiple query times in a multi-statement query
or with the extended query protocol.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
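To put the ordering question in standalone form, here is a sketch with
hypothetical names (not libpq code) in which the extra ReadyForQuery fields
are always consumed in one spec-defined order, so a client only needs to know
which options it negotiated, not which other options exist:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct ExtraReadyForQueryFields
{
	int64_t	wait_for_lsn;	/* valid only if wait_for_lsn was negotiated */
	int64_t	query_time_us;	/* valid only if query_time was negotiated */
} ExtraReadyForQueryFields;

/*
 * Consume the optional trailing fields of a hypothetical extended
 * ReadyForQuery message.  The order below (wait_for_lsn, then query_time)
 * is assumed to be fixed by the specification; reversing the two blocks
 * would misread the stream whenever both options are active.  Byte-order
 * conversion is omitted for brevity.
 */
static const char *
parse_extra_fields(const char *buf, bool wait_for_lsn_enabled,
				   bool query_time_enabled, ExtraReadyForQueryFields *out)
{
	if (wait_for_lsn_enabled)
	{
		memcpy(&out->wait_for_lsn, buf, sizeof(out->wait_for_lsn));
		buf += sizeof(out->wait_for_lsn);
	}
	if (query_time_enabled)
	{
		memcpy(&out->query_time_us, buf, sizeof(out->query_time_us));
		buf += sizeof(out->query_time_us);
	}
	return buf;
}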
Re: Popcount optimization using AVX512
On Wed, Oct 30, 2024 at 08:53:10PM +, Raghuveer Devulapalli wrote: > BTW, I just realized function attributes for xsave and avx512 don't work > on MSVC (see > https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630). > Not sure if you care about it. Its an easy fix (see > https://gcc.godbolt.org/z/Pebdj3vMx). Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the configure programs for meson. pg_attribute_target will be empty on MSVC, and I believe we only support meson builds there. -- nathan
Re: New "raw" COPY format
On Tue, Oct 29, 2024 at 9:48 AM Joel Jacobson wrote: > > > --- > > It's a bit odd to me to use the delimiter as a EOL marker in raw > > format, but probably it's okay. > > > > --- > > - if (cstate->opts.format != COPY_FORMAT_BINARY) > > + if (cstate->opts.format == COPY_FORMAT_RAW && > > + cstate->opts.delim != NULL) > > + { > > + /* Output the user-specified delimiter between rows */ > > + CopySendString(cstate, cstate->opts.delim); > > + } > > + else if (cstate->opts.format == COPY_FORMAT_TEXT || > > +cstate->opts.format == COPY_FORMAT_CSV) > > > > Since it sends the delimiter as a string, even if we specify the > > delimiter to '\n', it doesn't send the new line (i.e. ASCII LF, 10). > > For example, > > > > postgres(1:904427)=# copy (select '{"j" : 1}'::jsonb) to stdout with > > (format 'raw', delimiter '\n'); > > {"j": 1}\npostgres(1:904427)=# > > > > I think there is a similar problem in COPY FROM; if we set a delimiter > > to '\n' when doing COPY FROM in raw format, it expects the string '\n' > > as a line termination but not ASCII LF(10). I think that input data > > normally doesn't use the string '\n' as a line termination. > > You need to use E'\n' to get ASCII LF(10), since '\n' is just a delimiter > consisting of backslash followed by "n". > > Is this a problem? Since any string can be used as delimiter, > I think it would be strange if we parsed it and replaced the string > with a different string. > > Another thought: > > Maybe we shouldn't default to no delimiter after all, > maybe it would be better to default to the OS default EOL, It seems to be useful for loading unstructured non-delimited text files such as log files. Users can set the delimiter an empty when loading the entire file to a single column. On the other hand, I think that If we make it default it might not be necessary to allow the delimiter to be multi bytes. It would be flexible but it would not necessarily be necessary. Also, it would be somewhat odd to me that we can use multi bytes characters as the delimiter only in 'raw' mode, but not in 'text' mode. > maybe a final delimiter should always be written at the end, > so that when exporting a single json field, it would get exported > to the text file with \n at the end, which is what most text editor > does when saving a .json file. Agreed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: CREATE OR REPLACE MATERIALIZED VIEW
On 2024-09-05 22:33 +0200, Erik Wienhold wrote: > On 2024-07-27 02:45 +0200, Erik Wienhold wrote: > > On 2024-07-12 16:49 +0200, Said Assemlal wrote: > > > > My initial idea, while writing the patch, was that one could replace the > > > > matview without populating it and then run the concurrent refresh, like > > > > this: > > > > > > > > CREATE OR REPLACE MATERIALIZED VIEW foo AS ... WITH NO DATA; > > > > REFRESH MATERIALIZED VIEW CONCURRENTLY foo; > > > > > > > > But that won't work because concurrent refresh requires an already > > > > populated matview. > > > > > > > > Right now the patch either populates the replaced matview or leaves it > > > > in an unscannable state. Technically, it's also possible to skip the > > > > refresh and leave the old data in place, perhaps by specifying > > > > WITH *OLD* DATA. New columns would just be null. Of course you can't > > > > tell if you got stale data without knowing how the matview was replaced. > > > > Thoughts? > > > > > > I believe the expectation is to get materialized views updated whenever it > > > gets replaced so likely to confuse users ? > > > > I agree, that could be confusing -- unless it's well documented. The > > attached 0003 implements WITH OLD DATA and states in the docs that this > > is intended to be used before a concurrent refresh. > > > > Patch 0001 now covers all matview cases in psql's tab completion. I > > missed some of them with v1. > > Here's a rebased version due to conflicts with f683d3a4ca and > 1e35951e71. No other changes since v2. rebased -- Erik >From eefed7938bc1511a1ae998f67c67661c366484ec Mon Sep 17 00:00:00 2001 From: Erik Wienhold Date: Tue, 21 May 2024 18:35:47 +0200 Subject: [PATCH v4 1/3] Add CREATE OR REPLACE MATERIALIZED VIEW --- .../sgml/ref/create_materialized_view.sgml| 15 +- src/backend/commands/createas.c | 207 ++ src/backend/commands/tablecmds.c | 8 +- src/backend/commands/view.c | 106 ++--- src/backend/parser/gram.y | 15 ++ src/bin/psql/tab-complete.in.c| 15 +- src/include/commands/view.h | 3 + src/include/nodes/parsenodes.h| 2 +- src/include/nodes/primnodes.h | 1 + src/test/regress/expected/matview.out | 191 src/test/regress/sql/matview.sql | 108 + 11 files changed, 582 insertions(+), 89 deletions(-) diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml index 62d897931c..5e03320eb7 100644 --- a/doc/src/sgml/ref/create_materialized_view.sgml +++ b/doc/src/sgml/ref/create_materialized_view.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name +CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name [ (column_name [, ...] ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) ] @@ -60,6 +60,17 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name Parameters + +OR REPLACE + + + Replaces a materialized view if it already exists. + Specifying OR REPLACE together with + IF NOT EXISTS is an error. + + + + IF NOT EXISTS @@ -67,7 +78,7 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name Do not throw an error if a materialized view with the same name already exists. A notice is issued in this case. Note that there is no guarantee that the existing materialized view is anything like the one that would - have been created. + have been created, unless you use OR REPLACE instead. 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 5c92e48a56..d4bc5e5c08 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -79,55 +79,151 @@ static void intorel_destroy(DestReceiver *self); static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into) { - CreateStmt *create = makeNode(CreateStmt); - bool is_matview; + bool is_matview, +replace = false; char relkind; - Datum toast_options; - const char *const validnsps[] = HEAP_RELOPT_NAMESPACES; + Oid matviewOid = InvalidOid; ObjectAddress intoRelationAddr; /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */ is_matview = (into->viewQuery != NULL); relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION; - /* - * Create the target relation by faking up a CREATE TABLE parsetree and - * passing it to DefineRelation. - */ - create->relation = into->rel; - create->tableElts = attrList; - create->inhRelations = NIL; - create->ofTypename = NULL; - create->constraints = NIL; - create->options = into->options; - create->oncommit = into->onCommit; - create->tablespacename = into->tableSpaceName; - create->if_not_exists = false; - create->accessMeth
Re: per backend I/O statistics
Hi, On Tue, Oct 08, 2024 at 04:28:39PM +, Bertrand Drouvot wrote: > > > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: > > > > Okay, per the above and the persistency of the stats. > > Great, I'll work on an updated patch version then. > I spend some time on this during the last 2 days and I think we have 3 design options. === GOALS === But first let's sump up the goals that I think we agreed on: - Keep pg_stat_io as it is today: give the whole server picture and serialize the stats to disk. - Introduce per-backend IO stats and 2 new APIs to: 1. Provide the IO stats for "my backend" (through say pg_my_stat_io), this would take care of the stats_fetch_consistency. 2. Retrieve the IO stats for another backend (through say pg_stat_get_backend_io(pid)) that would _not_ take care of stats_fetch_consistency, as: 2.1/ I think that there is no use case (there is no need to get others backends I/O statistics while taking care of the stats_fetch_consistency) 2.2/ That could be memory expensive to store a snapshot for all the backends (depending of the number of backend created) - There is no need to serialize the per-backend IO stats to disk (no point to see stats for backends that do not exist anymore after a re-start). - The per-backend IO stats should be variable-numbered (not fixed), as per up-thread discussion. === OPTIONS === So, based on this, I think that we could: Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this KIND take care of the aggregated view (pg_stat_io) and the per-backend stats. Option 2: let PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO that would be variable-numbered. Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that would be variable-numbered and store the "aggregated stats aka pg_stat_io" in shared memory (not part of the variable-numbered hash). Per-backend stats could be aggregated into "pg_stat_io" during the flush_pending_cb call for example. === BEST OPTION? === I would opt for Option 2 as: - The stats system is currently not designed for Option 1 and our goals (for example the shared_data_len is used to serialize but also to fetch the entries, see pgstat_fetch_entry()) so that would need some hack to serialize only a part of them and still be able to fetch them all). - Mixing "fixed" and "variable" in the same KIND does not sound like a good idea (though that might be possible with some hacks, I don't think that would be easy to maintain). - Having the per-backend as "variable" in its dedicated kind looks more reasonable and less error-prone. - I don't think there is a stats design similar to option 3 currently, so I'm not sure there is a need to develop something new while Option 2 could be done. - Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization part. - Option 2 seems to offer more flexibility (as compare to Option 1 and 3). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Having problems generating a code coverage report
On Wed, Oct 30, 2024 at 5:21 PM Aleksander Alekseev wrote: > I'm using Xubuntu 24.04 LTS and my lcov version is: > > ``` > $ lcov --version > lcov: LCOV version 2.0-1 > ``` I use Debian unstable for most of my day to day work. Apparently Debian unstable has exactly the same version of lcov as Ubuntu 24.04. I've also been unable to generate coverage reports for some time (at least on Debian, with LCOV version 2.0-1). -- Peter Geoghegan
Re: define pg_structiszero(addr, s, r)
+/* + * Test if a memory region starting at p and of size len is full of zeroes. + */ +static inline bool +pg_mem_is_all_zeros(const void *ptr, size_t len) The function comment should say 'ptr' instead of 'p', right? == Kind Regards, Peter Smith. Fujitsu Australia
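For readers skimming the thread: the semantics being discussed amount to this
minimal byte-wise sketch. It is an illustration only, not the patch's actual
helper (whose implementation and callers are in the attached diff):

#include <stdbool.h>
#include <stddef.h>

/* Reference semantics: true iff every byte in [ptr, ptr + len) is zero. */
static inline bool
mem_is_all_zeros_sketch(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}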
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Hi~ I did some performance test these days, and I have some findings. From the archive months ago, I found there were discussions about which type of TupleTableSlot to use for buffering tuples. A single column mat view was used for evaluation. Finally we used virtual one. However when I test with a 32-columns mat view, I get regression. Test case: -- prepare create table test as select i as id0, i + 1 as id1, i + 2 as id2, i + 3 as id3, i + 4 as id4, i + 5 as id5, i + 6 as id6, i + 7 as id7, i + 8 as id8, i + 9 as id9, i + 10 as id10, i + 11 as id11, i + 12 as id12, i + 13 as id13, i + 14 as id14, i + 15 as id15, i + 0.01 as f0, i + 0.1 as f1, i + 0.2 as f2, i + 0.3 as f3, i + 0.4 as f4, i + 0.5 as f5, i + 0.6 as f6, i + 0.7 as f7, i + 0.8 as f8, i + 0.9 as f9, i + 1.01 as f10, i + 1.1 as f11, i + 1.2 as f12, i + 1.3 as f13, i + 1.4 as f14, i + 1.5 as f15, i + 1.6 as f16 from generate_series(1,500) i; -- run create materialized view m1 as select * from test; HEAD: Time: 13615.542 ms (00:13.616) Time: 13545.706 ms (00:13.546) Time: 13578.475 ms (00:13.578) Patched Time: 20112.734 ms (00:20.113) Time: 19996.957 ms (00:19.997) Time: 19936.871 ms (00:19.937) I did a quick perf, the overhead seems to come from virtual tuple materialization. HEAD: 12.29% postgres[.] pg_checksum_block 6.33% postgres[.] GetPrivateRefCountEntry 5.40% postgres[.] pg_comp_crc32c_sse42 4.54% [kernel][k] copy_user_enhanced_fast_string 2.69% postgres[.] BufferIsValid 1.52% postgres[.] XLogRecordAssemble Patched: 11.75% postgres[.] tts_virtual_materialize 8.87% postgres[.] pg_checksum_block 8.17% postgres[.] slot_deform_heap_tuple 8.09% postgres[.] heap_compute_data_size 6.17% postgres[.] fill_val 3.81% postgres[.] heap_fill_tuple 3.37% postgres[.] tts_virtual_copyslot 2.62% [kernel][k] copy_user_enhanced_fast_string Not sure if it is a universal situation. — Regards, Jingtang
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Wed, Oct 30, 2024 at 03:54:32PM -0500, Nathan Bossart wrote: > I'll manage. 0001 was a doozy to back-patch, and this obviously isn't a > terribly pressing issue, so I plan to wait until after the November minor > release to apply this. Okay by me. > I'm having second thoughts on 0002, so I'll > probably leave that one uncommitted, but IMHO we definitely need 0003 to > prevent this issue from reoccurring. I liked your idea behind 0002, FWIW. We do that for a lot of fields already in a Relation. -- Michael signature.asc Description: PGP signature
Re: Add isolation test template in injection_points for wait/wakeup/detach
On Wed, Oct 30, 2024 at 09:27:13AM +, Bertrand Drouvot wrote: > Thanks! LGTM. Thanks for the review. Applied that, then. -- Michael signature.asc Description: PGP signature
Re: define pg_structiszero(addr, s, r)
On Thu, Oct 31, 2024 at 07:55:45AM +1100, Peter Smith wrote: > +/* > + * Test if a memory region starting at p and of size len is full of zeroes. > + */ > +static inline bool > +pg_mem_is_all_zeros(const void *ptr, size_t len) > > The function comment should say 'ptr' instead of 'p', right? Yes. +static inline bool +pg_mem_is_all_zeros(const void *ptr, size_t len) While we're talking about wordsmithing things, I would not choose "mem" for this routine, just stick to "memory". There is not much in the code that does memory-specific things like what you are proposing here. Still, this would be more consistent with the macros for memory barriers at least. Hence, "pg_memory_is_all_zeros()" makes more sense? The location of memutils.h is sensible. + if (pg_mem_is_all_zeros(pagebytes , (BLCKSZ / sizeof(size_t Extra space not required after pagebytes. -- Michael signature.asc Description: PGP signature
Re: Avoid detoast overhead when possible
Nikita Malakhov writes:
> 1) In short the idea behind the Json improvements is to introduce iterators
> for Json objects to extract these objects value by value,

OK here.

> and modification
> of in-memory and on-disk representations to hold partially-detoasted Json
> objects (tree nodes with need-detoast state) for in-memory, and to store
> key-value map in on-disk representation to extract only required
> values.

I have trouble following here. Could you explain this using the example
below? What will be done differently from the current behavior?

CREATE TABLE t(jbcol jsonb);
INSERT INTO t values ('{"a": 1, "b": 2}');  // new storage format?
SELECT jbcol->'a', jbcol->'b' FROM t;       // How does ExprInterp.c work with this?

> 2) Backward compatibility would not be a problem because in on-disk
> representation we could distinguish old version from new.

OK, so you do change the storage format. If so, what does the new format
look like? That would be an important factor to discuss.

> there are enough service bits for that.

OK, good news.

> But servers not updated with this patch could not parse new data, of
> course.

It looks that way to me too.

>
> 3) Yes, it definitely has to. These changes are very complex and invasive,
> that's the reason I haven't finished the patch.

This is the area I am really worried about. I think you could prepare a
high-level design document for review first. It would be great if some more
experienced people could have a look at it; however, I know that is hard
because of their bandwidth. You need such a document anyway: without the
details, it is hard for other people to understand the idea in your mind.

>
> 4) It doesn't seem worthy at first glance. Do you have any ideas on
> this?

I think the more important questions are (a) what the new storage looks
like, and (b) how it works with the expression execution engine. Without
that information, I think it is too soon to talk about this.

--
Best Regards
Andy Fan
Re: Add ExprState hashing for GROUP BY and hashed SubPlans
On Tue, 29 Oct 2024 at 22:47, David Rowley wrote: > I've attached an updated patch with a few other fixes. Whilr checking > this tonight, noticed that master does not use > SubPlanState.tab_eq_funcs for anything. I resisted removing that in > this patch. Perhaps a follow-on patch can remove that. I suspect it's > not been used for a long time now, but I didn't do the archaeology > work to find out. 3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a rebased patch. David v3-0001-Use-ExprStates-for-hashing-in-GROUP-BY-and-SubPla.patch Description: Binary data
Count and log pages set all-frozen by vacuum
Vacuum currently counts and logs the number of pages of a relation with newly frozen tuples. It does not count the number of pages newly set all-frozen in the visibility map. The number of pages set all-frozen in the visibility map by a given vacuum can be useful when analyzing which normal vacuums reduce the number of pages future aggressive vacuum need to scan. Also, empty pages that are set all-frozen in the VM don't show up in the count of pages with newly frozen tuples. When making sense of the result of visibilitymap_summary() for a relation, it's helpful to know how many pages were set all-frozen in the VM by each vacuum. The attached patch set makes visibilitymap_set() return the old value of the block's VM bits and then uses that to increment a new counter in the LVRelState. Here is example logging output with the patch: create table foo (a int, b int) with (autovacuum_enabled = false); insert into foo select generate_series(1,1000), 1; delete from foo where a > 50; vacuum (freeze, verbose) foo; finished vacuuming "melanieplageman.public.foo": index scans: 0 pages: 4 removed, 1 remain, 5 scanned (100.00% of total) tuples: 950 removed, 50 remain, 0 are dead but not yet removable frozen: 1 pages from table (20.00% of total) had 50 tuples frozen. 5 pages set all-frozen in the VM - Melanie v1-0003-Count-pages-set-all-frozen-in-VM-during-vacuum.patch Description: Binary data v1-0002-Make-visibilitymap_set-return-previous-state-of-v.patch Description: Binary data v1-0001-Rename-LVRelState-frozen_pages.patch Description: Binary data
RE: Popcount optimization using AVX512
> Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the > configure programs for meson. pg_attribute_target will be empty on MSVC, and > I > believe we only support meson builds there. Right. __has_attribute (target) produces a compiler warning on MSVC: https://gcc.godbolt.org/z/EfWGxbvj3. Might need to guard that with #if defined(__has_attribute) to get rid of it. > > -- > nathan
Re: Pgoutput not capturing the generated columns
On Thu, Oct 31, 2024 at 1:14 PM vignesh C wrote: > > On Thu, 31 Oct 2024 at 04:42, Peter Smith wrote: > > > > On Thu, Oct 31, 2024 at 3:16 AM vignesh C wrote: > > > > > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila wrote: > > > > > > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote: > > > > > > > > > > Thank you for reporting this issue. The attached v46 patch addresses > > > > > the problem and includes some adjustments to the comments. Thanks to > > > > > Amit for sharing the comment changes offline. > > > > > > > > > > > > > Pushed. Kindly rebase and send the remaining patches. > > > > > > Thanks for committing this patch, here is a rebased version of the > > > remaining patches. > > > > > > > Hi, > > > > I found that the docs of src/sgml/ddl.sgml [1] are still saying: > > > > > > Generated columns are skipped for logical replication and cannot be > > specified in a CREATE PUBLICATION column list. > > > > > > But that is contrary to the new behaviour after the "Replicate > > generated columns when specified in the column list." commit yesterday > > [2]. > > > > It looks like an oversight. I think updating that paragraph should > > have been included with yesterday's commit. > > Thanks for the findings, the attached patch has the changes for the same. > LGTM. == Kind Regards, Peter Smith. Fujitsu Australia
Having problems generating a code coverage report
Hi hackers, Recently I tried to build a code coverage report according to the documentation [1]. When using the following command: ``` time sh -c 'git clean -dfx && meson setup --buildtype debug -DPG_TEST_EXTRA="kerberos ldap ssl" -Db_coverage=true -Dldap=disabled -Dssl=openssl -Dcassert=true -Dtap_tests=enabled -Dprefix=/home/eax/pginstall build && ninja -C build && PG_TEST_EXTRA=1 meson test -C build && ninja -C build coverage-html' ``` ... I get: ``` geninfo: ERROR: Unexpected negative count '-3' for /home/eax/projects/c/postgresql/src/port/snprintf.c:532. Perhaps you need to compile with '-fprofile-update=atomic (use "geninfo --ignore-errors negative ..." to bypass this error) Traceback (most recent call last): File "/usr/bin/meson", line 40, in sys.exit(mesonmain.main()) File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 294, in main return run(sys.argv[1:], launcher) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 282, in run return run_script_command(args[1], args[2:]) ^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 223, in run_script_command return module.run(script_args) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 206, in run return coverage(options.outputs, options.source_root, ^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 123, in coverage subprocess.check_call([lcov_exe, File "/usr/lib/python3.12/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['lcov', '--directory', '/home/eax/projects/c/postgresql/build', '--capture', '--output-file', '/home/eax/projects/c/postgresql/build/meson-logs/coverage.info.run', '--no-checksum', '--rc', 'branch_coverage=1']' returned non-zero exit status 1. ninja: build stopped: subcommand failed. ``` If I add -fprofile-update=atomic as suggested: ``` time CFLAGS="-fprofile-update=atomic" CXXFLAGS="-fprofile-update=atomic" sh -c 'git clean -dfx && meson setup --buildtype debug -DPG_TEST_EXTRA="kerberos ldap ssl" -Db_coverage=true -Dldap=disabled -Dssl=openssl -Dcassert=true -Dtap_tests=enabled -Dprefix=/home/eax/pginstall build && ninja -C build && PG_TEST_EXTRA=1 meson test -C build && ninja -C build coverage-html' ``` ... 
I get: ``` genhtml: ERROR: duplicate merge record src/include/catalog Traceback (most recent call last): File "/usr/bin/meson", line 40, in sys.exit(mesonmain.main()) File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 294, in main return run(sys.argv[1:], launcher) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 282, in run return run_script_command(args[1], args[2:]) ^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 223, in run_script_command return module.run(script_args) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 206, in run return coverage(options.outputs, options.source_root, ^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 150, in coverage subprocess.check_call([genhtml_exe, File "/usr/lib/python3.12/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['genhtml', '--prefix', '/home/eax/projects/c/postgresql/build', '--prefix', '/home/eax/projects/c/postgresql', '--output-directory', '/home/eax/projects/c/postgresql/build/meson-logs/coveragereport', '--title', 'Code coverage', '--legend', '--show-details', '--branch-coverage', '/home/eax/projects/c/postgresql/build/meson-logs/coverage.info']' returned non-zero exit status 1. ninja: build stopped: subcommand failed. ``` I'm using Xubuntu 24.04 LTS and my lcov version is: ``` $ lcov --version lcov: LCOV version 2.0-1 ``` I tried using Autotools with the same results. Pretty confident it worked before. I'm wondering if anyone else experienced this lately and/or knows a workaround. [1]: https://www.postgresql.org/docs/current/regress-coverage.html -- Best regards, Aleksander Alekseev
Re: Interrupts vs signals
On Wed, Oct 30, 2024 at 01:23:54PM -0400, Robert Haas wrote: > On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas wrote: >> We can provide backwards compatibility macros and a new facility to >> allocate custom interrupt bits. But how big of a problem is this anyway? >> In an in-person chat last week, Andres said something like "this will >> break every extension". > > Seems hyperbolic. Most extensions that rely on bgworkers, at least. > However, that was also pretty easy to fix. This seems like it might be > a little more complicated. The advantage of the breakage is also to force extension maintainers to look at the changes because these have benefits. I'd like to think that 6f3bd98ebfc0 was a good move overall even if it broke come compilations and these require more PG_VERSION_NUM magic. For the extension maintainer hat on, it is always annoying, but not really that bad in the long-term. >> We have a few options for how to deal with backwards-compatibility for >> extensions: >> >> A) If we rip off the bandaid in one go and don't provide any >> backwards-compatibility macros, we will break 96 extensions. Most of >> them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with >> corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With >> #ifdef for compatibility with older postgres versions) >> >> B) If we provide backwards-compatibility macros so that simple Latch >> calls on MyLatch continue working, we will break about 14 extensions. >> They will need some tweaking like in option A). A bit more than the >> simple cases in option A), but nothing too difficult. >> >> C) We could provide "forward-compatibility" macros in a separate header >> file, to make the new "SetInterrupt" etc calls work in old PostgreSQL >> versions. Many of the extensions already have a header file like this, >> see e.g. citusdata/citus/src/include/pg_version_compat.h, >> pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea >> to provide a semi-official header file like this on the Postgres wiki, >> to help extension authors. It would encourage extensions to use the >> latest idioms, while still being able to compile for older versions. >> >> I'm leaning towards option C). Let's rip off the band-aid, but provide >> documentation for how to adapt your extension code. And provide a >> forwards-compatibility header on the wiki, that extension authors can >> use to make the new Interrupt calls work against old server versions. > > I don't know which of these options is best, but I don't find any of > them categorically unacceptable. Looking at the compatibility macros of 0008 for the latches with INTERRUPT_GENERAL_WAKEUP under latch.h, the changes are not that bad to adapt to, IMO. It reminds of f25968c49697: hard breakage, no complaints I've heard of because I guess that most folks have been using an in-house compatibility headers. A big disadvantage of B is that someone may decide to add new code in core that depends on the past routines, and we'd better avoid that for this new layer of APIs for interrupt handling. A is a subset of C: do a hard switch in the core code, with C mentioning a compatibility layer in the wiki that does not exist in the core code. Any of A or C is OK, I would not choose B for the core backend. -- Michael signature.asc Description: PGP signature
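As a concrete, purely illustrative example of option C, an extension's own
compatibility header could carry something like the following. The new-style
names follow this thread's proposal and are assumptions, not a settled API,
and the version cutoff is a placeholder:

/* Hypothetical forward-compatibility shim for out-of-core code. */
#if PG_VERSION_NUM < 180000
#include "miscadmin.h"
#include "storage/latch.h"

/* On old servers every "interrupt" maps onto the process latch. */
#define INTERRUPT_GENERAL_WAKEUP	0	/* placeholder, ignored below */
#define RaiseInterrupt(interrupt)	SetLatch(MyLatch)
#define ResetInterrupt(interrupt)	ResetLatch(MyLatch)
#define WaitInterrupt(interrupt, wakeEvents, timeout, wait_event_info) \
	WaitLatch(MyLatch, (wakeEvents), (timeout), (wait_event_info))
#endif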
Re: Cleanup SubPlanstate
On Wed, 30 Oct 2024 at 03:34, Rafia Sabih wrote: > While reviewing a related patch, it came to the notice that tab_eq_funcs in > the SubPlanState is not used. Hence the patch. Thanks for picking that up. I didn't quite understand why you adjusted the header comment for the TupleHashEntryData struct when the field you removed was in the SubPlanState struct, so I left that part out. However, I did fix the existing typo in that comment. I've pushed the resulting patch. David
RE: Conflict detection for update_deleted in logical replication
Dear Mikhail,

Thanks for the reply!

>> Anyway, this topic introduces huge complexity and is not mandatory for
>> update_deleted detection. We can work on it in later versions based on
>> the needs.
>
> From my perspective, this is critical for databases. REINDEX CONCURRENTLY
> is typically run in production databases on regular basic, so any
> master-master system should be unaffected by it.

I think what I said was not understood correctly. The main point here is
that an index scan is not required to detect update_deleted: in the first
version, workers can do a normal sequential scan instead. This workaround
definitely does not affect REINDEX CONCURRENTLY.

Once the patch is in good shape or has been pushed, we can add support for
using an index to find the dead tuples, and at that time we can consider
how to ensure the index contains entries for dead tuples.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Re: CREATE OR REPLACE MATERIALIZED VIEW
On Tue, Jul 02, 2024 at 01:46:21PM +0300, Aleksander Alekseev wrote: > I can imagine how this may impact many applications and upset many > software developers worldwide. Was there even a precedent (in the > recent decade or so) when PostgreSQL broke the SQL syntax? We're usually very careful about that, and maintaining a backward-compatible grammar has a minimal cost in the parser. The closest thing I can think of that has a rather complicated grammar is COPY, which has *two* legacy grammars still supported, one for ~7.3 and one for ~9.0. > To clarify, I'm not opposed to this idea. If we are fine with breaking > backward compatibility on the SQL level, this would allow dropping the > support of inherited tables some day, a feature that in my humble > opinion shouldn't exist (I realize this is another and very debatable > question though). I just don't think this is something we ever do in > this project. But I admit that this information may be incorrect > and/or outdated. I am not sure that there is much to gain with this proposal knowing that the commands with matviews have been around for quite a long time now. Particularly, after looking at 0001, you'd see that it shortcuts a couple of areas of the CTAS code because that's what we are relying on when building the initial data of matviews. Hence, implementation-wise in the backend, matviews are much closer to physical relations than views. This is trying to make matviews behave more consistently with views. This topic has been mentioned once on pgsql-general back in 2019, for reference: https://www.postgresql.org/message-id/CAAWhf%3DPHch2H3ekYnbafuwqWqwyRok8WVPaDxKosZE4GQ2pq5w%40mail.gmail.com -- Michael signature.asc Description: PGP signature
Re: [PATCH] Add array_reverse() function
On Wed, Oct 30, 2024 at 10:11:20PM +0300, Пополитов Владлен wrote: > Makes me wonder if we should have an utility function which would > contain common code for array_shuffle() and array_reverse(). > Considering inconveniences such an approach caused in case of common > code for text and bytea types [1] I choose to copy the code and modify > it for the task. We are talking about 20 lines copied over to reverse the order of these elements that require their own inner for the element lookups and manipulations. FWIW, I can live with this burden rather than invent a routine that encapsulates both as this has also the disadvantage of hiding more internals across more layers. The abstraction with array_reverse_n() is unnecessary, but you have an argument with the consistency and the shuffling inner routine. Perhaps somebody will reuse that to invent a function that works on a subset, where the comment on top array_reverse_n() about the type caching still works. Anyway, no objections to what the patch defines and does. So it looks rather OK seen from here, as you are proposing. -- Michael signature.asc Description: PGP signature
Re: Pgoutput not capturing the generated columns
On Thu, 31 Oct 2024 at 04:42, Peter Smith wrote: > > On Thu, Oct 31, 2024 at 3:16 AM vignesh C wrote: > > > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila wrote: > > > > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote: > > > > > > > > Thank you for reporting this issue. The attached v46 patch addresses > > > > the problem and includes some adjustments to the comments. Thanks to > > > > Amit for sharing the comment changes offline. > > > > > > > > > > Pushed. Kindly rebase and send the remaining patches. > > > > Thanks for committing this patch, here is a rebased version of the > > remaining patches. > > > > Hi, > > I found that the docs of src/sgml/ddl.sgml [1] are still saying: > > > Generated columns are skipped for logical replication and cannot be > specified in a CREATE PUBLICATION column list. > > > But that is contrary to the new behaviour after the "Replicate > generated columns when specified in the column list." commit yesterday > [2]. > > It looks like an oversight. I think updating that paragraph should > have been included with yesterday's commit. Thanks for the findings, the attached patch has the changes for the same. Regards, Vignesh From 6cf390c33d1b88836da2ec0753562568b20b39e1 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Thu, 31 Oct 2024 07:36:37 +0530 Subject: [PATCH] Update documentation for generated columns in logical replication The previous commit (745217a051) introduced support for logical replication of generated columns when specified alongside a column list. However, the documentation for generated columns was not updated to reflect this change. This commit addresses that oversight by adding clarification on the support for generated columns in the logical replication section. --- doc/src/sgml/ddl.sgml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index f6344b3b79..f02f67d7b8 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -514,8 +514,9 @@ CREATE TABLE people ( - Generated columns are skipped for logical replication and cannot be - specified in a CREATE PUBLICATION column list. + Generated columns can be replicated during logical replication by + including them in the column list of the + CREATE PUBLICATION command. -- 2.34.1
Re: Pgoutput not capturing the generated columns
On Thu, Oct 31, 2024 at 3:16 AM vignesh C wrote: > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila wrote: > > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote: > > > > > > Thank you for reporting this issue. The attached v46 patch addresses > > > the problem and includes some adjustments to the comments. Thanks to > > > Amit for sharing the comment changes offline. > > > > > > > Pushed. Kindly rebase and send the remaining patches. > > Thanks for committing this patch, here is a rebased version of the > remaining patches. > Hi, I found that the docs of src/sgml/ddl.sgml [1] are still saying: Generated columns are skipped for logical replication and cannot be specified in a CREATE PUBLICATION column list. But that is contrary to the new behaviour after the "Replicate generated columns when specified in the column list." commit yesterday [2]. It looks like an oversight. I think updating that paragraph should have been included with yesterday's commit. == [1] https://github.com/postgres/postgres/blob/master/doc/src/sgml/ddl.sgml [2] https://github.com/postgres/postgres/commit/745217a051a9341e8c577ea59a87665d331d4af0 Kind Regards, Peter Smith. Fujitsu Australia
Re: Add ExprState hashing for GROUP BY and hashed SubPlans
On 10/31/24 08:16, David Rowley wrote:
> On Tue, 29 Oct 2024 at 22:47, David Rowley wrote:
>> I've attached an updated patch with a few other fixes. While checking
>> this tonight, I noticed that master does not use SubPlanState.tab_eq_funcs
>> for anything. I resisted removing that in this patch. Perhaps a follow-on
>> patch can remove that. I suspect it's not been used for a long time now,
>> but I didn't do the archaeology work to find out.
>
> 3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a rebased
> patch.

Thanks for sharing this. I still need to dive deeply into the code. But I
have one annoying user case where the user complained that SQL Server was
about 4x faster than Postgres, and I guess it is a good benchmark for your
code. This query is remarkable because of its high grouping computation
load. Of course, I can't provide the user's data, but I have prepared a
synthetic test to reproduce the case (see attachment).

Comparing master with and without your patch, the first thing I see is much
more extensive memory usage (see the complete explains in the attachment):

Current master:
---
 Partial HashAggregate  (cost=54492.60..55588.03 rows=19917 width=889) (actual time=20621.028..20642.664 rows=10176 loops=9)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 74513kB

Patched:
 Partial HashAggregate  (cost=54699.91..55799.69 rows=19996 width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 261  Memory Usage: 527905kB  Disk Usage: 4832656kB

I wonder what causes the extra memory consumption, but it is hard to decide
on the patch's positive outcome for now.

--
regards, Andrei Lepikhov

/* -- Master
 Finalize HashAggregate  (cost=88853.57..89949.01 rows=19917 width=889) (actual time=20960.694..20994.725 rows=4 loops=1)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 280081kB
   ->  Gather  (cost=55492.60..72521.63 rows=159336 width=889) (actual time=20640.902..20680.058 rows=91580 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial HashAggregate  (cost=54492.60..55588.03 rows=19917 width=889) (actual time=20621.028..20642.664 rows=10176 loops=9)
               Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
               Batches: 1  Memory Usage: 74513kB
               Worker 0:  Batches: 1  Memory Usage: 66329kB
               Worker 1:  Batches: 1  Memory Usage: 58137kB
               Worker 2:  Batches: 1  Memory Usage: 58137kB
               Worker 3:  Batches: 1  Memory Usage: 66329kB
               Worker 4:  Batches: 1  Memory Usage: 58137kB
               Worker 5:  Batches: 1  Memory Usage: 74521kB
               Worker 6:  Batches: 1  Memory Usage: 58137kB
               Worker 7:  Batches: 1  Memory Usage: 66329kB
               ->  Parallel Hash Join  (cost=4921.00..23223.33 rows=305066 width=879) (actual time=667.304..5047.261 rows=17361113 loops=9)
                     Hash Cond: ((t1.x1 = t2.x1) AND (t1.x2 = t2.x2) AND (t1.x3 = t2.x3) AND (t1.x4 = t2.x4))
                     ->  Parallel Seq Scan on t1  (cost=0.00..4671.00 rows=12500 width=868) (actual time=0.009..7.437 rows=1 loops=9)
                     ->  Parallel Hash  (cost=4671.00..4671.00 rows=12500 width=319) (actual time=666.929..666.930 rows=1 loops=9)
                           Buckets: 131072  Batches: 1  Memory Usage: 35776kB
                           ->  Parallel Seq Scan on t2  (cost=0.00..4671.00 rows=12500 width=319) (actual time=560.944..562.943 rows=1 loops=9)
 Planning Time: 1.619 ms
 JIT:
   Functions: 149
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 64.711 ms (Deform 13.706 ms), Inlining 846.944 ms, Optimization 2602.412 ms, Emission 1599.170 ms, Total 5113.237 ms
 Execution Time: 21030.501 ms
(29 rows)
*/

/* -- Master + patch
 Finalize HashAggregate  (cost=89193.21..90292.99 rows=19996 width=889) (actual time=469100.662..469136.040 rows=4 loops=1)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 8963609kB
   ->  Gather  (cost=55699.91..72796.49 rows=159968 width=889) (actual time=58325.864..202369.963 rows=92724640 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial HashAggregate  (cost=54699.91..55799.69 rows=19996 width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9)
               Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
               Batches: 261  Memory Usage: 527905kB  Disk Usage: 4832656kB
               Worker 0:  Batches: 1  Memory Usage: 247321kB
               Worker 1:  Batches: 261  Memory Usage: 527417kB  Disk Usage: 3850624kB
               Worker 2:  Batches: 261  Memory Usage: 527905kB  Disk Usage: 4304528kB
               Worker 3:  Batches: 325  Memory Usage: 527417kB  Disk Usage: 4315120kB
               Worker 4:  Batches: 261  Memory Usage:
Re: define pg_structiszero(addr, s, r)
Hi,

On Thu, Oct 31, 2024 at 09:59:35AM +0900, Michael Paquier wrote:
> On Thu, Oct 31, 2024 at 07:55:45AM +1100, Peter Smith wrote:
>> +/*
>> + * Test if a memory region starting at p and of size len is full of zeroes.
>> + */
>> +static inline bool
>> +pg_mem_is_all_zeros(const void *ptr, size_t len)
>>
>> The function comment should say 'ptr' instead of 'p', right?
>
> Yes.

Thank you both for looking at it. Agree, done in the new version attached.

> +static inline bool
> +pg_mem_is_all_zeros(const void *ptr, size_t len)
>
> While we're talking about wordsmithing things, I would not choose
> "mem" for this routine, just stick to "memory". There is not much in
> the code that does memory-specific things like what you are proposing
> here. Still, this would be more consistent with the macros for memory
> barriers at least. Hence, "pg_memory_is_all_zeros()" makes more
> sense?

That makes sense to me, done that way in the attached.

> The location of memutils.h is sensible.

Thanks for sharing your thoughts on it.

> + if (pg_mem_is_all_zeros(pagebytes , (BLCKSZ / sizeof(size_t))))
>
> Extra space not required after pagebytes.

Fat finger here, thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 88f5ebbbee3cdc7b773a1d0f5a7cf6f9d909b784 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Tue, 10 Sep 2024 01:22:02 +0000
Subject: [PATCH v6] New pg_memory_is_all_zeros(ptr, len) inline function

This new function allows to test if a memory region starting at ptr and
of size len is full of zeroes.
---
 src/backend/storage/page/bufpage.c               | 13 +
 src/backend/utils/activity/pgstat_bgwriter.c     |  5 +++--
 src/backend/utils/activity/pgstat_checkpointer.c |  7 +++
 src/backend/utils/activity/pgstat_relation.c     |  7 ++-
 src/include/utils/memutils.h                     | 16
 5 files changed, 25 insertions(+), 23 deletions(-)
 19.6% src/backend/storage/page/
 58.8% src/backend/utils/activity/
 21.4% src/include/utils/

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index be6f1f62d2..5ee1e58cd4 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 {
 	PageHeader	p = (PageHeader) page;
 	size_t	   *pagebytes;
-	int			i;
 	bool		checksum_failure = false;
 	bool		header_sane = false;
-	bool		all_zeroes = false;
 	uint16		checksum = 0;
 
 	/*
@@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	}
 
 	/* Check all-zeroes case */
-	all_zeroes = true;
 	pagebytes = (size_t *) page;
-	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
-	{
-		if (pagebytes[i] != 0)
-		{
-			all_zeroes = false;
-			break;
-		}
-	}
 
-	if (all_zeroes)
+	if (pg_memory_is_all_zeros(pagebytes, (BLCKSZ / sizeof(size_t))))
 		return true;
 
 	/*
diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c
index 364a7a2024..85d53d82f2 100644
--- a/src/backend/utils/activity/pgstat_bgwriter.c
+++ b/src/backend/utils/activity/pgstat_bgwriter.c
@@ -17,6 +17,7 @@
 
 #include "postgres.h"
 
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
 
@@ -30,7 +31,6 @@ void
 pgstat_report_bgwriter(void)
 {
 	PgStatShared_BgWriter *stats_shmem = &pgStatLocal.shmem->bgwriter;
-	static const PgStat_BgWriterStats all_zeroes;
 
 	Assert(!pgStatLocal.shmem->is_shutdown);
 	pgstat_assert_is_up();
@@ -39,7 +39,8 @@ pgstat_report_bgwriter(void)
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid unnecessarily modifying the stats entry.
 	 */
-	if (memcmp(&PendingBgWriterStats, &all_zeroes, sizeof(all_zeroes)) == 0)
+	if (pg_memory_is_all_zeros(&PendingBgWriterStats,
+							   sizeof(struct PgStat_BgWriterStats)))
 		return;
 
 	pgstat_begin_changecount_write(&stats_shmem->changecount);
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e18..258337cfb8 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -17,6 +17,7 @@
 
 #include "postgres.h"
 
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
 
@@ -29,8 +30,6 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0};
 void
 pgstat_report_checkpointer(void)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_CheckpointerStats all_zeroes;
 	PgStatShared_Checkpointer *stats_shmem = &pgStatLocal.shmem->checkpointer;
 
 	Assert(!pgStatLocal.shmem->is_shutdown);
@@ -40,8 +39,8 @@ pgstat_report_checkpointer(void)
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid unnecessarily modifying the stats entry.
 	 */
-	if (memcmp(&PendingCheckpointerStats, &all_zer
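As a reference for readers without the attachment, a minimal sketch of the
helper under discussion might look like the following; the name, header
location, and any word-at-a-time optimizations are whatever the attached v6
patch actually uses, so treat this purely as an illustration:

#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch only: return true if the len bytes starting at ptr
 * are all zero.  The real patch may compare word-sized chunks rather than
 * individual bytes.
 */
static inline bool
pg_memory_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}

	return true;
}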
Re: Having problems generating a code coverage report
On Wed, Oct 30, 2024 at 05:26:49PM -0400, Peter Geoghegan wrote:
> I use Debian unstable for most of my day to day work. Apparently
> Debian unstable has exactly the same version of lcov as Ubuntu 24.04.
>
> I've also been unable to generate coverage reports for some time (at
> least on Debian, with LCOV version 2.0-1).

So do I, for both the Debian sid and the lcov parts.
--
Michael
Re: Allowing pg_recvlogical to create temporary replication slots
On Wed, Oct 30, 2024 at 09:00:59AM +0100, Peter Eisentraut wrote:
> I think you should explain a bit why you want to do that, what use case or
> scenario this is meant to address.

Indeed.  Note also this page giving a couple more guidelines with more
details:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

We also use what we call a "commit fest" to keep track of the patches under
review for the current development cycle.  It is a process where folks can
register for patch reviews and provide feedback.  There is still time to
register what you have here for the commit fest beginning on the 1st of
November:
https://commitfest.postgresql.org/50/
--
Michael
Re: Inval reliability, especially for inplace updates
On Mon, Oct 28, 2024 at 02:27:03PM +0530, Nitin Motiani wrote:
> On Thu, Oct 24, 2024 at 8:24 AM Noah Misch wrote:
>> With the releases wrapping in 2.5 weeks, I'm ambivalent about pushing this
>> before the release or after.  Pushing before means fewer occurrences of
>> corruption, but pushing after gives more bake time to discover these
>> changes were defective.  It's hard to predict which helps users more, on a
>> risk-adjusted basis.  I'm leaning toward pushing this week.  Opinions?
>
> I lean towards pushing after the release. This is based on my
> assumption that since this bug has been around for a while, it is
> (probably) not hit often. And a few weeks delay is better than
> introducing a new defect.

I had pushed this during the indicated week, before your mail.  Reverting it
is an option.  Let's see if more opinions arrive.
Fix for Extra Parenthesis in pgbench progress message
Hi,

I noticed an issue in the pgbench progress message where an extra closing
parenthesis appears at the end of the line, as shown below:

  700 of 1000 tuples (70%) of pgbench_accounts done (elapsed 19.75 s, remaining 8.46 s))

This occurs when running commands like pgbench -i -s100, and is caused by
leftover characters from a previous, longer line when the progress message is
redrawn with \r via fprintf.

I made a patch to address this by adding extra spaces before the \r, which
overwrite any remaining characters.  While effective, I recognize this
solution may not be the most sophisticated.  A more refined solution, such as
padding the string to a fixed width, might be ideal for cases like this (a
sketch of that idea follows the patch below).

Best,
Yushi Ogiwara

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e658d060ad..be984e0c7f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5004,7 +5004,7 @@ initPopulateTable(PGconn *con, const char *table, int64 base,
 			double		elapsed_sec = PG_TIME_GET_DOUBLE(pg_time_now() - start);
 			double		remaining_sec = ((double) total - j) * elapsed_sec / j;
 
-			chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s)%c",
+			chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s) %c",
 							j, total,
 							(int) ((j * 100) / total), table,
 							elapsed_sec, remaining_sec, eol);
@@ -5018,7 +5018,7 @@ initPopulateTable(PGconn *con, const char *table, int64 base,
 			/* have we reached the next interval (or end)? */
 			if ((j == total) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
 			{
-				chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s)%c",
+				chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s) %c",
								j, total,
								(int) ((j * 100) / total), table,
								elapsed_sec, remaining_sec, eol);
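As a rough sketch of the string-padding idea mentioned above (a hypothetical
helper, not the attached patch): build the progress line into a buffer and
print it with a field width at least as wide as the previously printed line,
so any leftover characters are overwritten before the carriage return.

#include <stdio.h>
#include <string.h>

/* width of the previously printed progress line (hypothetical state) */
static int	last_progress_width = 0;

static void
print_progress_line(const char *msg)
{
	int			width = (int) strlen(msg);
	int			pad = (width > last_progress_width) ? width : last_progress_width;

	/* pad with spaces up to the previous width, then return to column 0 */
	fprintf(stderr, "%-*s\r", pad, msg);
	last_progress_width = width;
}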
Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
On Thu, Oct 17, 2024 at 10:41 PM Thomas Munro wrote:
> Thanks! I'm going to go ahead and commit this.

(Sorry for the delay, I got distracted by pgconf.eu.)

Today I set out to commit this patch, and wrote a proper commit message to
explain the code provenance, the circumstances that led to it, and the future
conditions that will allow us to delete it in a few years.  Please see
attached.

In the process I struck a potential snag:

https://llvm.org/LICENSE.txt
https://en.wikipedia.org/wiki/Apache_License

There are a couple of cases of dual-licensed code in our tree where we
explicitly used the Boost alternative instead of Apache 2.  I plead complete
ignorance of this topic and defer to those who know about such things: can we
actually do this?  I guess at a minimum a copy of the licence would need to
appear somewhere -- perhaps under src/backend/jit/llvm?  Clause 4d says that
if you modified the code you have to say so prominently, but I did that at
the top (and the changes are completely trivial, just some #ifdef swizzling
to massage some function prototypes to suit older LLVMs).  Otherwise I
understand it to be generally "BSD-like" (sans advert clause), but there is
also some stuff about patents, which surely isn't relevant to this in
practice... but I know that some projects object to it on principle and
because it smells like contract law, or something -- not an area I am well
informed about.  Who should I be asking?

(Naively, I wondered: could there be some kind of fair use concept for
back-patching fixes to broken libraries that you're merely a user of, where
you can be excused from the burdens of a distributor?  Yeah, wishful thinking
I'm sure.)

From 5d9ef5f83a79f56ddd6230c7e9428525bab0d82c Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 27 Aug 2024 08:58:48 +1200
Subject: [PATCH v7] Monkey-patch LLVM code to fix ARM relocation bug.

Supply a new memory manager for RuntimeDyld that avoids overflowing the
jump distance in linked code on large memory machines.  This is the same
Apache v2-licensed code that is normally used inside the LLVM library,
except this copy has been patched by Michael Smith in his proposed fix
https://www.github.com/llvm/llvm-project/pull/71968.

We hereby slurp it into our own source tree, after moving into a new
namespace llvm::backport and making some minor adjustments so that it
can be compiled with older LLVM versions as far back as 12.  It's harder
to make it work on even older LLVM versions, but it doesn't seem likely
that people are really using them so that is not investigated for now.
(We were probably remiss in not disclaiming support for ancient LLVM
releases sooner; starting from PostgreSQL 16 we've been more aggressive
about trimming version support to match the trailing end of common
software distributions).

The problem could also have been addressed by switching to JITLink, and
that is the LLVM project's recommended solution as RuntimeDyld is about
to be deprecated.  We'll have to do that soon enough anyway, and then
when the LLVM version support window advances far enough in a few years
we'll be able to delete this code.  Unfortunately that wouldn't work
today for some relevant releases of LLVM where JITLink is missing or
incomplete.

Several other projects have already taken the approach of back-porting
this fix into their fork of LLVM, which is a vote of confidence despite
the lack of a commit into LLVM as of today.  We don't have our own copy
of LLVM so we can't do exactly what they've done.  We just pick up the
whole patched class and inject an instance of it into the regular LLVM
code.

The changes that we've had to make to our copy can be seen by diffing
our SectionMemoryManager.{h,cpp} files against the ones in the tree of
the pull request.  The LLVM project hasn't chosen to commit the fix yet,
and even if it did, it wouldn't be back-ported into the releases of LLVM
that most of our users care about, so there is not much point in waiting
any longer for that.  If they make further changes and commit it to
LLVM 19 or 20, we'll still need this for older versions, but we may want
to resynchronize our copy.

This should fix the spate of crash reports we've been receiving lately
from users on large memory ARM systems.

Back-patch to all supported releases.

Co-authored-by: Thomas Munro
Co-authored-by: Anthonin Bonnefoy
Reported-by: Anthonin Bonnefoy
Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
---
 src/backend/jit/llvm/Makefile                 |   3 +-
 src/backend/jit/llvm/SectionMemoryManager.cpp | 394 ++
 src/backend/jit/llvm/llvmjit.c                |   7 +
 src/backend/jit/llvm/llvmjit_wrap.cpp         |  20 +
 src/backend/jit/llvm/meson.build              |   1 +
 src/include/jit/SectionMemoryManager.h        | 227 ++
 src/include/jit/llvmjit.h                     |   8 +
 src/include/jit/llvmjit_backport.h            |  20 +
 src/tools/pginclude/headerscheck
Re: Consider pipeline implicit transaction as a transaction block
On Wed, Oct 30, 2024 at 04:06:20PM +0100, Jelte Fennema-Nio wrote:
> On Wed, 30 Oct 2024 at 10:15, Anthonin Bonnefoy wrote:
>> The attached patch adds the detection of implicit transactions started
>> by a pipeline in CheckTransactionBlock, avoiding warnings when
>> commands like `set local` are called within a pipeline, and making the
>> detection of transaction block coherent with what's done in
>> IsInTransactionBlock and PreventInTransactionBlock.
>
> +1 seems like a reasonable change. That's indeed a bit strange.

I think that you're right.

@Tom added in CC: Is there a specific reason why CheckTransactionBlock() did
not include a check based on XACT_FLAGS_PIPELINING when it got introduced in
20432f873140, while IsInTransactionBlock() considers it?  This was discussed
here:
https://www.postgresql.org/message-id/flat/17434-d9f7a064ce2a88a3%40postgresql.org
--
Michael
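To make the question concrete, here is a hedged sketch of the kind of change
being discussed; the surrounding checks in CheckTransactionBlock() are
paraphrased rather than copied, and only the XACT_FLAGS_PIPELINING test
reflects the proposal:

#include "postgres.h"

#include "access/xact.h"

/*
 * Sketch only: treat an implicit transaction opened by pipelining as a
 * transaction block, mirroring what IsInTransactionBlock() already does.
 * The other checks are paraphrased from the existing function.
 */
static void
CheckTransactionBlock(bool isTopLevel, bool throwError, const char *stmtType)
{
	if (IsTransactionBlock())
		return;

	if (IsSubTransaction())
		return;

	/* proposed addition: a pipeline's implicit transaction counts too */
	if (MyXactFlags & XACT_FLAGS_PIPELINING)
		return;

	if (!isTopLevel)
		return;

	ereport(throwError ? ERROR : WARNING,
			(errcode(ERRCODE_NO_ACTIVE_SQL_TRANSACTION),
			 errmsg("%s can only be used in transaction blocks", stmtType)));
}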
Re: define pg_structiszero(addr, s, r)
On Thu, Oct 31, 2024 at 05:52:49AM +0000, Bertrand Drouvot wrote:
> That makes sense to me, done that way in the attached.

Seems kind of OK seen from here.  I am wondering if others have more comments
about the name of the macro, its location, the fact that we still have
pagebytes in bufpage.c, etc.
--
Michael
Re: unicode test programs don't build with meson
On 18.10.24 15:17, Aleksander Alekseev wrote:
> 1) Add libintl directly to the affected programs, like
>
>  case_test = executable('case_test',
>    ['case_test.c'],
> -  dependencies: [frontend_port_code, icu],
> +  dependencies: [frontend_port_code, icu, libintl],
>    include_directories: inc,
>    link_with: [common_static, pgport_static],
>    build_by_default: false,
>
> To me the first option seems to be the most natural one (does the
> program depend on the library? add it to the list of dependencies for
> this program). Also it seems to be most consistent with what we
> currently have, see how icu is listed.

As there were no other opinions forthcoming, I have committed it like this.
Thanks!
Re: Allowing pg_recvlogical to create temporary replication slots
On 27.10.24 13:37, Torsten Förtsch wrote:
> This is my very first message to this mailing list. Please advise if I am
> making any mistakes in the procedure.

Welcome!

> The attached patch enables pg_recvlogical to create a temporary slot. What
> is the next step in the process to get this change into official postgres?

I think you should explain a bit why you want to do that, what use case or
scenario this is meant to address.
Re: define pg_structiszero(addr, s, r)
On 29.10.24 14:58, Bertrand Drouvot wrote: hi, On Tue, Oct 29, 2024 at 10:23:37AM +0100, Peter Eisentraut wrote: On 29.10.24 08:54, Bertrand Drouvot wrote: +static inline bool +pg_mem_is_all_zeros(const char *p, size_t len) This should be a void * pointer please. Thanks for looking at it! Yeah better, done in v4 attached. Sorry for the confusion. I didn't mean to discourage you from the const qualifier. That should still be there.
Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 16:51, Peter Eisentraut wrote:
> Thoughts?

+	snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(logptr));
+	pq_sendstring(&buf, xloc);

nit: I feel that sending the LSN as a string seems unnecessarily wasteful of
bytes.  I'd rather send it as its binary representation.
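For comparison, a sketch of what the binary encoding could look like on the
server side, using the existing pqformat.h helpers; the surrounding message
layout is whatever this proposal ends up with, only the 8-byte network-order
encoding of the LSN is illustrated here:

#include "postgres.h"

#include "access/xlogdefs.h"
#include "libpq/pqformat.h"

/*
 * Sketch only: append an LSN to an outgoing protocol message as 8 bytes in
 * network byte order, instead of formatting it as a "%X/%X" text string.
 */
static void
append_lsn_binary(StringInfo buf, XLogRecPtr lsn)
{
	pq_sendint64(buf, (uint64) lsn);
}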
Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 17:58, Heikki Linnakangas wrote:
>> - The Query message sends an LSN to wait for. (This doesn't handle the
>> extended query protocol yet.)
>
> I'd suggest adding a new message type for this, so that it works the
> same with simple and extended query. Or if you just want to wait without
> issuing any query.

Big +1 to this.

After thinking about it more, I think this would make a fancy pooler much
easier to implement.  The pooler could simply send such a new WaitForLSN
message whenever it wants to, e.g. before handing off the server connection
to the client, instead of having to intercept every Query/Execute message
that the client is sending and modify it in place before sending it on to
the server.

Writing the previous down made me realize that using a separate message
would be nice for this use case too, as opposed to including the LSN in
ReadyForQuery.  If the fancy pooler wants to configure these LSNs
transparently for a client that has not set the protocol parameter, it would
need to strip the new LSN field from the ReadyForQuery message before
forwarding it to the client.  Stripping out a whole message is generally
easier to do than modifying messages in place.