Re: Useless field ispartitioned in CreateStmtContext
Hi On Thu, 24 Oct 2024, 18:08 hugo, <2689496...@qq.com> wrote: > Hi! > >When looking at the partition-related code, I found that the > ispartitioned > > field in CreateStmtContext is not used. It looks like we can safely remove > it and > > avoid invalid assignment logic. > > > > Here's a very simple fix, any suggestion? > > > > diff --git a/src/backend/parser/parse_utilcmd.c > b/src/backend/parser/parse_utilcmd.c > > index 1e15ce10b48..6dea85cc2dc 100644 > > --- a/src/backend/parser/parse_utilcmd.c > > +++ b/src/backend/parser/parse_utilcmd.c > > @@ -89,7 +89,6 @@ typedef struct > > List *alist; /* "after list" of things > to do after creating > > * the > table */ > > IndexStmt *pkey; /* PRIMARY KEY index, if > any */ > > - boolispartitioned; /* true if table is partitioned */ > > PartitionBoundSpec *partbound; /* transformed FOR VALUES */ > > boolofType; /* true if statement > contains OF typename */ > > } CreateStmtContext; > > @@ -246,7 +245,6 @@ transformCreateStmt(CreateStmt *stmt, const char > *queryString) > > cxt.blist = NIL; > > cxt.alist = NIL; > > cxt.pkey = NULL; > > - cxt.ispartitioned = stmt->partspec != NULL; > > cxt.partbound = stmt->partbound; > > cxt.ofType = (stmt->ofTypename != NULL); > > @@ -3401,7 +3399,6 @@ transformAlterTableStmt(Oid relid, AlterTableStmt > *stmt, > > cxt.blist = NIL; > > cxt.alist = NIL; > > cxt.pkey = NULL; > > - cxt.ispartitioned = (rel->rd_rel->relkind == > RELKIND_PARTITIONED_TABLE); > > cxt.partbound = NULL; > > cxt.ofType = false; > > > > -- > > Best regards, > > hugozhang > I don't see any cf entry for this patch, so I created one[1]. I added Alena and myself as reviewers. I can't find your account on cf, please add yourself as author [1] https://commitfest.postgresql.org/50/5340/ >
Re: Add isolation test template in injection_points for wait/wakeup/detach
On Wed, Oct 30, 2024 at 07:26:07AM +, Bertrand Drouvot wrote:
> 1 === Nit
>
> s/Permutations like this one/Permutations like the following commented out
> one/ ?
>
> 2 ===
>
> s/back its result/back its result once woken up/ ?
>
> 3 ===
>
> s/faster than the wait/before the SQL function doing the wait returns its
> result/ ?
>
> With 2 === and 3 === I think it's easier to "link" both sentences that way.

What do you think about the updated version attached?
--
Michael

diff --git a/src/test/modules/injection_points/specs/basic.spec b/src/test/modules/injection_points/specs/basic.spec
index 7f44e3ddc3..47db676c0c 100644
--- a/src/test/modules/injection_points/specs/basic.spec
+++ b/src/test/modules/injection_points/specs/basic.spec
@@ -26,7 +26,10 @@
 step wakeup2 { SELECT injection_points_wakeup('injection-points-wait'); }
 step detach2 { SELECT injection_points_detach('injection-points-wait'); }
 
 # Detach after wait and wakeup.
-# This permutation is proving to be unstable on FreeBSD, so disable for now.
+# Permutations like the following one commented out should be avoided, as
+# the detach may finish before the SQL function doing the wait returns
+# its result. It is recommended to use wakeups as the last permutation
+# should a wait be done within an SQL function.
 #permutation wait1 wakeup2 detach2
 # Detach before wakeup. s1 waits until wakeup, ignores the detach.

signature.asc
Description: PGP signature
Consider pipeline implicit transaction as a transaction block
Hi,

When pipelining is used with an implicit transaction, the transaction starts with the first command and is committed with the Sync message. Functions like IsInTransactionBlock and PreventInTransactionBlock already treat this implicit transaction as a transaction block, relying on the XACT_FLAGS_PIPELINING flag. However, this is not the case in CheckTransactionBlock, the function used for things like warning when a local GUC is set outside of a transaction block.

The attached patch adds detection of implicit transactions started by a pipeline to CheckTransactionBlock, avoiding warnings when commands like `set local` are called within a pipeline and making the detection of transaction blocks coherent with what's done in IsInTransactionBlock and PreventInTransactionBlock.

One thing that's still not fixed by the patch is that the first command won't be seen as part of a transaction block. For example, with pgbench:

\startpipeline
SET LOCAL statement_timeout='200ms';
SELECT * FROM pgbench_accounts where aid=1;
\endpipeline

XACT_FLAGS_PIPELINING will only be set after the first command, so the warning about `set local` happening outside of a transaction block will still be generated. However, I'm not sure whether that is fixable (or worth fixing?): it would require knowing beforehand that there are multiple executes before the sync message, which doesn't seem doable.

Regards,
Anthonin

v01-0001-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data
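For context, a minimal sketch of the kind of check being discussed, assuming it hooks into CheckTransactionBlock() roughly as follows; the surrounding warning/error logic is elided, and the exact placement is an assumption rather than the attached patch:

/*
 * Illustrative sketch only (not the attached patch): treat an implicit
 * transaction opened by a pipeline as a transaction block, the same way
 * IsInTransactionBlock() already does, by checking XACT_FLAGS_PIPELINING.
 * The function shape follows src/backend/access/transam/xact.c; its
 * existing body is elided.
 */
static void
CheckTransactionBlock(bool isTopLevel, bool throwError, const char *stmtType)
{
	if (IsTransactionBlock())
		return;

	/* An implicit transaction driven by a pipeline counts as a block. */
	if (MyXactFlags & XACT_FLAGS_PIPELINING)
		return;

	/* ... existing "should be in a transaction block" warning/error ... */
}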
Re: define pg_structiszero(addr, s, r)
Hi, On Wed, Oct 30, 2024 at 09:09:54AM +0100, Peter Eisentraut wrote: > On 29.10.24 14:58, Bertrand Drouvot wrote: > > hi, > > > > On Tue, Oct 29, 2024 at 10:23:37AM +0100, Peter Eisentraut wrote: > > > On 29.10.24 08:54, Bertrand Drouvot wrote: > > > > +static inline bool > > > > +pg_mem_is_all_zeros(const char *p, size_t len) > > > > > > This should be a void * pointer please. > > > > Thanks for looking at it! > > Yeah better, done in v4 attached. > > Sorry for the confusion. I didn't mean to discourage you from the const > qualifier. That should still be there. doh! Of course we want it to be read only. No problem, I should have put more thoughts on it. Done in v5 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From e80e5cf946813185985bf5339ac996ccc0eab1f0 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Tue, 10 Sep 2024 01:22:02 + Subject: [PATCH v5] New pg_mem_is_all_zeros(p, len) inline function This new function allows to test if a memory region (starting at p) and of size len is full of zeroes. --- src/backend/storage/page/bufpage.c | 13 + src/backend/utils/activity/pgstat_bgwriter.c | 5 +++-- src/backend/utils/activity/pgstat_checkpointer.c | 7 +++ src/backend/utils/activity/pgstat_relation.c | 7 ++- src/include/utils/memutils.h | 16 5 files changed, 25 insertions(+), 23 deletions(-) 19.8% src/backend/storage/page/ 58.6% src/backend/utils/activity/ 21.5% src/include/utils/ diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index be6f1f62d2..ca4c68de1c 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; - int i; bool checksum_failure = false; bool header_sane = false; - bool all_zeroes = false; uint16 checksum = 0; /* @@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) } /* Check all-zeroes case */ - all_zeroes = true; pagebytes = (size_t *) page; - for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++) - { - if (pagebytes[i] != 0) - { - all_zeroes = false; - break; - } - } - if (all_zeroes) + if (pg_mem_is_all_zeros(pagebytes , (BLCKSZ / sizeof(size_t return true; /* diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c index 364a7a2024..398e1d57ca 100644 --- a/src/backend/utils/activity/pgstat_bgwriter.c +++ b/src/backend/utils/activity/pgstat_bgwriter.c @@ -17,6 +17,7 @@ #include "postgres.h" +#include "utils/memutils.h" #include "utils/pgstat_internal.h" @@ -30,7 +31,6 @@ void pgstat_report_bgwriter(void) { PgStatShared_BgWriter *stats_shmem = &pgStatLocal.shmem->bgwriter; - static const PgStat_BgWriterStats all_zeroes; Assert(!pgStatLocal.shmem->is_shutdown); pgstat_assert_is_up(); @@ -39,7 +39,8 @@ pgstat_report_bgwriter(void) * This function can be called even if nothing at all has happened. In * this case, avoid unnecessarily modifying the stats entry. 
*/ - if (memcmp(&PendingBgWriterStats, &all_zeroes, sizeof(all_zeroes)) == 0) + if (pg_mem_is_all_zeros(&PendingBgWriterStats, + sizeof(struct PgStat_BgWriterStats))) return; pgstat_begin_changecount_write(&stats_shmem->changecount); diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c index bbfc9c7e18..fb66a1a11d 100644 --- a/src/backend/utils/activity/pgstat_checkpointer.c +++ b/src/backend/utils/activity/pgstat_checkpointer.c @@ -17,6 +17,7 @@ #include "postgres.h" +#include "utils/memutils.h" #include "utils/pgstat_internal.h" @@ -29,8 +30,6 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0}; void pgstat_report_checkpointer(void) { - /* We assume this initializes to zeroes */ - static const PgStat_CheckpointerStats all_zeroes; PgStatShared_Checkpointer *stats_shmem = &pgStatLocal.shmem->checkpointer; Assert(!pgStatLocal.shmem->is_shutdown); @@ -40,8 +39,8 @@ pgstat_report_checkpointer(void) * This function can be called even if nothing at all has happened. In * this case, avoid unnecessarily modifying the stats entry. */ - if (memcmp(&PendingCheckpointerStats, &all_zeroes, - sizeof(all_zeroes)) == 0) + if (pg_mem_is_all_zeros(&PendingCheckpointerStats, + sizeof(struct PgStat_CheckpointerStats))) return; pgstat_begin_changecount_write(&stats_shmem->changecount); diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index 8a3f7d434c..bf1947f316 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -801,7 +801,6 @@ pgstat_twophase_postabort(Trans
Re: Add isolation test template in injection_points for wait/wakeup/detach
Hi, On Wed, Oct 30, 2024 at 06:21:04PM +0900, Michael Paquier wrote: > On Wed, Oct 30, 2024 at 07:26:07AM +, Bertrand Drouvot wrote: > > 1 === Nit > > > > s/Permutations like this one/Permutations like the following commented out > > one/ ? > > > > 2 === > > > > s/back its result/back its result once woken up/ ? > > > > 3 === > > > > s/faster than the wait/before the SQL function doing the wait returns its > > result/ ? > > > > With 2 === and 3 === I think it's easier to "link" both sentences that way. > > What do you think about the updated version attached? Thanks! LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Index AM API cleanup
I want to call out one particular aspect that is central to this patch series and that needs broader discussion and agreement.

The problem is that btree indexes (and to a lesser extent hash indexes) are hard-coded in various places. This includes various places in the optimizer (see patch v17-0018) that might be harder to understand, but also more easy-to-understand places such as:

- Only hash and btree are supported for replica identity. (see patch v17-0015)
- Only btree is supported for speculative insertion. (see patch v17-0016)
- Only btree is supported for foreign keys. (see patch v17-0017)
- Only btree can participate in merge join. (see patch v17-0020)

The problem in cases such as these, and some of them actually contain code comments about this, is that we somehow need to know what the equality operator (and in some cases ordering operator) for a type is. And the way this is currently done is that the btree and hash strategy numbers are hardcoded, and other index AMs are just not supported because there is no way to find that out from those.

We faced a similar issue for the temporal primary keys feature. One thing this feature had to overcome is that

- Only btree is supported for primary keys.

For that feature, we wanted to add support for gist indexes, and we had to again find a way to get the equals operator (and also the overlaps operator). The solution we ended up with there is to add a support function to gist that translates the strategy numbers used by the operator class to "well-known" strategy numbers that we can understand (commit 7406ab623fe). It later turned out that this problem actually might be a subset of the general problem being discussed here, so we can also change that solution if we find a more general solution here.

So the question is, how do we communicate these fundamental semantics of operators?

The solution we discussed for temporal primary keys is that we just mandate that certain strategy numbers have fixed meanings. This is how btree and hash work anyway, but this was never required for gist. And there are many gist opclass implementations already out there that don't use compatible strategy numbers, so that solution would not have worked (at least without some upgrading pain). Hence the idea of mapping the actual strategy numbers to a fixed set of well-known ones.

For the general case across all index AMs, we could similarly decree that all index AMs use certain strategy numbers for fixed purposes. But again, this already wouldn't work for gist and perhaps others already out there. So here again we need some kind of mapping to numbers with a fixed purpose. Or perhaps new index AMs can start out with the "correct" numbers and only some existing ones would need the mapping.

And then what do you map them to? One idea would be to map them to the existing btree numbers. This could work, but I always found it kind of confusing that you'd then have multiple sets of strategy numbers in play, one native to the index AM or opclass, and one sort of global one. Another idea is to map them to another thing altogether, a new enum. This is what this patch series does, essentially.
But it turns out (so argues the patch series), that this enum already exists, namely

typedef enum RowCompareType
{
	/* Values of this enum are chosen to match btree strategy numbers */
	ROWCOMPARE_LT = 1,			/* BTLessStrategyNumber */
	ROWCOMPARE_LE = 2,			/* BTLessEqualStrategyNumber */
	ROWCOMPARE_EQ = 3,			/* BTEqualStrategyNumber */
	ROWCOMPARE_GE = 4,			/* BTGreaterEqualStrategyNumber */
	ROWCOMPARE_GT = 5,			/* BTGreaterStrategyNumber */
	ROWCOMPARE_NE = 6,			/* no such btree strategy */
} RowCompareType;

which in spite of the name, yes, sounds like exactly that kind of thing. And it's also used in many places for this kind of thing already.

The patch series adds index AM callbacks

    amtranslatestrategy
    amtranslaterctype

to convert between strategy number and RowCompareType, and also adds

    ROWCOMPARE_NONE
    ROWCOMPARE_INVALID

to handle cases where there is no mapping. I suppose for the temporal PK use, we would need to add something like ROWCOMPARE_OVERLAPS.

So this is the idea. To take a step back, I can see the following options:

1. Require all index AMs to use btree-compatible strategy numbers. (Previously rejected, too much upgrading mess.)

2. Provide mapping between index AM strategy numbers and btree strategy numbers. (This doesn't have a space for non-btree operations like overlaps.)

3. Provide mapping between index AM strategy numbers and the existing RT*StrategyNumber numbers. (We can expand the set as we want.) (This is what the existing mapping function for gist does.)

4. Provide mapping between index AM strategy numbers and some completely new set of numbers/symbols.

5. Provide mapping between index AM strategy numbers and RowCompareType (with som
Re: Fix for consume_xids advancing XIDs incorrectly
I made a new patch (skip_xid_correctly.diff) that incorporates the points we discussed: 1. Fix the issue that consume_xids consumes nxids+1 XIDs. 2. Update lastxid when calling GetTopFullTransactionId() to support nxids==1 case. 3. Forbid consume_xids when nxids==0. 4. Add comments explaining the return values of consume_xids and consume_xids_until, and the rationale for incrementing consumed++ when GetTopFullTransactionId() is called. Also, I made another patch (support_blksz_32k.diff) that supports the block size == 32K case. Best, Yushi Ogiwara diff --git a/src/test/modules/xid_wraparound/xid_wraparound.c b/src/test/modules/xid_wraparound/xid_wraparound.c index dce81c0c6d..63f3305ee9 100644 --- a/src/test/modules/xid_wraparound/xid_wraparound.c +++ b/src/test/modules/xid_wraparound/xid_wraparound.c @@ -26,6 +26,7 @@ static FullTransactionId consume_xids_common(FullTransactionId untilxid, uint64 /* * Consume the specified number of XIDs. + * Returns the last XID assigned by this function. */ PG_FUNCTION_INFO_V1(consume_xids); Datum @@ -34,11 +35,8 @@ consume_xids(PG_FUNCTION_ARGS) int64 nxids = PG_GETARG_INT64(0); FullTransactionId lastxid; - if (nxids < 0) + if (nxids <= 0) elog(ERROR, "invalid nxids argument: %lld", (long long) nxids); - - if (nxids == 0) - lastxid = ReadNextFullTransactionId(); else lastxid = consume_xids_common(InvalidFullTransactionId, (uint64) nxids); @@ -47,6 +45,7 @@ consume_xids(PG_FUNCTION_ARGS) /* * Consume XIDs, up to the given XID. + * Returns the last XID assigned by this function. */ PG_FUNCTION_INFO_V1(consume_xids_until); Datum @@ -88,8 +87,15 @@ consume_xids_common(FullTransactionId untilxid, uint64 nxids) * GetNewTransactionId registers them in the subxid cache in PGPROC, until * the cache overflows, but beyond that, we don't keep track of the * consumed XIDs. + * + * If no top-level XID is assigned, a new one is obtained, + * and the consumed XID counter is incremented. */ - (void) GetTopTransactionId(); + if(!FullTransactionIdIsValid(GetTopFullTransactionIdIfAny())) + { + lastxid = GetTopFullTransactionId(); + consumed++; + } for (;;) { diff --git a/src/test/modules/xid_wraparound/xid_wraparound.c b/src/test/modules/xid_wraparound/xid_wraparound.c index dce81c0c6d..95832fbecc 100644 --- a/src/test/modules/xid_wraparound/xid_wraparound.c +++ b/src/test/modules/xid_wraparound/xid_wraparound.c @@ -64,6 +64,14 @@ consume_xids_until(PG_FUNCTION_ARGS) PG_RETURN_FULLTRANSACTIONID(lastxid); } +/* + * These constants copied from .c files, because they're private. + */ +#define COMMIT_TS_XACTS_PER_PAGE (BLCKSZ / 10) +#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId)) +#define CLOG_XACTS_PER_BYTE 4 +#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE) + /* * Common functionality between the two public functions. */ @@ -115,7 +123,7 @@ consume_xids_common(FullTransactionId untilxid, uint64 nxids) * If we still have plenty of XIDs to consume, try to take a shortcut * and bump up the nextXid counter directly. */ - if (xids_left > 2000 && + if (xids_left > COMMIT_TS_XACTS_PER_PAGE && consumed - last_reported_at < REPORT_INTERVAL && MyProc->subxidStatus.overflowed) { @@ -153,14 +161,6 @@ consume_xids_common(FullTransactionId untilxid, uint64 nxids) return lastxid; } -/* - * These constants copied from .c files, because they're private. 
- */ -#define COMMIT_TS_XACTS_PER_PAGE (BLCKSZ / 10) -#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId)) -#define CLOG_XACTS_PER_BYTE 4 -#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE) - /* * All the interesting action in GetNewTransactionId happens when we extend * the SLRUs, or at the uint32 wraparound. If the nextXid counter is not close
Re: Pgoutput not capturing the generated columns
On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote:
>
> Thank you for reporting this issue. The attached v46 patch addresses
> the problem and includes some adjustments to the comments. Thanks to
> Amit for sharing the comment changes offline.
>

Pushed. Kindly rebase and send the remaining patches.

--
With Regards,
Amit Kapila.
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 07:49, Tatsuo Ishii wrote:
> > I think one would have to define that somehow. If it's useful, the
> > additional fields of both extensions could be appended, in some
> > defined order. But this is an interesting question to think about.
>
> I think this kind of extension, which adds a new field to an existing
> message type, should be implemented as v4 protocol.

Could you explain why you think a major version bump is needed? In what
situation do you care about this? Because for my use cases (client
implementations & pgbouncer) I don't think that would be necessary. If a
client doesn't send the _pq_.wait_for_lsn protocol parameter, it will never
receive this new version.

I don't really see a problem with having two protocol parameters change the
same message. Yes, you have to define what the result of their combination
is, but that seems trivial to do for additions of fields. You either define
the first protocol parameter that was added to the spec to add its field
before the second. Or you could do it based on something non-time-dependent,
like the alphabetic order of the protocol parameter, or the alphabetic order
of the fields that they add.

The main guarantees I'd like to uphold are listed here:
https://www.postgresql.org/message-id/cageczqr5pmud4q8atyz0gooj1xnh33g7g-mlxfml1_vrhbz...@mail.gmail.com
RE: Conflict detection for update_deleted in logical replication
Dear Hou,

Thanks for updating the patch! Here are my comments.

01. CreateSubscription

```
+	if (opts.detectupdatedeleted && !track_commit_timestamp)
+		ereport(ERROR,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("detecting update_deleted conflicts requires \"%s\" to be enabled",
+					   "track_commit_timestamp"));
```

I don't think this guard is sufficient. I found two cases:

* Create a subscription with detect_update_deleted = false and track_commit_timestamp = true, then alter detect_update_deleted to true.
* Create a subscription with detect_update_deleted = true and track_commit_timestamp = true, then set track_commit_timestamp to false and restart the instance.

Based on that, how about detecting the inconsistency in the apply worker? It would check the parameters and error out when it starts or re-reads the catalog. If we want to detect this in SQL commands, it can be done in parse_subscription_options().

02. AlterSubscription()

```
+	ApplyLauncherWakeupAtCommit();
```

The reason why the launcher should wake up here is different from other places. Can we add a comment that it is needed to track/untrack the xmin?

03. build_index_column_bitmap()

```
+	for (int i = 0; i < indexinfo->ii_NumIndexAttrs; i++)
+	{
+		int			keycol = indexinfo->ii_IndexAttrNumbers[i];
+
+		index_bitmap = bms_add_member(index_bitmap, keycol);
+	}
```

I feel we can assert that ii_IndexAttrNumbers is valid, because the passed index is a replica identity key.

04. LogicalRepApplyLoop()

Can we move the definition of "phase" into maybe_advance_nonremovable_xid() and make it static? The variable is used only by that function.

05. LogicalRepApplyLoop()

```
+	UpdateWorkerStats(last_received, timestamp, false);
```

The statistics seem incorrect: last_received was not sent at "timestamp", it had already been sent earlier. Do we really have to update here?

06. ErrorOnReservedSlotName()

I feel we should document that the slot name 'pg_conflict_detection' cannot be specified unconditionally.

07. General

update_deleted can happen without DELETE commands. Should we rename the conflict reason, like 'update_target_modified'? E.g., there is a 2-way replication system and the transactions below happen:

Node A:
  T1: INSERT INTO t (id, value) VALUES (1,1);	ts = 10.00
  T2: UPDATE t SET id = 2 WHERE id = 1;		ts = 10.02

Node B:
  T3: UPDATE t SET value = 2 WHERE id = 1;	ts = 10.01

Then T3 arrives at Node A after T2 has been executed. T3 tries to find id = 1 but finds a dead tuple instead. In this case, 'update_deleted' happens without any delete.

08. Others

Also, here is an analysis related to the truncation of commit timestamps. I worried about the case where commit timestamps might be removed, so that the detection would not work well. But it seems that entries can be removed only when they are behind GetOldestNonRemovableTransactionId(NULL), i.e., horizons.shared_oldest_nonremovable. That value is affected by replication slots, so the commit_ts entries that are interesting for us are not removed.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
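As a concrete illustration of the alternative raised in comment 01, a check run by the apply worker itself could look roughly like the fragment below; the detectupdatedeleted field name follows the quoted patch, and the exact error code and placement are assumptions:

/*
 * Sketch only: verify the parameter combination when the apply worker
 * starts (and whenever it re-reads pg_subscription), rather than only at
 * CREATE SUBSCRIPTION time.  MySubscription and track_commit_timestamp are
 * existing backend symbols; detectupdatedeleted comes from the patch
 * under review.
 */
if (MySubscription->detectupdatedeleted && !track_commit_timestamp)
	ereport(ERROR,
			errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			errmsg("detecting update_deleted conflicts requires \"%s\" to be enabled",
				   "track_commit_timestamp"));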
Re: protocol-level wait-for-LSN
>>> With this protocol extension, two things are changed:
>>>
>>> - The ReadyForQuery message sends back the current LSN.
>>
>> If other protocol extension X tries to add something to the
>> ReadyForQuery message too, what would happen?
>
> I think one would have to define that somehow. If it's useful, the
> additional fields of both extensions could be appended, in some
> defined order. But this is an interesting question to think about.

I think this kind of extension, which adds a new field to an existing
message type, should be implemented as a v4 protocol.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: AIO writes vs hint bits vs checksums
On 30/10/2024 04:21, Andres Freund wrote:

Attached is a, unfortunately long, series of patches implementing what I
described upthread.

Review of the preparatory patches:

0001 Add very basic test for kill_prior_tuples

We currently don't exercise this patch for gist and hash, which seems
somewhat criminal.

Interesting to use the isolationtester for this. There's just one session,
so you're just using it to define reusable steps with handy names. I'm fine
with that, but please add a comment to explain it. I wonder if it'd be more
straightforward to make it a regular pg_regress test though. There would be
some repetition, but would it be so bad?

You forgot to add the new test to 'isolation_schedule'.

typos:
"inex" -> "index"
"does something approximately reasonble" -> "do something approximately reasonable"

0002 heapam: Move logic to handle HEAP_MOVED into a helper function

A prep patch, which consolidates HEAP_MOVED handling into a helper. This
isn't strictly necessary, but I got bothered by it while writing this patch
series.

+1

+/*
+ * If HEAP_MOVED_OFF or HEAP_MOVED_IN are set on the tuple, remove them and
+ * adjust hint bits. See the comment for SetHintBits() for more background.
+ *
+ * This helper returns false if the row ought to be invisible, true otherwise.
+ */
+static inline bool
+HeapTupleCleanMoved(HeapTupleHeader tuple, Buffer buffer)
+{
+	TransactionId xvac;
+
+	/* only used by pre-9.0 binary upgrades */
+	if (likely(!(tuple->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN
+		return true;
+
+	...
+}

This is so unlikely these days that this function probably should not be
inlined anymore.

0003 bufmgr: Add BufferLockHeldByMe()

Useful for writing assertions in later patches.

+	if (mode == BUFFER_LOCK_EXCLUSIVE)
+		lwmode = LW_EXCLUSIVE;
+	else if (mode == BUFFER_LOCK_SHARE)
+		lwmode = LW_SHARED;
+	else
+	{
+		Assert(false);
+		pg_unreachable();
+		lwmode = LW_EXCLUSIVE;	/* assuage compiler */
+	}

I just learned a new word :-). This is fine, but to avoid the assuaging, you
could also do this:

if (mode == BUFFER_LOCK_EXCLUSIVE)
	lwmode = LW_EXCLUSIVE;
else
{
	Assert(mode == BUFFER_LOCK_SHARE);
	lwmode = LW_SHARED;
}

0004 heapam: Use exclusive lock on old page in CLUSTER

When I started looking into this I wanted to make sure that it's actually
safe to skip setting hint bits in all places. One place that's *not* safe is
cluster - we assume that all hint bits are set in rewrite_heap_tuple().

This might not strictly be required, but it seemed very fragile as is.

While using an exclusive lock does seem like it makes it a bit safer, it's
not a very good fix, as we still dirty the old buffer, which seems like a
shame when one uses VACUUM FULL.

+	/*
+	 * To be able to guarantee that we can set the hint bit, acquire an
+	 * exclusive lock on the old buffer. We need the hint bits to be set
+	 * as otherwise reform_and_rewrite_tuple() -> rewrite_heap_tuple() ->
+	 * heap_freeze_tuple() will get confused.
+	 *

Hmm, I don't see any comments in the reform_and_rewrite_tuple() ->
rewrite_heap_tuple() -> heap_freeze_tuple() path about requiring hint bits
to be set. How does it get confused?

0005 heapam: Only set tuple's block once per page in pagemode

Minor, but measurable, optimization.

+1, this makes sense independently of all the other patches.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 1748eafa100..d0cb4b1e29b 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -981,6 +981,9 @@ heapgettup_pagemode(HeapScanDesc scan, linesleft = scan->rs_ntuples; lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1; + /* set block once per page, instead of doing so for every tuple */ + BlockIdSet(&tuple->t_self.ip_blkid, scan->rs_cblock); + /* lineindex now references the next or previous visible tid */ continue_page: @@ -995,7 +998,7 @@ continue_page: tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp); tuple->t_len = ItemIdGetLength(lpp); - ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff); + tuple->t_self.ip_posid = lineoff; /* skip any tuples that don't match the scan key */ if (key != NULL && Should probably use ItemPointerSetBlockNumber and ItemPointerSetOffsetNumber. 0006 heapam: Add batch mode mvcc check and use it in page mode This is a good bit faster on its own, but is required to avoid a performance regression later, when setting hint bits only only when no IO going on
Re: Conflict detection for update_deleted in logical replication
Hello Hayato!

> Note that apply workers can stop due to some reasons (e.g., disabling subscriptions,
> error out, deadlock...). In this case, the snapshot cannot eb registered by the
> worker and index can be re-built during the period.

However, the xmin of a slot affects replication_slot_xmin in ProcArrayStruct, so it might be straightforward to wait for it during concurrent index builds. We could consider adding a separate conflict_resolution_replication_slot_xmin to wait only for that.

> Anyway, this topic introduces huge complexity and is not mandatory for update_deleted
> detection. We can work on it in later versions based on the needs.

From my perspective, this is critical for databases. REINDEX CONCURRENTLY is typically run on production databases on a regular basis, so any master-master system should be unaffected by it.

Best regards,
Mikhail.
Re: Avoid detoast overhead when possible
Hi!

Sorry for the long delay; other tasks required my attention and this project was postponed, but we have some recent requests from clients and this work has to be revived.

1) In short, the idea behind the Json improvements is to introduce iterators for Json objects to extract these objects value by value, along with modifications of the in-memory and on-disk representations: in memory, to hold partially-detoasted Json objects (tree nodes with a need-detoast state), and on disk, to store a key-value map so that only the required values are extracted.

2) Backward compatibility would not be a problem, because in the on-disk representation we can distinguish the old version from the new one; there are enough service bits for that. But servers not updated with this patch could not parse the new data, of course.

3) Yes, it definitely has to. These changes are very complex and invasive; that's the reason I haven't finished the patch.

4) It doesn't seem worthwhile at first glance. Do you have any ideas on this?

Thank you for your interest!

On Wed, Oct 30, 2024 at 2:59 AM Andy Fan wrote:
> zhihuifan1...@163.com writes:
> > Hello Nikita,
>
> >> There's a view from the other angle - detoast just attributes that are
> >> needed (partial detoast), with optimized storage mechanics for JSONb.
> >
> > Very glad to know that, looking forward your design & patch!
>
> This is interesting, do you mind to provide a high level design of
> this. I'm not pushing to do this, but you asked me how I think about
> your proposal[1], so I think a high level design is a must for these
> questions. A first set of questions includes:
>
> 1. How to optimize the JSONb storage?
> 2. How to be compatible with older version?
> 3. How to integrate with existing ExprExecutionEngine? Looks like your
> proposal needs to know which "part" of jsonb fields are needed.
> 4. Is there any optimization if user want to get a full JSONB, like
> SELECT jb_col FROM t?
>
> [1] https://www.postgresql.org/message-id/CAN-LCVO3GZAKVTKNwwcezoc%3D9Lq%3DkU2via-BM3MXVdOq4tD9RQ%40mail.gmail.com
>
> --
> Best Regards
> Andy Fan

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Re: Reduce one comparison in binaryheap's sift down
On Tue, Oct 29, 2024 at 11:00 AM Nathan Bossart wrote:
> On Tue, Oct 29, 2024 at 11:21:15AM +0800, cca5507 wrote:
> > Agree, new version patch is attached.
>
> I might editorialize a bit, but overall this looks pretty good to me.
> Robert, would you like to commit this, or shall I?

I would be delighted if you could take care of it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Alias of VALUES RTE in explain plan
On Tue, Oct 29, 2024 at 10:49 PM Tom Lane wrote:
>
> Andrei Lepikhov writes:
> > -- New behavior
> > EXPLAIN (COSTS OFF, VERBOSE)
> > SELECT * FROM (VALUES (4),(2),(3),(1) ORDER BY t1.x LIMIT 2) AS t1(x);
> > SELECT * FROM (VALUES (4),(2),(3),(1) ORDER BY t1.x LIMIT 2) AS t1(x);
>
> After taking a closer look at that, yeah it's new behavior, and
> I'm not sure we want to change it. (The existing behavior is that
> you'd have to write 'column1' or '"*VALUES*".column1' in the
> subquery's ORDER BY.)
>
> This example also violates my argument that the user thinks they
> are attaching the alias directly to VALUES.
>
> So what I now think is that we ought to tweak the patch so that the
> parent alias is pushed down only when the subquery contains just VALUES,
> no other clauses. Per a look at the grammar, ORDER BY, LIMIT, and FOR
> UPDATE could conceivably appear alongside VALUES; although FOR UPDATE
> would draw "FOR UPDATE cannot be applied to VALUES", so maybe we needn't
> worry about it.
>
> Thoughts?

If the user writes it in this manner, I think they intend to attach the alias
to VALUES() since there's no other way to do it. What is weird is that they
can use the alias before it's declared. For the sake of eliminating this
weirdness, your proposed tweak sounds fine to me.

Even if we don't add that tweak, it's not easy for users to find out that
they can write the query this way. But it's better to plug the hole before
somebody starts exploiting it.

--
Best Wishes,
Ashutosh Bapat
Re: Eager aggregation, take 3
On Sat, Oct 5, 2024 at 11:30 PM Richard Guo wrote:
> Here’s a comparison of Execution Time and Planning Time for the seven
> queries with eager aggregation disabled versus enabled (best of 3).
>
> Execution Time:
>
>          EAGER-AGG-OFF     EAGER-AGG-ON
>
> q4       105787.963 ms     34807.938 ms
> q8         1407.454 ms      1654.923 ms
> q11       67899.213 ms     18670.086 ms
> q23       45945.849 ms     42990.652 ms
> q31       10463.536 ms     10244.175 ms
> q33        2186.928 ms      2217.228 ms
> q77        2360.565 ms      2416.674 ms

Could you attach the EXPLAIN ANALYZE output for these queries, with and
without the patch?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: general purpose array_sort
Hi,

Thanks for the updated patch set.

> > > +Datum
> > > +array_sort_order(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> > > +
> > > +Datum
> > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> >
> > Any reason not to specify array_sort in pg_proc.dat?
>
> It is specified in 0001 (see oid => '8810').

What I meant was that I don't think these wrapper functions are needed. I
think you can just do:

```
+{ oid => '8811', descr => 'sort array',
+  proname => 'array_sort', prorettype => 'anyarray',
+  proargtypes => 'anyarray bool', prosrc => 'array_sort'},   <-- array_sort is used directly in `prosrc`
```

... unless I'm missing something.

--
Best regards,
Aleksander Alekseev
Re: detoast datum into the given buffer as a optimization.
Hi!

Andy, I'm truly sorry, I removed the quoted messages just to avoid turning replies into walls of text.

On the detoast_attr_buffer topic - again, I agree with the reply above that proposes adding the required data length as a parameter; at least by doing so we explicitly make the user of this API pay attention to the length of the buffer and the data size, as is done in text_to_cstring_buffer, which you mentioned.

About printtup: I agree that direct detoasting into the buffer could give us some performance improvement, and it seems not too difficult to check, but as you can see there is a function call via fmgr, so it is better to try to find out why such a decision was made.

On Wed, Oct 30, 2024 at 2:47 AM Andy Fan wrote:
> Nikita Malakhov writes:
>
> > Hi!
> >
> > Sorry for misguiding you, I've overlooked va_rawsize with va_extinfo.
> > You're right, va_rawsize holds uncompressed size, and extinfo actual
> > storage size. This was not intentional.
>
> That's OK, so we are in the same page here.
>
> > I'd better not count on caller's do know detoasted data length,
> > and much more the buffer is correctly initialized, because we cannot
> > check that inside and must rely on the caller, which would for sure
> > lead to unexpected segfaults, I agree with Tom Lane's proposal above.
> > Other options seem to be more crude and error-prone here. This is
> > an internal data fetching function and it should not generate new kinds
> > of errors, I think.
>
> Tom's reply depends on the fact I was going to changing the "detoast_attr"
> to "detoast_attr_buffer", as I have expalined in my previous post. I
> don't think a [new] user provided buffer function is so harmful. What
> do you think about function "text_to_string_buffer"? This is also a
> part in my previous reply, but is ignored by your reply?
>
> > In case you're playing with this part of the code - I was always
> > confused with detoast_attr and detoast_external_attr functions
> > both marked as entry points of retrieving toasted data and both
> > look a lot the same. Have you ever thought of making a single
> > entry point by slightly redesigning this part?
>
> You can check [1] for a indepent improvements for this topic.
>
> [1] https://www.postgresql.org/message-id/874j4vcspl.fsf%40163.com
>
> --
> Best Regards
> Andy Fan

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
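To make the API shape being discussed a bit more concrete, a rough sketch follows; the function name and signature are illustrative assumptions, not the posted patch, while toast_raw_datum_size() is the existing helper used here to size the buffer:

/*
 * Hypothetical signature following the text_to_cstring_buffer() pattern:
 * the caller passes both the destination buffer and its size, so the
 * callee can report a too-small buffer instead of silently overrunning it.
 * This is not an existing PostgreSQL function.
 */
extern void detoast_attr_buffer(struct varlena *attr, char *dst, Size dst_len);

/* Possible caller pattern: size the buffer from the raw datum size first. */
Size		rawsize = toast_raw_datum_size(PointerGetDatum(attr));
char	   *buf = palloc(rawsize);

detoast_attr_buffer(attr, buf, rawsize);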
Re: AIO writes vs hint bits vs checksums
Hi,

Thanks for the quick review!

On 2024-10-30 14:16:51 +0200, Heikki Linnakangas wrote:
> On 24/09/2024 18:55, Andres Freund wrote:
> > What I'd instead like to propose is to implement the right to set hint bits as
> > a bit in each buffer's state, similar to BM_IO_IN_PROGRESS. Tentatively I
> > named this BM_SETTING_HINTS. It's only allowed to set BM_SETTING_HINTS when
> > BM_IO_IN_PROGRESS isn't already set and StartBufferIO has to wait for
> > BM_SETTING_HINTS to be unset to start IO.
> >
> > Naively implementing this, by acquiring and releasing the permission to set
> > hint bits in SetHintBits() unfortunately leads to a significant performance
> > regression. While the performance is unaffected for OLTPish workloads like
> > pgbench (both read and write), sequential scans of unhinted tables regress
> > significantly, due to the per-tuple lock acquisition this would imply.
>
> It'd be nice to implement this without having to set and clear
> BM_SETTING_HINTS every time.

True - but it's worth noting that we only need to set BM_SETTING_HINTS if
there are any hint bits that need to be set. So in the most common paths we
won't need to deal with it at all.

> Could we put the overhead on the FlushBuffer()
> instead?
>
> How about something like this:
>
> To set hint bits:
>
> 1. Lock page in SHARED mode.
> 2. Read BM_IO_IN_PROGRESS
> 3. If !BM_IO_IN_PROGRESS, set hint bits, otherwise don't
> 4. Unlock page
>
> To flush a buffer:
>
> 1. Lock page in SHARED mode
> 2. Set BM_IO_IN_PROGRESS
> 3. Read the lock count on the buffer lock, to see if we're the only locker.
> 4. If anyone else is holding the lock, upgrade it to exclusive mode, and
> immediately downgrade back to share mode.
> 5. calculate CRC, flush the buffer
> 6. Clear BM_IO_IN_PROGRESS and unlock page.
>
> This goes back to the idea of adding LWLock support for this, but the amount
> of changes could be pretty small. The missing operation we don't have today
> is reading the share-lock count on the lock in step 3. That seems simple to
> add.

I've played around with a bunch of ideas like this. There are two main
reasons I didn't like them that much in the end:

1) The worst case latency impacts seemed to make them not that interesting. A
buffer that is heavily contended with share locks might not get down to zero
share lockers for quite a while. That's not a problem for individual
FlushBuffer() calls, but it could very well add up to a decent sized delay
for something like a checkpoint that has to flush a lot of buffers.

Waiting for all pre-existing share lockers is easier said than done. We don't
record the acquisition order anywhere and a share-lock release won't wake
anybody if the lockcount doesn't reach zero. Changing that wouldn't exactly
be free and the cost would be borne by all lwlock users.

2) They are based on holding an lwlock. But it's actually quite conceivable
that we'd want to set something hint-bit-like without any lock, as we e.g.
currently do for freespace/. That would work with something like the approach
I chose here, but not if we rely on lwlocks.

E.g. with a bit of work we could actually do sequential scans without
blocking concurrent buffer modifications by carefully ordering when pd_lower
is modified and making sure that tuple header fields are written in a
sensible order.

> Acquiring the exclusive lock in step 4 is just a way to wait for all the
> existing share-lockers to release the lock. You wouldn't need to block new
> share-lockers.
> We have LW_WAIT_UNTIL_FREE, which is almost what we need, but
> it currently ignores share-lockers. So doing this "properly" would require
> more changes to LWLocks. Briefly acquiring an exclusive lock seems
> acceptable though.

I don't think LW_WAIT_UNTIL_FREE is that easily extended to cover something
like this, because share lockers can acquire/release the lock without waking
anybody. And changing that would be rather expensive for everyone...

> > But: We can address this and improve performance over the status quo! Today we
> > determine tuple visibility determination one-by-one, even when checking the
> > visibility of an entire page worth of tuples.
>
> Yes, batching the visibility checks makes sense in any case.

Cool. The wins by doing that are noticeable...

Greetings,

Andres Freund
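To make the buffer-state handshake Andres describes a bit more concrete, here is a very rough sketch of the acquisition side, assuming a BM_SETTING_HINTS flag in the buffer state as proposed in this thread; the helper name is invented for illustration, and the real patches batch this per page rather than per tuple:

/*
 * Illustrative only: BM_SETTING_HINTS is the flag proposed in this thread
 * (it does not exist in bufmgr today), and TryAcquireSettingHints() is an
 * invented helper name.  The idea is simply that hint bits may only be set
 * while BM_SETTING_HINTS is held, and it cannot be taken while
 * BM_IO_IN_PROGRESS is set.
 */
static bool
TryAcquireSettingHints(BufferDesc *buf)
{
	uint32		buf_state = pg_atomic_read_u32(&buf->state);

	for (;;)
	{
		/* IO already running, or someone else is setting hints */
		if (buf_state & (BM_IO_IN_PROGRESS | BM_SETTING_HINTS))
			return false;

		/* on failure, buf_state is refreshed and we retry */
		if (pg_atomic_compare_exchange_u32(&buf->state, &buf_state,
										   buf_state | BM_SETTING_HINTS))
			return true;
	}
}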
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 12:53, Heikki Linnakangas wrote:
> We might have made a mistake by calling this mechanism "protocol
> extensions", because it makes people think of user-defined extensions.

I think this is a real problem, that's probably worth fixing. I created a
separate thread to address this[1]

> So yes, each protocol extension needs to know about all the other
> protocol extensions that it can be used with. In practice we'll avoid
> doing crazy stuff so that the protocol extensions are orthogonal

Just as an example, let's say we add a server-side query time to the protocol
(which honestly seems like a pretty useful feature), so that ReadyForQuery
now returns the query time if the query_time protocol extension is enabled.
For clients it isn't difficult at all to support any combination of
query_time & wait_for_lsn options. As long as we define that the wait_for_lsn
field is before the query_time field if both exist, then two simple if
statements like this would do the trick:

if (wait_for_lsn_enabled) {
    // interpret next field as LSN
}
if (query_time_enabled) {
    // interpret next field as query time
}

[1]: https://www.postgresql.org/message-id/CAGECzQQoc%2BV94TrF-5cMikCMaf-uUnU52euwSCtQBeDYqXnXyA%40mail.gmail.com
Re: general purpose array_sort
Hi,

On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev wrote:
>
> Hi,
>
> Thanks for the updated patch set.
>
> > > > +Datum
> > > > +array_sort_order(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > > > +
> > > > +Datum
> > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > >
> > > Any reason not to specify array_sort in pg_proc.dat?
> >
> > It is specified in 0001 (see oid => '8810').
>
> What I meant was that I don't think these wrapper functions are
> needed. I think you can just do:
>
> ```
> +{ oid => '8811', descr => 'sort array',
> +  proname => 'array_sort', prorettype => 'anyarray',
> +  proargtypes => 'anyarray bool', prosrc => 'array_sort'},   <--
> array_sort is used directly in `prosrc`
> ```
>
> ... unless I'm missing something.

There is an opr_sanity check for this[1]; if we remove these wrapper
functions, the regression test will fail with:

- oid | proname | oid | proname
--+-+-+-
-(0 rows)
+ oid | proname | oid | proname
+--++--+
+ 8811 | array_sort | 8812 | array_sort
+ 8810 | array_sort | 8811 | array_sort
+ 8810 | array_sort | 8812 | array_sort
+(3 rows)

[1]:
-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields). It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of. (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid < p2.oid AND
    p1.prosrc = p2.prosrc AND
    p1.prolang = 12 AND p2.prolang = 12 AND
    (p1.prokind != 'a' OR p2.prokind != 'a') AND
    (p1.prolang != p2.prolang OR
     p1.prokind != p2.prokind OR
     p1.prosecdef != p2.prosecdef OR
     p1.proleakproof != p2.proleakproof OR
     p1.proisstrict != p2.proisstrict OR
     p1.proretset != p2.proretset OR
     p1.provolatile != p2.provolatile OR
     p1.pronargs != p2.pronargs);

>
> --
> Best regards,
> Aleksander Alekseev

--
Regards
Junwang Zhao
Re: Consider pipeline implicit transaction as a transaction block
On Wed, 30 Oct 2024 at 10:15, Anthonin Bonnefoy wrote:
> The attached patch adds the detection of implicit transactions started
> by a pipeline in CheckTransactionBlock, avoiding warnings when
> commands like `set local` are called within a pipeline, and making the
> detection of transaction block coherent with what's done in
> IsInTransactionBlock and PreventInTransactionBlock.

+1, seems like a reasonable change.

> The XACT_FLAGS_PIPELINING will only be set after the first command, so
> the warning about `set local` happening outside of a transaction block
> will still be generated. However, I'm not sure if it's something
> fixable (or worth fixing?). This would require to know beforehand that
> there are multiple executes before the sync message, which doesn't
> seem doable.

Yeah, I don't really see a way around that, apart from not throwing this
warning at all when the client is using the extended protocol. Postgres would
need to be clairvoyant to really know if it should show it for the first
message.
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Fri, 2024-03-08 at 00:20 -0800, Jeff Davis wrote: > Implemented in v11, attached. Rebased, v12 attached. Regards, Jeff Davis From 5c2a8f5cb865becd70b08379d9fc72946be9a32a Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Tue, 2 Jan 2024 13:42:48 -0800 Subject: [PATCH v12] CREATE SUSBCRIPTION ... SERVER. Allow specifying a foreign server for CREATE SUBSCRIPTION, rather than a raw connection string with CONNECTION. Using a foreign server as a layer of indirection improves management of multiple subscriptions to the same server. It also provides integration with user mappings in case different subscriptions have different owners or a subscription changes owners. Discussion: https://postgr.es/m/61831790a0a937038f78ce09f8dd4cef7de7456a.ca...@j-davis.com Reviewed-by: Ashutosh Bapat --- contrib/postgres_fdw/Makefile | 2 + contrib/postgres_fdw/connection.c | 73 .../postgres_fdw/expected/postgres_fdw.out| 8 + contrib/postgres_fdw/meson.build | 5 + .../postgres_fdw/postgres_fdw--1.1--1.2.sql | 8 + contrib/postgres_fdw/sql/postgres_fdw.sql | 7 + contrib/postgres_fdw/t/010_subscription.pl| 71 doc/src/sgml/ref/alter_subscription.sgml | 18 +- doc/src/sgml/ref/create_subscription.sgml | 11 +- src/backend/catalog/pg_subscription.c | 38 +++- src/backend/commands/foreigncmds.c| 58 +- src/backend/commands/subscriptioncmds.c | 168 -- src/backend/foreign/foreign.c | 66 +++ src/backend/parser/gram.y | 22 +++ src/backend/replication/logical/worker.c | 16 +- src/bin/pg_dump/pg_dump.c | 36 +++- src/bin/pg_dump/pg_dump.h | 1 + src/bin/psql/tab-complete.in.c| 2 +- src/include/catalog/pg_foreign_data_wrapper.h | 3 + src/include/catalog/pg_subscription.h | 7 +- src/include/foreign/foreign.h | 3 + src/include/nodes/parsenodes.h| 3 + src/test/regress/expected/oidjoins.out| 1 + 23 files changed, 591 insertions(+), 36 deletions(-) create mode 100644 contrib/postgres_fdw/t/010_subscription.pl diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile index 88fdce40d6..a101418d6e 100644 --- a/contrib/postgres_fdw/Makefile +++ b/contrib/postgres_fdw/Makefile @@ -18,6 +18,8 @@ DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql postgres_fdw--1.1--1.2.s REGRESS = postgres_fdw query_cancel +TAP_TESTS = 1 + ifdef USE_PGXS PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 2326f391d3..48c77a8de3 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -123,6 +123,7 @@ PG_FUNCTION_INFO_V1(postgres_fdw_get_connections); PG_FUNCTION_INFO_V1(postgres_fdw_get_connections_1_2); PG_FUNCTION_INFO_V1(postgres_fdw_disconnect); PG_FUNCTION_INFO_V1(postgres_fdw_disconnect_all); +PG_FUNCTION_INFO_V1(postgres_fdw_connection); /* prototypes of private functions */ static void make_new_connection(ConnCacheEntry *entry, UserMapping *user); @@ -2161,6 +2162,78 @@ postgres_fdw_get_connections_internal(FunctionCallInfo fcinfo, } } +/* + * Values in connection strings must be enclosed in single quotes. Single + * quotes and backslashes must be escaped with backslash. NB: these rules are + * different from the rules for escaping a SQL literal. 
+ */ +static void +appendEscapedValue(StringInfo str, const char *val) +{ + appendStringInfoChar(str, '\''); + for (int i = 0; val[i] != '\0'; i++) + { + if (val[i] == '\\' || val[i] == '\'') + appendStringInfoChar(str, '\\'); + appendStringInfoChar(str, val[i]); + } + appendStringInfoChar(str, '\''); +} + +Datum +postgres_fdw_connection(PG_FUNCTION_ARGS) +{ + Oid userid = PG_GETARG_OID(0); + Oid serverid = PG_GETARG_OID(1); + ForeignServer *server = GetForeignServer(serverid); + UserMapping *user = GetUserMapping(userid, serverid); + StringInfoData str; + const char **keywords; + const char **values; + int n; + + /* + * Construct connection params from generic options of ForeignServer and + * UserMapping. (Some of them might not be libpq options, in which case + * we'll just waste a few array slots.) Add 4 extra slots for + * application_name, fallback_application_name, client_encoding, end + * marker. + */ + n = list_length(server->options) + list_length(user->options) + 4; + keywords = (const char **) palloc(n * sizeof(char *)); + values = (const char **) palloc(n * sizeof(char *)); + + n = 0; + n += ExtractConnectionOptions(server->options, + keywords + n, values + n); + n += ExtractConnectionOptions(user->options, + keywords + n, values + n); + + /* Set client_encoding so that libpq can convert encoding properly. */ + keywords[n] = "client_encoding"; + values[n] = GetDatabase
Re: AIO writes vs hint bits vs checksums
On 24/09/2024 18:55, Andres Freund wrote:
> What I'd instead like to propose is to implement the right to set hint bits as
> a bit in each buffer's state, similar to BM_IO_IN_PROGRESS. Tentatively I
> named this BM_SETTING_HINTS. It's only allowed to set BM_SETTING_HINTS when
> BM_IO_IN_PROGRESS isn't already set and StartBufferIO has to wait for
> BM_SETTING_HINTS to be unset to start IO.
>
> Naively implementing this, by acquiring and releasing the permission to set
> hint bits in SetHintBits() unfortunately leads to a significant performance
> regression. While the performance is unaffected for OLTPish workloads like
> pgbench (both read and write), sequential scans of unhinted tables regress
> significantly, due to the per-tuple lock acquisition this would imply.

It'd be nice to implement this without having to set and clear
BM_SETTING_HINTS every time. Could we put the overhead on the FlushBuffer()
instead?

How about something like this:

To set hint bits:

1. Lock page in SHARED mode.
2. Read BM_IO_IN_PROGRESS
3. If !BM_IO_IN_PROGRESS, set hint bits, otherwise don't
4. Unlock page

To flush a buffer:

1. Lock page in SHARED mode
2. Set BM_IO_IN_PROGRESS
3. Read the lock count on the buffer lock, to see if we're the only locker.
4. If anyone else is holding the lock, upgrade it to exclusive mode, and
immediately downgrade back to share mode.
5. calculate CRC, flush the buffer
6. Clear BM_IO_IN_PROGRESS and unlock page.

This goes back to the idea of adding LWLock support for this, but the amount
of changes could be pretty small. The missing operation we don't have today
is reading the share-lock count on the lock in step 3. That seems simple to
add.

Acquiring the exclusive lock in step 4 is just a way to wait for all the
existing share-lockers to release the lock. You wouldn't need to block new
share-lockers. We have LW_WAIT_UNTIL_FREE, which is almost what we need, but
it currently ignores share-lockers. So doing this "properly" would require
more changes to LWLocks. Briefly acquiring an exclusive lock seems acceptable
though.

> But: We can address this and improve performance over the status quo! Today we
> determine tuple visibility determination one-by-one, even when checking the
> visibility of an entire page worth of tuples.

Yes, batching the visibility checks makes sense in any case.

--
Heikki Linnakangas
Neon (https://neon.tech)
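A rough sketch of how the flush side of the scheme above could look, assuming the missing share-locker-count primitive; LWLockShareLockerCount() is an invented name (step 3 explicitly calls it out as the missing piece), the BM_IO_IN_PROGRESS handling is reduced to comments, and the "upgrade" in step 4 is emulated by a release/re-acquire since LWLocks cannot be upgraded in place:

/*
 * Illustration only, not real bufmgr code: wait out pre-existing share
 * lockers before checksumming and writing the page.
 */
static void
FlushBufferSketch(BufferDesc *buf)
{
	LWLock	   *lock = BufferDescriptorGetContentLock(buf);

	LWLockAcquire(lock, LW_SHARED);				/* step 1 */
	/* step 2: set BM_IO_IN_PROGRESS (StartBufferIO) */

	if (LWLockShareLockerCount(lock) > 1)		/* step 3: are we alone? */
	{
		/* step 4: wait for the pre-existing share lockers to drain */
		LWLockRelease(lock);
		LWLockAcquire(lock, LW_EXCLUSIVE);
		LWLockRelease(lock);
		LWLockAcquire(lock, LW_SHARED);
	}

	/* step 5: compute the checksum and write the page */
	/* step 6: clear BM_IO_IN_PROGRESS (TerminateBufferIO) */
	LWLockRelease(lock);
}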
Re: [PATCH] Add array_reverse() function
On Tuesday, October 22, 2024 16:27 MSK, Aleksander Alekseev wrote:

Hi everyone,

Thanks for the feedback!

> > But it returns the input array as is. I think it should at least make
> > a new copy of input array.
>
> I don't think that's really necessary. We have other functions that
> will return an input value unchanged without copying it. A
> longstanding example is array_larger. Also, this code looks to be
> copied from array_shuffle.

It was indeed copied from array_shuffle(). Makes me wonder if we should have
an utility function which would contain common code for array_shuffle() and
array_reverse(). Considering inconveniences such an approach caused in case
of common code for text and bytea types [1] I choose to copy the code and
modify it for the task.

> +
> +    /* If the target array is empty, exit fast */
> +    if (ndim < 1 || dims[0] < 1)
> +        return construct_empty_array(elmtyp);
>
> This is taken care by the caller, at least the ndim < 1 case.

Agree. Here is the corrected patch.

[1]: https://postgr.es/m/caj7c6to3x88dgd8c4tb-eq2zdpz+9mp+kowdzk_82bez_cm...@mail.gmail.com

--
Best regards,
Aleksander Alekseev

Hi!

Content. The proposed function has long been awaited and will be very useful. Personally, I used a slow workaround whenever I needed the reverse of an array.

Run. This patch applies cleanly to HEAD. All regression tests pass successfully against the patch.

Format. The code is formatted according to the code guidelines documentation. The documentation is updated; a description of the function is added.

From my point of view, it would be better to mention that the function returns the updated array (it does not update it in place, as a reader might assume), but the other array-returning functions in the documentation have the same style (silently assuming: reverse == return reversed array).

Conclusion. +1 for committer review

Best regards,
Vladlen Popolitov
postgrespro.com

--
Best regards,
Vladlen Popolitov.
Re: Reduce one comparison in binaryheap's sift down
On Wed, Oct 30, 2024 at 08:09:17AM -0400, Robert Haas wrote:
> I would be delighted if you could take care of it.

Committed.

--
nathan
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Hi,

Thanks for looking into this.

On Mon, Oct 28, 2024 at 8:18 PM Jingtang Zhang wrote:
>
> Just found that the initialization
> seems redundant since we have used palloc0?
>
> > +istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
> > +istate->bistate = NULL;
> > +istate->mistate = NULL;

Changed it to palloc() and explicit initializations of the members. With
this, only TupleTableSlot's array in HeapMultiInsertState uses palloc0(); the
rest all use explicit initializations.

> Oh, another comment for the v24-0001 patch: we are in heap AM now, should we use
> something like HEAP_INSERT_BAS_BULKWRITE instead of using table AM option,
> just like other heap AM options do?
>
> > + if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> > +     istate->bistate = GetBulkInsertState();

Defined HEAP_INSERT_BAS_BULKWRITE and used that in heapam.c similar to
INSERT_SKIP_FSM, INSERT_FROZEN, NO_LOGICAL.

> Little question about the v24 0002 patch: would it be better to move the
> implementation of TableModifyIsMultiInsertsSupported to somewhere at the
> table AM level? Seems it is a common function for future use, not a
> specific one for matview.

It's more tailored for CREATE TABLE AS and CREATE/REFRESH MATERIALIZED VIEW
in the sense that no triggers, foreign tables or partitioned tables are
possible here. INSERT INTO SELECT and Logical Replication Apply will have a
lot more conditions (e.g. RETURNING clause, triggers etc.) and they will need
to be handled differently. So, I left TableModifyIsMultiInsertsSupported as-is
in a common place in matview.c.

Please find the attached v25 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v25-0001-Introduce-new-table-AM-for-multi-inserts.patch
Description: Binary data

v25-0002-Optimize-CTAS-CMV-RMV-with-new-multi-inserts-tab.patch
Description: Binary data

v25-0003-Use-new-multi-inserts-table-AM-for-COPY-.-FROM.patch
Description: Binary data
Re: Eager aggregation, take 3
On Tue, Oct 29, 2024 at 3:57 AM Richard Guo wrote: > On Fri, Oct 18, 2024 at 10:22 PM jian he wrote: > > So overall I doubt here BTEQUALIMAGE_PROC flag usage is correct. > > The BTEQUALIMAGE_PROC flag is used to prevent eager aggregation for > types whose equality operators do not imply bitwise equality, such as > NUMERIC. > > After a second thought, I think it should be OK to just check the > equality operator specified by the SortGroupClause for btree equality. > I’m not very sure about this point, though, and would appreciate any > inputs. Well, the key thing here is the reasoning, which you don't really spell out either here or in the patch. The patch's justification for the use of BTEQUALIMAGE_PROC Is that, if we use non-equal-image operators, "we may lose some information that could be needed to evaluate upper qual clauses." But there's no explanation of why that would happen, and I think that makes it difficult to have a good discussion. Here's an example from the optimizer README: # 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc) # = A leftjoin (B leftjoin C on (Pbc)) on (Pab) # # Identity 3 only holds if predicate Pbc must fail for all-null B rows # (that is, Pbc is strict for at least one column of B). If Pbc is not # strict, the first form might produce some rows with nonnull C columns # where the second form would make those entries null. This isn't the clearest justification for a rule that I've ever read, but it's something. Someone reading this can think about whether the two sentences of justification are a correct argument that Pbc must be strict in order for the identity to hold. I think you should be trying to offer similar (or better) justifications, and perhaps similar identities as well. It's a little bit hard to think about how to write the identities for this patch, although you could start with aggregate(X) = finalizeagg(partialagg(X)) and then explain how the partialagg can be commuted with certain joins and what is required for it to do so. The argument needs to be detailed enough that we can evaluate whether it's correct or not. Personally, I'm still stuck on the point I wrote about yesterday. When you partially aggregate a set of rows, the resulting row serves as a proxy for the original set of rows. So, all of those rows must have the same destiny. As I mentioned then, if you ever partially aggregate a row that should ultimately be filtered out with one that shouldn't, that breaks everything. But also, I need all the rows that feed into each partial group to be part of the same set of output rows. For instance, suppose I run a report like this: SELECT o.order_number, SUM(l.quantity) FROM orders o, order_lines l WHERE o.order_id = l.order_id GROUP BY 1; If I'm thinking of executing this as FinalizeAgg(order JOIN PartialAgg(order_lines)), what must be true in order for that to be a valid execution plan? I certainly can't aggregate on some unrelated column, say part_number. If I do, then first of all I don't know how to get an order_id column out so that I can still do the join. But even if that were no issue, you might have two rows with different values of the order_id column and the same value for part_number, and that the partial groups that you've created on the order_lines table don't line up properly with the groups that you need to create on the orders table. Some particular order_id might need some of the rows that went into a certain part_number group and not others; that's no good. 
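For illustration, FinalizeAgg(orders JOIN PartialAgg(order_lines)) corresponds roughly to this manual rewrite (only a sketch: the planner really combines partial aggregate states rather than nesting SUMs, though for SUM the two happen to coincide):

SELECT o.order_number, SUM(l.partial_quantity)
FROM orders o
JOIN (SELECT order_id, SUM(quantity) AS partial_quantity
      FROM order_lines
      GROUP BY order_id) AS l
  ON o.order_id = l.order_id
GROUP BY 1;

Grouping the inner query by order_id, the column that appears in the join clause, is what guarantees that each partial group matches, or fails to match, a given orders row as a unit.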
After some thought, I think the process of deduction goes something like this. We ultimately need to aggregate on a column in the orders table, but we want to partially aggregate the order_lines table. So we need a set of columns in the order_lines table that can act as a proxy for o.order_number. Our proxy should be all the columns that appear in the join clauses between orders and order_lines. That way, all the rows in a single partially aggregate row will have the "same" order_id according to the operator class implementing the = operator, so for a given row in the "orders" table, either every row in the group has o.order_id = l.order_id or none of them do; hence they're all part of the same set of output rows. It doesn't matter, for example, whether o.order_number is unique. If it isn't, then we'll flatten together some different rows from the orders table -- but each of those rows will match all the rows in partial groupings where o.order_id = partially_grouped_l.order_id and none of the rows where that isn't the case, so the optimization is still valid. Likewise it doesn't matter whether o.order_id is unique: if order_id = 1 occurs twice, then both of the rows will match the partially aggregated group with order_id = 1, but that's fine. The only thing that's a problem is if the same row from orders was going to match some but not all of the partially aggregate rows from some order_lines group, and that won't happen here. Now consider this example: SELECT o.or
Re: Forbid to DROP temp tables of other sessions
On Tue, 29 Oct 2024 at 07:22, Daniil Davydov <3daniss...@gmail.com> wrote: > Hi, > Thanks for your comments, I appreciate them. > > As I continued to deal with the topic of working with temp tables of > other sessions, I noticed something like a bug. For example > (REL_17_STABLE): > Session 1: > =# CREATE TEMP TABLE test(id int); > > Session 2: > =# INSERT INTO pg_temp_0.test VALUES (1); > =# INSERT INTO pg_temp_0.test VALUES (2); > > Second INSERT command will end with an error "cannot access temporary > tables of other sessions". I checked why this is happening and found > errors in several places. > Good catch. I agree with this being an unwarranted behaviour. A minor comment from my end is the wording of the error message. Based on the PostgreSQL error message style guide, something like this could be better: "could not access temporary relations of other sessions". > So, I attach two files to this email : > 1) Isolation test, that shows an error in REL_17_STABLE (iso_1.patch) > 2) Patch that fixes code that mistakenly considered temporary tables > to be permanent (I will be glad to receive feedback on these fixes) + > isolation test, which shows that now any action with temp table of > other session leads to error (temp_tbl_fix.patch) > > Tests look kinda ugly, but I think it's inevitable, given that we > don't know exactly what the name of the temporary schema of other > session will be. > > -- > Best regards, > Daniil Davydov > -- Regards, Rafia Sabih CYBERTEC PostgreSQL International GmbH
Re: Eager aggregation, take 3
On Tue, Oct 29, 2024 at 3:51 AM Richard Guo wrote: > > 2. join type is FULL JOIN, (i am not sure about other Semijoins and > > anti-semijoins types). > > The presence of a FULL JOIN does not preclude the use of eager > aggregation. We still can push a partial aggregation down to a level > that is above the FULL JOIN. I think you can also push a partial aggregation step through a FULL JOIN. Consider this: SELECT p.name, string_agg(c.name, ', ') FROM parents p FULL JOIN children c ON p.id = c.parent_id GROUP BY 1; I don't see why it matters here that this is a FULL JOIN. If it were an inner join, we'd have one group for every parent that has at least one child. Since it's a full join, we'll also get one group for every parent with no children, and also one group for the null parent. But that should work fine with a partially aggregated c. -- Robert Haas EDB: http://www.enterprisedb.com
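The FULL JOIN case above, spelled out as a manual rewrite (again only a sketch, and relying on the fact that concatenating the per-group partial strings with the same separator yields the same unordered result):

SELECT p.name, string_agg(c.partial_names, ', ')
FROM parents p
FULL JOIN (SELECT parent_id, string_agg(name, ', ') AS partial_names
           FROM children
           GROUP BY parent_id) AS c
  ON p.id = c.parent_id
GROUP BY 1;

Parents with no children still come out with a NULL aggregate, and children with no matching parent still collapse into the single NULL-name group, just as in the original query.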
Re: Psql meta-command conninfo+
Hi, I have attached a new patch that incorporates the approach suggested by David. The documentation has also been updated. $ bin/psql "port=5430 sslmode=disable dbname=postgres" -x -h localhost psql (18devel) Type "help" for help. postgres=# \conninfo+ Connection Information -[ RECORD 1 ]+-- Database | postgres Client User | hunaid Host | localhost Host Address | 127.0.0.1 Port | 5430 Options | Current Status -[ RECORD 1 ]+-- Protocol Version | 3 Password Used| false GSSAPI Authenticated | false Backend PID | 26268 Server Parameter Settings -[ RECORD 1 ]-+--- Superuser | true Client Encoding | UTF8 Server Encoding | UTF8 Session Authorization | hunaid Connection Encryption -[ RECORD 1 ]--+-- SSL Connection | false $ bin/psql "port=5430 sslmode=require dbname=postgres" -x -h localhost psql (18devel) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql) Type "help" for help. postgres=# \conninfo+ E Connection Encryption -[ RECORD 1 ]--+--- SSL Connection | true Library| OpenSSL Protocol | TLSv1.3 Key Bits | 256 Cipher | TLS_AES_256_GCM_SHA384 Compression| off ALPN | postgresql I’m unsure if we need to expand the documentation further. I would appreciate your suggestions on this. Regards, Hunaid Sohail On Mon, Oct 7, 2024 at 9:31 PM David G. Johnston wrote: > On Sun, Oct 6, 2024 at 11:17 PM Hunaid Sohail > wrote: > >> >> PQpass - no need >> > > I would include this as presence/absence. > > I concur on all of the rest. > > >> >> For PQparameterStatus, some parameters are already used. >> server_version and application_name were already discussed and removed in >> v12 and v29 respectively. Do we need other parameters? >> > > Ok, I'll need to go read the reasoning for why they are deemed unneeded > and form an opinion one way or the other. > > >> >>> Within that framework having \conninfo[+[CSE][…]] be the command - >>> printing out only the table specified would be the behavior (specifying no >>> suffix letters prints all three) - would be an option. >>> >> >> 3 separate tables without suffix? >> > > Yes, the tables need headers specific to their categories. > > I do like the idea of having 4 though, placing settings into their own. > Premised on having all or most of the available parameters being on the > table. If it only ends up being a few of them then keeping those in > the status table makes sense. > > David J. > >> From 5fc7a9b2a80933641b67c30b5909a0aed810f0cd Mon Sep 17 00:00:00 2001 From: Hunaid Sohail Date: Wed, 30 Oct 2024 10:44:21 +0500 Subject: [PATCH v36] Add psql meta command conninfo+ --- doc/src/sgml/ref/psql-ref.sgml | 23 ++- src/bin/psql/command.c | 328 +++-- src/bin/psql/help.c| 2 +- 3 files changed, 328 insertions(+), 25 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index b825ca96a2..c620bcd94e 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1060,11 +1060,26 @@ INSERT INTO tbls1 VALUES ($1, $2) \parse stmt1 -\conninfo +\conninfo[+[C|S|P|E]] - -Outputs information about the current database connection. - + +Outputs information about the current database connection. +When + is appended, all details about the +connection are displayed in table format. +The modifiers can be: + + C: Displays connection information, including: + Database, Client User, Host, Host Address, Port, and Options. + S: Displays the current connection status, including: + Protocol Version, Password Used, GSSAPI Authenticated, and Backend PID. 
+ P: Displays parameter settings of the server, including: + Superuser, Client Encoding, Server Encoding, and Session Authorization. + E: Displays connection encryption details, including: + SSL Connection, Library, Protocol, Key Bits, Cipher, Compression, and ALPN. + +If no modifier is specified, all available details are displayed in separate tables. +If connection is not using SSL, related encryption details are not displayed. + diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 328d78c73f..8a5b8d1ea9 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -72,7 +72,8 @@ static backslashResult exec_command_cd(PsqlScanState scan_state, bool active_bra const char *cmd); static backslashResult exec_command_close(PsqlScanState scan_state, bool active_branch, const char *cmd); -static backslashResult exec_command_conninfo(PsqlScanState scan_state, bool active_b
Re: Interrupts vs signals
One remaining issue with these patches is backwards-compatibility for extensions: 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, ResetInterrupt instead. 2. If an extension is defining its own Latch with InitLatch, rather than using MyLatch, it's out of luck. You could probably rewrite it using the INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a bit more effort. We can provide backwards compatibility macros and a new facility to allocate custom interrupt bits. But how big of a problem is this anyway? In an in-person chat last week, Andres said something like "this will break every extension". I don't think it's quite that bad, and more complicated extensions need small tweaks in every major release anyway. But let's try to quantify that. Analysis Joel compiled a list of all extensions that are on github: https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47. I did "git checkout" on all of them, 1159 repositories in total (some of the links on the list were broken, or required authentication). 86 out of those 1159 extensions contain the word "Latch": $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I Latch FOO && echo FOO" | wc -l 86 For comparison, in PostgreSQL v10, we added a new 'wait_event_info' argument to WaitLatch and WaitLatchForSocket. That breakage was on similar scale: $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO && echo FOO" | wc -l 71 Admittedly that was a long time ago, we might have different priorities now. Deeper analysis --- Most of those calls are simple calls to SetLatch(MyLatch), WaitLatch(MyLatch, ...) etc. that could be handled by backwards compatibility macros. Many of the calls don't use MyLatch, but use MyProc->procLatch directly: rc = WaitLatch(&MyProc->procLatch, ...) That makes sense, since MyLatch was added in 2015, while the initial latch code was added in 2010. Any extensions that lived during that era would use MyProc->procLatch for backwards-compatibility, and there's been no need to change that since. That could also be supported by backwards-compatibility macros, or we could tell the extension authors to search & replace MyProc->procLatch to MyLatch. Excluding those, there are a handful of extensions that would need to be updated. Below is a quick analysis of the remaining results of "grep Latch" in all the repos: These calls use the process latch to wake up a different process (I excluded some duplicated forks of the same extension): 2ndQuadrant/pglogical/pglogical_sync.c: SetLatch(&apply->proc->procLatch); 2ndQuadrant/pglogical/pglogical_apply.c: SetLatch(&worker->proc->procLatch); 2ndQuadrant/pglogical/pglogical_worker.c: SetLatch(&w->proc->procLatch); 2ndQuadrant/pglogical/pglogical_worker.c: SetLatch(&PGLogicalCtx->supervisor->procLatch); heterodb/pg-strom/src/gpu_cache.c: Latch *backend; heterodb/pg-strom/src/gpu_cache.c: cmd->backend = (is_async ? 
NULL : MyLatch); heterodb/pg-strom/src/gpu_cache.c: SetLatch(cmd->backend); heterodb/pg-strom/src/gpu_cache.c: SetLatch(cmd->backend); citusdata/citus/src/backend/distributed/utils/maintenanced.c: Latch *latch; /* pointer to the background worker's latch */ citusdata/citus/src/backend/distributed/utils/maintenanced.c: SetLatch(dbData->latch); liaorc/devel-master/src/cuda_program.c: SetLatch(&proc->procLatch); okbob/generic-scheduler/dbworker.c: SetLatch(&dbworker->parent->procLatch); postgrespro/lsm3/lsm3.c: SetLatch(&entry->merger->procLatch); postgrespro/lsm3/lsm3.c: SetLatch(&entry->merger->procLatch); postgrespro/pg_pageprep/pg_pageprep.c: SetLatch(&starter->procLatch); postgrespro/pg_pageprep/pg_pageprep.c: SetLatch(&starter->procLatch); postgrespro/pg_query_state/pg_query_state.c:Latch *caller; postgrespro/pg_query_state/pg_query_state.c: SetLatch(counterpart_userid->caller); postgrespro/pg_query_state/pg_query_state.c: counterpart_userid->caller = MyLatch; timescale/timescaledb/src/loader/bgw_message_queue.c: SetLatch(&BackendPidGetProc(queue_get_reader(queue))->procLatch); troels/hbase_fdw/process_communication.c: control->latch = MyLatch; troels/hbase_fdw/process_communication.c: SetLatch(control->latch); The above could easily be fairly replaced with the new SendInterrupt() function. A backwards compatibility macro might be possible for some of these, but would be tricky for others. So I think these extensions will need to adapt by switching to SendInterrupt(). At quick glance, it would easy for all of these to do that with a small "#if PG_VERSION_NUM" block. postgrespro/pg_wait_sampling/pg_wait_sampling.c: * null wait events. So instead we make use of DisownLatch() resetting postgrespro/pg_wait_sampling/pg_wait_sampling.c:if (proc->pid == 0 || proc->procLatch.owner_pid == 0 || proc->pid
Re: AIO writes vs hint bits vs checksums
Hi, On 2024-10-30 13:29:01 +0200, Heikki Linnakangas wrote: > On 30/10/2024 04:21, Andres Freund wrote: > > Attached is a, unfortunately long, series of patches implementing what I > > described upthread. > > Review of the preparatory patches: > > > 0001 Add very basic test for kill_prior_tuples > > > > We currently don't exercise this patch for gist and hash, which seems > > somewhat criminal. > > Interesting to use the isolationtester for this. There's just one session, > so you're just using it to define reusable steps with handy names. Yea. I had started out writing it as a pg_regress style test and it quickly got very hard to understand. > I'm fine with that, but please add a comment to explain it. Makes sense. > I wonder if it'd be more straightforward to make it a regular pg_regress > test though. There would be some repetition, but would it be so bad? I found it to be quite bad. If testing just one AM it's ok-ish, but once you test 2-3 it gets very long and repetitive. I guess we could use functions or such to make it a bit less painful - but that point, is it actually simpler? > You forgot to add the new test to 'isolation_schedule'. Oops. > typos: > "inex" -> "index" > "does something approximately reasonble" -> "do something approximately > reasonable" Oops^2. > > +/* > > + * If HEAP_MOVED_OFF or HEAP_MOVED_IN are set on the tuple, remove them and > > + * adjust hint bits. See the comment for SetHintBits() for more background. > > + * > > + * This helper returns false if the row ought to be invisible, true > > otherwise. > > + */ > > +static inline bool > > +HeapTupleCleanMoved(HeapTupleHeader tuple, Buffer buffer) > > +{ > > + TransactionId xvac; > > + > > + /* only used by pre-9.0 binary upgrades */ > > + if (likely(!(tuple->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN > > + return true; > > + > > + ... > > +} > > This is so unlikely these days that this function probably should not be > inlined anymore. Because I put the check for moved in the function it is important that it's inlined. The compiler can understand that the rest of the function is unlikely to be executed and put it together with other less likely to be executed code. > > 0003 bufmgr: Add BufferLockHeldByMe() > > > > Useful for writing assertions in later patches. > > > + if (mode == BUFFER_LOCK_EXCLUSIVE) > > + lwmode = LW_EXCLUSIVE; > > + else if (mode == BUFFER_LOCK_SHARE) > > + lwmode = LW_SHARED; > > + else > > + { > > + Assert(false); > > + pg_unreachable(); > > + lwmode = LW_EXCLUSIVE; /* assuage compiler */ > > + } > > I just learned a new word :-). Heh. > This is fine, but to avoid the assuaging, you could also do this: > > if (mode == BUFFER_LOCK_EXCLUSIVE) > lwmode = LW_EXCLUSIVE; > else > { > Assert(mode == BUFFER_LOCK_SHARE); > lwmode = LW_SHARED; > } I have no opinion about which version is better... > > 0004 heapam: Use exclusive lock on old page in CLUSTER > > > > When I started looking into this I wanted to make sure that it's > > actually > > safe to skip setting hint bits in all places. One place that's *not* > > safe > > is cluster - we assume that all hint bits are set in > > rewrite_heap_tuple(). > > > > This might not strictly be required, but it seemed very fragile as > > is. While using an exclusive lock does seem like it makes it a bit > > safer, > > it's not a very good fix, as we still dirty the old buffer, which seems > > like a shame when one uses VACUUM FULL. 
> > > + /* > > +* To be able to guarantee that we can set the hint bit, > > acquire an > > +* exclusive lock on the old buffer. We need the hint bits to > > be set > > +* as otherwise reform_and_rewrite_tuple() -> > > rewrite_heap_tuple() -> > > +* heap_freeze_tuple() will get confused. > > +* > > Hmm, I don't see any comments in the reform_and_rewrite_tuple() -> > rewrite_heap_tuple() -> heap_freeze_tuple() path about requiring hint bits > to be set. There's this comment is in heap_prepare_freeze_tuple(): * It is assumed that the caller has checked the tuple with * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD * (else we should be removing the tuple, not freezing it). I don't remember the precise details of what all goes wrong, I think there was more than one avenue. One example is stuff like this: /* * If the tuple has been updated, check the old-to-new mapping hash table. */ if (!((old_tuple->t_data->t_infomask & HEAP_XMAX_INVALID) || HeapTupleHeaderIsOnlyLocked(old_tuple->t_data)) && !HeapTupleHeaderIndicatesMovedPartitions(old_tuple->t_data) && !(ItemPointerEquals(&(old_tuple->t_self), &(old_tuple->t_data->t
Re: protocol-level wait-for-LSN
Hi, On 10/30/24 1:45 PM, Jelte Fennema-Nio wrote: On Wed, 30 Oct 2024 at 18:18, Ants Aasma wrote: The idea is great, I have been wanting something like this for a long time. For future proofing it might be a good idea to not require the communicated-waited value to be a LSN. Yours and Matthias' feedback make total sense I think. From an implementation perspective I think there are a few things necessary to enable these wider usecases: 1. The token should be considered opaque for clients (should be documented) 2. The token should be defined as variable length in the protocol 3. We should have a hook to allow postgres extensions to override the default token generation 4. We should have a hook to allow postgres extensions to override waiting until the token "timestamp" Even without sharding LSN might not be a final choice. Right now on the primary the visibility order is not LSN order. So if a connection does synchronous_commit = off commit, the write location is not even going to see the commit. By publishing the end of the commit record it would be better. But I assume at some point we would like to have a consistent visibility order, which quite likely means using something other than LSN as the logical clock. I was going to say that the default could probably still be LSN, but this makes me doubt that. Is there some other token that we can send now that we could "wait" on instead of the LSN, which would work for. If not, I think LSN is still probably a good choice as the default. Or maybe only as a default in case synchronous_commit != off. There are known wish-lists for a protocol v4, like https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md and a lot of clean-room implementations in drivers and embedded in projects/products. Having LSN would be nice, but to break all existing implementations, no. Having to specify with startup parameters how a core message format looks like sounds like a bad idea to me, https://www.postgresql.org/docs/devel/protocol-message-formats.html is it. If we want to start on a protocol v4 thing then that is ok - but there are a lot of feature requests for that one. Best regards, Jesper
small pg_combinebackup improvements
Hi, Here are two small patches to improve pg_combinebackup slightly. 0001: I noticed that there is some logic in reconstruct.c that constructs a pathname of the form a/b//c instead of a/b/c. AFAICT, this still works fine; it just looks ugly. It's possible to get one of these pathnames to show up in an error message if you have a corrupted backup, e.g.: pg_combinebackup: error: could not open file "x/base/1//INCREMENTAL.6229": No such file or directory pg_combinebackup: removing output directory "xy" So 0001 fixes this. 0002: If you try to do something like pg_combinebackup incr1 incr2 -o result, you'll get an error saying that the first backup must be a full backup. This is an implementation restriction that might be good to lift, but right now that's how it is. However, if you do pg_combinebackup full incr -o result, but the full backup happens to contain an INCREMENTAL.* file that also exists in the incremental backup, then you won't get an error. Instead you'll reconstruct a completely bogus full file and write it to the result directory. Now, this should really be impossible unless you're intentionally trying to break pg_combinebackup, but it's also pretty lame, so 0002 fixes things so that you get a proper error, and also adds a test case. Review appreciated. My plan is to commit at least to master and possibly back-patch. Opinions on whether to back-patch are also appreciated. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-pg_combinebackup-When-reconstructing-avoid-double.patch Description: Binary data v1-0002-pg_combinebackup-Error-if-incremental-file-exists.patch Description: Binary data
Re: Shave a few cycles off our ilog10 implementation
On 30/10/2024 21:27, David Fetter wrote: Please find attached a patch to $Subject I've done some preliminary testing, and it appears to shave somewhere between 25-50% off the operations themselves, and these cascade into things like formatting result sets and COPY OUT. Impressive! What did you use to performance test it, to get those results? -- Heikki Linnakangas Neon (https://neon.tech)
Re: Proposal for Integrating Data Masking and anonymization into PostgreSQL
Hi Osman See postgresql_anonymizer https://gitlab.com/dalibo/postgresql_anonymizer/ and pg_anonymize https://github.com/rjuju/pg_anonymize , where the former has more functionality, like synthetic data generation (if you need that). Yours, Stefan Am Di., 29. Okt. 2024 um 10:01 Uhr schrieb Hosney Osman : > > could you please provide me with reference or article or example for this > scenario > for example i have schema include 4 table > and i want to anonymize it in another schema > how to do that > > On Wed, Oct 23, 2024 at 4:25 PM Alastair Turner wrote: >> >> Hi Hosney >> >> On Wed, 23 Oct 2024, 15:17 Hosney Osman, wrote: >>> >>> Dear PostgreSQL Development Team, >>> >>> I am writing to propose the development of native data masking and >>> anonymization features within PostgreSQL. As a long-time user of PostgreSQL, >>> >>> I have observed a growing need for efficient and secure data handling, >>> particularly in compliance with regulations like GDPR >> >> This requirement already seems to be well served by extensions. >> pg_anonymize, for instance, uses the meta data tools available in the >> database, like security labels, to manage the values returned to users. >> >> Is there any functionality, or usability, in this area which you believe >> can't be delivered by an extension? >> >> Regards >> >> Alastair
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 19:04, Jesper Pedersen wrote: > Having LSN would be nice, but to break all existing implementations, no. > Having to specify with startup parameters how a core message format > looks like sounds like a bad idea to me, It would really help if you would explain why you think it's a bad idea to use a startup parameter for that, instead of simply stating that you think it needs a major protocol version bump. The point of enabling it through a startup parameter (aka protocol option) is exactly so it will not break any existing implementations. If clients request the protocol option (which as the name suggests is optional), then they are expected to be able to parse it. If they don't, then they will get the old message format. So no existing implementation will be broken. If some middleware/proxy gets a request for a startup option it does not support it can advertise that to the client using the NegotiateProtocolVersion message. Allowing the client to continue in a mode where the option is not enabled. So, not bumping the major protocol version and enabling this feature through a protocol option actually causes less breakage in practice. Also regarding the wishlist. I think it's much more likely for any of those to happen in a minor version bump and/or protocol option than it is that we'll bump the major protocol version. P.S. Like I said in another email on this thread: I think for this specific case I'd also prefer a separate new message, because that makes it easier to filter that message out when received by PgBouncer. But I'd still like to understand your viewpoint better on this, because adding fields to existing message types is definitely one of the types of changes that I personally think would be fine for some protocol changes.
Re: Shave a few cycles off our ilog10 implementation
On Wed, Oct 30, 2024 at 09:54:20PM +0200, Heikki Linnakangas wrote: > On 30/10/2024 21:27, David Fetter wrote: > > Please find attached a patch to $Subject > > > > I've done some preliminary testing, and it appears to shave somewhere > > between 25-50% off the operations themselves, and these cascade into > > things like formatting result sets and COPY OUT. > > Impressive! What did you use to performance test it, to get those results? In case that wasn't clear, what I've tested so far was the ilog10 implementations themselves, not the general effects on the things they underlie. The testing basically consisted of filling an array with appropriately sized pseudo-random uints ahead of time, running it through a tight loop that called the ilog10 implementations, and averaging the execution times. Any suggestions for more thorough testing would be welcome. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778
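One coarse, end-to-end way to exercise these paths from SQL, for anyone who wants to see the cascading effects rather than the microbenchmark itself (a sketch; the timings obviously include far more than the ilog10 calls):

\timing on
-- int4out calls the 32-bit digit-counting helper once per value
SELECT sum(length(i::text)) FROM generate_series(1, 20000000) AS i;
-- COPY OUT goes through the same conversion; TO PROGRAM requires superuser
COPY (SELECT i FROM generate_series(1, 20000000) AS i) TO PROGRAM 'cat >/dev/null';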
Re: protocol-level wait-for-LSN
On Wed, 30 Oct 2024 at 18:18, Ants Aasma wrote: > The idea is great, I have been wanting something like this for a long > time. For future proofing it might be a good idea to not require the > communicated-waited value to be a LSN. Yours and Matthias' feedback make total sense I think. From an implementation perspective I think there are a few things necessary to enable these wider usecases: 1. The token should be considered opaque for clients (should be documented) 2. The token should be defined as variable length in the protocol 3. We should have a hook to allow postgres extensions to override the default token generation 4. We should have a hook to allow postgres extensions to override waiting until the token "timestamp" > Even without sharding LSN might not be a final choice. Right now on > the primary the visibility order is not LSN order. So if a connection > does synchronous_commit = off commit, the write location is not even > going to see the commit. By publishing the end of the commit record it > would be better. But I assume at some point we would like to have a > consistent visibility order, which quite likely means using something > other than LSN as the logical clock. I was going to say that the default could probably still be LSN, but this makes me doubt that. Is there some other token that we can send now that we could "wait" on instead of the LSN, which would work for. If not, I think LSN is still probably a good choice as the default. Or maybe only as a default in case synchronous_commit != off.
Re: AIO writes vs hint bits vs checksums
On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote: > What I suspect we might want instead is something inbetween a share > and an > exclusive lock, which is taken while setting a hint bit and which > conflicts > with having an IO in progress. I am starting to wonder if shared content locks are really the right concept at all. They make sense for simple mutexes, but we are already more complex than that, and you are suggesting adding on to that complexity. Which I agree is a good idea, I'm just wondering if we could go even further. The README states that a shared lock is necessary for visibility checking, but can we just be more careful with the ordering and atomicity of visibility changes in the page? * carefully order reads and writes of xmin/xmax/hints (would that create too many read barriers in the tqual.c code?) * write line pointer after tuple is written We would still have pins and cleanup locks to prevent data removal. We'd have the logic you suggest that would prevent modification during IO. And there would still need to be exclusive content locks so that two inserts don't try to allocate the same line pointer, or lock the same tuple. If PD_ALL_VISIBLE is set it's even simpler. Am I missing some major hazards? Regards, Jeff Davis
Re: AIO writes vs hint bits vs checksums
Hi, On 2024-10-30 10:47:35 -0700, Jeff Davis wrote: > On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote: > > What I suspect we might want instead is something inbetween a share > > and an > > exclusive lock, which is taken while setting a hint bit and which > > conflicts > > with having an IO in progress. > > I am starting to wonder if a shared content locks are really the right > concept at all. It makes sense for simple mutexes, but we are already > more complex than that, and you are suggesting adding on to that > complexity. What I am proposing isn't making the content lock more complicated, it's orthogonal to the content lock. > Which I agree is a good idea, I'm just wondering if we could go even > further. > > The README states that a shared lock is necessary for visibility > checking, but can we just be more careful with the ordering and > atomicity of visibility changes in the page? > > * carefully order reads and writes of xmin/xmax/hints (would >that create too many read barriers in the tqual.c code?) > * write line pointer after tuple is written It's possible, but it'd be a lot of work. And you wouldn't need to just do this for heap, but all the indexes too, to make progress on the don't-set-hint-bits-during-io front. So I don't think it makes sense to tie these things together. I do think that it's an argument for not importing all the complexity into lwlock.c though. > We would still have pins and cleanup locks to prevent data removal. As-is cleanup locks only work in coordination with content locks. While cleanup is ongoing we need to prevent anybody from starting to look at the page - without acquiring something like a shared lock that's not easy. > We'd have the logic you suggest that would prevent modification during > IO. And there would still need to be an exclusive content locks so that > two inserts don't try to allocate the same line pointer, or lock the > same tuple. > > If PD_ALL_VISIBLE is set it's even simpler. > > Am I missing some major hazards? I don't think anything fundamental, but it's a decidedly nontrivial amount of work. Greetings, Andres Freund
Re: Increase of maintenance_work_mem limit in 64-bit Windows
v.popoli...@postgrespro.ru wrote on 2024-10-01 00:30: David Rowley wrote on 2024-09-24 01:07: On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov wrote: I agree, it is better to fix all them together. I also do not like this hack, it will be removed from the patch, if I check and change all at once. I think, it will take about 1 week to fix and test all changes. I will estimate the total volume of the changes and think, how to group them in the patch ( I hope, it will be only one patch) There's a few places that do this: Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE; /* choose the maxBlockSize to be no larger than 1/16 of work_mem */ while (16 * maxBlockSize > work_mem * 1024L) I think since maxBlockSize is a Size variable, that the above should probably be: while (16 * maxBlockSize > (Size) work_mem * 1024) Maybe there can be a precursor patch to fix all those to get rid of the 'L' and cast to the type we're comparing to or assigning to rather than trying to keep the result of the multiplication as a long. Hi, I rechecked everything that depends on the MAX_KILOBYTES limit and fixed all casts that are affected by the 4-byte long type on 64-bit Windows. The following variables are now limited to 2TB on all 64-bit systems: maintenance_work_mem, work_mem, logical_decoding_work_mem, max_stack_depth, autovacuum_work_mem, gin_pending_list_limit, and wal_skip_threshold. Also, wal_keep_size_mb, min_wal_size_mb, max_wal_size_mb, and max_slot_wal_keep_size_mb are not affected by the "long" cast. Hi everyone. The patch has been added to the Commitfest: https://commitfest.postgresql.org/50/5343/ -- Best regards, Vladlen Popolitov.
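As a quick sanity check of the new ceiling (a sketch; on 64-bit Windows this only succeeds with the patch applied, and 1TB is just an illustrative value):

SET maintenance_work_mem = '1TB';
SHOW maintenance_work_mem;
-- the hard limit is MAX_KILOBYTES, i.e. INT_MAX kB, just under 2TB
SET maintenance_work_mem = '2147483647kB';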
Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.
On Wed, Oct 30, 2024 at 11:58 AM jian he wrote: > > I missed a case when column collation and partition key collation are > the same and indeterministic. > that should be fine for partition-wise join. > so v2 attached. > > have_partkey_equi_join, match_expr_to_partition_keys didn't do any > collation related check. > propose v2 change disallow partitionwise join for case when > column collation is indeterministic *and* is differ from partition > key's collation. > > the attached partition_wise_join_collation.sql is the test script. > you may use it to compare with the master behavior. What if the partkey collation and column collation are both deterministic, but with different sort orders? I'm not familiar with this part of the code base, but it seems to me that the partitionwise join should use the partkey collation instead of the column collation, because it's the partkey collation that decides which partition a row is dispatched to. What Jian proposed is also reasonable, but it seems like another aspect of $subject? Just some random thoughts, I might be wrong ;( -- Regards Junwang Zhao
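For readers who do not want to open the attached script, the shape of the problematic case is roughly the following (a sketch assuming an ICU-enabled build; not the exact script attached upthread):

CREATE COLLATION case_insensitive
  (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

CREATE TABLE t (a text COLLATE case_insensitive)
  PARTITION BY LIST (a COLLATE "C");
CREATE TABLE t_lower PARTITION OF t FOR VALUES IN ('a');
CREATE TABLE t_upper PARTITION OF t FOR VALUES IN ('A');
INSERT INTO t VALUES ('a'), ('A');

SET enable_partitionwise_join = on;
-- the join condition compares using the column's case-insensitive collation,
-- so 'a' and 'A' should match across partitions; a partition-wise plan that
-- pairs t_lower with t_lower and t_upper with t_upper misses those matches
SELECT * FROM t t1 JOIN t t2 ON t1.a = t2.a;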
Shave a few cycles off our ilog10 implementation
Please find attached a patch to $Subject I've done some preliminary testing, and it appears to shave somewhere between 25-50% off the operations themselves, and these cascade into things like formatting result sets and COPY OUT. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 >From 612c848d8025d318e5b6e0da5bb04d0f7ed22126 Mon Sep 17 00:00:00 2001 From: David Fetter Date: Mon, 28 Oct 2024 04:38:16 -0700 Subject: [PATCH v1] Shave a few instructions off ilog10 by using SWAR and a lookup table. --- src/backend/utils/adt/numutils.c | 73 ++-- 1 file changed, 60 insertions(+), 13 deletions(-) diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 63c2beb6a29..a53b647c054 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -39,30 +39,76 @@ static const char DIGIT_TABLE[200] = "90" "91" "92" "93" "94" "95" "96" "97" "98" "99"; /* - * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + * Adapted from https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/ */ static inline int decimalLength32(const uint32 v) { - int t; - static const uint32 PowersOfTen[] = { - 1, 10, 100, - 1000, 1, 10, - 100, 1000, 1, - 10 + /* Each entry in the table is a 64-bit unsigned integer. + * - The upper 32 bits of this integer is floor(ilog10(v)), a thing we can know from lg(v). + * - The lower 32 bits of the integer is a number n such that v + n = 2^32 when v is a power of 10. + * + * These two parts taken together create a way to put the number of decimal + * digits as the upper 32 bits. + * + */ + static const uint64_t BaseTwoToBaseTenPivot[] = { + 4294967296, 8589934582, 8589934582, 8589934582, 12884901788, + 12884901788, 12884901788, 17179868184, 17179868184, 17179868184, + 21474826480, 21474826480, 21474826480, 21474826480, 25769703776, + 25769703776, 25769703776, 30063771072, 30063771072, 30063771072, + 34349738368, 34349738368, 34349738368, 34349738368, 38554705664, + 38554705664, 38554705664, 41949672960, 41949672960, 41949672960, + 42949672960, 42949672960 }; - /* - * Compute base-10 logarithm by dividing the base-2 logarithm by a - * good-enough approximation of the base-2 logarithm of 10 - */ - t = (pg_leftmost_one_pos32(v) + 1) * 1233 / 4096; - return t + (v >= PowersOfTen[t]); + return (v + BaseTwoToBaseTenPivot[pg_leftmost_one_pos32(v)]) >> 32; } +/* + * Adapted from https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/ + */ static inline int decimalLength64(const uint64 v) { +#ifdef HAS_INT128 + /* Each entry in the table is a 128-bit unsigned integer. + * - The upper 64 bits of this integer is floor(ilog10(v)), a thing we can know from lg(v). + * - The lower 64 bits of the integer is a number n such that v + n = 2^64 when v is a power of 10. + * + * These two parts taken together create a way to put the number of decimal + * digits as the upper 64 bits. 
+ * + */ + static const uint128 BaseTwoToBaseTenPivot[] = { + 18446744073709551616, + 18446744073709551606, 18446744073709551606, 18446744073709551606, + 36893488147419103132, 36893488147419103132, 36893488147419103132, + 55340232221128653848, 55340232221128653848, 55340232221128653848, + 73786976294838196464, 73786976294838196464, 73786976294838196464, + 92233720368547658080, 92233720368547658080, 92233720368547658080, + 110680464442256309696, 110680464442256309696, 110680464442256309696, + 129127208515956861312, 129127208515956861312, 129127208515956861312, + 147573952589576412928, 147573952589576412928, 147573952589576412928, + 166020696662385964544, 166020696662385964544, 166020696662385964544, + 184467440727095516160, 184467440727095516160, 184467440727095516160, + 202914184710805067776, 202914184710805067776, 202914184710805067776, + 221360927884514619392, 221360927884514619392, 221360927884514619392, + 239807662958224171008, 239807662958224171008, 239807662958224171008, + 258254317031933722624, 258254317031933722624, 258254317031933722624, + 276700161105643274240, 276700161105643274240, 276700161105643274240, + 295137905179352825856, 295137905179352825856, 295137905179352825856, + 313494649253062377472, 313494649253062377472, 313494649253062377472, + 331041393326771929088, 331041393326771929088, 331041393326771929088, + 340488137400481480704, 340488137400481480704, 340488137400481480704, + 268934881474191032320, 268934881474191032320, 268934881474191032320 + }; + + return (v + BaseTwoToBaseTenPivot[pg_leftmost_one_pos64(v)]) >> 64; +#else/* !HAVE_INT128 */ + /* + * * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10 + */ int t; static const uint64 PowersOfTen[] = { UINT64CONST(1), UINT64CONST(10), @@ -83,6 +129,7 @@ decimalLength64(const uint64 v) */ t = (pg_leftmost
Re: Interrupts vs signals
On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas wrote: > 1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be > changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, > ResetInterrupt instead. > > 2. If an extension is defining its own Latch with InitLatch, rather than > using MyLatch, it's out of luck. You could probably rewrite it using the > INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a > bit more effort. > > We can provide backwards compatibility macros and a new facility to > allocate custom interrupt bits. But how big of a problem is this anyway? > In an in-person chat last week, Andres said something like "this will > break every extension". Seems hyperbolic. > For comparison, in PostgreSQL v10, we added a new 'wait_event_info' > argument to WaitLatch and WaitLatchForSocket. That breakage was on > similar scale: > > $ ls -d */* | xargs -IFOO bash -c "grep -q -r -I WaitLatch FOO && > echo FOO" | wc -l > 71 However, that was also pretty easy to fix. This seems like it might be a little more complicated. > We have a few options for how to deal with backwards-compatibility for > extensions: > > A) If we rip off the bandaid in one go and don't provide any > backwards-compatibility macros, we will break 96 extensions. Most of > them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with > corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With > #ifdef for compatibility with older postgres versions) > > B) If we provide backwards-compatibility macros so that simple Latch > calls on MyLatch continue working, we will break about 14 extensions. > They will need some tweaking like in option A). A bit more than the > simple cases in option A), but nothing too difficult. > > C) We could provide "forward-compatibility" macros in a separate header > file, to make the new "SetInterrupt" etc calls work in old PostgreSQL > versions. Many of the extensions already have a header file like this, > see e.g. citusdata/citus/src/include/pg_version_compat.h, > pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea > to provide a semi-official header file like this on the Postgres wiki, > to help extension authors. It would encourage extensions to use the > latest idioms, while still being able to compile for older versions. > > I'm leaning towards option C). Let's rip off the band-aid, but provide > documentation for how to adapt your extension code. And provide a > forwards-compatibility header on the wiki, that extension authors can > use to make the new Interrupt calls work against old server versions. I don't know which of these options is best, but I don't find any of them categorically unacceptable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 17:51, Peter Eisentraut wrote: > This is something I hacked together on the way back from pgconf.eu. > It's highly experimental. > > The idea is to do the equivalent of pg_wal_replay_wait() on the protocol > level, so that it is ideally fully transparent to the application code. > The application just issues queries, and they might be serviced by a > primary or a standby, but there is always a correct ordering of reads > after writes. The idea is great, I have been wanting something like this for a long time. For future proofing it might be a good idea to not require the communicated-waited value to be a LSN. In a sharded database a Lamport timestamp would allow for sequential consistency. Lamport timestamp is just some monotonically increasing value that is eagerly shared between all communicating participants, including clients. For a single cluster LSNs work fine for this purpose. But with multiple shards LSNs will not work, unless arranged as a vector clock which is what I think Matthias proposed. Even without sharding LSN might not be a final choice. Right now on the primary the visibility order is not LSN order. So if a connection does synchronous_commit = off commit, the write location is not even going to see the commit. By publishing the end of the commit record it would be better. But I assume at some point we would like to have a consistent visibility order, which quite likely means using something other than LSN as the logical clock. I see the patch names the field LSN, but on the protocol level and for the client library this is just an opaque 127 byte token. So basically I'm thinking the naming could be more generic. And for a complete Lamport timestamp implementation we would need the capability of extracting the last seen value and another set-if-greater update operation. -- Ants Aasma www.cybertec-postgresql.com
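For reference, the SQL-level dance that the protocol-level mechanism would make transparent looks roughly like this today (a sketch using the pg_wal_replay_wait() procedure the original mail refers to; the LSN shown is made up):

-- on the primary, immediately after the application's write:
SELECT pg_current_wal_insert_lsn();      -- say it returns 0/1A2B3C4D
-- on the standby, before the dependent read:
CALL pg_wal_replay_wait('0/1A2B3C4D');   -- blocks until that LSN has been replayed
-- any query issued after the CALL is guaranteed to see the write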
Re: Shave a few cycles off our ilog10 implementation
On Thu, 31 Oct 2024 at 09:02, David Fetter wrote: > This testing was basically just sending a bunch of appropriately sized > pseudo-random uints in a previously created array sent through a tight > loop that called the ilog10s and getting average execution times. > > Any suggestions for more thorough testing would be welcome. Maybe something similar to what I did in [1]. David [1] https://postgr.es/m/CAApHDvopR=ypgnr4abbn4hmoztuyva+ifyrtvu49pxg9yo_...@mail.gmail.com
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Fri, Oct 18, 2024 at 08:08:55AM +0900, Michael Paquier wrote: > 0002 and 0003 look sane enough to me as shaped. I'd need to spend a > few more hours on 0001 if I were to do a second round on it, but you > don't really need a second opinion, do you? :D I'll manage. 0001 was a doozy to back-patch, and this obviously isn't a terribly pressing issue, so I plan to wait until after the November minor release to apply this. I'm having second thoughts on 0002, so I'll probably leave that one uncommitted, but IMHO we definitely need 0003 to prevent this issue from reoccurring. -- nathan >From 58eefc6c2f90ac7d1f23f5b5f3052042cf0e75d9 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 30 Oct 2024 15:42:29 -0500 Subject: [PATCH v5 1/1] Ensure we have a snapshot when updating various system catalogs. Reported-by: Alexander Lakhin Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan Backpatch-through: 12 --- src/backend/commands/indexcmds.c| 2 +- src/backend/commands/tablecmds.c| 8 +++ src/backend/postmaster/autovacuum.c | 7 ++ src/backend/replication/logical/tablesync.c | 21 ++ src/backend/replication/logical/worker.c| 24 + 5 files changed, 61 insertions(+), 1 deletion(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 2f652463e3..fd86f28a35 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -4230,7 +4230,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein false); /* -* Updating pg_index might involve TOAST table access, so ensure we +* Swapping the indexes might involve TOAST table access, so ensure we * have a valid snapshot. */ PushActiveSnapshot(GetTransactionSnapshot()); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 4345b96de5..f1649419d6 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -19383,9 +19383,17 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, tab->rel = rel; } + /* +* Detaching the partition might involve TOAST table access, so ensure we +* have a valid snapshot. +*/ + PushActiveSnapshot(GetTransactionSnapshot()); + /* Do the final part of detaching */ DetachPartitionFinalize(rel, partRel, concurrent, defaultPartOid); + PopActiveSnapshot(); + ObjectAddressSet(address, RelationRelationId, RelationGetRelid(partRel)); /* keep our lock until commit */ diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index dc3cf87aba..e84285b644 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2209,6 +2209,12 @@ do_autovacuum(void) get_namespace_name(classForm->relnamespace), NameStr(classForm->relname; + /* +* Deletion might involve TOAST table access, so ensure we have a +* valid snapshot. +*/ + PushActiveSnapshot(GetTransactionSnapshot()); + object.classId = RelationRelationId; object.objectId = relid; object.objectSubId = 0; @@ -2221,6 +2227,7 @@ do_autovacuum(void) * To commit the deletion, end current transaction and start a new * one. Note this also releases the locks we took. 
*/ + PopActiveSnapshot(); CommitTransactionCommand(); StartTransactionCommand(); diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 118503fcb7..57e1eede41 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -377,6 +377,12 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) replorigin_session_origin_lsn = InvalidXLogRecPtr; replorigin_session_origin_timestamp = 0; + /* +* Updating pg_replication_origin might involve TOAST table access, so +* ensure we have a valid snapshot. +*/ + PushActiveSnapshot(GetTransactionSnapshot()); + /* * Drop the tablesync's origin tracking if exists. * @@ -387,6 +393,8 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) */
Re: Popcount optimization using AVX512
BTW, I just realized function attributes for xsave and avx512 don't work on MSVC (see https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630). Not sure if you care about it. It's an easy fix (see https://gcc.godbolt.org/z/Pebdj3vMx).
Re: [PATCH] Add array_reverse() function
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Content. The proposed function has waited a long time to be implemented and it will be very useful. Personally I used a slow workaround whenever I needed to reverse an array. Run. This patch applies cleanly to HEAD. All regression tests pass successfully against the patch. Format. The code is formatted according to the code guidelines. Documentation. The documentation is updated, and a description of the function is added. From my point of view, it would be better to mention that the function returns an updated array (it does not update it in place, as a reader might understand), but the other array-returning functions in the documentation have the same style (silently assuming: reverse == return reversed array). Conclusion. +1 for committer review
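To make the "slow workaround" point concrete (a sketch; for a one-dimensional array the expected behaviour with the patch applied is shown in the comments):

-- without the patch: reverse via unnest ... WITH ORDINALITY
SELECT array_agg(x ORDER BY ord DESC)
FROM unnest(ARRAY[1, 2, 3]) WITH ORDINALITY AS t(x, ord);   -- {3,2,1}

-- with the patch applied
SELECT array_reverse(ARRAY[1, 2, 3]);                       -- {3,2,1}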
Re: protocol-level wait-for-LSN
On 30/10/2024 13:34, Tatsuo Ishii wrote: On Wed, 30 Oct 2024 at 07:49, Tatsuo Ishii wrote: I think one would have to define that somehow. If it's useful, the additional fields of both extensions could be appended, in some defined order. But this is an interesting question to think about. I think this kind of extension, which adds new filed to an existing message type, should be implemented as v4 protocol. Could you explain why you think a major version bump is needed? In what situation do you care about this. Because for my usecases (client implementations & pgbouncer) I don't think that would be necessary. If a client doesn't send the _pq_.wait_for_lsn protocol parameter, it will never receive this new version. Yes, if there's only one extension for a message type, it would not be a big problem. But if there's more than one extensions that want to change the same type, problem arises as I have already discussed them upthread. I don't really see a problem with having two protocol parameters change the same message. Yes, you have to define what the result of their combination is, but that seems trivial to do for additions of fields. You either define the first protocol parameter that was added to the spec, to add its field before the second. Or you could do it based on something non-time-dependent, like the alphabetic order of the protocol parameter, or the alphabetic order of the fields that they add. That sounds far from trivial. So each extension needs to check if any other extension which modifies the same message type is activated? That requires each extension implementation to have built-in knowledge about any conflicting extension. Moreover each extension may not be added at once. If extension Y is added after extension X is defined, then implementation of X needs to be changed because at the time when X is defined, it did not need to care about Y. Another way to deal with the problem could be defining a new protocol message which describes those conflict information so that each extensions do not need to have such information built-in, but maybe it is too complex. Note that the "protocol extension" mechanism is *not* meant for user-defined extensions. That's not the primary purpose anyway. It allows evolving the protocol in core code in a backwards compatible way, but indeed the different extensions will need to be coordinated so that they don't clash with each other. If they introduced new message types for example, they better use different message type codes. We might have made a mistake by calling this mechanism "protocol extensions", because it makes people think of user-defined extensions. With user-defined extensions, yes, you have exactly the problem you describe.We have no rules on how a protocol extension is allowed to change the protocol. It might add fields, it might add messages, or it might change the meaning of existing messages. Or encapsulate the whole protocol in XML. So yes, each protocol extension needs to know about all the other protocol extensions that it can be used with. In practice we'll avoid doing crazy stuff so that the protocol extensions are orthogonal, but if user-defined extensions get involved, there's not much we can do to ensure that. -- Heikki Linnakangas Neon (https://neon.tech)
Re: protocol-level wait-for-LSN
> On Wed, 30 Oct 2024 at 07:49, Tatsuo Ishii wrote: >> > I think one would have to define that somehow. If it's useful, the >> > additional fields of both extensions could be appended, in some >> > defined order. But this is an interesting question to think about. >> >> I think this kind of extension, which adds new filed to an existing >> message type, should be implemented as v4 protocol. > > Could you explain why you think a major version bump is needed? In > what situation do you care about this. Because for my usecases (client > implementations & pgbouncer) I don't think that would be necessary. If > a client doesn't send the _pq_.wait_for_lsn protocol parameter, it > will never receive this new version. Yes, if there's only one extension for a message type, it would not be a big problem. But if there's more than one extensions that want to change the same type, problem arises as I have already discussed them upthread. > I don't really see a problem with having two protocol parameters > change the same message. Yes, you have to define what the result of > their combination is, but that seems trivial to do for additions of > fields. You either define the first protocol parameter that was added > to the spec, to add its field before the second. Or you could do it > based on something non-time-dependent, like the alphabetic order of > the protocol parameter, or the alphabetic order of the fields that > they add. That sounds far from trivial. So each extension needs to check if any other extension which modifies the same message type is activated? That requires each extension implementation to have built-in knowledge about any conflicting extension. Moreover each extension may not be added at once. If extension Y is added after extension X is defined, then implementation of X needs to be changed because at the time when X is defined, it did not need to care about Y. Another way to deal with the problem could be defining a new protocol message which describes those conflict information so that each extensions do not need to have such information built-in, but maybe it is too complex. Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: doc issues in event-trigger-matrix.html
On Tue, Oct 29, 2024 at 7:54 PM Peter Eisentraut wrote:
>
> I made a patch for this. I have expanded the narrative discussion on
> what commands are supported for event triggers, also made a few
> corrections/additions there, based on inspecting the source code. And
> then removed the big matrix, which doesn't provide any additional
> information, I think.
>
> I think this is sufficient and covers everything. The only hand-wavy
> thing I can see is exactly which ALTER commands trigger the sql_drop
> event. But this was already quite imprecise before, and I think also
> not quite correct. This might need a separate investigation.
>
> In any case, we can use this as a starting point to iterate on the right
> wording etc.

Hi. I have a couple of minor issues.

The sentence

    An event trigger fires whenever the event with which it is associated
    occurs in the database in which it is defined.

is hard to follow because of the two "which" clauses. Is it possible to
rewrite it?

Also, consider:

    create role alice;
    create role bob;
    grant alice to bob;

against this part of the text:

    As an exception, this event does not occur for DDL commands targeting
    shared objects:
      databases
      roles
      tablespaces
      parameter privileges
      ALTER SYSTEM
    This event also does not occur for commands targeting event triggers
    themselves.

I am not 100% sure the "roles" item covers a case like "grant alice to bob;".
Here "targeting shared objects" really means role-related metadata, so maybe
a separate item such as "role privileges" would make it easier to distinguish
"grant select on t1 to alice;" from "grant alice to bob;".
Re: [PATCH] Add roman support for to_number function
Hi, I have attached a rebased patch if someone wants to review in the next CF. No changes as compared to v4. Regards, Hunaid Sohail > From 951f3e9620f0d50fcdaff095652a07d6304b490c Mon Sep 17 00:00:00 2001 From: Hunaid Sohail Date: Wed, 30 Oct 2024 12:00:47 +0500 Subject: [PATCH v5] Add roman support for to_number function --- doc/src/sgml/func.sgml| 13 ++- src/backend/utils/adt/formatting.c| 154 +- src/test/regress/expected/numeric.out | 41 +++ src/test/regress/sql/numeric.sql | 21 4 files changed, 224 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 05f630c6a6..5f91f0401b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -8628,7 +8628,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); RN -Roman numeral (input between 1 and 3999) +Roman numeral (valid for numbers 1 to 3999) TH or th @@ -8756,6 +8756,17 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); (e.g., 9.99 is a valid pattern). + + + +In to_number, RN pattern converts +roman numerals to standard numbers. It is case-insensitive (e.g., 'XIV', +'xiv', and 'Xiv' are all seen as 14). +RN cannot be used in combination with any other formatting patterns +or modifiers, with the exception of FM, which is applicable only in +to_char() and is ignored in to_number(). + + diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 85a7dd4561..5e5e763388 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -49,7 +49,6 @@ * - better number building (formatting) / parsing, now it isn't * ideal code * - use Assert() - * - add support for roman number to standard number conversion * - add support for number spelling * - add support for string to string formatting (we must be better * than Oracle :-), @@ -270,6 +269,31 @@ static const char *const rm100[] = {"C", "CC", "CCC", "CD", "D", "DC", "DCC", "D static const char *const numTH[] = {"ST", "ND", "RD", "TH", NULL}; static const char *const numth[] = {"st", "nd", "rd", "th", NULL}; +/* -- + * MACRO: Check if the current and next characters + * form a valid subtraction combination for roman numerals + * -- + */ +#define IS_VALID_SUB_COMB(curr, next) \ + (((curr) == 'I' && ((next) == 'V' || (next) == 'X')) || \ + ((curr) == 'X' && ((next) == 'L' || (next) == 'C')) || \ + ((curr) == 'C' && ((next) == 'D' || (next) == 'M'))) + +/* -- + * MACRO: Roman number value + * -- + */ +#define ROMAN_VAL(r) \ + ((r) == 'I' ? 1 : \ + (r) == 'V' ? 5 : \ + (r) == 'X' ? 10 : \ + (r) == 'L' ? 50 : \ + (r) == 'C' ? 100 : \ + (r) == 'D' ? 500 : \ + (r) == 'M' ? 
1000 : 0) + +#define MAX_ROMAN_LEN 15 + /* -- * Flags & Options: * -- @@ -1074,6 +1098,7 @@ static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std, static char *fill_str(char *str, int c, int max); static FormatNode *NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree); static char *int_to_roman(int number); +static int roman_to_int(char* s, int len); static void NUM_prepare_locale(NUMProc *Np); static char *get_last_relevant_decnum(char *num); static void NUM_numpart_from_char(NUMProc *Np, int id, int input_len); @@ -1284,6 +1309,15 @@ NUMDesc_prepare(NUMDesc *num, FormatNode *n) case NUM_rn: case NUM_RN: + if (IS_ROMAN(num)) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot use \"RN\" twice"))); + if (IS_BLANK(num) || IS_LSIGN(num) || IS_BRACKET(num) || +IS_MINUS(num) || IS_PLUS(num) || IS_MULTI(num)) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("\"RN\" is incompatible with other formats"))); num->flag |= NUM_F_ROMAN; break; @@ -5247,7 +5281,107 @@ int_to_roman(int number) return result; } +/* Convert a standard roman numeral to an integer. + * Result is an integer between 1 and 3999. + * + * If input is invalid, return -1. + */ +static int +roman_to_int(char* s, int len) +{ + int repeatCount = 1; + int vCount = 0, lCount = 0, dCount = 0; + bool subtractionEncountered = false; + char lastSubtractedChar = 0; + int total = 0; + + /* Ensure the input is not too long or empty. + * 'MMMDCCCLXXXVIII' (3888) is the longest valid roman numeral (15 chars). + */ + if (len == 0 || len > MAX_ROMAN_LEN) + return -1; + + for (int i = 0; i < len; ++i) + { + char currChar = pg_ascii_toupper(s[i]); + int currValue = ROMAN_VAL(currChar); + + if (currValue == 0) + return -1; + + /* Ensure no character greater than or equal to the subtracted + * character appears after the subtraction. + */ + if (subtractionEncountered && (currValue >= ROMAN_VAL(lastSubtractedChar))) +
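To make the behaviour under discussion easier to follow without reading the
whole diff, here is a minimal, self-contained sketch of case-insensitive
roman-numeral accumulation. It is an illustration only, not the patch code,
and it deliberately omits the strict validity checks (repetition limits,
valid subtraction pairs, the 15-character cap) that the patch's
roman_to_int() enforces:

#include <ctype.h>

/* Value of a single roman digit; 0 means "not a roman digit". */
static int
roman_digit_value(char c)
{
	switch (toupper((unsigned char) c))
	{
		case 'I': return 1;
		case 'V': return 5;
		case 'X': return 10;
		case 'L': return 50;
		case 'C': return 100;
		case 'D': return 500;
		case 'M': return 1000;
		default: return 0;
	}
}

/*
 * Accumulate a roman numeral left to right, treating a smaller digit that
 * precedes a larger one as subtractive (IV, XL, CM, ...).  Because no
 * strictness rules are applied, some non-canonical spellings are accepted.
 * Returns -1 for empty input or any non-roman character.
 */
static int
roman_value_sketch(const char *s, int len)
{
	int		total = 0;

	if (len <= 0)
		return -1;

	for (int i = 0; i < len; i++)
	{
		int		cur = roman_digit_value(s[i]);
		int		next = (i + 1 < len) ? roman_digit_value(s[i + 1]) : 0;

		if (cur == 0)
			return -1;
		total += (cur < next) ? -cur : cur;
	}
	return total;
}

For example, roman_value_sketch("xiv", 3) returns 14 and
roman_value_sketch("MMXXIV", 6) returns 2024.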
Re: Add isolation test template in injection_points for wait/wakeup/detach
Hi, On Wed, Oct 30, 2024 at 12:40:05PM +0900, Michael Paquier wrote: > On Tue, Oct 29, 2024 at 02:06:11PM +, Bertrand Drouvot wrote: > > Yeah, agree that it would make sense to document in the test what has been > > discovered here. > > How about something like the attached? Thanks! A few comments: 1 === Nit s/Permutations like this one/Permutations like the following commented out one/ ? 2 === s/back its result/back its result once woken up/ ? 3 === s/faster than the wait/before the SQL function doing the wait returns its result/ ? With 2 === and 3 === I think it's easier to "link" both sentences that way. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Use "protocol options" name instead of "protocol extensions" everywhere
It was pointed out by Heikki in the thread around protocol-level wait-for-LSN that "protocol extensions" is a pretty confusing name, since it suggests a relation to Postgres extensions. Even though there is no such relationship at all. Attached is a small patch that aligns on the name "protocol options" instead. This terminology is already used in a bunch of the docs. Since no protocol options have been introduced yet, it seems like now is a good time to align on what to call them. It might even be worth backporting this to have our docs of previous versions be consistent. From a06619bcd5c85b6f38cb6a3d03c2a4e6ffbb9b53 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Wed, 30 Oct 2024 14:37:00 +0100 Subject: [PATCH v1] Rename protocol extensions to protocol options in docs Currently we call connection parameters that have the "_pq_." prefix both "protocol options" and "protocol extensions". This standardizes on calling them "protocol options". The primary reason to do this is so people don't associate them with Postgres extensions, with which they have nothing to do. They are completely separate things, and Postgres extensions currently cannot even define their own protocol options. --- doc/src/sgml/protocol.sgml | 2 +- src/backend/backup/basebackup_copy.c | 2 +- src/interfaces/libpq/fe-protocol3.c | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 71b6b2a535..fc1255b7eb 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -5941,7 +5941,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" In addition to the above, other parameters may be listed. Parameter names beginning with _pq_. are - reserved for use as protocol extensions, while others are + reserved for use as protocol options, while others are treated as run-time parameters to be set at backend start time. Such settings will be applied during backend start (after parsing the command-line arguments if any) and will diff --git a/src/backend/backup/basebackup_copy.c b/src/backend/backup/basebackup_copy.c index 0b4ee5f2a2..8cfcae47ad 100644 --- a/src/backend/backup/basebackup_copy.c +++ b/src/backend/backup/basebackup_copy.c @@ -9,7 +9,7 @@ * the course of that single COPY OUT. Each CopyData message begins with a * type byte, allowing us to signal the start of a new archive, or the * manifest, by some means other than ending the COPY stream. This also allows - * for future protocol extensions, since we can include arbitrary information + * for future protocol options, since we can include arbitrary information * in the message stream as long as we're certain that the client will know * what to do with it. * diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c index 8c5ac1729f..a84ea7a786 100644 --- a/src/interfaces/libpq/fe-protocol3.c +++ b/src/interfaces/libpq/fe-protocol3.c @@ -1439,8 +1439,8 @@ pqGetNegotiateProtocolVersion3(PGconn *conn) if (num > 0) { appendPQExpBuffer(&conn->errorMessage, - libpq_ngettext("protocol extension not supported by server: %s", - "protocol extensions not supported by server: %s", num), + libpq_ngettext("protocol option not supported by server: %s", + "protocol options not supported by server: %s", num), buf.data); appendPQExpBufferChar(&conn->errorMessage, '\n'); } -- 2.43.0
Re: freespace.c modifies buffer without any locks
On 29/10/2024 02:50, Andres Freund wrote:
> Hi,
>
> I just noticed that fsm_vacuum_page() modifies a buffer without even
> holding a shared lock. That quite obviously seems like a violation of
> the buffer locking protocol:
>
> 	/*
> 	 * Try to reset the next slot pointer. This encourages the use of
> 	 * low-numbered pages, increasing the chances that a later vacuum can
> 	 * truncate the relation. We don't bother with a lock here, nor with
> 	 * marking the page dirty if it wasn't already, since this is just a hint.
> 	 */
> 	if (BufferPrepareToSetHintBits(buf))
> 	{
> 		((FSMPage) PageGetContents(page))->fp_next_slot = 0;
> 		BufferFinishSetHintBits(buf);
> 	}
>
> In the commit (15c121b3ed7) adding the current freespace code, there
> wasn't even a comment remarking upon that oddity. 10 years later Tom
> added a comment, in 2b1759e2675f.
>
> I noticed this while adding a debug mode in which buffers are mprotected
> PROT_NONE/PROT_READ/PROT_READ|PROT_WRITE depending on the buffer's state.
>
> Is there any good reason to avoid a lock here? Compared to the cost of
> exclusively locking buffers during RecordAndGetPageWithFreeSpace() the
> cost of doing so during FreeSpaceMapVacuum*() seems small?

Agreed. This is a premature optimization, fsm_vacuum_page() should just
take the lock.

> Somewhat relatedly, but I don't think I understand why it's a good idea
> to reset fp_next_slot to 0 in fsm_vacuum_page(). At least doing so
> unconditionally. Per the comment: "This encourages the use of
> low-numbered pages, increasing the chances that a later vacuum can
> truncate the relation".

Yes, the next GetPageWithFreeSpace() call will need to do a little more
work to find the first page that actually has free space, if any. But
that seems insignificant compared to vacuum.

> When extending a relation, it seems we'll constantly reset the search
> back to the start of the range, even though we pretty much know that
> there's no space earlier in the relation - otherwise we'd not have
> extended.

That's a good point. Before commit a063baaced, relation extension used a
separate UpdateFreeSpaceMap() function, which didn't reset fp_next_slot.

> And when called from FreeSpaceMapVacuumRange() we'll reset fp_next_slot
> to somewhere that wasn't actually vacuumed, afaict?

Yeah. In the context of actual VACUUM rather than relation extension,
that seems fine; the next GetPageWithFreeSpace() call will fix it up
quickly.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
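For illustration, a minimal sketch of what "just take the lock" could look
like, using only the long-standing buffer-manager API. This is a sketch of
the idea, not the committed fix, and the helper name is invented:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/fsm_internals.h"

/*
 * Sketch: reset the FSM page's search hint under the ordinary buffer lock.
 * As in the existing code, the page is deliberately not marked dirty,
 * since fp_next_slot is only a hint.
 */
static void
fsm_reset_next_slot(Buffer buf, Page page)
{
	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
	((FSMPage) PageGetContents(page))->fp_next_slot = 0;
	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}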
Re: general purpose array_sort
On Wed, Oct 30, 2024 at 10:17 PM Junwang Zhao wrote: > > Hi, > > On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev > wrote: > > > > Hi, > > > > Thanks for the updated patch set. > > > > > > > +Datum > > > > > +array_sort_order(PG_FUNCTION_ARGS) > > > > > +{ > > > > > +return array_sort(fcinfo); > > > > > +} > > > > > + > > > > > +Datum > > > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS) > > > > > +{ > > > > > +return array_sort(fcinfo); > > > > > +} > > > > > > > > Any reason not to specify array_sort in pg_proc.dat? > > > > > > It is specified in 0001 (see oid => '8810'). > > > > What I meant was that I don't think these wrapper functions are > > needed. I think you can just do: > > > > ``` > > +{ oid => '8811', descr => 'sort array', > > + proname => 'array_sort', prorettype => 'anyarray', > > + proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <-- > > array_sort is used directly in `prosrc` > > ``` > > > > ... unless I'm missing something. > > There is a opr sanity check for this[1], if we remove these wrapper functions, > regression test will fail with: > > - oid | proname | oid | proname > --+-+-+- > -(0 rows) > + oid | proname | oid | proname > +--++--+ > + 8811 | array_sort | 8812 | array_sort > + 8810 | array_sort | 8811 | array_sort > + 8810 | array_sort | 8812 | array_sort > +(3 rows) > > > [1]: > > -- Considering only built-in procs (prolang = 12), look for multiple uses > -- of the same internal function (ie, matching prosrc fields). It's OK to > -- have several entries with different pronames for the same internal > function, > -- but conflicts in the number of arguments and other critical items should > -- be complained of. (We don't check data types here; see next query.) > -- Note: ignore aggregate functions here, since they all point to the same > -- dummy built-in function. > > SELECT p1.oid, p1.proname, p2.oid, p2.proname > FROM pg_proc AS p1, pg_proc AS p2 > WHERE p1.oid < p2.oid AND > p1.prosrc = p2.prosrc AND > p1.prolang = 12 AND p2.prolang = 12 AND > (p1.prokind != 'a' OR p2.prokind != 'a') AND > (p1.prolang != p2.prolang OR > p1.prokind != p2.prokind OR > p1.prosecdef != p2.prosecdef OR > p1.proleakproof != p2.proleakproof OR > p1.proisstrict != p2.proisstrict OR > p1.proretset != p2.proretset OR > p1.provolatile != p2.provolatile OR > p1.pronargs != p2.pronargs); > > > > > -- > > Best regards, > > Aleksander Alekseev > > > > -- > Regards > Junwang Zhao CFbot failed with doc build, v10 fixed that. -- Regards Junwang Zhao v10-0002-support-sort-order-and-nullsfirst-flag.patch Description: Binary data v10-0001-general-purpose-array_sort.patch Description: Binary data
Re: protocol-level wait-for-LSN
>> So yes, each protocol extension needs to know about all the other
>> protocol extensions that it can be used with. In practice we'll avoid
>> doing crazy stuff so that the protocol extensions are orthogonal
>
> Just as an example, let's say we add a server-side query time to the
> protocol (which honestly seems like a pretty useful feature). So that
> ReadyForQuery now returns the query time if the query_time protocol.
> For clients it isn't difficult at all to support any combination of
> query_time & wait_for_lsn options. As long as we define that the
> wait_for_lsn field is before the query_time field if both exist, then
> two simple if statements like this would do the trick:
>
> if (wait_for_lsn_enabled) {
> // interpret next field as LSN
> }
> if (query_time_enabled) {
> // interpret next field as query time
> }

But

if (query_time_enabled) {
// interpret next field as query time
}
if (wait_for_lsn_enabled) {
// interpret next field as LSN
}

doesn't work, right? I don't like that clients need to know the exact
order of the protocol extensions.

BTW,

> Just as an example, let's say we add a server-side query time to the
> protocol (which honestly seems like a pretty useful feature). So that
> ReadyForQuery now returns the query time if the query_time protocol.

Probably it's better for CommandComplete to return the query time,
because there could be multiple query times in a multi-statement query
or with the extended query protocol.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
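To put the ordering question in standalone form, here is a sketch with
hypothetical names (not libpq code) in which the extra ReadyForQuery fields
are always consumed in one spec-defined order, so a client only needs to know
which options it negotiated, not which other options exist:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct ExtraReadyForQueryFields
{
	int64_t	wait_for_lsn;	/* valid only if wait_for_lsn was negotiated */
	int64_t	query_time_us;	/* valid only if query_time was negotiated */
} ExtraReadyForQueryFields;

/*
 * Consume the optional trailing fields of a hypothetical extended
 * ReadyForQuery message.  The order below (wait_for_lsn, then query_time)
 * is assumed to be fixed by the specification; reversing the two blocks
 * would misread the stream whenever both options are active.  Byte-order
 * conversion is omitted for brevity.
 */
static const char *
parse_extra_fields(const char *buf, bool wait_for_lsn_enabled,
				   bool query_time_enabled, ExtraReadyForQueryFields *out)
{
	if (wait_for_lsn_enabled)
	{
		memcpy(&out->wait_for_lsn, buf, sizeof(out->wait_for_lsn));
		buf += sizeof(out->wait_for_lsn);
	}
	if (query_time_enabled)
	{
		memcpy(&out->query_time_us, buf, sizeof(out->query_time_us));
		buf += sizeof(out->query_time_us);
	}
	return buf;
}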
Re: Popcount optimization using AVX512
On Wed, Oct 30, 2024 at 08:53:10PM +, Raghuveer Devulapalli wrote: > BTW, I just realized function attributes for xsave and avx512 don't work > on MSVC (see > https://developercommunity.visualstudio.com/t/support-function-target-attribute-and-mutiversioning/10130630). > Not sure if you care about it. Its an easy fix (see > https://gcc.godbolt.org/z/Pebdj3vMx). Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the configure programs for meson. pg_attribute_target will be empty on MSVC, and I believe we only support meson builds there. -- nathan
Re: New "raw" COPY format
On Tue, Oct 29, 2024 at 9:48 AM Joel Jacobson wrote: > > > --- > > It's a bit odd to me to use the delimiter as a EOL marker in raw > > format, but probably it's okay. > > > > --- > > - if (cstate->opts.format != COPY_FORMAT_BINARY) > > + if (cstate->opts.format == COPY_FORMAT_RAW && > > + cstate->opts.delim != NULL) > > + { > > + /* Output the user-specified delimiter between rows */ > > + CopySendString(cstate, cstate->opts.delim); > > + } > > + else if (cstate->opts.format == COPY_FORMAT_TEXT || > > +cstate->opts.format == COPY_FORMAT_CSV) > > > > Since it sends the delimiter as a string, even if we specify the > > delimiter to '\n', it doesn't send the new line (i.e. ASCII LF, 10). > > For example, > > > > postgres(1:904427)=# copy (select '{"j" : 1}'::jsonb) to stdout with > > (format 'raw', delimiter '\n'); > > {"j": 1}\npostgres(1:904427)=# > > > > I think there is a similar problem in COPY FROM; if we set a delimiter > > to '\n' when doing COPY FROM in raw format, it expects the string '\n' > > as a line termination but not ASCII LF(10). I think that input data > > normally doesn't use the string '\n' as a line termination. > > You need to use E'\n' to get ASCII LF(10), since '\n' is just a delimiter > consisting of backslash followed by "n". > > Is this a problem? Since any string can be used as delimiter, > I think it would be strange if we parsed it and replaced the string > with a different string. > > Another thought: > > Maybe we shouldn't default to no delimiter after all, > maybe it would be better to default to the OS default EOL, It seems to be useful for loading unstructured non-delimited text files such as log files. Users can set the delimiter an empty when loading the entire file to a single column. On the other hand, I think that If we make it default it might not be necessary to allow the delimiter to be multi bytes. It would be flexible but it would not necessarily be necessary. Also, it would be somewhat odd to me that we can use multi bytes characters as the delimiter only in 'raw' mode, but not in 'text' mode. > maybe a final delimiter should always be written at the end, > so that when exporting a single json field, it would get exported > to the text file with \n at the end, which is what most text editor > does when saving a .json file. Agreed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: CREATE OR REPLACE MATERIALIZED VIEW
On 2024-09-05 22:33 +0200, Erik Wienhold wrote: > On 2024-07-27 02:45 +0200, Erik Wienhold wrote: > > On 2024-07-12 16:49 +0200, Said Assemlal wrote: > > > > My initial idea, while writing the patch, was that one could replace the > > > > matview without populating it and then run the concurrent refresh, like > > > > this: > > > > > > > > CREATE OR REPLACE MATERIALIZED VIEW foo AS ... WITH NO DATA; > > > > REFRESH MATERIALIZED VIEW CONCURRENTLY foo; > > > > > > > > But that won't work because concurrent refresh requires an already > > > > populated matview. > > > > > > > > Right now the patch either populates the replaced matview or leaves it > > > > in an unscannable state. Technically, it's also possible to skip the > > > > refresh and leave the old data in place, perhaps by specifying > > > > WITH *OLD* DATA. New columns would just be null. Of course you can't > > > > tell if you got stale data without knowing how the matview was replaced. > > > > Thoughts? > > > > > > I believe the expectation is to get materialized views updated whenever it > > > gets replaced so likely to confuse users ? > > > > I agree, that could be confusing -- unless it's well documented. The > > attached 0003 implements WITH OLD DATA and states in the docs that this > > is intended to be used before a concurrent refresh. > > > > Patch 0001 now covers all matview cases in psql's tab completion. I > > missed some of them with v1. > > Here's a rebased version due to conflicts with f683d3a4ca and > 1e35951e71. No other changes since v2. rebased -- Erik >From eefed7938bc1511a1ae998f67c67661c366484ec Mon Sep 17 00:00:00 2001 From: Erik Wienhold Date: Tue, 21 May 2024 18:35:47 +0200 Subject: [PATCH v4 1/3] Add CREATE OR REPLACE MATERIALIZED VIEW --- .../sgml/ref/create_materialized_view.sgml| 15 +- src/backend/commands/createas.c | 207 ++ src/backend/commands/tablecmds.c | 8 +- src/backend/commands/view.c | 106 ++--- src/backend/parser/gram.y | 15 ++ src/bin/psql/tab-complete.in.c| 15 +- src/include/commands/view.h | 3 + src/include/nodes/parsenodes.h| 2 +- src/include/nodes/primnodes.h | 1 + src/test/regress/expected/matview.out | 191 src/test/regress/sql/matview.sql | 108 + 11 files changed, 582 insertions(+), 89 deletions(-) diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml index 62d897931c..5e03320eb7 100644 --- a/doc/src/sgml/ref/create_materialized_view.sgml +++ b/doc/src/sgml/ref/create_materialized_view.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name +CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name [ (column_name [, ...] ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) ] @@ -60,6 +60,17 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name Parameters + +OR REPLACE + + + Replaces a materialized view if it already exists. + Specifying OR REPLACE together with + IF NOT EXISTS is an error. + + + + IF NOT EXISTS @@ -67,7 +78,7 @@ CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name Do not throw an error if a materialized view with the same name already exists. A notice is issued in this case. Note that there is no guarantee that the existing materialized view is anything like the one that would - have been created. + have been created, unless you use OR REPLACE instead. 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 5c92e48a56..d4bc5e5c08 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -79,55 +79,151 @@ static void intorel_destroy(DestReceiver *self); static ObjectAddress create_ctas_internal(List *attrList, IntoClause *into) { - CreateStmt *create = makeNode(CreateStmt); - bool is_matview; + bool is_matview, +replace = false; char relkind; - Datum toast_options; - const char *const validnsps[] = HEAP_RELOPT_NAMESPACES; + Oid matviewOid = InvalidOid; ObjectAddress intoRelationAddr; /* This code supports both CREATE TABLE AS and CREATE MATERIALIZED VIEW */ is_matview = (into->viewQuery != NULL); relkind = is_matview ? RELKIND_MATVIEW : RELKIND_RELATION; - /* - * Create the target relation by faking up a CREATE TABLE parsetree and - * passing it to DefineRelation. - */ - create->relation = into->rel; - create->tableElts = attrList; - create->inhRelations = NIL; - create->ofTypename = NULL; - create->constraints = NIL; - create->options = into->options; - create->oncommit = into->onCommit; - create->tablespacename = into->tableSpaceName; - create->if_not_exists = false; - create->accessMeth
Re: per backend I/O statistics
Hi, On Tue, Oct 08, 2024 at 04:28:39PM +, Bertrand Drouvot wrote: > > > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: > > > > Okay, per the above and the persistency of the stats. > > Great, I'll work on an updated patch version then. > I spend some time on this during the last 2 days and I think we have 3 design options. === GOALS === But first let's sump up the goals that I think we agreed on: - Keep pg_stat_io as it is today: give the whole server picture and serialize the stats to disk. - Introduce per-backend IO stats and 2 new APIs to: 1. Provide the IO stats for "my backend" (through say pg_my_stat_io), this would take care of the stats_fetch_consistency. 2. Retrieve the IO stats for another backend (through say pg_stat_get_backend_io(pid)) that would _not_ take care of stats_fetch_consistency, as: 2.1/ I think that there is no use case (there is no need to get others backends I/O statistics while taking care of the stats_fetch_consistency) 2.2/ That could be memory expensive to store a snapshot for all the backends (depending of the number of backend created) - There is no need to serialize the per-backend IO stats to disk (no point to see stats for backends that do not exist anymore after a re-start). - The per-backend IO stats should be variable-numbered (not fixed), as per up-thread discussion. === OPTIONS === So, based on this, I think that we could: Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this KIND take care of the aggregated view (pg_stat_io) and the per-backend stats. Option 2: let PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO that would be variable-numbered. Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that would be variable-numbered and store the "aggregated stats aka pg_stat_io" in shared memory (not part of the variable-numbered hash). Per-backend stats could be aggregated into "pg_stat_io" during the flush_pending_cb call for example. === BEST OPTION? === I would opt for Option 2 as: - The stats system is currently not designed for Option 1 and our goals (for example the shared_data_len is used to serialize but also to fetch the entries, see pgstat_fetch_entry()) so that would need some hack to serialize only a part of them and still be able to fetch them all). - Mixing "fixed" and "variable" in the same KIND does not sound like a good idea (though that might be possible with some hacks, I don't think that would be easy to maintain). - Having the per-backend as "variable" in its dedicated kind looks more reasonable and less error-prone. - I don't think there is a stats design similar to option 3 currently, so I'm not sure there is a need to develop something new while Option 2 could be done. - Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization part. - Option 2 seems to offer more flexibility (as compare to Option 1 and 3). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Having problems generating a code coverage report
On Wed, Oct 30, 2024 at 5:21 PM Aleksander Alekseev wrote: > I'm using Xubuntu 24.04 LTS and my lcov version is: > > ``` > $ lcov --version > lcov: LCOV version 2.0-1 > ``` I use Debian unstable for most of my day to day work. Apparently Debian unstable has exactly the same version of lcov as Ubuntu 24.04. I've also been unable to generate coverage reports for some time (at least on Debian, with LCOV version 2.0-1). -- Peter Geoghegan
Re: define pg_structiszero(addr, s, r)
+/* + * Test if a memory region starting at p and of size len is full of zeroes. + */ +static inline bool +pg_mem_is_all_zeros(const void *ptr, size_t len) The function comment should say 'ptr' instead of 'p', right? == Kind Regards, Peter Smith. Fujitsu Australia
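For readers skimming the thread: the semantics being discussed amount to this
minimal byte-wise sketch. It is an illustration only, not the patch's actual
helper (whose implementation and callers are in the attached diff):

#include <stdbool.h>
#include <stddef.h>

/* Reference semantics: true iff every byte in [ptr, ptr + len) is zero. */
static inline bool
mem_is_all_zeros_sketch(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}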
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Hi~ I did some performance test these days, and I have some findings. From the archive months ago, I found there were discussions about which type of TupleTableSlot to use for buffering tuples. A single column mat view was used for evaluation. Finally we used virtual one. However when I test with a 32-columns mat view, I get regression. Test case: -- prepare create table test as select i as id0, i + 1 as id1, i + 2 as id2, i + 3 as id3, i + 4 as id4, i + 5 as id5, i + 6 as id6, i + 7 as id7, i + 8 as id8, i + 9 as id9, i + 10 as id10, i + 11 as id11, i + 12 as id12, i + 13 as id13, i + 14 as id14, i + 15 as id15, i + 0.01 as f0, i + 0.1 as f1, i + 0.2 as f2, i + 0.3 as f3, i + 0.4 as f4, i + 0.5 as f5, i + 0.6 as f6, i + 0.7 as f7, i + 0.8 as f8, i + 0.9 as f9, i + 1.01 as f10, i + 1.1 as f11, i + 1.2 as f12, i + 1.3 as f13, i + 1.4 as f14, i + 1.5 as f15, i + 1.6 as f16 from generate_series(1,500) i; -- run create materialized view m1 as select * from test; HEAD: Time: 13615.542 ms (00:13.616) Time: 13545.706 ms (00:13.546) Time: 13578.475 ms (00:13.578) Patched Time: 20112.734 ms (00:20.113) Time: 19996.957 ms (00:19.997) Time: 19936.871 ms (00:19.937) I did a quick perf, the overhead seems to come from virtual tuple materialization. HEAD: 12.29% postgres[.] pg_checksum_block 6.33% postgres[.] GetPrivateRefCountEntry 5.40% postgres[.] pg_comp_crc32c_sse42 4.54% [kernel][k] copy_user_enhanced_fast_string 2.69% postgres[.] BufferIsValid 1.52% postgres[.] XLogRecordAssemble Patched: 11.75% postgres[.] tts_virtual_materialize 8.87% postgres[.] pg_checksum_block 8.17% postgres[.] slot_deform_heap_tuple 8.09% postgres[.] heap_compute_data_size 6.17% postgres[.] fill_val 3.81% postgres[.] heap_fill_tuple 3.37% postgres[.] tts_virtual_copyslot 2.62% [kernel][k] copy_user_enhanced_fast_string Not sure if it is a universal situation. — Regards, Jingtang
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Wed, Oct 30, 2024 at 03:54:32PM -0500, Nathan Bossart wrote: > I'll manage. 0001 was a doozy to back-patch, and this obviously isn't a > terribly pressing issue, so I plan to wait until after the November minor > release to apply this. Okay by me. > I'm having second thoughts on 0002, so I'll > probably leave that one uncommitted, but IMHO we definitely need 0003 to > prevent this issue from reoccurring. I liked your idea behind 0002, FWIW. We do that for a lot of fields already in a Relation. -- Michael signature.asc Description: PGP signature
Re: Add isolation test template in injection_points for wait/wakeup/detach
On Wed, Oct 30, 2024 at 09:27:13AM +, Bertrand Drouvot wrote: > Thanks! LGTM. Thanks for the review. Applied that, then. -- Michael signature.asc Description: PGP signature
Re: define pg_structiszero(addr, s, r)
On Thu, Oct 31, 2024 at 07:55:45AM +1100, Peter Smith wrote: > +/* > + * Test if a memory region starting at p and of size len is full of zeroes. > + */ > +static inline bool > +pg_mem_is_all_zeros(const void *ptr, size_t len) > > The function comment should say 'ptr' instead of 'p', right? Yes. +static inline bool +pg_mem_is_all_zeros(const void *ptr, size_t len) While we're talking about wordsmithing things, I would not choose "mem" for this routine, just stick to "memory". There is not much in the code that does memory-specific things like what you are proposing here. Still, this would be more consistent with the macros for memory barriers at least. Hence, "pg_memory_is_all_zeros()" makes more sense? The location of memutils.h is sensible. + if (pg_mem_is_all_zeros(pagebytes , (BLCKSZ / sizeof(size_t Extra space not required after pagebytes. -- Michael signature.asc Description: PGP signature
Re: Avoid detoast overhead when possible
Nikita Malakhov writes:
> 1) In short the idea behind the Json improvements is to introduce iterators
> for Json objects to extract these objects value by value,

OK here.

> and modification
> of in-memory and on-disk representations to hold partially-detoasted Json
> objects (tree nodes with need-detoast state) for in-memory, and to store
> key-value map in on-disk representation to extract only required
> values.

I have trouble following here. Could you explain this using the example
below? What will be done differently from the current behavior?

CREATE TABLE t(jbcol jsonb);
INSERT INTO t values ('{"a": 1, "b": 2}');  // new storage format?
SELECT jbcol->'a', jbcol->'b' FROM t;       // How does ExprInterp.c work with this?

> 2) Backward compatibility would not be a problem because in on-disk
> representation we could distinguish old version from new.

OK, so you do change the storage format. If so, what does the new format
look like? That would be an important factor to discuss.

> there are enough service bits for that.

OK, good news.

> But servers not updated with this patch could not parse new data, of
> course.

It looks that way to me too.

>
> 3) Yes, it definitely has to. These changes are very complex and invasive,
> that's the reason I haven't finished the patch.

This is the area I am really worried about. I think you could prepare a
high-level design document for review first. It would be great if some more
experienced people could have a look at it; however, I know that is hard
because of their bandwidth. You need such a document anyway: without the
details, it is hard for other people to understand the idea in your mind.

>
> 4) It doesn't seem worthy at first glance. Do you have any ideas on
> this?

I think the more important questions are (a) what the new storage looks
like, and (b) how it works with the expression execution engine. Without
that information, I think it is too soon to talk about this.

--
Best Regards
Andy Fan
Re: Add ExprState hashing for GROUP BY and hashed SubPlans
On Tue, 29 Oct 2024 at 22:47, David Rowley wrote: > I've attached an updated patch with a few other fixes. Whilr checking > this tonight, noticed that master does not use > SubPlanState.tab_eq_funcs for anything. I resisted removing that in > this patch. Perhaps a follow-on patch can remove that. I suspect it's > not been used for a long time now, but I didn't do the archaeology > work to find out. 3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a rebased patch. David v3-0001-Use-ExprStates-for-hashing-in-GROUP-BY-and-SubPla.patch Description: Binary data
Count and log pages set all-frozen by vacuum
Vacuum currently counts and logs the number of pages of a relation with newly frozen tuples. It does not count the number of pages newly set all-frozen in the visibility map. The number of pages set all-frozen in the visibility map by a given vacuum can be useful when analyzing which normal vacuums reduce the number of pages future aggressive vacuum need to scan. Also, empty pages that are set all-frozen in the VM don't show up in the count of pages with newly frozen tuples. When making sense of the result of visibilitymap_summary() for a relation, it's helpful to know how many pages were set all-frozen in the VM by each vacuum. The attached patch set makes visibilitymap_set() return the old value of the block's VM bits and then uses that to increment a new counter in the LVRelState. Here is example logging output with the patch: create table foo (a int, b int) with (autovacuum_enabled = false); insert into foo select generate_series(1,1000), 1; delete from foo where a > 50; vacuum (freeze, verbose) foo; finished vacuuming "melanieplageman.public.foo": index scans: 0 pages: 4 removed, 1 remain, 5 scanned (100.00% of total) tuples: 950 removed, 50 remain, 0 are dead but not yet removable frozen: 1 pages from table (20.00% of total) had 50 tuples frozen. 5 pages set all-frozen in the VM - Melanie v1-0003-Count-pages-set-all-frozen-in-VM-during-vacuum.patch Description: Binary data v1-0002-Make-visibilitymap_set-return-previous-state-of-v.patch Description: Binary data v1-0001-Rename-LVRelState-frozen_pages.patch Description: Binary data
RE: Popcount optimization using AVX512
> Oh, good catch. IIUC we only need to check for #ifndef _MSC_VER in the > configure programs for meson. pg_attribute_target will be empty on MSVC, and > I > believe we only support meson builds there. Right. __has_attribute (target) produces a compiler warning on MSVC: https://gcc.godbolt.org/z/EfWGxbvj3. Might need to guard that with #if defined(__has_attribute) to get rid of it. > > -- > nathan
Re: Pgoutput not capturing the generated columns
On Thu, Oct 31, 2024 at 1:14 PM vignesh C wrote: > > On Thu, 31 Oct 2024 at 04:42, Peter Smith wrote: > > > > On Thu, Oct 31, 2024 at 3:16 AM vignesh C wrote: > > > > > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila wrote: > > > > > > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote: > > > > > > > > > > Thank you for reporting this issue. The attached v46 patch addresses > > > > > the problem and includes some adjustments to the comments. Thanks to > > > > > Amit for sharing the comment changes offline. > > > > > > > > > > > > > Pushed. Kindly rebase and send the remaining patches. > > > > > > Thanks for committing this patch, here is a rebased version of the > > > remaining patches. > > > > > > > Hi, > > > > I found that the docs of src/sgml/ddl.sgml [1] are still saying: > > > > > > Generated columns are skipped for logical replication and cannot be > > specified in a CREATE PUBLICATION column list. > > > > > > But that is contrary to the new behaviour after the "Replicate > > generated columns when specified in the column list." commit yesterday > > [2]. > > > > It looks like an oversight. I think updating that paragraph should > > have been included with yesterday's commit. > > Thanks for the findings, the attached patch has the changes for the same. > LGTM. == Kind Regards, Peter Smith. Fujitsu Australia
Having problems generating a code coverage report
Hi hackers, Recently I tried to build a code coverage report according to the documentation [1]. When using the following command: ``` time sh -c 'git clean -dfx && meson setup --buildtype debug -DPG_TEST_EXTRA="kerberos ldap ssl" -Db_coverage=true -Dldap=disabled -Dssl=openssl -Dcassert=true -Dtap_tests=enabled -Dprefix=/home/eax/pginstall build && ninja -C build && PG_TEST_EXTRA=1 meson test -C build && ninja -C build coverage-html' ``` ... I get: ``` geninfo: ERROR: Unexpected negative count '-3' for /home/eax/projects/c/postgresql/src/port/snprintf.c:532. Perhaps you need to compile with '-fprofile-update=atomic (use "geninfo --ignore-errors negative ..." to bypass this error) Traceback (most recent call last): File "/usr/bin/meson", line 40, in sys.exit(mesonmain.main()) File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 294, in main return run(sys.argv[1:], launcher) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 282, in run return run_script_command(args[1], args[2:]) ^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 223, in run_script_command return module.run(script_args) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 206, in run return coverage(options.outputs, options.source_root, ^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 123, in coverage subprocess.check_call([lcov_exe, File "/usr/lib/python3.12/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['lcov', '--directory', '/home/eax/projects/c/postgresql/build', '--capture', '--output-file', '/home/eax/projects/c/postgresql/build/meson-logs/coverage.info.run', '--no-checksum', '--rc', 'branch_coverage=1']' returned non-zero exit status 1. ninja: build stopped: subcommand failed. ``` If I add -fprofile-update=atomic as suggested: ``` time CFLAGS="-fprofile-update=atomic" CXXFLAGS="-fprofile-update=atomic" sh -c 'git clean -dfx && meson setup --buildtype debug -DPG_TEST_EXTRA="kerberos ldap ssl" -Db_coverage=true -Dldap=disabled -Dssl=openssl -Dcassert=true -Dtap_tests=enabled -Dprefix=/home/eax/pginstall build && ninja -C build && PG_TEST_EXTRA=1 meson test -C build && ninja -C build coverage-html' ``` ... 
I get: ``` genhtml: ERROR: duplicate merge record src/include/catalog Traceback (most recent call last): File "/usr/bin/meson", line 40, in sys.exit(mesonmain.main()) File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 294, in main return run(sys.argv[1:], launcher) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 282, in run return run_script_command(args[1], args[2:]) ^ File "/usr/lib/python3/dist-packages/mesonbuild/mesonmain.py", line 223, in run_script_command return module.run(script_args) ^^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 206, in run return coverage(options.outputs, options.source_root, ^^ File "/usr/lib/python3/dist-packages/mesonbuild/scripts/coverage.py", line 150, in coverage subprocess.check_call([genhtml_exe, File "/usr/lib/python3.12/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['genhtml', '--prefix', '/home/eax/projects/c/postgresql/build', '--prefix', '/home/eax/projects/c/postgresql', '--output-directory', '/home/eax/projects/c/postgresql/build/meson-logs/coveragereport', '--title', 'Code coverage', '--legend', '--show-details', '--branch-coverage', '/home/eax/projects/c/postgresql/build/meson-logs/coverage.info']' returned non-zero exit status 1. ninja: build stopped: subcommand failed. ``` I'm using Xubuntu 24.04 LTS and my lcov version is: ``` $ lcov --version lcov: LCOV version 2.0-1 ``` I tried using Autotools with the same results. Pretty confident it worked before. I'm wondering if anyone else experienced this lately and/or knows a workaround. [1]: https://www.postgresql.org/docs/current/regress-coverage.html -- Best regards, Aleksander Alekseev
Re: Interrupts vs signals
On Wed, Oct 30, 2024 at 01:23:54PM -0400, Robert Haas wrote: > On Wed, Oct 30, 2024 at 12:03 PM Heikki Linnakangas wrote: >> We can provide backwards compatibility macros and a new facility to >> allocate custom interrupt bits. But how big of a problem is this anyway? >> In an in-person chat last week, Andres said something like "this will >> break every extension". > > Seems hyperbolic. Most extensions that rely on bgworkers, at least. > However, that was also pretty easy to fix. This seems like it might be > a little more complicated. The advantage of the breakage is also to force extension maintainers to look at the changes because these have benefits. I'd like to think that 6f3bd98ebfc0 was a good move overall even if it broke come compilations and these require more PG_VERSION_NUM magic. For the extension maintainer hat on, it is always annoying, but not really that bad in the long-term. >> We have a few options for how to deal with backwards-compatibility for >> extensions: >> >> A) If we rip off the bandaid in one go and don't provide any >> backwards-compatibility macros, we will break 96 extensions. Most of >> them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with >> corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With >> #ifdef for compatibility with older postgres versions) >> >> B) If we provide backwards-compatibility macros so that simple Latch >> calls on MyLatch continue working, we will break about 14 extensions. >> They will need some tweaking like in option A). A bit more than the >> simple cases in option A), but nothing too difficult. >> >> C) We could provide "forward-compatibility" macros in a separate header >> file, to make the new "SetInterrupt" etc calls work in old PostgreSQL >> versions. Many of the extensions already have a header file like this, >> see e.g. citusdata/citus/src/include/pg_version_compat.h, >> pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea >> to provide a semi-official header file like this on the Postgres wiki, >> to help extension authors. It would encourage extensions to use the >> latest idioms, while still being able to compile for older versions. >> >> I'm leaning towards option C). Let's rip off the band-aid, but provide >> documentation for how to adapt your extension code. And provide a >> forwards-compatibility header on the wiki, that extension authors can >> use to make the new Interrupt calls work against old server versions. > > I don't know which of these options is best, but I don't find any of > them categorically unacceptable. Looking at the compatibility macros of 0008 for the latches with INTERRUPT_GENERAL_WAKEUP under latch.h, the changes are not that bad to adapt to, IMO. It reminds of f25968c49697: hard breakage, no complaints I've heard of because I guess that most folks have been using an in-house compatibility headers. A big disadvantage of B is that someone may decide to add new code in core that depends on the past routines, and we'd better avoid that for this new layer of APIs for interrupt handling. A is a subset of C: do a hard switch in the core code, with C mentioning a compatibility layer in the wiki that does not exist in the core code. Any of A or C is OK, I would not choose B for the core backend. -- Michael signature.asc Description: PGP signature
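As a concrete, purely illustrative example of option C, an extension's own
compatibility header could carry something like the following. The new-style
names follow this thread's proposal and are assumptions, not a settled API,
and the version cutoff is a placeholder:

/* Hypothetical forward-compatibility shim for out-of-core code. */
#if PG_VERSION_NUM < 180000
#include "miscadmin.h"
#include "storage/latch.h"

/* On old servers every "interrupt" maps onto the process latch. */
#define INTERRUPT_GENERAL_WAKEUP	0	/* placeholder, ignored below */
#define RaiseInterrupt(interrupt)	SetLatch(MyLatch)
#define ResetInterrupt(interrupt)	ResetLatch(MyLatch)
#define WaitInterrupt(interrupt, wakeEvents, timeout, wait_event_info) \
	WaitLatch(MyLatch, (wakeEvents), (timeout), (wait_event_info))
#endif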
Re: Cleanup SubPlanstate
On Wed, 30 Oct 2024 at 03:34, Rafia Sabih wrote: > While reviewing a related patch, it came to the notice that tab_eq_funcs in > the SubPlanState is not used. Hence the patch. Thanks for picking that up. I didn't quite understand why you adjusted the header comment for the TupleHashEntryData struct when the field you removed was in the SubPlanState struct, so I left that part out. However, I did fix the existing typo in that comment. I've pushed the resulting patch. David
RE: Conflict detection for update_deleted in logical replication
Dear Mikhail,

Thanks for the reply!

>> Anyway, this topic introduces huge complexity and is not mandatory for
>> update_deleted detection. We can work on it in later versions based on
>> the needs.
>
> From my perspective, this is critical for databases. REINDEX CONCURRENTLY
> is typically run in production databases on regular basic, so any
> master-master system should be unaffected by it.

I think what I said was not understood correctly. The main point here is
that an index scan is not required to detect update_deleted: in the first
version, workers can do a normal sequential scan instead. This workaround
definitely does not affect REINDEX CONCURRENTLY.

Once the patch is in good shape or has been pushed, we can add support for
using an index to find the dead tuples, and at that time we can consider
how to ensure the index contains entries for dead tuples.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Re: CREATE OR REPLACE MATERIALIZED VIEW
On Tue, Jul 02, 2024 at 01:46:21PM +0300, Aleksander Alekseev wrote: > I can imagine how this may impact many applications and upset many > software developers worldwide. Was there even a precedent (in the > recent decade or so) when PostgreSQL broke the SQL syntax? We're usually very careful about that, and maintaining a backward-compatible grammar has a minimal cost in the parser. The closest thing I can think of that has a rather complicated grammar is COPY, which has *two* legacy grammars still supported, one for ~7.3 and one for ~9.0. > To clarify, I'm not opposed to this idea. If we are fine with breaking > backward compatibility on the SQL level, this would allow dropping the > support of inherited tables some day, a feature that in my humble > opinion shouldn't exist (I realize this is another and very debatable > question though). I just don't think this is something we ever do in > this project. But I admit that this information may be incorrect > and/or outdated. I am not sure that there is much to gain with this proposal knowing that the commands with matviews have been around for quite a long time now. Particularly, after looking at 0001, you'd see that it shortcuts a couple of areas of the CTAS code because that's what we are relying on when building the initial data of matviews. Hence, implementation-wise in the backend, matviews are much closer to physical relations than views. This is trying to make matviews behave more consistently with views. This topic has been mentioned once on pgsql-general back in 2019, for reference: https://www.postgresql.org/message-id/CAAWhf%3DPHch2H3ekYnbafuwqWqwyRok8WVPaDxKosZE4GQ2pq5w%40mail.gmail.com -- Michael signature.asc Description: PGP signature
Re: [PATCH] Add array_reverse() function
On Wed, Oct 30, 2024 at 10:11:20PM +0300, Пополитов Владлен wrote: > Makes me wonder if we should have an utility function which would > contain common code for array_shuffle() and array_reverse(). > Considering inconveniences such an approach caused in case of common > code for text and bytea types [1] I choose to copy the code and modify > it for the task. We are talking about 20 lines copied over to reverse the order of these elements that require their own inner for the element lookups and manipulations. FWIW, I can live with this burden rather than invent a routine that encapsulates both as this has also the disadvantage of hiding more internals across more layers. The abstraction with array_reverse_n() is unnecessary, but you have an argument with the consistency and the shuffling inner routine. Perhaps somebody will reuse that to invent a function that works on a subset, where the comment on top array_reverse_n() about the type caching still works. Anyway, no objections to what the patch defines and does. So it looks rather OK seen from here, as you are proposing. -- Michael signature.asc Description: PGP signature
Re: Pgoutput not capturing the generated columns
On Thu, 31 Oct 2024 at 04:42, Peter Smith wrote: > > On Thu, Oct 31, 2024 at 3:16 AM vignesh C wrote: > > > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila wrote: > > > > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote: > > > > > > > > Thank you for reporting this issue. The attached v46 patch addresses > > > > the problem and includes some adjustments to the comments. Thanks to > > > > Amit for sharing the comment changes offline. > > > > > > > > > > Pushed. Kindly rebase and send the remaining patches. > > > > Thanks for committing this patch, here is a rebased version of the > > remaining patches. > > > > Hi, > > I found that the docs of src/sgml/ddl.sgml [1] are still saying: > > > Generated columns are skipped for logical replication and cannot be > specified in a CREATE PUBLICATION column list. > > > But that is contrary to the new behaviour after the "Replicate > generated columns when specified in the column list." commit yesterday > [2]. > > It looks like an oversight. I think updating that paragraph should > have been included with yesterday's commit. Thanks for the findings, the attached patch has the changes for the same. Regards, Vignesh From 6cf390c33d1b88836da2ec0753562568b20b39e1 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Thu, 31 Oct 2024 07:36:37 +0530 Subject: [PATCH] Update documentation for generated columns in logical replication The previous commit (745217a051) introduced support for logical replication of generated columns when specified alongside a column list. However, the documentation for generated columns was not updated to reflect this change. This commit addresses that oversight by adding clarification on the support for generated columns in the logical replication section. --- doc/src/sgml/ddl.sgml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index f6344b3b79..f02f67d7b8 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -514,8 +514,9 @@ CREATE TABLE people ( - Generated columns are skipped for logical replication and cannot be - specified in a CREATE PUBLICATION column list. + Generated columns can be replicated during logical replication by + including them in the column list of the + CREATE PUBLICATION command. -- 2.34.1
Re: Pgoutput not capturing the generated columns
On Thu, Oct 31, 2024 at 3:16 AM vignesh C wrote: > > On Wed, 30 Oct 2024 at 15:06, Amit Kapila wrote: > > > > On Tue, Oct 29, 2024 at 8:50 PM vignesh C wrote: > > > > > > Thank you for reporting this issue. The attached v46 patch addresses > > > the problem and includes some adjustments to the comments. Thanks to > > > Amit for sharing the comment changes offline. > > > > > > > Pushed. Kindly rebase and send the remaining patches. > > Thanks for committing this patch, here is a rebased version of the > remaining patches. > Hi, I found that the docs of src/sgml/ddl.sgml [1] are still saying: Generated columns are skipped for logical replication and cannot be specified in a CREATE PUBLICATION column list. But that is contrary to the new behaviour after the "Replicate generated columns when specified in the column list." commit yesterday [2]. It looks like an oversight. I think updating that paragraph should have been included with yesterday's commit. == [1] https://github.com/postgres/postgres/blob/master/doc/src/sgml/ddl.sgml [2] https://github.com/postgres/postgres/commit/745217a051a9341e8c577ea59a87665d331d4af0 Kind Regards, Peter Smith. Fujitsu Australia
Re: Add ExprState hashing for GROUP BY and hashed SubPlans
On 10/31/24 08:16, David Rowley wrote:
> On Tue, 29 Oct 2024 at 22:47, David Rowley wrote:
>> I've attached an updated patch with a few other fixes. While checking
>> this tonight, I noticed that master does not use SubPlanState.tab_eq_funcs
>> for anything. I resisted removing that in this patch. Perhaps a follow-on
>> patch can remove that. I suspect it's not been used for a long time now,
>> but I didn't do the archaeology work to find out.
>
> 3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a rebased
> patch.

Thanks for sharing this. I still need to dive deeply into the code. But I
have one annoying user case where the user complained that SQL Server was
about 4x faster than Postgres, and I guess it is a good benchmark for your
code. This query is remarkable because of its high grouping computation
load. Of course, I can't provide the user's data, but I have prepared a
synthetic test to reproduce the case (see attachment).

Comparing master with and without your patch, the first thing I see is much
more extensive memory usage (see the complete explains in the attachment):

Current master:
---
 Partial HashAggregate  (cost=54492.60..55588.03 rows=19917 width=889) (actual time=20621.028..20642.664 rows=10176 loops=9)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 74513kB

Patched:
 Partial HashAggregate  (cost=54699.91..55799.69 rows=19996 width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 261  Memory Usage: 527905kB  Disk Usage: 4832656kB

I wonder what causes the extra memory consumption, but it is hard to decide
on the patch's positive outcome for now.

--
regards, Andrei Lepikhov

/* -- Master
 Finalize HashAggregate  (cost=88853.57..89949.01 rows=19917 width=889) (actual time=20960.694..20994.725 rows=4 loops=1)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 280081kB
   ->  Gather  (cost=55492.60..72521.63 rows=159336 width=889) (actual time=20640.902..20680.058 rows=91580 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial HashAggregate  (cost=54492.60..55588.03 rows=19917 width=889) (actual time=20621.028..20642.664 rows=10176 loops=9)
               Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
               Batches: 1  Memory Usage: 74513kB
               Worker 0:  Batches: 1  Memory Usage: 66329kB
               Worker 1:  Batches: 1  Memory Usage: 58137kB
               Worker 2:  Batches: 1  Memory Usage: 58137kB
               Worker 3:  Batches: 1  Memory Usage: 66329kB
               Worker 4:  Batches: 1  Memory Usage: 58137kB
               Worker 5:  Batches: 1  Memory Usage: 74521kB
               Worker 6:  Batches: 1  Memory Usage: 58137kB
               Worker 7:  Batches: 1  Memory Usage: 66329kB
               ->  Parallel Hash Join  (cost=4921.00..23223.33 rows=305066 width=879) (actual time=667.304..5047.261 rows=17361113 loops=9)
                     Hash Cond: ((t1.x1 = t2.x1) AND (t1.x2 = t2.x2) AND (t1.x3 = t2.x3) AND (t1.x4 = t2.x4))
                     ->  Parallel Seq Scan on t1  (cost=0.00..4671.00 rows=12500 width=868) (actual time=0.009..7.437 rows=1 loops=9)
                     ->  Parallel Hash  (cost=4671.00..4671.00 rows=12500 width=319) (actual time=666.929..666.930 rows=1 loops=9)
                           Buckets: 131072  Batches: 1  Memory Usage: 35776kB
                           ->  Parallel Seq Scan on t2  (cost=0.00..4671.00 rows=12500 width=319) (actual time=560.944..562.943 rows=1 loops=9)
 Planning Time: 1.619 ms
 JIT:
   Functions: 149
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 64.711 ms (Deform 13.706 ms), Inlining 846.944 ms, Optimization 2602.412 ms, Emission 1599.170 ms, Total 5113.237 ms
 Execution Time: 21030.501 ms
(29 rows)
*/

/* -- Master + patch
 Finalize HashAggregate  (cost=89193.21..90292.99 rows=19996 width=889) (actual time=469100.662..469136.040 rows=4 loops=1)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 8963609kB
   ->  Gather  (cost=55699.91..72796.49 rows=159968 width=889) (actual time=58325.864..202369.963 rows=92724640 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial HashAggregate  (cost=54699.91..55799.69 rows=19996 width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9)
               Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
               Batches: 261  Memory Usage: 527905kB  Disk Usage: 4832656kB
               Worker 0:  Batches: 1  Memory Usage: 247321kB
               Worker 1:  Batches: 261  Memory Usage: 527417kB  Disk Usage: 3850624kB
               Worker 2:  Batches: 261  Memory Usage: 527905kB  Disk Usage: 4304528kB
               Worker 3:  Batches: 325  Memory Usage: 527417kB  Disk Usage: 4315120kB
               Worker 4:  Batches: 261  Memory Usage:
Re: define pg_structiszero(addr, s, r)
Hi,

On Thu, Oct 31, 2024 at 09:59:35AM +0900, Michael Paquier wrote:
> On Thu, Oct 31, 2024 at 07:55:45AM +1100, Peter Smith wrote:
>> +/*
>> + * Test if a memory region starting at p and of size len is full of zeroes.
>> + */
>> +static inline bool
>> +pg_mem_is_all_zeros(const void *ptr, size_t len)
>>
>> The function comment should say 'ptr' instead of 'p', right?
>
> Yes.

Thank you both for looking at it. Agree, done in the new version attached.

> +static inline bool
> +pg_mem_is_all_zeros(const void *ptr, size_t len)
>
> While we're talking about wordsmithing things, I would not choose
> "mem" for this routine, just stick to "memory". There is not much in
> the code that does memory-specific things like what you are proposing
> here. Still, this would be more consistent with the macros for memory
> barriers at least. Hence, "pg_memory_is_all_zeros()" makes more
> sense?

That makes sense to me, done that way in the attached.

> The location of memutils.h is sensible.

Thanks for sharing your thoughts on it.

> + if (pg_mem_is_all_zeros(pagebytes , (BLCKSZ / sizeof(size_t))))
>
> Extra space not required after pagebytes.

Fat finger here, thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 88f5ebbbee3cdc7b773a1d0f5a7cf6f9d909b784 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Tue, 10 Sep 2024 01:22:02 +0000
Subject: [PATCH v6] New pg_memory_is_all_zeros(ptr, len) inline function

This new function allows to test if a memory region starting at ptr and
of size len is full of zeroes.
---
 src/backend/storage/page/bufpage.c               | 13 +
 src/backend/utils/activity/pgstat_bgwriter.c     |  5 +++--
 src/backend/utils/activity/pgstat_checkpointer.c |  7 +++
 src/backend/utils/activity/pgstat_relation.c     |  7 ++-
 src/include/utils/memutils.h                     | 16
 5 files changed, 25 insertions(+), 23 deletions(-)
 19.6% src/backend/storage/page/
 58.8% src/backend/utils/activity/
 21.4% src/include/utils/

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index be6f1f62d2..5ee1e58cd4 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 {
 	PageHeader	p = (PageHeader) page;
 	size_t	   *pagebytes;
-	int			i;
 	bool		checksum_failure = false;
 	bool		header_sane = false;
-	bool		all_zeroes = false;
 	uint16		checksum = 0;
 
 	/*
@@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	}
 
 	/* Check all-zeroes case */
-	all_zeroes = true;
 	pagebytes = (size_t *) page;
-	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
-	{
-		if (pagebytes[i] != 0)
-		{
-			all_zeroes = false;
-			break;
-		}
-	}
 
-	if (all_zeroes)
+	if (pg_memory_is_all_zeros(pagebytes, (BLCKSZ / sizeof(size_t))))
 		return true;
 
 	/*
diff --git a/src/backend/utils/activity/pgstat_bgwriter.c b/src/backend/utils/activity/pgstat_bgwriter.c
index 364a7a2024..85d53d82f2 100644
--- a/src/backend/utils/activity/pgstat_bgwriter.c
+++ b/src/backend/utils/activity/pgstat_bgwriter.c
@@ -17,6 +17,7 @@
 
 #include "postgres.h"
 
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
 
@@ -30,7 +31,6 @@ void
 pgstat_report_bgwriter(void)
 {
 	PgStatShared_BgWriter *stats_shmem = &pgStatLocal.shmem->bgwriter;
-	static const PgStat_BgWriterStats all_zeroes;
 
 	Assert(!pgStatLocal.shmem->is_shutdown);
 	pgstat_assert_is_up();
@@ -39,7 +39,8 @@ pgstat_report_bgwriter(void)
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid unnecessarily modifying the stats entry.
 	 */
-	if (memcmp(&PendingBgWriterStats, &all_zeroes, sizeof(all_zeroes)) == 0)
+	if (pg_memory_is_all_zeros(&PendingBgWriterStats,
+							   sizeof(struct PgStat_BgWriterStats)))
 		return;
 
 	pgstat_begin_changecount_write(&stats_shmem->changecount);
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e18..258337cfb8 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -17,6 +17,7 @@
 
 #include "postgres.h"
 
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
 
@@ -29,8 +30,6 @@ PgStat_CheckpointerStats PendingCheckpointerStats = {0};
 void
 pgstat_report_checkpointer(void)
 {
-	/* We assume this initializes to zeroes */
-	static const PgStat_CheckpointerStats all_zeroes;
 	PgStatShared_Checkpointer *stats_shmem = &pgStatLocal.shmem->checkpointer;
 
 	Assert(!pgStatLocal.shmem->is_shutdown);
@@ -40,8 +39,8 @@ pgstat_report_checkpointer(void)
 	 * This function can be called even if nothing at all has happened. In
 	 * this case, avoid unnecessarily modifying the stats entry.
 	 */
-	if (memcmp(&PendingCheckpointerStats, &all_zer
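As a reference for readers without the attachment, a minimal sketch of the
helper under discussion might look like the following; the name, header
location, and any word-at-a-time optimizations are whatever the attached v6
patch actually uses, so treat this purely as an illustration:

#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch only: return true if the len bytes starting at ptr
 * are all zero.  The real patch may compare word-sized chunks rather than
 * individual bytes.
 */
static inline bool
pg_memory_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}

	return true;
}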
Re: Having problems generating a code coverage report
On Wed, Oct 30, 2024 at 05:26:49PM -0400, Peter Geoghegan wrote:
> I use Debian unstable for most of my day to day work. Apparently
> Debian unstable has exactly the same version of lcov as Ubuntu 24.04.
>
> I've also been unable to generate coverage reports for some time (at
> least on Debian, with LCOV version 2.0-1).

So do I, for both the Debian sid and the lcov parts.
--
Michael
Re: Allowing pg_recvlogical to create temporary replication slots
On Wed, Oct 30, 2024 at 09:00:59AM +0100, Peter Eisentraut wrote:
> I think you should explain a bit why you want to do that, what use case or
> scenario this is meant to address.

Indeed.  Note also this page giving a couple more guidelines with more
details:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

We also use what we call a "commit fest" to keep track of the patches under
review for the current development cycle.  It is a process where folks can
register for patch reviews and provide feedback.  There is still time to
register what you have here for the commit fest beginning on the 1st of
November:
https://commitfest.postgresql.org/50/
--
Michael
Re: Inval reliability, especially for inplace updates
On Mon, Oct 28, 2024 at 02:27:03PM +0530, Nitin Motiani wrote:
> On Thu, Oct 24, 2024 at 8:24 AM Noah Misch wrote:
>> With the releases wrapping in 2.5 weeks, I'm ambivalent about pushing this
>> before the release or after.  Pushing before means fewer occurrences of
>> corruption, but pushing after gives more bake time to discover these
>> changes were defective.  It's hard to predict which helps users more, on a
>> risk-adjusted basis.  I'm leaning toward pushing this week.  Opinions?
>
> I lean towards pushing after the release. This is based on my
> assumption that since this bug has been around for a while, it is
> (probably) not hit often. And a few weeks delay is better than
> introducing a new defect.

I had pushed this during the indicated week, before your mail.  Reverting it
is an option.  Let's see if more opinions arrive.
Fix for Extra Parenthesis in pgbench progress message
Hi,

I noticed an issue in the pgbench progress message where an extra closing
parenthesis appears at the end of the line, as shown below:

  700 of 1000 tuples (70%) of pgbench_accounts done (elapsed 19.75 s, remaining 8.46 s))

This occurs when running commands like pgbench -i -s100, and is caused by
leftover characters from a previous, longer line when the progress message is
redrawn with \r via fprintf.

I made a patch to address this by adding extra spaces before the \r, which
overwrite any remaining characters.  While effective, I recognize this
solution may not be the most sophisticated.  A more refined solution, such as
padding the string to a fixed width, might be ideal for cases like this (a
sketch of that idea follows the patch below).

Best,
Yushi Ogiwara

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e658d060ad..be984e0c7f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5004,7 +5004,7 @@ initPopulateTable(PGconn *con, const char *table, int64 base,
 			double		elapsed_sec = PG_TIME_GET_DOUBLE(pg_time_now() - start);
 			double		remaining_sec = ((double) total - j) * elapsed_sec / j;
 
-			chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s)%c",
+			chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s) %c",
 							j, total,
 							(int) ((j * 100) / total), table,
 							elapsed_sec, remaining_sec, eol);
@@ -5018,7 +5018,7 @@ initPopulateTable(PGconn *con, const char *table, int64 base,
 			/* have we reached the next interval (or end)? */
 			if ((j == total) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
 			{
-				chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s)%c",
+				chars = fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) of %s done (elapsed %.2f s, remaining %.2f s) %c",
								j, total,
								(int) ((j * 100) / total), table,
								elapsed_sec, remaining_sec, eol);
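As a rough sketch of the string-padding idea mentioned above (a hypothetical
helper, not the attached patch): build the progress line into a buffer and
print it with a field width at least as wide as the previously printed line,
so any leftover characters are overwritten before the carriage return.

#include <stdio.h>
#include <string.h>

/* width of the previously printed progress line (hypothetical state) */
static int	last_progress_width = 0;

static void
print_progress_line(const char *msg)
{
	int			width = (int) strlen(msg);
	int			pad = (width > last_progress_width) ? width : last_progress_width;

	/* pad with spaces up to the previous width, then return to column 0 */
	fprintf(stderr, "%-*s\r", pad, msg);
	last_progress_width = width;
}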
Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
On Thu, Oct 17, 2024 at 10:41 PM Thomas Munro wrote:
> Thanks! I'm going to go ahead and commit this.

(Sorry for the delay, I got distracted by pgconf.eu.)

Today I set out to commit this patch, and wrote a proper commit message to
explain the code provenance, the circumstances that led to it, and the future
conditions that will allow us to delete it in a few years.  Please see
attached.

In the process I struck a potential snag:

https://llvm.org/LICENSE.txt
https://en.wikipedia.org/wiki/Apache_License

There are a couple of cases of dual-licensed code in our tree where we
explicitly used the Boost alternative instead of Apache 2.  I plead complete
ignorance of this topic and defer to those who know about such things: can we
actually do this?  I guess at a minimum a copy of the licence would need to
appear somewhere -- perhaps under src/backend/jit/llvm?  Clause 4d says that
if you modified the code you have to say so prominently, but I did that at
the top (and the changes are completely trivial, just some #ifdef swizzling
to massage some function prototypes to suit older LLVMs).  Otherwise I
understand it to be generally "BSD-like" (sans advert clause), but there is
also some stuff about patents, which surely isn't relevant to this in
practice... but I know that some projects object to it on principle and
because it smells like contract law, or something -- not an area I am well
informed about.  Who should I be asking?

(Naively, I wondered: could there be some kind of fair use concept for
back-patching fixes to broken libraries that you're merely a user of, where
you can be excused from the burdens of a distributor?  Yeah, wishful thinking
I'm sure.)

From 5d9ef5f83a79f56ddd6230c7e9428525bab0d82c Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 27 Aug 2024 08:58:48 +1200
Subject: [PATCH v7] Monkey-patch LLVM code to fix ARM relocation bug.

Supply a new memory manager for RuntimeDyld that avoids overflowing the
jump distance in linked code on large memory machines.  This is the same
Apache v2-licensed code that is normally used inside the LLVM library,
except this copy has been patched by Michael Smith in his proposed fix
https://www.github.com/llvm/llvm-project/pull/71968.

We hereby slurp it into our own source tree, after moving into a new
namespace llvm::backport and making some minor adjustments so that it
can be compiled with older LLVM versions as far back as 12.  It's harder
to make it work on even older LLVM versions, but it doesn't seem likely
that people are really using them so that is not investigated for now.
(We were probably remiss in not disclaiming support for ancient LLVM
releases sooner; starting from PostgreSQL 16 we've been more aggressive
about trimming version support to match the trailing end of common
software distributions).

The problem could also have been addressed by switching to JITLink, and
that is the LLVM project's recommended solution as RuntimeDyld is about
to be deprecated.  We'll have to do that soon enough anyway, and then
when the LLVM version support window advances far enough in a few years
we'll be able to delete this code.  Unfortunately that wouldn't work
today for some relevant releases of LLVM where JITLink is missing or
incomplete.

Several other projects have already taken the approach of back-porting
this fix into their fork of LLVM, which is a vote of confidence despite
the lack of a commit into LLVM as of today.  We don't have our own copy
of LLVM so we can't do exactly what they've done.  We just pick up the
whole patched class and inject an instance of it into the regular LLVM
code.

The changes that we've had to make to our copy can be seen by diffing
our SectionMemoryManager.{h,cpp} files against the ones in the tree of
the pull request.  The LLVM project hasn't chosen to commit the fix yet,
and even if it did, it wouldn't be back-ported into the releases of LLVM
that most of our users care about, so there is not much point in waiting
any longer for that.  If they make further changes and commit it to
LLVM 19 or 20, we'll still need this for older versions, but we may want
to resynchronize our copy.

This should fix the spate of crash reports we've been receiving lately
from users on large memory ARM systems.

Back-patch to all supported releases.

Co-authored-by: Thomas Munro
Co-authored-by: Anthonin Bonnefoy
Reported-by: Anthonin Bonnefoy
Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
---
 src/backend/jit/llvm/Makefile                 |   3 +-
 src/backend/jit/llvm/SectionMemoryManager.cpp | 394 ++
 src/backend/jit/llvm/llvmjit.c                |   7 +
 src/backend/jit/llvm/llvmjit_wrap.cpp         |  20 +
 src/backend/jit/llvm/meson.build              |   1 +
 src/include/jit/SectionMemoryManager.h        | 227 ++
 src/include/jit/llvmjit.h                     |   8 +
 src/include/jit/llvmjit_backport.h            |  20 +
 src/tools/pginclude/headerscheck
Re: Consider pipeline implicit transaction as a transaction block
On Wed, Oct 30, 2024 at 04:06:20PM +0100, Jelte Fennema-Nio wrote:
> On Wed, 30 Oct 2024 at 10:15, Anthonin Bonnefoy wrote:
>> The attached patch adds the detection of implicit transactions started
>> by a pipeline in CheckTransactionBlock, avoiding warnings when
>> commands like `set local` are called within a pipeline, and making the
>> detection of transaction block coherent with what's done in
>> IsInTransactionBlock and PreventInTransactionBlock.
>
> +1 seems like a reasonable change. That's indeed a bit strange.

I think that you're right.

@Tom added in CC: Is there a specific reason why CheckTransactionBlock() did
not include a check based on XACT_FLAGS_PIPELINING when it got introduced in
20432f873140, while IsInTransactionBlock() considers it?  This was discussed
here:
https://www.postgresql.org/message-id/flat/17434-d9f7a064ce2a88a3%40postgresql.org
--
Michael
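To make the question concrete, here is a hedged sketch of the kind of change
being discussed; the surrounding checks in CheckTransactionBlock() are
paraphrased rather than copied, and only the XACT_FLAGS_PIPELINING test
reflects the proposal:

#include "postgres.h"

#include "access/xact.h"

/*
 * Sketch only: treat an implicit transaction opened by pipelining as a
 * transaction block, mirroring what IsInTransactionBlock() already does.
 * The other checks are paraphrased from the existing function.
 */
static void
CheckTransactionBlock(bool isTopLevel, bool throwError, const char *stmtType)
{
	if (IsTransactionBlock())
		return;

	if (IsSubTransaction())
		return;

	/* proposed addition: a pipeline's implicit transaction counts too */
	if (MyXactFlags & XACT_FLAGS_PIPELINING)
		return;

	if (!isTopLevel)
		return;

	ereport(throwError ? ERROR : WARNING,
			(errcode(ERRCODE_NO_ACTIVE_SQL_TRANSACTION),
			 errmsg("%s can only be used in transaction blocks", stmtType)));
}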
Re: define pg_structiszero(addr, s, r)
On Thu, Oct 31, 2024 at 05:52:49AM +0000, Bertrand Drouvot wrote:
> That makes sense to me, done that way in the attached.

Seems kind of OK seen from here.  I am wondering if others have more comments
about the name of the macro, its location, the fact that we still have
pagebytes in bufpage.c, etc.
--
Michael
Re: unicode test programs don't build with meson
On 18.10.24 15:17, Aleksander Alekseev wrote:
> 1) Add libintl directly to the affected programs, like
>
>  case_test = executable('case_test',
>    ['case_test.c'],
> -  dependencies: [frontend_port_code, icu],
> +  dependencies: [frontend_port_code, icu, libintl],
>    include_directories: inc,
>    link_with: [common_static, pgport_static],
>    build_by_default: false,
>
> To me the first option seems to be the most natural one (does the
> program depend on the library? add it to the list of dependencies for
> this program). Also it seems to be most consistent with what we
> currently have, see how icu is listed.

As there were no other opinions forthcoming, I have committed it like this.
Thanks!
Re: Allowing pg_recvlogical to create temporary replication slots
On 27.10.24 13:37, Torsten Förtsch wrote:
> This is my very first message to this mailing list. Please advise if I am
> making any mistakes in the procedure.

Welcome!

> The attached patch enables pg_recvlogical to create a temporary slot. What
> is the next step in the process to get this change into official postgres?

I think you should explain a bit why you want to do that, what use case or
scenario this is meant to address.
Re: define pg_structiszero(addr, s, r)
On 29.10.24 14:58, Bertrand Drouvot wrote: hi, On Tue, Oct 29, 2024 at 10:23:37AM +0100, Peter Eisentraut wrote: On 29.10.24 08:54, Bertrand Drouvot wrote: +static inline bool +pg_mem_is_all_zeros(const char *p, size_t len) This should be a void * pointer please. Thanks for looking at it! Yeah better, done in v4 attached. Sorry for the confusion. I didn't mean to discourage you from the const qualifier. That should still be there.
Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 16:51, Peter Eisentraut wrote:
> Thoughts?

+	snprintf(xloc, sizeof(xloc), "%X/%X", LSN_FORMAT_ARGS(logptr));
+	pq_sendstring(&buf, xloc);

nit: I feel that sending the LSN as a string seems unnecessarily wasteful of
bytes.  I'd rather send it as its binary representation.
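For comparison, a sketch of what the binary encoding could look like on the
server side, using the existing pqformat.h helpers; the surrounding message
layout is whatever this proposal ends up with, only the 8-byte network-order
encoding of the LSN is illustrated here:

#include "postgres.h"

#include "access/xlogdefs.h"
#include "libpq/pqformat.h"

/*
 * Sketch only: append an LSN to an outgoing protocol message as 8 bytes in
 * network byte order, instead of formatting it as a "%X/%X" text string.
 */
static void
append_lsn_binary(StringInfo buf, XLogRecPtr lsn)
{
	pq_sendint64(buf, (uint64) lsn);
}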
Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 17:58, Heikki Linnakangas wrote:
>> - The Query message sends an LSN to wait for. (This doesn't handle the
>> extended query protocol yet.)
>
> I'd suggest adding a new message type for this, so that it works the
> same with simple and extended query. Or if you just want to wait without
> issuing any query.

Big +1 to this.

After thinking about it more, I think this would make a fancy pooler much
easier to implement.  The pooler could simply send such a new WaitForLSN
message whenever it wants to, e.g. before handing off the server connection
to the client, instead of having to intercept every Query/Execute message
that the client is sending and modify it in place before sending it on to
the server.

Writing the previous down made me realize that using a separate message
would be nice for this use case too, as opposed to including the LSN in
ReadyForQuery.  If the fancy pooler wants to configure these LSNs
transparently for a client that has not set the protocol parameter, it would
need to strip the new LSN field from the ReadyForQuery message before
forwarding it to the client.  Stripping out a whole message is generally
easier to do than modifying messages in place.