Re: Support to define custom wait events for extensions
On Fri, Jun 16, 2023 at 11:14:05AM +0900, Masahiro Ikeda wrote:
> I tried to query on pg_stat_activity to check the background worker
> invoked by pg_prewarm. But, I found that pg_stat_activity doesn't show
> it although I may be missing something...
>
> So, I tried to implement TAP tests. But I have a problem with it.
> I couldn't find the way to check the status of another backend
> while the other backend waits with custom wait events.

Hmm. Right. It seems to me that BGWORKER_BACKEND_DATABASE_CONNECTION is required in this case, with BackgroundWorkerInitializeConnection() to connect to a database (or not, like the logical replication launcher, if only access to shared catalogs is wanted). I have missed that the leader process of pg_prewarm does not use that, because it has no need to connect to a database, but its workers do. So it is not going to show up in pg_stat_activity.

--
Michael
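For readers unfamiliar with the API being discussed, here is a minimal sketch of a worker main function that does show up in pg_stat_activity. The function name, database name, and the surrounding registration code (which must set BGWORKER_BACKEND_DATABASE_CONNECTION in bgw_flags) are assumptions for illustration, not code from pg_prewarm or from any patch in this thread.

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

/*
 * Hypothetical worker entry point; registered elsewhere with
 * bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION.
 */
void
my_worker_main(Datum main_arg)
{
	BackgroundWorkerUnblockSignals();

	/*
	 * Without this call the process never gets a backend status entry,
	 * so it does not appear in pg_stat_activity.
	 */
	BackgroundWorkerInitializeConnection("postgres", NULL, 0);

	for (;;)
	{
		CHECK_FOR_INTERRUPTS();
		/* real code would wait on a latch / report its wait event here */
	}
}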
Re: add non-option reordering to in-tree getopt_long
At Thu, 15 Jun 2023 21:58:28 -0700, Nathan Bossart wrote in
> On Fri, Jun 16, 2023 at 10:30:09AM +0900, Michael Paquier wrote:
> > On Thu, Jun 15, 2023 at 05:09:59PM -0700, Nathan Bossart wrote:
> >> I've attached a new version of the patch that omits the
> >> POSIXLY_CORRECT stuff.
> >
> > This looks OK at quick glance, though you may want to document at the
> > top of getopt_long() the reordering business with the non-options?
>
> I added a comment to this effect in v3.  I also noticed that '-' wasn't
> properly handled as a non-option, so I've tried to fix that as well.

(Honestly, the rearrangement code looks somewhat tricky to grasp..)

It doesn't work properly if '--' is involved.

For example, consider the following options (even though they don't work for the command).

  psql -t -T hoho -a hoge -- -1 hage hige huge

After the function returns -1, the argv array looks like this. The "[..]" indicates the position of optind.

  psql -t -T hoho -a -- [-1] hage hige huge hoge

This is clearly incorrect, since "hoge" gets moved to the end. The rearrangement logic needs to take the '--' into account. For your information, the glibc version yields the following result for the same options.

  psql -t -T hoho -a -- [hoge] -1 hage hige huge

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Use the enum value CRS_EXPORT_SNAPSHOT instead of "0"
On Fri, Jun 16, 2023 at 3:10 PM Wei Wang (Fujitsu) wrote:
>
> Hi,
>
> In the function WalReceiverMain, when the function walrcv_create_slot is called,
> the fourth parameter is assigned the value "0" instead of the enum value
> "CRS_EXPORT_SNAPSHOT". I think it would be better to use the corresponding enum
> value.

The walreceiver process doesn't use CRS_EXPORT_SNAPSHOT actually, right? I think replacing it with CRS_EXPORT_SNAPSHOT would rather confuse me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Incorrect estimation of HashJoin rows resulting from inaccurate small table statistics
We have a small table with only 23 rows and 21 values.

The resulting MCV and histogram are as follows:

  stanumbers1 | {0.08695652,0.08695652}
  stavalues1  | {v1,v2}
  stavalues2  | {v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15,v16,v17,v18,v19,v20,v21}

An incorrect number of rows was estimated when a HashJoin was done with another large table (about 2 million rows):

  Hash Join  (cost=1.52..92414.61 rows=2035023 width=0) (actual time=1.943..1528.983 rows=3902 loops=1)

The reason is that the MCV of the small table excludes values with a row count of 1. Putting them into the MCV in the statistics gives the correct result.

Using the conservative condition samplerows <= attstattarget doesn't completely solve this problem, but it does handle this case.

After modification we get statistics without a histogram:

  stanumbers1 | {0.08695652,0.08695652,0.04347826,0.04347826, ... }
  stavalues1  | {v1,v2, ... }

And we have the right estimates:

  Hash Join  (cost=1.52..72100.69 rows=3631 width=0) (actual time=1.447..1268.385 rows=3902 loops=1)

Regards,

--
Quan Zongliang
Beijing Vastdata Co., LTD

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 52ef462dba..08ea4243f5 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2518,7 +2518,7 @@ compute_scalar_stats(VacAttrStatsP stats,
 			{
 				/* Reached end of duplicates of this value */
 				ndistinct++;
-				if (dups_cnt > 1)
+				if (dups_cnt > 1 || samplerows <= num_mcv)
 				{
 					nmultiple++;
 					if (track_cnt < num_mcv ||
Re: Use the enum value CRS_EXPORT_SNAPSHOT instead of "0"
On Fri, Jun 16, 2023 at 6:17 PM Masahiko Sawada wrote:
>
> On Fri, Jun 16, 2023 at 3:10 PM Wei Wang (Fujitsu) wrote:
> >
> > Hi,
> >
> > In the function WalReceiverMain, when the function walrcv_create_slot is called,
> > the fourth parameter is assigned the value "0" instead of the enum value
> > "CRS_EXPORT_SNAPSHOT". I think it would be better to use the corresponding enum
> > value.
>
> The walreceiver process doesn't use CRS_EXPORT_SNAPSHOT actually,
> right? I think replacing it with CRS_EXPORT_SNAPSHOT would rather
> confuse me
>

Passing some number (0) which has the same value as an enum, while at the same time not intending it to have the same meaning as that enum, smells strange to me. If none of the existing enums is meaningful here, then perhaps there ought to be another enum added (CRS_UNUSED?) and pass that instead.

~

Alternatively, maybe continue to pass 0, but ensure the existing enums do not include any value of 0. e.g.

typedef enum
{
	CRS_EXPORT_SNAPSHOT = 1,
	CRS_NOEXPORT_SNAPSHOT,
	CRS_USE_SNAPSHOT
} CRSSnapshotAction;

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Use the enum value CRS_EXPORT_SNAPSHOT instead of "0"
On 2023-Jun-16, Masahiko Sawada wrote:

> The walreceiver process doesn't use CRS_EXPORT_SNAPSHOT actually,
> right? I think replacing it with CRS_EXPORT_SNAPSHOT would rather
> confuse me

libpqwalreceiver.c does use it. But I agree -- I think it would be better to not use the enum in walreceiver at all. IIRC if we stopped use of that enum in {libpq}walreceiver, then we wouldn't need the walsender.h inclusion by walreceiver files.

However, changing it means a change of the walrcv_create_slot API, so it's not all that trivial. But we could have a walreceiver-side enum instead (with the same values). I think this would be worth doing, because it'll all end up cleaner.

--
Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
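To illustrate the suggestion above, here is a sketch of what a walreceiver-side enum with the same values might look like; the type and member names are hypothetical, not taken from any posted patch.

/*
 * Hypothetical walreceiver-side counterpart of CRSSnapshotAction,
 * kept value-compatible with the walsender-side enum so callers of
 * walrcv_create_slot() would no longer need walsender.h.
 */
typedef enum WalRcvSnapshotAction
{
	WALRCV_SNAPSHOT_EXPORT,		/* matches CRS_EXPORT_SNAPSHOT */
	WALRCV_SNAPSHOT_NOEXPORT,	/* matches CRS_NOEXPORT_SNAPSHOT */
	WALRCV_SNAPSHOT_USE			/* matches CRS_USE_SNAPSHOT */
} WalRcvSnapshotAction;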
Re: Synchronizing slots from primary to standby
On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand wrote:
>
> Please find attached V5 (a rebase of V4 posted up-thread).
>
> In addition to the "rebasing" work, the TAP test adds a test about conflict
> handling (logical slot invalidation) relying on the work done in the
> "Minimal logical decoding on standby" patch series.
>
> I did not look more at the patch (than what was needed for the rebase) but
> plan to do so.
>

Are you still planning to continue working on this?

Some miscellaneous comments while going through this patch are as follows:

1. Can you please try to explain the functionality of the overall patch somewhere, in the form of comments and/or a commit message?

2. It seems that the initially synchronized list of slots is only used to launch a per-database worker to synchronize all the slots corresponding to that database. If so, then why do we need to fetch all the slot-related information via the LIST_SLOTS command?

3. As mentioned in the initial email, I think it would be better to replace the LIST_SLOTS command with a SELECT query.

4. How is the limit of sync_slot workers decided? Can we document such a piece of information? Do we need a new GUC to decide the number of workers? Ideally, it would be better to avoid a GUC; can we use any existing logical replication workers related GUC?

5. Can we separate out the functionality related to standby_slot_names into a separate patch, probably the first one? I think that will make the patch easier to review.

6. In libpqrcv_list_slots(), two-phase related slot information is not retrieved. Is there a reason for that?

7.
+static void
+wait_for_standby_confirmation(XLogRecPtr commit_lsn)

Some comments atop this function would make it easier to review.

8.
+/*-
+ * slotsync.c
+ *    PostgreSQL worker for synchronizing slots to a standby from primary
+ *
+ * Copyright (c) 2016-2018, PostgreSQL Global Development Group
+ *

The copyright notice is out-of-date.

9. Why does synchronize_one_slot() compare MyReplicationSlot->data.restart_lsn with the value of confirmed_flush_lsn passed to it? Also, why does it do that only for new slots but not existing slots?

10. Can we somehow test whether the restart_lsn is advanced properly after sync? I think it is important to ensure that, because otherwise after the standby's promotion the subscriber can start syncing from the wrong position.

--
With Regards,
Amit Kapila.
Re: Pluggable toaster
Hi,

Seems that I missed the thread mentioned above. I strongly disagree with such a statement. Pluggable TOAST could not be a part of the Compression Dictionaries thread, because the TOAST improvement is a more general subject; it involves much deeper and trickier changes in the core, and is also much more promising in terms of performance and storage improvements.

We already have a lot of changes in Pluggable TOAST that were not committed to the main GIT branch of this thread, so it seems that I have to merge them and reopen it.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Re: Do we want a hashset type?
New patch attached:

    Add customizable params to int4hashset() and collision count function

    This commit enhances int4hashset() by introducing adjustable capacity,
    load, and growth factors, providing flexibility for performance
    optimization.

    Also added is a new function, hashset_collisions(), to report collision
    counts, aiding in performance tuning.

    Aggregate functions are renamed to hashset_agg() for consistency with
    array_agg() and range_agg().

    A new test file, test/sql/benchmark.sql, is added for evaluating the
    performance of hash functions. It's not run automatically by make
    installcheck.

    The adjustable parameters and the naive hash function are useful for
    testing and performance comparison. However, to keep things simple and
    streamlined for users, these features are likely to be removed in the
    final release, emphasizing the use of well-optimized default settings.

    SQL-function indentation is also adjusted to align with the PostgreSQL
    source repo, improving readability.

In the benchmark results below, it was a bit surprising that the naive hash function had no collisions, but that only held true when the input elements were sequential integers. When tested with random integers, all three hash functions caused collisions.

The timing results are not statistically significant; the purpose is just to give an idea of the execution times.

*** Elements in sequence 1..10

- Testing default hash function (Jenkins/lookup3)
psql:test/sql/benchmark.sql:23: NOTICE:  hashset_count: 10
psql:test/sql/benchmark.sql:23: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:23: NOTICE:  hashset_collisions: 31195
DO
Time: 1342.564 ms (00:01.343)

- Testing Murmurhash32
psql:test/sql/benchmark.sql:40: NOTICE:  hashset_count: 10
psql:test/sql/benchmark.sql:40: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:40: NOTICE:  hashset_collisions: 30879
DO
Time: 1297.823 ms (00:01.298)

- Testing naive hash function
psql:test/sql/benchmark.sql:57: NOTICE:  hashset_count: 10
psql:test/sql/benchmark.sql:57: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:57: NOTICE:  hashset_collisions: 0
DO
Time: 1400.936 ms (00:01.401)

*** Testing 10 random ints

 setseed
---------

(1 row)

Time: 3.591 ms

- Testing default hash function (Jenkins/lookup3)
psql:test/sql/benchmark.sql:77: NOTICE:  hashset_count: 10
psql:test/sql/benchmark.sql:77: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:77: NOTICE:  hashset_collisions: 30919
DO
Time: 1415.497 ms (00:01.415)

 setseed
---------

(1 row)

Time: 1.282 ms

- Testing Murmurhash32
psql:test/sql/benchmark.sql:95: NOTICE:  hashset_count: 10
psql:test/sql/benchmark.sql:95: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:95: NOTICE:  hashset_collisions: 30812
DO
Time: 2079.202 ms (00:02.079)

 setseed
---------

(1 row)

Time: 0.122 ms

- Testing naive hash function
psql:test/sql/benchmark.sql:113: NOTICE:  hashset_count: 10
psql:test/sql/benchmark.sql:113: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:113: NOTICE:  hashset_collisions: 30822
DO
Time: 1613.965 ms (00:01.614)

/Joel

hashset-0.0.1-184a18a.patch
Description: Binary data
Re: Support to define custom wait events for extensions
On 2023-06-16 16:46, Michael Paquier wrote:
> On Fri, Jun 16, 2023 at 11:14:05AM +0900, Masahiro Ikeda wrote:
>> I tried to query on pg_stat_activity to check the background worker
>> invoked by pg_prewarm. But, I found that pg_stat_activity doesn't show
>> it although I may be missing something...
>>
>> So, I tried to implement TAP tests. But I have a problem with it.
>> I couldn't find the way to check the status of another backend
>> while the other backend waits with custom wait events.
>
> Hmm. Right. It seems to me that BGWORKER_BACKEND_DATABASE_CONNECTION
> is required in this case, with BackgroundWorkerInitializeConnection()
> to connect to a database (or not, like the logical replication
> launcher, if only access to shared catalogs is wanted). I have missed
> that the leader process of pg_prewarm does not use that, because it
> has no need to connect to a database, but its workers do. So it is
> not going to show up in pg_stat_activity.

Yes. Thanks to your advice, I understood that BGWORKER_BACKEND_DATABASE_CONNECTION is the reason. I could make a TAP test that invokes a background worker waiting forever and checks its custom wait event in pg_stat_activity. So, I'll make patches including the test code next week.

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
Re: Do we want a hashset type?
similar to (int[] || int4) and (int4 || int[]), should we expect

    ('{1,2}'::int4hashset || 3) == (3 || '{1,2}'::int4hashset) == (select hashset_add('{1,2}'::int4hashset,3));

?

The following is the general idea on how to make it work, by looking at similar code:

CREATE OPERATOR || (
    leftarg = int4hashset,
    rightarg = int4,
    function = int4hashset_add,
    commutator = ||
);

CREATE OR REPLACE FUNCTION int4_add_int4hashset(int4, int4hashset)
RETURNS int4hashset
LANGUAGE sql
IMMUTABLE PARALLEL SAFE STRICT COST 1
RETURN $2 || $1;

CREATE OPERATOR || (
    leftarg = int4,
    rightarg = int4hashset,
    function = int4_add_int4hashset,
    commutator = ||
);

While creating an operator, I am not sure how to specify the NEGATOR, RESTRICT, and JOIN clauses.

-----------------------
also, I think the following query should return one row only? but now it doesn't.

select hashset_cmp('{1,2}','{2,1}')
union
select hashset_cmp('{1,2}','{1,2,1}')
union
select hashset_cmp('{1,2}','{1,2}');

-----------------------
similar to elem_contained_by_range and range_contains_elem, should we also consider the operators <@ and @>?

CREATE OR REPLACE FUNCTION elem_contained_by_hashset(int4, int4hashset)
RETURNS bool
LANGUAGE sql
IMMUTABLE PARALLEL SAFE STRICT COST 1
RETURN hashset_contains ($2,$1);

Is the integer contained in the int4hashset?
integer <@ int4hashset → boolean
1 <@ int4hashset'{1,7}' → t

CREATE OPERATOR <@ (
    leftarg = integer,
    rightarg = int4hashset,
    function = elem_contained_by_hashset
);

Does the int4hashset contain the element?
int4hashset @> integer → boolean
int4hashset'{1,7}' @> 1 → t

CREATE OPERATOR @> (
    leftarg = int4hashset,
    rightarg = integer,
    function = hashset_contains
);
Add some more corruption error codes to relcache
Hi hackers,

Relcache errors from time to time detect catalog corruption. For example, I recently observed the following:

1. The filesystem or NVMe disk zeroed out the leading 160Kb of a catalog index. This type of corruption passes through data_checksums.
2. RelationBuildTupleDesc() was failing with "catalog is missing 1 attribute(s) for relid 2662".
3. We monitor corruption error codes and alert on-call DBAs when we see one, but the message is not marked as XX001 or XX002. It's XX000, which happens from time to time due to less critical reasons than data corruption.
4. High-availability automation switched the primary to another host, and other monitoring checks did not ring either.

This particular case is not very illustrative. In fact we had index corruption that looked like catalog corruption. But it still looks to me that catalog inconsistencies (like relnatts != number of pg_attribute rows) could be marked with ERRCODE_DATA_CORRUPTED. This particular error code has, in my experience, proved to be a good indicator for early corruption detection.

What do you think? What other subsystems could be improved in the same manner?

Best regards, Andrey Borodin.

v1-0001-Add-corruption-error-codes-to-relcache-entries.patch
Description: Binary data
Re: Pluggable toaster
Hi Nikita,

> We already have a lot of changes in Pluggable TOAST that were not committed
> to the main GIT branch of this thread, so it seems that I have to merge them
> and reopen it.

Pretty sure that reopening an already rejected patch that is competing with compression dictionaries (which the rest of us are currently focusing on) will achieve anything. Consider joining the compression dictionaries effort instead [1]. During the discussion with the community it ended up being a TOAST improvement after all. So we could use your expertise in this area.

[1]: https://commitfest.postgresql.org/43/3626/

--
Best regards,
Aleksander Alekseev
Re: RFC: logical publication via inheritance root?
On Sat, Apr 1, 2023 at 5:06 AM Jacob Champion wrote:
>
> On Fri, Mar 31, 2023 at 3:17 PM Peter Smith wrote:
>
> > Outside the scope of special TimescaleDB tables and the speculated
> > pg_partman old-style table migration, will this proposed new feature
> > have any other application? In other words, do you know if this
> > proposal will be of any benefit to the *normal* users who just have
> > native PostgreSQL inherited tables they want to replicate?
>
> I think it comes down to why an inheritance scheme was used. If it's
> because you want to group rows into named, queryable subsets (e.g. the
> "cities/capitals" example in the docs [1]), I don't think this has any
> utility, because I assume you'd want to replicate your subsets as-is.
>

I also think so, and your idea to have a function like pg_set_logical_root() seems to make the inheritance hierarchy behave as a declarative partitioning scheme for the purpose of logical replication.

> But if it's because you've implemented a partitioning scheme of your
> own (the docs still list reasons you might want to [2], even today),
> and all you ever really do is interact with the root table, I think
> this feature will give you some of the same benefits that
> publish_via_partition_root gives native partition users. We're very
> much in that boat, but I don't know how many others are.
>

I agree that there may still be cases, as pointed out by you, where people want to use inheritance as a mechanism for partitioning, but I feel those would still be in the minority. Personally, I am not very excited about this idea.

--
With Regards,
Amit Kapila.
Re: [DOC] Update ALTER SUBSCRIPTION documentation v3
On 15.06.23 04:49, Amit Kapila wrote:
> I wanted to backpatch the following change which provides somewhat
> accurate information about what a user needs to do when it faces an
> error.
>
>     To proceed in
> -   this situation, disassociate the subscription from the replication slot by
> -   executing ALTER SUBSCRIPTION ... SET (slot_name = NONE).
> +   this situation, first DISABLE the subscription, and then
> +   disassociate it from the replication slot by executing
> +   ALTER SUBSCRIPTION ... SET (slot_name = NONE).
>
> Now, along with this change, there is a change in errhint as well which
> I am not sure about whether to backpatch or not. I think we have the
> following options (a) commit both doc and code change in HEAD (b) commit
> both doc and code change in v17 when the next version branch opens (c)
> backpatch the doc change and commit the code change in HEAD only (d)
> backpatch the doc change and commit the code change in v17 (e) backpatch
> both the doc and code change.

Reading the thread again now, I think this is essentially a bug fix, so I don't mind backpatching it.

I wish the errhint would show the actual command to disable the subscription. It already shows the command to detach the replication slot, so it would only be consistent to also show the other command.
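As a purely illustrative sketch of the kind of hint being wished for here -- the error message text and the "subname" variable are placeholders, not the actual code change under discussion:

ereport(ERROR,
		(errmsg("could not drop replication slot for subscription \"%s\"",
				subname),
		 /* spell out both commands the user has to run before retrying */
		 errhint("Run ALTER SUBSCRIPTION %s DISABLE and then ALTER SUBSCRIPTION %s SET (slot_name = NONE) to disassociate the subscription from the slot before retrying.",
				 subname, subname)));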
Re: [17] collation provider "builtin"
On 15.06.23 00:55, Jeff Davis wrote:
> The locale "C" (and equivalently, "POSIX") is not really a libc locale;
> it's implemented internally with memcmp for collation and
> pg_ascii_tolower, etc., for ctype.
>
> The attached patch implements a new collation provider, "builtin",
> which only supports "C" and "POSIX". It does not change the initdb
> default provider, so it must be requested explicitly. The user will be
> guaranteed that collations with provider "builtin" will never change
> semantics; therefore they need no version and indexes are not at risk
> of corruption. See previous discussion[1].

What happens if after this patch you continue to specify provider=libc and locale=C? Do you then get the "slow" path?

Should there be some logic in pg_dump to change the provider if locale=C?

What is the transition plan?
Parent/child context relation in pg_get_backend_memory_contexts()
Hi hackers,

pg_get_backend_memory_contexts() (and the pg_backend_memory_contexts view) does not display the parent/child relation between contexts reliably. The current version of this function only shows the name of the parent context for each context. The issue here is that it's not guaranteed that context names are unique, so this makes it difficult to find the correct parent of a context.

How can knowing the correct parent context be useful? One important use-case is that it would allow us to sum up all the space used by a particular context and all the subcontexts which stem from that context. Calculating this sum is helpful since the (total/used/free)_bytes values currently returned by this function do not include child contexts. For this reason, only looking at the related row in pg_backend_memory_contexts does not tell us how many bytes that context is actually taking.

The simplest approach to solve this could be just adding two new fields, id and parent_id, to pg_get_backend_memory_contexts() and ensuring each context has a unique id. This allows us to build a correct memory context "tree". Please see the attached patch which introduces those two fields. I couldn't find an existing unique identifier to use. The patch simply assigns an id during the execution of pg_get_backend_memory_contexts() and does not store those ids anywhere. This means that these ids may be different in each call.

With this change, here's a query to find how much space is used by each context, including its children:

> WITH RECURSIVE cte AS (
>     SELECT id, total_bytes, id as root, name as root_name
>     FROM memory_contexts
>     UNION ALL
>     SELECT r.id, r.total_bytes, cte.root, cte.root_name
>     FROM memory_contexts r
>     INNER JOIN cte ON r.parent_id = cte.id
> ),
> memory_contexts AS (
>     SELECT * FROM pg_backend_memory_contexts
> )
> SELECT root as id, root_name as name, sum(total_bytes)
> FROM cte
> GROUP BY root, root_name
> ORDER BY sum DESC;

You should see that TopMemoryContext is the one with the highest allocated space, since all other contexts are created under TopMemoryContext.

Also, even though having a correct link between parent/child contexts can be useful to find out many other things as well just by writing SQL queries, it might require complex recursive queries similar to the one above for total_bytes including children. Maybe we can also consider adding such frequently used and/or useful information as new fields in pg_get_backend_memory_contexts() too.

I appreciate any comment/feedback on this.

Thanks,
--
Melih Mutlu
Microsoft

0001-Adding-id-parent_id-into-pg_backend_memory_contexts.patch
Description: Binary data
[BUG] recovery of prepared transactions during promotion can fail
Hey everyone,

I've discovered a serious bug that leads to a server crash upon promoting an instance that crashed previously and did recovery in standby mode. The bug is present in PostgreSQL versions 13 and 14 (and in earlier versions, though it doesn't manifest itself so catastrophically).

The circumstances to trigger the bug are as follows:
- postgresql is configured for hot_standby, archiving, and prepared transactions
- prepare a transaction
- crash postgresql
- create standby.signal file
- start postgresql, wait for recovery to finish
- promote

The promotion will fail with a FATAL error, stating that "requested WAL segment .* has already been removed". The FATAL error causes the startup process to exit, so postmaster shuts down again.

Here's an exemplary log output; maybe this helps people to find this issue when they search for it online:

LOG:  consistent recovery state reached at 0/15D8AB0
LOG:  database system is ready to accept read only connections
LOG:  received promote request
LOG:  redo done at 0/15D89B8
LOG:  last completed transaction was at log time 2023-06-16 13:09:53.71118+02
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
FATAL:  requested WAL segment pg_wal/00010001 has already been removed
LOG:  startup process (PID 1650358) exited with exit code 1
LOG:  terminating any other active server processes
LOG:  database system is shut down

The cause of this failure is an oversight (rather obvious in hindsight): the renaming of the WAL file (that was last written to before the crash happened) to .partial is done *before* PostgreSQL might have to read this very file to recover prepared transactions from it. The relevant function calls here are durable_rename() and RecoverPreparedTransactions() in xlog.c. Note that it is important that the PREPARE entry is in the WAL file that PostgreSQL was writing to prior to the initial crash.

This has happened repeatedly in production already with a customer that uses prepared transactions quite frequently. I assume that this has happened for others too, but the circumstances of the crash and the cause are very dubious, and troubleshooting it is pretty difficult.

This behaviour has - apparently unintentionally - been fixed in PG 15 and upwards (see commit 811051c), as part of a general restructure and reorganization of this portion of xlog.c (see commit 6df1543).

Furthermore, it seems this behaviour does not appear in PG 12 and older, due to another possible bug: in PG 13 and newer, the XLogReaderState is reset in XLogBeginRead() before reading WAL in XlogReadTwoPhaseData() in twophase.c. In the older releases (PG <= 12), this reset is not done, so the requested LSN containing the prepared transaction can (by happy coincidence) be read from in-memory buffers, and PostgreSQL consequently manages to come up just fine (as the WAL has already been read into buffers prior to the .partial rename). If the older releases were to also properly reset the XLogReaderState, they would likewise fail to find the LSN on disk, and hence PostgreSQL would crash again.

I've attached patches for PG 14 and PG 13 that mimic the change in PG 15 (commit 811051c) and reorder the crucial events, placing the recovery of prepared transactions *before* renaming the file. I've also attached recovery test scripts for PG >= 12 and PG <= 11 that can be used to verify that promote after recovery with prepared transactions works.

A note for myself in the future and whomever may find it useful: the test can be copied to src/test/recovery/t/ and selectively run (after you've ./configure'd for TAP testing and compiled everything) from within the src/test/recovery directory using something like:

  make check PROVE_TESTS='t/PG_geq_12_promote_prepare_xact.pl'

My humble opinion is that this fix should be backpatched to PG 14 and PG 13. It's debatable whether the fix needs to be brought back to 12 and older also, as those do not exhibit this issue, but the order of renaming is still wrong. I'm not sure if there could be cases where the in-memory buffers of the walreader are too small to cover a whole WAL file. There could also be other issues from operations that require reading WAL that happen after the .partial rename; I haven't checked in depth what else happens in the affected codepath. Please let me know if you think this should also be fixed in PG 12 and earlier, so I can produce the patches for those versions as well.

Kind regards

Julian

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a05a82119e..e0d9b89d78 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7858,6 +7858,60 @@ StartupXLOG(void)
 			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
 	}
 
+	/*
+	 * Preallocate additional log files, if wanted.
+	 */
+	PreallocXlogFiles(EndOfLog);
+
+	/*
+	 * Okay, we're officially UP.
+	 */
+	InRecovery = false;
+
+	/* st
Re: Pluggable toaster
Hi,

> Pretty sure that reopening an already rejected patch that is competing
> with compression dictionaries (which the rest of us are currently
> focusing on) will achieve anything.

Ooops, I didn't mean to be sarcastic:

s/will achieve/will not achieve/

My apologies.

> Consider joining the compression
> dictionaries effort instead [1]. During the discussion with the
> community it ended up being a TOAST improvement after all. So we could
> use your expertise in this area.
>
> [1]: https://commitfest.postgresql.org/43/3626/

--
Best regards,
Aleksander Alekseev
Re: Order changes in PG16 since ICU introduction
On 14.06.23 23:24, Jeff Davis wrote:
> On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote:
> > > Patch 0003: Makes LOCALE apply to all providers. The overall feel
> > > after this patch is that "locale" now means the collation locale,
> > > and LC_COLLATE/LC_CTYPE are for the server environment. When using
> > > libc, LC_COLLATE and LC_CTYPE still work as they did before, but
> > > their relationship to database collation feels more like a special
> > > case of the libc provider. I believe most people favor this patch
> > > and I haven't seen recent objections.
> >
> > This seems reasonable.
>
> Attached a clean patch for this. It seems to have widespread agreement
> so I plan to commit to v16 soon. To clarify, this affects both initdb
> and CREATE DATABASE.

This looks good to me.

Attached is small fixup patch with some documentation tweaks and simplifying some test code (also includes pgperltidy).

From 0cd2154f364999091aba52136a139df75f58d1b7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 16 Jun 2023 16:46:36 +0200
Subject: [PATCH] fixup! CREATE DATABASE: make LOCALE apply to all collation
 providers.

---
 doc/src/sgml/ref/create_database.sgml | 12 ++++++------
 src/test/icu/t/010_database.pl        | 23 +++++++++++++----------
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index dab05950ed..b2c8aef1ad 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -147,9 +147,9 @@ Parameters
         Sets the default collation order and character classification in the
         new database. Collation affects the sort order applied to strings,
-        e.g., in queries with ORDER BY, as well as the order used in indexes
+        e.g., in queries with ORDER BY, as well as the order used in indexes
         on text columns. Character classification affects the categorization
-        of characters, e.g., lower,
-        upper and digit. Also sets the
+        of characters, e.g., lower, upper, and digit. Also sets the
         associated aspects of the operating system environment,
         LC_COLLATE and LC_CTYPE. The default is the same setting as the
         template database. See Parameters
         Sets LC_COLLATE in the database server's operating system environment.
         The default is the setting of if specified; otherwise the same
+        linkend="create-database-locale"/> if specified, otherwise the same
         setting as the template database. See below for additional restrictions.
@@ -198,7 +198,7 @@ Parameters
         Sets LC_CTYPE in the database server's operating system environment.
         The default is the setting of if specified; otherwise the same
+        linkend="create-database-locale"/> if specified, otherwise the same
         setting as the template database. See below for additional restrictions.
@@ -218,8 +218,8 @@ Parameters
         Specifies the ICU locale (see ) for the database default collation
         order and character classification, overriding the setting
-        . The must be ICU. The default
+        . The locale provider must be ICU. The default
         is the setting of if specified; otherwise the same
         setting as the template database.

diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index b417b1a409..cbe5467f3c 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -52,27 +52,30 @@
 
 # Test that LOCALE='C' works for ICU
-my $ret1 = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
-);
-is($ret1, 0,
+is( $node1->psql(
+		'postgres',
+		q{CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+	),
+	0,
 	"C locale works for ICU");
 
 # Test that LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE
 # are specified
-my $ret2 = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
+is( $node1->psql(
+		'postgres',
+		q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
 	  LC_COLLATE='C' LC_CTYPE='C' TEMPLATE template0 ENCODING UTF8}
-);
-is($ret2, 0,
+	),
+	0,
 	"LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE are specified");
 
 # Test that ICU-specific LOCALE without LC_COLLATE and LC_CTYPE must
 # be specified with ICU_LOCALE
-my ($ret3, $stdout, $stderr) = $node1->psql('postgres',
+my ($ret, $stdout, $stderr) = $node1->psql(
+	'postgres',
 	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
 	  TEMPLATE template0 ENCODING UTF8});
-isnt($ret3, 0,
+isnt($ret, 0,
 	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
--
Re: Deleting prepared statements from libpq.
On Fri, 16 Jun 2023 at 16:26, Craig Ringer wrote:
> Nobody's implemented it.
>
> A patch to add PQclosePrepared and PQsendClosePrepared would be welcome. At
> least, I think so...

This might be a pretty old thread, but I just took it upon myself to implement these functions (or well, I mostly copied the PQsendDescribe related code and did s/describe/close). I haven't tested this code yet, but I'm pretty sure it should just work (it compiles at least).

The main reason I'm interested in this is because we're actively working on implementing named prepared statement support for PgBouncer in transaction pooling mode. It works with lots of client libraries already. But sadly it doesn't work with psycopg at the moment, or at least the closing part does not. And the reason is that psycopg closes statements using a DEALLOCATE query instead of the Close protocol message, because libpq does not support sending the Close protocol message.

Afaict this is not just a problem for PgBouncer's implementation. As far as I can tell the same is true for the Odyssey connection pooler (which implemented named prepared statement support first).

v1-0001-Support-sending-Close-messages-from-libpq.patch
Description: Binary data
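For context, here is a sketch of how a client could use the proposed function instead of issuing a DEALLOCATE query. The signature is assumed to mirror PQdescribePrepared(), since the patch copies that code; the statement name and helper function are arbitrary.

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Close a named prepared statement via the Close protocol message.
 * Assumed signature: PGresult *PQclosePrepared(PGconn *, const char *).
 */
static void
close_statement(PGconn *conn, const char *stmt_name)
{
	PGresult   *res = PQclosePrepared(conn, stmt_name);

	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "closing \"%s\" failed: %s",
				stmt_name, PQerrorMessage(conn));
	PQclear(res);
}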
Re: Incorrect estimation of HashJoin rows resulting from inaccurate small table statistics
On 6/16/23 11:25, Quan Zongliang wrote:
>
> We have a small table with only 23 rows and 21 values.
>
> The resulting MCV and histogram is as follows
>
>   stanumbers1 | {0.08695652,0.08695652}
>   stavalues1  | {v1,v2}
>   stavalues2  | {v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15,v16,v17,v18,v19,v20,v21}
>
> An incorrect number of rows was estimated when HashJoin was done with
> another large table (about 2 million rows).
>
>   Hash Join  (cost=1.52..92414.61 rows=2035023 width=0) (actual time=1.943..1528.983 rows=3902 loops=1)
>

That's interesting. I wonder how come the estimate gets this bad simply by skipping values with a single row in the sample, which means we know the per-value selectivity pretty well.

I guess the explanation has to be something strange happening when estimating the join condition selectivity, where we combine MCVs from both sides of the join (which has to be happening here, otherwise it would not matter what gets into the MCV).

It'd be interesting to know what's in the other MCV, and what the other statistics for the attributes are (ndistinct etc.). Or even better, a reproducer SQL script that builds two tables and then joins them.

> The reason is that the MCV of the small table excludes values with rows
> of 1. Put them in the MCV in the statistics to get the correct result.
>
> Using the conservative samplerows <= attstattarget doesn't completely
> solve this problem. It can solve this case.
>
> After modification we get statistics without histogram:
>
>   stanumbers1 | {0.08695652,0.08695652,0.04347826,0.04347826, ... }
>   stavalues1  | {v1,v2, ... }
>
> And we have the right estimates:
>
>   Hash Join  (cost=1.52..72100.69 rows=3631 width=0) (actual time=1.447..1268.385 rows=3902 loops=1)
>

I'm not against building a "complete" MCV, but I guess the case where (samplerows <= num_mcv) is pretty rare. Why shouldn't we make the MCV complete whenever we decide (ndistinct <= num_mcv)?

That would need to happen later, because we don't have the ndistinct estimate yet at this point - we'd have to do the loop a bit later (or likely twice).

FWIW the patch breaks the calculation of nmultiple (and thus likely the ndistinct estimate).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Do we want a hashset type?
On Fri, Jun 16, 2023, at 13:57, jian he wrote:
> similar to (int[] || int4) and (int4 || int[])
> should we expect ('{1,2}'::int4hashset || 3) == (3 ||
> '{1,2}'::int4hashset) == (select hashset_add('{1,2}'::int4hashset,3));
> ?

Good idea, makes sense to support it. Implemented in the attached patch.

> CREATE OPERATOR || (
>     leftarg = int4,
>     rightarg = int4hashset,
>     function = int4_add_int4hashset,
>     commutator = ||
> );
>
> while creating an operator. I am not sure how to specify
> NEGATOR, RESTRICT, JOIN clause.

I don't think we need those for this operator, might be wrong though.

> also. I think the following query should return one row only? but now
> it doesn't.
>
> select hashset_cmp('{1,2}','{2,1}')
> union
> select hashset_cmp('{1,2}','{1,2,1}')
> union
> select hashset_cmp('{1,2}','{1,2}');

Good point.

I realise int4hashset_hash() is broken, since two int4hashsets that are considered equal can by coincidence get different hashes:

SELECT '{1,2}'::int4hashset = '{2,1}'::int4hashset;
 ?column?
----------
 t
(1 row)

SELECT hashset_hash('{1,2}'::int4hashset);
 hashset_hash
--------------
    990882385
(1 row)

SELECT hashset_hash('{2,1}'::int4hashset);
 hashset_hash
--------------
    996377797
(1 row)

Do we have any ideas on how to fix this without sacrificing performance? We of course want to avoid having to sort the hashsets, which is the naive solution.

To understand why this is happening, consider this example:

SELECT '{1,2}'::int4hashset;
 int4hashset
-------------
 {1,2}
(1 row)

SELECT '{2,1}'::int4hashset;
 int4hashset
-------------
 {2,1}
(1 row)

If the hash of `1` and `2` modulus the capacity results in the same value, they will be attempted to be inserted into the same position, and since the input text is parsed left-to-right, in the first case `1` will win the first position and `2` will get a collision and try the next position. In the second case, the opposite happens.

Since we do modulus capacity, the position depends on the capacity, which is why the output string can be different for the same input:

SELECT int4hashset() || 1 || 2 || 3;
{3,1,2}

SELECT int4hashset(capacity:=1) || 1 || 2 || 3;
{3,1,2}

SELECT int4hashset(capacity:=2) || 1 || 2 || 3;
{3,1,2}

SELECT int4hashset(capacity:=3) || 1 || 2 || 3;
{3,2,1}

SELECT int4hashset(capacity:=4) || 1 || 2 || 3;
{3,1,2}

SELECT int4hashset(capacity:=5) || 1 || 2 || 3;
{1,2,3}

SELECT int4hashset(capacity:=6) || 1 || 2 || 3;
{1,3,2}

> similar to elem_contained_by_range, range_contains_elem. we should
> already consider the operator <@ and @>?

That could perhaps be nice. Apart from possible syntax convenience, are there any other benefits than just using the function hashset_contains(int4hashset, integer) directly?

/Joel

hashset-0.0.1-d83bdef.patch
Description: Binary data
Re: Support to define custom wait events for extensions
I will take a look at your V2 when it is ready! I will also pass along that this is wanted by Neon customers :).

--
Tristan Partin
Neon (https://neon.tech)
Re: Order changes in PG16 since ICU introduction
On Fri, 2023-06-16 at 16:50 +0200, Peter Eisentraut wrote:
> This looks good to me.
>
> Attached is small fixup patch with some documentation tweaks and
> simplifying some test code (also includes pgperltidy).

Thank you. Committed with your fixups.

Regards,
Jeff Davis
Re: Support to define custom wait events for extensions
On 2023/06/17 1:16, Tristan Partin wrote:
> I will take a look at your V2 when it is ready! I will also pass along
> that this is wanted by Neon customers :).

Thanks!

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
Re: add non-option reordering to in-tree getopt_long
On Fri, Jun 16, 2023 at 04:51:38PM +0900, Kyotaro Horiguchi wrote:
> (Honestly, the rearrangement code looks somewhat tricky to grasp..)

Yeah, I think there's still some room for improvement here.

> It doesn't work properly if '--' is involved.
>
> For example, consider the following options (even though they don't
> work for the command).
>
>   psql -t -T hoho -a hoge -- -1 hage hige huge
>
> After the function returns -1, the argv array looks like this. The
> "[..]" indicates the position of optind.
>
>   psql -t -T hoho -a -- [-1] hage hige huge hoge
>
> This is clearly incorrect since "hoge" gets moved to the end. The
> rearrangement logic needs to take into account the '--'. For your
> information, the glibc version yields the following result for the
> same options.
>
>   psql -t -T hoho -a -- [hoge] -1 hage hige huge

Ah, so it effectively retains the non-option ordering, even if there is a '--'. I think this behavior is worth keeping. I attempted to fix this in the attached patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

From 51e1033374c464ed90484f641d31b0ab705672f2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Fri, 9 Jun 2023 15:51:58 -0700
Subject: [PATCH v4 1/1] Teach in-tree getopt_long() to move non-options to the
 end of argv.

---
 src/bin/scripts/t/040_createuser.pl | 10 +++---
 src/port/getopt_long.c              | 50 ++++++++++++++++++++---------
 2 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/src/bin/scripts/t/040_createuser.pl b/src/bin/scripts/t/040_createuser.pl
index 9ca282181d..ba40ab11a3 100644
--- a/src/bin/scripts/t/040_createuser.pl
+++ b/src/bin/scripts/t/040_createuser.pl
@@ -42,9 +42,9 @@
 	'add a role as a member with admin option of the newly created role');
 $node->issues_sql_like(
 	[
-		'createuser', '-m',
-		'regress_user3', '-m',
-		'regress user #4', 'REGRESS_USER5'
+		'createuser', 'REGRESS_USER5',
+		'-m', 'regress_user3',
+		'-m', 'regress user #4'
 	],
 	qr/statement: CREATE ROLE "REGRESS_USER5" NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN NOREPLICATION NOBYPASSRLS ROLE regress_user3,"regress user #4";/,
 	'add a role as a member of the newly created role');
@@ -73,11 +73,13 @@
 	qr/statement: CREATE ROLE regress_user11 NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN NOREPLICATION NOBYPASSRLS IN ROLE regress_user1;/,
 	'--role');
 $node->issues_sql_like(
-	[ 'createuser', '--member-of', 'regress_user1', 'regress_user12' ],
+	[ 'createuser', 'regress_user12', '--member-of', 'regress_user1' ],
 	qr/statement: CREATE ROLE regress_user12 NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN NOREPLICATION NOBYPASSRLS IN ROLE regress_user1;/,
 	'--member-of');
 $node->command_fails([ 'createuser', 'regress_user1' ],
 	'fails if role already exists');
+$node->command_fails([ 'createuser', 'regress_user1', '-m', 'regress_user2', 'regress_user3' ],
+	'fails for too many non-options');
 
 done_testing();

diff --git a/src/port/getopt_long.c b/src/port/getopt_long.c
index c989276988..a4bdc6c8f0 100644
--- a/src/port/getopt_long.c
+++ b/src/port/getopt_long.c
@@ -50,8 +50,9 @@
  * This implementation does not use optreset.  Instead, we guarantee that
  * it can be restarted on a new argv array after a previous call returned -1,
  * if the caller resets optind to 1 before the first call of the new series.
- * (Internally, this means we must be sure to reset "place" to EMSG before
- * returning -1.)
+ *
+ * Note that this routine reorders the pointers in argv (despite the const
+ * qualifier) so that all non-options will be at the end when -1 is returned.
  */
 int
 getopt_long(int argc, char *const argv[],
@@ -60,38 +61,59 @@ getopt_long(int argc, char *const argv[],
 {
 	static char *place = EMSG;	/* option letter processing */
 	char	   *oli;			/* option letter list index */
+	static int	nonopt_start = -1;
+	static bool force_nonopt = false;
 
 	if (!*place)
 	{							/* update scanning pointer */
+retry:
 		if (optind >= argc)
 		{
 			place = EMSG;
+			nonopt_start = -1;
+			force_nonopt = false;
 			return -1;
 		}
 
 		place = argv[optind];
 
-		if (place[0] != '-')
+		/* non-options, including '-' and anything after '--' */
+		if (place[0] != '-' || place[1] == '\0' || force_nonopt)
 		{
-			place = EMSG;
-			return -1;
-		}
+			char	  **args = (char **) argv;
 
-		place++;
+			/*
+			 * If only non-options remain, return -1.  Else, move the
+			 * non-option to the end and try again.
+			 */
+			if (optind == nonopt_start)
+			{
+				place = EMSG;
+				nonopt_start = -1;
+				force_nonopt = false;
+				return -1;
+			}
 
-		if (!*place)
-		{
-			/* treat "-" as not being an option */
-			place = EMSG;
-			return -1;
+			for (int i = optind; i < argc - 1; i++)
+				args[i] = args[i + 1];
+			args[argc - 1] = place;
+
+			if (nonopt_start == -1)
+				nonopt_start = argc - 1;
+			else
+				nonopt_start--;
+
+			goto retry;
 		}
 
 		place++;
+
 		if (place[0] == '-'
Re: [PATCH] Missing dep on Catalog.pm in meson rules
On 2023-06-14 We 15:32, Andres Freund wrote:
> Hi,
>
> On 2023-06-09 11:43:54 -0700, Andres Freund wrote:
>
> On 2023-06-02 10:13:44 -0500, Tristan Partin wrote:
>
> On Fri Jun 2, 2023 at 8:47 AM CDT, Andres Freund wrote:
>
> Hi,
>
> On 2023-06-02 08:10:43 -0500, Tristan Partin wrote:
>
> I wonder if we instead could just make perl output the files it loads
> and handle dependencies automatically that way? But that's more work,
> so it's probably the right thing to go for the manual path for now.
>
> I am not familar with Perl enough (at all haha) to know if that is
> possible. I don't know exactly what these Perl files do, but perhaps it
> might make sense to have some global lookup table that is setup near
> the beginning of the script.
>
> It'd be nice to have something more general - there are other perl
> modules we load, e.g.
>
>   ./src/backend/catalog/Catalog.pm
>   ./src/backend/utils/mb/Unicode/convutils.pm
>   ./src/tools/PerfectHash.pm
>
> perl_files = {
>   'Catalog.pm': files('path/to/Catalog.pm'),
>   ...
> }
>
> I think you got it, but just to make sure: I was thinking of generating
> a depfile from within perl. Something like what you propose doesn't
> quite seem like a sufficient improvement.
>
> Whatever I am proposing is definitely subpar to generating a depfile.
> So if that can be done, that is the best option!
>
> I looked for a bit, but couldn't find an easy way to do so. I would
> still like to pursue going towards dep files for the perl scripts, even
> if that requires explicit support in the perl scripts, but that's a
> change for later.
>
> Took a second look - sure looks like just using values %INC should
> suffice? Ilmari, you're the perl expert, is there an issue with that?
>
> Tristan, any chance you're interested in hacking that up for a bunch of
> the scripts? Might be worth adding a common helper for, I guess?
>
> Something like
>
>   for (values %INC)
>   {
>       print STDERR "$kw_def_file: $_\n";
>   }
>
> seems to roughly do the right thing for gen_keywordlist.pl. Of course
> for something real it'd need an option where to put that data, instead
> of printing to stderr.

Unless I'm misunderstanding, this doesn't look terribly feasible to me. You can only get at %INC by loading the module, which in many cases will have side effects. And then you would also need to filter out things loaded that are not our artefacts (e.g. Catalog.pm loads File::Compare).

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: RFC: logical publication via inheritance root?
On Fri, Jun 16, 2023 at 6:26 AM Amit Kapila wrote:
> On Sat, Apr 1, 2023 at 5:06 AM Jacob Champion wrote:
> > I think it comes down to why an inheritance scheme was used. If it's
> > because you want to group rows into named, queryable subsets (e.g. the
> > "cities/capitals" example in the docs [1]), I don't think this has any
> > utility, because I assume you'd want to replicate your subsets as-is.
>
> I also think so and your idea to have a function like
> pg_set_logical_root() seems to make the inheritance hierarchy behaves
> as a declarative partitioning scheme for the purpose of logical
> replication.

Right.

> > But if it's because you've implemented a partitioning scheme of your
> > own (the docs still list reasons you might want to [2], even today),
> > and all you ever really do is interact with the root table, I think
> > this feature will give you some of the same benefits that
> > publish_via_partition_root gives native partition users. We're very
> > much in that boat, but I don't know how many others are.
> >
>
> I agree that there may still be cases as pointed out by you where
> people want to use inheritance as a mechanism for partitioning but I
> feel those would still be in the minority.

(Just to clarify -- timescaledb is one of those cases. They definitely still exist.)

> Personally, I am not very
> excited about this idea.

Yeah, "exciting" isn't how I'd describe this feature either :D But I think we're probably locked out of logical replication without the ability to override publish_as_relid for our internal tables, somehow. And I don't think DDL replication will help, just like it wouldn't necessarily help existing publish_via_partition_root use cases, because we don't want to force the source table's hierarchy on the target table. (A later version of timescaledb may not even use the same internal layout.)

Is there an alternative implementation I'm missing, maybe, or a way to make this feature more generally applicable? "We have table Y and want it to be migrated as part of table X" seems to fall squarely under the logical replication umbrella.

Thanks,
--Jacob
test_extensions: fix inconsistency between meson.build and Makefile
Patch attached. Currently, the Makefile specifies NO_LOCALE=1, and the meson.build does not.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

From 1775c98badb94a2ee185d7a6bd11482a4e5db58a Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Fri, 16 Jun 2023 11:51:00 -0700
Subject: [PATCH v1] test_extensions: make meson.build consistent with
 Makefile.

Specify --no-locale and --encoding=UTF8 to be consistent with the
Makefile, which specifies NO_LOCALE=1.
---
 src/test/modules/test_extensions/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/test/modules/test_extensions/meson.build b/src/test/modules/test_extensions/meson.build
index 29e5bb2fb5..698775b28d 100644
--- a/src/test/modules/test_extensions/meson.build
+++ b/src/test/modules/test_extensions/meson.build
@@ -47,5 +47,6 @@ tests += {
       'test_extensions',
       'test_extdepend',
     ],
+    'regress_args': ['--no-locale', '--encoding=UTF8'],
   },
 }
-- 
2.34.1
Default client_connection_check_interval to 10s on supported systems
Hi,

I am proposing that the default value of client_connection_check_interval be moved to a non-zero value on systems where it is supported. I think it would be a nice quality of life improvement. I have run into a problem where this would have been useful before, with regard to pgbench not currently handling SIGINT correctly[0].

I basically picked 10s out of thin air and am happy to change it to what others feel would be more appropriate. This doesn't need to be a check that happens often, because it should just be a backstop for unusual scenarios or poorly programmed clients. The original thread where Thomas committed these changes seemed to indicate no performance impact[1].

The only reason I can think of for this being turned off by default is that it isn't available on all systems that Postgres supports. When this was committed, however, the only system that seemed to support EPOLLRDHUP was Linux. Seems like in recent years this story has changed, given the description of the parameter:

> This option relies on kernel events exposed by Linux, macOS, illumos and
> the BSD family of operating systems, and is not currently available on
> other systems.

[0]: https://www.postgresql.org/message-id/CSSWBAX56CVY.291H6ZNNHK7EO@c3po
[1]: https://www.postgresql.org/message-id/ca+hukg++kitznuoxw2-kob1pkwd2cyuqa9vlj5bf0g_i7l1...@mail.gmail.com

--
Tristan Partin
Neon (https://neon.tech)

From d38cc95dceda5fcf98c8aae6379085e9c38a3155 Mon Sep 17 00:00:00 2001
From: Tristan Partin
Date: Fri, 16 Jun 2023 12:47:26 -0500
Subject: [PATCH v1] Default client_connection_check_interval to 10s on
 supported systems

client_connection_check_interval is an inherently valuable parameter
when supported. It can help cancel long running queries server-side
when a connection drops client-side.
---
 configure.ac                        |  3 +++
 doc/src/sgml/config.sgml            | 10 ++++++----
 meson.build                         |  2 ++
 src/backend/utils/misc/guc_tables.c |  7 ++++++-
 src/include/pg_config.h.in          |  6 ++++++
 src/tools/msvc/Solution.pm          |  2 ++
 6 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/configure.ac b/configure.ac
index 09558ada0f..3df7bbeb4f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1846,6 +1846,9 @@ AC_CHECK_DECLS([strlcat, strlcpy, strnlen])
 AC_CHECK_DECLS([preadv], [], [AC_LIBOBJ(preadv)], [#include <sys/uio.h>])
 AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include <sys/uio.h>])
 
+AC_CHECK_DECLS(EPOLLHUP, [], [], [#include <sys/epoll.h>])
+AC_CHECK_DECLS(EPOLLRDHUP, [], [], [#include <sys/epoll.h>])
+
 # This is probably only present on macOS, but may as well check always
 AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include <fcntl.h>])
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..24f69ebd15 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1044,10 +1044,12 @@ include_dir 'conf.d'
         If the value is specified without units, it is taken as milliseconds.
-        The default value is 0, which disables connection
-        checks. Without connection checks, the server will detect the loss of
-        the connection only at the next interaction with the socket, when it
-        waits for, receives or sends data.
+        The default value is 0 on systems not previously
+        mentioned, such as Windows, which disables connection checks, and a
+        default value of 1, otherwise. Without
+        connection checks, the server will detect the loss of the connection
+        only at the next interaction with the socket, when it waits for,
+        receives or sends data.
         For the kernel itself to detect lost TCP connections reliably and within
 
diff --git a/meson.build b/meson.build
index 82f2782673..8de6a7a7f1 100644
--- a/meson.build
+++ b/meson.build
@@ -2163,6 +2163,8 @@ endforeach
 
 decl_checks = [
+  ['EPOLLHUP', 'sys/epoll.h'],
+  ['EPOLLRDHUP', 'sys/epoll.h'],
   ['F_FULLFSYNC', 'fcntl.h'],
   ['fdatasync', 'unistd.h'],
   ['posix_fadvise', 'fcntl.h'],
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 71e27f8eb0..8cfd6b61bb 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3478,7 +3478,12 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_MS
 		},
 		&client_connection_check_interval,
-		0, 0, INT_MAX,
+#if defined(HAVE_DECL_EPOLLHUP) || defined(HAVE_DECL_EPOLLRDHUP)
+		1,
+#else
+		0,
+#endif
+		0, INT_MAX,
 		check_client_connection_check_interval, NULL, NULL
 	},
 
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 6d572c3820..7ce5004550 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -91,6 +91,12 @@
 /* Define to 1 if you have the `CRYPTO_lock' function. */
 #undef HAVE_CRYPTO_LOCK
 
+/* Define to 1 if you have the EPOLLHUP flag */
+#undef HAVE_DECL_EPOLLHUP
+
+/* Define to 1 if you have the EPOLLRDHUP flag */
+#undef HAVE_DECL_EPOLLRDHUP
+
Re: test_extensions: fix inconsistency between meson.build and Makefile
On Fri Jun 16, 2023 at 3:29 PM CDT, Jeff Davis wrote:
> Patch attached. Currently, the Makefile specifies NO_LOCALE=1, and the
> meson.build does not.

Looks alright to me, but it might be nicer to change the order of arguments to match contrib/unaccent/meson.build:40. Might help with grepping in the future.

--
Tristan Partin
Neon (https://neon.tech)
Adding further hardening to nbtree page deletion
Attached patch adds additional hardening to nbtree page deletion. It makes nbtree VACUUM tolerate a certain sort of cross-page inconsistency in the structure of an index (corruption). VACUUM can press on, avoiding an eventual wraparound/xidStopLimit failure in environments where nobody notices the problem for an extended period. This is very similar to my recent commit 5abff197 (though it's even closer to commit 5b861baa). Once again we're demoting an ERROR to a LOG message, and pressing on with vacuuming. I propose that this patch be backpatched all the way, too. The hardening added by the patch seems equally well targeted and low risk. It's a parent/child inconsistency, as opposed to a sibling inconsistency. Very familiar stuff, overall. I have seen an internal report of the ERROR causing issues for a production instance, so this definitely can fail in the field on modern Postgres versions. Though this particular inconsistency ("right sibling is not next child...") has a long history. It has definitely been spotted in the field several times over many years. This 2006 thread about problems with a Wisconsin courts database is one example of that: https://www.postgresql.org/message-id/flat/3355.1144873721%40sss.pgh.pa.us#b0a89b2d9e7f6a3c818fdf723b8fa29b At the time the ERROR was a PANIC. A few years later (in 2010), it was demoted to an ERROR (see commit 8fa30f90). And now I want to demote it to a LOG -- which is much easier now that we have a robust approach to page deletion (after 2014 commit efada2b8e9). -- Peter Geoghegan v1-0001-nbtree-VACUUM-cope-with-topparent-inconsistencies.patch Description: Binary data
Re: [PATCH] Missing dep on Catalog.pm in meson rules
Hi, On 2023-06-16 16:20:14 -0400, Andrew Dunstan wrote: > Unless I'm misunderstanding, this doesn't look terribly feasible to me. You > can only get at %INC by loading the module, which in many cases will have > side effects. I was envisioning using %INC after the use/require block - I don't think our scripts load additional modules after that point? > And then you would also need to filter out things loaded that > are not our artefacts (e.g. Catalog.pm loads File::Compare). I don't think we would need to filter the output. This would just be for a build dependency file. I don't see a problem with rerunning genbki.pl et al after somebody updates File::Compare? Greetings, Andres Freund
Re: [PATCH] pg_regress.c: Fix "make check" on Mac OS X: Pass DYLD_LIBRARY_PATH
I have applied the patch to the latest master branch and successfully executed './configure && make && make check' on macOS Ventura. However, during the process, a warning was encountered: "mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]". Moving the declaration of 'result' to the beginning like below can resolve the warning, and it would be better to use a unique variable instead of 'result'. #ifdef __darwin__ static char extra_envvars[4096]; +int result = -1; ... ... -int result = snprintf(extra_envvars, sizeof(extra_envvars), "DYLD_LIBRARY_PATH=%s", +result = snprintf(extra_envvars, sizeof(extra_envvars), "DYLD_LIBRARY_PATH=%s",
Re: [17] collation provider "builtin"
On Fri, 2023-06-16 at 16:01 +0200, Peter Eisentraut wrote: > What happens if after this patch you continue to specify > provider=libc > and locale=C? Do you then get the "slow" path? Users can continue to use the libc provider as they did before and the fast path will still work. > Should there be some logic in pg_dump to change the provider if > locale=C? That's not a part of this proposal. > What is the transition plan? The built-in provider is for users who want to choose a provider that is guaranteed not to have the problems of an external provider (versioning, tracking affected objects, corrupt indexes, and slow performance). If they initialize with --locale-provider=builtin and --locale=C, and they want to choose a different locale for another database, they'll need to specifically choose libc or ICU. Of course they can still use specific collations attached to columns or queries as required. It also acts as a nice complement to ICU (which doesn't support the C locale) or a potential replacement for many uses of the libc provider with the C locale. We can discuss later exactly how that will look, but even if the builtin provider needs to be explicitly requested (as in the current patch), it's still useful, so I don't think we need to decide now. We should also keep in mind that whatever provider is selected at initdb time also becomes the default for future databases. Regards, Jeff Davis
Fixing tab-complete for dollar-names
Hi hackers, In modern versions of Postgres the dollar sign is a totally legal character for identifiers (except for the first character), but tab-complete does not treat such identifiers well. For example, if one tries to create an Oracle-style view like this: create view v$activity as select * from pg_stat_activity; he will get a normally functioning view, but psql tab-complete will not help him. Type "v", "v$" or "v$act" and press Tab - nothing will be suggested. Attached is a small patch fixing this problem. Honestly I'm a little surprised that this was not done before. Maybe there are some special considerations I am not aware of, and the patch will break something? What would you say? -- best regards, Mikhail A. Gribkov v001_fix_dollar_names_tab_complete.patch Description: Binary data
Re: pg_collation.collversion for C.UTF-8
On Thu, 2023-06-15 at 19:15 +1200, Thomas Munro wrote: > Hmm, OK let's explore that. What could we do that would be helpful > here, without affecting users of the "true" C.UTF-8 for the rest of > time? Where is the "true" C.UTF-8 defined? I assume you mean that the collation order can't (shouldn't, anyway) change. But what about the ctype (upper/lower/initcap) behavior? Is that also locked down for all time, or could it change if some new unicode characters are added? Would it be correct to interpret LC_COLLATE=C.UTF-8 as LC_COLLATE=C, but leave LC_CTYPE=C.UTF-8 as-is? Regards, Jeff Davis
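As a rough illustration of the split asked about here (a sketch only: the collation name is made up, and it assumes the operating system actually ships a C.UTF-8 locale), the libc provider already lets the two categories be set independently per collation:

    -- sketch: byte-order sorting, but UTF-8-aware character classification
    CREATE COLLATION c_sort_utf8_ctype (
        provider   = libc,
        lc_collate = 'C',
        lc_ctype   = 'C.UTF-8'
    );
    -- upper/lower/initcap then follow the UTF-8 ctype data rather than plain "C"
    SELECT upper('é' COLLATE c_sort_utf8_ctype);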
Re: Default client_connection_check_interval to 10s on supported systems
"Tristan Partin" writes: > I am proposing that the default value of > client_connection_check_interval be moved to a non-zero value on > systems where it is supported. I think it would be a nice quality of > life improvement. I doubt that we need this, and I *really* doubt that it's appropriate to use a timeout as short as 10s. One reason not to try to enable it by default is exactly that the default behavior would then be platform-dependent. That's a documentation problem we could do without. regards, tom lane
Re: [PATCH] Missing dep on Catalog.pm in meson rules
Andres Freund writes: > Hi, > > On 2023-06-16 16:20:14 -0400, Andrew Dunstan wrote: >> Unless I'm misunderstanding, this doesn't look terribly feasible to me. You >> can only get at %INC by loading the module, which in many cases will have >> side effects. > > I was envisioning using %INC after the use/require block - I don't think our > scripts load additional modules after that point? I was thinking of a module for writing depfile entries that would append `values %INC` to the list of source files for each target specified by the script. > >> And then you would also need to filter out things loaded that > >> are not our artefacts (e.g. Catalog.pm loads File::Compare). > > > > I don't think we would need to filter the output. This would just be for a > > build dependency file. I don't see a problem with rerunning genbki.pl et al > > after > > somebody updates File::Compare? As long as meson doesn't object to dep files outside the source tree. Otherwise, an option would be to pass in @SOURCE_ROOT@ and only include `grep /^\Q$source_root\E\b/, values %INC` in the depfile. > Greetings, > > Andres Freund - ilmari
Re: [PATCH] Missing dep on Catalog.pm in meson rules
On Fri Jun 16, 2023 at 5:10 PM CDT, Dagfinn Ilmari Mannsåker wrote: > Andres Freund writes: > > > Hi, > > > > On 2023-06-16 16:20:14 -0400, Andrew Dunstan wrote: > >> Unless I'm misunderstanding, this doesn't look terribly feasible to me. You > >> can only get at %INC by loading the module, which in many cases will have > >> side effects. > > > > I was envisioning using %INC after the use/require block - I don't think our > > scripts load additional modules after that point? > > I was thinking of a module for writing depfile entries that would append > `values %INC` to the list of source files for each target specified by > the script. > > >> And then you would also need to filter out things loaded that > >> are not our artefacts (e.g. Catalog.pm loads File::Compare). > > > > I don't think we would need to filter the output. This would just be for a > > build dependency file. I don't see a problem with rerunning genbki.pl et al > > after > > somebody updates File::Compare? > > As long as mason doesn't object to dep files outside the source tree. > Otherwise, and option would be to pass in @SOURCE_ROOT@ and only include > `grep /^\Q$source_root\E\b/, values %INC` in the depfile. Meson has no such restrictions. -- Tristan Partin Neon (https://neon.tech)
Re: Default client_connection_check_interval to 10s on supported systems
On Fri Jun 16, 2023 at 5:10 PM CDT, Tom Lane wrote: > "Tristan Partin" writes: > > I am proposing that the default value of > > client_connection_check_interval be moved to a non-zero value on > > systems where it is supported. I think it would be a nice quality of > > life improvement. > > I doubt that we need this, and I *really* doubt that it's appropriate > to use a timeout as short as 10s. Sure. Like I said, 10s is just a number pulled from thin air. The original patches on the mailing list had this as low as 1s. > One reason not to try to enable it by default is exactly that the > default behavior would then be platform-dependent. That's a > documentation problem we could do without. Totally fine with me if this is the reason for rejection. Just wanted to put it out there for discussion since I don't think the original thread covered the default value very much. -- Tristan Partin Neon (https://neon.tech)
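For what it's worth, nothing stops an installation from turning the check on today regardless of what the default ends up being - a minimal sketch, reusing the (arbitrary) 10s figure from this thread:

    ALTER SYSTEM SET client_connection_check_interval = '10s';
    SELECT pg_reload_conf();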
Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
On 2023/6/16 23:39, Tomas Vondra wrote: On 6/16/23 11:25, Quan Zongliang wrote: We have a small table with only 23 rows and 21 values. The resulting MCV and histogram is as follows stanumbers1 | {0.08695652,0.08695652} stavalues1 | {v1,v2} stavalues2 | {v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15,v16,v17,v18,v19,v20,v21} An incorrect number of rows was estimated when HashJoin was done with another large table (about 2 million rows). Hash Join (cost=1.52..92414.61 rows=2035023 width=0) (actual time=1.943..1528.983 rows=3902 loops=1) That's interesting. I wonder how come the estimate gets this bad simply by skipping values entries with a single row in the sample, which means we know the per-value selectivity pretty well. I guess the explanation has to be something strange happening when estimating the join condition selectivity, where we combine MCVs from both sides of the join (which has to be happening here, otherwise it would not matter what gets to the MCV). It'd be interesting to know what's in the other MCV, and what are the other statistics for the attributes (ndistinct etc.). Or even better, a reproducer SQL script that builds two tables and then joins them. The other table is severely skewed. Most rows cannot JOIN the small table. This special case causes the inaccuracy of the cost calculation. The reason is that the MCV of the small table excludes values with rows of 1. Put them in the MCV in the statistics to get the correct result. Using the conservative samplerows <= attstattarget doesn't completely solve this problem. It can solve this case. After modification we get statistics without histogram: stanumbers1 | {0.08695652,0.08695652,0.04347826,0.04347826, ... } stavalues1 | {v,v2, ... } And we have the right estimates: Hash Join (cost=1.52..72100.69 rows=3631 width=0) (actual time=1.447..1268.385 rows=3902 loops=1) I'm not against building a "complete" MCV, but I guess the case where (samplerows <= num_mcv) is pretty rare. Why shouldn't we make the MCV complete whenever we decide (ndistinct <= num_mcv)? That would need to happen later, because we don't have the ndistinct estimate yet at this point - we'd have to do the loop a bit later (or likely twice). FWIW the patch breaks the calculation of nmultiple (and thus likely the ndistinct estimate). It's not just small tables: if a column's values are nearly unique, the same problem occurs because we exclude values that occur only once. samplerows <= num_mcv just solves one scenario. Perhaps we should discard this (dups cnt > 1) restriction? regards
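(For anyone wanting to look at the statistics mentioned above, they are visible through pg_stats - the table and column names below are placeholders, not the reporter's real ones:)

    SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
    FROM pg_stats
    WHERE tablename = 'big_table' AND attname = 'join_col';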
Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
Quan Zongliang writes: > Perhaps we should discard this (dups cnt > 1) restriction? That's not going to happen on the basis of one test case that you haven't even shown us. The implications of doing it are very unclear. In particular, I seem to recall that there are bits of logic that depend on the assumption that MCV entries always represent more than one row. The nmultiple calculation Tomas referred to may be failing because of that, but I'm worried about there being other places. Basically, you're proposing a rather fundamental change in the rules by which Postgres has gathered statistics for decades. You need to bring some pretty substantial evidence to support that. The burden of proof is on you, not on the status quo. regards, tom lane
Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
On 2023/6/17 06:46, Tom Lane wrote: Quan Zongliang writes: Perhaps we should discard this (dups cnt > 1) restriction? That's not going to happen on the basis of one test case that you haven't even shown us. The implications of doing it are very unclear. In particular, I seem to recall that there are bits of logic that depend on the assumption that MCV entries always represent more than one row. The nmultiple calculation Tomas referred to may be failing because of that, but I'm worried about there being other places. The statistics for the other table look like this: stadistinct | 6 stanumbers1 | {0.50096667,0.49736667,0.0012} stavalues1 | {v22,v23,v5} The value that appears twice in the small table (v1 and v2) does not appear here. The stadistinct's true value is 18 instead of 6 (three values in the small table do not appear here). When calculating the selectivity: if (nd2 > sslot2->nvalues) totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - sslot2->nvalues); totalsel1 = 0 nd2 = 21 sslot2->nvalues = 2 unmatchfreq1 = 0.0002016420476 otherfreq2 = 0.82608695328235626 result: totalsel1 = 0.043473913749706022 rows = 0.043473913749706022 * 23 * 2,000,000 = 1999800 Basically, you're proposing a rather fundamental change in the rules by which Postgres has gathered statistics for decades. You need to bring some pretty substantial evidence to support that. The burden of proof is on you, not on the status quo. regards, tom lane
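A reproducer sketch along the lines Tomas asked for - entirely made-up tables and data that merely mimic the shape of the statistics described above (23 rows / 21 distinct values on one side, ~2 million heavily skewed rows on the other), so it may not hit exactly the same estimates:

    CREATE TABLE small_t (v text);
    INSERT INTO small_t
        SELECT 'v' || i FROM generate_series(1, 21) i;   -- 21 distinct values
    INSERT INTO small_t VALUES ('v1'), ('v2');           -- two duplicates, 23 rows total

    CREATE TABLE big_t (v text);
    INSERT INTO big_t                                    -- ~2M rows, almost all non-matching
        SELECT CASE WHEN i % 2 = 0 THEN 'v22' ELSE 'v23' END
        FROM generate_series(1, 2000000) i;
    INSERT INTO big_t                                    -- a small number of matching rows
        SELECT 'v5' FROM generate_series(1, 4000);

    ANALYZE small_t;
    ANALYZE big_t;
    EXPLAIN ANALYZE SELECT count(*) FROM small_t JOIN big_t USING (v);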
Re: Do we want a hashset type?
On Fri, Jun 16, 2023, at 17:42, Joel Jacobson wrote: > I realise int4hashset_hash() is broken, > since two int4hashset's that are considered equal, > can by coincidence get different hashes: ... > Do we have any ideas on how to fix this without sacrificing performance? The problem was due to the hashset_hash() function accumulating the hashes of individual elements in a non-commutative manner. As a consequence, the final hash value was sensitive to the order in which elements were inserted into the hashset. This behavior led to inconsistencies, as logically equivalent sets (i.e., sets with the same elements but in different orders) produced different hash values. This was solved by modifying the hashset_hash() function to use a commutative operation when combining the hashes of individual elements. This change ensures that the final hash value is independent of the element insertion order, and logically equivalent sets produce the same hash. A somewhat unfortunate side-effect of this fix is that we can no longer visually sort the hashset output format, since it's not lexicographically sorted. I think this is an acceptable trade-off for a hashset type, since the only alternative I see would be to sort the elements, but then it wouldn't be a hashset, but a treeset, with different Big-O complexity. New patch is attached, which will henceforth always be a complete patch, to avoid the hassle of having to assemble incremental patches. /Joel hashset-0.0.1-8da4aa8.patch Description: Binary data
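Not the extension's actual code, just an illustration with built-in functions of why combining per-element hashes with a commutative operator (XOR here) makes the final value independent of insertion order:

    SELECT (hashint4(1) # hashint4(2)) # hashint4(3) AS combined_123,
           (hashint4(3) # hashint4(1)) # hashint4(2) AS combined_312;
    -- both columns are equal, because # is commutative and associative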
Re: RFC: logical publication via inheritance root?
On Sat, Jun 17, 2023 at 1:51 AM Jacob Champion wrote: > > On Fri, Jun 16, 2023 at 6:26 AM Amit Kapila wrote: > > > > But if it's because you've implemented a partitioning scheme of your > > > own (the docs still list reasons you might want to [2], even today), > > > and all you ever really do is interact with the root table, I think > > > this feature will give you some of the same benefits that > > > publish_via_partition_root gives native partition users. We're very > > > much in that boat, but I don't know how many others are. > > > > > > > I agree that there may still be cases as pointed out by you where > > people want to use inheritance as a mechanism for partitioning but I > > feel those would still be in the minority. > > (Just to clarify -- timescaledb is one of those cases. They definitely > still exist.) > Noted, but I think that can't be the reason to accept this feature in core. > > Personally, I am not very > > excited about this idea. > > Yeah, "exciting" isn't how I'd describe this feature either :D But I > think we're probably locked out of logical replication without the > ability to override publish_as_relid for our internal tables, somehow. > And I don't think DDL replication will help, just like it wouldn't > necessarily help existing publish_via_partition_root use cases, > because we don't want to force the source table's hierarchy on the > target table. (A later version of timescaledb may not even use the > same internal layout.) > > Is there an alternative implementation I'm missing, maybe, or a way to > make this feature more generally applicable? "We have table Y and want > it to be migrated as part of table X" seems to fall squarely under the > logical replication umbrella. > Are you talking about this w.r.t inheritance/partition hierarchy? I don't see any other way except "publish_via_partition_root" because we expect the same schema and relation name on the subscriber to replicate. You haven't explained why exactly you have such a requirement of replicating via inheritance root aka why you want inheritance hierarchy to be different on target db. The other idea that came across my mind was to provide some schema mapping kind of feature on subscribers where we could route the tuples from table X to table Y provided they have the same or compatible schema. I don't know if this is feasible or how generally it will be useful and whether any other DB (replication solution) provides such a feature but I guess something like that would have helped your use case. -- With Regards, Amit Kapila.
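(For readers following along, the existing partitioned-table option that this proposal would effectively mirror for inheritance hierarchies looks like this - publication and table names are made up:)

    CREATE PUBLICATION pub_root FOR TABLE measurement
        WITH (publish_via_partition_root = true);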
Re: pg_collation.collversion for C.UTF-8
On Sat, Jun 17, 2023 at 10:03 AM Jeff Davis wrote: > On Thu, 2023-06-15 at 19:15 +1200, Thomas Munro wrote: > > Hmm, OK let's explore that. What could we do that would be helpful > > here, without affecting users of the "true" C.UTF-8 for the rest of > > time? > > Where is the "true" C.UTF-8 defined? By "true" I just meant glibc's official one, in contrast to the imposter from Debian oldstable's patches. It's not defined by any standard, but we only know how to record versions for glibc, FreeBSD and Windows, and we know what the first two of those do for that locale because they tell us (see below). For Windows, the manual's BNF-style description of acceptable strings doesn't appear to accept C.UTF-8 (but I haven't tried it). > I assume you mean that the collation order can't (shouldn't, anyway) > change. But what about the ctype (upper/lower/initcap) behavior? Is > that also locked down for all time, or could it change if some new > unicode characters are added? Fair point. Considering that our collversion effectively functions as a proxy for ctype version too, Daniel's patch makes a certain amount of sense. Our versioning is nominally based only on the collation category, not locales more generally or any other category they contain (nominally, as in: we named it collversion, and our code and comments and discussions so far only contemplated collations in this context). But, clearly, changes to underlying ctype data could also cause a constraint CHECK (x ~ '[[:digit:]]') or a partial index with WHERE (upper(x) <> 'ẞ') to be corrupted, which I'd considered to be a separate topic, but Daniel's patch would cover it with the same mechanism. (Actually I just learned that [[:digit:]] is a bad example on a glibc system, because they appear to have hardcoded a test for [0-9] into their iswdigit_l() implementation, but FreeBSD gives the Unicode answer, which is subject to change, and other classes may work better on glibc.) > Would it be correct to interpret LC_COLLATE=C.UTF-8 as LC_COLLATE=C, > but leave LC_CTYPE=C.UTF-8 as-is? Yes. The basic idea, at least for these two OSes, is that every category behaves as if set to C, except LC_CTYPE. For implementation reasons the glibc people don't quite describe it that way[1]: for LC_COLLATE, they decode to codepoints first and then compare those using a new codepath they had to write for release 2.35, while FreeBSD skips that useless step and compares raw UTF-8 bytes like LC_COLLATE=C[2]. Which is the same, as RFC 3629 tells us: o The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers. Of course this is of limited interest since a sort order based on character numbers is almost never culturally valid. It is interesting to note that LC_COLLATE=C, LC_CTYPE=C.UTF-8 is equivalent, but would not get version warnings with Daniel's patch, revealing that it's only a proxy. But recording ctype version separately would be excessive. For completeness, Solaris also has C.UTF-8. I can't read about what it does, the release notes are behind a signup thing. *shrug* I can't find any other systems that have it. [1] https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 [2] https://reviews.freebsd.org/D17833
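A quick way to convince yourself of that RFC property from SQL - assuming the OS locale has been imported as a collation named "C.utf8" (the exact name varies between systems):

    SELECT 'é' < 'z' COLLATE "C"      AS byte_order,
           'é' < 'z' COLLATE "C.utf8" AS codepoint_order;
    -- the two columns should always agree for valid UTF-8 input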
Re: [DOC] Update ALTER SUBSCRIPTION documentation v3
On Fri, Jun 16, 2023 at 7:15 PM Peter Eisentraut wrote: > > On 15.06.23 04:49, Amit Kapila wrote: > > > > Now, along with this change, there is a change in errhint as well > > which I am not sure about whether to backpatch or not. I think we have > > the following options (a) commit both doc and code change in HEAD (b) > > commit both doc and code change in v17 when the next version branch > > opens (c) backpatch the doc change and commit the code change in HEAD > > only (d) backpatch the doc change and commit the code change in v17 > > (e) backpatch both the doc and code change. > > Reading the thread again now, I think this is essentially a bug fix, so > I don't mind backpatching it. > > I wish the errhint would show the actual command to disable the > subscription. It already shows the command to detach the replication > slot, so it would only be consistent to also show the other command. > Fair enough. I updated the errhint and slightly adjusted the docs as well in the attached. -- With Regards, Amit Kapila. logical_replication_slot_disable_doc_update_v5.patch Description: Binary data
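For reference, the two commands the hint is meant to point users at are of this shape (the subscription name is a placeholder):

    ALTER SUBSCRIPTION mysub DISABLE;
    ALTER SUBSCRIPTION mysub SET (slot_name = NONE);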