Re: postgres_fdw test timeouts
08.12.2023 02:02, Nathan Bossart wrote: On Fri, Dec 08, 2023 at 09:55:58AM +1300, Thomas Munro wrote: Now we have the question of whether to go forwards (commit the "socket table" thing), or backwards (revert 04a09ee for now to clear the CI failures). I don't love the hidden complexity of the socket table and am not in a hurry to commit it, but I don't currently see another way... on the other hand we have other CI flapping due to that problem too, so reverting 04a09ee would be sweeping problems under the carpet. I still need to process your feedback/discoveries on that other thread and it may take a few weeks for me to get to it. I don't think we need to revert 04a09ee provided the issue is unrelated and a fix is in development. I've reviewed the links posted upthread and analyzed statistics of such failures: yes, it happens rather frequently in Cirrus CI, but there might be dozens of successful runs, for example: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F45%2F3686 has 1 postgres_fdw failure on Windows per 32 runs. And there is only one such failure for 90 days in the buildfarm. (Perhaps the probability of the failure depends on external factors, such as concurrent activity.) So I would not say that it's a dominant failure for now, and given that 04a09ee lives in master only, maybe we can save two commits (Revert + Revert of revert) while moving to a more persistent solution. Best regards, Alexander
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina wrote: > > Thank you for your work. Unfortunately, your code contained errors during the make installation:
>
> 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
> 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
> make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
> make[1]: *** [Makefile:131: parser/gram.h] Error 2
> make[1]: *** Waiting for unfinished jobs
> make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
>
> I have the ubuntu 22.04 operating system. > > On 06.12.2023 13:47, jian he wrote: > > On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina > wrote: > > Hi! > > Thank you for your contribution to this thread. > > > I reviewed it and have a few questions. > > 1. I have seen that you delete a table before creating it, to which you want > to add errors due to a failed "copy from" operation. I think this is wrong > because this table can save useful data for the user. > At a minimum, we should warn the user about this, but I think we can just add > some number at the end of the name, such as name_table1, name_table_2. > > Sorry. I don't understand this part. > Currently, if the error table name already exists, then the copy will > fail, an error will be reported. > I try to first create a table, if no error then the error table will be > dropped. > > To be honest, first of all, I misunderstood this part of the code. Now I see > that it works the way you mentioned. > > However, I didn't see if you dealt with cases where we already had a table > with the same name as the error table. > I mean, what happens when it tries to create the table for the first time, or will we never be > able to face such a problem? > > Can you demo the expected behavior? > > Unfortunately, I was unable to launch it due to a build issue. > Hopefully attached will work. > 2. 
I noticed that you are forming a table name using the type of errors that > prevent rows from being added during 'copy from' operation. > I think it would be better to use the name of the source file that was used > while 'copy from' was running. > In addition, there may be several such files, it is also worth considering. > > Another column added. > now it looks like: >
> SELECT * FROM save_error_csv_error;
>  filename | lineno |          line          | field | source |                 err_message                 | err_detail | errorcode
> ----------+--------+------------------------+-------+--------+---------------------------------------------+------------+-----------
>  STDIN    |      1 | 2002232 40 50 60 70 80 | NULL  | NULL   | extra data after last expected column       | NULL       | 22P04
>  STDIN    |      1 | 2000230 23             | d     | NULL   | missing data for column "d"                 | NULL       | 22P04
>  STDIN    |      1 | z,,""                  | a     | z      | invalid input syntax for type integer: "z"  | NULL       | 22P02
>  STDIN    |      2 | \0,,                   | a     | \0     | invalid input syntax for type integer: "\0" | NULL       | 22P02
>
> Yes, I see the "filename" column, and this will solve the problem, but > "STDIN" is unclear to me. please see comment in struct CopyFromStateData: char *filename; /* filename, or NULL for STDIN */ > */ > > Maybe we can rewrite it like this: > > /* Check, the err_nsp.error_rel table has already existed > * and if it is, check its column name and data types. > refactored. From 2510dc2e2b13c60a5a7e184bf8e55325601d97e0 Mon Sep 17 00:00:00 2001 From: pgaddict Date: Sun, 10 Dec 2023 09:51:42 +0800 Subject: [PATCH v10 1/1] Make COPY FROM more error tolerant Currently COPY FROM has 3 types of error while processing the source file. * extra data after last expected column * missing data for column \"%s\" * data type conversion error. Instead of throwing errors while copying, save_error will save errors to a table automatically. We check the table definition via column name and column data type. if table already exists and meets the criteria then errors will save to that table. if the table does not exist, then create one. Only works for COPY FROM, non-BINARY mode. 
While copying, if error never happened, error save table will be dropped at the ending of COPY FROM. If the error saving table already exists, meaning at least once COPY FROM errors has happened, then all the future errors will be saved to that table. we save the error to error saving table using SPI, construct a query, then execute the query. --- contrib/file_fdw/file_fdw.c | 4 +- doc/src/sgml/ref/copy.sgml | 93 + src/backend/commands/copy.c | 12 ++ src/backend/commands/copyfrom.c | 146 +++- src/backend/commands/copyfromparse.c | 169 +-- src/backend/parser/gram.y| 8 +- src/bin/psql/tab-complete.c | 3 +- src/include/
Re: Synchronizing slots from primary to standby
On Wed, Dec 6, 2023 at 4:53 PM shveta malik wrote: > > v43-002: > Review comments on v43-0002: = 1. synchronize_one_slot() { ... + /* + * With hot_standby_feedback enabled and invalidations handled + * apropriately as above, this should never happen. + */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) + { + ereport(ERROR, + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" +" to remote slot's LSN(%X/%X) as synchronization " +" would move it backwards", remote_slot->name, +LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), +LSN_FORMAT_ARGS(remote_slot->restart_lsn))); + + goto cleanup; ... } After the error, the control won't return, so the above goto doesn't make any sense. 2. synchronize_one_slot() { ... + /* Search for the named slot */ + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) + { + SpinLockAcquire(&s->mutex); + sync_state = s->data.sync_state; + SpinLockRelease(&s->mutex); + } ... ... + ReplicationSlotAcquire(remote_slot->name, true); + + /* + * Copy the invalidation cause from remote only if local slot is not + * invalidated locally, we don't want to overwrite existing one. + */ + if (MyReplicationSlot->data.invalidated == RS_INVAL_NONE) + { + SpinLockAcquire(&MyReplicationSlot->mutex); + MyReplicationSlot->data.invalidated = remote_slot->invalidated; + SpinLockRelease(&MyReplicationSlot->mutex); + } + + /* Skip the sync if slot has been invalidated locally. */ + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) + goto cleanup; ... It seems useless to acquire the slot if it is locally invalidated in the first place. Won't it be better if after the search we first check whether the slot is locally invalidated and take appropriate action? 3. After doing the above two, I think it doesn't make sense to have goto at the remaining places in synchronize_one_slot(). We can simply release the slot and commit the transaction at other places. 4. + * Returns nap time for the next sync-cycle. 
+ */ +static long +synchronize_slots(WalReceiverConn *wrconn) Returning nap time from here appears a bit awkward. I think it is better if this function returns any_slot_updated and then the caller decides the adjustment of naptime. 5. +synchronize_slots(WalReceiverConn *wrconn) { ... ... + /* The syscache access needs a transaction env. */ + StartTransactionCommand(); + + /* + * Make result tuples live outside TopTransactionContext to make them + * accessible even after transaction is committed. + */ + MemoryContextSwitchTo(oldctx); + + /* Construct query to get slots info from the primary server */ + initStringInfo(&s); + construct_slot_query(&s); + + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); + + /* Execute the query */ + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); + pfree(s.data); + + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch failover logical slots info " + "from the primary server: %s", res->err))); + + CommitTransactionCommand(); ... ... } Where exactly in the above code, there is a syscache access as mentioned above StartTransactionCommand()? 6. - ~/.pgpass file on the standby server (use + ~/.pgpass file on the standby server. (use replication as the database name). Why do we need this change? 7. + standby. Additionally, similar to creating a logical replication slot + on the hot standby, hot_standby_feedback should be + set on the standby and a physical slot between the primary and the standby + should be used. In this, I don't understand the relation between the first part of the line: "Additionally, similar to creating a logical replication slot on the hot standby ..." with the rest. 8. However, + the slots which were in initiated sync_state ('i) and were not A single quote after 'i' is missing. 9. the slots with state 'r' and 'i' can neither be used for logical + decoded nor dropped by the user. /decoded/decoding 10. 
+/* + * Allocate and initialize slow sync worker shared memory + */ /slow/slot -- With Regards, Amit Kapila.
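Review comment 1 above hinges on ereport(ERROR) transferring control away through PostgreSQL's longjmp-based error machinery, so any statement after it in the same block is dead code. A minimal standalone model of that point (hypothetical names, with plain longjmp standing in for the real error-handling stack):

```c
#include <setjmp.h>

static jmp_buf errcontext;		/* stands in for PG_exception_stack */
static int	after_error_ran = 0;

/* Stand-in for ereport(ERROR): control never returns to the caller */
static void
report_error(void)
{
	longjmp(errcontext, 1);
}

/* Hypothetical sketch of the branch under review in synchronize_one_slot() */
static void
sync_slot(long remote_restart_lsn, long local_restart_lsn)
{
	if (remote_restart_lsn < local_restart_lsn)
	{
		report_error();
		after_error_ran = 1;	/* mirrors the unreachable "goto cleanup" */
	}
}

/* Run the failing scenario; returns 0 if the post-error line never executed */
static int
sync_slot_demo(void)
{
	after_error_ran = 0;
	if (setjmp(errcontext) == 0)
		sync_slot(1, 2);		/* remote LSN behind local: must error out */
	return after_error_ran;
}
```

The statement after report_error() is never reached, which is why the goto after the ereport(ERROR) can simply be removed.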
Re: Change GUC hashtable to use simplehash?
I wrote: > On Sun, Dec 10, 2023 at 2:18 AM Jeff Davis wrote: > > > > On Sat, 2023-12-09 at 18:52 +0700, John Naylor wrote: > > > > I tested using the new hash function APIs for my search path cache, > > > > and > > > > there's a significant speedup for cases not benefiting from > > > > a86c61c9ee. > > > > It's enough that we almost don't need a86c61c9ee. So a definite +1 > > > > to > > > > the new APIs. > > Interesting, thanks for testing! SearchPathCache is a better starting > point than dynahash for removing strlen calls anyway -- it's more > localized, uses simplehash, and we can test it with at-hand tests. Since I had to fix a misalignment in the original to keep ubsan from crashing CI anyway (v8-0005), I thought I'd take the experiment with search path cache and put the temporary validation of the hash function output in there (v8-0004). I had to finagle a bit to get the bytewise interface to give the same answer as the original, but that's okay: The bytewise interface is intended for when we don't know the length up front (and therefore the internal seed can't be tweaked with the length), but it's nice to make sure nothing's broken. There is also a chunkwise version for search path cache. That might be a little faster. Perf testing can be done as is, because I put the validation in assert builds only. I've left out the GUC stuff for now, just want to get CI green again. From 54e6419a632b04d97cad847603035050ab48c84f Mon Sep 17 00:00:00 2001 From: John Naylor Date: Sat, 9 Dec 2023 16:32:05 +0700 Subject: [PATCH v8 3/5] Add bytewise interface This is useful for hashing values with unknown length, like NUL-terminated strings. It should be faster than calling strlen() first and passing the length, which most hash functions require. Note: This method can't give the same answer as regular fasthash, so it will need to be evaluated. 
It's possible we need to mix in the length at the finalization step (at which time can know the length), in order to safeguard against collisions. --- src/include/common/hashfn_unstable.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h index fbae7a5522..80aec98dc9 100644 --- a/src/include/common/hashfn_unstable.h +++ b/src/include/common/hashfn_unstable.h @@ -49,6 +49,7 @@ typedef struct fasthash_state { uint64 accum; #define FH_SIZEOF_ACCUM sizeof(uint64) + int8 accum_len; uint64 hash; } fasthash_state; @@ -69,6 +70,7 @@ fasthash_combine(fasthash_state* hs) /* reset hash state for next input */ hs->accum = 0; + hs->accum_len = 0; } static inline void @@ -82,6 +84,18 @@ fasthash_init(fasthash_state *hs, int len, uint64 seed) hs->hash = seed ^ (len * 0x880355f21e6d1965ULL); } +static inline void +fasthash_accum_byte(fasthash_state *hs, const unsigned char ch) +{ + hs->accum <<= BITS_PER_BYTE; + hs->accum |= ch; + hs->accum_len++; + + // wip: is there a better way to get sizeof struct member? + if (hs->accum_len == sizeof(((fasthash_state *) 0)->accum)) + fasthash_combine(hs); +} + static inline void fasthash_accum(fasthash_state *hs, const unsigned char *k, int len) { @@ -117,6 +131,11 @@ fasthash_accum(fasthash_state *hs, const unsigned char *k, int len) static inline uint64 fasthash_final64(fasthash_state *hs) { + // check for remaining bytes to combine into hash + // should only be used by the bytewise interface + if (hs->accum_len > 0) + fasthash_combine(hs); + return fasthash_mix(hs->hash); } -- 2.43.0 From f5ab683d61724e9766d43e58c6f3177a30f708d0 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Sun, 10 Dec 2023 12:11:37 +0700 Subject: [PATCH v8 2/5] Rewrite fasthash functions using a homegrown incremental interface The incremental interface will be useful for cases we don't know the length up front, such as NUL-terminated strings. 
First, we need to validate that this interface can give the same answer as the original functions when we do know the length. A future commit will add a temporary assert for testing in CI. --- src/include/common/hashfn_unstable.h | 161 +-- 1 file changed, 153 insertions(+), 8 deletions(-) diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h index a5bf965fa2..fbae7a5522 100644 --- a/src/include/common/hashfn_unstable.h +++ b/src/include/common/hashfn_unstable.h @@ -1,3 +1,25 @@ +/* +Building blocks for creating fast inlineable hash functions. The +unstable designation is in contrast to hashfn.h, which cannot break +compatibility because hashes can be written to disk and so must have +the same hashes between versions. + + * + * Portions Copyright (c) 2018-2023, PostgreSQL Global Development Group + * + * src/include/common/hashfn_unstable.c + */ + +#ifndef HASHFN_UNSTABLE_H +#define HASHFN_UNSTABLE_H + +/* + * fasthash is a modification of code taken from + * https://code.google.com/archive/p/fast-hash/source/default/source + * under the te
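In outline, the bytewise interface in the patches above shifts each input byte into a 64-bit accumulator, folds the accumulator into the hash once eight bytes have been gathered, and folds any partial remainder at finalization. A simplified standalone model of that control flow (a toy mix step, not the real fasthash arithmetic) behaves like this:

```c
#include <stdint.h>

typedef struct demo_state
{
	uint64_t	accum;
	int			accum_len;
	uint64_t	hash;
	int			combines;		/* how many times the accumulator was folded */
} demo_state;

/* Toy stand-in for fasthash_combine(): fold accum into hash, reset accum */
static void
demo_combine(demo_state *hs)
{
	hs->hash ^= hs->accum * 0x880355f21e6d1965ULL;
	hs->accum = 0;
	hs->accum_len = 0;
	hs->combines++;
}

/* Mirrors the shape of fasthash_accum_byte() from the patch */
static void
demo_accum_byte(demo_state *hs, unsigned char ch)
{
	hs->accum <<= 8;
	hs->accum |= ch;
	if (++hs->accum_len == sizeof(hs->accum))
		demo_combine(hs);
}

/* Hash a NUL-terminated string without calling strlen() first */
static uint64_t
demo_hash_string(const char *s, int *combines)
{
	demo_state	hs = {0};

	for (; *s; s++)
		demo_accum_byte(&hs, (unsigned char) *s);
	if (hs.accum_len > 0)		/* final partial chunk, as in fasthash_final64 */
		demo_combine(&hs);
	*combines = hs.combines;
	return hs.hash;
}
```

An 8-byte string triggers exactly one combine; a 9-byte string triggers two (one full chunk plus the partial remainder at finalization), which is the behavior the added accum_len bookkeeping provides.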
Re: GUC names in messages
On Sat, Dec 9, 2023 at 1:48 AM Peter Eisentraut wrote: > > On 08.12.23 05:10, Peter Smith wrote: > > Patch 0001 -- "datestyle" becomes DateStyle in messages > > Rebased this again, which was part of an earlier patch set > > - I think any GUC names documented as MixedCase should keep that same > > case in messages; this also obeys the guidelines recently pushed [1]. > > - Some others agreed, expecting the exact GUC name (in the message) > > can be found in pg_settings [2]. > > - OTOH, Michael didn't like the diff churn [3] caused by this patch. > > I'm fine with adjusting the mixed-case stuff, but intuitively, I don't > think removing the quotes in this is an improvement: > > - GUC_check_errdetail("Conflicting \"datestyle\" specifications."); > + GUC_check_errdetail("Conflicting DateStyle specifications."); > My original intention of this thread was only to document the GUC name quoting guidelines and then apply those consistently in the code. I'm happy either way for the MixedCase names to be quoted or not quoted, whatever is the consensus. If the rule is changed to quote those MixedCase GUCs then the docs will require minor tweaking.

CURRENT:
In messages containing configuration variable names, do not include quotes when the names are visibly not natural English words, such as when they have underscores, are all-uppercase or have mixed case. Otherwise, quotes must be added. Do include quotes in a message where an arbitrary variable name is to be expanded.

SUGGESTED:
"are all-uppercase or have mixed case." --> "or are all-uppercase."

== Kind Regards, Peter Smith. Fujitsu Australia
Re: GUC names in messages
This v5* looks good to me, except it will need some further modification if PeterE's suggestion [1] to keep quotes for the MixedCase GUCs is adopted. == [1] https://www.postgresql.org/message-id/9e7802b2-2cf2-4c2d-b680-b2ccb9db1d2f%40eisentraut.org Kind Regards, Peter Smith. Fujitsu Australia.
Re: Synchronizing slots from primary to standby
FYI -- the patch 0002 did not apply cleanly for me on top of the 050 test file created by patch 0001. [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v44-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v44-0002-Add-logical-slot-sync-capability-to-the-physical.patch error: patch failed: src/test/recovery/t/050_standby_failover_slots_sync.pl:289 error: src/test/recovery/t/050_standby_failover_slots_sync.pl: patch does not apply == Kind Regards, Peter Smith. Fujitsu Australia
Re: Make COPY format extendable: Extract COPY TO format implementations
On Sun, Dec 10, 2023 at 5:44 AM Sutou Kouhei wrote: > > Hi, > > Thanks for reviewing our latest patch! > > In > > > "RE: Make COPY format extendable: Extract COPY TO format implementations" > on Sat, 9 Dec 2023 02:43:49 +, > "Hayato Kuroda (Fujitsu)" wrote: > > > (I remember that this theme was talked at Japan PostgreSQL conference) > > Yes. I should have talked to you more at the conference... > I will do it next time! > > > Can we discuss how to proceed this improvement? > > There are 2 approaches for it: > > 1. Do the followings concurrently: > a. Implementing small changes that got a consensus and > merge them step-by-step > (e.g. We got a consensus that we need to extract the > current format related routines.) > b. Discuss design It's preferable to make patches small for easy review. We can merge them anytime before commit if necessary. I think we need to discuss the overall design, such as the callbacks and how extensions define a custom copy handler. It may require some PoC patches. Once we have a consensus on the overall design, we can polish the patches, including the documentation changes and regression tests. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
unconstify()/unvolatize() vs g++/clang++
Hi, AFAICS you can't use unconstify()/unvolatize() in a static inline function in a .h file, or in a .cpp file, because __builtin_types_compatible_p is only available in C, not C++. Seems like a reasonable thing to want to be able to do, no? I'm not immediately sure what the right fix is; would #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) && !defined(__cplusplus) around the relevant versions of unconstify()/unvolatize() be too easy? HAVE__BUILTIN_TYPES_COMPATIBLE_P is also tested in relptr.h, but only for further preprocessor stuff, not in functions that the compiler will see, so cpluspluscheck doesn't have anything to reject, and nothing will break unless someone writing C++ code actually tries to use relptr_access(). I think we can live with that one?
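A rough sketch of what such a guard might look like — hypothetical macro name, and a __builtin_choose_expr stand-in rather than c.h's actual StaticAssertExpr-based definition — where the type-checked version is compiled only for GNU C and a plain cast is the fallback for C++:

```c
#if defined(__GNUC__) && !defined(__cplusplus)
/*
 * C-only: reject the cast at compile time unless expr has type
 * "const underlying_type".  On a mismatch, __builtin_choose_expr yields a
 * void "poison" value, so misuse fails to compile where the result is used.
 */
#define demo_unconstify(underlying_type, expr) \
	__builtin_choose_expr( \
		__builtin_types_compatible_p(__typeof__(expr), const underlying_type), \
		(underlying_type) (expr), (void) 0)
#else
/* C++ (or non-GNU C): no __builtin_types_compatible_p, fall back to a cast */
#define demo_unconstify(underlying_type, expr) ((underlying_type) (expr))
#endif

/* Usage: strip const from a pointer whose pointee we know is writable */
static char *
demo_strip(const char *p)
{
	return demo_unconstify(char *, p);
}
```

With the guard in place, C++ translation units lose the compile-time type check but keep a working unconstify(), which matches the "too easy?" proposal above.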
Some useless includes in llvmjit_inline.cpp
Hi, This is not exhaustive, I just noticed in passing that we don't need these. From ccadf2778192db48376632c35474736a3370b0c2 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 11 Dec 2023 13:50:18 +1300 Subject: [PATCH] Remove useless includes from llvmjit_inline.cpp. --- src/backend/jit/llvm/llvmjit_inline.cpp | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp index d92d7f3c88..d6e6f2a559 100644 --- a/src/backend/jit/llvm/llvmjit_inline.cpp +++ b/src/backend/jit/llvm/llvmjit_inline.cpp @@ -34,9 +34,7 @@ extern "C" #include #include -#include "common/string.h" #include "miscadmin.h" -#include "storage/fd.h" } #include -- 2.39.2
Re: pg_upgrade failing for 200+ million Large Objects
I spent some time looking at the v7 patch. I can't help feeling that this is going off in the wrong direction, primarily for these reasons: * It focuses only on cutting the number of transactions needed to restore a large number of blobs (large objects). Certainly that's a pain point, but it's not the only one of this sort. If you have a lot of tables, restore will consume just as many transactions as it would for a similar number of blobs --- probably more, in fact, since we usually need more commands per table than per blob. * I'm not too thrilled with the (undocumented) rearrangements in pg_dump. I really don't like the idea of emitting a fundamentally different TOC layout in binary-upgrade mode; that seems unmaintainably bug-prone. Plus, the XID-consumption problem is not really confined to pg_upgrade. What I think we actually ought to do is one of the alternatives discussed upthread: teach pg_restore to be able to commit every so often, without trying to provide the all-or-nothing guarantees of --single-transaction mode. This cuts its XID consumption by whatever multiple "every so often" is, while allowing us to limit the number of locks taken during any one transaction. It also seems a great deal safer than the idea I floated of not taking locks at all during a binary upgrade; plus, it has some usefulness with regular pg_restore that's not under control of pg_upgrade. So I had a go at coding that, and attached is the result. It invents a --transaction-size option, and when that's active it will COMMIT after every N TOC items. (This seems simpler to implement and less bug-prone than every-N-SQL-commands.) I had initially supposed that in a parallel restore we could have child workers also commit after every N TOC items, but was soon disabused of that idea. After a worker processes a TOC item, any dependent items (such as index builds) might get dispatched to some other worker, which had better be able to see the results of the first worker's step. 
So at least in this implementation, we disable the multi-command-per-COMMIT behavior during the parallel part of the restore. Maybe that could be improved in future, but it seems like it'd add a lot more complexity, and it wouldn't make life any better for pg_upgrade (which doesn't use parallel pg_restore, and seems unlikely to want to in future). I've not spent a lot of effort on pg_upgrade changes here: I just hard-wired it to select --transaction-size=1000. Given the default lock table size of 64*100, that gives us enough headroom for each TOC to take half a dozen locks. We could go higher than that by making pg_upgrade force the destination postmaster to create a larger-than-default lock table, but I'm not sure if it's worth any trouble. We've already bought three orders of magnitude improvement as it stands, which seems like enough ambition for today. (Also, having pg_upgrade override the user's settings in the destination cluster might not be without downsides.) Another thing I'm wondering about is why this is only a pg_restore option not also a pg_dump/pg_dumpall option. I did it like that because --single-transaction is pg_restore only, but that seems more like an oversight or laziness than a well-considered decision. Maybe we should back-fill that omission; but it could be done later. Thoughts? regards, tom lane diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml index 1a23874da6..2e3ba80258 100644 --- a/doc/src/sgml/ref/pg_restore.sgml +++ b/doc/src/sgml/ref/pg_restore.sgml @@ -786,6 +786,30 @@ PostgreSQL documentation + + --transaction-size=N + + +Execute the restore as a series of transactions, each processing +up to N database +objects. This option implies --exit-on-error. + + +--transaction-size offers an intermediate choice +between the default behavior (one transaction per SQL command) +and -1/--single-transaction +(one transaction for all restored objects). 
+While --single-transaction has the least +overhead, it may be impractical for large databases because the +transaction will take a lock on each restored object, possibly +exhausting the server's lock table space. +Using --transaction-size with a size of a few +thousand objects offers nearly the same performance benefits while +capping the amount of lock table space needed. + + + + --use-set-session-authorization diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index 9ef2f2017e..fbf5f1c515 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -149,7 +149,9 @@ typedef struct _restoreOptions * compression */ int suppressDumpWarnings; /* Suppress output of WARNING entries * to stderr */ - bool single_txn; + + bool single_txn; /* restore
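The commit-every-N-TOC-items behavior described above can be modeled in miniature — hypothetical names, not pg_restore's actual code — to show how it caps both the lock-table usage of any one transaction and the total XID consumption:

```c
static int	commits;			/* transactions completed so far */
static int	max_locks_held;		/* high-water mark of locks held in one txn */

static void
demo_begin(void)
{
}

static void
demo_commit(int locks_pending)
{
	commits++;
	if (locks_pending > max_locks_held)
		max_locks_held = locks_pending;
}

/* Restore n_items TOC entries, committing after every txn_size of them */
static void
demo_restore(int n_items, int txn_size)
{
	int			pending = 0;

	commits = 0;
	max_locks_held = 0;
	demo_begin();
	for (int i = 0; i < n_items; i++)
	{
		/* ... restore one TOC item here, taking roughly one lock ... */
		if (++pending >= txn_size)
		{
			demo_commit(pending);	/* releases this batch's locks */
			demo_begin();
			pending = 0;
		}
	}
	demo_commit(pending);		/* final, possibly partial, batch */
}
```

For example, restoring 25 items with a batch size of 10 consumes three transactions (three XIDs) instead of 25, and no transaction ever holds more than 10 objects' worth of locks — the trade-off between one-txn-per-command and --single-transaction that the new option provides.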
Re: Make COPY format extendable: Extract COPY TO format implementations
On Sun, Dec 10, 2023 at 5:55 AM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Fri, 8 Dec 2023 15:42:06 +0900, > Masahiko Sawada wrote: > > > So a custom COPY extension would not be able to define SQL functions > > just like arrow(internal) for example. We might need to define a rule > > like the function returning copy_in/out_handler must be defined as > > _to(internal) and _from(internal). > > We may not need to add "_to"/"_from" suffix by checking both > of argument type and return type. Because we use different > return type for copy_in/out_handler. > > But the current LookupFuncName() family doesn't check return > type. If we use this approach, we need to improve the > current LookupFuncName() family too. IIUC we cannot create two same-name functions with the same arguments but a different return value type in the first place. It seems to me to be overkill to change such a design. Another idea is to encapsulate copy_to/from_handler by a super class like copy_handler. The handler function is called with an argument, say copyto, and returns copy_handler encapsulating either copy_to/from_handler depending on the argument. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
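The "super class" encapsulation sketched in the last paragraph could look roughly like this — all names are hypothetical, purely to illustrate one handler function dispatching on the requested direction and returning a single wrapper type:

```c
typedef enum CopyDirection
{
	COPY_TO_DIR,
	COPY_FROM_DIR
} CopyDirection;

/* Hypothetical per-direction callback tables */
typedef struct CopyToRoutine
{
	void		(*one_row) (void);
} CopyToRoutine;

typedef struct CopyFromRoutine
{
	int			(*next_row) (void);
} CopyFromRoutine;

/* The single wrapper ("copy_handler") returned by the handler function */
typedef struct CopyHandler
{
	CopyDirection dir;
	union
	{
		const CopyToRoutine *to;
		const CopyFromRoutine *from;
	}			routine;
} CopyHandler;

static void arrow_to_one_row(void) { }
static int	arrow_from_next_row(void) { return 0; }

static const CopyToRoutine arrow_to = {arrow_to_one_row};
static const CopyFromRoutine arrow_from = {arrow_from_next_row};

/* One SQL-visible handler per format: dispatches on the requested direction */
static CopyHandler
arrow_copy_handler(CopyDirection dir)
{
	CopyHandler h;

	h.dir = dir;
	if (dir == COPY_TO_DIR)
		h.routine.to = &arrow_to;
	else
		h.routine.from = &arrow_from;
	return h;
}
```

With a single wrapper return type, only one SQL function signature per format is needed, sidestepping the LookupFuncName() return-type limitation discussed above.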
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
On Fri, Dec 8, 2023 at 3:13 PM Alexander Pyhalov wrote: > Andrei Lepikhov wrote on 2023-12-08 07:37: > > I'd already clashed with Tom on copying the required_relids field and > > voluntarily made unnecessary copies in the project [1]. > > And ... stuck into huge memory consumption. The reason was in > > Bitmapsets: > > When we have 1E3-1E4 partitions and try to reparameterize a join, one > > bitmapset field can have a size of about 1kB. Having bitmapset > > referencing Relation with a large index value, we had a lot of (for > > example, 1E4 * 1kB) copies on each reparametrization of such a field. > > Alexander Pyhalov should remember that case. > > Yes. If it matters, this happened during reparametrization when 2 > partitioned tables with 1000 partitions each were joined. Then > asymmetric pw join managed to eat lots of memory for bitmapsets (by > lots of memory I mean all available on the test VM). By reparametrization did you mean the work done in reparameterize_path_by_child()? If so maybe you'd be interested in the patch [1] which postpones reparameterization of paths until createplan.c and thus can help avoid unnecessary reparametrization work. [1] https://www.postgresql.org/message-id/CAMbWs48PBwe1YadzgKGW_ES%3DV9BZhq00BaZTOTM6Oye8n_cDNg%40mail.gmail.com Thanks Richard
Re: Postgres db Update to Version 15
On Sun, 10 Dec 2023 at 04:10, Ritthaler, Axel wrote: > The Root Cause of this behavior, as aligned with AWS RDS Support, has been a > new feature-set coding (parallel_feature_query) with Postgres Version 15, > that shows a different behavior due to related parameter > (max_parallel_workers_per_gather). What is parallel_feature_query? No version of PostgreSQL has a setting by that name. > Remaining question now is, what has to be done to move related > Live-Landscapes back to the default parameter value (2) without creating the > same impact again. You'll need to identify the query or queries causing the problem. We've likely made many more query shapes parallelizable in PG15 compared to PG11. So it does not seem unusual that PG15 will be able to parallelize more of your queries than what PG11 could do. That could lead to parallel plans not getting the workers they desire due to workers being busy with other queries. > What is your suggestion and recommended way-forward to enable parallel-worker > setup again ? Identify the queries causing the problem. Then determine if the plan has changed since PG11. You can then check all the release notes starting with PG12 to see if anything is mentioned about why the plan might have changed. e.g. something in the query is parallelizable in this version that wasn't in PG11. One thing to keep in mind is that PostgreSQL does not opt to parallelize the cheapest serial plan. It will attempt to find the cheapest plan with or without parallel workers. The difference here is that it's optimized for speed rather than resource usage. I'm not sure if this is a factor in your issue, but it may be something to keep in mind while investigating. David
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
On 11/12/2023 09:31, Richard Guo wrote: On Fri, Dec 8, 2023 at 3:13 PM Alexander Pyhalov <a.pyha...@postgrespro.ru> wrote: Andrei Lepikhov wrote on 2023-12-08 07:37: > I'd already clashed with Tom on copying the required_relids field and > voluntarily made unnecessary copies in the project [1]. > And ... stuck into huge memory consumption. The reason was in > Bitmapsets: > When we have 1E3-1E4 partitions and try to reparameterize a join, one > bitmapset field can have a size of about 1kB. Having bitmapset > referencing Relation with a large index value, we had a lot of (for > example, 1E4 * 1kB) copies on each reparametrization of such a field. > Alexander Pyhalov should remember that case. Yes. If it matters, this happened during reparametrization when 2 partitioned tables with 1000 partitions each were joined. Then asymmetric pw join managed to eat lots of memory for bitmapsets (by lots of memory I mean all available on the test VM). By reparametrization did you mean the work done in reparameterize_path_by_child()? If so maybe you'd be interested in the patch [1] which postpones reparameterization of paths until createplan.c and thus can help avoid unnecessary reparametrization work. Yeah, I have discovered it already. It is a promising solution and only needs a bit more review. But here, I embraced some corner cases with the idea that we may not see other cases right now. And also, sometimes the Bitmapset field is significant - it is not a corner case. -- regards, Andrei Lepikhov Postgres Professional
Re: Oversight in reparameterize_path_by_child leading to executor crash
On Fri, Dec 8, 2023 at 5:39 PM Alena Rybakina wrote: > On 06.12.2023 10:30, Richard Guo wrote: > > I've self-reviewed this patch again and I think it's now in a > > committable state. I'm wondering if we can mark it as 'Ready for > > Committer' now, or we need more review comments/feedbacks. > > > > To recap, this patch postpones reparameterization of paths until > > createplan.c, which would help avoid building the reparameterized > > expressions we might not use. More importantly, it makes it possible to > > modify the expressions in RTEs (e.g. tablesample) and in RelOptInfos > > (e.g. baserestrictinfo) for reparameterization. Failing to do that can > > lead to executor crashes, wrong results, or planning errors, as we have > > already seen. I marked it as 'Ready for Committer'. I think it is ready. Thank you. Appreciate that. Thanks Richard
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Tue, Nov 28, 2023 at 4:07 AM Michael Paquier wrote: > > On Mon, Nov 27, 2023 at 12:14:05PM +0530, Ashutosh Bapat wrote: > > Since you wrote "(still need to improve ...)" I thought you were > > working on v6. No problem. Sorry for the confusion. > > I see why my previous message could be confusing. Sorry about that. I haven't specifically done a review or testing of this patch, but I have used it for testing the CLOG group update code with my SLRU-specific changes, and I found it quite helpful for testing some of the concurrent areas where you need to stop processing somewhere in the middle of the code. Testing such an area without this kind of injection point framework is really difficult, or may not even be possible. We wanted to test the case of CLOG group update where multiple processes can be added to a single group and the xid status is updated by the group leader; you can refer to my test in that thread [1] (the last patch, test_group_commit.patch, uses this framework for testing). Overall I feel this framework is quite useful and easy to use as well. [1] https://www.postgresql.org/message-id/CAFiTN-udSTGG_t5n9Z3eBbb4_%3DzNoKU%2B8FP-S6zpv-r4Gm-Y%2BQ%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: RFC: Logging plan of the running query
On 2023-12-07 08:33, Rafael Thofehrn Castro wrote: Hello hackers, Last Saturday I submitted a patch to the pgsql-hackers list with the title "Proposal: In-flight explain logging", proposing a feature very similar to the one being worked on in this thread. I should have done a better search in the commitfest before implementing something from scratch. So, as recommended by Ashutosh, I am sending an incremental patch containing an additional feature I personally think we should include: logging the plan with instrumentation details if enabled. Thanks for the proposal and for making the patch! When targeting a query with instrumentation, PG should log the complete EXPLAIN ANALYZE plan with the current row count and, if enabled, timing for each node. This gives the user not only the ability to see what the plan is but also what has been executed so far, which is super useful when troubleshooting queries that never finish. I think showing the progress of the query execution would be useful. OTOH it seems to need at least some modifications around Instrumentation, as your patch does. As a first step, I think it would be better to minimize the scope and focus on the fundamental function. For the same reason, getting queries for parallel workers is also prohibited in the current patch, as discussed here [1]. [1] https://www.postgresql.org/message-id/c25ae6015be96a1964eddd964657660b%40oss.nttdata.com So I think the steps below would be better than pushing all the functionality into the first commit. - First, develop the function to enable output of query progress (v34-0001-Add-function-to-log-the-plan-of-the-query.patch). - Then enhance the function - showing the progress of the query execution (v34-0002-Log-plan-along-with-instrumentation-details.patch), etc. -- https://www.postgresql.org/message-id/CAG0ozMp3g3drnkDa6RZxXO_OmnisL2sD9vBrmpu5fOBoYpC-3w%40mail.gmail.com - ExplainState customization An ExplainState is allocated and customized for the in-flight logging.
Instrumentation-related settings are enabled based on how the target query started, which is usually via EXPLAIN ANALYZE or with auto_explain. Does this mean the progress can only be obtained when the target query was run with EXPLAIN ANALYZE or auto_explain.log_analyze? If so, there might be limited situations in which we can get the progress, since I imagine EXPLAIN ANALYZE is used when users want the plan from the beginning, and auto_explain.log_analyze can have a negative impact on performance (as described in the manual), so there may not be many environments that enable it. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
On Sat, Dec 9, 2023 at 12:16 PM Amit Kapila wrote: > > Thanks, I could also reproduce the issue on back branches (tried till > 12), and the fix works. I'll push this on Monday. > Peter sent one minor suggestion (to write the check differently for easier understanding) offlist which I addressed and pushed the patch. -- With Regards, Amit Kapila.
Re: trying again to get incremental backup
On Tue, Dec 5, 2023 at 11:40 PM Robert Haas wrote: > > On Mon, Dec 4, 2023 at 3:58 PM Robert Haas wrote: > > Considering all this, what I'm inclined to do is go and put > > UPLOAD_MANIFEST back, instead of INCREMENTAL_WAL_RANGE, and adjust > > accordingly. But first: does anybody see more problems here that I may > > have missed? > > OK, so here's a new version with UPLOAD_MANIFEST put back. I wrote a > long comment explaining why that's believed to be necessary and > sufficient. I committed 0001 and 0002 from the previous series also, > since it doesn't seem like anyone has further comments on those > renamings. I have done some testing on standby, but I am facing some issues, although things are working fine on the primary. As shown in test [1] below, the standby reports an error that the manifest requires WAL from 0/6F8, but this backup starts at 0/628. Then I looked into the manifest file of the full backup, and it shows the contents below [0]. Actually, from this WARNING and ERROR I am not clear what the problem is. I understand that the full backup ends at "0/6F8", so for the next incremental backup we should be looking for a summary that has WAL starting at "0/6F8", and we do have those WALs. In fact, the error message says "this backup starts at 0/628", which is before "0/6F8", so what's the issue?
[0] "WAL-Ranges": [ { "Timeline": 1, "Start-LSN": "0/628", "End-LSN": "0/6F8" } [1] -- test on primary dilipkumar@dkmac bin % ./pg_basebackup -D d dilipkumar@dkmac bin % ./pg_basebackup -D d1 -i d/backup_manifest -- cleanup the backup directory dilipkumar@dkmac bin % rm -rf d dilipkumar@dkmac bin % rm -rf d1 --test on standby dilipkumar@dkmac bin % ./pg_basebackup -D d -p 5433 dilipkumar@dkmac bin % ./pg_basebackup -D d1 -i d/backup_manifest -p 5433 WARNING: aborting backup due to backend exiting before pg_backup_stop was called pg_basebackup: error: could not initiate base backup: ERROR: manifest requires WAL from final timeline 1 ending at 0/6F8, but this backup starts at 0/628 pg_basebackup: removing data directory "d1" -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Fri, Dec 8, 2023 at 9:44 PM Masahiko Sawada wrote: > > On Fri, Dec 8, 2023 at 7:46 PM John Naylor wrote: > > > > On Fri, Dec 8, 2023 at 3:06 PM Masahiko Sawada > > wrote: > > > > > > BTW Given that the actual value size can be calculated only by the > > > caller, how does the tree know if the value is embedded or not? It's > > > probably related to how to store combined pointer/value slots. > > > > Right, this is future work. At first, variable-length types will have > > to be single-value leaves. In fact, the idea for storing up to 3 > > offsets in the bitmap header could be done this way -- it would just > > be a (small) single-value leaf. > > Agreed. > > > > > (Reminder: Currently, fixed-length values are compile-time embeddable > > if the platform pointer size is big enough.) > > > > > If leaf > > > nodes have a bitmap array that indicates the corresponding slot is an > > > embedded value or a pointer to a value, it would be easy. > > > > That's the most general way to do it. We could do it much more easily > > with a pointer tag, although for the above idea it may require some > > endian-aware coding. Both were mentioned in the paper, I recall. > > True. Probably we can use the combined pointer/value slots approach > only if the tree is able to use the pointer tagging. That is, if the > caller allows the tree to use one bit of the value. > > I'm going to update the patch based on the recent discussion (RT_SET() > and variable-length values) etc., and post the patch set early next > week. I've attached the updated patch set. From the previous patch set, I've merged patches 0007 to 0010. The other changes such as adding RT_GET() still are unmerged for now, for discussion. Probably we can make them as follow-up patches as we discussed. 0011 to 0015 patches are new changes for v44 patch set, which removes RT_SEARCH() and RT_SET() and support variable-length values. 
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com v44-ART.tar.gz Description: GNU Zip compressed data
Re: Synchronizing slots from primary to standby
Here are some review comments for v44-0001 == src/backend/replication/slot.c 1. ReplicationSlotCreate * during getting changes, if the two_phase option is enabled it can skip * prepare because by that time start decoding point has been moved. So the * user will only get commit prepared. + * failover: Allows the slot to be synced to physical standbys so that logical + * replication can be resumed after failover. */ void ReplicationSlotCreate(const char *name, bool db_specific, ~ /Allows the slot.../If enabled, allows the slot.../ == 2. validate_standby_slots +validate_standby_slots(char **newval) +{ + char*rawname; + List*elemlist; + ListCell *lc; + bool ok = true; + + /* Need a modifiable copy of string */ + rawname = pstrdup(*newval); + + /* Verify syntax and parse string into list of identifiers */ + if (!(ok = SplitIdentifierString(rawname, ',', &elemlist))) + GUC_check_errdetail("List syntax is invalid."); + + /* + * If there is a syntax error in the name or if the replication slots' + * data is not initialized yet (i.e., we are in the startup process), skip + * the slot verification. + */ + if (!ok || !ReplicationSlotCtl) + { + pfree(rawname); + list_free(elemlist); + return ok; + } 2a. You don't need to initialize 'ok' during declaration because it is assigned immediately anyway. ~ 2b. AFAIK assignment within a conditional like this is not a normal PG coding style unless there is no other way to do it. ~ 2c. /into list/into a list/ SUGGESTION /* Verify syntax and parse string into a list of identifiers */ ok = SplitIdentifierString(rawname, ',', &elemlist); if (!ok) GUC_check_errdetail("List syntax is invalid."); ~~~ 3. assign_standby_slot_names + if (!SplitIdentifierString(standby_slot_names_cpy, ',', &standby_slots)) + { + /* This should not happen if GUC checked check_standby_slot_names. */ + elog(ERROR, "list syntax is invalid"); + } The error message here and the one in validate_standby_slots() differ -- "list" versus "List".
== src/backend/replication/walsender.c 4. WalSndFilterStandbySlots + foreach(lc, standby_slots_cpy) + { + char*name = lfirst(lc); + XLogRecPtr restart_lsn = InvalidXLogRecPtr; + bool invalidated = false; + char*warningfmt = NULL; + ReplicationSlot *slot; + + slot = SearchNamedReplicationSlot(name, true); + + if (slot && SlotIsPhysical(slot)) + { + SpinLockAcquire(&slot->mutex); + restart_lsn = slot->data.restart_lsn; + invalidated = slot->data.invalidated != RS_INVAL_NONE; + SpinLockRelease(&slot->mutex); + } + + /* Continue if the current slot hasn't caught up. */ + if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && + restart_lsn < wait_for_lsn) + { + /* Log warning if no active_pid for this physical slot */ + if (slot->active_pid == 0) + ereport(WARNING, + errmsg("replication slot \"%s\" specified in parameter \"%s\" does not have active_pid", +name, "standby_slot_names"), + errdetail("Logical replication is waiting on the " + "standby associated with \"%s\"", name), + errhint("Consider starting standby associated with " + "\"%s\" or amend standby_slot_names", name)); + + continue; + } + else if (!slot) + { + /* + * It may happen that the slot specified in standby_slot_names GUC + * value is dropped, so let's skip over it. + */ + warningfmt = _("replication slot \"%s\" specified in parameter \"%s\" does not exist, ignoring"); + } + else if (SlotIsLogical(slot)) + { + /* + * If a logical slot name is provided in standby_slot_names, issue + * a WARNING and skip it. Although logical slots are disallowed in + * the GUC check_hook(validate_standby_slots), it is still + * possible for a user to drop an existing physical slot and + * recreate a logical slot with the same name. Since it is + * harmless, a WARNING should be enough, no need to error-out. 
+ */ + warningfmt = _("cannot have logical replication slot \"%s\" in parameter \"%s\", ignoring"); + } + else if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) + { + /* + * Specified physical slot may have been invalidated, so there is no point + * in waiting for it. + */ + warningfmt = _("physical slot \"%s\" specified in parameter \"%s\" has been invalidated, ignoring"); + } + else + { + Assert(restart_lsn >= wait_for_lsn); + } This if/else chain seems structured awkwardly. IMO it would be tidier to eliminate the NULL slot and IsLogicalSlot up-front, which would also simplify some of the subsequent conditions. SUGGESTION slot = SearchNamedReplicationSlot(name, true); if (!slot) { ... } else if (SlotIsLogical(slot)) { ... } else { Assert(SlotIsPhysical(slot)) SpinLockAcquire(&slot->mutex); restart_lsn = slot->data.restart_lsn; invalidated = slot->data.invalidated != RS_INVAL_NONE; SpinLockRelease(&slot->mutex); if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) { ... } else if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && restart_lsn < wait_for_lsn) { ... }
Re: Synchronizing slots from primary to standby
Hi, On 12/8/23 10:06 AM, Amit Kapila wrote: On Wed, Dec 6, 2023 at 4:53 PM shveta malik wrote: PFA v43, changes are: I wanted to discuss the 0003 patch about cascading standbys. It is not clear to me whether we want to allow physical standbys to further wait for cascading standbys to sync their slots. If we allow such a feature, one may expect even the primary to wait for all the cascading standbys, because otherwise a logical subscriber can still be ahead of one of the cascading standbys. I have the same feeling here. I think it would probably be expected that the primary also waits for all the cascading standbys. I feel even if we want to allow such a behaviour we can do it later once the main feature is committed. Agree. I think it would be good to just allow logical walsenders on the primary to wait for physical standbys represented by the GUC 'standby_slot_names'. That makes sense to me for v1. If we agree on that, then it would be good to prohibit setting this GUC on a standby, or at least it should be a no-op even if this GUC is set on a physical standby. I'd prefer to completely prohibit it on the standby (to make it very clear it's not working at all), as long as one can enable it without downtime once the standby is promoted (which is the case currently). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: trying again to get incremental backup
On Mon, Dec 11, 2023 at 11:44 AM Dilip Kumar wrote: > > On Tue, Dec 5, 2023 at 11:40 PM Robert Haas wrote: > > > > On Mon, Dec 4, 2023 at 3:58 PM Robert Haas wrote: > > > Considering all this, what I'm inclined to do is go and put > > > UPLOAD_MANIFEST back, instead of INCREMENTAL_WAL_RANGE, and adjust > > > accordingly. But first: does anybody see more problems here that I may > > > have missed? > > > > OK, so here's a new version with UPLOAD_MANIFEST put back. I wrote a > > long comment explaining why that's believed to be necessary and > > sufficient. I committed 0001 and 0002 from the previous series also, > > since it doesn't seem like anyone has further comments on those > > renamings. > > I have done some testing on standby, but I am facing some issues, > although things are working fine on the primary. As shown in test > [1] below, the standby reports an error that the manifest requires WAL > from 0/6F8, but this backup starts at 0/628. Then I looked > into the manifest file of the full backup, and it shows the contents > below [0]. Actually, from this WARNING and ERROR I am not clear what > the problem is. I understand that the full backup ends at "0/6F8", so > for the next incremental backup we should be looking for a summary > that has WAL starting at "0/6F8", and we do have those WALs. In > fact, the error message says "this backup starts at 0/628", > which is before "0/6F8", so what's the issue?
> > [0] > "WAL-Ranges": [ > { "Timeline": 1, "Start-LSN": "0/628", "End-LSN": "0/6F8" } > > > [1] > -- test on primary > dilipkumar@dkmac bin % ./pg_basebackup -D d > dilipkumar@dkmac bin % ./pg_basebackup -D d1 -i d/backup_manifest > > -- cleanup the backup directory > dilipkumar@dkmac bin % rm -rf d > dilipkumar@dkmac bin % rm -rf d1 > > --test on standby > dilipkumar@dkmac bin % ./pg_basebackup -D d -p 5433 > dilipkumar@dkmac bin % ./pg_basebackup -D d1 -i d/backup_manifest -p 5433 > > WARNING: aborting backup due to backend exiting before pg_backup_stop > was called > pg_basebackup: error: could not initiate base backup: ERROR: manifest > requires WAL from final timeline 1 ending at 0/6F8, but this > backup starts at 0/628 > pg_basebackup: removing data directory "d1" Jakub pinged me offlist and pointed me to the thread [1] where this is already explained, so I think we can ignore it. [1] https://www.postgresql.org/message-id/CA%2BTgmoYuC27_ToGtTTNyHgpn_eJmdqrmhJ93bAbinkBtXsWHaA%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com