Re: Parallel CREATE INDEX for BRIN indexes
On Wed, 8 Nov 2023 at 12:03, Tomas Vondra wrote:
>
> Hi,
>
> here's an updated patch, addressing the review comments, and reworking
> how the work is divided between the workers & leader etc.

Thanks!

> In general I'm quite happy with the current state, and I believe it's
> fairly close to be committable.

Are you planning on committing the patches separately, or squashed? I
won't have much time this week for reviewing the patch, and it seems
like these patches are incremental, so some guidance on what you want
to be reviewed would be appreciated.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
Re: Parallel CREATE INDEX for BRIN indexes
On 11/12/23 10:38, Matthias van de Meent wrote:
> On Wed, 8 Nov 2023 at 12:03, Tomas Vondra wrote:
>>
>> Hi,
>>
>> here's an updated patch, addressing the review comments, and reworking
>> how the work is divided between the workers & leader etc.
>
> Thanks!
>
>> In general I'm quite happy with the current state, and I believe it's
>> fairly close to be committable.
>
> Are you planning on committing the patches separately, or squashed? I
> won't have much time this week for reviewing the patch, and it seems
> like these patches are incremental, so some guidance on what you want
> to be reviewed would be appreciated.

Definitely squashed. I only kept them separate to make it more obvious
what the changes are.

If you need more time for a review, I can certainly wait. No rush.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
building with meson on windows with ssl
Greetings,

I am getting the following error building on HEAD:

Library crypto found: YES
Checking for function "CRYPTO_new_ex_data" with dependencies -lssl, -lcrypto: NO

I have openssl 1.1.1 installed.

Dave Cramer
Re: proposal: possibility to read dumped table's name from file
Hi > What are your thoughts on this version? It's not in a committable state > as it > needs a bit more comments here and there and a triplecheck that nothing was > missed in changing this, but I prefer to get your thoughts before spending > the > extra time. > I think using pointer to exit function is an elegant solution. I checked the code and I found only one issue. I fixed warning [13:57:22.578] time make -s -j${BUILD_JOBS} world-bin [13:58:20.858] filter.c: In function ‘pg_log_filter_error’: [13:58:20.858] filter.c:161:2: error: function ‘pg_log_filter_error’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] [13:58:20.858] 161 | vsnprintf(buf, sizeof(buf), fmt, argp); [13:58:20.858] | ^ [13:58:20.858] cc1: all warnings being treated as errors and probably copy/paste bug diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c index f647bde28d..ab2abedf5f 100644 --- a/src/bin/pg_dump/pg_restore.c +++ b/src/bin/pg_dump/pg_restore.c @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts) case FILTER_OBJECT_TYPE_EXTENSION: case FILTER_OBJECT_TYPE_FOREIGN_DATA: pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."), - "exclude", + "include", filter_object_type_name(objtype)); exit_nicely(1); Regards Pavel > > -- > Daniel Gustafsson > > From 9be8fb14a3fe75aa4203a059e8372986bf5e6615 Mon Sep 17 00:00:00 2001 From: "ok...@github.com" Date: Sun, 12 Nov 2023 13:54:26 +0100 Subject: [PATCH 2/2] fix err message --- src/bin/pg_dump/pg_restore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c index f647bde28d..ab2abedf5f 100644 --- a/src/bin/pg_dump/pg_restore.c +++ b/src/bin/pg_dump/pg_restore.c @@ -535,7 +535,7 @@ read_restore_filters(const char *filename, RestoreOptions *opts) case FILTER_OBJECT_TYPE_EXTENSION: case FILTER_OBJECT_TYPE_FOREIGN_DATA: pg_log_filter_error(&fstate, _("%s filter for \"%s\" is not allowed."), - "exclude", + "include", filter_object_type_name(objtype)); exit_nicely(1); -- 2.41.0 From cbaae854eca0cc88bb0886abfd45416ad13cffc7 Mon Sep 17 00:00:00 2001 From: "ok...@github.com" Date: Sat, 11 Nov 2023 20:34:34 +0100 Subject: [PATCH 1/2] possibility to read options for dump from file --- doc/src/sgml/ref/pg_dump.sgml | 114 doc/src/sgml/ref/pg_dumpall.sgml| 22 + doc/src/sgml/ref/pg_restore.sgml| 25 + src/bin/pg_dump/Makefile| 5 +- src/bin/pg_dump/filter.c| 471 + src/bin/pg_dump/filter.h| 60 ++ src/bin/pg_dump/meson.build | 2 + src/bin/pg_dump/pg_dump.c | 114 +++- src/bin/pg_dump/pg_dumpall.c| 68 +- src/bin/pg_dump/pg_restore.c| 103 +++ src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 717 src/tools/msvc/Mkvcbuild.pm | 1 + 12 files changed, 1698 insertions(+), 4 deletions(-) create mode 100644 src/bin/pg_dump/filter.c create mode 100644 src/bin/pg_dump/filter.h create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 8695571045..e2f100d552 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -836,6 +836,106 @@ PostgreSQL documentation + + --filter=filename + + +Specify a filename from which to read patterns for objects to include +or exclude from the dump. 
The patterns are interpreted according to the +same rules as the corresponding options: +-t/--table, +--table-and-children, +--exclude-table-and-children or +-T for tables, +-n/--schema for schemas, +--include-foreign-data for data on foreign servers and +--exclude-table-data, +--exclude-table-data-and-children for table data, +-e/--extension for extensions. +To read from STDIN, use - as the +filename. The --filter option can be specified in +conjunction with the above listed options for including or excluding +objects, and can also be specified more than once for multiple filter +files. + + + +The file lists one object pattern per row, with the following format: + +{ include | exclude } { extension | foreign_data | table | table_and_children | table_data | table_data_and_children | schema } PATTERN + + + + +The first keyword specifies whether the objects matched by the pattern +are to be included or excluded. The second keyword s
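To make the format described above concrete, a small example filter file and invocation might look like this (object and database names are invented for illustration):

    # dump_filter.txt -- one object pattern per row, per the format above
    include table public.orders
    include table_and_children public.measurements
    exclude table_data public.audit_log
    exclude schema archive

    $ pg_dump --filter=dump_filter.txt --format=custom --file=mydb.dump mydb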
Re: Add PQsendSyncMessage() to libpq
Hello, Thanks for the feedback! On 07/11/2023 09:23, Jelte Fennema-Nio wrote: > But I think it's looking at the situation from the wrong direction. [...] we should look at it as an addition to our current list of PQsend functions for a new packet type. And none of those PQsend functions ever needed a flag. Which makes sense, because they are the lowest level building blocks that make sense from a user perspective: They send a message type over the socket and don't do anything else. Yes, I think that this is quite close to my thinking when I created the original version of the patch. Also, the protocol specification states that the Sync message lacks parameters. Since there haven't been any comments from the other people who have chimed in on this e-mail thread, I will assume that there is consensus (we are doing a U-turn with the implementation approach after all), so here is the updated version of the patch. Best wishes, Anton KirilovFrom b752269b2763f8d66bcfc79faf751e52226c344b Mon Sep 17 00:00:00 2001 From: Anton Kirilov Date: Wed, 22 Mar 2023 20:39:57 + Subject: [PATCH v5] Add PQsendPipelineSync() to libpq This new function is equivalent to PQpipelineSync(), except that it does not flush anything to the server; the user must subsequently call PQflush() instead. Its purpose is to reduce the system call overhead of pipeline mode. --- doc/src/sgml/libpq.sgml | 45 --- src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-exec.c| 17 +-- src/interfaces/libpq/libpq-fe.h | 1 + .../modules/libpq_pipeline/libpq_pipeline.c | 37 +++ .../traces/multi_pipelines.trace | 11 + 6 files changed, 102 insertions(+), 10 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 64b2910fee..61bee82a54 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3547,8 +3547,9 @@ ExecStatusType PQresultStatus(const PGresult *res); The PGresult represents a -synchronization point in pipeline mode, requested by -. +synchronization point in pipeline mode, requested by either + or +. This status occurs only when pipeline mode has been selected. @@ -5122,7 +5123,8 @@ int PQsendClosePortal(PGconn *conn, const char *portalName); , , , - , or + , + , or call, and returns it. A null pointer is returned when the command is complete and there @@ -5506,8 +5508,9 @@ int PQflush(PGconn *conn); client sends them. The server will begin executing the commands in the pipeline immediately, not waiting for the end of the pipeline. Note that results are buffered on the server side; the server flushes - that buffer when a synchronization point is established with - PQpipelineSync, or when + that buffer when a synchronization point is established with either + PQpipelineSync or + PQsendPipelineSync, or when PQsendFlushRequest is called. If any statement encounters an error, the server aborts the current transaction and does not execute any subsequent command in the queue @@ -5564,7 +5567,8 @@ int PQflush(PGconn *conn); PGresult types PGRES_PIPELINE_SYNC and PGRES_PIPELINE_ABORTED. PGRES_PIPELINE_SYNC is reported exactly once for each - PQpipelineSync at the corresponding point + PQpipelineSync or + PQsendPipelineSync at the corresponding point in the pipeline. PGRES_PIPELINE_ABORTED is emitted in place of a normal query result for the first error and all subsequent results @@ -5602,7 +5606,8 @@ int PQflush(PGconn *conn); PQresultStatus will report a PGRES_PIPELINE_ABORTED result for each remaining queued operation in an aborted pipeline. 
The result for - PQpipelineSync is reported as + PQpipelineSync or + PQsendPipelineSync is reported as PGRES_PIPELINE_SYNC to signal the end of the aborted pipeline and resumption of normal result processing. @@ -5834,6 +5839,32 @@ int PQsendFlushRequest(PGconn *conn); + + + PQsendPipelineSyncPQsendPipelineSync + + + + Marks a synchronization point in a pipeline by sending a + sync message + without flushing the send buffer. This serves as + the delimiter of an implicit transaction and an error recovery + point; see . + + +int PQsendPipelineSync(PGconn *conn); + + + + Returns 1 for success. Returns 0 if the connection is not in + pipeline mode or sending a + sync message + failed. + Note that the message is not itself flushed to the server automatically; + use PQflush if necessary. + + + diff --git a/src/interfaces/libpq/exports.txt
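As a usage sketch of the function added by this patch (not code from the patch itself; error handling and result collection with PQgetResult()/PQexitPipelineMode() are omitted for brevity), the point is that PQsendPipelineSync() only queues the Sync message, so a single flush at the end can push the whole pipeline out in one write:

    #include <libpq-fe.h>

    /*
     * Minimal sketch: queue two statements and two synchronization points,
     * then send the whole batch to the server with a single flush.
     */
    static void
    pipeline_sketch(PGconn *conn)
    {
        PQenterPipelineMode(conn);

        PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
        PQsendPipelineSync(conn);   /* proposed: append Sync, do not flush */

        PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);
        PQsendPipelineSync(conn);   /* still nothing written to the socket */

        PQflush(conn);              /* one flush sends the entire pipeline */
    }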
Re: pg_basebackup check vs Windows file path limits
On 2023-11-11 Sa 12:00, Alexander Lakhin wrote:

  11.11.2023 18:18, Andrew Dunstan wrote:

    Hmm, maybe we should be using File::Copy::move() instead of rename().
    The docco for that says:

      If possible, move() will simply rename the file. Otherwise, it
      copies the file to the new location and deletes the original. If an
      error occurs during this copy-and-delete process, you may be left
      with a (possibly partial) copy of the file under the destination
      name.

  Unfortunately, I've stumbled upon inability of File::Copy::move() to
  move directories across filesystems, exactly as described here:
  https://stackoverflow.com/questions/17628039/filecopy-move-directories-accross-drives-in-windows-not-working

  (I'm sorry for not looking above rename() where this stated explicitly:
  # On Windows use the short location to avoid path length issues.
  # Elsewhere use $tempdir to avoid file system boundary issues with moving.
  So this issue affects Windows only.)

*sigh*

A probable workaround is to use a temp directory on the same device the
test is building on. Just set it up and set your environment TEMPDIR to
point to it, and I think it will be OK (i.e. I haven't tested it).

But that doesn't mean I'm not searching for a better solution. Maybe
Alvaro's suggestion nearby will help.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: building with meson on windows with ssl
On Sun, 12 Nov 2023 at 07:57, Dave Cramer wrote:
> Greetings,
>
> I am getting the following error building on HEAD
>
> Library crypto found: YES
> Checking for function "CRYPTO_new_ex_data" with dependencies -lssl,
> -lcrypto: NO

So this is the error you get if you mix a 64 bit version of openssl and
build with x86 tools. Clearly my problem, but the error message is less
than helpful.

Dave
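For anyone who hits the same mismatch: the toolchain and the OpenSSL build have to be the same architecture, so something along these lines (paths are illustrative, not a known-good recipe) avoids the failure:

    rem From an "x64 Native Tools Command Prompt for VS", pointing at a
    rem 64-bit OpenSSL; adjust the paths to the local installation.
    meson setup build -Dssl=openssl ^
      -Dextra_include_dirs=C:\OpenSSL-Win64\include ^
      -Dextra_lib_dirs=C:\OpenSSL-Win64\lib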
Re: [PATCH] Add support function for containment operators
On Fri, 2023-10-20 at 16:24 +0800, jian he wrote: > [new patch] Thanks, that patch works as expected and passes regression tests. Some comments about the code: > --- a/src/backend/utils/adt/rangetypes.c > +++ b/src/backend/utils/adt/rangetypes.c > @@ -558,7 +570,6 @@ elem_contained_by_range(PG_FUNCTION_ARGS) > PG_RETURN_BOOL(range_contains_elem_internal(typcache, r, val)); > } > > - > /* range, range -> bool functions */ > > /* equality (internal version) */ Please don't change unrelated whitespace. > +static Node * > +find_simplified_clause(Const *rangeConst, Expr *otherExpr) > +{ > + Form_pg_range pg_range; > + HeapTuple tup; > + Oid opclassOid; > + RangeBound lower; > + RangeBound upper; > + boolempty; > + Oid rng_collation; > + TypeCacheEntry *elemTypcache; > + Oid opfamily = InvalidOid; > + > + RangeType *range = DatumGetRangeTypeP(rangeConst->constvalue); > + TypeCacheEntry *rangetypcache = > lookup_type_cache(RangeTypeGetOid(range), TYPECACHE_RANGE_INFO); > + { This brace is unnecessary. Perhaps a leftover from a removed conditional statement. > + /* this part is get the range's SUBTYPE_OPCLASS from pg_range > catalog. > + * Refer load_rangetype_info function last line. > + * TypeCacheEntry->rngelemtype typcaheenetry either don't have > opclass entry or with default opclass. > + * Range's subtype opclass only in catalog table. > + */ The comments in the patch need some more love. Apart from the language, you should have a look at the style guide: - single-line comments should start with lower case and have no period: /* example of a single-line comment */ - Multi-line comments should start with /* on its own line and end with */ on its own line. They should use whole sentences: /* * In case a comment spans several lines, it should look like * this. Try not to exceed 80 characters. */ > + tup = SearchSysCache1(RANGETYPE, > ObjectIdGetDatum(RangeTypeGetOid(range))); > + > + /* should not fail, since we already checked typtype ... */ > + if (!HeapTupleIsValid(tup)) > + elog(ERROR, "cache lookup failed for range type %u", > RangeTypeGetOid(range)); If this is a "can't happen" case, it should be an Assert. > + > + pg_range = (Form_pg_range) GETSTRUCT(tup); > + > + opclassOid = pg_range->rngsubopc; > + > + ReleaseSysCache(tup); > + > + /* get opclass properties and look up the comparison function */ > + opfamily = get_opclass_family(opclassOid); > + } > + > + range_deserialize(rangetypcache, range, &lower, &upper, &empty); > + rng_collation = rangetypcache->rng_collation; > + > + if (empty) > + { > + /* If the range is empty, then there can be no matches. */ > + return makeBoolConst(false, false); > + } > + else if (lower.infinite && upper.infinite) > + { > + /* The range has no bounds, so matches everything. */ > + return makeBoolConst(true, false); > + } > + else > + { Many of the variables declared at the beginning of the function are only used in this branch. You should declare them here. 
> +static Node * > +match_support_request(Node *rawreq) > +{ > + if (IsA(rawreq, SupportRequestSimplify)) > + { To keep the indentation shallow, the preferred style is: if (/* case we don't handle */) return NULL; /* proceed without indentation */ > + SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq; > + FuncExpr *fexpr = req->fcall; > + Node *leftop; > + Node *rightop; > + Const *rangeConst; > + Expr *otherExpr; > + > + Assert(list_length(fexpr->args) == 2); > + > + leftop = linitial(fexpr->args); > + rightop = lsecond(fexpr->args); > + > + switch (fexpr->funcid) > + { > + case F_ELEM_CONTAINED_BY_RANGE: > + if (!IsA(rightop, Const) || ((Const *) > rightop)->constisnull) > + return NULL; > + > + rangeConst = (Const *) rightop; > + otherExpr = (Expr *) leftop; > + break; > + > + case F_RANGE_CONTAINS_ELEM: > + if (!IsA(leftop, Const) || ((Const *) > leftop)->constisnull) > + return NULL; > + > + rangeConst = (Const *) leftop; > + otherExpr = (Expr *) rightop; > + break; > + > + default: > +
Re: [PATCH] Add support function for containment operators
On Sun, 2023-11-12 at 18:15 +0100, Laurenz Albe wrote: > I adjusted your patch according to my comments; what do you think? I have added the patch to the January commitfest, with Jian and Kim as authors. I hope that is OK with you. Yours, Laurenz Albe
Re: [PATCH] Add support function for containment operators
On 12-11-2023 20:20, Laurenz Albe wrote: On Sun, 2023-11-12 at 18:15 +0100, Laurenz Albe wrote: I adjusted your patch according to my comments; what do you think? I have added the patch to the January commitfest, with Jian and Kim as authors. I hope that is OK with you. Sounds great to me. Thanks to Jian for picking this up. Regards, Kim Johan Andersson
Re: ResourceOwner refactoring
On 11/11/2023 14:00, Alexander Lakhin wrote: 10.11.2023 17:26, Heikki Linnakangas wrote: I think that is surprising behavior from the DSA facility. When you make allocations with dsa_allocate() or just call dsa_get_address() on an existing dsa_pointer, you wouldn't expect the current resource owner to matter. I think dsa_create/attach() should store the current resource owner in the dsa_area, for use in subsequent operations on the DSA, per attached patch (0002-Fix-dsa.c-with-different-resource-owners.patch). With the patch 0002 applied, I'm observing another anomaly: Ok, so the ownership of a dsa was still muddled :-(. Attached is a new version that goes a little further. It replaces the whole 'mapping_pinned' flag in dsa_area with the 'resowner'. When a mapping is pinned, it means that 'resowner == NULL'. This is now similar to how the resowner field and pinning works in dsm.c. -- Heikki Linnakangas Neon (https://neon.tech) From 90a0f1c8204f01c423b60be032ea521e4afd7473 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 10 Nov 2023 13:54:11 +0200 Subject: [PATCH 1/3] Add test_dsa module. This covers basic calls within a single backend process, and a test that uses a different resource owner. The resource owner test demonstrates that calling dsa_allocate() or dsa_get_address() while in a different resource owner than in the dsa_create/attach call causes a segfault. I think that's a bug. This resource owner issue was originally found by Alexander Lakhin, exposed by my large ResourceOwner rewrite commit. Reported-by: Alexander Lakhin Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com --- src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + src/test/modules/test_dsa/Makefile| 23 .../modules/test_dsa/expected/test_dsa.out| 13 ++ src/test/modules/test_dsa/meson.build | 33 + src/test/modules/test_dsa/sql/test_dsa.sql| 4 + src/test/modules/test_dsa/test_dsa--1.0.sql | 12 ++ src/test/modules/test_dsa/test_dsa.c | 115 ++ src/test/modules/test_dsa/test_dsa.control| 4 + 9 files changed, 206 insertions(+) create mode 100644 src/test/modules/test_dsa/Makefile create mode 100644 src/test/modules/test_dsa/expected/test_dsa.out create mode 100644 src/test/modules/test_dsa/meson.build create mode 100644 src/test/modules/test_dsa/sql/test_dsa.sql create mode 100644 src/test/modules/test_dsa/test_dsa--1.0.sql create mode 100644 src/test/modules/test_dsa/test_dsa.c create mode 100644 src/test/modules/test_dsa/test_dsa.control diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index 738b715e792..a18e4d28a04 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -17,6 +17,7 @@ SUBDIRS = \ test_copy_callbacks \ test_custom_rmgrs \ test_ddl_deparse \ + test_dsa \ test_extensions \ test_ginpostinglist \ test_integerset \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index d4828dc44d5..4e83c0f8d74 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -14,6 +14,7 @@ subdir('test_bloomfilter') subdir('test_copy_callbacks') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') +subdir('test_dsa') subdir('test_extensions') subdir('test_ginpostinglist') subdir('test_integerset') diff --git a/src/test/modules/test_dsa/Makefile b/src/test/modules/test_dsa/Makefile new file mode 100644 index 000..77583464dca --- /dev/null +++ b/src/test/modules/test_dsa/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_dsa/Makefile + +MODULE_big = test_dsa +OBJS = \ + $(WIN32RES) 
\ + test_dsa.o +PGFILEDESC = "test_dsa - test code for dynamic shared memory areas" + +EXTENSION = test_dsa +DATA = test_dsa--1.0.sql + +REGRESS = test_dsa + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_dsa +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_dsa/expected/test_dsa.out b/src/test/modules/test_dsa/expected/test_dsa.out new file mode 100644 index 000..266010e77fe --- /dev/null +++ b/src/test/modules/test_dsa/expected/test_dsa.out @@ -0,0 +1,13 @@ +CREATE EXTENSION test_dsa; +SELECT test_dsa_basic(); + test_dsa_basic + + +(1 row) + +SELECT test_dsa_resowners(); + test_dsa_resowners + + +(1 row) + diff --git a/src/test/modules/test_dsa/meson.build b/src/test/modules/test_dsa/meson.build new file mode 100644 index 000..21738290ad5 --- /dev/null +++ b/src/test/modules/test_dsa/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2022-2023, PostgreSQL Global Development Group + +test_dsa_sources = files( + 'test_dsa.c', +) + +if host_system == 'windows' + test_dsa_sour
Re: ResourceOwner refactoring
On Mon, Nov 13, 2023 at 11:16 AM Heikki Linnakangas wrote:
> On 11/11/2023 14:00, Alexander Lakhin wrote:
> > 10.11.2023 17:26, Heikki Linnakangas wrote:
> >> I think that is surprising behavior from the DSA facility. When you
> >> make allocations with dsa_allocate() or just call dsa_get_address()
> >> on an existing dsa_pointer, you wouldn't expect the current resource
> >> owner to matter. I think dsa_create/attach() should store the current
> >> resource owner in the dsa_area, for use in subsequent operations on
> >> the DSA, per attached patch
> >> (0002-Fix-dsa.c-with-different-resource-owners.patch).

Yeah, I agree that it is surprising and dangerous that the DSM segments
can have different owners if you're not careful, and it's not clear that
you have to be. Interesting that no one ever reported breakage in
parallel query due to this thinko, presumably because the current
resource owner always turns out to be either the same one or something
with longer life.

> > With the patch 0002 applied, I'm observing another anomaly:
>
> Ok, so the ownership of a dsa was still muddled :-(. Attached is a new
> version that goes a little further. It replaces the whole
> 'mapping_pinned' flag in dsa_area with the 'resowner'. When a mapping is
> pinned, it means that 'resowner == NULL'. This is now similar to how the
> resowner field and pinning works in dsm.c.

This patch makes sense to me.

It might be tempting to delete the dsa_pin_mapping() interface
completely and say that if CurrentResourceOwner == NULL, then it's
effectively (what we used to call) pinned, but I think it's useful for
exception-safe construction of multiple objects to be able to start out
with a resource owner and then later 'commit' by clearing it. As seen in
GetSessionDsmHandle().

I don't love the way this stuff is not very explicit, and if we're going
to keep doing this 'dynamic scope' stuff then I wish we could find a
scoping notation that looks more like the stuff one sees in other
languages that say something like "with-resource-owner(area->resowner) {
block of code }".
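A minimal sketch of the construction-then-commit pattern described above (not taken from the patch; the tranche id and the allocation step are placeholders):

    #include "postgres.h"
    #include "utils/dsa.h"

    /*
     * The area is created while a resource owner is current, so an ERROR
     * during setup releases the mapping automatically; once setup has
     * succeeded, dsa_pin_mapping() "commits" it by detaching it from the
     * owner, along the lines of what GetSessionDsmHandle() does.
     */
    static dsa_area *
    create_session_area(int tranche_id)
    {
        dsa_area   *area = dsa_create(tranche_id);

        /*
         * ... allocate and initialize objects inside the area here; any
         * error thrown in this window is cleaned up via the current
         * resource owner ...
         */

        dsa_pin_mapping(area);      /* keep the mapping for the backend's life */

        return area;
    }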
Re: pg_walfile_name_offset can return inconsistent values
On Fri, Nov 10, 2023 at 07:59:43PM -0800, Andres Freund wrote:
> I think this needs to add tests "documenting" the changed behaviour and
> perhaps also for a few other edge cases. You could e.g. test
>   SELECT * FROM pg_walfile_name_offset('0/0')
> which today returns bogus values, and which is independent of the wal
> segment size.

+1.

--
Michael
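For reference, a regression-test style sketch of such tests (expected output is deliberately omitted here, since that is exactly what the fix would change; the second query is only an extra illustration and assumes the default 16MB wal_segment_size):

    -- the edge case suggested above; independent of the wal segment size
    SELECT * FROM pg_walfile_name_offset('0/0');
    -- a boundary exactly at the start of a segment
    SELECT file_name, file_offset FROM pg_walfile_name_offset('0/1000000');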
Re: Cleaning up array_in()
On 09/11/2023 18:57, Tom Lane wrote:
> After further thought I concluded that this area is worth spending a
> little more code for. If we have input like '{foo"bar"}' or '{"foo"bar}'
> or '{"foo""bar"}', what it most likely means is that the user
> misunderstood the quoting rules. A message like "Unexpected array
> element" is pretty completely unhelpful for figuring that out. The
> alternative I was considering, "Unexpected """ character", would not be
> much better. What we want to say is something like "Incorrectly quoted
> array element", and the attached v12 makes ReadArrayToken do that for
> both quoted and unquoted cases.

+1

> I also fixed the obsolete documentation that Alexander noted, and
> cleaned up a couple other infelicities (notably, I'd blindly written
> ereport(ERROR) in one place where ereturn is now the way).
>
> Barring objections, I think v12 is committable.

Looks good to me. Just two little things caught my eye:

1.

    /* Initialize dim, lBound for ReadArrayDimensions, ReadArrayStr */
    for (int i = 0; i < MAXDIM; i++)
    {
        dim[i] = -1;        /* indicates "not yet known" */
        lBound[i] = 1;      /* default lower bound */
    }

The function comments in ReadArrayDimensions and ReadArrayStr don't make
it clear that these arrays need to be initialized like this.
ReadArrayDimensions() says that they are output variables, and
ReadArrayStr() doesn't mention anything about having to initialize them.

2. This was the same before this patch, but:

postgres=# select '{{{{{{{{{{1}}}}}}}}}}'::int[];
ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)
LINE 1: select '{{{{{{{{{{1}}}}}}}}}}'::int[];
               ^

The error message isn't great, as the literal contains 10 dimensions, not
7 as the error message claims.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Cleaning up array_in()
Heikki Linnakangas writes:
> On 09/11/2023 18:57, Tom Lane wrote:
>> Barring objections, I think v12 is committable.

> Looks good to me. Just two little things caught my eye:
> 1.
> The function comments in ReadArrayDimensions and ReadArrayStr don't make
> it clear that these arrays need to be initialized like this.
> ReadArrayDimensions() says that they are output variables, and
> ReadArrayStr() doesn't mention anything about having to initialize them.

Roger, will fix that.

> 2. This was the same before this patch, but:
> postgres=# select '{{{{{{{{{{1}}}}}}}}}}'::int[];
> ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)
> LINE 1: select '{{{{{{{{{{1}}}}}}}}}}'::int[];
>                ^
> The error message isn't great, as the literal contains 10 dimensions,
> not 7 as the error message claims.

Yeah. To make that report accurate, we'd have to somehow postpone
issuing the error until we've seen all the left braces (or at least all
the initial ones). There's a related problem in reading an
explicitly-dimensioned array:

postgres=# select '[1][2][3][4][5][6][7][8][9]={}'::text[];
ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)

I kind of think it's not worth the trouble. What was discussed upthread
was revising the message to not claim it knows how many dimensions there
are. The related cases in plperl and plpython just say "number of array
dimensions exceeds the maximum allowed (6)", and there's a case to be
made for adjusting the core messages similarly. I figured that could be
a separate patch though, since it'd touch more than array_in (there's
about a dozen occurrences of the former wording).

regards, tom lane
RE: Synchronizing slots from primary to standby
On Thursday, November 9, 2023 6:54 PM shveta malik wrote:
>
> PFA v32 patches which has below changes:

Thanks for updating the patch. Here are a few comments:

1. Do we need to update the slot upgrade code in pg_upgrade to upgrade
the slot's failover property as well?

2. The check for wal_level < WAL_LEVEL_LOGICAL.

It seems the existing code disallows slot creation if wal_level is not
sufficient, so I am thinking we might need a similar check in the slot
sync worker. Otherwise, the synced slot could not be used after standby
promotion.

Best Regards,
Hou zj
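For point 2, a rough sketch of the kind of check meant, mirroring what logical slot creation already enforces via CheckLogicalDecodingRequirements() (the message wording and exact placement in the worker are illustrative, not from the patch):

    /* refuse to sync slots that could not be used after promotion */
    if (wal_level < WAL_LEVEL_LOGICAL)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("slot synchronization requires wal_level >= \"logical\"")));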
How to solve the problem of one backend process crashing and causing other processes to restart?
In PostgreSQL, when a backend process crashes, it can cause other backend processes to also require a restart, primarily to ensure data consistency. I understand that the correct approach is to analyze and identify the cause of the crash and resolve it. However, it is also important to be able to handle a backend process crash without affecting the operation of other processes, thus minimizing the scope of negative impact and improving availability. To achieve this goal, could we mimic the Oracle process by introducing a "pmon" process dedicated to rolling back crashed process transactions and performing resource cleanup? I wonder if anyone has attempted such a strategy or if there have been previous discussions on this topic.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Trivial observation: these patches obviously introduce many instances of words derived from "cancel", but they don't all conform to established project decisions (cf 21f1e15a) about how to spell them. We follow the common en-US usage: "canceled", "canceling" but "cancellation". Blame Webstah et al. https://english.stackexchange.com/questions/176957/cancellation-canceled-canceling-us-usage
Re: How to solve the problem of one backend process crashing and causing other processes to restart?
yuansong writes:
> In PostgreSQL, when a backend process crashes, it can cause other backend
> processes to also require a restart, primarily to ensure data consistency.
> I understand that the correct approach is to analyze and identify the
> cause of the crash and resolve it. However, it is also important to be
> able to handle a backend process crash without affecting the operation of
> other processes, thus minimizing the scope of negative impact and
> improving availability. To achieve this goal, could we mimic the Oracle
> process by introducing a "pmon" process dedicated to rolling back crashed
> process transactions and performing resource cleanup? I wonder if anyone
> has attempted such a strategy or if there have been previous discussions
> on this topic.

The reason we force a database-wide restart is that there's no way to
be certain that the crashed process didn't corrupt anything in shared
memory. (Even with the forced restart, there's a window where bad
data could reach disk before we kill off the other processes that
might write it. But at least it's a short window.) "Corruption"
here doesn't just involve bad data placed into disk buffers; more
often it's things like unreleased locks, which would block other
processes indefinitely.

I seriously doubt that anything like what you're describing
could be made reliable enough to be acceptable. "Oracle does
it like this" isn't a counter-argument: they have a much different
(and non-extensible) architecture, and they also have an army of
programmers to deal with minutiae like undoing resource acquisition.
Even with that, you'd have to wonder about the number of bugs
existing in such necessarily-poorly-tested code paths.

regards, tom lane
Re: A recent message added to pg_upgade
On Sat, Nov 11, 2023 at 5:46 AM Michael Paquier wrote:
>
> On Fri, Nov 10, 2023 at 03:27:25PM +0530, Amit Kapila wrote:
> > I don't think this comment is correct because there won't be any apply
> > activity on the new cluster as after restoration subscriptions should
> > be disabled. On the old cluster, I think one problem is that the
> > origins may move forward after we copy them which can cause data
> > inconsistency issues. The other is that we may not prefer to generate
> > additional data and WAL during the upgrade. Also, I am not completely
> > sure about using the word 'corruption' in this context.
>
> What is your suggestion here? Would it be better to just aim for
> simplicity and just say that we don't want it to run because "it can
> lead to some inconsistent behaviors"?
>

I think we can be specific about logical replication stuff. I have not
done any study on autovacuum behavior related to this, so we can update
about it separately if required. I could think of something like the
following:

- /* Use -b to disable autovacuum. */
+ /*
+  * Use -b to disable autovacuum and logical replication launcher
+  * (effective in PG17 or later for the latter).
+  *
+  * Logical replication workers can stream data during the upgrade which can
+  * cause replication origins to move forward after we have copied them.
+  * It can cause the system to request the data which is already present
+  * in the new cluster.
+  */

Now, ideally, such a comment change makes more sense along with the main
patch, so either we can go without this comment change or probably wait
till the main patch is ready and merge just before it or along with it.
I am fine either way.

BTW, it is not clear to me another part of the comment "... for the
latter" in the proposed wording. Is there any typo there or am I missing
something?

--
With Regards,
Amit Kapila.
Re: Is this a problem in GenericXLogFinish()?
On Fri, Nov 10, 2023 at 9:17 AM Hayato Kuroda (Fujitsu) wrote:
>
> > Next we should add some test codes. I will continue considering but
> > please post anything If you have idea.
>
> And I did, PSA the patch. This patch adds two parts in hash_index.sql.
>
> In the first part, the primary bucket page is filled by live tuples and
> some overflow pages are by dead ones. This leads removal of overflow
> pages without moving tuples. HEAD will crash if this were executed,
> which is already reported on hackers.
>

+-- And do CHECKPOINT and vacuum. Some overflow pages will be removed.
+CHECKPOINT;
+VACUUM hash_cleanup_heap;

Why do we need CHECKPOINT in the above test?

> The second one tests a case (ntups == 0 && is_prim_bucket_same_wrt ==
> false && is_prev_bucket_same_wrt == true), which is never done before.
>

+-- Insert few tuples, the primary bucket page will not satisfy
+INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 50) as i;

What do you mean by the second part of the sentence (the primary bucket
page will not satisfy)?

Few other minor comments:
===
1. Can we use ROLLBACK instead of ABORT in the tests?

2.
- for (i = 0; i < nitups; i++)
+ for (int i = 0; i < nitups; i++)

I don't think there is a need to make this unrelated change.

3.
+ /*
+  * A write buffer needs to be registered even if no tuples are added to
+  * it to ensure that we can acquire a cleanup lock on it if it is the
+  * same as primary bucket buffer or update the nextblkno if it is same
+  * as the previous bucket buffer.
+  */
+ else if (xlrec.is_prim_bucket_same_wrt || xlrec.is_prev_bucket_same_wrt)
+ {
+     uint8 wbuf_flags;
+
+     Assert(xlrec.ntups == 0);

Can we move this comment inside else if, just before Assert?

--
With Regards,
Amit Kapila.
Re: Synchronizing slots from primary to standby
On Mon, Nov 13, 2023 at 6:19 AM Zhijie Hou (Fujitsu) wrote: > > On Thursday, November 9, 2023 6:54 PM shveta malik > wrote: > > > > > > PFA v32 patches which has below changes: > > Thanks for updating the patch. > > Here are few comments: > > > 2. > The check for wal_level < WAL_LEVEL_LOGICAL. > > It seems the existing codes disallow slot creation if wal_level is not > sufficient, I am thinking we might need similar check in slot sync worker. > Otherwise, the synced slot could not be used after standby promotion. > Yes, I agree. We should add this check. > Best Regards, > Hou zj
Re: Add new option 'all' to pg_stat_reset_shared()
On 2023-11-12 16:46, Michael Paquier wrote:

On Fri, Nov 10, 2023 at 01:15:50PM +0900, Michael Paquier wrote:

The comments added could be better grammatically, but basically LGTM.
I'll take care of that if there are no objections.

The documentation also needed a few tweaks (for DEFAULT and the argument
name), so I have fixed the whole and adapted the new part of the docs to
that, with few little tweaks.

Thanks!

I assume you have already taken this into account, but I think we should
add the same documentation to the below patch for pg_stat_reset_slru():

https://www.postgresql.org/message-id/CALj2ACW4Fqc_m%2BOaavrOMEivZ5aBa24pVKvoXRTmuFECsNBfAg%40mail.gmail.com

On 2023-11-12 16:54, Michael Paquier wrote:

On Fri, Nov 10, 2023 at 08:32:34PM +0900, torikoshia wrote:

On 2023-11-10 13:18, Andres Freund wrote:

I see no reason to not include slrus. We should never have added the
ability to reset them individually, particularly not without a use case
- I couldn't find one skimming some discussion. And what's the point in
not allowing to reset them via pg_stat_reset_shared()?

When including SLRUs, do you think it's better to add 'slrus' argument
which enables pg_stat_reset_shared() to reset all SLRUs?

I understand that Andres says that he'd be OK with a addition of a
'slru' option in pg_stat_reset_shared(), as well as including SLRUs in
the resets if everything should be wiped.

Thanks, I'll make the patch.

28cac71bd368 is around since 13~, so changing pg_stat_reset_slru() or
removing it could impact existing applications, so there's little
benefit in changing it at this stage. Let it be itself.

+1.
As described above, since SLRUs cannot be reset by pg_stat_reset_shared(),
I feel a bit uncomfortable to delete it all together.

That would be only effective if NULL is given to the function to reset
everything, which is OK IMO, because this is a shared stats.

--
Michael

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
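To make the interface under discussion concrete, a sketch of the calls (the 'slru' and argument-less forms reflect the proposal above and are not committed behavior):

    -- existing behaviour: reset one kind of shared statistics
    SELECT pg_stat_reset_shared('bgwriter');
    -- proposed: reset all SLRU statistics through the same function
    SELECT pg_stat_reset_shared('slru');
    -- proposed: with NULL, or no argument thanks to the DEFAULT, reset everything
    SELECT pg_stat_reset_shared(NULL);
    SELECT pg_stat_reset_shared();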
Re: Synchronizing slots from primary to standby
On Thu, Nov 9, 2023 at 8:56 AM Amit Kapila wrote: > > On Thu, Nov 9, 2023 at 8:11 AM Amit Kapila wrote: > > > > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand > > wrote: > > > > > > > Unrelated to above, if there is a user slot on standby with the same > > > > name which the slot-sync worker is trying to create, then shall it > > > > emit a warning and skip the sync of that slot or shall it throw an > > > > error? > > > > > > > > > > I'd vote for emit a warning and move on to the next slot if any. > > > > > > > But then it could take time for users to know the actual problem and > > they probably notice it after failover. OTOH, if we throw an error > > then probably they will come to know earlier because the slot sync > > mechanism would be stopped. Do you have reasons to prefer giving a > > WARNING and skipping creating such slots? I expect this WARNING to > > keep getting repeated in LOGs because the consecutive sync tries will > > again generate a WARNING. > > > > Apart from the above, I would like to discuss the slot sync work > distribution strategy of this patch. The current implementation as > explained in the commit message [1] works well if the slots belong to > multiple databases. It is clear from the data in emails [2][3][4] that > having more workers really helps if the slots belong to multiple > databases. But I think if all the slots belong to one or very few > databases then such a strategy won't be as good. Now, on one hand, we > get very good numbers for a particular workload with the strategy used > in the patch but OTOH it may not be adaptable to various different > kinds of workloads. So, I have a question whether we should try to > optimize this strategy for various kinds of workloads or for the first > version let's use a single-slot sync-worker and then we can enhance > the functionality in later patches either in PG17 itself or in PG18 or > later versions. I can work on separating the patch. We can first focus on single worker design and then we can work on multi-worker design either immediately (if needed) or we can target it in the second draft of the patch. I would like to know the thoughts of others on this. One thing to note is that a lot of the complexity of > the patch is attributed to the multi-worker strategy which may still > not be efficient, so there is an argument to go with a simpler > single-slot sync-worker strategy and then enhance it in future > versions as we learn more about various workloads. It will also help > to develop this feature incrementally instead of doing all the things > in one go and taking a much longer time than it should. > Agreed. With multi-workers, a lot of complexity (dsa, locks etc) have come into play. We can decide better on our workload distribution strategy among workers once we have more clarity on different types of workloads. > > [1] - "The replication launcher on the physical standby queries > primary to get the list of dbids for failover logical slots. Once it > gets the dbids, if dbids < max_slotsync_workers, it starts only that > many workers, and if dbids > max_slotsync_workers, it starts > max_slotsync_workers and divides the work equally among them. Each > worker is then responsible to keep on syncing the logical slots > belonging to the DBs assigned to it. > > Each slot-sync worker will have its own dbids list. Since the upper > limit of this dbid-count is not known, it needs to be handled using > dsa. We initially allocated memory to hold 100 dbids for each worker. 
> If this limit is exhausted, we reallocate this memory with size > incremented again by 100." > > [2] - > https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com > [3] - > https://www.postgresql.org/message-id/CAFPTHDZw2G3Pax0smymMjfPqdPcZhMWo36f9F%2BTwNTs0HFxK%2Bw%40mail.gmail.com > [4] - > https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com > > -- > With Regards, > Amit Kapila.
Re: Add recovery to pg_control and remove backup_label
On Fri, Nov 10, 2023 at 02:55:19PM -0400, David Steele wrote: > On 11/10/23 00:37, Michael Paquier wrote: >> I've done a few more dozen runs, and still nothing. I am wondering >> what this disturbance was. > > OK, hopefully it was just a blip. Still nothing on this side. So that seems really like a random blip in the matrix. > This has been split out. Thanks, applied 0001. >> The term "backup recovery", that we've never used in the tree until >> now as far as I know. Perhaps this recovery method should just be >> referred as "recovery from backup"? > > Well, "backup recovery" is less awkward, I think. For instance "backup > recovery field" vs "recovery from backup field". Not sure. I've never used this term when referring to recovery from a backup. Perhaps I'm just not used to it, still that sounds a bit confusing here. >> Something in this area is that backupRecoveryRequired is the switch >> controlling if the fields set by the recovery initialization. Could >> it be actually useful to leave the other fields as they are and only >> reset backupRecoveryRequired before the first control file update? >> This would leave a trace of the backup history directly in the control >> file. > > Since the other recovery fields are cleared in ReachedEndOfBackup() this > would be a change from what we do now. > > None of these fields are ever visible (with the exception of > minRecoveryPoint/TLI) since they are reset when the database becomes > consistent and before logons are allowed. Viewing them with pg_controldata > makes sense, but I'm not sure pg_control_recovery() does. > > In fact, are backup_start_lsn, backup_end_lsn, and > end_of_backup_record_required ever non-zero when logged onto Postgres? Maybe > I'm missing something? Yeah, but custom backup/restore tools may want manipulate the contents of the control file for their own work, so at least for the sake of visibility it sounds important to me to show all the information at hand, and that there is no need to. -The backup label -file includes the label string you gave to pg_backup_start, +The backup history file (which is archived like WAL) includes the label +string you gave to pg_backup_start, as well as the time at which pg_backup_start was run, and the name of the starting WAL file. In case of confusion it is therefore -possible to look inside a backup file and determine exactly which +possible to look inside a backup history file and determine exactly which As a side note, it is a bit disappointing that we lose the backup label from the backup itself, even if the patch changes correctly the documentation to reflect the new behavior. It is in the backup history file on the node from where the base backup has been taken or in the archives, hopefully. However there is nothing that remains in the base backup itself, and backups can be self-contained (easy with pg_basebackup --wal-method=stream). I think that we should retain a minimum amount of information as a replacement for the backup_label, at least. With this state, the current patch slightly reduces the debuggability of deployments. That could be annoying for some users. > New patches attached based on eb81e8e790. 
Diving into the code for references about the backup label file, I have spotted this log in pg_rewind that is now incorrect: if (showprogress) pg_log_info("creating backup label and updating control file"); +printf(_("Backup start location's timeline: %u\n"), + ControlFile->backupStartPointTLI); printf(_("Backup end location: %X/%X\n"), LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); Perhaps these two should be reversed to match with the header file. +/* + * After pg_backup_stop() returns this field will contain a copy of + * pg_control that should be stored with the backup. Fields have been + * updated for recovery and the CRC has been recalculated. The buffer + * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always + * a consistent size but smaller (and hopefully easier to handle) than + * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed. + */ +uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE]; I don't mind the addition of a control file with the max safe size, because it will never be higher than that. However: +/* End the backup before sending pg_control */ +basebackup_progress_wait_wal_archive(&state); +do_pg_backup_stop(backup_state, !opt->nowait); + +/* Send copy of pg_control containing recovery info */ +sendFileWithContent(sink, XLOG_CONTROL_FILE, +(char *)backup_state->controlFile, +PG_CONTROL_MAX_SAFE_SIZE, &manifest); It seems to me that the base backup protocol should always send an 8k file for the control file so as we maintain consi
Re: A recent message added to pg_upgade
On Mon, Nov 13, 2023 at 08:45:12AM +0530, Amit Kapila wrote:
> I think we can be specific about logical replication stuff. I have not
> done any study on autovacuum behavior related to this, so we can
> update about it separately if required.

Autovacuum, as far as I recall, could decide to do some work before
files are physically copied from the old to the new cluster, corrupting
the new cluster. Per 76dd09bbec89:

+ * If we have lost the autovacuum launcher, try to start a new one.
+ * We don't want autovacuum to run in binary upgrade mode because
+ * autovacuum might update relfrozenxid for empty tables before
+ * the physical files are put in place.

> - /* Use -b to disable autovacuum. */
> + /*
> +  * Use -b to disable autovacuum and logical replication launcher
> +  * (effective in PG17 or later for the latter).
> +  *
> +  * Logical replication workers can stream data during the upgrade which can
> +  * cause replication origins to move forward after we have copied them.
> +  * It can cause the system to request the data which is already present
> +  * in the new cluster.
> +  */
>
> Now, ideally, such a comment change makes more sense along with the
> main patch, so either we can go without this comment change or
> probably wait till the main patch is ready and merge just before it or
> along with it. I am fine either way.

Another location would be to document that stuff directly in launcher.c
where the check for IsBinaryUpgrade would be added. You are right that
it makes little sense to document that now, so how about:

1) keeping pg_upgrade.c minimal, say:

- /* Use -b to disable autovacuum. */
+ /*
+  * Use -b to disable autovacuum and logical replication
+  * launcher (in 17~).
+  */

With a removal of the comment block related to
max_logical_replication_workers=0?

2) Document that in ApplyLauncherRegister() as part of the main patch
for the subscribers?

> BTW, it is not clear to me another part of the comment "... for the
> latter" in the proposed wording. Is there any typo there or am I
> missing something?

The "latter" refers to the logirep launcher here, as -b would affect it
only in 17~ with the patch I sent previously.

--
Michael
Re: MERGE ... RETURNING
Hi.

v13 works fine; all tests passed. The code is very intuitive. I played
with multiple WHEN clauses, even with before/after row triggers, and
they work as expected.

I don't know when replace_outer_merging will be invoked, even after
setting a breakpoint on it. Coverage shows replace_outer_merging is only
called once.

Does sql-merge.html miss mentioning that RETURNING needs SELECT
privilege on the columns? In sql-insert.html, we have:

"Use of the RETURNING clause requires SELECT privilege on all columns
mentioned in RETURNING. If you use the query clause to insert rows from
a query, you of course need to have SELECT privilege on any table or
column used in the query."

I saw the change in src/sgml/glossary.sgml, so I looked around. In the
"Materialized view (relation)" part there is "It cannot be modified via
INSERT, UPDATE, or DELETE operations." Do we need to put MERGE into that
sentence? Also, since there are SELECT, INSERT, UPDATE, and DELETE
entries, do we need to add a MERGE entry in glossary.sgml?
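For readers following the thread, the kind of statement exercised here looks roughly like this (a sketch against the proposed patch; table and column names are invented):

    MERGE INTO target t
    USING source s ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET val = s.val
    WHEN NOT MATCHED THEN
        INSERT (id, val) VALUES (s.id, s.val)
    RETURNING t.id, t.val;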
Re: Remove MSVC scripts from the tree
On Fri, Nov 10, 2023 at 08:38:21AM +0100, Peter Eisentraut wrote:
> How about this patch as a comprehensive solution?
> 8 files changed, 26 insertions(+), 339 deletions(-)

Thanks for the patch. The numbers are here, and the patch looks
sensible. The contents of probes.h without --enable-trace are exactly
the same before and after the patch.

In short, +1 to switch to what you are proposing here.

--
Michael
Re: Synchronizing slots from primary to standby
On Thu, Nov 9, 2023 at 9:54 PM shveta malik wrote:
>
> PFA v32 patches which has below changes:

Testing with this patch, I see that if the failover enabled slot is
invalidated on the primary, then the corresponding synced slot is not
invalidated on the standby. Instead, I see that it continuously gets
the below error:

  WARNING: not synchronizing slot sub; synchronization would move it backwards

In the code, I see that:

    if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn)
    {
        ereport(WARNING,
                errmsg("not synchronizing slot %s; synchronization would move"
                       " it backwards", remote_slot->name));

        ReplicationSlotRelease();
        CommitTransactionCommand();
        return;
    }

If the restart_lsn of the remote slot is behind, then the
local_slot_update() function is never called to set the invalidation
status on the local slot. And for invalidated slots, restart_lsn is
always NULL.

regards,
Ajin Cherian
Fujitsu Australia
RE: Is this a problem in GenericXLogFinish()?
Dear Amit, Thanks for reviewing! PSA new version patch. > > > Next we should add some test codes. I will continue considering but please > post > > > anything > > > If you have idea. > > > > And I did, PSA the patch. This patch adds two parts in hash_index.sql. > > > > In the first part, the primary bucket page is filled by live tuples and some > overflow > > pages are by dead ones. This leads removal of overflow pages without moving > tuples. > > HEAD will crash if this were executed, which is already reported on hackers. > > > > +-- And do CHECKPOINT and vacuum. Some overflow pages will be removed. > +CHECKPOINT; > +VACUUM hash_cleanup_heap; > > Why do we need CHECKPOINT in the above test? CHECKPOINT command is needed for writing a hash pages to disk. IIUC if the command is omitted, none of tuples are recorded to hash index *file*, so squeeze would not be occurred. You can test by 1) restoring changes for hashovfl.c then 2) removing CHECKPOINT command. Server crash would not be occurred. > > The second one tests a case (ntups == 0 && is_prim_bucket_same_wrt == > false && > > is_prev_bucket_same_wrt == true), which is never done before. > > > > +-- Insert few tuples, the primary bucket page will not satisfy > +INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 50) as i; > > What do you mean by the second part of the sentence (the primary > bucket page will not satisfy)? I meant to say that the primary bucket page still can have more tuples. Maybe I should say "will not be full". Reworded. > Few other minor comments: > === > 1. Can we use ROLLBACK instead of ABORT in the tests? Changed. IIRC they have same meaning, but it seems that most of sql files have "ROLLBACK". > 2. > - for (i = 0; i < nitups; i++) > + for (int i = 0; i < nitups; i++) > > I don't think there is a need to make this unrelated change. Reverted. I thought that the variable i would be used only when nitups>0 so that it could be removed, but it was not my business. > 3. > + /* > + * A write buffer needs to be registered even if no tuples are added to > + * it to ensure that we can acquire a cleanup lock on it if it is the > + * same as primary bucket buffer or update the nextblkno if it is same > + * as the previous bucket buffer. > + */ > + else if (xlrec.is_prim_bucket_same_wrt || xlrec.is_prev_bucket_same_wrt) > + { > + uint8 wbuf_flags; > + > + Assert(xlrec.ntups == 0); > > Can we move this comment inside else if, just before Assert? Moved. Best Regards, Hayato Kuroda FUJITSU LIMITED fix_hash_squeeze_wal_5.patch Description: fix_hash_squeeze_wal_5.patch
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Fri, Nov 10, 2023 at 02:44:25PM -0600, Nathan Bossart wrote:
> On Tue, Nov 07, 2023 at 05:01:16PM +0900, Michael Paquier wrote:
>> +#ifdef USE_INJECTION_POINTS
>> +#define INJECTION_POINT_RUN(name) InjectionPointRun(name)
>> +#else
>> +#define INJECTION_POINT_RUN(name) ((void) name)
>> +#endif
>
> nitpick: Why is the non-injection-point version "(void) name"? I see
> "(void) true" used elsewhere for this purpose.

Or (void) 0.

>> +
>> + Here is an example of callback for
>> + InjectionPointCallback:
>> +
>> +static void
>> +custom_injection_callback(const char *name)
>> +{
>> +    elog(NOTICE, "%s: executed custom callback", name);
>> +}
>> +
>
> Why do we provide the name to the callback functions?

This is for the use of the same callback across multiple points, and
tracking the name of the event happening was making sense to me to know
which code path is being taken when a callback is called. One thing
that I got in mind as well here is to be able to register custom wait
events based on the name of the callback taken, for example on a
condition variable, a latch or a named LWLock.

> Overall, the design looks pretty good to me. I think it's a good idea
> to keep it simple to start with. Since this is really only intended
> for special tests that run in special builds, it seems like we ought
> to be able to change it easily in the future as needed.

Yes, my first idea is to keep the initial design minimal and take the
temperature. As far as I can see, there seem to not be any strong
objection with this basic design, still I agree that I need to show a
bit more code about its usability. I have some SQL and recovery cases
where this is handy and these have piled over time, including at least
two/three of them with more basic APIs in the test module may make
sense in the initial batch of what I am proposing here.

--
Michael
Re: How to solve the problem of one backend process crashing and causing other processes to restart?
On Sun, 2023-11-12 at 21:55 -0500, Tom Lane wrote: > yuansong writes: > > In PostgreSQL, when a backend process crashes, it can cause other backend > > processes to also require a restart, primarily to ensure data consistency. > > I understand that the correct approach is to analyze and identify the > > cause of the crash and resolve it. However, it is also important to be > > able to handle a backend process crash without affecting the operation of > > other processes, thus minimizing the scope of negative impact and > > improving availability. To achieve this goal, could we mimic the Oracle > > process by introducing a "pmon" process dedicated to rolling back crashed > > process transactions and performing resource cleanup? I wonder if anyone > > has attempted such a strategy or if there have been previous discussions > > on this topic. > > The reason we force a database-wide restart is that there's no way to > be certain that the crashed process didn't corrupt anything in shared > memory. (Even with the forced restart, there's a window where bad > data could reach disk before we kill off the other processes that > might write it. But at least it's a short window.) "Corruption" > here doesn't just involve bad data placed into disk buffers; more > often it's things like unreleased locks, which would block other > processes indefinitely. > > I seriously doubt that anything like what you're describing > could be made reliable enough to be acceptable. "Oracle does > it like this" isn't a counter-argument: they have a much different > (and non-extensible) architecture, and they also have an army of > programmers to deal with minutiae like undoing resource acquisition. > Even with that, you'd have to wonder about the number of bugs > existing in such necessarily-poorly-tested code paths. Yes. I think that PostgreSQL's approach is superior: rather than investing in code to mitigate the impact of data corruption caused by a crash, invest in quality code that doesn't crash in the first place. Euphemistically naming a crash "ORA-600 error" seems to be part of their strategy. Yours, Laurenz Albe
Re: Add new option 'all' to pg_stat_reset_shared()
On Mon, Nov 13, 2023 at 01:15:14PM +0900, torikoshia wrote:
> I assume you have already taken this into account, but I think we should
> add the same documentation to the below patch for pg_stat_reset_slru():
>
> https://www.postgresql.org/message-id/CALj2ACW4Fqc_m%2BOaavrOMEivZ5aBa24pVKvoXRTmuFECsNBfAg%40mail.gmail.com

Yep, the DEFAULT value and the argument name should be documented in
monitoring.sgml.

--
Michael
Re: Synchronizing slots from primary to standby
On Mon, Nov 13, 2023 at 11:02 AM Ajin Cherian wrote:
>
> On Thu, Nov 9, 2023 at 9:54 PM shveta malik wrote:
> >
> > PFA v32 patches which has below changes:
>
> Testing with this patch, I see that if the failover enabled slot is
> invalidated on the primary, then the corresponding synced slot is not
> invalidated on the standby. Instead, I see that it continuously gets
> the below error:
> " WARNING: not synchronizing slot sub; synchronization would move it
> backwards"
>
> In the code, I see that:
>
> if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn)
> {
>     ereport(WARNING,
>             errmsg("not synchronizing slot %s; synchronization would move"
>                    " it backwards", remote_slot->name));
>
>     ReplicationSlotRelease();
>     CommitTransactionCommand();
>     return;
> }
>
> If the restart_lsn of the remote slot is behind, then the
> local_slot_update() function is never called to set the invalidation
> status on the local slot. And for invalidated slots, restart_lsn is
> always NULL.
>

Okay. Thanks for testing Ajin.

I think it needs a fix wherein we set the local-slot's invalidation
status (provided it is not invalidated already) from the remote slot
before this check itself. And if the slot is invalidated locally (either
by itself) or by primary_slot being invalidated, then we should skip the
sync. I will fix this in the next version.

thanks
Shveta
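A rough sketch of that fix (the remote_slot->invalidated field name is an assumption following the patch's remote-slot structure; the rest uses existing slot interfaces):

    /*
     * Propagate the invalidation from the primary before the restart_lsn
     * comparison, then skip syncing a slot that is invalidated on either
     * side.
     */
    if (remote_slot->invalidated != RS_INVAL_NONE &&
        MyReplicationSlot->data.invalidated == RS_INVAL_NONE)
    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        MyReplicationSlot->data.invalidated = remote_slot->invalidated;
        SpinLockRelease(&MyReplicationSlot->mutex);

        ReplicationSlotMarkDirty();
        ReplicationSlotSave();
    }

    if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
    {
        ReplicationSlotRelease();
        CommitTransactionCommand();
        return;
    }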
Re: pg_upgrade --copy-file-range
On 08.10.23 07:15, Thomas Munro wrote:
>> About your patch: I think you should have a "check" function called
>> from check_new_cluster(). That check function can then also handle the
>> "not supported" case, and you don't need to handle that in
>> parseCommandLine(). I suggest following the clone example for these,
>> since the issues there are very similar.
>
> Done.

This version looks good to me.

Tiny nit: You copy-and-pasted "%s/PG_VERSION.clonetest"; maybe choose a
different suffix.