Re: HOT chain validation in verify_heapam()
On Tue, Jan 31, 2023 at 7:20 PM Robert Haas wrote: > On Mon, Jan 30, 2023 at 8:24 AM Himanshu Upadhyaya > wrote: > > Before this we stop the node by "$node->stop;" and then only we progress > to > > manual corruption. This will abort all running/in-progress transactions. > > So, if we create an in-progress transaction and comment "$node->stop;" > > then somehow all the code that we have for manual corruption does not > work. > > > > I think it is required to stop the server and then only proceed for > manual corruption? > > If this is the case then please suggest if there is a way to get an > in-progress transaction > > that we can use for manual corruption. > > How about using a prepared transaction? > > Thanks, yes it's working fine with Prepared Transaction. Please find attached the v9 patch incorporating all the review comments. -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From d241937841c5990ff4df00d1abc6bfbb3491ac3e Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Sun, 5 Feb 2023 12:50:21 +0530 Subject: [PATCH v9] Implement HOT chain validation in verify_heapam() Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev, Andres Freund --- contrib/amcheck/verify_heapam.c | 297 ++ src/bin/pg_amcheck/t/004_verify_heapam.pl | 241 +- 2 files changed, 527 insertions(+), 11 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index 4fcfd6df72..01c639b5e1 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -399,9 +399,14 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber]; + OffsetNumber successor[MaxOffsetNumber]; + bool lp_valid[MaxOffsetNumber]; CHECK_FOR_INTERRUPTS(); + memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber); + /* Optionally skip over all-frozen or all-visible blocks */ if (skip_option != SKIP_PAGES_NONE) { @@ -433,6 +438,10 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; ctx.offnum = OffsetNumberNext(ctx.offnum)) { + OffsetNumber nextoffnum; + + successor[ctx.offnum] = InvalidOffsetNumber; + lp_valid[ctx.offnum] = false; ctx.itemid = PageGetItemId(ctx.page, ctx.offnum); /* Skip over unused/dead line pointers */ @@ -469,6 +478,13 @@ verify_heapam(PG_FUNCTION_ARGS) report_corruption(&ctx, psprintf("line pointer redirection to unused item at offset %u", (unsigned) rdoffnum)); + +/* + * Make entry in successor array, redirected lp will be + * validated at the time when we loop over successor array. + */ +successor[ctx.offnum] = rdoffnum; +lp_valid[ctx.offnum] = true; continue; } @@ -504,9 +520,283 @@ verify_heapam(PG_FUNCTION_ARGS) /* It should be safe to examine the tuple's header, at least */ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); + lp_valid[ctx.offnum] = true; /* Ok, ready to check this next tuple */ check_tuple(&ctx); + + /* + * Add the data to the successor array if next updated tuple is in + * the same page. It will be used later to generate the + * predecessor array. + * + * We need to access the tuple's header to populate the + * predecessor array. However the tuple is not necessarily sanity + * checked yet so delaying construction of predecessor array until + * all tuples are sanity checked. 
+ */ + nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid); + if (ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid) == ctx.blkno && +nextoffnum != ctx.offnum) + { +successor[ctx.offnum] = nextoffnum; + } + } + + /* + * Loop over offset and populate predecessor array from all entries + * that are present in successor array. + */ + ctx.attnum = -1; + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff; + ctx.offnum = OffsetNumberNext(ctx.offnum)) + { + ItemId curr_lp; + ItemId next_lp; + HeapTupleHeader curr_htup; + HeapTupleHeader next_htup; + TransactionId curr_xmax; + TransactionId next_xmin; + + OffsetNumber nextoffnum = successor[ctx.offnum]; + + if (nextoffnum == InvalidOffsetNumber) + { +/* + * This is either the last updated tuple in the chain or a + * corrupted Tuple/lp or unused/dead line pointer. + */ +continue; + } + + /* + * Add data to the predecessor array even if the current or + * successor's LP is not valid. We will not process/validate these + * offset entries while looping over the predecessor array but + * having all entries in the predecessor array will help in + * identifying(and validating) the Root of a chain. + */ + if (!lp_valid[ctx.offnum] || !lp_valid[nextoffnum]) + { +
Re: what's the meaning of key?
On 05/02/2023 06:09, jack...@gmail.com wrote:
> I'm doing research on heap_am, and for the heap_beginscan function I found that there is an argument called nkeys. I used some SQL statements as examples, like 'select * from t;' and 'select * from t where a = 1', but it is always zero. Can you give me some description of this? What is it used for?

The executor evaluates table scan quals in the SeqScan node itself, in the ExecScan function. It doesn't use the heap_beginscan scankeys. There has been some discussion on changing that, as some table access methods might be able to filter rows more efficiently using the scan keys than the executor node. But that's how it currently works. I think the heap scankeys are used by catalog accesses, though, so it's not completely dead code.

- Heikki
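For reference, a minimal sketch of the kind of catalog access that does pass scan keys down to the scan layer (the relation, index and lookup value here are purely illustrative; when the index can't be used, the same key is applied as a heap-level scan key):

    Relation    rel = table_open(RelationRelationId, AccessShareLock);
    ScanKeyData key;
    SysScanDesc scan;
    HeapTuple   tuple;

    /* look up pg_class rows by relname */
    ScanKeyInit(&key,
                Anum_pg_class_relname,
                BTEqualStrategyNumber, F_NAMEEQ,
                CStringGetDatum("some_table"));

    scan = systable_beginscan(rel, ClassNameNspIndexId, true, NULL, 1, &key);
    while (HeapTupleIsValid(tuple = systable_getnext(scan)))
    {
        /* only rows matching the scan key are returned */
    }
    systable_endscan(scan);
    table_close(rel, AccessShareLock);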
Re: Generating code for query jumbling through gen_node_support.pl
On Tue, Jan 31, 2023 at 03:40:56PM +0900, Michael Paquier wrote: > With all that in mind, I have spent my day polishing that and doing a > close lookup, and the patch has been applied. Thanks a lot! While working on a different patch, I have noticed a small issue in the way the jumbling happens for A_Const, where ValUnion was not getting jumbled correctly. This caused a few statements that rely on this node to compile the same query IDs when using different values. The full contents of pg_stat_statements for a regression database point to: - SET. - COPY with queries. - CREATE TABLE with partition bounds and default expressions. This was causing some confusion in pg_stat_statements where some utility queries would be incorrectly reported, and at this point the intention is to keep this area compatible with the string-based method when it comes to the values. Like read, write and copy nodes, we need to compile the query ID based on the type of the value, which cannot be automated. Attached is a patch to fix this issue with some regression tests, that I'd like to get fixed before moving on with more business in pg_stat_statements (aka properly show Const nodes for utilities with normalized queries). Comments or objections are welcome, of course. (FWIW, I'd like to think that there is an argument to normalize the A_Const nodes for a portion of the DDL queries, by ignoring their values in the query jumbling and mark a location, which would be really useful for some workloads, but that's a separate discussion I am keeping for later.) -- Michael diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 3d67787e7a..855da99ec0 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -355,7 +355,7 @@ union ValUnion typedef struct A_Const { - pg_node_attr(custom_copy_equal, custom_read_write) + pg_node_attr(custom_copy_equal, custom_read_write, custom_query_jumble) NodeTag type; union ValUnion val; diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c index 223d1bc826..72ce15e43d 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -49,6 +49,7 @@ static void AppendJumble(JumbleState *jstate, const unsigned char *item, Size size); static void RecordConstLocation(JumbleState *jstate, int location); static void _jumbleNode(JumbleState *jstate, Node *node); +static void _jumbleA_Const(JumbleState *jstate, Node *node); static void _jumbleList(JumbleState *jstate, Node *node); static void _jumbleRangeTblEntry(JumbleState *jstate, Node *node); @@ -313,6 +314,40 @@ _jumbleList(JumbleState *jstate, Node *node) } } +static void +_jumbleA_Const(JumbleState *jstate, Node *node) +{ + A_Const *expr = (A_Const *) node; + + JUMBLE_FIELD(isnull); + if (!expr->isnull) + { + JUMBLE_FIELD(val.node.type); + switch (nodeTag(&expr->val)) + { + case T_Integer: +JUMBLE_FIELD(val.ival.ival); +break; + case T_Float: +JUMBLE_STRING(val.fval.fval); +break; + case T_Boolean: +JUMBLE_FIELD(val.boolval.boolval); +break; + case T_String: +JUMBLE_STRING(val.sval.sval); +break; + case T_BitString: +JUMBLE_STRING(val.bsval.bsval); +break; + default: +elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(&expr->val)); +break; + } + } +} + static void _jumbleRangeTblEntry(JumbleState *jstate, Node *node) { diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index fb9ccd920f..753062a9d7 100644 --- 
a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -579,6 +579,14 @@ NOTICE: table "test" does not exist, skipping NOTICE: table "test" does not exist, skipping NOTICE: function plus_one(pg_catalog.int4) does not exist, skipping DROP FUNCTION PLUS_TWO(INTEGER); +-- This SET query uses two different strings, still thet count as one entry. +SET work_mem = '1MB'; +Set work_mem = '1MB'; +SET work_mem = '2MB'; +RESET work_mem; +SET enable_seqscan = off; +SET enable_seqscan = on; +RESET enable_seqscan; SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; query | calls | rows --+---+-- @@ -588,10 +596,16 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; DROP FUNCTION PLUS_TWO(INTEGER) | 1 |0 DROP TABLE IF EXISTS test| 3 |0 DROP TABLE test | 1 |0 + RESET enable_seqscan | 1 |0 + RESET work_mem
Re: [PATCH] Compression dictionaries for JSONB
Hi,

> I don't think the approaches in either of these threads is
> promising. They add a lot of complexity, require implementation effort
> for each type, manual work by the administrator for column, etc.

I would like to point out that compression dictionaries don't require per-type work.

The current implementation is artificially limited to JSONB because it's a PoC. I was hoping to get more feedback from the community before proceeding further. Internally it uses type-agnostic compression and doesn't care whether it compresses JSON(B), XML, TEXT, BYTEA or arrays. This choice was explicitly done in order to support types other than JSONB.

> One of the major justifications for work in this area is the cross-row
> redundancy for types like jsonb. I think there's ways to improve that
> across types, instead of requiring per-type work.

To be fair, there are advantages in using type-aware compression. The compression algorithm can be more efficient than a general one and in theory one can implement lazy decompression, e.g. the one that decompresses only the accessed fields of a JSONB document.

I agree though that particularly for PostgreSQL this is not necessarily the right path, especially considering the accompanying complexity. If the user cares about disk space consumption, why store JSONB in a relational DBMS in the first place? We already have a great solution for compacting the data; it was invented in the 70s and is called normalization. Since PostgreSQL is not a specialized document-oriented DBMS, I think we better focus our (far from being infinite) resources on something more people would benefit from: AIO/DIO [1] or perhaps getting rid of freezing [2], to name a few examples.

> [...]
> step back and decide which one we dislike the less, and how to fix that
> one into a shape that we no longer dislike.

For the sake of completeness, doing neither type-aware TOASTing nor compression dictionaries and leaving this area to the extension authors (e.g. ZSON) is also a possible choice, for the same reasons named above. However, having a built-in type-agnostic dictionary compression is IMO too attractive an idea to completely ignore. Especially considering the fact that the implementation was proven to be fairly simple and there was even no need to rebase the patch since November :)

I know that there were concerns [3] regarding the typmod hack. I don't like it either and am 100% open to suggestions here. This is merely a current implementation detail used in a PoC, not a fundamental design decision.

[1]: https://postgr.es/m/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de
[2]: https://postgr.es/m/CAJ7c6TOk1mx4KfF0AHkvXi%2BpkdjFqwTwvRE-JmdczZMAYnRQ0w%40mail.gmail.com
[3]: https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#v16_Patch_Triage

-- 
Best regards,
Aleksander Alekseev
Re: undersized unions
Hi,

On February 5, 2023 6:16:55 AM GMT+01:00, Tom Lane wrote:
> Michael Paquier writes:
>> On Sat, Feb 04, 2023 at 05:07:08AM -0800, Andres Freund wrote:
>>> We actually have a fair amount of code like that, but currently are
>>> escaping most of the warnings, because gcc doesn't know that palloc() is
>>> an allocator. With more optimizations (particularly with LTO), we end up
>>> with more of such warnings. I'd like to annotate palloc so gcc
>>> understands the size, as that does help to catch bugs when confusing the
>>> type. It also helps static analyzers.
>
>> Ah, that seems like a good idea in the long run.
>
> I'm kind of skeptical about whether we'll be able to get rid of all
> the resulting warnings without extremely invasive (and ugly) changes.

It's not that many sources of warnings, fwiw. But the concrete reason for posting here was that I'm wondering whether the "undersized" allocations could cause problems as-is.

On the one hand, there are compiler optimizations that could end up being a problem - imagine two branches of an if, each allocating something containing a union, with one branch assigning to a 32-bit and the other to a 64-bit integer union member. It'd IMO be reasonable for the compiler to move that register->memory move outside of the if.

On the other hand, it also just seems risky from a code-writing perspective. It's not immediately obvious that it'd be unsafe to create an on-stack Numeric by assigning *ptr. But it is.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
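To make the concern concrete, a minimal sketch of the pattern being described (the type, function and variable names are made up for illustration):

    typedef union twovals
    {
        int32   v32;
        int64   v64;
    } twovals;

    static twovals *
    make_twovals(bool small, int32 val32, int64 val64)
    {
        twovals *p;

        if (small)
        {
            /* undersized: 4 bytes allocated, but sizeof(twovals) is 8 */
            p = (twovals *) palloc(sizeof(int32));
            p->v32 = val32;
        }
        else
        {
            p = (twovals *) palloc(sizeof(twovals));
            p->v64 = val64;
        }

        /*
         * The concern above: a compiler could move the store out of the
         * branches, e.g. as a single 8-byte register->memory move after
         * the if, which would write past the 4-byte allocation.
         */
        return p;
    }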
Re: pg_stat_statements and "IN" conditions
> On Sun, Feb 05, 2023 at 10:30:25AM +0900, Michael Paquier wrote: > On Sat, Feb 04, 2023 at 06:08:41PM +0100, Dmitry Dolgov wrote: > > Here is the rebased version. To adapt to the latest changes, I've marked > > ArrayExpr with custom_query_jumble to implement this functionality, but > > tried to make the actual merge logic relatively independent. Otherwise, > > everything is the same. > > I was quickly looking at this patch, so these are rough impressions. > > + boolmerged; /* whether or not the location was marked as > + not contributing to jumble */ > > This part of the patch is a bit disturbing.. We have node attributes > to track if portions of a node should be ignored or have a location > marked, still this "merged" flag is used as an extension to track if a > location should be done or not. Is that a concept that had better be > controlled via a new node attribute? Good question. I need to think a bit more if it's possible to leverage node attributes mechanism, but at the moment I'm still inclined to extend LocationLen. The reason is that it doesn't exactly serve the tracking purpose, i.e. whether to capture a location (I have to update the code commentary), it helps differentiate cases when locations A and D are obtained from merging A B C D instead of just being A and D. I'm thinking about this in the following way: the core jumbling logic is responsible for deriving locations based on the input expressions; in the case of merging we produce less locations; pgss have to represent the result only using locations and has to be able to differentiate simple locations and locations after merging. > +-- > +-- Consts merging > +-- > +CREATE TABLE test_merge (id int, data int); > +-- IN queries > +-- No merging > > Would it be better to split this set of tests into a new file? FWIW, > I have a patch in baking process that refactors a bit the whole, > before being able to extend it so as we have more coverage for > normalized utility queries, as of now the query strings stored by > pg_stat_statements don't reflect that even if the jumbling computation > marks the location of the Const nodes included in utility statements > (partition bounds, queries of COPY, etc.). I should be able to send > that tomorrow, and my guess that you could take advantage of that > even for this thread. Sure, I'll take a look how I can benefit from your work, thanks.
Re: First draft of back-branch release notes is done
On 2023-Feb-03, Tom Lane wrote: > ... at > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f282b026787da69d88a35404cf62f1cc21cfbb7c > > As usual, please send corrections/comments by Sunday. Fix edge-case data corruption in shared tuplestores (Dmitry Astapov) If the final chunk of a large tuple being written out to disk was exactly 32760 bytes, it would be corrupted due to a fencepost bug. This is a hazard for parallelized plans that require a tuplestore, such as parallel hash join. The query would typically fail later with corrupted-data symptoms. I think this sounds really scary, because people are going to think that their stored data can get corrupted -- they don't necessarily know what a "shared tuplestore" is. Maybe "Avoid query failures in parallel hash joins" as headline? -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "The Gord often wonders why people threaten never to come back after they've been told never to return" (www.actsofgord.com)
Re: undersized unions
Hi,

On 2023-02-05 10:18:14 +0900, Michael Paquier wrote:
> On Sat, Feb 04, 2023 at 05:07:08AM -0800, Andres Freund wrote:
> > : In function 'assign':
> > :9:6: warning: array subscript 'foo[0]' is partly outside array
> >       bounds of 'unsigned char[4]' [-Warray-bounds=]
> >     9 |   p->i = i;
> >       |   ^~
> > :8:22: note: object of size 4 allocated by '__builtin_malloc'
> >     8 |   foo *p = (foo *) __builtin_malloc(sizeof(int));
> >       |                    ^
> > Compiler returned: 0
> >
> > I can't really tell if gcc is right or wrong to warn about
> > this. On the one hand it's a union, and we only access the element that
> > is actually backed by memory, on the other hand, the standard does say
> > that the size of a union is the largest element, so we are pointing to
> > something undersized.
>
> Something I have noticed, related to that.. meson reports a set of
> warnings here, not ./configure, still I apply the same set of CFLAGS
> to both. What's the difference in the meson setup that creates that,
> if I may ask? Is there a link to the way -Warray-bounds is handled?

It's possibly related to the optimization level used. I need a bit more information to provide a more educated guess: what warnings, what CFLAGS, etc.

Greetings,

Andres Freund
Re: run pgindent on a regular basis / scripted manner
On 2023-02-04 Sa 09:20, Andrew Dunstan wrote: On 2023-02-04 Sa 06:34, Andres Freund wrote: ISTM that we're closer to being able to enforce pgindent than perltidy. At the same time, I think the issue of C code in HEAD not being indented is more pressing - IME it's much more common to have to touch a lot of C code than to have to touch a lot fo perl files. So perhaps we should just start with being more stringent with C code, and once we made perltidy less noisy, add that? Sure, we don't have to tie them together. I'm currently experimenting with settings on the buildfarm code, trying to get it both stable and looking nice. Then I'll try on the Postgres core code and post some results. So here's a diff made from running perltidy v20221112 with the additional setting --valign-exclusion-list=", = => || && if unless" Essentially this abandons those bits of vertical alignment that tend to catch us out when additions are made to the code. I think this will make the code much more maintainable and result in much less perltidy churn. It would also mean that it's far more likely that what we would naturally code can be undisturbed by perltidy. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com pgperltidy-no-valign.patch.gz Description: application/gzip
Re: File descriptors in exec'd subprocesses
Hi, Unsurprisingly I'm in favor of this. On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro wrote: >Are there any more descriptors we need to think about? Postmaster's listen sockets? Saves a bunch of syscalls, at least. Logging collector pipe write end, in backends? Greetings, Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: pg_stat_statements and "IN" conditions
Dmitry Dolgov <9erthali...@gmail.com> writes: > I'm thinking about this in the following way: the core jumbling logic is > responsible for deriving locations based on the input expressions; in > the case of merging we produce less locations; pgss have to represent > the result only using locations and has to be able to differentiate > simple locations and locations after merging. Uh ... why? ISTM you're just going to elide all inside the IN, so why do you need more than a start and stop position? regards, tom lane
Re: File descriptors in exec'd subprocesses
Andres Freund writes: > On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro > wrote: >> Are there any more descriptors we need to think about? > Postmaster's listen sockets? I wonder whether O_CLOEXEC on that would be inherited by the client-communication sockets, though. That's fine ... unless you are doing EXEC_BACKEND. regards, tom lane
Re: Weird failure with latches in curculio on v15
Hi,

On 2023-02-04 10:03:54 -0800, Nathan Bossart wrote:
> On Sat, Feb 04, 2023 at 03:30:29AM -0800, Andres Freund wrote:
> > That's kind of my problem with these changes. They try to introduce new
> > abstraction layers, but don't provide real abstraction, because they're
> > very tightly bound to the way the functions were called before the
> > refactoring. And none of these restrictions are actually documented.
>
> Okay. Michael, why don't we revert the shell_restore stuff for now? Once
> the archive modules interface changes and the fix for this
> SIGTERM-during-system() problem are in, I will work through this feedback
> and give recovery modules another try. I'm still hoping to have recovery
> modules ready in time for the v16 feature freeze.
>
> My intent was to improve this code by refactoring and reducing code
> duplication, but I seem to have missed the mark. I am sorry.

FWIW, I think the patches were going in roughly the right direction; they just need a bit more work.

I don't think we should expose the proc_exit() hack, and its supporting infrastructure, to the pluggable *_command logic. It's bad enough as-is, but having to do this stuff within an extension module seems likely to end badly. There's just way too much action-at-a-distance.

I think Thomas has been hacking on an interruptible system() replacement. With that, a lot of this ugliness would be resolved.

Greetings,

Andres Freund
Re: [PATCH] Compression dictionaries for JSONB
Hi,

On 2023-02-05 13:41:17 +0300, Aleksander Alekseev wrote:
> > I don't think the approaches in either of these threads is
> > promising. They add a lot of complexity, require implementation effort
> > for each type, manual work by the administrator for column, etc.
>
> I would like to point out that compression dictionaries don't require
> per-type work.
>
> The current implementation is artificially limited to JSONB because it's
> a PoC. I was hoping to get more feedback from the community before
> proceeding further. Internally it uses type-agnostic compression and
> doesn't care whether it compresses JSON(B), XML, TEXT, BYTEA or
> arrays. This choice was explicitly done in order to support types
> other than JSONB.

I don't think we'd want much of the infrastructure introduced in the patch for type agnostic cross-row compression. A dedicated "dictionary" type as a wrapper around other types IMO is the wrong direction. This should be a relation-level optimization option, possibly automatic, not something visible to every user of the table.

I assume that manually specifying dictionary entries is a consequence of the prototype state? I don't think this is something humans are very good at, just analyzing the data to see what's useful to dictionarize seems more promising.

I also suspect that we'd have to spend a lot of effort to make compression/decompression fast if we want to handle dictionaries ourselves, rather than using the dictionary support in libraries like lz4/zstd.

> > One of the major justifications for work in this area is the cross-row
> > redundancy for types like jsonb. I think there's ways to improve that
> > across types, instead of requiring per-type work.
>
> To be fair, there are advantages in using type-aware compression. The
> compression algorithm can be more efficient than a general one and in
> theory one can implement lazy decompression, e.g. the one that
> decompresses only the accessed fields of a JSONB document.
> I agree though that particularly for PostgreSQL this is not
> necessarily the right path, especially considering the accompanying
> complexity.

I agree with both those paragraphs.

> above. However, having a built-in type-agnostic dictionary compression
> is IMO too attractive an idea to completely ignore. Especially
> considering the fact that the implementation was proven to be fairly
> simple and there was even no need to rebase the patch since November
> :)

I don't think a prototype-y patch not needing a rebase for two months is a good measure of complexity :)

Greetings,

Andres Freund
Re: File descriptors in exec'd subprocesses
Hi,

On 2023-02-05 11:06:13 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro wrote:
> >> Are there any more descriptors we need to think about?
>
> > Postmaster's listen sockets?
>
> I wonder whether O_CLOEXEC on that would be inherited by the
> client-communication sockets, though.

I'd be very surprised if it were. Nope, at least not on Linux. Verified by looking at /proc/*/fdinfo/n after adding SOCK_CLOEXEC to just the socket() call: 'flags' changes from 02 -> 0202 for the listen socket, but stays at 04002 for the client socket. If I add SOCK_CLOEXEC to accept() (well, accept4()), it does change from 04002 to 02004002.

Greetings,

Andres Freund
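For reference, a minimal sketch of the two variants being compared (Linux-specific; bind()/listen() and error handling omitted):

    #include <sys/socket.h>

    static void
    demo(void)
    {
        /* listen socket created with close-on-exec already set */
        int     listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);

        /* ... bind() and listen() elided ... */

        /*
         * SOCK_CLOEXEC on the listen socket is not inherited by accepted
         * sockets; it has to be requested again at accept time.
         */
        int     conn_fd = accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);

        (void) conn_fd;
    }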
Re: [PATCH] Compression dictionaries for JSONB
Hi,

> I assume that manually specifying dictionary entries is a consequence of
> the prototype state? I don't think this is something humans are very
> good at, just analyzing the data to see what's useful to dictionarize
> seems more promising.

No, humans are not good at it. The idea was to automate the process and build the dictionaries automatically, e.g. during VACUUM.

> I don't think we'd want much of the infrastructure introduced in the
> patch for type agnostic cross-row compression. A dedicated "dictionary"
> type as a wrapper around other types IMO is the wrong direction. This
> should be a relation-level optimization option, possibly automatic, not
> something visible to every user of the table.

So to clarify, are we talking about tuple-level compression? Or perhaps page-level compression?

Implementing page-level compression should be *relatively* straightforward. As an example, this was previously done in InnoDB. Basically, InnoDB compresses the entire page, then rounds the result up to 1K, 2K, 4K, 8K, etc. and stores the result in a corresponding fork ("fork" in PG terminology), similarly to how a SLAB allocator works. Additionally, a page_id -> fork_id map has to be maintained, probably in yet another fork, similarly to the visibility map. A compressed page can change forks after being modified, since a modification may change the size of the compressed page. The buffer manager is unaffected and deals only with uncompressed pages. (I'm not an expert in InnoDB and this is my very rough understanding of how its compression works.) I believe this can be implemented as a TAM. Whether this would be "dictionary" compression is debatable, but it gives the users similar benefits, give or take. The advantage is that users wouldn't have to define any dictionaries manually, nor would the DBMS have to do it during VACUUM or otherwise.

> I also suspect that we'd have to spend a lot of effort to make
> compression/decompression fast if we want to handle dictionaries
> ourselves, rather than using the dictionary support in libraries like
> lz4/zstd.

That's a reasonable concern, can't argue with that.

> I don't think a prototype-y patch not needing a rebase for two months is a
> good measure of complexity :)

It's worth noting that I also invested quite some time into reviewing type-aware TOASTers :) I just chose to keep my personal opinion about the complexity of that patch to myself this time, since obviously I'm a bit biased. However, if you are curious it's all in the corresponding thread.

-- 
Best regards,
Aleksander Alekseev
Re: Make EXPLAIN generate a generic plan for a parameterized query
On Fri, 2023-02-03 at 15:11 -0500, Tom Lane wrote: > I can think of a couple of possible ways forward: > > * Fix things so that the generic parameters appear to have NULL > values when inspected during executor startup. I'm not sure > how useful that'd be though. In partition-pruning cases that'd > lead to EXPLAIN (GENERIC_PLAN) showing the plan with all > partitions pruned, other than the one for NULL values if there > is one. Doesn't sound too helpful. > > * Invent another executor flag that's a "stronger" version of > EXEC_FLAG_EXPLAIN_ONLY, and pass that when any generic parameters > exist, and check it in CreatePartitionPruneState to decide whether > to do startup pruning. This avoids changing behavior for existing > cases, but I'm not sure how much it has to recommend it otherwise. > Skipping the startup prune step seems like it'd produce misleading > results in another way, ie showing too many partitions not too few. > > This whole business of partition pruning dependent on the > generic parameters is making me uncomfortable. It seems like > we *can't* show a plan that is either representative of real > execution or comparable to what you get from more-traditional > EXPLAIN usage. Maybe we need to step back and think more. I slept over it, and the second idea now looks like the the right approach to me. My idea of seeing a generic plan is that plan-time partition pruning happens, but not execution-time pruning, so that I get no "subplans removed". Are there any weird side effects of skipping the startup prune step? Anyway, attached is v7 that tries to do it that way. It feels fairly good to me. I invented a new executor flag EXEC_FLAG_EXPLAIN_GENERIC. To avoid having to change all the places that check EXEC_FLAG_EXPLAIN_ONLY to also check for the new flag, I decided that the new flag can only be used as "add-on" to EXEC_FLAG_EXPLAIN_ONLY. Yours, Laurenz Albe From cd0b5a1a4f301bb7fad9088d5763989f5dde4636 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Sun, 5 Feb 2023 18:11:57 +0100 Subject: [PATCH] Add EXPLAIN option GENERIC_PLAN This allows EXPLAIN to generate generic plans for parameterized statements (that have parameter placeholders like $1 in the statement text). Invent a new executor flag EXEC_FLAG_EXPLAIN_GENERIC that disables runtime partition pruning for such plans: that would fail if the non-existing parameters are involved, and we don't want to remove subplans anyway. Author: Laurenz Albe Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel%40cybertec.at --- doc/src/sgml/ref/explain.sgml | 15 + src/backend/commands/explain.c| 11 +++ src/backend/executor/execMain.c | 3 ++ src/backend/executor/execPartition.c | 9 -- src/backend/parser/analyze.c | 29 + src/include/commands/explain.h| 1 + src/include/executor/executor.h | 18 +++ src/test/regress/expected/explain.out | 46 +++ src/test/regress/sql/explain.sql | 29 + 9 files changed, 152 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index d4895b9d7d..a1fdd31bc7 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -40,6 +40,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] statementboolean ] COSTS [ boolean ] SETTINGS [ boolean ] +GENERIC_PLAN [ boolean ] BUFFERS [ boolean ] WAL [ boolean ] TIMING [ boolean ] @@ -167,6 +168,20 @@ ROLLBACK; + +GENERIC_PLAN + + + Generate a generic plan for the statement (see + for details about generic plans). 
The statement can contain parameter + placeholders like $1 (but then it has to be a statement + that supports parameters). This option cannot be used together with + ANALYZE, since a statement with unknown parameters + cannot be executed. + + + + BUFFERS diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 35c23bd27d..37ea4e5035 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -190,6 +190,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, es->wal = defGetBoolean(opt); else if (strcmp(opt->defname, "settings") == 0) es->settings = defGetBoolean(opt); + else if (strcmp(opt->defname, "generic_plan") == 0) + es->generic = defGetBoolean(opt); else if (strcmp(opt->defname, "timing") == 0) { timing_set = true; @@ -227,6 +229,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, parser_errposition(pstate, opt->location))); } + /* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */ + if (es->generic && es->analyze) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("EXPLAIN ANALYZE cannot be used with GENERIC_PLAN"))); + + /* check that WAL is used with EXPLAIN ANALYZE */ if (es->wal && !es->ana
add a "status" column to the pg_rules system view
Hi,

I noticed that the pg_rules system view (in all PG versions) does not include a "status" field (like the tgenabled column in pg_trigger). The official view (from the 15.1 sources) is:

CREATE VIEW pg_rules AS
    SELECT N.nspname AS schemaname,
           C.relname AS tablename,
           R.rulename AS rulename,
           pg_get_ruledef(R.oid) AS definition
    FROM (pg_rewrite R JOIN pg_class C ON (C.oid = R.ev_class))
         LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE R.rulename != '_RETURN';

I propose adding a new field "rule_enabled" so that the rule status can be obtained easily and officially in all PG versions:

CREATE VIEW pg_rules AS
    SELECT N.nspname AS schemaname,
           C.relname AS tablename,
           R.rulename AS rulename,
           R.ev_enabled AS rule_enabled,
           pg_get_ruledef(R.oid) AS definition
    FROM (pg_rewrite R JOIN pg_class C ON (C.oid = R.ev_class))
         LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE R.rulename != '_RETURN';

What do you think about that?

Thanks,
Albin
Re: pglz compression performance, take two
On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin wrote:
> Hello! Please find attached v8.

I got some interesting feedback from some patch users. There was an oversight that frequently yielded results that are 1, 2, or 3 bytes longer than expected. Looking closer, I found that the correctness of the last 3-byte tail is checked in two places. PFA a fix for this. The previously compressed data was correct, however in some cases a few bytes longer than the result of the current pglz implementation.

Thanks!

Best regards, Andrey Borodin.

v9-0001-Reorganize-pglz-compression-code.patch
Description: Binary data

v9-0002-Fix-oversight-for-compression-of-last-4-bytes.patch
Description: Binary data
Re: [PATCH] Compression dictionaries for JSONB
Hi,

On 2023-02-05 20:05:51 +0300, Aleksander Alekseev wrote:
> > I don't think we'd want much of the infrastructure introduced in the
> > patch for type agnostic cross-row compression. A dedicated "dictionary"
> > type as a wrapper around other types IMO is the wrong direction. This
> > should be a relation-level optimization option, possibly automatic, not
> > something visible to every user of the table.
>
> So to clarify, are we talking about tuple-level compression? Or
> perhaps page-level compression?

Tuple level.

What I think we should do is basically this: When we compress datums, we know the table being targeted. If there's a pg_attribute parameter indicating we should, we can pass a prebuilt dictionary to the LZ4/zstd [de]compression functions.

It's possible we'd need to use a somewhat extended header for such compressed datums, to reference the dictionary "id" to be used when decompressing, if the compression algorithms don't already have that in one of their headers, but that's entirely doable.

A quick demo of the effect size:

# create data to train dictionary with, use subset to increase realism
mkdir /tmp/pg_proc_as_json/;
CREATE EXTENSION adminpack;
SELECT pg_file_write('/tmp/pg_proc_as_json/'||oid||'.raw', to_json(pp)::text, true) FROM pg_proc pp LIMIT 2000;

# build dictionary
zstd --train -o /tmp/pg_proc_as_json.dict /tmp/pg_proc_as_json/*.raw

# create more data
SELECT pg_file_write('/tmp/pg_proc_as_json/'||oid||'.raw', to_json(pp)::text, true) FROM pg_proc pp;

# compress without dictionary
lz4 -k -m /tmp/pg_proc_as_json/*.raw
zstd -k /tmp/pg_proc_as_json/*.raw

# measure size
cat /tmp/pg_proc_as_json/*.raw|wc -c; cat /tmp/pg_proc_as_json/*.lz4|wc -c; cat /tmp/pg_proc_as_json/*.zst|wc -c

# compress with dictionary
rm -f /tmp/pg_proc_as_json/*.{lz4,zst};
lz4 -k -D /tmp/pg_proc_as_json.dict -m /tmp/pg_proc_as_json/*.raw
zstd -k -D /tmp/pg_proc_as_json.dict /tmp/pg_proc_as_json/*.raw

I did the same with zstd. Here are the results:

             lz4        zstd      uncompressed
no dict      1328794    982497    3898498
dict         375070     267194

I'd say the effect of the dictionary is pretty impressive. And remember, this is with the dictionary having been trained on a subset of the data.

As a comparison, here's all the data compressed at once:

             lz4        zstd
no dict      180231     104913
dict         179814     106444

Unsurprisingly the dictionary doesn't help much, because the compression algorithm can "natively" see the duplication.

- Andres
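For what it's worth, a rough sketch of what "pass a prebuilt dictionary to the [de]compression functions" could look like on the zstd side, using the public libzstd dictionary API (buffer management and the extended datum header carrying the dictionary id are hand-waved here; this is not code from any patch):

    #include <zstd.h>

    /* built once, e.g. per relation/attribute, from the trained dictionary bytes */
    static ZSTD_CDict *cdict;
    static ZSTD_DDict *ddict;

    static void
    init_dicts(const void *dict_buf, size_t dict_size)
    {
        cdict = ZSTD_createCDict(dict_buf, dict_size, 3 /* compression level */);
        ddict = ZSTD_createDDict(dict_buf, dict_size);
    }

    static size_t
    compress_datum(ZSTD_CCtx *cctx, void *dst, size_t dst_cap,
                   const void *src, size_t src_len)
    {
        return ZSTD_compress_usingCDict(cctx, dst, dst_cap, src, src_len, cdict);
    }

    static size_t
    decompress_datum(ZSTD_DCtx *dctx, void *dst, size_t dst_cap,
                     const void *src, size_t src_len)
    {
        /* the dictionary would be looked up via the id stored with the datum */
        return ZSTD_decompress_usingDDict(dctx, dst, dst_cap, src, src_len, ddict);
    }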
Re: First draft of back-branch release notes is done
Alvaro Herrera writes: > On 2023-Feb-03, Tom Lane wrote: >> Fix edge-case data corruption in shared tuplestores (Dmitry Astapov) > I think this sounds really scary, because people are going to think that > their stored data can get corrupted -- they don't necessarily know what > a "shared tuplestore" is. Maybe "Avoid query failures in parallel hash > joins" as headline? Hmmm ... are we sure it *can't* lead to corruption of stored data, if this happens during an INSERT or UPDATE plan? I'll grant that such a case seems pretty unlikely though, as the bogus data retrieved from the tuplestore would have to not cause a failure within the query before it can get written out. Also, aren't shared tuplestores used in more places than just parallel hash join? I mentioned that as an example, not an exhaustive list of trouble spots. regards, tom lane
Re: pg_stat_statements and "IN" conditions
> On Sun, Feb 05, 2023 at 11:02:32AM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> > I'm thinking about this in the following way: the core jumbling logic is
> > responsible for deriving locations based on the input expressions; in
> > the case of merging we produce less locations; pgss have to represent
> > the result only using locations and has to be able to differentiate
> > simple locations and locations after merging.
>
> Uh ... why? ISTM you're just going to elide all inside the IN,
> so why do you need more than a start and stop position?

Exactly, start and stop positions. But if there were no information that merging was applied, the following queries would look the same after jumbling, right?

-- input query
SELECT * FROM test_merge WHERE id IN (1, 2);
-- jumbling result, two LocationLen, for values 1 and 2
SELECT * FROM test_merge WHERE id IN ($1, $2);

-- input query
SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
-- jumbling result, two LocationLen after merging, for values 1 and 10
SELECT * FROM test_merge WHERE id IN (...);
-- without remembering about merging, the result would be
SELECT * FROM test_merge WHERE id IN ($1, $2);
Re: First draft of back-branch release notes is done
"Jonathan S. Katz" writes: > On Feb 4, 2023, at 10:24 AM, Tom Lane wrote: >> “Prevent clobbering of cached parsetrees…Bad things could happen if…” > While I chuckled over the phrasing, I’m left to wonder what the “bad things” > are, in case I > need to check an older version to see if I’m susceptible to this bug. Fair. I was trying to avoid committing to specific consequences. The assertion failure seen in the original report (#17702) wouldn't occur for typical users, but they might see crashes or "unexpected node type" failures. Maybe we can say that instead. regards, tom lane
Re: First draft of back-branch release notes is done
I wrote: > Alvaro Herrera writes: >> I think this sounds really scary, because people are going to think that >> their stored data can get corrupted -- they don't necessarily know what >> a "shared tuplestore" is. Maybe "Avoid query failures in parallel hash >> joins" as headline? Maybe less scary if we make it clear we're talking about a temporary file? Fix edge-case corruption of temporary data within shared tuplestores (Dmitry Astapov) If the final chunk of a large tuple being written out to a temporary file was exactly 32760 bytes, it would be corrupted due to a fencepost bug. This is a hazard for parallelized plans that require a tuplestore, such as parallel hash join. The query would typically fail later with corrupted-data symptoms. regards, tom lane
Re: First draft of back-branch release notes is done
On Mon, Feb 6, 2023 at 8:57 AM Tom Lane wrote: > Alvaro Herrera writes: > > On 2023-Feb-03, Tom Lane wrote: > >> Fix edge-case data corruption in shared tuplestores (Dmitry Astapov) > > > I think this sounds really scary, because people are going to think that > > their stored data can get corrupted -- they don't necessarily know what > > a "shared tuplestore" is. Maybe "Avoid query failures in parallel hash > > joins" as headline? > > Hmmm ... are we sure it *can't* lead to corruption of stored data, > if this happens during an INSERT or UPDATE plan? I'll grant that > such a case seems pretty unlikely though, as the bogus data > retrieved from the tuplestore would have to not cause a failure > within the query before it can get written out. Agreed. I think you have to be quite unlucky to hit this in the first place (very large tuples with very particular alignment), and then you'd be highly likely to fail in some way due to corrupted tuple size, making permanent corruption extremely unlikely. > Also, aren't shared tuplestores used in more places than just > parallel hash join? I mentioned that as an example, not an > exhaustive list of trouble spots. Shared file sets (= a directory of temp files with automatic cleanup) are used by more things, but shared tuplestores (= a shared file set with a tuple-oriented interface on top, to support partial scan) are currently only used by PHJ.
Re: First draft of back-branch release notes is done
Thomas Munro writes: > On Mon, Feb 6, 2023 at 8:57 AM Tom Lane wrote: >> Also, aren't shared tuplestores used in more places than just >> parallel hash join? I mentioned that as an example, not an >> exhaustive list of trouble spots. > Shared file sets (= a directory of temp files with automatic cleanup) > are used by more things, but shared tuplestores (= a shared file set > with a tuple-oriented interface on top, to support partial scan) are > currently only used by PHJ. Oh, okay. I'll change it to say "corruption within parallel hash join", then, and not use the word "tuplestore" at all. regards, tom lane
Re: Support logical replication of DDLs
On Fri, Feb 3, 2023 at 9:21 PM Alvaro Herrera wrote: > > On 2023-Feb-03, Peter Smith wrote: > ... > > 3. ExecuteGrantStmt > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > + istmt.grantor_uid = grantor; > > + > > > > SUGGESTION (comment) > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > Is istmt really "the parse tree" actually? As I recall, it's a derived > struct that's created during execution of the grant/revoke command, so > modifying the comment like this would be a mistake. > I thought this comment was analogous to another one from this same patch 0001 (see seclabel.c), so the suggested change above was simply to make the wording consistent. @@ -134,6 +134,9 @@ ExecSecLabelStmt(SecLabelStmt *stmt) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("must specify provider when multiple security label providers have been loaded"))); provider = (LabelProvider *) linitial(label_provider_list); + + /* Copy the provider name to the parsetree, needed for DDL deparsing of SecLabelStmt */ + stmt->provider = pstrdup(provider->provider_name); So if the suggestion for the ExecuteGrantStmt comment was a mistake then perhaps the ExecSecLabelStmt comment is wrong also? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Pluggable toaster
Hi hackers!

In response to an opinion in the thread on compression dictionaries for JSONB [1]:

> The approaches are completely different,
> but they seem to be trying to fix the same problem: the fact that the
> default TOAST stuff isn't good enough for JSONB.

The problem, actually, is that the default TOAST is often not good for modern loads and amounts of data. Pluggable TOAST is based not only on pure enthusiasm, but on demands and tickets from production databases. The main demand is an effective and transparent storage subsystem for large values of some problematic types of data, which we already have, with proven efficiency. So we're really quite surprised that it has got so little feedback. We've got some opinions on the approach, but there is no general one on the approach itself except doubts about whether the TOAST mechanism needs revision at all.

Currently we're busy revising the whole Pluggable TOAST API to make it available as an extension, based on hooks, to minimize changes in the core. It will be available soon.

[1] https://postgr.es/m/20230203095540.zutul5vmsbmantbm@alvherre.pgsql

-- 
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Re: Weird failure with latches in curculio on v15
On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote: > - Should we include archive_cleanup_command into the recovery modules > at all? We've discussed offloading that from the checkpointer, and it > makes the failure handling trickier when it comes to unexpected GUC > configurations, for one. The same may actually apply to > restore_end_command. Though it is done in the startup process now, > there may be an argument to offload that somewhere else based on the > timing of the end-of-recovery checkpoint. My opinion on this stuff is > that only including restore_command in the modules would make most > users I know of happy enough as it removes the overhead of the command > invocation from the startup process, if able to replay things fast > enough so as the restore command is the bottleneck. > restore_end_command would be simple enough, but if there is a wish to > redesign the startup process to offload it somewhere else, then the > recovery module makes backward-compatibility concerns harder to think > about in the long-term. I agree. I think we ought to first focus on getting the recovery modules interface and restore_command functionality in place before we take on more difficult things like archive_cleanup_command. But I still think the archive_cleanup_command/recovery_end_command functionality should eventually be added to recovery modules. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Weird failure with latches in curculio on v15
Hi,

On 2023-02-05 14:19:38 -0800, Nathan Bossart wrote:
> On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote:
> > - Should we include archive_cleanup_command into the recovery modules
> > at all? We've discussed offloading that from the checkpointer, and it
> > makes the failure handling trickier when it comes to unexpected GUC
> > configurations, for one. The same may actually apply to
> > restore_end_command. Though it is done in the startup process now,
> > there may be an argument to offload that somewhere else based on the
> > timing of the end-of-recovery checkpoint. My opinion on this stuff is
> > that only including restore_command in the modules would make most
> > users I know of happy enough as it removes the overhead of the command
> > invocation from the startup process, if able to replay things fast
> > enough so as the restore command is the bottleneck.
> > restore_end_command would be simple enough, but if there is a wish to
> > redesign the startup process to offload it somewhere else, then the
> > recovery module makes backward-compatibility concerns harder to think
> > about in the long-term.
>
> I agree. I think we ought to first focus on getting the recovery modules
> interface and restore_command functionality in place before we take on more
> difficult things like archive_cleanup_command. But I still think the
> archive_cleanup_command/recovery_end_command functionality should
> eventually be added to recovery modules.

I tend not to agree. If you make the API that small, you're IME likely to end up with something that looks somewhat incoherent once extended.

The more I think about it, the less I am convinced that one-callback-per-segment, invoked just before needing the file, is the right approach to address the performance issues of restore_command. The main performance issue isn't the shell invocation overhead, it's synchronously needing to restore the archive before replay can continue. It's also gonna be slow if a restore module copies the segment from a remote system - the latency is the problem.

The only way the restore module approach can do better is to asynchronously restore ahead of the current segment. But for that the API really isn't suited well. The signature of the relevant callback is:

> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path,
> +                                   const char *lastRestartPointFileName);

That's not very suited to restoring "ahead of time". You need to parse "file" to figure out whether a segment or something else is restored, turn "file" back into an LSN, figure out where to store further segments, somehow hand off to some background worker, etc. That doesn't strike me as something we want to happen inside multiple restore libraries.

I think at the very least you'd want to have a separate callback for restoring segments than for restoring other files. But more likely a separate callback for each type of file to be restored.

For the timeline history case, a parameter indicating that we don't want to restore the file, just to see if there's a conflict, would make sense.

For the segment files, we'd likely need a parameter to indicate whether the restore is random or not.

Greetings,

Andres Freund
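To illustrate the direction being suggested, a purely hypothetical sketch of per-file-type callbacks (none of these names exist in the proposed patch; they are made up here for illustration):

    /* hypothetical: restore one WAL segment, possibly ahead of need */
    typedef bool (*RestoreWalSegmentCB) (XLogSegNo segno, TimeLineID tli,
                                         const char *destpath,
                                         bool random_access);

    /* hypothetical: restore (or just probe for) a timeline history file */
    typedef bool (*RestoreTimelineHistoryCB) (TimeLineID tli,
                                              const char *destpath,
                                              bool probe_only);

    typedef struct RecoveryModuleCallbacks
    {
        RestoreWalSegmentCB         restore_segment_cb;
        RestoreTimelineHistoryCB    restore_history_cb;
    } RecoveryModuleCallbacks;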
Re: Pluggable toaster
Hi, On 2023-02-06 00:10:50 +0300, Nikita Malakhov wrote: > The problem, actually, is that the default TOAST is often not good for > modern loads and amounts of data.Pluggable TOAST is based not only > on pure enthusiasm, but on demands and tickets from production > databases. > The main demand is effective and transparent storage subsystem > for large values for some problematic types of data, which we already have, > with proven efficiency. > So we're really quite surprised that it has got so little feedback. We've > got > some opinions on approach but there is no any general one on the approach > itself except doubts about the TOAST mechanism needs revision at all. The problem for me is that what you've been posting doesn't actually fix any problem, but instead introduces lots of new code and complexity. > Currently we're busy revising the whole Pluggable TOAST API to make it > available as an extension and based on hooks to minimize changes in > the core. It will be available soon. I don't think we should accept that either. It still doesn't improve anything about toast, it just allows you to do such improvements out of core. Greetings, Andres Freund
Re: proposal: psql: psql variable BACKEND_PID
>> Clearly, it is hard to write a regression test for an externally volatile
>> value. `SELECT sign(:BACKEND_PID)` would technically do the job, if we're
>> striving for completeness.
>
> I did a simple test - :BACKEND_PID should be equal to pg_backend_pid()

Even better.

>> Do we want to change %p to pull from this variable and save the
>> snprintf()? Not a measurable savings, more of a don't-repeat-yourself thing.
>
> I checked the code, and I don't think so. The current state is safer (I
> think). The psql variables are not protected, so I think it is safer and
> better to read the value for the prompt directly via the libpq API instead
> of reading the possibly "customized" variable. I see a possible
> inconsistency, but again, the same inconsistency can exist for the USER and
> DBNAME variables too, and I am not able to say what is significantly
> better. The prompt just shows the real value, and the related variable is
> more or less a copy made at connection time.
>
> I am not 100% sure what is better in this area, but the change requires
> wider and more general discussion, and I don't think the benefits of a
> possible change are enough. There are just two possible solutions - we can
> protect some psql variables (and that can cause some compatibility issues),
> or we need to accept that the value in the prompt can be fake. It is better
> not to touch it :-).

I agree it is out of scope of this patch, but I like the idea of protected psql variables, and I doubt users are intentionally re-using these vars to any positive effect. The more likely case is that newer psql vars just happen to use the names chosen by somebody's script in the past.

> done

Everything passes. Docs look good. Test looks good.
Re: Temporary tables versus wraparound... again
Hi, On 2023-02-04 17:12:36 +0100, Greg Stark wrote: > I think that was spurious. It looked good when we looked at it yesterday. > The rest that failed seemed unrelated and was also taking on my SSL patch > too. I don't think the SSL failures are related to the failure of this patch. That was in one of the new tests executed as part of the main regression tests: https://api.cirrus-ci.com/v1/artifact/task/6418299974582272/testrun/build/testrun/regress/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/temp.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/temp.out --- /tmp/cirrus-ci-build/src/test/regress/expected/temp.out 2023-02-04 05:43:14.225905000 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/temp.out 2023-02-04 05:46:57.46825 + @@ -108,7 +108,7 @@ :old_relfrozenxid <> :new_relfrozenxid AS frozenxid_advanced; pages_analyzed | pages_reset | tuples_analyzed | tuples_reset | frozenxid_advanced +-+-+--+ - t | t | t | t| t + t | t | t | t| f (1 row) -- The toast table can't be analyzed so relpages and reltuples can't Whereas the SSL test once failed in subscription/031_column_list (a test with some known stability issues) and twice in postgres_fdw. Unfortunately the postgres_fdw failures are failing to upload: [17:41:25.601] Failed to upload artifacts: Put "https://storage.googleapis.com/cirrus-ci-5309429912436736-3271c9/artifacts/postgresql-cfbot/postgresql/6061134453669888/testrun/build/testrun/runningcheck.log?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=cirrus-ci%40cirrus-ci-community.iam.gserviceaccount.com%2F20230128%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20230128T174012Z&X-Goog-Expires=600&X-Goog-SignedHeaders=host%3Bx-goog-content-length-range%3Bx-goog-meta-created_by_task&X-Goog-Signature=6f5606e3966d68060a14077deb93ed5bf680c4636e6409e5eba6ca8f1ff9b11302c1b5605089e2cd759fd90d1542a4e2c794fd4c1210f04b056d7e09db54d3e983c34539fb4c24787b659189c27e1b6d0ebc1d1807b38a066c10e62fa57a374c3a7fbc610edddf1dfe900b3c788c8d7d7ded3366449b4520992c5ed7a3136c7103b7a668b591542bba58a32f5a84cb21bbeeafea09dc525d1631a5f413a0f98df43cc90ebf6c4206e6df61606bc634c3a8116c53d7c6dd4bc5b26547cd7d1a1633839ace694b73426267a9f434317350905b905b9c88132be14a7762c2f204b8072a3bd7e4e1d30217d9e60102d525b08e28bcfaabae80fba734a1015d8eb0a7": http2: request body larger than specified content length Hm, I suspect the problem is that we didn't shut down the server due to the error, so the log file was changing while cirrus was trying to upload. Greetings, Andres Freund
Re: proposal: psql: psql variable BACKEND_PID
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

A small but helpful feature.

The new status of this patch is: Ready for Committer
Re: Weird failure with latches in curculio on v15
On Sun, Feb 05, 2023 at 09:49:57AM +0900, Michael Paquier wrote: > Yes, at this stage a revert of the refactoring with shell_restore.c is > the best path forward. Done that now, as of 2f6e15a. -- Michael signature.asc Description: PGP signature
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi, On 2023-02-03 13:27:24 +0300, Damir Belyalov wrote: > @@ -625,6 +628,173 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, > ResultRelInfo *rri, > miinfo->bufferedBytes += tuplen; > } > > +/* > + * Safely reads source data, converts to a tuple and fills tuple buffer. > + * Skips some data in the case of failed conversion if data source for > + * a next tuple can be surely read without a danger. > + */ > +static bool > +SafeCopying(CopyFromState cstate, ExprContext *econtext, TupleTableSlot > *myslot) > + BeginInternalSubTransaction(NULL); > + CurrentResourceOwner = sfcstate->oldowner; I don't think this is the right approach. Creating a subtransaction for each row will cause substantial performance issues. We now can call data type input functions without throwing errors, see InputFunctionCallSafe(). Use that to avoid throwing an error instead of catching it. Greetings, Andres Freund
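For reference, the suggested direction boils down to something like the sketch below (an illustration, not the patch under review), using the ErrorSaveContext/InputFunctionCallSafe() machinery that already exists in fmgr.h and nodes/miscnodes.h:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/miscnodes.h"

/*
 * Sketch: attempt a datatype input conversion without throwing.  On bad
 * input InputFunctionCallSafe() returns false instead of raising ERROR,
 * so the caller can count or skip the offending row without opening a
 * subtransaction.
 */
static bool
try_input_conversion(FmgrInfo *flinfo, char *string,
					 Oid typioparam, int32 typmod, Datum *value)
{
	ErrorSaveContext escontext = {T_ErrorSaveContext};

	/* On failure this returns false rather than throwing ERROR */
	return InputFunctionCallSafe(flinfo, string, typioparam, typmod,
								 (Node *) &escontext, value);
}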
Re: Weird failure with latches in curculio on v15
On Sun, Feb 05, 2023 at 03:01:57PM -0800, Andres Freund wrote: > I think at the very least you'd want to have a separate callback for > restoring segments than for restoring other files. But more likely a > separate callback for each type of file to be restored. > > For the timeline history case an parameter indicating that we don't want > to restore the file, just to see if there's a conflict, would make > sense. That seems reasonable. > For the segment files, we'd likely need a parameter to indicate whether > the restore is random or not. Wouldn't this approach still require each module to handle restoring ahead of time? I agree that the shell overhead isn't the main performance issue, but it's unclear to me how much of this should be baked into PostgreSQL. I mean, we could introduce a GUC that tells us how far ahead to restore and have a background worker (or multiple background workers) asynchronously pull files into a staging directory via the callbacks. Is that the sort of scope you are envisioning? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Weird failure with latches in curculio on v15
Hi, On 2023-02-05 15:57:47 -0800, Nathan Bossart wrote: > > For the segment files, we'd likely need a parameter to indicate whether > > the restore is random or not. > > Wouldn't this approach still require each module to handle restoring ahead > of time? Yes, to some degree at least. I was just describing a few pretty obvious improvements. The core code can make that a lot easier though. The problem of where to store such files can be provided by core code (presumably a separate directory). A GUC for aggressiveness can be provided. Etc. > I agree that the shell overhead isn't the main performance issue, > but it's unclear to me how much of this should be baked into > PostgreSQL. I don't know fully either. But just reimplementing all of it in different modules doesn't seem like a sane approach either. A lot of it is policy that we need to solve once, centrally. > I mean, we could introduce a GUC that tells us how far ahead to > restore and have a background worker (or multiple background workers) > asynchronously pull files into a staging directory via the callbacks. > Is that the sort of scope you are envisioning? Closer, at least. Greetings, Andres Freund
Re: Add progress reporting to pg_verifybackup
On Sat, Feb 04, 2023 at 12:32:15PM +0900, Michael Paquier wrote: > That seems rather OK seen from here. I'll see about getting that > applied except if there is an objection of any kind. Okay, I have looked at that again this morning and I've spotted one tiny issue: specifying --progress with --skip-checksums does not really make sense. Ignoring entries with a bad size would lead to incorrect progress report (for example, say an entry in the manifest has a largely oversized size number), so your approach on this side is correct. The application of the ignore list via -i is also correct, as a patch matching with should_ignore_relpath() does not compute an extra size for total_size. I was also wondering for a few minutes while on it whether it would have been cleaner to move the check for should_ignore_relpath() directly in verify_file_checksum() and verify_backup_file(), but nobody has complained about that as being a problem, either. What do you think about the updated version attached? -- Michael From 08a33cc650fb3cd58e742a79b232dbd5d9843008 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 6 Feb 2023 09:34:19 +0900 Subject: [PATCH v4] Add progress reporting to pg_verifybackup This adds a new option to pg_verifybackup called -P/--progress, showing every second some information about the state of checksum verification. Similarly to what is done for pg_rewind and pg_basebackup, the information printed in the progress report consists of the current amount of data computed and the total amount of data to compute. This could be extended later on. --- src/bin/pg_verifybackup/pg_verifybackup.c | 86 ++- src/bin/pg_verifybackup/t/004_options.pl | 21 -- doc/src/sgml/ref/pg_verifybackup.sgml | 15 3 files changed, 116 insertions(+), 6 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index 7634dfc285..8d5cb1c72b 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -16,12 +16,14 @@ #include #include #include +#include #include "common/hashfn.h" #include "common/logging.h" #include "fe_utils/simple_list.h" #include "getopt_long.h" #include "parse_manifest.h" +#include "pgtime.h" /* * For efficiency, we'd like our hash table containing information about the @@ -57,6 +59,8 @@ typedef struct manifest_file bool matched; bool bad; } manifest_file; +#define should_check_checksums(m) \ + (((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE)) /* * Define a hash table which we can use to store information about the files @@ -147,10 +151,18 @@ static void report_fatal_error(const char *pg_restrict fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn(); static bool should_ignore_relpath(verifier_context *context, char *relpath); +static void progress_report(bool finished); static void usage(void); static const char *progname; +/* options */ +static bool show_progress = false; + +/* Progress indicators */ +static uint64 total_size = 0; +static uint64 done_size = 0; + /* * Main entry point. 
*/ @@ -162,6 +174,7 @@ main(int argc, char **argv) {"ignore", required_argument, NULL, 'i'}, {"manifest-path", required_argument, NULL, 'm'}, {"no-parse-wal", no_argument, NULL, 'n'}, + {"progress", no_argument, NULL, 'P'}, {"quiet", no_argument, NULL, 'q'}, {"skip-checksums", no_argument, NULL, 's'}, {"wal-directory", required_argument, NULL, 'w'}, @@ -219,7 +232,7 @@ main(int argc, char **argv) simple_string_list_append(&context.ignore_list, "recovery.signal"); simple_string_list_append(&context.ignore_list, "standby.signal"); - while ((c = getopt_long(argc, argv, "ei:m:nqsw:", long_options, NULL)) != -1) + while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1) { switch (c) { @@ -241,6 +254,9 @@ main(int argc, char **argv) case 'n': no_parse_wal = true; break; + case 'P': +show_progress = true; +break; case 'q': quiet = true; break; @@ -277,6 +293,14 @@ main(int argc, char **argv) exit(1); } + /* Complain if the specified arguments conflict */ + if (show_progress && quiet) + pg_fatal("cannot specify both %s and %s", + "-P/--progress", "-q/--quiet"); + if (show_progress && skip_checksums) + pg_fatal("cannot specify both %s and %s", + "-P/--progress", "-s/--skip-checksums"); + /* Unless --no-parse-wal was specified, we will need pg_waldump. */ if (!no_parse_wal) { @@ -638,6 +662,10 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath) m->bad = true; } + /* Update statistics for progress report, if necessary */ + if (show_progress && should_check_checksums(m)) + total_size += m->size; + /* * We don't verify checksums at this stage. We first finish verifying that * we have the expected set of files with the expected sizes, and only @@ -675,10 +7
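For readers unfamiliar with the pattern, a throttled reporter over the done_size/total_size counters looks roughly like the sketch below; this is a generic illustration, not the progress_report() code from the patch itself:

#include <stdio.h>
#include <stdint.h>
#include <time.h>

/*
 * Generic sketch of a once-per-second progress reporter: print the amount
 * verified so far versus the total, and finish the line when done.
 */
static void
progress_report_sketch(uint64_t done_size, uint64_t total_size, int finished)
{
	static time_t last_report = 0;
	time_t		now = time(NULL);
	int			percent = 0;

	if (!finished && now == last_report)
		return;					/* limit output to once per second */
	last_report = now;

	if (total_size > 0)
		percent = (int) ((done_size * 100) / total_size);

	fprintf(stderr, "%llu/%llu kB (%d%%) verified%c",
			(unsigned long long) (done_size / 1024),
			(unsigned long long) (total_size / 1024),
			percent, finished ? '\n' : '\r');
}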
Re: Weird failure with latches in curculio on v15
On Sun, Feb 05, 2023 at 04:07:50PM -0800, Andres Freund wrote: > On 2023-02-05 15:57:47 -0800, Nathan Bossart wrote: >> I agree that the shell overhead isn't the main performance issue, >> but it's unclear to me how much of this should be baked into >> PostgreSQL. > > I don't know fully either. But just reimplementing all of it in > different modules doesn't seem like a sane approach either. A lot of it > is policy that we need to solve once, centrally. > >> I mean, we could introduce a GUC that tells us how far ahead to >> restore and have a background worker (or multiple background workers) >> asynchronously pull files into a staging directory via the callbacks. >> Is that the sort of scope you are envisioning? > > Closer, at least. Got it. I suspect we'll want to do something similar for archive modules eventually, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Hi, On 2023-02-01 14:20:19 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-02-01 09:49:00 -0800, Andres Freund wrote: > >> On 2023-02-01 12:23:27 -0500, Tom Lane wrote: > >>> And the minimum version appears to be newer than RHEL8's 1.8.2, which > >>> I find pretty unfortunate. > > > Unfortunately the test script accidentally pulled in ninja from epel, > > hence not noticing the issue. > > Ah. For myself, pulling the newer version from epel would not be a big > problem. I think what we need to do is figure out what is the minimum > ninja version we want to support, and then see if we need to make any > of these changes. I don't have hard data on which distros have which > versions of ninja, but surely somebody checked that at some point? I did survey available meson versions, and chose what features to use. But not really ninja, since I didn't know about this specific issue, and other than this the ninja version differences were handled by meson. As all the issues are related to more precise dependencies, I somewhat wonder if it'd be good enough to use less accurate dependencies with 1.8.2. But I don't like it. Greetings, Andres Freund
Re: Support logical replication of DDLs
Here are some comments for patch v63-0002. This is a WIP because I have not yet looked at the large file - ddl_deparse.c. == Commit Message 1. This patch provides JSON blobs representing DDL commands, which can later be re-processed into plain strings by well-defined sprintf-like expansion. These JSON objects are intended to allow for machine-editing of the commands, by replacing certain nodes within the objects. ~ "This patch provides JSON blobs" --> "This patch constructs JSON blobs" == src/backend/commands/ddl_json. 2. Copyright + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group "2022" --> "2023" ~~~ 3. +/* + * Conversion specifier which determines how we expand the JSON element into + * string. + */ +typedef enum +{ + SpecTypeName, + SpecOperatorName, + SpecDottedName, + SpecString, + SpecNumber, + SpecStringLiteral, + SpecIdentifier, + SpecRole +} convSpecifier; ~ 3a. SUGGESTION (comment) Conversion specifier which determines how to expand the JSON element into a string. ~ 3b. Are these enums in this strange order deliberately? If not, then maybe alphabetical is better. ~~~ 4. Forward declaration +char *deparse_ddl_json_to_string(char *jsonb); Why is this forward declared here? Isn't this already declared extern in ddl_deparse.h? ~~~ 5. expand_fmt_recursive +/* + * Recursive helper for deparse_ddl_json_to_string. + * + * Find the "fmt" element in the given container, and expand it into the + * provided StringInfo. + */ +static void +expand_fmt_recursive(JsonbContainer *container, StringInfo buf) I noticed all the other expand_ functions are passing the StringInfo buf as the first parameter. For consistency, shouldn’t this be the same? ~ 6. + if (*cp != '%') + { + appendStringInfoCharMacro(buf, *cp); + continue; + } + + + ADVANCE_PARSE_POINTER(cp, end_ptr); + + /* Easy case: %% outputs a single % */ + if (*cp == '%') + { + appendStringInfoCharMacro(buf, *cp); + continue; + } Double blank lines? ~ 7. + ADVANCE_PARSE_POINTER(cp, end_ptr); + for (; cp < end_ptr;) + { Maybe a while loop is more appropriate? ~ 8. + value = findJsonbValueFromContainer(container, JB_FOBJECT, &key); Should the code be checking or asserting value is not NULL? (IIRC I asked this a long time ago - sorry if it was already answered) ~~~ 9. expand_jsonval_dottedname It might be simpler code to use a variable like: JsonbContainer *data = jsonval->val.binary.data; Instead of repeating jsonval->val.binary.data many times. ~~~ 10. expand_jsonval_typename It might be simpler code to use a variable like: JsonbContainer *data = jsonval->val.binary.data; Instead of repeating jsonval->val.binary.data many times. ~~~ 11. +/* + * Expand a JSON value as an operator name. + */ +static void +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval) Should this function comment be more like the comment for expand_jsonval_dottedname by saying there can be an optional "schemaname"? ~~~ 12. +static bool +expand_jsonval_string(StringInfo buf, JsonbValue *jsonval) +{ + if (jsonval->type == jbvString) + { + appendBinaryStringInfo(buf, jsonval->val.string.val, +jsonval->val.string.len); + } + else if (jsonval->type == jbvBinary) + { + json_trivalue present; + + present = find_bool_in_jsonbcontainer(jsonval->val.binary.data, + "present"); + + /* + * If "present" is set to false, this element expands to empty; + * otherwise (either true or absent), fall through to expand "fmt". 
+ */ + if (present == tv_false) + return false; + + expand_fmt_recursive(jsonval->val.binary.data, buf); + } + else + return false; + + return true; +} I felt this could be simpler if there is a new 'expanded' variable because then you can have just a single return point instead of 3 returns; If you choose to do this there is a minor tweak to the "fall through" comment. SUGGESTION expand_jsonval_string(StringInfo buf, JsonbValue *jsonval) { bool expanded = true; if (jsonval->type == jbvString) { appendBinaryStringInfo(buf, jsonval->val.string.val, jsonval->val.string.len); } else if (jsonval->type == jbvBinary) { json_trivalue present; present = find_bool_in_jsonbcontainer(jsonval->val.binary.data, "present"); /* * If "present" is set to false, this element expands to empty; * otherwise (either true or absent), expand "fmt". */ if (present == tv_false) expanded = false; else expand_fmt_recursive(jsonval->val.binary.data, buf); } return expanded; } ~~~ 13. +/* + * Expand a JSON value as an integer quantity. + */ +static void +expand_jsonval_number(StringInfo buf, JsonbValue *jsonval) +{ Should this also be checking/asserting that the type is jbvNumeric? ~~~ 14. +/* + * Expand a JSON value as a role name. If the is_public element is set to + * true, PUBLIC is expanded (no quotes
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Andres Freund writes: > I did survey available meson versions, and chose what features to > use. But not really ninja, since I didn't know about this specific issue > and other than this the ninja version differences were handled by meson. > As all the issues are related to more precise dependencies, I somehwat > wonder if it'd be good enough to use less accurate dependencies with > 1.8.2. But I don't like it. Nah, I don't like that either. I did a crude survey of ninja's version history by seeing which version is in each recent Fedora release: f20/ninja-build.spec:Version:1.4.0 f21/ninja-build.spec:Version:1.5.1 f22/ninja-build.spec:Version:1.5.3 f23/ninja-build.spec:Version:1.7.1 f24/ninja-build.spec:Version:1.7.2 f25/ninja-build.spec:Version:1.8.2 f26/ninja-build.spec:Version:1.8.2 f27/ninja-build.spec:Version:1.8.2 f28/ninja-build.spec:Version:1.8.2 f29/ninja-build.spec:Version:1.8.2 f30/ninja-build.spec:Version:1.9.0 f31/ninja-build.spec:Version:1.10.1 f32/ninja-build.spec:Version:1.10.1 f33/ninja-build.spec:Version:1.10.2 f34/ninja-build.spec:Version:1.10.2 f35/ninja-build.spec:Version:1.10.2 f36/ninja-build.spec:Version:1.10.2 f37/ninja-build.spec:Version:1.10.2 rawhide/ninja-build.spec:Version:1.11.1 Remembering that Fedora has a six-month release cycle, this shows that 1.8.2 was around for awhile but 1.9.x was a real flash-in-the-pan. We can probably get away with saying that you need 1.10 or newer. That's already three-plus years old. regards, tom lane
Re: First draft of back-branch release notes is done
On 2/5/23 3:01 PM, Tom Lane wrote: "Jonathan S. Katz" writes: On Feb 4, 2023, at 10:24 AM, Tom Lane wrote: “Prevent clobbering of cached parsetrees…Bad things could happen if…” While I chuckled over the phrasing, I’m left to wonder what the “bad things” are, in case I need to check an older version to see if I’m susceptible to this bug. Fair. I was trying to avoid committing to specific consequences. The assertion failure seen in the original report (#17702) wouldn't occur for typical users, but they might see crashes or "unexpected node type" failures. Maybe we can say that instead. I did a quick readthrough of #17702. Your proposal sounds reasonable. Based on that explanation and reading #17702, I'm still not sure if this will make the cut in the release announcement itself, but +1 for modifying it in the release notes. Thanks, Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: pglz compression performance, take two
On 2/5/23 19:36, Andrey Borodin wrote: > On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin wrote: >> >> Hello! Please find attached v8. > > I got some interesting feedback from some patch users. > There was an oversight that frequently yielded results that are 1,2 or > 3 bytes longer than expected. > Looking closer I found that the correctness of the last 3-byte tail is > checked in two places. PFA fix for this. Previously compressed data > was correct, however in some cases few bytes longer than the result of > current pglz implementation. > Thanks. What were the consequences of the issue? Lower compression ratio, or did we then fail to decompress the data (or would current pglz implementation fail to decompress it)? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pglz compression performance, take two
On Sun, Feb 5, 2023 at 5:51 PM Tomas Vondra wrote: > > On 2/5/23 19:36, Andrey Borodin wrote: > > On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin > > wrote: > >> > >> Hello! Please find attached v8. > > > > I got some interesting feedback from some patch users. > > There was an oversight that frequently yielded results that are 1,2 or > > 3 bytes longer than expected. > > Looking closer I found that the correctness of the last 3-byte tail is > > checked in two places. PFA fix for this. Previously compressed data > > was correct, however in some cases few bytes longer than the result of > > current pglz implementation. > > > > Thanks. What were the consequences of the issue? Lower compression > ratio, or did we then fail to decompress the data (or would current pglz > implementation fail to decompress it)? > The data was decompressed fine. But extension tests (Citus's columnar engine) hard-code a lot of compression-ratio expectations. And there is still one more test where the optimized version produces output that is one byte longer. I'm trying to find out why, but with no success yet. There are known and documented cases where the optimized pglz version can do so: good_match without the division by 10, and memcmp by 4 bytes. But even with those disabled, the 1-byte-longer compression result persists... The problem is that the length changes after deleting some data, so the compression of that particular sequence seems to happen somewhere far away. It was funny at the beginning - to hunt for one byte. But the weekend is ending, and it seems that byte has slipped away from me again... Best regards, Andrey Borodin.
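To make the "memcmp by 4 bytes" point concrete: it refers to comparing candidate matches a word at a time rather than byte by byte, roughly like the sketch below. This is only an illustration of the technique; the patch's actual match-finding code is more involved and also changes the good_match cutoff.

#include <stdint.h>
#include <string.h>

/*
 * Illustration: length of the common prefix of 'input' and 'candidate',
 * comparing four bytes per iteration and finishing the tail byte by byte.
 */
static int
common_prefix_len(const unsigned char *input,
				  const unsigned char *candidate, int maxlen)
{
	int			len = 0;

	while (len + 4 <= maxlen)
	{
		uint32_t	a;
		uint32_t	b;

		memcpy(&a, input + len, 4);
		memcpy(&b, candidate + len, 4);
		if (a != b)
			break;
		len += 4;
	}
	while (len < maxlen && input[len] == candidate[len])
		len++;
	return len;
}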
Re: File descriptors in exec'd subprocesses
On Mon, Feb 6, 2023 at 3:29 AM Andres Freund wrote: > On February 5, 2023 1:00:50 AM GMT+01:00, Thomas Munro > wrote: > >Are there any more descriptors we need to think about? > > Postmaster's listen sockets? Saves a bunch of syscalls, at least. Assuming you mean accepted sockets, yeah, I see how two save two syscalls there, and since you nerd-sniped me into looking into the SOCK_CLOEXEC landscape, I like it even more now that I've understood that accept4() is rubber-stamped for the next revision of POSIX[1] and is already accepted almost everywhere. It's not just window dressing, you need it to write multi-threaded programs that fork/exec without worrying about the window between fd creation and fcntl(FD_CLOEXEC) in another thread; hopefully one day we will care about that sort of thing in some places too! Here's a separate patch for that. I *guess* we need HAVE_DECL_ACCEPT4 for the guarded availability system (cf pwritev) when Apple gets the memo, but see below. Hard to say if AIX is still receiving memos (cf recent speculation in the Register). All other target OSes seem to have had this stuff for a while. Since client connections already do fcntl(FD_CLOEXEC), postgres_fdw connections didn't have this problem. It seems reasonable to want to skip a couple of system calls there too; also, client programs might also be interested in future-POSIX's atomic race-free close-on-exec socket fabrication. So here also is a patch to use SOCK_CLOEXEC on that end too, if available. But ... hmph, all we can do here is test for the existence of SOCK_NONBLOCK and SOCK_CLOEXEC, since there is no new function to test for. Maybe we should just assume accept4() also exists if these exist (it's hard to imagine that Apple or IBM would address atomicity on one end but not the other of a socket), but predictions are so difficult, especially about the future! Anyone want to guess if it's better to leave the meson/configure probe in for the accept4 end or just roll with the macros? > Logging collector pipe write end, in backends? The write end of the logging pipe is already closed, after dup2'ing it to STDOUT_FILENO to STDERR_FILENO, so archive commands and suchlike do receive the handle, but they want them. It's the intended and documented behaviour that anything written to that will finish up in the log. As for pipe2(O_CLOEXEC), I see the point of it in a multi-threaded application. It's not terribly useful for us though, because we usually want to close only one end, except in the case of the self-pipe. But the self-pipe is no longer used on the systems that have pipe2()-from-the-future. I haven't tested this under EXEC_BACKEND yet. [1] https://www.austingroupbugs.net/view.php?id=411 From c9a61ba8eaddb197c31095a163f7efdf2e3ea797 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sun, 5 Feb 2023 12:18:14 +1300 Subject: [PATCH v2 1/3] Don't leak descriptors into subprograms. Open data and WAL files with O_CLOEXEC (POSIX 2018, SUSv4). All of our target systems have it, except Windows. Windows already has that behavior, because we wrote our own open() implementation. Do the same for sockets with FD_CLOEXEC. (A separate commit will improve this with SOCK_CLOEXEC on some systems, but we'll need the FD_CLOEXEC path as a fallback.) This means that programs executed by COPY and archiving commands etc, will not be able to mess with those descriptors (of course nothing stops them from opening files, so this is more about tidyness and preventing accidental problems than security). 
Reviewed-by: Tom Lane Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com --- src/backend/access/transam/xlog.c | 9 ++--- src/backend/libpq/pqcomm.c| 9 + src/backend/storage/file/fd.c | 2 -- src/backend/storage/smgr/md.c | 9 + src/include/port/win32_port.h | 7 +++ 5 files changed, 27 insertions(+), 9 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fb4c860bde..3b44a0f237 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2937,7 +2937,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, * Try to use existent file (checkpoint maker may have created it already) */ *added = false; - fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method)); + fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | + get_sync_bit(sync_method)); if (fd < 0) { if (errno != ENOENT) @@ -3098,7 +3099,8 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli) return fd; /* Now open original target segment (might not be file I just made) */ - fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method)); + fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC | + get_sync_bit(sync_method)); if (fd < 0) ereport(ERROR, (errcode_for_file_access(), @@ -3329
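In outline, the atomic-versus-fallback pattern for accepted sockets looks like the sketch below. HAVE_ACCEPT4 is a hypothetical configure symbol used here for illustration; the real patches wire this into pqcomm.c rather than a standalone helper, and the fallback branch is the existing fcntl(FD_CLOEXEC) dance.

#define _GNU_SOURCE				/* for accept4() on glibc */
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Sketch: accept a connection with close-on-exec set.  accept4() applies
 * the flag atomically at creation time; the fallback leaves a window
 * between accept() and fcntl(), harmless for a single-threaded process
 * but racy for threaded programs.
 */
static int
accept_cloexec(int listen_sock)
{
	int			fd;

#ifdef HAVE_ACCEPT4
	fd = accept4(listen_sock, NULL, NULL, SOCK_CLOEXEC);
#else
	fd = accept(listen_sock, NULL, NULL);
	if (fd >= 0 && fcntl(fd, F_SETFD, FD_CLOEXEC) < 0)
	{
		close(fd);
		fd = -1;
	}
#endif
	return fd;
}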
Re: First draft of back-branch release notes is done
"Jonathan S. Katz" writes: > On 2/5/23 3:01 PM, Tom Lane wrote: >> Fair. I was trying to avoid committing to specific consequences. >> The assertion failure seen in the original report (#17702) wouldn't >> occur for typical users, but they might see crashes or "unexpected node >> type" failures. Maybe we can say that instead. > I did a quick readthrough of #17702. Your proposal sounds reasonable. > Based on that explanation and reading #17702, I'm still not sure if this > will make the cut in the release announcement itself, but +1 for > modifying it in the release notes. The notes now say Prevent clobbering of cached parsetrees for utility statements in SQL functions (Tom Lane, Daniel Gustafsson) If a SQL-language function executes the same utility command more than once within a single calling query, it could crash or report strange errors such as unrecognized node type. regards, tom lane
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Sat, Feb 4, 2023 at 5:04 PM Takamichi Osumi (Fujitsu) wrote: > ... > > Kindly have a look at the attached v27. > Here are some review comments for patch v27-0001. == src/test/subscription/t/032_apply_delay.pl 1. +# Confirm the time-delayed replication has been effective from the server log +# message where the apply worker emits for applying delay. Moreover, verify +# that the current worker's remaining wait time is sufficiently bigger than the +# expected value, in order to check any update of the min_apply_delay. +sub check_apply_delay_log ~ "has been effective from the server log" --> "worked, by inspecting the server log" ~~~ 2. +my $delay = 3; Might be better to name this variable as 'min_apply_delay'. ~~~ 3. +# Now wait for replay to complete on publisher. We're done waiting when the +# subscriber has applyed up to the publisher LSN. +$node_publisher->wait_for_catchup($appname); 3a. Something seemed wrong with the comment. Was it meant to say more like? "The publisher waits for the replication to complete". Typo: "applyed" ~ 3b. Instead of doing this wait_for_catchup stuff why don't you just use a synchronous pub/sub and then the publication will just block internally like you require but without you having to block using test code? ~~~ 4. +# Run a query to make sure that the reload has taken effect. +$node_publisher->safe_psql('postgres', q{SELECT 1}); SUGGESTION (for the comment) # Running a dummy query causes the config to be reloaded. ~~~ 5. +# Confirm the record is not applied expectedly +my $result = $node_subscriber->safe_psql('postgres', + "SELECT count(a) FROM tab_int WHERE a = 0;"); +is($result, qq(0), "check the delayed transaction was not applied"); "expectedly" ?? SUGGESTION (for comment) # Confirm the record was not applied -- Kind Regards, Peter Smith. Fujitsu Australia
Re: First draft of back-branch release notes is done
On 2/5/23 9:39 PM, Tom Lane wrote: Prevent clobbering of cached parsetrees for utility statements in SQL functions (Tom Lane, Daniel Gustafsson) If a SQL-language function executes the same utility command more than once within a single calling query, it could crash or report strange errors such as unrecognized node type. regards, tom lane +1. Thanks! Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Add progress reporting to pg_verifybackup
On Mon, Feb 6, 2023 at 9:35 AM Michael Paquier wrote: > > On Sat, Feb 04, 2023 at 12:32:15PM +0900, Michael Paquier wrote: > > That seems rather OK seen from here. I'll see about getting that > > applied except if there is an objection of any kind. > > Okay, I have looked at that again this morning and I've spotted one > tiny issue: specifying --progress with --skip-checksums does not > really make sense. I thought that too, but I thought it's better to ignore it, instead of erroring out. For example, we can specify both --disable and --progress options to pg_checksum commands, but we don't write any progress information in this case. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: cutting down the TODO list thread
On Mon, Jan 30, 2023 at 10:07 PM Bruce Momjian wrote: > > On Mon, Jan 30, 2023 at 01:13:45PM +0700, John Naylor wrote: > > "It's worth checking if the feature of interest is found in the TODO list on > > our wiki: http://wiki.postgresql.org/wiki/TODO. The entries there often have > > additional information about the feature and may point to reasons why it hasn't > > been implemented yet." > > Good. > > Do I understand right that we could just remove this entire section "What areas > > need work?"? > > Yes, I think so. > > > I can now see that just removing the [E] label totally is the right > > > answer. Yes, there might be an easy item on there, but the fact we have > > > three labeled and they are not easy makes me thing [E] is causing more > > > problems than it solves. > > > > Okay, having heard no objections I'll remove it. These are all done now. I'll try to get back to culling the list items at the end of April. -- John Naylor EDB: http://www.enterprisedb.com
Re: Exit walsender before confirming remote flush in logical replication
On Sat, Feb 4, 2023 at 6:31 PM Andres Freund wrote: > > On 2023-02-02 11:21:54 +0530, Amit Kapila wrote: > > The main problem we want to solve here is to avoid shutdown failing in > > case walreceiver/applyworker is busy waiting for some lock or for some > > other reason as shown in the email [1]. > > Isn't handling this part of the job of wal_sender_timeout? > In some cases, it is not clear whether we can handle it by wal_sender_timeout. Consider a case of a time-delayed replica where the applyworker will keep sending some response/alive message so that walsender doesn't timeout in that (during delay) period. In that case, because walsender won't timeout, the shutdown will fail (with the failed message) even though it will be complete after the walsender is able to send all the WAL and shutdown. The time-delayed replica patch is still under discussion [1]. Also, for large values of wal_sender_timeout, it will wait till the walsender times out and can return with a failed message. > > I don't at all agree that it's ok to just stop replicating changes > because we're blocked on network IO. The patch justifies this with: > > > Currently, at shutdown, walsender processes wait to send all pending data > > and > > ensure the all data is flushed in remote node. This mechanism was added by > > 985bd7 for supporting clean switch over, but such use-case cannot be > > supported > > for logical replication. This commit remove the blocking in the case. > > and at the start of the thread with: > > > In case of logical replication, however, we cannot support the use-case that > > switches the role publisher <-> subscriber. Suppose same case as above, > > additional > > transactions are committed while doing step2. To catch up such changes > > subscriber > > must receive WALs related with trans, but it cannot be done because > > subscriber > > cannot request WALs from the specific position. In the case, we must > > truncate all > > data in new subscriber once, and then create new subscription with copy_data > > = true. > > But that seems a too narrow view to me. Imagine you want to decomission > the current primary, and instead start to use the logical standby as the > primary. For that you'd obviously want to replicate the last few > changes. But with the proposed change, that'd be hard to ever achieve. > I think that can still be achieved with the idea being discussed which is to keep allowing sending the WAL for smart shutdown mode but not for other modes(fast, immediate). I don't know whether it is a good idea or not but Kuroda-San has produced a POC patch for it. We can instead choose to improve our docs related to shutdown to explain a bit more about the shutdown's interaction with (logical and physical) replication. As of now, it says: (“Smart” mode disallows new connections, then waits for all existing clients to disconnect. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected.)[2]. Here, it is not clear that shutdown will wait for sending and flushing all the WALs. The information for fast and immediate modes is even lesser which makes it more difficult to understand what kind of behavior is expected in those modes. [1] - https://commitfest.postgresql.org/42/3581/ [2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html -- With Regards, Amit Kapila.
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi, Andres! Thank you for reviewing. > I don't think this is the right approach. Creating a subtransaction for > each row will cause substantial performance issues. > Subtransactions aren't created for each row. The block of rows in one subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed. There is also a constraint on the number of bytes, MAX_SAFE_BUFFER_BYTES, in safe_buffer: while (sfcstate->saved_tuples < SAFE_BUFFER_SIZE && sfcstate->safeBufferBytes < MAX_SAFE_BUFFER_BYTES) > We now can call data type input functions without throwing errors, see > InputFunctionCallSafe(). Use that to avoid throwing an error instead of > catching it. > InputFunctionCallSafe() is good for detecting errors from input functions, but there are errors from NextCopyFrom() that cannot be detected with InputFunctionCallSafe(), e.g. "wrong number of columns in row". Are you suggesting that we process input-function errors separately from other errors? Currently all errors are processed in one "switch" loop in PG_CATCH, so this change could complicate the code. Regards, Damir Belyalov Postgres Professional
Re: Exit walsender before confirming remote flush in logical replication
Hi, On February 5, 2023 8:29:19 PM PST, Amit Kapila wrote: >On Sat, Feb 4, 2023 at 6:31 PM Andres Freund wrote: >> >> On 2023-02-02 11:21:54 +0530, Amit Kapila wrote: >> > The main problem we want to solve here is to avoid shutdown failing in >> > case walreceiver/applyworker is busy waiting for some lock or for some >> > other reason as shown in the email [1]. >> >> Isn't handling this part of the job of wal_sender_timeout? >> > >In some cases, it is not clear whether we can handle it by >wal_sender_timeout. Consider a case of a time-delayed replica where >the applyworker will keep sending some response/alive message so that >walsender doesn't timeout in that (during delay) period. In that case, >because walsender won't timeout, the shutdown will fail (with the >failed message) even though it will be complete after the walsender is >able to send all the WAL and shutdown. The time-delayed replica patch >is still under discussion [1]. Also, for large values of >wal_sender_timeout, it will wait till the walsender times out and can >return with a failed message. > >> >> I don't at all agree that it's ok to just stop replicating changes >> because we're blocked on network IO. The patch justifies this with: >> >> > Currently, at shutdown, walsender processes wait to send all pending data >> > and >> > ensure the all data is flushed in remote node. This mechanism was added by >> > 985bd7 for supporting clean switch over, but such use-case cannot be >> > supported >> > for logical replication. This commit remove the blocking in the case. >> >> and at the start of the thread with: >> >> > In case of logical replication, however, we cannot support the use-case >> > that >> > switches the role publisher <-> subscriber. Suppose same case as above, >> > additional >> > transactions are committed while doing step2. To catch up such changes >> > subscriber >> > must receive WALs related with trans, but it cannot be done because >> > subscriber >> > cannot request WALs from the specific position. In the case, we must >> > truncate all >> > data in new subscriber once, and then create new subscription with >> > copy_data >> > = true. >> >> But that seems a too narrow view to me. Imagine you want to decomission >> the current primary, and instead start to use the logical standby as the >> primary. For that you'd obviously want to replicate the last few >> changes. But with the proposed change, that'd be hard to ever achieve. >> > >I think that can still be achieved with the idea being discussed which >is to keep allowing sending the WAL for smart shutdown mode but not >for other modes(fast, immediate). I don't know whether it is a good >idea or not but Kuroda-San has produced a POC patch for it. We can >instead choose to improve our docs related to shutdown to explain a >bit more about the shutdown's interaction with (logical and physical) >replication. As of now, it says: (“Smart” mode disallows new >connections, then waits for all existing clients to disconnect. If the >server is in hot standby, recovery and streaming replication will be >terminated once all clients have disconnected.)[2]. Here, it is not >clear that shutdown will wait for sending and flushing all the WALs. >The information for fast and immediate modes is even lesser which >makes it more difficult to understand what kind of behavior is >expected in those modes. > >[1] - https://commitfest.postgresql.org/42/3581/ >[2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html > Smart shutdown is practically unusable. 
I don't think it makes sense to tie behavior of walsender to it in any way. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Damir Belyalov writes: >> I don't think this is the right approach. Creating a subtransaction for >> each row will cause substantial performance issues. > Subtransactions aren't created for each row. The block of rows in one > subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed. I think that at this point, any patch that involves adding subtransactions to COPY is dead on arrival; whether it's batched or not is irrelevant. (It's not like batching has no downsides.) > InputFunctionCallSafe() is good for detecting errors from input-functions > but there are such errors from NextCopyFrom () that can not be detected > with InputFunctionCallSafe(), e.g. "wrong number of columns in row''. If you want to deal with those, then there's more work to be done to make those bits non-error-throwing. But there's a very finite amount of code involved and no obvious reason why it couldn't be done. The major problem here has always been the indefinite amount of code implicated by calling datatype input functions, and we have now created a plausible answer to that problem. regards, tom lane
Re: Add progress reporting to pg_verifybackup
On Mon, Feb 06, 2023 at 12:27:51PM +0900, Masahiko Sawada wrote: > I thought that too, but I thought it's better to ignore it, instead of > erroring out. For example, we can specify both --disable and > --progress options to pg_checksum commands, but we don't write any > progress information in this case. Well, if you don't feel strongly about that, that's fine by me as well, so I have applied your v3 with the tweaks I posted previously, without the restriction on --skip-checksums. -- Michael signature.asc Description: PGP signature
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi, On February 5, 2023 9:12:17 PM PST, Tom Lane wrote: >Damir Belyalov writes: >>> I don't think this is the right approach. Creating a subtransaction for >>> each row will cause substantial performance issues. > >> Subtransactions aren't created for each row. The block of rows in one >> subtransaction is 1000 (SAFE_BUFFER_SIZE) and can be changed. > >I think that at this point, any patch that involves adding subtransactions >to COPY is dead on arrival; whether it's batched or not is irrelevant. >(It's not like batching has no downsides.) Indeed. >> InputFunctionCallSafe() is good for detecting errors from input-functions >> but there are such errors from NextCopyFrom () that can not be detected >> with InputFunctionCallSafe(), e.g. "wrong number of columns in row''. > >If you want to deal with those, then there's more work to be done to make >those bits non-error-throwing. But there's a very finite amount of code >involved and no obvious reason why it couldn't be done. The major problem >here has always been the indefinite amount of code implicated by calling >datatype input functions, and we have now created a plausible answer to >that problem. I'm not even sure it makes sense to avoid that kind of error. And invalid column count or such is something quite different than failing some data type input routine, or failing a constraint. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Andres Freund writes: > On February 5, 2023 9:12:17 PM PST, Tom Lane wrote: >> Damir Belyalov writes: >>> InputFunctionCallSafe() is good for detecting errors from input-functions >>> but there are such errors from NextCopyFrom () that can not be detected >>> with InputFunctionCallSafe(), e.g. "wrong number of columns in row''. >> If you want to deal with those, then there's more work to be done to make >> those bits non-error-throwing. But there's a very finite amount of code >> involved and no obvious reason why it couldn't be done. > I'm not even sure it makes sense to avoid that kind of error. And > invalid column count or such is something quite different than failing > some data type input routine, or falling a constraint. I think it could be reasonable to put COPY's overall-line-format requirements on the same level as datatype input format violations. I agree that trying to trap every kind of error is a bad idea, for largely the same reason that the soft-input-errors patches only trap certain kinds of errors: it's too hard to tell whether an error is an "internal" error that it's scary to continue past. regards, tom lane
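Concretely, putting COPY's own line-format checks on the same footing would mean routing them through the same escontext machinery as the input functions, roughly like the sketch below (not from any posted patch; ereturn() and errsave() are the existing soft-error macros in elog.h):

#include "postgres.h"
#include "nodes/miscnodes.h"

/*
 * Sketch: report a wrong-field-count problem softly.  If 'escontext' is an
 * ErrorSaveContext, ereturn() records the error there and returns false;
 * with a NULL or ordinary context it still throws, preserving today's
 * behaviour for callers that do not opt in.
 */
static bool
check_field_count(int fldct, int expected, Node *escontext)
{
	if (fldct != expected)
		ereturn(escontext, false,
				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
				 errmsg("row contains %d fields, expected %d",
						fldct, expected)));
	return true;
}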
Re: Add progress reporting to pg_verifybackup
On Mon, Feb 6, 2023 at 2:45 PM Michael Paquier wrote: > > On Mon, Feb 06, 2023 at 12:27:51PM +0900, Masahiko Sawada wrote: > > I thought that too, but I thought it's better to ignore it, instead of > > erroring out. For example, we can specify both --disable and > > --progress options to pg_checksum commands, but we don't write any > > progress information in this case. > > Well, if you don't feel strongly about that, that's fine by me as > well, so I have applied your v3 with the tweaks I posted previously, > without the restriction on --skip-checksums. Thank you! Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Exit walsender before confirming remote flush in logical replication
On Mon, Feb 6, 2023 at 10:33 AM Andres Freund wrote: > > On February 5, 2023 8:29:19 PM PST, Amit Kapila > wrote: > >> > >> But that seems a too narrow view to me. Imagine you want to decomission > >> the current primary, and instead start to use the logical standby as the > >> primary. For that you'd obviously want to replicate the last few > >> changes. But with the proposed change, that'd be hard to ever achieve. > >> > > > >I think that can still be achieved with the idea being discussed which > >is to keep allowing sending the WAL for smart shutdown mode but not > >for other modes(fast, immediate). I don't know whether it is a good > >idea or not but Kuroda-San has produced a POC patch for it. We can > >instead choose to improve our docs related to shutdown to explain a > >bit more about the shutdown's interaction with (logical and physical) > >replication. As of now, it says: (“Smart” mode disallows new > >connections, then waits for all existing clients to disconnect. If the > >server is in hot standby, recovery and streaming replication will be > >terminated once all clients have disconnected.)[2]. Here, it is not > >clear that shutdown will wait for sending and flushing all the WALs. > >The information for fast and immediate modes is even lesser which > >makes it more difficult to understand what kind of behavior is > >expected in those modes. > > > >[1] - https://commitfest.postgresql.org/42/3581/ > >[2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html > > > > Smart shutdown is practically unusable. I don't think it makes sense to tie > behavior of walsender to it in any way. > So, we have the following options: (a) do nothing for this; (b) clarify the current behavior in docs. Any suggestions? -- With Regards, Amit Kapila.
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Monday, February 6, 2023 12:03 PM Peter Smith wrote: > On Sat, Feb 4, 2023 at 5:04 PM Takamichi Osumi (Fujitsu) > wrote: > > > ... > > > > Kindly have a look at the attached v27. > > > > Here are some review comments for patch v27-0001. Thanks for checking ! > == > src/test/subscription/t/032_apply_delay.pl > > 1. > +# Confirm the time-delayed replication has been effective from the > +server log # message where the apply worker emits for applying delay. > +Moreover, verify # that the current worker's remaining wait time is > +sufficiently bigger than the # expected value, in order to check any update > of > the min_apply_delay. > +sub check_apply_delay_log > > ~ > > "has been effective from the server log" --> "worked, by inspecting the server > log" Sounds good to me. Also, this is an unique part for time-delayed logical replication. So, we can update those as we want. Fixed. > ~~~ > > 2. > +my $delay = 3; > > Might be better to name this variable as 'min_apply_delay'. I named this variable by following the test of recovery_min_apply_delay (src/test/recovery/005_replay_delay.pl). So, this is aligned with the test and I'd like to keep it as it is. > ~~~ > > 3. > +# Now wait for replay to complete on publisher. We're done waiting when > +the # subscriber has applyed up to the publisher LSN. > +$node_publisher->wait_for_catchup($appname); > > 3a. > Something seemed wrong with the comment. > > Was it meant to say more like? "The publisher waits for the replication to > complete". > > Typo: "applyed" Your wording looks better than mine. Fixed. > ~ > > 3b. > Instead of doing this wait_for_catchup stuff why don't you just use a > synchronous pub/sub and then the publication will just block internally like > you require but without you having to block using test code? This is the style of 005_reply_delay.pl. Then, this is also aligned with it. So, I'd like to keep the current way of times comparison as it is. Even if we could omit wait_for_catchup(), there will be new codes for synchronous replication and that would make the min_apply_delay tests more different from the corresponding one. Note that if we use the synchronous mode, we need to turn it off for the last ALTER SUBSCRIPTION DISABLE test case whose min_apply_delay to 1 day 5 min and execute one record insert after that. This will make the tests confusing. > ~~~ > > 4. > +# Run a query to make sure that the reload has taken effect. > +$node_publisher->safe_psql('postgres', q{SELECT 1}); > > SUGGESTION (for the comment) > # Running a dummy query causes the config to be reloaded. Fixed. > ~~~ > > 5. > +# Confirm the record is not applied expectedly my $result = > +$node_subscriber->safe_psql('postgres', > + "SELECT count(a) FROM tab_int WHERE a = 0;"); is($result, qq(0), > +"check the delayed transaction was not applied"); > > "expectedly" ?? > > SUGGESTION (for comment) > # Confirm the record was not applied Fixed. Best Regards, Takamichi Osumi v28-0001-Time-delayed-logical-replication-subscriber.patch Description: v28-0001-Time-delayed-logical-replication-subscriber.patch
Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl
On Sun, Feb 05, 2023 at 12:56:58AM +0530, Nitin Jadhav wrote: > Ok. Understood the other problems. I have attached the v2 patch which > uses the idea present in Michael's patch. In addition, I have removed > fetching NO_SHOW_ALL parameters while creating tab_settings_flags > table in guc.sql and adjusted the test which checks for (NO_RESET_ALL > AND NOT NO_SHOW_ALL) as this was misleading the developer who thinks > that tab_settings_flags table has NO_SHOW_ALL parameters which is > incorrect. Okay, the part to add an initialization check for GUC_NO_SHOW_ALL and GUC_NOT_IN_SAMPLE looked fine by me, so applied after more comment polishing. +-- NO_RESET_ALL can be specified without NO_SHOW_ALL, like transaction_*. +-- tab_settings_flags does not contain NO_SHOW_ALL flags. Just checking for +-- NO_RESET_ALL implies NO_RESET_ALL AND NOT NO_SHOW_ALL. SELECT name FROM tab_settings_flags - WHERE NOT no_show_all AND no_reset_all + WHERE no_reset_all ORDER BY 1; Removing entirely no_show_all is fine by me, but keeping this SQL has little sense, then, because it would include any GUCs loaded by an external source when they define NO_RESET_ALL. I think that 0001 should be like the attached, instead, backpatched down to 15 (release week, so it cannot be touched until the next version is stamped), where we just remove all the checks based on no_show_all. On top of that, I have noticed an extra combination that would not make sense and that could be checked with the SQL queries: GUC_DISALLOW_IN_FILE implies GUC_NOT_IN_SAMPLE. The opposite may not be true, though, as some developer GUCs are marked as GUC_NOT_IN_SAMPLE but they are allowed in a file. The only exception to that currently is config_file. It is just a special case whose value is enforced at startup and it can be passed down as an option switch via the postgres binary, still it seems like it would be better to also mark it as GUC_NOT_IN_SAMPLE? This is done in 0002, only for HEAD, as that would be a new check. Thoughts? -- Michael >From 727361c5fe4c9af16ef66c1090a9e376dfc891d2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 6 Feb 2023 16:05:11 +0900 Subject: [PATCH v3 1/2] Clean up SQL tests for NO_SHOW_ALL --- src/test/regress/expected/guc.out | 28 src/test/regress/sql/guc.sql | 13 - 2 files changed, 41 deletions(-) diff --git a/src/test/regress/expected/guc.out b/src/test/regress/expected/guc.out index 2914b950b4..127c953297 100644 --- a/src/test/regress/expected/guc.out +++ b/src/test/regress/expected/guc.out @@ -841,7 +841,6 @@ CREATE TABLE tab_settings_flags AS SELECT name, category, 'EXPLAIN' = ANY(flags) AS explain, 'NO_RESET' = ANY(flags) AS no_reset, 'NO_RESET_ALL' = ANY(flags) AS no_reset_all, -'NO_SHOW_ALL' = ANY(flags) AS no_show_all, 'NOT_IN_SAMPLE'= ANY(flags) AS not_in_sample, 'RUNTIME_COMPUTED' = ANY(flags) AS runtime_computed FROM pg_show_all_settings() AS psas, @@ -880,33 +879,6 @@ SELECT name FROM tab_settings_flags -- (0 rows) --- NO_SHOW_ALL implies NO_RESET_ALL, and vice-versa. -SELECT name FROM tab_settings_flags - WHERE no_show_all AND NOT no_reset_all - ORDER BY 1; - name --- -(0 rows) - --- Exceptions are transaction_*. -SELECT name FROM tab_settings_flags - WHERE NOT no_show_all AND no_reset_all - ORDER BY 1; - name - - transaction_deferrable - transaction_isolation - transaction_read_only -(3 rows) - --- NO_SHOW_ALL implies NOT_IN_SAMPLE. -SELECT name FROM tab_settings_flags - WHERE no_show_all AND NOT not_in_sample - ORDER BY 1; - name --- -(0 rows) - -- NO_RESET implies NO_RESET_ALL. 
SELECT name FROM tab_settings_flags WHERE no_reset AND NOT no_reset_all diff --git a/src/test/regress/sql/guc.sql b/src/test/regress/sql/guc.sql index d40d0c178f..dc79761955 100644 --- a/src/test/regress/sql/guc.sql +++ b/src/test/regress/sql/guc.sql @@ -326,7 +326,6 @@ CREATE TABLE tab_settings_flags AS SELECT name, category, 'EXPLAIN' = ANY(flags) AS explain, 'NO_RESET' = ANY(flags) AS no_reset, 'NO_RESET_ALL' = ANY(flags) AS no_reset_all, -'NO_SHOW_ALL' = ANY(flags) AS no_show_all, 'NOT_IN_SAMPLE'= ANY(flags) AS not_in_sample, 'RUNTIME_COMPUTED' = ANY(flags) AS runtime_computed FROM pg_show_all_settings() AS psas, @@ -349,18 +348,6 @@ SELECT name FROM tab_settings_flags SELECT name FROM tab_settings_flags WHERE category = 'Preset Options' AND NOT not_in_sample ORDER BY 1; --- NO_SHOW_ALL implies NO_RESET_ALL, and vice-versa. -SELECT name FROM tab_settings_flags - WHERE no_show_all AND NOT no_reset_all - ORDER BY 1; --- Exceptions are transaction_*. -SELECT name FROM tab_settings_flags - WHERE NOT no_show_all AND no_reset_all - ORDER BY 1; --- NO_SHOW_ALL implies NOT_IN_SAMPLE. -SELECT name FROM tab_settings_flags - WHERE no_show_all AND NOT not_in_sample - ORDER BY 1
Re: proposal: psql: psql variable BACKEND_PID
po 6. 2. 2023 v 0:25 odesílatel Corey Huinker napsal: > >>> >>> Clearly, it is hard to write a regression test for an externally >>> volatile value. `SELECT sign(:BACKEND_PID)` would technically do the job, >>> if we're striving for completeness. >>> >> >> I did simple test - :BACKEND_PID should be equal pg_backend_pid() >> >> > > Even better. > > >> >>> >>> Do we want to change %p to pull from this variable and save the >>> snprintf()? Not a measurable savings, more or a don't-repeat-yourself thing. >>> >> >> I checked the code, and I don't think so. Current state is safer (I >> think). The psql's variables are not protected, and I think, so is safer, >> better to >> read the value for prompt directly by usage of the libpq API instead read >> the possibly "customized" variable. I see possible inconsistency, >> but again, the same inconsistency can be for variables USER and DBNAME >> too, and I am not able to say what is significantly better. Just prompt >> shows >> real value, and the related variable is +/- copy in connection time. >> >> I am not 100% sure in this area what is better, but the change requires >> wider and more general discussion, and I don't think the benefits of >> possible >> change are enough. There are just two possible solutions - we can protect >> some psql's variables (and it can do some compatibility issues), or we >> need to accept, so the value in prompt can be fake. It is better to not >> touch it :-). >> > > I agree it is out of scope of this patch, but I like the idea of protected > psql variables, and I doubt users are intentionally re-using these vars to > any positive effect. The more likely case is that newer psql vars just > happen to use the names chosen by somebody's script in the past. > bash variables are not protected too. I know this is in a different context, and different architecture. It can be a very simple patch, but it needs wider discussion. Probably it should be immutable, it is safer, and now I do not have any useful use case for mutability of these variables. Regards Pavel > > >> >> done >> >> >> > Everything passes. Docs look good. Test looks good. >
Re: proposal: psql: psql variable BACKEND_PID
po 6. 2. 2023 v 0:35 odesílatel Corey Huinker napsal: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation:tested, passed > > A small but helpful feature. > > The new status of this patch is: Ready for Committer > Thank you very much Pavel