Re: Patch proposal: New hooks in the connection path
On Mon, Jul 4, 2022 at 6:29 PM Drouvot, Bertrand wrote: > > On 7/2/22 2:49 AM, Roberto Mello wrote: > > On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart > wrote: >> >> That being said, I don't see why this information couldn't be provided in a >> system view. IMO it is generically useful. > > +1 for a system view with appropriate permissions, in addition to the hooks. > > That would make the information easily accessible to a number of monitoring > systems besides the admin. > > Agree about that. Are we going to have it as a part of the shared memory stats? Or as a separate shared memory area for connection stats, exposing these via a function on which a view can be built, like pg_get_replication_slots and pg_replication_slots? > I'll start another thread and propose a dedicated patch for the "internal > counters" and how to expose them. IMHO, let's have the discussion here in this thread and the patch can be 0002. Regards, Bharath Rupireddy.
Re: [PATCH] Completed unaccent dictionary with many missing characters
On Tue, Jun 28, 2022 at 02:14:53PM +0900, Michael Paquier wrote: > Well, the addition of cyrillic does not make necessary the removal of > SOUND RECORDING COPYRIGHT or the DEGREEs, that implies the use of a > dictionary when manipulating the set of codepoints, but that's me > being too picky. Just to say that I am fine with what you are > proposing here. So, I have been looking at the change for cyrillic letters, and are you sure that the range of codepoints [U+0410,U+044f] is right when it comes to considering all those letters as plain letters? There are a couple of characters that itch me a bit with this range: - What of the letters CAPITAL SHORT I (U+0419) and SMALL SHORT I (U+0439)? Shouldn't U+0439 be translated to U+0438 and U+0419 to U+0418? That's what I get while looking at UnicodeData.txt, and it would mean that the range of plain letters should not include either of them. - It seems like we are missing a couple of letters after U+044F, like U+0454, U+0456 or U+0455, just to name three of them. I have extracted from 0001 and applied the parts about the regression tests for degree signs, while adding two more for SOUND RECORDING COPYRIGHT (U+2117) and Black-Letter Capital H (U+210C), which is currently translated to 'x' while it should probably be 'H'. -- Michael
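For what it's worth, the Unicode canonical decompositions back up the SHORT I point above. A quick illustrative check (not part of the patch; requires a UTF-8 database and PostgreSQL 13+ for normalize()):

-- U+0419/U+0439 decompose to И/и plus a combining breve (U+0306),
-- so one would expect unaccent to map them to U+0418/U+0438.
SELECT normalize(U&'\0419', NFD) = U&'\0418' || U&'\0306' AS capital_short_i,
       normalize(U&'\0439', NFD) = U&'\0438' || U&'\0306' AS small_short_i;

Both comparisons return true, which is consistent with treating SHORT I as an accented form rather than a plain letter.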
Re: [PATCH] Add result_types column to pg_prepared_statements view
On Tue, 5 Jul 2022, at 06:34, Peter Eisentraut wrote: > On 01.07.22 14:27, Dagfinn Ilmari Mannsåker wrote: >> Peter Eisentraut writes: >> >>> On 19.05.22 17:34, Dagfinn Ilmari Mannsåker wrote: Prompted by a question on IRC, here's a patch to add a result_types column to the pg_prepared_statements view, so that one can see the types of the columns returned by a prepared statement, not just the parameter types. I'm not quite sure about the column name, suggestions welcome. >>> >>> I think this patch is sensible. > >> The arguments about client-side type-specific value handling for >> RowDescription don't apply here IMO, since this view is more >> user-facing. > > I agree. It's also easy to change if needed. Committed as is. Thanks!
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 5, 2022 at 6:18 AM Andres Freund wrote: > > Hi, > > On 2022-06-16 13:56:55 +0900, Masahiko Sawada wrote: > > diff --git a/src/backend/lib/radixtree.c b/src/backend/lib/radixtree.c > > new file mode 100644 > > index 00..bf87f932fd > > --- /dev/null > > +++ b/src/backend/lib/radixtree.c > > @@ -0,0 +1,1763 @@ > > +/*- > > + * > > + * radixtree.c > > + * Implementation for adaptive radix tree. > > + * > > + * This module employs the idea from the paper "The Adaptive Radix Tree: > > ARTful > > + * Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper, and > > Thomas > > + * Neumann, 2013. > > + * > > + * There are some differences from the proposed implementation. For > > instance, > > + * this radix tree module utilizes AVX2 instruction, enabling us to use > > 256-bit > > + * width SIMD vector, whereas 128-bit width SIMD vector is used in the > > paper. > > + * Also, there is no support for path compression and lazy path expansion. > > The > > + * radix tree supports fixed length of the key so we don't expect the tree > > level > > + * wouldn't be high. > > I think we're going to need path compression at some point, fwiw. I'd bet on > it being beneficial even for the tid case. > > > > + * The key is a 64-bit unsigned integer and the value is a Datum. > > I don't think it's a good idea to define the value type to be a datum. A datum value is convenient to represent both a pointer and a value so I used it to avoid defining node types for inner and leaf nodes separately. Since a datum could be 4 bytes or 8 bytes depending it might not be good for some platforms. But what kind of aspects do you not like the idea of using datum? > > > > +/* > > + * As we descend a radix tree, we push the node to the stack. The stack is > > used > > + * at deletion. > > + */ > > +typedef struct radix_tree_stack_data > > +{ > > + radix_tree_node *node; > > + struct radix_tree_stack_data *parent; > > +} radix_tree_stack_data; > > +typedef radix_tree_stack_data *radix_tree_stack; > > I think it's a very bad idea for traversal to need allocations. I really want > to eventually use this for shared structures (eventually with lock-free > searches at least), and needing to do allocations while traversing the tree is > a no-go for that. > > Particularly given that the tree currently has a fixed depth, can't you just > allocate this on the stack once? Yes, we can do that. > > > +/* > > + * Allocate a new node with the given node kind. > > + */ > > +static radix_tree_node * > > +radix_tree_alloc_node(radix_tree *tree, radix_tree_node_kind kind) > > +{ > > + radix_tree_node *newnode; > > + > > + newnode = (radix_tree_node *) > > MemoryContextAllocZero(tree->slabs[kind], > > + > >radix_tree_node_info[kind].size); > > + newnode->kind = kind; > > + > > + /* update the statistics */ > > + tree->mem_used += GetMemoryChunkSpace(newnode); > > + tree->cnt[kind]++; > > + > > + return newnode; > > +} > > Why are you tracking the memory usage at this level of detail? It's *much* > cheaper to track memory usage via the memory contexts? Since they're dedicated > for the radix tree, that ought to be sufficient? Indeed. I'll use MemoryContextMemAllocated instead. > > > > + else if (idx != n4->n.count) > > + { > > + /* > > + * the key needs to be > > inserted in the middle of the > > + * array, make space for the > > new key. 
> > + */ > > + memmove(&(n4->chunks[idx + > > 1]), &(n4->chunks[idx]), > > + sizeof(uint8) > > * (n4->n.count - idx)); > > + memmove(&(n4->slots[idx + > > 1]), &(n4->slots[idx]), > > + > > sizeof(radix_tree_node *) * (n4->n.count - idx)); > > + } > > Maybe we could add a static inline helper for these memmoves? Both because > it's repetitive (for different node types) and because the last time I looked > gcc was generating quite bad code for this. And having to put workarounds into > multiple places is obviously worse than having to do it in one place. Agreed, I'll update it. > > > > +/* > > + * Insert the key with the val. > > + * > > + * found_p is set to true if the key already present, otherwise false, if > > + * it's not NULL. > > + * > > + * XXX: do we need to support update_if_exists behavior? > > + */ > > Yes, I think that's needed - hence using bfm_set() instead
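To illustrate the inline-helper idea for those repeated memmoves, here is a minimal standalone sketch using generic C types rather than the patch's radix_tree_node internals (names are illustrative only):

#include <stdint.h>
#include <string.h>

/*
 * Open a one-element gap at position idx in a node's parallel
 * chunk/slot arrays by shifting the trailing entries right.
 * "count" is the number of valid entries before the insertion.
 */
static inline void
node_insert_gap(uint8_t *chunks, void **slots, int count, int idx)
{
	memmove(&chunks[idx + 1], &chunks[idx], sizeof(uint8_t) * (count - idx));
	memmove(&slots[idx + 1], &slots[idx], sizeof(void *) * (count - idx));
}

Centralizing the shift like this would also give a single place to put any workaround for the poor code gcc generates for these small memmoves, instead of repeating it per node type.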
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 5, 2022 at 7:00 AM Andres Freund wrote: > > Hi, > > On 2022-06-28 15:24:11 +0900, Masahiko Sawada wrote: > > In both test cases, There is not much difference between using AVX2 > > and SSE2. The more mode types, the more time it takes for loading the > > data (see sse2_4_16_32_128_256). > > Yea, at some point the compiler starts using a jump table instead of branches, > and that turns out to be a good bit more expensive. And even with branches, it > obviously adds hard to predict branches. IIRC I fought a bit with the compiler > to avoid some of that cost, it's possible that got "lost" in Sawada-san's > patch. > > > Sawada-san, what led you to discard the 1 and 16 node types? IIRC the 1 node > one is not unimportant until we have path compression. I wanted to start with a smaller number of node types for simplicity. 16 node type has been added to v4 patch I submitted[1]. I think it's trade-off between better memory and the overhead of growing (and shrinking) the node type. I'm going to add more node types once we turn out based on the benchmark that it's beneficial. > > Right now the node struct sizes are: > 4 - 48 bytes > 32 - 296 bytes > 128 - 1304 bytes > 256 - 2088 bytes > > I guess radix_tree_node_128->isset is just 16 bytes compared to 1288 other > bytes, but needing that separate isset array somehow is sad :/. I wonder if a > smaller "free index" would do the trick? Point to the element + 1 where we > searched last and start a plain loop there. Particularly in an insert-only > workload that'll always work, and in other cases it'll still often work I > think. radix_tree_node_128->isset is used to distinguish between null-pointer in inner nodes and 0 in leaf nodes. So I guess we can have a flag to indicate a leaf or an inner so that we can interpret (Datum) 0 as either null-pointer or 0. Or if we define different data types for inner and leaf nodes probably we don't need it. > One thing I was wondering about is trying to choose node types in > roughly-power-of-two struct sizes. It's pretty easy to end up with significant > fragmentation in the slabs right now when inserting as you go, because some of > the smaller node types will be freed but not enough to actually free blocks of > memory. If we instead have ~power-of-two sizes we could just use a single slab > of the max size, and carve out the smaller node types out of that largest > allocation. You meant to manage memory allocation (and free) for smaller node types by ourselves? How about using different block size for different node types? > > Btw, that fragmentation is another reason why I think it's better to track > memory usage via memory contexts, rather than doing so based on > GetMemoryChunkSpace(). Agreed. > > > > > Ideally, node16 and node32 would have the same code with a different > > > loop count (1 or 2). More generally, there is too much duplication of > > > code (noted by Andres in his PoC), and there are many variable names > > > with the node size embedded. This is a bit tricky to make more > > > general, so we don't need to try it yet, but ideally we would have > > > something similar to: > > > > > > switch (node->kind) // todo: inspect tagged pointer > > > { > > > case RADIX_TREE_NODE_KIND_4: > > >idx = node_search_eq(node, chunk, 4); > > >do_action(node, idx, 4, ...); > > >break; > > > case RADIX_TREE_NODE_KIND_32: > > >idx = node_search_eq(node, chunk, 32); > > >do_action(node, idx, 32, ...); > > > ... 
> > > } > > FWIW, that should be doable with an inline function, if you pass it the memory > to the "array" rather than the node directly. Not so sure it's a good idea to > do dispatch between node types / search methods inside the helper, as you > suggest below: > > > > > static pg_alwaysinline void > > > node_search_eq(radix_tree_node node, uint8 chunk, int16 node_fanout) > > > { > > > if (node_fanout <= SIMPLE_LOOP_THRESHOLD) > > > // do simple loop with (node_simple *) node; > > > else if (node_fanout <= VECTORIZED_LOOP_THRESHOLD) > > > // do vectorized loop where available with (node_vec *) node; > > > ... > > > } Yeah, It's worth trying at some point. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: doc phrase: "inheritance child"
On Thu, Jun 30, 2022 at 6:55 PM Justin Pryzby wrote: > On Fri, May 27, 2022 at 03:22:38PM +0900, Amit Langote wrote: > > On Wed, May 25, 2022 at 1:30 PM Ashutosh Bapat > > wrote: > > > > > > - If true, the stats include inheritance child columns, not just the > > > + If true, the stats include child tables, not just the > > > > > > We are replacing columns with tables; is that intentional? > > > > > > Partitioned tables do not have their own stats, it's just aggregated > > > partition stats. > > > ... > > > - If true, the stats include inheritance child columns, not just the > > > + If true, the stats include child childs, not just the > > > values in the specified relation > > > > > > > > > @@ -13152,7 +13152,7 @@ SELECT * FROM pg_locks pl LEFT JOIN > > > pg_prepared_xacts ppx > > > inherited bool > > > > > > > > > - If true, this row includes inheritance child columns, not just the > > > + If true, this row includes child tables, not just the > > > values in the specified table > > > > > > > > > > > > Replacing inheritance child "column" with "tables", is that intentional? > > > > I was a bit confused by these too, though perhaps the original text is > > not as clear as it could be? Would the following be a good rewrite: > > I updated the language to say "values from". Is this better ? Yes. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Patch proposal: New hooks in the connection path
On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand wrote:
>
> Hi,
>
> On 7/2/22 1:00 AM, Nathan Bossart wrote:
> > Could we model this after fmgr_hook? The first argument in that hook
> > indicates where it is being called from. This doesn't alleviate the need
> > for several calls to the hook in the authentication logic, but extension
> > authors would only need to define one hook.
>
> I like the idea and indeed fmgr.h looks a good place to model it.
>
> Attached a new patch version doing so.

Thanks for the patch. Can we think of enhancing ClientAuthentication_hook_type itself, i.e. make it a generic hook for all sorts of authentication metrics, info etc., with the event type embedded in it, instead of adding a new FailedConnection_hook? We can either add a new parameter for the "event" (existing ClientAuthentication_hook_type implementers will have compatibility problems), or embed/multiplex the "event" info into the existing Port structure or a status variable (macro or enum), in which case existing implementers will not have compatibility problems. IMO, this looks cleaner going forward.

On the v2 patch:

1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h?

2. The timeout handler is a signal handler, called as part of handling SIGALRM. Most of the time, signal handlers ought to be doing small things; now that we are handing off control to a hook, which can do arbitrarily long-running work (writing to remote storage, a file, aggregating, etc.), I don't think it's the right thing to do here.

static void
StartupPacketTimeoutHandler(void)
{
+	if (FailedConnection_hook)
+		(*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
+	ereport(COMMERROR,
+			(errcode(ERRCODE_PROTOCOL_VIOLATION),
+			 errmsg("timeout while processing startup packet")));

3. On "not letting these hooks (ClientAuthentication_hook_type or FailedConnection_hook_type) expose sensitive info via the Port structure" - the Port structure does have sensitive info like HbaLine, host address, username, etc., but that's what it is, so I think we are okay with the structure as-is.

Regards, Bharath Rupireddy.
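To make the suggestion above concrete, a rough sketch of what folding the event into the existing hook might look like (illustrative names only, not from the patch):

/* Hypothetical event discriminator passed to, or alongside, the hook. */
typedef enum ConnectionEventType
{
	CET_AUTHENTICATION_SUCCESS,
	CET_AUTHENTICATION_FAILURE,
	CET_BAD_STARTUP_PACKET,
	CET_STARTUP_PACKET_TIMEOUT
} ConnectionEventType;

/* (a) extend the existing hook: breaks current implementers */
typedef void (*ClientAuthentication_hook_type) (Port *port, int status,
												ConnectionEventType event);

/* (b) keep the current (Port *, int) signature and stash the
 * ConnectionEventType in Port (or a global status variable) before
 * calling the hook: source-compatible with existing implementers. */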
Re: typos
On Thu, Apr 14, 2022 at 08:56:22AM +1200, David Rowley wrote: > 0007: Not pushed. No space after comment and closing */ pgindent > fixed one of these but not the other 2. I've not looked into why > pgindent does 1 and not the other 2. > -/* get operation priority by its code*/ > +/* get operation priority by its code */ pgindent never touches comments that start in column zero. (That's why many column-0 comments are wrapped to widths other than the standard 78.)
Re: [PoC] Improve dead tuple storage for lazy vacuum
Hi, On 2022-07-05 16:33:17 +0900, Masahiko Sawada wrote: > On Tue, Jul 5, 2022 at 6:18 AM Andres Freund wrote: > A datum value is convenient to represent both a pointer and a value so > I used it to avoid defining node types for inner and leaf nodes > separately. I'm not convinced that's a good goal. I think we're going to want to have different key and value types, and trying to unify leaf and inner nodes is going to make that impossible. Consider e.g. using it for something like a buffer mapping table - your key might be way too wide to fit it sensibly into 64bit. > Since a datum could be 4 bytes or 8 bytes depending it might not be good for > some platforms. Right - thats another good reason why it's problematic. A lot of key types aren't going to be 4/8 bytes dependent on 32/64bit, but either / or. > > > +void > > > +radix_tree_insert(radix_tree *tree, uint64 key, Datum val, bool *found_p) > > > +{ > > > + int shift; > > > + boolreplaced; > > > + radix_tree_node *node; > > > + radix_tree_node *parent = tree->root; > > > + > > > + /* Empty tree, create the root */ > > > + if (!tree->root) > > > + radix_tree_new_root(tree, key, val); > > > + > > > + /* Extend the tree if necessary */ > > > + if (key > tree->max_val) > > > + radix_tree_extend(tree, key); > > > > FWIW, the reason I used separate functions for these in the prototype is > > that > > it turns out to generate a lot better code, because it allows non-inlined > > function calls to be sibling calls - thereby avoiding the need for a > > dedicated > > stack frame. That's not possible once you need a palloc or such, so > > splitting > > off those call paths into dedicated functions is useful. > > Thank you for the info. How much does using sibling call optimization > help the performance in this case? I think that these two cases are > used only a limited number of times: inserting the first key and > extending the tree. It's not that it helps in the cases moved into separate functions - it's that not having that code in the "normal" paths keeps the normal path faster. Greetings, Andres Freund
Re: [PoC] Improve dead tuple storage for lazy vacuum
Hi, On 2022-07-05 16:33:29 +0900, Masahiko Sawada wrote: > > One thing I was wondering about is trying to choose node types in > > roughly-power-of-two struct sizes. It's pretty easy to end up with > > significant > > fragmentation in the slabs right now when inserting as you go, because some > > of > > the smaller node types will be freed but not enough to actually free blocks > > of > > memory. If we instead have ~power-of-two sizes we could just use a single > > slab > > of the max size, and carve out the smaller node types out of that largest > > allocation. > > You meant to manage memory allocation (and free) for smaller node > types by ourselves? For all of them basically. Using a single slab allocator and then subdividing the "common block size" into however many chunks that fit into a single node type. > How about using different block size for different node types? Not following... Greetings, Andres Freund
Re: Handle infinite recursion in logical replication setup
On Mon, Jul 4, 2022 at 11:33 PM vignesh C wrote: > > On Mon, Jul 4, 2022 at 7:44 PM Alvaro Herrera wrote: > > > > > From d8f8844f877806527b6f3f45320b6ba55a8e3154 Mon Sep 17 00:00:00 2001 > > > From: Vigneshwaran C > > > Date: Thu, 26 May 2022 19:29:33 +0530 > > > Subject: [PATCH v27 1/4] Add a missing test to verify only-local > > > parameter in > > > test_decoding plugin. > > > > > > Add a missing test to verify only-local parameter in test_decoding plugin. > > > > I don't get it. replorigin.sql already has some lines to test > > local-only. What is your patch adding that is new? Maybe instead of > > adding some more lines at the end of the script, you should add lines > > where this stuff is already being tested. But that assumes that there > > is something new that is being tested; if so what is it? > > The test is to check that remote origin data (i.e. replication origin > being set) will be filtered when only-local parameter is set. I felt > that this scenario is not covered, unless I'm missing something. > If we change only-local option to '0' in the below part of the test, we can see a different result: -- and magically the replayed xact will be filtered! SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'only-local', '1'); So, I think only-local functionality is being tested but I feel your test is clear in the sense that it clearly shows that remote data can only be fetched with 'only-local' as '0' when the origin is set. It seems to me that in existing tests, we are not testing the combination where the origin is set and 'only-local' is '0'. So, there is some value to having the test you are proposing but if Alvaro, you, or others don't think it is worth adding a new test for it then we can probably drop this one. -- With Regards, Amit Kapila.
Wrong provolatile value for to_timestamp (1 argument)
I found that the provolatile attribute of to_timestamp in pg_proc is wrong:

test=# select provolatile, proargtypes from pg_proc where proname = 'to_timestamp' and proargtypes[0] = 701;
 provolatile | proargtypes
-------------+-------------
 i           | 701
(1 row)

'i' (immutable) is clearly wrong since the function's return value can change depending on the time zone settings. Actually the manual says functions depending on time zone settings should be labeled STABLE. https://www.postgresql.org/docs/14/xfunc-volatility.html "A common error is to label a function IMMUTABLE when its results depend on a configuration parameter. For example, a function that manipulates timestamps might well have results that depend on the TimeZone setting. For safety, such functions should be labeled STABLE instead." It is interesting that the two-argument form of to_timestamp has the correct attribute value ('s': stable) for provolatile in pg_proc. Do we want to fix this for PG16? I think it's too late for 15. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
[PATCH] Optional OR REPLACE in CREATE OPERATOR statement
Hello hackers, It seems useful to have [OR REPLACE] option in CREATE OPERATOR statement, as in CREATE FUNCTION. This option may be good for writing extension update scripts, to avoid errors with re-creating the same operator. Because of cached query plans, only RESTRICT and JOIN options can be changed for existing operator, as in ALTER OPERATOR statement. (discussed here: https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell ) The attached patch will be proposed for September CF. Best regards, -- Svetlana Derevyanko Postgres Professional: http://www.postgrespro.com The Russian Postgres CompanyFrom 7398a8a14f29a6bbb59117a3e3059231f0b476d9 Mon Sep 17 00:00:00 2001 From: Svetlana Derevyanko Date: Tue, 14 Jun 2022 08:40:51 +0300 Subject: [PATCH v1] Add optional [OR REPLACE] in CREATE OPERATOR statement. As in ALTER OPERATOR, only restrict and join params can be modified, because of the cached query plans. --- doc/src/sgml/ref/create_operator.sgml | 11 +- src/backend/catalog/pg_operator.c | 121 +++--- src/backend/commands/operatorcmds.c | 6 +- src/backend/parser/gram.y | 12 ++ src/backend/tcop/utility.c| 3 +- src/include/catalog/pg_operator.h | 3 +- src/include/commands/defrem.h | 2 +- src/test/regress/expected/create_operator.out | 94 ++ src/test/regress/sql/create_operator.sql | 101 +++ 9 files changed, 325 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/ref/create_operator.sgml b/doc/src/sgml/ref/create_operator.sgml index e27512ff39..0e03108876 100644 --- a/doc/src/sgml/ref/create_operator.sgml +++ b/doc/src/sgml/ref/create_operator.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -CREATE OPERATOR name ( +CREATE [ OR REPLACE ] OPERATOR name ( {FUNCTION|PROCEDURE} = function_name [, LEFTARG = left_type ] [, RIGHTARG = right_type ] [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ] @@ -36,7 +36,8 @@ CREATE OPERATOR name ( CREATE OPERATOR defines a new operator, - name. The user who + name. CREATE OR REPLACE OPERATOR + will either create a new operator, or replace an existing definition. The user who defines an operator becomes its owner. If a schema name is given then the operator is created in the specified schema. Otherwise it is created in the current schema. @@ -114,6 +115,12 @@ CREATE OPERATOR name ( as EXECUTE privilege on the underlying function. If a commutator or negator operator is specified, you must own these operators. + + + When CREATE OR REPLACE OPERATOR is used to replace an + existing operator, only res_proc + and join_proc can be changed. + diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c index 630bf3e56c..eca235f735 100644 --- a/src/backend/catalog/pg_operator.c +++ b/src/backend/catalog/pg_operator.c @@ -303,6 +303,7 @@ OperatorShellMake(const char *operatorName, * joinId X join selectivity procedure ID * canMergemerge join can be used with this operator * canHash hash join can be used with this operator + * replace replace operator if exists * * The caller should have validated properties and permissions for the * objects passed as OID references. 
We must handle the commutator and @@ -334,7 +335,8 @@ OperatorCreate(const char *operatorName, Oid restrictionId, Oid joinId, bool canMerge, - bool canHash) + bool canHash, + bool replace) { Relation pg_operator_desc; HeapTuple tup; @@ -415,12 +417,18 @@ OperatorCreate(const char *operatorName, rightTypeId, &operatorAlreadyDefined); - if (operatorAlreadyDefined) + if (operatorAlreadyDefined && !replace) ereport(ERROR, (errcode(ERRCODE_DUPLICATE_FUNCTION), errmsg("operator %s already exists", operatorName))); + /* + * No such operator yet, so CREATE OR REPLACE is equivalent to CREATE + */ + if (!OidIsValid(operatorObjectId) && replace) + replace = false; + /* * At this point, if operatorObjectId is not InvalidOid then we are * filling in a previously-created shell. Insist that the user own any @@ -431,6 +439,59 @@ OperatorCreate(const char *operatorName, aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_OPERATOR, operatorName); + /* + * When operator is updated, + * params others than RESTRICT and JOIN should remain the same. + */ + if (replace) + { + Form_pg_operator oprForm; + + tup = SearchSysCache4(OPERNAMENSP, + PointerGetDatum(operatorName), + ObjectIdGetDatum(leftTypeId), + ObjectIdGetDatum(rightTypeId), + ObjectIdGetDatum(operatorNamespace)); + oprForm = (Form_pg_operator) GETSTRUCT(tup); + + if (oprForm->oprcanmerge != canMerge) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("operator attribute \"merges\" cannot
Re: logical replication restrictions
Here are some review comments for your v4-0001 patch. I hope they are useful for you. == 1. General This thread name "logical replication restrictions" seems quite unrelated to the patch here. Maybe it's better to start a new thread otherwise nobody is going to recognise what this thread is really about. == 2. Commit message Similar to physical replication, a time-delayed copy of the data for logical replication is useful for some scenarios (specially to fix errors that might cause data loss). "specially" -> "particularly" ? ~~~ 3. Commit message Maybe take some examples from the regression tests to show usage of the new parameter == 4. doc/src/sgml/catalogs.sgml + + + subapplydelay int8 + + + Delay the application of changes by a specified amount of time. + + I think this should say that the units are ms. == 5. doc/src/sgml/ref/create_subscription.sgml + +min_apply_delay (integer) + Is the "integer" type here correct? It might eventually be stored as an integer, but IIUC (going by the tests) from the user point-of-view this parameter is really "text" type for representing ms or interval, right? ~~~ 6. doc/src/sgml/ref/create_subscription.sgml Similar + to the physical replication feature + (), it may be useful to + have a time-delayed copy of data for logical replication. SUGGESTION As with the physical replication feature (recovery_min_apply_delay), it can be useful for logical replication to delay the data replication. ~~~ 7. doc/src/sgml/ref/create_subscription.sgml Delays in logical + decoding and in transfer the transaction may reduce the actual wait + time. SUGGESTION Time spent in logical decoding and in transferring the transaction may reduce the actual wait time. ~~~ 8. doc/src/sgml/ref/create_subscription.sgml If the system clocks on publisher and subscriber are not + synchronized, this may lead to apply changes earlier than expected. Why just say "earlier than expected"? If the publisher's time is ahead of the subscriber then the changes might also be *later* than expected, right? So, perhaps it is better to just say "other than expected". ~~~ 9. doc/src/sgml/ref/create_subscription.sgml Should there also be a big warning box about the impact if using synchronous_commit (like the other streaming replication page has this warning)? ~~~ 10. doc/src/sgml/ref/create_subscription.sgml I think there should be some examples somewhere showing how to specify this parameter. Maybe they are better added somewhere in "31.2 Subscription" and xrefed from here. == 11. src/backend/commands/subscriptioncmds.c - parse_subscription_options I think there should be a default assignment to 0 (done where all the other supported option defaults are set) ~~~ 12. src/backend/commands/subscriptioncmds.c - parse_subscription_options + if (opts->min_apply_delay < 0) + ereport(ERROR, + errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("option \"%s\" must not be negative", "min_apply_delay")); + I thought this check only needs to be do within scope of the preceding if - (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) && strcmp(defel->defname, "min_apply_delay") == 0) == 13. src/backend/commands/subscriptioncmds.c - AlterSubscription @@ -1093,6 +1126,17 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, if (opts.enabled) ApplyLauncherWakeupAtCommit(); + /* + * If this subscription has been disabled and it has an apply + * delay set, wake up the logical replication worker to finish + * it as soon as possible. 
+ */ + if (!opts.enabled && sub->applydelay > 0) I did not really understand the logic why should the min_apply_delay override the enabled=false? It is a called *minimum* delay so if it ends up being way over the parameter value (because the subscription is disabled) then why does that matter? == 14. src/backend/replication/logical/worker.c @@ -252,6 +252,7 @@ WalReceiverConn *LogRepWorkerWalRcvConn = NULL; Subscription *MySubscription = NULL; static bool MySubscriptionValid = false; +TimestampTz MySubscriptionMinApplyDelayUntil = 0; Looking at the only usage of this variable (in apply_delay) and how it is used I did see why this cannot just be a local member of the apply_delay function? ~~~ 15. src/backend/replication/logical/worker.c - apply_delay +/* + * Apply the informed delay for the transaction. + * + * A regular transaction uses the commit time to calculate the delay. A + * prepared transaction uses the prepare time to calculate the delay. + */ +static void +apply_delay(TimestampTz ts) I didn't think it needs to mention here about the different kinds of transactions because where it comes from has nothing really to do with this function's logic. ~~~ 16. src/backend/replication/logical/worker.c - apply_delay Refer to comment #14 about MySubscriptionMinApplyDelayUntil. ~~~ 17. s
Re: Wrong provolatile value for to_timestamp (1 argument)
On Tue, 2022-07-05 at 17:29 +0900, Tatsuo Ishii wrote: > I found that provolatile attribute of to_timestamp in pg_proc is > wrong: > > test=# select provolatile, proargtypes from pg_proc where proname = > 'to_timestamp' and proargtypes[0] = 701; > provolatile | proargtypes > -+- > i | 701 > (1 row) > > 'i' (immutable) is clearly wrong since the function's return value can be > changed depending on the time zone settings. Are you sure? I'd say that "to_timestamp(double precision)" always produces the same timestamp for the same argument. What changes with the setting of "timezone" is how that timestamp is converted to a string, but that's a different affair. Yours, Laurenz Albe
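A quick illustration of that distinction (the returned timestamptz is the same instant; only its text representation follows the session's timezone setting):

SET timezone = 'UTC';
SELECT to_timestamp(0);            -- 1970-01-01 00:00:00+00

SET timezone = 'America/New_York';
SELECT to_timestamp(0);            -- 1969-12-31 19:00:00-05, the same instant

-- The value itself is unaffected by the timezone setting:
SELECT to_timestamp(0) = TIMESTAMPTZ '1970-01-01 00:00:00+00';   -- true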
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Jul 4, 2022 at 12:07 PM Masahiko Sawada wrote: > > Looking at the node stats, and then your benchmark code, I think key > > construction is a major influence, maybe more than node type. The > > key/value scheme tested now makes sense: > > > > blockhi || blocklo || 9 bits of item offset > > > > (with the leaf nodes containing a bit map of the lowest few bits of > > this whole thing) > > > > We want the lower fanout nodes at the top of the tree and higher > > fanout ones at the bottom. > > So more inner nodes can fit in CPU cache, right? My thinking is, on average, there will be more dense space utilization in the leaf bitmaps, and fewer inner nodes. I'm not quite sure about cache, since with my idea a search might have to visit more nodes to get the common negative result (indexed tid not found in vacuum's list). > > Note some consequences: If the table has enough columns such that much > > fewer than 100 tuples fit on a page (maybe 30 or 40), then in the > > dense case the nodes above the leaves will have lower fanout (maybe > > they will fit in a node32). Also, the bitmap values in the leaves will > > be more empty. In other words, many tables in the wild *resemble* the > > sparse case a bit, even if truly all tuples on the page are dead. > > > > Note also that the dense case in the benchmark above has ~4500 times > > more keys than the sparse case, and uses about ~1000 times more > > memory. But the runtime is only 2-3 times longer. That's interesting > > to me. > > > > To optimize for the sparse case, it seems to me that the key/value would be > > > > blockhi || 9 bits of item offset || blocklo > > > > I believe that would make the leaf nodes more dense, with fewer inner > > nodes, and could drastically speed up the sparse case, and maybe many > > realistic dense cases. > > Does it have an effect on the number of inner nodes? > > > I'm curious to hear your thoughts. > > Thank you for your analysis. It's worth trying. We use 9 bits for item > offset but most pages don't use all bits in practice. So probably it > might be better to move the most significant bit of item offset to the > left of blockhi. Or more simply: > > 9 bits of item offset || blockhi || blocklo A concern here is most tids won't use many bits in blockhi either, most often far fewer, so this would make the tree higher, I think. Each value of blockhi represents 0.5GB of heap (32TB max). Even with very large tables I'm guessing most pages of interest to vacuum are concentrated in a few of these 0.5GB "segments". And it's possible path compression would change the tradeoffs here. -- John Naylor EDB: http://www.enterprisedb.com
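To make the two key layouts being compared concrete, here is a standalone sketch using plain C types (in PostgreSQL these would be BlockNumber and OffsetNumber; this is not the patch's actual encoding code):

#include <stdint.h>

/* Current layout: blockhi || blocklo || 9 bits of item offset. */
static inline uint64_t
tid_key_offset_low(uint32_t block, uint16_t offset)
{
	return ((uint64_t) block << 9) | (offset & 0x1FF);
}

/*
 * Alternative discussed above for the sparse case:
 * blockhi || 9 bits of item offset || blocklo,
 * splitting the 32-bit block number into 16-bit halves.
 */
static inline uint64_t
tid_key_offset_middle(uint32_t block, uint16_t offset)
{
	return ((uint64_t) (block >> 16) << 25) |
		   ((uint64_t) (offset & 0x1FF) << 16) |
		   (block & 0xFFFF);
}

Whether the second layout actually yields denser leaves (and fewer inner nodes) for realistic vacuum workloads is exactly what the benchmarks above are meant to answer.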
Re: [PoC] Reducing planning time when tables have many partitions
Dear Tom,

Thank you for replying to my email.

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane wrote:
> I'm not real thrilled with trying to throw hashtables at the problem,
> though. As David noted, they'd be counterproductive for simple
> queries.

As you said, my approach that utilizes hash tables has some overheads, leading to degradation in query planning time. I tested the degradation by a brief experiment. In this experiment, I used a simple query shown below.

===
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
===

Here, students, scores, and gpas tables have no partitions, i.e., they are regular tables. Therefore, my techniques do not work for this query and instead may lead to some regression. I repeatedly issued this query 1 million times and measured their planning times. The attached figure describes the distribution of the planning times. The figure indicates that my patch has no severe negative impacts on the planning performance. However, there seems to be a slight degradation. I show the mean and median of planning times below. With my patch, the planning time became 0.002-0.004 milliseconds slower. We have to deal with this problem, but reducing time complexity while keeping degradation to zero is significantly challenging.

Planning time (ms) |  Mean | Median
-------------------+-------+-------
Master             | 0.682 |  0.674
Patched            | 0.686 |  0.676
-------------------+-------+-------
Degradation        | 0.004 |  0.002

Of course, the attached result is just an example. Significant regression might occur in other types of queries.

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?
>
> Perhaps the bms_is_subset class could be handled in a similar
> way, ie do a one-time pass to make a List of all EquivalenceMembers
> that use a RelOptInfo.

Thank you for giving your idea. I will try to polish up my algorithm based on your suggestion.

--
Best regards,
Yuya Watari
Re: [PATCH] Add result_types column to pg_prepared_statements view
On 05.07.22 09:31, Dagfinn Ilmari Mannsåker wrote: On Tue, 5 Jul 2022, at 06:34, Peter Eisentraut wrote: On 01.07.22 14:27, Dagfinn Ilmari Mannsåker wrote: Peter Eisentraut writes: On 19.05.22 17:34, Dagfinn Ilmari Mannsåker wrote: Prompted by a question on IRC, here's a patch to add a result_types column to the pg_prepared_statements view, so that one can see the types of the columns returned by a prepared statement, not just the parameter types. I'm not quite sure about the column name, suggestions welcome. I think this patch is sensible. The arguments about client-side type-specific value handling for RowDescription don't apply here IMO, since this view is more user-facing. I agree. It's also easy to change if needed. Committed as is. Thanks! There was a problem that we didn't cover: Not all prepared statements have result descriptors (e.g., DML statements), so that would crash as written. I have changed it to return null for result_types in that case, and added a test case.
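For reference, a quick example of what the view shows with this change, including the new null case for statements without a result descriptor (output sketched approximately in the comments):

CREATE TEMP TABLE t (id int, v text);
PREPARE sel (int) AS SELECT id, v FROM t WHERE id = $1;
PREPARE ins (int, text) AS INSERT INTO t VALUES ($1, $2);

SELECT name, parameter_types, result_types
FROM pg_prepared_statements ORDER BY name;
--  name | parameter_types | result_types
-- ------+-----------------+----------------
--  ins  | {integer,text}  | NULL            (plain DML has no result descriptor)
--  sel  | {integer}       | {integer,text}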
Re: [PATCH] Add result_types column to pg_prepared_statements view
Peter Eisentraut writes: > There was a problem that we didn't cover: Not all prepared statements > have result descriptors (e.g., DML statements), so that would crash as > written. D'oh! > I have changed it to return null for result_types in that case, and > added a test case. Thanks for spotting and fixing that. - ilmari
Re: Wrong provolatile value for to_timestamp (1 argument)
> Are you sure? I'd say that "to_timestamp(double precision)" always > produces the same timestamp for the same argument. What changes with > the setting of "timezone" is how that timestamp is converted to a > string, but that's a different affair. Of course the internal representation of the timestamp with time zone data type is not affected by the time zone setting. But then why is the other form of to_timestamp labeled as stable? If your theory is correct, shouldn't the other form of to_timestamp be labeled immutable as well? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: Schema variables - new implementation for Postgres 15
Hi Pavel, On Tue, Jul 05, 2022 at 08:42:09AM +0200, Pavel Stehule wrote: > Hi > > fresh rebase + type check. Before returning any value, the related type is > checked if it is valid still Great news, thanks a lot for continuing to work on it! I've been on PTO since last Friday, but I'm planning to start reviewing this patch as soon as I come back. It might take a while as my knowledge of this patch is a bit blurry, but hopefully it shouldn't take too long.
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Mon, Jul 4, 2022 at 6:12 PM Amit Kapila wrote: > > On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada > wrote: > > > > I've attached three POC patches: > > > > I think it will be a good idea if you can add a short commit message > at least to say which patch is proposed for HEAD and which one is for > back branches. Also, it would be good if you can add some description > of the fix in the commit message. Let's remove poc* from the patch > name. > > Review poc_add_running_catchanges_xacts_to_serialized_snapshot > = Few more comments: 1. + + /* This array must be sorted in xidComparator order */ + TransactionId *xip; + } catchanges; }; This array contains the transaction ids for subtransactions as well. I think it is better mention the same in comments. 2. Are we anytime removing transaction ids from catchanges->xip array? If not, is there a reason for the same? I think we can remove it either at commit/abort or even immediately after adding the xid/subxid to committed->xip array. 3. + if (readBytes != sz) + { + int save_errno = errno; + + CloseTransientFile(fd); + + if (readBytes < 0) + { + errno = save_errno; + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not read file \"%s\": %m", path))); + } + else + ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("could not read file \"%s\": read %d of %zu", + path, readBytes, sz))); + } This is the fourth instance of similar error handling code in SnapBuildRestore(). Isn't it better to extract this into a separate function? 4. +TransactionId * +ReorderBufferGetCatalogChangesXacts(ReorderBuffer *rb, size_t *xcnt_p) +{ + HASH_SEQ_STATUS hash_seq; + ReorderBufferTXNByIdEnt *ent; + TransactionId *xids; + size_t xcnt = 0; + size_t xcnt_space = 64; /* arbitrary number */ + + xids = (TransactionId *) palloc(sizeof(TransactionId) * xcnt_space); + + hash_seq_init(&hash_seq, rb->by_txn); + while ((ent = hash_seq_search(&hash_seq)) != NULL) + { + ReorderBufferTXN *txn = ent->txn; + + if (!rbtxn_has_catalog_changes(txn)) + continue; It would be better to allocate memory the first time we have to store xids. There is a good chance that many a time this function will do just palloc without having to store any xid. 5. Do you think we should do some performance testing for a mix of ddl/dml workload to see if it adds any overhead in decoding due to serialize/restore doing additional work? I don't think it should add some meaningful overhead but OTOH there is no harm in doing some testing of the same. -- With Regards, Amit Kapila.
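Regarding point 4 above, the loop could defer the allocation until the first matching transaction, roughly like this (an illustrative fragment based on the quoted excerpt, not a finished patch):

	TransactionId *xids = NULL;
	size_t		xcnt = 0;
	size_t		xcnt_space = 0;

	hash_seq_init(&hash_seq, rb->by_txn);
	while ((ent = hash_seq_search(&hash_seq)) != NULL)
	{
		ReorderBufferTXN *txn = ent->txn;

		if (!rbtxn_has_catalog_changes(txn))
			continue;

		/* Allocate or grow only once we actually have something to store. */
		if (xcnt == xcnt_space)
		{
			xcnt_space = (xcnt_space == 0) ? 64 : xcnt_space * 2;
			xids = (xids == NULL)
				? palloc(sizeof(TransactionId) * xcnt_space)
				: repalloc(xids, sizeof(TransactionId) * xcnt_space);
		}
		xids[xcnt++] = txn->xid;
	}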
Re: Making Vars outer-join aware
On Sat, Jul 2, 2022 at 12:42 AM Tom Lane wrote: > > Anyway, even though this is far from done, I'm pretty pleased with > the results so far, so I thought I'd put it out for review by > anyone who cares to take a look. I'll add it to the September CF > in hopes that it might be more or less finished by then, and so > that the cfbot will check it out. > Thanks for the work! I have a question about qual clause placement. For the query in the example SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z) (foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2 scan level. Previously we do that with check_outerjoin_delay() by scanning all the outer joins below and check if the qual references any nullable rels of the OJ, and if so include the OJ's rels into the qual. So as a result we'd get that foo(t2.z) is referencing t1 and t2, and we'd put the qual into the join lists of t1 and t2. Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT JOIN below (with RTI 3). So we consider the qual is referencing RTE 2 (which is t2) and RTE 3 (which is the OJ). Do we still need to include RTE 1, i.e. t1 into the qual's required relids? How should we do that? Thanks Richard
Re: Schema variables - new implementation for Postgres 15
út 5. 7. 2022 v 12:50 odesílatel Julien Rouhaud napsal: > Hi Pavel, > > On Tue, Jul 05, 2022 at 08:42:09AM +0200, Pavel Stehule wrote: > > Hi > > > > fresh rebase + type check. Before returning any value, the related type > is > > checked if it is valid still > > Great news, thanks a lot for keeping working on it! I'm still in PTO since > last Friday, but I'm planning to start reviewing this patch as soon as I > come > back. It might take a while as my knowledge of this patch are a bit > blurry but > hopefully it shouldn't take too long. > Thank you Pavel
Re: CFM Manager
On Tue, Jul 5, 2022 at 8:50 AM Michael Paquier wrote: > On Tue, Jul 05, 2022 at 08:17:26AM +0500, Ibrar Ahmed wrote: > > If nobody has already volunteered for the next upcoming commitfest. > > I'd like to volunteer. I think early to say is better as always. > > Jacob and Daniel have already volunteered. Based on the number of > patches at hand (305 in total), getting more help is always welcome, I > guess. > -- > Michael > I am happy to help, but I am talking about the next one. -- Ibrar Ahmed
Re: pg15b2: large objects lost on upgrade
I was able to reproduce the issue. Also, the issue does not occur with code before to preserve relfilenode commit. I tested your patch and it fixes the problem. I am currently analyzing a few things related to the issue. I will come back once my analysis is completed. On Sat, Jul 2, 2022 at 9:19 PM Justin Pryzby wrote: > On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote: > > On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby > wrote: > > > I noticed this during beta1, but dismissed the issue when it wasn't > easily > > > reproducible. Now, I saw the same problem while upgrading from beta1 > to beta2, > > > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL > was run. > > > > Yikes. That's really bad, and I have no idea what might be causing it, > > either. I'll plan to investigate this on Tuesday unless someone gets > > to it before then. > > I suppose it's like Bruce said, here. > > https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us > > |One tricky case is pg_largeobject, which is copied from the old to new > |cluster since it has user data. To preserve that relfilenode, you would > |need to have pg_upgrade perform cluster surgery in each database to > |renumber its relfilenode to match since it is created by initdb. I > |can't think of a case where pg_upgrade already does something like that. > > Rather than setting the filenode of the next object as for user tables, > pg-upgrade needs to UPDATE the relfilenode. > > This patch "works for me" but feel free to improve on it. >
Re: doc: BRIN indexes and autosummarize
On 2022-Jul-04, Jaime Casanova wrote: > I feel that somewhere in this paragraph it should be mentioned that is > off by default. OK, I added it. On 2022-Jul-04, Justin Pryzby wrote: > [ lots of comments ] OK, I have adopted all your proposed changes, thanks for submitting in both forms. I did some more wordsmithing and pushed, to branches 12 and up. 11 fails 'make check', I think for lack of Docbook id tags, and I didn't want to waste more time. Kindly re-read the result and let me know if I left something unaddressed, or made something worse. The updated text is already visible in the website: https://www.postgresql.org/docs/devel/brin-intro.html (Having almost-immediate doc refreshes is an enormous improvement. Thanks Magnus.) > Also, a reminder that this was never addressed (I wish the project had a way > to > keep track of known issues). > > https://www.postgresql.org/message-id/20201113160007.gq30...@telsasoft.com > |error_severity of brin work item Yeah, I've not forgotten that item. I can't promise I'll get it fixed soon, but it's on my list. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)
Re: replacing role-level NOINHERIT with a grant-level option
On Sun, Jul 3, 2022 at 1:17 PM Nathan Bossart wrote: > If by "bolder" you mean "mark [NO]INHERIT as deprecated-and-to-be-removed > and begin emitting WARNINGs when it and WITH INHERIT DEFAULT are used," I > think it's worth consideration. I suspect it will be hard to sell removing > [NO]INHERIT in v16 because it would introduce a compatibility break without > giving users much time to migrate. I could be wrong, though. It's a fair point. But, if our goal for v16 is to do something that could lead to an eventual deprecation of [NO]INHERIT, I still think removing WITH INHERIT DEFAULT from the patch set is probably a good idea. Perhaps then we could document that we recommend using the grant-level option rather than setting NOINHERIT on the role, and if users were to follow that advice, eventually no one would be relying on the role-level property anymore. It might be optimistic to assume that users will read the documentation, let alone follow it, but it's something. Now, that does mean accepting a compatibility break now, in that flipping the role-level setting would no longer affect existing grants, but it's less of a compatibility break than I proposed originally, so maybe it's acceptable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: logical replication restrictions
On Tue, Jul 5, 2022 at 2:12 PM Peter Smith wrote: > > Here are some review comments for your v4-0001 patch. I hope they are > useful for you. > > == > > 1. General > > This thread name "logical replication restrictions" seems quite > unrelated to the patch here. Maybe it's better to start a new thread > otherwise nobody is going to recognise what this thread is really > about. > +1. > > 17. src/backend/replication/logical/worker.c - apply_handle_stream_prepare > > @@ -1090,6 +1146,19 @@ apply_handle_stream_prepare(StringInfo s) > > elog(DEBUG1, "received prepare for streamed transaction %u", > prepare_data.xid); > > + /* > + * Should we delay the current prepared transaction? > + * > + * Although the delay is applied in BEGIN PREPARE messages, streamed > + * prepared transactions apply the delay in a STREAM PREPARE message. > + * That's ok because no changes have been applied yet > + * (apply_spooled_messages() will do it). > + * The STREAM START message does not contain a prepare time (it will be > + * available when the in-progress prepared transaction finishes), hence, it > + * was not possible to apply a delay at that time. > + */ > + apply_delay(prepare_data.prepare_time); > + > > It seems to rely on the spooling happening at the end. But won't this > cause a problem later when/if the "parallel apply" patch [1] is pushed > and the stream bgworkers are doing stuff on the fly instead of > spooling at the end? > I wonder why we don't apply the delay on commit/commit_prepared records only similar to physical replication. See recoveryApplyDelay. One more advantage would be then we don't need to worry about transactions that we are going to skip due SKIP feature for subscribers. One more thing that might be worth discussing is whether introducing a new subscription parameter for this feature is a better idea or can we use guc (either an existing or a new one). Users may want to set this only for a particular subscription or set of subscriptions in which case it is better to have this as a subscription level parameter. OTOH, I was slightly worried that if this will be used for all subscriptions on a subscriber then it will be burdensome for users. -- With Regards, Amit Kapila.
Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
I have pushed this to all three branches. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "It takes less than 2 seconds to get to 78% complete; that's a good sign. A few seconds later it's at 90%, but it seems to have stuck there. Did somebody make percentages logarithmic while I wasn't looking?" http://smylers.hates-software.com/2005/09/08/1995c749.html
Re: Support load balancing in libpq
On Fri, Jun 10, 2022 at 10:01 PM Jelte Fennema wrote: > > Load balancing connections across multiple read replicas is a pretty > common way of scaling out read queries. There are two main ways of doing > so, both with their own advantages and disadvantages: > 1. Load balancing at the client level > 2. Load balancing by connecting to an intermediary load balancer > > Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#) > merged support about a year ago. This patch adds the same functionality > to libpq. The way it's implemented is the same as the implementation of > JDBC, and contains two levels of load balancing: > 1. The given hosts are randomly shuffled, before resolving them > one-by-one. > 2. Once a host its addresses get resolved, those addresses are shuffled, > before trying to connect to them one-by-one. Thanks for the patch. +1 for the general idea of redirecting connections. I'm quoting a previous attempt by Satyanarayana Narlapuram on this topic [1], it also has a patch set. IMO, rebalancing of the load must be based on parameters (as also suggested by Aleksander Alekseev in this thread) such as the read-only/write queries, CPU/IO/Memory utilization of the primary/standby, network distance etc. We may not have to go the extra mile to determine all of these parameters dynamically during query authentication time, but we can let users provide a list of standby hosts based on "some" priority (Satya's thread [1] attempts to do this, in a way, with users specifying the hosts via pg_hba.conf file). If required, randomization in choosing the hosts can be optional. Also, IMO, the solution must have a fallback mechanism if the standby/chosen host isn't reachable. Few thoughts on the patch: 1) How are we determining if the submitted query is read-only or write? 2) What happens for explicit transactions? The queries related to the same txn get executed on the same host right? How are we guaranteeing this? 3) Isn't it good to provide a way to test the patch? [1] https://www.postgresql.org/message-id/flat/CY1PR21MB00246DE1F9E9C58455A78A37915C0%40CY1PR21MB0024.namprd21.prod.outlook.com Regards, Bharath Rupireddy.
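On question 3 above, testing from the client side could be as simple as repeatedly connecting with a multi-host connection string and checking which host each connection lands on. A minimal sketch (the load-balancing option name here is assumed from the proposal, not final):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* "load_balance_hosts=random" is the assumed name of the proposed knob. */
	const char *conninfo =
		"host=replica1,replica2,replica3 dbname=postgres "
		"load_balance_hosts=random";

	for (int i = 0; i < 5; i++)
	{
		PGconn	   *conn = PQconnectdb(conninfo);

		if (PQstatus(conn) == CONNECTION_OK)
			printf("attempt %d connected to %s\n", i, PQhost(conn));
		else
			fprintf(stderr, "attempt %d failed: %s", i, PQerrorMessage(conn));
		PQfinish(conn);
	}
	return 0;
}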
Re: Temporary file access API
Peter Eisentraut wrote: > On 04.07.22 18:35, Antonin Houska wrote: > >> Attached is a new version of the patch, to evaluate what the API use in the > >> backend could look like. I haven't touched places where the file is > >> accessed > >> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is > >> called. > > Rebased patch set is attached here, which applies to the current master. > > (A few more opportunities for the new API implemented here.) > > I don't understand what this patch set is supposed to do. AFAICT, the thread > originally forked off from a TDE discussion and, considering the thread > subject, was possibly once an attempt to refactor temporary file access to > make integrating encryption easier? The discussion then talked about things > like saving on system calls and excising all OS-level file access API use, > which I found confusing, and the thread then just became a general TDE-related > mini-discussion. Yes, it's an attempt to make the encryption less invasive, but there are a few other objectives, at least: 1) to make the read / write operations "less low-level" (there are common things like error handling which are often just copy & pasted from other places), 2) to have buffered I/O with configurable buffer size (the current patch version still has fixed buffer size though) It's true that the discussion ends up to be encryption-specific, however the scope of the patch is broader. The first meassage of the thread references a related discussion https://www.postgresql.org/message-id/CA+TgmoYGjN_f=fcerx49bzjhng+gocty+a+xhnrwcvvdy8u...@mail.gmail.com which is more important for this patch than the suggestions about encryption. > The patches at hand extend some internal file access APIs and then sprinkle > them around various places in the backend code, but there is no explanation > why or how this is better. I don't see any real structural savings one might > expect from a refactoring patch. No information has been presented how this > might help encryption. At this stage I expected feedback from the developers who have already contributed to the discussion, because I'm not sure myself if this version fits their ideas - that's why I didn't elaborate the documentation yet. I'll try to summarize my understanding in the next version, but I'd appreciate if I got some feedback for the current version first. > I also suspect that changing around the use of various file access APIs needs > to be carefully evaluated in each individual case. Various code has subtle > expectations about atomic write behavior, syncing, flushing, error recovery, > etc. I don't know if this has been considered here. I considered that, but could have messed up at some places. Right now I'm aware of one problem: pgstat.c does not expect the file access API to raise ERROR - this needs to be fixed. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Hi Alexander, > Thoughts and feedback are welcome. I took a preliminary look at the patch. I'm going to need more time to meditate on the proposed changes and to figure out the performance impact. So far I just wanted to let you know that the patch applied OK for me and passed all the tests. The `else` branch here seems to be redundant:

+    if (!updated)
+    {
+        /* Should not encounter speculative tuple on recheck */
+        Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
-    ReleaseBuffer(buffer);
+        ReleaseBuffer(buffer);
+    }
+    else
+    {
+        updated = false;
+    }

Also, I wish there were a few more comments, since some of the proposed changes are not that straightforward. -- Best regards, Aleksander Alekseev
Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Hi again,

> +    if (!updated)
> +    {
> +        /* Should not encounter speculative tuple on recheck */
> +        Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
> -    ReleaseBuffer(buffer);
> +        ReleaseBuffer(buffer);
> +    }
> +    else
> +    {
> +        updated = false;
> +    }

OK, I got confused here. To keep the logic easier to follow, I suggest inverting the condition: instead of if (!updated) { ... } else { updated = false; }, write if (updated) { updated = false; } else { ... }. -- Best regards, Aleksander Alekseev
[PATCH] Reuse Workers and Replication Slots during Logical Replication
Hi hackers, I created a patch to reuse tablesync workers and their replication slots for more tables that are not yet synced, so that the overhead of creating and dropping workers/replication slots can be reduced. The current version of logical replication has two steps: tablesync and apply. In the tablesync step, the apply worker creates a tablesync worker for each table, and those tablesync workers are killed when they're done with their associated table. (The number of tablesync workers running at the same time is limited by "max_sync_workers_per_subscription".) Each tablesync worker also creates a replication slot on the publisher during its lifetime and drops the slot before exiting. The purpose of this patch is to get rid of the overhead of creating/killing a new worker (and replication slot) for each table. It aims to reuse tablesync workers and their replication slots so that tablesync workers can copy multiple tables from publisher to subscriber during their lifetime. The benefits of reusing tablesync workers can be significant if tables are empty or close to empty. In the empty-table case, spawning tablesync workers and handling replication slots are where most of the time is spent, since the actual copy phase takes very little time. The changes in the behaviour of tablesync workers with this patch are as follows:
1. After the tablesync worker is done syncing the current table, it takes a lock and fetches the tables in init state.
2. It looks for a table that is not already being synced by another worker among the tables in init state.
3. If it finds one, it updates its state for the new table and loops back to the beginning to start syncing.
4. If no table is found, it drops the replication slot and exits.
With those changes, I did some benchmarking to see if it improves anything. These results compare this patch with the latest version of the master branch. "max_sync_workers_per_subscription" is left at its default of 2. The numbers are averages of timings from 5 consecutive runs for each branch. First, tested logical replication with empty tables:
10 tables - master: 286.964 ms - the patch: 116.852 ms
100 tables - master: 2785.328 ms - the patch: 706.817 ms
10K tables - master: 39612.349 ms - the patch: 12526.981 ms
Also tried replicating tables with some data:
10 tables loaded with 10MB data - master: 1517.714 ms - the patch: 1399.965 ms
100 tables loaded with 10MB data - master: 16327.229 ms - the patch: 11963.696 ms
Then loaded more data:
10 tables loaded with 100MB data - master: 13910.189 ms - the patch: 14770.982 ms
100 tables loaded with 100MB data - master: 146281.457 ms - the patch: 156957.512 ms
If tables are mostly empty, the improvement can be significant - up to 3x faster logical replication. With some data loaded, it can still be faster to some extent. When the table size increases more, the advantage of reusing workers becomes insignificant. I would appreciate your comments and suggestions. Thanks in advance for reviewing. Best, Melih 0001-Reuse-Logical-Replication-Background-worker.patch Description: Binary data
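For reference, a minimal setup for reproducing this kind of benchmark might look like the following. This is only a sketch: the object names and the connection string are illustrative, and the ALTER SYSTEM is shown just to make the tested setting explicit (2 is already the default).

-- On the publisher: create the tables to be synced and publish them.
CREATE TABLE t1 (a int);
CREATE PUBLICATION pub_all FOR ALL TABLES;

-- On the subscriber: cap the number of concurrent tablesync workers,
-- then create the subscription and time how long the initial table
-- synchronization takes.
ALTER SYSTEM SET max_sync_workers_per_subscription = 2;
SELECT pg_reload_conf();

CREATE SUBSCRIPTION sub_all
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub_all;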
Re: Fix proposal for comparaison bugs in PostgreSQL::Version
On 2022-07-03 Su 16:12, Jehan-Guillaume de Rorthais wrote: > On Sun, 3 Jul 2022 10:40:21 -0400 > Andrew Dunstan wrote: > >> On 2022-06-29 We 05:09, Jehan-Guillaume de Rorthais wrote: >>> On Tue, 28 Jun 2022 18:17:40 -0400 >>> Andrew Dunstan wrote: >>> On 2022-06-28 Tu 16:53, Jehan-Guillaume de Rorthais wrote: > ... > A better fix would be to store the version internally as version_num that > are trivial to compute and compare. Please, find in attachment an > implementation of this. > > The patch is a bit bigger because it improved the devel version to support > rc/beta/alpha comparison like 14rc2 > 14rc1. > > Moreover, it adds a bunch of TAP tests to check various use cases. Nice catch, but this looks like massive overkill. I think we can very simply fix the test in just a few lines of code, instead of a 190 line fix and a 130 line TAP test. >>> I explained why the patch was a little bit larger than required: it fixes >>> the bugs and do a little bit more. The _version_cmp sub is shorter and >>> easier to understand, I use multi-line code where I could probably fold >>> them in a one-liner, added some comments... Anyway, I don't feel the number >>> of line changed is "massive". But I can probably remove some code and >>> shrink some other if it is really important... >>> >>> Moreover, to be honest, I don't mind the number of additional lines of TAP >>> tests. Especially since it runs really, really fast and doesn't hurt >>> day-to-day devs as it is independent from other TAP tests anyway. It could >>> be 1k, if it runs fast, is meaningful and helps avoiding futur regressions, >>> I would welcome the addition. >> >> I don't see the point of having a TAP test at all. We have TAP tests for >> testing the substantive products we test, not for the test suite >> infrastructure. Otherwise, where will we stop? Shall we have tests for >> the things that test the test suite? > Tons of perl module have regression tests. When questioning where testing > should stop, it seems the Test::More module itself is not the last frontier: > https://github.com/Test-More/test-more/tree/master/t > > Moreover, the PostgreSQL::Version is not a TAP test module, but a module to > deal with PostgreSQL versions and compare them. > > Testing makes development faster as well when it comes to test the code. > Instead of testing vaguely manually, you can test a whole bunch of situations > and add accumulate some more when you think about a new one or when a bug is > reported. Having TAP test helps to make sure the code work as expected. > > It helped me when creating my patch. With all due respect, I just don't > understand your arguments against them. The number of lines or questioning > when > testing should stop doesn't hold much. There is not a single TAP test in our source code that is aimed at testing our test infrastructure as opposed to testing what we are actually in the business of building, and I'm not about to add one. This is quite different from, say, CPAN modules. Every added test consumes buildfarm cycles and space on the buildfarm server for the report, be it ever so small. Every added test needs maintenance, be it ever so small. There's no such thing as a free test (apologies to Heinlein and others). 
> >>> If we really want to save some bytes, I have a two lines worth of code fix >>> that looks more readable to me than fixing _version_cmp: >>> >>> +++ b/src/test/perl/PostgreSQL/Version.pm >>> @@ -92,9 +92,13 @@ sub new >>> # Split into an array >>> my @numbers = split(/\./, $arg); >>> >>> + # make sure all digit of the array-represented version are set so >>> we can >>> + # keep _version_cmp code as a "simple" digit-to-digit comparison >>> loop >>> + $numbers[$_] += 0 for 0..3; >>> + >>> # Treat development versions as having a minor/micro version one >>> less than # the first released version of that branch. >>> - push @numbers, -1 if ($devel); >>> + $numbers[3] = -1 if $devel; >>> >>> $devel ||= ""; >> I don't see why this is any more readable. > The _version_cmp is much more readable. > > But anyway, this is not the point. Using an array to compare versions where we > can use version_num seems like useless and buggy convolutions to me. > I think we'll just have to agree to disagree about it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Making Vars outer-join aware
Richard Guo writes: > For the query in the example > SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z) > (foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2 > scan level. Previously we do that with check_outerjoin_delay() by > scanning all the outer joins below and check if the qual references any > nullable rels of the OJ, and if so include the OJ's rels into the qual. > So as a result we'd get that foo(t2.z) is referencing t1 and t2, and > we'd put the qual into the join lists of t1 and t2. > Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT > JOIN below (with RTI 3). So we consider the qual is referencing RTE 2 > (which is t2) and RTE 3 (which is the OJ). Do we still need to include > RTE 1, i.e. t1 into the qual's required relids? How should we do that? It seems likely to me that we could leave the qual's required_relids as just {2,3} and not have to bother ORing any additional bits into that. However, in the case of a Var-free JOIN/ON clause it'd still be necessary to artificially add some relids to its initially empty relids. Since I've not yet tried to rewrite distribute_qual_to_rels I'm not sure how the details will shake out. regards, tom lane
Re: Wrong provolatile value for to_timestamp (1 argument)
On Tue, 2022-07-05 at 19:37 +0900, Tatsuo Ishii wrote: > > Are you sure? I'd say that "to_timestamp(double precision)" always > > produces the same timestamp for the same argument. What changes with > > the setting of "timezone" is how that timestamp is converted to a > > string, but that's a different affair. > > Of course the internal representation of timestamp with time zone data > type is not affected by the time zone setting. But why other form of > to_timestamp is labeled as stable? If your theory is correct, then > other form of to_timestamp shouldn't be labeled immutable as well? The result of the two-argument form of "to_timestamp" can depend on the setting of "lc_time":

test=> SET lc_time = 'en_US.utf8';
SET
test=> SELECT to_timestamp('2022-July-05', 'YYYY-TMMonth-DD');
      to_timestamp
------------------------
 2022-07-05 00:00:00+02
(1 row)

test=> SET lc_time = 'de_DE.utf8';
SET
test=> SELECT to_timestamp('2022-July-05', 'YYYY-TMMonth-DD');
ERROR:  invalid value "July-05" for "Month"
DETAIL:  The given value did not match any of the allowed values for this field.

Yours, Laurenz Albe
Re: Wrong provolatile value for to_timestamp (1 argument)
Laurenz Albe writes: > On Tue, 2022-07-05 at 19:37 +0900, Tatsuo Ishii wrote: >> Of course the internal representation of timestamp with time zone data >> type is not affected by the time zone setting. But why other form of >> to_timestamp is labeled as stable? If your theory is correct, then >> other form of to_timestamp shouldn't be labeled immutable as well? > The result of the two-argument form of "to_timestamp" can depend on > the setting of "lc_time": It also depends on the session's timezone setting, in a way that the single-argument form does not.

regression=# show timezone;
     TimeZone
------------------
 America/New_York
(1 row)

regression=# select to_timestamp(0);
      to_timestamp
------------------------
 1969-12-31 19:00:00-05
(1 row)

regression=# select to_timestamp('1970-01-01', 'YYYY-MM-DD');
      to_timestamp
------------------------
 1970-01-01 00:00:00-05
(1 row)

regression=# set timezone = 'utc';
SET
regression=# select to_timestamp(0);
      to_timestamp
------------------------
 1970-01-01 00:00:00+00
(1 row)

regression=# select to_timestamp('1970-01-01', 'YYYY-MM-DD');
      to_timestamp
------------------------
 1970-01-01 00:00:00+00
(1 row)

The two results of to_timestamp(0) represent the same UTC instant, but the other two are different instants. regards, tom lane
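For anyone wanting to check how the different forms are currently labeled, the volatility markings can be read straight out of pg_proc ('i' = immutable, 's' = stable, 'v' = volatile); a quick sketch:

SELECT oid::regprocedure AS function, provolatile
FROM pg_proc
WHERE proname = 'to_timestamp';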
Re: Add LZ4 compression in pg_dump
This is a review of 0001. On Tue, Jul 05, 2022 at 01:22:47PM +, gkokola...@pm.me wrote: > Simply this patchset had started to divert > heavily already based on comments from Mr. Paquier who had already requested > for > the APIs to be refactored to use function pointers. This is happening in 0002 > of > the patchset. I said something about reducing ifdefs, but I'm having trouble finding what Michael said about this ? > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > > > > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > > > shouldn't be handled more gracefully. LZ4 should either have an explicit > > > version check, or else shouldn't depend on that feature (or should define > > > a > > > safe fallback version if the library header doesn't define it). > > > https://packages.ubuntu.com/liblz4-1 The constant still seems to be used without defining a fallback or a minimum version. > > > 0003: typo: of legacy => or legacy This is still there > > > You renamed this: > > > > > > |- COMPR_ALG_LIBZ > > > |-} CompressionAlgorithm; > > > |+ COMPRESSION_GZIP, > > > |+} CompressionMethod; > > > > > > ..But I don't think that's an improvement. If you were to change it, it > > > should > > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > > > structs and typedefs. zlib is not idential to gzip, which uses a different > > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is > > > incorrect. This comment still applies - zlib's gz* functions are "gzip" but the others are "zlib". https://zlib.net/manual.html That affects both the 0001 and 0002 patches. Actually, I think that "gzip" should not be the name of the user-facing option, since (except for "plain" format) it isn't using gzip. +Robert, since this suggests amending parse_compress_algorithm(). Maybe "zlib" should be parsed the same way as "gzip" - I don't think we ever expose both to a user, but in some cases (basebackup and pg_dump -Fp -Z1) the output is "gzip" and in some cases NO it's zlib (pg_dump -Fc -Z1). > > > The cf* changes in pg_backup_archiver could be split out into a separate > > > commit. It's strictly a code simplification - not just preparation for > > > more > > > compression algorithms. The commit message should "See also: > > > bf9aa490db24b2334b3595ee33653bf2fe39208c". I still think this could be an early, patch. > > > freebsd/cfbot is failing. This is still failing for bsd, windows and compiler warnings. Windows also has compiler warnings. http://cfbot.cputube.org/georgios-kokolatos.html Please see: src/tools/ci/README, which you can use to run check-world on 4 OS by pushing a branch to github. > > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > > > exercise it more on CI. What about this ? I think the patch needs to pass CI on all 4 OS with default=zlib and default=lz4. > > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > @@ -254,7 +251,12 @@ CreateArchive(const char *FileSpec, const ArchiveFormat > fmt, > Archive * > OpenArchive(const char *FileSpec, const ArchiveFormat fmt) > { > - ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, > setupRestoreWorker); > + ArchiveHandle *AH; > + pg_compress_specification compress_spec; Should this be initialized to {0} ? 
> @@ -969,6 +969,8 @@ NewRestoreOptions(void) > opts->format = archUnknown; > opts->cparams.promptPassword = TRI_DEFAULT; > opts->dumpSections = DUMP_UNSECTIONED; > + opts->compress_spec.algorithm = PG_COMPRESSION_NONE; > + opts->compress_spec.level = INT_MIN; Why INT_MIN ? > @@ -1115,23 +1117,28 @@ PrintTOCSummary(Archive *AHX) > ArchiveHandle *AH = (ArchiveHandle *) AHX; > RestoreOptions *ropt = AH->public.ropt; > TocEntry *te; > + pg_compress_specification out_compress_spec; Should have {0} ? I suggest to write it like my 2020 patch for this, which says: no_compression = {0}; > /* Open stdout with no compression for AH output handle */ > - AH->gzOut = 0; > - AH->OF = stdout; > + out_compress_spec.algorithm = PG_COMPRESSION_NONE; > + AH->OF = cfdopen(dup(fileno(stdout)), PG_BINARY_A, out_compress_spec); Ideally this should check the success of dup(). > @@ -3776,21 +3746,25 @@ ReadHead(ArchiveHandle *AH) > + if (AH->compress_spec.level != INT_MIN) Why is it testing the level and not the algorithm ? > --- a/src/bin/pg_dump/pg_backup_custom.c > +++ b/src/bin/pg_dump/pg_backup_custom.c > @@ -298,7 +298,7 @@ _StartData(ArchiveHandle *AH, TocEntry *te) > _WriteByte(AH, BLK_DATA); /* Block type */ > WriteInt(AH, te->dumpId); /* For sanity check */ > > - ctx->cs = AllocateCompressor(AH->compression, _CustomWriteFunc); > + ctx->cs = AllocateCompressor(AH->compress_spec, _CustomWriteFunc); Is it necessary to rename the dat
Re: [EXTERNAL] Re: Support load balancing in libpq
> I'm quoting a previous attempt by Satyanarayana Narlapuram on this > topic [1], it also has a patch set. Thanks for sharing that. It's indeed a different approach to solve the same problem. I think my approach is much simpler, since it only requires minimal changes to the libpq client and none to the postgres server or the postgres protocol. However, that linked patch is more flexible due to allowing redirection based on users and databases. With my patch something similar could still be achieved by using different hostname lists for different databases or users at the client side. To be completely clear on the core difference between the patch IMO: In this patch a DNS server or (a hardcoded hostname/IP list at the client side) is used to determine what host to connect to. In the linked patch instead the Postgres server starts being the source of truth of what to connect to, thus essentially becoming something similar to a DNS server. > We may not have to go the extra > mile to determine all of these parameters dynamically during query > authentication time, but we can let users provide a list of standby > hosts based on "some" priority (Satya's thread [1] attempts to do > this, in a way, with users specifying the hosts via pg_hba.conf file). > If required, randomization in choosing the hosts can be optional. I'm not sure if you read my response to Aleksander. I feel like I addressed part of this at least. But maybe I was not clear enough, or added too much fluff. So, I'll re-iterate the important part: By specifying the same host multiple times in the DNS response or the hostname/IP list you can achieve weighted load balancing. Few thoughts on the patch: > 1) How are we determining if the submitted query is read-only or write? This is not part of this patch. libpq and thus this patch works at the connection level, not at the query level, so determining a read-only query or write only query is not possible without large changes. However, libpq already has a target_session_attrs[1] connection option. This can be used to open connections specifically to read-only or writable servers. However, once a read-only connection is opened it is then the responsibility of the client not to send write queries over this read-only connection, otherwise they will fail. > 2) What happens for explicit transactions? The queries related to the > same txn get executed on the same host right? How are we guaranteeing > this? We're load balancing connections, not queries. Once a connection is made all queries on that connection will be executed on the same host. > 3) Isn't it good to provide a way to test the patch? The way I tested it myself was by setting up a few databases on my local machine listening on 127.0.0.1, 127.0.0.2, 127.0.0.3 and then putting all those in the connection string. Then looking at the connection attempts on the servers their logs showed that the client was indeed connecting to a random one (by using log_connections=true in postgresql.conf). I would definitely like to have some automated tests for this, but I couldn't find tests for libpq that were connecting to multiple postgres servers. If they exist, any pointers are appreciated. If they don't exist, pointers to similar tests are also appreciated. [1]: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-TARGET-SESSION-ATTRS
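For completeness, the server-side logging mentioned above can also be enabled without editing postgresql.conf by hand; a small sketch, to be run on each candidate server (the reload only affects connections started afterwards):

ALTER SYSTEM SET log_connections = on;
SELECT pg_reload_conf();
-- New connections then show up in each server's log, which makes the
-- distribution of client connections across hosts easy to observe.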
Re: [PATCH] Optional OR REPLACE in CREATE OPERATOR statement
Svetlana Derevyanko writes: > It seems useful to have [OR REPLACE] option in CREATE OPERATOR statement, as > in CREATE FUNCTION. This option may be good for writing extension update > scripts, to avoid errors with re-creating the same operator. No, that's not acceptable. CREATE OR REPLACE should always produce exactly the same final state of the object, but in this case we cannot change the underlying function if the operator already exists. (At least, not without writing a bunch of infrastructure to update existing views/rules that might use the operator; which among other things would create a lot of deadlock risks.) regards, tom lane
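For illustration, without OR REPLACE an extension update script has to guard the re-creation itself, for example by checking the catalog first. This is only a sketch with a made-up function and operator name:

CREATE FUNCTION my_int_add(int, int) RETURNS int
    AS 'SELECT $1 + $2' LANGUAGE SQL IMMUTABLE;

DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_operator
                   WHERE oprname = '###'
                     AND oprleft = 'int4'::regtype
                     AND oprright = 'int4'::regtype) THEN
        -- Only create the operator if it is not already there.
        CREATE OPERATOR ### (LEFTARG = int, RIGHTARG = int, FUNCTION = my_int_add);
    END IF;
END
$$;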
Re: Handle infinite recursion in logical replication setup
On Tue, Jul 5, 2022 at 6:14 AM Peter Smith wrote: > > I checked again the v26* patch set. > > I had no more comments for v26-0001, v26-0002, or v26-0004, but below > are some comments for v26-0003. > > > v26-0003 > > > 3.1 src/backend/catalog/pg_subscription.c > > 3.1.a > +/* > + * Get all relations for subscription that are in a ready state. > + * > + * Returned list is palloc'ed in current memory context. > + */ > +List * > +GetSubscriptionReadyRelations(Oid subid) > > "subscription" -> "the subscription" > > Also, It might be better to say in the function header what kind of > structures are in the returned List. E.g. Firstly, I'd assumed it was > the same return type as the other function > GetSubscriptionReadyRelations, but it isn’t. This function has been removed and GetSubscriptionRelations is used with slight changes. > 3.1.b > I think there might be a case to be made for *combining* those > SubscriptionRelState and SubscripotionRel structs into a single common > struct. Then all those GetSubscriptionNotReadyRelations and > GetSubscriptionNotReadyRelations (also GetSubscriptionRelations?) can > be merged to be just one common function that takes a parameter to say > do you want to return the relation List of ALL / READY / NOT_READY? > What do you think? I have used a common function that can get all relations, ready relations and not ready relations instead of the 3 functions. > == > > 3.2 src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed > > +/* > + * Check and throw an error if the publisher has subscribed to the same table > + * from some other publisher. This check is required only if copydata is ON > and > + * the origin is local. > + */ > +static void > +check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications, > +CopyData copydata, char *origin, Oid subid) > > The function comment probably should say something about why the new > READY logic was added in v26. Added a comment for the same. > ~~~ > > 3.3 > > 3.3.a > + /* > + * The subid will be valid only for ALTER SUBSCRIPTION ... REFRESH > + * PUBLICATION. Get the ready relations for the subscription only in case > + * of ALTER SUBSCRIPTION case as there will be no relations in ready state > + * while the subscription is created. > + */ > + if (subid != InvalidOid) > + subreadyrels = GetSubscriptionReadyRelations(subid); > > The word "case" is 2x in the same sentence. I also paraphrased my > understanding of this comment below. Maybe it is simpler? > > SUGGESTION > Get the ready relations for the subscription. The subid will be valid > only for ALTER SUBSCRIPTION ... REFRESH because there will be no > relations in ready state while the subscription is created. Modified > 3.3.b > + if (subid != InvalidOid) > + subreadyrels = GetSubscriptionReadyRelations(subid); > > SUGGESTION > if (OidIsValid(subid)) Modified > == > > 3.4 src/test/subscription/t/032_localonly.pl > > Now all the test cases are re-using the same data (e.g. 11,12,13) and > you are deleting the data between the tests. I guess it is OK, but IMO > the tests are easier to read when each test part was using unique data > (e.g. 11,12,13, then 12,22,32, then 13,23,33 etc) Modified Thanks for the comments, the v29 patch attached at [1] has the changes for the same. [1] - https://www.postgresql.org/message-id/CALDaNm1T5utq60qVx%3DRN60rHcg7wt2psM7PpCQ2fDiB-R8oLGg%40mail.gmail.com Regards, Vignesh
Re: [PATCH] Log details for client certificate failures
On Fri, Jul 1, 2022 at 1:51 PM Jacob Champion wrote: > Sorry for the misunderstanding! v3 adds the Issuer to the logs as well. Resending v3; I messed up the certificate diff with my gitconfig. --Jacob From 8d03e043cd86ec81dfb49a138e871c5ac110dc06 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Mon, 29 Mar 2021 10:07:31 -0700 Subject: [PATCH v3] Log details for client certificate failures Currently, debugging client cert verification failures is mostly limited to looking at the TLS alert code on the client side. For simple deployments, sometimes it's enough to see "sslv3 alert certificate revoked" and know exactly what needs to be fixed, but if you add any more complexity (multiple CA layers, misconfigured CA certificates, etc.), trying to debug what happened based on the TLS alert alone can be an exercise in frustration. Luckily, the server has more information about exactly what failed in the chain, and we already have the requisite callback implemented as a stub. Fill it out with error handling and add a COMMERROR log so that a DBA can debug client failures more easily. It ends up looking like LOG: connection received: host=localhost port=44120 LOG: client certificate verification failed at depth 1: unable to get local issuer certificate DETAIL: failed certificate had subject '/CN=Test CA for PostgreSQL SSL regression test client certs', serial number 2315134995201656577, purported issuer '/CN=Root CA' LOG: could not accept SSL connection: certificate verify failed The length of the Subject and Issuer strings is limited to prevent malicious client certs from spamming the logs. In case the truncation makes things ambiguous, the certificate's serial number is also logged. --- src/backend/libpq/be-secure-openssl.c | 101 +- src/test/ssl/conf/client-long.config | 14 src/test/ssl/ssl/client-long.crt | 20 + src/test/ssl/ssl/client-long.key | 27 +++ src/test/ssl/sslfiles.mk | 2 +- src/test/ssl/t/001_ssltests.pl| 40 +- src/test/ssl/t/SSL/Backend/OpenSSL.pm | 2 +- 7 files changed, 200 insertions(+), 6 deletions(-) create mode 100644 src/test/ssl/conf/client-long.config create mode 100644 src/test/ssl/ssl/client-long.crt create mode 100644 src/test/ssl/ssl/client-long.key diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 3d0168a369..80b361b105 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -1076,12 +1076,45 @@ dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata) return 0; } +/* + * Examines the provided certificate name, and if it's too long to log, modifies + * and truncates it. The return value is NULL if no truncation was needed; it + * otherwise points into the middle of the input string, and should not be + * freed. + */ +static char * +truncate_cert_name(char *name) +{ + size_t namelen = strlen(name); + char *truncated; + + /* + * Common Names are 64 chars max, so for a common case where the CN is the + * last field, we can still print the longest possible CN with a 7-character + * prefix (".../CN=[64 chars]"), for a reasonable limit of 71 characters. + */ +#define MAXLEN 71 + + if (namelen <= MAXLEN) + return NULL; + + /* + * Keep the end of the name, not the beginning, since the most specific + * field is likely to give users the most information. 
+ */ + truncated = name + namelen - MAXLEN; + truncated[0] = truncated[1] = truncated[2] = '.'; + +#undef MAXLEN + + return truncated; +} + /* * Certificate verification callback * * This callback allows us to log intermediate problems during - * verification, but for now we'll see if the final error message - * contains enough information. + * verification. * * This callback also allows us to override the default acceptance * criteria (e.g., accepting self-signed or expired certs), but @@ -1090,6 +1123,70 @@ dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata) static int verify_cb(int ok, X509_STORE_CTX *ctx) { + int depth; + int errcode; + const char *errstring; + X509 *cert; + char *subject = NULL, *issuer = NULL; + char *sub_truncated = NULL, *iss_truncated = NULL; + char *serialno = NULL; + + if (ok) + { + /* Nothing to do for the successful case. */ + return ok; + } + + /* Pull all the information we have on the verification failure. */ + depth = X509_STORE_CTX_get_error_depth(ctx); + errcode = X509_STORE_CTX_get_error(ctx); + errstring = X509_verify_cert_error_string(errcode); + + cert = X509_STORE_CTX_get_current_cert(ctx); + if (cert) + { + ASN1_INTEGER *sn; + BIGNUM *b; + + /* + * Get the Subject and Issuer for logging, but don't let maliciously + * huge certs flood the logs. + */ + subject = X509_NAME_to_cstring(X509_get_subject_name(cert)); + sub_truncated = truncate_cert_name(subject); + + issuer = X509_NAME_to_cstring(X509_get_issu
Re: pg15b2: large objects lost on upgrade
On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby wrote: > I suppose it's like Bruce said, here. > > https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us Well, I feel dumb. I remember reading that email back when Bruce sent it, but it seems that it slipped out of my head between then and when I committed. I think your patch is fine, except that I think maybe we should adjust this dump comment: -- For binary upgrade, set pg_largeobject relfrozenxid, relminmxid and relfilenode Perhaps: -- For binary upgrade, preserve values for pg_largeobject and its index Listing the exact properties preserved seems less important to me than mentioning that the second UPDATE statement is for its index -- because if you look at the SQL that's generated, you can see what's being preserved, but you don't automatically know why there are two UPDATE statements or what the rows with those OIDs are. I had a moment of panic this morning where I thought maybe the whole patch needed to be reverted. I was worried that we might need to preserve the OID of every system table and index. Otherwise, what happens if the OID counter in the old cluster wraps around and some user-created object gets an OID that the system tables are using in the new cluster? However, I think this can't happen, because when the OID counter wraps around, it wraps around to 16384, and the relfilenode values for newly created system tables and indexes are all less than 16384. So I believe we only need to fix pg_largeobject and its index, and I think your patch does that. Anyone else see it differently? -- Robert Haas EDB: http://www.enterprisedb.com
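For anyone wanting to verify the fix on an upgraded cluster, the preserved values are visible in pg_class; a quick sketch, run in the new cluster after pg_upgrade:

SELECT relname, oid, relfilenode
FROM pg_class
WHERE relname IN ('pg_largeobject', 'pg_largeobject_loid_pn_index');

The relfilenode values reported here should match what the old cluster had, rather than freshly assigned ones.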
Re: pg15b2: large objects lost on upgrade
On Tue, Jul 05, 2022 at 12:43:54PM -0400, Robert Haas wrote: > On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby wrote: > > I suppose it's like Bruce said, here. > > > > https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us > > Well, I feel dumb. I remember reading that email back when Bruce sent > it, but it seems that it slipped out of my head between then and when > I committed. I think your patch is fine, except that I think maybe we My patch also leaves a 0 byte file around from initdb, which is harmless, but dirty. I've seen before where a bunch of 0 byte files are abandoned in an otherwise-empty tablespace, with no associated relation, and I have to "rm" them to be able to drop the tablespace. Maybe that's a known issue, maybe it's due to crashes or other edge cases, maybe it's of no consequence, and maybe it's already been fixed or is being fixed already. But it'd be nice to avoid another way to end up with 0 byte files - especially ones named with system OIDs. > Listing the exact properties preserved seems less important to me than > mentioning that the second UPDATE statement is for its index -- +1 -- Justin
Re: avoid multiple hard links to same WAL file after a crash
On Tue, Jul 05, 2022 at 10:19:49AM +0900, Michael Paquier wrote: > On Thu, May 05, 2022 at 08:10:02PM +0900, Michael Paquier wrote: >> I'd agree with removing all the callers at the end. pgrename() is >> quite robust on Windows, but I'd keep the two checks in >> writeTimeLineHistory(), as the logic around findNewestTimeLine() would >> consider a past TLI history file as in-use even if we have a crash >> just after the file got created in the same path by the same standby, >> and the WAL segment init part. Your patch does that. > > As v16 is now open for business, I have revisited this change and > applied 0001 to change all the callers (aka removal of the assertion > for the WAL receiver when it overwrites a TLI history file). The > commit log includes details about the reasoning of all the areas > changed, for clarity, as of the WAL recycling part, the TLI history > file part and basic_archive. Thanks! I wonder if we should add a comment in writeTimeLineHistoryFile() about possible concurrent use by a WAL receiver and the startup process and why that is okay. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
Bruce Momjian writes: > This patch has been applied back to 9.6 and will appear in the next > minor release. I have just discovered that this patch broke pg_upgrade's ability to upgrade from 8.4: $ pg_upgrade -b ~/version84/bin -d ... Performing Consistency Checks - Checking cluster versions ok The source cluster lacks some required control information: latest checkpoint oldestXID Cannot continue without required control information, terminating Failure, exiting Sure enough, 8.4's pg_controldata doesn't print anything about oldestXID, because that info wasn't there then. Given the lack of field complaints, it's probably not worth trying to do anything to restore that capability. But we really ought to update pg_upgrade's code and docs in pre-v15 branches to say that the minimum supported source version is 9.0. regards, tom lane
Re: pg_upgrade (12->14) fails on aggregate
Andrey Borodin writes: > The patch is marked WiP, what is in progress as of now? It looks about ready to me. Pushed with some minor cosmetic adjustments. regards, tom lane
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Mon, Mar 21, 2022 at 8:15 PM Andres Freund wrote: > Hi, > > On 2022-02-19 11:06:18 -0500, Melanie Plageman wrote: > > v21 rebased with compile errors fixed is attached. > > This currently doesn't apply (mea culpa likely): > http://cfbot.cputube.org/patch_37_3272.log > > Could you rebase? Marked as waiting-on-author for now. > > > Attached is the rebased/rewritten version of the pg_stat_buffers patch which uses the cumulative stats system instead of stats collector. I've moved to the model of backend-local pending stats which get accumulated into shared memory by pgstat_report_stat(). It is worth noting that, with this method, other backends will no longer have access to each other's individual IO operation statistics. An argument could be made to keep the statistics in each backend in PgBackendStatus before accumulating them to the cumulative stats system so that they can be accessed at the per-backend level of detail. There are two TODOs related to when pgstat_report_io_ops() should be called. pgstat_report_io_ops() is meant for backends that will not commonly call pgstat_report_stat(). I was unsure if it made sense for BootstrapModeMain() to explicitly call pgstat_report_io_ops() and if auto vacuum worker should call it explicitly and, if so, if it was the right location to call it after do_autovacuum(). Archiver and syslogger do not increment or report IO operations. I did not change pg_stat_bgwriter fields to derive from the IO operations statistics structures since the reset targets differ. Also, I added one test, but I'm not sure if it will be flakey. It tests that the "writes" for checkpointer are tracked when data is inserted into a table and then CHECKPOINT is explicitly invoked directly after. I don't know if this will have a problem if the checkpointer is busy and somehow the backend which dirtied the buffer is forced to write out its own buffer, causing the test to potentially fail (even if the checkpointer is doing other writes [causing it to be busy], it may not do them in between the INSERT and the SELECT from pg_stat_buffers). I am wondering how to add a non-flakey test. For regular backends, I couldn't think of a way to suspend checkpointer to make them do their own writes and fsyncs in the context of a regression or isolation test. In fact for many of the dirty buffers it seems like it will be difficult to keep bgwriter, checkpointer, and regular backends from competing and sometimes causing test failures. - Melanie v22-0002-Track-IO-operation-statistics.patch Description: Binary data v22-0003-Add-system-view-tracking-IO-ops-per-backend-type.patch Description: Binary data v22-0001-Add-BackendType-for-standalone-backends.patch Description: Binary data
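For context, the flaky-test concern is about a sequence along these lines. This is only a sketch: pg_stat_buffers is the view proposed by the patch, not an existing catalog, and its column names are deliberately not spelled out here since they may still change.

CREATE TABLE io_test (a int);
INSERT INTO io_test SELECT generate_series(1, 100000);
CHECKPOINT;
-- The checkpointer's "writes" should now be non-zero, unless the backend
-- itself was forced to write out the dirty buffers first.
SELECT * FROM pg_stat_buffers;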
Re: pg_checkpointer is not a verb or verb phrase
On Fri, Jul 1, 2022 at 5:50 PM Nathan Bossart wrote: > On Fri, Jul 01, 2022 at 03:36:48PM +0200, Magnus Hagander wrote: > > +1 for pg_checkpoint on that -- let's not make it longer than necessary. > > > > And yes, +1 for actually changing it. It's a lot cheaper to change it now > > than it will be in the future. Yes, it would've been even cheaper to have > > already changed it, but we can't go back in time... > > Yeah, pg_checkpoint seems like the obvious alternative to pg_checkpointer. > I didn't see a patch in this thread yet, so I've put one together. I did > not include the catversion bump. Hearing several votes in favor and none opposed, committed and back-patched to v15. I added the catversion bump, but left out the .po file changes, figuring it was better to let those files get updated via the normal process. -- Robert Haas EDB: http://www.enterprisedb.com
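For reference, usage under the new name is simply the following (the role name is illustrative):

GRANT pg_checkpoint TO monitoring_user;
-- A member of pg_checkpoint can then run:
CHECKPOINT;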
Re: generic plans and "initial" pruning
On Fri, May 27, 2022 at 1:09 AM Amit Langote wrote: > 0001 contains the mechanical changes of moving PartitionPruneInfo out > of Append/MergeAppend into a list in PlannedStmt. > > 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking > only unpruned partitions". This patchset will need to be rebased over 835d476fd21; looks like just a cosmetic change. --Jacob
Re: Making the subquery alias optional in the FROM clause
Dean Rasheed writes: > This was discussed previously in [1], and there seemed to be general > consensus in favour of it, but no new patch emerged. As I said in that thread, I'm not super enthused about this, but I was clearly in the minority so I think it should go forward. > Attached is a patch that takes the approach of not generating an alias > at all, which seems to be neater and simpler, and less code than > trying to generate a unique alias. Hm. Looking at the code surrounding what you touched, I'm reminded that we allow JOIN nodes to not have an alias, and represent that situation by rte->alias == NULL. I wonder if it'd be better in the long run to make alias-less subqueries work similarly, rather than generating something that after-the-fact will be indistinguishable from a user-written alias. If that turns out to not work well, I'd agree with "unnamed_subquery" as the inserted name. Also, what about VALUES clauses? It seems inconsistent to remove this restriction for sub-SELECT but not VALUES. Actually it looks like your patch already does remove that restriction in gram.y, but you didn't follow through elsewhere. As far as the docs go, I think it's sufficient to mention the inconsistency with SQL down at the bottom; we don't need a redundant in-line explanation. regards, tom lane
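For illustration, these are the kinds of statements the patch is meant to accept, which stock releases currently reject with errors like "subquery in FROM must have an alias" (a small sketch):

SELECT * FROM (SELECT 1 AS x);
SELECT * FROM (VALUES (1), (2));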
Re: First draft of the PG 15 release notes
On Sat, Jul 2, 2022 at 08:13:41PM -0400, Bruce Momjian wrote: > On Fri, Jul 1, 2022 at 09:56:17AM -0700, Peter Geoghegan wrote: > > On Fri, Jul 1, 2022 at 9:41 AM Bruce Momjian wrote: > > > > It might be worth explaining the shift directly in the release notes. > > > > The new approach is simpler and makes a lot more sense -- why should > > > > the relfrozenxid be closely tied to freezing? We don't necessarily > > > > > > I don't think this is an appropriate detail for the release notes. > > > > Okay. What about saying something about relminmxid advancement where > > the database consumes lots of multixacts? > > No. same issue. Actually, I was wrong. I thought that we only mentioned that we computed a more aggressive xid, but now I see I was mentioning the _frozen_ xid. Reading the commit, we do compute the multi-xid and store that too, so I have updated the PG 15 release notes with the attached patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index 47ac329e79..179ad37d9d 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -994,7 +994,8 @@ Author: Peter Geoghegan
 Allow vacuum to be more
-aggressive in setting the oldest frozenxid (Peter Geoghegan)
+aggressive in setting the oldest frozen and multi transaction id
+(Peter Geoghegan)
Re: [PATCH] Add sortsupport for range types and btree_gist
On Wed, Jun 15, 2022 at 3:45 AM Christoph Heiss wrote: > `make check-world` reports no regressions. cfbot is reporting a crash in contrib/btree_gist: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3686 Thanks, --Jacob
Re: Eliminating SPI from RI triggers - take 2
On Thu, Jun 30, 2022 at 11:23 PM Amit Langote wrote: > > I will continue investigating what to do about points (1) and (2) > mentioned above and see if we can do away with using SQL in the > remaining cases. Hi Amit, looks like isolation tests are failing in cfbot: https://cirrus-ci.com/task/6642884727275520 Note also the uninitialized variable warning that cfbot picked up; that may or may not be related. Thanks, --Jacob
Re: Error "initial slot snapshot too large" in create replication slot
On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi wrote: > So this is that. v5 creates a regular snapshot. This patch will need a quick rebase over 905c020bef9, which added `extern` to several missing locations. Thanks, --Jacob
Re: doc: BRIN indexes and autosummarize
On Mon, Jul 4, 2022 at 9:20 AM Alvaro Herrera wrote: > > [Some of] these additions are wrong actually. It says that autovacuum > will not summarize new entries; but it does. If you just let the table > sit idle, any autovacuum run that cleans the table will also summarize > any ranges that need summarization. > > What 'autosummarization=off' means is that the behavior to trigger an > immediate summarization of a range once it becomes full is not default. > This is very different. > Without having read through the code, I'll take your word for it. I simply went with what was written in this phrase of the docs: "or by automatic summarization executed by autovacuum, as insertions occur. (This last trigger is disabled by default and can be enabled with the autosummarize parameter.)" To me this did not indicate a third behavior, which is what you are describing, so I'm glad we're having this discussion to clarify it. > As for the new <para>s that you added, I'd say they're > stylistically wrong. Each paragraph is supposed to be one fully > contained idea; what these tags do is split each idea across several > smaller paragraphs. This is likely subjective though. > While I don't disagree with you, readability is more important. We have lots of places (such as that one in the docs) where we have a big blob of text, reducing readability, IMHO. In the source they are broken by new lines, but in the rendered HTML, which is what the vast majority of people read, they get rendered into one big blob-looking thing. > What would be the downside (if any) of having autosummarize=on by default? > > I'm not aware of any. Maybe we should turn it on by default. > +1 Thanks for looking at this, Alvaro. Roberto -- Crunchy Data -- passion for open source PostgreSQL
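For readers following along, the parameter under discussion is set per index, and a range that was skipped can always be summarized by hand afterwards. A brief sketch (the table and column names are illustrative):

CREATE INDEX measurements_brin_idx ON measurements
    USING brin (created_at) WITH (autosummarize = on);

-- Summarize any ranges that are still unsummarized:
SELECT brin_summarize_new_values('measurements_brin_idx');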
Re: First draft of the PG 15 release notes
On Fri, Jul 1, 2022 at 06:21:28PM -0700, Noah Misch wrote: > Here's what I've been trying to ask: what do you think of linking to > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS > here? The release note text is still vague, and the docs have extensive > coverage of the topic. The notes can just link to that extensive coverage. Sure. how is this patch? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml index 179ad37d9d..3ad8c1d996 100644 --- a/doc/src/sgml/release-15.sgml +++ b/doc/src/sgml/release-15.sgml @@ -63,11 +63,12 @@ Author: Noah Misch permissions on the public schema has not been changed. Databases restored from previous Postgres releases will be restored with their current permissions. Users wishing - to have the former permissions will need to grant + to have the former more-open permissions will need to grant CREATE permission for PUBLIC on the public schema; this change can be made on template1 to cause all new databases - to have these permissions. + to have these permissions. This change was made to increase + security; see .
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Hi, On 2022-06-24 10:29:06 -0400, Andrew Dunstan wrote: > On 2022-06-23 Th 21:51, Andres Freund wrote: > > On 2022-06-23 16:38:12 +0900, Michael Paquier wrote: > >> On Tue, Jun 21, 2022 at 05:41:07PM -0400, Andrew Dunstan wrote: > >>> Yes, but I don't guarantee to have a fix in time for Beta2. > >> IMHO, it would be nice to get something done for beta2. Now the > >> thread is rather fresh and I guess that more performance study is > >> required even for 0004, so.. > > I don't think there's a whole lot of performance study needed for 0004 - the > > current state is obviously wrong. > > > > I think Andrew's beta 2 comment was more about my other architectural > > complaints around the json expression eval stuff. > > > Right. That's being worked on but it's not going to be a mechanical fix. Any updates here? I'd mentioned the significant space use due to all the JsonCoercionsState for all the types. Another related aspect is that this code is just weird - nearly the same struct name (JsonCoercionsState vs. JsonCoercionState), nested in each other?

struct JsonCoercionsState
{
    struct JsonCoercionState
    {
        JsonCoercion *coercion;     /* coercion expression */
        ExprState  *estate;         /* coercion expression state */
    }           null,
                string,
                numeric,
                boolean,
                date,
                time,
                timetz,
                timestamp,
                timestamptz,
                composite;
}           coercions;              /* states for coercion from SQL/JSON item
                                     * types directly to the output type */

Also note the weird numeric indentation that pgindent does... > The attached very small patch applies on top of your 0002 and deals with > the FmgrInfo complaint. Now that the FmgrInfo is part of a separately allocated struct, that doesn't seem necessary anymore. - Andres
Re: pg_rewind: warn when checkpoint hasn't happened after promotion
On Sat, Jun 4, 2022 at 8:59 AM James Coleman wrote: > A quick background refresher: after promoting a standby rewinding the > former primary requires that a checkpoint have been completed on the > new primary after promotion. This is correctly documented. However > pg_rewind incorrectly reports to the user that a rewind isn't > necessary because the source and target are on the same timeline. Is there anything intrinsic to the mechanism of operation of pg_rewind that requires a timeline change, or could we just rewind within the same timeline to an earlier LSN? In other words, maybe we could just remove this limitation of pg_rewind, and then perhaps it wouldn't be necessary to determine what the new timeline is. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg15b2: large objects lost on upgrade
On Tue, Jul 5, 2022 at 12:56 PM Justin Pryzby wrote: > My patch also leaves a 0 byte file around from initdb, which is harmless, but > dirty. > > I've seen before where a bunch of 0 byte files are abandoned in an > otherwise-empty tablespace, with no associated relation, and I have to "rm" > them to be able to drop the tablespace. Maybe that's a known issue, maybe > it's > due to crashes or other edge case, maybe it's of no consequence, and maybe > it's > already been fixed or being fixed already. But it'd be nice to avoid another > way to have a 0 byte files - especially ones named with system OIDs. Do you want to add something to the patch to have pg_upgrade remove the stray file? -- Robert Haas EDB: http://www.enterprisedb.com
Re: PSA: Autoconf has risen from the dead
On Sun, Jul 3, 2022 at 1:17 PM Andres Freund wrote: > Yea, I guess I should start a documentation section... > > I've only used homebrew on mac, but with that it should be something along the > lines of > > brew install meson > meson setup --buildtype debug -Dcassert=true build-directory > cd build-directory > ninja > > of course if you want to build against some dependencies and / or run tap > tests, you need to do something similar to what you have to do for > configure. I.e. > - install perl modules [1] > - tell the build about location of homebrew [2] Since I'm a macports user I hope at some point we'll have directions for that as well as for homebrew. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Emit extra debug message when executing extension script.
On Mon, Jul 4, 2022 at 5:27 AM Alvaro Herrera wrote: > On 2022-Jun-29, Robert Haas wrote: > > On Wed, Jun 29, 2022 at 9:26 AM Peter Eisentraut > > wrote: > > > On 28.06.22 21:10, Jeff Davis wrote: > > > > + ereport(DEBUG1, errmsg("executing extension script: %s", > > > > filename)); > > > > > > This should either be elog or use errmsg_internal. > > > > Why? > > The reason is that errmsg() marks the message for translation, and we > don't want to burden translators with messages that are of little > interest to most users. Using either elog() or errmsg_internal() > avoids that. Yeah, I'm aware of that in general, but I'm not quite clear on how we decide that. Do we take the view that all debug-level messages need not be translated? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_rewind: warn when checkpoint hasn't happened after promotion
On Tue, Jul 5, 2022 at 2:39 PM Robert Haas wrote: > > On Sat, Jun 4, 2022 at 8:59 AM James Coleman wrote: > > A quick background refresher: after promoting a standby rewinding the > > former primary requires that a checkpoint have been completed on the > > new primary after promotion. This is correctly documented. However > > pg_rewind incorrectly reports to the user that a rewind isn't > > necessary because the source and target are on the same timeline. > > Is there anything intrinsic to the mechanism of operation of pg_rewind > that requires a timeline change, or could we just rewind within the > same timeline to an earlier LSN? In other words, maybe we could just > remove this limitation of pg_rewind, and then perhaps it wouldn't be > necessary to determine what the new timeline is. I think (someone can correct me if I'm wrong) that in theory the mechanisms would support the source and target being on the same timeline, but in practice that presents problems since you'd not have an LSN you could detect as the divergence point. If we allowed passing "rewind to" point LSN value, then that (again, as far as I understand it) would work, but it's a different use case. Specifically I wouldn't want that option to need to be used for this particular case since in my example there is in fact a real divergence point that we should be detecting automatically. Thanks, James Coleman
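For reference, one way to see from SQL which timeline the latest checkpoint belongs to, and therefore whether the post-promotion checkpoint has completed on the new primary, is to query the control file (a sketch, run on the newly promoted primary; once timeline_id reflects the new timeline, the post-promotion checkpoint has finished):

SELECT timeline_id, checkpoint_lsn, redo_lsn
FROM pg_control_checkpoint();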
Re: pg_rewind: warn when checkpoint hasn't happened after promotion
Robert Haas writes: > Is there anything intrinsic to the mechanism of operation of pg_rewind > that requires a timeline change, or could we just rewind within the > same timeline to an earlier LSN? In other words, maybe we could just > remove this limitation of pg_rewind, and then perhaps it wouldn't be > necessary to determine what the new timeline is. That seems like a fairly bad idea. For example, if you've already archived some WAL segments past the rewind target, there will shortly be two versions of truth about what that part of the WAL space contains, and your archiver will either spit up or do probably-the-wrong-thing. regards, tom lane
Re: PSA: Autoconf has risen from the dead
Hi, On 2022-07-05 14:42:03 -0400, Robert Haas wrote: > On Sun, Jul 3, 2022 at 1:17 PM Andres Freund wrote: > > Yea, I guess I should start a documentation section... > > > > I've only used homebrew on mac, but with that it should be something along > > the > > lines of > > > > brew install meson > > meson setup --buildtype debug -Dcassert=true build-directory > > cd build-directory > > ninja > > > > of course if you want to build against some dependencies and / or run tap > > tests, you need to do something similar to what you have to do for > > configure. I.e. > > - install perl modules [1] > > - tell the build about location of homebrew [2] > > Since I'm a macports user I hope at some point we'll have directions > for that as well as for homebrew. I am not a normal mac user, it looks hard to run macos in a VM, and I'm not sure it's wise to mix macports and homebrew on my test box. So I don't want to test it myself. But it looks like it's just sudo port install meson Greetings, Andres Freund
Re: pg_rewind: warn when checkpoint hasn't happened after promotion
On Tue, Jul 5, 2022 at 2:47 PM Tom Lane wrote: > Robert Haas writes: > > Is there anything intrinsic to the mechanism of operation of pg_rewind > > that requires a timeline change, or could we just rewind within the > > same timeline to an earlier LSN? In other words, maybe we could just > > remove this limitation of pg_rewind, and then perhaps it wouldn't be > > necessary to determine what the new timeline is. > > That seems like a fairly bad idea. For example, if you've already > archived some WAL segments past the rewind target, there will shortly > be two versions of truth about what that part of the WAL space contains, > and your archiver will either spit up or do probably-the-wrong-thing. Well, only if you void the warranty. If you rewind the ex-primary to the LSN where the new primary is replaying and tell it to start replaying from there and follow the new primary's subsequent switch onto a new timeline, there's no split-brain problem. -- Robert Haas EDB: http://www.enterprisedb.com
Re: First draft of the PG 15 release notes
On Tue, Jul 5, 2022 at 11:09 AM Bruce Momjian wrote: > > Actually, I was wrong. I thought that we only mentioned that we > computed a more aggressive xid, but now I see I was mentioning the _frozen_ > xid. Reading the commit, we do compute the multi-xid and store that too > so I have updated the PG 15 release notes with the attached patch. It might be worth using the "symbol names" directly, since they appear in the documentation already (under "Routine Vacuuming"). These are relfrozenxid and relminmxid. These are implementation details, but they're documented in detail (though admittedly the documentation has *lots* of problems). Here is what I would like this item to hint at, to advanced users with tricky requirements: The new approach to setting relminmxid will improve the behavior of VACUUM in databases that already happen to use lots of MultiXacts. These users will notice that autovacuum now works off of relminmxid values that actually tell us something about each table's consumption of MultiXacts over time. Most individual tables naturally consume *zero* MultiXacts, even in databases that consume many MultiXacts -- due to naturally occurring workload characteristics. The old approach failed to recognize this, leading to very uniform relminmxid values across tables that were in fact very different, MultiXact-wise. The way that we handle relfrozenxid is probably much less likely to make life much easier for any database, at least on its own, in Postgres 15. So from the point of view of a user considering upgrading, the impact on relminmxid is likely to be far more important. Admittedly the most likely scenario by far is that the whole feature just isn't interesting, but a small minority of advanced users (users with painful MultiXact problems) will find the relminmxid thing very compelling. -- Peter Geoghegan
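For readers who want to see the effect on their own tables after an upgrade, both columns can be inspected directly; a sketch, where age() and mxid_age() report how far each value trails the current XID and MultiXact counters:

SELECT relname,
       relfrozenxid, age(relfrozenxid) AS xid_age,
       relminmxid, mxid_age(relminmxid) AS mxid_age
FROM pg_class
WHERE relkind IN ('r', 'm', 't')
ORDER BY age(relfrozenxid) DESC
LIMIT 10;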
Re: PSA: Autoconf has risen from the dead
Andres Freund writes: > On 2022-07-05 14:42:03 -0400, Robert Haas wrote: >> Since I'm a macports user I hope at some point we'll have directions >> for that as well as for homebrew. > But it looks like it's just > sudo port install meson Yeah, that's what I did to install it locally. The ninja package has some weird name (ninja-build or some such), but you don't have to remember that because installing meson is enough to pull it in. I dunno anything about the other steps Andres mentioned, but presumably they're independent of where you got meson from. regards, tom lane
Re: PSA: Autoconf has risen from the dead
On Tue, Jul 5, 2022 at 2:52 PM Tom Lane wrote: > Andres Freund writes: > > On 2022-07-05 14:42:03 -0400, Robert Haas wrote: > >> Since I'm a macports user I hope at some point we'll have directions > >> for that as well as for homebrew. > > > But it looks like it's just > > sudo port install meson > > Yeah, that's what I did to install it locally. The ninja package > has some weird name (ninja-build or some such), but you don't have > to remember that because installing meson is enough to pull it in. > > I dunno anything about the other steps Andres mentioned, but > presumably they're independent of where you got meson from. That seems simple enough that even I can handle it! -- Robert Haas EDB: http://www.enterprisedb.com
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
On 2022-07-05 Tu 12:59, Tom Lane wrote: > Bruce Momjian writes: >> This patch has been applied back to 9.6 and will appear in the next >> minor release. > I have just discovered that this patch broke pg_upgrade's ability > to upgrade from 8.4: > > $ pg_upgrade -b ~/version84/bin -d ... > Performing Consistency Checks > - > Checking cluster versions ok > The source cluster lacks some required control information: > latest checkpoint oldestXID > > Cannot continue without required control information, terminating > Failure, exiting > > Sure enough, 8.4's pg_controldata doesn't print anything about > oldestXID, because that info wasn't there then. > > Given the lack of field complaints, it's probably not worth trying > to do anything to restore that capability. But we really ought to > update pg_upgrade's code and docs in pre-v15 branches to say that > the minimum supported source version is 9.0. So it's taken us a year to discover the issue :-( Perhaps if we're going to say we support upgrades back to 9.0 we should have some testing to be assured we don't break it without knowing like this. I'll see if I can coax crake to do that - it already tests back to 9.2. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: doc: BRIN indexes and autosummarize
On Tue, Jul 5, 2022 at 5:47 AM Alvaro Herrera wrote: > OK, I have adopted all your proposed changes, thanks for submitting in > both forms. I did some more wordsmithing and pushed, to branches 12 and > up. 11 fails 'make check', I think for lack of Docbook id tags, and I > didn't want to waste more time. Kindly re-read the result and let me > know if I left something unaddressed, or made something worse. The > updated text is already visible in the website: > https://www.postgresql.org/docs/devel/brin-intro.html > You removed the reference to the functions' documentation at functions-admin-index choosing instead to duplicate a summarized version of the docs, and to boot getting the next block to be blobbed together with it. Keeping with the reduced-readability theme, you made the paragraphs even bigger. While I do appreciate the time to clarify things a bit, as was my original intent with the patch, we should be writing documentation with the user in mind, not for our developer eyes. Different target audiences. It is less helpful to have awesome features that don't get used because users can't really grasp the docs. Paragraphs such as this feel like we're playing "summary bingo": When a new page is created that does not fall within the last summarized range, the range that the new page belongs into does not automatically acquire a summary tuple; those tuples remain unsummarized until a summarization run is invoked later, creating the initial summary for that range Roberto -- Crunchy Data -- passion for open source PostgreSQL
Re: [PATCH] Implement INSERT SET syntax
On Thu, Apr 7, 2022 at 11:29 AM Marko Tiikkaja wrote: > I can help with review and/or other work here. Please give me a > couple of weeks. Hi Marko, did you get a chance to pick up this patchset? If not, no worries; I can mark this RwF and we can try again in a future commitfest. Thanks, --Jacob
Re: PSA: Autoconf has risen from the dead
Hi, On 2022-07-05 14:52:05 -0400, Tom Lane wrote: > I dunno anything about the other steps Andres mentioned, but > presumably they're independent of where you got meson from. Yea. They might not be independent of where you get other dependencies from though. Does macports install headers / libraries into a path that's found by default? Or does one have to pass --with-includes / --with-libs to configure and set PKG_CONFIG_PATH, like with homebrew? Except that with meson, setting PKG_CONFIG_PATH should suffice for most (all?) dependencies on macOS, that the syntax for with-includes/libs is a bit different (-Dextra_include_dirs=... and -Dextra_lib_dirs=...), and that one can optionally use a parameter (--pkg-config-path) instead of PKG_CONFIG_PATH, that part shouldn't really differ from what's necessary for configure. Greetings, Andres Freund
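So for a Homebrew install something along these lines ought to be enough (a rough sketch only; the /opt/homebrew paths are an assumption and vary by machine):

    export PKG_CONFIG_PATH=/opt/homebrew/opt/openssl/lib/pkgconfig:$PKG_CONFIG_PATH
    meson setup build-directory \
        -Dextra_include_dirs=/opt/homebrew/include \
        -Dextra_lib_dirs=/opt/homebrew/lib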
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On 2022-07-05 Tu 14:36, Andres Freund wrote: > Hi, > > On 2022-06-24 10:29:06 -0400, Andrew Dunstan wrote: >> On 2022-06-23 Th 21:51, Andres Freund wrote: >>> On 2022-06-23 16:38:12 +0900, Michael Paquier wrote: On Tue, Jun 21, 2022 at 05:41:07PM -0400, Andrew Dunstan wrote: > Yes, but I don't guarantee to have a fix in time for Beta2. IMHO, it would be nice to get something done for beta2. Now the thread is rather fresh and I guess that more performance study is required even for 0004, so.. >>> I don't think there's a whole lot of performance study needed for 0004 - the >>> current state is obviously wrong. >>> >>> I think Andrew's beta 2 comment was more about my other architectural >>> complains around the json expression eval stuff. >> >> Right. That's being worked on but it's not going to be a mechanical fix. > Any updates here? Not yet. A colleague and I are working on it. I'll post a status this week if we can't post a fix. > > I'd mentioned the significant space use due to all JsonCoercionsState for all > the types. Another related aspect is that this code is just weird - the same > struct name (JsonCoercionsState), nested in each other? > > struct JsonCoercionsState > { > struct JsonCoercionState > { > JsonCoercion *coercion; /* coercion expression */ > ExprState *estate; /* coercion expression state */ > } null, > string, > numeric, > boolean, > date, > time, > timetz, > timestamp, > timestamptz, > composite; > } coercions; /* states for coercion from SQL/JSON item > * types directly to the output type */ > > Also note the weird numeric indentation that pgindent does... Yeah, we'll try to fix that. > > >> The attached very small patch applies on top of your 0002 and deals with >> the FmgrInfo complaint. > Now that the FmgrInfo is part of a separately allocated struct, that doesn't > seem necessary anymore. Right, but you complained that we should do it the same way as it's done elsewhere, so I thought I'd do that anyway. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: PSA: Autoconf has risen from the dead
Andres Freund writes: > Yea. They might not be independent of where you get other dependencies from > though. Does macports install headers / libraries into a path that's found by > default? Or does one have to pass --with-includes / --with-libs to configure > and set PKG_CONFIG_PATH, like with homebrew? What are you expecting to need PKG_CONFIG_PATH for? Or more precisely, why would meson/ninja create any new need for that that doesn't exist in the autoconf case? regards, tom lane
Re: [PATCH] Fix pg_upgrade test from v10
On Tue, Jul 05, 2022 at 09:01:49AM +0300, Anton A. Melnikov wrote: > On 01.07.2022 20:07, Justin Pryzby wrote: > > It's silly to say that v9.2 will be supported potentially for a handful more > > years, but that the upgrade-testing script itself doesn't support that, so > > developers each have to reinvent its fixups. > > I've tested the attached patch in all variants from v9.5..15 to supported > versions 10..master. The script test.sh for 9.5->10 and 9.6->10 upgrades > works fine without any patch. > In 9.4 there is a regress test largeobject to be patched to allow upgrade > test from this version. So I've stopped at 9.5. > It is clear that we limit the destination version for upgrade test to the > supported versions only. In our case destination versions > start from the 10th inclusive. > But is there a limit on the source version to upgrade from? As of last year, there's a reasonably clear policy for support of old versions: https://www.postgresql.org/docs/devel/pgupgrade.html |pg_upgrade supports upgrades from 9.2.X and later to the current major release of PostgreSQL, including snapshot and beta releases. See: e469f0aaf3c586c8390bd65923f97d4b1683cd9f So it'd be ideal if this were to support versions down to 9.2. This is failing in cfbot: http://cfbot.cputube.org/anton-melnikov.html ...since it tries to apply all the *.patch files to the master branch, one after another. For branches other than master, I suggest naming the patches *.txt or similar. Or, just focus for now on allowing upgrades *to* master. I'm not sure if anyone is interested in patching test.sh in backbranches. I'm not sure, but there may be more interest in backpatching the conversion to TAP (322becb60). -- Justin
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Hi, On 2022-06-23 18:51:45 -0700, Andres Freund wrote: > > Waiting for beta3 would a better move at this stage. Is somebody confident > > enough in the patches proposed? > > 0001 is the one that needs to most careful analysis, I think. 0002 I'd be fine > with pushing after reviewing it again. For 0003 David's approach might be > better or worse, it doesn't matter much I think. 0004 is ok I think, perhaps > with the exception of quibbling over some naming decisions? I don't quite feel comfortable with 0001, without review by others. So my current plan is to drop it and use get_timeout_active() "manually". We can improve this in HEAD to remove the redundancy. I've pushed what was 0004, will push what was 0002 with the above change in a short while unless somebody protests PDQ. Then will look at David's edition of my 0003. Greetings, Andres Freund
Re: PSA: Autoconf has risen from the dead
Hi, On 2022-07-05 15:06:31 -0400, Tom Lane wrote: > Andres Freund writes: > > Yea. They might not be independent of where you get other dependencies from > > though. Does macports install headers / libraries into a path that's found > > by > > default? Or does one have to pass --with-includes / --with-libs to configure > > and set PKG_CONFIG_PATH, like with homebrew? > > What are you expecting to need PKG_CONFIG_PATH for? Or more precisely, > why would meson/ninja create any new need for that that doesn't exist > in the autoconf case? It's just used in more cases than before, with fallback to non-pkg-config in most cases. I think all dependencies besides perl can use pkg-config. So all that changes compared to AC is that you might not need to pass extra include/lib paths for some dependencies that needed it before, if you set/pass PKG_CONFIG_PATH. Greetings, Andres Freund
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
Andrew Dunstan writes: > So it's taken us a year to discover the issue :-( Perhaps if we're going > to say we support upgrades back to 9.0 we should have some testing to be > assured we don't break it without knowing like this. I'll see if I can > coax crake to do that - it already tests back to 9.2. Hmm ... could you first look into why 09878cdd4 broke it? I'd supposed that that was just detecting situations we must already have dealt with in order for the pg_upgrade test to work, but crake's not happy. regards, tom lane
Re: [PATCH] Completed unaccent dictionary with many missing characters
Michael Paquier wrote on 7/5/2022 9:22 AM: On Tue, Jun 28, 2022 at 02:14:53PM +0900, Michael Paquier wrote: Well, the addition of cyrillic does not make necessary the removal of SOUND RECORDING COPYRIGHT or the DEGREEs, that implies the use of a dictionnary when manipulating the set of codepoints, but that's me being too picky. Just to say that I am fine with what you are proposing here. So, I have been looking at the change for cyrillic letters, and are you sure that the range of codepoints [U+0410,U+044f] is right when it comes to consider all those letters as plain letters? There are a couple of characters that itch me a bit with this range: - What of the letter CAPITAL SHORT I (U+0419) and SMALL SHORT I (U+0439)? Shouldn't U+0439 be translated to U+0438 and U+0419 translated to U+0418? That's what I get while looking at UnicodeData.txt, and it would mean that the range of plain letters should not include both of them. 1. It's good that you noticed it. I missed it. But it doesn't affect the generated rule list. - It seems like we are missing a couple of letters after U+044F, like U+0454, U+0456 or U+0455 just to name three of them? 2. I added a few more letters that are used in languages other than Russian: Byelorussian or Ukrainian. - (0x0410, 0x044f), # Cyrillic capital and small letters + (0x0402, 0x0402), # Cyrillic capital and small letters + (0x0404, 0x0406), # + (0x0408, 0x040b), # + (0x040f, 0x0418), # + (0x041a, 0x0438), # + (0x043a, 0x044f), # + (0x0452, 0x0452), # + (0x0454, 0x0456), # I do not add more, because they probably concern older languages. An alternative might be to rely entirely on Unicode decomposition ... However, after the change, only one additional Ukrainian letter with an accent was added to the rule file. I have extracted from 0001 and applied the parts about the regression tests for degree signs, while adding two more for SOUND RECORDING COPYRIGHT (U+2117) and Black-Letter Capital H (U+210C) translated to 'x', while it should be probably 'H'. 3. The matter is not that simple. When I change priorities (ie Latin-ASCII.xml is less important than Unicode decomposition), then "U + 33D7" changes not to pH but to PH. In the end, I left it like it was before ... If you decide what to do with point 3, I will correct it and send new patches. -- Przemysław Sztoch | Mobile +48 509 99 00 66
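To make the SHORT I question concrete, the behavior under discussion would look roughly like this (a hypothetical illustration, assuming the decomposition Michael describes ends up in the generated rules):

    -- unaccent is the contrib extension of the same name
    CREATE EXTENSION IF NOT EXISTS unaccent;
    SELECT unaccent('Й');  -- U+0419; would yield 'И' (U+0418) if decomposed
    SELECT unaccent('й');  -- U+0439; would yield 'и' (U+0438) if decomposed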
Re: Emit extra debug message when executing extension script.
On 2022-Jul-05, Robert Haas wrote: > On Mon, Jul 4, 2022 at 5:27 AM Alvaro Herrera wrote: > > On 2022-Jun-29, Robert Haas wrote: > > > Why? > > > > The reason is that errmsg() marks the message for translation, and we > > don't want to burden translators with messages that are of little > > interest to most users. Using either elog() or errmsg_internal() > > avoids that. > > Yeah, I'm aware of that in general, but I'm not quite clear on how we > decide that. Do we take the view that all debug-level messages need > not be translated? Yes. I don't know about others, but I do. I notice that we have a small number of other errmsg() uses in DEBUG messages already. I don't think they're quite worth it. I mean, would a user ever run amcheck with log level set to DEBUG, and care about any of these messages? I think I wouldn't care. contrib/amcheck/verify_heapam.c:ereport(DEBUG1, contrib/amcheck/verify_heapam.c- (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION), contrib/amcheck/verify_heapam.c- errmsg("cannot verify unlogged relation \"%s\" during recovery, skipping", contrib/amcheck/verify_nbtree.c:ereport(DEBUG1, contrib/amcheck/verify_nbtree.c- (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION), contrib/amcheck/verify_nbtree.c- errmsg("cannot verify unlogged index \"%s\" during recovery, skipping", Why are these basic_archive messages translatable? Seems pointless. contrib/basic_archive/basic_archive.c: ereport(DEBUG3, contrib/basic_archive/basic_archive.c- (errmsg("archiving \"%s\" via basic_archive", file))); contrib/basic_archive/basic_archive.c: ereport(DEBUG3, contrib/basic_archive/basic_archive.c- (errmsg("archive file \"%s\" already exists with identical contents", contrib/basic_archive/basic_archive.c: ereport(DEBUG1, contrib/basic_archive/basic_archive.c- (errmsg("archived \"%s\" via basic_archive", file))); We also have a small number in the backend: src/backend/access/heap/vacuumlazy.c: ereport(DEBUG2, src/backend/access/heap/vacuumlazy.c- (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages", src/backend/access/heap/vacuumlazy.c- vacrel->relname, (long long) index, vacuumed_pages))); Why is this one unconditional DEBUG2 instead of depending on LVRelState->message_level (ie. turn into INFO when VERBOSE is used), like every other message from vacuum? It seems to me that if I say VERBOSE, then I may be interested in this message also. While at it, why are the index vacuuming routines not using LVRelState->message_level either but instead hardcode DEBUG2? Aren't they all mistakes? src/backend/replication/logical/worker.c: ereport(DEBUG1, src/backend/replication/logical/worker.c- (errmsg("logical replication apply worker for subscription \"%s\" two_phase is %s", src/backend/replication/logical/worker.c- MySubscription->name, Not sure why anybody cares about this. Why not remove the message? src/backend/utils/activity/pgstat.c:ereport(DEBUG2, src/backend/utils/activity/pgstat.c-(errcode_for_file_access(), src/backend/utils/activity/pgstat.c- errmsg("unlinked permanent statistics file \"%s\"", Hmm, why is there an errcode here, after the operation succeeds? ISTM this could be an elog(). Then we have this one: ereport(DEBUG1, (errcode(ERRCODE_INTERNAL_ERROR), errmsg("picksplit method for column %d of index \"%s\" failed", attno + 1, RelationGetRelationName(r)), errhint("The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command."))); I cannot understand how is DEBUG1 a useful log level for this message. 
How is the user going to find out that there is a problem, when this message is hidden from them? Do we tell people to run their insert queries for tables with GiST indexes under DEBUG1, in case the picksplit method fails, so that they can contact a developer? How many user-defined picksplit methods have been improved to cope with this problem, since commit 09368d23dbf4 added this bit in April 2009? How many of them have been presenting the problem since then, and not been fixed because nobody has noticed that there is a problem? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
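For what it's worth, the distinction discussed above looks like this in code (an illustrative sketch only; the message text and variable are made up):

    /*
     * A DEBUG-level message of little interest to most users: errmsg_internal()
     * keeps it out of the translation catalog, whereas errmsg() would mark it
     * for translation and burden translators.
     */
    ereport(DEBUG1,
            (errmsg_internal("executing extension script \"%s\"", filename)));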
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
On Tue, Jul 5, 2022 at 11:53 AM Andrew Dunstan wrote: > > Sure enough, 8.4's pg_controldata doesn't print anything about > > oldestXID, because that info wasn't there then. > > > > Given the lack of field complaints, it's probably not worth trying > > to do anything to restore that capability. But we really ought to > > update pg_upgrade's code and docs in pre-v15 branches to say that > > the minimum supported source version is 9.0. > > > So it's taken us a year to discover the issue :-( I'm not surprised at all, given the history here. There were at least a couple of bugs affecting how pg_upgrade carries forward information about these cutoffs. See commits 74cf7d46 and a61daa14. Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u argument was first added, for use by pg_upgrade. That commit is only about a year old, and was only backpatched to 9.6. Unfortunately the previous approach to carrying forward oldestXID was an accident that usually worked. So...yeah, things are bad here. At least we now have the ability to detect any downstream problems that this might cause by using pg_amcheck. -- Peter Geoghegan
Re: pg_checkpointer is not a verb or verb phrase
On Tue, Jul 05, 2022 at 01:38:43PM -0400, Robert Haas wrote: > Hearing several votes in favor and none opposed, committed and > back-patched to v15. Thanks. > I added the catversion bump, but left out the .po > file changes, figuring it was better to let those files get updated > via the normal process. I'll keep this in mind for future patches. The changes looked pretty obvious, so I wasn't sure whether to include it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: doc: BRIN indexes and autosummarize
On 2022-Jul-05, Roberto Mello wrote: > You removed the reference to the functions' documentation at > functions-admin-index choosing instead to duplicate a summarized > version of the docs, and to boot getting the next block to be blobbed > together with it. Actually, my first instinct was to move the interesting parts to the functions docs, then reference those, removing the duplicate bits. But I was discouraged when I read it, because it is just a table in a place not really appropriate for a larger discussion on it. Also, a reference to it is not direct, but rather it goes to a table that contains a lot of other stuff. > Keeping with the reduced-readability theme, you made the paragraphs > even bigger. While I do appreciate the time to clarify things a bit, as was > my original intent with the patch, [...] Hmm, which paragraph are you referring to? I'm not aware of having made any paragraph bigger, quite the opposite. In the original text, the paragraph "At the time of creation," is 13 lines on a browser window that is half the screen; in the patched text, that has been replaced by three paragraphs that are 7, 6, and 4 lines long, plus a separate one for the de-summarization bits at the end of the page, which is 3 lines long. > We should be writing documentation with the user in mind, not for our > developer eyes. Different target audiences. It is less helpful to have > awesome features that don't get used because users can't really > grasp the docs. I try to do that. I guess I fail more frequently than I should. > Paragraphs such as this feel like we're playing "summary bingo": > > When a new page is created that does not fall within the last > summarized range, the range that the new page belongs into > does not automatically acquire a summary tuple; > those tuples remain unsummarized until a summarization run is > invoked later, creating the initial summary for that range Yeah, I am aware that the word "summary" and variations occur way too many times. Maybe it is possible to replace "summary tuple" with "BRIN tuple" for example; can you propose some synonym for "summarized" and "unsummarized"? Perhaps something like this: > When a new page is created that does not fall within the last > summarized range, the range that the new page belongs into > does not automatically acquire a BRIN tuple; > those [pages] remain uncovered by the BRIN index until a summarization > run is > invoked later, creating the initial BRIN tuple for that range (I also replaced the word "tuples" with "pages" in one spot.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "There are two moments in a man's life when he should not speculate: when he can afford it and when he cannot" (Mark Twain)
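To ground the terminology, the operations that paragraph describes are, concretely (table and index names invented for illustration):

    -- ask for new page ranges to be summarized automatically by autovacuum
    CREATE INDEX measurements_brin ON measurements USING brin (logdate)
        WITH (autosummarize = on);

    -- or create the missing summary tuples for existing unsummarized ranges
    SELECT brin_summarize_new_values('measurements_brin');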
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
On Tue, Jul 5, 2022 at 12:41 PM Peter Geoghegan wrote: > Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u > argument was first added, for use by pg_upgrade. That commit is only > about a year old, and was only backpatched to 9.6. I just realized that this thread was where that work was first discussed. That explains why it took a year to discover that we broke 8.4! On further reflection I think that breaking pg_upgrade for 8.4 might have been a good thing. The issue was fairly visible and obvious if you actually ran into it, which is vastly preferable to what would have happened before commit 74cf7d46. -- Peter Geoghegan
Re: "ERROR: latch already owned" on gharial
On Sun, Jul 3, 2022 at 11:51 PM Thomas Munro wrote: > On Wed, Jun 1, 2022 at 12:55 AM Robert Haas wrote: > > OK, I have access to the box now. I guess I might as well leave the > > crontab jobs enabled until the next time this happens, since Thomas > > just took steps to improve the logging, but I do think these BF > > members are overdue to be killed off, and would like to do that as > > soon as it seems like a reasonable step to take. > > A couple of months later, there has been no repeat of that error. I'd > happily forget about that and move on, if you want to decommission > these. I have commented out the BF stuff in crontab on that machine. -- Robert Haas EDB: http://www.enterprisedb.com
Re: doc: BRIN indexes and autosummarize
On Tue, Jul 05, 2022 at 01:47:27PM +0200, Alvaro Herrera wrote: > OK, I have adopted all your proposed changes, thanks for submitting in > both forms. I did some more wordsmithing and pushed, to branches 12 and > up. 11 fails 'make check', I think for lack of Docbook id tags, and I > didn't want to waste more time. Kindly re-read the result and let me > know if I left something unaddressed, or made something worse. The > updated text is already visible in the website: > https://www.postgresql.org/docs/devel/brin-intro.html One issue: + summarized range, the range that the new page belongs into + does not automatically acquire a summary tuple; "belongs into" sounds wrong - "belongs to" is better. I'll put that change into my "typos" branch to fix later if it's not addressed in this thread. -- Justin
Re: First draft of the PG 15 release notes
On Tue, Jul 5, 2022 at 11:51:31AM -0700, Peter Geoghegan wrote: > On Tue, Jul 5, 2022 at 11:09 AM Bruce Momjian wrote: > > > > Actually, I was wrong. I thought that we only mentioned that we > > computed a more aggressive xid, but now I see I was mentioning the _frozen_ > > xid. Reading the commit, we do compute the multi-xid and store that too > > so I have updated the PG 15 release notes with the attached patch. > > It might be worth using the "symbol names" directly, since they appear > in the documentation already (under "Routine Vacuuming"). These are > relfrozenxid and > relminmxid. These are implementation > details, but they're documented in detail (though admittedly the > documentation has *lots* of problems). Well, users can look into the details if they wish. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: First draft of the PG 15 release notes
On Tue, Jul 05, 2022 at 02:35:39PM -0400, Bruce Momjian wrote: > On Fri, Jul 1, 2022 at 06:21:28PM -0700, Noah Misch wrote: > > Here's what I've been trying to ask: what do you think of linking to > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS > > here? The release note text is still vague, and the docs have extensive > > coverage of the topic. The notes can just link to that extensive coverage. > > Sure. how is this patch? > --- a/doc/src/sgml/release-15.sgml > +++ b/doc/src/sgml/release-15.sgml > @@ -63,11 +63,12 @@ Author: Noah Misch >permissions on the public schema has not >been changed. Databases restored from previous Postgres releases >will be restored with their current permissions. Users wishing > - to have the former permissions will need to grant > + to have the former more-open permissions will need to grant >CREATE permission for PUBLIC >on the public schema; this change can be made >on template1 to cause all new databases > - to have these permissions. > + to have these permissions. This change was made to increase > + security; see . > > I think this still puts undue weight on single-user systems moving back to the old default. The linked documentation does say how to get back to v14 permissions (and disclaims security if you do so), so let's not mention it here. The attached is how I would write it. I also reworked the "Databases restored from previous ..." sentence, since its statement is also true of databases restored v15-to-v15 (no "previous" release involved). I also moved the bit about USAGE to end, since it's just emphasizing what the reader should already assume. Any concerns? Author: Noah Misch Commit: Noah Misch diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml index 179ad37..aa02ee9 100644 --- a/doc/src/sgml/release-15.sgml +++ b/doc/src/sgml/release-15.sgml @@ -58,16 +58,15 @@ Author: Noah Misch - This is a change in the default for newly-created databases in - existing clusters and for new clusters; USAGE - permissions on the public schema has not - been changed. Databases restored from previous Postgres releases - will be restored with their current permissions. Users wishing - to have the former permissions will need to grant - CREATE permission for PUBLIC - on the public schema; this change can be made - on template1 to cause all new databases - to have these permissions. + The new default is one of the secure schema usage patterns that + has recommended since the + security release for CVE-2018-1058. Upgrading a cluster or restoring a + database dump will preserve existing permissions. This is a change in + the default for newly-created databases in existing clusters and for new + clusters. In existing databases, especially those having multiple + users, consider issuing a REVOKE to adopt this new + default. (USAGE permission on this schema has not + changed.)
Re: First draft of the PG 15 release notes
On Tue, Jul 5, 2022 at 12:53:49PM -0700, Noah Misch wrote: > On Tue, Jul 05, 2022 at 02:35:39PM -0400, Bruce Momjian wrote: > > On Fri, Jul 1, 2022 at 06:21:28PM -0700, Noah Misch wrote: > > > Here's what I've been trying to ask: what do you think of linking to > > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS > > > here? The release note text is still vague, and the docs have extensive > > > coverage of the topic. The notes can just link to that extensive > > > coverage. > > > > Sure. how is this patch? > > > --- a/doc/src/sgml/release-15.sgml > > +++ b/doc/src/sgml/release-15.sgml > > @@ -63,11 +63,12 @@ Author: Noah Misch > >permissions on the public schema has not > >been changed. Databases restored from previous Postgres releases > >will be restored with their current permissions. Users wishing > > - to have the former permissions will need to grant > > + to have the former more-open permissions will need to grant > >CREATE permission for PUBLIC > >on the public schema; this change can be made > >on template1 to cause all new databases > > - to have these permissions. > > + to have these permissions. This change was made to increase > > + security; see . > > > > > > I think this still puts undue weight on single-user systems moving back to the > old default. The linked documentation does say how to get back to v14 > permissions (and disclaims security if you do so), so let's not mention it > here. The attached is how I would write it. I also reworked the "Databases > restored from previous ..." sentence, since its statement is also true of > databases restored v15-to-v15 (no "previous" release involved). I also moved > the bit about USAGE to end, since it's just emphasizing what the reader should > already assume. Any concerns? I see where you are going --- to talk about how to convert upgraded clusters to secure clusters, rather than how to revert to the previous behavior. I assumed that the most common question would be how to get the previous behavior, rather than how to get the new behavior in upgraded clusters. However, I am fine with what you think is best. My only stylistic suggestion would be to remove "a" from "a REVOKE". -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
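(For reference, the two commands being weighed here are roughly the following; per the patch text, the GRANT can be issued in template1 so that newly created databases inherit it.)

    -- restore the pre-15 default (weaker security; see the linked docs)
    GRANT CREATE ON SCHEMA public TO PUBLIC;

    -- adopt the new default in an existing (e.g. upgraded) database
    REVOKE CREATE ON SCHEMA public FROM PUBLIC;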
Re: making relfilenodes 56 bits
On Fri, Jul 1, 2022 at 6:42 AM Dilip Kumar wrote: > > - I might be missing something here, but this isn't actually making > > the relfilenode 56 bits, is it? The reason to do that is to make the > > BufferTag smaller, so I expected to see that BufferTag either used > > bitfields like RelFileNumber relNumber:56 and ForkNumber forkNum:8, or > > else that it just declared a single field for both as uint64 and used > > accessor macros or static inlines to separate them out. But it doesn't > > seem to do either of those things, which seems like it can't be right. > > On a related note, I think it would be better to declare RelFileNumber > > as an unsigned type even though we have no use for the high bit; we > > have, equally, no use for negative values. It's easier to reason about > > bit-shifting operations with unsigned types. > > Oops, I somehow missed merging that change into the patch. Changed it > like below and adjusted the macros. > typedef struct buftag > { > Oid spcOid; /* tablespace oid. */ > Oid dbOid; /* database oid. */ > uint32 relNumber_low; /* relfilenumber 32 lower bits */ > uint32 relNumber_hi:24; /* relfilenumber 24 high bits */ > uint32 forkNum:8; /* fork number */ > BlockNumber blockNum; /* blknum relative to begin of reln */ > } BufferTag; > > I think we need to break it up like this to keep the BufferTag 4-byte > aligned; otherwise the size of the structure will be increased. Well, I guess you're right. That's a bummer. In that case I'm a little unsure whether it's worth using bit fields at all. Maybe we should just write uint32 something[2] and use macros after that. Another approach could be to accept the padding and define a constant SizeOfBufferTag and use that as the hash table element size, like we do for the sizes of xlog records. -- Robert Haas EDB: http://www.enterprisedb.com
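To sketch the macro approach (hypothetical names, assuming the two-field layout quoted above):

    /* reassemble the 56-bit relfilenumber from the low 32 and high 24 bits */
    #define BUFTAG_REL_NUMBER(tag) \
        ((((uint64) (tag)->relNumber_hi) << 32) | (uint64) (tag)->relNumber_low)

    /* split a 56-bit relfilenumber across the two fields */
    #define BUFTAG_SET_REL_NUMBER(tag, relnum) \
        do { \
            (tag)->relNumber_low = (uint32) (relnum); \
            (tag)->relNumber_hi = (uint32) (((uint64) (relnum)) >> 32); \
        } while (0)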
symbol "conflicts" between libpq and binaries
Hi, There's a number of symbols that are exported by libpq that are also in binaries (mostly via pgport). That strikes me as not great, because the behaviour in those cases isn't particularly well defined / OS dependent / linker option dependent. How about renaming those functions, turning the functions exported by libpq into wrappers? This is at least: pqsignal pg_char_to_encoding pg_valid_server_encoding_id pg_valid_server_encoding pg_encoding_to_char pg_utf_mblen I'm not quite sure why we export some of these, but we likely don't want to change that, given the API/ABI break that'd cause. Greetings, Andres Freund
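A minimal sketch of the wrapper idea for one of them (the _internal name is an assumption, standing in for whatever the renamed pgport symbol would be called):

    /* exported from libpq under the historical name, for ABI compatibility */
    pqsigfunc
    pqsignal(int signo, pqsigfunc func)
    {
        /* hypothetical renamed implementation living in src/port */
        return pqsignal_internal(signo, func);
    }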
Re: making relfilenodes 56 bits
On Tue, Jul 5, 2022 at 4:33 AM Dilip Kumar wrote: > I thought about this comment from Robert > > that's not quite the same as either of those things. For example, in > > tableam.h we currently say "This callback needs to create a new > > relation filenode for `rel`" and how should that be changed in this > > new naming? We're not creating a new RelFileNumber - those would need > > to be allocated, not created, as all the numbers in the universe exist > > already. Neither are we creating a new locator; that sounds like it > > means assembling it from pieces. > > I think that "This callback needs to create a new relation storage > for `rel`" looks better. I like the idea, but it would sound better to say "create new relation storage" rather than "create a new relation storage." -- Robert Haas EDB: http://www.enterprisedb.com
Re: PSA: Autoconf has risen from the dead
On Tue, Jul 5, 2022 at 3:02 PM Andres Freund wrote: > Yea. They might not be independent of where you get other dependencies from > though. Does macports install headers / libraries into a path that's found by > default? Or does one have to pass --with-includes / --with-libs to configure > and set PKG_CONFIG_PATH, like with homebrew? My configure switches include: --with-libraries=/opt/local/lib --with-includes=/opt/local/include I don't do anything with PKG_CONFIG_PATH. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_checkpointer is not a verb or verb phrase
On Tue, Jul 5, 2022 at 3:42 PM Nathan Bossart wrote: > On Tue, Jul 05, 2022 at 01:38:43PM -0400, Robert Haas wrote: > > Hearing several votes in favor and none opposed, committed and > > back-patched to v15. > > Thanks. > > > I added the catversion bump, but left out the .po > > file changes, figuring it was better to let those files get updated > > via the normal process. > > I'll keep this in mind for future patches. The changes looked pretty > obvious, so I wasn't sure whether to include it. I believe Peter periodically runs a script that bulk copies everything over from the translation repository. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
On 2022-07-05 Tu 15:17, Tom Lane wrote: > Andrew Dunstan writes: >> So it's taken us a year to discover the issue :-( Perhaps if we're going >> to say we support upgrades back to 9.0 we should have some testing to be >> assured we don't break it without knowing like this. I'll see if I can >> coax crake to do that - it already tests back to 9.2. > Hmm ... could you first look into why 09878cdd4 broke it? I'd supposed > that that was just detecting situations we must already have dealt with > in order for the pg_upgrade test to work, but crake's not happy. It's complaining about this: andrew@emma:HEAD $ cat ./inst/REL9_6_STABLE-20220705T160820.039/incompatible_polymorphics.txt In database: regression aggregate: public.first_el_agg_f8(double precision) I can have TestUpgradeXVersion.pm search for and remove offending functions, if that's the right fix. I note too that drongo is failing similarly, but its pg_upgrade output directory is missing, so 4fff78f009 seems possibly shy of a load w.r.t. MSVC. I will investigate. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Issue with pg_stat_subscription_stats
Hi, On 2022-07-04 11:01:01 +0900, Masahiko Sawada wrote: > I've attached the patch, fix_drop_subscriptions_stats.patch, to fix it. LGTM. Unless somebody sees a reason not to, I'm planning to commit that to 15 and HEAD. > I've also attached another PoC patch, > poc_create_subscription_stats.patch, to create the stats entry when > creating the subscription, which address the issue reported in this > thread; the pg_stat_reset_subscription_stats() doesn't update the > stats_reset if no error is reported yet. It'd be good for this to include a test. > diff --git a/src/backend/utils/activity/pgstat_subscription.c > b/src/backend/utils/activity/pgstat_subscription.c > index e1072bd5ba..ef318b7422 100644 > --- a/src/backend/utils/activity/pgstat_subscription.c > +++ b/src/backend/utils/activity/pgstat_subscription.c > @@ -47,8 +47,20 @@ pgstat_report_subscription_error(Oid subid, bool > is_apply_error) > void > pgstat_create_subscription(Oid subid) > { > + PgStat_EntryRef *entry_ref; > + PgStatShared_Subscription *shstatent; > + > pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION, > InvalidOid, > subid); > + > + entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_SUBSCRIPTION, > + > InvalidOid, subid, > + > false); > + shstatent = (PgStatShared_Subscription *) entry_ref->shared_stats; > + > + memset(&shstatent->stats, 0, sizeof(shstatent->stats)); > + > + pgstat_unlock_entry(entry_ref); > } > > /* I think most of this could just be pgstat_reset_entry(). Greetings, Andres Freund
Re: First draft of the PG 15 release notes
On Tue, Jul 05, 2022 at 04:35:32PM -0400, Bruce Momjian wrote: > On Tue, Jul 5, 2022 at 12:53:49PM -0700, Noah Misch wrote: > > On Tue, Jul 05, 2022 at 02:35:39PM -0400, Bruce Momjian wrote: > > > On Fri, Jul 1, 2022 at 06:21:28PM -0700, Noah Misch wrote: > > > > Here's what I've been trying to ask: what do you think of linking to > > > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS > > > > here? The release note text is still vague, and the docs have extensive > > > > coverage of the topic. The notes can just link to that extensive > > > > coverage. > > > > > > Sure. how is this patch? > > > > > --- a/doc/src/sgml/release-15.sgml > > > +++ b/doc/src/sgml/release-15.sgml > > > @@ -63,11 +63,12 @@ Author: Noah Misch > > >permissions on the public schema has not > > >been changed. Databases restored from previous Postgres releases > > >will be restored with their current permissions. Users wishing > > > - to have the former permissions will need to grant > > > + to have the former more-open permissions will need to grant > > >CREATE permission for PUBLIC > > >on the public schema; this change can be made > > >on template1 to cause all new databases > > > - to have these permissions. > > > + to have these permissions. This change was made to increase > > > + security; see . > > > > > > > > > > I think this still puts undue weight on single-user systems moving back to > > the > > old default. The linked documentation does say how to get back to v14 > > permissions (and disclaims security if you do so), so let's not mention it > > here. The attached is how I would write it. I also reworked the "Databases > > restored from previous ..." sentence, since its statement is also true of > > databases restored v15-to-v15 (no "previous" release involved). I also > > moved > > the bit about USAGE to end, since it's just emphasizing what the reader > > should > > already assume. Any concerns? > > I see where you are going --- to talk about how to convert upgraded > clusters to secure clusters, rather than how to revert to the previous > behavior. I assumed that the most common question would be how to get > the previous behavior, rather than how to get the new behavior in > upgraded clusters. However, I am fine with what you think is best. Since having too-permissive ACLs is usually symptom-free, I share your forecast about the more-common question. Expect questions on mailing lists, stackoverflow, etc. The right way to answer those questions is roughly this: > On PostgreSQL 15, my application gets "permission denied for schema > public". What should I do? You have a choice to make. The best selection depends on the security needs of your database. See https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS for a guide to making that choice. Recommending GRANT to that two-sentence question would be negligent. One should know a database's lack of security needs before recommending GRANT. This is a key opportunity to have more users make the right decision while their attention is on the topic. > My only stylistic suggestion would be to remove "a" from "a > REVOKE". I'll plan to push with that change.