Re: Patch proposal: New hooks in the connection path

2022-07-05 Thread Bharath Rupireddy
On Mon, Jul 4, 2022 at 6:29 PM Drouvot, Bertrand  wrote:
>
> On 7/2/22 2:49 AM, Roberto Mello wrote:
>
> On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart  
> wrote:
>>
>> That being said, I don't see why this information couldn't be provided in a
>> system view.  IMO it is generically useful.
>
> +1 for a system view with appropriate permissions, in addition to the hooks.
>
> That would make the information easily accessible to a number of monitoring
> systems besides the admin.
>
> Agree about that.

Are we going to have it as part of the shared memory stats? Or in a
separate shared memory area for connection stats, exposing them via a
function, with a view built on top of that function, like
pg_get_replication_slots and pg_replication_slots?

> I'll start another thread and propose a dedicated patch for the "internal 
> counters" and how to expose them.

IMHO, let's have the discussion here in this thread and the patch can be 0002.

Regards,
Bharath Rupireddy.




Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-07-05 Thread Michael Paquier
On Tue, Jun 28, 2022 at 02:14:53PM +0900, Michael Paquier wrote:
> Well, the addition of cyrillic does not make necessary the removal of
> SOUND RECORDING COPYRIGHT or the DEGREEs; that implies the use of a
> dictionary when manipulating the set of codepoints, but that's me
> being too picky.  Just to say that I am fine with what you are
> proposing here.

So, I have been looking at the change for cyrillic letters, and are
you sure that the range of codepoints [U+0410,U+044F] is right when it
comes to considering all those letters as plain letters?  There are a
couple of characters in this range that bother me a bit:
- What of the letters CAPITAL SHORT I (U+0419) and SMALL SHORT I
(U+0439)?  Shouldn't U+0439 be translated to U+0438 and U+0419
to U+0418?  That's what I get while looking at UnicodeData.txt, and
it would mean that the range of plain letters should not include
either of them.
- It seems like we are also missing a few letters after U+044F, like
U+0454, U+0455 or U+0456, just to name three of them.

I have extracted from 0001 and applied the parts covering the
regression tests for degree signs, while adding two more for SOUND
RECORDING COPYRIGHT (U+2117) and BLACK-LETTER CAPITAL H (U+210C), which
is currently translated to 'x' although it should probably be 'H'.
--
Michael




Re: [PATCH] Add result_types column to pg_prepared_statements view

2022-07-05 Thread Dagfinn Ilmari Mannsåker
On Tue, 5 Jul 2022, at 06:34, Peter Eisentraut wrote:
> On 01.07.22 14:27, Dagfinn Ilmari Mannsåker wrote:
>> Peter Eisentraut  writes:
>> 
>>> On 19.05.22 17:34, Dagfinn Ilmari Mannsåker wrote:
>>>> Prompted by a question on IRC, here's a patch to add a result_types
>>>> column to the pg_prepared_statements view, so that one can see the types
>>>> of the columns returned by a prepared statement, not just the parameter
>>>> types.
>>>> I'm not quite sure about the column name, suggestions welcome.
>>>
>>> I think this patch is sensible.
>
>> The arguments about client-side type-specific value handling for
>> RowDescription don't apply here IMO, since this view is more
>> user-facing.
>
> I agree.  It's also easy to change if needed.  Committed as is.

Thanks!




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-05 Thread Masahiko Sawada
On Tue, Jul 5, 2022 at 6:18 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-06-16 13:56:55 +0900, Masahiko Sawada wrote:
> > diff --git a/src/backend/lib/radixtree.c b/src/backend/lib/radixtree.c
> > new file mode 100644
> > index 00..bf87f932fd
> > --- /dev/null
> > +++ b/src/backend/lib/radixtree.c
> > @@ -0,0 +1,1763 @@
> > +/*-
> > + *
> > + * radixtree.c
> > + *   Implementation for adaptive radix tree.
> > + *
> > + * This module employs the idea from the paper "The Adaptive Radix Tree:
> > + * ARTful Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper,
> > + * and Thomas Neumann, 2013.
> > + *
> > + * There are some differences from the proposed implementation.  For instance,
> > + * this radix tree module utilizes AVX2 instructions, enabling us to use
> > + * 256-bit width SIMD vectors, whereas 128-bit width SIMD vectors are used in
> > + * the paper.  Also, there is no support for path compression and lazy path
> > + * expansion.  The radix tree supports a fixed key length, so we don't expect
> > + * the tree level to be high.
>
> I think we're going to need path compression at some point, fwiw. I'd bet on
> it being beneficial even for the tid case.
>
>
> > + * The key is a 64-bit unsigned integer and the value is a Datum.
>
> I don't think it's a good idea to define the value type to be a datum.

A Datum value is convenient for representing both a pointer and a value,
so I used it to avoid defining node types for inner and leaf nodes
separately. Since a Datum could be 4 bytes or 8 bytes depending on the
platform, it might not be good for some platforms. But what aspects of
using Datum do you not like?

>
>
> > +/*
> > + * As we descend a radix tree, we push the node to the stack. The stack
> > + * is used at deletion.
> > + */
> > +typedef struct radix_tree_stack_data
> > +{
> > +	radix_tree_node *node;
> > +	struct radix_tree_stack_data *parent;
> > +} radix_tree_stack_data;
> > +typedef radix_tree_stack_data *radix_tree_stack;
>
> I think it's a very bad idea for traversal to need allocations. I really want
> to eventually use this for shared structures (eventually with lock-free
> searches at least), and needing to do allocations while traversing the tree is
> a no-go for that.
>
> Particularly given that the tree currently has a fixed depth, can't you just
> allocate this on the stack once?

Yes, we can do that.
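
For illustration, a minimal sketch of what a stack-allocated descent stack
could look like, assuming the 64-bit key is consumed 8 bits per level (the
type and constant names are invented here, not from the patch):

#include <stdint.h>

/*
 * A fixed-width uint64 key consumed 8 bits per level bounds the tree
 * depth, so traversal needs no palloc: the stack is a plain local array.
 */
#define RT_MAX_LEVEL	(64 / 8)	/* 8 chunks of 8 bits each */

typedef struct radix_tree_node radix_tree_node;	/* opaque for this sketch */

typedef struct rt_descent_stack
{
	radix_tree_node *nodes[RT_MAX_LEVEL];
	int			depth;			/* number of pushed nodes */
} rt_descent_stack;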

>
> > +/*
> > + * Allocate a new node with the given node kind.
> > + */
> > +static radix_tree_node *
> > +radix_tree_alloc_node(radix_tree *tree, radix_tree_node_kind kind)
> > +{
> > +	radix_tree_node *newnode;
> > +
> > +	newnode = (radix_tree_node *) MemoryContextAllocZero(tree->slabs[kind],
> > +														 radix_tree_node_info[kind].size);
> > +	newnode->kind = kind;
> > +
> > +	/* update the statistics */
> > +	tree->mem_used += GetMemoryChunkSpace(newnode);
> > +	tree->cnt[kind]++;
> > +
> > +	return newnode;
> > +}
>
> Why are you tracking the memory usage at this level of detail? It's *much*
> cheaper to track memory usage via the memory contexts? Since they're dedicated
> for the radix tree, that ought to be sufficient?

Indeed. I'll use MemoryContextMemAllocated instead.

>
>
> > +			else if (idx != n4->n.count)
> > +			{
> > +				/*
> > +				 * the key needs to be inserted in the middle of the
> > +				 * array, make space for the new key.
> > +				 */
> > +				memmove(&(n4->chunks[idx + 1]), &(n4->chunks[idx]),
> > +						sizeof(uint8) * (n4->n.count - idx));
> > +				memmove(&(n4->slots[idx + 1]), &(n4->slots[idx]),
> > +						sizeof(radix_tree_node *) * (n4->n.count - idx));
> > +			}
>
> Maybe we could add a static inline helper for these memmoves? Both because
> it's repetitive (for different node types) and because the last time I looked
> gcc was generating quite bad code for this. And having to put workarounds into
> multiple places is obviously worse than having to do it in one place.

Agreed, I'll update it.
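
A minimal sketch of what such a helper could look like (standalone C; the
function name and exact signature are invented here, not from the patch):

#include <string.h>
#include <stdint.h>

/*
 * Shift the paired chunk/slot arrays right by one to open a gap at
 * "idx".  Centralizing the memmoves gives all node types a single call
 * site, so any compiler workaround needs to be applied only once.
 */
static inline void
node_make_gap(uint8_t *chunks, void **slots, int idx, int count)
{
	memmove(&chunks[idx + 1], &chunks[idx], sizeof(uint8_t) * (count - idx));
	memmove(&slots[idx + 1], &slots[idx], sizeof(void *) * (count - idx));
}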

>
>
> > +/*
> > + * Insert the key with the val.
> > + *
> > + * found_p, if not NULL, is set to true if the key is already present,
> > + * otherwise to false.
> > + *
> > + * XXX: do we need to support update_if_exists behavior?
> > + */
>
> Yes, I think that's needed - hence using bfm_set() instead

Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-05 Thread Masahiko Sawada
On Tue, Jul 5, 2022 at 7:00 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-06-28 15:24:11 +0900, Masahiko Sawada wrote:
> > In both test cases, there is not much difference between using AVX2
> > and SSE2. The more node types, the more time it takes for loading the
> > data (see sse2_4_16_32_128_256).
>
> Yea, at some point the compiler starts using a jump table instead of branches,
> and that turns out to be a good bit more expensive. And even with branches, it
> obviously adds hard-to-predict branches. IIRC I fought a bit with the compiler
> to avoid some of that cost; it's possible that got "lost" in Sawada-san's
> patch.
>
>
> Sawada-san, what led you to discard the 1 and 16 node types? IIRC the 1 node
> one is not unimportant until we have path compression.

I wanted to start with a smaller number of node types for simplicity.
A 16-element node type has been added to the v4 patch I submitted[1]. I
think it's a trade-off between better memory usage and the overhead of
growing (and shrinking) node types. I'm going to add more node types
once benchmarks show that it's beneficial.

>
> Right now the node struct sizes are:
> 4 - 48 bytes
> 32 - 296 bytes
> 128 - 1304 bytes
> 256 - 2088 bytes
>
> I guess radix_tree_node_128->isset is just 16 bytes compared to 1288 other
> bytes, but needing that separate isset array somehow is sad :/. I wonder if a
> smaller "free index" would do the trick? Point to the element + 1 where we
> searched last and start a plain loop there. Particularly in an insert-only
> workload that'll always work, and in other cases it'll still often work I
> think.

radix_tree_node_128->isset is used to distinguish between a null pointer
in inner nodes and 0 in leaf nodes. So I guess we can have a flag to
indicate whether a node is a leaf or an inner node, so that we can
interpret (Datum) 0 as either a null pointer or 0. Or, if we define
different data types for inner and leaf nodes, we probably don't need it.
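
For illustration, a minimal sketch of such a header flag (field names are
invented here); whether a per-node flag alone can also replace the
per-slot isset bits for leaves is exactly the open question above:

#include <stdbool.h>
#include <stdint.h>

/*
 * A common node header records whether the slots hold values (leaf) or
 * child pointers (inner), telling readers how to interpret a zero slot.
 */
typedef struct rt_node_base
{
	uint8_t		kind;			/* fanout class: 4, 32, 128, or 256 */
	bool		is_leaf;		/* slots hold values if true, child pointers if false */
	uint16_t	count;			/* number of occupied slots */
} rt_node_base;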


> One thing I was wondering about is trying to choose node types in
> roughly-power-of-two struct sizes. It's pretty easy to end up with significant
> fragmentation in the slabs right now when inserting as you go, because some of
> the smaller node types will be freed but not enough to actually free blocks of
> memory. If we instead have ~power-of-two sizes we could just use a single slab
> of the max size, and carve out the smaller node types out of that largest
> allocation.

Do you mean we would manage memory allocation (and freeing) for smaller
node types ourselves?

How about using different block sizes for different node types?

>
> Btw, that fragmentation is another reason why I think it's better to track
> memory usage via memory contexts, rather than doing so based on
> GetMemoryChunkSpace().

Agreed.

>
>
> > > Ideally, node16 and node32 would have the same code with a different
> > > loop count (1 or 2). More generally, there is too much duplication of
> > > code (noted by Andres in his PoC), and there are many variable names
> > > with the node size embedded. This is a bit tricky to make more
> > > general, so we don't need to try it yet, but ideally we would have
> > > something similar to:
> > >
> > > switch (node->kind) // todo: inspect tagged pointer
> > > {
> > >   case RADIX_TREE_NODE_KIND_4:
> > >idx = node_search_eq(node, chunk, 4);
> > >do_action(node, idx, 4, ...);
> > >break;
> > >   case RADIX_TREE_NODE_KIND_32:
> > >idx = node_search_eq(node, chunk, 32);
> > >do_action(node, idx, 32, ...);
> > >   ...
> > > }
>
> FWIW, that should be doable with an inline function, if you pass it the memory
> to the "array" rather than the node directly. Not so sure it's a good idea to
> do dispatch between node types / search methods inside the helper, as you
> suggest below:
>
>
> > > static pg_alwaysinline void
> > > node_search_eq(radix_tree_node node, uint8 chunk, int16 node_fanout)
> > > {
> > > if (node_fanout <= SIMPLE_LOOP_THRESHOLD)
> > >   // do simple loop with (node_simple *) node;
> > > else if (node_fanout <= VECTORIZED_LOOP_THRESHOLD)
> > >   // do vectorized loop where available with (node_vec *) node;
> > > ...
> > > }

Yeah, it's worth trying at some point.
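
For illustration, a minimal sketch of the inline-helper idea, with the
caller passing the chunk array rather than the node (names invented here;
a vectorized variant could be selected per fanout as in the pseudocode):

#include <stdint.h>

/*
 * One helper serves every node type because it only sees the chunk
 * array and its occupied count.  This shows the plain loop body.
 */
static inline int
node_array_search_eq(const uint8_t *chunks, int count, uint8_t chunk)
{
	for (int i = 0; i < count; i++)
	{
		if (chunks[i] == chunk)
			return i;
	}
	return -1;					/* not found */
}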

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: doc phrase: "inheritance child"

2022-07-05 Thread Amit Langote
On Thu, Jun 30, 2022 at 6:55 PM Justin Pryzby  wrote:
> On Fri, May 27, 2022 at 03:22:38PM +0900, Amit Langote wrote:
> > On Wed, May 25, 2022 at 1:30 PM Ashutosh Bapat 
> >  wrote:
> > >
> > > -   If true, the stats include inheritance child columns, not just the
> > > +   If true, the stats include child tables, not just the
> > >
> > > We are replacing columns with tables; is that intentional?
> > >
> > > Partitioned tables do not have their own stats, it's just aggregated 
> > > partition stats.
> > > ...
> > > -   If true, the stats include inheritance child columns, not just the
> > > +   If true, the stats include child childs, not just the
> > > values in the specified relation
> > >
> > >   
> > > @@ -13152,7 +13152,7 @@ SELECT * FROM pg_locks pl LEFT JOIN 
> > > pg_prepared_xacts ppx
> > > inherited bool
> > >
> > >
> > > -   If true, this row includes inheritance child columns, not just the
> > > +   If true, this row includes child tables, not just the
> > > values in the specified table
> > >
> > >   
> > >
> > > Replacing inheritance child "column" with "tables", is that intentional?
> >
> > I was a bit confused by these too, though perhaps the original text is
> > not as clear as it could be?  Would the following be a good rewrite:
>
> I updated the language to say "values from".  Is this better?

Yes.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Patch proposal: New hooks in the connection path

2022-07-05 Thread Bharath Rupireddy
On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 7/2/22 1:00 AM, Nathan Bossart wrote:
> > Could we model this after fmgr_hook?  The first argument in that hook
> > indicates where it is being called from.  This doesn't alleviate the need
> > for several calls to the hook in the authentication logic, but extension
> > authors would only need to define one hook.
>
> I like the idea and indeed fmgr.h looks a good place to model it.
>
> Attached a new patch version doing so.

Thanks for the patch. Can we think of enhancing
ClientAuthentication_hook_type itself, i.e., make it a generic hook for
all sorts of authentication metrics, info, etc., with the event type
embedded in it, instead of adding a new FailedConnection_hook? We can
either add a new parameter for the "event" (the existing
ClientAuthentication_hook_type implementers will have problems), or
embed/multiplex the "event" info into the existing Port structure or a
status variable (macro or enum) (existing implementers will not have
compatibility problems).  IMO, this looks cleaner going forward.
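
To make the first option concrete, here is a sketch of what the extended
hook signature could look like (the enum and its values are invented here,
not an agreed design; the current in-core signature takes only a Port *
and an int status):

/*
 * Hypothetical event enum; the patch's own FCET_* values (visible in the
 * snippet below) suggest a similar shape.
 */
typedef enum ConnectionEventType
{
	CET_AUTH_SUCCESS,
	CET_AUTH_FAILURE,
	CET_BAD_STARTUP_PACKET,
	CET_STARTUP_PACKET_TIMEOUT,
	CET_AUTH_TIMEOUT
} ConnectionEventType;

/* the existing two-argument hook, extended with an explicit event */
typedef void (*ClientAuthentication_hook_type) (Port *port, int status,
												ConnectionEventType event);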

On the v2 patch:

1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h?

2. StartupPacketTimeoutHandler is a signal handler, called as part of
SIGALRM handling. Signal handlers generally ought to do only small
amounts of work, but here we are handing control off to a hook, which
can do arbitrarily long-running work (writing to remote storage, to a
file, aggregating, etc.). I don't think that's the right thing to do
here.
 static void
 StartupPacketTimeoutHandler(void)
 {
+	if (FailedConnection_hook)
+		(*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
+	ereport(COMMERROR,
+			(errcode(ERRCODE_PROTOCOL_VIOLATION),
+			 errmsg("timeout while processing startup packet")));

3. On "not letting these hooks (ClientAuthentication_hook_type or
FailedConnection_hook_type) expose sensitive info via the Port
structure": the Port structure does contain sensitive info like HbaLine,
host address, username, etc., but that's what it is, so I think we are
okay with the structure as-is.

Regards,
Bharath Rupireddy.




Re: typos

2022-07-05 Thread Noah Misch
On Thu, Apr 14, 2022 at 08:56:22AM +1200, David Rowley wrote:
> 0007: Not pushed. No space after comment and closing */  pgindent
> fixed one of these but not the other 2.  I've not looked into why
> pgindent does 1 and not the other 2.

> -/* get operation priority  by its code*/
> +/* get operation priority by its code */

pgindent never touches comments that start in column zero.  (That's why many
column-0 comments are wrapped to widths other than the standard 78.)




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-05 16:33:17 +0900, Masahiko Sawada wrote:
> On Tue, Jul 5, 2022 at 6:18 AM Andres Freund  wrote:
> A Datum value is convenient for representing both a pointer and a value,
> so I used it to avoid defining node types for inner and leaf nodes
> separately.

I'm not convinced that's a good goal. I think we're going to want to have
different key and value types, and trying to unify leaf and inner nodes is
going to make that impossible.

Consider e.g. using it for something like a buffer mapping table - your key
might be way too wide to fit it sensibly into 64bit.


> Since a Datum could be 4 bytes or 8 bytes depending on the platform, it
> might not be good for some platforms.

Right - that's another good reason why it's problematic. A lot of key types
aren't going to be 4/8 bytes dependent on 32/64bit, but either / or.


> > > +void
> > > +radix_tree_insert(radix_tree *tree, uint64 key, Datum val, bool *found_p)
> > > +{
> > > +	int			shift;
> > > +	bool		replaced;
> > > +	radix_tree_node *node;
> > > +	radix_tree_node *parent = tree->root;
> > > +
> > > +	/* Empty tree, create the root */
> > > +	if (!tree->root)
> > > +		radix_tree_new_root(tree, key, val);
> > > +
> > > +	/* Extend the tree if necessary */
> > > +	if (key > tree->max_val)
> > > +		radix_tree_extend(tree, key);
> >
> > FWIW, the reason I used separate functions for these in the prototype is
> > that it turns out to generate a lot better code, because it allows
> > non-inlined function calls to be sibling calls - thereby avoiding the
> > need for a dedicated stack frame. That's not possible once you need a
> > palloc or such, so splitting off those call paths into dedicated
> > functions is useful.
> 
> Thank you for the info. How much does using sibling call optimization
> help the performance in this case? I think that these two cases are
> used only a limited number of times: inserting the first key and
> extending the tree.

It's not that it helps in the cases moved into separate functions - it's that
not having that code in the "normal" paths keeps the normal path faster.
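
For illustration, a sketch of that pattern reusing the fields visible in
the quoted snippet (function names are invented here; pg_noinline and
unlikely() are existing PostgreSQL macros, and the out-of-line bodies are
elided):

static pg_noinline void rt_insert_into_empty_tree(radix_tree *tree,
												  uint64 key, Datum val);
static pg_noinline void rt_extend_and_insert(radix_tree *tree,
											 uint64 key, Datum val);

void
radix_tree_insert_sketch(radix_tree *tree, uint64 key, Datum val)
{
	if (unlikely(tree->root == NULL))
	{
		rt_insert_into_empty_tree(tree, key, val);	/* can be a sibling call */
		return;
	}
	if (unlikely(key > tree->max_val))
	{
		rt_extend_and_insert(tree, key, val);	/* can be a sibling call */
		return;
	}
	/* hot path: descend and insert without any allocation */
}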

Greetings,

Andres Freund




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-05 16:33:29 +0900, Masahiko Sawada wrote:
> > One thing I was wondering about is trying to choose node types in
> > roughly-power-of-two struct sizes. It's pretty easy to end up with 
> > significant
> > fragmentation in the slabs right now when inserting as you go, because some 
> > of
> > the smaller node types will be freed but not enough to actually free blocks 
> > of
> > memory. If we instead have ~power-of-two sizes we could just use a single 
> > slab
> > of the max size, and carve out the smaller node types out of that largest
> > allocation.
> 
> Do you mean we would manage memory allocation (and freeing) for smaller
> node types ourselves?

For all of them, basically. Using a single slab allocator and then
subdividing the "common block size" into however many chunks of a given
node type fit into it.
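
A minimal standalone sketch of the carving step under that scheme (the
chunk size, names, and freelist layout are assumptions, not from the
thread):

#include <stddef.h>

/*
 * Every slab chunk is the size of the largest node; smaller node kinds
 * are carved out of a chunk onto per-size freelists, so freed memory
 * can be recycled for any node kind.
 */
#define RT_CHUNK_SIZE 2048		/* assumed: size of the largest node type */

typedef struct rt_free_node
{
	struct rt_free_node *next;
} rt_free_node;

/* carve one max-size chunk into as many node_size pieces as fit */
static void
rt_carve_chunk(char *chunk, size_t node_size, rt_free_node **freelist)
{
	size_t		off;

	for (off = 0; off + node_size <= RT_CHUNK_SIZE; off += node_size)
	{
		rt_free_node *n = (rt_free_node *) (chunk + off);

		n->next = *freelist;
		*freelist = n;
	}
}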

> How about using different block sizes for different node types?

Not following...


Greetings,

Andres Freund




Re: Handle infinite recursion in logical replication setup

2022-07-05 Thread Amit Kapila
On Mon, Jul 4, 2022 at 11:33 PM vignesh C  wrote:
>
> On Mon, Jul 4, 2022 at 7:44 PM Alvaro Herrera  wrote:
> >
> > > From d8f8844f877806527b6f3f45320b6ba55a8e3154 Mon Sep 17 00:00:00 2001
> > > From: Vigneshwaran C 
> > > Date: Thu, 26 May 2022 19:29:33 +0530
> > > Subject: [PATCH v27 1/4] Add a missing test to verify only-local 
> > > parameter in
> > >  test_decoding plugin.
> > >
> > > Add a missing test to verify only-local parameter in test_decoding plugin.
> >
> > I don't get it.  replorigin.sql already has some lines to test
> > local-only.  What is your patch adding that is new?  Maybe instead of
> > adding some more lines at the end of the script, you should add lines
> > where this stuff is already being tested.  But that assumes that there
> > is something new that is being tested; if so what is it?
>
> The test is to check that remote origin data (i.e., with a replication
> origin being set) will be filtered when the only-local parameter is set.
> I felt that this scenario is not covered, unless I'm missing something.
>

If we change only-local option to '0' in the below part of the test,
we can see a different result:
-- and magically the replayed xact will be filtered!
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'only-local',
'1');

So, I think only-local functionality is being tested but I feel your
test is clear in the sense that it clearly shows that remote data can
only be fetched with 'only-local' as '0' when the origin is set. It
seems to me that in existing tests, we are not testing the combination
where the origin is set and 'only-local' is '0'. So, there is some
value to having the test you are proposing but if Alvaro, you, or
others don't think it is worth adding a new test for it then we can
probably drop this one.

-- 
With Regards,
Amit Kapila.




Wrong provolatile value for to_timestamp (1 argument)

2022-07-05 Thread Tatsuo Ishii
I found that provolatile attribute of to_timestamp in pg_proc is
wrong:

test=# select provolatile, proargtypes from pg_proc where proname = 
'to_timestamp' and proargtypes[0] = 701;
 provolatile | proargtypes
-------------+-------------
 i           | 701
(1 row)

'i' (immutable) is clearly wrong since the function's return value can
be changed depending on the time zone settings.

Actually the manual says functions depending on time zone settings
should be labeled STABLE.

https://www.postgresql.org/docs/14/xfunc-volatility.html

"A common error is to label a function IMMUTABLE when its results
depend on a configuration parameter. For example, a function that
manipulates timestamps might well have results that depend on the
TimeZone setting. For safety, such functions should be labeled STABLE
instead."

It's interesting that the two-argument form of to_timestamp has the
correct provolatile value ('s', stable) in pg_proc.

Do we want to fix this for PG16? I think it's too late for 15.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[PATCH] Optional OR REPLACE in CREATE OPERATOR statement

2022-07-05 Thread Svetlana Derevyanko

 
Hello hackers,
 
It seems useful to have an [OR REPLACE] option in the CREATE OPERATOR
statement, as in CREATE FUNCTION. This option may be good for writing
extension update scripts, to avoid errors when re-creating the same
operator.

Because of cached query plans, only the RESTRICT and JOIN options can be
changed for an existing operator, as in the ALTER OPERATOR statement.
(discussed here:  
https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell )
 
The attached patch will be proposed for September CF.
 
Best regards,
--
Svetlana Derevyanko
Postgres Professional:  http://www.postgrespro.com
The Russian Postgres Company

From 7398a8a14f29a6bbb59117a3e3059231f0b476d9 Mon Sep 17 00:00:00 2001
From: Svetlana Derevyanko 
Date: Tue, 14 Jun 2022 08:40:51 +0300
Subject: [PATCH v1] Add optional [OR REPLACE] in CREATE OPERATOR statement.

As in ALTER OPERATOR, only restrict and join params can be modified,
because of the cached query plans.
---
 doc/src/sgml/ref/create_operator.sgml |  11 +-
 src/backend/catalog/pg_operator.c | 121 +++---
 src/backend/commands/operatorcmds.c   |   6 +-
 src/backend/parser/gram.y |  12 ++
 src/backend/tcop/utility.c|   3 +-
 src/include/catalog/pg_operator.h |   3 +-
 src/include/commands/defrem.h |   2 +-
 src/test/regress/expected/create_operator.out |  94 ++
 src/test/regress/sql/create_operator.sql  | 101 +++
 9 files changed, 325 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/create_operator.sgml b/doc/src/sgml/ref/create_operator.sgml
index e27512ff39..0e03108876 100644
--- a/doc/src/sgml/ref/create_operator.sgml
+++ b/doc/src/sgml/ref/create_operator.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE OPERATOR name (
+CREATE [ OR REPLACE ] OPERATOR name (
 {FUNCTION|PROCEDURE} = function_name
 [, LEFTARG = left_type ] [, RIGHTARG = right_type ]
 [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
@@ -36,7 +36,8 @@ CREATE OPERATOR name (
 
   
CREATE OPERATOR defines a new operator,
-   name.  The user who
+   name.  CREATE OR REPLACE OPERATOR
+   will either create a new operator, or replace an existing definition.  The user who
defines an operator becomes its owner.  If a schema name is given
then the operator is created in the specified schema.  Otherwise it
is created in the current schema.
@@ -114,6 +115,12 @@ CREATE OPERATOR name (
as EXECUTE privilege on the underlying function.  If a
commutator or negator operator is specified, you must own these operators.
   
+
+  
+   When CREATE OR REPLACE OPERATOR is used to replace an
+   existing operator, only res_proc
+   and join_proc can be changed.
+  
  
 
  
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 630bf3e56c..eca235f735 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -303,6 +303,7 @@ OperatorShellMake(const char *operatorName,
  *		joinId	X join selectivity procedure ID
 *		canMerge	merge join can be used with this operator
  *		canHash	hash join can be used with this operator
+ *		replace	replace operator if exists
  *
  * The caller should have validated properties and permissions for the
  * objects passed as OID references.  We must handle the commutator and
@@ -334,7 +335,8 @@ OperatorCreate(const char *operatorName,
 			   Oid restrictionId,
 			   Oid joinId,
 			   bool canMerge,
-			   bool canHash)
+			   bool canHash,
+			   bool replace)
 {
 	Relation	pg_operator_desc;
 	HeapTuple	tup;
@@ -415,12 +417,18 @@ OperatorCreate(const char *operatorName,
    rightTypeId,
    &operatorAlreadyDefined);
 
-	if (operatorAlreadyDefined)
+	if (operatorAlreadyDefined && !replace)
 		ereport(ERROR,
 (errcode(ERRCODE_DUPLICATE_FUNCTION),
  errmsg("operator %s already exists",
 		operatorName)));
 
+	/*
+	 * No such operator yet, so CREATE OR REPLACE is equivalent to CREATE
+	 */
+	if (!OidIsValid(operatorObjectId) && replace)
+		replace = false;
+
 	/*
 	 * At this point, if operatorObjectId is not InvalidOid then we are
 	 * filling in a previously-created shell.  Insist that the user own any
@@ -431,6 +439,59 @@ OperatorCreate(const char *operatorName,
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_OPERATOR,
 	   operatorName);
 
+	/*
+	 * When an operator is updated, params other than RESTRICT and JOIN
+	 * should remain the same.
+	 */
+	if (replace)
+	{
+		Form_pg_operator oprForm;
+
+		tup = SearchSysCache4(OPERNAMENSP,
+			  PointerGetDatum(operatorName),
+			  ObjectIdGetDatum(leftTypeId),
+			  ObjectIdGetDatum(rightTypeId),
+			  ObjectIdGetDatum(operatorNamespace));
+		oprForm = (Form_pg_operator) GETSTRUCT(tup);
+
+		if (oprForm->oprcanmerge != canMerge)
+			ereport(ERROR,
+	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+	errmsg("operator attribute \"merges\" cannot 

Re: logical replication restrictions

2022-07-05 Thread Peter Smith
Here are some review comments for your v4-0001 patch. I hope they are
useful for you.

==

1. General

This thread name "logical replication restrictions" seems quite
unrelated to the patch here. Maybe it's better to start a new thread
otherwise nobody is going to recognise what this thread is really
about.

==

2. Commit message

Similar to physical replication, a time-delayed copy of the data for
logical replication is useful for some scenarios (specially to fix
errors that might cause data loss).

"specially" -> "particularly" ?

~~~

3. Commit message

Maybe take some examples from the regression tests to show usage of
the new parameter

==

4. doc/src/sgml/catalogs.sgml

+ 
+  
+   subapplydelay int8
+  
+  
+   Delay the application of changes by a specified amount of time.
+  
+ 

I think this should say that the units are ms.

==

5. doc/src/sgml/ref/create_subscription.sgml

+   
+min_apply_delay (integer)
+

Is the "integer" type here correct? It might eventually be stored as
an integer, but IIUC (going by the tests) from the user point-of-view
this parameter is really "text" type for representing ms or interval,
right?

~~~

6. doc/src/sgml/ref/create_subscription.sgml

 Similar
+  to the physical replication feature
+  (), it may be useful to
+  have a time-delayed copy of data for logical replication.

SUGGESTION
As with the physical replication feature (recovery_min_apply_delay),
it can be useful for logical replication to delay the data
replication.

~~~

7. doc/src/sgml/ref/create_subscription.sgml

Delays in logical
+  decoding and in transfer the transaction may reduce the actual wait
+  time.

SUGGESTION
Time spent in logical decoding and in transferring the transaction may
reduce the actual wait time.

~~~

8. doc/src/sgml/ref/create_subscription.sgml

If the system clocks on publisher and subscriber are not
+  synchronized, this may lead to apply changes earlier than expected.

Why just say "earlier than expected"? If the publisher's time is ahead
of the subscriber then the changes might also be *later* than
expected, right? So, perhaps it is better to just say "other than
expected".

~~~

9. doc/src/sgml/ref/create_subscription.sgml

Should there also be a big warning box about the impact if using
synchronous_commit (like the other streaming replication page has this
warning)?

~~~

10. doc/src/sgml/ref/create_subscription.sgml

I think there should be some examples somewhere showing how to specify
this parameter. Maybe they are better added somewhere in "31.2
Subscription" and xrefed from here.

==

11. src/backend/commands/subscriptioncmds.c - parse_subscription_options

I think there should be a default assignment to 0 (done where all the
other supported option defaults are set)

~~~

12. src/backend/commands/subscriptioncmds.c - parse_subscription_options

+ if (opts->min_apply_delay < 0)
+ ereport(ERROR,
+ errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("option \"%s\" must not be negative", "min_apply_delay"));
+

I thought this check only needs to be done within the scope of the
preceding if - (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
strcmp(defel->defname, "min_apply_delay") == 0)

==

13. src/backend/commands/subscriptioncmds.c - AlterSubscription

@@ -1093,6 +1126,17 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
  if (opts.enabled)
  ApplyLauncherWakeupAtCommit();

+ /*
+ * If this subscription has been disabled and it has an apply
+ * delay set, wake up the logical replication worker to finish
+ * it as soon as possible.
+ */
+ if (!opts.enabled && sub->applydelay > 0)

I did not really understand the logic of why min_apply_delay should
override enabled=false. It is called a *minimum* delay, so if it ends up
being way over the parameter value (because the subscription is
disabled), why does that matter?

==

14. src/backend/replication/logical/worker.c

@@ -252,6 +252,7 @@ WalReceiverConn *LogRepWorkerWalRcvConn = NULL;

 Subscription *MySubscription = NULL;
 static bool MySubscriptionValid = false;
+TimestampTz MySubscriptionMinApplyDelayUntil = 0;

Looking at the only usage of this variable (in apply_delay) and how it
is used, I did not see why this cannot just be a local variable of the
apply_delay function.

~~~

15. src/backend/replication/logical/worker.c - apply_delay

+/*
+ * Apply the informed delay for the transaction.
+ *
+ * A regular transaction uses the commit time to calculate the delay. A
+ * prepared transaction uses the prepare time to calculate the delay.
+ */
+static void
+apply_delay(TimestampTz ts)

I didn't think it needs to mention here about the different kinds of
transactions because where it comes from has nothing really to do with
this function's logic.

~~~

16. src/backend/replication/logical/worker.c - apply_delay

Refer to comment #14 about MySubscriptionMinApplyDelayUntil.

~~~

17. s

Re: Wrong provolatile value for to_timestamp (1 argument)

2022-07-05 Thread Laurenz Albe
On Tue, 2022-07-05 at 17:29 +0900, Tatsuo Ishii wrote:
> I found that provolatile attribute of to_timestamp in pg_proc is
> wrong:
> 
> test=# select provolatile, proargtypes from pg_proc where proname = 
> 'to_timestamp' and proargtypes[0] = 701;
>  provolatile | proargtypes 
> -+-
>  i   | 701
> (1 row)
> 
> 'i' (immutable) is clearly wrong s

Are you sure?  I'd say that "to_timestamp(double precision)" always
produces the same timestamp for the same argument.  What changes with
the setting of "timezone" is how that timestamp is converted to a
string, but that's a different affair.

Yours,
Laurenz Albe




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-05 Thread John Naylor
On Mon, Jul 4, 2022 at 12:07 PM Masahiko Sawada  wrote:

> > Looking at the node stats, and then your benchmark code, I think key
> > construction is a major influence, maybe more than node type. The
> > key/value scheme tested now makes sense:
> >
> > blockhi || blocklo || 9 bits of item offset
> >
> > (with the leaf nodes containing a bit map of the lowest few bits of
> > this whole thing)
> >
> > We want the lower fanout nodes at the top of the tree and higher
> > fanout ones at the bottom.
>
> So more inner nodes can fit in CPU cache, right?

My thinking is, on average, there will be more dense space utilization
in the leaf bitmaps, and fewer inner nodes. I'm not quite sure about
cache, since with my idea a search might have to visit more nodes to
get the common negative result (indexed tid not found in vacuum's
list).

> > Note some consequences: If the table has enough columns such that much
> > fewer than 100 tuples fit on a page (maybe 30 or 40), then in the
> > dense case the nodes above the leaves will have lower fanout (maybe
> > they will fit in a node32). Also, the bitmap values in the leaves will
> > be more empty. In other words, many tables in the wild *resemble* the
> > sparse case a bit, even if truly all tuples on the page are dead.
> >
> > Note also that the dense case in the benchmark above has ~4500 times
> > more keys than the sparse case, and uses about ~1000 times more
> > memory. But the runtime is only 2-3 times longer. That's interesting
> > to me.
> >
> > To optimize for the sparse case, it seems to me that the key/value would be
> >
> > blockhi || 9 bits of item offset || blocklo
> >
> > I believe that would make the leaf nodes more dense, with fewer inner
> > nodes, and could drastically speed up the sparse case, and maybe many
> > realistic dense cases.
>
> Does it have an effect on the number of inner nodes?
>
> >  I'm curious to hear your thoughts.
>
> Thank you for your analysis. It's worth trying. We use 9 bits for item
> offset but most pages don't use all bits in practice. So probably it
> might be better to move the most significant bit of item offset to the
> left of blockhi. Or more simply:
>
> 9 bits of item offset || blockhi || blocklo

A concern here is most tids won't use many bits in blockhi either,
most often far fewer, so this would make the tree higher, I think.
Each value of blockhi represents 0.5GB of heap (32TB max). Even with
very large tables I'm guessing most pages of interest to vacuum are
concentrated in a few of these 0.5GB "segments".

And it's possible path compression would change the tradeoffs here.
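
For reference, a standalone sketch of the two key layouts discussed in
this subthread (bit widths are inferred from the thread: 0.5 GB per
blockhi value with 8 KB pages implies a 16-bit blocklo; function names
are invented here):

#include <stdint.h>

static inline uint64_t
key_dense(uint32_t block, uint16_t offset)
{
	/* blockhi || blocklo || 9 bits of item offset */
	return ((uint64_t) block << 9) | (offset & 0x1FF);
}

static inline uint64_t
key_sparse(uint32_t block, uint16_t offset)
{
	/* blockhi || 9 bits of item offset || blocklo */
	uint64_t	blockhi = block >> 16;	/* each value spans a 0.5 GB segment */
	uint64_t	blocklo = block & 0xFFFF;

	return (blockhi << 25) | ((uint64_t) (offset & 0x1FF) << 16) | blocklo;
}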

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-05 Thread Yuya Watari
Dear Tom,

Thank you for replying to my email.

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane  wrote:
> I'm not real thrilled with trying to throw hashtables at the problem,
> though.  As David noted, they'd be counterproductive for simple
> queries.

As you said, my approach that utilizes hash tables has some overheads,
leading to degradation in query planning time.

I tested the degradation by a brief experiment. In this experiment, I
used a simple query shown below.

===
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
===

Here, students, scores, and gpas tables have no partitions, i.e., they
are regular tables. Therefore, my techniques do not work for this
query and instead may lead to some regression. I repeatedly issued
this query 1 million times and measured their planning times.

The attached figure describes the distribution of the planning times.
The figure indicates that my patch has no severe negative impacts on
the planning performance. However, there seems to be a slight
degradation.

I show the mean and median of planning times below. With my patch, the
planning time became 0.002-0.004 milliseconds slower. We have to deal
with this problem, but reducing time complexity while keeping
degradation to zero is significantly challenging.

Planning time (ms)
             |  Mean | Median
-------------+-------+-------
 Master      | 0.682 |  0.674
 Patched     | 0.686 |  0.676
-------------+-------+-------
 Degradation | 0.004 |  0.002

Of course, the attached result is just an example. Significant
regression might occur in other types of queries.

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?
>
> Perhaps the bms_is_subset class could be handled in a similar
> way, ie do a one-time pass to make a List of all EquivalenceMembers
> that use a RelOptInfo.

Thank you for giving your idea. I will try to polish up my algorithm
based on your suggestion.

-- 
Best regards,
Yuya Watari


Re: [PATCH] Add result_types column to pg_prepared_statements view

2022-07-05 Thread Peter Eisentraut

On 05.07.22 09:31, Dagfinn Ilmari Mannsåker wrote:
> On Tue, 5 Jul 2022, at 06:34, Peter Eisentraut wrote:
>> On 01.07.22 14:27, Dagfinn Ilmari Mannsåker wrote:
>>> Peter Eisentraut  writes:
>>>> On 19.05.22 17:34, Dagfinn Ilmari Mannsåker wrote:
>>>>> Prompted by a question on IRC, here's a patch to add a result_types
>>>>> column to the pg_prepared_statements view, so that one can see the
>>>>> types of the columns returned by a prepared statement, not just the
>>>>> parameter types.
>>>>> I'm not quite sure about the column name, suggestions welcome.
>>>> I think this patch is sensible.
>>> The arguments about client-side type-specific value handling for
>>> RowDescription don't apply here IMO, since this view is more
>>> user-facing.
>> I agree.  It's also easy to change if needed.  Committed as is.
> Thanks!

There was a problem that we didn't cover: not all prepared statements
have result descriptors (e.g., DML statements), so that would crash as
written.  I have changed it to return null for result_types in that
case, and added a test case.





Re: [PATCH] Add result_types column to pg_prepared_statements view

2022-07-05 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> There was a problem that we didn't cover: Not all prepared statements
> have result descriptors (e.g., DML statements), so that would crash as 
> written. 

D'oh!

> I have changed it to return null for result_types in that case, and
> added a test case.

Thanks for spotting and fixing that.

- ilmari




Re: Wrong provolatile value for to_timestamp (1 argument)

2022-07-05 Thread Tatsuo Ishii
> Are you sure?  I'd say that "to_timestamp(double precision)" always
> produces the same timestamp for the same argument.  What changes with
> the setting of "timezone" is how that timestamp is converted to a
> string, but that's a different affair.

Of course the internal representation of the timestamp with time zone
data type is not affected by the time zone setting. But then why is the
other form of to_timestamp labeled stable? If your theory is correct,
shouldn't the other form of to_timestamp be labeled immutable as well?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Schema variables - new implementation for Postgres 15

2022-07-05 Thread Julien Rouhaud
Hi Pavel,

On Tue, Jul 05, 2022 at 08:42:09AM +0200, Pavel Stehule wrote:
> Hi
>
> fresh rebase + type check. Before returning any value, the related type
> is checked to see if it is still valid

Great news, thanks a lot for continuing to work on it!  I'm still on PTO
since last Friday, but I'm planning to start reviewing this patch as soon
as I come back.  It might take a while as my knowledge of this patch is a
bit blurry, but hopefully it shouldn't take too long.




Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-05 Thread Amit Kapila
On Mon, Jul 4, 2022 at 6:12 PM Amit Kapila  wrote:
>
> On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada  
> wrote:
> >
> > I've attached three POC patches:
> >
>
> I think it will be a good idea if you can add a short commit message
> at least to say which patch is proposed for HEAD and which one is for
> back branches. Also, it would be good if you can add some description
> of the fix in the commit message. Let's remove poc* from the patch
> name.
>
> Review poc_add_running_catchanges_xacts_to_serialized_snapshot
> =

Few more comments:
1.
+
+	/* This array must be sorted in xidComparator order */
+	TransactionId *xip;
+ } catchanges;
 };

This array contains the transaction ids for subtransactions as well. I
think it is better to mention that in the comments.

2. Are we anytime removing transaction ids from catchanges->xip array?
If not, is there a reason for the same? I think we can remove it
either at commit/abort or even immediately after adding the xid/subxid
to committed->xip array.

3.
+	if (readBytes != sz)
+	{
+		int			save_errno = errno;
+
+		CloseTransientFile(fd);
+
+		if (readBytes < 0)
+		{
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m", path)));
+		}
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("could not read file \"%s\": read %d of %zu",
+							path, readBytes, sz)));
+	}

This is the fourth instance of similar error handling code in
SnapBuildRestore(). Isn't it better to extract this into a separate
function?
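
For illustration, one possible shape for such a helper (the name is
invented here; it reuses the error-reporting calls from the quoted code):

static void
snapbuild_read_failed(int fd, const char *path, int readBytes, size_t sz)
{
	int			save_errno = errno;

	CloseTransientFile(fd);

	if (readBytes < 0)
	{
		errno = save_errno;
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", path)));
	}
	else
		ereport(ERROR,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("could not read file \"%s\": read %d of %zu",
						path, readBytes, sz)));
}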

4.
+TransactionId *
+ReorderBufferGetCatalogChangesXacts(ReorderBuffer *rb, size_t *xcnt_p)
+{
+	HASH_SEQ_STATUS hash_seq;
+	ReorderBufferTXNByIdEnt *ent;
+	TransactionId *xids;
+	size_t		xcnt = 0;
+	size_t		xcnt_space = 64;	/* arbitrary number */
+
+	xids = (TransactionId *) palloc(sizeof(TransactionId) * xcnt_space);
+
+	hash_seq_init(&hash_seq, rb->by_txn);
+	while ((ent = hash_seq_search(&hash_seq)) != NULL)
+	{
+		ReorderBufferTXN *txn = ent->txn;
+
+		if (!rbtxn_has_catalog_changes(txn))
+			continue;

It would be better to allocate memory the first time we have to store
xids. There is a good chance that, much of the time, this function will
do just the palloc without having to store any xid.

5. Do you think we should do some performance testing for a mixed
ddl/dml workload to see if it adds any overhead in decoding due to the
additional work done during serialize/restore? I don't think it should
add any meaningful overhead, but OTOH there is no harm in doing some
testing of the same.

-- 
With Regards,
Amit Kapila.




Re: Making Vars outer-join aware

2022-07-05 Thread Richard Guo
On Sat, Jul 2, 2022 at 12:42 AM Tom Lane  wrote:

>
> Anyway, even though this is far from done, I'm pretty pleased with
> the results so far, so I thought I'd put it out for review by
> anyone who cares to take a look.  I'll add it to the September CF
> in hopes that it might be more or less finished by then, and so
> that the cfbot will check it out.
>

Thanks for the work! I have a question about qual clause placement.

For the query in the example

SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)

(foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2
scan level. Previously we do that with check_outerjoin_delay() by
scanning all the outer joins below and check if the qual references any
nullable rels of the OJ, and if so include the OJ's rels into the qual.
So as a result we'd get that foo(t2.z) is referencing t1 and t2, and
we'd put the qual into the join lists of t1 and t2.

Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT
JOIN below (with RTI 3). So we consider the qual is referencing RTE 2
(which is t2) and RTE 3 (which is the OJ). Do we still need to include
RTE 1, i.e. t1 into the qual's required relids? How should we do that?

Thanks
Richard


Re: Schema variables - new implementation for Postgres 15

2022-07-05 Thread Pavel Stehule
On Tue, Jul 5, 2022 at 12:50, Julien Rouhaud wrote:

> Hi Pavel,
>
> On Tue, Jul 05, 2022 at 08:42:09AM +0200, Pavel Stehule wrote:
> > Hi
> >
> > fresh rebase + type check. Before returning any value, the related type
> > is checked to see if it is still valid
>
> Great news, thanks a lot for continuing to work on it!  I'm still on PTO
> since last Friday, but I'm planning to start reviewing this patch as soon
> as I come back.  It might take a while as my knowledge of this patch is a
> bit blurry, but hopefully it shouldn't take too long.
>

Thank you

Pavel


Re: CFM Manager

2022-07-05 Thread Ibrar Ahmed
On Tue, Jul 5, 2022 at 8:50 AM Michael Paquier  wrote:

> On Tue, Jul 05, 2022 at 08:17:26AM +0500, Ibrar Ahmed wrote:
> > If nobody has already volunteered for the next upcoming commitfest.
> > I'd like to volunteer. I think early to say is better as always.
>
> Jacob and Daniel have already volunteered.  Based on the number of
> patches at hand (305 in total), getting more help is always welcome, I
> guess.
> --
> Michael
>
I am happy to help, but I am talking about the next one.


-- 
Ibrar Ahmed


Re: pg15b2: large objects lost on upgrade

2022-07-05 Thread Shruthi Gowda
I was able to reproduce the issue. Also, the issue does not occur with
code from before the commit that preserves relfilenodes.
I tested your patch and it fixes the problem.
I am currently analyzing a few things related to the issue. I will come
back once my analysis is complete.

On Sat, Jul 2, 2022 at 9:19 PM Justin Pryzby  wrote:

> On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:
> > On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby 
> wrote:
> > > I noticed this during beta1, but dismissed the issue when it wasn't
> easily
> > > reproducible.  Now, I saw the same problem while upgrading from beta1
> to beta2,
> > > so couldn't dismiss it.  It turns out that LOs are lost if VACUUM FULL
> was run.
> >
> > Yikes. That's really bad, and I have no idea what might be causing it,
> > either. I'll plan to investigate this on Tuesday unless someone gets
> > to it before then.
>
> I suppose it's like Bruce said, here.
>
> https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us
>
> |One tricky case is pg_largeobject, which is copied from the old to new
> |cluster since it has user data.  To preserve that relfilenode, you would
> |need to have pg_upgrade perform cluster surgery in each database to
> |renumber its relfilenode to match since it is created by initdb.  I
> |can't think of a case where pg_upgrade already does something like that.
>
> Rather than setting the filenode of the next object as for user tables,
> pg_upgrade needs to UPDATE the relfilenode.
>
> This patch "works for me" but feel free to improve on it.
>


Re: doc: BRIN indexes and autosummarize

2022-07-05 Thread Alvaro Herrera
On 2022-Jul-04, Jaime Casanova wrote:

> I feel that somewhere in this paragraph it should be mentioned that is
> off by default.

OK, I added it.

On 2022-Jul-04, Justin Pryzby wrote:

> [ lots of comments ]

OK, I have adopted all your proposed changes, thanks for submitting in
both forms.  I did some more wordsmithing and pushed to branches 12 and
up.  11 fails 'make check', I think for lack of DocBook id tags, and I
didn't want to waste more time.  Kindly re-read the result and let me
know if I left something unaddressed, or made something worse.  The
updated text is already visible on the website:
https://www.postgresql.org/docs/devel/brin-intro.html

(Having almost-immediate doc refreshes is an enormous improvement.
Thanks Magnus.)

> Also, a reminder that this was never addressed (I wish the project had
> a way to keep track of known issues).
> 
> https://www.postgresql.org/message-id/20201113160007.gq30...@telsasoft.com
> |error_severity of brin work item

Yeah, I've not forgotten that item.  I can't promise I'll get it fixed
soon, but it's on my list.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: replacing role-level NOINHERIT with a grant-level option

2022-07-05 Thread Robert Haas
On Sun, Jul 3, 2022 at 1:17 PM Nathan Bossart  wrote:
> If by "bolder" you mean "mark [NO]INHERIT as deprecated-and-to-be-removed
> and begin emitting WARNINGs when it and WITH INHERIT DEFAULT are used," I
> think it's worth consideration.  I suspect it will be hard to sell removing
> [NO]INHERIT in v16 because it would introduce a compatibility break without
> giving users much time to migrate.  I could be wrong, though.

It's a fair point. But, if our goal for v16 is to do something that
could lead to an eventual deprecation of [NO]INHERIT, I still think
removing WITH INHERIT DEFAULT from the patch set is probably a good
idea. Perhaps then we could document that we recommend using the
grant-level option rather than setting NOINHERIT on the role, and if
users were to follow that advice, eventually no one would be relying
on the role-level property anymore. It might be optimistic to assume
that users will read the documentation, let alone follow it, but it's
something. Now, that does mean accepting a compatibility break now, in
that flipping the role-level setting would no longer affect existing
grants, but it's less of a compatibility break than I proposed
originally, so maybe it's acceptable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: logical replication restrictions

2022-07-05 Thread Amit Kapila
On Tue, Jul 5, 2022 at 2:12 PM Peter Smith  wrote:
>
> Here are some review comments for your v4-0001 patch. I hope they are
> useful for you.
>
> ==
>
> 1. General
>
> This thread name "logical replication restrictions" seems quite
> unrelated to the patch here. Maybe it's better to start a new thread
> otherwise nobody is going to recognise what this thread is really
> about.
>

+1.

>
> 17. src/backend/replication/logical/worker.c - apply_handle_stream_prepare
>
> @@ -1090,6 +1146,19 @@ apply_handle_stream_prepare(StringInfo s)
>
>   elog(DEBUG1, "received prepare for streamed transaction %u",
> prepare_data.xid);
>
> + /*
> + * Should we delay the current prepared transaction?
> + *
> + * Although the delay is applied in BEGIN PREPARE messages, streamed
> + * prepared transactions apply the delay in a STREAM PREPARE message.
> + * That's ok because no changes have been applied yet
> + * (apply_spooled_messages() will do it).
> + * The STREAM START message does not contain a prepare time (it will be
> + * available when the in-progress prepared transaction finishes), hence, it
> + * was not possible to apply a delay at that time.
> + */
> + apply_delay(prepare_data.prepare_time);
> +
>
> It seems to rely on the spooling happening at the end. But won't this
> cause a problem later when/if the "parallel apply" patch [1] is pushed
> and the stream bgworkers are doing stuff on the fly instead of
> spooling at the end?
>

I wonder why we don't apply the delay only on commit/commit_prepared
records, similar to physical replication. See recoveryApplyDelay.
One more advantage would be that we then don't need to worry about
transactions that we are going to skip due to the SKIP feature for
subscribers.

One more thing that might be worth discussing is whether introducing a
new subscription parameter for this feature is a better idea or can we
use guc (either an existing or a new one). Users may want to set this
only for a particular subscription or set of subscriptions in which
case it is better to have this as a subscription level parameter.
OTOH, I was slightly worried that if this will be used for all
subscriptions on a subscriber then it will be burdensome for users.

-- 
With Regards,
Amit Kapila.




Re: Using PQexecQuery in pipeline mode produces unexpected Close messages

2022-07-05 Thread Alvaro Herrera
I have pushed this to all three branches.  Thanks!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there.  Did
somebody make percentages logarithmic while I wasn't looking?"
http://smylers.hates-software.com/2005/09/08/1995c749.html




Re: Support load balancing in libpq

2022-07-05 Thread Bharath Rupireddy
On Fri, Jun 10, 2022 at 10:01 PM Jelte Fennema wrote:
>
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
>
> Option 1 has been supported by JDBC (Java) for 8 years and Npgsql (C#)
> merged support about a year ago. This patch adds the same functionality
> to libpq. The way it's implemented is the same as the implementation of
> JDBC, and contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
> one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
> before trying to connect to them one-by-one.
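
For reference, the shuffling described above is essentially a
Fisher-Yates shuffle; a minimal standalone sketch (the function name is
invented here, and libpq's actual random source is not shown in this
thread):

#include <stdlib.h>

/* randomly permute the list of candidate hosts in place */
static void
shuffle_hosts(char **hosts, int nhosts)
{
	int			i;

	for (i = nhosts - 1; i > 0; i--)
	{
		int			j = rand() % (i + 1);
		char	   *tmp = hosts[i];

		hosts[i] = hosts[j];
		hosts[j] = tmp;
	}
}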

Thanks for the patch. +1 for the general idea of redirecting connections.

I'm quoting a previous attempt by Satyanarayana Narlapuram on this
topic [1], it also has a patch set.

IMO, rebalancing of the load must be based on parameters (as also
suggested by Aleksander Alekseev in this thread) such as the
read-only/write queries, CPU/IO/Memory utilization of the
primary/standby, network distance etc. We may not have to go the extra
mile to determine all of these parameters dynamically during query
authentication time, but we can let users provide a list of standby
hosts based on "some" priority (Satya's thread [1] attempts to do
this, in a way, with users specifying the hosts via pg_hba.conf file).
If required, randomization in choosing the hosts can be optional.

Also, IMO, the solution must have a fallback mechanism if the
standby/chosen host isn't reachable.

Few thoughts on the patch:
1) How are we determining if the submitted query is read-only or write?
2) What happens for explicit transactions? The queries related to the
same txn get executed on the same host, right? How are we guaranteeing
this?
3) Isn't it good to provide a way to test the patch?

[1] 
https://www.postgresql.org/message-id/flat/CY1PR21MB00246DE1F9E9C58455A78A37915C0%40CY1PR21MB0024.namprd21.prod.outlook.com

Regards,
Bharath Rupireddy.




Re: Temporary file access API

2022-07-05 Thread Antonin Houska
Peter Eisentraut  wrote:

> On 04.07.22 18:35, Antonin Houska wrote:
> >> Attached is a new version of the patch, to evaluate what the API use in the
> >> backend could look like. I haven't touched places where the file is 
> >> accessed
> >> in a non-trivial way, e.g. lseek() / fseek() or pg_pwrite() / pg_pread() is
> >> called.
> > Rebased patch set is attached here, which applies to the current master.
> > (A few more opportunities for the new API implemented here.)
> 
> I don't understand what this patch set is supposed to do.  AFAICT, the thread
> originally forked off from a TDE discussion and, considering the thread
> subject, was possibly once an attempt to refactor temporary file access to
> make integrating encryption easier?  The discussion then talked about things
> like saving on system calls and excising all OS-level file access API use,
> which I found confusing, and the thread then just became a general TDE-related
> mini-discussion.

Yes, it's an attempt to make the encryption less invasive, but there are a few
other objectives, at least: 1) to make the read / write operations "less
low-level" (there are common things like error handling which are often just
copy & pasted from other places), 2) to have buffered I/O with a configurable
buffer size (the current patch version still has a fixed buffer size, though).
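
To be concrete about the second objective, the rough shape meant here is a
stream object wrapping a file descriptor plus a buffer whose size is a
parameter. The names below are made up for illustration and are not the
patch's actual API:

typedef struct FileStream
{
    int         fd;
    char       *buffer;         /* buf_size bytes, configurable */
    Size        buf_size;
    Size        buf_used;
} FileStream;

extern FileStream *FileStreamOpen(const char *path, int flags, Size buf_size);
extern void FileStreamWrite(FileStream *stream, const void *data, Size len);
extern void FileStreamClose(FileStream *stream);

The write function would then do the common error handling (e.g. ereport() on
a short write) in one place instead of each call site rolling its own.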

It's true that the discussion ends up being encryption-specific; however, the
scope of the patch is broader. The first message of the thread references a
related discussion

https://www.postgresql.org/message-id/CA+TgmoYGjN_f=fcerx49bzjhng+gocty+a+xhnrwcvvdy8u...@mail.gmail.com

which is more important for this patch than the suggestions about encryption.

> The patches at hand extend some internal file access APIs and then sprinkle
> them around various places in the backend code, but there is no explanation
> why or how this is better. I don't see any real structural savings one might
> expect from a refactoring patch.  No information has been presented how this
> might help encryption.

At this stage I expected feedback from the developers who have already
contributed to the discussion, because I'm not sure myself if this version
fits their ideas - that's why I didn't elaborate on the documentation yet. I'll
try to summarize my understanding in the next version, but I'd appreciate it if
I got some feedback on the current version first.

> I also suspect that changing around the use of various file access APIs needs
> to be carefully evaluated in each individual case.  Various code has subtle
> expectations about atomic write behavior, syncing, flushing, error recovery,
> etc.  I don't know if this has been considered here.

I considered that, but could have messed up in some places. Right now I'm
aware of one problem: pgstat.c does not expect the file access API to raise
ERROR - this needs to be fixed.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-05 Thread Aleksander Alekseev
Hi Alexander,

> Thoughts and feedback are welcome.

I took some preliminary look at the patch. I'm going to need more time
to meditate on the proposed changes and to figure out the performance
impact.

So far I just wanted to let you know that the patch applied OK for me
and passed all the tests. The `else` branch seems to be redundant
here:

+if (!updated)
+{
+/* Should not encounter speculative tuple on recheck */
+Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
- ReleaseBuffer(buffer);
+ReleaseBuffer(buffer);
+}
+else
+{
+updated = false;
+}

Also I wish there were a little bit more comments since some of the
proposed changes are not that straightforward.

-- 
Best regards,
Aleksander Alekseev




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-05 Thread Aleksander Alekseev
Hi again,

> +if (!updated)
> +{
> +/* Should not encounter speculative tuple on recheck */
> +Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
> - ReleaseBuffer(buffer);
> +ReleaseBuffer(buffer);
> +}
> +else
> +{
> +updated = false;
> +}

OK, I got confused here. I suggest inverting the condition and swapping the
branches, i.e. changing the if (!updated) { .. } else { .. } code to
if (updated) { .. } else { .. }.
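
In other words, the suggested shape would be roughly (same bodies as in the
quoted hunk, just with the negation removed):

if (updated)
    updated = false;
else
{
    /* Should not encounter speculative tuple on recheck */
    Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
    ReleaseBuffer(buffer);
}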

-- 
Best regards,
Aleksander Alekseev




[PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-07-05 Thread Melih Mutlu
Hi hackers,

I created a patch to reuse tablesync workers and their replication slots
for more tables that are not synced yet, so that the overhead of creating and
dropping workers/replication slots can be reduced.

Current version of logical replication has two steps: tablesync and apply.
In tablesync step, apply worker creates a tablesync worker for each table
and those tablesync workers are killed when they're done with their
associated table. (the number of tablesync workers running at the same time
is limited by "max_sync_workers_per_subscription")
Each tablesync worker also creates a replication slot on publisher during
its lifetime and drops the slot before exiting.

The purpose of this patch is to get rid of the overhead of
creating/killing a new worker (and replication slot) for each table.
It aims to reuse tablesync workers and their replication slots so that
tablesync workers can copy multiple tables from publisher to subscriber
during their lifetime.

The benefits of reusing tablesync workers can be significant if tables are
empty or close to empty.
In the empty-table case, spawning tablesync workers and handling replication
slots is where most of the time is spent, since the actual copy phase takes
very little time.


The changes in the behaviour of tablesync workers with this patch are as
follows:
1- After the tablesync worker is done with syncing the current table, it takes
a lock and fetches the tables in init state
2- It looks for a table that is not already being synced by another worker
among the tables in init state
3- If it finds one, it updates its state for the new table and loops back to
the beginning to start syncing
4- If no table is found, it drops the replication slot and exits
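
In rough C-style pseudocode, the reused worker's main loop then looks
something like this (all helper names are hypothetical, for illustration only;
the real code lives in tablesync.c):

for (;;)
{
    copy_table(relid);          /* existing per-table sync work */

    /* steps 1-2: under a lock, find an unclaimed table in init state */
    lock_subscription_rel_states(subid);
    relid = claim_next_init_relation(subid);
    unlock_subscription_rel_states(subid);

    if (!OidIsValid(relid))
        break;                  /* step 4: nothing left to sync */

    update_worker_state_for(relid); /* step 3: reuse worker and slot */
}
drop_replication_slot();        /* step 4 */
proc_exit(0);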


With those changes, I did some benchmarking to see if it improves anything.
These results compare this patch with the latest version of the master branch.
"max_sync_workers_per_subscription" is set to its default of 2.
The numbers are averages of timings from 5 consecutive runs for each
branch.

First, tested logical replication with empty tables.
10 tables

- master: 286.964 ms
- the patch: 116.852 ms

100 tables

- master: 2785.328 ms
- the patch: 706.817 ms

10K tables

- master: 39612.349 ms
- the patch: 12526.981 ms


Also tried replicating tables with some data
10 tables loaded with 10MB data

- master: 1517.714 ms
- the patch: 1399.965 ms

100 tables loaded with 10MB data

- master: 16327.229 ms
- the patch: 11963.696 ms


Then loaded more data
10 tables loaded with 100MB data

- master: 13910.189 ms
- the patch: 14770.982 ms

100 tables loaded with 100MB data

- master: 146281.457 ms
- the patch: 156957.512 ms


If tables are mostly empty, the improvement can be significant - up to 3x
faster logical replication.
With some data loaded, it can still be faster to some extent.
When the table size increases more, the advantage of reusing workers
becomes insignificant.


I would appreciate your comments and suggestions. Thanks in advance for
reviewing.

Best,
Melih


0001-Reuse-Logical-Replication-Background-worker.patch
Description: Binary data


Re: Fix proposal for comparaison bugs in PostgreSQL::Version

2022-07-05 Thread Andrew Dunstan


On 2022-07-03 Su 16:12, Jehan-Guillaume de Rorthais wrote:
> On Sun, 3 Jul 2022 10:40:21 -0400
> Andrew Dunstan  wrote:
>
>> On 2022-06-29 We 05:09, Jehan-Guillaume de Rorthais wrote:
>>> On Tue, 28 Jun 2022 18:17:40 -0400
>>> Andrew Dunstan  wrote:
>>>  
 On 2022-06-28 Tu 16:53, Jehan-Guillaume de Rorthais wrote:  
> ...
> A better fix would be to store the version internally as version_num that
> are trivial to compute and compare. Please, find in attachment an
> implementation of this.
>
> The patch is a bit bigger because it improved the devel version to support
> rc/beta/alpha comparison like 14rc2 > 14rc1.
>
> Moreover, it adds a bunch of TAP tests to check various use cases.
 Nice catch, but this looks like massive overkill. I think we can very
 simply fix the test in just a few lines of code, instead of a 190 line
 fix and a 130 line TAP test.  
>>> I explained why the patch was a little bit larger than required: it fixes
>>> the bugs and does a little bit more. The _version_cmp sub is shorter and
>>> easier to understand, I use multi-line code where I could probably fold
>>> them into a one-liner, added some comments... Anyway, I don't feel the number
>>> of lines changed is "massive". But I can probably remove some code and
>>> shrink some other if it is really important...
>>>
>>> Moreover, to be honest, I don't mind the number of additional lines of TAP
>>> tests. Especially since it runs really, really fast and doesn't hurt
>>> day-to-day devs as it is independent from other TAP tests anyway. It could
>>> be 1k, if it runs fast, is meaningful and helps avoiding futur regressions,
>>> I would welcome the addition.  
>>
>> I don't see the point of having a TAP test at all. We have TAP tests for
>> testing the substantive products we test, not for the test suite
>> infrastructure. Otherwise, where will we stop? Shall we have tests for
>> the things that test the test suite?
> Tons of perl modules have regression tests. When questioning where testing
> should stop, it seems the Test::More module itself is not the last frontier:
> https://github.com/Test-More/test-more/tree/master/t
>
> Moreover, the PostgreSQL::Version is not a TAP test module, but a module to
> deal with PostgreSQL versions and compare them.
>
> Testing makes development faster as well when it comes to testing the code.
> Instead of vague manual testing, you can test a whole bunch of situations
> and accumulate some more when you think about a new one or when a bug is
> reported. Having TAP tests helps to make sure the code works as expected.
>
> It helped me when creating my patch. With all due respect, I just don't
> understand your arguments against them. The number-of-lines argument, or
> questioning where testing should stop, doesn't hold much.


There is not a single TAP test in our source code that is aimed at
testing our test infrastructure as opposed to testing what we are
actually in the business of building, and I'm not about to add one. This
is quite different from, say, CPAN modules.

Every added test consumes buildfarm cycles and space on the buildfarm
server for the report, be it ever so small. Every added test needs
maintenance, be it ever so small. There's no such thing as a free test
(apologies to Heinlein and others).


>
>>> If we really want to save some bytes, I have a two lines worth of code fix
>>> that looks more readable to me than fixing _version_cmp:
>>>
>>> +++ b/src/test/perl/PostgreSQL/Version.pm
>>> @@ -92,9 +92,13 @@ sub new
>>> # Split into an array
>>> my @numbers = split(/\./, $arg);
>>>  
>>> +   # make sure all digits of the array-represented version are set so we can
>>> +   # keep _version_cmp code as a "simple" digit-to-digit comparison loop
>>> +   $numbers[$_] += 0 for 0..3;
>>> +
>>> # Treat development versions as having a minor/micro version one less than
>>> # the first released version of that branch.
>>> -   push @numbers, -1 if ($devel);
>>> +   $numbers[3] = -1 if $devel;
>>>  
>>> $devel ||= "";  
>> I don't see why this is any more readable.
> The _version_cmp is much more readable.
>
> But anyway, this is not the point. Using an array to compare versions where we
> can use version_num seems like useless and buggy convolutions to me.
>

I think we'll just have to agree to disagree about it.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Making Vars outer-join aware

2022-07-05 Thread Tom Lane
Richard Guo  writes:
> For the query in the example

> SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)

> (foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2
> scan level. Previously we do that with check_outerjoin_delay() by
> scanning all the outer joins below and check if the qual references any
> nullable rels of the OJ, and if so include the OJ's rels into the qual.
> So as a result we'd get that foo(t2.z) is referencing t1 and t2, and
> we'd put the qual into the join lists of t1 and t2.

> Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT
> JOIN below (with RTI 3). So we consider the qual is referencing RTE 2
> (which is t2) and RTE 3 (which is the OJ). Do we still need to include
> RTE 1, i.e. t1 into the qual's required relids? How should we do that?

It seems likely to me that we could leave the qual's required_relids
as just {2,3} and not have to bother ORing any additional bits into
that.  However, in the case of a Var-free JOIN/ON clause it'd still
be necessary to artificially add some relids to its initially empty
relids.  Since I've not yet tried to rewrite distribute_qual_to_rels
I'm not sure how the details will shake out.

regards, tom lane




Re: Wrong provolatile value for to_timestamp (1 argument)

2022-07-05 Thread Laurenz Albe
On Tue, 2022-07-05 at 19:37 +0900, Tatsuo Ishii wrote:
> > Are you sure?  I'd say that "to_timestamp(double precision)" always
> > produces the same timestamp for the same argument.  What changes with
> > the setting of "timezone" is how that timestamp is converted to a
> > string, but that's a different affair.
> 
> Of course the internal representation of timestamp with time zone data
> type is not affected by the time zone setting. But why is the other form of
> to_timestamp labeled as stable? If your theory is correct, then shouldn't the
> other form of to_timestamp be labeled immutable as well?

The result of the two-argument form of "to_timestamp" can depend on
the setting of "lc_time":

test=> SET lc_time = 'en_US.utf8';
SET
test=> SELECT to_timestamp('2022-July-05', 'YYYY-TMMonth-DD');
      to_timestamp      
------------------------
 2022-07-05 00:00:00+02
(1 row)

test=> SET lc_time = 'de_DE.utf8';
SET
test=> SELECT to_timestamp('2022-July-05', 'YYYY-TMMonth-DD');
ERROR:  invalid value "July-05" for "Month"
DETAIL:  The given value did not match any of the allowed values for this field.

Yours,
Laurenz Albe




Re: Wrong provolatile value for to_timestamp (1 argument)

2022-07-05 Thread Tom Lane
Laurenz Albe  writes:
> On Tue, 2022-07-05 at 19:37 +0900, Tatsuo Ishii wrote:
>> Of course the internal representation of timestamp with time zone data
>> type is not affected by the time zone setting. But why other form of
>> to_timestamp is labeled as stable? If your theory is correct, then
>> other form of to_timestamp shouldn't be labeled immutable as well?

> The result of the two-argument form of "to_timestamp" can depend on
> the setting of "lc_time":

It also depends on the session's timezone setting, in a way that
the single-argument form does not.

regression=# show timezone;
     TimeZone     
------------------
 America/New_York
(1 row)

regression=# select to_timestamp(0);
      to_timestamp      
------------------------
 1969-12-31 19:00:00-05
(1 row)

regression=# select to_timestamp('1970-01-01', 'YYYY-MM-DD');
      to_timestamp      
------------------------
 1970-01-01 00:00:00-05
(1 row)

regression=# set timezone = 'utc';
SET
regression=# select to_timestamp(0);
      to_timestamp      
------------------------
 1970-01-01 00:00:00+00
(1 row)

regression=# select to_timestamp('1970-01-01', 'YYYY-MM-DD');
      to_timestamp      
------------------------
 1970-01-01 00:00:00+00
(1 row)

The two results of to_timestamp(0) represent the same UTC
instant, but the other two are different instants.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2022-07-05 Thread Justin Pryzby
This is a review of 0001.

On Tue, Jul 05, 2022 at 01:22:47PM +, gkokola...@pm.me wrote:
> Simply this patchset had started to diverge
> heavily already based on comments from Mr. Paquier, who had already requested
> that the APIs be refactored to use function pointers. This is happening in
> 0002 of the patchset.

I said something about reducing ifdefs, but I'm having trouble finding what
Michael said about this ?

> > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> >
> > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.
> > >
> > > I ran into that on an ubuntu LTS, so I don't think it's so old that it
> > > shouldn't be handled more gracefully. LZ4 should either have an explicit
> > > version check, or else shouldn't depend on that feature (or should define 
> > > a
> > > safe fallback version if the library header doesn't define it).
> > > https://packages.ubuntu.com/liblz4-1

The constant still seems to be used without defining a fallback or a minimum 
version.

> > > 0003: typo: of legacy => or legacy

This is still there

> > > You renamed this:
> > >
> > > |- COMPR_ALG_LIBZ
> > > |-} CompressionAlgorithm;
> > > |+ COMPRESSION_GZIP,
> > > |+} CompressionMethod;
> > >
> > > ..But I don't think that's an improvement. If you were to change it, it 
> > > should
> > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
> > > structs and typedefs. zlib is not idential to gzip, which uses a different
> > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is 
> > > incorrect.

This comment still applies - zlib's gz* functions are "gzip" but the others are
"zlib".  https://zlib.net/manual.html

That affects both the 0001 and 0002 patches.

Actually, I think that "gzip" should not be the name of the user-facing option,
since (except for "plain" format) it isn't using gzip.

+Robert, since this suggests amending parse_compress_algorithm().  Maybe "zlib"
should be parsed the same way as "gzip" - I don't think we ever expose both to
a user, but in some cases (basebackup and pg_dump -Fp -Z1) the output is "gzip"
and in some cases it's actually zlib (pg_dump -Fc -Z1).

> > > The cf* changes in pg_backup_archiver could be split out into a separate
> > > commit. It's strictly a code simplification - not just preparation for 
> > > more
> > > compression algorithms. The commit message should "See also:
> > > bf9aa490db24b2334b3595ee33653bf2fe39208c".

I still think this could be an early, separate patch.

> > > freebsd/cfbot is failing.

This is still failing for bsd and windows.
Windows also has compiler warnings.
http://cfbot.cputube.org/georgios-kokolatos.html

Please see: src/tools/ci/README, which you can use to run check-world on 4 OS
by pushing a branch to github.

> > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> > > exercise it more on CI.

What about this ?  I think the patch needs to pass CI on all 4 OS with
default=zlib and default=lz4.

> > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote:

> @@ -254,7 +251,12 @@ CreateArchive(const char *FileSpec, const ArchiveFormat 
> fmt,
>  Archive *
>  OpenArchive(const char *FileSpec, const ArchiveFormat fmt)
>  {
> - ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, 
> setupRestoreWorker);
> + ArchiveHandle *AH;
> + pg_compress_specification compress_spec;

Should this be initialized to {0} ?

> @@ -969,6 +969,8 @@ NewRestoreOptions(void)
>   opts->format = archUnknown;
>   opts->cparams.promptPassword = TRI_DEFAULT;
>   opts->dumpSections = DUMP_UNSECTIONED;
> + opts->compress_spec.algorithm = PG_COMPRESSION_NONE;
> + opts->compress_spec.level = INT_MIN;

Why INT_MIN ?

> @@ -1115,23 +1117,28 @@ PrintTOCSummary(Archive *AHX)
>   ArchiveHandle *AH = (ArchiveHandle *) AHX;
>   RestoreOptions *ropt = AH->public.ropt;
>   TocEntry   *te;
> + pg_compress_specification out_compress_spec;

Should have {0} ?
I suggest to write it like my 2020 patch for this, which says:
no_compression = {0};

>   /* Open stdout with no compression for AH output handle */
> - AH->gzOut = 0;
> - AH->OF = stdout;
> + out_compress_spec.algorithm = PG_COMPRESSION_NONE;
> + AH->OF = cfdopen(dup(fileno(stdout)), PG_BINARY_A, out_compress_spec);

Ideally this should check the success of dup().
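
For example, something like this (a sketch only):

int         fd = dup(fileno(stdout));

if (fd < 0)
    pg_fatal("could not duplicate stdout: %m");
AH->OF = cfdopen(fd, PG_BINARY_A, out_compress_spec);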

> @@ -3776,21 +3746,25 @@ ReadHead(ArchiveHandle *AH)
> + if (AH->compress_spec.level != INT_MIN)

Why is it testing the level and not the algorithm ?

> --- a/src/bin/pg_dump/pg_backup_custom.c
> +++ b/src/bin/pg_dump/pg_backup_custom.c
> @@ -298,7 +298,7 @@ _StartData(ArchiveHandle *AH, TocEntry *te)
>   _WriteByte(AH, BLK_DATA);   /* Block type */
>   WriteInt(AH, te->dumpId);   /* For sanity check */
>  
> - ctx->cs = AllocateCompressor(AH->compression, _CustomWriteFunc);
> + ctx->cs = AllocateCompressor(AH->compress_spec, _CustomWriteFunc);

Is it necessary to rename the dat

Re: [EXTERNAL] Re: Support load balancing in libpq

2022-07-05 Thread Jelte Fennema
> I'm quoting a previous attempt by Satyanarayana Narlapuram on this
> topic [1], it also has a patch set.

Thanks for sharing that. It's indeed a different approach to solve the
same problem. I think my approach is much simpler, since it only 
requires minimal changes to the libpq client and none to the postgres 
server or the postgres protocol.

However, that linked patch is more flexible due to allowing redirection
based on users and databases. With my patch something similar could
still be achieved by using different hostname lists for different databases
or users at the client side.

To be completely clear on the core difference between the patches, IMO:
In this patch a DNS server (or a hardcoded hostname/IP list at the client
side) is used to determine what host to connect to. In the linked
patch instead the Postgres server starts being the source of truth of what
to connect to, thus essentially becoming something similar to a DNS server.

> We may not have to go the extra
> mile to determine all of these parameters dynamically during query
> authentication time, but we can let users provide a list of standby
> hosts based on "some" priority (Satya's thread [1] attempts to do
> this, in a way, with users specifying the hosts via pg_hba.conf file).
> If required, randomization in choosing the hosts can be optional.

I'm not sure if you read my response to Aleksander. I feel like I
addressed part of this at least. But maybe I was not clear enough, 
or added too much fluff. So, I'll re-iterate the important part:
By specifying the same host multiple times in the DNS response or
the hostname/IP list you can achieve weighted load balancing.
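
For illustration, weighted balancing could look like this on the client side
(host names made up; this assumes the patch's random shuffling has been
enabled via its connection option):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * "big" is listed twice, so after shuffling it should receive roughly
     * two thirds of the connections.
     */
    PGconn     *conn = PQconnectdb("host=big,big,small dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    /* ... run queries ... */
    PQfinish(conn);
    return 0;
}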

> A few thoughts on the patch:
> 1) How are we determining if the submitted query is read-only or write?

This is not part of this patch. libpq, and thus this patch, works at the
connection level, not at the query level, so determining whether a query is
read-only or a write is not possible without large changes.

However, libpq already has a target_session_attrs[1] connection option. This
can be used to open connections specifically to read-only or writable servers.
However, once a read-only connection is opened, it is then the responsibility
of the client not to send write queries over this read-only connection,
otherwise they will fail.

> 2) What happens for explicit transactions? The queries related to the
> same txn get executed on the same host right? How are we guaranteeing
> this?

We're load balancing connections, not queries. Once a connection is made
all queries on that connection will be executed on the same host. 

> 3) Isn't it good to provide a way to test the patch?

The way I tested it myself was by setting up a few databases on my local
machine listening on 127.0.0.1, 127.0.0.2, 127.0.0.3 and then putting all those
in the connection string. Then looking at the connection attempts in the
servers' logs showed that the client was indeed connecting to a random one
(by using log_connections=true in postgresql.conf).

I would definitely like to have some automated tests for this, but I couldn't
find tests for libpq that were connecting to multiple postgres servers. If they
exist, any pointers are appreciated. If they don't exist, pointers to similar
tests are also appreciated.

[1]: 
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-TARGET-SESSION-ATTRS



Re: [PATCH] Optional OR REPLACE in CREATE OPERATOR statement

2022-07-05 Thread Tom Lane
Svetlana Derevyanko  writes:
> It seems useful to have [OR REPLACE] option in CREATE OPERATOR statement, as 
> in CREATE FUNCTION. This option may be good for writing extension update 
> scripts, to avoid errors with re-creating the same operator.

No, that's not acceptable.  CREATE OR REPLACE should always produce
exactly the same final state of the object, but in this case we cannot
change the underlying function if the operator already exists.

(At least, not without writing a bunch of infrastructure to update
existing views/rules that might use the operator; which among other
things would create a lot of deadlock risks.)

regards, tom lane




Re: Handle infinite recursion in logical replication setup

2022-07-05 Thread vignesh C
On Tue, Jul 5, 2022 at 6:14 AM Peter Smith  wrote:
>
> I checked again the v26* patch set.
>
> I had no more comments for v26-0001, v26-0002, or v26-0004, but below
> are some comments for v26-0003.
>
> 
> v26-0003
> 
>
> 3.1 src/backend/catalog/pg_subscription.c
>
> 3.1.a
> +/*
> + * Get all relations for subscription that are in a ready state.
> + *
> + * Returned list is palloc'ed in current memory context.
> + */
> +List *
> +GetSubscriptionReadyRelations(Oid subid)
>
> "subscription" -> "the subscription"
>
> Also, It might be better to say in the function header what kind of
> structures are in the returned List. E.g. Firstly, I'd assumed it was
> the same return type as the other function
> GetSubscriptionReadyRelations, but it isn’t.

This function has been removed and GetSubscriptionRelations is used
with slight changes.

> 3.1.b
> I think there might be a case to be made for *combining* those
> SubscriptionRelState and SubscripotionRel structs into a single common
> struct. Then all those GetSubscriptionNotReadyRelations and
> GetSubscriptionNotReadyRelations (also GetSubscriptionRelations?) can
> be merged to be just one common function that takes a parameter to say
> do you want to return the relation List of ALL / READY /  NOT_READY?
> What do you think?

I have used a common function that can get all relations, ready
relations and not ready relations instead of the 3 functions.

> ==
>
> 3.2 src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> +/*
> + * Check and throw an error if the publisher has subscribed to the same table
> + * from some other publisher. This check is required only if copydata is ON 
> and
> + * the origin is local.
> + */
> +static void
> +check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
> +CopyData copydata, char *origin, Oid subid)
>
> The function comment probably should say something about why the new
> READY logic was added in v26.

Added a comment for the same.

> ~~~
>
> 3.3
>
> 3.3.a
> + /*
> + * The subid will be valid only for ALTER SUBSCRIPTION ... REFRESH
> + * PUBLICATION. Get the ready relations for the subscription only in case
> + * of ALTER SUBSCRIPTION case as there will be no relations in ready state
> + * while the subscription is created.
> + */
> + if (subid != InvalidOid)
> + subreadyrels = GetSubscriptionReadyRelations(subid);
>
> The word "case" is 2x in the same sentence. I also paraphrased my
> understanding of this comment below. Maybe it is simpler?
>
> SUGGESTION
> Get the ready relations for the subscription. The subid will be valid
> only for ALTER SUBSCRIPTION ... REFRESH because there will be no
> relations in ready state while the subscription is created.

Modified

> 3.3.b
> + if (subid != InvalidOid)
> + subreadyrels = GetSubscriptionReadyRelations(subid);
>
> SUGGESTION
> if (OidIsValid(subid))

Modified

> ==
>
> 3.4 src/test/subscription/t/032_localonly.pl
>
> Now all the test cases are re-using the same data (e.g. 11,12,13) and
> you are deleting the data between the tests. I guess it is OK, but IMO
> the tests are easier to read when each test part was using unique data
> (e.g. 11,12,13, then 12,22,32, then 13,23,33 etc)

Modified

Thanks for the comments, the v29 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1T5utq60qVx%3DRN60rHcg7wt2psM7PpCQ2fDiB-R8oLGg%40mail.gmail.com

Regards,
 Vignesh




Re: [PATCH] Log details for client certificate failures

2022-07-05 Thread Jacob Champion
On Fri, Jul 1, 2022 at 1:51 PM Jacob Champion  wrote:
> Sorry for the misunderstanding! v3 adds the Issuer to the logs as well.

Resending v3; I messed up the certificate diff with my gitconfig.

--Jacob
From 8d03e043cd86ec81dfb49a138e871c5ac110dc06 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 29 Mar 2021 10:07:31 -0700
Subject: [PATCH v3] Log details for client certificate failures

Currently, debugging client cert verification failures is mostly limited
to looking at the TLS alert code on the client side. For simple
deployments, sometimes it's enough to see "sslv3 alert certificate
revoked" and know exactly what needs to be fixed, but if you add any
more complexity (multiple CA layers, misconfigured CA certificates,
etc.), trying to debug what happened based on the TLS alert alone can be
an exercise in frustration.

Luckily, the server has more information about exactly what failed in
the chain, and we already have the requisite callback implemented as a
stub. Fill it out with error handling and add a COMMERROR log so that a
DBA can debug client failures more easily. It ends up looking like

LOG:  connection received: host=localhost port=44120
LOG:  client certificate verification failed at depth 1: unable to get local issuer certificate
DETAIL:  failed certificate had subject '/CN=Test CA for PostgreSQL SSL regression test client certs', serial number 2315134995201656577, purported issuer '/CN=Root CA'
LOG:  could not accept SSL connection: certificate verify failed

The length of the Subject and Issuer strings is limited to prevent
malicious client certs from spamming the logs. In case the truncation
makes things ambiguous, the certificate's serial number is also logged.
---
 src/backend/libpq/be-secure-openssl.c | 101 +-
 src/test/ssl/conf/client-long.config  |  14 
 src/test/ssl/ssl/client-long.crt  |  20 +
 src/test/ssl/ssl/client-long.key  |  27 +++
 src/test/ssl/sslfiles.mk  |   2 +-
 src/test/ssl/t/001_ssltests.pl|  40 +-
 src/test/ssl/t/SSL/Backend/OpenSSL.pm |   2 +-
 7 files changed, 200 insertions(+), 6 deletions(-)
 create mode 100644 src/test/ssl/conf/client-long.config
 create mode 100644 src/test/ssl/ssl/client-long.crt
 create mode 100644 src/test/ssl/ssl/client-long.key

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 3d0168a369..80b361b105 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1076,12 +1076,45 @@ dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata)
 	return 0;
 }
 
+/*
+ * Examines the provided certificate name, and if it's too long to log, modifies
+ * and truncates it. The return value is NULL if no truncation was needed; it
+ * otherwise points into the middle of the input string, and should not be
+ * freed.
+ */
+static char *
+truncate_cert_name(char *name)
+{
+	size_t		namelen = strlen(name);
+	char	   *truncated;
+
+	/*
+	 * Common Names are 64 chars max, so for a common case where the CN is the
+	 * last field, we can still print the longest possible CN with a 7-character
+	 * prefix (".../CN=[64 chars]"), for a reasonable limit of 71 characters.
+	 */
+#define MAXLEN 71
+
+	if (namelen <= MAXLEN)
+		return NULL;
+
+	/*
+	 * Keep the end of the name, not the beginning, since the most specific
+	 * field is likely to give users the most information.
+	 */
+	truncated = name + namelen - MAXLEN;
+	truncated[0] = truncated[1] = truncated[2] = '.';
+
+#undef MAXLEN
+
+	return truncated;
+}
+
 /*
  *	Certificate verification callback
  *
  *	This callback allows us to log intermediate problems during
- *	verification, but for now we'll see if the final error message
- *	contains enough information.
+ *	verification.
  *
  *	This callback also allows us to override the default acceptance
  *	criteria (e.g., accepting self-signed or expired certs), but
@@ -1090,6 +1123,70 @@ dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata)
 static int
 verify_cb(int ok, X509_STORE_CTX *ctx)
 {
+	int			depth;
+	int			errcode;
+	const char *errstring;
+	X509	   *cert;
+	char	   *subject = NULL, *issuer = NULL;
+	char	   *sub_truncated = NULL, *iss_truncated = NULL;
+	char	   *serialno = NULL;
+
+	if (ok)
+	{
+		/* Nothing to do for the successful case. */
+		return ok;
+	}
+
+	/* Pull all the information we have on the verification failure. */
+	depth = X509_STORE_CTX_get_error_depth(ctx);
+	errcode = X509_STORE_CTX_get_error(ctx);
+	errstring = X509_verify_cert_error_string(errcode);
+
+	cert = X509_STORE_CTX_get_current_cert(ctx);
+	if (cert)
+	{
+		ASN1_INTEGER *sn;
+		BIGNUM	   *b;
+
+		/*
+		 * Get the Subject and Issuer for logging, but don't let maliciously
+		 * huge certs flood the logs.
+		 */
+		subject = X509_NAME_to_cstring(X509_get_subject_name(cert));
+		sub_truncated = truncate_cert_name(subject);
+
+		issuer = X509_NAME_to_cstring(X509_get_issu

Re: pg15b2: large objects lost on upgrade

2022-07-05 Thread Robert Haas
On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby  wrote:
> I suppose it's like Bruce said, here.
>
> https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us

Well, I feel dumb. I remember reading that email back when Bruce sent
it, but it seems that it slipped out of my head between then and when
I committed. I think your patch is fine, except that I think maybe we
should adjust this dump comment:

-- For binary upgrade, set pg_largeobject relfrozenxid, relminmxid and
relfilenode

Perhaps:

-- For binary upgrade, preserve values for pg_largeobject and its index

Listing the exact properties preserved seems less important to me than
mentioning that the second UPDATE statement is for its index --
because if you look at the SQL that's generated, you can see what's
being preserved, but you don't automatically know why there are two
UPDATE statements or what the rows with those OIDs are.

I had a moment of panic this morning where I thought maybe the whole
patch needed to be reverted. I was worried that we might need to
preserve the OID of every system table and index. Otherwise, what
happens if the OID counter in the old cluster wraps around and some
user-created object gets an OID that the system tables are using in
the new cluster? However, I think this can't happen, because when the
OID counter wraps around, it wraps around to 16384, and the
relfilenode values for newly created system tables and indexes are all
less than 16384. So I believe we only need to fix pg_largeobject and
its index, and I think your patch does that.

Anyone else see it differently?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b2: large objects lost on upgrade

2022-07-05 Thread Justin Pryzby
On Tue, Jul 05, 2022 at 12:43:54PM -0400, Robert Haas wrote:
> On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby  wrote:
> > I suppose it's like Bruce said, here.
> >
> > https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us
> 
> Well, I feel dumb. I remember reading that email back when Bruce sent
> it, but it seems that it slipped out of my head between then and when
> I committed. I think your patch is fine, except that I think maybe we

My patch also leaves a 0 byte file around from initdb, which is harmless, but
dirty.

I've seen cases before where a bunch of 0 byte files are abandoned in an
otherwise-empty tablespace, with no associated relation, and I have to "rm"
them to be able to drop the tablespace.  Maybe that's a known issue, maybe it's
due to crashes or other edge cases, maybe it's of no consequence, and maybe it
has already been fixed or is being fixed already.  But it'd be nice to avoid
another way to have 0 byte files - especially ones named with system OIDs.

> Listing the exact properties preserved seems less important to me than
> mentioning that the second UPDATE statement is for its index --

+1

-- 
Justin




Re: avoid multiple hard links to same WAL file after a crash

2022-07-05 Thread Nathan Bossart
On Tue, Jul 05, 2022 at 10:19:49AM +0900, Michael Paquier wrote:
> On Thu, May 05, 2022 at 08:10:02PM +0900, Michael Paquier wrote:
>> I'd agree with removing all the callers at the end.  pgrename() is
>> quite robust on Windows, but I'd keep the two checks in
>> writeTimeLineHistory(), as the logic around findNewestTimeLine() would
>> consider a past TLI history file as in-use even if we have a crash
>> just after the file got created in the same path by the same standby,
>> and the WAL segment init part.  Your patch does that.
> 
> As v16 is now open for business, I have revisited this change and
> applied 0001 to change all the callers (aka removal of the assertion
> for the WAL receiver when it overwrites a TLI history file).  The
> commit log includes details about the reasoning of all the areas
> changed, for clarity, as of the WAL recycling part, the TLI history
> file part and basic_archive. 

Thanks!  I wonder if we should add a comment in writeTimeLineHistoryFile()
about possible concurrent use by a WAL receiver and the startup process and
why that is okay.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2022-07-05 Thread Tom Lane
Bruce Momjian  writes:
> This patch has been applied back to 9.6 and will appear in the next
> minor release.

I have just discovered that this patch broke pg_upgrade's ability
to upgrade from 8.4:

$ pg_upgrade -b ~/version84/bin -d ...
Performing Consistency Checks
-
Checking cluster versions   ok
The source cluster lacks some required control information:
  latest checkpoint oldestXID

Cannot continue without required control information, terminating
Failure, exiting

Sure enough, 8.4's pg_controldata doesn't print anything about
oldestXID, because that info wasn't there then.

Given the lack of field complaints, it's probably not worth trying
to do anything to restore that capability.  But we really ought to
update pg_upgrade's code and docs in pre-v15 branches to say that
the minimum supported source version is 9.0.

regards, tom lane




Re: pg_upgrade (12->14) fails on aggregate

2022-07-05 Thread Tom Lane
Andrey Borodin  writes:
> The patch is marked WiP, what is in progress as of now?

It looks about ready to me.  Pushed with some minor cosmetic
adjustments.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-07-05 Thread Melanie Plageman
On Mon, Mar 21, 2022 at 8:15 PM Andres Freund  wrote:

> Hi,
>
> On 2022-02-19 11:06:18 -0500, Melanie Plageman wrote:
> > v21 rebased with compile errors fixed is attached.
>
> This currently doesn't apply (mea culpa likely):
> http://cfbot.cputube.org/patch_37_3272.log
>
> Could you rebase? Marked as waiting-on-author for now.
>
>
>
Attached is the rebased/rewritten version of the pg_stat_buffers patch
which uses the cumulative stats system instead of the stats collector.

I've moved to the model of backend-local pending stats which get
accumulated into shared memory by pgstat_report_stat().

It is worth noting that, with this method, other backends will no longer
have access to each other's individual IO operation statistics. An
argument could be made to keep the statistics in each backend in
PgBackendStatus before accumulating them to the cumulative stats system
so that they can be accessed at the per-backend level of detail.

There are two TODOs related to when pgstat_report_io_ops() should be
called. pgstat_report_io_ops() is meant for backends that will not
commonly call pgstat_report_stat(). I was unsure if it made sense for
BootstrapModeMain() to explicitly call pgstat_report_io_ops() and if the
autovacuum worker should call it explicitly and, if so, whether after
do_autovacuum() was the right location to call it.

Archiver and syslogger do not increment or report IO operations.

I did not change pg_stat_bgwriter fields to derive from the IO
operations statistics structures since the reset targets differ.

Also, I added one test, but I'm not sure if it will be flakey. It tests
that the "writes" for checkpointer are tracked when data is inserted
into a table and then CHECKPOINT is explicitly invoked directly after. I
don't know if this will have a problem if the checkpointer is busy and
somehow the backend which dirtied the buffer is forced to write out its
own buffer, causing the test to potentially fail (even if the
checkpointer is doing other writes [causing it to be busy], it may not
do them in between the INSERT and the SELECT from pg_stat_buffers).

I am wondering how to add a non-flakey test. For regular backends, I
couldn't think of a way to suspend checkpointer to make them do their
own writes and fsyncs in the context of a regression or isolation test.
In fact for many of the dirty buffers it seems like it will be difficult
to keep bgwriter, checkpointer, and regular backends from competing and
sometimes causing test failures.

- Melanie


v22-0002-Track-IO-operation-statistics.patch
Description: Binary data


v22-0003-Add-system-view-tracking-IO-ops-per-backend-type.patch
Description: Binary data


v22-0001-Add-BackendType-for-standalone-backends.patch
Description: Binary data


Re: pg_checkpointer is not a verb or verb phrase

2022-07-05 Thread Robert Haas
On Fri, Jul 1, 2022 at 5:50 PM Nathan Bossart  wrote:
> On Fri, Jul 01, 2022 at 03:36:48PM +0200, Magnus Hagander wrote:
> > +1 for pg_checkpoint on that -- let's not make it longer than necessary.
> >
> > And yes, +1 for actually changing it. It's a lot cheaper to change it now
> > than it will be in the future.  Yes, it would've been even cheaper to have
> > already changed it, but we can't go back in time...
>
> Yeah, pg_checkpoint seems like the obvious alternative to pg_checkpointer.
> I didn't see a patch in this thread yet, so I've put one together.  I did
> not include the catversion bump.

Hearing several votes in favor and none opposed, committed and
back-patched to v15. I added the catversion bump, but left out the .po
file changes, figuring it was better to let those files get updated
via the normal process.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-07-05 Thread Jacob Champion
On Fri, May 27, 2022 at 1:09 AM Amit Langote  wrote:
> 0001 contains the mechanical changes of moving PartitionPruneInfo out
> of Append/MergeAppend into a list in PlannedStmt.
>
> 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> only unpruned partitions".

This patchset will need to be rebased over 835d476fd21; looks like
just a cosmetic change.

--Jacob




Re: Making the subquery alias optional in the FROM clause

2022-07-05 Thread Tom Lane
Dean Rasheed  writes:
> This was discussed previously in [1], and there seemed to be general
> consensus in favour of it, but no new patch emerged.

As I said in that thread, I'm not super enthused about this, but I was
clearly in the minority so I think it should go forward.

> Attached is a patch that takes the approach of not generating an alias
> at all, which seems to be neater and simpler, and less code than
> trying to generate a unique alias.

Hm.  Looking at the code surrounding what you touched, I'm reminded
that we allow JOIN nodes to not have an alias, and represent that
situation by rte->alias == NULL.  I wonder if it'd be better in the
long run to make alias-less subqueries work similarly, rather than
generating something that after-the-fact will be indistinguishable
from a user-written alias.  If that turns out to not work well,
I'd agree with "unnamed_subquery" as the inserted name.

Also, what about VALUES clauses?  It seems inconsistent to remove
this restriction for sub-SELECT but not VALUES.  Actually it looks
like your patch already does remove that restriction in gram.y,
but you didn't follow through elsewhere.

As far as the docs go, I think it's sufficient to mention the
inconsistency with SQL down at the bottom; we don't need a
redundant in-line explanation.

regards, tom lane




Re: First draft of the PG 15 release notes

2022-07-05 Thread Bruce Momjian
On Sat, Jul  2, 2022 at 08:13:41PM -0400, Bruce Momjian wrote:
> On Fri, Jul  1, 2022 at 09:56:17AM -0700, Peter Geoghegan wrote:
> > On Fri, Jul 1, 2022 at 9:41 AM Bruce Momjian  wrote:
> > > > It might be worth explaining the shift directly in the release notes.
> > > > The new approach is simpler and makes a lot more sense -- why should
> > > > the relfrozenxid be closely tied to freezing? We don't necessarily
> > >
> > > I don't think this is an appropriate detail for the release notes.
> > 
> > Okay. What about saying something about relminmxid advancement where
> > the database consumes lots of multixacts?
> 
> No. same issue.

Actually, I was wrong.  I thought that we only mentioned that we
computed a more aggressive xid, but I now see I was mentioning the _frozen_
xid.  Reading the commit, we do compute the multi-xid and store that too,
so I have updated the PG 15 release notes with the attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index 47ac329e79..179ad37d9d 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -994,7 +994,8 @@ Author: Peter Geoghegan 
   

 Allow vacuum to be more
-aggressive in setting the oldest frozenxid (Peter Geoghegan)
+aggressive in setting the oldest frozen and multi transaction id
+(Peter Geoghegan)

   
 


Re: [PATCH] Add sortsupport for range types and btree_gist

2022-07-05 Thread Jacob Champion
On Wed, Jun 15, 2022 at 3:45 AM Christoph Heiss
 wrote:
> `make check-world` reports no regressions.

cfbot is reporting a crash in contrib/btree_gist:

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3686

Thanks,
--Jacob




Re: Eliminating SPI from RI triggers - take 2

2022-07-05 Thread Jacob Champion
On Thu, Jun 30, 2022 at 11:23 PM Amit Langote  wrote:
>
> I will continue investigating what to do about points (1) and (2)
> mentioned above and see if we can do away with using SQL in the
> remaining cases.

Hi Amit, looks like isolation tests are failing in cfbot:

https://cirrus-ci.com/task/6642884727275520

Note also the uninitialized variable warning that cfbot picked up;
that may or may not be related.

Thanks,
--Jacob




Re: Error "initial slot snapshot too large" in create replication slot

2022-07-05 Thread Jacob Champion
On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi
 wrote:
> So this is that. v5 creates a regular snapshot.

This patch will need a quick rebase over 905c020bef9, which added
`extern` to several missing locations.

Thanks,
--Jacob




Re: doc: BRIN indexes and autosummarize

2022-07-05 Thread Roberto Mello
On Mon, Jul 4, 2022 at 9:20 AM Alvaro Herrera 
wrote:

>
> [Some of] these additions are wrong actually.  It says that autovacuum
> will not summarize new entries; but it does.  If you just let the table
> sit idle, any autovacuum run that cleans the table will also summarize
> any ranges that need summarization.
>
> What 'autosummarization=off' means is that the behavior to trigger an
> immediate summarization of a range once it becomes full is not default.
> This is very different.
>

Without having read through the code, I'll take your word for it. I simply
went with what was written in this phrase of the docs:

"or by automatic summarization executed by autovacuum, as insertions occur.
(This last trigger is disabled by default and can be enabled with the
autosummarize parameter.)"

To me this did not indicate a third behavior, which is what you are
describing, so I'm glad we're having this discussion to clarify it.

> As for the new <para>s that you added, I'd say they're
> stylistically wrong.  Each paragraph is supposed to be one fully
> contained idea; what these tags do is split each idea across several
> smaller paragraphs.  This is likely subjective though.
>

While I don't disagree with you, readability is more important. We have
lots of places (such as that one in the docs) where we have a big blob of
text, reducing readability, IMHO. In the source they are broken by new
lines, but in the rendered HTML, which is what the vast majority of people
read, they get rendered into a big blob-looking-thing.

> > What would be the downside (if any) of having autosummarize=on by default?
>
> I'm not aware of any.  Maybe we should turn it on by default.
>

 +1

Thanks for looking at this Alvaro.

Roberto
--
Crunchy Data -- passion for open source PostgreSQL


Re: First draft of the PG 15 release notes

2022-07-05 Thread Bruce Momjian
On Fri, Jul  1, 2022 at 06:21:28PM -0700, Noah Misch wrote:
> Here's what I've been trying to ask: what do you think of linking to
> https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
> here?  The release note text is still vague, and the docs have extensive
> coverage of the topic.  The notes can just link to that extensive coverage.

Sure.  How is this patch?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index 179ad37d9d..3ad8c1d996 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -63,11 +63,12 @@ Author: Noah Misch 
   permissions on the public schema has not
   been changed.  Databases restored from previous Postgres releases
   will be restored with their current permissions.  Users wishing
-  to have the former permissions will need to grant
+  to have the former more-open permissions will need to grant
   CREATE permission for PUBLIC
   on the public schema; this change can be made
   on template1 to cause all new databases
-  to have these permissions.
+  to have these permissions.  This change was made to increase
+  security; see <xref linkend="ddl-schemas-patterns"/>.
  
 
 


Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-07-05 Thread Andres Freund
Hi,

On 2022-06-24 10:29:06 -0400, Andrew Dunstan wrote:
> On 2022-06-23 Th 21:51, Andres Freund wrote:
> > On 2022-06-23 16:38:12 +0900, Michael Paquier wrote:
> >> On Tue, Jun 21, 2022 at 05:41:07PM -0400, Andrew Dunstan wrote:
> >>> Yes, but I don't guarantee to have a fix in time for Beta2.
> >> IMHO, it would be nice to get something done for beta2.  Now the
> >> thread is rather fresh and I guess that more performance study is
> >> required even for 0004, so..
> > I don't think there's a whole lot of performance study needed for 0004 - the
> > current state is obviously wrong.
> >
> > I think Andrew's beta 2 comment was more about my other architectural
> > complains around the json expression eval stuff.
> 
> 
> Right. That's being worked on but it's not going to be a mechanical fix.

Any updates here?

I'd mentioned the significant space use due to keeping JsonCoercionsState for
all the types. Another related aspect is that this code is just weird - nearly
the same struct name (JsonCoercionsState vs. JsonCoercionState), nested inside
each other?

struct JsonCoercionsState
{
struct JsonCoercionState
{
JsonCoercion *coercion; /* coercion expression */
ExprState  *estate; /* coercion expression state */
}   null,
string,
numeric,
boolean,
date,
time,
timetz,
timestamp,
timestamptz,
composite;
}   coercions;  /* states for coercion from SQL/JSON item
 * types directly to the output type */

Also note the weird numeric indentation that pgindent does...
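
One way out of both the name confusion and the ten hand-written members would
be an array indexed by an enum, along these lines (purely a sketch of the
shape, not a worked-out patch):

typedef enum JsonCoercionTarget
{
    JSON_COERCION_NULL,
    JSON_COERCION_STRING,
    /* ... numeric, boolean, date, time, timetz, timestamp,
     * timestamptz, composite ... */
    NUM_JSON_COERCION_TARGETS
} JsonCoercionTarget;

typedef struct JsonItemCoercionState
{
    JsonCoercion *coercion;     /* coercion expression */
    ExprState  *estate;         /* coercion expression state */
} JsonItemCoercionState;

/* in the parent expression state: */
JsonItemCoercionState coercions[NUM_JSON_COERCION_TARGETS];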


> The attached very small patch applies on top of your 0002 and deals with
> the FmgrInfo complaint.

Now that the FmgrInfo is part of a separately allocated struct, that doesn't
seem necessary anymore.

- Andres




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2022-07-05 Thread Robert Haas
On Sat, Jun 4, 2022 at 8:59 AM James Coleman  wrote:
> A quick background refresher: after promoting a standby rewinding the
> former primary requires that a checkpoint have been completed on the
> new primary after promotion. This is correctly documented. However
> pg_rewind incorrectly reports to the user that a rewind isn't
> necessary because the source and target are on the same timeline.

Is there anything intrinsic to the mechanism of operation of pg_rewind
that requires a timeline change, or could we just rewind within the
same timeline to an earlier LSN? In other words, maybe we could just
remove this limitation of pg_rewind, and then perhaps it wouldn't be
necessary to determine what the new timeline is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b2: large objects lost on upgrade

2022-07-05 Thread Robert Haas
On Tue, Jul 5, 2022 at 12:56 PM Justin Pryzby  wrote:
> My patch also leaves a 0 byte file around from initdb, which is harmless, but
> dirty.
>
> I've seen before where a bunch of 0 byte files are abandoned in an
> otherwise-empty tablespace, with no associated relation, and I have to "rm"
> them to be able to drop the tablespace.  Maybe that's a known issue, maybe 
> it's
> due to crashes or other edge case, maybe it's of no consequence, and maybe 
> it's
> already been fixed or being fixed already.  But it'd be nice to avoid another
> way to have a 0 byte files - especially ones named with system OIDs.

Do you want to add something to the patch to have pg_upgrade remove
the stray file?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Robert Haas
On Sun, Jul 3, 2022 at 1:17 PM Andres Freund  wrote:
> Yea, I guess I should start a documentation section...
>
> I've only used homebrew on mac, but with that it should be something along the
> lines of
>
> brew install meson
> meson setup --buildtype debug -Dcassert=true build-directory
> cd build-directory
> ninja
>
> of course if you want to build against some dependencies and / or run tap
> tests, you need to do something similar to what you have to do for
> configure. I.e.
> - install perl modules [1]
> - tell the build about location of homebrew [2]

Since I'm a macports user I hope at some point we'll have directions
for that as well as for homebrew.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Emit extra debug message when executing extension script.

2022-07-05 Thread Robert Haas
On Mon, Jul 4, 2022 at 5:27 AM Alvaro Herrera  wrote:
> On 2022-Jun-29, Robert Haas wrote:
> > On Wed, Jun 29, 2022 at 9:26 AM Peter Eisentraut
> >  wrote:
> > > On 28.06.22 21:10, Jeff Davis wrote:
> > > > + ereport(DEBUG1, errmsg("executing extension script: %s", 
> > > > filename));
> > >
> > > This should either be elog or use errmsg_internal.
> >
> > Why?
>
> The reason is that errmsg() marks the message for translation, and we
> don't want to burden translators with messages that are of little
> interest to most users.  Using either elog() or errmsg_internal()
> avoids that.

Yeah, I'm aware of that in general, but I'm not quite clear on how we
decide that. Do we take the view that all debug-level messages need
not be translated?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2022-07-05 Thread James Coleman
On Tue, Jul 5, 2022 at 2:39 PM Robert Haas  wrote:
>
> On Sat, Jun 4, 2022 at 8:59 AM James Coleman  wrote:
> > A quick background refresher: after promoting a standby rewinding the
> > former primary requires that a checkpoint have been completed on the
> > new primary after promotion. This is correctly documented. However
> > pg_rewind incorrectly reports to the user that a rewind isn't
> > necessary because the source and target are on the same timeline.
>
> Is there anything intrinsic to the mechanism of operation of pg_rewind
> that requires a timeline change, or could we just rewind within the
> same timeline to an earlier LSN? In other words, maybe we could just
> remove this limitation of pg_rewind, and then perhaps it wouldn't be
> necessary to determine what the new timeline is.

I think (someone can correct me if I'm wrong) that in theory the
mechanisms would support the source and target being on the same
timeline, but in practice that presents problems since you'd not have
an LSN you could detect as the divergence point. If we allowed passing a
"rewind to" LSN value, then that (again, as far as I understand
it) would work, but it's a different use case. Specifically I wouldn't
want that option to need to be used for this particular case since in
my example there is in fact a real divergence point that we should be
detecting automatically.

Thanks,
James Coleman




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2022-07-05 Thread Tom Lane
Robert Haas  writes:
> Is there anything intrinsic to the mechanism of operation of pg_rewind
> that requires a timeline change, or could we just rewind within the
> same timeline to an earlier LSN? In other words, maybe we could just
> remove this limitation of pg_rewind, and then perhaps it wouldn't be
> necessary to determine what the new timeline is.

That seems like a fairly bad idea.  For example, if you've already
archived some WAL segments past the rewind target, there will shortly
be two versions of truth about what that part of the WAL space contains,
and your archiver will either spit up or do probably-the-wrong-thing.

regards, tom lane




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-05 14:42:03 -0400, Robert Haas wrote:
> On Sun, Jul 3, 2022 at 1:17 PM Andres Freund  wrote:
> > Yea, I guess I should start a documentation section...
> >
> > I've only used homebrew on mac, but with that it should be something along 
> > the
> > lines of
> >
> > brew install meson
> > meson setup --buildtype debug -Dcassert=true build-directory
> > cd build-directory
> > ninja
> >
> > of course if you want to build against some dependencies and / or run tap
> > tests, you need to do something similar to what you have to do for
> > configure. I.e.
> > - install perl modules [1]
> > - tell the build about location of homebrew [2]
> 
> Since I'm a macports user I hope at some point we'll have directions
> for that as well as for homebrew.

I am not a normal mac user; it looks hard to run macOS in a VM, and I'm not
sure it's wise to mix macports and homebrew on my test box. So I don't want to
test it myself.

But it looks like it's just
  sudo port install meson

Greetings,

Andres Freund




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2022-07-05 Thread Robert Haas
On Tue, Jul 5, 2022 at 2:47 PM Tom Lane  wrote:
> Robert Haas  writes:
> > Is there anything intrinsic to the mechanism of operation of pg_rewind
> > that requires a timeline change, or could we just rewind within the
> > same timeline to an earlier LSN? In other words, maybe we could just
> > remove this limitation of pg_rewind, and then perhaps it wouldn't be
> > necessary to determine what the new timeline is.
>
> That seems like a fairly bad idea.  For example, if you've already
> archived some WAL segments past the rewind target, there will shortly
> be two versions of truth about what that part of the WAL space contains,
> and your archiver will either spit up or do probably-the-wrong-thing.

Well, only if you void the warranty. If you rewind the ex-primary to
the LSN where the new primary is replaying and tell it to start
replaying from there and follow the new primary's subsequent switch
onto a new timeline, there's no split-brain problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: First draft of the PG 15 release notes

2022-07-05 Thread Peter Geoghegan
On Tue, Jul 5, 2022 at 11:09 AM Bruce Momjian  wrote:
>
> Actually, I was wrong.  I thought that we only mentioned that we
> computed a more agressive xid, but now see I was mentioning the _frozen_
> xid.  Reading the commit, we do compute the multi-xid and store that too
> so I have updated the PG 15 release notes with the attached patch.

It might be worth using the "symbol names" directly, since they appear
in the documentation already (under "Routine Vacuuming"). These are
relfrozenxid and relminmxid. These are implementation
details, but they're documented in detail (though admittedly the
documentation has *lots* of problems).

Here is what I would like this item to hint at, to advanced users with
tricky requirements: The new approach to setting relminmxid will
improve the behavior of VACUUM in databases that already happen to use
lots of MultiXacts. These users will notice that autovacuum now works
off of relminmxid values that actually tell us something about each
table's consumption of MultiXacts over time. Most individual tables
naturally consume *zero* MultiXacts, even in databases that consume
many MultiXacts -- due to naturally occurring workload characteristics.
The old approach failed to recognize this, leading to very uniform
relminmxid values across tables that were in fact very different,
MultiXact-wise.

The way that we handle relfrozenxid is probably much less likely to
make life much easier for any database, at least on its own, in
Postgres 15. So from the point of view of a user considering
upgrading, the impact on relminmxid is likely to be far more
important.

Admittedly the most likely scenario by far is that the whole feature
just isn't interesting, but a small minority of advanced users (users
with painful MultiXact problems) will find the relminmxid thing very
compelling.

-- 
Peter Geoghegan




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Tom Lane
Andres Freund  writes:
> On 2022-07-05 14:42:03 -0400, Robert Haas wrote:
>> Since I'm a macports user I hope at some point we'll have directions
>> for that as well as for homebrew.

> But it looks like it's just
>   sudo port install meson

Yeah, that's what I did to install it locally.  The ninja package
has some weird name (ninja-build or some such), but you don't have
to remember that because installing meson is enough to pull it in.

I dunno anything about the other steps Andres mentioned, but
presumably they're independent of where you got meson from.

regards, tom lane




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Robert Haas
On Tue, Jul 5, 2022 at 2:52 PM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2022-07-05 14:42:03 -0400, Robert Haas wrote:
> >> Since I'm a macports user I hope at some point we'll have directions
> >> for that as well as for homebrew.
>
> > But it looks like it's just
> >   sudo port install meson
>
> Yeah, that's what I did to install it locally.  The ninja package
> has some weird name (ninja-build or some such), but you don't have
> to remember that because installing meson is enough to pull it in.
>
> I dunno anything about the other steps Andres mentioned, but
> presumably they're independent of where you got meson from.

That seems simple enough that even I can handle it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2022-07-05 Thread Andrew Dunstan


On 2022-07-05 Tu 12:59, Tom Lane wrote:
> Bruce Momjian  writes:
>> This patch has been applied back to 9.6 and will appear in the next
>> minor release.
> I have just discovered that this patch broke pg_upgrade's ability
> to upgrade from 8.4:
>
> $ pg_upgrade -b ~/version84/bin -d ...
> Performing Consistency Checks
> -
> Checking cluster versions   ok
> The source cluster lacks some required control information:
>   latest checkpoint oldestXID
>
> Cannot continue without required control information, terminating
> Failure, exiting
>
> Sure enough, 8.4's pg_controldata doesn't print anything about
> oldestXID, because that info wasn't there then.
>
> Given the lack of field complaints, it's probably not worth trying
> to do anything to restore that capability.  But we really ought to
> update pg_upgrade's code and docs in pre-v15 branches to say that
> the minimum supported source version is 9.0.


So it's taken us a year to discover the issue :-( Perhaps if we're going
to say we support upgrades back to 9.0 we should have some testing to
ensure we don't unknowingly break it like this. I'll see if I can
coax crake to do that - it already tests back to 9.2.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: doc: BRIN indexes and autosummarize

2022-07-05 Thread Roberto Mello
On Tue, Jul 5, 2022 at 5:47 AM Alvaro Herrera 
wrote:

> OK, I have adopted all your proposed changes, thanks for submitting in
> both forms.  I did some more wordsmithing and pushed, to branches 12 and
> up.  11 fails 'make check', I think for lack of Docbook id tags, and I
> didn't want to waste more time.  Kindly re-read the result and let me
> know if I left something unaddressed, or made something worse.  The
> updated text is already visible in the website:
> https://www.postgresql.org/docs/devel/brin-intro.html
>

You removed the reference to the functions' documentation at
functions-admin-index, choosing instead to duplicate a summarized
version of the docs, and to boot getting the next block blobbed
together with it.

Keeping with the reduced-readability theme, you made the paragraphs
even bigger. While I do appreciate the time to clarify things a bit, as
was my original intent with the patch, we should be writing
documentation with the user in mind, not for our developer eyes.
Different target audiences. It is less helpful to have awesome features
that don't get used because users can't really grasp the docs.

Paragraphs such as this feel like we're playing "summary bingo":

 When a new page is created that does not fall within the last
 summarized range, the range that the new page belongs into
 does not automatically acquire a summary tuple;
 those tuples remain unsummarized until a summarization run is
 invoked later, creating the initial summary for that range

Roberto

--
Crunchy Data -- passion for open source PostgreSQL


Re: [PATCH] Implement INSERT SET syntax

2022-07-05 Thread Jacob Champion
On Thu, Apr 7, 2022 at 11:29 AM Marko Tiikkaja  wrote:
> I can help with review and/or other work here.  Please give me a
> couple of weeks.

Hi Marko, did you get a chance to pick up this patchset? If not, no
worries; I can mark this RwF and we can try again in a future
commitfest.

Thanks,
--Jacob




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-05 14:52:05 -0400, Tom Lane wrote:
> I dunno anything about the other steps Andres mentioned, but
> presumably they're independent of where you got meson from.

Yea. They might not be independent of where you get other dependencies from
though. Does macports install headers / libraries into a path that's found by
default? Or does one have to pass --with-includes / --with-libs to configure
and set PKG_CONFIG_PATH, like with homebrew?

Except that with meson setting PKG_CONFIG_PATH should suffice for most (all?)
dependencies on macOS, that the syntax for with-includes/libs is a bit
different (-Dextra_include_dirs=... and -Dextra_lib_dirs=...), and that
one can optionally use a parameter (--pkg-config-path) instead of
PKG_CONFIG_PATH, that part shouldn't really differ from what's necessary
for configure.

Greetings,

Andres Freund




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-07-05 Thread Andrew Dunstan


On 2022-07-05 Tu 14:36, Andres Freund wrote:
> Hi,
>
> On 2022-06-24 10:29:06 -0400, Andrew Dunstan wrote:
>> On 2022-06-23 Th 21:51, Andres Freund wrote:
>>> On 2022-06-23 16:38:12 +0900, Michael Paquier wrote:
 On Tue, Jun 21, 2022 at 05:41:07PM -0400, Andrew Dunstan wrote:
> Yes, but I don't guarantee to have a fix in time for Beta2.
 IMHO, it would be nice to get something done for beta2.  Now the
 thread is rather fresh and I guess that more performance study is
 required even for 0004, so..
>>> I don't think there's a whole lot of performance study needed for 0004 - the
>>> current state is obviously wrong.
>>>
>>> I think Andrew's beta 2 comment was more about my other architectural
>>> complains around the json expression eval stuff.
>>
>> Right. That's being worked on but it's not going to be a mechanical fix.
> Any updates here?


Not yet. A colleague and I are working on it. I'll post a status this
week if we can't post a fix.


>
> I'd mentioned the significant space use due to all JsonCoercionsState for all
> the types. Another related aspect is that this code is just weird - the same
> struct name (JsonCoercionsState), nested in each other?
>
> struct JsonCoercionsState
> {
> struct JsonCoercionState
> {
> JsonCoercion *coercion; /* coercion expression */
> ExprState  *estate; /* coercion expression state */
> }   null,
> string,
> numeric,
> boolean,
> date,
> time,
> timetz,
> timestamp,
> timestamptz,
> composite;
> }   coercions;  /* states for coercion from SQL/JSON item
>  * types directly to the output type */
>
> Also note the weird numeric indentation that pgindent does...


Yeah, we'll try to fix that.
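
For illustration only, one possible direction (a sketch, not necessarily
what we'll end up with) would be to flatten the nesting into a single
array indexed by a per-item-type enum, which would also avoid reusing the
struct name:

    /* sketch: hypothetical names, not existing code */
    typedef enum JsonItemCoercionKind
    {
        JSON_COERCION_NULL = 0,
        JSON_COERCION_STRING,
        JSON_COERCION_NUMERIC,
        JSON_COERCION_BOOLEAN,
        JSON_COERCION_DATE,
        JSON_COERCION_TIME,
        JSON_COERCION_TIMETZ,
        JSON_COERCION_TIMESTAMP,
        JSON_COERCION_TIMESTAMPTZ,
        JSON_COERCION_COMPOSITE
    } JsonItemCoercionKind;

    typedef struct JsonItemCoercionState
    {
        JsonCoercion *coercion;     /* coercion expression */
        ExprState  *estate;         /* coercion expression state */
    } JsonItemCoercionState;

    /* one flat array replaces the ten named members */
    JsonItemCoercionState coercions[JSON_COERCION_COMPOSITE + 1];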


>
>
>> The attached very small patch applies on top of your 0002 and deals with
>> the FmgrInfo complaint.
> Now that the FmgrInfo is part of a separately allocated struct, that doesn't
> seem necessary anymore.


Right, but you complained that we should do it the same way as it's done
elsewhere, so I thought I'd do that anyway.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Tom Lane
Andres Freund  writes:
> Yea. They might not be independent of where you get other dependencies from
> though. Does macports install headers / libraries into a path that's found by
> default? Or does one have to pass --with-includes / --with-libs to configure
> and set PKG_CONFIG_PATH, like with homebrew?

What are you expecting to need PKG_CONFIG_PATH for?  Or more precisely,
why would meson/ninja create any new need for that that doesn't exist
in the autoconf case?

regards, tom lane




Re: [PATCH] Fix pg_upgrade test from v10

2022-07-05 Thread Justin Pryzby
On Tue, Jul 05, 2022 at 09:01:49AM +0300, Anton A. Melnikov wrote:
> On 01.07.2022 20:07, Justin Pryzby wrote:
> > It's silly to say that v9.2 will be supported potentially for a handful more
> > years, but that the upgrade-testing script itself doesn't support that, so
> > developers each have to reinvent its fixups.
> 
> I've tested the attached patch in all variants from v9.5..15 to supported
> versions 10..master. The script test.sh for 9.5->10 and 9.6->10 upgrades
> works fine without any patch.
> In 9.4 there is a regress test, largeobject, to be patched to allow upgrade
> tests from this version. So I've stopped at 9.5.
> It is clear that we limit the destination version for upgrade tests to the
> supported versions only. In our case destination versions
> start from the 10th inclusively.
> But is there a limit on the source version to upgrade from?

As of last year, there's a reasonably clear policy for support of old versions:

https://www.postgresql.org/docs/devel/pgupgrade.html
|pg_upgrade supports upgrades from 9.2.X and later to the current major
|release of PostgreSQL, including snapshot and beta releases.

See: e469f0aaf3c586c8390bd65923f97d4b1683cd9f

So it'd be ideal if this were to support versions down to 9.2.

This is failing in cfbot:
http://cfbot.cputube.org/anton-melnikov.html

..since it tries to apply all the *.patch files to the master branch, one after
another.  For branches other than master, I suggest to name the patches *.txt
or similar.  Or, just focus for now on allowing upgrades *to* master.  I'm not
sure if anyone is interested in patching test.sh in backbranches; there may
be more interest in backpatching the conversion to TAP
(322becb60).

-- 
Justin




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-07-05 Thread Andres Freund
Hi,

On 2022-06-23 18:51:45 -0700, Andres Freund wrote:
> > Waiting for beta3 would a better move at this stage.  Is somebody confident
> > enough in the patches proposed?
> 
> 0001 is the one that needs the most careful analysis, I think. 0002 I'd be fine
> with pushing after reviewing it again. For 0003 David's approach might be
> better or worse, it doesn't matter much I think. 0004 is ok I think, perhaps
> with the exception of quibbling over some naming decisions?

I don't quite feel comfortable with 0001, without review by others. So my
current plan is to drop it and use get_timeout_active() "manually". We can
improve this in HEAD to remove the redundancy.

I've pushed what was 0004, will push what was 0002 with the above change in a
short while unless somebody protests PDQ. Then will look at David's edition of
my 0003.

Greetings,

Andres Freund




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-05 15:06:31 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Yea. They might not be independent of where you get other dependencies from
> > though. Does macports install headers / libraries into a path that's found 
> > by
> > default? Or does one have to pass --with-includes / --with-libs to configure
> > and set PKG_CONFIG_PATH, like with homebrew?
> 
> What are you expecting to need PKG_CONFIG_PATH for?  Or more precisely,
> why would meson/ninja create any new need for that that doesn't exist
> in the autoconf case?

It's just used in more cases than before, with fallback to non-pkg-config in
most cases. I think all dependencies besides perl can use pkg-config. So all
that changes compared to AC is that you might not need to pass extra
include/lib paths for some dependencies that needed it before, if you set/pass
PKG_CONFIG_PATH.

Greetings,

Andres Freund




Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2022-07-05 Thread Tom Lane
Andrew Dunstan  writes:
> So it's taken us a year to discover the issue :-( Perhaps if we're going
> to say we support upgrades back to 9.0 we should have some testing to be
> assured we don't break it without knowing like this. I'll see if I can
> coax crake to do that - it already tests back to 9.2.

Hmm ... could you first look into why 09878cdd4 broke it?  I'd supposed
that that was just detecting situations we must already have dealt with
in order for the pg_upgrade test to work, but crake's not happy.

regards, tom lane




Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-07-05 Thread Przemysław Sztoch

Michael Paquier wrote on 7/5/2022 9:22 AM:
> On Tue, Jun 28, 2022 at 02:14:53PM +0900, Michael Paquier wrote:
>> Well, the addition of cyrillic does not make necessary the removal of
>> SOUND RECORDING COPYRIGHT or the DEGREEs, that implies the use of a
>> dictionnary when manipulating the set of codepoints, but that's me
>> being too picky.  Just to say that I am fine with what you are
>> proposing here.
>
> So, I have been looking at the change for cyrillic letters, and are
> you sure that the range of codepoints [U+0410,U+044f] is right when it
> comes to consider all those letters as plain letters?  There are a
> couple of characters that itch me a bit with this range:
> - What of the letter CAPITAL SHORT I (U+0419) and SMALL SHORT I
> (U+0439)?  Shouldn't U+0439 be translated to U+0438 and U+0419
> translated to U+0418?  That's what I get while looking at
> UnicodeData.txt, and it would mean that the range of plain letters
> should not include both of them.

1. It's good that you noticed it. I missed it. But it doesn't affect
the generated rule list.

> - It seems like we are missing a couple of letters after U+044F, like
> U+0454, U+0456 or U+0455 just to name three of them?

2. I added a few more letters that are used in languages other than
Russian: Byelorussian or Ukrainian.

-   (0x0410, 0x044f),  # Cyrillic capital and small letters

+   (0x0402, 0x0402),  # Cyrillic capital and small letters
+   (0x0404, 0x0406),  #
+   (0x0408, 0x040b),  #
+   (0x040f, 0x0418),  #
+   (0x041a, 0x0438),  #
+   (0x043a, 0x044f),  #
+   (0x0452, 0x0452),  #
+   (0x0454, 0x0456),  #

I do not add more, because they probably concern older languages.
An alternative might be to rely entirely on Unicode decomposition ...
However, after the change, only one additional Ukrainian letter with
an accent was added to the rule file.

> I have extracted from 0001 and applied the parts about the regression
> tests for degree signs, while adding two more for SOUND RECORDING
> COPYRIGHT (U+2117) and Black-Letter Capital H (U+210C) translated to
> 'x', while it should be probably 'H'.

3. The matter is not that simple. When I change priorities (i.e.
Latin-ASCII.xml is less important than Unicode decomposition), then
"U+33D7" changes not to pH but to PH. In the end, I left it like it
was before ...

If you decide what to do with point 3, I will correct it and send
new patches.


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Emit extra debug message when executing extension script.

2022-07-05 Thread Alvaro Herrera
On 2022-Jul-05, Robert Haas wrote:

> On Mon, Jul 4, 2022 at 5:27 AM Alvaro Herrera  wrote:
> > On 2022-Jun-29, Robert Haas wrote:

> > > Why?
> >
> > The reason is that errmsg() marks the message for translation, and we
> > don't want to burden translators with messages that are of little
> > interest to most users.  Using either elog() or errmsg_internal()
> > avoids that.
> 
> Yeah, I'm aware of that in general, but I'm not quite clear on how we
> decide that. Do we take the view that all debug-level messages need
> not be translated?

Yes.  I don't know about others, but I do.
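
The practical difference between the three spellings is only in what the
gettext message extraction sees; a minimal illustration, reusing the
message from upthread:

    /* errmsg() marks the string for translation; it lands in the .po catalogs */
    ereport(DEBUG1,
            (errmsg("executing extension script: %s", filename)));

    /* same runtime behavior, but invisible to the translation tooling */
    ereport(DEBUG1,
            (errmsg_internal("executing extension script: %s", filename)));

    /* elog() format strings are never extracted for translation either */
    elog(DEBUG1, "executing extension script: %s", filename);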

I notice that we have a small number of other errmsg() uses in DEBUG
messages already.  I don't think they're quite worth it.  I mean, would
a user ever run amcheck with log level set to DEBUG, and care about
any of these messages?  I think I wouldn't care.

contrib/amcheck/verify_heapam.c:        ereport(DEBUG1,
contrib/amcheck/verify_heapam.c-                (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
contrib/amcheck/verify_heapam.c-                 errmsg("cannot verify unlogged relation \"%s\" during recovery, skipping",

contrib/amcheck/verify_nbtree.c:        ereport(DEBUG1,
contrib/amcheck/verify_nbtree.c-                (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
contrib/amcheck/verify_nbtree.c-                 errmsg("cannot verify unlogged index \"%s\" during recovery, skipping",

Why are these basic_archive messages translatable?  Seems pointless.

contrib/basic_archive/basic_archive.c:  ereport(DEBUG3,
contrib/basic_archive/basic_archive.c-          (errmsg("archiving \"%s\" via basic_archive", file)));

contrib/basic_archive/basic_archive.c:  ereport(DEBUG3,
contrib/basic_archive/basic_archive.c-          (errmsg("archive file \"%s\" already exists with identical contents",

contrib/basic_archive/basic_archive.c:  ereport(DEBUG1,
contrib/basic_archive/basic_archive.c-          (errmsg("archived \"%s\" via basic_archive", file)));


We also have a small number in the backend:

src/backend/access/heap/vacuumlazy.c:   ereport(DEBUG2,
src/backend/access/heap/vacuumlazy.c-           (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
src/backend/access/heap/vacuumlazy.c-                   vacrel->relname, (long long) index, vacuumed_pages)));

Why is this one unconditionally DEBUG2 instead of depending on
LVRelState->message_level (i.e. turning into INFO when VERBOSE is used),
like every other message from vacuum?  It seems to me that if I say
VERBOSE, then I may be interested in this message also.  While at it,
why are the index vacuuming routines not using LVRelState->message_level
either, instead hardcoding DEBUG2?  Aren't they all mistakes?
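
For comparison, following the pattern of the other vacuum messages would
look something like this sketch, going by the field named above:

    ereport(vacrel->message_level,  /* DEBUG2 normally, INFO under VERBOSE */
            (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
                    vacrel->relname, (long long) index, vacuumed_pages)));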


src/backend/replication/logical/worker.c:       ereport(DEBUG1,
src/backend/replication/logical/worker.c-               (errmsg("logical replication apply worker for subscription \"%s\" two_phase is %s",
src/backend/replication/logical/worker.c-                       MySubscription->name,

Not sure why anybody cares about this.  Why not remove the message?

src/backend/utils/activity/pgstat.c:    ereport(DEBUG2,
src/backend/utils/activity/pgstat.c-            (errcode_for_file_access(),
src/backend/utils/activity/pgstat.c-             errmsg("unlinked permanent statistics file \"%s\"",

Hmm, why is there an errcode here, after the operation succeeds?  ISTM
this could be an elog().


Then we have this one:

        ereport(DEBUG1,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("picksplit method for column %d of index \"%s\" failed",
                        attno + 1, RelationGetRelationName(r)),
                 errhint("The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command.")));

I cannot understand how is DEBUG1 a useful log level for this message.
How is the user going to find out that there is a problem, when this
message is hidden from them?  Do we tell people to run their insert
queries for tables with GiST indexes under DEBUG1, in case the picksplit
method fails, so that they can contact a developer?  How many
user-defined picksplit methods have been improved to cope with this
problem, since commit 09368d23dbf4 added this bit in April 2009?  How
many of them have been presenting the problem since then, and not been
fixed because nobody has noticed that there is a problem?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2022-07-05 Thread Peter Geoghegan
On Tue, Jul 5, 2022 at 11:53 AM Andrew Dunstan  wrote:
> > Sure enough, 8.4's pg_controldata doesn't print anything about
> > oldestXID, because that info wasn't there then.
> >
> > Given the lack of field complaints, it's probably not worth trying
> > to do anything to restore that capability.  But we really ought to
> > update pg_upgrade's code and docs in pre-v15 branches to say that
> > the minimum supported source version is 9.0.
>
>
> So it's taken us a year to discover the issue :-(

I'm not surprised at all, given the history here. There were at least
a couple of bugs affecting how pg_upgrade carries forward information
about these cutoffs. See commits 74cf7d46 and a61daa14.

Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u
argument was first added, for use by pg_upgrade. That commit is only
about a year old, and was only backpatched to 9.6. Unfortunately the
previous approach to carrying forward oldestXID was an accident that
usually worked. So...yeah, things are bad here. At least we now have
the ability to detect any downstream problems that this might cause by
using pg_amcheck.


--
Peter Geoghegan




Re: pg_checkpointer is not a verb or verb phrase

2022-07-05 Thread Nathan Bossart
On Tue, Jul 05, 2022 at 01:38:43PM -0400, Robert Haas wrote:
> Hearing several votes in favor and none opposed, committed and
> back-patched to v15.

Thanks.

> I added the catversion bump, but left out the .po
> file changes, figuring it was better to let those files get updated
> via the normal process.

I'll keep this in mind for future patches.  The changes looked pretty
obvious, so I wasn't sure whether to include them.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: doc: BRIN indexes and autosummarize

2022-07-05 Thread Alvaro Herrera
On 2022-Jul-05, Roberto Mello wrote:

> You removed the reference to the functions' documentation at
> functions-admin-index choosing instead to duplicate a summarized
> version of the docs, and to boot getting the next block to be blobbed
> together with it.

Actually, my first instinct was to move the interesting parts to the
functions docs, then reference those, removing the duplicate bits.  But
I was discouraged when I read it, because it is just a table in a place
not really appropriate for a larger discussion on it.  Also, a reference
to it is not direct, but rather it goes to a table that contains a lot
of other stuff.

> Keeping with the reduced-readability theme, you made the paragraphs
> even bigger. While I do appreciate the time to clarify things a bit, as was
> my original intent with the patch, [...]

Hmm, which paragraph are you referring to?  I'm not aware of having made
any paragraph bigger, quite the opposite.  In the original text, the
paragraph "At the time of creation," is 13 lines on a browser window
that is half the screen; in the patched text, that has been replaced by
three paragraphs that are 7, 6, and 4 lines long, plus a separate one
for the de-summarization bits at the end of the page, which is 3 lines
long.

> We should be writing documentation with the user in mind, not for our
> developer eyes. Different target audiences. It is less helpful to have
> awesome features that don't get used because users can't really
> grasp the docs.

I try to do that.  I guess I fail more frequently than I should.

> Paragraphs such as this feel like we're playing "summary bingo":
> 
>  When a new page is created that does not fall within the last
>  summarized range, the range that the new page belongs into
>  does not automatically acquire a summary tuple;
>  those tuples remain unsummarized until a summarization run is
>  invoked later, creating the initial summary for that range

Yeah, I am aware that the word "summary" and variations occur way too
many times.  Maybe it is possible to replace "summary tuple" with "BRIN
tuple" for example; can you propose some synonym for "summarized" and
"unsummarized"?  Perhaps something like this:

>  When a new page is created that does not fall within the last
>  summarized range, the range that the new page belongs into
>  does not automatically acquire a BRIN tuple;
>  those [pages] remain uncovered by the BRIN index until a summarization 
> run is
>  invoked later, creating the initial BRIN tuple for that range

(I also replaced the word "tuples" with "pages" in one spot.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)




Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2022-07-05 Thread Peter Geoghegan
On Tue, Jul 5, 2022 at 12:41 PM Peter Geoghegan  wrote:
> Actually, commit 74cf7d46 was where pg_resetxlog/pg_resetwal's -u
> argument was first added, for use by pg_upgrade. That commit is only
> about a year old, and was only backpatched to 9.6.

I just realized that this thread was where that work was first
discussed. That explains why it took a year to discover that we broke
8.4!

On further reflection I think that breaking pg_upgrade for 8.4 might
have been a good thing. The issue was fairly visible and obvious if
you actually ran into it, which is vastly preferable to what would
have happened before commit 74cf7d46.

-- 
Peter Geoghegan




Re: "ERROR: latch already owned" on gharial

2022-07-05 Thread Robert Haas
On Sun, Jul 3, 2022 at 11:51 PM Thomas Munro  wrote:
> On Wed, Jun 1, 2022 at 12:55 AM Robert Haas  wrote:
> > OK, I have access to the box now. I guess I might as well leave the
> > crontab jobs enabled until the next time this happens, since Thomas
> > just took steps to improve the logging, but I do think these BF
> > members are overdue to be killed off, and would like to do that as
> > soon as it seems like a reasonable step to take.
>
> A couple of months later, there has been no repeat of that error.  I'd
> happily forget about that and move on, if you want to decommission
> these.

I have commented out the BF stuff in crontab on that machine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: doc: BRIN indexes and autosummarize

2022-07-05 Thread Justin Pryzby
On Tue, Jul 05, 2022 at 01:47:27PM +0200, Alvaro Herrera wrote:
> OK, I have adopted all your proposed changes, thanks for submitting in
> both forms.  I did some more wordsmithing and pushed, to branches 12 and
> up.  11 fails 'make check', I think for lack of Docbook id tags, and I
> didn't want to waste more time.  Kindly re-read the result and let me
> know if I left something unaddressed, or made something worse.  The
> updated text is already visible in the website:
> https://www.postgresql.org/docs/devel/brin-intro.html

One issue:

+   summarized range, the range that the new page belongs into
+   does not automatically acquire a summary tuple;

"belongs into" sounds wrong - "belongs to" is better.

I'll put that change into my "typos" branch to fix later if it's not addressed
in this thread.

-- 
Justin




Re: First draft of the PG 15 release notes

2022-07-05 Thread Bruce Momjian
On Tue, Jul  5, 2022 at 11:51:31AM -0700, Peter Geoghegan wrote:
> On Tue, Jul 5, 2022 at 11:09 AM Bruce Momjian  wrote:
> >
> > Actually, I was wrong.  I thought that we only mentioned that we
> > computed a more aggressive xid, but now see I was mentioning the _frozen_
> > xid.  Reading the commit, we do compute the multi-xid and store that too
> > so I have updated the PG 15 release notes with the attached patch.
> 
> It might be worth using the "symbol names" directly, since they appear
> in the documentation already (under "Routine Vacuuming"). These are
> relfrozenxid and relminmxid. These are implementation
> details, but they're documented in detail (though admittedly the
> documentation has *lots* of problems).

Well, users can look into the details if they wish.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: First draft of the PG 15 release notes

2022-07-05 Thread Noah Misch
On Tue, Jul 05, 2022 at 02:35:39PM -0400, Bruce Momjian wrote:
> On Fri, Jul  1, 2022 at 06:21:28PM -0700, Noah Misch wrote:
> > Here's what I've been trying to ask: what do you think of linking to
> > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
> > here?  The release note text is still vague, and the docs have extensive
> > coverage of the topic.  The notes can just link to that extensive coverage.
> 
> Sure. how is this patch?

> --- a/doc/src/sgml/release-15.sgml
> +++ b/doc/src/sgml/release-15.sgml
> @@ -63,11 +63,12 @@ Author: Noah Misch 
>permissions on the public schema has not
>been changed.  Databases restored from previous Postgres releases
>will be restored with their current permissions.  Users wishing
> -  to have the former permissions will need to grant
> +  to have the former more-open permissions will need to grant
>CREATE permission for PUBLIC
>on the public schema; this change can be made
>on template1 to cause all new databases
> -  to have these permissions.
> +  to have these permissions.  This change was made to increase
> +  security;  see .
>   
>  

I think this still puts undue weight on single-user systems moving back to the
old default.  The linked documentation does say how to get back to v14
permissions (and disclaims security if you do so), so let's not mention it
here.  The attached is how I would write it.  I also reworked the "Databases
restored from previous ..." sentence, since its statement is also true of
databases restored v15-to-v15 (no "previous" release involved).  I also moved
the bit about USAGE to end, since it's just emphasizing what the reader should
already assume.  Any concerns?
Author: Noah Misch 
Commit: Noah Misch 



diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index 179ad37..aa02ee9 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -58,16 +58,15 @@ Author: Noah Misch 
  
 
  
-  This is a change in the default for newly-created databases in
-  existing clusters and for new clusters;  USAGE
-  permissions on the public schema has not
-  been changed.  Databases restored from previous Postgres releases
-  will be restored with their current permissions.  Users wishing
-  to have the former permissions will need to grant
-  CREATE permission for PUBLIC
-  on the public schema; this change can be made
-  on template1 to cause all new databases
-  to have these permissions.
+  The new default is one of the secure schema usage patterns that
+   has recommended since the
+  security release for CVE-2018-1058.  Upgrading a cluster or restoring a
+  database dump will preserve existing permissions.  This is a change in
+  the default for newly-created databases in existing clusters and for new
+  clusters.  In existing databases, especially those having multiple
+  users, consider issuing a REVOKE to adopt this new
+  default.  (USAGE permission on this schema has not
+  changed.)
  
 
 


Re: First draft of the PG 15 release notes

2022-07-05 Thread Bruce Momjian
On Tue, Jul  5, 2022 at 12:53:49PM -0700, Noah Misch wrote:
> On Tue, Jul 05, 2022 at 02:35:39PM -0400, Bruce Momjian wrote:
> > On Fri, Jul  1, 2022 at 06:21:28PM -0700, Noah Misch wrote:
> > > Here's what I've been trying to ask: what do you think of linking to
> > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
> > > here?  The release note text is still vague, and the docs have extensive
> > > coverage of the topic.  The notes can just link to that extensive 
> > > coverage.
> > 
> > Sure. how is this patch?
> 
> > --- a/doc/src/sgml/release-15.sgml
> > +++ b/doc/src/sgml/release-15.sgml
> > @@ -63,11 +63,12 @@ Author: Noah Misch 
> >permissions on the public schema has not
> >been changed.  Databases restored from previous Postgres releases
> >will be restored with their current permissions.  Users wishing
> > -  to have the former permissions will need to grant
> > +  to have the former more-open permissions will need to grant
> >CREATE permission for PUBLIC
> >on the public schema; this change can be made
> >on template1 to cause all new databases
> > -  to have these permissions.
> > +  to have these permissions.  This change was made to increase
> > +  security;  see .
> >   
> >  
> 
> I think this still puts undue weight on single-user systems moving back to the
> old default.  The linked documentation does say how to get back to v14
> permissions (and disclaims security if you do so), so let's not mention it
> here.  The attached is how I would write it.  I also reworked the "Databases
> restored from previous ..." sentence, since its statement is also true of
> databases restored v15-to-v15 (no "previous" release involved).  I also moved
> the bit about USAGE to end, since it's just emphasizing what the reader should
> already assume.  Any concerns?

I see where you are going --- to talk about how to convert upgraded
clusters to secure clusters, rather than how to revert to the previous
behavior.  I assumed that the most common question would be how to get
the previous behavior, rather than how to get the new behavior in
upgraded clusters.  However, I am fine with what you think is best.

My only stylistic suggestion would be to remove "a" from "a
REVOKE".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: making relfilenodes 56 bits

2022-07-05 Thread Robert Haas
On Fri, Jul 1, 2022 at 6:42 AM Dilip Kumar  wrote:
> > - I might be missing something here, but this isn't actually making
> > the relfilenode 56 bits, is it? The reason to do that is to make the
> > BufferTag smaller, so I expected to see that BufferTag either used
> > bitfields like RelFileNumber relNumber:56 and ForkNumber forkNum:8, or
> > else that it just declared a single field for both as uint64 and used
> > accessor macros or static inlines to separate them out. But it doesn't
> > seem to do either of those things, which seems like it can't be right.
> > On a related note, I think it would be better to declare RelFileNumber
> > as an unsigned type even though we have no use for the high bit; we
> > have, equally, no use for negative values. It's easier to reason about
> > bit-shifting operations with unsigned types.
>
> Opps, somehow missed to merge that change in the patch.  Changed that
> like below and adjusted the macros.
> typedef struct buftag
> {
>     Oid         spcOid;             /* tablespace oid. */
>     Oid         dbOid;              /* database oid. */
>     uint32      relNumber_low;      /* relfilenumber 32 lower bits */
>     uint32      relNumber_hi:24;    /* relfilenumber 24 high bits */
>     uint32      forkNum:8;          /* fork number */
>     BlockNumber blockNum;           /* blknum relative to begin of reln */
> } BufferTag;
>
> I think we need to break like this to keep the BufferTag 4 byte
> aligned otherwise the size of the structure will be increased.

Well, I guess you're right. That's a bummer. In that case I'm a little
unsure whether it's worth using bit fields at all. Maybe we should
just write uint32 something[2] and use macros after that.
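
Something like this, say (a sketch with made-up names, just to show the
shape of it; the masks assume the fork number is non-negative):

    typedef struct buftag
    {
        Oid         spcOid;             /* tablespace oid */
        Oid         dbOid;              /* database oid */
        uint32      relForkDetails[2];  /* 56-bit relNumber + 8-bit forkNum */
        BlockNumber blockNum;           /* blknum relative to begin of reln */
    } BufferTag;

    #define BUFTAG_RELNUM_HI_MASK   0x00FFFFFF

    /* word 0: high 24 bits of relNumber, fork number in the top byte */
    static inline void
    BufTagSetRelForkDetails(BufferTag *tag, uint64 relNumber,
                            ForkNumber forkNum)
    {
        tag->relForkDetails[0] =
            ((uint32) (relNumber >> 32) & BUFTAG_RELNUM_HI_MASK) |
            ((uint32) forkNum << 24);
        tag->relForkDetails[1] = (uint32) relNumber;    /* low 32 bits */
    }

    static inline uint64
    BufTagGetRelNumber(const BufferTag *tag)
    {
        return (((uint64) (tag->relForkDetails[0] & BUFTAG_RELNUM_HI_MASK)) << 32) |
            tag->relForkDetails[1];
    }

    static inline ForkNumber
    BufTagGetForkNum(const BufferTag *tag)
    {
        return (ForkNumber) (tag->relForkDetails[0] >> 24);
    }

That keeps the tag 4-byte aligned, same as the bit-field version.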

Another approach could be to accept the padding and define a constant
SizeOfBufferTag and use that as the hash table element size, like we
do for the sizes of xlog records.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




symbol "conflicts" between libpq and binaries

2022-07-05 Thread Andres Freund
Hi,

There's a number of symbols that are exported by libpq that are also in
binaries (mostly via pgport). That strikes me as not great, because the
behaviour in those cases isn't particularly well defined, and is OS /
linker-option dependent.

How about renaming those functions, turning the functions exported by libpq
into wrappers?

This is at least:

pqsignal
pg_char_to_encoding
pg_valid_server_encoding_id
pg_valid_server_encoding
pg_encoding_to_char
pg_utf_mblen

I'm not quite sure why we export some of these, but we likely don't want to
change that, given the API/ABI break that'd cause.
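
Concretely, for pqsignal it could look roughly like this (pg_pqsignal is a
made-up name for the renamed symbol, just for illustration):

    /* src/port/pqsignal.c: the implementation moves to a renamed symbol */
    pqsigfunc
    pg_pqsignal(int signo, pqsigfunc func)
    {
        /* ... existing sigaction-based implementation, unchanged ... */
    }

    /* libpq only: keep exporting the historical name as a thin wrapper,
     * so the ABI doesn't change */
    pqsigfunc
    pqsignal(int signo, pqsigfunc func)
    {
        return pg_pqsignal(signo, func);
    }

Binaries would then link against pg_pqsignal directly, so the name
exported by libpq could no longer collide with the one in pgport.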

Greetings,

Andres Freund




Re: making relfilenodes 56 bits

2022-07-05 Thread Robert Haas
On Tue, Jul 5, 2022 at 4:33 AM Dilip Kumar  wrote:
> I thought about this comment from Robert
> > that's not quite the same as either of those things. For example, in
> > tableam.h we currently say "This callback needs to create a new
> > relation filenode for `rel`" and how should that be changed in this
> > new naming? We're not creating a new RelFileNumber - those would need
> > to be allocated, not created, as all the numbers in the universe exist
> > already. Neither are we creating a new locator; that sounds like it
> > means assembling it from pieces.
>
> I think that "This callback needs to create a new relation storage
> for `rel`" looks better.

I like the idea, but it would sound better to say "create new relation
storage" rather than "create a new relation storage."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PSA: Autoconf has risen from the dead

2022-07-05 Thread Robert Haas
On Tue, Jul 5, 2022 at 3:02 PM Andres Freund  wrote:
> Yea. They might not be independent of where you get other dependencies from
> though. Does macports install headers / libraries into a path that's found by
> default? Or does one have to pass --with-includes / --with-libs to configure
> and set PKG_CONFIG_PATH, like with homebrew?

My configure switches include: --with-libraries=/opt/local/lib
--with-includes=/opt/local/include

I don't do anything with PKG_CONFIG_PATH.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_checkpointer is not a verb or verb phrase

2022-07-05 Thread Robert Haas
On Tue, Jul 5, 2022 at 3:42 PM Nathan Bossart  wrote:
> On Tue, Jul 05, 2022 at 01:38:43PM -0400, Robert Haas wrote:
> > Hearing several votes in favor and none opposed, committed and
> > back-patched to v15.
>
> Thanks.
>
> > I added the catversion bump, but left out the .po
> > file changes, figuring it was better to let those files get updated
> > via the normal process.
>
> I'll keep this in mind for future patches.  The changes looked pretty
> obvious, so I wasn't sure whether to include them.

I believe Peter periodically runs a script that bulk copies everything
over from the translation repository.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2022-07-05 Thread Andrew Dunstan


On 2022-07-05 Tu 15:17, Tom Lane wrote:
> Andrew Dunstan  writes:
>> So it's taken us a year to discover the issue :-( Perhaps if we're going
>> to say we support upgrades back to 9.0 we should have some testing to be
>> assured we don't break it without knowing like this. I'll see if I can
>> coax crake to do that - it already tests back to 9.2.
> Hmm ... could you first look into why 09878cdd4 broke it?  I'd supposed
> that that was just detecting situations we must already have dealt with
> in order for the pg_upgrade test to work, but crake's not happy.


It's complaining about this:


andrew@emma:HEAD $ cat ./inst/REL9_6_STABLE-20220705T160820.039/incompatible_polymorphics.txt
In database: regression
  aggregate: public.first_el_agg_f8(double precision)

I can have TestUpgradeXVersion.pm search for and remove offending
functions, if that's the right fix.

I note too that drongo is failing similarly, but its pg_upgrade output
directory is missing, so 4fff78f009 seems possibly shy of a load w.r.t.
MSVC. I will investigate.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Issue with pg_stat_subscription_stats

2022-07-05 Thread Andres Freund
Hi,

On 2022-07-04 11:01:01 +0900, Masahiko Sawada wrote:
> I've attached the patch, fix_drop_subscriptions_stats.patch, to fix it.

LGTM. Unless somebody sees a reason not to, I'm planning to commit that to 15
and HEAD.


> I've also attached another PoC patch,
> poc_create_subscription_stats.patch, to create the stats entry when
> creating the subscription, which address the issue reported in this
> thread; the pg_stat_reset_subscription_stats() doesn't update the
> stats_reset if no error is reported yet.

It'd be good for this to include a test.


> diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
> index e1072bd5ba..ef318b7422 100644
> --- a/src/backend/utils/activity/pgstat_subscription.c
> +++ b/src/backend/utils/activity/pgstat_subscription.c
> @@ -47,8 +47,20 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
>  void
>  pgstat_create_subscription(Oid subid)
>  {
> +    PgStat_EntryRef *entry_ref;
> +    PgStatShared_Subscription *shstatent;
> +
>      pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
>                                  InvalidOid, subid);
> +
> +    entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_SUBSCRIPTION,
> +                                            InvalidOid, subid,
> +                                            false);
> +    shstatent = (PgStatShared_Subscription *) entry_ref->shared_stats;
> +
> +    memset(&shstatent->stats, 0, sizeof(shstatent->stats));
> +
> +    pgstat_unlock_entry(entry_ref);
>  }
>  
>  /*

I think most of this could just be pgstat_reset_entry().
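
I.e. roughly this (a sketch only; I'm assuming pgstat_reset_entry()'s
signature here, so check pgstat_internal.h):

    void
    pgstat_create_subscription(Oid subid)
    {
        pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
                                    InvalidOid, subid);

        /* zero any leftover shared entry instead of open-coding the reset */
        pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
    }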

Greetings,

Andres Freund




Re: First draft of the PG 15 release notes

2022-07-05 Thread Noah Misch
On Tue, Jul 05, 2022 at 04:35:32PM -0400, Bruce Momjian wrote:
> On Tue, Jul  5, 2022 at 12:53:49PM -0700, Noah Misch wrote:
> > On Tue, Jul 05, 2022 at 02:35:39PM -0400, Bruce Momjian wrote:
> > > On Fri, Jul  1, 2022 at 06:21:28PM -0700, Noah Misch wrote:
> > > > Here's what I've been trying to ask: what do you think of linking to
> > > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
> > > > here?  The release note text is still vague, and the docs have extensive
> > > > coverage of the topic.  The notes can just link to that extensive 
> > > > coverage.
> > > 
> > > Sure. how is this patch?
> > 
> > > --- a/doc/src/sgml/release-15.sgml
> > > +++ b/doc/src/sgml/release-15.sgml
> > > @@ -63,11 +63,12 @@ Author: Noah Misch 
> > >permissions on the public schema has not
> > >been changed.  Databases restored from previous Postgres releases
> > >will be restored with their current permissions.  Users wishing
> > > -  to have the former permissions will need to grant
> > > +  to have the former more-open permissions will need to grant
> > >CREATE permission for PUBLIC
> > >on the public schema; this change can be made
> > >on template1 to cause all new databases
> > > -  to have these permissions.
> > > +  to have these permissions.  This change was made to increase
> > > +  security;  see .
> > >   
> > >  
> > 
> > I think this still puts undue weight on single-user systems moving back to 
> > the
> > old default.  The linked documentation does say how to get back to v14
> > permissions (and disclaims security if you do so), so let's not mention it
> > here.  The attached is how I would write it.  I also reworked the "Databases
> > restored from previous ..." sentence, since its statement is also true of
> > databases restored v15-to-v15 (no "previous" release involved).  I also 
> > moved
> > the bit about USAGE to end, since it's just emphasizing what the reader 
> > should
> > already assume.  Any concerns?
> 
> I see where you are going --- to talk about how to convert upgraded
> clusters to secure clusters, rather than how to revert to the previous
> behavior.  I assumed that the most common question would be how to get
> the previous behavior, rather than how to get the new behavior in
> upgraded clusters.  However, I am fine with what you think is best.

Since having too-permissive ACLs is usually symptom-free, I share your
forecast about the more-common question.  Expect questions on mailing lists,
stackoverflow, etc.  The right way to answer those questions is roughly this:

> On PostgreSQL 15, my application gets "permission denied for schema
> public".  What should I do?

You have a choice to make.  The best selection depends on the security
needs of your database.  See
https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
for a guide to making that choice.

Recommending GRANT to that two-sentence question would be negligent.  One
should know a database's lack of security needs before recommending GRANT.
This is a key opportunity to have more users make the right decision while
their attention is on the topic.

> My only stylistic suggestion would be to remove "a" from "a
> REVOKE".

I'll plan to push with that change.



