Re: Fix output of zero privileges in psql

2023-10-09 Thread Laurenz Albe
On Mon, 2023-10-09 at 03:53 +0200, Erik Wienhold wrote:
> On 2023-10-08 06:14 +0200, Laurenz Albe wrote:
> > On Sat, 2023-10-07 at 20:41 +0200, Erik Wienhold wrote:
> > > > If you are happy enough with my patch, shall we mark it as ready for
> > > > committer?
> > > 
> > > I amended your patch to also document the effect of \pset null in this
> > > case.  See the attached v2.
> > 
> > +1
> > 
> > If you mention in ddl.sgml that you can use "\pset null" to distinguish
> > default from no privileges, you should mention that this only works with
> > psql.  Many people out there don't use psql.
> 
> I don't think this is necessary because that section in ddl.sgml is
> already about psql and \dp.

You are right.

> > Also, I'm not sure if "zero privileges" will be readily understood by
> > everybody.  Perhaps "no privileges at all, even for the object owner"
> > would be a better wording.
> 
> Changed in v3 to "empty privileges" with an explanation that this means
> "no privileges at all, even for the object owner".

Looks good.
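
For concreteness, here is roughly how the difference shows up once a
null display string is set (table names are only illustrative):

    \pset null '(default)'
    CREATE TABLE t_default (a int);            -- never granted/revoked: ACL stored as NULL
    CREATE TABLE t_empty (a int);
    REVOKE ALL ON t_empty FROM CURRENT_USER;   -- empty privileges, even for the owner
    \dp t_default t_empty

With the patch, \dp then shows "(default)" in the Access privileges
column for t_default, while t_empty stays visibly empty (its stored ACL
is an empty aclitem[] rather than NULL).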

> > Perhaps it would also be good to mention this in the psql documentation.
> 
> Just once under  \pset null  with a reference to section 5.7?  Something
> like "This is also useful for distinguishing default privileges from
> empty privileges as explained in Section 5.7."
> 
> Or instead under every command affected by this change?  \dp and \ddp
> already have such a reference ("The meaning of the privilege display is
> explained in Section 5.7.")
> 
> I prefer the first one because it's less effort ;)  Also done in v3.

I think that's sufficient.

I tinkered a bit with your documentation.  For example, the suggestion to
"\pset null" seemed to be in an inappropriate place.  Tell me what you think.

Yours,
Laurenz Albe
From 2be3e63a6330cbf7767e19f86940ae97f87bf5f2 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 9 Oct 2023 09:49:20 +0200
Subject: [PATCH] psql: honor "\pset null" in backslash commands

In the output of backslash commands, NULL was always displayed
as empty string, rather than honoring the values set with
"\pset null".

This was surprising, and in some cases it was downright confusing:
for example, default privileges (stored as NULL) were displayed
just like an empty aclitem[], making these cases indistinguishable
in the output.  Add this missing detail to the documentation.

Discussion: https://postgr.es/m/96d6885a-5e25-9ae8-4a1a-d7e557a5fe9c%40mtneva.com
---
 doc/src/sgml/ddl.sgml  |  7 --
 doc/src/sgml/ref/psql-ref.sgml |  4 +++-
 src/bin/psql/describe.c| 43 --
 3 files changed, 8 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 075ff32991..52f17e79ed 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2353,8 +2353,11 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
   
If the Access privileges column is empty for a given
object, it means the object has default privileges (that is, its
-   privileges entry in the relevant system catalog is null).  Default
-   privileges always include all privileges for the owner, and can include
+   privileges entry in the relevant system catalog is null) or empty privileges
+   (that is, no privileges at all, even for the object owner — a rare
+   occurrence).  One way to distinguish default privileges from empty privileges
+   is to set \pset null '(default)'.
+   Default privileges always include all privileges for the owner, and can include
some privileges for PUBLIC depending on the object
type, as explained above.  The first GRANT
or REVOKE on an object will instantiate the default
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d94e3cacfc..7cd12eb867 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3079,7 +3079,9 @@ lo_import 152801
   Sets the string to be printed in place of a null value.
   The default is to print nothing, which can easily be mistaken for
   an empty string. For example, one might prefer \pset null
-  '(null)'.
+  '(null)'. Setting a non-empty null display string will
+  help to distinguish between default and empty privileges, as
+  explained in .
   
   
   
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bac94a338c..224aa22575 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -124,7 +124,6 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem)
 	if (!res)
 		return false;
 
-	myopt.nullPrint = NULL;
 	myopt.title = _("List of aggregate functions");
 	myopt.translate_header = true;
 
@@ -197,7 +196,6 @@ describeAccessMethods(const char *pattern, bool verbose)
 	if (!res)
 		return false;
 
-	myopt.nullPrint = NULL;
 	myopt.title = _("List of access methods");
 	myopt.translate_header = true;
 	myopt.translate_columns = translate_columns;
@@ -262,

Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-09 Thread David Rowley
On Mon, 9 Oct 2023 at 17:37, Tom Lane  wrote:
> Sorry for not having paid more attention to this thread ... but
> I'm pretty desperately unhappy with the patch as-pushed.  I agree
> with the criticism that this is a very repetitive coding pattern
> that could have used a macro.  But my real problem with this:
>
> +   buf.data = VARDATA_ANY(sstate);
> +   buf.len = VARSIZE_ANY_EXHDR(sstate);
> +   buf.maxlen = 0;
> +   buf.cursor = 0;
>
> is that it totally breaks the StringInfo API without even
> attempting to fix the API specs that it falsifies,
> particularly this in stringinfo.h:
>
>  *maxlen  is the allocated size in bytes of 'data', i.e. the maximum
>  *string size (including the terminating '\0' char) that we can
>  *currently store in 'data' without having to reallocate
>  *more space.  We must always have maxlen > len.
>
> I could see inventing a notion of a "read-only StringInfo"
> to legitimize what you've done here, but you didn't bother
> to try.  I do not like this one bit.  This is a fairly
> fundamental API and we shouldn't be so cavalier about
> breaking it.

You originally called the centralised logic a "loaded foot-gun" [1],
but now you're complaining about the lack of a loaded foot-gun and want a
macro? Which part did I misunderstand? Enlighten me, please.

Here are some more thoughts on how we could improve this:

1. Adjust the definition of StringInfoData.maxlen to define that -1
means the StringInfoData's buffer is externally managed.
2. Adjust enlargeStringInfo() to add a check for maxlen = -1 and have
it palloc, say, pg_next_pow2(str->len * 2) bytes and memcpy the
existing (externally managed string) into the newly palloc'd buffer.
3. Add a new function along the lines of what I originally proposed
(sketched below) to allow init of a StringInfoData using an existing
allocated string which sets maxlen = -1.
4. Update all the existing places, including the ones I just committed
(plus the ones you committed in ba1e066e4) to make use of the function
added in #3.
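
As a concrete sketch of #3 (the function name and exact API here are
only illustrative, nothing is settled):

static inline void
initStringInfoFromString(StringInfo str, char *data, int len)
{
	/* Caller must supply a terminated buffer, per the API spec. */
	Assert(data[len] == '\0');

	str->data = data;
	str->len = len;
	str->maxlen = -1;	/* per #1: buffer is externally managed */
	str->cursor = 0;
}

With #2 in place, any code path that later tries to append to such a
StringInfo would hit the maxlen == -1 case in enlargeStringInfo() and
get its own palloc'd copy first, so existing read-only users would be
unaffected.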

Better ideas?

David

[1] https://postgr.es/m/770055.1676183...@sss.pgh.pa.us




Re: Fix output of zero privileges in psql

2023-10-09 Thread Laurenz Albe
On Sun, 2023-10-08 at 19:58 -0700, David G. Johnston wrote:
> For us, I would suggest the following wording:
> 
> In addition to the situation of printing all acl items, the Access and Column
> privileges columns report two other situations specially.  In the rare case
> where all privileges for an object have been explicitly removed, including
> from the owner and PUBLIC, (i.e., has empty privileges) these columns will
> display NULL.  The other case is where the built-in default privileges are
> in effect, in which case these columns will display the empty string.
> (Note that by default psql will print NULL as an empty string, so in order
> to visually distinguish these two cases you will need to issue the \pset null
> meta-command and choose some other string to print for NULLs).  Built-in
> default privileges include all privileges for the owner, as well as those
> granted to PUBLIC for relevant object types as described above.

That doesn't look like an improvement over the latest patches to me.

> The built-in default privileges are only in effect if the object has not been
> the target of a GRANT or REVOKE and also has not had its default privileges
> modified using ALTER DEFAULT PRIVILEGES. (???: if it is possible to revert
> back to the state of built-in privileges that would need to be described 
> here.)

I don't think that we need to mention ALTER DEFAULT PRIVILEGES there.  If
the default privileges have been altered, the ACL will not be stored as
NULL in the catalogs.

> The above removes the parenthetical regarding null in the catalogs, this is
> intentional as it seems that the goal here is to use psql instead of the
> catalogs and adding its use of null being printed as the empty string just
> seems likely to add confusion.

To me, mentioning that default privileges are stored as NULL in the catalogs
is not an invitation to view the privileges with catalog queries, but
information about an implementation detail that explains why default
privileges are displayed the way they are.

> We probably should add the two terms to the glossary:
> 
> empty privileges: all privileges explicitly revoked from the owner and PUBLIC
> (where applicable), and none otherwise granted.
> 
> built-in default privileges: owner having all privileges and no privileges
> granted or removed via ALTER DEFAULT PRIVILEGES

"Empty privileges" are too unimportant to warrant an index entry.

I can see the value of an index entry

<indexterm>
 <primary>privilege</primary>
 <secondary>default</secondary>
</indexterm>

Done in the attached v5 of the patch, even though this is not really
the business of this patch.

> > > Perhaps it would also be good to mention this in the psql documentation.
> 
> We've chosen a poor default and I'd rather inform the user of specific
> meta-commands to be wary of this poor default and change it at the point
> they will be learning about the meta-commands adversely affected.
> 
> That said, I'd be willing to document that these commands, because they are
> affected by empty string versus null, require a non-empty-string value for
> \pset null and will choose the string '' for the duration of the
> meta-command's execution if the user's setting is incompatible.

I am not certain I understood you correctly.
Are you advocating for adding a mention of "\pset null" to every backslash
command that displays privileges?  That is excessive, in my opinion.

Yours,
Laurenz Albe
From 2afe3cbf674e163c146ea29582f7aa3839bd184d Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 9 Oct 2023 10:27:58 +0200
Subject: [PATCH] psql: honor "\pset null" in backslash commands

In the output of backslash commands, NULL was always displayed
as empty string, rather than honoring the values set with
"\pset null".

This was surprising, and in some cases it was downright confusing:
for example, default privileges (stored as NULL) were displayed
just like an empty aclitem[], making these cases indistinguishable
in the output.  Add this missing detail to the documentation.

In passing, add an entry for "privileges, default" to the index.

Discussion: https://postgr.es/m/96d6885a-5e25-9ae8-4a1a-d7e557a5fe9c%40mtneva.com
---
 doc/src/sgml/ddl.sgml  | 14 ---
 doc/src/sgml/ref/psql-ref.sgml |  4 +++-
 src/bin/psql/describe.c| 43 --
 3 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 075ff32991..76e62250e4 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1737,6 +1737,11 @@ ALTER TABLE products RENAME TO items;
ACL
   
 
+  <indexterm>
+   <primary>privilege</primary>
+   <secondary>default</secondary>
+  </indexterm>
+
   
When an object is created, it is assigned an owner. The
owner is normally the role that executed the creation statement.
@@ -2049,7 +2054,7 @@ REVOKE ALL ON accounts FROM PUBLIC;
reference page of the respective command.
   
 
-  
+  
PostgreSQL grants privileges on some types of objects to
PUBLIC by default when the objects are created.
No privileges are

Re: Where can I find the doxyfile?

2023-10-09 Thread Stefan Kaltenbrunner

On 07.10.23 00:23, Bohdan Mart wrote:
> On 07.10.23 00:29, postg...@coyotebush.net wrote:
>> Sometime earlier, I created a filter to annotate regular C comments as
>> doxy comments.  I'll attach it along with a sample doxyfile for running
>> it.  Just in case it looks useful.
>>
>> I've never been a big fan of Doxygen, but it seems to have gotten better
>> over time. Now that some IDEs display doxy comments on hover, I'm
>> beginning to appreciate it.
>
> Thank you for providing this `flex` filter! It is actually useful. I've
> tested it on one file from the postgres source base, and it actually
> converted regular comments to doc-comments! If this filter could be used
> by the official Doxygen generation of Postgres, it would greatly improve
> the quality of the online documentation of Postgres!



I have not actually looked at the code of the filter itself, but
personally I'm not super happy with doing a C-based filter as part of
our doxygen documentation generation step - that seems like an
additional maintenance burden infrastructure-wise - is there nothing
more lightweight available (assuming we even want that)?





Stefan




Re: Checks in RegisterBackgroundWorker()

2023-10-09 Thread Heikki Linnakangas

On 06/10/2023 13:13, Thomas Munro wrote:

> On Thu, Sep 28, 2023 at 9:46 AM Heikki Linnakangas  wrote:
>> Subject: [PATCH v2 3/3] Fix misleading comment on StartBackgroundWorker().
>
> LGTM.  Hmm, maybe I would have called that function
> "BackgroundWorkerMain()" like several other similar things, but that's
> not important.


That's a good idea. I renamed it to BackgroundWorkerMain().

Pushed with that change, thanks for the review!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-09 Thread vignesh C
On Fri, 6 Oct 2023 at 18:30, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> Based on comments, I revised my patch. PSA the file.
>
> >
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream. In this new API,
> > > we will start with each slot's restart_lsn location and try to process
> > > till the end of WAL, if we encounter any WAL that needs to be
> > > processed (like we need to send the decoded WAL downstream) we can
> > > return a false indicating that there is an unexpected WAL. The reason
> > > to start with restart_lsn is that it is the location that we use to
> > > start scanning the WAL anyway.
>
> I implemented this by using decoding context. The binary upgrade function
> processes WALs from the confirmed_flush, and returns false if some meaningful
> changes are found.
>
> Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
> If the decoding context is in the mode, the output plugin is not loaded, but
> any WALs are decoded without skipping. Also, a new flag "did_process" is
> added. This flag is set if wrappers for output plugin callbacks are called
> during silent mode. The upgrading function checks both the reorder buffer
> and the new
> flag because both (non-)transactional changes should be detected. If we only
> check reorder buffer, we miss the non-transactional one.
>
> fast_forward was changed as a variant of decoding mode.
>
> Currently the function is called for all the valid slots. If the approach seems
> good, we can refactor like Bharath said [1].
>
> >
> > > Then, we should also try to create slots before invoking pg_resetwal.
> > > The idea is that we can write a new binary mode function that will do
> > > exactly what pg_resetwal does to compute the next segment and use that
> > > location as a new location (restart_lsn) to create the slots in a new
> > > node. Then, pass it pg_resetwal by using the existing option '-l
> > > walfile'. As we don't have any API that takes restart_lsn as input, we
> > > can write a new API probably for binary mode to create slots that do
> > > take restart_lsn as input. This will ensure that there is no new WAL
> > > inserted by background processes between resetwal and the creation of
> > > slots.
>
> Based on that, I added another binary function 
> binary_upgrade_create_logical_replication_slot().
> This function is similar to pg_create_logical_replication_slot(), but the
> restart_lsn and confirmed_flush are set to the *next* WAL segment. The
> corresponding filename is returned, and it is passed to the pg_resetwal command.
>
> One consideration is that pg_log_standby_snapshot() must be executed before
> slots consume changes. The new cluster does not have RUNNING_XACTS records,
> so the decoding context on the new cluster cannot create a consistent
> snapshot as-is. This may lead to discarding changes during the upcoming
> consuming event. To prevent this, the function is called after the final
> pg_resetwal.

A few comments:
1) Should we add the binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
this function too:
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+   Namename = PG_GETARG_NAME(0);
+   Nameplugin = PG_GETARG_NAME(1);
+
+   /* Temporary slots is never handled in this function */
+   booltwo_phase = PG_GETARG_BOOL(2);

2) Generally we are specifying the slot name in this case; is a NULL check
on the slot name required:
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+   Nameslot_name;
+   XLogRecPtr  end_of_wal;
+   LogicalDecodingContext *ctx = NULL;
+   boolhas_record;
+
+   CHECK_IS_BINARY_UPGRADE;
+
+   /* Quick exit if the input is NULL */
+   if (PG_ARGISNULL(0))
+   PG_RETURN_BOOL(false);

3) Since this is similar to pg_create_logical_replication_slot, can we
add a comment saying any change in pg_create_logical_replication_slot
would also need the same check to be added in
binary_upgrade_create_logical_replication_slot:
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+   Namename = PG_GETARG_NAME(0);
+   Nameplugin = PG_GETARG_NAME(1);
+
+   /* Temporary slots is never handled in this function */

4) Any conclusion on this try/catch comment? Do you want to add which
setting you want to revert in the catch block? If try/catch is not
required, we can remove this comment:
+   ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+   /* XXX: Is PG_TRY/CATCH

Re: Clean up some pg_dump tests

2023-10-09 Thread Alvaro Herrera
I tried this out.  I agree it's a good change.  BTW, this made me
realize that "unlike" is not a good name: maybe it should be called
"except".

On 2023-Oct-02, Peter Eisentraut wrote:

> + if (!defined($tests{$test}->{like}))
> + {
> + diag "missing like in test \"$test\"";
> + }
> + if ($tests{$test}->{unlike}->{$test_key} &&
> + !defined($tests{$test}->{like}->{$test_key}))
> + {
> + diag "useless unlike \"$test_key\" in test \"$test\"";
> + }

I would add quotes to the words "like" and "unlike" there.  Otherwise,
these sentences are hard to parse.  Also, some commentary on what this
is about seems warranted: maybe "Check that this test properly defines
which dumps the output should match on." or similar.

I didn't like using diag(), because automated runs will not alert to any
problems.  Now maybe that's not critical, but I fear that people would
not notice problems if they are just noise in the output.  Let's make
them test errors.  fail() seems good enough: with the lines I quote
above and omitting the test corrections, I get this, which seems good
enough:

#   Failed test 'useless unlike "binary_upgrade" in test "Disabled trigger on partition is not created"'
#   at t/002_pg_dump.pl line 4960.

#   Failed test 'useless unlike "clean" in test "Disabled trigger on partition is not created"'
#   at t/002_pg_dump.pl line 4960.

[... a few others ...]

Test Summary Report
---
t/002_pg_dump.pl (Wstat: 15104 (exited 59) Tests: 11368 Failed: 59)
  Failed tests:  241, 486, 731, 1224, 1473, 1719, 1968, 2217
2463, 2712, 2961, 3207, 3452, 3941, 4190
4442, 4692, 4735-4736, 4943, 5094, 5189
5242, 5341, 5436, 5681, 5926, 6171, 6660
6905, 7150, 7395, 7640, 7683, 7762, 7887
7930, 7941, 8134, 8187, 8229, 8287, 8626
8871, 8924, 9023, 9170, 9269, 9457, 9515
9704, 9762, 10345, 10886, 10985, 11105
11123, 11134, 11327
  Non-zero exit status: 59
Files=5, Tests=11482, 15 wallclock secs ( 0.43 usr  0.04 sys +  4.56 cusr  1.63 csys =  6.66 CPU)
Result: FAIL


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ni aún el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-09 Thread Gurjeet Singh
On Sun, Oct 8, 2023 at 1:01 PM Gurjeet Singh  wrote:
>
> On Fri, Oct 6, 2023 at 1:29 PM Jeff Davis  wrote:
> >
> > On Thu, 2023-10-05 at 14:28 -0700, Gurjeet Singh wrote:
> >
> > > This way there's a notion of a 'new' and 'old' passwords.
> >
> > IIUC, you are proposing that there are exactly two slots, NEW and OLD.
> > When adding a password, OLD must be unset and it moves NEW to OLD, and
> > adds the new password in NEW. DROP only works on OLD. Is that right?
>
> Yes, that's what I was proposing. But thinking a bit more about it,
> the _implicit_ shuffling of labels 'new' and 'old' doesn't feel right
> to me. The password that used to be referred to as 'new' now
> automatically gets labeled 'old'.
>
> > It's close to the idea of deprecation, except that adding a new
> > password implicitly deprecates the existing one. I'm not sure about
> > that -- it could be confusing.
>
> +1

> I believe we should fix the _names_ of the slots the 2 passwords are
> stored in, and provide commands that manipulate those slots by
> respective names; the commands should not implicitly move the
> passwords between the slots. Additionally, we may provide functions
> that provide observability info for these slots. I propose the slot
> names FIRST and SECOND (I picked these because these keywords/tokens
> already exist in the grammar, but not yet sure if the grammar rules
> would allow their use; feel free to propose better names). FIRST
> refers to the existing slot, namely rolpassword. SECOND refers to
> the new slot we'd add, that is, a pgauthid column named
> rolsecondpassword. The existing commands, or when neither FIRST nor
> SECOND are specified, the commands apply to the existing slot, a.k.a.
> FIRST.

Please see attached the patch that implements this proposal. The patch
is named password_rollover_v3.diff, breaking from the name
'multiple_passwords...', since this patch limits itself to addressing the
password-rollover feature.

The multiple_password* series of patches had removed critical
functionality, which I believe is crucial from a security perspective.
When a user does not exist, or has no passwords, or has passwords that
have expired, we must pretend to perform authentication (network
packet exchanges) as normally as possible, so that the absence of
user, or lack of (or expiration of) passwords is not revealed to an
attacker. I have restored the original behaviour in the
CheckPWChallengeAuth() function; see commit aba99df407 [2].

I looked for any existing keywords that might fit the purpose of
naming the slots better than FIRST and SECOND, but I could not find
any. Instead of DROP to remove the passwords, I tried DELETE, and the
grammar/bison did not complain about it; so DELETE is an option, too,
but I feel DROP FIRST/SECOND/ALL PASSWORD is a better companion to the
ADD FIRST/SECOND PASSWORD syntax, in the same vein as ADD/DROP COLUMN.

The doc changes are still missing, but the regression tests and the
comments therein should provide a good idea of the user interface of
this new feature. Documenting this behaviour in a succinct manner
feels difficult, so ideas are welcome on how to inform the reader that
a role is now accompanied by two slots to store the passwords, and
that the old commands operate on the first slot, and to operate on the
second password slot one must use the new syntax. I guess it would be
best to start the relevant section with "To support gradual password
rollovers, Postgres provides the ability to store 2 active passwords
at the same time. The passwords are referred to as FIRST and SECOND
password. Each of these passwords can be changed independently, and
each of these can have an associated expiration time, if necessary."
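
To make that concrete, the intended workflow might read like this (the
exact grammar is whatever the attached patch implements; the role name,
password, and VALID UNTIL placement here are only illustrative):

    -- add a rollover password in the SECOND slot; the FIRST keeps working
    ALTER ROLE app_user ADD SECOND PASSWORD 'new-secret' VALID UNTIL '2024-01-01';

    -- once all clients use the new password, retire the old one
    ALTER ROLE app_user DROP FIRST PASSWORD;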

Since these new commands are only available to ALTER ROLE (and not to
CREATE ROLE), the corresponding command doc page also needs to be
updated.

Next steps:
- Break the patch into a series of smaller patches.
- Add TAP tests (test the ability to actually login with these passwords)
- Add/update documentation
- Add more regression tests

The patch progress can be followed on the Git branch
password_rollover_v3 [1]. This branch uses
multiple_passwords_v4 as its starting point, and removes unnecessary code
(commit 326f60225f [3]).

The v1 (and tombstoned v2) patches of password_rollover were never
finished, as the consensus changed while they were in progress, but
they exist as sibling branches of the v3 branch.

[1]: https://github.com/gurjeet/postgres/commits/password_rollover_v3
[2]: 
https://github.com/gurjeet/postgres/commit/aba99df407a523357db2813f0eea0b45dbeb6006
[3]: 
https://github.com/gurjeet/postgres/commit/326f60225f0e660338fc9c276c8728dc10db435b


Best regards,
Gurjeet
http://Gurje.et


password_rollover_v3.diff
Description: Binary data


Re: FDW LIMIT pushdown

2023-10-09 Thread Ashutosh Bapat
Hi Michal,


On Fri, Oct 6, 2023 at 9:34 PM Michał Kłeczek  wrote:
>
> Hello hackers,
>
> First timer here with a question:
>
> I’ve searched through various e-mail threads and documents and could not find 
> if pushdown of LIMIT clauses to FDW is implemented.
> Seen some conversation about that a couple of years ago when that was treated 
> as a feature request but nothing  since then.
> Citus does that according to its documentation but for now we are trying to 
> base our solution on “plain” Postgres.
>
> If it is not implemented - could you, guys, provide me with some pointers to 
> source code where I could look at and try to implement that?
>

I started looking at the code from grouping_planner and reached
create_ordinary_grouping_paths(). It calls
create_partitionwise_grouping_paths() to push aggregates and grouping
down into partitions, and from there they are pushed down into the FDW.
Looking at create_limit_path(), I don't see similar treatment. So there
may be some cases where we are not pushing down the final LIMIT.

However, create_append_path() uses PlannerInfo::limit_tuples or
root->limit_tuples when creating an append path node. So either it's
being used for costing or for pushing it down to the partitions.

This isn't a full answer, but I hope these pointers help you.

-- 
Best Wishes,
Ashutosh Bapat




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Michael Paquier
On Sun, Oct 08, 2023 at 05:48:55PM -0400, Tom Lane wrote:
> There have been intermittent failures on various buildfarm machines
> since this went in.  After seeing one on my own animal mamba [1],
> I tried to reproduce it manually on that machine, and it does
> indeed fail about one time in two.  The buildfarm script is not
> managing to capture the relevant log files, but what I see in a
> manual run is that 001_worker_spi.pl logs this:

Thanks for the logs. I've noticed the failure but could not make any
sense of it, given the lack of information provided by the
buildfarm.  Serinus has complained once, for instance.

> Since this only seems to happen on slow machines, I'd call it a timing
> problem or race condition.  Unless you want to argue that the race
> should not happen, probably the fix is to make the test script cope
> with this worker_spi_launch() call failing.  As long as we see the
> expected result from wait_for_log, we can be pretty sure the right
> thing happened.

The trick to reproduce the failure is to slow down worker_spi_launch()
before WaitForBackgroundWorkerStartup() with a worker already
registered, so that the worker has time to start and exit because of
the ALLOW_CONNECTIONS restriction.  (SendPostmasterSignal() in
RegisterDynamicBackgroundWorker() interrupts a hardcoded sleep, so
I've just used an on-disk flag.)

Another thing is that we cannot rely on the PID returned by launch()
as it could fail, so $worker3_pid needs to disappear.  If we do that,
I'd rather just switch to a specific database for the tests with
ALLOWCONN rather than reuse "mydb", which could have other workers.  The
attached fixes the issue for me.
--
Michael
diff --git a/src/test/modules/worker_spi/t/001_worker_spi.pl b/src/test/modules/worker_spi/t/001_worker_spi.pl
index 06bb73816f..044e208812 100644
--- a/src/test/modules/worker_spi/t/001_worker_spi.pl
+++ b/src/test/modules/worker_spi/t/001_worker_spi.pl
@@ -105,30 +105,31 @@ postgres|myrole|WorkerSpiMain]),
 ) or die "Timed out while waiting for dynamic bgworkers to be launched";
 
 # Check BGWORKER_BYPASS_ALLOWCONN.
-$node->safe_psql('postgres', q(ALTER DATABASE mydb ALLOW_CONNECTIONS false;));
+$node->safe_psql('postgres', q(CREATE DATABASE noconndb ALLOW_CONNECTIONS false;));
+my $noconndb_id = $node->safe_psql('mydb',
+	"SELECT oid FROM pg_database where datname = 'noconndb';");
 my $log_offset = -s $node->logfile;
 
-# bgworker cannot be launched with connection restriction.
-my $worker3_pid = $node->safe_psql('postgres',
-	qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id);]);
+# worker_spi_launch() may be able to detect that the worker has been
+# stopped, so do not rely on safe_psql().
+$node->psql('postgres',
+	qq[SELECT worker_spi_launch(12, $noconndb_id, $myrole_id);]);
 $node->wait_for_log(
-	qr/database "mydb" is not currently accepting connections/, $log_offset);
+	qr/database "noconndb" is not currently accepting connections/, $log_offset);
 
 $result = $node->safe_psql('postgres',
-	"SELECT count(*) FROM pg_stat_activity WHERE pid = $worker3_pid;");
+	"SELECT count(*) FROM pg_stat_activity WHERE datname = 'noconndb';");
 is($result, '0', 'dynamic bgworker without BYPASS_ALLOWCONN not started');
 
 # bgworker bypasses the connection check, and can be launched.
 my $worker4_pid = $node->safe_psql('postgres',
-	qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id, '{"ALLOWCONN"}');]);
+	qq[SELECT worker_spi_launch(12, $noconndb_id, $myrole_id, '{"ALLOWCONN"}');]);
 ok( $node->poll_query_until(
 		'postgres',
 		qq[SELECT datname, usename, wait_event FROM pg_stat_activity
 WHERE backend_type = 'worker_spi dynamic' AND
 pid IN ($worker4_pid) ORDER BY datname;],
-		qq[mydb|myrole|WorkerSpiMain]),
+		qq[noconndb|myrole|WorkerSpiMain]),
 	'dynamic bgworker with BYPASS_ALLOWCONN started');
 
-$node->safe_psql('postgres', q(ALTER DATABASE mydb ALLOW_CONNECTIONS true;));
-
 done_testing();


signature.asc
Description: PGP signature


Crash in add_paths_to_append_rel

2023-10-09 Thread Richard Guo
I came across a crash in add_paths_to_append_rel() with the query below.

create table t (a int);

create table inh (b int);
create table inh_child() inherits(inh);

explain (costs off)
select * from t left join lateral (select t.a from inh) on true limit 1;

server closed the connection unexpectedly

It was introduced by commit a8a968a8 where we referenced
cheapest_startup_path->param_info without first checking that we have a
valid cheapest_startup_path.  We do check that the pathlist is
not empty, but that does not imply that the cheapest_startup_path is not
NULL.  Normally only unparameterized paths are considered candidates for
cheapest_startup_path, because startup cost does not have too much value
for parameterized paths since they are on the inside of a nestloop.  So
if there are no unparameterized paths, the cheapest_startup_path will be
NULL.  That is the case in the repro query above.  Because of the
lateral reference within PHV, the child rels of 'inh' do not have
unparameterized paths, so their cheapest_startup_path is NULL.

I think we should explicitly check that cheapest_startup_path is not
NULL here.  Doing that seems to make the check of pathlist not being
empty unnecessary.  Besides, I think we do not need to check that
cheapest_startup_path->param_info is NULL, since cheapest_startup_path
must be an unparameterized path.  Maybe we can use an Assert instead.

Attached is my proposed fix.

Thanks
Richard


v1-0001-Don-t-assume-that-cheapest_startup_path-exists.patch
Description: Binary data


Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-09 Thread Ashutosh Bapat
On Sat, Oct 7, 2023 at 3:14 PM Richard Guo  wrote:
>
> In relation_excluded_by_constraints() when we're trying to figure out
> whether the relation need not be scanned, one of the checks we do is to
> detect constant-FALSE-or-NULL restriction clauses.  Currently we perform
> this check only when there is exactly one baserestrictinfo entry, and
> the comment explains this as below.
>
>  * Regardless of the setting of constraint_exclusion, detect
>  * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
>  * reduce "anything AND FALSE" to just "FALSE", any such case should
>  * result in exactly one baserestrictinfo entry.
>
> This doesn't seem entirely correct, because equivclass.c may generate
> constant-FALSE baserestrictinfo entry on the fly.  In addition, other
> quals could get pushed down to the baserel.  All these cases would
> result in that the baserestrictinfo list might possibly have other
> members besides the FALSE constant.
>
> So I'm wondering if we should check each of base restriction clauses for
> constant-FALSE-or-NULL quals, like attached.
>
> Here are some examples.
>
> -- #1 constant-FALSE generated by ECs
>
> -- unpatched (in all branches)
>
> QUERY PLAN
> --
>  Result
>One-Time Filter: false
>->  Seq Scan on t t1
>  Filter: (a = 1)
> (4 rows)
>

I used a slightly modified query as below

# explain (costs off) select * from pg_class t1 where oid = 1 and oid = 2;
QUERY PLAN
--
 Result
   One-Time Filter: false
   ->  Index Scan using pg_class_oid_index on pg_class t1
 Index Cond: (oid = '1'::oid)
(4 rows)

postgres@312571=# explain (analyze, costs off) select * from pg_class
t1 where oid = 1 and oid = 2;
QUERY PLAN
---
 Result (actual time=0.002..0.003 rows=0 loops=1)
   One-Time Filter: false
   ->  Index Scan using pg_class_oid_index on pg_class t1 (never executed)
 Index Cond: (oid = '1'::oid)
 Planning Time: 0.176 ms
 Execution Time: 0.052 ms
(6 rows)

You will see that the scan node was never executed. Hence there's no
execution time benefit if we remove the scan plan.

Where do we produce the single baserestrictinfo mentioned in the
comments? Is it before the planning proper starts?

get_gating_quals does what you are doing much earlier in the query
processing. Your code would just duplicate that.

>
> -- patched
> explain (costs off)
> select * from t t1 left join (select * from t t2 where false) s on s.a = 1;
>QUERY PLAN
> 
>  Nested Loop Left Join
>->  Seq Scan on t t1
>->  Result
>  One-Time Filter: false
> (4 rows)

Does your code have any other benefits like deeming an inner join as empty?

-- 
Best Wishes,
Ashutosh Bapat




Re: should frontend tools use syncfs() ?

2023-10-09 Thread Maxim Orlov
On Fri, 6 Oct 2023 at 22:35, Nathan Bossart 
wrote:

> From a quick skim, this one looks pretty good to me.  Would you mind adding
> it to the commitfest so that it doesn't get lost?  I will aim to take a
> closer look at it next week.
>

Sounds good, thanks a lot!

https://commitfest.postgresql.org/45/4609/

-- 
Best regards,
Maxim Orlov.


Re: UUID v7

2023-10-09 Thread Chris Travers
So I am in the process of reviewing the patch and hopefully can provide 
something there soon.

However, I want to address in the meantime the question of timestamp
functions.  I know that is outside the scope of this patch, but I would be in
favor of adding them generally, not just as an extension but eventually into
core.  I understand (and generally agree with) the logic of not generally
extracting timestamps from UUIDs or other such fields, but there are cases
where it is really, really helpful to be able to do so.  In particular, when
you are troubleshooting misbehavior, all information you can get is helpful.
And so extracting all of the subfields can be helpful.

The problem with putting this in an extension is that this is mostly useful
when debugging systems (particularly larger distributed systems), and so the
chances of it reaching enough critical mass to be supported by all major
cloud vendors are effectively zero.

So I am not asking for this to be included in this patch, but I am saying I
would love to see this sort of thing contributed to core at some point.

Re: Remove distprep

2023-10-09 Thread Peter Eisentraut

On 06.10.23 20:50, Andres Freund wrote:
> The only thing I wonder is whether we ought to keep a maintainer-clean
> target (as an alias to distclean), so that extensions that added things
> to maintainer-clean continue to work.


The patch does do that.





Re: Two Window aggregate node for logically same over clause

2023-10-09 Thread "Anitha S"
https://www.postgresql.org/docs/16/sql-select.html#SQL-ORDERBY:~:text=Optionally%20one%20can%20add%20the%20key%20word%20ASC%20(ascending)%20or%20DESC%20(descending)%20after%20any%20expression%20in%20the%20ORDER%20BY%20clause.%20If%20not%20specified%2C%20ASC%20is%20assumed%20by%20default.%20Alternatively

If the ORDER BY direction is not mentioned, it is assumed to be ASC. This is
mentioned in the ORDER BY section. Currently PG doesn't have any GUC to set
the default order direction. Is there any idea to set the default ordering
direction via a GUC?

> Another angle is to ask: Why would the query add ASC to one window
> specification and not the other?

We had seen a query from a user where one OVER clause contains an order
direction and the other does not.  The reason for this thread is to find a
solution to handle such cases as well.
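
One possible shape of such a fix, as an illustrative sketch only (the
helper name is made up, and the actual change may differ), is to
normalize the direction before the equality check in
transformWindowFuncCall():

/* Treat an unspecified sort direction as ASC, the documented default. */
static SortByDir
normalize_sortby_dir(SortByDir dir)
{
	return (dir == SORTBY_DEFAULT) ? SORTBY_ASC : dir;
}

and then compare normalize_sortby_dir() of both clauses' sortby_dir when
deciding whether two window definitions match.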







 On Fri, 06 Oct 2023 18:06:10 +0530 Ashutosh Bapat 
 wrote ---



On Thu, Oct 5, 2023 at 8:53 PM "Anitha S"  
wrote: 
> 
> 
> 
> Hi team, 
> 
>   We have observed that for logically same over clause two different 
> window aggregate nodes are created in plan. 
> The below query contains two window functions. Both Over clause contain the 
> same partition & order clause in it. But in one over clause ordering option 
> is mentioned as ascending but not in another over clause which represents the 
> default option "ascending". 
> 
> 
> Sample Query: 
> 
> CREATE TEMPORARY TABLE empsalary ( 
> depname varchar, 
> empno bigint, 
> salary int, 
> enroll_date date 
> ); 
> 
> INSERT INTO empsalary VALUES ('develop', 10, 5200, '2007-08-01'), ('sales', 
> 1, 5000, '2006-10-01'), 
> ('personnel', 5, 3500, '2007-12-10'),('sales', 4, 4800, 
> '2007-08-08'),('personnel', 2, 3900, '2006-12-23'), 
> ('develop', 7, 4200, '2008-01-01'),('develop', 9, 4500, 
> '2008-01-01'),('sales', 3, 4800, '2007-08-01'), 
> ('develop', 8, 6000, '2006-10-01'),('develop', 11, 5200, '2007-08-15'); 
> 
> explain verbose select rank() over (partition by depname order by salary), 
> percent_rank() over(partition by depname order by salary asc) from empsalary; 
>   QUERY PLAN 
> --
>  
>  WindowAgg (cost=174.54..1000114.66 rows=1070 width=52) 
>  Output: (rank() OVER (?)), percent_rank() OVER (?), salary, depname 
>   -> WindowAgg (cost=174.54..195.94 rows=1070 width=44) 
>   Output: salary, depname, rank() OVER (?) 
> -> Sort (cost=174.54..177.21 rows=1070 width=36) 
> Output: salary, depname 
> Sort Key: empsalary.depname, empsalary.salary 
>-> Seq Scan on pg_temp_7.empsalary (cost=0.00..20.70 
> rows=1070 width=36) 
>Output: salary, depname 
> 
> 
> Ordering option of Sort is represented by enum SortByDir (parsenodes.h). 
> 
> List of SortBy is present in WindowDef structure which has info of order by 
> clause in Over clause 
> 
> /* Sort ordering options for ORDER BY and CREATE INDEX */ 
> typedef enum SortByDir 
> { 
> SORTBY_DEFAULT, 
> SORTBY_ASC, 
> SORTBY_DESC, 
> SORTBY_USING /* not allowed in CREATE INDEX ... */ 
> } SortByDir; 
> typedef struct SortBy 
> { 
> NodeTag type; 
> Node *node; /* expression to sort on */ 
> SortByDir sortby_dir; /* ASC/DESC/USING/default */ 
> SortByNulls sortby_nulls; /* NULLS FIRST/LAST */ 
> List *useOp; /* name of op to use, if SORTBY_USING */ 
> int location; /* operator location, or -1 if none/unknown */ 
> } SortBy; 
> 
> 
> In the transformWindowFuncCall API, the equality check of the order clause in
> the window definition fails while comparing the SortByDir enums of both OVER
> clauses, i.e. SORTBY_DEFAULT is not equal to SORTBY_ASC. Hence two window
> clauses are created in the parse tree, resulting in the creation of two
> different window aggregate nodes.
> 
> This check can be modified to form a single window aggregate node for the
> above, resulting in faster query execution. Is there any reason for creating
> two different window aggregate nodes?
 
I don't see any. The https://www.postgresql.org/docs/16/sql-select.html
description of the ORDER BY clause clearly says that ASC is assumed when
no direction is mentioned. The only place in the code which is used to
create the node treats DEFAULT and ASC as the same. Maybe we want to
allow the default to be ASC or DESC based on some setting (read GUC) at
some point in the future.
 
Another angle is to ask: Why would the query add ASC to one window 
specification and not the other? 
 
-- 
Best Wishes, 
Ashutosh Bapat

Re: Synchronizing slots from primary to standby

2023-10-09 Thread shveta malik
On Wed, Oct 4, 2023 at 8:53 AM Peter Smith  wrote:
>
> Here are some review comments for v20-0002.
>

Thanks Peter for the feedback. Comments from 31 to the end are addressed
in v22. The first 30 comments will be addressed in the next version.

> ==
> 1. GENERAL - errmsg/elog messages
>
> There are a lot of minor problems and/or quirks across all the
> message texts. Here is a summary of some I found:
>
> ERROR
> errmsg("could not receive list of slots from the primary server: %s",
> errmsg("invalid response from primary server"),
> errmsg("invalid connection string syntax: %s",
> errmsg("replication slot-sync worker slot %d is empty, cannot attach",
> errmsg("replication slot-sync worker slot %d is already used by
> another worker, cannot attach",
> errmsg("replication slot-sync worker slot %d is already used by
> another worker, cannot attach",
> errmsg("could not connect to the primary server: %s",
>
> errmsg("operation not permitted on replication slots on standby which
> are synchronized from primary")));
> /primary/the primary/
>
> errmsg("could not fetch invalidation cuase for slot \"%s\" from primary: %s",
> /cuase/cause/
> /primary/the primary/
>
> errmsg("slot \"%s\" disapeared from the primary",
> /disapeared/disappeared/
>
> errmsg("could not fetch slot info from the primary: %s",
> errmsg("could not connect to the primary server: %s", err)));
> errmsg("could not map dynamic shared memory segment for slot-sync worker")));
>
> errmsg("physical replication slot %s found in synchronize_slot_names",
> slot name not quoted?
> ---
>
> WARNING
> errmsg("out of background worker slots"),
>
> errmsg("Replication slot-sync worker failed to attach to worker-pool slot %d",
> case?
>
> errmsg("Removed database %d from replication slot-sync worker %d;
> dbcount now: %d",
> case?
>
> errmsg("Skipping slots synchronization as primary_slot_name is not set."));
> case?
>
> errmsg("Skipping slots synchronization as hot_standby_feedback is off."));
> case?
>
> errmsg("Skipping slots synchronization as dbname is not specified in
> primary_conninfo."));
> case?
>
> errmsg("slot-sync wait for slot %s interrupted by promotion, slot
> creation aborted",
>
> errmsg("could not fetch slot info for slot \"%s\" from primary: %s",
> /primary/the primary/
>
> errmsg("slot \"%s\" disappeared from the primary, aborting slot creation",
> errmsg("slot \"%s\" invalidated on primary, aborting slot creation",
>
> errmsg("slot-sync for slot %s interrupted by promotion, sync not possible",
> slot name not quoted?
>
> errmsg("skipping sync of slot \"%s\" as the received slot-sync lsn
> %X/%X is ahead of the standby position %X/%X",
>
> errmsg("not synchronizing slot %s; synchronization would move it backward",
> slot name not quoted?
> /backward/backwards/
>
> ---
>
> LOG
> errmsg("Added database %d to replication slot-sync worker %d; dbcount now: 
> %d",
> errmsg("Added database %d to replication slot-sync worker %d; dbcount now: 
> %d",
> errmsg("Stopping replication slot-sync worker %d",
> errmsg("waiting for remote slot \"%s\" LSN (%u/%X) and catalog xmin
> (%u) to pass local slot LSN (%u/%X) and and catalog xmin (%u)",
>
> errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)and catalog
> xmin (%u) has now passed local slot LSN (%X/%X) and catalog xmin
> (%u)",
> missing spaces?
>
> elog(LOG, "Dropped replication slot \"%s\" ",
> extra space?
> why this one is elog but others are not?
>
> elog(LOG, "Replication slot-sync worker %d is shutting down on
> receiving SIGINT", MySlotSyncWorker->slot);
> case?
> why this one is elog but others are not?
>
> elog(LOG, "Replication slot-sync worker %d started", worker_slot);
> case?
> why this one is elog but others are not?
> 
>
> DEBUG1
> errmsg("allocated dsa for slot-sync worker for dbcount: %d"
> worker number not given?
> should be elog?
>
> errmsg_internal("logical replication launcher started")
> should be elog?
>
> 
>
> DEBUG2
> elog(DEBUG2, "slot-sync worker%d's query:%s \n",
> missing space after 'worker'
> extra space before \n
>
> ==
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 2. libpqrcv_get_dbname_from_conninfo
>
> +/*
> + * Get database name from primary conninfo.
> + *
> + * If dbanme is not found in connInfo, return NULL value.
> + * The caller should take care of handling NULL value.
> + */
> +static char *
> +libpqrcv_get_dbname_from_conninfo(const char *connInfo)
>
> 2a.
> /dbanme/dbname/
>
> ~
>
> 2b.
> "The caller should take care of handling NULL value."
>
> IMO this is not very useful; it's like saying "caller must handle
> function return values".
>
> ~~~
>
> 3.
> + for (opt = opts; opt->keyword != NULL; ++opt)
> + {
> + /* Ignore connection options that are not present. */
> + if (opt->val == NULL)
> + continue;
> +
> + if (strcmp(opt->keyword, "dbname") == 0 && opt->val[0] != '\0')
> + {
> + dbname = pstrdup(opt->val);
> + }
> + }
>
> 3a.
> If there are multiple "dbname" in the conninfo then it will be the
> LAST one that is returned.
>
> Judg

Re: Draft LIMIT pushdown to Append and MergeAppend patch

2023-10-09 Thread Ashutosh Bapat
On Mon, Oct 9, 2023 at 6:25 AM David Rowley  wrote:
>
> However, it may also be worth you reading over [3] and the ultimate
> reason I changed my mind on that being a good idea. Pushing LIMITs
> below an Append seems quite incomplete when we don't yet push sorts
> below Appends, which is what that patch did.

When the paths are already ordered according to the ORDER BY
specification, pushing down the LIMIT will give them the extra benefit of
being cost-effective. Do you think we can proceed along those lines?
Later, when we implement Sort pushdown, we will adjust the LIMIT
pushdown code for the same.

> I just was not
> comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort
> until executor shutdown.  That'll be done for rescan reasons, but it
> does mean if you pushed Sort below Append that we could have a very
> large number of sort nodes holding onto work_mem all at once.   I find
> that a bit scary, especially so given the excessive partitioning cases
> I've seen and worked on recently.  I did consider if we maybe could
> adjust nodeSort.c to do tuplesort_end() after the final row. We'd need
> to only do that if we could be somehow certain there were going to be
> no rescans.  I don't have a plan on how that would be detected.

We have that problem with partitionwise join. Have you seen it in the
field? I have not seen such reports, but that could be because not many
know that partitionwise join needs to be explicitly turned ON. The
solution we develop here will solve the problem with partitionwise
join as well. It's a hard problem to solve. If there's a real case
where LIMIT pushdown helps without fixing the Sort pushdown case, it
might help to proceed with the same.

-- 
Best Wishes,
Ashutosh Bapat




Re: Synchronizing slots from primary to standby

2023-10-09 Thread shveta malik
On Mon, Oct 9, 2023 at 3:24 AM Peter Smith  wrote:
>
> On Fri, Oct 6, 2023 at 7:37 PM Alvaro Herrera  wrote:
> >
> > On 2023-Sep-27, Peter Smith wrote:
> >
> > > 3. get_local_synced_slot_names
> > >
> > > + for (int i = 0; i < max_replication_slots; i++)
> > > + {
> > > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > > +
> > > + /* Check if it is logical synchronized slot */
> > > + if (s->in_use && SlotIsLogical(s) && s->data.synced)
> > > + {
> > > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++)
> > > + {
> > >
> > > Loop variables are not declared in the common PG code way.
> >
> > Note that since we added C99 as a mandatory requirement for compilers in
> > commit d9dd406fe281, we've been using declarations in loop initializers
> > (see 143290efd079).  We have almost 500 occurrences of this already.
> > Older code, obviously, does not use them, but that's no reason not to
> > introduce them in new code.  I think they make the code a bit leaner, so
> > I suggest to use these liberally.
> >
>
> I also prefer the C99 style, but I had misunderstood there was still a
> convention to keep using the old style for code consistency (e.g. many
> new patches I see still seem to use the old style).
>
> Thanks for confirming that C99 loop variables are fine for any new code.
>
> @Shveta/Ajin - please ignore/revert all my old review comments about this 
> point.
>

Sure, reverted all such changes in v22. Now we have declarations in
loop initializers.

thanks
Shveta




Re: typo in couple of places

2023-10-09 Thread Amit Kapila
On Sat, Oct 7, 2023 at 8:28 AM vignesh C  wrote:
>
> On Fri, 6 Oct 2023 at 20:50, Dagfinn Ilmari Mannsåker  
> wrote:
> >
> > vignesh C  writes:
> >
> > > Hi,
> > >
> > > I noticed a couple of typos in code. "the the" should have been "the",
> > > attached patch has the changes for the same.
> >
> > This made me curious about other duplicate word occurrences, and after a
> > few minutes of increasingly-elaborate regexing to exclude false
> > positives, I found a few more (plus a bonus _missing_ "the").  See the
> > attached patch (which includes your original one, for completeness).
>
> Thanks, Looks good.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-09 Thread David Rowley
On Mon, 9 Oct 2023 at 21:17, David Rowley  wrote:
> Here are some more thoughts on how we could improve this:
>
> 1. Adjust the definition of StringInfoData.maxlen to define that -1
> means the StringInfoData's buffer is externally managed.
> 2. Adjust enlargeStringInfo() to add a check for maxlen = -1 and have
> it palloc, say, pg_next_pow2(str->len * 2) bytes and memcpy the
> existing (externally managed string) into the newly palloc'd buffer.
> 3. Add a new function along the lines of what I originally proposed to
> allow init of a StringInfoData using an existing allocated string
> which sets maxlen = -1.
> 4. Update all the existing places, including the ones I just committed
> (plus the ones you committed in ba1e066e4) to make use of the function
> added in #3.

I just spent the past few hours playing around with the attached WIP
patch to try to clean up the various places where we manually build
StringInfoDatas around the tree.

While working on this, I added an Assert in the new
initStringInfoFromStringWithLen function to ensure that data[len] ==
'\0' per the "There is guaranteed to be a terminating '\0' at
data[len]" comment in stringinfo.h.  It looks like we have some
existing breakers of this rule.

If you apply the attached patch to 608fd198de~1 and ignore the
rejected hunks from the deserial functions, you'll see an Assert
failure during 023_twophase_stream.pl

023_twophase_stream_subscriber.log indicates:
TRAP: failed Assert("data[len] == '\0'"), File:
"../../../../src/include/lib/stringinfo.h", Line: 97, PID: 1073141
postgres: subscriber: logical replication parallel apply worker for
subscription 16396 (ExceptionalCondition+0x70)[0x56160451e9d0]
postgres: subscriber: logical replication parallel apply worker for
subscription 16396 (ParallelApplyWorkerMain+0x53c)[0x5616043618cc]
postgres: subscriber: logical replication parallel apply worker for
subscription 16396 (StartBackgroundWorker+0x20b)[0x56160434452b]

So it seems like we have some existing issues with
LogicalParallelApplyLoop(). The code there does not properly NUL
terminate the StringInfoData.data field. There are some examples in
exec_bind_message() of how that could be fixed. I've CC'd Amit to let
him know about this.
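
One possible shape of a fix, following the exec_bind_message() pattern
(a sketch only, using the variable names visible in the hunk below; the
real fix might instead make the sender include the terminator, avoiding
the extra copy per message):

	/* Copy the queue message into a NUL-terminated private buffer. */
	char	   *buf = palloc(len + 1);

	memcpy(buf, data, len);
	buf[len] = '\0';	/* restore the data[len] == '\0' invariant */
	initStringInfoFromStringWithLen(&s, buf, len);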

I'll also need to revert 608fd198 as this also highlights that setting
the StringInfoData.data to point to a bytea Datum can't be done either
as those aren't NUL terminated strings.

If people think it's worthwhile having something like the attached to
try to eliminate our need to manually build StringInfoDatas then I can
spend more time on it once LogicalParallelApplyLoop() is fixed.

David
diff --git a/src/backend/replication/logical/applyparallelworker.c b/src/backend/replication/logical/applyparallelworker.c
index 82f48a488e..fdfbd45e2c 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -774,10 +774,7 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
if (len == 0)
elog(ERROR, "invalid message length");
 
-   s.cursor = 0;
-   s.maxlen = -1;
-   s.data = (char *) data;
-   s.len = len;
+   initStringInfoFromStringWithLen(&s, data, len);
 
/*
	 * The first byte of messages sent from leader apply worker to
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index d52c8963eb..0d65e2836b 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
/* Read the data */
for (i = 0; i < natts; i++)
{
+   char   *buff;
charkind;
int len;
StringInfo  value = &tuple->colvalues[i];
@@ -899,19 +900,17 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
len = pq_getmsgint(in, 4);  /* read length */
 
/* and data */
-   value->data = palloc(len + 1);
-   pq_copymsgbytes(in, value->data, len);
+   buff = palloc(len + 1);
+   pq_copymsgbytes(in, buff, len);
 
/*
	 * Not strictly necessary for LOGICALREP_COLUMN_BINARY, but
 * per StringInfo practice.
 */
-   value->data[len] = '\0';
+   buff[len] = '\0';
 
-   /* make StringInfo fully valid */
-   value->len = len;
-   value->cursor = 0;
-  

Re: Draft LIMIT pushdown to Append and MergeAppend patch

2023-10-09 Thread David Rowley
On Mon, 9 Oct 2023 at 23:35, Ashutosh Bapat
 wrote:
>
> On Mon, Oct 9, 2023 at 6:25 AM David Rowley  wrote:
> >
> > However, it may also be worth you reading over [3] and the ultimate
> > reason I changed my mind on that being a good idea. Pushing LIMITs
> > below an Append seems quite incomplete when we don't yet push sorts
> > below Appends, which is what that patch did.
>
> When the paths are already ordered according to ORDER BY
> specification, pushing down LIMIT will give them extra benefit of
> being cost effective. Do you think we can proceed along those lines?
> Later when we implement Sorting push down we will adjust the LIMIT
> pushdown code for the same.

What are the benefits if the paths are already ordered?  E.g. if it's
an index scan, then we'll only pull the tuples we need from it.

I think if we did manage to get something working to push Sorts below
Appends then ExecSetTupleBound() would take care of most of the limit
problem (with the exception of FDWs).  If you look at
ExecSetTupleBound() you'll see that it does recursively descend into
AppendStates and set the bound on the append children.  That's not
going to work when the Sort is above the Append. So isn't it mostly
the work_mem * n_append_children concern that is holding us up here?

> We have that problem with partitionwise join. Have you seen it in the
> field? I have not seen such reports but that could be because not many
> know the partitionwise join needs to be explicitly turned ON. The
> solution we will develop here will solve problem with partitionwise
> join as well. It's hard to solve this problem. If there's a real case
> where LIMIT pushdown helps without fixing Sort pushdown case, it might
> help proceeding with the same.

I've not heard anything about that.  What I saw were just complaints
about the planner being too slow to produce a plan which accessed well
in excess of the number of partitions that we recommend in the
partitioning best practices documents.

David




Re: Crash in add_paths_to_append_rel

2023-10-09 Thread David Rowley
On Mon, 9 Oct 2023 at 22:49, Richard Guo  wrote:
> I came across a crash in add_paths_to_append_rel() with the query below.

> It was introduced by commit a8a968a8 where we referenced
> cheapest_startup_path->param_info without first checking that we have a
> valid cheapest_startup_path.

Thank you for the report.

Since you've managed to attribute this to a specific commit, for the
future, I think instead of opening a new thread, it would be more
useful to communicate the issue on the thread that's linked in the
commit message.

I will look at this tomorrow.

David




Re: Synchronizing slots from primary to standby

2023-10-09 Thread shveta malik
On Mon, Oct 2, 2023 at 4:29 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shveta,
>
> Thank you for updating the patch!

Thanks for the feedback Kuroda-san. I have addressed most of these in
v22. Please find my comments inline.

>
> I found another ERROR due to the slot removal. Is this a real issue?
>
> 1. applied add_sleep.txt, which emulated the case where the tablesync worker
>got stuck and the primary crashed during the
>initial sync.
> 2. executed test_0925_v2.sh (You attached in [1])
> 3. secondary could not start the logical replication because the slot was not
>created (log files were also attached).
>
>
> Here is my analysis. The cause is that the slotsync worker aborts the slot
> creation on the secondary server because the restart_lsn of the secondary is
> ahead of the primary's.
> IIUC this can occur when tablesync workers finish the initial copy before
> walsenders stream changes. In this case, the relstate of the worker is set to
> SUBREL_STATE_CATCHUP and the apply worker waits till the relation becomes
> SUBREL_STATE_SYNCDONE. From here the slot on the primary will not be updated
> until the relation is caught up. If some changes come in and the primary
> crashes at that time, the syncslot worker will abort the slot creation.
>
>
> Anyway, the following are my comments.
> I have not checked detailed conventions yet; that should be done at a later stage.
>
>
> 
>
> For 0001:
> ===
>
> WalSndWaitForStandbyConfirmation()
>
> ```
> +   /* If postmaster asked us to stop, don't wait anymore */
> +   if (got_STOPPING)
> +   break;
> ```
>
> I have considered this again, and it may still have an issue: logical
> walsenders may break from the loop before physical walsenders send WAL.
> This may occur because both physical and logical walsenders would get
> PROCSIG_WALSND_INIT_STOPPING.
>
> I think a function like WalSndWaitStopping() is needed, which waits until
> physical walsenders reach WALSNDSTATE_STOPPING or exit. Thoughts?
>
> WalSndWaitForStandbyConfirmation()
>
> ```
> +   standby_slot_cpy = list_copy(standby_slot_names_list);
> ```
>
> I found that standby_slot_names_list and standby_slot_cpy would not be updated
> even if the GUC was updated. Is this acceptable? Won't that still happen after
> you refactor the patch?
> What would happen when synchronize_slot_names is updated on the secondary
> while the primary executes this?
>
> WalSndWaitForStandbyConfirmation()
>
> ```
> +
> +   goto retry;
> ```

Yes, there could be a problem here. I will review it. Allow some more
time for this.

>
> I checked other "goto retry;" uses, but I could not find a pattern where no
> return clause exists after the goto (exception: void functions). I also think
> the current style seems a bit strange. How about using an outer loop like
> while (list_length(standby_slot_cpy))?
>
> =
>
> slot.h
>
> ```
> +extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn);
> ```
>
> WaitForStandbyLSN() does not exist.
>

Done.

> 
>
> For 0002:
> =
>
> General
>
> The patch requires that primary_conninfo contain the dbname, but this
> conflicts with the documentation. It says:
>
> ```
> ...Do not specify a database name in the primary_conninfo string.
> ```
>
> I confirmed [^a] that it is harmless for primary_conninfo to contain a dbname,
> but at least the description must be fixed.
>

Done.

> General
>
> I found that the primary server outputs a huge amount of logs when
> log_min_duration_statement = 0.
> This is because the slotsync worker sends an SQL query every 10ms, in
> wait_for_primary_slot_catchup().
> Is there any good way to suppress it? Or should we be patient?
>

I will review it to see if we can do anything here.

> =
>
> ```
> +{ oid => '6312', descr => 'what caused the replication slot to become 
> invalid',
> ```
>
> How did you determine the oid? IIRC, developing features should use OIDs in
> the range 8000-9999. See src/include/catalog/unused_oids.
>

Corrected it.

> =
>
> LogicalRepCtxStruct
>
> ```
> /* Background workers. */
> +   SlotSyncWorker *ss_workers; /* slot-sync workers */
> LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
> ```
>
> It's OK for now, but can we combine them into one array? IIUC both kinds of
> processes cannot exist at the same time and they have the same components,
> so they could share the array. It would save an attribute but might make
> the code harder to read.

I feel it will add to more confusion and should be kept separate.

>
> WaitForReplicationWorkerAttach() and logicalrep_worker_stop_internal()
>
> I could not find cases that have "LWLock *" as an argument (exception:
> functions in lwlock.c).
> Is it sufficient to check RecoveryInProgress() instead of passing it as
> an argument?
>

I feel it should be argument-based. If not lock-based, then perhaps a
different argument.  Let us say it is in the process of starting a worker,
the start failed, and now it wants to do cleanup. It should d

Re: Draft LIMIT pushdown to Append and MergeAppend patch

2023-10-09 Thread Ashutosh Bapat
On Mon, Oct 9, 2023 at 4:33 PM David Rowley  wrote:
>
> On Mon, 9 Oct 2023 at 23:35, Ashutosh Bapat
>  wrote:
> >
> > On Mon, Oct 9, 2023 at 6:25 AM David Rowley  wrote:
> > >
> > > However, it may also be worth you reading over [3] and the ultimate
> > > reason I changed my mind on that being a good idea. Pushing LIMITs
> > > below an Append seems quite incomplete when we don't yet push sorts
> > > below Appends, which is what that patch did.
> >
> > When the paths are already ordered according to ORDER BY
> > specification, pushing down LIMIT will give them extra benefit of
> > being cost effective. Do you think we can proceed along those lines?
> > Later when we implement Sorting push down we will adjust the LIMIT
> > pushdown code for the same.
>
> What are the benefits if the paths are already ordered?  e.g. if it's
> an index scan then we'll only pull the tuples we need from it.
>

postgres_fdw creates a path with pathkeys based on the clauses on that
relation. There is no index involved there. Pushing down LIMIT will
limit the number of rows fetched from the foreign server and the
foreign server may have opportunity to optimize query based on the
LIMIT.

> > We have that problem with partitionwise join. Have you seen it in the
> > field? I have not seen such reports but that could be because not many
> > know the partitionwise join needs to be explicitly turned ON. The
> > solution we will develop here will solve problem with partitionwise
> > join as well. It's hard to solve this problem. If there's a real case
> > where LIMIT pushdown helps without fixing Sort pushdown case, it might
> > help proceeding with the same.
>
> I've not heard anything about that.  What I saw were just complaints
> about the planner being too slow to produce a plan which accessed well
> in excess of the number of partitions that we recommend in the
> partitioning best practices documents.

I am not able to relate the planner slowness problem to the work_mem *
number of partitions problem. I agree that both of them are problems,
but they are different ones.

-- 
Best Wishes,
Ashutosh Bapat




Re: Draft LIMIT pushdown to Append and MergeAppend patch

2023-10-09 Thread Michał Kłeczek


> On 9 Oct 2023, at 15:04, Ashutosh Bapat  wrote:
> 
> On Mon, Oct 9, 2023 at 4:33 PM David Rowley  wrote:
>> 
>> What are the benefits if the paths are already ordered?  e.g. if it's
>> an index scan then we'll only pull the tuples we need from it.
>> 
> 
> postgres_fdw creates a path with pathkeys based on the clauses on that
> relation. There is no index involved there. Pushing down LIMIT will
> limit the number of rows fetched from the foreign server and the
> foreign server may have opportunity to optimize query based on the
> LIMIT.

I would add another benefit:
the opportunity to fetch everything early, buffer it, and release the session.

Without limit information the FDW has to keep cursors open.


—
Michal

Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-10-09 Thread Amul Sul
On Fri, Oct 6, 2023 at 6:03 PM Peter Eisentraut 
wrote:

> On 28.08.23 11:54, Amul Sul wrote:
> > Thanks for the review comments, I have fixed those in the attached
> > version. In addition to that, I extended the syntax to have the STORE
> > keyword as suggested by Vik.
>
> An additional comment: When you change the generation expression, you
> need to run ON UPDATE triggers on all rows, if there are any triggers
> defined.  That is because someone could have triggers defined on the
> column to either check for valid values or propagate values somewhere
> else, and if the expression changes, that is kind of like an UPDATE.
>
> Similarly, I think we should consider how logical decoding should handle
> this operation.  I'd imagine it should generate UPDATE events on all
> rows.  A test case in test_decoding would be useful.
>

If I am not mistaken, the existing table rewrite facilities for ALTER TABLE
don't have support for running triggers or generating an event for each row,
right?

Do you expect to write new code to handle this rewriting?

Regards,
Amul


Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-10-09 Thread Alexander Lakhin

Hi,

13.08.2023 00:00, Andres Freund wrote:

Hi,

On 2023-08-12 15:50:24 +1200, Thomas Munro wrote:

Thanks.  I realised that it's easy enough to test that theory about
cleanup locks by hacking ConditionalLockBufferForCleanup() to return
false randomly.  Then the test occasionally fails as described.  Seems
like we'll need to fix that test, but it's not evidence of a server
bug, and my signal handler refactoring patch is in the clear.  Thanks
for testing it!

WRT fixing the test: I think just using VACUUM FREEZE ought to do the job?
After changing all the VACUUMs to VACUUM FREEZEs, 031_recovery_conflict.pl
passes even after I make ConditionalLockBufferForCleanup() fail 100%.



I happened to encounter another failure of 031_recovery_conflict, on a
rather slow ARMv7:
t/031_recovery_conflict.pl  13/? # poll_query_until timed out 
executing this query:
#
# SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
#
# expecting this output:
# waiting
# last actual query output:
#
# with stderr:
t/031_recovery_conflict.pl  14/?
#   Failed test 'startup deadlock: lock acquisition is waiting'
#   at t/031_recovery_conflict.pl line 261.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 15.
t/031_recovery_conflict.pl  Dubious, test returned 255 (wstat 
65280, 0xff00)
Failed 1/15 subtests

I've managed to reproduce it with the attached test patch, on a regular
x86_64 VM with CPU slowed down to 2%, so that the modified test runs for
5+ minutes.
...
t/031_recovery_conflict.pl .. 102/? # iteration 31
t/031_recovery_conflict.pl .. 104/? # iteration 32
t/031_recovery_conflict.pl .. 106/?
<... waiting for something ...>

031_recovery_conflict_standby.log contains:
2023-10-09 12:43:14.821 UTC [145519] 031_recovery_conflict.pl LOG: statement: 
BEGIN;
2023-10-09 12:43:14.821 UTC [145519] 031_recovery_conflict.pl LOG: statement: DECLARE test_recovery_conflict_cursor 
CURSOR FOR SELECT a FROM test_recovery_conflict_table1;
2023-10-09 12:43:14.822 UTC [145519] 031_recovery_conflict.pl LOG: statement: FETCH FORWARD FROM 
test_recovery_conflict_cursor;

2023-10-09 12:43:14.822 UTC [145519] 031_recovery_conflict.pl LOG: statement: 
SELECT * FROM test_recovery_conflict_table2;
2023-10-09 12:43:15.039 UTC [145513] LOG:  recovery still waiting after 10.217 
ms: recovery conflict on buffer pin
2023-10-09 12:43:15.039 UTC [145513] CONTEXT:  WAL redo at 0/35ADD38 for Heap2/PRUNE: snapshotConflictHorizon: 0, 
nredirected: 0, ndead: 100, nunused: 0, redirected: [], dead: [2, 3, 4, ... , 99, 100, 101], unused: []; blkref #0: rel 
1663/16385/16489, blk 0
2023-10-09 12:43:15.039 UTC [145519] 031_recovery_conflict.pl ERROR:  canceling statement due to conflict with recovery 
at character 15
2023-10-09 12:43:15.039 UTC [145519] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with 
recovery.

2023-10-09 12:43:15.039 UTC [145519] 031_recovery_conflict.pl STATEMENT:  
SELECT * FROM test_recovery_conflict_table2;
2023-10-09 12:43:15.039 UTC [145513] LOG:  recovery finished waiting after 
10.382 ms: recovery conflict on buffer pin
2023-10-09 12:43:15.039 UTC [145513] CONTEXT:  WAL redo at 0/35ADD38 for Heap2/PRUNE: snapshotConflictHorizon: 0, 
nredirected: 0, ndead: 100, nunused: 0, redirected: [], dead: [2, 3, 4, ... , 99, 100, 101], unused: []; blkref #0: rel 
1663/16385/16489, blk 0
2023-10-09 12:43:15.134 UTC [145527] 031_recovery_conflict.pl LOG: statement: SELECT 'waiting' FROM pg_locks WHERE 
locktype = 'relation' AND NOT granted;
2023-10-09 12:43:15.647 UTC [145530] 031_recovery_conflict.pl LOG: statement: SELECT 'waiting' FROM pg_locks WHERE 
locktype = 'relation' AND NOT granted;
2023-10-09 12:43:15.981 UTC [145532] 031_recovery_conflict.pl LOG: statement: SELECT 'waiting' FROM pg_locks WHERE 
locktype = 'relation' AND NOT granted;

...
2023-10-09 12:51:24.729 UTC [149175] 031_recovery_conflict.pl LOG: statement: SELECT 'waiting' FROM pg_locks WHERE 
locktype = 'relation' AND NOT granted;
2023-10-09 12:51:25.969 UTC [145514] FATAL:  could not receive data from WAL stream: server closed the connection 
unexpectedly

    This probably means the server terminated abnormally
    before or while processing the request.

Indeed, t_031_recovery_conflict_standby_data/pgdata/pg_wal/
00010003 contains:
...
rmgr: Heap    len (rec/tot): 59/    59, tx:    972, lsn: 0/035ADB18, prev 0/035ADAD8, desc: INSERT off: 100, 
flags: 0x00, blkref #0: rel 1663/16385/16489 blk 0
rmgr: Heap    len (rec/tot): 59/    59, tx:    972, lsn: 0/035ADB58, prev 0/035ADB18, desc: INSERT off: 101, 
flags: 0x00, blkref #0: rel 1663/16385/16489 blk 0
rmgr: Transaction len (rec/tot): 34/    34, tx:    972, lsn: 0/035ADB98, prev 0/035ADB58, desc: ABORT 2023-10-09 
12:43:13.797513 UTC
rmgr: Standby len (rec/tot): 42/    42, tx:    973, lsn: 0/035

Re: On login trigger: take three

2023-10-09 Thread Alexander Korotkov
Hi!

On Tue, Oct 3, 2023 at 8:35 PM Alexander Korotkov  wrote:
> Thank you for the interesting ideas. I'd like to try to revive the
> version with the flag in pg_database.  Will use other ideas as backup
> if no success.

I've revived the patch version with pg_database.dathasloginevt flag.
I took v32 version [1] and made the following changes.

 * Incorporate enhancements made on the flagless version of the patch.
 * Read dathasloginevt during InitPostgres() to prevent extra catalog
access and, even more notably, StartTransactionCommand() when there are
no login triggers.
 * Hold a lock while setting the pg_database.dathasloginevt flag (the v32
version actually didn't prevent the race condition).
 * Fix AlterEventTrigger() to check the event name, not the trigger name.
 * Acquire a conditional lock while resetting the pg_database.dathasloginevt
flag to prevent a new database connection from hanging while waiting for
another transaction to finish.

This version should be good and has no overhead.  Any thoughts?
Daniel, could you please re-run the performance tests?

Links
1. 
https://www.postgresql.org/message-id/CAMEv5_vDjceLr54WUCNPPVsJs8WBWWsRW826VppNEFoLC1LAEw%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Jelte Fennema
On Wed, 4 Oct 2023 at 21:10, Robert Haas  wrote:
>
> On Wed, Oct 4, 2023 at 10:17 AM Peter Eisentraut  wrote:
> > I think intuitively, this facility ought to work like client_encoding.
>
> I hadn't really considered client_encoding as a precedent for this
> setting. A lot of my discomfort with the proposed mechanism also
> applies to client_encoding, namely, suppose you call some function or
> procedure or whatever and it changes client_encoding on your behalf
> and now your communication with the server is all screwed up. That
> seems very unpleasant. Yet it's also existing behavior. I think one
> could conclude on these facts either that (a) client_encoding is fine
> and the problems with controlling behavior using that kind of
> mechanism are mostly theoretical or (b) that we messed up with
> client_encoding and shouldn't add any more mistakes of the same ilk or
> (c) that we should really be looking at redesigning the way
> client_encoding works, too.

With my PgBouncer maintainer hat on: I think the GUC approach would be
quite alright, i.e. option (a). The nice thing is that it would be
very simple to make it work with connection poolers, because the same
approach could be reused that is currently used for client_encoding.
NOTE: This does require that the new GUC has the GUC_REPORT flag set
(just like client_encoding). By adding the GUC_REPORT flag clients
could also take into account any changes to the setting even when they
did not change it themselves (simplest way to handle a change would be
by throwing an error and closing the connection).

To clarify how PgBouncer currently handles client_encoding: for each
client, PgBouncer keeps track of the current value of a list of GUCs,
one of which is client_encoding. This is done by listening for the
ParameterStatus responses it gets from the server while the client is
connected. Then, if a client is later assigned another server connection
and that server has different values for (some of) these GUCs, some SET
commands are sent to set the GUCs correctly before the client's query is
actually forwarded.
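
A rough sketch of that bookkeeping, with invented names rather than actual
PgBouncer internals (the "binary_formats" GUC name is hypothetical):

#include <stdio.h>
#include <string.h>

#define NTRACKED 2

/* Hypothetical list of GUCs the pooler mirrors per connection. */
static const char *tracked[NTRACKED] = {"client_encoding", "binary_formats"};

typedef struct ConnState
{
	char		values[NTRACKED][64];
} ConnState;

/* Called for every ParameterStatus ('S') message from the server. */
static void
on_parameter_status(ConnState *cs, const char *name, const char *value)
{
	for (int i = 0; i < NTRACKED; i++)
		if (strcmp(tracked[i], name) == 0)
			snprintf(cs->values[i], sizeof(cs->values[i]), "%s", value);
}

/* Before handing a server connection to a client, emit SET commands
 * for every tracked GUC whose value differs. */
static void
sync_guc_state(const ConnState *client, const ConnState *server)
{
	for (int i = 0; i < NTRACKED; i++)
		if (strcmp(client->values[i], server->values[i]) != 0)
			printf("SET %s = '%s';\n", tracked[i], client->values[i]);
}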

The resultFormat = 3 trick might be nice for backwards compatibility
of clients. That way old clients would continue to get text or binary
output even when the new GUC is set. To be clear resultFormat=3 would
mean: Use binary format when the new GUC indicates that it should.
Upthread I see that Dave mentioned that this would require an extra
Describe, but I don't understand why. If you set 3 for all columns and
you know the value of the GUC, then you know which columns will be
encoded in binary.
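
In driver terms, the idea could look roughly like this; format code 3, the
GUC, and the helper are all hypothetical at this point:

#include <stdbool.h>
#include <stdint.h>

/* Stand-in for "does the (proposed) GUC currently list this type OID as
 * binary-capable", mirrored client-side via ParameterStatus. */
static bool
guc_lists_oid_as_binary(uint32_t type_oid)
{
	return type_oid == 16 || type_oid == 23;	/* e.g. bool, int4 */
}

/* Decide the actual wire format for one result column when the Bind
 * message asked for the speculative format code 3. */
static int
effective_format(int requested_format, uint32_t type_oid)
{
	if (requested_format != 3)
		return requested_format;	/* 0 = text, 1 = binary, as today */
	return guc_lists_oid_as_binary(type_oid) ? 1 : 0;
}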




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Jelte Fennema
On Fri, 6 Oct 2023 at 13:11, Peter Eisentraut  wrote:
>
> On 04.10.23 20:30, Dave Cramer wrote:
> > We need to
> > figure out what is the right way to globally identify types, like
> > either
> > by fully-qualified name, by base name, some combination, how does it
> > work with extensions, or do we need a new mechanism like UUIDs.  I
> > think
> > that is something we need to work out, no matter which protocol
> > mechanism we end up using.
> >
> >
> > So how is this different than the GUC that I proposed ?
>
> The last patch I see from you in this thread uses OIDs, which I have
> argued is not the right solution.

Since the protocol already returns OIDs in the ParameterDescription
and RowDescription messages I don't see why using OIDs for this GUC
would cause any additional problems. Clients already need to know OIDs
and how to encode/decode them. So I don't see a big reason why we
should allow passing in "schema"."type" as well. Clients already need
a mapping from typename to OID for user defined types to be able to
parse ParameterDescription and RowDescription messages.

With my Citus hat on: I would very much like something like the UUID
or typename approach. With Citus the same user defined type can have
different OIDs on each of the servers in the cluster. So it sounds
like currently using a transaction pooler that does load balancing
across the workers in the cluster would cause issues for user defined
types. Having a cluster global unique identifier for a type within a
database would be able to solve those issues. But that would require
that the protocol actually sent those cluster global unique
identifiers instead of OIDs. As far as I can tell similar issues would
be present with zero-downtime upgrades using pg_upgrade + logical
replication, and probably also in solutions like BDR. i.e. this is an
issue when clients get transparently re-connected to a new host where
an OIDs of user defined types might be different.

So I think OIDs are a good choice for the newly proposed GUC, because
that's what the protocol uses currently. But I do think it would be
good to keep in mind what it would look like if we'd change the
protocol to report and accept UUIDs/typenames instead of OIDs.
UUIDs/typenames and OIDs have a clearly different string
representation though. So, I think we could easily expand the new GUC
to support both OIDs and UUIDs/typenames when we change the protocol
to do so too, even when supporting just OIDs now.




Re: pg_resetwal: Corrections around -c option

2023-10-09 Thread Alvaro Herrera
On 2023-Sep-28, Peter Eisentraut wrote:

> Branching off from [0], here is a for-discussion patch about some
> corrections for the pg_resetwal -c (--commit-timestamp-ids) option.
> 
> First, in the documentation for finding a manual value for the -c option
> based on the files present in the data directory, it was missing a
> multiplier, like for the other SLRU-based values, and also missing the
> mention of adding one for the upper value.  The value I came up with is
> computed as
> 
> SLRU_PAGES_PER_SEGMENT * COMMIT_TS_XACTS_PER_PAGE = 13088 = 0x3320

Hmm, not sure about this.  SLRU_PAGES_PER_SEGMENT is 32, and
COMMIT_TS_XACTS_PER_PAGE is 819, so this formula gives decimal 26208 =
0x6660.  But more generally, I'm not sure about the algorithm.  Really,
the safe value also depends on how large the latest file actually is;
e.g., if your numerically greatest file is only 32kB long (four pages)
then you can't specify values that refer to Xids in pages 5 and beyond,
because those pages will not have been zeroed into existence yet, so
you'll get an error:
  ERROR:  could not access status of transaction 55692
  DETAIL:  Could not read from file "pg_commit_ts/0002" at offset 32768: read 
too few bytes.
I think a useful value can be had by multiplying 26208 by the latest
*complete* file number, then if there is an incomplete last file, add
819 multiplied by the number of pages in it.
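
To make the recipe concrete, here is a back-of-envelope computation under
assumed file contents; the per-page and per-segment constants are the ones
cited above, while the file counts are made up:

#include <stdio.h>

#define SLRU_PAGES_PER_SEGMENT		32
#define COMMIT_TS_XACTS_PER_PAGE	819

int
main(void)
{
	long	complete_files = 2;	/* assume pg_commit_ts/0000 and 0001 are full */
	long	pages_in_last = 4;	/* assume 0002 exists but is only 32 kB = 4 pages */

	long	xacts_per_file = SLRU_PAGES_PER_SEGMENT * COMMIT_TS_XACTS_PER_PAGE;
	long	first_bad = complete_files * xacts_per_file
		+ pages_in_last * COMMIT_TS_XACTS_PER_PAGE;

	printf("xacts per full file: %ld\n", xacts_per_file);	/* 26208 */
	/* Prints 55692 here, matching the XID in the example error above;
	 * only XIDs below this value refer to pages that actually exist. */
	printf("first unaddressable XID: %ld\n", first_bad);

	return 0;
}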

Also, "numerically greatest" is the wrong thing in case you've recently
wrapped XIDs around but the oldest files (before the wraparound) are
still present.  You really want the "logically latest" files.  (I think
that'll coincide with the files having the latest modification times.)

Not sure how to write this concisely, though.  Should we really try?

(I think the number 13088 appeared somewhere in connection with
multixacts.  Maybe there was a confusion with that.)

> Second, the present pg_resetwal code hardcodes the minimum value as 2, which
> is FrozenTransactionId, but it's not clear why that is allowed. Maybe we
> should change that to FirstNormalTransactionId, which matches other
> xid-related options in pg_resetwal.

Yes, that's clearly a mistake I made at the last minute: in [1] I posted
this patch *without* the test for 2, and when I pushed the patch two
days later, I had introduced that without any further explanation.

BTW if you `git show 666e8db` (which is the SHA1 identifier for
pg_resetxlog.c that appears in the patch I posted back then) you'll see
that the existing code did not have any similar protection for valid XID
values.  The tests to FirstNormalTransactionId for -u and -x were
introduced by commit 74cf7d46a91d, seven years later -- that commit both
introduced -u as a new feature, and hardened the tests for -x, which was
previously only testing for zero.

[1] https://postgr.es/m/20141201223413.gh1...@alvh.no-ip.org

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes"   (Germán Poo)




Re: Logging parallel worker draught

2023-10-09 Thread Imseih (AWS), Sami
Hi,

This thread has been quiet for a while, but I'd like to share some
thoughts.

+1 to the idea of improving visibility into parallel worker saturation.
But overall, we should improve parallel processing visibility, so DBAs can
detect trends in parallel usage (is the workload doing more parallel work,
or less) and have enough data to either tune the workload or change
parallel GUCs.

>> We can output this at the LOG level to avoid running the server at
>> DEBUG1 level. There are a few other cases where we are not able to
>> spawn the worker or process and those are logged at the LOG level. For
>> example, "could not fork autovacuum launcher process .." or "too many
>> background workers". So, not sure, if this should get a separate
>> treatment. If we fear this can happen frequently enough that it can
>> spam the LOG then a GUC may be worthwhile.


> I think we should definitely be afraid of that. I am in favor of a separate 
> GUC.

Currently EXPLAIN (ANALYZE) will give you "Workers Planned"
and "Workers Launched". Logging this via auto_explain is possible, so I am
not sure we need additional GUCs or debug levels for this info.

   ->  Gather  (cost=10430.00..10430.01 rows=2 width=8) (actual time=131.826..134.325 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2


>> What I was wondering was whether we would be better off putting this
>> into the statistics collector, vs. doing it via logging. Both
>> approaches seem to have pros and cons.
>>
>> I think it could be easier for users to process the information if it
>> is available via some view, so there is a benefit in putting this into
>> the stats subsystem.


> Unless we do this instead.

Adding cumulative stats is a much better idea.

3 new columns can be added to pg_stat_database:
workers_planned, 
workers_launched,
parallel_operations - There could be more than 1 operation
per query, if for example there are multiple Parallel Gather
operations in a plan.

With these columns, monitoring tools can trend whether more or less
parallel work is happening over time (by looking at parallel operations)
and whether the workload is suffering from parallel saturation:
workers_launched/workers_planned < 1 means there is a lack
of available worker processes.

Also, we could add this information at a per-query level as well
in pg_stat_statements, but this can be taken up in a separate
discussion.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)




Re: should frontend tools use syncfs() ?

2023-10-09 Thread Nathan Bossart
On Mon, Oct 09, 2023 at 01:12:16PM +0300, Maxim Orlov wrote:
> On Fri, 6 Oct 2023 at 22:35, Nathan Bossart 
> wrote:
>> From a quick skim, this one looks pretty good to me.  Would you mind adding
>> it to the commitfest so that it doesn't get lost?  I will aim to take a
>> closer look at it next week.
> 
> Sounds good, thanks a lot!
> 
> https://commitfest.postgresql.org/45/4609/

Thanks.  I've made a couple of small changes, but otherwise I think this
one is just about ready.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 481efed2b47ab1cf37308a789698069d1a84561b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 9 Oct 2023 11:07:29 -0500
Subject: [PATCH v12 1/1] Align names of variables, etc. in
 wal_sync_method-related code.

Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c   | 58 ++---
 src/backend/storage/file/fd.c   |  4 +-
 src/backend/utils/misc/guc_tables.c |  8 ++--
 src/include/access/xlog.h   | 15 +---
 src/include/access/xlogdefs.h   |  8 ++--
 src/include/port/freebsd.h  |  2 +-
 src/include/port/linux.h|  2 +-
 src/include/utils/guc_hooks.h   |  2 +-
 src/tools/pgindent/typedefs.list|  1 +
 9 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fcbde10529..a0acd67772 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -130,7 +130,7 @@ bool	   *wal_consistency_checking = NULL;
 bool		wal_init_zero = true;
 bool		wal_recycle = true;
 bool		log_checkpoints = true;
-int			sync_method = DEFAULT_SYNC_METHOD;
+int			wal_sync_method = DEFAULT_WAL_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_REPLICA;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
@@ -171,17 +171,17 @@ static bool check_wal_consistency_checking_deferred = false;
 /*
  * GUC support
  */
-const struct config_enum_entry sync_method_options[] = {
-	{"fsync", SYNC_METHOD_FSYNC, false},
+const struct config_enum_entry wal_sync_method_options[] = {
+	{"fsync", WAL_SYNC_METHOD_FSYNC, false},
 #ifdef HAVE_FSYNC_WRITETHROUGH
-	{"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false},
+	{"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false},
 #endif
-	{"fdatasync", SYNC_METHOD_FDATASYNC, false},
+	{"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false},
 #ifdef O_SYNC
-	{"open_sync", SYNC_METHOD_OPEN, false},
+	{"open_sync", WAL_SYNC_METHOD_OPEN, false},
 #endif
 #ifdef O_DSYNC
-	{"open_datasync", SYNC_METHOD_OPEN_DSYNC, false},
+	{"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false},
 #endif
 	{NULL, 0, false}
 };
@@ -2328,8 +2328,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 		 * have no open file or the wrong one.  However, we do not need to
 		 * fsync more than one file.
 		 */
-		if (sync_method != SYNC_METHOD_OPEN &&
-			sync_method != SYNC_METHOD_OPEN_DSYNC)
+		if (wal_sync_method != WAL_SYNC_METHOD_OPEN &&
+			wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC)
 		{
 			if (openLogFile >= 0 &&
 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
@@ -2959,7 +2959,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 */
 	*added = false;
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
-	   get_sync_bit(sync_method));
+	   get_sync_bit(wal_sync_method));
 	if (fd < 0)
 	{
 		if (errno != ENOENT)
@@ -3124,7 +3124,7 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
 
 	/* Now open original target segment (might not be file I just made) */
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
-	   get_sync_bit(sync_method));
+	   get_sync_bit(wal_sync_method));
 	if (fd < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
@@ -3356,7 +3356,7 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli)
 	XLogFilePath(path, tli, segno, wal_segment_size);
 
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
-	   get_sync_bit(sync_method));
+	   get_sync_bit(wal_sync_method));
 	if (fd < 0)
 		ereport(PANIC,
 (errcode_for_file_access(),
@@ -8118,16 +8118,16 @@ get_sync_bit(int method)
 			 * not included in the enum option array, and therefore will never
 			 * be seen here.
 			 */
-		case SYNC_METHOD_FSYNC:
-		case SYNC_METHOD_FSYNC_WRITETHROUGH:
-		case SYNC_METHOD_FDATASYNC:
+		case WAL_SYNC_METHOD_FSYNC:
+		case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
+		case WAL_SYNC_METHOD_FDATASYNC:
 			return o_direct_flag;
 #ifdef O_SYNC
-		case SYNC_METHOD_OPEN:
+		case WAL_SYNC_METHOD_OPEN:
 			return O_SYNC | o_direct_flag;
 #endif
 #ifdef O_DSYNC
-		case SYNC_METHOD_OPEN_DSYNC:
+		case WAL_SYNC_METHOD_OPEN_DSYNC:
 			return O_DSYNC | o_direct_flag;
 #endif
 		default:
@@ -8141,9 +8141,9 @@ get_sync_bit(int method)
  * GUC support
  */
 void
-assign_

Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Tom Lane
Michael Paquier  writes:
> Another thing is that we cannot rely on the PID returned by launch()
> as it could fail, so $worker3_pid needs to disappear.  If we do that,
> I'd rather just switch to a specific database for the tests with
> ALLOWCONN rather than reuse "mydb" that could have other workers.

Right.

> The
> attached fixes the issue for me.

Hmm.  This passed a few dozen test cycles on mamba's host,
but it seems to me there's still a race condition here:

$result = $node->safe_psql('postgres',
"SELECT count(*) FROM pg_stat_activity WHERE datname = 'noconndb';");
is($result, '0', 'dynamic bgworker without BYPASS_ALLOWCONN not started');

There will be a window where the worker has logged "database
"noconndb" is not currently accepting connections" but hasn't yet
exited, so that conceivably this query could see a positive count.

We could just drop this test, reasoning that the appearance of
the error message is sufficient evidence that the right thing
happened.  (If the failed worker is still around, it won't break
the remaining tests AFAICS.)  Or we could convert this to a
poll_query_until() loop.

regards, tom lane




Re: Fix output of zero privileges in psql

2023-10-09 Thread David G. Johnston
On Mon, Oct 9, 2023 at 1:29 AM Laurenz Albe 
wrote:

> On Sun, 2023-10-08 at 19:58 -0700, David G. Johnston wrote:
>
> > The built-in default privileges are only in effect if the object has not
> been
> > the target of a GRANT or REVOKE and also has not had its default
> privileges
> > modified using ALTER DEFAULT PRIVILEGES. (???: if it is possible to
> revert
> > back to the state of built-in privileges that would need to be described
> here.)
>
> I don't think that we need to mention ALTER DEFAULT PRIVILEGES there.  If
> the default privileges have been altered, the ACL will not be stored as
> NULL in the catalogs.
>

It's already mentioned there, I just felt bringing the mention of ADP and
grant/revoke both invalidating the built-in default privileges closer
together would be an improvement.


>
> > The above removes the parenthetical regarding null in the catalogs, this
> is
> > intentional as it seems that the goal here is to use psql instead of the
> > catalogs and adding its use of null being printed as the empty string
> just
> > seems likely to add confusion.
>
> To me, mentioning the default privileges are stored as NULLs in the
> catalogs
> is not an invitation to view the privileges with catalog queries, but
> information about implementation details that explains why default
> privileges
> are displayed the way they are.
>

Calling it an implementation detail leads me to conclude the point does not
belong in the user-facing administration documentation.

>
> > > > Perhaps it would also be good to mention this in the psql
> documentation.
> >
> > We've chosen a poor default and I'd rather inform the user of specific
> meta-commands
> > to be wary of this poor default and change it at the point they will be
> learning
> > about the meta-commands adversely affected.
> >
> > That said, I'd be willing to document that these commands, because they
> are affected
> > by empty string versus null, require a non-empty-string value for \pset
> null and will
> > choose the string '' for the duration of the meta-command's
> execution if the
> > user's setting is incompatible.
>
> I am not certain I understood you correctly.
> Are you advocating for adding a mention of "\pset null" to every backslash
> command
> that displays privileges?  That is excessive, in my opinion.
>

Yes, I am.  I suppose the argument that any user of those commands is
expected to have read the ddl/privileges chapter would suffice for me
though.

My point with the second paragraph is that we could, instead of documenting
the caveat about null printing as empty strings is to instead issue an
implicit "\pset null ''" as part of these commands, and a "\pset
null" afterward, conditioned upon it not already being set to a non-empty
value.  IOW, the special-casing we do today but actually do it in a way
that distinguishes the two cases instead of forcing them to be
indistinguishable.

David J.


Re: UUID v7

2023-10-09 Thread Nick Babadzhanian
On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin  wrote:
> Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But 
> we still can have such functions...maybe as an extension?

Do you know of any reason for that?

> However, so far I haven't figured out how to implement optional arguments for 
> catalog functions. I'd appreciate any pointers here.

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

Also, a shameless plug with my extension for UUID v1 that implements
extract and create from (and an opclass):
https://github.com/pgnickb/uuid_v1_ops




Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-09 Thread Tom Lane
David Rowley  writes:
> On Mon, 9 Oct 2023 at 21:17, David Rowley  wrote:
>> Here are some more thoughts on how we could improve this:
>> 
>> 1. Adjust the definition of StringInfoData.maxlen to define that -1
>> means the StringInfoData's buffer is externally managed.
>> 2. Adjust enlargeStringInfo() to add a check for maxlen = -1 and have
>> it palloc, say, pg_next_pow2(str->len * 2) bytes and memcpy the
>> existing (externally managed string) into the newly palloc'd buffer.
>> 3. Add a new function along the lines of what I originally proposed to
>> allow init of a StringInfoData using an existing allocated string
>> which sets maxlen = -1.
>> 4. Update all the existing places, including the ones I just committed
>> (plus the ones you committed in ba1e066e4) to make use of the function
>> added in #3.

Hm.  I'd be inclined to use maxlen == 0 as the indicator of a read-only
buffer, just because that would not create a problem if we ever want
to change it to an unsigned type.  Other than that, I agree with the
idea of using a special maxlen value to indicate that the buffer is
read-only and not owned by the StringInfo.  We need to nail down the
exact semantics though.
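
A sketch of what points 1 and 2 of that plan could look like, using
maxlen == 0 as the read-only marker as suggested here; these are stand-in
types and a stand-in enlarge function, not the real stringinfo.c:

#include <stdlib.h>
#include <string.h>

typedef struct StringInfoData
{
	char	   *data;
	int			len;
	int			maxlen;		/* 0 = read-only, externally owned buffer */
	int			cursor;
} StringInfoData;

/* Promote a read-only StringInfo to an owned, resizable buffer the
 * first time somebody tries to append to it. */
static void
enlarge_string_info(StringInfoData *str, int needed)
{
	if (str->maxlen == 0)
	{
		int		newlen = 64;

		while (newlen < str->len + needed + 1)
			newlen *= 2;			/* crude pg_next_pow2() stand-in */

		char	   *copy = malloc(newlen);

		memcpy(copy, str->data, str->len);
		copy[str->len] = '\0';		/* re-establish the NUL convention */
		str->data = copy;
		str->maxlen = newlen;
		return;
	}
	/* ... otherwise the normal doubling path, as in enlargeStringInfo() ... */
}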

> While working on this, I added an Assert in the new
> initStringInfoFromStringWithLen function to ensure that data[len] ==
> '\0' per the "There is guaranteed to be a terminating '\0' at
> data[len]" comment in stringinfo.h.  It looks like we have some
> existing breakers of this rule.

Ugh.  The point that 608fd198d also broke the terminating-nul
convention was something that occurred to me after sending
my previous message.  That's something we can't readily accommodate
within the concept of a read-only buffer, but I think we can't
give it up without risking a lot of obscure bugs.

> I'll also need to revert 608fd198 as this also highlights that setting
> the StringInfoData.data to point to a bytea Datum can't be done either
> as those aren't NUL terminated strings.

Yeah.  I would revert that as a separate commit and then think about
how we want to proceed, but I generally agree that there could be
value in the idea of a setup function that accepts a caller-supplied
buffer.

regards, tom lane




Re: UUID v7

2023-10-09 Thread Jelte Fennema
On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian  wrote:
>
> On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin  wrote:
> > Well, as far as I know, RFC discourages extracting timestamps from UUIDs. 
> > But we still can have such functions...maybe as an extension?
>
> Do you know of any reason for that?

No reasons are given but the RFC states this:

> UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT 
> examine the bits in a UUID to whatever extent is possible. However, where 
> necessary, inspectors should refer to Section 4 for more information on 
> determining UUID version and variant.

> > However, so far I haven't figured out how to implement optional arguments 
> > for catalog functions. I'd appreciate any pointers here.
>
> I'd argue that the time argument shouldn't be optional. Asking the
> user to supply time would force them to think whether they want to go
> with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:

> Implementations SHOULD use the current timestamp from a reliable source to 
> provide values that are time-ordered and continually increasing. Care SHOULD 
> be taken to ensure that timestamp changes from the environment or operating 
> system are handled in a way that is consistent with implementation 
> requirements. For example, if it is possible for the system clock to move 
> backward due to either manual adjustment or corrections from a time 
> synchronization protocol, implementations must decide how to handle such 
> cases. (See Altering, Fuzzing, or Smearing bullet below.)




Re: UUID v7

2023-10-09 Thread Andrey Borodin
On Mon, Oct 9, 2023 at 11:11 PM Jelte Fennema  wrote:
>
> I think using `now()` is quite prone to sequence rollover. With the
> current patch inserting more than 2^18~=0.26M rows into a table with
> `gen_uuid_v7()` as the default in a single transaction would already
> cause sequence rollover.

Well, the current patch will just use now()+1ms when the 2^18 counter is
exhausted. That would hold even if now() were passed as an argument
(though the current patch does not support an argument).
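
A toy model of that escape hatch (invented helper, not the patch's actual
code):

#include <stdint.h>

static uint64_t last_ms = 0;
static uint32_t seq = 0;		/* the 18-bit sub-millisecond counter */

/* Return the millisecond timestamp to stamp into the next UUIDv7.
 * When 2^18 = 262144 ids have been generated within one millisecond,
 * borrow the next millisecond rather than failing or blocking. */
static uint64_t
next_v7_ms(uint64_t now_ms)
{
	if (now_ms < last_ms)
		now_ms = last_ms;			/* clock moved backwards: stay monotonic */
	if (now_ms == last_ms && ++seq >= (1u << 18))
	{
		seq = 0;
		now_ms = last_ms + 1;		/* the "now()+1ms" behaviour described above */
	}
	else if (now_ms > last_ms)
		seq = 0;
	last_ms = now_ms;
	return now_ms;
}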

Best regards, Andrey Borodin.




Re: Fix output of zero privileges in psql

2023-10-09 Thread Laurenz Albe
On Mon, 2023-10-09 at 09:30 -0700, David G. Johnston wrote:
> On Mon, Oct 9, 2023 at 1:29 AM Laurenz Albe  wrote:
> > On Sun, 2023-10-08 at 19:58 -0700, David G. Johnston wrote:
> > 
> > > The built-in default privileges are only in effect if the object has not 
> > > been
> > > the target of a GRANT or REVOKE and also has not had its default 
> > > privileges
> > > modified using ALTER DEFAULT PRIVILEGES. (???: if it is possible to revert
> > > back to the state of built-in privileges that would need to be described 
> > > here.)
> > 
> > I don't think that we need to mention ALTER DEFAULT PRIVILEGES there.  If
> > the default privileges have been altered, the ACL will not be stored as
> > NULL in the catalogs.
> 
> It's already mentioned there, I just felt bringing the mention of ADP and
> grant/revoke both invalidating the built-in default privileges closer together
> would be an improvement.

Ah, sorry, I misread your comment.  You are right.  But then, the effects of
ALTER DEFAULT PRIVILEGES are discussed later in the paragraph anyway, and we 
don't
have to repeat that here.

> > 
> > To me, mentioning the default privileges are stored as NULLs in the catalogs
> > is not an invitation to view the privileges with catalog queries, but
> > information about implementation details that explains why default 
> > privileges
> > are displayed the way they are.
> 
> Calling it an implementation detail leads me to conclude the point does not
> belong in the user-facing administration documentation.

Again, you have a point there.  However, I find that detail useful, as it 
explains
the user-facing behavior.  Anyway, I don't think it is the job of this patch to
remove that pre-existing detail.

> > Are you advocating for adding a mention of "\pset null" to every backslash 
> > command
> > that displays privileges?  That is excessive, in my opinion.
> 
> Yes, I am.  I suppose the argument that any user of those commands is expected
> to have read the ddl/privileges chapter would suffice for me though.

Thanks.  Then let's leave it like that.

> My point with the second paragraph is that we could, instead of documenting 
> the
> caveat about null printing as empty strings is to instead issue an implicit
> "\pset null ''" as part of these commands, and a "\pset null" afterward,
> conditioned upon it not already being set to a non-empty value.  IOW, the
> special-casing we do today but actually do it in a way that distinguishes the
> two cases instead of forcing them to be indistinguishable.

-1

The whole point of this patch is to make psql behave consistently with respect 
to
NULLs in meta-commands.  Your suggestion would subvert that idea.

Yours,
Laurenz Albe




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Robert Haas
On Mon, Oct 9, 2023 at 11:09 AM Jelte Fennema  wrote:
> Since the protocol already returns OIDs in the ParameterDescription
> and RowDescription messages I don't see why using OIDs for this GUC
> would cause any additional problems.

...but then...

> With Citus the same user defined type can have
> different OIDs on each of the servers in the cluster.

I realize that your intention here may be to say that this is not an
*additional* problem but one we have already. But it seems like one
that we ought to be trying to solve, rather than propagating a
problematic solution into more places.

Decisions we make about the wire protocol are some of the most
long-lasting and painful decisions we make, right up there with the
on-disk format. Maybe worse, in some ways.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-09 Thread Robert Haas
On Fri, Oct 6, 2023 at 3:07 PM Jeff Davis  wrote:
> On Fri, 2023-10-06 at 13:33 -0400, Robert Haas wrote:
> > What I think people really want is a whole column in
> > some encoding that isn't the normal one for that database.
>
> Do people really want that? I'd be curious to know why.

Because it's a feature that exists in other products and so having it
eases migrations and/or replication of data between systems.

I'm not saying that there are a lot of people who want this anymore.
I think there used to be more interest in it. But the point of the
comment was that people who want multiple character set support want
it as a per-column property, not a per-value property. I've never
heard of anyone wanting to store text blobs in multiple distinct
character sets in the same column. But I have heard of people wanting
text blobs in multiple distinct character sets in the same database,
each one in its own column.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Dave Cramer
On Mon, 9 Oct 2023 at 15:00, Robert Haas  wrote:

> On Mon, Oct 9, 2023 at 11:09 AM Jelte Fennema  wrote:
> > Since the protocol already returns OIDs in the ParameterDescription
> > and RowDescription messages I don't see why using OIDs for this GUC
> > would cause any additional problems.
>
> ...but then...
>
> > With Citus the same user defined type can have
> > different OIDs on each of the servers in the cluster.
>
> I realize that your intention here may be to say that this is not an
> *additional* problem but one we have already. But it seems like one
> that we ought to be trying to solve, rather than propagating a
> problematic solution into more places.
>
> Decisions we make about the wire protocol are some of the most
> long-lasting and painful decisions we make, right up there with the
> on-disk format. Maybe worse, in some ways.
>

So if we use <schema>.<type>, would it be possible to have something like
<well-known types>, which represents a set of well known types?
My goal here is to reduce the overhead of naming all the types the client
wants in binary. The list of well known types is pretty long.
Additionally we could have a shorthand for removing a well known type.

Dave


Re: Fix output of zero privileges in psql

2023-10-09 Thread Tom Lane
Laurenz Albe  writes:
> On Mon, 2023-10-09 at 09:30 -0700, David G. Johnston wrote:
>> My point with the second paragraph is that we could, instead of documenting 
>> the
>> caveat about null printing as empty strings is to instead issue an implicit
>> "\pset null ''" as part of these commands, and a "\pset null" 
>> afterward,
>> conditioned upon it not already being set to a non-empty value.  IOW, the
>> special-casing we do today but actually do it in a way that distinguishes the
>> two cases instead of forcing them to be indistinguishable.

> -1

> The whole point of this patch is to make psql behave consistently with 
> respect to
> NULLs in meta-commands.  Your suggestion would subvert that idea.

Yeah.  There is a lot of attraction in having \pset null affect these
displays just like all other ones.  The fact that they act differently
now is a wart, not something we should replace with a different special
case behavior.

Also, I'm fairly concerned about not changing the default behavior here.
The fact that this behavior has stood for a couple dozen years without
many complaints indicates that there's not all that big a problem to be
solved.  I doubt that a new default behavior will be well received,
even if it's arguably better.

regards, tom lane




Re: should frontend tools use syncfs() ?

2023-10-09 Thread Nathan Bossart
On Mon, Oct 09, 2023 at 11:14:39AM -0500, Nathan Bossart wrote:
> Thanks.  I've made a couple of small changes, but otherwise I think this
> one is just about ready.

I forgot to rename one thing.  Here's a v13 with that fixed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From b1829fea8f81cceceabd37585ab74a290505d0d5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 9 Oct 2023 11:07:29 -0500
Subject: [PATCH v13 1/1] Align names of variables, etc. in
 wal_sync_method-related code.

Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c   | 58 ++---
 src/backend/storage/file/fd.c   |  4 +-
 src/backend/utils/misc/guc_tables.c |  8 ++--
 src/include/access/xlog.h   | 15 +---
 src/include/access/xlogdefs.h   |  8 ++--
 src/include/port/freebsd.h  |  2 +-
 src/include/port/linux.h|  2 +-
 src/include/utils/guc_hooks.h   |  2 +-
 src/tools/pgindent/typedefs.list|  1 +
 9 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fcbde10529..a0acd67772 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -130,7 +130,7 @@ bool	   *wal_consistency_checking = NULL;
 bool		wal_init_zero = true;
 bool		wal_recycle = true;
 bool		log_checkpoints = true;
-int			sync_method = DEFAULT_SYNC_METHOD;
+int			wal_sync_method = DEFAULT_WAL_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_REPLICA;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
@@ -171,17 +171,17 @@ static bool check_wal_consistency_checking_deferred = false;
 /*
  * GUC support
  */
-const struct config_enum_entry sync_method_options[] = {
-	{"fsync", SYNC_METHOD_FSYNC, false},
+const struct config_enum_entry wal_sync_method_options[] = {
+	{"fsync", WAL_SYNC_METHOD_FSYNC, false},
 #ifdef HAVE_FSYNC_WRITETHROUGH
-	{"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false},
+	{"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false},
 #endif
-	{"fdatasync", SYNC_METHOD_FDATASYNC, false},
+	{"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false},
 #ifdef O_SYNC
-	{"open_sync", SYNC_METHOD_OPEN, false},
+	{"open_sync", WAL_SYNC_METHOD_OPEN, false},
 #endif
 #ifdef O_DSYNC
-	{"open_datasync", SYNC_METHOD_OPEN_DSYNC, false},
+	{"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false},
 #endif
 	{NULL, 0, false}
 };
@@ -2328,8 +2328,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 		 * have no open file or the wrong one.  However, we do not need to
 		 * fsync more than one file.
 		 */
-		if (sync_method != SYNC_METHOD_OPEN &&
-			sync_method != SYNC_METHOD_OPEN_DSYNC)
+		if (wal_sync_method != WAL_SYNC_METHOD_OPEN &&
+			wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC)
 		{
 			if (openLogFile >= 0 &&
 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
@@ -2959,7 +2959,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 */
 	*added = false;
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
-	   get_sync_bit(sync_method));
+	   get_sync_bit(wal_sync_method));
 	if (fd < 0)
 	{
 		if (errno != ENOENT)
@@ -3124,7 +3124,7 @@ XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
 
 	/* Now open original target segment (might not be file I just made) */
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
-	   get_sync_bit(sync_method));
+	   get_sync_bit(wal_sync_method));
 	if (fd < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
@@ -3356,7 +3356,7 @@ XLogFileOpen(XLogSegNo segno, TimeLineID tli)
 	XLogFilePath(path, tli, segno, wal_segment_size);
 
 	fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
-	   get_sync_bit(sync_method));
+	   get_sync_bit(wal_sync_method));
 	if (fd < 0)
 		ereport(PANIC,
 (errcode_for_file_access(),
@@ -8118,16 +8118,16 @@ get_sync_bit(int method)
 			 * not included in the enum option array, and therefore will never
 			 * be seen here.
 			 */
-		case SYNC_METHOD_FSYNC:
-		case SYNC_METHOD_FSYNC_WRITETHROUGH:
-		case SYNC_METHOD_FDATASYNC:
+		case WAL_SYNC_METHOD_FSYNC:
+		case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
+		case WAL_SYNC_METHOD_FDATASYNC:
 			return o_direct_flag;
 #ifdef O_SYNC
-		case SYNC_METHOD_OPEN:
+		case WAL_SYNC_METHOD_OPEN:
 			return O_SYNC | o_direct_flag;
 #endif
 #ifdef O_DSYNC
-		case SYNC_METHOD_OPEN_DSYNC:
+		case WAL_SYNC_METHOD_OPEN_DSYNC:
 			return O_DSYNC | o_direct_flag;
 #endif
 		default:
@@ -8141,9 +8141,9 @@ get_sync_bit(int method)
  * GUC support
  */
 void
-assign_xlog_sync_method(int new_sync_method, void *extra)
+assign_wal_sync_method(int new_wal_sync_method, void *extra)
 {
-	if (sync_method != new_sync_method)
+	if (wal_sync_method != new_wal_sync_method)
 	{
 		/*
 		 * To ensure that no blocks escape unsync

Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-09 Thread Gurjeet Singh
On Mon, Oct 9, 2023 at 2:31 AM Gurjeet Singh  wrote:
>
> Next steps:
> - Break the patch into a series of smaller patches.

Please see attached the same v3 patch, but now split into 3 separate
patches. Each patch in the series depends on the previous patch to
have been applied. I have made sure that each patch passes `make
check` individually.

The first patch adds the two new columns, rolsecondpassword and
rolsecondvaliduntil, to the pg_authid shared catalog. This patch also
updates the corresponding pg_authid.dat file to set these values to
null for the rows populated during bootstrap. Finally, it adds code to
CreateRole() to set these columns' values to NULL for a role being
created.

The second patch updates the password extraction and verification
functions, as well as the authentication functions, to honor the second
password, if any. There is a more detailed description in the commit
message/body of the patch.

The third patch adds SQL support to the ALTER ROLE command, allowing
manipulation of both the rolpassword and rolsecondpassword columns and
their respective expiration timestamps, rol[second]validuntil. This
patch also adds regression tests for the new SQL command, demonstrating
the use of the new grammar.

v3-0001-Add-new-columns-to-pg_authid.patch
v3-0002-Update-password-verification-infrastructure-to-ha.patch
v3-0003-Added-SQL-support-for-ALTER-ROLE-to-manage-two-pa.patch



Best regards,
Gurjeet
http://Gurje.et
From bc7c35e53e421157c9425c198bc2557ad118a650 Mon Sep 17 00:00:00 2001
From: Gurjeet Singh 
Date: Mon, 9 Oct 2023 11:36:05 -0700
Subject: [PATCH v3 1/3] Add new columns to pg_authid

Add two columns to pg_authid, namely rolsecondpassword and
rolsecondvaliduntil. These columns are added in preparation for the
password-rollover feature. These columns will store the hash and the
expiration time, repspectively, of a second password that the role can
use for login authentication.
---
 src/backend/commands/user.c   |  4 +++
 src/include/catalog/pg_authid.dat | 45 ---
 src/include/catalog/pg_authid.h   |  2 ++
 3 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index ce77a055e5..0afaf74865 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -452,9 +452,13 @@ CreateRole(ParseState *pstate, CreateRoleStmt *stmt)
 	else
 		new_record_nulls[Anum_pg_authid_rolpassword - 1] = true;
 
+	new_record_nulls[Anum_pg_authid_rolsecondpassword - 1] = true;
+
 	new_record[Anum_pg_authid_rolvaliduntil - 1] = validUntil_datum;
 	new_record_nulls[Anum_pg_authid_rolvaliduntil - 1] = validUntil_null;
 
+	new_record_nulls[Anum_pg_authid_rolsecondvaliduntil - 1] = true;
+
 	new_record[Anum_pg_authid_rolbypassrls - 1] = BoolGetDatum(bypassrls);
 
 	/*
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 6b4a0d..b326e48376 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -23,76 +23,91 @@
   rolname => 'POSTGRES', rolsuper => 't', rolinherit => 't',
   rolcreaterole => 't', rolcreatedb => 't', rolcanlogin => 't',
   rolreplication => 't', rolbypassrls => 't', rolconnlimit => '-1',
-  rolpassword => '_null_', rolvaliduntil => '_null_' },
+  rolpassword => '_null_', rolvaliduntil => '_null_',
+  rolsecondpassword => '_null_', rolsecondvaliduntil => '_null_' },
 { oid => '6171', oid_symbol => 'ROLE_PG_DATABASE_OWNER',
   rolname => 'pg_database_owner', rolsuper => 'f', rolinherit => 't',
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
-  rolpassword => '_null_', rolvaliduntil => '_null_' },
+  rolpassword => '_null_', rolvaliduntil => '_null_',
+  rolsecondpassword => '_null_', rolsecondvaliduntil => '_null_' },
 { oid => '6181', oid_symbol => 'ROLE_PG_READ_ALL_DATA',
   rolname => 'pg_read_all_data', rolsuper => 'f', rolinherit => 't',
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
-  rolpassword => '_null_', rolvaliduntil => '_null_' },
+  rolpassword => '_null_', rolvaliduntil => '_null_',
+  rolsecondpassword => '_null_', rolsecondvaliduntil => '_null_' },
 { oid => '6182', oid_symbol => 'ROLE_PG_WRITE_ALL_DATA',
   rolname => 'pg_write_all_data', rolsuper => 'f', rolinherit => 't',
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
-  rolpassword => '_null_', rolvaliduntil => '_null_' },
+  rolpassword => '_null_', rolvaliduntil => '_null_',
+  rolsecondpassword => '_null_', rolsecondvaliduntil => '_null_' },
 { oid => '3373', oid_symbol => 'ROLE_PG_MONITOR',
   rolname => 'pg_monitor', rolsuper => 'f', rolinherit => 't',
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
-  rolp

Re: New WAL record to detect the checkpoint redo location

2023-10-09 Thread Robert Haas
On Thu, Oct 5, 2023 at 2:34 PM Andres Freund  wrote:
> One thing that's notable, but not related to the patch, is that we waste a
> fair bit of cpu time below XLogInsertRecord() with divisions. I think they're
> all due to the use of UsableBytesInSegment in
> XLogBytePosToRecPtr/XLogBytePosToEndRecPtr.  The multiplication of
> XLogSegNoOffsetToRecPtr() also shows.

Despite what I said in my earlier email, and with a feeling like unto
that created by the proximity of the sword of Damocles or some ghostly
albatross, I spent some time reflecting on this. Some observations:

1. The reason why we're doing this multiplication and division is to
make sure that the code in ReserveXLogInsertLocation which executes
while holding insertpos_lck remains as simple and brief as possible.
We could eliminate the conversion between usable byte positions and
LSNs if we replaced Insert->{Curr,Prev}BytePos with LSNs and had
ReserveXLogInsertLocation work out by how much to advance the LSN, but
it would have to be worked out while holding insertpos_lck (or some
replacement lwlock, perhaps) and that cure seems worse than the
disease. Given that, I think we're stuck with converting between
usable byte positions and LSNs, and that intrinsically needs some
multiplication and division.

2. It seems possible to remove one branch in each of
XLogBytePosToRecPtr and XLogBytePosToEndRecPtr. Rather than testing
whether bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD, we could simply
increment bytesleft by SizeOfXLogLongPHD - SizeOfXLogShortPHD. Then
the rest of the calculations can be performed as if every page in the
segment had a header of length SizeOfXLogShortPHD, with no need to
special-case the first page. However, that doesn't get rid of any
multiplication or division, just a branch.
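
For concreteness, a sketch of that variant of XLogBytePosToRecPtr's body
(untested and purely illustrative; variable names as in xlog.c):

    fullsegs = bytepos / UsableBytesInSegment;
    bytesleft = bytepos % UsableBytesInSegment;

    /* pretend the segment's first page also had only a short header */
    bytesleft += SizeOfXLogLongPHD - SizeOfXLogShortPHD;

    /* every page now looks alike: no special case for the first one */
    fullpages = bytesleft / UsableBytesInPage;
    bytesleft = bytesleft % UsableBytesInPage;
    seg_offset = fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;

    XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);

The first-page case falls out correctly because the compensation exactly
cancels against the short header added back into seg_offset.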

3. Aside from that, there seems to be no simple way to reduce the
complexity of an individual calculation, but ReserveXLogInsertLocation
does perform 3 rather similar computations, and I believe that we know
that it will always be the case that *PrevPtr < *StartPos < *EndPos.
Maybe we could have a fast-path for the case where they are all in the
same segment. We could take prevbytepos modulo UsableBytesInSegment;
call the result prevsegoff. If UsableBytesInSegment - prevsegoff >
endbytepos - prevbytepos, then all three pointers are in the same
segment, and maybe we could take advantage of that to avoid performing
the segment calculations more than once, but still needing to repeat
the page calculations. Or, instead or in addition, I think we could use a
similar technique to check whether all three pointers are on the same
page; if so, then *StartPos and *EndPos can be computed from *PrevPtr
by just adding the difference between the corresponding byte
positions.

I'm not really sure whether that would come out cheaper. It's just the
only idea that I have. It did also occur to me to wonder whether the
apparent delays performing multiplication and division here were
really the result of the arithmetic itself being slow or whether they
were synchronization-related, SpinLockRelease(&Insert->insertpos_lck)
being a memory barrier just before. But I assume you thought about
that and concluded that wasn't the issue here.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: CHECK Constraint Deferrable

2023-10-09 Thread Robert Haas
On Mon, Oct 2, 2023 at 10:25 PM Tom Lane  wrote:
> But here we have a
> feature whose only possible use is with constraints that *aren't*
> immutable; else we might as well just check them immediately.

I'm a little bit confused by this whole discussion because surely this
statement is just completely false. The example in the original post
demonstrates that clearly.

The use case for a deferred check constraint is exactly the same as
the use case for a deferred foreign key constraint or a deferred
uniqueness constraint, which is that you might have a constraint that
will be temporarily false while the transaction is in progress, but
true by the time the transaction actually commits, and you might like
the transaction to succeed instead of failing in such a case. You seem
to be imagining that the constraint itself might be returning mutable
answers on the same inputs, but that's not what this is about at all.

I'm not here - at least, not right now - to take a position on whether
the patch itself is any good.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New WAL record to detect the checkpoint redo location

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-06 13:44:55 -0400, Robert Haas wrote:
> On Thu, Oct 5, 2023 at 2:34 PM Andres Freund  wrote:
> > If I add an unlikely around if (rechdr->xl_rmid == RM_XLOG_ID), the
> > performance does improve. But that "only" brings it up to 322.406. Not sure
> > what the rest is.
> 
> I don't really think this is worth worrying about. A sub-one-percent
> regression on a highly artificial test case doesn't seem like a big
> deal.

I agree. I think it's worth measuring and looking at, after all the fix might
be trivial (like the case of the unlikely for the earlier if()). But it
shouldn't block progress on significant features.

I think this "issue" might be measurable in some other, not quite as artifical
cases, like INSERT ... SELECT or such. But even then it's going to be tiny.

Greetings,

Andres Freund




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Jeff Davis
On Wed, 2023-10-04 at 15:10 -0400, Robert Haas wrote:
> I hadn't really considered client_encoding as a precedent for this
> setting. A lot of my discomfort with the proposed mechanism also
> applies to client_encoding, namely, suppose you call some function or
> procedure or whatever and it changes client_encoding on your behalf
> and now your communication with the server is all screwed up.

This may have some security implications, but we've had lots of
discussion about the general topic of executing malicious code, and the
ability to mess with the on-the-wire formats might not be any worse
than what can already happen. (Though expanding it to binary formats
might slightly increase the attack surface area.)

> That
> seems very unpleasant. Yet it's also existing behavior.

The binary format setting is better in some ways and worse in other
ways.

For text encoding, usually it's expecting a single encoding and so a
single setting at the start of the session makes sense. For binary
formats, the client is likely to support some values in binary and
others not; and user-defined types make it even messier.

On the other hand, at least the results are marked as being binary
format, so if something unexpected happens, a well-written client is
more likely to see that something went wrong. For text encoding, the
client would have to be a bit more defensive.

Another thing to consider is that using a GUC for binary formats is a
protocol change in a way that client_encoding is not. The existing
documentation for the protocol already specifies when binary formats
will be used, and a GUC would change that behavior. We absolutely would
need to update the documentation, and clients (like psql) really should
be updated.

> I think one
> could conclude on these facts either that (a) client_encoding is fine
> and the problems with controlling behavior using that kind of
> mechanism are mostly theoretical or 

I'm not clear on the exact rules for a protocol version bump and why a
GUC helps us avoid one. If we have a binary_formats GUC, the client
would need to know the server version and check that it's >=17 before
sending the "SET binary_formats='...'" commmand, right? What's the
difference between that and making it an explicit protocol message that
only >=17 understand?

In any case, I think clients and connection poolers can work around the
problems, and they are mostly minor in practice, but I wouldn't call
them "theoretical". If there's enough utility in the binary_formats
parameter, we can decide to put up with the problems; which is
different than saying there aren't any.

> (b) that we messed up with
> client_encoding and shouldn't add any more mistakes of the same ilk
> or
> (c) that we should really be looking at redesigning the way
> client_encoding works, too.

(b) doesn't seem like a very helpful perspective without some ideas
toward (c). I think (c) is worth discussing but we don't have to block
on it.

Regards,
Jeff Davis






Re: CHECK Constraint Deferrable

2023-10-09 Thread Robert Haas
On Tue, Oct 3, 2023 at 10:05 AM David G. Johnston
 wrote:
>> The real-world use case, at least for me, is when using an ORM. For large 
>> object-graphs ORMs have a tendency to INSERT first with NULLs then UPDATE 
>> the “NOT NULLs” later.
>>
>> “Rewrite the ORM” is not an option for most of us…
>
> Between this and Vik comment it sounds like we should probably require a 
> patch in this area to solve both the not null and check constraint deferral 
> omissions then, not just one of them (alternatively, let’s solve the not null 
> one first).

I have a couple of problems with this comment:

1. I don't know which of Vik's comments you're talking about here, but
it seems to me that Vik is generally in favor of this feature, so I'm
a bit surprised to hear that one of his comments led you to think that
it should be burdened with additional requirements.

2. I don't think it's a good idea for the same patch to try to solve
two problems unless they are so closely related that solving one
without solving the other is not sensible. It is a good policy for the
community to accept incremental progress provided it doesn't break
things along the way. Smaller patches are way easier to get committed,
and then we get some of the feature sooner instead of all of it at some
more distant point in the future or never. Furthermore, forcing
additional requirements onto patch submitters as a condition of patch
acceptance is extremely demoralizing to submitters, and we should not
do it without an excellent reason.

Mind you, I'm not against this patch handling both CHECK and NOT NULL
constraints if that's the most sensible way forward, especially in
view of Álvaro's recent work in that area. But it sort of sounds like
you're just trying to sink the patch despite it being a feature that
is both in the SQL standard and has real use cases which have been
mentioned on the thread, and I don't really like that. In the interest
of full disclosure, I do work at the same company as Dilip and
Himanshu, so I might be biased. But I feel like I would be in favor of
this feature no matter who proposed it, as long as it was
well-implemented. It's always struck me as odd that we allow deferring
some types of constraints but not others, and I don't understand why
we'd want to block closing that gap unless there is some concrete
downside to so doing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Jelte Fennema
On Mon, 9 Oct 2023 at 21:00, Robert Haas  wrote:
> ...but then...
>
> > With Citus the same user defined type can have
> > different OIDs on each of the servers in the cluster.
>
> I realize that your intention here may be to say that this is not an
> *additional* problem but one we have already. But it seems like one
> that we ought to be trying to solve, rather than propagating a
> problematic solution into more places.

Yes, I probably should have emphasized the word *additional*. I.e.,
starting from scratch I wouldn't use OIDs in this GUC nor in
ParameterDescription or RowDescription, but blocking the addition of
this GUC on addressing that seems unnecessary. When we fix it we can
fix this too. I'd rather use OIDs (with all their problems)
consistently now for communication of types with regards to protocol
related things. Then we can at some point change all places in bulk to
some better identifier than OIDs.

> Decisions we make about the wire protocol are some of the most
> long-lasting and painful decisions we make, right up there with the
> on-disk format. Maybe worse, in some ways.

Yes, I agree. I just don't think using OIDs makes changing the
protocol in this regard any less painful than it already is currently.




Re: Fix output of zero privileges in psql

2023-10-09 Thread David G. Johnston
On Mon, Oct 9, 2023 at 12:13 PM Tom Lane  wrote:

> Laurenz Albe  writes:
> > On Mon, 2023-10-09 at 09:30 -0700, David G. Johnston wrote:
> >> My point with the second paragraph is that we could, instead of
> documenting the
> >> caveat about null printing as empty strings is to instead issue an
> implicit
> >> "\pset null ''" as part of these commands, and a "\pset null"
> afterward,
> >> conditioned upon it not already being set to a non-empty value.  IOW,
> the
> >> special-casing we do today but actually do it in a way that
> distinguishes the
> >> two cases instead of forcing them to be indistinguishable.
>
> > -1
>
> > The whole point of this patch is to make psql behave consistently with
> respect to
> > NULLs in meta-commands.  Your suggestion would subvert that idea.
>
> Yeah.  There is a lot of attraction in having \pset null affect these
> displays just like all other ones.  The fact that they act differently
> now is a wart, not something we should replace with a different special
> case behavior.
>
> Also, I'm fairly concerned about not changing the default behavior here.
> The fact that this behavior has stood for a couple dozen years without
> many complaints indicates that there's not all that big a problem to be
> solved.  I doubt that a new default behavior will be well received,
> even if it's arguably better.
>

My position is that the default behavior should be changed such that the
distinction these reports need to make between empty privileges and default
privileges is impossible for the user to remove.  I suppose the best direct
solution given that goal is for the query to simply do away with any
reliance on NULL being an indicator.  Output an i18n'd "no permissions
present" line in the rare empty permissions situation and leave the empty
string for built-in default.

If the consensus is to not actually fix these views to make them
environment-invariant in their accuracy, then so be it.  Having them obey
\pset null seems like a half-measure, but it is at least an improvement over
the status quo, since users are then capable of altering their system to make
the reports distinguish the two cases if they realize the need.

I do agree that my suggestion regarding the implicit \pset null, while
plausible, leaves the wart that NULL is being used to mean something
specific.  Better is to just use a label for that specific thing.

David J.


Re: New WAL record to detect the checkpoint redo location

2023-10-09 Thread Andres Freund
Hi,

As noted in my email from a few minutes ago, I agree that optimizing this
shouldn't be a requirement for merging the patch.


On 2023-10-09 15:58:36 -0400, Robert Haas wrote:
> 1. The reason why we're doing this multiplication and division is to
> make sure that the code in ReserveXLogInsertLocation which executes
> while holding insertpos_lck remains as simple and brief as possible.
> We could eliminate the conversion between usable byte positions and
> LSNs if we replaced Insert->{Curr,Prev}BytePos with LSNs and had
> ReserveXLogInsertLocation work out by how much to advance the LSN, but
> it would have to be worked out while holding insertpos_lck (or some
> replacement lwlock, perhaps) and that cure seems worse than the
> disease. Given that, I think we're stuck with converting between
> usable byte positions and LSNs, and that intrinsically needs some
> multiplication and division.

Right, that's absolutely crucial for scalability.


> 2. It seems possible to remove one branch in each of
> XLogBytePosToRecPtr and XLogBytePosToEndRecPtr. Rather than testing
> whether bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD, we could simply
> increment bytesleft by SizeOfXLogLongPHD - SizeOfXLogShortPHD. Then
> the rest of the calculations can be performed as if every page in the
> segment had a header of length SizeOfXLogShortPHD, with no need to
> special-case the first page. However, that doesn't get rid of any
> multiplication or division, just a branch.

This reminded me about something I've been bugged by for a while: The whole
idea of short xlog page headers seems like a completely premature
optimization.  The page header is a very small amount of the overall data
(long: 40/8192 ~= 0.00488, short: 24/8192 ~= 0.00292), compared to the space
we waste in many other places, including on a per-record level, it doesn't
seem worth the complexity.



> 3. Aside from that, there seems to be no simple way to reduce the
> complexity of an individual calculation, but ReserveXLogInsertLocation
> does perform 3 rather similar computations, and I believe that we know
> that it will always be the case that *PrevPtr < *StartPos < *EndPos.
> Maybe we could have a fast-path for the case where they are all in the
> same segment. We could take prevbytepos modulo UsableBytesInSegment;
> call the result prevsegoff. If UsableBytesInSegment - prevsegoff >
> endbytepos - prevbytepos, then all three pointers are in the same
> segment, and maybe we could take advantage of that to avoid performing
> the segment calculations more than once, but still needing to repeat
> the page calculations. Or, instead or in addition, I think we could use a
> similar technique to check whether all three pointers are on the same
> page; if so, then *StartPos and *EndPos can be computed from *PrevPtr
> by just adding the difference between the corresponding byte
> positions.

I think we might be able to speed some of this up by pre-compute values so we
can implement things like bytesleft / UsableBytesInPage with shifts. IIRC we
already insist on power-of-two segment sizes, so instead of needing to divide
by a runtime value, we should be able to shift by a runtime value (and the
modulo should be a mask).


> I'm not really sure whether that would come out cheaper. It's just the
> only idea that I have. It did also occur to me to wonder whether the
> apparent delays performing multiplication and division here were
> really the result of the arithmetic itself being slow or whether they
> were synchronization-related, SpinLockRelease(&Insert->insertpos_lck)
> being a memory barrier just before. But I assume you thought about
> that and concluded that wasn't the issue here.

I did verify that they continue to be a bottleneck even after (incorrectly,
obviously) removing the spinlock. It's also not too surprising: the latency
of 64-bit divs is just high, particularly on Intel from a few years ago (my
Cascade Lake workstation), and IIRC there's just a single execution port for it
too, so multiple instructions can't be fully parallelized.

https://uops.info/table.html documents a worst case latency of 89 cycles on
Cascade Lake, with the division broken up into 36 uops (reducing what's
available to track other in-flight instructions). It's much better on Alder
Lake (9 cycles and 7 uops on the perf cores, 44 cycles and 4 uops on
efficiency cores) and on zen 3+ (19 cycles, 2 uops).

Greetings,

Andres Freund




Re: should frontend tools use syncfs() ?

2023-10-09 Thread Nathan Bossart
On Wed, Oct 04, 2023 at 10:29:07AM -0500, Nathan Bossart wrote:
> On Wed, Sep 27, 2023 at 01:56:08PM +0100, Peter Eisentraut wrote:
>> We have a collection of platform-specific notes in chapter 19, including
>> file-system-related notes in section 19.2.  Maybe it could be put there?
> 
> I will give this a try.

I started on this, but I couldn't shake the feeling that this wasn't the
right place for these notes.  This chapter is about setting up a server,
and the syncfs() notes do apply to the recovery_init_sync_method
configuration parameter, but they also apply to a number of server/client
applications.

I've been looking around, and I haven't found a great place to move this
section to.  IMO some of the other appendices have similar amounts of
information (e.g., Date/Time Support, The Source Code Repository, Color
Support), so maybe a dedicated appendix isn't too extreme.  Another option
could be to introduce a new section for platform-specific notes, but that
would just make this section even larger for now.

Thoughts?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: On login trigger: take three

2023-10-09 Thread Andres Freund
On 2023-10-09 17:11:25 +0300, Alexander Korotkov wrote:
> This version should be good and has no overhead.  Any thoughts?

I think you forgot to attach the patch?




Re: Request for comment on setting binary format output per session

2023-10-09 Thread Jelte Fennema
On Mon, 9 Oct 2023 at 22:25, Jeff Davis  wrote:
> We absolutely would
> need to update the documentation, and clients (like psql) really should
> be updated.

+1

> > I think one
> > could conclude on these facts either that (a) client_encoding is fine
> > and the problems with controlling behavior using that kind of
> > mechanism are mostly theoretical or
>
> I'm not clear on the exact rules for a protocol version bump and why a
> GUC helps us avoid one. If we have a binary_formats GUC, the client
> would need to know the server version and check that it's >=17 before
> sending the "SET binary_formats='...'" command, right?

I agree that we'd probably still want to do a protocol minor version
bump. FYI, there is another thread trying to introduce a protocol change
which needs a minor version bump. Patch number 0003 in that patchset
is meant to actually make libpq handle minor version increases
correctly. If we need a version bump then that would be useful[1].


> What's the
> difference between that and making it an explicit protocol message that
> only >=17 understand?

Honestly I think the main difference is the need to introduce this
explicit protocol message. If we do, I think it might be best to have
this be a way of setting a GUC at the Protocol level, and expand the
GucContext enum to have a way to disallow setting it from SQL (e.g.
PGC_PROTOCOL), while still allowing PgBouncer (or other poolers) to
change the GUC as part of the connection handoff, in a way that's
similar to what's being done for client_encoding now. We might even
want to make client_encoding PGC_PROTOCOL too (eventually).
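
To make that concrete, a rough sketch of where such a context could sit in
guc.h's GucContext enum; PGC_PROTOCOL is hypothetical, and its exact position
would need thought, since the GUC machinery compares context values:

    typedef enum
    {
        PGC_INTERNAL,
        PGC_POSTMASTER,
        PGC_SIGHUP,
        PGC_SU_BACKEND,
        PGC_BACKEND,
        PGC_PROTOCOL,   /* hypothetical: settable only via protocol
                         * messages, never from SQL */
        PGC_SUSET,
        PGC_USERSET
    } GucContext;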

Actually, for connection poolers there are other reasons to want to set
GUC values at the protocol level instead of SQL, because the value of
the ParameterStatus response is sadly not a valid SQL string... That's
why in PgBouncer we have to re-quote the value [2], which is a problem
for any GUC_LIST_QUOTE type, which search_path is. This GUC_LIST_QUOTE
logic is actually not completely correct in PgBouncer and only handles
"" (empty search_path), because for search_path that's the only
reasonable problematic case that people might hit (e.g. truncating to
NAMELEN is another problem, but elements in search_path should already
be at most NAMELEN). But still it would be good not to have to worry
about that. And being able to send the value in ParameterStatus back
verbatim to the server would be quite helpful for PgBouncer.

[1]: 
https://www.postgresql.org/message-id/flat/CAFj8pRAX48WH5Y6BbqnZbUSzmtEaQZ22rY_6cYw%3DE9QkoVvL0A%40mail.gmail.com#643c91f84ae33b316c0fed64e19c8e49
[2]: 
https://github.com/pgbouncer/pgbouncer/blob/60708022d5b934fa53c51849b9f02d87a7881b11/src/varcache.c#L172-L183




Re: CHECK Constraint Deferrable

2023-10-09 Thread David G. Johnston
On Mon, Oct 9, 2023 at 1:27 PM Robert Haas  wrote:

> On Tue, Oct 3, 2023 at 10:05 AM David G. Johnston
>  wrote:
> >> The real-world use case, at least for me, is when using an ORM. For
> large object-graphs ORMs have a tendency to INSERT first with NULLs then
> UPDATE the “NOT NULLs” later.
> >>
> >> “Rewrite the ORM” is not an option for most of us…
> >
> > Between this and Vik comment it sounds like we should probably require a
> patch in this area to solve both the not null and check constraint deferral
> omissions then, not just one of them (alternatively, let’s solve the not
> null one first).
>
> I have a couple of problems with this comment:
>
> 1. I don't know which of Vik's comments you're talking about here, but
> it seems to me that Vik is generally in favor of this feature, so I'm
> a bit surprised to hear that one of his comments led you to think that
> it should be burdened with additional requirements.
>

Specifically, Vik commented that the standard requires implementing NOT
NULL as a check constraint and thus needs to allow for deferrable check
constraints in order for not-null checks to be deferrable.  I agree fully
that deferring a not-null check makes for an excellent use case.  But we've
also decided to make NOT NULL its own thing, contrary to the standard.
Thus my understanding of why the standard mandates this behavior is that it
exists to allow for deferrable not-null constraints, and so our goal should
be to make our not-null constraints deferrable.

The only other example case of wanting a deferrable check constraint
involved the usage of a function that we expressly prohibit as a check
constraint.  The argument, which I weakly support, is that if our adding
deferrable check constraints increases the frequency of such functions
being created and used by our users, then we should continue to prohibit
such deferrability and require those users to properly implement triggers
which can then be deferred.  With a deferrable not-null constraint, any
other reasonable check constraint can simply evaluate to null during the
period where it should be deferred - because its column inputs are
deferred nulls - and then be fully evaluated when the inputs ultimately
end up non-null.

2. I don't think it's a good idea for the same patch to try to solve
> two problems unless they are so closely related that solving one
> without solving the other is not sensible.


A NOT NULL constraint apparently is just a special case of a check
constraint which seems closely related enough to match your definition.

But I guess you are right, I was trying to say no to this patch, and yes to
the not null deferral idea, without being so explicit and final about it.

While the coders are welcome to work on whatever they wish, the effort
spent on this just doesn't seem that valuable compared to what is already
in the queue needing reviews and commits. I can live with a gap in our
standards conformance here, since I haven't observed any use cases that
are impossible to accomplish except by adding this specific feature, which
would only cover NOT NULL constraints if the syntactical form for creating
them were not used (though I suppose, if we fail to provide NOT NULL
DEFERRABLE, that would argue for at least giving this work-around...)

David J.


Re: Request for comment on setting binary format output per session

2023-10-09 Thread Jelte Fennema
On Mon, 9 Oct 2023 at 21:08, Dave Cramer  wrote:
> So if we use . would it be possible to have something like 
>  which represents a set of well known types?
> My goal here is to reduce the overhead of naming all the types the client  
> wants in binary. The list of well known types is pretty long.
> Additionally we could have a shorthand for removing a well known type.

You're only setting this once in the lifetime of the connection right,
i.e. right at the start (although pgbouncer could set it once per
transaction in the worst case). It seems like it shouldn't really
matter much to optimize the size of the "SET format_binary=..."
command, I'd expect it to be at most 1 kilobyte. I'm not super opposed
to having a shorthand for some of the most commonly wanted built-in
types, but then we'd need to decide on what those are, which would add
even more discussion/bikeshedding to this thread. I'm not sure the win
in size is worth that effort.




Re: Remove distprep

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-09 12:16:23 +0200, Peter Eisentraut wrote:
> On 06.10.23 20:50, Andres Freund wrote:
> > The only thing I wonder is whether we ought to keep a maintainer-clean
> > target (as an alias to distclean), so that extensions that added things
> > to maintainer-clean continue to work.
> 
> The patch does do that.

It kinda works, but I'm not sure how well.  Because the aliasing happens in
Makefile.global, we won't know about the "original" maintainer-clean target
once recursing into a subdir.

That's perhaps OK, because extensions likely won't utilize subdirectories? But
I'm not sure. I know that some people build postgres extensions by adding them
to contrib/; in those cases it won't work.

OTOH, it seems somewhat unlikely that maintainer-clean is utilized much in
extensions. I see it in things like postgis, but that has its own configure
etc., even though it also invokes pgxs.


I wish we had an easy way of
a) downloading most working open-source extensions
b) building many of those

Greetings,

Andres Freund




Re: New WAL record to detect the checkpoint redo location

2023-10-09 Thread Robert Haas
On Mon, Oct 9, 2023 at 4:47 PM Andres Freund  wrote:
> I think we might be able to speed some of this up by pre-compute values so we
> can implement things like bytesleft / UsableBytesInPage with shifts. IIRC we
> already insist on power-of-two segment sizes, so instead of needing to divide
> by a runtime value, we should be able to shift by a runtime value (and the
> modulo should be a mask).

Huh, is there a general technique for this when dividing by a
non-power-of-two? The segment size is a power of two, as is the page
size, but UsableBytesIn{Page,Segment} are some random value slightly
less than a power of two.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Michael Paquier
On Mon, Oct 09, 2023 at 12:20:18PM -0400, Tom Lane wrote:
> There will be a window where the worker has logged "database
> "noconndb" is not currently accepting connections" but hasn't yet
> exited, so that conceivably this query could see a positive count.

I don't think that's possible here.  The check on datallowconn is done
before a backend calls pgstat_bestart() which would make its backend
entry reported to pg_stat_activity.  So there is no window where a
backend would be in pg_stat_activity if this check fails.

> We could just drop this test, reasoning that the appearance of
> the error message is sufficient evidence that the right thing
> happened.  (If the failed worker is still around, it won't break
> the remaining tests AFAICS.)  Or we could convert this to a
> poll_query_until() loop.

That said, I'm OK with just dropping this query, as it could also be
possible that somebody decides that calling pgstat_bestart() before the
datallowconn check is a good idea for one reason or another.
--
Michael




Lowering the default wal_blocksize to 4K

2023-10-09 Thread Andres Freund
Hi,

I've mentioned this to a few people before, but forgot to start an actual
thread. So here we go:

I think we should lower the default wal_blocksize / XLOG_BLCKSZ to 4096, from
the current 8192.  The reason is that

a) We don't gain much from a blocksize above 4096, as we already write
   all the pending WAL data in one go (except when at the tail of
   wal_buffers). We *do* incur more overhead for page headers, but compared to
   the actual WAL data it is not a lot (short headers: 24/8192 ~ 0.29% with
   8192, vs 24/4096 ~ 0.59% with 4096).

b) Writing 8KB when we have to flush a partially filled buffer can
   substantially increase write amplification. In a transactional workload,
   this will often double the write volume.

Currently disks mostly have 4096 bytes as their "sector size". Sometimes
that's exposed directly, sometimes they can also write in 512 bytes, but that
internally requires a read-modify-write operation.


For some example numbers, I ran a very simple insert workload with a varying
number of clients with both a wal_blocksize=4096 and wal_blocksize=8192
cluster, and measured the amount of bytes written before/after.  The table was
recreated before each run, followed by a checkpoint and the benchmark. Here I
ran the inserts only for 15s each, because the results don't change
meaningfully with longer runs.


With XLOG_BLCKSZ=8192

clients  tps     disk bytes written
1        667     81296
2        739     89796
4        1446    89208
8        2858    90858
16       5775    96928
32       11920   115351
64       23686   135244
128      46001   173390
256      88833   239720
512      146208  335669


With XLOG_BLCKSZ=4096

clients  tps     disk bytes written
1        751     46838
2        773     47936
4        1512    48317
8        3143    52584
16       6221    59097
32       12863   73776
64       25652   98792
128      48274   10
256      88969   200720
512      146298  298523


This is on a not-that-fast NVMe SSD (Samsung SSD 970 PRO 1TB).


It's IMO quite interesting that even at the higher client counts, the number
of bytes written doesn't reach parity.


On a stripe of two very fast SSDs:

With XLOG_BLCKSZ=8192

clients  tps     disk bytes written
1        23786   2893392
2        38515   4683336
4        63436   4688052
8        106618  4618760
16       177905  4384360
32       254890  3890664
64       297113  3031568
128      299878  2297808
256      308774  1935064
512      292515  1630408


With XLOG_BLCKSZ=4096

clients  tps     disk bytes written
1        25742   1586748
2        43578   2686708
4        62734   2613856
8        116217  2809560
16       200802  2947580
32       269268  2461364
64       323195  2042196
128      317160  1550364
256      309601  1285744
512      292063  1103816

It's fun to see how the total number of writes *decreases* at higher
concurrency, because it becomes more likely that pages are filled completely.


One thing I noticed is that our auto-configuration of wal_buffers leads to
different wal_buffers settings for different XLOG_BLCKSZ, which doesn't seem
great.


Performing the same COPY workload (1024 files, split across N clients) for
both settings shows no performance difference, but a very slight increase in
total bytes written (about 0.25%, which is roughly what I'd expect).


Personally I'd say the slight increase in WAL volume is more than outweighed
by the increase in throughput and decrease in bytes written.


There's an alternative approach we could take, which is to write in 4KB
increments, while keeping 8KB pages. With the current format that's not
obviously a bad idea. But given there aren't really advantages in 8KB WAL
pages, it seems we should just go for 4KB?

Greetings,

Andres Freund




Re: Proposal to use JSON for Postgres Parser format

2023-10-09 Thread Thomas Munro
On Tue, Sep 20, 2022 at 4:16 PM Thomas Munro  wrote:
> To explain my earlier guess: reader code for #S(STRUCTNAME ...) can
> be seen here, though it's being lexed as "PLAN_SYM", so perhaps the
> author of that C code didn't know that was a general syntax for
> Lisp structs.  (Example: at a Lisp prompt, if you write (defstruct foo
> x y z) then (make-foo :x 1 :y 2 :z 3), the resulting object will be
> printed as #S(FOO :x 1 :y 2 :z 3), so I'm guessing that the POSTGRES
> Lisp code, which sadly (for me) was ripped out before even that repo
> IIUC, must have used defstruct-based plans.)

That defstruct guess is confirmed by page 36 and nearby of
https://dsf.berkeley.edu/papers/UCB-MS-zfong.pdf.




Re: New WAL record to detect the checkpoint redo location

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-09 18:31:11 -0400, Robert Haas wrote:
> On Mon, Oct 9, 2023 at 4:47 PM Andres Freund  wrote:
> > I think we might be able to speed some of this up by pre-compute values so 
> > we
> > can implement things like bytesleft / UsableBytesInPage with shifts. IIRC we
> > already insist on power-of-two segment sizes, so instead of needing to 
> > divide
> > by a runtime value, we should be able to shift by a runtime value (and the
> > modulo should be a mask).
> 
> Huh, is there a general technique for this when dividing by a
> non-power-of-two?

There is, but I was just having a brainfart, forgetting that UsableBytesInPage
isn't itself a power of two. The general technique is used by compilers, but
IIRC it doesn't lend itself well to being done at runtime.
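
For the archives: a runtime-capable variant of that technique does exist (it
is what libdivide-style libraries implement), at the cost of a precomputed
reciprocal and a 64x64->128 multiply. A self-contained sketch, restricted to
32-bit numerators for simplicity and in no way PostgreSQL code; whether it
would actually beat the hardware divider here is exactly the kind of thing
that would need measuring:

    #include <stdint.h>

    /* Exact x / d for any 32-bit x, given a fixed divisor d >= 2. */
    typedef struct { uint64_t magic; } fastdiv32;

    static fastdiv32
    fastdiv32_init(uint32_t d)          /* e.g. UsableBytesInPage */
    {
        fastdiv32 f;
        f.magic = UINT64_MAX / d + 1;   /* = ceil(2^64 / d) */
        return f;
    }

    static inline uint32_t
    fastdiv32_div(fastdiv32 f, uint32_t x)
    {
        /* high 64 bits of x * magic; __int128 is a GCC/Clang extension */
        return (uint32_t) (((unsigned __int128) x * f.magic) >> 64);
    }

    static inline uint32_t
    fastdiv32_mod(fastdiv32 f, uint32_t x, uint32_t d)
    {
        return x - fastdiv32_div(f, x) * d;
    }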

Greetings,

Andres Freund




Re: Lowering the default wal_blocksize to 4K

2023-10-09 Thread Bruce Momjian
On Mon, Oct  9, 2023 at 04:08:05PM -0700, Andres Freund wrote:
> There's an alternative approach we could take, which is to write in 4KB
> increments, while keeping 8KB pages. With the current format that's not
> obviously a bad idea. But given there aren't really advantages in 8KB WAL
> pages, it seems we should just go for 4KB?

How do we handle shorter maximum row lengths and shorter maximum index
entry lengths?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Lowering the default wal_blocksize to 4K

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-09 19:26:54 -0400, Bruce Momjian wrote:
> On Mon, Oct  9, 2023 at 04:08:05PM -0700, Andres Freund wrote:
> > There's an alternative approach we could take, which is to write in 4KB
> > increments, while keeping 8KB pages. With the current format that's not
> > obviously a bad idea. But given there aren't really advantages in 8KB WAL
> > pages, it seems we should just go for 4KB?
> 
> How do we handle shorter maximum row lengths and shorter maximum index
> entry lengths?

The WAL blocksize shouldn't influence either, unless we have a bug somewhere.

Greetings,

Andres Freund




Re: Lowering the default wal_blocksize to 4K

2023-10-09 Thread Bruce Momjian
On Mon, Oct  9, 2023 at 04:36:20PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2023-10-09 19:26:54 -0400, Bruce Momjian wrote:
> > On Mon, Oct  9, 2023 at 04:08:05PM -0700, Andres Freund wrote:
> > > There's an alternative approach we could take, which is to write in 4KB
> > > increments, while keeping 8KB pages. With the current format that's not
> > > obviously a bad idea. But given there aren't really advantages in 8KB WAL
> > > pages, it seems we should just go for 4KB?
> > 
> > How do we handle shorter maximum row lengths and shorter maximum index
> > entry lengths?
> 
> The WAL blocksize shouldn't influence either, unless we have a bug somewhere.

Oh, good point.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: post-recovery amcheck expectations

2023-10-09 Thread Peter Geoghegan
On Wed, Oct 4, 2023 at 7:52 PM Noah Misch  wrote:
> Suppose we start with this nbtree (subset of a diagram from verify_nbtree.c):
>
>  *   1
>  *   /   \
>  *2 <-> 3
>
> We're deleting 2, the leftmost leaf under a leftmost internal page.  After the
> MARK_PAGE_HALFDEAD record, the first downlink from 1 will lead to 3, which
> still has a btpo_prev pointing to 2.  bt_index_parent_check() complains here:

Thanks for working on this. Your analysis seems sound to me.

When I reviewed the patch that became commit d114cc53, I presented
Alexander with a test case that demonstrated false positive reports of
corruption involving interrupted page splits (not interrupted page
deletions). Obviously I didn't give sufficient thought to this case,
which is analogous.

Might make sense to test the fix for this issue using a similar
approach: by adding custom code that randomly throws errors at a point
that stresses the implementation. I'm referring to the point at which
VACUUM is between the first and second phase of page deletion: right
before (or directly after) _bt_unlink_halfdead_page() is called. That
isn't fundamentally different to the approach from your TAP test, but
seems like it might add some interesting variation.

> One can encounter this if recovery ends between a MARK_PAGE_HALFDEAD record
> and its corresponding UNLINK_PAGE record.  See the attached test case.  The
> index is actually fine in such a state, right?

Yes, it is fine.

FWIW, this feels like it might be related to the fact that (unlike
Lanin & Shasha), we don't make the key space move left; we make it
move right instead (just like page splits). In other words, page
deletion isn't the precise opposite of a page split, which is a bit
awkward.

Note, in particular, that _bt_mark_page_halfdead() doesn't do a
straight delete of the pivot tuple in the parent page that points to
the target page, as you might expect. It actually deletes the right
sibling of the target page's pivot, and then performs an in-place overwrite of
the downlink from the pivot tuple that originally pointed to the
target page. Perhaps this isn't worth going into now, but thought you
might appreciate the context.

Terminology note: we sometimes use "downlink" as a synonym of "pivot
tuple" or even "separator key", which is misleading.

> I lean toward fixing this by
> having amcheck scan left; if left links reach only half-dead or deleted pages,
> that's as good as the present child block being P_LEFTMOST.

Also my preference.

> There's a
> different error from bt_index_check(), and I've not yet studied how to fix
> that:
>
>   ERROR:  left link/right link pair in index "not_leftmost_pk" not in agreement
>   DETAIL:  Block=0 left block=0 left link from block=4294967295.

This looks like it might be a straightforward case of confusing
P_NONE for InvalidBlockNumber (or vice-versa). They're both used to
indicate "no such block" here.

> Alternatively, one could view this as a need for the user to VACUUM between
> recovery and amcheck.  The documentation could direct users to "VACUUM
> (DISABLE_PAGE_SKIPPING off, INDEX_CLEANUP on, TRUNCATE off)" if not done since
> last recovery.  Does anyone prefer that or some other alternative?

I'd rather not go that route. That strikes me as defining the problem
out of existence.

> For some other amcheck expectations, the comments suggest reliance on the
> bt_index_parent_check() ShareLock.  I haven't tried to make test cases for
> them, but perhaps recovery can trick them the same way.  Examples:
>
>   errmsg("downlink or sibling link points to deleted block in index \"%s\"",
>   errmsg("block %u is not leftmost in index \"%s\"",
>   errmsg("block %u is not true root in index \"%s\"",

These are all much older. They're certainly all from before the
relevant checks were first added (by commit d114cc53), and seem much
less likely to be buggy.

These older cases are all cases where we descend directly from an
internal page to one of its child pages. Whereas the problem you've
demonstrated involves traversal across levels *and* across siblings in
newer code. That's quite a bit more complicated, since it requires
that we worry about both phases of page deletion -- not just the
first. That in itself necessitates that we deal with various edge
cases. (The really prominent edge-case is the interrupted page
deletion case, which requires significant handling, but evidently
missed a subtlety with leftmost pages).


--
Peter Geoghegan




Re: CREATE DATABASE with filesystem cloning

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-07 18:51:45 +1300, Thomas Munro wrote:
> It should be a lot faster, and use less physical disk, than the two
> existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
> APFS (= macOS), and it could in theory be extended to other systems
> that invented different system calls for this with more work (Solaris,
> Windows).  Then extra physical disk space will be consumed only as the
> two clones diverge.

> It's just like the old strategy=file_copy, except it asks the OS to do
> its best copying trick.  If you try it on a system that doesn't
> support copy-on-write, then copy_file_range() should fall back to
> plain old copy, but it might still be better than we could do, as it
> can push copy commands to network storage or physical storage.
> 
> Therefore, the usual caveats from strategy=file_copy also apply here.
> Namely that it has to perform checkpoints which could be very
> expensive, and there are some quirks/brokenness about concurrent
> backups and PITR.  Which makes me wonder if it's worth pursuing this
> idea.  Thoughts?
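
(As background for readers unfamiliar with the call, here is the basic shape
of a copy_file_range() copy loop -- an illustrative sketch only, not the
patch's code:)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <limits.h>
    #include <unistd.h>

    /*
     * Copy src to dst; on reflink-capable filesystems the kernel can
     * share blocks instead of copying them.
     */
    static int
    clone_regular_file(const char *src, const char *dst)
    {
        int     srcfd = open(src, O_RDONLY);
        int     dstfd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
        ssize_t nbytes = -1;

        if (srcfd >= 0 && dstfd >= 0)
        {
            do
            {
                /* NULL offsets: the kernel advances both file positions */
                nbytes = copy_file_range(srcfd, NULL, dstfd, NULL,
                                         SSIZE_MAX, 0);
            } while (nbytes > 0);
        }
        if (srcfd >= 0)
            close(srcfd);
        if (dstfd >= 0)
            close(dstfd);
        return nbytes == 0 ? 0 : -1;
    }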

I think it'd be interesting to have. For the regression tests we do end up
spending a lot of disk throughput on contents duplicated between
template0/template1/postgres.  And I've spent plenty of time copying huge
template databases, to have a reproducible starting point for some benchmark
that's expensive to initialize.

If we do this, I think we should consider creating template0, template1 with
the new strategy, so that a new initdb cluster ends up with deduplicated data.


FWIW, I experimented with using cp -c on macos for the initdb template, and
that provided some further gain. I suspect that that gain would increase if
template0/template1/postgres were deduplicated.


> diff --git a/src/backend/storage/file/copydir.c 
> b/src/backend/storage/file/copydir.c
> index e04bc3941a..8c963ff548 100644
> --- a/src/backend/storage/file/copydir.c
> +++ b/src/backend/storage/file/copydir.c
> @@ -19,14 +19,21 @@
>  #include "postgres.h"
>  
 #include <fcntl.h>
+#include 
 #include <unistd.h>
 
+#ifdef HAVE_COPYFILE_H
+#include <copyfile.h>
+#endif

We already have code around this in src/bin/pg_upgrade/file.c, seems we ought
to move it somewhere in src/port?

Greetings,

Andres Freund




Re: typo in couple of places

2023-10-09 Thread vignesh C
On Mon, 9 Oct 2023 at 16:17, Amit Kapila  wrote:
>
> On Sat, Oct 7, 2023 at 8:28 AM vignesh C  wrote:
> >
> > On Fri, 6 Oct 2023 at 20:50, Dagfinn Ilmari Mannsåker  
> > wrote:
> > >
> > > vignesh C  writes:
> > >
> > > > Hi,
> > > >
> > > > I noticed a couple of typos in code. "the the" should have been "the",
> > > > attached patch has the changes for the same.
> > >
> > > This made me curious about other duplicate word occurrences, and after a
> > > few minutes of increasingly-elaborate regexing to exclude false
> > > positives, I found a few more (plus, as a bonus, a _missing_ "the").  See the
> > > attached patch (which includes your original one, for completeness).
> >
> > Thanks, Looks good.
> >
>
> Pushed.

Thanks for pushing this.




Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-09 Thread vignesh C
On Mon, 9 Oct 2023 at 12:18, Peter Smith  wrote:
>
> On Mon, Oct 9, 2023 at 3:32 PM Amit Kapila  wrote:
> >
> > On Fri, Oct 6, 2023 at 12:15 PM Peter Smith  wrote:
> > >
> > > Here is a patch to add the 2 missing references:
> > >
> > > ref/alter_subscription.sgml ==> added more ids
> > > ref/alter_publication.sgml ==> added link to
> > > "sql-altersubscription-refresh-publication"
> > > ref/drop_subscription.sgml ==> added link to "sql-altersubscription"
> > >
> >
> > -   
> > +   
> >  new_owner
> >  
> >   
> > @@ -281,7 +281,7 @@ ALTER SUBSCRIPTION  > class="parameter">name RENAME TO <
> >  
> > 
> >
> > -   
> > +   
> >  new_name
> >  
> >
>
> Thanks for looking at this patch!
>
> > Shall we append 'params' in the above and other id's in the patch. For
> > example, sql-altersubscription-params-new-name. The other places like
> > alter_role.sgml and alter_table.sgml uses similar id's. Is there a
> > reason to be different here?
>
> In v1, I used the same pattern as on the CREATE SUBSCRIPTION page,
> which doesn't look like those...
>
> ~~~
>
> The "Parameters" section describes some things that really are parameters:
>
> e.g.
> "sql-altersubscription-name"
> "sql-altersubscription-new-owner"
> "sql-altersubscription-new-name">
>
> I agree, emphasising that those ones are parameters is better. Changed
> like this in v2.
>
> "sql-altersubscription-params-name"
> "sql-altersubscription-params-new-owner"
> "sql-altersubscription-params-new-name">
>
> ~
>
> But, the "Parameters" section also describes other SQL syntax clauses
> which are not really parameters at all.
>
> e.g.
> "sql-altersubscription-refresh-publication"
> "sql-altersubscription-enable"
> "sql-altersubscription-disable"
>
> So I felt those ones are more intuitive left as they are  -- e.g.,
> instead of having ids/linkends like:
>
> "sql-altersubscription-params-refresh-publication"
> "sql-altersubscription-params-enable"
> "sql-altersubscription-params-disable"
>
> ~~
>
> PSA v2.

I noticed a couple of other places where a link to "ALTER SUBSCRIPTION
... DISABLE" and "ALTER SUBSCRIPTION ... SET" can be specified in
drop_subscription:
To proceed in this situation, first disable the subscription by
executing  ALTER SUBSCRIPTION ... DISABLE, and then
disassociate it from the replication slot by executing ALTER
SUBSCRIPTION ... SET (slot_name = NONE).

Regards,
Vignesh




Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-10-09 Thread Michael Paquier
On Thu, Oct 05, 2023 at 08:51:40AM -0400, Robert Haas wrote:
> On Thu, Oct 5, 2023 at 6:25 AM Nazir Bilal Yavuz  wrote:
>> What do you think about the second patch, counting extend calls'
>> timings in blk_write_time? In my opinion if something increments
>> {shared|local}_blks_written, then it needs to be counted in
>> blk_write_time too. I am not sure why it is decided like that.
> 
> I agree that an extend should be counted the same way as a write. But
> I'm suspicious that here too we have confusion about whether
> blk_write_time is supposed to be covering shared buffers and local
> buffers or just shared buffers.

Agreed.

In ~14, as far as I can see blk_write_time is only incremented for
shared buffers.  FWIW, I agree that we should improve these stats for
local buffers but I am not on board with a solution where we'd use the
same counter for local and shared buffers while we've historically
only counted the former, because that could confuse existing
monitoring queries.  It seems to me that the right solution is to do
the same separation as temp blocks with two separate counters, without
a backpatch.  I'd like to go as far as renaming blk_read_time and
blk_write_time to respectively shared_blk_read_time and
shared_blk_write_time to know exactly what the type of block dealt
with is when querying this data, particularly for pg_stat_statements's
sake.
--
Michael




Re: Included xid in restoring reorder buffer changes from disk log message

2023-10-09 Thread Kyotaro Horiguchi
At Fri, 6 Oct 2023 14:58:13 +0530, vignesh C  wrote in 
> On Fri, 30 Apr 2021 at 11:53, Dilip Kumar  wrote:
> > It makes sense to include xid in the debug message, earlier I thought
> > that will it be a good idea to also print the toplevel_xid.  But I
> > think it is for debug purposes and only we have the xid we can fetch
> > the other parameters so maybe adding xid is good enough.

+1

> While having a look at the reorderbuffer code, I noticed that this
> changes were still not committed.
> Here is a rebased version of the patch.

While this patch makes the following change on the de-serializing side:

-   elog(DEBUG2, "restored %u/%u changes from disk",
+   elog(DEBUG2, "restored %u/%u changes of XID %u from disk",

the counterpart ReorderBufferSerializeTXN() has the following
message.

>   elog(DEBUG2, "spill %u changes in XID %u to disk",
>(uint32) txn->nentries_mem, txn->xid);

It might be preferable for those two messages to have a corresponding
appearance.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2023-10-09 Thread Kyotaro Horiguchi
Thank you for testing this!

At Fri, 6 Oct 2023 12:28:32 +0530, Shlok Kyal  wrote in
> D:\project\pg_dev\bin>pg_ctl -D ../data -l data2.log start
> pg_ctl: another server might be running; trying to start server anyway
> waiting for server to start
> pg_ctl: launcher shell died
> 
> The output message after the patch is different from HEAD. I felt that
> with the patch as well we should get the message "pg_ctl: could not start
> server".
> Is this message change intentional?

Partly no, partly yes. My focus was on verifying the accuracy of
identifying the actual postmaster PID on Windows. The current patch
provides a detailed description of the events, primarily because I
lack a comprehensive understanding of both the behavior of Windows
APIs and the associated processes.  Given that context, the messages
essentially serve debugging purposes.

I agree with your suggestion.  Ultimately, if there's a possibility
for this to be committed, the message will be consolidated to "could
not start server".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-09 Thread David Rowley
On Tue, 10 Oct 2023 at 06:38, Tom Lane  wrote:
> Hm.  I'd be inclined to use maxlen == 0 as the indicator of a read-only
> buffer, just because that would not create a problem if we ever want
> to change it to an unsigned type.  Other than that, I agree with the
> idea of using a special maxlen value to indicate that the buffer is
> read-only and not owned by the StringInfo.  We need to nail down the
> exact semantics though.

I've attached a slightly more worked-on patch that makes maxlen == 0
mean read-only.  Unsure if a macro is worthwhile there or not.
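
For readers without the attachment, the helper presumably looks roughly
like this (a sketch only; the definition in the attached patch is
authoritative):

    static inline void
    initStringInfoFromStringWithLen(StringInfo str, char *data, int len)
    {
        str->data = data;
        str->len = len;
        str->cursor = 0;
        str->maxlen = 0;    /* read-only: buffer not owned, so any append
                             * must first copy it into palloc'd memory */
    }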

The patch still fails during 023_twophase_stream.pl for the reasons
mentioned upthread.  Getting rid of the Assert in
initStringInfoFromStringWithLen() allows it to pass.

One thought I had about this is that the memory context behaviour
might catch someone out at some point.  Right now if you do
initStringInfo() the memory context of the "data" field will be
CurrentMemoryContext, but if someone does
initStringInfoFromStringWithLen() and then changes to some other
memory context before doing an appendStringInfo on that string, then
we'll allocate "data" in whatever that memory context is. Maybe that's
ok if we document it.  Fixing it would mean adding a MemoryContext
field to StringInfoData which would be set to CurrentMemoryContext
during initStringInfo() and initStringInfoFromStringWithLen().
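
That fix would presumably look something like this (hypothetical; today's
StringInfoData has only the four existing fields):

    typedef struct StringInfoData
    {
        char       *data;
        int         len;
        int         maxlen;
        int         cursor;
        MemoryContext cxt;  /* hypothetical: context that owns "data",
                             * captured at init time for later repallocs */
    } StringInfoData;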

I'm not fully happy with the extra code added in enlargeStringInfo().
It's a little repetitive.  Fixing it up would mean adding a boolean
variable to mark whether the string was read-only, so that at the end
we'd know whether to repalloc or palloc/memcpy.  For now, I just marked
that code as unlikely() since there's no place in the code base that
uses it.
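
The tidied-up version would look roughly like this (sketch only):

  bool    was_readonly = (str->maxlen == 0);

  /* ... compute "newlen" as enlargeStringInfo() does today ... */

  if (unlikely(was_readonly))
  {
      /* buffer isn't ours to repalloc: allocate fresh and copy the
       * contents, including the trailing NUL */
      char   *newdata = (char *) palloc(newlen);

      memcpy(newdata, str->data, str->len + 1);
      str->data = newdata;
  }
  else
      str->data = (char *) repalloc(str->data, newlen);

  str->maxlen = newlen;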

David
diff --git a/src/backend/replication/logical/applyparallelworker.c b/src/backend/replication/logical/applyparallelworker.c
index 82f48a488e..fdfbd45e2c 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -774,10 +774,7 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
 		if (len == 0)
 			elog(ERROR, "invalid message length");
 
-		s.cursor = 0;
-		s.maxlen = -1;
-		s.data = (char *) data;
-		s.len = len;
+		initStringInfoFromStringWithLen(&s, data, len);
 
 		/*
 		 * The first byte of messages sent from leader apply worker to
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index d52c8963eb..e51a2b7ce1 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
 	/* Read the data */
 	for (i = 0; i < natts; i++)
 	{
+		char	   *buff;
 		char		kind;
 		int			len;
 		StringInfo	value = &tuple->colvalues[i];
@@ -899,19 +900,16 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple)
 				len = pq_getmsgint(in, 4);	/* read length */
 
 				/* and data */
-				value->data = palloc(len + 1);
-				pq_copymsgbytes(in, value->data, len);
+				buff = palloc(len + 1);
+				pq_copymsgbytes(in, buff, len);
 
 				/*
 				 * Not strictly necessary for LOGICALREP_COLUMN_BINARY, but
 				 * per StringInfo practice.
 				 */
-				value->data[len] = '\0';
+				buff[len] = '\0';
 
-				/* make StringInfo fully valid */
-				value->len = len;
-				value->cursor = 0;
-				value->maxlen = len;
+				initStringInfoFromStringWithLen(value, buff, len);
 				break;
 			default:
 				elog(ERROR, "unrecognized data representation type '%c'", kind);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 597947410f..b4bf3111cf 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3582,10 +3582,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 			/* Ensure we are reading the data into our memory context. */
 			MemoryContextSwitchTo(ApplyMessageContext);
 
-			s.data = buf;
-			s.len = len;
-

Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-01 14:53:23 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > Is this patch still being worked on?
> 
> I thought Andres simply hadn't gotten back to it yet.
> It still seems like a worthwhile improvement.

Indeed - I do plan to commit it. I haven't quite shifted into v17 mode yet...

Greetings,

Andres Freund




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Oct 09, 2023 at 12:20:18PM -0400, Tom Lane wrote:
>> There will be a window where the worker has logged "database
>> "noconndb" is not currently accepting connections" but hasn't yet
>> exited, so that conceivably this query could see a positive count.

> I don't think that's possible here.  The check on datallowconn is done
> before a backend calls pgstat_bestart() which would make its backend
> entry reported to pg_stat_activity.  So there is no window where a
> backend would be in pg_stat_activity if this check fails.

Ah, right.  I complained after seeing that we set MyProc->databaseId
before doing CheckMyDatabase, but you're right that it doesn't
matter for pg_stat_activity until pgstat_bestart. 

> Saying that, I'm OK with just dropping this query, as it could also be
> possible that one decides that calling pgstat_bestart() before the
> datallowconn check is a good idea for a reason or another.

Not sure if that's a likely change or not.  However, if we're in
agreement that this test step isn't buying much, let's just drop
it and save the test cycles.

regards, tom lane




Re: Lowering the default wal_blocksize to 4K

2023-10-09 Thread Tom Lane
Andres Freund  writes:
> There's an alternative approach we could take, which is to write in 4KB
> increments, while keeping 8KB pages. With the current format that's not
> obviously a bad idea. But given there aren't really advantages in 8KB WAL
> pages, it seems we should just go for 4KB?

Seems like that's doubling the overhead of WAL page headers.  Do we need
to try to skinny those down?

regards, tom lane




Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-09 Thread Peter Smith
On Tue, Oct 10, 2023 at 11:46 AM vignesh C  wrote:
>
> I noticed a couple of other places where a link to "ALTER SUBSCRIPTION
> ... DISABLE" and "ALTER SUBSCRIPTION ... SET" can be specified in
> drop_subscription:
> To proceed in this situation, first disable the subscription by
> executing  ALTER SUBSCRIPTION ... DISABLE, and then
> disassociate it from the replication slot by executing ALTER
> SUBSCRIPTION ... SET (slot_name = NONE).
>

Hi Vignesh. Thanks for the review comments.

Modified as suggested.

PSA v3.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-Add-more-links.patch
Description: Binary data


Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific

2023-10-09 Thread Thomas Munro
FTR I ran into a benign case of the phenomenon in this thread when
dealing with row types.  In rowtypes.c, we double-quote stuff
containing spaces, but we detect them by passing individual bytes of
UTF-8 sequences to isspace().  Like macOS, Windows thinks that 0xa0 is
a space when you do that, so for example the Korean character '점'
(code point C810, UTF-8 sequence EC A0 90) gets quotes on Windows but
not on Linux.  That confused a migration/diff tool while comparing
Windows and Linux database servers using that representation.  Not a
big deal, I guess no one ever promised that the format was stable
across platforms, and I don't immediately see a way for anything more
serious to go wrong (though I may lack imagination).  It does seem a
bit weird to be using locale-aware tokenising for a machine-readable
format, and then making sure its behaviour is undefined by feeding it
chopped up bytes.
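
To make that concrete, a minimal standalone illustration (the results
vary by platform and locale, which is exactly the problem):

  #include <ctype.h>
  #include <locale.h>
  #include <stdio.h>

  int
  main(void)
  {
      /* '점' (U+C810) encoded as UTF-8: EC A0 90 */
      const unsigned char jeom[] = {0xEC, 0xA0, 0x90};

      setlocale(LC_CTYPE, "");    /* adopt the environment's locale */

      for (int i = 0; i < 3; i++)
          printf("byte 0x%02X: isspace() -> %d\n",
                 jeom[i], isspace(jeom[i]) != 0);

      /*
       * On macOS and Windows C libraries, the 0xA0 byte (NBSP in many
       * 8-bit charsets) can report as a space here; on glibc it does not.
       */
      return 0;
  }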




Re: Crash in add_paths_to_append_rel

2023-10-09 Thread David Rowley
On Mon, 9 Oct 2023 at 22:49, Richard Guo  wrote:
> I came across a crash in add_paths_to_append_rel() with the query below.

> It was introduced by commit a8a968a8 where we referenced
> cheapest_startup_path->param_info without first checking that we have a
> valid cheapest_startup_path.

I pushed this with just minor adjustments to the comments you wrote. I
didn't feel the need to reiterate on the set_cheapest() comments.

David




Re: Lowering the default wal_blocksize to 4K

2023-10-09 Thread Andres Freund
Hi,

On 2023-10-09 23:16:30 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > There's an alternative approach we could take, which is to write in 4KB
> > increments, while keeping 8KB pages. With the current format that's not
> > obviously a bad idea. But given there aren't really advantages in 8KB WAL
> > pages, it seems we should just go for 4KB?
> 
> Seems like that's doubling the overhead of WAL page headers.  Do we need
> to try to skinny those down?

I think the overhead is small, and we are wasting so much space in other
places that I am not worried about the proportional increase in page header
space usage at this point, particularly compared to the savings in overall
write rate and the increase in TPS.  There are other areas where we can save
much more space, if we want to focus on that.

I was thinking we should perhaps do the opposite, namely getting rid of short
page headers.  The overhead in the "byte position" <-> LSN conversion due to
the differing space is worse than the gain.  Or do something in between:
having the system ID in the header adds a useful crosscheck, but I'm far less
convinced that having the segment and block size in there, as 32-bit numbers
no less, is worthwhile.  After all, if the system ID matches, it's not likely
that the xlog block or segment size differs.
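
(For scale, my arithmetic from the current struct sizes: a short page
header is 24 bytes and a long one 40, so short headers cost roughly
24/8192 ~ 0.3% of each page at 8KB pages and 24/4096 ~ 0.6% at 4KB.)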

Greetings,

Andres Freund




Re: Add support for AT LOCAL

2023-10-09 Thread Michael Paquier
On Sat, Oct 07, 2023 at 02:35:06AM +0100, Vik Fearing wrote:
> I realized that my regression tests are not exercising what I originally
> intended them to after this change.  They are supposed to show that calling
> the function explicitly or using AT LOCAL is correctly reproduced by
> ruleutils.c.

Yes, I don't really see the point in adding more tests for the
deparsing of AT TIME ZONE in this context.  I would not expect one to
call timezone() directly with the grammar in place, but I have no
objections either to doing that in the view for the regression tests as
you are suggesting in v4.  The minimal set of changes to test is to
make sure that both paths (ts->tstz and tstz->ts) are exercised, and
that's what you do.

Anyway, upon review, I have a few issues with this patch.  First is
the documentation that I find too light:
- There is no description for AT LOCAL and what kind of result one
gets back depending on the input given, while AT TIME ZONE has more
details about the arguments that can be used and the results
obtained.
- The function timezone(zone, timestamp) is mentioned, and I think
that we should do the same with timezone(timestamp) for AT LOCAL.
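
While at it, I would add a short example to the docs; something like
this (the output depends on the session's TimeZone, of course):

  SET TimeZone = 'Asia/Tokyo';
  SELECT timestamp with time zone '2023-10-10 00:00:00+00' AT LOCAL;
  -- 2023-10-10 09:00:00, a timestamp without time zone; same as
  -- AT TIME ZONE current_setting('TimeZone')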

Another thing that surprised me is the following, which is
inconsistent with AT TIME ZONE because we are lacking one system
function:
=# select time with time zone '05:34:17-05' at local;
ERROR:  42883: function pg_catalog.timezone(time with time zone) does not exist

I think that we should include that to have a full set of operations
supported, similar to AT TIME ZONE (see \df+ timezone).  It looks like
this would need one extra timetz_at_local(), which would require a bit
more refactoring in date.c so that an equivalent of timetz_zone() could
feed on the current session's TimeZone instead.  I guess that you
could just have a timetz_zone_internal() that optionally takes a
timezone provided by the user or gets the current session's TimeZone
(session_timezone).  Am I missing something?
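
Roughly what I have in mind, as a sketch only (both helper names below
are made up):

  static TimeTzADT *
  timetz_zone_internal(text *zone, TimeTzADT *t)
  {
      pg_tz  *tzp;

      if (zone == NULL)
          tzp = session_timezone;             /* AT LOCAL */
      else
          tzp = lookup_timezone_name(zone);   /* made-up name for the
                                               * zone parsing currently
                                               * in timetz_zone() */

      return timetz_convert_to_zone(t, tzp);  /* made-up name for the
                                               * existing conversion */
  }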

I am attaching a v5 that addresses the documentation bits, could you
look at the business with date.c?
--
Michael
From e136d38967a863452762e1ee6e28d9ab40056adf Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 10 Oct 2023 13:28:46 +0900
Subject: [PATCH v5] Add support for AT LOCAL with timestamps

When converting a timestamp to/from with/without time zone, the SQL
Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the
session's time zone.

Note: bump catalog version
---
 src/include/catalog/pg_proc.dat   |  6 +++
 src/backend/parser/gram.y |  7 +++
 src/backend/utils/adt/ruleutils.c |  9 
 src/backend/utils/adt/timestamp.c | 20 
 src/test/regress/expected/timestamptz.out | 47 +
 src/test/regress/sql/timestamptz.sql  | 21 
 doc/src/sgml/func.sgml| 62 +--
 7 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f0b7b9cbd8..21645b63a4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -2319,6 +2319,12 @@
 { oid => '1159', descr => 'adjust timestamp to new time zone',
   proname => 'timezone', prorettype => 'timestamp',
   proargtypes => 'text timestamptz', prosrc => 'timestamptz_zone' },
+{ oid => '9159', descr => 'adjust timestamp to local time zone',
+  proname => 'timezone', prorettype => 'timestamp',
+  proargtypes => 'timestamptz', prosrc => 'timestamptz_at_local' },
+{ oid => '9160', descr => 'adjust timestamp to local time zone',
+  proname => 'timezone', prorettype => 'timestamptz',
+  proargtypes => 'timestamp', prosrc => 'timestamp_at_local' },
 
 { oid => '1160', descr => 'I/O',
   proname => 'interval_in', provolatile => 's', prorettype => 'interval',
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e56cbe77cb..50ed504e5a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -14508,6 +14508,13 @@ a_expr:		c_expr	{ $$ = $1; }
 			   COERCE_SQL_SYNTAX,
 			   @2);
 }
+			| a_expr AT LOCAL		%prec AT
+{
+	$$ = (Node *) makeFuncCall(SystemFuncName("timezone"),
+			   list_make1($1),
+			   COERCE_SQL_SYNTAX,
+			   -1);
+}
 		/*
 		 * These operators must be called out explicitly in order to make use
 		 * of bison's automatic operator-precedence handling.  All other
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 442205382e..9d1b0b13b1 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -10347,6 +10347,15 @@ get_func_sql_syntax(FuncExpr *expr, deparse_context *context)
 			appendStringInfoChar(buf, ')');
 			return true;
 
+		case F_TIMEZONE_TIMESTAMP:
+		case F_TIMEZONE_TIMESTAMPTZ:
+			/* AT LOCAL */
+			appendStringInfoChar(buf, '(');
+			get_rule_expr_paren((Node *) linitial(expr->args), context, false,
+		

Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Michael Paquier
On Mon, Oct 09, 2023 at 11:11:58PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
>> Saying that, I'm OK with just dropping this query, as it could also be
>> possible that one decides that calling pgstat_bestart() before the
>> datallowconn check is a good idea for a reason or another.
> 
> Not sure if that's a likely change or not.  However, if we're in
> agreement that this test step isn't buying much, let's just drop
> it and save the test cycles.

No problem here.  f483b2090 has removed the query entirely, relying
now only on a wait_for_log() when the worker startup fails.
--
Michael


signature.asc
Description: PGP signature


Re: Crash in add_paths_to_append_rel

2023-10-09 Thread Andy Fan
On Tue, Oct 10, 2023 at 11:52 AM David Rowley  wrote:

> On Mon, 9 Oct 2023 at 22:49, Richard Guo  wrote:
> > I came across a crash in add_paths_to_append_rel() with the query below.
>
> > It was introduced by commit a8a968a8 where we referenced
> > cheapest_startup_path->param_info without first checking that we have a
> > valid cheapest_startup_path.
>
> I pushed this with just minor adjustments to the comments you wrote. I
> didn't feel the need to reiterate on the set_cheapest() comments.
>
>
 Thanks Richard for the report, and thank you David for the push!

-- 
Best Regards
Andy Fan


Minor edit to src/bin/pg_upgrade/IMPLEMENTATION

2023-10-09 Thread Gurjeet Singh
The attached patch adds the word 'databases' to show that template0,
template1 and postgres are databases in a default installation.

Best regards,
Gurjeet
http://Gurje.et


pg_upgrade.implementation.diff
Description: Binary data


Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Drouvot, Bertrand

Hi,

On 10/6/23 8:48 AM, Drouvot, Bertrand wrote:

Hi,

On 10/5/23 6:23 PM, Bharath Rupireddy wrote:

On Thu, Oct 5, 2023 at 9:32 PM Drouvot, Bertrand  wrote:


+  CREATE ROLE nologrole with nologin;
+  GRANT CREATE ON DATABASE mydb TO nologrole;


A few nit-picks:

1. s/with/WITH
2. s/nologin/NOLOGIN


done in v8 attached.
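
i.e. the snippet now reads:

  CREATE ROLE nologrole WITH NOLOGIN;
  GRANT CREATE ON DATABASE mydb TO nologrole;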


3. +   is specified as flags it is possible to
bypass the login check to connect to databases.
How about "it is possible to bypass the login check for the role used
to connect to databases."?



"for the role used" sounds implicit to me but I don't have a strong opinion
about it so re-worded as per your proposal in v8.



Please find attached v9 (v8 rebase due to f483b2090).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From e3a75359b2f3a57d7312ff4ab3f77ae78f44f221 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 5 Oct 2023 14:06:37 +0900
Subject: [PATCH v9] Allow background workers to bypass login check

This adds a new flag value (BGWORKER_BYPASS_ROLELOGINCHECK) to the flags
being used in BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().  This flag allows the
background workers to bypass the login check done in
InitializeSessionUserId().
---
 doc/src/sgml/bgworker.sgml|  4 ++-
 src/backend/bootstrap/bootstrap.c |  2 +-
 src/backend/postmaster/autovacuum.c   |  5 ++-
 src/backend/postmaster/postmaster.c   |  2 ++
 src/backend/tcop/postgres.c   |  1 +
 src/backend/utils/init/miscinit.c |  4 +--
 src/backend/utils/init/postinit.c |  5 +--
 src/include/miscadmin.h   |  4 ++-
 src/include/postmaster/bgworker.h |  6 ++--
 .../modules/worker_spi/t/001_worker_spi.pl| 36 +++
 src/test/modules/worker_spi/worker_spi.c  |  2 ++
 11 files changed, 59 insertions(+), 12 deletions(-)
  10.0% doc/src/sgml/
  10.9% src/backend/postmaster/
  15.4% src/backend/utils/init/
   9.6% src/include/postmaster/
  45.8% src/test/modules/worker_spi/t/
   3.1% src/test/modules/worker_spi/
   4.8% src/

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 9ad1146ba0..8b61cf1c55 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -200,7 +200,9 @@ typedef struct BackgroundWorker
    InvalidOid, the process will run as the superuser created
    during initdb. If BGWORKER_BYPASS_ALLOWCONN
    is specified as flags it is possible to bypass the restriction
-   to connect to databases not allowing user connections.
+   to connect to databases not allowing user connections. If BGWORKER_BYPASS_ROLELOGINCHECK
+   is specified as flags it is possible to bypass the login check for the
+   role used to connect to databases.
    A background worker can only call one of these two functions, and only
    once.  It is not possible to switch databases.
   
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..bdd005b66a 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -345,7 +345,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 	if (pg_link_canary_is_frontend())
 		elog(ERROR, "backend is incorrectly linked to frontend functions");
 
-	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, false, false, NULL);
+	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, false, false, false, NULL);
 
 	/* Initialize stuff for bootstrap-file processing */
 	for (i = 0; i < MAXATTR; i++)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ae9be9b911..4686b81f68 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -488,7 +488,7 @@ AutoVacLauncherMain(int argc, char *argv[])
 	/* Early initialization */
 	BaseInit();
 
-	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, false, false, NULL);
+	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, false, false, false, NULL);
 
 	SetProcessingMode(NormalProcessing);
 
@@ -1706,8 +1706,7 @@ AutoVacWorkerMain(int argc, char *argv[])
 	 * Note: if we have selected a just-deleted database (due to using
 	 * stale stats info), we'll fail and exit here.
 	 */
-	InitPostgres(NULL, dbid, NULL, InvalidOid, false, false,
-				 dbname);
+	InitPostgres(NULL, dbid, NULL, InvalidOid, false, false, false, dbname);
 	SetProcessingMode(NormalProcessing);
 	set_ps_display(dbname);
 	ereport(DEBUG1,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 0761b38bf8..0502588013 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@

Re: On login trigger: take three

2023-10-09 Thread Alexander Korotkov
On Mon, Oct 9, 2023 at 11:58 PM Andres Freund  wrote:
> On 2023-10-09 17:11:25 +0300, Alexander Korotkov wrote:
> > This version should be good and has no overhead.  Any thoughts?
>
> I think you forgot to attach the patch?

That's it!

--
Regards,
Alexander Korotkov


0001-Add-support-event-triggers-on-authenticated-logi-v43.patch
Description: Binary data


Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-09 Thread Richard Guo
On Mon, Oct 9, 2023 at 5:48 PM Ashutosh Bapat  wrote:

> postgres@312571=# explain (analyze, costs off) select * from pg_class
> t1 where oid = 1 and oid = 2;
> QUERY PLAN
> ---
>  Result (actual time=0.002..0.003 rows=0 loops=1)
>One-Time Filter: false
>->  Index Scan using pg_class_oid_index on pg_class t1 (never executed)
>  Index Cond: (oid = '1'::oid)
>  Planning Time: 0.176 ms
>  Execution Time: 0.052 ms
> (6 rows)
>
> You will see that the scan node was never executed. Hence there's no
> execution time benefit if we remove the scan plan.


Yeah, the constant-FALSE is a pseudoconstant qual and will result in a
gating Result node atop the scan node.  So this optimization about
detecting constant-FALSE restriction clauses and marking the rel as
dummy if there is one is not likely to benefit execution time.  AFAICS
it can help save some planning effort, because once a base rel is
marked dummy, we won't bother building access paths for it.  Also a
dummy input rel can save effort when we generate paths for a joinrel;
see how we cope with dummy rels in populate_joinrel_with_paths().


> Where do we produce the single baserestrictinfo mentioned in the
> comments? Is it before the planning proper starts?


Do you mean the const-folding?  It happens in the preprocessing phase,
specifically in eval_const_expressions().
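
For instance, the clause below is folded to a single constant-FALSE
restriction in that phase, before planning proper starts:

  -- eval_const_expressions() reduces "1 = 2" to false
  EXPLAIN SELECT * FROM pg_class WHERE 1 = 2;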


> get_gating_quals does what you are doing much earlier in the query
> processing. Your code would just duplicate that.


Hm, I don't think so.  get_gating_quals is called in createplan.c, where
we've selected the best path, while the optimization with my code
happens much earlier, when we set size estimates for a base relation.
Neither of these two is a duplicate of the other.  I think the theory
here is that it's always a win to mark a rel as dummy if possible as
early as we can.

Thanks
Richard


Re: Check each of base restriction clauses for constant-FALSE-or-NULL

2023-10-09 Thread Ashutosh Bapat
On Tue, Oct 10, 2023 at 11:09 AM Richard Guo  wrote:
> Do you mean the const-folding?  It happens in the preprocessing phase,
> specifically in eval_const_expressions().

Thanks.

> Hm, I don't think so.  get_gating_quals is called in createplan.c, where
> we've selected the best path, while the optimization with my code
> happens much earlier, when we set size estimates for a base relation.
> Neither of these two is a duplicate of the other.  I think the theory
> here is that it's always a win to mark a rel as dummy if possible as
> early as we can.

Right. Do you have an example where this could be demonstrated?

-- 
Best Wishes,
Ashutosh Bapat




Re: Crash in add_paths_to_append_rel

2023-10-09 Thread Richard Guo
On Mon, Oct 9, 2023 at 7:35 PM David Rowley  wrote:

> Since you've managed to attribute this to a specific commit, for the
> future, I think instead of opening a new thread, it would be more
> useful to communicate the issue on the thread that's linked in the
> commit message.


Ah, yes, I should have done it that way, sorry.

Thanks
Richard


Re: Crash in add_paths_to_append_rel

2023-10-09 Thread Richard Guo
On Tue, Oct 10, 2023 at 11:52 AM David Rowley  wrote:

> On Mon, 9 Oct 2023 at 22:49, Richard Guo  wrote:
> > I came across a crash in add_paths_to_append_rel() with the query below.
>
> > It was introduced by commit a8a968a8 where we referenced
> > cheapest_startup_path->param_info without first checking that we have a
> > valid cheapest_startup_path.
>
> I pushed this with just minor adjustments to the comments you wrote. I
> didn't feel the need to reiterate on the set_cheapest() comments.


Thanks for pushing!

Thanks
Richard


Re: Add const to values and nulls arguments

2023-10-09 Thread Peter Eisentraut

On 06.10.23 16:51, Aleksander Alekseev wrote:

There are a lot of Datum *values, bool *nulls argument pairs that should
really be const.  The 0001 patch makes those changes.  Some of these
hunks depend on each other.

The 0002 patch, which I'm not proposing to commit at this time, makes
similar changes but in a way that breaks the table and index AM APIs.
So I'm just including that here in case anyone wonders, why didn't you
touch those.  And also maybe if we ever change that API incompatibly we
could throw this one in then.


0001 looks good to me and passes the tests in several environments,
not surprisingly. Let's merge it.


done





Re: [Code Cleanup] : Small code cleanup in twophase.sql

2023-10-09 Thread Nishant Sharma
Hi,

Any takers (or rejectors) for the above?  It's a very small,
good-to-have cleanup patch.

Thanks,
Nishant (EDB).

On Tue, Sep 26, 2023 at 6:31 PM Nishant Sharma <nishant.sha...@enterprisedb.com> wrote:

> Hi,
>
>
> PFA a small code cleanup in twophase.sql, which adds a drop table
> statement for 'test_prepared_savepoint'; that drop appears to be
> missing from the cleanup section of the file.
>
> To support it I have below points:-
>
> 1) Grepping for the table 'test_prepared_savepoint' shows occurrences
> only in the twophase.out & twophase.sql files. This means the table is
> local to that SQL test file and not used in any other test file.
> 
> 2) I don't see any comment on why this was not added in the cleanup
> section of twophase.sql, even though the other two test tables are
> dropped there.
>
> 3) I ran "make check-world" with the patch and I don't see any failures.
>
> Kindly correct, if I missed anything.
>
>
> Regards,
> Nishant (EDB).
>


Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-09 Thread Michael Paquier
On Tue, Oct 10, 2023 at 06:57:05AM +0200, Drouvot, Bertrand wrote:
> Please find attached v9 (v8 rebase due to f483b2090).

I was looking at v8 just before you sent this v9, and still got
annoyed by the extra boolean argument added to InitPostgres().  So
please let me propose to bite the bullet and refactor that, as in the
attached 0001; that means a smaller diff footprint in all the callers
of InitPostgres() (I am not wedded to the flag names).

It looks like 0002 had the same issues as f483b209: the worker that
could not be started because of the login restriction could be
detected as stopped by worker_spi_launch(), causing the script to fail
hard.

0002 is basically your v9, able to work with the refactoring from
0001.  
--
Michael
From 6e094435789a0255ddd4961bd43cfae4bf54e135 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 10 Oct 2023 14:41:56 +0900
Subject: [PATCH v10 1/2] Refactor InitPostgres() with flags

---
 src/include/miscadmin.h |  6 --
 src/backend/bootstrap/bootstrap.c   |  2 +-
 src/backend/postmaster/autovacuum.c |  5 ++---
 src/backend/postmaster/postmaster.c | 16 
 src/backend/tcop/postgres.c |  5 +++--
 src/backend/utils/init/postinit.c   | 17 +
 6 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14bd574fc2..1cc3712c0f 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -463,12 +463,14 @@ extern PGDLLIMPORT AuxProcType MyAuxProcType;
  */
 
 /* in utils/init/postinit.c */
+/* flags for InitPostgres() */
+#define INIT_PG_LOAD_SESSION_LIBS		0x0001
+#define INIT_PG_OVERRIDE_ALLOW_CONNS	0x0002
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 		 const char *username, Oid useroid,
-		 bool load_session_libraries,
-		 bool override_allow_connections,
+		 bits32 flags,
 		 char *out_dbname);
 extern void BaseInit(void);
 
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..e01dca9b7c 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -345,7 +345,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 	if (pg_link_canary_is_frontend())
 		elog(ERROR, "backend is incorrectly linked to frontend functions");
 
-	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, false, false, NULL);
+	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, 0, NULL);
 
 	/* Initialize stuff for bootstrap-file processing */
 	for (i = 0; i < MAXATTR; i++)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ae9be9b911..327ea0d45a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -488,7 +488,7 @@ AutoVacLauncherMain(int argc, char *argv[])
 	/* Early initialization */
 	BaseInit();
 
-	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, false, false, NULL);
+	InitPostgres(NULL, InvalidOid, NULL, InvalidOid, 0, NULL);
 
 	SetProcessingMode(NormalProcessing);
 
@@ -1706,8 +1706,7 @@ AutoVacWorkerMain(int argc, char *argv[])
 		 * Note: if we have selected a just-deleted database (due to using
 		 * stale stats info), we'll fail and exit here.
 		 */
-		InitPostgres(NULL, dbid, NULL, InvalidOid, false, false,
-	 dbname);
+		InitPostgres(NULL, dbid, NULL, InvalidOid, 0, dbname);
 		SetProcessingMode(NormalProcessing);
 		set_ps_display(dbname);
 		ereport(DEBUG1,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 0761b38bf8..3d7fec995a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5562,6 +5562,11 @@ void
 BackgroundWorkerInitializeConnection(const char *dbname, const char *username, uint32 flags)
 {
 	BackgroundWorker *worker = MyBgworkerEntry;
+	bits32		init_flags = 0; /* never honor session_preload_libraries */
+
+	/* ignore datallowconn? */
+	if (flags & BGWORKER_BYPASS_ALLOWCONN)
+		init_flags |= INIT_PG_OVERRIDE_ALLOW_CONNS;
 
 	/* XXX is this the right errcode? */
 	if (!(worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION))
@@ -5571,8 +5576,7 @@ BackgroundWorkerInitializeConnection(const char *dbname, const char *username, u
 
 	InitPostgres(dbname, InvalidOid,	/* database to connect to */
  username, InvalidOid,	/* role to connect as */
- false,			/* never honor session_preload_libraries */
- (flags & BGWORKER_BYPASS_ALLOWCONN) != 0,	/* ignore datallowconn? */
+ init_flags,
  NULL);			/* no out_dbname */
 
 	/* it had better not gotten out of "init" mode yet */
@@ -5589,6 +5593,11 @@ void
 BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid, uint32 flags)
 {
 	BackgroundWorker *worker = MyBgworkerEntry;
+	bits32		init_flags = 0; /* never honor session_

Re: Minor edit to src/bin/pg_upgrade/IMPLEMENTATION

2023-10-09 Thread Michael Paquier
On Mon, Oct 09, 2023 at 09:55:29PM -0700, Gurjeet Singh wrote:
> The attached patch adds the word 'databases' to show that template0,
> template1 and postgres are databases in a default installation.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: [Code Cleanup] : Small code cleanup in twophase.sql

2023-10-09 Thread Michael Paquier
On Tue, Sep 26, 2023 at 06:31:42PM +0530, Nishant Sharma wrote:
> To support it I have below points:-
> 
> 1) Grepping this table 'test_prepared_savepoint' shows occurrences
> only in twophase.out & twophase.sql files. This means that table is
> local to that sql test file and not used in any other test file.
> 
> 2) I don't see any comment on why this was not added in the cleanup
> section of twophase.sql, but drop for other two test tables are done.
> 
> 3) I ran "make check-world" with the patch and I don't see any failures.

Note that sometimes tables are left around in the regression tests for
pg_upgrade, so that we can test dedicated upgrade paths for some object
types.  But here I think you're right: this is not a table that
matters for an upgrade, and the end of the test file also expects a
cleanup.  So it's OK by me to drop this table here.
--
Michael


signature.asc
Description: PGP signature


Re: PGDOCS - add more links in the pub/sub reference pages

2023-10-09 Thread vignesh C
On Tue, 10 Oct 2023 at 08:47, Peter Smith  wrote:
>
> On Tue, Oct 10, 2023 at 11:46 AM vignesh C  wrote:
> >
> > I noticed a couple of other places where a link to "ALTER SUBSCRIPTION
> > ... DISABLE" and "ALTER SUBSCRIPTION ... SET" can be specified in
> > drop_subscription:
> > To proceed in this situation, first disable the subscription by
> > executing  ALTER SUBSCRIPTION ... DISABLE, and then
> > disassociate it from the replication slot by executing ALTER
> > SUBSCRIPTION ... SET (slot_name = NONE).
> >
>
> Hi Vignesh. Thanks for the review comments.
>
> Modified as suggested.
>
> PSA v3.

A few more instances in other logical-replication-related pages:
1) Another instance was in alter_subscription.sgml:
  Fetch missing table information from publisher.  This will start
  replication of tables that were added to the subscribed-to publications
  since CREATE SUBSCRIPTION or
  the last invocation of REFRESH PUBLICATION.
2) A few more instances were in logical-replication-subscription.html
2.a)Normally, the remote replication slot is created automatically when the
subscription is created using CREATE SUBSCRIPTION and it
is dropped automatically when the subscription is dropped using
DROP SUBSCRIPTION.  In some situations, however, it can
be useful or necessary to manipulate the subscription and the underlying
replication slot separately.
2.b)  When dropping a subscription, the remote host is not reachable.  In
   that case, disassociate the slot from the subscription
   using ALTER SUBSCRIPTION before attempting to drop
   the subscription.
2.c)  If a subscription is affected by this problem, the only way to resume
 replication is to adjust one of the column lists on the publication
 side so that they all match; and then either recreate the subscription,
 or use ALTER SUBSCRIPTION ... DROP PUBLICATION to
 remove one of the offending publications and add it again.
2.d) The
   transaction that produced the conflict can be skipped by using
   ALTER SUBSCRIPTION ... SKIP with the finish LSN
   (i.e., LSN 0/14C0378).
2.e)Before using this function, the subscription needs to be
disabled temporarily
   either by ALTER SUBSCRIPTION ... DISABLE or,

Regards,
Vignesh




  1   2   >