Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-06 Thread Masahiko Sawada
On Fri, Jan 6, 2023 at 12:07 PM Imseih (AWS), Sami  wrote:
>
> >Similar to above three cases, vacuum can bypass index vacuuming if
> >there are almost zero TIDs. Should we set indexes_total to 0 in this
> >case too? If so, I think we can set both indexes_total and
> >indexes_completed at the beginning of the index vacuuming/cleanup and
> >reset them at the end.
>
> Unlike the other 3 cases, in which the vacuum and cleanup are totally skipped,
> a cleanup still occurs when the index vacuum is bypassed. From what I can 
> tell,
> this is to allow for things like a gin pending list cleanup even if the index
> is not vacuumed.

Right. But since we set indexes_total also at the beginning of index
cleanup (i.e. lazy_cleanup_all_indexes()), can't we still show the
valid value in this case? My point is whether we should show
indexes_total throughout the vacuum execution (even in phases where it
is not relevant, such as heap scanning/vacuuming/truncation). For example,
in the analyze progress report, we have child_tables_total and
child_tables_done. child_tables_total is set before the actual work of
sampling rows from child tables, but not at the beginning of the analyze.
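
For reference, a monitoring query over the proposed columns would look
something like this (column names as used in this thread; they may change):

    SELECT pid, relid::regclass, phase, indexes_total, indexes_completed
    FROM pg_stat_progress_vacuum;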

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-06 Thread Amit Langote
On Fri, Jan 6, 2023 at 3:53 AM Tom Lane  wrote:
> This does not seem good:
>
> regression=# create table pp (a int, b int) partition by range(a);
> CREATE TABLE
> regression=# create table cc (a int generated always as (b+1) stored, b int);
> CREATE TABLE
> regression=# alter table pp attach partition cc for values from ('1') to 
> ('10');
> ALTER TABLE
> regression=# insert into pp values(1,100);
> INSERT 0 1
> regression=# table pp;
>   a  |  b
> -----+-----
>  101 | 100
> (1 row)

This indeed is broken and seems like an oversight. :-(

> I'm not sure to what extent it's sensible for partitions to have
> GENERATED columns that don't match their parent; but even if that's
> okay for payload columns I doubt we want to allow partitioning
> columns to be GENERATED.

Actually, I'm inclined to disallow partitions to have *any* generated
columns that are not present in the parent table.  The main reason for
that is the inconvenience of checking that a partition's generated
columns don't override the partition key column of an ancestor that
is not its immediate parent; MergeAttributesIntoExisting() only has
access to (and a lock on) the immediate parent.

A patch doing it that way is attached.  Perhaps the newly added error
message should match CREATE TABLE .. PARTITION OF's, but I found the
latter not detailed enough, or maybe that's just me.




--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


disallow-partition-only-generated-cols.patch
Description: Binary data


Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-06 Thread Jelte Fennema
The easiest way to achieve the same thing (without patching libpq) is by setting
sslcert to something non-existent. While maybe not the most obvious way, I
would consider this the recommended approach.
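
For example (illustrative; any path that does not exist will do):

    psql "host=myserver dbname=db user=dummyuser sslmode=require sslcert=/path/does/not/exist"

With sslcert pointing at a non-existent file, libpq does not send a client
certificate, so the TLS handshake no longer fails on certificate verification
and the md5 password prompt is reached.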

(sorry for the resend Jim, my original message got blocked to the wider
mailing list)

On Fri, 6 Jan 2023 at 09:15, Jim Jones  wrote:

> Dear PostgreSQL Hackers,
>
> Some time ago we faced a small issue in libpq regarding connections
> configured in the pg_hba.conf as type *hostssl* and using *md5* as
> authentication method.
>
> One of our users placed the client certificates in ~/.postgresql/ (
> *postgresql.crt,**postgresql.key*), so that libpq sends them to the
> server without having to manually set *sslcert* and *sslkey* - which is
> quite convenient. However, there are other servers where the same user
> authenticates with a password (md5), but libpq still sends the client
> certificates for authentication by default. This causes the authentication
> to fail even before the user has the chance to enter a password, since the
> user has no certificate registered on the server.
>
> To make it clearer:
>
> Although the connection is configured as ...
>
>
> *host  all  dummyuser  192.168.178.42/32   md5 *
>
> ... and the client uses the following connection string ...
>
> *psql "host=myserver dbname=db user=**dummyuser" *
>
> ... the server tries to authenticate the user using the client
> certificates in *~/.postgresql/* and, as expected, the authentication
> fails:
>
> *psql: error: connection to server at "myserver" (xx.xx.xx.xx), port 5432
> failed: SSL error: tlsv1 alert unknown ca*
>
> Server log:
>
>
> *2022-12-09 10:50:59.376 UTC [13896] LOG:  could not accept SSL
> connection: certificate verify failed *
>
> Am I missing something?
>
> Obviously it would suffice to just remove or rename *~/.postgresql/*
> *postgresql.{crt,key}*, but the user needs them to authenticate in other
> servers. So we came up with the workaround to create a new sslmode
> (no-clientcert) to make libpq explicitly ignore the client certificates, so
> that we can avoid ssl authentication errors. These small changes can be
> seen in the patch file attached.
>
> *psql "host=myserver dbname=db user=**dummyuser sslrootcert=server.crt
> sslmode=no-clientcert"*
>
> Any better ideas to make libpq ignore *~/.postgresql/*
> *postgresql.{crt,key}*? Preferably without having to change the source
> code :) Thanks in advance!
>
> Best,
>
> Jim
>
>


Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-06 Thread Drouvot, Bertrand

Hi,

On 1/5/23 9:19 PM, Corey Huinker wrote:



I like code cleanups like this. It makes sense, it results in less code, and 
anyone doing a `git grep pg_stat_get_live_tuples` will quickly find the macro 
definition.

Unsurprisingly, it passes `make check-world`.

So I think it's good to go as-is.


Thanks for the review!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [RFC] Add jit deform_counter

2023-01-06 Thread Pavel Stehule
Hi


po 2. 1. 2023 v 17:55 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
napsal:

> > On Sun, Dec 25, 2022 at 06:55:02PM +0100, Pavel Stehule wrote:
> > there are some problems with stability of regress tests
> >
> > http://cfbot.cputube.org/dmitry-dolgov.html
>
> Looks like this small change predates moving to meson, the attached
> version should help.
>

The EXPLAIN part is working, but the pg_stat_statements part doesn't.

set jit_above_cost to 10;
set jit_optimize_above_cost to 10;
set jit_inline_above_cost to 10;

(2023-01-06 09:08:59) postgres=# explain analyze select count(length(prosrc) > 0) from pg_proc;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=154.10..154.11 rows=1 width=8) (actual time=132.320..132.321 rows=1 loops=1)
   ->  Seq Scan on pg_proc  (cost=0.00..129.63 rows=3263 width=16) (actual time=0.013..0.301 rows=3266 loops=1)
 Planning Time: 0.070 ms
 JIT:
   Functions: 3
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.597 ms, Deforming 0.407 ms, Inlining 8.943 ms, Optimization 79.403 ms, Emission 43.091 ms, Total 132.034 ms
 Execution Time: 132.986 ms
(8 rows)

I see the result of deforming in EXPLAIN ANALYZE, but the related values in
pg_stat_statements are 0.

At a minimum, the values are assigned in the wrong order:

+   if (api_version >= PGSS_V1_11)
+   {
+   values[i++] = Float8GetDatumFast(tmp.jit_deform_time);
+   values[i++] = Int64GetDatumFast(tmp.jit_deform_count);
+   }
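
Presumably the two assignments just need to be swapped so that the count is
emitted before the time, matching the column order documented below (a sketch,
not the actual fix):

+   if (api_version >= PGSS_V1_11)
+   {
+       values[i++] = Int64GetDatumFast(tmp.jit_deform_count);
+       values[i++] = Float8GetDatumFast(tmp.jit_deform_time);
+   }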

After reading the doc, I am confused about what this metric means:

+ 
+  
+   jit_deform_count bigint
+  
+  
+   Number of times tuples have been deformed
+  
+ 
+
+ 
+  
+   jit_deform_time double
precision
+  
+  
+   Total time spent by the statement on deforming tuples, in
milliseconds
+  
+ 

It is not clear from this that these times and counts relate just to the
compilation of the deforming code, not to the deforming itself.

Regards

Pavel


Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-06 Thread Drouvot, Bertrand

Hi,

On 1/6/23 12:21 AM, Andres Freund wrote:

Hi,

On 2023-01-05 15:19:54 -0500, Corey Huinker wrote:

It does get me wondering, however, if we reordered the three typedefs to
group like-typed registers together, we could make them an array with the
names becoming defined constant index values (or keeping them via a union),
then the typedefs effectively become:


I think that'd make it substantially enough harder to work with the
datastructures that I don't want to go there.



Yeah, I think that's a good idea from a "coding style" point of view but harder 
to work with.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Sampling-based timing for EXPLAIN ANALYZE

2023-01-06 Thread Jelte Fennema
Nice addition! And the code looks pretty straight forward.

The current patch triggers warnings:
https://cirrus-ci.com/task/6016013976731648 Looks like you need to add void
as the argument.

Do you have some performance comparison between TIMING ON and TIMING
SAMPLING?

In InstrStartSampling there's logic to increase/decrease the frequency of
an already existing timer. It's not clear to me when this can occur. I'd
expect sampling frequency to remain constant throughout an explain plan. If
it's indeed needed, I think a code comment would be useful to explain why
this edge case is necessary.

On Fri, 6 Jan 2023 at 09:41, Lukas Fittl  wrote:

> Hi,
>
> Since EXPLAIN ANALYZE with TIMING ON still carries noticeable overhead on
> modern hardware (despite time sources being faster), I'd like to propose a
> new setting EXPLAIN ANALYZE, called "TIMING SAMPLING", as compared to
> TIMING ON.
>
> This new timing mode uses a timer on a fixed recurring frequency (e.g. 100
> or 1000 Hz) to gather a sampled timestamp on a predefined schedule, instead
> of getting the time on-demand when InstrStartNode/InstrStopNode is called.
> To implement the timer, we can use the existing timeout infrastructure,
> which is backed by a wall clock timer (ITIMER_REAL).
>
> Conceptually this is inspired by how sampling profilers work (e.g.
> "perf"), but it ties into the existing per-plan node instrumentation done
> by EXPLAIN ANALYZE, and simply provides a lower accuracy version of the
> total time for each plan node.
>
> In EXPLAIN output this is marked as "sampled time", and scaled to the
> total wall clock time (to adjust for the sampling undercounting):
>
> =# EXPLAIN (ANALYZE, BUFFERS, TIMING SAMPLING, SAMPLEFREQ 100) SELECT ...;
>  QUERY PLAN
>
>
> -
>  HashAggregate  (cost=201747.90..201748.00 rows=10 width=12) (actual
> sampled time=5490.974 rows=9 loops=1)
>...
>->  Hash Join  (cost=0.23..199247.90 rows=49 width=4) (actual
> sampled time=3738.619 rows=900 loops=1)
>  ...
>  ->  Seq Scan on large  (cost=0.00..144247.79 rows=979
> width=4) (actual sampled time=1004.671 rows=1001 loops=1)
>...
>  ->  Hash  (cost=0.10..0.10 rows=10 width=4) (actual sampled
> time=0.000 rows=10 loops=1)
>...
>  Execution Time: 5491.475 ms
> ---
>
> In simple query tests like this on my local machine, this shows a
> consistent benefit over TIMING ON (and behaves close to ANALYZE with TIMING
> OFF), whilst providing a "good enough" accuracy to identify which part of
> the query was problematic.
>
> Attached is a prototype patch for early feedback on the concept, with
> tests and documentation to come in a follow-up. Since the January
> commitfest is still marked as open I'll register it there, but note that my
> assumption is this is *not* Postgres 16 material.
>
> As an open item, note that in the patch the requested sampling frequency
> is not yet passed to parallel workers (it always defaults to 1000 Hz when
> sampling is enabled). Also, note the timing frequency is limited to a
> maximum of 1000 Hz (1ms) due to current limitations of the timeout
> infrastructure.
>
> With thanks to Andres Freund for help on refining the idea, collaborating
> on early code and finding the approach to hook into the timeout API.
>
> Thanks,
> Lukas
>
> --
> Lukas Fittl
>


convey privileges -> confer privileges

2023-01-06 Thread Erik Rijkers

Can we change 'convey' to 'confer' in these recent doc changes?

Maybe 'convey a privilege' isn't exactly wrong but it leaves you 
wondering what exactly is meant.


Thanks,

Erik

--- doc/src/sgml/ref/createuser.sgml.orig	2023-01-05 21:37:35.803839575 +0100
+++ doc/src/sgml/ref/createuser.sgml	2023-01-05 21:38:14.700390046 +0100
@@ -47,7 +47,7 @@
CREATEROLE privilege.
Being a superuser implies the ability to bypass all access permission
checks within the database, so superuser access should not be granted
-   lightly. CREATEROLE also conveys
+   lightly. CREATEROLE also confers
very extensive privileges.
   
 
--- doc/src/sgml/user-manag.sgml.orig	2023-01-05 21:30:14.905548605 +0100
+++ doc/src/sgml/user-manag.sgml	2023-01-05 21:34:48.945471335 +0100
@@ -207,10 +207,10 @@
 SECURITY LABEL commands.


-However, CREATEROLE does not convey the ability to
-create SUPERUSER roles, nor does it convey any
+However, CREATEROLE does not confer the ability to
+create SUPERUSER roles, nor does it confer any
 power over SUPERUSER roles that already exist.
-Furthermore, CREATEROLE does not convey the power
+Furthermore, CREATEROLE does not confer the power
 to create REPLICATION users, nor the ability to
 grant or revoke the REPLICATION privilege, nor the
 ability to modify the role properties of such users.  However, it does


Re: Minimal logical decoding on standbys

2023-01-06 Thread Drouvot, Bertrand

Hi,

On 1/6/23 4:40 AM, Andres Freund wrote:

Hi,

Thomas, there's one point at the bottom wrt ConditionVariables that'd be
interesting for you to comment on.


On 2023-01-05 16:15:39 -0500, Robert Haas wrote:

On Tue, Jan 3, 2023 at 2:42 AM Drouvot, Bertrand
 wrote:

Please find attached v36, tiny rebase due to 1de58df4fe.


0001 looks committable to me now, though we probably shouldn't do that
unless we're pretty confident about shipping enough of the rest of
this to accomplish something useful.




Thanks for your precious help reaching this state!


Cool!

ISTM that the ordering of patches isn't quite right later on. ISTM that it
doesn't make sense to introduce working logic decoding without first fixing
WalSndWaitForWal() (i.e. patch 0006). What made you order the patches that
way?



The idea was to ease the review: 0001 to 0005 introduce the feature and 0006
deals with this race condition.

I thought it would be easier to review that way (given the complexity of "just"
adding the feature itself).


0001:

4. We can't rely on the standby's relcache entries for this purpose in
any way, because the WAL record that causes the problem might be
replayed before the standby even reaches consistency.


The startup process can't access catalog contents in the first place, so the
consistency issue is secondary.



Thanks for pointing out, I'll update the commit message.



ISTM that the commit message omits a fairly significant portion of the change:
The introduction of indisusercatalog / the reason for its introduction.


Agree, will do (or create a dedicated patch as you are suggesting below).



Why is indisusercatalog stored as "full" column, whereas we store the fact of
table being used as a catalog table in a reloption? I'm not adverse to moving
to a full column, but then I think we should do the same for tables.

Earlier version of the patches IIRC sourced the "catalogness" from the
relation. What lead you to changing that? I'm not saying it's wrong, just not
sure it's right either.


That's right, it started by retrieving this information from the relation.

Then Robert made a comment in [1] saying it's not safe to call
table_open() while holding a buffer lock.

So I worked on other options and submitted the current one.

While reviewing 0001, Robert also thought of it (see [2]) and finished with:

"
So while I do not really like the approach of storing the same
property in different ways for tables and for indexes, it's also not
really obvious to me how to do better.
"

That's also my thought.



It'd be good to introduce cross-checks that indisusercatalog is set
correctly. RelationGetIndexList() seems like a good candidate.



Good point, will look at it.


I'd probably split the introduction of indisusercatalog into a separate patch.


You mean, completely outside of this patch series or a sub-patch in this series?
If the former, I'm not sure it would make sense outside of the current context.



Why was HEAP_DEFAULT_USER_CATALOG_TABLE introduced in this patch?




To help in the case of a reset on the table (to ensure the default also gets
propagated to the indexes).


I wonder if we instead should compute a relation's "catalogness" in the
relcache. That'd would have the advantage of making
RelationIsUsedAsCatalogTable() cheaper and working for all kinds of
relations.



Any idea on where and how you'd do that? (that's one option I explored in vain 
before
submitting the current proposal).

It does look like that's also an option explored by Robert in [2]:

"
Yet a third way is to have the index fetch the flag from
the associated table, perhaps when the relcache entry is built. But I
see no existing precedent for that in RelationInitIndexAccessInfo,
which I think is where it would be if we had it -- and that makes me
suspect that there might be good reasons why this isn't actually safe.
"



VISIBILITYMAP_ON_CATALOG_ACCESSIBLE_IN_LOGICAL_DECODING is a very long
identifier. Given that the field in the xlog records is just named
isCatalogRel, any reason to not just name it correspondingly?



Agree, VISIBILITYMAP_IS_CATALOG_REL maybe?

I'll look at the other comments too and work on/reply later on.

[1]: 
https://www.postgresql.org/message-id/CA%2BTgmobgOLH-JpBoBSdu4i%2BsjRdgwmDEZGECkmowXqQgQL6WhQ%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/CA%2BTgmoY0df9X%2B5ENg8P0BGj0odhM45sdQ7kB4JMo4NKaoFy-Vg%40mail.gmail.com

Thanks for your help,
Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE

2023-01-06 Thread Dean Rasheed
On Fri, 6 Jan 2023 at 02:38, vignesh C  wrote:
>
> On Thu, 5 Jan 2023 at 18:22, Dean Rasheed  wrote:
> >
> > That leads to the attached, which barring objections, I'll push shortly.
>
> The changes look good to me.
>

Pushed.

Regards,
Dean




Fix comments in gistxlogDelete, xl_heap_freeze_page and xl_btree_delete

2023-01-06 Thread Drouvot, Bertrand

Hi hackers,

Please find attached a patch to $SUBJECT.

The wrong comments have been discovered by Robert in [1].

Submitting this here as a separate thread so it does not get lost in the 
logical decoding
on standby thread.

[1]: 
https://www.postgresql.org/message-id/CA%2BTgmoYTTsxP8y6uknZvCBNCRq%2B1FJ4zGbX8Px1TGW459fGsaQ%40mail.gmail.com

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 33f1c7e31b..95068feb87 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -52,9 +52,7 @@ typedef struct gistxlogDelete
TransactionId snapshotConflictHorizon;
uint16  ntodelete;  /* number of deleted offsets */
 
-   /*
-* In payload of blk 0 : todelete OffsetNumbers
-*/
+   /* TODELETE OFFSET NUMBER ARRAY FOLLOWS */
 } gistxlogDelete;
 
 #define SizeOfGistxlogDelete   (offsetof(gistxlogDelete, ntodelete) + 
sizeof(uint16))
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 5c77290eec..1117e95ede 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -345,8 +345,9 @@ typedef struct xl_heap_freeze_page
TransactionId snapshotConflictHorizon;
uint16  nplans;
 
-   /* FREEZE PLANS FOLLOW */
-   /* OFFSET NUMBER ARRAY FOLLOWS */
+   /*
+* In payload of blk 0 : FREEZE PLANS and OFFSET NUMBER ARRAY
+*/
 } xl_heap_freeze_page;
 
 #define SizeOfHeapFreezePage (offsetof(xl_heap_freeze_page, nplans) + 
sizeof(uint16))
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 3b2d959c69..1aa8e7eca5 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -236,9 +236,12 @@ typedef struct xl_btree_delete
uint16  ndeleted;
uint16  nupdated;
 
-   /* DELETED TARGET OFFSET NUMBERS FOLLOW */
-   /* UPDATED TARGET OFFSET NUMBERS FOLLOW */
-   /* UPDATED TUPLES METADATA (xl_btree_update) ARRAY FOLLOWS */
+   /*
+* In payload of blk 0 :
+* - DELETED TARGET OFFSET NUMBERS
+* - UPDATED TARGET OFFSET NUMBERS
+* - UPDATED TUPLES METADATA (xl_btree_update) ARRAY
+*/
 } xl_btree_delete;
 
 #define SizeOfBtreeDelete  (offsetof(xl_btree_delete, nupdated) + 
sizeof(uint16))


Issue attaching a table to a partitioned table with an auto-referenced foreign key

2023-01-06 Thread Guillaume Lelarge
Hello,

One of our customers has an issue with partitions and foreign keys. He
works on a v13, but the issue is also present on v15.

I attach a SQL script showing the issue, and the results on 13.7, 13.9, and
15.1. But I'll explain the script here, and its behaviour on 13.9.

There is one partitioned table, two partitions and a foreign key. The
foreign key references the same table:

create table t1 (
  c1 bigint not null,
  c1_old bigint null,
  c2 bigint not null,
  c2_old bigint null,
  primary key (c1, c2)
  )
  partition by list (c1);
create table t1_a   partition of t1 for values in (1);
create table t1_def partition of t1 default;
alter table t1 add foreign key (c1_old, c2_old) references t1 (c1, c2) on
delete restrict on update restrict;

I have a SQL function that shows me some information from pg_constraint
(the code of the function is in the attached SQL script). Here is the result of
this function after creating the table, its partitions, and its foreign key:

select * from show_constraints();
conname |   t|  tref  |   coparent
+++---
 t1_c1_old_c2_old_fkey  | t1 | t1 |
 t1_c1_old_c2_old_fkey  | t1_a   | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey1 | t1 | t1_a   | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
(5 rows)
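
(The attached script is not included here; a rough, hypothetical reconstruction
of show_constraints(), based on the output columns above, could be:)

create or replace function show_constraints()
returns table (conname name, t regclass, tref regclass, coparent name)
language sql as $$
  select c.conname,
         c.conrelid::regclass,
         c.confrelid::regclass,
         p.conname
    from pg_constraint c
    left join pg_constraint p on p.oid = c.conparentid
   where c.contype = 'f'
   order by c.conname, c.conrelid::regclass::text;
$$;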

The constraint works great :

insert into t1 values(1, NULL, 2, NULL);
insert into t1 values(2, 1,2, 2);
delete from t1 where c1 = 1;
psql:ticket15010_v3.sql:34: ERROR:  update or delete on table "t1_a"
violates foreign key constraint "t1_c1_old_c2_old_fkey1" on table "t1"
DETAIL:  Key (c1, c2)=(1, 2) is still referenced from table "t1".

This error is normal since the line I want to delete is referenced on the
other line.

If I try to detach the partition, it also gives me an error.

alter table t1 detach partition t1_a;
psql:ticket15010_v3.sql:36: ERROR:  removing partition "t1_a" violates
foreign key constraint "t1_c1_old_c2_old_fkey1"
DETAIL:  Key (c1_old, c2_old)=(1, 2) is still referenced from table "t1".

Sounds good to me too (well, I'd like it to be smarter and find that the
constraint is still good after the detach, but I can understand why it
won't allow it).

The pg_constraint didn't change of course:

select * from show_constraints();
conname |   t|  tref  |   coparent
+++---
 t1_c1_old_c2_old_fkey  | t1 | t1 |
 t1_c1_old_c2_old_fkey  | t1_a   | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey1 | t1 | t1_a   | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
(5 rows)

Now, I'll delete the whole table contents, and I'll detach the partition:

delete from t1;
alter table t1 detach partition t1_a;

It seems to be working, but the content of pg_constraint is weird:

select * from show_constraints();
conname |   t|  tref  |   coparent
+++---
 t1_c1_old_c2_old_fkey  | t1 | t1 |
 t1_c1_old_c2_old_fkey  | t1_a   | t1 |
 t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
(4 rows)

I understand why the ('t1_c1_old_c2_old_fkey1', 't1', 't1_a',
't1_c1_old_c2_old_fkey') tuple has gone but I don't understand why the
('t1_c1_old_c2_old_fkey', 't1_a', 't1', NULL) tuple is still there.

Anyway, I attach the partition:

alter table t1 attach partition t1_a for values in (1);

But pg_constraint has not changed:

select * from show_constraints();
conname |   t|  tref  |   coparent
+++---
 t1_c1_old_c2_old_fkey  | t1 | t1 |
 t1_c1_old_c2_old_fkey  | t1_a   | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
(4 rows)

I was expecting to see the fifth tuple coming back, but alas, no.

And as a result, the foreign key doesn't work anymore:

insert into t1 values(1, NULL, 2, NULL);
insert into t1 values(2, 1,2, 2);
delete from t1 where c1 = 1;

Well, let's truncate the partitioned table, and drop the partition:

truncate t1;
drop table t1_a;

The content of pg_constraint looks good to me:

select * from show_constraints();
conname |   t|  tref  |   coparent
+++---
 t1_c1_old_c2_old_fkey  | t1 | t1 |
 t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
 t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
(3 rows)

Let's create the partition to see if that works better:

create table t1_a  

Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-06 Thread Jelte Fennema
Huge +1 from me. On Azure we're already using public CAs to sign
certificates for our managed postgres offerings[1][2]. Right now, our
customers have to go to the hassle of downloading a specific root cert or
finding their OS default location. Neither of these allow us to give users
a simple copy-pastable connection string that uses secure settings. This
would change this and make it much easier for our customers to use secure
connections to their database.

I have two main questions:
1. From the rest of the thread it's not entirely clear to me why this patch
goes for the sslrootcert=system approach, instead of changing what
sslrootcert='' means when using verify-full. Like Tom Lane suggested, we
could change it to try ~/.postgresql/root.crt and if that doesn't exist
make it try the system store, instead of erroring out like it does now when
~/.postgresql/root.crt doesn't exist. This approach seems nicer to me, as
it doesn't require introducing another special keyword. It would also
remove the need for the changing of defaults depending on the value of
sslrootcert. NOTE: For sslmode=verify-ca we should still error out if
~/.postgresql/root.crt doesn't exist, because as mentioned upthread it is
trivial to get a cert from these CAs.

2. Should we allow the same approach with ssl_ca_file on the server side,
for client cert validation?


[1]:
https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-connect-tls-ssl
[2]:
https://learn.microsoft.com/en-us/azure/cosmos-db/postgresql/howto-ssl-connection-security

On Fri, 6 Jan 2023 at 10:42, Jacob Champion  wrote:

> On Thu, Dec 8, 2022 at 3:10 PM Jacob Champion 
> wrote:
> > For now, it's worked around in v4. This should finally get the cfbot
> > fully green.
>
> Cirrus's switch to M1 Macs changed the Homebrew installation path, so
> v5 adjusts the workaround accordingly.
>
> --Jacob
>


Re: Getting an error if we provide --enable-tap-tests switch on SLES 12

2023-01-06 Thread tushar

On 1/5/23 2:40 AM, Andres Freund wrote:


Have you actually tested running the tests with the patch applied?

Yes, but I am getting errors like:
t/006_edb_current_audit_logfile.pl .. Can't locate IPC/Run.pm in @INC 
(you may need to install the IPC::Run module) (@INC contains: 
/home/runner/edbas/src/bin/pg_ctl/../../../src/test/perl 
/home/runner/edbas/src/bin/pg_ctl



Do we have any better option to work without this workaround?

You could install the module via cpan :/.



Yes, will try to install.

Thanks Andres.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [PATCH] pgbench: add multiconnect option

2023-01-06 Thread Jelte Fennema
This patch seems to have quite some use case overlap with my patch which
adds load balancing to libpq itself:
https://www.postgresql.org/message-id/flat/pr3pr83mb04768e2ff04818eeb2179949f7...@pr3pr83mb0476.eurprd83.prod.outlook.com

My patch is only able to add "random" load balancing though, not
"round-robin". So this patch still definitely seems useful, even when mine
gets merged.

I'm not sure that the support for the "working" connection is necessary
from a feature perspective though (usability/discoverability is another
question). It's already possible to achieve the same behaviour by simply
providing multiple host names in the connection string. You can even tell
libpq to connect to a primary or secondary by using the
target_session_attrs option.
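
For example (illustrative host names):

    psql "host=node1,node2,node3 port=5432 dbname=postgres target_session_attrs=read-write"

libpq tries the listed hosts from left to right and uses the first one that
accepts the connection (and, with target_session_attrs=read-write, the first
one that is writable).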

On Fri, 6 Jan 2023 at 11:33, Fabien COELHO  wrote:

>
> Hello Ian,
>
> > cfbot reports the patch no longer applies.  As CommitFest 2022-11 is
> > currently underway, this would be an excellent time to update the patch.
>
> Attached a v5 which is just a rebase.
>
> --
> Fabien.


Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE

2023-01-06 Thread Dean Rasheed
On Thu, 5 Jan 2023 at 12:52, Dean Rasheed  wrote:
>
> While playing around with this, I noticed that the "... SET SCHEMA"
> case offers "FROM CURRENT" and "TO" as completions, which is
> incorrect. It should really offer to complete with a list of schemas.
> However, since that's a pre-existing bug in a different region of the
> code, I think it's best addressed in a separate patch, which probably
> ought to be back-patched.
>

OK, I've pushed and back-patched a fix for that issue too.

Regards,
Dean




Re: Cygwin cleanup

2023-01-06 Thread Thomas Munro
On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby  wrote:
> +data_sync_retry = on

Sharing with the list some clues that Justin and I figured out about
what that part is doing.  Without it, you get failures like:

  PANIC:  could not open file "pg_logical/snapshots/0-14FE6B0.snap":
No such file or directory

That's been seen before:

  https://www.postgresql.org/message-id/flat/17827.1549866683%40sss.pgh.pa.us

That thread concluded that the operating system must have a non-atomic
rename(), ie a kernel bug.  I don't know why Cygwin would display that
behaviour and our native Windows build not; maybe timing, or maybe our
own open() and rename() wrappers for Windows do something important
differently than Cygwin's open() and rename().

On reflection, that seems a bit too flimsy to have in-tree without
more investigation, which I won't have time for myself, so I'm going
to withdraw this entry.




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-06 Thread Michail Nikolaev
Hello!

Thanks for checking the issue!

> FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of
> heap_lock_tuple() is the first thing that fails on my assert-enabled
> build.

Yes, the same for me:

 TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c",
Line: 4292, PID: 33416


> Reproduced this on HEAD locally (no docker), without any difficulty.

It is a little bit harder without Docker in my case; I need to adjust the
number of connections and threads:

 pgbench -c 90 -j 8 -n -f reproduce.bench 'port= 5432
host=localhost user=postgres dbname=postgres password=postgres' -T
2000 -P 1
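
(The reproduce.bench script itself is not shown in this message; per the
thread subject it is a FOR UPDATE + UPDATE workload, so a purely hypothetical
script of that shape might look like:)

    \set id random(1, 1000)
    BEGIN;
    SELECT * FROM test WHERE id = :id FOR UPDATE;
    UPDATE test SET value = value + 1 WHERE id = :id;
    COMMIT;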


Best regards,
Michail.




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-01-06 Thread vignesh C
On Fri, 9 Dec 2022 at 16:01, Christoph Heiss  wrote:
>
> Thanks for the review!
>
> On 12/8/22 12:19, Melih Mutlu wrote:
> > Hi Christoph,
> >
> > I just took a quick look at your patch.
> > Some suggestions:
> >
> > +   else if (Matches("ALTER", "VIEW", MatchAny, "SET", "("))
> > +   COMPLETE_WITH_LIST(view_optional_parameters);
> > +   /* ALTER VIEW xxx RESET ( yyy , ... ) */
> > +   else if (Matches("ALTER", "VIEW", MatchAny, "RESET", "("))
> > +   COMPLETE_WITH_LIST(view_optional_parameters);
> >
> >
> > What about combining these two cases into one like Matches("ALTER",
> > "VIEW", MatchAny, "SET|RESET", "(") ?
> Good thinking, incorporated that into v2.
> While at it, I also added completion for the values of the SET
> parameters, since that is useful as well.
>
> >
> >  /* ALTER VIEW  */
> >  else if (Matches("ALTER", "VIEW", MatchAny))
> >  COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
> >"SET SCHEMA");
> >
> >
> > Also seems like SET and RESET don't get auto-completed for "ALTER VIEW
> > ".
> > I think it would be nice to include those missing words.
> That is already contained in the patch:
>
> @@ -2139,7 +2146,7 @@ psql_completion(const char *text, int start, int end)
>   /* ALTER VIEW  */
>   else if (Matches("ALTER", "VIEW", MatchAny))
>   COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
> -  "SET SCHEMA");
> +  "SET SCHEMA", "SET (", "RESET (");

One suggestion:
Tab completion for "alter view v1 set (check_option =" is handled to
complete with CASCADED and LOCAL but the same is not handled for
create view: "create view viewname with ( check_option ="
+   else if (Matches("ALTER", "VIEW", MatchAny, "SET", "(",
"check_option", "="))
+   COMPLETE_WITH("local", "cascaded");

I felt we should keep the handling consistent for both create view and
alter view.
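
Something along these lines would make them consistent (a sketch only; whether
it is desirable for CREATE VIEW is a separate question):

+   else if (Matches("CREATE", "VIEW", MatchAny, "WITH", "(", "check_option", "="))
+       COMPLETE_WITH("local", "cascaded");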

Regards,
Vignesh




Re: Fix showing XID of a spectoken lock in an incorrect field of pg_locks view.

2023-01-06 Thread Amit Kapila
On Thu, Jan 5, 2023 at 11:46 AM Masahiko Sawada  wrote:
>
> Agreed. Attached the updated patch.
>

Thanks, the patch looks good to me. I think it would probably be good
to backpatch this, but it has the potential to break some monitoring
scripts which were using the wrong columns for the transaction id and spec
token number. As this is not a very critical issue and has not been reported
until now, it may be better to skip backpatching it. What do you
think?

-- 
With Regards,
Amit Kapila.




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-01-06 Thread Dean Rasheed
On Fri, 6 Jan 2023 at 11:52, vignesh C  wrote:
>
> One suggestion:
> Tab completion for "alter view v1 set (check_option =" is handled to
> complete with CASCADED and LOCAL but the same is not handled for
> create view: "create view viewname with ( check_option ="
> +   else if (Matches("ALTER", "VIEW", MatchAny, "SET", "(",
> "check_option", "="))
> +   COMPLETE_WITH("local", "cascaded");
>
> I felt we should keep the handling consistent for both create view and
> alter view.
>

Hmm, I don't think we should be offering "check_option" as a tab
completion for CREATE VIEW at all, since that would encourage users to
use non-SQL-standard syntax, rather than CREATE VIEW ... WITH
[CASCADED|LOCAL] CHECK OPTION.
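
For reference, the two spellings side by side (illustrative):

    -- SQL-standard syntax:
    CREATE VIEW v AS SELECT * FROM t WHERE a > 0 WITH CASCADED CHECK OPTION;

    -- reloption syntax, which the completion above would encourage:
    CREATE VIEW v WITH (check_option = cascaded) AS SELECT * FROM t WHERE a > 0;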

Regards,
Dean




Re: Fix showing XID of a spectoken lock in an incorrect field of pg_locks view.

2023-01-06 Thread Masahiko Sawada
On Fri, Jan 6, 2023 at 9:00 PM Amit Kapila  wrote:
>
> On Thu, Jan 5, 2023 at 11:46 AM Masahiko Sawada  wrote:
> >
> > Agreed. Attached the updated patch.
> >
>
> Thanks, the patch looks good to me. I think it would be probably good
> to backpatch this but it has the potential to break some monitoring
> scripts which were using the wrong columns for transaction id and spec
> token number.

Right.

> As this is not a very critical issue and is not reported
> till now, so it may be better to leave backpatching it. What do you
> think?

Considering the compatibility, I'm inclined to agree not to backpatch
it. If someone complains about the current behavior in back branches
in the future, we can backpatch it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-06 Thread Pavel Borisov
Hi, Alexander!

On Thu, 5 Jan 2023 at 15:11, Alexander Korotkov  wrote:
>
> On Wed, Jan 4, 2023 at 5:05 PM Alexander Korotkov  
> wrote:
> > On Wed, Jan 4, 2023 at 3:43 PM Pavel Borisov  wrote:
> > > One more update of a patchset to avoid compiler warnings.
> >
> > Thank you for your help.  I'm going to provide the revised version of
> > patch with comments and commit message in the next couple of days.
>
> The revised patch is attached.  It contains describing commit message,
> comments and some minor code improvements.

I've looked through the patch once again. It seems to be in a nice state to
be committed.
I also noticed that at the tableam level and in the NodeModifyTable function
calls we have a one-to-one correspondence between *lockedSlot and the bool
lockUpdated, but no checks on this in case something changes in the
code in the future. I'd propose combining these variables to remain
free from these checks. See v5 of the patch. The tests pass
successfully.
Besides, the new version has only some minor changes in the comments
and the commit message.

Kind regards,
Pavel Borisov,
Supabase.


v5-0001-Allow-locking-updated-tuples-in-tuple_update-and-.patch
Description: Binary data


Re: Add BufFileRead variants with short read and EOF detection

2023-01-06 Thread Peter Eisentraut

On 02.01.23 13:13, Amit Kapila wrote:

On Wed, Dec 28, 2022 at 4:17 PM Peter Eisentraut
 wrote:


Most callers of BufFileRead() want to check whether they read the full
specified length.  Checking this at every call site is very tedious.
This patch provides additional variants BufFileReadExact() and
BufFileReadMaybeEOF() that include the length checks.

I considered changing BufFileRead() itself, but this function is also
used in extensions, and so changing the behavior like this would create
a lot of problems there.  The new names are analogous to the existing
LogicalTapeReadExact().



+1 for the new APIs. I have noticed that some of the existing places
use %m and the file path in messages which are not used in the new
common function.


The existing uses of %m are wrong.  This was already fixed once in 
7897e3bb902c557412645b82120f4d95f7474906, but the affected areas of code 
were apparently developed at around the same time and didn't get the 
fix.  So I have attached a separate patch to fix this first, which could 
be backpatched.


The original patch is then rebased on top of that.  I have adjusted the 
error message to include the file set name if available.


What this doesn't keep is the purpose of the temp file in some cases, 
like "hash-join temporary file".  We could maybe make this an additional 
argument or an error context, but it seems cumbersome in any case. 
Thoughts?
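
For illustration, the intended call-site simplification looks roughly like this
(a sketch based on the description above; the attached patches are
authoritative):

    size_t      nbytes;
    int         len;

    /* before: every caller checks the byte count itself */
    nbytes = BufFileRead(file, &len, sizeof(len));
    if (nbytes != sizeof(len))
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from temporary file: read only %zu of %zu bytes",
                        nbytes, sizeof(len))));

    /* after: the helper performs the short-read check and error report itself */
    BufFileReadExact(file, &len, sizeof(len));
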
From fca7c4724e2dd4cd3edd1e59275e6b6ca2448bb3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 6 Jan 2023 11:57:35 +0100
Subject: [PATCH v2 1/2] Fix some BufFileRead() error reporting

Remove "%m" from error messages where errno would be bogus.  Add short
read byte counts where appropriate.

This is equivalent to what was done in
7897e3bb902c557412645b82120f4d95f7474906, but some code was apparently
developed concurrently to that and not updated accordingly.
---
 src/backend/backup/backup_manifest.c |  3 ++-
 src/backend/replication/logical/worker.c | 33 ++--
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/src/backend/backup/backup_manifest.c 
b/src/backend/backup/backup_manifest.c
index 21ca2941dc..d325ef047a 100644
--- a/src/backend/backup/backup_manifest.c
+++ b/src/backend/backup/backup_manifest.c
@@ -371,7 +371,8 @@ SendBackupManifest(backup_manifest_info *manifest, bbsink 
*sink)
if (rc != bytes_to_read)
ereport(ERROR,
(errcode_for_file_access(),
-errmsg("could not read from temporary 
file: %m")));
+errmsg("could not read from temporary 
file: read only %zu of %zu bytes",
+   rc, bytes_to_read)));
bbsink_manifest_contents(sink, bytes_to_read);
manifest_bytes_done += bytes_to_read;
}
diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index f8e8cf71eb..64ce93e725 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1426,7 +1426,7 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
nchanges = 0;
while (true)
{
-   int nbytes;
+   size_t  nbytes;
int len;
 
CHECK_FOR_INTERRUPTS();
@@ -1442,8 +1442,8 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn)
if (nbytes != sizeof(len))
ereport(ERROR,
(errcode_for_file_access(),
-errmsg("could not read from streaming 
transaction's changes file \"%s\": %m",
-   path)));
+errmsg("could not read from streaming 
transaction's changes file \"%s\": read only %zu of %zu bytes",
+   path, nbytes, 
sizeof(len;
 
if (len <= 0)
elog(ERROR, "incorrect length %d in streaming 
transaction's changes file \"%s\"",
@@ -1453,11 +1453,12 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr 
lsn)
buffer = repalloc(buffer, len);
 
/* and finally read the data into the buffer */
-   if (BufFileRead(fd, buffer, len) != len)
+   nbytes = BufFileRead(fd, buffer, len);
+   if (nbytes != len)
ereport(ERROR,
(errcode_for_file_access(),
-errmsg("could not read from streaming 
transaction's changes file \"%s\": %m",
-   path)));
+errmsg("could not read from streaming 
transaction's changes file \"%s\": rea

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-06 Thread Ankit Kumar Pandey

Sorry if multiple mails have been sent for this.



On 05/01/23 12:53, David Rowley wrote:


We *can* reuse Sorts where a more strict or equivalent sort order is
available.  The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY.  One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.


I attempted this in the attached patch.


#1. No op case

--- In patched version ---


explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd order by a;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(7 rows)

explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: b
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  Seq Scan on abcd
(7 rows)

--- In master ---



explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd order by a;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(7 rows)
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a,b,c) FROM abcd;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: b
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b, c
 ->  Seq Scan on abcd
(7 rows)

No change between patched version and master.


2. In case where optimization can happen

--- In patched version ---


explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a) FROM abcd order by a,b;

    QUERY PLAN
--
 WindowAgg
   ->  Sort
 Sort Key: a, b
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(7 rows)

explain (costs off)  SELECT rank() OVER (ORDER BY a), count(*) OVER 
(ORDER BY b), count(*) OVER (PARTITION BY a ORDER BY b) FROM abcd order 
by a,b,c,d;

   QUERY PLAN

 WindowAgg
   ->  WindowAgg
 ->  Sort
   Sort Key: a, b, c, d
   ->  WindowAgg
 ->  Sort
   Sort Key: b
   ->  Seq Scan on abcd
(8 rows)

--- In master ---


explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER 
(ORDER BY a) FROM abcd order by a,b;

   QUERY PLAN

 Incremental Sort
   Sort Key: a, b
   Presorted Key: a
   ->  WindowAgg
 ->  Sort
   Sort Key: a
   ->  WindowAgg
 ->  Sort
   Sort Key: b
   ->  Seq Scan on abcd
(10 rows)

explain (costs off)  SELECT rank() OVER (ORDER BY a), count(*) OVER 
(ORDER BY b), count(*) OVER (PARTITION BY a ORDER BY b) FROM abcd order 
by a,b,c,d;

  QUERY PLAN
--
 Incremental Sort
   Sort Key: a, b, c, d
   Presorted Key: a, b
   ->  WindowAgg
 ->  WindowAgg
   ->  Sort
 Sort Key: a, b
 ->  WindowAgg
   ->  Sort
 Sort Key: b
 ->  Seq Scan on abcd
(11 rows)

The patched version removes a few sorts.

All regression tests passed, so it is not breaking anything existing.

We don't have any tests verifying the sort plan for window functions
(such tests would have failed, had they existed).


Please let me know if you have any feedback (I have added some of my own
concerns in the comments).


Thanks


--
Regards,
Ankit Kumar Pandey
From 8fb4b6188eda111bee2e47d3e85289064b428702 Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey 
Date: Fri, 6 Jan 2023 14:30:38 +0530
Subject: [PATCH] Teach pla

Re: Resolve UNKNOWN type to relevant type instead of text type while bulk update using values

2023-01-06 Thread Ashutosh Bapat
On Thu, Jan 5, 2023 at 12:42 PM David G. Johnston
 wrote:

>
> The VALUES subquery has to produce its tabular output without being aware of 
> how the outer query is going to use it.  The second column of your values 
> subquery lacks type information so the system chooses a default - text.
>
> Dealing with types is one of the harder medium-hard problems in computer 
> science…encountering this problem in real life has never seen me motivated 
> enough to gripe about it rather than just add an explicit cast and move on.  
> And I’ve been around long enough to know that the project is, and long has 
> been, aware of the dull pain points in this area.
>

Having been here for quite a few years now, I agree. It's tempting to try
to fix a problem in this area since the fix seems simple, but it is
hard to realize the wider impact that such a simple fix has. Still, let me
try to propose something :)

We cast a quoted value to the UNKNOWN type, but NULL is a special value
which can be cast to any SQL data type. Probably we could add an
ANYNULLTYPE or some such generic null type which can be cast to any
data type. A null value without any type would then be labeled as
ANYNULLTYPE if specific type information is not available, and this
problem wouldn't arise. Of course, that's a lot of code to fix a
seemingly rare problem, so it may still not be worth it.
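
For illustration, with a hypothetical table, the behaviour being discussed is
roughly:

    create table tgt (a int primary key, b int);

    update tgt set b = v.b
      from (values (1, NULL), (2, NULL)) as v(a, b)
     where tgt.a = v.a;
    -- ERROR:  column "b" is of type integer but expression is of type text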

-- 
Best Wishes,
Ashutosh Bapat




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-06 Thread Andrew Dunstan


On 2023-01-06 Fr 05:17, Jelte Fennema wrote:
> Huge +1 from me. On Azure we're already using public CAs to sign
> certificates for our managed postgres offerings[1][2]. Right now, our
> customers have to go to the hassle of downloading a specific root cert
> or finding their OS default location. Neither of these allow us to
> give users a simple copy-pastable connection string that uses secure
> settings. This would change this and make it much easier for our
> customers to use secure connections to their database.
>
> I have two main questions:
> 1. From the rest of the thread it's not entirely clear to me why this
> patch goes for the sslrootcert=system approach, instead of changing
> what sslrootcert='' means when using verify-full. Like Tom Lane
> suggested, we could change it to try ~/.postgresql/root.crt and if
> that doesn't exist make it try the system store, instead of erroring
> out like it does now when ~/.postgresql/root.crt doesn't exist. This
> approach seems nicer to me, as it doesn't require introducing another
> special keyword. It would also remove the need for the changing of
> defaults depending on the value of sslrootcert. NOTE: For
> sslmode=verify-ca we should still error out if ~/.postgresql/root.crt
> doesn't exist, because as mentioned upthread it is trivial to get a
> cert from these CAs.


One reason might be that it doesn't give you any way not to fall back on
the system store. Maybe that's important, maybe not. I don't know that
there would be much extra ease in doing it the other way; you're going
to have to specify some SSL options anyway.


>
> 2. Should we allow the same approach with ssl_ca_file on the server
> side, for client cert validation?


+1 for doing this, although I think client certs are less likely to have
been issued by a public CA.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-06 Thread Jelte Fennema
> One reason might be that it doesn't give you any way not to fall back on
> the system store.

To not fall back to the system store you could still provide the exact path
to the CA cert file.

> +1 for doing this, although I think client certs are less likely to have
> been issued by a public CA.

I totally agree that it's less likely. And I definitely don't want to block this
patch on this feature. Especially since configuring your database server
is much easier than configuring ALL the clients that ever connect to your
database.

However, I would like to give a use case where using public-CA-signed
client certificate authentication can make sense:
Authenticating different nodes in a citus cluster to each other. If such
nodes already have a public CA signed certificate for their hostname
to attest their identity for regular clients, then you can set up client
side auth on each of the nodes so that each node in the
cluster can connect as any user to each of the other nodes in
the cluster by authenticating with that same certificate.




Re: Cygwin cleanup

2023-01-06 Thread Andrew Dunstan


On 2023-01-05 Th 16:39, Thomas Munro wrote:
> On Fri, Jan 6, 2023 at 1:22 AM Thomas Munro  wrote:
>> https://cirrus-ci.com/task/5142810819559424 [still running at time of 
>> writing]
>>
>> Gotta run, but I'll check again in the morning to see if that does better...
> Yes!  Two successful runs with all TAP tests so far.  So it looks like
> we can probably stop lorikeet's spurious failures, by happy
> coincidence due to other work, and we could seriously consider
> committing this optional CI test for it, much like we have the
> optional MSYS build.  Any interest in producing a tidied up version of
> the patch, Justin?  Or I can, but I'll go and work on other things
> first.
>
> I pushed a change to switch the semaphore implementation.  I haven't
> personally seen that failure mode on lorikeet, but I would guess
> that's because (1) it's only running a tiny subset of the tests, (2)
> it crashes for the other reason with higher likelihood, and/or (3)
> it's not using much concurrency yet because the build farm doesn't use
> meson yet.


OK, should I now try re-enabling TAP tests on lorikeet?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-06 Thread Tomas Vondra
I've pushed the comment/assert cleanup.

Here's a cleaned up version of the relkind check. This is almost
identical to the patch from yesterday (plus the renames and updates of
comments for modified functions).

The one difference is that I realized the relkind check does not
actually say we can't do sampling - it just means we can't use
TABLESAMPLE to do it. We could still use "random()" ...

Furthermore, I don't think we should silently disable sampling when the
user explicitly requests TABLESAMPLE by specifying bernoulli/system for
the table - IMHO it's less surprising to just fail in that case.

So we now do this:

    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        method = ANALYZE_SAMPLE_RANDOM;

Yes, we may still disable sampling when reltuples is -1, 0 or something
like that. But that's a condition that is expected for new relations and
likely to fix itself, which is not the case for relkind.

Of course, all relkinds that don't support TABLESAMPLE currently have
reltuples value that will disable sampling anyway (e.g. views have -1).
So we won't actually fallback to random() anyway, because we can't
calculate the sample fraction.

That's a bit annoying for foreign tables pointing at a view, which is a
more likely use case than a table pointing at a sequence. And likely more
of an issue, because views may return many rows (while sequences have
only a single row).

But I realized we could actually still do "random()" sampling:

SELECT * FROM t ORDER BY random() LIMIT $X;

where $X is the target number of sample rows for the table. That
would result in plans like this (given sufficient work_mem values):

  QUERY PLAN
  ---
   Limit (actual rows=3 loops=1)
 ->  Sort (actual rows=3 loops=1)
   Sort Key: (random())
   Sort Method: top-N heapsort  Memory: 3916kB
   ->  Append (actual rows=100 loops=1)

Even with lower work_mem values this would likely be a win, due to
saving on network transfers.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 76270100186507c1403f7758131966d146db5b39 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 5 Jan 2023 21:13:20 +0100
Subject: [PATCH] Check relkind before using TABLESAMPLE in postgres_fdw

Before using TABLESAMPLE to acquire a sample from a foreign relation,
check the remote relkind. If it does not support TABLESAMPLE, disable
sampling and fallback to reading all rows.

This may seem unnecessary, because for relkinds that don't support
TABLESAMPLE we'll disable the sampling anyway, due to reltuples value
(e.g. views use -1, sequences 1). But that's rather accidental, and
might get broken after adding TABLESAMPLE support for other relkinds.
Better to make an explicit check.

Instead of disabling remote sampling for relkinds that do not support
TABLESAMPLE, consider fallback to using random() if the requested
sampling method was AUTO.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/951485.1672461744%40sss.pgh.pa.us
---
 contrib/postgres_fdw/deparse.c  |  7 ++--
 contrib/postgres_fdw/postgres_fdw.c | 60 -
 contrib/postgres_fdw/postgres_fdw.h |  2 +-
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 4461fb06f02..21237d18ef8 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2368,14 +2368,15 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 }
 
 /*
- * Construct SELECT statement to acquire the number of rows of a relation.
+ * Construct SELECT statement to acquire the number of rows and the relkind of
+ * a relation.
  *
  * Note: we just return the remote server's reltuples value, which might
  * be off a good deal, but it doesn't seem worth working harder.  See
  * comments in postgresAcquireSampleRowsFunc.
  */
 void
-deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+deparseAnalyzeInfoSql(StringInfo buf, Relation rel)
 {
 	StringInfoData relname;
 
@@ -2383,7 +2384,7 @@ deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
 	initStringInfo(&relname);
 	deparseRelation(&relname, rel);
 
-	appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+	appendStringInfoString(buf, "SELECT reltuples, relkind FROM pg_catalog.pg_class WHERE oid = ");
 	deparseStringLiteral(buf, relname.data);
 	appendStringInfoString(buf, "::pg_catalog.regclass");
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f8461bf18dc..44573c9fed2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4974,11 +4974,14 @@ postgresAnalyzeForeignTable(Relation relation,
 }
 
 /*
- * postgresCountTuplesForForeignTable
+ * postgresGetAnalyzeInfoForForeignTable
  *		Count tuples in foreign table (just get pg_cla

Re: Optimizing Node Files Support

2023-01-06 Thread Ranier Vilela
Em qua., 4 de jan. de 2023 às 19:39, Tom Lane  escreveu:

> vignesh C  writes:
> > The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:
>
> Yeah.  The way that I'd been thinking of optimizing the copy functions
> was more or less as attached: continue to write all the COPY_SCALAR_FIELD
> macro calls, but just make them expand to no-ops after an initial memcpy
> of the whole node.  This preserves flexibility to do something else while
> still getting the benefit of substituting memcpy for retail field copies.
> Having said that, I'm not very sure it's worth doing, because I do not
> see any major reduction in code size:
>
I think this option is worse.
By disabling these macros, you lose their use in other areas.
Putting more intelligence into gen_node_support.pl, so that it chooses between
memcpy and the macros, is better, IMO.
In cases with only one or two macros, would it be faster than memset?


> HEAD:
>    text    data     bss     dec     hex filename
>   53601       0       0   53601    d161 copyfuncs.o
> w/patch:
>    text    data     bss     dec     hex filename
>   49850       0       0   49850    c2ba copyfuncs.o
>
I haven't tested it on Linux, but on Windows, there is a significant
reduction.

head:
8,114,688 postgres.exe
121.281 copyfuncs.funcs.c

patched:
8,108,544 postgres.exe
95.321 copyfuncs.funcs.c


> I've not looked at the generated assembly code, but I suspect that
> my compiler is converting the memcpy's into inlined code that's
> hardly any smaller than field-by-field assignment.  Also, it's
> rather debatable that it'd be faster, especially for node types
> that are mostly pointer fields, where the memcpy is going to be
> largely wasted effort.
>
IMO, with many field assignments, memcpy would be faster.


>
> I tend to agree with John that the rest of the changes proposed
> in the v1 patch are not useful improvements, especially with
> no evidence offered that they'd make the code smaller or faster.
>
I tried using palloc_object, as you proposed, but several tests failed.
So I suspect that some fields are not filled in correctly.
It would be an advantage to avoid memset in the allocation (palloc0), but I
chose to keep it because of the errors.

This way, if we use palloc0, there is no need to set NULL on
COPY_STRING_FIELD.

Regarding COPY_POINTER_FIELD, it is wasteful to cast size_t to size_t.

v3 attached.

regards,
Ranier Vilela


> regards, tom lane
>
>


v3-optimize_gen_node_support.patch
Description: Binary data


Re: Resolve UNKNOWN type to relevant type instead of text type while bulk update using values

2023-01-06 Thread Tom Lane
Ashutosh Bapat  writes:
> we cast a quoted value to UNKNOWN type, but this is a special value
> null which can be casted to any SQL data type. Probably we could add a
> ANYNULLTYPE or some such generic null type which can be casted to any
> data type. Then a null value without any type is labeled as
> ANYNULLTYPE if specific type information is not available.

And ... how does that differ from the existing behavior of UNKNOWN?

regards, tom lane




Re: Optimizing Node Files Support

2023-01-06 Thread Tom Lane
Ranier Vilela  writes:
> On Wed, Jan 4, 2023 at 19:39, Tom Lane  wrote:
>> Yeah.  The way that I'd been thinking of optimizing the copy functions
>> was more or less as attached: continue to write all the COPY_SCALAR_FIELD
>> macro calls, but just make them expand to no-ops after an initial memcpy
>> of the whole node.

> I think this option is worse.
> By disabling these macros, you lose their use in other areas.

What other areas?  They're local to copyfuncs.c.

The bigger picture here is that as long as we have any manually-maintained
node copy functions, it seems best to adhere to the existing convention
of explicitly listing each and every field in them.  I'm far more
concerned about errors-of-omission than I am about incremental performance
gains (which still haven't been demonstrated to exist, anyway).

> v3 attached.

I think you're wasting people's time if you don't provide some
performance measurements showing that this is worth doing from
a speed standpoint.

regards, tom lane




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-06 Thread Andrew Dunstan


On 2023-01-06 Fr 09:28, Jelte Fennema wrote:
>> One reason might be that it doesn't give you any way not to fall back on
>> the system store.
> To not fall back to the system store you could still provide the exact path
> to the CA cert file.


I guess. I don't have strong feelings one way or the other about this.


>
>> +1 for doing this, although I think client certs are less likely to have
>> been issued by a public CA.
> I totally agree that it's less likely. And I definitely don't want to block 
> this
> patch on this feature. Especially since configuring your database server
> is much easier than configuring ALL the clients that ever connect to your
> database.
>
> However, I would like to give a use case where using public-CA-signed
> client certificates for authentication can make sense:
> Authenticating different nodes in a citus cluster to each other. If such
> nodes already have a public CA signed certificate for their hostname
> to attest their identity for regular clients, then you can set up client
> side auth on each of the nodes so that each node in the
> cluster can connect as any user to each of the other nodes in
> the cluster by authenticating with that same certificate.


Yeah, I have done that sort of thing with pgbouncer auth using an ident
map. (There's probably a good case for making ident maps for useful by
adopting the +role mechanism from pg_hba.conf processing, but that's a
separate issue).


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Expand character set for ltree labels

2023-01-06 Thread Andrew Dunstan


On 2022-10-05 We 18:05, Garen Torikian wrote:
> After digging into it, you are completely correct. I had to do a bit
> more reading to understand the relationships between UTF-8 and wchar,
> but ultimately the existing locale support works for my use case.
>
> Therefore I have updated the patch with three much smaller changes:
>
> * Support for `-` in addition to `_`
> * Expanding the limit to 512 chars (from the existing 256); again it's
> not uncommon for non-English strings to be much longer
> * Fixed the documentation to expand on what the ltree label's
> relationship to the DB locale is
>

Regardless of the punycode issue, allowing hyphens in ltree labels seems
quite reasonable. I haven't reviewed the patch yet, but if it's OK I
intend to commit it.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Logical replication - schema change not invalidating the relation cache

2023-01-06 Thread Tom Lane
vignesh C  writes:
> On Fri, 6 Jan 2023 at 04:32, Tom Lane  wrote:
>> Hm.  I spent some time cleaning up this patch, and found that there's
>> still a problem.  ISTM that the row with value "3" ought to end up
>> in the subscriber's sch2.t1 table, but it does not: the attached
>> test script fails with
>> ...
>> What's up with that?

> When the subscription is created, the subscriber will create a
> subscription relation map of the corresponding relations from the
> publication. The subscription relation map will only have sch1.t1
> entry. As sch2.t1 was not present in the publisher when the
> subscription was created, subscription will not have this entry in the
> subscription relation map. So the insert operations performed on the
> new table sch2.t1 will not be applied by the subscriber. We will have
> to refresh the publication using 'ALTER SUBSCRIPTION ... REFRESH
> PUBLICATION' to fetch missing table information from publisher. This
> will start replication of tables that were added to the subscribed-to
> publications since CREATE SUBSCRIPTION or the last invocation of
> REFRESH PUBLICATION.

But ... but ... but ... that's the exact opposite of what the test
case shows to be happening.  To wit, the newly created table
(the second coming of sch1.t1) *is* replicated immediately, while
the pre-existing t1 (now sch2.t1) is not.  It's impossible to
explain those two facts under either a model of "tables are matched
by name" or "tables are matched by OID".  So I'm still of the opinion
that there's some very dubious behavior here.

However, it does seem that the cache flush makes one aspect better,
so I pushed this after a little further work on the test case.

regards, tom lane




Re: [PATCH] Expand character set for ltree labels

2023-01-06 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-10-05 We 18:05, Garen Torikian wrote:
>> Therefore I have updated the patch with three much smaller changes:
>> 
>> * Support for `-` in addition to `_`
>> * Expanding the limit to 512 chars (from the existing 256); again it's
>> not uncommon for non-English strings to be much longer
>> * Fixed the documentation to expand on what the ltree label's
>> relationship to the DB locale is

> Regardless of the punycode issue, allowing hyphens in ltree labels seems
> quite reasonable. I haven't reviewed the patch yet, but if it's OK I
> intend to commit it.

No objection to allowing hyphens.  If we're going to increase the length
limit, how about using 1000 characters?  AFAICS we could even get away
with 10K, but it's probably best to hold a bit or two in reserve in case
we ever want flags or something associated with labels.

(I've not read the patch.)

regards, tom lane




Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-06 Thread Tom Lane
Tomas Vondra  writes:
> The one difference is that I realized the relkind check does not
> actually say we can't do sampling - it just means we can't use
> TABLESAMPLE to do it. We could still use "random()" ...

> Furthermore, I don't think we should silently disable sampling when the
> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
> the table - IMHO it's less surprising to just fail in that case.

Agreed on both points.  This patch looks good to me.

> Of course, all relkinds that don't support TABLESAMPLE currently have
> reltuples value that will disable sampling anyway (e.g. views have -1).
> So we won't actually fallback to random() anyway, because we can't
> calculate the sample fraction.
> That's a bit annoying for foreign tables pointing at a view, which is a
> more likely use case than table pointing at a sequence.

Right, that's a case worth being concerned about.

> But I realized we could actually still do "random()" sampling:
> SELECT * FROM t ORDER BY random() LIMIT $X;

Hmm, interesting idea, but it would totally bollix our correlation
estimates.  Not sure that those are worth anything for remote views,
but still...

regards, tom lane




Re: Support load balancing in libpq

2023-01-06 Thread Michael Banck
Hi,

On Tue, Nov 29, 2022 at 02:57:08PM +, Jelte Fennema wrote:
> Attached is a new version with the tests cleaned up a bit (more
> comments mostly).
> 
> @Michael, did you have a chance to look at the last version? Because I
> feel that the patch is pretty much ready for a committer to look at,
> at this point.

I had another look; it still applies and tests pass. I still find the
distribution over three hosts a bit skewed (even when OpenSSL is
enabled, which wasn't the case when I first tested it), but couldn't
find a general fault and it worked well enough in my testing.

I wonder whether making the parameter a boolean will paint us into a
corner, and whether additional modes could be envisioned in the
future, but I can't think of any right now (you can pretty neatly
restrict load balancing to standbys by setting
target_session_attrs=standby in addition). Maybe a way to change the
behaviour if a DNS hostname is backed by multiple entries?

Some further (mostly nitpicking) comments on the patch:

> From 6e20bb223012b666161521b5e7249c066467a5f3 Mon Sep 17 00:00:00 2001
> From: Jelte Fennema 
> Date: Mon, 12 Sep 2022 09:44:06 +0200
> Subject: [PATCH v5] Support load balancing in libpq
> 
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
> 
> Both JBDC (Java) and Npgsql (C#) already support client level load
> balancing (option #1). This patch implements client level load balancing
> for libpq as well. To stay consistent with the JDBC and Npgsql part of
> the  ecosystem, a similar implementation and name for the option are
> used. 

I still think all of the above has no business in the commit message,
though maybe the first paragraph can stay as introduction.

> It contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
> one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
> before trying to connect to them one-by-one.

That's fine.

What should be in the commit message is at least a mention of what the
new connection parameter is called and possibly what is done to
accomplish it.

But the committer will pick this up if needed I guess.

> diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> index f9558dec3b..6ce7a0c9cc 100644
> --- a/doc/src/sgml/libpq.sgml
> +++ b/doc/src/sgml/libpq.sgml
> @@ -1316,6 +1316,54 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
>
>   
>  
> +  xreflabel="load_balance_hosts">
> +  load_balance_hosts
> +  
> +   
> +Controls whether the client load balances connections across hosts 
> and
> +adresses. The default value is 0, meaning off, this means that hosts 
> are
> +tried in order they are provided and addresses are tried in the order
> +they are received from DNS or a hosts file. If this value is set to 
> 1,
> +meaning on, the hosts and addresses that they resolve to are tried in
> +random order. Subsequent queries once connected will still be sent to
> +the same server. Setting this to 1, is mostly useful when opening
> +multiple connections at the same time, possibly from different 
> machines.
> +This way connections can be load balanced across multiple Postgres
> +servers.
> +   
> +   
> +When providing multiple hosts, these hosts are resolved in random 
> order.
> +Then if that host resolves to multiple addresses, these addresses are
> +connected to in random order. Only once all addresses for a single 
> host
> +have been tried, the addresses for the next random host will be
> +resolved. This behaviour can lead to non-uniform address selection in
> +certain cases. Such as when not all hosts resolve to the same number 
> of
> +addresses, or when multiple hosts resolve to the same address. So if 
> you
> +want uniform load balancing, this is something to keep in mind. 
> However,
> +non-uniform load balancing also has usecases, e.g. providing the
> +hostname of a larger server multiple times in the host string so it 
> gets
> +more requests.
> +   
> +   
> +When using this setting it's recommended to also configure a 
> reasonable
> +value for . Because 
> then,
> +if one of the nodes that are used for load balancing is not 
> responding,
> +a new node will be tried.
> +   
> +  
> + 

I think this whole section is generally fine, but needs some
copyediting.

> + 
> +  random_seed
> +  
> +   
> +Sets the random seed that is used by  linkend="libpq-load-balance-hosts"/>
> +to randomize the host order. This option is mo

Re: ATTACH PARTITION seems to ignore column generation status

2023-01-06 Thread Tom Lane
Amit Langote  writes:
> On Fri, Jan 6, 2023 at 3:53 AM Tom Lane  wrote:
>> I'm not sure to what extent it's sensible for partitions to have
>> GENERATED columns that don't match their parent; but even if that's
>> okay for payload columns I doubt we want to allow partitioning
>> columns to be GENERATED.

> Actually, I'm inclined to disallow partitions to have *any* generated
> columns that are not present in the parent table.  The main reason for
> that is the inconvenience of checking that a partition's generated
> columns doesn't override the partition key column of an ancestor that
> is not its immediate parent, which MergeAttributesIntoExisting() has
> access to and would have been locked.

After thinking about this awhile, I feel that we ought to disallow
it in the traditional-inheritance case as well.  The reason is that
there are semantic prohibitions on inserting or updating a generated
column, eg

regression=# create table t (f1 int, f2 int generated always as (f1+1) stored);
CREATE TABLE
regression=# update t set f2=42;
ERROR:  column "f2" can only be updated to DEFAULT
DETAIL:  Column "f2" is a generated column.

It's not very reasonable to have to recheck that for child tables,
and we don't.  But if one does this:

regression=# create table pp (f1 int, f2 int);
CREATE TABLE
regression=# create table cc (f1 int, f2 int generated always as (f1+1) stored) 
inherits(pp);
NOTICE:  merging column "f1" with inherited definition
NOTICE:  merging column "f2" with inherited definition
CREATE TABLE
regression=# insert into cc values(1);
INSERT 0 1
regression=# update pp set f2 = 99 where f1 = 1;
UPDATE 1
regression=# table cc;
 f1 | f2 
----+----
  1 | 99
(1 row)

That is surely just as broken as the partition-based case.

I also note that the code adjacent to what you added is

/*
 * If parent column is generated, child column must be, too.
 */
if (attribute->attgenerated && !childatt->attgenerated)
ereport(ERROR, ...

without any exception for non-partition inheritance, and the
following check for equivalent generation expressions has
no such exception either.  So it's not very clear why this
test should have an exception.

> Patch doing it that way is attached.  Perhaps the newly added error
> message should match CREATE TABLE .. PARTITION OF's, but I found the
> latter to be not detailed enough, or maybe that's just me.

Maybe we should improve the existing error message while we're at it?

regards, tom lane




Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-06 Thread Bharath Rupireddy
On Fri, Jan 6, 2023 at 11:47 AM Bharath Rupireddy
 wrote:
>
> On Thu, Jan 5, 2023 at 6:51 PM Bharath Rupireddy
>  wrote:
> >
> > > I'm also wondering if it would make sense to extend the test coverage of 
> > > it (and pg_waldump) to "validate" that both
> > > extracted images are the same and matches the one modified right after 
> > > the checkpoint.
> > >
> > > What do you think? (could be done later in another patch though).
> >
> > I think pageinspect can be used here. We can fetch the raw page from
> > the table after the checkpoint and raw FPI from the WAL record logged
> > as part of the update. I've tried to do so [1], but I see a slight
> > difference in the raw output. The expectation is that they both be the
> > same. It might be that the update operation logs the FPI with some
> > more info set (prune_xid). I'll try to see why it is so.
> >
> > I'm attaching the v2 patch for further review.
> >
> > [1]
> > SELECT * FROM page_header(:'page_from_table');
> > lsn| checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > ---+--+---+---+---+-+--+-+---
> >  0/1891D78 |0 | 0 |40 |  8064 |8192 | 8192 |
> > 4 | 0
> > (1 row)
> >
> > SELECT * FROM page_header(:'page_from_wal');
> > lsn| checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > ---+--+---+---+---+-+--+-+---
> >  0/1891D78 |0 | 0 |44 |  8032 |8192 | 8192 |
> > 4 |   735
> > (1 row)
>
> Ugh, v2 patch missed the new file added, I'm attaching v3 patch for
> further review. Sorry for the noise.

I took a stab at how and what gets logged as FPI in WAL records:

Option 1:
The WAL record with an FPI contains both the unmodified table page from the
disk after the checkpoint and the new tuple (not applied to the unmodified
page), and recovery (redo) applies the new tuple to the unmodified
page. A bit more WAL is needed to store both the
unmodified page and the new tuple data in the record, and recovery can
also get a bit slower as it needs to stitch together the modified page.

Option 2:
The WAL record with an FPI contains only the modified page (the new tuple
already applied to the unmodified page from the disk after the checkpoint),
and recovery (redo) just returns the applied block as BLK_RESTORED.
Recovery is faster with this approach, and less WAL is needed because
only the modified page is stored.

My earlier understanding was that postgres does option (1); however, I
was wrong: option (2) is what postgres actually implements, for
the obvious advantages noted above.
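
To illustrate option (2) a bit more concretely, this is the usual shape of a
redo routine around a block that may carry an FPI (a simplified sketch, not
code from this patch):

	Buffer		buffer;

	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_RESTORED)
	{
		/*
		 * The full-page image stored in the record was copied over the
		 * block as-is; nothing is left to re-apply for this block.
		 */
	}
	else
	{
		/* e.g. BLK_NEEDS_REDO: apply the per-tuple change from the record */
	}
	if (BufferIsValid(buffer))
		UnlockReleaseBuffer(buffer);

So the FPI that pg_walinspect (and pg_waldump) can extract is already the
modified page, which is why it should match what pageinspect reads back from
the table after the update.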

I now made the tests a bit stricter in checking the FPI contents
(tuple values) pulled from the WAL record with raw page contents
pulled from the table using the pageinspect extension. Please see the
attached v4 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 6b577f15aede723f5cb8ea675d41d0efe9b96727 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 6 Jan 2023 17:07:49 +
Subject: [PATCH v4] Add FPI extract function to pg_walinspect

---
 contrib/pg_walinspect/Makefile|   3 +-
 .../pg_walinspect/expected/pg_walinspect.out  |  84 -
 contrib/pg_walinspect/meson.build |   1 +
 .../pg_walinspect/pg_walinspect--1.0--1.1.sql |  24 
 contrib/pg_walinspect/pg_walinspect.c | 110 ++
 contrib/pg_walinspect/pg_walinspect.control   |   2 +-
 contrib/pg_walinspect/sql/pg_walinspect.sql   |  58 -
 doc/src/sgml/pgwalinspect.sgml|  49 
 8 files changed, 327 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0--1.1.sql

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530eb6c..7412307ede 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -7,11 +7,12 @@ OBJS = \
 PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
 
 EXTENSION = pg_walinspect
-DATA = pg_walinspect--1.0.sql
+DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
 
 REGRESS = pg_walinspect
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf
+EXTRA_INSTALL = contrib/pageinspect
 
 # Disabled because these tests require "wal_level=replica", which
 # some installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index a1ee743457..b4ae466edb 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_walinspect;
+CREATE EXTENSION pageinspect;
 -- Make sure checkpoints don't interfere with the test.
 SELECT 'init' FROM pg_create_physical_r

Re: BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)

2023-01-06 Thread Tom Lane
Nathan Bossart  writes:
> On Thu, Dec 29, 2022 at 03:29:15PM -0500, Tom Lane wrote:
>> I tried to make a patch along these lines, and soon hit a stumbling
>> block: ONLY is a fully-reserved SQL keyword.  I don't think this
>> syntax is attractive enough to justify requiring people to
>> double-quote the option, so we are back to square one.  Anybody
>> have a different suggestion?

> ... adding both SKIP_DATABASE_STATS and ONLY_DATABASE_STATS might be the
> best bet.

Nobody has proposed a different bikeshed color, so I'm going to
proceed with that syntax.  I'll incorporate the parallel-machinery
fix from your patch and push to HEAD only (since it's hard to argue
this isn't a new feature).

This needn't foreclose pursuing the various ideas about making
vac_update_datfrozenxid faster; but none of those would eliminate
the fundamental O(N^2) issue AFAICS.

regards, tom lane




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-06 Thread Jacob Champion
On Fri, Jan 6, 2023 at 2:18 AM Jelte Fennema  wrote:
>
> Huge +1 from me. On Azure we're already using public CAs to sign certificates 
> for our managed postgres offerings[1][2]. Right now, our customers have to go 
> to the hassle of downloading a specific root cert or finding their OS default 
> location. Neither of these allow us to give users a simple copy-pastable 
> connection string that uses secure settings. This would change this and make 
> it much easier for our customers to use secure connections to their database.

Thanks! Timescale Cloud is in the same boat.

> I have two main questions:
> 1. From the rest of the thread it's not entirely clear to me why this patch 
> goes for the sslrootcert=system approach, instead of changing what 
> sslrootcert='' means when using verify-full. Like Tom Lane suggested, we 
> could change it to try ~/.postgresql/root.crt and if that doesn't exist make 
> it try the system store, instead of erroring out like it does now when 
> ~/.postgresql/root.crt doesn't exist.

I mentioned it briefly upthread, but to expand: For something this
critical to security, I don't like solutions that don't say exactly
what they do. What does the following connection string mean?

$ psql 'host=example.org sslmode=verify-full'

If it sometimes means to use root.crt (if one exists) and sometimes to
use the system store, then
1) it's hard to audit the actual behavior without knowing the state of
the filesystem, and
2) if I want to connect to a server using the system store, and *only*
the system store, then I have to make sure that the default root.crt
never exists. But what if other software on my system relies on it?

It also provides a bigger target for exploit chains, because I can
remove somebody's root.crt file and their connections may try to
continue with an effectively weaker verification level instead of
erroring out. I realize that for many people this is a nonissue ("if
you can delete the root cert, you can probably do much worse") but IME
arbitrary file deletion vulnerabilities are treated with less concern
than arbitrary file writes.

By contrast,

$ psql 'host=example.org sslrootcert=system sslmode=verify-full'

has a clear meaning. We'll never use a root.crt.

(A downside of reusing sslrootcert like this is the cross-version
hazard. The connection string 'host=example.org sslrootcert=system'
means something strong with this patchset, but something very weak to
libpq 15 and prior. So clients should probably continue to specify
sslmode=verify-full explicitly for the foreseeable future.)

> This approach seems nicer to me, as it doesn't require introducing another 
> special keyword.

Maybe I'm being overly aspirational, but one upside to that special
keyword is that it's a clear signal that the user wants to use the
public CA model. So we have the opportunity to remove footguns
aggressively when we see this mode. In the future we may have further
opportunities to strengthen sslrootcert=system (OCSP and/or
must-staple support?) that would be harder to roll out by default if
we're just trying to guess what the user wants.

> It would also remove the need for the changing of defaults depending on the 
> value of sslrootcert.

Agreed. Personally I think the benefit of 0002 outweighs its cost, but
maybe there's a better way to implement it.

> 2. Should we allow the same approach with ssl_ca_file on the server side, for 
> client cert validation?

I don't know enough about this use case to implement it safely. We'd
have to make sure the HBA entry is checking the hostname (so that we
do the reverse DNS dance), and I guess we'd need to introduce a new
clientcert verify-* mode? Also, it seems like server operators are
more likely to know exactly which roots they need, at least compared
to clients. I agree the feature is useful, but I'm not excited about
attaching it to this patchset.

--Jacob




Re: add \dpS to psql

2023-01-06 Thread Dean Rasheed
On Wed, 28 Dec 2022 at 21:26, Nathan Bossart  wrote:
>
> On Wed, Dec 28, 2022 at 02:46:23PM +0300, Maxim Orlov wrote:
> > The patch applies with no problem, implements what it declared, CF bot is
> > happy.
> > Without patch \dpS shows 0 rows, after applying system objects are shown.
> > Consider this patch useful, hope it will be committed soon.
>
> Thanks for reviewing.
>

Looking this over this, I have a couple of comments:

Firstly, I think it should allow \zS in the same fashion as \dpS,
since \z is an alias for \dp, so the 2 should be kept in sync.

Secondly, I don't think the following is the right SQL clause to use
in the absence of "S":

if (!showSystem && !pattern)
appendPQExpBufferStr(&buf, "AND n.nspname !~ '^pg_'\n");

I know that's the condition it used before, but the problem with using
that now is that it will cause temporary relations to be excluded
unless the "S" modifier is used, which goes against the expectation
that "S" just causes system relations to be included. Also, it fails
to exclude information_schema relations, if that happens to be on the
user's search_path.

So I think we should use the same SQL clauses as every other psql
command that supports "S", namely:

if (!showSystem && !pattern)
appendPQExpBufferStr(&buf, "  AND n.nspname <> 'pg_catalog'\n"
 "  AND n.nspname <> 'information_schema'\n");

Updated patch attached.

Regards,
Dean
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 8a5285d..3f994a3
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1825,14 +1825,16 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind '
 
 
   
-\dp [ pattern ]
+\dp[S] [ pattern ]
 
 
 Lists tables, views and sequences with their
 associated access privileges.
 If pattern is
 specified, only tables, views and sequences whose names match the
-pattern are listed.
+pattern are listed.  By default only user-created objects are shown;
+supply a pattern or the S modifier to include
+system objects.
 
 
 
@@ -3575,14 +3577,16 @@ testdb=> \setenv LESS -imx
 
 
   
-\z [ pattern ]
+\z[S] [ pattern ]
 
 
 Lists tables, views and sequences with their
 associated access privileges.
 If a pattern is
 specified, only tables, views and sequences whose names match the
-pattern are listed.
+pattern are listed.  By default only user-created objects are shown;
+supply a pattern or the S modifier to include
+system objects.
 
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 00b89d9..b5201ed
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -140,7 +140,8 @@ static backslashResult exec_command_writ
 static backslashResult exec_command_watch(PsqlScanState scan_state, bool active_branch,
 		  PQExpBuffer query_buf, PQExpBuffer previous_buf);
 static backslashResult exec_command_x(PsqlScanState scan_state, bool active_branch);
-static backslashResult exec_command_z(PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_z(PsqlScanState scan_state, bool active_branch,
+	  const char *cmd);
 static backslashResult exec_command_shell_escape(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_slash_command_help(PsqlScanState scan_state, bool active_branch);
 static char *read_connect_arg(PsqlScanState scan_state);
@@ -413,8 +414,8 @@ exec_command(const char *cmd,
 	query_buf, previous_buf);
 	else if (strcmp(cmd, "x") == 0)
 		status = exec_command_x(scan_state, active_branch);
-	else if (strcmp(cmd, "z") == 0)
-		status = exec_command_z(scan_state, active_branch);
+	else if (strcmp(cmd, "z") == 0 || strcmp(cmd, "zS") == 0)
+		status = exec_command_z(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "!") == 0)
 		status = exec_command_shell_escape(scan_state, active_branch);
 	else if (strcmp(cmd, "?") == 0)
@@ -875,7 +876,7 @@ exec_command_d(PsqlScanState scan_state,
 success = listCollations(pattern, show_verbose, show_system);
 break;
 			case 'p':
-success = permissionsList(pattern);
+success = permissionsList(pattern, show_system);
 break;
 			case 'P':
 {
@@ -2822,16 +2823,22 @@ exec_command_x(PsqlScanState scan_state,
  * \z -- list table privileges (equivalent to \dp)
  */
 static backslashResult
-exec_command_z(PsqlScanState scan_state, bool active_branch)
+exec_command_z(PsqlScanState scan_state, bool active_branch, const char *cmd)
 {
 	bool		success = true;
 
 	if (active_branch)
 	{
-		char	   *pattern = psql_scan_slash_option(scan_state,
-	 OT_NORMAL, NULL, true);
+		char	   *pattern;
+		bool		show_system;
+
+		pattern = psql_scan_slash_option(scan_state,
+		 OT_NORM

Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-06 Thread Jim Jones
Hi Jelte, thanks for the message. You're right, an invalid cert path
does solve the issue - I even use it for tests. Although it solves the
authentication issue, it still looks to me like a non-intuitive
workaround/hack. Perhaps a new sslmode isn't the right place for this
"feature"? Thanks again for the suggestion!


Jim

On 06.01.23 09:32, Jelte Fennema wrote:
The easiest way to achieve the same (without patching libpq) is by 
setting sslcert to something non-existent. While maybe not the most 
obvious way, I would consider this the recommended approach.


On Fri, 6 Jan 2023 at 09:15, Jim Jones  wrote:

Dear PostgreSQL Hackers,

Some time ago we faced a small issue in libpq regarding
connections configured in the pg_hba.conf as type *hostssl* and
using *md5* as authentication method.

One of our users placed the client certificates in ~/.postgresql/
(*postgresql.crt*, *postgresql.key*), so that libpq sends them to
the server without having to manually set *sslcert* and *sslkey* -
which is quite convenient. However, there are other servers where
the same user authenticates with password (md5), but libpq still
sends the client certificates for authentication by default. This
causes the authentication to fail even before the user has the
chance to enter his password, since he has no certificate
registered in the server.

To make it clearer:

Although the connection is configured as ...

host  all  dummyuser  192.168.178.42/32  md5

... and the client uses the following connection string ...

*psql "host=myserver dbname=db user=***dummyuser*" *

... the server tries to authenticate the user using the client
certificates in *~/.postgresql/* and, as expected, the
authentication fails:

*psql: error: connection to server at "myserver" (xx.xx.xx.xx),
port 5432 failed: SSL error: tlsv1 alert unknown ca*

Server log:

*2022-12-09 10:50:59.376 UTC [13896] LOG:  could not accept SSL
connection: certificate verify failed*

Am I missing something?

Obviously it would suffice to just remove or rename
*~/.postgresql/postgresql.{crt,key}*, but the user needs them to
authenticate to other servers. So we came up with the workaround
of creating a new sslmode (no-clientcert) to make libpq explicitly
ignore the client certificates, so that we can avoid SSL
authentication errors. These small changes can be seen in the
attached patch file.

*psql "host=myserver dbname=db user=dummyuser**
sslrootcert=server.crt sslmode=no-clientcert"*

Any better ideas to make libpq ignore
*~/.postgresql/postgresql.{crt,key}*? Preferably without having
to change the source code :) Thanks in advance!

Best,

Jim


Re: pgsql: Delay commit status checks until freezing executes.

2023-01-06 Thread Peter Geoghegan
On Tue, Jan 3, 2023 at 10:52 PM Peter Geoghegan  wrote:
> > And it'd make sense to have
> > the explanation of why TransactionIdDidAbort() isn't the same as
> > !TransactionIdDidCommit(), even for !TransactionIdIsInProgress() xacts, near
> > the explanation for doing TransactionIdIsInProgress() first.
>
> I think that we should definitely have a comment directly over
> TransactionIdDidAbort(). Though I wouldn't mind reorganizing these
> other comments, or making the comment over TransactionIdDidAbort()
> mostly just point to the other comments.

What do you think of the attached patch, which revises comments over
TransactionIdDidAbort, and adds something about it to the top of
heapam_visbility.c?

-- 
Peter Geoghegan


v1-0001-Document-TransactionIdDidAbort-hard-crash-behavio.patch
Description: Binary data


Re: Introduce "log_connection_stages" setting.

2023-01-06 Thread Jacob Champion
Hi,

+1 for the idea!

> +authenticated
> +Logs the original identity that an authentication method 
> employs to identify a user. In most cases, the identity string equals the 
> PostgreSQL username,
> +but some third-party authentication methods may alter the original 
> user identifier before the server stores it. Failed authentication is always 
> logged regardless of the value of this setting.

I think the documentation needs to be rewrapped; those are very long lines.

On 11/17/22 07:36, Justin Pryzby wrote:
> This function hardcodes each of the 4 connections:
> 
>> +if (pg_strcasecmp(stage, "received") == 0)
>> +myextra[0] = true;
> 
> It'd be better to use #defines or enums for these.

Hardcoding seems reasonable to me, if this is the only place we're doing
string comparison.

>> --- a/src/backend/tcop/postgres.c
>> +++ b/src/backend/tcop/postgres.c
>> @@ -84,8 +84,11 @@ const char *debug_query_string; /* client-supplied query 
>> string */
>>  /* Note: whereToSendOutput is initialized for the bootstrap/standalone case 
>> */
>>  CommandDest whereToSendOutput = DestDebug;
>>  
>> -/* flag for logging end of session */
>> -boolLog_disconnections = false;
>> +/* flags for logging information about session state */
>> +boolLog_disconnected = false;
>> +boolLog_authenticated = false;
>> +boolLog_authorized = false;
>> +boolLog_received = false;
> 
> I think this ought to be an integer with flag bits, rather than 4
> booleans (I don't know, but there might be more later?).  Then, the
> implementation follows the user-facing GUC and also follows
> log_destination.

Agreed. Or at the very least, follow what's done with
wal_consistency_checking? But I think flag bits would be better.
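
A minimal sketch of the flag-bits shape being suggested (all names here are
illustrative, not taken from the patch):

#define LOG_CONNECTION_RECEIVED			(1 << 0)
#define LOG_CONNECTION_AUTHENTICATED	(1 << 1)
#define LOG_CONNECTION_AUTHORIZED		(1 << 2)
#define LOG_CONNECTION_DISCONNECTED		(1 << 3)

int			Log_connection_stages = 0;	/* replaces the four booleans */

/* each call site then just tests its bit, e.g.: */
if (Log_connection_stages & LOG_CONNECTION_DISCONNECTED)
{
	/* emit the disconnection log message */
}

The GUC check/assign hooks would parse the comma-separated list into that one
integer, much as log_destination already does.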

The tests should be expanded for cases other than 'all'.

As to the failing test cases: it looks like there's a keyword issue with
ALTER SYSTEM and 'all', but trying to fix it by quoting also fails. I
think it's because of GUC_LIST_QUOTE -- is there a reason that's used
here? I don't think we'd need any special characters in future option
names. wal_consistency_checking is very similar, and it just uses
GUC_LIST_INPUT.

Thanks,
--Jacob




Re: Cygwin cleanup

2023-01-06 Thread Thomas Munro
On Sat, Jan 7, 2023 at 3:40 AM Andrew Dunstan  wrote:
> OK, should I now try re-enabling TAP tests on lorikeet?

Not before https://commitfest.postgresql.org/41/4032/ is committed.
After that, it might be worth a try?  I have no idea if the PANIC
problem I mentioned last night would apply to lorikeet's kernel too.
To summarise the kinds of failure we have analysed in this thread:

1.  SysV semaphores are buggy; fixed, I hope, by recent commit (= just
don't use them).
2.  The regular crashes we already knew about from other threads due
to signal masking being buggy seem to be fixed, coincidentally, by CF
#4032, not yet committed (= don't rely on sa_mask for correctness).
3.  PANIC apparently caused by non-atomic rename(), based on analysis
of similar failures seen on other old buggy OSes back in 2018.

If lorikeet has problem #3 (which it may not; we know from CF #3951
that kernel versions differ in related respects and Server 2019 as
used on CI seems to have the most conservative/old Windows behaviour)
then it might fail in the TAP tests just like the proposed
CI-for-Cygwin patch, unless you also do data_sync_retry=on, which
seems like a pretty ugly workaround to me.  I don't know what else
might be broken by non-atomic rename(), and I'd rather not find out
:-D  I finished up here by trying to tidy up some weird looking
nonsense in our code while working on general portability cleanup,
since I needed a way to check if __CYGWIN__ stuff still works, but
what we found out is that it's more broken than anyone realised, and
now I have to pull the emergency rabbit hole ejection cord because I
have less than zero time for or interest in debugging Cygwin.




Re: [PATCH] Expand character set for ltree labels

2023-01-06 Thread Andrew Dunstan


On 2023-01-06 Fr 11:29, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Regardless of the punycode issue, allowing hyphens in ltree labels seems
>> quite reasonable. I haven't reviewed the patch yet, but if it's OK I
>> intend to commit it.
> No objection to allowing hyphens.  If we're going to increase the length
> limit, how about using 1000 characters?  AFAICS we could even get away
> with 10K, but it's probably best to hold a bit or two in reserve in case
> we ever want flags or something associated with labels.
>

OK, done that way.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Permute underscore separated components of columns before fuzzy matching

2023-01-06 Thread Arne Roland
Hello,

we have the great fuzzy string matching that comes up with suggestions when a
column name contains a typo.

Since underscores are the de facto standard for separating words, it would also
make sense to generate suggestions when the order of the words gets mixed up.
Example: if the user types timstamp_entry instead of entry_timestamp, the
suggestion still shows up.

The attached patch does that for up to three segments separated by
underscores. The permutation of two segments is treated the same way a wrongly
typed character would be.

The permutation is skipped if the typed column name contains more than 6
underscores, to prevent a meaningful (measured on my development machine)
slowdown when the user types too many underscores. In terms of the number of
underscores m and the lengths of the individual strings n_att and n_col, the
trivial upper bound is O(n_att * n_col * m^2). Considering that strings with a
lot of underscores are likely to be long as well, I simply decided to add the
cutoff. I still wonder a bit whether the cutoff should disable the permutation
entirely (as this patch does) or only the swap-three-sections part, as the
rest would be bounded by O(n_att * n_col * m). But the utility of only
swapping two sections seems a bit dubious to me if I have 7 or more of them.

To me this patch seems simple (if string handling in C can be called that) and
self-contained. Despite my calculations above, it resides in a
non-performance-critical piece of code. I think of it as a quality-of-life
thing. Let me know what you think. Thank you!

Regards
Arne

From 2f5801abe48234fade70a7238fe2a1d1f2c5813d Mon Sep 17 00:00:00 2001
From: Arne Roland 
Date: Fri, 6 Jan 2023 22:23:37 +0100
Subject: [PATCH] fuzzy_underscore_permutation

---
 src/backend/parser/parse_relation.c | 103 +---
 1 file changed, 80 insertions(+), 23 deletions(-)

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 5389a0eddb..f9347792eb 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -579,32 +579,13 @@ GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup)
 	return NULL;/* keep compiler quiet */
 }
 
-/*
- * updateFuzzyAttrMatchState
- *	  Using Levenshtein distance, consider if column is best fuzzy match.
- */
 static void
-updateFuzzyAttrMatchState(int fuzzy_rte_penalty,
-		  FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte,
-		  const char *actual, const char *match, int attnum)
+updateFuzzyAttrMatchStateSingleString(int fuzzy_rte_penalty,
+			FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte,
+			const char *actual, const char *match, int attnum, int matchlen)
 {
-	int			columndistance;
-	int			matchlen;
-
-	/* Bail before computing the Levenshtein distance if there's no hope. */
-	if (fuzzy_rte_penalty > fuzzystate->distance)
-		return;
-
-	/*
-	 * Outright reject dropped columns, which can appear here with apparent
-	 * empty actual names, per remarks within scanRTEForColumn().
-	 */
-	if (actual[0] == '\0')
-		return;
-
 	/* Use Levenshtein to compute match distance. */
-	matchlen = strlen(match);
-	columndistance =
+	int columndistance =
 		varstr_levenshtein_less_equal(actual, strlen(actual), match, matchlen,
 	  1, 1, 1,
 	  fuzzystate->distance + 1
@@ -667,6 +648,82 @@ updateFuzzyAttrMatchState(int fuzzy_rte_penalty,
 	}
 }
 
+/*
+ * updateFuzzyAttrMatchState
+ *	  Using Levenshtein distance, consider if column is best fuzzy match.
+ */
+static void
+updateFuzzyAttrMatchState(int fuzzy_rte_penalty,
+		FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte,
+		const char *actual, const char *match, int attnum)
+{
+	/* Memory segment to store the current permutation of the match string. */
+	char* tmp_match;
+	/*
+	 * We count the number of underscores here, because we want to know whether we should consider
+	 * permuting underscore separated sections.
+	 */
+	int underscore_count = 0;
+	int matchlen		 = strlen(match);
+	/* We check for the amounts of underscores first, since updateFuzzyAttrMatchState has already quadratic run time. */
+	for (int i = 0; i < matchlen; i++) {
+		if (match[i] == '_')
+			underscore_count++;
+	}
+
+	/* Bail before computing the Levenshtein distance if there's no hope. */
+	if (fuzzy_rte_penalty > fuzzystate->distance)
+		return;
+
+	/*
+	 * Outright reject dropped columns, which can appear here with apparent
+	 * empty actual names, per remarks within scanRTEForColumn().
+	 */
+	if (actual[0] == '\0')
+		return;
+
+	updateFuzzyAttrMatchStateSingleString(fuzzy_rte_penalty, fuzzystate, rte, actual, match, attnum, matchlen);
+	/*
+	 * If told to, check for permuting up to three sections separated by underscores.
+	 */
+	if (underscore_count && underscore_count <= 6) {
+			tmp_match = palloc(matchlen + 1);
+			tmp_match[matchlen] = '\0';
+			for (int i = 1; i < matchlen - 1; i++) {
+if (match[i] == '_') {
+	/* Consider swapping two sections

Re: RFC: logical publication via inheritance root?

2023-01-06 Thread Jacob Champion
On Fri, Dec 9, 2022 at 10:21 AM Jacob Champion  wrote:
> Some inheritance hierarchies won't be "partitioned" hierarchies, of
> course, but the user can simply not set that replication option for
> those publications.

The more I noodle around with this approach, the less I like it: it
feels overly brittle, we have to deal with multiple inheritance
somehow, and there seem to be many code paths that need to be
partially duplicated. And my suggestion that the user could just opt
out of problematic cases would be a bad user experience, since any
non-partition inheritance hierarchies would just silently break.

Instead...

> (Alternatively, I can imagine a system where an
> extension explicitly marks a table as having a different "publication
> root", and then handling that marker with the existing replication
> option. But that may be overengineering things.)

...I'm going to try this approach next, since it's opt-in and may be
able to better use the existing code paths.

--Jacob




Re: GROUP BY ALL

2023-01-06 Thread Bruce Momjian
On Mon, Dec 19, 2022 at 05:53:46PM +0100, Vik Fearing wrote:
> I think this is a pretty terrible idea.  If we want that kind of behavior,
> we should just allow the GROUP BY to be omitted since without grouping sets,
> it is kind of redundant anyway.
> 
> I don't know what my opinion is on that.

This is a very interesting concept.  Because Postgres requires GROUP BY
of all non-aggregate columns of a target list, Postgres could certainly
automatically generate the GROUP BY.  However, readers of the query
might not easily distinguish function calls from aggregates, so in a way
the GROUP BY is for the reader, not for the database server.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Using WaitEventSet in the postmaster

2023-01-06 Thread Thomas Munro
On Fri, Dec 23, 2022 at 8:46 PM Thomas Munro  wrote:
> I pushed the small preliminary patches.  Here's a rebase of the main patch.

Here are some questions I have considered.  Anyone got an opinion
on point 3, in particular?

1.  Is it OK that we are now using APIs that might throw, in places
where we weren't?  I think so: we don't really expect WaitEventSet
APIs to throw unless something is pretty seriously wrong, and if you
hack things to inject failures there then you get a FATAL error and
the postmaster exits and the children detect that.  I think that is
appropriate.

2.  Is it really OK to delete the pqsignal_pm() infrastructure?  I
think so.  The need for sa_mask to block all signals is gone: all
signal handlers should now be re-entrant (ie safe in case of
interruption while already in a signal handler), safe against stack
overflow (pqsignal() still blocks re-entry for the *same* signal
number, because we use sigaction() without SA_NODEFER, so a handler
can only be interrupted by a different signal, and the number of
actions installed is finite and small), and safe to run at any time
(ie safe to interrupt the user context because we just do known-good
sigatomic_t/syscall stuff and save/restore errno).  The concern about
SA_RESTART is gone, because we no longer use the underspecified
select() interface; the replacement implementation syscalls, even
poll(), return with EINTR for handlers installed with SA_RESTART, but
that's now moot anyway because we have a latch that guarantees they
return with a different event anyway.  (FTR select() is nearly extinct
in BE code, I found one other user and I plan to remove it, see RADIUS
thread, CF #4103.)

3.  Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
SIGINT?  If you send all of these extremely rapidly, it's
indeterminate which one will be seen by handle_shutdown_request().  I
think that's probably acceptable?  To be strict about processing only
the first one that is delivered, I think you'd need an sa_mask to
block all three signals, and then you wouldn't change
pending_shutdown_request if it's already set, which I'm willing to
code up if someone thinks that's important.  (Ideally I
would invent WL_SIGNAL to consume signal events serially without
handlers or global variables.)
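
For reference, that strict variant would look roughly like this (purely
illustrative; the names are not from the patch, and the real handler would
also set the postmaster's latch):

static volatile sig_atomic_t pending_shutdown_request = 0;	/* 0 = none yet */

static void
handle_shutdown_request_strict(int signo)
{
	/* never overwrite: the first signal delivered wins */
	if (pending_shutdown_request == 0)
		pending_shutdown_request = signo;
}

static void
install_strict_shutdown_handlers(void)
{
	struct sigaction sa;

	sa.sa_handler = handle_shutdown_request_strict;
	sa.sa_flags = SA_RESTART;
	sigemptyset(&sa.sa_mask);
	sigaddset(&sa.sa_mask, SIGINT);		/* block the other two while the   */
	sigaddset(&sa.sa_mask, SIGTERM);	/* handler runs, so a later signal */
	sigaddset(&sa.sa_mask, SIGQUIT);	/* cannot clobber the pending flag */

	sigaction(SIGINT, &sa, NULL);
	sigaction(SIGTERM, &sa, NULL);
	sigaction(SIGQUIT, &sa, NULL);
}

Because all three handlers share that mask, whichever of SIGINT/SIGTERM/SIGQUIT
is delivered first is the one that gets recorded.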

4.  Is anything new leaking into child processes due to this new
infrastructure?  I don't think so; the postmaster's MemoryContext is
destroyed, and before that I'm releasing kernel resources on OSes that
need it (namely Linux, where the epoll fd and signalfd need to be
closed).

5.  Is the signal mask being correctly handled during forking?  I
think so: I decided to push the masking logic directly into the
routine that forks, to make it easy to verify that all paths set the
mask the way we want.  (While thinking about that I noticed that
signals don't seem to be initially masked on Windows; I think that's a
pre-existing condition, and I assume we get away with it because
nothing reaches the fake signal dispatch code soon enough to break
anything?  Not changing that in this patch.)

6.  Is the naming and style OK?  Better ideas welcome, but basically I
tried to avoid all unnecessary refactoring and changes, so no real
logic moves around, and the changes are pretty close to "mechanical".
One bikeshed decision was what to call the {handle,process}_XXX
functions and associated flags.  Maybe "action" isn't the best name;
but it could be a request from pg_ctl or a request from a child
process.  I went with newly invented names for these handlers rather
than "handle_SIGUSR1" etc because (1) the 3 different shutdown request
signals point to a common handler and (2) I hope to switch to latches
instead of SIGUSR1 for "action" in later work.  But I could switch to
got_SIGUSR1 style variables if someone thinks it's better.

Here's a new version, with small changes:
* remove a stray reference to select() in a pqcomm.c comment
* move PG_SETMASK(&UnBlockSig) below the bit that sets up SIGTTIN etc
* pgindent
From 95fd8e1a832ba063f7f1a34f6548de39ca8e691c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 9 Nov 2022 22:59:58 +1300
Subject: [PATCH v8] Give the postmaster a WaitEventSet and a latch.

Traditionally, the postmaster's architecture was quite unusual.  It did
a lot of work inside signal handlers, which were only unblocked while
waiting in select() to make that safe.

Switch to a more typical architecture, where signal handlers just set
flags and use a latch to close races.  Now the postmaster looks like
all other PostgreSQL processes, multiplexing its event processing in
epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the
OS.

Changes:

 * Allow the postmaster to set up its own local latch.  For now we don't
   want other backends setting the postmaster's latch directly (that
   would require latches robust against arbitrary corruption of shared
   memory).

 * The existing signal handlers are cut in two: a handle_XXX part that
   sets a pending_XXX

Re: daitch_mokotoff module

2023-01-06 Thread Dag Lem
Alvaro Herrera  writes:

> On 2023-Jan-05, Dag Lem wrote:
>
>> Is there anything else I should do here, to avoid the status being
>> incorrectly stuck at "Waiting for Author" again.
>
> Just mark it Needs Review for now.  I'll be back from vacation on Jan
> 11th and can have a look then (or somebody else can, perhaps.)

OK, done. Have a nice vacation!

Best regards

Dag Lem




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-06 Thread Tom Lane
Nathan Bossart  writes:
> I found some additional places that should remove the last-start time from
> the hash table.  I've added those in v14.

I've pushed 0001 and 0002, which seem pretty uncontroversial.
Attached is a rebased 0003, just to keep the cfbot happy.
I'm kind of wondering whether 0003 is worth the complexity TBH,
but in any case I ran out of time to look at it closely today.

regards, tom lane

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5bcba0fdec..8f06e234c8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1991,6 +1991,16 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   Waiting to read or update information
about heavyweight locks.
  
+ 
+  LogicalRepLauncherDSA
+  Waiting for logical replication launcher dynamic shared memory
+  allocator access
+ 
+ 
+  LogicalRepLauncherHash
+  Waiting for logical replication launcher shared memory hash table
+  access
+ 
  
   LogicalRepWorker
   Waiting to read or update the state of logical replication
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index f15a332bae..88f180c2a7 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1504,6 +1504,14 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	}
 	list_free(subworkers);
 
+	/*
+	 * Clear the last-start time for the apply worker to free up space.  If
+	 * this transaction rolls back, the launcher might restart the apply worker
+	 * before wal_retrieve_retry_interval milliseconds have elapsed, but that's
+	 * probably okay.
+	 */
+	logicalrep_launcher_delete_last_start_time(subid);
+
 	/*
 	 * Cleanup of tablesync replication origins.
 	 *
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index a69e371c05..dfe49db64f 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 #include "funcapi.h"
+#include "lib/dshash.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -56,6 +57,25 @@
 int			max_logical_replication_workers = 4;
 int			max_sync_workers_per_subscription = 2;
 
+/* an entry in the last-start times hash table */
+typedef struct LauncherLastStartTimesEntry
+{
+	Oid		subid;
+	TimestampTz last_start_time;
+} LauncherLastStartTimesEntry;
+
+/* parameters for the last-start times hash table */
+static const dshash_parameters dsh_params = {
+	sizeof(Oid),
+	sizeof(LauncherLastStartTimesEntry),
+	dshash_memcmp,
+	dshash_memhash,
+	LWTRANCHE_LAUNCHER_HASH
+};
+
+static dsa_area *last_start_times_dsa = NULL;
+static dshash_table *last_start_times = NULL;
+
 LogicalRepWorker *MyLogicalRepWorker = NULL;
 
 typedef struct LogicalRepCtxStruct
@@ -63,6 +83,10 @@ typedef struct LogicalRepCtxStruct
 	/* Supervisor process. */
 	pid_t		launcher_pid;
 
+	/* hash table for last-start times */
+	dsa_handle	last_start_dsa;
+	dshash_table_handle last_start_dsh;
+
 	/* Background workers. */
 	LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
 } LogicalRepCtxStruct;
@@ -74,6 +98,9 @@ static void logicalrep_launcher_onexit(int code, Datum arg);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
+static void logicalrep_launcher_attach_dshmem(void);
+static void logicalrep_launcher_set_last_start_time(Oid subid, TimestampTz start_time);
+static TimestampTz logicalrep_launcher_get_last_start_time(Oid subid);
 
 static bool on_commit_launcher_wakeup = false;
 
@@ -756,6 +783,9 @@ ApplyLauncherShmemInit(void)
 			memset(worker, 0, sizeof(LogicalRepWorker));
 			SpinLockInit(&worker->relmutex);
 		}
+
+		LogicalRepCtx->last_start_dsa = DSM_HANDLE_INVALID;
+		LogicalRepCtx->last_start_dsh = DSM_HANDLE_INVALID;
 	}
 }
 
@@ -801,8 +831,6 @@ ApplyLauncherWakeup(void)
 void
 ApplyLauncherMain(Datum main_arg)
 {
-	TimestampTz last_start_time = 0;
-
 	ereport(DEBUG1,
 			(errmsg_internal("logical replication launcher started")));
 
@@ -837,58 +865,55 @@ ApplyLauncherMain(Datum main_arg)
 
 		now = GetCurrentTimestamp();
 
-		/* Limit the start retry to once a wal_retrieve_retry_interval */
-		if (TimestampDifferenceExceeds(last_start_time, now,
-	   wal_retrieve_retry_interval))
-		{
-			/* Use temporary context for the database list and worker info. */
-			subctx = AllocSetContextCreate(TopMemoryContext,
-		   "Logical Replication Launcher sublist",
-		   ALLOCSET_DEFAULT_SIZES);
-			oldctx = MemoryContextSwitchTo(subctx);
-
-			/* search for subscriptions to start or stop. */
-			sublist = get_subscription_list();
-
-			/* Start the missing workers for enable

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2023-01-06 Thread Tomas Vondra
On 1/6/23 17:58, Tom Lane wrote:
> Tomas Vondra  writes:
>> The one difference is that I realized the relkind check does not
>> actually say we can't do sampling - it just means we can't use
>> TABLESAMPLE to do it. We could still use "random()" ...
> 
>> Furthermore, I don't think we should silently disable sampling when the
>> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
>> the table - IMHO it's less surprising to just fail in that case.
> 
> Agreed on both points.  This patch looks good to me.
> 

Good, I'll get this committed. The "ORDER BY random()" idea is a possible
improvement and can be discussed on its own.

>> Of course, all relkinds that don't support TABLESAMPLE currently have
>> reltuples value that will disable sampling anyway (e.g. views have -1).
>> So we won't actually fallback to random() anyway, because we can't
>> calculate the sample fraction.
>> That's a bit annoying for foreign tables pointing at a view, which is a
>> more likely use case than table pointing at a sequence.
> 
> Right, that's a case worth being concerned about.
> 
>> But I realized we could actually still do "random()" sampling:
>> SELECT * FROM t ORDER BY random() LIMIT $X;
> 
> Hmm, interesting idea, but it would totally bollix our correlation
> estimates.  Not sure that those are worth anything for remote views,
> but still...

But isn't that an issue that we already have? I mean, if we do ANALYZE
on a foreign table pointing to a view, we fetch all the results. But if
the view does not have a well-defined ORDER BY, a trivial plan change
may change the order of results and thus the correlation.

Actually, how is a correlation even defined for a view?

It's true this "ORDER BY random()" thing would make it less stable, as
it would change the correlation on every run. Although - the calculated
correlation is actually quite stable, because it's guaranteed to be
pretty close to 0 because we make the order random.

Maybe that's actually better than the current state, where it depends on
the plan? Why not look at the relkind and just set the correlation to 0.0
in such cases?

But if we want to restore the current behavior (where it depends on the
actual query plan), we could do something like this:

   SELECT * FROM the_remote_view ORDER BY row_number() over ();

But yeah, this makes the remote sampling more expensive. Probably still
a win because of network costs, but not great.
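
For concreteness, the remote queries being discussed would look roughly
like this (a sketch only; postgres_fdw would build these internally, the
view name is made up, and the row limit would be derived from reltuples
and the sample fraction rather than hardcoded):

    -- random-order fallback when TABLESAMPLE is not supported (e.g. a view)
    SELECT * FROM remote_view ORDER BY random() LIMIT 30000;

    -- variant that keeps rows in the order the remote plan produces them,
    -- so the correlation estimate behaves like a plain full fetch
    SELECT * FROM remote_view ORDER BY row_number() OVER ();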


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: New strategies for freezing, advancing relfrozenxid early

2023-01-06 Thread Peter Geoghegan
On Wed, Jan 4, 2023 at 5:21 PM Matthias van de Meent
 wrote:
> Some reviews (untested; only code review so far) on these versions of
> the patches:

Thanks for the review!

> > [PATCH v14 1/3] Add eager and lazy freezing strategies to VACUUM.

> I don't think the mention of 'cutoff point' is necessary when it has
> 'Threshold'.

Fair. Will fix.

> > +intfreeze_strategy_threshold;/* threshold to use eager
> > [...]
> > +BlockNumber freeze_strategy_threshold;
>
> Is there a way to disable the 'eager' freezing strategy? `int` cannot
> hold the maximum BlockNumber...

I'm going to fix this by switching over to making the GUC (and the
reloption) GUC_UNIT_MB, while keeping it in ConfigureNamesInt[]. That
approach is a little bit more cumbersome, but not by much. That'll
solve this problem.

> > +lazy_scan_strategy(vacrel);
> >  if (verbose)
>
> I'm slightly suprised you didn't update the message for verbose vacuum
> to indicate whether we used the eager strategy: there are several GUCs
> for tuning this behaviour, so you'd expect to want direct confirmation
> that the configuration is effective.

Perhaps that would be worth doing, but I don't think that it's all
that useful in the grand scheme of things. I wouldn't mind including
it, but I think that it shouldn't be given much prominence. It's
certainly far less important than "aggressive vs non-aggressive" is
right now.

Eagerness is not just a synonym of aggressiveness. For example, every
VACUUM of a table like pgbench_tellers or pgbench_branches will use
eager scanning strategy. More generally, you have to bear in mind that
the actual state of the table is just as important as the GUCs
themselves. We try to avoid obligations that could be very hard or
even impossible for vacuumlazy.c to fulfill.

There are far weaker constraints on things like the final relfrozenxid
value we'll set in pg_class (more on this below, when I talk about
MinXid/MinMulti). It will advance far more frequently and by many more
XIDs than it would today, on average. But occasionally it will allow a
far earlier relfrozenxid than aggressive mode would ever allow, since
making some small amount of progress now is almost always much better
than making no progress at all.

> (looks at further patches) I see that the message for verbose vacuum
> sees significant changes in patch 2 instead.

It just works out to be slightly simpler that way. I want to add the
scanned_pages stuff to VERBOSE in the vmsnap/scanning strategies
commit, so I need to make significant changes to the initial VERBOSE
message in that commit. There is little point in preserving
information about aggressive mode if it's removed in the very next
commit anyway.

> > [PATCH v14 2/3] Add eager and lazy VM strategies to VACUUM.

> Right now, we don't use syncscan to determine a startpoint. I can't
> find the reason why in the available documentation: [0] discusses the
> issue but without clearly describing an issue why it wouldn't be
> interesting from a 'nothing lost' perspective.

That's not something I've given much thought to. It's a separate issue, I think.

Though I will say that one reason why I think that the vm snapshot
concept will become important is that working off an immutable
structure makes various things much easier, in fairly obvious ways. It
makes it straightforward to reorder work. So things like parallel heap
vacuuming are a lot more straightforward.

I also think that it would be useful to teach VACUUM to speculatively
scan a random sample of pages, just like a normal VACUUM. We start out
doing a normal VACUUM that just processes scanned_pages in a random
order. At some point we look at the state of pages so far. If it looks
like the table really doesn't urgently need to be vacuumed, then we
can give up before paying much of a cost. If it looks like the table
really needs to be VACUUM'd, we can press on almost like any other
VACUUM would.

This is related to the problem of bad statistics that drive
autovacuum. Deciding as much as possible at runtime, dynamically,
seems promising to me.

> In addition, I noticed that progress reporting of blocks scanned
> ("heap_blocks_scanned", duh) includes skipped pages. Now that we have
> a solid grasp of how many blocks we're planning to scan, we can update
> the reported stats to how many blocks we're planning to scan (and have
> scanned), increasing the user value of that progress view.

Yeah, that's definitely a natural direction to go with this. Knowing
scanned_pages from the start is a basis for much more useful progress
reporting.

> > +doubletableagefrac;
>
> I think this can use some extra info on the field itself, that it is
> the fraction of how "old" the relfrozenxid and relminmxid fields are,
> as a fraction between 0 (latest values; nextXID and nextMXID), and 1
> (values that are old by at least freeze_table_age and
> multixact_freeze_table_age (multi)transaction ids, respectively).

Agreed that that needs more t

Re: Using WaitEventSet in the postmaster

2023-01-06 Thread Andres Freund
Hi,

On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:
> 1.  Is it OK that we are now using APIs that might throw, in places
> where we weren't?  I think so: we don't really expect WaitEventSet
> APIs to throw unless something is pretty seriously wrong, and if you
> hack things to inject failures there then you get a FATAL error and
> the postmaster exits and the children detect that.  I think that is
> appropriate.

I think it's ok in principle. It might be that we'll find something to fix in
the future, but I don't see anything fundamental or obvious.


> 2.  Is it really OK to delete the pqsignal_pm() infrastructure?  I
> think so.

Same.


> 3.  Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
> SIGINT?  If you send all of these extremely rapidly, it's
> indeterminate which one will be seen by handle_shutdown_request().

That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
shutdown request.


> I think that's probably acceptable?  To be strict about processing only the
> first one that is delivered, I think you'd need an sa_mask to block all
> three signals, and then you wouldn't change pending_shutdown_request if it's
> already set, which I'm willing to code up if someone thinks that's
> important.  (Ideally I would invent WL_SIGNAL to consume signal
> events serially without handlers or global variables.)

Hm. The need for blocking sa_mask solely comes from using one variable in
three signal handlers, right?  It's not pretty, but to me the easiest fix here
seems to be to have separate pending_{fast,smart,immediate}_shutdown_request
variables and deal with them in process_shutdown_request().  Might still make
sense to have one pending_shutdown_request variable, to avoid unnecessary
branches before calling process_shutdown_request().


> 5.  Is the signal mask being correctly handled during forking?  I
> think so: I decided to push the masking logic directly into the
> routine that forks, to make it easy to verify that all paths set the
> mask the way we want.

Hm. If I understand correctly, you used sigprocmask() directly (vs
PG_SETMASK()) in fork_process() because you want the old mask? But why do we
restore the prior mask, instead of just using PG_SETMASK(&UnBlockSig); as we
still do in a bunch of places in the postmaster?

Not that I'm advocating for that, but would there be any real harm in just
continuing to accept signals post fork? Now all the signal handlers should
just end up pointlessly setting a local variable that's not going to be read
any further?  If true, it'd be good to add a comment explaining that this is
just a belt-and-suspenders thing.


> (While thinking about that I noticed that signals don't seem to be initially
> masked on Windows; I think that's a pre-existing condition, and I assume we
> get away with it because nothing reaches the fake signal dispatch code soon
> enough to break anything?  Not changing that in this patch.)

It's indeed a bit odd that we do pgwin32_signal_initialize() before the
initmask() and PG_SETMASK(&BlockSig) in InitPostmasterChild(). I guess it's
kinda harmless though?


I'm now somewhat weirded out by the choice to do pg_strong_random_init() in
fork_process() rather than InitPostmasterChild(). Seems odd.


> 6.  Is the naming and style OK?  Better ideas welcome, but basically I
> tried to avoid all unnecessary refactoring and changes, so no real
> logic moves around, and the changes are pretty close to "mechanical".
> One bikeshed decision was what to call the {handle,process}_XXX
> functions and associated flags.

I wonder if it'd be good to have a _pm_ in the name.


> Maybe "action" isn't the best name;

Yea, I don't like it. A shutdown is also an action, etc. What about just using
_pmsignal_? It's a bit odd because there's two signals in the name, but it
still feels better than 'action' and better than the existing sigusr1_handler.


> +
> +/* I/O multiplexing object */
> +static WaitEventSet *wait_set;

I'd name it a bit more obviously connected to postmaster, particularly because
it does survive into forked processes and needs to be closed there.


> +/*
> + * Activate or deactivate notifications of server socket events.  Since we
> + * don't currently have a way to remove events from an existing WaitEventSet,
> + * we'll just destroy and recreate the whole thing.  This is called during
> + * shutdown so we can wait for backends to exit without accepting new
> + * connections, and during crash reinitialization when we need to start
> + * listening for new connections again.
> + */

I'd maybe reference that this gets cleaned up via ClosePostmasterPorts(), it's
not *immediately* obvious.


> +static void
> +ConfigurePostmasterWaitSet(bool accept_connections)
> +{
> + if (wait_set)
> + FreeWaitEventSet(wait_set);
> + wait_set = NULL;
> +
> + wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
> + AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch,

Re: New strategies for freezing, advancing relfrozenxid early

2023-01-06 Thread Peter Geoghegan
On Thu, Jan 5, 2023 at 10:19 AM Matthias van de Meent
 wrote:
> Could this use something like autovacuum_cost_delay? I don't quite
> like the use of arbitrary hardcoded millisecond delays

It's not unlike (say) the way that there can sometimes be hardcoded
waits inside GetMultiXactIdMembers(), which does run during VACUUM.

It's not supposed to be noticeable at all. If it is noticeable in any
practical sense, then the design is flawed, and should be fixed.

> it can slow a
> system down by a significant fraction, especially on high-contention
> systems, and this potential of 60ms delay per scanned page can limit
> the throughput of this new vacuum strategy to < 17 pages/second
> (<136kB/sec) for highly contended sections, which is not great.

We're only willing to wait the full 60ms when smaller waits don't work
out. And when 60ms doesn't do it, we'll then accept an older final
NewRelfrozenXid value. Our willingness to wait at all is conditioned
on the existing NewRelfrozenXid tracker being affected at all by
whether or not we accept reduced lazy_scan_noprune processing for the
page. So the waits are naturally self-limiting.

You may be right that I need to do more about the possibility of
something like that happening -- it's a legitimate concern. But I
think that this may be enough on its own. I've never seen a workload
where more than a small fraction of all pages couldn't be cleanup
locked right away. But I *have* seen workloads where VACUUM vainly
waited forever for a cleanup lock on one single heap page.

> It is also not unlikely that in the time it was waiting, the page
> contents were updated significantly (concurrent prune, DELETEs
> committed), which could result in improved bounds. I think we should
> redo the dead items check if we waited, but failed to get a lock - any
> tuples removed now reduce work we'll have to do later.

I don't think that it matters very much. That's always true. It seems
very unlikely that we'll get better bounds here, unless it happens by
getting a full cleanup lock and then doing full lazy_scan_prune
processing after all.

Sure, it's possible that a concurrent opportunistic prune could make
the crucial difference, even though we ourselves couldn't get a
cleanup lock despite going to considerable trouble. I just don't think
that it's worth doing anything about.

> > +++ b/doc/src/sgml/ref/vacuum.sgml
> > [...] Pages where
> > +  all tuples are known to be frozen are always skipped.
>
> "...are always skipped, unless the >DISABLE_PAGE_SKIPPING< option is used."

I'll look into changing this.

> > +++ b/doc/src/sgml/maintenance.sgml
>
> There are a lot of details being lost from the previous version of
> that document. Some of the details are obsolete (mentions of
> aggressive VACUUM and freezing behavior), but others are not
> (FrozenTransactionId in rows from a pre-9.4 system, the need for
> vacuum for prevention of issues surrounding XID wraparound).

I will admit that I really hate the "Routine Vacuuming" docs, and
think that they explain things in just about the worst possible way.

I also think that this needs to be broken up into pieces. As I said
recently, the docs are the part of the patch series that is the least
worked out.

> I also am not sure this is the best place to store most of these
> mentions, but I can't find a different place where these details on
> certain interesting parts of the system are documented, and plain
> removal of the information does not sit right with me.

I'm usually the person that argues for describing more implementation
details in the docs. But starting with low-level details here is
deeply confusing. At most these are things that should be discussed in
the context of internals, as part of some completely different
chapter.

I'll see about moving details of things like FrozenTransactionId somewhere else.

> Specifically, I don't like the removal of the following information
> from our documentation:
>
> - Size of pg_xact and pg_commit_ts data in relation to 
> autovacuum_freeze_max_age
>Although it is less likely with the new behaviour that we'll hit
> these limits due to more eager freezing of transactions, it is still
> important for users to have easy access to this information, and
> tuning this for storage size is not useless information.

That is a fair point. Though note that these things have weaker
relationships with settings like autovacuum_freeze_max_age now. Mostly
this is a positive improvement (in the sense that we can truncate
SLRUs much more aggressively on average), but not always.

> - The reason why VACUUM is essential to the long-term consistency of
> Postgres' MVCC system
> Informing the user about our use of 32-bit transaction IDs and
> that we update an epoch when this XID wraps around does not
> automatically make the user aware of the issues that surface around
> XID wraparound. Retaining the explainer for XID wraparound in the docs
> seems like a decent idea - it may be moved, but please do

Re: Transparent column encryption

2023-01-06 Thread Justin Pryzby
"ON (CASE WHEN a.attrealtypid <> 0 THEN a.attrealtypid ELSE a.atttypid END = 
t.oid)\n"

This breaks interoperability with older servers:
ERROR:  column a.attrealtypid does not exist

Same in describe.c

Find attached some typos and bad indentation.  I'm sending this off now
as I've already sat on it for 2 weeks since starting to look at the
patch.

-- 
Justin
>From d9dcf23d25ba4452fb12c4b065ab5215e2228882 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 6 Jan 2023 18:28:59 -0600
Subject: [PATCH] f!

---
 doc/src/sgml/datatype.sgml |  4 ++--
 doc/src/sgml/ddl.sgml  |  4 ++--
 doc/src/sgml/libpq.sgml|  2 +-
 doc/src/sgml/protocol.sgml |  2 +-
 src/bin/pg_dump/pg_dump.c  |  4 ++--
 src/interfaces/libpq/fe-exec.c | 12 ++--
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 56b2e1d0d1e..243e6861506 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -5369,7 +5369,7 @@ WHERE ...
 pg_encrypted_rnd (for randomized encryption) or
 pg_encrypted_det (for deterministic encryption); see .  Most of the database system treats
-this as normal types.  For example, the type pg_encrypted_det has
+these as normal types.  For example, the type pg_encrypted_det has
 an equals operator that allows lookup of encrypted values.  It is,
 however, not allowed to create a table using one of these types directly
 as a column type.
@@ -5383,7 +5383,7 @@ WHERE ...
 Clients that don't support transparent column encryption or have disabled
 it will see the encrypted values in this format.  Clients that support
 transparent data encryption will not see these types in result sets, as
-the protocol layer will translate them back to declared underlying type in
+the protocol layer will translate them back to the declared underlying type in
 the table definition.

 
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b8364a91f9a..a7624d6a60c 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1263,7 +1263,7 @@ CREATE TABLE customers (
randomized encryption, which is the default.  Randomized encryption uses a
random initialization vector for each encryption, so that even if the
plaintext of two rows is equal, the encrypted values will be different.
-   This prevents someone with direct access to the database server to make
+   This prevents someone with direct access to the database server from making
computations such as distinct counts on the encrypted values.
Deterministic encryption uses a fixed initialization vector.  This reduces
security, but it allows equality searches on encrypted values.  The
@@ -1540,7 +1540,7 @@ export PGCMKLOOKUP
 

 In general, column encryption is never a replacement for additional
-security and encryption techniques such as transmission encryption
+security and encryption techniques such as transport encryption
 (SSL/TLS), storage encryption, strong access control, and password
 security.  Column encryption only targets specific use cases and should be
 used in conjunction with additional security measures.
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index a2d413bafd5..9b7f76db7a9 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3012,7 +3012,7 @@ PGresult *PQexecParams(PGconn *conn,
 If encryption is forced for a parameter but the parameter does not
 correspond to an encrypted column on the server, then the call
 will fail and the parameter will not be sent.  This can be used
-for additional security against a comprimised server.  (The
+for additional security against a compromised server.  (The
 drawback is that application code then needs to be kept up to date
 with knowledge about which columns are encrypted rather than
 letting the server specify this.)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1a9b8abd7f2..caa6e3174ee 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -5807,7 +5807,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"

 
  If the field is encrypted, this specifies the identifier of the
- encrypt algorithm, else zero.
+ encryption algorithm, else zero.
 

   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f82c2496fd5..99f2583c34a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -690,7 +690,7 @@ main(int argc, char **argv)
 	 * --rows-per-insert were specified.
 	 */
 	if (dopt.cparams.column_encryption && dopt.dump_inserts == 0)
-		pg_fatal("option --decrypt_encrypted_columns requires option --inserts, --rows-per-insert, or --column-inserts");
+		pg_fatal("option --decrypt-encrypted-columns requires 

Re: add \dpS to psql

2023-01-06 Thread Nathan Bossart
On Fri, Jan 06, 2023 at 06:52:33PM +, Dean Rasheed wrote:
> Looking this over this, I have a couple of comments:

Thanks for reviewing.

> Firstly, I think it should allow \zS in the same fashion as \dpS,
> since \z is an alias for \dp, so the 2 should be kept in sync.

That seems reasonable to me.

> Secondly, I don't think the following is the right SQL clause to use
> in the absence of "S":
> 
> if (!showSystem && !pattern)
> appendPQExpBufferStr(&buf, "AND n.nspname !~ '^pg_'\n");
> 
> I know that's the condition it used before, but the problem with using
> that now is that it will cause temporary relations to be excluded
> unless the "S" modifier is used, which goes against the expectation
> that "S" just causes system relations to be included. Also, it fails
> to exclude information_schema relations, if that happens to be on the
> user's search_path.
> 
> So I think we should use the same SQL clauses as every other psql
> command that supports "S", namely:
> 
> if (!showSystem && !pattern)
> appendPQExpBufferStr(&buf, "  AND n.nspname <> 'pg_catalog'\n"
>  "  AND n.nspname <> 'information_schema'\n");

Good catch.  I should have noticed this.  The deleted comment mentions that
the system/temp tables normally aren't very interesting from a permissions
perspective, so perhaps there is an argument for always excluding temp
tables without a pattern.  After all, \dp always excludes indexes and TOAST
tables.  However, it looks like \dt includes temp tables, and I didn't see
any other meta-commands that excluded them.
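
A quick way to see the difference Dean describes (a sketch):

    CREATE TEMP TABLE tt (a int);
    GRANT SELECT ON tt TO PUBLIC;
    \dp
    -- with the old "n.nspname !~ '^pg_'" filter, tt is hidden because it
    -- lives in a pg_temp_N schema; with the proposed pg_catalog /
    -- information_schema filter it is listed like any other relation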

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Missing update of all_hasnulls in BRIN opclasses

2023-01-06 Thread Justin Pryzby
On Fri, Dec 30, 2022 at 01:18:36AM +0100, Tomas Vondra wrote:
> +  * Does the range already has NULL values? Either of the flags 
> can

should say: "already have NULL values"

> +  * If we had NULLS, and the opclass didn't set allnulls=true, 
> set
> +  * the hasnulls so that we know there are NULL values.

You could remove "the" before "hasnulls".
Or say "clear hasnulls so that.."

> @@ -585,6 +587,13 @@ brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple, 
> BrinMemTuple *dMemtuple)
>   {
>   int i;
>  
> + /*
> +  * Make sure to overwrite the hasnulls flag, because it was 
> initialized
> +  * to true by brin_memtuple_initialize and we don't want to 
> skip it if
> +  * allnulls.

Does "if allnulls" mean "if allnulls is true" ?
It's a bit unclear.

> +  */
> + dtup->bt_columns[keyno].bv_hasnulls = hasnulls[keyno];
> +
>   if (allnulls[keyno])
>   {
>   valueno += brdesc->bd_info[keyno]->oi_nstored;

-- 
Justin




Re: GROUP BY ALL

2023-01-06 Thread Andrey Borodin
On Fri, Jan 6, 2023 at 1:56 PM Bruce Momjian  wrote:
> Because Postgres requires GROUP BY
> of all non-aggregate columns of a target list, Postgres could certainly
> automatically generate the GROUP BY.  However, readers of the query
> might not easily distinguish function calls from aggregates, so in a way
> the GROUP BY is for the reader, not for the database server.
>

How about "SELECT a,b, count(*) FROM t GROUP AUTOMATICALLY;" ? And
then a shorthand for "SELECT a,b, count(*) FROM t GROUP;".
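
Whatever the spelling ends up being, the intended equivalence is roughly
this (a sketch; neither form of shorthand is accepted by PostgreSQL today):

    -- proposed shorthand
    SELECT a, b, count(*) FROM t GROUP BY ALL;

    -- what the parser would effectively expand it to
    SELECT a, b, count(*) FROM t GROUP BY a, b;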

Anyway, the problem is not in clever syntax, but in the fact that it's
an SQL extension, not a standard...

Best regards, Andrey Borodin.




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-06 Thread Nathan Bossart
On Fri, Jan 06, 2023 at 05:31:26PM -0500, Tom Lane wrote:
> I've pushed 0001 and 0002, which seem pretty uncontroversial.

Thanks!

> Attached is a rebased 0003, just to keep the cfbot happy.
> I'm kind of wondering whether 0003 is worth the complexity TBH,
> but in any case I ran out of time to look at it closely today.

Yeah.  It's not as bad as I was expecting, but it does add a bit more
complexity than is probably warranted.  I'm not wedded to this approach.

BTW I intend to start a new thread for the bugs I mentioned upthread that
were revealed by setting wal_retrieve_retry_interval to 1ms in the tests.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-06 Thread Imseih (AWS), Sami
> My point is whether we should show
> indexes_total throughout the vacuum execution (even also in not
>  relevant phases such as heap scanning/vacuum/truncation).

That is a good point. We should show indexes_total
and indexes_completed only during the relevant phases.

V21 addresses this along with a documentation fix.

indexes_total and indexes_completed can only be a value > 0 while in the
"vacuuming indexes" or "cleaning up indexes" phases of vacuum.

indexes_total is set at the start of an index vacuum cycle or index cleanup,
and both it and indexes_completed are set back to 0 at the end of the index
vacuum cycle or index cleanup.

Also, for the progress updates in vacuumlazy.c that should be done atomically,
I made a change to use pgstat_progress_update_multi_param.
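
For reference, a monitoring query using the two new columns might look
like this (a sketch; indexes_total and indexes_completed are the column
names added by this patch, not existing view columns):

    SELECT pid, datname, relid::regclass AS target_table,
           phase, indexes_total, indexes_completed
    FROM pg_stat_progress_vacuum;

Both columns read as 0 outside the "vacuuming indexes" and "cleaning up
indexes" phases, per the behavior described above.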

Regards,

--
Sami Imseih
Amazon Web Services: https://aws.amazon.com



v21-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v21-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: Is OpenSSL AES-NI not available in pgcrypto?

2023-01-06 Thread Bruce Momjian
On Mon, Jan  2, 2023 at 05:57:38PM +0100, aghart...@gmail.com wrote:
> So, a test with pgcrypto:
> 
> select pgp_sym_encrypt(data::text, 'pwd') --default to aes128
> from generate_series('2022-01-01'::timestamp, '2022-12-31'::timestamp, '1
> hour'::interval) data
> 
> vs
> 
> select pgp_sym_encrypt(data::text, 'pwd','cipher-algo=bf') -- blowfish
> from generate_series('2022-01-01'::timestamp, '2022-12-31'::timestamp, '1
> hour'::interval) data

To see the difference, I think you need to construct a single large
query that calls many pgcrypto functions, with a small return result, so
the network, parsing, and optimizer overhead are minimal compared to the
OpenSSL overhead.
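
For example, something along these lines keeps the result set tiny while
still exercising the cipher many times (a sketch; adjust the row count to
taste):

    -- AES-128 (the default)
    SELECT count(pgp_sym_encrypt(g::text, 'pwd'))
    FROM generate_series(1, 100000) AS g;

    -- Blowfish, for comparison
    SELECT count(pgp_sym_encrypt(g::text, 'pwd', 'cipher-algo=bf'))
    FROM generate_series(1, 100000) AS g;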

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-06 Thread David Rowley
On Thu, 5 Jan 2023 at 04:11, Ankit Kumar Pandey  wrote:
>
> Attaching test cases for this (+ small change in doc).
>
> Tested this in one of WIP branch where I had modified
> select_active_windows and it failed
>
> as expected.
>
> Please let me know if something can be improved in this.

Thanks for writing that.

I had a look over the patch and ended up making some adjustments to
the tests. Looking back at 728202b63, I think any tests we add here
should be kept alongside the tests added by that commit rather than
tacked on to the end of the test file.  It also makes sense to me just
to use the same table as the original tests. I also thought the
comment in select_active_windows should be in the sort comparator
function instead. I think that's a more likely place to capture the
attention of anyone making modifications.

I've now pushed the adjusted patch.

David




Re: Support for dumping extended statistics

2023-01-06 Thread Bruce Momjian
On Thu, Jan  5, 2023 at 06:29:03PM +, Hari krishna Maddileti wrote:
> Hi Team,
> In order to restore dumped extended statistics (stxdndistinct,
> stxddependencies, stxdmcv) we need to provide input functions to parse
> pg_distinct/pg_dependency/pg_mcv_list strings.
> 
> Today we get the ERROR "cannot accept a value of type pg_ndistinct/
> pg_dependencies/pg_mcv_list" when we try to do an insert of any type.
> 
> Approach tried:
> 
> - Using yacc grammar file (statistics_gram.y) to parse the input string to its
> internal format for the types pg_distinct and pg_dependencies
> 
> - We are just calling byteain() for serialized input text of type pg_mcv_list.
> 
> Currently the changes are working locally.  I would like to push the commit
> changes upstream if there is any use case for Postgres.  I would like to know if
> there is any interest from the Postgres side.

There is certainly interest in allowing the optimizer statistics to be
dumped and reloaded.  This could be used by pg_restore and pg_upgrade.
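
To illustrate the current gap: the extended-statistics types have output
functions, so the values can be dumped as text, but there is no input
function to load them back (a sketch):

    CREATE TABLE t (a int, b int);
    CREATE STATISTICS s1 (ndistinct, dependencies) ON a, b FROM t;
    INSERT INTO t SELECT i, i % 10 FROM generate_series(1, 1000) i;
    ANALYZE t;

    -- readable output is available ...
    SELECT stxdndistinct, stxddependencies
    FROM pg_statistic_ext_data d
    JOIN pg_statistic_ext e ON d.stxoid = e.oid
    WHERE e.stxname = 's1';

    -- ... but feeding a value back in fails with
    -- "cannot accept a value of type pg_ndistinct"
    SELECT '{"3, 4": 10}'::pg_ndistinct;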

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-06 Thread David Rowley
On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey  wrote:
> > On 05/01/23 12:53, David Rowley wrote:
> >>
> >> We *can* reuse Sorts where a more strict or equivalent sort order is
> >> available.  The question is how do we get the final WindowClause to do
> >> something slightly more strict to save having to do anything for the
> >> ORDER BY.  One way you might think would be to adjust the
> >> WindowClause's orderClause to add the additional clauses, but that
> >> cannot be done because that would cause are_peers() in nodeWindowAgg.c
> >> to not count some rows as peers when they maybe should be given a less
> >> strict orderClause in the WindowClause.
>
> I attempted this in attached patch.

I had a quick look at this and it's going to need some additional code
to ensure there are no WindowFuncs in the ORDER BY clause.  We can't
sort on those before we evaluate them.

Right now you get:

postgres=# explain select *,row_number() over (order by oid) rn1 from
pg_class order by oid,rn1;
ERROR:  could not find pathkey item to sort

I also don't think there's any point in adding the additional pathkeys
when the input path is already presorted.  Have a look at:

postgres=# set enable_seqscan=0;
SET
postgres=# explain select *,row_number() over (order by oid) rn1 from
pg_class order by oid,relname;
 QUERY PLAN

 WindowAgg  (cost=0.43..85.44 rows=412 width=281)
   ->  Incremental Sort  (cost=0.43..79.26 rows=412 width=273)
 Sort Key: oid, relname
 Presorted Key: oid
 ->  Index Scan using pg_class_oid_index on pg_class
(cost=0.27..60.72 rows=412 width=273)
(5 rows)

It would be better to leave this case alone and just do the
incremental sort afterwards.

You also don't seem to be considering the fact that the query might
have a DISTINCT clause.  That's evaluated between window functions and
the order by. It would be fairly useless to do a more strict sort when
the sort order is going to be obliterated by a Hash Aggregate. Perhaps
we can just not do this when the query has a DISTINCT clause.

On the other hand, there are also a few reasons why we shouldn't do
this. I mentioned the WindowClause runConditions earlier here.

The patched version produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
  QUERY PLAN
--
 WindowAgg (actual time=0.488..0.497 rows=9 loops=1)
   Run Condition: (row_number() OVER (?) < 10)
   ->  Sort (actual time=0.466..0.468 rows=10 loops=1)
 Sort Key: pg_class.oid, pg_class.relname
 Sort Method: quicksort  Memory: 67kB
 ->  Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1)
 Planning Time: 0.214 ms
 Execution Time: 0.581 ms
(8 rows)

Whereas master produces:

postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
   QUERY PLAN

 Incremental Sort (actual time=0.506..0.508 rows=9 loops=1)
   Sort Key: pg_class.oid, pg_class.relname
   Presorted Key: pg_class.oid
   Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB
Peak Memory: 26kB
   ->  WindowAgg (actual time=0.475..0.483 rows=9 loops=1)
 Run Condition: (row_number() OVER (?) < 10)
 ->  Sort (actual time=0.461..0.461 rows=10 loops=1)
   Sort Key: pg_class.oid
   Sort Method: quicksort  Memory: 67kB
   ->  Seq Scan on pg_class (actual time=0.022..0.178
rows=420 loops=1)
 Planning Time: 0.245 ms
 Execution Time: 0.594 ms
(12 rows)

(slightly bad example since oid is unique but...)

It's not too clear to me that the patched version is a better plan.
The bottom level sort, which sorts 420 rows has a more complex
comparison to do. Imagine the 2nd column is a long text string. That
would make the sort much more expensive.  The top-level sort has far
fewer rows to sort due to the runCondition filtering out anything that
does not match rn1 < 10. The same can be said for a query with a LIMIT
clause.

I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together.  That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.

Also, since I seem to only be able to think of these cases properly
by actually trying them, I ended up with the attached patch.  I opted
to not do the optimisation when there are runConditions or a LIMIT
clause.  Doing it when there are runConditions rea

Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-06 Thread Dilip Kumar
On Fri, Jan 6, 2023 at 3:38 PM houzj.f...@fujitsu.com
 wrote:
>

Looks good, but I feel that in the pa_process_spooled_messages_if_required()
function, after getting the filestate, the first check should be if
(filestate == FS_EMPTY) return false.  I mean, why process through
all the states if it is empty when we can exit directly?  It is not a
big deal, so if you prefer the way it is then I have no objection to
it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: allowing for control over SET ROLE

2023-01-06 Thread Noah Misch
On Wed, Jan 04, 2023 at 03:56:34PM -0500, Robert Haas wrote:
> On Tue, Jan 3, 2023 at 5:03 PM Noah Misch  wrote:
> > I'd start with locations where the patch already added documentation.  In 
> > the
> > absence of documentation otherwise, a reasonable person could think WITH SET
> > controls just SET ROLE.  The documentation of WITH SET is a good place to 
> > list
> > what else you opted for it to control.  If the documentation can explain the
> > set of principles that would be used to decide whether WITH SET should 
> > govern
> > another thing in the future, that would provide extra value.
> 
> From the point of view of the code, we currently have four different
> functions that make inquiries about role membership:
> has_privs_of_role, is_member_of_role, is_member_of_role_nosuper, and
> member_can_set_role.
> 
> I spent a while looking at how has_privs_of_role() is used. Basically,
> there are three main patterns. First, in some places, you must have
> the privileges of a certain role (typically, either a predefined role
> or the role that owns some object) or the operation will fail with an
> error indicating that you don't have sufficient permissions. Second,
> there are places where having the privileges of a certain role exempts
> you from some other permissions check; if you have neither, you'll get
> an error. An example is that having the permissions of
> pg_read_all_data substitutes for a select privilege. And third, there
> are cases where you definitely won't get an error, but the behavior
> will vary depending on whether you have the privileges of some role.
> For instance, you can see more data in pg_stat_replication,
> pg_stat_wal_receiver, and other stats views if you have
> pg_read_all_stats. The GUC values reported in EXPLAIN output will
> exclude superuser-only values unless you have pg_read_all_settings. It
> looks like some maintenance commands like CLUSTER and VACUUM
> completely skip over, or just warn about, cases where permission is
> lacking. And weirdest of all, having the privileges of a role means
> that the RLS policies applied to that role also apply to you. That's
> odd because it makes permissions not strictly additive.
> 
> member_can_set_role() controls (a) whether you can SET ROLE to some
> other role, (b) whether you can alter the owner of an existing object
> to that role, and (c) whether you can create an object owned by some
> other user in cases where the CREATE command has an option for that,
> like CREATE DATABASE ... OWNER.
> 
> is_member_of_role_nosuper() is used to prevent creation of role
> membership loops, and for pg_hba.conf matching.
> 
> The only remaining call to is_member_of_role() is in
> pg_role_aclcheck(), which just supports the SQL-callable
> pg_has_role(). has_privs_of_role() and member_can_set_role() are used
> here, too.
> 
> How much of this should we document, do you think?

Rough thoughts:

Do document:
- For pg_read_all_stats, something like s/Read all pg_stat_/See all rows of all 
pg_stat_/
- At CREATE POLICY and/or similar places, explain the semantics used to judge
  the applicability of role_name to a given query.

Don't document:
- Mechanism for preventing membership loops.

Already documented adequately:
- "First, in some places, you must have the privileges of a certain role" is
  documented through language like "You must own the table".
- pg_read_all_data
- EXPLAIN.  I'm not seeing any setting that's both GUC_SUPERUSER_ONLY and
  GUC_EXPLAIN.
- SQL-level pg_has_role().

Unsure:
- At INHERIT, cover the not-strictly-additive RLS consequences.

> If we're going to
> go into the details, I sort of feel like it would be good to somehow
> contrast what is attached to membership with what is attached to the
> INHERIT option or the SET option.

Works for me.

> I think it would be slightly
> surprising not to mention the way that RLS rules are triggered by
> privilege inheritance yet include the fact that the SET option affects
> ALTER ... OWNER TO, but maybe I've got the wrong idea.

The CREATE POLICY syntax and docs show the role_name parameter, though they
don't detail how exactly the server determines whether a given role applies at
a given moment.  The docs are silent on the SET / OWNER TO connection.  Hence,
I think the doc gap around SET / OWNER TO is more acute than the doc gap
around this RLS behavior.
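
For concreteness, the SET / OWNER TO connection in question looks roughly
like this (a sketch, using the WITH SET grant option from this patch
series; role and table names are made up):

    GRANT alice TO bob WITH INHERIT TRUE, SET FALSE;

    -- connected as bob, who now inherits alice's privileges:
    SET ROLE alice;                          -- rejected: membership lacks SET
    ALTER TABLE bobs_table OWNER TO alice;   -- a table bob owns; rejected for
                                             -- the same reason

The same membership option governs both operations, which is the part the
documentation currently leaves unstated.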

Thanks,
nm




Re: add PROCESS_MAIN to VACUUM

2023-01-06 Thread Nathan Bossart
rebased for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
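
For reference, a minimal sketch of how the new option is meant to be used
(syntax per the attached patch; the table name is made up):

    -- skip the main relation and only process its TOAST table
    VACUUM (PROCESS_MAIN FALSE) sometab;

    -- default behavior is unchanged: PROCESS_MAIN defaults to true
    VACUUM (PROCESS_TOAST FALSE) sometab;
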
>From 7e72b0a9f06fdfa00d5320d4c3303e67788878aa Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 29 Dec 2022 15:31:49 -0800
Subject: [PATCH v2 1/1] add PROCESS_MAIN to VACUUM

---
 doc/src/sgml/ref/vacuum.sgml | 13 +
 doc/src/sgml/ref/vacuumdb.sgml   | 15 +++
 src/backend/commands/vacuum.c| 27 +--
 src/backend/postmaster/autovacuum.c  |  4 +++-
 src/bin/psql/tab-complete.c  |  4 ++--
 src/bin/scripts/t/100_vacuumdb.pl|  7 +++
 src/bin/scripts/vacuumdb.c   | 24 
 src/include/commands/vacuum.h|  1 +
 src/test/regress/expected/vacuum.out |  4 
 src/test/regress/sql/vacuum.sql  |  5 +
 10 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 8fa8421847..4266f0c985 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -33,6 +33,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ]
 SKIP_LOCKED [ boolean ]
 INDEX_CLEANUP { AUTO | ON | OFF }
+PROCESS_MAIN [ boolean ]
 PROCESS_TOAST [ boolean ]
 TRUNCATE [ boolean ]
 PARALLEL integer
@@ -238,6 +239,18 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ defname, "process_toast") == 0)
 			process_toast = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "truncate") == 0)
@@ -224,7 +227,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0) |
 		(process_toast ? VACOPT_PROCESS_TOAST : 0) |
 		(skip_database_stats ? VACOPT_SKIP_DATABASE_STATS : 0) |
-		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0);
+		(only_database_stats ? VACOPT_ONLY_DATABASE_STATS : 0) |
+		(process_main ? VACOPT_PROCESS_MAIN : 0);
 
 	/* sanity checks on options */
 	Assert(params.options & (VACOPT_VACUUM | VACOPT_ANALYZE));
@@ -365,9 +369,10 @@ vacuum(List *relations, VacuumParams *params,
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 	 errmsg("ONLY_DATABASE_STATS cannot be specified with a list of tables")));
-		/* don't require people to turn off PROCESS_TOAST explicitly */
+		/* don't require people to turn off PROCESS_TOAST/MAIN explicitly */
 		if (params->options & ~(VACOPT_VACUUM |
 VACOPT_VERBOSE |
+VACOPT_PROCESS_MAIN |
 VACOPT_PROCESS_TOAST |
 VACOPT_ONLY_DATABASE_STATS))
 			ereport(ERROR,
@@ -2026,10 +2031,12 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/*
 	 * Remember the relation's TOAST relation for later, if the caller asked
 	 * us to process it.  In VACUUM FULL, though, the toast table is
-	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
+	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it
+	 * unless PROCESS_MAIN is disabled.
 	 */
 	if ((params->options & VACOPT_PROCESS_TOAST) != 0 &&
-		(params->options & VACOPT_FULL) == 0)
+		((params->options & VACOPT_FULL) == 0 ||
+		 (params->options & VACOPT_PROCESS_MAIN) == 0))
 		toast_relid = rel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -2048,7 +2055,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (params->options & VACOPT_FULL)
+	if (params->options & VACOPT_FULL &&
+		params->options & VACOPT_PROCESS_MAIN)
 	{
 		ClusterParams cluster_params = {0};
 
@@ -2062,7 +2070,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, &cluster_params);
 	}
-	else
+	else if (params->options & VACOPT_PROCESS_MAIN)
 		table_relation_vacuum(rel, params, vac_strategy);
 
 	/* Roll back any GUC changes executed by index functions */
@@ -2089,7 +2097,14 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	 * totally unimportant for toast relations.
 	 */
 	if (toast_relid != InvalidOid)
+	{
+		/* we force VACOPT_PROCESS_MAIN so vacuum_rel() processes it */
+		bool force_opt = ((params->options & VACOPT_PROCESS_MAIN) == 0);
+
+		params->options |= force_opt ? VACOPT_PROCESS_MAIN : 0;
 		vacuum_rel(toast_relid, NULL, params);
+		params->options &= force_opt ? ~VACOPT_PROCESS_MAIN : ~0;
+	}
 
 	/*
 	 * Now release the session-level lock on the main table.
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f5ea381c53..12dcb2b762 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2860,7 +2860,9 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		 * skip vac_update_datfrozenxid(); we'll do that separately.
 		 */
 		tab->at_params.options =
-			(dovacuum ? (VACOPT_VACUUM | VACOPT_SKIP_DATABASE_STATS) : 0) |
+			(dovacuum ? (VACOPT_VACUUM |
+		 VAC

Re: Using WaitEventSet in the postmaster

2023-01-06 Thread Thomas Munro
On Sat, Jan 7, 2023 at 12:25 PM Andres Freund  wrote:
> On 2023-01-07 11:08:36 +1300, Thomas Munro wrote:
> > 3.  Is it OK to clobber the shared pending flag for SIGQUIT, SIGTERM,
> > SIGINT?  If you send all of these extremely rapidly, it's
> > indeterminate which one will be seen by handle_shutdown_request().
>
> That doesn't seem optimal. I'm mostly worried that we can end up downgrading a
> shutdown request.

I was contemplating whether I needed to do some more push-ups to
prefer the first delivered signal (instead of the last), but you're
saying that it would be enough to prefer the fastest shutdown type, in
cases where more than one signal was handled between server loops.
WFM.

> It's not pretty, but to me the easiest fix here
> seems to be have separate pending_{fast,smart,immediate}_shutdown_request
> variables and deal with them in process_shutdown_request().  Might still make
> sense to have one pending_shutdown_request variable, to avoid unnecessary
> branches before calling process_shutdown_request().

OK, tried that way.

> > 5.  Is the signal mask being correctly handled during forking?  I
> > think so: I decided to push the masking logic directly into the
> > routine that forks, to make it easy to verify that all paths set the
> > mask the way we want.
>
> Hm. If I understand correctly, you used sigprocmask() directly (vs
> PG_SETMASK()) in fork_process() because you want the old mask? But why do we
> restore the prior mask, instead of just using PG_SETMASK(&UnBlockSig); as we
> still do in a bunch of places in the postmaster?

It makes zero difference in practice but I think it's a nicer way to
code it because it doesn't make an unnecessary assumption about what
the signal mask was on entry.

> Not that I'm advocating for that, but would there be any real harm in just
> continuing to accept signals post fork? Now all the signal handlers should
> just end up pointlessly setting a local variable that's not going to be read
> any further?  If true, it'd be good to add a comment explaining that this is
> just a belt-and-suspenders thing.

Seems plausible and a nice idea to research.  I think it might take
some analysis of important signals that children might miss before
they install their own handlers.  Comment added.

> > 6.  Is the naming and style OK?  Better ideas welcome, but basically I
> > tried to avoid all unnecessary refactoring and changes, so no real
> > logic moves around, and the changes are pretty close to "mechanical".
> > One bikeshed decision was what to call the {handle,process}_XXX
> > functions and associated flags.
>
> I wonder if it'd be good to have a _pm_ in the name.

I dunno about this one, it's all static stuff in a file called
postmaster.c and one (now) already has pm in it (see below).

> > Maybe "action" isn't the best name;
>
> Yea, I don't like it. A shutdown is also an action, etc. What about just using
> _pmsignal_? It's a it odd because there's two signals in the name, but it
> still feels better than 'action' and better than the existing sigusr1_handler.

Done.

> > +
> > +/* I/O multiplexing object */
> > +static WaitEventSet *wait_set;
>
> I'd name it a bit more obviously connected to postmaster, particularly because
> it does survive into forked processes and needs to be closed there.

Done, as pm_wait_set.

> > +/*
> > + * Activate or deactivate notifications of server socket events.  Since we
> > + * don't currently have a way to remove events from an existing 
> > WaitEventSet,
> > + * we'll just destroy and recreate the whole thing.  This is called during
> > + * shutdown so we can wait for backends to exit without accepting new
> > + * connections, and during crash reinitialization when we need to start
> > + * listening for new connections again.
> > + */
>
> I'd maybe reference that this gets cleaned up via ClosePostmasterPorts(), it's
> not *immediately* obvious.

Done.

> > +static void
> > +ConfigurePostmasterWaitSet(bool accept_connections)
> > +{
> > + if (wait_set)
> > + FreeWaitEventSet(wait_set);
> > + wait_set = NULL;
> > +
> > + wait_set = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
> > + AddWaitEventToSet(wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, 
> > NULL);
>
> Is there any reason to use MAXLISTEN here? We know how many sockets w're
> listening on by this point, I think? No idea if the overhead matters anywhere,
> but ...

Fixed.

> >   /*
> > -  * New connection pending on any of our sockets? If so, fork 
> > a child
> > -  * process to deal with it.
> > +  * Latch set by signal handler, or new connection pending on 
> > any of
> > +  * our sockets? If the latter, fork a child process to deal 
> > with it.
> >*/
> > - if (selres > 0)
> > + for (int i = 0; i < nevents; i++)
> >   {
>
> Hm. This is preexisting behaviour, but now it seems somewhat odd that we might
> end up hap

Re: Schema variables - new implementation for Postgres 15 (typo)

2023-01-06 Thread Julien Rouhaud
Hi,

On Fri, Jan 06, 2023 at 08:02:41PM +0100, Pavel Stehule wrote:
On Fri, Jan 6, 2023 at 5:11 AM Julien Rouhaud  wrote:
>
> >
> > +Datum
> > +GetSessionVariableWithTypeCheck(Oid varid, bool *isNull, Oid
> > expected_typid)
> > +{
> > +   SVariable   svar;
> > +
> > +   svar = prepare_variable_for_reading(varid);
> > +   Assert(svar != NULL && svar->is_valid);
> > +
> > +   return CopySessionVariableWithTypeCheck(varid, isNull,
> > expected_typid);
> > +
> > +   if (expected_typid != svar->typid)
> > +   elog(ERROR, "type of variable \"%s.%s\" is different than
> > expected",
> > +
> > get_namespace_name(get_session_variable_namespace(varid)),
> > +get_session_variable_name(varid));
> > +
> > +   *isNull = svar->isnull;
> > +
> > +   return svar->value;
> > +}
> >
> > there's a unconditional return in the middle of the function, which also
> > looks
> > like a thinko during a rebase since CopySessionVariableWithTypeCheck mostly
> > contains the same following code.
> >
>
> This looks like my mistake - originally I fixed an issue related to the
> usage of session variables from plpgsql, and I forced a returned copy of the
> session variable's value.  Now, the function
> GetSessionVariableWithTypeCheck is not used anywhere.

Oh I didn't check if it was used in the patchset.

> It can be used only
> from extensions, where it is ensured that a) the value is not changed, and b)
> during the lifetime of the returned value no query or expression is executed
> that can change the value of this variable. I fixed this code and enhanced the
> comments. I am not sure whether this function should be removed. It is not
> used by the backend, but it can be handy for extensions - it avoids a possibly
> useless copy.

Hmm, how safe is it for third-party code to access the stored data directly
rather than a copy?  If it makes extensions fragile when they're not careful
enough with cache invalidation, or even gives them a way to mess with the
data directly, it's probably not a good idea to provide such an API.


> > I'm also wondering if there should be additional tests based on the last
> > scenario reported by Dmitry? (I don't see any new or similar test, but I
> > may
> > have missed it).
> >
>
> The scenario reported by Dmitry is in tests already.

Oh, so I missed it; sorry about that.  I did some testing using
debug_discard_caches in the past and didn't run into this issue, so it's
probably due to more recent changes, or my testing predates such a test being added.

> I am not sure if I
> have to repeat it with active debug_discard_cache. I expect this mode will
> be activated in some testing environments.

Yes, some buildfarm animals are configured to run with various
debug_discard_caches settings, so there's no need to override it in individual
tests (it makes testing really slow, which would be annoying for everyone,
including old/slow buildfarm animals).

> I have no idea how to simply emulate this issue without
> debug_discard_caches set to 1. The sinval message has to arrive
> exactly when the variable is checked against the system catalog.

Manual testing with debug_discard_caches set locally should be enough.

> updated patches attached

Thanks!




Re: Is OpenSSL AES-NI not available in pgcrypto?

2023-01-06 Thread agharta agharta
Hi Bruce,
Thanks for the reply.

I've given up: I found a slide on the Percona site about pgcrypto that said
the developers of the plugin intentionally introduced time-consuming code to
prevent brute-force attacks.

My queries involve pgcrypto for only a small number of records (about 2000),
so in the end the execution time remains the same, sadly.

Now my hopes are in TDE. I hope to see that feature in PostgreSQL soon.

Many thanks again to all for the support!

Have a nice day,
Agharta


On Sat, Jan 7, 2023 at 03:13 Bruce Momjian  wrote:

> On Mon, Jan  2, 2023 at 05:57:38PM +0100, aghart...@gmail.com wrote:
> > So, a test with pgcrypto:
> >
> > select pgp_sym_encrypt(data::text, 'pwd') --default to aes128
> > from generate_series('2022-01-01'::timestamp, '2022-12-31'::timestamp, '1
> > hour'::interval) data
> >
> > vs
> >
> > select pgp_sym_encrypt(data::text, 'pwd','cipher-algo=bf') -- blowfish
> > from generate_series('2022-01-01'::timestamp, '2022-12-31'::timestamp, '1
> > hour'::interval) data
>
> To see the difference, I think you need to construct a single large
> query that calls many pgcrypto functions, with a small return result, so
> the network, parsing, and optimizer overhead are minimal compared to the
> OpenSSL overhead.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
> Embrace your flaws.  They make you human, rather than perfect,
> which you will never be.
>


Re: pglz compression performance, take two

2023-01-06 Thread Andrey Borodin
On Sun, Nov 27, 2022 at 10:43 AM Andrey Borodin  wrote:
>
> PFA review fixes (step 1 is unchanged).

Hello! Please find attached v8.
Changes are mostly cosmetic:
1. The two steps from the previous message were squashed together.
2. I tried to write a better commit message.

Thanks!

Best regards, Andrey Borodin.


v8-0001-Reorganize-pglz-compression-code.patch
Description: Binary data


Re: Generating code for query jumbling through gen_node_support.pl

2023-01-06 Thread Peter Eisentraut

On 07.12.22 08:56, Michael Paquier wrote:

The location of the Nodes is quite invasive because we only care about
that for T_Const now in the query jumbling, and this could be
compensated with a third pg_node_attr() that tracks for the "int
location" of a Node whether it should participate in the jumbling or
not.


The generation script already has a way to identify location fields, by 
($t eq 'int' && $f =~ 'location$'), so you could use that as well.



There is also an argument where we would want to not include by
default new fields added to a Node, but that would require added more
pg_node_attr() than what's done here.


I'm concerned about the large number of additional field annotations 
this adds.  We have been careful so far to document the use of each 
attribute, e.g., *why* does a field not need to be copied etc.  This 
patch adds dozens and dozens of annotations without any explanation at 
all.  Now, the code this replaces also has no documentation, but maybe 
this is the time to add some.





Re: [PATCH] Const'ify the arguments of ilist.c/ilist.h functions

2023-01-06 Thread Peter Eisentraut

On 23.11.22 14:57, Aleksander Alekseev wrote:

Hi Andres,

Thanks for the review!


I don't think it is correct for any of these to add const. The only reason it
works is because of casting etc.


Fair enough. PFA the corrected patch v2.


This patch version looks correct to me.  It is almost the same as the 
one that Andres had posted in his thread, except that yours also 
modifies slist_delete() and dlist_member_check().  Both of these changes 
also look correct to me.