Re: Conflict Detection and Resolution

2024-06-17 Thread Dilip Kumar
On Mon, Jun 17, 2024 at 5:38 AM Tomas Vondra
 wrote:
>

> > The issue with using commit timestamps is that, when multiple nodes
> > are involved, the commit timestamp won't accurately represent the
> > actual order of operations. There's no reliable way to determine the
> > perfect order of each operation happening on different nodes roughly
> > simultaneously unless we use some globally synchronized counter.
> > Generally, that order might not cause real issues unless one operation
> > is triggered by a previous operation, and relying solely on physical
> > timestamps would not detect that correctly.
> >
> This whole conflict detection / resolution proposal is based on using
> commit timestamps. Aren't you suggesting it can't really work with
> commit timestamps?
>
> FWIW there are ways to build distributed consistency with timestamps,
> as long as it's monotonic - e.g. clock-SI does that. It's not perfect,
> but it shows it's possible.

Hmm, I see that clock-SI does this by delaying the transaction when it
detects the clock skew.

> However, I'm not sure we have to go there - it depends on what the goal is.
> For a one-directional replication (multiple nodes replicating to the
> same target) it might be sufficient if the conflict resolution is
> "deterministic" (e.g. not dependent on the order in which the changes
> are applied). I'm not sure, but it's why I asked what's the goal in my
> very first message in this thread.

I'm not completely certain about this.  Even in one-directional
replication, if multiple nodes are sending data, how can we guarantee
determinism in the presence of clock skew unless we use some other
mechanism, like logical counters or something like what clock-SI is
doing?  I don't want to insist on using any specific solution here.
However, I noticed that we haven't addressed how we plan to manage
clock skew, which is my primary concern. I believe that if multiple
nodes are involved and we're receiving data from them with
unsynchronized clocks, ensuring determinism about their order will
require us to take some measures to handle that.

> > We need some sort of logical counter, such as a vector clock, which
> > might be an independent counter on each node but can perfectly track
> > the causal order. For example, if NodeA observes an operation from
> > NodeB with a counter value of X, NodeA will adjust its counter to X+1.
> > This ensures that if NodeA has seen an operation from NodeB, its next
> > operation will appear to have occurred after NodeB's operation.
> >
> > I admit that I haven't fully thought through how we could design such
> > version tracking in our logical replication protocol or how it would
> > fit into our system. However, my point is that we need to consider
> > something beyond commit timestamps to achieve reliable ordering.
> >
>
> I can't really respond to this as there's no suggestion how it would be
> implemented in the patch discussed in this thread.
>
No worries, I'll consider whether finding such a solution is feasible
for our situation. Thank you!
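
For reference, the counter-adjustment rule I was describing above is
roughly the following (a minimal Lamport-style sketch; the names are made
up for illustration and this is not a protocol proposal):

#include "postgres.h"

/* Sketch only: a per-node logical counter, not taken from any patch. */
static uint64 local_counter = 0;

/* Each local operation gets a strictly increasing counter value. */
static uint64
next_local_counter(void)
{
	return local_counter++;
}

/*
 * After observing a remote operation with counter value X, make sure our
 * next operation is ordered after it, i.e. gets a value greater than X.
 */
static void
observe_remote_counter(uint64 remote_counter)
{
	if (local_counter <= remote_counter)
		local_counter = remote_counter + 1;
}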

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: RFC: adding pytest as a supported test framework

2024-06-17 Thread Matthias van de Meent
Hi Greg, Jelte,

On Sat, 15 Jun 2024 at 23:53, Greg Sabino Mullane 
wrote:
>
> On Sat, Jun 15, 2024 at 12:48 PM Jelte Fennema-Nio 
wrote:
>>
>> Afaict, there's a significant part of our current community who feel the
>> same way (and I'm pretty sure every sub-30 year old person who
>> newly joins the community would feel the exact same way too).

I feel I'm still relatively new (started in the past 4 years) and I have
quite some time left before I hit 30 years of age.

I don't specifically feel the way you describe, nor do I think I've ever
really felt like that in the previous nearly 4 years of hacking. Then
again, I'm not interested in testing frameworks, so I don't feel much at
all about which test frameworks we use.

> Those young-uns are also the same group who hold their nose when coding
> in C, and are always clamoring for rewriting Postgres in Rust.

Could you point me to one occasion I have 'always' clamored for this, or
any of "those young-uns" in the community? I may not be a huge fan of C,
but rewriting PostgreSQL in [other language] is not on the list of things
I'm clamoring for. I may have given off-hand mentions that [other language]
would've helped in certain cases, sure, but I'd hardly call that clamoring.

Kind regards,

Matthias van de Meent


Re: Pgoutput not capturing the generated columns

2024-06-17 Thread Peter Smith
Hi, here are my review comments for patch v7-0001.

==
1. GENERAL - \dRs+

Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
(e.g. \dRs+ mysub) the same as the other boolean parameters?

==
Commit message

2.
When 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list

~

Maybe you don't need to mention "PUBLICATION col-list" twice.

SUGGESTION
When 'include_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

~~~

2.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=
'publication pub1;

~

2a.
(I've questioned this one in previous reviews)

What exactly is the purpose of this statement in the commit message?
Was this supposed to demonstrate the usage of the
'include_generated_columns' parameter?

~

2b.
/publication/ PUBLICATION/


~~~

3.
If the subscriber-side column is also a generated column then
thisoption has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

~

Missing space: /thisoption/this option/

==
.../expected/decoding_into_rel.out

4.
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+-
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

Why is the default value here equivalent to
'include-generated-columns' = '1' instead of '0'? The default for
the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
false, and IMO it seems confusing for these 2 defaults to be
different. Here I think it should default to '0' *regardless* of what
the previous functionality might have done -- e.g. this is a "test
decoder" so the parameter should behave sensibly.

==
.../test_decoding/sql/decoding_into_rel.sql

NITPICK - wrong comments.

==
doc/src/sgml/protocol.sgml

5.
+
+ include_generated_columns
+  
+   
+Boolean option to enable generated columns. This option controls
+whether generated columns should be included in the string
+representation of tuples during logical decoding in PostgreSQL.
+The default is false.
+   
+  
+
+

Does the protocol version need to be bumped to support this new option
and should that be mentioned on this page similar to how all other
version values are mentioned?

==
doc/src/sgml/ref/create_subscription.sgml

NITPICK - some missing words/sentence.
NITPICK - some missing  tags.
NITPICK - remove duplicated sentence.
NITPICK - add another .

==
src/backend/commands/subscriptioncmds.c

6.
 #define SUBOPT_ORIGIN 0x8000
+#define SUBOPT_include_generated_columns 0x0001

Please use UPPERCASE for consistency with other macros.

==
.../libpqwalreceiver/libpqwalreceiver.c

7.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 17)
+ appendStringInfoString(&cmd, ", include_generated_columns 'on'");
+

IMO it makes more sense to say 'true' here instead of 'on'. It seems
like this was just cut/paste from the above code (where 'on' was
sensible).

==
src/include/catalog/pg_subscription.h

8.
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegencol; /* True if generated columns must be published */
+

Not fixed as claimed. This field name ought to be plural.

/subincludegencol/subincludegencols/

~~~

9.
  char*origin; /* Only publish data originating from the
  * specified origin */
+ bool includegencol; /* publish generated column data */
 } Subscription;

Not fixed as claimed. This field name ought to be plural.

/includegencol/includegencols/

==
src/test/subscription/t/031_column_list.pl

10.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 10) STORED)"
+);
+
 $node_subscriber->safe_psql('postgres',
  "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
 );

+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
+);
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 20) STORED)"
+);

IMO the test needs lots more comments to describe what it is doing:

For example, the setu

Re: Using LibPq in TAP tests via FFI

2024-06-17 Thread Alvaro Herrera
On 2024-Jun-16, Andrew Dunstan wrote:


> +sub query_oneval
> +{
> + my $self = shift;
> + my $sql = shift;
> + my $missing_ok = shift; # default is not ok
> + my $conn = $self->{conn};
> + my $result = PQexec($conn, $sql);
> + my $ok = $result && (PQresultStatus($result) == PGRES_TUPLES_OK);
> + unless  ($ok)
> + {
> + PQclear($result) if $result;
> + return undef;
> + }
> + my $ntuples = PQntuples($result);
> + return undef if ($missing_ok && !$ntuples);
> + my $nfields = PQnfields($result);
> + die "$ntuples tuples != 1 or $nfields fields != 1"
> +   if $ntuples != 1 || $nfields != 1;
> + my $val = PQgetvalue($result, 0, 0);
> + if ($val eq "")
> + {
> + $val = undef if PGgetisnull($result, 0, 0);
> + }
> + PQclear($result);
> + return $val;
> +}

Hmm, here you use PGgetisnull, is that a typo for PQgetisnull?  If it
is, then I wonder why this doesn't fail in some obvious way?  Is this
part dead code, maybe?

> +# return tuples like psql's -A -t mode.
> +
> +sub query_tuples
> +{
> + my $self = shift;
> + my @results;
> + foreach my $sql (@_)
> + {
> + my $res = $self->query($sql);
> + # join will render undef as an empty string here
> + no warnings qw(uninitialized);
> + my @tuples = map { join('|', @$_); } @{$res->{rows}};
> + push(@results, join("\n",@tuples));
> + }
> + return join("\n",@results);
> +}

You made this function join the tuples from multiple queries together,
but the output format doesn't show anything for queries that return
empty.  I think this strategy doesn't cater for the case of comparing
results from multiple queries very well, because sets of queries that
return empty results for different queries would be reported as
identical when they aren't.  Maybe add a separator line between the
results from each query, when there's more than one?  (Perhaps just
"join('--\n', @results)" in that last line does the trick?)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
 Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)




Re: SQL/JSON query functions context_item doc entry and type requirement

2024-06-17 Thread jian he
On Mon, Jun 17, 2024 at 2:43 PM Amit Langote  wrote:
>
> Hi,
>
> On Tue, Jun 4, 2024 at 12:11 AM jian he  wrote:
> >
> > hi
> > based on gram.y and function transformJsonValueExpr.
> >
> > gram.y:
> > | JSON_QUERY '('
> > json_value_expr ',' a_expr json_passing_clause_opt
> > json_returning_clause_opt
> > json_wrapper_behavior
> > json_quotes_clause_opt
> > json_behavior_clause_opt
> > ')'
> >
> > | JSON_EXISTS '('
> > json_value_expr ',' a_expr json_passing_clause_opt
> > json_on_error_clause_opt
> > ')'
> >
> > | JSON_VALUE '('
> > json_value_expr ',' a_expr json_passing_clause_opt
> > json_returning_clause_opt
> > json_behavior_clause_opt
> > ')'
> >
> > json_format_clause_opt contains:
> > | FORMAT_LA JSON
> > {
> > $$ = (Node *) makeJsonFormat(JS_FORMAT_JSON, JS_ENC_DEFAULT, @1);
> > }
> >
> >
> > That means, all the context_item can specify "FORMAT JSON" options,
> > in the meantime, do we need to update these functions
> > synopsis/signature in the doc?
> >

>
> If I understand correctly, you're suggesting that we add a line to the
> above paragraph to mention which types are appropriate for
> context_item.  How about we add the following:
>
> context_item expression can be a value of
> any type that can be cast to jsonb. This includes types
> such as char,  text, bpchar,
> character varying, and bytea (with
> ENCODING UTF8), as well as any domains over these types.

Your wording looks OK to me. I want to add two sentences, so it becomes:

+   The context_item expression can be a value of
+any type that can be cast to jsonb. This includes types
+   such as char,  text, bpchar,
+character varying, and bytea (with
+ENCODING UTF8), as well as any domains over these types.
+The context_item expression can also
be followed with
+FORMAT JSON, ENCODING UTF8.
+These two options currently don't have actual meaning.
+ENCODING UTF8 can only be specified when
context_item type is bytea.

imho, "These two options currently don't have actual meaning." is accurate,
but still does not explain why we allow "FORMAT JSON ENCODING UTF8".
I think we may need an explanation for  "FORMAT JSON ENCODING UTF8".
because json_array, json_object, json_serialize, json all didn't
mention the meaning of "[ FORMAT JSON [ ENCODING UTF8 ] ] ".


I added "[ FORMAT JSON [ ENCODING UTF8 ] ] " to the function
signature/synopsis of json_exists, json_query, json_value.
From c5e2b51d351c3987e96290363288499f1a369e49 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 17 Jun 2024 17:35:01 +0800
Subject: [PATCH v1 1/1] minor SQL/JSON Query Functions doc fix

main fix doc context_item description.
---
 doc/src/sgml/func.sgml | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 17c44bc3..18db4a5b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18663,7 +18663,17 @@ $.* ? (@ like_regex "^\\d+$")
described in  can be used
to query JSON documents.  Each of these functions apply a
path_expression (the query) to a
-   context_item (the document); see
+   context_item (the document).
+   The context_item expression can be a value of
+any type that can be cast to jsonb. This includes types
+   such as char,  text, bpchar,
+character varying, and bytea (with
+ENCODING UTF8), as well as any domains over these types.
+The context_item expression can also be followed with
+FORMAT JSON, ENCODING UTF8.
+These two options currently don't have actual meaning.
+ENCODING UTF8 can only be specified when context_item type is bytea.
+   See
 for more details on what
path_expression can contain.
   
@@ -18689,7 +18699,8 @@ $.* ? (@ like_regex "^\\d+$")
   
 json_exists
 json_exists (
-context_item, path_expression  PASSING { value AS varname } , ...
+context_item  FORMAT JSON  ENCODING UTF8  ,
+path_expression  PASSING { value AS varname } , ...
  { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR )


@@ -18729,7 +18740,8 @@ ERROR:  jsonpath array subscript is out of bounds
   
 json_query
 json_query (
-context_item, path_expression  PASSING { value AS varname } , ...
+context_item  FORMAT JSON  ENCODING UTF8  ,
+path_expression  PASSING { value AS varname } , ...
  RETURNING data_type  FORMAT JSON  ENCODING UTF8   
  { WITHOUT | WITH { CONDITIONAL | UNCONDITIONAL } }  ARRAY  WRAPPER 
  { KEEP | OMIT } QUOTES  ON SCALAR STRING  
@@ -18806,7 +18818,8 @@ DETAIL:  Missing "]" after array dimensions.
   
 json_value
 json_value (
-context_item, path_expression
+context_item  FORMAT JSON  ENCODING UTF8  ,
+path_expression
  PASSING { value AS varname } , ...
  RETURNING data_type 
  { ERROR | NULL | DEFAULT expression } ON EMPTY 
-- 
2.34.1



Re: Conflict Detection and Resolution

2024-06-17 Thread Amit Kapila
On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar  wrote:
>
> On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
>  wrote:
>
> > > Yes, that's correct. However, many cases could benefit from the
> > > update_deleted conflict type if it can be implemented reliably. That's
> > > why we wanted to give it a try. But if we can't achieve predictable
> > > results with it, I'm fine to drop this approach and conflict_type. We
> > > can consider a better design in the future that doesn't depend on
> > > non-vacuumed entries and provides a more robust method for identifying
> > > deleted rows.
> > >
> >
> > I agree having a separate update_deleted conflict would be beneficial,
> > I'm not arguing against that - my point is actually that I think this
> > conflict type is required, and that it needs to be detected reliably.
> >
>
> When working with a distributed system, we must accept some form of
> eventual consistency model. However, it's essential to design a
> predictable and acceptable behavior. For example, if a change is a
> result of a previous operation (such as an update on node B triggered
> after observing an operation on node A), we can say that the operation
> on node A happened before the operation on node B. Conversely, if
> operations on nodes A and B are independent, we consider them
> concurrent.
>
> In distributed systems, clock skew is a known issue. To establish a
> consistency model, we need to ensure it guarantees the
> "happens-before" relationship. Consider a scenario with three nodes:
> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> subsequently NodeB makes changes, and then both NodeA's and NodeB's
> changes are sent to NodeC, the clock skew might make NodeB's changes
> appear to have occurred before NodeA's changes. However, we should
> maintain data that indicates NodeB's changes were triggered after
> NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> changes happened after NodeA's changes, despite what the timestamps
> suggest.
>
> A common method to handle such cases is using vector clocks for
> conflict resolution.
>

I think the unbounded size of the vector could be a problem to store
for each event. However, while researching previous discussions, it
came to our notice that we have discussed this topic in the past as
well in the context of standbys. For recovery_min_apply_delay, we
decided that clock skew is not a problem because the settings of this
parameter are much larger than typical time deviations between servers,
as mentioned in the docs. Similarly, for causal reads [1], there was a
proposal to introduce a max_clock_skew parameter and to suggest that
users make sure NTP is set up correctly. We have tried to check
other databases (like Ora and BDR) where CDR is implemented but didn't
find anything specific to clock skew. So, I propose to go with a GUC
like max_clock_skew such that if the difference between the incoming
transaction's commit time and the local time is more than
max_clock_skew then we raise an ERROR. It is not clear to me that
putting a bigger effort into clock skew is worth it, especially when
other systems providing a CDR feature (like Ora or BDR) for decades
have not done anything like vector clocks. It is possible that this is
less of a problem w.r.t. CDR and just detecting the anomaly in clock
skew is good enough.
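
To be concrete, the check I have in mind is about as simple as the
following sketch (the function name, the GUC variable, and the error text
are all hypothetical, not from any posted patch):

#include "postgres.h"
#include "utils/timestamp.h"

/* Hypothetical GUC, in milliseconds. */
static int	max_clock_skew = 5000;

/*
 * Sketch only: raise an ERROR if the remote commit timestamp and the
 * local clock differ by more than max_clock_skew in either direction.
 */
static void
check_clock_skew(TimestampTz remote_commit_ts)
{
	TimestampTz local_ts = GetCurrentTimestamp();

	if (TimestampDifferenceExceeds(local_ts, remote_commit_ts, max_clock_skew) ||
		TimestampDifferenceExceeds(remote_commit_ts, local_ts, max_clock_skew))
		ereport(ERROR,
				errmsg("difference between remote commit timestamp and local time exceeds max_clock_skew"));
}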

[1] - 
https://www.postgresql.org/message-id/flat/CAEepm%3D1iiEzCVLD%3DRoBgtZSyEY1CR-Et7fRc9prCZ9MuTz3pWg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Daniel Gustafsson
> On 17 Jun 2024, at 01:46, Andres Freund  wrote:

> When connecting with a libpq based client, the TLS establishment ends up like
> this in many configurations;
> 
> C->S: TLSv1 393 Client Hello
> S->C: TLSv1.3 167 Hello Retry Request, Change Cipher Spec
> C->S: TLSv1.3 432 Change Cipher Spec, Client Hello
> S->C: TLSv1.3 1407 Server Hello, Application Data, Application Data, 
> Application Data, Application Data

I wonder if this is the result of us still using TLS 1.2 as the default minimum
protocol version.  In 1.2 the ClientHello doesn't contain the extensions for
cipher and curves, so the server must reply with a HRR (Hello Retry Request)
message asking the client to set protocol, curve and cipher.  The client will
respond with a new Client Hello using 1.3 with the extensions.

Using 1.3 as the default minimum has been on my personal TODO for a while,
maybe that should be revisited for 18.

> I.e. there are two clients hellos, because the server rejects the clients
> "parameters".

Or the client didn't send any.

> This appears to be caused by ECDH support.  The difference between the two
> ClientHellos is
> -Extension: key_share (len=38) x25519
> +Extension: key_share (len=71) secp256r1
> 
> I.e. the clients wanted to use x25519, but the server insists on secp256r1.

Somewhat related, Erica Zhang has an open patch to make the server-side curves
configuration take a list rather than a single curve [0], and modernizing the
API used as a side effect (SSL_CTX_set_tmp_ecdh is documented as obsoleted by
OpenSSL, but not deprecated with an API level).
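
(For reference, the list-based configuration boils down to something like
the sketch below; the group string is only an example and this is not what
that patch does verbatim.)

#include <stdbool.h>
#include <openssl/ssl.h>

/*
 * Sketch only: configure a colon-separated list of groups instead of a
 * single curve.  SSL_CTX_set1_groups_list() is the interface OpenSSL
 * documents in place of the obsoleted SSL_CTX_set_tmp_ecdh().
 */
static bool
set_ssl_groups(SSL_CTX *context, const char *groups)
{
	/* e.g. groups = "x25519:prime256v1" */
	return SSL_CTX_set1_groups_list(context, groups) == 1;
}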

> I don't know if it's good that we're calling SSL_CTX_set_tmp_ecdh at all,

To set the specified curve in ssl_ecdh_curve we have to, don't we?

> but if it is, shouldn't we at least do the same in libpq, so we don't 
> introduce
> unnecessary roundtrips?

If we don't set the curve in the client I believe OpenSSL will pass the set of
supported curves the client has, which then should allow the server to pick the
one it wants based on ssl_ecdh_curve, so ideally we shouldn't have to I think.

> I did confirm that doing the same thing on the client side removes the
> additional roundtrip.

The roundtrip went away because the client was set to use secp256r1?  I wonder
if that made OpenSSL override the min protocol version and switch to a TLS1.3
ClientHello since it otherwise couldn't announce the curve.  If you force the
client min protocol to 1.3 in an unpatched client, do you see the same speedup?
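
Something along these lines should be enough to test that, since libpq
already exposes ssl_min_protocol_version as a connection option (the host
and dbname below are just placeholders):

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch only: force the client-side minimum TLS version to 1.3 and check
 * whether the extra round trip disappears.
 */
int
main(void)
{
	PGconn	   *conn = PQconnectdb("host=localhost dbname=postgres "
								   "sslmode=require ssl_min_protocol_version=TLSv1.3");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}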

--
Daniel Gustafsson

[0] tencent_1b368b07f4b5886f9822981189da736cf...@qq.com



Re: Vacuum statistics

2024-06-17 Thread Alena Rybakina

I have written the documentation and attached the patch.

On 08.06.2024 09:30, Alena Rybakina wrote:


I am currently working on dividing this patch into three parts to simplify
the review process: one of them will contain code for collecting vacuum
statistics on tables, the second on indexes and the last on databases. I am
also writing the documentation.



--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
From ba01e9b5b14a0b46482beea20127062e8ae7e067 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 17 Jun 2024 00:48:45 +0300
Subject: [PATCH] Add documentation about the system views that are used in the
 machinery of vacuum statistics.

---
 doc/src/sgml/system-views.sgml | 747 +
 1 file changed, 747 insertions(+)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 67a42a1c4d4..dfef8ed44b8 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -5738,4 +5738,751 @@ ALTER DATABASE template0 ALLOW_CONNECTIONS off;
   
  
 
+
+  pg_stats_vacuum_database
+
+  
+   pg_stats_vacuum_database
+  
+
+  
+   The view pg_stats_vacuum_database will contain
+   one row for each database in the current cluster, showing statistics about
+   vacuuming that database.
+  
+
+  
+   pg_stats_vacuum_database Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   dbid oid
+  
+  
+   OID of a database
+  
+ 
+
+ 
+  
+   total_blks_read int8
+  
+  
+Number of database blocks read by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   total_blks_hit int8
+  
+  
+Number of times database blocks were found in the
+buffer cache by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   total_blks_dirtied int8
+  
+  
+Number of database blocks dirtied by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   total_blks_written int8
+  
+  
+Number of database blocks written by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   wal_records int8
+  
+  
+Total number of WAL records generated by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   wal_fpi int8
+  
+  
+Total number of WAL full page images generated by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   wal_bytes numeric
+  
+  
+Total amount of WAL bytes generated by vacuum operations
+performed on this database
+  
+ 
+
+ 
+  
+   blk_read_time float8
+  
+  
+Time spent reading database blocks by vacuum operations performed on
+this database, in milliseconds (if  is enabled,
+otherwise zero)
+  
+ 
+
+ 
+  
+   blk_write_time float8
+  
+  
+Time spent writing database blocks by vacuum operations performed on
+this database, in milliseconds (if  is enabled,
+otherwise zero)
+  
+ 
+
+ 
+  
+   delay_time float8
+  
+  
+Time spent sleeping in a vacuum delay point by vacuum operations performed on
+this database, in milliseconds (see 
+for details)
+  
+ 
+
+ 
+  
+   system_time float8
+  
+  
+System CPU time of vacuuming this database, in milliseconds
+  
+ 
+
+ 
+  
+   user_time float8
+  
+  
+User CPU time of vacuuming this database, in milliseconds
+  
+ 
+
+ 
+  
+   total_time float8
+  
+  
+Total time of vacuuming this database, in milliseconds
+  
+ 
+
+ 
+  
+   interrupts int4
+  
+  
+Number of times vacuum operations performed on this database
+were interrupted on any errors
+  
+ 
+
+   
+  
+ 
+
+  
+  pg_stats_vacuum_indexes
+
+  
+   pg_stats_vacuum_indexes
+  
+
+  
+   The view pg_stats_vacuum_indexes will contain
+   one row for each index in the current database (including TOAST
+   table indexes), showing statistics about vacuuming that specific index.
+  
+
+  
+   pg_stats_vacuum_indexes Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   relid oid
+  
+  
+   OID of an index
+  
+ 
+
+ 
+  
+   schema name
+  
+  
+Name of the schema this index is in
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of this index
+  
+ 
+
+ 
+  
+   total_blks_read int8
+  
+  
+Number of database blocks read by vacuum operations
+   

Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alena Rybakina

Hi, thank you for your work with this subject!

On 14.06.2024 15:00, Alexander Korotkov wrote:

On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov  wrote:

I've revised the patch.  Did some beautification, improvements for
documentation, commit messages etc.

I've pushed the 0001 patch without 0002.  I think 0001 is good by
itself given that there is the or_to_any_transform_limit GUC option.
The more similar OR clauses are here the more likely grouping them
into SOAP will be a win.  But I've changed the default value to 5.
This will make it less invasive and affect only queries with obvious
repeating patterns.  That also reduced the changes in the regression
tests expected outputs.

Regarding 0002, it seems questionable since it could cause a planning
slowdown for SAOP's with large arrays.  Also, it might reduce the win
of transformation made by 0001.  So, I think we should skip it for
now.

The patch has been reverted from pg17.  Let me propose a new version
for pg18 based on the valuable feedback from Tom Lane [1][2].

  * The transformation is moved to the stage of adding restrictinfos to
the base relation (in particular add_base_clause_to_rel()).  This
leads to interesting consequences.  While this allows IndexScans to
use transformed clauses, BitmapScans and SeqScans seem unaffected.
Therefore, I wasn't able to find a planning regression.
  * As soon as there is no planning regression anymore, I've removed
or_to_any_transform_limit GUC, which was a source of critics.
  * Now, not only Consts allowed in the SAOP's list, but also Params.
  * The criticized hash based on expression jumbling has been removed.
Now, the plain list is used instead.
  * OrClauseGroup now gets a legal node tag.  That allows to mix it in
the list with other nodes without hacks.

I think this patch shouldn't be as good as before for optimizing
performance of large OR lists, given that BitmapScans and SeqScans
still deal with ORs.  However, it allows IndexScans to handle more,
doesn't seem to cause planning regression and therefore introduce no
extra GUC.  Overall, this seems like a good compromise.

This patch could use some polishing, but I'd like to first hear some
feedback on general design.

Links
1.https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
2.https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us


I noticed that 7 libraries have been added to
src/backend/optimizer/plan/initsplan.c, and as far as I remember, Tom Lane
has already expressed doubts about the approach that requires adding a
large number of libraries [0], but I'm afraid I'm out of ideas about
alternative approach.


In addition, I checked the fix in the previous cases that you wrote
earlier [1] and noticed that SeqScan continues to generate, unfortunately,
without converting expressions:


with patch:

create table test as (select (random()*10)::int x, (random()*1000) y
  from generate_series(1,100) i);
create index test_x_1_y on test (y) where x = 1;
create index test_x_2_y on test (y) where x = 2;
vacuum analyze test;
SELECT 100
CREATE INDEX
CREATE INDEX
VACUUM

alena@postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
                               QUERY PLAN
--------------------------------------------------------------------------
 Gather  (cost=1000.00..12690.10 rows=1 width=12)
   Workers Planned: 2
   ->  Parallel Seq Scan on test  (cost=0.00..11690.00 rows=1 width=12)
         Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision))
(4 rows)

alena@postgres=# set enable_seqscan =off;
SET
alena@postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
                           QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on test  (cost=100.00..1020440.00 rows=1 width=12)
   Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision))
(2 rows)


without patch:

--------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = '100'::double precision) AND (x = 2)))
   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
         ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (y = '100'::double precision)
         ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (y = '100'::double precision)
(7 rows)


[0] https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us

[1] 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com 



--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: Avoid orphaned objects dependencies, take 3

2024-06-17 Thread Bertrand Drouvot
Hi,

On Thu, Jun 13, 2024 at 04:52:09PM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Thu, Jun 13, 2024 at 10:49:34AM -0400, Robert Haas wrote:
> > On Fri, Jun 7, 2024 at 4:41 AM Bertrand Drouvot
> >  wrote:
> > > Do you still find the code hard to maintain with v9?
> > 
> > I don't think it substantially changes my concerns as compared with
> > the earlier version.
> 
> Thanks for the feedback, I'll give it more thoughts.

Please find attached v10 that puts the object locking outside of the dependency
code.

It's done that way except for:

recordDependencyOnExpr() 
recordDependencyOnSingleRelExpr()
makeConfigurationDependencies()

The reason is that I think it would need part of the logic that is inside
the above functions to be duplicated, and I'm not sure that's worth it.

For example, we would probably need to:

- make additional calls to find_expr_references_walker() 
- make additional scan on the config map

It's also not done outside of recordDependencyOnCurrentExtension() as:

1. I think it is clear enough that way (as it is clear that the lock is taken on
an ExtensionRelationId object class).
2. why include "commands/extension.h" in more places (locking would
depend on "creating_extension" and "CurrentExtensionObject"), given 1.?

Remarks:

1. depLockAndCheckObject() and friends in v9 have been renamed to
LockNotPinnedObject() and friends (as the vast majority of their calls are now
done outside of the dependency code).

2. regarding the concern around RelationRelationId (discussed in [1]), v10 adds
a comment "XXX Do we need a lock for RelationRelationId?" at the places we
may want to lock this object class. I did not think about it yet (but will do),
I only added this comment at multiple places.

I think that v10 is easier to follow (as compared to v9) as we can easily see
which object class we'll put a lock on.

Thoughts?

[1]: 
https://www.postgresql.org/message-id/Zmv3TPfJAyQXhIdu%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From dc75e3255803617d21c55d77dc218904bd729d81 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v10] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place before the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place session 2 in scenario 1 would be locked and would report an
error once session 1 commits (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered then the patch
would also report an error (but with less details).

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/aclchk.c  |   1 +
 src/backend/catalog/dependency.c  | 109 ++-
 src/backend/catalog/heap.c|   8 ++
 src/backend/catalog/index.c   |  16 +++
 src/backend/catalog/objectaddress.c   |  57 
 src/backend/catalog/pg_aggregate.c|   9 ++
 src/backend/catalog/pg_attrdef.c  |   1 +
 src/backend/catalog/pg_cast.c |   5 +
 src/backend/catalog/pg_collation.c|   1 +
 src/backend/catalog/pg_constraint.c   |  11 ++
 src/backend/catalog/pg_conversion.c   |   2 +
 src/backend/catalog/pg_depend.c   |  27 +++-
 src/backend/catalog/pg_operator.c |  19 +++
 src/backend/catalog/pg_proc.c |  17 ++-
 src/backend/catalog/pg_publication.c  |   5 +
 src/backend/catalog/pg_range.c|   6 +
 src/backend/catalog/pg_type.c |  33 +
 src/backend/catalog/toasting.c|   1 +
 src/backend/commands/alter.c  |   4 +
 src/backend/commands/amcmds.c |   1 +
 src/backend/commands/cluster.c|   5 +
 src/backend/commands/event_trigger.c  |   

Re: speed up a logical replica setup

2024-06-17 Thread Peter Eisentraut

On 07.06.24 05:49, Euler Taveira wrote:
Here it is a patch series to fix the issues reported in recent 
discussions. The
patches 0001 and 0003 aim to fix the buildfarm issues. The patch 0002 
removes
synchronized failover slots on subscriber since it has no use. I also 
included
an optional patch 0004 that improves the usability by checking both 
servers if

it already failed in any subscriber check.


I have committed 0001, 0002, and 0003.  Let's keep an eye on the 
buildfarm to see if that stabilizes things.  So far it looks good.


For 0004, I suggest inverting the result values from check_publisher() 
and create_subscriber() so that they return true if the check is ok.






Re: speed up a logical replica setup

2024-06-17 Thread Peter Eisentraut

On 07.06.24 11:17, Hayato Kuroda (Fujitsu) wrote:

Other minor comments are included by the attached diff file. It contains changes
to follow conventions and pgindent/pgperltidy.


I have included some of your changes in the patches from Euler that I 
committed today.  The 0004 patch might get rerolled by Euler, so he 
could include your relevant changes there.


I'm not sure about moving the sync_replication_slots setting around.  It 
seems to add a lot of work for no discernible gain?


The pgperltidy changes are probably better done separately, because they 
otherwise make quite a mess of the patch.  Maybe we'll just do that once 
more before we branch.






Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alexander Korotkov
On Mon, Jun 17, 2024 at 1:33 PM Alena Rybakina
 wrote:
> I noticed that 7 libraries have been added to 
> src/backend/optimizer/plan/initsplan.c, and as far as I remember, Tom Lane 
> has already expressed doubts about the approach that requires adding a large 
> number of libraries [0], but I'm afraid I'm out of ideas about alternative 
> approach.

Thank you for pointing that out.  Right, the number of extra headers
included was one of the points of criticism of this patch.  I'll look into
moving this functionality elsewhere, while the stage of transformation
could probably stay the same.

> In addition, I checked the fix in the previous cases that you wrote earlier 
> [1] and noticed that SeqScan continues to generate, unfortunately, without 
> converting expressions:

I've rechecked and see I made wrong conclusion about this.  The plan
regression is still here.  But I'm still looking to workaround this
without extra GUC.

I think we need to additionally do something like [1], but take
further steps to avoid planning overhead when not necessary.  In
particular, I think we should only consider splitting SAOP for bitmap
OR in the following cases:
1. There are partial indexes with predicates over the target column.
2. There are multiple indexes covering the target column and different
subsets of the other columns present in the restrictions.
3. There are indexes covering the target column without support for SAOP
(amsearcharray == false).
Hopefully this should skip generation of useless bitmap paths in the
majority of cases. Thoughts?

Links.
1. 
https://www.postgresql.org/message-id/67bd918d-285e-44d2-a207-f52d9a4c35e6%40postgrespro.ru

--
Regards,
Alexander Korotkov
Supabase




Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alexander Korotkov
On Mon, Jun 17, 2024 at 7:02 AM Andrei Lepikhov
 wrote:
> On 6/14/24 19:00, Alexander Korotkov wrote:
> > This patch could use some polishing, but I'd like to first hear some
> > feedback on general design.
> Thanks for your time and efforts. I have skimmed through the code—there
> is a minor fix in the attachment.
> First and foremost, I think this approach can survive.
> But generally, I'm not happy with manipulations over a restrictinfo clause:
> 1. While doing that, we should remember the fields of the RestrictInfo
> clause. It may need to be changed, too, or it can require such a change
> in the future if someone adds new logic.
> 2. We should remember the link to the RestrictInfo: see how the caller
> of the  distribute_restrictinfo_to_rels routine manipulates its fields
> right after the distribution.
> 3. Remember caches and cached decisions inside the RestrictInfo
> structure: replacing the clause should we change these fields too?
>
> These were the key reasons why we shifted the code to the earlier stages
> in the previous incarnation. So, going this way we should recheck all
> the  fields of this structure and analyse how the transformation can
> [potentially] affect their values.

I see your points.  Making this at the stage of restrictinfos seems
harder, and there are open questions in the patch.

I'd like to hear how Tom feels about this.  Is this the right
direction, or should we try another way?

--
Regards,
Alexander Korotkov
Supabase




Re: RFC: adding pytest as a supported test framework

2024-06-17 Thread Aleksander Alekseev
Hi Jacob,

> For the v18 cycle, I would like to try to get pytest [1] in as a
> supported test driver, in addition to the current offerings.
>
> (I'm tempted to end the email there.)

Huge +1 from me and many thanks for working on this.

Two cents from me.

I spent a fair part of my career writing in Perl. Personally I like
the language and often find it more convenient for the tasks I'm
working on than Python.

This being said, there were several projects I was involved in where
we had to choose a scripting language. In all the cases there was a
strong push-back against Perl and Python always seemed to be a common
denominator for everyone. So I ended up with the rule of thumb to use
Perl for projects I'm working on alone and Python otherwise. Although
the objective reality in the entire industry is unknown to me I spoke
to many people whose observations were similar.

We could speculate about the reasons why people seem to prefer Python
(good IDE support*, unique** libraries like Matplotlib / NumPy /
SciPy, ...) but honestly I don't think they are extremely relevant in
this discussion.

I believe supporting Python in our test infrastructure will attract
more contributors and thus would be a good step for the project in the
long run.

* including PyTest integration
** citation needed

-- 
Best regards,
Aleksander Alekseev




Is creating logical replication slots in template databases useful at all?

2024-06-17 Thread Bharath Rupireddy
Hi,

While looking at the commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=29d0a77fa6606f9c01ba17311fc452dabd3f793d,
I noticed that get_old_cluster_logical_slot_infos gets called even for
template1 and template0 databases. This means pg_upgrade executes
queries against the template databases to get replication slot
information. I then realized that postgres allows one to connect to
template1 database (or any other user-defined template databases for
that matter), and create logical replication slots in it. If created,
all the subsequent database creations will end up adding inactive
logical replication slots in the postgres server. This might not be a
problem in production servers as I assume the connections to template
databases are typically restricted. Despite the connection
restrictions, if at all one gets to connect to template databases in
any way, it's pretty much possible to load the postgres server with
inactive replication slots.

This leads me to wonder why one would need logical replication slots in
template databases at all. Can postgres restrict logical replication
slot creation in template databases? If restricted, it may deviate
from the fundamental principle of template databases in the sense that
everything in the template database must be copied over to the new
database created using it. Is it okay to do this? Am I missing
something here?
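
If we did decide to restrict it, the check itself looks simple enough;
something like the sketch below (the function name and where it would be
called from are hypothetical):

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_database.h"
#include "commands/dbcommands.h"
#include "miscadmin.h"
#include "utils/syscache.h"

/* Sketch only: error out if the current database is a template database. */
static void
CheckNotTemplateDatabase(void)
{
	HeapTuple	tup;
	bool		istemplate;

	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
	istemplate = ((Form_pg_database) GETSTRUCT(tup))->datistemplate;
	ReleaseSysCache(tup);

	if (istemplate)
		ereport(ERROR,
				errmsg("cannot create logical replication slot in template database \"%s\"",
					   get_database_name(MyDatabaseId)));
}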

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Using LibPq in TAP tests via FFI

2024-06-17 Thread Andrew Dunstan



On 2024-06-17 Mo 5:07 AM, Alvaro Herrera wrote:

On 2024-Jun-16, Andrew Dunstan wrote:



+sub query_oneval
+{
+   my $self = shift;
+   my $sql = shift;
+   my $missing_ok = shift; # default is not ok
+   my $conn = $self->{conn};
+   my $result = PQexec($conn, $sql);
+   my $ok = $result && (PQresultStatus($result) == PGRES_TUPLES_OK);
+   unless  ($ok)
+   {
+   PQclear($result) if $result;
+   return undef;
+   }
+   my $ntuples = PQntuples($result);
+   return undef if ($missing_ok && !$ntuples);
+   my $nfields = PQnfields($result);
+   die "$ntuples tuples != 1 or $nfields fields != 1"
+ if $ntuples != 1 || $nfields != 1;
+   my $val = PQgetvalue($result, 0, 0);
+   if ($val eq "")
+   {
+   $val = undef if PGgetisnull($result, 0, 0);
+   }
+   PQclear($result);
+   return $val;
+}

Hmm, here you use PGgetisnull, is that a typo for PQgetisnull?  If it
is, then I wonder why this doesn't fail in some obvious way?  Is this
part dead code, maybe?



It's not dead, just not exercised ATM. I should maybe include test
scripts for the two new modules.


As you rightly suggest, it's a typo. If it had been called it would have 
aborted the test.






+# return tuples like psql's -A -t mode.
+
+sub query_tuples
+{
+   my $self = shift;
+   my @results;
+   foreach my $sql (@_)
+   {
+   my $res = $self->query($sql);
+   # join will render undef as an empty string here
+   no warnings qw(uninitialized);
+   my @tuples = map { join('|', @$_); } @{$res->{rows}};
+   push(@results, join("\n",@tuples));
+   }
+   return join("\n",@results);
+}

You made this function join the tuples from multiple queries together,
but the output format doesn't show anything for queries that return
empty.  I think this strategy doesn't cater for the case of comparing
results from multiple queries very well, because sets of queries that
return empty results for different queries would be reported as
identical when they aren't.  Maybe add a separator line between the
results from each query, when there's more than one?  (Perhaps just
"join('--\n', @results)" in that last line does the trick?)



psql doesn't do that, and this is designed to mimic psql's behaviour. We 
could change that of course. I suspect none of the uses expect empty 
resultsets, so it's probably somewhat moot.



Thanks for looking.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-06-17 Thread Bharath Rupireddy
Hi,

On Sat, Apr 13, 2024 at 9:36 AM Bharath Rupireddy
 wrote:
>
> There was a point raised by Amit
> https://www.postgresql.org/message-id/CAA4eK1K8wqLsMw6j0hE_SFoWAeo3Kw8UNnMfhsWaYDF1GWYQ%2Bg%40mail.gmail.com
> on when to do the XID age based invalidation - whether in checkpointer
> or when vacuum is being run or whenever ComputeXIDHorizons gets called
> or in autovacuum process. For now, I've chosen the design to do these
> new invalidation checks in two places - 1) whenever the slot is
> acquired and the slot acquisition errors out if invalidated, 2) during
> checkpoint. However, I'm open to suggestions on this.

Here are my thoughts on when to do the XID age invalidation. In all
the patches sent so far, the XID age invalidation happens in two
places - one during the slot acquisition, and another during the
checkpoint. The suggestion is to also do it during vacuum (manual
and auto), so that even if a checkpoint isn't happening in the
database for whatever reason, a vacuum command or autovacuum can
invalidate the slots whose XID has aged.

An idea is to check for XID age based invalidation for all the slots
in ComputeXidHorizons() before it reads replication_slot_xmin and
replication_slot_catalog_xmin, and obviously before the proc array
lock is acquired. A potential problem with this approach is that the
invalidation check can become too aggressive as XID horizons are
computed from many places.

Another idea is to check for XID age based invalidation for all the
slots in higher levels than ComputeXidHorizons(), for example in
vacuum() which is an entry point for both vacuum command and
autovacuum. This approach seems similar to vacuum_failsafe_age GUC
which checks each relation for the failsafe age before vacuum gets
triggered on it.

Does anyone see any issues or risks with the above two approaches or
have any other ideas? Thoughts?

I attached v40 patches here. I reworded some of the ERROR messages,
and did some code clean-up. Note that I haven't implemented any of the
above approaches yet.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 7a920d4f1a4d6a10776ff597d6d931b10340417c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 17 Jun 2024 06:55:27 +
Subject: [PATCH v40 1/2] Add inactive_timeout based replication slot
 invalidation

Till now, postgres has the ability to invalidate inactive
replication slots based on the amount of WAL (set via
max_slot_wal_keep_size GUC) that will be needed for the slots in
case they become active. However, choosing a default value for
max_slot_wal_keep_size is tricky. Because the amount of WAL a
customer generates, and their allocated storage will vary greatly
in production, making it difficult to pin down a one-size-fits-all
value. It is often easy for developers to set a timeout of say 1
or 2 or 3 days, after which the inactive slots get invalidated.

To achieve the above, postgres introduces a GUC allowing users to
set an inactive timeout. Replication slots that are inactive for
longer than the specified amount of time get invalidated.

The invalidation check happens at various locations to keep it as
up to date as possible; these locations include the following:
- Whenever the slot is acquired and the slot acquisition errors
out if invalidated.
- During checkpoint

Note that this new invalidation mechanism won't kick-in for the
slots that are currently being synced from the primary to the
standby, because such synced slots are typically considered not
active (for them to be later considered as inactive) as they don't
perform logical decoding to produce the changes.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CA%2BTgmoZTbaaEjSZUG1FL0mzxAdN3qmXksO3O9_PZhEuXTkVnRQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/202403260841.5jcv7ihniccy%40alvherre.pgsql
---
 doc/src/sgml/config.sgml  |  33 ++
 doc/src/sgml/system-views.sgml|   7 +
 .../replication/logical/logicalfuncs.c|   2 +-
 src/backend/replication/logical/slotsync.c|  11 +-
 src/backend/replication/slot.c| 188 +++-
 src/backend/replication/slotfuncs.c   |   2 +-
 src/backend/replication/walsender.c   |   4 +-
 src/backend/utils/adt/pg_upgrade_support.c|   2 +-
 src/backend/utils/misc/guc_tables.c   |  12 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/replication/slot.h|   6 +-
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/050_invalidate_slots.pl   | 286 ++
 13 files changed, 535 insertions(+), 20 deletions(-)
 create mode 100644 src/test/recovery/t/050_invalidate_slots.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/

Re: ON ERROR in json_query and the like

2024-06-17 Thread Chapman Flack
Hi,

On 06/17/24 02:20, Amit Langote wrote:
>>>Apparently, the functions expect JSONB so that a cast is implied
>>>when providing TEXT. However, the errors during that cast are
>>>not subject to the ON ERROR clause.
>>>
>>>17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
>>>ERROR:  invalid input syntax for type json
>>>DETAIL:  Token "invalid" is invalid.
>>>CONTEXT:  JSON data, line 1: invalid
>>>
>>>Oracle DB and Db2 (LUW) both return NULL in that case.

I wonder, could prosupport rewriting be used to detect that the first
argument is supplied by a cast, and rewrite the expression to apply the
cast 'softly'? Or would that behavior be too magical?
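
Very roughly, I'm picturing something with the following shape; everything
except the support-request plumbing is hypothetical, and the actual
rewrite is left out:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"
#include "nodes/supportnodes.h"

/*
 * Sketch only: a planner support function that could notice when the
 * first argument of the target function arrives through an implicit cast
 * (e.g. text -> jsonb via CoerceViaIO) and substitute an error-safe
 * version of that cast.  Returning NULL means "no simplification".
 */
Datum
json_query_support(PG_FUNCTION_ARGS)
{
	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);

	if (IsA(rawreq, SupportRequestSimplify))
	{
		SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq;
		Node	   *arg = (Node *) linitial(req->fcall->args);

		if (IsA(arg, CoerceViaIO))
		{
			/* build and return a replacement expression that applies the
			 * cast "softly" here */
		}
	}

	PG_RETURN_POINTER(NULL);
}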

Regards,
-Chap




Re: SQL/JSON query functions context_item doc entry and type requirement

2024-06-17 Thread Chapman Flack
Hi,

On 06/17/24 02:43, Amit Langote wrote:
> context_item expression can be a value of
> any type that can be cast to jsonb. This includes types
> such as char,  text, bpchar,
> character varying, and bytea (with
> ENCODING UTF8), as well as any domains over these types.

Reading this message in conjunction with [0] makes me think that we are
really talking about a function that takes a first parameter of type jsonb,
and behaves exactly that way (so any cast required is applied by the system
ahead of the call). Under those conditions, this seems like an unusual
sentence to add in the docs, at least until we have also documented that
tan's argument can be of any type that can be cast to double precision.

On the other hand, if the behavior of the functions were to be changed
(perhaps using prosupport rewriting as suggested in [1]?) so that it was
not purely describable as a function accepting exactly jsonb with a
possible system-applied cast in front, then in that case such an added
explanation in the docs might be very fitting.

Regards,
-Chap


[0]
https://www.postgresql.org/message-id/CA%2BHiwqGuqLfAEP-FwW3QHByfQOoUpyj6YZG6R6bScpQswvNYDA%40mail.gmail.com
[1] https://www.postgresql.org/message-id/66703054.6040109%40acm.org




Re: ON ERROR in json_query and the like

2024-06-17 Thread Markus Winand


> On 17.06.2024, at 08:20, Amit Langote  wrote:
> 
> Hi,
> 
> (apologies for not replying to this thread sooner)
> 
> On Tue, May 28, 2024 at 6:57 PM Pavel Stehule  wrote:
>> út 28. 5. 2024 v 11:29 odesílatel Markus Winand  
>> napsal:
>>> 
>>> Hi!
>>> 
>>> I’ve noticed two “surprising” (to me) behaviors related to
>>> the “ON ERROR” clause of the new JSON query functions in 17beta1.
>>> 
>>> 1. JSON parsing errors are not subject to ON ERROR
>>>   Apparently, the functions expect JSONB so that a cast is implied
>>>   when providing TEXT. However, the errors during that cast are
>>>   not subject to the ON ERROR clause.
>>> 
>>>   17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
>>>   ERROR:  invalid input syntax for type json
>>>   DETAIL:  Token "invalid" is invalid.
>>>   CONTEXT:  JSON data, line 1: invalid
>>> 
>>>   Oracle DB and Db2 (LUW) both return NULL in that case.
>>> 
>>>   I had a look on the list archive to see if that is intentional but
>>>   frankly speaking these functions came a long way. In case it is
>>>   intentional it might be worth adding a note to the docs.
>> 
>> 
>> I remember a talk about this subject years ago. Originally the JSON_QUERY 
>> was designed in similar like Oracle, and casting to jsonb was done inside. 
>> If I remember this behave depends on the fact, so old SQL/JSON has not json 
>> type and it was based just on processing of plain text. But Postgres has 
>> JSON, and JSONB and then was more logical to use these types. And because 
>> the JSON_QUERY uses these types, and the casting is done before the 
>> execution of the function, then the clause ON ERROR cannot be handled. 
>> Moreover, until soft errors Postgres didn't allow handling input errors in 
>> common functions.
>> 
>> I think so this difference should be mentioned in documentation.
> 
> Agree that the documentation needs to be clear about this. I'll update
> my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
> Functions.

Considering another branch of this thread [1] I think the
"Supported Features” appendix of the docs should mention that as well.

The way I see it is that the standard defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In the case of valid JSON, the implied CAST makes it look as though
the second variant of these functions was supported as well, but that
illusion totally falls apart once the JSON is not valid anymore.

I think it affects the following feature IDs:

  - T821, Basic SQL/JSON query operators
 For JSON_VALUE, JSON_TABLE and JSON_EXISTS
  - T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, anything else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

[1] 
https://www.postgresql.org/message-id/CAKFQuwb50BAaj83Np%2B1O6xe3_T6DO8w2mxtFbgSbbUng%2BabrqA%40mail.gmail.com


> 
>>> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
>>> 
>>>   17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
>>>a
>>>   
>>>[]
>>>   (1 row)
>>> 
>>>   As NULL ON EMPTY is implied, it should give the same result as
>>>   explicitly adding NULL ON EMPTY:
>>> 
>>>   17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON 
>>> ERROR) a;
>>>a
>>>   ---
>>> 
>>>   (1 row)
>>> 
>>>   Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
>>>   on the other hand returns NULL for both queries.
>>> 
>>>   I don’t think that PostgreSQL should follow Oracle DB's suit here
>>>   but again, in case this is intentional it should be made explicit
>>>   in the docs.
> 
> This behavior is a bug and result of an unintentional change that I
> made at some point after getting involved with this patch set.  So I'm
> going to fix this so that the empty results of jsonpath evaluation use
> NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present.
> Attached a patch to do so.
> 

Tested: works.
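
For reference, with the patch applied both spellings should behave the
same; a quick sketch of the expected results:

   SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR);                 -- NULL
   SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR);   -- NULL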

Thanks :)

-markus





Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-06-17 Thread Daniel Gustafsson
> On 15 Jun 2024, at 01:46, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> I think the only thing we absolutely have to fix here is the dangling
>> ACL references.
> 
> Here's a draft patch that takes care of Hannu's example, and I think
> it fixes other potential dangling-reference scenarios too.
> 
> I'm not sure whether I like the direction this is going, but I do
> think we may not be able to do a lot more than this for v17.

Agreed.

> The core change is to install SHARED_DEPENDENCY_INITACL entries in
> pg_shdepend for all references to non-pinned roles in pg_init_privs,
> whether they are the object's owner or not.  Doing that ensures that
> we can't drop a role that is still referenced there, and it provides
> a basis for shdepDropOwned and shdepReassignOwned to take some kind
> of action for such references.

I wonder if this will break any tools/scripts in prod which rely on the
previous (faulty) behaviour.  It will be interesting to see if anything shows
up on -bugs.  Off the cuff it seems like a good idea judging by where we are
and what we can fix with it.

> I'm not terribly thrilled with this, because it's still possible
> to get into a state where a pg_init_privs entry is based on
> an owning role that's no longer the owner: you just have to use
> ALTER OWNER directly rather than via REASSIGN OWNED.  While
> I don't think the backend has much problem with that, it probably
> will confuse pg_dump to some extent.  However, such cases are
> going to confuse pg_dump anyway for reasons discussed upthread,
> namely that we don't really support dump/restore of extensions
> where not all the objects are owned by the extension owner.
> I'm content to leave that in the pile of unfinished work for now.

+1

> An alternative definition could be that ALTER OWNER also replaces
> old role with new in pg_init_privs entries.  That would reduce
> the surface for confusing pg_dump a little bit, but I don't think
> that I like it better, for two reasons:
> 
> * ALTER OWNER would have to probe pg_init_acl for potential
> entries every time, which would be wasted work for most ALTERs.

Unless it would magically fix all the pg_dump problems I'd prefer to avoid
this.

> * As this stands, updateInitAclDependencies() no longer pays any
> attention to its ownerId argument, and in one place I depend on
> that to skip doing a rather expensive lookup of the current object
> owner.  Perhaps we should remove that argument altogether, and
> in consequence simplify some other callers too?  However, that
> would only help much if we were willing to revert 534287403's
> changes to pass the object's owner ID to recordExtensionInitPriv,
> which I'm hesitant to do because I suspect we'll end up wanting
> to record the owner ID explicitly in pg_init_privs entries.
> On the third hand, maybe we'll never do that, so perhaps we should
> revert those changes for now; some of them add nontrivial lookup
> costs.

I wonder if it's worth reverting passing the owner ID for v17 and revisiting
that in 18 if we work on recording the ID.  Shaving a few catalog lookups is
generally worthwhile, but doing them without needing the result for the next
five years might bite us.

Re-reading 534287403 I wonder about this hunk in RemoveRoleFromInitPriv:

+   if (!isNull)
+   old_acl = DatumGetAclPCopy(oldAclDatum);
+   else
+   old_acl = NULL; /* this case shouldn't happen, 
probably */

I wonder if we should Assert() on old_acl being NULL?  I can't imagine a case
where it should legitimately be that and catching such in development might be
useful for catching stray bugs?
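
In code form I was thinking of something like the below instead of the
if/else (just a sketch, not a proposed final wording):

    /* A NULL pg_init_privs ACL has no legitimate use, so insist on !isNull */
    Assert(!isNull);
    old_acl = DatumGetAclPCopy(oldAclDatum);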

> * In shdepReassignOwned, I refactored to move the switch on
> sdepForm->classid into a new subroutine.  We could have left
> it where it is, but it would need a couple more tab stops of
> indentation which felt like too much.  It's in the eye of
> the beholder though.

I prefer the new way.

+1 on going ahead with this patch.  There is more work to do but I agree that
this about all that makes sense in v17 at this point.

--
Daniel Gustafsson





Re: RFC: adding pytest as a supported test framework

2024-06-17 Thread Jelte Fennema-Nio
On Sun, 16 Jun 2024 at 23:04, Robert Haas  wrote:
>
> On Sat, Jun 15, 2024 at 6:00 PM Melanie Plageman
>  wrote:
> > Writing a new test framework in a popular language that makes it more
> > likely that more people will write more tests and test infrastructure
> > is such a completely different thing than suggesting we rewrite
> > Postgres in Rust that I feel that this comparison is unfair and,
> > frankly, a distraction from the discussion at hand.
>
> I don't really agree with this.
> 
> it's not *as* problematic if some tests are
> written in one language and others in another as it would be if
> different parts of the backend used different languages, and it
> wouldn't be *as* hard if at some point we decided we wanted to convert
> all remaining code to the new language.

Honestly, it sounds like you actually do agree with each other. It
seems you interpreted Melanie's use of "thing" as "people wanting to
use Rust/Python in the Postgres codebase", while I believe she probably
meant "all the problems and effort involved in the task of making that
possible". And afaict from your response, you definitely agree that
making it possible to use Rust in our main codebase is a lot more
difficult than for Python for our testing code.

> But, would I feel respected and valued as a participant in
> that project? Would I have to use weird tools and follow arcane and
> frustrating processes? If I did, *that* would make me give up. I don't
> want to say that the choice of programming language doesn't matter at
> all, but it seems to me that it might matter more because it's a
> symptom of being unwilling to modernize things rather than for its own
> sake.

I can personally definitely relate to this (although I wouldn't frame
it as strongly as you did). Postgres development definitely requires
weird tools and arcane processes (imho) when compared to most other
open source projects. The elephant in the room is of course the
mailing list development flow. But we have some good reasons for using
that[^1]. But most people have some limit on the amount of weirdness
they are willing to accept when wanting to contribute, and the mailing
list pushes us quite close to that limit for a bunch of people
already. Any additional weird tools/arcane processes might push some
people over that limit.

We've definitely made big improvements in modernizing our development
workflow over the last few years though: We now have CI (cfbot), a
modern build system (meson), and working autoformatting (requiring
pgindent on commit). These improvements have been very noticeable to
me, and I think we should continue such efforts. I think allowing
people to write tests in Python is one of the easier improvements that
we can make.

[^1]: Although I think those reasons apply much less to the
documentation, maybe we could allow github contributions for just
those.




Re: Using LibPq in TAP tests via FFI

2024-06-17 Thread Alvaro Herrera
On 2024-Jun-17, Andrew Dunstan wrote:

> On 2024-06-17 Mo 5:07 AM, Alvaro Herrera wrote:

> > You made this function join the tuples from multiple queries together,
> > but the output format doesn't show anything for queries that return
> > empty.  I think this strategy doesn't cater for the case of comparing
> > results from multiple queries very well, because it might lead to sets
> > of queries that return empty result for different queries reported as
> > identical when they aren't.  Maybe add a separator line between the
> > results from each query, when there's more than one?  (Perhaps just
> > "join('--\n', @results)" in that last line does the trick?)
> 
> psql doesn't do that, and this is designed to mimic psql's behaviour. We
> could change that of course. I suspect none of the uses expect empty
> resultsets, so it's probably somewhat moot.

True -- I guess my comment should really be directed to the original
coding of the test in test_index_replay.  I think adding the separator
line makes it more trustworthy.

Probably you're right that the current in-core tests don't care
about this, but if we export this technique to the world, I'm sure
somebody somewhere is going to care.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)




Re: ON ERROR in json_query and the like

2024-06-17 Thread Pavel Stehule
po 17. 6. 2024 v 15:07 odesílatel Markus Winand 
napsal:

>
> > On 17.06.2024, at 08:20, Amit Langote  wrote:
> >
> > Hi,
> >
> > (apologies for not replying to this thread sooner)
> >
> > On Tue, May 28, 2024 at 6:57 PM Pavel Stehule 
> wrote:
> >> út 28. 5. 2024 v 11:29 odesílatel Markus Winand <
> markus.win...@winand.at> napsal:
> >>>
> >>> Hi!
> >>>
> >>> I’ve noticed two “surprising” (to me) behaviors related to
> >>> the “ON ERROR” clause of the new JSON query functions in 17beta1.
> >>>
> >>> 1. JSON parsing errors are not subject to ON ERROR
> >>>   Apparently, the functions expect JSONB so that a cast is implied
> >>>   when providing TEXT. However, the errors during that cast are
> >>>   not subject to the ON ERROR clause.
> >>>
> >>>   17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
> >>>   ERROR:  invalid input syntax for type json
> >>>   DETAIL:  Token "invalid" is invalid.
> >>>   CONTEXT:  JSON data, line 1: invalid
> >>>
> >>>   Oracle DB and Db2 (LUW) both return NULL in that case.
> >>>
> >>>   I had a look on the list archive to see if that is intentional but
> >>>   frankly speaking these functions came a long way. In case it is
> >>>   intentional it might be worth adding a note to the docs.
> >>
> >>
> >> I remember a talk about this subject years ago. Originally the
> JSON_QUERY was designed in similar like Oracle, and casting to jsonb was
> done inside. If I remember this behave depends on the fact, so old SQL/JSON
> has not json type and it was based just on processing of plain text. But
> Postgres has JSON, and JSONB and then was more logical to use these types.
> And because the JSON_QUERY uses these types, and the casting is done before
> the execution of the function, then the clause ON ERROR cannot be handled.
> Moreover, until soft errors Postgres didn't allow handling input errors in
> common functions.
> >>
> >> I think so this difference should be mentioned in documentation.
> >
> > Agree that the documentation needs to be clear about this. I'll update
> > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
> > Functions.
>
> Considering another branch of this thread [1] I think the
> "Supported Features” appendix of the docs should mention that as well.
>
> The way I see it is that the standard defines two overloaded
> JSON_QUERY functions, of which PostgreSQL will support only one.
> In case of valid JSON, the implied CAST makes it look as though
> the second variant of these functions was supported as well, but that
> illusion totally falls apart once the JSON is not valid anymore.
>
> I think it affects the following feature IDs:
>
>   - T821, Basic SQL/JSON query operators
>  For JSON_VALUE, JSON_TABLE and JSON_EXISTS
>   - T828, JSON_QUERY
>
> Also, how hard would it be to add the functions that accept
> character strings? Is there, besides the effort, anything else
> against it? I’m asking because I believe once released it might
> never be changed — for backward compatibility.
>

It is easy to add a function that accepts text, but when you have
overloaded functions, then varchar or text is the preferred type, and then
the arguments will be cast to text by default instead of json. You can
have one function with an argument of type "any", but then the
execution is a little bit slower (an outer cast is faster than a cast inside
the function), and again Postgres cannot deduce the used argument types from
the function's argument types.

Probably this can be solved if we introduce a new kind of type, where the
preferred type will be json, or jsonb.

So the problem is in the implementation details of the type-deduction
mechanism (when you use a string literal versus a string expression).

So now, when you write json_table(x1 || x2) and x1 and x2 are of
unknown type, Postgres can decide that x1 and x2 will be jsonb; but if there
were a second function json_table(text), then Postgres would detect the
ambiguity and use the preferred type (that is, text).

Generally, Postgres supports function overloading and it works well
between text, int and numeric, where constants have different syntax; but
when the constant literal has the same syntax, there can be problems with
hidden casts to an unwanted type, so a non-overloaded function is ideal.
These issues can be solved in the analysis stage by special code, but again
it increases code complexity.
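
To illustrate the hazard with a small, purely hypothetical example: with
two user functions f(jsonb) and f(text), an unadorned string literal
resolves to the text variant, because text is the preferred type for
unknown literals:

   CREATE FUNCTION f(j jsonb) RETURNS text LANGUAGE sql
     AS $$ SELECT 'jsonb variant' $$;
   CREATE FUNCTION f(t text) RETURNS text LANGUAGE sql
     AS $$ SELECT 'text variant' $$;

   SELECT f('{"a": 1}');   -- picks f(text), not f(jsonb)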






>
> [1]
> https://www.postgresql.org/message-id/CAKFQuwb50BAaj83Np%2B1O6xe3_T6DO8w2mxtFbgSbbUng%2BabrqA%40mail.gmail.com
>
>
> >
> >>> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
> >>>
> >>>   17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
> >>>a
> >>>   
> >>>[]
> >>>   (1 row)
> >>>
> >>>   As NULL ON EMPTY is implied, it should give the same result as
> >>>   explicitly adding NULL ON EMPTY:
> >>>
> >>>   17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY
> ON ERROR) a;
> >>>a
> >>>   ---
> >>>
> >>>   (1 row)
> >>>
> 

Re: Using LibPq in TAP tests via FFI

2024-06-17 Thread Andrew Dunstan


On 2024-06-16 Su 6:38 PM, Andres Freund wrote:

Hi,

On 2024-06-16 17:43:05 -0400, Andrew Dunstan wrote:

On Fri, Jun 14, 2024 at 12:25 PM Andres Freund  wrote:
I guess it's a question of how widely available FFI::Platypus is. I know
it's available pretty much out of the box on Strawberry Perl and Msys2'
ucrt perl.

FWIW I hacked a bit on CI, trying to make it work. Took a bit, partially
because CI uses an older strawberry perl without FFI::Platypus. And
FFI::Platypus didn't build with that.


Updating that to 5.38 causes some complaints about LANG that I haven't hunted
down, just avoided by unsetting LANG.


As-is your patch didn't work, because it has "systempath => []", which caused
libpq to not load, because it depended on things in the system path...


What's the reason for that?



Not sure, that code was written months ago. I just checked the 
FFI::CheckLib code and libpath is searched before systempath, so there 
shouldn't be any reason not to use the default load path.





After commenting that out, all but one tests passed:

[20:21:31.137] - 8< 
-
[20:21:31.137] stderr:
[20:21:31.137] #   Failed test 'psql connect success'
[20:21:31.137] #   at 
C:/cirrus/src/test/recovery/t/041_checkpoint_at_promote.pl line 161.
[20:21:31.137] #  got: '2'
[20:21:31.137] # expected: '0'
[20:21:31.137] #   Failed test 'psql select 1'
[20:21:31.137] #   at 
C:/cirrus/src/test/recovery/t/041_checkpoint_at_promote.pl line 162.
[20:21:31.137] #  got: ''
[20:21:31.137] # expected: '1'
[20:21:31.137] # Looks like you failed 2 tests of 6.
[20:21:31.137]
[20:21:31.137] (test program exited with status code 2)
[20:21:31.137] 
--
[20:21:31.137]



Yeah, the recovery tests were using poll_query_until in a rather funky 
way. That's fixed in this latest version.








I agree with you that falling back on BackgroundPsql is not a terribly
satisfactory solution.

I'm somewhat doubtful we'll just agree on making FFI::Platypus a hard
dependency, but if we agree to do so...




Maybe not. If so your other suggestion of a small XS wrapper might make 
sense.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 9de3148277..c259d2ccfd 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -5,6 +5,7 @@ use strict;
 use warnings FATAL => 'all';
 
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Session;
 use PostgreSQL::Test::Utils;
 
 use Test::More;
@@ -18,7 +19,9 @@ $node = PostgreSQL::Test::Cluster->new('test');
 $node->init;
 $node->append_conf('postgresql.conf', 'autovacuum=off');
 $node->start;
-$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+my $session = PostgreSQL::Test::Session->new(node => $node);
+
+$session->do(q(CREATE EXTENSION amcheck));
 
 #
 # Check a table with data loaded but no corruption, freezing, etc.
@@ -49,7 +52,7 @@ detects_heap_corruption(
 # Check a corrupt table with all-frozen data
 #
 fresh_test_table('test');
-$node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
+$session->do(q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
 detects_no_corruption("verify_heapam('test')",
 	"all-frozen not corrupted table");
 corrupt_first_page('test');
@@ -81,7 +84,7 @@ sub relation_filepath
 	my ($relname) = @_;
 
 	my $pgdata = $node->data_dir;
-	my $rel = $node->safe_psql('postgres',
+	my $rel = $session->query_oneval(
 		qq(SELECT pg_relation_filepath('$relname')));
 	die "path not found for relation $relname" unless defined $rel;
 	return "$pgdata/$rel";
@@ -92,8 +95,8 @@ sub get_toast_for
 {
 	my ($relname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->query_oneval(
+		qq(
 		SELECT 'pg_toast.' || t.relname
 			FROM pg_catalog.pg_class c, pg_catalog.pg_class t
 			WHERE c.relname = '$relname'
@@ -105,8 +108,8 @@ sub fresh_test_table
 {
 	my ($relname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		DROP TABLE IF EXISTS $relname CASCADE;
 		CREATE TABLE $relname (a integer, b text);
 		ALTER TABLE $relname SET (autovacuum_enabled=false);
@@ -130,8 +133,8 @@ sub fresh_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->do(
+		qq(
 		DROP SEQUENCE IF EXISTS $seqname CASCADE;
 		CREATE SEQUENCE $seqname
 			INCREMENT BY 13
@@ -147,8 +150,8 @@ sub advance_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
-		'postgres', qq(
+	return $session->query_oneval(
+		qq(
 		SELECT nextval('$seqname');
 	));
 }
@@ -158,7 +161,7 @@ sub set_test_sequence
 {
 	my ($seqname) = @_;
 
-	return $node->safe_psql(
+	return $session->query_oneval(
 		'postgres', qq(
 		SELECT setval('$seqname', 102);
 	));
@@ -169,8 +172,8 @@ su

Re: Direct SSL connection and ALPN loose ends

2024-06-17 Thread Jacob Champion
On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas  wrote:
> I was mostly worried about the refactoring of the
> retry logic in libpq (and about the pre-existing logic too to be honest,
> it was complicated before these changes already).

Some changes in the v17 negotiation fallback order caught my eye:

1. For sslmode=prefer, a modern v3 error during negotiation now
results in a fallback to plaintext. For v16 this resulted in an
immediate failure. (v2 errors retain the v16 behavior.)
2. For gssencmode=prefer, a legacy v2 error during negotiation now
results in an immediate failure. In v16 it allowed fallback to SSL or
plaintext depending on sslmode.

Are both these changes intentional/desirable? Change #1 seems to
partially undo the decision made in a49fbaaf:

> Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server.
>
> These days, such a response is far more likely to signify a server-side
> problem, such as fork failure. [...]
>
> Hence, it seems best to just eliminate the assumption that backing off
> to non-SSL/2.0 protocol is the way to recover from an "E" response, and
> instead treat the server error the same as we would in non-SSL cases.

Thanks,
--Jacob




Inconsistency between try_mergejoin_path and create_mergejoin_plan

2024-06-17 Thread Alexander Pyhalov

Hi.

There's the following inconsistency between try_mergejoin_path() and 
create_mergejoin_plan().
When a clause operator has no commutator, we can end up with a mergejoin
path.
Later create_mergejoin_plan() will call get_switched_clauses(). This
function can error out with


ERROR:  could not find commutator for operator XXX

Similar behavior seems to be present for hash joins as well.

Attaching a test case (in patch) and a possible fix.

--
Best regards,
Alexander Pyhalov,
Postgres ProfessionalFrom ea9497c9f62b3613482d5b267c7907070ba8fcd4 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Mon, 17 Jun 2024 10:33:10 +0300
Subject: [PATCH] Merge and hash joins need more checks with custom operators

---
 src/backend/optimizer/path/joinpath.c| 46 +++-
 src/test/regress/expected/equivclass.out | 45 +++
 src/test/regress/sql/equivclass.sql  | 20 +++
 3 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5be8da9e095..dae06117569 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -95,7 +95,7 @@ static void generate_mergejoin_paths(PlannerInfo *root,
 	 Path *inner_cheapest_total,
 	 List *merge_pathkeys,
 	 bool is_partial);
-
+static bool all_clauses_compatible(Relids outerrelids, List *clauses);
 
 /*
  * add_paths_to_joinrel
@@ -972,6 +972,10 @@ try_mergejoin_path(PlannerInfo *root,
 		return;
 	}
 
+	/* Sometimes can't create mergejoin plan if we have non-commutable clauses */
+	if (!all_clauses_compatible(outer_path->parent->relids, mergeclauses))
+		return;
+
 	/*
 	 * If the given paths are already well enough ordered, we can skip doing
 	 * an explicit sort.
@@ -1036,6 +1040,10 @@ try_partial_mergejoin_path(PlannerInfo *root,
 {
 	JoinCostWorkspace workspace;
 
+	/* Sometimes can't create mergejoin plan if we have non-commutable clauses */
+	if (!all_clauses_compatible(outer_path->parent->relids, mergeclauses))
+		return;
+
 	/*
 	 * See comments in try_partial_hashjoin_path().
 	 */
@@ -1129,6 +1137,10 @@ try_hashjoin_path(PlannerInfo *root,
 		return;
 	}
 
+	/* Sometimes can't create hashjoin plan if we have non-commutable clauses */
+	if (!all_clauses_compatible(outer_path->parent->relids, hashclauses))
+		return;
+
 	/*
 	 * See comments in try_nestloop_path().  Also note that hashjoin paths
 	 * never have any output pathkeys, per comments in create_hashjoin_path.
@@ -2431,3 +2443,35 @@ select_mergejoin_clauses(PlannerInfo *root,
 
 	return result_list;
 }
+
+/*
+ * all_clauses_compatible
+ *
+ * Determine whether all clauses' operators have a commutator
+ * where one is needed.  While constructing a merge join or
+ * hash join plan, the optimizer can call get_switched_clauses(),
+ * which can error out if clauses are not commutable.
+ * The logic here should match the one in get_switched_clauses().
+ */
+static bool
+all_clauses_compatible(Relids outer_relids, List *clauses)
+{
+	ListCell   *l;
+
+	foreach(l, clauses)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+		OpExpr	   *clause = (OpExpr *) restrictinfo->clause;
+		Oid			opoid;
+
+		if (bms_is_subset(restrictinfo->right_relids, outer_relids))
+		{
+			opoid = get_commutator(clause->opno);
+
+			if (!OidIsValid(opoid))
+return false;
+		}
+	}
+
+	return true;
+}
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..c8c85fd6c2d 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -323,6 +323,51 @@ explain (costs off)
Index Cond: (ff = '42'::bigint)
 (19 rows)
 
+-- merge join should check if conditions are commutable
+explain (costs off)
+with ss1 as materialized (
+	select ff + 1 as x from
+   (select ff + 2 as ff from ec1
+union all
+select ff + 3 as ff from ec1) ss0
+ union all
+ select ff + 4 as x from ec1
+)
+  select * from ec1,
+(select ff + 1 as x from
+   (select ff + 2 as ff from ec1
+union all
+select ff + 3 as ff from ec1) ss0
+ union all
+ select ff + 4 as x from ec1) as ss2,
+  ss1
+  where ss1.x = ec1.f1 and ss1.x = ss2.x and ec1.ff = 42::int8;
+QUERY PLAN 
+---
+ Merge Join
+   Merge Cond: ec1_1.ff + 2) + 1)) = ss1.x)
+   CTE ss1
+ ->  Append
+   ->  Seq Scan on ec1 ec1_4
+   ->  Seq Scan on ec1 ec1_5
+   ->  Seq Scan on ec1 ec1_6
+   ->  Merge Append
+ Sort Key: (((ec1_1.ff + 2) + 1))
+ ->  Index Scan using ec1_expr2 on ec1 ec1_1
+ ->  Index Scan using ec1_expr3 on ec1 ec1_2
+ ->  Index Scan using ec1_expr4 on ec1 ec1_3
+   ->  Materialize
+ ->  Merge Join
+   Merge Cond: (ss1.x = ec1.f1)
+   ->  Sort
+ 

Re: Show WAL write and fsync stats in pg_stat_io

2024-06-17 Thread Melanie Plageman
On Thu, Jun 13, 2024 at 5:24 AM Nazir Bilal Yavuz  wrote:
>
> On Sun, 9 Jun 2024 at 18:05, Nitin Jadhav  
> wrote:
> >
> > > If possible, let's have all the I/O stats (even for WAL) in
> > > pg_stat_io. Can't we show the WAL data we get from buffers in the hits
> > > column and then have read_bytes or something like that to know the
> > > amount of data read?
> >
> > The ‘hits’ column in ‘pg_stat_io’ is a vital indicator for adjusting a
> > database. It signifies the count of cache hits, or in other words, the
> > instances where data was located in the ‘shared_buffers’. As a result,
> > keeping an eye on the ‘hits’ column in ‘pg_stat_io’ can offer useful
> > knowledge about the buffer cache’s efficiency and assist users in
> > making educated choices when fine-tuning their database. However, if
> > we include the hit count of WAL buffers in this, it may lead to
> > misleading interpretations for database tuning. If there’s something
> > I’ve overlooked that’s already been discussed, please feel free to
> > correct me.
>
> I think counting them as a hit makes sense. We read data from WAL
> buffers instead of reading them from disk. And, WAL buffers are stored
> in shared memory so I believe they can be counted as hits in the
> shared buffers. Could you please explain how this change can 'lead to
> misleading interpretations for database tuning' a bit more?

Perhaps Nitin was thinking of a scenario in which WAL hits are counted
as hits on the same IOObject as shared buffer hits. Since this thread
has been going on for awhile and we haven't recently had a schema
overview, I could understand if there was some confusion. For clarity,
I will restate that the current proposal is to count WAL buffer hits
for IOObject WAL, which means they will not be mixed in with shared
buffer hits.

And I think it makes sense to count WAL IOObject hits since increasing
wal_buffers can lead to more hits, right?
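
To make that concrete, with the proposed schema the WAL numbers stay on
their own rows, e.g. (a sketch; the 'wal' object name is the proposed one,
not committed):

   SELECT backend_type, object, context, reads, hits
     FROM pg_stat_io
    WHERE object = 'wal';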

- Melanie




Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-06-17 Thread Tom Lane
Daniel Gustafsson  writes:
> On 15 Jun 2024, at 01:46, Tom Lane  wrote:
>> The core change is to install SHARED_DEPENDENCY_INITACL entries in
>> pg_shdepend for all references to non-pinned roles in pg_init_privs,
>> whether they are the object's owner or not.  Doing that ensures that
>> we can't drop a role that is still referenced there, and it provides
>> a basis for shdepDropOwned and shdepReassignOwned to take some kind
>> of action for such references.

> I wonder if this will break any tools/scripts in prod which rely on the
> previous (faulty) behaviour.  It will be interesting to see if anything shows
> up on -bugs.  Off the cuff it seems like a good idea judging by where we are
> and what we can fix with it.

Considering that SHARED_DEPENDENCY_INITACL has existed for less than
two months, it's hard to believe that any outside code has grown any
dependencies on it, much less that it couldn't be adjusted readily.

> I wonder if it's worth reverting passing the owner ID for v17 and revisiting
> that in 18 if we work on recording the ID.  Shaving a few catalog lookups is
> generally worthwhile, but doing them without needing the result for the next
> five years might bite us.

Yeah, that was the direction I was leaning in, too.  I'll commit the
revert of that separately, so that un-reverting it shouldn't be too
painful if we eventually decide to do so.

> Re-reading 534287403 I wonder about this hunk in RemoveRoleFromInitPriv:

> +   if (!isNull)
> +   old_acl = DatumGetAclPCopy(oldAclDatum);
> +   else
> +   old_acl = NULL; /* this case shouldn't 
> happen, probably */

> I wonder if we should Assert() on old_acl being NULL?  I can't imagine a case
> where it should legitimately be that and catching such in development might be
> useful for catching stray bugs?

Hmm, yeah.  I was trying to be agnostic about whether it's okay for a
pg_init_privs ACL to be NULL ... but I can't imagine a real use for
that either, and the new patch does add some code that's effectively
assuming it isn't.  Agreed, let's be uniform about insisting !isNull.

> +1 on going ahead with this patch.  There is more work to do but I agree that
> this about all that makes sense in v17 at this point.

Thanks for reviewing!

regards, tom lane




Re: Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Melanie Plageman
On Mon, Jun 17, 2024 at 2:20 AM Li, Yong  wrote:
>
> Hi PostgreSQL hackers,
>
> For most access methods in PostgreSQL, the implementation of the access 
> method itself and the implementation of its WAL replay logic are organized in 
> separate source files.  However, the HEAP access method is an exception.  
> Both the access method and the WAL replay logic are collocated in the same 
> heapam.c.  To follow the pattern established by other access methods and to 
> improve maintainability, I made the enclosed patch to separate HEAP’s replay 
> logic into its own file.  The changes are straightforward.  Move the replay 
> related functions into the new heapam_xlog.c file, push the common 
> heap_execute_freeze_tuple() helper function into the heapam.h header, and 
> adjust the build files.

I'm not against this change, but I am curious at what inspired this.
Were you looking at Postgres code and simply noticed that there isn't
a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
wanted to change that? Or is there some specific reason this would
help you as a Postgres developer, user, or ecosystem member?

- Melanie




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-06-17 Thread Nathan Bossart
On Mon, Jun 17, 2024 at 05:55:04PM +0530, Bharath Rupireddy wrote:
> Here are my thoughts on when to do the XID age invalidation. In all
> the patches sent so far, the XID age invalidation happens in two
> places - one during the slot acquisition, and another during the
> checkpoint. As the suggestion is to do it during the vacuum (manual
> and auto), so that even if the checkpoint isn't happening in the
> database for whatever reasons, a vacuum command or autovacuum can
> invalidate the slots whose XID is aged.

+1.  IMHO this is a principled choice.  The similar max_slot_wal_keep_size
parameter is considered where it arguably matters most: when we are trying
to remove/recycle WAL segments.  Since this parameter is intended to
prevent the server from running out of space, it makes sense that we'd
apply it at the point where we are trying to free up space.  The proposed
max_slot_xid_age parameter is intended to prevent the server from running
out of transaction IDs, so it follows that we'd apply it at the point where
we reclaim them, which happens to be vacuum.
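
To sketch how that would look from the user's side (the GUC name is the one
proposed in this thread and the catalog column is per 17devel, so treat this
as illustrative only):

   # postgresql.conf
   max_slot_xid_age = 1500000000

   -- after (auto)vacuum invalidates an aged slot, it should show up here
   SELECT slot_name, invalidation_reason FROM pg_replication_slots;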

> An idea is to check for XID age based invalidation for all the slots
> in ComputeXidHorizons() before it reads replication_slot_xmin and
> replication_slot_catalog_xmin, and obviously before the proc array
> lock is acquired. A potential problem with this approach is that the
> invalidation check can become too aggressive as XID horizons are
> computed from many places.
>
> Another idea is to check for XID age based invalidation for all the
> slots in higher levels than ComputeXidHorizons(), for example in
> vacuum() which is an entry point for both vacuum command and
> autovacuum. This approach seems similar to vacuum_failsafe_age GUC
> which checks each relation for the failsafe age before vacuum gets
> triggered on it.

I don't presently have any strong opinion on where this logic should go,
but in general, I think we should only invalidate slots if invalidating
them would allow us to advance the vacuum cutoff.  If the cutoff is held
back by something else, I don't see a point in invalidating slots because
we'll just be breaking replication in return for no additional reclaimed
transaction IDs.

-- 
nathan




Re: Allow logical failover slots to wait on synchronous replication

2024-06-17 Thread Bertrand Drouvot
Hi,

On Tue, Jun 11, 2024 at 10:00:46AM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> > On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > > The existing 'standby_slot_names' isn't great for users who are running
> > > clusters with quorum-based synchronous replicas. For instance, if
> > > the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> > > bit tedious to have to reconfigure the standby_slot_names to set it to
> > > the most updated 3 sync replicas whenever different sync replicas start
> > > lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> > > precedence.
> > 
> > Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> > to get the desired behavior today.  That might ordinarily be okay, but it
> > could cause logical replication to be held back unnecessarily if one of the
> > replicas falls behind for whatever reason.  A way to tie standby_slot_names
> > to synchronous replication instead does seem like it would be useful in
> > this case.
> 
> FWIW, I have the same understanding and also think your proposal would be
> useful in this case.

A few random comments about v1:

1 

+   int mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never changed. Is it worth getting rid of "mode"?

2 

+   static XLogRecPtr   lsn[NUM_SYNC_REP_WAIT_MODE] = 
{InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after. It looks like, in that case, we should
avoid setting the lsn[] values at compile time. Then, what about:

1. remove the "static". 

or

2. keep the static but set the lsn[] values after its declaration.
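
(For context on why the compile-time initializer isn't enough: with "static"
it runs only once per backend, so values from a previous call stick around.
Dropping the static as in option 1 makes it run on every call; a sketch:)

   XLogRecPtr	lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};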

3 

-   /*
-* Return false if not all the standbys have caught up to the specified
-* WAL location.
-*/
-   if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-   return false;
+   if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= 
wait_for_lsn)
+   return true;

The lsn[] values have just been (or should have been, see 2 above) initialized to
InvalidXLogRecPtr, so XLogRecPtrIsInvalid(lsn[mode]) will always return
true. I think this check is not needed then.

4 

> > > I did some very brief pgbench runs to compare the latency. Client instance
> > > was running pgbench and 10 logical clients while the Postgres box hosted
> > > the writer and 5 synchronous replicas.

> > > There's a hit to TPS

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to 
off
and standby_slot_names not empty?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alena Rybakina

On 17.06.2024 15:11, Alexander Korotkov wrote:

On Mon, Jun 17, 2024 at 1:33 PM Alena Rybakina
  wrote:

I noticed that 7 libraries have been added to 
src/backend/optimizer/plan/initsplan.c, and as far as I remember, Tom Lane has 
already expressed doubts about the approach that requires adding a large number 
of libraries [0], but I'm afraid I'm out of ideas about alternative approach.

Thank you for pointing.  Right, the number of extra headers included
was one of points for criticism on this patch.  I'll look to move this
functionality elsewhere, while the stage of transformation could
probably be the same.

Yes, I think so.

In addition, I checked the fix against the previous cases that you wrote about
earlier [1] and noticed that, unfortunately, a SeqScan continues to be
generated without converting the expressions:

I've rechecked and see I made wrong conclusion about this.  The plan
regression is still here.  But I'm still looking to workaround this
without extra GUC.

I think we need to additionally do something like [1], but take
further steps to avoid planning overhead when not necessary.


I agree with you to reconsider this place in detail once again, because otherwise it
looks like we're likely to run into a performance issue.

In
particular, I think we should only consider splitting SAOP for bitmap
OR in the following cases:
1. There are partial indexes with predicates over target column.

Frankly, I see that we will need to split SAOP anyway to check it, right?

2. There are multiple indexes covering target column and different
subsets of other columns presented in restrictions.
I see two cases in one. First, we need to check whether there is an 
index for the columns specified in the restrictlist, and secondly, the 
index ranges for which the conditions fall into the "OR" expressions.

3. There are indexes covering target column without support of SAOP
(amsearcharray == false).
Hopefully this should skip generation of useless bitmap paths in
majority cases. Thoughts?
I'm not sure I fully understand how useful this can be. Could you explain it to me in
more detail?

Links.
1. https://www.postgresql.org/message-id/67bd918d-285e-44d2-a207-f52d9a4c35e6%40postgrespro.ru


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Conflict Detection and Resolution

2024-06-17 Thread Robert Haas
On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila  wrote:
> The difference w.r.t the existing mechanisms for holding deleted data
> is that we don't know whether we need to hold off the vacuum from
> cleaning up the rows because we can't say with any certainty whether
> other nodes will perform any conflicting operations in the future.
> Using the example we discussed,
> Node A:
>   T1: INSERT INTO t (id, value) VALUES (1,1);
>   T2: DELETE FROM t WHERE id = 1;
>
> Node B:
>   T3: UPDATE t SET value = 2 WHERE id = 1;
>
> Say the order of receiving the commands is T1-T2-T3. We can't predict
> whether we will ever get T-3, so on what basis shall we try to prevent
> vacuum from removing the deleted row?

The problem arises because T2 and T3 might be applied out of order on
some nodes. Once either one of them has been applied on every node, no
further conflicts are possible.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Direct SSL connection and ALPN loose ends

2024-06-17 Thread Heikki Linnakangas

On 17/06/2024 17:11, Jacob Champion wrote:

On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas  wrote:

I was mostly worried about the refactoring of the
retry logic in libpq (and about the pre-existing logic too to be honest,
it was complicated before these changes already).


Some changes in the v17 negotiation fallback order caught my eye:

1. For sslmode=prefer, a modern v3 error during negotiation now
results in a fallback to plaintext. For v16 this resulted in an
immediate failure. (v2 errors retain the v16 behavior.)
2. For gssencmode=prefer, a legacy v2 error during negotiation now
results in an immediate failure. In v16 it allowed fallback to SSL or
plaintext depending on sslmode.

Are both these changes intentional/desirable? Change #1 seems to
partially undo the decision made in a49fbaaf:


 Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server.

 These days, such a response is far more likely to signify a server-side
 problem, such as fork failure. [...]

 Hence, it seems best to just eliminate the assumption that backing off
 to non-SSL/2.0 protocol is the way to recover from an "E" response, and
 instead treat the server error the same as we would in non-SSL cases.


They were not intentional. Let me think about the desirable part :-).

By "negotiation", which part of the protocol are we talking about 
exactly? In the middle of the TLS handshake? After sending the startup 
packet?


I think the behavior with v2 and v3 errors should be the same. And I 
think an immediate failure is appropriate on any v2/v3 error during 
negotiation, assuming we don't use those errors for things like "TLS not 
supported", which would warrant a fallback.


--
Heikki Linnakangas
Neon (https://neon.tech)





FYI: LLVM Runtime Crash

2024-06-17 Thread David E. Wheeler
Hackers,

FYI, I wanted to try using PostgreSQL with LLVM on my Mac, but the backend 
repeatedly crashes during `make check`. I found the same issue in master and 
REL_16_STABLE. The crash message is:

FATAL:  fatal llvm error: Unsupported stack probing method
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

Same thing appears in the log output. I poked around and found a similar bug[1] 
was fixed in LLVM in December, but as I’m getting it with the latest release 
from 2 weeks ago, something else might be going on. I’ve opened a new LLVM bug 
report[2] for the issue.

I don’t *think* it’s something that can be fixed in Postgres core. This is 
mostly in FYI in case anyone else runs into this issue or knows something more 
about it.

In the meantime I’ll be building without --with-llvm.

Best,

David

[1]: https://github.com/llvm/llvm-project/issues/57171
[2]: https://github.com/llvm/llvm-project/issues/95804





[PATCH] Improve error message when trying to lock virtual tuple.

2024-06-17 Thread Sven Klemm
Hello,

When currently trying to lock a virtual tuple the returned error
will be a misleading `could not read block 0`. This patch adds a
check for the tuple table slot being virtual to produce a clearer
error.

This can be triggered by extensions returning virtual tuples.
While this is of course an error in those extensions the resulting
error is very misleading.



-- 
Regards, Sven Klemm
From c5ccbf2e9cb401a7e63e93cd643b4dfec8d92ab8 Mon Sep 17 00:00:00 2001
From: Sven Klemm 
Date: Mon, 17 Jun 2024 17:38:27 +0200
Subject: [PATCH] Improve error message when trying to lock virtual tuple.

When currently trying to lock a virtual tuple the returned error
will be a misleading `could not read block 0`. This patch adds a
check for the tuple table slot being virtual to produce a clearer
error.
---
 src/backend/executor/nodeLockRows.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..cb3abbd348 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -65,6 +65,13 @@ lnext:
 		return NULL;
 	}
 
+	/*
+	 * If the slot is virtual, we can't lock it. This should never happen, but
+	 * this will lead to a misleading could not read block error later otherwise.
+	 */
+	if (TTS_IS_VIRTUAL(slot))
+		elog(ERROR, "cannot lock virtual tuple");
+
 	/* We don't need EvalPlanQual unless we get updated tuple version(s) */
 	epq_needed = false;
 
-- 
2.45.2



Re: [PATCH] Improve error message when trying to lock virtual tuple.

2024-06-17 Thread Aleksander Alekseev
Hi,

> When currently trying to lock a virtual tuple the returned error
> will be a misleading `could not read block 0`. This patch adds a
> check for the tuple table slot being virtual to produce a clearer
> error.
>
> This can be triggered by extensions returning virtual tuples.
> While this is of course an error in those extensions the resulting
> error is very misleading.

```
+/*
+ * If the slot is virtual, we can't lock it. This should never happen, but
+ * this will lead to a misleading could not read block error
later otherwise.
+ */
```

I suggest dropping or rephrasing the "this should never happen" part.
If this never happened we didn't need this check. Maybe "If the slot
is virtual, we can't lock it. Fail early in order to provide an
appropriate error message", or just "If the slot is virtual, we can't
lock it".

```
elog(ERROR, "cannot lock virtual tuple");
```

For some reason I thought that ereport() is the preferred way of
throwing errors, but I see elog() used many times in ExecLockRows() so
this is probably fine.
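
Putting both points together, maybe something like this (just a sketch):

```
/*
 * If the slot is virtual, we can't lock it.  Fail early here to give a
 * clearer error than the "could not read block" failure we would hit
 * later otherwise.
 */
if (TTS_IS_VIRTUAL(slot))
    elog(ERROR, "cannot lock virtual tuple");
```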

-- 
Best regards,
Aleksander Alekseev




Re: RFC: adding pytest as a supported test framework

2024-06-17 Thread Andrew Dunstan



On 2024-06-17 Mo 4:27 AM, Matthias van de Meent wrote:

Hi Greg, Jelte,

On Sat, 15 Jun 2024 at 23:53, Greg Sabino Mullane  
wrote:


> Those young-uns are also the same group who hold their nose when 
coding in C, and are always clamoring for rewriting Postgres in Rust.


Could you point me to one occasion I have 'always' clamored for this, 
or any of "those young-uns" in the community? I may not be a huge fan 
of C, but rewriting PostgreSQL in [other language] is not on the list 
of things I'm clamoring for. I may have given off-hand mentions that 
[other language] would've helped in certain cases, sure, but I'd 
hardly call that clamoring.





Greg was being a bit jocular here. I didn't take him seriously. But 
there's maybe a better case to make the point he was making. Back in the 
dark ages we used a source code control system called CVS. It's quite 
unlike git and has a great many limitations and uglinesses, and there 
was some pressure for us to move off it. If we had done so when it was 
first suggested, we would probably have moved to using Subversion, which 
is rather like CVS with many of the warts knocked off. Before long, some 
distributed systems like Mercurial and git came along, and we, like most 
of the world, chose git. Thus by waiting and not immediately doing what 
was suggested we got a better solution. Moving twice would have been ... 
painful.


I have written Python in the past. Not a huge amount, but it doesn't 
feel like a foreign country to me, just the next town over instead of my 
immediate neighbourhood. We even have a python script in the buildfarm 
server code (not written by me). I'm sure if we started writing tests in 
Python I would adjust. But I think we need to know what the advantages 
are, beyond simple language preference. And getting Python to an equivalent 
place to where we are with perl will involve some work.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Direct SSL connection and ALPN loose ends

2024-06-17 Thread Jacob Champion
On Mon, Jun 17, 2024 at 8:24 AM Heikki Linnakangas  wrote:
> By "negotiation", which part of the protocol are we talking about
> exactly? In the middle of the TLS handshake? After sending the startup
> packet?

By "negotiation" I mean the server's response to the startup packet.
I.e. "supported"/"not supported"/"error".

> I think the behavior with v2 and v3 errors should be the same. And I
> think an immediate failure is appropriate on any v2/v3 error during
> negotiation, assuming we don't use those errors for things like "TLS not
> supported", which would warrant a fallback.

For GSS encryption, it was my vague understanding that older servers
respond with an error rather than the "not supported" indication. For
TLS, though, the decision in a49fbaaf (immediate failure) seemed
reasonable.

Thanks,
--Jacob




Re: Avoid orphaned objects dependencies, take 3

2024-06-17 Thread Robert Haas
On Fri, Jun 14, 2024 at 3:54 AM Bertrand Drouvot
 wrote:
> > Ah, right. So, I was assuming that, with either this version of your
> > patch or the earlier version, we'd end up locking the constraint
> > itself. Was I wrong about that?
>
> The child contraint itself is not locked when going through
> ConstraintSetParentConstraint().
>
> While at it, let's look at a full example and focus on your concern.

I'm not at the point of having a concern yet, honestly. I'm trying to
understand the design ideas. The commit message just says that we take
a conflicting lock, but it doesn't mention which object types that
principle does or doesn't apply to. I think the idea of skipping it
for cases where it's redundant with the relation lock could be the
right idea, but if that's what we're doing, don't we need to explain
the principle somewhere? And shouldn't we also apply it across all
object types that have the same property?

Along the same lines:

+ /*
+ * Those don't rely on LockDatabaseObject() when being dropped (see
+ * AcquireDeletionLock()). Also it looks like they can not produce
+ * orphaned dependent objects when being dropped.
+ */
+ if (object->classId == RelationRelationId || object->classId ==
AuthMemRelationId)
+ return;

"It looks like X cannot happen" is not confidence-inspiring. At the
very least, a better comment is needed here. But also, that relation
has no exception for AuthMemRelationId, only for RelationRelationId.
And also, the exception for RelationRelationId doesn't imply that we
don't need a conflicting lock here: the special case for
RelationRelationId in AcquireDeletionLock() is necessary because the
lock tag format is different for relations than for other database
objects, not because we don't need a lock at all. If the handling here
were really symmetric with what AcquireDeletionLock() does, the coding
would be to call either LockRelationOid() or LockDatabaseObject()
depending on whether classid == RelationRelationId. Now, that isn't
actually necessary, because we already have relation-locking calls
elsewhere, but my point is that the rationale this commit gives for
WHY it isn't necessary seems to me to be wrong in multiple ways.
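
(Concretely, symmetric handling would be shaped roughly like this, with the
lock mode taken from AcquireDeletionLock(); a sketch for illustration, not a
proposal:)

    if (object->classId == RelationRelationId)
        LockRelationOid(object->objectId, AccessExclusiveLock);
    else
        LockDatabaseObject(object->classId, object->objectId,
                           object->objectSubId, AccessExclusiveLock);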

So to try to sum up here: I'm not sure I agree with this design. But I
also feel like the design is not as clear and consistently implemented
as it could be. So even if I just ignored the question of whether it's
the right design, it feels like we're a ways from having something
potentially committable here, because of issues like the ones I
mentioned in the last paragraph.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-06-17 Thread Daniel Gustafsson
> On 17 Jun 2024, at 16:56, Tom Lane  wrote:
> Daniel Gustafsson  writes:

>> I wonder if this will break any tools/scripts in prod which relies on the
>> previous (faulty) behaviour.  It will be interesting to see if anything shows
>> up on -bugs.  Off the cuff it seems like a good idea judging by where we are
>> and what we can fix with it.
> 
> Considering that SHARED_DEPENDENCY_INITACL has existed for less than
> two months, it's hard to believe that any outside code has grown any
> dependencies on it, much less that it couldn't be adjusted readily.

Doh, I was thinking about it backwards, clearly not a worry =)

>> I wonder if it's worth reverting passing the owner ID for v17 and revisiting
>> that in 18 if we work on recording the ID.  Shaving a few catalog lookups is
>> generally worthwhile, doing them without needing the result for the next five
>> years might bite us.
> 
> Yeah, that was the direction I was leaning in, too.  I'll commit the
> revert of that separately, so that un-reverting it shouldn't be too
> painful if we eventually decide to do so.

Sounds good.

--
Daniel Gustafsson





Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

2024-06-17 Thread Michail Nikolaev
Hello, everyone.

> I think It is even possible to see !alive index in the same situation (it
is worse case), but I was unable to reproduce it so far.
Fortunately, it is not possible.

So, seems like I have found the source of the problem:

1) infer_arbiter_indexes calls RelationGetIndexList to get the list of
candidates.
It does not lock the selected indexes in any additional way that would
prevent index_concurrently_swap from changing them (setting and clearing validity).

RelationGetIndexList relcache.c:4857
infer_arbiter_indexes plancat.c:780
make_modifytable createplan.c:7097 --
node->arbiterIndexes = infer_arbiter_indexes(root);
create_modifytable_plan createplan.c:2826
create_plan_recurse createplan.c:532
create_plan createplan.c:349
standard_planner planner.c:421
planner planner.c:282
pg_plan_query postgres.c:904
pg_plan_queries postgres.c:996
exec_simple_query postgres.c:1193

2) another backend marks some index as invalid and commits

index_concurrently_swap index.c:1600
ReindexRelationConcurrently indexcmds.c:4115
ReindexIndex indexcmds.c:2814
ExecReindex indexcmds.c:2743
ProcessUtilitySlow utility.c:1567
standard_ProcessUtility utility.c:1067
ProcessUtility utility.c:523
PortalRunUtility pquery.c:1158
PortalRunMulti pquery.c:1315
PortalRun pquery.c:791
exec_simple_query postgres.c:1274

3) the first backend invalidates its catalog snapshot because a transaction snapshot is taken

InvalidateCatalogSnapshot snapmgr.c:426
GetTransactionSnapshot snapmgr.c:278
PortalRunMulti pquery.c:1244
PortalRun pquery.c:791
exec_simple_query postgres.c:1274

4) first backend copies indexes selected using previous catalog snapshot

ExecInitModifyTable nodeModifyTable.c:4499 
resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
ExecInitNode execProcnode.c:177
InitPlan execMain.c:966
standard_ExecutorStart execMain.c:261
ExecutorStart execMain.c:137
ProcessQuery pquery.c:155
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274

5) then reads indexes using new fresh snapshot

  RelationGetIndexList relcache.c:4816
  ExecOpenIndices execIndexing.c:175
  ExecInsert nodeModifyTable.c:792 -
ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
  ExecModifyTable nodeModifyTable.c:4059
  ExecProcNodeFirst execProcnode.c:464
  ExecProcNode executor.h:274
  ExecutePlan execMain.c:1646
  standard_ExecutorRun execMain.c:363
  ExecutorRun execMain.c:304
  ProcessQuery pquery.c:160
  PortalRunMulti pquery.c:1277
  PortalRun pquery.c:791
  exec_simple_query postgres.c:1274

6) and uses the arbiter selected with the stale snapshot against the new index view
(marked as invalid)

ExecInsert nodeModifyTable.c:1016 -- arbiterIndexes
= resultRelInfo->ri_onConflictArbiterIndexes;


ExecInsert nodeModifyTable.c:1048 ---if
(!ExecCheckIndexConstraints(resultRelInfo, slot, estate, conflictTid,
arbiterIndexes))
ExecModifyTable nodeModifyTable.c:4059
ExecProcNodeFirst execProcnode.c:464
ExecProcNode executor.h:274
ExecutePlan execMain.c:1646
standard_ExecutorRun execMain.c:363
ExecutorRun execMain.c:304
ProcessQuery pquery.c:160
PortalRunMulti pquery.c:1277
PortalRun pquery.c:791
exec_simple_query postgres.c:1274


I have attached an updated test for the issue (it fails on assert quickly
and uses only 2 backends).
The same issue may happen in case of CREATE/DROP INDEX CONCURRENTLY as well.

The simplest possible fix is to use ShareLock
instead of ShareUpdateExclusiveLock in index_concurrently_swap

oldClassRel = relation_open(oldIndexId, ShareLock);
newClassRel = relation_open(newIndexId, ShareLock);

But this is not a "concurrent" way. But such an update should be fast enough
as far as I understand.

Best regards,
Mikhail.
Subject: [PATCH] better test
error report in case of invalid index used as arbiter
test for issue with upsert fail
---
Index: src/test/modules/test_misc/t/006_concurrently_unique_fail.pl
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/test/mo

Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 12:00:30 +0200, Daniel Gustafsson wrote:
> > On 17 Jun 2024, at 01:46, Andres Freund  wrote:
>
> > When connecting with a libpq based client, the TLS establishment ends up 
> > like
> > this in many configurations;
> >
> > C->S: TLSv1 393 Client Hello
> > S->C: TLSv1.3 167 Hello Retry Request, Change Cipher Spec
> > C->S: TLSv1.3 432 Change Cipher Spec, Client Hello
> > S->C: TLSv1.3 1407 Server Hello, Application Data, Application Data, 
> > Application Data, Application Data
>
> I wonder if this is the result of us still using TLS 1.2 as the default 
> minimum
> protocol version.  In 1.2 the ClientHello doesn't contain the extensions for
> cipher and curves, so the server must reply with a HRR (Hello Retry Request)
> message asking the client to set protocol, curve and cipher.  The client will
> respond with a new Client Hello using 1.3 with the extensions.

I'm pretty sure it's not that, given that
a) removing the server side SSL_CTX_set_tmp_ecdh() call
b) adding a client side SSL_CTX_set_tmp_ecdh() call
avoid the roundtrip.

I've now also confirmed that ssl_min_protocol_version=TLSv1.3 doesn't change
anything relevant (of course the advertised supported versions change).
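
For reference, the client-side experiment amounts to roughly the following
fragment in libpq (a sketch only; "SSL_context" stands for the SSL_CTX created
in initialize_SSL(), and the hard-coded curve name is purely for illustration):

    /*
     * Restrict the client's supported groups so that its initial key_share
     * matches the server's compiled-in default curve.  Requires the usual
     * OpenSSL headers (<openssl/ssl.h>, <openssl/ec.h>).
     */
    {
        int         nid = OBJ_sn2nid("prime256v1");     /* a.k.a. secp256r1 */
        EC_KEY     *ecdh = EC_KEY_new_by_curve_name(nid);

        if (ecdh != NULL)
        {
            SSL_CTX_set_tmp_ecdh(SSL_context, ecdh);
            EC_KEY_free(ecdh);
        }
    }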


> > This appears to be caused by ECDH support.  The difference between the two
> > ClientHellos is
> > -Extension: key_share (len=38) x25519
> > +Extension: key_share (len=71) secp256r1
> >
> > I.e. the clients wanted to use x25519, but the server insists on secp256r1.
>
> Somewhat related, Erica Zhang has an open patch to make the server-side curves
> configuration take a list rather than a single curve [0], and modernizing the
> API used as a side effect (SSL_CTX_set_tmp_ecdh is documented as obsoleted by
> OpenSSL, but not deprecated with an API level).

That does seem nicer. Fun coincidence in timing.


> > I don't know if it's good that we're calling SSL_CTX_set_tmp_ecdh at all,
>
> To set the specified curve in ssl_ecdh_curve we have to don't we?

Sure, but it's not obvious to me why we actually want to override openssl's
defaults here. There's not even a parameter to opt out of forcing a specific
choice on the server side.


> > but if it is, shouldn't we at least do the same in libpq, so we don't 
> > introduce
> > unnecessary roundtrips?
>
> If we don't set the curve in the client I believe OpenSSL will pass the set of
> supported curves the client has, which then should allow the server to pick 
> the
> one it wants based on ssl_ecdh_curve, so ideally we shouldn't have to I think.

Afaict the client sends exactly one:

Transport Layer Security
TLSv1.3 Record Layer: Handshake Protocol: Client Hello
Content Type: Handshake (22)
Version: TLS 1.0 (0x0301)
Length: 320
Handshake Protocol: Client Hello
...
Extension: supported_versions (len=5) TLS 1.3, TLS 1.2
Type: supported_versions (43)
Length: 5
Supported Versions length: 4
Supported Version: TLS 1.3 (0x0304)
Supported Version: TLS 1.2 (0x0303)
Extension: psk_key_exchange_modes (len=2)
Type: psk_key_exchange_modes (45)
Length: 2
PSK Key Exchange Modes Length: 1
PSK Key Exchange Mode: PSK with (EC)DHE key establishment 
(psk_dhe_ke) (1)
Extension: key_share (len=38) x25519
Type: key_share (51)
Length: 38
Key Share extension
Extension: compress_certificate (len=5)
Type: compress_certificate (27)
...

Note key_share being set to x25519.


The HRR says:

Extension: key_share (len=2) secp256r1
Type: key_share (51)
Length: 2
Key Share extension
Selected Group: secp256r1 (23)


> > I did confirm that doing the same thing on the client side removes the
> > additional roundtrip.
>
> The roundtrip went away because the client was set to use secp256r1?

Yes. Or if I change the server to not set the ecdh curve.


> I wonder if that made OpenSSL override the min protocol version and switch
> to a TLS1.3 ClientHello since it otherwise couldn't announce the curve.

The client seems to announce the curve in the initial ClientHello even with
1.3 as the minimum version.

What *does* make the HRR go away is setting ssl_max_protocol_version=TLSv1.2
on the client side.


https://wiki.openssl.org/index.php/TLS1.3 says:

> In practice most clients will use X25519 or P-256 for their initial
> key_share. For maximum performance it is recommended that servers are
> configured to support at least those two groups and clients use one of those
> two for its initial key_share. This is the default case (OpenSSL clients
> will use X25519).

We're not allowing both groups and the client defaults to X25519, hence
the HRR.


> If you force the client min protocol to 1.3 in an unpatched client, do you
> see the same speedup?

Nope.

Greetings,

Andres Freund




Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Jacob Champion
On Mon, Jun 17, 2024 at 10:01 AM Andres Freund  wrote:
> On 2024-06-17 12:00:30 +0200, Daniel Gustafsson wrote:
> > To set the specified curve in ssl_ecdh_curve we have to don't we?
>
> Sure, but it's not obvious to me why we actually want to override openssl's
> defaults here. There's not even a parameter to opt out of forcing a specific
> choice on the server side.

I had exactly the same question in the context of the other thread, and found


https://www.openssl.org/blog/blog/2022/10/21/tls-groups-configuration/index.html

My initial takeaway was that our default is more restrictive than it
should be, but the OpenSSL default is more permissive than what they
recommend in practice, due to denial of service concerns:

> A general recommendation is to limit the groups to those that meet the
> required security level and that all the potential TLS clients support.

--Jacob




Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Daniel Gustafsson
> On 17 Jun 2024, at 19:01, Andres Freund  wrote:
> On 2024-06-17 12:00:30 +0200, Daniel Gustafsson wrote:
>>> On 17 Jun 2024, at 01:46, Andres Freund  wrote:

>>> I don't know if it's good that we're calling SSL_CTX_set_tmp_ecdh at all,
>> 
>> To set the specified curve in ssl_ecdh_curve we have to don't we?
> 
> Sure, but it's not obvious to me why we actually want to override openssl's
> defaults here. There's not even a parameter to opt out of forcing a specific
> choice on the server side.

I agree that the GUC is a bit rough around the edges; maybe leaving it blank or
something similar should be defined as "use the OpenSSL defaults".  Let's bring that to Erica's
patch for allowing a list of curves.

>>> I did confirm that doing the same thing on the client side removes the
>>> additional roundtrip.
>> 
>> The roundtrip went away because the client was set to use secp256r1?
> 
> Yes. Or if I change the server to not set the ecdh curve.

Configuring the server to use x25519 instead of secp256r1 should achieve the
same thing.

>> I wonder if that made OpenSSL override the min protocol version and switch
>> to a TLS1.3 ClientHello since it otherwise couldn't announce the curve.
> 
> The client seems to announce the curve in the initial ClientHello even with
> 1.3 as the minimum version.

With 1.3 it should announce it in the ClientHello; do you mean that it's announced
when 1.2 is the minimum version as well?  It does make sense, since a 1.2 server
is defined to disregard all extensions.

> What *does* make the HRR go away is setting ssl_max_protocol_version=TLSv1.2
> on the client side.

Makes sense, that would remove the curve and there is no change required.

> https://wiki.openssl.org/index.php/TLS1.3 says:
> 
>> In practice most clients will use X25519 or P-256 for their initial
>> key_share. For maximum performance it is recommended that servers are
>> configured to support at least those two groups and clients use one of those
>> two for its initial key_share. This is the default case (OpenSSL clients
>> will use X25519).
> 
> We're not allowing both groups and the client defaults to X25519, hence
> the HRR.

So this would be solved by the curve-list patch referenced above, especially if
we allow it to have an opt-out to use the OpenSSL defaults.

--
Daniel Gustafsson





tls 1.3: sending multiple tickets

2024-06-17 Thread Andres Freund
Hi,

To investigate an unrelated issue, I set up key logging in the backend (we
probably should build that in) and looked at the decrypted data.  And noticed
that just after TLS setup finished the server sends three packets in a row:

C->S: TLSv1.3: finished
C->S: TLSv1.3: application data (startup message)
S->C: TLSv1.3: new session ticket
S->C: TLSv1.3: new session ticket
S->C: TLSv1.3: application data (authentication ok, parameter status+)


We try to turn off session resumption, but not completely enough for 1.3:
   SSL_OP_NO_TICKET

   SSL/TLS supports two mechanisms for resuming sessions: session ids and
   stateless session tickets.

   When using session ids a copy of the session information is cached on
   the server and a unique id is sent to the client. When the client wishes
   to resume it provides the unique id so that the server can retrieve the
   session information from its cache.

   When using stateless session tickets the server uses a session ticket
   encryption key to encrypt the session information. This encrypted data
   is sent to the client as a "ticket". When the client wishes to resume it
   sends the encrypted data back to the server. The server uses its key to
   decrypt the data and resume the session. In this way the server can
   operate statelessly - no session information needs to be cached locally.

   The TLSv1.3 protocol only supports tickets and does not directly support
   session ids. However, OpenSSL allows two modes of ticket operation in
   TLSv1.3: stateful and stateless. Stateless tickets work the same way as
   in TLSv1.2 and below. Stateful tickets mimic the session id behaviour
   available in TLSv1.2 and below. The session information is cached on the
   server and the session id is wrapped up in a ticket and sent back to the
   client. When the client wishes to resume, it presents a ticket in the
   same way as for stateless tickets. The server can then extract the
   session id from the ticket and retrieve the session information from its
   cache.

   By default OpenSSL will use stateless tickets. The SSL_OP_NO_TICKET
   option will cause stateless tickets to not be issued. In TLSv1.2 and
   below this means no ticket gets sent to the client at all. In TLSv1.3 a
   stateful ticket will be sent. This is a server-side option only.

   In TLSv1.3 it is possible to suppress all tickets (stateful and
   stateless) from being sent by calling SSL_CTX_set_num_tickets(3) or
   SSL_set_num_tickets(3).


Note the second-to-last paragraph: because we use SSL_OP_NO_TICKET we trigger
the use of stateful tickets, which afaict are never going to be useful, because
we don't share the necessary state.

I guess openssl really could have inferred this from the fact that we *do*
call SSL_CTX_set_session_cache_mode(SSL_SESS_CACHE_OFF), but it doesn't.


Seems we ought to use SSL_CTX_set_num_tickets() to prevent issuing the useless
tickets?
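
Roughly something like this, next to where we already set SSL_OP_NO_TICKET in
be_tls_init(), is what I have in mind (sketch only; the
HAVE_SSL_CTX_SET_NUM_TICKETS symbol is an invented stand-in for an
"OpenSSL >= 1.1.1" configure check):

    #ifdef SSL_OP_NO_TICKET
        SSL_CTX_set_options(context, SSL_OP_NO_TICKET);
    #endif
    #ifdef HAVE_SSL_CTX_SET_NUM_TICKETS
        /* also suppress TLSv1.3 stateful tickets, which we can never resume */
        SSL_CTX_set_num_tickets(context, 0);
    #endif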



It seems like a buglet in openssl that it forces each session ticket to be
sent in its own packet (it does an explicit BIO_flush(), so even if we
buffered between openssl and the OS, as I think we should, we'd still send it
separately), but I don't really understand most of this stuff.

Greetings,

Andres Freund




Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 19:29:47 +0200, Daniel Gustafsson wrote:
> >> I wonder if that made OpenSSL override the min protocol version and switch
> >> to a TLS1.3 ClientHello since it otherwise couldn't announce the curve.
> >
> > The client seems to announce the curve in the initial ClientHello even with
> > 1.3 as the minimum version.
>
> With 1.3 it should announce it in ClientHello, do you mean that it's announced
> when 1.2 is the minimum version as well?  It does make sense since a 1.2 
> server
> is defined to disregard all extensions.

Yes, it's announced even when 1.2 is the minimum:

Extension: supported_versions (len=5) TLS 1.3, TLS 1.2
Type: supported_versions (43)
Length: 5
Supported Versions length: 4
Supported Version: TLS 1.3 (0x0304)
Supported Version: TLS 1.2 (0x0303)
...
Extension: key_share (len=38) x25519
Type: key_share (51)
Length: 38
Key Share extension



> Let's bring that to Erica's patch for allowing a list of curves.

I'm kinda wondering if we ought to do something about this in the
backbranches. Forcing unnecessary roundtrips onto everyone for the next five
years due to an oversight on our part isn't great.  Once you're not local, the
roundtrip does measurably increase the "time to first query".

Greetings,

Andres Freund




Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Daniel Gustafsson
> On 17 Jun 2024, at 19:44, Andres Freund  wrote:

>> Let's bring that to Erica's patch for allowing a list of curves.
> 
> I'm kinda wondering if we ought to do something about this in the
> backbranches. Forcing unnecessary roundtrips onto everyone for the next five
> years due to an oversight on our part isn't great.  Once you're not local, the
> roundtrip does measurably increase the "time to first query".

I don't disagree, but wouldn't it be the type of behavioural change which we
typically try to avoid in backbranches?  Changing the default of the ecdh GUC
would perhaps be doable?  (assuming that's a working solution to avoid the
roundtrip).  Amending the documentation is the one thing we certainly can do
but 99.9% of affected users won't know they are affected so won't look for that
section.

--
Daniel Gustafsson





Re: ecdh support causes unnecessary roundtrips

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 19:51:45 +0200, Daniel Gustafsson wrote:
> > On 17 Jun 2024, at 19:44, Andres Freund  wrote:
> 
> >> Let's bring that to Erica's patch for allowing a list of curves.
> > 
> > I'm kinda wondering if we ought to do something about this in the
> > backbranches. Forcing unnecessary roundtrips onto everyone for the next five
> > years due to an oversight on our part isn't great.  Once you're not local, 
> > the
> > roundtrip does measurably increase the "time to first query".
> 
> I don't disagree, but wouldn't it be the type of behavioural change which we
> typically try to avoid in backbranches?

Yea, it's not great. Not sure what the right thing is here.


> Changing the default of the ecdh GUC would perhaps be doable?

I was wondering whether we could change the default so that it accepts both
x25519 and secp256r1. Unfortunately that seems to require changing what we
use to set the parameter...
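
For illustration, accepting both would mean replacing the single-curve setup in
initialize_ecdh() with something like the groups-list API (a sketch; assumes
OpenSSL >= 1.1.1, and the default value and error wording are illustrative):

    if (SSL_CTX_set1_groups_list(context, "X25519:prime256v1") != 1)
    {
        ereport(isServerStart ? FATAL : LOG,
                (errcode(ERRCODE_CONFIG_FILE_ERROR),
                 errmsg("ECDH: could not set the list of curves")));
        return false;
    }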


> (assuming that's a working solution to avoid the roundtrip).

It is.


> Amending the documentation is the one thing we certainly can do but 99.9% of
> affected users won't know they are affected so won't look for that section.

Yea. It's also possible that some other bindings changed their default to
match ours...

Greetings,

Andres Freund




Re: IPC::Run accepts bug reports

2024-06-17 Thread Robert Haas
On Sat, Jun 15, 2024 at 7:48 PM Noah Misch  wrote:
> Separating this from the pytest thread:
>
> On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > The one
> > thing I know about that *I* think is a pretty big problem about Perl
> > is that IPC::Run is not really maintained.
>
> I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> NetBSD-10-specific behavior coping.

I'm not concerned about any specific open issue; my concern is about
the health of that project. https://metacpan.org/pod/IPC::Run says
that this module is seeking new maintainers, and it looks like the
people listed as current maintainers are mostly inactive. Instead,
you're fixing stuff. That's great, but we ideally want PostgreSQL's
dependencies to be things that are used widely enough that we don't
end up maintaining them ourselves.

I apologize if my comment came across as disparaging your efforts;
that was not my intent.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid orphaned objects dependencies, take 3

2024-06-17 Thread Bertrand Drouvot
Hi,

On Mon, Jun 17, 2024 at 12:24:46PM -0400, Robert Haas wrote:
> On Fri, Jun 14, 2024 at 3:54 AM Bertrand Drouvot
>  wrote:
> > > Ah, right. So, I was assuming that, with either this version of your
> > > patch or the earlier version, we'd end up locking the constraint
> > > itself. Was I wrong about that?
> >
> > The child contraint itself is not locked when going through
> > ConstraintSetParentConstraint().
> >
> > While at it, let's look at a full example and focus on your concern.
> 
> I'm not at the point of having a concern yet, honestly. I'm trying to
> understand the design ideas. The commit message just says that we take
> a conflicting lock, but it doesn't mention which object types that
> principle does or doesn't apply to. I think the idea of skipping it
> for cases where it's redundant with the relation lock could be the
> right idea, but if that's what we're doing, don't we need to explain
> the principle somewhere? And shouldn't we also apply it across all
> object types that have the same property?

Yeah, I still need to deeply study this area and document it. 

> Along the same lines:
> 
> + /*
> + * Those don't rely on LockDatabaseObject() when being dropped (see
> + * AcquireDeletionLock()). Also it looks like they can not produce
> + * orphaned dependent objects when being dropped.
> + */
> + if (object->classId == RelationRelationId || object->classId ==
> AuthMemRelationId)
> + return;
> 
> "It looks like X cannot happen" is not confidence-inspiring.

Yeah, it is not. It is just a "feeling" that I need to work on to remove
any ambiguity and/or adjust the code as needed.

> At the
> very least, a better comment is needed here. But also, that relation
> has no exception for AuthMemRelationId, only for RelationRelationId.
> And also, the exception for RelationRelationId doesn't imply that we
> don't need a conflicting lock here: the special case for
> RelationRelationId in AcquireDeletionLock() is necessary because the
> lock tag format is different for relations than for other database
> objects, not because we don't need a lock at all. If the handling here
> were really symmetric with what AcquireDeletionLock(), the coding
> would be to call either LockRelationOid() or LockDatabaseObject()
> depending on whether classid == RelationRelationId.

Agree.

> Now, that isn't
> actually necessary, because we already have relation-locking calls
> elsewhere, but my point is that the rationale this commit gives for
> WHY it isn't necessary seems to me to be wrong in multiple ways.

Agree. I'm not done with that part yet (should have made it more clear).

> So to try to sum up here: I'm not sure I agree with this design. But I
> also feel like the design is not as clear and consistently implemented
> as it could be. So even if I just ignored the question of whether it's
> the right design, it feels like we're a ways from having something
> potentially committable here, because of issues like the ones I
> mentioned in the last paragraph.
> 

Agree. I'll now move on with the "XXX Do we need a lock for RelationRelationId?"
comments that I put in v10 (see [1]) and study all the cases around them.

Once done, I think that it will easier to 1.remove ambiguity, 2.document and
3.do the "right" thing regarding the RelationRelationId object class.

[1]: 
https://www.postgresql.org/message-id/ZnAVEBhlGvpDDVOD%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: improve predefined roles documentation

2024-06-17 Thread Robert Haas
On Thu, Jun 13, 2024 at 3:48 PM Nathan Bossart  wrote:
> I think we could improve matters by abandoning the table and instead
> documenting these roles more like we document GUCs, i.e., each one has a
> section below it where we can document it in as much detail as we want.
> Some of these roles should probably be documented together (e.g.,
> pg_read_all_data and pg_write_all_data), so the ordering is unlikely to be
> perfect, but I'm hoping it would still be a net improvement.

+1. I'm not sure about all of the details, but I like the general idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: IPC::Run accepts bug reports

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-15 16:48:24 -0700, Noah Misch wrote:
> Separating this from the pytest thread:
>
> On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > The one
> > thing I know about that *I* think is a pretty big problem about Perl
> > is that IPC::Run is not really maintained.
>
> I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> NetBSD-10-specific behavior coping.

1) Sometimes hangs hard on windows if started processes have not been shut
   down before script exits.  I've mostly encountered this via the buildfarm /
   CI, so I never had a good way of narrowing this down. It's very painful
   because things seem to often just get stuck once that happens.

2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack
   Broken pipe:" (in _do_filters()). There's plenty reports of this on the
   list, and I've hit this several times personally. It seems to be timing
   dependent, I've encountered it after seemingly irrelevant ordering changes.

   I suspect I could create a reproducer with a bit of time.

3) It's very slow on windows (in addition to the windows process
   slowness). That's got to be fixable to some degree.

Greetings,

Andres Freund




Re: Document NULL

2024-06-17 Thread David G. Johnston
On Sat, May 11, 2024 at 11:00 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Though I haven’t settled on a phrasing I really like.  But I’m trying to
> avoid a parenthetical.
>
>
Settled on this:

The cardinal rule, a null value is neither equal nor unequal
to any value, including other null values.

I've been tempted to just say, "to any value.", but cannot quite bring
myself to do it...

David J.


Re: Direct SSL connection and ALPN loose ends

2024-06-17 Thread Andres Freund
Hi,

On 2024-04-29 18:24:04 +0300, Heikki Linnakangas wrote:
> All reported bugs have now been fixed. We now enforce ALPN in all the right
> places. Please let me know if I missed something.

Very minor and not really your responsibility:

If provided with the necessary key information, wireshark can decode TLS
exchanges when using sslnegotiation=postgres but not with direct. Presumably
it needs to be taught postgres' ALPN id or something.

Example with direct:

  476 6513.310308457 192.168.0.113 → 192.168.0.200 48978 5432 142 TLSv1.3 Finished
  477 6513.310341492 192.168.0.113 → 192.168.0.200 48978 5432 151 TLSv1.3 Application Data
  478 6513.320730295 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket
  479 6513.320745684 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket
  480 6513.321175713 192.168.0.113 → 192.168.0.200 48978 5432 68 TCP 48978 → 5432 [ACK] Seq=915 Ack=1665 Win=62848 Len=0 TSval=3779915421 TSecr=3469016093
  481 6513.323161553 192.168.0.200 → 192.168.0.113 5432 48978 518 TLSv1.3 Application Data
  482 6513.323626180 192.168.0.113 → 192.168.0.200 48978 5432 125 TLSv1.3 Application Data
  483 6513.333977769 192.168.0.200 → 192.168.0.113 5432 48978 273 TLSv1.3 Application Data
  484 6513.334581920 192.168.0.113 → 192.168.0.200 48978 5432 95 TLSv1.3 Application Data
  485 6513.334666116 192.168.0.113 → 192.168.0.200 48978 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify)

Example with postgres:

  502 6544.752799560 192.168.0.113 → 192.168.0.200 46300 5432 142 TLSv1.3 Finished
  503 6544.752842863 192.168.0.113 → 192.168.0.200 46300 5432 151 PGSQL >?
  504 6544.76315 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket
  505 6544.763163155 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket
  506 6544.763587595 192.168.0.113 → 192.168.0.200 46300 5432 68 TCP 46300 → 5432 [ACK] Seq=923 Ack=1666 Win=62848 Len=0 TSval=3779946864 TSecr=3469047536
  507 6544.765024827 192.168.0.200 → 192.168.0.113 5432 46300 518 PGSQL Q
  509 6544.776974164 192.168.0.200 → 192.168.0.113 5432 46300 273 PGSQL X
  511 6544.777631520 192.168.0.113 → 192.168.0.200 46300 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify)

Note that in the second one it knows what's inside the "Application Data"
messages and decodes them (S: startup, authentication ok, parameters, cancel 
key,
ready for query, C: simple query, S: description, 10 rows, command complete,
ready for query).

In the GUI you can obviously go into the "postgres messages" in more detail
than I know how to do on the console.



A second aspect is that I'm not super happy about the hack of stashing data
into Port.  I think medium term we'd be better off separating out the
buffering for unencrypted and encrypted data properly. It turns out that not
having any buffering *below* openssl (i.e. the encrypted data) hurts both for
the send and receive side, due to a) increased number of syscalls b) too many
small packets being sent, as we use TCP_NODELAY c) kernel memory copies being
slower due to the small increments.

Greetings,

Andres Freund




Re: Refactoring backend fork+exec code

2024-06-17 Thread Nathan Bossart
While looking into [0], I noticed that main() still only checks for the
--fork prefix, but IIUC commit aafc05d removed all --fork* options except
for --forkchild.  I've attached a patch to strengthen the check in main().
This is definitely just a nitpick.

[0] 
https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com

-- 
nathan
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index bfd0c5ed65..4672aab837 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -185,7 +185,7 @@ main(int argc, char *argv[])
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
BootstrapModeMain(argc, argv, false);
 #ifdef EXEC_BACKEND
-   else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
+   else if (argc > 1 && strncmp(argv[1], "--forkchild", 11) == 0)
SubPostmasterMain(argc, argv);
 #endif
else if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)


Re: Add support to TLS 1.3 cipher suites and curves lists

2024-06-17 Thread Andres Freund
Hi,

This thread was referenced by 
https://www.postgresql.org/message-id/48F0A1F8-E0B4-41F8-990F-41E6BA2A6185%40yesql.se

On 2024-06-13 14:34:27 +0800, Erica Zhang wrote:

> diff --git a/src/backend/libpq/be-secure-openssl.c 
> b/src/backend/libpq/be-secure-openssl.c
> index 39b1a66236..d097e81407 100644
> --- a/src/backend/libpq/be-secure-openssl.c
> +++ b/src/backend/libpq/be-secure-openssl.c
> @@ -1402,30 +1402,30 @@ static bool
>  initialize_ecdh(SSL_CTX *context, bool isServerStart)
>  {
>  #ifndef OPENSSL_NO_ECDH
> - EC_KEY *ecdh;
> - int nid;
> + char*curve_list = strdup(SSLECDHCurve);

ISTM we'd want to eventually rename the GUC variable to indicate it's a list?
I think the "ecdh" portion is actually not accurate anymore either; it's used
outside of ecdh if I understand correctly (though perhaps I don't)?


> + char*saveptr;
> + char*token = strtok_r(curve_list, ":", &saveptr);
> + int nid;
>  
> - nid = OBJ_sn2nid(SSLECDHCurve);
> - if (!nid)
> + while (token != NULL)

It'd be good to have a comment explaining why we're parsing the list ourselves
instead of using just the builtin SSL_CTX_set1_groups_list().

>   {
> - ereport(isServerStart ? FATAL : LOG,
> + nid = OBJ_sn2nid(token);
> + if (!nid)
> + {ereport(isServerStart ? FATAL : LOG,
>   (errcode(ERRCODE_CONFIG_FILE_ERROR),
> -  errmsg("ECDH: unrecognized curve name: %s", 
> SSLECDHCurve)));
> +  errmsg("ECDH: unrecognized curve name: %s", 
> token)));
>   return false;
> + }
> + token = strtok_r(NULL, ":", &saveptr);
>   }
>  
> - ecdh = EC_KEY_new_by_curve_name(nid);
> - if (!ecdh)
> + if(SSL_CTX_set1_groups_list(context, SSLECDHCurve) !=1)
>   {
>   ereport(isServerStart ? FATAL : LOG,
>   (errcode(ERRCODE_CONFIG_FILE_ERROR),
> -  errmsg("ECDH: could not create key")));
> +  errmsg("ECDH: failed to set curve names")));

Probably worth including the value of the GUC here?



This also needs to change the documentation for the GUC.



Once we have this parameter we probably should add at least x25519 to the
allowed list, as that's the client side default these days.

But that can be done in a separate patch.

Greetings,

Andres Freund




Re: Virtual generated columns

2024-06-17 Thread Tomasz Rybak
On Wed, 2024-05-22 at 19:22 +0200, Peter Eisentraut wrote:
> On 29.04.24 10:23, Peter Eisentraut wrote:
> > Here is a patch set to implement virtual generated columns.
> 
> > The main feature patch (0005 here) generally works but has a number
> > of 
> > open corner cases that need to be thought about and/or fixed, many
> > of 
> > which are marked in the code or the tests.  I'll continue working
> > on 
> > that.  But I wanted to see if I can get some feedback on the test 
> > structure, so I don't have to keep changing it around later.
> 
> Here is an updated patch set.  It needed some rebasing, especially 
> around the reverting of the catalogued not-null constraints.  I have 
> also fixed up the various incomplete or "fixme" pieces of code
> mentioned 
> above.  I have in most cases added "not supported yet" error messages
> for now, with the idea that some of these things can be added in
> later, 
> as incremental features.
> 

This is not (yet) a full review.

Patches applied cleanly on 76618097a6c027ec603a3dd143f61098e3fb9794
from 2024-06-14.
I've run
./configure && make world-bin && make check && make check-world
on 0001, then 0001+0002, then 0001+0002+0003, up to applying
all 5 patches. All cases passed on Debian unstable on aarch64 (ARM64)
on gcc (Debian 13.2.0-25) 13.2.0.

v1-0001-Rename-regress-test-generated-to-generated_stored.patch:
no objections here, makes sense as preparation for future changes

v1-0002-Put-generated_stored-test-objects-in-a-schema.patch:
also no objections.
OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out,
tablespace.out) create schemas and later drop them - so here
it might also make sense to drop the schema at the end of testing.

v1-0003-Remove-useless-initializations.patch:
All other cases (I checked directory src/backend/utils/cache)
calling MemoryContextAllocZero only initialize fields when they
are non-zero, so removing partial initialization with false brings
consistency to the code.

v1-0004-Remove-useless-code.patch:
The patch removes the filling in of constraints in function
BuildDescForRelation. This function is only called from
view.c and tablecmds.c (twice). In none of those cases
is result->constr used, so the proposed change makes sense.
While I do not know the code well, and so might be wrong here,
I would apply this patch.

I haven't looked at the most important (and biggest) file yet,
v1-0005-Virtual-generated-columns.patch; hope to look at it
this week.
At the same time I believe 0001-0004 can be applied - even backported
if it'll make maintenance of future changes easier. But that should
be the committer's decision.


Best regards

-- 
Tomasz Rybak, Debian Developer 
GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C




cost delay brainstorming

2024-06-17 Thread Robert Haas
Hi,

As I mentioned in my talk at 2024.pgconf.dev, I think that the biggest
problem with autovacuum as it exists today is that the cost delay is
sometimes too low to keep up with the amount of vacuuming that needs
to be done. I sketched a solution during the talk, but it was very
complicated, so I started to try to think of simpler ideas that might
still solve the problem, or at least be better than what we have
today.

I think we might be able to get fairly far by observing that if the
number of running autovacuum workers is equal to the maximum allowable
number of running autovacuum workers, that may be a sign of trouble,
and the longer that situation persists, the more likely it is that
we're in trouble. So, a very simple algorithm would be: If the maximum
number of workers have been running continuously for more than, say,
10 minutes, assume we're falling behind and exempt all workers from
the cost limit for as long as the situation persists. One could
criticize this approach on the grounds that it causes a very sudden
behavior change instead of, say, allowing the rate of vacuuming to
gradually increase. I'm curious to know whether other people think
that would be a problem.
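
To make the shape of that concrete, here's a rough sketch of what the
launcher-side check could look like (all of the av_* names are invented;
this is not a patch):

    /* Hypothetical state maintained by the autovacuum launcher. */
    static TimestampTz av_all_busy_since = 0;

    /* In the launcher's main loop: */
    if (av_running_workers >= autovacuum_max_workers)
    {
        if (av_all_busy_since == 0)
            av_all_busy_since = GetCurrentTimestamp();
        else if (TimestampDifferenceExceeds(av_all_busy_since,
                                            GetCurrentTimestamp(),
                                            10 * 60 * 1000))    /* 10 minutes */
            av_ignore_cost_limit = true;    /* workers re-check this */
    }
    else
    {
        av_all_busy_since = 0;
        av_ignore_cost_limit = false;
    }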

I think it might be OK, for a couple of reasons:

1. I'm unconvinced that the vacuum_cost_delay system actually prevents
very many problems. I've fixed a lot of problems by telling users to
raise the cost limit, but virtually never by lowering it. When we
lowered the delay by an order of magnitude a few releases ago -
equivalent to increasing the cost limit by an order of magnitude - I
didn't personally hear any complaints about that causing problems. So
disabling the delay completely some of the time might just be fine.

1a. Incidentally, when I have seen problems because of vacuum running
"too fast", it's not been because it was using up too much I/O
bandwidth, but because it's pushed too much data out of cache too
quickly. A long overnight vacuum can evict a lot of pages from the
system page cache by morning - the ring buffer only protects our
shared_buffers, not the OS cache. I don't think this can be fixed by
rate-limiting vacuum, though: to keep the cache eviction at a level
low enough that you could be certain of not causing trouble, you'd
have to limit it to an extremely low rate which would just cause
vacuuming not to keep up. The cure would be worse than the disease at
that point.

2. If we decided to gradually increase the rate of vacuuming instead
of just removing the throttling all at once, what formula would we use
and why would that be the right idea? We'd need a lot of state to
really do a calculation of how fast we would need to go in order to
keep up, and that starts to rapidly turn into a very complicated
project along the lines of what I mooted in Vancouver. Absent that,
the only other idea I have is to gradually ramp up the cost limit
higher and higher, which we could do, but we would have no idea how
fast to ramp it up, so anything we do here feels like it's just
picking random numbers and calling them an algorithm.

If you like this idea, I'd like to know that, and hear any further
thoughts you have about how to improve or refine it. If you don't, I'd
like to know that, too, and any alternatives you can propose,
especially alternatives that don't require crazy amounts of new
infrastructure to implement.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: FYI: LLVM Runtime Crash

2024-06-17 Thread David E. Wheeler
On Jun 17, 2024, at 11:52, David E. Wheeler  wrote:

> I don’t *think* it’s something that can be fixed in Postgres core. This is 
> mostly in FYI in case anyone else runs into this issue or knows something 
> more about it.

Okay, a response to the issue[1] says the bug is in Postgres:

> The error message is LLVM reporting the backend can't handle the particular 
> form of "probe-stack" attribute in the input LLVM IR. So this is likely a bug 
> in the way postgres is generating LLVM IR: please file a bug against 
> Postgres. (Feel free to reopen if you have some reason to believe the issue 
> is on the LLVM side.)

Would it be best for me to send a report to pgsql-bugs?

Best,

David

[1]: https://github.com/llvm/llvm-project/issues/95804#issuecomment-2174310977





Re: [PATCH] Improve error message when trying to lock virtual tuple.

2024-06-17 Thread Matthias van de Meent
(now send a copy to -hackers, too)

On Mon, 17 Jun 2024 at 17:55, Sven Klemm  wrote:
>
> Hello,
>
> When currently trying to lock a virtual tuple the returned error
> will be a misleading `could not read block 0`. This patch adds a
> check for the tuple table slot being virtual to produce a clearer
> error.
>
> This can be triggered by extensions returning virtual tuples.
> While this is of course an error in those extensions the resulting
> error is very misleading.

I think you're solving the wrong problem here, as I can't think of a
place where both virtual tuple slots and tuple locking are allowed at
the same time in core code.

I mean, in which kind of situation could we get a Relation's table
slot which is not lockable by said relation's AM? Assuming the "could
not read block 0" error comes from the heap code, why does the
assertion in heapam_tuple_lock that checks for a
BufferHeapTupleTableSlot not fire before this `block 0` error? If the
error is not in the heapam code, could you show an example of the code
that breaks with that error code?


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Xact end leaves CurrentMemoryContext = TopMemoryContext

2024-06-17 Thread Tom Lane
AtCommit_Memory and friends have done $SUBJECT for at least a couple
of decades, but in the wake of analyzing bug #18512 [1], I'm feeling
like that's a really bad idea.  There is too much code running
around the system that assumes that it's fine to leak stuff in
CurrentMemoryContext.  If we execute any such thing between
AtCommit_Memory and the next AtStart_Memory, presto: we have a
session-lifespan memory leak.  I'm almost feeling that we should
have a policy that CurrentMemoryContext should never point at
TopMemoryContext.

As to what to do about it: I'm imagining that instead of resetting
CurrentMemoryContext to TopMemoryContext, we set it to some special
context that we expect we can reset every so often, like at the start
of the next transaction.  The existing TransactionAbortContext is
a very similar thing, and maybe could be repurposed/shared with this
usage.
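
Just to sketch the shape of that idea (the context name is invented, and the
existing TopTransactionContext handling is elided):

    static MemoryContext InterXactContext = NULL;

    static void
    AtCommit_Memory(void)
    {
        if (InterXactContext == NULL)
            InterXactContext = AllocSetContextCreate(TopMemoryContext,
                                                     "InterXactContext",
                                                     ALLOCSET_DEFAULT_SIZES);
        /* instead of MemoryContextSwitchTo(TopMemoryContext): */
        MemoryContextSwitchTo(InterXactContext);
    }

    static void
    AtStart_Memory(void)
    {
        /* anything leaked between transactions goes away here */
        if (InterXactContext != NULL)
            MemoryContextReset(InterXactContext);
    }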

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/18512-6e89f654d7da884d%40postgresql.org




Re: FYI: LLVM Runtime Crash

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 16:07:49 -0400, David E. Wheeler wrote:
> On Jun 17, 2024, at 11:52, David E. Wheeler  wrote:
> 
> > I don’t *think* it’s something that can be fixed in Postgres core. This is 
> > mostly in FYI in case anyone else runs into this issue or knows something 
> > more about it.
> 
> Okay, a response to the issue[1] says the bug is in Postgres:
>
> > The error message is LLVM reporting the backend can't handle the particular 
> > form of "probe-stack" attribute in the input LLVM IR. So this is likely a 
> > bug in the way postgres is generating LLVM IR: please file a bug against 
> > Postgres. (Feel free to reopen if you have some reason to believe the issue 
> > is on the LLVM side.)

I suspect the issue might be that the versions of clang and LLVM are diverging
too far. Does it work if you pass CLANG=/opt/homebrew/opt/llvm/bin/clang to
configure?

Greetings,

Andres Freund




Re: FYI: LLVM Runtime Crash

2024-06-17 Thread David E. Wheeler
On Jun 17, 2024, at 16:37, Andres Freund  wrote:

> I suspect the issue might be that the version of clang and LLVM are diverging
> too far. Does it work if you pass CLANG=/opt/homebrew/opt/llvm/bin/clang to
> configure?

It does! It didn’t occur to me this would be the issue, but I presume
/usr/bin/clang is not compatible with the latest LLVM installed from Homebrew. 
Interesting! I’ll update that issue.

Thanks,

David





Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-06-17 Thread John H
Hi Ashutosh,

Thinking about this more, could you clarify the problem/issue at hand?
I think it's still not clear to me.
Yes, CREATE EXTENSION can create functions that lead to unexpected
privilege escalation, regardless of whether they are SECURITY DEFINER or
SECURITY INVOKER (if the function is inadvertently executed by a superuser).
But that's also true for a general CREATE FUNCTION call outside of extensions.

If I read the commit message I see:

> This flag controls PostgreSQL's behavior in setting the implicit
> search_path within the proconfig for functions created by an extension
> that does not have the search_path explicitly set in proconfig

If that's the "general" issue you're trying to solve, I would wonder
why we wouldn't, for instance, be extending
the functionality to normal CREATE FUNCTION calls (i.e., schema qualify
based off search_path).

Thanks,
John H

On Thu, Jun 13, 2024 at 1:09 AM Ashutosh Sharma  wrote:
>
> Hi,
>
> Please find the attached patch addressing all the comments. I kindly
> request your review and feedback. Your thoughts are greatly
> appreciated.
>
> --
> With Regards,
> Ashutosh Sharma.



-- 
John Hsu - Amazon Web Services




Re: cost delay brainstorming

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 15:39:27 -0400, Robert Haas wrote:
> As I mentioned in my talk at 2024.pgconf.dev, I think that the biggest
> problem with autovacuum as it exists today is that the cost delay is
> sometimes too low to keep up with the amount of vacuuming that needs
> to be done.

I agree it's a big problem, not sure it's *the* problem. But I'm happy to see
it improved anyway, so it doesn't really matter.

One issue around all of this is that we pretty much don't have the tools to
analyze autovacuum behaviour across a larger number of systems in a realistic
way :/.  I find my own view of what precisely the problem is being heavily
swayed by the last few problematic cases I've looked at.



> I think we might able to get fairly far by observing that if the
> number of running autovacuum workers is equal to the maximum allowable
> number of running autovacuum workers, that may be a sign of trouble,
> and the longer that situation persists, the more likely it is that
> we're in trouble. So, a very simple algorithm would be: If the maximum
> number of workers have been running continuously for more than, say,
> 10 minutes, assume we're falling behind and exempt all workers from
> the cost limit for as long as the situation persists. One could
> criticize this approach on the grounds that it causes a very sudden
> behavior change instead of, say, allowing the rate of vacuuming to
> gradually increase. I'm curious to know whether other people think
> that would be a problem.

Another issue is that it's easy to fall behind due to cost limits on systems
where autovacuum_max_workers is smaller than the number of busy tables.

IME one common situation is to have a single table that's being vacuumed too
slowly due to cost limits, with everything else keeping up easily.


> I think it might be OK, for a couple of reasons:
>
> 1. I'm unconvinced that the vacuum_cost_delay system actually prevents
> very many problems. I've fixed a lot of problems by telling users to
> raise the cost limit, but virtually never by lowering it. When we
> lowered the delay by an order of magnitude a few releases ago -
> equivalent to increasing the cost limit by an order of magnitude - I
> didn't personally hear any complaints about that causing problems. So
> disabling the delay completely some of the time might just be fine.

I have seen disabling cost limits cause replication setups to fall over
because the amount of WAL increases beyond what can be
replicated/archived/replayed.  It's very easy to reach the issue when syncrep
is enabled.



> 1a. Incidentally, when I have seen problems because of vacuum running
> "too fast", it's not been because it was using up too much I/O
> bandwidth, but because it's pushed too much data out of cache too
> quickly. A long overnight vacuum can evict a lot of pages from the
> system page cache by morning - the ring buffer only protects our
> shared_buffers, not the OS cache. I don't think this can be fixed by
> rate-limiting vacuum, though: to keep the cache eviction at a level
> low enough that you could be certain of not causing trouble, you'd
> have to limit it to an extremely low rate which would just cause
> vacuuming not to keep up. The cure would be worse than the disease at
> that point.

I've seen versions of this too. Ironically it's often made way worse by
ringbuffers, because even if there is space in shared buffers, we'll not move
buffers there, instead putting a lot of pressure on the OS page cache.


> If you like this idea, I'd like to know that, and hear any further
> thoughts you have about how to improve or refine it. If you don't, I'd
> like to know that, too, and any alternatives you can propose,
> especially alternatives that don't require crazy amounts of new
> infrastructure to implement.

I unfortunately don't know what to do about all of this with just the existing
set of metrics :/.


One reason that cost limit can be important is that we often schedule
autovacuums on tables in useless ways, over and over. Without cost limits
providing *some* protection against using up all IO bandwidth / WAL volume, we
could end up doing even worse.


Common causes of such useless vacuums I've seen:

- Longrunning transaction prevents increasing relfrozenxid, we run autovacuum
  over and over on the same relation, using up the whole cost budget. This is
  particularly bad because often we'll not autovacuum anything else, building
  up a larger and larger backlog of actual work.

- Tables, where on-access pruning works very well, end up being vacuumed far
  too frequently, because our autovacuum scheduling doesn't know about tuples
  having been cleaned up by on-access pruning.

- Larger tables with occasional lock conflicts cause autovacuum to be
  cancelled and restarting from scratch over and over. If that happens before
  the second table scan, this can easily eat up the whole cost budget without
  making forward progress.


Greetings,

Andres Freund




Re: Reducing the log spam

2024-06-17 Thread Justin Pryzby
On Thu, May 02, 2024 at 12:47:45PM +0200, Laurenz Albe wrote:
> On Mon, 2024-03-11 at 09:33 +0100, Jelte Fennema-Nio wrote:
> > -   the subscriber's server log.
> > +   the subscriber's server log if you remove 23505 from
> > +   .
> > 
> > This seems like a pretty big regression. Being able to know why your
> > replication got closed seems pretty critical.
> 
> Yes.  But I'd argue that that is a shortcoming of logical replication:
> there should be a ways to get this information via SQL.  Having to look into
> the log file is not a very useful option.
> 
> The feature will become much less useful if unique voilations keep getting 
> logged.

Uh, to be clear, your patch is changing the *defaults*, which I found
surprising, even after reading the thread.  Evidently, the current
behavior is not what you want, and you want to change it, but I'm *sure*
that whatever default you want to use at your site/with your application
is going to make someone else unhappy.  I surely want unique violations
to be logged, for example.

> @@ -6892,6 +6892,41 @@ local0.*/var/log/postgresql
>
>   
>  
> +  xreflabel="log_suppress_errcodes">
> +  log_suppress_errcodes (string)
> +  
> +   log_suppress_errcodes configuration 
> parameter
> +  
> +  
> +  
> +   
> +Causes ERROR and FATAL messages
> +from client backend processes with certain error codes to be excluded
> +from the log.
> +The value is a comma-separated list of five-character error codes as
> +listed in .  An error code that
> +represents a class of errors (ends with three zeros) suppresses 
> logging
> +of all error codes within that class.  For example, the entry
> +08000 (connection_exception)
> +would suppress an error with code 08P01
> +(protocol_violation).  The default setting is
> +23505,3D000,3F000,42601,42704,42883,42P01,57P03.
> +Only superusers and users with the appropriate SET
> +privilege can change this setting.
> +   

> +
> +   
> +This setting is useful to exclude error messages from the log that 
> are
> +frequent but irrelevant.

I think you should phrase the feature as ".. *allow* skipping error
logging for messages ... that are frequent but irrelevant for a given
site/role/DB/etc."

I suggest that this patch should not change the default behavior at all:
its default should be empty.  That you, personally, would plan to
exclude this or that error code is pretty uninteresting.  I think the
idea of changing the default behavior will kill the patch, and even if
you want to propose to do that, it should be a separate discussion.
Maybe you should make it an 002 patch.

> + {"log_suppress_errcodes", PGC_SUSET, LOGGING_WHEN,
> + gettext_noop("ERROR and FATAL messages with these error 
> codes don't get logged."),
> + NULL,
> + GUC_LIST_INPUT
> + },
> + &log_suppress_errcodes,
> + DEFAULT_LOG_SUPPRESS_ERRCODES,
> + check_log_suppress_errcodes, assign_log_suppress_errcodes, NULL

> +/*
> + * Default value for log_suppress_errcodes.  ERROR or FATAL messages with
> + * these error codes are never logged.  Error classes (error codes ending 
> with
> + * three zeros) match all error codes in the class.   The idea is to suppress
> + * messages that usually don't indicate a serious problem but tend to pollute
> + * the log file.
> + */
> +
> +#define DEFAULT_LOG_SUPPRESS_ERRCODES 
> "23505,3D000,3F000,42601,42704,42883,42P01,57P03"
> +

../src/backend/utils/errcodes.txt:23505    E    ERRCODE_UNIQUE_VIOLATION    unique_violation
../src/backend/utils/errcodes.txt:3D000    E    ERRCODE_INVALID_CATALOG_NAME    invalid_catalog_name
../src/backend/utils/errcodes.txt:3F000    E    ERRCODE_INVALID_SCHEMA_NAME    invalid_schema_name
../src/backend/utils/errcodes.txt:42601    E    ERRCODE_SYNTAX_ERROR    syntax_error
../src/backend/utils/errcodes.txt:3D000    E    ERRCODE_UNDEFINED_DATABASE
../src/backend/utils/errcodes.txt:42883    E    ERRCODE_UNDEFINED_FUNCTION    undefined_function
../src/backend/utils/errcodes.txt:3F000    E    ERRCODE_UNDEFINED_SCHEMA
../src/backend/utils/errcodes.txt:42P01    E    ERRCODE_UNDEFINED_TABLE    undefined_table
../src/backend/utils/errcodes.txt:42704    E    ERRCODE_UNDEFINED_OBJECT    undefined_object
../src/backend/utils/errcodes.txt:57P03    E    ERRCODE_CANNOT_CONNECT_NOW    cannot_connect_now

-- 
Justin




Re: Xact end leaves CurrentMemoryContext = TopMemoryContext

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 16:37:05 -0400, Tom Lane wrote:
> As to what to do about it: I'm imagining that instead of resetting
> CurrentMemoryContext to TopMemoryContext, we set it to some special
> context that we expect we can reset every so often, like at the start
> of the next transaction.  The existing TransactionAbortContext is
> a very similar thing, and maybe could be repurposed/shared with this
> usage.

One issue is that that could lead to hard to find use-after-free issues in
currently working code. Right now allocations made "between transactions"
live forever, if we just use a different context it won't anymore.

Particularly if the reset is only occasional, we'd make it hard to find
buggy allocations.

I wonder if we ought to set CurrentMemoryContext to NULL in that timeframe,
forcing code to explicitly choose what lifetime is needed, rather than just
defaulting such code into changed semantics.

Greetings,

Andres Freund




Re: cost delay brainstorming

2024-06-17 Thread Greg Sabino Mullane
On Mon, Jun 17, 2024 at 3:39 PM Robert Haas  wrote:

> So, a very simple algorithm would be: If the maximum number of workers
> have been running continuously for more than, say,
> 10 minutes, assume we're falling behind


Hmm, I don't know about the validity of this. I've seen plenty of cases
where we hit the max workers but all is just fine. On the other hand, I
don't have an alternative trigger point yet. But I do overall like the idea
of dynamically changing the delay. And agree it is pretty conservative.


> 2. If we decided to gradually increase the rate of vacuuming instead of
> just removing the throttling all at once, what formula would we use
> and why would that be the right idea?


Well, since the idea of disabling the delay is on the table, we could raise
the cost every minute by X% until we effectively reach an infinite cost /
zero delay situation. I presume this would only affect currently running
vacs, and future ones would get the default cost until things get triggered
again?
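
A sketch of that ramp, just to make it concrete (the names and the 10% step per
minute are invented, and the overflow cap is arbitrary):

    /* Run roughly once a minute while all workers stay busy: */
    if (av_all_workers_busy &&
        TimestampDifferenceExceeds(av_last_ramp_time, GetCurrentTimestamp(),
                                   60 * 1000))
    {
        /* raise the effective limit by 10%, capped to avoid overflow */
        av_effective_cost_limit = Min((int) (av_effective_cost_limit * 1.10),
                                      INT_MAX / 2);
        av_last_ramp_time = GetCurrentTimestamp();
    }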

Cheers,
Greg


Re: may be a buffer overflow problem

2024-06-17 Thread Daniel Gustafsson
> On 14 Jun 2024, at 17:18, Tom Lane  wrote:
> 
> I wrote:
>> Seeing that this code is exercised thousands of times a day in the
>> regression tests and has had a failure rate of exactly zero (and
>> yes, the tests do check the output), there must be some reason
>> why it's okay.
> 
> After looking a little closer, I think the reason why it works in
> practice is that there's always a few bytes of zero padding at the
> end of struct sqlca_t.
> 
> I don't see any value in changing individual code sites that are
> depending on that, because there are surely many more, both in
> our own code and end users' code.  What I suggest we ought to do
> is formalize the existence of that zero pad.  Perhaps like this:
> 
> char sqlstate[5];
> + char sqlstatepad; /* nul terminator for sqlstate */
> };
> 
> Another way could be to change
> 
> - char sqlstate[5];
> + char sqlstate[6];
> 
> but I fear that could have unforeseen consequences in code that
> is paying attention to sizeof(sqlstate).

Since sqlca is, according to our docs, present in other database systems we
should probably keep it a 5-char array for portability reasons.  Adding a
padding character should be fine though.

The attached adds padding and adjusts the tests and documentation to match.  I
kept the fprintf using %.*s to match other callers.  I don't know ECPG well
enough to have strong feelings wrt this being the right fix or not, and the age
of incorrect assumptions around that fprintf hints at this not being a huge
problem in reality (should still be fixed of course).

--
Daniel Gustafsson



ecgp_sqlstate-v2.diff
Description: Binary data


Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-17 Thread David E. Wheeler
On Jun 16, 2024, at 11:52, David E. Wheeler  wrote:

> I think that’s how it should be; I prefer that it raises errors by default 
> but you can silence them:
> 
> david=# select jsonb_path_query(target => '{"x": "hi"}', path => 
> '$.integer()', silent => false);
> ERROR:  jsonpath item method .integer() can only be applied to a string or 
> numeric value
> 
> david=# select jsonb_path_query(target => '{"x": "hi"}', path => 
> '$.integer()', silent => true);
> jsonb_path_query 
> --
> (0 rows)
> 
> I suggest that the same behavior be adopted for `like_regex` and `starts 
> with`.

Okay, I think I’ve figured this out, and the key is that I am, once again, 
comparing predicate path queries to SQL standard queries. If I update the first 
example to use a comparison I no longer get an error:

david=# select jsonb_path_query('{"x": "hi"}', '$.integer() == 1');
 jsonb_path_query 
--
 null

So I think that’s the key: There’s not a difference between the behavior of 
`like_regex` and `starts with` vs other predicate expressions.

This dichotomy continues to annoy. I would very much like some way to have 
jsonb_path_query() raise an error (or even a warning!) if passed a predicate 
expression, and for jsonb_path_match() to raise an error or warning if its path 
is not a predicate expression. Because I keep confusing TF out of myself.

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-17 Thread David E. Wheeler
On Jun 12, 2024, at 11:20, Andres Freund  wrote:

>> The PostgreSQL core team maintains two application binary interface (ABI) 
>> guarantees: one for major releases and one for minor releases.
> 
> I.e. for major versions it's "there is none"?

Is it? ISTM that there is the intention not to break things that don’t need to 
be broken, though that doesn’t rule out interface improvements.

>> In such cases the incompatible changes will be listed in the Release Notes.
> 
> I don't think we actually exhaustively list all of them.

Should they be? I can maybe see the argument not to for major releases. But 
I’ve also had the experience of a new failure on a major release and having to 
go find the reason for it and the fix, often requiring the attention of someone 
on a mailing list who might rather tap the “compatibility changes” sign in the 
latest change log. :-)

> s/every/a reasonable/ or just s/every/an/

✅

> The padding case doesn't affect sizeof() fwiw.

✅

> I think there's too often not an alternative to using sizeof(), potentially
> indirectly (via makeNode() or such. So this sounds a bit too general.

Is there some other way to avoid the issue? Or would constructor APIs need to 
be added to core?


Updated with your suggestions:


``` md

ABI Policy
==

The PostgreSQL core team maintains two application binary interface (ABI) 
guarantees: one for major releases and one for minor releases.

Major Releases
--

Applications that use the PostgreSQL APIs must be compiled for each major 
release supported by the application. The inclusion of `PG_MODULE_MAGIC` 
ensures that code compiled for one major version will be rejected by other major 
versions.

Furthermore, new releases may make API changes that require code changes. Use 
the `PG_VERSION_NUM` constant to adjust code in a backwards compatible way:

``` c
#if PG_VERSION_NUM >= 160000
#include "varatt.h"
#endif
```

PostgreSQL avoids unnecessary API changes in major releases, but usually ships 
a few necessary API changes, including deprecation, renaming, and argument 
variation. In such cases the incompatible changes will be listed in the Release 
Notes.

Minor Releases
--

PostgreSQL makes an effort to avoid both API and ABI breaks in minor releases. 
In general, an application compiled against any minor release will work with 
any other minor release, past or future.

When a change *is* required, PostgreSQL will choose the least invasive way 
possible, for example by squeezing a new field into padding space or appending 
it to the end of a struct. This sort of change should not impact dependent 
applications unless they use `sizeof(the struct)` on a struct with an appended 
field, or create their own instances of such structs --- patterns best avoided.

In rare cases, however, even such non-invasive changes may be impractical or 
impossible. In such an event, the change will be documented in the Release 
Notes, and details on the issue will also be posted to [TBD; mail list? Blog 
post? News item?].

The project strongly recommends that developers adopt continuous integration 
testing, at least for the latest minor release of all major versions of Postgres 
they support.
```

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-17 Thread David E. Wheeler
On Jun 12, 2024, at 11:30, Peter Geoghegan  wrote:

> I'm a little surprised that we don't seem to have all that many
> problems with ABI breakage, though. Although we theoretically have a
> huge number of APIs that extension authors might choose to use, that
> isn't really true in practical terms. The universe of theoretically
> possible problems is vastly larger than the areas where we see
> problems in practice. You have to be pragmatic about it.

Things go wrong far less often than one might fear! Given this relative 
stability, I think it’s reasonable to document what has heretofore been the 
assumed policy, so that the fears can largely be put to rest by clear expectations.

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-17 Thread David E. Wheeler
On Jun 12, 2024, at 11:57, Peter Geoghegan  wrote:

> That having been said, it would be useful if there was a community web
> resource for this -- something akin to coverage.postgresql.org, but
> with differential ABI breakage reports. You can see an example report
> here:
> 
> https://postgr.es/m/CAH2-Wzm-W6hSn71sUkz0Rem=qdeu7tnfmc7_jg2djrlfef_...@mail.gmail.com
> 
> Theoretically anybody can do this themselves. In practice they don't.
> So something as simple as providing automated reports about ABI
> changes might well move the needle here.

What would be required to make such a thing? Maybe it’d make a good Summer of 
Code project.

Best,

David





RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-17 Thread Amonson, Paul D
> This is extremely workload dependent, it's not hard to find workloads with
> lots of very small record and very few big ones...  What you observed might
> have "just" been the warmup behaviour where more full page writes have to
> be written.

Can you tell me how to avoid capturing this "warm-up" so that the numbers are 
more accurate?
 
> There a very frequent call computing COMP_CRC32C over just 20 bytes, while
> holding a crucial lock.  If we were to do introduce something like this
> AVX-512 algorithm, it'd probably be worth to dispatch differently in case of
> compile-time known small lengths.

So are you suggesting that we be able to call directly into the 64/32-bit based 
algorithm from these known small-byte cases in the code? I think that 
we can do that with a separate API being exposed.
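
To make that concrete, a minimal sketch of such a dispatch might look like the
following (the helper names and the 64/256-byte cutoffs are invented for
illustration; they are not symbols from the patch):

#include <stddef.h>
#include <stdint.h>

/* Assumed helpers: the existing CRC32C-instruction path and the proposed
 * AVX-512 path.  Placeholder names, not the actual patch symbols. */
extern uint32_t crc32c_scalar(uint32_t crc, const void *data, size_t len);
extern uint32_t crc32c_avx512(uint32_t crc, const void *data, size_t len);

static inline uint32_t
crc32c_compute(uint32_t crc, const void *data, size_t len)
{
    /* Compile-time-known small lengths skip the wide-vector path entirely. */
    if (__builtin_constant_p(len) && len <= 64)
        return crc32c_scalar(crc, data, len);

    /* Otherwise choose by size; 256 is an arbitrary illustrative cutoff. */
    return (len < 256) ? crc32c_scalar(crc, data, len)
                       : crc32c_avx512(crc, data, len);
}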

> How does the latency of the AVX-512 algorithm compare to just using the
> CRC32C instruction?

I think I need more information on this one, as I am not sure I understand the 
use case. The same function-pointer indirection methods are used with or without 
the AVX-512 algorithm?

Paul





Re: cost delay brainstorming

2024-06-17 Thread David Rowley
On Tue, 18 Jun 2024 at 07:39, Robert Haas  wrote:
> I think we might able to get fairly far by observing that if the
> number of running autovacuum workers is equal to the maximum allowable
> number of running autovacuum workers, that may be a sign of trouble,
> and the longer that situation persists, the more likely it is that
> we're in trouble. So, a very simple algorithm would be: If the maximum
> number of workers have been running continuously for more than, say,
> 10 minutes, assume we're falling behind and exempt all workers from
> the cost limit for as long as the situation persists. One could
> criticize this approach on the grounds that it causes a very sudden
> behavior change instead of, say, allowing the rate of vacuuming to
> gradually increase. I'm curious to know whether other people think
> that would be a problem.

I think a nicer solution would implement some sort of unified "urgency
level" and slowly ramp up the vacuum_cost_limit according to that
urgency rather than effectively switching to an infinite
vacuum_cost_limit when all workers have been going for N mins. If
there is nothing else that requires a vacuum while all 3 workers have
been busy for an hour or two, it seems strange to hurry them up so
they can more quickly start their next task -- being idle.
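
As a rough illustration only (the function name, GUC usage, and the 10x ceiling
are all invented here, not taken from any patch), the ramp could amount to
something like:

/*
 * Map a 0.0..1.0 "urgency" to an effective cost limit.  At urgency 0.0 the
 * configured limit applies unchanged; as urgency rises the limit scales up,
 * so throttling relaxes gradually instead of being switched off abruptly.
 */
static int
autovac_effective_cost_limit(int configured_limit, double urgency)
{
    if (urgency <= 0.0)
        return configured_limit;
    if (urgency >= 1.0)
        return configured_limit * 10;    /* arbitrary illustrative ceiling */
    return (int) (configured_limit * (1.0 + 9.0 * urgency));
}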

An additional feature that having this unified "urgency level" will
provide is the ability to prioritise auto-vacuum so that it works on
the most urgent tables first.

I outlined some ideas in [1] of how this might be done.

David

[1] 
https://postgr.es/m/CAApHDvo8DWyt4CWhF=NPeRstz_78SteEuuNDfYO7cjp=7yt...@mail.gmail.com




Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-17 Thread Chapman Flack
On 06/17/24 18:14, David E. Wheeler wrote:
> So I think that’s the key: There’s not a difference between the behavior of
> `like_regex` and `starts with` vs other predicate expressions.

The current implementation seems to have made each of our
<JSON path predicate>s responsible for swallowing its own errors, which
is one perfectly cromulent way to satisfy the SQL standard behavior saying
all errors within a <JSON path predicate> should be swallowed.

The standard says nothing on how they should behave outside of a
<JSON path predicate>, because as far as the standard's concerned,
they can't appear there.

Ours currently behave the same way, and swallow their errors.

It would have been possible to write them in such a way as to raise errors,
but not when inside a <JSON path predicate>, and that would also satisfy
the standard, but it would also give us the errors you would like from our
nonstandard "predicate check expressions". And then you could easily use
silent => true if you wanted them silent.

I'd be leery of changing that, though, as we've already documented that
a "predicate check expression" returns true, false, or unknown, so having
it throw by default seems like a change of documented behavior.

The current situation can't make much use of 'silent', since it's already
false by default; you can't make it any falser to make predicate-check
errors show up.

Would it be a thinkable thought to change the 'silent' default to null?
That could have the same effect as false for SQL standard expressions, and
the same effect seen now for "predicate check expressions", and you could
pass it explicitly false if you wanted errors from the predicate checks.

If that's no good, I don't see an obvious solution other than adding
another nonstandard construct to what's nonstandard already, and allowing
something like nonsilent(predicate check expression).

Regards,
-Chap




Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-17 Thread David E. Wheeler
On Jun 17, 2024, at 6:44 PM, Chapman Flack  wrote:

> The current implementation seems to have made each of our
> <JSON path predicate>s responsible for swallowing its own errors, which
> is one perfectly cromulent way to satisfy the SQL standard behavior saying
> all errors within a <JSON path predicate> should be swallowed.

Naw, executePredicate does it for all of them, as for the left operand here[1].

> The standard says nothing on how they should behave outside of a
> <JSON path predicate>, because as far as the standard's concerned,
> they can't appear there.
> 
> Ours currently behave the same way, and swallow their errors.

Yes, and they’re handled consistently, at least.

> It would have been possible to write them in such a way as to raise errors,
> but not when inside a <JSON path predicate>, and that would also satisfy
> the standard, but it would also give us the errors you would like from our
> nonstandard "predicate check expressions". And then you could easily use
> silent => true if you wanted them silent.

I’m okay without the errors, as long as the behaviors are consistent. I mean it 
might be cool to have a way to get them, but the inconsistency I thought I saw 
was the bit that seemed like a bug.

> I'd be leery of changing that, though, as we've already documented that
> a "predicate check expression" returns true, false, or unknown, so having
> it throw by default seems like a change of documented behavior.

Right, same for using jsonb_path_match().

> The current situation can't make much use of 'silent', since it's already
> false by default; you can't make it any falser to make predicate-check
> errors show up.

EXTREMELY FALSE! 😂

> Would it be a thinkable thought to change the 'silent' default to null?
> That could have the same effect as false for SQL standard expressions, and
> the same effect seen now for "predicate check expressions", and you could
> pass it explicitly false if you wanted errors from the predicate checks.

That seems like it’d be confusing TBH.

> If that's no good, I don't see an obvious solution other than adding
> another nonstandard construct to what's nonstandard already, and allowing
> something like nonsilent(predicate check expression).

The only options I can think of are a GUC to turn on SUPER STRICT mode or 
something (yuck, action at a distance), or introducing new functions with the new 
behavior. I advocate for neither (at this point).

Best,

David

[1]: 
https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059





Re: Inval reliability, especially for inplace updates

2024-06-17 Thread Noah Misch
On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote:
> On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> > https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
> > > Separable, nontrivial things not fixed in the attached patch stack:
> > > 
> > > - Inplace update uses transactional CacheInvalidateHeapTuple().  ROLLBACK 
> > > of
> > >   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t 
> > > loss
> > >   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this 
> > > right.
> > 
> > I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
> > inside the critical section.  Send it in heap_xlog_inplace(), too.
> 
> > a. Within logical decoding, cease processing invalidations for inplace
> 
> I'm attaching the implementation.  This applies atop the v3 patch stack from
> https://postgr.es/m/20240614003549.c2.nmi...@google.com, but the threads are
> mostly orthogonal and intended for independent review.  Translating a tuple
> into inval messages uses more infrastructure than relmapper, which needs just
> a database ID.  Hence, this ended up more like a miniature of inval.c's
> participation in the transaction commit sequence.
> 
> I waffled on whether to back-patch inplace150-inval-durability-atcommit

That inplace150 patch turned out to be unnecessary.  Contrary to the
"noncritical resource releasing" comment some lines above
AtEOXact_Inval(true), the actual behavior is already to promote ERROR to
PANIC.  An ERROR just before or after sending invals becomes PANIC, "cannot
abort transaction %u, it was already committed".  Since
inplace130-AtEOXact_RelationCache-comments existed to clear the way for
inplace150, inplace130 also becomes unnecessary.  I've removed both from the
attached v2 patch stack.

> The patch changes xl_heap_inplace of XLOG_HEAP_INPLACE.  For back branches, we
> could choose between:
> 
> - Same change, no WAL version bump.  Standby must update before primary.  This
>   is best long-term, but the transition is more disruptive.  I'm leaning
>   toward this one, but the second option isn't bad:
> 
> - heap_xlog_inplace() could set the shared-inval-queue overflow signal on
>   every backend.  This is more wasteful, but inplace updates might be rare
>   enough (~once per VACUUM) to make it tolerable.
> 
> - Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE.  This isn't
>   correct if one ends recovery between the two records, but you'd need to be
>   unlucky to notice.  Noticing would need a procedure like the following.  A
>   hot standby backend populates a relcache entry, then does DDL on the rel
>   after recovery ends.

That still holds.
Author: Noah Misch 
Commit: Noah Misch 

Remove comment about xl_heap_inplace "AT END OF STRUCT".

Commit 2c03216d831160bedd72d45f712601b6f7d03f1c moved the tuple data
from there to the buffer-0 data.  Back-patch to v12 (all supported
versions), the plan for the next change to this struct.

Reviewed by FIXME.

Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22a1747..42736f3 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -425,7 +425,6 @@ typedef struct xl_heap_confirm
 typedef struct xl_heap_inplace
 {
OffsetNumber offnum;/* updated tuple's offset on page */
-   /* TUPLE DATA FOLLOWS AT END OF STRUCT */
 } xl_heap_inplace;
 
 #define SizeOfHeapInplace  (offsetof(xl_heap_inplace, offnum) + 
sizeof(OffsetNumber))
Author: Noah Misch 
Commit: Noah Misch 

For inplace update, send nontransactional invalidations.

The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.

Core code no longer needs XLOG_INVALIDATIONS, but this leaves removing
it for future work.  Extensions could be relying on that mechanism, so
that removal would not be back-patch material.  Back-patch to v12 (all
supported versions).

Reviewed by FIXME.

Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 797bddf..d7e417f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6305,6 +6305,9 @@ heap_inplace_update_finish(void *state, HeapTuple tuple)
Relationrelation = scan->heap_rel;
uint32  oldlen;
uint32  newlen;
+   int nmsgs = 0;
+   SharedInvalidationMessage *invalMessages = NULL;
+   boolRelcacheInitFileInval = false;
 
Assert(ItemPointer

Re: Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Li, Yong


> On Jun 17, 2024, at 23:01, Melanie Plageman  wrote:
> 
> On Mon, Jun 17, 2024 at 2:20 AM Li, Yong  wrote:
>> 
>> Hi PostgreSQL hackers,
>> 
>> For most access methods in PostgreSQL, the implementation of the access 
>> method itself and the implementation of its WAL replay logic are organized 
>> in separate source files.  However, the HEAP access method is an exception.  
>> Both the access method and the WAL replay logic are collocated in the same 
>> heapam.c.  To follow the pattern established by other access methods and to 
>> improve maintainability, I made the enclosed patch to separate HEAP’s replay 
>> logic into its own file.  The changes are straightforward.  Move the replay 
>> related functions into the new heapam_xlog.c file, push the common 
>> heap_execute_freeze_tuple() helper function into the heapam.h header, and 
>> adjust the build files.
> 
> I'm not against this change, but I am curious at what inspired this.
> Were you looking at Postgres code and simply noticed that there isn't
> a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
> wanted to change that? Or is there some specific reason this would
> help you as a Postgres developer, user, or ecosystem member?
> 
> - Melanie

As a newcomer, when I was walking through the code looking for WAL replay 
related code, it was relatively easy for me to find them for the B-Tree access 
method because of the “xlog” hint in the file names. It took me a while to find 
the same for the heap access method. When I finally found them (via text 
search), it was a small surprise. Having different file organizations for 
different access methods gives me this urge to make everything consistent. I 
think it will make it easier for newcomers, and it will reduce the mental load 
for everyone to remember that heap replay is inside heapam.c and not in some 
“???xlog.c”.

Yong

Re: Check lateral references within PHVs for memoize cache keys

2024-06-17 Thread Richard Guo
On Mon, Mar 18, 2024 at 4:36 PM Richard Guo  wrote:
> Here is another rebase over master so it applies again.  I also added a
> commit message to help review.  Nothing else has changed.

AFAIU currently we do not add Memoize nodes on top of join relation
paths.  This is because the ParamPathInfos for join relation paths do
not maintain ppi_clauses, as the set of relevant clauses varies
depending on how the join is formed.  In addition, joinrels do not
maintain lateral_vars.  So we do not have a way to extract cache keys
from joinrels.

(Besides, there are places where the code doesn't cope with Memoize path
on top of a joinrel path, such as in get_param_path_clause_serials.)

Therefore, when extracting lateral references within PlaceHolderVars,
there is no need to consider those that are due to be evaluated at
joinrels.

Hence, here is v7 patch for that.  In passing, this patch also includes
a comment explaining that Memoize nodes are currently not added on top
of join relation paths (maybe we should have a separate patch for this?).

Thanks
Richard


v7-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch
Description: Binary data


Re: Inval reliability, especially for inplace updates

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 16:58:54 -0700, Noah Misch wrote:
> On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote:
> > On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> > > https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
> > > > Separable, nontrivial things not fixed in the attached patch stack:
> > > >
> > > > - Inplace update uses transactional CacheInvalidateHeapTuple().  
> > > > ROLLBACK of
> > > >   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t 
> > > > loss
> > > >   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this 
> > > > right.
> > >
> > > I plan to fix that like CacheInvalidateRelmap(): send the inval 
> > > immediately,
> > > inside the critical section.  Send it in heap_xlog_inplace(), too.

I'm worried this might cause its own set of bugs, e.g. if there are any places
that, possibly accidentally, rely on the invalidation from the inplace update
to also cover separate changes.

Have you considered instead submitting these invalidations during abort as
well?


> > > a. Within logical decoding, cease processing invalidations for inplace
> >
> > I'm attaching the implementation.  This applies atop the v3 patch stack from
> > https://postgr.es/m/20240614003549.c2.nmi...@google.com, but the threads are
> > mostly orthogonal and intended for independent review.  Translating a tuple
> > into inval messages uses more infrastructure than relmapper, which needs 
> > just
> > a database ID.  Hence, this ended up more like a miniature of inval.c's
> > participation in the transaction commit sequence.
> >
> > I waffled on whether to back-patch inplace150-inval-durability-atcommit
>
> That inplace150 patch turned out to be unnecessary.  Contrary to the
> "noncritical resource releasing" comment some lines above
> AtEOXact_Inval(true), the actual behavior is already to promote ERROR to
> PANIC.  An ERROR just before or after sending invals becomes PANIC, "cannot
> abort transaction %u, it was already committed".

Relying on that, instead of explicit critical sections, seems fragile to me.
IIRC some of the behaviour around errors around transaction commit/abort has
changed a bunch of times. Tying correctness into something that could be
changed for unrelated reasons doesn't seem great.

I'm not sure it holds true even today - what if the transaction didn't have an
xid? Then RecordTransactionAbort() wouldn't trigger
 "cannot abort transaction %u, it was already committed"
I think?



> > - Same change, no WAL version bump.  Standby must update before primary.  
> > This
> >   is best long-term, but the transition is more disruptive.  I'm leaning
> >   toward this one, but the second option isn't bad:

Hm. The inplace record doesn't use the length of the "main data" record
segment for anything, from what I can tell. If records by an updated primary
were replayed by an old standby, it'd just ignore the additional data, afaict?

I think with the code as-is, the situation with an updated standby replaying
an old primary's record would actually be worse - it'd afaict just assume the
now-longer record contained valid fields, despite those just pointing into
uninitialized memory.  I think the replay routine would have to check the
length of the main data and execute the invalidation conditionally.


> > - heap_xlog_inplace() could set the shared-inval-queue overflow signal on
> >   every backend.  This is more wasteful, but inplace updates might be rare
> >   enough (~once per VACUUM) to make it tolerable.

We already set that surprisingly frequently, as
a) The size of the sinval queue is small
b) If a backend is busy, it does not process catchup interrupts
   (i.e. executing queries, waiting for a lock prevents processing)
c) There's no deduplication of invals, we often end up sending the same inval
   over and over.

So I suspect this might not be too bad, compared to the current badness.


At least for core code. I guess there could be extension code triggering
inplace updates more frequently? But I'd hope they'd do it not on catalog
tables... Except that we wouldn't know that that's the case during replay,
it's not contained in the record.




> > - Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE.  This isn't
> >   correct if one ends recovery between the two records, but you'd need to be
> >   unlucky to notice.  Noticing would need a procedure like the following.  A
> >   hot standby backend populates a relcache entry, then does DDL on the rel
> >   after recovery ends.

Hm. The problematic cases presumably involves an access exclusive lock? If so,
could we do LogStandbyInvalidations() *before* logging the WAL record for the
inplace update? The invalidations can't be processed by other backends until
the exclusive lock has been released, which should avoid the race?


Greetings,

Andres Freund




Re: Logical Replication of sequences

2024-06-17 Thread Masahiko Sawada
On Fri, Jun 14, 2024 at 4:04 PM Amit Kapila  wrote:
>
> On Thu, Jun 13, 2024 at 6:14 PM Masahiko Sawada  wrote:
> >
> > On Thu, Jun 13, 2024 at 7:06 PM Amit Kapila  wrote:
> > >
> > > On Thu, Jun 13, 2024 at 1:09 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Wed, Jun 12, 2024 at 6:59 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > >
> > > > > Yeah, starting with a single worker sounds good for now. Do you think
> > > > > we should sync all the sequences in a single transaction or have some
> > > > > threshold value above which a different transaction would be required
> > > > > or maybe a different sequence sync worker altogether? Now, having
> > > > > multiple sequence-sync workers requires some synchronization so that
> > > > > only a single worker is allocated for one sequence.
> > > > >
> > > > > The simplest thing is to use a single sequence sync worker that syncs
> > > > > all sequences in one transaction but with a large number of sequences,
> > > > > it could be inefficient. OTOH, I am not sure if it would be a problem
> > > > > in reality.
> > > >
> > > > I think that we can start with using a single worker and one
> > > > transaction, and measure the performance with a large number of
> > > > sequences.
> > > >
> > >
> > > Fair enough. However, this raises the question Dilip and Vignesh are
> > > discussing whether we need a new relfilenode for sequence update even
> > > during initial sync? As per my understanding, the idea is that similar
> > > to tables, the CREATE SUBSCRIPTION command (with copy_data = true)
> > > will create the new sequence entries in pg_subscription_rel with the
> > > state as 'i'. Then the sequence-sync worker would start a transaction
> > > and one-by-one copy the latest sequence values for each sequence (that
> > > has state as 'i' in pg_subscription_rel) and mark its state as ready
> > > 'r' and commit the transaction. Now if there is an error during this
> > > operation it will restart the entire operation. The idea of creating a
> > > new relfilenode is to handle the error so that if there is a rollback,
> > > the sequence state will be rolled back to 'i' and the sequence value
> > > will also be rolled back. The other option could be that we update the
> > > sequence value without a new relfilenode and if the transaction rolled
> > > back then only the sequence's state will be rolled back to 'i'. This
> > > would work with a minor inconsistency that sequence values will be
> > > up-to-date even when the sequence state is 'i' in pg_subscription_rel.
> > > I am not sure if that matters because anyway, they can quickly be
> > > out-of-sync with the publisher again.
> >
> > I think it would be fine in many cases even if the sequence value is
> > up-to-date even when the sequence state is 'i' in pg_subscription_rel.
> > But the case we would like to avoid is where suppose the sequence-sync
> > worker does both synchronizing sequence values and updating the
> > sequence states for all sequences in one transaction, and if there is
> > an error we end up retrying the synchronization for all sequences.
> >
>
> The one idea to avoid this is to update sequences in chunks (say 100
> or some threshold number of sequences in one transaction). Then we
> would only redo the sync for the last and pending set of sequences.

That could be one idea.

>
> > >
> > > Now, say we don't want to maintain the state of sequences for initial
> > > sync at all then after the error how will we detect if there are any
> > > pending sequences to be synced? One possibility is that we maintain a
> > > subscription level flag 'subsequencesync' in 'pg_subscription' to
> > > indicate whether sequences need sync. This flag would indicate whether
> > > to sync all the sequences in pg_susbcription_rel. This would mean that
> > > if there is an error while syncing the sequences we will resync all
> > > the sequences again. This could be acceptable considering the chances
> > > of error during sequence sync are low. The benefit is that both the
> > > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same
> > > idea and sync all sequences without needing a new relfilenode. Users
> > > can always refer 'subsequencesync' flag in 'pg_subscription' to see if
> > > all the sequences are synced after executing the command.
> >
> > I think that REFRESH PUBLICATION {SEQUENCES} can be executed even
> > while the sequence-sync worker is synchronizing sequences. In this
> > case, the worker might not see new sequences added by the concurrent
> > REFRESH PUBLICATION {SEQUENCES} command since it's already running.
> > The worker could end up marking the subsequencesync as completed while
> > not synchronizing these new sequences.
> >
>
> This is possible but we could avoid REFRESH PUBLICATION {SEQUENCES} by
> not allowing to change the subsequencestate during the time
> sequence-worker is syncing the sequences. This could be restrictive
> but there doesn't seem to be cases where user would like to

RE: Conflict Detection and Resolution

2024-06-17 Thread Zhijie Hou (Fujitsu)
On Thursday, June 13, 2024 2:11 PM Masahiko Sawada  
wrote:

Hi,

> On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > This time at PGconf.dev[1], we had some discussions regarding this
> > project. The proposed approach is to split the work into two main
> > components. The first part focuses on conflict detection, which aims
> > to identify and report conflicts in logical replication. This feature
> > will enable users to monitor the unexpected conflicts that may occur.
> > The second part involves the actual conflict resolution. Here, we will
> > provide built-in resolutions for each conflict and allow user to
> > choose which resolution will be used for which conflict(as described
> > in the initial email of this thread).
> 
> I agree with this direction that we focus on conflict detection (and
> logging) first and then develop conflict resolution on top of that.

Thanks for your reply !

> 
> >
> > Of course, we are open to alternative ideas and suggestions, and the
> > strategy above can be changed based on ongoing discussions and
> > feedback received.
> >
> > Here is the patch of the first part work, which adds a new parameter
> > detect_conflict for CREATE and ALTER subscription commands. This new
> > parameter will decide if subscription will go for conflict detection.
> > By default, conflict detection will be off for a subscription.
> >
> > When conflict detection is enabled, additional logging is triggered in
> > the following conflict scenarios:
> >
> > * updating a row that was previously modified by another origin.
> > * The tuple to be updated is not found.
> > * The tuple to be deleted is not found.
> >
> > While there exist other conflict types in logical replication, such as
> > an incoming insert conflicting with an existing row due to a primary
> > key or unique index, these cases already result in constraint violation 
> > errors.
> 
> What does detect_conflict being true actually mean to users? I understand that
> detect_conflict being true could introduce some overhead to detect conflicts.
> But in terms of conflict detection, even if detect_confict is false, we detect
> some conflicts such as concurrent inserts with the same key. Once we
> introduce the complete conflict detection feature, I'm not sure there is a 
> case
> where a user wants to detect only some particular types of conflict.
> 
> > Therefore, additional conflict detection for these cases is currently
> > omitted to minimize potential overhead. However, the pre-detection for
> > conflict in these error cases is still essential to support automatic
> > conflict resolution in the future.
> 
> I feel that we should log all types of conflict in a uniform way. For 
> example,
> with detect_conflict being true, the update_differ conflict is reported as
> "conflict %s detected on relation "%s"", whereas concurrent inserts with the
> same key is reported as "duplicate key value violates unique constraint "%s"",
> which could confuse users.

Do you mean it's ok to add a pre-check before applying the INSERT, which will
verify if the remote tuple violates any unique constraints, and if it violates
then we log a conflict message? I thought about this but was slightly
worried about the extra cost it would bring. OTOH, if we think it's acceptable,
we could do that since the cost is there only when detect_conflict is enabled.

I also thought of logging such a conflict message in PG_CATCH(), but I think we
lack some necessary info (relation, index name, column name) at the catch block.

Best Regards,
Hou zj





Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-17 Thread Chapman Flack
On 06/17/24 19:17, David E. Wheeler wrote:
> [1]: 
> https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059

Huh, I just saw something peculiar, skimming through the code:

https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L1385

We allow .boolean() applied to a jbvBool, a jbvString (those are the only
two possibilities allowed by the standard), or to a jbvNumeric (!), but
only if it can be serialized and then parsed as an int4, otherwise we say
ERRCODE_NON_NUMERIC_SQL_JSON_ITEM, or if it survived all that we call it
true if it isn't zero.

I wonder what that alternative is doing there.

It also looks like the expected errcode (in the standard, if the item
was not boolean or string) would be 2202V "non-boolean SQL/JSON item" ...
which isn't in our errcodes.txt.

Regards,
-Chap




Re: assertion failure at cost_memoize_rescan()

2024-06-17 Thread Kohei KaiGai
On Mon, 17 Jun 2024 at 8:27, David Rowley  wrote:
>
> On Mon, 17 Jun 2024 at 10:23, Tomas Vondra
>  wrote:
> > Interesting. Seems like a bug due to the two places clamping the values
> > inconsistently. It probably does not matter in other contexts because we
> > don't subtract the values like this, but here it triggers the assert.
> >
> > I guess the simplest fix would be to clamp "calls" the same way before
> > calculating hit_ratio. That makes the ">= 0" part of the assert somewhat
> > pointless, though.
>
> "calls" comes from the value passed as the final parameter in
> create_memoize_path().
>
> There's really only one call to that function and that's in 
> get_memoize_path().
>
> return (Path *) create_memoize_path(root,
> innerrel,
> inner_path,
> param_exprs,
> hash_operators,
> extra->inner_unique,
> binary_mode,
> outer_path->rows);
>
> It would be good to know what type of Path outer_path is.  Normally
> we'll clamp_row_est() on that field. I suspect we must have some Path
> type that isn't doing that.
>
> KaiGai-san,  what type of Path is outer_path?
>
It is CustomPath with rows = 3251872.91667.
(I'm not certain whether the non-integer value in the estimated rows
is legal or not.)

According to the crash dump, try_partial_nestloop_path() takes this two paths,

(gdb) up
#7  0x00860fc5 in try_partial_nestloop_path (root=0x133a968,
joinrel=0x15258d8, outer_path=0x1513c98, inner_path=0x15264c8,
pathkeys=0x0, jointype=JOIN_INNER, extra=0x7ffc494cddf0) at
joinpath.c:887
/home/kaigai/source/pgsql-16/src/backend/optimizer/path/joinpath.c:887:30713:beg:0x860fc5

(gdb) p *(CustomPath *)outer_path
$13 = {path = {type = T_CustomPath, pathtype = T_CustomScan, parent =
0x1513058, pathtarget = 0x1513268, param_info = 0x0, parallel_aware =
true, parallel_safe = true, parallel_workers = 2, rows =
3251872.91667, startup_cost = 41886.75250002,
total_cost = 12348693.48861, pathkeys = 0x0}, flags = 4,
custom_paths = 0x1514788, custom_private = 0x1514ee8, methods =
0x7f45211feca0 }

(gdb) p *(MemoizePath *)inner_path
$14 = {path = {type = T_MemoizePath, pathtype = T_Memoize, parent =
0x14dc800, pathtarget = 0x14dca10, param_info = 0x150d8a8,
parallel_aware = false, parallel_safe = true, parallel_workers = 0,
rows = 1, startup_cost = 0.44501,
total_cost = 8.4284913207446568, pathkeys = 0x150d5c8}, subpath =
0x150d148, hash_operators = 0x1526428, param_exprs = 0x1526478,
singlerow = true, binary_mode = false, calls = 3251872.91667,
est_entries = 3251873}

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Missing docs for new enable_group_by_reordering GUC

2024-06-17 Thread Bruce Momjian
This commit added enable_group_by_reordering:

commit 0452b461bc4
Author: Alexander Korotkov 
Date:   Sun Jan 21 22:21:36 2024 +0200

    Explore alternative orderings of group-by pathkeys during optimization.

    When evaluating a query with a multi-column GROUP BY clause, we can minimize
    sort operations or avoid them if we synchronize the order of GROUP BY clauses
    with the ORDER BY sort clause or sort order, which comes from the underlying
    query tree. Grouping does not imply any ordering, so we can compare the keys
    in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we
    simply compared keys in the order specified in the query. This commit
    explores alternative ordering of the keys, trying to find a cheaper one.

    The ordering of group keys may interact with other parts of the query, some
    of which may not be known while planning the grouping. For example, there may
    be an explicit ORDER BY clause or some other ordering-dependent operation
    higher up in the query, and using the same ordering may allow using either
    incremental sort or even eliminating the sort entirely.

    The patch always keeps the ordering specified in the query, assuming the user
    might have additional insights.

    This introduces a new GUC enable_group_by_reordering so that the optimization
    may be disabled if needed.

    Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
    Author: Andrei Lepikhov, Teodor Sigaev
    Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
    Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
    Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina

It mentions it was added as a GUC to postgresql.conf, but I see no SGML
docs for this new GUC value.  Would someone please add docs for this? 
Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: may be a buffer overflow problem

2024-06-17 Thread Andres Freund
Hi,

On 2024-06-17 23:52:54 +0200, Daniel Gustafsson wrote:
> Since sqlca is, according to our docs, present in other database systems we
> should probably keep it a 5-char array for portability reasons.  Adding a
> padding character should be fine though.

How about, additionally, adding __attribute__((nonstring))? Wrapped in a
macro, of course.  That'll trigger warnings for many unsafe uses, like
strlen().

It doesn't quite detect the problematic case in ecpg_log() though, seems it
doesn't understand fprintf() well enough (it does trigger in simple printf()
cases, because they get reduced to puts(), which it understands).
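
For illustration, the declaration side could look roughly like this (the
wrapper macro name is invented; the attribute itself exists since GCC 8, and
only part of the struct is shown):

/* Hypothetical wrapper, in the style of the existing pg_attribute_* macros. */
#if defined(__GNUC__) && __GNUC__ >= 8
#define pg_attribute_nonstring __attribute__((nonstring))
#else
#define pg_attribute_nonstring
#endif

struct sqlca_t
{
    /* ... other members elided ... */
    char        sqlstate[5] pg_attribute_nonstring;    /* not NUL-terminated */
};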


Adding nonstring would possibly allow us to re-enable -Wstringop-truncation;
currently it triggers a bunch on

../../../../../home/andres/src/postgresql/src/interfaces/ecpg/ecpglib/misc.c: 
In function ‘ECPGset_var’:
../../../../../home/andres/src/postgresql/src/interfaces/ecpg/ecpglib/misc.c:575:17:
 warning: ‘__builtin_strncpy’ output truncated before terminating nul copying 5 
bytes from a string of the same length [-Wstringop-truncation]
  575 | strncpy(sqlca->sqlstate, "YE001", 
sizeof(sqlca->sqlstate));


The only other -Wstringop-truncation warnings are in ecpg tests and at least
the first one doesn't look bogus:

../../../../../home/andres/src/postgresql/src/interfaces/ecpg/test/compat_oracle/char_array.pgc:
 In function 'main':
../../../../../home/andres/src/postgresql/src/interfaces/ecpg/test/compat_oracle/char_array.pgc:54:5:
 warning: '__builtin_strncpy' output truncated before terminating nul copying 5 
bytes from a string of the same length [-Wstringop-truncation]
   54 | strncpy(shortstr, p, sizeof shortstr);

Which seems like a valid complaint, given that shortstr is a char[5], p
is "X" and thatshortstr is printed:
printf("\"%s\": \"%s\"  %d\n", bigstr, shortstr, shstr_ind);


Greetings,

Andres Freund




Re: may be a buffer overflow problem

2024-06-17 Thread Tom Lane
Andres Freund  writes:
> On 2024-06-17 23:52:54 +0200, Daniel Gustafsson wrote:
>> Since sqlca is, according to our docs, present in other database systems we
>> should probably keep it a 5-char array for portability reasons.  Adding a
>> padding character should be fine though.

> How about, additionally, adding __attribute__((nonstring))? Wrapped in an
> attribute, of course.  That'll trigger warning for many unsafe uses, like
> strlen().

What I was advocating for is that we make it *safe* for strlen, not
that we double down on awkward, non-idiomatic, unsafe coding
practices.

Admittedly, I'm not sure how we could persuade compilers that
a char[5] followed by a char field is a normal C string ...

regards, tom lane




Re: Better error message when --single is not the first arg to postgres executable

2024-06-17 Thread Nathan Bossart
On Wed, Jun 05, 2024 at 11:38:48PM -0400, Greg Sabino Mullane wrote:
> On Wed, Jun 5, 2024 at 3:18 PM Nathan Bossart 
> wrote:
>> Could we remove the requirement that --single must be first?  I'm not
>> thrilled about adding a list of "must be first" options that needs to stay
>> updated, but given this list probably doesn't change too frequently, maybe
>> that's still better than a more invasive patch to allow specifying these
>> options in any order...
> 
> It would be nice, and I briefly looked into removing the "first"
> requirement, but src/backend/tcop/postgres.c for one assumes that --single
> is always argv[1], and it seemed not worth the extra effort to make it work
> for argv[N] instead of argv[1]. I don't mind it being the first argument,
> but that confusing error message needs to go.

I spent some time trying to remove the must-be-first requirement and came
up with the attached draft-grade patch.  However, there's a complication:
the "database" option for single-user mode must still be listed last, at
least on systems where getopt() doesn't move non-options to the end of the
array.  My previous research [0] indicated that this is pretty common, and
I noticed it because getopt() on macOS doesn't seem to reorder non-options.
I thought about changing these to getopt_long(), which we do rely on to
reorder non-options, but that conflicts with our ParseLongOption() "long
argument simulation" that we use to allow specifying arbitrary GUCs via the
command-line.

This remaining discrepancy might be okay, but I was really hoping to reduce
the burden on users to figure out the correct ordering of options.  The
situations in which I've had to use single-user mode are precisely the
situations in which I'd rather not have to spend time learning these kinds
of details.

[0] https://postgr.es/m/20230609232257.GA121461%40nathanxps13

-- 
nathan
>From 44f8747a75197b3842c41f77503fa8389ba8db3e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 17 Jun 2024 21:20:13 -0500
Subject: [PATCH 1/1] allow --single, etc. in any order

---
 src/backend/bootstrap/bootstrap.c |  12 ++--
 src/backend/main/main.c   | 101 --
 src/backend/tcop/postgres.c   |  15 ++---
 3 files changed, 77 insertions(+), 51 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c 
b/src/backend/bootstrap/bootstrap.c
index 986f6f1d9c..53011b4300 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -210,13 +210,6 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
/* Set defaults, to be overridden by explicit options below */
InitializeGUCOptions();
 
-   /* an initial --boot or --check should be present */
-   Assert(argc > 1
-  && (strcmp(argv[1], "--boot") == 0
-  || strcmp(argv[1], "--check") == 0));
-   argv++;
-   argc--;
-
while ((flag = getopt(argc, argv, "B:c:d:D:Fkr:X:-:")) != -1)
{
switch (flag)
@@ -230,6 +223,11 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
char   *name,
   *value;
 
+   /* --boot and --check were already 
processed in main() */
+   if (strcmp(optarg, "boot") == 0 ||
+   strcmp(optarg, "check") == 0)
+   break;
+
ParseLongOption(optarg, &name, &value);
if (!value)
{
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index bfd0c5ed65..b893a9f298 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -57,7 +57,17 @@ static void check_root(const char *progname);
 int
 main(int argc, char *argv[])
 {
-   booldo_check_root = true;
+   boolcheck = false;
+   boolboot = false;
+#ifdef EXEC_BACKEND
+   boolforkchild = false;
+#endif
+   booldescribe_config = false;
+   boolsingle = false;
+   boolshow_help = false;
+   boolversion = false;
+   boolcheck_guc = false;
+   int modes_set = 0;
 
reached_main = true;
 
@@ -136,61 +146,86 @@ main(int argc, char *argv[])
 */
unsetenv("LC_ALL");
 
+   for (int i = 1; i < argc; i++)
+   {
+   modes_set++;
+
+   if (strcmp(argv[i], "--check") == 0)
+   check = true;
+   else if (strcmp(argv[i], "--boot") == 0)
+   boot = true;
+#ifdef EXEC_BACKEND
+   else if (strncmp(argv[i], "--forkchild", 11) == 0)
+   forkchild = true;
+#endif
+   else if (strcmp(argv[i], "--

Re: assertion failure at cost_memoize_rescan()

2024-06-17 Thread David Rowley
On Tue, 18 Jun 2024 at 14:23, Kohei KaiGai  wrote:
>
> On Mon, 17 Jun 2024 at 8:27, David Rowley  wrote:
> > It would be good to know what type of Path outer_path is.  Normally
> > we'll clamp_row_est() on that field. I suspect we must have some Path
> > type that isn't doing that.
> >
> > KaiGai-san,  what type of Path is outer_path?
> >
> It is CustomPath with rows = 3251872.91667.

I suspected this might have been a CustomPath.

> (I'm not certain whether the non-integer value in the estimated rows
> is legal or not.)

I guess since it's not documented that Path.rows is always clamped,
it's probably bad to assume that it is.

Since clamp_row_est() will ensure the value is clamped >= 1.0 && <=
MAXIMUM_ROWCOUNT (which ensures non-zero), I tried looking around the
codebase for anything that divides by Path.rows to see if we ever
assume that we can divide without first checking if Path.rows != 0.
Out of the places I saw, it seems we do tend to code things so that we
don't assume the value has been clamped.  E.g.
adjust_limit_rows_costs() does if (*rows < 1) *rows = 1;

I think the best solution is to apply the attached.  I didn't test,
but it should fix the issue you reported and also ensure that
MemoizePath.calls is never zero, which would also cause issues in the
hit_ratio calculation in cost_memoize_rescan().
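
For reference, the arithmetic being protected reduces to roughly the following
(a restatement of the hit-ratio formula discussed above, not the patch itself):

/*
 * Clamp both inputs the same way before dividing so that calls >= 1 and
 * ndistinct <= calls always hold; then the result is guaranteed to stay
 * within 0..1 even for an unclamped Path.rows from a CustomPath.
 */
static double
memoize_hit_ratio(double ndistinct, double calls, double est_cache_entries)
{
    if (calls < 1.0)
        calls = 1.0;
    if (ndistinct < 1.0)
        ndistinct = 1.0;
    if (ndistinct > calls)
        ndistinct = calls;

    return ((calls - ndistinct) / calls) *
        (est_cache_entries >= ndistinct ? 1.0 : est_cache_entries / ndistinct);
}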

David


clamp_row_est_memoize_calls.patch
Description: Binary data


Re: New GUC autovacuum_max_threshold ?

2024-06-17 Thread Nathan Bossart
I didn't see a commitfest entry for this, so I created one to make sure we
don't lose track of this:

https://commitfest.postgresql.org/48/5046/

-- 
nathan




Re: assertion failure at cost_memoize_rescan()

2024-06-17 Thread Richard Guo
On Tue, Jun 18, 2024 at 10:53 AM David Rowley  wrote:
> Out of the places I saw, it seems we do tend to code things so that we
> don't assume the value has been clamped.  E.g.
> adjust_limit_rows_costs() does if (*rows < 1) *rows = 1;

Agreed.  In costsize.c I saw a few instances where we have

/* Protect some assumptions below that rowcounts aren't zero */
if (inner_path_rows <= 0)
inner_path_rows = 1;

> I think the best solution is to apply the attached.  I didn't test,
> but it should fix the issue you reported and also ensure that
> MemoizePath.calls is never zero, which would also cause issues in the
> hit_ratio calculation in cost_memoize_rescan().

+1.

Thanks
Richard




Re: State of pg_createsubscriber

2024-06-17 Thread Amit Kapila
On Mon, May 20, 2024 at 12:12 PM Amit Kapila  wrote:
>
> On Sun, May 19, 2024 at 11:20 PM Euler Taveira  wrote:
> >
> > On Sun, May 19, 2024, at 2:30 PM, Tom Lane wrote:
> >
> > I'm fairly disturbed about the readiness of pg_createsubscriber.
> > The 040_pg_createsubscriber.pl TAP test is moderately unstable
> > in the buildfarm [1], and there are various unaddressed complaints
> > at the end of the patch thread (pretty much everything below [2]).
> > I guess this is good enough to start beta with, but it's far from
> > being good enough to ship, IMO.  If there were active work going
> > on to fix these things, I'd feel better, but neither the C code
> > nor the test script have been touched since 1 April.
> >
> >
> > My bad. :( I'll post patches soon to address all of the points.
> >
>
> Just to summarize, apart from BF failures for which we had some
> discussion, I could recall the following open points:
>
> 1. After promotion, the pre-existing replication objects should be
> removed (either optionally or always), otherwise, it can lead to a new
> subscriber not being able to restart or getting some unwarranted data.
> [1][2].
>
> 2. Retaining synced slots on new subscribers can lead to unnecessary
> WAL retention and dead rows [3].
>
> 3. We need to consider whether some of the messages displayed in
> --dry-run mode are useful or not [4].
>

The recent commits should silence BF failures and resolve point number
2. But we haven't done anything yet for 1 and 3. For 3, we have a
patch in email [1] (v3-0005-Avoid*) which can be reviewed and
committed, but point number 1 needs discussion. Apart from that,
somewhat related to point 1, Kuroda-San has raised a point in an email
[2] for replication slots. Shlok has presented a case in this thread
[3] where the problem due to point 1 can cause ERRORs or can cause
data inconsistency.

Now, the easiest way out here is that we accept the issues with the
pre-existing subscriptions and replication slots cases and just
document them for now with the intention to work on those in the next
version. OTOH, if there are no major challenges, we can try to
implement a patch for them as well as see how it goes.

[1] 
https://www.postgresql.org/message-id/CANhcyEWGfp7_AGg2zZUgJF_VYTCix01yeY8ZX9woz%2B03WCMPRg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/OSBPR01MB25525C17E2EF5FC81152F6C8F5F42%40OSBPR01MB2552.jpnprd01.prod.outlook.com
[3] 
https://www.postgresql.org/message-id/CANhcyEWvimA1-f6hSrA%3D9qkfR5SonFb56b36M%2B%2BvT%3DLiFj%3D76g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-06-17 Thread Ashutosh Sharma
Hi John,

On Tue, Jun 18, 2024 at 2:35 AM John H  wrote:
>
> Hi Ashutosh,
>
> Thinking about this more, could you clarify the problem/issue at hand?
> I think it's still not clear to me.
> Yes, CREATE EXTENSION can create functions that lead to unexpected
> privilege escalation, regardless
>  if they are SECURITY DEFINER or SECURITY INVOKER (if the function is
> inadvertently executed by superuser).
> But that's also true for a general CREATE FUNCTION call outside of extensions.
>

This specifically applies to extension functions, not standalone
functions created independently. The difference is that installing
extensions typically requires superuser privileges, which is not the
case with standalone functions.

--
With Regards,
Ashutosh Sharma.




Re: Xact end leaves CurrentMemoryContext = TopMemoryContext

2024-06-17 Thread David Rowley
On Tue, 18 Jun 2024 at 08:37, Tom Lane  wrote:
> As to what to do about it: I'm imagining that instead of resetting
> CurrentMemoryContext to TopMemoryContext, we set it to some special
> context that we expect we can reset every so often, like at the start
> of the next transaction.  The existing TransactionAbortContext is
> a very similar thing, and maybe could be repurposed/shared with this
> usage.
>
> Thoughts?

Instead, could we just not delete TopTransactionContext in
AtCommit_Memory() and instead do MemoryContextReset() on it? Likewise
in AtCleanup_Memory().
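
For illustration only, the shape of that change might be roughly as below (the
real xact.c code differs in detail; this just contrasts reset-and-keep with
delete-and-recreate):

static void
AtCommit_Memory_sketch(void)
{
    /* Return to a context that outlives the transaction. */
    MemoryContextSwitchTo(TopMemoryContext);

    /* Was: MemoryContextDelete(TopTransactionContext) + recreate next time. */
    MemoryContextReset(TopTransactionContext);
    CurTransactionContext = NULL;
}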

If we got to a stage where we didn't expect anything to allocate into
that context outside of a transaction, we could check if the context
is still reset in AtStart_Memory() and do something like raise a
WARNING on debug builds (or Assert()) to alert us that some code
broke our expectations.

It might also be a very tiny amount more efficient to not delete the
context so we don't have to fetch a new context from the context
freelist in AtStart_Memory().  Certainly, it wouldn't add any
overhead.  Adding a new special context would and so would the logic
to reset it every so often.

David




Re: Conflict Detection and Resolution

2024-06-17 Thread Dilip Kumar
On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila  wrote:
>
> On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar  wrote:
> >
> > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> >  wrote:
> >
> > > > Yes, that's correct. However, many cases could benefit from the
> > > > update_deleted conflict type if it can be implemented reliably. That's
> > > > why we wanted to give it a try. But if we can't achieve predictable
> > > > results with it, I'm fine to drop this approach and conflict_type. We
> > > > can consider a better design in the future that doesn't depend on
> > > > non-vacuumed entries and provides a more robust method for identifying
> > > > deleted rows.
> > > >
> > >
> > > I agree having a separate update_deleted conflict would be beneficial,
> > > I'm not arguing against that - my point is actually that I think this
> > > conflict type is required, and that it needs to be detected reliably.
> > >
> >
> > When working with a distributed system, we must accept some form of
> > eventual consistency model. However, it's essential to design a
> > predictable and acceptable behavior. For example, if a change is a
> > result of a previous operation (such as an update on node B triggered
> > after observing an operation on node A), we can say that the operation
> > on node A happened before the operation on node B. Conversely, if
> > operations on nodes A and B are independent, we consider them
> > concurrent.
> >
> > In distributed systems, clock skew is a known issue. To establish a
> > consistency model, we need to ensure it guarantees the
> > "happens-before" relationship. Consider a scenario with three nodes:
> > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > changes are sent to NodeC, the clock skew might make NodeB's changes
> > appear to have occurred before NodeA's changes. However, we should
> > maintain data that indicates NodeB's changes were triggered after
> > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > changes happened after NodeA's changes, despite what the timestamps
> > suggest.
> >
> > A common method to handle such cases is using vector clocks for
> > conflict resolution.
> >
>
> I think the unbounded size of the vector could be a problem to store
> for each event. However, while researching previous discussions, it
> came to our notice that we have discussed this topic in the past as
> well in the context of standbys. For recovery_min_apply_delay, we
> decided the clock skew is not a problem as the settings of this
> parameter are much larger than typical time deviations between servers
> as mentioned in docs. Similarly for causal reads [1], there was a
> proposal to introduce max_clock_skew parameter and suggesting the user
> to make sure to have NTP set up correctly. We have tried to check
> other databases (like Ora and BDR) where CDR is implemented but didn't
> find anything specific to clock skew. So, I propose to go with a GUC
> like max_clock_skew such that if the difference of time between the
> incoming transaction's commit time and the local time is more than
> max_clock_skew then we raise an ERROR. It is not clear to me that
> putting bigger effort into clock skew is worth especially when other
> systems providing CDR feature (like Ora or BDR) for decades have not
> done anything like vector clocks. It is possible that this is less of
> a problem w.r.t CDR and just detecting the anomaly in clock skew is
> good enough.

I believe that if we've accepted this solution elsewhere, then we can
also consider the same. Basically, we're allowing the application to
set its tolerance for clock skew. And, if the skew exceeds that
tolerance, it's the application's responsibility to synchronize;
otherwise, an error will occur. This approach seems reasonable.
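
A tiny sketch of what such a check could amount to on the apply side (the GUC
name comes from the proposal above; everything else here is invented):

#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;        /* microseconds, as in PostgreSQL */

static int  max_clock_skew = 5;     /* hypothetical GUC, in seconds; -1 disables */

static bool
remote_commit_ts_within_skew(TimestampTz remote_commit_ts, TimestampTz local_now)
{
    if (max_clock_skew < 0)
        return true;
    /* Only a remote timestamp *ahead* of local time indicates skew here. */
    return (remote_commit_ts - local_now) <= (TimestampTz) max_clock_skew * 1000000;
}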

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Xact end leaves CurrentMemoryContext = TopMemoryContext

2024-06-17 Thread Tom Lane
David Rowley  writes:
> Instead, could we just not delete TopTransactionContext in
> AtCommit_Memory() and instead do MemoryContextReset() on it? Likewise
> in AtCleanup_Memory().

Hmm, that's a nice idea.  Maybe reset again in AtStart_Memory, although
that seems optional.  My first reaction was "what about memory context
callbacks attached to TopTransactionContext?" ... but those are defined
to be fired on either reset or delete, so semantically this seems like
it creates no issues.  And you're right that not constantly deleting
and recreating that context should save some microscopic amount.

> If we got to a stage where we didn't expect anything to allocate into
> that context outside of a transaction, we could check if the context
> is still reset in AtStart_Memory() and do something like raise a
> WARNING on debug builds (or Assert()) to alert us that some code that
> broke our expectations.

My point is exactly that I don't think that we can expect that,
or at least that the cost of guaranteeing it will vastly outweigh
any possible benefit.  (So I wasn't excited about Andres' suggestion.
But this one seems promising.)

I'll poke at this tomorrow, unless you're hot to try it right now.

regards, tom lane



