Consolidate ItemPointer to Datum conversion functions

2023-02-06 Thread Peter Eisentraut
Instead of defining the same set of macros several times, define it once 
in an appropriate header file.  In passing, convert to inline functions.

From 11227ec513c9f6fd05d8286156c63ebc9c200017 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 6 Feb 2023 10:48:32 +0100
Subject: [PATCH] Consolidate ItemPointer to Datum conversion functions

Instead of defining the same set of macros several times, define it
once in an appropriate header file.  In passing, convert to inline
functions.
---
 contrib/pageinspect/btreefuncs.c |  2 --
 contrib/pageinspect/ginfuncs.c   |  3 ---
 contrib/pageinspect/gistfuncs.c  |  2 --
 src/backend/utils/adt/tid.c  |  2 --
 src/include/storage/itemptr.h| 17 +
 5 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index e4e5dc3c81..9cdc8e182b 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -50,8 +50,6 @@ PG_FUNCTION_INFO_V1(bt_multi_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
 #define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
-#define DatumGetItemPointer(X)  ((ItemPointer) DatumGetPointer(X))
-#define ItemPointerGetDatum(X)  PointerGetDatum(X)
 
 /* 
  * structure for single btree page statistics
diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c
index efaa47e86d..0f846988df 100644
--- a/contrib/pageinspect/ginfuncs.c
+++ b/contrib/pageinspect/ginfuncs.c
@@ -21,9 +21,6 @@
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
-#define DatumGetItemPointer(X)  ((ItemPointer) DatumGetPointer(X))
-#define ItemPointerGetDatum(X)  PointerGetDatum(X)
-
 
 PG_FUNCTION_INFO_V1(gin_metapage_info);
 PG_FUNCTION_INFO_V1(gin_page_opaque_info);
diff --git a/contrib/pageinspect/gistfuncs.c b/contrib/pageinspect/gistfuncs.c
index 3a947c82af..100697814d 100644
--- a/contrib/pageinspect/gistfuncs.c
+++ b/contrib/pageinspect/gistfuncs.c
@@ -31,8 +31,6 @@ PG_FUNCTION_INFO_V1(gist_page_items_bytea);
 
 #define IS_GIST(r) ((r)->rd_rel->relam == GIST_AM_OID)
 
-#define ItemPointerGetDatum(X)  PointerGetDatum(X)
-
 
 Datum
 gist_page_opaque_info(PG_FUNCTION_ARGS)
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 251219a1ef..7c64b0d693 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -37,8 +37,6 @@
 #include "utils/varlena.h"
 
 
-#define DatumGetItemPointer(X)  ((ItemPointer) DatumGetPointer(X))
-#define ItemPointerGetDatum(X)  PointerGetDatum(X)
 #define PG_GETARG_ITEMPOINTER(n) DatumGetItemPointer(PG_GETARG_DATUM(n))
 #define PG_RETURN_ITEMPOINTER(x) return ItemPointerGetDatum(x)
 
diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h
index 354e50e68b..0715ba0d17 100644
--- a/src/include/storage/itemptr.h
+++ b/src/include/storage/itemptr.h
@@ -222,4 +222,21 @@ extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
 extern void ItemPointerInc(ItemPointer pointer);
 extern void ItemPointerDec(ItemPointer pointer);
 
+/* 
+ * Datum conversion functions
+ * 
+ */
+
+static inline ItemPointer
+DatumGetItemPointer(Datum X)
+{
+   return (ItemPointer) DatumGetPointer(X);
+}
+
+static inline Datum
+ItemPointerGetDatum(const ItemPointerData *X)
+{
+   return PointerGetDatum(X);
+}
+
 #endif /* ITEMPTR_H */
-- 
2.39.1



Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2023-02-06 Thread Alvaro Herrera
I'm writing my own patch for this problem.  While playing around with
it, I noticed this:

struct Command {
    PQExpBufferData            lines;               /*     0    24 */
    char *                     first_line;          /*    24     8 */
    int                        type;                /*    32     4 */
    MetaCommand                meta;                /*    36     4 */
    int                        argc;                /*    40     4 */

    /* XXX 4 bytes hole, try to pack */

    char *                     argv[256];           /*    48  2048 */
    /* --- cacheline 32 boundary (2048 bytes) was 48 bytes ago --- */
    char *                     varprefix;           /*  2096     8 */
    PgBenchExpr *              expr;                /*  2104     8 */
    /* --- cacheline 33 boundary (2112 bytes) --- */
    SimpleStats                stats;               /*  2112    40 */
    int64                      retries;             /*  2152     8 */
    int64                      failures;            /*  2160     8 */

    /* size: 2168, cachelines: 34, members: 11 */
    /* sum members: 2164, holes: 1, sum holes: 4 */
    /* last cacheline: 56 bytes */
};

Not great.  I suppose this makes pgbench slower than it needs to be.
Can we do better?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: Consolidate ItemPointer to Datum conversion functions

2023-02-06 Thread Heikki Linnakangas

On 06/02/2023 11:54, Peter Eisentraut wrote:

Instead of defining the same set of macros several times, define it once
in an appropriate header file.  In passing, convert to inline functions.


Looks good to me. Did you consider moving PG_GETARG_ITEMPOINTER and 
PG_RETURN_ITEMPOINTER, too? They're only used in tid.c, but for most 
datatypes, we define the PG_GETARG and PG_RETURN macros in the same 
header file as the Datum conversion functions.
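
For reference, those are just the two one-liners currently defined in tid.c,
which could move next to the new inline functions more or less verbatim:

#define PG_GETARG_ITEMPOINTER(n) DatumGetItemPointer(PG_GETARG_DATUM(n))
#define PG_RETURN_ITEMPOINTER(x) return ItemPointerGetDatum(x)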


- Heikki





RE: Perform streaming logical transactions by background workers and parallel apply

2023-02-06 Thread houzj.f...@fujitsu.com
Hi, 

while reading the code, I noticed that in pa_send_data() we set the wait event
to WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE while sending the
message to the queue. Because this state is used in multiple places, users might
not be able to distinguish what they are waiting for. So it seems better
to use WAIT_EVENT_MQ_SEND here, which will be easier to distinguish and
understand. Here is a tiny patch for that.

Best Regards,
Hou zj


0001-Use-appropriate-wait-event-when-sending-data.patch
Description: 0001-Use-appropriate-wait-event-when-sending-data.patch


Re: Pluggable toaster

2023-02-06 Thread Aleksander Alekseev
Hi,

> > So we're really quite surprised that it has got so little feedback. We've
> > got
> > some opinions on the approach but there is no general one on the approach
> > itself except doubts about whether the TOAST mechanism needs revision at all.
>
> The problem for me is that what you've been posting doesn't actually fix
> any problem, but instead introduces lots of new code and complexity.

> > Currently we're busy revising the whole Pluggable TOAST API to make it
> > available as an extension and based on hooks to minimize changes in
> > the core. It will be available soon.
>
> I don't think we should accept that either. It still doesn't improve
> anything about toast, it just allows you to do such improvements out of
> core.

Agree. On top of that referencing non-reproducible benchmarks doesn't
help. There were some slides referenced in the thread but I couldn't
find exact steps to reproduce the benchmarks.

Your desire to improve the TOAST mechanism is much appreciated. I
believe we are all on the same side here, the one where people work
together to make PostgreSQL an even better DBMS.

However in order to achieve this firstly a consensus within the
community should be reached about how exactly we are going to do this.
Afterwards, all the code and benchmarks should be made publicly
available under a proper license so that anyone could explore and
reproduce them. Last but not least, the complexity should really be
taken into account. There are real people who are going to maintain
the code after (and if) it will be merged, and there are not so many
of them.

The problem I see is that the type-aware TOASTers skipped step (1) and
went right to step (2), and don't seem to consider (3), even after it
was explicitly pointed out that we should take a step back and return
to (1).

-- 
Best regards,
Aleksander Alekseev




RE: Ability to reference other extensions by schema in extension scripts

2023-02-06 Thread Regina Obe
> > > Here is first version of my patch using the
> > > @extschema:extensionname@ syntax you proposed.
> > >
> > > This patch includes:
> > > 1) Changes to replace references of @extschema:extensionname@ with
> > > the schema of the required extension
> > > 2) Documentation for the feature
> > > 3) Tests for the feature.
> > >

Attached is a revised version of the original patch. It is revised to
prevent ALTER EXTENSION .. SET SCHEMA if there is a dependent extension
that references the extension in its scripts using
@extschema:extensionname@.  It also adds additional tests to verify the
new feature.

In going through the code base, I was tempted to add a new dependency type
instead of using the existing DEPENDENCY_AUTO.  I think this would be
cleaner, but I felt that was overstepping the area a bit, since it requires
making changes to dependency.h and dependency.c.

My main concern with using DEPENDENCY_AUTO is that it was designed for
cases where an object can be dropped without the need for CASCADE.  In this
case, we don't want a dependent extension to be dropped if its required
extension is dropped.  However, since there will already exist
a DEPENDENCY_NORMAL dependency between the two extensions, I figure we are
protected against that issue already.

The issue I ran into is that there doesn't seem to be an easy way of checking if
a pg_depend record is already in place, so I ended up dropping it first with
deleteDependencyRecordsForSpecific, so I wouldn't need to check, and then
re-adding it.

The reason for that is that during CREATE EXTENSION it would need to create the
dependency.
It would also need to do so with ALTER EXTENSION .. UPDATE, since extensions
could later on add it in their upgrade scripts, and so there would end up being
dupes after many ALTER EXTENSION UPDATE calls.


pg_depend's getAutoExtensionsOfObject seemed suited to that check, as is
done in alter.c's ExecAlterObjectDependsStmt:

	/* Avoid duplicates */
	currexts = getAutoExtensionsOfObject(address.classId,
										 address.objectId);
	if (!list_member_oid(currexts, refAddr.objectId))
		recordDependencyOn(&address, &refAddr,
						   DEPENDENCY_AUTO_EXTENSION);

but it is hard-coded to only check DEPENDENCY_AUTO_EXTENSION.

Why isn't there a variant of getAutoExtensionsOfObject that takes a dependency
type as an option, so it would be more useful?  Or is there functionality for
that which I missed?
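
For illustration, here is a rough sketch of the kind of variant being asked
about, modeled on the existing getAutoExtensionsOfObject() in
catalog/pg_depend.c.  The function name and the extra deptype parameter are
hypothetical, not existing code:

List *
getExtensionsOfObjectWithDeptype(Oid classId, Oid objectId,
								 DependencyType deptype)
{
	List	   *result = NIL;
	Relation	depRel;
	ScanKeyData key[2];
	SysScanDesc scan;
	HeapTuple	tup;

	depRel = table_open(DependRelationId, AccessShareLock);

	ScanKeyInit(&key[0],
				Anum_pg_depend_classid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(classId));
	ScanKeyInit(&key[1],
				Anum_pg_depend_objid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(objectId));

	scan = systable_beginscan(depRel, DependDependerIndexId, true,
							  NULL, 2, key);

	while (HeapTupleIsValid(tup = systable_getnext(scan)))
	{
		Form_pg_depend depform = (Form_pg_depend) GETSTRUCT(tup);

		/* only collect extensions referenced with the requested deptype */
		if (depform->refclassid == ExtensionRelationId &&
			depform->deptype == deptype)
			result = lappend_oid(result, depform->refobjid);
	}

	systable_endscan(scan);
	table_close(depRel, AccessShareLock);

	return result;
}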

Thanks,
Regina







0002-Allow-use-of-extschema-reqextname-to-reference.patch
Description: Binary data


RE: Perform streaming logical transactions by background workers and parallel apply

2023-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

> while reading the code, I noticed that in pa_send_data() we set wait event
> to WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE while sending
> the
> message to the queue. Because this state is used in multiple places, user 
> might
> not be able to distinguish what they are waiting for. So it seems better
> to use WAIT_EVENT_MQ_SEND here, which will be easier to distinguish and
> understand. Here is a tiny patch for that.

In LogicalParallelApplyLoop(), we introduced the new wait event
WAIT_EVENT_LOGICAL_PARALLEL_APPLY_MAIN, whereas it practically waits on a shared
message queue and seems to be the same as WAIT_EVENT_MQ_RECEIVE.
Do you have a policy of reusing an existing event instead of adding a new event?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Pluggable toaster

2023-02-06 Thread Alvaro Herrera
On 2023-Feb-06, Nikita Malakhov wrote:

> Currently we're busy revising the whole Pluggable TOAST API to make it
> available as an extension and based on hooks to minimize changes in
> the core. It will be available soon.

Hmm, I'm not sure why would PGDG want to accept such a thing.  I read
"minimize changes" as "open source Postgres can keep their crap
implementation and companies will offer good ones for a fee".  I'd
rather have something that can give users direct benefit -- not hide it
behind proprietary licenses forcing each company to implement their own
performant toaster.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: proposal: psql: psql variable BACKEND_PID

2023-02-06 Thread Daniel Verite
Corey Huinker wrote:

> Manually testing confirms that it works, at least for the connected state. I
> don't actually know how to get psql to invoke DISCONNECT, so I killed the dev
> server and can confirm

Maybe something like this could be used, with no external action:

  postgres=# \echo :BACKEND_PID 
  10805
  postgres=# create user tester superuser;
  CREATE ROLE
  postgres=# \c postgres tester
  You are now connected to database "postgres" as user "tester".
  postgres=# alter user tester nosuperuser connection limit 0;
  ALTER ROLE
  postgres=# select pg_terminate_backend(pg_backend_pid());
  FATAL:  terminating connection due to administrator command
  server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
  The connection to the server was lost. Attempting reset: Failed.
  The connection to the server was lost. Attempting reset: Failed.

  !?> \echo :BACKEND_PID
  :BACKEND_PID


> In the varlistentry, I suggest we add "This variable is unset when the
> connection is lost." after "but can be changed or unset.

Personally I'd much rather have BACKEND_PID set to 0 rather than being unset
when not connected. For one thing it allows safely using \if :BACKEND_PID.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




RE: Perform streaming logical transactions by background workers and parallel apply

2023-02-06 Thread houzj.f...@fujitsu.com
On Monday, February 6, 2023 6:34 PM Kuroda, Hayato  
wrote:
> > while reading the code, I noticed that in pa_send_data() we set wait
> > event to WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE while
> sending
> > the message to the queue. Because this state is used in multiple
> > places, user might not be able to distinguish what they are waiting
> > for. So it seems better to use WAIT_EVENT_MQ_SEND here, which will
> > be easier to distinguish and understand. Here is a tiny patch for that.
> 
> In LogicalParallelApplyLoop(), we introduced the new wait event
> WAIT_EVENT_LOGICAL_PARALLEL_APPLY_MAIN whereas it practically waits on a
> shared message queue and seems to be the same as WAIT_EVENT_MQ_RECEIVE.
> Do you have a policy to reuse the event instead of adding a new event?

I think PARALLEL_APPLY_MAIN waits for two kinds of events: 1) waiting for a new
message from the queue, and 2) waiting for the partial file state to be set. So I
think introducing a new general event for them is better, and it is also
consistent with WAIT_EVENT_LOGICAL_APPLY_MAIN, which is used in the main
loop of the leader apply worker (LogicalRepApplyLoop). But the event in
pa_send_data() is only for the message send, so it seems fine to use
WAIT_EVENT_MQ_SEND; besides, MQ_SEND is also unique in the parallel apply
worker, so users can distinguish it without adding a new event.

Best Regards,
Hou zj


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Amit Kapila
On Mon, Feb 6, 2023 at 12:36 PM Takamichi Osumi (Fujitsu)
 wrote:
>

I have made a couple of changes in the attached: (a) changed a few
error and LOG messages; (a) added/changed comments. See, if these look
good to you then please include them in the next version.

-- 
With Regards,
Amit Kapila.


v28_amit_changes.1.patch
Description: Binary data


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Amit Kapila
On Tue, Jan 24, 2023 at 5:02 AM Euler Taveira  wrote:
>
>
> -   elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, 
> flush %X/%X in-delayed: %d",
> +   elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, 
> flush %X/%X, apply delay: %s",
>  force,
>  LSN_FORMAT_ARGS(recvpos),
>  LSN_FORMAT_ARGS(writepos),
>  LSN_FORMAT_ARGS(flushpos),
> -in_delayed_apply);
> +in_delayed_apply? "yes" : "no");
>
> It is better to use a string to represent the yes/no option.
>

I think it is better to be consistent with the existing force
parameter which is also boolean, otherwise, it will look odd.

-- 
With Regards,
Amit Kapila.




Re: proposal: psql: psql variable BACKEND_PID

2023-02-06 Thread Daniel Verite
I wrote:

> > In the varlistentry, I suggest we add "This variable is unset when the
> > connection is lost." after "but can be changed or unset.
> 
> Personally I'd much rather have BACKEND_PID set to 0 rather than being unset
> when not connected. For one thing it allows safely using \if :BACKEND_PID.

Oops it turns out that was wishful thinking from me.
\if does not interpret a non-zero integer as true, except for the
value "1". 
I'd still prefer BACKEND_PID being 0 when not connected, though.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Takamichi Osumi (Fujitsu)
On Monday, February 6, 2023 8:51 PM Amit Kapila  wrote:
> On Mon, Feb 6, 2023 at 12:36 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> 
> I have made a couple of changes in the attached: (a) changed a few error and
> LOG messages; (a) added/changed comments. See, if these look good to you
> then please include them in the next version.
Hi, thanks for sharing the patch !

The proposed changes make comments easier to understand
and more aligned with other existing comments. So, LGTM.

The attached patch v29 has included your changes.


Best Regards,
Takamichi Osumi



v29-0001-Time-delayed-logical-replication-subscriber.patch
Description:  v29-0001-Time-delayed-logical-replication-subscriber.patch


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Takamichi Osumi (Fujitsu)
Hi,


On Monday, February 6, 2023 8:57 PM Amit Kapila  wrote:
> On Tue, Jan 24, 2023 at 5:02 AM Euler Taveira  wrote:
> >
> >
> > -   elog(DEBUG2, "sending feedback (force %d) to recv %X/%X,
> write %X/%X, flush %X/%X in-delayed: %d",
> > +   elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write
> > + %X/%X, flush %X/%X, apply delay: %s",
> >  force,
> >  LSN_FORMAT_ARGS(recvpos),
> >  LSN_FORMAT_ARGS(writepos),
> >  LSN_FORMAT_ARGS(flushpos),
> > -in_delayed_apply);
> > +in_delayed_apply? "yes" : "no");
> >
> > It is better to use a string to represent the yes/no option.
> >
> 
> I think it is better to be consistent with the existing force parameter which 
> is
> also boolean, otherwise, it will look odd.
Agreed. The latest patch v29 posted in [1] followed this suggestion.

Kindly have a look at it.

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373A59E7B74AA4F96B62BEAEDDA9%40TYCPR01MB8373.jpnprd01.prod.outlook.com



Best Regards,
Takamichi Osumi





Re: Pluggable toaster

2023-02-06 Thread Nikita Malakhov
Hi!

Existing TOAST has several very painful drawbacks - lack of an UPDATE
operation, bloating of TOAST tables, and limits that are implicitly imposed
on base tables by their TOAST tables - so it does not seem fair to say that
Pluggable TOAST does not solve any problems but just introduces new
ones.

The main reason behind this decision is that keeping the first implementation
on the side of vanilla (I mean rebasing it) over time is very difficult
due to the very invasive nature of this solution.

So we decided to reduce changes in the core to the minimum necessary
to make it available through the hooks, because the hooks part is very
lightweight and simple to keep rebasing onto the vanilla core. We plan
to keep this extension free with the PostgreSQL license, so any PostgreSQL
user could benefit from the TOAST on steroids, and sometime in the
future it will be a much simpler task to integrate the Pluggable TOAST into
the vanilla, along with our advanced TOAST implementations which
we plan to keep under Open Source licenses too.


On Mon, Feb 6, 2023 at 1:49 PM Alvaro Herrera 
wrote:

> On 2023-Feb-06, Nikita Malakhov wrote:
>
> > Currently we're busy revising the whole Pluggable TOAST API to make it
> > available as an extension and based on hooks to minimize changes in
> > the core. It will be available soon.
>
> Hmm, I'm not sure why would PGDG want to accept such a thing.  I read
> "minimize changes" as "open source Postgres can keep their crap
> implementation and companies will offer good ones for a fee".  I'd
> rather have something that can give users direct benefit -- not hide it
> behind proprietary licenses forcing each company to implement their own
> performant toaster.
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: [PATCH] Compression dictionaries for JSONB

2023-02-06 Thread Aleksander Alekseev
Hi,

> > So to clarify, are we talking about tuple-level compression? Or
> > perhaps page-level compression?
>
> Tuple level.
>
> What I think we should do is basically this:
>
> When we compress datums, we know the table being targeted. If there's a
> pg_attribute parameter indicating we should, we can pass a prebuilt
> dictionary to the LZ4/zstd [de]compression functions.
>
> It's possible we'd need to use a somewhat extended header for such
> compressed datums, to reference the dictionary "id" to be used when
> decompressing, if the compression algorithms don't already have that in
> one of their headers, but that's entirely doable.
>
> A quick demo of the effect size:
> [...]
> Here's the results:
>
>            lz4      zstd     uncompressed
> no dict    1328794  982497   3898498
> dict        375070  267194
>
> I'd say the effect of the dictionary is pretty impressive. And remember,
> this is with the dictionary having been trained on a subset of the data.
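
For concreteness, here is a minimal standalone sketch of the dictionary APIs
referred to above, using libzstd (this is not PostgreSQL code; the sample
string and the dictionary contents are made up, and a real dictionary would
be trained beforehand, e.g. with "zstd --train"):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
	/* made-up sample datum and "dictionary" (zstd also accepts raw content) */
	const char	src[] = "{\"country\": \"Argentina\", \"city\": \"Buenos Aires\"}";
	const char	dict[] = "{\"country\": \"\", \"city\": \"\"}";
	size_t		bound = ZSTD_compressBound(sizeof(src));
	char	   *compressed = malloc(bound);
	char		roundtrip[sizeof(src)];
	ZSTD_CCtx  *cctx = ZSTD_createCCtx();
	ZSTD_DCtx  *dctx = ZSTD_createDCtx();

	/* compress the datum with the shared dictionary */
	size_t		csize = ZSTD_compress_usingDict(cctx, compressed, bound,
												src, sizeof(src),
												dict, sizeof(dict), 3);

	if (ZSTD_isError(csize))
	{
		fprintf(stderr, "compress: %s\n", ZSTD_getErrorName(csize));
		return 1;
	}

	/* decompression must be handed the same dictionary */
	size_t		dsize = ZSTD_decompress_usingDict(dctx, roundtrip,
												  sizeof(roundtrip),
												  compressed, csize,
												  dict, sizeof(dict));

	if (ZSTD_isError(dsize))
	{
		fprintf(stderr, "decompress: %s\n", ZSTD_getErrorName(dsize));
		return 1;
	}

	printf("original %zu bytes, compressed %zu bytes, roundtrip ok: %d\n",
		   sizeof(src), csize,
		   dsize == sizeof(src) && memcmp(src, roundtrip, dsize) == 0);

	ZSTD_freeCCtx(cctx);
	ZSTD_freeDCtx(dctx);
	free(compressed);
	return 0;
}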

I see your point regarding the fact that creating dictionaries on a
training set is too beneficial to neglect it. Can't argue with this.

What puzzles me though is: what prevents us from doing this on a page
level as suggested previously?

The more similar data you compress, the more space and disk I/O you save.
Additionally you don't have to compress/decompress the data every time
you access it. Everything that's in shared buffers is uncompressed.
Not to mention the fact that you don't care what's in pg_attribute,
the fact that schema may change, etc. There is a table and a
dictionary for this table that you refresh from time to time. Very
simple.

Of course the disadvantage here is that we are not saving the memory,
unlike the case of tuple-level compression. But we are saving a lot of
CPU cycles and doing less disk IOs. I would argue that saving CPU
cycles is generally more preferable. CPUs are still often a bottleneck
while the memory becomes more and more available, e.g there are
relatively affordable (for a company, not an individual) 1 TB RAM
instances, etc.

So it seems to me that doing page-level compression would be simpler
and more beneficial in the long run (10+ years). Don't you agree?

-- 
Best regards,
Aleksander Alekseev




Re: proposal: psql: psql variable BACKEND_PID

2023-02-06 Thread Pavel Stehule
po 6. 2. 2023 v 13:03 odesílatel Daniel Verite 
napsal:

> I wrote:
>
> > > In the varlistentry, I suggest we add "This variable is unset when the
> > > connection is lost." after "but can be changed or unset.
> >
> > Personally I'd much rather have BACKEND_PID set to 0 rather than being
> unset
> > when not connected. For one thing it allows safely using \if
> :BACKEND_PID.
>
> Oops it turns out that was wishful thinking from me.
> \if does not interpret a non-zero integer as true, except for the
> value "1".
> I'd still prefer BACKEND_PID being 0 when not connected, though.
>

I think psql never tries to execute a query if the engine is not connected,
so for usage in queries the undefined state is not important - it will
always be defined.

For use in \if, unset may be a better state, because you can try to use
the {?varname} syntax.

0 is theoretically a valid process id number, so I am not sure if 0 is ok. I
don't know if there is some number that can be used as an invalid process id?




>
> Best regards,
> --
> Daniel Vérité
> https://postgresql.verite.pro/
> Twitter: @DanielVerite
>


Re: Pluggable toaster

2023-02-06 Thread Aleksander Alekseev
Hi,

> The main reason behind this decision is that keeping the first implementation
> on the side of the vanilla (I mean rebasing it) over time is very difficult 
> due
> to the very invasive nature of this solution.
>
> So we decided to reduce changes in the core to the minimum necessary
> to make it available through the hooks, because the hooks part is very
> lightweight and simple to keep rebasing onto the vanilla core. We plan
> to keep this extension free with the PostgreSQL license, so any PostgreSQL
> user could benefit from the TOAST on steroids, and sometimes in the
> future it will be a much simpler task to integrate the Pluggable TOAST into
> the vanilla, along with our advanced TOAST implementations which
> we plan to keep under Open Source licenses too.

That's great to hear. I'm looking forward to the link to the
corresponding GitHub repository. Please let us know when this effort
will be available for testing and benchmarking!

I would like to point out however that there were several other pieces
of feedback that could have been missed:

* No one wants to see this as an extension. This was my original
proposal (adding ZSON to /contrib/) and it was rejected. The community
explicitly wants this to be a core feature with its syntax,
autocompletion, documentation and stuff.
* The community wants the feature to have a simple implementation. You
said yourself that the idea of type-aware TOASTers is very invasive,
and I completely agree.
* People also want this to be simple from the user perspective, as
simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd];

At least this is my personal summary/impression from following the mailing list.

Anyhow since we are back to the stage where we discuss the RFC I
suggest continuing it in the compression dictionaries thread [1] since
we made noticeable progress there already.

[1]: 
https://postgr.es/m/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Robert Haas
On Sat, Feb 4, 2023 at 12:37 PM Tom Lane  wrote:
> But it's not clear to me why you're allergic to the perl wrapper?
> It's not like that's the only perl infrastructure in our build process.
> Also, whether or not we could push some of what it does into pg_bsd_indent
> proper, I can't see pushing all of it (for instance, the very PG-specific
> list of typedef exclusions).

I don't mind that there is a script. I do mind that it's not that good
of a script. There have been some improvements for which I am
grateful, like removing the thing where the first argument was taken
as a typedefs file under some circumstances. But there are still some
things that I would like:

1. I'd like to be able to run pgindent src/include and have it indent
everything relevant under src/include. Right now that silently does
nothing.

2. I'd like an easy way to indent the unstaged files in the current
directory (e.g. pgindent --dirty) or the files that have been queued
up for commit (e.g. pgindent --cached).

3. I'd also like an easy way to indent every file touched by a recent
commit, e.g. pgindent --commit HEAD, pgindent --commit HEAD~2,
pgindent --commit 62e1e28bf7.

It'd be much less annoying to include this in my workflow with these
kinds of options. For instance:

patch -p1 < ~/Downloads/some_stuff_v94.patch
# committer adjustments as desired
git add -u
pgindent --cached
git diff # did pgindent change anything? does it look ok?
git commit -a

For a while I, like some others here, was trying to be religious about
pgindenting at least the bigger commits that I pushed. But I fear I've
grown slack. I don't mind if we tighten up the process, but the better
we make the tools, the less friction it will cause.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Tom Lane
Robert Haas  writes:
> I don't mind that there is a script. I do mind that it's not that good
> of a script. There have been some improvements for which I am
> grateful, like removing the thing where the first argument was taken
> as a typedefs file under some circumstances. But there are still some
> things that I would like:

I have no objection to someone coding those things up ;-).
I'll just note that adding features like those to a Perl script
is going to be a ton easier than doing it inside pg_bsd_indent.

regards, tom lane




Re: [PATCH] Compression dictionaries for JSONB

2023-02-06 Thread Matthias van de Meent
On Mon, 6 Feb 2023 at 15:03, Aleksander Alekseev
 wrote:
>
> Hi,
>
> I see your point regarding the fact that creating dictionaries on a
> training set is too beneficial to neglect it. Can't argue with this.
>
> What puzzles me though is: what prevents us from doing this on a page
> level as suggested previously?

The complexity of page-level compression is significant, as pages are
currently a base primitive of our persistency and consistency scheme.
TOAST builds on top of these low level primitives and has access to
catalogs, but smgr doesn't do that and can't have that, respectively,
because it needs to be accessible and usable without access to the catalogs
catalogs during replay in recovery.

I would like to know how you envision we would provide consistency
when page-level compression would be implemented - wouldn't it
increase WAL overhead (and WAL synchronization overhead) when writing
out updated pages to a new location due to it changing compressed
size?

> More similar data you compress the more space and disk I/O you save.
> Additionally you don't have to compress/decompress the data every time
> you access it. Everything that's in shared buffers is uncompressed.
> Not to mention the fact that you don't care what's in pg_attribute,
> the fact that schema may change, etc. There is a table and a
> dictionary for this table that you refresh from time to time. Very
> simple.

You cannot "just" refresh a dictionary used once to compress an
object, because you need it to decompress the object too.

Additionally, I don't think block-level compression is related to this
thread in a meaningful way: TOAST and datatype-level compression
reduce the on-page size of attributes, and would benefit from improved
compression regardless of the size of pages when stored on disk, but a
page will always use 8kB when read into memory. A tuple that uses less
space on pages will thus always be the better option when you're
optimizing for memory usage, while also reducing storage size.

> Of course the disadvantage here is that we are not saving the memory,
> unlike the case of tuple-level compression. But we are saving a lot of
> CPU cycles

Do you have any indication for how attribute-level compares against
page-level compression in cpu cycles?

> and doing less disk IOs.

Less IO bandwidth, but I doubt it uses fewer operations, as each page
would still need to be read, which currently happens as a page-by-page
IO operation. 10 page read operations use 10 syscalls to read data
from disk - 10 IO ops.

> I would argue that saving CPU
> cycles is generally more preferable. CPUs are still often a bottleneck
> while the memory becomes more and more available, e.g there are
> relatively affordable (for a company, not an individual) 1 TB RAM
> instances, etc.

But not all systems have that 1TB RAM, and we cannot expect all users
to increase their RAM.

> So it seems to me that doing page-level compression would be simpler
> and more beneficial in the long run (10+ years). Don't you agree?

Page-level compression can not compress patterns that have a length of
more than 1 page. TOAST is often used to store values larger than 8kB,
which we'd prefer to compress to the greatest extent possible. So, a
value-level compression method specialized to the type of the value
does make a lot of sense, too.

I'm not trying to say that compressing pages doesn't make sense or is
useless, I just don't think that we should ignore attribute-level
compression just because page-level compression could at some point be
implemented too.


Kind regards,

Matthias van de Meent




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Andrew Dunstan


On 2023-02-06 Mo 09:40, Robert Haas wrote:

On Sat, Feb 4, 2023 at 12:37 PM Tom Lane  wrote:

But it's not clear to me why you're allergic to the perl wrapper?
It's not like that's the only perl infrastructure in our build process.
Also, whether or not we could push some of what it does into pg_bsd_indent
proper, I can't see pushing all of it (for instance, the very PG-specific
list of typedef exclusions).

I don't mind that there is a script. I do mind that it's not that good
of a script. There have been some improvements for which I am
grateful, like removing the thing where the first argument was taken
as a typedefs file under some circumstances. But there are still some
things that I would like:

1. I'd like to be able to run pgindent src/include and have it indent
everything relevant under src/include. Right now that silently does
nothing.

2. I'd like an easy way to indent the unstaged files in the current
directory (e.g. pgindent --dirty) or the files that have been queued
up for commit (e.g. pgindent --cached).

3. I'd also like an easy way to indent every file touched by a recent
commit, e.g. pgindent --commit HEAD, pgindent --commit HEAD~2,
pgindent --commit 62e1e28bf7.



Good suggestions. 1 and 3 seem fairly straightforward. I'll start on 
those, and look into 2.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error

2023-02-06 Thread Tom Lane
Alvaro Herrera  writes:
> I'm writing my own patch for this problem.  While playing around with
> it, I noticed this:

> struct Command {
> /* size: 2168, cachelines: 34, members: 11 */
> /* sum members: 2164, holes: 1, sum holes: 4 */
> /* last cacheline: 56 bytes */
> };

I think the original intent was for argv[] to be at the end,
which fell victim to ye olde add-at-the-end antipattern.
Cache-friendliness-wise, putting it back to the end would
likely be enough.  But turning it into a variable-size array
would be better from a functionality standpoint.
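
For example, a rough sketch of that shape, with argv[] turned into a
flexible array member at the end (member order below is illustrative only,
not a tested layout):

struct Command
{
	PQExpBufferData lines;
	char	   *first_line;
	int			type;
	MetaCommand meta;
	char	   *varprefix;
	PgBenchExpr *expr;
	SimpleStats stats;
	int64		retries;
	int64		failures;
	int			argc;
	/* must stay last; allocate with the number of slots actually needed */
	char	   *argv[FLEXIBLE_ARRAY_MEMBER];
};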

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Robert Haas
On Mon, Feb 6, 2023 at 10:16 AM Tom Lane  wrote:
> I'll just note that adding features like those to a Perl script
> is going to be a ton easier than doing it inside pg_bsd_indent.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Robert Haas
On Mon, Feb 6, 2023 at 10:21 AM Andrew Dunstan  wrote:
> Good suggestions. 1 and 3 seem fairly straightforward. I'll start on those, 
> and look into 2.

Thanks!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




OpenSSL 3.0.0 vs old branches

2023-02-06 Thread Andrew Dunstan


I recently moved crake to a new machine running Fedora 36, which has 
OpenSSL 3.0.0. This causes the SSL tests to fail on branches earlier 
than release 13, so I propose to backpatch commit f0d2c65f17 to the 
release 11 and 12 branches.



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: when the startup process doesn't (logging startup delays)

2023-02-06 Thread Robert Haas
On Thu, Feb 2, 2023 at 11:22 PM Bharath Rupireddy
 wrote:
> I took the v4 patch, added some comments and attached it as the v7
> patch here. Please find it.

Committed and back-patched to v15.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: when the startup process doesn't (logging startup delays)

2023-02-06 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 2, 2023 at 11:22 PM Bharath Rupireddy
>  wrote:
>> I took the v4 patch, added some comments and attached it as the v7
>> patch here. Please find it.

> Committed and back-patched to v15.

Umm ... is this really the sort of patch to be committing on a
release wrap day?

regards, tom lane




Re: when the startup process doesn't (logging startup delays)

2023-02-06 Thread Robert Haas
On Mon, Feb 6, 2023 at 11:07 AM Tom Lane  wrote:
> Umm ... is this really the sort of patch to be committing on a
> release wrap day?

Oh, shoot, I wasn't thinking about that. Would you like me to revert
it in v15 for now?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: OpenSSL 3.0.0 vs old branches

2023-02-06 Thread Tom Lane
Andrew Dunstan  writes:
> I recently moved crake to a new machine running Fedora 36, which has 
> OpenSSL 3.0.0. This causes the SSL tests to fail on branches earlier 
> than release 13, so I propose to backpatch commit f0d2c65f17 to the 
> release 11 and 12 branches.

Hmm ... according to that commit message,

  Note that the minimum supported OpenSSL version is 1.0.1 as of
  7b283d0e1d1d79bf1c962d790c94d2a53f3bb38a, so this does not introduce
  any new version requirements.

So presumably, changing this test would break it for OpenSSL 0.9.8,
which is still nominally supported in those branches.  On the other
hand, this test isn't run by default, so users would likely never
notice anyway.

On the whole, +1 for doing this (after the release freeze lifts).

regards, tom lane




Re: when the startup process doesn't (logging startup delays)

2023-02-06 Thread Tom Lane
Robert Haas  writes:
> On Mon, Feb 6, 2023 at 11:07 AM Tom Lane  wrote:
>> Umm ... is this really the sort of patch to be committing on a
>> release wrap day?

> Oh, shoot, I wasn't thinking about that. Would you like me to revert
> it in v15 for now?

Yeah, seems like the safest course.  I wouldn't object to it going in
after the release is over, but right now there's zero margin for error.

regards, tom lane




Re: Where is the filter?

2023-02-06 Thread Robert Haas
On Sat, Feb 4, 2023 at 11:29 PM jack...@gmail.com  wrote:
> When I use 'select * from t where a = 1' and I debug to find where the
> 'a = 1' is used, when I arrive at ExecScan in src/backend/executor/execScan.c,
> line 158, where is this 'a = 1' stored?

It depends somewhat on what query plan you got. For instance if it was
a Seq Scan then it will be a filter-condition, or "qual", and the call
to ExecQual() later in ExecScan() will be responsible for evaluating
it. But if you are using an index scan, then it will probably become
an index qual, and those are passed down into the index machinery and
internally handled by the index AM. It's a good idea when you're
debugging this sort of thing to start by looking at the EXPLAIN or
EXPLAIN ANALYZE output, and perhaps also the output with
debug_print_plan = true.
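
For the Seq Scan case, the relevant part of ExecScan() boils down to roughly
the following (abridged and paraphrased from execScan.c, so details may
differ between versions):

	ExprState  *qual = node->ps.qual;	/* the "a = 1" filter lives here */
	ExprContext *econtext = node->ps.ps_ExprContext;

	for (;;)
	{
		TupleTableSlot *slot = ExecScanFetch(node, accessMtd, recheckMtd);

		/* (abridged: empty-slot handling omitted) */
		econtext->ecxt_scantuple = slot;

		/* check that the current tuple satisfies the qual-clause */
		if (qual == NULL || ExecQual(qual, econtext))
			return slot;		/* (abridged: projection omitted) */

		ResetExprContext(econtext);
	}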

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make_ctags: use -I option to ignore pg_node_attr macro

2023-02-06 Thread Fujii Masao




On 2022/10/19 13:25, Tatsuo Ishii wrote:

Thanks. the v6 patch pushed to master branch.


Since this commit, make_etags has started failing to generate
tags files with the following error messages, on my MacOS.

$ src/tools/make_etags
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ctags:
 illegal option -- e
usage: ctags [-BFadtuwvx] [-f tagsfile] file ...
sort: No such file or directory


On my macOS, a non-Exuberant ctags is installed that doesn't support
the -e option. But the commit changed make_etags so that it always
calls ctags with the -e option via make_ctags. This seems to be the cause
of the above failure.

IS_EXUBERANT=""
ctags --version 2>&1 | grep Exuberant && IS_EXUBERANT="Y"

make_ctags has the above code and seems to support non-Exuberant ctags.
If so, should we revert the changes the commit made to make_etags and
make make_etags work with that ctags? Or should we support
only Exuberant-type ctags (btw, I'm ok with this) and get rid of
code like the above?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] Add pretty-printed XML output option

2023-02-06 Thread Jim Jones

Hi,

The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing 
the regression tests with the error:


ERROR:  unsupported XML feature
DETAIL:  This functionality requires the server to be built with libxml 
support.


Is there anything I can do to enable libxml to run my regression tests?

v3 adds a missing xmlFree call.

Best, Jim
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v3 1/3] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml|  34 ++
 src/backend/utils/adt/xml.c   |  30 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/regress/expected/xml.out | 107 ++
 src/test/regress/sql/xml.sql  |  34 ++
 5 files changed, 208 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	xmlDocPtr	doc;
+	xmlChar    *buf = NULL;
+	text	   *arg = PG_GETARG_TEXT_PP(0);
+
+	doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+	/**
+	 * xmlDocDumpFormatMemory (
+	 *   xmlDocPtr doc,  # the XML document.
+	 *   xmlChar ** buf, # buffer where the formatted XML document will be stored.
+	 *   int *size,  # this could store the size of the created buffer
+	 *               but as we do not need it, we can leave it NULL.
+	 *   int format) # 1 = node indenting.
+	 */
+	xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+
+	xmlFreeDoc(doc);
+	PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+	NO_XML_SUPPORT();
+	return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | 
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty(' Belgian Waffles $15.95
+ Two of our famous Belgian Waffles with plenty of real maple syrup
+650')::xml;
+xmlpretty 
+-

Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-02-06 Thread Robert Haas
On Wed, Feb 1, 2023 at 3:37 PM Jacob Champion  wrote:
> > I'm not an expert on this stuff, but to me that feels like a weak and
> > fuzzy concept. If the client is going to tell the server something,
> > I'd much rather have it say something like "i'm proxying a request
> > from my local user rhaas, who authenticated using such and such a
> > method and connected from such and such an IP yadda yadda". That feels
> > to me like really clear communication that the server can then be
> > configured to something about via pg_hba.conf or similar. Saying "use
> > in-band authentication only", to me, feels much murkier. As the
> > recipient of that message, I don't know exactly what to do about it,
> > and it feels like whatever heuristic I adopt might end up being wrong
> > and something bad happens anyway.
>
> Is it maybe just a matter of terminology? If a proxy tells the server,
> "This user is logging in. Here's the password I have for them. DO NOT
> authenticate using anything else," and the HBA says to use ident auth
> for that user, then the server fails the connection. That's what I
> mean by in-band -- the proxy says, "here are the credentials for this
> connection." That's it.

I don't think that's quite the right concept. It seems to me that the
client is responsible for informing the server of what the situation
is, and the server is responsible for deciding whether to allow the
connection. In your scenario, the client is not only communicating
information ("here's the password I have got") but also making demands
on the server ("DO NOT authenticate using anything else"). I like the
first part fine, but not the second part.

Consider the scenario where somebody wants to allow a connection that
is proxied and does not require a password. For example, maybe I have
a group of three machines that all mutually trust each other and the
network is locked down so that we need not worry about IP spoofing or
whatever. Just to be doubly sure, they all have SSL certificates so that
they can verify that an incoming connection is from one of the other
trusted machines. I, as the administrator, want to configure things so
that each machine will proxy connections to the others as long as
local user = remote user. When the remote machine receives the
connection, it can trust that the request is legitimate provided that
the SSL certificate is successfully verified.

The way I think this should work is, first, on each machine, in the
proxy configuration, there should be a rule that says "only proxy
connections where local user = remote user" (and any other rules I
want to enforce). Second, in the HBA configuration, there should be a
rule that says "if somebody is trying to proxy a connection, it has to
be for one of these IPs and they have to authenticate using an SSL
certificate". In this kind of scenario, the client has no business
demanding that the server authenticate using the password rather than
anything else. The server, not the client, is in charge of deciding
which connections to accept; the client's job is only to decide which
connections to proxy. And the human being is responsible for making
sure that the combination of those two things implements the intended
security policy.

> Agreed. The danger from my end is, I'm trained on configuration
> formats that have infinite bells and whistles. I don't really want to
> go too crazy with it.

Yeah. If I remember my math well enough, the time required to
implement infinite bells and whistles will also be infinite, and as a
wise man once said, real artists ship.

It does seem like a good idea, if we can, to make the configuration
file format flexible enough that we can easily extend it with more
bells and whistles later if we so choose. But realistically most
people are going to have very simple configurations.

> > and that maybe
> > has something in common with our existing configuration file syntaxes.
> > But if we have to invent something new, then we can do that.
>
> Okay. Personally I'd like
> - the ability to set options globally (so filters are optional)
> - the ability to maintain many options for a specific scope (host? IP
> range?) without making my config lines grow without bound
> - the ability to audit a configuration without trusting its comments
>
> But getting all of my wishlist into a sane configuration format that
> handles all the use cases is the tricky part. I'll think about it.

Nobody seemed too keen on my proposal of a bunch of tab-separated
fields; maybe we're all traumatized from pg_hba.conf and should look
for something more complex with a real parser. I thought that
tab-separated fields might be good enough and simple to implement, but
it doesn't matter how simple it is to implement if nobody likes it. We
could do something that looks more like a series of if-then rules,
e.g.

target-host 127.0.0.0/8 => reject
authentication-method scram => accept
reject

But it's only a hop, skip and a jump from there to something that
looks an awful lot

Re: when the startup process doesn't (logging startup delays)

2023-02-06 Thread Robert Haas
On Mon, Feb 6, 2023 at 11:15 AM Tom Lane  wrote:
> Robert Haas  writes:
> > On Mon, Feb 6, 2023 at 11:07 AM Tom Lane  wrote:
> >> Umm ... is this really the sort of patch to be committing on a
> >> release wrap day?
>
> > Oh, shoot, I wasn't thinking about that. Would you like me to revert
> > it in v15 for now?
>
> Yeah, seems like the safest course.  I wouldn't object to it going in
> after the release is over, but right now there's zero margin for error.

Done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add pretty-printed XML output option

2023-02-06 Thread Tom Lane
Jim Jones  writes:
> The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing 
> the regression tests with the error:

> ERROR:  unsupported XML feature
> DETAIL:  This functionality requires the server to be built with libxml 
> support.

> Is there anything I can do to enable libxml to run my regression tests?

Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.

regards, tom lane




Re: Parallelize correlated subqueries that execute within each worker

2023-02-06 Thread Antonin Houska
James Coleman  wrote:

> On Sat, Jan 21, 2023 at 10:07 PM James Coleman  wrote:
> > ...
> > While working through Tomas's comment about a conditional in the
> > max_parallel_hazard_waker being guaranteed true I realized that in the
> > current version of the patch the safe_param_ids tracking in
> > is_parallel_safe isn't actually needed any longer. That seemed
> > suspicious, and so I started digging, and I found out that in the
> > current approach all of the tests pass with only the changes in
> > clauses.c. I don't believe that the other changes aren't needed;
> > rather I believe there isn't yet a test case exercising them, but I
> > realize that means I can't prove they're needed. I spent some time
> > poking at this, but at least with my current level of imagination I
> > haven't stumbled across a query that would exercise these checks.
> 
> I played with this a good bit more yesterday, I'm now a good bit more
> confident this is correct. I've cleaned up the patch; see attached for
> v7.
> 
> Here's some of my thought process:
> The comments in src/include/nodes/pathnodes.h:2953 tell us that
> PARAM_EXEC params are used to pass values around from one plan node to
> another in the following ways:
> 1. Values down into subqueries (for outer references in subqueries)
> 2. Up out of subqueries (for the results of a subplan)
> 3. From a NestLoop plan node into its inner relation (when the inner
> scan is parameterized with values from the outer relation)
> 
> Case (2) is already known to be safe (we currently add these params to
> safe_param_ids in max_parallel_hazard_walker when we encounter a
> SubPlan node).
> 
> I also believe case (3) is already handled. We don't build partial
> paths for joins when joinrel->lateral_relids is non-empty, and join
> order calculations already require that parameterization here go the
> correct way (i.e., inner depends on outer rather than the other way
> around).
> 
> That leaves us with only case (1) to consider in this patch. Another
> way of saying this is that this is really the only thing the
> safe_param_ids tracking is guarding against. For params passed down
> into subqueries we can further distinguish between init plans and
> "regular" subplans. We already know that params from init plans are
> safe (at the right level). So we're concerned here with a way to know
> if the params passed to subplans are safe. We already track required
> rels in ParamPathInfo, so it's fairly simple to do this test.
> 
> Which this patch we do in fact now see (as expected) rels with
> non-empty lateral_relids showing up in generate_[useful_]gather_paths.
> And the partial paths can now have non-empty required outer rels. I'm
> not able to come up with a plan that would actually be caught by those
> checks; I theorize that because of the few places we actually call
> generate_[useful_]gather_paths we are in practice already excluding
> those, but for now I've left these as a conditional rather than an
> assertion because it seems like the kind of guard we'd want to ensure
> those methods are safe.

Maybe we can later (in separate patches) relax the restrictions imposed on
partial path creation a little bit, so that more parameterized partial paths
are created.

One particular case that should be rejected by your checks is a partial index
path, which can be parameterized, but I couldn't construct a query that makes
your checks fire. Maybe the reason is that a parameterized index path is
mostly used on the inner side of a NL join, however no partial path can be
used there. (The problem is that each worker evaluating the NL join would only
see a subset of the inner relation, which would lead to incorrect results.)

So I'd also choose conditions rather than assert statements.


Following are my (minor) findings:

In generate_gather_paths() you added this test

/*
 * Delay gather path creation until the level in the join tree where all
 * params used in a worker are generated within that worker.
 */
if (!bms_is_subset(required_outer, rel->relids))
return;

but I'm not sure if required_outer can contain anything of rel->relids. How
about using bms_is_empty(required_outer), or even this?

if (required_outer)
return;

Similarly,

/* We can't pass params to workers. */
if (!bms_is_subset(PATH_REQ_OUTER(cheapest_partial_path), rel->relids))

might look like

if (!bms_is_empty(PATH_REQ_OUTER(cheapest_partial_path)))

or

if (PATH_REQ_OUTER(cheapest_partial_path))

In particular, build_index_paths() does the following when setting
outer_relids (which eventually becomes path->param_info->ppi_req_outer):

/* Enforce convention that outer_relids is exactly NULL if empty */
if (bms_is_empty(outer_relids))
outer_relids = NULL;


Another question is whether in this call

simple_gather_path = (Path *)
create_gather_path(root, rel, ch

Re: undersized unions

2023-02-06 Thread Robert Haas
On Sun, Feb 5, 2023 at 6:28 AM Andres Freund  wrote:
> On the other hand, it also just seems risky from a code writing perspective. 
> It's not immediately obvious that it'd be unsafe to create an on-stack Numeric 
> by assigning *ptr. But it is.

Well, I think that is pretty obvious: we have lots of things that are
essentially variable-length types, and you can't put any of them on
the stack.

But I do also think that the Numeric situation is messier than some
others we have got, and that's partly my fault, and it would be nice
to make it better.

I do not really know exactly how to do that, though. Our usual pattern
is to just have a struct and end with a variable-length array, or
alternatively add a comment that says "other stuff follows!" at the end of
the struct definition, without doing anything that C knows about at
all. But here it's more complicated: there's a uint16 value for sure,
and then maybe an int16 value, and then some number of NumericDigit
values. That "maybe an int16 value" part is not something that C has a
built-in way of representing, to my knowledge, which is why we end up
with this hackish thing.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: undersized unions

2023-02-06 Thread Tom Lane
Robert Haas  writes:
> I do not really know exactly how to do that, though. Our usual pattern
> is to just have a struct and end with a variable-length array, or
> alternatively add a comment says "other stuff follows!" at the end of
> the struct definition, without doing anything that C knows about at
> all. But here it's more complicated: there's a uint16 value for sure,
> and then maybe an int16 value, and then some number of NumericDigit
> values. That "maybe an int16 value" part is not something that C has a
> built-in way of representing, to my knowledge, which is why we end up
> with this hackish thing.

If we were willing to blow off the optimizations for NBASE < 10000,
and say that NumericDigit is always int16, then it would be possible
to represent all of these variants as plain array-of-int16, with
some conventions about which indexes are what (and some casting
between int16 and uint16).
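
A minimal sketch of what that flat representation could look like (purely
illustrative; the names and index conventions are invented here, not proposed
code), assuming NumericDigit is always int16:

#include <stdint.h>

typedef int16_t NumericDigit;

/*
 * Everything after the varlena header is one int16 array: words[0] is the
 * header word; in the "long" format words[1] is the weight and digits start
 * at words[2], while in the "short" format digits start at words[1].
 */
typedef struct NumericFlat
{
	int32_t		vl_len_;		/* varlena header */
	int16_t		words[];		/* header, optional weight, then digits */
} NumericFlat;

#define NUMFLAT_HEADER(n)	((uint16_t) (n)->words[0])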

I am, however, very dubious that Andres is correct that there's a
problem here.  Given that two of the variants of union NumericChoice
are structs ending with a flexible array, any compiler that thinks
it knows the size of the union precisely is broken.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Andrew Dunstan


On 2023-02-06 Mo 10:36, Robert Haas wrote:

On Mon, Feb 6, 2023 at 10:21 AM Andrew Dunstan  wrote:

Good suggestions. 1 and 3 seem fairly straightforward. I'll start on those, and 
look into 2.

Thanks!



Here's a quick patch for 1 and 3. Would also need to adjust the docco.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Peter Eisentraut

On 02.02.23 07:40, Tom Lane wrote:

Noah Misch  writes:

Regarding the concern about a pre-receive hook blocking an emergency push, the
hook could approve every push where a string like "pgindent: no" appears in a
commit message within the push.  You'd still want to make the tree clean
sometime the same week or so.  It's cheap to provide a break-glass like that.


I think the real question here is whether we can get all (or at least
a solid majority of) committers to accept such draconian constraints.
I'd buy into it, and evidently so would you, but I can't help noting
that less than a quarter of active committers have bothered to
comment on this thread.  I suspect the other three-quarters would
be quite annoyed if we tried to institute such requirements.  That's
not manpower we can afford to drive away.


I have some concerns about this.

First, as a matter of principle, it would introduce another level of 
gatekeeping power.  Right now, the committers are as a group in charge 
of what gets into the tree.  Adding commit hooks that are installed 
somewhere(?) by someone(?) and can only be seen by some(?) would upset 
that.  If we were using something like github or gitlab (not suggesting 
that, but for illustration), then you could put this kind of thing under 
.github/ or similar and then it would be under the same control as the 
source code itself.


Also, pgindent takes tens of seconds to run, so hooking that into the 
git push process would slow this down quite a bit.  And maybe we want to 
add pgperltidy and so on, where would this lead?  If somehow your local 
indenting doesn't give you the "correct" result for some reason, you 
might sit there for minutes and minutes trying to fix and push and fix 
and push.


Then, consider the typedefs issue.  If you add a typedef but don't add 
it to the typedefs list but otherwise pgindent your code perfectly, the 
push would be accepted.  If then later someone updates the typedefs 
list, perhaps from the build farm, it would then reject the indentation 
of your previously committed code, thus making it their problem.


I think a better way to address these issues would be making this into a 
test suite, so that you can run some command that checks "is everything 
indented correctly".  Then you can run this locally, on the build farm, 
in the cfbot etc. in a uniform way and apply the existing 
blaming/encouragement processes like for any other test failure.






Re: Understanding years part of Interval

2023-02-06 Thread Marcos Pegoraro
On Mon, Feb 6, 2023 at 10:59 AM, Erik Wienhold  wrote:

> > On 06/02/2023 12:20 CET Marcos Pegoraro  wrote:
> >
> > I was just playing with some random timestamps for a week, for a month,
> > for a year ...
> >
> > select distinct current_date+((random()::numeric)||'month')::interval
> from generate_series(1,100) order by 1;
> > It's with a distinct clause because if you change that 'month' for a 'year'
> > it'll return only 12 rows instead of 100. So, why does the years part of an
> > interval work differently than any other?
> >
> > select '1.01 week'::interval; --> 0 years 0 mons 7 days 1 hours 40 mins
> 48.00 secs
> > select '1.01 month'::interval; --> 0 years 1 mons 0 days 7 hours 12 mins
> 0.00 secs
> > select '1.01 year'::interval; --> 1 years 0 mons 0 days 0 hours 0 mins
> 0.00 secs
>
> Explained in
> https://www.postgresql.org/docs/15/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
> :
>
> Field values can have fractional parts: for example, '1.5 weeks' or
> '01:02:03.45'. However, because interval internally stores only
> three integer units (months, days, microseconds), fractional units
> must be spilled to smaller units. Fractional parts of units greater
> than months are rounded to be an integer number of months, e.g.
> '1.5 years' becomes '1 year 6 mons'. Fractional parts of weeks and
> days are computed to be an integer number of days and microseconds,
> assuming 30 days per month and 24 hours per day, e.g., '1.75
> months'
> becomes 1 mon 22 days 12:00:00. Only seconds will ever be shown as
> fractional on output.
>
> Internally interval values are stored as months, days, and
> microseconds. This is done because the number of days in a month
> varies, and a day can have 23 or 25 hours if a daylight savings
> time
> adjustment is involved.
>
> I've sent this message initially to -general, and Erik told me it's
documented, so it's better that -hackers help me understand why it's done
that way.

select '1 year'::interval = '1.05 year'::interval -->true ?
I cannot agree that this select returns true.
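
For context, the "three integer units" the quoted documentation refers to are
stored roughly like this (paraphrased from src/include/datatype/timestamp.h),
which is why a fractional year has nowhere to go except by being converted to
whole months:

typedef int64 TimeOffset;

typedef struct
{
	TimeOffset	time;			/* all time units other than days, months and
								 * years, in microseconds */
	int32		day;			/* days, after time for alignment */
	int32		month;			/* months and years, after time for alignment */
} Interval;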


Re: Logical Replication Custom Column Expression

2023-02-06 Thread Stavros Koureas
>> And yes, probably you need to change the way you reply to email on
>> this list. Top-posting is generally avoided. See
>> https://wiki.postgresql.org/wiki/Mailing_Lists.

>Thanks for bringing this into the discussion :)

Thinking more about this topic these days: using the subscriber name is not a
bad idea, although it would make sense to be able to supply your own value at
the subscriber level, for example an integer.
Having a custom integer value is better because that value will later
participate in all the joins between these tables, and joining on an integer
should be quicker than joining on a character varying column (plus the rest
of the columns).
In addition, discussing with other people, including on Stack
Overflow/DBA Stack Exchange, I have found that others consider this a great
enhancement for analytical purposes.

On Wed, Nov 30, 2022 at 10:39 AM, Stavros Koureas <koureasstav...@gmail.com> wrote:

>
>
> On Tue, Nov 29, 2022 at 3:27 PM, Ashutosh Bapat <ashutosh.bapat@gmail.com> wrote:
> > That would be too restrictive - not necessarily in your application
> > but generally. There could be some tables where consolidating rows
> > with same PK from different publishers into a single row in subscriber
> > would be desirable. I think we need to enable the property for every
> > subscriber that intends to add publisher column to the desired and
> > subscribed tables. But there should be another option per table which
> > will indicate that receiver should add publisher when INSERTING row to
> > that table.
>
> So we are discussing the scope level of this property, if this property
> will be implemented on subscriber level or on subscriber table.
> In that case I am not sure how this will be implemented as currently
> postgres subscribers can have multiple tables streamed from a single
> publisher.
> In that case we may have an additional syntax on subscriber, for example:
>
> CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432
> user=postgres password=XX dbname=publisher1' PUBLICATION pub1 with
> (enabled = false, create_slot = false, slot_name = NONE, tables =
> {tableA:union, tableB:none, });
>
> Something like this?
>
> > And yes, probably you need to change the way you reply to email on
> > this list. Top-posting is generally avoided. See
> > https://wiki.postgresql.org/wiki/Mailing_Lists.
>
> Thanks for bringing this into the discussion :)
>


Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Andrew Dunstan


On 2023-02-06 Mo 12:03, Andrew Dunstan wrote:



On 2023-02-06 Mo 10:36, Robert Haas wrote:

On Mon, Feb 6, 2023 at 10:21 AM Andrew Dunstan  wrote:

Good suggestions. 1 and 3 seem fairly straightforward. I'll start on those, and 
look into 2.

Thanks!



Here's a quick patch for 1 and 3. Would also need to adjust the docco.




This time with patch.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 56640e576a..ca329747a4 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -23,12 +23,14 @@ my $devnull = File::Spec->devnull;
 
 my ($typedefs_file, $typedef_str, $code_base,
 	@excludes,  $indent,  $build,
-	$show_diff, $silent_diff, $help);
+	$show_diff, $silent_diff, $help,
+	@commits,);
 
 $help = 0;
 
 my %options = (
 	"help"   => \$help,
+	"commit=s"   => \@commits,
 	"typedefs=s" => \$typedefs_file,
 	"list-of-typedefs=s" => \$typedef_str,
 	"code-base=s"=> \$code_base,
@@ -44,6 +46,9 @@ usage() if $help;
 usage("Cannot have both --silent-diff and --show-diff")
   if $silent_diff && $show_diff;
 
+usage("Cannot use --commit with --code-base or command line file list")
+  if (@commits && ($code_base || @ARGV));
+
 run_build($code_base) if ($build);
 
 # command line option wins, then environment (which is how --build sets it) ,
@@ -388,6 +393,7 @@ Usage:
 pgindent [OPTION]... [FILE]...
 Options:
 	--help  show this message and quit
+--commit=gitref use files modified by the named commit
 	--typedefs=FILE file containing a list of typedefs
 	--list-of-typedefs=STR  string containing typedefs, space separated
 	--code-base=DIR path to the base of PostgreSQL source code
@@ -396,7 +402,7 @@ Options:
 	--build build the pg_bsd_indent program
 	--show-diff show the changes that would be made
 	--silent-diff   exit with status 2 if any changes would be made
-The --excludes option can be given more than once.
+The --excludes and --commit options can be given more than once.
 EOF
 	if ($help)
 	{
@@ -412,27 +418,38 @@ EOF
 
 # main
 
-# get the list of files under code base, if it's set
-File::Find::find(
-	{
-		wanted => sub {
-			my ($dev, $ino, $mode, $nlink, $uid, $gid);
-			(($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))
-			  && -f _
-			  && /^.*\.[ch]\z/s
-			  && push(@files, $File::Find::name);
-		}
-	},
-	$code_base) if $code_base;
-
 $filtered_typedefs_fh = load_typedefs();
 
 check_indent();
 
-# any non-option arguments are files to be processed
-push(@files, @ARGV);
+build_clean($code_base) if $build;
 
-# the exclude list applies to command line arguments as well as found files
+my $wanted = sub
+{
+	my ($dev, $ino, $mode, $nlink, $uid, $gid);
+	(($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))
+	  && -f _
+	  && /^.*\.[ch]\z/s
+	  && push(@files, $File::Find::name);
+};
+
+# get the list of files under code base, if it's set
+File::Find::find({wanted => $wanted }, $code_base) if $code_base;
+
+# any non-option arguments are files or directories to be processed
+File::Find::find({wanted => $wanted}, @ARGV) if @ARGV;
+
+# process named commits by comparing each with their immediate ancestor
+foreach my $commit (@commits)
+{
+	my $prev="$commit~";
+	my @affected=`git diff-tree --no-commit-id --name-only -r $commit $prev`;
+	die "git error" if $?;
+	chomp(@affected);
+	push(@files,@affected);
+}
+
+# remove excluded files from the file list
 process_exclude();
 
 foreach my $source_filename (@files)
@@ -481,6 +498,4 @@ foreach my $source_filename (@files)
 	}
 }
 
-build_clean($code_base) if $build;
-
 exit 0;


Re: [PATCH] Add pretty-printed XML output option

2023-02-06 Thread Jim Jones

On 06.02.23 17:23, Tom Lane wrote:

Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.


Thanks a lot! It seems to do the trick.

xml_1.out updated in v4.

Best, Jim
From ced9fccddc033de98709a6e93dc6530ce68149db Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 2 Feb 2023 21:27:16 +0100
Subject: [PATCH v4 1/4] Add pretty-printed XML output option

This small patch introduces a XML pretty print function.
It basically takes advantage of the indentation feature
of xmlDocDumpFormatMemory from libxml2 to format XML strings.
---
 doc/src/sgml/func.sgml|  34 ++
 src/backend/utils/adt/xml.c   |  30 +
 src/include/catalog/pg_proc.dat   |   3 +
 src/test/regress/expected/xml.out | 107 ++
 src/test/regress/sql/xml.sql  |  34 ++
 5 files changed, 208 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..e8b5e581f0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14293,6 +14293,40 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 ]]>
 

+   
+ 
+xmlpretty
+
+ 
+ xmlpretty
+ 
+ 
+
+xmlpretty ( xml ) xml
+
+ 
+ 
+ Converts the given XML value to pretty-printed, indented text.
+ 
+ 
+ 
+ Example:
+ 
+ 
+   
+   

 

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..6409133137 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -473,6 +473,36 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf)
 }
 #endif
 
+Datum
+xmlpretty(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+xmlDocPtr  doc;
+xmlChar*buf = NULL;
+text   *arg = PG_GETARG_TEXT_PP(0);
+
+doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
+
+/**
+* xmlDocDumpFormatMemory (
+*   xmlDocPtr doc,  # the XML document.
+*   xmlChar ** buf, # buffer where the formatted XML document will be stored.
+*   int *size,  # this could store the size of the created buffer
+* but as we do not need it, we can leave it NULL.
+*   int format) # 1 = node indenting.
+*/
+xmlDocDumpFormatMemory(doc, &buf, NULL, 1);
+
+xmlFreeDoc(doc);
+PG_RETURN_XML_P(cstring_to_xmltype((char*)buf));
+
+#else
+NO_XML_SUPPORT();
+return 0;
+#endif
+}
+
 
 Datum
 xmlcomment(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..3224dc3e76 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8842,6 +8842,9 @@
 { oid => '3053', descr => 'determine if a string is well formed XML content',
   proname => 'xml_is_well_formed_content', prorettype => 'bool',
   proargtypes => 'text', prosrc => 'xml_is_well_formed_content' },
+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },
 
 # json
 { oid => '321', descr => 'I/O',
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 3c357a9c7e..98a338ad8d 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1599,3 +1599,110 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
   | 
 (1 row)
 
+-- XML pretty print: single line XML string
+SELECT xmlpretty('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650')::xml;
+xmlpretty 
+--
+ +
+ +
+ Belgian Waffles+
+ $5.95+
+ Two of our famous Belgian Waffles with plenty of real maple syrup+
+ 650+
+  +
++
+ 
+(1 row)
+
+-- XML pretty print: XML string with space, tabs and newline between nodes
+SELECT xmlpretty(' Belgian Waffles $15.95
+ Two of our famous Belgian Waffles with plenty of real maple syrup
+650')::xml;
+xmlpretty 
+--
+   

2023-02-09 release announcement draft

2023-02-06 Thread Jonathan S. Katz

Hi,

Attached is a draft of the announcement for the 2023-02-09 update release.

Please review and provide corrections, notable omissions, and 
suggestions no later than 2023-02-09 0:00 AoE.


Thanks!

Jonathan
The PostgreSQL Global Development Group has released an update to all supported
versions of PostgreSQL, including 15.2, 14.7, 13.10, 12.14, and 11.19.
This release fixes over 60 bugs reported over the last several months.

For the full list of changes, please review the
[release notes](https://www.postgresql.org/docs/release/).

Bug Fixes and Improvements
--
 
This update fixes over 60 bugs that were reported in the last several months.
The issues listed below affect PostgreSQL 15. Some of these issues may also
affect other supported versions of PostgreSQL.

Included in this release:

* Fix for partitioned tables to correctly update
[`GENERATED`](https://www.postgresql.org/docs/current/ddl-generated-columns.html)
columns in child tables if the `GENERATED` column does not exist in the parent
table or the child generated column has different dependencies than the parent.
* Several fixes for the 
[`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html)
command.
* Allow a
[`WITH RECURSIVE ... 
CYCLE`](https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-CYCLE)
query to access its output column.
* Fix an issue with bulk insertions on foreign tables that could lead to logical
inconsistencies, for example, a `BEFORE ROW` trigger may not process rows that
should be available.
* Reject uses of undefined variables in
[`jsonpath`](https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH)
existence checks.
* Fix for [`jsonb` 
subscripting](https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTING)
to handle very large subscript values.
* Honor updated values of `checkpoint_completion_target` on reload.
* Log the correct ending timestamp in `recovery_target_xid` mode.
* Fix issue to allow longer column lists when using logical replication.
* Prevent "wrong tuple length" failure at the end of
[`VACUUM`](https://www.postgresql.org/docs/current/sql-vacuum.html).
* Avoid an immediate commit after
[`ANALYZE`](https://www.postgresql.org/docs/current/sql-analyze.html) when using
query pipelining.
* Several fixes to the query planner, including one that provides more
opportunities for using
[memoization with partitionwise 
joins](https://www.postgresql.org/docs/current/runtime-config-query.html).
* Fix for statistics collection to correctly handle when a relation changes
(e.g. a table is converted to a view).
* Ensure [full text 
search](https://www.postgresql.org/docs/current/textsearch.html)
queries can be cancelled while performing phrase matches.
* Fix deadlock between [`DROP 
DATABASE`](https://www.postgresql.org/docs/current/sql-dropdatabase.html)
and logical replication worker process.
* Fix small session-lifespan memory leak when
[`CREATE 
SUBSCRIPTION`](https://www.postgresql.org/docs/current/sql-createsubscription.html)
fails its connection attempt.
* Performance improvement for replicas with
[`hot_standby`](https://www.postgresql.org/docs/current/hot-standby.html)
enabled that are processing `SELECT` queries.
* Several fixes for logical decoding that improve its stability and bloat
handling.
* Fix the default logical replication plug-in, `pgoutput`, to not send columns
that are not listed in a table's replication
[column 
list](https://www.postgresql.org/docs/current/logical-replication-col-lists.html).
* Fix possible corruption of very large tablespace map files in
[`pg_basebackup`](https://www.postgresql.org/docs/current/app-pgbasebackup.html).
* Remove a harmless warning from
[`pg_dump`](https://www.postgresql.org/docs/current/app-pgdump.html) in
`--if-exists` mode when the
[`public` 
schema](https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PUBLIC)
has a non-default owner.
* Fix the [`psql`](https://www.postgresql.org/docs/current/app-psql.html)
commands `\sf` and `\ef` to handle SQL-language functions that have
[SQL-standard function 
bodies](https://www.postgresql.org/docs/current/sql-createprocedure.html)
(i.e. `BEGIN ATOMIC`).
* Fix tab completion of `ALTER FUNCTION/PROCEDURE/ROUTINE ... SET SCHEMA`.
* Update the 
[`pageinspect`](https://www.postgresql.org/docs/current/pageinspect.html)
extension to mark its disk-accessing functions as `PARALLEL RESTRICTED`.
* Fix the [`seg`](https://www.postgresql.org/docs/current/seg.html) extension to
not crash or print garbage if an input number has more than 127 digits.

This release also updates time zone data files to tzdata release 2022g for
DST law changes in Greenland and Mexico, plus historical corrections for
northern Canada, Colombia, and Singapore. Notably, a new timezone,
America/Ciudad_Juarez, has been split off from America/Ojinaga.

For the full list of changes available, please review the
[release notes](https://www.postgresql.org/do

Re: undersized unions

2023-02-06 Thread Andres Freund
Hi

On 2023-02-06 11:42:57 -0500, Robert Haas wrote:
> On Sun, Feb 5, 2023 at 6:28 AM Andres Freund  wrote:
> > On the other hand, it also just seems risky from a code writing 
> > perspective. It's not immediately obvious that it'd be unsafe to create an 
> > on-stack Numeric by assigning *ptr. But it is.
>
> Well, I think that is pretty obvious: we have lots of things that are
> essentially variable-length types, and you can't put any of them on
> the stack.

There's a difference between a stack copy not containing all the data, and a
stack copy triggering undefined behaviour. But I agree, it's not something
that'll commonly endanger us.


> But I do also think that the Numeric situation is messier than some
> others we have got, and that's partly my fault, and it would be nice
> to make it better.
>
> I do not really know exactly how to do that, though. Our usual pattern
> is to just have a struct and end with a variable-length array, or
> alternatively add a comment that says "other stuff follows!" at the end of
> the struct definition, without doing anything that C knows about at
> all. But here it's more complicated: there's a uint16 value for sure,
> and then maybe an int16 value, and then some number of NumericDigit
> values. That "maybe an int16 value" part is not something that C has a
> built-in way of representing, to my knowledge, which is why we end up
> with this hackish thing.

Perhaps something like

typedef struct NumericBase
{
	uint16		n_header;
} NumericBase;

typedef struct NumericData
{
	int32		vl_len_;		/* varlena header (do not touch directly!) */
	NumericBase data;
} NumericData;

/* subclass of NumericBase, needs to start in a compatible way */
typedef struct NumericLong
{
	uint16		n_sign_dscale;	/* Sign + display scale */
	int16		n_weight;		/* Weight of 1st digit  */
} NumericLong;

/* subclass of NumericBase, needs to start in a compatible way */
struct NumericShort
{
	uint16		n_header;		/* Sign + display scale + weight */
	NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER];	/* Digits */
};

Macros that e.g. access n_long would need to cast, before they're able to
access n_long. So we'd end up with something like

#define NUMERIC_SHORT(n)  ((NumericShort *)&((n)->data))
#define NUMERIC_LONG(n)  ((NumericLong *)&((n)->data))

#define NUMERIC_WEIGHT(n)   (NUMERIC_HEADER_IS_SHORT((n)) ? \
((NUMERIC_SHORT(n)->n_header & NUMERIC_SHORT_WEIGHT_SIGN_MASK ? \
~NUMERIC_SHORT_WEIGHT_MASK : 0) \
 | (NUMERIC_SHORT(n)->n_header & NUMERIC_SHORT_WEIGHT_MASK)) \
: (NUMERIC_LONG(n)->n_weight))

Although I'd actually be tempted to rip out all the casts but NUMERIC_LONG()
in this case, because it's all just accessing the base n_header anyway.

Greetings,

Andres Freund




Re: undersized unions

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 11:55:40 -0500, Tom Lane wrote:
> I am, however, very dubious that Andres is correct that there's a
> problem here.  Given that two of the variants of union NumericChoice
> are structs ending with a flexible array, any compiler that thinks
> it knows the size of the union precisely is broken.

The compiler just complains about the minimum size of the union, which is
  Max(offsetof(NumericShort, n_data), offsetof(NumericLong, n_data))
IOW, our trickery with flexible arrays would allow us to allocate just 8 bytes
for a NumericData, but not just 6.

Flexible arrays allow the compiler to understand the variable size, but we
don't use it for all variability. Hence the warnings.
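
To make the arithmetic concrete, here is a tiny standalone illustration (not
PostgreSQL code) of why the union's minimum size ends up larger than the
smallest real object:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef int16_t Digit;

struct ShortForm { uint16_t header; Digit digits[]; };
struct LongForm  { uint16_t header; int16_t weight; Digit digits[]; };

union Choice
{
	uint16_t	header;
	struct ShortForm s;
	struct LongForm	l;
};

int
main(void)
{
	/*
	 * The flexible arrays contribute nothing to sizeof, so the union's size
	 * is Max(offsetof(ShortForm, digits), offsetof(LongForm, digits)), i.e.
	 * 4 on common ABIs, even though a digit-less short-form value needs only
	 * 2 bytes.  Prepend a 4-byte varlena header and you get the "8 bytes but
	 * not 6" situation described above.
	 */
	printf("offsetof(ShortForm, digits) = %zu\n", offsetof(struct ShortForm, digits));
	printf("offsetof(LongForm, digits)  = %zu\n", offsetof(struct LongForm, digits));
	printf("sizeof(union Choice)        = %zu\n", sizeof(union Choice));
	return 0;
}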

Greetings,

Andres Freund




Re: GUCs to control abbreviated sort keys

2023-02-06 Thread Peter Geoghegan
On Fri, Jan 27, 2023 at 12:34 PM Jeff Davis  wrote:
> It is non-deterministic, but I tried with two generated files, and got
> similar results.

Jeff and I coordinated off-list. It turned out that the
nondeterministic nature of the program to generate test data was
behind my initial inability to recreate Jeff's results. Once Jeff
provided me with the exact data that he saw the problem with, I was
able to recreate the problematic case for abbreviated keys.

It turns out that this was due to aborting abbreviation way too late
in the process. It would happen relatively late in the process, when
more than 50% of all tuples had already had abbreviations generated by
ICU. This was a marginal case for abbreviated keys, which is precisely
why it only happened this long into the process. That factor is also
likely why I couldn't recreate the problem at first, even though I had
test data that was substantially the same as the data required to show
the problem.

Attached patch fixes the issue. It teaches varstr_abbrev_abort to do
something similar to every other abbreviated keys abort function: stop
estimating cardinality entirely (give up on giving up) once there are
a certain number of distinct abbreviated keys, regardless of any other
factor.

This is very closely based on existing code from numeric_abbrev_abort,
though I use a cutoff of 10k rather than a cutoff of 100k. This
difference is justified by the special considerations for text, where
authoritative comparisons have further optimizations such as
strcoll caching and the memcmp equality fast path. It's also required
to actually fix the test case at hand -- 100k isn't enough to avoid
the performance issue Jeff reported.
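
In case it helps to visualize the shape of the change, here is a stripped-down
sketch of the rule described above (the type and function names are invented
for illustration; this is not the attached patch):

#include <stdbool.h>

/* invented stand-in for the sort-support bookkeeping state */
typedef struct AbbrevCardState
{
	bool		estimating;		/* still tracking distinct abbreviated keys? */
	double		abbr_card;		/* current cardinality estimate (e.g. HLL) */
} AbbrevCardState;

/* returns true if abbreviation should be aborted */
static bool
abbrev_abort_sketch(AbbrevCardState *state)
{
	if (!state->estimating)
		return false;			/* already committed to abbreviation */

	/*
	 * Once enough distinct abbreviated keys have been seen (10k per the
	 * description above), stop estimating entirely -- "give up on giving
	 * up" -- so a later low-cardinality stretch can no longer trigger a
	 * very late abort.
	 */
	if (state->abbr_card > 10000.0)
	{
		state->estimating = false;
		return false;
	}

	/* ... the pre-existing ratio-based abort heuristics would run here ... */
	return false;
}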

I think that this should be committed to HEAD only.

-- 
Peter Geoghegan


v1-0001-Abort-abbreviated-keys-for-text-less-eagerly.patch
Description: Binary data


Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-02-06 Thread Tomas Vondra
On 1/29/23 19:08, Tomas Vondra wrote:
> 
> 
> On 1/29/23 18:53, Andres Freund wrote:
>> Hi,
>>
>> On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
>>> Will do, but I'll wait for another lockup to see how frequent it
>>> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
>>> So hitting it after 15 runs might have been a bit of a luck.
>>
>> Was there a difference in how much load there was on the machine between
>> "reproduced in 15 runs" and "not reproed in 90"?  If indeed lack of barriers
>> is related to the issue, an increase in context switches could substantially
>> change the behaviour (in both directions).  More intra-process context
>> switches can amount to "probabilistic barriers" because that'll be a
>> barrier. At the same time it can make it more likely that the relatively
>> narrow window in WaitEventSetWait() is hit, or lead to larger delays
>> processing signals.
>>
> 
> No. The only thing the machine is doing is
> 
>   while /usr/bin/true; do
> make check
>   done
> 
> I can't reduce the workload further, because the "join" test is in a
> separate parallel group (I cut down parallel_schedule). I could make the
> machine busier, of course.
> 
> However, the other lockup I saw was when using serial_schedule, so I
> guess lower concurrency makes it more likely.
> 

FWIW the machine is now on run ~2700 without any further lockups :-/

Seems it was quite lucky we hit it twice in a handful of attempts.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: undersized unions

2023-02-06 Thread Robert Haas
On Mon, Feb 6, 2023 at 1:28 PM Andres Freund  wrote:
> Perhaps something like

Yeah, that'd work. You'd want a big ol' warning comment here:

> typedef struct NumericData
> {
> int32   vl_len_;/* varlena header (do not 
> touch directly!) */
> NumericBase data;
> } NumericData;

like /* actually NumericShort or NumericLong */ or something

> Although I'd actually be tempted to rip out all the casts but NUMERIC_LONG()
> in this case, because it's all just accessing the base n_header anyway.

Yeah.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-superuser subscription owners

2023-02-06 Thread Robert Haas
On Fri, Feb 3, 2023 at 3:47 AM Andres Freund  wrote:
> On 2023-02-02 09:28:03 -0500, Robert Haas wrote:
> > I don't know what you mean by this. DML doesn't confer privileges. If
> > code gets executed and runs with the replication user's credentials,
> > that could lead to privilege escalation, but just moving rows around
> > doesn't, at least not in the database sense.
>
> Executing DML ends up executing code. Think predicated/expression
> indexes, triggers, default expressions etc. If a badly written trigger
> etc can be tricked to do arbitrary code exec, an attack will be able to
> run with the privs of the run-as user.  How bad that is is influenced to
> some degree by the amount of privileges that user has.

I spent some time studying this today. I think you're right. What I'm
confused about is: why do we consider this situation even vaguely
acceptable? Isn't this basically an admission that our logical
replication security model is completely and totally broken and we
need to fix it somehow and file for a CVE number? Like, in released
branches, you can't even have a subscription owned by a non-superuser.
But any non-superuser can set a default expression or create an enable
always trigger and sure enough, if that table is replicated, the
system will run that trigger as the subscription owner, who is a
superuser. Which AFAICS means that if a non-superuser owns a table
that is part of a subscription, they can instantly hack superuser.
Which seems, uh, extremely bad. Am I missing something?

Based on other remarks you made upthread, it seems like we ought to be
doing the actual replication as the table owner, since the table owner
has to be prepared for executable code attached to the table to be
re-run on rows in the table at any time when somebody does a REINDEX.
And then, in master, where there's some provision for non-superuser
subscription owners, we maybe need to re-think the privileges required
to replicate into a table in the first place. I don't think that
having I/U/D permissions on a table is really sufficient to justify
performing those operations *as the table owner*; perhaps the check
ought to be whether you have the privileges of the table owner.
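
For concreteness, the apply-side check could look something like this sketch
(the surrounding variable names are invented; this is the idea, not a proposed
patch):

	/* before applying a remote change to rel as its owner */
	if (!has_privs_of_role(subscription_run_as_userid, table_owner_userid))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("subscription user must hold the privileges of the owner of table \"%s\"",
						RelationGetRelationName(rel))));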

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-superuser subscription owners

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 14:07:39 -0500, Robert Haas wrote:
> On Fri, Feb 3, 2023 at 3:47 AM Andres Freund  wrote:
> > On 2023-02-02 09:28:03 -0500, Robert Haas wrote:
> > > I don't know what you mean by this. DML doesn't confer privileges. If
> > > code gets executed and runs with the replication user's credentials,
> > > that could lead to privilege escalation, but just moving rows around
> > > doesn't, at least not in the database sense.
> >
> > Executing DML ends up executing code. Think predicated/expression
> > indexes, triggers, default expressions etc. If a badly written trigger
> > etc can be tricked to do arbitrary code exec, an attack will be able to
> > run with the privs of the run-as user.  How bad that is is influenced to
> > some degree by the amount of privileges that user has.
> 
> I spent some time studying this today. I think you're right. What I'm
> confused about is: why do we consider this situation even vaguely
> acceptable? Isn't this basically an admission that our logical
> replication security model is completely and totally broken and we
> need to fix it somehow and file for a CVE number? Like, in released
> branches, you can't even have a subscription owned by a non-superuser.
> But any non-superuser can set a default expression or create an enable
> always trigger and sure enough, if that table is replicated, the
> system will run that trigger as the subscription owner, who is a
> superuser. Which AFAICS means that if a non-superuser owns a table
> that is part of a subscription, they can instantly hack superuser.
> Which seems, uh, extremely bad. Am I missing something?

It's decidedly not great, yes. I don't know if it's quite a CVE type issue,
after all, the same is true for any other type of query the superuser
executes. But at the very least the documentation needs to be better, with a
big red box making sure the admin is aware of the problem.

I think we need some more fundamental ways to deal with this issue, including
but not restricted to the replication context. Some potentially relevant
discussion is in this thread:
https://postgr.es/m/75b0dbb55e9febea54c441efff8012a6d2cb5bd7.camel%40j-davis.com

I don't agree with Jeff's proposal, but I think there's some worthwhile ideas
in the idea + followups.


> And then, in master, where there's some provision for non-superuser
> subscription owners, we maybe need to re-think the privileges required
> to replicate into a table in the first place. I don't think that
> having I/U/D permissions on a table is really sufficient to justify
> performing those operations *as the table owner*; perhaps the check
> ought to be whether you have the privileges of the table owner.

Yes, I think we ought to check role membership, including non-inherited
memberships.

Greetings,

Andres Freund




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 19:51:19 +0100, Tomas Vondra wrote:
> > No. The only thing the machine is doing is
> > 
> >   while /usr/bin/true; do
> > make check
> >   done
> > 
> > I can't reduce the workload further, because the "join" test is in a
> > separate parallel group (I cut down parallel_schedule). I could make the
> > machine busier, of course.
> > 
> > However, the other lockup I saw was when using serial_schedule, so I
> > guess lower concurrency makes it more likely.
> > 
> 
> FWIW the machine is now on run ~2700 without any further lockups :-/
> 
> Seems it was quite lucky we hit it twice in a handful of attempts.

Did you cut down the workload before you reproduced it the first time, or
after? It's quite possible that it's not reproducible in isolation.

Greetings,

Andres Freund




Re: GUCs to control abbreviated sort keys

2023-02-06 Thread Thomas Munro
On Fri, Jan 27, 2023 at 12:29 PM Jeff Davis  wrote:
> I am using these GUCs for testing the various collation paths in my
> collation refactoring branch.

Speaking of testing, has anyone ever tried porting Tom's random test
program[1] to ICU?

[1] https://www.postgresql.org/message-id/31913.1458747...@sss.pgh.pa.us




Re: Pluggable toaster

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 16:38:01 +0300, Nikita Malakhov wrote:
> So we decided to reduce changes in the core to the minimum necessary
> to make it available through the hooks, because the hooks part is very
> lightweight and simple to keep rebasing onto the vanilla core.

At least I don't think we should accept such hooks. I don't think I am alone
in that.

Greetings,

Andres Freund




Re: [PATCH] Compression dictionaries for JSONB

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 16:16:41 +0100, Matthias van de Meent wrote:
> On Mon, 6 Feb 2023 at 15:03, Aleksander Alekseev
>  wrote:
> >
> > Hi,
> >
> > I see your point regarding the fact that creating dictionaries on a
> > training set is too beneficial to neglect it. Can't argue with this.
> >
> > What puzzles me though is: what prevents us from doing this on a page
> > level as suggested previously?
> 
> The complexity of page-level compression is significant, as pages are
> currently a base primitive of our persistency and consistency scheme.

+many

It's also not all a panacea performance-wise, datum-level decompression can
often be deferred much longer than page level decompression. For things like
json[b], you'd hopefully normally have some "pre-filtering" based on proper
columns, before you need to dig into the json datum.

It's also not necessarily that good, compression ratio wise. Particularly for
wider datums you're not going to be able to remove much duplication, because
there's only a handful of tuples. Consider the case of json keys - the
dictionary will often do better than page level compression, because it'll
have the common keys in the dictionary, which means the "full" keys never will
have to appear on a page, whereas page-level compression will have the keys on
it, at least once.

Of course you can use a dictionary for page-level compression too, but the
gains when it works well will often be limited, because in most OLTP usable
page-compression schemes I'm aware of, you can't compress a page all that far
down, because you need a small number of possible "compressed page sizes".


> > More similar data you compress the more space and disk I/O you save.
> > Additionally you don't have to compress/decompress the data every time
> > you access it. Everything that's in shared buffers is uncompressed.
> > Not to mention the fact that you don't care what's in pg_attribute,
> > the fact that schema may change, etc. There is a table and a
> > dictionary for this table that you refresh from time to time. Very
> > simple.
> 
> You cannot "just" refresh a dictionary used once to compress an
> object, because you need it to decompress the object too.

Right. That's what I was trying to refer to when mentioning that we might need
to add a bit of additional information to the varlena header for datums
compressed with a dictionary.
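
A sketch of what that extra information might look like (purely hypothetical
layout, not something from the patch):

#include <stdint.h>

/*
 * Hypothetical: besides the raw size that compressed varlenas already carry,
 * the datum would also record which dictionary (and which generation of it)
 * it was compressed against, so it stays decompressible after the dictionary
 * is later retrained.
 */
typedef struct DictCompressedHeader
{
	uint32_t	rawsize;		/* uncompressed datum size */
	uint32_t	dict_id;		/* which dictionary */
	uint32_t	dict_version;	/* which generation of that dictionary */
	/* compressed bytes follow */
} DictCompressedHeader;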

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-02-06 Thread Robert Haas
On Mon, Feb 6, 2023 at 2:18 PM Andres Freund  wrote:
> It's decidedly not great, yes. I don't know if it's quite a CVE type issue,
> after all, the same is true for any other type of query the superuser
> executes. But at the very least the documentation needs to be better, with a
> big red box making sure the admin is aware of the problem.

I don't think that's the same thing at all. A superuser executing a
query interactively can indeed cause all sorts of bad things to
happen, but you don't have to log in as superuser and run DML queries
on tables owned by unprivileged users, and you shouldn't.

But what we're talking about here is -- the superuser comes along and
sets up logical replication in the configuration in what seems to be
exactly the way it was intended to be used, and now any user who can
log into the subscriber node can become superuser for free whenever
they want, without the superuser doing anything at all, even logging
in. Saying it's "not ideal" seems like you're putting it in the same
category as "the cheese got moldy in the fridge" but to me it sounds
more like "the fridge exploded and the house is on fire."

If we were to document this, I assume that the warning we would add to
the documentation would look like this:

<-- begin documentation text -->
Pretty much don't ever use logical replication. In any normal
configuration, it lets every user on your system escalate to superuser
whenever they want. It is possible to make it safe, if you make sure
all the tables on the replica are owned by the superuser and none of
them have any triggers, defaults, expression indexes, or anything else
associated with them that might execute any code while replicating.
But notice that this makes logical replication pretty much useless for
one of its intended purposes, which is high availability, because if
you actually fail over, you're going to then have to change the owners
of all of those tables and apply any missing triggers, defaults,
expression indexes, or anything like that which you may want to have.
And then to fail back you're going to have to remove all of that stuff
again and once again make the tables superuser-owned. That's obviously
pretty impractical, so you probably shouldn't use logical replication
at all until we get around to fixing this. You might wonder why we
implemented a feature that can't be used in any kind of normal way
without completely and totally breaking your system security -- but
don't ask us, we don't know, either!
<-- end documentation text -->

Honestly, this makes the CREATEROLE exploit that I fixed recently in
master look like a walk in the park. Sure, it's a pain for service
providers, who might otherwise use it, but most normal users don't and
wouldn't no matter how it worked, and really are not going to care.
But people do use logical replication, and it seems to me that the
issue you're describing here means that approximately 100% of those
installations have a vulnerability allowing any local user who owns a
table or can create one to escalate to superuser. Far from being not
quite a CVE issue, that seems substantially more serious than most
things we get CVEs for.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pglz compression performance, take two

2023-02-06 Thread Tomas Vondra



On 2/6/23 03:00, Andrey Borodin wrote:
> On Sun, Feb 5, 2023 at 5:51 PM Tomas Vondra
>  wrote:
>>
>> On 2/5/23 19:36, Andrey Borodin wrote:
>>> On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin  
>>> wrote:

 Hello! Please find attached v8.
>>>
>>> I got some interesting feedback from some patch users.
>>> There was an oversight that frequently yielded results that are 1,2 or
>>> 3 bytes longer than expected.
>>> Looking closer I found that the correctness of the last 3-byte tail is
>>> checked in two places. PFA fix for this. Previously compressed data
>>> was correct, however in some cases few bytes longer than the result of
>>> current pglz implementation.
>>>
>>
>> Thanks. What were the consequences of the issue? Lower compression
>> ratio, or did we then fail to decompress the data (or would current pglz
>> implementation fail to decompress it)?
>>
> The data was decompressed fine. But extension tests (Citus's columnar
> engine) hard-coded a lot of compression ratio stuff.

OK. Not sure I'd blame the patch for these failures, as long as
the result is still correct and can be decompressed. I'm not aware of a
specification of what the compression must (not) produce.

> And there is still 1 more test where optimized version produces 1 byte
> longer output. I'm trying to find it, but with no success yet.
> 
> There are known and documented cases where the optimized pglz version would
> do so: good_match without the 10-division, and memcmp by 4 bytes. But even
> with those disabled, the 1-byte-longer compression result still
> persists... The problem is that the length changes after deleting some
> data, so compression of that particular sequence seems to be somewhere
> far away.
> It was funny at the beginning - to hunt for 1 byte. But the weekend is
> ending, and it seems that byte slipped from me again...
> 

I wonder what that means for the patch. I haven't investigated this at
all, but it seems as if the optimization means we fail to find a match,
producing a tad larger output. That may still be a good tradeoff, as
long as the output is correct (assuming it doesn't break some promise
regarding expected output).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Pluggable toaster

2023-02-06 Thread Matthias van de Meent
On Mon, 6 Feb 2023 at 20:24, Andres Freund  wrote:
>
> Hi,
>
> On 2023-02-06 16:38:01 +0300, Nikita Malakhov wrote:
> > So we decided to reduce changes in the core to the minimum necessary
> > to make it available through the hooks, because the hooks part is very
> > lightweight and simple to keep rebasing onto the vanilla core.
>
> At least I don't think we should accept such hooks. I don't think I am alone
> in that.

+1

Assuming type-aware TOASTing is the goal, I don't think hooks are the
right design to implement that.

-Matthias van de Meent




Re: Pluggable toaster

2023-02-06 Thread Matthias van de Meent
On Mon, 6 Feb 2023 at 15:38, Aleksander Alekseev
 wrote:
> I would like to point out however that there were several other pieces
> of feedback that could have been missed:
>
> * No one wants to see this as an extension. This was my original
> proposal (adding ZSON to /contrib/) and it was rejected. The community
> explicitly wants this to be a core feature with its syntax,
> autocompletion, documentation and stuff.

I believe that is a misrepresentation of the situation. ZSON had
(has?) several systemic issues and could not be accepted to /contrib/
in the way it was implemented, and it was commented that it would make
sense that the feature of compression assisted by dictionaries would
be implemented in core. Yet still, that feature is only closely
related to pluggable TOAST, but it is not the same as making TOAST
pluggable.

> * The community wants the feature to have a simple implementation. You
> said yourself that the idea of type-aware TOASTers is very invasive,
> and I completely agree.

I'm not sure that this is correct either. Compression (and TOAST) is
inherently complex, and I don't think we should reject improvements
because they are complex.
The problem that I see being raised here, is that there was little
discussion and no observed community consensus about the design of
this complex feature *before* this patch with high complexity was
provided.
The next action that was requested is to take a step back and decide
how we would want to implement type-aware TOASTing (and the associated
patch compression dictionaries) *before* we look into the type-aware
toasting.

> * People also want this to be simple from the user perspective, as
> simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd];

Could you provide a reference to this? I have yet to see a COMPRESSED
TABLE feature or syntax, let alone users asking for TOAST to be as
easily usable as that feature or syntax.


Kind regards,

Matthias van de Meent




Re: pglz compression performance, take two

2023-02-06 Thread Andrey Borodin
On Mon, Feb 6, 2023 at 11:57 AM Tomas Vondra
 wrote:
>
> I wonder what that means for the patch. I haven't investigated this at
> all, but it seems as if the optimization means we fail to find a match,
> producing a tad larger output. That may still be a good tradeoff, as
> long as the output is correct (assuming it doesn't break some promise
> regarding expected output).
>

Yes, the patch produces correct results and is faster. And it keeps the
compression ratio the same except for one odd case.
The only problem is I do not understand _why_ it happens in that odd
case. And so far I have failed to extract the inputs/outputs of that odd
case, because it is buried under so many layers of abstraction and affects
only late stats.
Maybe the problem is not in compression at all...

Best regards, Andrey Borodin.




Re: Pluggable toaster

2023-02-06 Thread Nikita Malakhov
Hi,

It'll be great to see more opinions on the approach as a whole.

>The problem that I see being raised here, is that there was little
>discussion and no observed community consensus about the design of
>this complex feature *before* this patch with high complexity was
>provided.
>The next action that was requested is to take a step back and decide
>how we would want to implement type-aware TOASTing (and the associated
>patch compression dictionaries) *before* we look into the type-aware
>toasting.

We decided to put this improvement as a patch because we thought
that the most complex and questionable part would be the TOAST
implementations (the Toasters) themselves, and the Pluggable TOAST is
just a tool to make plugging different TOAST implementations clean
and simple.

--
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: SLRUs in the main buffer pool - Page Header definitions

2023-02-06 Thread Andres Freund
Hi,

> From 098f37c0a17fc32a94bff43817414e01fcfb234f Mon Sep 17 00:00:00 2001
> From: Rishu Bagga 
> Date: Thu, 15 Sep 2022 00:55:25 +
> Subject: [PATCH] slru to buffercache + page headers + upgrade
> 
> ---
>  contrib/amcheck/verify_nbtree.c |2 +-
>  [...]
>  65 files changed, 2176 insertions(+), 3258 deletions(-)

Unfortunately a quite large patch, with this little explanation, is hard to
review. I could read through the entire thread to try to figure out what this
is doing, but I really shouldn't have to.

You're changing quite fundamental APIs across the tree. Why is that required
for the topic at hand? Why is it worth doing that, given it'll break loads of
extensions?

Greetings,

Andres Freund




Re: Exit walsender before confirming remote flush in logical replication

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 12:23:54 +0530, Amit Kapila wrote:
> On Mon, Feb 6, 2023 at 10:33 AM Andres Freund  wrote:
> > Smart shutdown is practically unusable. I don't think it makes sense to tie 
> > behavior of walsender to it in any way.
> >
> 
> So, we have the following options: (a) do nothing for this; (b)
> clarify the current behavior in docs. Any suggestions?

b) seems good.

I also think it'd make sense to improve this on a code-level. Just not in the
wholesale way discussed so far.

How about we make it an option in START_REPLICATION? Delayed logical rep can
toggle that on by default.

Greetings,

Andres Freund




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-04 11:10:55 -0800, Peter Geoghegan wrote:
> On Sat, Feb 4, 2023 at 2:57 AM Andres Freund  wrote:
> > Is there a good way to make breakage in the page recycling mechanism
> > visible with gist? I guess to see corruption, I'd have to halt a scan
> > before a page is visited with gdb, then cause the page to be recycled
> > prematurely in another session, then unblock the first? Which'd then
> > visit that page, thinking it to be in a different part of the tree than
> > it actually is?
> 
> Yes. This bug is similar to an ancient nbtree bug fixed back in 2012,
> by commit d3abbbeb.
> 
> > which clearly doesn't seem right.
> >
> > I just can't quite judge how bad that is.
> 
> It's really hard to judge, even if you're an expert. We're talking
> about a fairly chaotic scenario. My guess is that there is a very
> small chance of a very unpleasant scenario if you have a GiST index
> that has regular page deletions, and if you use
> vacuum_defer_cleanup_age. It's likely that most GiST indexes never
> have any page deletions due to the workload characteristics.

Thanks.


Sounds like the problem here is too hard to repro. I mostly wanted to know how
to be more confident about a fix working correctly. There are no tests for the
whole page recycling behaviour, afaics, so it's a bit scary to change things
around.

I didn't quite feel confident pushing a fix for this just before a minor
release, so I'll push once the minor releases are tagged. A quite minimal fix
to GetFullRecentGlobalXmin() in 12-13 (returning FirstNormalTransactionId if
epoch == 0 and RecentGlobalXmin > nextxid_xid), and the slightly larger fix in
14+.

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Andrew Dunstan


On 2023-02-06 Mo 12:17, Peter Eisentraut wrote:

On 02.02.23 07:40, Tom Lane wrote:

Noah Misch  writes:
Regarding the concern about a pre-receive hook blocking an emergency 
push, the
hook could approve every push where a string like "pgindent: no" 
appears in a
commit message within the push.  You'd still want to make the tree 
clean
sometime the same week or so.  It's cheap to provide a break-glass 
like that.


I think the real question here is whether we can get all (or at least
a solid majority of) committers to accept such draconian constraints.
I'd buy into it, and evidently so would you, but I can't help noting
that less than a quarter of active committers have bothered to
comment on this thread.  I suspect the other three-quarters would
be quite annoyed if we tried to institute such requirements. That's
not manpower we can afford to drive away.


I have some concerns about this.

First, as a matter of principle, it would introduce another level of 
gatekeeping power.  Right now, the committers are as a group in charge 
of what gets into the tree.  Adding commit hooks that are installed 
somewhere(?) by someone(?) and can only be seen by some(?) would upset 
that.  If we were using something like github or gitlab (not 
suggesting that, but for illustration), then you could put this kind 
of thing under .github/ or similar and then it would be under the same 
control as the source code itself.


Also, pgindent takes tens of seconds to run, so hooking that into the 
git push process would slow this down quite a bit.  And maybe we want 
to add pgperltidy and so on, where would this lead?  If somehow your 
local indenting doesn't give you the "correct" result for some reason, 
you might sit there for minutes and minutes trying to fix and push and 
fix and push.



Well, pgindent should produce canonical results or we're surely doing it 
wrong. Regarding the time it takes, if we are only indenting the changed 
files that time will be vastly reduced for most cases.


But I take your point to some extent. I think we should start by making 
it easier and quicker to run pgindent locally, both by hand and in local 
git hooks, for ordinary developers and for committers, and we should 
encourage committers to be stricter in their use of pgindent. If there 
are features we need to make this possible, speak up (c.f. Robert's 
email earlier today). I'm committed to making this as easy as possible 
for people.


Once we get over those hurdles we can possibly revisit automation.




Then, consider the typedefs issue.  If you add a typedef but don't add 
it to the typedefs list but otherwise pgindent your code perfectly, 
the push would be accepted.  If then later someone updates the 
typedefs list, perhaps from the build farm, it would then reject the 
indentation of your previously committed code, thus making it their 
problem.



It would be nice if there were a gadget that would find new typedefs and 
warn you about them. Unfortunately our current code to find typedefs 
isn't all that fast either. Nicer still would be a way of not needing 
the typedefs list, but I don't think anyone has come up with one yet 
that meets our other requirements.





I think a better way to address these issues would be making this into 
a test suite, so that you can run some command that checks "is 
everything indented correctly".  Then you can run this locally, on the 
build farm, in the cfbot etc. in a uniform way and apply the existing 
blaming/encouragement processes like for any other test failure.





Well arguably the new --silent-diff and --show-diff modes are such tests :-)


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Allow tailoring of ICU locales with custom rules

2023-02-06 Thread Peter Eisentraut

On 04.02.23 14:41, Daniel Verite wrote:

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

Looking at createdb() in commands.c, it creates this variable:

@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
char   *dbcollate = NULL;
char   *dbctype = NULL;
char   *dbiculocale = NULL;
+   char   *dbicurules = NULL;
chardblocprovider = '\0';
char   *canonname;
int encoding = -1;

and then reads it later

@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
dblocprovider = src_locprovider;
if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
dbiculocale = src_iculocale;
+   if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+   dbicurules = src_icurules;
  
	/* Some encodings are client only */

if (!PG_VALID_BE_ENCODING(encoding))

but it forgets to assign it in between, so it stays NULL and src_icurules
is taken instead.


Right.  Here is a new patch with this fixed.
From 7ca76032097397d685a313500c96a41b2c38ecc6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 6 Feb 2023 21:58:24 +0100
Subject: [PATCH v4] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 23 +++--
 src/backend/commands/dbcommands.c | 51 +--
 src/backend/utils/adt/pg_locale.c | 41 ++-
 src/backend/utils/init/postinit.c | 11 +++-
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 +++
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 222 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index f3df2def86..860211e4d6 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
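
As a rough illustration of how the new options would be used (my own sketch
based on the RULES / icu_rules syntax visible in the patch above; the object
names and the rules string here are made up for illustration):

    -- CREATE COLLATION: the RULES option, ICU provider only
    CREATE COLLATION custom_rules (provider = icu, locale = 'fr',
                                   rules = '&c < ch');

    -- CREATE DATABASE: the corresponding option is icu_rules
    CREATE DATABASE rulesdb TEMPLATE template0
        LOCALE_PROVIDER icu ICU_LOCALE 'fr' ICU_RULES '&c < ch';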

Re: OpenSSL 3.0.0 vs old branches

2023-02-06 Thread Andrew Dunstan


On 2023-02-06 Mo 11:13, Tom Lane wrote:

Andrew Dunstan  writes:

I recently moved crake to a new machine running Fedora 36, which has
OpenSSL 3.0.0. This causes the SSL tests to fail on branches earlier
than release 13, so I propose to backpatch commit f0d2c65f17 to the
release 11 and 12 branches.

Hmm ... according to that commit message,

   Note that the minimum supported OpenSSL version is 1.0.1 as of
   7b283d0e1d1d79bf1c962d790c94d2a53f3bb38a, so this does not introduce
   any new version requirements.

So presumably, changing this test would break it for OpenSSL 0.9.8,
which is still nominally supported in those branches.  On the other
hand, this test isn't run by default, so users would likely never
notice anyway.

On the whole, +1 for doing this (after the release freeze lifts).





Presumably we don't have any buildfarm animals running with such old 
versions of openssl, or they would be failing the same test on release 
>= 13.



I'll push this in due course.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 18:17:02 +0100, Peter Eisentraut wrote:
> First, as a matter of principle, it would introduce another level of
> gatekeeping power.  Right now, the committers are as a group in charge of
> what gets into the tree.  Adding commit hooks that are installed
> somewhere(?) by someone(?) and can only be seen by some(?) would upset that.
> If we were using something like github or gitlab (not suggesting that, but
> for illustration), then you could put this kind of thing under .github/ or
> similar and then it would be under the same control as the source code
> itself.

Well, we did talk about adding a pre-commit hook to the repository, with
instructions for how to enable it. And I don't see a problem with adding the
pre-receive we're discussing here to src/tools/something.


> Also, pgindent takes tens of seconds to run, so hooking that into the git
> push process would slow this down quite a bit.  And maybe we want to add
> pgperltidy and so on, where would this lead?

Yes, I've been annoyed by this as well. This is painful, even without a push
hook.  Not just for pgindent, headerscheck/cpluspluscheck are quite painful as
well. I came to the conclusion that we ought to integrate pgindent etc into
the buildsystem properly. Instead of running such targets serially across all
files, without logic to prevent re-processing files, the relevant targets
ought to be run once for each process, and create a stamp file.



> If somehow your local indenting doesn't give you the "correct" result for
> some reason, you might sit there for minutes and minutes trying to fix and
> push and fix and push.

I was imagining that such a pre-receive hook would spit out the target that
you'd need to run locally to verify that the issue is resolved.


> Then, consider the typedefs issue.  If you add a typedef but don't add it to
> the typedefs list but otherwise pgindent your code perfectly, the push would
> be accepted.  If then later someone updates the typedefs list, perhaps from
> the build farm, it would then reject the indentation of your previously
> committed code, thus making it their problem.

I'd like to address this one via the buildsystem as well. We can do the
trickery that the buildfarm uses to extract typedefs as part of the build, and
update typedefs.list with the additional types.  There's really no need to
force us to do this manually.

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-02-06 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-06 18:17:02 +0100, Peter Eisentraut wrote:
>> First, as a matter of principle, it would introduce another level of
>> gatekeeping power.  Right now, the committers are as a group in charge of
>> what gets into the tree.  Adding commit hooks that are installed
>> somewhere(?) by someone(?) and can only be seen by some(?) would upset that.
>> If we were using something like github or gitlab (not suggesting that, but
>> for illustration), then you could put this kind of thing under .github/ or
>> similar and then it would be under the same control as the source code
>> itself.

> Well, we did talk about adding a pre-commit hook to the repository, with
> instructions for how to enable it. And I don't see a problem with adding the
> pre-receive we're discussing here to src/tools/something.

Yeah.  I don't think we are seriously considering putting any restrictions
in place on gitmaster --- the idea is to offer better tools to committers
to let them check/fix the indentation of what they are working on.  If
somebody wants to run that as a local pre-commit hook, that's their choice.

regards, tom lane




Re: pg_upgrade test failure

2023-02-06 Thread Thomas Munro
On Wed, Feb 1, 2023 at 2:44 PM Thomas Munro  wrote:
> OK, I pushed that.  Third time lucky?

I pulled down logs for a week of Windows CI, just over ~1k builds.
The failure rate was a few per day before, but there are no failures
like that after that went in.  There are logs that contain the
"Directory not empty" warning, but clearly the
try-again-and-this-time-wait-for-the-other-process logic must be
working (as horrible as it is) because then the test checks that the
directory is gone, and succeeds.  Hooray.

So that's one of our biggest CI flappers fixed.  Unfortunately without
treating the root cause, really.

Next up: the new "running" tests, spuriously failing around 8.8% of CI
builds on FreeBSD.  I'll go and ping that thread...




Re: pg_upgrade test failure

2023-02-06 Thread Andres Freund
Hi, 

On February 6, 2023 1:51:20 PM PST, Thomas Munro  wrote:
>Next up: the new "running" tests, spuriously failing around 8.8% of CI
>builds on FreeBSD.  I'll go and ping that thread...

Is that rate unchanged? I thought I fixed the main issue last week?

Greetings,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: OpenSSL 3.0.0 vs old branches

2023-02-06 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-02-06 Mo 11:13, Tom Lane wrote:
>> So presumably, changing this test would break it for OpenSSL 0.9.8,
>> which is still nominally supported in those branches.  On the other
>> hand, this test isn't run by default, so users would likely never
>> notice anyway.

> Presumably we don't have any buildfarm animals running with such old 
> versions of openssl, or they would be failing the same test on release 
> >= 13.

That test isn't run by default in the buildfarm either, no?

But indeed, probably nobody in the community is testing such builds
at all.  I did have such setups on my old dinosaur BF animals, but
they bit the dust last year for unrelated reasons.  I wonder how
realistic it is to claim that we still support those old OpenSSL
versions.

regards, tom lane




Re: pg_upgrade test failure

2023-02-06 Thread Thomas Munro
On Tue, Feb 7, 2023 at 10:57 AM Andres Freund  wrote:
> On February 6, 2023 1:51:20 PM PST, Thomas Munro  
> wrote:
> >Next up: the new "running" tests, spuriously failing around 8.8% of CI
> >builds on FreeBSD.  I'll go and ping that thread...
>
> Is that rate unchanged? I thought I fixed the main issue last week?

Unfortunately my cfbot database only holds a week's history.  What I
see is that there were 1254 FreeBSD tasks run in that window, of which
163 failed, and (more interestingly) 111 of those failures succeeded
on every other platform.  And clicking on a few on cfbot's page
reveals that it's the new running stuff, and I'm still trying to find
the interesting logs...




Re: pg_upgrade test failure

2023-02-06 Thread Thomas Munro
On Tue, Feb 7, 2023 at 11:03 AM Thomas Munro  wrote:
> On Tue, Feb 7, 2023 at 10:57 AM Andres Freund  wrote:
> > On February 6, 2023 1:51:20 PM PST, Thomas Munro  
> > wrote:
> > >Next up: the new "running" tests, spuriously failing around 8.8% of CI
> > >builds on FreeBSD.  I'll go and ping that thread...
> >
> > Is that rate unchanged? I thought I fixed the main issue last week?
>
> Unfortunately my cfbot database only holds a week's history.  What I
> see is that there were 1254 FreeBSD tasks run in that window, of which
> 163 failed, and (more interestingly) 111 of those failures succeeded
> on every other platform.  And clicking on a few on cfbot's page
> reveals that it's the new running stuff, and I'm still trying to find
> the interesting logs...

Ah, that number might include some other problems, including in
subscription (#2900).  That's the problem with flapping tests, you get
desensitised and stop looking closely and miss things...




Re: pg_upgrade test failure

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-07 11:03:18 +1300, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 10:57 AM Andres Freund  wrote:
> > On February 6, 2023 1:51:20 PM PST, Thomas Munro  
> > wrote:
> > >Next up: the new "running" tests, spuriously failing around 8.8% of CI
> > >builds on FreeBSD.  I'll go and ping that thread...
> >
> > Is that rate unchanged? I thought I fixed the main issue last week?
> 
> Unfortunately my cfbot database only holds a week's history.

Would be interesting to increase that to a considerably longer time. I can't
imagine that that'd take all that much resources?


> What I see is that there were 1254 FreeBSD tasks run in that window, of
> which 163 failed, and (more interestingly) 111 of those failures succeeded
> on every other platform.  And clicking on a few on cfbot's page reveals that
> it's the new running stuff, and I'm still trying to find the interesting
> logs...

I think I figured out why the logs frequently fail to upload - the server is
still running, so the size changes during the upload, causing the upload to
fail with errors like:

[12:46:43.552] Failed to upload artifacts: Put 
"https://storage.googleapis.com/cirrus-ci-5309429912436736-3271c9/artifacts/postgresql-cfbot/postgresql/6729936359129088/testrun/build/testrun/runningcheck.log?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=cirrus-ci%40cirrus-ci-community.iam.gserviceaccount.com%2F20230206%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20230206T124536Z&X-Goog-Expires=600&X-Goog-SignedHeaders=host%3Bx-goog-content-length-range%3Bx-goog-meta-created_by_task&X-Goog-Signature=8e84192cbc754180b8baa6c00c41b463f580fe7183f0e7113c253aac13cc2458b835caef7940b91e102e96d54cff2b5714c77390e74244e2fb88c00c9957a801e33cbee2ac960e0db8a01fe08ee945bedf4616881e6beafa3a162c22948ac0b9a9359d93e1f461fc9f49385b784b75d633f1b01805b987d9d53bc7fb55263917ec85180a2140659d50990f066160f03e8bb8984e8d2aadb64c875c253167cf24da152a18d69fcd3d941edce145931e4feb23dc8cf43de7b7bbfc565786c1c692406f2a0a127f30385a8c4b66f96709b51d26d3c71617991c731b0e7206ee3906338dedf6359412edd024f8c76bd33400f4c9320c2bde9512fa8bcd6289e54d52":
 http2: request body larger than specified content length

I'm testing adding a pg_ctl stop to the on_failure right now.

Greetings,

Andres Freund




gokiburi versus the back branches

2023-02-06 Thread Tom Lane
I notice that Michael's new BF animal gokiburi is failing in
all the pre-v15 branches, though it's fine in v15 and HEAD.
It's evidently dying from ASLR effects because it's trying
to build with EXEC_BACKEND on Linux: there's lots of

2023-02-06 06:07:02.131 GMT [1503972] FATAL:  could not reattach to shared 
memory (key=813803, addr=0x8c3a5000): Invalid argument
2023-02-06 06:07:02.131 GMT [1503971] FATAL:  could not reattach to shared 
memory (key=813803, addr=0x8c3a5000): Invalid argument
2023-02-06 06:07:02.132 GMT [1503976] FATAL:  could not reattach to shared 
memory (key=813803, addr=0x8c3a5000): Invalid argument

in its logs.

The reason it's okay in v15 and up is presumably this:

Author: Thomas Munro 
Branch: master Release: REL_15_BR [f3e78069d] 2022-01-11 00:04:33 +1300

Make EXEC_BACKEND more convenient on Linux and FreeBSD.

Try to disable ASLR when building in EXEC_BACKEND mode, to avoid random
memory mapping failures while testing.  For developer use only, no
effect on regular builds.

Is it time to back-patch that commit?  The alternative would be
to run the animal with an ASLR-disabling environment variable.
On the whole I think testing that f3e78069d works is more
useful than working around lack of it.

regards, tom lane




Re: gokiburi versus the back branches

2023-02-06 Thread Michael Paquier
On Mon, Feb 06, 2023 at 06:27:50PM -0500, Tom Lane wrote:
> Is it time to back-patch that commit?  The alternative would be
> to run the animal with an ASLR-disabling environment variable.
> On the whole I think testing that f3e78069d works is more
> useful than working around lack of it.

Yes, this is my intention as of this message from last week, once this
week's release is tagged:
https://www.postgresql.org/message-id/y9smhxo51hrxa...@paquier.xyz

Thanks,
--
Michael




Re: Temporary tables versus wraparound... again

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-05 15:30:57 -0800, Andres Freund wrote:
> Hm, I suspect the problem is that we didn't shut down the server due to
> the error, so the log file was changing while cirrus was trying to
> upload.

Pushed a fix for that.

Greetings,

Andres Freund




Re: pg_upgrade test failure

2023-02-06 Thread Andres Freund
On 2023-02-06 14:14:22 -0800, Andres Freund wrote:
> On 2023-02-07 11:03:18 +1300, Thomas Munro wrote:
> > What I see is that there were 1254 FreeBSD tasks run in that window, of
> > which 163 failed, and (more interestingly) 111 of those failures succeeded
> > on every other platform.  And clicking on a few on cfbot's page reveals that
> > it's the new running stuff, and I'm still trying to find the interesting
> > logs...
> 
> I think I figured out why the logs frequently fail to upload - the server is
> still running, so the size changes during the upload, causing the upload to
> fail with errors like:
> 
> [12:46:43.552] Failed to upload artifacts: Put 
> "https://storage.googleapis.com/cirrus-ci-5309429912436736-3271c9/artifacts/postgresql-cfbot/postgresql/6729936359129088/testrun/build/testrun/runningcheck.log?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=cirrus-ci%40cirrus-ci-community.iam.gserviceaccount.com%2F20230206%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20230206T124536Z&X-Goog-Expires=600&X-Goog-SignedHeaders=host%3Bx-goog-content-length-range%3Bx-goog-meta-created_by_task&X-Goog-Signature=8e84192cbc754180b8baa6c00c41b463f580fe7183f0e7113c253aac13cc2458b835caef7940b91e102e96d54cff2b5714c77390e74244e2fb88c00c9957a801e33cbee2ac960e0db8a01fe08ee945bedf4616881e6beafa3a162c22948ac0b9a9359d93e1f461fc9f49385b784b75d633f1b01805b987d9d53bc7fb55263917ec85180a2140659d50990f066160f03e8bb8984e8d2aadb64c875c253167cf24da152a18d69fcd3d941edce145931e4feb23dc8cf43de7b7bbfc565786c1c692406f2a0a127f30385a8c4b66f96709b51d26d3c71617991c731b0e7206ee3906338dedf6359412edd024f8c76bd33400f4c9320c2bde9512fa8bcd6289e54d52":
>  http2: request body larger than specified content length
> 
> I'm testing adding a pg_ctl stop to the on_failure right now.

Pushed the fix doing so.




Re: gokiburi versus the back branches

2023-02-06 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Feb 06, 2023 at 06:27:50PM -0500, Tom Lane wrote:
>> Is it time to back-patch that commit?

> Yes, this is my intention as of this message from last week, once this
> week's release is tagged:
> https://www.postgresql.org/message-id/y9smhxo51hrxa...@paquier.xyz

D'oh, I'd totally forgotten that conversation already :-(

regards, tom lane




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-02-06 Thread Tomas Vondra



On 2/6/23 20:20, Andres Freund wrote:
> Hi,
> 
> On 2023-02-06 19:51:19 +0100, Tomas Vondra wrote:
>>> No. The only thing the machine is doing is
>>>
>>>   while /usr/bin/true; do
>>> make check
>>>   done
>>>
>>> I can't reduce the workload further, because the "join" test is in a
>>> separate parallel group (I cut down parallel_schedule). I could make the
>>> machine busier, of course.
>>>
>>> However, the other lockup I saw was when using serial_schedule, so I
>>> guess lower concurrency makes it more likely.
>>>
>>
>> FWIW the machine is now on run ~2700 without any further lockups :-/
>>
>> Seems it was quite lucky we hit it twice in a handful of attempts.
> 
> Did you cut down the workload before you reproduced it the first time, or
> after? It's quite possible that it's not reproducible in isolation.
> 

No, I left the workload as it was for the first lockup, so `make check`
runs everything as is up until the "join" test suite.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-02-06 Thread Thomas Munro
On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
 wrote:
> No, I left the workload as it was for the first lockup, so `make check`
> runs everything as is up until the "join" test suite.

Wait, shouldn't that be join_hash?




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-02-06 Thread Tomas Vondra
On 2/7/23 00:48, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
>  wrote:
>> No, I left the workload as it was for the first lockup, so `make check`
>> runs everything as is up until the "join" test suite.
> 
> Wait, shouldn't that be join_hash?

No, because join_hash does not exist on 11 (it was added in 12). Also,
it actually locked up like this - that's the lockup I reported on 28/1.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-02-06 Thread Thomas Munro
On Tue, Feb 7, 2023 at 1:06 PM Tomas Vondra
 wrote:
> On 2/7/23 00:48, Thomas Munro wrote:
> > On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
> >  wrote:
> >> No, I left the workload as it was for the first lockup, so `make check`
> >> runs everything as is up until the "join" test suite.
> >
> > Wait, shouldn't that be join_hash?
>
> No, because join_hash does not exist on 11 (it was added in 12). Also,
> it actually locked up like this - that's the lockup I reported on 28/1.

Oh, good.  I had been trying to repro with 12 here and forgot that you
were looking at 11...




Re: Generating code for query jumbling through gen_node_support.pl

2023-02-06 Thread Michael Paquier
On Sun, Feb 05, 2023 at 07:40:57PM +0900, Michael Paquier wrote:
> Comments or objections are welcome, of course.

Done this part.

> (FWIW, I'd like to think that there is an argument to normalize the
> A_Const nodes for a portion of the DDL queries, by ignoring their
> values in the query jumbling and mark a location, which would be
> really useful for some workloads, but that's a separate discussion I
> am keeping for later.)

And I have a patch pretty much OK for this part of the discussion.
Will post soon.
--
Michael




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Peter Smith
Here are my review comments for v29-0001.

==
Commit Message

1.
Discussion: 
https://postgr.es/m/CAB-JLwYOYwL=xtyaxkih5ctm_vm8kjkh7aaitckvmch4rzr...@mail.gmail.com

tmp

~

What's that "tmp" doing there? A typo?

==
doc/src/sgml/catalogs.sgml

2.
+ 
+  
+   subminapplydelay int4
+  
+  
+   The minimum delay (ms) for applying changes.
+  
+ 

For consistency remove the period (.) because the other
single-sentence descriptions on this page do not have one.

==
src/backend/commands/subscriptioncmds.c

3. AlterSubscription
+ errmsg("cannot set parallel streaming mode for subscription with %s",
+"min_apply_delay"));

Since there are no translator considerations here why not write it like this:

errmsg("cannot set parallel streaming mode for subscription with
min_apply_delay")

~~~

4. AlterSubscription
+ errmsg("cannot set %s for subscription in parallel streaming mode",
+"min_apply_delay"));

Since there are no translator considerations here why not write it like this:

errmsg("cannot set min_apply_delay for subscription in parallel streaming mode")

~~~

5.
+defGetMinApplyDelay(DefElem *def)
+{
+    char       *input_string;
+    int         result;
+    const char *hintmsg;
+
+    input_string = defGetString(def);
+
+    /*
+     * Parse given string as parameter which has millisecond unit
+     */
+    if (!parse_int(input_string, &result, GUC_UNIT_MS, &hintmsg))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("invalid value for parameter \"%s\": \"%s\"",
+                        "min_apply_delay", input_string),
+                 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+
+    /*
+     * Check both the lower boundary for the valid min_apply_delay range and
+     * the upper boundary as the safeguard for some platforms where INT_MAX is
+     * wider than int32 respectively. Although parse_int() has confirmed that
+     * the result is less than or equal to INT_MAX, the value will be stored
+     * in a catalog column of int32.
+     */
+    if (result < 0 || result > PG_INT32_MAX)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("%d ms is outside the valid range for parameter \"%s\" (%d .. %d)",
+                        result,
+                        "min_apply_delay",
+                        0, PG_INT32_MAX)));
+
+    return result;
+}

5a.
Since there are no translator considerations here why not write the
first error like:

errmsg("invalid value for parameter \"min_apply_delay\": \"%s\"",
input_string)

~

5b.
Since there are no translator considerations here why not write the
second error like:

errmsg("%d ms is outside the valid range for parameter
\"min_apply_delay\" (%d .. %d)",
result, 0, PG_INT32_MAX))

--
Kind Regards,
Peter Smith.
Fujitsu Australia
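
For readers following along, a sketch of how the option under review would be
set and then inspected, assuming the min_apply_delay subscription parameter
and the subminapplydelay catalog column that this patch adds (the connection
string and object names are placeholders):

    CREATE SUBSCRIPTION delayed_sub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION pub
        WITH (min_apply_delay = '3s');  -- parsed with ms units, stored as int32

    SELECT subname, subminapplydelay FROM pg_subscription;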




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-06 Thread Andres Freund
Hi,

On 2022-12-08 16:36:07 -0800, Andres Freund wrote:
> On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> > Unfortunately cfbot shows that that doesn't work entirely reliably.
> >
> > The most frequent case is postgres_fdw, which somewhat regularly fails with 
> > a
> > regression.diff like this:
> >
> > diff -U3 
> > /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
> > /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
> > --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
> > 2022-12-08 20:35:24.772888000 +
> > +++ 
> > /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
> > 2022-12-08 20:43:38.19945 +
> > @@ -9911,8 +9911,7 @@
> > WHERE application_name = 'fdw_retry_check';
> >   pg_terminate_backend
> >  --
> > - t
> > -(1 row)
> > +(0 rows)
> >
> >  -- This query should detect the broken connection when starting new remote
> >  -- transaction, reestablish new connection, and then succeed.

> >
> > I guess that a cache reset message arrives and leads to the connection being
> > terminated. Unfortunately that's hard to see right now, as the relevant log
> > messages are output with DEBUG3 - it's quite verbose, so enabling it for all
> > tests will be painful.
> >
> > I *think* I have seen this failure locally at least once, when running the
> > test normally.
> >
> >
> > I'll try to reproduce this locally for a bit. If I can't, the only other 
> > idea
> > I have for debugging this is to change log_min_messages in that section of 
> > the
> > postgres_fdw test to DEBUG3.
>
> Oh. I tried to reproduce the issue, without success so far, but eventually my
> test loop got stuck in something I reported previously and forgot about since:
> https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de
>
> I wonder if the timing on the freebsd CI task works out to hitting a "smaller
> version" of the problem that eventually resolves itself, which then leads to a
> sinval reset getting sent, causing the observable problem.

The issue referenced above is now fixed, and I haven't seen instances of it
since then. I also just now fixed the issue that often lead to failing to
upload logs.

Unfortunately the fdw_retry_check issue from above has re-occurred since then:

https://cirrus-ci.com/task/4901157940756480
https://api.cirrus-ci.com/v1/artifact/task/4901157940756480/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs


Another run hit an issue we've been fighting repeatedly on the buildfarm / CI:
https://cirrus-ci.com/task/5527490404286464
https://api.cirrus-ci.com/v1/artifact/task/5527490404286464/testrun/build/testrun/regress-running/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/create_index.out 
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/create_index.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/create_index.out 
2023-02-06 23:52:43.627604000 +
+++ 
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/create_index.out
 2023-02-07 00:03:04.535232000 +
@@ -1930,12 +1930,13 @@
 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)
 ORDER BY thousand;
-  QUERY PLAN

- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+  QUERY PLAN
+--
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+ Index Cond: ((thousand < 2) AND (tenthous = ANY 
('{1001,3000}'::integer[])))
+(4 rows)

 SELECT thousand, tenthous FROM tenk1
 WHERE thousand < 2 AND tenthous IN (1001,3000)


I'd be tempted to disable the test, but it also found genuine issues in a
bunch of CF entries, and all these test instabilities seem like ones we'd also
see, with a lower occurrence on the buildfarm.


WRT the fdw_retry_check: I wonder if we should increase the log level of
a) pgfdw_inval_callback deciding to disconnect
b) ReceiveSharedInvalidMessages() deciding to reset

to DEBUG1, at least temporarily?

Alternatively we could add a
SET log_min_messages=debug4;
...
RESET log_min_messages;

around the postgres_fdw connection re-establishment test?


One thing nudging me towards the more global approach is that I have the vague
feeling that there's a wider issue around hitting more sinval resets than we
should - and it'd right now be very hard to know about that.


Greetings,

Andres Freund
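
A sketch of the second alternative above, as it might look around the
reconnection test in contrib/postgres_fdw/sql/postgres_fdw.sql (the statements
in between are only indicated by a comment; the regression role is a
superuser, which SET log_min_messages requires):

    SET log_min_messages = debug4;  -- make the DEBUG3-level disconnect/reset
                                    -- messages visible in the log
    -- ... existing fdw_retry_check statements ...
    RESET log_min_messages;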




RE: Perform streaming logical transactions by background workers and parallel apply

2023-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

> I think PARALLEL_APPLY_MAIN waits for two kinds of event: 1) wait for new
> message from the queue 2) wait for the partial file state to be set. So, I
> think introducing a new general event for them is better and it is also
> consistent with the WAIT_EVENT_LOGICAL_APPLY_MAIN which is used in the
> main
> loop of leader apply worker(LogicalRepApplyLoop). But the event in
> pg_send_data() is only for message send, so it seems fine to use
> WAIT_EVENT_MQ_SEND, besides MQ_SEND is also unique in parallel apply
> worker and
> user can distinguish without adding a new event.

Thank you for your explanation. I think what both of you said is reasonable.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-06 Thread shiy.f...@fujitsu.com

On Thu, Feb 2, 2023 11:48 AM shveta malik  wrote:
> 
> On Wed, Feb 1, 2023 at 5:42 PM Melih Mutlu 
> wrote:
> >
> > Hi Shveta,
> >
> > shveta malik , 1 Şub 2023 Çar, 15:01 tarihinde
> şunu yazdı:
> >>
> >> On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu 
> wrote:
> >> 2) I found a crash in the previous patch (v9), but have not tested it
> >> on the latest yet. Crash happens when all the replication slots are
> >> consumed and we are trying to create new. I tweaked the settings like
> >> below so that it can be reproduced easily:
> >> max_sync_workers_per_subscription=3
> >> max_replication_slots = 2
> >> and then ran the test case shared by you. I think there is some memory
> >> corruption happening. (I did test in debug mode, have not tried in
> >> release mode). I tried to put some traces to identify the root-cause.
> >> I observed that worker_1 keeps on moving from 1 table to another table
> >> correctly, but at some point, it gets corrupted i.e. origin-name
> >> obtained for it is wrong and it tries to advance that and since that
> >> origin does not exist, it  asserts and then something else crashes.
> >> From log: (new trace lines added by me are prefixed by shveta, also
> >> tweaked code to have my comment 1 fixed to have clarity on worker-id).
> >>
> >> form below traces, it is clear that worker_1 was moving from one
> >> relation to another, always getting correct origin 'pg_16688_1', but
> >> at the end it got 'pg_16688_49' which does not exist. Second part of
> >> trace shows who updated 'pg_16688_49', it was done by worker_49
> which
> >> even did not get chance to create this origin due to max_rep_slot
> >> reached.
> >
> >
> > Thanks for investigating this error. I think it's the same error as the one 
> > Shi
> reported earlier. [1]
> > I couldn't reproduce it yet but will apply your tweaks and try again.
> > Looking into this.
> >
> > [1] https://www.postgresql.org/message-
> id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpn
> prd01.prod.outlook.com
> >
> 
> Hi Melih,
> I think I am able to identify the root cause. It is not memory
> corruption, but the way origin-names are stored in system-catalog
> mapped to a particular relation-id before even those are created.
> 
> After adding few more logs:
> 
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49 constructed
> originname :pg_16684_49, relid:16540
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49
> updated-origin in system catalog:pg_16684_49,
> slot:pg_16684_sync_49_7195149685251088378, relid:16540
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49
> get-origin-id2 originid:0, originname:pg_16684_49
> [4227] ERROR:  could not create replication slot
> "pg_16684_sync_49_7195149685251088378": ERROR:  all replication slots
> are in use
> HINT:  Free one or increase max_replication_slots.
> 
> 
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148 constructed
> originname :pg_16684_49, relid:16540
> [4428] LOG:  could not drop replication slot
> "pg_16684_sync_49_7195149685251088378" on publisher: ERROR:
> replication slot "pg_16684_sync_49_7195149  685251088378" does not
> exist
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148 drop-origin
> originname:pg_16684_49
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148
> updated-origin:pg_16684_49,
> slot:pg_16684_sync_148_7195149685251088378, relid:16540
> 
> So from above, worker_49 came and picked up relid:16540, constructed
> origin-name and slot-name and updated in system-catalog and then it
> tried to actually create that slot and origin but since max-slot count
> was reached, it failed and exited, but did not do cleanup from system
> catalog for that relid.
> 
> Then worker_148 came and also picked up table 16540 since it was not
> completed/started by previous worker, but this time it found that
> origin and slot entry present in system-catalog against this relid, so
> it picked the same names and started processing, but since those do
> not exist, it asserted while advancing the origin.
> 
> The assert is only reproduced when an already running worker (say
> worker_1) who has 'created=true' set, gets to sync the relid for which
> a previously failed worker has tried and updated origin-name w/o
> creating it. In such a case worker_1 (with created=true) will try to
> reuse the origin and thus will try to advance it and will end up
> asserting. That is why you might not see the assertion always. The
> condition 'created' is set to true for that worker and it goes to
> reuse the origin updated by the previous worker.
> 
> So to fix this, I think either we update origin and slot entries in
> the system catalog after the creation has passed or we clean-up the
> system catalog in case of failure. What do you think?
> 

I think the first way seems better.

I reproduced the problem I reported before with latest patch (v7-0001,
v10-0002), and looked into this problem. It is caused by a similar reason. Here
is some analysis for

RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.

> ==
> Commit Message
> 
> 1.
> Discussion:
> https://postgr.es/m/CAB-JLwYOYwL=XTyAXKiH5CtM_Vm8KjKh7aaitCKvmCh4r
> zr...@mail.gmail.com
> 
> tmp
> 
> ~
> 
> What's that "tmp" doing there? A typo?

Removed. It was a typo.
I used the `git rebase` command to combine the local commits,
but the commit message seems to have remained.

> ==
> doc/src/sgml/catalogs.sgml
> 
> 2.
> + 
> +  
> +   subminapplydelay int4
> +  
> +  
> +   The minimum delay (ms) for applying changes.
> +  
> + 
> 
> For consistency remove the period (.) because the other
> single-sentence descriptions on this page do not have one.

I have also confirmed and agreed. Fixed.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 3. AlterSubscription
> + errmsg("cannot set parallel streaming mode for subscription with %s",
> +"min_apply_delay"));
> 
> Since there are no translator considerations here why not write it like this:
> 
> errmsg("cannot set parallel streaming mode for subscription with
> min_apply_delay")

Fixed.

> ~~~
> 
> 4. AlterSubscription
> + errmsg("cannot set %s for subscription in parallel streaming mode",
> +"min_apply_delay"));
> 
> Since there are no translator considerations here why not write it like this:
> 
> errmsg("cannot set min_apply_delay for subscription in parallel streaming 
> mode")

Fixed.

> ~~~
> 
> 5.
> +defGetMinApplyDelay(DefElem *def)
> +{
> + char*input_string;
> + int result;
> + const char *hintmsg;
> +
> + input_string = defGetString(def);
> +
> + /*
> + * Parse given string as parameter which has millisecond unit
> + */
> + if (!parse_int(input_string, &result, GUC_UNIT_MS, &hintmsg))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid value for parameter \"%s\": \"%s\"",
> + "min_apply_delay", input_string),
> + hintmsg ? errhint("%s", _(hintmsg)) : 0));
> +
> + /*
> + * Check both the lower boundary for the valid min_apply_delay range and
> + * the upper boundary as the safeguard for some platforms where INT_MAX is
> + * wider than int32 respectively. Although parse_int() has confirmed that
> + * the result is less than or equal to INT_MAX, the value will be stored
> + * in a catalog column of int32.
> + */
> + if (result < 0 || result > PG_INT32_MAX)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("%d ms is outside the valid range for parameter \"%s\" (%d .. %d)",
> + result,
> + "min_apply_delay",
> + 0, PG_INT32_MAX)));
> +
> + return result;
> +}
> 
> 5a.
> Since there are no translator considerations here why not write the
> first error like:
> 
> errmsg("invalid value for parameter \"min_apply_delay\": \"%s\"",
> input_string)
> 
> ~
> 
> 5b.
> Since there are no translator considerations here why not write the
> second error like:
> 
> errmsg("%d ms is outside the valid range for parameter
> \"min_apply_delay\" (%d .. %d)",
> result, 0, PG_INT32_MAX))

Both of the points you suggested have been fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

> -Original Message-
> From: Peter Smith 
> Sent: Tuesday, February 7, 2023 9:33 AM
> To: Osumi, Takamichi/大墨 昂道 
> Cc: Amit Kapila ; Shi, Yu/侍 雨
> ; Kyotaro Horiguchi ;
> vignes...@gmail.com; Kuroda, Hayato/黒田 隼人
> ; shveta.ma...@gmail.com; dilipbal...@gmail.com;
> eu...@eulerto.com; m.melihmu...@gmail.com; and...@anarazel.de;
> mar...@f10.com.br; pgsql-hack...@postgresql.org
> Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
> 
> Here are my review comments for v29-0001.
> 
> ==
> Commit Message
> 
> 1.
> Discussion:
> https://postgr.es/m/CAB-JLwYOYwL=XTyAXKiH5CtM_Vm8KjKh7aaitCKvmCh4r
> zr...@mail.gmail.com
> 
> tmp
> 
> ~
> 
> What's that "tmp" doing there? A typo?
> 
> ==
> doc/src/sgml/catalogs.sgml
> 
> 2.
> + 
> +  
> +   subminapplydelay int4
> +  
> +  
> +   The minimum delay (ms) for applying changes.
> +  
> + 
> 
> For consistency remove the period (.) because the other
> single-sentence descriptions on this page do not have one.
> 
> ==
> src/backend/commands/subscriptioncmds.c
> 
> 3. AlterSubscription
> + errmsg("cannot set parallel streaming mode for subscription with %s",
> +"min_apply_delay"));
> 
> Since there are no translator considerations here why not write it like this:
> 
> errmsg("cannot set parallel streaming mode for subscription with
> min_apply_delay")
> 
> ~~~
> 
> 4. AlterSubscription
> + errmsg("cannot set %s for subscription in parallel streaming mode",
> +"min_apply_delay"));
> 
> Since there are no translator considerations here why not write it like this:
> 
> errmsg("cannot set min_apply_delay for subscription in parallel streaming 
> mode")
> 
> ~~~
> 
> 5.
> +defGetMinApplyDelay(DefElem *def)
> +{
> + char*input_string;
> + int result;
> + const char *hintmsg;
> +
> + input_string = defGetString(def);
> +
> + /*
> + * Parse given string as parameter which has millisecon

A problem in deconstruct_distribute_oj_quals

2023-02-06 Thread Richard Guo
In deconstruct_distribute_oj_quals, when we've identified a commutable
left join which provides join clause with flexible semantics, we try to
generate multiple versions of the join clause.  Here we have the logic
that puts back any ojrelids that were removed from its min_righthand.

 /*
  * Put any OJ relids that were removed from min_righthand back into
  * ojscope, else distribute_qual_to_rels will complain.
  */
 ojscope = bms_join(ojscope, bms_intersect(sjinfo->commute_below,
   sjinfo->syn_righthand));

I doubt this is necessary.  It seems to me that all relids mentioned
within the join clause have already been contained in ojscope, which is
the union of min_lefthand and min_righthand.

I noticed this code because I came across a problem with a query as
below.

create table t (a int);

select t1.a from (t t1 left join t t2 on true) left join (t t3 left join t
t4 on t3.a = t4.a) on t2.a = t3.a;

When we deal with qual 't2.a = t3.a', deconstruct_distribute_oj_quals
would always add the OJ relid of t3/t4 into its required_relids, due to
the code above, which I think is wrong.  The direct consequence is that
we would miss the plan that joins t2 and t3 directly.

If we add a unique constraint on 'a' and try the outer-join removal
logic, we would notice that the left join of t2/t3 cannot be removed
because its join qual is treated as pushed down due to the fact that its
required_relids exceed the scope of the join.  I think this is also not
correct.

So is it safe we remove that code?

Thanks
Richard
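
To make the unique-constraint variant mentioned above concrete, here is a
sketch of that scenario (my own example, using a separate table so it does not
collide with the one defined earlier):

    create table tu (a int unique);

    -- same query shape as above; with tu.a unique, removal of the t2/t3 left
    -- join becomes a candidate, which is where the required_relids treatment
    -- matters
    explain (costs off)
    select t1.a from (tu t1 left join tu t2 on true)
      left join (tu t3 left join tu t4 on t3.a = t4.a) on t2.a = t3.a;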


Re: Perform streaming logical transactions by background workers and parallel apply

2023-02-06 Thread Amit Kapila
On Mon, Feb 6, 2023 at 3:43 PM houzj.f...@fujitsu.com
 wrote:
>
> while reading the code, I noticed that in pa_send_data() we set wait event
> to WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE while sending the
> message to the queue. Because this state is used in multiple places, user 
> might
> not be able to distinguish what they are waiting for. So It seems we'd better
> to use WAIT_EVENT_MQ_SEND here which will be easier to distinguish and
> understand. Here is a tiny patch for that.
>

Thanks for noticing this. The patch LGTM. I'll push this in some time.

-- 
With Regards,
Amit Kapila.
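
For context, the wait events in question can be observed from the monitoring
side roughly like this (the displayed names are my mapping of the C
identifiers above, e.g. WAIT_EVENT_MQ_SEND shows up as MessageQueueSend, so
double-check them against the pg_stat_activity documentation):

    SELECT pid, backend_type, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE wait_event IN ('MessageQueueSend',
                         'LogicalParallelApplyStateChange',
                         'LogicalApplyMain');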




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-06 Thread Andres Freund
Hi,

On 2023-02-06 17:53:00 -0800, Andres Freund wrote:
> WRT the fdw_retry_check: I wonder if we should increase the log level of
> a) pgfdw_inval_callback deciding to disconnect
> b) ReceiveSharedInvalidMessages() deciding to reset
>
> to DEBUG1, at least temporarily?
>
> Alternatively we could add a
> SET log_min_messages=debug4;
> ...
> RESET log_min_messages;
>
> around the postgres_fdw connection re-establishment test?
>
>
> One thing nudging me towards the more global approach is that I have the vague
> feelign that there's a wider issue around hitting more sinval resets than we
> should - and it'd right now be very hard to know about that.

Luckily it proved to be easy enough to reproduce on a private branch, by
setting the test to repeat a couple times.

I set the aforementioned log messages to LOG. And indeed:

2023-02-07 02:54:18.773 UTC [10800][client backend] 
[pg_regress/postgres_fdw][7/10526:0] LOG:  cache state reset
2023-02-07 02:54:18.773 UTC [10800][client backend] 
[pg_regress/postgres_fdw][7/10526:0] LOG:  discarding connection 0x802251f00

that was preceded by another log message less than 200 ms before:
2023-02-07 02:54:18.588 UTC [10800][client backend] 
[pg_regress/postgres_fdw][7/10523:55242] STATEMENT:  ALTER SERVER loopback 
OPTIONS (application_name 'fdw_retry_check');

The log statements inbetween are about isolation/reindex-concurrently-toast
and pg_regress/indexing.

So the problem is indeed that we somehow quickly overflow the sinval queue. I
guess we need a bit more logging around the size of the sinval queue and its
"fullness"?


I'm a bit surprised that MAXNUMMESSAGES is a hardcoded 4096.  It's not
particularly surprising that that's quickly overflown?


There's something off. Isolationtester's control connection emits *loads* of
invalidation messages:
2023-02-06 19:29:06.430 PST [2125297][client 
backend][6/0:121864][isolation/receipt-report/control connection] LOG:  
previously emitted 7662 messages, 21 this time
2023-02-06 19:29:06.566 PST [2125297][client 
backend][6/0:121873][isolation/receipt-report/control connection] LOG:  
previously emitted 8155 messages, 99 this time
2023-02-06 19:29:06.655 PST [2125297][client 
backend][6/0:121881][isolation/receipt-report/control connection] LOG:  
previously emitted 8621 messages, 99 this time
2023-02-06 19:29:06.772 PST [2125297][client 
backend][6/0:121892][isolation/receipt-report/control connection] LOG:  
previously emitted 9208 messages, 85 this time
2023-02-06 19:29:06.867 PST [2125297][client 
backend][6/0:121900][isolation/receipt-report/control connection] LOG:  
previously emitted 9674 messages, 85 this time

and this happens with lots of other tests.

Greetings,

Andres Freund


PS: The reindex-concurrently-toast output seems to indicate something is
broken in the test... There's lots of non-existing table references in the
expected file, without that immediately making sense.




Re: Exit walsender before confirming remote flush in logical replication

2023-02-06 Thread Amit Kapila
On Tue, Feb 7, 2023 at 2:04 AM Andres Freund  wrote:
>
> On 2023-02-06 12:23:54 +0530, Amit Kapila wrote:
> > On Mon, Feb 6, 2023 at 10:33 AM Andres Freund  wrote:
> > > Smart shutdown is practically unusable. I don't think it makes sense to 
> > > tie behavior of walsender to it in any way.
> > >
> >
> > So, we have the following options: (a) do nothing for this; (b)
> > clarify the current behavior in docs. Any suggestions?
>
> b) seems good.
>
> I also think it'd make sense to improve this on a code-level. Just not in the
> wholesale way discussed so far.
>
> How about we make it an option in START_REPLICATION? Delayed logical rep can
> toggle that on by default.
>

Works for me. So, when this option is set in START_REPLICATION
message, walsender will set some flag and allow itself to exit at
shutdown without waiting for WAL to be sent?

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-06 Thread Amit Kapila
On Tue, Feb 7, 2023 at 6:03 AM Peter Smith  wrote:
>
> 5.
> +defGetMinApplyDelay(DefElem *def)
> +{
> + char*input_string;
> + int result;
> + const char *hintmsg;
> +
> + input_string = defGetString(def);
> +
> + /*
> + * Parse given string as parameter which has millisecond unit
> + */
> + if (!parse_int(input_string, &result, GUC_UNIT_MS, &hintmsg))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid value for parameter \"%s\": \"%s\"",
> + "min_apply_delay", input_string),
> + hintmsg ? errhint("%s", _(hintmsg)) : 0));
> +
> + /*
> + * Check both the lower boundary for the valid min_apply_delay range and
> + * the upper boundary as the safeguard for some platforms where INT_MAX is
> + * wider than int32 respectively. Although parse_int() has confirmed that
> + * the result is less than or equal to INT_MAX, the value will be stored
> + * in a catalog column of int32.
> + */
> + if (result < 0 || result > PG_INT32_MAX)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("%d ms is outside the valid range for parameter \"%s\" (%d .. %d)",
> + result,
> + "min_apply_delay",
> + 0, PG_INT32_MAX)));
> +
> + return result;
> +}
>
> 5a.
> Since there are no translator considerations here why not write the
> first error like:
>
> errmsg("invalid value for parameter \"min_apply_delay\": \"%s\"",
> input_string)
>
> ~
>
> 5b.
> Since there are no translator considerations here why not write the
> second error like:
>
> errmsg("%d ms is outside the valid range for parameter
> \"min_apply_delay\" (%d .. %d)",
> result, 0, PG_INT32_MAX))
>

I see that existing usage in the code matches what the patch had
before this comment. See below and similar usages in the code.
if (start <= 0)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("invalid value for parameter \"%s\": %d",
                    "start", start)));

-- 
With Regards,
Amit Kapila.




Re: Deadlock between logrep apply worker and tablesync worker

2023-02-06 Thread Peter Smith
On Fri, Feb 3, 2023 at 6:58 PM houzj.f...@fujitsu.com
 wrote:
>
...
> > Right, I think that case could be addressed by Tom's patch to some extent 
> > but
> > I am thinking we should also try to analyze if we can completely avoid the 
> > need
> > to remove origins from both processes. One idea could be to introduce
> > another relstate something like PRE_SYNCDONE and set it in a separate
> > transaction before we set the state as SYNCDONE and remove the slot and
> > origin in tablesync worker.
> > Now, if the tablesync worker errors out due to some reason during the second
> > transaction, it can remove the slot and origin after restart by checking 
> > the state.
> > However, it would add another relstate which may not be the best way to
> > address this problem. Anyway, that can be accomplished as a separate patch.
>
> Here is an attempt to achieve the same.
> Basically, the patch removes the code that drop the origin in apply worker. 
> And
> add a new state PRE_SYNCDONE after synchronization finished in front of apply
> (sublsn set), but before dropping the origin and other final cleanups. The
> tablesync will restart and redo the cleanup if it failed after reaching the 
> new
> state. Besides, since the changes can already be applied on the table in
> PRE_SYNCDONE state, so I also modified the check in
> should_apply_changes_for_rel(). And some other conditions for the origin drop
> in subscription commands were adjusted in this patch.
>

Here are some review comments for the 0001 patch

==
General Comment

0.
The idea of using the extra relstate for clean-up seems OK, but the
implementation of the new state in this patch appears misordered and
misnamed to me.

The state name should indicate what it is doing (PRE_SYNCDONE is
meaningless). The patch describes in several places that this state
means "synchronized, but not yet cleaned up" therefore IMO it means
the SYNCDONE state should be *before* this new state. And since this
new state is for "cleanup" then let's call it something like that.

To summarize, I don’t think the meaning of SYNCDONE should be touched.
SYNCDONE means the synchronization is done, same as before. And your
new "cleanup" state belongs directly *after* that. IMO it should be
like this:

1. STATE_INIT
2. STATE_DATASYNC
3. STATE_FINISHEDCOPY
4. STATE_SYNCDONE
5. STATE_CLEANUP <-- new relstate
6. STATE_READY

Of course, this is going to impact almost every aspect of the patch,
but I think everything will be basically the same as you have it now
-- only all the state names and comments need to be adjusted according
to the above.

==
Commit Message

1.
Previously, we allowed the apply worker to drop the origin to avoid the case
that the tablesync worker fails to the origin(due to crash). In this case we
don't restart the tablesync worker, and the apply worker can clean the origin.

~

There seem to be some words missing in this paragraph.

SUGGESTION
Previously, we allowed the apply worker to drop the origin as a way to
recover from the scenario where the tablesync worker failed to drop it
(due to crash).

~~~

2.
To improve this, we introduce a new relstate SUBREL_STATE_PRE_SYNCDONE which
will be set after synchronization finished in front of apply (sublsn set), but
before dropping the origin and other final cleanups. The apply worker will
restart tablesync worker if the relstate is SUBREL_STATE_PRE_SYNCDONE. This
way, even if the tablesync worker error out in the transaction that tries to
drop the origin, the apply worker will restart the tablesync worker to redo the
cleanup(for origin and other stuff) and then directly exit.

~

2a.
This is going to be impacted by my "General Comment". Notice how you
describe again "will be set after synchronization finished".  This is again
evidence that the new CLEANUP state should directly follow the SYNCDONE
state.

2b.
"error out” --> "encounters an error"

2c.
"cleanup(for origin" --> space before the "("

==
doc/src/sgml/catalogs.sgml

3.
@@ -8071,7 +8071,8 @@ SCRAM-SHA-256$:&l
i = initialize,
d = data is being copied,
f = finished table copy,
-   s = synchronized,
+   p = synchronized but not yet cleaned up,
+   s = synchronization done,
r = ready (normal replication)
   
  
@@ -8082,8 +8083,8 @@ SCRAM-SHA-256$:&l
   
   
Remote LSN of the state change used for synchronization coordination
-   when in s or r states,
-   otherwise null
+   when in p, s or
+   r states, otherwise null
   
  
 

This state order and choice of the letter are impacted by my "General Comment".

IMO it should be more like this:

State code: i = initialize, d = data is being copied, f = finished
table copy,  s = synchronization done, c = clean-up done, r = ready
(normal replication)
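
Whichever letters end up being chosen, the progression is easy to watch from
SQL while testing the patch; a quick sketch (srsubstate holds the
single-letter codes discussed above):

    SELECT srrelid::regclass AS relation, srsubstate, srsublsn
    FROM pg_subscription_rel
    ORDER BY 1;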

==
src/backend/commands/subscriptioncmds.c

4. AlterSubscription_refresh

Some adjustments are needed according to my "General Comment".
