Re: pg_stat_statements oddity with track = all

2020-12-04 Thread Julien Rouhaud
On Thu, Dec 03, 2020 at 04:53:59PM +0800, Julien Rouhaud wrote:
> On Thu, Dec 03, 2020 at 11:40:22AM +0300, Sergei Kornilov wrote:
> > Hello
> > 
> > > To get an increase in the number of records, the same statement would
> > > have to appear at top level AND at a nested level. This seems like a
> > > corner case with a very low (negligible) occurrence rate.
> > 
> > +1
> > I think splitting fields into plans_toplevel / plans_nested would be less
> > convenient. And more code means a higher chance of copy-paste errors.
> 
> As I mentioned in a previous message, I really have no idea whether that would
> be a corner case or not.  For instance with native partitioning, the odds of
> having many different queries executed both at top level and as nested
> statements may be quite a bit higher.

The consensus seems to be adding a new boolean toplevel flag in the entry key,
so PFA a patch implementing that.  Note that the key now has padding, so
memset() calls are required.
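For illustration, here is a sketch of what such a key could look like and why
the padding forces zeroing before hashing or comparison. Names follow
pg_stat_statements.c, but treat the details (in particular the depth-tracking
variable) as assumptions, not the patch itself:

    typedef struct pgssHashKey
    {
        Oid         userid;     /* user OID */
        Oid         dbid;       /* database OID */
        uint64      queryid;    /* query identifier */
        bool        toplevel;   /* was the statement executed at top level? */
    } pgssHashKey;

    pgssHashKey key;

    /*
     * The trailing bool leaves padding in the struct; zero the whole key so
     * that the hash and comparison functions see deterministic bytes.
     */
    memset(&key, 0, sizeof(pgssHashKey));
    key.userid = GetUserId();
    key.dbid = MyDatabaseId;
    key.queryid = queryId;
    key.toplevel = (nesting_level == 0);   /* depth variable is an assumption */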
From 6c738707abdd72807ec94dbafd346f077e482f74 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 4 Dec 2020 13:33:51 +0800
Subject: [PATCH v1] Add a bool toplevel column to pg_stat_statements.

As top level statements include resource consumption for underlying statements,
it's not possible to compute the total resource consumption accurately.  Fix
that by adding a new toplevel boolean field that indicates whether the counters
were cumulated for queries executed at top level or not.

It can lead to more entries being stored for the same workload if
pg_stat_statements.track is set to all, but it seems unlikely to have the same
queries being executed both at top level and as nested statements.
---
 contrib/pg_stat_statements/Makefile   |  3 +-
 .../expected/pg_stat_statements.out   | 40 +
 .../pg_stat_statements--1.9--1.10.sql | 57 +++
 .../pg_stat_statements/pg_stat_statements.c   | 42 +++---
 .../pg_stat_statements.control|  2 +-
 .../sql/pg_stat_statements.sql| 21 +++
 doc/src/sgml/pgstatstatements.sgml|  9 +++
 7 files changed, 165 insertions(+), 9 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 3ec627b956..cab4f626ad 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..fb97f68737 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info;
0
 (1 row)
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+ query | toplevel | plans | calls 
+---+--+---+---
+ DELETE FROM test  | t| 1 | 1
+ DO $$+| t| 0 | 1
+ BEGIN+|  |   | 
+ DELETE FROM test;+|  |   | 
+ END; +|  |   | 
+ $$ LANGUAGE plpgsql   |  |   | 
+(2 rows)
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+ query | toplevel | plans | calls 
+---+--+---+---
+ DELETE FROM test  | f| 1 | 1
+ DELETE FROM test  | t| 2 | 2
+ DO $$+| t| 0 | 2
+ BEGIN+|  |   | 
+ DELETE FROM test;+|  |   | 
+ END; +|  |   | 
+ $$ LANGUAGE plpgsql   |  |   | 
+(3 rows)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 00..f97d16497d
--- /dev/null
+++ b/contrib/pg_stat_st

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-12-04 Thread Bharath Rupireddy
On Fri, Dec 4, 2020 at 11:49 AM Fujii Masao  wrote:
>
> > Attaching the v2 patch set. Please review it further.
>
> Regarding the 0001 patch, shouldn't we add a function that returns
> information about the cached connections, like dblink_get_connections(),
> together with the 0001 patch? Otherwise it's not easy for users to
> see how many cached connections there are and determine whether to
> disconnect them or not. Sorry if this was already discussed before.
>

Thanks for bringing this up. This is exactly what I was thinking a few
days back. Say we add a new function postgres_fdw_get_connections() that
returns an array of server names whose connections exist in the cache.
Without such a function, the user may not know how many connections
this backend holds until they check manually on the remote server.

Thoughts? If okay, I can code the function in the 0001 patch.
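A rough sketch of what such a function might look like, iterating the
connection cache and collecting server names. ConnectionHash and ConnCacheEntry
follow postgres_fdw's connection.c, but the entry->serverid field and the exact
function shape here are assumptions, not the eventual patch:

    Datum
    postgres_fdw_get_connections(PG_FUNCTION_ARGS)
    {
        HASH_SEQ_STATUS scan;
        ConnCacheEntry *entry;
        ArrayBuildState *astate = NULL;

        if (ConnectionHash)
        {
            hash_seq_init(&scan, ConnectionHash);
            while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)) != NULL)
            {
                ForeignServer *server;

                /* skip cache slots with no live libpq connection */
                if (entry->conn == NULL)
                    continue;

                /* entry->serverid is an assumption; the cache key may differ */
                server = GetForeignServer(entry->serverid);
                astate = accumArrayResult(astate,
                                          CStringGetTextDatum(server->servername),
                                          false, TEXTOID, CurrentMemoryContext);
            }
        }

        if (astate)
            PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));

        PG_RETURN_NULL();
    }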

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: A new function to wait for the backend exit after termination

2020-12-04 Thread Hou, Zhijie
Hi,

While testing pg_terminate_backend_and_wait with parallel queries,
I noticed that the function is not defined as parallel safe.

I am not very familiar with the criteria for whether a function should be
parallel safe, but I found that the following functions are all defined as
parallel safe:

pg_promote
pg_terminate_backend(integer)
pg_sleep*

Is there a reason why pg_terminate_backend_and_wait is not parallel safe?
(I'm sorry if I missed something in previous mails.)

Best regards,
houzj




Re: Single transaction in the tablesync worker?

2020-12-04 Thread Amit Kapila
On Fri, Dec 4, 2020 at 10:29 AM Amit Kapila  wrote:
>
> On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer
>  wrote:
> >
> > On Thu, 3 Dec 2020 at 17:25, Amit Kapila  wrote:
> >
> > > Is there any fundamental problem if
> > > we commit the transaction after initial copy and slot creation in
> > > LogicalRepSyncTableStart and then allow the apply of transactions as
> > > it happens in apply worker?
> >
> > No fundamental problem. Both approaches are fine. Committing the
> > initial copy then doing the rest in individual txns means an
> > incomplete sync state for the table becomes visible, which may not be
> > ideal. Ideally we'd do something like sync the data into a clone of
> > the table then swap the table relfilenodes out once we're synced up.
> >
> > IMO the main advantage of committing as we go is that it would let us
> > use a non-temporary slot and support recovering an incomplete sync and
> > finishing it after interruption by connection loss, crash, etc. That
> > would be advantageous for big table syncs or where the sync has lots
> > of lag to replay. But it means we have to remember sync states, and
> > give users a way to cancel/abort them. Otherwise forgotten temp slots
> > for syncs will cause a mess on the upstream.
> >
> > It also allows the sync slot to advance, freeing any held upstream
> > resources before the whole sync is done, which is good if the upstream
> > is busy and generating lots of WAL.
> >
> > Finally, committing as we go means we won't exceed the cid increment
> > limit in a single txn.
> >
>
>
> Yeah, all these are advantages of processing
> transaction-by-transaction. IIUC, we primarily need to do two things
> to achieve it: one is to have an additional state in the catalog (say
> "catchup") which says that the initial copy is done. Then we need
> to have a permanent slot with which we can track progress, so that
> after a restart (due to crash, connection break, etc.) we can start
> from the appropriate position.
>
> Apart from the above, I think with the current design of tablesync we
> can see partial data of transactions because we allow all the
> tablesync workers to run in parallel. Consider the scenario below:
>
..
..
>
> Basically, the results for Tx1, Tx2, Tx3 are visible for mytbl2 but
> not for mytbl1. To reproduce this I have stopped the tablesync workers
> (via debugger) for mytbl1 and mytbl2 in LogicalRepSyncTableStart
> before it changes the relstate to SUBREL_STATE_SYNCWAIT. Then allowed
> Tx2 and Tx3 to be processed by apply worker and then allowed tablesync
> worker for mytbl2 to proceed. After that, I can see the above state.
>
> Now, won't this behavior be considered a transaction inconsistency,
> where partial transaction data or later transaction data is visible? I
> don't think we can have such a situation on the master (publisher)
> node or on a physical standby.
>

On briefly checking the pglogical code [1], it seems this problem
won't be there in pglogical, because it appears to first copy all the
tables (via pglogical_sync_table) in one process and then catch up with
the apply worker in a transaction-by-transaction manner. Am I reading
it correctly? If so, why did we follow a different approach for the
in-core solution? Or is it that pglogical has improved over time
but all the improvements can't be implemented in-core because of some
missing features?

[1] - https://github.com/2ndQuadrant/pglogical

-- 
With Regards,
Amit Kapila.




Re: Remove incorrect assertion in reorderbuffer.c.

2020-12-04 Thread Amit Kapila
On Fri, Dec 4, 2020 at 11:19 AM Dilip Kumar  wrote:
>
> On Thu, Dec 3, 2020 at 5:33 PM Amit Kapila  wrote:
> >
> > We start recording changes in ReorderBufferTXN even before we reach
> > SNAPBUILD_CONSISTENT state so that if the commit is encountered after
> > reaching that we should be able to send the changes of the entire
> > transaction. Now, while recording changes if the reorder buffer memory
> > has exceeded logical_decoding_work_mem then we can start streaming if
> > it is allowed and we haven't yet streamed that data. However, we must
> > not allow streaming to start unless the snapshot has reached
> > SNAPBUILD_CONSISTENT state.
> >
> > I have also improved the comments atop ReorderBufferResetTXN to
> > mention the case when we need to continue streaming after getting an
> > error.
> >
> > Attached patch for the above changes.
> >
> > Thoughts?
>
> LGTM.
>

Thanks for the review, Pushed!

-- 
With Regards,
Amit Kapila.
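For illustration, the guard described in the quoted summary could look roughly
like this — a sketch based on the description above, not necessarily the
pushed commit:

    /* Sketch: called before starting to stream a transaction's changes. */
    static bool
    ReorderBufferCanStartStreaming(ReorderBuffer *rb)
    {
        LogicalDecodingContext *ctx = rb->private_data;
        SnapBuild  *builder = ctx->snapshot_builder;

        /*
         * Streaming must not begin until the snapshot builder has reached a
         * consistent state, even if logical_decoding_work_mem is exceeded.
         */
        if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
            return false;

        return true;            /* other streaming preconditions elided */
    }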




Re: pg_stat_statements oddity with track = all

2020-12-04 Thread Sergei Kornilov
Hello

Seems we also need to change PGSS_FILE_HEADER.

regards, Sergei
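For context, PGSS_FILE_HEADER is the magic number written at the start of
pg_stat_statements' saved-stats file; bumping it causes files written with the
old entry layout to be rejected at load time. A sketch (the constant's value
is arbitrary, conventionally derived from the date of the change):

    /* Magic number identifying the on-disk stats file format */
    static const uint32 PGSS_FILE_HEADER = 0x20201204;  /* bumped for the new
                                                         * toplevel flag so
                                                         * stale files are
                                                         * discarded */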




Re: A new function to wait for the backend exit after termination

2020-12-04 Thread Bharath Rupireddy
On Fri, Dec 4, 2020 at 2:02 PM Hou, Zhijie  wrote:
>
> Hi,
>
> While testing pg_terminate_backend_and_wait with parallel queries,
> I noticed that the function is not defined as parallel safe.
>
> I am not very familiar with the criteria for whether a function should be
> parallel safe, but I found that the following functions are all defined as
> parallel safe:
>
> pg_promote
> pg_terminate_backend(integer)
> pg_sleep*
>
> Is there a reason why pg_terminate_backend_and_wait is not parallel safe?
> (I'm sorry if I missed something in previous mails.)
>

I'm not quite sure of a use case where the existing pg_terminate_backend(),
or for that matter the new pg_terminate_backend_and_wait() and
pg_wait_backend(), would ever get used from parallel workers. Having
said that, I marked the new functions as parallel safe to keep them
consistent with the existing pg_terminate_backend().

postgres=# select proparallel, proname, prosrc from pg_proc where
proname IN ('pg_wait_backend', 'pg_terminate_backend');
 proparallel |   proname|prosrc
-+--+---
 s   | pg_terminate_backend | pg_terminate_backend
 s   | pg_wait_backend  | pg_wait_backend
 s   | pg_terminate_backend | pg_terminate_backend_and_wait
(3 rows)

Attaching v6 patch. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v6-0001-pg_terminate_backend-with-wait-timeout-and-pg_wai.patch
Description: Binary data


Re: A new function to wait for the backend exit after termination

2020-12-04 Thread hou zhijie
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Thanks for the new patch; it LGTM and works as expected.

The new status of this patch is: Ready for Committer


Re: On login trigger: take three

2020-12-04 Thread Konstantin Knizhnik




On 04.12.2020 3:50, Greg Nancarrow wrote:

On Tue, Sep 15, 2020 at 2:12 AM Pavel Stehule  wrote:

It is always possible to log in by disabling startup triggers using the
disable_session_start_trigger GUC:

psql "dbname=postgres options='-c disable_session_start_trigger=true'"


Sure, I know. But this behavior can be a very unpleasant surprise, and my
question is whether it can be fixed.  Having to set custom libpq variables can
be a showstopper for people who use pgAdmin.


Hi,

I thought in the case of using pgAdmin (assuming you can connect as
superuser to a database, say the default "postgres" maintenance
database, that doesn't have an EVENT TRIGGER defined for the
session_start event) you could issue the query "ALTER SYSTEM SET
disable_session_start_trigger TO true;"  and then reload the
configuration?
As far as I understand, Pavel's concern was about the case where a superuser
defines a faulty login trigger that prevents all users, including himself,
from logging in to the system. Right now the solution to this problem is to
include "options='-c disable_session_start_trigger=true'" in the connection
string.

I do not know whether that can be done with pgAdmin.


Anyway, I am wondering if this patch is still being actively developed/improved?


I do not know what else has to be improved.
If you, Pavel, or anybody else has suggestions, please let me know.


Regarding the last-posted patch, I'd like to give some feedback. I
found that the documentation part wouldn't build because of errors in
the SGML tags. There are some grammatical errors too, and some minor
inconsistencies with the current documentation, and some descriptions
could be improved. I think that a colon separator should be added to
the NOTICE message for superuser, so it's clear exactly where the text
of the underlying error message starts. Also, I think that
"client_connection" is perhaps a better and more intuitive event name
than "session_start", or the suggested "user_backend_start".
I've therefore attached an updated patch with these suggested minor
improvements, please take a look and see what you think (please
compare with the original patch).


Thank you very much for finding the problems, and even more for fixing
them and providing your version of the patch.

I have nothing against renaming "session_start" to "client_connection".

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: pg_stat_statements oddity with track = all

2020-12-04 Thread Julien Rouhaud
On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:
> Hello
> 
> Seems we also need to change PGSS_FILE_HEADER.

Indeed, thanks!  v2 attached.
From 1da24926d9645ee997aabd2907482a29332e3729 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 4 Dec 2020 13:33:51 +0800
Subject: [PATCH v2] Add a bool toplevel column to pg_stat_statements.

As top level statements include resource consumption for underlying statements,
it's not possible to compute the total resource consumption accurately.  Fix
that by adding a new toplevel boolean field that indicates whether the counters
were cumulated for queries executed at top level or not.

It can lead to more entries being stored for the same workload if
pg_stat_statements.track is set to all, but it seems unlikely to have the same
queries being executed both at top level and as nested statements.
---
 contrib/pg_stat_statements/Makefile   |  3 +-
 .../expected/pg_stat_statements.out   | 40 +
 .../pg_stat_statements--1.9--1.10.sql | 57 +++
 .../pg_stat_statements/pg_stat_statements.c   | 44 +++---
 .../pg_stat_statements.control|  2 +-
 .../sql/pg_stat_statements.sql| 21 +++
 doc/src/sgml/pgstatstatements.sgml|  9 +++
 7 files changed, 166 insertions(+), 10 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 3ec627b956..cab4f626ad 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..fb97f68737 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info;
0
 (1 row)
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+ query | toplevel | plans | calls 
+---+--+---+---
+ DELETE FROM test  | t| 1 | 1
+ DO $$+| t| 0 | 1
+ BEGIN+|  |   | 
+ DELETE FROM test;+|  |   | 
+ END; +|  |   | 
+ $$ LANGUAGE plpgsql   |  |   | 
+(2 rows)
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+ query | toplevel | plans | calls 
+---+--+---+---
+ DELETE FROM test  | f| 1 | 1
+ DELETE FROM test  | t| 2 | 2
+ DO $$+| t| 0 | 2
+ BEGIN+|  |   | 
+ DELETE FROM test;+|  |   | 
+ END; +|  |   | 
+ $$ LANGUAGE plpgsql   |  |   | 
+(3 rows)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 00..f97d16497d
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
@@ -0,0 +1,57 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+OUT userid oid,
+OUT dbid oid,
+OUT toplevel bool,
+OUT queryid bigint,
+OUT query text,
+OUT plans int8,
+  

Re: Get memory contexts of an arbitrary backend process

2020-12-04 Thread torikoshia

On 2020-12-03 10:36, Tom Lane wrote:

Fujii Masao  writes:
I'm starting to study how this feature behaves. At first, when I executed
the following query, the function never returned. ISTM that since
the autovacuum launcher cannot respond to the request for a memory
contexts dump, the function keeps waiting forever. Is this a bug?
Probably we should exclude non-backend processes from the target
processes to dump? Sorry if this was already discussed.



 SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;


Thanks for trying it!

It was not discussed explicitly; I was going to do it later,
as this comment notes:


+   /* TODO: Check also whether backend or not. */




FWIW, I think this patch is fundamentally unsafe.  It's got a
lot of the same problems that I complained about w.r.t. the
nearby proposal to allow cross-backend stack trace dumping.
It does avoid the trap of thinking that it can do work in
a signal handler, but instead it supposes that it can do
work involving very high-level objects such as shared hash tables
in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
never going to be safe: the only real expectation the system
has is that CHECK_FOR_INTERRUPTS is called at places where our
state is sane enough that a transaction abort can clean up.
Trying to do things like taking LWLocks is going to lead to
deadlocks or worse.  We need not even get into the hard questions,
such as what happens when one process or the other exits
unexpectedly.


Thanks for reviewing!

I may be misunderstanding something, but the dumper works not at
CHECK_FOR_INTERRUPTS but during client reads, i.e., in
ProcessClientReadInterrupt().

Is that also unsafe?


BTW, since there was a comment that the shared hash table used too much
memory, I'm now rewriting this patch to use a simpler static shared
memory struct instead of the shared hash table.


I also find the idea that this should be the same SQL function
as pg_get_backend_memory_contexts to be a seriously bad decision.
That means that it's not possible to GRANT the right to examine
only your own process's memory --- with this proposal, that means
granting the right to inspect every other process as well.

Beyond that, the fact that there's no way to restrict the capability
to just, say, other processes owned by the same user means that
it's not really safe to GRANT to non-superusers anyway.  Even with
such a restriction added, things are problematic, since for example
it would be possible to inquire into the workings of a
security-definer function executing in another process that
nominally is owned by your user.


I'm going to change the function name and restrict execution to
superusers. Is that enough?
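A minimal sketch of the usual superuser gate, assuming the function keeps a
PG_FUNCTION_ARGS signature (the function name here is hypothetical):

    Datum
    pg_get_target_memory_contexts(PG_FUNCTION_ARGS)    /* hypothetical name */
    {
        if (!superuser())
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                     errmsg("must be superuser to inspect another backend's memory contexts")));

        /* ... request the dump and wait for the target process ... */
        PG_RETURN_VOID();
    }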


Regards,




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-12-04 Thread Bharath Rupireddy
On Fri, Dec 4, 2020 at 1:46 PM Bharath Rupireddy
 wrote:
>
> On Fri, Dec 4, 2020 at 11:49 AM Fujii Masao  
> wrote:
> >
> > > Attaching the v2 patch set. Please review it further.
> >
> > Regarding the 0001 patch, shouldn't we add a function that returns
> > information about the cached connections, like dblink_get_connections(),
> > together with the 0001 patch? Otherwise it's not easy for users to
> > see how many cached connections there are and determine whether to
> > disconnect them or not. Sorry if this was already discussed before.
> >
>
> Thanks for bringing this up. This is exactly what I was thinking a few
> days back. Say we add a new function postgres_fdw_get_connections() that
> returns an array of server names whose connections exist in the cache.
> Without such a function, the user may not know how many connections
> this backend holds until they check manually on the remote server.
>
> Thoughts? If okay, I can code the function in the 0001 patch.
>

Added a new function postgres_fdw_get_connections() to the 0001 patch,
which returns a list of server names for which an open and active
connection exists.

Attaching v3 patch set, please review it further.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 0ededddba3ee80a968d51df8283525ff962440da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 4 Dec 2020 16:22:06 +0530
Subject: [PATCH v3] postgres_fdw connection cache discard tests and
 documentation

---
 .../postgres_fdw/expected/postgres_fdw.out| 202 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql | 118 ++
 doc/src/sgml/postgres-fdw.sgml| 132 
 3 files changed, 451 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06358..692da1c606 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
 END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, keep_connection
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9035,3 +9035,203 @@ ERROR:  08006
 COMMIT;
 -- Clean up
 DROP PROCEDURE terminate_backend_and_wait(text);
+-- ===
+-- disconnect connections that are cached/kept by the local session
+-- ===
+-- postgres_fdw_get_connections returns an array with elements in a
+-- machine-dependent ordering, so we must resort to unnesting and sorting for a
+-- stable result.
+CREATE FUNCTION fdw_unnest(anyarray) RETURNS SETOF anyelement
+LANGUAGE SQL STRICT IMMUTABLE AS $$
+SELECT $1[i] FROM generate_series(array_lower($1,1), array_upper($1,1)) AS i
+$$;
+-- Change application names of remote connections to special ones so that we
+-- can easily check for their existence.
+ALTER SERVER loopback OPTIONS (SET application_name 'fdw_disconnect_cached_conn_1');
+ALTER SERVER loopback2 OPTIONS (application_name 'fdw_disconnect_cached_conn_2');
+-- By default, the connections associated with foreign server are cached i.e.
+-- keep_connection option is on. Set it to off.
+ALTER SERVER loopback OPTIONS (keep_connection 'off');
+-- loopback server connection is closed by the local session at the end of xact
+-- as the keep_connection was set to off.
+SELECT 1 FROM ft1 LIMIT 1;
+ ?column? 
+--
+1
+(1 row)
+
+-- Connection should not exist.
+SELECT application_name FROM pg_stat_activity
+	WHERE application_name = 'fdw_disconnect_cached_conn_1';
+ application_name 
+--
+(0 rows)
+
+-- loopback2 server connection is cached by the local session as the
+-- keep_connection is on.
+SELECT 1 FROM ft6 LIMIT 1;
+ ?column

Re: [Patch] Optimize dropping of relation buffers using dlist

2020-12-04 Thread Amit Kapila
On Fri, Nov 27, 2020 at 11:36 AM Kyotaro Horiguchi
 wrote:
>
> At Fri, 27 Nov 2020 02:19:57 +, "k.jami...@fujitsu.com" 
>  wrote in
> > > From: Kyotaro Horiguchi 
> > > Hello, Kirk. Thank you for the new version.
> >
> > Hi, Horiguchi-san. Thank you for your very helpful feedback.
> > I'm updating the patches addressing those.
> >
> > > +   if (!smgrexists(rels[i], j))
> > > +   continue;
> > > +
> > > +   /* Get the number of blocks for a relation's fork */
> > > +   blocks[i][numForks] = smgrnblocks(rels[i], j,
> > > NULL);
> > >
> > > If we see a fork whose size is not cached, we must give up this
> > > optimization for all target relations.
> >
> > I did not use the "cached" flag in DropRelFileNodesAllBuffers; I used
> > InRecovery when deciding whether to optimize, for the following reasons:
> > XLogReadBufferExtended() calls smgrnblocks() to apply changes to relation
> > page contents. So in DropRelFileNodeBuffers(), XLogReadBufferExtended() is
> > called during VACUUM replay because VACUUM changes the page contents.
> > OTOH, TRUNCATE doesn't change the relation contents, it just truncates
> > relation pages without changing the page contents. So
> > XLogReadBufferExtended() is not called, and the "cached" flag will always
> > return false. I tested with the "cached" flag before, and it
>
> A bit different from the point, but if some tuples have been inserted
> to the truncated table, XLogReadBufferExtended() is called for the
> table and the length is cached.
>
> > always returned false, at least in DropRelFileNodesAllBuffers. Because of
> > this, we cannot use the cached flag in DropRelFileNodesAllBuffers().
> > However, I think we can still rely on smgrnblocks to get the file size as
> > long as we're InRecovery. That cached nblocks is still guaranteed to be
> > the maximum in shared buffers.
> > Thoughts?
>
> That means that we always think as if smgrnblocks returns a "cached" (or
> "safe") value during recovery, which is outside our current
> consensus. If we go that way, we don't need to consult the "cached" flag
> returned from smgrnblocks at all; it's enough to check only
> InRecovery.
>
> I got confused..
>
> We are relying on the "fact" that the first lseek() call of a
> (startup) process tells the truth.  We added an assertion to make sure
> that the cached value won't be cleared during recovery.  A possible
> remaining danger would be the closing of an smgr object of a live
> relation just after a file extension failure. I think we are assuming
> that that doesn't happen during recovery.  Although it seems true to
> me, I'm not confident.
>

Yeah, I also think it might not be worth depending upon whether smgr
close has been done before or not. I feel the current idea of using the
'cached' parameter is relatively solid and we should rely on that.
That also means that in DropRelFileNodesAllBuffers() we should rely on
the same; doing things differently in this regard will lead to
confusion. I agree that in some cases we might not get benefits, but it
is more important to be correct and keep the code consistent to avoid
introducing bugs now or in the future.
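For illustration, the calling pattern under discussion looks roughly like
this; note that the third "cached" out-parameter is the one proposed by this
patch set, not part of the in-tree smgrnblocks() signature:

    bool        cached;
    BlockNumber nblocks;

    nblocks = smgrnblocks(rels[i], forknum, &cached);
    if (!cached)
    {
        /*
         * The size didn't come from the relation-size cache, so it can't be
         * trusted here; give up the per-block optimization for all target
         * relations and fall back to scanning the whole buffer pool.
         */
        optimization_ok = false;    /* illustrative flag */
    }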

-- 
With Regards,
Amit Kapila.




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2020-12-04 Thread Dmitry Dolgov
> On Mon, Nov 30, 2020 at 04:54:39PM -0300, Alvaro Herrera wrote:
>
> In a previous thread [1], we added smarts so that processes running
> CREATE INDEX CONCURRENTLY would not wait for each other.
>
> One is adding the same to REINDEX CONCURRENTLY.  I've attached patch
> 0002 here which does that.
>
> Why 0002, you ask?  That's because preparatory patch 0001 simplifies
> ReindexRelationConcurrently somewhat by adding a struct to be used for the
> indexes that are going to be processed, instead of just a list of OIDs.
> This is a good change in itself because it lets us get rid of duplicative
> open/close of the index rels in order to obtain some info that's already
> known at the start.

Thanks! The patch looks pretty good to me, after reading it I only have
a few minor comments/questions:

* ReindexIndexInfo sounds a bit weird to me because of the repeated
  part, although I see there is already a similar ReindexIndexCallbackState,
  so it's probably fine.

* This one is mostly for me to understand. There are a couple of places
  with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the
  transaction only takes a snapshot to do some catalog manipulation'.
  But for some of them I don't immediately see in the relevant code
  anything related to snapshots. E.g. one in DefineIndex is followed by
  WaitForOlderSnapshots (which seems to only do waiting, not taking a
  snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid.
  Is taking a snapshot hidden somewhere there inside?




Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-04 Thread Bharath Rupireddy
On Fri, Dec 4, 2020 at 8:22 AM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Bharath Rupireddy 
> > 1) What happens if a partitioned table has a foreign partition along
> > with a few other local partitions[1]? Currently, if we try to set a
> > foreign table logged/unlogged, an "ERROR:  "" is not a table" is
> > thrown. This makes sense. With your patch we also see the same error,
> > but I'm not quite sure whether it is setting the parent and local
> > partitions to logged/unlogged and then throwing the error. Or do we
> > want the parent relation and all the local partitions to become
> > logged/unlogged, with a warning saying the foreign table cannot be
> > made logged/unlogged?
>
> Good point, thanks.  I think the foreign partitions should be ignored, 
> otherwise the user would have to ALTER each local partition manually or 
> detach the foreign partitions temporarily.  Done like that.
>

Agreed.

>
> > 2) What is the order in which the parent table and the partitions are
> > made logged/unlogged? Is it that first the parent table and then all
> > the partitions? What if an error occurs as in above point for a
> > foreign partition or a partition having foreign key reference to a
> > logged table? During the rollback of the txn, will we undo the setting
> > logged/unlogged?
>
> The parent is not changed because it does not have storage.
>

IMHO, we should also change the parent table. Say I have 2 local
partitions for a logged table, and I alter that table to unlogged
(with your patch, the parent table doesn't become unlogged whereas the
partitions do), and then I detach all the partitions for some reason.
Now, the user would think they set the parent table to unlogged, but
it didn't happen. So, I think we should also change the parent table's
logged/unlogged property, even though it may not have data associated
with it while it has all the partitions. Thoughts?

>
> If some partition has undesirable foreign key relationship, the entire ALTER 
> command fails.  All the effects are undone when the transaction rolls back.
>

Hmm.

>
> > 3) Say, I have two logged tables t1 and t2. I altered t1 to be
> > unlogged, and then I attach logged table t2 as a partition to t1, then
> > what's the expectation? While attaching the partition, should we also
> > make t2 as unlogged?
>
> The attached partition retains its property.
>

This may lead to a situation where a partitioned table has some
partitions logged and some unlogged. Will that lead to problems? I'm
not quite sure. I feel that while attaching a partition to a table, we
have to check whether the parent is logged/unlogged and set the
partition accordingly. While detaching a partition we do not need to
do anything, just retain the existing state. I read in the docs that
"Any indexes created on an unlogged table are automatically unlogged
as well". So, along similar lines, we should also automatically make
partitions logged/unlogged based on the parent table. We may also have
to think of cases where there is a multi-level parent-child
relationship, i.e., the table to which a partition is being attached
may itself be a child of another parent table. Thoughts?

>
> > 5) Coming to the patch, it is missing to add test cases.
>
> Yes, added in the revised patch.
>

I think we can add the foreign partition case to postgres_fdw.sql so
that the tests run as part of make check-world.

How about documenting all these points in alter_table.sgml,
create_table.sgml and create_foreign_table.sgml under the partition
section?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Custom compression methods

2020-12-04 Thread Dilip Kumar
On Tue, Dec 1, 2020 at 9:15 PM Dilip Kumar  wrote:
>
> On Tue, Dec 1, 2020 at 4:50 PM Dilip Kumar  wrote:
> >
> > On Sat, Nov 21, 2020 at 3:50 AM Robert Haas  wrote:
> >
> > While working on this comment I have doubts.
> >
> > > I wonder in passing about TOAST tables and materialized views, which
> > > are the other things that have storage. What gets stored for
> > > attcompression? For a TOAST table it probably doesn't matter much
> > > since TOAST table entries shouldn't ever be toasted themselves, so
> > > anything that doesn't crash is fine (but maybe we should test that
> > > trying to alter the compression properties of a TOAST table doesn't
> > > crash, for example).
> >
> > Yeah, for the toast table it doesn't matter, but I am not sure what
> > you mean by altering the compression method for the toast table. Do you
> > mean manually updating the pg_attribute tuple for the toast table to
> > set a different compression method?  Or is there some direct way to
> > alter the toast table?
> >
> >  For a materialized view it seems reasonable to
> > > want to set column properties, but I'm not quite sure how that works
> > > today for things like STORAGE anyway. If we do allow setting STORAGE
> > > or COMPRESSION for materialized view columns then dump-and-reload
> > > needs to preserve the values.
> >
> > I see that we allow setting STORAGE for a materialized view, but I am
> > not sure what the use case is.  Basically, the tuples are directly
> > selected from the host table and inserted into the materialized view
> > without checking the target and source storage types.  The behavior is
> > the same if you execute INSERT INTO dest_table SELECT * FROM
> > source_table: if the source_table attribute has extended storage and
> > the target table has plain storage, the value will still be inserted
> > directly into the target table without any conversion.  However, into a
> > table you can insert a new tuple and it will be stored as per the new
> > storage method, so that is still fine; but I don't know any use case
> > for the materialized view.  Now I am wondering what the behavior for
> > the materialized view should be.
> >
> > For the materialized view, can we have the same behavior as STORAGE?  I
> > think for the built-in compression methods that might not be a problem,
> > but for an external compression method, how can we handle the
> > dependency?  I mean, suppose that when the materialized view was created
> > the table had an external compression method "cm1", and we created the
> > materialized view based on that. Now if we alter the table, set a new
> > compression method, and force a table rewrite, what happens to the
> > tuples inside the materialized view?  The tuples are still compressed
> > with "cm1", but no attribute maintains a dependency on "cm1" (the
> > materialized view can point to any compression method), so dropping cm1
> > would be allowed.  So I think for the compression method we can treat
> > the materialized view the same as a table: we can allow setting the
> > compression method for the materialized view and always ensure that all
> > the tuples in the view are compressed with the current or a preserved
> > compression method.  So whenever we insert into the materialized view,
> > we should compare the datum's compression method with the target
> > compression method.

As per the offlist discussion with Robert, for materialized views and
tables we will always compress the value according to the target
attribute's compression method.  So if we are creating/refreshing a
materialized view and the attcompression of the target attribute
differs from the source table's, we will decompress the value and then
compress it again per the target table/view.
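A rough pseudo-C sketch of that rule; only VARATT_IS_COMPRESSED is the real
macro here, all lowercase helper names are hypothetical:

    /* on INSERT into the table/matview, for each varlena attribute: */
    if (VARATT_IS_COMPRESSED(DatumGetPointer(value)) &&
        datum_compression_method(value) != target_att->attcompression)
    {
        value = decompress_datum(value);    /* undo the source's compression */
        value = compress_datum(value, target_att->attcompression);
    }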

> >
> >
> > > +   /*
> > > +* Use default compression method if the existing compression 
> > > method is
> > > +* invalid but the new storage type is non plain storage.
> > > +*/
> > > +   if (!OidIsValid(attrtuple->attcompression) &&
> > > +   (newstorage != TYPSTORAGE_PLAIN))
> > > +   attrtuple->attcompression = DefaultCompressionOid;
> > >
> > > You have a few too many parens in there.
> > >
> > > I don't see a particularly good reason to treat plain and external
> > > differently.
> >
> > Yeah, I think they should be treated the same.
> >
> >  More generally, I think there's a question here about
> > > when we need an attribute to have a valid compression type and when we
> > > don't. If typstorage is plan or external, then there's no point in
> > > ever having a compression type and maybe we should even reject
> > > attempts to set one (but I'm not sure).
> >
> > I agree.
> >
> > > However, the attstorage is a
> > > different case. Suppose the column is created with extended storage
> > > and then later it's changed to plain. That's only a hint, so there may
> > > still be toasted values in that c

Re: convert elog(LOG) calls to ereport

2020-12-04 Thread Peter Eisentraut

On 2020-12-02 15:04, Alvaro Herrera wrote:

-   elog(LOG, "WSAIoctl(SIO_KEEPALIVE_VALS) failed: %ui",
-WSAGetLastError());
+   ereport(LOG,
+   (errmsg("WSAIoctl(SIO_KEEPALIVE_VALS) failed: %ui",
+   WSAGetLastError())));


Please take the opportunity to move the flag name out of the message in
this one, also.


done


I do wonder if it'd be a good idea to move the syscall
name itself out of the message, too; that would reduce the number of
messages to translate 50x to just "%s(%s) failed: %m" instead of one
message per distinct syscall.


Seems useful, but perhaps as a separate project.
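That is, call sites would collapse onto a single translatable message, along
these lines (a sketch, using a syscall that sets errno):

    ereport(LOG,
            (errmsg("%s(%s) failed: %m",
                    "setsockopt", "SO_KEEPALIVE")));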


Should fd.c messages do errcode_for_file_access() like elsewhere?


yes, done




Re: convert elog(LOG) calls to ereport

2020-12-04 Thread Peter Eisentraut

On 2020-12-03 08:02, Michael Paquier wrote:

+   else
+   ereport(LOG,
+   (errmsg("checkpoint starting:%s%s%s%s%s%s%s%s",

Would it be better to add a note for translators here, in short that
all those %s are options related to checkpoint/restartpoints?


done


The ones in geqo_erx.c look like debugging information, and the ones
in win32_shmem.c for segment creation are code paths only used by the
postmaster.


I dialed those back a bit.




Re: convert elog(LOG) calls to ereport

2020-12-04 Thread Peter Eisentraut

On 2020-12-03 08:55, Noah Misch wrote:

For the changes I didn't mention explicitly (most of them), I'm -0.5.  Many of
them "can't happen", use source code terms of art, and/or breach guideline
"Avoid mentioning called function names, either; instead say what the code was
trying to do" (https://www.postgresql.org/docs/devel/error-style-guide.html).


Thanks for the detailed feedback.  I dialed back some of the changes 
based on your feedback.





Re: Single transaction in the tablesync worker?

2020-12-04 Thread Ashutosh Bapat
On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila  wrote:
>
> On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
>  wrote:
> >
> > On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila  wrote:
> > >
> > > The tablesync worker in logical replication performs the table data
> > > sync in a single transaction which means it will copy the initial data
> > > and then catch up with apply worker in the same transaction. There is
> > > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > > sync in a single transaction.") saying so but I can't find the
> > > concrete theory behind the same. Is there any fundamental problem if
> > > we commit the transaction after initial copy and slot creation in
> > > LogicalRepSyncTableStart and then allow the apply of transactions as
> > > it happens in apply worker? I have tried doing so in the attached (a
> > > quick prototype to test) and didn't find any problems with regression
> > > tests. I have tried a few manual tests as well to see if it works and
> > > didn't find any problem. Now, it is quite possible that it is
> > > mandatory to do the way we are doing currently, or maybe something
> > > else is required to remove this requirement but I think we can do
> > > better with respect to comments in this area.
> >
> > If we commit the initial copy, the data up to the initial copy's
> > snapshot will be visible downstream. If we apply the changes by
> > committing changes per transaction, the data visible to the other
> > transactions will differ as the apply progresses.
> >
>
> It is not clear what you mean by the above.  The way you have written it,
> it appears you are saying that instead of copying the initial data, I am
> proposing to copy it transaction-by-transaction. But that is not the
> case. I am saying: copy the initial data using the REPEATABLE READ
> isolation level as we are doing now, commit it, and then process
> transaction-by-transaction till we reach the sync point (the point up to
> which the apply worker has already received the data).

Craig in his mail has clarified this. The changes after the initial
COPY will be visible before the table sync catches up.

>
> > You haven't
> > clarified whether we will respect the transaction boundaries in the
> > apply log or not. I assume we will.
> >
>
> It will be transaction-by-transaction.
>
> > Whereas if we apply all the
> > changes in one go, other transactions either see the data before
> > resync or after it without any intermediate states.
> >
>
> What is the problem even if the user is able to see the data after the
> initial copy?
>
> > That will not
> > violate consistency, I think.
> >
>
> I am not sure how consistency will be broken.

Some of the transactions applied by the apply worker may not have been
applied by the resync and vice versa. If the intermediate states of the
table resync worker are visible, this difference in applied
transactions will result in a loss of consistency if those transactions
change both the table being resynced and some other table in the same
transaction. The changes won't be atomically visible. Thinking more
about this, this problem exists today for a table being resynced, but
at least it's only the table being resynced that is behind the other
tables, so it's predictable.

-- 
Best Wishes,
Ashutosh Bapat




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2020-12-04 Thread Alvaro Herrera
On 2020-Dec-04, Dmitry Dolgov wrote:

> * This one is mostly for me to understand. There are couple of places
>   with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the
>   transaction only takes a snapshot to do some catalog manipulation'.
>   But for some of them I don't immediately see in the relevant code
>   anything related to snapshots. E.g. one in DefineIndex is followed by
>   WaitForOlderSnapshots (which seems to only do waiting, not taking a
>   snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid.
>   Is taking a snapshot hidden somewhere there inside?

Well, they're actually going to acquire an Xid, not a snapshot, so the
comment is slightly incorrect; I'll fix it, thanks for pointing that
out.  The point stands: because those transactions are of very short
duration (at least of very short duration after the point where the XID
is obtained), it's not necessary to set the PROC_IN_SAFE_IC flag, since
it won't cause any disruption to other processes.

It is possible that I copied the wrong comment in DefineIndex.  (I only
noticed that one after I had mulled over the ones in
ReindexRelationConcurrently, each of which is skipping setting the flag
for slightly different reasons.)
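For context, advertising the flag is cheap; a sketch of how a backend might
set it (the exact statusFlags bookkeeping is an assumption based on how
similar PROC_* flags are maintained):

    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->statusFlags |= PROC_IN_SAFE_IC;
    ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
    LWLockRelease(ProcArrayLock);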




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-04 Thread Tom Lane
Pavel Stehule  writes:
> I tested the last patch on my FC33 Lenovo T520 (i7) and I don't see a 15%
> slowdown either.  On my machine there is a slowdown of about 1.5-3%. I used
> your function arraytest.

After repeating the experiment a few times, I think I was misled
by ASLR variance (ie, hot code falling within or across cache
lines depending on where the backend executable gets loaded).
I'd tried a couple of postmaster restarts, but seemingly not
enough to expose the full variance in runtime that's due to that.
I do still see a 2% or so penalty when comparing best-case runtimes,
which is consistent with other people's reports.

However, 2% is still more than I want to pay for this feature,
and after studying the patchset awhile I don't think it's tried
hard at all on execution efficiency.  We should eliminate the
ExecEvalSubscriptingRef* interface layer altogether, and have
ExecInterpExpr dispatch directly to container-type-specific code,
so that we end up with approximately the same call depth as before.
With the patches shown below, we are (as best I can tell) right
about on par with the existing code's runtime.  This patch also gets
rid of a few more undesirable assumptions in the core executor ---
for instance, AFAICS there is no need for *any* hard-wired limit
on the number of subscripts within the core executor.  (What a
particular container type chooses to support is its business,
of course.)  We also need not back off on the specificity of error
messages, since the complaints that were in ExecEvalSubscriptingRef*
are now in container-type-specific code.

There were other things not to like about the way v35 chose to
refactor the executor support.  In particular, I don't think it
understood the point of having the EEOP_SBSREF_SUBSCRIPT steps,
which is to only transform the subscript Datums to internal form
once, even when we have to use them twice in OLD and ASSIGN steps.
Admittedly DatumGetInt32 is pretty cheap, but this cannot be said
of reading text datums as the 0003 patch wishes to do.  (BTW, 0003 is
seriously buggy in that regard, as it's failing to cope with toasted
or even short-header inputs.  We really don't want to detoast twice,
so that has to be dealt with in the SUBSCRIPT step.)  I also felt that
processing the subscripts one-at-a-time wasn't necessarily a great
solution, as one can imagine container semantics where they need to be
handled more holistically.  So I replaced EEOP_SBSREF_SUBSCRIPT with
EEOP_SBSREF_SUBSCRIPTS, which is executed just once after all the
subscript Datums have been collected.  (This does mean that we lose
the optimization of short-circuiting as soon as we've found a NULL
subscript, but I'm not troubled by that.  I note in particular that
the core code shouldn't be forcing a particular view of what to do
with null subscripts onto all container types.)

The two patches attached cover the same territory as v35's 0001 and
0002, but I split it up differently because I didn't see much point
in a division that has a nonfunctional code state in the middle.
0001 below is just concerned with revising things enough so that the
core executor doesn't have any assumption about a maximum number of
subscripts.  Then 0002 incorporates what was in v35 0001+0002, revised
with what seems to me a better set of execution APIs.

There are a bunch of loose ends yet, the first three introduced
by me and the rest being pre-existing problems:

* I don't have a lot of confidence in the LLVM changes --- they seem
to work, but I don't really understand that code, and in particular
I don't understand the difference between TypeParamBool and
TypeStorageBool.  So there might be something subtly wrong with the
code generation for EEOP_SBSREF_SUBSCRIPTS.

* As things stand here, there's no difference among the expression
step types EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN, and EEOP_SBSREF_FETCH;
they dispatch to different support routines but the core executor's
behavior is identical.  So we could fold them all into one step type,
and lose nothing except perhaps debugging visibility.  Should we do
that, or keep them separate?

* I've not rebased v35-0003 and later onto this design, and don't
intend to do so myself.

* The patchset adds a CREATE TYPE option, but fails to provide any
pg_dump support for that option.  (There's no test coverage either.
Maybe further on, we should extend hstore or another contrib type
to have subscripting support, if only to have testing of that?)

* CREATE TYPE fails to create a dependency from a type to its
subscripting function.  (Related to which, the changes to the
GenerateTypeDependencies call in TypeShellMake are surely wrong.)

* findTypeSubscriptingFunction contains dead code (not to mention sadly
incorrect comments).

* What is refnestedfunc?  That sure seems to be dead code.

* I'm not on board with including refindexprslice in the transformed
expression, either.  AFAICS that is the untransformed subscript list,
which has *no* business being included in the finis

Re: Add session statistics to pg_stat_database

2020-12-04 Thread Laurenz Albe
On Thu, 2020-12-03 at 13:22 +0100, Laurenz Albe wrote:
> > Basically, that would change pgStatSessionDisconnectedNormally into instead 
> > being an
> > enum of reasons, which could be normal disconnect, abnormal disconnect and 
> > admin.
> > And we'd track all those three as separate numbers in the stats file, 
> > meaning we could
> > then calculate the crash by subtracting all three from the total number of 
> > sessions?
> 
> I think at least "closed by admin" might be interesting; I'll have a look.
> I don't think we have to specifically count "closed by normal disconnect", 
> because
> that should be the rule and could be more or less deduced from the other 
> numbers
> (with the uncertainty mentioned above).
> 
> > (Let me know if you think the idea could work and would prefer it if I 
> > worked up a
> > complete suggestion based on it rather than just spitting ideas)
> 
> Thanks for the offer, and I'll get back to it if I get stuck.

Ok, I could use a pointer.

I am considering these cases:

1) client just went away (currently "aborted")
2) death by FATAL error
3) killed by the administrator (or shutdown)

What is a good place in the code to detect 2) or 3)
so that I can set the state accordingly?
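For illustration, the three cases (plus a clean disconnect) could be modeled
as an enum along these lines — a hypothetical sketch, not a committed design:

    typedef enum SessionEndType
    {
        DISCONNECT_NOT_YET,     /* session still active */
        DISCONNECT_NORMAL,      /* client disconnected cleanly */
        DISCONNECT_CLIENT_EOF,  /* client just went away (case 1) */
        DISCONNECT_FATAL,       /* death by FATAL error (case 2) */
        DISCONNECT_KILLED       /* killed by administrator/shutdown (case 3) */
    } SessionEndType;

    /*
     * Case 2 could be flagged where FATAL errors are processed, and case 3
     * where the die/termination interrupt is handled; the stats message sent
     * at backend exit would then carry this value.
     */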

Yours,
Laurenz Albe





[PATCH] Add support for leading/trailing bytea trim()ing

2020-12-04 Thread Joel Jacobson
Dear hackers,

Let's say we want to strip leading or trailing zero bytes from a value
like '\xbeefbabe00'::bytea.

This is currently not supported, since trim() for bytea values only supports
the BOTH mode:

SELECT trim(LEADING '\x00'::bytea FROM '\xbeefbabe00'::bytea);
ERROR:  function pg_catalog.ltrim(bytea, bytea) does not exist

The attached patch adds LEADING | TRAILING support for the bytea version of 
trim():

SELECT trim(LEADING '\x00'::bytea FROM '\xbeefbabe00'::bytea);
ltrim
--
\xbeefbabe00

SELECT trim(TRAILING '\x00'::bytea FROM '\xbeefbabe00'::bytea);
 rtrim

\xbeefbabe
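The underlying operation is simple; for reference, a self-contained sketch of
left-trimming arbitrary bytes in C (illustrative only, not the patch's code):

    #include <stddef.h>
    #include <string.h>

    /* Advance past any leading bytes of "s" that occur in "set". */
    static const char *
    ltrim_bytes(const char *s, size_t *len, const char *set, size_t setlen)
    {
        while (*len > 0 && memchr(set, (unsigned char) s[0], setlen) != NULL)
        {
            s++;
            (*len)--;
        }
        return s;
    }
    /* rtrim is symmetric: shrink *len while s[*len - 1] occurs in "set". */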

Best regards,

Joel Jacobson

leading-trailing-trim-bytea.patch
Description: Binary data


Re: [PATCH] Add support for leading/trailing bytea trim()ing

2020-12-04 Thread Tom Lane
"Joel Jacobson"  writes:
> The attached patch adds LEADING | TRAILING support for the bytea version of 
> trim():

No objection in principle, but you need to extend the code added by
commit 40c24bfef to know about these functions.

The grammar in the functions' descr strings seems a bit shaky too.

regards, tom lane




Re: Corner-case bug in pg_rewind

2020-12-04 Thread Heikki Linnakangas

On 04/12/2020 00:16, Heikki Linnakangas wrote:

On 03/12/2020 16:10, Heikki Linnakangas wrote:

On 02/12/2020 15:26, Ian Barwick wrote:

On 02/12/2020 20:13, Heikki Linnakangas wrote:

Attached are two patches. The first patch is your original patch, unmodified
(except for a cosmetic rename of the test file). The second patch builds on
that, demonstrating and fixing the issue I mentioned. It took me a while to
create a repro for it, it's easily masked by incidental full-page writes or
because rows created by XIDs that are not marked as committed on the other
timeline are invisible, but succeeded at last.


Aha, many thanks. I wasn't entirely sure what I was looking for there and
recently haven't had the time or energy to dig any further.


Ok, pushed and backpatched this now.


The buildfarm is reporting sporadic failures in the new regression test.
I suspect it's because of timing issues, where a server is promoted or
shut down before some data has been replicated. I'll fix that tomorrow
morning.


Fixed, I hope. It took me a while to backpatch, because small
differences were needed in almost all versions, since some helpful TAP
test helpers, like waiting for a standby to catch up, are not available
in backbranches.


There was one curious difference between versions 9.6 and 10. In v10, 
you can perform a "clean switchover" like this:


1. Shut down primary (A) with "pg_ctl -m fast".

2. Promote the standby (B) with "pg_ctl promote".

3. Reconfigure the old primary (A) as a standby, by creating 
recovery.conf that points to the promoted server, and start it up.


But on 9.6, that leads to an error on the repurposed primary server (A):

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 
before current recovery point 0/3A0


It's not clear to me why that is. It seems that the primary generates 
some WAL at shutdown that doesn't get replicated before the shutdown 
completes. Or the standby doesn't replay that WAL before it's promoted. 
But we have supported "clean switchover" since 9.1, see commit 
985bd7d497. When you shut down the primary, it should wait until all the 
WAL has been replicated, including the shutdown checkpoint.


Perhaps I was just doing it wrong in the test. Or maybe there's a 
genuine bug there that was fixed in v10. I worked around it in the 
test by re-initializing the old primary from a backup instead of just 
reconfiguring it as a standby; that's good enough for this 
particular test, so I'm not planning to dig deeper into it myself.


- Heikki




Re: [PATCH] Covering SPGiST index

2020-12-04 Thread Tom Lane
Pavel Borisov  writes:
> I've noticed CI error due to the fact that MSVC doesn't allow arrays of
> flexible size arrays and made a fix for the issue.
> Also did some minor refinement in tuple creation.
> PFA v12 of a patch.

The cfbot's still unhappy --- looks like you omitted a file from the
patch?

regards, tom lane




Re: pg_dump, ATTACH, and independently restorable child partitions

2020-12-04 Thread Tom Lane
Justin Pryzby  writes:
> [ v4-0001-pg_dump-output-separate-object-for-ALTER-TABLE.AT.patch ]

The cfbot is being picky about this:

pg_dump.c: In function ‘dumpTableAttach’:
pg_dump.c:15600:42: error: suggest parentheses around comparison in operand of ‘&’ [-Werror=parentheses]
  if (attachinfo->partitionTbl->dobj.dump & DUMP_COMPONENT_DEFINITION == 0)
                                            ^
cc1: all warnings being treated as errors

which if I've got the precedence straight is indeed a bug.

Personally I'd probably write

if (!(attachinfo->partitionTbl->dobj.dump & DUMP_COMPONENT_DEFINITION))

as it seems like a boolean test to me.
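
To make the precedence trap concrete, here is a minimal standalone sketch
(plain C semantics; the macro value is illustrative, not pg_dump's actual
definition):

    #include <stdio.h>

    #define DUMP_COMPONENT_DEFINITION (1 << 0)	/* illustrative value */

    int
    main(void)
    {
    	int		dump = DUMP_COMPONENT_DEFINITION;

    	/*
    	 * == binds tighter than &, so this parses as
    	 * dump & (DUMP_COMPONENT_DEFINITION == 0), i.e. dump & 0,
    	 * which is always false -- not the intended test.
    	 */
    	if (dump & DUMP_COMPONENT_DEFINITION == 0)
    		printf("never reached\n");

    	/* The boolean spelling suggested above: */
    	if (!(dump & DUMP_COMPONENT_DEFINITION))
    		printf("definition not dumped\n");
    	else
    		printf("definition dumped\n");	/* this is what prints */

    	return 0;
    }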

regards, tom lane




Re: Logical archiving

2020-12-04 Thread Euler Taveira
On Fri, 4 Dec 2020 at 04:33, Andrey Borodin  wrote:

>
> I was discussing problems of CDC with scientific community and they asked
> this simple question: "So you have efficient WAL archive on a very cheap
> storage, why don't you have a logical archive too?"
>

WAL archive doesn't process data; it just copies from one location into
another one. However, "logical archive" must process data.


> If we could just run archive command ```archive-tool wal-push
> 00090F2C00E1.logical``` with contents of logical replication -
> this would be super cool for OLAP. I'd prefer even avoid writing
> 00090F2C00E1.logical to disk, i.e. push data on stdio or
> something like that.
>

The most time consuming process is logical decoding, mainly due to long
running transactions. In order to minimize your issue, we should improve
the logical decoding mechanism. There was a discussion about allowing
logical decoding on the replica that would probably help your use case a
lot.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-12-04 Thread Tom Lane
Justin Pryzby  writes:
[ v24-0001-Document-historic-behavior-of-links-to-directori.patch ]

The cfbot is unhappy with one of the test cases you added:

@@ -259,9 +259,11 @@
 select path, filename, type from pg_ls_dir_metadata('.', true, false, true) where path!~'[0-9]|pg_internal.init|global.tmp' order by 1;
           path            |      filename      | type 
 --------------------------+--------------------+------
+ PG_VERSION               | PG_VERSION         | -
  base                     | base               | d
  base/pgsql_tmp           | pgsql_tmp          | d
  global                   | global             | d
+ global/config_exec_params| config_exec_params | -
  global/pg_control        | pg_control         | -
  global/pg_filenode.map   | pg_filenode.map    | -
  pg_commit_ts             | pg_commit_ts       | d
@@ -285,7 +287,6 @@
  pg_subtrans              | pg_subtrans        | d
  pg_tblspc                | pg_tblspc          | d
  pg_twophase              | pg_twophase        | d
- PG_VERSION               | PG_VERSION         | -
  pg_wal                   | pg_wal             | d
  pg_wal/archive_status    | archive_status     | d
  pg_xact                  | pg_xact            | d
@@ -293,7 +294,7 @@
  postgresql.conf          | postgresql.conf    | -
  postmaster.opts          | postmaster.opts    | -
  postmaster.pid           | postmaster.pid     | -
-(34 rows)
+(35 rows)

This shows that (a) the test is sensitive to prevailing collation and
(b) it's not filtering out enough temporary files.  Even if those things
were fixed, though, the test would break every time we added/removed
some PGDATA substructure.  Worse, it'd also break if say somebody had
edited postgresql.conf and left an editor backup file behind, or when
running in an installation where the configuration files are someplace
else.  I think this is way too fragile to be acceptable.

Maybe it could be salvaged by reversing the sense of the WHERE condition
so that instead of trying to blacklist stuff, you whitelist just a small
number of files that should certainly be there.

regards, tom lane




Re: [PATCH] Covering SPGiST index

2020-12-04 Thread Pavel Borisov
>
> The cfbot's still unhappy --- looks like you omitted a file from the
> patch?
>
You are right, thank you. Corrected this. PFA v13

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v13-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patch
Description: Binary data


Re: Logical archiving

2020-12-04 Thread Andrey Borodin
Hi Euler!

Thanks for your response.

> On 4 Dec 2020, at 22:14, Euler Taveira wrote:
> 
> On Fri, 4 Dec 2020 at 04:33, Andrey Borodin  wrote:
> 
> I was discussing problems of CDC with scientific community and they asked 
> this simple question: "So you have efficient WAL archive on a very cheap 
> storage, why don't you have a logical archive too?"
> 
> WAL archive doesn't process data; it just copies from one location into 
> another one. However, "logical archive" must process data.
WAL archiving does process data: it does compression, encryption and digesting. 
Only a minimal, impractical setup will copy data as-is. However, I agree that 
all processing is done outside postgres.

> If we could just run archive command ```archive-tool wal-push 
> 00090F2C00E1.logical``` with contents of logical replication - 
> this would be super cool for OLAP. I'd prefer even avoid writing 
> 00090F2C00E1.logical to disk, i.e. push data on stdio or 
> something like that.
> 
> The most time consuming process is logical decoding, mainly due to long 
> running transactions.
Currently I do not experience problems with high CPU utilisation.

> In order to minimize your issue, we should improve the logical decoding 
> mechanism.
No, the issue I'm facing comes from the fact that corner cases of failover are 
not solved properly for logical replication. Timelines, partial segments, 
archiving along with streaming, starting from arbitrary LSN (within available 
WAL), rewind, named restore points, cascade replication etc. All these nice 
things are there for WAL and are missing for LR. I'm just trying to find the 
shortest path through this to make CDC (change data capture) work.

> There was a discussion about allowing logical decoding on the replica that 
> would probably help your use case a lot.
I will look there more closely, thanks! But it's only part of a solution.


Best regards, Andrey Borodin.



Re: pg_dump, ATTACH, and independently restorable child partitions

2020-12-04 Thread Justin Pryzby
On Fri, Dec 04, 2020 at 12:13:05PM -0500, Tom Lane wrote:
> Justin Pryzby  writes:
> > [ v4-0001-pg_dump-output-separate-object-for-ALTER-TABLE.AT.patch ]
> 
> The cfbot is being picky about this:
> 
> pg_dump.c: In function ‘dumpTableAttach’:
> pg_dump.c:15600:42: error: suggest parentheses around comparison in operand of ‘&’ [-Werror=parentheses]
>   if (attachinfo->partitionTbl->dobj.dump & DUMP_COMPONENT_DEFINITION == 0)
>                                             ^
> cc1: all warnings being treated as errors
> 
> which if I've got the precedence straight is indeed a bug.

Oops - from a last-minute edit.
I missed it due to cfbot being slow and clogged up with duplicate entries.
This also adds/updates comments.

-- 
Justin
>From 67b5d61a45bdded00661c13c71f901750e4b48a9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 26 Nov 2020 22:02:07 -0600
Subject: [PATCH v5] pg_dump: output separate "object" for ALTER TABLE..ATTACH
 PARTITION

..allowing table partitions to be restored separately and independently, even
if there's an error attaching to parent table.
---
 src/bin/pg_dump/common.c   | 33 ++-
 src/bin/pg_dump/pg_dump.c  | 76 --
 src/bin/pg_dump/pg_dump.h  |  8 
 src/bin/pg_dump/pg_dump_sort.c | 48 +++--
 4 files changed, 122 insertions(+), 43 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 634ca86cfb..5c8aa80cc4 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -261,7 +261,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 
 /* flagInhTables -
  *	 Fill in parent link fields of tables for which we need that information,
- *	 and mark parents of target tables as interesting
+ *	 mark parents of target tables as interesting, and create
+ *	 TableAttachInfo objects for partitioned tables with appropriate
+ *	 dependency links.
  *
  * Note that only direct ancestors of targets are marked interesting.
  * This is sufficient; we don't much care whether they inherited their
@@ -320,6 +322,35 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 			for (j = 0; j < numParents; j++)
 parents[j]->interesting = true;
 		}
+
+		/* Dump ALTER TABLE .. ATTACH PARTITION */
+		if (tblinfo[i].dobj.dump && tblinfo[i].ispartition)
+		{
+			/* We don't even need to store this anywhere, but maybe there's
+			 * some utility in making a single, large allocation earlier ? */
+			TableAttachInfo *attachinfo = palloc(sizeof(*attachinfo));
+
+			attachinfo->dobj.objType = DO_TABLE_ATTACH;
+			attachinfo->dobj.catId.tableoid = 0;
+			attachinfo->dobj.catId.oid = 0;
+			AssignDumpId(&attachinfo->dobj);
+			attachinfo->dobj.name = pg_strdup(tblinfo[i].dobj.name);
+			attachinfo->dobj.namespace = tblinfo[i].dobj.namespace;
+			attachinfo->parentTbl = tblinfo[i].parents[0];
+			attachinfo->partitionTbl = &tblinfo[i];
+
+			/*
+			 * We must state the DO_TABLE_ATTACH object's dependencies
+			 * explicitly, since it will not match anything in pg_depend.
+			 *
+			 * Give it dependencies on both the partition table and the parent
+			 * table, so that it will not be executed till both of those
+			 * exist.  (There's no need to care what order those are created
+			 * in.)
+			 */
+			addObjectDependency(&attachinfo->dobj, tblinfo[i].dobj.dumpId);
+			addObjectDependency(&attachinfo->dobj, tblinfo[i].parents[0]->dobj.dumpId);
+		}
 	}
 }
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index dc1d41dd8d..1efc642280 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -201,6 +201,7 @@ static void dumpAgg(Archive *fout, AggInfo *agginfo);
 static void dumpTrigger(Archive *fout, TriggerInfo *tginfo);
 static void dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo);
 static void dumpTable(Archive *fout, TableInfo *tbinfo);
+static void dumpTableAttach(Archive *fout, TableAttachInfo *tbinfo);
 static void dumpTableSchema(Archive *fout, TableInfo *tbinfo);
 static void dumpAttrDef(Archive *fout, AttrDefInfo *adinfo);
 static void dumpSequence(Archive *fout, TableInfo *tbinfo);
@@ -10110,6 +10111,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_TABLE:
 			dumpTable(fout, (TableInfo *) dobj);
 			break;
+		case DO_TABLE_ATTACH:
+			dumpTableAttach(fout, (TableAttachInfo *) dobj);
+			break;
 		case DO_ATTRDEF:
 			dumpAttrDef(fout, (AttrDefInfo *) dobj);
 			break;
@@ -15573,6 +15577,55 @@ createDummyViewAsClause(Archive *fout, TableInfo *tbinfo)
 	return result;
 }
 
+/*
+ * dumpTableAttach
+ *	  write to fout the commands to attach a child partition
+ *
+ * For partitioned tables, emit the ATTACH PARTITION clause.  Note
+ * that we always want to create partitions this way instead of using
+ * CREATE TABLE .. PARTITION OF, mainly to preserve a possible column
+ * layout discrepancy with the parent, but also to ensure it gets the
+ * correct tablespace setting if it differs from the

Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-04 Thread Tom Lane
BTW, I had a thought about this patch, which I wanted to write down
before it disappears again (I'm not offering to code it right now).

I think we should split array_subscript_handler into two functions,
one for "regular" varlena arrays and one for the fixed-length
pseudo-array types like "name" and "point".  This needn't have a lot
of impact on the execution code.  In fact, for the first version both
handlers could just return the same set of method pointers, and then
if we feel like it we could tease apart the code paths later.  The
value of doing this is that then typsubshandler could be used as a
bulletproof designator of the type's semantics.  Instead of weird
implementation-dependent tests on typlen and so on, the rule could be
"it's a regular array if typsubshandler == F_ARRAY_SUBSCRIPT_HANDLER".

Later on, we could even allow the "fixed length" array semantics to
be applied to varlena types perhaps, so long as their contents are
just N copies of some fixed-size type.  The point here is that we
now have a tool for recognizing the subscripting semantics reliably,
instead of having to back into an understanding of what they are.
But for the tool to be useful, we don't want the same handler
implementing significantly different behaviors.

regards, tom lane




Re: walsender bug: stuck during shutdown

2020-12-04 Thread Alvaro Herrera
On 2020-Nov-26, Fujii Masao wrote:

> Yes, so the problem here is that walsender goes into the busy loop
> in that case. Seems this happens only in logical replication walsender.
> In physical replication walsender, WaitLatchOrSocket() in WalSndLoop()
> seems to work as expected and prevent it from entering into busy loop
> even in that case.
> 
>   /*
>* If postmaster asked us to stop, don't wait anymore.
>*
>* It's important to do this check after the recomputation of
>* RecentFlushPtr, so we can send all remaining data before 
> shutting
>* down.
>*/
>   if (got_STOPPING)
>   break;
> 
> The above code in WalSndWaitForWal() seems to cause this issue. But I've
> not come up with idea about how to fix yet.

With DEBUG1 I observe that walsender is getting a lot of 'r' messages
(standby reply) with all zeroes:

2020-12-01 21:01:24.100 -03 [15307] DEBUG:  write 0/0 flush 0/0 apply 0/0

However, while doing that I also observed that if I do send some
activity to the logical replication stream, with the provided program,
it will *still* have the 'write' pointer set to 0/0, and the 'flush'
pointer has moved forward to what was sent.  I'm not clear on what
causes the write pointer to move forward in logical replication.

Still, the previously proposed patch does resolve the problem in either
case.




Re: WIP: WAL prefetch (another approach)

2020-12-04 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> On Thu, Nov 19, 2020 at 10:00 AM Stephen Frost  wrote:
> > * Thomas Munro (thomas.mu...@gmail.com) wrote:
> > > Hmm.  Every time I try to think of a protocol change for the
> > > restore_command API that would be acceptable, I go around the same
> > > circle of thoughts about event flow and realise that what we really
> > > need for this is ... a WAL receiver...
> >
> > A WAL receiver, or an independent process which goes out ahead and
> > fetches WAL..?
> 
> What I really meant was: why would you want this over streaming rep?

I have to admit to being pretty confused by this question, and maybe
I'm just not understanding.  Why wouldn't this patch be helpful for
streaming replication too..?

If I follow correctly, this patch will scan ahead in the WAL and let
the kernel know that certain blocks will be needed soon.  Ideally,
though I don't think it does yet, we'd only do that for blocks that
aren't already in shared buffers, and only for non-FPIs (even better if
we can skip past pages for which we already, recently, passed an FPI).
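
For reference, the kernel hint being discussed bottoms out in
posix_fadvise(POSIX_FADV_WILLNEED) on platforms that have it; a minimal
sketch (the helper and its parameters are illustrative, not the patch's
actual code):

    #include <stddef.h>
    #include <fcntl.h>

    /*
     * Sketch: ask the kernel to start reading one block into its page
     * cache ahead of need.  PostgreSQL reaches this through its smgr/fd.c
     * wrappers on platforms where posix_fadvise is available.
     */
    static void
    prefetch_block(int fd, unsigned int blkno, size_t blcksz)
    {
    	(void) posix_fadvise(fd, (off_t) blkno * blcksz, (off_t) blcksz,
    						 POSIX_FADV_WILLNEED);
    }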

The biggest caveat here, it seems to me anyway, is that for this to
actually help you need to be running with checkpoints that are larger
than shared buffers, as otherwise all the pages we need will be in
shared buffers already, thanks to FPIs bringing them in, except when
running with hot standby, right?

In the hot standby case, other random pages could be getting pulled in
to answer user queries and therefore this would be quite helpful to
minimize the amount of time required to replay WAL, I would think.
Naturally, this isn't very interesting if we're just always able to
keep up with the primary, but that's certainly not always the case.

> I just noticed this thread proposing to retire pg_standby on that
> basis:
> 
> https://www.postgresql.org/message-id/flat/20201029024412.GP5380%40telsasoft.com
> 
> I'd be happy to see that land, to fix this problem with my plan.  But
> are there other people writing restore scripts that block that would
> expect them to work on PG14?

Ok, I think I finally get the concern that you're raising here-
basically that if a restore command was written to sit around and wait
for WAL segments to arrive, instead of just returning to PG and saying
"WAL segment not found", this would be a problem if we are running
out ahead of the applying process and asking for WAL.

The thing is- that's an outright broken restore command script in the
first place.  If PG is in standby mode, we'll ask again if we get an
error result indicating that the WAL file wasn't found.  The restore
command documentation is quite clear on this point:

The command will be asked for file names that are not present in the
archive; it must return nonzero when so asked.

There's no "it can wait around for the next file to show up if it wants
to" in there- it *must* return nonzero when asked for files that don't
exist.
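
To make that contract concrete, here is a deliberately minimal restore
command written as a C sketch (the program name and archive layout are
invented; the essential property is the immediate nonzero exit when the
segment is absent, never waiting for it to appear):

    /* restore_wal.c -- hypothetical restore_command helper:
     *   restore_command = 'restore_wal /archive/%f %p'
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
    	char	cmd[8192];

    	if (argc != 3)
    		return 2;

    	/*
    	 * Not in the archive yet?  Say so at once; the server will retry.
    	 * Blocking here to wait for the file is exactly what a restore
    	 * command must NOT do.
    	 */
    	if (access(argv[1], R_OK) != 0)
    		return 1;

    	snprintf(cmd, sizeof(cmd), "cp '%s' '%s'", argv[1], argv[2]);
    	return system(cmd) == 0 ? 0 : 1;
    }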

So, I don't think that we really need to stress over this.  The fact
that pg_standby offers options to have it wait, instead of just returning
a non-zero error code and letting the core code loop as it already does,
seems like a legacy thing from before we were doing that looping, and
those options probably should have been ripped out long ago...  Even
more reason to get rid of pg_standby tho, imv; we haven't been properly
adjusting it when we've been making changes to the core code, it seems.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Logical archiving

2020-12-04 Thread Euler Taveira
On Fri, 4 Dec 2020 at 14:36, Andrey Borodin  wrote:

> > The most time consuming process is logical decoding, mainly due to long
> > running transactions.
> Currently I do not experience problems with high CPU utilisation.

I'm wondering why the LSN isn't moving fast enough for your use case.

> > In order to minimize your issue, we should improve the logical decoding
> > mechanism.
> No, the issue I'm facing comes from the fact that corner cases of failover
> are not solved properly for logical replication. Timelines, partial
> segments, archiving along with streaming, starting from arbitrary LSN
> (within available WAL), rewind, named restore points, cascade replication
> etc. All these nice things are there for WAL and are missing for LR.
> I'm just trying to find the shortest path through this to make CDC (change
> data capture) work.

Craig started a thread a few days ago [1] that described some of these
issues and possible solutions [2]. The lack of HA with logical replication
reduces the number of solutions that could possibly use this technology.
Some of the facilities such as logical replication slots and replication
origin on failover-candidate subscribers should encourage users to adopt
such solutions.

[1]
https://www.postgresql.org/message-id/CAGRY4nx0-ZVnFJV5749QCqwmqBMkjQpcFkYY56a9U6Vf%2Bf7-7Q%40mail.gmail.com
[2]
https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: POC: Better infrastructure for automated testing of concurrency issues

2020-12-04 Thread Alvaro Herrera
On 2020-Nov-25, Alexander Korotkov wrote:

> In the view of above, I'd like to propose a POC patch, which implements new
> builtin infrastructure for reproduction of concurrency issues in automated
> test suites.  The general idea is so-called "stop events", which are
> special places in the code, where the execution could be stopped on some
> condition.  Stop event also exposes a set of parameters, encapsulated into
> jsonb value.  The condition over stop event parameters is defined using
> jsonpath language.

+1 for the idea.  I agree we have a need for something on this area;
there are *many* scenarios currently untested because of the lack of
what you call "stop points".  I don't know if jsonpath is the best way
to implement it, but at least it is readily available and it seems a
decent way to go at it.
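
To sketch the shape of the idea for readers who haven't opened the patch:
a toy stop event in plain C, with the jsonpath-over-jsonb condition
reduced to a simple string match (all names here are invented for
illustration; none are from the patch):

    #include <stdio.h>
    #include <string.h>

    /* which event the test suite has armed (the real thing would carry a
     * jsonpath condition over jsonb parameters instead of just a name) */
    static const char *armed_event = "after_page_split";

    static void
    stopevent(const char *name)
    {
    	if (armed_event && strcmp(armed_event, name) == 0)
    	{
    		printf("stopped at %s; waiting for the test to release us\n",
    			   name);
    		/* the real mechanism would sleep on a latch/condition here */
    	}
    }

    int
    main(void)
    {
    	stopevent("before_split");		/* not armed: falls through */
    	stopevent("after_page_split");	/* armed: would pause here */
    	return 0;
    }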





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-12-04 Thread Alexey Kondratov

On 2020-12-04 04:25, Justin Pryzby wrote:
> On Thu, Dec 03, 2020 at 04:12:53PM +0900, Michael Paquier wrote:
> > > +typedef struct ReindexParams {
> > > +  bool concurrently;
> > > +  bool verbose;
> > > +  bool missingok;
> > > +
> > > +  int options;/* bitmask of lowlevel REINDEXOPT_* */
> > > +} ReindexParams;
> > > +
> >
> > By moving everything into indexcmds.c, keeping ReindexParams within it
> > makes sense to me.  Now, there is no need for the three booleans
> > because options stores the same information, no?
>
> I liked the bools, but dropped them so the patch is smaller.

I had a look at 0001 and it looks mostly fine to me, except for some strange 
mixture of tabs/spaces in ExecReindex(). There are also a couple of 
meaningful comments:


-   options =
-   (verbose ? REINDEXOPT_VERBOSE : 0) |
-   (concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+   if (verbose)
+   params.options |= REINDEXOPT_VERBOSE;

Why do we need this intermediate 'verbose' variable here? We only use it 
once to set a bitmask. Maybe we can do it like this:


params.options |= defGetBoolean(opt) ?
REINDEXOPT_VERBOSE : 0;

See also the attached txt file with the diff (I wonder whether I can trick 
cfbot this way, so it does not apply the diff).


+   int options;/* bitmask of lowlevel REINDEXOPT_* */

I would prefer if the comment says '/* bitmask of ReindexOption */' as 
in the VacuumOptions, since citing the exact enum type makes it easier to 
navigate the source code.




> Regarding the REINDEX patch, I think this comment is misleading:
>
> |+ * Even if table was moved to new tablespace, normally toast cannot move.
> |  */
> |+ Oid toasttablespaceOid = allowSystemTableMods ? tablespaceOid : InvalidOid;
> | result |= reindex_relation(toast_relid, flags,
>
> I think it ought to say "Even if a table's indexes were moved to a new
> tablespace, its toast table's index is not normally moved"
> Right ?



Yes, I think so, we are dealing only with index tablespace changing 
here. Thanks for noticing.




> Also, I don't know whether we should check for GLOBALTABLESPACE_OID after
> calling get_tablespace_oid(), or in the lowlevel routines.  Note that
> reindex_relation is called during cluster/vacuum, and in the later patches,
> I moved the test from cluster() and ExecVacuum() to rebuild_relation().




IIRC, I wanted to do the GLOBALTABLESPACE_OID check as early as possible 
(just after getting the Oid), since it does not make sense to proceed 
further if the tablespace is set to that value. So initially there were a 
lot of duplicative GLOBALTABLESPACE_OID checks, since there were a lot 
of reindex entry points (index, relation, concurrently, etc.). Now we 
are going to have ExecReindex(), so there are far fewer entry points and 
in my opinion it is fine to keep this validation just after 
get_tablespace_oid().


However, this is mostly a sanity check. I can hardly imagine a lot of 
users trying to constantly move indexes to the global tablespace, so it 
is also OK to put this check deeper into the guts.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a27f8f9d83..0b1884815c 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2472,8 +2472,6 @@ void
 ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 {
 	ReindexParams		params = {0};
-	bool		verbose = false,
-concurrently = false;
 	ListCell   	*lc;
 	char	*tablespace = NULL;
 
@@ -2483,9 +2481,11 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 		DefElem*opt = (DefElem *) lfirst(lc);
 
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ?
+REINDEXOPT_VERBOSE : 0;
 		else if (strcmp(opt->defname, "concurrently") == 0)
-			concurrently = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ?
+REINDEXOPT_CONCURRENTLY : 0;
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespace = defGetString(opt);
 		else
@@ -2496,18 +2496,12 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	 parser_errposition(pstate, opt->location)));
 	}
 
-	if (verbose)
-		params.options |= REINDEXOPT_VERBOSE;
+	params.tablespaceOid = tablespace ?
+		get_tablespace_oid(tablespace, false) : InvalidOid;
 
-	if (concurrently)
-	{
-		params.options |= REINDEXOPT_CONCURRENTLY;
+	if (params.options & REINDEXOPT_CONCURRENTLY)
 		PreventInTransactionBlock(isTopLevel,
   "REINDEX CONCURRENTLY");
-	}
-
-	params.tablespaceOid = tablespace ?
-		get_tablespace_oid(tablespace, false) : InvalidOid;
 
 	switch (stmt->kind)
 	{


Re: [PATCH] Add support for leading/trailing bytea trim()ing

2020-12-04 Thread Joel Jacobson
On Fri, Dec 4, 2020, at 17:37, Tom Lane wrote:
>No objection in principle, but you need to extend the code added by
>commit 40c24bfef to know about these functions.

Oh, I see, that's a very nice improvement.

I've now added F_LTRIM_BYTEA_BYTEA and F_RTRIM_BYTEA_BYTEA to ruleutils.c 
accordingly,
and also added regress tests to create_view.sql.

>The grammar in the functions' descr strings seems a bit shaky too.

Not sure what you mean? The grammar is unchanged, since it was already 
supported,
but the overloaded bytea functions were missing.

I did, however, notice I forgot to update the description in func.sgml
for the bytea version of trim(). Maybe that's what you meant by shaky?
I've changed the description to read:

-bytes removed from the start
-and end of bytes.
+bytes removed from the start,
+the end, or both ends of bytes.
+(BOTH is the default)

New patch attached.

/Joel

leading-trailing-trim-bytea-002.patch
Description: Binary data


Re: WIP: WAL prefetch (another approach)

2020-12-04 Thread Andres Freund
Hi,

On 2020-12-04 13:27:38 -0500, Stephen Frost wrote:
> If I follow correctly, this patch will scan ahead in the WAL and let
> the kernel know that certain blocks will be needed soon.  Ideally,
> though I don't think it does yet, we'd only do that for blocks that
> aren't already in shared buffers, and only for non-FPIs (even better if
> we can skip past pages for which we already, recently, passed an FPI).

The patch uses PrefetchSharedBuffer(), which only initiates a prefetch
if the page isn't already in s_b.

And once we have AIO, it can actually initiate IO into s_b at that
point, rather than fetching it just into the kernel page cache.

Greetings,

Andres Freund




Re: POC: Better infrastructure for automated testing of concurrency issues

2020-12-04 Thread Peter Geoghegan
On Wed, Nov 25, 2020 at 6:11 AM Alexander Korotkov  wrote:
> While the postgres community does a great job on investigating and fixing the 
> problems, our ability to reproduce concurrency issues in the source code test 
> suites is limited.

+1. This seems really cool.

> For sure, evaluation of stop events causes a CPU overhead.  This is why it's 
> controlled by enable_stopevents GUC, which is off by default. I expect the 
> overhead with enable_stopevents = off shouldn't be observable.  Even if it 
> would be observable, we could enable stop events only by specific configure 
> parameter.  There is also trace_stopevents GUC, which traces all the stop 
> events to the log with debug2 level.

But why even risk adding noticeable overhead when "enable_stopevents =
off "? Even if it's a very small risk? We can still get most of the
benefit by enabling it only on certain builds and buildfarm animals.
It will be a bit annoying to not have stop events enabled in all
builds, but it avoids the problem of even having to think about the
overhead, now or in the future. I think that that trade-off is a good
one. Even if the performance trade-off is judged perfectly for the
first few tests you add, what are the chances that it will stay that
way as the infrastructure is used in more and more places? What if you
need to add a test to the back branches? Since we don't anticipate any
direct benefit for users (right?), I think that this question is
simple.

I am not arguing for not enabling stop events on standard builds
because the infrastructure isn't useful -- it's *very* useful. Useful
enough that it would be nice to be able to use it extensively without
really thinking about the performance hit each time. I know that I'll
be *far* more likely to use it if I don't have to waste time and
energy on that aspect every single time.

-- 
Peter Geoghegan




Re: Improper use about DatumGetInt32

2020-12-04 Thread Alvaro Herrera
On 2020-Dec-03, Peter Eisentraut wrote:

> On 2020-11-30 16:32, Alvaro Herrera wrote:
> > On 2020-Nov-30, Peter Eisentraut wrote:
> > 
> > > Patch updated this way.  I agree it's better that way.
> > 
> > Thanks, LGTM.
> 
> For a change like this, do we need to change the C symbol names, so that
> there is no misbehavior if the shared library is not updated at the same
> time as the extension is upgraded in SQL?

Good question.  One point is that since the changes to the arguments are
just in the way we read the values from the Datum C-values, there's no
actual ABI change.  So if I understand correctly, there's no danger of a
crash; there's just a danger of misinterpreting a value.

I don't know if it's possible to determine (at function execution time)
that we're running with the old extension version; if so it might
suffice to throw a warning but still have the SQL function run the same
C function.

If we really think that we ought to differentiate, then we could do what
pg_stat_statement does, and have a separate C function that's called
with the obsolete signature (pg_stat_statements_1_8 et al).
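
That pattern looks roughly like this (a sketch with hypothetical function
names; the point is that both SQL-visible versions funnel into one
implementation, so an un-upgraded SQL definition still finds a symbol with
matching argument handling):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    static Datum my_func_internal(FunctionCallInfo fcinfo, int api_version);

    PG_FUNCTION_INFO_V1(my_func_1_9);
    PG_FUNCTION_INFO_V1(my_func_1_8);

    Datum
    my_func_1_9(PG_FUNCTION_ARGS)
    {
    	/* current entry point, matching the new SQL signature */
    	return my_func_internal(fcinfo, 2);
    }

    Datum
    my_func_1_8(PG_FUNCTION_ARGS)
    {
    	/* kept so SQL definitions from the old extension version still work */
    	return my_func_internal(fcinfo, 1);
    }

    static Datum
    my_func_internal(FunctionCallInfo fcinfo, int api_version)
    {
    	/*
    	 * Version-specific argument reading would go here; the sketch just
    	 * returns which API version was used.
    	 */
    	(void) fcinfo;
    	PG_RETURN_INT32(api_version);
    }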




Re: Add docs stub for recovery.conf

2020-12-04 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Dec  2, 2020 at 08:07:47PM -0500, Isaac Morland wrote:
> > On Wed, 2 Dec 2020 at 19:33, David G. Johnston 
> > wrote:
> > 
> > On Wed, Dec 2, 2020 at 5:26 PM Bruce Momjian  wrote:
> > 
> > I think the ideal solution is to create a section for all the rename
> > cases and do all the redirects to that page.  The page would list 
> > the
> > old and new name for each item, and would link to the section for 
> > each
> > new item.
> > 
> > 
> > 
> > Nothing prevents us from doing that for simple renames.  For me, this
> > situation is not a simple rename and the proposed solution is 
> > appropriate
> > for what it is - changing the implementation details of an existing
> > feature.  We can do both - though the simple rename page doesn't seem
> > particularly appealing at first glance.
> > 
> > 
> > I for one do not like following a bookmark or link and then being 
> > redirected to
> > a generic page that doesn't relate to the specific link I was following. 
> > What
> > is being proposed here is not as bad as the usual, where all the old links
> > simply turn into redirects to the homepage, but it's still disorienting. I
> > would much rather each removed page be moved to an appendix (without 
> > renaming)
> > and edited to briefly explain what happened to the page and provide links to
> > the appropriate up-to-date page or pages.
> 
> Yes, that is pretty much the same thing I was suggesting, except that
> each rename has its own _original_ URL link, which I think is also
> acceptable.  My desire is for these items to all exist in one place, and
> an appendix of them seems fine.

Alright, so, to try and move this forward I'll list out (again) the
renames that we have in pgweb:

catalog-pg-replication-slots.html <-> view-pg-replication-slots.html
pgxlogdump.html <-> pgwaldump.html
app-pgresetxlog.html <-> app-pgresetwal.html
app-pgreceivexlog.html <-> app-pgreceivewal.html

(excluding the 'legal notice' one)

Bruce, are you saying that we need to take Craig's patch and then add to
it entries for all of the above, effectively removing the need for the
web page aliases and redirects?  If that was done, would that be
sufficient to get this committed?  Are there other things that people
can think of off-hand that we should include, I think Craig might have
mentioned something else earlier on..?  I don't think we should require
that someone troll through everything that ever existed, just to be
clear, as we can always add to this later if other things come up.  If
that's the expectation though, then someone needs to say so, in which
case I'll assume it's status quo unless/until someone steps up to do
that.

Obviously, I'd then have to adjust the patch that I proposed for default
roles, or move forward with it as-is, depending on what we end up doing
here.  I dislike what feels like a state of limbo for this right now
though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: WIP: WAL prefetch (another approach)

2020-12-04 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2020-12-04 13:27:38 -0500, Stephen Frost wrote:
> > If I follow correctly, this patch will scan ahead in the WAL and let
> > the kernel know that certain blocks will be needed soon.  Ideally,
> > though I don't think it does yet, we'd only do that for blocks that
> > aren't already in shared buffers, and only for non-FPIs (even better if
> > we can skip past pages for which we already, recently, passed an FPI).
> 
> The patch uses PrefetchSharedBuffer(), which only initiates a prefetch
> if the page isn't already in s_b.

Great, glad that's already been addressed in this, that's certainly
good.  I think I knew that and forgot it while composing that response
over the past rather busy week. :)

> And once we have AIO, it can actually initiate IO into s_b at that
> point, rather than fetching it just into the kernel page cache.

Sure.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Add docs stub for recovery.conf

2020-12-04 Thread David G. Johnston
On Fri, Dec 4, 2020 at 12:00 PM Stephen Frost  wrote:

> Obviously, I'd then have to adjust the patch that I proposed for default
> roles, or move forward with it as-is, depending on what we end up doing
> here.  I dislike what feels like a state of limbo for this right now
> though.
>
>
We have a committer + 1 in favor of status quo - or at least requiring that
a non-existing solution be created to move things forward. (Bruce, Daniel)
We have a committer + 3 that seem to agree that the proposed patch is
acceptable as presented. (Stephen, Craig, Isaac, David J.)
Anyone wish to update the above observation?

Stephen, are you willing to commit this with the support of the community
members who have spoken up here?

David J.


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-12-04 Thread Alvaro Herrera
On 2020-Dec-04, Michael Paquier wrote:

> VacuumOption does that since 6776142, and ClusterOption since 9ebe057,
> so switching ReindexOption to just match the two others still looks
> like the most consistent move.

9ebe057 goes to show why this is a bad idea, since it has this:

+typedef enum ClusterOption
+{
+   CLUOPT_RECHECK, /* recheck relation state */
+   CLUOPT_VERBOSE  /* print progress info */
+} ClusterOption;

and then you do things like

+   if ($2)
+   n->options |= CLUOPT_VERBOSE;

and then tests like

+   if ((options & VACOPT_VERBOSE) != 0)

Now if you were to ever define third and fourth values in that enum,
this would immediately start malfunctioning.
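
To spell out the hazard and the usual fix (a sketch only; the names are
invented to avoid clashing with the real enums):

    /*
     * Default numbering makes the first member 0 and the second 1:
     * a flag with value 0 can never be tested with &, and a third member
     * would get value 2, a fourth 3 (= 1|2), at which point
     * "options & BROKEN_FOURTH" matches whenever either earlier bit is set.
     */
    typedef enum BrokenOptionSketch
    {
    	BROKEN_RECHECK,				/* 0 -- useless as a bit flag */
    	BROKEN_VERBOSE,				/* 1 */
    	BROKEN_THIRD,				/* 2 */
    	BROKEN_FOURTH				/* 3 == BROKEN_VERBOSE | BROKEN_THIRD */
    } BrokenOptionSketch;

    /* The bitmask-safe spelling: every member is a distinct bit. */
    typedef enum SafeOptionSketch
    {
    	SAFE_RECHECK = 1 << 0,		/* recheck relation state */
    	SAFE_VERBOSE = 1 << 1,		/* print progress info */
    	SAFE_THIRD   = 1 << 2,
    	SAFE_FOURTH  = 1 << 3
    } SafeOptionSketch;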

FWIW I'm with Peter on this.




Re: Add docs stub for recovery.conf

2020-12-04 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Fri, Dec 4, 2020 at 12:00 PM Stephen Frost  wrote:
> > Obviously, I'd then have to adjust the patch that I proposed for default
> > roles, or move forward with it as-is, depending on what we end up doing
> > here.  I dislike what feels like a state of limbo for this right now
> > though.
>
> We have a committer + 1 in favor of status quo - or at least requiring that
> a non-existing solution be created to move things forward. (Bruce, Daniel)
> We have a committer + 3 that seem to agree that the proposed patch is
> acceptable as presented. (Stephen, Craig, Isaac, David J.)
> Anyone wish to update the above observation?
> 
> Stephen, are you will to commit this with the support of the community
> members who have spoken up here?

What I was hoping to achieve is consensus on a reasonably well bounded
solution.  I'm not going to push something that others are objecting to,
but if we can agree on the idea and what's needed for it to be
acceptable then I'm willing to work towards that, provided it doesn't
require going back 10 versions and looking at every single change.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-12-04 Thread Justin Pryzby
On Fri, Dec 04, 2020 at 09:40:31PM +0300, Alexey Kondratov wrote:
> >  I liked the bools, but dropped them so the patch is smaller.
> 
> I had a look on 0001 and it looks mostly fine to me except some strange
> mixture of tabs/spaces in the ExecReindex(). There is also a couple of
> meaningful comments:
> 
> - options =
> - (verbose ? REINDEXOPT_VERBOSE : 0) |
> - (concurrently ? REINDEXOPT_CONCURRENTLY : 0);
> + if (verbose)
> + params.options |= REINDEXOPT_VERBOSE;
> 
> Why do we need this intermediate 'verbose' variable here? We only use it
> once to set a bitmask. Maybe we can do it like this:
> 
> params.options |= defGetBoolean(opt) ?
>   REINDEXOPT_VERBOSE : 0;

That allows *setting* REINDEXOPT_VERBOSE, but doesn't *unset* it if someone
runs (VERBOSE OFF).  So I kept the bools like Michael originally had rather
than writing "else: params.options &= ~REINDEXOPT_VERBOSE"
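
Spelled out side by side (a self-contained toy; defGetBoolean() is stubbed
out since this runs outside the backend):

    #include <stdio.h>

    #define REINDEXOPT_VERBOSE (1 << 0)

    /* stand-in for defGetBoolean(opt): what the user wrote after VERBOSE */
    static int
    opt_value(int user_said_on)
    {
    	return user_said_on;
    }

    int
    main(void)
    {
    	int		options = REINDEXOPT_VERBOSE;	/* set by an earlier option */

    	/* One-directional: (VERBOSE OFF) cannot clear an already-set bit */
    	options |= opt_value(0) ? REINDEXOPT_VERBOSE : 0;
    	printf("one-directional: %d\n", options);	/* still 1 */

    	/* Two-directional: tracks the boolean either way */
    	if (opt_value(0))
    		options |= REINDEXOPT_VERBOSE;
    	else
    		options &= ~REINDEXOPT_VERBOSE;
    	printf("two-directional: %d\n", options);	/* now 0 */

    	return 0;
    }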

> See also attached txt file with diff (I wonder can I trick cfbot this way,
> so it does not apply the diff).

Yes, I think that works :)
I believe it looks for *.diff and *.patch.

> + int options;/* bitmask of lowlevel REINDEXOPT_* */
> 
> I would prefer if the comment says '/* bitmask of ReindexOption */' as in
> the VacuumOptions, since citing the exact enum type make it easier to
> navigate source code.

Yes, thanks.

This also fixes some minor formatting and rebase issues, including broken doc/.

-- 
Justin
>From 5d874da6f93341cb5aed4d3cc54137cbc341d57c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 2 Dec 2020 20:54:47 -0600
Subject: [PATCH v33 1/5] ExecReindex and ReindexParams

TODO: typedef
---
 src/backend/commands/indexcmds.c | 145 ---
 src/backend/tcop/utility.c   |  40 +
 src/include/commands/defrem.h|   7 +-
 3 files changed, 97 insertions(+), 95 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 14d24b3cc4..c2ee88f2c3 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -67,6 +67,10 @@
 #include "utils/syscache.h"
 
 
+typedef struct ReindexParams {
+	int options;	/* bitmask of ReindexOption */
+} ReindexParams;
+
 /* non-export function prototypes */
 static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
@@ -86,12 +90,15 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 			 bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
+static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel);
+static Oid ReindexTable(RangeVar *relation, ReindexParams *params, bool isTopLevel);
+static void ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, ReindexParams *params);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 			Oid relId, Oid oldRelId, void *arg);
 static void reindex_error_callback(void *args);
-static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
-static void ReindexMultipleInternal(List *relids, int options);
-static bool ReindexRelationConcurrently(Oid relationOid, int options);
+static void ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel);
+static void ReindexMultipleInternal(List *relids, ReindexParams *params);
+static bool ReindexRelationConcurrently(Oid relationOid, ReindexParams *params);
 static void update_relispartition(Oid relationId, bool newval);
 static inline void set_indexsafe_procflags(void);
 
@@ -100,7 +107,7 @@ static inline void set_indexsafe_procflags(void);
  */
 struct ReindexIndexCallbackState
 {
-	int			options;		/* options from statement */
+	ReindexParams		*params;
 	Oid			locked_table_oid;	/* tracks previously locked table */
 };
 
@@ -2452,14 +2459,17 @@ ChooseIndexColumnNames(List *indexElems)
 }
 
 /*
- * ReindexParseOptions
- *		Parse list of REINDEX options, returning a bitmask of ReindexOption.
+ * Reindex according to stmt.
+ * This calls the intermediate routines: ReindexIndex, ReindexTable, ReindexMultipleTables,
+ * which ultimately call reindex_index, reindex_relation, ReindexRelationConcurrently.
+ * Note that partitioned relations are handled by ReindexPartitions, except that
+ * ReindexRelationConcurrently handles concurrently reindexing a table.
  */
-int
-ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt)
+void
+ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 {
-	ListCell   *lc;
-	int			options = 0;
+	ListCell   	*lc;
+	ReindexParams		params = {0};
 	bool		concurrently = false;
 	bool		verbose = false;
 
@@ -2480,19 +2490,53 @@ ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt)
 	 parser_errposition(pstate, opt->location)));
 	}
 
-	options =
-		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+	if (

Re: [PATCH] Add support for leading/trailing bytea trim()ing

2020-12-04 Thread Tom Lane
"Joel Jacobson"  writes:
> On Fri, Dec 4, 2020, at 17:37, Tom Lane wrote:
>> The grammar in the functions' descr strings seems a bit shaky too.

> Not sure what you mean?

"trim left ends" (plural) seems wrong.  A string only has one left end,
at least in my universe.

(Maybe the existing ltrim/rtrim descrs are also like this, but if so
I'd change them too.)

regards, tom lane




Re: POC: Better infrastructure for automated testing of concurrency issues

2020-12-04 Thread Alexander Korotkov
On Fri, Dec 4, 2020 at 9:29 PM Alvaro Herrera  wrote:
> On 2020-Nov-25, Alexander Korotkov wrote:
> > In the view of above, I'd like to propose a POC patch, which implements new
> > builtin infrastructure for reproduction of concurrency issues in automated
> > test suites.  The general idea is so-called "stop events", which are
> > special places in the code, where the execution could be stopped on some
> > condition.  Stop event also exposes a set of parameters, encapsulated into
> > jsonb value.  The condition over stop event parameters is defined using
> > jsonpath language.
>
> +1 for the idea.  I agree we have a need for something on this area;
> there are *many* scenarios currently untested because of the lack of
> what you call "stop points".  I don't know if jsonpath is the best way
> to implement it, but at least it is readily available and it seems a
> decent way to go at it.

Thank you for your feedback.  I agree with you regarding jsonpath.  My
initial idea was to use the executor expressions.  But executor
expressions require serialization/deserialization, while stop points
need to work cross-database or even with processes not connected to
any database (such as checkpointer, background writer etc).  That
leads to difficulties, while jsonpath appears to be very easy for this
use-case.

--
Regards,
Alexander Korotkov




Re: POC: Better infrastructure for automated testing of concurrency issues

2020-12-04 Thread Alexander Korotkov
On Fri, Dec 4, 2020 at 9:57 PM Peter Geoghegan  wrote:
> On Wed, Nov 25, 2020 at 6:11 AM Alexander Korotkov  
> wrote:
> > While the postgres community does a great job on investigating and fixing 
> > the problems, our ability to reproduce concurrency issues in the source 
> > code test suites is limited.
>
> +1. This seems really cool.
>
> > For sure, evaluation of stop events causes a CPU overhead.  This is why 
> > it's controlled by enable_stopevents GUC, which is off by default. I expect 
> > the overhead with enable_stopevents = off shouldn't be observable.  Even if 
> > it would be observable, we could enable stop events only by specific 
> > configure parameter.  There is also trace_stopevents GUC, which traces all 
> > the stop events to the log with debug2 level.
>
> But why even risk adding noticeable overhead when "enable_stopevents =
> off "? Even if it's a very small risk? We can still get most of the
> benefit by enabling it only on certain builds and buildfarm animals.
> It will be a bit annoying to not have stop events enabled in all
> builds, but it avoids the problem of even having to think about the
> overhead, now or in the future. I think that that trade-off is a good
> one. Even if the performance trade-off is judged perfectly for the
> first few tests you add, what are the chances that it will stay that
> way as the infrastructure is used in more and more places? What if you
> need to add a test to the back branches? Since we don't anticipate any
> direct benefit for users (right?), I think that this question is
> simple.
>
> I am not arguing for not enabling stop events on standard builds
> because the infrastructure isn't useful -- it's *very* useful. Useful
> enough that it would be nice to be able to use it extensively without
> really thinking about the performance hit each time. I know that I'll
> be *far* more likely to use it if I don't have to waste time and
> energy on that aspect every single time.

Thank you for your feedback.  We probably can't think everything over
in advance.  We can start with the configure option enabled for developers
and some buildfarm animals.  That causes no risk of overhead in
standard builds.  After some time, we may reconsider enabling stop
events even in standard builds if we see they cause no regression.

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-12-04 Thread Alexander Korotkov
On Wed, Dec 2, 2020 at 6:58 AM Krunal Bauskar  wrote:
> Let me know what do you think about this analysis and any specific direction 
> that we should consider to help move forward.

BTW, it would also be nice to benchmark my lwlock patch on the
Kunpeng.  I'm not very optimistic about this patch, but it wouldn't be
fair to completely throw it away.  It still might be useful for
LSE-disabled builds.

--
Regards,
Alexander Korotkov




[PATCH] pg_dumpall options proposal/patch

2020-12-04 Thread code


Hi pgsql-hackers,

I have a relatively trivial proposal - this affects pg_dumpall exclusively. 
The primary use case is the ability to use pg_dumpall without SUPERUSER.

* Add --no-alter-role flag to only use CREATE ROLE syntax instead of CREATE 
then ALTER.
* Add --exclude-role flag similar to --exclude-database, semantically 
equivalent but applying to ROLEs.
* Add --no-granted-by flag to explicitly omit GRANTED BY clauses.
* Likely controversial - add --merge-credentials-file, which loads ROLE/PASSWORD 
combinations from an ini file and adds them to the dump output if a ROLE's 
password is not present. Implemented with an external library, inih.

All together, based against REL_12_STABLE:
https://github.com/remingtonc/postgres/compare/REL_12_STABLE...remingtonc:REL_12_STABLE_DUMPALL_CLOUDSQL

Example usage used against GCP Cloud SQL:
pg_dumpall --host=$HOST --username=$USER --no-password \
--no-role-passwords --merge-credentials-file=$CREDENTIALS_PATH \
--quote-all-identifiers --no-comments --no-alter-role --no-granted-by \
--exclude-database=postgres\* --exclude-database=template\* 
--exclude-database=cloudsql\* \
--exclude-role=cloudsql\* --exclude-role=postgres\* \
--file=$DUMP_PATH

Before I go rebase against master and split this into individual patches - does 
this seem reasonable?

Best,
Remington




Removal of operator_precedence_warning

2020-12-04 Thread Tom Lane
I think it's time for $SUBJECT.  We added this GUC in 9.5, which
will be EOL by the time of our next major release, and it was never
meant as more than a transitional aid.  Moreover, it's been buggy
as heck (cf abb164655, 05104f693, 01e0cbc4f, 4cae471d1), and the
fact that some of those errors went undetected for years shows that
it's not really gotten much field usage.

Hence, I propose the attached.  Comments?

regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cd3d6901c..4b60382778 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9389,29 +9389,6 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
- 
-  operator_precedence_warning (boolean)
-  
-   operator_precedence_warning configuration parameter
-  
-  
-  
-   
-When on, the parser will emit a warning for any construct that might
-have changed meanings since PostgreSQL 9.4 as a result
-of changes in operator precedence.  This is useful for auditing
-applications to see if precedence changes have broken anything; but it
-is not meant to be kept turned on in production, since it will warn
-about some perfectly valid, standard-compliant SQL code.
-The default is off.
-   
-
-   
-See  for more information.
-   
-  
- 
-
 
   quote_all_identifiers (boolean)
   
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3fdd87823e..d66560b587 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1121,11 +1121,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
  cases, these changes will result in no behavioral change, or perhaps
  in no such operator failures which can be resolved by adding
  parentheses.  However there are corner cases in which a query might
- change behavior without any parsing error being reported.  If you are
- concerned about whether these changes have silently broken something,
- you can test your application with the configuration
- parameter  turned on
- to see if any warnings are logged.
+ change behavior without any parsing error being reported.
 

   
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9c73c605a4..8f5e4e71b2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3264,9 +3264,6 @@ _outAExpr(StringInfo str, const A_Expr *node)
 			appendStringInfoString(str, " NOT_BETWEEN_SYM ");
 			WRITE_NODE_FIELD(name);
 			break;
-		case AEXPR_PAREN:
-			appendStringInfoString(str, " PAREN");
-			break;
 		default:
 			appendStringInfoString(str, " ??");
 			break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ecff4cd2ac..8f341ac006 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -59,7 +59,6 @@
 #include "nodes/nodeFuncs.h"
 #include "parser/gramparse.h"
 #include "parser/parser.h"
-#include "parser/parse_expr.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
@@ -13461,28 +13460,6 @@ c_expr:		columnref{ $$ = $1; }
 		n->indirection = check_indirection($4, yyscanner);
 		$$ = (Node *)n;
 	}
-	else if (operator_precedence_warning)
-	{
-		/*
-		 * If precedence warnings are enabled, insert
-		 * AEXPR_PAREN nodes wrapping all explicitly
-		 * parenthesized subexpressions; this prevents bogus
-		 * warnings from being issued when the ordering has
-		 * been forced by parentheses.  Take care that an
-		 * AEXPR_PAREN node has the same exprLocation as its
-		 * child, so as not to cause surprising changes in
-		 * error cursor positioning.
-		 *
-		 * In principle we should not be relying on a GUC to
-		 * decide whether to insert AEXPR_PAREN nodes.
-		 * However, since they have no effect except to
-		 * suppress warnings, it's probably safe enough; and
-		 * we'd just as soon not waste cycles on dummy parse
-		 * nodes if we don't have to.
-		 */
-		$$ = (Node *) makeA_Expr(AEXPR_PAREN, NIL, $2, NULL,
- exprLocation($2));
-	}
 	else
 		$$ = $2;
 }
@@ -16516,16 +16493,10 @@ doNegateFloat(Value *v)
 static Node *
 makeAndExpr(Node *lexpr, Node *rexpr, int location)
 {
-	Node	   *lexp = lexpr;
-
-	/* Look through AEXPR_PAREN nodes so they don't affect flattening */
-	while (IsA(lexp, A_Expr) &&
-		   ((A_Expr *) lexp)->kind == AEXPR_PAREN)
-		lexp = ((A_Expr *) lexp)->lexpr;
 	/* Flatten "a AND b AND c ..." to a single BoolExpr on sight */
-	if (IsA(lexp, BoolExpr))
+	if (IsA(lexpr, BoolExpr))
 	{
-		BoolExpr *blexpr = (BoolExpr *) lexp;
+		BoolExpr *blexpr = (BoolExpr *) lexpr;
 
 		if (blexpr->boolop == AND_EXPR)
 		{
@@ -16539,16 +16510,10 @@ makeAndExpr(Node *lexpr, Node *rexpr, int location)
 static Node *
 makeOrExpr(Node *lexpr, Node *rexpr, int lo

Re: [PATCH] pg_dumpall options proposal/patch

2020-12-04 Thread Tom Lane
c...@remington.io writes:
> I have a relatively trivial proposal - this affects pg_dumpall exclusively. 
> Primary use case in ability to use pg_dumpall without SUPERUSER.

> * Add --no-alter-role flag to only use CREATE ROLE syntax instead of CREATE 
> then ALTER.

What's the point of that?

> * Likely controversial - add --merge-credentials-file which loads 
> ROLE/PASSWORD combinations from an ini file and adds to dump output if ROLE 
> password not present. Implemented with an external library, inih.

If it requires an external library, it's probably DOA, regardless of
whether there's a compelling use-case (which you didn't present anyway).

regards, tom lane




Re: A few new options for CHECKPOINT

2020-12-04 Thread Alvaro Herrera
On the UI of this patch, you're proposing to add the option FAST.  I'm
not a fan of this option name and propose that (if we have it) we use
the name SPREAD instead (defaults to false).

Now we don't actually explain the term "spread" much in the documentation;
we just say "the writes are spread".  But it seems more natural to build
on that adjective rather than "fast/slow".


I think starting a spread checkpoint has some usefulness, if your
checkpoint interval is very large but your completion target is not very
close to 1.  In that case, you're expressing that you want a checkpoint
to start now and not impact production unduly, so that you know when it
finishes and therefore when is it a good time to start a backup.  (You
will still have some WAL to replay, but it won't be as much as if you
just ignored checkpoint considerations completely.)


On the subject of measuring replay times for backups taking while
pgbench is pounding the database, I think a realistic test does *not*
have pgbench running at top speed; rather you have some non-maximal
"-R xyz" option.  You would probably determine a value to use by running
without -R, observing what's a typical transaction rate, and using some
fraction (say, half) of that in the real run.




Re: POC: Better infrastructure for automated testing of concurrency issues

2020-12-04 Thread Peter Geoghegan
On Fri, Dec 4, 2020 at 1:20 PM Alexander Korotkov  wrote:
> Thank you for your feedback.  We probably can't think over everything
> in advance.  We can start with configure option enabled for developers
> and some buildfarm animals.  That causes no risk of overhead in
> standard builds.  After some time, we may reconsider to enable stop
> events even in standard build if we see they cause no regression.

I'll start using the configure option for debug builds only as soon as
possible. It will easily work with my existing workflow.

I don't know about anyone else, but for me this is only a very small
inconvenience. Whereas the convenience of not having to think about
the performance impact seems huge.

-- 
Peter Geoghegan




Re: Removal of operator_precedence_warning

2020-12-04 Thread Alvaro Herrera
On 2020-Dec-04, Tom Lane wrote:

> I think it's time for $SUBJECT.  We added this GUC in 9.5, which
> will be EOL by the time of our next major release, and it was never
> meant as more than a transitional aid.  Moreover, it's been buggy
> as heck (cf abb164655, 05104f693, 01e0cbc4f, 4cae471d1), and the
> fact that some of those errors went undetected for years shows that
> it's not really gotten much field usage.
> 
> Hence, I propose the attached.  Comments?

I wonder if it'd be fruitful to ask the submitters of those bugs about
their experiences with the feature.  Did they find it useful in finding
precedence problems in their code?  Did they experience other problems
that they didn't report?

Reading the reports mentioned in those commits, it doesn't look like any
of them were actually using the feature -- they all seem to have come
across the problems by accidents of varying nature.





Re: A few new options for CHECKPOINT

2020-12-04 Thread Stephen Frost
Greetings,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> I think starting a spread checkpoint has some usefulness, if your
> checkpoint interval is very large but your completion target is not very
> close to 1.  In that case, you're expressing that you want a checkpoint
> to start now and not impact production unduly, so that you know when it
> finishes and therefore when it is a good time to start a backup.  (You
> will still have some WAL to replay, but it won't be as much as if you
> just ignored checkpoint considerations completely.)

You could view an immediate checkpoint as more-or-less being a 'spread'
checkpoint with a checkpoint completion target approaching 0.  In the
end, it's all about how much time you're going to spend trying to get
the data written out, because the WAL that's generated during that time
is what's going to have to get replayed.

If the goal is to not end up with an increase in IO from this, then you
want to spread things out as much as you can over as much time as you're
able to- but that then means that you're going to have that much WAL to
replay.  If you're alright with performing IO to get the amount of WAL
to replay to be minimal, then you just run 'CHECKPOINT;' before your
backup and you're good to go (and is why that's in the documentation as
a way to reduce your WAL replay time- because it reduces it as much as
possible given your IO capabilities).

If you don't mind the increased amount of IO and WAL, you could just
reduce checkpoint_timeout and then crash recovery and snapshot-based
backup recovery will also be reduced, no matter when you actually take
the snapshot.

> On the subject of measuring replay times for backups taken while
> pgbench is pounding the database, I think a realistic test does *not*
> have pgbench running at top speed; rather you have some non-maximal
> "-R xyz" option.  You would probably determine a value to use by running
> without -R, observing what's a typical transaction rate, and using some
> fraction (say, half) of that in the real run.

That'd halve the amount of WAL being generated per unit time, but I
don't think it really changes much when it comes to this particular
analysis...?

If you generate 16MB of WAL per minute, and the checkpoint timeout is 5
minutes, with a checkpoint target of 0.9, then at more-or-less any point
in time you've got ~5 minutes worth of WAL outstanding, or around 80MB.
If your completion target is 0.5 then, really, you might as well make it
0.9 and have your timeout be 2.5m, so that you've got a steady-state of
around 40MB of WAL outstanding.
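
Spelled out, the arithmetic behind those numbers is just rate times
timeout; a trivial check, using only the figures assumed above:

#include <stdio.h>

int
main(void)
{
	double		wal_mb_per_min = 16.0;	/* assumed WAL generation rate */

	/* steady-state outstanding WAL is roughly rate * checkpoint_timeout */
	printf("timeout 5.0m: ~%.0f MB outstanding\n", wal_mb_per_min * 5.0);
	printf("timeout 2.5m: ~%.0f MB outstanding\n", wal_mb_per_min * 2.5);
	return 0;
}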

What I'm getting around to is that the only place this kind of thing
makes sense is where you're front-loading all your IO during the
checkpoint because your checkpoint completion target is less than 0.9.
Then, sure, there's a difference between snapshotting right when the
checkpoint completes vs. later, because if you wait around to snapshot,
we aren't actually doing IO during that time and are just letting the
WAL build up.  But that's an argument to remove checkpoint completion
target as an option that doesn't really make much sense in the first
place, imv, and to recommend folks tune checkpoint timeout for the
amount of outstanding WAL they want to have when they are doing
recovery (either from a crash or from a snapshot).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Removal of operator_precedence_warning

2020-12-04 Thread Tom Lane
Alvaro Herrera  writes:
> Reading the reports mentioned in those commits, it doesn't look like any
> of them were actually using the feature -- they all seem to have come
> across the problems by accidents of varying nature.

The two oldest reports look like the submitters had
operator_precedence_warning turned on in normal use, which is reasonable
given that was early 9.5.x days.  The third one looks like it was a test
setup, while the latest bug sounds like it was found by code inspection
not by stumbling over the misbehavior.  So people did use it, at least
for a while.  But anyone who's going directly from 9.4 or earlier to v14
is going to have lots more compatibility issues to worry about besides
precedence.

regards, tom lane




Re: A few new options for CHECKPOINT

2020-12-04 Thread Alvaro Herrera
On 2020-Dec-04, Bossart, Nathan wrote:

> On 12/4/20, 1:47 PM, "Alvaro Herrera"  wrote:
> > On the UI of this patch, you're proposing to add the option FAST.  I'm
> > not a fan of this option name and propose that (if we have it) we use
> > the name SPREAD instead (defaults to false).
> >
> > Now we don't actually explain the term "spread" much in the documentation;
> > we just say "the writes are spread".  But it seems more natural to build
> > on that adjective rather than "fast/slow".
> 
> Here is a version of the patch that uses SPREAD instead of FAST.

WFM.

Instead of adding checkpt_option_list, how about utility_option_list?
It seems intended for reuse.




Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-04 Thread Zhihong Yu
Hi, David:
For nodeResultCache.c :

+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0

I think it would be safer if the comparison were enclosed in parentheses (in
case the macro appears in a composite condition).
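
For illustration, the parenthesized form being suggested would look
like this (a sketch of the suggestion, not the patch's actual line):

#define SH_EQUAL(tb, a, b) (ResultCacheHash_equal(tb, a, b) == 0)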

+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey
*key1,
+ const ResultCacheKey *key2)

Since key2 is not used, maybe name it unused_key ?

+   /* Make a guess at a good size when we're not given a valid size. */
+   if (size == 0)
+   size = 1024;

Should the default size be logged ?

+   /* Update the memory accounting */
+   rcstate->mem_used -= freed_mem;

Maybe add an assertion that mem_used is >= 0 after the decrement (there is
an assertion in remove_cache_entry; however, that assertion is after another
decrement).
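
A minimal sketch of the suggested check, assuming mem_used is a signed
integer:

	/* Update the memory accounting */
	rcstate->mem_used -= freed_mem;
	Assert(rcstate->mem_used >= 0);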

+ * 'specialkey', if not NULL, causes the function to return false if the
entry
+ * entry which the key belongs to is removed from the cache.

duplicate entry (one at the end of first line and one at the beginning of
second line).

For cache_lookup(), the new key is allocated before checking
whether rcstate->mem_used > rcstate->mem_upperlimit.  New entries
should probably all be roughly the same size.
Can we check whether the upper limit would be crossed (assuming the
addition of a new entry) before allocating the new entry?

+   if (unlikely(!cache_reduce_memory(rcstate, key)))
+   return NULL;

Does the new entry need to be released in the above case?
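
One hypothetical shape for that reordering, where est_entry_size
stands in for whatever size estimate is available (the names here are
assumptions, not the patch's actual code):

	/* test the limit before paying for the new allocation */
	if (rcstate->mem_used + est_entry_size > rcstate->mem_upperlimit)
	{
		if (unlikely(!cache_reduce_memory(rcstate, NULL)))
			return NULL;
	}
	/* ... only now allocate the new key and entry ... */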

Cheers


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-12-04 Thread Michael Paquier
On Fri, Dec 04, 2020 at 04:28:26PM -0300, Alvaro Herrera wrote:
> FWIW I'm with Peter on this.

Okay, attached is a patch to adjust the enums for the set of utility
commands I have touched lately.  Should that
be extended more?  I have not done that, as a lot of those structures
have existed as such for a long time.
--
Michael
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index c041628049..8f80f9f3aa 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -30,13 +30,10 @@ typedef enum
 } IndexStateFlagsAction;
 
 /* options for REINDEX */
-typedef enum ReindexOption
-{
-	REINDEXOPT_VERBOSE = 1 << 0,	/* print progress info */
-	REINDEXOPT_REPORT_PROGRESS = 1 << 1,	/* report pgstat progress */
-	REINDEXOPT_MISSING_OK = 1 << 2, /* skip missing relations */
-	REINDEXOPT_CONCURRENTLY = 1 << 3	/* concurrent mode */
-} ReindexOption;
+#define REINDEXOPT_VERBOSE			(1 << 0)	/* print progress info */
+#define REINDEXOPT_REPORT_PROGRESS	(1 << 1)	/* report pgstat progress */
+#define REINDEXOPT_MISSING_OK		(1 << 2)	/* skip missing relations */
+#define REINDEXOPT_CONCURRENTLY		(1 << 3)	/* concurrent mode */
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 7cfb37c9b2..ded9379818 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -20,11 +20,8 @@
 
 
 /* options for CLUSTER */
-typedef enum ClusterOption
-{
-	CLUOPT_RECHECK = 1 << 0,	/* recheck relation state */
-	CLUOPT_VERBOSE = 1 << 1		/* print progress info */
-} ClusterOption;
+#define CLUOPT_RECHECK	(1 << 0)	/* recheck relation state */
+#define CLUOPT_VERBOSE	(1 << 1)	/* print progress info */
 
 extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
 extern void cluster_rel(Oid tableOid, Oid indexOid, int options);
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a4cd721400..1d7b6eaf04 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -174,17 +174,16 @@ typedef struct VacAttrStats
 	int			rowstride;
 } VacAttrStats;
 
-typedef enum VacuumOption
-{
-	VACOPT_VACUUM = 1 << 0,		/* do VACUUM */
-	VACOPT_ANALYZE = 1 << 1,	/* do ANALYZE */
-	VACOPT_VERBOSE = 1 << 2,	/* print progress info */
-	VACOPT_FREEZE = 1 << 3,		/* FREEZE option */
-	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
-	VACOPT_SKIP_LOCKED = 1 << 5,	/* skip if cannot get lock */
-	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
-} VacuumOption;
+/* options for VACUUM */
+#define VACOPT_VACUUM		(1 << 0)	/* do VACUUM */
+#define VACOPT_ANALYZE		(1 << 1)	/* do ANALYZE */
+#define VACOPT_VERBOSE		(1 << 2)	/* print progress info */
+#define VACOPT_FREEZE		(1 << 3)	/* FREEZE option */
+#define VACOPT_FULL			(1 << 4)	/* FULL (non-concurrent) vacuum */
+#define VACOPT_SKIP_LOCKED	(1 << 5)	/* skip if cannot get lock */
+#define VACOPT_SKIPTOAST	(1 << 6)	/* don't process the TOAST table, if
+		 * any */
+#define VACOPT_DISABLE_PAGE_SKIPPING (1 << 7)	/* don't skip any pages */
 
 /*
  * A ternary value used by vacuum parameters.
@@ -207,7 +206,7 @@ typedef enum VacOptTernaryValue
  */
 typedef struct VacuumParams
 {
-	int			options;		/* bitmask of VacuumOption */
+	int			options;		/* bitmask of VACOPT_* values */
 	int			freeze_min_age; /* min freeze age, -1 to use default */
 	int			freeze_table_age;	/* age at which to scan whole table */
 	int			multixact_freeze_min_age;	/* min multixact freeze age, -1 to
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 14d24b3cc4..5e2cbba407 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2453,7 +2453,8 @@ ChooseIndexColumnNames(List *indexElems)
 
 /*
  * ReindexParseOptions
- *		Parse list of REINDEX options, returning a bitmask of ReindexOption.
+ *		Parse list of REINDEX options, returning a bitmask of REINDEXOPT_*
+ *		values.
  */
 int
 ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt)


signature.asc
Description: PGP signature


Re: A few new options for CHECKPOINT

2020-12-04 Thread Michael Paquier
On Sat, Dec 05, 2020 at 12:11:13AM +, Bossart, Nathan wrote:
> On 12/4/20, 3:33 PM, "Alvaro Herrera"  wrote:
>> Instead of adding checkpt_option_list, how about utility_option_list?
>> It seems intended for reuse.

+1.  It is intended for reuse.

> Ah, good call.  That simplifies the grammar changes quite a bit.

+CHECKPOINT;
+CHECKPOINT (SPREAD);
+CHECKPOINT (SPREAD FALSE);
+CHECKPOINT (SPREAD ON);
+CHECKPOINT (SPREAD 0);
+CHECKPOINT (SPREAD 2);
+ERROR:  spread requires a Boolean value
+CHECKPOINT (NONEXISTENT);
+ERROR:  unrecognized CHECKPOINT option "nonexistent"
+LINE 1: CHECKPOINT (NONEXISTENT);
Testing for negative cases like the last two is fine by me, but
I don't much like the idea of running 5 checkpoints as part of the
main regression test suite (think installcheck with a large shared
buffer pool for example).

--- a/src/include/postmaster/bgwriter.h
+++ b/src/include/postmaster/bgwriter.h
@@ -15,6 +15,8 @@
 #ifndef _BGWRITER_H
 #define _BGWRITER_H

+#include "nodes/parsenodes.h"
+#include "parser/parse_node.h"
I don't think you need to include parsenodes.h here.

+void
+ExecCheckPointStmt(ParseState *pstate, CheckPointStmt *stmt)
+{
Nit: perhaps this could just be ExecCheckPoint()?  See the existing
ExecVacuum().

+   flags = CHECKPOINT_WAIT |
+   (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE) |
+   (spread ? 0 : CHECKPOINT_IMMEDIATE);
The handling done for CHECKPOINT_FORCE and CHECKPOINT_WAIT deserves
a comment.
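
For what it's worth, one way such a comment might read (wording is
mine, purely illustrative):

+   /*
+    * Always wait for the checkpoint to finish.  Outside of recovery,
+    * force a checkpoint even if no WAL activity has occurred since
+    * the last one (a restartpoint cannot be forced).  Unless a spread
+    * checkpoint was requested, run it at full speed.
+    */
+   flags = CHECKPOINT_WAIT |
+           (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE) |
+           (spread ? 0 : CHECKPOINT_IMMEDIATE);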
--
Michael


signature.asc
Description: PGP signature


Re: Single transaction in the tablesync worker?

2020-12-04 Thread Amit Kapila
On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
 wrote:
>
> On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila  wrote:
> >
> > On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
> >  wrote:
> > >
> > > On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila  
> > > wrote:
> > > >
> > > > The tablesync worker in logical replication performs the table data
> > > > sync in a single transaction which means it will copy the initial data
> > > > and then catch up with apply worker in the same transaction. There is
> > > > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > > > sync in a single transaction.") saying so but I can't find the
> > > > concrete theory behind the same. Is there any fundamental problem if
> > > > we commit the transaction after initial copy and slot creation in
> > > > LogicalRepSyncTableStart and then allow the apply of transactions as
> > > > it happens in apply worker? I have tried doing so in the attached (a
> > > > quick prototype to test) and didn't find any problems with regression
> > > > tests. I have tried a few manual tests as well to see if it works and
> > > > didn't find any problem. Now, it is quite possible that it is
> > > > mandatory to do the way we are doing currently, or maybe something
> > > > else is required to remove this requirement but I think we can do
> > > > better with respect to comments in this area.
> > >
> > > If we commit the initial copy, the data up to the initial copy's
> > > snapshot will be visible downstream. If we apply the changes by
> > > committing changes per transaction, the data visible to the other
> > > transactions will differ as the apply progresses.
> > >
> >
> > It is not clear what you mean by the above.  The way you have written
> > it, it appears you are saying that instead of copying the initial data,
> > I am saying to copy it transaction-by-transaction. But that is not the
> > case. I am saying copy the initial data by using REPEATABLE READ
> > isolation level as we are doing now, commit it and then process
> > transaction-by-transaction till we reach sync-point (point till where
> > apply worker has already received the data).
>
> Craig in his mail has clarified this. The changes after the initial
> COPY will be visible before the table sync catches up.
>

I think the problem is not that the changes are visible after COPY;
rather, it is that we don't have a mechanism to restart if it crashes
after COPY unless we do all the sync up in one transaction. Assume we
commit after COPY and then process transaction-by-transaction, and it
errors out (due to connection loss) or crashes in between one of the
transactions following COPY; after the restart we won't know where to
start from for that relation. This is because the catalog
(pg_subscription_rel) will show the state as 'd' (data is being
copied) and the slot would have gone, as it was a temporary slot. But
as mentioned in one of my emails above [1], we can solve these
problems, which Craig also seems to be advocating for, as there are
many advantages of not doing the entire sync (initial copy + stream
changes for that relation) in one single transaction. It will allow us
to support decoding of prepared xacts on the subscriber. Also, it
seems pglogical already processes transaction-by-transaction after the
initial copy. The only thing which is not clear to me is why we
decided not to go that way initially; it would probably be better if
the original authors also chimed in to clarify that.

> >
> > > You haven't
> > > clarified whether we will respect the transaction boundaries in the
> > > apply log or not. I assume we will.
> > >
> >
> > It will be transaction-by-transaction.
> >
> > > Whereas if we apply all the
> > > changes in one go, other transactions either see the data before
> > > resync or after it without any intermediate states.
> > >
> >
> > What is the problem even if the user is able to see the data after the
> > initial copy?
> >
> > > That will not
> > > violate consistency, I think.
> > >
> >
> > I am not sure how consistency will be broken.
>
> Some of the transactions applied by apply workers may not have been
> applied by the resync and vice versa. If the intermediate states of
> table resync worker are visible, this difference in applied
> transaction will result in loss of consistency if those transactions
> are changing the table being resynced and some other table in the same
> transaction. The changes won't be atomically visible. Thinking more
> about this, this problem exists today for a table being resynced, but
> at least it's only the table being resynced that is behind the other
> tables so it's predictable.
>

Yeah, I have already shown that this problem [1] exists today, and it
won't be predictable when the number of tables to be synced is larger.
I am not sure why, but it seems acceptable to the original authors that
the data of transactions is partially visible during the initial
synchronization phase for a subscription. I don't se

Re: Proposed patch for key managment

2020-12-04 Thread Bruce Momjian
On Wed, Dec  2, 2020 at 04:38:14PM -0500, Bruce Momjian wrote:
> If most people approve of this general approach, and the design
> decisions made, I would like to apply this in the next few weeks, but
> this brings complications.  The syntax added by this commit might not
> provide a useful feature until PG 15, so how do we hide it from users?
> I was thinking of not applying the doc changes (or commenting them out)
> and commenting out the --help output.

Here is an updated patch to handle the new hash API introduced by
commit 87ae9691d2.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



key.diff.gz
Description: application/gzip


Re: convert elog(LOG) calls to ereport

2020-12-04 Thread Michael Paquier
On Fri, Dec 04, 2020 at 02:34:26PM +0100, Peter Eisentraut wrote:
> On 2020-12-02 15:04, Alvaro Herrera wrote:
>> I do wonder if it'd be a good idea to move the syscall
>> name itself out of the message, too; that would reduce the number of
>> messages to translate 50x to just "%s(%s) failed: %m" instead of one
>> message per distinct syscall.
> 
> Seems useful, but perhaps as a separate project.

-   elog(LOG, "getsockname() failed: %m");
+   ereport(LOG,
+   (errmsg("getsockname() failed: %m")));
FWIW, I disagree with the approach taken by eb93f3a.  As of HEAD, it
is now required to translate all those strings.  I think that it would
have been better to remove the function names from all those error
messages and not require the same pattern to be translated N times.
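
For instance, the pattern Alvaro floated upthread would collapse all of
these into a single translatable string, along these lines (a sketch
only; the second argument is illustrative):

    ereport(LOG,
            (errmsg("%s(%s) failed: %m", "setsockopt", "SO_REUSEADDR")));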
--
Michael


signature.asc
Description: PGP signature


Re: Proposed patch for key managment

2020-12-04 Thread Michael Paquier
On Fri, Dec 04, 2020 at 09:08:03PM -0500, Bruce Momjian wrote:
> Here is an updated patch to handle the new hash API introduced by
> commit 87ae9691d2.

+   if (!ossl_initialized)
+   {
+#ifdef HAVE_OPENSSL_INIT_SSL
+   OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
+#else
+   OPENSSL_config(NULL);
+   SSL_library_init();
+   SSL_load_error_strings();
+#endif
+   ossl_initialized = true;
This is a duplicate of what's done in be-secure-openssl.c, and it does
not strike me as a good idea to do that potentially twice.

git diff --check complains.

+extern bool pg_HMAC_SHA512(const uint8 *key,
+   const uint8 *in, int inlen,
+   uint8 *out);
I think that the split done in this patch makes the HMAC handling in
the core code messier:
- SCRAM makes use of HMAC internally, and we should try to use the
HMAC of OpenSSL if building with it even for SCRAM.
- For the first reason, I think that we should also have a fallback
implementation.
- This API layer should not depend directly on the SHA2 used (SCRAM
uses SHA256 with HMAC).
FWIW, I've got plans to work on that once I am done with the business
around MD5 and OpenSSL.
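
One possible shape for such a layer, mirroring the pg_cryptohash_*
naming (a sketch of the idea, not a committed API):

typedef struct pg_hmac_ctx pg_hmac_ctx;

extern pg_hmac_ctx *pg_hmac_create(pg_cryptohash_type type);
extern int	pg_hmac_init(pg_hmac_ctx *ctx, const uint8 *key, size_t len);
extern int	pg_hmac_update(pg_hmac_ctx *ctx, const uint8 *data, size_t len);
extern int	pg_hmac_final(pg_hmac_ctx *ctx, uint8 *dest);
extern void pg_hmac_free(pg_hmac_ctx *ctx);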

The refactoring done with the ciphers moved from pgcrypto to
src/common/ should be a separate patch.  In short, it would be good to
rework this patch and split it into pieces that are independently
useful.  This would make the review much easier as well.
--
Michael


signature.asc
Description: PGP signature


Re: Proposed patch for key managment

2020-12-04 Thread Bruce Momjian
On Sat, Dec  5, 2020 at 11:39:18AM +0900, Michael Paquier wrote:
> On Fri, Dec 04, 2020 at 09:08:03PM -0500, Bruce Momjian wrote:
> > Here is an updated patch to handle the new hash API introduced by
> > commit 87ae9691d2.
> 
> +   if (!ossl_initialized)
> +   {
> +#ifdef HAVE_OPENSSL_INIT_SSL
> +   OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL);
> +#else
> +   OPENSSL_config(NULL);
> +   SSL_library_init();
> +   SSL_load_error_strings();
> +#endif
> +   ossl_initialized = true;
> This is a duplicate of what's done in be-secure-openssl.c, and it does
> not strike me as a good idea to do that potentially twice.

Yeah, I kind of wondered about that.  In fact, the code from the
original patch would not compile so I got this init code from somewhere
else. I have now removed it and it works fine.  :-)

> git diff --check complains.

Uh, can you be more specific?  I don't see any output from that command.

> +extern bool pg_HMAC_SHA512(const uint8 *key,
> +   const uint8 *in, int inlen,
> +   uint8 *out);
> I think that the split done in this patch makes the HMAC handling in
> the core code messier:
> - SCRAM makes use of HMAC internally, and we should try to use the
> HMAC of OpenSSL if building with it even for SCRAM.
> - For the first reason, I think that we should also have a fallback
> implementation.
> - This API layer should not depend directly on the SHA2 used (SCRAM
> uses SHA256 with HMAC).
> FWIW, I got plans to work on that once I am done with the business
> around MD5 and OpenSSL.

Uh, I just kind of kept all that code and didn't modify it.  It would be
great if you could help me improve it.  I will be using the hash code for
the command-line tool that alters the passphrase, so having that in
common/ does help me.

> The refactoring done with the ciphers moved from pgcrypto to
> src/common/ should be a separate patch.  In short, it would be good to

Uh, I am kind of unclear exactly what was done there since I just took
that part of the patch unchanged.

> rework this patch and split it into pieces that are independently
> useful.  This would make the review much easier as well.

I can break out the -R/file descriptor passing part as a separate patch,
and have the ssl_passphrase_command use that, but that's the only part I
know can be useful on its own.

Since the patch is large, I found a way to push the branch to git and
to make a download link that tracks whatever I push to the 'key'
branch on my github account.  Here is the updated patch link:

https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add Information during standby recovery conflicts

2020-12-04 Thread Masahiko Sawada
On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 12/4/20 2:21 AM, Fujii Masao wrote:
> >
> > On 2020/12/04 9:28, Masahiko Sawada wrote:
> >> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
> >>  wrote:
> >>>
> >>>
> >>>
> >>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
>  Hi,
> 
>  On 12/1/20 12:35 AM, Masahiko Sawada wrote:
> > CAUTION: This email originated from outside of the organization.
> > Do not click links or open attachments unless you can confirm the
> > sender and know the content is safe.
> >
> >
> >
> > On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
> >  wrote:
> >> On 2020-Dec-01, Fujii Masao wrote:
> >>
> >>> + if (proc)
> >>> + {
> >>> + if (nprocs == 0)
> >>> + appendStringInfo(&buf, "%d", proc->pid);
> >>> + else
> >>> + appendStringInfo(&buf, ", %d", proc->pid);
> >>> +
> >>> + nprocs++;
> >>>
> >>> What happens if all the backends in wait_list have gone? In
> >>> other words,
> >>> how should we handle the case where nprocs == 0 (i.e., nprocs
> >>> has not been
> >>> incremented at all)? This would very rarely happen, but can happen.
> >>> In this case, since buf.data is empty, at least there seems no
> >>> need to log
> >>> the list of conflicting processes in detail message.
> >> Yes, I noticed this too; this can be simplified by changing the
> >> condition in the ereport() call to be "nprocs > 0" (rather than
> >> wait_list being null), otherwise not print the errdetail.  (You
> >> could
> >> test buf.data or buf.len instead, but that seems uglier to me.)
> > +1
> >
> > Maybe we can also improve the comment of this function from:
> >
> > + * This function also reports the details about the conflicting
> > + * process ids if *wait_list is not NULL.
> >
> > to "This function also reports the details about the conflicting
> > process ids if any exist" or something.
> >
>  Thank you all for the review/remarks.
> 
>  They have been addressed in the new attached patch version.
> >>>
> >>> Thanks for updating the patch! I read through the patch again
> >>> and applied the following chages to it. Attached is the updated
> >>> version of the patch. Could you review this version? If there is
> >>> no issue in it, I'm thinking to commit this version.
> >>
> >> Thank you for updating the patch! I have one question.
> >>
> >>>
> >>> +   timeouts[cnt].id = STANDBY_TIMEOUT;
> >>> +   timeouts[cnt].type = TMPARAM_AFTER;
> >>> +   timeouts[cnt].delay_ms = DeadlockTimeout;
> >>>
> >>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
> >>> I changed the code that way.
> >>
> >> As the comment of ResolveRecoveryConflictWithLock() says the
> >> following, a deadlock is detected by the ordinary backend process:
> >>
> >>   * Deadlocks involving the Startup process and an ordinary backend
> >> proces
> >>   * will be detected by the deadlock detector within the ordinary
> >> backend.
> >>
> >> If we use STANDBY_DEADLOCK_TIMEOUT,
> >> SendRecoveryConflictWithBufferPin() will be called after
> >> DeadlockTimeout passed, but I think it's not necessary for the startup
> >> process in this case.
> >
> > Thanks for pointing this! You are right.
> >
> >
> >> If we want to just wake up the startup process
> >> maybe we can use STANDBY_TIMEOUT here?
> >
> Thanks for the patch updates! Except what we are still discussing below,
> it looks good to me.
>
> > When STANDBY_TIMEOUT happens, a request to release conflicting buffer
> > pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?
>
> Agree
>
> >
> > Or, first of all, we don't need to enable the deadlock timer at all?
> > Since what we'd like to do is to wake up after deadlock_timeout
> > passes, we can do that by changing ProcWaitForSignal() so that it can
> > accept the timeout and giving the deadlock_timeout to it. If we do
> > this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
> > ResolveRecoveryConflictWithLock(). Thought?

Where do we enable the deadlock timeout in the hot standby case? Did you
mean to enable it in ProcWaitForSignal(), or where we set the timer for
the non-hot-standby case, in ProcSleep()?

>
> Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers
> a call to StandbyLockTimeoutHandler() which does nothing, except waking
> up. That's what we want, right?)

Right, what I meant was STANDBY_LOCK_TIMEOUT. The startup
process can wake up and do nothing. Thank you for pointing that out.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-04 Thread Zhihong Yu
There are two blocks with almost identical code (second occurrence in
cache_store_tuple):

+   if (rcstate->mem_used > rcstate->mem_upperlimit)
+   {

It would be nice if the code can be extracted to a method and shared.
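
A sketch of what such a helper could look like (the type and function
names are guesses based on the quoted fragment, not the patch's code):

static bool
cache_check_mem(ResultCacheState *rcstate, ResultCacheKey *specialkey)
{
	if (rcstate->mem_used > rcstate->mem_upperlimit)
	{
		/* evict entries until we are back under the lower limit */
		return cache_reduce_memory(rcstate, specialkey);
	}
	return true;
}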

node->rc_status = RC_END_OF_SCAN;
return NULL;
}
else

There are several places where the else keyword for an else block can be
omitted because the if block ends with return.
This would allow the code in the else block to move leftward (for easier
reading).
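
In other words, a shape like this (condition and variable names are
invented for illustration):

	if (end_of_scan)
	{
		node->rc_status = RC_END_OF_SCAN;
		return NULL;
	}

	/* former else block, now one indentation level to the left */
	return next_tuple;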

   if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))

I noticed that right_hashfn isn't used. Would this cause some warning from
the compiler (for some compilers the warning would be treated as an error)?
Maybe NULL can be passed as the last parameter. The return value
of get_op_hash_functions would keep the current meaning (find both hash
fn's).
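
If memory serves, get_op_hash_functions() already treats a NULL output
pointer as "not wanted", so the call could plausibly become:

	/* only the left-hand hash function is needed here */
	if (!get_op_hash_functions(hashop, &left_hashfn, NULL))
		elog(ERROR, "could not find hash function for hash operator %u",
			 hashop);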

rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;

Maybe (in a subsequent patch) a GUC variable can be introduced for tuning
the constant 0.98.
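
Such a knob could be a plain entry in guc.c's ConfigureNamesReal table;
the name and wording below are invented for illustration:

	{
		{"result_cache_evict_fraction", PGC_USERSET, RESOURCES_MEM,
			gettext_noop("Fraction of the result cache memory limit to shrink to when full."),
			NULL
		},
		&result_cache_evict_fraction,
		0.98, 0.5, 1.0,
		NULL, NULL, NULL
	},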

For +paraminfo_get_equal_hashops :

+   else
+   Assert(false);

Adding an elog would be good for debugging.
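
For instance, following the usual idiom elsewhere in the tree (the
variable name expr is assumed for illustration):

	else
		elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));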

Cheers

On Fri, Dec 4, 2020 at 5:09 PM Zhihong Yu  wrote:

> Hi, David:
> For nodeResultCache.c :
>
> +#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
>
> I think it would be safer if the comparison were enclosed in parentheses (in
> case the macro appears in a composite condition).
>
> +ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey
> *key1,
> + const ResultCacheKey *key2)
>
> Since key2 is not used, maybe name it unused_key ?
>
> +   /* Make a guess at a good size when we're not given a valid size. */
> +   if (size == 0)
> +   size = 1024;
>
> Should the default size be logged ?
>
> +   /* Update the memory accounting */
> +   rcstate->mem_used -= freed_mem;
>
> Maybe add an assertion that mem_used is >= 0 after the decrement (there is
> an assertion in remove_cache_entry; however, that assertion is after another
> decrement).
>
> + * 'specialkey', if not NULL, causes the function to return false if the
> entry
> + * entry which the key belongs to is removed from the cache.
>
> duplicate entry (one at the end of first line and one at the beginning of
> second line).
>
> For cache_lookup(), the new key is allocated before checking
> whether rcstate->mem_used > rcstate->mem_upperlimit.  New entries
> should probably all be roughly the same size.
> Can we check whether the upper limit would be crossed (assuming the
> addition of a new entry) before allocating the new entry?
>
> +   if (unlikely(!cache_reduce_memory(rcstate, key)))
> +   return NULL;
>
> Does the new entry need to be released in the above case?
>
> Cheers
>


Re: Proposed patch for key managment

2020-12-04 Thread Bruce Momjian
On Sat, Dec  5, 2020 at 12:15:13PM +0900, Masahiko Sawada wrote:
> diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
> index e5233daab6..a45c86fa67 100644
> --- a/src/common/cryptohash_openssl.c
> +++ b/src/common/cryptohash_openssl.c
> @@ -81,6 +81,8 @@ pg_cryptohash_create(pg_cryptohash_type type)
> return NULL;
> }
> 
> +   memset(ctx, 0, sizeof(pg_cryptohash_ctx));
> +   memset(state, 0, sizeof(pg_cryptohash_state));
> ctx->data = state;
> ctx->type = type;

OK, I worked with Sawada-san and added the attached patch.  The updated
full patch is at the same URL:  :-)

https://github.com/postgres/postgres/compare/master...bmomjian:key.diff

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
index e5233daab6..02dec1fd1b 100644
--- a/src/common/cryptohash_openssl.c
+++ b/src/common/cryptohash_openssl.c
@@ -72,14 +72,15 @@ pg_cryptohash_create(pg_cryptohash_type type)
 	ctx = ALLOC(sizeof(pg_cryptohash_ctx));
 	if (ctx == NULL)
 		return NULL;
+	explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
 
 	state = ALLOC(sizeof(pg_cryptohash_state));
 	if (state == NULL)
 	{
-		explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
 		FREE(ctx);
 		return NULL;
 	}
+	explicit_bzero(state, sizeof(pg_cryptohash_state));
 
 	ctx->data = state;
 	ctx->type = type;
@@ -97,8 +98,6 @@ pg_cryptohash_create(pg_cryptohash_type type)
 
 	if (state->evpctx == NULL)
 	{
-		explicit_bzero(state, sizeof(pg_cryptohash_state));
-		explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
 #ifndef FRONTEND
 		ereport(ERROR,
 (errcode(ERRCODE_OUT_OF_MEMORY),


Re: POC: Cleaning up orphaned files using undo logs

2020-12-04 Thread Amit Kapila
On Fri, Dec 4, 2020 at 1:50 PM Antonin Houska  wrote:
>
> Amit Kapila  wrote:
>
>
> > The earlier version of the patch having all these ideas
> > implemented is attached
> > (Infrastructure-to-execute-pending-undo-actions and
> > Provide-interfaces-to-store-and-fetch-undo-records). The second one
> > has some APIs used by the first one but the main concepts were
> > implemented in the first one
> > (Infrastructure-to-execute-pending-undo-actions). I see that in the
> > current version these can't be used as it is but still it can give us
> > a good start point and we might be able to either re-use some code and
> > or ideas from these patches.
>
> Is there a branch with these patches applied? They reference some functions
> that I don't see in [1]. I'd like to examine if / how my approach can be
> aligned with the current zheap design.
>

Can you check the patch-set attached to the email at [1]?

[1] - 
https://www.postgresql.org/message-id/CA%2BhUKG%2BMpzRsZFE7ChhRq-Br5VYYi6mafVQ73Af7ahioWo5o8w%40mail.gmail.com
-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add support for leading/trailing bytea trim()ing

2020-12-04 Thread Joel Jacobson
On Fri, Dec 4, 2020, at 22:02, Tom Lane wrote:
>"trim left ends" (plural) seems wrong.  A string only has one left end,
>at least in my universe.

Fixed, the extra "s" came from copying from btrim()'s description.

>(Maybe the existing ltrim/rtrim descrs are also like this, but if so
>I'd change them too.)

They weren't, but I think the descriptions for the bytea functions
can be made more precise
if we take inspiration from the text functions.

Here is an overview of all functions containing "trim" in the function name,
to get the full picture of the trim description terminology:

SELECT
  oid,
  pg_describe_object('pg_proc'::regclass,oid,0),
  pg_catalog.obj_description(oid, 'pg_proc')
FROM pg_proc
WHERE proname LIKE '%trim%'
ORDER BY oid;

 oid  |      pg_describe_object      |                     obj_description
------+------------------------------+-----------------------------------------------------------
  875 | function ltrim(text,text)    | trim selected characters from left end of string
  876 | function rtrim(text,text)    | trim selected characters from right end of string
  881 | function ltrim(text)         | trim spaces from left end of string
  882 | function rtrim(text)         | trim spaces from right end of string
  884 | function btrim(text,text)    | trim selected characters from both ends of string
  885 | function btrim(text)         | trim spaces from both ends of string
 2015 | function btrim(bytea,bytea)  | trim both ends of string
 5043 | function trim_scale(numeric) | numeric with minimum scale needed to represent the value

Do we want the two new functions to derive their description from the existing 
bytea function?

9612 | function ltrim(bytea,bytea)  | trim left end of string
9613 | function rtrim(bytea,bytea)  | trim right end of string

Patch with this wording: 
leading-trailing-trim-bytea-left-right-end-of-string.patch

Or would it be better to be inspired by the more precise descriptions for the 
two-parameter text functions,
and to change the existing btrim() function's description as well?

 2015 | function btrim(bytea,bytea)  | trim selected bytes from both ends of string
 9612 | function ltrim(bytea,bytea)  | trim selected bytes from left end of string
 9613 | function rtrim(bytea,bytea)  | trim selected bytes from right end of string

Patch with this wording: leading-trailing-trim-bytea-selected-bytes.patch

Best regards,

Joel



leading-trailing-trim-bytea-left-right-end-of-string.patch
Description: Binary data


leading-trailing-trim-bytea-selected-bytes.patch
Description: Binary data