Re: Strange decreasing value of pg_last_wal_receive_lsn()

2020-06-01 Thread godjan •
Hi, sorry for the two-week latency in answering :)

>> It fixed our trouble, but there is another one. Now we have to wait until all
>> alive HA hosts finish replaying WAL before failover. It might take a while (for
>> example, when the WAL contains a wal_record about splitting a b-tree).
> 
> Indeed, this is the concern I wrote about yesterday in a second mail on this
> thread.

Actually, I found out that we use the wrong heuristic to detect whether a 
standby is still replaying WAL.
We compare the values of pg_last_wal_replay_lsn() before and after sleeping.
If the standby is replaying a huge wal_record (e.g. a b-tree split), this gives 
us the wrong result.
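
To make the flawed heuristic concrete, here is a minimal sketch of what we
effectively do (simplified):

SELECT pg_last_wal_replay_lsn();  -- e.g. 0/1D019F38
SELECT pg_sleep(1);
SELECT pg_last_wal_replay_lsn();  -- still 0/1D019F38 while one huge record
                                  -- is being applied

If the two values are equal we conclude that replay has finished, but the
replay position also stands still for the whole time a single large
wal_record is being applied.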


> Note that when you promote a node, it first replays available WALs before
> acting as a primary.


Do you know how Postgres determines that a standby is still replaying available WAL?
I couldn't figure it out from the promotion code.


> However, how useful is this test in a context of auto failover for
> **service** high availability?

Such a situation has a really low probability in our service, and we thought 
it would be okay to resolve it with on-call participation.

> Nope, no clean and elegant idea. Once your instances are killed, maybe you can
> force flush the system cache (secure in-memory-only data)? 

Do "force flush the system cache” means invoke this command 
https://linux.die.net/man/8/sync  on the 
standby?

> and read the latest received WAL using pg_waldump?


I did an experiment with pg_waldump without sync:
- write data on primary 
- kill primary
- read the latest received WAL using pg_waldump:
0/1D019F38
- pg_last_wal_replay_lsn():
0/1D019F68

So it's wrong to use pg_waldump to determine the latest received LSN, at 
least without "forcing a flush of the system cache".

> If all your nodes are killed in the same disaster, how/why should an
> automatic cluster manager take care of starting all nodes again and pick
> the right node to promote?

1. How?
The automatic cluster manager will restart standbys in such a situation.
If the primary's lock in ZK is released, the automatic cluster manager starts 
the process of electing a new primary.
To decide which node should be promoted, the automatic cluster manager must 
get the LSN of the last wal_record written to disk by each potential new primary.
We used pg_last_wal_receive_lsn() for this, but that was a mistake: after 
"kill -9" on a standby, pg_last_wal_receive_lsn() reports the first LSN of the 
last segment.
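
For reference, a sketch of the probe the cluster manager could run on each
alive standby instead (assuming what we really need is the last record
written to disk):

SELECT pg_last_wal_replay_lsn();   -- advances only as records are applied
SELECT pg_last_wal_receive_lsn();  -- what we used before; after "kill -9"
                                   -- it reports the first LSN of the last
                                   -- segment, so it is unusable here

The node reporting the highest usable LSN wins the election.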

2. Why?
- a sleepy on-call engineer at night can do something bad in such a situation
- pg_waldump doesn't give the last LSN written to disk (at least without 
forcing a flush of the system cache), so I don't know how an on-call engineer 
could tell which standby should be promoted
- the automatic cluster manager has successfully resolved such situations in 
clusters with one designated synchronous standby for years, and we hope it's 
possible to do the same in clusters with quorum replication





Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Peter Eisentraut

On 2020-05-30 14:34, Andrew Dunstan wrote:


On 5/28/20 6:16 PM, Daniel Gustafsson wrote:


OpenSSL also deprecates DES keys in 3.0.0, which causes our password callback
tests to fail with the cryptic error "fetch failed", as the test suite keys are
encrypted with DES.  0002 fixes this by changing to AES256 (randomly chosen
among the ciphers supported in 1.0.1+ and likely to be around), and could be
applied already today as there is nothing 3.0.0 specific about it.



+1 for applying this forthwith. The key in my recent commit 896fcdb230
is encrypted with AES256.


I don't see anything in that commit about how to regenerate those files, 
such as a makefile rule.  Is that missing?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Incorrect comment in be-secure-openssl.c

2020-06-01 Thread Daniel Gustafsson
> On 1 Jun 2020, at 08:06, Michael Paquier  wrote:

> The problem I have with first and second flavors is that "DH
> parameters files" does not sound right.  First, the grammar sounds
> incorrect to me as in this case "parameters" should not be plural.

I think "parameters" is the right term here, as the shared secret is determines
a set of Diffie-Hellman parameters.

> Second, it is only possible to load one file with ssl_dh_params_file,
> and we only attempt to load this single file within initialize_dh().

That's correct, this is a leftover from when we allowed for different DH sizes
and loaded the appropriate file.  This was removed in c0a15e07cd718cb6e455e683
in favor of only using 2048.

> Of course it would be possible to just switch to "DH parameter file"
> in the first part of the sentence, but I have just finished by
> rewriting the whole thing, as the third flavor.

I don't have a problem with the existing wording of the first sentence, and I
don't have a problem with your suggestion either (as long as it's parameters in
plural).

cheers ./daniel



Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Peter Eisentraut

On 2020-05-31 04:52, Michael Paquier wrote:

593d4e4 claims that we only support OpenSSL >= 0.9.8, meaning that
down to PG 10 we have this requirement, and that PG 9.6 and 9.5 should
be able to work with OpenSSL 0.9.7 and 0.9.6, but little effort has
been put in testing these.


Then we can stick an OPENSSL_API_COMPAT=908 into at least PG10, 11, and 
12, and probably also into PG9.5 and 9.6 without harm.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Daniel Gustafsson
> On 30 May 2020, at 11:29, Peter Eisentraut  
> wrote:
> 
> On 2020-05-29 14:45, Daniel Gustafsson wrote:
>>> I think we should set OPENSSL_API_COMPAT=10001, and move that along with 
>>> whatever our oldest supported release is going forward.  That declares our 
>>> intention, it will silence the deprecation warnings, and IIUC, if the 
>>> deprecated stuff actually gets removed, you get a clean compiler error that 
>>> your API level is too low.
>> I think I know what you mean but just to clarify: in master, back-branches, or
>> all of the above?
> 
> I'm not sure.  I don't have a good sense of what OpenSSL versions we claim to 
> support in branches older than PG13.  We made a conscious decision for 1.0.1 
> in PG13, but I seem to recall that that discussion also revealed that the 
> version assumptions before that were quite inconsistent.  Code in PG12 and 
> before makes references to OpenSSL as old as 0.9.6.  But OpenSSL 3.0.0 will 
> reject a compat level older than 0.9.8.
> 
> My proposal would be to introduce OPENSSL_API_COMPAT=10001 into master after 
> the 13/14 branching, along with any other changes to make it compile cleanly 
> against OpenSSL 3.0.0.  Once that has survived some scrutiny from the 
> buildfarm and also from folks building against LibreSSL etc., it should 
> probably be backpatched into PG13.  In the immediate future, I wouldn't 
> bother about the older branches (<=PG12) at all.  As long as they still 
> compile, users can just disable deprecation warnings, and we may add some 
> patches to that effect at some point, but it's not like OpenSSL 3.0.0 will be 
> adopted into production builds any time soon.
> 
>> Considering how little effort it is to not use the deprecated API's I'm not
>> entirely convinced, but I don't have too strong opinions there.
> 
> Well, in the case like X509_STORE_load_locations(), the solution is in either 
> case to write a wrapper.  It doesn't matter if we write the wrapper or 
> OpenSSL writes the wrapper.  Only OpenSSL has already written the wrapper and 
> has created a well-defined way to declare that you want to use the wrapper, 
> so I'd just take that.

I'll buy that argument.

> In any case, using OPENSSL_API_COMPAT is also good just for our own 
> documentation, so we can keep track of what version we claim to support in 
> different branches.

Good point.

>>> There is also the question of what to do with the test suites in the back 
>>> branches.
>> If we don't want to change the testdata in the backbranches, we could add a
>> SKIP section for the password key tests iff OpenSSL is 3.0.0+?
> 
> I suggest to update the test data in PG13+, since we require OpenSSL 1.0.1 
> there.  For the older branches, I would look into changing the test driver 
> setup so that it loads a custom openssl.cnf that loads the legacy providers.

Ok, I'll roll a patch along these lines for master for ~ the 13/14 branch time
and then we'll see how we deal with PG13 once the dust has settled not only on
our side but for OpenSSL.

cheers ./daniel



[POC] Fast COPY FROM command for the table with foreign partitions

2020-06-01 Thread Andrey Lepikhov

Hi, hackers!

Currently I see that COPY FROM insertion into a partitioned table with 
foreign partitions is not optimal: even if the table constraints allow 
multi-insert copy, we flush the buffers and prepare a new INSERT 
query for each tuple routed into a foreign partition.
To solve this problem I tried to use the multi-insert buffers for 
foreign tuples too. These buffers are flushed by analogy with the 
'COPY .. FROM STDIN' machinery, as is done by the psql '\copy' 
command.
The attached patch was prepared from a private sketch developed 
by Arseny Sher a couple of years ago.

Benchmarks show that it speeds up the COPY FROM operation:
the command "COPY pgbench_accounts FROM ..." (test file contains 1e7 tuples, 
copied into three partitions) executes on my laptop in 14 minutes without 
the patch and in 1.5 minutes with the patch. The theoretical minimum here 
(with infinite buffer size) is 40 seconds.
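
For reference, a sketch of the kind of setup used (server names, partitioning
layout, and file path here are illustrative only):

CREATE EXTENSION postgres_fdw;
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.local', dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1;

CREATE TABLE pgbench_accounts (aid int, bid int, abalance int,
                               filler char(84))
    PARTITION BY HASH (aid);
CREATE FOREIGN TABLE pgbench_accounts_0
    PARTITION OF pgbench_accounts FOR VALUES WITH (MODULUS 3, REMAINDER 0)
    SERVER shard1 OPTIONS (table_name 'pgbench_accounts_0');
-- ... two more foreign partitions (REMAINDER 1 and 2) on other servers ...

COPY pgbench_accounts FROM '/tmp/accounts.csv' WITH (FORMAT csv);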


A couple of questions:
1. Can this feature be interesting for the PostgreSQL core?
2. If this is a useful feature, is the chosen approach correct?

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 6cc545a00f6f6b18926602880819eeed9313550b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 29 May 2020 10:39:57 +0500
Subject: [PATCH] Fast COPY FROM into the foreign (or sharded) table.

---
 contrib/postgres_fdw/deparse.c|  25 ++
 .../postgres_fdw/expected/postgres_fdw.out|   5 +-
 contrib/postgres_fdw/postgres_fdw.c   |  95 
 contrib/postgres_fdw/postgres_fdw.h   |   1 +
 src/backend/commands/copy.c   | 216 --
 src/include/commands/copy.h   |   5 +
 src/include/foreign/fdwapi.h  |   9 +
 7 files changed, 289 insertions(+), 67 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..427402c8eb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1758,6 +1758,31 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	int attnum;
+
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, " ( ");
+
+	for(attnum = 0; attnum < rel->rd_att->natts; attnum++)
+	{
+		appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname));
+
+		if (attnum != rel->rd_att->natts-1)
+			appendStringInfoString(buf, ", ");
+	}
+
+	appendStringInfoString(buf, " ) ");
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..5ae24fef7c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2 ( f1, f2 )  FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 +-
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..bd2a8f596f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 			 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,12 +352,16 @@ static TupleTableSlot *postgresExecForeignDelete(EState *estate,
  ResultRelInfo *resultRelInfo,
  TupleTableSlot *slot,
  TupleTableSlot *planSlot);
+static void postgresExecForeignCopy(ResultRelInfo *resultRelInfo,
+ TupleTableSlot *slot);
 static void postgresEndForeignModify(EState *estate,
 	 ResultRelInfo *resultRelInfo);
 static void postgresBeginForeignInsert(ModifyTableState *mtstate,
 	   ResultRelInfo *resultRelInfo);
 static void postgresEndForeignInsert(EState *estate,
 	 ResultRelInfo *resultRelInfo);
+static void postgresBeginForeignCopy(ResultRelInfo *resultRelInfo);
+static void postgresEndForeignCopy(ResultRelInfo *resultRelInfo, bool status)

Re: pg_dump dumps row level policies on extension tables

2020-06-01 Thread Masahiko Sawada
On Tue, 19 May 2020 at 15:31, Pavan Deolasee  wrote:
>
> Hi,
>
> I noticed that if a row level policy is defined on an extension
> object, even in the extension creation script, pg_dump dumps a
> separate CREATE POLICY statement for such policies. That makes the
> dump unrestorable because the CREATE EXTENSION and CREATE POLICY then
> conflict.
>
> Here is a simple example. I just abused the pageinspect contrib module
> to demonstrate the problem.
>
> ```
> diff --git a/contrib/pageinspect/pageinspect--1.5.sql
> b/contrib/pageinspect/pageinspect--1.5.sql
> index 1e40c3c97e..f04d70d1c1 100644
> --- a/contrib/pageinspect/pageinspect--1.5.sql
> +++ b/contrib/pageinspect/pageinspect--1.5.sql
> @@ -277,3 +277,9 @@ CREATE FUNCTION gin_leafpage_items(IN page bytea,
>  RETURNS SETOF record
>  AS 'MODULE_PATHNAME', 'gin_leafpage_items'
>  LANGUAGE C STRICT PARALLEL SAFE;
> +
> +-- sample table
> +CREATE TABLE pf_testtab (a int, b int);
> +-- sample policy
> +CREATE POLICY p1 ON pf_testtab
> +FOR SELECT USING (true);
> ```
>
> If I now take a dump of a database with pageinspect extension created,
> the dump has the following.
>
> ```
>
> --
> -- Name: pageinspect; Type: EXTENSION; Schema: -; Owner:
> --
>
> CREATE EXTENSION IF NOT EXISTS pageinspect WITH SCHEMA public;
>
> --
> -- Name: pf_testtab p1; Type: POLICY; Schema: public; Owner: pavan
> --
>
> CREATE POLICY p1 ON public.pf_testtab FOR SELECT USING (true);
>
> ```
>
> That's a problem. The CREATE POLICY statement fails during restore
> because CREATE EXTENSION already creates the policy.
>
> Are we missing recording a dependency on the extension for row level
> policies? Or should pg_dump somehow skip dumping those policies?
>

I think we don't support this case as the comment in
checkExtensionMembership() describes:

/*
 * In 9.6 and above, mark the member object to have any non-initial ACL,
 * policies, and security labels dumped.
 *
 * Note that any initial ACLs (see pg_init_privs) will be removed when we
 * extract the information about the object.  We don't provide support for
 * initial policies and security labels and it seems unlikely for those to
 * ever exist, but we may have to revisit this later.
 *
 * Prior to 9.6, we do not include any extension member components.
 *
 * In binary upgrades, we still dump all components of the members
 * individually, since the idea is to exactly reproduce the database
 * contents rather than replace the extension contents with something
 * different.
 */

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Andrew Dunstan


On 6/1/20 4:33 AM, Peter Eisentraut wrote:
> On 2020-05-30 14:34, Andrew Dunstan wrote:
>>
>> On 5/28/20 6:16 PM, Daniel Gustafsson wrote:
>>>
>>> OpenSSL also deprecates DES keys in 3.0.0, which causes our password
>>> callback
>>> tests to fail with the cryptic error "fetch failed", as the test
>>> suite keys are
>>> encrypted with DES.  0002 fixes this by changing to AES256 (randomly
>>> chosen
>>> among the ciphers supported in 1.0.1+ and likely to be around), and
>>> could be
>>> applied already today as there is nothing 3.0.0 specific about it.
>>>
>>
>> +1 for applying this forthwith. The key in my recent commit 896fcdb230
>> is encrypted with AES256.
>
> I don't see anything in that commit about how to regenerate those
> files, such as a makefile rule.  Is that missing?



You missed these comments in the test file:


# self-signed cert was generated like this:
# system('openssl req -new -x509 -days 1 -nodes -out server.crt
-keyout server.ckey -subj "/CN=localhost"');
# add the cleartext passphrase to the key, remove the unprotected key
# system("openssl rsa -aes256 -in server.ckey -out server.key -passout
pass:$clearpass");
# unlink "server.ckey";


If you want I can add a rule for it to the Makefile, although who knows
what commands will actually apply when the certificate runs out?


cheers


andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Daniel Gustafsson
> On 1 Jun 2020, at 13:58, Andrew Dunstan  
> wrote:

> If you want I can add a rule for it to the Makefile, although who knows
> what commands will actually apply when the certificate runs out?

Being able to easily regenerate the testdata, regardless of expiration status,
has proven very helpful for me when implementing support for new TLS backends.
+1 for adding it to the Makefile.

cheers ./daniel



Wrong width of UNION statement

2020-06-01 Thread Kenichiro Tanaka
Hello hackers,

I think I found a bug in the estimation of a table column's width when I
run SQL with a UNION statement.

I think the column width of a UNION statement should equal that of UNION ALL,
but they don't match. This can be reproduced on HEAD.

See the following example.

--CREATE TEST TABLE
DROP TABLE union_test;DROP TABLE union_test2;
CREATE TABLE union_test  AS SELECT md5(g::text)::char(84) as data FROM
generate_series(1,1000) as g;
CREATE TABLE union_test2 AS SELECT md5(g::text)::char(84) as data FROM
generate_series(1,1000) as g;
ANALYZE union_test;
ANALYZE union_test2;

--width of union_test is 85.
EXPLAIN ANALYZE SELECT * FROM union_test;
 QUERY PLAN

 Seq Scan on union_test  (cost=0.00..25.00 rows=1000 width=85) (actual
time=0.591..1.166 rows=1000 loops=1)
 Planning Time: 10.559 ms
 Execution Time: 2.974 ms
(3 rows)

--width of UNION is 340(wrong)
EXPLAIN ANALYZE
SELECT * FROM union_test
UNION
SELECT * FROM union_test2;

QUERY PLAN
-
HashAggregate  (cost=85.00..105.00 rows=2000 width=*340*) (actual
time=3.323..3.672 rows=1000 loops=1)
Group Key: union_test.data
Peak Memory Usage: 369 kB
->  Append  (cost=0.00..80.00 rows=2000 width=340) (actual
time=0.021..1.191 rows=2000 loops=1)
->  Seq Scan on union_test  (cost=0.00..25.00 rows=1000 width=85)
(actual time=0.019..0.393 rows=1000 loops=1)
->  Seq Scan on union_test2  (cost=0.00..25.00 rows=1000 width=85)
(actual time=0.027..0.302 rows=1000 loops=1)
Planning Time: 0.096 ms
Execution Time: 3.908 ms
(8 rows)

--width of UNION ALL is 85
EXPLAIN ANALYZE
SELECT * FROM union_test
UNION ALL
SELECT * FROM union_test2;

QUERY PLAN
---
Append  (cost=0.00..60.00 rows=2000 width=85) (actual
time=0.017..1.187 rows=2000 loops=1)
->  Seq Scan on union_test  (cost=0.00..25.00 rows=1000 width=85)
(actual time=0.017..0.251 rows=1000 loops=1)
->  Seq Scan on union_test2  (cost=0.00..25.00 rows=1000 width=85)
(actual time=0.018..0.401 rows=1000 loops=1)
Planning Time: 0.213 ms
Execution Time: 1.444 ms
(5 rows)

I think this is a bug, is that right?

Regards
Kenichiro Tanaka.




Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Andrew Dunstan

On 6/1/20 8:03 AM, Daniel Gustafsson wrote:
>> On 1 Jun 2020, at 13:58, Andrew Dunstan  
>> wrote:
>> If you want I can add a rule for it to the Makefile, although who knows
>> what commands will actually apply when the certificate runs out?
> Being able to easily regenerate the testdata, regardless of expiration status,
> has proven very helpful for me when implementing support for new TLS backends.
> +1 for adding it to the Makefile.
>


OK, here's a patch.


cheers


andrew


-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/test/modules/ssl_passphrase_callback/Makefile b/src/test/modules/ssl_passphrase_callback/Makefile
index a3b518b50d..f81265c296 100644
--- a/src/test/modules/ssl_passphrase_callback/Makefile
+++ b/src/test/modules/ssl_passphrase_callback/Makefile
@@ -20,3 +20,21 @@ include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
 SHLIB_LINK += $(filter -lssl -lcrypto -lssleay32 -leay32, $(LIBS))
+
+# Targets to generate or remove the ssl certificate and key
+# Normally not needed. Don't run these targets in a vpath build, the results
+# won't be in the right place if you do.
+
+# needs to agree with what's in the test script
+PASS = FooBaR1
+
+.PHONY: ssl-files ssl-files-clean
+
+ssl-files:
+	openssl req -new -x509 -days 1 -nodes -out server.crt \
+		-keyout server.ckey -subj "/CN=localhost"
+	openssl rsa -aes256 -in server.ckey -out server.key -passout pass:$(PASS)
+	rm server.ckey
+
+ssl-files-clean:
+	rm -f server.crt server.key


Re: Inlining of couple of functions in pl_exec.c improves performance

2020-06-01 Thread Amit Khandekar
On Mon, 1 Jun 2020 at 12:27, Pavel Stehule  wrote:
> On Mon, Jun 1, 2020 at 8:15, Amit Khandekar  wrote:
> napsal:
>>
>> On Sat, 30 May 2020 at 11:11, Pavel Stehule  wrote:
>> > I think the effect of these patches strongly depends on CPU and compiler
>>
>> I quickly tried pi() with gcc 10 as well, and saw more or less the
>> same benefit. I think we are bound to see some differences in the
>> benefits across architectures, kernels and compilers; but it looks like
>> some benefit is always there.
>>
>> > but it is a micro-optimization, and when I look at the profiler, the
>> > bottleneck is elsewhere.
>>
>> Please check the perf numbers in my reply to Michael. I suppose you
>> meant CachedPlanIsSimplyValid() when you say the bottleneck is
>> elsewhere? Yeah, this function is always the hottest spot, which I
>> recall is being discussed elsewhere. But I always see exec_stmt() and
>> exec_assign_value as the next functions.
>
>
> It is hard to read the profile result, because these functions are nested 
> together. For your example
>
> 18.22%  postgres  postgres   [.] CachedPlanIsSimplyValid
>
> is a little bit strange, and probably this is the real bottleneck in your simple 
> example; maybe some work can be done there, because you assign just a 
> constant.

I had earlier taken a quick look at this one. CachedPlanIsSimplyValid()
was, I recall, hitting a hotspot when it tried to access
plansource->search_path (possibly a cacheline miss). But I didn't get a
chance to dig further into that. For now, I am focusing on the other
functions for which the patches were submitted.


>
> On the other hand, your example is pretty unrealistic - and against any 
> developer's best practices for writing loops.
>
> I think we can look at PostGIS, where there are some compute-heavy routines in 
> PL/pgSQL, and we can look at real profiles.
>
> Probably most people will benefit from these optimizations.

I understand it's not a real-world example. For generating perf
figures, I had to use an example which amplifies the benefits, so that
the effect of the patches on the perf figures also becomes visible;
hence I used that example. I had shown the benefits up-thread using a
practical function, avg_space(), but the perf figures for that example
were varying a lot.

So below, what I did was: run avg_space() ~150 times and take a
perf report. This stabilized the results a bit:

HEAD :
+   16.10%  17.29%  16.82%  postgres  postgres    [.] ExecInterpExpr
+   13.80%  13.56%  14.49%  postgres  plpgsql.so  [.] exec_assign_value
+   12.64%  12.10%  12.09%  postgres  plpgsql.so  [.] plpgsql_param_eval_var
+   12.15%  11.28%  11.05%  postgres  plpgsql.so  [.] exec_stmt
+   10.81%  10.24%  10.55%  postgres  plpgsql.so  [.] exec_eval_expr
+    9.50%   9.35%   9.37%  postgres  plpgsql.so  [.] exec_cast_value
.
+    1.19%   1.06%   1.21%  postgres  plpgsql.so  [.] exec_stmts


0001+0002 patches applied (i.e. inline exec_stmt) :
+   16.90%  17.20%  16.54%  postgres  postgres    [.] ExecInterpExpr
+   16.42%  15.37%  15.28%  postgres  plpgsql.so  [.] exec_assign_value
+   11.34%  11.92%  11.93%  postgres  plpgsql.so  [.] plpgsql_param_eval_var
+   11.18%  11.86%  10.99%  postgres  plpgsql.so  [.] exec_stmts.part.0
+   10.51%   9.52%  10.61%  postgres  plpgsql.so  [.] exec_eval_expr
+    9.39%   9.48%   9.30%  postgres  plpgsql.so  [.] exec_cast_value

HEAD : exec_stmts + exec_stmt = ~12.7 %
Patched (0001+0002): exec_stmts = 11.3 %

Just 0003 patch applied (i.e. inline exec_cast_value) :
+   17.00%  16.77%  17.09% postgres  postgres   [.] ExecInterpExpr
+   15.21%  15.64%  15.09% postgres  plpgsql.so [.] exec_assign_value
+   14.48%  14.06%  13.94% postgres  plpgsql.so [.] exec_stmt
+   13.26%  13.30%  13.14% postgres  plpgsql.so [.] plpgsql_param_eval_var
+   11.48%  11.64%  12.66% postgres  plpgsql.so [.] exec_eval_expr

+    1.03%   0.85%   0.87% postgres  plpgsql.so [.] exec_stmts

HEAD : exec_assign_value + exec_cast_value = ~23.4 %
Patched (0003): exec_assign_value = 15.3%


Time in milliseconds after calling avg_space() 150 times :
HEAD  : 7210
Patch 0001+0002 : 6925
Patch 0003 : 6670
Patch 0001+0002+0003 : 6346




Re: OpenSSL 3.0.0 compatibility

2020-06-01 Thread Tom Lane
Andrew Dunstan  writes:
> On 6/1/20 8:03 AM, Daniel Gustafsson wrote:
>> +1 for adding it to the Makefile.

> OK, here's a patch.

Likewise +1 for having it in the makefile.  But now you have two copies,
the other being in comments in the test script.  The latter should go
away, as we surely won't remember to maintain it.  (You could replace it
with a pointer to the makefile rules if you want.)

regards, tom lane




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread John Bachir


On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:

> If you do it right, you can see a DEBUG:

> postgres=# SET client_min_messages=debug;
> postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
> DEBUG:  existing constraints on column "tn"."i" are sufficient to prove 
> that it does not contain nulls

Thanks! I'll add that to my recipe for the future. Although by that time it 
would be too late, so to make use of this I would have to set up a cloned test 
environment and hope that all conditions are correctly cloned. Is there a way 
to check sufficiency before running the command?
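
For the archives, here is the recipe I understand that DEBUG message to rely
on (the constraint name is mine):

SET client_min_messages = debug;
ALTER TABLE tn ADD CONSTRAINT tn_i_not_null CHECK (i IS NOT NULL) NOT VALID;
ALTER TABLE tn VALIDATE CONSTRAINT tn_i_not_null;  -- scans the table, but
                                                   -- without blocking writes
ALTER TABLE tn ALTER i SET NOT NULL;  -- should skip the scan and emit the
                                      -- DEBUG line
ALTER TABLE tn DROP CONSTRAINT tn_i_not_null;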


> That the duration decreased every time may have been due to caching?
> How big is the table vs RAM ?

The table is about 10 gigs and the machine has 16 gigs. I'm hoping the OS & PG 
did not decide to kick out everything else from RAM when doing the operation. 
But even with caching, the final command taking 20ms while the first 2 commands 
took the same time as a table scan seems like something other than caching is 
at play here. IDK!

> Do you know if the SET NOT NULL blocked or not ?
> Maybe something else had a nontrivial lock on the table, and those commands
> were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> log_lock_waits=on;", then you could see that.

I don't know if it blocked. Great idea! I'll add that to my recipe as well.

John


p.s. current recipe: 
https://gist.github.com/jjb/fab5cc5f0e1b23af28694db4fc01c55a
p.p.s. I think one of the biggest surprises was that setting the NOT NULL 
condition was slow. That's totally unrelated to this feature and out of 
scope for this list, though; I asked about it here: 
https://dba.stackexchange.com/questions/268301/why-is-add-constraint-not-valid-taking-a-long-time




Re: Wrong width of UNION statement

2020-06-01 Thread Tom Lane
Kenichiro Tanaka  writes:
> I think the column width of a UNION statement should equal that of UNION 
> ALL.

I don't buy that argument, because there could be type coercions involved,
so that the result width isn't necessarily equal to any one of the inputs.

Having said that, the example you give shows that we make use of
pg_statistic.stawidth values when estimating the width of immediate
relation outputs, but that data isn't available by the time we get to
a UNION output.  So we fall back to get_typavgwidth, which in this
case is going to produce something involving the typmod times the
maximum encoding length.  (I suppose you're using UTF8 encoding...)
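
(To make that concrete, assuming I read get_typavgwidth correctly: char(84)
carries typmod 84 + 4 = 88, so the fallback estimate is (88 - 4) * 4 bytes
per UTF8 character plus 4 bytes of varlena header = 340, exactly the width
shown in the plan.)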

There's room for improvement there, but this is all bound up in the legacy
mess that we have in prepunion.c.  For example, because we don't have
RelOptInfo nodes associated with individual set-operation outputs, it's
difficult to figure out where we might store data about the widths of such
outputs.  Nor could we easily access the data if we had it, since the
associated Vars don't have valid RTE indexes.  So to my mind, that code
needs to be thrown away and rewritten, using actual relations to represent
the different setop results and Paths to represent possible computations.
In the meantime, it's hard to get excited about layering some additional
hacks on top of what's there now.

regards, tom lane




Small code cleanup

2020-06-01 Thread Mark Dilger
One line change to remove a duplicate check.



v1-0001-Code-cleanup.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Compatible defaults for LEAD/LAG

2020-06-01 Thread Tom Lane
Pavel Stehule  writes:
> On Mon, Jun 1, 2020 at 4:07, Tom Lane  wrote:
>> That's just the tip of the iceberg, though.  If you consider all the
>> old-style polymorphic types, we have [for example]
>> array_append(anyarray,anyelement)

> I am not sure, if using anycompatible for buildin's array functions can be
> good idea. Theoretically a users can do new performance errors due hidden
> cast of a longer array.

I don't buy that argument.  If the query requires casting int4[] to
int8[], making the user do it by hand isn't going to improve performance
over having the parser insert the coercion automatically.  Sure, there
will be some fraction of queries that could be rewritten to avoid the
need for any cast, but so what?  Often the performance difference isn't
going to matter; and when it does, I don't see that this is any different
from hundreds of other cases in which there are more-efficient and
less-efficient ways to write a query.  SQL is full of performance traps
and always will be.  You're also assuming that when the user gets an
"operator does not exist" error from "int4[] || int8", that will magically
lead them to choosing an optimal substitute.  I know of no reason to
believe that --- it's at least as likely that they'll conclude it just
can't be done, as in the LAG() example we started the thread with.  So
I think most people would be much happier if the system just did something
reasonable, and they can optimize later if it's important.
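
To make the trap concrete, here is a sketch of the difference (assuming
array_append were redeclared over the anycompatible family; today it is
anyarray/anyelement):

SELECT array_append('{1,2}'::int4[], 3::int8);
-- anyarray/anyelement:  ERROR:  function array_append(integer[], bigint)
--                        does not exist
-- anycompatiblearray/anycompatible: the parser coerces the array to int8[]
--                        and the call just works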

> When I
> thought about these cases, and about designing functions compatible with
> Oracle, I thought about another generic family (families) with different
> behavior (specified by suffix or maybe with typemod or with some syntax):

So we're already deciding anycompatible can't get the job done?  Maybe
it's a good thing we had this conversation now.  It's not too late to
rip the feature out of v13 altogether, and try again later.  But if
you think I'm going to commit yet another variant of polymorphism on
top of this one, you're mistaken.

regards, tom lane




Re: Small code cleanup

2020-06-01 Thread Tom Lane
Mark Dilger  writes:
> One line change to remove a duplicate check.

The comment just above this mentions a connection to the "Finish printing
the footer information about a table" stanza below.  I think some work is
needed to clarify what's going on there --- it doesn't seem actually
buggy, but there are multiple lies embedded in these comments.  I'm also
questioning somebody's decision to wedge partitioning into this logic
without refactoring any existing if's, as they seem to have done.  At the
very least we're issuing useless queries here, for instance looking for
inheritance parents of matviews.

regards, tom lane




Re: Small code cleanup

2020-06-01 Thread Mark Dilger



> On Jun 1, 2020, at 8:53 AM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> One line change to remove a duplicate check.
> 
> The comment just above this mentions a connection to the "Finish printing
> the footer information about a table" stanza below.  I think some work is
> needed to clarify what's going on there --- it doesn't seem actually
> buggy, but there are multiple lies embedded in these comments.  I'm also
> questioning somebody's decision to wedge partitioning into this logic
> without refactoring any existing if's, as they seem to have done.  At the
> very least we're issuing useless queries here, for instance looking for
> inheritance parents of matviews.

Yeah, I noticed from the `git blame` last night, when writing the patch, that you had 
originally written the code around 2017, and that the duplication was introduced 
in a patch committed by others around 2018.  I was hoping that you, as the 
original author, or somebody involved in the 2018 patch, might have a deeper 
understanding of what's being done and volunteer to clean up the comments.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Compatible defaults for LEAD/LAG

2020-06-01 Thread Pavel Stehule
On Mon, Jun 1, 2020 at 17:36, Tom Lane  wrote:

> Pavel Stehule  writes:
> > On Mon, Jun 1, 2020 at 4:07, Tom Lane  wrote:
> >> That's just the tip of the iceberg, though.  If you consider all the
> >> old-style polymorphic types, we have [for example]
> >> array_append(anyarray,anyelement)
>
> > I am not sure if using anycompatible for built-in array functions is a
> > good idea. Theoretically users can make new performance errors due to a
> > hidden cast of a longer array.
>
> I don't buy that argument.  If the query requires casting int4[] to
> int8[], making the user do it by hand isn't going to improve performance
> over having the parser insert the coercion automatically.  Sure, there
> will be some fraction of queries that could be rewritten to avoid the
> need for any cast, but so what?  Often the performance difference isn't
> going to matter; and when it does, I don't see that this is any different
> from hundreds of other cases in which there are more-efficient and
> less-efficient ways to write a query.  SQL is full of performance traps
> and always will be.  You're also assuming that when the user gets an
> "operator does not exist" error from "int4[] || int8", that will magically
> lead them to choosing an optimal substitute.  I know of no reason to
> believe that --- it's at least as likely that they'll conclude it just
> can't be done, as in the LAG() example we started the thread with.  So
> I think most people would be much happier if the system just did something
> reasonable, and they can optimize later if it's important.
>
> > When I
> > thought about these cases, and about designing functions compatible with
> > Oracle, I thought about another generic family (families) with different
> > behavior (specified by suffix or maybe with typemod or with some syntax):
>
> So we're already deciding anycompatible can't get the job done?  Maybe
> it's a good thing we had this conversation now.  It's not too late to
> rip the feature out of v13 altogether, and try again later.  But if
> you think I'm going to commit yet another variant of polymorphism on
> top of this one, you're mistaken.
>

The anycompatible types are fully consistent with how the parser treats the
Postgres built-in functions. That is the main and most important benefit.
Without anycompatible types it is pretty hard to write variadic functions with
friendly behavior like the built-in functions have.
It would be perfect for the LAG() example: it does almost the same work that
we do manually in the parser.
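
For example (a sketch; the signatures are the before/after of what this
thread proposes, not something already committed):

SELECT lag(x, 1, 0) OVER (ORDER BY x)
  FROM (VALUES (1.5), (2.5)) v(x);
-- lag(anyelement, integer, anyelement): fails, because the integer default 0
-- cannot be unified with the numeric column
-- lag(anycompatible, integer, anycompatible): both resolve to numeric and
-- the default just works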

Surely, it is not compatible with Oracle's polymorphism rules, because

a) Oracle's and Postgres's type systems are different (sometimes very different).

b) Oracle's casting rules depend on argument positions and some specific
exceptions - so it is not possible to map them onto the Postgres type system (or
the system of polymorphic types).

I wrote this feature and spent a lot of time on it so that it would be used - by
core developers and by extension developers. Like a lot of other features, it
can bring more comfort with some risk of performance issues.

On the other hand, if we use this feature for built-in functions, there is zero
impact on current applications - there should not be any problem with
compatibility or performance.

Maybe I am too old, and last year was too terrible, so I have a problem
imagining an "intelligent" user now :)

Regards

Pavel

Although I can imagine other enhancements of polymorphic types, I don't
propose any new ones, and I don't expect any enhancements in the near future.
It is true that there are no requests, and I think "anycompatible" and "any" are
more than good enough for 99.99% of developers.

I am a little bit surprised that the semi-compatibility mode implemented in
Orafce is good enough, and that full compatibility with Oracle a) isn't
possible, b) isn't requested by a visible group of users (or the users who need
it are satisfied by EDB).



>
> regards, tom lane
>


Re: Small code cleanup

2020-06-01 Thread Tom Lane
Mark Dilger  writes:
> Yeah, I noticed from the `git blame` last night, when writing the patch, that you 
> had originally written the code around 2017, and that the duplication was 
> introduced in a patch committed by others around 2018.  I was hoping that 
> you, as the original author, or somebody involved in the 2018 patch, might 
> have a deeper understanding of what's being done and volunteer to clean up 
> the comments.

I don't think there's any deep dark mystery here.  We have a collection of
things we need to do, each one applying to some subset of relkinds, and
the issue is how to express the control logic in a maintainable and
not-too-confusing way.  Unfortunately people have pasted in new things
with little focus on "not too confusing" and more focus on "how can I make
this individual patch as short as possible".  It's probably time to take a
step back and refactor.

My immediate annoyance was that the "Finish printing the footer
information about a table" comment has been made a lie by adding
partitioned indexes to the set of relkinds handled; I can cope with
considering a matview to be a table, but surely an index is not.  Plus, if
partitioned indexes need to be handled here, why not also regular indexes?
The lack of any comments explaining this is really not good.

I'm inclined to think that maybe having that overall if-test just after
that comment is obsolete, and we ought to break it down into separate
segments.  For instance there's no obvious reason why the first
"print foreign server name" stanza should be inside that if-test;
and the sections related to partitioning would be better off restricted
to relkinds that, um, can have partitions.

I have to admit that I don't any longer see what the connection is
between the two "footer information about a table" sections.  Maybe
it was more obvious before all the partitioning stuff got shoved in,
or maybe there never was any essential connection.

Anyway the whole thing is overdue for a cosmetic workover.  Do you
want to have a go at that?

regards, tom lane




Re: Small code cleanup

2020-06-01 Thread Mark Dilger



> On Jun 1, 2020, at 9:59 AM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> Yeah, I noticed from the `git blame` last night, when writing the patch, that you 
>> had originally written the code around 2017, and that the duplication was 
>> introduced in a patch committed by others around 2018.  I was hoping that 
>> you, as the original author, or somebody involved in the 2018 patch, might 
>> have a deeper understanding of what's being done and volunteer to clean up 
>> the comments.
> 
> I don't think there's any deep dark mystery here.  We have a collection of
> things we need to do, each one applying to some subset of relkinds, and
> the issue is how to express the control logic in a maintainable and
> not-too-confusing way.  Unfortunately people have pasted in new things
> with little focus on "not too confusing" and more focus on "how can I make
> this individual patch as short as possible".  It's probably time to take a
> step back and refactor.
> 
> My immediate annoyance was that the "Finish printing the footer
> information about a table" comment has been made a lie by adding
> partitioned indexes to the set of relkinds handled; I can cope with
> considering a matview to be a table, but surely an index is not.  Plus, if
> partitioned indexes need to be handled here, why not also regular indexes?
> The lack of any comments explaining this is really not good.
> 
> I'm inclined to think that maybe having that overall if-test just after
> that comment is obsolete, and we ought to break it down into separate
> segments.  For instance there's no obvious reason why the first
> "print foreign server name" stanza should be inside that if-test;
> and the sections related to partitioning would be better off restricted
> to relkinds that, um, can have partitions.
> 
> I have to admit that I don't any longer see what the connection is
> between the two "footer information about a table" sections.  Maybe
> it was more obvious before all the partitioning stuff got shoved in,
> or maybe there never was any essential connection.
> 
> Anyway the whole thing is overdue for a cosmetic workover.  Do you
> want to have a go at that?

Ok, sure, I'll see if I can clean that up.  I ran into this while doing some 
refactoring of about 160 files, so I wasn't really focused on this particular 
file, or its features.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Read access for pg_monitor to pg_replication_origin_status view

2020-06-01 Thread Martín Marqués
Hi,

It took me a bit longer than expected, but here is a new version, now
with the idea of just removing the superuser() check and REVOKEing
execution of the functions from public. At the end I grant permissions
on the functions and on the pg_replication_origin_status view.

I wonder now if I needed to GRANT execution of the functions. A grant
on the view should be enough.

I'll think about it.

On Sun, May 31, 2020 at 12:13, Martín Marqués
(mar...@2ndquadrant.com) wrote:
>
> Hi Michael,
>
> > Wouldn't it be just better to remove this hardcoded superuser check
> > and replace it with equivalent ACLs by default?  The trick is to make
> > sure that any function calling replorigin_check_prerequisites() has
> > its execution correctly revoked from public.  See for example
> > e79350fe.
>
> Looking at that, it seems a better solution. Let me wrap up a new
> patch, likely later today or early tomorrow as it's Sunday ;-)
>
> --
> Martín Marqués  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services



-- 

Martín Marqués  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From 945cd1847d624da64e15e3bdbbabcf34a04f848f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?=
 
Date: Mon, 1 Jun 2020 08:39:54 -0300
Subject: [PATCH 1/2] Access to the pg_replication_origin_status view was
 restricted to superusers because it uses
 pg_show_replication_origin_status(), and all replication origin
 functions require superuser.

This patch removes the hardcoded superuser check and relies on
ACL checks (we revoke execution of such functions from public).
---
 src/backend/catalog/system_views.sql | 20 
 src/backend/replication/logical/origin.c |  5 -
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 56420bbc9d6..a95f30ccaed 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1469,6 +1469,21 @@ REVOKE EXECUTE ON FUNCTION pg_stat_file(text,boolean) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_ls_dir(text) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
 
+--
+-- Permission to execute Replication Origin modification functions should be REVOKEd from public
+--
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_advance(text, pg_lsn) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_create(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_drop(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_oid(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_progress(text, boolean) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_is_setup() FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_progress(boolean) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_reset() FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_xact_reset() FROM public;
+REVOKE EXECUTE ON FUNCTION pg_replication_origin_xact_setup(pg_lsn, timestamp with time zone) FROM public;
+
 --
 -- We also set up some things as accessible to standard roles.
 --
@@ -1478,6 +1493,11 @@ GRANT EXECUTE ON FUNCTION pg_ls_archive_statusdir() TO pg_monitor;
 GRANT EXECUTE ON FUNCTION pg_ls_tmpdir() TO pg_monitor;
 GRANT EXECUTE ON FUNCTION pg_ls_tmpdir(oid) TO pg_monitor;
 
+GRANT EXECUTE ON FUNCTION pg_replication_origin_progress(text, boolean) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_replication_origin_session_progress(boolean) TO pg_monitor;
+
 GRANT pg_read_all_settings TO pg_monitor;
 GRANT pg_read_all_stats TO pg_monitor;
 GRANT pg_stat_scan_tables TO pg_monitor;
+
+--GRANT SELECT ON pg_replication_origin_status TO pg_monitor;
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 981d60f135d..1f223daf21f 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -182,11 +182,6 @@ static ReplicationState *session_replication_state = NULL;
 static void
 replorigin_check_prerequisites(bool check_slots, bool recoveryOK)
 {
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("only superusers can query or manipulate replication origins")));
-
 	if (check_slots && max_replication_slots == 0)
 		ereport(ERROR,
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-- 
2.21.3


From 4016d5f817a202ea272eb3ef8a9f17dfec2b66ec Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?=
 
Date: Mon, 1 Jun 2020 14:46:11 -0300
Subject: [PATCH 2/2] Forgot to REVOKE permissions on
 pg_show_replication_origin_status() from PUBLIC.

After testing, I uncommented the GRANT SELECT over the pg_replication_origin_s

Re: Just for fun: Postgres 20?

2020-06-01 Thread Robert Haas
On Wed, Feb 12, 2020 at 11:25 AM Juan José Santamaría Flecha
 wrote:
> On Wed, Feb 12, 2020 at 3:47 PM Tom Lane  wrote:
>> Yeah; I don't think it's *that* unlikely for it to happen again.  But
>> my own principal concern about this mirrors what somebody else already
>> pointed out: the one-major-release-per-year schedule is not engraved on
>> any stone tablets.  So I don't want to go to a release numbering system
>> that depends on us doing it that way for the rest of time.
>
> We could use the year as version identifier, so people will not expect 
> consecutive numbering. SQL Server is being released every couple of years and 
> they are using this naming schema. The problem would be releasing twice in the 
> same year, but how likely would that be?

As has already been pointed out, it could definitely happen, but we
could solve that by just using a longer version number, say, including
the month and, in case we ever do multiple major releases in the same
month, also the day. In fact, we might as well take it one step
further and use the same format for the release number that we use for
CATALOG_VERSION_NO: YYYYMMDDN. So this fall, piggybacking on the
success of PostgreSQL 10, 11, and 12, we could then release
PostgreSQL 202009241 or so.  As catversion.h wisely points out,
there's room to hope that we'll never commit 10 independent sets of
catalog changes on the same day, and I think we can also hope we'll
never do more than ten major releases on the same day. Admittedly,
skipping the version number by 200 million or so might seem like an
overreaction to the purported unluckiness of the number 13, but just
think how many OTHER unlucky numbers we'd also skip in the process.

/me runs away and hides.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Just for fun: Postgres 20?

2020-06-01 Thread Tom Lane
Robert Haas  writes:
> As has already been pointed out, it could definitely happen, but we
> could solve that by just using a longer version number, say, including
> the month and, in case we ever do multiple major releases in the same
> month, also the day. In fact, we might as well take it one step
> further and use the same format for the release number that we use for
> CATALOG_VERSION_NO: YYYYMMDDN. So this fall, piggybacking on the
> success of PostgreSQL 10, 11, and 12, we could then release
> PostgreSQL 202009241 or so.

But then where do you put the minor number for maintenance releases?

regards, tom lane




Re: Speeding up parts of the planner using a binary search tree structure for nodes

2020-06-01 Thread David Rowley
On Sat, 30 May 2020 at 01:52, Ashutosh Bapat
 wrote:
>
> On Fri, May 29, 2020 at 10:47 AM David Rowley  wrote:
> > In [1] I mentioned that I'd like to look into coding a data structure
> > to allow Node types to be looked up more efficiently than what List
> > allows.  There are many places in the planner where we must determine
> > if a given Node is in some List of Nodes. This is an O(n) operation.
> > We could reduce these to as low as O(log n) with a binary search tree.
> >
> > A few places in the code that does this which can become a problem
> > when the lists being searched are large:
> >
> > get_eclass_for_sort_expr()
>
> eclass members have relids as their key. Can we use a hash table instead,
> like how we manage join relations? I know that there are other cases
> where we search for a subset of relids instead of an exact match, so the
> hash table has to account for that. If we could somehow create a hash
> value out of an expression node (we hash almost anything, so that
> should be possible) we should be able to use a hash table for searching
> expressions as well.

This certainly could be done with hash tables. However, I feel it
raises the bar a little as it means we either need to maintain a hash
function for each node type or do some pre-processor magic in
equalfuncs.c to adjust the comparison macros into hashing macros to
allow it to be compiled again to implement hash functions.  If
everyone feels that's an okay thing to go and do then perhaps hash
tables are a better option.  I was just trying to keep the bar to some
level that I thought might be acceptable to the community.

It does seem likely to me that hash tables would be a more efficient
structure, but the gains might not be as big as you think. To perform
a lookup in the table we need to hash the entire node to get the hash
value, then perform at least one equal comparison.  With the Binary
Search Lists, we'd need more comparisons, but those can be partial
comparisons as they can abort early when we find the first mismatching
field in the Node. When the number of items in the list is fairly
small that might actually be less work, especially when the Node type
varies (since that's the first field we compare). Hash tables are
likely to become more efficient when the number of items is larger.
The general case is that we'll just have a small number of items. I'd
like to improve the less common situation when the number of items is
large with minimal impact for when the number of items is small.

> > tlist_member()
>
> a hash table with the expression as the key here as well?

The order of the tlist does matter in many cases. I'm unsure how we
could track the order that items were added to the hash table and then
obtain the items back in that order in an efficient way. I imagine
we'd need to store the item in some subsequent data structure, such as
a List. There's obviously going to be some overhead to doing that.
The idea with the Binary Search Lists was that items would be stored
in an array, similar to List, but the cells of the list would contain
the offset index to its parent and left and right child.  New items
would always take the next free slot in the list, just like List does
so we'd easily be able to get the insert order by looping over the
array of elements in order.

David

> > [1] 
> > https://www.postgresql.org/message-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug%40mail.gmail.com




Re: Another modest proposal for docs formatting: catalog descriptions

2020-06-01 Thread Josef Šimánek
On Tue, Jun 2, 2020 at 0:30, Tom Lane  wrote:

> As of HEAD, building the PDF docs for A4 paper draws 538 "contents
> ... exceed the available area" warnings.  While this is a nice step
> forward from where we were (v12 has more than 1500 such warnings),
> we're far from done fixing that issue.
>
> A large chunk of the remaining warnings are about tables that describe
> the columns of system catalogs, system views, and information_schema
> views.  The typical contents of a row in such a table are a field name,
> a field data type, possibly a "references" link, and then a description.
> Unsurprisingly, this does not work very well for descriptions of more
> than a few words.  And not infrequently, we *need* more than a few words.
>
> ISTM this is more or less the same problem we have/had with function
> descriptions, and so I'm tempted to solve it in more or less the same
> way.  Let's redefine the table layout to look like, say, this for
> pg_attrdef [1]:
>
> oid oid
> Row identifier
>
> adrelid oid (references pg_class.oid)
> The table this column belongs to
>
> adnum int2 (references pg_attribute.attnum)
> The number of the column
>
> adbin pg_node_tree
> The column default value, in nodeToString() representation. Use
> pg_get_expr(adbin, adrelid) to convert it to an SQL expression.
>
> That is, let's go over to something that's more or less like a table-ized
> <variablelist>, with the fixed items for an entry all written on the first
> line, and then as much description text as we need.  The actual markup
> would be closely modeled on what we did for function-table entries.
>
> Thoughts?
>

I have spotted this change recently in the progress monitoring devel docs (
https://www.postgresql.org/docs/devel/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING).
The current version seems a little chaotic, since there are multiple tables on
the same page with 2 mixed layouts. The older layout (for example the v12 one -
https://www.postgresql.org/docs/12/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING)
is much easier to read for me.

Is this the final change? I do not see any problem on this (progress
monitoring) page in the old layout. Is there any example of a problematic page?
Maybe there's a different way to solve this. For example, instead of
in-lining long text as a column description, it should be possible to link
to a detailed description in a custom paragraph or table. See the description
column of table 27.22 on the progress monitoring page, for the column "phase",
for a similar approach.


>
> regards, tom lane
>
> [1] https://www.postgresql.org/docs/devel/catalog-pg-attrdef.html
>
>
>
>
>


Re: Another modest proposal for docs formatting: catalog descriptions

2020-06-01 Thread Tom Lane
Josef Šimánek  writes:
> I have spotted this change recently in the progress monitoring devel docs (
> https://www.postgresql.org/docs/devel/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING).
> The current version seems a little chaotic, since there are multiple tables on
> the same page with 2 mixed layouts. The older layout (for example the v12 one -
> https://www.postgresql.org/docs/12/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING)
> is much easier to read for me.

> Is this final change? I do not see any problem on this (progress
> monitoring) page in old layout. Is there any example of problematic page?
> Maybe there's a different way to solve this. For example instead of
> in-lining long text as a column description, it should be possible to link
> to detailed description in custom paragraph or table. See description
> column at table 27.22. at progress monitoring page for column "phase" for
> similar approach.

I'm not planning on revisiting that work, no.  And converting every
table/view description table into two (or more?) tables sure doesn't
sound like an improvement.

Perhaps there's a case for reformatting the phase-description tables
in the progress monitoring section to look more like the view tables.
(I hadn't paid much attention to them, since they weren't causing PDF
rendering problems.)  On the other hand, you could argue that it's
good that they don't look like the view tables, since the info they
are presenting is fundamentally different.  I don't honestly see much
wrong with the way it is now.

regards, tom lane




Re: Another modest proposal for docs formatting: catalog descriptions

2020-06-01 Thread Jonathan S. Katz
On 6/1/20 6:57 PM, Tom Lane wrote:
> Josef Šimánek  writes:
>> I spotted this change recently in the progress monitoring devel docs (
>> https://www.postgresql.org/docs/devel/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING).
>> The current version seems a little chaotic since there are multiple tables on
>> the same page with two mixed layouts. The older layout (for example the v12 one -
>> https://www.postgresql.org/docs/12/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING)
>> is much easier to read for me.
> 
>> Is this the final change? I do not see any problem with this (progress
>> monitoring) page in the old layout. Is there an example of a problematic page?
>> Maybe there's a different way to solve this. For example, instead of
>> inlining long text as a column description, it should be possible to link
>> to a detailed description in a custom paragraph or table. See the description
>> column of Table 27.22 on the progress monitoring page, for the column "phase",
>> for a similar approach.
> 
> I'm not planning on revisiting that work, no.  And converting every
> table/view description table into two (or more?) tables sure doesn't
> sound like an improvement.
> 
> Perhaps there's a case for reformatting the phase-description tables
> in the progress monitoring section to look more like the view tables.
> (I hadn't paid much attention to them, since they weren't causing PDF
> rendering problems.)  On the other hand, you could argue that it's
> good that they don't look like the view tables, since the info they
> are presenting is fundamentally different.  I don't honestly see much
> wrong with the way it is now.

I think it looks fine. +1 for leaving it.

Jonathan





Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-06-01 Thread Etsuro Fujita
Hi Andrey,

On Mon, Jun 1, 2020 at 6:29 PM Andrey Lepikhov
 wrote:
> Currently I see that COPY FROM insertion into a partitioned table with
> foreign partitions is not optimal: even if the table constraints allow
> multi-insert copy, we flush the buffers and prepare a new INSERT query
> for each tuple routed into a foreign partition.
> To solve this problem I tried to use the multi-insert buffers for
> foreign tuples too. These buffers are flushed by analogy with the
> 'COPY .. FROM STDIN' machinery, as is done by the psql '\copy' command.
> The attached patch was prepared from a private prototype developed
> by Arseny Sher a couple of years ago.
> Benchmarks show that it speeds up the COPY FROM operation:
> the command "COPY pgbench_accounts FROM ..." (the test file contains 1e7
> tuples, copied to three partitions) executes on my laptop in 14 minutes
> without the patch and in 1.5 minutes with the patch. The theoretical
> minimum here (with infinite buffer size) is 40 seconds.

Great!

> A couple of questions:
> 1. Can this feature be interesting for the PostgreSQL core or not?

Yeah, I think this is especially useful for sharding.

> 2. If this is a useful feature, is the correct way chosen?

I think I had a similar idea before [1].  Will take a look.

Thanks!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/23990375-45a6-5823-b0aa-a6a7a6a957f0%40lab.ntt.co.jp




Re: Postgres Windows build system doesn't work with python installed in Program Files

2020-06-01 Thread David Zhang

Hi Michael,

I performed a quick test of the patch "msvc-build-init-v2.patch" using
the cases below:


1. perl build.pl
2. perl build.pl debug psql
3. perl build.pl RELEASE psql
4. perl build.pl deBUG psql
5. perl build.pl psql
The above cases (case-insensitive) all work great without any
warning on the latest master branch.


When building with too many parameters, I got the expected message
below as well.


c:\Users\david\Downloads\postgres\src\tools\msvc>perl build.pl DEbug 
psql pg_baseback

Usage: build.pl [ [ <configuration> ] <component> ]
Options are case-insensitive.
configuration: Release | Debug. This sets the configuration
to build. Default is Release.
component: name of component to build. An empty value means
to build all components.


However, if I ask for help in the typical Windows way, i.e., "perl
build.pl -help" or "perl build.pl /?", I get messages like the ones below:


c:\Users\david\Downloads\postgres\src\tools\msvc>perl build.pl -help
Detected hardware platform: Win32
Files src/bin/pgbench/exprscan.l
Files src/bin/pgbench/exprparse.y
Files src/bin/psql/psqlscanslash.l
Files contrib/cube/cubescan.l
Files contrib/cube/cubeparse.y
Files contrib/seg/segscan.l
Files contrib/seg/segparse.y
Generating configuration headers...
Microsoft (R) Build Engine version 16.5.1+4616136f8 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1001: Unknown switch.
Switch: -help.vcxproj

For switch syntax, type "MSBuild -help"

c:\Users\david\Downloads\postgres\src\tools\msvc>perl build.pl /?
Detected hardware platform: Win32
Files src/bin/pgbench/exprscan.l
Files src/bin/pgbench/exprparse.y
Files src/bin/psql/psqlscanslash.l
Files contrib/cube/cubescan.l
Files contrib/cube/cubeparse.y
Files contrib/seg/segscan.l
Files contrib/seg/segparse.y
Generating configuration headers...
Microsoft (R) Build Engine version 16.5.1+4616136f8 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1001: Unknown switch.
Switch: /?.vcxproj

For switch syntax, type "MSBuild -help"

It would be a bonus if build.pl could support "help" in the Windows way.


Thanks,

David

On 2020-05-06 10:45 p.m., Michael Paquier wrote:

On Wed, May 06, 2020 at 12:17:03AM +0200, Juan José Santamaría Flecha wrote:

Please forgive me if I am being too nitpicky, but I find the comments a
little too verbose; a usage format might be more visual and easier to
explain:

Usage: build [[CONFIGURATION] COMPONENT]

The options are case-insensitive.
CONFIGURATION sets the configuration to build, "debug" or "release" (by
default).
COMPONENT defines a component to build. An empty option means all
components.

Your comment makes sense to me.  What about the attached then?  On top
of documenting the script usage in the code, let's trigger it if it
gets called with more than 3 arguments.  What do you think?

FWIW, I forgot to mention that I don't think those warnings are worth
a backpatch.  No objections to improving things on HEAD, of course.
--
Michael

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: Read access for pg_monitor to pg_replication_origin_status view

2020-06-01 Thread Martín Marqués
Hi,

> Took me a bit longer than expected, but here is a new version, now
> with the idea of just removing the superuser() check and REVOKEing
> execution of the functions from public. At the end I grant permission
> to functions and the pg_replication_origin_status view.
>
> I wonder now if I needed to GRANT execution of the functions. A grant
> on the view should be enough.
>
> I'll think about it.

Yeah, those `GRANT EXECUTE` statements for the 2 functions should go, as
the view we want to `SELECT` from has the appropriate ACL set.

$ git diff
diff --git a/src/backend/catalog/system_views.sql
b/src/backend/catalog/system_views.sql
index c16061f8f00..97ee72a9cfc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1494,9 +1494,6 @@ GRANT EXECUTE ON FUNCTION
pg_ls_archive_statusdir() TO pg_monitor;
 GRANT EXECUTE ON FUNCTION pg_ls_tmpdir() TO pg_monitor;
 GRANT EXECUTE ON FUNCTION pg_ls_tmpdir(oid) TO pg_monitor;

-GRANT EXECUTE ON FUNCTION pg_replication_origin_progress(text,
boolean) TO pg_monitor;
-GRANT EXECUTE ON FUNCTION
pg_replication_origin_session_progress(boolean) TO pg_monitor;
-
 GRANT pg_read_all_settings TO pg_monitor;
 GRANT pg_read_all_stats TO pg_monitor;
 GRANT pg_stat_scan_tables TO pg_monitor;
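
A sketch of the intended end state (monitoring_user is a hypothetical
role that has been granted pg_monitor; the exact statements are
illustrative, not lifted from the patch):

    GRANT SELECT ON pg_replication_origin_status TO pg_monitor;
    -- a pg_monitor member can then query the view:
    SET ROLE monitoring_user;
    SELECT local_id, external_id, remote_lsn, local_lsn
      FROM pg_replication_origin_status;
    RESET ROLE;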


Regards,

-- 
Martín Marqués            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread John Bachir
> Maybe something else had a nontrivial lock on the table, and those commands
> were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> log_lock_waits=on;", then you could see that.

Just checking - I think you mean lock_timeout? (although setting 
deadlock_timeout is also not a bad idea just in case).




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread Justin Pryzby
On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
> On Fri, May 29, 2020, at 10:10 PM, Justin Pryzby wrote:
> 
> > If you do it right, you can see a DEBUG:
> > postgres=# SET client_min_messages=debug;
> > postgres=# ALTER TABLE tn ALTER i SET NOT NULL ;
> > DEBUG:  existing constraints on column "tn"."i" are sufficient to prove 
> > that it does not contain nulls
> 
> Thanks! I'll add that to my recipe for the future. Although by that time it 
> would be too late, so to make use of this I would have to set up a cloned 
> test environment and hope that all conditions are correctly cloned. Is there 
> a way to check sufficiency before running the command?

Yea, client_min_messages is there to demonstrate that the feature is working
and allow you to check whether it works using your own recipe.

If you want to avoid blocking the table for nontrivial time, maybe you'd add:
SET statement_timeout='1s';
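
For example, a combined sketch (the table and column names are made up):

    SET client_min_messages = debug;
    SET statement_timeout = '1s';  -- cancel if a full-table scan starts instead
    ALTER TABLE accounts ALTER COLUMN email SET NOT NULL;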

On Mon, Jun 01, 2020 at 09:55:43PM -0400, John Bachir wrote:
> > Maybe something else had a nontrivial lock on the table, and those commands
> > were waiting on lock.  If you "SET deadlock_timeout='1'; SET
> > log_lock_waits=on;", then you could see that.
> 
> Just checking - I think you mean lock_timeout? (although setting 
> deadlock_timeout is also not a bad idea just in case).

No, actually (but I've had to double check):

https://www.postgresql.org/docs/current/runtime-config-locks.html
|When log_lock_waits is set, this parameter also determines the length of time
|to wait before a log message is issued about the lock wait. If you are trying
|to investigate locking delays you might want to set a shorter than normal
|deadlock_timeout.
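
For reference, a minimal setup for that kind of investigation
(log_lock_waits requires superuser):

    SET log_lock_waits = on;
    SET deadlock_timeout = '100ms';  -- also controls when lock waits are logged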

-- 
Justin




Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-06-01 Thread John Bachir
Thank you, Justin, for all that useful info! A couple of nitpicky
questions, so I can get my recipe right.

On Mon, Jun 1, 2020, at 10:04 PM, Justin Pryzby wrote:
> On Mon, Jun 01, 2020 at 10:49:25AM -0400, John Bachir wrote:
> > Thanks! I'll add that to my recipe for the future. Although by that time it 
> > would be too late, so to make use of this I would have to set up a cloned 
> > test environment and hope that all conditions are correctly cloned. Is 
> > there a way to check sufficiency before running the command?
> 
> Yea, client_min_messages is there to demonstrate that the feature is working
> and allow you to check whether it works using your own recipe.

Gotcha. Okay, just to double-check - you are saying there is _not_ a way to 
check before running the command, right?


> > Just checking - I think you mean lock_timeout? (although setting 
> > deadlock_timeout is also not a bad idea just in case).
> 
> No, actually (but I've had to double check):
> 
> https://www.postgresql.org/docs/current/runtime-config-locks.html
> |When log_lock_waits is set, this parameter also determines the length of time
> |to wait before a log message is issued about the lock wait. If you are trying
> |to investigate locking delays you might want to set a shorter than normal
> |deadlock_timeout.

Hah! Unexpected and useful.

I think to avoid blocking my application activity, I should _also_ set 
lock_timeout, and retry if it fails. (maybe this is orthogonal to what you were 
addressing before, but I'm just wondering what you think).
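
Something like the following sketch, retried from the application when the
lock cannot be acquired in time (the table name is made up):

    BEGIN;
    SET LOCAL lock_timeout = '2s';  -- fail fast instead of queueing behind other locks
    ALTER TABLE accounts ALTER COLUMN email SET NOT NULL;
    COMMIT;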

Thanks,
John




Small doc improvement about spilled txn tracking

2020-06-01 Thread Masahiko Sawada
Hi all,

When reading the pg_stat_replication docs for PG13, I thought it would be
better to mention that tracking of spilled transactions works only for
logical replication, as we already mention for replication lag tracking:

  
   Lag times work automatically for physical replication. Logical decoding
   plugins may optionally emit tracking messages; if they do not, the tracking
   mechanism will simply display NULL lag.
  

What do you think? Please find attached patch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


spilled_txn_doc.patch
Description: Binary data


Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762

2020-06-01 Thread Michael Paquier
On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote:
> Yes. Conversely, if we start logical replication in a physical
> replication connection (i.e. replication=true) we get an error before
> starting replication:
> 
> ERROR:  logical decoding requires a database connection
> 
> I think we can prevent that SEGV in a similar way.

Still unconvinced: that restriction exists because logical decoding
requires a database connection, but the reverse does not necessarily
apply, as physical replication has fewer restrictions than logical
replication.

Looking at the code, I think that there is some confusion with the
fake WAL reader used as base reference in InitWalSender() where we
assume that it could only be used in the context of a non-database WAL
sender.  However, this initialization happens when the WAL sender
connection is initialized, and what I think this misses is that we 
should try to initialize a WAL reader when actually going through a
START_REPLICATION command.

I can note as well that StartLogicalReplication() moves in this sense
by setting xlogreader to be the one from logical_decoding_ctx once the
decoding context has been created.

This results in the attached.  The extra test from upthread to check
that logical decoding is not allowed in a non-database WAL sender is a
good idea, so I have kept it.
--
Michael
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index d930fe957d..b0f2a6ed43 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -262,10 +262,6 @@ extern XLogReaderRoutine *LocalXLogReaderRoutine(void);
 /* Free an XLogReader */
 extern void XLogReaderFree(XLogReaderState *state);
 
-/* Initialize supporting structures */
-extern void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt,
-			   int segsize, const char *waldir);
-
 /* Position the XLogReader to given record */
 extern void XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr);
 #ifdef FRONTEND
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 5995798b58..cb76be4f46 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -44,6 +44,8 @@ static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
 			XLogRecPtr recptr);
 static void ResetDecoder(XLogReaderState *state);
+static void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt,
+			   int segsize, const char *waldir);
 
 /* size of the buffer allocated for error message. */
 #define MAX_ERRORMSG_LEN 1000
@@ -210,7 +212,7 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 /*
  * Initialize the passed segment structs.
  */
-void
+static void
 WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt,
    int segsize, const char *waldir)
 {
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 86847cbb54..aeda307c6b 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -130,13 +130,11 @@ bool		log_replication_commands = false;
 bool		wake_wal_senders = false;
 
 /*
- * Physical walsender does not use xlogreader to read WAL, but it does use a
- * fake one to keep state.  Logical walsender uses a proper xlogreader.  Both
- * keep the 'xlogreader' pointer to the right one, for the sake of common
- * routines.
+ * xlogreader used for replication.  Note that a WAL sender doing physical
+ * replication does not need xlogreader to read WAL, but it needs one to
+ * keep a state of its work.
  */
-static XLogReaderState fake_xlogreader;
-static XLogReaderState *xlogreader;
+static XLogReaderState *xlogreader = NULL;
 
 /*
  * These variables keep track of the state of the timeline we're currently
@@ -285,20 +283,6 @@ InitWalSender(void)
 
 	/* Initialize empty timestamp buffer for lag tracking. */
 	lag_tracker = MemoryContextAllocZero(TopMemoryContext, sizeof(LagTracker));
-
-	/*
-	 * Prepare physical walsender's fake xlogreader struct.  Logical walsender
-	 * does this later.
-	 */
-	if (!am_db_walsender)
-	{
-		xlogreader = &fake_xlogreader;
-		xlogreader->routine =
-			*XL_ROUTINE(.segment_open = WalSndSegmentOpen,
-		.segment_close = wal_segment_close);
-		WALOpenSegmentInit(&xlogreader->seg, &xlogreader->segcxt,
-		   wal_segment_size, NULL);
-	}
 }
 
 /*
@@ -594,6 +578,18 @@ StartReplication(StartReplicationCmd *cmd)
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  errmsg("IDENTIFY_SYSTEM has not been run before START_REPLICATION")));
 
+	/* create xlogreader for physical replication */
+	xlogreader =
+		XLogReaderAllocate(wal_segment_size, NULL,
+		   XL_ROUTINE(.segment_open = WalSndSegmentOpen,
+	  .segment_close = wal_segment_close),
+		   NULL);
+
+	if (!xlogreader)
+		ereport(ERROR,
+(errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory

Re: Read access for pg_monitor to pg_replication_origin_status view

2020-06-01 Thread Michael Paquier
On Mon, Jun 01, 2020 at 03:38:07PM -0300, Martín Marqués wrote:
> Took me a bit longer than expected, but here is a new version, now
> with the idea of just removing the superuser() check and REVOKEing
> execution of the functions from public. At the end I grant permission
> to functions and the pg_replication_origin_status view.
> 
> I wonder now if I needed to GRANT execution of the functions. A grant
> on the view should be enough.

+GRANT EXECUTE ON FUNCTION pg_replication_origin_progress(text, boolean) TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_replication_origin_session_progress(boolean) TO pg_monitor;

FWIW, I think that removing a hardcoded superuser() restriction and
assigning new rights to system roles are two different things, so it
would be better to split the logic into two patches to ease the
review.

I can also see the following in func.sgml:
    Use of functions for replication origin is restricted to
    superusers.

But that's not right as one can use GRANT to leverage the ACLs of
those functions with your patch.  I have a suggestion for this part,
as follows:
"Use of functions for replication origin is only allowed to the
superuser by default, but may be allowed to other users by using the
GRANT command."
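
For instance (monitoring_user is a hypothetical role):

    GRANT EXECUTE ON FUNCTION pg_replication_origin_progress(text, boolean)
        TO monitoring_user;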

Also, you may want to add this patch to the next commit fest:
https://commitfest.postgresql.org/28/
--
Michael




Re: Small doc improvement about spilled txn tracking

2020-06-01 Thread Amit Kapila
On Tue, Jun 2, 2020 at 9:10 AM Masahiko Sawada
 wrote:
>
> Hi all,
>
> When reading the pg_stat_replication docs for PG13, I thought it would be
> better to mention that tracking of spilled transactions works only for
> logical replication, as we already mention for replication lag tracking:
>
>   
>Lag times work automatically for physical replication. Logical decoding
>plugins may optionally emit tracking messages; if they do not, the tracking
>mechanism will simply display NULL lag.
>   
>
> What do you think?
>

+1.

> Please find attached patch.
>

On a quick look, it seems fine but I will look in more detail and let
you know if I have any feedback.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Read access for pg_monitor to pg_replication_origin_status view

2020-06-01 Thread Kyotaro Horiguchi
Hi.

At Mon, 1 Jun 2020 21:41:13 -0300, Martín Marqués  
wrote in 
> Hi,
> 
> > Took me a bit longer than expected, but here is a new version, now
> > with the idea of just removing the superuser() check and REVOKEing
> > execution of the functions from public. At the end I grant permission
> > to functions and the pg_replication_origin_status view.
> 
> > I wonder now if I needed to GRANT execution of the functions. A grant
> > on the view should be enough.
> 
> > I'll think about it.
> 
> Yeah, those `GRANT EXECUTE` statements for the 2 functions should go, as
> the view we want to `SELECT` from has the appropriate ACL set.
> 
> $ git diff
> diff --git a/src/backend/catalog/system_views.sql
> b/src/backend/catalog/system_views.sql
> index c16061f8f00..97ee72a9cfc 100644
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1494,9 +1494,6 @@ GRANT EXECUTE ON FUNCTION
> pg_ls_archive_statusdir() TO pg_monitor;
>  GRANT EXECUTE ON FUNCTION pg_ls_tmpdir() TO pg_monitor;
>  GRANT EXECUTE ON FUNCTION pg_ls_tmpdir(oid) TO pg_monitor;
> 
> -GRANT EXECUTE ON FUNCTION pg_replication_origin_progress(text,
> boolean) TO pg_monitor;
> -GRANT EXECUTE ON FUNCTION
> pg_replication_origin_session_progress(boolean) TO pg_monitor;
> -
>  GRANT pg_read_all_settings TO pg_monitor;
>  GRANT pg_read_all_stats TO pg_monitor;
>  GRANT pg_stat_scan_tables TO pg_monitor;


Agreed on this part.  The two functions don't need to be granted.

But pg_show_replication_origin_status() should be allowed to
pg_read_all_stats, not pg_monitor; pg_monitor is just a union of the
roles carrying the actual privileges.

Another issue would be how to control the output of
pg_show_replication_origin_status().  Most functions that need the
pg_read_all_stats privilege filter out sensitive columns in each row
instead of hiding the existence of rows.  Maybe the view
pg_replication_origin_status should show only local_id and hide the
other columns from non-pg_read_all_stats users.
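
A rough sketch of that column-masking idea (purely hypothetical, not what
the patch does):

    CREATE OR REPLACE VIEW pg_replication_origin_status AS
        SELECT local_id,
               CASE WHEN pg_has_role('pg_read_all_stats', 'MEMBER')
                    THEN external_id END AS external_id,
               CASE WHEN pg_has_role('pg_read_all_stats', 'MEMBER')
                    THEN remote_lsn END AS remote_lsn,
               CASE WHEN pg_has_role('pg_read_all_stats', 'MEMBER')
                    THEN local_lsn END AS local_lsn
          FROM pg_show_replication_origin_status();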

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762

2020-06-01 Thread Fujii Masao




On 2020/06/02 13:24, Michael Paquier wrote:

On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote:

Yes. Conversely, if we start logical replication in a physical
replication connection (i.e. replication=true) we get an error before
starting replication:

ERROR:  logical decoding requires a database connection

I think we can prevent that SEGV in a similar way.


Still unconvinced: that restriction exists because logical decoding
requires a database connection, but the reverse does not necessarily
apply, as physical replication has fewer restrictions than logical
replication.


Could you tell me what the benefit of supporting physical replication over
a logical replication connection is? If it's only for "undocumented"
backward compatibility, IMO it's better to reject such a "tricky" setup.
But if there are use cases for it, I'm OK with supporting it.


Looking at the code, I think that there is some confusion with the
fake WAL reader used as base reference in InitWalSender() where we
assume that it could only be used in the context of a non-database WAL
sender.  However, this initialization happens when the WAL sender
connection is initialized, and what I think this misses is that we
should try to initialize a WAL reader when actually going through a
START_REPLICATION command.

I can note as well that StartLogicalReplication() moves in this sense
by setting xlogreader to be the one from logical_decoding_ctx once the
decoding context has been created.

This results in the attached.  The extra test from upthread to check
that logical decoding is not allowed in a non-database WAL sender is a
good idea, so I have kept it.


Yes. Also, shouldn't we add a test to check that physical replication
works fine even over a logical replication connection?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: race condition when writing pg_control

2020-06-01 Thread Michael Paquier
On Sun, May 31, 2020 at 09:11:35PM +, Bossart, Nathan wrote:
> Thanks for the feedback.  I've attached a new set of patches.

Thanks for splitting the set.  0001 and 0002 are the minimum set for
back-patching, and it would be better to merge them together.  0003 is
debatable and not an actual bug fix, so I would refrain from doing a
backpatch.  It does not seem that there is a strong consensus in favor
of 0003 either.

Thomas, are you planning to look at this patch set?
--
Michael




Re: Small doc improvement about spilled txn tracking

2020-06-01 Thread Amit Kapila
On Tue, Jun 2, 2020 at 10:22 AM Amit Kapila  wrote:
>
> On Tue, Jun 2, 2020 at 9:10 AM Masahiko Sawada
>  wrote:
> >
>
> > Please find attached patch.
> >
>
> On a quick look, it seems fine but I will look in more detail and let
> you know if I have any feedback.
>

I am not sure if we need to add "Logical decoding plugins may
optionally emit tracking message." as the stats behavior should be the
same for decoding plugin and logical replication.  Apart from removing
this line, I have made a few other minor changes, see what you think
of attached?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Doc-Update-the-documentation-for-spilled-transaction.patch
Description: Binary data


Re: [POC] Fast COPY FROM command for the table with foreign partitions

2020-06-01 Thread Andrey Lepikhov

Thank you for the answer,

On 02.06.2020 05:02, Etsuro Fujita wrote:

I think I had a similar idea before [1].  Will take a look.



[1] 
https://www.postgresql.org/message-id/23990375-45a6-5823-b0aa-a6a7a6a957f0%40lab.ntt.co.jp


I have looked into the thread.
My first version of the patch was like your idea. But while developing
the "COPY FROM" code, the following complications were discovered:
1. Two or more partitions can be placed on the same node. We need to
finish COPY into one partition before starting COPY into another
partition on the same node.
2. On any error we need to send EOF to all started "COPY .. FROM STDIN"
operations. Otherwise the FDW can't cancel the operation.


Hiding the COPY code under the buffer management machinery allows us to
generalize that machinery, execute one COPY operation per buffer, and
simplify error handling.


As I understand it, the main idea of the thread you mentioned is to add
"COPY FROM" support without changes to the FDW API.
It is possible to remove BeginForeignCopy() and EndForeignCopy() from
the patch. But it is not trivial to change ExecForeignInsert() for
COPY purposes.
All I can offer here for now is to introduce one new
ExecForeignBulkInsert(buf) routine that executes a single "COPY FROM
STDIN" operation, sends the tuples, and closes the operation. We can
fall back to the ExecForeignInsert() routine for each buffered tuple if
ExecForeignBulkInsert() is not supported.


One of the main questions here is using the COPY TO machinery for
serializing a tuple. It requires (if you take a look at the patch)
transforming the CopyTo() routine into an iterative representation:
start/next/finish. Would that be acceptable?


The attached patch includes the correction of a silly mistake.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From 92a86ca52e33444ef2f559a1da9c8632398892a4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 29 May 2020 10:39:57 +0500
Subject: [PATCH] Fast COPY FROM into the foreign (or sharded) table.

---
 contrib/postgres_fdw/deparse.c|  25 ++
 .../postgres_fdw/expected/postgres_fdw.out|   5 +-
 contrib/postgres_fdw/postgres_fdw.c   |  95 
 contrib/postgres_fdw/postgres_fdw.h   |   1 +
 src/backend/commands/copy.c   | 213 --
 src/include/commands/copy.h   |   5 +
 src/include/foreign/fdwapi.h  |   9 +
 7 files changed, 286 insertions(+), 67 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..427402c8eb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1758,6 +1758,31 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	int attnum;
+
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, " ( ");
+
+	for(attnum = 0; attnum < rel->rd_att->natts; attnum++)
+	{
+		appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname));
+
+		if (attnum != rel->rd_att->natts-1)
+			appendStringInfoString(buf, ", ");
+	}
+
+	appendStringInfoString(buf, " ) ");
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..5ae24fef7c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8063,8 +8063,9 @@ copy rem2 from stdin;
 copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
-CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+CONTEXT:  COPY loc2, line 1: "-1	xyzzy"
+remote SQL command: COPY public.loc2 ( f1, f2 )  FROM STDIN 
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 +-
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..bd2a8f596f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -18,6 +18,7 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_class.h"
+#include "commands/copy.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -190,6 +191,7 @@ typedef struct PgFdwModifyState
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 			 * created */
+	CopyState fdwcstate;
 } PgFdwModifyState;
 
 /*
@@ -350,12 +352,16 @@ static TupleTableSlot *postgresExecForeignDe

Re: Small doc improvement about spilled txn tracking

2020-06-01 Thread Masahiko Sawada
On Tue, 2 Jun 2020 at 14:50, Amit Kapila  wrote:
>
> On Tue, Jun 2, 2020 at 10:22 AM Amit Kapila  wrote:
> >
> > On Tue, Jun 2, 2020 at 9:10 AM Masahiko Sawada
> >  wrote:
> > >
> >
> > > Please find attached patch.
> > >
> >
> > On a quick look, it seems fine but I will look in more detail and let
> > you know if I have any feedback.
> >
>
> I am not sure if we need to add "Logical decoding plugins may
> optionally emit tracking message." as the stats behavior should be the
> same for decoding plugin and logical replication.  Apart from removing
> this line, I have made a few other minor changes, see what you think
> of attached?
>

I'm okay with removing that sentence. The patch looks good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762

2020-06-01 Thread Kyotaro Horiguchi
At Tue, 2 Jun 2020 13:24:56 +0900, Michael Paquier  wrote 
in 
> On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote:
> > Yes. Conversely, if we start logical replication in a physical
> > replication connection (i.e. replication=true) we get an error before
> > starting replication:
> > 
> > ERROR:  logical decoding requires a database connection
> > 
> > I think we can prevent that SEGV in a similar way.
> 
> Still unconvinced: that restriction exists because logical decoding
> requires a database connection, but the reverse does not necessarily
> apply, as physical replication has fewer restrictions than logical
> replication.

If we deliberately allow physical replication on a
database-replication connection, we should revise the documentation
that way. On the other hand, physical replication has wider access to a
database cluster than logical replication.  Thus, allowing physical
replication to start on a logical replication connection could
introduce a privilege problem.  So I think it might be better for
physical and logical replication to have separate pg_hba lines.

Once we explicitly allow physical replication on a logical replication
connection in the documentation, it will be far harder to change the
behavior than it is now.

If we are sure that that cannot be a problem, I don't object to the
change in documented behavior.

> Looking at the code, I think that there is some confusion with the
> fake WAL reader used as base reference in InitWalSender() where we
> assume that it could only be used in the context of a non-database WAL
> sender.  However, this initialization happens when the WAL sender
> connection is initialized, and what I think this misses is that we 
> should try to initialize a WAL reader when actually going through a
> START_REPLICATION command.

At first, fake_xlogreader was really a fake that only provided
callback routines, but it should have been changed to a real xlogreader
at the time it began to store segment information. In that sense,
moving to a real xlogreader makes sense to me, separately from whether
we allow physical replication on logical replication connections.

> I can note as well that StartLogicalReplication() moves in this sense
> by setting xlogreader to be the one from logical_decoding_ctx once the
> decoding context has been created.
> 
> This results in the attached.  The extra test from upthread to check
> that logical decoding is not allowed in a non-database WAL sender is a
> good idea, so I have kept it.

+   ereport(ERROR,
+   (errcode(ERRCODE_OUT_OF_MEMORY),
+errmsg("out of memory")));

The same error message is accompanied by a DETAIL in some other
places. Don't we need one for this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Small doc improvement about spilled txn tracking

2020-06-01 Thread Amit Kapila
On Tue, Jun 2, 2020 at 11:30 AM Masahiko Sawada
 wrote:
>
> On Tue, 2 Jun 2020 at 14:50, Amit Kapila  wrote:
> >
> > On Tue, Jun 2, 2020 at 10:22 AM Amit Kapila  wrote:
> > >
> > > On Tue, Jun 2, 2020 at 9:10 AM Masahiko Sawada
> > >  wrote:
> > > >
> > >
> > > > Please find attached patch.
> > > >
> > >
> > > On a quick look, it seems fine but I will look in more detail and let
> > > you know if I have any feedback.
> > >
> >
> > I am not sure if we need to add "Logical decoding plugins may
> > optionally emit tracking message." as the stats behavior should be the
> > same for decoding plugin and logical replication.  Apart from removing
> > this line, I have made a few other minor changes, see what you think
> > of attached?
> >
>
> I'm okay with removing that sentence. The patch looks good to me.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Small doc improvement about spilled txn tracking

2020-06-01 Thread Masahiko Sawada
On Tue, 2 Jun 2020 at 15:15, Amit Kapila  wrote:
>
> On Tue, Jun 2, 2020 at 11:30 AM Masahiko Sawada
>  wrote:
> >
> > On Tue, 2 Jun 2020 at 14:50, Amit Kapila  wrote:
> > >
> > > On Tue, Jun 2, 2020 at 10:22 AM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Jun 2, 2020 at 9:10 AM Masahiko Sawada
> > > >  wrote:
> > > > >
> > > >
> > > > > Please find attached patch.
> > > > >
> > > >
> > > > On a quick look, it seems fine but I will look in more detail and let
> > > > you know if I have any feedback.
> > > >
> > >
> > > I am not sure if we need to add "Logical decoding plugins may
> > > optionally emit tracking message." as the stats behavior should be the
> > > same for decoding plugin and logical replication.  Apart from removing
> > > this line, I have made a few other minor changes, see what you think
> > > of attached?
> > >
> >
> > I'm okay with removing that sentence. The patch looks good to me.
> >
>
> Pushed.

Thank you!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Resetting spilled txn statistics in pg_stat_replication

2020-06-01 Thread Masahiko Sawada
Hi all,

Tracking of spilled transactions has been introduced in PG13. These
new statistics values, spill_txns, spill_count, and spill_bytes, are
cumulative totals, unlike the other statistics values in
pg_stat_replication. How can we reset these values? We can reset
statistics values in other statistics views using
pg_stat_reset_shared(), pg_stat_reset(), and so on. It seems to me that
the only option to reset the spilled-transaction statistics is to
restart logical replication, but that is surely costly.

It might have been discussed during development, but wouldn't it be
worth having an SQL function to reset these statistics?
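
For comparison, a sketch (the reset function named at the end does not
exist; it only illustrates the proposal):

    -- the cumulative spill counters, as of the PG13 development tree:
    SELECT pid, spill_txns, spill_count, pg_size_pretty(spill_bytes)
      FROM pg_stat_replication;

    -- other statistics already have reset functions, e.g.:
    SELECT pg_stat_reset_shared('bgwriter');

    -- a hypothetical analog for the spill statistics:
    -- SELECT pg_stat_reset_replication_spill_stats();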

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services