Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-03-23 Thread Julien Rouhaud
On Tue, Mar 23, 2021 at 09:50:16AM +0300, Andrei Zubkov wrote:
> 
> By the way, right now in my workload tracing tool pg_profile I have to
> reset pg_stat_statements on every sample (which is about every 30-60
> minutes) to make sure that all workload between samples is captured.
> This causes much more overhead. The introduced first_seen column can
> eliminate the need for resets.

Note that you could also detect entries for which some counters decreased (e.g.
the execution count), and in that case only use the current values.  It should
give the exact same results as what you will get with the first_seen column,
except of course if some entry is almost never used and is suddenly used a lot
between an explicit reset or an eviction and your next snapshot.  I'm not sure
that it's a very likely scenario though.

FTR that's how powa currently deals with reset/eviction.
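
For illustration, that sampler-side check could look like this (a sketch;
prev_sample is a hypothetical table holding the counters captured at the
previous sample):

SELECT cur.userid, cur.dbid, cur.queryid,
       CASE WHEN cur.calls < prev.calls
            THEN cur.total_exec_time   -- reset/eviction detected: use current values only
            ELSE cur.total_exec_time - prev.total_exec_time
       END AS exec_time_delta
FROM pg_stat_statements cur
JOIN prev_sample prev USING (userid, dbid, queryid);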




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-23 Thread Fujii Masao




On 2021/03/22 20:25, ikedamsh wrote:

Agreed. Users can know whether the stats are for the walreceiver or not. The
pg_stat_wal view on a standby server shows stats for the walreceiver, and on
the primary server it shows stats for the other processes. So, I updated the
document.
(v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the docs!

There was a discussion about when the stats collector is invoked, at [1].
Currently, during archive recovery or on a standby, the stats collector is
invoked when the startup process reaches the consistent state, sends
PMSIGNAL_BEGIN_HOT_STANDBY, and the system starts accepting read-only
connections. But walreceiver can be invoked at an earlier stage. This can
cause walreceiver to generate and send statistics about WAL writing even
though the stats collector is not running yet. Might this be problematic?
If so, maybe we need to ensure that the stats collector is invoked before
walreceiver?

During recovery, the stats collector is not invoked if hot standby mode is
disabled. But walreceiver can be running in this case. So probably we
should change the stats collector so that it's invoked even when hot
standby is disabled? Otherwise we cannot collect the statistics about WAL
writing by walreceiver in that case.

[1]
https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-03-23 Thread Michael Paquier
On Tue, Mar 23, 2021 at 10:52:09AM +0530, Neha Sharma wrote:
> Sure, will give a regression run with CCA enabled.

I can confirm the regression between 13 and HEAD, so I have added an
open item.  It would be good to figure out the root issue here, and I
am ready to bet that the problem is deeper than it looks and that more
code paths could be involved.

It takes some time to initialize a cluster under CLOBBER_CACHE_ALWAYS,
but the test is quick enough to reproduce.  It would be good to bisect
the origin point here as a first step.
--
Michael


signature.asc
Description: PGP signature


Re: Is it useful to record whether plans are generic or custom?

2021-03-23 Thread torikoshia

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!

I just tried this feature. When I set plan_cache_mode to
force_generic_plan and executed the following queries, I found that
pg_stat_statements.generic_calls and pg_prepared_statements.generic_plans
were not the same. Is this behavior expected? I was thinking that they
are basically the same.


It's not expected behavior, fixed.



DEALLOCATE ALL;
SELECT pg_stat_statements_reset();
PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;
EXECUTE hoge(1);
EXECUTE hoge(1);
EXECUTE hoge(1);

SELECT generic_plans, statement FROM pg_prepared_statements WHERE
statement LIKE '%hoge%';
 generic_plans |                           statement
---------------+------------------------------------------------------------------
             3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;


SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query
LIKE '%hoge%';
 calls | generic_calls |                             query
-------+---------------+-----------------------------------------------------------------
     3 |             2 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1




When I executed the prepared statements via EXPLAIN ANALYZE, I found that
pg_stat_statements.generic_calls was not incremented. Is this behavior
expected?

Or should we count generic_calls even when executing the queries via
ProcessUtility()?


I think prepared statements executed via EXPLAIN ANALYZE should also be
counted, for consistency with pg_prepared_statements.

Since ActivePortal did not keep the plan type in the ProcessUtility_hook,
I moved the global variables 'is_plan_type_generic' and
'is_prev_plan_type_generic' from pg_stat_statements to plancache.c.



DEALLOCATE ALL;
SELECT pg_stat_statements_reset();
PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;
EXPLAIN ANALYZE EXECUTE hoge(1);
EXPLAIN ANALYZE EXECUTE hoge(1);
EXPLAIN ANALYZE EXECUTE hoge(1);

SELECT generic_plans, statement FROM pg_prepared_statements WHERE
statement LIKE '%hoge%';
 generic_plans |                           statement
---------------+------------------------------------------------------------------
             3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;


SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query
LIKE '%hoge%';
 calls | generic_calls |                             query
-------+---------------+-----------------------------------------------------------------
     3 |             0 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1
     3 |             0 | EXPLAIN ANALYZE EXECUTE hoge(1)




Regards,

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..887c4b2be8 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -251,6 +251,72 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t   | t | t
 (7 rows)
 
+--
+-- Track the number of generic plans
+--
+CREATE TABLE pgss_test (i int, j int, k int);
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SET plan_cache_mode TO force_generic_plan;
+SET pg_stat_statements.track_utility = TRUE;
+PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1;
+EXECUTE pgss_p1(1);
+ i 
+---
+(0 rows)
+
+-- EXPLAIN ANALYZE should be recorded
+PREPARE pgss_p2 AS SELECT j FROM pgss_test WHERE j = $1;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1);
+  QUERY PLAN   
+---
+ Seq Scan on pgss_test (actual rows=0 loops=1)
+   Filter: (j = $1)
+(2 rows)
+
+-- Nested Portal
+PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1;
+BEGIN;
+DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements;
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p2
+(1 row)
+
+EXECUTE pgss_p3(1);
+ k 
+---
+(0 rows)
+
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p1
+(1 row)
+
+COMMIT;
+SELECT calls, generic_calls, query FROM pg_stat_statements;
+ calls | generic_calls |  query   
+---+---+--
+ 1 | 0 | DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements
+ 0 | 0 | SELECT calls, generic_calls, query FROM pg_stat_statements
+ 1 | 1 | PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1
+ 2 | 0 | FETCH IN pgss_c1
+ 1 | 0 | BEGIN
+ 1 | 0 | SELECT pg_stat_statements_reset()
+ 1 | 1 | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE p

Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-03-23 Thread Andrei Zubkov
Hi Julien,

On Tue, 2021-03-23 at 15:03 +0800, Julien Rouhaud wrote:
> Note that you could also detect entries for which some counters
> decreased (e.g. the execution count), and in that case only use the
> current values.

Yes, but checking a condition on several counters seems complicated
compared to checking only one field.

>  It should
> give the exact same results as what you will get with the first_seen
> column, except of course if some entry is almost never used and is
> suddenly used a lot between an explicit reset or an eviction and your
> next snapshot.  I'm not sure that it's a very likely scenario though.
But it is possible, and we are guessing here. Storing a timestamp does
not seem too expensive to me, but it totally eliminates guessing, and
provides a clear view of the time interval we are watching for this
specific statement.

> FTR that's how powa currently deals with reset/eviction.
PoWA sampling is much more frequent than pg_profile's. For PoWA it is,
of course, a very unlikely scenario, but still possible.
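
For illustration, with the proposed column the sampler-side check collapses
to a single predicate (a sketch; :'prev_sample_ts' stands for the previous
sample's timestamp):

-- Entries first seen after the previous sample did not span the whole
-- interval, so their counters must be treated as absolute values.
SELECT queryid, calls, total_exec_time
FROM pg_stat_statements
WHERE first_seen > :'prev_sample_ts';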

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-03-23 Thread kuroda.hay...@fujitsu.com
Dear Andrei,

> Certainly I was thinking about this. And I've taken the advice of Teodor
> Sigaev - a much more experienced developer than me. It seems that
> GetCurrentTimestamp() is fast enough for our purpose and we won't call
> it too often - only on new statement entry allocation.

OK.

> However, there is another way - we can store the current value of the
> pg_stat_statements_info.dealloc field when allocating a new statement
> entry instead of timestamping it. Probably, it would be a little
> faster, but a timestamp seems much more valuable here.

I don't like the idea because such a column has no meaning for the specific row.
I prefer storing timestamp if GetCurrentTimestamp() is cheap.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Get memory contexts of an arbitrary backend process

2021-03-23 Thread Kyotaro Horiguchi
At Mon, 22 Mar 2021 15:09:58 +0900, torikoshia wrote in
> >> If MemoryContextStatsPrint(), i.e. showing 100 children at most is
> >> enough, this hard limit may be acceptable.
> > Can't this number be passed via shared memory?
> 
> The attached patch uses static shared memory to pass the number.

"pg_print_backend_memory_contexts"

That name looks as if it returns the result as text when used on the
command line. We could have pg_get_backend_memory_context(bool
dump_to_log (or where to dump), int limit).  Or couldn't we name it
differently even in the case we add a separate function?
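
A usage sketch of that alternative shape (the function name, argument
order, and the pid argument are all illustrative here; nothing below is
committed API):

-- Hypothetical: dump at most 100 children per context of backend 12345
-- to the server log instead of returning the result as text.
SELECT pg_get_backend_memory_context(12345, true, 100);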


+/*
+ * MaxChildrenPerContext
+ * Max number of children to print per one parent context.
+ */
+int*MaxChildrenPerContext = NULL;

Perhaps it'd be better to have a struct even if it consists only of
one member.  (Aligned) C int values are atomic so we can omit the
McxtPrintLock. (I don't think it's a problem even if it is modified
while reading^^;)


+   if(max_children <= 0)
+   {
+   ereport(WARNING,
+   (errmsg("%d is invalid value", max_children),
+errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));

It's annoying to choose a large enough number when I want to dump
children without limit.  Couldn't we use 0 to specify "unlimited"?

+   (errmsg("%d is invalid value", max_children),
+errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));

For the main message, (I think) we usually spell "%d is invalid
value" as "maximum number of children must be positive" or such.  For
the hint, we don't need a copy of the primary section of the
documentation here.

I think we should ERROR out for invalid parameters, at least for
max_children.  I'm not sure about pid since we might call it based on
pg_stat_activity..


+   if(!SendProcSignal(pid, PROCSIG_PRINT_MEMORY_CONTEXT, InvalidBackendId))

We know the backendid of the process here.


+   if (is_dst_stderr)
+   {
+   for (i = 0; i <= level; i++)
+   fprintf(stderr, "  ");

The fprintf path is used nowhere in the patch at all. It can be used
while attaching a debugger but I'm not sure we need that code.  The
footprint of this patch shrinks considerably by removing it.


+   strcat(truncated_ident, delimiter);

strcpy is sufficient here.  And we don't need the delimiter to be a
variable.  (We can copy a string literal into truncated_ident, then
count the length of truncated_ident, instead of using the delimiter
variable.)


+   $current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
...
+my $lfname = $current_logfiles;
+$lfname =~ s/^stderr //;
+chomp $lfname;

$node->logfile is the current log file name.

+   'target PID is not PostgreSQL server process');

Maybe "check if PID check is working" or such?  And, we can do
something like the following to exercise in a more practical way.

 select pg_print_backend...(pid,) from pg_stat_activity
 where backend_type = 'checkpointer';

> As documented, the current implementation allows that when multiple
> pg_print_backend_memory_contexts() are called in succession or
> simultaneously, max_children can be the one of another
> pg_print_backend_memory_contexts().
> I had tried to avoid this by adding some state information and using
> before_shmem_exit() in case of process termination for cleaning up the
> state information, as in the patch I presented earlier, but since kill()
> returns success before the dumper calls the signal handler, it seemed
> there were times when we couldn't clean up the state.
> Since this happens only when multiple pg_print_backend_memory_contexts()
> are called with different specified numbers of children, and the effect
> is just an unintended number of children to print, it might be
> acceptable.

I see it as a non-issue. Even though the behavior looks somewhat
strange, that usage is stranger than the behavior.

> Or it might be better to wait for some seconds if num_children in
> shared memory is not the initialized value (meaning some other process
> is requesting to print memory contexts).
> 
> >> Only superusers can call pg_print_backend_memory_contexts().
> > +   /* Only allow superusers to signal superuser-owned backends. */
> > +   if (superuser_arg(proc->roleId) && !superuser())
> > The patch seems to allow even a non-superuser to request printing the
> > memory contexts if the target backend is owned by a non-superuser. Is
> > this intentional?
> > I think that only a superuser should be allowed to execute
> > pg_print_backend_memory_contexts() whoever owns the target backend,
> > because that function can cause lots of log messages.
> 
> Thanks, it's not intentional, modified it.


[PATCH] rename column if exists

2021-03-23 Thread David Oksman
Added the ability to specify IF EXISTS when renaming a column of an object
(table, view, etc.). For example:

    ALTER TABLE distributors RENAME COLUMN IF EXISTS address TO city;

If the column does not exist, a notice is issued instead of throwing an
error.
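
A minimal session sketch of the proposed behavior (the NOTICE wording is
illustrative, modeled on existing IF EXISTS notices, and not taken from
the patch):

CREATE TABLE distributors (address text);

ALTER TABLE distributors RENAME COLUMN IF EXISTS address TO city;  -- renames
ALTER TABLE distributors RENAME COLUMN IF EXISTS address TO city;  -- skips:
-- NOTICE:  column "address" of relation "distributors" does not exist, skipping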
From 32bd8ced5dcb923575e1311e7353399b04c245fa Mon Sep 17 00:00:00 2001
From: David Oksman 
Date: Mon, 22 Mar 2021 21:22:00 +0200
Subject: [PATCH] rename column if exists

---
 doc/src/sgml/ref/alter_foreign_table.sgml |   2 +-
 doc/src/sgml/ref/alter_materialized_view.sgml |   2 +-
 doc/src/sgml/ref/alter_table.sgml |   4 +-
 doc/src/sgml/ref/alter_type.sgml  |   7 +-
 doc/src/sgml/ref/alter_view.sgml  |   2 +-
 src/backend/commands/tablecmds.c  |  80 -
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/parser/gram.y | 109 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/alter_table.out |  12 ++
 src/test/regress/expected/foreign_data.out|   2 +
 src/test/regress/sql/alter_table.sql  |  11 ++
 src/test/regress/sql/foreign_data.sql |   1 +
 14 files changed, 196 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/ref/alter_foreign_table.sgml b/doc/src/sgml/ref/alter_foreign_table.sgml
index 7ca03f3..e0f3761 100644
--- a/doc/src/sgml/ref/alter_foreign_table.sgml
+++ b/doc/src/sgml/ref/alter_foreign_table.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 ALTER FOREIGN TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
 action [, ... ]
 ALTER FOREIGN TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
-RENAME [ COLUMN ] column_name TO new_column_name
+RENAME [ COLUMN ] [ IF EXISTS ] column_name TO new_column_name
 ALTER FOREIGN TABLE [ IF EXISTS ] name
 RENAME TO new_name
 ALTER FOREIGN TABLE [ IF EXISTS ] name
diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index 27f60f6..8f9ce5c 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -26,7 +26,7 @@ ALTER MATERIALIZED VIEW [ IF EXISTS ] namename
 [ NO ] DEPENDS ON EXTENSION extension_name
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
-RENAME [ COLUMN ] column_name TO new_column_name
+RENAME [ COLUMN ] [ IF EXISTS ] column_name TO new_column_name
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
 RENAME TO new_name
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 80a8efa..fb2c7a7 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
 action [, ... ]
 ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
-RENAME [ COLUMN ] column_name TO new_column_name
+RENAME [ COLUMN ] [ IF EXISTS ] column_name TO new_column_name
 ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
 RENAME CONSTRAINT constraint_name TO new_constraint_name
 ALTER TABLE [ IF EXISTS ] name
@@ -1008,7 +1008,7 @@ WITH ( MODULUS numeric_literal, REM
   IF EXISTS
   

-Do not throw an error if the table does not exist. A notice is issued
+Do not throw an error if the table or the column does not exist. A notice is issued
 in this case.

   
diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 21887e8..2513c73 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -26,7 +26,7 @@ PostgreSQL documentation
 ALTER TYPE name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER TYPE name RENAME TO new_name
 ALTER TYPE name SET SCHEMA new_schema
-ALTER TYPE name RENAME ATTRIBUTE attribute_name TO new_attribute_name [ CASCADE | RESTRICT ]
+ALTER TYPE name RENAME ATTRIBUTE [ IF EXISTS ] attribute_name TO new_attribute_name [ CASCADE | RESTRICT ]
 ALTER TYPE name action [, ... ]
 ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ]
 ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value
@@ -76,11 +76,14 @@ ALTER TYPE name SET ( 
 

-RENAME ATTRIBUTE
+RENAME ATTRIBUTE [ IF EXISTS ]
 
  
   This form is only usable with composite types.
   It changes the name of an individual attribute of the type.
+  If IF EXISTS is specified and the attribute
+  does not exist, no error is thrown. In this case a notice
+  is issued instead.
  
 

diff --git a/doc/src/sgml/ref/alter_view.sgml b/doc/src/sgml/ref/alter_view.sgml
index 98c312c..9dc3f5e 100644
--- a/doc/src/sgml/ref/alter_view.sgml
+++ b/doc/src/sgml/ref/alter_view.sgml
@@ -24,7 +24,7 @@ PostgreSQL documentation
 ALTER VIEW [ IF EXISTS ] name ALTER [ COLUMN ] column_name SET DEFAULT expression
 ALTER VIEW [ IF EX

tool to migrate database

2021-03-23 Thread isabelle Ross
Hi,
I'm looking for software to migrate a database between versions with
minimum downtime.

Which tool can be used to do this job?

thanks

Isabelle




Re: tool to migrate database

2021-03-23 Thread Joel Jacobson
On Mon, Mar 22, 2021, at 22:47, isabelle Ross wrote:
>Hi,
> I'm looking for software to migrate a database between versions with
> minimum downtime.
>
> Which tool can be used to do this job?

Hi Isabelle,

There are multiple ways to do it.
The fastest way is probably pg_upgrade.

There are some different approaches depending on what requirements you
have on redundancy while upgrading.

I recently read an interesting real-life story from a very big company,
Adyen, about how they upgraded their 50-terabyte PostgreSQL database. The
article is from 2018 but I still think it's relevant:

https://medium.com/adyen/updating-a-50-terabyte-postgresql-database-f64384b799e7

There might be other good tools I don't know of; I'm not an expert on upgrades.
Hopefully other pghackers can fill in.

Best regards,

Joel




Re: a misbehavior of partition row movement (?)

2021-03-23 Thread Amit Langote
Sawada-san,

On Wed, Mar 10, 2021 at 4:51 PM Masahiko Sawada  wrote:
> I looked at the 0001 patch and here are random comments. Please ignore
> a comment if it is already discussed.

Thanks a lot for the review and sorry for the delay in replying.

> ---
> @@ -9077,7 +9102,8 @@ addFkRecurseReferenced(List **wqueue, Constraint
> *fkconstraint, Relation rel,
>partIndexId, constrOid, numfks,
>mapped_pkattnum, fkattnum,
>pfeqoperators, ppeqoperators, 
> ffeqoperators,
> -  old_check_ok);
> +  old_check_ok,
> +  deleteTriggerOid, updateTriggerOid);
>
> /* Done -- clean up (but keep the lock) */
> table_close(partRel, NoLock);
> @@ -9126,8 +9152,12 @@ addFkRecurseReferencing(List **wqueue,
> Constraint *fkconstraint, Relation rel,
> Relation pkrel, Oid indexOid, Oid parentConstr,
> int numfks, int16 *pkattnum, int16 *fkattnum,
> Oid *pfeqoperators, Oid *ppeqoperators, Oid
> *ffeqoperators,
> -   bool old_check_ok, LOCKMODE lockmode)
> +   bool old_check_ok, LOCKMODE lockmode,
> +   Oid parentInsTrigger, Oid parentUpdTrigger)
>  {
>
> We need to update the function comments as well.

Done.

> ---
> I think it's better to add comments for newly added functions such as
> GetForeignKeyActionTriggers() and GetForeignKeyCheckTriggers() etc.
> Those functions have no comment at all.

I've added comments.

> BTW, two of the four newly added functions,
> AttachForeignKeyCheckTriggers() and DetachForeignKeyCheckTriggers(),
> have only one caller. Can we paste the function bodies where each
> function is called?

I made those pieces of code into functions because I thought future
patches may have a need for them.  But maybe those future patches
should do the refactoring, so I've incorporated their code into the
respective callers as you suggest.

> ---
> /*
>  * If the referenced table is a plain relation, create the action triggers
>  * that enforce the constraint.
>  */
> -   if (pkrel->rd_rel->relkind == RELKIND_RELATION)
> -   {
> -   createForeignKeyActionTriggers(rel, RelationGetRelid(pkrel),
> -  fkconstraint,
> -  constrOid, indexOid);
> -   }
> +   createForeignKeyActionTriggers(rel, RelationGetRelid(pkrel),
> +  fkconstraint,
> +  constrOid, indexOid,
> +  parentDelTrigger, parentUpdTrigger,
> +  &deleteTriggerOid, &updateTriggerOid);
>
> The comment needs to be updated.
> ---
>  /*
>   * If the referencing relation is a plain table, add the check triggers 
> to
>   * it and, if necessary, schedule it to be checked in Phase 3.
>   *
>   * If the relation is partitioned, drill down to do it to its partitions.
>   */
> +createForeignKeyCheckTriggers(RelationGetRelid(rel),
> +  RelationGetRelid(pkrel),
> +  fkconstraint,
> +  parentConstr,
> +  indexOid,
> +  parentInsTrigger, parentUpdTrigger,
> +  &insertTriggerOid, &updateTriggerOid);
>
> Same as above.

Done and done.

> ---
> I think TriggerSetParentTrigger needs to free the heap tuple copied by
> heap_copytuple().

Oops, done.

> ---
> +   Assert(trigForm->tgparentid == 0);
> +   if (trigForm->tgparentid != InvalidOid)
> +   elog(ERROR, "trigger %u already has a parent trigger",
> +childTrigId);
>
> I think the first condition in Assert() can be
> !OidIsValid(trigForm->tgparentid) and the second condition in 'if'
> statement can be OidIsValid(trigForm->tgparentid). So those are
> redundant?

Ah, indeed.  I've kept the if () elog(...) after applying your suggested change.

> ---
> -   if (fout->remoteVersion >= 9)
> +   if (fout->remoteVersion >= 13)
> +   {
> +   /*
> +* NB: think not to use pretty=true in pg_get_triggerdef.  It
> +* could result in non-forward-compatible dumps of WHEN clauses
> +* due to under-parenthesization.
> +*/
> +   appendPQExpBuffer(query,
> + "SELECT tgname, "
> + "tgfoid::pg_catalog.regproc AS tgfname, "
> + "pg_catalog.pg_get_triggerdef(oid,
> false) AS tgdef, "
> + "tgenabled, tableoid, oid "
> + "FROM pg_catalog.pg_trigger t "
> +  

Re: popcount

2021-03-23 Thread Peter Eisentraut

On 21.03.21 02:31, David Fetter wrote:

I have now read the entire internet on what a suitable name for this
function could be.  I think the emerging winner is BIT_COUNT(), which
already exists in MySQL, and also in Python (int.bit_count()) and Java
(Integer.bitCount()).


Thanks for doing this tedious work. Please find attached the next
version of the patch.


committing, with some polishing
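
For reference, a usage sketch of the chosen name (the result comments
assume the bit/bytea semantics discussed above; the exact signatures are
per the committed patch, which this digest does not show):

SELECT bit_count(B'1101');              -- 3 set bits
SELECT bit_count('\xdeadbeef'::bytea);  -- 24 set bits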




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-23 Thread Peter Smith
On Tue, Mar 23, 2021 at 10:44 AM Peter Smith  wrote:
>
> On Mon, Mar 22, 2021 at 11:51 PM Amit Kapila  wrote:
> >
> > I have incorporated all your changes and additionally made few more
> > changes (a) got rid of LogicalRepBeginPrepareData and instead used
> > LogicalRepPreparedTxnData, (b) made a number of changes in comments
> > and docs, (c) ran pgindent, (d) modified tests to use standard
> > wait_for_catchup function and removed a few tests to reduce the time and
> > to keep regression tests reliable.
>
> I checked all v65* / v66* differences and found only two trivial comment 
> typos.
>
> PSA patches to fix those.
>

Hi Amit.

PSA a patch to allow the ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
work when two-phase tristate is PENDING.

This is necessary for the pg_dump/pg_restore scenario, or for any other
use-case where the subscription might start off having no tables.

Please apply this on top of your v66-0001 (after applying the other
Feedback patches I posted earlier today).
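
For context, a sketch of the scenario this enables (object names are
illustrative; two_phase is the subscription option added by this patch
series):

-- A subscription created while the publication has no tables (as after a
-- pg_dump/pg_restore cycle) leaves the two-phase tristate PENDING.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION mypub
    WITH (two_phase = on);

-- With this patch, the refresh is allowed even while two_phase is PENDING:
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;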

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v66-0003-Fix-to-allow-REFRESH-PUBLICATION-for-two_phase-P.patch
Description: Binary data


Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-03-23 Thread Michael Paquier
On Tue, Mar 23, 2021 at 04:12:01PM +0900, Michael Paquier wrote:
> It takes some time to initialize a cluster under CLOBBER_CACHE_ALWAYS,
> but the test is quick enough to reproduce.  It would be good to bisect
> the origin point here as a first step.

One bisect later, the winner is:
commit: 3d351d916b20534f973eda760cde17d96545d4c4
author: Tom Lane 
date: Sun, 30 Aug 2020 12:21:51 -0400
Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.

I am too tired to poke at that today, so I'll try tomorrow.  Tom may
beat me at that though.
--
Michael


signature.asc
Description: PGP signature


Re: Add client connection check during the execution of the query

2021-03-23 Thread Thomas Munro
On Mon, Mar 22, 2021 at 3:29 PM Thomas Munro  wrote:
> 2.  The tests need tightening up.  The thing with the "sleep 3" will
> not survive contact with the build farm, and I'm not sure if the SSL
> test is as short as it could be.

I don't think the TAP test can be done in the way Sergey had it,
because of multiple races that would probably fail on
slow/overloaded/valgrind machines.  I'll try to think of a better way,
but for now I've removed those tests.

I realised that this should really be testing DoingCommandRead to
decide when it's time to stop checking and re-arming (originally it
was checking PqReadingMessage, which isn't quite right), so I moved a
tiny bit more of the logic into postgres.c, keeping only the actual
connection-check in pqcomm.c.

That leaves the thorny problem Tom mentioned at the top of this
thread[1]: this socket-level approach can be fooled by an 'X' sitting
in the socket buffer, left by a client that did PQsendQuery() and then
PQfinish().  Or perhaps even by SSL messages invisible to our protocol
level.  That can surely only be addressed by moving the 'peeking' one
level up the protocol stack.  I've attached a WIP attempt to do that,
on top of the other patch.  Lookahead happens in our receive buffer,
not the kernel's socket buffer.  It detects the simple 'X' case, but
not deeper pipelines of queries (which would seem to require an
unbounded receive buffer and lookahead that actually decodes messages
instead of just looking at the first byte, which seems way over the
top considering the price of infinite RAM these days).  I think it's
probably safe in terms of protocol synchronisation because it consults
PqCommReadingMsg to avoid looking at non-message-initial bytes, but I
could be wrong; it's a first swing at it...  Maybe it's a little
unprincipled to bother with detecting 'X' at all if you can't handle
pipelining in general... I don't know.

Today I learned that there have been other threads[2][3] with people
wanting some variant of this feature over the years.

[1] https://www.postgresql.org/message-id/19003.1547420739%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/flat/e09785e00907271728k4bf4d17kac0e7f5ec9316069%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/flat/20130810.113901.1014453099921841746.t-ishii%40sraoss.co.jp
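
For reference, enabling the new GUC from the patch below might look like
this (a sketch; the parameter name comes from the patch, the interval
value is arbitrary):

-- Probe the client socket every 10 seconds while a query runs;
-- 0 (the default) disables the check.
ALTER SYSTEM SET client_connection_check_interval = '10s';
SELECT pg_reload_conf();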
From d57726134ad51b794b6db31d5824f21f6fd40214 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 1 Mar 2021 18:08:23 +1300
Subject: [PATCH v6 1/2] Detect dropped connections while running queries.

Provide a new optional GUC that can be used to check whether the client
connection has gone away periodically while running very long queries.

Author: Sergey Cherkashin 
Author: Thomas Munro 
Reviewed-by: Thomas Munro 
Reviewed-by: Tatsuo Ishii 
Reviewed-by: Konstantin Knizhnik 
Reviewed-by: Zhihong Yu 
Reviewed-by: Tom Lane  (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml  | 36 +
 src/backend/libpq/pqcomm.c| 52 +++
 src/backend/tcop/postgres.c   | 27 ++
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 11 
 src/backend/utils/misc/guc.c  | 10 
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/libpq/libpq.h |  2 +
 src/include/miscadmin.h   |  1 +
 src/include/utils/timeout.h   |  1 +
 10 files changed, 144 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..5cd0d38dbf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -998,6 +998,42 @@ include_dir 'conf.d'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+Sets the time interval between checks that the client is still
+connected, while running queries.  The check is performed by testing
+whether one byte could be read from the socket with
+MSG_PEEK.  If the kernel reports that the connection
+has been closed or lost, a long running query can abort immediately,
+rather than discovering the problem when it eventually tries to send
+the response.
+   
+   
+If this value is specified without units, it is taken as milliseconds.  
+The default value is 0, which disables connection
+checks.  Without connection checks, the server will detect the loss of
+the connection only when it is waiting for a new request, receiving a
+request or sending a response.
+   
+   
+For the kernel itself to detect lost TCP connections reliably and
+within a known timeframe in all scenarios including network failure, it
+may also be necessary to adjust the default TCP k

Re: [HACKERS] logical decoding of two-phase transactions

2021-03-23 Thread Ajin Cherian
On Tue, Mar 23, 2021 at 9:01 PM Peter Smith  wrote:

>
>
> Please apply this on top of your v66-0001 (after applying the other
> Feedback patches I posted earlier today).
>

Applied all the above patches and did a 5-cascade test setup, and all the
instances synced correctly. Test log attached.

regards,
Ajin Cherian
Fujitsu Australia


5
Description: Binary data


Re: OpenSSL 3.0.0 compatibility

2021-03-23 Thread Peter Eisentraut

On 12.03.21 00:22, Daniel Gustafsson wrote:

On 12 Mar 2021, at 00:04, Peter Eisentraut  
wrote:

On 11.03.21 11:41, Daniel Gustafsson wrote:

Then there are a few where we get padding back where we really should have
ended up with the "Cipher cannot be initialized" error since DES is in the
legacy provider:
  select decrypt_iv(decode('50735067b073bb93', 'hex'), '0123456', 'abcd', 'des');
- decrypt_iv
-
- foo
+decrypt_iv
+--
+ \177\177\177\177\177\177\177\177
  (1 row)


The attached patch appears to address these cases.


+1, makes a lot of sense.  This removes said errors when running without
the legacy provider enabled, and all tests still pass with it enabled.


I have committed this to master.  I see that the commit fest entry has 
been withdrawn in the meantime.  I suppose we'll come back to this, 
including possible backpatching, when OpenSSL 3.0.0 is in beta.





Re: New IndexAM API controlling index vacuum strategies

2021-03-23 Thread Masahiko Sawada
On Mon, Mar 22, 2021 at 10:39 PM Masahiko Sawada  wrote:
>
> On Sat, Mar 20, 2021 at 11:05 AM Peter Geoghegan  wrote:
> >
> > On Wed, Mar 17, 2021 at 7:55 PM Peter Geoghegan  wrote:
> >
> > 2. A patch to remove the tupgone case.
> >
> > Several new and interesting changes here -- see below.
> >
> > 3. The patch to optimize VACUUM by teaching it to skip index and heap
> > vacuuming in certain cases where we only expect a very small benefit.
>
> I’ll review the other two patches tomorrow.

Here are review comments on 0003 patch:

+   /*
+* Check whether or not to do index vacuum and heap vacuum.
+*
+* We do both index vacuum and heap vacuum if more than
+* SKIP_VACUUM_PAGES_RATIO of all heap pages have at least one LP_DEAD
+* line pointer.  This is normally a case where dead tuples on the heap
+* are highly concentrated in relatively few heap blocks, where the
+* index's enhanced deletion mechanism that is clever about heap block
+* dead tuple concentrations including btree's bottom-up index deletion
+* works well.  Also, since we can clean only a few heap blocks, it would
+* be a less negative impact in terms of visibility map update.
+*
+* If we skip vacuum, we just ignore the collected dead tuples.  Note that
+* vacrelstats->dead_tuples could have tuples which became dead after
+* HOT-pruning but are not marked dead yet.  We do not process them
+* because it's a very rare condition, and the next vacuum will process
+* them anyway.
+*/

The second paragraph is no longer true after removing the 'tupgone' case.

---
if (dead_tuples->num_tuples > 0)
two_pass_strategy(onerel, vacrelstats, Irel, indstats, nindexes,
- lps, params->index_cleanup);
+ lps, params->index_cleanup,
+ has_dead_items_pages, !calledtwopass);

Maybe we can use vacrelstats->num_index_scans instead of
calledtwopass? When calling two_pass_strategy() at the end of
lazy_scan_heap(), if vacrelstats->num_index_scans is 0 it means this
is the first call, which is equivalent to calledtwopass = false.

---
-   params.index_cleanup = get_vacopt_ternary_value(opt);
+   {
+   if (opt->arg == NULL || strcmp(defGetString(opt), "auto") == 0)
+   params.index_cleanup = VACOPT_CLEANUP_AUTO;
+   else if (defGetBoolean(opt))
+   params.index_cleanup = VACOPT_CLEANUP_ENABLED;
+   else
+   params.index_cleanup = VACOPT_CLEANUP_DISABLED;
+   }


+   /*
+* Set index cleanup option based on reloptions if not set to either ON or
+* OFF.  Note that an VACUUM(INDEX_CLEANUP=AUTO) command is interpreted as
+* "prefer reloption, but if it's not set dynamically determine if index
+* vacuuming and cleanup" takes place in vacuumlazy.c.  Note also that the
+* reloption might be explicitly set to AUTO.
+*
+* XXX: Do we really want that?
+*/
+   if (params->index_cleanup == VACOPT_CLEANUP_AUTO &&
+   onerel->rd_options != NULL)
+   params->index_cleanup =
+   ((StdRdOptions *) onerel->rd_options)->vacuum_index_cleanup;

Perhaps we can make the INDEX_CLEANUP option a four-value option: on, off,
auto, and default? A problem with the above change would be that if the
user wants "auto" mode, they might need to reset the vacuum_index_cleanup
reloption before executing the VACUUM command. In other words, there is no
way in the VACUUM command to force "auto" mode. So I think we can add an
"auto" value to the INDEX_CLEANUP option and ignore the
vacuum_index_cleanup reloption if that value is specified.
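
A sketch of the resulting command surface under that proposal (the syntax
is illustrative, not committed):

VACUUM (INDEX_CLEANUP ON) pgbench_accounts;    -- always vacuum indexes
VACUUM (INDEX_CLEANUP OFF) pgbench_accounts;   -- never vacuum indexes
VACUUM (INDEX_CLEANUP AUTO) pgbench_accounts;  -- decide dynamically,
                                               -- ignoring the reloption
VACUUM pgbench_accounts;                       -- default: defer to the
                                               -- vacuum_index_cleanup reloption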

Are you also updating the 0003 patch? If you're focusing on the 0001 and
0002 patches, I'll update the 0003 patch along with the fourth patch
(skipping index vacuum in emergency cases).

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Jan Wieck

On 3/22/21 7:18 PM, Jan Wieck wrote:

On 3/22/21 5:36 PM, Zhihong Yu wrote:

Hi,

w.r.t. pg_upgrade_improvements.v2.diff.

+       blobBatchCount = 0;
+       blobInXact = false;

The count and bool flag are always reset in tandem. It seems the
blobInXact variable is not needed.


You are right. I will fix that.


New patch v3 attached.


Thanks, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services
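
With batching enabled, the emitted restore script would take roughly this
shape (a sketch assuming a batch size of 2; the banner comments come from
the patch below, while the OIDs and per-BLOB statements are illustrative):

--
-- Start BLOB restore batch
--
BEGIN;

SELECT pg_catalog.lo_create('16384');
SELECT pg_catalog.lo_create('16385');

--
-- BLOB batch size reached
--
COMMIT;
BEGIN;

SELECT pg_catalog.lo_create('16386');

--
-- End BLOB restore batch
--
COMMIT;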
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c7351a4..4a611d0 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -864,6 +864,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
 	WaitForCommands(AH, pipefd);
 
 	/*
+	 * Close a possibly open BLOB batch transaction.
+	 */
+	CommitBlobTransaction((Archive *)AH);
+
+	/*
 	 * Disconnect from database and clean up.
 	 */
 	set_cancel_slot_archive(slot, NULL);
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 0296b9b..cd8a590 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -203,6 +203,8 @@ typedef struct Archive
 	int			numWorkers;		/* number of parallel processes */
 	char	   *sync_snapshot_id;	/* sync snapshot id for parallel operation */
 
+	int			blobBatchSize;	/* # of blobs to restore per transaction */
+
 	/* info needed for string escaping */
 	int			encoding;		/* libpq code for client_encoding */
 	bool		std_strings;	/* standard_conforming_strings */
@@ -269,6 +271,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
 extern int	StartBlob(Archive *AH, Oid oid);
 extern int	EndBlob(Archive *AH, Oid oid);
 
+extern void	CommitBlobTransaction(Archive *AH);
 extern void CloseArchive(Archive *AH);
 
 extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 1f82c64..8331e8a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,7 @@ typedef struct _parallelReadyList
 	bool		sorted;			/* are valid entries currently sorted? */
 } ParallelReadyList;
 
+static int		blobBatchCount = 0;
 
 static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
 			   const int compression, bool dosync, ArchiveMode mode,
@@ -265,6 +266,8 @@ CloseArchive(Archive *AHX)
 	int			res = 0;
 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
 
+	CommitBlobTransaction(AHX);
+
 	AH->ClosePtr(AH);
 
 	/* Close the output */
@@ -279,6 +282,23 @@ CloseArchive(Archive *AHX)
 
 /* Public */
 void
+CommitBlobTransaction(Archive *AHX)
+{
+	ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+	if (blobBatchCount > 0)
+	{
+		ahprintf(AH, "--\n");
+		ahprintf(AH, "-- End BLOB restore batch\n");
+		ahprintf(AH, "--\n");
+		ahprintf(AH, "COMMIT;\n\n");
+
+		blobBatchCount = 0;
+	}
+}
+
+/* Public */
+void
 SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
 {
 	/* Caller can omit dump options, in which case we synthesize them */
@@ -3531,6 +3551,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 {
 	RestoreOptions *ropt = AH->public.ropt;
 
+	/* We restore BLOBs in batches to reduce XID consumption */
+	if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+	{
+		if (blobBatchCount > 0)
+		{
+			/* We are inside a BLOB restore transaction */
+			if (blobBatchCount >= AH->public.blobBatchSize)
+			{
+/*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ahprintf(AH, "--\n");
+ahprintf(AH, "-- BLOB batch size reached\n");
+ahprintf(AH, "--\n");
+ahprintf(AH, "COMMIT;\n");
+ahprintf(AH, "BEGIN;\n\n");
+
+blobBatchCount = 1;
+			}
+			else
+			{
+/* This one still fits into the current batch */
+blobBatchCount++;
+			}
+		}
+		else
+		{
+			/* Not inside a transaction, start a new batch */
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "-- Start BLOB restore batch\n");
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "BEGIN;\n\n");
+
+			blobBatchCount = 1;
+		}
+	}
+	else
+	{
+		/* Not a BLOB. If we have a BLOB batch open, close it. */
+		if (blobBatchCount > 0)
+		{
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "-- End BLOB restore batch\n");
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "COMMIT;\n\n");
+
+			blobBatchCount = 0;
+		}
+	}
+
 	/* Select owner, schema, tablespace and default AM as necessary */
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..f153f08 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -165,12 +165,20 @@ static void guessConstraintInheritance(TableInfo *tblinfo, int numTables);
 static void dumpComment(Archive *fout, const char *type, const char *name,
 		const char *namespace, const char *owner,
 		CatalogId catalogId, int subid, DumpId dumpId);
+static bool dumpCommentQuery(Archive *fout, PQExpBuf

Re: Wired if-statement in gen_partprune_steps_internal

2021-03-23 Thread Amit Langote
Hi Ryan,

On Tue, Mar 23, 2021 at 2:24 AM Ryan Lambert  wrote:
> Should the status of this patch be updated to Ready for Committer to get
> in line for the PG 14 deadline?

Yes, I've done that.  Thanks for the reminder.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




[PATCH] Allow multiple recursive self-references

2021-03-23 Thread Denis Hirn

Hey everyone,

As you know, Postgres currently supports SQL:1999 recursive common table
expressions, using WITH RECURSIVE. However, Postgres does not allow more than
one recursive self-reference in the recursive term. This restriction seems to be
unnecessary.

In this mail, I'd like to propose a patch that removes this restriction, and 
therefore allows the use of multiple self-references in the recursive term.
After the patch:

WITH RECURSIVE t(n) AS (
VALUES(1)
  UNION ALL
SELECT t.n+f.n
FROM t, t AS f
WHERE t.n < 100
) SELECT * FROM t;

  n
-
   1
   2
   4
   8
  16
  32
  64
 128
(8 rows)

This feature deviates only slightly from the current WITH RECURSIVE, and
requires very few changes (~10 LOC). Any thoughts on this?

--
Denis Hirn


0001-Allow-multiple-recursive-self-references.patch
Description: Binary data


Re: PATCH: Attempt to make dbsize a bit more consistent

2021-03-23 Thread gkokolatos






‐‐‐ Original Message ‐‐‐
On Wednesday, March 17, 2021 6:35 AM, Michael Paquier  
wrote:

> On Mon, Mar 15, 2021 at 03:10:59PM +0900, Michael Paquier wrote:
>
> > Anyway, as mentioned by other people upthread, I am not really
> > convinced either that we should have more flavors of size functions,
> > particularly depending on the relkind as this would be confusing for
> > the end-user. pg_relation_size() can already do this job for all
> > relkinds that use segment files, so it should still be able to hold
> > its ground and maintain a consistent behavior with what it does
> > currently.
>
> On top of the rest of my notes, there are two more things that would
> face a behavior change if making the existing functions go through
> table AMs, which would scan the data in the smgr:

I am not certain if you are referring to v5 (sent earlier than your mail)
or v4. Can you please verify?

>
> -   After a VACUUM, the relation would be reported with a size of 0,
> while that may not be the case of on-disk files yet.

I am not really following here, apologies. The table AM may or may not
choose to go through smgr, depending on the implementation. The only
currently known implementation, heap, does invalidate smgr, based on
what I can see, after a VACUUM. I have not been able to create a test
case, either with or without v5 of the patch, where a different result
would be returned.

What have I missed?

>
> -   Temporary tables of other sessions would be accessible.

I am not really certain I am following. Commit 6919b7e3294 from 2012
notes that calculate_relation_size can be safely applied to temp tables
of other sessions. v5 of the patch does not change that behaviour. Nor
did previous versions, but those are already obsolete.

>
> So we would likely want a separate function. Another possibility,
> which I find tempting, would be to push down the calculation logic
> relying on physical files down to the table AM themselves with a new
> dedicated callback (relation_size_physical?), as it seems to me that
> the most important thing users want to know with this set of functions
> is how much physical space is being consumed at one given point in
> time. Attached is a small prototype I was just playing with.
> --
> Michael
>






Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread James Coleman
On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao  wrote:
>
>
>
> On 2021/03/23 3:59, James Coleman wrote:
> > Are you saying we should only change the message for a single case:
> > the case where we'd otherwise allow connections but EnableHotStandby
> > is false?
>
> No. Let me clarify my opinion.
>
> At PM_STARTUP, "the database system is starting up" should be logged
> whatever the setting of hot_standby is. This is the same as the original
> behavior. During crash recovery, this message is output. Also at archive
> recovery or standby server, until the startup process sends
> PMSIGNAL_RECOVERY_STARTED, this message is logged.
>
> At PM_RECOVERY, originally "the database system is starting up" was logged
> whatever the setting of hot_standby is. My opinion is the same as our
> consensus, i.e., "the database system is not accepting connections" and
> "Hot standby mode is disabled." are logged if hot_standby is disabled.
> "the database system is not accepting connections" and "Consistent
>   recovery state has not been yet reached." are logged if hot_standby is
>   enabled.
>
> After the consistent recovery state is reached, if hot_standby is disabled,
> the postmaster state is still PM_RECOVERY. So "Hot standby mode is disabled."
> is still logged in this case. This is also different behavior from the 
> original.
> If hot_standby is enabled, read-only connections can be accepted because
> the consistent state is reached. So no message needs to be logged.
>
> Therefore, for now, the one thing we've not reached consensus on is what
> message should be logged at PM_STARTUP. I'm thinking it's better to log
> "the database system is starting up" in that case because of the reasons
> that I explained upthread.
>
> Thought?

I understand your point now, and I agree, that makes sense.

The attached takes a similar approach to your v5, but I've used
CAC_NOTCONSISTENT instead of CAC_NOCONSISTENT because I think it reads
better (CAC_INCONSISTENT would technically be better English,
but...also it doesn't parallel the code and error message).

Thoughts?

James Coleman


v7-0001-Improve-standby-connection-denied-error-message.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2021-03-23 Thread Fabrízio de Royes Mello
On Tue, Mar 23, 2021 at 8:47 AM Drouvot, Bertrand 
wrote:
>
> I have one remark regarding the conflicts:
>
> The logical slots are dropped if a conflict is detected.
>
> But, if the slot is not active before being dropped (say wal_level is
> changed to < logical on the master and a logical slot is not active on
> the standby), then its corresponding
> pg_stat_database_conflicts.confl_logicalslot is not incremented (as it
> would be incremented "only" during the cancel of an "active" backend).
>
> I think it should be incremented in all the cases (active or not); what
> do you think?
>

Good catch... IMHO it should be incremented as well!!!

> I updated the patch to handle this scenario (see the new
> pgstat_send_droplogicalslot() in
> v12-0003-Handle-logical-slot-conflicts-on-standby.patch).
>

Perfect.

> I also added more tests in 023_standby_logical_decoding_conflicts.pl to
> verify that pg_stat_database_conflicts.confl_logicalslot is updated as
> expected (see check_confl_logicalslot() in
> v12-0004-New-TAP-test-for-logical-decoding-on-standby.patch).
>

Perfect.
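
For reference, the counter under discussion can then be checked like this
(confl_logicalslot is the column this patch set adds to the existing view):

SELECT datname, confl_logicalslot
FROM pg_stat_database_conflicts
WHERE datname = current_database();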

> Except this remark and the associated changes, then it looks good to me.
>

LGTM too... Reviewing the new changes now to move it forward and make this
patch set ready for committer review.

Regards,

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: WIP: BRIN multi-range indexes

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-22, Tomas Vondra wrote:

> I don't know what's the right fix, but it seems like this patch has
> nothing to do with it. If we want to move the opclasses into an
> extension, we can comment out that one (cidr/inet) case for now.

I don't know what would be a good reason to define the opclasses in
separate contrib extensions.  I think it's going to be a nuisance to
users, so unless there is some strong argument for it, I'd suggest not
to do it.  I found it being discussed here:
https://www.postgresql.org/message-id/CA%2BTgmoajaQKBUx%3DvaTUFo6z80dsRzBw__Nu41Q4t06baZep3Ug%40mail.gmail.com
but there weren't any strong arguments put forward.

It seems a good experiment to have done it, though, since we now know
that there is a limitation in the existing SQL interface.  Maybe the fix
to that problem is to add a new clause to CREATE/ALTER OPERATOR CLASS to
let you define what goes into opckeytype.  However I don't think it's
this patch's responsibility to fix that problem.

-- 
Álvaro Herrera   Valdivia, Chile
"Hay que recordar que la existencia en el cosmos, y particularmente la
elaboración de civilizaciones dentro de él no son, por desgracia,
nada idílicas" (Ijon Tichy)




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-03-23 Thread David Steele

On 3/18/21 9:52 AM, David Steele wrote:

On 1/5/21 4:40 PM, Paul Martinez wrote:


I've created a patch to better support referential integrity constraints
when using composite primary and foreign keys. This patch allows creating
a foreign key using the syntax:

I previously proposed this feature about a year ago [1], but I don't feel
like the arguments against it were very strong.


Perhaps not, but Tom certainly didn't seem in favor of this feature, at 
the least.


Since the patch has not attracted any review or comments, I propose to
move it to the next CF, as it doesn't seem a likely candidate for PG 14.


I'll do that on March 23 unless there are arguments to the contrary.


This entry has been moved to the 2021-07 CF with status Needs Review.

Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-03-23 Thread Andrei Zubkov
Dear Kuroda,

> I don't like the idea because such a column has no meaning for the
> specific row.
> I prefer storing timestamp if GetCurrentTimestamp() is cheap.
I agree. New version attached.
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From ac68636c8a26223854292ee5860f4a5989cb985a Mon Sep 17 00:00:00 2001
From: Andrei Zubkov 
Date: Tue, 23 Mar 2021 17:03:34 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

Added a first_seen column to the pg_stat_statements view. This field is
populated with the current timestamp when a new statement is added to the
pg_stat_statements hashtable. It provides clean information about the
statistics collection time interval for each statement. Besides, it can
be used by sampling solutions to detect situations when a statement was
evicted and returned back between two samples.

Discussion: https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 .../expected/pg_stat_statements.out   | 22 
 .../pg_stat_statements--1.8--1.9.sql  | 54 +++
 .../pg_stat_statements/pg_stat_statements.c   | 31 +--
 .../sql/pg_stat_statements.sql|  8 +++
 doc/src/sgml/pgstatstatements.sgml|  9 
 5 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..ffe08dece5 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,26 @@ SELECT dealloc FROM pg_stat_statements_info;
0
 (1 row)
 
+--
+-- statement timestamps
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT now() AS start_ts \gset
+SELECT 1 AS "STMTTS";
+ STMTTS 
+
+  1
+(1 row)
+
+SELECT count(*) FROM pg_stat_statements WHERE first_seen >= :'start_ts' AND query LIKE '%STMTTS%';
+ count 
+---
+ 1
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
index 3504ca7eb1..9842d2da49 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
@@ -3,6 +3,60 @@
 -- complain if script is sourced in psql, rather than via ALTER EXTENSION
 \echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'" to load this file. \quit
 
+/* We need to redefine a view and a function */
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+OUT userid oid,
+OUT dbid oid,
+OUT queryid bigint,
+OUT query text,
+OUT plans int8,
+OUT total_plan_time float8,
+OUT min_plan_time float8,
+OUT max_plan_time float8,
+OUT mean_plan_time float8,
+OUT stddev_plan_time float8,
+OUT calls int8,
+OUT total_exec_time float8,
+OUT min_exec_time float8,
+OUT max_exec_time float8,
+OUT mean_exec_time float8,
+OUT stddev_exec_time float8,
+OUT rows int8,
+OUT shared_blks_hit int8,
+OUT shared_blks_read int8,
+OUT shared_blks_dirtied int8,
+OUT shared_blks_written int8,
+OUT local_blks_hit int8,
+OUT local_blks_read int8,
+OUT local_blks_dirtied int8,
+OUT local_blks_written int8,
+OUT temp_blks_read int8,
+OUT temp_blks_written int8,
+OUT blk_read_time float8,
+OUT blk_write_time float8,
+OUT wal_records int8,
+OUT wal_fpi int8,
+OUT wal_bytes numeric,
+OUT first_seen timestamp with time zone
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_9'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
 --- Define pg_stat_statements_info
 CREATE FUNCTION pg_stat_statements_info(
 OUT dealloc bigint,
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..090889e21b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -99,7 +99,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20201218;
+static const uint32 PGSS_FILE_HEADER = 0x20210322;
 
 /* PostgreSQL major version number, changes in which invalidate al

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-23 Thread Alvaro Herrera
So I was about ready to get these patches pushed, when I noticed that in
REPEATABLE READ isolation mode it is possible to insert rows violating
an FK referencing the partition that is being detached.  I'm not sure
what is a good solution to this problem.

The problem goes like this:

/* setup */
drop table if exists d4_primary, d4_primary1, d4_fk;
create table d4_primary (a int primary key) partition by list (a);
create table d4_primary1 partition of d4_primary for values in (1);
insert into d4_primary values (1);
create table d4_fk (a int references d4_primary);

/* session 1 */
begin isolation level repeatable read;
select * from d4_primary;

/* session 2 */
alter table d4_primary detach partition d4_primary1 concurrently;
-- blocks
-- Cancel wait: Ctrl-c

/* session 1 */
insert into d4_fk values (1);
commit;

At this point, d4_fk contains the value (1) which is not present in
d4_primary.

This doesn't happen in READ COMMITTED mode; the INSERT at the final step
fails with "insert or update in table d4_fk violates the foreign key",
which is what I expected to happen here too.

I had the idea that the RI code, in REPEATABLE READ mode, used a
different snapshot for the RI queries than the transaction snapshot.
Maybe I'm wrong about that.

I'm looking into that now.

-- 
Álvaro Herrera   Valdivia, Chile
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)




Re: Minimal logical decoding on standbys

2021-03-23 Thread Fabrízio de Royes Mello
On Tue, Mar 23, 2021 at 10:18 AM Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
> LGTM too... Reviewing new changes now to move it forward and make this
> patch set ready for committer review.
>

As for the feature, LGTM and all tests passed. Documentation is also
OK. Some minor comments:

+
+ A logical replication slot can also be created on a hot standby. To prevent
+ VACUUM from removing required rows from the system
+ catalogs, hot_standby_feedback should be set on the
+ standby. In spite of that, if any required rows get removed, the slot gets
+ dropped.  Existing logical slots on standby also get dropped if wal_level
+ on primary is reduced to less than 'logical'.
+

Remove extra space before "Existing logical slots..."

+pg_stat_get_db_conflict_logicalslot(D.oid) AS confl_logicalslot,

Move it to the end of pg_stat_database_conflicts columns


+* is being reduced.  Hence this extra check.

Remove extra space before "Hence this..."


+   /* Send the other backend, a conflict recovery signal */
+
+   SetInvalidVirtualTransactionId(vxid);

Remove extra empty line


+   if (restart_lsn % XLOG_BLCKSZ != 0)
+   elog(ERROR, "invalid replay pointer");

Add an empty line after this "IF" for code readability


+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+   char *conflict_reason)
+{
+   int i;
+   boolfound_conflict = false;
+
+   if (max_replication_slots <= 0)
+   return;

What about adding an "Assert(max_replication_slots >= 0);" before the
replication slots check?

One last thing is about the name of TAP tests, we should rename them
because there are other TAP tests starting with 022_ and 023_. It should be
renamed to:

src/test/recovery/t/022_standby_logical_decoding_xmins.pl ->
src/test/recovery/t/024_standby_logical_decoding_xmins.pl
src/test/recovery/t/023_standby_logical_decoding_conflicts.pl
-> src/test/recovery/t/025_standby_logical_decoding_conflicts.pl

Regards,

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: Add Nullif case for eval_const_expressions_mutator

2021-03-23 Thread David Steele

On 1/19/21 8:16 PM, Hou, Zhijie wrote:


Attaching v3 patch, please consider it for further review.


Peter, thoughts on the new patch in [1]?

--
-David
da...@pgmasters.net

[1]  
https://www.postgresql.org/message-id/ab53b3dbdbd6436f970f33b51ccd7dd3%40G08CNEXMBPEKD05.g08.fujitsu.local





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:
> On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:
> > On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> > > Hi,
> > > For queryjumble.c :
> > > 
> > > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> > > 
> > > The year should be updated.
> > > Same with queryjumble.h
> > 
> > Thanks, fixed.
> 
> Thanks also for taking care of that.  While at it I see that current HEAD has 
> a
> lot of files with the same problem:
> 
> $ git grep "\-2020"
> config/config.guess:#   Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.sub:#   Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
> contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global 
> Development Group
> src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, 
> PostgreSQL Global Development Group
> src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, 
> PostgreSQL Global Development Group
> src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL 
> Global Development Group
> src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global 
> Development Group
> src/bin/pg_rewind/po/de.po:# Peter Eisentraut , 
> 2015-2020.
> src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
> Development Group
> src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
> Development Group
> src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
> Development Group
> src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
> Global Development Group
> src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
> Global Development Group
> src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
> Global Development Group
> src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, 
> PostgreSQL Global Development Group
> src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL 
> Global Development Group
> src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut 
> , 2009-2020.
> 
> Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
> non .po files?

No, I don't think so.  We don't change the Free Software Foundation
copyrights, and the .po files get loaded from another repository
occasionally.  The hex/sha copyrights came from patches developed in
2020 but committed in 2021.  These will mostly be corrected in 2022.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Replication slot stats misgivings

2021-03-23 Thread Masahiko Sawada
On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila  wrote:
>
> On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada  
> wrote:
> >
> > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund  wrote:
> > > >
> > > > - If max_replication_slots was lowered between a restart,
> > > >   pgstat_read_statfile() will happily write beyond the end of
> > > >   replSlotStats.
> > >
> > > I think we cannot restart the server after lowering
> > > max_replication_slots to a value less than the number of replication
> > > slots actually created on the server. No?
> >
> > This problem happens in the case where max_replication_slots is
> > lowered and there still are stats for a slot.
> >
>
> I think this can happen only if the drop message is lost, right?

Yes, I think you're right. In that case, the stats file could have
more slots statistics than the lowered max_replication_slots.

>
> > I understood the risk of running out of replSlotStats. If we use the
> > index in replSlotStats instead, IIUC we need to somehow synchronize
> > the indexes in between replSlotStats and
> > ReplicationSlotCtl->replication_slots. The order of replSlotStats is
> > preserved across restarting whereas the order of
> > ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
> > StartupReplicationSlots() doesn’t guarantee the order of the returned
> > entries in the directory). Maybe we can compare the slot name in the
> > received message to the name in the element of replSlotStats. If they
> > don’t match, we swap entries in replSlotStats to synchronize the index
> > of the replication slot in ReplicationSlotCtl->replication_slots and
> > replSlotStats. If we cannot find the entry in replSlotStats that has
> > the name in the received message, it probably means either it's a new
> > slot or the previous create message is dropped, we can create the new
> > stats for the slot. Is that what you mean, Andres?
> >
>
> I wonder how in this scheme, we will remove the risk of running out of
> 'replSlotStats' and still restore correct stats assuming the drop
> message is lost? Do we want to check after restoring each slot info
> whether the slot with that name exists?

Yeah, I think we need such a check at least if the number of slot
stats in the stats file is larger than max_replication_slots. Or we
can do that at every startup to remove orphaned slot stats.
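Roughly something like this, wherever we end up being able to see both
the restored slots and the stats (slot_exists_on_disk() is a
hypothetical stand-in for whatever lookup we would actually use):

	for (i = 0; i < nReplSlotStats; i++)
	{
		/* drop stats whose slot no longer exists after restart */
		if (!slot_exists_on_disk(replSlotStats[i].slotname))
			pgstat_report_replslot_drop(replSlotStats[i].slotname);
	}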

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-03-23 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Mar 23, 2021 at 04:12:01PM +0900, Michael Paquier wrote:
>> It takes some time to initialize a cluster under CLOBBER_CACHE_ALWAYS,
>> but the test is quick enough to reproduce.  It would be good to bisect
>> the origin point here as a first step.

> One bisect later, the winner is:
> commit: 3d351d916b20534f973eda760cde17d96545d4c4
> author: Tom Lane 
> date: Sun, 30 Aug 2020 12:21:51 -0400
> Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.

> I am too tired to poke at that today, so I'll try tomorrow.  Tom may
> beat me at that though.

I think that's an artifact.  That commit didn't touch anything related to
relation opening or closing.  What it could have done, though, is change
CLUSTER's behavior on this empty table from use-an-index to use-a-seqscan,
thus causing us to follow the buggy code path where before we didn't.

The interesting question here seems to be "why didn't the existing
CLOBBER_CACHE_ALWAYS buildfarm testing catch this?".  It looks to me like
the answer is that it only happens for an empty table (or at least one
where the data pattern is such that we skip the RelationOpenSmgr call
earlier in end_heap_rewrite) and we don't happen to be exercising that
exact scenario in the regression tests.

regards, tom lane




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-23, Alvaro Herrera wrote:

> So I was about ready to get these patches pushed, when I noticed that in
> REPEATABLE READ isolation mode it is possible to insert rows violating
> an FK referencing the partition that is being detached.  I'm not sure
> what is a good solution to this problem.

...

> I had the idea that the RI code, in REPEATABLE READ mode, used a
> different snapshot for the RI queries than the transaction snapshot.

I am definitely right about this.  So why doesn't it work?  The reason
is that when SPI goes to execute the query, it obtains a new partition
directory, and we tell it to include detached partitions precisely
because we're in REPEATABLE READ mode.

In other words, the idea that we can blanket use the snapshot-isolation
condition to decide whether to include detached partitions or not, is
bogus and needs at least the refinement that for any query that comes
from the RI system, we need a partition directory that does not include
detached partitions.
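As a sketch of what that could mean at the call site (the boolean
parameter is hypothetical; today CreatePartitionDirectory() takes only
a memory context):

	/* RI queries would request a directory omitting detached partitions */
	pdir = CreatePartitionDirectory(CurrentMemoryContext, false);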

-- 
Álvaro Herrera39°49'30"S 73°17'W
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 08:51:32AM -0400, Jan Wieck wrote:
> On 3/22/21 7:18 PM, Jan Wieck wrote:
> > On 3/22/21 5:36 PM, Zhihong Yu wrote:
> > > Hi,
> > > 
> > > w.r.t. pg_upgrade_improvements.v2.diff.
> > > 
> > > +       blobBatchCount = 0;
> > > +       blobInXact = false;
> > > 
> > > The count and bool flag are always reset in tandem. It seems
> > > variable blobInXact is not needed.
> > 
> > You are right. I will fix that.
> 
> New patch v3 attached.

Would it be better to allow pg_upgrade to pass arbitrary arguments to
pg_restore, instead of just these specific ones?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pgsql: Move tablespace path re-creation from the makefiles to pg_regres

2021-03-23 Thread Christoph Berg
Re: Michael Paquier
> Move tablespace path re-creation from the makefiles to pg_regress
> 
> Moving this logic into pg_regress fixes a potential failure with
> parallel tests when pg_upgrade and the main regression test suite both
> trigger the makefile rule that cleaned up testtablespace/ under
> src/test/regress.  Even if pg_upgrade was triggering this rule, it has
> no need to do so as it uses a different tablespace path.  So if
> pg_upgrade triggered the makefile rule for the tablespace setup while
> the main regression test suite ran the tablespace cases, it would fail.
> 
> 61be85a was a similar attempt at achieving that, but that broke cases
> where the regression tests require to run under an Administrator
> account, like with Appveyor.

This change broke running the testsuite on an existing PG server, if
server user and pg_regress client user are different. This is one of
the tests exercised by Debian's autopkgtest suite.

Previously I could create the tablespace directory, chown it to
postgres, and fire up pg_regress with the correct options. Now
pg_regress wipes that directory, recreates it, and then the server
can't use it because user "postgres" can't write to it.

I'm working around the problem now by running the tests as user
"postgres", but this completely breaks in environments where users want
to run the testsuite from a separate compilation user but don't have root.

Old code: 
https://salsa.debian.org/postgresql/postgresql/-/blob/8b1217fcae3e64155bc35517acbd50c6f166d997/debian/tests/installcheck
Workaround: 
https://salsa.debian.org/postgresql/postgresql/-/blob/cbc0240bec738b6ab3b61c498825b82c8ff21a70/debian/tests/installcheck

Christoph




Re: tool to migrate database

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 09:49:57AM +0100, Joel Jacobson wrote:
> I recently read an interesting real-life story from a very big company, Adyen,
> and how they upgraded their 50 terabyte PostgreSQL database. The article is
> from 2018 but I still think it's relevant:
> 
> https://medium.com/adyen/
> updating-a-50-terabyte-postgresql-database-f64384b799e7
> 
> There might be other good tools I don't know of, I'm not an expert on 
> upgrades.
> Hopefully other pghackers can fill in.

This is not an appropriate topic for the hackers email list, which is
for internal server development discussion.  The 'general' or 'admin'
lists would be better.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-03-23 Thread Japin Li

On Mon, 22 Mar 2021 at 11:14, Bharath Rupireddy 
 wrote:
> On Sun, Mar 7, 2021 at 7:21 PM Japin Li  wrote:
>> Thank you point out this.  Fixed it in v7 patch set.
>>
>> Please consider the v7 patch for futher review.
>
> Thanks for the patches. I just found the following behaviour with the
> new ADD/DROP syntax: when the specified publication list has
> duplicates, the patch is throwing "publication is already present"
> error. It's adding the first instance of the duplicate into the list
> and the second instance is being checked in the added list and
> throwing the "already present error". The error message means that the
> publication is already present in the subscription but it's not true.
> See my testing at [1].
>
> I think we have two cases:
> case 1: the publication/s specified in the new ADD/DROP syntax may/may
> not have already been associated with the subscription, so the error
> "publication is already present"/"publication doesn't exist" error
> makes sense.
> case 2: there can be duplicate publications specified in the new
> ADD/DROP syntax, in this case the error "publication name "mypub2"
> used more than once" makes more sense much like [2].
>
> [1]
> postgres=# select subpublications from pg_subscription;
>  subpublications
> -
>  {mypub,mypub1}
>
> postgres=# alter subscription mysub add publication mypub2, mypub2;
> ERROR:  publication "mypub2" is already present in the subscription
>
> postgres=# select subpublications from pg_subscription;
> subpublications
> ---
>  {mypub,mypub1,mypub2}
>
> postgres=# alter subscription mysub drop publication mypub2, mypub2;
> ERROR:  publication "mypub2" doesn't exist in the subscription
>
> [2]
> postgres=# alter subscription mysub set publication mypub2, mypub2;
> ERROR:  publication name "mypub2" used more than once
>

Thanks for your review.

I check the duplicates for newpublist in merge_publications(). The code is
copied from publicationListToArray().
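For reference, that check amounts to the following O(n^2) scan over the
new publication list (a sketch of the copied logic, not the exact patch
hunk):

	foreach(cell, newpublist)
	{
		char	   *name = strVal(lfirst(cell));
		ListCell   *pcell;

		foreach(pcell, newpublist)
		{
			if (pcell == cell)
				break;	/* only compare against earlier entries */
			if (strcmp(name, strVal(lfirst(pcell))) == 0)
				ereport(ERROR,
						(errcode(ERRCODE_DUPLICATE_OBJECT),
						 errmsg("publication name \"%s\" used more than once",
								name)));
		}
	}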

I do not check for all duplicates because it will make the code more complex.
For example:

ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub2, mypub2, mypub2;

If we recorded the duplicate publication names in a list A, then when
we find a duplicate in newpublist, we would have to check whether the
publication is already in list A, to keep the error message sensible
(i.e., not repeat publication names in the error message).

Thoughts?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

>From 632505be60af48a8d9514e58d536f3acaff48550 Mon Sep 17 00:00:00 2001
From: Japin Li 
Date: Sun, 7 Mar 2021 12:56:55 +
Subject: [PATCH v8 1/4] Introduce a new syntax to add/drop publications

At present, if we want to update publications in a subscription, we can
use SET PUBLICATION; however, it requires supplying all existing
publications as well as the new ones when we want to add publications,
which is inconvenient.  The new syntax only requires supplying the new
publications.  When the refresh is true, it only refreshes the new
publications.
---
 src/backend/commands/subscriptioncmds.c | 153 
 src/backend/parser/gram.y   |  20 
 src/include/nodes/parsenodes.h  |   2 +
 3 files changed, 175 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index bfd3514546..368ee36961 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -47,6 +47,7 @@
 #include "utils/syscache.h"
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
+static List *merge_publications(List *oldpublist, List *newpublist, bool addpub);
 static void ReportSlotConnectionError(List *rstates, Oid subid, char *slotname, char *err);
 
 
@@ -964,6 +965,53 @@ AlterSubscription(AlterSubscriptionStmt *stmt, bool isTopLevel)
 break;
 			}
 
+		case ALTER_SUBSCRIPTION_ADD_PUBLICATION:
+		case ALTER_SUBSCRIPTION_DROP_PUBLICATION:
+			{
+bool	copy_data = false;
+bool	isadd = stmt->kind == ALTER_SUBSCRIPTION_ADD_PUBLICATION;
+bool	refresh;
+List   *publist = NIL;
+
+publist = merge_publications(sub->publications, stmt->publication, isadd);
+
+parse_subscription_options(stmt->options,
+		   NULL,	/* no "connect" */
+		   NULL, NULL,	/* no "enabled" */
+		   NULL,	/* no "create_slot" */
+		   NULL, NULL,	/* no "slot_name" */
+		   isadd ? &copy_data : NULL, /* for drop, no "copy_data" */
+		   NULL,	/* no "synchronous_commit" */
+		   &refresh,
+		   NULL, NULL,	/* no "binary" */
+		   NULL, NULL); /* no "streaming" */
+
+values[Anum_pg_subscription_subpublications - 1] =
+	publicationListToArray(publist);
+replaces[Anum_pg_subscription_subpublications - 1] = true;
+
+update_tuple = true;
+
+/* Refresh if user asked us to. */
+if (refresh)
+{
+	if (!sub->enabled)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-22, Bruce Momjian wrote:

> --- a/doc/src/sgml/ref/explain.sgml
> +++ b/doc/src/sgml/ref/explain.sgml
> @@ -136,8 +136,10 @@ ROLLBACK;
>the output column list for each node in the plan tree, schema-qualify
>table and function names, always label variables in expressions with
>their range table alias, and always print the name of each trigger for
> -  which statistics are displayed.  This parameter defaults to
> -  FALSE.
> +  which statistics are displayed.  The query identifier will also be
> +  displayed if one has been compute, see <xref
> +  linkend="guc-compute-query-id"/> for more details.  This parameter
> +  defaults to FALSE.

Typo here, "has been computed".

Is the intention to commit each of these patches separately?

-- 
Álvaro Herrera   Valdivia, Chile




Re: 64-bit XIDs in deleted nbtree pages

2021-03-23 Thread Masahiko Sawada
On Mon, Mar 22, 2021 at 7:27 AM Peter Geoghegan  wrote:
>
> On Wed, Mar 10, 2021 at 5:34 PM Peter Geoghegan  wrote:
> > Here is another bitrot-fix-only revision, v9. Just the recycling patch 
> > again.
>
> I committed the final nbtree page deletion patch just now -- the one
> that attempts to make recycling happen for newly deleted pages. Thanks
> for all your work on patch review, Masahiko!

You're welcome! Those are really good improvements.

By this patch series, btree indexes became like hash indexes in terms
of amvacuumcleanup: we do an index scan at btvacuumcleanup() in two
cases, metapage upgrading and more than 5% deleted-but-not-yet-recycled
pages. Both seem like rare cases. So do we want to disable parallel
index cleanup for btree indexes, as for hash indexes? That is, remove
VACUUM_OPTION_PARALLEL_COND_CLEANUP from amparallelvacuumoptions. IMO
we can live with the current configuration, just in case the user runs
into such rare situations (especially the latter case). In most cases,
parallel vacuum workers for index cleanup would exit as a no-op, but
the side effects (wasted resources, overhead, etc.) would not be big.
If we wanted to enable it only in particular cases, we would need
another way for an index AM to tell lazy vacuum whether or not to allow
a parallel worker to process the index at that time. What do you think?

I’m not sure we need changes, but I think it’s worth discussing here.
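For reference, the removal being discussed would amount to a one-line
change in bthandler() (a sketch, assuming the current flag layout in
nbtree.c):

	/* hypothetically dropping the conditional-cleanup flag */
	amroutine->amparallelvacuumoptions =
		VACUUM_OPTION_PARALLEL_BULKDEL;
	/* i.e., no longer OR-ing in VACUUM_OPTION_PARALLEL_COND_CLEANUP */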

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-23 Thread Alvaro Herrera
I'm coming around to the idea that the fact that you can cancel the wait
phase of DETACH CONCURRENTLY creates quite a disaster, and it's not easy
to get away from it.  The idea that REPEATABLE READ mode means that you
now see detached partitions as if they were in normal condition, is
completely at odds with that behavior. 

I think a possible solution to this problem is that the "detach" flag in
pg_inherits is not a boolean anymore, but an Xid (or maybe two Xids).
Not sure exactly which Xid(s) yet, and I'm not sure what are the exact
rules, but the Xid becomes a marker that indicates a horizon past which
the partition is no longer visible.  Then, REPEATABLE READ can see the
partition, but only if its snapshot is older than the Xid.
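As a rough illustration of that visibility rule (every name here is
hypothetical; XidInMVCCSnapshot() reports whether an xid still counts
as in-progress according to a given snapshot):

	if (TransactionIdIsValid(detachXid) &&
		XidInMVCCSnapshot(detachXid, snapshot))
	{
		/*
		 * The detaching transaction is not yet visible to our
		 * snapshot, so this partition must still be treated as
		 * attached.
		 */
		include_partition = true;
	}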

-- 
Álvaro Herrera   Valdivia, Chile
"La persona que no quería pecar / estaba obligada a sentarse
 en duras y empinadas sillas/ desprovistas, por cierto
 de blandos atenuantes"  (Patricio Vogel)




Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-03-23 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> One bisect later, the winner is:
>> commit: 3d351d916b20534f973eda760cde17d96545d4c4
>> author: Tom Lane 
>> date: Sun, 30 Aug 2020 12:21:51 -0400
>> Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.

> I think that's an artifact.  That commit didn't touch anything related to
> relation opening or closing.  What it could have done, though, is change
> CLUSTER's behavior on this empty table from use-an-index to use-a-seqscan,
> thus causing us to follow the buggy code path where before we didn't.

On closer inspection, I believe the true culprit is c6b92041d,
which did this:

 */
if (RelationNeedsWAL(state->rs_new_rel))
-   heap_sync(state->rs_new_rel);
+   smgrimmedsync(state->rs_new_rel->rd_smgr, MAIN_FORKNUM);
 
logical_end_heap_rewrite(state);

heap_sync was careful about opening rd_smgr, the new code not so much.
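A minimal sketch of a fix along those lines, mirroring what heap_sync()
used to do (whether FlushRelationBuffers() is also wanted here is a
separate question):

	if (RelationNeedsWAL(state->rs_new_rel))
	{
		/* rd_smgr can be closed by relcache inval; reopen it first */
		RelationOpenSmgr(state->rs_new_rel);
		smgrimmedsync(state->rs_new_rel->rd_smgr, MAIN_FORKNUM);
	}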

I read the rest of that commit and didn't see any other equivalent
bugs, but I might've missed something.

regards, tom lane




Re: Handling of opckeytype / CREATE OPERATOR CLASS (bug?)

2021-03-23 Thread Tomas Vondra



On 3/23/21 6:15 AM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 3/23/21 3:13 AM, Tom Lane wrote:
>>> Hm.  Both catalogs.sgml and pg_opclass.h say specifically that
>>> opckeytype should be zero if it's to be the same as the input
>>> column type.  I don't think just dropping the enforcement of that
>>> is the right answer.
> 
>> Yeah, that's possible. I was mostly just demonstrating the difference in
>> behavior. Maybe the right fix is to fix the catalog contents and then
>> tweak the AM code, or something.
> 
> Digging in our git history, the rule about zero opckeytype dates to
> 2001 (f933766ba), which precedes our invention of polymorphic types
> in 2003 (somewhere around 730840c9b).  So I'm pretty sure that that
> was a poor man's substitute for polymorphic opclasses, which we
> failed to clean up entirely after we got real polymorphic opclasses.
> 
> Now, I'd be in favor of cleaning that up and just using standard
> polymorphism rules throughout.  But (without having studied the code)
> it looks like the immediate issue is that something in the BRIN code is
> unfamiliar with the rule for zero opckeytype.  It might be a noticeably
> smaller patch to fix that than to get rid of the convention about zero.
> 

That's possible. I'm not familiar with how we deal with polymorphic
opclasses etc. but I tried to look for places dealing with opckeytype,
so that I can compare BRIN vs. the other AMs, but the only references
seem to be in amvalidate() functions.

So either the difference is not very obvious, or maybe the other AMs
don't trigger this for some reason. For example btree has a separate
opclass for cidr, so it does not have to use "inet_ops" as polymorphic.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: multi-install PostgresNode

2021-03-23 Thread Andrew Dunstan


On 1/13/21 7:25 AM, Daniel Gustafsson wrote:
>> On 17 Dec 2020, at 22:37, Andrew Dunstan  wrote:
>> I've been giving some thought to $subject. The initial impetus is the
>> promise I made to assist with testing of clients built with NSS against
>> servers built with openssl, and vice versa.
> Thanks for tackling!
>
>> My main question is: do we want something like this in the core code
>> (presumably in src/test/perl), or is it not of sufficiently general
>> interest?
> To be able to implement pg_upgrade tests as TAP tests seems like enough of a
> win to consider this for inclusion in core.
>

Daniel, did you have any further comments on this? If not, does anyone
object to my committing it?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add Nullif case for eval_const_expressions_mutator

2021-03-23 Thread Tom Lane
David Steele  writes:
> Peter, thoughts on the new patch in [1]?

I'm not Peter, but I have a complaint about this bit:

+   if (!has_nonconst_input)
+   return ece_evaluate_expr(expr);

That's not okay without a further check to see if the comparison function
used by the node is immutable.  Compare ScalarArrayOpExpr, for instance.
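For illustration, the guarded version might look roughly like this
(NullIfExpr shares OpExpr's layout, so opfuncid is available; a sketch,
not a finished patch):

	if (!has_nonconst_input &&
		ece_function_is_safe(expr->opfuncid, context))
		return ece_evaluate_expr(expr);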

regards, tom lane




Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-23, James Coleman wrote:

> On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao  
> wrote:

> > Therefore for now what we've not reached the consensus is what message
> > should be logged at PM_STARTUP. I'm thinking it's better to log
> > "the database system is starting up" in that case because of the reasons
> > that I explained upthread.

> I understand your point now, and I agree, that makes sense.

Please note that PM_STARTUP mode is very very short-lived.  It only
starts happening when postmaster launches the startup process, and
before the startup process begins WAL replay (as changed by
sigusr1_handler in postmaster.c).  Once WAL replay begins, the PM status
changes to PM_RECOVERY.  So I don't think we really care all that much
what message is logged in this case.  It changes very quickly into the
CAC_NOTCONSISTENT message anyway.  For this state, it seems okay with
either what James submitted in v7, or what Fujii said.

However, for this one

+   case CAC_NOTCONSISTENT:
+   if (EnableHotStandby)
+   ereport(FATAL,
+   (errcode(ERRCODE_CANNOT_CONNECT_NOW),
+errmsg("the database system is not accepting connections"),
+errdetail("Consistent recovery state has not been yet reached.")));

Maybe it makes sense to say "... is not accepting connections *yet*".
That'd be a tad redundant with what the DETAIL says, but that seems
acceptable.

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: Change default of checkpoint_completion_target

2021-03-23 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Mon, Mar 22, 2021 at 01:11:00PM -0400, Stephen Frost wrote:
> > Unless there's anything further on this, I'll plan to commit it tomorrow
> > or Wednesday.
> 
> Cool, looks fine to me.
> 
> This version of the patch has forgotten to update one spot:
> src/backend/postmaster/checkpointer.c:double CheckPointCompletionTarget = 0.5;

Hah!  Indeed!

Fixed in the attached.

Thanks!

Stephen
From 1c69cca6fc9bbd921f873cb208ffcdbd68bde586 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change checkpoint_completion_target default to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml  | 12 ++--
 doc/src/sgml/wal.sgml | 29 ---
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl|  1 -
 6 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..44763f0180 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3302,9 +3302,15 @@ include_dir 'conf.d'
   

 Specifies the target of checkpoint completion, as a fraction of
-total time between checkpoints. The default is 0.5.
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
+total time between checkpoints. The default is 0.9, which spreads the
+checkpoint across almost all of the available interval, providing fairly
+consistent I/O load while also leaving some time for checkpoint
+completion overhead.  Reducing this parameter is not recommended as that
+causes the I/O from the checkpoint to have to complete faster, resulting
+in a higher I/O rate, while then having a period of less I/O between the
+completion of the checkpoint and the start of the next scheduled
+checkpoint.  This parameter can only be set in the
+postgresql.conf file or on the server command line.

   
  
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..4354051c7b 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
writing dirty buffers during a checkpoint is spread over a period of time.
That period is controlled by
, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   checkpoint_timeout).
The I/O rate is adjusted so that the checkpoint finishes when the
given fraction of
checkpoint_timeout seconds have elapsed, or before
max_wal_size is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
PostgreSQL can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase checkpoint_completion_target
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   checkpoint_completion_target can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible to have the I/O load be
+   consistent during the checkpoint.  The disadvantage of this is that prolonging
+   checkpoints affects recovery time, because more WAL segments will need to be kept
+   around for possible use in recovery.  A user concerned about the amount of time
+   required to recover might wish to reduce checkpoint_timeout,
+   causing checkpoints to happen more frequently while still spreading out

Re: Handling of opckeytype / CREATE OPERATOR CLASS (bug?)

2021-03-23 Thread Tom Lane
Tomas Vondra  writes:
> On 3/23/21 6:15 AM, Tom Lane wrote:
>> Digging in our git history, the rule about zero opckeytype dates to
>> 2001 (f933766ba), which precedes our invention of polymorphic types
>> in 2003 (somewhere around 730840c9b).  So I'm pretty sure that that
>> was a poor man's substitute for polymorphic opclasses, which we
>> failed to clean up entirely after we got real polymorphic opclasses.

> That's possible. I'm not familiar with how we deal with polymorphic
> opclasses etc. but I tried to look for places dealing with opckeytype,
> so that I can compare BRIN vs. the other AMs, but the only references
> seem to be in amvalidate() functions.
> So either the difference is not very obvious, or maybe the other AMs
> don't trigger this for some reason. For example btree has a separate
> opclass for cidr, so it does not have to use "inet_ops" as polymorphic.

I think the difference is that brin is trying to look up opclass members
based on the recorded type of the index's column (not the underlying
table column).  I don't recall that anyplace else does that.  btree
for instance does some lookups based on opcintype, but I don't think
it looks at the index column type anywhere.

After poking at it a bit more, the convention for zero does allow us
to do some things that regular polymorphism won't.  As an example:

test=# create table vc (id varchar(9) primary key);
CREATE TABLE
test=# \d+ vc_pkey
   Index "public.vc_pkey"
 Column | Type | Key? | Definition | Storage  | Stats target 
+--+--++--+--
 id | character varying(9) | yes  | id | extended | 
primary key, btree, for table "public.vc"

If btree text_ops had opckeytype = 'text' then this index column
would show as just "text", which while not fatal seems like a loss
of information.

So I'm coming around to the idea that opckeytype = opcintype and
opckeytype = 0 are valid but distinct situations, and CREATE OPCLASS
indeed ought not smash one to the other.  But we'd better poke around
at the documentation, pg_dump, etc and make sure everything plays
nice with that.

regards, tom lane




Re: [PATCH] Allow multiple recursive self-references

2021-03-23 Thread Pantelis Theodosiou
On Tue, Mar 23, 2021 at 1:03 PM Denis Hirn 
wrote:

>
> Hey everyone,
>
> As you know, Postgres currently supports SQL:1999 recursive common table
> expressions, using WITH RECURSIVE. However, Postgres does not allow more
> than
> one recursive self-reference in the recursive term. This restriction seems
> to be
> unnecessary.
>
> In this mail, I'd like to propose a patch that removes this restriction,
> and
> therefore allows the use of multiple self-references in the recursive term.
> After the patch:
>
> WITH RECURSIVE t(n) AS (
> VALUES(1)
>   UNION ALL
> SELECT t.n+f.n
> FROM t, t AS f
> WHERE t.n < 100
> ) SELECT * FROM t;
>
>   n
> -
>1
>2
>4
>8
>   16
>   32
>   64
>  128
> (8 rows)
>
> This feature deviates only slightly from the current WITH RECURSIVE, and
> requires very little changes (~10 loc). Any thoughts on this?
>
> --
> Denis Hirn
>

I am not at all sure what the standard says about such recursion but it
looks like the two t's are treated in your patch as the same incarnation of
the table, not as a cross join of two incarnations. The natural result I
would expect from this query would be all numbers from 1 to 198 (assuming
that the query is modified to restrict f.n and that UNION ALL is converted
to UNION to avoid infinite recursion).

I don't think any other DBMS has implemented this, except MariaDB. Tested
here:
https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=565c22771fdfc746e05808a7da7a205f

SET  @@standard_compliant_cte=0;
WITH RECURSIVE t(n) AS (
SELECT 1
  UNION -- ALL
SELECT t.n + f.n
FROM t, t AS f
WHERE t.n < 4 AND f.n < 4
) SELECT * FROM t;

Result:

> |  n |
> | -: |
> |  1 |
> |  2 |
> |  3 |
> |  4 |
> |  5 |
> |  6 |

Best regards
Pantelis Theodosiou


Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread Tom Lane
Alvaro Herrera  writes:
> However, for this one

> +   case CAC_NOTCONSISTENT:
> +   if (EnableHotStandby)
> +   ereport(FATAL,
> +   (errcode(ERRCODE_CANNOT_CONNECT_NOW),
> +errmsg("the database system is not accepting 
> connections"),
> +errdetail("Consistent recovery state has not been 
> yet reached.")));

> Maybe it makes sense to say "... is not accepting connections *yet*".

+1, but I think "... is not yet accepting connections" is slightly
better style.

regards, tom lane




Query about pg asynchronous processing support

2021-03-23 Thread ??????
Dear hacker:
    I am an undergraduate from Nanjing University, and I use the pgsql
source code for my own development. While processing each sql query in
function 'exec_simple_query', I am going to add some extra functions
such as index recommendation, which should be asynchronous with respect
to the sql query in order to preserve query processing performance. So
my question is: does pg have some mechanism supporting asynchronous
processing while handling each sql query?
    I would also be pleased to dig into any detailed blogs or docs you
can recommend. Looking forward to your reply!

    Yours, sincerely.

Re: making update/delete of inheritance trees scale better

2021-03-23 Thread Robert Haas
On Wed, Mar 3, 2021 at 9:39 AM Amit Langote  wrote:
> Just noticed that a test added by the recent 927f453a941 fails due to
> 0002.  We no longer allow moving a row into a postgres_fdw partition
> if it is among the UPDATE's result relations, whereas previously we
> would if the UPDATE on that partition is already finished.
>
> To fix, I've adjusted the test case.  Attached updated version.

I spent some time studying this patch this morning. As far as I can
see, 0001 is a relatively faithful implementation of the design Tom
proposed back in early 2019. I think it would be nice to either get
this committed or else decide that we don't want it and what we're
going to try to do instead, because we can't make UPDATEs and DELETEs
stop sucking with partitioned tables until we settle on some solution
to the problem that is inheritance_planner(), and that strikes me as
an *extremely* important problem. Lots of people are trying to use
partitioning in PostgreSQL, and they don't like finding out that, in
many cases, it makes things slower rather than faster. Neither this
nor any other patch is going to solve that problem in general, because
there are *lots* of things that haven't been well-optimized for
partitioning yet. But, this is a pretty important case that we should
really try to do something about.

So, that said, here are some random comments:

- I think it would be interesting to repeat your earlier benchmarks
using -Mprepared. One question I have is whether we're saving overhead
here during planning at the price of execution-time overhead, or
whether we're saving during both planning and execution.

- Until I went back and found your earlier remarks on this thread, I
was confused as to why you were replacing a JunkFilter with a
ProjectionInfo. I think it would be good to try to add some more
explicit comments about that design choice, perhaps as header comments
for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
not sure why we need to do the same thing for the insert case, which
seems to just be about removing junk columns. At least in the non-JIT
case, it seems to me that ExecJunkFilter() should be cheaper than
ExecProject(). Is it different enough to matter? Does the fact that we
can JIT the ExecProject() work make it actually faster? These are
things I don't know.

- There's a comment which you didn't write but just moved which I find
to be quite confusing. It says "For UPDATE/DELETE, find the
appropriate junk attr now. Typically, this will be a 'ctid' or
'wholerow' attribute, but in the case of a foreign data wrapper it
might be a set of junk attributes sufficient to identify the remote
row." But, the block of code which follows caters only to the 'ctid'
and 'wholerow' cases, not anything else. Perhaps that's explained by
the comment a bit further down which says "When there is a row-level
trigger, there should be a wholerow attribute," but if the point is
that this code is only reached when there's a row-level trigger,
that's far from obvious. It *looks* like something we'd reach for ANY
insert or UPDATE case. Maybe it's not your job to do anything about
this confusion, but I thought it was worth highlighting.

- The comment for filter_junk_tlist_entries(), needless to say, is of
the highest quality, but would it make any sense to think about having
an option for expand_tlist() to omit the junk entries itself, to avoid
extra work? I'm unclear whether we want that behavior in all UPDATE
cases or only some of them, because preproces_targetlist() has a call
to expand_tlist() to set parse->onConflict->onConflictSet that does
not call filter_junk_tlist_entries() on the result. Does this patch
need to make any changes to the handling of ON CONFLICT .. UPDATE? It
looks to me like right now it doesn't, but I don't know whether that's
an oversight or intentional.

- The output changes in the main regression test suite are pretty easy
to understand: we're just seeing columns that no longer need to get
fed through the execution get dropped. The changes in the postgres_fdw
results are harder to understand. In general, it appears that what's
happening is that we're no longer outputting the non-updated columns
individually -- which makes sense -- but now we're outputting a
whole-row var that wasn't there before, e.g.:

- Output: foreign_tbl.a, (foreign_tbl.b + 15), foreign_tbl.ctid
+ Output: (foreign_tbl.b + 15), foreign_tbl.ctid, foreign_tbl.*

Since this is postgres_fdw, we can re-fetch the row using CTID, so
it's not clear to me why we need foreign_tbl.* when we didn't before.
Maybe the comments need improvement.

- Specifically, I think the comments in preptlist.c need some work.
You've edited the top-of-file comment, but I don't think it's really
accurate. The first sentence says "For INSERT and UPDATE, the
targetlist must contain an entry for each attribute of the target
relation in the correct order," but I don't think that's really true
any more. It's certainly not what y

Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread James Coleman
On Tue, Mar 23, 2021 at 12:34 PM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > However, for this one
>
> > +   case CAC_NOTCONSISTENT:
> > +   if (EnableHotStandby)
> > +   ereport(FATAL,
> > +   (errcode(ERRCODE_CANNOT_CONNECT_NOW),
> > +errmsg("the database system is not accepting connections"),
> > +errdetail("Consistent recovery state has not been yet reached.")));
>
> > Maybe it makes sense to say "... is not accepting connections *yet*".
>
> +1, but I think "... is not yet accepting connections" is slightly
> better style.

All right, see attached v8.

James Coleman


v8-0001-Improve-standby-connection-denied-error-message.patch
Description: Binary data


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 12:12:03PM -0300, Álvaro Herrera wrote:
> On 2021-Mar-22, Bruce Momjian wrote:
> 
> > --- a/doc/src/sgml/ref/explain.sgml
> > +++ b/doc/src/sgml/ref/explain.sgml
> > @@ -136,8 +136,10 @@ ROLLBACK;
> >the output column list for each node in the plan tree, schema-qualify
> >table and function names, always label variables in expressions with
> >their range table alias, and always print the name of each trigger for
> > -  which statistics are displayed.  This parameter defaults to
> > -  FALSE.
> > +  which statistics are displayed.  The query identifier will also be
> > +  displayed if one has been compute, see <xref
> > +  linkend="guc-compute-query-id"/> for more details.  This parameter
> > +  defaults to FALSE.
> 
> Typo here, "has been computed".

Good catch, fixed.

> Is the intention to commit each of these patches separately?

No, I was thinking of just doing a single commit.  Should I do three
commits?  I posted it as three patches since that is how it was posted
by the author, and reviewing is easier.  It also will need a catversion
bump.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: new release pspg

2021-03-23 Thread Pavel Stehule
Hi

ne 21. 3. 2021 v 7:40 odesílatel Greg Stark  napsal:

> This is really cool.  Now I just need to figure out how to
> integrate it with using Emacs for my terminal. I still want to use
> emacs to enter and edit my queries but it would be cool to be able to hit
> a key and launch an xterm and send the query output to pspg
>

pspg 4.5.0 supports --querystream mode

run in one terminal: pspg --querystream -f ~/pipe --hold-stream=2 -h
localhost

and from any application that can write to the stream, you can send
queries separated by the GS char 0x1d, i.e. ^] on a separate line

[pavel@localhost src]$ cat /dev/tty > ~/pipe
select 1
^]
select * from pg_class limit
10
^]

Regards

Pavel


Re: proposal - psql - use pager for \watch command

2021-03-23 Thread Pavel Stehule
po 22. 3. 2021 v 13:13 odesílatel Thomas Munro 
napsal:

> On Mon, Mar 22, 2021 at 5:10 PM Pavel Stehule 
> wrote:
> > probably there will not be an issue inside ncurses - the most complex
> > part of get_event is polling of input sources - tty and some others.
> > pspg should not stop there on tty reading.
>
> The problem is that Apple's /dev/tty device is defective, and doesn't
> work in poll().  It always returns immediately with revents=POLLNVAL,
> but pspg assumes that data is ready and tries to read the keyboard and
> then blocks until I press a key.  This seems to fix it:
>
> +#ifndef __APPLE__
> +   /* macOS can't use poll() on /dev/tty */
> state.tty = fopen("/dev/tty", "r+");
> +#endif
> if (!state.tty)
> state.tty = fopen(ttyname(fileno(stdout)), "r");
>
> A minor problem is that on macOS, _GNU_SOURCE doesn't seem to imply
> NCURSES_WIDECHAR, so I suspect Unicode will be broken unless you
> manually add -DNCURSES_WIDECHAR=1, though I didn't check.
>

For record, this issue is fixed in pspg 4.5.0.

Regards

Pavel


Re: making update/delete of inheritance trees scale better

2021-03-23 Thread Tom Lane
Robert Haas  writes:
> I spent some time studying this patch this morning. As far as I can
> see, 0001 is a relatively faithful implementation of the design Tom
> proposed back in early 2019. I think it would be nice to either get
> this committed or else decide that we don't want it and what we're
> going to try to do instead,

Yeah, it's on my to-do list for this CF, but I expect it's going to
take some concentrated study and other things keep intruding :-(.

All of your comments/questions seem reasonable; thanks for taking
a look.

regards, tom lane




Re: Replication slot stats misgivings

2021-03-23 Thread Andres Freund
Hi,

On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:
> On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila  wrote:
> >
> > On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund  
> > > > wrote:
> > > > >
> > > > > - If max_replication_slots was lowered between a restart,
> > > > >   pgstat_read_statfile() will happily write beyond the end of
> > > > >   replSlotStats.
> > > >
> > > > I think we cannot restart the server after lowering
> > > > max_replication_slots to a value less than the number of replication
> > > > slots actually created on the server. No?
> > >
> > > This problem happens in the case where max_replication_slots is
> > > lowered and there still are stats for a slot.
> > >
> >
> > I think this can happen only if the drop message is lost, right?
> 
> Yes, I think you're right. In that case, the stats file could have
> more slots statistics than the lowered max_replication_slots.

Or if slots are deleted on the file-system while the cluster is
shutdown. Which obviously is at best a semi-supported thing, but it
normally does work.


> > > I understood the risk of running out of replSlotStats. If we use the
> > > index in replSlotStats instead, IIUC we need to somehow synchronize
> > > the indexes in between replSlotStats and
> > > ReplicationSlotCtl->replication_slots. The order of replSlotStats is
> > > preserved across restarting whereas the order of
> > > ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
> > > StartupReplicationSlots() doesn’t guarantee the order of the returneda
> > > entries in the directory).

Very good point. Even if readdir() order were fixed, we'd still have the
problem because there can be "gaps" in the indexes for slots
(e.g. create slot_a, create slot_b, create slot_c, drop slot_b, leaving
you with index 0 and 2 used, and 1 unused).


> > > Maybe we can compare the slot name in the
> > > received message to the name in the element of replSlotStats. If they
> > > don’t match, we swap entries in replSlotStats to synchronize the index
> > > of the replication slot in ReplicationSlotCtl->replication_slots and
> > > replSlotStats. If we cannot find the entry in replSlotStats that has
> > > the name in the received message, it probably means either it's a new
> > > slot or the previous create message is dropped, we can create the new
> > > stats for the slot. Is that what you mean, Andres?

That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?
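E.g., something like this at the end of slot restoration (the message
function is entirely hypothetical):

	/* tell the collector this slot survived the restart, by name */
	pgstat_report_replslot_restore(NameStr(slot->data.name));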

Greetings,

Andres Freund




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Jan Wieck

On 3/23/21 10:56 AM, Bruce Momjian wrote:
> On Tue, Mar 23, 2021 at 08:51:32AM -0400, Jan Wieck wrote:
>> On 3/22/21 7:18 PM, Jan Wieck wrote:
>>> On 3/22/21 5:36 PM, Zhihong Yu wrote:
>>>> Hi,
>>>>
>>>> w.r.t. pg_upgrade_improvements.v2.diff.
>>>>
>>>> +       blobBatchCount = 0;
>>>> +       blobInXact = false;
>>>>
>>>> The count and bool flag are always reset in tandem. It seems
>>>> variable blobInXact is not needed.
>>>
>>> You are right. I will fix that.
>>
>> New patch v3 attached.
>
> Would it be better to allow pg_upgrade to pass arbitrary arguments to
> pg_restore, instead of just these specific ones?

That would mean arbitrary parameters to pg_dump as well as pg_restore.
But yes, that would probably be better in the long run.

Any suggestions as to how that would actually look? Unfortunately
pg_restore has -[dDoOr] already used, so it doesn't look like there will
be any naturally intelligible short options for that.
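
For example (option names invented here just to illustrate the shape):

    pg_upgrade ... --dump-option='--some-pg_dump-flag' \
                   --restore-option='--some-pg_restore-flag'

where each occurrence would be accumulated and appended verbatim to the
respective command line.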



Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: WIP: BRIN multi-range indexes

2021-03-23 Thread Tomas Vondra


On 3/23/21 2:36 PM, Alvaro Herrera wrote:
> On 2021-Mar-22, Tomas Vondra wrote:
> 
>> I don't know what's the right fix, but it seems like this patch has
>> nothing to do with it. If we want to move the opclasses into an
>> extension, we can comment out that one (cidr/inet) case for now.
> 
> I don't know what would be a good reason to define the opclasses in
> separate contrib extensions.  I think it's going to be a nuisance to
> users, so unless there is some strong argument for it, I'd suggest not
> to do it.  I found it being discussed here:
> https://www.postgresql.org/message-id/CA%2BTgmoajaQKBUx%3DvaTUFo6z80dsRzBw__Nu41Q4t06baZep3Ug%40mail.gmail.com
> but there weren't any strong arguments put forward.
> 

OK, thanks for the opinion. Yeah, you're right there were no strong
opinions either way, and I see both pros/cons for the contrib option.
Defining the opclasses using SQL is way more convenient and less
error-prone than doing that directly in .dat files, for example. But
there are some limitations too, so not sure it's worth it.

> It seems a good experiment to have done it, though, since we now know
> that there is a limitation in the existing SQL interface.  Maybe the fix
> to that problem is to add a new clause to CREATE/ALTER OPERATOR CLASS to
> let you define what goes into opckeytype.  However I don't think it's
> this patch's responsibility to fix that problem.
> 

Yeah, it was a good experiment. I think we still need to figure out what
to do about the opckeytype - what needs to be fixed, exactly.

FWIW there's yet another difference between the current BRIN opclass
definition, compared to what CREATE OPERATOR CLASS would do. Or more
precisely, how we'd define opfamily for the cross-type cases (integer,
float and timestamp cases).

AFAICS we don't really need pg_amproc entries for the cross-type cases,
we just need the operators, so pg_amproc entries like

{ amprocfamily => 'brin/datetime_minmax_ops', amproclefttype =>
'timestamptz',
  amprocrighttype => 'timestamp', amprocnum => '1',
  amproc => 'brin_minmax_opcinfo' },

are unnecessary. The attached patch cleans that up, without breaking any
regression tests. Or is there a reason why we need those?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/include/catalog/pg_amproc.dat 
b/src/include/catalog/pg_amproc.dat
index 9192a66a88..50082a1801 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -843,28 +843,7 @@
   amproc => 'brin_minmax_consistent' },
 { amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
   amprocrighttype => 'int8', amprocnum => '4', amproc => 'brin_minmax_union' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int2', amprocnum => '1',
-  amproc => 'brin_minmax_opcinfo' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int2', amprocnum => '2',
-  amproc => 'brin_minmax_add_value' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int2', amprocnum => '3',
-  amproc => 'brin_minmax_consistent' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int2', amprocnum => '4', amproc => 'brin_minmax_union' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int4', amprocnum => '1',
-  amproc => 'brin_minmax_opcinfo' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int4', amprocnum => '2',
-  amproc => 'brin_minmax_add_value' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int4', amprocnum => '3',
-  amproc => 'brin_minmax_consistent' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int8',
-  amprocrighttype => 'int4', amprocnum => '4', amproc => 'brin_minmax_union' },
+
 { amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int2',
   amprocrighttype => 'int2', amprocnum => '1',
   amproc => 'brin_minmax_opcinfo' },
@@ -876,28 +855,7 @@
   amproc => 'brin_minmax_consistent' },
 { amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int2',
   amprocrighttype => 'int2', amprocnum => '4', amproc => 'brin_minmax_union' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int2',
-  amprocrighttype => 'int8', amprocnum => '1',
-  amproc => 'brin_minmax_opcinfo' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int2',
-  amprocrighttype => 'int8', amprocnum => '2',
-  amproc => 'brin_minmax_add_value' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int2',
-  amprocrighttype => 'int8', amprocnum => '3',
-  amproc => 'brin_minmax_consistent' },
-{ amprocfamily => 'brin/integer_minmax_ops', amproclefttype => 'int2',
-  amprocrighttype => 'int8', amprocnum => '4', amproc => 'brin_minmax_union' },
-{ 

Re: Disable WAL logging to speed up data loading

2021-03-23 Thread Stephen Frost
Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> From: Stephen Frost 
> > First- what are you expecting would actually happen during crash recovery in
> > this specific case with your proposed new WAL level?
> ...
> > I'm not suggesting it's somehow more crash safe- but it's at least very 
> > clear
> > what happens in such a case, to wit: the entire table is cleared on crash
> > recovery.
> 
> As Laurenz-san kindly replied, the database server refuses to start with a 
> clear message.  So, it's similarly very clear what happens.  The user will 
> never unknowingly resume operation with possibly corrupt data.

No, instead they'll just have a completely broken system that has to be
rebuilt or restored from a backup.  That doesn't strike me as a good
result.

> > We're talking about two different ways to accomplish essentially the same
> > thing- one which introduces a new WAL level, vs. one which adds an
> > optimization for a WAL level we already have.  That the second is more 
> > elegant
> > is more-or-less entirely the point I'm making here, so it seems pretty 
> > relevant.
> 
> So, I understood the point boils down to elegance.  Could I ask what makes 
> you feel ALTER TABLE UNLOGGED/LOGGED is (more) elegant?  I'm purely asking as 
> a user.

The impact is localized to those specific tables.  The rest of the
system should come up cleanly and there won't be corruption, instead
merely the lack of data in UNLOGGED tables.

> (I don't want to digress, but if we consider the number of options for 
> wal_level as an issue, I feel it's not elegant to have separate "replica" and 
> "logical".)

Do you know of a way to avoid having those two distinct levels while
still writing only the WAL needed depending on if a system is doing
logical replication or not..?  If you've got suggestions on how to
eliminate one of those levels, I'm sure there would be interest in doing
so.  I don't see the fact that we have those two levels as justification
for adding another spelling of 'minimal'.

> > Under the proposed 'none', you basically have to throw out the entire 
> > cluster on
> > a crash, all because you don't want to use 'UNLOGGED' when you created the
> > tables you want to load data into, or 'TRUNCATE' them in the transaction 
> > where
> > you start the data load, either of which gives us enough indication and 
> > which
> > we have infrastructure around dealing with in the event of a crash during 
> > the
> > load without everything else having to be tossed and everything restored 
> > from a
> > backup.  That's both a better user experience from the perspective of having
> > fewer WAL levels to understand and from just a general administration
> > perspective so you don't have to go all the way back to a backup to bring 
> > the
> > system back up.
> 
> The elegance of wal_level = none is that the user doesn't have to remember to 
> add ALTER TABLE to the data loading job when they add load target 
> tables/partitions.  If they build and use their own (shell) scripts to load 
> data, that won't be a burden or forgotten.  But what would they have to do when 
> they use ETL tools like Talend, Pentaho, and Informatica Power Center?  Do 
> those tools allow users to add custom processing like ALTER TABLE to the data 
> loading job steps for each table?  (AFAIK, no.)

I don't buy the argument that having to 'remember' to do an ALTER TABLE
is such a burden when it means that the database will still be
consistent and operational after a crash.

As for data loading tools, surely they support loading data into
UNLOGGED tables and it's certainly not hard to have a script run around
and flip those tables to LOGGED after they're loaded, and I do actually
believe some of those tools support building processes of which one step
could be such a command (I'm fairly confident Pentaho, in particular,
does as I remember building such pipelines myself...).
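To spell the suggested pattern out, a minimal sketch (the table and file
names here are hypothetical):

CREATE UNLOGGED TABLE sales_staging (id int, amount numeric);
COPY sales_staging FROM '/tmp/sales.csv' WITH (FORMAT csv);
ALTER TABLE sales_staging SET LOGGED;

If the server crashes before the SET LOGGED step, the staging table simply
comes up empty after recovery and the rest of the cluster is unaffected.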

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 01:25:15PM -0400, Jan Wieck wrote:
> On 3/23/21 10:56 AM, Bruce Momjian wrote:
> > Would it be better to allow pg_upgrade to pass arbitrary arguments to
> > pg_restore, instead of just these specific ones?
> > 
> 
> That would mean arbitrary parameters to pg_dump as well as pg_restore. But
> yes, that would probably be better in the long run.
> 
> Any suggestion as to how that would actually look like? Unfortunately
> pg_restore has -[dDoOr] already used, so it doesn't look like there will be
> any naturally intelligible short options for that.

We have the postmaster which can pass arbitrary arguments to postgres
processes using -o.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [PATCH] Allow multiple recursive self-references

2021-03-23 Thread Denis Hirn
Hey Pantelis,

> I am not at all sure what the standard says about such recursion [...]

as far as I know, the standard does not constrain the number of self-references
of recursive common table expressions. However, I could be wrong here.

> [...] but it looks like the two t's are treated in your patch as the same 
> incarnation of the table, not as a cross join of two incarnations.


That's right and – as far as I'm concerned – it's expected behaviour. The patch 
only allows the recursive
union operator's working table to be read more than once. All self-references 
read exactly the same rows
in each iteration. You could basically accomplish the same thing with another 
CTE like this:

WITH RECURSIVE t(n) AS (
VALUES(1)
  UNION ALL
(WITH wt AS (SELECT * FROM t)
SELECT wt.n+f.n
FROM wt, wt AS f
WHERE wt.n < 100)
) SELECT * FROM t;

But honestly, this feels more like a hack than a solution to me. The entire 
working table is
materialized by the (non recursive) common table expression wt, effectively 
doubling the
memory consumption of the query. This patch eliminates this intermediate 
materialization.
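With the patch applied, the same query can instead be written directly; a
sketch of the intended form:

WITH RECURSIVE t(n) AS (
    VALUES(1)
  UNION ALL
    SELECT t.n+f.n
    FROM t, t AS f
    WHERE t.n < 100
) SELECT * FROM t;

Both references then read the recursive union operator's working table in
place, with no extra materialization step.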

> I don't think any other DBMS has implemented this, except MariaDB. Tested 
> here:

There are a few recent DBMSs that I know of that support this: HyPer, Umbra, 
DuckDB, and NoisePage.
I'm sure there are some more examples. Still, you are right, many other DBMSs 
do not support this – yet.

--
Denis Hirn



Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Jan Wieck

On 3/23/21 2:06 PM, Bruce Momjian wrote:

We have the postmaster which can pass arbitrary arguments to postgres
processes using -o.


Right, and -o is already taken in pg_upgrade for sending options to the 
old postmaster.


What we are looking for are options for sending options to pg_dump and 
pg_restore, which are not postmasters or children of postmaster, but 
rather clients. There is no option to send options to clients of 
postmasters.


So the question remains, how do we name this?

--pg-dump-options "<string>"
--pg-restore-options "<string>"

where "<string>" could be something like "--whatever[=NUM] [...]". That
would be something unambiguous.



Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 02:23:03PM -0400, Jan Wieck wrote:
> On 3/23/21 2:06 PM, Bruce Momjian wrote:
> > We have the postmaster which can pass arbitrary arguments to postgres
> > processes using -o.
> 
> Right, and -o is already taken in pg_upgrade for sending options to the old
> postmaster.
> 
> What we are looking for are options for sending options to pg_dump and
> pg_restore, which are not postmasters or children of postmaster, but rather
> clients. There is no option to send options to clients of postmasters.
> 
> So the question remains, how do we name this?
> 
> --pg-dump-options "<string>"
> --pg-restore-options "<string>"
> 
> where "<string>" could be something like "--whatever[=NUM] [...]". That
> would be something unambiguous.

Sure.  I don't think the letter you use is a problem.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Change default of checkpoint_completion_target

2021-03-23 Thread Bossart, Nathan
LGTM.  I just have a few small wording suggestions.

+completion overhead.  Reducing this parameter is not recommended as 
that
+causes the I/O from the checkpoint to have to complete faster, 
resulting
+in a higher I/O rate, while then having a period of less I/O between 
the
+completion of the checkpoint and the start of the next scheduled
+checkpoint.  This parameter can only be set in the

Reducing this parameter is not recommended because it forces the
checkpoint to complete faster.  This results in a higher rate of I/O
during the checkpoint followed by a period of less I/O between
checkpoint completion and the next scheduled checkpoint.

+   duration).  This spreads out the I/O as much as possible to have the I/O 
load be
+   consistent during the checkpoint.  The disadvantage of this is that 
prolonging

This spreads out the I/O as much as possible so that the checkpoint
I/O load is consistent throughout the checkpoint interval.

+   around for possible use in recovery.  A user concerned about the amount of 
time
+   required to recover might wish to reduce 
checkpoint_timeout,
+   causing checkpoints to happen more frequently while still spreading out the 
I/O
+   from each checkpoint.  Alternatively,

A user concerned about the amount of time required to recover might
wish to reduce checkpoint_timeout so that checkpoints occur more
frequently but still spread the I/O across the checkpoint interval.

+   Although checkpoint_completion_target could be set as 
high as
+   1.0, it is best to keep it less than that (such as at the default of 0.9, 
at most)
+   since checkpoints include some other activities besides writing dirty 
buffers.

Although checkpoint_completion_target can be set as high as 1.0, it is
typically recommended to set it to no higher than 0.9 (the default)
since checkpoints include some other activities besides writing dirty
buffers.
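A side note for anyone experimenting with this setting: it is reloadable,
so a sketch like the following (assuming superuser) takes effect without
a restart:

ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();
SHOW checkpoint_completion_target;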

Nathan



Re: WIP: BRIN multi-range indexes

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-23, Tomas Vondra wrote:

> FWIW there's yet another difference between the current BRIN opclass
> definition, compared to what CREATE OPERATOR CLASS would do. Or more
> precisely, how we'd define opfamily for the cross-type cases (integer,
> float and timestamp cases).
> 
> AFAICS we don't really need pg_amproc entries for the cross-type cases,
> we just need the operators, so pg_amproc entries like
> 
> { amprocfamily => 'brin/datetime_minmax_ops', amproclefttype =>
> 'timestamptz',
>   amprocrighttype => 'timestamp', amprocnum => '1',
>   amproc => 'brin_minmax_opcinfo' },
> 
> are unnecessary. The attached patch cleans that up, without breaking any
> regression tests. Or is there a reason why we need those?

... ooh ...

When you say "just the operators" you mean the pg_amop entries, right?

I think I agree -- cross-type amproc entries are unlikely to have any
use.

-- 
Álvaro Herrera   Valdivia, Chile




Re: Change default of checkpoint_completion_target

2021-03-23 Thread Bruce Momjian
On Tue, Mar 23, 2021 at 06:24:07PM +, Bossart, Nathan wrote:
> LGTM.  I just have a few small wording suggestions.
> 
> +completion overhead.  Reducing this parameter is not recommended as 
> that
> +causes the I/O from the checkpoint to have to complete faster, 
> resulting
> +in a higher I/O rate, while then having a period of less I/O between 
> the
> +completion of the checkpoint and the start of the next scheduled
> +checkpoint.  This parameter can only be set in the
> 
> Reducing this parameter is not recommended because it forces the
> checkpoint to complete faster.  This results in a higher rate of I/O
> during the checkpoint followed by a period of less I/O between
> checkpoint completion and the next scheduled checkpoint.

FYI, I am very happy this issue is being addressed for PG 14.  :-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Tom Lane
Jan Wieck  writes:
> So the question remains, how do we name this?

>  --pg-dump-options "<string>"
>  --pg-restore-options "<string>"

If you're passing multiple options, that is

--pg-dump-options "--foo=x --bar=y"

it seems just horribly fragile.  Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option.  That's pretty easy to do when passing things
through shell scripts, too.  So it'd likely be safer to write

--pg-dump-option=--foo=x --pg-dump-option=--bar=y

which requires pg_upgrade to allow aggregating multiple options,
but you'd probably want it to act that way anyway.

regards, tom lane




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-23 Thread Andres Freund
Hi,

On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
> This fact makes me wonder that if we collect the statistics about WAL writing
> from walreceiver as we discussed in other thread, the stats collector should
> be invoked at more earlier stage. IIUC walreceiver can be invoked before
> PMSIGNAL_BEGIN_HOT_STANDBY is sent.

FWIW, in the shared memory stats patch the stats subsystem is
initialized early on by the startup process.

Greetings,

Andres Freund




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Jan Wieck

On 3/23/21 2:35 PM, Tom Lane wrote:

Jan Wieck  writes:

So the question remains, how do we name this?



 --pg-dump-options "<string>"
 --pg-restore-options "<string>"


If you're passing multiple options, that is

--pg-dump-options "--foo=x --bar=y"

it seems just horribly fragile.  Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option.  That's pretty easy to do when passing things
through shell scripts, too.  So it'd likely be safer to write

--pg-dump-option=--foo=x --pg-dump-option=--bar=y

which requires pg_upgrade to allow aggregating multiple options,
but you'd probably want it to act that way anyway.


... which would all be really easy if pg_upgrade weren't assembling 
a shell script string to pass into parallel_exec_prog() by itself.


But I will see what I can do ...


Regards, Jan


--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Tom Lane
Jan Wieck  writes:
> On 3/23/21 2:35 PM, Tom Lane wrote:
>> If you're passing multiple options, that is
>> --pg-dump-options "--foo=x --bar=y"
>> it seems just horribly fragile.  Lose the double quotes and suddenly
>> --bar is a separate option to pg_upgrade itself, not part of the argument
>> for the previous option.  That's pretty easy to do when passing things
>> through shell scripts, too.

> ... which would all be really easy if pg_upgrade weren't assembling 
> a shell script string to pass into parallel_exec_prog() by itself.

No, what I was worried about is shell script(s) that invoke pg_upgrade
and have to pass down some of these options through multiple levels of
option parsing.

BTW, it doesn't seem like the "pg-" prefix has any value-add here,
so maybe "--dump-option" and "--restore-option" would be suitable
spellings.

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2021-03-23 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> > On 22 Mar 2021, at 00:49, Stephen Frost  wrote:
> 
> Thanks for the review!  Below is a partial response, I haven't had time to
> address all your review comments yet but I wanted to submit a rebased patchset
> directly since the current version doesn't work after recent changes in the
> tree. I will address the remaining comments tomorrow or the day after.

Great, thanks!

> This rebase also includes a fix for pgtls_init which was sent offlist by 
> Jacob.
> The changes in pgtls_init can potentially be used to initialize the crypto
> context for NSS to clean up this patch, Jacob is currently looking at that.

Ah, cool, sounds good.

> > They aren't the same and it might not be
> > clear what's going on if one was to somehow mix them (at least if pr_fd
> > continues to sometimes be a void*, but I wonder why that's being
> > done..?  more on that later..).
> 
> To paraphrase from a later in this email, there are collisions between nspr 
> and
> postgres on things like BITS_PER_BYTE, and there were also collisions on basic
> types until I learned about NO_NSPR_10_SUPPORT.  By moving the juggling of 
> this
> into common/nss.h we can use proper types without introducing that pollution
> everywhere. I will address these places.

Ah, ok, and great, that sounds good.

> >> +++ b/src/backend/libpq/be-secure-nss.c
> > [...]
> >> +/* default init hook can be overridden by a shared library */
> >> +static void default_nss_tls_init(bool isServerStart);
> >> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init;
> > 
> >> +static PRDescIdentity pr_id;
> >> +
> >> +static PRIOMethods pr_iomethods;
> > 
> > Happy to be told I'm missing something, but the above two variables seem
> > to only be used in init_iolayer.. is there a reason they're declared
> > here instead of just being declared in that function?
> 
> They must be there since NSPR doesn't copy these but reference them.

Ah, ok, interesting.

> >> +  /*
> >> +   * Set the fallback versions for the TLS protocol version range to a
> >> +   * combination of our minimal requirement and the library maximum. Error
> >> +   * messages should be kept identical to those in be-secure-openssl.c to
> >> +   * make translations easier.
> >> +   */
> > 
> > Should we pull these error messages out into another header so that
> > they're in one place to make sure they're kept consistent, if we really
> > want to put the effort in to keep them the same..?  I'm not 100% sure
> > that it's actually necessary to do so, but defining these in one place
> > would help maintain this if we want to.  Also alright with just keeping
> > the comment, not that big of a deal.
> 
> It might make sense to pull them into common/nss.h, but seeing the error
> message right there when reading the code does IMO make it clearer so it's a
double-edged sword.  Not sure what is the best option, but I'm not married to
> the current solution so if there is consensus to pull them out somewhere I'm
> happy to do so.

My thought was to put them into some common/ssl.h or something along
those lines but I don't see it as a big deal either way really.  You
make a good point that having the error message there when reading the
code is nice.

> > Maybe we should put a stake in the ground that says "we only support
> > back to version X of NSS", test with that and a few more recent versions
> > and the most recent, and then rip out anything that's needed for
> > versions which are older than that?  
> 
> Yes, right now there is very little in the patch which caters for old 
> versions,
> the PR_Init call might be one of the few offenders.  There has been discussion
> upthread about settling for a required version, combining the insights learned
> there with a survey of which versions are commonly available packaged.
> 
> Once we settle on a version we can confirm if PR_Init is/isn't needed and
> remove all traces of it if not.

I don't really see this as all that hard to do- I'd suggest we look at
what systems someone might reasonably deploy v14 on.  To that end, I'd
say "only systems which are presently supported", so: RHEL7+, Debian 9+,
Ubuntu 16.04+.  Looking at those, I see:

Ubuntu 16.04: 3.28.4
RHEL7: v3.28.4
Debian 9: 3.26.2

> > I have a pretty hard time imagining that someone is going to want to build 
> > PG
> > v14 w/ NSS 2.0 ...
> 
> Let alone compiling 2.0 at all on a recent system..

Indeed, and given the above, it seems entirely reasonable to make the
requirement be NSS v3+, no?  I wouldn't be against making that even
tighter if we thought it made sense to do so.

> > Also- we don't seem to complain at all about a cipher being specified that 
> > we
> > don't find?  Guess I would think that we might want to throw a WARNING in 
> > such
> > a case, but I could possibly be convinced otherwise.
> 
> No, I think you're right, we should throw WARNING there or possibly even a
> higher elevel. Should that be a COMMERROR even?

I suppo

Re: pg_amcheck contrib application

2021-03-23 Thread Robert Haas
On Thu, Mar 18, 2021 at 12:12 AM Tom Lane  wrote:
> Mark Dilger  writes:
> >> On Mar 16, 2021, at 12:52 PM, Robert Haas  wrote:
> >> Since we now know that shutting autovacuum off makes the problem go
> >> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> >> instead, if, as presently seems to be the case, that's where the
> >> problem is.
>
> > If you get unlucky, autovacuum will process one of the tables that the test 
> > intentionally corrupted, with bad consequences, ultimately causing build 
> > farm intermittent test failures.
>
> Um, yeah, the test had better shut off autovacuum on any table that
> it intentionally corrupts.

Right, good point. But... does that really apply to
005_opclass_damage.pl? I feel like with the kind of physical damage
we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
from crashing headlong into that table. But, 005 is doing "logical"
damage rather than "physical" damage, and I don't see why autovacuum
should misbehave in that kind of case. In fact, the fact that
autovacuum can handle such cases is one of the selling points for the
whole design of vacuum, as opposed to, for example, retail index
lookups.
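To make the "logical" damage idea concrete, here is a hypothetical sketch
(not the actual test code) of breaking an index's sort order without
touching any page: build an index with a custom opclass, then redefine its
comparison support function out from under it.

-- hypothetical names; CREATE OR REPLACE keeps the function's OID, so the
-- opclass silently picks up the inverted behavior
CREATE FUNCTION fickle_int4cmp(int4, int4) RETURNS int
    LANGUAGE sql IMMUTABLE AS 'SELECT int4cmp($1, $2)';
CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
    OPERATOR 1 <, OPERATOR 2 <=, OPERATOR 3 =,
    OPERATOR 4 >=, OPERATOR 5 >,
    FUNCTION 1 fickle_int4cmp(int4, int4);
CREATE TABLE fickle (i int4);
INSERT INTO fickle SELECT generate_series(1, 1000);
CREATE INDEX fickle_idx ON fickle (i int4_fickle_ops);
CREATE OR REPLACE FUNCTION fickle_int4cmp(int4, int4) RETURNS int
    LANGUAGE sql IMMUTABLE AS 'SELECT int4cmp($2, $1)';

Every page remains physically valid; only the ordering contract is broken.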

Pending resolution of that question, I've committed the change to
disable autovacuum in 003, and also Mark's changes to have it also run
pg_amcheck BEFORE corrupting anything, so the post-corruption tests
fail - say by finding the wrong kind of corruption - we can see
whether it was also failing before the corruption was even introduced.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Change default of checkpoint_completion_target

2021-03-23 Thread Stephen Frost
Greetings,

* Bossart, Nathan (bossa...@amazon.com) wrote:
> LGTM.  I just have a few small wording suggestions.

Agreed, those looked like good suggestions and so I've incorporated
them.

Updated patch attached.

Thanks!

Stephen
From 40a529bc0a129e90c9917c1a3df2297ac7f2e073 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 19 Jan 2021 13:53:34 -0500
Subject: [PATCH] Change checkpoint_completion_target default to 0.9

Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
---
 doc/src/sgml/config.sgml  | 12 ++--
 doc/src/sgml/wal.sgml | 29 ---
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/test/recovery/t/015_promotion_pages.pl|  1 -
 6 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..0d101f65f6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3302,9 +3302,15 @@ include_dir 'conf.d'
   

 Specifies the target of checkpoint completion, as a fraction of
-total time between checkpoints. The default is 0.5.
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
+total time between checkpoints. The default is 0.9, which spreads the
+checkpoint across almost all of the available interval, providing fairly
+consistent I/O load while also leaving some time for checkpoint
+completion overhead.  Reducing this parameter is not recommended because
+it causes the checkpoint to complete faster.  This results in a higher
+rate of I/O during the checkpoint followed by a period of less I/O between
+the checkpoint completion and the next scheduled checkpoint.  This
+parameter can only be set in the postgresql.conf file
+or on the server command line.

   
  
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index ae4a3c1293..7d48f42710 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -571,22 +571,29 @@
writing dirty buffers during a checkpoint is spread over a period of time.
That period is controlled by
, which is
-   given as a fraction of the checkpoint interval.
+   given as a fraction of the checkpoint interval (configured by using
+   checkpoint_timeout).
The I/O rate is adjusted so that the checkpoint finishes when the
given fraction of
checkpoint_timeout seconds have elapsed, or before
max_wal_size is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   With the default value of 0.9,
PostgreSQL can be expected to complete each checkpoint
-   in about half the time before the next checkpoint starts.  On a system
-   that's very close to maximum I/O throughput during normal operation,
-   you might want to increase checkpoint_completion_target
-   to reduce the I/O load from checkpoints.  The disadvantage of this is that
-   prolonging checkpoints affects recovery time, because more WAL segments
-   will need to be kept around for possible use in recovery.  Although
-   checkpoint_completion_target can be set as high as 1.0,
-   it is best to keep it less than that (perhaps 0.9 at most) since
-   checkpoints include some other activities besides writing dirty buffers.
+   a bit before the next scheduled checkpoint (at around 90% of the last checkpoint's
+   duration).  This spreads out the I/O as much as possible so that the checkpoint
+   I/O load is consistent throughout the checkpoint interval.  The disadvantage of
+   this is that prolonging checkpoints affects recovery time, because more WAL
+   segments will need to be kept around for possible use in recovery.  A user
+   concerned about the amount of time required to recover might wish to reduce
+   checkpoint_timeout so that checkpoints occur more frequently
+   but still spread the I/O across the checkpoint interval.  Alternatively,
+   checkpoint_completion_target could be reduced, but this would
+   result in times of more intense I/O (during the checkpoint) and times of less I

Re: pg_amcheck contrib application

2021-03-23 Thread Mark Dilger



> On Mar 23, 2021, at 12:05 PM, Robert Haas  wrote:
> 
> 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

That is a good point.  Checking that autovacuum behaves sensibly despite sort 
order breakage sounds reasonable, but test 005 doesn't do that reliably, 
because it does nothing to make sure that autovacuum runs against the affected 
table during the short window when the affected table exists.  All the same, I 
don't see that turning autovacuum off is required.  If autovacuum is broken in 
this regard, we may get occasional, hard to reproduce build farm failures, but 
that would be more informative than no failures at all.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Jan Wieck

On 3/23/21 2:59 PM, Tom Lane wrote:

Jan Wieck  writes:

On 3/23/21 2:35 PM, Tom Lane wrote:

If you're passing multiple options, that is
--pg-dump-options "--foo=x --bar=y"
it seems just horribly fragile.  Lose the double quotes and suddenly
--bar is a separate option to pg_upgrade itself, not part of the argument
for the previous option.  That's pretty easy to do when passing things
through shell scripts, too.


... which would all be really easy if pg_upgrade weren't assembling 
a shell script string to pass into parallel_exec_prog() by itself.


No, what I was worried about is shell script(s) that invoke pg_upgrade
and have to pass down some of these options through multiple levels of
option parsing.


The problem here is that pg_upgrade itself is invoking a shell again. It 
is not assembling an array of arguments to pass into exec*(). I'd be a 
happy camper if it did the latter. But as things are we'd have to add 
full shell escaping for arbitrary strings.




BTW, it doesn't seem like the "pg-" prefix has any value-add here,
so maybe "--dump-option" and "--restore-option" would be suitable
spellings.


Agreed.


Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: [PATCH] Allow multiple recursive self-references

2021-03-23 Thread Tom Lane
Denis Hirn  writes:
>> I am not at all sure what the standard says about such recursion [...]

> as far as I know, the standard does not constraint the number of 
> self-references
> of recursive common table expressions. However, I could be wrong here.

As far as I can see, the spec flat-out forbids this.  In SQL:2021,
it's discussed in 7.17  syntax rule 3) j) ix), which
defines

  ix) If WLEi is recursive, then:
  1) Let S be the stratum that contains WQNi.
  2) If WQEi does not contain a  that contains
 more than one  referencing members of S, then WLEi is
 linearly recursive.

("stratum" means a set of mutually-recursive WITH items), and they
helpfully explain

  NOTE 308 — For example, if WLEi contains the 
SELECT * FROM A AS A1, A AS A2
  where A is a  in S, then WLEi is not linearly recursive. The
  point is that this  contains two references to
  members of S. It is irrelevant that they reference the same member of S.

and then the next rule says

  x) If WLEi is recursive, then WLEi shall be linearly recursive.


So the problem with extending the spec here is that someday they might
extend it with some other semantics, and then we're between a rock and
a hard place.

If there were really compelling reasons why (a) we have to have this
feature and (b) there's only one reasonable way for it to act, hence
no likelihood that the SQL committee will choose something else, then
I'd be willing to think about getting out front of the committee.
But you haven't made either case.

>> I don't think any other DBMS has implemented this, except MariaDB. Tested 
>> here:

> There are a few recent DBMSs that I know of that support this: HyPer, Umbra, 
> DuckDB, and NoisePage.

Do they all act the same?  Has anybody that sits on the SQL committee
done it?  (If you could point to DB2, in particular, I'd have some
confidence that the committee might standardize on their approach.)


A totally independent question is whether you've even defined a
well-defined behavior.  There's an interesting comment in the spec:

  NOTE 310 — The restrictions insure that each WLEi, viewed as a
  transformation of the query names of the stratum, is monotonically
  increasing. According to Tarski’s fixed point theorem, this insures that
  there is a fixed point. The General Rules use Kleene’s fixed point
  theorem to define a sequence that converges to the minimal fixed
  point.

I don't know any of the underlying math here, but if you've got a
join of multiple recursion references then the number of output
rows could certainly be nonlinear, which very possibly breaks the
argument that there's a well-defined interpretation.
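For what it's worth, one way to make that concern concrete (a sketch, not
a claim about the committee's intent): the spec's General Rules build the
result by Kleene iteration,

    W_0 = \varnothing, \quad W_{k+1} = W_k \cup Q(W_k), \quad
    \mathrm{lfp}(Q) = \bigcup_{k \ge 0} W_k

which converges to the least fixed point whenever Q is monotone, and joins
are monotone even when nonlinear. Our executor, though, feeds each step
only the delta, i.e. the rows produced by the previous iteration. That
semi-naive shortcut is equivalent to the full iteration only for linear Q;
for Q(W) = W joined with itself it never pairs a new row with an older one,
so it need not arrive at the least fixed point at all.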

regards, tom lane




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Tom Lane
Jan Wieck  writes:
> The problem here is that pg_upgrade itself is invoking a shell again. It 
> is not assembling an array of arguments to pass into exec*(). I'd be a 
> happy camper if it did the latter. But as things are we'd have to add 
> full shell escaping for arbitrary strings.

Surely we need that (and have it already) anyway?

I think we've stayed away from exec* because we'd have to write an
emulation for Windows.  Maybe somebody will get fed up and produce
such code, but it's not likely to be the least-effort route to the
goal.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:05 PM Robert Haas  wrote:
> Right, good point. But... does that really apply to
> 005_opclass_damage.pl? I feel like with the kind of physical damage
> we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
> from crashing headlong into that table. But, 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

FWIW that is only 99.9% true (contrary to what README.HOT says). This
is the case because nbtree page deletion will in fact search the tree
to find a downlink to the target page, which must be removed at the
same time -- see the call to _bt_search() made within nbtpage.c.

This is much less of a problem than you'd think, though, even with an
opclass that gives wrong answers all the time. Because it's also true
that _bt_getstackbuf() is remarkably tolerant when it actually goes to
locate the downlink -- because that happens via a linear search that
matches on downlink block number (it doesn't use the opclass for that
part). This means that we'll accidentally fail to fail if the page is
*somewhere* to the right in the "true" key space. Which probably means
that it has a greater than 50% chance of not failing with a 100%
broken opclass. Which probably makes our odds better with more
plausible levels of misbehavior (e.g. collation changes).

That being said, I should make _bt_lock_subtree_parent() return false
and back out of page deletion without raising an error in the case
where we really cannot locate a valid downlink. We really ought to
soldier on when that happens, since we'll do that for a bunch of other
reasons already. I believe that the only reason we throw an error
today is for parity with the page split case (the main
_bt_getstackbuf() call). But this isn't the same situation at all --
this is VACUUM.

I will make this change to HEAD soon, barring objections.

-- 
Peter Geoghegan




Re: pg_amcheck contrib application

2021-03-23 Thread Tom Lane
Peter Geoghegan  writes:
> That being said, I should make _bt_lock_subtree_parent() return false
> and back out of page deletion without raising an error in the case
> where we really cannot locate a valid downlink. We really ought to
> soldier on when that happens, since we'll do that for a bunch of other
> reasons already. I believe that the only reason we throw an error
> today is for parity with the page split case (the main
> _bt_getstackbuf() call). But this isn't the same situation at all --
> this is VACUUM.

> I will make this change to HEAD soon, barring objections.

+1.  Not deleting the upper page seems better than the alternatives.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:44 PM Tom Lane  wrote:
> > I will make this change to HEAD soon, barring objections.
>
> +1.  Not deleting the upper page seems better than the alternatives.

FWIW it might also work that way as a holdover from the old page
deletion algorithm. These days we decide exactly which pages (leaf
page plus possible internal pages) are to be deleted as a whole up
front (these are a subtree, though usually just a degenerate
single-leaf-page subtree -- internal page deletions are rare).

One of the advantages of this design is that we verify practically all
of the work involved in deleting an entire subtree up-front, inside
_bt_lock_subtree_parent(). It's clearly safe to back out of it if it
looks dicey.

--
Peter Geoghegan




Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread Fujii Masao




On 2021/03/24 1:20, Alvaro Herrera wrote:

On 2021-Mar-23, James Coleman wrote:


On Tue, Mar 23, 2021 at 1:46 AM Fujii Masao  wrote:



Therefore for now what we've not reached the consensus is what message
should be logged at PM_STARTUP. I'm thinking it's better to log
"the database system is starting up" in that case because of the reasons
that I explained upthread.



I understand your point now, and I agree, that makes sense.


Please note that PM_STARTUP mode is very very short-lived.  It only
starts happening when postmaster launches the startup process, and
before the startup process begins WAL replay (as changed by
sigusr1_handler in postmaster.c).  Once WAL replay begins, the PM status
changes to PM_RECOVERY.


True if archive recovery or standby server. But during crash recovery
postmaster sits in PM_STARTUP mode. So I guess that we still see
the log messages for PM_STARTUP lots of times.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Jan Wieck

On 3/23/21 3:35 PM, Tom Lane wrote:

Jan Wieck  writes:
The problem here is that pg_upgrade itself is invoking a shell again. It 
is not assembling an array of arguments to pass into exec*(). I'd be a 
happy camper if it did the latter. But as things are we'd have to add 
full shell escaping for arbitrary strings.


Surely we need that (and have it already) anyway?


There are functions to shell escape a single string, like

appendShellString()

but that is hardly enough when a single optarg for --restore-option 
could look like any of


--jobs 8
--jobs=8
--jobs='8'
--jobs '8'
--jobs "8"
--jobs="8"
--dont-bother-about-jobs

When placed into a shell string, those things have very different 
effects on your args[].


I also want to say that we are overengineering this whole thing. Yes, 
there is the problem of shell quoting possibly going wrong as it passes 
from one shell to another. But for now this is all about passing a few 
numbers down from pg_upgrade to pg_restore (and eventually pg_dump).


Have we even reached a consensus yet that doing it the way my patch 
is proposing is the right way to go? Like whether emitting BLOB TOC 
entries into SECTION_DATA when in binary upgrade mode is a good thing, 
or whether bunching all the SQL statements for creating the blob, changing 
the ACL, COMMENT and SECLABEL into one multi-statement query is.


Maybe we should focus on those details before getting into all the 
parameter naming stuff.



Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services




Re: [PATCH] pg_permissions

2021-03-23 Thread Joel Jacobson
On Thu, Mar 11, 2021, at 08:00, Joel Jacobson wrote:
> 0004-pg_permissions-and-pg_ownerships.patch

Having gotten some hands-on experience of these views for a while,
I notice I quite often want to check the ownerships/permissions
for some specific type of objects, or in some specific schema.

The current patch returns pg_describe_object() as the "objdesc" column.

Would it be a better idea to instead return the fields from 
pg_identify_object()?
This would allow specifically filtering on "type", "schema", "name" or 
"identity"
instead of having to apply a regex/LIKE on the object description.
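For example (using a catalog table only to show the shape of the output):

SELECT pg_describe_object('pg_class'::regclass, 'pg_constraint'::regclass, 0);
-- table pg_constraint

SELECT * FROM pg_identify_object('pg_class'::regclass, 'pg_constraint'::regclass, 0);
--  type  |   schema   |     name      |         identity
-- -------+------------+---------------+---------------------------
--  table | pg_catalog | pg_constraint | pg_catalog.pg_constraint

Filtering on type = 'table' or schema = 'pg_catalog' then needs no pattern
matching at all.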

/Joel

Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-24, Fujii Masao wrote:

> On 2021/03/24 1:20, Alvaro Herrera wrote:

> > Please note that PM_STARTUP mode is very very short-lived.  It only
> > starts happening when postmaster launches the startup process, and
> > before the startup process begins WAL replay (as changed by
> > sigusr1_handler in postmaster.c).  Once WAL replay begins, the PM status
> > changes to PM_RECOVERY.
> 
> True if archive recovery or standby server. But during crash recovery
> postmaster sits in PM_STARTUP mode. So I guess that we still see
> the log messages for PM_STARTUP lots of times.

Hmm ... true, and I had missed that this is what you had already said
upthread.  In this case, should we add a DETAIL line for this message?

FATAL:  the database system is starting up
DETAIL:  WAL is being applied to recover from a system crash.
or
DETAIL:  The system is applying WAL to recover from a system crash.
or
DETAIL:  The startup process is applying WAL to recover from a system crash.

-- 
Álvaro Herrera               39°49'30"S 73°17'W
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)




Re: [PATCH] pg_permissions

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-23, Joel Jacobson wrote:

> On Thu, Mar 11, 2021, at 08:00, Joel Jacobson wrote:
> > 0004-pg_permissions-and-pg_ownerships.patch
> 
> Having gotten some hands-on experience of these views for a while,
> I notice I quite often want to check the ownerships/permissions
> for some specific type of objects, or in some specific schema.
> 
> The current patch returns pg_describe_object() as the "objdesc" column.
> 
> Would it be a better idea to instead return the fields from 
> pg_identify_object()?
> This would allow specifically filtering on "type", "schema", "name" or 
> "identity"
> instead of having to apply a regex/LIKE on the object description.

For programmatic use it is certainly better to use the object identity
rather than the description.  Particularly because the description gets
translated, and the identity doesn't.

-- 
Álvaro Herrera   Valdivia, Chile




Re: [PATCH] pg_permissions

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-08, Joel Jacobson wrote:

> $ dropuser test
> dropuser: error: removal of role "test" failed: ERROR:  role "test" cannot be 
> dropped because some objects depend on it
> DETAIL:  1 object in database joel
> 
> Hmmm. I wonder which 1 object that could be?

BTW the easiest way to find out the answer to this question with current
tech is to connect to database joel and attempt "DROP USER test"; it
will print a list of objects the user owns or has privileges for.

> # SELECT * FROM pg_ownerships WHERE rolname = 'test';
> # SELECT * FROM pg_permissions WHERE grantee = 'test';

I wonder if these views should be defined on top of pg_shdepend instead
of querying every single catalog.  That would make for much shorter
queries.
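Something along these lines, as a rough sketch of what a pg_shdepend-based
version could start from:

SELECT s.classid::regclass AS catalog,
       s.objid,
       s.deptype            -- 'o' = owner, 'a' = ACL
FROM pg_shdepend s
JOIN pg_authid a ON a.oid = s.refobjid
                AND s.refclassid = 'pg_authid'::regclass
WHERE a.rolname = 'test';

The objid still needs a per-catalog join (or pg_identify_object()) to turn
into something readable, but at least the scan touches a single catalog.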

-- 
Álvaro Herrera   Valdivia, Chile




Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-03-23 Thread Tom Lane
Peter Eisentraut  writes:
> On 19.03.21 21:06, Tom Lane wrote:
>> I guess the immediate question is how much of a performance gap there
>> is now between the float and numeric implementations.

> Attached are my test script and the full output.

OK ... I prefer to do this sort of timing in a way that's not so
dependent on client I/O speeds, along the lines of

select count(date_part('day', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;

I applied the v5 patch and ran your test suite that way, producing
the attached results.  It looks pretty promising for me, too.
Most of the cases show about 10%-15% degradation:

# select extract, date_part, extract/date_part as ratio, unit from (select 
sum(msec) filter (where fn = 'extract') as extract, sum(msec) filter (where fn 
= 'date_part') as date_part, unit from timings group by unit) ss order by ratio;
  extract  | date_part | ratio  |  unit   
---+---++-
 22690.100 | 20705.402 | 1.09585411575201486066 | decade
 22810.005 | 20754.296 | 1.09904980636298142804 | century
 11238.122 | 10190.385 | 1.10281623314526389337 | timezone_minute
 20201.992 | 18303.982 | 1.1036938301184955 | doy
 20121.073 | 18206.290 | 1.1051715094069138 | dow
 23209.090 | 20915.715 | 1.10964841507928368693 | millennium
 18839.455 | 16943.063 | 1.11192734159106886399 | week
 20130.843 | 18010.011 | 1.1177585066438882 | isoyear
 19755.296 | 17668.497 | 1.11810846163089027890 | isodow
 22500.373 | 20112.264 | 1.11873894455641592612 | day
 22631.485 | 20200.266 | 1.12035579135443067928 | month
 22883.344 | 20407.733 | 1.12130749652594925659 | quarter
 22628.524 | 20172.361 | 1.12175882634660365239 | year
 26503.545 | 23493.288 | 1.12813263941598979249 | minute
 26381.817 | 23329.924 | 1.13081452815705700542 | hour
 27236.886 | 24070.860 | 1.13152940941869131390 | microseconds
 11563.820 |  9948.148 | 1.1624093248311143 | timezone_hour
 27728.212 | 23567.973 | 1.17652086583771968849 | second
 28348.328 | 23984.219 | 1.18195751965073367617 | milliseconds
 49902.129 | 30798.034 | 1.6203024193037776 | epoch
 31544.035 | 18250.745 | 1.7283697186060076 | julian
(21 rows)

The outliers are epoch and julian, which unsurprisingly are the
ones you didn't fix yet.

I think a ten-percent-ish slowdown is acceptable for this purpose,
so I think if you can address the points already raised then we're
pretty much good to go with this.

regards, tom lane

\set ECHO all
\timing on

\set N 1000

-- date

select count(date_part('day', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('month', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('quarter', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('week', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('year', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('decade', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('century', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('millennium', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('julian', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('isoyear', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('dow', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('isodow', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('doy', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(date_part('epoch', current_date + g * interval '1 day')) from 
generate_series(0, :N) g;

select count(extract(day from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(month from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(quarter from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(week from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(year from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(decade from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(century from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(millennium from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(julian from current_date + g * interval '1 day')) from 
generate_series(0, :N) g;
select count(extract(isoyear from current_da

Re: pg_upgrade failing for 200+ million Large Objects

2021-03-23 Thread Tom Lane
Jan Wieck  writes:
> Have we even reached a consensus yet on that doing it the way, my patch 
> is proposing, is the right way to go? Like that emitting BLOB TOC 
> entries into SECTION_DATA when in binary upgrade mode is a good thing? 
> Or that bunching all the SQL statements for creating the blob, changing 
> the ACL and COMMENT and SECLABEL all in one multi-statement-query is.

Now you're asking for actual review effort, which is a little hard
to come by towards the tail end of the last CF of a cycle.  I'm
interested in this topic, but I can't justify spending much time
on it right now.

regards, tom lane




Re: Nicer error when connecting to standby with hot_standby=off

2021-03-23 Thread Tom Lane
Alvaro Herrera  writes:
> FATAL:  the database system is starting up
> DETAIL:  WAL is being applied to recover from a system crash.
> or
> DETAIL:  The system is applying WAL to recover from a system crash.
> or
> DETAIL:  The startup process is applying WAL to recover from a system crash.

I don't think the postmaster has enough context to know if that's
actually true.  It just launches the startup process and waits for
results.  If somebody saw this during a normal (non-crash) startup,
they'd be justifiably alarmed.

regards, tom lane




Re: Add client connection check during the execution of the query

2021-03-23 Thread Thomas Munro
On Tue, Mar 23, 2021 at 11:47 PM Thomas Munro  wrote:
> That leaves the thorny problem Tom mentioned at the top of this
> thread[1]: this socket-level approach can be fooled by an 'X' sitting
> in the socket buffer, left by a client that did PQsendQuery() and then
> PQfinish().  Or perhaps even by SSL messages invisible to our protocol
> level.  That can surely only be addressed by moving the 'peeking' one
> level up the protocol stack.  I've attached a WIP attempt to do that,
> on top of the other patch.  Lookahead happens in our receive buffer,
> not the kernel's socket buffer.

After sleeping on this, I'm still not seeing any problem with this
approach.  Sanity checks welcome.  Of course that function should be
called something like pq_peekmessage() -- done.  I think this patch
addresses all critiques leveled at the earlier versions, and I've
tested this with SSL and non-SSL connections, by killing psql while a
query runs, and using a client that calls PQfinish() after starting a
query, and in an earlier version I did yank-the-cable testing, having
set up TCP keepalive to make that last case work.
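For anyone who wants to try it out, a minimal usage sketch (this assumes
the patch as posted, and sufficient privileges to change the GUC):

SET client_connection_check_interval = '10s';
SELECT pg_sleep(3600);  -- now aborts promptly if the client goes away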
From f7fd8640ebac242f21719574b0e92dc7cfba4041 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 1 Mar 2021 18:08:23 +1300
Subject: [PATCH v7] Detect dropped connections while running queries.

Provide a new optional GUC that can be used to check whether the client
connection has gone away periodically while running very long queries.

Author: Sergey Cherkashin 
Author: Thomas Munro 
Reviewed-by: Thomas Munro 
Reviewed-by: Tatsuo Ishii 
Reviewed-by: Konstantin Knizhnik 
Reviewed-by: Zhihong Yu 
Reviewed-by: Tom Lane  (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml  | 36 +++
 src/backend/libpq/pqcomm.c| 60 +--
 src/backend/tcop/postgres.c   | 34 +++
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 11 
 src/backend/utils/misc/guc.c  | 10 
 src/backend/utils/misc/postgresql.conf.sample |  3 +
 src/include/libpq/libpq.h |  2 +
 src/include/miscadmin.h   |  1 +
 src/include/utils/timeout.h   |  1 +
 10 files changed, 155 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5679b40dd5..e522be460c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -998,6 +998,42 @@ include_dir 'conf.d'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+Sets the time interval between checks that the client is still
+connected, while running queries.  The check is performed by testing
+whether a part of the next message can be read from the client.
+If the kernel reports that the connection
+has been closed or lost, a long running query can abort immediately,
+rather than discovering the problem when it eventually tries to send
+the response.
+   
+   
+If this value is specified without units, it is taken as milliseconds.  
+The default value is 0, which disables connection
+checks.  Without connection checks, the server will detect the loss of
+the connection only when it is waiting for a new request, receiving a
+request or sending a response.
+   
+   
+For the kernel itself to detect lost TCP connections reliably and
+within a known timeframe in all scenarios including network failure, it
+may also be necessary to adjust the default TCP keepalive settings of
+the operating system, or the
+,
+ and
+ settings of
+PostgreSQL.
+   
+  
+ 
+ 
  
  
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..a12ed3f851 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -79,6 +79,7 @@
 #include "storage/ipc.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/timeout.h"
 
 /*
  * Cope with the various platform-specific ways to spell TCP keepalive socket
@@ -104,6 +105,7 @@
  */
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
+int			client_connection_check_interval;
 
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
@@ -919,13 +921,13 @@ socket_set_nonblocking(bool nonblocking)
 }
 
 /* 
- *		pq_recvbuf - load some bytes into the input buffer
+ *		pq_recvbuf_ext - load some bytes into the input buffer
  *
  *		returns 0 if OK, EOF if trouble
  * 
  */
 static int
-pq_recvbuf(void)
+pq_recvbuf_ext(bool nonblocking)
 {
 	if (PqRecvPointer > 0)
 	{
@@ -94

Re: Minimal logical decoding on standbys

2021-03-23 Thread Fabrízio de Royes Mello
>
> done in v13 attached.
>

All tests passed and everything looks good to me... just a final minor fix
on regression tests:

diff --git a/src/test/regress/expected/rules.out
b/src/test/regress/expected/rules.out
index b0e17d4e1d..961ec869a6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1869,9 +1869,9 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
 pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace,
 pg_stat_get_db_conflict_lock(d.oid) AS confl_lock,
 pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot,
-pg_stat_get_db_conflict_logicalslot(d.oid) AS confl_logicalslot,
 pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
-pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
+pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock,
+pg_stat_get_db_conflict_logicalslot(d.oid) AS confl_logicalslot
FROM pg_database d;
 pg_stat_gssapi| SELECT s.pid,
 s.gss_auth AS gss_authenticated,

We moved "pg_stat_database_conflicts.confl_logicalslot" to the end of the
column list but forgot to change the regression test expected result.

Regards,

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: multi-install PostgresNode

2021-03-23 Thread Michael Paquier
On Thu, Jan 28, 2021 at 10:19:57AM -0500, Andrew Dunstan wrote:
> +BEGIN
> +{
> +
> +# putting this in a BEGIN block means it's run and checked by perl -c
> +
> +
> +# everything other than info and get_new_node that we need to override.
> +# they are all instance methods, so we can use the same template for all.
> +my @instance_overrides = qw(init backup start kill9 stop reload restart
> +  promote logrotate safe_psql psql background_psql
> +  interactive_psql poll_query_until command_ok
> +  command_fails command_like command_checks_all
> +  issues_sql_like run_log pg_recvlogical_upto
> +);

No actual objections here, but it would be easy to miss the addition
of a new routine.  Would an exclusion filter be a better fit, i.e.
override everything except get_new_node() and info()?
--
Michael


signature.asc
Description: PGP signature


Re: Add client connection check during the execution of the query

2021-03-23 Thread Zhihong Yu
Hi,
In the description:

Provide a new optional GUC that can be used to check whether the client
connection has gone away periodically while running very long queries.

I think moving 'periodically' next to 'to check' would make the sentence
more readable.

+the operating system, or the
+,
+ and

The same GUC is listed twice. I am not sure if that was intended.

Cheers

On Tue, Mar 23, 2021 at 2:54 PM Thomas Munro  wrote:

> On Tue, Mar 23, 2021 at 11:47 PM Thomas Munro 
> wrote:
> > That leaves the thorny problem Tom mentioned at the top of this
> > thread[1]: this socket-level approach can be fooled by an 'X' sitting
> > in the socket buffer, left there by a client that did PQsendQuery() and
> > then PQfinish().  Or perhaps even by SSL messages invisible to our
> > protocol level.  That can surely only be addressed by moving the
> > 'peeking' one level up the protocol stack.  I've attached a WIP attempt
> > to do that,
> > on top of the other patch.  Lookahead happens in our receive buffer,
> > not the kernel's socket buffer.
>
> After sleeping on this, I'm still not seeing any problem with this
> approach.  Sanity checks welcome.  Of course that function should be
> called something like pq_peekmessage() -- done.  I think this patch
> addresses all critiques leveled at the earlier versions, and I've
> tested this with SSL and non-SSL connections, both by killing psql while
> a query runs and by using a client that calls PQfinish() after starting a
> query; in an earlier version I also did yank-the-cable testing, with TCP
> keepalive set up to make that last case work.
>
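
To make the buffer-level idea concrete: in the v3 frontend/backend
protocol every frontend message starts with a one-byte type code, and 'X'
is Terminate.  A rough sketch of the classification follows — the names
are illustrative only, not the actual pq_peekmessage() from the patch:

#include <stddef.h>

/*
 * Given bytes already drained into our own receive buffer (without
 * blocking), classify the connection state.  Only a leading 'X'
 * (Terminate) means the client has deliberately gone away; any other
 * message type means it is still talking to us.
 */
typedef enum
{
    CLIENT_UNKNOWN,             /* nothing buffered; no verdict */
    CLIENT_ACTIVE,              /* a real message is waiting */
    CLIENT_TERMINATED           /* 'X' (Terminate) is next */
} ClientPeekResult;

static ClientPeekResult
peek_buffered_message(const char *buf, size_t avail)
{
    if (avail == 0)
        return CLIENT_UNKNOWN;
    return (buf[0] == 'X') ? CLIENT_TERMINATED : CLIENT_ACTIVE;
}

Because the lookahead happens above the SSL layer, in our own receive
buffer, it is not confused by TLS record framing the way a raw socket
peek would be.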


Re: Change default of checkpoint_completion_target

2021-03-23 Thread Bossart, Nathan
On 3/23/21, 12:19 PM, "Stephen Frost"  wrote:
> * Bossart, Nathan (bossa...@amazon.com) wrote:
> > LGTM.  I just have a few small wording suggestions.
>
> Agreed, those looked like good suggestions and so I've incorporated
> them.
>
> Updated patch attached.

Looks good!

Nathan



Re: multi-install PostgresNode

2021-03-23 Thread Andrew Dunstan


On 3/23/21 6:36 PM, Michael Paquier wrote:
> On Thu, Jan 28, 2021 at 10:19:57AM -0500, Andrew Dunstan wrote:
>> +BEGIN
>> +{
>> +
>> +# putting this in a BEGIN block means it's run and checked by perl -c
>> +
>> +
>> +# everything other than info and get_new_node that we need to override.
>> +# they are all instance methods, so we can use the same template for 
>> all.
>> +my @instance_overrides = qw(init backup start kill9 stop reload restart
>> +  promote logrotate safe_psql psql background_psql
>> +  interactive_psql poll_query_until command_ok
>> +  command_fails command_like command_checks_all
>> +  issues_sql_like run_log pg_recvlogical_upto
>> +);
> No actual objections here, but it would be easy to miss the addition
> of a new routine.  Would an exclusion filter be a better fit, i.e.
> override everything except get_new_node() and info()?



Actually, following a brief offline discussion today, I've thought of a
way that doesn't require subclassing. I'll post it, probably tomorrow.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:53 PM Peter Geoghegan  wrote:
> One of the advantages of this design is that we verify practically all
> of the work involved in deleting an entire subtree up-front, inside
> _bt_lock_subtree_parent(). It's clearly safe to back out of it if it
> looks dicey.

That's taken care of. I just pushed a commit that teaches
_bt_lock_subtree_parent() to press on.

-- 
Peter Geoghegan




Re: Support for NSS as a libpq TLS backend

2021-03-23 Thread Jacob Champion
On Tue, 2021-03-23 at 00:38 +0100, Daniel Gustafsson wrote:
> This rebase also includes a fix for pgtls_init which was sent offlist by 
> Jacob.
> The changes in pgtls_init can potentially be used to initialize the crypto
> context for NSS to clean up this patch, Jacob is currently looking at that.

I'm having a hell of a time trying to get the context stuff working.
Findings so far (I have patches in progress for many of these, but it's
all blowing up because of the last problem):

NSS_INIT_NOROOTINIT is hardcoded for NSS_InitContext(), so we probably
don't need to pass it explicitly. NSS_INIT_PK11RELOAD is apparently
meant to hack around libraries that do their own PKCS #11 module
loading; do we need it?

NSS_ShutdownContext() can (and does) fail if we've leaked handles to
objects, so we need to check its return value. Once this happens,
future NSS_InitContext() calls behave poorly. Currently we leak the
pr_fd as well as a handful of server_cert handles.

NSS_NoDB_Init() is going to pin NSS in memory. For the backend this is
probably okay, but for libpq clients that's probably not what we want.

The first database loaded by NSS_InitContext() becomes the "default"
database. This is what I'm currently hung up on. I can't figure out how
to get NSS to use the database that was loaded for the current
connection, so in my local patches for the issues above, client
certificates fail to load. I can work around it temporarily for the
tests, but this will be a problem if any libpq clients load up multiple
independent databases for use with separate connections. Anyone know if
this is a supported use case for NSS?

--Jacob
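
For anyone trying to reproduce the shutdown failure, here is a minimal
sketch of a per-connection context round trip — the database path and
flags are illustrative, not what the patch uses:

#include <string.h>
#include <nss.h>
#include <prerror.h>

/*
 * Demonstrates why NSS_ShutdownContext()'s return value must be
 * checked: it fails if any object handles created under the context
 * are still alive, and subsequent NSS_InitContext() calls can then
 * misbehave.
 */
static int
context_roundtrip(void)
{
    NSSInitContext *ctx;
    NSSInitParameters params;

    memset(&params, 0, sizeof(params));
    params.length = sizeof(params);

    ctx = NSS_InitContext("sql:/path/to/nssdb", "", "", "",
                          &params,
                          NSS_INIT_READONLY | NSS_INIT_NOROOTINIT);
    if (ctx == NULL)
        return PR_GetError();   /* database could not be opened */

    /* ... open PR file descriptors, load certificates, etc. ... */

    if (NSS_ShutdownContext(ctx) != SECSuccess)
        return PR_GetError();   /* something leaked; context stays pinned */

    return 0;
}

Any PRFileDesc or certificate handle still open at shutdown time is
enough to make the final call fail, which matches the pr_fd and
server_cert leaks described above.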


Re: Support for NSS as a libpq TLS backend

2021-03-23 Thread Michael Paquier
On Wed, Mar 24, 2021 at 12:05:35AM +, Jacob Champion wrote:
> The first database loaded by NSS_InitContext() becomes the "default"
> database. This is what I'm currently hung up on. I can't figure out how
> to get NSS to use the database that was loaded for the current
> connection, so in my local patches for the issues above, client
> certificates fail to load. I can work around it temporarily for the
> tests, but this will be a problem if any libpq clients load up multiple
> independent databases for use with separate connections. Anyone know if
> this is a supported use case for NSS?

Are you referring to the case of threading here?  This should be a
supported case, as threads created by an application through libpq
could perfectly use completely different connection strings.
--
Michael


signature.asc
Description: PGP signature


  1   2   >