Is the doc for SHOW somewhat stale?

2020-10-06 Thread Kyotaro Horiguchi
Hello.

If I read the documentation of the SHOW command, it says:

https://www.postgresql.org/docs/13/sql-show.html

> name 
>   The name of a run-time parameter. Available parameters are documented
>   in Chapter 19 and on the SET reference page. In addition, there are a
>   few parameters that can be shown but not set:
>  SERVER_VERSION
>  SERVER_ENCODING
>  LC_COLLATE
>  LC_CTYPE
>  IS_SUPERUSER
> ALL
>   Show the values of all configuration parameters, with descriptions.

It seems to me that the variables listed above are GUC variables of the
PGC_INTERNAL context, but there are 13 other variables in that context.
On the other hand, is_superuser is shown neither in the pg_settings view
nor by SHOW ALL (it is marked GUC_NO_SHOW_ALL).

is_superuser is commented as follows in guc.c:

> /* Not for general use --- used by SET SESSION AUTHORIZATION */

So, if we follow the current style of the page, we should:

 - remove IS_SUPERUSER from the documentation.
 - add the 13 missing variables.

However, it seems better for that item to simply point to "19.15 Preset
Options" instead, as in the attached patch.

By the way, SHOW ALL shows 329 rows as of PG13, whereas the documentation's
example says 196 rows. (14devel shows 331 rows.)
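
For what it's worth, the row count can be cross-checked against the
pg_settings view (the exact number varies by major version and build
options):

    SELECT count(*) FROM pg_settings;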

Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index 945b0491b1..5bb7a2ca1b 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -54,62 +54,8 @@ SHOW ALL
   documented in  and on the  reference page.  In
   addition, there are a few parameters that can be shown but not
-  set:
-
-  
-   
-SERVER_VERSION
-
- 
-  Shows the server's version number.
- 
-
-   
-
-   
-SERVER_ENCODING
-
- 
-  Shows the server-side character set encoding.  At present,
-  this parameter can be shown but not set, because the
-  encoding is determined at database creation time.
- 
-
-   
-
-   
-LC_COLLATE
-
- 
-  Shows the database's locale setting for collation (text
-  ordering).  At present, this parameter can be shown but not
-  set, because the setting is determined at database creation
-  time.
- 
-
-   
-
-   
-LC_CTYPE
-
- 
-  Shows the database's locale setting for character
-  classification.  At present, this parameter can be shown but
-  not set, because the setting is determined at database creation
-  time.
- 
-
-   
-
-   
-IS_SUPERUSER
-
- 
-  True if the current role has superuser privileges.
- 
-
-   
-  
+  set. See  for details.
+ 
 

 
@@ -175,7 +121,7 @@ SHOW ALL;
 .
  xmloption   | content | Sets whether XML data in implicit parsing ...
  zero_damaged_pages  | off | Continues processing past damaged page headers.
-(196 rows)
+(329 rows)
 
  
 


Re: Add primary keys to system catalogs

2020-10-06 Thread Craig Ringer
On Sun, 4 Oct 2020, 01:32 John Naylor,  wrote:

>
> On Sat, Oct 3, 2020 at 9:27 AM Craig Ringer 
> wrote:
> > So a big +1 from me for the idea. Especially if we ensure psql
> recognises when the relation 'oid' attribute has a declared PK and includes
> it in the column list.
>
> Oid has been in the normal column list since v12 when it stopped being a
> system column. Or did you mean something else?
>

That means I've spent way too long working on extensions running mainly on
pg11 lately, and way too little time reading -hackers. Ignore that bit. Thanks.



RE: extension patch of CREATE OR REPLACE TRIGGER

2020-10-06 Thread osumi.takami...@fujitsu.com
Hi


It took me too long to respond to this e-mail. Sorry.
Actually, I got stuck trying to achieve both error detection for the
internal-trigger case and for the pending-trigger case.

> * I'm concerned by the fact that there doesn't seem to be any defense against
> somebody replacing a foreign-key trigger with something that does something
> else entirely, and thereby silently breaking their foreign key constraint.  I 
> think
> it might be a good idea to forbid replacing triggers for which tgisinternal 
> is true;
> but I've not done the legwork to see if that's exactly the condition we want.
> 
> * In the same vein, I'm not sure that the right things happen when fooling 
> with
> triggers attached to partitioned tables.  We presumably don't want to allow
> mucking directly with a child trigger.  Perhaps refusing an update when
> tgisinternal might fix this too (although we'll have to be careful to make 
> the error
> message not too confusing).
Yeah, you are right. tgisinternal works to detect the invalid cases.
I added a new check condition in my patch to prohibit the replacement
of internal triggers by a user, which protects FK triggers and child
triggers from being replaced directly.
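
Just to illustrate the idea (a rough sketch only, not the exact code in
the patch; the variable names here are made up):

    /* rough sketch: refuse to replace an internal trigger */
    if (trigger_exists && existing_tgisinternal)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("trigger \"%s\" for relation \"%s\" is an internal trigger",
                        stmt->trigname, RelationGetRelationName(rel)),
                 errhint("Internally generated triggers cannot be replaced.")));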


> * I don't think that you've fully thought through the implications of 
> replacing a
> trigger for a table that the current transaction has already modified.  Is it 
> really
> sufficient, or even useful, to do
> this:
> 
> +/*
> + * If this trigger has pending events, throw an error.
> + */
> +if (trigger_deferrable &&
> + AfterTriggerPendingOnRel(RelationGetRelid(rel)))
> 
> As an example, if we change a BEFORE trigger to an AFTER trigger, that's not
> going to affect the fact that we *already* fired that trigger.  Maybe this is 
> okay
> and we just need to document it, but I'm not convinced.
> 
> * BTW, I don't think a trigger necessarily has to be deferrable in order to 
> have
> pending AFTER events.  The existing use of AfterTriggerPendingOnRel
> certainly doesn't assume that.  But really, I think we probably ought to be
> applying CheckTableNotInUse which'd include that test.  (Another way in
> which there's fuzzy thinking here is that AfterTriggerPendingOnRel isn't 
> specific
> to *this*
> trigger.)
Hmm, actually, when I just put a call to CheckTableNotInUse() in
CreateTrigger(), it throws an error like "cannot CREATE OR REPLACE
TRIGGER because it is being used by active queries in this session".
This breaks the protection for the internal cases above and contradicts
test cases that already passed.
I thought about adding a condition of in_partition==false before calling
CheckTableNotInUse(), but this doesn't work in the corner case where an
internally generated child trigger is pending and we don't want to allow
the replacement of that kind of trigger.
Do you have any good ideas for achieving both points at the same time?


> * A lesser point is that I think you're overcomplicating the code by applying
> heap_modify_tuple.  You might as well just build the new tuple normally in all
> cases, and then apply either CatalogTupleInsert or CatalogTupleUpdate.
> 
> * Also, the search for an existing trigger tuple is being done the hard way.
> You're using an index on (tgrelid, tgname), so you could include the name in 
> the
> index key and expect that there's at most one match.
While waiting for a new reply, I'll work on those two refactorings.


Regards,
Takamichi Osumi


CREATE_OR_REPLACE_TRIGGER_v12.patch
Description: CREATE_OR_REPLACE_TRIGGER_v12.patch


Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-06 Thread Amit Kapila
On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada
 wrote:
>
> Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we
> pass a physical slot name to pg_stat_reset_replication_slot() a
> PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m
> okay with not raising an error but maybe we can have it not to send
> the message in that case.
>

makes sense, so changed accordingly.

-- 
With Regards,
Amit Kapila.


v9-0001-Track-statistics-for-spilling-of-changes-from-Reo.patch
Description: Binary data


v9-0002-Test-stats.patch
Description: Binary data


pg_upgrade dead code for pre-8.4 versions

2020-10-06 Thread Magnus Hagander
pg_upgrade documentation says it supports upgrades from 8.4 and newer.

But there is code in there that checks for, and behaves differently with,
pre-8.4 versus 8.4-or-newer clusters.

ISTM that code could be cleaned out, because it should be dead based on the
initial check that we are upgrading from 8.4 or later?

This appears to be:
* The code in the analyze script (see a follow-up email on this), which
actually tries to generate a vacuum script if it's a pre-8.4 without
visibility map
* The file copy code in relfilenode.c that copies vm and fsm files only
from 8.4 and up (which should now be always)

Or am I misreading something?

//Magnus


Re: Add session statistics to pg_stat_database

2020-10-06 Thread Masahiro Ikeda

On 2020-09-05 00:50, Laurenz Albe wrote:

I have changed the code so that connections are counted immediately.
Attached is a new version.


Thanks for making a patch.
I'm interested in this feature.

I think adding the number of login failures is good for security.
Although we can see such events in the log files, it's useful to get an
overview of whether the database may be under attack or not.

By the way, could you rebase the patch, since the latest patches no longer
apply to the master branch?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-06 Thread Greg Nancarrow
On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar  wrote:
>
> > > I have one question which is common to both this patch and parallel
> > > inserts in CTAS[1], do we need to skip creating tuple
> > > queues(ExecParallelSetupTupleQueues) as we don't have any tuples
> > > that's being shared from workers to leader?
> > >
> >
> > As far as this patch is concerned we might need to return tuples when
> > there is a Returning clause. I think for the cases where we don't need
> > to return tuples we might want to skip creating these queues if it is
> > feasible without too many changes.
>

Hi Dilip,

You're right. I've included that in my latest version of the patch (so
Gather should only start tuple queues in the case of parallel SELECT
or parallel INSERT with a RETURNING clause).
Other updated functionality includes:
- Added more necessary exclusions for Parallel INSERT INTO ... SELECT
... (but allowing underlying query to still be parallel):
  - non-parallel-safe triggers
  - non-parallel-safe default and check expressions
  - foreign tables
  - temporary tables
- Added support for before/after statement-level INSERT triggers
(can't allow parallel workers to execute these)
- Adjusted cost of Gather node, for when RETURNING clause is not specified
I have not found issues with partition tables (yet) or toast column values.

Also, I have attached a separate patch (requested by Andres Freund)
that just allows the underlying SELECT part of "INSERT INTO ... SELECT
..." to be parallel.

Regards,
Greg Nancarrow
Fujitsu Australia


0004-ParallelInsertSelect.patch
Description: Binary data


0001-InsertParallelSelect.patch
Description: Binary data


Re: pg_upgrade dead code for pre-8.4 versions

2020-10-06 Thread Daniel Gustafsson
> On 6 Oct 2020, at 11:27, Magnus Hagander  wrote:

> ISTM that code could be cleaned out, because it should be dead based on the 
> initial check that we are upgrading from 8.4 or later?

+1.  Commit 2209b3923a7afe0b removed support for 8.3 so anything left checking
for pre-8.4 should be removed as a pre-8.4 source cluster couldn't be upgraded
with pg_upgrade today.

cheers ./daniel



pg_upgrade analyze script

2020-10-06 Thread Magnus Hagander
For a long time now, pg_upgrade has dropped a script (analyze_new_cluster.sh)
that just calls vacuumdb to run the analyze in stages. This script made a
lot of sense back when it actually implemented the stages itself, but these
days, since it just calls a single command, I think it's unnecessary
complication.

I suggest we drop it and just replace it with instructions to call vacuumdb
directly.
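
That is, instead of pointing at the generated script, the instructions
would simply suggest something like (adding connection options as needed):

    vacuumdb --all --analyze-in-stages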

The attached patch does this. It also removes the parts of the instructions
that talk about pre-8.4 databases, which I believe are dead code per
https://postgr.es/m/CABUevEx-D0PNVe00tkeQRGennZQwDtBJn=
493mjt-x6sppb...@mail.gmail.com.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore
index 9edea5c98f..2d3bfeaa50 100644
--- a/src/bin/pg_upgrade/.gitignore
+++ b/src/bin/pg_upgrade/.gitignore
@@ -1,9 +1,7 @@
 /pg_upgrade
 # Generated by test suite
 /pg_upgrade_internal.log
-/analyze_new_cluster.sh
 /delete_old_cluster.sh
-/analyze_new_cluster.bat
 /delete_old_cluster.bat
 /reindex_hash.sql
 /loadable_libraries.txt
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 0360c37bf9..44d06be5a6 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -44,7 +44,7 @@ uninstall:
 
 clean distclean maintainer-clean:
 	rm -f pg_upgrade$(X) $(OBJS)
-	rm -rf analyze_new_cluster.sh delete_old_cluster.sh log/ tmp_check/ \
+	rm -rf delete_old_cluster.sh log/ tmp_check/ \
 	   loadable_libraries.txt reindex_hash.sql \
 	   pg_upgrade_dump_globals.sql \
 	   pg_upgrade_dump_*.custom pg_upgrade_*.log
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 2f7aa632c5..8538b076aa 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -231,21 +231,22 @@ issue_warnings_and_set_wal_level(void)
 
 
 void
-output_completion_banner(char *analyze_script_file_name,
-		 char *deletion_script_file_name)
+output_completion_banner(char *deletion_script_file_name)
 {
-	/* Did we copy the free space files? */
-	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
-		pg_log(PG_REPORT,
-			   "Optimizer statistics are not transferred by pg_upgrade so,\n"
-			   "once you start the new server, consider running:\n"
-			   "%s\n\n", analyze_script_file_name);
-	else
-		pg_log(PG_REPORT,
-			   "Optimizer statistics and free space information are not transferred\n"
-			   "by pg_upgrade so, once you start the new server, consider running:\n"
-			   "%s\n\n", analyze_script_file_name);
+	PQExpBufferData user_specification;
+
+	initPQExpBuffer(&user_specification);
+	if (os_info.user_specified)
+	{
+		appendPQExpBufferStr(&user_specification, "-U ");
+		appendShellString(&user_specification, os_info.user);
+		appendPQExpBufferChar(&user_specification, ' ');
+	}
 
+	pg_log(PG_REPORT,
+		   "Optimizer statistics are not transferred by pg_upgrade so,\n"
+		   "once you start the new server, consider running:\n"
+		   "%s/vacuumdb %s--all --analyze-in-stages\n\n", new_cluster.bindir, user_specification.data);
 
 	if (deletion_script_file_name)
 		pg_log(PG_REPORT,
@@ -258,6 +259,8 @@ output_completion_banner(char *analyze_script_file_name,
 			   "because user-defined tablespaces or the new cluster's data directory\n"
 			   "exist in the old cluster directory.  The old cluster's contents must\n"
 			   "be deleted manually.\n");
+
+	termPQExpBuffer(&user_specification);
 }
 
 
@@ -452,96 +455,6 @@ check_databases_are_compatible(void)
 }
 
 
-/*
- * create_script_for_cluster_analyze()
- *
- *	This incrementally generates better optimizer statistics
- */
-void
-create_script_for_cluster_analyze(char **analyze_script_file_name)
-{
-	FILE	   *script = NULL;
-	PQExpBufferData user_specification;
-
-	prep_status("Creating script to analyze new cluster");
-
-	initPQExpBuffer(&user_specification);
-	if (os_info.user_specified)
-	{
-		appendPQExpBufferStr(&user_specification, "-U ");
-		appendShellString(&user_specification, os_info.user);
-		appendPQExpBufferChar(&user_specification, ' ');
-	}
-
-	*analyze_script_file_name = psprintf("%sanalyze_new_cluster.%s",
-		 SCRIPT_PREFIX, SCRIPT_EXT);
-
-	if ((script = fopen_priv(*analyze_script_file_name, "w")) == NULL)
-		pg_fatal("could not open file \"%s\": %s\n",
- *analyze_script_file_name, strerror(errno));
-
-#ifndef WIN32
-	/* add shebang header */
-	fprintf(script, "#!/bin/sh\n\n");
-#else
-	/* suppress command echoing */
-	fprintf(script, "@echo off\n");
-#endif
-
-	fprintf(script, "echo %sThis script will generate minimal optimizer statistics rapidly%s\n",
-			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %sso your system is usable, and then gather statistics twice more%s\n",
-			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %swith increasing accuracy.  When it is done, your system will%s\n",
-			ECHO_QUOTE, ECHO_QUOTE);

RE: [PoC] Non-volatile WAL buffer

2020-10-06 Thread Takashi Menjo
Hi Gang,

I have tried to reproduce the performance degradation you reported when
inserting 328-byte records, but have not been able to yet. So I think your
conditions and mine are different, such as the steps to reproduce,
postgresql.conf, installation setup, and so on.

My results and conditions are as follows. May I have your conditions in
more detail? Note that I refer to your "Storage over App Direct" as my
"Original (PMEM)" and to your "NVWAL patch" as my "Non-volatile WAL buffer."

Best regards,
Takashi


# Results
See the attached figure. In short, Non-volatile WAL buffer got better 
performance than Original (PMEM).

# Steps
Note that I ran the postgres server and pgbench on a single-machine system,
but on two separate NUMA nodes. The PMEM and PCIe SSD for the server process
are on the server-side NUMA node.

01) Create a PMEM namespace (sudo ndctl create-namespace -f -t pmem -m fsdax -M 
dev -e namespace0.0)
02) Make an ext4 filesystem for PMEM then mount it with DAX option (sudo 
mkfs.ext4 -q -F /dev/pmem0 ; sudo mount -o dax /dev/pmem0 /mnt/pmem0)
03) Make another ext4 filesystem for PCIe SSD then mount it (sudo mkfs.ext4 -q 
-F /dev/nvme0n1 ; sudo mount /dev/nvme0n1 /mnt/nvme0n1)
04) Make /mnt/pmem0/pg_wal directory for WAL
05) Make /mnt/nvme0n1/pgdata directory for PGDATA
06) Run initdb (initdb --locale=C --encoding=UTF8 -X /mnt/pmem0/pg_wal ...)
- Also give -P /mnt/pmem0/pg_wal/nvwal -Q 81920 in the case of Non-volatile 
WAL buffer
07) Edit postgresql.conf as the attached one
- Please remove nvwal_* lines in the case of Original (PMEM)
08) Start postgres server process on NUMA node 0 (numactl -N 0 -m 0 -- pg_ctl 
-l pg.log start)
09) Create a database (createdb --locale=C --encoding=UTF8)
10) Initialize pgbench tables with s=50 (pgbench -i -s 50)
11) Change the length of the "filler" column of the "pgbench_history" table to 300 characters
(ALTER TABLE pgbench_history ALTER filler TYPE character(300);)
- This would make the row size of the table 328 bytes
12) Stop the postgres server process (pg_ctl -l pg.log -m smart stop)
13) Remount the PMEM and the PCIe SSD
14) Start postgres server process on NUMA node 0 again (numactl -N 0 -m 0 -- 
pg_ctl -l pg.log start)
15) Run pg_prewarm on all four pgbench_* tables
16) Run pgbench on NUMA node 1 for 30 minutes (numactl -N 1 -m 1 -- pgbench -r 
-M prepared -T 1800 -c __ -j __)
- It executes the default tpcb-like transactions

I repeated all the steps three times for each (c,j) pair, then took the median
"tps = __ (including connections establishing)" of the three runs as the
throughput, and the "latency average = __ ms" of that run as the average latency.

# Environment variables
export PGHOST=/tmp
export PGPORT=5432
export PGDATABASE="$USER"
export PGUSER="$USER"
export PGDATA=/mnt/nvme0n1/pgdata

# Setup
- System: HPE ProLiant DL380 Gen10
- CPU: Intel Xeon Gold 6240M x2 sockets (18 cores per socket; HT disabled by 
BIOS)
- DRAM: DDR4 2933MHz 192GiB/socket x2 sockets (32 GiB per channel x 6 channels 
per socket)
- Optane PMem: Apache Pass, AppDirect Mode, DDR4 2666MHz 1.5TiB/socket x2 
sockets (256 GiB per channel x 6 channels per socket; interleaving enabled)
- PCIe SSD: DC P4800X Series SSDPED1K750GA
- Distro: Ubuntu 20.04.1
- C compiler: gcc 9.3.0
- libc: glibc 2.31
- Linux kernel: 5.7 (vanilla)
- Filesystem: ext4 (DAX enabled when using Optane PMem)
- PMDK: 1.9
- PostgreSQL (Original): 14devel (200f610: Jul 26, 2020)
- PostgreSQL (Non-volatile WAL buffer): 14devel (200f610: Jul 26, 2020) + 
non-volatile WAL buffer patchset v4

-- 
Takashi Menjo 
NTT Software Innovation Center

> -Original Message-
> From: Takashi Menjo 
> Sent: Thursday, September 24, 2020 2:38 AM
> To: Deng, Gang 
> Cc: pgsql-hack...@postgresql.org; Takashi Menjo 
> 
> Subject: Re: [PoC] Non-volatile WAL buffer
> 
> Hello Gang,
> 
> Thank you for your report. I have not taken care of record size deeply yet, 
> so your report is very interesting. I will
> also have a test like yours then post results here.
> 
> Regards,
> Takashi
> 
> 
2020-09-21 (Mon) 14:14 Deng, Gang   >:
> 
> 
>   Hi Takashi,
> 
> 
> 
>   Thank you for the patch and work on accelerating PG performance with 
> NVM. I applied the patch and made
> some performance test based on the patch v4. I stored database data files on 
> NVMe SSD and stored WAL file on
> Intel PMem (NVM). I used two methods to store WAL file(s):
> 
>   1.  Leverage your patch to access PMem with libpmem (NVWAL patch).
> 
>   2.  Access PMem with legacy filesystem interface, that means use 
> PMem as ordinary block device, no
> PG patch is required to access PMem (Storage over App Direct).
> 
> 
> 
>   I tried two insert scenarios:
> 
>   A. Insert small record (length of record to be inserted is 24 
> bytes), I think it is similar as your test
> 
>   B.  Insert large record (length of record to be inserted is 328 
> bytes)
> 
> 
> 
>   My original purpose is to see higher performance gain in scenario B as 
>

Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-06 Thread Bharath Rupireddy
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow  wrote:
>
> I have not found issues with partition tables (yet) or toast column values.
>

I think for toast column values there may not be a problem as each
parallel worker inserts toast column values individually.

But a problem may arise if a partitioned table has a foreign table as
a partition; I think we cannot allow parallelism for this case either,
but it's hard to determine ahead of time whether a table has a foreign
partition. (See [1] in copy.c)

>
> - Added support for before/after statement-level INSERT triggers
> (can't allow parallel workers to execute these)
>

I think we can allow parallelism for BEFORE statement-level triggers.
The leader can execute such a trigger and then go for parallel inserts.

How about before row, after row, instead of row, and new table (transition
table) type triggers?

[1]
else
{
/*
 * For partitioned tables, we may still be able to perform bulk
 * inserts.  However, the possibility of this depends on which types
 * of triggers exist on the partition.  We must disable bulk inserts
 * if the partition is a foreign table or it has any before row insert
 * or insert instead triggers (same as we checked above for the parent
 * table).  Since the partition's resultRelInfos are initialized only
 * when we actually need to insert the first tuple into them, we must
 * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
 * flag that we must later determine if we can use bulk-inserts for
 * the partition being inserted into.
 */
if (proute)
insertMethod = CIM_MULTI_CONDITIONAL;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Magnus Hagander
The attached patch adds a switch --no-instructions to initdb and
pg_upgrade, which prevents them from printing instructions about how to
start the cluster (initdb) or how to analyze and delete clusters
(pg_upgrade).

The use case for this is for example the debian or redhat package wrappers.
When these commands are run under those wrappers the printed instructions
are *wrong*. It's better in that case to exclude them, and let the wrapper
be responsible for printing the correct instructions.

I went with the name --no-instructions to have the same name for both
initdb and pg_upgrade. The downside is that "no-instructions" also causes
the scripts not to be written in pg_upgrade, which arguably is a different
thing. We could go with "--no-instructions" and "--no-scripts", but that
would leave the parameters different. I also considered "--no-next-step",
but that one didn't quite have the right ring to me. I'm happy for other
suggestions on the parameter names.
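
For example, a packaging wrapper could then run something like (with the
proposed switch; the paths below are just placeholders):

    initdb --no-instructions -D /path/to/data
    pg_upgrade --no-instructions -b /old/bin -B /new/bin -d /old/data -D /new/data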

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..995d78408e 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -275,6 +275,19 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-instructions
+  
+   
+By default, initdb will write instructions for how
+to start the cluster at the end of its output. This option causes
+those instructions to be left out. This is primarily intended for use
+by tools that wrap initdb in platform specific
+behavior, where those instructions are likely to be incorrect.
+   
+  
+ 
+
  
   --pwfile=filename
   
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index b59c5697a3..b4d0a5e8f1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -215,6 +215,20 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-instructions
+  
+   
+By default, pg_upgrade will write instructions and
+scripts for how to analyze the new and delete the old cluster at the
+end of its output. This option causes those scripts and instructions
+to be left out. This is primarily intended for use by tools that wrap
+pg_upgrade in platform specific behavior, where
+those instructions are likely to be incorrect.
+   
+  
+ 
+
  
   -?
   --help
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ee3bfa82f4..e6b5200ba5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -139,6 +139,7 @@ static const char *authmethodhost = NULL;
 static const char *authmethodlocal = NULL;
 static bool debug = false;
 static bool noclean = false;
+static bool noinstructions = false;
 static bool do_sync = true;
 static bool sync_only = false;
 static bool show_setting = false;
@@ -2294,6 +2295,7 @@ usage(const char *progname)
 	printf(_("  -L DIRECTORY  where to find the input files\n"));
 	printf(_("  -n, --no-cleando not clean up after errors\n"));
 	printf(_("  -N, --no-sync do not wait for changes to be written safely to disk\n"));
+	printf(_("  --no-instructions don't print instructions for next steps\n"));
 	printf(_("  -s, --showshow internal settings\n"));
 	printf(_("  -S, --sync-only   only sync data directory\n"));
 	printf(_("\nOther options:\n"));
@@ -2953,6 +2955,7 @@ main(int argc, char *argv[])
 		{"no-clean", no_argument, NULL, 'n'},
 		{"nosync", no_argument, NULL, 'N'}, /* for backwards compatibility */
 		{"no-sync", no_argument, NULL, 'N'},
+		{"no-instructions", no_argument, NULL, 13},
 		{"sync-only", no_argument, NULL, 'S'},
 		{"waldir", required_argument, NULL, 'X'},
 		{"wal-segsize", required_argument, NULL, 12},
@@ -3093,6 +3096,9 @@ main(int argc, char *argv[])
 			case 12:
 str_wal_segment_size_mb = pg_strdup(optarg);
 break;
+			case 13:
+noinstructions = true;
+break;
 			case 'g':
 SetDataDirectoryCreatePerm(PG_DIR_MODE_GROUP);
 break;
@@ -3243,34 +3249,40 @@ main(int argc, char *argv[])
 		  "--auth-local and --auth-host, the next time you run initdb.\n"));
 	}
 
-	/*
-	 * Build up a shell command to tell the user how to start the server
-	 */
-	start_db_cmd = createPQExpBuffer();
+	if (!noinstructions)
+	{
+		/*
+		 * Build up a shell command to tell the user how to start the server
+		 */
+		start_db_cmd = createPQExpBuffer();
+
+		/* Get directory specification used to start initdb ... */
+		strlcpy(pg_ctl_path, argv[0], sizeof(pg_ctl_path));
+		canonicalize_path(pg_ctl_path);
+		get_parent_directory(pg_ctl_path);
+		/* ... and tag on pg_ctl instead */
+		join_path_components(pg_ctl_path, pg_ctl_path, "pg_ctl");
 
-	/* Get directory specification used to start

Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-06 Thread Greg Nancarrow
On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy
 wrote:

> But the problem may arise if a partitioned table has foreign table as
> a partition, I think we can not allow parallelism for this case too,
> but it's hard to determine ahead of time whether a table has a foreign
> partition.(See [1] in copy.c)
>

Thanks, I had seen that as a potential issue when scanning the code,
but had forgotten to note it. I'll check your code again.

> >
> > - Added support for before/after statement-level INSERT triggers
> > (can't allow parallel workers to execute these)
> >
>
> I think we can allow parallelism for before statement level-triggers.
> Leader can execute this trigger and go for parallel inserts.
>

My attached patch implements the before/after statement-level trigger
invocation.
(For the INSERT INTO ... SELECT ... case, it needs to account for parallel
and non-parallel INSERT, and also the fact that, as the patch
currently stands, the leader also participates in a parallel INSERT -
so I found it necessary to invoke those triggers at the Gather node
level in that case).

> How about before row, after row, instead row, new table type triggers?
>

My attached patch does not allow parallel INSERT if there are any
row-level triggers (as the trigger functions could see a different and
unpredictable table state compared to non-parallel INSERT, even if
otherwise parallel-safe).

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Yet another fast GiST build

2020-10-06 Thread Pavel Borisov
I've been running tests with the memory sanitizer and got another error in
the regression test create_index:
CREATE INDEX gpointind ON point_tbl USING gist (f1);
server closed the connection unexpectedly

with logfile:
gistproc.c:1714:28: runtime error: 1e+300 is outside the range of
representable values of type 'float'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior gistproc.c:1714:28

gistproc.c:
1714 z = point_zorder_internal(p->x, p->y);

Consider this a minor issue, unrelated to the other issues discussed. It
is reproducible on the latest master,
commit 0a3c864c32751fd29d021929cf70af421fd27370, after all the GiST changes
were committed.

cflags="-DUSE_VALGRIND -Og -O0 -fsanitize=address -fsanitize=undefined
-fno-sanitize-recover=all -fno-sanitize=alignment -fstack-protector"


-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Partitionwise join fails under GEQO

2020-10-06 Thread Ashutosh Bapat
On Mon, Oct 5, 2020 at 11:59 PM Tom Lane  wrote:
>
> Ashutosh Bapat  writes:
> > On Mon, Oct 5, 2020 at 6:47 AM Tom Lane  wrote:
> >> Now that I've seen this, I wonder whether add_child_join_rel_equivalences
> >> might not be making duplicate EC entries even without GEQO.  Is there any
> >> guarantee that it's not called repeatedly on related join-rel sets?
>
> > build_child_join_rel() is the only caller of
> > add_child_join_rel_equivalences(). Before the first calls the later,
> > the first has an Assert
> >  899 Assert(!find_join_rel(root, joinrel->relids));
> > This means that for a given child join rel this function is called
> > only once implying that add_child_join_rel_equivalences() is called
> > only once for a given child join rel. Initially I thought that this is
> > enough to not add duplicate child members. But to me use of
> > bms_overlap() in that function looks doubtful.
>
> Yeah.  As soon as you get into three-or-more-way joins, this code will
> be called again, mentioning some of the same relids as at the lower
> join level.
>
> > On a related but different subject, I have been thinking on-off about
> > the need for expression translation while planning partitions.The
> > translated expressions differ only in the leaf-vars and even for most
> > of the partitioned tables really only the varno (assuming that
> > partitioned table and partitions will have same layouts when a large
> > number of partitions are created automatically.) If we can introduce a
> > new type of (polymorphic) Var node which represents not just the
> > parent Var but also child Vars as well. The whole need for expression
> > translation vanishes, reducing memory requirements by many folds.
>
> Actually, the reason I have been looking at equivclass.c is that I've
> been attacking the problem from a different direction.  I'm fairly
> unexcited about making the definition of Var looser as you suggest
> here --- I actually think that it's dangerously imprecise already,
> due to not accounting for the effects of outer joins.  However,
> we could avoid the growth in eclass members for large partition sets
> if we simply didn't store child eclass members, instead translating
> on-the-fly when we need to deal with child rels.  I have a patch
> about half done, but it won't be possible to determine the true
> performance implications of that idea until it's all done.  More
> later.

If we translate more than once, then every time someone comes searching for
an EC member we will leak memory in the planner memory context, which
already gets bloated because of the huge number of translations even
when done only once. So that needs to be done rather carefully. Of
course, we could save child EC members separately in the same EC and
search them using a hash or something to avoid multiple instances of the
same child EC member.

But this memory bloat isn't unique to ECs, though it shows up mostly
in ECs. We translate all the expressions, TLEs, quals and so on. That
consumes a lot of memory. The number of joins grows exponentially with
the number of relations being joined, and so does the number of child
joins. So we need some way to avoid expression translations where the
translated expressions differ only in leaf Var nodes, in order to
support a large number of partitions.

>
> Either approach would mean that add_child_join_rel_equivalences
> goes away entirely, or at least doesn't need to store new em_is_child
> entries anymore, causing the memory bloat issue to become moot.
> So maybe we should just fix the wrong-context issue for now, and
> live with the GEQO bloat in the back branches.

Yes, I agree with that. For now your patch fixing the wrong context
issue is good enough.

-- 
Best Wishes,
Ashutosh Bapat




Re: enable_incremental_sort changes query behavior

2020-10-06 Thread James Coleman
On Mon, Oct 5, 2020 at 10:38 PM James Coleman  wrote:
>
> On Mon, Oct 5, 2020 at 11:33 AM James Coleman  wrote:
> >
> > On Sun, Oct 4, 2020 at 9:40 PM James Coleman  wrote:
> > >
> > > On Sat, Oct 3, 2020 at 10:10 PM James Coleman  wrote:
> > > >
> > > > On Sat, Oct 3, 2020 at 5:44 PM Tomas Vondra
> > > >  wrote:
> > > > >
> > > > > On Sat, Oct 03, 2020 at 10:50:06AM -0400, James Coleman wrote:
> > > > > >On Fri, Oct 2, 2020 at 11:16 PM James Coleman  
> > > > > >wrote:
> > > > > >>
> > > > > >> On Fri, Oct 2, 2020 at 7:07 PM James Coleman  
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > On Fri, Oct 2, 2020 at 6:28 PM Tomas Vondra
> > > > > >> >  wrote:
> > > > > >> > >
> > > > > >> > > On Fri, Oct 02, 2020 at 05:45:52PM -0400, James Coleman wrote:
> > > > > >> > > >On Fri, Oct 2, 2020 at 4:56 PM Tomas Vondra
> > > > > >> > > > wrote:
> > > > > >> > > >>
> > > > > >> > > >> ...
> > > > > >> > > >>
> > > > > >> > > >> More importanly, it does not actually fix the issue - it 
> > > > > >> > > >> does fix that
> > > > > >> > > >> particular query, but just replacing the DISTINCT with 
> > > > > >> > > >> either ORDER BY
> > > > > >> > > >> or GROUP BY make it fail again :-(
> > > > > >> > > >>
> > > > > >> > > >> Attached is a simple script I used, demonstrating these 
> > > > > >> > > >> issues for the
> > > > > >> > > >> three cases that expect to have ressortgroupref != 0 (per 
> > > > > >> > > >> the comment
> > > > > >> > > >> before TargetEntry in plannodes.h).
> > > > > >> > > >
> > > > > >> > > >So does checking for volatile expressions (if you happened to 
> > > > > >> > > >test
> > > > > >> > > >that) solve all the cases? If you haven't tested that yet, I 
> > > > > >> > > >can try
> > > > > >> > > >to do that this evening.
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > > Yes, it does fix all the three queries in the SQL script.
> > > > > >> > >
> > > > > >> > > The question however is whether this is the root issue, and 
> > > > > >> > > whether it's
> > > > > >> > > the right way to fix it. For example - volatility is not the 
> > > > > >> > > only reason
> > > > > >> > > that may block qual pushdown. If you look at 
> > > > > >> > > qual_is_pushdown_safe, it
> > > > > >> > > also blocks pushdown of leaky functions in security_barrier 
> > > > > >> > > views. So I
> > > > > >> > > wonder if that could cause failures too, somehow. But I 
> > > > > >> > > haven't managed
> > > > > >> > > to create such example.
> > > > > >> >
> > > > > >> > I was about to say that the issue here is slightly different 
> > > > > >> > from qual
> > > > > >> > etc. pushdown, since we're not concerned about quals here, and 
> > > > > >> > so I
> > > > > >> > wonder where we determine what target list entries to put in a 
> > > > > >> > given
> > > > > >> > scan path, but then I realized that implies (maybe!) a simpler
> > > > > >> > solution. Instead of duplicating checks on target list entries 
> > > > > >> > would
> > > > > >> > be safe, why not check directly in 
> > > > > >> > get_useful_pathkeys_for_relation()
> > > > > >> > whether or not the pathkey has a target list entry?
> > > > > >> >
> > > > > >> > I haven't been able to try that out yet, and so maybe I'm missing
> > > > > >> > something, but I need to step out for a bit, so I'll have to 
> > > > > >> > look at
> > > > > >> > it later.
> > > > > >>
> > > > > >> I've started poking at this, but haven't yet found a workable
> > > > > >> solution. See the attached patch which "solves" the problem by
> > > > > >> breaking putting the sort under the gather merge, but it at least
> > > > > >> demonstrates conceptually what I think we're interested in doing.
> > > > > >>
> > > > > >> The issue currently is that the comparison of expressions fails -- 
> > > > > >> in
> > > > > >> our query, the first column selected shows up as a Var (with the 
> > > > > >> same
> > > > > >> contents) but is a different pointer in the em_expr and the 
> > > > > >> reltarget
> > > > > >> exprs list. I don't yet see a good way to get the equivalence class
> > > > > >> for a PathTarget entry.
> > > > > >
> > > > > >Hmm, I think I was looking to do is the attached patch. I didn't
> > > > > >realize until I did a lot of reading through source that we have an
> > > > > >equal() function that can compare exprs. That (plus the realization 
> > > > > >in
> > > > > >[1] the originally reported backtrace wasn't where the error actually
> > > > > >came from) convinced me that what we need is to confirm not only that
> > > > > >the all of the vars in the ec member come from the relids in the rel
> > > > > >but also that the expr is actually represented in the target of the
> > > > > >rel.
> > > > > >
> > > > > >With the gucs changed as I mentioned earlier both of the plans (with
> > > > > >and without a volatile call in the 2nd select entry) now look like:
> > > > > >
> > > > > > Unique
> > > > > >   ->  Gather Merge
> > > > > > Workers Planned: 2
> > > > > > ->  Sort
> > 

Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Ian Lawrence Barwick
2020-10-06 (Tue) 19:26 Magnus Hagander :
>
> The attached patch adds a switch --no-instructions to initdb and pg_upgrade, 
> which prevents them from printing instructions about how to start the cluster 
> (initdb) or how to analyze and delete clusters (pg_upgrade).
>
> The use case for this is for example the debian or redhat package wrappers. 
> When these commands are run under those wrappers the printed instructions are 
> *wrong*. It's better in that case to exclude them, and let the wrapper be 
> responsible for printing the correct instructions.
>
> I went with the name --no-instructions to have the same name for both initdb 
> and pg_upgrade. The downside is that "no-instructions" also causes the 
> scripts not to be written in pg_upgrade, which arguably is a different thing. 
> We could go with "--no-instructions" and "--no-scripts", but that would leave 
> the parameters different. I also considered "--no-next-step", but that one 
> didn't quite have the right ring to me. I'm happy for other suggestions on 
> the parameter names.

As the switches are doing slightly different things (just omitting verbiage
versus omitting verbiage *and* not generating some script files), it seems
reasonable not to try and shoehorn them into using a unified but ambiguous
name. Particularly as they're intended to be used in scripts themselves,
it's not like it's important to create something that users can remember
easily for frequent use.

Alternatively, rather than describing what is not being done, the switch
could be called "--script-mode" or similar with a description along the
lines of "produces output suitable for execution by packaging scripts"
or something, and detail what's being omitted (or done differently)
in the documentation page.

Regards

Ian Barwick


-- 
EnterpriseDB: https://www.enterprisedb.com




Re: Logical Replication - detail message with names of missing columns

2020-10-06 Thread Amit Kapila
On Tue, Oct 6, 2020 at 12:14 PM Bharath Rupireddy
 wrote:
>
> On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila  wrote:
> >
> >
> > 3. The patch doesn't seem to be freeing the memory allocated for 
> > missingattsbuf.
> >
>
> I don't think we need to do that. We are passing missingattsbuf.data to 
> ereport and we are safe without freeing up missingattsbuf (we don't reach the 
> code after ereport(ERROR,...) as the table sync worker anyway goes away after 
> throwing the missing attributes error (if (sigsetjmp(local_sigjmp_buf, 1) != 0) 
> in StartBackgroundWorker and then proc_exit(1)). Each time a new table sync 
> bg worker is spawned.
>

Okay, by that logic, we don't even need to free memory for missingatts.

I have made a few changes, (a) moved free of missingatts in the caller
where we are allocating it. (b) added/edited/removed comments, (c) ran
pgindent.

Shall we backpatch this? I don't see any particular need to backpatch
this as this is a minor error message improvement and nobody has
reported any problem due to this. What do you think?

-- 
With Regards,
Amit Kapila.


v8-0001-Display-the-names-of-missing-columns-in-error-dur.patch
Description: Binary data


[doc] remove reference to pg_dump pre-8.1 switch behaviour

2020-10-06 Thread Ian Lawrence Barwick
Hi

The pg_dump doc page [1], under the -t/--table option, contains a Note
documenting the behavioural differences introduced in PostgreSQL 8.2.

As it's been almost exactly 14 years since that note was added [2], I suggest
it can be removed entirely.

[1] https://www.postgresql.org/docs/current/app-pgdump.html
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b


Regards

Ian Barwick


-- 
EnterpriseDB: https://www.enterprisedb.com




Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour

2020-10-06 Thread Ian Lawrence Barwick
2020-10-06 (Tue) 21:13 Ian Lawrence Barwick :
>
> Hi
>
> The pg_dump doc page [1], under the -t/--table option, contains a Note
> documenting the behavioural differences introduced in PostgreSQL 8.2.
>
> As it's been almost exactly 14 years since that note was added [2], I suggest
> it can be removed entirely.
>
> [1] https://www.postgresql.org/docs/current/app-pgdump.html
> [2] 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/ref/pg_dump.sgml;h=9aa4baf84e74817a3c3e8359b2c4c8a847fda987;hp=deafd7c9a989c2cbce3979d94416a298609f5e84;hb=24e97528631e7e810ce61fc0f5fbcaca0c001c4c;hpb=77d2b1b625c7decd7a25ec865bced3b927de6d4b


Oh yes, I was planning to attach an ultra-trivial patch for that too.


Regards

Ian Barwick
-- 
EnterpriseDB: https://www.enterprisedb.com
commit 80e4ab04401d38d4924dac153a53c9b2108f560c
Author: Ian Barwick 
Date:   Tue Oct 6 20:51:03 2020 +0900

doc: remove reference to pre-8.2 pg_dump behaviour

The behavioural change in the -t/--table option happened around 15
years ago and there seems little point in keeping it around.

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 1400cf8775..f99dc3f31e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -548,18 +548,6 @@ PostgreSQL documentation
 

 
-   
-
- The behavior of the -t switch is not entirely upward
- compatible with pre-8.2 PostgreSQL
- versions.  Formerly, writing -t tab would dump all
- tables named tab, but now it just dumps whichever one
- is visible in your default search path.  To get the old behavior
- you can write -t '*.tab'.  Also, you must write something
- like -t sch.tab to select a table in a particular schema,
- rather than the old locution of -n sch -t tab.
-
-   
   
  
 


Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Magnus Hagander
On Tue, Oct 6, 2020 at 1:49 PM Ian Lawrence Barwick 
wrote:

> 2020-10-06 (Tue) 19:26 Magnus Hagander :
> >
> > The attached patch adds a switch --no-instructions to initdb and
> pg_upgrade, which prevents them from printing instructions about how to
> start the cluster (initdb) or how to analyze and delete clusters
> (pg_upgrade).
> >
> > The use case for this is for example the debian or redhat package
> wrappers. When these commands are run under those wrappers the printed
> instructions are *wrong*. It's better in that case to exclude them, and let
> the wrapper be responsible for printing the correct instructions.
> >
> > I went with the name --no-instructions to have the same name for both
> initdb and pg_upgrade. The downside is that "no-instructions" also causes
> the scripts not to be written in pg_upgrade, which arguably is a different
> thing. We could go with "--no-instructions" and "--no-scripts", but that
> would leave the parameters different. I also considered "--no-next-step",
> but that one didn't quite have the right ring to me. I'm happy for other
> suggestions on the parameter names.
>
> As the switches are doing slightly different things (just omitting verbiage
> versus omitting verbiage *and* not generating some script files) it
> seems reasonable
> not to try and shoehorn them into a using a unified but ambiguous name
> name. Particularly as they're intended to be used in scripts themselves, so
> it's not like it's important to create something that users can remember
> easily for frequent use.
>

True.



> Alternatively, rather than describing what is not being done, the switch
> could be called "--script-mode" or similar with a description along the
> lines of "produces output suitable for execution by packaging scripts"
> or something, and detail what's being omitted (or done differently)
> in the documentation page.
>

Hmm, I'm less sure about that one. One could argue that this should then
also affect other things, like password prompting, to fulfill its name.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


[doc] clarify behaviour of pg_dump's -t/--table option with non-tables

2020-10-06 Thread Ian Lawrence Barwick
Hi

Recently I ran into a case where someone was wondering why it was not
possible to dump the contents of a view, even though the documentation [1]
seems to imply this is possible.

Currently it says:

  Dump only tables with names matching pattern. For this purpose, "table"
  includes views, materialized views, sequences, and foreign tables.

The attached patch attempts to clarify that only definitions of those objects
will be dumped, and also mentions that dumping foreign table data requires the
--include-foreign-data option.

I suggest backpatching any changes to Pg13 where the --include-foreign-data
option was added.

[1] https://www.postgresql.org/docs/current/app-pgdump.html
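
For example (the object and server names below are made up), the first
command dumps only the view's definition, never its contents, and the
second is needed to also get the foreign table's data:

    pg_dump -t myschema.myview mydb
    pg_dump -t myschema.my_foreign_tbl --include-foreign-data=myserver mydb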


Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com
commit 7240af7e6b7168ff1ad1790c6537b29f93fb6891
Author: Ian Barwick 
Date:   Tue Oct 6 21:40:37 2020 +0900

doc: clarify behaviour of pg_dump's -t/--table option

The existing wording gives the impression it might be possible to
dump the contents of views, which is not possible. Also, from PostgreSQL
13 it is possible to dump the contents of foreign tables, but this
requires the --include-foreign-data option.

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index f99dc3f31e..4e2c80983e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -517,9 +517,7 @@ PostgreSQL documentation
   

 Dump only tables with names matching
-pattern.
-For this purpose, table includes views, materialized views,
-sequences, and foreign tables.  Multiple tables
+pattern. Multiple tables
 can be selected by writing multiple -t switches.  The
 pattern parameter is
 interpreted as a pattern according to the same rules used by
@@ -531,6 +529,14 @@ PostgreSQL documentation
  below.

 
+   
+As well as tables, this option can be used to dump views, materialized views,
+foreign tables, and sequence definitions.  However it will not dump the contents
+of views or materialized views, and the contents of foreign tables will only be
+dumped if the corresponding foreign server is specified with
+--include-foreign-data.
+   
+

 The -n and -N switches have no effect when
 -t is used, because tables selected by -t will


Re: Logical Replication - detail message with names of missing columns

2020-10-06 Thread Bharath Rupireddy
On Tue, Oct 6, 2020 at 5:24 PM Amit Kapila  wrote:
>
> On Tue, Oct 6, 2020 at 12:14 PM Bharath Rupireddy
>  wrote:
> >
> > On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila  wrote:
> > >
> > >
> > > 3. The patch doesn't seem to be freeing the memory allocated for 
> > > missingattsbuf.
> > >
> >
> > I don't think we need to do that. We are passing missingattsbuf.data to 
> > ereport and we are safe without freeing up missingattsbuf (we don't reach 
> > the code after ereport(ERROR,...) as the table sync worker anyway goes away 
> > after throwing the missing attributes error (if (sigsetjmp(local_sigjmp_buf, 1) 
> > != 0) in StartBackgroundWorker and then proc_exit(1)). Each time a new 
> > table sync bg worker is spawned.
> >
>
> Okay, by that logic, we don't even need to free memory for missingatts.
>
> I have made a few changes, (a) moved free of missingatts in the caller
> where we are allocating it. (b) added/edited/removed comments, (c) ran
> pgindent.
>

Thanks Amit. v8 patch looks good to me.

>
> Shall we backpatch this? I don't see any particular need to backpatch
> this as this is a minor error message improvement and nobody has
> reported any problem due to this. What do you think?
>

IMO, no backpatch is required as this is not a bug or something
reported by anyone.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Yet another fast GiST build

2020-10-06 Thread Heikki Linnakangas

On 06/10/2020 14:05, Pavel Borisov wrote:

I've been running tests with the memory sanitizer


Thanks!

and got another error 
in the regression test create_index:

CREATE INDEX gpointind ON point_tbl USING gist (f1);
server closed the connection unexpectedly

with logfile:
gistproc.c:1714:28: runtime error: 1e+300 is outside the range of 
representable values of type 'float'

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior gistproc.c:1714:28

gistproc.c:
1714     z = point_zorder_internal(p->x, p->y);

Consider this a minor issue, unrelated to the other issues discussed. 
It is reproducible on the latest master, 
commit 0a3c864c32751fd29d021929cf70af421fd27370, after all the GiST changes 
were committed.


cflags="-DUSE_VALGRIND -Og -O0 -fsanitize=address -fsanitize=undefined 
-fno-sanitize-recover=all -fno-sanitize=alignment -fstack-protector"


You get the same error with:

select (float8 '1e+300')::float4;

float.c:1204:11: runtime error: 1e+300 is outside the range of 
representable values of type 'float'

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior float.c:1204:11 in

It boils down to casting a C double to float, when the value doesn't fit 
in float. I'm surprised that's undefined behavior, but I'm no C99 
lawyer. The code in dtof() expects it to yield Inf.
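
A standalone reproduction outside PostgreSQL, just to show it's the plain
double-to-float conversion that's being flagged (compile with the
float-cast-overflow check enabled, e.g. -fsanitize=float-cast-overflow):

    #include <stdio.h>

    int
    main(void)
    {
        volatile double d = 1e300;
        float f = (float) d;          /* the conversion the sanitizer flags */

        printf("%g\n", (double) f);   /* typically prints inf */
        return 0;
    }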


I'm inclined to shrug this off and say that the sanitizer is being 
over-zealous. Is there some compiler flag we should be setting, to tell 
it that we require specific behavior? Any other ideas?


- Heikki




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-06 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 02:34:58PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:
> > On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
> >> Maybe we could add a new hook for only queryid computation, and add a
> >> GUC to let people choose between no queryid computed, core computation
> >> (current pg_stat_statement) and 3rd party plugin?
> > 
> > That all seems very complicated.  If we go in that direction, I suggest
> > we just give up getting any of this into core.
> 
> A GUC would have at least the advantage to make the computation
> consistent for any system willing to consume it, with the option to
> not pay any potential performance impact, though I have to admit that
> just moving the query ID computation of PGSS into core may not be the
> best option as a query ID of 0 means the same thing for a utility, for
> an initialization, and for a backend running a query with an unknown
> value, but that could be worked out.
> 
> FWIW, I think that adding the system ID in the hash is too
> restrictive, as it could be interesting for users to do stat
> comparisons across multiple systems running the same major version.
> It would be better to not give any strong guarantee that the query ID
> computed will remain consistent across major versions so as it is
> possible to keep improving it.  Also, if nothing has been done that
> changes the hashing computation, I see little benefit in forcing a
> breakage by adding something like PG_MAJORVERSION_NUM or such in the
> hash computation.

I thought some more about this.  First, I think having the queryid hash
code in the server, without requiring pg_stat_statements, is a
requirement --- I think too many people will want to use this feature
independent of pg_stat_statements.  Second, I understand the desire to
have different hash computation methods, depending on what level of
detail/matching you want.

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgresql.conf
variable that lets you control how detailed the queryid hash is
computed.  This addresses the problem of people wanting different hash
methods.

When computing a hash, the queryid detail level and version number will
be mixed into the hash, so only a hash that used a similar query and
identical queryid detail level would match.
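
In postgresql.conf that might end up looking something like this (the
parameter name and values here are placeholders to illustrate the idea,
not a concrete proposal):

    # hypothetical setting controlling how the in-core query ID is computed
    # illustrative values: off, normal, verbose
    queryid_detail = normal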

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: enable_incremental_sort changes query behavior

2020-10-06 Thread Jaime Casanova
On Tue, 6 Oct 2020 at 06:46, James Coleman  wrote:
>
> > All right, here's a modified patch. I did a bit of surgery: the
> > concept is the same, but I decided to explicitly not the parallels to
> > (and follow as possible) prepare_sort_from_pathkeys(). That means we
> > only have to do the expression equality check if the expr is volatile,
> > and the relids bms check doesn't have to bail if it's empty.
> >
> > This properly allows us to push the sort all the way down to the base
> > rel when the only things in the base rel are referenced (including
> > stable expressions) but pushes the sort up to the upper rel when a
> > volatile expression is used (technically it would be safe for it to go
> > lower, but in the current planner that's the only time it appears in
> > the target list, so allowing that would be a larger functional
> > change).
> >
> > I left the questions patch here, though obviously it doesn't need to
> > be committed. If someone has answers to the questions, I'd be
> > interested though, but I don't think it affects the correctness of
> > this patch.
>
> And with a typo fixed.
>

Hi James,

Thanks for working on this, I haven't tried the latest patch but the
previous single patch worked fine in the cases you and Tomas found and
with my original example.

I haven't been able to reproduce the same problem with other queries
(I tried RLS and views with barriers; I also wanted to test with leaky
functions but haven't had time yet).

Can you please create an entry in the commitfest for this one so we
don't lose track of it?

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: enable_incremental_sort changes query behavior

2020-10-06 Thread James Coleman
On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
 wrote:
> Can you please create an entry in the commitfest for this one so we
> don't lose track of it?

Done.




Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-06 Thread Bharath Rupireddy
On Tue, Oct 6, 2020 at 4:13 PM Greg Nancarrow  wrote:
>
> On Tue, Oct 6, 2020 at 9:10 PM Bharath Rupireddy
>  wrote:
>
> > But the problem may arise if a partitioned table has foreign table as
> > a partition, I think we can not allow parallelism for this case too,
> > but it's hard to determine ahead of time whether a table has a foreign
> > partition.(See [1] in copy.c)
> >
>
> Thanks, I had seen that as a potential issue when scanning the code,
> but had forgotten to note it. I'll check your code again.
>

In the parallel copy patch, we are not doing anything (for the same reason
explained in the above comment) to find out whether there is a foreign
partition while deciding between parallel and non-parallel copy; we just
throw an error during the first tuple insertion into the partition.

errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
OF triggers, or if the partition is foreign partition"),
errhint("Try COPY without PARALLEL option")));

> > >
> > > - Added support for before/after statement-level INSERT triggers
> > > (can't allow parallel workers to execute these)
> > >
> >
> > I think we can allow parallelism for before statement level-triggers.
> > Leader can execute this trigger and go for parallel inserts.
> >
>
> My attached patch implements the before/after statement-level trigger
> invocation.
> (For INSERT INTO ... SELECT... case, it needs to account for parallel
> and non-parallel INSERT, and also the fact that, as the patch
> currently stands, the leader also participates in a parallel INSERT -
> so I found it necessary to invoke those triggers at the Gather node
> level in that case).
>

Allowing the leader to execute before-statement triggers at the Gather
node level, before invoking the parallel plan and then the parallel
inserts, makes sense. But if there are any after-statement triggers,
transition tables may come into play (see Amit's findings under Case-1 in
[1]), and we must disable parallelism in that case.

[1] - 
https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [doc] clarify behaviour of pg_dump's -t/--table option with non-tables

2020-10-06 Thread Ian Lawrence Barwick
On Tue, Oct 6, 2020 at 21:58 Ian Lawrence Barwick  wrote:
>
> Hi
>
> Recently I ran into a case where someone was wondering why it was not
> possible to dump the contents of a view, even though the documentation [1]
> seems to imply this is possible.
>
> Currently it says:
>
>   Dump only tables with names matching pattern. For this purpose, "table"
>   includes views, materialized views, sequences, and foreign tables.
>
> The attached patch attempts to clarify that only definitions of those objects
> will be dumped, and also mentions that dumping foreign table data requires the
> --include-foreign-data option.
>
> I suggest backpatching any changes to Pg13 where the --include-foreign-data
> option was added.
>
> [1] https://www.postgresql.org/docs/current/app-pgdump.html

Better version attached.


Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com
commit 4344e620620fa82d89bc9838d389cc345a7763ea
Author: Ian Barwick 
Date:   Tue Oct 6 21:40:37 2020 +0900

doc: clarify behaviour of pg_dump's -t/--table option

The existing wording gives the impression it might be possible to
dump the contents of views, which is not possible. Also, from PostgreSQL
13 it is possible to dump the contents of foreign tables, but this
requires the --include-foreign-data option.

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index f99dc3f31e..0aa35cf0c3 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -517,9 +517,7 @@ PostgreSQL documentation
   

 Dump only tables with names matching
-pattern.
-For this purpose, table includes views, materialized views,
-sequences, and foreign tables.  Multiple tables
+pattern. Multiple tables
 can be selected by writing multiple -t switches.  The
 pattern parameter is
 interpreted as a pattern according to the same rules used by
@@ -531,6 +529,14 @@ PostgreSQL documentation
  below.

 
+   
+As well as tables, this option can be used to dump the definition of matching
+views, materialized views, foreign tables, and sequences.  It will not dump the
+contents of views or materialized views, and the contents of foreign tables will
+only be dumped if the corresponding foreign server is specified with
+--include-foreign-data.
+   
+

 The -n and -N switches have no effect when
 -t is used, because tables selected by -t will


Re: [doc] clarify behaviour of pg_dump's -t/--table option with non-tables

2020-10-06 Thread Magnus Hagander
On Tue, Oct 6, 2020 at 3:45 PM Ian Lawrence Barwick 
wrote:

> On Tue, Oct 6, 2020 at 21:58 Ian Lawrence Barwick  wrote:
> >
> > Hi
> >
> > Recently I ran into a case where someone was wondering why it was not
> > possible to dump the contents of a view, even though the documentation
> [1]
> > seems to imply this is possible.
> >
> > Currently it says:
> >
> >   Dump only tables with names matching pattern. For this purpose, "table"
> >   includes views, materialized views, sequences, and foreign tables.
> >
> > The attached patch attempts to clarify that only definitions of those
> objects
> > will be dumped, and also mentions that dumping foreign table data
> requires the
> > --include-foreign-data option.
> >
> > I suggest backpatching any changes to Pg13 where the
> --include-foreign-data
> > option was added.
> >
> > [1] https://www.postgresql.org/docs/current/app-pgdump.html
>
> Better version attached.
>

Argh, perfect timing. I'll update with your new version :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [doc] clarify behaviour of pg_dump's -t/--table option with non-tables

2020-10-06 Thread Magnus Hagander
On Tue, Oct 6, 2020 at 2:59 PM Ian Lawrence Barwick 
wrote:

> Hi
>
> Recently I ran into a case where someone was wondering why it was not
> possible to dump the contents of a view, even though the documentation [1]
> seems to imply this is possible.
>
> Currently it says:
>
>   Dump only tables with names matching pattern. For this purpose, "table"
>   includes views, materialized views, sequences, and foreign tables.
>
> The attached patch attempts to clarify that only definitions of those
> objects
> will be dumped, and also mentions that dumping foreign table data requires
> the
> --include-foreign-data option.
>
> I suggest backpatching any changes to Pg13 where the --include-foreign-data
> option was added.
>

LGTM and agreed on the backpatch. Pushed.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-06 Thread Masahiko Sawada
On Fri, 2 Oct 2020 at 18:20, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > You proposed the first idea
> > to avoid such a situation that FDW implementor can write the code
> > while trying to reduce the possibility of errors happening as much as
> > possible, for example by usingpalloc_extended(MCXT_ALLOC_NO_OOM) and
> > hash_search(HASH_ENTER_NULL) but I think it's not a comprehensive
> > solution. They might miss, not know it, or use other functions
> > provided by the core that could lead an error.
>
> We can give the guideline in the manual, can't we?  It should not be 
> especially difficult for the FDW implementor compared to other Postgres's 
> extensibility features that have their own rules -- table/index AM, 
> user-defined C function, trigger function in C, user-defined data types, 
> hooks, etc.  And, the Postgres functions that the FDW implementor would use 
> to implement their commit will be very limited, won't they?  Because most of 
> the commit processing is performed in the resource manager's library (e.g. 
> Oracle and MySQL client library.)

Yeah, if we can assume FDW implementors properly implement these APIs
while following the guideline, then providing the guideline is a good
idea. But I’m not sure all FDW implementors are able to do that, and if a
user uses an FDW whose transaction APIs don’t follow the guideline, the
user won’t realize it. IMO it’s better to design the feature so that its
reliability (correctness?) doesn’t depend on external programs, although
I might be too worried.

>
>
> > Another idea is to use
> > PG_TRY() and PG_CATCH(). IIUC with this idea, FDW implementor catches
> > an error but ignores it rather than rethrowing by PG_RE_THROW() in
> > order to return the control to the core after an error. I’m really not
> > sure it’s a correct usage of those macros. In addition, after
> > returning to the core, it will retry to resolve the same or other
> > foreign transactions. That is, after ignoring an error, the core needs
> > to continue working and possibly call transaction callbacks of other
> > FDW implementations.
>
> No, not ignore the error.  The FDW can emit a WARNING, LOG, or NOTICE 
> message, and return an error code to TM.  TM can also emit a message like:
>
> WARNING:  failed to commit part of a transaction on the foreign server 'XXX'
> HINT:  The server continues to try committing the remote transaction.
>
> Then TM asks the resolver to take care of committing the remote transaction, 
> and acknowledge the commit success to the client.

It seems like, if resolution fails, the backend would return an
acknowledgment of COMMIT to the client and the resolver process would
resolve the foreign prepared transactions in the background. So we can
ensure that the distributed transaction is completed by the time the
client gets an acknowledgment of COMMIT only if the second phase of 2PC
completes successfully on the first attempt. OTOH, if it fails for
whatever reason, there is no such guarantee. From an optimistic
perspective, i.e., assuming failures are unlikely to happen, it will work
well, but IMO it’s not uncommon to fail to resolve foreign transactions
due to network issues, especially in an unreliable network environment
such as a geo-distributed database. So I think it will end up requiring
the client to check whether preceding distributed transactions have
completed in order to see the results of those transactions.

We could retry the foreign transaction resolution before leaving it to
the resolver process, but the problem still remains that the core keeps
trying to resolve foreign transactions after an error, without either
aborting the transaction or rethrowing.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Yet another fast GiST build

2020-10-06 Thread Tom Lane
Heikki Linnakangas  writes:
> You get the same error with:
> select (float8 '1e+300')::float4;
> float.c:1204:11: runtime error: 1e+300 is outside the range of 
> representable values of type 'float'
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior float.c:1204:11 in

> It boils down to casting a C double to float, when the value doesn't fit 
> in float. I'm surprised that's undefined behavior, but I'm no C99 
> lawyer. The code in dtof() expects it to yield Inf.

I think UBSan read C99 6.3.1.5:

   [#2]  When  a double is demoted to float or a long double to
   double or float, if the value being converted is outside the
   range  of  values  that  can be represented, the behavior is
   undefined.

and stopped reading at that point, which they should not have.
If you go on to read the surrounding portions,
you get a different picture of affairs.  If we're relying on IEEE
float semantics in other places, which we are, we're perfectly
entitled to assume that the cast will yield Inf (and a floating
point exception flag, which we ignore).  I think the "undefined"
here is just meant to say that there's no single behavior promised
across all possible C implementations.  They'd have been better to
write "implementation-defined", though.

> I'm inclined to shrug this off and say that the sanitizer is being 
> over-zealous. Is there some compiler flag we should be setting, to tell 
> it that we require specific behavior? Any other ideas?

If UBSan doesn't have a flag to tell it to assume IEEE math,
I'd say that makes it next door to worthless for our purposes.

regards, tom lane




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Tom Lane
Magnus Hagander  writes:
> The use case for this is for example the debian or redhat package wrappers.
> When these commands are run under those wrappers the printed instructions
> are *wrong*. It's better in that case to exclude them, and let the wrapper
> be responsible for printing the correct instructions.

Hm, does it matter?  I think those wrappers send the output to /dev/null
anyway.

regards, tom lane




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Magnus Hagander
On Tue, Oct 6, 2020 at 4:31 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > The use case for this is for example the debian or redhat package
> wrappers.
> > When these commands are run under those wrappers the printed instructions
> > are *wrong*. It's better in that case to exclude them, and let the
> wrapper
> > be responsible for printing the correct instructions.
>
> Hm, does it matter?  I think those wrappers send the output to /dev/null
> anyway.
>

The debian ones don't, because they consider it useful information to the
user. I'd say that it is, especially in the case of pg_upgrade. (Less so
about initdb, but still a bit -- especially when creating secondary
clusters)
The redhat ones I believe sent it to a logfile, not to /dev/null. Meaning
it's not in your face, but it still contains incorrect instructions.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Oct 6, 2020 at 4:31 PM Tom Lane  wrote:
>> Hm, does it matter?  I think those wrappers send the output to /dev/null
>> anyway.

> The debian ones don't, because they consider it useful information to the
> user. I'd say that it is, especially in the case of pg_upgrade. (Less so
> about initdb, but still a bit -- especially when creating secondary
> clusters)
> The redhat ones I believe sent it to a logfile, not to /dev/null. Meaning
> it's not in your face, but it still contains incorrect instructions.

OK.  FWIW, I'd vote for separate --no-instructions and --no-scripts
switches.

regards, tom lane




Re: Index Skip Scan (new UniqueKeys)

2020-10-06 Thread Dmitry Dolgov
> On Mon, Sep 21, 2020 at 05:59:32PM -0700, Peter Geoghegan wrote:
>
> * I see the following compiler warning:
>
> /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:
> In function ‘populate_baserel_uniquekeys’:
> /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13:
> warning: ‘expr’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>   797 |   else if (!list_member(unique_index->rel->reltarget->exprs, expr))
>   | ^~

This is mostly for the UniqueKeys patch, which is attached here only as a
dependency, but I'll prepare changes for that. Interestingly enough, I
can't reproduce this warning, but if I understand correctly gcc has some
history of spurious uninitialized warnings, so I guess it could be
version dependent.

> * Perhaps the warning is related to this nearby code that I noticed
> Valgrind complains about:
>
> ==1083468== VALGRINDERROR-BEGIN
> ==1083468== Invalid read of size 4
> ==1083468==at 0x59568A: get_exprs_from_uniqueindex (uniquekeys.c:771)
> ==1083468==by 0x593C5B: populate_baserel_uniquekeys (uniquekeys.c:140)

This also belongs to the UniqueKeys patch, but at least I can reproduce
this one. My guess is that nkeycolumns should be used there, not ncolumns,
which is visible in the index_including tests. Same as the previous one, I
will prepare corresponding changes.

> * Do we really need the AM-level boolean flag/argument named
> "scanstart"? Why not just follow the example of btgettuple(), which
> determines whether or not the scan has been initialized based on the
> current scan position?
>
> Just because you set so->currPos.buf to InvalidBuffer doesn't mean you
> cannot or should not take the same approach as btgettuple(). And even
> if you can't take exactly the same approach, I would still think that
> the scan's opaque B-Tree state should remember if it's the first call
> to _bt_skip() (rather than some subsequent call) in some other way
> (e.g. carrying a "scanstart" bool flag directly).

Yes, agree, carrying this flag inside the opaque state would be better.

> * Why is it okay to do anything important based on the
> _bt_scankey_within_page() return value?
>
> If the page is empty, then how can we know that it's okay to go to the
> next value? I'm concerned that there could be subtle bugs in this
> area. VACUUM will usually just delete the empty page. But it won't
> always do so, for a variety of reasons that aren't worth going into
> now. This could mask bugs in this area. I'm concerned about patterns
> like this one from _bt_skip():
>
> while (!nextFound)
> {
> 
>
> if (_bt_scankey_within_page(scan, so->skipScanKey,
> so->currPos.buf, dir))
> {
> ...
> }
> else
> /*
>  * If startItup could be not found within the current 
> page,
>  * assume we found something new
>  */
> nextFound = true;
> 
> }
>
> Why would you assume that "we found something new" here? In general I
> just don't understand the design of _bt_skip(). I get the basic idea
> of what you're trying to do, but it could really use better comments.

Yeah, I'll put more effort into clearer comments. There are two different
ways in which _bt_scankey_within_page is being used.

The first one is to check whether it's possible to skip traversal of the
tree from the root, in case what we're looking for could be on the current
page. In this case an empty page would mean we need to search from the
root, so I'm not sure what the issue could be here?

The second one (that you've highlighted above) is, I admit, probably the
most questionable part of the patch and open to suggestions on how to
improve it. It's required for one particular case with a cursor, where the
scan advances forward but reads backward. What could happen here is that
we find one valid item, but the next one, e.g., does not pass the scan key
conditions, and we end up with the previous item again. I'm not entirely
sure how the presence of an empty page could change this scenario; could
you please show an example?

> *The "jump one more time if it's the same as at the beginning" thing
> seems scary to me. Maybe you should be doing something with the actual
> high key here.

Same as for the previous question: can you give a hint as to what you mean
by "doing something with the actual high key"?




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 11:06:13AM -0400, Tom Lane wrote:
> Magnus Hagander  writes:
> > On Tue, Oct 6, 2020 at 4:31 PM Tom Lane  wrote:
> >> Hm, does it matter?  I think those wrappers send the output to /dev/null
> >> anyway.
> 
> > The debian ones don't, because they consider it useful information to the
> > user. I'd say that it is, especially in the case of pg_upgrade. (Less so
> > about initdb, but still a bit -- especially when creating secondary
> > clusters)
> > The redhat ones I believe sent it to a logfile, not to /dev/null. Meaning
> > it's not in your face, but it still contains incorrect instructions.
> 
> OK.  FWIW, I'd vote for separate --no-instructions and --no-scripts
> switches.

Works for me.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Yet another fast GiST build

2020-10-06 Thread Pavel Borisov
It became normal with
-fsanitize=signed-integer-overflow,null,alignment
instead of
-fsanitize=undefined
(which is just a default list of checks, some needed and some not; it can
be overridden anyway, but that needed some reading to figure out)

Thanks!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Partitionwise join fails under GEQO

2020-10-06 Thread Tom Lane
Ashutosh Bapat  writes:
> On Mon, Oct 5, 2020 at 11:59 PM Tom Lane  wrote:
>> ... we could avoid the growth in eclass members for large partition sets
>> if we simply didn't store child eclass members, instead translating
>> on-the-fly when we need to deal with child rels.  I have a patch
>> about half done, but it won't be possible to determine the true
>> performance implications of that idea until it's all done.  More
>> later.

> If we translate more than once, every time someone comes searching for
> an EC member, we will leak memory in the planner memory context, which
> anyway gets bloated because of the huge number of translations even
> when done only once. So that needs to be done rather carefully.

I'm not terribly concerned about that.  There's not a "huge number"
of such translations to be done --- it's more like one per index.
(Also, we could very possibly build the translations in a temp
context that gets reset every so often, if it becomes an issue.)
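
For instance, a rough sketch of that temp-context idea, using the standard
memory-context APIs (this is just the generic pattern, not actual planner
code):

    MemoryContext tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
                                                 "eclass translation scratch",
                                                 ALLOCSET_SMALL_SIZES);
    MemoryContext oldcxt = MemoryContextSwitchTo(tmpcxt);

    /* ... build the translated child EC member expressions here ... */

    MemoryContextSwitchTo(oldcxt);
    MemoryContextReset(tmpcxt);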

I am a bit worried about whether we'll be spending a lot more cycles
to do the added translations; though some of that should be bought
back by having fewer EC members to compare to.  In any event, testing
a working patch will be a lot more illuminating than speculation.

>> Either approach would mean that add_child_join_rel_equivalences
>> goes away entirely, or at least doesn't need to store new em_is_child
>> entries anymore, causing the memory bloat issue to become moot.
>> So maybe we should just fix the wrong-context issue for now, and
>> live with the GEQO bloat in the back branches.

> Yes, I agree with that. For now your patch fixing the wrong context
> issue is good enough.

Done for now.

regards, tom lane




Re: [HACKERS] Custom compression methods

2020-10-06 Thread Tomas Vondra

On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote:

On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra
 wrote:


On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote:
>On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra
> wrote:
>>
>> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote:
>> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra
>> > wrote:
>> >
>> >Thanks, Tomas for your feedback.
>> >
>> >> 9) attcompression ...
>> >>
>> >> The main issue I see is what the patch does with attcompression. Instead
>> >> of just using it to store a the compression method, it's also used to
>> >> store the preserved compression methods. And using NameData to store
>> >> this seems wrong too - if we really want to store this info, the correct
>> >> way is either using text[] or inventing charvector or similar.
>> >
>> >The reason for using the NameData is the get it in the fixed part of
>> >the data structure.
>> >
>>
>> Why do we need that? It's possible to have varlena fields with direct
>> access (see pg_index.indkey for example).
>
>I see.  While making it NameData I was thinking whether we have an
>option to direct access the varlena. Thanks for pointing me there. I
>will change this.
>
> Adding NameData just to make
>> it fixed-length means we're always adding 64B even if we just need a
>> single byte, which means ~30% overhead for the FormData_pg_attribute.
>> That seems a bit unnecessary, and might be an issue with many attributes
>> (e.g. with many temp tables, etc.).
>
>You are right.  Even I did not like to keep 64B for this, so I will change it.
>
>>
>> >> But to me this seems very much like a misuse of attcompression to track
>> >> dependencies on compression methods, necessary because we don't have a
>> >> separate catalog listing compression methods. If we had that, I think we
>> >> could simply add dependencies between attributes and that catalog.
>> >
>> >Basically, up to this patch, we are having only built-in compression
>> >methods and those can not be dropped so we don't need any dependency
>> >at all.  We just want to know what is the current compression method
>> >and what is the preserve compression methods supported for this
>> >attribute.  Maybe we can do it better instead of using the NameData
>> >but I don't think it makes sense to add a separate catalog?
>> >
>>
>> Sure, I understand what the goal was - all I'm saying is that it looks
>> very much like a workaround needed because we don't have the catalog.
>>
>> I don't quite understand how could we support custom compression methods
>> without listing them in some sort of catalog?
>
>Yeah for supporting custom compression we need some catalog.
>
>> >> Moreover, having the catalog would allow adding compression methods
>> >> (from extensions etc) instead of just having a list of hard-coded
>> >> compression methods. Which seems like a strange limitation, considering
>> >> this thread is called "custom compression methods".
>> >
>> >I think I forgot to mention while submitting the previous patch that
>> >the next patch I am planning to submit is, Support creating the custom
>> >compression methods wherein we can use pg_am catalog to insert the new
>> >compression method.  And for dependency handling, we can create an
>> >attribute dependency on the pg_am row.  Basically, we will create the
>> >attribute dependency on the current compression method AM as well as
>> >on the preserved compression methods AM.   As part of this, we will
>> >add two build-in AMs for zlib and pglz, and the attcompression field
>> >will be converted to the oid_vector (first OID will be of the current
>> >compression method, followed by the preserved compression method's
>> >oids).
>> >
>>
>> Hmmm, ok. Not sure pg_am is the right place - compression methods don't
>> quite match what I though AMs are about, but maybe it's just my fault.
>>
>> FWIW it seems a bit strange to first do the attcompression magic and
>> then add the catalog later - I think we should start with the catalog
>> right away. The advantage is that if we end up committing only some of
>> the patches in this cycle, we already have all the infrastructure etc.
>> We can reorder that later, though.
>
>Hmm,  yeah we can do this way as well that first create a new catalog
>table and add entries for these two built-in methods and the
>attcompression can store the oid vector.  But if we only commit the
>built-in compression methods part then does it make sense to create an
>extra catalog or adding these built-in methods to the existing catalog
>(if we plan to use pg_am).  Then in attcompression instead of using
>one byte for each preserved compression method, we need to use oid.  So
>from Robert's mail[1], it appeared to me that he wants that the
>built-in compression methods part should be independently committable
>and if we think from that perspective then adding a catalog doesn't
>make much sense.  But if we are planning to commit the custom method
>also then it makes more sense to directly sta

Re: [PATCH] Automatic HASH and LIST partition creation

2020-10-06 Thread Anastasia Lubennikova

On 06.10.2020 00:21, Pavel Borisov wrote:

Hi, hackers!
I added some extra tests for different cases of use of automatic 
partition creation.
v3-0002 can be applied on top of the original v2 patch for correct 
work with some corner cases with constraints included in this test.



Thank you for the tests. I've added them and the fix into the patch.

I also noticed that some table parameters, such as persistence, were not 
propagated to auto-generated partitions. This is fixed now. The test cases 
for temp and unlogged auto-partitioned tables are updated accordingly.
Besides, I slightly refactored the code and fixed documentation typos 
that were reported by Rahila.


With my recent changes, one test statement that you had added as failing 
now works.


CREATE TABLE list_parted_fail (a int) PARTITION BY LIST (a) CONFIGURATION
(VALUES IN ('1' collate "POSIX"));

It simply ignores the COLLATE "POSIX" part and creates a table with the 
following structure:



            Partitioned table "public.list_parted_fail"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |

Partition key: LIST (a)
Partitions: list_parted_fail_0 FOR VALUES IN (1)

Do you think that it is a bug? For now, I removed this statement from 
tests just to calm down the CI.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 4a387cfbd43e93b2c2e363307fb9c9ca53c3f56e
Author: anastasia 
Date:   Tue Oct 6 20:23:22 2020 +0300

Auto generated HASH and LIST partitions.

New syntax:

CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i)
CONFIGURATION (modulus 3);

CREATE TABLE tbl_list (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);

With tests and documentation draft

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 28f844071b..4501e81bb5 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( parent_table [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
+[ CONFIGURATION ( partition_bound_auto_spec ) ]
 [ USING method ]
 [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -41,6 +42,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
+[ CONFIGURATION ( partition_bound_auto_spec ) ]
 [ USING method ]
 [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -53,6 +55,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 [, ... ]
 ) ] { FOR VALUES partition_bound_spec | DEFAULT }
 [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
+[ CONFIGURATION ( partition_bound_auto_spec ) ]
 [ USING method ]
 [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -96,6 +99,11 @@ FROM ( { partition_bound_expr | MIN
   TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, REMAINDER numeric_literal )
 
+and partition_bound_auto_spec is:
+
+VALUES IN ( partition_bound_expr [, ...] ), [( partition_bound_expr [, ...] )] [, ...] [DEFAULT PARTITION default_part_name]
+MODULUS numeric_literal
+
 index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:
 
 [ INCLUDE ( column_name [, ... ] ) ]
@@ -383,6 +391,11 @@ WITH ( MODULUS numeric_literal, REM
   however, you can define these constraints on individual partitions.
  
 
+ 
+  Hash and list partitioning also support automatic creation of partitions
+  with an optional CONFIGURATION clause.
+
+
  
   See  for more discussion on table
   partitioning.
@@ -391,6 +404,42 @@ WITH ( MODULUS numeric_literal, REM
 

 
+   
+CONFIGURATION ( partition_bound_auto_spec ) ] 
+
+ 
+  The optional CONFIGURATION clause used together
+  with PARTITION BY specifies a rule of generating bounds
+  for partitions of the partitioned table. All partitions are created automatically
+  along with the parent table.
+  
+  Any indexes, constraints and user-defined row-level triggers that exist
+  in the parent table are cloned on the new partitions. When using this 

Re: A modest proposal: let's add PID to assertion failure messages

2020-10-06 Thread Andres Freund
On 2020-10-05 10:20:01 +1300, Thomas Munro wrote:
> On Mon, Oct 5, 2020 at 10:08 AM Tom Lane  wrote:
> > In these days when we run almost all test cases in parallel, it's
> > frequently not that easy to tie a "TRAP: ..." message in the log
> > to nearby log messages.  (The postmaster's subsequent complaint
> > often helps, but it could be some distance away in the log; and
> > good luck untangling things if more than one Assert failure happens
> > concurrently.)  We could add a simple bread crumb trail by
> > including the process's PID in such messages.  Any objections?
> 
> +1

+1




Re: Add primary keys to system catalogs

2020-10-06 Thread Andres Freund
Hi,

On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote:
> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for
> an existing index.  So this doesn't have to affect the low-level early
> bootstrapping.  The attached patch adds those commands manually. Another
> option might be to have the catalog generation Perl tooling create those
> declarations automatically from some marker in the catalog files.  That
> would also allow declaring unique constraints for the unique indexes that
> don't end up being the primary key.

Hm. What prevents us from declaring the pkey during bootstrap? I don't
at all like adding yet another place that needs manual editing when
doing DDL changes.

Greetings,

Andres Freund




Re: pg_upgrade dead code for pre-8.4 versions

2020-10-06 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 11:40:57AM +0200, Daniel Gustafsson wrote:
> > On 6 Oct 2020, at 11:27, Magnus Hagander  wrote:
> 
> > ISTM that code could be cleaned out, because it should be dead based on the 
> > initial check that we are upgrading from 8.4 or later?
> 
> +1.  Commit 2209b3923a7afe0b removed support for 8.3 so anything left checking
> for pre-8.4 should be removed as a pre-8.4 source cluster couldn't be upgraded
> with pg_upgrade today.

OK, fixed with the attached patch, applied to all branches.  There was
code that tested for >= 8.4 (useless test) or < 8.4 (dead code).  I
also adjusted the version tests to be consistent.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
new file mode 100644
index 89f3b0d..168058a
*** a/src/bin/pg_upgrade/check.c
--- b/src/bin/pg_upgrade/check.c
*** void
*** 234,251 
  output_completion_banner(char *analyze_script_file_name,
  		 char *deletion_script_file_name)
  {
! 	/* Did we copy the free space files? */
! 	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
! 		pg_log(PG_REPORT,
! 			   "Optimizer statistics are not transferred by pg_upgrade so,\n"
! 			   "once you start the new server, consider running:\n"
! 			   "%s\n\n", analyze_script_file_name);
! 	else
! 		pg_log(PG_REPORT,
! 			   "Optimizer statistics and free space information are not transferred\n"
! 			   "by pg_upgrade so, once you start the new server, consider running:\n"
! 			   "%s\n\n", analyze_script_file_name);
! 
  
  	if (deletion_script_file_name)
  		pg_log(PG_REPORT,
--- 234,243 
  output_completion_banner(char *analyze_script_file_name,
  		 char *deletion_script_file_name)
  {
! 	pg_log(PG_REPORT,
! 		   "Optimizer statistics are not transferred by pg_upgrade so,\n"
! 		   "once you start the new server, consider running:\n"
! 		   "%s\n\n", analyze_script_file_name);
  
  	if (deletion_script_file_name)
  		pg_log(PG_REPORT,
*** create_script_for_cluster_analyze(char *
*** 510,528 
  			ECHO_QUOTE, ECHO_QUOTE);
  	fprintf(script, "echo %sthis script and run:%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
! 	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
! 			new_cluster.bindir, user_specification.data,
! 	/* Did we copy the free space files? */
! 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
! 			"--analyze-only" : "--analyze", ECHO_QUOTE);
  	fprintf(script, "echo%s\n\n", ECHO_BLANK);
  
  	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
  			new_cluster.bindir, user_specification.data);
- 	/* Did we copy the free space files? */
- 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 803)
- 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
- user_specification.data);
  
  	fprintf(script, "echo%s\n\n", ECHO_BLANK);
  	fprintf(script, "echo %sDone%s\n",
--- 502,513 
  			ECHO_QUOTE, ECHO_QUOTE);
  	fprintf(script, "echo %sthis script and run:%s\n",
  			ECHO_QUOTE, ECHO_QUOTE);
! 	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all --analyze-only%s\n", ECHO_QUOTE,
! 			new_cluster.bindir, user_specification.data, ECHO_QUOTE);
  	fprintf(script, "echo%s\n\n", ECHO_BLANK);
  
  	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
  			new_cluster.bindir, user_specification.data);
  
  	fprintf(script, "echo%s\n\n", ECHO_BLANK);
  	fprintf(script, "echo %sDone%s\n",
diff --git a/src/bin/pg_upgrade/relfilenode.c b/src/bin/pg_upgrade/relfilenode.c
new file mode 100644
index af9a021..f76ddaa
*** a/src/bin/pg_upgrade/relfilenode.c
--- b/src/bin/pg_upgrade/relfilenode.c
*** transfer_single_new_db(FileNameMap *maps
*** 163,178 
  			/* transfer primary file */
  			transfer_relfile(&maps[mapnum], "", vm_must_add_frozenbit);
  
! 			/* fsm/vm files added in PG 8.4 */
! 			if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
! 			{
! /*
!  * Copy/link any fsm and vm files, if they exist
!  */
! transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);
! if (vm_crashsafe_match)
! 	transfer_relfile(&maps[mapnum], "_vm", vm_must_add_frozenbit);
! 			}
  		}
  	}
  }
--- 163,174 
  			/* transfer primary file */
  			transfer_relfile(&maps[mapnum], "", vm_must_add_frozenbit);
  
! 			/*
! 			 * Copy/link any fsm and vm files, if they exist
! 			 */
! 			transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);
! 			if (vm_crashsafe_match)
! transfer_relfile(&maps[mapnum], "_vm", vm_must_add_frozenbit);
  		}
  	}
  }


Re: pg_upgrade analyze script

2020-10-06 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 11:43:01AM +0200, Magnus Hagander wrote:
> For a long time now, pg_upgrade drops a script (analyze_new_cluster.sh) that
> just calls vacuumdb to run the analyze in stages. This script made a lot of
> sense back when it actually implemented the stages, but these days since it's
> just calling a single command, I think it's just unnecessary complication.
> 
> I suggest we drop it and just replace it with instructions to call vacuumdb
> directly.

Uh, that script has instructions on what is running.  Do we want to show
that at the end of running pg_upgrade or do people understand "stages"
by now?

> Attached patch does this. It also removes the support in the instructions that
> talk about pre-8.4 databases, which I believe is dead code per https://
> postgr.es/m/CABUevEx-D0PNVe00tkeQRGennZQwDtBJn=493mjt-x6sppb...@mail.gmail.com.

I just removed that.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add primary keys to system catalogs

2020-10-06 Thread Tom Lane
Andres Freund  writes:
> On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote:
>> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for
>> an existing index.  So this doesn't have to affect the low-level early
>> bootstrapping.  The attached patch adds those commands manually. Another
>> option might be to have the catalog generation Perl tooling create those
>> declarations automatically from some marker in the catalog files.  That
>> would also allow declaring unique constraints for the unique indexes that
>> don't end up being the primary key.

> Hm. What prevents us from declaring the pkey during bootstrap? I don't
> at all like adding yet another place that needs manual editing when
> doing DDL changes.

We don't add new catalogs often, so the cost-benefit ratio of automating
this looks pretty poor.  It's not like you'll be able to forget to do it,
given the proposed regression test check for catalogs without pkeys.

One thing I'm wondering about though is whether Peter's implementation
ends up with desirable pg_depend contents for the pkey constraints.
Probably we want them to just be pinned and not have any explicit
dependencies on the associated indexes, but I haven't thought hard
about it.

regards, tom lane




Re: Add primary keys to system catalogs

2020-10-06 Thread Andres Freund
Hi,

On 2020-10-06 15:31:16 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-10-03 08:40:31 +0200, Peter Eisentraut wrote:
> >> Since we have ADD PRIMARY KEY USING INDEX, we can declare a primary key for
> >> an existing index.  So this doesn't have to affect the low-level early
> >> bootstrapping.  The attached patch adds those commands manually. Another
> >> option might be to have the catalog generation Perl tooling create those
> >> declarations automatically from some marker in the catalog files.  That
> >> would also allow declaring unique constraints for the unique indexes that
> >> don't end up being the primary key.
> 
> > Hm. What prevents us from declaring the pkey during bootstrap? I don't
> > at all like adding yet another place that needs manual editing when
> > doing DDL changes.
> 
> We don't add new catalogs often, so the cost-benefit ratio of automating
> this looks pretty poor.

True, we don't create new ones that often. Still think that distributing
such setup over fewer places is good. And it's not like there's only a
handful of pkeys to start with. To me it makes more sense to add a
DECLARE_PRIMARY_KEY in indexing.h, replacing the relevant
DECLARE_UNIQUE_INDEX stanzas. Or, possibly preferable, do it as part of
the CATALOG() directly - which'd avoid needing the index statement in
the first place.
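
Something along these lines, say (DECLARE_PRIMARY_KEY is hypothetical
here, modeled on the existing DECLARE_UNIQUE_INDEX entries in indexing.h):

    /* hypothetical: mark this unique index as the catalog's primary key */
    DECLARE_PRIMARY_KEY(pg_class_oid_index, 2662, on pg_class using btree(oid oid_ops));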

The Catalog.pm part is trivial. It's a bit harder to implement actually
creating the constraint - but even the hard route of adding a new field
to Boot_DeclareIndexStmt or such wouldn't be particularly hard.


> One thing I'm wondering about though is whether Peter's implementation
> ends up with desirable pg_depend contents for the pkey constraints.
> Probably we want them to just be pinned and not have any explicit
> dependencies on the associated indexes, but I haven't thought hard
> about it.

That sounds right to me.


Wonder whether it's not time to move the DECLARE bits from indexing.h to
the individual catalogs, independent of what we're discussing here. With
todays Catalog.pm there's really not much point in keeping them
separate, imo.

Greetings,

Andres Freund




Re: pg_upgrade analyze script

2020-10-06 Thread Magnus Hagander
On Tue, Oct 6, 2020 at 9:05 PM Bruce Momjian  wrote:

> On Tue, Oct  6, 2020 at 11:43:01AM +0200, Magnus Hagander wrote:
> > For a long time now, pg_upgrade drops a script (analyze_new_cluster.sh)
> that
> > just calls vacuumdb to run the analyze in stages. This script made a lot
> of
> > sense back when it actually implemented the stages, but these days since
> it's
> > just calling a single command, I think it's just unnecessary
> complication.
> >
> > I suggest we drop it and just replace it with instructions to call
> vacuumdb
> > directly.
>
> Uh, that script has instructions on what is running.  Do we want to show
> that at the end of running pg_upgrade or do people understand "stages"
> by now?
>

I think that's information that belongs in the documentation, for those
that care. I'm willing to bet that the large majority of the people who run
the script never read that, and certainly don't cancel it. Those that need
it do their upgrade planning long before they get to that stage.

It's already decently documented on the vacuumdb page.  Maybe just add a
link to that from the pg_upgrade reference page (
https://www.postgresql.org/docs/current/pgupgrade.html under point 14)?


> Attached patch does this. It also removes the support in the instructions
> that
> > talk about pre-8.4 databases, which I believe is dead code per https://
> >
> postgr.es/m/CABUevEx-D0PNVe00tkeQRGennZQwDtBJn=493mjt-x6sppb...@mail.gmail.com
> .
>
> I just removed that.
>

Great, I'll simplify the patch if we agree that it's a good way forward,
and also add a doc reference per above.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pg_upgrade dead code for pre-8.4 versions

2020-10-06 Thread Magnus Hagander
On Tue, Oct 6, 2020 at 8:45 PM Bruce Momjian  wrote:

> On Tue, Oct  6, 2020 at 11:40:57AM +0200, Daniel Gustafsson wrote:
> > > On 6 Oct 2020, at 11:27, Magnus Hagander  wrote:
> >
> > > ISTM that code could be cleaned out, because it should be dead based
> on the initial check that we are upgrading from 8.4 or later?
> >
> > +1.  Commit 2209b3923a7afe0b removed support for 8.3 so anything left
> checking
> > for pre-8.4 should be removed as a pre-8.4 source cluster couldn't be
> upgraded
> > with pg_upgrade today.
>
> OK, fixed with the attached patch, applied to all branches.  There was
> code that tested for >= 8.4 (useless test) or < 8.4 (dead code).  I
> also adjusted the version tests to be consistent.
>

Thanks!

And yeah, I noticed the inconsistency in the checking as well. Getting that
cleaned up was good.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pg_upgrade analyze script

2020-10-06 Thread Bruce Momjian
On Tue, Oct  6, 2020 at 10:40:30PM +0200, Magnus Hagander wrote:
> I think that's information that belongs in the documentation, for those that
> care. I'm willing to bet that the large majority of the people who run the
> script never read that, and certainly don't cancel it. Those that need it do
> their upgrade planning long before they get to that stage.
> 
> It's already decently documented on the vacuumdb page.  Maybe just add a link
> to that from the pg_upgrade reference page (https://www.postgresql.org/docs/
> current/pgupgrade.html under point 14)?

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: pg_dump bug for extension owned tables

2020-10-06 Thread Tom Lane
Andrew Dunstan  writes:
> Thanks, Committed. Further investigation shows this was introduced in
> release 12, so that's how far back I went.

Still further investigation shows that this patch caused bug #16655 [1].
It should *not* have been designed to unconditionally clear the
table's "interesting" flag, as there may have been other reasons
why that was set.  The right way to think about it is "if we are
going to dump the table's data, then the table certainly needs its
interesting flag set, so that we'll collect the per-attribute info.
Otherwise leave well enough alone".

The patches I proposed in the other thread seem like they really ought
to go all the way back for safety's sake.  However, I do not observe
any crash on the test case in v11, and I'm kind of wondering why not.
Did you identify exactly where this was "introduced in release 12"?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/16655-5c92d6b3a9438137%40postgresql.org




Re: [HACKERS] logical decoding of two-phase transactions

2020-10-06 Thread Peter Smith
Hello Ajin.

I have gone through the v6 patch changes and have a list of review
comments below.

Apologies for the length of this email - I know that many of the
following comments are trivial, but I figured I should either just
ignore everything cosmetic, or list everything regardless. I chose the
latter.

There may be some duplication where the same review comment is written
for multiple files and/or where the same file is in your multiple
patches.

Kind Regards.
Peter Smith
Fujitsu Australia

[BEGIN]

==
Patch V6-0001, File: contrib/test_decoding/expected/prepared.out (so
prepared.sql also)
==

COMMENT
Line 30 - The INSERT INTO test_prepared1 VALUES (2); is kind of
strange because it is not really part of the prior test nor the
following test. Maybe it would be better to have a comment describing
the purpose of this isolated INSERT and to also consume the data from
the slot so it does not get jumbled with the data of the following
(abort) test.

;

COMMENT
Line 53 - Same comment for this test INSERT INTO test_prepared1 VALUES
(4); It kind of has nothing really to do with either the prior (abort)
test or the following (ddl) test.

;

COMMENT
Line 60 - Seems to check which locks are held for the test_prepared_1
table while the transaction is in progress. Maybe it would be better
to have more comments describing what is expected here and why.

;

COMMENT
Line 88 - There is a comment in the test saying "-- We should see '7'
before '5' in our results since it commits first." but I did not see
any test code that actually verifies that happens.

;

QUESTION
Line 120 - I did not really understand the SQL checking the pg_class.
I expected this would be checking table 'test_prepared1' instead. Can
you explain it?
SELECT 'pg_class' AS relation, locktype, mode
FROM pg_locks
WHERE locktype = 'relation'
AND relation = 'pg_class'::regclass;
relation | locktype | mode
--+--+--
(0 rows)

;

QUESTION
Line 139 - SET statement_timeout = '1s'; is 1 second short enough
here for this test, or might it be that these statements would be
completed in less than one second anyhow?

;

QUESTION
Line 163 - How is this testing a SAVEPOINT? Or is it only to check
that the SAVEPOINT command is not part of the replicated changes?

;

COMMENT
Line 175 - Missing underscore in the comment. The code also requires the underscore:
"nodecode" --> "_nodecode"

==
Patch V6-0001, File: contrib/test_decoding/test_decoding.c
==

COMMENT
Line 43
@@ -36,6 +40,7 @@ typedef struct
bool skip_empty_xacts;
bool xact_wrote_changes;
bool only_local;
+ TransactionId check_xid; /* track abort of this txid */
} TestDecodingData;

The "check_xid" seems a meaningless name. Check what?
IIUC maybe should be something like "check_xid_aborted"

;

COMMENT
Line 105
@ -88,6 +93,19 @@ static void
pg_decode_stream_truncate(LogicalDecodingContext *ctx,
 ReorderBufferTXN *txn,
 int nrelations, Relation relations[],
 ReorderBufferChange *change);
+static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,

Remove extra blank line after these functions

;

COMMENT
Line 149
@@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
 cb->stream_change_cb = pg_decode_stream_change;
 cb->stream_message_cb = pg_decode_stream_message;
 cb->stream_truncate_cb = pg_decode_stream_truncate;
+ cb->filter_prepare_cb = pg_decode_filter_prepare;
+ cb->prepare_cb = pg_decode_prepare_txn;
+ cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
+ cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
+
 }

There is a confusing mix of terminology where sometimes things are
referred to as ROLLBACK/rollback and other times apparently the same
operation is referred to as ABORT/abort. I do not know the root cause of
this mixture. IIUC maybe the internal functions and protocol generally
use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
where those two terms collide in the middle it gets quite confusing.

At least I thought the names of the "callbacks" which get exposed to
the user (e.g. in the help) might be better if they would match the
SQL.
"abort_prepared_cb" --> "rollback_prepared_db"

There are similar review comments like this below where the
alternating terms caused me some confusion.

~

Also, Remove the extra blank line before the end of the function.

;

COMMENT
Line 267
@ -227,6 +252,42 @@ pg_decode_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 strVal(elem->arg), elem->defname)));
 }
+ else if (strcmp(elem->defname, "two-phase-commit") == 0)
+ {
+ if (elem->arg == NULL)
+ continue;

IMO the "check-xid" code might be better rearranged so the NULL check
is first instead of if/else.
e.g.
if (elem->arg == NULL)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("check-xid needs an input value")));
~

Also, is it really supposed to be FATAL instead of ERROR? That is not
the same as 

Re: [doc] clarify behaviour of pg_dump's -t/--table option with non-tables

2020-10-06 Thread Ian Lawrence Barwick
On Tue, Oct 6, 2020 at 22:48 Magnus Hagander  wrote:
>
> On Tue, Oct 6, 2020 at 3:45 PM Ian Lawrence Barwick  wrote:
>>
>> On Tue, Oct 6, 2020 at 21:58 Ian Lawrence Barwick  wrote:
>> >
>> > Hi
>> >
>> > Recently I ran into a case where someone was wondering why it was not
>> > possible to dump the contents of a view, even though the documentation [1]
>> > seems to imply this is possible.
>> >
>> > Currently it says:
>> >
>> >   Dump only tables with names matching pattern. For this purpose, "table"
>> >   includes views, materialized views, sequences, and foreign tables.
>> >
>> > The attached patch attempts to clarify that only definitions of those 
>> > objects
>> > will be dumped, and also mentions that dumping foreign table data requires 
>> > the
>> > --include-foreign-data option.
>> >
>> > I suggest backpatching any changes to Pg13 where the --include-foreign-data
>> > option was added.
>> >
>> > [1] https://www.postgresql.org/docs/current/app-pgdump.html
>>
>> Better version attached.
>
>
> Argh, perfect timing. I'll update with your new version :)

Whoops, wasn't expecting such a quick response. Thanks!

FWIW, I sent in another patch suggesting removal of an ancient
backwards compatibility
"Note" under the same -t/--tables option description [1].

[1] 
https://www.postgresql.org/message-id/flat/CAB8KJ%3DjYHgnxLLZSNJz7gBTck4TxomngCmGkw3nEMSNF0yL6wA%40mail.gmail.com


Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-06 Thread Greg Nancarrow
On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
 wrote:
>
> In parallel, we are not doing anything(due to the same reason
> explained in above comment) to find whether there is a foreign
> partition or not while deciding to go with parallel/non-parallel copy,
> we are just throwing an error during the first tuple insertion into
> the partition.
>
> errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> OF triggers, or if the partition is foreign partition"),
> errhint("Try COPY without PARALLEL option")));
>

I may well need to do something similar for parallel INSERT, but I'm
kind of surprised it can't be detected earlier (?).
Will need to further test this.

>
> Allowing the leader to execute before statement triggers at Gather
> node level before invoking the parallel plan and then parallel inserts
> makes sense. But if there are any after statement triggers, there may
> come transition tables, see Amit's findings under Case-1 in [1] and we
> must disable parallelism in that case.
>
> [1] - 
> https://www.postgresql.org/message-id/flat/CAA4eK1%2BANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA%40mail.gmail.com
>

The patch I last posted for parallel INSERT does detect use of
transition tables in this case (trigdesc->trig_insert_new_table) and
disables INSERT parallelism (I tested it against Amit's example), yet
still otherwise allows AFTER STATEMENT triggers for parallel INSERT.
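
For reference, the check is conceptually along these lines (a simplified
sketch, not the exact patch code; trig_insert_new_table is the existing
TriggerDesc flag that indicates an INSERT transition table, and
parallel_insert_safe is just an illustrative variable name):

    /*
     * An AFTER STATEMENT trigger with a transition table makes the INSERT
     * parallel-unsafe, so fall back to a non-parallel plan in that case.
     */
    if (trigdesc != NULL && trigdesc->trig_insert_new_table)
        parallel_insert_safe = false;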

Regards,
Greg Nancarrow
Fujitsu Australia




Re: WIP: WAL prefetch (another approach)

2020-10-06 Thread Tomas Vondra

On Thu, Sep 24, 2020 at 11:38:45AM +1200, Thomas Munro wrote:

On Wed, Sep 9, 2020 at 11:16 AM Tomas Vondra
 wrote:

OK, thanks for looking into this. I guess I'll wait for an updated patch
before testing this further. The storage has limited capacity so I'd
have to either reduce the amount of data/WAL or juggle with the WAL
segments somehow. Doesn't seem worth it.


Here's a new WIP version that works for archive-based recovery in my tests.

The main change I have been working on is that there is now just a
single XLogReaderState, so no more double-reading and double-decoding
of the WAL.  It provides XLogReadRecord(), as before, but now you can
also read further ahead with XLogReadAhead().  The user interface is
much like before, except that the GUCs changed a bit.  They are now:

 recovery_prefetch=on
 recovery_prefetch_fpw=off
 wal_decode_buffer_size=256kB
 maintenance_io_concurrency=10

I recommend setting maintenance_io_concurrency and
wal_decode_buffer_size much higher than those defaults.



I think you've left the original GUC (replaced by the buffer size) in
the postgresql.conf.sample file. Confused me for a bit ;-)

I've done a bit of testing and so far it seems to work with WAL archive,
so I'll do more testing and benchmarking over the next couple days.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-06 Thread Michael Paquier
On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> I propose moving the pg_stat_statements queryid hash code into the
> server (with a version number), and also adding a postgresql.conf
> variable that lets you control how detailed the queryid hash is
> computed.  This addresses the problem of people wanting different hash
> methods.

In terms of making this part extensible in the future, there could be
a point in having an enum here, but are we sure that we will have a
need for that in the future?  What I get from this discussion is that
we want a unique source of truth that users can consume, and that the
only source of truth proposed is the PGSS hashing.  We may change the
way we compute the query ID in the future, for example if it gets
expanded to some utility statements, etc.  But that would be
controlled by the version number in the hash, not the GUC itself.

> When computing a hash, the queryid detail level and version number will
> be mixed into the hash, so only a hash that used a similar query and
> identical queryid detail level would match.

Yes, having a version number directly dependent on the hashing sounds
like a good compromise to me.
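
As a self-contained illustration of that versioned hashing (the constants
and the helper below are made up for the example, they are not the
pg_stat_statements code), folding the version in before the query jumble
means two builds that hash differently can never report the same ID by
accident:

#include <stdint.h>
#include <stddef.h>

#define QUERYID_VERSION 1		/* assumed version tag for the hashing rules */

/* toy FNV-1a style mixer, purely for illustration */
static uint64_t
mix_bytes(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++)
		h = (h ^ p[i]) * UINT64_C(1099511628211);
	return h;
}

uint64_t
compute_query_id(const void *jumble, size_t jumble_len)
{
	uint64_t	h = UINT64_C(14695981039346656037);
	uint32_t	version = QUERYID_VERSION;

	/* fold the version in first, then the jumbled query representation */
	h = mix_bytes(h, &version, sizeof(version));
	h = mix_bytes(h, jumble, jumble_len);
	return h;
}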
--
Michael




Re: Recent failures on buildfarm member hornet

2020-10-06 Thread Noah Misch
On Mon, Oct 05, 2020 at 08:42:15PM -0400, Tom Lane wrote:
> hornet has failed its last five runs with
> 
> 2020-10-05 22:45:42.784 UTC [34734498:40] pg_regress/create_aggregate LOG:  
> statement: create aggregate my_percentile_disc(float8 ORDER BY anyelement) (
> stype = internal,
> sfunc = ordered_set_transition,
> finalfunc = percentile_disc_final,
> finalfunc_extra = true,
> finalfunc_modify = read_write
>   );
> TRAP: FailedAssertion("variadicArgType != InvalidOid", File: 
> "pg_aggregate.c", Line: 216, PID: 34734498)
> 
> After looking at the commits immediately preceding the first failure, and
> digging around in the aggregate-related code, it seems like commit 
> cc99baa43 (Improve pg_list.h's linitial(), lsecond() and co macros)
> must've broke it somehow.  The nearest thing that I can see to a theory
> is that where DefineAggregate does
>   numDirectArgs = intVal(lsecond(args));
> it's coming out with the wrong result, leading to a failure of the
> numDirectArgs-vs-numArgs sanity check in AggregateCreate.

Building the tree with -O0 suppresses the problem.  (xlc does not have -O1.)
Building just aggregatecmds.c and pg_aggregate.c that way does not, so I
suppose the damage arose elsewhere.

> But how could
> that be?  I hesitate to blame the compiler twice in one week.  OTOH, it's
> a not-very-mainstream compiler on a not-very-mainstream architecture.

A compiler bug is plausible.  The compiler is eight years old, and we're not
seeing the problem on 32-bit (mandrill) or on 2019-vintage xlc (hoverfly).
Shall I make this animal use -O0 on v14+?




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2020-10-06 Thread Michael Paquier
On Tue, Oct 06, 2020 at 11:22:11AM -0400, Bruce Momjian wrote:
> On Tue, Oct  6, 2020 at 11:06:13AM -0400, Tom Lane wrote:
>> OK.  FWIW, I'd vote for separate --no-instructions and --no-scripts
>> switches.
> 
> Works for me.

+1.
--
Michael




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-10-06 Thread Bruce Momjian
On Wed, Oct  7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:
> On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> > I propose moving the pg_stat_statements queryid hash code into the
> > server (with a version number), and also adding a postgresql.conf
> > variable that lets you control how detailed the queryid hash is
> > computed.  This addresses the problem of people wanting different hash
> > methods.
> 
> In terms of making this part extensible in the future, there could be
> a point in having an enum here, but are we sure that we will have a
> need for that in the future?  What I get from this discussion is that
> we want a unique source of truth that users can consume, and that the
> only source of truth proposed is the PGSS hashing.  We may change the
> way we compute the query ID in the future, for example if it gets
> expanded to some utility statements, etc.  But that would be
> controlled by the version number in the hash, not the GUC itself.

Oh, if that is true, then I agree let's just go with the version number.

> > When computing a hash, the queryid detail level and version number will
> > be mixed into the hash, so only a hash that used a similar query and
> > identical queryid detail level would match.
> 
> Yes, having a version number directly dependent on the hashing sounds
> like a good compromise to me.

Good, much simpler.  I think there is enough demand for a queryid that I
would like to get this moving forward.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Recent failures on buildfarm member hornet

2020-10-06 Thread Tom Lane
Noah Misch  writes:
> On Mon, Oct 05, 2020 at 08:42:15PM -0400, Tom Lane wrote:
>> hornet has failed its last five runs with
>> TRAP: FailedAssertion("variadicArgType != InvalidOid", File: 
>> "pg_aggregate.c", Line: 216, PID: 34734498)

> Building the tree with -O0 suppresses the problem.  (xlc does not have -O1.)

OK, that's pretty unsurprising.

> Building just aggregatecmds.c and pg_aggregate.c that way does not, so I
> suppose the damage arose elsewhere.

Now that *is* surprising.  Could you poke a little further to determine
which module is getting miscompiled?  (gram.o seems like the next most
likely bet.)

regards, tom lane




Re: Recent failures on buildfarm member hornet

2020-10-06 Thread Noah Misch
On Tue, Oct 06, 2020 at 09:56:49PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Oct 05, 2020 at 08:42:15PM -0400, Tom Lane wrote:
> >> hornet has failed its last five runs with
> >> TRAP: FailedAssertion("variadicArgType != InvalidOid", File: 
> >> "pg_aggregate.c", Line: 216, PID: 34734498)
> 
> > Building the tree with -O0 suppresses the problem.  (xlc does not have -O1.)
> 
> OK, that's pretty unsurprising.
> 
> > Building just aggregatecmds.c and pg_aggregate.c that way does not, so I
> > suppose the damage arose elsewhere.
> 
> Now that *is* surprising.  Could you poke a little further to determine
> which module is getting miscompiled?  (gram.o seems like the next most
> likely bet.)

gram.o was it.  (The way I rebuilt gram.o also rebuilt parser.o and scan.o,
but those seem far less likely.)




Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

2020-10-06 Thread Bharath Rupireddy
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
 wrote:
>
> On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao  
> wrote:
> >
> > +1 Or it's also ok to make each patch separately.
> > Anyway, thanks!
> >
>
> Thanks. +1 to have separate patches. I will post two separate patches
> for autoprewarm and WalRcvShutdownHandler for further review. The
> other two patches can be for startup.c and syslogger.c.
>

I'm attaching all the 4 patches here together.

1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
--  This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fujii Masao; I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch

Please consider these patches for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From b9d77b6e84d3eee87693893c1687f120ee3db68b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 7 Oct 2020 07:12:57 +0530
Subject: [PATCH v1] autoprewarm use standard SIGHUP and SIGTERM handlers

Replace apw_sigterm_handler() and apw_sighup_handler() with standard
SignalHandlerForShutdownRequest() and SignalHandlerForConfigReload()
respectively.
---
 contrib/pg_prewarm/autoprewarm.c | 49 
 1 file changed, 6 insertions(+), 43 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index c32ddc56fd..93e526ef62 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgworker.h"
+#include "postmaster/interrupt.h"
 #include "storage/buf_internals.h"
 #include "storage/dsm.h"
 #include "storage/ipc.h"
@@ -94,12 +95,6 @@ static void apw_start_database_worker(void);
 static bool apw_init_shmem(void);
 static void apw_detach_shmem(int code, Datum arg);
 static int	apw_compare_blockinfo(const void *p, const void *q);
-static void apw_sigterm_handler(SIGNAL_ARGS);
-static void apw_sighup_handler(SIGNAL_ARGS);
-
-/* Flags set by signal handlers */
-static volatile sig_atomic_t got_sigterm = false;
-static volatile sig_atomic_t got_sighup = false;
 
 /* Pointer to shared-memory state. */
 static AutoPrewarmSharedState *apw_state = NULL;
@@ -161,8 +156,8 @@ autoprewarm_main(Datum main_arg)
 	TimestampTz last_dump_time = 0;
 
 	/* Establish signal handlers; once that's done, unblock signals. */
-	pqsignal(SIGTERM, apw_sigterm_handler);
-	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
+	pqsignal(SIGHUP, SignalHandlerForConfigReload);
 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	BackgroundWorkerUnblockSignals();
 
@@ -206,12 +201,12 @@ autoprewarm_main(Datum main_arg)
 	}
 
 	/* Periodically dump buffers until terminated. */
-	while (!got_sigterm)
+	while (!ShutdownRequestPending)
 	{
 		/* In case of a SIGHUP, just reload the configuration. */
-		if (got_sighup)
+		if (ConfigReloadPending)
 		{
-			got_sighup = false;
+			ConfigReloadPending = false;
 			ProcessConfigFile(PGC_SIGHUP);
 		}
 
@@ -895,35 +890,3 @@ apw_compare_blockinfo(const void *p, const void *q)
 
 	return 0;
 }
-
-/*
- * Signal handler for SIGTERM
- */
-static void
-apw_sigterm_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	got_sigterm = true;
-
-	if (MyProc)
-		SetLatch(&MyProc->procLatch);
-
-	errno = save_errno;
-}
-
-/*
- * Signal handler for SIGHUP
- */
-static void
-apw_sighup_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	got_sighup = true;
-
-	if (MyProc)
-		SetLatch(&MyProc->procLatch);
-
-	errno = save_errno;
-}
-- 
2.25.1

From 50d718fec6ade8c101d466ee71f0ab30ed020182 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 7 Oct 2020 07:17:49 +0530
Subject: [PATCH v1] startup use MyLatch and standard SIGHUP handler

Remove a separate shared latch with MyLatch which is also a shared
latch in startup process. This change can let startup process use
standard SIGHUP handler SignalHandlerForConfigReload() instead of
StartupProcSigHupHandler().
---
 src/backend/access/transam/xlog.c | 30 +++---
 src/backend/postmaster/startup.c  | 24 +---
 2 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 52a67b1170..c211d66994 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -682,7 +682,7 @@ typedef struct XLogCtlData
 	 * WAL replay, if it is waiting for WAL to arrive or failover trigger file
 	 * to appear.
 	 */
-	Latch		recoveryWakeupLatch;
+	Latch		*recoveryWakeupLatch;
 
 	/*
 	 * During recovery, we keep a copy of the latest checkpoint record here.
@@ -5185,7 +5185,6 @@ XLOGShmemInit(void)
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockIn

nbtpage.c:356 Expression 'metad->btm_root != P_NONE' is always false.

2020-10-06 Thread Антон Пацев
Hello!
I am trying to help the PostgreSQL project by checking the PostgreSQL
source code with PVS-Studio. PVS-Studio reports many issues, but some of
them are interesting.
nbtpage.c:356  *High*  V547: Expression 'metad->btm_root != 0' is always false.
Source file src/backend/access/nbtree/nbtpage.c

if (metad->btm_root == P_NONE)
{
    Page        metapg;

    /* If access = BT_READ, caller doesn't want us to create root yet */
    if (access == BT_READ)
    {
        _bt_relbuf(rel, metabuf);
        return InvalidBuffer;
    }

    /* trade in our read lock for a write lock */
    _bt_unlockbuf(rel, metabuf);
    _bt_lockbuf(rel, metabuf, BT_WRITE);

    /*
     * Race condition: if someone else initialized the metadata between
     * the time we released the read lock and acquired the write lock, we
     * must avoid doing it again.
     */
    if (metad->btm_root != P_NONE)

↑ V547: Expression 'metad->btm_root != 0' is always false.

-- 
Best regards, Anton Patsev.


Re: nbtpage.c:356 Expression 'metad->btm_root != P_NONE' is always false.

2020-10-06 Thread Tom Lane
Anton Patsev  writes:
> I am trying to help the PostgreSQL project by checking the PostgreSQL
> source code with PVS-Studio. PVS-Studio reports many issues, but some of
> them are interesting.
> nbtpage.c:356  *High*  V547: Expression 'metad->btm_root != 0' is always false.

This is incorrect.  The tool apparently doesn't understand that the
storage pointed to by metad can be changed by other processes in the
interval where we don't hold a lock on the buffer.
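
The general shape of the pattern, stripped of the btree specifics (a
self-contained sketch with a pthread rwlock standing in for the buffer
lock, not the nbtree code itself), shows why the re-check after
re-acquiring the lock is not dead code:

#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t meta_lock = PTHREAD_RWLOCK_INITIALIZER;
static int	meta_root = 0;		/* 0 plays the role of P_NONE */

/* Returns true if this caller performed the initialization. */
bool
init_root_if_needed(void)
{
	bool		did_init = false;

	pthread_rwlock_rdlock(&meta_lock);
	if (meta_root == 0)
	{
		/* trade in our read lock for a write lock */
		pthread_rwlock_unlock(&meta_lock);
		pthread_rwlock_wrlock(&meta_lock);

		/*
		 * Another thread may have initialized meta_root while we held no
		 * lock at all, so this second test is required; an analyzer that
		 * ignores concurrent writers will call it "always false".
		 */
		if (meta_root == 0)
		{
			meta_root = 42;		/* pretend we built the root page */
			did_init = true;
		}
	}
	pthread_rwlock_unlock(&meta_lock);
	return did_init;
}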

regards, tom lane




Re: [HACKERS] Custom compression methods

2020-10-06 Thread Dilip Kumar
On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra
 wrote:
>
> On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote:
> >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote:
> >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote:
> >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra
> >> >> > wrote:
> >> >> >
> >> >> >Thanks, Tomas for your feedback.
> >> >> >
> >> >> >> 9) attcompression ...
> >> >> >>
> >> >> >> The main issue I see is what the patch does with attcompression. 
> >> >> >> Instead
> >> >> >> of just using it to store a the compression method, it's also used to
> >> >> >> store the preserved compression methods. And using NameData to store
> >> >> >> this seems wrong too - if we really want to store this info, the 
> >> >> >> correct
> >> >> >> way is either using text[] or inventing charvector or similar.
> >> >> >
> >> >> >The reason for using the NameData is the get it in the fixed part of
> >> >> >the data structure.
> >> >> >
> >> >>
> >> >> Why do we need that? It's possible to have varlena fields with direct
> >> >> access (see pg_index.indkey for example).
> >> >
> >> >I see.  While making it NameData I was thinking whether we have an
> >> >option to direct access the varlena. Thanks for pointing me there. I
> >> >will change this.
> >> >
> >> > Adding NameData just to make
> >> >> it fixed-length means we're always adding 64B even if we just need a
> >> >> single byte, which means ~30% overhead for the FormData_pg_attribute.
> >> >> That seems a bit unnecessary, and might be an issue with many attributes
> >> >> (e.g. with many temp tables, etc.).
> >> >
> >> >You are right.  Even I did not like to keep 64B for this, so I will 
> >> >change it.
> >> >
> >> >>
> >> >> >> But to me this seems very much like a misuse of attcompression to 
> >> >> >> track
> >> >> >> dependencies on compression methods, necessary because we don't have 
> >> >> >> a
> >> >> >> separate catalog listing compression methods. If we had that, I 
> >> >> >> think we
> >> >> >> could simply add dependencies between attributes and that catalog.
> >> >> >
> >> >> >Basically, up to this patch, we are having only built-in compression
> >> >> >methods and those can not be dropped so we don't need any dependency
> >> >> >at all.  We just want to know what is the current compression method
> >> >> >and what is the preserve compression methods supported for this
> >> >> >attribute.  Maybe we can do it better instead of using the NameData
> >> >> >but I don't think it makes sense to add a separate catalog?
> >> >> >
> >> >>
> >> >> Sure, I understand what the goal was - all I'm saying is that it looks
> >> >> very much like a workaround needed because we don't have the catalog.
> >> >>
> >> >> I don't quite understand how could we support custom compression methods
> >> >> without listing them in some sort of catalog?
> >> >
> >> >Yeah for supporting custom compression we need some catalog.
> >> >
> >> >> >> Moreover, having the catalog would allow adding compression methods
> >> >> >> (from extensions etc) instead of just having a list of hard-coded
> >> >> >> compression methods. Which seems like a strange limitation, 
> >> >> >> considering
> >> >> >> this thread is called "custom compression methods".
> >> >> >
> >> >> >I think I forgot to mention while submitting the previous patch that
> >> >> >the next patch I am planning to submit is, Support creating the custom
> >> >> >compression methods wherein we can use pg_am catalog to insert the new
> >> >> >compression method.  And for dependency handling, we can create an
> >> >> >attribute dependency on the pg_am row.  Basically, we will create the
> >> >> >attribute dependency on the current compression method AM as well as
> >> >> >on the preserved compression methods AM.   As part of this, we will
> >> >> >add two build-in AMs for zlib and pglz, and the attcompression field
> >> >> >will be converted to the oid_vector (first OID will be of the current
> >> >> >compression method, followed by the preserved compression method's
> >> >> >oids).
> >> >> >
> >> >>
> >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't
> >> >> quite match what I though AMs are about, but maybe it's just my fault.
> >> >>
> >> >> FWIW it seems a bit strange to first do the attcompression magic and
> >> >> then add the catalog later - I think we should start with the catalog
> >> >> right away. The advantage is that if we end up committing only some of
> >> >> the patches in this cycle, we already have all the infrastructure etc.
> >> >> We can reorder that later, though.
> >> >
> >> >Hmm,  yeah we can do this way as well that first create a new catalog
> >> >table and add entries for these two built-in methods and the
> >> >attcompression can store the oid vector.  But if we only commit the
> >> >build-in com

Re: Two fsync related performance issues?

2020-10-06 Thread Thomas Munro
On Mon, Oct 5, 2020 at 2:38 PM Thomas Munro  wrote:
> On Wed, Sep 9, 2020 at 3:49 PM Thomas Munro  wrote:
> > For the record, Andres Freund mentioned a few problems with this
> > off-list and suggested we consider calling Linux syncfs() for each top
> > level directory that could potentially be on a different filesystem.
> > That seems like a nice idea to look into.
>
> Here's an experimental patch to try that.  One problem is that before
> Linux 5.8, syncfs() doesn't report failures[1].  I'm not sure what to
> think about that; in the current coding we just log them and carry on
> anyway, but any theoretical problems that causes for BLK_DONE should
> be moot anyway because of FPIs which result in more writes and syncs.
> Another is that it may affect other files that aren't under pgdata as
> collateral damage, but that seems acceptable.  It also makes me a bit
> sad that this wouldn't help other OSes.

... and for comparison/discussion, here is an alternative patch that
figures out precisely which files need to be fsync'd using information
in the WAL.  On a system with full_page_writes=on, this effectively
means that we don't have to do anything at all for relation files, as
described in more detail in the commit message.  You still need to
fsync the WAL files to make sure you can't replay records from the log
that were written but not yet fdatasync'd, addressed in the patch.
I'm not yet sure which other kinds of special files might need special
treatment.

Some thoughts:
1.  Both patches avoid the need to open many files.  With 1 million
tables this can take over a minute even on a fast system with warm
caches and/or fast local storage, before replay begins, and for a cold
system with high latency storage it can be a serious problem.
2.  The syncfs() one is comparatively simple (a minimal sketch follows this
list), but it only works on Linux.  If you used sync() (or sync(); sync())
instead, you'd be relying on non-POSIX behaviour, because POSIX says it
doesn't wait for completion, and indeed many non-Linux systems really are
like that.
3.  Though we know of kernel/filesystem pairs that can mark buffers
clean while retaining the dirty contents (which would cause corruption
with the current code in master, or either of these patches),
fortunately those systems can't possibly run with full_page_writes=off
so such problems are handled the same way torn pages are fixed.
4.  Perhaps you could set a flag in the buffer to say 'needs sync' as
a way to avoid repeatedly requesting syncs for the same page, but it
might not be worth the hassle involved.
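
For point 2, here is a minimal Linux-only sketch of the syncfs() route,
assuming one call per top-level directory that may live on its own
filesystem (the directory list and error handling are placeholders, and
before kernel 5.8 syncfs() won't report failures):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
sync_filesystem_of(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
	{
		fprintf(stderr, "could not open \"%s\": %s\n", path, strerror(errno));
		return;
	}

	/* flush every dirty buffer of the filesystem containing "path" */
	if (syncfs(fd) < 0)
		fprintf(stderr, "syncfs failed for \"%s\": %s\n", path, strerror(errno));

	close(fd);
}

int
main(void)
{
	/*
	 * Hypothetical list: pgdata itself and pg_wal; each tablespace link
	 * target under pg_tblspc would be added the same way.
	 */
	const char *dirs[] = {".", "pg_wal"};

	for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++)
		sync_filesystem_of(dirs[i]);
	return 0;
}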

Some other considerations that have been mentioned to me by colleagues
I talked to about this:
1.  The ResetUnloggedRelations() code still has to visit all relation
files looking for init forks after a crash.  But that turns out to be
OK, it only reads directory entries in a straight line.  It doesn't
stat() or open() files with non-matching names, so unless you have
very many unlogged tables, the problem is already avoided.  (It also
calls fsync() on the result, which could perhaps be replaced with a
deferred request too, not sure, for another day.)
2.  There may be some more directories that need special fsync()
treatment.  SLRUs are already covered (either handled by checkpoint or
they hold ephemeral data), and I think pg_tblspc changes will be
redone.  pg_logical?  I am not sure.
From 5becb011e7657f6aa46d9d0a834d3b176dbb589c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 26 Sep 2020 23:07:48 +1200
Subject: [PATCH] Optimize fsync() calls in crash recovery.

Previously, we'd open every file under pgdata and call fsync(), before
entering crash recovery.  That's because it's important to avoid relying
on dirty data in the kernel's cache when deciding that a change has
already been applied.

Instead of doing work in proportion to the number of relations (which
could be extremely slow with a cold VFS cache and high latency storage),
switch to doing work in proportion to the number of relations that we
opted not to dirty (in PostgreSQL's own buffer pool) based on page LSNs.

Typically, this means doing no work at all, because most users run with
full_page_writes=on, meaning that we'll restore the earliest version of
the page and then replay every WAL record on top of it (that is, we'll
never skip a record, and always dirty, write back and eventually sync
the page).

With full_page_writes=off, we get to skip dirtying pages that have
higher LSNs ("BLK_DONE"), but now in that case we'll still queue up a
request to issue an fsync() call in the next checkpoint whenever that
happens.

Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
---
 src/backend/access/transam/xlog.c  |  41 
 src/backend/access/transam/xlogutils.c |   9 ++
 src/backend/storage/buffer/bufmgr.c|  20 
 src/backend/storage/file/fd.c  | 137 -
 src/backend/storage/smgr/md.c  |  15 +++
 src/backend/storage/smgr/smgr.c|  15

Re: [HACKERS] logical decoding of two-phase transactions

2020-10-06 Thread Amit Kapila
On Tue, Oct 6, 2020 at 10:23 AM peter.b.sm...@fujitsu.com
 wrote:
>
>
> [BEGIN]
>
> ==
> Patch V6-0001, File: contrib/test_decoding/expected/prepared.out (so
> prepared.sql also)
> ==
>
> COMMENT
> Line 30 - The INSERT INTO test_prepared1 VALUES (2); is kind of
> strange because it is not really part of the prior test nor the
> following test. Maybe it would be better to have a comment describing
> the purpose of this isolated INSERT and to also consume the data from
> the slot so it does not get jumbled with the data of the following
> (abort) test.
>
> ;
>
> COMMENT
> Line 53 - Same comment for this test INSERT INTO test_prepared1 VALUES
> (4); It kind of has nothing really to do with either the prior (abort)
> test nor the following (ddl) test.
>
> ;
>
> COMMENT
> Line 60 - Seems to check which locks are held for the test_prepared_1
> table while the transaction is in progress. Maybe it would be better
> to have more comments describing what is expected here and why.
>
> ;
>
> COMMENT
> Line 88 - There is a comment in the test saying "-- We should see '7'
> before '5' in our results since it commits first." but I did not see
> any test code that actually verifies that happens.
>
> ;

All the above comments are genuine and I think it is mostly because
the author has blindly modified the existing tests without completely
understanding the intent of the test. I suggest we write a completely
new regression file (decode_prepared.sql) for these and just copy
whatever is required from prepared.sql. Once we do that we might also
want to rename existing prepared.sql to decode_commit_prepared.sql or
something like that. I think modifying the existing test appears to be
quite ugly and also it is changing the intent of the existing tests.

>
> QUESTION
> Line 120 - I did not really understand the SQL checking the pg_class.
> I expected this would be checking table 'test_prepared1' instead. Can
> you explain it?
> SELECT 'pg_class' AS relation, locktype, mode
> FROM pg_locks
> WHERE locktype = 'relation'
> AND relation = 'pg_class'::regclass;
> relation | locktype | mode
> --+--+--
> (0 rows)
>
> ;

Yes, I also think your expectation is correct and this should be on
'test_prepared_1'.

>
> QUESTION
> Line 139 - SET statement_timeout = '1s'; is 1 seconds short enough
> here for this test, or might it be that these statements would be
> completed in less than one seconds anyhow?
>
> ;

Good question. I think we have to mention the reason why logical
decoding is not blocked while it needs to acquire a shared lock on the
table and the previous commands already held an exclusive lock on the
table. I am not sure if I am missing something but like you, it is not
clear to me as well what this test intends to do, so surely more
commentary is required.


>
> QUESTION
> Line 163 - How is this testing a SAVEPOINT? Or is it only to check
> that the SAVEPOINT command is not part of the replicated changes?
>
> ;

It is more of testing that subtransactions will not create a problem
while decoding.

>
> COMMENT
> Line 175 - Missing underscore in comment. Code requires also underscore:
> "nodecode" --> "_nodecode"
>

makes sense.

> ==
> Patch V6-0001, File: contrib/test_decoding/test_decoding.c
> ==
>
> COMMENT
> Line 43
> @@ -36,6 +40,7 @@ typedef struct
> bool skip_empty_xacts;
> bool xact_wrote_changes;
> bool only_local;
> + TransactionId check_xid; /* track abort of this txid */
> } TestDecodingData;
>
> The "check_xid" seems a meaningless name. Check what?
> IIUC maybe should be something like "check_xid_aborted"
>
> ;
>
> COMMENT
> Line 105
> @ -88,6 +93,19 @@ static void
> pg_decode_stream_truncate(LogicalDecodingContext *ctx,
>  ReorderBufferTXN *txn,
>  int nrelations, Relation relations[],
>  ReorderBufferChange *change);
> +static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
> + ReorderBufferTXN *txn,
>
> Remove extra blank line after these functions
>
> ;

The above two sounds reasonable suggestions.

>
> COMMENT
> Line 149
> @@ -116,6 +134,11 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>  cb->stream_change_cb = pg_decode_stream_change;
>  cb->stream_message_cb = pg_decode_stream_message;
>  cb->stream_truncate_cb = pg_decode_stream_truncate;
> + cb->filter_prepare_cb = pg_decode_filter_prepare;
> + cb->prepare_cb = pg_decode_prepare_txn;
> + cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
> + cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
> +
>  }
>
> There is a confusing mix of terminology where sometimes things are
> referred as ROLLBACK/rollback and other times apparently the same
> operation is referred as ABORT/abort. I do not know the root cause of
> this mixture. IIUC maybe the internal functions and protocol generally
> use the term "abort", whereas the SQL syntax is "ROLLBACK"... but
> where those two terms collide in the middle it gets quite confusing.
>
> At least I thought the names of the "callbacks" which get exposed

Re: Improve choose_custom_plan for initial partition prune case

2020-10-06 Thread Andy Fan
Hi Ashutosh:

Thanks for coming.

On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat 
wrote:

> On Thu, Oct 1, 2020 at 9:34 PM Andy Fan  wrote:
> >
> > Given the plan example:
> >
> > CREATE TABLE measurement (
> > city_id int not null,
> > logdate date not null,
> > peaktempint,
> > unitsales   int
> > ) PARTITION BY RANGE (logdate);
> >
> > CREATE TABLE measurement_y2006m02 PARTITION OF measurement
> > FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
> >
> > CREATE TABLE measurement_y2006m03 PARTITION OF measurement
> > FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
> >
> > prepare s as select * from measurement where logdate = $1;
> > execute s('2006-02-01').
> >
> > The generic plan will probably not be chosen because it doesn't reduce
> the cost
> > which can be reduced at initial_prune while the custom plan reduces such
> cost
> > at  planning time. which makes the cost comparison not fair.  I'm
> thinking if we can
> > get an estimated cost reduction of initial_prunne for generic plan based
> on the
> > partition pruned at plan time from custom plan and then reducing
> > such costs from the generic plan.  I just went through the related code
> but
> > didn't write anything now.  I'd like to see if this is a correct
> direction to go.
>
> What happens when we
> execute plans with values that have estimates similar to the generic
> plan later when we moderate generic plan costs based on the custom
> plans?
>
>
In the example at the beginning of this thread I used the exact same value
every time, yet the custom plan is chosen all the time, which is bad.
The main reason is that the custom plan knows the exact value from the
Execute message, so it can do plan-time partition pruning and its total
cost comes out low. For the generic plan, however, the pruning only happens
at the run-time initial-prune stage, so the cost of the partitions that
could be pruned there is still included in the total cost, and the generic
plan can't be chosen. That is the thing I want to fix.

If the table has good distribution of a partition key, which also
> results in good distribution of data across partitions, generic plan
> cost will be similar to the custom plan costs. If not that's something
> we want to fix.



> [1] talks about generic plan being
> not chosen just because it has higher cost even though its shape is
> similar to a custom plan. Leveraging that idea might be a good option.
> If the custom plans produced have shapes similar to the generic plan,
> stop producing those.
>

If I understand you correctly,  the issue I want to fix here matches this
goal.

-- 
Best Regards
Andy Fan


Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-06 Thread Masahiko Sawada
On Tue, 6 Oct 2020 at 17:56, Amit Kapila  wrote:
>
> On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada
>  wrote:
> >
> > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we
> > pass a physical slot name to pg_stat_reset_replication_slot() a
> > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m
> > okay with not raising an error but maybe we can have it not to send
> > the message in that case.
> >
>
> makes sense, so changed accordingly.
>

Thank you for updating the patch!

Both patches look good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: speed up unicode normalization quick check

2020-10-06 Thread Michael Paquier
On Sat, Sep 19, 2020 at 04:09:27PM -0700, Mark Dilger wrote:
> I am marking this ready for committer.  I didn't object to the
> whitespace weirdness in your patch (about which `git apply`
> grumbles) since you seem to have done that intentionally.  I have no
> further comments on the performance issue, since I don't have any
> other platforms at hand to test it on.  Whichever committer picks
> this up can decide if the issue matters to them enough to punt it
> back for further performance testing.

About 0001, the new set of multipliers looks fine to me.  Even if this
adds an extra item from 901 to 902 because this can be divided by 17
in kwlist_d.h, I don't think that this is really much of a bother.  As
mentioned, this impacts none of the other tables that are much
smaller in size, on top of coming back to normal once a new keyword
will be added.  Being able to generate perfect hash functions for much
larger sets is a nice property to have.  While on it, I also looked at
the assembly code with gcc -O2 for keywords.c & co and I have not
spotted any huge difference.  So I'd like to apply this first if there
are no objections.

I have tested 0002 and 0003, that had better be merged together at the
end, and I can see performance improvements with MSVC and gcc similar
to what is being reported upthread, with 20~30% gains for simple
data sample using IS NFC/NFKC.  That's cool.

Including unicode_normprops_table.h in what gets ignored with pgindent
is also fine at the end, even with the changes to make the output of
the structures generated more in-line with what pgindent generates.
One tiny comment I have is that I would have added an extra comment in
the unicode header generated to document the set of structures
generated for the perfect hash, but that's easy enough to add.
--
Michael




Re: new heapcheck contrib module

2020-10-06 Thread Andrey Borodin



> On Oct 7, 2020, at 04:20, Mark Dilger  wrote:
> 
> 
> 
>> On Oct 5, 2020, at 5:24 PM, Mark Dilger  wrote:
>> 
>> - This version does not change clog handling, which leaves Andrey's concern 
>> unaddressed.  Peter also showed some support for (or perhaps just a lack of 
>> opposition to) doing more of what Andrey suggests.  I may come back to this 
>> issue, depending on time available and further feedback.
> 
> Attached is a patch set that includes the clog handling as discussed.  The 
> 0001 and 0002 are effectively unchanged since version 16 posted yesterday, 
> but this now includes 0003 which creates a non-throwing interface to clog, 
> and 0004 which uses the non-throwing interface from within amcheck's heap 
> checking functions.
> 
> I think this is a pretty good sketch for discussion, though I am unsatisfied 
> with the lack of regression test coverage of verify_heapam in the presence of 
> clog truncation.  I was hoping to have that as part of v17, but since it is 
> taking a bit longer than I anticipated, I'll have to come back with that in a 
> later patch.
> 

Many thanks, Mark! I really appreciate this functionality. It could save me 
many hours of recreating clogs.

I'm not entirely sure this message is correct: psprintf(_("xmax %u commit status 
is lost")
It seems to me that this is not a commit status, but rather a transaction status.

Thanks!

Best regards, Andrey Borodin.



Re: Improve choose_custom_plan for initial partition prune case

2020-10-06 Thread Andy Fan
On Sat, Oct 3, 2020 at 10:05 AM Andy Fan  wrote:

> Hi Amit:
>
>   Very glad to see your comment!
>
> On Fri, Oct 2, 2020 at 4:21 PM Amit Langote 
> wrote:
>
>> Hi Andy,
>>
>> On Fri, Oct 2, 2020 at 1:04 AM Andy Fan  wrote:
>> >
>> > Given the plan example:
>> >
>> > CREATE TABLE measurement (
>> > city_id int not null,
>> > logdate date not null,
>> > peaktempint,
>> > unitsales   int
>> > ) PARTITION BY RANGE (logdate);
>> >
>> > CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>> > FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
>> >
>> > CREATE TABLE measurement_y2006m03 PARTITION OF measurement
>> > FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
>> >
>> > prepare s as select * from measurement where logdate = $1;
>> > execute s('2006-02-01').
>> >
>> > The generic plan will probably not be chosen because it doesn't reduce
>> the cost
>> > which can be reduced at initial_prune while the custom plan reduces
>> such cost
>> > at  planning time. which makes the cost comparison not fair.
>>
>> I agree that there is something to be done here.  Actually, I think we
>> should try to find a solution that will allow us to consider not just
>> "initial" pruning, but also "execution-time" pruning.   The latter
>> will allow a nested loop join whose inner side scans a partitioned
>> table using a parameterized scan on the partition key to be favored
>> over other join plans, because that parameterized scan can use
>> execution-time pruning which can make the inner scan very cheap.
>>
>>
> This looks like to resolve another important issue of partition prune,
> which
> may happen at planning time totally (no generic plan or custom plan
> involved).
> for example between choosing a Nest Loop plan which can use
> some run-time partition prune and hash join which can't.  I "repeat" your
> idea
> just to  make sure I understand you correctly.
>
>
>> >  I'm thinking if we can
>> > get an estimated cost reduction of initial_prunne for generic plan
>> based on the
>> > partition pruned at plan time from custom plan and then reducing
>> > such costs from the generic plan.  I just went through the related code
>> but
>> > didn't write anything now.  I'd like to see if this is a correct
>> direction to go.
>>
>> That's an interesting idea, that is, to try to do this totally outside
>> the planner.  When I was thinking about this a little while ago, I was
>> trying to find a way to adjust the cost of the plan in the planner
>> itself by looking at the runtime pruning info in the nodes that
>> support it, that is, Append, MergeAppend.  Actually, such an approach
>> had also come up in the original run-time pruning discussion [1].
>>
>>
> Thank you for your comments. Looks like your approach can be helpful
> for the both cases, and I did think a bit for that as well, However, that
> looks
> complex (for me) AND I am prefer to guess how many partitions can be
> pruned with real data even it is the real data in the past (I assume that
> will not cause too much difference in practice).
>
> I'm not sure if I should treat Robert's comments as an opposed idea[1] ,
> but I think there are some little differences.  I'd like to implement my
> idea
> soon, and I'm glad to see any opposed idea at any time, of course the
> sooner
> the better:)
>

After some days of coding in this direction, I have run into a rather
upsetting situation.

The main idea here is: if N partitions survive plan-time partition pruning
for the custom plan, then I assume a similar set of partitions will survive
initial partition pruning for the generic plan, so we should subtract the
cost of the pruned partitions from the generic plan.
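
As a purely illustrative calculation: if the custom plan pruned 10 of 12
partitions and each pruned partition scan was costed at about 100, we would
subtract roughly 1000 from the generic plan's Append cost before comparing,
so the generic plan is no longer penalized for partitions that initial
pruning would discard anyway.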

However, the hard part of the implementation is that we can't associate the
Append node in the generic plan with the surviving-partition information
from the custom plan, even though the Append node has apprelids in it.

1. Associate them by rtindex. The final rtindex is affected by
glob->finalrtable; however, glob->finalrtable includes the child partitions,
and the custom plan ends up with a smaller finalrtable, so the rtindexes
can't be matched.

2. Associate them by relation OID, which we could record in the Append node
as well. The bad part is that the same relation OID may appear multiple
times in a query, for example: SELECT .. FROM p p1, p p2 WHERE
p1.partkey1 = $1 AND p2.partkey2 = $2;

So any hint on this would be appreciated.

-- 
Best Regards
Andy Fan


Re: Improve choose_custom_plan for initial partition prune case

2020-10-06 Thread Andy Fan
>
>
> 2. Associate them with RelationOid, and we can record such information in
> the
>  Append node as well. The bad part is the same relation Oid may appear
> multiple
>  times in a query. for example: SELECT .. FROM p p1,  p p2 where
> p1.partkey1 = $1
>  AND p2.partkey2 = $2;
>
>
I just came up with a new idea.  Since this situation should be rare, we
can just fall back to our original method (totally ignoring the cost
reduction) or use an average number.  Fixing 99% of the cases would be a
big win as well, IMO.

So any hint on this will be appreciated..
>
> --
> Best Regards
> Andy Fan
>


-- 
Best Regards
Andy Fan