Re: pg_stat_statements oddity with track = all

2020-12-27 Thread Julien Rouhaud
On Fri, Dec 04, 2020 at 06:09:13PM +0800, Julien Rouhaud wrote:
> On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:
> > Hello
> > 
> > Seems we need also change PGSS_FILE_HEADER.
> 
> Indeed, thanks!  v2 attached.

There was a conflict on PGSS_FILE_HEADER since some recent commit, v3 attached.
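
For illustration only (not part of the patch), with the new column applied one
could restrict aggregation to top-level entries so that nested statements are
not double-counted, e.g.:

  SELECT query, calls, total_exec_time
  FROM pg_stat_statements
  WHERE toplevel;
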
>From 832b1a81cfba8a38c6b58b0a9212a6a95fc231a8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 4 Dec 2020 13:33:51 +0800
Subject: [PATCH v3] Add a bool toplevel column to pg_stat_statements.

Since the counters of top-level statements include the resource consumption of
any underlying (nested) statements, it is not possible to compute the total
resource consumption accurately.  Fix that by adding a new toplevel boolean
field that indicates whether the counters were accumulated for queries executed
at top level or not.

This can lead to more entries being stored for the same workload if
pg_stat_statements.track is set to all, but it seems unlikely that the same
queries would be executed both at top level and as nested statements.
---
 contrib/pg_stat_statements/Makefile   |  3 +-
 .../expected/pg_stat_statements.out   | 40 +
 .../pg_stat_statements--1.9--1.10.sql | 57 +++
 .../pg_stat_statements/pg_stat_statements.c   | 44 +++---
 .../pg_stat_statements.control|  2 +-
 .../sql/pg_stat_statements.sql| 21 +++
 doc/src/sgml/pgstatstatements.sgml|  9 +++
 7 files changed, 166 insertions(+), 10 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile 
b/contrib/pg_stat_statements/Makefile
index 3ec627b956..cab4f626ad 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out 
b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..fb97f68737 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info;
0
 (1 row)
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE 
'%DELETE%' ORDER BY query COLLATE "C", toplevel;
+ query | toplevel | plans | calls 
+---+--+---+---
+ DELETE FROM test  | t| 1 | 1
+ DO $$+| t| 0 | 1
+ BEGIN+|  |   | 
+ DELETE FROM test;+|  |   | 
+ END; +|  |   | 
+ $$ LANGUAGE plpgsql   |  |   | 
+(2 rows)
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE 
'%DELETE%' ORDER BY query COLLATE "C", toplevel;
+ query | toplevel | plans | calls 
+---+--+---+---
+ DELETE FROM test  | f| 1 | 1
+ DELETE FROM test  | t| 2 | 2
+ DO $$+| t| 0 | 2
+ BEGIN+|  |   | 
+ DELETE FROM test;+|  |   | 
+ END; +|  |   | 
+ $$ LANGUAGE plpgsql   |  |   | 
+(3 rows)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql 
b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 00..f97d16497d
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
@@ -0,0 +1,57 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this 
file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statement

Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Bharath Rupireddy
On Sat, Dec 26, 2020 at 9:20 PM vignesh C  wrote:
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment  "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on normal table"
>
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create unlogged table parallel_write as select length(stringu1) from 
> tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on unlogged table"
> Similar comment need to be handled in other places also.

I think the existing comments look fine. Information like the table type and
whether the query is CTAS or CMV is visible by looking at the test case. What I
wanted the comments to convey is whether we support parallel inserts or not,
and if not why, so that they are easy to read. I tried to keep them as
succinct as possible.

> If possible try to make a common function for both and use.

Yes, you are right. The function explain_pictas is the same as
explain_parallel_append from partition_prune.sql. It's a test
function, and I also see that we have serial_schedule and
parallel_schedule, which means that these sql files can run in any
order. I'm not quite sure whether we can have it in a common test sql
file and use it across other test sql files. AFAICS, I didn't find
any function being used in such a manner. Thoughts?
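
(For reference, the helper in partition_prune.sql is roughly along these lines --
a from-memory sketch, not the exact test function:)

  create function explain_parallel_append(text) returns setof text
  language plpgsql as
  $$
  declare
      ln text;
  begin
      for ln in
          execute format('explain (analyze, costs off, summary off, timing off) %s', $1)
      loop
          -- mask run-to-run variation in the EXPLAIN output
          ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
          ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
          return next ln;
      end loop;
  end;
  $$;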

> +   if (intoclausestr && OidIsValid(objectid))
> +   fpes->objectid = objectid;
> +   else
> +   fpes->objectid = InvalidOid;
> Here OidIsValid(objectid) check is not required intoclausestr will be
> set only if OidIsValid.

Removed the OidIsValid check in the latest v16 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: range_agg

2020-12-27 Thread Alexander Korotkov
On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov
 wrote:
>
> I think this patch is very close to committable.  I'm going to spend
> some more time further polishing it and commit (if I don't find a
> major issue or face objections).

The main patch is committed.  I've prepared a set of improvements.
0001 Fixes bug in bsearch comparison functions
0002 Implements missing @> (range,multirange) operator and its commutator
0003 Refactors signatures of *_internal() multirange functions
0004 Adds cross-type (range, multirange) operators handling to
existing range GiST opclass
0005 Adds support for GiST multirange indexing by approximation of
multirange as the union range with no gaps

The patchset is quite trivial.  I'm going to push it if there are no objections.

The SP-GiST handling is more tricky and requires substantial work.

--
Regards,
Alexander Korotkov


0001-Fix-bugs-in-comparison-functions-for-multirange_bsea.patch
Description: Binary data


0002-Implement-operators-for-checking-if-the-range-contai.patch
Description: Binary data


0003-Improve-the-signature-of-internal-multirange-functio.patch
Description: Binary data


0005-Add-GiST-indexes-for-multiranges.patch
Description: Binary data


0004-Add-support-of-multirange-matching-to-the-existing-r.patch
Description: Binary data


Re: Add table access method as an option to pgbench

2020-12-27 Thread Fabien COELHO


Hello Justin,


> src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE 
> HANDLER heap_tableam_handler;
> ...
> src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2;
>
> Or maybe using SET default_tablespace instead of modifying the CREATE sql.
> That'd need to be done separately for indexes, and RESET after creating the
> tables, to avoid accidentally affecting indexes, too.


Why should it not affect indexes?


> @Fabien: I think the table should be created and populated within the same
> transaction, to allow this optimization in v13 to apply during the
> "initialization" phase.
> c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.


Possibly.

This would be a change of behavior: currently only the table-filling step runs 
inside an explicit transaction.


That would be a matter for another patch, probably with an added option.

As VACUUM cannot run inside a transaction block, it might be a little bit 
tricky to add such a feature. Also, I'm not sure about some constraint 
declarations.


ISTM that I submitted a patch a long time ago to allow "()" to control 
transactions from the command line, but somehow it got lost or rejected.
I have redeveloped it, see attached. I cannot see a reliable performance 
improvement from playing with "()", though.
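
For example, with the patch applied, one could (hypothetically) wrap only the
data-generation step in a transaction:

  pgbench -i -I 'dt(g)vp' -s 100 test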


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index b03d0cc50f..8d9b642fdd 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -170,7 +170,7 @@ pgbench  options  d
 init_steps specifies the
 initialization steps to be performed, using one character per step.
 Each step is invoked in the specified order.
-The default is dtgvp.
+The default is dt(g)vp.
 The available steps are:
 
 
@@ -226,6 +226,22 @@ pgbench  options  d

   
  
+ 
+  ( (Begin)
+  
+   
+Begin a transaction.
+   
+  
+ 
+ 
+  ) (Commit)
+  
+   
+Commit a transaction.
+   
+  
+ 
  
   v (Vacuum)
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbe..f77e33056c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -132,8 +132,8 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /
  * some configurable parameters */
 
-#define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
-#define ALL_INIT_STEPS "dtgGvpf"	/* all possible steps */
+#define DEFAULT_INIT_STEPS "dt(g)vp"	/* default -I setting */
+#define ALL_INIT_STEPS "dtgGvpf()"	/* all possible steps */
 
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
@@ -3823,12 +3823,6 @@ initGenerateDataClientSide(PGconn *con)
 
 	fprintf(stderr, "generating data (client-side)...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
 	/* truncate away any old data */
 	initTruncateTables(con);
 
@@ -3940,8 +3934,6 @@ initGenerateDataClientSide(PGconn *con)
 	}
 
 	termPQExpBuffer(&sql);
-
-	executeStatement(con, "commit");
 }
 
 /*
@@ -3958,12 +3950,6 @@ initGenerateDataServerSide(PGconn *con)
 
 	fprintf(stderr, "generating data (server-side)...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
 	/* truncate away any old data */
 	initTruncateTables(con);
 
@@ -3989,8 +3975,6 @@ initGenerateDataServerSide(PGconn *con)
 	executeStatement(con, sql.data);
 
 	termPQExpBuffer(&sql);
-
-	executeStatement(con, "commit");
 }
 
 /*
@@ -4076,6 +4060,8 @@ initCreateFKeys(PGconn *con)
 static void
 checkInitSteps(const char *initialize_steps)
 {
+	bool in_tx = false;
+
 	if (initialize_steps[0] == '\0')
 	{
 		pg_log_fatal("no initialization steps specified");
@@ -4090,6 +4076,36 @@ checkInitSteps(const char *initialize_steps)
 			pg_log_info("Allowed step characters are: \"" ALL_INIT_STEPS "\".");
 			exit(1);
 		}
+
+		if (*step == '(')
+		{
+			if (in_tx)
+			{
+pg_log_fatal("opening a transaction inside a transaction");
+exit(1);
+			}
+			in_tx = true;
+		}
+		else if (*step == ')')
+		{
+			if (!in_tx)
+			{
+pg_log_fatal("closing a transaction without opening it");
+exit(1);
+			}
+			in_tx = false;
+		}
+		else if (*step == 'v' && in_tx)
+		{
+			pg_log_fatal("cannot run vacuum within a transaction");
+			exit(1);
+		}
+	}
+
+	if (in_tx)
+	{
+		pg_log_fatal("uncommitted transaction");
+		exit(1);
 	}
 }
 
@@ -4150,6 +4166,14 @@ runInitSteps(const char *initialize_steps)
 op = "foreign keys";
 initCreateFKeys(con);
 break;
+			case '(':
+op = "begin";
+executeStatement(con, "begin");
+break;
+			case ')':
+op = "commit";
+	

Re: Add session statistics to pg_stat_database

2020-12-27 Thread Laurenz Albe
On Fri, 2020-12-25 at 20:28 +0900, Masahiro Ikeda wrote:
> As a user, I want this feature to know whether
> clients' session activities are as expected.
> 
> I have some comments about the patch.
> 
> 1. pg_proc.dat
> 
> The unit of "session time" and so on says "in seconds".
> But, is "in milliseconds" right?
> 
> 2. monitoring.sgml
> 
> > IIUC, "active_time" includes the time spent executing a fast-path function,
> > and "idle in transaction" includes "idle in transaction (aborted)" time.
> > 
> > Why not reference pg_stat_activity's "state" column and define
> > "active_time" as the total time when the state is "active" or "fast
> > path"?
> > The same applies to "idle in transaction".
> 
> 3. pgstat.h
> 
> The comment of PgStat_MsgConn says "Sent by pgstat_connection".
> I thought "pgstat_connection" is a function, but it doesn't exist.
> 
> Is "Sent by the backend" right?
> 
> Although this is a trivial thing, the following row has too many tabs.
> Other structs have only one space.
> // }Pgstat_MsgConn;

Thanks for the feedback.

I am currently on vacation and will take a look after January 7.

Yours,
Laurenz Albe





Re: PoC Refactor AM analyse API

2020-12-27 Thread Andrey Borodin



> 8 дек. 2020 г., в 16:44, Denis Smirnov  написал(а):
> 
> Andrey, thanks for your feedback!
> 
> I agree that AMs with fix sized blocks can have much alike code in 
> acquire_sample_rows() (though it is not a rule). But there are several points 
> about current master sampling.
> 
> * It is not perfect - AM developers may want to improve it with other 
> sampling algorithms.
> * It is designed with a strong influence from the heap AM - for example, 
> RelationGetNumberOfBlocks() returns uint32 while other AMs can have a bigger 
> number of blocks.
> * heapam_acquire_sample_rows() is a small function - I don't think it would be 
> much trouble for any AM developer to write something similar.
> * Some AMs may have single-level sampling (only algorithm Z from Vitter, for 
> example) - why not?
> 
> As a result we get a single and clear method to acquire rows for statistics. 
> If we don't modify but rather extend the current API (for example, in the 
> manner it is done for FDW), the code becomes more complicated and difficult 
> to understand.

This makes sense. The purpose of the API is to provide a flexible abstraction. 
The current table_scan_analyze_next_block()/table_scan_analyze_next_tuple() API 
assumes too much about the AM implementation.
But why do you pass int natts and VacAttrStats **stats to 
acquire_sample_rows()? Are they of any use? They seem to break the abstraction too.

Best regards, Andrey Borodin.



Re: Proposed patch for key managment

2020-12-27 Thread Bruce Momjian
On Sat, Dec 19, 2020 at 11:45:08AM +, Alastair Turner wrote:
> On Fri, 18 Dec 2020 at 21:36, Stephen Frost  wrote:
> > These ideas don't seem too bad but I'd think we'd pass which key we want
> > on the command-line using a %i or something like that, rather than using
> > stdin, unless there's some reason that would be an issue..?  Certainly
> > the CLI utilities I've seen tend to expect the name of the secret that
> > you're asking for to be passed on the command line.
> 
> Agreed. I was working on the assumption that the calling process
> (initdb or pg_ctl) would have access to the challenge material and be
> passing it to the utility, so putting it in a command line would allow
> it to leak. If the utility is accessing the challenge material
> directly, then just an indicator of which key to work with would be a
> lot simpler, true.

I want to repeat here what I said in another thread:

> I think ultimately we will need three commands to control the keys.
> First, there is the cluster_key_command, which we have now.  Second, I
> think we will need an optional command which returns random bytes ---
> this would allow users to get random bytes from a different source than
> that used by the server code.
> 
> Third, we will probably need a command that returns the data encryption
> keys directly, either heap/index or WAL keys, probably based on key
> number --- you pass the key number you want, and the command returns the
> data key.  There would not be a cluster key in this case, but the
> command could still prompt the user for perhaps a password to the KMS
> server. It could not be used if any of the previous two commands are
> used. I assume an HMAC would still be stored in the pg_cryptokeys
> directory to check that the right key has been returned.
> 
> I thought we should implement the first command, because it will
> probably be the most common and easiest to use, and then see what people
> want added.

There is also a fourth option where the command returns multiple keys,
one per line of hex digits.  That could be written in shell script, but
it would be fragile and complex.  It could be written in Perl, but that
would add a new language requirement for this feature.  It could be
written in C, but that would limit its flexibility because changes
would require a recompile, and you would probably need that C program to
call external scripts to tailor input, as we do now from the server.

You could actually write a full implementation of what we do on the server
side in client code, but pg_alterckey would not work, since it would not
know the data format, so we would need another cluster-key-alter tool for that.

It could be written as a C extension, but that would also have C's
limitations.  In summary, having the server do most of the complex work
for the default case seems best, and eventually allowing the client to do
everything seems ideal.  I think we need more input before we go beyond
what we do now.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add table access method as an option to pgbench

2020-12-27 Thread Justin Pryzby
On Sun, Dec 27, 2020 at 09:14:53AM -0400, Fabien COELHO wrote:
> > src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE 
> > HANDLER heap_tableam_handler;
> > ...
> > src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2;
> 
> > Or maybe using SET default_tablespace instead of modifying the CREATE sql.
> > That'd need to be done separately for indexes, and RESET after creating the
> > tables, to avoid accidentally affecting indexes, too.
> 
> Why should it not affect indexes?

I rarely use pgbench, and probably never looked at its source before, but I saw
that it has a separate --tablespace and --index-tablespace, and that
--tablespace is *not* the default for indexes.

So if we changed it to use SET default_tablespace instead of amending the DDL
sql, we'd need to make sure the SET applied only to the intended CREATE
command, and not all following create commands.  In the case that
--index-tablespace is not specified, it would be buggy to do this:

SET default_tablespace=foo;
CREATE TABLE ...
CREATE INDEX ...
CREATE TABLE ...
CREATE INDEX ...
...
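
A sketch of the scoping that would be needed instead (hypothetical; assumes
--tablespace=foo and no --index-tablespace):

  SET default_tablespace = foo;
  CREATE TABLE ...
  RESET default_tablespace;
  CREATE INDEX ...
  ...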

-- 
Justin

PS. Thanks for patching it to work with partitioned tables :)




Re: range_agg

2020-12-27 Thread Zhihong Yu
Hi,

This is not an ideal way to index multirages, but something we can easily
have.

typo: multiranges

Cheers

On Sun, Dec 27, 2020 at 1:50 AM Alexander Korotkov 
wrote:

> On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov
>  wrote:
> >
> > I think this patch is very close to committable.  I'm going to spend
> > some more time further polishing it and commit (if I don't find a
> > major issue or face objections).
>
> The main patch is committed.  I've prepared a set of improvements.
> 0001 Fixes bug in bsearch comparison functions
> 0002 Implements missing @> (range,multirange) operator and its commutator
> 0003 Refactors signatures of *_internal() multirange functions
> 0004 Adds cross-type (range, multirange) operators handling to
> existing range GiST opclass
> 0005 Adds support for GiST multirange indexing by approximation of
> multirange as the union range with no gaps
>
> The patchset is quite trivial.  I'm going to push it if there are no
> objections.
>
> The SP-GiST handling is more tricky and requires substantial work.
>
> --
> Regards,
> Alexander Korotkov
>


Re: range_agg

2020-12-27 Thread David Fetter
On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> Hi,
> 
> This is not an ideal way to index multirages, but something we can
> easily have.

What sort of indexing improvements do you have in mind?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: pg_upgrade test for binary compatibility of core data types

2020-12-27 Thread Justin Pryzby
On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote:
> On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
> > I meant to notice if the binary format is accidentally changed again, which 
> > was
> > what happened here:
> > 7c15cef86 Base information_schema.sql_identifier domain on name, not 
> > varchar.
> > 
> > I added a table to the regression tests so it's processed by pg_upgrade 
> > tests,
> > run like:
> > 
> > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 
> > oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
> 
> Per cfbot, this avoids testing ::xml (support for which may not be enabled)
> And also now tests oid types.
> 
> I think the per-version hacks should be grouped by logical change, rather than
> by version.  Which I've started doing here.

rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f

-- 
Justin
>From a1114c3db36891f169122383db136fd8fb47cb10 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 5 Dec 2020 22:31:19 -0600
Subject: [PATCH v3 1/2] WIP: pg_upgrade/test.sh: changes needed to allow
 testing upgrade from v11

---
 src/bin/pg_upgrade/test.sh | 92 ++
 1 file changed, 84 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 04aa7fd9f5..9733217535 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -23,7 +23,7 @@ standard_initdb() {
 	# To increase coverage of non-standard segment size and group access
 	# without increasing test runtime, run these tests with a custom setting.
 	# Also, specify "-A trust" explicitly to suppress initdb's warning.
-	"$1" -N --wal-segsize 1 -g -A trust
+	"$1" -N -A trust
 	if [ -n "$TEMP_CONFIG" -a -r "$TEMP_CONFIG" ]
 	then
 		cat "$TEMP_CONFIG" >> "$PGDATA/postgresql.conf"
@@ -108,6 +108,9 @@ export EXTRA_REGRESS_OPTS
 mkdir "$outputdir"
 mkdir "$outputdir"/testtablespace
 
+mkdir "$outputdir"/sql
+mkdir "$outputdir"/expected
+
 logdir=`pwd`/log
 rm -rf "$logdir"
 mkdir "$logdir"
@@ -172,16 +175,83 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then
 		fix_sql=""
 		case $oldpgversion in
 			804??)
-fix_sql="DROP FUNCTION public.myfunc(integer); DROP FUNCTION public.oldstyle_length(integer, text);"
+fix_sql="$fix_sql DROP FUNCTION public.myfunc(integer);"
 ;;
-			*)
-fix_sql="DROP FUNCTION public.oldstyle_length(integer, text);"
+		esac
+
+		# Removed in v10 commit 5ded4bd21
+		case $oldpgversion in
+			804??|9)
+fix_sql="$fix_sql DROP FUNCTION public.oldstyle_length(integer, text);"
+;;
+		esac
+
+		# commit 068503c76511cdb0080bab689662a20e86b9c845
+		case $oldpgversion in
+			10) # XXX
+fix_sql="$fix_sql DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;"
+;;
+		esac
+
+		# commit db3af9feb19f39827e916145f88fa5eca3130cb2
+		case $oldpgversion in
+			10) # XXX
+fix_sql="$fix_sql DROP FUNCTION boxarea(box);"
+fix_sql="$fix_sql DROP FUNCTION funny_dup17();"
 ;;
 		esac
+
+		# commit cda6a8d01d391eab45c4b3e0043a1b2b31072f5f
+		case $oldpgversion in
+			10) # XXX
+fix_sql="$fix_sql DROP TABLE abstime_tbl;"
+fix_sql="$fix_sql DROP TABLE reltime_tbl;"
+fix_sql="$fix_sql DROP TABLE tinterval_tbl;"
+;;
+		esac
+
+		# Various things removed for v14
+		case $oldpgversion in
+			804??|9|10|11|12|13)
+# commit 76f412ab3
+# This one is only needed for v11+ ??
+# (see below for more operators removed that also apply to older versions)
+fix_sql="$fix_sql DROP OPERATOR public.!=- (pg_catalog.int8, NONE);"
+;;
+		esac
+		case $oldpgversion in
+			804??|9|10|11|12|13)
+# commit 76f412ab3
+fix_sql="$fix_sql DROP OPERATOR public.#@# (pg_catalog.int8, NONE);"
+fix_sql="$fix_sql DROP OPERATOR public.#%# (pg_catalog.int8, NONE);"
+fix_sql="$fix_sql DROP OPERATOR public.#@%# (pg_catalog.int8, NONE);"
+
+# commit 9e38c2bb5 and 97f73a978
+# fix_sql="$fix_sql DROP AGGREGATE array_larger_accum(anyarray);"
+fix_sql="$fix_sql DROP AGGREGATE array_cat_accum(anyarray);"
+fix_sql="$fix_sql DROP AGGREGATE first_el_agg_any(anyelement);"
+
+# commit 76f412ab3
+#fix_sql="$fix_sql DROP OPERATOR @#@(bigint,NONE);"
+fix_sql="$fix_sql DROP OPERATOR @#@(NONE,bigint);"
+;;
+		esac
+
+		# commit 578b22971: OIDS removed in v12
+		case $oldpgversion in
+			804??|9|10|11)
+fix_sql="$fix_sql ALTER TABLE public.tenk1 SET WITHOUT OIDS;"
+fix_sql="$fix_sql ALTER TABLE public.tenk1 SET WITHOUT OIDS;"
+#fix_sql="$fix_sql ALTER TABLE public.stud_emp SET WITHOUT OIDS;" # inherited
+fix_sql="$fix_sql ALTER TABLE public.emp SET WITHOUT OIDS;"
+fix_sql="$fix_sql ALTER TABLE public.tt7 SET WITHOUT OIDS;"
+;;
+		esac
+
 		psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
 	fi
 
-	pg_dumpall --no-sync -f "$temp_root"/dump1.sql || pg_dumpall1_status=$?
+	pg_dumpall --extra-float-digits=0 --no-sync -f "$temp_root"/dump1.sql || pg_d

Re: range_agg

2020-12-27 Thread Alexander Korotkov
On Sun, Dec 27, 2020 at 8:52 PM Zhihong Yu  wrote:
> This is not an ideal way to index multirages, but something we can easily 
> have.
>
> typo: multiranges

Thanks for catching.  I will revise the commit message before committing.

--
Regards,
Alexander Korotkov




Re: range_agg

2020-12-27 Thread Alexander Korotkov
On Sun, Dec 27, 2020 at 9:07 PM David Fetter  wrote:
> On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> > This is not an ideal way to index multirages, but something we can
> > easily have.
>
> What sort of indexing improvements do you have in mind?

Approximating a multirange as a single range can cause false positives.
That's fine if the gaps are small, but what if they aren't?

Ideally, we should split a multirange into its ranges and index them
separately.  So, we would need a GIN-like index.  The problem is that
the GIN entry tree is a B-tree, which is not very useful for searching
for ranges.  If we could replace the GIN entry tree with GiST or
SP-GiST, that would be good.  We could then index multirange parts
separately, and big gaps wouldn't be a problem.  Similar work was
already prototyped (under the name "vodka", though I'm not a big fan of
that name).  FWIW, such a new access method would need a lot of work to
bring it to commit.  I don't think it would be reasonable before
multiranges get popular.

Regarding the GiST opclass, this seems to be the best we can do in GiST.
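
For concreteness, the user-visible result of the 0005 patch would be roughly
the following (a sketch, not taken from the patch's regression tests; table
name is made up):

  CREATE TABLE t (mr int4multirange);
  CREATE INDEX ON t USING gist (mr);
  SELECT * FROM t WHERE mr && int4multirange(int4range(1, 10));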

--
Regards,
Alexander Korotkov




Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Zhihong Yu
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:

+   if (ignore &&
+   (root->parse->CTASParallelInsInfo &
+CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))

I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the
above if since when ignore_parallel_tuple_cost returns
true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already.

+ * In this function we only care Append and Gather nodes.

'care' -> 'care about'

+   for (int i = 0; i < aps->as_nplans; i++)
+   {
+   parallel |= PushDownCTASParallelInsertState(dest,
+   aps->appendplans[i],
+   gather_exists);

It seems the loop termination condition can include parallel since we can
come out of the loop once parallel is true.

+   if (!allow && tuple_cost_flags && gather_exists)

As the above code shows, gather_exists is only checked when allow is false.

+* We set the flag for two cases when there is no parent path
will
+* be created(such as : limit,sort,distinct...):

Please correct the grammar : there are two verbs following 'when'

For set_append_rel_size:

+   {
+   root->parse->CTASParallelInsInfo |=
+
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+   }
+   }
+
+   if (root->parse->CTASParallelInsInfo &
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+   {
+   root->parse->CTASParallelInsInfo &=
+
~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

In the if block for childrel->rtekind ==
RTE_SUBQUERY, CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it
cleared immediately after?

+   /* Set to this in case tuple cost needs to be ignored for Append cases.
*/
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3

Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn
on' or a similar term in the comment, because 'set to' normally means
assignment.

Cheers

On Sun, Dec 27, 2020 at 12:50 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar 
> wrote:
> > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > continue to review this.
>
> Thanks a lot.
>
> > 1.
> > Why is this temporary hack? and what is the plan for removing this hack?
>
> The changes in xact.c, xact.h and heapam.c are common to all the
> parallel insert patches - COPY, INSERT INTO SELECT. That was the
> initial comment, I forgot to keep it in sync with the other patches.
> Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan
> was to have these code in all the parallel inserts patch, whichever
> gets to review and commit first, others will update their patches
> accordingly.
>
> > 2.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> > +if (!IS_CTAS(into))
> > +return;
> >
> > When will this hit?  The functtion name suggest that it is from CTAS
> > but now you have a check that if it is
> > not for CTAS then return,  can you add the comment that when do you
> > expect this case?
>
> Yes it will hit for explain cases, but I choose to remove this and
> check outside in the explain something like:
> if (into)
> ChooseParallelInsertsInCTAS()
>
> > Also the function name should start in a new line
> > i.e
> > void
> > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
>
> Ah, missed that. Modified now.
>
> > 3.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> >
> > Push down to the Gather nodes?  I think the right statement will be
> > push down below the Gather node.
>
> Modified.
>
> > 4.
> > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > {
> >  DR_intorel *myState = (DR_intorel *) self;
> >
> > -- Comment ->in parallel worker we don't need to crease dest recv
> blah blah
> > +if (myState->is_parallel_worker)
> > {
> > --parallel worker handling--
> > return;
> > }
> >
> > --non-parallel worker code stay right there, instead of moving to
> else
>
> Done.
>
> > 5.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> >
> > From function name and comments it appeared that this function will
> > return boolean saying whether
> > Parallel insert should be selected or not.  I t

hash_extension(text)

2020-12-27 Thread Joel Jacobson
Hi,

When developing extensions, I find myself spending time manually checking
whether my update scripts (invoked via ALTER EXTENSION ... UPDATE) give the
same result as CREATE EXTENSION ... would when installing the latest version
from scratch, without any previous version.

I thought it would be efficient if one could quickly detect such a difference.

If there were a way to concatenate all extension objects and their
definitions in a deterministic order, a hash could be computed and then
used to detect differences.

Below is my attempt to implement this idea.

The strategy is fragile as it doesn't handle all the regclasses,
so false negatives are possible, but I don't think false positives should be 
possible,
if implemented correctly.

Is there a better way to solve my problem already?

Feedback welcome.

Best regards,

Joel

SELECT extversion FROM pg_extension WHERE extname = 'uniphant';
extversion

1.0
(1 row)

SELECT hash_extension('uniphant');
hash_extension

-1425682867
(1 row)

ALTER EXTENSION uniphant UPDATE;

SELECT hash_extension('uniphant');
hash_extension

-1615520783
(1 row)

DROP EXTENSION uniphant;

CREATE EXTENSION uniphant;

SELECT hash_extension('uniphant');
hash_extension

-1615520783
(1 row)


CREATE OR REPLACE FUNCTION hash_extension(text)
RETURNS integer
STABLE
LANGUAGE sql
AS $$
--
-- Constructs a text string containing most of all the extension objects
-- and their create definitions.
--
-- This is useful to detect a diff between the result of
--
--ALTER EXTENSION ... UPDATE;
--SELECT hash_extension(...);
--
-- compared to if one would install the latest version
-- of the extension from scratch using
--
--CREATE EXTENSION ...;
--SELECT hash_extension(...);
--
-- This could happen if the author of the extension
-- made a mistake in the update scripts.
--
-- This function is meant to be useful to check
-- the correctness of such update scripts.
--
SELECT hashtext(jsonb_agg(jsonb_build_array(
  pg_describe_object,
  CASE classid
  WHEN 'pg_namespace'::regclass THEN (
SELECT jsonb_build_array(pg_roles.rolname, pg_namespace.nspacl)
FROM pg_namespace
JOIN pg_roles ON pg_roles.oid = pg_namespace.nspowner
WHERE pg_namespace.oid = q.objid
  )
  WHEN 'pg_proc'::regclass THEN jsonb_build_array(pg_get_functiondef(objid))
  WHEN 'pg_class'::regclass THEN (
SELECT jsonb_agg(jsonb_build_array(
  a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid, true) for 
128)
FROM pg_catalog.pg_attrdef d
WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
  a.attnotnull,
  (SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
WHERE c.oid = a.attcollation AND t.oid = a.atttypid
AND a.attcollation <> t.typcollation),
  a.attidentity,
  a.attgenerated
) ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = q.objid
AND a.attnum > 0
AND NOT a.attisdropped
  )
  END,
  classid::regclass
) ORDER BY pg_describe_object)::text)
FROM (
  SELECT pg_describe_object(classid, objid, 0), classid::regclass, objid
  FROM pg_depend
  WHERE refclassid = 'pg_extension'::regclass
  AND refobjid = (SELECT oid FROM pg_extension WHERE extname = $1)
) AS q
$$;



vacuum_cost_page_miss default value and modern hardware

2020-12-27 Thread Peter Geoghegan
vacuum_cost_page_miss has a default value of 10, while
vacuum_cost_page_dirty has a default value of 20. This has been the
case since cost-based delays were introduced by commit f425b605f4e
back in 2004. The obvious implication is that dirtying a page is on
average only twice as expensive as a single shared_buffers miss (that
doesn't dirty the page). While that might have made sense back in
2004, when magnetic disks were the norm, it seems questionable now.
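
(For reference, the full set of cost parameters involved can be checked with
something like the following; vacuum_cost_page_hit's default is 1:)

  SELECT name, boot_val
  FROM pg_settings
  WHERE name LIKE 'vacuum_cost_page%';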

The trend these days is that the total number of dirty pages is the
limiting factor for OLTP workloads. This is a broad trend among all
disk-based RDBMSs with an ARIES style design. It is more or less a
result of long term trends in main memory size scaling and flash
storage. It's rare for OLTP workloads to be truly I/O bound, and yet
the backpressure from page cleaning/checkpointing becomes a
bottleneck. In short, writes are way more expensive than reads -- the
*relative* cost of writes has increased significantly (our use of the
OS FS cache makes this even worse).

I suspect that this trend will become even more important for Postgres
in the coming years, but that's not what I want to talk about right
now. I just want to talk about vacuum_cost_page_miss on this thread.

Simply decreasing vacuum_cost_page_miss seems like a low-risk way of
making the VACUUM costing more useful within autovacuum workers.
Halving vacuum_cost_page_miss to 5 would be a good start, though I
think that a value as low as 2 would be better. That would make it
only 2x vacuum_cost_page_hit's default (i.e. 2x the cost of processing
a page that is in shared_buffers but did not need to be dirtied),
which seems sensible to me when considered in the context in which the
value is actually applied (and not some abstract theoretical context).
There are a few reasons why this seems like a good idea now:

* Throttling/delaying VACUUM is only useful as a way of smoothing the
impact on production queries, which is an important goal, but
currently we don't discriminate against the cost that we really should
keep under control (dirtying new pages within VACUUM) very well.

This is due to the aforementioned trends, the use of a strategy ring
buffer by VACUUM, the fact that indexes are usually vacuumed in
sequential physical order these days, and many other things that were
not a factor in 2004.

* There is a real downside to throttling VACUUM unnecessarily, and the
effects are *non-linear*. On a large table, the oldest xmin cutoff may
become very old by the time we're only (say) half way through the
initial table scan in lazy_scan_heap(). There may be relatively little
work to do because most of the dead tuples won't be before the oldest
xmin cutoff by that time (VACUUM just cannot keep up). Excessive
throttling for simple page misses may actually *increase* the amount
of I/O that VACUUM has to do over time -- we should try to get to the
pages that actually need to be vacuumed quickly, which are probably
already dirty anyway (and thus are probably going to add little to the
cost delay limit in practice). Everything is connected to everything
else.

* vacuum_cost_page_miss is very much not like random_page_cost, and
the similar names confuse the issue -- this is not an optimization
problem. Thinking about VACUUM as unrelated to the workload itself is
obviously wrong. Changing the default is also an opportunity to clear
that up.

Even if I am wrong to suggest that a miss within VACUUM should only be
thought of as 2x as expensive as a hit in some *general* sense, I am
concerned about *specific* consequences. There is no question about
picking the best access path here -- we're still going to have to
access the same blocks in the same way sooner or later. In general I
think that we should move in the direction of more frequent, cheaper
VACUUM operations [1], though we've already done a lot of work in that
direction (e.g. freeze map work).

* Some impact from VACUUM on query performance may in fact be a good thing.

Deferring the cost of vacuuming can only make sense if we'll
eventually be able to catch up because we're experiencing a surge in
demand, which seems kind of optimistic -- it seems more likely that
the GC debt will just grow and grow. Why should the DBA not expect to
experience some kind of impact, which could be viewed as a natural
kind of backpressure? The most important thing is consistent
performance.

* Other recent work such as the vacuum_cleanup_index_scale_factor
patch has increased the relative cost of index vacuuming in some
important cases: we don't have a visibility/freeze map for indexes,
but index vacuuming that doesn't dirty any pages and has a TID kill
list that's concentrated at the end of the heap/table is pretty cheap
(the TID binary search is cache efficient/cheap). This change will
help these workloads by better reflecting the way in which index
vacuuming can be cheap for append-only tables with a small amount of
garbage for recently inserted tuples that also got u

Re: doc review for v14

2020-12-27 Thread Justin Pryzby
On Thu, Dec 24, 2020 at 05:12:02PM +0900, Michael Paquier wrote:
> I have applied most of it on HEAD, except 0011 and the things noted
> above.  Thanks again.

Thank you.

I see that I accidentally included ZSTD_COMPRESSION in pg_backup_archiver.h
while cherry-picking from the branch where I first fixed this.  Sorry :(

> 0001-pgindent-typos.not-a-patch touches pg_bsd_indent.

I'm hoping that someone will apply it there, but I realize that access to its
repository is tightly controlled :)

On Thu, Dec 24, 2020 at 05:12:02PM +0900, Michael Paquier wrote:
> Restraining more the set of options is something to consider, though
> it could be annoying.  I have discarded this one for now.

Even though its -d is unused, I guess that since failing wouldn't serve any
significant purpose, we shouldn't make pg_restore -l -d fail for no reason.

I think a couple of these should be backpatched.
doc/src/sgml/ref/pg_dump.sgml
doc/src/sgml/sources.sgml
doc/src/sgml/cube.sgml?
doc/src/sgml/func.sgml?

-- 
Justin
>From 9c24fa421e423edb29fc866a70e935843dab2804 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 14 Nov 2020 23:09:21 -0600
Subject: [PATCH 1/7] typos in master

---
 doc/src/sgml/datatype.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 58d168c763..55d79c02f5 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -639,7 +639,7 @@ NUMERIC
 
 
  The NaN (not a number) value is used to represent
- undefined calculational results.  In general, any operation with
+ undefined computational results.  In general, any operation with
  a NaN input yields another NaN.
  The only exception is when the operation's other inputs are such that
  the same output would be obtained if the NaN were to
-- 
2.17.0

>From b4f8290d322a2d783352aff3a4b134acafa0a13c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 25 Oct 2020 13:53:08 -0500
Subject: [PATCH 2/7] producement?  fcec6caafa2346b6c9d3ad5065e417733bd63cd9

Previously discussed at
https://www.postgresql.org/message-id/20201102062203.GA15770%40paquier.xyz

Should backpatch
---
 src/backend/utils/adt/xml.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 4c299057a6..ab3151cefb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4535,11 +4535,11 @@ XmlTableFetchRow(TableFuncScanState *state)
 	xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableFetchRow");
 
 	/*
-	 * XmlTable returns table - set of composite values. The error context, is
-	 * used for producement more values, between two calls, there can be
-	 * created and used another libxml2 error context. It is libxml2 global
-	 * value, so it should be refreshed any time before any libxml2 usage,
-	 * that is finished by returning some value.
+	 * XmlTable returns a table-set of composite values. The error context is
+	 * used for providing more detail. Between two calls, other libxml2
+	 * error contexts might have been created and used ; since they're libxml2 
+	 * global values, they should be refreshed each time before any libxml2 usage
+	 * that finishes by returning some value.
 	 */
 	xmlSetStructuredErrorFunc((void *) xtCxt->xmlerrcxt, xml_errorHandler);
 
-- 
2.17.0

>From e2b22163feb706ecba670108b6760526ddb1d57b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 15 Nov 2020 10:23:32 -0600
Subject: [PATCH 3/7] cannot

---
 doc/src/sgml/protocol.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4899bacda7..8fc1427d0a 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1532,8 +1532,8 @@ SELCT 1/0;
 support to PostgreSQL.  In this case the
 connection must be closed, but the frontend might choose to open a fresh
 connection and proceed without requesting GSSAPI
-encryption.  Given the length limits specified above, the ErrorMessage can
-not be confused with a proper response from the server with an appropriate
+encryption.  Given the length limits specified above, the ErrorMessage
+cannot be confused with a proper response from the server with an appropriate
 length.

 
-- 
2.17.0

>From 4f78473ed9063464e2510bbd433e3ebc3b9a773e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 19 Dec 2020 03:30:25 -0600
Subject: [PATCH 4/7] Doc review for min_dynamic_shared_memory: 84b1c63ad

---
 doc/src/sgml/config.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ce1fed7dfb..6b2acab9f4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1941,13 +1941,13 @@ include_dir 'conf.d'
   

 Specifies the amount of memory that should be allocated at server
-startup time for use by parallel queries.  

Re: Feature request: Connection string parsing for postgres_fdw

2020-12-27 Thread Eric Hanson
Whoa that's perfect!  Thank you so much.
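
For the archives, a rough sketch of piping the parsed keywords into CREATE
SERVER options (assumes the conninfo extension quoted below is installed; the
server name remote1 is made up):

  SELECT format(
           'CREATE SERVER remote1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (%s)',
           string_agg(format('%I %L', keyword, value), ', '))
  FROM conninfo_parse('postgresql://localhost:5433');

The generated statement could then be run, e.g. with EXECUTE inside a DO block.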

On Thu, Dec 24, 2020 at 4:59 PM Ian Lawrence Barwick 
wrote:

> 2020年12月23日(水) 22:05 Eric Hanson :
> >
> > I'm trying to store connections to postgres_fdw in the database.  I want
> > to be able to store the full breadth of connection styles and all the
> > different types of connections that libpq supports, but I'm having some
> > trouble.
> >
> > Postgres_fdw wants options passed into CREATE SERVER, all broken out
> > into separate variables, but this format is neither a connection URI nor a
> > keyword/value string.  One could imagine writing some parser that extracts
> > all the variables and honors collisions in the same way libpq does (like
> > when both the path and the query string specify a port), but I really don't
> > want to recreate that.
> >
> > It would be really nice if I could tap into libpq's connection string
> > parser from SQL and use it to extract all the variables in a given
> > connection string, so that I can then pipe those into CREATE SERVER
> > options.  Either that, or teach postgres_fdw how to consume standard
> > connection strings.
>
> Does something like this do what you're looking for?
>
> postgres=# SELECT * FROM conninfo_parse('host=node1 user=foo');
>  keyword | value
> -+---
>  user| foo
>  host| node1
> (2 rows)
>
> postgres=# SELECT * FROM conninfo_parse('postgresql://localhost:5433');
>  keyword |   value
> -+---
>  host| localhost
>  port| 5433
> (2 rows)
>
> Basically a wrapper around PQconninfoParse(), I've had the code knocking
> around
> for a while now and finally got round to packaging it into an
> extension [1]. It's
> on my todo list to submit a patch based on this to core, as it seems
> Generally
> Useful To Have.
>
> [1] https://github.com/ibarwick/conninfo
>
> Regards
>
> Ian Barwick
> --
> EnterpriseDB: https://www.enterprisedb.com
>


Re: Rethinking plpgsql's assignment implementation

2020-12-27 Thread Pavel Stehule
On Sat, Dec 26, 2020 at 19:00 Pavel Stehule 
wrote:

> Hi
>
> I repeated the tests. I wrote a set of simple functions. It is a synthetic
> test, but I think it can identify potential problems well.
>
> I calculated the average of 3 runs and checked the performance of each
> function. I didn't find any problem. The total execution time is good too.
> The patched code is about 11% faster than master (14 sec vs 15.8 sec). So there
> is new, important functionality with nice performance benefits.
>
> make check-world passed
>

I played with the plpgsql_check tests and again I didn't find any significant
issue with this patch. I am very satisfied with the implementation.

Now the behavior of SELECT INTO lags behind the assignment statement, and this
fact should be documented. Usually we don't need to use array fields
here, but somebody might try it.

Regards

Pavel



> Regards
>
> Pavel
>
>
>
>


Re: Rethinking plpgsql's assignment implementation

2020-12-27 Thread Tom Lane
Pavel Stehule  writes:
> Now the behavior of SELECT INTO lags behind the assignment statement, and this
> fact should be documented. Usually we don't need to use array fields
> here, but somebody might try it.

It's been behind all along --- this patch didn't really change that.
But I don't mind documenting it more clearly.

regards, tom lane




Re: Cache relation sizes?

2020-12-27 Thread Andy Fan
On Thu, Dec 24, 2020 at 6:59 AM Thomas Munro  wrote:

> On Thu, Dec 17, 2020 at 10:22 PM Andy Fan 
> wrote:
> > Let me try to understand your point.  Suppose process 1 extends a file to
> > 2 blocks from 1 block, and fsync is not called, then a). the lseek *may*
> still
> > return 1 based on the comments in the ReadBuffer_common ("because
> > of buggy Linux kernels that sometimes return an seek SEEK_END result
> > that doesn't account for a recent write").  b). When this patch is
> introduced,
> > we would get 2 blocks for sure if we can get the size from cache. c).
> After
> > user reads the 2 blocks from cache and then the cache is evicted, and
> user
> > reads the blocks again, it is possible to get 1 again.
> >
> > Do we need to consider case a) in this patch? and Is case c). the
> situation you
> > want to avoid and do we have to introduce fsync to archive this?
> Basically I
> > think case a) should not happen by practice so we don't need to handle
> case
> > c) and finally we don't need the fsync/SR_SYNCING/SR_JUST_DIRTIED flag.
> > I'm not opposed to adding them, I am just interested in the background
> in case
> > you are also willing to discuss it.
>
> Sorry for the lack of background -- there was some discussion on the
> thread "Optimize dropping of relation buffers using dlist", but it's
> long and mixed up with other topics.  I'll try to summarise here.
>
> It's easy to reproduce files shrinking asynchronously on network
> filesystems that reserve space lazily[1],


Thanks for the explanation. After reading that thread, I am more confused
now. In [1], we have the test result below.

$ cat magic_shrinking_file.c
#include 
#include 
#include 
#include 

int main()
{
int fd;
char buffer[8192] = {0};

fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
if (fd < 0) {
perror("open");
return EXIT_FAILURE;
}
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
printf("write(...) = %zd\n", write(fd, buffer, sizeof(buffer)));
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
printf("fsync(...) = %d\n", fsync(fd));
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));

return EXIT_SUCCESS;
}
$ cc magic_shrinking_file.c
$ ./a.out
lseek(..., SEEK_END) = 9670656
write(...) = 8192
lseek(..., SEEK_END) = 9678848
fsync(...) = -1
lseek(..., SEEK_END) = 9670656





I take two pieces of information from the above: a) before the fsync,
lseek(fd, 0, SEEK_END) returns a correct value, but after the fsync it
returns a wrong value; b) the fsync failed (its return value is -1) in the
above case.  I'm more confused because of point a).  Since fsync can't
correct a wrong value, what is the point of calling fsync in this patch?
(I agree that it is good to fix this reliability problem within this patch,
but I'm doubtful that fsync can help in this case.)  Am I missing something
obvious?

I read your patch in detail some days ago and feel it is in pretty good
shape, and I think you understand the existing issues very well, such as:
a) smgr_alloc_sr uses a FIFO design;  b) SR_PARTITION_LOCK doesn't really
use a partition lock;  c) you seem to have some concerns about the
concurrency issues, in which I didn't find anything wrong;  d) how to
handle a DIRTY sr during eviction.  I planned to recheck item c) before
this reply, but time did not permit.  May I know what your plan is for
this patch?

[1]
https://www.postgresql.org/message-id/CA%2BhUKGLZTfKuXir9U4JQkY%3Dk3Tb6M_3toGrGOK_fa2d4MPQQ_w%40mail.gmail.com

-- 
Best Regards
Andy Fan


Tying an object's ownership to datdba

2020-12-27 Thread Noah Misch
https://postgr.es/m/20201031163518.gb4039...@rfd.leadboat.com gave $SUBJECT as
one of the constituent projects for changing the public schema default ACL.
This feature stands on its own, hence the new thread.  I showed syntax "ALTER
SCHEMA public OWNER TO DATABASE_OWNER" and anticipated a new RoleSpecType.
That was fine for in-memory representation, but it lacked a value to store in
pg_namespace.nspowner.  That led me to instead add a default role,
"pg_database_owner".  That way, user queries joining owner columns to
pg_authid need no modification.  Also, user.c changes were trivial, and
pg_dump needed no changes.  The role's membership consists, implicitly, of the
current database owner.  The first patch refactors acl.c to reduce code
duplication, and the second patch adds the feature.
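
With the patch, the earlier example would presumably be written using the new
role directly, e.g.:

  ALTER SCHEMA public OWNER TO pg_database_owner;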

I ended up blocking DDL that creates role memberships involving the new role;
see reasons in user.c comments.  Lifting those restrictions looked feasible,
but it was inessential to the mission, and avoiding unintended consequences
would have been tricky.  Views "information_schema.enabled_roles" and
"information_schema.applicable_roles" list any implicit membership in
pg_database_owner, but pg_catalog.pg_group and psql \dgS do not.  If this
leads any reviewer to look closely at applicable_roles, note that its behavior
doesn't match its documentation
(https://postgr.es/m/flat/20060728170615.gy20...@kenobi.snowman.net).

This patch makes us read pg_database when reading pg_auth_members.
IndexScanOK() has been saying "during backend startup we have to be able to
use the pg_authid and pg_auth_members syscaches for authentication".  While
that's true of pg_authid, I found no sign of pg_auth_members reads that early.
(The read in InitPostgres() -> CheckMyDatabase() -> pg_database_aclcheck()
happens after RelationCacheInitializePhase3().  In a physical walsender, which
never has a mature relcache, some SHOW commands make guc.c check
DEFAULT_ROLE_READ_ALL_SETTINGS.  The walsender case, though it happens after
authentication, may necessitate IndexScanOK()'s treatment of pg_auth_members.)
For now, just in case I missed the early read, I've made IndexScanOK() treat
pg_database like pg_auth_members.

Thanks,
nm
Author: Noah Misch 
Commit: Noah Misch 

Merge similar algorithms into roles_is_member_of().

The next commit would have complicated two or three algorithms, so take
this opportunity to consolidate.  No functional changes.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index fe6c444..1adacb9 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -50,32 +50,24 @@ typedef struct
 /*
  * We frequently need to test whether a given role is a member of some other
  * role.  In most of these tests the "given role" is the same, namely the
- * active current user.  So we can optimize it by keeping a cached list of
- * all the roles the "given role" is a member of, directly or indirectly.
- *
- * There are actually two caches, one computed under "has_privs" rules
- * (do not recurse where rolinherit isn't true) and one computed under
- * "is_member" rules (recurse regardless of rolinherit).
+ * active current user.  So we can optimize it by keeping cached lists of all
+ * the roles the "given role" is a member of, directly or indirectly.
  *
  * Possibly this mechanism should be generalized to allow caching membership
  * info for multiple roles?
  *
- * The has_privs cache is:
- * cached_privs_role is the role OID the cache is for.
- * cached_privs_roles is an OID list of roles that cached_privs_role
- * has the privileges of (always including itself).
- * The cache is valid if cached_privs_role is not InvalidOid.
- *
- * The is_member cache is similarly:
- * cached_member_role is the role OID the cache is for.
- * cached_membership_roles is an OID list of roles that cached_member_role
- * is a member of (always including itself).
- * The cache is valid if cached_member_role is not InvalidOid.
+ * Each element of cached_roles is an OID list of constituent roles for the
+ * corresponding element of cached_role (always including the cached_role
+ * itself).  One cache has ROLERECURSE_PRIVS semantics, and the other has
+ * ROLERECURSE_MEMBERS semantics.
  */
-static Oid cached_privs_role = InvalidOid;
-static List *cached_privs_roles = NIL;
-static Oid cached_member_role = InvalidOid;
-static List *cached_membership_roles = NIL;
+enum RoleRecurseType
+{
+   ROLERECURSE_PRIVS = 0,  /* recurse if rolinherit */
+   ROLERECURSE_MEMBERS = 1 /* recurse unconditionally */
+};
+static Oid cached_role[] = {InvalidOid, InvalidOid};
+static List *cached_roles[] = {NIL, NIL};
 
 
 static const char *getid(const char *s, char *n);
@@ -4675,8 +4667,8 @@ initialize_acl(void)
{
/*
 * In normal mode, set a callbac
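
The gist of the refactoring is that roles_is_member_of() and the invalidation code
index a single pair of arrays by RoleRecurseType instead of duplicating the logic
over two sets of variables. A minimal sketch of the access pattern, assuming only
the declarations shown above (the variable names roleid/otherid are illustrative,
not the patch's exact code):

	enum RoleRecurseType type = ROLERECURSE_PRIVS;	/* or ROLERECURSE_MEMBERS */

	/* sketch: reuse the cached membership list when it was built for this role */
	if (OidIsValid(cached_role[type]) && cached_role[type] == roleid)
		return list_member_oid(cached_roles[type], otherid);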

Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Dilip Kumar
On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
 wrote:
>
> On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar  wrote:
> > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > continue to review this.
>
> Thanks a lot.
>
> > 1.
> > Why is this temporary hack? and what is the plan for removing this hack?
>
> The changes in xact.c, xact.h and heapam.c are common to all the
> parallel insert patches - COPY, INSERT INTO SELECT. That was the
> initial comment; I forgot to keep it in sync with the other patches.
> Now, I have used the comment from the INSERT INTO SELECT patch. IIRC, the plan
> was to have this code in all the parallel insert patches; whichever
> gets reviewed and committed first, the others will update their patches
> accordingly.
>
> > 2.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> > +if (!IS_CTAS(into))
> > +return;
> >
> > When will this hit?  The function name suggests that it is called from CTAS,
> > but now you have a check that returns if it is
> > not for CTAS; can you add a comment about when you
> > expect this case?
>
> Yes, it will hit for EXPLAIN cases, but I chose to remove this and do the
> check outside in the EXPLAIN code, something like:
> if (into)
> ChooseParallelInsertsInCTAS()
>
> > Also the function name should start in a new line
> > i.e
> > void
> > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
>
> Ah, missed that. Modified now.
>
> > 3.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > down
> > + * the dest receiver to the Gather nodes.
> > + */
> >
> > Push down to the Gather nodes?  I think the right statement would be
> > "push down below the Gather node".
>
> Modified.
>
> > 4.
> > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > {
> >  DR_intorel *myState = (DR_intorel *) self;
> >
> > -- Comment -> in parallel worker we don't need to create dest recv blah
> > blah
> > +if (myState->is_parallel_worker)
> > {
> > --parallel worker handling--
> > return;
> > }
> >
> > --non-parallel worker code stay right there, instead of moving to else
>
> Done.
>
> > 5.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> >
> > From the function name and comments, it appeared that this function would
> > return a boolean saying whether
> > parallel insert should be selected or not.  I think the name/comment
> > should be improved for this.
>
> Yeah, that function can still return void because there's no point in returning
> a bool there: the intention is to see whether parallel inserts can be
> performed and, if yes, set the state; otherwise exit. I changed the
> function name to TryParallelizingInsertsInCTAS(). Let me know your
> suggestions if that doesn't work out.
>
> > 6.
> > /*
> > + * For parallelizing inserts in CTAS i.e. making each parallel 
> > worker
> > + * insert the tuples, we must send information such as into clause 
> > (for
> > + * each worker to build separate dest receiver), object id (for 
> > each
> > + * worker to open the created table).
> >
> > The comment says we need to pass the object id, but the code under this
> > comment is not doing so.
>
> Improved the comment.
>
> > 7.
> > +/*
> > + * Since there are no rows that are transferred from workers to 
> > Gather
> > + * node, so we set it to 0 to be visible in estimated row count of
> > + * explain plans.
> > + */
> > +queryDesc->planstate->plan->plan_rows = 0;
> >
> > This seems a bit hackish. Why is it done after the planning?  I mean,
> > the plan must know that it is returning 0 rows?
>
> This exists to show the estimated row count (in the case of EXPLAIN CTAS
> without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS the actual tuples
> are shown correctly as 0 because Gather doesn't receive any tuples.
> if (es->costs)
> {
> if (es->format == EXPLAIN_FORMAT_TEXT)
> {
> appendStringInfo(es->str, "  (cost=%.2f..%.2f rows=%.0f 
> width=%d)",
>  plan->startup_cost, plan->total_cost,
>  plan->plan_rows, plan->plan_width);
>
> Since it's an estimated row count (which may not always be correct), we
> can let the EXPLAIN plan show that, and I think we can remove that
> part. Thoughts?
>
> I removed it in v6 patch set.
>
> > 8.
> > +char *intoc

Re: [HACKERS] Custom compression methods

2020-12-27 Thread Dilip Kumar
On Sun, Dec 27, 2020 at 12:40 PM Andrey Borodin  wrote:
>
>
>
> > On 25 Dec 2020, at 14:34, Dilip Kumar  wrote:
> >
> >  
> >  
> >  
> >  
> >  
> > 
>
> Maybe add LZ4/zlib WAL FPI compression on top of this patchset? I'm not
> insisting on anything; it would just be so cool to have it...
>
> BTW currently there are Oid collisions in original patchset.

Thanks for the patch.  Maybe we can allow setting custom compression
methods for WAL compression as well.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Rethinking plpgsql's assignment implementation

2020-12-27 Thread Pavel Stehule
On Mon, Dec 28, 2020 at 0:55, Tom Lane  wrote:

> Pavel Stehule  writes:
> > Now, the behavior of SELECT INTO is behind the assignment statement, and this
> > fact should be documented. Usually we don't need to use an array's fields
> > here, but somebody might try it.
>
> It's been behind all along --- this patch didn't really change that.
> But I don't mind documenting it more clearly.
>

ok

Pavel


> regards, tom lane
>


Re: Disable WAL logging to speed up data loading

2020-12-27 Thread Masahiko Sawada
On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hello
>
>
> I've made a new patch v05 that incorporated comments
> to filter out WAL records more strictly and addressed some minor fixes
> that were discussed within the past few days.
> Also, I changed the documentation, considering those modifications.
>

From a backup management tool perspective, how can it detect that
wal_level has been changed to 'none' (and back to normal)? IIUC,
once we change wal_level to 'none', old backups that were taken before
setting it to 'none' can be used only for restoring the database to a
point before the LSN where 'wal_level = none' was set. The users can
neither restore the database to any point during the period of 'wal_level =
none' nor use an old backup to restore the database to a point after the
LSN where 'wal_level = none' was set. I think we might need to provide
a way to detect the changes other than reading XLOG_PARAMETER_CHANGE.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread vignesh C
On Sun, Dec 27, 2020 at 2:28 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Sat, Dec 26, 2020 at 9:20 PM vignesh C  wrote:
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment  "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on normal table"
> >
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create unlogged table parallel_write as select length(stringu1) from
tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on unlogged table"
> > Similar comment need to be handled in other places also.
>
> I think the existing comments look fine. The info, like the table type and
> whether the query is CTAS or CMV, is visible by looking at the test case. What I
> wanted from the comments is whether we support parallel inserts or not,
> and if not, why, so that it will be easy to read. I tried to keep them as
> succinct as possible.
>

I saw a few inconsistencies in the patch:

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas

+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas

+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+(select two from (select * from tenk2) as tt limit 1) col2
+from tenk1  where tenk1.four = 3;');
+ explain_pictas

+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;

+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(

Test comments are detailed in a few cases, while in a few others they are not
detailed for similar kinds of parallelism-selected tests. I feel we could
make the test comments consistent across the file.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Custom compression methods

2020-12-27 Thread Andrey Borodin



> On 28 Dec 2020, at 10:20, Dilip Kumar  wrote:
> 
> On Sun, Dec 27, 2020 at 12:40 PM Andrey Borodin  wrote:
>> 
>> 
>> 
>>> On 25 Dec 2020, at 14:34, Dilip Kumar  wrote:
>>> 
>>>  
>>>  
>>>  
>>>  
>>>  
>>> 
>> 
>> Maybe add LZ4/zlib WAL FPI compression on top of this patchset? I'm not
>> insisting on anything; it would just be so cool to have it...
>> 
>> BTW currently there are Oid collisions in original patchset.
> 
> Thanks for the patch.  Maybe we can allow setting custom compression
> methods for WAL compression as well.

No, unfortunately, we can't use truly custom methods. Custom compression
handlers are WAL-logged. So we can use only a static set of hardcoded compression
methods.
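
(Illustration only: such a hardcoded set would boil down to an ordinary enum GUC,
roughly as sketched below. The option names and the WAL_COMPRESSION_* constants are
assumptions made up for the example, not anything that exists today; config_enum_entry
is the usual guc.h struct.)

static const struct config_enum_entry wal_compression_options[] = {
	{"off", WAL_COMPRESSION_NONE, false},
	{"pglz", WAL_COMPRESSION_PGLZ, false},
	{"lz4", WAL_COMPRESSION_LZ4, false},
	{NULL, 0, false}
};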

Thanks!

Best regards, Andrey Borodin.



Re: New IndexAM API controlling index vacuum strategies

2020-12-27 Thread Masahiko Sawada
On Thu, Dec 24, 2020 at 12:59 PM Peter Geoghegan  wrote:
>
> On Tue, Dec 22, 2020 at 2:54 PM Masahiko Sawada  wrote:
> > I've started this separate thread from [1] for discussing the general
> > API design of index vacuum.
>
> This is a very difficult and very important problem. Clearly defining
> the problem is probably the hardest part. This prototype patch seems
> like a good start, though.
>
> Private discussion between Masahiko and myself led to a shared
> understanding of what the best *general* direction is for VACUUM now.
> It is necessary to deal with several problems all at once here, and to
> at least think about several more problems that will need to be solved
> later. If anybody reading the thread initially finds it hard to see
> the connection between the specific items that Masahiko has
> introduced, they should note that that's *expected*.
>
> > Summary:
> >
> > * Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is
> > false, and leave it to the index AM whether or not skip them.
>
> Makes sense. I like the way you unify INDEX_CLEANUP and the
> vacuum_cleanup_index_scale_factor stuff in a way that is now quite
> explicit and obvious in the code.
>
> > The second and third points are to introduce a general framework for
> > future extensibility. User-visible behavior is not changed by this
> > change.
>
> In some ways the ideas in your patch might be considered radical, or
> at least novel: they introduce the idea that bloat can be a
> qualitative thing. But at the same time the design is quite
> conservative: these are fairly isolated changes, at least code-wise. I
> am not 100% sure that this approach will be successful in
> vacuumlazy.c, in the end (I'm ~95% sure). But I am 100% sure that our
> collective understanding of the problems in this area will be
> significantly improved by this effort. A fundamental rethink does not
> necessarily require a fundamental redesign, and yet it might be just
> as effective.
>
> This is certainly what I see when testing my bottom-up index deletion
> patch, which adds an incremental index deletion mechanism that merely
> intervenes in a precise, isolated way. Despite my patch's simplicity,
> it manages to practically eliminate an entire important *class* of
> index bloat (at least once you make certain mild assumptions about the
> duration of snapshots). Sometimes it is possible to solve a hard
> problem by thinking about it only *slightly* differently.
>
> This is a tantalizing possibility for VACUUM, too. I'm willing to risk
> sounding grandiose if that's what it takes to get more hackers
> interested in these questions. With that in mind, here is a summary of
> the high level hypothesis behind this VACUUM patch:
>
> VACUUM can and should be reimagined as a top-down mechanism that
> complements various bottom-up mechanisms (including the stuff from my
> deletion patch, heap pruning, and possibly an enhanced version of heap
> pruning based on similar principles). This will be possible without
> changing any of the fundamental invariants of the current vacuumlazy.c
> design. VACUUM's problems are largely pathological behaviors of one
> kind or another, that can be fixed with specific well-targeted
> interventions. Workload characteristics can naturally determine how
> much of the cleanup is done by VACUUM itself -- large variations are
> possible within a single database, and even across indexes on the same
> table.

Agreed.

Ideally, the bottom-up mechanisms work well and reclaim almost all
garbage. VACUUM should be a feature that complements this work when
the bottom-up mechanisms cannot work well for some reason, and that also is
used to make sure that all collected garbage has been vacuumed. For
heaps, we already have such a mechanism: opportunistic HOT pruning
and lazy vacuum. For indexes, especially btree indexes, bottom-up
index deletion and ambulkdelete() would have a similar relationship.

>
> > The new index AM API, amvacuumstrategy(), which is called before
> > bulkdelete() for each index and asks the index bulk-deletion strategy.
> > On this API, lazy vacuum asks, "Hey index X, I collected garbage heap
> > tuples during heap scanning, how urgent is vacuuming for you?", and
> > the index answers either "it's urgent" when it wants to do
> > bulk-deletion or "it's not urgent, I can skip it". The point of this
> > proposal is to isolate heap vacuum and index vacuum for each index so
> > that we can employ different strategies for each index. Lazy vacuum
> > can decide whether or not to do heap clean based on the answers from
> > the indexes.
>
> Right -- workload characteristics (plus appropriate optimizations at
> the local level) make it possible that amvacuumstrategy() will give
> *very* different answers from different indexes *on the same table*.
> The idea that all indexes on the table are more or less equally
> bloated at any given point in time is mostly wrong. Actually,
> *sometimes* it really is correct! But other times 

RE: Disable WAL logging to speed up data loading

2020-12-27 Thread osumi.takami...@fujitsu.com
Hello Sawada-San


On Monday, December 28, 2020 2:29 PM Masahiko Sawada  
wrote:
> On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > I've made a new patch v05 that took in comments to filter out WALs
> > more strictly and addressed some minor fixes that were discussed
> > within past few days.
> > Also, I changed the documentations, considering those modifications.
> 
> From a backup management tool perspective, how can it detect that
> wal_level has been changed to 'none' (and back to normal)? IIUC, once
> we change wal_level to 'none', old backups that were taken before setting it
> to 'none' can be used only for restoring the database to a point before the
> LSN where 'wal_level = none' was set. The users can neither restore the
> database to any point during the period of 'wal_level = none' nor use an old
> backup to restore the database to a point after the LSN where 'wal_level =
> none' was set. I think we might need to provide a way to detect the changes
> other than reading XLOG_PARAMETER_CHANGE.
In the past, we discussed the backup management tool aspect in [1]
and concluded that this should be another patch, separate from this thread,
because comparing the wal_level changes between snapshots
applies to wal_level = minimal, too. Please have a look at the "second idea"
in the e-mail in [1] and the responses to it.

 [1] - 
https://www.postgresql.org/message-id/OSBPR01MB4888B34B81A6E0DD46B5D063EDE00%40OSBPR01MB4888.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi


Re: New IndexAM API controlling index vacuum strategies

2020-12-27 Thread Peter Geoghegan
On Sun, Dec 27, 2020 at 10:55 PM Masahiko Sawada  wrote:
> > As you said, the next question must be: How do we teach lazy vacuum to
> > not do what gets requested by amvacuumcleanup() when it cannot respect
> > the wishes of one individual indexes, for example when the
> > accumulation of LP_DEAD items in the heap becomes a big problem in
> > itself? That really could be the thing that forces full heap
> > vacuuming, even with several indexes.
>
> You mean requested by amvacuumstrategy(), not by amvacuumcleanup()? I
> think amvacuumstrategy() affects only ambulkdelete(). But when all
> ambulkdelete() calls were skipped at the request of the index AMs, we might want
> to skip amvacuumcleanup() as well.

No, I was asking about how we should decide to do a real VACUUM
(a real ambulkdelete() call) even when no index asks for it because
bottom-up deletion works very well in every index. Clearly we will
need to eventually remove remaining LP_DEAD items from the heap at
some point if nothing else happens -- eventually LP_DEAD items in the
heap alone will force a traditional heap vacuum (which will still have
to go through indexes that have not grown, just to be safe/avoid
recycling a TID that's still in the index).

Postgres heap fillfactor is 100 by default, though I believe it's 90
in another well-known DB system. If you set Postgres heap fillfactor
to 90, you can fit a little over 200 LP_DEAD items in the "extra space"
left behind in each heap page after initial bulk loading/INSERTs take
place that respect our lower fill factor setting. This is about 4x the
number of initial heap tuples in the pgbench_accounts table -- it's
quite a lot!
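
(For anyone who wants to check the arithmetic: with the default 8 kB block size and
4-byte line pointers, and ignoring the page header, special space and alignment, the
back-of-the-envelope calculation is just this small standalone program:)

#include <stdio.h>

int
main(void)
{
	const int	blcksz = 8192;		/* default BLCKSZ */
	const int	fillfactor = 90;	/* the heap fillfactor discussed above */
	const int	lp_size = 4;		/* sizeof(ItemIdData), one line pointer */
	int			spare = blcksz * (100 - fillfactor) / 100;	/* ~819 bytes */

	/* prints 204, i.e. "a little over 200" dead line pointers per page */
	printf("%d dead line pointers fit in the spare space\n", spare / lp_size);
	return 0;
}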

If we pessimistically assume that all updates are non-HOT updates,
we'll still usually have enough space for each logical row to get
updated several times before the heap page "overflows". Even when
there is significant skew in the UPDATEs, the skew is not noticeable
at the level of individual heap pages. We have a surprisingly large
general capacity to temporarily "absorb" extra garbage LP_DEAD items
in heap pages this way. Nobody really cared about this extra capacity
very much before now, because it did not help with the big problem of
index bloat that you naturally see with this workload. But that big
problem may go away soon, and so this extra capacity may become
important at the same time.

I think that it could make sense for lazy_scan_heap() to maintain
statistics about the number of LP_DEAD items remaining in each heap
page (just local stack variables). From there, it can pass the
statistics to the choose_vacuum_strategy() function from your patch.
Perhaps choose_vacuum_strategy() will notice that the heap page with
the most LP_DEAD items encountered within lazy_scan_heap() (among
those encountered so far in the event of multiple index passes) has
too many LP_DEAD items -- this indicates that there is a danger that
some heap pages will start to "overflow" soon, which is now a problem
that lazy_scan_heap() must think about. Maybe if the "extra space"
left by applying heap fill factor (with settings below 100) is
insufficient to fit perhaps 2/3 of the LP_DEAD items needed on the
heap page that has the most LP_DEAD items (among all heap pages), we
stop caring about what amvacuumstrategy()/the indexes say. So we do
the right thing for the heap pages, while still mostly avoiding index
vacuuming and the final heap pass.
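
To make that concrete, here is a purely hypothetical sketch of such a check;
choose_vacuum_strategy() itself comes from Masahiko's patch, but the helper below,
its name, its arguments and the hard-coded 2/3 are illustrative assumptions rather
than patch code (and it presumes the usual backend headers for BLCKSZ and ItemIdData):

static bool
lpdead_overflow_forces_index_vacuum(int max_lpdead_items_needed, int heap_fillfactor)
{
	int			spare_bytes = BLCKSZ * (100 - heap_fillfactor) / 100;
	int			lpdead_capacity = spare_bytes / sizeof(ItemIdData);

	/* is the "extra space" insufficient to fit ~2/3 of the LP_DEAD items needed? */
	return lpdead_capacity < (2 * max_lpdead_items_needed) / 3;
}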

I experimented with this today, and I think that it is a good way to
do it. I like the idea of choose_vacuum_strategy() understanding that
heap pages that are subject to many non-HOT updates have a "natural
extra capacity for LP_DEAD items" that it must care about directly (at
least with non-default heap fill factor settings). My early testing
shows that it will often take a surprisingly long time for the most
heavily updated heap page to have more than about 100 LP_DEAD items.

> > I will need to experiment in order to improve my understanding of how
> > to make this cooperate with bottom-up index deletion. But that's
> > mostly just a question for my patch (and a relatively easy one).
>
> Yeah, I think we might need something like statistics about garbage
> per index so that each individual index can make a different decision based
> on its status. For example, a btree index might want to skip
> ambulkdelete() if it has only a few dead index tuples in its leaf pages. They
> could be kept in the stats collector or on btree's meta page.

Right. I think that even a very conservative approach could work well.
For example, maybe we teach nbtree's amvacuumstrategy() routine to ask
to do a real ambulkdelete(), except in the extreme case where the
index is *exactly* the same size as it was after the last VACUUM.
This will happen regularly with bottom-up index deletion. Maybe that
approach is a bit too conservative, though.
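
As a throwaway illustration of that rule (the helper name, and where the size
recorded by the previous VACUUM would come from, are assumptions, not anything in
the patch; it presumes the usual backend headers):

static bool
bt_wants_bulkdelete(Relation index, BlockNumber nblocks_at_last_vacuum)
{
	/* ask for a real ambulkdelete() pass unless the index has not grown at all */
	return RelationGetNumberOfBlocks(index) != nblocks_at_last_vacuum;
}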

-- 
Peter Geoghegan