Re: [PATCH] Increase the maximum value track_activity_query_size

2019-12-22 Thread Julien Rouhaud
On Sun, Dec 22, 2019 at 1:03 AM Tomas Vondra
 wrote:
>
> On Sat, Dec 21, 2019 at 04:25:05PM -0500, Tom Lane wrote:
> >Nikolay Samokhvalov  writes:
> >> Here is what ORMs do:
> >> select length('SELECT "column_name_1001", "column_name_1002",
> >> "column_name_1003", "column_name_1004", "column_name_1005",
> >> "column_name_1006", "column_name_1007", "column_name_1008",
> >> "column_name_1009", "column_name_1010", "column_name_1011",
> >> "column_name_1012", "column_name_1013", "column_name_1014",
> >> "column_name_1015", "column_name_1016", "column_name_1017",
> >> "column_name_1018", "column_name_1019", "column_name_1020",
> >> "column_name_1021", "column_name_1022", "column_name_1023",
> >> "column_name_1024", "column_name_1025", "column_name_1026",
> >> "column_name_1027", "column_name_1028", "column_name_1029",
> >> "column_name_1030", "column_name_1031", "column_name_1032",
> >> "column_name_1033", "column_name_1034", "column_name_1035",
> >> "column_name_1036", "column_name_1037", "column_name_1038",
> >> "column_name_1039", "column_name_1040", "column_name_1041",
> >> "column_name_1042", "column_name_1043", "column_name_1044",
> >> "column_name_1045", "column_name_1046", "column_name_1047",
> >> "column_name_1048", "column_name_1049", "column_name_1050" FROM
> >> "some_table";');
> >>  length
> >> 
> >>1024
> >> (1 row)
> >
> >> That's it – with default settings, you won't see WHERE clause or
> >> anything else.
> >
> >If that's true, it doesn't offer much of a case for upping the limit
> >on track_activity_query_size.  The longest such a query could reasonably
> >get is somewhere near NAMEDATALEN times MaxHeapAttributeNumber, which
> >as it happens is exactly the existing limit on track_activity_query_size.
> >
> >> As a result, many queries exceed track_activity_query_size
> >
> >How?  And if they are, why do you care?  Such queries sure seem
> >pretty content-free.
> >
>
> I believe the example was just a very simplistic example. ORMs can of
> course generate queries with joins, which can easily exceed the limit
> you mentioned.
>
> >> What is the overhead here except the memory consumption?
> >
> >The time to copy those strings out of shared storage, any time
> >you query pg_stat_activity.
> >
>
> IMO that seems like a reasonable price to pay, if you want to see
> complete queries and bump the track_activity_query_size value up.

Couldn't exposing the pg_stat_statements (or any similar extension) queryid
in pg_stat_activity [1] also be an alternative?  You wouldn't have the
parameters, but maybe the normalized query would be enough for most
analysis.  Then again, the pg_stat_statements jumble overhead for such
large statements might be even more problematic.
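
For illustration, assuming the patch at [1] does add a queryid column to
pg_stat_activity (that column does not exist today, so this is only a
sketch), the normalized text could be fetched with something like:

-- sketch: a.queryid is the column proposed in [1], not an existing one;
-- pg_stat_statements must be installed in the database being queried
SELECT a.pid, a.state, s.calls, s.query AS normalized_query
FROM pg_stat_activity a
JOIN pg_stat_statements s ON s.queryid = a.queryid
WHERE a.state <> 'idle';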

[1] https://commitfest.postgresql.org/26/2069/




mdclose() does not cope w/ FileClose() failure

2019-12-22 Thread Noah Misch
Forking thread "WAL logging problem in 9.4.3?" for this tangent:

On Mon, Dec 09, 2019 at 06:04:06PM +0900, Kyotaro Horiguchi wrote:
> I don't understand why mdclose checks for (v->mdfd_vfd >= 0) of open
> segment but anyway mdimmedsync is believing that that won't happen and
> I follow the assumption.  (I suspect that the if condition in mdclose
> should be an assertion..)

That check helps when data_sync_retry=on and FileClose() raised an error in a
previous mdclose() invocation.  However, the check is not sufficient to make
that case work; the attached test case (not for commit) gets an assertion
failure or SIGSEGV.

I am inclined to fix this by decrementing md_num_open_segs before modifying
md_seg_fds (second attachment).  An alternative would be to call
_fdvec_resize() after every FileClose(), like mdtruncate() does; however, the
repalloc() overhead could be noticeable.  (mdclose() is called much more
frequently than mdtruncate().)


Incidentally, _mdfd_openseg() has this:

if (segno <= reln->md_num_open_segs[forknum])
_fdvec_resize(reln, forknum, segno + 1);

That should be >=, not <=.  If the less-than case happened, this would delete
the record of a vfd for a higher-numbered segno.  There's no live bug, because
only segno == reln->md_num_open_segs[forknum] actually happens.  I am inclined
to make an assertion of that and remove the condition:

Assert(segno == reln->md_num_open_segs[forknum]);
_fdvec_resize(reln, forknum, segno + 1);
diff --git a/configure b/configure
index 9de5037..a1f09e8 100755
--- a/configure
+++ b/configure
@@ -3725,7 +3725,7 @@ $as_echo "${segsize}GB" >&6; }
 
 
 cat >>confdefs.h <<_ACEOF
-#define RELSEG_SIZE ${RELSEG_SIZE}
+#define RELSEG_SIZE 2
 _ACEOF
 
 
diff --git a/configure.in b/configure.in
index 9c5e5e7..86f3604 100644
--- a/configure.in
+++ b/configure.in
@@ -289,7 +289,7 @@ RELSEG_SIZE=`expr '(' 1024 / ${blocksize} ')' '*' 
${segsize} '*' 1024`
 test $? -eq 0 || exit 1
 AC_MSG_RESULT([${segsize}GB])
 
-AC_DEFINE_UNQUOTED([RELSEG_SIZE], ${RELSEG_SIZE}, [
+AC_DEFINE_UNQUOTED([RELSEG_SIZE], 2, [
  RELSEG_SIZE is the maximum number of blocks allowed in one disk file.
  Thus, the maximum size of a single file is RELSEG_SIZE * BLCKSZ;
  relations bigger than that are divided into multiple files.
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index f4ab19f..350ce7f 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1741,6 +1741,8 @@ PathNameDeleteTemporaryFile(const char *path, bool 
error_on_failure)
return true;
 }
 
+char *fail_to_close_file;
+
 /*
  * close a file when done with it
  */
@@ -1751,8 +1753,11 @@ FileClose(File file)
 
Assert(FileIsValid(file));
 
-   DO_DB(elog(LOG, "FileClose: %d (%s)",
-  file, VfdCache[file].fileName));
+   elog(LOG, "FileClose: %d (%s)",
+file, VfdCache[file].fileName);
+
+   if (strcmp(VfdCache[file].fileName, fail_to_close_file) == 0)
+   elog(ERROR, "failing to close %s, as requested", 
fail_to_close_file);
 
vfdP = &VfdCache[file];
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8d951ce..c9f5f53 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3504,6 +3504,16 @@ static struct config_real ConfigureNamesReal[] =
 static struct config_string ConfigureNamesString[] =
 {
{
+   {"fail_to_close_file", PGC_SUSET, DEVELOPER_OPTIONS,
+   gettext_noop("Raise an error when attempt to close this 
file."),
+   gettext_noop("This is a temporary GUC to demonstrate a 
bug."),
+   GUC_NOT_IN_SAMPLE
+   },
+   &fail_to_close_file,
+   "",
+   NULL, NULL, NULL
+   },
+   {
{"archive_command", PGC_SIGHUP, WAL_ARCHIVING,
gettext_noop("Sets the shell command that will be 
called to archive a WAL file."),
NULL
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 625fbc3..8c21d76 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -48,6 +48,7 @@ typedef int File;
 /* GUC parameter */
 extern PGDLLIMPORT int max_files_per_process;
 extern PGDLLIMPORT bool data_sync_retry;
+extern PGDLLIMPORT char *fail_to_close_file;
 
 /*
  * This is private to fd.c, but exported for save/restore_backend_variables()
diff --git a/src/test/regress/sql/boolean.sql b/src/test/regress/sql/boolean.sql
index df61fa4..50a4550 100644
--- a/src/test/regress/sql/boolean.sql
+++ b/src/test/regress/sql/boolean.sql
@@ -2,6 +2,22 @@
 -- BOOLEAN
 --
 
+-- Row count must be large enough to create more than one segment.  This value
+-- is enough at RELSEG_SIZE=2.  That is unrealistic for production use, but it
+-- makes the test finish quickly, with low disk consumption.
+create table fileclose (c) as select

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-12-22 Thread vignesh C
On Mon, Dec 2, 2019 at 2:02 PM Dilip Kumar  wrote:
>
> On Sun, Dec 1, 2019 at 7:58 AM Michael Paquier  wrote:
> >
> > On Fri, Nov 22, 2019 at 01:18:11PM +0530, Dilip Kumar wrote:
> > > I have rebased the patch on the latest head and also fix the issue of
> > > "concurrent abort handling of the (sub)transaction." and attached as
> > > (v1-0013-Extend-handling-of-concurrent-aborts-for-streamin) along with
> > > the complete patch set.  I have added the version number so that we
> > > can track the changes.
> >
> > The patch has rotten a bit and does not apply anymore.  Could you
> > please send a rebased version?  I have moved it to next CF, waiting on
> > author.
>
> I have rebased the patch set on the latest head.
>

A few comments:
The assert variable should be within #ifdef USE_ASSERT_CHECKING in patch
v2-0008-Add-support-for-streaming-to-built-in-replication.patch:
+   int64   subidx;
+   boolfound = false;
+   charpath[MAXPGPATH];
+
+   subidx = -1;
+   subxact_info_read(MyLogicalRepWorker->subid, xid);
+
+   /* FIXME optimize the search by bsearch on sorted data */
+   for (i = nsubxacts; i > 0; i--)
+   {
+   if (subxacts[i - 1].xid == subxid)
+   {
+   subidx = (i - 1);
+   found = true;
+   break;
+   }
+   }
+
+   /* We should not receive aborts for unknown subtransactions. */
+   Assert(found);

Add the typedefs like the below to typedefs.list, common across the patches:
xl_xact_invalidations, ReorderBufferStreamIterTXNEntry,
ReorderBufferStreamIterTXNState, SubXactInfo

"are written" appears twice in commit message of
v2-0002-Issue-individual-invalidations-with-wal_level-log.patch:
The individual invalidations are written are written using a new
xlog record type XLOG_XACT_INVALIDATIONS, from RM_XACT_ID resource
manager. See LogLogicalInvalidations for details.

v2-0002-Issue-individual-invalidations-with-wal_level-log.patch patch
does not compile by itself:
reorderbuffer.c:1822:9: error: ‘ReorderBufferTXN’ has no member named
‘is_schema_sent’
+
LocalExecuteInvalidationMessage(&change->data.inval.msg);
+   txn->is_schema_sent = false;
+   break;

Should we include printing of the id here, like in the earlier cases, in
v2-0002-Issue-individual-invalidations-with-wal_level-log.patch:
+   appendStringInfo(buf, " relcache %u", msg->rc.relId);
+   /* not expected, but print something anyway */
+   else if (msg->id == SHAREDINVALSMGR_ID)
+   appendStringInfoString(buf, " smgr");
+   /* not expected, but print something anyway */
+   else if (msg->id == SHAREDINVALRELMAP_ID)
+   appendStringInfo(buf, " relmap db %u", msg->rm.dbId);

There is some code duplication in stream_change_cb_wrapper,
stream_truncate_cb_wrapper, stream_message_cb_wrapper,
stream_abort_cb_wrapper, stream_commit_cb_wrapper,
stream_start_cb_wrapper and stream_stop_cb_wrapper functions in
v2-0003-Extend-the-output-plugin-API-with-stream-methods.patch patch.
Should we have a separate function for common code?

Should we add a function header for AssertChangeLsnOrder in
v2-0006-Implement-streaming-mode-in-ReorderBuffer.patch:
+static void
+AssertChangeLsnOrder(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{

This "Assert(txn->first_lsn != InvalidXLogRecPtr)"can be before the
loop, can be checked only once:
+   dlist_foreach(iter, &txn->changes)
+   {
+   ReorderBufferChange *cur_change;
+
+   cur_change = dlist_container(ReorderBufferChange,
node, iter.cur);
+
+   Assert(txn->first_lsn != InvalidXLogRecPtr);
+   Assert(cur_change->lsn != InvalidXLogRecPtr);
+   Assert(txn->first_lsn <= cur_change->lsn);

Should we add a function header for ReorderBufferDestroyTupleCidHash in
v2-0006-Implement-streaming-mode-in-ReorderBuffer.patch:
+static void
+ReorderBufferDestroyTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+   if (txn->tuplecid_hash != NULL)
+   {
+   hash_destroy(txn->tuplecid_hash);
+   txn->tuplecid_hash = NULL;
+   }
+}
+

Should we add a function header for ReorderBufferStreamCommit in
v2-0006-Implement-streaming-mode-in-ReorderBuffer.patch:
+static void
+ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+   /* we should only call this for previously streamed transactions */
+   Assert(rbtxn_is_streamed(txn));
+
+   ReorderBufferStreamTXN(rb, txn);
+
+   rb->stream_commit(rb, txn, txn->final_lsn);
+
+   ReorderBufferCleanupTXN(rb, txn);
+}
+

Should we add a function header for ReorderBufferCanStream in
v2-0006-Implement-st

Re: Implementing Incremental View Maintenance

2019-12-22 Thread nuko yokohama
A SELECT statement containing an expression that is not IMMUTABLE must not be
accepted when creating an incremental materialized view.

In the current implementation, a SELECT statement containing an expression
that is not IMMUTABLE can be specified when creating the view.
If an incremental materialized view is created from a SELECT statement that
contains an expression that is not IMMUTABLE, querying the view returns
incorrect results.
To prevent this, we propose raising the same error that the "CREATE INDEX"
statement raises when a non-IMMUTABLE expression is specified.
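
For comparison, the "CREATE INDEX" statement already rejects such an
expression (using the base table from the example below):

CREATE INDEX ON base ((ts >= now() - '3 second'::interval));
ERROR:  functions in index expression must be marked IMMUTABLE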

The following is an inappropriate example.

CREATE TABLE base (id int primary key, data text, ts timestamp);
CREATE TABLE
CREATE VIEW base_v AS SELECT * FROM base
  WHERE ts >= (now() - '3 second'::interval);
CREATE VIEW
CREATE MATERIALIZED VIEW base_mv AS SELECT * FROM base
  WHERE ts >= (now() - '3 second'::interval);
SELECT 0
CREATE INCREMENTAL MATERIALIZED VIEW base_imv AS SELECT * FROM base
  WHERE ts >= (now() - '3 second'::interval);
SELECT 0
  View "public.base_v"
 Column |Type | Collation | Nullable | Default |
Storage  | Description
+-+---+--+-+--+-
 id | integer |   |  | |
plain|
 data   | text|   |  | |
extended |
 ts | timestamp without time zone |   |  | |
plain|
View definition:
 SELECT base.id,
base.data,
base.ts
   FROM base
  WHERE base.ts >= (now() - '00:00:03'::interval);

  Materialized view "public.base_mv"
 Column |Type | Collation | Nullable | Default |
Storage  | Stats target | Description
+-+---+--+-+--+--+-
 id | integer |   |  | |
plain|  |
 data   | text|   |  | |
extended |  |
 ts | timestamp without time zone |   |  | |
plain|  |
View definition:
 SELECT base.id,
base.data,
base.ts
   FROM base
  WHERE base.ts >= (now() - '00:00:03'::interval);
Access method: heap

 Materialized view "public.base_imv"
Column |Type | Collation | Nullable |
Default | Storage  | Stats target | Description
---+-+---+--+-+--+--+-
 id| integer |   |  |
  | plain|  |
 data  | text|   |  |
  | extended |  |
 ts| timestamp without time zone |   |  |
  | plain|  |
 __ivm_count__ | bigint  |   |  |
  | plain|  |
View definition:
 SELECT base.id,
base.data,
base.ts
   FROM base
  WHERE base.ts >= (now() - '00:00:03'::interval);
Access method: heap
Incremental view maintenance: yes

INSERT INTO base VALUES (generate_series(1,3), 'dummy', clock_timestamp());
INSERT 0 3
SELECT * FROM base_v ORDER BY id;
 id | data  | ts
+---+
  1 | dummy | 2019-12-22 11:38:26.367481
  2 | dummy | 2019-12-22 11:38:26.367599
  3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)

SELECT * FROM base_mv ORDER BY id;
 id | data | ts
+--+
(0 rows)

REFRESH MATERIALIZED VIEW base_mv;
REFRESH MATERIALIZED VIEW
SELECT * FROM base_mv ORDER BY id;
 id | data  | ts
+---+
  1 | dummy | 2019-12-22 11:38:26.367481
  2 | dummy | 2019-12-22 11:38:26.367599
  3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)

SELECT * FROM base_imv ORDER BY id;
 id | data  | ts
+---+
  1 | dummy | 2019-12-22 11:38:26.367481
  2 | dummy | 2019-12-22 11:38:26.367599
  3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)

SELECT pg_sleep(3);
 pg_sleep
--

(1 row)

INSERT INTO base VALUES (generate_series(4,6), 'dummy', clock_timestamp());
INSERT 0 3
SELECT * FROM base_v ORDER BY id;
 id | data  | ts
+---+
  4 | dummy | 2019-12-22 11:38:29.381414
  5 | dummy | 2019-12-22 11:38:29.381441
  6 | dummy | 2019-12-22 11:38:29.381444
(3 rows)

SELECT * FROM base_mv ORDER BY id;
 id | data  | ts
+---+
  1 | dummy | 2019-12-22 11:38:26.367481
  2 | dummy | 2019-12-22 11:38:26.367599
  3 | dummy | 2019-12-22 11:38:26.367606
(3 rows)

REFRESH MATERIALIZED VIEW base_mv;
REFRESH MATERIALIZED VIEW
SELECT * FROM bas

Re: proposal: schema variables

2019-12-22 Thread Philippe BEAUDOIN
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:tested, failed

Hi Pavel,

First of all, I would like to congratulate you for this great work. This patch 
is really cool. The lack of package variables is sometimes a blocking issue for 
Oracle to Postgres migrations, because the usual emulation with GUC is 
sometimes not enough, in particular when there are security concerns or when 
the database is used in a public cloud.
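
(The GUC emulation referred to here is essentially the following sketch; the
"myapp" setting name is just an example.  The value is text only, and any
role in the session can read or overwrite it, which is where the security
concerns come from.)

-- a custom, dot-qualified setting used as a poor man's session variable
SELECT set_config('myapp.current_tenant', '42', false);  -- false: keep until session end
SELECT current_setting('myapp.current_tenant')::int;     -- readable from any function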

As I look forward to having this patch committed, I decided to spend some time
participating in the review, although I am not a C specialist and do not have a
good knowledge of the Postgres internals. Here is my report.

A) Installation

The patch applies correctly and the compilation is fine. The "make check" 
doesn't report any issue.

B) Basic usage

I tried some simple schema variables use cases. No problem.

C) The interface

The SQL changes look good to me.

However, in the CREATE VARIABLE command, I would replace the "TRANSACTION"
keyword with "TRANSACTIONAL".

I have also tried to replace this word with an ON ROLLBACK clause at the end of
the statement, like for ON COMMIT, but I have not found a satisfying wording to
propose.

D) Behaviour

I am OK with variables not being transactional by default. That's the
simplest and the most efficient, it emulates the package variables of other
RDBMS, and it will probably fit the most common use cases.

Note that I am not strongly opposed to having transactional variables by
default. But I don't know how much work this change would be. We would at
least have to find another keyword in the CREATE VARIABLE statement.
Something like "NON-TRANSACTIONAL VARIABLE"?

It is possible to create a NOT NULL variable without a DEFAULT. When trying to
read the variable before a LET statement, one gets an error message saying that
the NULL value is not allowed (and the documentation is clear about this case).
Just for the record, I wondered whether it wouldn't be better to forbid creating
a NOT NULL variable that has no DEFAULT value. But finally, I think this
behaviour provides a good way to force the variable to be initialised before
its use. So let's keep it as is.
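
For the record, the behaviour described above looks like this sketch, using
the patch's proposed CREATE VARIABLE / LET syntax:

CREATE VARIABLE public.counter int NOT NULL;  -- NOT NULL, but no DEFAULT
SELECT counter;   -- fails: reading it before any LET reports that NULL is not allowed
LET counter = 0;  -- explicit initialisation
SELECT counter;   -- now returns 0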

E) ACL and Rights

I played a little bit with the GRANT and REVOKE statements. 

I have got an error (Issue 1). The following statement chain:
  create variable public.sv1 int;
  grant read on variable sv1 to other_user;
  drop owned by other_user;
reports : ERROR:  unexpected object class 4287

I then tried to use DEFAULT PRIVILEGES. Although this is not documented, I
successfully performed:
  alter default privileges in schema public grant read on variables to 
simple_user;
  alter default privileges in schema public grant write on variables to 
simple_user;

When variables are then created, the grants are properly given.
And the psql \ddp command perfectly returns:
 Default access privileges
  Owner   | Schema | Type |Access privileges
--++--+-
 postgres | public |  | simple_user=SW/postgres
(1 row)

So the ALTER DEFAULT PRIVILEGES documentation chapter has to reflect this new 
syntax (Issue 2).

BTW, in the ACL, the READ privilege is represented by the letter S. A comment in
the source reports that the letter R was used in the past for the rule privilege.
Looking at the postgres sources, I see that this privilege on rules was removed
in 8.2, so 13 years ago. As the letter R would be a much better choice, I wonder
whether it couldn't be reused now for this new purpose. Is it important to keep
this letter frozen?

F) Extension

I then created an extension, whose installation script creates a schema 
variable and functions that use it. The schema variable is correctly linked to 
the extension, so that dropping the extension drops the variable.

But there is an issue when dumping the database (Issue 3). The script generated 
by pg_dump includes the CREATE EXTENSION statement as expected but also a 
redundant CREATE VARIABLE statement for the variable that belongs to the 
extension. As a result, one of course gets an error at restore time.

G) Row Level Security

I did a test activating RLS on a table and creating a POLICY that references a 
schema variable in its USING and WITH CHECK clauses. Everything worked fine.
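
As an illustration of that test (again only a sketch, using the patch's
proposed syntax and hypothetical table and variable names):

CREATE VARIABLE current_tenant int;
CREATE TABLE documents (id int, tenant_id int, body text);
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_only ON documents
  USING (tenant_id = current_tenant)
  WITH CHECK (tenant_id = current_tenant);
LET current_tenant = 42;  -- the policy now exposes only tenant 42's rows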

H) psql

The \dV meta-command displays all the created variables.
I would change the provided view a little bit. More precisely I would:
- rename "Constraint" to "Is nullable" and report it as a boolean
- rename "Special behave" to "Is transactional" and report it as a boolean
- change the order of columns so as to have:
Schema | Name | Type | Is nullable | Default | Owner | Is transactional |
Transaction end action
with "Is nullable" next to "Default"

I) Performance

I just quickly looked at the performance, and 

Re: Simplify passing of configure arguments to pg_config

2019-12-22 Thread Peter Eisentraut

On 2019-12-04 11:30, Peter Eisentraut wrote:

On 2019-12-03 06:03, Tom Lane wrote:

Peter Eisentraut  writes:

Currently, configure puts the configure args into the makefiles and
then have the makefiles pass them to the build of pg_config.  That looks
like an unnecessary redirection, and indeed that method was
put in place when pg_config was a shell script.  We can simplify that
by having configure put the value into pg_config.h directly.  This
also makes the standard build system match how the MSVC build system
already does it.


I dunno, is this really an improvement?  It makes the handling of
VAL_CONFIGURE different from every other one of the values passed
into pg_config, and I don't see any countervailing addition of
some other regularity.


The other values come from the makefiles, so we have to do it that way.
The configure args come from configure, so why make them go through the
makefile?  (PG_VERSION also comes in that way. ;-) )

There is also the weird difference with how the MSVC build system
handles it.  It appends VAL_CONFIGURE to pg_config.h instead of passing
it on the command line.


Here is an updated version of the patch after the removal of 
pg_config.h.win32.  It's easier to see now how this helps unify the 
handling of this between the two build systems.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 459c1ec0708348ea1db8d23d789fa8e391904867 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 22 Dec 2019 14:50:47 +0100
Subject: [PATCH v2] Simplify passing of configure arguments to pg_config

The previous system had configure put the value into the makefiles and
then have the makefiles pass them to the build of pg_config.  That was
put in place when pg_config was a shell script.  We can simplify that
by having configure put the value into pg_config.h directly.  This
also makes the standard build system match how the MSVC build system
already does it.
---
 configure  | 6 --
 configure.in   | 2 +-
 src/Makefile.global.in | 3 ---
 src/common/Makefile| 1 -
 src/common/config_info.c   | 6 +-
 src/include/pg_config.h.in | 3 +++
 src/tools/msvc/Solution.pm | 7 +--
 7 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/configure b/configure
index 9de50377ff..ef0e9ac6ca 100755
--- a/configure
+++ b/configure
@@ -776,7 +776,6 @@ build_vendor
 build_cpu
 build
 PG_MAJORVERSION
-configure_args
 target_alias
 host_alias
 build_alias
@@ -2797,7 +2796,10 @@ ac_configure="$SHELL $ac_aux_dir/configure"  # Please 
don't use this var.
 
 
 
-configure_args=$ac_configure_args
+
+cat >>confdefs.h <<_ACEOF
+#define CONFIGURE_ARGS "$ac_configure_args"
+_ACEOF
 
 
 PG_MAJORVERSION=`expr "$PACKAGE_VERSION" : '\([0-9][0-9]*\)'`
diff --git a/configure.in b/configure.in
index 9c5e5e7f8c..c5323ff9b4 100644
--- a/configure.in
+++ b/configure.in
@@ -27,7 +27,7 @@ AC_COPYRIGHT([Copyright (c) 1996-2019, PostgreSQL Global 
Development Group])
 AC_CONFIG_SRCDIR([src/backend/access/common/heaptuple.c])
 AC_CONFIG_AUX_DIR(config)
 AC_PREFIX_DEFAULT(/usr/local/pgsql)
-AC_SUBST(configure_args, [$ac_configure_args])
+AC_DEFINE_UNQUOTED(CONFIGURE_ARGS, ["$ac_configure_args"], [Saved arguments 
from configure])
 
 [PG_MAJORVERSION=`expr "$PACKAGE_VERSION" : '\([0-9][0-9]*\)'`]
 AC_SUBST(PG_MAJORVERSION)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 05b66380e0..95f5a104e5 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -74,9 +74,6 @@ endif # not PGXS
 
 vpathsearch = `for f in $(addsuffix /$(1),$(subst :, ,. $(VPATH))); do test -r 
$$f && echo $$f && break; done`
 
-# Saved arguments from configure
-configure_args = @configure_args@
-
 
 ##
 #
diff --git a/src/common/Makefile b/src/common/Makefile
index ffb0f6edff..fd43558830 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -31,7 +31,6 @@ include $(top_builddir)/src/Makefile.global
 # don't include subdirectory-path-dependent -I and -L switches
 STD_CPPFLAGS := $(filter-out -I$(top_srcdir)/src/include 
-I$(top_builddir)/src/include,$(CPPFLAGS))
 STD_LDFLAGS := $(filter-out -L$(top_builddir)/src/common 
-L$(top_builddir)/src/port,$(LDFLAGS))
-override CPPFLAGS += -DVAL_CONFIGURE="\"$(configure_args)\""
 override CPPFLAGS += -DVAL_CC="\"$(CC)\""
 override CPPFLAGS += -DVAL_CPPFLAGS="\"$(STD_CPPFLAGS)\""
 override CPPFLAGS += -DVAL_CFLAGS="\"$(CFLAGS)\""
diff --git a/src/common/config_info.c b/src/common/config_info.c
index dd34fbfc00..902f8226a4 100644
--- a/src/common/config_info.c
+++ b/src/common/config_info.c
@@ -124,11 +124,7 @@ get_configdata(const char *my_exec_path, size_t 
*configdata_len)
i++;
 
configdata[i].name = pstrdup("CONFIGURE");
-#ifdef VAL_CONFIGURE
-   configdata[i].setting = pstrdup(VAL_CONFIGURE);
-#else
-   configdata[i].setting = pstrdup(_("not rec

Avoiding a small risk of failure in timestamp(tz) regression tests

2019-12-22 Thread Tom Lane
I noticed a buildfarm failure here:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skate&dt=2019-12-22%2007%3A49%3A22

== pgsql.build/src/test/regress/regression.diffs 
==
*** 
/home/pgbf/buildroot/REL_10_STABLE/pgsql.build/src/test/regress/expected/timestamptz.out
2019-12-13 08:51:47.0 +0100
--- 
/home/pgbf/buildroot/REL_10_STABLE/pgsql.build/src/test/regress/results/timestamptz.out
 2019-12-22 09:00:00.0 +0100
***
*** 27,33 
  SELECT count(*) AS One FROM TIMESTAMPTZ_TBL WHERE d1 = timestamp with time 
zone 'today';
   one 
  -
!1
  (1 row)
  
  SELECT count(*) AS One FROM TIMESTAMPTZ_TBL WHERE d1 = timestamp with time 
zone 'tomorrow';
--- 27,33 
  SELECT count(*) AS One FROM TIMESTAMPTZ_TBL WHERE d1 = timestamp with time 
zone 'today';
   one 
  -
!2
  (1 row)
  
  SELECT count(*) AS One FROM TIMESTAMPTZ_TBL WHERE d1 = timestamp with time 
zone 'tomorrow';


Judging by the reported timestamp on the results file, this is an instance
of the problem mentioned in the comments in timestamptz.sql:

-- NOTE: it is possible for this part of the test to fail if the transaction
-- block is entered exactly at local midnight; then 'now' and 'today' have
-- the same values and the counts will come out different.

On most machines it'd be pretty hard to hit that window; I speculate that
"skate" has got a very low-resolution system clock, making the window
larger.  Nonetheless, a test that's got designed-in failure modes is
annoying.  We can dodge this by separating the test for "now" from the
tests for the today/tomorrow/etc input strings, as attached.
Any objections?

regards, tom lane

diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index 39a4d49..5f97505 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -5,20 +5,9 @@ CREATE TABLE TIMESTAMP_TBL (d1 timestamp(2) without time zone);
 -- Test shorthand input values
 -- We can't just "select" the results since they aren't constants; test for
 -- equality instead.  We can do that by running the test inside a transaction
--- block, within which the value of 'now' shouldn't change.  We also check
--- that 'now' *does* change over a reasonable interval such as 100 msec.
--- NOTE: it is possible for this part of the test to fail if the transaction
--- block is entered exactly at local midnight; then 'now' and 'today' have
--- the same values and the counts will come out different.
-INSERT INTO TIMESTAMP_TBL VALUES ('now');
-SELECT pg_sleep(0.1);
- pg_sleep 
---
- 
-(1 row)
-
+-- block, within which the value of 'now' shouldn't change, and so these
+-- related values shouldn't either.
 BEGIN;
-INSERT INTO TIMESTAMP_TBL VALUES ('now');
 INSERT INTO TIMESTAMP_TBL VALUES ('today');
 INSERT INTO TIMESTAMP_TBL VALUES ('yesterday');
 INSERT INTO TIMESTAMP_TBL VALUES ('tomorrow');
@@ -43,15 +32,17 @@ SELECT count(*) AS One FROM TIMESTAMP_TBL WHERE d1 = timestamp without time zone
1
 (1 row)
 
-SELECT count(*) AS One FROM TIMESTAMP_TBL WHERE d1 = timestamp(2) without time zone 'now';
- one 
--
-   1
-(1 row)
-
 COMMIT;
 DELETE FROM TIMESTAMP_TBL;
--- verify uniform transaction time within transaction block
+-- Verify that 'now' *does* change over a reasonable interval such as 100 msec,
+-- and that it doesn't change over the same interval within a transaction block
+INSERT INTO TIMESTAMP_TBL VALUES ('now');
+SELECT pg_sleep(0.1);
+ pg_sleep 
+--
+ 
+(1 row)
+
 BEGIN;
 INSERT INTO TIMESTAMP_TBL VALUES ('now');
 SELECT pg_sleep(0.1);
@@ -73,6 +64,12 @@ SELECT count(*) AS two FROM TIMESTAMP_TBL WHERE d1 = timestamp(2) without time z
2
 (1 row)
 
+SELECT count(d1) AS three, count(DISTINCT d1) AS two FROM TIMESTAMP_TBL;
+ three | two 
+---+-
+ 3 |   2
+(1 row)
+
 COMMIT;
 TRUNCATE TIMESTAMP_TBL;
 -- Special values
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index bb89910..639b503 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -5,20 +5,9 @@ CREATE TABLE TIMESTAMPTZ_TBL (d1 timestamp(2) with time zone);
 -- Test shorthand input values
 -- We can't just "select" the results since they aren't constants; test for
 -- equality instead.  We can do that by running the test inside a transaction
--- block, within which the value of 'now' shouldn't change.  We also check
--- that 'now' *does* change over a reasonable interval such as 100 msec.
--- NOTE: it is possible for this part of the test to fail if the transaction
--- block is entered exactly at local midnight; then 'now' and 'today' have
--- the same values and the counts will come out different.
-INSERT INTO TIMESTAMPTZ_TBL VALUES ('now');
-SELECT pg_sleep(0.1);
- pg_sleep 
---
- 
-(1 row)
-
+-- block, within which the value of 'now' shouldn't change, and so th

Re: Invisible PROMPT2

2019-12-22 Thread Maxence Ahlouche
On Wed, 27 Nov 2019 at 17:09, Tom Lane  wrote:

> Good idea, but I think you need to account for "visible" (ie, if the
> newline is inside RL_PROMPT_START_IGNORE, it shouldn't change the width).
> It might be best to add logic inside the existing "if (visible)" instead
> of making a new top-level case.
>

Right, I assumed that it was safe given that only terminal control
characters were invisible.
Since the title of the terminal window can be changed as well via control
characters, it's probably better not to make that assumption.

I updated the patch accordingly.


> Another special case that somebody's likely to whine about is \t, though
> to handle that we'd have to make assumptions about the tab stop distance.
> Maybe assuming that it's 8 is good enough.
>

The problem with tabs is that any user can set their tabstops to whatever
they want, and a tab doesn't have a fixed width, it just goes up to the
next tab stop.
One way to do it would be to add tabs wherever necessary in prompt2 to make
sure they have the same size as in prompt1 (a list of numbers of spaces,
which we would concatenate with a tab?), but I'm not sure it's worth the
effort.
commit 491cf173aad247299622796775feea580a8f9b13 (HEAD -> refs/heads/patch_psql_prompt)
Author: Maxence Ahlouche 
Date:   Wed Nov 27 16:21:35 2019 +0100

Fix %w length in PROMPT2 when PROMPT1 contains a newline, in psql.

The width of the invisible PROMPT2 must take into account, in order for user input to be aligned with the first line, that PROMPT1 can contain newlines.

diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c
index 41c6f21ecf..24efb8f686 100644
--- a/src/bin/psql/prompt.c
+++ b/src/bin/psql/prompt.c
@@ -379,7 +379,10 @@ get_prompt(promptStatus_t status, ConditionalStack cstack)
 if (visible)
 {
 	chwidth = PQdsplen(p, pset.encoding);
-	if (chwidth > 0)
+
+	if (*p == '\n')
+		last_prompt1_width = 0;
+	else if (chwidth > 0)
 		last_prompt1_width += chwidth;
 }
 


Re: Global temporary tables

2019-12-22 Thread Philippe BEAUDOIN
Hi all,

I am not familiar enough with the Postgres internals to give advice about the
implementation.

But my feeling is that there is another big benefit to this feature: it
simplifies the Oracle to PostgreSQL migration of applications that use global
temporary tables. And this is quite common when stored procedures are used. In
such a case, we currently need to modify the logic of the code, always
implementing an ugly solution (either add CREATE TEMP TABLE statements in the
code everywhere it is needed, or use a regular table with additional TRUNCATE
statements if we can ensure that only a single connection uses the table at a
time).
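
A minimal sketch of the first workaround, with hypothetical names - a helper
that every procedure needing the table calls first:

CREATE OR REPLACE FUNCTION ensure_work_table() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    -- created per session; rows vanish at commit, like an Oracle GTT
    CREATE TEMP TABLE IF NOT EXISTS work_table (
        id   int,
        data text
    ) ON COMMIT DELETE ROWS;
END;
$$;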

So, Konstantin and all, thanks in advance for all that could be done on this
feature :-)

Best regards.

Re: proposal: schema variables

2019-12-22 Thread Pavel Stehule
Hi

ne 22. 12. 2019 v 13:04 odesílatel Philippe BEAUDOIN 
napsal:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, failed
> Spec compliant:   not tested
> Documentation:tested, failed
>
> Hi Pavel,
>
> First of all, I would like to congratulate you for this great work. This
> patch is really cool. The lack of package variables is sometimes a blocking
> issue for Oracle to Postgres migrations, because the usual emulation with
> GUC is sometimes not enough, in particular when there are security concerns
> or when the database is used in a public cloud.
>
> As I look forward to having this patch commited, I decided to spend some
> time to participate to the review, although I am not a C specialist and I
> have not a good knowledge of the Postgres internals. Here is my report.
>
> A) Installation
>
> The patch applies correctly and the compilation is fine. The "make check"
> doesn't report any issue.
>
> B) Basic usage
>
> I tried some simple schema variables use cases. No problem.
>
> C) The interface
>
> The SQL changes look good to me.
>
> However, in the CREATE VARIABLE command, I would replace the "TRANSACTION"
> word by "TRANSACTIONAL".
>

There is no technical problem - the problem is in introducing a new keyword
"transactional" that is close to "transaction". I am not sure whether it is
practical to have two similar keywords, or how strictly the CREATE statement
has to follow correct English grammar.

I am not a native speaker, so I cannot judge how bad using "TRANSACTION"
instead of "TRANSACTIONAL" is in this context. But I do see a risk in having
two important (it is not syntactic sugar) similar keywords.

I am just afraid that using TRANSACTIONAL instead of just TRANSACTION is not
too user friendly. I have no strong opinion about this - the implementation
is easy, but I do not feel comfortable with introducing this keyword.


> I have also tried to replace this word by a ON ROLLBACK clause at the end
> of the statement, like for ON COMMIT, but I have not found a satisfying
> wording to propose.
>



>
> D) Behaviour
>
> I am ok with variables not being transactional by default. That's the most
> simple, the most efficient, it emulates the package variables of other
> RDBMS and it will probably fit the most common use cases.
>
> Note that I am not strongly opposed to having by default transactional
> variables. But I don't know whether this change would be a great work. We
> would have at least to find another keyword in the CREATE VARIABLE
> statement. Something like "NON-TRANSACTIONAL VARIABLE" ?
>

Variables almost everywhere (global user settings - GUC being the one
notable exception) are non-transactional by default. I don't see any reason
to introduce a different design from the widely used one.


> It is possible to create a NOT NULL variable without DEFAULT. When trying
> to read the variable before a LET statement, one gets an error massage
> saying that the NULL value is not allowed (and the documentation is clear
> about this case). Just for the records, I wondered whether it wouldn't be
> better to forbid a NOT NULL variable creation that wouldn't have a DEFAULT
> value. But finally, I think this behaviour provides a good way to force the
> variable initialisation before its use. So let's keep it as is.
>

This is a question - and there are two possibilities

postgres=# do $$
declare x int not null;
begin
  raise notice '%', x;
end;
$$ ;
ERROR:  variable "x" must have a default value, since it's declared NOT NULL
LINE 2: declare x int not null;
  ^

PL/pgSQL requires it. But there is no way to enforce a future assignment.

So I know the behaviour of schema variables is a little bit different, but I
think this difference has an interesting use case. You can check whether the
variable was modified somewhere or not.


> E) ACL and Rights
>
> I played a little bit with the GRANT and REVOKE statements.
>
> I have got an error (Issue 1). The following statement chain:
>   create variable public.sv1 int;
>   grant read on variable sv1 to other_user;
>   drop owned by other_user;
> reports : ERROR:  unexpected object class 4287
>

this is a bug and should be fixed


> I then tried to use DEFAULT PRIVILEGES. Despite this is not documented, I
> successfuly performed:
>   alter default privileges in schema public grant read on variables to
> simple_user;
>   alter default privileges in schema public grant write on variables to
> simple_user;
>
> When variables are then created, the grants are properly given.
> And the psql \ddp command perfectly returns:
>  Default access privileges
>   Owner   | Schema | Type |Access privileges
> --++--+-
>  postgres | public |  | simple_user=SW/postgres
> (1 row)
>
> So the ALTER DEFAULT PRIVILEGES documentation chapter has to reflect this
> new syntax (Issue 2).
>
> BTW, in the ACL, the READ pr

Re: mdclose() does not cope w/ FileClose() failure

2019-12-22 Thread Noah Misch
On Sun, Dec 22, 2019 at 01:19:30AM -0800, Noah Misch wrote:
> I am inclined to fix this by decrementing md_num_open_segs before modifying
> md_seg_fds (second attachment).

That leaked memory, since _fdvec_resize() assumes md_num_open_segs is also the
allocated array length.  The alternative is looking better:

> An alternative would be to call
> _fdvec_resize() after every FileClose(), like mdtruncate() does; however, the
> repalloc() overhead could be noticeable.  (mdclose() is called much more
> frequently than mdtruncate().)

I can skip repalloc() when the array length decreases, to assuage mdclose()'s
worry.  In the mdclose() case, the final _fdvec_resize(reln, fork, 0) will
still pfree() the array.  Array elements that mdtruncate() frees today will
instead persist to end of transaction.  That is okay, since mdtruncate()
crossing more than one segment boundary is fairly infrequent.  For it to
happen, you must either create a >2G relation and then TRUNCATE it in the same
transaction, or VACUUM must find >1-2G of unused space at the end of the
relation.  I'm now inclined to do it that way, attached.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 82442db..5dae6d4 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -516,18 +516,10 @@ mdclose(SMgrRelation reln, ForkNumber forknum)
{
MdfdVec*v = &reln->md_seg_fds[forknum][nopensegs - 1];
 
-   /* if not closed already */
-   if (v->mdfd_vfd >= 0)
-   {
-   FileClose(v->mdfd_vfd);
-   v->mdfd_vfd = -1;
-   }
-
+   FileClose(v->mdfd_vfd);
+   _fdvec_resize(reln, forknum, nopensegs - 1);
nopensegs--;
}
-
-   /* resize just once, avoids pointless reallocations */
-   _fdvec_resize(reln, forknum, 0);
 }
 
 /*
@@ -1047,13 +1039,15 @@ _fdvec_resize(SMgrRelation reln,
reln->md_seg_fds[forknum] =
MemoryContextAlloc(MdCxt, sizeof(MdfdVec) * nseg);
}
-   else
+   else if (nseg > reln->md_num_open_segs[forknum])
{
/*
 * It doesn't seem worthwhile complicating the code by having a 
more
 * aggressive growth strategy here; the number of segments 
doesn't
 * grow that fast, and the memory context internally will 
sometimes
-* avoid doing an actual reallocation.
+* avoid doing an actual reallocation.  Likewise, since the 
number of
+* segments doesn't shrink that fast, don't shrink at all.  
During
+* mdclose(), we'll pfree the array at nseg==0.
 */
reln->md_seg_fds[forknum] =
repalloc(reln->md_seg_fds[forknum],


Re: Implementing Incremental View Maintenance

2019-12-22 Thread legrand legrand
Hello,

First of all, many thanks for this great feature,
replacing so many triggers with such a simple syntax ;o)

I was wondering about performance and had a look
at pg_stat_statements (with track=all) with IVM_v9.patch.

For each insert into a base table there are 3 statements:
- ANALYZE pg_temp_3.pg_temp_81976
- WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
- DROP TABLE pg_temp_3.pg_temp_81976

It generates a lot of lines in pg_stat_statements with calls = 1.
Those statements cannot be shared because the temp table is dropped each
time.
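
For reference, those one-off entries are easy to spot with a sketch like this
(the temp table name embedded in the query text differs on every insert, so
each entry stays at calls = 1):

SELECT calls, left(query, 60) AS query
FROM pg_stat_statements
WHERE query LIKE '%pg_temp%'
ORDER BY calls, query;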

Is there a plan to change this ?

Many Thanks again

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: mdclose() does not cope w/ FileClose() failure

2019-12-22 Thread Thomas Munro
On Sun, Dec 22, 2019 at 10:19 PM Noah Misch  wrote:
> Assert(segno == reln->md_num_open_segs[forknum]);
> _fdvec_resize(reln, forknum, segno + 1);

Oh yeah, I spotted that part too but didn't follow up.

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BNBw%2BuSzxF1os-SO6gUuw%3DcqO5DAybk6KnHKzgGvxhxA%40mail.gmail.com




Re: mdclose() does not cope w/ FileClose() failure

2019-12-22 Thread Noah Misch
On Mon, Dec 23, 2019 at 09:33:29AM +1300, Thomas Munro wrote:
> On Sun, Dec 22, 2019 at 10:19 PM Noah Misch  wrote:
> > Assert(segno == reln->md_num_open_segs[forknum]);
> > _fdvec_resize(reln, forknum, segno + 1);
> 
> Oh yeah, I spotted that part too but didn't follow up.
> 
> https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BNBw%2BuSzxF1os-SO6gUuw%3DcqO5DAybk6KnHKzgGvxhxA%40mail.gmail.com

That patch of yours looks good.




Re: unsupportable composite type partition keys

2019-12-22 Thread Tom Lane
I wrote:
> Now as far as point 1 goes, I think it's not really that awful to use
> CheckAttributeType() with a dummy attribute name.  The attached
> incomplete patch uses "partition key" which causes it to emit errors
> like
> regression=# create table fool (a int, b int) partition by list ((row(a, 
> b))); 
> ERROR:  column "partition key" has pseudo-type record
> I don't think that that's unacceptable.  But if we wanted to improve it,
> we could imagine adding another flag, say CHKATYPE_IS_PARTITION_KEY,
> that doesn't affect CheckAttributeType's semantics, just the wording of
> the error messages it throws.

Here's a fleshed-out patch that does it like that.

While poking at this, I also started to wonder why CheckAttributeType
wasn't recursing into ranges, since those are our other kind of
container type.  And the answer is that it must, because we allow
creation of ranges over composite types:

regression=# create table foo (f1 int, f2 int);
CREATE TABLE
regression=# create type foorange as range (subtype = foo);
CREATE TYPE
regression=# alter table foo add column r foorange;
ALTER TABLE

Simple things still work on table foo, but surely this is exactly
what CheckAttributeType is supposed to be preventing.  With the
second attached patch you get

regression=# alter table foo add column r foorange;
ERROR:  composite type foo cannot be made a member of itself

The second patch needs to go back all the way, the first one
only as far as we have partitions.

regards, tom lane

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 8404904..82398da 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -572,6 +572,10 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
  * are reliably identifiable only within a session, since the identity info
  * may use a typmod that is only locally assigned.  The caller is expected
  * to know whether these cases are safe.)
+ *
+ * flags can also control the phrasing of the error messages.  If
+ * CHKATYPE_IS_PARTKEY is specified, "attname" should be a partition key
+ * column number as text, not a real column name.
  * 
  */
 void
@@ -598,10 +602,19 @@ CheckAttributeType(const char *attname,
 		if (!((atttypid == ANYARRAYOID && (flags & CHKATYPE_ANYARRAY)) ||
 			  (atttypid == RECORDOID && (flags & CHKATYPE_ANYRECORD)) ||
 			  (atttypid == RECORDARRAYOID && (flags & CHKATYPE_ANYRECORD
-			ereport(ERROR,
-	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-	 errmsg("column \"%s\" has pseudo-type %s",
-			attname, format_type_be(atttypid;
+		{
+			if (flags & CHKATYPE_IS_PARTKEY)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+/* translator: first %s is an integer not a name */
+		 errmsg("partition key column %s has pseudo-type %s",
+attname, format_type_be(atttypid;
+			else
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+		 errmsg("column \"%s\" has pseudo-type %s",
+attname, format_type_be(atttypid;
+		}
 	}
 	else if (att_typtype == TYPTYPE_DOMAIN)
 	{
@@ -648,7 +661,7 @@ CheckAttributeType(const char *attname,
 			CheckAttributeType(NameStr(attr->attname),
 			   attr->atttypid, attr->attcollation,
 			   containing_rowtypes,
-			   flags);
+			   flags & ~CHKATYPE_IS_PARTKEY);
 		}
 
 		relation_close(relation, AccessShareLock);
@@ -670,11 +683,21 @@ CheckAttributeType(const char *attname,
 	 * useless, and it cannot be dumped, so we must disallow it.
 	 */
 	if (!OidIsValid(attcollation) && type_is_collatable(atttypid))
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("no collation was derived for column \"%s\" with collatable type %s",
-		attname, format_type_be(atttypid)),
- errhint("Use the COLLATE clause to set the collation explicitly.")));
+	{
+		if (flags & CHKATYPE_IS_PARTKEY)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+			/* translator: first %s is an integer not a name */
+	 errmsg("no collation was derived for partition key column %s with collatable type %s",
+			attname, format_type_be(atttypid)),
+	 errhint("Use the COLLATE clause to set the collation explicitly.")));
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+	 errmsg("no collation was derived for column \"%s\" with collatable type %s",
+			attname, format_type_be(atttypid)),
+	 errhint("Use the COLLATE clause to set the collation explicitly.")));
+	}
 }
 
 /*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e8e004e..845f010 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15096,12 +15096,24 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu
 		{
 			/* Expression */
 			Node	   *expr = pelem->expr;
+			char		partattname[16];
 
 			Assert(expr != NULL)

Re: Memory-Bounded Hash Aggregation

2019-12-22 Thread Jeff Davis
On Tue, 2019-12-10 at 13:34 -0800, Adam Lee wrote:
> Melanie and I tried this and have a patch that passes installcheck.  The
> way we verify it is by composing a wide table with long unnecessary text
> columns, then checking the size it writes on every iteration.
> 
> Please check out the attachment, it's based on your 1204 version.

Thank you. Attached a new patch that incorporates your projection work.

A few comments:

* You are only nulling out up to tts_nvalid, which means that you can
still end up storing more on disk if the wide column comes at the end
of the table and hasn't been deserialized yet. I fixed this by copying
needed attributes to the hash_spill_slot and making it virtual.

* aggregated_columns does not need to be a member of AggState; nor does
it need to be computed inside of the perhash loop. Aside: if adding a
field to AggState is necessary, you need to bump the field numbers of
later fields that are labeled for JIT use, otherwise it will break JIT.

* I used an array rather than a bitmapset. It makes it easier to find
the highest column (to do a slot_getsomeattrs), and it might be a
little more efficient for wide tables with mostly useless columns.

* Style nitpick: don't mix code and declarations

The updated patch also saves the transitionSpace calculation in the Agg
node for better hash table size estimating. This is a good way to
choose an initial number of buckets for the hash table, and also to cap
the number of groups we permit in the hash table when we expect the
groups to grow.

Regards,
Jeff Davis



diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d1c90282f9..89ced3cd978 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1751,6 +1751,23 @@ include_dir 'conf.d'
   
  
 
+ 
+  hashagg_mem_overflow (boolean)
+  
+   hashagg_mem_overflow configuration parameter
+  
+  
+  
+   
+ If hash aggregation exceeds work_mem at query
+ execution time, and hashagg_mem_overflow is set
+ to on, continue consuming more memory rather than
+ performing disk-based hash aggregation. The default
+ is off.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
@@ -4451,6 +4468,24 @@ ANY num_sync ( 
+  enable_hashagg_spill (boolean)
+  
+   enable_hashagg_spill configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's use of hashed aggregation plan
+types when the memory usage is expected to
+exceed work_mem. This only affects the planner
+choice; actual behavior at execution time is dictated by
+. The default
+is on.
+   
+  
+ 
+
  
   enable_hashjoin (boolean)
   
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 949fefa23ae..c2fb7a088a2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -102,6 +102,7 @@ static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
@@ -1844,6 +1845,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (es->analyze)
+show_hashagg_info((AggState *) planstate, es);
 			if (plan->qual)
 show_instrumentation_count("Rows Removed by Filter", 1,
 		   planstate, es);
@@ -2742,6 +2745,56 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * If EXPLAIN ANALYZE, show information on hash aggregate memory usage and
+ * batches.
+ */
+static void
+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+	Agg		*agg	   = (Agg *)aggstate->ss.ps.plan;
+	long	 memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;
+	long	 diskKb= (aggstate->hash_disk_used + 1023) / 1024;
+
+
+	Assert(IsA(aggstate, AggState));
+
+	if (agg->aggstrategy != AGG_HASHED &&
+		agg->aggstrategy != AGG_MIXED)
+		return;
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		appendStringInfoSpaces(es->str, es->indent * 2);
+		appendStringInfo(
+			es->str,
+			"Memory Usage: %ldkB",
+			memPeakKb);
+
+		if (aggstate->hash_batches_used > 0)
+		{
+			appendStringInfo(
+es->str,
+"  Batches: %d  Disk: %ldkB",
+aggstate->hash_batches_used, diskKb);
+		}
+
+		appendStringInfo(
+			es->str,
+			"\n");
+	}
+	else
+	{
+		ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
+		if (aggstate->hash_batches_used > 0)
+		{
+			ExplainPropertyInteger("Has

Re: Memory-Bounded Hash Aggregation

2019-12-22 Thread Jeff Davis
On Sat, 2019-12-14 at 18:32 +0100, Tomas Vondra wrote:
> So I think we're not costing the batching properly / at all.

Thank you for all of the testing! I think the results are good: even
for cases where HashAgg is the wrong choice, it's not too bad. You're
right that costing is not done, and when it is, I think it will avoid
these bad choices most of the time.

> A couple more comments:
> 
> 1) IMHO we should rename hashagg_mem_overflow to
> enable_hashagg_overflow
> or something like that. I think that describes the GUC purpose better
> (and it's more consistent with enable_hashagg_spill).

The other enable_* GUCs are all planner GUCs, so I named this one
differently to stand out as an executor GUC.

> 2) show_hashagg_info
> 
> I think there's a missing space after ":" here:
> 
>   "  Batches: %d  Disk Usage:%ldkB",
> 
> and maybe we should use just "Disk:" just like in we do for sort:

Done, thank you.

> 3) I'm not quite sure what to think about the JIT recompile we do for
> EEOP_AGG_INIT_TRANS_SPILLED etc. I'm no llvm/jit expert, but do we do
> that for some other existing cases?

Andres asked for that explicitly to avoid branches in the non-spilling
code path (or at least branches that are likely to be mispredicted).

Regards,
Jeff Davis






RE: Implementing Incremental View Maintenance

2019-12-22 Thread tsunakawa.ta...@fujitsu.com
From: Tatsuo Ishii 
> > The following IVM wiki page returns an error.  Does anybody know what's
> wrong?
> >
> > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
> 
> I don't have any problem with the page. Maybe temporary error?

Yeah, I can see it now.  I could not see it over the weekend.  The page was not
available for at least an hour or so when I asked about this.  I suppose the
pgsql-www team kindly solved the issue.


Regards
Takayuki Tsunakawa





Re: planner support functions: handle GROUP BY estimates ?

2019-12-22 Thread Justin Pryzby
On Tue, Nov 19, 2019 at 01:34:21PM -0600, Justin Pryzby wrote:
> Tom implemented "Planner support functions":
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a391ff3c3d418e404a2c6e4ff0865a107752827b
> https://www.postgresql.org/docs/12/xfunc-optimization.html
> 
> I wondered whether there was any consideration to extend that to allow
> providing improved estimates of "group by".  That currently requires manually
> by creating an expression index, if the function is IMMUTABLE (which is not
> true for eg.  date_trunc of timestamptz).

I didn't hear back, so I tried implementing this for date_trunc().  Currently,
the planner assumes that functions output as many groups as their input
variables.  Most invocations of our reports use date_trunc (or similar), so my
earlier attempt to alert on rowcount misestimates was very brief.

I currently assume that the input data has 1 second granularity:
|postgres=# CREATE TABLE t(i) AS SELECT date_trunc('second',a)a FROM 
generate_series(now(), now()+'7 day'::interval, '1 seconds')a; ANALYZE t;
|postgres=# explain analyze SELECT date_trunc('hour',i) i FROM t GROUP BY 1;
| Group  (cost=9021.85..9042.13 rows=169 width=8) (actual 
time=1365.934..1366.453 rows=169 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('minute',i) i FROM t GROUP BY 1;
| Finalize HashAggregate  (cost=10172.79..10298.81 rows=10081 width=8) (actual 
time=1406.057..1413.413 rows=10081 loops=1)
|
|postgres=# explain analyze SELECT date_trunc('day',i) i FROM t GROUP BY 1;
| Group  (cost=9013.71..9014.67 rows=8 width=8) (actual time=1582.998..1583.030 
rows=8 loops=1)

If the input timestamps have (say) hourly granularity, rowcount will be
*underestimated* by 3600x, which is worse than the behavior in master of
overestimating by (for "day") 24x.

I'm trying to think of ways to address that:

0) Add a fudge factor of 4x or maybe 30x;

1) Avoid applying a corrective factor for seconds or minutes that makes the
rowcount less than (say) 2 or 100.  That would divide 24 but might then avoid
the last /60 or /60/60.  Ultimately, that's more "fudge" than anything else;

2) Leave alone pg_catalog.date_trunc(), but provide "template" support
functions like timestamp_support_10pow1, 10pow2, 10pow3, etc, which include the
given corrective factor, which should allow more accurate rowcount for input
data with granularity of the given number of seconds.

Ideally, that would be a user-specified factor, but I don't think that's
possible to specify in SQL; the constant has to be built into the C function.
At telsasoft, our data mostly has 15-minute granularity (900sec), so we'd maybe
make a "date_trunc" function in the user schema which calls
pg_catalog.date_trunc() with support function timestamp_support_10pow3 (a
sketch of such a wrapper follows this list);

There could be a "base" support function that accepts a multiplier argument,
and then any user-provided C extension would be a one-liner specifying an
arbitrary value;

3) Maybe there are better functions than date_trunc() to address;

4) Leave it as a patch in the archives for people to borrow from;
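
To make (2) concrete, here is a rough SQL sketch of such a user-schema wrapper.
The SUPPORT clause itself is real (new in v12) and requires superuser, but
timestamp_support_10pow3 is only the name proposed above and exists nowhere
yet, so treat this purely as an illustration:

-- Hypothetical sketch: a wrapper that reuses the built-in C function
-- timestamptz_trunc while pointing at a support function built for
-- roughly 900-second input granularity.  Schema name is illustrative.
CREATE FUNCTION myschema.date_trunc(text, timestamptz)
  RETURNS timestamptz
  AS 'timestamptz_trunc' LANGUAGE internal
  STABLE PARALLEL SAFE
  SUPPORT timestamp_support_10pow3;

Queries that call myschema.date_trunc() would then get the adjusted group
estimates without touching the pg_catalog entry.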

Justin
>From 94f7791a7f82dea8757ef139befbf26feb970685 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 15 Dec 2019 20:27:24 -0600
Subject: [PATCH v1] Planner support functions for GROUP BY f()..

..implemented for date_trunc()

See also a391ff3c3d418e404a2c6e4ff0865a107752827b
---
 src/backend/optimizer/util/plancat.c | 44 
 src/backend/utils/adt/selfuncs.c | 28 +++
 src/backend/utils/adt/timestamp.c| 97 
 src/include/catalog/catversion.h |  2 +-
 src/include/catalog/pg_proc.dat  | 15 --
 src/include/nodes/nodes.h|  3 +-
 src/include/nodes/supportnodes.h | 15 ++
 src/include/optimizer/plancat.h  |  2 +
 8 files changed, 201 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 5e889d1..9438b3c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2008,6 +2008,50 @@ get_function_rows(PlannerInfo *root, Oid funcid, Node *node)
 	return result;
 }
 
+/* */
+double
+get_function_groupby(PlannerInfo *root, Oid funcid, Node *node, Node *var)
+{
+	HeapTuple	proctup;
+	Form_pg_proc procform;
+
+	proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+	if (!HeapTupleIsValid(proctup))
+		elog(ERROR, "cache lookup failed for function %u", funcid);
+	procform = (Form_pg_proc) GETSTRUCT(proctup);
+
+	if (OidIsValid(procform->prosupport))
+	{
+		SupportRequestGroupBy req;
+		SupportRequestGroupBy *sresult;
+
+		req.type = T_SupportRequestGroupBy;
+		req.root = root;
+		req.funcid = funcid;
+		req.node = node;
+		req.var = var;
+		req.factor = 1;			/* just for sanity */
+
+		sresult = (SupportRequestGroupBy *)
+			DatumGetPointer(OidFunctionCall1(procform->prosupport,
+			 PointerGetDatum(&req)));
+
+		if (sresult == &req)
+		{
+			/* Success */
+			ReleaseSysCache(

Drongo vs. 9.4 initdb TAP test

2019-12-22 Thread Tom Lane
Buildfarm member drongo has been failing the initdb TAP test in the
9.4 branch for the last week or two:

# Running: rm -rf 
'C:\prog\bf\root\REL9_4_STABLE\pgsql.build\src\bin\initdb\tmp_check\tmp_testAHN7'/*
'rm' is not recognized as an internal or external command,
operable program or batch file.
Bail out!  system rm -rf 
'C:\prog\bf\root\REL9_4_STABLE\pgsql.build\src\bin\initdb\tmp_check\tmp_testAHN7'/*
 failed: 256

The test has not changed; rather, it looks like drongo wasn't
trying to run it before.

This test is passing in the newer branches --- evidently due to
the 9.5-era commit 1a629c1b1, which removed this TAP script's
dependency on "rm -rf".  So we should either back-patch that
commit into 9.4 or undo whatever configuration change caused
drongo to try to run more tests.  I favor the former.

regards, tom lane




Re: Drongo vs. 9.4 initdb TAP test

2019-12-22 Thread Michael Paquier
On Sun, Dec 22, 2019 at 07:24:09PM -0500, Tom Lane wrote:
> This test is passing in the newer branches --- evidently due to
> the 9.5-era commit 1a629c1b1, which removed this TAP script's
> dependency on "rm -rf".  So we should either back-patch that
> commit into 9.4 or undo whatever configuration change caused
> drongo to try to run more tests.  I favor the former.

I would prefer simply removing the dependency on "rm -rf" in the tests, even
if that's only for a short time as 9.4 is EOL in two months.  A back-patch
applies without conflicts, and the tests are able to pass.  Would you prefer
doing it yourself?  I have not checked on Windows yet; better to make sure
that it does not fail.
--
Michael


signature.asc
Description: PGP signature


Re: Drongo vs. 9.4 initdb TAP test

2019-12-22 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Dec 22, 2019 at 07:24:09PM -0500, Tom Lane wrote:
>> This test is passing in the newer branches --- evidently due to
>> the 9.5-era commit 1a629c1b1, which removed this TAP script's
>> dependency on "rm -rf".  So we should either back-patch that
>> commit into 9.4 or undo whatever configuration change caused
>> drongo to try to run more tests.  I favor the former.

> I would prefer simply removing the dependency of rm -rf in the tests,
> even if that's for a short time as 9.4 is EOL in two months.

I'd vote for back-patching 1a629c1b1 as-is, or is that what you meant?

> A back-patch applies without conflicts, and the tests are able to pass.
> Would you prefer doing it yourself?  I have not checked yet on
> Windows, better to make sure that it does not fail.

I don't have the ability to test it on Windows --- if you want to do that,
feel free to do so and push.

regards, tom lane




Re: Implementing Incremental View Maintenance

2019-12-22 Thread Yugo Nagata
On Sun, 22 Dec 2019 20:54:41 +0900
nuko yokohama  wrote:

> SELECT statement that is not IMMUTABLE must not be specified when creating
> a view.
> 
> An expression SELECT statement that is not IMMUTABLE must not be specified
> when creating a view.
> 
> In the current implementation, a SELECT statement containing an expression
> that is not IMMUTABLE can be specified when creating a view.
> If an incremental materialized view is created from a SELECT statement that
> contains an expression that is not IMMUTABLE, applying the SELECT statement
> to the view returns incorrect results.
> To prevent this, we propose that the same error occur when a non-IMMUTABLE
> expression is specified in the "CREATE INDEX" statement.

Thank you for pointing this out. That makes sense.  The check for
non-IMMUTABLE expressions is missing when creating an IMMV.  We'll add it.
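
For comparison, this is the existing CREATE INDEX behaviour the proposal
refers to, using the same expression as the example below:

CREATE TABLE base (id int primary key, data text, ts timestamp);
CREATE INDEX ON base ((ts >= now() - '3 second'::interval));
ERROR:  functions in index expression must be marked IMMUTABLE

The plan is for CREATE INCREMENTAL MATERIALIZED VIEW to raise a similar error
for such expressions.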

Thanks,
Yugo Nagata

> 
> The following is an inappropriate example.
> 
> CREATE TABLE base (id int primary key, data text, ts timestamp);
> CREATE TABLE
> CREATE VIEW base_v AS SELECT * FROM base
>   WHERE ts >= (now() - '3 second'::interval);
> CREATE VIEW
> CREATE MATERIALIZED VIEW base_mv AS SELECT * FROM base
>   WHERE ts >= (now() - '3 second'::interval);
> SELECT 0
> CREATE INCREMENTAL MATERIALIZED VIEW base_imv AS SELECT * FROM base
>   WHERE ts >= (now() - '3 second'::interval);
> SELECT 0
>   View "public.base_v"
>  Column |Type | Collation | Nullable | Default |
> Storage  | Description
> +-+---+--+-+--+-
>  id | integer |   |  | |
> plain|
>  data   | text|   |  | |
> extended |
>  ts | timestamp without time zone |   |  | |
> plain|
> View definition:
>  SELECT base.id,
> base.data,
> base.ts
>FROM base
>   WHERE base.ts >= (now() - '00:00:03'::interval);
> 
>   Materialized view "public.base_mv"
>  Column |Type | Collation | Nullable | Default |
> Storage  | Stats target | Description
> +-+---+--+-+--+--+-
>  id | integer |   |  | |
> plain|  |
>  data   | text|   |  | |
> extended |  |
>  ts | timestamp without time zone |   |  | |
> plain|  |
> View definition:
>  SELECT base.id,
> base.data,
> base.ts
>FROM base
>   WHERE base.ts >= (now() - '00:00:03'::interval);
> Access method: heap
> 
>  Materialized view "public.base_imv"
> Column |Type | Collation | Nullable |
> Default | Storage  | Stats target | Description
> ---+-+---+--+-+--+--+-
>  id| integer |   |  |
>   | plain|  |
>  data  | text|   |  |
>   | extended |  |
>  ts| timestamp without time zone |   |  |
>   | plain|  |
>  __ivm_count__ | bigint  |   |  |
>   | plain|  |
> View definition:
>  SELECT base.id,
> base.data,
> base.ts
>FROM base
>   WHERE base.ts >= (now() - '00:00:03'::interval);
> Access method: heap
> Incremental view maintenance: yes
> 
> INSERT INTO base VALUES (generate_series(1,3), 'dummy', clock_timestamp());
> INSERT 0 3
> SELECT * FROM base_v ORDER BY id;
>  id | data  | ts
> +---+
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> SELECT * FROM base_mv ORDER BY id;
>  id | data | ts
> +--+
> (0 rows)
> 
> REFRESH MATERIALIZED VIEW base_mv;
> REFRESH MATERIALIZED VIEW
> SELECT * FROM base_mv ORDER BY id;
>  id | data  | ts
> +---+
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> SELECT * FROM base_imv ORDER BY id;
>  id | data  | ts
> +---+
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> SELECT pg_sleep(3);
>  pg_sleep
> --
> 
> (1 row)
> 
> INSERT INTO base VALUES (generate_series(4,6), 'dummy', clock_timestamp());
> INSERT 0 3
> SELECT * FROM base_v ORDER BY id;
>  id | data  | ts
> +---+

Re: Implementing Incremental View Maintenance

2019-12-22 Thread Tatsuo Ishii
> Could you give some concrete use cases, so that I can have a clearer image of 
> the target data?  In the discussion, someone referred to master data with low 
> update frequency, because the proposed IVM implementation adds triggers on 
> source tables, which limits the applicability to update-heavy tables.

But if you always want up-to-date data you need to pay the cost of REFRESH
MATERIALIZED VIEW. IVM gives a choice here.

pgbench -s 100
create materialized view mv1 as select count(*) from pgbench_accounts;
create incremental materialized view mv2 as select count(*) from 
pgbench_accounts;

Now I delete one row from pgbench_accounts.

test=# delete from pgbench_accounts where aid = 1000;
DELETE 1
Time: 12.387 ms

Of course this makes mv1's data obsolete:
test=# select * from mv1;
  count   
----------
 10000000
(1 row)

To reflect the fact on mv1 that a row was deleted from
pgbench_accounts, you need to refresh mv1:

test=# refresh materialized view mv1;
REFRESH MATERIALIZED VIEW
Time: 788.757 ms

which takes 788ms. With mv2 you don't need to pay this cost to get the
latest data.

This is a kind of ideal use case for IVM, and I do not claim that IVM always
wins over an ordinary materialized view (or a non-materialized view).  The
benefit IVM gives is that the materialized view is instantly updated whenever
the base tables are updated, at the cost of longer update times on the base
tables.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Drongo vs. 9.4 initdb TAP test

2019-12-22 Thread Michael Paquier
On Sun, Dec 22, 2019 at 07:57:34PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
>> On Sun, Dec 22, 2019 at 07:24:09PM -0500, Tom Lane wrote:
>>> This test is passing in the newer branches --- evidently due to
>>> the 9.5-era commit 1a629c1b1, which removed this TAP script's
>>> dependency on "rm -rf".  So we should either back-patch that
>>> commit into 9.4 or undo whatever configuration change caused
>>> drongo to try to run more tests.  I favor the former.
> 
>> I would prefer simply removing the dependency of rm -rf in the tests,
>> even if that's for a short time as 9.4 is EOL in two months.
> 
> I'd vote for back-patching 1a629c1b1 as-is, or is that what you meant?

Yes, that's what I meant.

>> A back-patch applies without conflicts, and the tests are able to pass.
>> Would you prefer doing it yourself?  I have not checked yet on
>> Windows, better to make sure that it does not fail.
> 
> I don't have the ability to test it on Windows --- if you want to do that,
> feel free to do so and push.

Thanks, done.  The original commit had a typo in one comment, fixed later on
by a9793e07, so I have included that fix here as well.
--
Michael


signature.asc
Description: PGP signature


RE: Implementing Incremental View Maintenance

2019-12-22 Thread tsunakawa.ta...@fujitsu.com
From: legrand legrand 
> For each insert into a base table there are 3 statements:
> - ANALYZE pg_temp_3.pg_temp_81976
> - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> - DROP TABLE pg_temp_3.pg_temp_81976

Does it also include CREATE TEMPORARY TABLE, because there's DROP?

I remember that repeated CREATE and DROP of temporary tables should be avoided
in PostgreSQL.  Dropped temporary tables leave some unused memory in
CacheMemoryContext.  If creation and deletion of temporary tables are done per
row in a single session, say when loading a large amount of data, memory bloat
could crash the OS.  That actually happened in a user's environment.

Plus, repeated create/drop may cause system catalog bloat as well, even when
performed in different sessions.  In a fortunate case, the garbage records
gather at the end of the system tables, and autovacuum will free those empty
areas by truncating the data files.  However, if some valid entry persists
beyond the long garbage area, the system tables will remain bloated.
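
As a minimal sketch of the pattern being warned about (this is not the IVM
code itself, just an illustration), per-row creation and deletion of a
temporary table in a single session looks like this, and each iteration
leaves a little more garbage in CacheMemoryContext and the catalogs:

DO $$
BEGIN
  FOR i IN 1 .. 100000 LOOP
    CREATE TEMP TABLE ivm_delta (id int);  -- table name is illustrative
    DROP TABLE ivm_delta;
  END LOOP;
END $$;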

What kind of workload and data are you targeting with IVM?


Regards
Takayuki Tsunakawa





Re: TCP option assign hook doesn't work well if option not supported

2019-12-22 Thread Michael Paquier
On Thu, Dec 19, 2019 at 07:26:19PM +0100, Peter Eisentraut wrote:
> macOS does not support the socket option TCP_USER_TIMEOUT.  Yet, I can start
> a server with postgres -D ... --tcp-user-timeout=100 without a diagnostic.
> Only when I connect I get a log entry
> 
> LOG:  setsockopt(TCP_USER_TIMEOUT) not supported

Yeah, this choice was made to be consistent with what we have for the
other TCP parameters.

> So that the #else branch that is supposed to check this will also be run in
> the postmaster (where port == NULL).

Hm.  That would partially revisit cc3bda3.  No actual objection from me to
generating a LOG when starting the postmaster, as that won't be invasive,
though I think that it should be done consistently for all the TCP
parameters.

> Or perhaps there should be a separate GUC check hook that just does
> 
> #ifndef TCP_USER_TIMEOUT
> if (val != 0)
> return false;
> #endif
> return true;
> 
> The same considerations apply to the various TCP keepalive settings, but
> since those are widely supported the unsupported code paths probably haven't
> gotten much attention.

Yeah, Windows does not support tcp_keepalives_count for one, so setting it in
postgresql.conf generates the same LOG message for each connection attempt.
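
For example, on such a platform a postgresql.conf setting like

tcp_keepalives_count = 5

is accepted silently at postmaster start, and each connection attempt then
emits a LOG line of the same shape as the TCP_USER_TIMEOUT one above,
something like "setsockopt(TCP_KEEPCNT) not supported".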
--
Michael


signature.asc
Description: PGP signature


Re: Fetching timeline during recovery

2019-12-22 Thread Michael Paquier
On Fri, Dec 20, 2019 at 11:14:28AM +0100, Jehan-Guillaume de Rorthais wrote:
> Yes, that would be great but sadly, it would introduce a regression on various
> tools relying on them. At least, the one doing "select *" or most
> probably "select func()".
> 
> But anyway, adding 5 funcs is not a big deal neither. Too bad they are so 
> close
> to existing ones though.

Consistency of the data matters a lot if we want to build reliable tools on
top of these functions, in case someone would like to compare the various
modes, and using different functions for those fields creates locking issues
(somewhat the point of Fujii-san upthread?).  If nobody likes the approach of
one function returning one row and taking the wanted mode as input, then I
would not really object to Stephen's idea of having a multi-column function
returning one row.
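
To illustrate the two shapes being discussed (the function names and columns
below are purely hypothetical and do not come from any posted patch):

-- one function, one row, the wanted mode passed as input
SELECT * FROM pg_wal_lsn_tli('replay');
-- versus a single multi-column function returning every mode at once
SELECT * FROM pg_wal_lsn_tli_all();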

>> Right. It is a restriction of polymorphic functions. It is in the same
>> relation with pg_stop_backup() and pg_stop_backup(true).

(pg_current_wal_lsn & co talk about LSNs, not TLIs).
--
Michael


signature.asc
Description: PGP signature


Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-12-22 Thread Michael Paquier
On Wed, Dec 04, 2019 at 09:54:47AM -0800, Andres Freund wrote:
> Well, _Static_assert has an error message, so we got to pass
> something. And having something like "array length mismatch", without
> referring again to the variable, doesn't strike me as that bad. We could
> of course just again pass the expression, this time stringified, but
> that doesn't seem an improvement.

Yeah, I would rather keep the second argument.  I think that's also
more helpful as it gives more flexibility to extension authors willing
to make use of it.
--
Michael


signature.asc
Description: PGP signature


Re: backup manifests

2019-12-22 Thread Rushabh Lathia
On Fri, Dec 20, 2019 at 9:14 PM Robert Haas  wrote:

> On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
>  wrote:
> > Thank you for review comments.
>
> Thanks for the new version.
>
> +  --verify-backup 
>
> Whitespace.
>
> +struct manifesthash_hash *hashtab;
>
> Uh, I had it in mind that you would nuke this line completely, not
> just remove "typedef" from it. You shouldn't need a global variable
> here.
>
> + if (buf == NULL)
>
> pg_malloc seems to have an internal check such that it never returns
> NULL. I don't see anything like this test in other callers.
>
> The order of operations in create_manifest_hash() seems unusual:
>
> + fd = open(manifest_path, O_RDONLY, 0);
> + if (fstat(fd, &stat))
> + buf = pg_malloc(stat.st_size);
> + hashtab = manifesthash_create(1024, NULL);
> ...
> + entry = manifesthash_insert(hashtab, filename, &found);
> ...
> + close(fd);
>
> I would have expected open-fstat-read-close to be consecutive, and the
> manifesthash stuff all done afterwards. In fact, it seems like reading
> the file could be a separate function.
>
> + if (strncmp(checksum, "SHA256", 6) == 0)
>
> This isn't really right; it would give a false match if we had a
> checksum algorithm with a name like SHA2560 or SHA256C or
> SHA256ExceptWayBetter. The right thing to do is find the colon first,
> and then probably overwrite it with '\0' so that you have a string
> that you can pass to parse_checksum_algorithm().
>
> + /*
> + * we don't have checksum type in the header, so need to
> + * read through the first file enttry to find the checksum
> + * type for the manifest file and initilize the checksum
> + * for the manifest file itself.
> + */
>
> This seems to be proceeding on the assumption that the checksum type
> for the manifest itself will always be the same as the checksum type
> for the first file in the manifest. I don't think that's the right
> approach. I think the manifest should always have a SHA256 checksum,
> regardless of what type of checksum is used for the individual files
> within the manifest. Since the volume of data in the manifest is
> presumably very small compared to the size of the database cluster
> itself, I don't think there should be any performance problem there.
>

Agreed that performance won't be a problem, but that will be a bit confusing
to the user: at the start the user provides the manifest checksum option
(assume the user provided CRC32C), and at the end the user finds a SHA256
checksum string in the backup_manifest file.

Does this also mean that, irrespective of whether the user provided a checksum
option or not, we will always generate the checksum for the backup_manifest
file?


> + filesize = atol(size);
>
> Using strtol() would allow for some error checking.
>
> + * Increase the checksum by its lable length so that we can
> + checksum = checksum + checksum_lable_length;
>
> Spelling.
>
> + pg_log_error("invalid record found in \"%s\"", manifest_path);
>
> Error message needs work.
>
> +VerifyBackup(void)
> +create_manifest_hash(char *manifest_path)
> +nextLine(char *buf)
>
> Your function names should be consistent with the surrounding style,
> and with each other, as far as possible. Three different conventions
> within the same patch and source file seems over the top.
>
> Also keep in mind that you're not writing code in a vacuum. There's a
> whole file of code here, and around that, a whole project.
> scan_data_directory() is a good example of a function whose name is
> clearly too generic. It's not a general-purpose function for scanning
> the data directory; it's specifically a support function for verifying
> a backup. Yet, the name gives no hint of this.
>
> +verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
> + char relative_path[MAXPGPATH], manifesthash_hash *hashtab)
>
> I think I commented on the use of char[] parameters in my previous review.
>
> + /* Skip backup manifest file. */
> + if (strcmp(de->d_name, "backup_manifest") == 0)
> + return;
>
> Still looks like this will be skipped at any level of the directory
> hierarchy, not just the top. And why are we skipping backup_manifest
> here bug pg_wal in scan_data_directory? That's a rhetorical question,
> because I know the answer: verify_file() is only getting called for
> files, so you can't use it to skip directories. But that's not a good
> excuse for putting closely-related checks in different parts of the
> code. It's just going to result in the checks being inconsistent and
> each one having its own bugs that have to be fixed separately from the
> other one, as here. Please try to reorganize this code so that it can
> be done in a consistent way.
>
> I think this is related to the way you're traversing the directory
> tree, which somehow looks a bit awkward to me. At the top of
> scan_data_directory(), you've got code that uses basedir and
> subdirpath to construct path and relative_path. I was initially
> surprised to see that this was the job of this function, rather than
> the

Should we rename amapi.h and amapi.c?

2019-12-22 Thread Michael Paquier
Hi all,

I was working on some stuff for table AMs, and I got to wondering if we had
better rename amapi.h to indexam.h and amapi.c to indexam.c, so that things
are more consistent with table AMs.  It is a bit annoying to name the files
dedicated to index AMs with what now looks like too generic a name.  That
would require switching a couple of header files for existing module
developers, which is always annoying, but the move makes sense when thinking
long-term?

Any thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: Implementing Incremental View Maintenance

2019-12-22 Thread Yugo Nagata
On Mon, 23 Dec 2019 02:26:09 +
"tsunakawa.ta...@fujitsu.com"  wrote:

> From: legrand legrand 
> > For each insert into a base table there are 3 statements:
> > - ANALYZE pg_temp_3.pg_temp_81976
> > - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> > - DROP TABLE pg_temp_3.pg_temp_81976
> 
> Does it also include CREATE TEMPORARY TABLE, because there's DROP?

CREATE TEMPORARY TABLE is not called because the temp tables are created
by make_new_heap() instead of by queries via SPI.
 
> I remember that repeated CREATE and DROP of temporary tables should be 
> avoided in PostgreSQL.  Dropped temporary tables leave some unused memory in 
> CacheMemoryContext.  If creation and deletion of temporary tables are done 
> per row in a single session, say loading of large amount of data, memory 
> bloat could crash the OS.  That actually happened at a user's environment.

> Plus, repeated create/drop may cause system catalog bloat as well even when 
> they are performed in different sessions.  In a fortunate case, the garbage 
> records gather at the end of the system tables, and autovacuum will free 
> those empty areas by truncating data files.  However, if some valid entry 
> persists after the long garbage area, the system tables would remain bloated.

Thank you for explaining the problem. I understood that creating and
dropping temporary tables is more harmful than I had thought. Although
this is not a concrete plan, there are two ideas to reduce the creation
of temporary tables:

1. Create a temporary table only once, at the first view maintenance in
this session. This is possible if we keep the names or OIDs of the
temporary tables used for each materialized view in memory. However,
users could access these temp tables at any time during the session.

2. Use tuplestores instead of temporary tables. Tuplestores can be
exposed as Ephemeral Named Relations (ENRs) and used in queries.
This doesn't require updating the system catalogs, but indexes cannot
be used for access.

> 
> What kind of workload and data are you targeting with IVM?

IVM (with the immediate maintenance approach) would be efficient in
situations where modifications of the base tables are not frequent.
In such situations, the creation and dropping of temp tables is not so
frequent either, but the problem you are concerned about could still
occur. So it seems worth considering ways to reduce the use of temp
tables.

Regards,
Yugo Nagata


-- 
Yugo Nagata 




Re: Implementing Incremental View Maintenance

2019-12-22 Thread Julien Rouhaud
On Mon, Dec 23, 2019 at 7:51 AM Yugo Nagata  wrote:
>
> On Mon, 23 Dec 2019 02:26:09 +
> "tsunakawa.ta...@fujitsu.com"  wrote:
>
> > From: legrand legrand 
> > > For each insert into a base table there are 3 statements:
> > > - ANALYZE pg_temp_3.pg_temp_81976
> > > - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> > > - DROP TABLE pg_temp_3.pg_temp_81976
> >
> > Does it also include CREATE TEMPORARY TABLE, because there's DROP?
>
> CREATE TEMPRARY TABLE is not called because temptables are created
> by make_new_heap() instead of queries via SPI.
>
> > I remember that repeated CREATE and DROP of temporary tables should be 
> > avoided in PostgreSQL.  Dropped temporary tables leave some unused memory 
> > in CacheMemoryContext.  If creation and deletion of temporary tables are 
> > done per row in a single session, say loading of large amount of data, 
> > memory bloat could crash the OS.  That actually happened at a user's 
> > environment.
>
> > Plus, repeated create/drop may cause system catalog bloat as well even when 
> > they are performed in different sessions.  In a fortunate case, the garbage 
> > records gather at the end of the system tables, and autovacuum will free 
> > those empty areas by truncating data files.  However, if some valid entry 
> > persists after the long garbage area, the system tables would remain 
> > bloated.
>
> Thank you for explaining the problem. I understood that creating and
> dropping temprary tables is harmful more than I have thought. Although
> this is not a concrete plan, there are two ideas to reduce creating
> temporary tables:

As far as pg_stat_statements is concerned, utility command support is
already quite bad, as with many workloads it's rather impossible to
activate track_utility since it would otherwise pollute the hashtable
with an infinity of queries executed only once (random prepared
transaction names, random cursor names...).  I'm wondering whether we
should normalize utility statements by deparsing the utilityStmt, and
also normalize some identifiers (maybe optionally with a GUC), e.g.
"DECLARE ? AS CURSOR FOR normalized_query_here".  However, commands
like vacuum or drop would be better kept as-is.
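
A minimal illustration of that pollution, assuming the extension is installed
and track_utility = on (the cursor names are arbitrary, as an ORM or driver
might generate them):

BEGIN;
DECLARE cur_20191223_0001 CURSOR FOR SELECT 1;
CLOSE cur_20191223_0001;
DECLARE cur_20191223_0002 CURSOR FOR SELECT 1;
CLOSE cur_20191223_0002;
COMMIT;
SELECT query, calls FROM pg_stat_statements WHERE query LIKE 'DECLARE%';
-- today this shows two distinct entries, where a normalized
-- "DECLARE ? CURSOR FOR ..." form would collapse them into one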




Re: [HACKERS] Block level parallel vacuum

2019-12-22 Thread Mahendra Singh
On Fri, 20 Dec 2019 at 17:17, Prabhat Sahu
 wrote:
>
> Hi,
>
> While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a 
> server crash on PG Head+V36_patch.
> Changed configuration parameters and Stack trace are as below:
>
> autovacuum = on
> max_worker_processes = 4
> shared_buffers = 10MB
> max_parallel_workers = 8
> max_parallel_maintenance_workers = 8
> vacuum_cost_limit = 2000
> vacuum_cost_delay = 10
> min_parallel_table_scan_size = 8MB
> min_parallel_index_scan_size = 0
>
> -- Stack trace:
> [centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
> Reading symbols from 
> /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
> [New LWP 1399]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: autovacuum worker   postgres 
>  '.
> Program terminated with signal 6, Aborted.
> #0  0x7f4517d80337 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 
> libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 
> libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 
> pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
> (gdb) bt
> #0  0x7f4517d80337 in raise () from /lib64/libc.so.6
> #1  0x7f4517d81a28 in abort () from /lib64/libc.so.6
> #2  0x00a96341 in ExceptionalCondition (conditionName=0xd18efb 
> "strvalue != NULL", errorType=0xd18eeb "FailedAssertion",
> fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
> #3  0x00b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 
> ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442
> #4  0x00b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping 
> orphan temp table \"postgres.", '\177' ..., count=1024,
> fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", 
> args=0x7ffdb0e38538) at snprintf.c:195
> #5  0x00afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping 
> orphan temp table \"postgres.", '\177' ..., len=1024,
> fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", 
> args=0x7ffdb0e38538) at psprintf.c:110
> #6  0x00afd34b in appendStringInfoVA (str=0x7ffdb0e38550, 
> fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", 
> args=0x7ffdb0e38538)
> at stringinfo.c:149
> #7  0x00a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan 
> temp table \"%s.%s.%s\"") at elog.c:832
> #8  0x008588d2 in do_autovacuum () at autovacuum.c:2249
> #9  0x00857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at 
> autovacuum.c:1689
> #10 0x0085772f in StartAutoVacWorker () at autovacuum.c:1483
> #11 0x0086e64f in StartAutovacuumWorker () at postmaster.c:5562
> #12 0x0086e106 in sigusr1_handler (postgres_signal_arg=10) at 
> postmaster.c:5279
> #13 
> #14 0x7f4517e3f933 in __select_nocancel () from /lib64/libc.so.6
> #15 0x00869838 in ServerLoop () at postmaster.c:1691
> #16 0x00869212 in PostmasterMain (argc=3, argv=0x256bd70) at 
> postmaster.c:1400
> #17 0x0077855d in main (argc=3, argv=0x256bd70) at main.c:210
> (gdb)
>
> I have tried to reproduce the same with all previously executed queries but 
> now I am not able to reproduce the same.

Thanks Prabhat for reporting this issue.

I am able to reproduce this issue at my end.  I tested and verified that
this issue is not related to the parallel vacuum patch; I am able to
reproduce it on HEAD without the parallel vacuum patch (v37).

I will report this issue in a new thread with a reproducible test case.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com




RE: Implementing Incremental View Maintenance

2019-12-22 Thread tsunakawa.ta...@fujitsu.com
From: Tatsuo Ishii 
> the target data?  In the discussion, someone referred to master data with low
> update frequency, because the proposed IVM implementation adds triggers on
> source tables, which limits the applicability to update-heavy tables.
> 
> But if you want to get always up-to-data you need to pay the cost for
> REFRESH MATERIALIZED VIEW. IVM gives a choice here.

Thank you, that clarified things to some extent.  What kind of data do you 
have in mind as an example?

Materialized views remind me of their use in a data warehouse.  Oracle covers 
the topic in its Database Data Warehousing Guide, and Microsoft has just started 
to offer the materialized view feature in its Azure Synapse Analytics (formerly 
SQL Data Warehouse).  AWS has also previewed Redshift's materialized view 
feature at re:Invent 2019.  Are you targeting the data warehouse (analytics) 
workload?

IIUC, to put (over) simply, the data warehouse has two kind of tables:

* Facts (transaction data): e.g. sales, user activity
Large amount.  INSERT only on a regular basis (ETL/ELT) or continuously 
(streaming)

* Dimensions (master/reference data): e.g. product, customer, time, country
Small amount.  Infrequently INSERTed or UPDATEd.


The proposed trigger-based approach does not seem to be suitable for the facts, 
because the trigger overhead imposed on data loading may offset or exceed the 
time saved by incrementally refreshing the materialized views.

Then, does the proposed feature fit the dimension tables?  If the materialized 
view is based only on the dimension data, a full REFRESH of the materialized 
view wouldn't take so long anyway.  But the typical materialized view joins 
the fact and dimension tables, and then the fact table will have to carry the 
triggers, causing the data-loading slowdown.
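
To illustrate with made-up table and column names, a typical warehouse-style
view joins both kinds of tables, so under the current proposal it is the
large fact table that ends up carrying the IVM triggers (whether this exact
query is already supported by the patch is beside the point here):

CREATE INCREMENTAL MATERIALIZED VIEW sales_by_product AS
SELECT p.product_name, s.amount, s.sold_at
  FROM sales s                                   -- large fact table
  JOIN product p ON p.product_id = s.product_id; -- small dimension table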

I'm saying this because I'm concerned about the trigger-based overhead.  As you 
know, Oracle uses materialized view logs to save changes and incrementally 
apply them later to the materialized views (REFRESH ON STATEMENT materialized 
views don't require a materialized view log, so they might use triggers).  
Does any commercial-grade database implement materialized views using triggers?  
I couldn't find relevant information regarding Azure Synapse and Redshift.

If our only handy option is a trigger, can we minimize the overhead by doing 
the view maintenance at transaction commit?


Regards
Takayuki Tsunakawa