Re: adding partitioned tables to publications

2020-03-18 Thread Amit Langote
On Wed, Mar 18, 2020 at 12:06 PM Amit Langote  wrote:
> Hi Peter,
>
> On Mon, Mar 16, 2020 at 9:49 PM Peter Eisentraut
>  wrote:
> >
> > I was trying to extract some preparatory work from the remaining patches
> > and came up with the attached.  This is part of your patch 0003, but
> > also relevant for part 0004.  The problem was that COPY (SELECT *) is
> > not sufficient when the table has generated columns, so we need to build
> > the column list explicitly.
> >
> > Thoughts?
>
> Thank you for that.
>
> +   if (isnull || !remote_is_publishable)
> +   ereport(ERROR,
> +   (errmsg("table \"%s.%s\" on the publisher is not publishable",
> +   nspname, relname)));
>
> Maybe add a one-line comment above this to say it's a "not supposed
> to happen" error, or am I missing something?  Wouldn't elog() suffice
> for this?
>
> Other than that, looks good.

Wait, the following Assert in copy_table() should now be gone:

Assert(relmapentry->localrel->rd_rel->relkind == RELKIND_RELATION);

because just below it:

/* Start copy on the publisher. */
initStringInfo(&cmd);
-   appendStringInfo(&cmd, "COPY %s TO STDOUT",
-quote_qualified_identifier(lrel.nspname, lrel.relname));
+   if (lrel.relkind == RELKIND_RELATION)
+   appendStringInfo(&cmd, "COPY %s TO STDOUT",
+quote_qualified_identifier(lrel.nspname, lrel.relname));
+   else
+   {
+   /*
+* For non-tables, we need to do COPY (SELECT ...), but we can't just
+* do SELECT * because we need to not copy generated columns.
+*/
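
(For illustration, the cut-off branch might be completed along these lines,
building an explicit column list so that generated columns are skipped; this
is a sketch, not the exact patch code:)

    else
    {
        /* Build COPY (SELECT col1, col2, ... FROM tab) TO STDOUT */
        appendStringInfoString(&cmd, "COPY (SELECT ");
        for (int i = 0; i < lrel.natts; i++)
        {
            appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
            if (i < lrel.natts - 1)
                appendStringInfoString(&cmd, ", ");
        }
        appendStringInfo(&cmd, " FROM %s) TO STDOUT",
                         quote_qualified_identifier(lrel.nspname, lrel.relname));
    }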

By the way, I have rebased the patches, although maybe you've got your
own copies; attached.

-- 
Thank you,
Amit


v12-0003-Add-subscription-support-to-replicate-into-parti.patch
Description: Binary data


v12-0002-Some-refactoring-of-logical-worker.c.patch
Description: Binary data


v12-0004-Publish-partitioned-table-inserts-as-its-own.patch
Description: Binary data


Re: Collation versioning

2020-03-18 Thread Michael Paquier
On Tue, Mar 17, 2020 at 11:42:34AM +0100, Julien Rouhaud wrote:
> On Tue, Mar 17, 2020 at 08:19:30AM +0100, Julien Rouhaud wrote:
>> On Tue, Mar 17, 2020 at 03:37:49PM +0900, Michael Paquier wrote:
>>> It seems to me that you could add an extra test with a catalog that
>>> does not exist, making sure that NULL is returned:
>>> SELECT to_regtype('ng_catalog."POSIX"');
>>
>>
>> Agreed, I'll add that, but using a name that looks less like a typo :)
> 
> Tests added, including one with an error output, as the nonexistent schema
> doesn't reveal the encoding.

Yep.

>> Ah good catch, I missed that during the NameData/text refactoring.  I'll fix it
>> anyway; better to have clean history.
> 
> And this should be fixed too.

Thanks.

It would be good to be careful about the indentation.  Certain parts
of 0003 don't respect the core indentation.  Not necessarily your job
though.  Other than that 0003 seems to be in good shape.

@@ -54,6 +55,7 @@ recordDependencyOn(const ObjectAddress *depender,
 void
 recordMultipleDependencies(const ObjectAddress *depender,
const ObjectAddress *referenced,
+   const char *version,
int nreferenced,
DependencyType behavior)
Nit from patch 0002: the argument "version" should be fourth I think,
keeping the number of referenced objects and the referenced objects
close.  And actually, this "version" argument is removed in patch
0004, replaced by the boolean track_version.  (By reading the
arguments below I'd rather keep *version).
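
(That is, something like the following, purely to illustrate the suggested
argument order:)

void
recordMultipleDependencies(const ObjectAddress *depender,
                           const ObjectAddress *referenced,
                           int nreferenced,
                           const char *version,
                           DependencyType behavior)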

So, 0004 is the core of the changes.  I have found a bug with the
handling of refobjversion and pg_depend entries.  When swapping the
dependencies of the old and new indexes in index_concurrently_swap(),
refobjversion remains set to the value of the old index.  I used a
manual UPDATE on pg_depend with a fake past version string to emulate
that (sneaky, I know), but you would get the same behavior with an
upgraded instance.  refobjversion should be updated to the version of
the new index.
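
(A rough illustration of that manual emulation, with made-up index and
version names; refobjversion is the pg_depend column added by this patch set:)

UPDATE pg_depend
   SET refobjversion = '0.fake-old-version'
 WHERE objid = 'idx_old'::regclass
   AND refclassid = 'pg_collation'::regclass;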

+void recordDependencyOnCollations(ObjectAddress *myself,
+  List *collations,
+  bool record_version)
Incorrect declaration format.

+if (track_version)
+{
+/* Only dependency on a collation is supported. */
+if (referenced->classId == CollationRelationId)
+{
+/* C and POSIX collations don't require tracking the version */
+if (referenced->objectId == C_COLLATION_OID ||
+referenced->objectId == POSIX_COLLATION_OID)
+continue;
I don't think that the API is right for this stuff, as you introduce
collation-level knowledge into something which has been much more
flexible until now.  Wouldn't it be better to move the refobjversion
string directly into ObjectAddress?
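
(For example, roughly; the "version" field below is hypothetical, added just
to illustrate the idea:)

typedef struct ObjectAddress
{
    Oid         classId;        /* Class Id from pg_class */
    Oid         objectId;       /* OID of the object */
    int32       objectSubId;    /* Subsystem ID; 0 if none */
    char       *version;        /* version of the referenced object, or NULL */
} ObjectAddress;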

+ * Test if a record exists for the given depender and referenceds addresses.
[...]
+   /* recordDependencyOnSingleRelExpr get rid of duplicate entries */
Typos.

+   /* XXX should we warn about "disappearing" versions? */
+   if (current_version)
What are disappearing version strings?

+/*
+ * Perform version sanity checks on the relation underlying indexes if
+ * it's not a VACUUM FULL
+ */
+if (!(options & VACOPT_FULL) && onerel && !IsSystemRelation(onerel) &&
+onerel->rd_rel->relhasindex)
Should this explain why?

+/* Record collations from the type itself, or underlying in case of
+ * complex type.  Note that if the direct parent is a CollateExpr
+ * node, there's no need to record the type underlying collation if
Comment block format.

+-- for key columns, hash indexes should record dependency on the collation but
+-- not the version
+CREATE INDEX icuidx18_hash_d_es ON collate_test USING hash (d_es);
Why is that?  The code in 0004 has no mention of that, and relies on
this code path:
+/*
+ * Returns whether the given index access method depend on a stable collation
+ * order.
+ */
+static bool
+index_depends_stable_coll_order(Oid amoid)
+{
+   return (amoid != HASH_AM_OID &&
+   strcmp(get_am_name(amoid), "bloom") != 0);
+}
And how is that even extensible for custom index AMs?  There are other
things than bloom out there.
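
(One way to make it extensible would be to let each AM declare the property
itself instead of hardcoding OIDs; "amcanstablecollorder" below is a
hypothetical IndexAmRoutine member, just to sketch the idea:)

static bool
index_depends_stable_coll_order(Oid amoid)
{
    IndexAmRoutine *routine = GetIndexAmRoutineByAmId(amoid, false);

    return routine->amcanstablecollorder;
}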

+   /*
+* We only care about dependencies on a specific collation if a valid Oid
+* was given.=
+*/
[...]
+   /*
+* do not issue UNKNOWN VERSION is caller specified that those are
+* compatible
+*/
Typos from patch 5.

-   $self->logfile, '-o', "--cluster-name=$name", 'start');
+   $self->logfile, '-o', $options, 'start');
This needs to actually be shaped with two separate arguments for
--cluster-name or using quotes would not work properly if I recall
correctly.  Not your patch's fault, so I would fix that separately.
--
Michael



Re: RecoveryWalAll and RecoveryWalStream wait events

2020-03-18 Thread Atsushi Torikoshi
On Tue, Mar 17, 2020 at 11:55 AM Fujii Masao wrote:

> >  >Waiting when WAL data is not available from any kind of sources
> >  >(local, archive or stream) before trying again to retrieve WAL
> data,
> >
> > I think 'local' means pg_wal here, but the comment on
> > WaitForWALToBecomeAvailable() says checking pg_wal in
> > standby mode is 'not documented', so I'm a little bit worried
> > that users may be confused.
>
> This logic seems to be documented in high-availability.sgml.


Thanks! I didn't notice it.
I think you mean the below sentence.

>  The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory.

It seems the comment on WaitForWALToBecomeAvailable()
does not agree with high-availability.sgml; do we need to
modify the comment on the function?
Or am I misunderstanding something?

But, anyway, you think that "pg_wal" should be used instead of "local" here?


I don't have a strong opinion here.
It might be better, because high-availability.sgml uses "pg_wal" rather
than "local" in its explanation, but I also feel it's obvious in this
context.


Regards,

--
Torikoshi Atsushi


Re: Collation versioning

2020-03-18 Thread Julien Rouhaud
On Wed, Mar 18, 2020 at 04:55:25PM +0900, Michael Paquier wrote:
> On Tue, Mar 17, 2020 at 11:42:34AM +0100, Julien Rouhaud wrote:
> > On Tue, Mar 17, 2020 at 08:19:30AM +0100, Julien Rouhaud wrote:
>
> It would be good to be careful about the indentation.  Certain parts
> of 0003 don't respect the core indentation.  Not necessarily your job
> though.  Other than that 0003 seems to be in good shape.


I'll try to do a partial pgindent run on all patches before the next patchset.


>
> @@ -54,6 +55,7 @@ recordDependencyOn(const ObjectAddress *depender,
>  void
>  recordMultipleDependencies(const ObjectAddress *depender,
> const ObjectAddress *referenced,
> +   const char *version,
> int nreferenced,
> DependencyType behavior)
> Nit from patch 0002: the argument "version" should be fourth I think,
> keeping the number of referenced objects and the referenced objects
> close.  And actually, this "version" argument is removed in patch
> 0004, replaced by the boolean track_version.  (By reading the
> arguments below I'd rather keep *version).
>
> So, 0004 is the core of the changes.  I have found a bug with the
> handling of refobjversion and pg_depend entries.  When swapping the
> dependencies of the old and new indexes in index_concurrently_swap(),
> refobjversion remains set to the value of the old index.  I used a
> manual UPDATE on pg_depend to emulate that with a past fake version
> string to emulate that (sneaky I know), but you would get the same
> behavior with an upgraded instance.  refobjversion should be updated
> to the version of the new index.


Oh good catch.  I'll dig into it.


> +if (track_version)
> +{
> +/* Only dependency on a collation is supported. */
> +if (referenced->classId == CollationRelationId)
> +{
> +/* C and POSIX collations don't require tracking the version 
> */
> +if (referenced->objectId == C_COLLATION_OID ||
> +referenced->objectId == POSIX_COLLATION_OID)
> +continue;
> I don't think that the API is right for this stuff, as you introduce
> collation-level knowledge into something which has been much more
> flexible until now.  Wouldn't it be better to move the refobjversion
> string directly into ObjectAddress?


We could, but we would then need to add code to retrieve the collation version
in multiple places (at least RecordDependencyOnCollation and
recordDependencyOnSingleRelExpr).  I'm afraid that would open room for bugs if
some other places are missed, now or later, even more so if more objects get
versioning support.


> + * Test if a record exists for the given depender and referenceds addresses.
> [...]
> +   /* recordDependencyOnSingleRelExpr get rid of duplicate entries */
> Typos.
>
> +   /* XXX should we warn about "disappearing" versions? */
> +   if (current_version)
> What are disappearing version strings?


A collation for which a version was previously recorded but that now doesn't
report a version anymore.  For instance if upgrading from glibc X.Y to X.Z
changes gnu_get_libc_version() to return NULL, or if a new major pg version
removes support for glibc (or other lib) versioning.  It seems unlikely to
happen, and if that happens there's nothing we can do anymore to warn about
possible corruption anyway.


> +/*
> + * Perform version sanity checks on the relation underlying indexes if
> + * it's not a VACUUM FULL
> + */
> +if (!(options & VACOPT_FULL) && onerel && !IsSystemRelation(onerel) &&
> +onerel->rd_rel->relhasindex)
> Should this explain why?


I was assuming it's self-explanatory, since VACUUM FULL is one of only 3
ways to fix a possibly corrupt index (on top of REINDEX and ALTER INDEX ...
ALTER COLLATION ... REFRESH VERSION).  I can mention it if needed though.
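
(That is, with placeholder object names, the three repair paths would look
like this, REFRESH VERSION being the syntax proposed in this patch set:)

REINDEX INDEX idx;
VACUUM FULL tab;
ALTER INDEX idx ALTER COLLATION "en-x-icu" REFRESH VERSION;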


>
> +/* Record collations from the type itself, or underlying in case 
> of
> + * complex type.  Note that if the direct parent is a CollateExpr
> + * node, there's no need to record the type underlying collation 
> if
> Comment block format.

Oops, will fix.


> +-- for key columns, hash indexes should record dependency on the collation 
> but
> +-- not the version
> +CREATE INDEX icuidx18_hash_d_es ON collate_test USING hash (d_es);
> Why is that?


Because hash indexes don't rely on the sort order for the key columns?  So even
if the sort order changes the index won't get corrupted (unless it's a
non-deterministic collation, of course).


> The code in 0004 has no mention of that, and relies on
> this code path:
> +/*
> + * Returns whether the given index access method depend on a stable collation
> + * order.
> + */
> +static bool
> +index_depends_stable_coll_order(Oid amoid)
> +{
> +   return (amoid != HASH_AM_OID &&
> +   strcmp(get_am_name(amoid), "bloom") != 0);
> +}


Th

Re: add types to index storage params on doc

2020-03-18 Thread Fujii Masao




On 2020/03/17 14:52, Atsushi Torikoshi wrote:

On Mon, Mar 16, 2020 at 11:32 PM Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:


 > Should I also change it to "enum"?

Yeah, these were strings until recently (commit 773df883e8f7 Sept 2019).


Thanks!

Attached a patch for manuals on create and alter view.

check_option is also described in information_schema.sgml
and its type is 'character_data', but as far as I read
information_schema.sql this is correct, and it seems there is
no need for modification here.


Thanks for the patch!
I pushed two patches that you posted in this thread. Thanks!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




SupportRequestSimplify and SQL SRF

2020-03-18 Thread Ronan Dunklau
Hello,

I've been trying to play with support functions for a use case of ours, and
found some behaviors I'm not sure are expected or not.

The use case: we have some complicated queries, where whole branches of the
execution tree could be cut if some things were evaluated at planning time.
Take the following simplified example:

CREATE OR REPLACE FUNCTION myfunction(t1_id int)
RETURNS SETOF sometable AS $$
SELECT t2.*
FROM sometable t1
JOIN sometable t2 ON t2.t1_id = t1.id
WHERE t1.id = myfunction.t1_id AND t1.somecolumn IS NOT NULL
$$ LANGUAGE SQL;

If I were to incorporate this function in a larger query, the planner would
choose a plan based on a generic value of t1_id and might estimate a large
rowcount after inlining.

What I want to do is to evaluate whether t1.id = t1_id AND somecolumn IS NOT
NULL at planning time, and replace the function with another one if this
can be pruned altogether.

So, what I've been doing is to implement a support function for
SupportRequestSimplify, and if the predicate doesn't match any row, replace
the FuncExpr with a new one calling a different function.

This seems to work great, but I have several questions:

1) Is it valid to make SPI calls in a support function to do this kind of
simplification?

2) My new FuncExpr doesn't get inlined. This is because in
inline_set_returning_function, we check that after the call to
eval_const_expressions we still call the same function. I think it would be
better to first simplify the function if we can, and only then record the
function oid and call the rest of the machinery. I tested that naively by
calling eval_const_expressions early in inline_set_returning_function and
it seems to do the trick. A proper patch would likely only call the support
function at this stage.

What do you think?
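
(For reference, the support function I've been experimenting with is shaped
roughly as below; branch_can_be_pruned() and myfunction_fast_oid() are
placeholders for the SPI-based check and the replacement function lookup:)

#include "postgres.h"

#include "fmgr.h"
#include "nodes/makefuncs.h"
#include "nodes/supportnodes.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(myfunction_support);

Datum
myfunction_support(PG_FUNCTION_ARGS)
{
    Node       *rawreq = (Node *) PG_GETARG_POINTER(0);
    Node       *ret = NULL;

    if (IsA(rawreq, SupportRequestSimplify))
    {
        SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq;
        FuncExpr   *fexpr = req->fcall;

        /* Inspect the arguments (possibly via SPI) and, if this branch can
         * be pruned, build a call to the cheaper replacement function. */
        if (branch_can_be_pruned(fexpr->args))
            ret = (Node *) makeFuncExpr(myfunction_fast_oid(),
                                        fexpr->funcresulttype,
                                        fexpr->args,
                                        fexpr->funccollid,
                                        fexpr->inputcollid,
                                        COERCE_EXPLICIT_CALL);
    }

    PG_RETURN_POINTER(ret);
}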

-- 






Re: RecoveryWalAll and RecoveryWalStream wait events

2020-03-18 Thread Fujii Masao



On 2020/03/18 17:56, Atsushi Torikoshi wrote:



On Tue, Mar 17, 2020 at 11:55 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:

 >  >    Waiting when WAL data is not available from any kind of sources
 >  >    (local, archive or stream) before trying again to retrieve WAL 
data,
 >
 > I think 'local' means pg_wal here, but the comment on
 > WaitForWALToBecomeAvailable() says checking pg_wal in
 > standby mode is 'not documented', so I'm a little bit worried
 > that users may be confused.

This logic seems to be documented in high-availability.sgml.


Thanks! I didn't notice it.
I think you mean the below sentence.

 >  The standby server will also attempt to restore any WAL found in the 
standby cluster's pg_wal directory.


I meant the following part in the doc.

-
At startup, the standby begins by restoring all WAL available in the archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in the
pg_wal directory. If that fails, and streaming replication has been configured,
the standby tries to connect to the primary server and start streaming WAL from
the last valid record found in archive or pg_wal. If that fails or streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered by a
trigger file.
-


It seems the comment on WaitForWALToBecomeAvailable()
does not agree with high-availability.sgml; do we need to
modify the comment on the function?


No, I don't think so for now. But would you like to improve the docs?

But, anyway, you think that "pg_wal" should be used instead of "local" here?


I don't have a strong opinion here.
It might be better, because high-availability.sgml uses "pg_wal" rather
than "local" in its explanation, but I also feel it's obvious in this
context.


Ok, I changed that from "local" to "pg_wal" in the patch for
the master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7626987808..89853a16d8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1244,7 +1244,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting to acquire a pin on a buffer.
 
 
- Activity
+ Activity
  ArchiverMain
  Waiting in main loop of the archiver process.
 
@@ -1276,17 +1276,9 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  PgStatMain
  Waiting in main loop of the statistics collector 
process.
 
-
- RecoveryWalAll
- Waiting for WAL from a stream at recovery.
-
 
  RecoveryWalStream
- 
-  Waiting when WAL data is not available from any kind of sources
-  (local, archive or stream) before trying again to retrieve WAL data,
-  at recovery.
- 
+ Waiting for WAL from a stream at recovery.
 
 
  SysLoggerMain
@@ -1496,7 +1488,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting for confirmation from remote server during synchronous 
replication.
 
 
- Timeout
+ Timeout
  BaseBackupThrottle
  Waiting during base backup when throttling activity.
 
@@ -1508,6 +1500,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  RecoveryApplyDelay
  Waiting to apply WAL at recovery because it is delayed.
 
+
+ RecoveryRetrieveRetryInterval
+ 
+  Waiting when WAL data is not available from any kind of sources
+  (pg_wal, archive or stream) before trying
+  again to retrieve WAL data, at recovery.
+ 
+
 
  IO
  BufFileRead
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index de2d4ee582..793c076da6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12031,7 +12031,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,

 WL_LATCH_SET | WL_TIMEOUT |

 WL_EXIT_ON_PM_DEATH,
 

Re: Online checksums verification in the backend

2020-03-18 Thread Julien Rouhaud
On Wed, Mar 18, 2020 at 07:06:19AM +0100, Julien Rouhaud wrote:
> On Wed, Mar 18, 2020 at 01:20:47PM +0900, Michael Paquier wrote:
> > On Mon, Mar 16, 2020 at 09:21:22AM +0100, Julien Rouhaud wrote:
> > > On Mon, Mar 16, 2020 at 12:29:28PM +0900, Michael Paquier wrote:
> > >> With a large amount of
> > >> shared buffer eviction you actually increase the risk of torn page
> > >> reads.  Instead of a logic relying on partition mapping locks, which
> > >> could be unwise on performance grounds, did you consider different
> > >> approaches?  For example a kind of pre-emptive lock on the page in
> > >> storage to prevent any shared buffer operation to happen while the
> > >> block is read from storage, that would act like a barrier.
> > >
> > > Even with a workload having a large shared_buffers eviction pattern, I 
> > > don't
> > > think that there's a high probability of hitting a torn page.  Unless I'm
> > > mistaken it can only happen if all those steps happen concurrently to 
> > > doing the
> > > block read just after releasing the LWLock:
> > >
> > > - postgres read the same block in shared_buffers (including all the 
> > > locking)
> > > - dirties it
> > > - writes part of the page
> > >
> > > It's certainly possible, but it seems so unlikely that the optimistic 
> > > lock-less
> > > approach seems like a very good tradeoff.
> >
> > Having false reports in this area could be very confusing for the
> > user.  That's for example possible now with checksum verification and
> > base backups.
>
>
> I agree; however, this shouldn't be the case here, as in case of a possible
> false positive the block will be rechecked a second time, while holding the
> proper lock, before being reported as corrupted.  So the only downside is
> checking twice a corrupted block that's not found in shared buffers (or is
> concurrently loaded/modified/half flushed).  As the number of such blocks
> should usually be close to zero, it seems worthwhile to have a lockless
> check first for performance reasons.
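
(In rough pseudo-C, the optimistic strategy described above looks like this;
the function names are stand-ins, not the patch's actual code:)

/* First pass: read and verify the block without any interlock. */
if (!check_one_block(rel, forknum, blkno))
{
    /* Possible torn read: recheck while blocking concurrent buffer loads. */
    LWLockAcquire(partition_lock, LW_SHARED);
    corrupted = !check_one_block(rel, forknum, blkno);
    LWLockRelease(partition_lock);

    if (corrupted)
        report_corrupt_block(rel, forknum, blkno);
}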


I just noticed some dumb mistakes while adding the new GUCs.  v5 attached to
fix that, no other changes.
>From 6bea507c5dbd7bff862c75b93fc023f0f33aba99 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 4 Nov 2019 08:40:23 +0100
Subject: [PATCH v5] Add a pg_check_relation() function.

This function checks the validity of the checksums for all non-dirty blocks of
a given relation, and optionally a given fork, and returns the list of all
blocks that don't match, along with the expected and found checksums.

Author: Julien Rouhaud
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: 
https://postgr.es/m/CAOBaU_aVvMjQn%3Dge5qPiJOPMmOj5%3Dii3st5Q0Y%2BWuLML5sR17w%40mail.gmail.com
---
 doc/src/sgml/func.sgml|  44 ++
 src/backend/catalog/system_views.sql  |   7 +
 src/backend/storage/page/checksum.c   | 427 +-
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/checksumfuncs.c |  96 
 src/backend/utils/init/globals.c  |   8 +
 src/backend/utils/misc/guc.c  |  43 ++
 src/include/catalog/pg_proc.dat   |   8 +
 src/include/miscadmin.h   |   8 +
 src/include/storage/checksum.h|   7 +
 src/include/utils/guc_tables.h|   1 +
 src/test/Makefile |   3 +-
 src/test/check_relation/.gitignore|   2 +
 src/test/check_relation/Makefile  |  23 +
 src/test/check_relation/README|  23 +
 .../check_relation/t/01_checksums_check.pl| 276 +++
 16 files changed, 973 insertions(+), 4 deletions(-)
 create mode 100644 src/backend/utils/adt/checksumfuncs.c
 create mode 100644 src/test/check_relation/.gitignore
 create mode 100644 src/test/check_relation/Makefile
 create mode 100644 src/test/check_relation/README
 create mode 100644 src/test/check_relation/t/01_checksums_check.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fc4d7f0f78..4baa2bb5e9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21819,6 +21819,50 @@ SELECT (pg_stat_file('filename')).modification;
 
   
 
+  
+   Data Sanity Functions
+
+   
+The functions shown in 
+provide means to check for health of data file in a cluster.
+   
+
+   
+Data Sanity Functions
+
+ 
+  Name Return Type 
Description
+  
+ 
+
+ 
+  
+   
+pg_check_relation(relation 
oid, fork 
text)
+   
+   setof record
+   Validate the checksums for all blocks of a given relation, and
+   optionally the given fork.
+  
+ 
+
+   
+
+   
+pg_check_relation
+   
+   
+pg_check_relation iterates over all the blocks of all
+forks, or of the specified fork, of a given relation and verifies their
+checksums.  It returns the list of blocks for which the found checksum
+doesn't match the expected one.

Re: adding partitioned tables to publications

2020-03-18 Thread Peter Eisentraut

On 2020-03-18 04:06, Amit Langote wrote:

+   if (isnull || !remote_is_publishable)
+   ereport(ERROR,
+   (errmsg("table \"%s.%s\" on the publisher is not publishable",
+   nspname, relname)));

Maybe add a one-line comment above this to say it's a "not supposed
to happen" error, or am I missing something?  Wouldn't elog() suffice
for this?


On second thought, maybe we should just drop this check.  The list of 
tables that is part of the publication was already filtered by the 
publisher, so this query doesn't need to check it again.  We just need 
the relkind to be able to construct the COPY command, but we don't need 
to second-guess it beyond that.
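
(Something like the following would be enough on the subscriber side;
illustrative only, not the exact query used in tablesync.c:)

SELECT c.relkind
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'public' AND c.relname = 'tab';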


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Add schema and table names to partition error

2020-03-18 Thread Amit Kapila
On Thu, Mar 12, 2020 at 7:46 PM Amit Kapila  wrote:
>
> On Wed, Mar 11, 2020 at 8:51 PM Chris Bandy  wrote:
> >
> > On 3/11/20 6:29 AM, Amit Kapila wrote:
> > >
> > > I have tried with git am as well, but it failed.  I am not sure what
> > > is the reason.  Can you please once check at your end?
> >
> > Yes, sorry. This set (and v3 and v4) should work with -p0. Any following
> > patches from me will use the normal -p1.
> >
>
> Okay.
>

I again tried the latest patch v5 both with -p1 and -p0, but it gives
an error while applying the patch.  Can you send a patch that we can
apply with patch -p1 or git-am?

[1] - 
https://www.postgresql.org/message-id/0731def8-978e-0285-04ee-582762729b38%40gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Multivariate MCV list vs. statistics target

2020-03-18 Thread Tomas Vondra

Hi,

On Wed, Mar 18, 2020 at 04:28:34AM +, Shinoda, Noriyoshi (PN Japan A&PS 
Delivery) wrote:

Hello,

I found a missing column description in the pg_statistic_ext catalog
documentation for this new feature.
The attached patch adds a description of the 'stxstattarget' column to the
pg_statistic_ext catalog documentation.
If there is a better explanation, please correct it.



Thanks for the report. Yes, this is clearly an omission. I think it
would be better to use wording similar to attstattarget, per the
attached patch. That is, without reference to ALTER STATISTICS and
with a better explanation of what positive/negative values do. Do you agree?


regards


Regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
Sent: Wednesday, September 11, 2019 7:28 AM
To: Alvaro Herrera 
Cc: Thomas Munro ; Jamison, Kirk ; Dean 
Rasheed ; PostgreSQL Hackers 
Subject: Re: Multivariate MCV list vs. statistics target

On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:

On 2019-Aug-01, Tomas Vondra wrote:


I'll move it to the next CF. Aside from the issues pointed out by
Kyotaro-san in his review, I still haven't made up my mind about whether
to base this on the statistics targets set for the attributes. That's
what we're doing now, but I'm not sure it's a good idea after adding a separate
statistics target.
I wonder what Dean's opinion on this is, as he added the current logic.


Latest patch no longer applies.  Please update.  And since you already
seem to have handled all review comments since it was Ready for
Committer, and you now know Dean's opinion on the remaining question,
please get it pushed.



OK, I've pushed the patch, after some minor polishing.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services







--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c6f95fa688..64614b569c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6452,6 +6452,22 @@ SCRAM-SHA-256$:&l
   Owner of the statistics object
  
 
+ 
+  stxstattarget
+  int4
+  
+  
+   stxstattarget controls the level of detail
+   of statistics accumulated for this statistics object by
+   .
+   A zero value indicates that no statistics should be collected.
+   A negative value says to use the system default statistics target.
+   Positive values of stxstattarget
+   determine the target number of most common values
+   to collect.
+  
+ 
+
  
   stxkeys
   int2vector


Re: JDBC prepared insert and X00 and SQL_ASCII

2020-03-18 Thread gmail Vladimir Koković

Hi,


After a thorough Java-Swig-libpq test, I can confirm that INSERT/SELECT 
is working properly:

1. server_encoding: SQL_ASCII
2. client_encoding: SQL_ASCII
3. INSERT / SELECT java string with x00


libpq, psql - everything is OK !


Vladimir Kokovic, DP senior (69)
Serbia, Belgrade, March 18, 2020

On 16.3.20. 17:04, gmail Vladimir Koković wrote:


Hi,

I don't know if this is a bug or the intended behavior,
but since ODBC works and JDBC does not, I would ask why a JDBC prepared
insert does not work when an ODBC prepared insert does,
in case some varchar field contains 0x00 and the DB is SQL_ASCII?





Re: Auxiliary Processes and MyAuxProc

2020-03-18 Thread Mike Palmiotto
On Tue, Mar 17, 2020 at 9:04 PM Alvaro Herrera  wrote:
>
> On 2020-Mar-17, Justin Pryzby wrote:
>
> > +static PgSubprocess process_types[] = {
> > + {
> > + .desc = "checker",
> > + .entrypoint = CheckerModeMain
> > + },
> > + {
> > + .desc = "bootstrap",
> > + .entrypoint = BootstrapModeMain
> > + },
>
> Maybe this stuff can be resolved using a technique like rmgrlist.h or
> cmdtaglist.h.  That way it's not in two separate files.

Great suggestion, thanks! I'll try this out in the next version.
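
(For readers unfamiliar with it, the rmgrlist.h technique boils down to one
list file expanded with a caller-defined macro; the file and macro names
below are made up:)

/* subprocesslist.h: one line per subprocess type */
PG_SUBPROCESS("checker", CheckerModeMain)
PG_SUBPROCESS("bootstrap", BootstrapModeMain)

/* consumer site: expand the list into the array */
#define PG_SUBPROCESS(desc_, entrypoint_) {.desc = desc_, .entrypoint = entrypoint_},
static PgSubprocess process_types[] = {
#include "subprocesslist.h"
};
#undef PG_SUBPROCESS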

-- 
Mike Palmiotto
https://crunchydata.com




RE: Multivariate MCV list vs. statistics target

2020-03-18 Thread Shinoda, Noriyoshi (PN Japan A&PS Delivery)
Hi, 

>Thanks for the report. Yes, this is clearly an omission. I think it would be
>better to use wording similar to attstattarget, per the attached patch. That
>is, without reference to ALTER STATISTICS and with a better explanation of
>what positive/negative values do. Do you agree?

Thank you for your comment.
I agree with the text you suggested.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] 
Sent: Wednesday, March 18, 2020 9:36 PM
To: Shinoda, Noriyoshi (PN Japan A&PS Delivery) 
Cc: Alvaro Herrera ; Thomas Munro 
; Jamison, Kirk ; Dean 
Rasheed ; PostgreSQL Hackers 

Subject: Re: Multivariate MCV list vs. statistics target

Hi,

On Wed, Mar 18, 2020 at 04:28:34AM +, Shinoda, Noriyoshi (PN Japan A&PS 
Delivery) wrote:
>Hello,
>
>I found a missing column description in the pg_statistic_ext catalog
>documentation for this new feature.
>The attached patch adds a description of the 'stxstattarget' column to the
>pg_statistic_ext catalog documentation.
>If there is a better explanation, please correct it.
>

Thanks for the report. Yes, this is clearly an omission. I think it would be
better to use wording similar to attstattarget, per the attached patch. That
is, without reference to ALTER STATISTICS and with a better explanation of
what positive/negative values do. Do you agree?


regards

>Regards,
>Noriyoshi Shinoda
>
>-Original Message-
>From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
>Sent: Wednesday, September 11, 2019 7:28 AM
>To: Alvaro Herrera 
>Cc: Thomas Munro ; Jamison, Kirk 
>; Dean Rasheed ; 
>PostgreSQL Hackers 
>Subject: Re: Multivariate MCV list vs. statistics target
>
>On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:
>>On 2019-Aug-01, Tomas Vondra wrote:
>>
>>> I'll move it to the next CF. Aside from the issues pointed out by
>>> Kyotaro-san in his review, I still haven't made up my mind about
>>> whether to base this on the statistics targets set for the attributes.
>>> That's what we're doing now, but I'm not sure it's a good idea after adding
>>> a separate statistics target.
>>> I wonder what Dean's opinion on this is, as he added the current logic.
>>
>>Latest patch no longer applies.  Please update.  And since you already 
>>seem to have handled all review comments since it was Ready for 
>>Committer, and you now know Dean's opinion on the remaining question, 
>>please get it pushed.
>>
>
>OK, I've pushed the patch, after some minor polishing.
>
>
>regards
>
>-- 
>Tomas Vondra  http://www.2ndQuadrant.com 
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>



-- 
Tomas Vondra  http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: RecoveryWalAll and RecoveryWalStream wait events

2020-03-18 Thread Atsushi Torikoshi
On Wed, Mar 18, 2020 at 6:59 PM Fujii Masao wrote:

>
> I meant the following part in the doc.
>
> -
> At startup, the standby begins by restoring all WAL available in the
> archive
> location, calling restore_command. Once it reaches the end of WAL available
> there and restore_command fails, it tries to restore any WAL available in
> the
> pg_wal directory. If that fails, and streaming replication has been
> configured,
> the standby tries to connect to the primary server and start streaming WAL
> from
> the last valid record found in archive or pg_wal. If that fails or
> streaming
> replication is not configured, or if the connection is later disconnected,
> the standby goes back to step 1 and tries to restore the file from the
> archive
> again. This loop of retries from the archive, pg_wal, and via streaming
> replication goes on until the server is stopped or failover is triggered
> by a
> trigger file.
> -
>
>
Thanks!


> > It seems the comment on WaitForWALToBecomeAvailable()
> > does not agree with high-availability.sgml; do we need to
> > modify the comment on the function?
>
> No, I don't think so for now. But would you like to improve the docs?
>

I'll do it.


> > But, anyway, you think that "pg_wal" should be used instead of "local" here?
> >
> >
> > I don't have a strong opinion here.
> > It might be better, because high-availability.sgml uses "pg_wal" rather
> > than "local" in its explanation, but I also feel it's obvious in this
> > context.
>
> Ok, I changed that from "local" to "pg_wal" in the patch for
> the master. Attached is the updated version of the patch.
> If you're OK with this, I'd like to commit two patches that I posted
> in this thread.


Thanks for your modification; it looks good to me.

Regards,

--
Torikoshi Atsushi


Re: SupportRequestSimplify and SQL SRF

2020-03-18 Thread Tom Lane
Ronan Dunklau  writes:
> What I want to do is to evaluate whether id = t1_id AND somecolumn is NOT
> NULL at planification time, and replace the function by another one if this
> can be pruned altogether.

Hm.  There was never really any expectation that support functions
would be attached to PL functions --- since you have to write the
former in C, it seems a little odd for the supported function not
to also be C.  Perhaps more to the point though, what simplification
knowledge is this support function bringing to bear that the planner
hasn't already got?  It kinda feels like you are trying to solve
this in the wrong place.

> So, what I've been doing is to implement a support function for
> SupportRequestSimplify, and If the predicate doesn't match any row, replace
> the FuncExpr by a new one, calling a different function.

I'm confused.  I don't see any SupportRequestSimplify call at all in the
code path for set-returning functions.  Maybe there should be one,
but there is not.

> This seems to work great, but I have several questions:

> 1) Is it valid to make SPI calls in a support function to do this kind of
> simplification ?

Hmm, a bit scary maybe but we don't hesitate to const-simplify
functions that could contain SPI calls, so I don't see a big
problem in that aspect.  I'd be more worried, if you're executing
some random SQL that way, about whether the SQL reliably does what
you want (in the face of variable search_path and the like).

> 2) My new FuncExpr doesn't get inlined. This is because in
> inline_set_returning_function, we check that after the call to
> eval_const_expressions we still call the same function.

Uh, what?  I didn't check the back branches, but I see nothing
remotely like that in HEAD.

regards, tom lane




Re: Auxiliary Processes and MyAuxProc

2020-03-18 Thread Justin Pryzby
On Wed, Mar 18, 2020 at 09:22:58AM -0400, Mike Palmiotto wrote:
> On Tue, Mar 17, 2020 at 9:04 PM Alvaro Herrera  
> wrote:
> >
> > On 2020-Mar-17, Justin Pryzby wrote:
> >
> > > +static PgSubprocess process_types[] = {
> > > + {
> > > + .desc = "checker",
> > > + .entrypoint = CheckerModeMain
> > > + },
> > > + {
> > > + .desc = "bootstrap",
> > > + .entrypoint = BootstrapModeMain
> > > + },
> >
> > Maybe this stuff can be resolved using a technique like rmgrlist.h or
> > cmdtaglist.h.  That way it's not in two separate files.
> 
> Great suggestion, thanks! I'll try this out in the next version.

Also, I saw this was failing tests both before and after my rebase.

http://cfbot.cputube.org/
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446

-- 
Justin




Re: adding partitioned tables to publications

2020-03-18 Thread Amit Langote
On Wed, Mar 18, 2020 at 8:16 PM Peter Eisentraut wrote:
> On 2020-03-18 04:06, Amit Langote wrote:
> > +   if (isnull || !remote_is_publishable)
> > +   ereport(ERROR,
> > +   (errmsg("table \"%s.%s\" on the publisher is not 
> > publishable",
> > +   nspname, relname)));
> >
> > Maybe add a one-line comment above this to say it's a "not supposed
> > to happen" error, or am I missing something?  Wouldn't elog() suffice
> > for this?
>
> On second thought, maybe we should just drop this check.  The list of
> tables that is part of the publication was already filtered by the
> publisher, so this query doesn't need to check it again.  We just need
> the relkind to be able to construct the COPY command, but we don't need
> to second-guess it beyond that.

Agreed.

-- 
Thank you,
Amit




Re: SupportRequestSimplify and SQL SRF

2020-03-18 Thread Ronan Dunklau
> Hm.  There was never really any expectation that support functions
> would be attached to PL functions --- since you have to write the
> former in C, it seems a little odd for the supported function not
> to also be C.  Perhaps more to the point though, what simplification
> knowledge is this support function bringing to bear that the planner
> hasn't already got?  It kinda feels like you are trying to solve
> this in the wrong place.
>

Some optimizations aren't done by the planner, and could be added easily
that way.

For example, the following query would have wrong estimates if the planner
can't inject inferred values:

SELECT t2.*
FROM t1 JOIN t2 ON t1.id = t2.t1_id
WHERE t1.code = ? AND t1.col1 IS NOT NULL
UNION
SELECT t3.*
FROM t1 JOIN t3 ON t1.id = t3.t1_id
WHERE t1.code = ? AND t1.col1 IS NULL

At any given time, only one of those branches will be evaluated. I can either
write a PL function, which forces me to abandon all the benefits of
inlining with regard to cost estimation, or keep it in SQL and fall back
on a generic plan, which will estimate an average number of rows for both
cases.
With support functions, I was hoping to replace a function containing the
above query with another one depending on the matched t1 record: if col1 is
NULL, then query t2 directly, else query t3 directly. By injecting the value
directly when we know we have only one row (unique constraint on t1.code)
we can optimize the whole thing away, and have sensible estimates based on
the statistics of t1_id. But of course, I need to be able to use SPI calls
to inject the value...

I'm not yet convinced it is a good idea either, but it is one I wanted to
experiment with.
In the more generic case, the planner could possibly perform those kinds of
optimizations if it were able to identify JOINs between one unique row and
other relations. If we were to work on a patch like this, would it be
something that could be of interest, perhaps hidden behind a GUC?


> I'm confused.  I don't see any SupportRequestSimplify call at all in the
> code path for set-returning functions.  Maybe there should be one,
> but there is not.
>

Sorry, I should have checked on HEAD; I was working on REL_12_STABLE.
This simplification was done in eval_const_expressions, which in turn ended
up calling simplify_function.
I have not looked at the code thoroughly on HEAD, but a quick test shows
that it now does what I want and presumably simplifies it earlier.


> > 1) Is it valid to make SPI calls in a support function to do this kind of
> > simplification ?
>
> Hmm, a bit scary maybe but we don't hesitate to const-simplify
> functions that could contain SPI calls, so I don't see a big
> problem in that aspect.  I'd be more worried, if you're executing
> some random SQL that way, about whether the SQL reliably does what
> you want (in the face of variable search_path and the like).
>

Ok, I need to triple-check that, but that was my main worry.


>
> > 2) My new FuncExpr doesn't get inlined. This is because in
> > inline_set_returning_function, we check that after the call to
> > eval_const_expressions we still call the same function.
>
> Uh, what?  I didn't check the back branches, but I see nothing
> remotely like that in HEAD.
>

Sorry again, I should have checked HEAD. The code is different on HEAD, and
works as expected: the replacement SRF ends up being inlined.
Again, thank you for your answer.

Best regards,

-- 






Thinko in index_concurrently_swap comment

2020-03-18 Thread Julien Rouhaud
Hi,

While looking at RIC for the collation versioning issue Michael raised earlier,
I found a thinko in a nearby comment, apparently introduced in the original
REINDEX CONCURRENTLY patch (5dc92b8).  Trivial patch attached.
>From 161ea86380a50caeee8de81fe82d6eb106a2fd39 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 18 Mar 2020 15:16:35 +0100
Subject: [PATCH] Fix comment in index_concurrently_swap()

---
 src/backend/catalog/index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 76fd938ce3..023ec7e618 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1531,7 +1531,7 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, 
const char *oldName)
newIndexForm->indisclustered = oldIndexForm->indisclustered;
 
/*
-* Mark the old index as valid, and the new index as invalid similarly
+* Mark the new index as valid, and the old index as invalid similarly
 * to what index_set_state_flags() does.
 */
newIndexForm->indisvalid = true;
-- 
2.20.1



Re: Adding missing object access hook invocations

2020-03-18 Thread Mark Dilger


> On Mar 17, 2020, at 9:33 PM, Michael Paquier  wrote:
> 
> On Tue, Mar 17, 2020 at 12:39:35PM -0700, Mark Dilger wrote:
>> I agree that this does not need to be back-patched.  I was debating
>> whether it constitutes a bug for the purpose of putting the fix into
>> v13 vs. punting the patch forward to the v14 cycle.  I don't have a
>> strong opinion on that.
> 
> I don't see any strong argument against fixing this stuff in v13,
> FWIW.

Here is the latest patch.  I'll go add this thread to the commitfest app now.



v2-0001-Adding-missing-Object-Access-hook-invocations.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: BEFORE ROW triggers for partitioned tables

2020-03-18 Thread Ashutosh Bapat
I was expecting that the documentation somewhere covered the fact that BR
triggers are not supported on a partitioned table, and that this patch
would remove/improve that portion. But I don't see any
documentation changes in this patch.

On Tue, Mar 17, 2020 at 10:11 PM Ashutosh Bapat
 wrote:
>
>
>
> On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera  wrote:
>>
>> On 2020-Mar-11, Ashutosh Bapat wrote:
>>
>> > On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
>> >  wrote:
>>
>> > > * The new function I added, ReportTriggerPartkeyChange(), contains one
>> > > serious bug (namely: it doesn't map attribute numbers properly if
>> > > partitions are differently defined).
>> >
>> > IIUC the code in your patch, it seems you are just looking at
>> > partnatts. But partition key can contain expressions also which are
>> > stored in partexprs. So, I think the code won't catch change in the
>> > partition key values when it contains expression. Using
>> > RelationGetPartitionQual() will fix this problem and also problem of
>> > attribute remapping across the partition hierarchy.
>>
>> Oh, of course.
>>
>> In fact, I don't need to deal with PartitionQual directly; I can just
>> use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already
>> have all we need.  v2 attached.
>
>
> Thanks.
>
>>
>>  insert into parted values (1, 1, 'uno uno v2');-- fail
>>  ERROR:  moving row to another partition during a BEFORE trigger is not 
>> supported
>>  DETAIL:  Before trigger "t", row was to be in partition "public.parted_1_1"
>>
>> Note that in this implementation I no longer know which column is the
>> problematic one, but I suppose users have clue enough.  Wording
>> suggestions welcome.
>
>
> When we have an expression as a partition key, there may not be one particular
> column which causes the row to move out of the partition. So this should be fine.
> A slight wording suggestion below.
>
> - * Complain if we find an unexpected trigger type.
> - */
> - if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
> - elog(ERROR, "unexpected trigger \"%s\" found",
> - NameStr(trigForm->tgname));
>
> !AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF triggers
> as well?
> - */
> - if (stmt->timing != TRIGGER_TYPE_AFTER)
>
> Same comment as the above?
>
> + /*
> + * After a tuple in a partition goes through a trigger, the user
> + * could have changed the partition key enough that the tuple
> + * no longer fits the partition.  Verify that.
> + */
> + if (trigger->tgisclone &&
>
> Why do we want to restrict this check only for triggers which are cloned from
> the ancestors?
>
> + !ExecPartitionCheck(relinfo, slot, estate, false))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("moving row to another partition during a BEFORE trigger is not 
> supported"),
> + errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",
>
> In the error message you removed above, we are mentioning BEFORE FOR EACH ROW
> trigger. Should we continue to use the same terminology?
>
> I was wondering whether it would be good to check the partition constraint
> only once, i.e. after all before-row triggers have been executed. This would
> avoid throwing an error in case multiple triggers together worked to keep the
> tuple in the same partition when individual trigger(s) caused it to move out
> of that partition. But then we would lose the opportunity to point out the
> before-row trigger which actually caused the row to move out of the
> partition. Anyway, I wanted to bring that up for discussion here.
>
> @@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
>   *
>   * The purpose of this function is to ensure that we get the same
>   * PartitionDesc for each relation every time we look it up.  In the
> - * face of current DDL, different PartitionDescs may be constructed with
> + * face of concurrent DDL, different PartitionDescs may be constructed with
>
> Thanks for catching it. Looks unrelated though.
>
> +-- Before triggers and partitions
>
> The test looks good. Should we add a test for partitioned table with partition
> key as expression?
>
> The approach looks good to me.
>
> --
> Best Wishes,
> Ashutosh



-- 
Best Wishes,
Ashutosh Bapat




Re: Auxiliary Processes and MyAuxProc

2020-03-18 Thread Mike Palmiotto
On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby  wrote:
>
> On Wed, Mar 18, 2020 at 09:22:58AM -0400, Mike Palmiotto wrote:
> > On Tue, Mar 17, 2020 at 9:04 PM Alvaro Herrera  
> > wrote:
> > >
> > > On 2020-Mar-17, Justin Pryzby wrote:
> > >
> > > > +static PgSubprocess process_types[] = {
> > > > + {
> > > > + .desc = "checker",
> > > > + .entrypoint = CheckerModeMain
> > > > + },
> > > > + {
> > > > + .desc = "bootstrap",
> > > > + .entrypoint = BootstrapModeMain
> > > > + },
> > >
> > > Maybe this stuff can be resolved using a technique like rmgrlist.h or
> > > cmdtaglist.h.  That way it's not in two separate files.
> >
> > Great suggestion, thanks! I'll try this out in the next version.
>
> Also, I saw this was failing tests both before and after my rebase.
>
> http://cfbot.cputube.org/
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446

Good catch, thanks. Will address this as well in the next round. Just
need to set up a Windows dev environment to see if I can
reproduce/fix.

-- 
Mike Palmiotto
https://crunchydata.com




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Magnus Hagander
On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao  wrote:
>
>
>
> On 2020/03/11 3:39, Magnus Hagander wrote:
> > On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/10 22:43, Amit Langote wrote:
> >>> On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  
> >>> wrote:
> > So, I will make the patch adding support for --no-estimate-size option
> > in pg_basebackup.
> 
>  Patch attached.
> >>>
> >>> Like the idea and the patch looks mostly good.
> >>
> >> Thanks for reviewing the patch!
> >>
> >>> +  total size. If the estimation is disabled in
> >>> +  pg_basebackup
> >>> +  (i.e., --no-estimate-size option is specified),
> >>> +  this is always 0.
> >>>
> >>> "always" seems unnecessary.
> >>
> >> Fixed.
> >>
> >>> +This option prevents the server from estimating the total
> >>> +amount of backup data that will be streamed. In other words,
> >>> +backup_total column in the
> >>> +pg_stat_progress_basebackup
> >>> +view always indicates 0 if this option is 
> >>> enabled.
> >>>
> >>> Here too.
> >>
> >> Fixed.
> >>
> >> Attached is the updated version of the patch.
> >
> > Would it perhaps be better to return NULL instead of 0 in the
> > statistics view if there is no data?

Did you miss this comment, or not agree? :)


> > Also, should it really  be the server version that decides how this
> > feature behaves, and not the pg_basebackup version? Given that the
> > implementation is entirely in the client, it seems that's more
> > logical?
>
> Yeah, you're right. I changed the patch that way.
> Attached is the updated version of the patch.

The other changes in it look good!

//Magnus




potential stuck lock in SaveSlotToPath()

2020-03-18 Thread Peter Eisentraut
When SaveSlotToPath() is called with elevel=LOG, the early exits don't 
release the slot's io_in_progress_lock.  Fix attached.


This could result in a walsender being stuck on the lock forever.  A 
possible way to get into this situation is if the offending code paths 
are triggered in a low disk space situation.  (This is how it was found; 
maybe there are other ways.)


Pavan Deolasee and Craig Ringer worked on this issue.  I'm forwarding it 
on their behalf.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From b33a856c68e1c48c8663a760f79b2b3d3bf0088c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 18 Mar 2020 16:24:59 +0100
Subject: [PATCH] Drop slot's LWLock before returning from SaveSlotToPath()

When SaveSlotToPath() is called with elevel=LOG, the early exits didn't
release the slot's io_in_progress_lock.

This could result in a walsender being stuck on the lock forever.  A
possible way to get into this situation is if the offending code paths
are triggered in a low disk space situation.

Author: Pavan Deolasee 
Reported-by: Craig Ringer 
---
 src/backend/replication/slot.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1cec53d748..1bff2d6185 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1256,6 +1256,13 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, 
int elevel)
fd = OpenTransientFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | 
PG_BINARY);
if (fd < 0)
{
+   /*
+* If not an ERROR, then release the lock before returning control
+* back to the caller. In case of an ERROR, the error recovery path
+* should automatically release the lock, but no harm in explicitly
+* releasing it even in that case.
+*/
+   LWLockRelease(&slot->io_in_progress_lock);
ereport(elevel,
(errcode_for_file_access(),
 errmsg("could not create file \"%s\": %m",
@@ -1287,6 +1294,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, 
int elevel)
 
pgstat_report_wait_end();
CloseTransientFile(fd);
+   LWLockRelease(&slot->io_in_progress_lock);
 
/* if write didn't set errno, assume problem is no disk space */
errno = save_errno ? save_errno : ENOSPC;
@@ -1306,6 +1314,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, 
int elevel)
 
pgstat_report_wait_end();
CloseTransientFile(fd);
+   LWLockRelease(&slot->io_in_progress_lock);
errno = save_errno;
ereport(elevel,
(errcode_for_file_access(),
@@ -1317,6 +1326,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, 
int elevel)
 
if (CloseTransientFile(fd) != 0)
{
+   LWLockRelease(&slot->io_in_progress_lock);
ereport(elevel,
(errcode_for_file_access(),
 errmsg("could not close file \"%s\": %m",
@@ -1327,6 +1337,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, 
int elevel)
/* rename to permanent file, fsync file and directory */
if (rename(tmppath, path) != 0)
{
+   LWLockRelease(&slot->io_in_progress_lock);
ereport(elevel,
(errcode_for_file_access(),
 errmsg("could not rename file \"%s\" to 
\"%s\": %m",
-- 
2.25.0



Re: Multivariate MCV list vs. statistics target

2020-03-18 Thread Tomas Vondra

On Wed, Mar 18, 2020 at 01:32:19PM +, Shinoda, Noriyoshi (PN Japan
A&PS Delivery) wrote:

Hi,


Thanks for the report. Yes, this is clearly an omission. I think it
would be better to use wording similar to attstattarget, per the
attached patch. That is, without reference to ALTER STATISTICS and
with a better explanation of what positive/negative values do. Do you
agree?


Thank you for your comment.  I agree with the text you suggested.



Thank you for the report, I've pushed the reworded fix.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-18 Thread Tomas Vondra

On Tue, Mar 17, 2020 at 04:37:06PM +0100, Tomas Vondra wrote:

On Tue, Mar 17, 2020 at 12:42:52PM +, Dean Rasheed wrote:

On Sat, 14 Mar 2020 at 18:45, Tomas Vondra  wrote:


I realized there's one more thing that probably needs discussing.
Essentially, these two clause types are the same:

  a IN (1, 2, 3)

  (a = 1 OR a = 2 OR a = 3)

but with 8f321bd1 we only recognize the first one as compatible with
functional dependencies. It was always the case that we estimated those
two clauses a bit differently, but the differences were usually small.
But now that we recognize IN as compatible with dependencies, the
difference may be much larger, which bugs me a bit ...

So I wonder if we should recognize the special form of an OR clause,
with all Vars referencing to the same attribute etc. and treat this as
supported by functional dependencies - the attached patch does that.
MCV lists there's already no difference because OR clauses are
supported.



Makes sense, and the patch looks straightforward enough.


The question is whether we want to do this, and whether we should also
teach the per-column estimates to recognize this special case of IN
clause.


I'm not convinced about that second part though. I'd say that
recognising the OR clause for functional dependencies is sufficient to
prevent the large differences in estimates relative to the equivalent
IN clauses. The small differences between the way that OR and IN
clauses are handled have always been there, and I think that changing
that is out of scope for this work.



Not sure. I think the inconsistency between plan and extended stats may
be a bit surprising, but I agree that the issue may be negligible.



OK, I've pushed the change recognizing the special case of OR clauses as
supported by functional dependencies. I've left the estimation of the
clause itself as it is; we can address that in the future if needed.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Fujii Masao



On 2020/03/19 0:37, Magnus Hagander wrote:

On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao  wrote:




On 2020/03/11 3:39, Magnus Hagander wrote:

On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  wrote:




On 2020/03/10 22:43, Amit Langote wrote:

On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  wrote:

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.


Patch attached.


Like the idea and the patch looks mostly good.


Thanks for reviewing the patch!


+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is always 0.

"always" seems unnecessary.


Fixed.


+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view always indicates 0 if this option is enabled.

Here too.


Fixed.

Attached is the updated version of the patch.


Would it perhaps be better to return NULL instead of 0 in the
statistics view if there is no data?


Did you miss this comment, or not agree? :)


Oh, I forgot to attach the patch... Patch attached.
This patch needs to be applied after applying
add_no_estimate_size_v3.patch.


Also, should it really  be the server version that decides how this
feature behaves, and not the pg_basebackup version? Given that the
implementation is entirely in the client, it seems that's more
logical?


Yeah, you're right. I changed the patch that way.
Attached is the updated version of the patch.


The other changes in it look good!


Thanks for the review!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 221acbe3ec..cf74e3da45 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4403,7 +4403,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   total size. If the estimation is disabled in
   pg_basebackup
   (i.e., --no-estimate-size option is specified),
-  this is 0.
+  this is NULL.
  
 
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 90638aad0e..c8e040bacf 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -546,7 +546,7 @@ PostgreSQL documentation
 amount of backup data that will be streamed, resulting in the
 backup_total column in the
 pg_stat_progress_basebackup
-to be 0.
+to be NULL.


 Without this option, the backup will start by enumerating
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index b8a3f46912..5a6dc61630 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1070,7 +1070,7 @@ CREATE VIEW pg_stat_progress_basebackup AS
   WHEN 4 THEN 'waiting for wal archiving to finish'
   WHEN 5 THEN 'transferring wal files'
   END AS phase,
-   S.param2 AS backup_total,
+   CASE S.param2 WHEN -1 THEN NULL ELSE S.param2 END AS backup_total,
S.param3 AS backup_streamed,
S.param4 AS tablespaces_total,
S.param5 AS tablespaces_streamed
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 806d013108..a2e28b064c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -123,7 +123,10 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;
 
-/* Total amount of backup data that will be streamed */
+/*
+ * Total amount of backup data that will be streamed.
+ * -1 means that the size is not estimated.
+ */
 static int64 backup_total = 0;
 
 /* Amount of backup data already streamed */
@@ -258,6 +261,18 @@ perform_base_backup(basebackup_options *opt)
backup_streamed = 0;
pgstat_progress_start_command(PROGRESS_COMMAND_BASEBACKUP, InvalidOid);
 
+   /*
+    * If the estimation of the total backup size is disabled, make the
+    * backup_total column in the view return NULL by setting the
+    * parameter to -1.
+    */
+   if (!opt->progress)
+   {
+   backup_total = -1;
+       pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_TOTAL,
+                                    backup_total);
+   }
+
datadirpathlen = strlen(DataDir);
 
backup_started_in_recovery = RecoveryInProgress();
@@ -1842,7 +1857,7 @@ update_basebackup_progress(int64 delta)
 * will always be wrong if WAL is included), but that's better than 
having
 * the done column be bigger than the total.
 */
-   if (backup_total > 0 && bac

Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Tom Lane
So ... there is a definitional question here that doesn't seem to have
been mentioned anywhere in the thread.  For the traditional polymorphic
types, we insist that at least one non-unknown input be supplied, thus
you get

regression=# create function foo(anyelement, anyelement) returns bool
regression-# language sql as 'select $1 = $2';
CREATE FUNCTION

regression=# select foo('a', 'b');
ERROR:  could not determine polymorphic type because input has type unknown

regression=# select foo('a', 'b'::text);
 foo 
-
 f
(1 row)

As this patch stands, the ANYCOMPATIBLE types also require that:

regression=# create function foo2(anycompatible, anycompatible) returns bool
language sql as 'select $1 = $2';
CREATE FUNCTION

regression=# select foo2('a', 'b');
ERROR:  could not determine polymorphic common type because input has type unknown

However, it seems to me that this is inconsistent with the definition,
namely that we resolve the common type the same way select_common_type()
does, because select_common_type() will choose TEXT when given all-unknown
inputs.  So shouldn't we choose TEXT here?
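
For comparison, that's already what happens in ordinary expression
contexts:

regression=# select pg_typeof(case when random() < 0.5 then 'a' else 'b' end);
 pg_typeof 
-----------
 text
(1 row)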

Admittedly, the fact that select_common_type() falls back to TEXT is a
bit of a wart, so maybe we don't want to propagate it here.  But if we
don't, we'll have to document the selection rule as almost but not
quite like what it says in section 10.5.  That seems confusing.

Documentation issues aside, I'm not quite sure whether this behavior
would be more or less preferable in practice than sticking with the
existing behavior.  It seems like it'd be convenient in some cases
but possibly allow mistakes to go undetected in others.

Thoughts?

regards, tom lane




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Magnus Hagander
On Wed, Mar 18, 2020 at 5:14 PM Fujii Masao  wrote:
>
>
>
> On 2020/03/19 0:37, Magnus Hagander wrote:
> > On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/11 3:39, Magnus Hagander wrote:
> >>> On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  
> >>> wrote:
> 
> 
> 
>  On 2020/03/10 22:43, Amit Langote wrote:
> > On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao 
> >  wrote:
> >>> So, I will make the patch adding support for --no-estimate-size option
> >>> in pg_basebackup.
> >>
> >> Patch attached.
> >
> > Like the idea and the patch looks mostly good.
> 
>  Thanks for reviewing the patch!
> 
> > +  total size. If the estimation is disabled in
> > +  pg_basebackup
> > +  (i.e., --no-estimate-size option is 
> > specified),
> > +  this is always 0.
> >
> > "always" seems unnecessary.
> 
>  Fixed.
> 
> > +This option prevents the server from estimating the total
> > +amount of backup data that will be streamed. In other words,
> > +backup_total column in the
> > +pg_stat_progress_basebackup
> > +view always indicates 0 if this option is 
> > enabled.
> >
> > Here too.
> 
>  Fixed.
> 
>  Attached is the updated version of the patch.
> >>>
> >>> Would it perhaps be better to return NULL instead of 0 in the
> >>> statistics view if there is no data?
> >
> > Did you miss this comment, or not agree? :)
>
> Oh, I forgot to attach the patch... Patch attached.
> This patch needs to be applied after applying
> add_no_estimate_size_v3.patch.

:)

Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL
view.  I wonder if it might be worth teaching
pg_stat_get_progress_info() about returning NULL?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING

2020-03-18 Thread Bruce Momjian
On Tue, Mar 17, 2020 at 10:58:54PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I have implemented the ideas above in the attached patch.  I have
> > synchronized the syntax to match SELECT, and synchronized the paragraphs
> > describing the item.
> 
> I think that the DELETE synopsis should look like
> 
> [ USING from_item [, ...] ]
> 
> so that there's not any question which part of the SELECT syntax we're
> talking about.  I also think that the running text in both cases should
> say in exactly these words "from_item means the same thing as it does
> in SELECT"; the wording you propose still seems to be dancing around
> the point, leaving readers perhaps not quite sure about what is meant.
> 
> In the DELETE case you could alternatively say "using_item means the same
> thing as from_item does in SELECT", but that doesn't really seem like an
> improvement to me.

OK, updated patch attached.
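
To illustrate the rule with a made-up example, deleting all but the
lowest-id row per key requires repeating the target under an alias:

DELETE FROM items AS i
  USING items AS j
  WHERE i.key = j.key AND i.id > j.id;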

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml
index df8cea48cf..08fb032b50 100644
--- a/doc/src/sgml/ref/delete.sgml
+++ b/doc/src/sgml/ref/delete.sgml
@@ -23,7 +23,7 @@ PostgreSQL documentation
 
 [ WITH [ RECURSIVE ] with_query [, ...] ]
 DELETE FROM [ ONLY ] table_name [ * ] [ [ AS ] alias ]
-[ USING using_list ]
+[ USING from_item [, ...] ]
 [ WHERE condition | WHERE CURRENT OF cursor_name ]
 [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
 
@@ -117,17 +117,17 @@ DELETE FROM [ ONLY ] table_name [ *

 

-using_list
+from_item
 
  
-  A list of table expressions, allowing columns from other tables
-  to appear in the WHERE condition.  This is similar
-  to the list of tables that can be specified in the  of a
-  SELECT statement; for example, an alias for
-  the table name can be specified.  Do not repeat the target table
-  in the using_list,
-  unless you wish to set up a self-join.
+  A table expression allowing columns from other tables to appear
+  in the WHERE condition.  This uses the same
+  syntax as the 
+  of a SELECT statement; for example, an alias
+  for the table name can be specified.  Do not repeat the target
+  table as a from_item
+  unless you wish to set up a self-join (in which case it must appear
+  with an alias in the from_item).
  
 

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index f58dcd8877..07958e7447 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -27,7 +27,7 @@ UPDATE [ ONLY ] table_name [ * ] [
   ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
   ( column_name [, ...] ) = ( sub-SELECT )
 } [, ...]
-[ FROM from_list ]
+[ FROM from_item [, ...] ]
 [ WHERE condition | WHERE CURRENT OF cursor_name ]
 [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
 
@@ -164,17 +164,17 @@ UPDATE [ ONLY ] table_name [ * ] [

 

-from_list
+from_item
 
  
-  A list of table expressions, allowing columns from other tables
-  to appear in the WHERE condition and the update
-  expressions. This is similar to the list of tables that can be
-  specified in the  of a SELECT
-  statement.  Note that the target table must not appear in the
-  from_list, unless you intend a self-join (in which
-  case it must appear with an alias in the from_list).
+  A table expression allowing columns from other tables to appear in
+  the WHERE condition and update expressions. This
+  uses the same syntax as the  of a SELECT statement;
+  for example, an alias for the table name can be specified.  Do not
+  repeat the target table as a from_item
+  unless you intend a self-join (in which case it must appear with
+  an alias in the from_item).
  
 

@@ -264,7 +264,7 @@ UPDATE count
   
When a FROM clause is present, what essentially happens
is that the target table is joined to the tables mentioned in the
-   from_list, and each output row of the join
+   from_item list, and each output row of the join
represents an update operation for the target table.  When using
FROM you should ensure that the join
produces at most one output row for each row to be modified.  In


Re: Autovacuum on partitioned table (autoanalyze)

2020-03-18 Thread Justin Pryzby
Regarding this patch:

+* the ANALYZE message as it resets the partition's changes_since_analze
=> analyze

+* If the relation is a partitioned table, we must add up children's
childrens'

The approach in general:

I see an issue for timeseries data, where only the most recent partition is
being inserted into, and the histogram endpoint is being continuously extended
(this is our use-case).  The most recent partition will be analyzed pretty
often, and I think it'll be problematic if its parent doesn't get similar
treatment.  Let's say there are 12 historic, monthly children with 1e6 tuples
each, and the 13th child has 2e5 tuples (6 days into the month).  It's analyzed
when it grows by 20% (1.2 days), but at that point the parent has only grown by
12x less (~2%) and won't be analyzed until 12x further into the future (14
days).  Its histogram is 12x longer (geometrically), but the histogram changed
by just as much (arithmetically).  That's an issue for a query over "the last
few days"; if that's past the end of the histogram bound, the query planner
will estimate about ~0 tuples, and tend to give cascades of nested loops.  I'm
biased, but I'm guessing that's too common a use case to answer that the proper
fix is to set the parent's analyze_scale_factor=0.0005.  I think that suggests
that the parent might sometimes need to be analyzed every time any of its
children are.  In other cases (like probably any hash partitioning), that'd be
excessive, and maybe the default settings shouldn't do that, but I think that
behavior ought to be possible, and I think this patch doesn't allow that.  
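
To make the shape of that workload concrete (a minimal, made-up sketch):

CREATE TABLE events (ts timestamptz NOT NULL, payload text) PARTITION BY RANGE (ts);
CREATE TABLE events_2020_03 PARTITION OF events
  FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');
-- one partition per month; only the newest receives inserts, so it hits
-- the analyze threshold ~12x more often than its parent would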

In the past, I think there was talk that maybe someone would invent a clever
way to dynamically combine all the partitions' statistics, so analyzing the
parent wasn't needed.  I think that's easy enough for reltuples, MCV, and I
think histogram, but ISTM that ndistinct is simultaneously important to get
right and hard to do so.  It depends on whether it's the partition key, which
now can be an arbitrary expression.  Extended stats further complicates it,
even if we didn't aim to dynamically compute extended stats for a parent.

While writing this, it occurred to me that we could use "CREATE STATISTICS" as a
way to mark a partitioned table (or certain columns) as needing to be handled
by analyze.  I understand "CREATE STATs" was intended to (eventually) allow
implementing stats on expressions without using "create index" as a hack.  So
if it's excessive to automatically analyze a parent table when any of its
children are analyzed, maybe it's less excessive to only do that for parents
with a stats object, and only on the given columns.  I realize this patch is
a lot less useful if it requires doing anything extra/nondefault, and it's
desirable to work without creating a stats object at all.  Also, using CREATE
STATs would reduce the CPU cost of re-analyzing the entire hierarchy, but
doesn't help to reduce the I/O cost, which is significant.

-- 
Justin




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Pavel Stehule
st 18. 3. 2020 v 17:14 odesílatel Tom Lane  napsal:

> So ... there is a definitional question here that doesn't seem to have
> been mentioned anywhere in the thread.  For the traditional polymorphic
> types, we insist that at least one non-unknown input be supplied, thus
> you get
>
> regression=# create function foo(anyelement, anyelement) returns bool
> regression-# language sql as 'select $1 = $2';
> CREATE FUNCTION
>
> regression=# select foo('a', 'b');
> ERROR:  could not determine polymorphic type because input has type unknown
>
> regression=# select foo('a', 'b'::text);
>  foo
> -
>  f
> (1 row)
>
> As this patch stands, the ANYCOMPATIBLE types also require that:
>
> regression=# create function foo2(anycompatible, anycompatible) returns
> bool
> language sql as 'select $1 = $2';
> CREATE FUNCTION
>
> regression=# select foo2('a', 'b');
> ERROR:  could not determine polymorphic common type because input has type
> unknown
>
> However, it seems to me that this is inconsistent with the definition,
> namely that we resolve the common type the same way select_common_type()
> does, because select_common_type() will choose TEXT when given all-unknown
> inputs.  So shouldn't we choose TEXT here?
>
> Admittedly, the fact that select_common_type() falls back to TEXT is a
> bit of a wart, so maybe we don't want to propagate it here.  But if we
> don't, we'll have to document the selection rule as almost but not
> quite like what it says in section 10.5.  That seems confusing.
>
> Documentation issues aside, I'm not quite sure whether this behavior
> would be more or less preferable in practice than sticking with the
> existing behavior.  It seems like it'd be convenient in some cases
> but possibly allow mistakes to go undetected in others.
>
> Thoughts?
>

It is a difficult question. From what I know, this issue is less common
than we might expect, because almost all functions are called with typed
parameters (columns, variables).

The fallback to text is enticing, but maybe consistency with the other
polymorphic types is better.

Maybe users can implement their own fallback behavior with a custom
function:

create function foo2(text, text) returns bool
language sql as 'select $1 = $2';

Regards

Pavel



> regards, tom lane
>


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Fujii Masao




On 2020/03/19 1:22, Magnus Hagander wrote:

On Wed, Mar 18, 2020 at 5:14 PM Fujii Masao  wrote:




On 2020/03/19 0:37, Magnus Hagander wrote:

On Wed, Mar 11, 2020 at 5:53 AM Fujii Masao  wrote:




On 2020/03/11 3:39, Magnus Hagander wrote:

On Tue, Mar 10, 2020 at 6:19 PM Fujii Masao  wrote:




On 2020/03/10 22:43, Amit Langote wrote:

On Tue, Mar 10, 2020 at 6:09 PM Fujii Masao  wrote:

So, I will make the patch adding support for --no-estimate-size option
in pg_basebackup.


Patch attached.


Like the idea and the patch looks mostly good.


Thanks for reviewing the patch!


+  total size. If the estimation is disabled in
+  pg_basebackup
+  (i.e., --no-estimate-size option is specified),
+  this is always 0.

"always" seems unnecessary.


Fixed.


+This option prevents the server from estimating the total
+amount of backup data that will be streamed. In other words,
+backup_total column in the
+pg_stat_progress_basebackup
+view always indicates 0 if this option is enabled.

Here too.


Fixed.

Attached is the updated version of the patch.


Would it perhaps be better to return NULL instead of 0 in the
statistics view if there is no data?


Did you miss this comment, or not agree? :)


Oh, I forgot to attach the patch... Patch attached.
This patch needs to be applied after applying
add_no_estimate_size_v3.patch.


:)

Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL
view.  I wonder if it might be worth teaching
pg_stat_get_progress_info() about returning NULL?


That's possible by
- adding the boolean array like st_progress_null[PGSTAT_NUM_PROGRESS_PARAM]
   that indicates whether each column is NULL or not, into PgBackendStatus
- extending pgstat_progress_update_param() and 
pgstat_progress_update_multi_param()
   so that they can update the boolean array for NULL
- updating the progress reporting code so that the extended versions of
   function are used

I didn't adopt this idea because it looked like a bit of overkill for the
purpose. OTOH, this would be a good improvement to the progress reporting
infrastructure, and I'm fine with implementing it.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING

2020-03-18 Thread Tom Lane
Bruce Momjian  writes:
> OK, updated patch attached.

LGTM, thanks.

regards, tom lane




Re: type of some table storage params on doc

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Michael Paquier wrote:

> On Mon, Mar 16, 2020 at 11:07:37AM +0900, Atsushi Torikoshi wrote:
> > As far as I read the reloptions.c, autovacuum_vacuum_cost_delay,
> > autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor
> > are the members of relopt_real, so their type seems the same, real.
> 
> In this case, the parsing uses parse_real(), which is exactly the same
> code path as what real GUCs use.
> 
> > But the manual about storage parameters[1] says two of their type
> > are float4 and the other is floating point.
> >
> > I think using float4 on storage parameters[1] are not consistent
> > so far, how about changing these parameters type from float4 to
> > floating point if there are no specific reasons using float4?
> 
> That's a good idea, so I am fine to apply your patch as float4 is a
> data type.  However, let's see first if others have more comments or
> objections.

Hmm.  So unadorned 'floating point' seems to refer to float8; you have
to use float(24) in order to get a float4.  The other standards-mandated
name for float4 seems to be REAL.  (I had a look around but was unable
to figure out whether the standard mandates exact bit widths other than
the precision spec).  Since they're not doubles, what about using REAL
rather than FLOATING POINT?
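
A quick check of the SQL-level names:

postgres=# SELECT pg_typeof(1.5::float(24)), pg_typeof(1.5::float), pg_typeof(1.5::real);
 pg_typeof |    pg_typeof     | pg_typeof 
-----------+------------------+-----------
 real      | double precision | real
(1 row)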

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Auxiliary Processes and MyAuxProc

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Mike Palmiotto wrote:

> On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
>  wrote:
> >
> > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby  wrote:
> > > Also, I saw this was failing tests both before and after my rebase.
> > >
> > > http://cfbot.cputube.org/
> > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> >
> > Good catch, thanks. Will address this as well in the next round. Just
> > need to set up a Windows dev environment to see if I can
> > reproduce/fix.
> 
> While I track this down, here is a rebased patchset, which drops
> MySubprocessType and makes use of the MyBackendType.

Note that you can compile with -DEXEC_BACKEND to use in a *nix build the
same technique used to spawn processes in Windows, which might be an
easier way to discover some problems without a proper Windows build.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Tom Lane
Pavel Stehule  writes:
> st 18. 3. 2020 v 17:14 odesílatel Tom Lane  napsal:
>> However, it seems to me that this is inconsistent with the definition,
>> namely that we resolve the common type the same way select_common_type()
>> does, because select_common_type() will choose TEXT when given all-unknown
>> inputs.  So shouldn't we choose TEXT here?

> It is a difficult question. From what I know, this issue is less common
> than we might expect, because almost all functions are called with typed
> parameters (columns, variables).

True, in actual production queries it's less likely that all the inputs
would be literal constants.  So this is mainly about surprise factor,
or lack of it, for handwritten test queries.

> Maybe users can implement their own fallback behavior with a custom function

> create function foo2(text, text) returns bool
> language sql as 'select $1 = $2';

No, because if you've got that alongside foo2(anycompatible,
anycompatible) then your queries will fail due to both functions
matching anything that's promotable to text.

regards, tom lane




Re: Auxiliary Processes and MyAuxProc

2020-03-18 Thread Mike Palmiotto
On Wed, Mar 18, 2020 at 12:54 PM Alvaro Herrera
 wrote:
>
> On 2020-Mar-18, Mike Palmiotto wrote:
>
> > On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> >  wrote:
> > >
> > > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby  
> > > wrote:
> > > > Also, I saw this was failing tests both before and after my rebase.
> > > >
> > > > http://cfbot.cputube.org/
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> > >
> > > Good catch, thanks. Will address this as well in the next round. Just
> > > need to set up a Windows dev environment to see if I can
> > > reproduce/fix.
> >
> > While I track this down, here is a rebased patchset, which drops
> > MySubprocessType and makes use of the MyBackendType.
>
> Note that you can compile with -DEXEC_BACKEND to use in a *nix build the
> same technique used to spawn processes in Windows, which might be an
> easier way to discover some problems without a proper Windows build.

Good suggestion. Unfortunately, that's how I've been testing all along.
I thought that would be sufficient, but it seems like this might be more
specific to the Windows #ifdefs.

I have another version on my devbox which fixes the (embarrassing)
Travis failure for non-EXEC_BACKEND due to a dropped semicolon during
rebase. I'm setting up my own appveyor instance with Tomas's config
and will see if I can get to the bottom of this.

-- 
Mike Palmiotto
https://crunchydata.com




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Pavel Stehule
st 18. 3. 2020 v 17:54 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > st 18. 3. 2020 v 17:14 odesílatel Tom Lane  napsal:
> >> However, it seems to me that this is inconsistent with the definition,
> >> namely that we resolve the common type the same way select_common_type()
> >> does, because select_common_type() will choose TEXT when given
> all-unknown
> >> inputs.  So shouldn't we choose TEXT here?
>
> > It is a difficult question. From what I know, this issue is less common
> > than we might expect, because almost all functions are called with typed
> > parameters (columns, variables).
>
> True, in actual production queries it's less likely that all the inputs
> would be literal constants.  So this is mainly about surprise factor,
> or lack of it, for handwritten test queries.
>
> > Maybe users can implement their own fallback behavior with a custom function
>
> > create function foo2(text, text) returns bool
> > language sql as 'select $1 = $2';
>
> No, because if you've got that alongside foo2(anycompatible,
> anycompatible) then your queries will fail due to both functions
> matching anything that's promotable to text.
>

It is working for anyelement

postgres=# create or replace function fx(anyelement, anyelement)
postgres-# returns bool as $$ select $1=$2 $$ language sql;
CREATE FUNCTION
postgres=# create or replace function fx(text, text)
returns bool as $$ select $1=$2 $$ language sql;
CREATE FUNCTION
postgres=# select fx(1,2);
┌┐
│ fx │
╞╡
│ f  │
└┘
(1 row)

postgres=# select fx('ahoj','nazdar');
┌┐
│ fx │
╞╡
│ f  │
└┘
(1 row)


> regards, tom lane
>


Re: type of some table storage params on doc

2020-03-18 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Mar-18, Michael Paquier wrote:
>> On Mon, Mar 16, 2020 at 11:07:37AM +0900, Atsushi Torikoshi wrote:
 In this case, the parsing uses parse_real(), which is exactly the same
 code path as what real GUCs use.

> Hmm.  So unadorned 'floating point' seems to refer to float8; you have
> to use float(24) in order to get a float4.  The other standards-mandated
> name for float4 seems to be REAL.  (I had a look around but was unable
> to figure out whether the standard mandates exact bit widths other than
> the precision spec).  Since they're not doubles, what about we use REAL
> rather than FLOATING POINT?

Isn't this whole argument based on a false premise?  What parse_real
returns is double, not float.  Also notice that config.sgml consistently
documents those GUCs as floating point.  (I recall having
recently whacked some GUC descriptions that were randomly out of line
with that.)

regards, tom lane




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Tom Lane
Pavel Stehule  writes:
> st 18. 3. 2020 v 17:54 odesílatel Tom Lane  napsal:
>> No, because if you've got that alongside foo2(anycompatible,
>> anycompatible) then your queries will fail due to both functions
>> matching anything that's promotable to text.

> It is working for anyelement

[ pokes at that... ]  Hm, looks like you're getting away with that
because of the preference for functions taking preferred types.
Seems pretty shaky to me though --- you can probably invent
cases that will throw 'ambiguous function' if you try a bit harder.
In any case, I don't think users will understand why they have to
write two versions of the same function.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Andres Freund
Hi,

On 2020-03-17 21:58:53 -0400, James Coleman wrote:
> On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > I think Andres was thinking this would maybe be an optimization 
> > > > independent of
> > > > is_insert_only (?)
> > >
> > > I wasn't sure.
> >
> > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > will be ok.
> >
> > I was trying to say (in a later email) that I think it might be a good
> > compromise to opportunistically freeze if we're dirtying the page
> > anyway, but not optimize WAL emission etc. That's a pretty simple
> > change, and it'd address a lot of the potential performance regressions,
> > while still freezing for the "first" vacuum in insert only workloads.
> 
> If we have truly insert-only tables, then doesn't vacuuming with
> freezing every tuple actually decrease total vacuum cost (perhaps
> significantly) since otherwise every vacuum keeps having to scan the
> heap for dead tuples on pages where we know there are none? Those
> pages could conceptually be frozen and ignored, but are not frozen
> because of the default behavior, correct?

Yes.


> If that's all true, it seems to me that removing that part of the
> patch significantly lowers its value.

Well, perfect sometimes is the enemy of the good. We gotta get something
in, and having some automated vacuuming for insert mostly/only tables is
a huge step forward. And avoiding regressions is an important part of
doing so.

I outlined the steps we could take to allow for more aggressive
vacuuming upthread.


> If we opportunistically freeze only if we're already dirtying a page,
> would that help a truly insert-only workload?

Yes.


> E.g., are there hint bits on the page that would need to change the
> first time we vacuum a full page with no dead tuples?

Yes. HEAP_XMIN_COMMITTED.


> I would have assumed the answer was "no" (since if so I think it would
> follow that _all_ pages need updated the first time they're
> vacuumed?).

That is the case. Although they might already be set when the tuples are
accessed for other reasons.


Greetings,

Andres Freund




Re: type of some table storage params on doc

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2020-Mar-18, Michael Paquier wrote:
> >> On Mon, Mar 16, 2020 at 11:07:37AM +0900, Atsushi Torikoshi wrote:
>  In this case, the parsing uses parse_real(), which is exactly the same
>  code path as what real GUCs use.
> 
> > Hmm.  So unadorned 'floating point' seems to refer to float8; you have
> > to use float(24) in order to get a float4.  The other standards-mandated
> > name for float4 seems to be REAL.  (I had a look around but was unable
> > to figure out whether the standard mandates exact bit widths other than
> > the precision spec).  Since they're not doubles, what about we use REAL
> > rather than FLOATING POINT?
> 
> Isn't this whole argument based on a false premise?  What parse_real
> returns is double, not float.  Also notice that config.sgml consistently
> documents those GUCs as floating point.  (I recall having
> recently whacked some GUC descriptions that were randomly out of line
> with that.)

Ah, I hadn't checked -- I was taking the function and struct names at
face value, but it turns out that they're lies as well -- parse_real and
relopt_real both parsing/storing doubles *is* confusing.

That being the case, I agree that "float4" is the wrong thing and
"floating point" is what to use.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: type of some table storage params on doc

2020-03-18 Thread Tom Lane
Alvaro Herrera  writes:
> Ah, I hadn't checked -- I was taking the function and struct names at
> face value, but it turns out that they're lies as well -- parse_real,
> relopt_real all parsing/storing doubles *is* confusing.

Yeah, that's certainly true.  I wonder if we could rename them
without causing a lot of pain for extensions?

regards, tom lane




Re: WAL usage calculation patch

2020-03-18 Thread Julien Rouhaud
On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote:
>
> There is a higher-level Instrumentation API that can be used with
> INSTRUMENT_WAL flag to collect the wal usage information. I believe
> the instrumentation is widely used in the executor code, so it should
> not be a problem to collect instrumentation information at the
> autovacuum worker level.
>
> Just a recommendation/chat, though. I am happy with the way the data
> is collected now. If you commit this variant, please add a TODO to
> rework wal usage to common instr API.


The instrumentation is somewhat intended to be used with executor nodes, not
backend commands.  I don't see a real technical reason that would prevent
that, but I prefer to keep things as-is for now, as it sounds less
controversial.  This is for the 3rd patch, which may not even be considered
for this CF anyway.


> > > As for the tests, please get somebody else to review this. I strongly
> > > believe checking full page writes here could be a source of
> > > instability.
> >
> >
> > I'm also a little bit dubious about it.  The initial checkpoint should make
> > things stable (of course unless full_page_writes is disabled), and Cfbot 
> > also
> > seems happy about it.  At least keeping it for the temporary tables test
> > shouldn't be a problem.
>
> Temp tables should show zero FPI WAL records, true :)
>
> I have no objections to the patch.


I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability.  As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
>From a41a58c51e15c31524ea28be8e31bccbf8d5b343 Mon Sep 17 00:00:00 2001
From: Kirill Bychik 
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v5 1/3] Track WAL usage.

---
 src/backend/access/transam/xlog.c   |  8 
 src/backend/access/transam/xloginsert.c |  6 +++
 src/backend/executor/execParallel.c | 22 ++-
 src/backend/executor/instrument.c   | 51 ++---
 src/include/executor/execParallel.h |  1 +
 src/include/executor/instrument.h   | 16 +++-
 src/tools/pgindent/typedefs.list|  1 +
 7 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index de2d4ee582..7cab00323d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "commands/progress.h"
 #include "commands/tablespace.h"
 #include "common/controldata_utils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
 
+   /* Provide WAL update data to the instrumentation */
+   if (inserted)
+   {
+   pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+   pgWalUsage.wal_records++;
+   }
+
return EndPos;
 }
 
diff --git a/src/backend/access/transam/xloginsert.c 
b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
 #include "access/xloginsert.h"
 #include "catalog/pg_control.h"
 #include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 */
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
 
+   /*
+* Report a full page image constructed for the WAL 
record
+*/
+   pgWalUsage.wal_fp_records++;
+
/*
 * Construct XLogRecData entries for the page content.
 */
diff --git a/src/backend/executor/execParallel.c 
b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
 #define PARALLEL_KEY_DSA   
UINT64CONST(0xE007)
 #define PARALLEL_KEY_QUERY_TEXTUINT64CONST(0xE008)
 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00A)
 
 #define PARALLEL_TUPLE_QUEUE_SIZE  65536
 
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char   *pstmt_space;
char   *paramlistinfo_space;
BufferUsage *bufusage_space;
+   WalUsage*walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;

Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Pavel Stehule
st 18. 3. 2020 v 18:09 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > st 18. 3. 2020 v 17:54 odesílatel Tom Lane  napsal:
> >> No, because if you've got that alongside foo2(anycompatible,
> >> anycompatible) then your queries will fail due to both functions
> >> matching anything that's promotable to text.
>
> > It is working for anyelement
>
> [ pokes at that... ]  Hm, looks like you're getting away with that
> because of the preference for functions taking preferred types.
> Seems pretty shaky to me though --- you can probably invent
> cases that will throw 'ambiguous function' if you try a bit harder.
> In any case, I don't think users will understand why they have to
> write two versions of the same function.
>

Yes, it is not for the usual user.

Pavel

>
> regards, tom lane
>


Re: type of some table storage params on doc

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Ah, I hadn't checked -- I was taking the function and struct names at
> > face value, but it turns out that they're lies as well -- parse_real,
> > relopt_real all parsing/storing doubles *is* confusing.
> 
> Yeah, that's certainly true.  I wonder if we could rename them
> without causing a lot of pain for extensions?

I don't think it will, directly; debian.codesearch.net says only patroni
and slony1-2 contain the "parse_real", and both have their own
implementations (patroni is Python anyway).  I didn't find any
relopt_real anywhere.

However, if we were to rename DefineCustomRealVariable() to match, that
would no doubt hurt a lot of people.  We also have GucRealCheckHook and
GucRealAssignHook typedefs, but those appear to hit no Debian package.
(In guc.c, the fallout rabbit hole goes pretty deep, but that seems well
localized.)

I don't think the last pg13 CF is when to be spending time on this,
though.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread James Coleman
On Tue, Mar 17, 2020 at 11:37 PM Justin Pryzby  wrote:
>
> On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote:
> > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> > >
> > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > > I think Andres was thinking this would maybe be an optimization 
> > > > > independent of
> > > > > is_insert_only (?)
> > > >
> > > > I wasn't sure.
> > >
> > > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > > will be ok.
> > >
> > > I was trying to say (in a later email) that I think it might be a good
> > > compromise to opportunistically freeze if we're dirtying the page
> > > anyway, but not optimize WAL emission etc. That's a pretty simple
> > > change, and it'd address a lot of the potential performance regressions,
> > > while still freezing for the "first" vacuum in insert only workloads.
> >
> > If we have truly insert-only tables, then doesn't vacuuming with
> > freezing every tuple actually decrease total vacuum cost (perhaps
> > significantly) since otherwise every vacuum keeps having to scan the
> > heap for dead tuples on pages where we know there are none? Those
> > pages could conceptually be frozen and ignored, but are not frozen
> > because of the default behavior, correct?
>
> The essential part of this patch is to trigger vacuum *at all* on an
> insert-only table.  Before today's updated patch, it also used FREEZE on any
> table which hit the new insert threshold.  The concern I raised is for
> insert-MOSTLY tables.  I thought it might be an issue if repeatedly freezing
> updated tuples caused vacuum to be too slow, especially if they're distributed
> in pages all across the table rather than clustered.

Yeah, for some reason I'd completely forgotten (caught up in thinking
about the best possible outcome re: freezing insert only tables) that
the bigger problem was just triggering vacuum at all on those tables.

> And I asked that the behavior (FREEZE) be configurable by a separate setting
> than the one that triggers autovacuum to run.  FREEZE is already controlled by
> the vacuum_freeze_table_age param.
>
> I think you're right that VACUUM FREEZE on an insert-only table would be less
> expensive than vacuum once without freeze and vacuum again later, which uses
> freeze.  To me, that suggests setting vacuum_freeze_table_age to a low value 
> on
> those tables.
>
> Regular vacuum avoids scanning all-visible pages, so for an insert-only table
> pages should only be vacuumed once (if frozen the 1st time) or twice (if not).
>
>  * Except when aggressive is set, we want to skip pages that are
>  * all-visible according to the visibility map, but only when we can 
> skip
>
> postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT 
> generate_series(1,99); VACUUM VERBOSE t; VACUUM VERBOSE t;
> ...
> INFO:  "t": found 0 removable, 99 nonremovable row versions in 4425 out 
> of 4425 pages
> ...
> VACUUM
> Time: 106.038 ms
> INFO:  "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 
> pages
> VACUUM
> Time: 1.828 ms
>
> => That's its not-very-clear way of saying that it only scanned 1 page the
> 2nd time around.

I didn't realize that about the visibility map being taken into account.

> > We have tables that log each change to a business object (as I suspect
> > many transactional workloads do), and I've often thought that
> > immediately freeze every page as soon as it fills up would be a real
> > win for us.
> >
> > If that's all true, it seems to me that removing that part of the
> > patch significantly lowers its value.
>
> > If we opportunistically freeze only if we're already dirtying a page,
> > would that help a truly insert-only workload? E.g., are there hint
> > bits on the page that would need to change the first time we vacuum a
> > full page with no dead tuples? I would have assumed the answer was
> > "no" (since if so I think it would follow that _all_ pages need
> > updated the first time they're vacuumed?).
>
> You probably know that hint bits are written by the first process to access 
> the
> tuple after it was written.  I think you're asking if the first *vacuum*
> requires additional writes beyond that.  And I think vacuum wouldn't touch the
> page until it decides to freeze tuples.

I think my assumption is that (at least in our case), the first
process to access will definitely not be vacuum on any regular basis.

> I do have a patch to display the number of hint bits written and pages frozen.
> https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com

I'll take a look at that too.

> > But if that's the case, then this kind of opportunistic freezing wouldn't
> > help this kind of workload. Maybe there's something I'm misunderstanding
> > about how vacuum works though.
>
> I am reminding myself about vacuum with increasing frequency and usually still
> learn something new.

For sure.

Thanks,
James




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread James Coleman
On Wed, Mar 18, 2020 at 1:08 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-17 21:58:53 -0400, James Coleman wrote:
> > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > > I think Andres was thinking this would maybe be an optimization 
> > > > > independent of
> > > > > is_insert_only (?)
> > > >
> > > > I wasn't sure.
> > >
> > > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > > will be ok.
> > >
> > > I was trying to say (in a later email) that I think it might be a good
> > > compromise to opportunistically freeze if we're dirtying the page
> > > anyway, but not optimize WAL emission etc. That's a pretty simple
> > > change, and it'd address a lot of the potential performance regressions,
> > > while still freezing for the "first" vacuum in insert only workloads.
> >
> > If we have truly insert-only tables, then doesn't vacuuming with
> > freezing every tuple actually decrease total vacuum cost (perhaps
> > significantly) since otherwise every vacuum keeps having to scan the
> > heap for dead tuples on pages where we know there are none? Those
> > pages could conceptually be frozen and ignored, but are not frozen
> > because of the default behavior, correct?
>
> Yes.
>
>
> > If that's all true, it seems to me that removing that part of the
> > patch significantly lowers its value.
>
> Well, perfect sometimes is the enemy of the good. We gotta get something
> in, and having some automated vacuuming for insert mostly/only tables is
> a huge step forward. And avoiding regressions is an important part of
> doing so.

Yep, as I responded to Justin, in thinking about the details I'd lost
sight of the biggest issue.

So I withdraw that concern in favor of getting something out that
improves things now.

...

> > If we opportunistically freeze only if we're already dirtying a page,
> > would that help a truly insert-only workload?
>
> Yes.

Only if some other process hasn't already read and caused hint bits to
be written, correct? Or am I missing something there too?

> > E.g., are there hint bits on the page that would need to change the
> > first time we vacuum a full page with no dead tuples?
>
> Yes. HEAP_XMIN_COMMITTED.

This can be set opportunistically by other non-vacuum processes though?

> > I would have assumed the answer was "no" (since if so I think it would
> > follow that _all_ pages need updated the first time they're
> > vacuumed?).
>
> That is the case. Although they might already be set when the tuples are
> accessed for other reasons.

Ah, I think this is answering what I'd asked above.

I'm very excited to see improvements in flight on this use case.

Thanks,
James




Re: WAL usage calculation patch

2020-03-18 Thread Kirill Bychik
> > There is a higher-level Instrumentation API that can be used with
> > INSTRUMENT_WAL flag to collect the wal usage information. I believe
> > the instrumentation is widely used in the executor code, so it should
> > not be a problem to collect instrumentation information at the
> > autovacuum worker level.
> >
> > Just a recommendation/chat, though. I am happy with the way the data
> > is collected now. If you commit this variant, please add a TODO to
> > rework wal usage to common instr API.
>
>
> The instrumentation is somewhat intended to be used with executor nodes, not
> backend commands.  I don't see a real technical reason that would prevent
> that, but I prefer to keep things as-is for now, as it sounds less
> controversial.
> This is for the 3rd patch, which may not even be considered for this CF 
> anyway.
>
>
> > > > As for the tests, please get somebody else to review this. I strongly
> > > > believe checking full page writes here could be a source of
> > > > instability.
> > >
> > >
> > > I'm also a little bit dubious about it.  The initial checkpoint should 
> > > make
> > > things stable (of course unless full_page_writes is disabled), and Cfbot 
> > > also
> > > seems happy about it.  At least keeping it for the temporary tables test
> > > shouldn't be a problem.
> >
> > Temp tables should show zero FPI WAL records, true :)
> >
> > I have no objections to the patch.
>
>
> I'm attaching a v5 with fp records only for temp tables, so there's no risk of
> instability.  As I previously said I'm fine with your two patches, so unless
> you have objections on the fpi test for temp tables or the documentation
> changes, I believe those should be ready for committer.

No objections on my side either. Thank you for your review, time and efforts!




Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING

2020-03-18 Thread David G. Johnston
On Wednesday, March 18, 2020, Tom Lane  wrote:

> Bruce Momjian  writes:
> > OK, updated patch attached.
>
> LGTM, thanks
>

+1

David J.


Re: Minimal logical decoding on standbys

2020-03-18 Thread Alvaro Herrera
There were conflicts again, so I rebased once more.  Didn't do anything
else.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From e2f5cedfc35652f198f9ca3bca9a45572f8845fb Mon Sep 17 00:00:00 2001
From: Amit Khandekar 
Date: Thu, 16 Jan 2020 10:05:15 +0530
Subject: [PATCH v6 1/5] Allow logical decoding on standby.

Allow a logical slot to be created on standby. Restrict its usage
or its creation if wal_level on primary is less than logical.
During slot creation, its restart_lsn is set to the last replayed
LSN. Effectively, a logical slot creation on standby waits for an
xl_running_xact record to arrive from primary. Conflicting slots
would be handled in next commits.

Andres Freund and Amit Khandekar.
---
 src/backend/access/transam/xlog.c | 11 +
 src/backend/replication/logical/decode.c  | 22 -
 src/backend/replication/logical/logical.c | 37 ---
 src/backend/replication/slot.c| 57 +++
 src/backend/replication/walsender.c   | 10 ++--
 src/include/access/xlog.h |  1 +
 6 files changed, 98 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de2d4ee582..a666b4b935 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4967,6 +4967,17 @@ LocalProcessControlFile(bool reset)
 	ReadControlFile();
 }
 
+/*
+ * Get the wal_level from the control file. For a standby, this value should be
+ * considered as its active wal_level, because it may be different from what
+ * was originally configured on standby.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+	return ControlFile->wal_level;
+}
+
 /*
  * Initialization of shared memory for XLOG
  */
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..3a072af75b 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -187,11 +187,31 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			 * can restart from there.
 			 */
 			break;
+		case XLOG_PARAMETER_CHANGE:
+		{
+			xl_parameter_change *xlrec =
+(xl_parameter_change *) XLogRecGetData(buf->record);
+
+			/*
+			 * If wal_level on primary is reduced to less than logical, then we
+			 * want to prevent existing logical slots from being used.
+			 * Existing logical slots on standby get dropped when this WAL
+			 * record is replayed; and further, slot creation fails when the
+			 * wal level is not sufficient; but all these operations are not
+			 * synchronized, so a logical slot may creep in while the wal_level
+			 * is being reduced.  Hence this extra check.
+			 */
+			if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ereport(ERROR,
+		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		 errmsg("logical decoding on standby requires "
+"wal_level >= logical on master")));
+			break;
+		}
 		case XLOG_NOOP:
 		case XLOG_NEXTOID:
 		case XLOG_SWITCH:
 		case XLOG_BACKUP_END:
-		case XLOG_PARAMETER_CHANGE:
 		case XLOG_RESTORE_POINT:
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5adf253583..03463719f8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -91,23 +91,22 @@ CheckLogicalDecodingRequirements(void)
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
  errmsg("logical decoding requires a database connection")));
 
-	/* 
-	 * TODO: We got to change that someday soon...
-	 *
-	 * There's basically three things missing to allow this:
-	 * 1) We need to be able to correctly and quickly identify the timeline a
-	 *	  LSN belongs to
-	 * 2) We need to force hot_standby_feedback to be enabled at all times so
-	 *	  the primary cannot remove rows we need.
-	 * 3) support dropping replication slots referring to a database, in
-	 *	  dbase_redo. There can't be any active ones due to HS recovery
-	 *	  conflicts, so that should be relatively easy.
-	 * 
-	 */
 	if (RecoveryInProgress())
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("logical decoding cannot be used while in recovery")));
+	{
+		/*
+		 * This check may have race conditions, but whenever
+		 * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+		 * verify that there are no existing logical replication slots. And to
+		 * avoid races around creating a new slot,
+		 * CheckLogicalDecodingRequirements() is called once before creating
+		 * the slot, and once when logical decoding is initially starting up.
+		 */
+		if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("logical decoding on standby requires "
+			"wal_level >= logical on master"))

Re: Minimal logical decoding on standbys

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Alvaro Herrera wrote:

> There were conflicts again, so I rebased once more.  Didn't do anything
> else.

This compiles fine, but tests seem to hang forever with no output.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: potential stuck lock in SaveSlotToPath()

2020-03-18 Thread Andres Freund
Hi,

On 2020-03-18 16:46:23 +0100, Peter Eisentraut wrote:
> When SaveSlotToPath() is called with elevel=LOG, the early exits don't
> release the slot's io_in_progress_lock.  Fix attached.

I'm a bit confused as to why we ever call it with elevel = LOG
(i.e. why we have the elevel parameter at all). That seems to have been
there from the start, so it's either me or Robert that's to blame. But I
can't immediately see a reason for it?

Greetings,

Andres Freund




Re: WAL usage calculation patch

2020-03-18 Thread Julien Rouhaud
On Wed, Mar 18, 2020 at 08:48:17PM +0300, Kirill Bychik wrote:
> > I'm attaching a v5 with fp records only for temp tables, so there's no risk 
> > of
> > instability.  As I previously said I'm fine with your two patches, so unless
> > you have objections on the fpi test for temp tables or the documentation
> > changes, I believe those should be ready for committer.
>
> No objections on my side either. Thank you for your review, time and efforts!


Great, thanks also for the patches and efforts!  I'll mark the entry as RFC.




Re: Additional improvements to extended statistics

2020-03-18 Thread Tomas Vondra

On Sun, Mar 15, 2020 at 12:37:37PM +, Dean Rasheed wrote:

On Sun, 15 Mar 2020 at 00:08, Tomas Vondra  wrote:


On Sat, Mar 14, 2020 at 05:56:10PM +0100, Tomas Vondra wrote:
>
>Attached is a patch series rebased on top of the current master, after
>committing the ScalarArrayOpExpr enhancements. I've updated the OR patch
>to get rid of the code duplication, and barring objections I'll get it
>committed shortly together with the two parts improving test coverage.

I've pushed the two patches improving test coverage for functional
dependencies and MCV lists, which seems mostly non-controversial. I'll
wait a bit more with the two patches actually changing behavior (rebased
version attached, to keep cputube happy).



Patch 0001 looks to be mostly ready. Just a couple of final comments:

+   if (is_or)
+       simple_sel = clauselist_selectivity_simple_or(root, stat_clauses,
+                                                     varRelid, jointype,
+                                                     sjinfo, NULL, 1.0);
+   else

Surely that should be passing 0.0 as the final argument, otherwise it
will always just return simple_sel = 1.0.


+*
+* XXX We can't multiply with current value, because for OR clauses
+* we start with 0.0, so we simply assign to s1 directly.
+*/
+   s = statext_clauselist_selectivity(root, clauses, varRelid,
+  jointype, sjinfo, rel,
+  &estimatedclauses, true);

That final part of the comment is no longer relevant (variable s1 no
longer exists). Probably it could now just be deleted, since I think
there are sufficient comments elsewhere to explain what's going on.

Otherwise it looks good, and I think this will lead to some very
worthwhile improvements.



Attached is a rebased patch series, addressing both those issues.

I've been wondering why none of the regression tests failed because of
the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to
make the tests stable, all the MCV lists we use are "perfect" i.e. it
represents 100% of the data. But this selectivity is used to compute
selectivity only for the part not represented by the MCV list, i.e. it's
not really used. I suppose we could add a test that would use larger
MCV item, but I'm afraid that'd be inherently unstable :-(
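
To make that concrete, here is roughly what such a test could look like
(table and statistics names are hypothetical): with only 10 distinct
(a,b) combinations the MCV list is "perfect", so the selectivity of the
non-MCV remainder - where the 0.0 vs. 1.0 mix-up hides - never matters:

    CREATE TABLE mcv_or_test (a int, b int);
    INSERT INTO mcv_or_test
        SELECT mod(i, 10), mod(i, 10) FROM generate_series(1, 10000) s(i);
    CREATE STATISTICS mcv_or_stats (mcv) ON a, b FROM mcv_or_test;
    ANALYZE mcv_or_test;
    EXPLAIN SELECT * FROM mcv_or_test WHERE a = 1 OR b = 7;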

Another thing I was thinking about is the changes to the API. We need to
pass information whether the clauses are connected by AND or OR to a
number of places, and 0001 does that in two ways. For some functions it
adds a new parameter (called is_or), and for other functions it creates
a new copy of a function. So for example

  - statext_mcv_clauselist_selectivity
  - statext_clauselist_selectivity

got the new flag, while e.g. clauselist_selectivity gets a new "copy"
sibling called clauselist_selectivity_or.

There were two reasons for not using flag. First, clauselist_selectivity
and similar functions have to do very different stuff for these two
cases, so it'd be just one huge if/else block. Second, minimizing
breakage of third-party code - pretty much all the extensions I've seen
only work with AND clauses, and call clauselist_selectivity. Adding a
flag would break that code. (Also, there's a bit of laziness, because
this was the simplest thing to do during development.)

But I wonder if that's sufficient reason - maybe we should just add the
flag in all cases. It might break some code, but the fix is trivial (add
a false there).
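
To make the two options concrete, the signatures in question would look
roughly like this (a sketch only - argument lists abbreviated from the
clausesel.c interface, not the final committed form):

    /* Option A: add a flag to the existing entry point; third-party
     * callers must be updated to pass "false" for the AND case. */
    Selectivity clauselist_selectivity(PlannerInfo *root, List *clauses,
                                       int varRelid, JoinType jointype,
                                       SpecialJoinInfo *sjinfo, bool is_or);

    /* Option B (what 0001 currently does): keep the AND entry point
     * unchanged and add an OR sibling, so existing callers keep
     * compiling. */
    Selectivity clauselist_selectivity_or(PlannerInfo *root, List *clauses,
                                          int varRelid, JoinType jointype,
                                          SpecialJoinInfo *sjinfo);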

Opinions?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 4d1b7a071ff7ea27bb991f7999133eb8d34f4720 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 8 Mar 2020 23:26:50 +0100
Subject: [PATCH 1/2] Improve estimation of OR clauses using extended
 statistics

Until now, OR clauses were estimated using extended statistics only when
the whole clause (i.e. all of its arguments) was compatible. If even just
one argument was found to be incompatible, the whole clause was estimated
ignoring extended statistics. Estimation errors for OR clauses tend to
be fairly mild, so this was considered acceptable, but it may become an
issue for OR clauses with more complex arguments, etc.

This commit relaxes the restriction, using mostly the same logic as AND
clauses. We first apply extended statistics to as many arguments as
possible, and then use the (s1 + s2 - s1 * s2) formula to factor in the
remaining clauses.

The OR clause is still considered incompatible, though. If any argument
is unsupported or references a variable not covered by the statistics,
the whole OR clause is incompatible. The consequence is that e.g. clauses

(a = 1) AND (b = 1 OR c = 1 OR d = 1)

can't be estimated by statistics on (a,b,c) because the OR clause also
references "d". So we'll estimate each of the AND arguments separately,
and the extended statistics will be used only to estimate the OR clause.
This may be solved by c

Re: Minimal logical decoding on standbys

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Alvaro Herrera wrote:

> On 2020-Mar-18, Alvaro Herrera wrote:
> 
> > There were conflicts again, so I rebased once more.  Didn't do anything
> > else.
> 
> This compiles fine, but tests seem to hang forever with no output.

well, not "forever", but:

$ make check PROVE_TESTS=t/019_standby_logical_decoding_conflicts.pl 
PROVE_FLAGS=-v
...
cd /pgsql/source/master/src/test/recovery && 
TESTDIR='/home/alvherre/mnt/crypt/alvherre/Code/pgsql/build/master/src/test/recovery'
 PATH="/pgsql/build/master/tmp_install/pgsql/install/master/bin:$PATH" 
LD_LIBRARY_PATH="/pgsql/build/master/tmp_install/pgsql/install/master/lib"  
PGPORT='655432' 
PG_REGRESS='/home/alvherre/mnt/crypt/alvherre/Code/pgsql/build/master/src/test/recovery/../../../src/test/regress/pg_regress'
 REGRESS_SHLIB='/pgsql/build/master/src/test/regress/regress.so' /usr/bin/prove 
-I /pgsql/source/master/src/test/perl/ -I 
/pgsql/source/master/src/test/recovery -v 
t/019_standby_logical_decoding_conflicts.pl
t/019_standby_logical_decoding_conflicts.pl .. 
1..24
ok 1 - dropslot on standby created
ok 2 - activeslot on standby created
# poll_query_until timed out executing this query:
# SELECT '0/35C9190' <= replay_lsn AND state = 'streaming' FROM 
pg_catalog.pg_stat_replication WHERE application_name = 'standby';
# expecting this output:
# t
# last actual query output:
# 
# with stderr:
Bailout called.  Further testing stopped:  system pg_ctl failed
Bail out!  system pg_ctl failed
FAILED--Further testing stopped: system pg_ctl failed
make: *** [Makefile:19: check] Error 255


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: potential stuck lock in SaveSlotToPath()

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Andres Freund wrote:

> Hi,
> 
> On 2020-03-18 16:46:23 +0100, Peter Eisentraut wrote:
> > When SaveSlotToPath() is called with elevel=LOG, the early exits don't
> > release the slot's io_in_progress_lock.  Fix attached.
> 
> I'm a bit confused as to why we ever call it with elevel = LOG
> (i.e. why we have the elevel parameter at all). That seems to have been
> there from the start, so it's either me or Robert that's to blame. But I
> can't immediately see a reason for it?

I guess you didn't want failure to save a slot to be a reason to abort a
checkpoint.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Laurenz Albe
On Tue, 2020-03-17 at 17:26 -0700, Andres Freund wrote:
> On 2020-03-17 01:14:02 +0100, Laurenz Albe wrote:
> > lazy_check_needs_freeze() is only called for an aggressive vacuum, which
> > this isn't.
> 
> Hm? I mean some of these will be aggressive vacuums, because it's older
> than vacuum_freeze_table_age? And the lower age limit would make that
> potentially more painful, no?

You are right.  I thought of autovacuum_freeze_max_age, but not of
vacuum_freeze_table_age.

Autovacuum configuration is so woefully complicated that it makes me
feel bad to propose two more parameters :^(

Yours,
Laurenz Albe





Re: potential stuck lock in SaveSlotToPath()

2020-03-18 Thread Thomas Munro
On Thu, Mar 19, 2020 at 4:46 AM Peter Eisentraut
 wrote:
> [patch]

+ * releaseing even in that case.

Typo.




Re: potential stuck lock in SaveSlotToPath()

2020-03-18 Thread Andres Freund
Hi,

On 2020-03-18 16:54:19 -0300, Alvaro Herrera wrote:
> On 2020-Mar-18, Andres Freund wrote:
> > On 2020-03-18 16:46:23 +0100, Peter Eisentraut wrote:
> > > When SaveSlotToPath() is called with elevel=LOG, the early exits don't
> > > release the slot's io_in_progress_lock.  Fix attached.
> > 
> > I'm a bit confused as to why we ever call it with elevel = LOG
> > (i.e. why we have the elevel parameter at all). That seems to have been
> > there from the start, so it's either me or Robert that's to blame. But I
> > can't immediately see a reason for it?
> 
> I guess you didn't want failure to save a slot to be a reason to abort a
> checkpoint.

I don't see a valid reason for that though - if anything it's dangerous,
because we're not persistently saving the slot. It should fail the
checkpoint imo. Robert, do you have an idea?

Greetings,

Andres Freund




Re: Collation versioning

2020-03-18 Thread Peter Eisentraut

On 2020-03-17 18:43, Julien Rouhaud wrote:

On Tue, Mar 17, 2020 at 05:31:47PM +0100, Christoph Berg wrote:

Re: Peter Eisentraut 
2020-03-17

Did we discuss the regcollation type?  In the current patch set, it's only
used in two places in a new regression test, where it can easily be replaced
by a join.  Do we need it?


I originally wrote it for a previous version of the patchset, to shorten the
pg_dump query, but that went out when I replaced the DDL command with native
functions instead.  It didn't seem to hurt to keep it, so I relied on it in the
regression tests.


OK, I have committed the regcollation patch, and some surrounding 
cleanup of the reg* types documentation.


Note that your patch updated the pg_upgrade documentation to say that 
tables with regcollation columns cannot be upgraded but didn't actually 
patch the pg_upgrade code to make that happen.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BEFORE ROW triggers for partitioned tables

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-17, Ashutosh Bapat wrote:

> On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera 
> wrote:

> > Note that in this implementation I no longer know which column is the
> > problematic one, but I suppose users have clue enough.  Wording
> > suggestions welcome.
> 
> When we have expression as a partition key, there may not be one particular
> column which causes the row to move out of partition. So, this should be
> fine.

True.

> A slight wording suggestion below.
> 
> - * Complain if we find an unexpected trigger type.
> - */
> - if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
> - elog(ERROR, "unexpected trigger \"%s\" found",
> - NameStr(trigForm->tgname));
> 
> !AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF
> triggers as well?

Hmm, yeah, this should check both types; I'll put it back.  Note that
this is just a cross-check that the catalogs we're going to copy don't
contain bogus info; the real backstop for that at the user level is in
the other one you complain about:

> - */
> - if (stmt->timing != TRIGGER_TYPE_AFTER)
> 
> Same comment as the above?

Note that in this one we have a check for INSTEAD before we enter the
FOR EACH ROW block, so this case is already covered -- AFAICS the code
is correct.

> + /*
> + * After a tuple in a partition goes through a trigger, the user
> + * could have changed the partition key enough that the tuple
> + * no longer fits the partition.  Verify that.
> + */
> + if (trigger->tgisclone &&
> 
> Why do we want to restrict this check only for triggers which are
> cloned from the ancestors?

Because it's not our business in the other case.  When the trigger is
defined directly on the partition, it's the user's problem if something
goes amiss.

> + !ExecPartitionCheck(relinfo, slot, estate, false))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("moving row to another partition during a BEFORE trigger is not
> supported"),
> + errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",
> 
> In the error message you removed above, we are mentioning BEFORE FOR EACH
> ROW trigger. Should we continue to use the same terminology?

Sounds good, I'll change that.

I also changed the errdetail slightly:
errdetail("Before executing trigger \"%s\", the row was to be in 
partition \"%s.%s\"",

> I was wondering whether it would be good to check the partition
> constraint only once i.e. after all before row triggers have been
> executed. This would avoid throwing an error in case multiple triggers
> together worked to keep the tuple in the same partition when
> individual trigger/s caused it to move out of that partition. But then
> we would lose the opportunity to point out the before row trigger
> which actually caused the row to move out of the partition. Anyway,
> wanted to bring that for the discussion here.

Yeah, I too thought about a combination of triggers that move the tuple
elsewhere and back.  Frankly, I don't think we need to support that.  It
sounds devious and likely we'll miss some odd corner case -- anything
involving the weird cross-partition UPDATE mechanism sounds easy to get
wrong.

> +-- Before triggers and partitions
> 
> The test looks good. Should we add a test for partitioned table with
> partition
> key as expression?

Will do.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Collation versioning

2020-03-18 Thread Julien Rouhaud
On Wed, Mar 18, 2020 at 09:29:55PM +0100, Peter Eisentraut wrote:
> On 2020-03-17 18:43, Julien Rouhaud wrote:
> > On Tue, Mar 17, 2020 at 05:31:47PM +0100, Christoph Berg wrote:
> > > Re: Peter Eisentraut 
> > > 2020-03-17
> > > > Did we discuss the regcollation type?  In the current patch set, it's 
> > > > only
> > > > used in two places in a new regression test, where it can easily be 
> > > > replaced
> > > > by a join.  Do we need it?
> > 
> > I originally wrote it for a previous version of the patchset, to shorten the
> > pg_dump query, but that went out when I replaced the DDL command with native
> > functions instead.  It didn't seem to hurt to keep it, so I relied on it in 
> > the
> > regression tests.
> 
> OK, I have committed the regcollation patch, and some surrounding cleanup of
> the reg* types documentation.

Thanks!

> Note that your patch updated the pg_upgrade documentation to say that tables
> with regcollation columns cannot be upgraded but didn't actually patch the
> pg_upgrade code to make that happen.

Oh right, sorry for that, I shouldn't have missed it :(




Re: expose parallel leader in CSV and log_line_prefix

2020-03-18 Thread Justin Pryzby
On Sun, Mar 15, 2020 at 12:49:33PM +0100, Julien Rouhaud wrote:
> On Sun, Mar 15, 2020 at 06:18:31AM -0500, Justin Pryzby wrote:
> > See also:
> > https://commitfest.postgresql.org/27/2390/
> > https://www.postgresql.org/message-id/flat/caobau_yy5bt0vtpz2_lum6cucgeqmynoj8-rgto+c2+w3de...@mail.gmail.com
> > b025f32e0b Add leader_pid to pg_stat_activity
> 
> FTR this is a followup of 
> https://www.postgresql.org/message-id/20200315095728.GA26184%40telsasoft.com

Yes - but I wasn't going to draw attention to the first patch, in which I did
something needlessly complicated and indirect. :)

> +   case 'k':
> +   if (MyBackendType != B_BG_WORKER)
> +   ; /* Do nothing */
> 
> 
> Isn't the test inverted?  Also a bgworker could run parallel queries through
> SPI I think, should we really ignore bgworkers?

I don't think it's reversed, but I think I see your point: the patch is
supposed to be showing the leader's own PID for the leader itself.  So I think
that can just be removed.

> +   else if (!MyProc->lockGroupLeader)
> +   ; /* Do nothing */
> 
> There should be a test that MyProc isn't NULL.

Yes, done.

> +   else if (padding != 0)
> +   appendStringInfo(buf, "%*d", padding, 
> MyProc->lockGroupLeader->pid);
> +   else
> +   appendStringInfo(buf, "%d", MyProc->lockGroupLeader->pid);
> +   break;
> 
> I think that if padding was asked we should append spaces rather than doing
> nothing.

Done

It logs like:

template1=# SET log_temp_files=0; explain analyze SELECT a,COUNT(1) FROM t a 
JOIN t b USING(a) GROUP BY 1;
2020-03-15 21:20:47.288 CDT [5537 5537] LOG:  statement: SET 
log_temp_files=0;
SET
2020-03-15 21:20:47.289 CDT [5537 5537] LOG:  statement: explain analyze 
SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
2020-03-15 21:20:51.253 CDT [5627 5537] LOG:  temporary file: path 
"base/pgsql_tmp/pgsql_tmp5627.0", size 6094848
2020-03-15 21:20:51.253 CDT [5627 5537] STATEMENT:  explain analyze 
SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
2020-03-15 21:20:51.254 CDT [5626 5537] LOG:  temporary file: path 
"base/pgsql_tmp/pgsql_tmp5626.0", size 6103040
2020-03-15 21:20:51.254 CDT [5626 5537] STATEMENT:  explain analyze 
SELECT a,COUNT(1) FROM t a JOIN t b USING(a) GROUP BY 1;
2020-03-15 21:20:51.263 CDT [5537 5537] LOG:  temporary file: path 
"base/pgsql_tmp/pgsql_tmp5537.1.sharedfileset/o15of16.p0.0", size 557056

Now, with the leader showing its own PID.

This also fixes unsafe access to lockGroupLeader->pid, same issue as in the
original v1 patch for b025f32e0b.

-- 
Justin
>From 53d766e88537ca018d46efc671d5b586862fb30b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 13 Mar 2020 22:03:06 -0500
Subject: [PATCH v3] Include the leader PID in logfile

See also: b025f32e0b, which adds the leader PID to pg_stat_activity
---
 doc/src/sgml/config.sgml  | 11 +++-
 src/backend/utils/error/elog.c| 27 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 672bf6f1ee..b9a1bd9a31 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6511,6 +6511,14 @@ local0.*/var/log/postgresql
  Process ID
  no
 
+
+ %k
+ Process ID of the parallel group leader if this
+ process is or was involved in parallel query, null
+ otherwise.  For a parallel group leader, this field is
+ set to its own process ID.
+ no
+
 
  %t
  Time stamp without milliseconds
@@ -6809,7 +6817,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, and backend type.
+application name, backend type, and leader PID.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -6839,6 +6847,7 @@ CREATE TABLE postgres_log
   location text,
   application_name text,
   backend_type text,
+  leader_pid integer,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 62eef7b71f..5328664f4d 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,6 +77,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2560,6 +2561,22 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 else
 	appendStringInfo(buf, "%d", MyProcPi

Re: BEFORE ROW triggers for partitioned tables

2020-03-18 Thread Alvaro Herrera
Thanks for the reviews; I have pushed it now.

(I made the doc edits you mentioned too.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Add schema and table names to partition error

2020-03-18 Thread Chris Bandy
On 3/18/20 6:56 AM, Amit Kapila wrote:
> On Thu, Mar 12, 2020 at 7:46 PM Amit Kapila  wrote:
>>
>> On Wed, Mar 11, 2020 at 8:51 PM Chris Bandy  wrote:
>>>
>>> On 3/11/20 6:29 AM, Amit Kapila wrote:

 I have tried with git am as well, but it failed.  I am not sure what
 is the reason.  Can you please once check at your end?
>>>
>>> Yes, sorry. This set (and v3 and v4) should work with -p0. Any following
>>> patches from me will use the normal -p1.
>>>
>>
>> Okay.
>>
> 
> I again tried the latest patch v5 both with -p1 and -p0, but it gives
> an error while applying the patch.  Can you send a patch that we can
> apply with patch -p1 or git-am?
> 
> [1] - 
> https://www.postgresql.org/message-id/0731def8-978e-0285-04ee-582762729b38%40gmail.com
> 

Sorry for these troubles. Attached are patches created using `git
format-patch -n -v6` on master at 487e9861d0.

Thanks,
Chris
>From 6e5e47dc9a816c8d3e3453759da5ea0dcbeea31a Mon Sep 17 00:00:00 2001
From: Chris Bandy 
Date: Fri, 6 Mar 2020 20:48:55 -0600
Subject: [PATCH v6 1/2] Add tests for integrity violation error fields

The documentation states that all errors of SQLSTATE class 23 should
include the name of an object associated with the error.
---
 src/test/regress/expected/integrity_errors.out | 408 +
 src/test/regress/parallel_schedule |   2 +-
 src/test/regress/sql/integrity_errors.sql  | 193 
 3 files changed, 602 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/integrity_errors.out
 create mode 100644 src/test/regress/sql/integrity_errors.sql

diff --git a/src/test/regress/expected/integrity_errors.out b/src/test/regress/expected/integrity_errors.out
new file mode 100644
index 00..e75e6b722f
--- /dev/null
+++ b/src/test/regress/expected/integrity_errors.out
@@ -0,0 +1,408 @@
+--
+-- Tests for integrity violation error fields
+--
+-- Errors in SQLSTATE class 23 (integrity constraint violation) should
+-- include the name of a database object as a separate field.
+--
+-- The fields of interest are shown at the same verbosity level as
+-- volatile details such as source code line numbers. To produce stable
+-- regression output, the following function returns a portion of the
+-- full error reported.
+CREATE FUNCTION integrity_error_record(
+dml text,
+OUT err_sqlstate text,
+OUT err_message text,
+OUT err_detail text,
+OUT err_datatype text,
+OUT err_schema text,
+OUT err_table text,
+OUT err_column text,
+OUT err_constraint text)
+AS $$
+BEGIN
+EXECUTE $1;
+EXCEPTION
+WHEN integrity_constraint_violation THEN GET STACKED DIAGNOSTICS
+err_sqlstate := RETURNED_SQLSTATE,
+err_message := MESSAGE_TEXT,
+err_detail := PG_EXCEPTION_DETAIL,
+err_datatype := PG_DATATYPE_NAME,
+err_schema := SCHEMA_NAME,
+err_table := TABLE_NAME,
+err_column := COLUMN_NAME,
+err_constraint := CONSTRAINT_NAME;
+END;
+$$ LANGUAGE plpgsql;
+\pset expanded on
+\pset tuples_only on
+-- table not null
+CREATE TABLE ivnt1 (n int NOT NULL);
+SELECT * FROM integrity_error_record($$
+INSERT INTO ivnt1 VALUES (NULL);
+$$);
+err_sqlstate   | 23502
+err_message| null value in column "n" of relation "ivnt1" violates not-null constraint
+err_detail | Failing row contains (null).
+err_datatype   | 
+err_schema | public
+err_table  | ivnt1
+err_column | n
+err_constraint | 
+
+-- alter table not null
+CREATE TABLE ivnt2 (n int);
+INSERT INTO ivnt2 VALUES (NULL);
+SELECT * FROM integrity_error_record($$
+ALTER TABLE ivnt2 ALTER n SET NOT NULL;
+$$);
+err_sqlstate   | 23502
+err_message| column "n" of relation "ivnt2" contains null values
+err_detail | 
+err_datatype   | 
+err_schema | public
+err_table  | ivnt2
+err_column | n
+err_constraint | 
+
+DROP TABLE ivnt1, ivnt2;
+-- table unique
+CREATE TABLE ivpkt1 (x int, y int, PRIMARY KEY (x, y));
+INSERT INTO ivpkt1 VALUES (1, 2);
+SELECT * FROM integrity_error_record($$
+INSERT INTO ivpkt1 VALUES (1, 2);
+$$);
+err_sqlstate   | 23505
+err_message| duplicate key value violates unique constraint "ivpkt1_pkey"
+err_detail | Key (x, y)=(1, 2) already exists.
+err_datatype   | 
+err_schema | public
+err_table  | ivpkt1
+err_column | 
+err_constraint | ivpkt1_pkey
+
+-- alter table unique
+CREATE TABLE ivpkt2 (x int, y int);
+INSERT INTO ivpkt2 VALUES (1, 2), (1, 2);
+SELECT * FROM integrity_error_record($$
+ALTER TABLE ivpkt2 ADD PRIMARY KEY (x, y);
+$$);
+err_sqlstate   | 23505
+err_message| could not create unique index "ivpkt2_pkey"
+err_detail | Key (x, y)=(1, 2) is duplicated.
+err_datatype   | 
+err_schema | public
+err_table  | ivpkt2
+err_column | 
+err_constraint | ivpkt2_pkey
+
+-- table foreign key reference
+CREATE TABLE ivfkt1 (x int, y int, FOREIGN KEY (x, y) REFERENCES ivpkt1);
+SELECT * FROM integrity_error_record($$
+INSERT INTO ivfkt1 VALU

Re: Make MemoryContextMemAllocated() more precise

2020-03-18 Thread Jeff Davis
On Mon, 2020-03-16 at 11:45 -0700, Jeff Davis wrote:
> Attached is a patch that makes mem_allocated a method (rather than a
> field) of MemoryContext, and allows each memory context type to track
> the memory its own way. They all do the same thing as before
> (increment/decrement a field), but AllocSet also subtracts out the
> free
> space in the current block. For Slab and Generation, we could do
> something similar, but it's not as much of a problem because there's
> no
> doubling of the allocation size.

Committed.

In an off-list discussion, Andres suggested that MemoryContextStats
could be refactored to achieve this purpose, perhaps with flags to
avoid walking through the blocks and freelists when those are not
needed.

We discussed a few other names, such as "space", "active memory", and
"touched". We didn't settle on any great name, but "touched" seemed to
be the most descriptive.

This refactoring/renaming can be done later; right now I committed this
to unblock disk-based Hash Aggregation, which is ready.

Regards,
Jeff Davis






Re: Define variables in the approprieate scope

2020-03-18 Thread Bruce Momjian
On Tue, Feb 25, 2020 at 09:35:52AM +0100, Antonin Houska wrote:
> I've noticed that two variables in RelationCopyStorage() are defined in a
> scope higher than necessary. Please see the patch.

It seems cleaner to me to allocate the variables once before the loop
starts, rather than for each loop iteration.
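
For illustration, the two styles being weighed look like this (a minimal
sketch with hypothetical helpers, not the actual RelationCopyStorage
code):

    extern char *read_block(int blkno);             /* hypothetical */
    extern void write_block(int blkno, char *page); /* hypothetical */

    /* Style 1: declare once, before the loop. */
    void
    copy_blocks_outer(int nblocks)
    {
        char   *page;
        int     blkno;

        for (blkno = 0; blkno < nblocks; blkno++)
        {
            page = read_block(blkno);
            write_block(blkno, page);
        }
    }

    /* Style 2 (the proposed patch): narrowest scope that uses them. */
    void
    copy_blocks_inner(int nblocks)
    {
        for (int blkno = 0; blkno < nblocks; blkno++)
        {
            char   *page = read_block(blkno);

            write_block(blkno, page);
        }
    }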

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: pgsql: Disk-based Hash Aggregation.

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-18, Jeff Davis wrote:

> Disk-based Hash Aggregation.
> 
> While performing hash aggregation, track memory usage when adding new
> groups to a hash table. If the memory usage exceeds work_mem, enter
> "spill mode".

Kudos!!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Memory-Bounded Hash Aggregation

2020-03-18 Thread Jeff Davis


Committed.

There's some future work that would be nice (some of these are just
ideas and may not be worth it):

* Refactor MemoryContextMemAllocated() to be a part of
MemoryContextStats(), but allow it to avoid walking through the blocks
and freelists.

* Improve the choice of the initial number of buckets in the hash
table. For this patch, I tried to preserve the existing behavior of
estimating the number of groups and trying to initialize with that many
buckets. But my performance tests seem to indicate this is not the best
approach. More work is needed to find what we should really do here.

* For workloads that are not in work_mem *or* system memory, and need
to actually go to storage, I see poor CPU utilization because it's not
effectively overlapping CPU and IO work. Perhaps buffering or readahead
changes can improve this, or async IO (even better).

* Project unnecessary attributes away before spilling tuples to disk.

* Improve logtape.c API so that the caller doesn't need to manage a
bunch of tape numbers.

* Improve estimate of the hash entry size. This patch doesn't change
the way the planner estimates it, but I observe that actual size as
seen at runtime is significantly different. This is connected to the
initial number of buckets for the hash table.

* In recursive steps, I don't have a good estimate for the number of
groups, so I just estimate it as the number of tuples in that spill
tape (which is pessimistic). That could be improved by doing a real
cardinality estimate as the tuples are spilling (perhaps with HLL?).

* Many aggregates with pass-by-ref transition states don't provide a
great aggtransspace. We should consider doing something smarter, like
having negative numbers represent a number that should be multiplied by
the size of the group (e.g. ARRAY_AGG would have a size dependent on
the group size, not a constant).

* If we want to handle ARRAY_AGG (and the like) well, we can consider
spilling the partial states in the hash table when the memory is full.
That would add a fair amount of complexity because there would be two
types of spilled data (tuples and partial states), but it could be
useful in some cases.

Regards,
Jeff Davis






Re: Memory-Bounded Hash Aggregation

2020-03-18 Thread Tomas Vondra

On Wed, Mar 18, 2020 at 04:35:57PM -0700, Jeff Davis wrote:


Committed.



\o/

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Collation versioning

2020-03-18 Thread Michael Paquier
On Wed, Mar 18, 2020 at 09:29:55PM +0100, Peter Eisentraut wrote:
> OK, I have committed the regcollation patch, and some surrounding cleanup of
> the reg* types documentation.

Thanks, Peter.
--
Michael


signature.asc
Description: PGP signature


Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-18 Thread Tom Lane
Here's a pretty-nearly-final version of the patch.

In 0001 below, I've left it throwing an error for the case of all
ANYCOMPATIBLE inputs being unknown, but the documentation fails to
acknowledge that.  0002 below is a delta patch that switches to the
other approach of resolving as TEXT.  I'm pretty well convinced that
0002 is what we should do, so I have not bothered to write a doc
change that would explain 0001's behavior on this point.

regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 157fe4e..aa634ea 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4789,6 +4789,22 @@ SELECT * FROM pg_attribute

 

+anycompatible
+   
+
+   
+anycompatiblearray
+   
+
+   
+anycompatiblenonarray
+   
+
+   
+anycompatiblerange
+   
+
+   
 void

 
@@ -4894,6 +4910,35 @@ SELECT * FROM pg_attribute

 

+anycompatible
+Indicates that a function accepts any data type,
+with automatic promotion of multiple arguments to a common data type
+(see ).
+   
+
+   
+anycompatiblearray
+Indicates that a function accepts any array data type,
+with automatic promotion of multiple arguments to a common data type
+(see ).
+   
+
+   
+anycompatiblenonarray
+Indicates that a function accepts any non-array data type,
+with automatic promotion of multiple arguments to a common data type
+(see ).
+   
+
+   
+anycompatiblerange
+Indicates that a function accepts any range data type,
+with automatic promotion of multiple arguments to a common data type
+(see  and
+).
+   
+
+   
 cstring
 Indicates that a function accepts or returns a null-terminated C string.

@@ -4960,7 +5005,7 @@ SELECT * FROM pg_attribute
 

 Functions coded in C (whether built-in or dynamically loaded) can be
-declared to accept or return any of these pseudo data types.  It is up to
+declared to accept or return any of these pseudo-types.  It is up to
 the function author to ensure that the function will behave safely
 when a pseudo-type is used as an argument type.

@@ -4971,10 +5016,9 @@ SELECT * FROM pg_attribute
 languages forbid use of a pseudo-type as an argument type, and allow
 only void and record as a result type (plus
 trigger or event_trigger when the function is used
-as a trigger or event trigger).  Some also
-support polymorphic functions using the types anyelement,
-anyarray, anynonarray, anyenum, and
-anyrange.
+as a trigger or event trigger).  Some also support polymorphic functions
+using the polymorphic pseudo-types, which are shown above and discussed
+in detail in .

 

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 97706da..930aeb7 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -229,21 +229,114 @@

 
 
- Five pseudo-types of special interest are anyelement,
- anyarray, anynonarray, anyenum,
- and anyrange,
- which are collectively called polymorphic types.
- Any function declared using these types is said to be
- a polymorphic function.  A polymorphic function can
- operate on many different data types, with the specific data type(s)
- being determined by the data types actually passed to it in a particular
- call.
+ Some pseudo-types of special interest are the polymorphic
+ types, which are used to declare polymorphic
+ functions.  This powerful feature allows a single function
+ definition to operate on many different data types, with the specific
+ data type(s) being determined by the data types actually passed to it
+ in a particular call.  The polymorphic types are shown in
+ .  Some examples of
+ their use appear in .
 
 
+
+ Polymorphic Types
+ 
+  
+   
+Name
+Family
+Description
+   
+  
+
+  
+   
+anyelement
+Simple
+Indicates that a function accepts any data type
+   
+
+   
+anyarray
+Simple
+Indicates that a function accepts any array data type
+   
+
+   
+anynonarray
+Simple
+Indicates that a function accepts any non-array data type
+   
+
+   
+anyenum
+Simple
+Indicates that a function accepts any enum data type
+(see )
+
+   
+
+   
+anyrange
+Simple
+Indicates that a function accepts any range data type
+(see )
+
+   
+
+   
+anycompatible
+Common
+Indicates that a function accepts any data type,
+with automatic promotion of multiple arguments to a common data type
+
+   
+
+   
+anycompatibl

Re: Thinko in index_concurrently_swap comment

2020-03-18 Thread Michael Paquier
On Wed, Mar 18, 2020 at 03:33:40PM +0100, Julien Rouhaud wrote:
> While looking at RIC for the collation versioning issue Michael raised 
> earlier,
> I found a thinko in a nearby comment, apparently introduced in the original
> REINDEX CONCURRENTLY patch (5dc92b8).  Trivial patch attached.

Thanks, Julien.  Fixed as of d41202f.
--
Michael


signature.asc
Description: PGP signature


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Amit Langote
Hello,

On Thu, Mar 19, 2020 at 1:47 AM Fujii Masao  wrote:
> On 2020/03/19 1:22, Magnus Hagander wrote:
> > Would it perhaps be better to return NULL instead of 0 in the
> > statistics view if there is no data?
> >>>
> >>> Did you miss this comment, or not agree? :)
> >>
> >> Oh, I forgot to attached the patch... Patch attached.
> >> This patch needs to be applied after applying
> >> add_no_estimate_size_v3.patch.
> >
> > :)
> >
> > Hmm. I'm slightly irked by doing the -1 -> NULL conversion in the SQL
> > view.  I wonder if it might be worth teaching
> > pg_stat_get_progress_info() about returning NULL?
>
> That's possible by
> - adding the boolean array like st_progress_null[PGSTAT_NUM_PROGRESS_PARAM]
> that indicates whether each column is NULL or not, into PgBackendStatus
> - extending pgstat_progress_update_param() and 
> pgstat_progress_update_multi_param()
> so that they can update the boolean array for NULL
> - updating the progress reporting code so that the extended versions of
> function are used
>
> I didn't adopt this idea because it looks a bit overkill for the purpose.

I tend to agree that this would be too many changes for something only
one place currently needs to use.

> OTOH, this would be good improvement for the progress reporting
> infrastructure and I'm fine to implement it.

Magnus' idea of checking the values in pg_stat_get_progress_info() to
determine whether to return NULL seems fine to me.  We will need to
update the documentation of st_progress_param, because it currently
says:

 *  ...but the meaning of each element in the
 * st_progress_param array is command-specific.
 */
ProgressCommandType st_progress_command;
Oid st_progress_command_target;
int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
} PgBackendStatus;

If we are to define -1 in st_progress_param[] as NULL to the users,
that must be mentioned here.
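
For reference, the centralized variant could look roughly like this
inside pg_stat_get_progress_info() (a sketch only; the INT64_MIN marker
is an assumption, not something the current patch defines - columns 0..2
are pid, datid and the command target, as in the existing function):

    for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
    {
        int64       val = beentry->st_progress_param[i];

        if (val == INT64_MIN)   /* assumed marker for "undefined" */
            nulls[i + 3] = true;
        else
            values[i + 3] = Int64GetDatum(val);
    }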

-- 
Thank you,
Amit




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-19, Amit Langote wrote:

> Magnus' idea of checking the values in pg_stat_get_progress_info() to
> determine whether to return NULL seems fine to me.  We will need to
> update the documentation of st_progress_param, because it currently
> says:
> 
>  *  ...but the meaning of each element in the
>  * st_progress_param array is command-specific.
>  */
> ProgressCommandType st_progress_command;
> Oid st_progress_command_target;
> int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> } PgBackendStatus;
> 
> If we are to define -1 in st_progress_param[] as NULL to the users,
> that must be mentioned here.

Hmm, why -1?  It seems like a value that we might want to use for other
purposes in other params.  Maybe INT64_MIN is a better choice?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: walreceiver uses a temporary replication slot by default

2020-03-18 Thread Michael Paquier
On Tue, Mar 17, 2020 at 11:39:11PM +0300, Sergei Kornilov wrote:
> We need this change to set is_temp_slot properly. PrimarySlotName
> GUC can usually be an empty string, so just "slotname != NULL" is
> not enough.

Yep, or a temporary slot would never be created even if there is no
slot defined, and the priority goes to primary_slot_name if set.

> I attached patch with this change.

Thanks, I have added a new open item for v13 to track this effort:
https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
--
Michael


signature.asc
Description: PGP signature


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Amit Langote
On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
 wrote:
> On 2020-Mar-19, Amit Langote wrote:
>
> > Magnus' idea of checking the values in pg_stat_get_progress_info() to
> > determine whether to return NULL seems fine to me.  We will need to
> > update the documentation of st_progress_param, because it currently
> > says:
> >
> >  *  ...but the meaning of each element in the
> >  * st_progress_param array is command-specific.
> >  */
> > ProgressCommandType st_progress_command;
> > Oid st_progress_command_target;
> > int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> > } PgBackendStatus;
> >
> > If we are to define -1 in st_progress_param[] as NULL to the users,
> > that must be mentioned here.
>
> Hmm, why -1?  It seems like a value that we might want to use for other
> purposes in other params.  Maybe INT64_MIN is a better choice?

Yes, maybe.

-- 
Thank you,
Amit




Re: Add A Glossary

2020-03-18 Thread Corey Huinker
On Fri, Mar 13, 2020 at 12:18 AM Jürgen Purtz  wrote:

>
> The statement that names of schema objects are unique isn't *strictly* true,
> just *mostly* true. Take the case of a unique constraints.
>
> Concerning CONSTRAINTS you are right. Constraints seem to be an exception:
>
>- Their names belong to a schema, but are not necessarily unique
>within this context:
>https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
>- There is a UNIQUE index within the system catalog pg_constraints:  
> "pg_constraint_conrelid_contypid_conname_index"
>UNIQUE, btree (conrelid, contypid, conname), which expresses that
>names are unique within the context of a table/constraint-type.
>Nevertheless, tests have shown that some stronger restrictions exist
>across table borders (which seem to be implemented in CREATE statements
>- or as a consequence of the correlation you mentioned between
>constraint and index?).
>
> I hope that there are no more such exceptions to the global rule 'object
> names in a schema are unique':
> https://www.postgresql.org/docs/current/sql-createschema.html
>
> These facts must be mentioned as a short note in the glossary and in more
> detail in the later patch about the architecture.
>
>
> I did what I could to address the near-uniqueness, as well as
incorporating your earlier edits into this new, squashed patch attached.
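
For the record, the exception Jürgen describes is easy to demonstrate
(hypothetical table and constraint names):

    -- CHECK constraint names are only unique per (table, type, name),
    -- so two tables in one schema may reuse the same name:
    CREATE TABLE ta (x int, CONSTRAINT positive CHECK (x > 0));
    CREATE TABLE tb (y int, CONSTRAINT positive CHECK (y > 0));  -- works

    -- The stronger cross-table restriction shows up for UNIQUE and
    -- PRIMARY KEY constraints, whose backing indexes live in the
    -- schema-wide relation namespace:
    CREATE TABLE tc (x int, CONSTRAINT u UNIQUE (x));
    CREATE TABLE td (y int, CONSTRAINT u UNIQUE (y));  -- ERROR: relation "u" already exists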
From dbce6922194eb4ad8de57e81e182b9a6eebf859e Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 10 Mar 2020 11:26:29 -0400
Subject: [PATCH] add glossary page with revisions

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1072 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1074 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..d28bfb6fcf
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1072 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of PostgreSQL and Databases in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, which may not be of the same type as the original values. Aggregate Functions combine multiple Rows that share a common set of values into one Row, which means that the only data visible is the values in common, plus the aggregates of the non-common data.
+ 
+ 
+  For more information, see Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference values found in nearby Rows of the same Result Set.
+ 
+ 
+  For more information, see Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to reclaim space on the file system.
+ 
+ 
+  For more information, see Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or Datum: cannot be broken down into smaller components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in part: it must either entirely succeed or entirely fail. A series of SQL statements can be combined into a Transaction, and that transaction is said to be Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC Records of the Heap and Index.
+ 
+ 
+  For more information, see Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which run system- or user-supplied code. Typical use cases are processes which handle parts of an SQL query to take advantage of parallel execution on servers with multiple CPUs.
+
+
+ For more information, see Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Continuously writes dirty pages from Shared Memory to the file system. It starts periodically, but works only for a short period in order to distribute
+expensive I/O activity over time instead of generating fewe

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-19, Amit Langote wrote:

> On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
>  wrote:
> > On 2020-Mar-19, Amit Langote wrote:
> >
> > > Magnus' idea of checking the values in pg_stat_get_progress_info() to
> > > determine whether to return NULL seems fine to me.  We will need to
> > > update the documentation of st_progress_param, because it currently
> > > says:
> > >
> > >  *  ...but the meaning of each element in the
> > >  * st_progress_param array is command-specific.
> > >  */
> > > ProgressCommandType st_progress_command;
> > > Oid st_progress_command_target;
> > > int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> > > } PgBackendStatus;
> > >
> > > If we are to define -1 in st_progress_param[] as NULL to the users,
> > > that must be mentioned here.
> >
> > Hmm, why -1?  It seems like a value that we might want to use for other
> > purposes in other params.  Maybe INT64_MIN is a better choice?
> 
> Yes, maybe.

Looking at the code involved, I think it's okay to use -1 in that
specific param and teach the SQL query to display null when it finds
that value.  We have plenty of magic numbers in the progress params, and
it's always the definition of the view that interprets them.
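
In other words, something along these lines in the view definition would
do (a sketch - the column mapping here is illustrative, not the actual
catalog definition):

    CREATE VIEW pg_stat_progress_basebackup AS
        SELECT s.pid,
               CASE s.param2 WHEN -1 THEN NULL
                             ELSE s.param2 END AS backup_total,
               s.param3 AS backup_streamed
        FROM pg_stat_get_progress_info('BASEBACKUP') AS s;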

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: type of some table storage params on doc

2020-03-18 Thread Michael Paquier
On Wed, Mar 18, 2020 at 02:29:17PM -0300, Alvaro Herrera wrote:
> I don't think it will, directly; debian.codesearch.net says only patroni
> and slony1-2 contain the "parse_real", and both have their own
> implementations (patroni is Python anyway).  I didn't find any
> relopt_real anywhere.

Reloptions in general are not used much in extensions, and one could
assume that reloptions of type real (well, double) are even less.

> However, if we were to rename DefineCustomRealVariable() to match, that
> would no doubt hurt a lot of people.  We also have GucRealCheckHook and
> GucRealAssignHook typedefs, but those appear to hit no Debian package.
> (In guc.c, the fallout rabbit hole goes pretty deep, but that seems well
> localized.)

I make use of this API myself, for some personal stuff, and even some
internal company stuff.  And I am ready to bet that it is much more
popular than its reloption cousin mainly for bgworkers.  Hence a
rename would need a compatibility layer remaining around.  Honestly, I
am not sure that a rename is worth it.

> I don't think the last pg13 CF is when to be spending time on this,
> though.

Indeed.

Do any of you have any arguments against the patch proposed upthread
switching "float4" to "floating point" then?  Better be sure..
--
Michael


signature.asc
Description: PGP signature


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Fujii Masao




On 2020/03/19 11:32, Amit Langote wrote:

On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
 wrote:

On 2020-Mar-19, Amit Langote wrote:


Magnus' idea of checking the values in pg_stat_get_progress_info() to
determine whether to return NULL seems fine to me.


So you think that the latest patch is good enough?


 We will need to
update the documentation of st_progress_param, because it currently
says:

  *  ...but the meaning of each element in the
  * st_progress_param array is command-specific.
  */
 ProgressCommandType st_progress_command;
 Oid st_progress_command_target;
 int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
} PgBackendStatus;

If we are to define -1 in st_progress_param[] as NULL to the users,
that must be mentioned here.


Hmm, why -1?  It seems like a value that we might want to use for other
purposes in other params.  Maybe INT64_MIN is a better choice?


Yes, maybe.


I don't think that we need to define a specific value like -1 as NULL
globally.
Which value should be used for that purpose may vary by command. Only for
pg_stat_progress_basebackup.backup_total, IMO using -1 as a special value
for NULL is not such a bad idea.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: error context for vacuum to include block number

2020-03-18 Thread Amit Kapila
On Tue, Mar 17, 2020 at 11:51 AM Amit Kapila  wrote:
>
> On Tue, Mar 17, 2020 at 9:52 AM Amit Kapila  wrote:
> >
> >
> > Another minor point, don't we need to remove the error stack by doing
> > "error_context_stack = errcallback.previous;" in parallel_vacuum_main?
> >
>
> Few other comments:
> 1. The error in lazy_vacuum_heap can either have phase
> VACUUM_ERRCB_PHASE_INDEX_* or VACUUM_ERRCB_PHASE_VACUUM_HEAP depending
> on when it occurs.  If it occurs the first time it enters that
> function before a call to lazy_vacuum_page, it will use phase
> VACUUM_ERRCB_PHASE_INDEX_*, otherwise, it would use
> VACUUM_ERRCB_PHASE_VACUUM_HEAP.  The reason is lazy_vacuum_index or
> lazy_cleanup_index won't reset the phase after leaving that function.
>
> 2. Also once we set phase as VACUUM_ERRCB_PHASE_VACUUM_HEAP via
> lazy_vacuum_page, it doesn't seem to be reset to
> VACUUM_ERRCB_PHASE_SCAN_HEAP even when we do scanning of the heap.  I
> think you need to set phase VACUUM_ERRCB_PHASE_SCAN_HEAP inside loop.
>
> I think we need to be a bit more careful in setting/resetting the
> phase information correctly so that it doesn't display the wrong info
> in the context in an error message.
>

Justin, are you planning to work on the pending comments?  If you
want, I can try to fix some of these.  We have less time left for this
CF, so we need to do things a bit quicker.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-18 Thread Amit Langote
On Thu, Mar 19, 2020 at 11:45 AM Fujii Masao
 wrote:
> On 2020/03/19 11:32, Amit Langote wrote:
> > On Thu, Mar 19, 2020 at 11:24 AM Alvaro Herrera
> >  wrote:
> >> On 2020-Mar-19, Amit Langote wrote:
> >>
> >>> Magnus' idea of checking the values in pg_stat_get_progress_info() to
> >>> determine whether to return NULL seems fine to me.
>
> So you think that the latest patch is good enough?

I see that the latest patch modifies pg_stat_progress_basebackup view
to return NULL, so not exactly.  IIUC, Magnus seems to be advocating
to *centralize* this in pg_stat_get_progress_info(), which all views
are based on, which means we need to globally define a NULL param
value, as Alvaro also pointed out.

But...

> >>>  We will need to
> >>> update the documentation of st_progress_param, because it currently
> >>> says:
> >>>
> >>>   *  ...but the meaning of each element in the
> >>>   * st_progress_param array is command-specific.
> >>>   */
> >>>  ProgressCommandType st_progress_command;
> >>>  Oid st_progress_command_target;
> >>>  int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> >>> } PgBackendStatus;
> >>>
> >>> If we are to define -1 in st_progress_param[] as NULL to the users,
> >>> that must be mentioned here.
> >>
> >> Hmm, why -1?  It seems like a value that we might want to use for other
> >> purposes in other params.  Maybe INT64_MIN is a better choice?
> >
> > Yes, maybe.
>
> I don't think that we need to define the specific value like -1 as NULL 
> globally.
> Which value should be used for that purpose may vary by each command. Only for
> pg_stat_progress_basebackup.backup_total, IMO using -1 as special value for
> NULL is not so bad idea.

This is the first instance of needing to display NULL in a progress
view, so a non-general solution may be enough for now.  IOW, your
latest patch is good enough for that. :)

--
Thank you,
Amit




Re: type of some table storage params on doc

2020-03-18 Thread Alvaro Herrera
On 2020-Mar-19, Michael Paquier wrote:

> Do any of you have any arguments against the patch proposed upthread
> switching "float4" to "floating point" then?  Better be sure..

None here.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Collation versioning

2020-03-18 Thread Michael Paquier
On Wed, Mar 18, 2020 at 04:35:43PM +0100, Julien Rouhaud wrote:
> On Wed, Mar 18, 2020 at 09:56:40AM +0100, Julien Rouhaud wrote:
> AFAICT it was only missing a call to index_update_collation_versions() in
> ReindexRelationConcurrently.  I added regression tests to make sure that
> REINDEX, REINDEX [INDEX|TABLE] CONCURRENTLY and VACUUM FULL are doing what's
> expected.

If you add a call to index_update_collation_versions(), the old and
invalid index will use the same refobjversion as the new index, which
is the latest collation version of the system, no?  If the operation
is interrupted before the invalid index is dropped, then we would keep
a confusing value for refobjversion, because the old invalid index
does not rely on the new collation version, but on the old one.
Hence, it seems to me that it would be correct to have the old invalid
index either use an empty version string to say "we don't know"
because the index is invalid anyway, or keep a reference to the old
collation version intact.  I think that the latter is much more useful
for debugging issues when upgrading a subset of indexes if the
operation is interrupted for a reason or another.

> Given discussion in nearby threads, I obviously can't add tests for failed
> REINDEX CONCURRENTLY, so here's what's happening with a manual repro:
> 
> =# UPDATE pg_depend SET refobjversion = 'meh' WHERE refobjversion = '153.97';
> UPDATE 1

Updates to catalogs are not an existing practice in the core
regression tests, so patches had better not do that. :p 

> =# REINDEX TABLE CONCURRENTLY t1 ;
> LOCATION:  ReindexRelationConcurrently, indexcmds.c:2839
> ^CCancel request sent
> ERROR:  57014: canceling statement due to user request
> LOCATION:  ProcessInterrupts, postgres.c:3171

I guess that you used a second session here, beginning a transaction
before REINDEX CONCURRENTLY ran, so that it would stop after swapping
dependencies, right?

> =# SELECT objid::regclass, indisvalid, refobjversion
>FROM pg_depend d
>JOIN pg_index i ON i.indexrelid = d.objid
>WHERE refobjversion IS NOT NULL;
>   objid   | indisvalid | refobjversion
> --++---
>  t1_val_idx_ccold | f  | 153.97
>  t1_val_idx   | t  | meh
> (2 rows)
> 
> =# REINDEX TABLE t1;
> WARNING:  0A000: cannot reindex invalid index 
> "pg_toast.pg_toast_16418_index_ccold" on TOAST table, skipping
> LOCATION:  reindex_relation, index.c:3987
> REINDEX
> 
> =# SELECT objid::regclass, indisvalid, refobjversion
>FROM pg_depend d
>JOIN pg_index i ON i.indexrelid = d.objid
>WHERE refobjversion IS NOT NULL;
>   objid   | indisvalid | refobjversion
> --++---
>  t1_val_idx_ccold | t  | 153.97
>  t1_val_idx   | t  | 153.97
> (2 rows)
> 
> ISTM that it's working as intended.

After a non-concurrent reindex, this information is right.  However
based on the output of your test here, after REINDEX CONCURRENTLY the
information held in refobjversion is incorrect for t1_val_idx_ccold
and t1_val_idx.  They should be reversed.
--
Michael


signature.asc
Description: PGP signature


Re: Missing errcode() in ereport

2020-03-18 Thread Masahiko Sawada
On Wed, 18 Mar 2020 at 13:57, Amit Kapila  wrote:
>
> On Wed, Mar 18, 2020 at 9:01 AM Amit Kapila  wrote:
> >
> > On Tue, Mar 17, 2020 at 7:39 PM Tom Lane  wrote:
> > >
> > > Amit Kapila  writes:
> > > > On Tue, Mar 17, 2020 at 3:28 PM Michael Paquier  
> > > > wrote:
> > > >> Definitely an oversight.  All stable branches down to 9.5 have
> > > >> mistakes in the same area, with nothing extra by grepping around.
> > > >> Amit, I guess that you will take care of it?
> > >
> > > > Yes, I will unless I see any objections in a day or so.
> > >
> > > No need to wait, it's a pretty obvious thinko.
> > >
> >
> > Okay, I will push in some time.
> >
>
> Pushed.

Thank you!


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-18 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 08:20:51AM +0530, Amit Kapila wrote:
> On Tue, Mar 17, 2020 at 11:51 AM Amit Kapila  wrote:
> > On Tue, Mar 17, 2020 at 9:52 AM Amit Kapila  wrote:
> > >
> > > Another minor point, don't we need to remove the error stack by doing
> > > "error_context_stack = errcallback.previous;" in parallel_vacuum_main?

It's a good idea, thanks.
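
A sketch of the usual push/pop pattern for that (the callback and arg
names are assumed from the patch below):

    ErrorContextCallback errcallback;

    errcallback.callback = vacuum_error_callback;
    errcallback.arg = vacrelstats;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* ... vacuum the indexes assigned to this worker ... */

    /* pop the error context stack before returning */
    error_context_stack = errcallback.previous;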

> > Few other comments:
> > 1. The error in lazy_vacuum_heap can either have phase
> > VACUUM_ERRCB_PHASE_INDEX_* or VACUUM_ERRCB_PHASE_VACUUM_HEAP depending
> > on when it occurs.  If it occurs the first time it enters that
> > function before a call to lazy_vacuum_page, it will use phase
> > VACUUM_ERRCB_PHASE_INDEX_*, otherwise, it would use
> > VACUUM_ERRCB_PHASE_VACUUM_HEAP.  The reason is lazy_vacuum_index or
> > lazy_cleanup_index won't reset the phase after leaving that function.

I think you mean that lazy_vacuum_heap() calls ReadBuffer itself, so it
needs to be in the VACUUM_HEAP phase even before it calls
lazy_vacuum_page().
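
In code terms, using the fields from the patch below, the shape of
that fix would presumably be to claim the phase on entry to
lazy_vacuum_heap(), before the first buffer is read:

    /* sketch only: set the phase up front, not in lazy_vacuum_page() */
    vacrelstats->phase = VACUUM_ERRCB_PHASE_VACUUM_HEAP;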

> > 2. Also once we set phase as VACUUM_ERRCB_PHASE_VACUUM_HEAP via
> > lazy_vacuum_page, it doesn't seem to be reset to
> > VACUUM_ERRCB_PHASE_SCAN_HEAP even when we do scanning of the heap.  I
> > think you need to set phase VACUUM_ERRCB_PHASE_SCAN_HEAP inside loop.

You're right.  PHASE_SCAN_HEAP was set, but only inside a conditional.

Both those issues are due to a change in the most recent patch.  In the
previous patch, PHASE_VACUUM_HEAP was set only by lazy_vacuum_heap(), and I
recently moved it into lazy_vacuum_page().  But it needs to be set in both
places, as you point out.

That's unfortunate due to the lack of symmetry: lazy_vacuum_page() does its
own progress update, which suggests to me that it should also set its own
error-callback phase.  It'd be nicer if EITHER the calling functions did
that (lazy_scan_heap() and lazy_vacuum_heap()) or it were sufficient for
the called function (lazy_vacuum_page()) to do it.
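
One way to get the symmetry back would be a save/restore helper, so
any callee can set its own phase and put the caller's state back when
it is done.  A hypothetical sketch, reusing the struct fields from
the patch below:

    static void
    update_vacuum_error_info(LVRelStats *vacrelstats, errcb_phase phase,
                             BlockNumber blkno, LVRelStats *saved)
    {
        if (saved)
            *saved = *vacrelstats;  /* save the caller's state */
        vacrelstats->phase = phase;
        vacrelstats->blkno = blkno;
    }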

> > I think we need to be a bit more careful in setting/resetting the
> > phase information correctly so that it doesn't display the wrong info
> > in the context in an error message.
> 
> Justin, are you planning to work on the pending comments?  If you
> want, I can try to fix some of these.  We have less time left for this
> CF, so we need to do things a bit quicker.

Thanks for reminding.

-- 
Justin
From bc30a83b96ffcd55420d602eeadbd397fec31fd0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v25 1/4] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 202 +++
 1 file changed, 177 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..9f0efa5aad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum {
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_VACUUM_FSM,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	errcb_phase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +352,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+

Re: logical copy_replication_slot issues

2020-03-18 Thread Masahiko Sawada
On Wed, 18 Mar 2020 at 04:24, Alvaro Herrera  wrote:
>
> Thanks Arseny and Masahiko, I pushed this patch just now.  I changed
> some comments while at it, hopefully they are improvements.
>
> On 2020-Mar-09, Masahiko Sawada wrote:
>
> > ctx = CreateInitDecodingContext(plugin, NIL,
> > -   false,  /* do not build snapshot */
> > +   false,  /* do not build data snapshot */
> > restart_lsn,
> > logical_read_local_xlog_page, NULL, NULL,
> > NULL);
> >
> > I'm not sure this change makes the comment better. Could you elaborate
> > on the motivation of this change?
>
> I addressed this issue by adding a comment in CreateInitDecodingContext
> to explain the parameter, and then reference that comment's terminology
> in this call.  I think it ends up clearer overall -- not that this whole
> area is at all particularly clear.
>
> Thanks again.
>

Thank you for committing the patch! The changes look good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PATCH] Add schema and table names to partition error

2020-03-18 Thread Amit Kapila
On Thu, Mar 19, 2020 at 3:55 AM Chris Bandy  wrote:
>
>
> Sorry for these troubles. Attached are patches created using `git
> format-patch -n -v6` on master at 487e9861d0.
>

No problem.  I have extracted your code changes as a separate patch
(see attached) as I am not sure we want to add tests for these cases.
This doesn't apply in back-branches, but I think that is small work
and we can do that if required.  The real question is: do we want to
back-patch this?  Basically, it improves the errors in certain cases by
providing additional information that the user would otherwise need to
extract from the error message text.  So there doesn't seem to be a
pressing need to back-patch it, but OTOH we state in the docs [1] that
this information is displayed for all SQLSTATE class 23 (integrity
constraint violation) errors, which is not true, as we forgot to adhere
to that in some parts of the code.
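
For context, the code change itself boils down to attaching the
relation to the error report so that clients can pick up the schema
and table names from the error fields instead of parsing the message
text.  A simplified sketch of the pattern (the message wording is
illustrative):

    ereport(ERROR,
            (errcode(ERRCODE_CHECK_VIOLATION),
             errmsg("no partition of relation \"%s\" found for row",
                    RelationGetRelationName(rel)),
             errtable(rel)));   /* fills in the schema/table error fields */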

What do you think?  Does anybody else have an opinion on whether to
back-patch this or not?

[1] - https://www.postgresql.org/docs/devel/errcodes-appendix.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v7-0001-Add-object-names-to-partition-integrity-violation.patch
Description: Binary data


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Laurenz Albe
On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> I don't think a default scale factor of 0 is going to be ok. For
> large-ish tables this will basically cause permanent vacuums. And it'll
> sometimes trigger for tables that actually coped well so far. 10 million
> rows could be a few seconds, not more.
> 
> I don't think that the argument that otherwise a table might not get
> vacuumed before autovacuum_freeze_max_age is convincing enough.
> 
> a) if that's indeed the argument, we should increase the default
>   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
>   the main argument against that from before isn't valid anymore.
> 
> b) there's not really a good arguments for vacuuming more often than
>   autovacuum_freeze_max_age for such tables. It'll not be not frequent
>   enough to allow IOS for new data, and you're not preventing
>   anti-wraparound vacuums from happening.

According to my reckoning, that is the remaining objection to the patch
as it is (with ordinary freezing behavior).

How about a scale_factor of 0.005?  That will be high enough for large
tables, which seem to be the main concern here.
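
To put rough numbers on that, assuming the trigger condition follows
the shape of the existing autovacuum GUCs, i.e. inserts since the
last vacuum > threshold + scale_factor * reltuples:

    reltuples = 1 million  -> one insert-triggered vacuum per ~5,000 inserts
    reltuples = 1 billion  -> one insert-triggered vacuum per ~5,000,000 inserts

(plus the fixed threshold in both cases), so large tables would not
be vacuumed permanently.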

I fully agree with your point a) - should that be part of the patch?

I am not sure about b).  In my mind, the objective is not to prevent
anti-wraparound vacuums, but to see that they have less work to do,
because previous autovacuum runs already have frozen anything older than
vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
to freeze during any run would be at most one fourth of today's number
when we hit autovacuum_freeze_max_age.

I am still sorry to see more proactive freezing go, which would
reduce the impact for truly insert-only tables.
After sleeping on it, here is one last idea.

Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
for those parts of the table that will receive updates or deletes.
But what if insert-triggered vacuum operates with - say -
one tenth of vacuum_freeze_min_age (unless explicitly overridden
for the table)?  That might still be high enough not to needlessly
freeze too many tuples that will still be modified, but it will
reduce the impact on insert-only tables.
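
In code terms the idea amounts to something like this sketch (the
flag and variable names are hypothetical):

    /* Use a more aggressive freeze horizon for insert-triggered
     * vacuums, unless the table sets vacuum_freeze_min_age itself. */
    if (insert_triggered && !freeze_min_age_set_for_table)
        freeze_min_age = vacuum_freeze_min_age / 10;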

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> > Having now played with the patch, I'll suggest that 1000 is too high a
> > threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't 
> > be
> > much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum 
> > GUC.
> 
> ISTM that the danger of regressing workloads due to suddenly repeatedly
> scanning huge indexes that previously were never / rarely scanned is
> significant (if there are a few dead tuples; otherwise most indexes will
> be able to skip the scan since the vacuum_cleanup_index_scale_factor
> introduction).

We could try to avoid that issue here:

|/* If any tuples need to be deleted, perform final vacuum cycle */
|/* XXX put a threshold on min number of tuples here? */
|if (dead_tuples->num_tuples > 0)
|{
|/* Work on all the indexes, and then the heap */
|lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
|lps, nindexes);
|
|/* Remove tuples from heap */
|lazy_vacuum_heap(onerel, vacrelstats);
|}

As you said, an insert-only table can skip scanning indexes, but an
insert-mostly table currently cannot.

Maybe we could skip the final index scan if we hit the autovacuum insert
threshold?
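
In terms of the block quoted above, that might look roughly like the
following (the flag and threshold names are hypothetical):

    /* Skip the index/heap cleanup cycle when this run was triggered
     * by the insert threshold and only a few dead tuples exist; a
     * later vacuum will pick them up. */
    if (dead_tuples->num_tuples > 0 &&
        !(triggered_by_inserts &&
          dead_tuples->num_tuples < min_dead_tuples_for_cleanup))
    {
        lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
                                lps, nindexes);
        lazy_vacuum_heap(onerel, vacrelstats);
    }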

I still don't like mixing the thresholds with the behavior they imply, but
maybe what's needed is better docs describing all of vacuum's roles and its
procedure and priority in executing them.

The dead tuples would just be cleaned up during a future vacuum, right?  So
that would be less efficient, but (no surprise) there's a balance to strike,
and it can be tuned.  I think that wouldn't be an issue for most people; the
worst case would be someone who sets a high maintenance_work_mem and a low
insert threshold and gets increased bloat, in exchange for faster vacuums
that skip the index scans.

That might allow more flexibility in our discussion around default values for
thresholds for insert-triggered vacuum.

-- 
Justin




Re: RecoveryWalAll and RecoveryWalStream wait events

2020-03-18 Thread Fujii Masao




On 2020/03/18 22:37, Atsushi Torikoshi wrote:


On Wed, Mar 18, 2020 at 6:59 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:


I meant the following part in the doc.

-
At startup, the standby begins by restoring all WAL available in the archive
location, calling restore_command. Once it reaches the end of WAL available
there and restore_command fails, it tries to restore any WAL available in 
the
pg_wal directory. If that fails, and streaming replication has been 
configured,
the standby tries to connect to the primary server and start streaming WAL 
from
the last valid record found in archive or pg_wal. If that fails or streaming
replication is not configured, or if the connection is later disconnected,
the standby goes back to step 1 and tries to restore the file from the 
archive
again. This loop of retries from the archive, pg_wal, and via streaming
replication goes on until the server is stopped or failover is triggered by 
a
trigger file.
-
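
The loop described above, in rough pseudocode (a simplification, not
the actual control flow of WaitForWALToBecomeAvailable):

    for (;;)
    {
        if (restore_from_archive())     /* restore_command */
            break;                      /* got WAL, go replay it */
        if (read_from_pg_wal())
            break;
        if (streaming_configured && stream_from_primary())
            break;
        /* every source failed: start over from the archive */
    }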


Thanks!

 > It seems the comment on WaitForWALToBecomeAvailable()
 > does not match high-availability.sgml; do we need to
 > modify the comment on the function?

Not for now, I think.  But would you like to improve the docs?


I'll do it.

 >> But, anyway, you think that "pg_wal" should be used instead
 >> of "local" here?
 >
 > I don't have a special opinion here.
 > It might be better because high-availability.sgml does not use
 > "local" but "pg_wal" for the explanation, but I also feel it's
 > obvious in this context.

Ok, I changed that from "local" to "pg_wal" in the patch for
the master. Attached is the updated version of the patch.
If you're OK with this, I'd like to commit two patches that I posted
in this thread.


  Thanks for your modification and it looks good to me.


Pushed! Thanks a lot!

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Memory-Bounded Hash Aggregation

2020-03-18 Thread Justin Pryzby
On Sun, Mar 15, 2020 at 04:05:37PM -0700, Jeff Davis wrote:
> > +   if (from_tape)
> > +   partition_mem += HASHAGG_READ_BUFFER_SIZE;
> > +   partition_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
> > 
> > => That looks wrong ; should say += ?
> 
> Good catch! Fixed.

> +++ b/src/backend/executor/nodeAgg.c
> @@ -2518,9 +3499,36 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
>*/
>   if (use_hashing)
>   {
> + Plan   *outerplan = outerPlan(node);
> + uint64  totalGroups = 0;
> + for (i = 0; i < aggstate->num_hashes; i++)
> + totalGroups = aggstate->perhash[i].aggnode->numGroups;
> +
> + hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,

I realize that I missed the train but .. that looks like another += issue?
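
Presumably the intent there is to accumulate rather than overwrite,
i.e.:

    for (i = 0; i < aggstate->num_hashes; i++)
        totalGroups += aggstate->perhash[i].aggnode->numGroups;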

Also, Andres was educating me about the platform-dependent range of the
"long" type, and I see now while rebasing that you made the same change.
https://www.postgresql.org/message-id/20200306175859.d56ohskarwldyrrw%40alap3.anarazel.de

-- 
Justin