Re: Deleting older versions in unique indexes to avoid page splits

2020-10-24 Thread Simon Riggs
On Fri, 23 Oct 2020 at 18:14, Peter Geoghegan  wrote:

> It's obvious that a page split is more expensive than the delete
> operation (when it works out).

The problem I highlighted is that the average UPDATE latency is x2
what it is on current HEAD. That is not consistent with the reported
TPS, so it remains an issue and that isn't obvious.

> It doesn't need a microbenchmark (and I
> really can't think of one that would make any sense).

I'm asking for detailed timings so we can understand the latency
issue. I didn't ask for a microbenchmark.

I celebrate your results, but we do need to understand the issue, somehow.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour

2020-10-24 Thread Stephen Frost
Greetings,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 23/10/2020 17:51, Tom Lane wrote:
> >But anyway, this was about documentation not code.  What I'm wondering
> >about is when to drop things like, say, this bit in the regex docs:
> >
> > Two significant incompatibilities exist between AREs and the ERE syntax
> > recognized by pre-7.4 releases of PostgreSQL:
> > (etc etc)
> >
> >Seems like we could have gotten rid of that by now, but when exactly
> >does it become fair game?  And can we have a non-ad-hoc process for
> >getting rid of such cruft?
> 
> Let's try to zoom in on a rule:
> 
> Anything that talks about 9.4 or above (min supported version - 1) should
> definitely be left in place.

Sure.

> Something around 9.0 is possibly still useful to someone upgrading or
> updating an application. Or someone might still bump into old blog posts
> from that era.

Right- going back ~10 years.  (I do think it'd be good to have an actual
policy rather than just "well, right now, it seems like 9.0 is about
right").

> Before that, I don't see much value. Although you could argue that I jumped
> the gun on the notice about pre-8.2 pg_dump -t behavior. pg_dump still
> supports servers down to 8.0, so someone might also have an 8.0 pg_dump
> binary lying around, and might be confused that -t behaves differently. On
> the whole though, I think removing it was fair game.

I don't really have an issue with it, to be clear.  I had been hoping we
might be able to come up with a general rule to apply across both
documentation and code (in particular, pg_dump), but that doesn't seem
to be the case.  That does mean that *some* documentation might end up
needing to keep notes from before 9.0, where that documentation is about
pg_dump and older versions.

> I did some grepping for strings like "version 7", "pre-8" and so on. I
> couldn't come up with a clear rule on what could be removed. Context
> matters. In text that talks about protocol versions or libpq functions like
> PQlibVersion() it seems sensible to go back as far as possible, for the
> completeness. And subtle user-visible differences in behavior are more
> important to document than changes in internal C APIs that cause a compiler
> failure, for example.

I agree that context matters.

> Other notices are about old syntax that's kept for backwards compatibility,
> but still works. It makes sense to mention the old version in those cases,
> even if it's very old, because the alternative would be to just say
> something like "very old version", which is not any shorter, just less
> precise.

I would argue that we shouldn't be keeping things around for backwards
compatibility, in general.  If we feel that it's a good feature to keep
then let's keep it and just document it as an alternative syntax.

> Findings in detail follow. And attached is a patch about the stuff that I
> think can be removed pretty straightforwardly.

Thanks a lot for spending the time going through all of this!

> array.sgml:
>   
>If the value written for an element is NULL (in any
> case
>variant), the element is taken to be NULL.  The presence of any quotes
>or backslashes disables this and allows the literal string value
>NULL to be entered.  Also, for backward compatibility with
>pre-8.2 versions of PostgreSQL, the <xref linkend="guc-array-nulls"/> configuration parameter can be turned
>off to suppress recognition of NULL
> as a NULL.
>   
> 
> The GUC still exists, so we should keep this.

I agree we should keep the documentation as long as the GUC exists- but
we should be considering getting rid of the GUC.

> catalogs.sgml:
>   
>The view pg_group exists for backwards
>compatibility: it emulates a catalog that existed in
>PostgreSQL before version 8.1.
>It shows the names and members of all roles that are marked as not
>rolcanlogin, which is an approximation to the
> set
>of roles that are being used as groups.
>   
> 
> pg_group still exists, and that paragraph explains why. We should keep it.
> (There's a similar paragraph for pg_shadow)

When I wrote that, many many years ago (apparently about 15, looking
back at the commit responsible), I certainly didn't expect we'd still
have them today.  These views, which have been only haphazardly
maintained and which don't really represent the current system terribly
well, need to go.  In hindsight, introducing them was a mistake in the
first place.  We support 5 major versions for a reason and people should
be updating their code as we make changes- as they have to do in lots of
other parts of the system and for other catalogs (consider the v10
changes of XLOG -> WAL).

If we keep the views we should keep the documentation, of course, but
it's long, long past time to rip out pg_user, pg_group, and pg_shadow.
Now that we have column-level privileges, I'd think we could probably
get rid of pg_roles too as not really providing much value.

> config.sgml (on synchronized_scans):

Re: Move catalog toast table and index declarations

2020-10-24 Thread John Naylor
On Thu, Oct 22, 2020 at 6:21 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
> [v1]

Hi Peter,

This part created a syntax error:

--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -28,7 +28,7 @@ chdir $FindBin::RealBin or die "could not cd to
$FindBin::RealBin: $!\n";
 use lib "$FindBin::RealBin/../../backend/catalog/";
 use Catalog;

-my @input_files = (glob("pg_*.h"), qw(indexing.h));
+my @input_files = (glob("pg_*.h");

Style: In genbki.h, "extern int no_such_variable" is now out of place.
Also, the old comments like "The macro definitions are just to keep the C
compiler from spitting up." are now redundant in their new setting.

The rest looks good to me. unused_oids (once fixed), duplicate_oids, and
renumber_oids.pl seem to work fine.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Deleting older versions in unique indexes to avoid page splits

2020-10-24 Thread Peter Geoghegan
On Sat, Oct 24, 2020 at 2:55 AM Simon Riggs  wrote:
> The problem I highlighted is that the average UPDATE latency is x2
> what it is on current HEAD. That is not consistent with the reported
> TPS, so it remains an issue and that isn't obvious.

Why do you say that? I reported that the UPDATE latency is less than
half for the benchmark.

There probably are some workloads with worse latency and throughput,
but generally only with high contention/small indexes. I'll try to
fine tune those, but some amount of it is probably inevitable. On
average query latency is quite a lot lower with the patch (where it is
affected at all - the mechanism is only used with non-hot updates).

-- 
Peter Geoghegan




Re: Timing of relcache inval at parallel worker init

2020-10-24 Thread Noah Misch
On Wed, Oct 21, 2020 at 11:31:54AM -0400, Robert Haas wrote:
> On Sat, Oct 17, 2020 at 7:53 AM Noah Misch  wrote:
> > Let's move InvalidateSystemCaches() later.
> > Invalidation should follow any worker initialization step that changes the
> > results of relcache validation; otherwise, we'd need to ensure the
> > InvalidateSystemCaches() will not validate any relcache entry.
> 
> The thinking behind the current placement was this: just before the
> call to InvalidateSystemCaches(), we restore the active and
> transaction snapshots. I think that we must now flush the caches
> before anyone does any more lookups; otherwise, they might get wrong
> answers. So, putting this code later makes the assumption that no such
> lookups happen meanwhile. That feels a little risky to me; at the very
> least, it should be clearly spelled out in the comments that no system
> cache lookups can happen in the functions we call in the interim.

My comment edits did attempt that.  I could enlarge those comments, at the
risk of putting undue weight on the topic.  One could also arrange for an
assertion failure if something takes a snapshot in the unwelcome period,
between StartParallelWorkerTransaction() and InvalidateSystemCaches().
Looking closer, we have live bugs from lookups during RestoreGUCState():

$ echo "begin; create user alice; set session authorization alice; set 
force_parallel_mode = regress; select 1" | psql -X
BEGIN
CREATE ROLE
SET
SET
ERROR:  role "alice" does not exist
CONTEXT:  while setting parameter "session_authorization" to "alice"

$ echo "begin; create text search configuration foo (copy=english); set 
default_text_search_config = foo; set force_parallel_mode = regress; select 1" 
| psql -X
BEGIN
CREATE TEXT SEARCH CONFIGURATION
SET
SET
ERROR:  invalid value for parameter "default_text_search_config": "public.foo"
CONTEXT:  while setting parameter "default_text_search_config" to "public.foo"

I gather $SUBJECT is the tip of an iceberg, so I'm withdrawing the patch and
abandoning the topic.




Re: pg_dump, ATTACH, and independently restorable child partitions

2020-10-24 Thread Justin Pryzby
On Fri, Oct 23, 2020 at 12:29:40AM -0500, Justin Pryzby wrote:
> Since this commit, pg_dump CREATEs tables and then ATTACHes them:
> 
> |commit 33a53130a89447e171a8268ae0b221bb48af6468
> |Author: Alvaro Herrera 
> |Date:   Mon Jun 10 18:56:23 2019 -0400
> |
> |Make pg_dump emit ATTACH PARTITION instead of PARTITION OF (reprise)
> |...
> |This change also has the advantage that the partition is restorable from
> |the dump (as a standalone table) even if its parent table isn't
> |restored.
> 
> I like the idea of child tables being independently restorable, but it doesn't
> seem to work.
...
> Now that I look, it seems like this is calling PQexec(), which sends a single,
> "simple" libpq message with:
> |CREATE TABLE ..; ALTER TABLE .. ATTACH PARTITION;
> ..which is transactional, so when the 2nd command fails, the CREATE is rolled 
> back.
> https://www.postgresql.org/docs/9.5/libpq-exec.html#LIBPQ-EXEC-MAIN

The easy fix is to add an explicit begin/commit.

-- 
Justin
>From 4ca8cf7de0e575b0175fbc3a5797b75857bd2fab Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 24 Oct 2020 14:51:18 -0500
Subject: [PATCH] pg_dump: Allow child partitions to be independently restored

..even if the parent doesn't exist, or has missing/incompatible columns

This seems to have been intended by commit 33a53130a
---
 src/bin/pg_dump/pg_dump.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 40844fa1e7..8a783f1d3e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15676,6 +15676,13 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
binary_upgrade_set_pg_class_oids(fout, q,

 tbinfo->dobj.catId.oid, false);
 
+   /*
+* Use explicit transaction commands so that failure in a
+* following command in the same txn doesn't cause ROLLBACK of
+* the "CREATE TABLE".
+*/
+   appendPQExpBufferStr(q, "begin;\n");
+
appendPQExpBuffer(q, "CREATE %s%s %s",
  tbinfo->relpersistence == 
RELPERSISTENCE_UNLOGGED ?
  "UNLOGGED " : "",
@@ -15896,6 +15903,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
}
else
appendPQExpBufferStr(q, ";\n");
+   appendPQExpBufferStr(q, "commit;\n");
 
/* Materialized views can depend on extensions */
if (tbinfo->relkind == RELKIND_MATVIEW)
-- 
2.17.0



Re: Parallel Append can break run-time partition pruning

2020-10-24 Thread David Rowley
On Fri, 16 Oct 2020 at 03:01, Amit Langote  wrote:
>
> Going over the last few emails, it seems that David held off from
> committing the patch, because of the lack of confidence in its
> robustness.   With the patch, a sub-partitioned child's partial
> Append's child paths will *always* be pulled up into the parent's set
> of partial child paths thus preventing the nesting of Appends, which
> the run-time pruning code can't currently cope with.  The lack of
> confidence seems to be due to the fact that always pulling up a
> sub-Append's child paths into the parent partial Append's child paths
> *may* cause the latter to behave wrongly under parallelism and the new
> code structure will prevent add_partial_path() from being the
> arbitrator of whether such a path is really the best in terms of cost.
>
> If we can't be confident in that approach, maybe we should consider
> making the run-time pruning code cope with nested Appends?

I've been thinking about this and my thoughts are:

There are other cases where we don't pullup sub-Merge Append nodes
anyway, so I think we should just make run-time pruning work for more
than just the top-level Append/Merge Append.

The case I'm thinking about is the code added in 959d00e9dbe for
ordered Append scans.  It's possible a sub-partitioned table has
partitions which don't participate in the same ordering.  We need to
keep the sub-Merge Append in those cases... well, at least when
there's more than 1 partition remaining after plan-time pruning.

I've attached the patch which, pending a final look, I'm proposing to
commit for master only.  I just don't quite think this is a bug fix,
and certainly, due to the invasiveness of the proposed patch, that
means no backpatch.

I fixed all the stuff about the incorrectly set partitioned_rels list.
What I ended up with there is making it accumulate_append_subpath's
job to also pullup the sub-paths partitioned_rels fields when pulling
up a nested Append/MergeAppend.   If there's no pullup, then we
don't care about the sub-path's partitioned_rels.  That's for it to
deal with.  With that, I think that run-time pruning will only get the
RT indexes for partitions that we actually have sub-paths for. That's
pretty good, so I added an Assert() to verify that in
make_partitionedrel_pruneinfo().  (I hope I don't regret that later)

This does mean we need to maintain a different partitioned_rels list
for each Append path we consider.  So there's a number (six) of these
now between add_paths_to_append_rel() and
generate_orderedappend_paths().  To try to minimise the impact of
that, I've changed the field so that instead of being a List of
IntLists, it's just a List of Relids.  The top-level list just
contains a single element until you get a UNION ALL that selects from
a partitioned table on each side of the union.  Merging sub-path
partitioned rel RT indexes into the existing element is now pretty
cheap as it just uses bms_add_members() rather than the list_concat() we'd
have had to use if it was still a List of IntLists.

After fixing up how partitioned_rels is built, I saw there were no
usages of RelOptInfo.partitioned_child_rels, so I got rid of it.

I did another couple of little cleanups and wrote some regression
tests to test all this.

Overall I'm fairly happy with this, especially getting rid of a
partitioned table related field from RelOptInfo.

David
From b8f33c0074f3546139a396a683926c29db2a193f Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com" 
Date: Sun, 25 Oct 2020 12:38:58 +1300
Subject: [PATCH v1] Allow run-time pruning on nested Append/MergeAppend nodes

Previously we only tagged on the required information to allow the
executor to perform run-time partition pruning for Append/MergeAppend
nodes belonging to base relations.  It was thought that nested
Append/MergeAppend nodes were just about always pulled up into the
top-level Append/MergeAppend and that making the run-time pruning info for
any sub Append/MergeAppend nodes was a waste of time.  However, that was
likely badly thought through.

Some examples of cases where we're unable to pull up nested
Append/MergeAppends are: 1) Parallel Append nodes with a mix of parallel
and non-parallel subpaths.  2) When planning an ordered Append scan, an
unordered sub-partition may require a nested MergeAppend path to ensure
its tuples are fed into the top-level Append in the correct order.

Unfortunately, it was not as simple as removing the lines in
createplan.c which were purposefully not building the run-time pruning
info for anything but RELOPT_BASEREL relations.  The code in
add_paths_to_append_rel() was far too sloppy about which partitioned_rels
it included for the Append/MergeAppend paths.  The original code there
would always assume accumulate_append_subpath() would pull each sub-Append
and sub-MergeAppend path into the top level path.  While it does not
appear that there were any actual bugs caused by those surplus RT indexes

Fix typo in src/backend/utils/cache/lsyscache.c

2020-10-24 Thread Hou, Zhijie
Hi

I found that the comment on the function get_attgenerated(Oid relid, AttrNumber attnum)
seems wrong: the function is given the attribute number, not the name.

/*
  * get_attgenerated
  *
- * Given the relation id and the attribute name,
+ * Given the relation id and the attribute number,
  * return the "attgenerated" field from the attribute relation.
  *
  * Errors if not found.

Best regards,
houzj




0001-fix-typo.patch
Description: 0001-fix-typo.patch


Re: pgstat_report_activity() and parallel CREATE INDEX (was: Parallel index creation & pg_stat_activity)

2020-10-24 Thread Noah Misch
On Sat, Oct 17, 2020 at 08:39:35AM -0700, Peter Geoghegan wrote:
> On Tue, Oct 13, 2020 at 7:26 PM Noah Misch  wrote:
> > 1. Disable parallelism for the index build under ExecuteTruncateGuts().
> >Nobody will mourn a performance loss from declining parallelism for an
> >empty index, but I feel like this is fixing in the wrong place.
> > 2. Make _bt_begin_parallel() and begin_parallel_vacuum() recognize the
> >debug_query_string==NULL case and reproduce it on the worker.
> > 3. Require bgworkers to set debug_query_string before entering code of 
> > vacuum,
> >truncate, etc.  Logical replication might synthesize a DDL statement, or 
> > it
> >might just use a constant string.
> >
> > I tend to prefer (2), but (3) isn't bad.  Opinions?
> 
> I also prefer 2.

Thanks.  Here is the pair of patches I plan to use.  The second is equivalent
to v0; I just added a log message.
Author: Noah Misch 
Commit: Noah Misch 

Reproduce debug_query_string==NULL on parallel workers.

Certain background workers initiate parallel queries while
debug_query_string==NULL, at which point they attempted strlen(NULL) and
died to SIGSEGV.  Older debug_query_string observers allow NULL, so do
likewise in these newer ones.  Back-patch to v11, where commit
7de4a1bcc56f494acbd0d6e70781df877dc8ecb5 introduced the first of these.

Reviewed by FIXME.

Discussion: https://postgr.es/m/20201014022636.ga1962...@rfd.leadboat.com

diff --git a/src/backend/access/heap/vacuumlazy.c 
b/src/backend/access/heap/vacuumlazy.c
index 4f2f381..34ae8aa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3238,7 +3238,6 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, 
LVRelStats *vacrelstats,
WalUsage   *wal_usage;
bool   *can_parallel_vacuum;
longmaxtuples;
-   char   *sharedquery;
Sizeest_shared;
Sizeest_deadtuples;
int nindexes_mwm = 0;
@@ -3335,9 +3334,14 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, 
LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
 
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
-   querylen = strlen(debug_query_string);
-   shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
-   shm_toc_estimate_keys(&pcxt->estimator, 1);
+   if (debug_query_string)
+   {
+   querylen = strlen(debug_query_string);
+   shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+   shm_toc_estimate_keys(&pcxt->estimator, 1);
+   }
+   else
+   querylen = 0;   /* keep compiler quiet */
 
InitializeParallelDSM(pcxt);
 
@@ -3382,10 +3386,16 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, 
LVRelStats *vacrelstats,
lps->wal_usage = wal_usage;
 
/* Store query string for workers */
-   sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
-   memcpy(sharedquery, debug_query_string, querylen + 1);
-   sharedquery[querylen] = '\0';
-   shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+   if (debug_query_string)
+   {
+   char   *sharedquery;
+
+   sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 
1);
+   memcpy(sharedquery, debug_query_string, querylen + 1);
+   sharedquery[querylen] = '\0';
+   shm_toc_insert(pcxt->toc,
+  PARALLEL_VACUUM_KEY_QUERY_TEXT, 
sharedquery);
+   }
 
pfree(can_parallel_vacuum);
return lps;
@@ -3528,7 +3538,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
 
/* Set debug_query_string for individual workers */
-   sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, 
false);
+   sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
debug_query_string = sharedquery;
pgstat_report_activity(STATE_RUNNING, debug_query_string);
 
diff --git a/src/backend/access/nbtree/nbtsort.c 
b/src/backend/access/nbtree/nbtsort.c
index efee867..8730de2 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1466,7 +1466,6 @@ _bt_begin_parallel(BTBuildState *buildstate, bool 
isconcurrent, int request)
WalUsage   *walusage;
BufferUsage *bufferusage;
boolleaderparticipates = true;
-   char   *sharedquery;
int querylen;
 
 #ifdef DISABLE_LEADER_PARTICIPATION
@@ -1533,9 +1532,14 @@ _bt_begin_parallel(BTBuildState *buildstate, bool 
isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 1);
 
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
-   querylen = strlen(debug_