Re: Deleting older versions in unique indexes to avoid page splits
On Fri, 23 Oct 2020 at 18:14, Peter Geoghegan wrote:
> It's obvious that a page split is more expensive than the delete
> operation (when it works out).

The problem I highlighted is that the average UPDATE latency is 2x what
it is on current HEAD.  That is not consistent with the reported TPS,
so it remains an issue and that isn't obvious.

> It doesn't need a microbenchmark (and I
> really can't think of one that would make any sense).

I'm asking for detailed timings so we can understand the latency issue.
I didn't ask for a microbenchmark.

I celebrate your results, but we do need to understand the issue,
somehow.

-- 
Simon Riggs
http://www.EnterpriseDB.com/
Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Greetings,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 23/10/2020 17:51, Tom Lane wrote:
> > But anyway, this was about documentation not code.  What I'm wondering
> > about is when to drop things like, say, this bit in the regex docs:
> >
> >     Two significant incompatibilities exist between AREs and the ERE
> >     syntax recognized by pre-7.4 releases of PostgreSQL:
> >     (etc etc)
> >
> > Seems like we could have gotten rid of that by now, but when exactly
> > does it become fair game?  And can we have a non-ad-hoc process for
> > getting rid of such cruft?
>
> Let's try to zoom in on a rule:
>
> Anything that talks about 9.4 or above (min supported version - 1)
> should definitely be left in place.

Sure.

> Something around 9.0 is possibly still useful to someone upgrading or
> updating an application. Or someone might still bump into old blog
> posts from that era.

Right- going back ~10 years.  (I do think it'd be good to have an
actual policy rather than just "well, right now, it seems like 9.0 is
about right".)

> Before that, I don't see much value. Although you could argue that I
> jumped the gun on the notice about pre-8.2 pg_dump -t behavior.
> pg_dump still supports servers down to 8.0, so someone might also have
> an 8.0 pg_dump binary lying around, and might be confused that -t
> behaves differently. On the whole though, I think removing it was fair
> game.

I don't really have an issue with it, to be clear.  I had been hoping
we might be able to come up with a general rule to apply across both
documentation and code (in particular, pg_dump), but that doesn't seem
to be the case.  That does mean that *some* documentation might end up
needing to keep notes from before 9.0, where that documentation is
about pg_dump and older versions.

> I did some grepping for strings like "version 7", "pre-8" and so on. I
> couldn't come up with a clear rule on what could be removed. Context
> matters. In text that talks about protocol versions or libpq functions
> like PQlibVersion() it seems sensible to go back as far as possible,
> for completeness. And subtle user-visible differences in behavior are
> more important to document than changes in internal C APIs that cause
> a compiler failure, for example.

I agree that context matters.

> Other notices are about old syntax that's kept for backwards
> compatibility, but still works. It makes sense to mention the old
> version in those cases, even if it's very old, because the alternative
> would be to just say something like "very old version", which is not
> any shorter, just less precise.

I would argue that we shouldn't be keeping things around for backwards
compatibility, in general.  If we feel that it's a good feature to keep
then let's keep it and just document it as an alternative syntax.

> Findings in detail follow. And attached is a patch about the stuff
> that I think can be removed pretty straightforwardly.

Thanks a lot for spending the time going through all of this!

> array.sgml:
>
>     If the value written for an element is NULL (in any case variant),
>     the element is taken to be NULL.  The presence of any quotes or
>     backslashes disables this and allows the literal string value NULL
>     to be entered.  Also, for backward compatibility with pre-8.2
>     versions of PostgreSQL, the <xref linkend="guc-array-nulls"/>
>     configuration parameter can be turned off to suppress recognition
>     of NULL as a NULL.
>
> The GUC still exists, so we should keep this.
I agree we should keep the documentation as long as the GUC exists- but
we should be considering getting rid of the GUC.

> catalogs.sgml:
>
>     The view pg_group exists for backwards compatibility: it emulates
>     a catalog that existed in PostgreSQL before version 8.1.  It shows
>     the names and members of all roles that are marked as not
>     rolcanlogin, which is an approximation to the set of roles that
>     are being used as groups.
>
> pg_group still exists, and that paragraph explains why. We should keep
> it.
>
> (There's a similar paragraph for pg_shadow)

When I wrote that, many many years ago (apparently about 15, looking
back at the commit responsible), I certainly didn't expect we'd still
have them today.  These views, which have been only haphazardly
maintained and which don't really represent the current system terribly
well, need to go.  In hindsight, introducing them was a mistake in the
first place.  We support 5 major versions for a reason and people
should be updating their code as we make changes- as they have to do in
lots of other parts of the system and for other catalogs (consider the
v10 changes of XLOG -> WAL).

If we keep the views we should keep the documentation, of course, but
it's long, long past time to rip out pg_user, pg_group, and pg_shadow.
Now that we have column-level privileges, I'd think we could probably
get rid of pg_roles too as not really providing much value.

> config.sgml (on synchronized_scans):
Re: Move catalog toast table and index declarations
On Thu, Oct 22, 2020 at 6:21 AM Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> [v1]

Hi Peter,

This part created a syntax error:

--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -28,7 +28,7 @@ chdir $FindBin::RealBin or die "could not cd to $FindBin::RealBin: $!\n";
 use lib "$FindBin::RealBin/../../backend/catalog/";
 use Catalog;
 
-my @input_files = (glob("pg_*.h"), qw(indexing.h));
+my @input_files = (glob("pg_*.h");

The new line drops the closing parenthesis; presumably it should read
"my @input_files = (glob("pg_*.h"));".

Style: In genbki.h, "extern int no_such_variable" is now out of place.
Also, the old comments like "The macro definitions are just to keep the
C compiler from spitting up." are now redundant in their new setting.

The rest looks good to me.  unused_oids (once fixed), duplicate_oids,
and renumber_oids.pl seem to work fine.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Deleting older versions in unique indexes to avoid page splits
On Sat, Oct 24, 2020 at 2:55 AM Simon Riggs wrote:
> The problem I highlighted is that the average UPDATE latency is 2x
> what it is on current HEAD. That is not consistent with the reported
> TPS, so it remains an issue and that isn't obvious.

Why do you say that?  I reported that the UPDATE latency is less than
half for the benchmark.

There probably are some workloads with worse latency and throughput,
but generally only with high contention/small indexes.  I'll try to
fine tune those, but some amount of it is probably inevitable.  On
average, query latency is quite a lot lower with the patch (where it is
affected at all - the mechanism is only used with non-HOT updates).

-- 
Peter Geoghegan
Re: Timing of relcache inval at parallel worker init
On Wed, Oct 21, 2020 at 11:31:54AM -0400, Robert Haas wrote:
> On Sat, Oct 17, 2020 at 7:53 AM Noah Misch wrote:
> > Let's move InvalidateSystemCaches() later.  Invalidation should
> > follow any worker initialization step that changes the results of
> > relcache validation; otherwise, we'd need to ensure the
> > InvalidateSystemCaches() will not validate any relcache entry.
>
> The thinking behind the current placement was this: just before the
> call to InvalidateSystemCaches(), we restore the active and
> transaction snapshots.  I think that we must now flush the caches
> before anyone does any more lookups; otherwise, they might get wrong
> answers.  So, putting this code later makes the assumption that no
> such lookups happen meanwhile.  That feels a little risky to me; at
> the very least, it should be clearly spelled out in the comments that
> no system cache lookups can happen in the functions we call in the
> interim.

My comment edits did attempt that.  I could enlarge those comments, at
the risk of putting undue weight on the topic.  One could also arrange
for an assertion failure if something takes a snapshot in the unwelcome
period, between StartParallelWorkerTransaction() and
InvalidateSystemCaches(); a rough sketch of that idea appears at the
end of this message.

Looking closer, we have live bugs from lookups during RestoreGUCState():

$ echo "begin; create user alice; set session authorization alice; set force_parallel_mode = regress; select 1" | psql -X
BEGIN
CREATE ROLE
SET
SET
ERROR:  role "alice" does not exist
CONTEXT:  while setting parameter "session_authorization" to "alice"

$ echo "begin; create text search configuration foo (copy=english); set default_text_search_config = foo; set force_parallel_mode = regress; select 1" | psql -X
BEGIN
CREATE TEXT SEARCH CONFIGURATION
SET
SET
ERROR:  invalid value for parameter "default_text_search_config": "public.foo"
CONTEXT:  while setting parameter "default_text_search_config" to "public.foo"

I gather $SUBJECT is the tip of an iceberg, so I'm withdrawing the
patch and abandoning the topic.
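To make the assertion idea concrete, here is a minimal sketch (not from
any posted patch; the flag name and exact placement are hypothetical):

/* somewhere global, alongside other parallel-worker state */
bool		parallel_worker_caches_stale = false;

/* in ParallelWorkerMain(), around the existing calls: */
	StartParallelWorkerTransaction(tstatespace);
	parallel_worker_caches_stale = true;

	/* ... steps that must not perform catalog lookups run here ... */

	parallel_worker_caches_stale = false;
	InvalidateSystemCaches();

/* and in GetTransactionSnapshot(), to catch offenders: */
	Assert(!parallel_worker_caches_stale);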
Re: pg_dump, ATTACH, and independently restorable child partitions
On Fri, Oct 23, 2020 at 12:29:40AM -0500, Justin Pryzby wrote:
> Since this commit, pg_dump CREATEs tables and then ATTACHes them:
>
> |commit 33a53130a89447e171a8268ae0b221bb48af6468
> |Author: Alvaro Herrera
> |Date:   Mon Jun 10 18:56:23 2019 -0400
> |
> |    Make pg_dump emit ATTACH PARTITION instead of PARTITION OF (reprise)
> |...
> |    This change also has the advantage that the partition is restorable
> |    from the dump (as a standalone table) even if its parent table isn't
> |    restored.
>
> I like the idea of child tables being independently restorable, but it
> doesn't seem to work.

...

Now that I look, it seems like this is calling PQexec(), which sends a
single, "simple" libpq message with:
|CREATE TABLE ..; ALTER TABLE .. ATTACH PARTITION;
..which is transactional, so when the 2nd command fails, the CREATE is
rolled back.
https://www.postgresql.org/docs/9.5/libpq-exec.html#LIBPQ-EXEC-MAIN

The easy fix is to add an explicit begin/commit.

-- 
Justin

From 4ca8cf7de0e575b0175fbc3a5797b75857bd2fab Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 24 Oct 2020 14:51:18 -0500
Subject: [PATCH] pg_dump: Allow child partitions to be independently restored

..even if the parent doesn't exist, or has missing/incompatible columns
This seems to have been intended by commit 33a53130a

---
 src/bin/pg_dump/pg_dump.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 40844fa1e7..8a783f1d3e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15676,6 +15676,13 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 		binary_upgrade_set_pg_class_oids(fout, q,
 										 tbinfo->dobj.catId.oid, false);
 
+		/*
+		 * Use explicit transaction commands so that failure in a
+		 * following command in the same txn doesn't cause ROLLBACK of
+		 * the "CREATE TABLE".
+		 */
+		appendPQExpBufferStr(q, "begin;\n");
+
 		appendPQExpBuffer(q, "CREATE %s%s %s",
 						  tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED ?
 						  "UNLOGGED " : "",
@@ -15896,6 +15903,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 	}
 	else
 		appendPQExpBufferStr(q, ";\n");
+	appendPQExpBufferStr(q, "commit;\n");
 
 	/* Materialized views can depend on extensions */
 	if (tbinfo->relkind == RELKIND_MATVIEW)
-- 
2.17.0
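For reference, the rollback behavior described above is easy to see
from a standalone libpq program.  A minimal sketch (table names made
up; connection settings taken from the environment):

/*
 * A multi-statement PQexec() string is sent as one simple-protocol
 * message and runs as a single implicit transaction, so a failing
 * ATTACH rolls back the preceding CREATE as well.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");
	PGresult   *res;

	/* "parent" does not exist, so the ATTACH fails ... */
	res = PQexec(conn,
				 "CREATE TABLE child (i int);"
				 "ALTER TABLE parent ATTACH PARTITION child"
				 " FOR VALUES FROM (1) TO (10);");
	PQclear(res);

	/* ... and the CREATE was rolled back along with it */
	res = PQexec(conn, "SELECT 'child'::regclass");
	printf("%s", PQresultErrorMessage(res));	/* relation does not exist */
	PQclear(res);
	PQfinish(conn);
	return 0;
}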
Re: Parallel Append can break run-time partition pruning
On Fri, 16 Oct 2020 at 03:01, Amit Langote wrote:
>
> Going over the last few emails, it seems that David held off from
> committing the patch, because of the lack of confidence in its
> robustness.  With the patch, a sub-partitioned child's partial
> Append's child paths will *always* be pulled up into the parent's set
> of partial child paths thus preventing the nesting of Appends, which
> the run-time pruning code can't currently cope with.  The lack of
> confidence seems to be due to the fact that always pulling up a
> sub-Append's child paths into the parent partial Append's child paths
> *may* cause the latter to behave wrongly under parallelism and the
> new code structure will prevent add_partial_path() from being the
> arbitrator of whether such a path is really the best in terms of
> cost.
>
> If we can't be confident in that approach, maybe we should consider
> making the run-time pruning code cope with nested Appends?

I've been thinking about this and my thoughts are:

There are other cases where we don't pull up sub-Merge Append nodes
anyway, so I think we should just make run-time pruning work for more
than just the top-level Append/Merge Append.  The case I'm thinking
about is the code added in 959d00e9dbe for ordered Append scans.  It's
possible a sub-partitioned table has partitions which don't
participate in the same ordering.  We need to keep the sub-Merge
Append in those cases... well, at least when there's more than 1
partition remaining after plan-time pruning.

I've attached the patch which, pending a final look, I'm proposing to
commit for master only.  I just don't quite think this is a bug fix,
and certainly, due to the invasiveness of the proposed patch, that
means no backpatch.

I fixed all the stuff about the incorrectly set partitioned_rels
list.  What I ended up with there is making it
accumulate_append_subpath's job to also pull up the sub-paths'
partitioned_rels fields when pulling up a nested Append/MergeAppend.
If there's no pullup, then we don't care about the sub-path's
partitioned_rels.  That's for it to deal with.  With that, I think
that run-time pruning will only get the RT indexes for partitions that
we actually have sub-paths for.  That's pretty good, so I added an
Assert() to verify that in make_partitionedrel_pruneinfo().  (I hope I
don't regret that later)

This does mean we need to maintain a different partitioned_rels list
for each Append path we consider.  So there's a number (six) of these
now between add_paths_to_append_rel() and
generate_orderedappend_paths().  To try to minimise the impact of
that, I've changed the field so that instead of being a List of
IntLists, it's just a List of Relids.  The top-level list just
contains a single element until you get a UNION ALL that selects from
a partitioned table on each side of the union.  Merging sub-path
partitioned rel RT indexes into the existing element is now pretty
cheap as it just uses bms_add_members() rather than the list_concat
we'd have had to have used if it was still a List of IntLists.

After fixing up how partitioned_rels is built, I saw there were no
usages of RelOptInfo.partitioned_child_rels, so I got rid of it.

I did another couple of little cleanups and wrote some regression
tests to test all this.

Overall I'm fairly happy with this, especially getting rid of a
partitioned table related field from RelOptInfo.
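To illustrate that last point with a rough sketch (variable names
hypothetical): with each list element now being a Relids, folding a
sub-path's partitioned rel RT indexes into the top-level element is
just a bitmap union,

	Relids		top_relids = (Relids) linitial(partitioned_rels);

	top_relids = bms_add_members(top_relids, sub_relids);

whereas the old List-of-IntLists representation would have needed a
list_concat() of copied integer lists to achieve the same merge.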
David

From b8f33c0074f3546139a396a683926c29db2a193f Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com"
Date: Sun, 25 Oct 2020 12:38:58 +1300
Subject: [PATCH v1] Allow run-time pruning on nested Append/MergeAppend nodes

Previously we only tagged on the required information to allow the
executor to perform run-time partition pruning for Append/MergeAppend
nodes belonging to base relations.  It was thought that nested
Append/MergeAppend nodes were just about always pulled up into the
top-level Append/MergeAppend and that making the run-time pruning info
for any sub Append/MergeAppend nodes was a waste of time.  However,
that was likely badly thought through.

Some examples of cases where we're unable to pull up nested
Append/MergeAppends are:

1) When a mix of parallel and non-parallel paths goes into a Parallel
   Append.
2) When planning an ordered Append scan, a sub-partition which is
   unordered may require a nested MergeAppend path to ensure
   sub-partitions don't mix up the order of tuples being fed into the
   top-level Append.

Unfortunately, it was not just as simple as removing the lines in
createplan.c which were purposefully not building the run-time pruning
info for anything but RELOPT_BASEREL relations.  The code in
add_paths_to_append_rel() was far too sloppy about which
partitioned_rels it included for the Append/MergeAppend paths.  The
original code there would always assume accumulate_append_subpath()
would pull each sub-Append and sub-MergeAppend path into the top-level
path.  While it does not appear that there were any actual bugs caused
by those surplus RT indexe
Fix typo in src/backend/utils/cache/lsyscache.c
Hi

I found the comment of function get_attgenerated(Oid relid, AttrNumber
attnum) seems wrong.  It seems the function is given the attribute
number, not the name.

 /*
  * get_attgenerated
  *
- *		Given the relation id and the attribute name,
+ *		Given the relation id and the attribute number,
  *		return the "attgenerated" field from the attribute relation.
  *
  *		Errors if not found.

Best regards,
houzj

0001-fix-typo.patch
Description: 0001-fix-typo.patch
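For context, the function itself (roughly as it reads in lsyscache.c)
does indeed take the attribute number, looking the tuple up via the
ATTNUM syscache:

char
get_attgenerated(Oid relid, AttrNumber attnum)
{
	HeapTuple	tp;
	Form_pg_attribute att_tup;
	char		result;

	tp = SearchSysCache2(ATTNUM,
						 ObjectIdGetDatum(relid),
						 Int16GetDatum(attnum));
	if (!HeapTupleIsValid(tp))
		elog(ERROR, "cache lookup failed for attribute %d of relation %u",
			 attnum, relid);
	att_tup = (Form_pg_attribute) GETSTRUCT(tp);
	result = att_tup->attgenerated;
	ReleaseSysCache(tp);

	return result;
}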
Re: pgstat_report_activity() and parallel CREATE INDEX (was: Parallel index creation & pg_stat_activity)
On Sat, Oct 17, 2020 at 08:39:35AM -0700, Peter Geoghegan wrote:
> On Tue, Oct 13, 2020 at 7:26 PM Noah Misch wrote:
> > 1. Disable parallelism for the index build under ExecuteTruncateGuts().
> >    Nobody will mourn a performance loss from declining parallelism
> >    for an empty index, but I feel like this is fixing in the wrong
> >    place.
> > 2. Make _bt_begin_parallel() and begin_parallel_vacuum() recognize
> >    the debug_query_string==NULL case and reproduce it on the worker.
> > 3. Require bgworkers to set debug_query_string before entering code
> >    of vacuum, truncate, etc.  Logical replication might synthesize a
> >    DDL statement, or it might just use a constant string.
> >
> > I tend to prefer (2), but (3) isn't bad.  Opinions?
>
> I also prefer 2.

Thanks.  Here is the pair of patches I plan to use.  The second is
equivalent to v0; I just added a log message.

Author: Noah Misch
Commit: Noah Misch

    Reproduce debug_query_string==NULL on parallel workers.

    Certain background workers initiate parallel queries while
    debug_query_string==NULL, at which point they attempted
    strlen(NULL) and died to SIGSEGV.  Older debug_query_string
    observers allow NULL, so do likewise in these newer ones.

    Back-patch to v11, where commit
    7de4a1bcc56f494acbd0d6e70781df877dc8ecb5 introduced the first of
    these.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/20201014022636.ga1962...@rfd.leadboat.com

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4f2f381..34ae8aa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3238,7 +3238,6 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 	WalUsage   *wal_usage;
 	bool	   *can_parallel_vacuum;
 	long		maxtuples;
-	char	   *sharedquery;
 	Size		est_shared;
 	Size		est_deadtuples;
 	int			nindexes_mwm = 0;
@@ -3335,9 +3334,14 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
 	/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
-	querylen = strlen(debug_query_string);
-	shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
-	shm_toc_estimate_keys(&pcxt->estimator, 1);
+	if (debug_query_string)
+	{
+		querylen = strlen(debug_query_string);
+		shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+		shm_toc_estimate_keys(&pcxt->estimator, 1);
+	}
+	else
+		querylen = 0;			/* keep compiler quiet */
 
 	InitializeParallelDSM(pcxt);
@@ -3382,10 +3386,16 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 	lps->wal_usage = wal_usage;
 
 	/* Store query string for workers */
-	sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
-	memcpy(sharedquery, debug_query_string, querylen + 1);
-	sharedquery[querylen] = '\0';
-	shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+	if (debug_query_string)
+	{
+		char	   *sharedquery;
+
+		sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+		memcpy(sharedquery, debug_query_string, querylen + 1);
+		sharedquery[querylen] = '\0';
+		shm_toc_insert(pcxt->toc,
+					   PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+	}
 
 	pfree(can_parallel_vacuum);
 	return lps;
@@ -3528,7 +3538,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
 
 	/* Set debug_query_string for individual workers */
-	sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, false);
+	sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
 	debug_query_string = sharedquery;
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index efee867..8730de2 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1466,7 +1466,6 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	WalUsage   *walusage;
 	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
-	char	   *sharedquery;
 	int			querylen;
 
 #ifdef DISABLE_LEADER_PARTICIPATION
@@ -1533,9 +1532,14 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
 	/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
-	querylen = strlen(debug_