TAP tests for pg_verify_checksums
Hi all, The topic of $subject has been discussed a few times, resulting in a couple of patches along the way: https://www.postgresql.org/message-id/20180830200258.gg15...@paquier.xyz https://www.postgresql.org/message-id/cabuevezekrwpegk2o-ov4z2mjft6cu8clfa-v1s1j4z8x7w...@mail.gmail.com However, nothing has actually happened. Based on the previous feedback, attached is an updated patch to do the actual job. Magnus Hagander and Michael Banck need to be credited as authors, as this patch is roughly a merge of what they proposed. Thoughts? -- Michael diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl index 5dc629fd5e..610887e5d4 100644 --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -8,7 +8,7 @@ use Fcntl ':mode'; use File::stat qw{lstat}; use PostgresNode; use TestLib; -use Test::More tests => 18; +use Test::More tests => 21; my $tempdir = TestLib::tempdir; my $xlogdir = "$tempdir/pgxlog"; @@ -58,6 +58,12 @@ mkdir $datadir; "check PGDATA permissions"); } } + +# Control file should tell that data checksums are disabled by default. 
+command_like(['pg_controldata', $datadir], + qr/Data page checksum version:.*0/, + 'checksums are disabled in control file'); + command_ok([ 'initdb', '-S', $datadir ], 'sync only'); command_fails([ 'initdb', $datadir ], 'existing data directory'); diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore index d1dcdaf0dd..0e5e569a54 100644 --- a/src/bin/pg_verify_checksums/.gitignore +++ b/src/bin/pg_verify_checksums/.gitignore @@ -1 +1,3 @@ /pg_verify_checksums + +/tmp_check/ diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile index d16261571f..cfe4ab1b8b 100644 --- a/src/bin/pg_verify_checksums/Makefile +++ b/src/bin/pg_verify_checksums/Makefile @@ -34,3 +34,9 @@ uninstall: clean distclean maintainer-clean: rm -f pg_verify_checksums$(X) $(OBJS) rm -rf tmp_check + +check: + $(prove_check) + +installcheck: + $(prove_installcheck) diff --git a/src/bin/pg_verify_checksums/t/001_basic.pl b/src/bin/pg_verify_checksums/t/001_basic.pl new file mode 100644 index 00..1fa2e12db2 --- /dev/null +++ b/src/bin/pg_verify_checksums/t/001_basic.pl @@ -0,0 +1,8 @@ +use strict; +use warnings; +use TestLib; +use Test::More tests => 8; + +program_help_ok('pg_verify_checksums'); +program_version_ok('pg_verify_checksums'); +program_options_handling_ok('pg_verify_checksums'); diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl new file mode 100644 index 00..7423595181 --- /dev/null +++ b/src/bin/pg_verify_checksums/t/002_actions.pl @@ -0,0 +1,52 @@ +# Do basic sanity checks supported by pg_verify_checksums using +# an initialized cluster. + +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 8; + +# Initialize node with checksums enabled. +my $node = get_new_node('node_checksum'); +$node->init(extra => ['--data-checksums']); +my $pgdata = $node->data_dir; + +# Control file should know that checksums are enabled. 
+command_like(['pg_controldata', $pgdata], + qr/Data page checksum version:.*1/, + 'checksums enabled in control file'); + +# Checksums pass on a newly-created cluster +command_ok(['pg_verify_checksums', '-D', $pgdata], + "checksum checks done and passing"); + +# Checks cannot happen for an online cluster +$node->start; +command_fails(['pg_verify_checksums', '-D', $pgdata], + "checksum checks not done"); + +# Create table to corrupt and get its relfilenode +$node->safe_psql('postgres', + "SELECT a INTO corrupt1 FROM generate_series(1,1) AS a; + ALTER TABLE corrupt1 SET (autovacuum_enabled=false);"); + +my $file_corrupted = $node->safe_psql('postgres', + "SELECT pg_relation_filepath('corrupt1')"); + +# Set page header and block size +my $pageheader_size = 24; +my $block_size = $node->safe_psql('postgres', 'SHOW block_size;'); +$node->stop; + +# Time to create a corruption.  Note that double quotes are needed here +# so that the NUL bytes are actually interpolated; in single quotes Perl +# would write literal backslash-zero pairs. +open my $file, '+<', "$pgdata/$file_corrupted"; +seek($file, $pageheader_size, 0); +syswrite($file, "\0\0\0\0\0\0\0\0\0"); +close $file; + +$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata], + 1, + [qr/Bad checksums: 1/s], + [qr/checksum verification failed/s], + 'pg_verify_checksums reports checksum mismatch');
Re: [HACKERS] proposal: schema variables
On Wed, Oct 3, 2018 at 1:01 AM, Thomas Munro wrote: > On Sun, Sep 30, 2018 at 11:20 AM Pavel Stehule > wrote: > > I hope the functionality is almost complete now. Please check it. > > Hi Pavel, > > FYI there is a regression test failure on Windows: > > plpgsql ... FAILED > > *** 4071,4077 > end; > $$ language plpgsql; > select stacked_diagnostics_test(); > - NOTICE: sqlstate: 22012, message: division by zero, context: > [PL/pgSQL function zero_divide() line 4 at RETURN <- SQL statement > "SELECT zero_divide()" <- PL/pgSQL function stacked_diagnostics_test() > line 6 at PERFORM] > + NOTICE: sqlstate: 42702, message: column reference "v" is ambiguous, > context: [PL/pgSQL function zero_divide() line 4 at RETURN <- SQL > statement "SELECT zero_divide()" <- PL/pgSQL function > stacked_diagnostics_test() line 6 at PERFORM] > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15234 Please check the attached patch. Thank you for the report. Pavel > > -- > Thomas Munro > http://www.enterprisedb.com > Attachment: schema-variables-20181004-02.patch.gz
Re: executor relation handling
On 2018/10/05 5:59, Tom Lane wrote: > Amit Langote writes: >> I've rebased the remaining patches. I broke down one of the patches into >> 2 and re-ordered the patches as follows: > >> 0001: introduces a function that opens range table relations and maintains >> them in an array indexed by RT index > >> 0002: introduces a new field in EState that's an array of RangeTblEntry >> pointers and revises macros used in the executor that access RTEs to >> return them from the array (David Rowley co-authored this one) > > I've pushed 0001 and 0002 with mostly cosmetic changes. Thanks a lot. > One thing I > wanted to point out explicitly, though, is that I found this bit of 0002 > to be a seriously bad idea: > > --- a/src/include/nodes/execnodes.h > +++ b/src/include/nodes/execnodes.h > @@ -20,6 +20,7 @@ > #include "executor/instrument.h" > #include "lib/pairingheap.h" > #include "nodes/params.h" > +#include "nodes/parsenodes.h" > #include "nodes/plannodes.h" > #include "utils/hsearch.h" > #include "utils/queryenvironment.h" > > Please do not add #includes of fundamental headers to other fundamental > headers without clearing it with somebody. There's little enough > structure to our header collection now. I don't want to end up in a > situation where effectively the entire header set gets pulled into > every .c file, or worse that we have actual reference loops in the > headers. (This is not an academic problem; somebody actually created > such a loop a while back. Cleaning it up, by the time we'd recognized > the problem, was really painful.) Okay, sorry about that. I was slightly nervous about having to do it, but forgot to mention that explicitly in the commit message or the email. >> 0003: moves result relation and ExecRowMark initialization out of InitPlan >> and into ExecInit* routines of respective nodes > > I am finding myself pretty unconvinced by this one; it seems like mostly > a random reallocation of responsibility with little advantage. 
The > particular thing that brought me to a screeching halt was seeing that > the patch removed ExecFindRowMark, despite the fact that that's part > of our advertised FDW API (see fdwhandler.sgml), and it didn't provide > any alternative way for an FDW to find out at runtime whether it's > subject to a row locking requirement. > > I thought for a minute about just leaving the function in place, but > that wouldn't work because both nodeLockRows and nodeModifyTable are > written so that they find^H^H^Hbuild their rowmarks only after recursing > to initialize their child plan nodes; so a child node that tried to use > ExecFindRowMark during ExecInitNode would get the wrong answer. Of > course, we could consider changing the order of operations during > initialization of those node types, but I'm not really seeing a compelling > reason why we should whack things around that much. > > So I'm inclined to just omit 0003. AFAICS this would only mean that > we couldn't drop the global PlanRowMarks list from PlannedStmt, which > does not bother me much. To be honest, I too had begun to doubt the point of this patch since yesterday. In fact, getting this one to pass make check-world took a bit of head-scratching due to its interaction with EvalPlanQual's EState building, so I was almost ready to drop it from the series. But I felt that it might still be a good idea to get rid of what was described as special-case code. Reading your argument against it based on how it messes up some of the assumptions regarding ExecRowMark design, I'm willing to let go of this one as more or less a cosmetic improvement. >> 0005: teaches planner to remove PlanRowMarks corresponding to dummy relations > > I'm not entirely sold on the value of that either? 
If you look at the first email on this thread, you can see that trimming the PlanRowMarks list down to just the ones needed during execution results in a non-trivial improvement in SELECT FOR SHARE on partitioned tables with a large number of partitions: The speedup is more pronounced with a benchmark that needs RowMarks, because one of the patches (0003) removes overhead around handling them. $ cat /tmp/select-lt-for-share.sql select * from lt where b = 999 for share; master tps = 94.095985 (excluding connections establishing) tps = 93.955702 (excluding connections establishing) patch 0003 (prune PlanRowMarks) tps = 712.544029 (excluding connections establishing) tps = 717.540052 (excluding connections establishing) But on reflection, this seems more like adding special-case code to the planner just for this, rather than solving the more general problem of initializing *any* partition information after pruning, being discussed over at the "speeding up planning with partitions" thread [1]. So, I don't mind dropping this one either. So, that leaves us with only the patch that revises the plan node fields in light of having relieved the executor of doing any locking of its own in *some* cases. That patch's
Re: Odd 9.4, 9.3 buildfarm failure on s390x
I wrote: > Andres Freund writes: >> On 2018-10-01 12:13:57 -0400, Tom Lane wrote: >>> (2) Drop the restriction. This'd require at least changing the >>> DESC correction, and maybe other things. I'm not sure what the >>> odds would be of finding everyplace we need to check. >> (2) seems more maintainable to me (or perhaps less unmaintainable). It's >> infrastructure, rather than every datatype + support out there... > I guess we could set up some testing infrastructure: hack int4cmp > and/or a couple other popular comparators so that they *always* > return INT_MIN, 0, or INT_MAX, and then see what falls over. Here's a draft patch against HEAD for this. I looked for problem spots by (a) testing with the STRESS_SORT_INT_MIN option I added in nbtcompare.c, (b) grepping for "x = -x" type code, and (c) grepping for "return -x" type code. (b) and (c) found several places that (a) didn't, which does not give me a warm feeling about whether I have found quite everything. I changed a couple of places where things might've been safe but I didn't feel like chasing the calls to prove it (e.g. imath.c), and contrariwise I left a *very* small number of places alone because they were inverting the result of a specific function that is defined to return 1/0/-1 and nothing else. regards, tom lane diff --git a/contrib/pgcrypto/imath.c b/contrib/pgcrypto/imath.c index cd528bf..ca4995a 100644 --- a/contrib/pgcrypto/imath.c +++ b/contrib/pgcrypto/imath.c @@ -1254,11 +1254,9 @@ mp_int_compare(mp_int a, mp_int b) * If they're both zero or positive, the normal comparison applies; if * both negative, the sense is reversed. 
*/ - if (sa == MP_ZPOS) - return cmp; - else - return -cmp; - + if (sa != MP_ZPOS) + INVERT_SIGN(cmp); + return cmp; } else { @@ -1314,10 +1312,9 @@ mp_int_compare_value(mp_int z, int value) { cmp = s_vcmp(z, value); - if (vsign == MP_ZPOS) - return cmp; - else - return -cmp; + if (vsign != MP_ZPOS) + INVERT_SIGN(cmp); + return cmp; } else { diff --git a/doc/src/sgml/btree.sgml b/doc/src/sgml/btree.sgml index 8bd0bad..c16825e 100644 --- a/doc/src/sgml/btree.sgml +++ b/doc/src/sgml/btree.sgml @@ -228,11 +228,8 @@ B, A = B, or A > - B, respectively. The function must not - return INT_MIN for the A - < B case, - since the value may be negated before being tested for sign. A null - result is disallowed, too. + B, respectively. + A null result is disallowed: all values of the data type must be comparable. See src/backend/access/nbtree/nbtcompare.c for examples. diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c index b1855e8..6f2ad23 100644 --- a/src/backend/access/nbtree/nbtcompare.c +++ b/src/backend/access/nbtree/nbtcompare.c @@ -22,11 +22,10 @@ * * The result is always an int32 regardless of the input datatype. * - * Although any negative int32 (except INT_MIN) is acceptable for reporting - * "<", and any positive int32 is acceptable for reporting ">", routines + * Although any negative int32 is acceptable for reporting "<", + * and any positive int32 is acceptable for reporting ">", routines * that work on 32-bit or wider datatypes can't just return "a - b". - * That could overflow and give the wrong answer. Also, one must not - * return INT_MIN to report "<", since some callers will negate the result. + * That could overflow and give the wrong answer. * * NOTE: it is critical that the comparison function impose a total order * on all non-NULL values of the data type, and that the datatype's @@ -44,13 +43,31 @@ * during an index access won't be recovered till end of query. 
This * primarily affects comparison routines for toastable datatypes; * they have to be careful to free any detoasted copy of an input datum. + * + * NOTE: we used to forbid comparison functions from returning INT_MIN, + * but that proves to be too error-prone because some platforms' versions + * of memcmp() etc can return INT_MIN. As a means of stress-testing + * callers, this file can be compiled with STRESS_SORT_INT_MIN defined + * to cause many of these functions to return INT_MIN or INT_MAX instead of + * their customary -1/+1. For production, though, that's not a good idea + * since users or third-party code might expect the traditional results. *- */ #include "postgres.h" +#include + #include "utils/builtins.h" #include "utils/sortsupport.h" +#ifdef STRESS_SORT_INT_MIN +#define A_LESS_THAN_B INT_MIN +#define A_GREATER_THAN_B INT_MAX +#else +#define A_LESS_THAN_B (-1) +#define A_GREATER_THAN_B 1 +#endif + Datum btboolcmp(PG_FUNCTION_ARGS) @@ -95,11 +112,11 @@ btint4cmp(PG_FUNCTION_ARGS) int32 b = PG_GETARG_INT32(1); if (a > b) - PG_RETURN_INT32(1); + PG_RETURN_INT32(A_GREATER_THAN_B); else if (a == b) PG_RETURN_INT32(0); else - PG_RETURN_INT32(-1); + PG_RET
Re: Odd 9.4, 9.3 buildfarm failure on s390x
On Fri, Oct 5, 2018 at 3:12 PM Tom Lane wrote: > Here's a draft patch against HEAD for this. + * Invert the sign of a qsort-style comparison result, ie, exchange negative + * and positive integer values, being careful not to get the wrong answer + * for INT_MIN. The argument should be an integral variable. + */ +#define INVERT_SIGN(var) \ +((var) = ((var) < 0) ? 1 : -(var)) I suppose someone might mistake this for a function that converts -42 to 42... would something like INVERT_COMPARE_RESULT() be better? -- Thomas Munro http://www.enterprisedb.com
Re: Odd 9.4, 9.3 buildfarm failure on s390x
Thomas Munro writes: > I suppose someone might mistake this for a function that converts -42 > to 42... would something like INVERT_COMPARE_RESULT() be better? I have no particular allegiance to the macro name; it's just the first idea that came to mind. regards, tom lane
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
On Wed, Oct 03, 2018 at 04:37:29PM +0900, Masahiko Sawada wrote: > Thank you for the comment. Attached the updated patch. So, I have come back to this stuff, and finished with the attached instead, so that the assertion is in a single place. I find that clearer. The comments have also been improved. Thoughts? -- Michael diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 8996d366e9..e73b548ac3 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -249,6 +249,10 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, if (options & VACOPT_DISABLE_PAGE_SKIPPING) aggressive = true; + /* vacuum preventing wraparound must be aggressive */ + Assert((params->is_wraparound && aggressive) || + !params->is_wraparound); + vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats)); vacrelstats->old_rel_pages = onerel->rd_rel->relpages; @@ -376,10 +380,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, initStringInfo(&buf); if (params->is_wraparound) { -if (aggressive) - msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n"); -else - msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n"); +/* autovacuum preventing wraparound has to be aggressive */ +msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n"); } else {
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On 10/4/18 8:34 PM, Michael Paquier wrote: > On Thu, Oct 04, 2018 at 04:54:45PM -0700, Andres Freund wrote: >> Are you suggesting we fix after RC1, or delay RC1? I'm not 100% sure >> I'm parsing your sentence correctly. > > I am suggesting to fix the issue after RC1 is released, but before GA. That approach would mean we would require an RC2, which would further delay the GA. Based on our release process and various schedules, any delays to the GA date at this point would push the release well into mid-November. Part of the reason that we selected the current proposed date for the GA is that we would have the .1 available 3 weeks after the release during the regularly scheduled cumulative update. Ideally it would be great if this was fixed for RC1. However, given the choice of pushing the release out further vs. saving the fix for .1 which would be relatively soon, I would vote for the latter. Jonathan
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
"Jonathan S. Katz" writes: > On 10/4/18 8:34 PM, Michael Paquier wrote: >> I am suggesting to fix the issue after RC1 is released, but before GA. > That approach would mean we would require an RC2, which would further > delay the GA. Not sure about that. Alvaro seems to think there's a generic problem in event trigger processing, which if true, was likely there pre-v11. I don't think that patches that get back-patched further than 11 need to restart the RC clock. > Ideally it would be great if this was fixed for RC1. However, given the > choice of pushing the release out further vs. saving the fix for .1 > which would be relatively soon, I would vote for the latter. There are definitely good calendar reasons to hold to the schedule of RC1 next week and GA the week after. I'd only want to blow up that plan if we hit something that is both very bad and very hard to fix. However, if we have a fix that we believe in that's available post-RC1, I'm not sure I see why it's better to sit on it till after GA. regards, tom lane
Assertion failure with ALTER TABLE ATTACH PARTITION with log_min_messages >= DEBUG1
Hi all, Running installcheck on an instance with log_min_messages = DEBUG1, I can bump into the following assertion failure: #2 0x56145231e82c in ExceptionalCondition (conditionName=0x56145258ae0b "!(strvalue != ((void *)0))", errorType=0x56145258adfb "FailedAssertion", fileName=0x56145258adf0 "snprintf.c", lineNumber=440) at assert.c:54 [...] #7 0x56145231f518 in errmsg (fmt=0x5614524dac60 "validating foreign key constraint \"%s\"") at elog.c:796 #8 0x561451f6ab54 in validateForeignKeyConstraint (conname=0x0, rel=0x7f12833ca750, pkrel=0x7f12833cc468, pkindOid=36449, constraintOid=36466) at tablecmds.c:8566 #9 0x561451f61589 in ATRewriteTables (parsetree=0x561453bde5e0, wqueue=0x7ffe8f1d55e8, lockmode=8) at tablecmds.c:4549 Looking at the stack trace, there is this log in validateForeignKeyConstraint: ereport(DEBUG1, (errmsg("validating foreign key constraint \"%s\"", conname))); However, conname is set to NULL in this code path. This test case makes it easy to reproduce the failure: CREATE TABLE fk_notpartitioned_pk (a int, b int, PRIMARY KEY (a, b)); CREATE TABLE fk_partitioned_fk (b int, a int) PARTITION BY RANGE (a, b); CREATE TABLE fk_partitioned_fk_1 (b int, a int); ALTER TABLE fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES fk_notpartitioned_pk; -- crash ALTER TABLE fk_partitioned_fk ATTACH PARTITION fk_partitioned_fk_1 FOR VALUES FROM (1,1) TO (2,2); From what I can see, the problem comes from CloneForeignKeyConstraints, which forgets to assign the constraint name when cloning the FK definition. While looking at the ATTACH PARTITION code, I have noticed that a variable gets shadowed, which is in my opinion bad style. So the problem is rather close to what Tom has fixed in 3d0f68dd, it seems. Attached is a patch for all that, with which installcheck-world passes for me. I am surprised this was not noticed before; the recent snprintf work is helping nicely here, and this would need to be back-patched down to v11. 
Thanks, -- Michael diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c index 6781b00c6e..2063abb8ae 100644 --- a/src/backend/catalog/pg_constraint.c +++ b/src/backend/catalog/pg_constraint.c @@ -574,6 +574,7 @@ CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned) fkconstraint = makeNode(Constraint); /* for now this is all we need */ + fkconstraint->conname = pstrdup(NameStr(constrForm->conname)); fkconstraint->fk_upd_action = constrForm->confupdtype; fkconstraint->fk_del_action = constrForm->confdeltype; fkconstraint->deferrable = constrForm->condeferrable; diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index c145385f84..7df1fc2a76 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -14275,21 +14275,21 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) RelationGetRelid(attachrel), &cloned); foreach(l, cloned) { - ClonedConstraint *cloned = lfirst(l); + ClonedConstraint *clonedcon = lfirst(l); NewConstraint *newcon; Relation clonedrel; AlteredTableInfo *parttab; - clonedrel = relation_open(cloned->relid, NoLock); + clonedrel = relation_open(clonedcon->relid, NoLock); parttab = ATGetQueueEntry(wqueue, clonedrel); newcon = (NewConstraint *) palloc0(sizeof(NewConstraint)); - newcon->name = cloned->constraint->conname; + newcon->name = clonedcon->constraint->conname; newcon->contype = CONSTR_FOREIGN; - newcon->refrelid = cloned->refrelid; - newcon->refindid = cloned->conindid; - newcon->conid = cloned->conid; - newcon->qual = (Node *) cloned->constraint; + newcon->refrelid = clonedcon->refrelid; + newcon->refindid = clonedcon->conindid; + newcon->conid = clonedcon->conid; + newcon->qual = (Node *) clonedcon->constraint; parttab->constraints = lappend(parttab->constraints, newcon);
Re: Multiple primary key on partition table?
On Thu, Oct 4, 2018 at 8:25 PM Alvaro Herrera wrote: > > On 2018-Oct-01, Rajkumar Raghuwanshi wrote: > > > On Tue, Sep 18, 2018 at 11:20 AM amul sul wrote: > > > > > Here is the complete patch proposing the aforesaid fix with a regression > > > test. > > > > Thanks, this worked for me. > > Yeah, looks good to me, pushed. I added one more regression test to > ensure that the PRIMARY KEY clause in the partition is still accepted if > the parent does not have one. > Thanks a lot for enhancing the regression tests and committing the fix. Regards, Amul
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
On Fri, Oct 5, 2018 at 12:16 PM Michael Paquier wrote: > > On Wed, Oct 03, 2018 at 04:37:29PM +0900, Masahiko Sawada wrote: > > Thank you for the comment. Attached the updated patch. > > So, I have come back to this stuff, and finished with the attached > instead, so as the assertion is in a single place. I find that > clearer. The comments have also been improved. Thoughts? Thank you! The patch looks good to me. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: partition tree inspection functions
On Thu, Oct 04, 2018 at 05:18:07PM +0900, Michael Paquier wrote: > So it seems that I am clearly outvoted here ;) > > Okay, let's do as you folks propose. And attached is a newer version with this isleaf stuff and the previous feedback from Amit integrated, as far as I recall. The version is indented, and so on. Does that look fine to everybody here? The CF bot should normally pick up this patch; the previous garbage seen in one of the tests seems to come from the previous implementation, which used strings... -- Michael From 11bf47ee3137b6a15699bfda50b0904f2e10a415 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 5 Oct 2018 14:41:17 +0900 Subject: [PATCH] Add pg_partition_tree to display information about partitions This new function is useful to display a full tree of partitions with a partitioned table given in input, and avoids the need of any complex WITH RECURSIVE when looking at partition trees which are multi-level deep. It returns a set of records, one for each partition, containing the partition OID, its immediate parent OID, whether the relation is a leaf in the tree, and its level in the partition tree with the given table considered as root, beginning at zero for the root, and incrementing by one each time the scan goes one level down. 
Author: Amit Langote Reviewed-by: Jesper Pedersen, Michael Paquier Discussion: https://postgr.es/m/8d00e51a-9a51-ad02-d53e-ba6bf50b2...@lab.ntt.co.jp --- doc/src/sgml/func.sgml | 42 ++ src/backend/utils/adt/Makefile | 4 +- src/backend/utils/adt/partitionfuncs.c | 137 +++ src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 9 ++ src/test/regress/expected/partition_info.out | 67 + src/test/regress/parallel_schedule | 2 +- src/test/regress/serial_schedule | 1 + src/test/regress/sql/partition_info.sql | 42 ++ 9 files changed, 302 insertions(+), 4 deletions(-) create mode 100644 src/backend/utils/adt/partitionfuncs.c create mode 100644 src/test/regress/expected/partition_info.out create mode 100644 src/test/regress/sql/partition_info.sql diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index f984d069e1..be315aaabb 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20197,6 +20197,48 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); The function returns the number of new collation objects it created. + +Partitioning Information Functions + + + Name Return Type Description + + + + + pg_partition_tree(oid) + setof record + +List information about table in a partition tree for a given +partitioned table, which consists of one row for each partition and +table itself. Information provided includes the OID of the partition, +the OID of its immediate parent, if the partition is a leaf and its +level in the hierarchy. The value of level begins at +0 for the input table in its role as the root of +the partition tree, 1 for its partitions, +2 for their partitions, and so on. 
+ + + + + + + +To check the total size of the data contained in +measurement table described in +, one could use the +following query: + + + +=# SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size + FROM pg_partition_tree('measurement'::regclass); + total_size + + 24 kB +(1 row) + + diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 4b35dbb8bb..132ec7620c 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -20,8 +20,8 @@ OBJS = acl.o amutils.o arrayfuncs.o array_expanded.o array_selfuncs.o \ jsonfuncs.o like.o lockfuncs.o mac.o mac8.o misc.o nabstime.o name.o \ network.o network_gist.o network_selfuncs.o network_spgist.o \ numeric.o numutils.o oid.o oracle_compat.o \ - orderedsetaggs.o pg_locale.o pg_lsn.o pg_upgrade_support.o \ - pgstatfuncs.o \ + orderedsetaggs.o partitionfuncs.o pg_locale.o pg_lsn.o \ + pg_upgrade_support.o pgstatfuncs.o \ pseudotypes.o quote.o rangetypes.o rangetypes_gist.o \ rangetypes_selfuncs.o rangetypes_spgist.o rangetypes_typanalyze.o \ regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \ diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c new file mode 100644 index 00..c1743524c8 --- /dev/null +++ b/src/backend/utils/adt/partitionfuncs.c @@ -0,0 +1,137 @@ +/*- + * + * partitionfuncs.c + * Functions for accessing partition-related metadata + * + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + *
Re: partition tree inspection functions
On Fri, Oct 5, 2018 at 7:57 AM, Michael Paquier wrote: > On Thu, Oct 04, 2018 at 05:18:07PM +0900, Michael Paquier wrote: > > So it seems that I am clearly outvoted here ;) > > > > Okay, let's do as you folks propose. > > And attached is a newer version with this isleaf stuff and the previous > feedback from Amit integrated, as far as I recall. The > version is indented, and so on. Does that look fine to everybody > here? > Looks good. Pavel > The CF bot should normally pick up this patch; the previous garbage seen > in one of the tests seems to come from the previous implementation, which > used strings... > -- > Michael >
Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error.
Thanks for the report. On 2018/10/04 3:58, Rajkumar Raghuwanshi wrote: > Hi, > > I am getting ERROR: null relpartbound for relation 18159 while doing > pg_upgrade from v11 to v11/master. > > -- user-defined operator class in partition key > CREATE FUNCTION my_int4_sort(int4,int4) RETURNS int LANGUAGE sql > AS $$ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$; > CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING btree AS > OPERATOR 1 < (int4,int4), OPERATOR 2 <= (int4,int4), > OPERATOR 3 = (int4,int4), OPERATOR 4 >= (int4,int4), > OPERATOR 5 > (int4,int4), FUNCTION 1 my_int4_sort(int4,int4); > CREATE TABLE partkey_t (a int4) PARTITION BY RANGE (a test_int4_ops); > CREATE TABLE partkey_t_1 PARTITION OF partkey_t FOR VALUES FROM (0) TO > (1000); > INSERT INTO partkey_t VALUES (100); > INSERT INTO partkey_t VALUES (200); > > --ran pg_upgrade failed with below error. > pg_restore: [archiver (db)] could not execute query: ERROR: null > relpartbound for relation 18159 > CONTEXT: SQL function "my_int4_sort" Interesting test case. 
To reproduce, the following works too (after creating the objects as described above): alter table partkey_t detach partition partkey_t_1; alter table partkey_t attach partition partkey_t_1 for values from (0) to (1000); ERROR: null relpartbound for relation 16396 CONTEXT: SQL function "my_int4_sort" The stack at the time of the error: (gdb) bt #0 RelationBuildPartitionDesc #1 0x009bf04e in RelationBuildDesc #2 0x009c1784 in RelationClearRelation #3 0x009c1cc5 in RelationFlushRelation #4 0x009c1dd9 in RelationCacheInvalidateEntry #5 0x009b9496 in LocalExecuteInvalidationMessage #6 0x009b91ec in ProcessInvalidationMessages #7 0x009b9cdb in CommandEndInvalidationMessages #8 0x005346ef in AtCCI_LocalCache #9 0x00534124 in CommandCounterIncrement #10 0x006c579d in fmgr_sql #11 0x009de7c2 in FunctionCall2Coll #12 0x0058ac9f in partition_rbound_cmp #13 0x00588059 in check_new_partition_bound #14 0x0067f536 in ATExecAttachPartition So, the CommandCounterIncrement done in fmgr_sql causes partkey_t's PartitionDesc to be recomputed, which counts partkey_t_1 as its child because ATExecAttachPartition has already finished CreateInheritance which would've sent out an invalidation message for partkey_t. As of commit 2fbdf1b38bc [1], which has been applied in 11 and HEAD branches, RelationBuildPartitionDesc emits an error if we don't find relpartbound set for a child found by scanning pg_inherits, instead of skipping such children. While that commit switched the order of creating pg_inherits entry and checking a new bound against existing bounds in DefineRelation in light of aforementioned change, it didn't in ATExecAttachPartition, hence this error. Attached patch fixes that. I thought we'd need to apply this to 10, 11, HEAD, but I couldn't reproduce this in 10. That's because the above commit wasn't applied to 10, so the child that causes this error is being skipped in 10's case. Maybe, we should apply parts of the above commit that apply to 10 and then this patch on top. 
Attached for-10.patch file does that. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2fbdf1b38bc diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index c145385f84..2d0043d1db 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -14247,9 +14247,6 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) trigger_name, RelationGetRelationName(attachrel)), errdetail("ROW triggers with transition tables are not supported on partitions"))); - /* OK to create inheritance. Rest of the checks performed there */ - CreateInheritance(attachrel, rel); - /* * Check that the new partition's bound is valid and does not overlap any * of existing partitions of the parent - note that it does not return on @@ -14258,6 +14255,9 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd) check_new_partition_bound(RelationGetRelationName(attachrel), rel, cmd->bound); + /* OK to create inheritance. Rest of the checks performed there */ + CreateInheritance(attachrel, rel); + /* Update the pg_class entry. */ StorePartitionBound(attachrel, rel, cmd->bound); diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 8219d05d83..75e41ed21c 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -195,23 +195,12 @@ RelationBuildPartitionDesc(Relation rel) if (!HeapTupleIsValid(tuple)) elog(ERROR, "cache lookup failed for relation %u", inhrelid
Re: partition tree inspection functions
On 2018/10/05 14:56, Michael Paquier wrote: > On Thu, Oct 04, 2018 at 05:18:07PM +0900, Michael Paquier wrote: >> So it seems that I am clearly outvoted here ;) >> >> Okay, let's do as you folks propose. > > And attached is a newer version with this isleaf stuff and the previous > feedback from Amit integrated, as long as I recall about it. The > version is indented, and tutti-quanti. Does that look fine to everybody > here? Thanks for making those changes yourself and posting the new version. Can you check the attached diff file for some updates to the documentation part of the patch. Other parts look fine. Thanks, Amit diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index be315aaabb..826c59ecd9 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20209,14 +20209,14 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); pg_partition_tree(oid) setof record -List information about table in a partition tree for a given +List information about tables in partition tree for a given partitioned table, which consists of one row for each partition and table itself. Information provided includes the OID of the partition, -the OID of its immediate parent, if the partition is a leaf and its -level in the hierarchy. The value of level begins at -0 for the input table in its role as the root of -the partition tree, 1 for its partitions, -2 for their partitions, and so on. +the OID of its immediate parent, a boolean value telling if the +partition is a leaf, and an integer telling its level in the hierarchy. +The value of level begins at 0 for the input table +in its role as the root of the partition tree, 1 for +its partitions, 2 for their partitions, and so on.
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote: > So, I have come back to this stuff, and finished with the attached > instead, so as the assertion is in a single place. I find that > clearer. The comments have also been improved. Thoughts? And so... I have been looking at committing this thing, and while testing in-depth I have been able to trigger a case where an autovacuum has been able to be not aggressive but anti-wraparound, which is exactly what should not be possible, no? I have simply created an instance with autovacuum_freeze_max_age = 20, then ran pgbench with autovacuum_freeze_table_age=20 set for each table, and also ran installcheck-world in parallel. This has been able to trigger the assertion pretty quickly. Here is the stack trace: #2 0x55e1a12ef87b in ExceptionalCondition (conditionName=0x55e1a14b45c8 "!((params->is_wraparound && aggressive) || !params->is_wraparound)", errorType=0x55e1a14b459d "FailedAssertion", fileName=0x55e1a14b4590 "vacuumlazy.c", lineNumber=254) at assert.c:54 #3 0x55e1a0f6c6ad in lazy_vacuum_rel (onerel=0x7f163e7ea710, options=65, params=0x55e1a2eeba70, bstrategy=0x55e1a2f5c1a0) at vacuumlazy.c:253 #4 0x55e1a0f6c217 in vacuum_rel (relid=1260, relation=0x55e1a2f5d748, options=65, params=0x55e1a2eeba70) at vacuum.c:1714 #5 0x55e1a0f6a3ac in vacuum (options=65, relations=0x55e1a2f2f050, params=0x55e1a2eeba70, bstrategy=0x55e1a2f5c1a0, isTopLevel=true) at vacuum.c:340 #6 0x55e1a10c1ddd in autovacuum_do_vac_analyze (tab=0x55e1a2eeba68, bstrategy=0x55e1a2f5c1a0) at autovacuum.c:3121 #7 0x55e1a10c0e19 in do_autovacuum () at autovacuum.c:2476 $2 = {spcNode = 1664, dbNode = 0, relNode = 1260} (gdb) p onerel->rd_node.relNode $3 = 1260 (gdb) p params->is_wraparound $4 = true (gdb) p aggressive $5 = false I still have the core file, the binaries and the data folder, so it's not a problem to dig into it. -- Michael signature.asc Description: PGP signature
Replace PG_AUTOCONF_FILENAME with parameter
Unlike config_file, hba_file, etc., the name and location of the auto-configuration file are fixed to $PGDATA/postgresql.auto.conf (guc.h). The static name also appears in some other locations, such as pg_basebackup, pg_rewind and initdb.c. New feature suggestion: allow setting the file's location, as with hba_file and ident_file. I would be glad to make this my contribution to the code.
Re: Replace PG_AUTOCONF_FILENAME with parameter
On Fri, Oct 05, 2018 at 09:39:03AM +0300, Jonathan Jacobson wrote: > As opposed to config_file, hba_file, etc. the name and location of the > autoconfig file is fixed to $PGDATA/postgresql.auto.conf (guc.h) > > The static name also appears in some other locations, such as > pg_basebackup, pg_rewind & initdb.c. > > New feature suggestion: Allow to set the file's location, as with hba_file > and ident_file. > > Would be glad to make it my contribution to the code. Could you explain what your use case is? postgresql.auto.conf is used internally by Postgres to store information from ALTER SYSTEM, and is not something aimed at being available externally. -- Michael
Re: partition tree inspection functions
On Fri, Oct 05, 2018 at 03:31:49PM +0900, Amit Langote wrote: > Thanks for making those changes yourself and posting the new version. > > Can you check the attached diff file for some updates to the documentation > part of the patch. Other parts look fine. OK, I merged that into my local branch. From what I can see Mr Robot is happy as well: http://cfbot.cputube.org/amit-langote.html -- Michael
Re: Replace PG_AUTOCONF_FILENAME with parameter
> "Jonathan" == Jonathan Jacobson writes: Jonathan> As opposed to config_file, hba_file, etc. the name and Jonathan> location of the autoconfig file is fixed to Jonathan> $PGDATA/postgresql.auto.conf (guc.h) The reason it's in a fixed location is that the server needs to be able to rewrite the file, and we don't assume we can write to directories outside of $PGDATA unless the _directory_ (not the file) was explicitly specified (e.g. unix_socket_directories or stats_temp_directory). hba_file and ident_file can be outside $PGDATA because the server never needs to write to them. -- Andrew (irc:RhodiumToad)
Re: executor relation handling
On 2018/10/04 5:16, Tom Lane wrote: > I wrote: >> Amit Langote writes: >>> Should this check that we're not in a parallel worker process? > >> Hmm. I've not seen any failures in the parallel parts of the regular >> regression tests, but maybe I'd better do a force_parallel_mode >> run before committing. >> In general, I'm not on board with the idea that parallel workers don't >> need to get their own locks, so I don't really want to exclude parallel >> workers from this check. But if it's not safe for that today, fixing it >> is beyond the scope of this particular patch. > > So the place where that came out in the wash is the commit I just made > (9a3cebeaa) to change the executor from taking table locks to asserting > that somebody else took them already. Thanks for getting that done. > To make that work, I had to make > both ExecOpenScanRelation and relation_open skip checking for lock-held > if IsParallelWorker(). Yeah, I had to do that too when rebasing the remaining patches. > This makes me entirely uncomfortable with the idea that parallel workers > can be allowed to not take any locks of their own. There is no basis > for arguing that we have field proof that that's safe, because *up to > now, parallel workers in fact did take their own locks*. And it seems > unsafe on its face, because there's nothing that really guarantees that > the parent process won't go away while children are still running. > (elog(FATAL) seems like a counterexample, for instance.) > > I think that we ought to adjust parallel query to insist that children > do take locks, and then revert the IsParallelWorker() exceptions I made > here. Maybe I'm missing something here, but isn't the necessary adjustment just that the relations are opened with locks if inside a parallel worker? > I plan to leave that point in abeyance till we've got the rest > of these changes in place, though.
The easiest way to do it will > doubtless change once we've centralized the executor's table-opening > logic, so trying to code it right now seems like a waste of effort. Okay. I've rebased the remaining patches. I broke down one of the patches into 2 and re-ordered the patches as follows: 0001: introduces a function that opens range table relations and maintains them in an array indexed by RT index 0002: introduces a new field in EState that's an array of RangeTblEntry pointers and revises macros used in the executor that access RTEs to return them from the array (David Rowley co-authored this one) 0003: moves result relation and ExecRowMark initialization out of InitPlan and into ExecInit* routines of respective nodes 0004: removes useless fields from certain planner nodes whose only purpose has been to assist the executor in locking relations in the proper order 0005: teaches the planner to remove PlanRowMarks corresponding to dummy relations Thanks, Amit From ebef0d923ea8a1d5e458c9a60845bad68904cb52 Mon Sep 17 00:00:00 2001 From: "dgrow...@gmail.com" Date: Thu, 27 Sep 2018 16:14:41 +1200 Subject: [PATCH v12 1/5] Revise executor range table relation locking and opening/closing All requests to open range table relations in the executor now go through ExecRangeTableRelation(), which consults an array of Relation pointers indexed by RT index, an arrangement which allows the executor to heap_open any given range table relation only once. Relations are closed by ExecEndPlan() instead of ExecEndNode. This needed revising the PartitionedRelPruneInfo node to contain the partitioned table's RT index instead of its OID. With that change, ExecCreatePartitionPruneState can use ExecRangeTableRelation described above.
--- contrib/postgres_fdw/postgres_fdw.c | 4 -- src/backend/executor/README | 4 +- src/backend/executor/execMain.c | 61 +-- src/backend/executor/execPartition.c | 34 ++--- src/backend/executor/execUtils.c | 60 +++--- src/backend/executor/nodeAppend.c | 6 --- src/backend/executor/nodeBitmapHeapscan.c | 7 src/backend/executor/nodeCustom.c | 4 -- src/backend/executor/nodeForeignscan.c| 4 -- src/backend/executor/nodeIndexonlyscan.c | 7 src/backend/executor/nodeIndexscan.c | 7 src/backend/executor/nodeMergeAppend.c| 6 --- src/backend/executor/nodeModifyTable.c| 4 +- src/backend/executor/nodeSamplescan.c | 5 --- src/backend/executor/nodeSeqscan.c| 7 src/backend/executor/nodeTidscan.c| 5 --- src/backend/nodes/copyfuncs.c | 2 +- src/backend/nodes/outfuncs.c | 2 +- src/backend/nodes/readfuncs.c | 2 +- src/backend/optimizer/plan/setrefs.c | 30 +++ src/backend/partitioning/partprune.c | 5 +-- src/include/executor/execPartition.h | 1 - src/include/executor/executor.h | 2 +- src/include/nodes/execnodes.h | 2 + src/include/nodes/plannodes.h
Re: Skylake-S warning
On Thu, Oct 4, 2018 at 9:50 AM Adrien Nayrat wrote: > > On 10/3/18 11:29 PM, Daniel Wood wrote: > > If running benchmarks or you are a customer which is currently impacted by > > GetSnapshotData() on high end multisocket systems be wary of Skylake-S. > > > > > > Performance differences of nearly 2X can be seen on select only pgbench due > > to > > nothing else but unlucky choices for max_connections. Scale 1000, 192 local > > clients on a 2 socket 48 core Skylake-S(Xeon Platinum 8175M @ 2.50-GHz) > > system. > > pgbench -S > > Could it be related to : > https://www.postgresql.org/message-id/D2B9F2A20670C84685EF7D183F2949E2373E66%40gigant.nidsa.net > ? Unlikely. I understood from Daniel's email that profiling shows a different hot-spot. In the cited .NET issue the problem was mostly due to issuing PAUSE in a loop without attempting to grab the lock. In PostgreSQL it's called only once per retry attempt. Regards, Ants Aasma -- PostgreSQL Senior Consultant www.cybertec-postgresql.com Austria (HQ), Wiener Neustadt | Switzerland, Zürich | Estonia, Tallinn | Uruguay, Montevideo Facebook: www.fb.com/cybertec.postgresql Twitter: www.twitter.com/PostgresSupport
Re: partition tree inspection functions
On 2018/10/03 12:37, Michael Paquier wrote: > On Mon, Oct 01, 2018 at 04:27:57PM +0900, Amit Langote wrote: >> Yeah, maybe there is no reason to delay proceeding with >> pg_partition_children which provides a useful functionality. > > So, I have been looking at your patch, and there are a couple of things > which could be improved. Thanks for reviewing and updating the patch. > Putting the new function pg_partition_children() in partition.c is a > bad idea I think. So instead I think that we should put that in a > different location, say utils/adt/partitionfuncs.c. Okay, sounds like a good idea. > + TupleDescInitEntry(tupdesc, (AttrNumber) 1, "relid", > + REGCLASSOID, -1, 0); > + TupleDescInitEntry(tupdesc, (AttrNumber) 2, "parentid", > + REGCLASSOID, -1, 0); > REGCLASSOID is used mainly for casting, so instead let's use OIDOID like > any other system function. Check. > + values[2] = psprintf("%d", level); > + values[3] = psprintf("%c", relkind == RELKIND_PARTITIONED_TABLE ? > + 'f' : > + 't'); > Using Datum objects is more elegant in this context. Agreed. I think I'd just copied the psprintf code from some other function. > isleaf is not particularly useful as it can just be fetched with a join > on pg_class/relkind. So I think that we had better remove it. That's a bit imposing on the users to know about relkind, but maybe that's okay. > I have cleaned up a bit tests, removing duplicates and most of the > things which touched the size of relations to have something more > portable. Thanks for that. > We could have a flavor using a relation name in input with qualified > names handled properly (see pg_get_viewdef_name for example), not sure > if that's really mandatory so I left that out. Having to always use the typecast (::regclass) may be a bit annoying, but we can always add that version later. > I have also added some > comments here and there. The docs could be worded a bit better still. > > My result is the patch attached. What do you think? 
I looked at the updated patch and couldn't resist making some changes, which see in the attached diff file. Also attached is the updated patch. Thanks, Amit diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d41c09b68b..6dfa3dc977 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20209,12 +20209,13 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); pg_partition_tree(oid) setof record -List information about a partition tree for the given partitioned -table, consisting of one row for each partition in a tree. The -information available is the OID of the partition, the OID of its -immediate partitioned table, and its level in the hierarchy, -beginning at 0 for the top-most parent, and -incremented by 1 for each level up. +List information about table in a partition tree for a given +partitioned table, which consists of one row for each partition and +table itself. Information provided includes the OID of the partition, +the OID of its immediate parent, and its level in the hierarchy. +The value of level begins at 0 for the input table +in its role as the root of the partition tree, 1 for +its partitions, 2 for their partitions, and so on. 
diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c index fc0a904967..41c57cac2d 100644 --- a/src/backend/utils/adt/partitionfuncs.c +++ b/src/backend/utils/adt/partitionfuncs.c @@ -1,7 +1,7 @@ /*- * * partitionfuncs.c - * Functions for accessing partitioning data + * Functions for accessing partitioning related metadata * * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California diff --git a/src/test/regress/sql/partition_info.sql b/src/test/regress/sql/partition_info.sql index 2ad75882b0..d9d96a284c 100644 --- a/src/test/regress/sql/partition_info.sql +++ b/src/test/regress/sql/partition_info.sql @@ -17,11 +17,24 @@ SELECT relid::regclass, parentrelid::regclass, level, pg_relation_size(relid) = 0 AS is_empty FROM pg_partition_tree('ptif_test'::regclass); --- children of the main tree -SELECT relid::regclass, parentrelid::regclass, level - FROM pg_partition_tree('ptif_test0'::regclass); -SELECT relid::regclass, parentrelid::regclass, level - FROM pg_partition_tree('ptif_test01'::regclass); +-- check that it works correctly even if it's passed other tables in the tree + +-- passing an intermediate level partitioned table +SELECT relid::regclass, parentrelid::regclass, level, relkind <> 'p' AS isleaf + FROM pg_pa
Re: partition tree inspection functions
On 2018/10/04 9:27, Michael Paquier wrote: > On Wed, Oct 03, 2018 at 08:12:59AM -0400, Jesper Pedersen wrote: >> Removing isleaf would require extra round trips to the server to get >> that information. So, I think we should keep it. > > I don't really get your point about extra round trips with the server, > and getting the same level of information is as simple as a join between > the result set of pg_partition_tree() and pg_class (better to add schema > qualification and aliases to relations by the way): > =# SELECT relid::regclass, > parentrelid::regclass, level, > relkind != 'p' AS isleaf > FROM pg_partition_tree('ptif_test'::regclass), pg_class > WHERE oid = relid; > relid| parentrelid | level | isleaf > -+-+---+ > ptif_test | null| 0 | f > ptif_test0 | ptif_test | 1 | f > ptif_test1 | ptif_test | 1 | f > ptif_test2 | ptif_test | 1 | t > ptif_test01 | ptif_test0 | 2 | t > ptif_test11 | ptif_test1 | 2 | t > (6 rows) As mentioned in my other reply, that might be considered as asking the user to know inner details like relkind. Also, if a database has many partitioned tables with lots of partitions, the pg_class join might get expensive. OTOH, computing and returning it with other fields of pg_partition_tree is essentially free. Thanks, Amit
Re: [HACKERS] Optional message to user when terminating/cancelling backend
On Wed, Oct 03, 2018 at 12:09:54PM +1300, Thomas Munro wrote: > It looks like you missed another case that needs tolerance for late > signal delivery on Windows: > > +select pg_cancel_backend(pg_backend_pid()); > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15263 Looking at 0001, why are the declarations needed in patch 0002 part of 0001 (see signalfuncs.h)? I think that something like instead the attached is enough for this part. Daniel, could you confirm? -- Michael diff --git a/src/backend/storage/ipc/Makefile b/src/backend/storage/ipc/Makefile index 9dbdc26c9b..bf4619d5fd 100644 --- a/src/backend/storage/ipc/Makefile +++ b/src/backend/storage/ipc/Makefile @@ -8,8 +8,8 @@ subdir = src/backend/storage/ipc top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = barrier.o dsm_impl.o dsm.o ipc.o ipci.o latch.o pmsignal.o procarray.o \ - procsignal.o shmem.o shmqueue.o shm_mq.o shm_toc.o sinval.o \ - sinvaladt.o standby.o +OBJS = signalfuncs.o barrier.o dsm_impl.o dsm.o ipc.o ipci.o latch.o \ + pmsignal.o procarray.o procsignal.o shmem.o shmqueue.o shm_mq.o \ + shm_toc.o sinval.o sinvaladt.o standby.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c new file mode 100644 index 00..c09a047127 --- /dev/null +++ b/src/backend/storage/ipc/signalfuncs.c @@ -0,0 +1,216 @@ +/*- + * + * signalfuncs.c + * Functions for signalling backends + * + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/storage/ipc/signalfuncs.c + * + *- + */ +#include "postgres.h" + +#include + +#include "catalog/pg_authid.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "postmaster/syslogger.h" +#include "storage/ipc.h" +#include "storage/pmsignal.h" +#include "storage/proc.h" +#include "storage/procarray.h" +#include 
"utils/acl.h" +#include "utils/builtins.h" + +/* + * Send a signal to another backend. + * + * The signal is delivered if the user is either a superuser or the same + * role as the backend being signaled. For "dangerous" signals, an explicit + * check for superuser needs to be done prior to calling this function. + * + * Returns 0 on success, 1 on general failure, 2 on normal permission error + * and 3 if the caller needs to be a superuser. + * + * In the event of a general failure (return code 1), a warning message will + * be emitted. For permission errors, doing that is the responsibility of + * the caller. + */ +#define SIGNAL_BACKEND_SUCCESS 0 +#define SIGNAL_BACKEND_ERROR 1 +#define SIGNAL_BACKEND_NOPERMISSION 2 +#define SIGNAL_BACKEND_NOSUPERUSER 3 +static int +pg_signal_backend(int pid, int sig) +{ + PGPROC *proc = BackendPidGetProc(pid); + + /* + * BackendPidGetProc returns NULL if the pid isn't valid; but by the time + * we reach kill(), a process for which we get a valid proc here might + * have terminated on its own. There's no way to acquire a lock on an + * arbitrary process to prevent that. But since so far all the callers of + * this mechanism involve some request for ending the process anyway, that + * it might end on its own first is not a problem. + */ + if (proc == NULL) + { + /* + * This is just a warning so a loop-through-resultset will not abort + * if one backend terminated on its own during the run. + */ + ereport(WARNING, +(errmsg("PID %d is not a PostgreSQL server process", pid))); + return SIGNAL_BACKEND_ERROR; + } + + /* Only allow superusers to signal superuser-owned backends. */ + if (superuser_arg(proc->roleId) && !superuser()) + return SIGNAL_BACKEND_NOSUPERUSER; + + /* Users can signal backends they have role membership in. 
*/ + if (!has_privs_of_role(GetUserId(), proc->roleId) && + !has_privs_of_role(GetUserId(), DEFAULT_ROLE_SIGNAL_BACKENDID)) + return SIGNAL_BACKEND_NOPERMISSION; + + /* + * Can the process we just validated above end, followed by the pid being + * recycled for a new process, before reaching here? Then we'd be trying + * to kill the wrong thing. Seems near impossible when sequential pid + * assignment and wraparound is used. Perhaps it could happen on a system + * where pid re-use is randomized. That race condition possibility seems + * too unlikely to worry about. + */ + + /* If we have setsid(), signal the backend's whole process group */ +#ifdef HAVE_SETSID + if (kill(-pid, sig)) +#else + if (kill(pid, sig)) +#endif + { + /* Again, just a warning to allow loops */ + ereport(WARNING, +(errmsg("could not send signal to process %d: %m", pid))); + return SIGNAL_BACKEND_ERROR; + } + return SIGNAL_BACKEND_SUCCESS; +} + +/* + * Signal to cancel a backend process. This is allowed if you are
Re: [HACKERS] Optional message to user when terminating/cancelling backend
> On 4 Oct 2018, at 09:59, Michael Paquier wrote: > > On Wed, Oct 03, 2018 at 12:09:54PM +1300, Thomas Munro wrote: >> It looks like you missed another case that needs tolerance for late >> signal delivery on Windows: >> >> +select pg_cancel_backend(pg_backend_pid()); >> >> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15263 > > Looking at 0001, why are the declarations needed in patch 0002 part of > 0001 (see signalfuncs.h)? I think that something like instead the > attached is enough for this part. Daniel, could you confirm? Yes, you are correct, the signalfuncs.h includes in 0001 are a rebase error from when I renamed the file. They are not present in the v15 patch but got introduced in v16 when I clearly wasn’t caffeinated enough to rebase. cheers ./daniel
RE: speeding up planning with partitions
Hi, Amit! On Thu, Sept 13, 2018 at 10:29 PM, Amit Langote wrote: > Attached is what I have at the moment. I also do the code review of the patch. I could only see a v3-0001.patch so far, so below are all about v3-0001.patch. I am new to inheritance/partitioning codes and code review, so my review below might have some mistakes. If there are mistakes, please point out that kindly :) v3-0001: 1. Is there any reason inheritance_make_rel_from_joinlist returns "parent" that is passed to it? I think we can refer parent in the caller even if inheritance_make_rel_from_joinlist returns nothing. +static RelOptInfo * +inheritance_make_rel_from_joinlist(PlannerInfo *root, + RelOptInfo *parent, + List *joinlist) +{ ... + return parent; +} 2. Is there the possible case that IS_DUMMY_REL(child_joinrel) is true in inheritance_adjust_scanjoin_target()? +inheritance_adjust_scanjoin_target(PlannerInfo *root, ... +{ ... + foreach(lc, root->inh_target_child_rels) + { ... + /* +* If the child was pruned, its corresponding joinrel will be empty, +* which we can ignore safely. +*/ + if (IS_DUMMY_REL(child_joinrel)) + continue; I think inheritance_make_rel_from_joinlist() doesn't make dummy joinrel for the child that was pruned. +inheritance_make_rel_from_joinlist(PlannerInfo *root, ... +{ ... + foreach(lc, root->append_rel_list) + { + RelOptInfo *childrel; ... + /* Ignore excluded/pruned children. */ + if (IS_DUMMY_REL(childrel)) + continue; ... + /* +* Save child joinrel to be processed later in +* inheritance_adjust_scanjoin_target, which adjusts its paths to +* be able to emit expressions in query's top-level target list. +*/ + root->inh_target_child_rels = lappend(root->inh_target_child_rels, + childrel); + } +} 3. Is the "root->parse->commandType != CMD_INSERT" required in: @@ -2018,13 +1514,45 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, ... 
+ /* +* For UPDATE/DELETE on an inheritance parent, we must generate and +* apply scanjoin target based on targetlist computed using each +* of the child relations. +*/ + if (parse->commandType != CMD_INSERT && + current_rel->reloptkind == RELOPT_BASEREL && + current_rel->relid == root->parse->resultRelation && + root->simple_rte_array[current_rel->relid]->inh) ... @@ -2137,92 +1665,123 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, final_rel->fdwroutine = current_rel->fdwroutine; ... - foreach(lc, current_rel->pathlist) + if (current_rel->reloptkind == RELOPT_BASEREL && + current_rel->relid == root->parse->resultRelation && + !IS_DUMMY_REL(current_rel) && + root->simple_rte_array[current_rel->relid]->inh && + parse->commandType != CMD_INSERT) I think if a condition would be "current_rel->relid == root->parse->resultRelation && parse->commandType != CMD_INSERT" at the above if clause, elog() is called in the query_planner and planner don't reach above if clause. Of course there is the case that query_planner returns the dummy joinrel and elog is not called, but in that case, current_rel->relid may be 0(current_rel is dummy joinrel) and root->parse->resultRelation may be not 0(a query is INSERT). 4. Can't we use define value(IS_PARTITIONED or something like IS_INHERITANCE?) to identify inheritance and partitioned table in below code? It was little confusing to me that which code is related to inheritance/partitioned when looking codes. In make_one_rel(): + if (root->parse->resultRelation && + root->simple_rte_array[root->parse->resultRelation]->inh) + { ... + rel = inheritance_make_rel_from_joinlist(root, targetrel, joinlist); In inheritance_make_rel_from_joinlist(): + if (childrel->part_scheme != NULL) + childrel = + inheritance_make_rel_from_joinlist(root, childrel, + translated_joinlist); I can't review inheritance_adjust_scanjoin_target() deeply, because it is difficult to me to understand fully codes about join processing. -- Yoshikazu Imai
Re: partition tree inspection functions
On Thu, Oct 04, 2018 at 04:53:02PM +0900, Amit Langote wrote: > As mentioned in my other reply, that might be considered as asking the > user to know inner details like relkind. Also, if a database has many > partitioned tables with lots of partitions, the pg_class join might get > expensive. OTOH, computing and returning it with other fields of > pg_partition_tree is essentially free. So it seems that I am clearly outvoted here ;) Okay, let's do as you folks propose. -- Michael
RE: [Proposal] Add accumulated statistics for wait event
On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote: > Even if you have spiky workloads, sampling may miss those, but even with adding counters for each event > you would need to query the table holding the counters at an insane frequency to be able to perhaps get > something out of it as you need to do sampling of the counters as well to extract deltas. Hi, I was wondering why PostgreSQL does not expose the number of wait events and the accumulated wait time, as other databases such as Oracle do, and while searching for related threads I arrived at this one. I am a DB beginner, so please bear with me. The thread says that you can find bottleneck events by sampling, but as noted above, you cannot find events shorter than the sampling interval, right? If such a short event occurs a large number of times and adds up to a considerable amount of time in total, can sampling really catch it? # I ask because I could not follow much of the discussion above, and I do not know whether such a case can actually exist. Raising the sampling frequency might also solve it, but then the load becomes high, right? So that method does not seem very practical. Moreover, I suppose cumulative counters have not been implemented because sampling is considered good enough, as described above, and because counters would affect performance. How about making the feature switchable on/off, with off as the default? Does performance degradation matter during a bottleneck investigation? When investigating a bottleneck, I think it is better to find the cause even at the expense of some performance. # Whether high-frequency sampling can detect such events probably depends on which is the heavier load: the sampling, or the counters (number of wait events and wait time). Since I am a DB beginner, I may be saying strange things; I would be glad for any corrections. - Naoki Yotsunaga
Re: speeding up planning with partitions
Imai-san, On 2018/10/04 17:11, Imai, Yoshikazu wrote: > Hi, Amit! > > On Thu, Sept 13, 2018 at 10:29 PM, Amit Langote wrote: >> Attached is what I have at the moment. > > I also do the code review of the patch. Thanks a lot for your review. I'm working on updating the patches as mentioned upthread and should be able to post a new version sometime next week. I will consider your comments along with David's and Dilip's before posting the new patch. Regards, Amit
Re: [HACKERS] Secondary index access optimizations
On 04.10.2018 06:19, David Rowley wrote: On 12 September 2018 at 08:32, Konstantin Knizhnik wrote: Also the patch proposed by you is much simpler and does mostly the same. Yes, it is not covering CHECK constraints, I started to look at this and found a problem in regards to varno during the predicate_implied_by() test. The problem is that the partition bound is always stored as varno=1 (For example, see how get_qual_for_list() calls makeVar()). This causes the patch to fail in cases where the partitioned table is not varno=1. You're also incorrectly using rinfo->clause to pass to predicate_implied_by(). This is a problem because the clauses stored here have not been translated to have the child varattnos. childqual is the correct thing to use as that's just been translated. You may have not used it as the varnos will have been converted to the child's varno, which will never be varno=1, so you might have found that not to work due to the missing code to change the varnos to 1. I've attached the diff for allpaths.c (only) that I ended up with to make it work. This causes the output of many other regression tests to change, so you'll need to go over these and verify everything is correct again. Please, can you also add a test which exercises this code with a partition whose columns are in a different order than its parent's. Having an INT and a TEXT column is best, as if the translations are done incorrectly it's likely to result in a crash which will alert us to the issue. It would be good to also verify the test causes a crash if you temporarily put the code back to using the untranslated qual. Thanks for working on this. Thank you very much for detecting and fixing this problem. I have checked that all changes in plan caused by this fix are correct. Updated version of the patch is attached.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 5f74d3b..b628ac7 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -37,6 +37,7 @@ #include "optimizer/paths.h" #include "optimizer/plancat.h" #include "optimizer/planner.h" +#include "optimizer/predtest.h" #include "optimizer/prep.h" #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" @@ -1052,6 +1053,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, /* Restriction reduces to constant TRUE, so drop it */ continue; } + + /* + * For partitions, we may be able to eliminate some quals if + * they're implied by the partition bound. + */ + if (childrel->partition_qual != NIL) + { +Node *checkqual = copyObject(childqual); + +/* + * Since the partition_qual has all Vars stored as varno=1, we + * must convert all Vars of the childqual to have their varnos + * set to 1 so that predicate_implied_by can properly match + * implied quals. 
+ */ +ChangeVarNodes(checkqual, childrel->relid, 1, 0); + +if (predicate_implied_by(list_make1(checkqual), childrel->partition_qual, false)) + continue; + } + /* might have gotten an AND clause, if so flatten it */ foreach(lc2, make_ands_implicit((Expr *) childqual)) { diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 8369e3a..8cd9b06 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -450,7 +450,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, */ if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) set_relation_partition_info(root, rel, relation); - + else if (relation->rd_rel->relispartition) + rel->partition_qual = RelationGetPartitionQual(relation); heap_close(relation, NoLock); /* diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out index 4f29d9f..67d7a41 100644 --- a/src/test/regress/expected/inherit.out +++ b/src/test/regress/expected/inherit.out @@ -1772,30 +1772,26 @@ explain (costs off) select * from list_parted where a is not null; - Append -> Seq Scan on part_ab_cd - Filter: (a IS NOT NULL) -> Seq Scan on part_ef_gh - Filter: (a IS NOT NULL) -> Seq Scan on part_null_xy Filter: (a IS NOT NULL) -(7 rows) +(5 rows) explain (costs off) select * from list_parted where a in ('ab', 'cd', 'ef'); QUERY PLAN -- Append -> Seq Scan on part_ab_cd - Filter: ((a)::text = ANY ('{ab,cd,ef}'::text[])) -> Seq Scan on part_ef_gh Filter: ((a)::text = ANY ('{ab,cd,ef}'::text[])) -(5 rows) +(4 rows) explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'); - QUERY PLAN
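The dropped "Filter" lines in the expected regression output above are exactly the quals that predicate_implied_by() can prove from a child's partition bound. As a purely illustrative sketch of that decision (plain Python, invented names, brute force where the planner of course reasons symbolically):

```python
def bound_implies(bound, qual):
    """bound is the half-open range (lo, hi) of a RANGE partition, as in
    FOR VALUES FROM (lo) TO (hi); qual is a predicate on the partition
    column.  Brute-force check over the bound's whole domain: the qual is
    implied when every value the partition can hold satisfies it."""
    lo, hi = bound
    return all(qual(a) for a in range(lo, hi))

part_bound = (0, 1000)  # hypothetical partition bound

# Implied quals can be dropped from the child's Filter:
print(bound_implies(part_bound, lambda a: a is not None))   # True
print(bound_implies(part_bound, lambda a: a >= 0))          # True
# Quals the bound does not guarantee must be kept:
print(bound_implies(part_bound, lambda a: a in (1, 2, 3)))  # False
```

The real code additionally has to deal with varno translation, which is what the ChangeVarNodes() call in the diff above is for.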
Re: New vacuum option to do only freezing
On Mon, Oct 1, 2018 at 7:20 PM Masahiko Sawada wrote: > > Hi, > > Attached patch adds a new option FREEZE_ONLY to VACUUM command. This > option is the same as the FREEZE option except that it disables reclaiming dead > tuples. That is, with this option vacuum prunes HOT chains, > freezes live tuples and maintains both the visibility map and the free space > map, but does not collect dead tuples and invokes neither heap vacuum > nor index vacuum. This option will be useful if the user wants to prevent > XID wraparound of a table as quickly as possible, especially when the table is > quite large and is about to reach XID wraparound. I think this use case was > mentioned in some threads but I couldn't find them. > > Currently this patch just adds the new option to the VACUUM command, but it > might be good to make autovacuum use it when emergency vacuum is > required. > > This is a performance-test result for the FREEZE option and the FREEZE_ONLY > option. I've tested them on a table of about 3.8GB > without indexes, randomly modified. > > * FREEZE > INFO: aggressively vacuuming "public.pgbench_accounts" > INFO: "pgbench_accounts": removed 5 row versions in 8 pages > INFO: "pgbench_accounts": found 5 removable, 3000 nonremovable > row versions in 491804 out of 491804 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 722 > There were 0 unused item pointers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > CPU: user: 4.20 s, system: 16.47 s, elapsed: 50.28 s. > VACUUM > Time: 50301.262 ms (00:50.301) > > * FREEZE_ONLY > INFO: aggressively vacuuming "public.pgbench_accounts" > INFO: "pgbench_accounts": found 4 removable, 3000 nonremovable > row versions in 491804 out of 491804 pages > DETAIL: freeze 3000 rows > There were 0 unused item pointers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > CPU: user: 3.10 s, system: 14.85 s, elapsed: 44.56 s.
> VACUUM > Time: 44589.794 ms (00:44.590) > > Feedback is very welcome. > Added to the next commit fest. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Secondary index access optimizations
On 4 October 2018 at 22:11, Konstantin Knizhnik wrote: > On 04.10.2018 06:19, David Rowley wrote: >> Please, can you also add a test which exercises this code with a >> partition whose columns are in a different order than its parent's. Having >> an INT and a TEXT column is best, as if the translations are done >> incorrectly it's likely to result in a crash which will alert us to >> the issue. It would be good to also verify the test causes a crash if >> you temporarily put the code back to using the untranslated qual. >> >> Thanks for working on this. >> > Thank you very much for detecting and fixing this problem. > I have checked that all changes in plan caused by this fix are correct. > Updated version of the patch is attached. Can you add the test that I mentioned above? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Early WIP/PoC for inlining CTEs
On 10/03/2018 05:57 PM, David Fetter wrote: Is there any meaningful distinction between "inlining," by which I mean converting to a subquery, and predicate pushdown, which would happen at least for a first cut, at the rewrite stage? Sorry, but I do not think I understand your question. The ability to push down predicates is just one of the potential benefits from inlining. Andreas
RE: [Proposal] Add accumulated statistics for wait event
Hi, I am a DB beginner, so please tell me. It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right? If an event occurs frequently and it is reported in pg_stat_activity, you will catch it again and again while sampling, no matter its duration. Hence you just need to:
* Sample the sessions and consider the active ones. You need to know whether they are waiting (PostgreSQL now provides detailed wait events) or on the CPU.
* Optionally collect information on the system context at the time of sampling (CPU, memory, ...); it can be provided by many tools, for example the psutil Python library.
If the client application itself provides information it's even more interesting. With something like program/module/action/client_info/sofar/totalwork in application_name you are able to focus directly on different kinds of activity. It can give you information like "I/O waits are meaningful for my batch activity but not for my OLTP activity; if my goal is to improve response time for end users I have to consider that." Best regards Phil De : Yotsunaga, Naoki Envoyé : jeudi 4 octobre 2018 10:31 À : 'Michael Paquier'; Phil Florent Cc : Tomas Vondra; pgsql-hackers@lists.postgresql.org Objet : RE: [Proposal] Add accumulated statistics for wait event On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote: > Even if you have spiky workloads, sampling may miss those, but even with > adding counters for each event > you would need to query the table holding the counters at an insane frequency > to be able to perhaps get > something out of it as you need to do sampling of the counters as well to > extract deltas. Hi, I was wondering why PostgreSQL did not have the number of wait events and wait time that other databases such as Oracle had as a function, and when I was looking for related threads, I got to this thread. I am a DB beginner, so please tell me.
It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right? If this short event has occurred quite a number of times and it was a considerable amount of time in total, can you solve this problem with sampling? # I have asked, but I could not understand much of the discussion above and I do not know if such a case can exist. Also, I think that it can be solved by higher the sampling frequency, but the load will be high, right? I think this method is not very practical. Moreover, I think that it is not implemented due to the reason that sampling is good as described above and because it affects performance. How about switching the function on / off and implementing with default off? Do you care about performance degradation during bottleneck investigation? When investigating the bottleneck, I think that it is better to find the cause even at the expense of performance. # If you can detect with high frequency sampling, I think that it depends on whether the sampling or the function(the number of wait events and wait time) is high load. Since I am a DB beginner, I think that it is saying strange things. I am glad if you tell me. - Naoki Yotsunaga
Re: libpq compression
On 01.10.2018 09:49, Michael Paquier wrote: On Mon, Aug 20, 2018 at 06:00:39PM +0300, Konstantin Knizhnik wrote: New version of the patch is attached: I removed the -Z options from pgbench and psql and added a check that server and client are implementing the same compression algorithm. The patch had no reviews, and does not apply anymore, so it is moved to next CF with waiting on author as status. -- Michael Rebased version of the patch is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/configure b/configure index 0448c6b..790ac2e 100755 --- a/configure +++ b/configure @@ -700,6 +700,7 @@ ELF_SYS EGREP GREP with_zlib +with_zstd with_system_tzdata with_libxslt with_libxml @@ -864,6 +865,7 @@ with_libxml with_libxslt with_system_tzdata with_zlib +with_zstd with_gnu_ld enable_largefile enable_float4_byval @@ -8377,6 +8379,86 @@ fi # +# ZStd +# + + + +# Check whether --with-zstd was given. +if test "${with_zstd+set}" = set; then : + withval=$with_zstd; + case $withval in +yes) + ;; +no) + : + ;; +*) + as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5 + ;; + esac + +else + with_zstd=no + +fi + + + + +if test "$with_zstd" = yes ; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5 +$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; } +if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_check_lib_save_LIBS=$LIBS +LIBS="-lzstd $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply.
*/ +#ifdef __cplusplus +extern "C" +#endif +char ZSTD_compress (); +int +main () +{ +return ZSTD_compress (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + ac_cv_lib_zstd_ZSTD_compress=yes +else + ac_cv_lib_zstd_ZSTD_compress=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5 +$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; } +if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then : + cat >>confdefs.h <<_ACEOF +#define HAVE_LIBZSTD 1 +_ACEOF + + LIBS="-lzstd $LIBS" + +else + as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5 +fi + +fi + + + +# # Elf # diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 06d909e..45bb061 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1135,6 +1135,17 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname + + compression + + +Request compression of libpq traffic. If the server supports compression, then all libpq messages sent both from client to server and +vice versa will be compressed. Right now the compression algorithm is hardcoded: it is either zlib (default) or zstd (if Postgres was +configured with the --with-zstd option). In both cases streaming mode is used. + + + + client_encoding diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index f0b2145..2330e54 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -92,6 +92,15 @@ such as COPY. + +It is possible to compress protocol data to reduce traffic and speed up client-server interaction. +Compression is especially useful for importing/exporting data to/from the database using the COPY command +and for replication (both physical and logical). Compression can also reduce server response time +for queries requesting larger amounts of data (for example returning JSON, BLOBs, text, ...)
+Right now the compression algorithm is hardcoded: it is either zlib (default) or zstd (if Postgres was +configured with the --with-zstd option). In both cases streaming mode is used. + + Messaging Overview @@ -263,6 +272,18 @@ + CompressionOk + + + The server acknowledges using compression for the client-server communication protocol. + Compression can be requested by the client by including the "compression" option in the connection string. + Right now the compression algorithm is hardcoded, but in the future client and server may negotiate to + choose a proper compression algorithm. + + + + + AuthenticationOk @@ -3398,6 +3419,43 @@ + + +CompressionOk (B) + + + + + + + +Byte1('z') + + + + Acknowledge use of compression for protocol data. After receiving this message both server and client are switched to compression mode + and exchange compressed messages. + + + +
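The "streaming mode" that both documentation hunks mention can be illustrated outside libpq with zlib's sync-flush mechanism. The following is a sketch in plain Python, not the patch's C implementation: one compressor/decompressor pair lives for the whole connection, and a sync flush after every message makes the bytes emitted so far independently decodable, so the peer never has to wait for the stream to end.

```python
import zlib

# One long-lived pair per connection direction, as in a streaming protocol.
comp = zlib.compressobj()
decomp = zlib.decompressobj()

def compress_message(msg: bytes) -> bytes:
    """Compress one protocol message into the shared stream and flush it
    so the receiver can decode this message immediately."""
    return comp.compress(msg) + comp.flush(zlib.Z_SYNC_FLUSH)

def decompress_chunk(chunk: bytes) -> bytes:
    return decomp.decompress(chunk)

# Redundant COPY-style traffic compresses very well:
messages = [b"D" + b"row data " * 200, b"D" + b"row data " * 200]
wire = [compress_message(m) for m in messages]

assert [decompress_chunk(c) for c in wire] == messages
print([len(m) for m in messages], "->", [len(w) for w in wire])
```

Sharing one stream across messages lets later messages back-reference earlier data, which matters for short, repetitive protocol messages; compressing each message with a fresh dictionary would do much worse.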
Re: [Proposal] Add accumulated statistics for wait event
On Thu, Oct 04, 2018 at 09:32:37AM +, Phil Florent wrote: > I am a DB beginner, so please tell me. It says that you can find > events that are bottlenecks in sampling, but as you saw above, you can > not find events shorter than the sampling interval, right? Yes, which is why the fix would be as simple as making the interval shorter, though not so short that it bloats the amount of information fetched, which then needs to be stored and afterwards (perhaps) processed for analysis. This gets rather close to signal processing. A simple image: assuming that event A happens 100 times in an interval of 1s, and event B only once in the same interval of 1s, then if the snapshot interval is only 1s, in the worst case A would be treated as an equal of B, which would be wrong. -- Michael signature.asc Description: PGP signature
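Michael's image can be checked with a toy timeline simulation (plain Python; the event names, durations and the 100:1 ratio are all invented for illustration). A snapshot interval as long as the whole window misses the short events entirely, while a sufficiently fine polling interval recovers the relative time shares of A and B:

```python
import random

def build_timeline(n_a=100, dur=0.001, total=1.0, seed=7):
    """A one-second timeline: event A is active in n_a short slices,
    event B in a single slice of the same duration."""
    rng = random.Random(seed)
    a = [(s, s + dur) for s in (rng.uniform(0, total - dur) for _ in range(n_a))]
    b = [(0.5, 0.5 + dur)]
    return {"A": a, "B": b}

def poll(events, n_samples, total=1.0):
    """Take n_samples equally spaced snapshots and count at how many of
    them each event was active -- the pg_stat_activity-style approach."""
    interval = total / n_samples
    hits = {name: 0 for name in events}
    for k in range(n_samples):
        t = k * interval
        for name, slices in events.items():
            if any(s <= t < e for s, e in slices):
                hits[name] += 1
    return hits

events = build_timeline()
print(poll(events, n_samples=1))      # a single 1 s snapshot misses both events
print(poll(events, n_samples=10000))  # 0.1 ms polling recovers roughly 100:1
```

The flip side is that the fine poller takes 10,000 samples instead of one, which is exactly the storage and processing trade-off being discussed.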
Re: [HACKERS] Optional message to user when terminating/cancelling backend
On Thu, Oct 04, 2018 at 10:06:06AM +0200, Daniel Gustafsson wrote: > Yes, you are correct, the signalfuncs.h includes in 0001 are a rebase error > from when I renamed the file. They are not present in the v15 patch but got > introduced in v16 when I clearly wasn’t caffeinated enough to rebase. No problem, thanks for confirming. I have worked a bit more on the thing, reducing the headers of the new file to a bare minimum, and I have committed 0001. -- Michael signature.asc Description: PGP signature
Re: Function to promote standby servers
Michael Paquier wrote: > > In that vein, I propose a function pg_promote() to promote > > physical standby servers. > > No fundamental issues from me regarding the concept of being able to > trigger a promotion remotely, so +1. Do we want this capability as well > for fallback_promote? My gut tells me no, and that we ought to drop > this option at some point in the future, still that's worth mentioning. > > You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c > could use it. Good, idea; updated patch attached. Yours, Laurenz Albe From 63ba6c6f8088bea138e924b4180a08086d339eb5 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Thu, 4 Oct 2018 13:24:03 +0200 Subject: [PATCH] Add pg_promote() to promote standby servers --- doc/src/sgml/func.sgml | 20 +++ doc/src/sgml/high-availability.sgml| 2 +- doc/src/sgml/recovery-config.sgml | 3 +- src/backend/access/transam/xlog.c | 2 -- src/backend/access/transam/xlogfuncs.c | 48 ++ src/include/access/xlog.h | 4 +++ src/include/catalog/pg_proc.dat| 4 +++ 7 files changed, 79 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 4331bebc96..7beeaeacde 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18596,6 +18596,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_terminate_backend + +pg_promote + signal @@ -18655,6 +18658,16 @@ SELECT set_config('log_statement_stats', 'off', false); however only superusers can terminate superuser backends. + + +pg_promote() + + boolean + Promote a physical standby server. This function can only be +called by superusers and will only have an effect when run on +a standby server. + + @@ -18692,6 +18705,13 @@ SELECT set_config('log_statement_stats', 'off', false); subprocess. + +pg_promote() sends a signal to the server that causes it +to leave standby mode. Since sending signals is by nature asynchronous, +successful execution of the function does not guarantee that the server has +already been promoted. 
+ + diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 8cb77f85ec..6014817d9e 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -1472,7 +1472,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)' To trigger failover of a log-shipping standby server, -run pg_ctl promote or create a trigger +run pg_ctl promote, call pg_promote(), or create a trigger file with the file name and path specified by the trigger_file setting in recovery.conf. If you're planning to use pg_ctl promote to fail over, trigger_file is diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 92825fdf19..d06cd0b08e 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -439,7 +439,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows Specifies a trigger file whose presence ends recovery in the standby. Even if this value is not set, you can still promote - the standby using pg_ctl promote. + the standby using pg_ctl promote or calling + pg_promote(). This setting has no effect if standby_mode is off. 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3025d0badb..bb422e9a13 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version; /* File path names (all relative to $PGDATA) */ #define RECOVERY_COMMAND_FILE "recovery.conf" #define RECOVERY_COMMAND_DONE "recovery.done" -#define PROMOTE_SIGNAL_FILE "promote" -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote" /* User-settable parameters */ diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 9731742978..ab37b03dee 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -35,6 +35,7 @@ #include "storage/fd.h" #include "storage/ipc.h" +#include /* * Store label file and tablespace map during non-exclusive backups. @@ -697,3 +698,50 @@ pg_backup_start_time(PG_FUNCTION_ARGS) PG_RETURN_DATUM(xtime); } + +/* + * Promote a standby server. + * + * A result of "true" means that promotion has been initiated. + */ +Datum +pg_promote(PG_FUNCTION_ARGS) +{ + FILE *promote_file; + + if (!superuser()) + ereport(ERROR, +(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to promote standby servers"))); + + if (!RecoveryInProgress() || !StandbyMode) + PG_RETURN_BOOL(false); + + /* crea
pg_upgrade failed with ERROR: null relpartbound for relation 18159 error.
Hi, I am getting ERROR: null relpartbound for relation 18159 while doing pg_upgrade from v11 to v11/master. -- user-defined operator class in partition key CREATE FUNCTION my_int4_sort(int4,int4) RETURNS int LANGUAGE sql AS $$ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$; CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING btree AS OPERATOR 1 < (int4,int4), OPERATOR 2 <= (int4,int4), OPERATOR 3 = (int4,int4), OPERATOR 4 >= (int4,int4), OPERATOR 5 > (int4,int4), FUNCTION 1 my_int4_sort(int4,int4); CREATE TABLE partkey_t (a int4) PARTITION BY RANGE (a test_int4_ops); CREATE TABLE partkey_t_1 PARTITION OF partkey_t FOR VALUES FROM (0) TO (1000); INSERT INTO partkey_t VALUES (100); INSERT INTO partkey_t VALUES (200); --ran pg_upgrade failed with below error. pg_restore: creating TABLE "public.partitioned" pg_restore: creating TABLE "public.partkey_t" pg_restore: creating TABLE "public.partkey_t_1" pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 485; 1259 18159 TABLE partkey_t_1 edb pg_restore: [archiver (db)] could not execute query: ERROR: null relpartbound for relation 18159 CONTEXT: SQL function "my_int4_sort" Command was: -- For binary upgrade, must preserve pg_type oid SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('18161'::pg_catalog.oid); -- For binary upgrade, must preserve pg_type array oid SELECT pg_catalog.binary_upgrade_set_next_array_pg_type_oid('18160'::pg_catalog.oid); -- For binary upgrade, must preserve pg_class oids SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('18159'::pg_catalog.oid); CREATE TABLE "public"."partkey_t_1" ( "a" integer ); -- For binary upgrade, recreate inherited column. UPDATE pg_catalog.pg_attribute SET attislocal = false WHERE attname = 'a' AND attrelid = '"public"."partkey_t_1"'::pg_catalog.regclass; -- For binary upgrade, set up inheritance and partitioning this way. 
ALTER TABLE ONLY "public"."partkey_t" ATTACH PARTITION "public"."partkey_t_1" FOR VALUES FROM (0) TO (1000); -- For binary upgrade, set heap's relfrozenxid and relminmxid UPDATE pg_catalog.pg_class SET relfrozenxid = '1915', relminmxid = '1' WHERE oid = '"public"."partkey_t_1"'::pg_catalog.regclass; Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation
Possible important data point on stats collection, wondering about possible improvement
Hi; System is PostgreSQL 10.5, all partitioning done the old way (via inheritance). Last month we had some performance issues caused by statistics being out of date and the planner choosing the wrong index for a large number of queries. The proximal fix was to increase the stats target from 1000 to 1 and analyze which prevents the problem from continuing to manifest. Looking back at the graphs though I notice a number of things which make me think that maybe there might be ways of improving the situation without increasing stats targets. I wanted to bring up the question here and see if there were opportunities to work together on improving the situation. What I measured was the difference between the maximum value in the statistics histogram and the maximum value in the table. The relevant field is a timestamp field, so calculating lag is straight-forward and gives us a percentage of the table that is outside stats. What I noticed was that the major analytics tables seem to fall into two groups: 1. One group has actual clear sampling issues as evidenced by the fact that the difference in values swung wildly around. One of these tables had 52 million rows spread between two partitions (1M and 51M respectively). On this group I understand the need to set stats targets up. 2. A second group saw a more mild increase in lag between max value recorded in stats and max value in the db. However what struck me about this was that the lag seemed fairly linear. In other words, the fluctuations were within a relatively narrow range and the lag seemed to grow linearly with time. These tables were actually larger (one typical one was 60M rows split between a partition of 51M rows and one of 9M rows). The workload for the database in question is heavily update driven so I would expect fewer sampling bias problems than might happen for insert-only workloads. The second case puzzles me. 
I have been looking carefully into how the stats collector works and I cannot find anything that could account for a near-linear increase in statistics missing recent data. What starts out in a 60M row table with default stats targets (1000) ends up going about 10% off over time. What I am wondering is whether it would make any sense whatsoever to expand the stats to include min and max values found in a scan, or whether it would make more sense to try to help the planner extrapolate from existing stats in a more efficient way. -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
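For what it's worth, the near-linear lag in the second group of tables is what fixed-size sampling predicts: with N rows and a sample of k, the expected distance from the sample maximum to the true maximum is roughly N/(k+1) rows, so the absolute lag grows linearly with table size while the sampled fraction shrinks. A toy model (plain Python, not the actual ANALYZE machinery; row numbers stand in for an ever-increasing timestamp column):

```python
import random

def histogram_max_lag(n_rows, sample_size=1000, seed=0):
    """Draw an ANALYZE-style fixed-size random sample from rows numbered
    0..n_rows-1 and record the sample max as the histogram's upper bound;
    the lag is its distance from the true max."""
    rng = random.Random(seed)
    sample_max = max(rng.sample(range(n_rows), sample_size))
    return (n_rows - 1) - sample_max

def average_lag(n_rows, trials=50):
    return sum(histogram_max_lag(n_rows, seed=s) for s in range(trials)) / trials

# With the sample size fixed at 1000, the expected lag is about
# n_rows / 1001 rows: ten times the table, roughly ten times the lag.
for n in (1_000_000, 10_000_000):
    print(n, average_lag(n))
```

Raising the statistics target shrinks the lag proportionally, which matches the observed effect of a larger target, while storing the true min/max seen during the scan would attack the problem without a bigger sample.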
Poor plan when using EXISTS in the expression list
Hello Our developers' ORM (Django's) sadly cannot use EXISTS in the WHERE clause without also having it in the expression part of the SELECT statement. I was expecting it to create queries performing a bit worse than queries without this useless expression, but it turns out this triggers an extremely poor plan, with an additional Seq Scan of the table referenced in EXISTS. Thus the query select a.*, exists (select * from b where a_id = a.id) from a where exists (select * from b where a_id = a.id); can be orders of magnitude slower than select a.* from a where exists (select * from b where a_id = a.id); This has been reproduced on PostgreSQL 9.6 and 11 beta4. Example : test=> create table a (id serial primary key, b text); CREATE TABLE test=> create table b (id serial primary key, a_id integer not null references a(id), c text); CREATE TABLE test=> explain select a.* from a where exists (select * from b where a_id = a.id); QUERY PLAN --- Hash Join (cost=29.50..62.60 rows=635 width=36) Hash Cond: (a.id = b.a_id) -> Seq Scan on a (cost=0.00..22.70 rows=1270 width=36) -> Hash (cost=27.00..27.00 rows=200 width=4) -> HashAggregate (cost=25.00..27.00 rows=200 width=4) Group Key: b.a_id -> Seq Scan on b (cost=0.00..22.00 rows=1200 width=4) (7 rows) test=> explain select a.*, exists (select * from b where a_id = a.id) from a; QUERY PLAN - Seq Scan on a (cost=0.00..5314.37 rows=1270 width=37) SubPlan 1 -> Seq Scan on b (cost=0.00..25.00 rows=6 width=0) Filter: (a_id = a.id) SubPlan 2 -> Seq Scan on b b_1 (cost=0.00..22.00 rows=1200 width=4) (6 rows) test=> explain select a.*, exists (select * from b where a_id = a.id) from a where exists (select * from b where a_id = a.id); QUERY PLAN --- Hash Join (cost=29.50..2708.43 rows=635 width=37) Hash Cond: (a.id = b.a_id) -> Seq Scan on a (cost=0.00..22.70 rows=1270 width=36) -> Hash (cost=27.00..27.00 rows=200 width=4) -> HashAggregate (cost=25.00..27.00 rows=200 width=4) Group Key: b.a_id -> Seq Scan on b (cost=0.00..22.00
rows=1200 width=4) SubPlan 1 -> Seq Scan on b b_1 (cost=0.00..25.00 rows=6 width=0) Filter: (a_id = a.id) SubPlan 2 -> Seq Scan on b b_2 (cost=0.00..22.00 rows=1200 width=4) (12 rows) Thanks Pierre
Re: pg_ls_tmpdir()
Bossart, Nathan wrote: > >> AFAICT the cleanest way to do this in system_views.sql is to hard-code > >> the pg_default tablespace OID in the DEFAULT expression. So, it might > >> be best to use the two function approach if we want pg_ls_tmpdir() to > >> default to the pg_default tablespace. > > > > That would be pretty bletcherous, so +1 for just making two C functions. > > Alright, here's an updated patch. Looks good; marking as "ready for committer". Yours, Laurenz Albe
RE: [Proposal] Add accumulated statistics for wait event
Hi, It's the same logic as with any polling system. An integration calculation using the Monte Carlo method with only a few points won't be accurate enough and can even be completely wrong. Polling is OK to troubleshoot a problem on the fly, but 2 points are not enough. A few seconds are needed to obtain good enough data, e.g. 5-10 seconds of polling with a 0.1 => 0.01s interval between two queries of the activity. Polling a few seconds while the user is waiting is normally enough to say whether a significant part of the waits are on the database. It's very important to know that. With 1 hour of accumulated statistics, a DBA will always see something to fix. But if the user waits 10 seconds on a particular screen and 1 second is spent on the database, it often won't directly help. Polling gives great information with PostgreSQL 10, but it was already useful to catch top queries etc. in older versions. I always check whether activity is adequately reported by my tool using known cases; I want to be sure it will report things adequately in real-world troubleshooting sessions. Sometimes there are bugs in my tool; once there was an issue with postgres (pgstat_report_activity() was not called by workers in parallel index creation). Best regards Phil De : Michael Paquier Envoyé : jeudi 4 octobre 2018 12:58 À : Phil Florent Cc : Yotsunaga, Naoki; Tomas Vondra; pgsql-hackers@lists.postgresql.org Objet : Re: [Proposal] Add accumulated statistics for wait event On Thu, Oct 04, 2018 at 09:32:37AM +, Phil Florent wrote: > I am a DB beginner, so please tell me. It says that you can find > events that are bottlenecks in sampling, but as you saw above, you can > not find events shorter than the sampling interval, right? Yes, which is why it would be as simple as making the interval shorter, still not too short so as it bloats the amount of information fetched which needs to be stored and afterwards (perhaps) treated for analysis. This gets rather close to signal processing.
A simple image: for example, assuming that event A happens 100 times in an interval of 1 s, and event B only once in the same interval of 1 s, then if the snapshot interval is also 1 s, in the worst case A would be treated as an equal of B, which would be wrong. -- Michael
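The A/B example above can be put into numbers. Below is a minimal sketch of the argument as an idealized model (not real pg_stat_activity output; the event names and figures are just the ones from the mail) comparing what interval sampling is expected to see against what accumulated counters record:

```python
# Idealized model of the A/B example: in a 1 s window, event A waits
# 100 times for 1 ms each, event B waits once for 1 ms.

window = 1.0          # seconds observed
a_waits, b_waits = 100, 1
wait_len = 0.001      # 1 ms per wait

# Accumulated statistics record every wait exactly.
accumulated = {"A": a_waits, "B": b_waits}

# A sampler taking n snapshots catches an event with probability
# (time spent in that event) / window per snapshot.
def expected_hits(n_samples):
    return {
        "A": n_samples * (a_waits * wait_len) / window,
        "B": n_samples * (b_waits * wait_len) / window,
    }

# Even 100 snapshots (a 0.01 s interval, the short end of what the
# mail suggests) expect to catch B only 0.1 times: B is usually
# invisible to the sampler, while the counters see it exactly once.
hits = expected_hits(100)
print(accumulated, hits)
```

This is why shortening the interval only helps up to a point: the expected hit count for a rare, short event scales linearly with the sample count, so catching it reliably needs either very aggressive polling or accumulation.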
Re: SerializeParamList vs machines with strict alignment
Amit Kapila writes: > All the affected members (gharial, chipmunk, anole) are happy. It > feels good to see chipmunk becoming green after so many days. Yup. I've marked this item fixed on the open-items list. regards, tom lane
Re: executor relation handling
Amit Langote writes: > On 2018/10/04 5:16, Tom Lane wrote: >> I think that we ought to adjust parallel query to insist that children >> do take locks, and then revert the IsParallelWorker() exceptions I made >> here. > Maybe I'm missing something here, but isn't the necessary adjustment just > that the relations are opened with locks if inside a parallel worker? Yeah, that's one plausible way to fix it. I hadn't wanted to prejudge the best way before we finish the other changes, though. > I've rebased the remaining patches. I broke down one of the patches into > 2 and re-ordered the patches as follows: Thanks, will start looking at these today. regards, tom lane
Re: Poor plan when using EXISTS in the expression list
On Thu, 4 Oct 2018 at 13:11, Pierre Ducroquet wrote:
> Our developers' ORM (Django's) sadly cannot use EXISTS in the WHERE
> clauses without having it in the expression part of the SELECT statement.

I don't know if this will be helpful to you (and I appreciate there's still the underlying PG issue), but there's a suggestion here that you can work around this using .extra(): https://stackoverflow.com/a/38880144/321161

Geoff
Re: Multiple primary key on partition table?
On 2018-Oct-01, Rajkumar Raghuwanshi wrote:
> On Tue, Sep 18, 2018 at 11:20 AM amul sul wrote:
>> Here is the complete patch that proposes the aforesaid fix with a regression test.
> Thanks, this worked for me.

Yeah, looks good to me, pushed. I added one more regression test to ensure that the PRIMARY KEY clause in the partition is still accepted if the parent does not have one. Thanks

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
I admit I'm surprised that your patch fixes the bug. sql_drop was added before the command-stashing for pg_event_trigger_ddl_commands was added, and sql_drop only processes objects from the list passed to performMultipleDeletions, so adding the EventTriggerAlterTableStart() / End() calls should not affect it ... evidently I must be missing something here. Still looking.

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Secondary index access optimizations
On 04.10.2018 12:19, David Rowley wrote:
> On 4 October 2018 at 22:11, Konstantin Knizhnik wrote:
>> On 04.10.2018 06:19, David Rowley wrote:
>>> Please, can you also add a test which tests this code with a
>>> partition whose columns are in a different order than its parent's?
>>> Having an INT and a TEXT column is best, as if the translations are
>>> done incorrectly it's likely to result in a crash which will alert us
>>> to the issue. It would be good to also verify that the test causes a
>>> crash if you temporarily put the code back to using the untranslated
>>> qual. Thanks for working on this.
>> Thank you very much for detecting and fixing this problem.
>> I have checked that all changes in plan caused by this fix are correct.
>> Updated version of the patch is attached.
> Can you add the test that I mentioned above?

Will the following test be enough:

-- check that columns of the parent table are correctly mapped to a child
-- partition even if their order doesn't match
create table paren (a int, b text) partition by range(a);
create table child_1 partition of paren for values from (0) to (10);
create table child_2 (b text, a int);
alter table paren attach partition child_2 for values from (10) to (20);
insert into paren values (generate_series(0,19), generate_series(100,119));
explain (costs off) select * from paren where a between 0 and 9;
explain (costs off) select * from paren where a between 10 and 20;
explain (costs off) select * from paren where a >= 5;
explain (costs off) select * from paren where a <= 15;
select count(*) from paren where a >= 5;
select count(*) from paren where a < 15;
drop table paren cascade;

If so, then the updated patch is attached.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 5f74d3b..b628ac7 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -37,6 +37,7 @@ #include "optimizer/paths.h" #include "optimizer/plancat.h" #include "optimizer/planner.h" +#include "optimizer/predtest.h" #include "optimizer/prep.h" #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" @@ -1052,6 +1053,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, /* Restriction reduces to constant TRUE, so drop it */ continue; } + + /* + * For partitions, we may be able to eliminate some quals if + * they're implied by the partition bound. + */ + if (childrel->partition_qual != NIL) + { +Node *checkqual = copyObject(childqual); + +/* + * Since the partition_qual has all Vars stored as varno=1, we + * must convert all Vars of the childqual to have their varnos + * set to 1 so that predicate_implied_by can properly match + * implied quals. 
+ */ +ChangeVarNodes(checkqual, childrel->relid, 1, 0); + +if (predicate_implied_by(list_make1(checkqual), childrel->partition_qual, false)) + continue; + } + /* might have gotten an AND clause, if so flatten it */ foreach(lc2, make_ands_implicit((Expr *) childqual)) { diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 8369e3a..8cd9b06 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -450,7 +450,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, */ if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) set_relation_partition_info(root, rel, relation); - + else if (relation->rd_rel->relispartition) + rel->partition_qual = RelationGetPartitionQual(relation); heap_close(relation, NoLock); /* diff --git a/src/common/zpq_stream.c b/src/common/zpq_stream.c new file mode 100644 index 000..afd42e9 --- /dev/null +++ b/src/common/zpq_stream.c @@ -0,0 +1,386 @@ +#include "postgres_fe.h" +#include "common/zpq_stream.h" +#include "c.h" +#include "pg_config.h" + +#if HAVE_LIBZSTD + +#include +#include + +#define ZPQ_BUFFER_SIZE (8*1024) +#define ZSTD_COMPRESSION_LEVEL 1 + +struct ZpqStream +{ + ZSTD_CStream* tx_stream; + ZSTD_DStream* rx_stream; + ZSTD_outBuffer tx; + ZSTD_inBuffer rx; + size_t tx_not_flushed; /* Amount of datas in internal zstd buffer */ + size_t tx_buffered;/* Data which is consumed by zpq_read but not yet sent */ + zpq_tx_functx_func; + zpq_rx_funcrx_func; + void* arg; + char const*rx_error;/* Decompress error message */ + size_t tx_total; + size_t tx_total_raw; + size_t rx_total; + size_t rx_total_raw; + char tx_buf[ZPQ_BUFFER_SIZE]; + char rx_buf[ZPQ_BUFFER_SIZE]; +}; + +ZpqStream* +zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg) +{ + ZpqStream* zs = (ZpqStream*)malloc(sizeof(ZpqStream)); + zs->tx_stream = ZSTD_createCStream(); + ZSTD_initCStream(zs->tx_stream, ZSTD_COMPRESSION_LEVEL); + zs
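The core idea of the patch above — dropping an inherited qual when the child's partition bound already implies it — can be illustrated with a toy model. This is a hand-rolled sketch, not the planner's predicate_implied_by() machinery; the function name and the single-column range representation are invented for illustration:

```python
# Toy version of the optimization in set_append_rel_size(): a qual on
# column "a" can be dropped for a child whose range-partition bound
# already guarantees it.  Only single-column integer range bounds and
# three operators are handled here.

def bound_implies(bound, qual):
    """bound = (lo, hi) meaning lo <= a < hi; qual = (op, const) on a."""
    lo, hi = bound
    op, c = qual
    if op == ">=":
        return lo >= c       # every row already has a >= lo >= c
    if op == "<":
        return hi <= c       # every row already has a < hi <= c
    if op == "<=":
        return hi <= c + 1   # integer column: a < hi <= c+1  =>  a <= c
    return False             # anything else: keep the qual

# child_2 holds rows with 10 <= a < 20, as in the test above.
child_2 = (10, 20)
assert bound_implies(child_2, (">=", 5))       # qual can be dropped
assert not bound_implies(child_2, ("<=", 15))  # qual must stay
assert bound_implies(child_2, ("<", 25))       # qual can be dropped
```

This matches the expected plans in the test: `a >= 5` should disappear from child_2's scan qual, while `a <= 15` must remain.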
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
> On Wed, 3 Oct 2018 at 09:53, Michael Paquier wrote:
>> On Fri, Sep 28, 2018 at 12:17:00PM +0900, Michael Paquier wrote:
>> I think that Alvaro should definitely look at this patch to be sure, or
>> I could do it, but I would need to spend way more time on this and check
>> event trigger interactions.
>>
>> Anyway, I was struggling a bit regarding the location where to add a
>> regression test. event_trigger.sql makes the most sense, but in tests
>> for drops the objects are created before the event trigger is defined,
>> so that would need to move around so that the original problem is
>> reproducible. Perhaps you have an idea for that?
>
> Okay. I have spent more time on this issue, and I have been able to
> integrate a test into the existing event_trigger.sql which is able to
> reproduce the reported failure. Attached is what I am finishing with.
>
> I still want to do more testing on it, and the day is ending here.
>
> Thoughts?

Sorry, I couldn't answer your previous message, since I was away. So far I don't see any problems with your proposed patch.

> On Thu, 4 Oct 2018 at 17:22, Alvaro Herrera wrote:
>> I admit I'm surprised that your patch fixes the bug. sql_drop was added
>> before the command-stashing for pg_event_trigger_ddl_commands was
>> added, and sql_drop only processes objects from the list passed to
>> performMultipleDeletions, so adding the EventTriggerAlterTableStart() /
>> End() calls should not affect it ... evidently I must be missing
>> something here.
>>
>> Still looking.

I also find another part of this problem strange, namely why we ended up doing AlterTableInternal at all, since I assumed that after MergeAttributes all attnotnull should be merged with OR.
But, for example, these would work:

CREATE TABLE collections_list_1 PARTITION OF collections_list
    FOR VALUES IN (1);

CREATE TABLE collections_list_1 PARTITION OF collections_list
    (key, ts, collection_id not null, value) FOR VALUES IN (1);

Looks like in MergeAttributes, at the branch:

    if (is_partition && list_length(saved_schema) > 0)

we override the NOT NULL property of the ColumnDef:

    if (coldef->is_from_parent)
    {
        coldef->is_not_null = restdef->is_not_null;

It looks a bit confusing, so I wonder if that's how it should be?
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On 2018-Oct-04, Alvaro Herrera wrote:
> I admit I'm surprised that your patch fixes the bug. sql_drop was added
> before the command-stashing for pg_event_trigger_ddl_commands was added,
> and sql_drop only processes objects from the list passed to
> performMultipleDeletions, so adding the EventTriggerAlterTableStart() /
> End() calls should not affect it ... evidently I must be missing
> something here.

I think the explanation for this is that the problem has nothing to do with sql_drop per se -- it's only that having a sql_drop trigger causes the event trigger stuff to get invoked, and the bogus code involving ddl_command_end (the one that's affected by the EventTriggerAlterTableStart dance) is what crashes.

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: TupleTableSlot abstraction
I have only done the below two changes yet. After doing that and rebasing with latest master, in the regression I got crashes, and I suspect the reason being that I have used Virtual tuple slot for the destination slot of execute_attr_map_slot(). I am analyzing it. I am anyway attaching the patches (v12) to give you an idea of how I have handled the below two items. On Wed, 26 Sep 2018 at 05:09, Andres Freund wrote: > > On 2018-09-04 18:35:34 +0530, Amit Khandekar wrote: > > The attached v11 tar has the above set of changes. > > - I've pushed 0003 - although that commit seems to have included a lot > of things unrelated to the commit message. I guess two patches have > accidentally been merged? Could you split out the part of the changes > that was mis-squashed? Yeah, it indeed looks like it had unrelated things, mostly the changes that moved the slot attribute functions into execTuples.c . I have included this change as the very first patch in the patch series. >> From 280eac5c6758061d0a46b7ef9dd324e9a23226b5 Mon Sep 17 00:00:00 2001 >> From: Ashutosh Bapat >> Date: Fri, 31 Aug 2018 10:53:42 +0530 >> Subject: [PATCH 08/14] Change tuple table slot creation routines to suite >> tuple table slot abstraction >> >> This change requires ExecInitResultTupleSlotTL, ExecInitScanTupleSlot, >> ExecCreateScanSlotFromOuterPlan, ExecInitNullTupleSlot, >> ExecInitExtraTupleSlot, MakeTupleTableSlot, ExecAllocTableSlot to >> accept TupleTableSlotType as a new parameter. Change all their calls. >> >> Ashutosh Bapat and Andres Freund >> >> This by itself won't compile. Neither the tuple table slot abstraction >> patch would compile and work without this change. Both of those need >> to be committed together. > > I don't like this kind of split - all commits should individually > compile. I think we should instead introduce dummy / empty structs for > &TTSOpsHeapTuple etc, and add the parameters necessary to pass them > through. 
> And then move this patch to *before* the "core" abstract slot patch.
> That way every commit, but the super verbose stuff, is still split out.

Done. Moved this patch before the core one. In this patch, I just declared the global variables of type TupleTableSlotOps, without initializing them. I have tried to make sure the individual patches compile successfully by shuffling the changes around to the right patches, but there is one particular patch that still gives an error. Will fix that later.

I will handle the other review comments in the next patch series.

pg_abstract_tts_patches_v12.tar.gz
Description: GNU Zip compressed data
Re: Early WIP/PoC for inlining CTEs
On Thu, Oct 04, 2018 at 11:22:32AM +0200, Andreas Karlsson wrote: > On 10/03/2018 05:57 PM, David Fetter wrote: > >Is there any meaningful distinction between "inlining," by which I > >mean converting to a subquery, and predicate pushdown, which > >would happen at least for a first cut, at the rewrite stage? > > Sorry, but I do not think I understand your question. The ability to push > down predicates is just one of the potential benefits from inlining. Oracle appears to have such a distinction, and it appears we don't. https://medium.com/@hakibenita/be-careful-with-cte-in-postgresql-fca5e24d2119 Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
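The practical difference that inlining (converting the CTE to a subquery) makes is mostly that predicates can reach the scan. A rough sketch of the two behaviors as a toy model of row flow (not PostgreSQL's planner; the table and predicate are invented):

```python
# Toy model of a query like:
#   WITH cte AS (SELECT * FROM t) SELECT * FROM cte WHERE x < 10
table = list(range(1000))
pred = lambda x: x < 10

# Materialized CTE (PostgreSQL's historical "optimization fence"):
# the CTE produces every row first, and the predicate filters afterwards.
cte_rows = [x for x in table]                    # all 1000 rows materialized
result_fenced = [x for x in cte_rows if pred(x)]

# Inlined as a subquery: the predicate is pushed down to the scan,
# so only matching rows ever flow out of it.
result_inlined = [x for x in table if pred(x)]   # only 10 rows produced

assert result_fenced == result_inlined           # same answer either way
assert len(cte_rows) == 1000                     # the cost the fence pays
```

The results are identical (for a side-effect-free CTE); only the amount of intermediate work differs, which is why "inlining" and "predicate pushdown" tend to get discussed together even though they are distinct steps.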
Changes in Brazil DST's period
Today Brazil's president announced that DST for all our time zones will start on November 18th instead of November 4th. There's already a thread at IANA about it: https://mm.icann.org/pipermail/tz/2018-October/026921.html

*Emílio B. Pedrollo*
Full Stack Developer
+55 (55) 99134-7922
Re: executor relation handling
Hi, On 2018-10-03 16:16:11 -0400, Tom Lane wrote: > I wrote: > > Amit Langote writes: > >> Should this check that we're not in a parallel worker process? > > > Hmm. I've not seen any failures in the parallel parts of the regular > > regression tests, but maybe I'd better do a force_parallel_mode > > run before committing. > > In general, I'm not on board with the idea that parallel workers don't > > need to get their own locks, so I don't really want to exclude parallel > > workers from this check. But if it's not safe for that today, fixing it > > is beyond the scope of this particular patch. > > So the place where that came out in the wash is the commit I just made > (9a3cebeaa) to change the executor from taking table locks to asserting > that somebody else took them already. To make that work, I had to make > both ExecOpenScanRelation and relation_open skip checking for lock-held > if IsParallelWorker(). > > This makes me entirely uncomfortable with the idea that parallel workers > can be allowed to not take any locks of their own. There is no basis > for arguing that we have field proof that that's safe, because *up to > now, parallel workers in fact did take their own locks*. And it seems > unsafe on its face, because there's nothing that really guarantees that > the parent process won't go away while children are still running. > (elog(FATAL) seems like a counterexample, for instance.) > I think that we ought to adjust parallel query to insist that children > do take locks, and then revert the IsParallelWorker() exceptions I made > here. I plan to leave that point in abeyance till we've got the rest > of these changes in place, though. The easiest way to do it will > doubtless change once we've centralized the executor's table-opening > logic, so trying to code it right now seems like a waste of effort. I've not really followed this thread, and just caught up to here. It seems entirely unacceptable to not acquire locks on workers to me. 
Maybe I'm missing something, but why do/did the patches in this thread require that / introduce that? We didn't have that kind of concept before, no? The group locking stuff should rely / require that kind of thing, no? Greetings, Andres Freund
Procedure calls are not tracked in pg_stat_user_functions / track_functions
Hi all,

It seems that currently procedures do not get tracked when track_functions is enabled, which means one needs to resort to other workarounds in order to monitor procedure calls/runtime.

To illustrate:

=# SHOW track_functions;
┌─────────────────┐
│ track_functions │
├─────────────────┤
│ all             │
└─────────────────┘
(1 row)

=# CALL abc();
CALL

=# SELECT def();
┌─────┐
│ def │
├─────┤
│     │
└─────┘
(1 row)

=# SELECT * FROM pg_stat_user_functions;
┌─[ RECORD 1 ]┬────────┐
│ funcid      │ 75223  │
│ schemaname  │ public │
│ funcname    │ def    │
│ calls       │ 1      │
│ total_time  │ 3.222  │
│ self_time   │ 3.222  │
└─────────────┴────────┘

Was this intentional, or an oversight?

If welcome, I would be happy to work on a patch. Whilst slightly confusing in terms of naming, we could just track this together with functions, since one can always join with pg_proc to determine whether something is a function or a procedure.

Thanks,
Lukas

-- 
Lukas Fittl
Re: executor relation handling
Andres Freund writes: > I've not really followed this thread, and just caught up to here. It > seems entirely unacceptable to not acquire locks on workers to me. > Maybe I'm missing something, but why do/did the patches in this thread > require that / introduce that? We didn't have that kind of concept > before, no? The group locking stuff should rely / require that kind of > thing, no? I'm possibly confused, but I thought that the design of parallel query involved an expectation that workers didn't need to get their own locks. What we've determined so far in this thread is that workers *do* get their own locks (or did before yesterday), but I'd been supposing that that was accidental not intentional. In any case, I definitely intend that they will be getting their own locks again after the dust has settled. Panic not. regards, tom lane
Re: executor relation handling
Hi, On 2018-10-04 15:27:59 -0400, Tom Lane wrote: > Andres Freund writes: > > I've not really followed this thread, and just caught up to here. It > > seems entirely unacceptable to not acquire locks on workers to me. > > Maybe I'm missing something, but why do/did the patches in this thread > > require that / introduce that? We didn't have that kind of concept > > before, no? The group locking stuff should rely / require that kind of > > thing, no? > > I'm possibly confused, but I thought that the design of parallel query > involved an expectation that workers didn't need to get their own > locks. Not as far as I'm aware of - but I'm not exactly the expert there. There's an exception that some lock classes don't conflict between the leader and the workers - that's group locking (a1c1af2a1f60). But the locks still have to be acquired, and I think it's quite dangerous not to do so. The group locking logic is required because otherwise it'd be trivial to get into deadlocks, and some of the restrictions around parallel query are required to make that safe. > What we've determined so far in this thread is that workers *do* get > their own locks (or did before yesterday), but I'd been supposing that > that was accidental not intentional. I don't think it was accidental. Greetings, Andres Freund
Re: executor relation handling
On 2018-10-04 12:34:44 -0700, Andres Freund wrote: > Hi, > > On 2018-10-04 15:27:59 -0400, Tom Lane wrote: > > Andres Freund writes: > > > I've not really followed this thread, and just caught up to here. It > > > seems entirely unacceptable to not acquire locks on workers to me. > > > Maybe I'm missing something, but why do/did the patches in this thread > > > require that / introduce that? We didn't have that kind of concept > > > before, no? The group locking stuff should rely / require that kind of > > > thing, no? > > > > I'm possibly confused, but I thought that the design of parallel query > > involved an expectation that workers didn't need to get their own > > locks. > > Not as far as I'm aware of - but I'm not exactly the expert > there. There's an exception that some lock classes don't conflict > between the leader and the workers - that's group locking > (a1c1af2a1f60). But the locks still have to be acquired, and I think > it's quite dangerous not to do so. The group locking logic is required > because otherwise it'd be trivial to get into deadlocks, and some of the > restrictions around parallel query are required to make that safe. Re-read docs + code just to make sure. Here's the relevant readme parts: src/backend/access/transam/README.parallel To prevent unprincipled deadlocks when running in parallel mode, this code also arranges for the leader and all workers to participate in group locking. See src/backend/storage/lmgr/README for more details. src/backend/storage/lmgr/README: Group Locking - As if all of that weren't already complicated enough, PostgreSQL now supports parallelism (see src/backend/access/transam/README.parallel), which means that we might need to resolve deadlocks that occur between gangs of related processes rather than individual processes. This doesn't change the basic deadlock detection algorithm very much, but it makes the bookkeeping more complicated. 
We choose to regard locks held by processes in the same parallel group as non-conflicting. This means that two processes in a parallel group can hold a self-exclusive lock on the same relation at the same time, or one process can acquire an AccessShareLock while the other already holds AccessExclusiveLock. This might seem dangerous and could be in some cases (more on that below), but if we didn't do this then parallel query would be extremely prone to self-deadlock. For example, a parallel query against a relation on which the leader already had AccessExclusiveLock would hang, because the workers would try to lock the same relation and be blocked by the leader; yet the leader can't finish until it receives completion indications from all workers. An undetected deadlock results. This is far from the only scenario where such a problem happens. The same thing will occur if the leader holds only AccessShareLock, the worker seeks AccessShareLock, but between the time the leader attempts to acquire the lock and the time the worker attempts to acquire it, some other process queues up waiting for an AccessExclusiveLock. In this case, too, an indefinite hang results. ... So yes, locks are expected to be acquired in workers. Greetings, Andres Freund
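The README's rule — locks held by members of the same parallel group never conflict with each other, while the normal conflict matrix applies across groups — can be sketched as a toy conflict check (a hypothetical model for illustration, not PostgreSQL's lmgr; only two lock modes are shown):

```python
# Minimal model of the group-locking rule quoted above: a lock request
# is blocked only by a conflicting mode held by a *different* group.
CONFLICTS = {
    ("AccessShareLock", "AccessExclusiveLock"),
    ("AccessExclusiveLock", "AccessShareLock"),
    ("AccessExclusiveLock", "AccessExclusiveLock"),
}

def blocks(held, requested):
    """held/requested are (group_id, mode); same group never conflicts."""
    (g1, m1), (g2, m2) = held, requested
    if g1 == g2:
        return False                  # group locking: leader + workers
    return (m1, m2) in CONFLICTS

leader = (42, "AccessExclusiveLock")  # leader's lock
worker = (42, "AccessShareLock")      # worker in the same parallel group
other  = (7,  "AccessShareLock")      # unrelated backend

assert not blocks(leader, worker)     # worker proceeds; no self-deadlock
assert blocks(leader, other)          # unrelated backend still waits
```

This is exactly the README's AccessExclusiveLock example: without the same-group exception the worker would queue behind its own leader and the query would hang; with it, only genuinely foreign backends are blocked.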
Re: executor relation handling
On Thu, Oct 4, 2018 at 3:28 PM Tom Lane wrote: > I'm possibly confused, but I thought that the design of parallel query > involved an expectation that workers didn't need to get their own locks. You are, indeed, confused. A heck of a lot of effort went into making sure that the workers COULD take their own locks, and into trying to make sure that didn't break anything. That effort may or may not have been entirely successful, but I'm pretty sure that having them NOT take locks is going to be a lot worse. > What we've determined so far in this thread is that workers *do* get > their own locks (or did before yesterday), but I'd been supposing that > that was accidental not intentional. Nope, that was intentional. > In any case, I definitely intend that they will be getting their own > locks again after the dust has settled. Panic not. /me unloads metaphorical bazooka. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Poor plan when using EXISTS in the expression list
On Thursday, October 4, 2018 4:46:26 PM CEST Geoff Winkless wrote:
> On Thu, 4 Oct 2018 at 13:11, Pierre Ducroquet wrote:
>> Our developers' ORM (Django's) sadly cannot use EXISTS in the where
>> clauses without having it in the expression part of the SELECT statement.
> I don't know if this will be helpful to you (and I appreciate there's still
> the underlying PG issue), but there's a suggestion here that you can work
> around this using .extra()
> https://stackoverflow.com/a/38880144/321161

Sure, this helps when you know the trap and don't use the EXISTS support in Django, but it still means any developer using Django may create a query that will be a bit slow on small volumes and will blow up on big volumes. We sadly cannot monitor every piece of code written by developers or imported in the dependencies.
Re: Procedure calls are not tracked in pg_stat_user_functions / track_functions
Hi,

On 2018-10-04 12:15:28 -0700, Lukas Fittl wrote:
> Hi all,
>
> It seems that currently procedures do not get tracked when track_functions
> is enabled, which means one needs to resort to other workarounds in order
> to monitor procedure calls/runtime.
>
> To illustrate:
>
> =# SHOW track_functions;
> ┌─────────────────┐
> │ track_functions │
> ├─────────────────┤
> │ all             │
> └─────────────────┘
> (1 row)
>
> =# CALL abc();
> CALL
>
> =# SELECT def();
> ┌─────┐
> │ def │
> ├─────┤
> │     │
> └─────┘
> (1 row)
>
> =# SELECT * FROM pg_stat_user_functions;
> ┌─[ RECORD 1 ]┬────────┐
> │ funcid      │ 75223  │
> │ schemaname  │ public │
> │ funcname    │ def    │
> │ calls       │ 1      │
> │ total_time  │ 3.222  │
> │ self_time   │ 3.222  │
> └─────────────┴────────┘
>
> Was this intentional, or an oversight?
>
> If welcome, I would be happy to work on a patch. Whilst slightly confusing
> in terms of naming, we could just track this together with functions, since
> one can always join with pg_proc to determine whether something is a
> function or a procedure.

Yea, that sounds wrong / not ideal to me. I think we should just fix this; should be easy enough.

- Andres
Re: executor relation handling
Robert Haas writes: > On Thu, Oct 4, 2018 at 3:28 PM Tom Lane wrote: >> What we've determined so far in this thread is that workers *do* get >> their own locks (or did before yesterday), but I'd been supposing that >> that was accidental not intentional. > Nope, that was intentional. Fair enough --- in which case, the patch series we're working on here was broken to suppose that it could get away with removing that. As I said, I'll make sure the locking is back before I finish with this. regards, tom lane
snprintf assert is broken by plpgsql #option dump
Hi

Today plpgsql_check's tests broke against PostgreSQL 12. After the command

create or replace function ml_trg() returns trigger as $$
#option dump
declare
begin
  if TG_OP = 'INSERT' then
    if NEW.status_from IS NULL then
      begin
        -- performance issue only
        select status into NEW.status_from from pa where pa_id = NEW.pa_id;
        -- nonexist target value
        select status into NEW.status_from_xxx from pa where pa_id = NEW.pa_id;
      exception when DATA_EXCEPTION then
        new.status_from := 'DE';
      end;
    end if;
  end if;
  if TG_OP = 'DELETE' then return OLD; else return NEW; end if;
exception when OTHERS then
  NULL;
  if TG_OP = 'DELETE' then return OLD; else return NEW; end if;
end;
$$ language plpgsql;

I got this backtrace:

Program received signal SIGABRT, Aborted.
0x7f2c140e653f in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7f2c140e653f in raise () from /lib64/libc.so.6
#1 0x7f2c140d0895 in abort () from /lib64/libc.so.6
#2 0x008b7903 in ExceptionalCondition (conditionName=conditionName@entry=0xb00d3b "!(strvalue != ((void *)0))", errorType=errorType@entry=0x906034 "FailedAssertion", fileName=fileName@entry=0xb00d30 "snprintf.c", lineNumber=lineNumber@entry=674) at assert.c:54
#3 0x008ff368 in dopr (target=target@entry=0x7fff4ff6aad0, format=0x7f2bfe4b0de3 " fields", format@entry=0x7f2bfe4b0dda "ROW %-16s fields", args=args@entry=0x7fff4ff6ab18) at snprintf.c:674
#4 0x008ff6b7 in pg_vfprintf (stream=, fmt=fmt@entry=0x7f2bfe4b0dda "ROW %-16s fields", args=args@entry=0x7fff4ff6ab18) at snprintf.c:261
#5 0x008ff7ff in pg_printf (fmt=fmt@entry=0x7f2bfe4b0dda "ROW %-16s fields") at snprintf.c:286
#6 0x7f2bfe4a7c67 in plpgsql_dumptree (func=func@entry=0x26811e8) at pl_funcs.c:1676
#7 0x7f2bfe49a136 in do_compile (fcinfo=fcinfo@entry=0x7fff4ff6afa0, procTup=procTup@entry=0x7f2bfe4ba100, function=0x26811e8, function@entry=0x0, hashkey=hashkey@entry=0x7fff4ff6ad20, forValidator=forValidator@entry=true) at pl_comp.c:794
#8 0x7f2bfe49a27a in plpgsql_compile (fcinfo=fcinfo@entry=0x7fff4ff6afa0, forValidator=forValidator@entry=true) at pl_comp.c:224
#9 0x7f2bfe497126 in plpgsql_validator (fcinfo=) at pl_handler.c:504
#10 0x008c03df in OidFunctionCall1Coll (functionId=functionId@entry=13392, collation=collation@entry=0, arg1=arg1@entry=40960) at fmgr.c:1418
#11 0x00563444 in ProcedureCreate (procedureName=, procNamespace=procNamespace@entry=2200, replace=, returnsSet=, returnType=, proowner=16384, languageObjectId=13393, languageValidator=13392, prosrc=0x264feb8 "\n#option

There is a new assert:

    Assert(strvalue != NULL);

Probably all refname usages inside the plpgsql dump functions have this problem. I found two affected parts:

diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index b93f866223..d97997c001 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -1519,7 +1519,7 @@ dump_execsql(PLpgSQL_stmt_execsql *stmt)
 		dump_ind();
 		printf("    INTO%s target = %d %s\n",
 			   stmt->strict ? " STRICT" : "",
-			   stmt->target->dno, stmt->target->refname);
+			   stmt->target->dno, stmt->target->refname ? stmt->target->refname : "null");
 	}
 	dump_indent -= 2;
 }
@@ -1673,7 +1673,7 @@ plpgsql_dumptree(PLpgSQL_function *func)
 					PLpgSQL_row *row = (PLpgSQL_row *) d;
 					int			i;
 
-					printf("ROW %-16s fields", row->refname);
+					printf("ROW %-16s fields", row->refname ? row->refname : "null");
 					for (i = 0; i < row->nfields; i++)
 					{
 						printf(" %s=var %d", row->fieldnames[i],

Regards

Pavel
Re: executor relation handling
Amit Langote writes:
> I've rebased the remaining patches. I broke down one of the patches into
> 2 and re-ordered the patches as follows:
> 0001: introduces a function that opens range table relations and maintains
> them in an array indexed by RT index
> 0002: introduces a new field in EState that's an array of RangeTblEntry
> pointers and revises macros used in the executor that access RTEs to
> return them from the array (David Rowley co-authored this one)

I've pushed 0001 and 0002 with mostly cosmetic changes. One thing I wanted to point out explicitly, though, is that I found this bit of 0002 to be a seriously bad idea:

--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -20,6 +20,7 @@
 #include "executor/instrument.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
+#include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
 #include "utils/hsearch.h"
 #include "utils/queryenvironment.h"

Please do not add #includes of fundamental headers to other fundamental headers without clearing it with somebody. There's little enough structure to our header collection now. I don't want to end up in a situation where effectively the entire header set gets pulled into every .c file, or worse that we have actual reference loops in the headers. (This is not an academic problem; somebody actually created such a loop awhile back. Cleaning it up, by the time we'd recognized the problem, was really painful.)

> 0003: moves result relation and ExecRowMark initialization out of InitPlan
> and into ExecInit* routines of respective nodes

I am finding myself pretty unconvinced by this one; it seems like mostly a random reallocation of responsibility with little advantage.
The particular thing that brought me to a screeching halt was seeing that the patch removed ExecFindRowMark, despite the fact that that's part of our advertised FDW API (see fdwhandler.sgml), and it didn't provide any alternative way for an FDW to find out at runtime whether it's subject to a row locking requirement.

I thought for a minute about just leaving the function in place, but that wouldn't work because both nodeLockRows and nodeModifyTable are written so that they find^H^H^Hbuild their rowmarks only after recursing to initialize their child plan nodes; so a child node that tried to use ExecFindRowMark during ExecInitNode would get the wrong answer. Of course, we could consider changing the order of operations during initialization of those node types, but I'm not really seeing a compelling reason why we should whack things around that much.

So I'm inclined to just omit 0003. AFAICS this would only mean that we couldn't drop the global PlanRowMarks list from PlannedStmt, which does not bother me much.

> 0005: teaches planner to remove PlanRowMarks corresponding to dummy relations

I'm not entirely sold on the value of that either?

			regards, tom lane
Re: snprintf assert is broken by plpgsql #option dump
Pavel Stehule writes:
> There is a new assert
> Assert(strvalue != NULL);
> probably all refname usage inside the plpgsql dump functions has a problem with it.

This isn't so much a "new assert" as a modeling of the fact that some printf implementations dump core on a null string pointer, and have done so since the dawn of time. Now that we only use snprintf.c in HEAD, it'd be possible to consider modeling glibc's behavior instead, ie instead of the Assert do

    if (strvalue == NULL)
        strvalue = "(null)";

I do not think this would be a good idea though, at least not till all release branches that *don't* always use snprintf.c are out of support. Every assert failure that we find here is a live bug in the back branches, even if it happens not to occur on $your-favorite-platform.

Even once that window elapses, I'd not be especially on board with snprintf.c papering over such cases. They're bugs really.

> I found two parts

Thanks for the report, will push something.

			regards, tom lane
Re: snprintf assert is broken by plpgsql #option dump
I wrote:
> Pavel Stehule writes:
>> I found two parts

> Thanks for the report, will push something.

On closer look, I'm not sure that these are the only places that are assuming that any PLpgSQL_variable struct has a refname. What seems like a safer answer is to make sure they all do, more or less as attached.

			regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 721234d..59460d2 100644
*** a/src/pl/plpgsql/src/pl_comp.c
--- b/src/pl/plpgsql/src/pl_comp.c
*** build_row_from_vars(PLpgSQL_variable **v
*** 1896,1901 ****
--- 1896,1903 ----
  	row = palloc0(sizeof(PLpgSQL_row));
  	row->dtype = PLPGSQL_DTYPE_ROW;
+ 	row->refname = "(unnamed row)";
+ 	row->lineno = -1;
  	row->rowtupdesc = CreateTemplateTupleDesc(numvars, false);
  	row->nfields = numvars;
  	row->fieldnames = palloc(numvars * sizeof(char *));
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 574234d..4552638 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*** exec_stmt_call(PLpgSQL_execstate *estate
*** 2205,2210 ****
--- 2205,2211 ----
  	row = palloc0(sizeof(*row));
  	row->dtype = PLPGSQL_DTYPE_ROW;
+ 	row->refname = "(unnamed row)";
  	row->lineno = -1;
  	row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index b59869a..68e399f 100644
*** a/src/pl/plpgsql/src/pl_gram.y
--- b/src/pl/plpgsql/src/pl_gram.y
*** decl_cursor_args :
*** 613,618 ****
--- 613,619 ----
  	new = palloc0(sizeof(PLpgSQL_row));
  	new->dtype = PLPGSQL_DTYPE_ROW;
+ 	new->refname = "(unnamed row)";
  	new->lineno = plpgsql_location_to_lineno(@1);
  	new->rowtupdesc = NULL;
  	new->nfields = list_length($2);
*** read_into_scalar_list(char *initial_name
*** 3526,3531 ****
--- 3527,3533 ----
  	row = palloc0(sizeof(PLpgSQL_row));
  	row->dtype = PLPGSQL_DTYPE_ROW;
+ 	row->refname = "(unnamed row)";
  	row->lineno = plpgsql_location_to_lineno(initial_location);
  	row->rowtupdesc = NULL;
  	row->nfields = nfields;
*** make_scalar_list1(char *initial_name,
*** 3560,3565 ****
--- 3562,3568 ----
  	row = palloc0(sizeof(PLpgSQL_row));
  	row->dtype = PLPGSQL_DTYPE_ROW;
+ 	row->refname = "(unnamed row)";
  	row->lineno = lineno;
  	row->rowtupdesc = NULL;
  	row->nfields = 1;
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 4a4c7cb..f6c35a5 100644
*** a/src/pl/plpgsql/src/plpgsql.h
--- b/src/pl/plpgsql/src/plpgsql.h
*** typedef struct PLpgSQL_var
*** 326,332 ****
   * Note that there's no way to name the row as such from PL/pgSQL code,
   * so many functions don't need to support these.
   *
!  * refname, isconst, notnull, and default_val are unsupported (and hence
   * always zero/null) for a row.  The member variables of a row should have
   * been checked to be writable at compile time, so isconst is correctly set
   * to false.  notnull and default_val aren't applicable.
--- 326,337 ----
   * Note that there's no way to name the row as such from PL/pgSQL code,
   * so many functions don't need to support these.
   *
!  * That also means that there's no real name for the row variable, so we
!  * conventionally set refname to "(unnamed row)".  We could leave it NULL,
!  * but it's too convenient to be able to assume that refname is valid in
!  * all variants of PLpgSQL_variable.
!  *
!  * isconst, notnull, and default_val are unsupported (and hence
   * always zero/null) for a row.  The member variables of a row should have
   * been checked to be writable at compile time, so isconst is correctly set
   * to false.  notnull and default_val aren't applicable.
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On 2018-Oct-03, Michael Paquier wrote:
> Okay. I have spent more time on this issue, and I have been able to integrate a test in the existing event_trigger.sql which is able to reproduce the reported failure. Attached is what I am finishing with.
>
> I still want to do more testing on it, and the day is ending here.

I looked at this, and I think that this particular crash you're fixing is just a symptom of a larger problem in the event trigger alter table handling -- I noticed while playing a bit with src/test/modules/test_ddl_decoding (modify sql/alter_table.sql to create a partitioned table with nullable columns and a partition, then alter table to add a primary key).

I'm tied up in something else at the moment so can't spend more time on it, but I hope to have time to give it a look over the weekend.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
Alvaro Herrera writes: > I'm tied up in something else at the moment so can't spend more time on > it, but I hope to have time to give it a look over the weekend. Keep in mind that RC1 is scheduled to wrap Monday afternoon ... regards, tom lane
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On Thu, Oct 04, 2018 at 06:04:49PM -0400, Tom Lane wrote:
> Alvaro Herrera writes:
>> I'm tied up in something else at the moment so can't spend more time on it, but I hope to have time to give it a look over the weekend.
>
> Keep in mind that RC1 is scheduled to wrap Monday afternoon ...

... Which is little time to work on a complete fix. Surely we could wait for GA for a fix? I don't find it nice to release v11 with this bug.
--
Michael

signature.asc
Description: PGP signature
Re: [HACKERS] Why does logical replication launcher exit with exit code 1?
On Thu, Aug 3, 2017 at 1:41 PM Robert Haas wrote:
> On Wed, Aug 2, 2017 at 9:33 PM, Peter Eisentraut wrote:
> > On 8/2/17 16:52, Robert Haas wrote:
> >> I actually don't think it's that unreasonable to get notified when system-wide processes like the autovacuum launcher or the logical replication launcher start or stop.
> >
> > But we got rid of those start messages recently due to complaints.
>
> Yeah, true. I'm just talking about what *I* think. :-)

I still think the current situation is non-ideal. I don't have a strong view on whether some or all system-wide processes should say hello and goodbye explicitly in the log, but I do think we need a way to make that not look like an error condition, and ideally without special cases for known built-in processes.

I looked into this a bit today, while debugging an extension that runs a background worker. Here are some assorted approaches to shutdown:

0. The default SIGTERM handler for bgworkers is bgworker_die(), which generates a FATAL ereport "terminating background worker \"%s\" due to administrator command", directly in the signal handler (hmm, see also recent discussions about the legality of similar code in quickdie()).

1. Some processes install their own custom SIGTERM handler that sets a flag, and use that to break out of their main loop and exit quietly. Example: autovacuum.c, or the open-source pg_cron extension, and probably many other things that just want a nice quiet shutdown.

2. Some processes install the standard SIGTERM handler die(), and then use CHECK_FOR_INTERRUPTS() to break out of their main loop. By default this looks like "FATAL: terminating connection due to administrator command". This approach is effectively required if the main loop will reach other code that has a CHECK_FOR_INTERRUPTS() wait loop. For example, a "launcher"-type process that calls WaitForBackgroundWorkerStartup() could hang forever if it used approach 1 (ask me how I know).

3. Some system processes generally use approach 2, but have a special case in ProcessInterrupts() to suppress or alter the usual FATAL message or behaviour. This includes the logical replication launcher.

A couple of thoughts:

* Extensions that need to use die() (or something else that would be compatible with CFI() wait loops) should ideally have a way to control how ProcessInterrupts() reports this to the user, since shutdown is an expected condition for long-lived processes.

* We really should get rid of that "exited with exit code 1".

-- 
Thomas Munro
http://www.enterprisedb.com
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
Hi, On 2018-10-05 08:29:29 +0900, Michael Paquier wrote: > On Thu, Oct 04, 2018 at 06:04:49PM -0400, Tom Lane wrote: > > Alvaro Herrera writes: > >> I'm tied up in something else at the moment so can't spend more time on > >> it, but I hope to have time to give it a look over the weekend. > > > > Keep in mind that RC1 is scheduled to wrap Monday afternoon ... > > ... Which is little time to work on a complete fix. Surely we could > wait for GA for a fix? I don't find nice to release v11 with this bug. Are you suggesting we fix after RC1, or delay RC1? I'm not 100% sure I'm parsing your sentence correctly. Greetings, Andres Freund
Re: pg_ls_tmpdir()
On Thu, Oct 04, 2018 at 02:23:34PM +0200, Laurenz Albe wrote:
> Bossart, Nathan wrote:
>> Alright, here's an updated patch.
>
> Looks good; marking as "ready for committer".

Like Tom, I think it is less dirty to use the two-function approach. Committed this way with a catalog version bump.
--
Michael

signature.asc
Description: PGP signature
Re: Segfault when creating partition with a primary key and sql_drop trigger exists
On Thu, Oct 04, 2018 at 04:54:45PM -0700, Andres Freund wrote: > Are you suggesting we fix after RC1, or delay RC1? I'm not 100% sure > I'm parsing your sentence correctly. I am suggesting to fix the issue after RC1 is released, but before GA. -- Michael signature.asc Description: PGP signature
Re: Early WIP/PoC for inlining CTEs
> "Andreas" == Andreas Karlsson writes:

> On 10/03/2018 05:57 PM, David Fetter wrote:
>> Is there any meaningful distinction between "inlining," by which I mean converting to a subquery, and predicate pushdown, which would happen at least for a first cut, at the rewrite stage?

Yes.

 Andreas> Sorry, but I do not think I understand your question. The
 Andreas> ability to push down predicates is just one of the potential
 Andreas> benefits from inlining.

Consider the difference between (in the absence of CTE inlining):

-- inline subquery with no optimization barrier (qual may be pushed down)
select * from (select x from y) s where x=1;

-- inline subquery with optimization barrier (qual not pushed down)
select * from (select x from y offset 0) s where x=1;

-- CTE with materialization
with s as (select x from y) select * from s where x=1;

-- 
Andrew (irc:RhodiumToad)
Re: snprintf assert is broken by plpgsql #option dump
On Thu, Oct 4, 2018 at 11:57 PM, Tom Lane wrote:
> I wrote:
>> Pavel Stehule writes:
>>> I found two parts
>>
>> Thanks for the report, will push something.
>
> On closer look, I'm not sure that these are the only places that are assuming that any PLpgSQL_variable struct has a refname. What seems like a safer answer is to make sure they all do, more or less as attached.

+1

Pavel

> regards, tom lane
Re: pg_ls_tmpdir()
On 10/4/18, 7:31 PM, "Michael Paquier" wrote: > Committed this way with a catalog version bump. Thanks! Nathan
Re: Tid scan improvements
On Wed, 3 Oct 2018 at 17:36, David Rowley wrote:
> I know commit fest is over, but I made a pass of this to hopefully provide a bit of guidance so that it's closer for the November 'fest.

Hi David. Thanks for the review. It's fairly thorough and you must have put some time into it -- I really appreciate it.

> I've only done light testing on the patch and it does seem to work, but there are a few things that I think should be changed. Most importantly #11 below I think needs to be done. That might overwrite some of the items that come before it in the list, as you likely will have to pull some of the code which I mention out due to changing #11. I've kept them around anyway just in case some of it remains.

> 1. Could wrap for tables > 16TB. Please use double. See index_pages_fetched()
> 2. Should multiply by nseqpages, not add.
> 3. Should be double.

I agree with these three.

> 4. Comment needs updating to mention what the new code does in heapgettup() and heapgettup_pagemode()
>
> +	/* start from last page of the scan */
> -	if (scan->rs_startblock > 0)
> -		page = scan->rs_startblock - 1;
> +	if (scan->rs_numblocks == InvalidBlockNumber)
> +	{
> +		if (scan->rs_startblock > 0)
> +			page = scan->rs_startblock - 1;
> +		else
> +			page = scan->rs_nblocks - 1;
> +	}
>  	else
> -		page = scan->rs_nblocks - 1;
> +		page = scan->rs_startblock + scan->rs_numblocks - 1;
> +

I'm thinking that, as they don't depend on the others, the heapam.c changes should be a separate preparatory patch?

The heap scan code has to support things like synchronised scans and parallel scans, but as far as I know, its support for scanning subranges is currently used only for building BRIN indexes. I found that although I could specify a subrange with heap_setscanlimits, I could not scan backward over it, because the original version of the above code would start at the end of the whole table.
I'm not especially comfortable with this understanding of heapam, so close review would be appreciated. I note that there's a lot of common code in heapgettup and heapgettup_pagemode, which my changes add to. It might be worth trying to factor out somehow.

> 5. Variables should be called "inclusive". We use "strict" to indicate an operator comparison cannot match NULL values.
>
> +	bool		strict;			/* Indicates < rather than <=, or > rather */
> +	bool		strict2;		/* than >= */
>
> Don't break the comment like that. If you need more space don't end the comment; use a new line and tab the next line out to match the * of the first line.

> 8. Many instances of the word "strict" are used to mean "inclusive". Can you please change all of them.

I don't mind renaming it. I took "strict" from "strictly greater/less than", but I knew it was confusable with the other usages of "strict".

> 9. Confusing comment:
>
> + * If the expression was non-strict (<=) and the offset is 0, then just
> + * pretend it was strict, because offset 0 doesn't exist and we may as
> + * well exclude that block.
>
> Shouldn't this be, "If the operator is non-inclusive, then since TID offsets are 1-based, for simplicity, we can just class the expression as inclusive.", or something along those lines.

Ok, I'll try to reword it along those lines.

> I think I'm going to stop here as changing this is going to cause quite a bit of churn.
>
> but one more...
>
> 12. I think the changes to selfuncs.c to get the selectivity estimate is a fairly credible idea, but I think it also needs to account for offsets.
> You should be able to work out the average number of items per page with rel->tuples / rel->pages and factor that in to get a better estimate for cases like:
>
> postgres=# explain analyze select ctid,* from t1 where ctid <= '(0,200)';
>                                        QUERY PLAN
> ---------------------------------------------------------------------------
>  Tid Scan on t1  (cost=0.00..5.00 rows=1 width=10) (actual time=0.025..0.065 rows=200 loops=1)
>    TID Cond: (ctid <= '(0,200)'::tid)
>  Planning Time: 0.081 ms
>  Execution Time: 0.088 ms
> (4 rows)
>
> You can likely add on "(offset / avg_tuples_per_page) / rel->pages" to the selectivity and get a fairly accurate estimate... at least when there are no dead tuples in the heap

I think the changes to selfuncs.c could also be a separate patch? I'll try to include the offset in the selectivity too.

Related -- what should the selectivity be on an empty table? My code has:

    /* If the relation's empty, we're going to read all of it. */
    if (vardata->rel->pages == 0)
        return 1.0;

(which needs rewording, since selectivity isn't always about reading). Is 1.0 the right thing to return?

> 6. Why not pass the TidExpr into MakeTidOpExprState() and have it set the type instead of repeating code
> 7. It's not very obvious why the following Assert() can't fail. [...] I had to hunt around quite a bit to see that TidQualFromBaseRestrictinfo