autoprewarm is forgetting to register a tranche.
Hello, I noticed during an investigation that pg_prewarm is forgetting to register a tranche. Before the second parameter of LWLockRegisterTranche() became char * in 3761fe3, that would lead to a crash in a --enable-dtrace build, but currently it is not likely to. On the other hand, as far as I read the comment in lwlock.h, we ought to register a tranche obtained by LWLockNewTrancheId(), and that is still convincing. So I made autoprewarm.c register it. (patch 0002)

> * There is another, more flexible method of obtaining lwlocks. First, call
> * LWLockNewTrancheId just once to obtain a tranche ID; this allocates from a
> * shared counter. Next, LWLockInitialize should be called just once per
> * lwlock, passing the tranche ID as an argument. Finally, each individual
> * process using the tranche should call LWLockRegisterTranche() or
> * LWLockRegisterTrancheOfLock() to associate that tranche ID with a name.

In passing, I found it hard to register a tranche in apw_init_shmem() because a process using the initialized shared memory struct cannot find the tranche id (without intruding into the internals of the LWLock struct). So I'd like to propose another tranche-registering interface, LWLockRegisterTrancheOfLock(LWLock *, int). The interface removes the need to store the tranche id separately from the LWLock. (in patch 0001)

The comment cited above looks a bit different from how the interface is actually used, so I rearranged the comment to follow a typical use I have seen in several cases. (in patch 0001)

+ * There is another, more flexible method of obtaining lwlocks. First, call
+ * LWLockNewTrancheId just once to obtain a tranche ID; this allocates from a
+ * shared counter. Next, LWLockInitialize should be called just once per
+ * lwlock, passing the tranche ID as an argument. Finally, each individual
+ * process using the tranche should call LWLockRegisterTranche() or
+ * LWLockRegisterTrancheOfLock() to associate that tranche ID with a name.

This might seem somewhat odd, but things should happen in this order in real code, since tranche registration usually happens after a lock-initializing block.

The final patch, 0003, is arguable, or anticipated to be rejected. It cannot be detected that a tranche is not registered until its name is actually accessed (and many won't care even if the name were printed as '(null)' in an error message that is the result of the developer's own bug). On the other hand, there's no chance to detect that before the lock is actually used. In the last patch I added an assertion in LWLockAcquire(), but it must hit performance to a certain degree (even if only in --enable-cassert builds), so I don't insist that it must be there.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From 3ac3de48c9c6992bf9137dc65362ada502100c3b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 15 Dec 2017 14:08:18 +0900
Subject: [PATCH 1/3] Add a new tranche register API

In typical usage of named tranches, it would be useful if we could register
a tranche without needing to remember the tranche id separately in shared
memory. This new function accepts the LWLock itself to specify the tranche
id.
---
 src/backend/storage/lmgr/lwlock.c | 15 +++
 src/include/storage/lwlock.h      | 11 ++-
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 46f5c42..db197a6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -619,6 +619,21 @@ LWLockRegisterTranche(int tranche_id, const char *tranche_name)
 }
 
 /*
+ * Register the tranche ID of the specified lock in the lookup table for the
+ * current process.
+ */
+void
+LWLockRegisterTrancheOfLock(LWLock *lock, const char *tranche_name)
+{
+	/*
+	 * Different from LWLockRegisterTranche, users can easily give a LWLock of
+	 * builtin tranches.
+	 */
+	Assert(lock->tranche >= LWTRANCHE_FIRST_USER_DEFINED);
+	LWLockRegisterTranche(lock->tranche, tranche_name);
+}
+
+/*
  * RequestNamedLWLockTranche
  *		Request that extra LWLocks be allocated during postmaster
  *		startup.
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 460843d..9f02329 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -172,11 +172,11 @@ extern LWLockPadded *GetNamedLWLockTranche(const char *tranche_name);
 
 /*
  * There is another, more flexible method of obtaining lwlocks. First, call
- * LWLockNewTrancheId just once to obtain a tranche ID; this allocates from
- * a shared counter. Next, each individual process using the tranche should
- * call LWLockRegisterTranche() to associate that tranche ID with a name.
- * Finally, LWLockInitialize should be called just once per lwlock, passing
- * the tranche ID as an argument.
+ * LWLockNewTrancheId just once to obtain a tranche ID; this allocates from a
+ * shared counter. Next, LWLockInitialize should be called just once per
+ * lwlo
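To make the proposed interface concrete, here is a minimal sketch of how an extension's shared-memory setup could use it. Only LWLockNewTrancheId(), LWLockInitialize(), and the proposed LWLockRegisterTrancheOfLock() come from the patch; the struct and function names are illustrative, and locking around the initialization is omitted for brevity.

    #include "postgres.h"
    #include "storage/lwlock.h"
    #include "storage/shmem.h"

    /* Hypothetical shared struct for an extension like autoprewarm */
    typedef struct MySharedState
    {
    	LWLock		lock;			/* protects the fields below */
    	int			counter;
    } MySharedState;

    static void
    my_init_shmem(void)
    {
    	MySharedState *state;
    	bool		found;

    	state = ShmemInitStruct("my_shared_state", sizeof(MySharedState), &found);
    	if (!found)
    	{
    		/* First process: allocate a tranche id and stash it in the lock */
    		LWLockInitialize(&state->lock, LWLockNewTrancheId());
    		state->counter = 0;
    	}

    	/*
    	 * Every attaching process can now register the tranche name directly
    	 * from the lock, without storing the tranche id separately -- which
    	 * is the point of the proposed API.
    	 */
    	LWLockRegisterTrancheOfLock(&state->lock, "my_shared_state");
    }

Without the new function, the tranche id would have to be kept in the shared struct solely so that other backends could pass it to LWLockRegisterTranche().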
Re: [HACKERS] Surjective functional indexes
On 15.12.2017 01:21, Michael Paquier wrote:
> On Fri, Dec 15, 2017 at 6:15 AM, Alvaro Herrera wrote:
>> Konstantin Knizhnik wrote:
>>> If you still think that the additional 16 bytes per relation in
>>> statistics is too high an overhead, then I will also remove autotune.
>>
>> I think it's pretty clear that these additional bytes are excessive.
>
> The bar to add new fields in PgStat_TableCounts is very high, and one
> attempt to tackle its scaling problems with many relations is here by
> Horiguchi-san:
> https://www.postgresql.org/message-id/20171211.201523.24172046.horiguchi.kyot...@lab.ntt.co.jp
> His patch may be worth a look if you need more fields for your feature.
> So it seems to me that the patch as currently presented has close to zero
> chance to be committed as long as you keep your changes to pgstat.c.

Ok, it looks like everybody thinks that autotune based on statistics is a bad idea. Attached please find the patch without autotune.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 0255375..c4ee15e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,33 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ]
     The optional WITH clause specifies storage parameters for
     the index. Each index method has its own set of allowed
-    storage parameters. The B-tree, hash, GiST and SP-GiST index methods all
-    accept this parameter:
+    storage parameters. All indexes accept the following parameter:
+
+    recheck_on_update
+
+      A functional index is based on a projection function: a function which
+      extracts a subset of its argument. In mathematics such functions are
+      called non-injective. For an injective function, if any attribute used
+      in the indexed expression is changed, then the value of the index
+      expression is also changed, so to check whether the index is affected
+      by an update it is enough to check the set of changed fields. By
+      default this parameter is set to true and the function is considered
+      non-injective. In this case a change to one of the indexed keys does
+      not necessarily mean that the value of the function is changed. For
+      example, the expression expression(bookinfo->>'isbn') defined for a
+      column of JSON type changes only when the ISBN is changed, which
+      rarely happens. The same is true for most functional indexes. For
+      non-injective functions, Postgres compares the values of the indexed
+      expression for the old and updated tuple and updates the index only
+      when the function results differ. This makes it possible to skip the
+      index update and use a HOT update, but it requires extra evaluations
+      of the function. So if the function is expensive, or the probability
+      that a change of an indexed column will not affect the function value
+      is small, then setting recheck_on_update to false may increase update
+      speed.
+
+    The B-tree, hash, GiST and SP-GiST index methods all accept this
+    parameter:
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index aa9c0f1..1aaab78 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -131,6 +131,15 @@ static relopt_bool boolRelOpts[] =
 	},
 	{
 		{
+			"recheck_on_update",
+			"Evaluate functional index expression on update to check if its value is changed",
+			RELOPT_KIND_INDEX,
+			AccessExclusiveLock
+		},
+		true
+	},
+	{
+		{
 			"security_barrier",
 			"View acts as a row security barrier",
 			RELOPT_KIND_VIEW,
@@ -1311,7 +1320,7 @@ fillRelOptions(void *rdopts, Size basesize,
 				break;
 			}
 		}
-		if (validate && !found)
+		if (validate && !found && options[i].gen->kinds != RELOPT_KIND_INDEX)
 			elog(ERROR, "reloption \"%s\" not found in parse table",
 				 options[i].gen->name);
 	}
@@ -1468,6 +1477,40 @@ index_reloptions(amoptions_function amoptions, Datum reloptions, bool validate)
 }
 
 /*
+ * Parse generic options for all indexes.
+ *
+ *	reloptions	options as text[] datum
+ *	validate	error flag
+ */
+bytea *
+index_generic_reloptions(Datum reloptions, bool validate)
+{
+	int			numoptions;
+	GenericIndexOpts *idxopts;
+	relopt_value *options;
+	static const relopt_parse_elt tab[] = {
+		{"recheck_on_update", RELOPT_TYPE_BOOL,
+		 offsetof(GenericIndexOpts, recheck_on_update)}
+	};
+
+	options = parseRelOptions(reloptions, validate,
+							  RELOPT_KIND_INDEX,
+							  &numoptions);
+
+	/* if none set, we're done */
+	if (numoptions == 0)
+		return NULL;
+
+	idxopts = allocateReloptStruct(sizeof(GenericIndexOpts), options, numoptions);
+
+	fillRelOptions((void *) idxopts, sizeof(GenericIndexOpts), options, numoptions,
+				   validate, tab, lengthof(tab));
+
+	pfree(options);
+
+	return (bytea *) idxopts;
+}
+
+/*
  * Option parser for attribute reloptions
  */
 bytea *
diff --git a/src/backend/acces
Re: GSoC 2018
Hi Stephen,

> New entries are certainly welcome and encouraged, just be sure to note
> them as '2018' when you add it.

I proposed a few ideas:

* High availability / failover based on logical replication
* Thrift datatype support

Hope these ideas are good enough for GSoC.

-- 
Best regards,
Aleksander Alekseev

signature.asc Description: PGP signature
Re: GSoC 2018
Hi,

2017-12-15 4:14 GMT+01:00 Stephen Frost:
> Unsurprisingly, we'll need to have an Ideas page again, so I've gone
> ahead and created one (copying last year's):

What about adding "Learned Index" as a project task [*]? This type of index looks promising for certain properties.

:Stefan

[*] "The Case for Learned Index Structures", Kraska et al. (Dec 2017), https://arxiv.org/abs/1712.01208

2017-12-15 4:14 GMT+01:00 Stephen Frost:
> Greetings -hackers,
>
> Google Summer of Code 2018 was announced back in September and they've
> changed up the timeline a bit [1]. Specifically, they moved up the
> dates for things like the mentoring organization application deadline,
> so it's time to start working on our Ideas page for 2018 in earnest.
>
> The deadline for Mentoring organizations to apply is: January 23.
>
> The list of accepted organizations will be published around February 12.
>
> Unsurprisingly, we'll need to have an Ideas page again, so I've gone
> ahead and created one (copying last year's):
>
> https://wiki.postgresql.org/wiki/GSoC_2018
>
> Google discusses what makes a good "Ideas" list here:
>
> https://google.github.io/gsocguides/mentor/defining-a-project-ideas-list.html
>
> I marked all the entries with '2017' to indicate they were pulled from
> last year. If the project from last year is still relevant, please
> update it to be '2018' and make sure to update all of the information
> (in particular, make sure to list yourself as a mentor and remove the
> other mentors, as appropriate).
>
> New entries are certainly welcome and encouraged, just be sure to note
> them as '2018' when you add them.
>
> Projects from last year which were worked on but have significant
> follow-on work to be completed are absolutely welcome as well - simply
> update the description appropriately and mark it as being for '2018'.
>
> When we get closer to actually submitting our application, I'll clean
> out the '2017' entries that didn't get any updates.
>
> As a reminder, each idea on the page should be in the format that the
> other entries are in and should include:
>
> - Project title/one-line description
> - Brief, 2-5 sentence description of the project (remember, these are
>   12-week projects)
> - Description of programming skills needed and estimation of the
>   difficulty level
> - List of potential mentors
> - Expected Outcomes
>
> As with last year, please consider PostgreSQL to be an "Umbrella"
> project and that anything which would be considered "PostgreSQL Family"
> per the News/Announce policy [2] is likely to be acceptable as a
> PostgreSQL GSoC project.
>
> In other words, if you're a contributor or developer on barman,
> pgBackRest, the PostgreSQL website (pgweb), the PgEU/PgUS website code
> (pgeu-website), pgAdmin4, PostgresXL, pgbouncer, Citus, pldebugger, the
> PG RPMs (pgrpms), the JDBC driver, the ODBC driver, or any of the many
> other PG Family projects, please feel free to add a project for
> consideration! If we get quite a few, we can organize the page further
> based on which project or maybe what skills are needed or similar.
>
> Let's have another great year of GSoC with PostgreSQL!
>
> Thanks!
>
> Stephen
>
> [1]: https://developers.google.com/open-source/gsoc/timeline
> [2]: https://wiki.postgresql.org/wiki/NewsEventsApproval
Reproducible builds: genbki.pl vs schemapg.h
Hi,

Debian's reproducible builds project has revealed that the full build path gets embedded into server/catalog/schemapg.h:

/*-------------------------------------------------------------------------
 *
 * schemapg.h
 *	  Schema_pg_xxx macros for use by relcache.c
 *
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * NOTES
 *	******************************
 *	*** DO NOT EDIT THIS FILE! ***
 *	******************************
 *
 *	It has been GENERATED by /build/postgresql-11-05gRdu/build/../src/backend/catalog/genbki.pl
 *
 *-------------------------------------------------------------------------
 */

This information will vary on rebuilding the binaries, needlessly causing different checksums. I'm proposing to strip that down to "It has been GENERATED by genbki.pl". Patch attached.

Christoph

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
new file mode 100644
index 256a9c9..30153e6
*** a/src/backend/catalog/genbki.pl
--- b/src/backend/catalog/genbki.pl
*** print $schemapg <
Re: GSoC 2018
Hi!

> On 15 Dec 2017, at 15:03, Stefan Keller wrote:
> What about adding "Learned Index" as a project task [*]?
> This type of index looks promising for certain properties.

I have no high expectations for this idea, but I feel it is necessary at least to validate it. I'd sign up as co-mentor for this. Moreover, this will attract students from the huge pool of ML hackers.

Also, I'm planning to add my previous-year project about sorting. Or add something WAL-G-related under the Postgres umbrella (probably WAL-scanning for delta-backups).

Best regards,
Andrey Borodin
Package version in PG_VERSION and version()
To be able to identify more easily which package a connected server is coming from, I would like to embed the (Debian) package version in the version() output which is coming from PG_VERSION. It is fairly easy to do that, but it requires patching configure(.in):

$ ./configure VENDOR_VERSION="Debian 10.1-2"

# select version();
                                                      version
═══════════════════════════════════════════════════════════════════════════════════════════════════════════════
 PostgreSQL 10.1 on x86_64-pc-linux-gnu (Debian 10.1-2), compiled by gcc (Debian 7.2.0-17) 7.2.1 20171205, 64-bit

PoC patch against HEAD attached - if the approach is deemed acceptable, I'll also update the relevant documentation bits.

Thanks,
Christoph

diff --git a/configure b/configure
new file mode 100755
index 58eafd3..d68fecd
*** a/configure
--- b/configure
*** fi
*** 16773,16779 ****
  cat >>confdefs.h <<_ACEOF
! #define PG_VERSION_STR "PostgreSQL $PG_VERSION on $host, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"
  _ACEOF
--- 16773,16779 ----
  cat >>confdefs.h <<_ACEOF
! #define PG_VERSION_STR "PostgreSQL $PG_VERSION on $host${VENDOR_VERSION:+ ($VENDOR_VERSION)}, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"
  _ACEOF
diff --git a/configure.in b/configure.in
new file mode 100644
index 5245899..dccf5b2
*** a/configure.in
--- b/configure.in
*** else
*** 2185,2191 ****
  fi
  AC_DEFINE_UNQUOTED(PG_VERSION_STR,
! ["PostgreSQL $PG_VERSION on $host, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"],
  [A string containing the version number, platform, and C compiler])
--- 2185,2191 ----
  fi
  AC_DEFINE_UNQUOTED(PG_VERSION_STR,
! ["PostgreSQL $PG_VERSION on $host${VENDOR_VERSION:+ ($VENDOR_VERSION)}, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"],
  [A string containing the version number, platform, and C compiler])

  # Supply a numeric version string for use by 3rd party add-ons
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund wrote: > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you > could have a look at the backported version, just about everything but > v10 had conflicts, some of them not insubstantial. I have gone through the backpatched versions for the fixes in tuple pruning, running some tests on the way and those look good to me. I have not taken the time to go through the ones changing the assertions to ereport() though... -- Michael
Re: Package version in PG_VERSION and version()
On Fri, Dec 15, 2017 at 7:46 PM, Christoph Berg wrote:
> To be able to identify more easily which package a connected server is
> coming from, I would like to embed the (Debian) package version in the
> version() output which is coming from PG_VERSION. It is fairly easy to
> do that, but it requires patching configure(.in):

Why reinvent the wheel when there is already --with-extra-version that you can use for the same purpose? And I think that I can see a bug here on HEAD: src/tools/msvc/Solution.pm correctly uses --with-extra-version in PG_VERSION_STR, but ./configure is forgetting it. If you want to push for this proposal anyway, I think that you should update the msvc scripts as well.
-- 
Michael
Re: GSoC 2018
Stefan,

* Stefan Keller (sfkel...@gmail.com) wrote:
> 2017-12-15 4:14 GMT+01:00 Stephen Frost:
>> Unsurprisingly, we'll need to have an Ideas page again, so I've gone
>> ahead and created one (copying last year's):
>
> What about adding "Learned Index" as a project task [*]?
> This type of index looks promising for certain properties.

If you can provide a *specific* description with expected outcomes and we have someone who can mentor such a project, then I think it would be acceptable. We would need to get any potential students on the mailing list before the project is submitted, to try and fill out the specifics around what such an index would actually look like too, I think. Without that guidance, the student proposal would be pretty hard to accept, I suspect.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: Package version in PG_VERSION and version()
Re: Michael Paquier 2017-12-15
> On Fri, Dec 15, 2017 at 7:46 PM, Christoph Berg wrote:
>> To be able to identify more easily which package a connected server is
>> coming from, I would like to embed the (Debian) package version in the
>> version() output which is coming from PG_VERSION. It is fairly easy to
>> do that, but it requires patching configure(.in):
>
> Why reinvent the wheel when there is already --with-extra-version
> that you can use for the same purpose?

That modifies the PG version number as such, i.e. what psql shows on connect. I'd think that is too intrusive. And it doesn't work anyway:

$ ./configure --with-extra-version ' (Debian 10.1-2)'
configure: WARNING: you should use --build, --host, --target
configure: WARNING: invalid host type:  (Debian 10.1-2)
configure: error: argument required for --with-extra-version option

> And I think that I can see a bug here on HEAD:
> src/tools/msvc/Solution.pm correctly uses --with-extra-version in
> PG_VERSION_STR, but ./configure is forgetting it. If you want to push
> for this proposal anyway, I think that you should update the msvc
> scripts as well.

configure.in looks right, it includes the extra version right in PG_VERSION:

PGAC_ARG_REQ(with, extra-version, [STRING], [append STRING to version],
             [PG_VERSION="$PACKAGE_VERSION$withval"],
             [PG_VERSION="$PACKAGE_VERSION"])
AC_DEFINE_UNQUOTED(PG_VERSION, "$PG_VERSION", [PostgreSQL version as a string])
AC_DEFINE_UNQUOTED(PG_VERSION_STR,
                   ["PostgreSQL $PG_VERSION on $host, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"],
                   [A string containing the version number, platform, and C compiler])

I'll update the msvc scripts once we figure out where my idea of "vendor package version" is best placed.

Best regards,
Christoph Berg
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Re: GSoC 2018
Aleksander, * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote: > > New entries are certainly welcome and encouraged, just be sure to note > > them as '2018' when you add it. > > I proposed a few ideas: Thanks! > * High availability / failover based on logical replication > * Thrift datatype support > > Hope these ideas are good enough for GSoC. The main things are to have a detailed enough description that someone can write a project plan about what they're going to do over the summer on that project and the project is of a reasonable size. HA/fail-over is a very broad topic, with a lot of pieces that need to be done such that I'm not sure it's really viable, but perhaps a precursor project (synchronous logical replication seems like a prereq, no?) would make more sense. Or, perhaps, a different piece of the HA question, but solving the whole thing in a summer strikes me as unlikely to be reasonable. Regarding the thrift data type, that seems like a pretty good GSoC project, though I'm not sure why you suggest having pl/pgsql functions for accessing data from .thrift files- plpgsql can't directly access the filesystem and the input routines for .thrift-style data would certainly be best in C. Thanks! Stephen signature.asc Description: Digital signature
Re: GSoC 2018
Hi Stephen,

> HA/fail-over is a very broad topic, with a lot of pieces that need to be
> done such that I'm not sure it's really viable, but perhaps a precursor
> project (synchronous logical replication seems like a prereq, no?) would
> make more sense. Or, perhaps, a different piece of the HA question, but
> solving the whole thing in a summer strikes me as unlikely to be
> reasonable.

Frankly, I got the impression that logical replication in PostgreSQL 10 is as synchronous as physical replication, in the sense that it treats synchronous_commit the same way and gives exactly the same guarantees. Is there a problem with the current implementation of logical replication I'm not aware of? Or by synchronous logical replication did you mean something different?

Regarding the difficulty of the project - in fact, it's not that difficult. In particular, this project can rely on external tools, e.g. use Consul for service discovery and leader election based on the leader-lease approach (implementation [1]). With that, the only thing missing is automatic replica promotion and (optionally) re-configuration of HAProxy / pgbouncer / whatever. Yes, and lots of Jepsen-like tests, of course. I believe it's not such a complicated project.

> Regarding the thrift data type, that seems like a pretty good GSoC
> project, though I'm not sure why you suggest having pl/pgsql functions
> for accessing data from .thrift files- plpgsql can't directly access the
> filesystem and the input routines for .thrift-style data would certainly
> be best in C.

What I meant was generating PL/pgSQL procedures from .thrift files, like `MyStructure_getFieldX(bytea) returns text`. It took me a few hours to write pg_protobuf, so I decided the project would be way too easy without implementing such a tool. I changed the text on the wiki; hopefully it's better now.

[1]: https://github.com/afiskon/go-elector

-- 
Best regards,
Aleksander Alekseev

signature.asc Description: PGP signature
Re: [HACKERS] UPDATE of partition key
On Wed, Dec 13, 2017 at 5:18 AM, Amit Khandekar wrote:
> Amit Langote informed me off-list - along with suggestions for
> changes - that my patch needs a rebase. Attached is the rebased
> version. I have also bumped the patch version number (now v29),
> because this has additional changes, again suggested by Amit L:
> Because ExecSetupPartitionTupleRouting() has an mtstate parameter now,
> there is no need to pass update_rri and num_update_rri, since they can
> be retrieved from mtstate.
>
> Also, the preparatory patch is also rebased.

Reviewing the preparatory patch:

+	PartitionTupleRouting *partition_tuple_routing;
+	/* Tuple-routing support info */

Something's wrong with the formatting here.

-						PartitionDispatch **pd,
-						ResultRelInfo ***partitions,
-						TupleConversionMap ***tup_conv_maps,
-						TupleTableSlot **partition_tuple_slot,
-						int *num_parted, int *num_partitions)
+						PartitionTupleRouting **partition_tuple_routing)

Since we're consolidating all of ExecSetupPartitionTupleRouting's output parameters into a single structure, I think it might make more sense to have it just return that value. I think it's only done with output parameters today because there are so many different things being produced, and we can't return them all.

+	PartitionTupleRouting *ptr = mtstate->mt_partition_tuple_routing;

This is just nitpicking, but I don't find "ptr" to be the greatest variable name; it looks too much like "pointer". Maybe we could use "routing" or "proute" or something.

It seems to me that we could improve things here by adding a function ExecCleanupTupleRouting(PartitionTupleRouting *) which would do the various heap_close(), ExecDropSingleTupleTableSlot(), and ExecCloseIndices() operations which are currently performed in CopyFrom() and, by separate code, in ExecEndModifyTable().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
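To sketch what that last consolidation might look like: the following is a rough, hypothetical version of ExecCleanupTupleRouting(). The PartitionTupleRouting field names (num_dispatch, num_partitions, and so on) are invented here, since the struct layout is not shown in this thread.

    /*
     * Hypothetical consolidation of the cleanup currently duplicated in
     * CopyFrom() and ExecEndModifyTable().  Field names are illustrative only.
     */
    static void
    ExecCleanupTupleRouting(PartitionTupleRouting *proute)
    {
    	int			i;

    	/* Close the partition-dispatch relations; the root is the caller's */
    	for (i = 1; i < proute->num_dispatch; i++)
    		heap_close(proute->partition_dispatch_info[i]->reldesc, NoLock);

    	/* Close the leaf partition relations and their indexes */
    	for (i = 0; i < proute->num_partitions; i++)
    	{
    		ResultRelInfo *resultRelInfo = proute->partitions + i;

    		ExecCloseIndices(resultRelInfo);
    		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
    	}

    	/* Release the dedicated slot used while routing tuples */
    	if (proute->partition_tuple_slot)
    		ExecDropSingleTupleTableSlot(proute->partition_tuple_slot);
    }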
Re: GSoC 2018
Aleksander, * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote: > > HA/fail-over is a very broad topic, with a lot of pieces that need to be > > done such that I'm not sure it's really viable, but perhaps a precursor > > project (synchronous logical replication seems like a prereq, no?) would > > make more sense. Or, perhaps, a different piece of the HA question, but > > solving the whole thing in a summer strikes me as unlikely to be > > reasonable. > > Frankly I got the impression that logical replication in PostgreSQL 10 > is as synchronous as physical replication in terms that it treats > synchronous_commit the same way and gives exactly same guarantees. Is > there a problem with current implementation of logical replication I'm > not aware of? Or by synchronous logical replication you meant something > different? synchronous_commit isn't the same as synchronous_standby_names ... We could have used better names for those GUCs, I suspect. > Regarding the difficulty of the project - in fact it's not that > difficult. Particularly this project can rely on external tools, e.g. > use Consul for service discovery and leader election based on > leader-lease approach (implementation [1]). Having this the only thing > is missing is automatic replica promoting and (optionally) > re-configuring of HAProxy / pgbouncer / whatever. Yes, and lots of > Jepsen-like test of course. I believe it's not such a complicated > project. What you're talking about is rebuilding Patroni, but adding more into it than even Patroni tries to do, building it on Logical Replication instead of physical replication, and calling it simple and something that could get into core PostgreSQL over a 12 week GSoC project. I've certainly got doubts about that, even if we decide that it'd be an external-to-PG project (like Patroni). What might be interesting is seeing if Logical Replication could be added to Patroni as an option and then building on that.. Having someone involved in the Patroni project would be the right way to go about proposing that though to see what they think of it. That would also be much more sensible as a GSoC project, since it'd be an addition to an existing project and not more-or-less starting a whole new project. > > Regarding the thrift data type, that seems like a pretty good GSoC > > project, though I'm not sure why you suggest having pl/pgsql functions > > for accessing data from .thrift files- plpgsql can't directly access the > > filesystem and the input routines for .thrift-style data would certainly > > be best in C. > > What I meant was generating PL/pgSQL procedures from .thift files, like > `MyStructure_getFieldX(bytea) returns text`. It took me a few hours to > write pg_protobuf so I decided the project would be way to easy without > implementing such tool. I changed the text on wiki, hopefully it's > better now. Wouldn't it make more sense to have something like what we have for JSONB where there's operators that are used to pull out specific fields instead of generated pl/pgsql procedures..? Thanks! Stephen signature.asc Description: Digital signature
Re: perltidy source code has been removed (pgindent)
On 12/14/2017 08:55 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 12/14/2017 08:37 PM, Jordan Deitch wrote:
>>> I am unable to build pgindent as it appears the perltidy source has
>>> been removed from sourceforge:
>>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/tools/pgindent/README;hb=HEAD
>> I just downloaded it from there.
>
> The 20090616 release we recommend? I don't see it there either:
>
> https://sourceforge.net/projects/perltidy/files/
>
> Maybe it's time to update to a less hoary version?
>
> In any case, you don't need perltidy unless you intend to reindent
> the Perl code.

Oh, oops, got the wrong version, you're right, sorry for the noise. Yeah, seems like a good time to update it. I'm currently using 20170501 for the buildfarm code.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pgbench more operators & functions
>> 2) In makeVariableValue():
>>     if (pg_strcasecmp(var->svalue, "null") == 0)
>>     ...
>>     else if (pg_strncasecmp(var->svalue, "true", slen)
>>
>> mixing of pg_strcasecmp and pg_strNcasecmp. And, IMHO, the result of
>> pg_strncasecmp("tru", "true", 1) will be 0.
>
> Yep, but it cannot be called like that because slen == strlen(var->svalue).

Sorry, I mistyped: pg_strncasecmp("tru", "true", 3) will be 0.

>> It may be good for 't' or 'f', but it seems too loose a grammar to accept
>> 'tr' or 'fa' or even 'o', which is not actually known to be on or off.
>
> Yes, it really works like that. I tried to make something clearer than
> "src/bin/psql/variable.c". Maybe I did not succeed.

Ok, I see. The current coding accepts truexxx, falsexxx, yesxx, noxxx but doesn't accept offxxx and onxxx - not as consistent as it could be. Also it doesn't accept 1 and 0 as psql does, but it's obvious why.

> I used "PGBT_NO_VALUE" which seemed clearer, since otherwise a variable
> may be set while its value says "not set", which would look strange.

Agree.

Sorry, but I found more things to note:

1) '~' and unary '-' should be commented; it's not easy to guess how they are actually implemented (-1 XOR value, remember that -1 is 0xf...f).

2)
-	| expr '%' expr { $$ = make_op(yyscanner, "%", $1, $3); }
+	| expr '%' expr { $$ = make_op(yyscanner, "mod", $1, $3); }

Why is the MOD operation renamed? Did I miss something in the thread?

Looking at the psql and pgbench scripting implementations, wouldn't it be better to integrate Lua into psql & pgbench?

-- 
Teodor Sigaev  E-mail: teo...@sigaev.ru
               WWW: http://www.sigaev.ru/
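For readers following the pg_strncasecmp() point above, a self-contained demonstration of why any prefix of the keyword matches when the comparison length is the length of the user's string. The helper name here is made up; the actual pgbench code differs.

    #include "postgres_fe.h"		/* pulls in port.h, which declares pg_strncasecmp() */

    #include <string.h>

    static bool
    looks_like_true(const char *svalue)
    {
    	size_t		slen = strlen(svalue);

    	/*
    	 * Only the first slen bytes are compared, so any non-empty prefix of
    	 * "true" -- "t", "tr", "tru", "true", in any casing -- returns 0 here.
    	 */
    	return slen > 0 && pg_strncasecmp(svalue, "true", slen) == 0;
    }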
Re: GSoC 2018
Hi Stephen,

> synchronous_commit isn't the same as synchronous_standby_names ...

What about synchronous_standby_names? Does logical replication ignore it and not wait for acks from replicas, or something?

> What might be interesting is seeing if Logical Replication could be
> added to Patroni as an option and then building on that.. Having
> someone involved in the Patroni project would be the right way to go
> about proposing that though to see what they think of it. That would
> also be much more sensible as a GSoC project, since it'd be an addition
> to an existing project and not more-or-less starting a whole new
> project.

Completely agree, this project could be an improvement for Stolon (or Patroni, though I have personally never tested or used it; also, I have a feeling that the Google folks will prefer a project written in Go). This would make much more sense.

> Wouldn't it make more sense to have something like what we have for
> JSONB where there's operators that are used to pull out specific fields
> instead of generated pl/pgsql procedures..?

I don't see why we can't have both procedures and operators like:

```
x -> 'field1'   -- returns int
x ->> 'field2'  -- returns string
```

The reason why some users may prefer procedures to operators is that it's easy to make a typo in a field name. Procedures are safer in this regard.

-- 
Best regards,
Aleksander Alekseev

signature.asc Description: PGP signature
[HACKERS] wrong t_bits alignment in pageinspect
Hi!

I found a problem in how the heap_page_items function exposes the value of the t_bits field. When the number of attributes in a table is a multiple of eight, the t_bits column shows double the number of bits, with bits from the data fields included. For example:

# create table tbl(f1 int, f2 int, f3 int, f4 int, f5 int, f6 int, f7 int, f8 int);
# insert into tbl(f1, f8) values (x'f1'::int, 0);
# select t_bits, t_data from heap_page_items(get_raw_page('tbl', 0));
  t_bits  | t_data
----------+--------
 10011000 | \xf100

I suppose the prefix 1001 corresponds to the real value of t_bits, and the rest, 1000, to the lower byte of the f1 field of tbl. The attached patch fixes this issue.

-- 
Regards,
Maksim Milyutin

diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index ca4d3f5..fe53a1a 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -234,7 +234,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 				int			bits_len;
 
 				bits_len =
-					((tuphdr->t_infomask2 & HEAP_NATTS_MASK) / 8 + 1) * 8;
+					TYPEALIGN(8, (tuphdr->t_infomask2 & HEAP_NATTS_MASK));
 				values[11] = CStringGetTextDatum(
 								 bits_to_text(tuphdr->t_bits, bits_len));
 			}
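To see the off-by-eight concretely: the old formula always rounds up to the next multiple of eight, even when the attribute count is already a multiple of eight, while TYPEALIGN only rounds up when needed. A standalone check (the TYPEALIGN definition is copied from src/include/c.h):

    #include <stdio.h>
    #include <stdint.h>

    /* as defined in src/include/c.h */
    #define TYPEALIGN(ALIGNVAL, LEN) \
    	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

    int
    main(void)
    {
    	for (int natts = 7; natts <= 9; natts++)
    		printf("natts=%d  old=%d  new=%d\n",
    			   natts,
    			   (natts / 8 + 1) * 8,			/* old formula */
    			   (int) TYPEALIGN(8, natts));	/* patched formula */
    	return 0;
    }

    /*
     * Output:
     *   natts=7  old=8   new=8
     *   natts=8  old=16  new=8    <-- the bug: eight extra bits displayed
     *   natts=9  old=16  new=16
     */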
Re: GSoC 2018
Aleksander Alekseev writes:
> Hi Stephen,
>
>> synchronous_commit isn't the same as synchronous_standby_names ...
>
> What about synchronous_standby_names? Logical replication ignores it and
> doesn't wait for acks from replicas or something?

It actually works, see [1]:

    The name of a standby server for this purpose is the application_name
    setting of the standby, as set in the standby's connection information.
    In case of a physical replication standby, this should be set in the
    primary_conninfo setting in recovery.conf; the default is walreceiver.
    For logical replication, this can be set in the connection information
    of the subscription, and it defaults to the subscription name.

[1] https://www.postgresql.org/docs/current/static/runtime-config-replication.html

-- 
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: GSoC 2018
Hello,

On Fri, Dec 15, 2017, at 14:30, Stephen Frost wrote:
> Aleksander,
>
> * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
>> Regarding the difficulty of the project - in fact it's not that
>> difficult. Particularly this project can rely on external tools, e.g.
>> use Consul for service discovery and leader election based on the
>> leader-lease approach (implementation [1]). Having this, the only thing
>> missing is automatic replica promotion and (optionally) re-configuration
>> of HAProxy / pgbouncer / whatever. Yes, and lots of Jepsen-like tests of
>> course. I believe it's not such a complicated project.

Does it make sense to address the limitations of logical replication first, i.e. its inability to replicate DDL, sequences and so on?

> What you're talking about is rebuilding Patroni, but adding more into it
> than even Patroni tries to do, building it on Logical Replication
> instead of physical replication, and calling it simple and something
> that could get into core PostgreSQL over a 12 week GSoC project. I've
> certainly got doubts about that, even if we decide that it'd be an
> external-to-PG project (like Patroni).
>
> What might be interesting is seeing if Logical Replication could be
> added to Patroni as an option and then building on that.. Having
> someone involved in the Patroni project would be the right way to go
> about proposing that though to see what they think of it. That would
> also be much more sensible as a GSoC project, since it'd be an addition
> to an existing project and not more-or-less starting a whole new
> project.

Right now logical replication and physical replication-based HA tools don't work together nicely, since the logical replication position is not propagated to the promoted replica (I think Craig Ringer has been tackling this issue for a few releases already; the latest set of patches I could find is https://commitfest.postgresql.org/15/788/). Perhaps there is an opportunity for a GSoC student to help fix it. Until then we cannot use logical replication for HA, and even doing something simpler, like automating the creation of logical replicas in Patroni, makes little sense, as they are doomed to be reinitialized on every failover.

-- 
Sincerely,
Alex
Re: GSoC 2018
Alex, * Alex Kliukin (al...@hintbits.com) wrote: > On Fri, Dec 15, 2017, at 14:30, Stephen Frost wrote: > > * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote: > > > > > Regarding the difficulty of the project - in fact it's not that > > > difficult. Particularly this project can rely on external tools, e.g. > > > use Consul for service discovery and leader election based on > > > leader-lease approach (implementation [1]). Having this the only thing > > > is missing is automatic replica promoting and (optionally) > > > re-configuring of HAProxy / pgbouncer / whatever. Yes, and lots of > > > Jepsen-like test of course. I believe it's not such a complicated > > > project. > > Does it make sense to address the limitations of the logical > replication first, i.e. inability to replicate DDL, sequences and so on? This is what I was trying to get at earlier, which was mostly just around trying to scope the project down to something a bit more reasonable since the initial proposal sounded like it could be a whole new and independent OSS project on its own. > > What you're talking about is rebuilding Patroni, but adding more into it > > than even Patroni tries to do, building it on Logical Replication > > instead of physical replication, and calling it simple and something > > that could get into core PostgreSQL over a 12 week GSoC project. I've > > certainly got doubts about that, even if we decide that it'd be an > > external-to-PG project (like Patroni). > > > > What might be interesting is seeing if Logical Replication could be > > added to Patroni as an option and then building on that.. Having > > someone involved in the Patroni project would be the right way to go > > about proposing that though to see what they think of it. That would > > also be much more sensible as a GSoC project, since it'd be an addition > > to an existing project and not more-or-less starting a whole new > > project. > > Right now logical replication and physical replication-based HA tools > don't work together nicely, since logical replication position is not > propagated to the promoted replica (I think Craig Ringer has been > tackling this issue for a few releases already, the latest set of > patches I could find is https://commitfest.postgresql.org/15/788/). > Perhaps there is opportunity for a GSoC student to help fixing it. Until > then we cannot use logical replication for HA, and even doing something > simpler like automating creation of logical replicas in Patroni makes > little sense, as they are doomed to be reinitialized on every failover. This also sounds like a good project for a GSoC student. To be clear, we can certainly have project ideas on the Ideas page which build on things that aren't yet committed or to help with existing projects that are already underway. The project needs to be the right level of effort (remember- this will be the student's job for 12 weeks) and needs to be something that a student would be able to come up to speed on and make serious and meaningful progress on during their summer. When it comes to timeline, student proposals start being accepted in March. Thanks! Stephen signature.asc Description: Digital signature
Re: GSoC 2018
On Fri, Dec 15, 2017, at 14:52, Aleksander Alekseev wrote: > Completely agree, this project can be an improvement for Stolon (or > Patroni, but I personally never tested or used it, also I got a feeling > that Google guys will prefer a project that is written in Go). This > would make much more sense. I don't believe Google will reject a project based on the fact that it is written in Python (in fact, Python Software Foundation has successfully participated in GSoC for many years). Based on the github statistics, Patroni has started earlier and has more contributors than Stolon (including those contributed more than one patch/pull-request.) -- Sincerely, Alex
Re: GSoC 2018
Alex, * Alex Kliukin (al...@hintbits.com) wrote: > On Fri, Dec 15, 2017, at 14:52, Aleksander Alekseev wrote: > > Completely agree, this project can be an improvement for Stolon (or > > Patroni, but I personally never tested or used it, also I got a feeling > > that Google guys will prefer a project that is written in Go). This > > would make much more sense. > > I don't believe Google will reject a project based on the fact that it > is written in Python (in fact, Python Software Foundation has > successfully participated in GSoC for many years). Based on the github > statistics, Patroni has started earlier and has more contributors than > Stolon (including those contributed more than one patch/pull-request.) I am 100% certain that Google will be more than happy to accept a Python-based project and that they won't favor a Go project over a Python one. That said, I don't think we need to get into a discussion about which project would be best- I'd be happy to have them both listed on the Ideas page (perhaps independently would be best though) provided there are mentors for each. Also, remember, these are just our ideas- students are welcome to pick which they're interested in or to choose their own. When we get to the point of evaluating the student proposals, it will be a discussion among the GSoC admins and the individuals who have signed up to be mentors. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgbench more operators & functions
2017-12-15 14:47 GMT+01:00 Teodor Sigaev:
>>> 2) In makeVariableValue():
>>>     if (pg_strcasecmp(var->svalue, "null") == 0)
>>>     ...
>>>     else if (pg_strncasecmp(var->svalue, "true", slen)
>>>
>>> mixing of pg_strcasecmp and pg_strNcasecmp. And, IMHO, the result of
>>> pg_strncasecmp("tru", "true", 1) will be 0.
>>
>> Yep, but it cannot be called like that because slen == strlen(var->svalue).
>
> Sorry, I mistyped: pg_strncasecmp("tru", "true", 3) will be 0.
>
>>> It may be good for 't' or 'f', but it seems too loose a grammar to accept
>>> 'tr' or 'fa' or even 'o', which is not actually known to be on or off.
>>
>> Yes, it really works like that. I tried to make something clearer than
>> "src/bin/psql/variable.c". Maybe I did not succeed.
>
> Ok, I see. The current coding accepts truexxx, falsexxx, yesxx, noxxx but
> doesn't accept offxxx and onxxx - not as consistent as it could be. Also
> it doesn't accept 1 and 0 as psql does, but it's obvious why.
>
>> I used "PGBT_NO_VALUE" which seemed clearer, since otherwise a variable
>> may be set while its value says "not set", which would look strange.
>
> Agree.
>
> Sorry, but I found more things to note:
>
> 1) '~' and unary '-' should be commented; it's not easy to guess how they
> are actually implemented (-1 XOR value, remember that -1 is 0xf...f).
>
> 2)
> -	| expr '%' expr { $$ = make_op(yyscanner, "%", $1, $3); }
> +	| expr '%' expr { $$ = make_op(yyscanner, "mod", $1, $3); }
>
> Why is the MOD operation renamed? Did I miss something in the thread?
>
> Looking at the psql and pgbench scripting implementations, wouldn't it be
> better to integrate Lua into psql & pgbench?

I don't think so - I like the Lua language, but it would mean a new dependency, and not a small one. These changes are only a few lines, and nobody expects to build complex applications in pgbench or psql.

Regards

Pavel

> --
> Teodor Sigaev  E-mail: teo...@sigaev.ru
>                WWW: http://www.sigaev.ru/
Re: Package version in PG_VERSION and version()
Christoph Berg writes: > Re: Michael Paquier 2017-12-15 > >> Why reinventing the wheel when there is already --with-extra-version >> that you can use for the same purpose? > That modifies the PG version number as such, as what psql is showing > on connect. I'd think that is too intrusive. I'm really pretty much -1 on having two different ways to do very nearly the same thing, with the differences determined only by somebody's arbitrary choices of where they think the modified version should be exposed. IMO, either you think the Debian package version is important enough to show, or you don't. (I'd incline to the "don't" side anyway.) regards, tom lane
Re: Reproducible builds: genbki.pl vs schemapg.h
Christoph Berg writes: > Debian's reproducible builds project has revealed that the full build > path gets embedded into server/catalog/schemapg.h: genbki.pl is hardly our only script that prints its $0 ... regards, tom lane
[PROPOSAL] bracketed-paste support for psql
Hi,

It occurred to me the other day, while people were talking about pasting blocks of text creating problems (especially with tabs), that xterm's bracketed-paste support, which would block curses handling and just paste as-is (it also works in at least PuTTY and probably others), would be a useful (and I'm guessing relatively simple) thing to add.

Is anyone already working on this? If not, does anyone foresee any problems with the idea?

Geoff
Re: [PROPOSAL] bracketed-paste support for psql
On 12/15/17 11:22, Geoff Winkless wrote:
> It occurred to me the other day while people were talking about
> pasting blocks of text creating problems, especially with tabs, that
> xterm bracketed-paste support (also works in at least putty and
> probably others) that would block curses handling and just paste as-is
> would be a useful (and I'm guessing relatively simple) thing to add.

You need to put

    set enable-bracketed-paste on

into ~/.inputrc, then it works.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] replace GrantObjectType with ObjectType
On 12/13/17 02:35, Michael Paquier wrote: > Patch 0001 is simply removing EventTriggerSupportsGrantObjectType(), > but shouldn't we keep it and return an error for objects that have no > GRANT support? Returning conditionally true looks like a trap waiting > to take someone in. I don't understand the motivation for this. It would just be two lists for the same thing. I think the potential for omission would be much greater that way. > Similarly, not using default in the switches of > stringify_adefprivs_objtype() and stringify_grantobjtype() would be > nice to grab warnings during compilation. And patch 0002 is doing it > the correct way in aclcheck_error(). I'll fix that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] replace GrantObjectType with ObjectType
On 12/14/17 22:59, Rushabh Lathia wrote: > I noted that no_priv_msg and not_owner_msg array been removed > and code fitted the code into aclcheck_error(). Actually that > makes the code more complex then what it used to be. I would > prefer the array rather then code been fitted into the function. There is an argument for having a big array versus the switch/case approach. But most existing code around object addresses uses the switch/case approach, so it's better to align it that way, I think. It's weird to have to maintain two different styles. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] bracketed-paste support for psql
On 15 December 2017 at 17:13, Peter Eisentraut wrote: > You need to put > > set enable-bracketed-paste on > > into ~/.inputrc, then it works. Hmm, looks like that requires a newer version of readline (v7) than I have here. Oh well, if support is already there (albeit unavailable) then I'll leave it. No point duplicating effort. Geoff
Re: Reproducible builds: genbki.pl vs schemapg.h
Re: Tom Lane 2017-12-15 <9616.1513351...@sss.pgh.pa.us> > Christoph Berg writes: > > Debian's reproducible builds project has revealed that the full build > > path gets embedded into server/catalog/schemapg.h: > > genbki.pl is hardly our only script that prints its $0 ... As per https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html, that's the only place that makes it into the resulting binary. I wouldn't be sending a patch if it didn't fix the issue. Christoph -- Senior Berater, Tel.: +49 2166 9901 187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund wrote:
> Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
> of the cutoff changes, polished some error messages.
>
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have one minor piece of feedback on the upgrading of assertions to ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could upgrade the raw elog() "can't happen" error within IndexBuildHeapRangeScan() to be an ereport() with ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent tuple for heap-only tuple" error, which I think merits being a real user-visible error, just like the relfrozenxid/relminmxid tests are now. As you know, the enhanced amcheck will sometimes detect corruption due to this bug by hitting that error.

It would be nice if we could tighten up the number of errcodes that can be involved in an error that amcheck detects. I know that elog() implicitly has an errcode that could be included in the errcodes to check when verification occurs in an automated fashion across a large number of databases. However, this is a pretty esoteric point, and I'd prefer to just try to limit it to ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED, insofar as that's practical. When we ran amcheck on the Heroku fleet, back when I worked there, there were all kinds of uninteresting errors that could occur that needed to be filtered out. I want to try to make that process somewhat less painful.

-- 
Peter Geoghegan
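The requested change is mechanical; a hedged sketch of the before and after (the message text and variable names are approximate and may differ slightly across branches):

    /* before: elog() implicitly reports the generic internal-error errcode */
    elog(ERROR, "failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
    	 ItemPointerGetBlockNumber(&heapTuple->t_self),
    	 ItemPointerGetOffsetNumber(&heapTuple->t_self),
    	 RelationGetRelationName(heapRelation));

    /*
     * after: classify it as detected data corruption; errmsg_internal()
     * keeps the message untranslated, as befits a can't-happen condition
     */
    ereport(ERROR,
    		(errcode(ERRCODE_DATA_CORRUPTION),
    		 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
    						 ItemPointerGetBlockNumber(&heapTuple->t_self),
    						 ItemPointerGetOffsetNumber(&heapTuple->t_self),
    						 RelationGetRelationName(heapRelation))));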
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-15 20:25:22 +0900, Michael Paquier wrote: > On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund wrote: > > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you > > could have a look at the backported version, just about everything but > > v10 had conflicts, some of them not insubstantial. > > I have gone through the backpatched versions for the fixes in tuple > pruning, running some tests on the way and those look good to me. Thanks. > I have not taken the time to go through the ones changing the > assertions to ereport() though... Those were the ones with a lot of conflicts tho - I'd temporarily broken freezing for 9.3, but both review and testing caught it... Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-15 10:46:05 -0800, Peter Geoghegan wrote: > On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund wrote: > > Pushed this way. Moved some more relfrozenxid/relminmxid tests outside > > of the cutoff changes, polished some error messages. > > > > > > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you > > could have a look at the backported version, just about everything but > > v10 had conflicts, some of them not insubstantial. > > I have one minor piece of feedback on the upgrading of assertions to > ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could > upgrade the raw elog() "can't happen" error within > IndexBuildHeapRangeScan() to be an ereport() with > ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent > tuple for heap-only tuple" error, which I think merits being a real > user-visible error, just like the relfrozenxid/relminmxid tests are > now. As you know, the enhanced amcheck will sometimes detect > corruption due to this bug by hitting that error. I'm not opposed to that, it just seems independent from this thread. Not sure I really want to go around and backpatch such a change, that code has changed a bit between branches. Happy to do so on master. Greetings, Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Fri, Dec 15, 2017 at 10:58 AM, Andres Freund wrote: >> I have one minor piece of feedback on the upgrading of assertions to >> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could >> upgrade the raw elog() "can't happen" error within >> IndexBuildHeapRangeScan() to be an ereport() with >> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent >> tuple for heap-only tuple" error, which I think merits being a real >> user-visible error, just like the relfrozenxid/relminmxid tests are >> now. As you know, the enhanced amcheck will sometimes detect >> corruption due to this bug by hitting that error. > > I'm not opposed to that, it just seems independent from this thread. Not > sure I really want to go around and backpatch such a change, that code > has changed a bit between branches. Happy to do so on master. The elog(), which was itself upgraded from a simple Assert by commit d70cf811, appears in exactly the same form in 9.3+. Things did change there, but they were kept in sync. -- Peter Geoghegan
Re: Top-N sorts versus parallelism
On Thu, Dec 14, 2017 at 5:12 PM, Thomas Munro wrote:
>> This looks like a costing bug. The estimated cost of sorting 416,667
>> estimated tuples in one parallel worker is almost identical to the
>> estimated cost of sorting 1,000,000 tuples when parallelism is turned
>> off. Adding some logging to the cost_sort function, it looks like the
>> limit does not get sent down for the parallel estimate:
>>
>> NOTICE: JJ cost_sort tuples 100.00, limit 61.00, sort_mem 65536
>> NOTICE: JJ cost_sort tuples 416667.00, limit -1.00, sort_mem 65536
>>
>> So the LIMIT gets passed down for the execution step, but not for the
>> planning step.
>
> Oh, well spotted. I was looking in the wrong place. Maybe the fix is
> as simple as the attached? It certainly helps in the cases I tested,
> including with wider tables.

I had hit on the same change, and was also surprised that it was located where it was. With the change, it uses the parallel plan all the way down to LIMIT 1.

With the patch, it still satisfies make check, so if it introduces errors they are subtle ones. If we can't actually do this and it needs to stay -1, then I think we need a comment to explain why.

Cheers,

Jeff
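The attachment itself is not reproduced in this digest, so as a hedged guess at the shape of the one-line change being discussed: in create_ordered_paths() in planner.c, the sort built on top of the cheapest partial path is costed with no row limit, while the non-parallel path receives limit_tuples, so the fix would presumably look something like this.

    /*
     * Sketch only; the actual patch is the attachment to the quoted mail.
     * In the parallel branch of create_ordered_paths(), pass the limit to
     * the sort's costing instead of -1.0:
     */
    path = (Path *) create_sort_path(root,
    								 ordered_rel,
    								 cheapest_partial_path,
    								 root->sort_pathkeys,
    								 limit_tuples);		/* was -1.0 */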
Re: [HACKERS] replace GrantObjectType with ObjectType
On Fri, Dec 15, 2017 at 12:40 PM, Peter Eisentraut wrote: > On 12/14/17 22:59, Rushabh Lathia wrote: >> I noted that no_priv_msg and not_owner_msg array been removed >> and code fitted the code into aclcheck_error(). Actually that >> makes the code more complex then what it used to be. I would >> prefer the array rather then code been fitted into the function. > > There is an argument for having a big array versus the switch/case > approach. But most existing code around object addresses uses the > switch/case approach, so it's better to align it that way, I think. > It's weird to have to maintain two different styles. Eh, really? What about the two big arrays at the top of objectaddress.c? (This is just a drive-by comment; I haven't looked at the details of this patch.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan wrote: > The elog(), which was itself upgraded from a simple Assert by commit > d70cf811, appears in exactly the same form in 9.3+. Things did change > there, but they were kept in sync. BTW, if you're going to do it, I would target the similar error within validate_index_heapscan(), too. That was also added by d70cf811. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On 2017-12-15 11:15:47 -0800, Peter Geoghegan wrote: > On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan wrote: > > The elog(), which was itself upgraded from a simple Assert by commit > > d70cf811, appears in exactly the same form in 9.3+. Things did change > > there, but they were kept in sync. > > BTW, if you're going to do it, I would target the similar error within > validate_index_heapscan(), too. That was also added by d70cf811. Please send a patch for master on a *new* thread. Greetings, Andres Freund
[sqlsmith] Parallel worker executor crash on master
Hi,

sqlsmith just crashed a parallel worker while testing master at
699bf7d05c. I can reproduce it with the following recipe on a fresh
regression database. Backtrace and query plan below as well.

regards,
Andreas

--8<---cut here---start->8---
set min_parallel_table_scan_size = '8kB';
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

select * from public.prt1_m_p3 as ref_3
    inner join pg_catalog.pg_class as sample_1 tablesample bernoulli (2.1)
        on (ref_3.c = sample_1.relpages)
    inner join public.rtest_view4 as ref_4
        on (ref_4.b @@ (select keyword from public.test_tsquery limit 1 offset 2)),
    lateral (select ref_3.a as c0
             from public.lseg_tbl as ref_5
             where (select f1 from public.timetz_tbl limit 1 offset 5)
                 > (select pg_catalog.min(timetzcol) from public.brintest)
             limit 18) as subq_2
where true
limit 102;
--8<---cut here---end--->8---

                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=107.69..1421.69 rows=102 width=315)
   InitPlan 1 (returns $1)
     ->  Limit  (cost=0.35..0.52 rows=1 width=32)
           ->  Gather  (cost=0.00..1.04 rows=6 width=32)
                 Workers Planned: 1
                 ->  Parallel Seq Scan on test_tsquery  (cost=0.00..1.04 rows=4 width=32)
   ->  Nested Loop  (cost=107.17..5775.39 rows=440 width=315)
         Join Filter: (ref_3.c = sample_1.relpages)
         ->  Nested Loop  (cost=107.17..5416.40 rows=250 width=16)
               ->  Gather  (cost=0.00..1.29 rows=50 width=12)
                     Workers Planned: 1
                     ->  Parallel Seq Scan on prt1_m_p3 ref_3  (cost=0.00..1.29 rows=29 width=12)
               ->  Limit  (cost=107.17..108.20 rows=5 width=4)
                     InitPlan 2 (returns $4)
                       ->  Limit  (cost=0.45..0.54 rows=1 width=12)
                             ->  Gather  (cost=0.00..1.07 rows=12 width=12)
                                   Workers Planned: 1
                                   ->  Parallel Seq Scan on timetz_tbl  (cost=0.00..1.07 rows=7 width=12)
                     InitPlan 3 (returns $5)
                       ->  Aggregate  (cost=106.62..106.64 rows=1 width=12)
                             ->  Seq Scan on brintest  (cost=0.00..106.30 rows=130 width=12)
                     ->  Gather  (cost=0.00..1.03 rows=5 width=4)
                           Workers Planned: 1
                           Params Evaluated: $4, $5
                           ->  Result  (cost=0.00..1.03 rows=3 width=0)
                                 One-Time Filter: ($4 > $5)
                                 ->  Parallel Seq Scan on lseg_tbl ref_5  (cost=0.00..1.03 rows=3 width=0)
         ->  Materialize  (cost=0.00..194.10 rows=44 width=299)
               ->  Nested Loop  (cost=0.00..193.88 rows=44 width=299)
                     ->  Gather  (cost=0.00..140.00 rows=1 width=40)
                           Workers Planned: 2
                           Params Evaluated: $1
                           ->  Parallel Seq Scan on rtest_view4 ref_4  (cost=0.00..140.00 rows=1 width=40)
                                 Filter: (b @@ $1)
                     ->  Sample Scan on pg_class sample_1  (cost=0.00..53.44 rows=44 width=259)
                           Sampling: bernoulli ('2.1'::real)

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x55c9dba2d7f8 in timetz_gt (fcinfo=0x55c9dd7295a0) at date.c:2183
2183        TimeTzADT  *time2 = PG_GETARG_TIMETZADT_P(1);
#1  0x55c9db8ae8b2 in ExecInterpExpr (state=0x55c9dd729f00, econtext=0x55c9dd7299b8, isnull=<optimized out>) at execExprInterp.c:692
#2  0x55c9db8d6753 in ExecEvalExprSwitchContext (isNull=0x7ffd2f99e55f "", econtext=0x55c9dd7299b8, state=<optimized out>) at ../../../src/include/executor/executor.h:300
#3  ExecQual (econtext=0x55c9dd7299b8, state=<optimized out>) at ../../../src/include/executor/executor.h:369
#4  ExecResult (pstate=0x55c9dd729c38) at nodeResult.c:84
#5  0x55c9db8b21ea in ExecProcNode (node=0x55c9dd729c38) at ../../../src/include/executor/executor.h:242
#6  ExecutePlan (execute_once=<optimized out>, dest=0x55c9dd6fff78, direction=<optimized out>, numberTuples=18, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55c9dd729c38, estate=0x55c9dd729118) at execMain.c:1718
#7  standard_ExecutorRun (queryDesc=0x55c9dd742b48, direction=<optimized out>, count=18, execute_once=<optimized out>) at execMain.c:361
#8  0x55c9db8b6e99 in ParallelQueryMain (seg=0x55c9dd6a8ea8, toc=0x7f9109496000) at execParallel.c:1294
#9  0x55c9db7911d9 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1150
#10 0x55c9db975a63 in StartBackgroundWorker () at bgworker.c:841
#11 0x55c9db982015 in do_start_bgworker (rw=0x55c9dd6a0380)
Bug: Ambiguous Column Reference Allowed When Joining to pg_roles.oid
I recently fell afoul of a weird edge case while writing an extension.
It seems Postgres allows for an ambiguous column reference to oid in the
where clause when joining to pg_roles. It just arbitrarily chooses
pg_roles.oid and ignores the conflicting name. Example:

postgres=# CREATE TABLE t_demo();
CREATE TABLE
postgres=# SELECT r.rolname FROM pg_class c JOIN pg_roles r ON (c.relowner = r.oid) WHERE oid = 't_demo'::regclass;
 rolname
---------
(0 rows)

postgres=# SELECT r.rolname FROM pg_class c JOIN pg_roles r ON (c.relowner = r.oid) WHERE c.oid = 't_demo'::regclass;
 rolname
----------
 postgres
(1 row)

postgres=# SELECT r.rolname FROM pg_class c JOIN pg_roles r ON (c.relowner = r.oid) WHERE r.oid = 't_demo'::regclass;
 rolname
---------
(0 rows)

It seems like ambiguous oid references generally hit a different error
message than normal ambiguous column references.

postgres=# CREATE TABLE t1(x int, y int) WITH OIDS;
CREATE TABLE
postgres=# CREATE TABLE t2(x int, y int) WITH OIDS;
CREATE TABLE
postgres=# SELECT * FROM t1 JOIN t2 ON (t1.x = t2.x) WHERE y = 5;
ERROR:  column reference "y" is ambiguous
LINE 1: SELECT * FROM t1 JOIN t2 ON (t1.x = t2.x) WHERE y = 5;
postgres=# SELECT * FROM t1 JOIN t2 ON (t1.x = t2.x) WHERE oid = 5;
ERROR:  column "oid" does not exist
LINE 1: SELECT * FROM t1 JOIN t2 ON (t1.x = t2.x) WHERE oid = 5;
                                                        ^
HINT:  There is a column named "oid" in table "t1", but it cannot be referenced from this part of the query.

It's clear that oids are getting to another code path normally, and I
suspected that it is related to the fact that pg_roles is a view with an
explicit oid column. So I tried this test:

postgres=# CREATE VIEW v1 AS SELECT x, y, oid FROM t1;
CREATE VIEW
postgres=# SELECT * FROM v1 JOIN t2 ON (v1.x = t2.x) WHERE oid = 5;
 x | y | oid | x | y
---+---+-----+---+---
(0 rows)

It would appear that unqualified oid columns do not make it all the way
to where clause evaluation, whereas columns that happen to be named oid
do survive that far. Therefore, postgres does not realize that it has an
ambiguous column reference on its hands and binds to the column
presented by the view. I could definitely be wrong because I haven't
looked at the code, but that is what the behavior looks like.

This bug was first found on 9.3.19, but I just tested this against 10.1
as well.

- Matt K
genomic locus
Greetings everyone,

I need a data type to represent genomic positions, which will consist of
a string and a pair of integers, with interval logic and access methods.
Sort of like my seg type, but more straightforward.

I noticed somebody took good care of seg while I was away for the last
20 years, and I am extremely grateful for that. I have been using it. In
the meantime, things have changed, and now I am almost clueless about
how you deal with contributed modules and what steps I should take once
I get it to work. Also, is it a good idea to clone and fix seg for this
purpose, or is there a more appropriate template? Or maybe just building
it from scratch will be a better idea?

I have seen a lot of bit rot in other extensions (never contributed)
that I have not maintained since 2009, and I am now unable to fix some
of them, so I wonder how much of the old knowledge is still applicable.
In other words, is what I see in new code just a change of macros, or a
change of principles?

Thanks,

--Gene
Re: Top-N sorts verses parallelism
On Fri, Dec 15, 2017 at 2:10 PM, Jeff Janes wrote: > I had hit on the same change. And was also surprised that it was located > where it was. With the change, it uses the parallel plan all the way down > to LIMIT 1. > > With the patch, it still satisfies make check, so if it introduces errors > they are subtle ones. If we can't actually do this and it needs to stay -1, > then I think we need a comment to explain why. Interesting. I suspect this is correct now, but would not have been before commit 3452dc5240da43e833118484e1e9b4894d04431c. AFAICS, this doesn't affect any execution-time behavior, just the cost estimate. And, prior to that commit, the execution-time behavior was different: there would not have been any way for the worker to do a top-N sort, because the LIMIT was not pushed through the Gather. Does that sound right, or am I still confused? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: portal pinning
On 12/12/17 10:34, Peter Eisentraut wrote: > But I also wonder whether we shouldn't automatically pin/unpin portals > in SPI_cursor_open() and SPI_cursor_close(). This makes sense if you > consider "pinned" to mean "internally generated". I don't think there > is a scenario in which user code should directly operate on a portal > created by SPI. Here is a patch for this option. The above sentence was not quite correct. Only unnamed portals should be pinned automatically. Named portals are of course possible to be passed around as refcursors for example. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From f9aaa47cf4e46aac973e532a790ac2099d017523 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 15 Dec 2017 15:24:10 -0500 Subject: [PATCH] Move portal pinning from PL/pgSQL to SPI PL/pgSQL "pins" internally generated (unnamed) portals so that user code cannot close them by guessing their names. This logic is also useful in other languages and really for any code. So move that logic into SPI. An unnamed portal obtained through SPI_cursor_open() and related functions is now automatically pinned, and SPI_cursor_close() automatically unpins a portal that is pinned. --- src/backend/executor/spi.c | 9 + src/pl/plpgsql/src/pl_exec.c | 8 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index f3da2ddd08..1f0a07ce0b 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -1175,6 +1175,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan, { /* Use a random nonconflicting name */ portal = CreateNewPortal(); + + /* +* Make sure the portal doesn't get closed by the user statements we +* execute. +*/ + PinPortal(portal); } else { @@ -1413,6 +1419,9 @@ SPI_cursor_close(Portal portal) if (!PortalIsValid(portal)) elog(ERROR, "invalid portal in SPI cursor operation"); + if (portal->portalPinned) + UnpinPortal(portal); + PortalDrop(portal, false); } diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index fa4d573e50..243396abd1 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -5330,12 +5330,6 @@ exec_for_query(PLpgSQL_execstate *estate, PLpgSQL_stmt_forq *stmt, /* Fetch loop variable's datum entry */ var = (PLpgSQL_variable *) estate->datums[stmt->var->dno]; - /* -* Make sure the portal doesn't get closed by the user statements we -* execute. -*/ - PinPortal(portal); - /* * Fetch the initial tuple(s). If prefetching is allowed then we grab a * few more rows to avoid multiple trips through executor startup @@ -5450,8 +5444,6 @@ exec_for_query(PLpgSQL_execstate *estate, PLpgSQL_stmt_forq *stmt, */ SPI_freetuptable(tuptab); - UnpinPortal(portal); - /* * Set the FOUND variable to indicate the result of executing the loop * (namely, whether we looped one or more times). This must be set last so -- 2.15.1
Re: Bug: Ambiguous Column Reference Allowed When Joining to pg_roles.oid
Matthew Kelly writes:
> I recently fell afoul of a weird edge case while writing an extension.
> It seems Postgres allows for an ambiguous column reference to oid in the
> where clause when joining to pg_roles. It just arbitrarily chooses
> pg_roles.oid and ignores the conflicting name. Example:

> postgres=# CREATE TABLE t_demo();
> CREATE TABLE
> postgres=# SELECT r.rolname FROM pg_class c JOIN pg_roles r ON (c.relowner = r.oid) WHERE oid = 't_demo'::regclass;
>  rolname
> ---------
> (0 rows)

I do not think that's a bug exactly. There's only one column named "oid"
exposed by the join, and once you're above the join it hides the
column(s) supplied by the input relations --- were that not so, you
could never reference a join output column without qualifying it.

If you try this with just regular OID columns, you get

regression=# create table t1 (f1 int);
CREATE TABLE
regression=# create table t2 (f2 int) with oids;
CREATE TABLE
regression=# select * from t1 join t2 on (f1=f2) where oid = 42;
ERROR:  column "oid" does not exist
LINE 1: select * from t1 join t2 on (f1=f2) where oid = 42;
                                                  ^
HINT:  There is a column named "oid" in table "t2", but it cannot be referenced from this part of the query.

which indicates that you have to qualify the table's column if you want
to reference it above the join. But if there's a matching user-defined
column in the join output then that doesn't happen.

It definitely is a bit unfortunate that the pg_roles view exposes a
user-defined column named "oid", but we felt we had to do that to avoid
breaking user queries that date from when pg_roles was a plain table.

regards, tom lane
Re: [HACKERS] Proposal: Local indexes for partitioned table
Hmm, so I'm now unsure what the actual proposals for handling pg_dump
are. We seem to have the following three proposals:

1. Alvaro's: use CREATE INDEX ON ONLY <parent> (not recursive), followed
   by CREATE INDEX ON <partition>, followed by ALTER INDEX <parent index>
   ATTACH PARTITION <partition index>. I provide an ALTER INDEX DETACH
   PARTITION for symmetry and because it can be used to replace the index.
   Pros: the database is always restored identically to what was in the
   original.
   Con: The index hierarchy might be "partial", that is, lack a component
   index on some partition.

2. David's: use CREATE INDEX ON <parent>, followed by CREATE INDEX ON
   <partition>. This will use the matching mechanism to automatically
   attach the index on partition to index on parent. If some partition
   lacks a matching index, one is created automatically by the creation
   on parent. If you want to replace the index on a partition, use a new
   (as yet unimplemented) ALTER INDEX REPLACE. No need to add ONLY to the
   table name in CREATE INDEX, since the command always recurses. (This
   seems good to me, because I
   Pro: the index is never "partial" (missing a partition).
   Con: the matching mechanism might choose a different index on restore
   than what was selected in the database being dumped.

3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
   index on parent only (no recursion), followed by CREATE INDEX ON
   <partition>. DETACH is not provided. If you ATTACH an index for a
   partition that already has one index attached, then (1) the newly
   attached one replaces the original (i.e. effectively REPLACE) or (2)
   you get an error and we implement a separate ALTER INDEX REPLACE
   command. It's not clear to me how or when the shell index becomes a
   real index.

Robert, can you please clarify the terms of your proposal? How is it
better than mine? Is David's concern about a "partial" index (i.e. an
index that doesn't exist in some partition) solved by it?

I have code for proposals 1 and 2. I don't like proposal 2, and David &
Ashutosh don't like (1). Maybe if we all understand (3) we can agree on
using that one?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Top-N sorts verses parallelism
On Sat, Dec 16, 2017 at 9:13 AM, Robert Haas wrote: > On Fri, Dec 15, 2017 at 2:10 PM, Jeff Janes wrote: >> I had hit on the same change. And was also surprised that it was located >> where it was. With the change, it uses the parallel plan all the way down >> to LIMIT 1. >> >> With the patch, it still satisfies make check, so if it introduces errors >> they are subtle ones. If we can't actually do this and it needs to stay -1, >> then I think we need a comment to explain why. > > Interesting. I suspect this is correct now, but would not have been > before commit 3452dc5240da43e833118484e1e9b4894d04431c. AFAICS, this > doesn't affect any execution-time behavior, just the cost estimate. > And, prior to that commit, the execution-time behavior was different: > there would not have been any way for the worker to do a top-N sort, > because the LIMIT was not pushed through the Gather. > > Does that sound right, or am I still confused? Looks right to me. Commit 3452dc52 just forgot to tell the planner. I'm pleased about that because it makes this a slam-dunk bug-fix and not some confusing hard to justify costing problem. BTW with this patch, the same test using a smaller foo table with 250k rows limit 1 runs a parallel plan in 45ms here, but 220k rows limit 1 runs a non-parallel plan in 80ms, but that's just a regular costing problem where the point at which the curves cross over will be hard to get right due to all kinds of other variables, so I guess that kind of thing is expected. PS Oops, I do actually know how to spell "versus". Typos are so much more embarrassing in subject lines! -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] UPDATE of partition key
On Fri, Dec 15, 2017 at 7:58 AM, Robert Haas wrote: > Reviewing the preparatory patch: I started another review pass over the main patch, so here are some comments about that. This is unfortunately not a complete review, however. - map = ptr->partition_tupconv_maps[leaf_part_index]; + map = ptr->parentchild_tupconv_maps[leaf_part_index]; I don't think there's any reason to rename this. In previous patch versions, you had multiple arrays of tuple conversion maps in this structure, but the refactoring eliminated that. Likewise, I'm not sure I get the point of mt_transition_tupconv_maps -> mt_childparent_tupconv_maps. That seems like it could similarly be left alone. + /* + * If transition tables are the only reason we're here, return. As + * mentioned above, we can also be here during update tuple routing in + * presence of transition tables, in which case this function is called + * separately for oldtup and newtup, so either can be NULL, not both. + */ if (trigdesc == NULL || (event == TRIGGER_EVENT_DELETE && !trigdesc->trig_delete_after_row) || (event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) || - (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row)) + (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row) || + (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL I guess this is correct, but it seems awfully fragile. Can't we have some more explicit signaling about whether we're only here for transition tables, rather than deducing it based on exactly one of oldtup and newtup being NULL? + /* Initialization specific to update */ + if (mtstate && mtstate->operation == CMD_UPDATE) + { + ModifyTable *node = (ModifyTable *) mtstate->ps.plan; + + is_update = true; + update_rri = mtstate->resultRelInfo; + num_update_rri = list_length(node->plans); + } I guess I don't see why we need a separate "if" block for this. Neither is_update nor update_rri nor num_update_rri are used until we get to the block that begins with "if (is_update)". Why not just change that block to test if (mtstate && mtstate->operation == CMD_UPDATE)" and put the rest of these initializations inside that block? + int num_update_rri = 0, + update_rri_index = 0; ... + update_rri_index = 0; It's already 0. + leaf_part_rri = &update_rri[update_rri_index]; ... + leaf_part_rri = leaf_part_arr + i; These are doing the same kind of thing, but using different styles. I prefer the former style, so I'd change the second one to &leaf_part_arr[i]. Alternatively, you could change the first one to update_rri + update_rri_indx. But it's strange to see the same variable initialized in two different ways just a few lines apart. + if (!partrel) + { + /* + * We locked all the partitions above including the leaf + * partitions. Note that each of the newly opened relations in + * *partitions are eventually closed by the caller. + */ + partrel = heap_open(leaf_oid, NoLock); + InitResultRelInfo(leaf_part_rri, + partrel, + resultRTindex, + rel, + estate->es_instrument); + } Hmm, isn't there a problem here? Before, we opened all the relations here and the caller closed them all. But now, we're only opening some of them. If the caller closes them all, then they will be closing some that we opened and some that we didn't. That seems quite bad, because the reference counts that are incremented and decremented by opening and closing should all end up at 0. 
Maybe I'm confused because it seems like this would break in any
scenario where even 1 relation was already opened, and surely you must
have tested that case... but if there's some reason this works, I don't
know what it is, and the comment doesn't tell me.

+static HeapTuple
+ConvertPartitionTupleSlot(ModifyTableState *mtstate,
+                          TupleConversionMap *map,
+                          HeapTuple tuple,
+                          TupleTableSlot *new_slot,
+                          TupleTableSlot **p_my_slot)

This function doesn't use the mtstate argument at all.

+ * (Similarly we need to add the deleted row in OLD TABLE). We need to do

The period should be before, not after, the closing parenthesis.

+ * Now that we have already captured NEW TABLE row, any AR INSERT
+ * trigger should not again capture it below. Arrange for the same.

A more American style would be something like "We've already captured
the NEW TABLE row, so make sure any AR INSERT trigger fired below
doesn't capture it again." (Similarly for the other case.)

+ /* The delete has actually happened, so inform that to the caller */
+ if (tuple_deleted)
+     *tuple_deleted = true;

In the US, we inform the caller, not inform that to the caller. In
other words, here the direct object of "inform" is the person or thing
getting the information (in this case, "the caller"), not the
information being conveyed (in this case, "that"). I realize your usage
is probably typical for your country...

+ Assert(mtstate->mt_is_tupconv_perpart == true);

We usually just Assert(thing_that_should_be_true), not
Assert(thing_that_should_be_true == true).
Re: [HACKERS] pgbench more operators & functions
Hello Teodor, It may be good for 't' of 'f' but it seems too free grammar to accept 'tr' or 'fa' or even 'o' which actually not known to be on or off. Yes, it really works like that. I tried to make something clearer than "src/bin/psql/variable.c". Maybe I did not succeed. Ok, I see. Current coding accepts truexxx, falsexxx, yesxx, noxxx but doesn't accept offxxx and onxxx. Not so consistent as it could be. I've checked, but truexxx is not accepted as true. I have added a test case which fails on "malformed variable", i.e. it went up to scanning a double. When comparing ("truexxx", "true", 7) the fifth char is different, so it is != 0. Or I'm missing something. Also it doesn't accept 1 and 0 as psql does, but it's obviously why. Yep. I have added a comment that it will be an int, and if a boolean is needed it would work as expected. Sorry, but I found more notices: 1) '~' and unary '-' should be commented, it's not so easy to guess about how they actually implemented (-1 XOR value, remember that -1 is 0xf) Ok, I agree that it looks strange. I have added comments for both. I have replaced -1 by 0x so that the code is hopefully clearer. 2) - | expr '%' expr { $$ = make_op(yyscanner, "%", $1, $3); } + | expr '%' expr { $$ = make_op(yyscanner, "mod", $1, $3); } why is MOD operation renamed? Do I miss something in thread? Because I have added MOD as an explicit function to match SQL, so now % is just a shorthand for calling it. Before the patch there was only the '%' operator. Changing the name is enough for the function to be found. pg> SELECT 11 % 3, MOD(11, 3); 2 | 2 Looking to psql and pgbench scripting implementation, isn't it better to integrate lua in psql & pgbench? Hmmm... if it starts on this slope, everyone will have its opinion (lua, tcl, python, ruby, perl, insert-script-name-here...) and it must interact with SQL, I'm not sure how to embed SQL & another language cleanly. So the idea is just to extend backslash command capabilities of psql & pgbench, preferably consistently, when need (i.e. use cases) arises. Attached a new version with these changes. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 4431fc3..ea8f305 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -904,14 +904,32 @@ pgbench options d Sets variable varname to a value calculated from expression. - The expression may contain integer constants such as 5432, + The expression may contain the NULL constant, + boolean constants TRUE and FALSE, + integer constants such as 5432, double constants such as 3.14159, references to variables :variablename, - unary operators (+, -) and binary operators - (+, -, *, /, - %) with their usual precedence and associativity, - function calls, and - parentheses. + operators + with their usual SQL precedence and associativity, + function calls, + SQL CASE generic conditional + expressions and parentheses. + + + + Functions and most operators return NULL on + NULL input. + + + + For conditional purposes, non zero numerical values are + TRUE, zero numerical values and NULL + are FALSE. + + + + When no final ELSE clause is provided to a + CASE, the default value is NULL. 
@@ -920,6 +938,7 @@ pgbench options d \set ntellers 10 * :scale \set aid (1021 * random(1, 10 * :scale)) % \ (10 * :scale) + 1 +\set divx CASE WHEN :x <> 0 THEN :y/:x ELSE NULL END @@ -996,6 +1015,177 @@ pgbench options d + + Built-In Operators + + + The arithmetic, bitwise, comparison and logical operators listed in +are built into pgbench + and may be used in expressions appearing in + \set. + + + + pgbench Operators by increasing precedence + + + + Operator + Description + Example + Result + + + + + OR + logical or + 5 or 0 + TRUE + + + AND + logical and + 3 and 0 + FALSE + + + NOT + logical not + not false + TRUE + + + IS [NOT] (NULL|TRUE|FALSE) + value tests + 1 is null + FALSE + + + ISNULL|NOTNULL + null tests + 1 notnull + TRUE + + + = + is equal + 5 = 4 + FALSE + + + <> + is not equal + 5 <> 4 + TRUE + + + != + is not equal + 5 != 5 + FALSE + + + < + lower than + 5 < 4 + FALSE + + + <= + lower or equal + 5 <= 4 + FALSE + + + > + greater than + 5 > 4 + TRUE + + + >= + greater or equal + 5 >= 4 + TRUE + + + | + in
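To illustrate, a short script exercising the documented operators might
look like the following (the file name and values are mine, and this
assumes the patch above is applied; run it with something like
"pgbench -f ops.sql"):

\set x random(-5, 5)
\set divx CASE WHEN :x <> 0 THEN 100 / :x ELSE NULL END
\set flag :x > 0 AND NOT (:x >= 4)
SELECT :divx, :flag;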
Re: [HACKERS] Proposal: Local indexes for partitioned table
Alvaro Herrera wrote:
> 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
>    index on parent only (no recursion), followed by CREATE INDEX ON
>    <partition>. DETACH is not provided. If you ATTACH an index for a
>    partition that already has one index attached, then (1) the newly
>    attached one replaces the original (i.e. effectively REPLACE) or (2)
>    you get an error and we implement a separate ALTER INDEX REPLACE
>    command. It's not clear to me how or when the shell index becomes a
>    real index.

As I understand it, the whole purpose of this design is that there is no
point during the restore at which the index lacks indexes on partitions:
it is either complete, or it doesn't exist yet. If we create the index
on parent first, and later the indexes on partitions, that condition is
not satisfied.

To solve this, we could create the children indexes first and then the
parent; but how do we indicate to the parent creation which are the
indexes that oughta be marked as children? ALTER INDEX ATTACH PARTITION
doesn't cut it, because it occurs after the index on parent is created,
which is too late. We would do something like

CREATE INDEX ON parent (columns) [other stuff]
  ATTACH idx_on_child1, idx_on_child2, idx_on_child3;

but this seems mighty ugly. Anyway if we do that, what is the point of
ALTER INDEX ATTACH PARTITION? We might as well leave it out and just
keep the door open for a later ALTER INDEX REPLACE.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Fri, Dec 15, 2017 at 4:02 PM, Alvaro Herrera wrote:
> 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
>    index on parent only (no recursion), followed by CREATE INDEX ON
>    <partition>. DETACH is not provided. If you ATTACH an index for a
>    partition that already has one index attached, then (1) the newly
>    attached one replaces the original (i.e. effectively REPLACE) or (2)
>    you get an error and we implement a separate ALTER INDEX REPLACE
>    command. It's not clear to me how or when the shell index becomes a
>    real index.

With this proposal, I think the index can be invalid initially, but once
you've attached an index for every child partition, it becomes
irrevocably valid. After that, the only supported operation is REPLACE,
which preserves validity.

> Robert, can you please clarify the terms of your proposal? How is it
> better than mine? Is David's concern about a "partial" index (i.e. an
> index that doesn't exist in some partition) solved by it?

I think the perceived advantage is that, once valid, the index can't
subsequently become not-valid. That seemed to be David's big concern
(which is not without foundation).

> I have code for proposals 1 and 2. I don't like proposal 2, and David &
> Ashutosh don't like (1). Maybe if we all understand (3) we can agree on
> using that one?

Yes, it would be nice to achieve some sort of consensus, and I think (3)
gives everyone a little of what they want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
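To make that concrete, under proposal 3 a dump might emit a sequence
like the following (the syntax is the proposed form discussed above,
nothing committed; object names are illustrative):

CREATE INDEX p_idx ON ONLY p (a);            -- shell index, initially not valid
CREATE INDEX p1_idx ON p1 (a);
ALTER INDEX p_idx ATTACH PARTITION p1_idx;
CREATE INDEX p2_idx ON p2 (a);
ALTER INDEX p_idx ATTACH PARTITION p2_idx;   -- last partition attached: p_idx becomes valid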
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas wrote: > On Thu, Nov 30, 2017 at 7:02 AM, Alvaro Herrera > wrote: > > Great question. So you're thinking that the planner might have an > > interest in knowing what indexes are defined at the parent table level > > for planning purposes; but for that to actually have any effect we would > > need to change the planner and executor also. And one more point, also > > related to something you said before: we currently (I mean after my > > patch) don't mark partitioned-table-level indexes as valid or not valid > > depending on whether all its children exist, so trying to use that in > > the planner without having a flag could cause invalid plans to be > > generated (i.e. ones that would cause nonexistent indexes to be > > referenced). > > Did you do it this way due to locking concerns? No -- just because since the index-on-parent is a different kind of object (RELKIND_PARTITIONED_INDEX) it is not considered for anything in the planner anyway, so there's no need for indisvalid to be "correct". Changing the flag in the parent index only needs to examine state on the children, not modify them, so I don't think there would be any serious locking problem. By the way, my implementation of ALTER INDEX ATTACH PARTITION was prone to deadlocks because it needs locks on parent table, index-on-parent, partition, and index-on-partition; and there was no consideration to the ordering in which these were being acquired. I wrote a callback for RangeVarGetRelidExtended to be called in ATExecAttachPartitionIdx() that tries to acquire locks in the right order -- but I recently realized that even that is not sufficient, because (unless I misread it) ALTER INDEX itself does not worry about locking the containing table before the index. I think some tweaking needs to be done in RangeVarCallbackForAlterRelation to lock the table if an index is being altered (conditionally, depending on the precise ALTER TABLE operation). It surprises me that we don't need to do that yet, but I haven't looked into it any further. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: Local indexes for partitioned table
Robert Haas wrote:
> On Fri, Dec 15, 2017 at 4:02 PM, Alvaro Herrera wrote:
> > 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
> >    index on parent only (no recursion), followed by CREATE INDEX ON
> >    <partition>. DETACH is not provided. If you ATTACH an index for a
> >    partition that already has one index attached, then (1) the newly
> >    attached one replaces the original (i.e. effectively REPLACE) or (2)
> >    you get an error and we implement a separate ALTER INDEX REPLACE
> >    command. It's not clear to me how or when the shell index becomes a
> >    real index.
>
> With this proposal, I think the index can be invalid initially, but
> once you've attached an index for every child partition, it becomes
> irrevocably valid. After that, the only supported operation is
> REPLACE, which preserves validity.

Sounds okay to me. (I had already deleted ALTER INDEX DETACH from my
patch earlier today.) I admit I had also deleted the ONLY clause from
CREATE INDEX, and I don't like having to put it back :-) But on the
whole, having it sounds better than the alternatives.

We have two options for marking valid:

1. after each ALTER INDEX ATTACH, verify whether the set of partitions
   that contain the index is complete; if so, mark it valid, otherwise
   do nothing. This sucks because we have to check that over and over
   for every index that we attach

2. We invent yet another command, say ALTER INDEX VALIDATE

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
"failed to find parent tuple for heap-only tuple" error as an ERRCODE_DATA_CORRUPTION ereport()
Commit d70cf811, from 2014, promoted an Assert() within IndexBuildHeapScan() to a "can't happen" elog() error, in order to detect when a parent tuple cannot be found for some heap-only tuple -- if this happens, then it indicates corruption. I think that we should make it a full ereport(), with an errcode of ERRCODE_DATA_CORRUPTION, to match what Andres just added to code that deals with freezing (he promoted Assert()s to errors, just like the 2014 commit, though he went as far as making them ereport()s to begin with). Attached patch does this. I propose a backpatch to 9.3, partially for the sake of tools like amcheck, where users may only be on the lookout for ERRCODE_DATA_CORRUPTION and ERRCODE_INDEX_CORRUPTED. FWIW, an old MultiXact/recovery bug, alluded to by the commit message of d70cf811 [1] (and fixed by 6bfa88acd) was the cause of some déjà vu for me while looking into the "freeze the dead" issues. Because the enhanced amcheck [2] actually raised this error when I went to verify the first "freeze the dead" bugfix [3], it's clearly effective as a test for certain types of corruption. If CREATE INDEX/IndexBuildHeapScan() didn't already perform this check, then it would probably be necessary for amcheck to implement it on its own. What heap_get_root_tuples() does for us here is ideally suited to finding inconsistencies in HOT chains, because it matches xmin against xmax, looks at line pointer bits/redirects, and consults pg_multixact if necessary. The only thing that it *doesn't* do is make sure that hint bits accurately reflect what it says in the CLOG -- we'll need to find another way to do that, by directly targeting heap relations with their own function. In short, it does an awful lot for tools like amcheck, and I want to make sure that we get the full benefit of that. [1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=o67w0csswpvv7xfuc...@mail.gmail.com [2] https://github.com/petergeoghegan/amcheck#optional-heapallindexed-verification [3] https://postsgr.es/m/cah2-wznm4rcrhfaiwkpwtpew2bxdtgrozk7jwwgucxeh3d1...@mail.gmail.com -- Peter Geoghegan From 0a37ceea53736e39d0f1af72ec9bc07d98833370 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Fri, 15 Dec 2017 14:05:16 -0800 Subject: [PATCH] Promote "HOT parent tuple" elog to an ereport. Commit d70cf811, from 2014, promoted an assert within IndexBuildHeapScan() (and another similar assert) to a "can't happen" elog() error, in order to detect when a parent tuple cannot be found for some heap-only tuple during CREATE INDEX/REINDEX. When the error is raised, it indicates heap corruption. This has proven useful as a corruption smoke-test while investigating recent corruption-inducing bugs in pruning/freezing. This commit promotes those elog() errors to ereport() errors. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superfluous translations, mark messages as internal. This is done primarily for the benefit of tools like amcheck, that may want to piggy-back on IndexBuildHeapScan() to perform integrity checking. It's worth spelling out the nature of the problem for users of these tools. 
Author: Peter Geoghegan Reviewed-By: Andres Freund Backpatch: 9.3- --- src/backend/catalog/index.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 0125c18..e43482d 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2595,9 +2595,12 @@ IndexBuildHeapRangeScan(Relation heapRelation, offnum = ItemPointerGetOffsetNumber(&heapTuple->t_self); if (!OffsetNumberIsValid(root_offsets[offnum - 1])) -elog(ERROR, "failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", - ItemPointerGetBlockNumber(&heapTuple->t_self), - offnum, RelationGetRelationName(heapRelation)); +ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", + ItemPointerGetBlockNumber(&heapTuple->t_self), + offnum, + RelationGetRelationName(heapRelation; ItemPointerSetOffsetNumber(&rootTuple.t_self, root_offsets[offnum - 1]); @@ -3060,10 +3063,12 @@ validate_index_heapscan(Relation heapRelation, { root_offnum = root_offsets[root_offnum - 1]; if (!OffsetNumberIsValid(root_offnum)) -elog(ERROR, "failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", - ItemPointerGetBlockNumber(heapcursor), - ItemPointerGetOffsetNumber(heapcursor), - RelationGetRelationName(heapRelation)); +ereport(ERROR, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"", + ItemPointerGetBlockNumber(heapcursor), + ItemPointerGet
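For anyone wanting to exercise this code path, the verification
mentioned in [2] is invoked roughly as follows (the two-argument form of
bt_index_check() is from that enhanced external amcheck, and the index
name is illustrative; the second argument enables the heapallindexed
check, which piggy-backs on IndexBuildHeapScan()):

CREATE EXTENSION amcheck;
SELECT bt_index_check('pgbench_accounts_pkey'::regclass, true);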
Re: [HACKERS] replace GrantObjectType with ObjectType
On Sat, Dec 16, 2017 at 2:39 AM, Peter Eisentraut wrote: > On 12/13/17 02:35, Michael Paquier wrote: >> Patch 0001 is simply removing EventTriggerSupportsGrantObjectType(), >> but shouldn't we keep it and return an error for objects that have no >> GRANT support? Returning conditionally true looks like a trap waiting >> to take someone in. > > I don't understand the motivation for this. It would just be two lists > for the same thing. Not really. What grant supports is a subset of what event triggers do. > I think the potential for omission would be much greater that way. That's the whole point of not having "default" in the switches, no? Any object additions would be caught at compilation time. -- Michael
Re: genomic locus
On Sat, Dec 16, 2017 at 4:49 AM, Gene Selkov wrote:
> I noticed somebody took good care of seg while I was away for the last
> 20 years, and I am extremely grateful for that. I have been using it. In
> the meantime, things have changed, and now I am almost clueless about how
> you deal with contributed modules and what steps I should take once I get
> it to work. Also, is it a good idea to clone and fix seg for this purpose,
> or is there a more appropriate template? Or maybe just building it from
> scratch will be a better idea?

Wah. You are the author of the seg module, and your name is listed in a
set of commits from 2000:
https://www.postgresql.org/docs/devel/static/seg.html

seg is now shaped as what is called an extension (see
https://www.postgresql.org/docs/9.6/static/external-extensions.html),
which is, roughly, a library plus a set of SQL commands that the server
can load automatically after enabling them with the command CREATE
EXTENSION. If you wish to fix seg in some way, you could always patch
it. But I am not sure what you are trying to fix, so more details would
be welcome.

> I have seen a lot of bit rot in other extensions (never contributed)
> that I have not maintained since 2009, and I am now unable to fix some of
> them, so I wonder how much of the old knowledge is still applicable. In
> other words, is what I see in new code just a change of macros, or a
> change of principles?

APIs in Postgres are usually stable. You should be able to update your
own extensions. If you want to discuss a couple of things in particular,
don't hesitate!

--
Michael
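For a rough idea of the shape of a modern extension, the SQL-script side
looks something like this (all names here are illustrative, patterned on
seg; a matching genomic_locus.control file and a Makefile using PGXS
complete the picture):

-- genomic_locus--1.0.sql
CREATE TYPE locus;   -- shell type, so the I/O functions can reference it

CREATE FUNCTION locus_in(cstring) RETURNS locus
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION locus_out(locus) RETURNS cstring
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE locus (
    INPUT = locus_in,
    OUTPUT = locus_out,
    INTERNALLENGTH = VARIABLE
);

-- then, in any database:
CREATE EXTENSION genomic_locus;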
Re: Reproducible builds: genbki.pl vs schemapg.h
On Sat, Dec 16, 2017 at 3:13 AM, Christoph Berg wrote:
> Re: Tom Lane 2017-12-15 <9616.1513351...@sss.pgh.pa.us>
>> Christoph Berg writes:
>> > Debian's reproducible builds project has revealed that the full build
>> > path gets embedded into server/catalog/schemapg.h:
>>
>> genbki.pl is hardly our only script that prints its $0 ...
>
> As per
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
> that's the only place that makes it into the resulting binary.
> I wouldn't be sending a patch if it didn't fix the issue.

Why not fix that? Reproducible builds are a trend these days, and what's
proposed here is a really simple way to make PG more compliant with this
purpose in mind.

--
Michael
Re: Reproducible builds: genbki.pl vs schemapg.h
On 2017-12-16 07:52:41 +0900, Michael Paquier wrote: > On Sat, Dec 16, 2017 at 3:13 AM, Christoph Berg > wrote: > > Re: Tom Lane 2017-12-15 <9616.1513351...@sss.pgh.pa.us> > >> Christoph Berg writes: > >> > Debian's reproducible builds project has revealed that the full build > >> > path gets embedded into server/catalog/schemapg.h: > >> > >> genbki.pl is hardly our only script that prints its $0 ... > > > > As per > > https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html, > > that's the only place that makes it into the resulting binary. > > I wouldn't be sending a patch if it didn't fix the issue. > > Why not fixing that? Reproducible builds are a trend of these days, > and what's proposed here is really simple to make PG more compliant > with this purpose in mind. It's not like $0 instead of a hardcoded name in the header actually buys us anything afaict. Greetings, Andres Freund
Re: Reproducible builds: genbki.pl vs schemapg.h
On Fri, Dec 15, 2017 at 3:21 PM, Andres Freund wrote: >> Why not fixing that? Reproducible builds are a trend of these days, >> and what's proposed here is really simple to make PG more compliant >> with this purpose in mind. > > It's not like $0 instead of a hardcoded name in the header actually buys > us anything afaict. +1. I think that reproducible builds are a worthwhile goal, and I welcome Christoph's continued work on them. -- Peter Geoghegan
Re: Reproducible builds: genbki.pl vs schemapg.h
Andres Freund writes:
> On 2017-12-16 07:52:41 +0900, Michael Paquier wrote:
>> On Sat, Dec 16, 2017 at 3:13 AM, Christoph Berg wrote:
>>> Re: Tom Lane 2017-12-15 <9616.1513351...@sss.pgh.pa.us>
>>>> genbki.pl is hardly our only script that prints its $0 ...
>>> As per
>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/postgresql-10.html,
>>> that's the only place that makes it into the resulting binary.

I'm fairly confused by this claim. Since the string in question is in a
comment, it really shouldn't affect built binaries at all. I can believe
that it would affect the non-binary contents of the finished package,
because we include schemapg.h as one of the installed headers. However,
we also include fmgroids.h, which also has an expanded version of $0 in
it. So if schemapg.h is affected by build path, why isn't fmgroids.h?

In my build, neither one of these files contains any path information;
I speculate that you need to use a VPATH build to have an issue, or
maybe Debian's build environment does something even weirder. But I
feel confident in saying that if indeed fmgroids.h is invariant for you
today, that's a phase-of-the-moon behavior that will break someday if
we don't make a similar change in Gen_fmgrtab.pl. Or else we should
propagate what's preventing it back to genbki.pl.

plperl's text2macro.pl is also emitting $0, and there may be other
places I missed (I grepped for 'DO NOT EDIT', not for $0 per se). We
seem not to install the output file of that one, perlchunks.h, but
that's a decision that somebody might change too.

> It's not like $0 instead of a hardcoded name in the header actually buys
> us anything afaict.

Agreed so far as the script name goes. However, two out of three of
these scripts also print their input file names, and I'm suspicious that
that output is also gonna change in a VPATH build. I'm a little less
inclined to buy the claim that we're not losing anything if we suppress
that :-(

regards, tom lane
Re: Backfill bgworker Extension?
On Tue, Dec 12, 2017 at 2:26 PM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 12/12/17 13:03, Jeremy Finzel wrote:
> > To be clear, what I mean is batch updating a large set of data in small
> > pieces so as to avoid things like lock contention and replication lags.
> > Sometimes these have a driving table that has the source data to update
> > in a destination table based on a key column, but sometimes it is
> > something like setting just a single specific value for a huge table.
> >
> > I would love instead to have a Postgres extension that uses postgres
> > background workers to accomplish this, especially if it were part of
> > core. Before I venture into exploring writing something like this as an
> > extension, would this ever be considered something appropriate as an
> > extension in Postgres core? Would that be appropriate?
>
> I don't see what the common ground between different variants of this
> use case would be. Aren't you basically just looking to execute a
> use-case-specific stored procedure in the background?

The common ground is that some column in some table needs to be bulk
updated. I may not be explaining this well, but in our environment we
have done hundreds of these using a generic framework to build a
backfill, so I'm not sure what you are questioning about the need. We
have had to build a worker to accomplish this because it can't be done
as a SQL script alone.

I'm not sure what you mean by a stored procedure in the background.
Since it would not be a single transaction, it doesn't fit as a stored
procedure, at least in Postgres, where a function runs in a single
transaction. Sorry if I'm misunderstanding.

Thanks,
Jeremy
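For what it's worth, a single batch in the kind of framework being
described usually boils down to a statement like the one below, with the
driver (here, the proposed background worker) advancing the key range
and committing between batches so that each batch is its own short
transaction (table and column names are hypothetical):

UPDATE dest d
   SET val = s.val
  FROM src s
 WHERE d.id = s.id
   AND d.id >= 0 AND d.id < 10000;  -- next batch covers 10000..20000, and so on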
Re: [HACKERS] Runtime Partition Pruning
On 13 December 2017 at 00:33, Beena Emerson wrote:
> PFA the updated patch, this can be applied over the v13 patches [1]
> over commit 487a0c1518af2f3ae2d05b7fd23d636d687f28f3

Hi Beena,

Thanks for posting an updated patch. I've been looking over this and I
think that the use of the PartitionDispatch in
set_append_subplan_indexes is not correct. What we need here is the
index of the Append's subnode, and that's not what
RelationGetPartitionDispatchInfo() gives you. Remember that some
partitions could have been pruned away already during planning.

This quick example shows that the partition selection is not correct:

create table p (a int, b int) partition by range (a);
create table p_a_neg partition of p for values from (minvalue) to (0) partition by range (b);
create table p_a_pos partition of p for values from (0) to (maxvalue) partition by range (b);
create table p_a_neg_b_neg partition of p_a_neg for values from (minvalue) to (0);
create table p_a_neg_b_pos partition of p_a_neg for values from (0) to (maxvalue);
create table p_a_pos_b_neg partition of p_a_pos for values from (minvalue) to (0);
create table p_a_pos_b_pos partition of p_a_pos for values from (0) to (maxvalue);

prepare q1 (int, int) as select * from p where a = $1 and b = $2;

explain analyze execute q1 (-1,-1); -- this works.

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..175.60 rows=4 width=8) (actual time=1.099..1.099 rows=0 loops=1)
   Runtime Partition Pruning: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_neg_b_neg  (cost=0.00..43.90 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_neg_b_pos  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_pos_b_neg  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_pos_b_pos  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
(12 rows)

explain analyze execute q1 (-1,1); -- should scan p_a_neg_b_pos, but does not.

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..175.60 rows=4 width=8) (actual time=758996.359..758996.359 rows=0 loops=1)
   Runtime Partition Pruning: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_neg_b_neg  (cost=0.00..43.90 rows=1 width=8) (actual time=0.056..0.056 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_neg_b_pos  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_pos_b_neg  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on p_a_pos_b_pos  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
(12 rows)

So, I started to look at what the best way to put this right might be.
I see that since Parallel Append was committed, the subnodes are now
sorted in cost order inside create_append_path(), so likely we'll want
to delay figuring out the subpath list indexes until after that's done,
since sorting would scramble our index arrays. We could simply look at
the subpaths at the end of create_append_path() and create some sort of
new matrix type that can accept the output of Amit's
get_partitions_from_clauses() and translate that Bitmapset into the
subpath indexes (another Bitmapset). This will need to work for
sub-partitions too, so this matrix must be some sort of tree that we can
descend into when we see that get_partitions_from_clauses returned a bit
for a sub-partition instead of a leaf-partition.

I bashed this idea around a bit and I came up with the attached.
It's very far from complete and in a very WIP state. I've not really done anything to make the correct clause list available in nodeAppend.c yet, but I think the code that's there is worthy of a look. I've not done that much work on the new choose_next_subplan* functions in nodeAppend.c. I just modified choose_next_subplan_locally to show how this set of functions need to take into account the subnode bitmap set of valid partitions to scan. Perhaps some special case is needed to have these functions ignore the Bitmapset when runtime pruning is disabled (perhaps a completely new set of the functions is needed to support the selection of the next non-pruned partition). Although, probably that can be debated a bit later as it's a fairly minor detail for now. My patch also lacks any means to extract the Params during match_clauses_to_partkey(), or at least most of the cases. I've just added 1 case there. I did this because I thought it was better to extract the ParamIds rather than a bool to
Re: [sqlsmith] Parallel worker executor crash on master
On Sat, Dec 16, 2017 at 12:57 AM, Andreas Seltenreich wrote:
> Hi,
>
> sqlsmith just crashed a parallel worker while testing master at
> 699bf7d05c. I can reproduce it with the following recipe on a fresh
> regression database. Backtrace and query plan below as well.

This seems to be another symptom of the problem related to es_query_dsa
for which Thomas has sent a patch on a different thread [1]. After
applying that patch, I am not able to see the problem. I think that,
due to the wrong usage of dsa across nodes, it can lead to sending some
wrong values for params to workers.

[1] - https://www.postgresql.org/message-id/CAEepm%3D0Mv9BigJPpribGQhnHqVGYo2%2BkmzekGUVJJc9Y_ZVaYA%40mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com