Obsolete comment in partbounds.c
While reviewing the partitionwise-join patch, I noticed $Subject, i.e., this in create_list_bounds(): /* * Never put a null into the values array, flag instead for * the code further down below where we construct the actual * relcache struct. */ if (null_index != -1) elog(ERROR, "found null more than once"); null_index = i; "the code further down below where we construct the actual relcache struct" isn't in the same file anymore due to the refactoring in commit b52b7dc25. How about modifying it like the attached? Best regards, Etsuro Fujita fix-obsolete-comment.patch Description: Binary data
Re: generating catcache control data
On Fri, Oct 11, 2019 at 3:14 AM Tom Lane wrote: > I do not like attaching this data to the DECLARE_UNIQUE_INDEX macros. > It's really no business of the indexes' whether they are associated > with a syscache. It's *certainly* no business of theirs how many > buckets such a cache should start off with. > > I'd be inclined to make a separate file that's specifically concerned > with declaring syscaches, and put all the required data there. That gave me another idea that would further reduce the bookkeeping involved in creating new syscaches -- put declarations in the cache id enum (syscache.h), like this: #define DECLARE_SYSCACHE(cacheid,indexname,indexoid,numbuckets) cacheid enum SysCacheIdentifier { DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, AggregateFnoidIndexId, 16) = 0, ... }; > > Relname, and relisshared are straightforward. For eq/hash functions, > > we could add metadata attributes to pg_type.dat for the relevant > > types. Tuple descriptors would get their attrs from schemapg.h. > > I don't see a need to hard-wire more information than we do today, and > I'd prefer not to because it adds to the burden of creating new syscaches. > Assuming that the plan is for genbki.pl or some similar script to generate > the constants, it could look up all the appropriate data from the initial > contents for pg_opclass and friends. That is, basically what we want here > is for a constant-creation script to perform the same lookups that're now > done during backend startup. Looking at it again, the eq/hash functions are local and looked up via GetCCHashEqFuncs(), but the runtime of that is surely miniscule, so I left it alone. > > Is this something worth doing? > > Hard to tell. It'd take a few cycles out of backend startup, which > seems like a worthy goal; but I don't know if it'd save enough to be > worth the trouble. Probably can't tell for sure without doing most > of the work :-(. I went ahead and did just enough to remove the relation-opening code. Looking in the archives, I found this as a quick test: echo '\set x 1' > x.txt ./inst/bin/pgbench -n -C -c 1 -f x.txt -T 10 Typical numbers: master: number of transactions actually processed: 4276 latency average = 2.339 ms tps = 427.549137 (including connections establishing) tps = 24082.726350 (excluding connections establishing) patch: number of transactions actually processed: 4436 latency average = 2.255 ms tps = 443.492369 (including connections establishing) tps = 21817.308410 (excluding connections establishing) ...which amounts to nearly 4% improvement in the first tps number, which isn't earth-shattering, but it's something. Opinions? It wouldn't be a lot of additional work to put together a WIP patch. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
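For reference, the general pattern being proposed here — one central list of DECLARE_SYSCACHE() entries that gets expanded in more than one way — can be sketched in a few lines of standalone C. The names and fields below are illustrative only (the macro is simplified to three arguments) and do not match the real syscache.h:

```c
#include <stdio.h>

/*
 * Illustrative sketch only -- not the actual PostgreSQL macros.  One
 * central list of cache declarations is expanded in different ways by
 * redefining the macro before each expansion.
 */
#define SYSCACHE_LIST \
	DECLARE_SYSCACHE(AGGFNOID, "pg_aggregate_fnoid_index", 16) \
	DECLARE_SYSCACHE(AMNAME, "pg_am_name_index", 4)

/* First expansion: build the cache id enum. */
#define DECLARE_SYSCACHE(cacheid, indexname, numbuckets) cacheid,
enum SysCacheIdentifier { SYSCACHE_LIST NUM_SYSCACHES };
#undef DECLARE_SYSCACHE

/* Second expansion: build a parallel table of control data. */
typedef struct { const char *indexname; int numbuckets; } CacheDesc;

#define DECLARE_SYSCACHE(cacheid, indexname, numbuckets) { indexname, numbuckets },
static const CacheDesc cacheinfo[] = { SYSCACHE_LIST };
#undef DECLARE_SYSCACHE

int
main(void)
{
	printf("%d caches, first index is %s with %d buckets\n",
		   (int) NUM_SYSCACHES, cacheinfo[AGGFNOID].indexname,
		   cacheinfo[AGGFNOID].numbuckets);
	return 0;
}
```

In the actual proposal the second expansion would be done by a generator script (genbki.pl or similar) rather than by the compiler, but the bookkeeping benefit is the same: a new syscache needs exactly one new line in the list.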
Re: UPSERT on view does not find constraint by name
Jeremy Finzel writes: > I'm not sure if this can be considered a bug or not, but it is perhaps > unexpected. I found that when using a view that is simply select * from > table, then doing INSERT ... ON CONFLICT ON CONSTRAINT constraint_name on > that view, it does not find the constraint and errors out. But it does > find the constraint if one lists the columns instead. I'm confused by this report. The view wouldn't have any constraints, and experimenting shows that the parser won't let you name a constraint of the underlying table here. So would you provide a concrete example of what you're talking about? regards, tom lane
Re: [Patch] Base backups and random or zero pageheaders
Hi, Am Samstag, den 04.05.2019, 21:50 +0900 schrieb Michael Paquier: > On Tue, Apr 30, 2019 at 03:07:43PM +0200, Michael Banck wrote: > > This is still an open item for the back branches I guess, i.e. zero page > > header for pg_verify_checksums and additionally random page header for > > pg_basebackup's base backup. > > I may be missing something, but could you add an entry in the future > commit fest about the stuff discussed here? I have not looked at your > patch closely.. Sorry. Here is finally a rebased patch for the (IMO) more important issue in pg_basebackup. I've added a commitfest entry for this now: https://commitfest.postgresql.org/25/2308/ Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutzFrom 084a2fe968a3ee9c0bb4f18fa9fbc8a582f38b3c Mon Sep 17 00:00:00 2001 From: Michael Banck Date: Fri, 18 Oct 2019 10:48:22 +0200 Subject: [PATCH] Fix base backup checksum verification for random or zero page headers We currently do not checksum a page if it is considered new by PageIsNew() or if its pd_lsn page header field is larger than the LSN at the beginning of the base backup. However, this means that if the whole page header is zero, PageIsNew() will consider that page new due to the pd_upper field being zero and no subsequent checksum verification is done. Also, a random value in the pd_lsn field will very likely be larger than the LSN at backup start, again resulting in the page not getting checksummed. To fix, factor out the all-zeroes check from PageIsVerified() into a new method PageIsZero() and call that for pages considered new by PageIsNew(). If a page is all zero, consider that a checksum failure. Also check the pd_lsn field against both the backup start pointer and the current pointer from GetInsertRecPtr(). Add two more tests to the pg_basebackup TAP tests to check those errors. Reported-By: Andres Freund Discussion: https://postgr.es/m/20190319200050.ncuxejradurja...@alap3.anarazel.de --- src/backend/replication/basebackup.c | 159 +++ src/backend/storage/page/bufpage.c | 56 ++ src/bin/pg_basebackup/t/010_pg_basebackup.pl | 30 +++-- src/include/storage/bufpage.h| 1 + 4 files changed, 148 insertions(+), 98 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index d0f210de8c..ee026bc1d0 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1456,87 +1456,28 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf page = buf + BLCKSZ * i; /* - * Only check pages which have not been modified since the - * start of the base backup. Otherwise, they might have been - * written only halfway and the checksum would not be valid. - * However, replaying WAL would reinstate the correct page in - * this case. We also skip completely new pages, since they - * don't have a checksum yet. + * We skip completely new pages after checking they are + * all-zero, since they don't have a checksum yet. 
*/ -if (!PageIsNew(page) && PageGetLSN(page) < startptr) +if (PageIsNew(page)) { - checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE); - phdr = (PageHeader) page; - if (phdr->pd_checksum != checksum) + if (!PageIsZero(page)) { /* - * Retry the block on the first failure. It's - * possible that we read the first 4K page of the - * block just before postgres updated the entire block - * so it ends up looking torn to us. We only need to - * retry once because the LSN should be updated to - * something we can ignore on the next pass. If the - * error happens again then it is a true validation - * failure. + * pd_upper is zero, but the page is not all zero. We + * cannot run pg_checksum_page() on the page as it + * would throw an assertion failure. Consider this a + * checksum failure. */ - if (block_retry == false) - { - /* Reread the failed block */ - if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1) - { -ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not fseek in file \"%s\": %m", -readfilename))); - } - - if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ) - { -/* - * If we hit end-of-file, a concurrent - * truncation must have occurred, so break out - * of this loop just as if the initial fread() - * returne
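The diff above is truncated, but the central idea of the new PageIsZero()-style check described in the commit message is simply an all-zero scan of the page buffer. A minimal standalone sketch of that idea (not the patch's actual code) looks like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BLCKSZ 8192				/* assumed page size for this sketch */

/*
 * Sketch of an all-zeroes check over a page buffer, scanning one size_t
 * word at a time (assumes the buffer is suitably aligned).  The real
 * patch factors an equivalent loop out of PageIsVerified(); this is only
 * meant to illustrate the concept, not to mirror the patch code.
 */
static bool
page_is_zero(const char *page)
{
	const size_t *words = (const size_t *) page;
	size_t		nwords = BLCKSZ / sizeof(size_t);

	for (size_t i = 0; i < nwords; i++)
	{
		if (words[i] != 0)
			return false;
	}
	return true;
}

int
main(void)
{
	static size_t storage[BLCKSZ / sizeof(size_t)];	/* zeroed, aligned */
	char	   *page = (char *) storage;

	printf("zero page: %d\n", page_is_zero(page));
	page[100] = 1;				/* flip one byte, as a torn/garbage header would */
	printf("after corruption: %d\n", page_is_zero(page));
	return 0;
}
```

With such a check in place, a page that PageIsNew() accepts (pd_upper == 0) but that is not actually all zeroes can be reported as a checksum failure instead of being silently skipped.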
Re: Partitioning versus autovacuum
At the risk of forking this thread... I think there's actually a planner estimation bug here too. Consider this test case of a simple partitioned table and a simple join. The cardinality estimates for each partition and the Append node are all perfectly accurate. But the estimate for the join is way off. The corresponding test case without partitioning produces a perfect cardinality estimate for the join. I've never completely wrapped my head around the planner selectivity estimations. IIRC join restrictions are treated differently from single-relation restrictions. Perhaps what's happening here is that the single-relation restrictions are being correctly estimated based on the child partitions but the join restriction code hasn't been taught the same tricks? stark=# create table p (i integer, j integer) partition by list (i); CREATE TABLE stark=# create table p0 partition of p for values in (0); CREATE TABLE stark=# create table p1 partition of p for values in (1); CREATE TABLE stark=# insert into p select 0,generate_series(1,1000); INSERT 0 1000 stark=# insert into p select 1,generate_series(1,1000); INSERT 0 1000 stark=# analyze p0; ANALYZE stark=# analyze p1; ANALYZE stark=# create table q (i integer); CREATE TABLE stark=# insert into q values (0); INSERT 0 1 stark=# analyze q; ANALYZE -- Query partitioned table, get wildly off row estimates for join stark=# explain analyze select * from q join p using (i) where j between 1 and 500; ┌─┐ │ QUERY PLAN │ ├─┤ │ Hash Join (cost=1.02..44.82 rows=5 width=8) (actual time=0.060..1.614 rows=500 loops=1)│ │ Hash Cond: (p0.i = q.i) │ │ -> Append (cost=0.00..40.00 rows=1000 width=8) (actual time=0.030..1.127 rows=1000 loops=1) │ │ -> Seq Scan on p0 (cost=0.00..20.00 rows=500 width=8) (actual time=0.029..0.440 rows=500 loops=1) │ │ Filter: ((j >= 1) AND (j <= 500)) │ │ Rows Removed by Filter: 500 │ │ -> Seq Scan on p1 (cost=0.00..20.00 rows=500 width=8) (actual time=0.018..0.461 rows=500 loops=1) │ │ Filter: ((j >= 1) AND (j <= 500)) │ │ Rows Removed by Filter: 500 │ │ -> Hash (cost=1.01..1.01 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1) │ │ Buckets: 1024 Batches: 1 Memory Usage: 9kB │ │ -> Seq Scan on q (cost=0.00..1.01 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1) │ │ Planning time: 0.713 ms │ │ Execution time: 1.743 ms │ └─┘ (14 rows) -- Query non-partitioned table get accurate row estimates for join stark=# create table pp as (Select * from p); SELECT 2000 stark=# analyze pp; ANALYZE stark=# explain analyze select * from q join pp using (i) where j between 1 and 500; ┌─┐ │ QUERY PLAN │ ├─┤ │ Hash Join (cost=1.02..48.77 rows=500 width=8) (actual time=0.027..0.412 rows=500 loops=1) │ │ Hash Cond: (pp.i = q.i) │ │ -> Seq Scan on pp (cost=0.00..39.00 rows=1000 width=8) (actual time=0.014..0.243 rows=1000 loops=1) │ │ Filter: ((j >= 1) AND (j <= 500)) │ │ Rows Removed by Filter: 1000 │ │ -> Hash (cost=1.01..1.01 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1) │ │ Buckets: 1024 Batches: 1 Memory Usage: 9kB │ │ -> Seq Scan on q (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1) │ │ Planning time: 0.160 ms │ │ Execution time: 0.456 ms │ └─┘ (10 rows)
Re: Obsolete comment in partbounds.c
On 2019-Oct-18, Etsuro Fujita wrote: > While reviewing the partitionwise-join patch, I noticed $Subject,ie, > this in create_list_bounds(): > > /* > * Never put a null into the values array, flag instead for > * the code further down below where we construct the actual > * relcache struct. > */ > if (null_index != -1) > elog(ERROR, "found null more than once"); > null_index = i; > > "the code further down below where we construct the actual relcache > struct" isn't in the same file anymore by refactoring by commit > b52b7dc25. How about modifying it like the attached? Yeah, agreed. Instead of "the null comes from" I would use "the partition that stores nulls". While reviewing your patch I noticed a few places where we use an odd pattern in switches, which can be simplified as shown here. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c index 318d8ecae9..e051094d54 100644 --- a/src/backend/partitioning/partbounds.c +++ b/src/backend/partitioning/partbounds.c @@ -209,13 +209,9 @@ partition_bounds_create(PartitionBoundSpec **boundspecs, int nparts, return create_range_bounds(boundspecs, nparts, key, mapping); default: - elog(ERROR, "unexpected partition strategy: %d", - (int) key->strategy); - break; + Assert(false); + return NULL; /* keep compiler quiet */ } - - Assert(false); - return NULL;/* keep compiler quiet */ } /* @@ -1797,10 +1793,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg) static int get_partition_bound_num_indexes(PartitionBoundInfo bound) { - int num_indexes; - Assert(bound); - switch (bound->strategy) { case PARTITION_STRATEGY_HASH: @@ -1809,24 +1802,20 @@ get_partition_bound_num_indexes(PartitionBoundInfo bound) * The number of the entries in the indexes array is same as the * greatest modulus. */ - num_indexes = get_hash_partition_greatest_modulus(bound); - break; + return get_hash_partition_greatest_modulus(bound); case PARTITION_STRATEGY_LIST: - num_indexes = bound->ndatums; + return bound->ndatums; break; case PARTITION_STRATEGY_RANGE: /* Range partitioned table has an extra index. */ - num_indexes = bound->ndatums + 1; - break; + return bound->ndatums + 1; default: - elog(ERROR, "unexpected partition strategy: %d", - (int) bound->strategy); + Assert(false); + return 0; /* keep compiler quiet */ } - - return num_indexes; } /*
Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay
On Thu, 17 Oct 2019 at 21:19, Alvaro Herrera wrote: > On 2019-Sep-26, Alvaro Herrera wrote: > > > On 2019-Sep-26, Jeff Janes wrote: > > > > Hi Alvaro, does this count as a review? > > > > Well, I'm already a second pair of eyes for Craig's code, so I think it > > does :-) I would have liked confirmation from Craig that my change > > looks okay to him too, but maybe we'll have to go without that. > > There not being a third pair of eyes, I have pushed this. > > Thanks! > > > Thanks. I'm struggling to keep up with my own threads right now... -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
Re: Compressed pluggable storage experiments
Hi, On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote: > On 2019-Oct-10, Ildar Musin wrote: > > > 1. Unlike FDW API, in pluggable storage API there are no routines like > > "begin modify table" and "end modify table" and there is no shared > > state between insert/update/delete calls. > > Hmm. I think adding a begin/end to modifytable is a reasonable thing to > do (it'd be a no-op for heap and zheap I guess). I'm fairly strongly against that. Adding two additional "virtual" function calls for something that's rarely going to be used, seems like adding too much overhead to me. > > 2. It looks like I cannot implement custom storage options. E.g. for > > compressed storage it makes sense to implement different compression > > methods (lz4, zstd etc.) and corresponding options (like compression > > level). But as i can see storage options (like fillfactor etc) are > > hardcoded and are not extensible. Possible solution is to use GUCs > > which would work but is not extremely convinient. > > Yeah, the reloptions module is undergoing some changes. I expect that > there will be a way to extend reloptions from an extension, at the end > of that set of patches. Cool. > > 3. A bit surprising limitation that in order to use bitmap scan the > > maximum number of tuples per page must not exceed 291 due to > > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on > > 8kb page size. In case of 1mb page this restriction feels really > > limiting. > > I suppose this is a hardcoded limit that needs to be fixed by patching > core as we make table AM more pervasive. That's not unproblematic - a dynamic limit would make a number of computations more expensive, and we already spend plenty CPU cycles building the tid bitmap. And we'd waste plenty of memory just having all that space for the worst case. ISTM that we "just" need to replace the TID bitmap with some tree like structure. > > 4. In order to use WAL-logging each page must start with a standard 24 > > byte PageHeaderData even if it is needless for storage itself. Not a > > big deal though. Another (acutally documented) WAL-related limitation > > is that only generic WAL can be used within extension. So unless > > inserts are made in bulks it's going to require a lot of disk space to > > accomodate logs and wide bandwith for replication. > > Not sure what to suggest. Either you should ignore this problem, or > you should fix it. I think if it becomes a problem you should ask for an rmgr ID to use for your extension, which we encode and then then allow to set the relevant rmgr callbacks for that rmgr id at startup. But you should obviously first develop the WAL logging etc, and make sure it's beneficial over generic wal logging for your case. Greetings, Andres Freund
Re: v12.0: segfault in reindex CONCURRENTLY
On 2019-Oct-18, Michael Paquier wrote: > What you are proposing here sounds fine to me. Perhaps you would > prefer to adjust the code yourself? Sure thing, thanks, done :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP/PoC for parallel backup
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman wrote: > > Attached are the updated patches. > I had a quick look over these changes and they look good overall. However, here are my few review comments I caught while glancing the patches 0002 and 0003. --- 0002 patch 1. Can lsn option be renamed to start-wal-location? This will be more clear too. 2. +typedef struct +{ +charname[MAXPGPATH]; +chartype; +int32size; +time_tmtime; +} BackupFile; I think it will be good if we keep this structure in a common place so that the client can also use it. 3. +SEND_FILE_LIST, +SEND_FILES_CONTENT, Can above two commands renamed to SEND_BACKUP_MANIFEST and SEND_BACKUP_FILE respectively? The reason behind the first name change is, we are not getting only file lists here instead we are getting a few more details with that too. And for others, it will be inline with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST. 4. Typos: non-exlusive => non-exclusive retured => returned optionaly => optionally nessery => necessary totoal => total --- 0003 patch 1. +static int +simple_list_length(SimpleStringList *list) +{ +intlen = 0; +SimpleStringListCell *cell; + +for (cell = list->head; cell; cell = cell->next, len++) +; + +return len; +} I think it will be good if it goes to simple_list.c. That will help in other usages as well. 2. Please revert these unnecessary changes: @@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) */ snprintf(filename, sizeof(filename), "%s/%s", current_path, copybuf); + if (filename[strlen(filename) - 1] == '/') { /* @@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum) * can map them too.) */ filename[strlen(filename) - 1] = '\0';/* Remove trailing slash */ - mapped_tblspc_path = get_tablespace_mapping(©buf[157]); + if (symlink(mapped_tblspc_path, filename) != 0) { pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m", 3. Typos: retrive => retrieve takecare => take care tablespae => tablespace 4. ParallelBackupEnd() function does not do anything for parallelism. Will it be better to just rename it as EndBackup()? 5. To pass a tablespace path to the server in SEND_FILES_CONTENT, you are reusing a LABEL option, that seems odd. How about adding a new option for that? 6. It will be good if we have some comments explaining what the function is actually doing in its prologue. For functions like: GetBackupFilesList() ReceiveFiles() create_workers_and_fetch() Thanks > > Thanks, > > -- > Asif Rehman > Highgo Software (Canada/China/Pakistan) > URL : www.highgo.ca > > -- Jeevan Chalke Associate Database Architect & Team Lead, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: Questions/Observations related to Gist vacuum
On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila wrote: > > On Fri, Oct 18, 2019 at 9:41 AM Dilip Kumar wrote: > > > > On Wed, Oct 16, 2019 at 7:22 PM Heikki Linnakangas wrote: > > > > > > On 16 October 2019 12:57:03 CEST, Amit Kapila > > > wrote: > > > >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas > > > >wrote: > > > >> All things > > > >> considered, I'm not sure which is better. > > > > > > > >Yeah, this is a tough call to make, but if we can allow it to delete > > > >the pages in bulkdelete conditionally for parallel vacuum workers, > > > >then it would be better. > > > > > > Yeah, if it's needed for parallel vacuum, maybe that tips the scale. > > > > Are we planning to do this only if it is called from parallel vacuum > > workers or in general? > > > > I think we can do it in general as adding some check for parallel > vacuum there will look bit hackish. I agree with that point. It is not clear if we get enough > benefit by keeping it for cleanup phase of the index as discussed in > emails above. Heikki, others, let us know if you don't agree here. I have prepared a first version of the patch. Currently, I am performing an empty page deletion for all the cases. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com delete_emptypages_in_gistbulkdelete_v1.patch Description: Binary data
Add Change Badges to documentation
Attached is a patch to implement change badges in our documentation. What's a change badge? It's my term for a visual cue in the documentation used to indicate that the nearby section of documentation is new in this version or otherwise changed from the previous version. One example of change badges being used is in the DocBook documentation reference: https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes Docbook used graphical badges, which seemed to be a bad idea. Instead, I went with a decorated text span like one finds in gmail labels or Reddit "flair". The badges are implemented via using the "revision" attribute available on all docbook tags. All one needs to do to indicate a change is to change one tag, and add a revision attribute. For example: will add a small green text box with the tex "new in 13" immediately preceding the rendered elements. I have attached a screenshot (badges_in_acronyms.png) of an example of this from my browser viewing changes to the acronyms.html file. This obviously lacks the polish of viewing the page on a full website, but it does give you an idea of the flexibility of the change badge, and where badge placement is (and is not) a good idea. What are the benefits of using this? I think the benefits are as follows: 1. It shows a casual user what pieces are new on that page (new functions, new keywords, new command options, etc). 2. It also works in the negative: a user can quickly skim a page, and lacking any badges, feel confident that everything there works in the way that it did in version N-1. 3. It also acts as a subtle cue for the user to click on the previous version to see what it used to look like, confident that there *will* be a difference on the previous version. How would we implement this? 1. All new documentation pages would get a "NEW" badge in their title. 2. New function definitions, new command options, etc would get a "NEW" badge as visually close to the change as is practical. 3. Changes to existing functions, options, etc. would get a badge of "UPDATED" 4. At major release time, we could do one of two things: 4a. We could keep the NEW/UPDATED badges in the fixed release version, and then completely remove them from the master, because for version N+1, they won't be new anymore. This can be accomplished with an XSL transform looking for any tag with the "revision" attribute 4b. We could code in the version number at release time, and leave it in place. So in version 14 you could find both "v13" and "v14" badges, and in version 15 you could find badges for 15, 14, and 13. At some point (say v17), we start retiring the v13 badges, and in v18 we'd retire the v14 badges, and so on, to keep the clutter to a minimum. Back to the patch: I implemented this only for html output, and the colors I chose are very off-brand for postgres, so that will have to change. There's probably some spacing/padding issues I haven't thought of. Please try it out, make some modifications to existing document pages to see how badges would work in those contexts. 
From ded965fc90b223a834ac52d55512587b7a6ea139 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Fri, 18 Oct 2019 06:15:10 -0400 Subject: [PATCH] add document change badges --- doc/src/sgml/acronyms.sgml | 6 +++--- doc/src/sgml/stylesheet-html-common.xsl | 10 ++ doc/src/sgml/stylesheet.css | 10 ++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml index f638665dc9..87bfef04be 100644 --- a/doc/src/sgml/acronyms.sgml +++ b/doc/src/sgml/acronyms.sgml @@ -10,7 +10,7 @@ -ANSI +ANSI https://en.wikipedia.org/wiki/American_National_Standards_Institute";> @@ -19,7 +19,7 @@ - + API @@ -31,7 +31,7 @@ ASCII - + https://en.wikipedia.org/wiki/Ascii";>American Standard Code for Information Interchange diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl index 9edce52a10..cb04cb7f0d 100644 --- a/doc/src/sgml/stylesheet-html-common.xsl +++ b/doc/src/sgml/stylesheet-html-common.xsl @@ -289,4 +289,14 @@ set toc,title + + + + + + + + + + diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css index 1a66c789d5..d0cae2f59f 100644 --- a/doc/src/sgml/stylesheet.css +++ b/doc/src/sgml/stylesheet.css @@ -109,3 +109,13 @@ acronym { font-style: inherit; } width: 75%; } } + +/* version badge styling */ +span.revision-badge { + visibility: visible ; +color: white; + background-color: #00933C; + border: 1px solid #00; + border-radius: 2px; +padding: 1px; +} -- 2.14.1
Re: libpq: Fix wrong connection status on invalid "connect_timeout"
Am 18.10.19 um 05:06 schrieb Michael Paquier: > So attached is a patch to skip trailing whitespaces as well, > which also fixes the issue with ECPG. I have refactored the parsing > logic a bit while on it. The comment at the top of parse_int_param() > needs to be reworked a bit more. I tested this and it looks good to me. Maybe you could omit some redundant 'end' checks, as in the attached patch. Or was your intention to verify non-NULL 'end'? > Perhaps we could add directly regression > tests for libpq. I'll start a new thread about that once we are done > here, the topic is larger. We have around 650 tests on ruby-pg to ensure everything runs as expected and I always wondered how the API of libpq is being verified. -- Kind Regards, Lars Kanis diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index f91f0f2..0eeabb8 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -1687,7 +1687,7 @@ useKeepalives(PGconn *conn) /* * Parse and try to interpret "value" as an integer value, and if successful, * store it in *result, complaining if there is any trailing garbage or an - * overflow. + * overflow. This allows any number of leading and trailing whitespaces. */ static bool parse_int_param(const char *value, int *result, PGconn *conn, @@ -1698,14 +1698,31 @@ parse_int_param(const char *value, int *result, PGconn *conn, *result = 0; + /* strtol(3) skips leading whitespaces */ errno = 0; numval = strtol(value, &end, 10); - if (errno == 0 && *end == '\0' && numval == (int) numval) - { - *result = numval; - return true; - } + /* + * If no progress was done during the parsing or an error happened, fail. + * This tests properly for overflows of the result. + */ + if (value == end || errno != 0 || numval != (int) numval) + goto error; + + /* + * Skip any trailing whitespace; if anything but whitespace remains before + * the terminating character, fail + */ + while (*end != '\0' && isspace((unsigned char) *end)) + end++; + + if (*end != '\0') + goto error; + + *result = numval; + return true; + +error: appendPQExpBuffer(&conn->errorMessage, libpq_gettext("invalid integer value \"%s\" for connection option \"%s\"\n"), value, context); @@ -2008,7 +2025,11 @@ connectDBComplete(PGconn *conn) { if (!parse_int_param(conn->connect_timeout, &timeout, conn, "connect_timeout")) + { + /* mark the connection as bad to report the parsing failure */ + conn->status = CONNECTION_BAD; return 0; + } if (timeout > 0) {
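For anyone who wants to poke at the behaviour outside of libpq, the parsing rule under discussion — accept leading and trailing whitespace, reject anything else and anything that overflows int — can be exercised with a small standalone program. This is only an illustration, not libpq code:

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Standalone illustration of the parsing rule discussed above. */
static bool
parse_int_strict(const char *value, int *result)
{
	char	   *end;
	long		numval;

	errno = 0;
	numval = strtol(value, &end, 10);	/* strtol skips leading whitespace */

	/* no digits consumed, range error, or does not fit in int: fail */
	if (value == end || errno != 0 || numval != (int) numval)
		return false;

	/* allow trailing whitespace, but nothing else */
	while (*end != '\0' && isspace((unsigned char) *end))
		end++;
	if (*end != '\0')
		return false;

	*result = (int) numval;
	return true;
}

int
main(void)
{
	int			v;

	printf("\" 10 \" -> %s\n", parse_int_strict(" 10 ", &v) ? "ok" : "error");
	printf("\"10s\"  -> %s\n", parse_int_strict("10s", &v) ? "ok" : "error");
	return 0;
}
```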
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Oct 14, 2019 at 3:09 PM Dilip Kumar wrote: > > On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra > wrote: > > > > > > Sure, I wasn't really proposing to adding all stats from that patch, > > including those related to streaming. We need to extract just those > > related to spilling. And yes, it needs to be moved right after 0001. > > > I have extracted the spilling related code to a separate patch on top > of 0001. I have also fixed some bugs and review comments and attached > as a separate patch. Later I can merge it to the main patch if you > agree with the changes. > Few comments - 0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer 1. + { + {"logical_decoding_work_mem", PGC_USERSET, RESOURCES_MEM, + gettext_noop("Sets the maximum memory to be used for logical decoding."), + gettext_noop("This much memory can be used by each internal " + "reorder buffer before spilling to disk or streaming."), + GUC_UNIT_KB + }, I think we can remove 'or streaming' from above sentence for now. We can add it later with later patch where streaming will be allowed. 2. @@ -206,6 +206,18 @@ CREATE SUBSCRIPTION subscription_name + + +work_mem (integer) + + + Limits the amount of memory used to decode changes on the + publisher. If not specified, the publisher will use the default + specified by logical_decoding_work_mem. When + needed, additional data are spilled to disk. + + + It is not clear why we need this parameter at least with this patch? I have raised this multiple times [1][2]. bugs_and_review_comments_fix 1. }, &logical_decoding_work_mem, - -1, -1, MAX_KILOBYTES, - check_logical_decoding_work_mem, NULL, NULL + 65536, 64, MAX_KILOBYTES, + NULL, NULL, NULL I think the default value should be 1MB similar to maintenance_work_mem. The same was true before this change. 2. -#logical_decoding_work_mem = 64MB # min 1MB, or -1 to use maintenance_work_mem +i#logical_decoding_work_mem = 64MB # min 64kB It seems the 'i' is a leftover character in the above change. Also, change the default value considering the previous point. 3. @@ -2479,7 +2480,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) /* update the statistics */ rb->spillCount += 1; - rb->spillTxns += txn->serialized ? 1 : 0; + rb->spillTxns += txn->serialized ? 0 : 1; rb->spillBytes += size; Why is this change required? Shouldn't we increase the spillTxns count only when the txn is serialized? 0002-Track-statistics-for-spilling 1. + + spill_txns + integer + Number of transactions spilled to disk after the memory used by + logical decoding exceeds logical_work_mem. The + counter gets incremented both for toplevel transactions and + subtransactions. + + The parameter name is wrong here. /logical_work_mem/logical_decoding_work_mem 2. + + spill_txns + integer + Number of transactions spilled to disk after the memory used by + logical decoding exceeds logical_work_mem. The + counter gets incremented both for toplevel transactions and + subtransactions. + + + + spill_count + integer + Number of times transactions were spilled to disk. Transactions + may get spilled repeatedly, and this counter gets incremented on every + such invocation. + + + + spill_bytes + integer + Amount of decoded transaction data spilled to disk. + + In all the above cases, the explanation text starts immediately after tag, but the general coding practice is to start from the next line, see the explanation of nearby parameters. 
It seems these parameters are added in pg-stat-wal-receiver-view in the docs, but in code, it is present as part of pg_stat_replication. It seems doc needs to be updated. Am, I missing something? 3. ReorderBufferSerializeTXN() { .. /* update the statistics */ rb->spillCount += 1; rb->spillTxns += txn->serialized ? 0 : 1; rb->spillBytes += size; Assert(spilled == txn->nentries_mem); Assert(dlist_is_empty(&txn->changes)); txn->nentries_mem = 0; txn->serialized = true; .. } I am not able to understand the above code. We are setting the serialized parameter a few lines after we check it and increment the spillTxns count. Can you please explain it? Also, isn't spillTxns count bit confusing, because in some cases it will include subtransactions and other cases (where the largest picked transaction is a subtransaction) it won't include it? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Clean up MinGW def file generation
On 2019-Oct-17, Michael Paquier wrote: > On Tue, Oct 15, 2019 at 09:00:23AM +0200, Peter Eisentraut wrote: > > I think we can clean this up and just have the regular ddl.def built > > normally at build time if required. > > > > Does anyone know more about this? > > This comes from here, but I cannot see a thread about this topic > around this date: > commit: a1d5d8574751d62a039d8ceb44329ee7c637196a > author: Peter Eisentraut > date: Tue, 26 Feb 2008 06:41:24 + > Refactor the code that creates the shared library export files to appear > only once in Makefile.shlib and not in four copies. Well, yes, but that code originates from much earlier. For example 2a63c1602d9d (Tom Lane, Oct. 2004) is the one that created the libpq ones. But even that ancient one seems to be just refactoring some stuff that was already there, namely something that seems to have been created by commit 53cd7cd8a916: commit 53cd7cd8a9168d4b2e2feb52129336429cc99b98 Author: Bruce Momjian AuthorDate: Tue Mar 9 04:53:37 2004 + CommitDate: Tue Mar 9 04:53:37 2004 + Make a separate win32 debug DLL along with the non-debug version: Currently, src/interfaces/libpq/win32.mak builds a statically-linked library "libpq.lib", a debug dll "libpq.dll", import library for the debug dll "libpqdll.lib", a release dll "libpq.dll", import library for the release dll "libpqdll.lib". To avoid naming clashes, I would make the debug dll and import libraries "libpqd.dll" and "libpqddll.lib". Basically, the debug build uses the cl flags: "/MDd /D _DEBUG", and the release build uses the cl flags "/MD /D NDEBUG". Usually the debug build has a "D" suffix on the file name, so for example: libpqd.dll libpq, debug build libpqd.lib libpq, debug build, import library libpq.dll libpq, release build libpq.lib libpq, release build, import library David Turner This stuff was used by win32.mak, but I don't know if that tells anyone anything. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
Greetings, * Thomas Munro (thomas.mu...@gmail.com) wrote: > On Tue, Oct 8, 2019 at 11:09 PM Amit Kapila wrote: > > I am personally still in the camp of people advocating the use of > > macro for this purpose. It is quite possible after reading your > > points, some people might change their opinion or some others also > > share their opinion against using a macro in which case we can drop > > the idea of using a macro. > > -1 for these macros. Agreed. > These are basic facts about the C language. I hope C eventually > supports {} like C++, so that you don't have to think hard about > whether the first member is another struct, and recursively so … but > since the macros can't help with that problem, what is the point? I realize that I need to don some fireproof gear for suggesting this, but I really wonder how much fallout we'd have from just allowing {} to be used.. It's about a billion[1] times cleaner and more sensible than using {0} and doesn't create a dependency on what the first element of the struct is.. Thanks, Stephen 1: Detailed justification not included intentionally and is left as an exercise to the reader. signature.asc Description: PGP signature
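To make the {0}-versus-{} point concrete, here is a small illustration with made-up struct names: when the first member is itself a struct, `= {0}` can draw missing-braces warnings from some compilers, whereas a designated initializer (or the C++/C23 `{}` form) does not depend on which member comes first.

```c
#include <stdio.h>

/* made-up types, purely to illustrate the initializer discussion */
typedef struct { int x; int y; } Point;
typedef struct { Point origin; int radius; } Circle;

int
main(void)
{
	/*
	 * Zero-initializes everything, but because the first member is itself
	 * a struct, -Wmissing-braces style warnings can fire on some compilers.
	 */
	Circle		a = {0};

	/* Designated initializer: no dependency on which member comes first. */
	Circle		b = {.radius = 0};

	/* In C++ (and C23) one could simply write: Circle c = {}; */

	printf("%d %d %d / %d %d %d\n",
		   a.origin.x, a.origin.y, a.radius,
		   b.origin.x, b.origin.y, b.radius);
	return 0;
}
```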
Re: [Patch proposal] libpq portal support
On Thu, 17 Oct 2019 at 03:12, Sergei Fedorov wrote: > Hello everybody, > > Our company was in desperate need of portals in async interface of libpq, > so we patched it. > > We would be happy to upstream the changes. > > The description of changes: > > Two functions in libpq-fe.h: > PQsendPortalBindParams for sending a command to bind a portal to a > previously prepared statement; > PQsendPortalExecute for executing a previously bound portal with a given > number of rows. > > A patch to pqParseInput3 in fe-protocol3.c to handle the `portal > suspended` message tag. > > The patch is ready for review, but it lacks documentation, tests and usage > examples. > > There are no functions for sending bind without params and no functions > for sync interface, but they can easily be added to the feature. > If you are happy to put it under The PostgreSQL License, then sending it as an attachment here is the first step. If possible, please rebase it on top of git master. Some explanation for why you have this need and what problems this solves for you would be helpful as well. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
Re: Non working timeout detection in logical worker
On Fri, 18 Oct 2019 07:47:13 +0200 Julien Rouhaud wrote: > On Fri, Oct 18, 2019 at 7:32 AM Michael Paquier wrote: > > > > On Thu, Oct 17, 2019 at 08:00:15PM +0200, Julien Rouhaud wrote: > > > Jehan-Guillaume (in Cc) reported me today a problem with logical > > > replication, where in case of network issue the walsender is correctly > > > terminating at the given wal_sender_timeout but the logical worker > > > kept waiting indefinitely. > > > > > > The issue is apparently a simple thinko, the timestamp of the last > > > received activity being unconditionally set at the beginning of the > > > main processing loop, making any reasonable timeout setting > > > ineffective. Trivial patch to fix the problem attached. > > > > Right, good catch. That's indeed incorrect. The current code would > > just keep resetting the timeout if walrcv_receive() returns 0 roughly > > once per NAPTIME_PER_CYCLE. The ping sent to the server once reaching > > half of wal_receiver_timeout was also broken because of that. > > > > In short, applied and back-patched down to 10. > > Thanks Michael! Thank you both guys!
RE: Copy data to DSA area
>> ShmZoneContext for SharedPlan and SharedRelCache is not implemented but I'm going to do it following your points. > After looking into existing code, I'm thinking Generation Memory Context seems to have a similar purpose. So I'll implement ShmZoneContext by referencing it. Generation context manages chunks having almost the same life span. ShmZoneContext would dsa_allocate() a block and hand out chunks, and free the chunks and their block all together. I added ShmZoneContext to my patch. I haven't added detailed comments and a test set yet, so let me explain how to use it here. I followed Thomas' suggestion. At startup, the ShmZoneContext is created in shared memory by ShmZoneContextCreateOrigin(). Before allocating memory, another context is created and set to a short-lived parent context via MemoryContextClone() so that objects and contexts are automatically freed. Then you can use palloc(), which returns a chunk from a dsa_allocated block. When you use MemoryContextSetParent() to reparent to the long-lived ShmZoneContext, you need to acquire a lock to protect the parent-child path. The LWLock object is obtained by ShmZoneContextGetLock(); every cloned ShmZoneContext uses the same lock instance. If you want to free an allocated object, use MemoryContextDelete(). After the context becomes long-lived, you need to take the lock again when calling MemoryContextDelete(), in order to protect the MemoryContext parent-child path. Thinking about the use case of a shared RelCache/PlanCache, allocation happens only before the parent context is switched to the long-lived shared one, so I think we don't need to take the lock while calling palloc(). I also think MemoryContextDelete() should be done after the allocated objects have been removed from any shared index structure (e.g. a hash table or list in shared memory), so that another process can no longer look them up. What do you think? Regards, Takeshi Ideriha shm_zone_retail_context-v2.patch Description: shm_zone_retail_context-v2.patch
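To make the usage sequence above easier to follow, here is a rough outline in C. It is only a sketch of the flow described in this mail: the ShmZoneContext* functions and MemoryContextClone() come from the attached patch, their exact signatures are assumptions here (arguments are elided), and MyObject is a hypothetical type, so this is not meant to compile against the patch as-is.

```c
/*
 * Outline of the ShmZoneContext usage flow described above.  NOTE: the
 * ShmZoneContext* calls and MemoryContextClone() are from the attached
 * patch; their signatures are assumed and arguments are elided.
 */
MemoryContext shm_zone;		/* long-lived context living in shared memory */
MemoryContext tmp;			/* short-lived clone used while building objects */
MemoryContext old;
LWLock	   *lock;

/* at startup: create the origin context in shared memory */
shm_zone = ShmZoneContextCreateOrigin(/* ... */);

/* build objects under a clone whose parent is still short-lived */
tmp = MemoryContextClone(/* ... */);
old = MemoryContextSwitchTo(tmp);
/* palloc() now hands out chunks carved from dsa_allocate'd blocks */
MyObject   *obj = palloc(sizeof(MyObject));	/* hypothetical object type */
MemoryContextSwitchTo(old);

/* publish: reparent the clone to the long-lived context under its lock */
lock = ShmZoneContextGetLock(/* ... */);
LWLockAcquire(lock, LW_EXCLUSIVE);
MemoryContextSetParent(tmp, shm_zone);
LWLockRelease(lock);

/*
 * Free everything at once later: first unlink obj from whatever shared
 * index structure points at it, then delete the context under the lock.
 */
LWLockAcquire(lock, LW_EXCLUSIVE);
MemoryContextDelete(tmp);
LWLockRelease(lock);
```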
Re: UPSERT on view does not find constraint by name
On Fri, Oct 18, 2019 at 3:42 AM Tom Lane wrote: > Jeremy Finzel writes: > > I'm not sure if this can be considered a bug or not, but it is perhaps > > unexpected. I found that when using a view that is simply select * from > > table, then doing INSERT ... ON CONFLICT ON CONSTRAINT constraint_name on > > that view, it does not find the constraint and errors out. But it does > > find the constraint if one lists the columns instead. > > I'm confused by this report. The view wouldn't have any constraints, > and experimenting shows that the parser won't let you name a > constraint of the underlying table here. So would you provide a > concrete example of what you're talking about? > > regards, tom lane > Apologies for the lack of clarity. Here is a simple example of what I mean: test=# CREATE TEMP TABLE foo (id int primary key); CREATE TABLE test=# \d foo Table "pg_temp_4.foo" Column | Type | Collation | Nullable | Default +-+---+--+- id | integer | | not null | Indexes: "foo_pkey" PRIMARY KEY, btree (id) test=# CREATE VIEW bar AS SELECT * FROM foo; NOTICE: view "bar" will be a temporary view CREATE VIEW test=# INSERT INTO foo (id) test-# VALUES (1) test-# ON CONFLICT ON CONSTRAINT foo_pkey test-# DO NOTHING; INSERT 0 1 test=# INSERT INTO foo (id) VALUES (1) ON CONFLICT ON CONSTRAINT foo_pkey DO NOTHING; INSERT 0 0 test=# INSERT INTO foo (id) VALUES (1) ON CONFLICT ON CONSTRAINT foo_pkey DO NOTHING; INSERT 0 0 test=# INSERT INTO bar (id) VALUES (1) ON CONFLICT ON CONSTRAINT foo_pkey DO NOTHING; ERROR: constraint "foo_pkey" for table "bar" does not exist test=# INSERT INTO bar (id) VALUES (1) ON CONFLICT (id) DO NOTHING; INSERT 0 0 Of interest are the last 2 statements above. ON CONFLICT on the constraint name does not work, but it does work by field name. I'm not saying it *should* work both ways, but I'm more wondering if this is known/expected/desired behavior. The point of interest for us is that we frequently preserve a table's "public API" by instead swapping out a table for a view as above, in order for instance to rebuild a table behind the scenes without breaking table usage. Above case is a rare example where that doesn't work, and which in any case I advise (as does the docs) that they do not use on conflict on constraint, but rather to list the field names instead. Thanks, Jeremy
Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
On 10/18/19 08:18, Stephen Frost wrote: > I realize that I need to don some fireproof gear for suggesting this, > but I really wonder how much fallout we'd have from just allowing {} to > be used.. It's about a billion[1] times cleaner and more sensible than > using {0} and doesn't create a dependency on what the first element of > the struct is.. I guess the non-flamey empirical question would be, if it's not ISO C, are we supporting any compiler that doesn't understand it? -Chap
Re: Clean up MinGW def file generation
Alvaro Herrera writes: > On 2019-Oct-17, Michael Paquier wrote: >> On Tue, Oct 15, 2019 at 09:00:23AM +0200, Peter Eisentraut wrote: >>> I think we can clean this up and just have the regular ddl.def built >>> normally at build time if required. >>> Does anyone know more about this? > Well, yes, but that code originates from much earlier. For example > 2a63c1602d9d (Tom Lane, Oct. 2004) is the one that created the libpq > ones. Yeah, the comment that Peter complained about is mine. I believe the desire to avoid depending on "sed" at build time was focused on our old support for building libpq with Borland C (and not much else). Since this makefile infrastructure is now only used for MinGW, I agree we ought to be able to quit shipping those files in tarballs. I think there could be some .gitignore cleanup done along with this. Notably, I see exclusions for /exports.list in several places, but no other references to that name --- isn't that an intermediate file that we used to generate while creating these files? regards, tom lane
Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
Greetings, * Chapman Flack (c...@anastigmatix.net) wrote: > On 10/18/19 08:18, Stephen Frost wrote: > > I realize that I need to don some fireproof gear for suggesting this, > > but I really wonder how much fallout we'd have from just allowing {} to > > be used.. It's about a billion[1] times cleaner and more sensible than > > using {0} and doesn't create a dependency on what the first element of > > the struct is.. > > I guess the non-flamey empirical question would be, if it's not ISO C, > are we supporting any compiler that doesn't understand it? Right, that's basically what I was trying to ask. :) Thanks, Stephen signature.asc Description: PGP signature
Re: UPSERT on view does not find constraint by name
Jeremy Finzel writes: > test=# CREATE TEMP TABLE foo (id int primary key); > CREATE TABLE > test=# CREATE VIEW bar AS SELECT * FROM foo; > NOTICE: view "bar" will be a temporary view > CREATE VIEW > ... > test=# INSERT INTO bar (id) > VALUES (1) > ON CONFLICT ON CONSTRAINT foo_pkey > DO NOTHING; > ERROR: constraint "foo_pkey" for table "bar" does not exist > test=# INSERT INTO bar (id) > VALUES (1) > ON CONFLICT (id) > DO NOTHING; > INSERT 0 0 > Of interest are the last 2 statements above. ON CONFLICT on the constraint > name does not work, but it does work by field name. I'm not saying it > *should* work both ways, but I'm more wondering if this is > known/expected/desired behavior. The first case looks perfectly normal to me: there is no "foo_pkey" constraint associated with the "bar" view. It is interesting that the second case drills down to find there's an underlying constraint, but that seems like a bit of a hack :-(. Poking at it a little more closely, it seems like the first case involves a parse-time constraint lookup, while the second case postpones the lookup to plan time, and so the second case works because the view has already been expanded into a direct reference to the underlying table. Maybe it wasn't good to do those cases differently. I can't get too excited about it though. regards, tom lane
Re: Add Change Badges to documentation
On Fri, Oct 18, 2019 at 07:54:18AM -0400, Corey Huinker wrote: Attached is a patch to implement change badges in our documentation. What's a change badge? It's my term for a visual cue in the documentation used to indicate that the nearby section of documentation is new in this version or otherwise changed from the previous version. One example of change badges being used is in the DocBook documentation reference: https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes Docbook used graphical badges, which seemed to be a bad idea. Instead, I went with a decorated text span like one finds in gmail labels or Reddit "flair". Looks useful. I sometimes need to look at a command in version X and see what changed since version Y. Currently I do that by opening both pages and visually comparing them, so those badges make it easier. The badges are implemented via using the "revision" attribute available on all docbook tags. All one needs to do to indicate a change is to change one tag, and add a revision attribute. For example: will add a small green text box with the tex "new in 13" immediately preceding the rendered elements. I have attached a screenshot (badges_in_acronyms.png) of an example of this from my browser viewing changes to the acronyms.html file. This obviously lacks the polish of viewing the page on a full website, but it does give you an idea of the flexibility of the change badge, and where badge placement is (and is not) a good idea. What are the benefits of using this? I think the benefits are as follows: 1. It shows a casual user what pieces are new on that page (new functions, new keywords, new command options, etc). Yep. 2. It also works in the negative: a user can quickly skim a page, and lacking any badges, feel confident that everything there works in the way that it did in version N-1. Not sure about this. It'd require marking all changes with the badge, but we'll presumably mark only the large-ish changes, and it's unclear where the threshold is. It also does not work when removing a block of text (e.g. when removing some limitation), although it's true we often add a new para too. 3. It also acts as a subtle cue for the user to click on the previous version to see what it used to look like, confident that there *will* be a difference on the previous version. How would we implement this? 1. All new documentation pages would get a "NEW" badge in their title. 2. New function definitions, new command options, etc would get a "NEW" badge as visually close to the change as is practical. 3. Changes to existing functions, options, etc. would get a badge of "UPDATED" 4. At major release time, we could do one of two things: 4a. We could keep the NEW/UPDATED badges in the fixed release version, and then completely remove them from the master, because for version N+1, they won't be new anymore. This can be accomplished with an XSL transform looking for any tag with the "revision" attribute 4b. We could code in the version number at release time, and leave it in place. So in version 14 you could find both "v13" and "v14" badges, and in version 15 you could find badges for 15, 14, and 13. At some point (say v17), we start retiring the v13 badges, and in v18 we'd retire the v14 badges, and so on, to keep the clutter to a minimum. Presumably we could keep the SGML source and only decide which badges to ignore during build of the docs. 
That would however require a somewhat more structured approach - now it's a single attribute with free text, which does not really allow easy filtering. With separate attributes for the new/removed bits, the filtering would be much easier. But my experience with SGML is rather limited, so maybe I'm wrong. Back to the patch: I implemented this only for html output, and the colors I chose are very off-brand for postgres, so that will have to change. There's probably some spacing/padding issues I haven't thought of. Please try it out, make some modifications to existing document pages to see how badges would work in those contexts. Haven't looked yet, but I agree the colors might need a change - that's a rather minor detail, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: configure fails for perl check on CentOS8
Kyotaro Horiguchi writes: > The immediately problematic command generated by autoconf is: > ... > /usr/bin/ld: /tmp/ccGxodNv.o: relocation R_X86_64_32 against symbol > `PL_memory_wrap' can not be used when making a PIE object; recompile with > -fPIC > /usr/bin/ld: final link failed: Nonrepresentable section on output > collect2: error: ld returned 1 exit status > Very interestingly I don't get the error when the "-O0" is "-O2". It > is because gcc eliminates the PL_memory_wrap maybe by inlining. Yeah, probably so. But I don't like the idea of fixing a problem triggered by user-supplied CFLAGS by injecting more cflags from elsewhere. That seems likely to be counterproductive, or at least it risks overriding what the user wanted. Can we fix this by using something other than perl_alloc() as the tested-for function? That is surely a pretty arbitrary choice. Are there any standard Perl entry points that are just plain functions with no weird macro expansions? regards, tom lane
Re: Bug about drop index concurrently
Hi, I can trivially reproduce this - it's enough to create a master-standby setup, and then do this on the master CREATE TABLE t (a int, b int); INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i); and run pgbench with this script DROP INDEX CONCURRENTLY IF EXISTS t_a_idx; CREATE INDEX t_a_idx ON t(a); while on the standby there's another pgbench running this script EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1; and it fails pretty fast for me. With an extra assert(false) added to src/backend/access/common/relation.c I get a backtrace like this: Program terminated with signal SIGABRT, Aborted. #0 0x7c32e458fe35 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-22.fc30.x86_64 (gdb) bt #0 0x7c32e458fe35 in raise () from /lib64/libc.so.6 #1 0x7c32e457a895 in abort () from /lib64/libc.so.6 #2 0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc "!(0)", errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", lineNumber=64) at assert.c:54 #3 0x0048d1bd in relation_open (relationId=38216, lockmode=1) at relation.c:64 #4 0x005082e4 in index_open (relationId=38216, lockmode=1) at indexam.c:130 #5 0x0080ac3f in get_relation_info (root=0x21698b8, relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196 #6 0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, parent=0x0) at relnode.c:292 #7 0x007d485d in add_base_rels_to_query (root=0x21698b8, jtnode=0x2169478) at initsplan.c:114 #8 0x007d48a3 in add_base_rels_to_query (root=0x21698b8, jtnode=0x21693e0) at initsplan.c:122 #9 0x007d8fad in query_planner (root=0x21698b8, qp_callback=0x7ded97 , qp_extra=0x7fffa4834f10) at planmain.c:168 #10 0x007dc316 in grouping_planner (root=0x21698b8, inheritance_update=false, tuple_fraction=0) at planner.c:2048 #11 0x007da7ca in subquery_planner (glob=0x220d078, parse=0x2168f78, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1012 #12 0x007d942c in standard_planner (parse=0x2168f78, cursorOptions=256, boundParams=0x0) at planner.c:406 #13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, boundParams=0x0) at planner.c:275 #14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, cursorOptions=256, boundParams=0x0) at postgres.c:878 #15 0x00658683 in ExplainOneQuery (query=0x2168f78, cursorOptions=256, into=0x0, es=0x220cd90, queryString=0x21407b8 "explain analyze select * from t where a = 1;", params=0x0, queryEnv=0x0) at explain.c:367 #16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, queryString=0x21407b8 "explain analyze select * from t where a = 1;", params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255 #17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, queryString=0x21407b8 "explain analyze select * from t where a = 1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90, completionTag=0x7fffa48355c0 "") at utility.c:675 #18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, queryString=0x21407b8 "explain analyze select * from t where a = 1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90, completionTag=0x7fffa48355c0 "") at utility.c:360 #19 0x008e8a0c in PortalRunUtility (portal=0x219c278, pstmt=0x21425c0, isTopLevel=true, setHoldSnapshot=true, dest=0x220cb90, completionTag=0x7fffa48355c0 "") at pquery.c:1175 #20 0x008e871a in FillPortalStore (portal=0x219c278, isTopLevel=true) at pquery.c:1035 #21 0x008e8075 in PortalRun (portal=0x219c278, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x21efb90, 
altdest=0x21efb90, completionTag=0x7fffa48357b0 "") at pquery.c:765 #22 0x008e207c in exec_simple_query (query_string=0x21407b8 "explain analyze select * from t where a = 1;") at postgres.c:1215 #23 0x008e636e in PostgresMain (argc=1, argv=0x216c600, dbname=0x216c4e0 "test", username=0x213c3f8 "user") at postgres.c:4236 #24 0x0083c71e in BackendRun (port=0x2165850) at postmaster.c:4437 #25 0x0083beef in BackendStartup (port=0x2165850) at postmaster.c:4128 #26 0x00838313 in ServerLoop () at postmaster.c:1704 #27 0x00837bbf in PostmasterMain (argc=3, argv=0x213a360) at postmaster.c:1377 #28 0x00759643 in main (argc=3, argv=0x213a360) at main.c:228 So my guess is the root cause is pretty simple - we close/unlock the indexes after completing the query, but then EXPLAIN tries to open them again when producing the explain plan. I don't have a very good idea how to fix this, as explain has no idea which indexes will be used by the query, etc. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Missing constant propagation in planner on hash quals causes join slowdown
Hi Hackers, By optimising our application I stumbled over the join quals used very often in our application. In general this concerns datasets, which are subdivided into chunks, like years, seasons (here half a year), multiple tenants in OLTP system etc. In these cases many tables are joined only to data of the same chunk, identified always by a separating column in the table (here: xx_season). (I tested it on PG 12.0 on Windows 64 bit, but similar results on older stable releases and other OS) Here is the test case, also in the attached file: (I choose to join 2 tables with 2 seasons (2 and 3) of about 1 million of records for every season. I put some randomness in the table creation to simulate the normal situation in OLTP systems) --- Source start drop table if exists tmaster; create table tmaster ( id_t1 integer, t1_season integer, t1_id_t2 integer, t1_value integer, t1_cdescr varchar, primary key (id_t1) ); -- select setseed (0.34512); insert into tmaster select inum ,iseason ,row_number () over () as irow ,irandom ,'TXT: '||irandom::varchar from ( select inum::integer ,((inum>>20)+2)::integer as iseason ,inum::integer + (50*random())::integer as irandom from generate_series (1,(1<<21)) as inum order by irandom )qg ; alter table tmaster add constraint uk_master_season_id unique (t1_season,id_t1); drop table if exists tfact; create table tfact ( id_t2 integer, t2_season integer, t2_value integer, t2_cdescr varchar, primary key (id_t2) ); -- select setseed (-0.76543); insert into tfact select qg.* ,'FKT: '||irandom::varchar from ( select inum::integer ,((inum>>20)+2)::integer as iseason ,inum::integer + (50*random())::integer as irandom from generate_series (1,(1<<21)) as inum order by irandom )qg ; alter table tfact add constraint uk_fact_season_id unique (t2_season,id_t2); - -- slower: explain (analyze, verbose, costs, settings, buffers) select * from tmaster left join tfact on id_t2=t1_id_t2 and t2_season=t1_season where t1_season=3 ; -- faster by setting a constant in left join on condition: explain (analyze, verbose, costs, settings, buffers) select * from tmaster left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season where t1_season=3 ; --- Source end The results for the first query: explain (analyze, verbose, costs, settings, buffers) select * from tmaster left join tfact on id_t2=t1_id_t2 and t2_season=t1_season where t1_season=3 ; QUERY PLAN -- Hash Left Join (cost=53436.01..111610.15 rows=1046129 width=52) (actual time=822.784..2476.573 rows=1048576 loops=1) Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr, tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr Inner Unique: true Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2)) Buffers: shared hit=2102193, temp read=10442 written=10442 -> Index Scan using uk_master_season_id on public.tmaster (cost=0.43..32263.38 rows=1046129 width=28) (actual time=0.008..565.222 rows=1048576 loops=1) Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr Index Cond: (tmaster.t1_season = 3) Buffers: shared hit=1051086 -> Hash (cost=31668.49..31668.49 rows=1043473 width=24) (actual time=820.960..820.961 rows=1048576 loops=1) Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr Buckets: 524288 (originally 524288) Batches: 4 (originally 2) Memory Usage: 28673kB Buffers: shared hit=1051107, temp written=4316 -> Index Scan using uk_fact_season_id on public.tfact (cost=0.43..31668.49 
rows=1043473 width=24) (actual time=0.024..598.648 rows=1048576 loops=1) Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr Index Cond: (tfact.t2_season = 3) Buffers: shared hit=1051107 Settings: effective_cache_size = '8GB', random_page_cost = '1', temp_buffers = '32MB', work_mem = '32MB' Planning Time: 0.627 ms Execution Time: 2502.702 ms (20 rows) and for the second one: explain (analyze, verbose, costs, settings, buffers) select * from tmaster left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season where t1_season=3 ; QUERY PLAN -- Hash Left Join (cost=50827.33..106255.38 rows=1046129 width=52) (actual time=758.086..2313.175 rows=1048576 loops=1) Output:
Re: Columns correlation and adaptive query optimization
Smarter version of join selectivity patch handling cases like this: explain select * from outer_tab join inner_tab using(x,y) where x=1; QUERY PLAN Nested Loop (cost=0.42..1815.47 rows=10 width=12) Join Filter: (outer_tab.y = inner_tab.y) -> Seq Scan on outer_tab (cost=0.00..1791.00 rows=1 width=12) Filter: (x = 1) -> Index Only Scan using inner_tab_x_y_idx on inner_tab (cost=0.42..24.35 rows=10 width=8) Index Cond: (x = 1) (6 rows) -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index a9536c2..915a204 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -13,12 +13,25 @@ #include "postgres.h" #include +#include #include "access/parallel.h" #include "commands/explain.h" +#include "commands/defrem.h" #include "executor/instrument.h" #include "jit/jit.h" +#include "nodes/makefuncs.h" +#include "nodes/nodeFuncs.h" +#include "optimizer/cost.h" +#include "optimizer/optimizer.h" +#include "optimizer/planmain.h" +#include "parser/parsetree.h" +#include "storage/ipc.h" +#include "statistics/statistics.h" #include "utils/guc.h" +#include "utils/syscache.h" +#include "utils/lsyscache.h" +#include "utils/ruleutils.h" PG_MODULE_MAGIC; @@ -33,7 +46,9 @@ static bool auto_explain_log_settings = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; static int auto_explain_log_level = LOG; static bool auto_explain_log_nested_statements = false; +static bool auto_explain_suggest_only = false; static double auto_explain_sample_rate = 1; +static double auto_explain_add_statistics_threshold = 0.0; static const struct config_enum_entry format_options[] = { {"text", EXPLAIN_FORMAT_TEXT, false}, @@ -218,6 +233,30 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("auto_explain.add_statistics_threshold", + "Sets the threshold for actual/estimated #rows ratio triggering creation of multicolumn statistic for the related columns.", + "Zero disables implicit creation of multicolumn statistic.", + &auto_explain_add_statistics_threshold, + 0.0, + 0.0, + INT_MAX, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + + DefineCustomBoolVariable("auto_explain.suggest_only", + "Do not create statistic but just record in WAL suggested create statistics statement.", + NULL, + &auto_explain_suggest_only, + false, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + EmitWarningsOnPlaceholders("auto_explain"); /* Install hooks. 
*/ @@ -353,6 +392,230 @@ explain_ExecutorFinish(QueryDesc *queryDesc) PG_END_TRY(); } +static void +AddMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es); + +static void +AddMultiColumnStatisticsForSubPlans(List *plans, ExplainState *es) +{ + ListCell *lst; + + foreach(lst, plans) + { + SubPlanState *sps = (SubPlanState *) lfirst(lst); + + AddMultiColumnStatisticsForNode(sps->planstate, es); + } +} + +static void +AddMultiColumnStatisticsForMemberNodes(PlanState **planstates, int nsubnodes, + ExplainState *es) +{ + int j; + + for (j = 0; j < nsubnodes; j++) + AddMultiColumnStatisticsForNode(planstates[j], es); +} + +static int +vars_list_comparator(const ListCell *a, const ListCell *b) +{ + char* va = strVal((Value *) linitial(((ColumnRef *)lfirst(a))->fields)); + char* vb = strVal((Value *) linitial(((ColumnRef *)lfirst(b))->fields)); + return strcmp(va, vb); +} + +static void +AddMultiColumnStatisticsForQual(void* qual, ExplainState *es) +{ + List *vars = NULL; + ListCell* lc; + foreach (lc, qual) + { + Node* node = (Node*)lfirst(lc); + if (IsA(node, RestrictInfo)) + node = (Node*)((RestrictInfo*)node)->clause; + vars = list_concat(vars, pull_vars_of_level(node, 0)); + } + while (vars != NULL) + { + ListCell *cell; + List *cols = NULL; + Index varno = 0; + Bitmapset* colmap = NULL; + + foreach (cell, vars) + { + Node* node = (Node *) lfirst(cell); + if (IsA(node, Var)) + { +Var *var = (Var *) node; +if (cols == NULL || var->varnoold == varno) +{ + varno = var->varnoold; + if (var->varattno > 0 && + !bms_is_member(var->varattno, colmap) && + varno >= 1 && + varno <= list_length(es->rtable) && + list_length(cols) < STATS_MAX_DIMENSIONS) + { + RangeTblEntry *rte = rt_fetch(varno, es->rtable); + if (rte->rtekind == RTE_RELATION) + { + ColumnRef *col = makeNode(ColumnRef); + char *colname = get_rte_attribute_name(rte, var->varattno); + col->fields = list_make1(makeString(colname)); + cols = lappend(cols, col); + colmap = bms_add_member(col
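To make the intended workflow concrete, here is a rough sketch in SQL. The GUC names come from the patch excerpt above; the threshold value, the statistics name, and the choice of statistics kinds are my own assumptions for illustration, not something the patch excerpt specifies.

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.add_statistics_threshold = 10.0;  -- illustrative actual/estimated #rows ratio
SET auto_explain.suggest_only = on;                -- log a suggested statement instead of creating it

-- The kind of statement the module would suggest (or create, with suggest_only = off)
-- for the correlated columns in the nested-loop example above:
CREATE STATISTICS inner_tab_x_y_stat (ndistinct, dependencies) ON x, y FROM inner_tab;
ANALYZE inner_tab;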
Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
Checking if anybody is working on either of these https://www.postgresql.org/message-id/20191013025145.GC4475%40telsasoft.com https://www.postgresql.org/message-id/20191015164047.GA22729%40telsasoft.com On Sat, Oct 12, 2019 at 09:51:45PM -0500, Justin Pryzby wrote: > I ran into this while trying to trigger the previously-reported segfault. > > CREATE TABLE t(i) AS SELECT * FROM generate_series(1,9); > CREATE INDEX ON t(i); > > [pryzbyj@database ~]$ for i in `seq 1 9`; do > PGOPTIONS='-cstatement_timeout=9' psql postgres --host /tmp --port 5678 -c > "REINDEX INDEX CONCURRENTLY t_i_idx" ; done > ERROR: canceling statement due to statement timeout > ERROR: lock ShareUpdateExclusiveLock on object 14185/47287/0 is already held > [...] > > Variations on this seem to leave the locks table (?) or something else in a > Real Bad state, such that I cannot truncate the table or drop it; or at least > commands are unreasonably delayed for minutes, on this otherwise-empty test > cluster. On Tue, Oct 15, 2019 at 11:40:47AM -0500, Justin Pryzby wrote: > On a badly-overloaded VM, we hit the previously-reported segfault in progress > reporting. This left around some *ccold indices. I tried to drop them but: > > sentinel=# DROP INDEX child.alarms_null_alarm_id_idx1_ccold; -- > child.alarms_null_alarm_time_idx_ccold; -- alarms_null_alarm_id_idx_ccold; > ERROR: could not find tuple for parent of relation 41351896 > > Those are children of relkind=I index on relkind=p table. > > postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i); > postgres=# CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (1)TO(100); > postgres=# INSERT INTO t1 SELECT 1 FROM generate_series(1,9); > postgres=# CREATE INDEX ON t(i); > > postgres=# begin; SELECT * FROM t; -- DO THIS IN ANOTHER SESSION > > postgres=# REINDEX INDEX CONCURRENTLY t1_i_idx; -- cancel this one > ^CCancel request sent > ERROR: canceling statement due to user request > > postgres=# \d t1 > ... > "t1_i_idx" btree (i) > "t1_i_idx_ccold" btree (i) INVALID > > postgres=# SELECT inhrelid::regclass FROM pg_inherits WHERE > inhparent='t_i_idx'::regclass; > inhrelid > t1_i_idx > (1 row) > > Not only can't I DROP the _ccold indexes, but also dropping the table doesn't > cause them to be dropped, and then I can't even slash dee them anymore: > > jtp=# DROP INDEX t1_i_idx_ccold; > ERROR: could not find tuple for parent of relation 290818869 > > jtp=# DROP TABLE t; -- does not fail, but .. > > jtp=# \d t1_i_idx_ccold > ERROR: cache lookup failed for relation 290818865 > > jtp=# SELECT indrelid::regclass, * FROM pg_index WHERE > indexrelid='t1_i_idx_ccold'::regclass; > indrelid | 290818865 > indexrelid | 290818869 > indrelid | 290818865 > [...]
Re: jsonb_set() strictness considered harmful to data
On Fri, Oct 18, 2019, at 12:37, Ariadne Conill wrote: > Hello, > > I am one of the primary maintainers of Pleroma, a federated social > networking application written in Elixir, which uses PostgreSQL in > ways that may be considered outside the typical usage scenarios for > PostgreSQL. > > Namely, we leverage JSONB heavily as a backing store for JSON-LD > documents[1]. We also use JSONB in combination with Ecto's "embedded > structs" to store things like user preferences. > > The fact that we can use JSONB to achieve our design goals is a > testament to the flexibility PostgreSQL has. > > However, in the process of doing so, we have discovered a serious flaw > in the way jsonb_set() functions, but upon reading through this > mailing list, we have discovered that this flaw appears to be an > intentional design.[2] > > A few times now, we have written migrations that do things like copy > keys in a JSONB object to a new key, to rename them. These migrations > look like so: > >update users set info=jsonb_set(info, '{bar}', info->'foo'); > > Typically, this works nicely, except for cases where evaluating > info->'foo' results in an SQL null being returned. When that happens, > jsonb_set() returns an SQL null, which then results in data loss.[3] > > This is not acceptable. PostgreSQL is a database that is renowned for > data integrity, but here it is wiping out data when it encounters a > failure case. The way jsonb_set() should fail in this case is to > simply return the original input: it should NEVER return SQL null. > > But hey, we've been burned by this so many times now that we'd like to > donate a useful function to the commons, consider it a mollyguard for > the real jsonb_set() function. > > create or replace function safe_jsonb_set(target jsonb, path > text[], new_value jsonb, create_missing boolean default true) returns > jsonb as $$ > declare > result jsonb; > begin > result := jsonb_set(target, path, coalesce(new_value, > 'null'::jsonb), create_missing); > if result is NULL then > return target; > else > return result; > end if; > end; > $$ language plpgsql; > > This safe_jsonb_set() wrapper should not be necessary. PostgreSQL's > own jsonb_set() should have this safety feature built in. Without it, > using jsonb_set() is like playing russian roulette with your data, > which is not a reasonable expectation for a database renowned for its > commitment to data integrity. > > Please fix this bug so that we do not have to hack around this bug. > It has probably ruined countless people's days so far. I don't want > to hear about how the function is strict, I'm aware it is strict, and > that strictness is harmful. Please fix the function so that it is > actually safe to use. > > [1]: JSON-LD stands for JSON Linked Data. Pleroma has an "internal > representation" that shares similar qualities to JSON-LD, so I use > JSON-LD here as a simplification. > > [2]: > https://www.postgresql.org/message-id/flat/qfkua9$2q0e$1...@blaine.gmane.org > > [3]: https://git.pleroma.social/pleroma/pleroma/issues/1324 is an > example of data loss induced by this issue. > > Ariadne > This should be directed towards the hackers list, too. What will it take to change the semantics of jsonb_set()? MySQL implements safe behavior here. It's a real shame Postgres does not. I'll offer a $200 bounty to whoever fixes it. I'm sure it's destroyed more than $200 worth of data and people's time by now, but it's something. Kind regards, -- Mark Felder ports-secteam & portmgr alumni f...@freebsd.org
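For anyone who wants to see the failure mode directly, here is a minimal, self-contained illustration of the behaviour described in this thread; the table and values are throwaway and purely hypothetical.

create table users_demo (id int primary key, info jsonb);
insert into users_demo values (1, '{"foo": 1}'), (2, '{}');

-- jsonb_set() is STRICT: for row 2, info->'foo' is SQL NULL, so the whole
-- expression evaluates to NULL and that row's info column is wiped out.
update users_demo set info = jsonb_set(info, '{bar}', info->'foo');
select id, info from users_demo order by id;  -- row 2 now has info = NULL

-- The COALESCE guard used by safe_jsonb_set() above keeps the document intact instead:
--   jsonb_set(info, '{bar}', coalesce(info->'foo', 'null'::jsonb))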
Backport "WITH ... AS MATERIALIZED" syntax to <12?
I've been struggling with how we're going to upgrade launchpad.net to PostgreSQL 12 or newer (we're currently on 10). We're one of those applications that deliberately uses CTEs as optimization fences in a few difficult places. The provision of the MATERIALIZED keyword in 12 is great, but the fact that it doesn't exist in earlier versions is awkward. We certainly don't want to upgrade our application code at the same time as upgrading the database, and dealing with performance degradation between the database upgrade and the application upgrade doesn't seem great either (not to mention that it would be hard to coordinate). That leaves us with hacking our query compiler to add the MATERIALIZED keyword only if it's running on 12 or newer, which would be possible but is pretty cumbersome. However, an alternative would be to backport the new syntax to some earlier versions. "WITH ... AS MATERIALIZED" can easily just be synonymous with "WITH ... AS" in versions prior to 12; there's no need to support "NOT MATERIALIZED" since that's explicitly requesting the new query-folding feature that only exists in 12. Would something like the attached patch against REL_11_STABLE be acceptable? I'd like to backpatch it at least as far as PostgreSQL 10. This compiles and passes regression tests. Thanks, -- Colin Watson[cjwat...@canonical.com] >From 063186eb678ad9831961d6319f7a4279f1029358 Mon Sep 17 00:00:00 2001 From: Colin Watson Date: Fri, 18 Oct 2019 14:08:11 +0100 Subject: [PATCH] Backport "WITH ... AS MATERIALIZED" syntax Applications that deliberately use CTEs as optimization fences need to adjust their code to prepare for PostgreSQL 12. Unfortunately, the MATERIALIZED keyword that they need to add isn't valid syntax in earlier versions of PostgreSQL, so they're stuck with either upgrading the application and the database simultaneously, accepting performance degradation between the two parts of the upgrade, or doing complex query compiler work to add MATERIALIZED conditionally. It makes things much easier in these cases if the MATERIALIZED keyword is accepted and ignored in earlier releases. Users can then upgrade to a suitable point release, change their application code to add MATERIALIZED, and then upgrade to PostgreSQL 12. --- doc/src/sgml/queries.sgml | 12 ++ doc/src/sgml/ref/select.sgml| 18 +- src/backend/parser/gram.y | 12 +++--- src/test/regress/expected/subselect.out | 31 + src/test/regress/sql/subselect.sql | 14 +++ 5 files changed, 83 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index 88bc189646..cc33d92133 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -2215,6 +2215,18 @@ SELECT n FROM t LIMIT 100; rows.) + + In some cases, PostgreSQL 12 folds + WITH queries into the parent query, allowing joint + optimization of the two query levels. You can override that decision by + specifying MATERIALIZED to force separate calculation + of the WITH query. While versions of + PostgreSQL before 12 do not support folding of + WITH queries, specifying + MATERIALIZED is permitted to ease application + upgrades. + + The examples above only show WITH being used with SELECT, but it can be attached in the same way to diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 4db8142afa..1bd711a3cb 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -72,7 +72,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionand with_query is: -with_query_name [ ( column_name [, ...] 
) ] AS ( select | values | insert | update | delete ) +with_query_name [ ( column_name [, ...] ) ] AS [ MATERIALIZED ] ( select | values | insert | update | delete ) TABLE [ ONLY ] table_name [ * ] @@ -290,6 +290,17 @@ TABLE [ ONLY ] table_name [ * ] row, the results are unspecified. + +PostgreSQL 12 folds side-effect-free +WITH queries into the primary query in some cases. +To override this and retain the behaviour up to +PostgreSQL 11, mark the +WITH query as MATERIALIZED. That +might be useful, for example, if the WITH query is +being used as an optimization fence to prevent the planner from choosing +a bad plan. + + See for additional information. @@ -2087,6 +2098,11 @@ SELECT distributors.* WHERE distributors.name = 'Westward'; ROWS FROM( ... ) is an extension of the SQL standard. + + +The MATERIALIZED option of WITH is +an extension of the SQL standard. + diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index bc65319c2c..70df09f409 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -479,7 +479,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
Missing error_context_stack = NULL in AutoVacWorkerMain()
I am not sure whether this causes any actual problems, but for consistency it seems we are missing the change below. Every other place in the code where sigsetjmp() is used for top-level error handling sets error_context_stack to NULL.

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 073f313337..b06d0ad058 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1558,6 +1558,9 @@ AutoVacWorkerMain(int argc, char *argv[])
 	 */
 	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
 	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
 		/* Prevents interrupts while cleaning up */
 		HOLD_INTERRUPTS();

This was spotted by Paul during code inspection.
Re: v12.0: segfault in reindex CONCURRENTLY
On Fri, Oct 18, 2019 at 07:30:37AM -0300, Alvaro Herrera wrote: > Sure thing, thanks, done :-) Thanks, Alvaro. -- Michael signature.asc Description: PGP signature
Re: Remaining calls of heap_close/heap_open in the tree
On Fri, Oct 18, 2019 at 10:03:11AM +0900, Michael Paquier wrote: > Not sure that's worth the trouble. If there are no objections, I will > remove the compatibility macros. Okay, cleanup done with the compatibility macros removed. -- Michael signature.asc Description: PGP signature
Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch
On Thu, Oct 17, 2019 at 02:35:13PM +0900, Michael Paquier wrote: > ArchiveRecoveryRequested will be set to true if recovery.signal or > standby.signal are found, so it seems to me that you can make all > those checks more simple by removing from the equation > StandbyModeRequested, no? StandbyModeRequested is never set to true > if ArchiveRecoveryRequested is not itself true. For the sake of the archives, this has been applied by Fujii-san as of ec1259e8. -- Michael signature.asc Description: PGP signature
Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: > However, an alternative would be to backport the new syntax to some > earlier versions. "WITH ... AS MATERIALIZED" can easily just be > synonymous with "WITH ... AS" in versions prior to 12; there's no need > to support "NOT MATERIALIZED" since that's explicitly requesting the new > query-folding feature that only exists in 12. Would something like the > attached patch against REL_11_STABLE be acceptable? I'd like to > backpatch it at least as far as PostgreSQL 10. I am afraid that new features don't gain a backpatch. This is a project policy. Back-branches should just include bug fixes. -- Michael signature.asc Description: PGP signature
Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held
On Fri, Oct 18, 2019 at 01:26:27PM -0500, Justin Pryzby wrote: > Checking if anybody is working on either of these > https://www.postgresql.org/message-id/20191013025145.GC4475%40telsasoft.com > https://www.postgresql.org/message-id/20191015164047.GA22729%40telsasoft.com FWIW, I have spent an hour or two poking at these issues over the last couple of days, so I am not ignoring them; not as much time as I would have liked, but still. My initial lookup leads me to think that something is going wrong with the cleanup of the session-level lock on the parent table taken in the first transaction doing the REINDEX CONCURRENTLY. -- Michael signature.asc Description: PGP signature
Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?
> "Michael" == Michael Paquier writes: > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: >> However, an alternative would be to backport the new syntax to some >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be >> synonymous with "WITH ... AS" in versions prior to 12; there's no >> need to support "NOT MATERIALIZED" since that's explicitly >> requesting the new query-folding feature that only exists in 12. >> Would something like the attached patch against REL_11_STABLE be >> acceptable? I'd like to backpatch it at least as far as PostgreSQL >> 10. Michael> I am afraid that new features don't gain a backpatch. This is Michael> a project policy. Back-branches should just include bug fixes. I do think an argument can be made for making an exception in this particular case. This wouldn't be backpatching a feature, just accepting and ignoring some of the new syntax to make upgrading easier. -- Andrew (irc:RhodiumToad)