Re: [HACKERS] New Event Trigger: table_rewrite
On 16 November 2014 06:59, Michael Paquier wrote:
> Note that this patch has been submitted but there has been no real
> discussion around it. This seems a bit too fast to commit it, no?

Committing uncontentious patches at the end of the commitfest seems
normal, no?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New Event Trigger: table_rewrite
On 16 November 2014 06:59, Michael Paquier wrote:

> 1) This patch is authorizing VACUUM and CLUSTER to use the event
> triggers ddl_command_start and ddl_command_end, but aren't those
> commands actually not DDLs but control commands?

I could go either way on that. I'm happy to remove those from this commit.

> 2) The documentation of src/sgml/event-trigger.sgml can be improved,
> particularly I think that the example function should use a maximum of
> upper-case letters for reserved keywords, and also this bit:
>     you're not allowed to rewrite the table foo
> should be rewritten to something like:
>     Rewrite of table foo not allowed
> 3) A typo, missing a plural:
>     provides two built-in event trigger helper functionS

I thought the documentation was very good, in comparison to most other
feature submissions. Given that this is one of the areas I moan about a
lot, that says something.

> 4) pg_event_trigger_table_rewrite_oid is able to return only one OID,
> which is the one of the table being rewritten, and it is limited to
> one OID because VACUUM, CLUSTER and ALTER TABLE can only run on one
> object at the same time in a single transaction. What about thinking
> that we may have in the future multiple objects rewritten in a single
> transaction, hence multiple OIDs could be fetched?

Why would this API support something which the normal trigger API
doesn't, just in case we support a feature that has never been proposed
or discussed? Why can't such a change wait until that feature arrives?

> 5) parsetree is passed to cluster_rel only for
> EventTriggerTableRewrite. I am not sure if there are any extensions
> using cluster_rel as is, but wouldn't it make more sense to call
> EventTriggerTableRewrite before the calls to cluster_rel instead? ISTM
> that this patch is breaking cluster_rel's way of doing things.

I will remove the calls for CLUSTER and VACUUM as proposed above.

> 6) in_table_rewrite seems unnecessary.
> typedef struct EventTriggerQueryState
> {
>     slist_head  SQLDropList;
>     bool        in_sql_drop;
> +   bool        in_table_rewrite;
> +   Oid         tableOid;
> We could simplify that by renaming tableOid to rewriteTableOid or
> rewriteObjOid and checking if its value is InvalidOid to determine if
> the event table_rewrite is in use or not. Each code path setting those
> variables sets them all the time similarly:
> +   state->in_table_rewrite = false;
> +   state->tableOid = InvalidOid;
> And if tableOid is InvalidOid, in_table_rewrite is false. If it is a
> valid Oid, in_table_rewrite is set to true.

Well, that seems a minor change. I'm happy to accept the original
coding, but also happy to receive suggested changes.

> 7) table_rewrite is kicked in ALTER TABLE only when ATRewriteTables is
> used. The list of commands that actually go through this code path
> should be clarified in the documentation IMO to help the user
> understand this function.

That is somewhat orthogonal to the patch. The rules for rewriting are
quite complex, which is why this is needed and why documentation isn't
really the answer. A separate doc patch on that would be welcome.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of Refactoring code for sync node detection
On 31 October 2014 00:27, Michael Paquier wrote:
> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby wrote:
>>
>> If we stick with this version I'd argue it makes more sense to just stick
>> the sync_node = and priority = statements into the if block and ditch the
>> continue.
>
> Let's go with the cleaner version then, I'd prefer code that can be read
> easily for this code path. Switching back is not much complicated either.

I like where you are headed with this feature set. I will do my best to
comment and review so we get something in 9.5.

Some review comments

* The new function calls them a Node rather than a Standby. That is a
good change, but it needs to be applied consistently, so we no longer
use the word Standby anywhere near the sync rep code. I'd prefer if we
continue to use the abbreviation sync for synchronous, since it's less
typing and avoids spelling mistakes in code and comments.

* Rationale...

Robert Haas wrote
> Interestingly, the syncrep code has in some of its code paths the idea
> that a synchronous node is unique, while other code paths assume that
> there can be multiple synchronous nodes. If that's fine I think that
> it would be better to just make the code multiple-sync-node aware, by
> having a single function call in walsender.c and syncrep.c that
> returns an integer array of WAL sender positions (WalSndCtl), as that
> seems more extensible long-term.

I thought this was the rationale for the patch, but it doesn't do this.
If you disagree with Robert, it would be best to say so now, since I
would guess he's expecting the patch to be doing as he asked. Or
perhaps I misunderstand.

* Multiple sync standbys were originally avoided because of the code
cost and complexity at ReleaseWaiters(). I'm very keen to keep the code
at that point very robust, so simplicity is essential. It would also be
useful to have some kind of micro-benchmark that allows us to
understand the overhead of various configs. Jim mentions he isn't sure
of the performance there.
I don't see too much of a problem with this patch, but the later
patches will require this, so we may as well do this soon.

* We probably don't need a comment like /* Cleanup */ inserted above a
pfree. ;-)

I see this should get committed in the next few days, even outside the CF.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of Refactoring code for sync node detection
Thanks for the comments!

On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs wrote:
> On 31 October 2014 00:27, Michael Paquier wrote:
>> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby wrote:
>>>
>>> If we stick with this version I'd argue it makes more sense to just stick
>>> the sync_node = and priority = statements into the if block and ditch the
>>> continue.
>>
>> Let's go with the cleaner version then, I'd prefer code that can be read
>> easily for this code path. Switching back is not much complicated either.
>
> I like where you are headed with this feature set. I will do my best
> to comment and review so we get something in 9.5.
>
> Some review comments
>
> * The new function calls them a Node rather than a Standby. That is a
> good change, but it needs to be applied consistently, so we no longer
> use the word Standby anywhere near the sync rep code. I'd prefer if we
> continue to use the abbreviation sync for synchronous, since it's less
> typing and avoids spelling mistakes in code and comments.

Yes, I guess this makes sense.

> * Rationale...
> Robert Haas wrote
>> Interestingly, the syncrep code has in some of its code paths the idea
>> that a synchronous node is unique, while other code paths assume that
>> there can be multiple synchronous nodes. If that's fine I think that
>> it would be better to just make the code multiple-sync-node aware, by
>> having a single function call in walsender.c and syncrep.c that
>> returns an integer array of WAL sender positions (WalSndCtl), as that
>> seems more extensible long-term.
>
> I thought this was the rationale for the patch, but it doesn't do
> this. If you disagree with Robert, it would be best to say so now,
> since I would guess he's expecting the patch to be doing as he asked.
> Or perhaps I misunderstand.
This comment is originally from me :)
http://www.postgresql.org/message-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6vrqo5n_t...@mail.gmail.com

Changing from my older opinion, I wrote the patch on this thread with
the idea in mind of keeping the code as simple as possible, so for now
it is better to still assume that we have a single sync node. Let's
work on the multiple-node thing once we have a better spec of how to do
it, visibly using a dedicated micro-language within s_s_names.

> * Multiple sync standbys were originally avoided because of the code
> cost and complexity at ReleaseWaiters(). I'm very keen to keep the
> code at that point very robust, so simplicity is essential. It would
> also be useful to have some kind of micro-benchmark that allows us to
> understand the overhead of various configs. Jim mentions he isn't sure
> of the performance there. I don't see too much of a problem with this
> patch, but the later patches will require this, so we may as well do
> this soon.

Yes, I have been playing with that with the patch series of the
previous commitfest, with something not that much kludgy.

> * We probably don't need a comment like /* Cleanup */ inserted above a
> pfree. ;-)

Sure. I'll send an updated patch tomorrow.

Regards,
--
Michael
Re: [HACKERS] Failback to old master
Hi,

On Sat, Nov 15, 2014 at 5:31 PM, Maeldron T. wrote:
>> A safely shut down master (-m fast is safe) can be safely restarted as
>> a slave to the newly promoted master. Fast shutdown shuts down all
>> normal connections, does a shutdown checkpoint and then waits for this
>> checkpoint to be replicated to all active streaming clients. Promoting a
>> slave to master creates a timeline switch, which prior to version 9.3
>> was only possible to replicate using the archive mechanism. As of
>> version 9.3 you don't need to configure archiving to follow timeline
>> switches; just add a recovery.conf to the old master to start it up as
>> a slave and it will fetch everything it needs from the new master.
>
> I took your advice and I understood that removing the recovery.conf followed
> by a restart is wrong. I will not do that on my production servers.
>
> However, I can't make it work with promotion. What did I do wrong? It was
> 9.4beta3.
>
> mkdir 1
> mkdir 2
> initdb -D 1/
> max_wal_senders=7, wal_keep_segments=100, uncomment replication in hba.conf>
> pg_ctl -D 1/ start
> createdb -p 5433
> psql -p 5433
> pg_basebackup -p 5433 -R -D 2/
> mcedit 2/postgresql.conf
> chmod -R 700 1
> chmod -R 700 2
> pg_ctl -D 2/ start
> psql -p 5433
> psql -p 5434
>
> pg_ctl -D 1/ stop
> pg_ctl -D 2/ promote
> psql -p 5434
> cp 2/recovery.done 1/recovery.conf
> mcedit 1/recovery.conf
> pg_ctl -D 1/ start
>
> LOG:  replication terminated by primary server
> DETAIL:  End of WAL reached on timeline 1 at 0/3000AE0.
> LOG:  restarted WAL streaming at 0/300 on timeline 1
> LOG:  replication terminated by primary server
> DETAIL:  End of WAL reached on timeline 1 at 0/3000AE0.
>
> This is what I experienced in the past when I tried with promote. The old
> master disconnects from the new. What am I missing?

I think you have to add recovery_target_timeline = '2' in
recovery.conf, with '2' being the new primary timeline.
cf http://www.postgresql.org/docs/9.4/static/recovery-target-settings.html

Didier
Re: [HACKERS] alternative model for handling locking in parallel groups
On Fri, Nov 14, 2014 at 12:03 PM, Andres Freund wrote:
> Note that you'd definitely not want to do this naively - currently
> there's baked-in assumptions in the vacuum code that only one backend
> is doing parts of it.
>
> I think there's

Did something you intended get left out here?

>> 4. Parallel query on a locked relation. Parallel query should work on
>> a table created in the current transaction, or one explicitly locked
>> by user action. It's not acceptable for that to just randomly
>> deadlock, and skipping parallelism altogether, while it'd probably be
>> acceptable for a first version, is not going to be a good long-term
>> solution.
>
> FWIW, I think it's perfectly acceptable to refuse to work in parallel in
> that scenario. And not just for now.

I don't agree with that, but my point is that I think that fixing it so
it works is probably no more work than detecting that it isn't going to
work, whether the specific proposal in this email pans out or not.

> The biggest argument I can see to that is parallel index creation.
>
>> It also sounds buggy and fragile for the query planner to
>> try to guess whether the lock requests in the parallel workers will
>> succeed or fail when issued.
>
> I don't know. Checking whether we hold a self-exclusive lock on that
> relation doesn't sound very problematic to me.

That seems like a gross oversimplification of what we need to check.
For example, suppose we want to do a parallel scan. We grab
AccessShareLock. Now, another backend waits for AccessExclusiveLock. We
start workers, who all try to get AccessShareLock. They wait behind the
AccessExclusiveLock, while, outside the view of the current lock
manager, we wait for them. No process in our group acquired more than
AccessShareLock, yet we've got problems. We need a simple and elegant
way to avoid this kind of situation.

>> After thinking about these cases for a bit, I came up with a new
>> possible approach to this problem.
>> Suppose that, at the beginning of
>> parallelism, when we decide to start up workers, we grant all of the
>> locks already held by the master to each worker (ignoring the normal
>> rules for lock conflicts). Thereafter, we do everything the same as
>> now, with no changes to the deadlock detector. That allows the lock
>> conflicts to happen normally in the first two cases above, while
>> preventing the unwanted lock conflicts in the second two cases.
>
> I don't think that's safe enough. There's e.g. a reason why ANALYZE
> requires SUE lock. It'd definitely not be safe to simply grant the
> worker a SUE lock, just because the master backend already analyzed it
> or something like that. You could end up with the master and worker
> backends ANALYZEing the same relation.

Well, in the first version of this, I expect to prohibit parallel
workers from doing any DML or DDL whatsoever - they will be strictly
read-only. So you won't have that problem. Now, eventually, we might
relax that, but I would expect that a prohibition on the workers
starting new utility commands while in parallel mode wouldn't be very
high on anyone's list of restrictions to relax.

> That said, I can definitely see use for an infrastructure where we
> explicitly can grant another backend a lock that'd conflict with one
> we're already holding. But I think it'll need to be more explicit than
> just granting all currently held locks at the "highest" current
> level. And I think it's not necessarily going to be granting them all
> the locks at their current levels.
>
> What I'm thinking of is basically adding a step to execMain.c:InitPlan()
> that goes through the locks and grants the child process all the locks
> that are required for the statement to run. But not possibly preexisting
> higher locks.

This doesn't actually solve the problem, because we can be incidentally
holding locks on other relations - system catalogs, in particular -
that will cause the child processes to fail.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Failback to old master
On 16/11/14 13:13, didier wrote:
> I think you have to add recovery_target_timeline = '2' in
> recovery.conf, with '2' being the new primary timeline.
> cf http://www.postgresql.org/docs/9.4/static/recovery-target-settings.html

Thank you. Based on the link I have added:

recovery_target_timeline = 'latest'

And now it works.
Re: [HACKERS] Index scan optimization
On 30 October 2014 08:43, Haribabu Kommi wrote:
> I marked the patch as ready for committer.

This looks very interesting. The value of the patch seems to come from
skipping costly checks. Your performance test has a leading VARCHAR
column, so in that specific case skipping the leading column is a big
win. I don't see that it would be in all cases.

Can we see a few tests on similar things to check for other
opportunities and regressions?

* Single-column bigint index
* Multi-column bigint index
* 5-column index with mixed text and integers

The explanatory comments need some work to more clearly explain what
this patch does.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Order of views in stats docs
On Sun, Nov 9, 2014 at 3:46 PM, Magnus Hagander wrote:
> On Thu, Nov 6, 2014 at 3:01 PM, Peter Eisentraut wrote:
>> On 11/6/14 6:16 AM, Magnus Hagander wrote:
>>> Another thought I had in that case is maybe we need to break out the
>>> pg_stat_activity and pg_stat_replication views into their own table.
>>> They are really the only two views that are different in a lot of
>>> ways. Maybe call the splits "session statistics views" and "object
>>> statistics views", instead of just "standard statistics views"? The
>>> difference being that they show snapshots of *right now*, whereas all
>>> the other views show accumulated statistics over time.
>>
>> Yeah, splitting this up a bit would help, too.
>
> Here's an initial run of this. It still feels wrong that we have the
> dynamic views under "the statistics collector". Perhaps long-term we
> should look at actually splitting them out to a completely different
> sect1?
>
> I only fixed the obvious parts in this - the split out, and the moving
> of pg_stat_database_conflicts. But it's a first step at least.
>
> Objections?

Hearing no objections, I've pushed this. There's more to do, but it's
a start.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
* Alvaro Herrera wrote:
> Michael Paquier wrote:
>> Btw, perhaps this diff should be pushed as a different patch as this
>> is a rather different thing:
>> -   if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>> +   if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>>         !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM)
>
> As this is a correctness fix, it does not seem necessary to back-patch
> it: the parent relation always has the same persistence as its indexes.
> There was an argument for doing it this way that only applies if this
> patch went in, but I can't remember now what it was.
>
> Anyway I pushed the patch after some slight additional changes. Thanks!

The buildfarm says -DCLOBBER_CACHE_ALWAYS does not like this patch.

--
Christian
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/13/2014 02:44 PM, Andres Freund wrote:
> Hi Steve,
>
> If it still happens, could you send me instructions of how to reproduce
> the problem after cloning the necessary source repositories? It's quite
> hard to validate a possible fix otherwise.

1. Install PG 9.4

2. Perform an initdb

max_connections = 200
wal_level = logical
max_wal_senders = 50
wal_keep_segments = 100
wal_sender_timeout = 600s
max_replication_slots = 120

Create a user slon with superuser.
Create a user test.
(Set passwords for them; you'll use them later.)
Configure pg_hba.conf to allow slon to connect as a replication user.

3. Download slony from github (https://github.com/ssinger/slony1-engine.git)
   checkout the branch logical_remote_helper
   ./configure --with-pgconfigdir=/path_to_your_pgcbindir
   make
   make install

4. Download & compile clustertest from
   (https://github.com/clustertest/clustertest-framework).

5. In the slony source tree, cd to the clustertest directory.

6. cp conf/disorder.properties.sample conf/disorder.properties
   Edit the file to have the correct paths, ports, users, passwords.

7. cp conf/java.properties.sample conf/java.properties
   Edit the file to point at the JAR you downloaded earlier.

I run with all 5 databases on the same PG cluster. Performance will be
much better with 5 different clusters, but I recommend trying to
emulate my configuration as much as possible to replicate this.

To run the tests, then do
./run_all_disorder_tests.sh

At the moment (commit df5acfd1a3) it is configured to just run the
Failover node test cases where I am seeing this, not the entire suite.
It typically takes between 30 minutes and an hour to run, though.

I installed things following the above steps on a different system than
my usual development laptop and I have been unable to reproduce the
error so far (on that system). But I am still able to reproduce it on
occasion on my normal development laptop.
> Greetings,
>
> Andres Freund
Re: [HACKERS] printing table in asciidoc with psql
Hi

2014-11-07 22:37 GMT+01:00 Alvaro Herrera:
> I did \o /tmp/tst, then
> \dS
> create table "eh | oh" ();
> \dS
>
> and then filtered the output file to HTML. The CREATE TABLE tag ended
> up in the same line as the title of the following table. I think
> there's a newline missing somewhere.
>
> The good news is that the | in the table name was processed correctly;
> but I noticed that the table title is not using the escaped print, so I
> would imagine that if I put a | in the title, things would go wrong.
> (I don't know how to put titles on tables other than editing the
> hardcoded titles for \ commands in psql).
>
> Another thing is that spaces around the | seem gratuitous and produce
> bad results. I tried "select * from pg_class" which results in a very
> wide table, and then the HTML output contains some cells with the values
> in the second line; this makes all rows taller than they must be,
> because some cells use the first line and other cells in the same row
> use the second line for the text. I hand-edited the asciidoc and
> removed the spaces around | which makes the result nicer. (Maybe
> removing the trailing space is enough.)

I see trailing spaces, but I don't see the described effect. Please,
can you send some more specific test case?

I fixed the status view and removed trailing spaces.

Regards

Pavel

> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

commit 76e033f3a287115f15f249c657b544c148e2efd2
Author: Pavel Stehule
Date:   Sun Nov 16 23:38:46 2014 +0100

    every table is in own paragraph

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index e7fcc73..cd64b88 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2092,8 +2092,8 @@ lo_import 152801
           aligned, wrapped,
           html, latex (uses tabular),
-          latex-longtable, or
-          troff-ms.
+          latex-longtable,
+          troff-ms, or asciidoc.
           Unique abbreviations are allowed.
           (That would mean one letter is enough.)
@@ -2120,7 +2120,8 @@ lo_import 152801
           The html,
           latex,
-          latex-longtable, and troff-ms
+          latex-longtable, troff-ms,
+          and asciidoc
           formats put out tables that are intended to be included in documents
           using the respective mark-up language. They are not complete
           documents! This might not be
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 26089352..e00e47b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2248,6 +2248,9 @@ _align2string(enum printFormat in)
 		case PRINT_TROFF_MS:
 			return "troff-ms";
 			break;
+		case PRINT_ASCIIDOC:
+			return "asciidoc";
+			break;
 	}
 	return "unknown";
 }
@@ -2321,9 +2324,11 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.format = PRINT_LATEX_LONGTABLE;
 		else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
 			popt->topt.format = PRINT_TROFF_MS;
+		else if (pg_strncasecmp("asciidoc", value, vallen) == 0)
+			popt->topt.format = PRINT_ASCIIDOC;
 		else
 		{
-			psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms\n");
+			psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms, asciidoc\n");
 			return false;
 		}
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 66d80b5..ea8b9c1 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -858,6 +858,10 @@ PrintQueryStatus(PGresult *results)
 		html_escaped_print(PQcmdStatus(results), pset.queryFout);
 		fputs("\n", pset.queryFout);
 	}
+	else if (pset.popt.topt.format == PRINT_ASCIIDOC)
+	{
+		fprintf(pset.queryFout, "\n%s\n", PQcmdStatus(results));
+	}
 	else
 		fprintf(pset.queryFout, "%s\n", PQcmdStatus(results));
 }
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index ae5fe88..b14b313 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -351,7 +351,7 @@ helpVariables(unsigned short int pager)
 	fprintf(output, _("  expanded (or x)    toggle expanded output\n"));
 	fprintf(output, _("  fieldsep           field separator for unaligned output (default '|')\n"));
 	fprintf(output, _("  fieldsep_zero      set field separator in unaligned mode to zero\n"));
-	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, ..]\n"));
+	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, asciidoc ..]\n"));
 	fprintf(output, _("  footer             enable or disable display of the table footer [on, off]\n"));
 	fprintf(output, _("  linestyle          set the border line drawing style [ascii, old-ascii, unicode]\n"));
 	fprintf(output, _("  null               set the string to be printed in place of a null value\n"));
diff
Re: [HACKERS] PostgreSQL doesn't stop properly when --slot option is specified with pg_receivexlog.
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier wrote:
> On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund wrote:
>> On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
>>> On Fri, Nov 14, 2014 at 7:22 PM, wrote:
>>>> "pg_ctl stop" doesn't work properly, if --slot option is specified,
>>>> when WAL is flushed only when it has switched.
>>>> These processes still continue even after the postmaster
>>>> failed: pg_receivexlog, walsender and logger.
>>>
>>> I could reproduce this problem. At normal shutdown, walsender keeps
>>> waiting for the last WAL record to be replicated and flushed in
>>> pg_receivexlog. But pg_receivexlog issues the sync command only when
>>> the WAL file is switched. Thus, since pg_receivexlog may never flush
>>> the last WAL record, walsender may have to keep waiting infinitely.
>>
>> Right.
>
> It is surprising that nobody complained about that before,
> pg_receivexlog has been released two years ago.

It's not so surprising because the problem can happen only when a
replication slot is specified, i.e., the version is 9.4 or later.

>>> pg_recvlogical handles this problem by calling fsync() when it
>>> receives the request of immediate reply from the server. That is, at
>>> shutdown, walsender sends the request, pg_receivexlog receives it,
>>> flushes the last WAL record, and sends the flush location back to the
>>> server. Since walsender can see that the last WAL record is
>>> successfully flushed in pg_receivexlog, it can exit cleanly.
>>>
>>> One idea for the problem is to introduce the same logic as
>>> pg_recvlogical has, to pg_receivexlog. Thoughts?
>>
>> Sounds sane to me. Are you looking into doing that?
>
> Yep, sounds like a good thing to do if the master requested an answer
> from the client in the keepalive message. Something like the patch
> attached would make the deal.

Isn't it better to do this only when a replication slot is used?
Regards,

--
Fujii Masao
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 13 November 2014 15:35, Amit Kapila wrote:

> As mentioned by you offlist that you are not able to reproduce this
> issue, I have tried again and what I observe is that I am able to
> reproduce it only on *release* build and some cases work without
> this issue as well, for example:
>
> ./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres
> Generating minimal optimizer statistics (1 target)
> Generating medium optimizer statistics (10 targets)
> Generating default (full) optimizer statistics
>
> So to me, it looks like this is a timing issue, and please notice
> why in the error the statement looks like
> "ANALYZE ng minimal optimizer statistics (1 target)". I think this
> is not a valid statement.
>
> Let me know if you still could not reproduce it.

Thank you for looking into it once again. I have tried with the release
mode, but could not reproduce the same.

Looking at the server LOG sent by you, "ANALYZE ng minimal optimizer
statistics (1 target)" seems like some corruption. So actually it looks
like there are two issues here:

1. The query string sent to the server is getting corrupted.
2. The client is crashing.

I will review the code and try to find the cause; meanwhile, if you can
find some time to debug this, it will be really helpful.

Regards,
Dilip
Re: [HACKERS] PostgreSQL doesn't stop properly when --slot option is specified with pg_receivexlog.
On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao wrote: > On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier > wrote: >> Yep, sounds a good thing to do if master requested answer from the >> client in the keepalive message. Something like the patch attached >> would make the deal. > > Isn't it better to do this only when replication slot is used? Makes sense. What about a check using reportFlushPosition then? -- Michael diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c index c6c90fb..b426cfe 100644 --- a/src/bin/pg_basebackup/receivelog.c +++ b/src/bin/pg_basebackup/receivelog.c @@ -149,6 +149,34 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir, } /* + * Flush the current WAL file to disk and update the last WAL flush + * positions as well as the last fsync timestamp. On failure, print + * a message to stderr and return false, otherwise return true. + */ +static bool +fsync_walfile(XLogRecPtr blockpos, int64 timestamp) +{ + /* nothing to do if no WAL file open */ + if (walfile == -1) + return true; + + /* nothing to do if data has already been flushed */ + if (blockpos <= lastFlushPosition) + return true; + + if (fsync(walfile) != 0) + { + fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"), +progname, current_walfile_name, strerror(errno)); + return false; + } + + lastFlushPosition = blockpos; + last_fsync = timestamp; + return true; +} + +/* * Close the current WAL file (if open), and rename it to the correct * filename if it's complete. On failure, prints an error message to stderr * and returns false, otherwise returns true. @@ -787,21 +815,12 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, * If fsync_interval has elapsed since last WAL flush and we've written * some WAL data, flush them to disk. 
 	 */
-	if (lastFlushPosition < blockpos &&
-		walfile != -1 &&
-		((fsync_interval > 0 &&
-		  feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
-		 fsync_interval < 0))
+	if ((fsync_interval > 0 &&
+		 feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
+		fsync_interval < 0)
 	{
-		if (fsync(walfile) != 0)
-		{
-			fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
-					progname, current_walfile_name, strerror(errno));
+		if (!fsync_walfile(blockpos, now))
 			goto error;
-		}
-
-		lastFlushPosition = blockpos;
-		last_fsync = now;
 	}
 
 /*
@@ -999,7 +1018,6 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 {
 	int			pos;
 	bool		replyRequested;
-	int64		now;
 
 	/*
 	 * Parse the keepalive message, enclosed in the CopyData message.
@@ -1021,7 +1039,15 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 	/* If the server requested an immediate reply, send one. */
 	if (replyRequested && still_sending)
 	{
-		now = feGetCurrentTimestamp();
+		int64		now = feGetCurrentTimestamp();
+
+		/*
+		 * Flush data so that a fresh flush position is sent, but only if
+		 * a flush position needs to be reported.
+		 */
+		if (reportFlushPosition && !fsync_walfile(blockpos, now))
+			return false;
+
 		if (!sendFeedback(conn, blockpos, now, false))
 			return false;
 		*last_status = now;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Sat, Nov 8, 2014 at 1:26 AM, Alvaro Herrera wrote:
>
> I just pushed this, after some more minor tweaks. Thanks, and please do
> continue testing!

A few typos and a few questions:

1.
 * range. Need to an extra flag in mmtuples for that.
 */
Datum
brinbulkdelete(PG_FUNCTION_ARGS)

Doesn't the part of the comment referring to *mmtuples* require some change? I think mmtuples was used in the initial version of the patch.

2.
/* ---
 * mt_info is laid out in the following fashion:
 *
 * 7th (high) bit: has nulls
 * 6th bit: is placeholder tuple
 * 5th bit: unused
 * 4-0 bit: offset of data
 * ---
 */
uint8 bt_info;
} BrinTuple;

Here in the comment, bt_info is referred to as mt_info.

3.
/*
 * t_info manipulation macros
 */
#define BRIN_OFFSET_MASK 0x1F

I think in the above comment it should be bt_info rather than t_info.

4.
static void
revmap_physical_extend(BrinRevmap *revmap)
{
..
..
	START_CRIT_SECTION();
	/* the rm_tids array is initialized to all invalid by PageInit */
	brin_page_init(page, BRIN_PAGETYPE_REVMAP);
	MarkBufferDirty(buf);
	metadata->lastRevmapPage = mapBlk;
	MarkBufferDirty(revmap->rm_metaBuf);
..
}

Can't we update revmap->rm_lastRevmapPage along with metadata->lastRevmapPage?

5.
typedef struct BrinMemTuple
{
	bool bt_placeholder; /* this is a placeholder tuple */
	BlockNumber bt_blkno; /* heap blkno that the tuple is for */
	MemoryContext bt_context; /* memcxt holding the dt_column values */
..
}

How is this memory context getting used? I can see that it is used in brin_deform_tuple(), which gets called from three places in core code (bringetbitmap(), brininsert() and union_tuples()), and in all three places there is already another temporary memory context used to avoid any form of memory leak.

6. Is there any way to force BRIN index use off, like SET enable_indexscan = off? If not, do we need one, as it is present for other types of scans?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Review of Refactoring code for sync node detection
On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier wrote:
> I'll send an updated patch tomorrow.

Here are updated versions. I divided the patch into two for clarity. The first one is the actual refactoring patch, renaming SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (plus related cleanup, like updating "synchronous" to "sync" in the comments as you mentioned), so that the namings have no conflicts. The second one updates the syncrep code, including WAL sender and WAL receiver, and its surroundings to use the term "node" instead of "standby". This brings into the code the idea that a node using the replication APIs is not necessarily a standby, making it more generic. It can be applied on top of the refactoring patch.

If any other folks (Heikki or Fujii-san?) have comments about this idea, feel free to chime in.

Regards,
--
Michael

From 51e9988ff44b7c2b716e3a0da3f1d1c9359a1d79 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH 1/2] Refactor code to detect synchronous node in WAL sender
 array

This patch is made to remove code duplication in walsender.c and
syncrep.c in order to detect what is the node with the lowest
strictly-positive priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 97 +++--
 src/backend/replication/walsender.c | 35 +++--
 src/include/replication/syncrep.h   |  1 +
 3 files changed, 78 insertions(+), 55 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..f838ad0 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of sync standby in the array referencing all the WAL
+ * senders, or -1 if no sync node can be found. The caller of this function
+ * should take a lock on SyncRepLock. If there are multiple nodes with
+ * the same lowest priority value, the first node found is selected.
+ * node_priority is a preallocated array of size max_wal_senders that can
+ * be used to retrieve the priority of each WAL sender. Its inclusion in
+ * this function has the advantage of limiting the scan of the WAL sender
+ * array to one pass, limiting the time SyncRepLock is held.
+ */
+int
+SyncRepGetSynchronousStandby(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find the sync node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+				0 : walsnd->sync_standby_priority;
+
+		/* Proceed to next if not active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Proceed to next if not streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Proceed to next one if asynchronous */
+		if (walsnd->sync_standby_priority == 0)
+			continue;
+
+		/* Proceed to next one if priority conditions not satisfied */
+		if (priority != 0 &&
+			priority <= walsnd->sync_standby_priority)
+			continue;
+
+		/* Proceed to next one if flush position is invalid */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential sync candidate, choose it as the sync
+		 * node for the time being before going through the other nodes
+		 * listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
+
+	return sync_node;
+}
+
 /*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +433,9 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
+	int		sync_node;
 	int		numwrite = 0;
 	int		numflush = 0;
-	int		priority = 0;
-	int		i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +451,14 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we use the first mentioned standby.
 	 */
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	sync_node = SyncRepGetSynchronousStandby(NULL);
 
-	for (i = 0; i < max_wal_senders; i++)
-	{
-		/* use volatile pointer to prevent code rearrangement */
-		volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
-		if (walsnd->pid != 0 &&
-			walsnd->state == WALSNDSTATE_STREAMING &&
-			walsnd->sync_standby_priority > 0 &&
-			(priority == 0 ||
-			 priority > walsnd->sync_standby_priority) &&
-			!XLogRecPtrIsInvalid
Re: [HACKERS] Index scan optimization
On 16 November 2014 19:30, Simon Riggs wrote:

> I marked the patch as ready for committer.
>
> This looks very interesting.
>
> The value of the patch seems to come from skipping costly checks. Your
> performance test has a leading VARCHAR column, so in that specific case
> skipping the leading column is a big win. I don't see it would be in
> all cases.

Yes, you are right. The best improvement will be for cases where the leading column comparison is very costly (e.g. VARCHAR).

> Can we see a few tests on similar things to check for other
> opportunities and regressions.
> * Single column bigint index

It gives around 5% improvement.

> * Multi-column bigint index

It gives around 5% improvement.

> * 5 column index with mixed text and integers

It gives around 15% improvement.

As we can see, in the case of bigint the improvement is not as high as for VARCHAR, because comparison of the bigint data type is not costly. So bigint is close to a worst-case scenario for this performance test, and I think even a 5% improvement in the worst case, with such small and safe code changes, should be OK, especially since for other scenarios (like a varchar index) the improvement is high (15-30%). Also, even for bigint (or any similar data type), the improvement can increase as the number of selected records increases.

The test case used for testing is attached. Please provide your opinion.

> The explanatory comments need some work to more clearly explain what
> this patch does.

Please help me understand this point: do you want me to add more explanation of the patch in this mail chain, or in the code?

Thanks and Regards,
Kumar Rajeev Rastogi

As per Simon Riggs' suggestion:

-- With single-column bigint index:

--Schema
create table tbl2(id1 int, id2 bigint, id3 int);
create index idx2 on tbl2(id2);

--Procedure to insert 1M data:
insert into tbl2 values(1, generate_series(1, 100), 2);

--Procedure to select data 1000 times (1000 times selected to make data more appropriate.)
create or replace function select_data(count1 int) returns void AS $$
declare
	x int;
Begin
	for i IN 1..count1 loop
		select count(*) into x from tbl2 where id2>99;
	end loop;
End;
$$ language plpgsql;

select select_data(1000);

-- With multi-column bigint index:
create table tbl3(id1 int, id2 bigint, id3 bigint, id4 bigint);
create index idx3 on tbl3(id2, id3, id4);
insert into tbl3 values(1, generate_series(1, 100), generate_series(1, 100), generate_series(1, 100));

--Procedure to select data 1000 times (1000 times selected to make data more appropriate.)
create or replace function select_data_multi_bigint(count1 int) returns void AS $$
declare
	x int;
Begin
	for i IN 1..count1 loop
		select count(*) into x from tbl3 where id2>99 and id3>99 and id4>99;
	end loop;
End;
$$ language plpgsql;

select select_data_multi_bigint(1000);

-- 5-column index with mixed text and integers

--Schema
create table tbl4(id1 int, id2 varchar(10), id3 int, id4 varchar(10), id5 int, id6 int);
create index idx4 on tbl4(id2, id3, id4, id5, id6);

--Procedure to insert 1M data:
create or replace function insert_data_mix_text_int(count1 int, count2 int) returns void AS $$
Begin
	for i IN 1..count1 loop
		insert into tbl4 values(i, 'a', i, 'a', i, i);
	end loop;
	for i IN count1+1..count2 loop
		insert into tbl4 values(i, 'b', i, 'b', i, i);
	end loop;
End;
$$ language plpgsql;

select insert_data_mix_text_int(99, 100);

--Procedure to select data 1000 times (1000 times selected to make data more appropriate.)
create or replace function select_data_mix_text_int(count1 int) returns void AS $$
declare
	x int;
Begin
	for i IN 1..count1 loop
		select count(*) into x from tbl4 where id2>'a' and id3>99 and id4>'a' and id5>99 and id6>99;
	end loop;
End;
$$ language plpgsql;

select select_data_mix_text_int(1000);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Sat, 2014-11-15 at 21:36, Simon Riggs wrote:
> Do I understand correctly that we are trying to account for exact
> memory usage at palloc/pfree time? Why??

Not palloc chunks, only tracking at the level of allocated blocks (that we allocate with malloc). It was a surprise to me that accounting at that level would have any measurable impact, but Robert found a reasonable case on a POWER machine that degraded a couple percent. I wasn't able to reproduce it consistently on x86.

> Or alternatively, can't we just sample the allocations to reduce the overhead?

Not sure quite what you mean by "sample", but it sounds like something along those lines would work.

Attached is a patch that does something very simple: it only tracks blocks held in the current context, with no inheritance or anything like it. This reduces the overhead to a couple of arithmetic instructions added to the alloc/dealloc path, so hopefully that removes the general performance concern raised by Robert [1].

To calculate the total memory used, I included a function MemoryContextMemAllocated() that walks the memory context and its children recursively.

Of course, I was originally trying to avoid that, because it moves the problem to HashAgg. For each group, it will need to execute MemoryContextMemAllocated() to see if work_mem has been exceeded. It will have to visit a couple of contexts, or perhaps many (in the case of array_agg, which creates one per group), which could be a measurable regression for HashAgg.

But if that does turn out to be a problem, I think it's solvable. First, I could micro-optimize the function by making it iterative rather than recursive, to save on function call overhead. Second, I could offer a way to prevent the HTAB from creating its own context, which would be one less context to visit. And if those don't work, perhaps I could resort to a sampling method of some kind, as you allude to above.
Regards, Jeff Davis [1] I'm fairly sure I tested something very similar on Robert's POWER machine a while ago, and it was fine. But I think it's offline or moved now, so I can't verify the results. If this patch still somehow turns out to be a 1%+ regression on any reasonable test, I don't know what to say. *** a/src/backend/utils/mmgr/aset.c --- b/src/backend/utils/mmgr/aset.c *** *** 438,451 AllocSetContextCreate(MemoryContext parent, Size initBlockSize, Size maxBlockSize) { ! AllocSet context; /* Do the type-independent part of context creation */ ! context = (AllocSet) MemoryContextCreate(T_AllocSetContext, ! sizeof(AllocSetContext), ! &AllocSetMethods, ! parent, ! name); /* * Make sure alloc parameters are reasonable, and save them. --- 438,454 Size initBlockSize, Size maxBlockSize) { ! AllocSet set; ! MemoryContext context; /* Do the type-independent part of context creation */ ! context = MemoryContextCreate(T_AllocSetContext, ! sizeof(AllocSetContext), ! &AllocSetMethods, ! parent, ! name); ! ! set = (AllocSet) context; /* * Make sure alloc parameters are reasonable, and save them. *** *** 459,467 AllocSetContextCreate(MemoryContext parent, if (maxBlockSize < initBlockSize) maxBlockSize = initBlockSize; Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */ ! context->initBlockSize = initBlockSize; ! context->maxBlockSize = maxBlockSize; ! context->nextBlockSize = initBlockSize; /* * Compute the allocation chunk size limit for this context. It can't be --- 462,470 if (maxBlockSize < initBlockSize) maxBlockSize = initBlockSize; Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */ ! set->initBlockSize = initBlockSize; ! set->maxBlockSize = maxBlockSize; ! set->nextBlockSize = initBlockSize; /* * Compute the allocation chunk size limit for this context. 
It can't be *** *** 477,486 AllocSetContextCreate(MemoryContext parent, * and actually-allocated sizes of any chunk must be on the same side of * the limit, else we get confused about whether the chunk is "big". */ ! context->allocChunkLimit = ALLOC_CHUNK_LIMIT; ! while ((Size) (context->allocChunkLimit + ALLOC_CHUNKHDRSZ) > (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION)) ! context->allocChunkLimit >>= 1; /* * Grab always-allocated space, if requested --- 480,489 * and actually-allocated sizes of any chunk must be on the same side of * the limit, else we get confused about whether the chunk is "big". */ ! set->allocChunkLimit = ALLOC_CHUNK_LIMIT; ! while ((Size) (set->allocChunkLimit + ALLOC_CHUNKHDRSZ) > (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION)) ! set->allocChunkLimit >>= 1; /* * Grab always-allocated space, if requested *** ***
Re: [HACKERS] inherit support for foreign tables
(2014/11/12 20:04), Ashutosh Bapat wrote:

I reviewed the fdw-chk-3 patch. Here are my comments.

Thanks for the review!

Tests
---

1. The tests added in the file_fdw module look good. We should add tests for CREATE TABLE with CHECK CONSTRAINT also.

Agreed. I added the tests, and also changed the proposed tests a bit.

2. For postgres_fdw we need tests to check the behaviour in case the constraints mismatch between the remote table and its local foreign table declaration, in case of INSERT, UPDATE and SELECT.

Done.

3. In the testcases for postgres_fdw it seems that you have forgotten to add a statement after SET constraint_exclusion to 'partition'.

I added the statement.

4. In test foreign_data there are changes to fix the diffs caused by these changes, like below:

ALTER FOREIGN TABLE ft1 DROP CONSTRAINT no_const; -- ERROR
-ERROR: "ft1" is not a table
+ERROR: constraint "no_const" of relation "ft1" does not exist
ALTER FOREIGN TABLE ft1 DROP CONSTRAINT IF EXISTS no_const;
-ERROR: "ft1" is not a table
+NOTICE: constraint "no_const" of relation "ft1" does not exist, skipping
ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c1_check;
-ERROR: "ft1" is not a table
+ERROR: constraint "ft1_c1_check" of relation "ft1" does not exist

Earlier, when constraints were not supported for FOREIGN TABLE, these tests verified that behaviour (in fact it didn't allow the creation of the constraints either). Now that constraints are allowed, I think the tests for "no_const" (without IF EXISTS) and "ft1_c1_check" are duplicating the same testcase. Maybe we should review this set of statements in the light of the new functionality.

Agreed. I removed the "DROP CONSTRAINT ft1_c1_check" test, and added a new test for ALTER CONSTRAINT.

Code and implementation
--

The usage of NO INHERIT and NOT VALID with CONSTRAINT on a foreign table is blocked, but the corresponding documentation entry doesn't mention this.
Since foreign tables can not be inherited NO INHERIT option isn't applicable to foreign tables and the constraints on the foreign tables are declarative, hence NOT VALID option is also not applicable. So, I agree with what the code is doing, but that should be reflected in documentation with this explanation. Rest of the code modifies the condition checks for CHECK CONSTRAINTs on foreign tables, and it looks good to me. Done. Other changes: * Modified one error message that I added in AddRelationNewConstraints, to match the other message there. * Revised other docs a little bit. Attached is an updated version of the patch. Thanks, Best regards, Etsuro Fujita *** a/contrib/file_fdw/input/file_fdw.source --- b/contrib/file_fdw/input/file_fdw.source *** *** 62,68 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null ' CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR CREATE FOREIGN TABLE agg_text ( ! a int2, b float4 ) SERVER file_server OPTIONS (format 'text', filename '@abs_srcdir@/data/agg.data', delimiter ' ', null '\N'); --- 62,68 CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR CREATE FOREIGN TABLE agg_text ( ! 
a int2 CHECK (a >= 0), b float4 ) SERVER file_server OPTIONS (format 'text', filename '@abs_srcdir@/data/agg.data', delimiter ' ', null '\N'); *** *** 72,82 CREATE FOREIGN TABLE agg_csv ( --- 72,84 b float4 ) SERVER file_server OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.csv', header 'true', delimiter ';', quote '@', escape '"', null ''); + ALTER FOREIGN TABLE agg_csv ADD CHECK (a >= 0); CREATE FOREIGN TABLE agg_bad ( a int2, b float4 ) SERVER file_server OPTIONS (format 'csv', filename '@abs_srcdir@/data/agg.bad', header 'true', delimiter ';', quote '@', escape '"', null ''); + ALTER FOREIGN TABLE agg_bad ADD CHECK (a >= 0); -- per-column options tests CREATE FOREIGN TABLE text_csv ( *** *** 134,139 DELETE FROM agg_csv WHERE a = 100; --- 136,153 -- but this should be ignored SELECT * FROM agg_csv FOR UPDATE; + -- constraint exclusion tests + \t on + EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv WHERE a < 0; + \t off + SELECT * FROM agg_csv WHERE a < 0; + SET constraint_exclusion = 'on'; + \t on + EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv WHERE a < 0; + \t off + SELECT * FROM agg_csv WHERE a < 0; + SET constraint_exclusion = 'partition'; + -- privilege tests SET ROLE file_fdw_superuser; SELECT * FROM agg_text ORDER BY a; *** a/contrib/file_fdw/output/file_fdw.source --- b/contrib/file_fdw/output/file_fdw.source *** *** 78,84 ERROR: COPY null representation cannot use newline or carriage return CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR ERROR: filename is required for file_fdw foreign tables CREATE FOREIGN TABLE agg_text ( ! a int2, b float4 )