Re: START/END line number for COPY FROM
Hi, On Fri, Jan 4, 2019 at 5:37 PM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > It seems a bit fragile to me if I want to skip a footer and need to > figure out the total line count, subtract one, and then oh, was it zero- > or one-based. > > But normally we don't say start copying from line number 0. regards, Surafel
btree.sgml typo?
There is a sentence in btree.sgml: PostgreSQL includes an implementation of the standard btree (multi-way binary tree) index data structure. I think the term "btree" here means "multi-way balanced tree", rather than "multi-way binary tree". In fact in our btree, there could be more than one key in a node. Patch attached. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/btree.sgml b/doc/src/sgml/btree.sgml index c16825e2ea..996932e35d 100644 --- a/doc/src/sgml/btree.sgml +++ b/doc/src/sgml/btree.sgml @@ -13,7 +13,7 @@ PostgreSQL includes an implementation of the - standard btree (multi-way binary tree) index data + standard btree (multi-way balanced tree) index data structure. Any data type that can be sorted into a well-defined linear order can be indexed by a btree index. The only limitation is that an index entry cannot exceed approximately one-third of a page (after TOAST
Re: START/END line number for COPY FROM
On Fri, 21 Dec 2018 at 02:02, Surafel Temesgen wrote: > Currently we can skip header line on COPY FROM but having the ability to skip > and stop copying at any line can use to divide long copy operation and enable > to copy a subset of the file and skipping footer. Attach is a patch for it I'm struggling a bit to see the sense in this. If you really want to improve the performance of a long copy, then I think it makes more sense to have performed the backup in multiple pieces in the first place. Having the database read the input stream and ignore the first N lines sounds like a bit of a waste of effort, and effort that wouldn't be required if the COPY TO had been done in multiple pieces. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
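For context, the existing HEADER option is what "skip header line on COPY FROM" refers to, and the splitting David suggests can be done with COPY (query) TO. A minimal sketch; the table name, file paths, and id ranges are illustrative only:

  -- Existing behaviour: skip the first line of the input file.
  COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv, HEADER);

  -- Alternative to skipping N lines on load: write the dump in pieces,
  -- so each piece can be loaded (or retried) independently.
  COPY (SELECT * FROM measurements WHERE id < 1000000)
      TO '/tmp/measurements_part1.csv' WITH (FORMAT csv);
  COPY (SELECT * FROM measurements WHERE id >= 1000000)
      TO '/tmp/measurements_part2.csv' WITH (FORMAT csv);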
Re: Feature: triggers on materialized views
Dear, You can try https://github.com/ntqvinh/PgMvIncrementalUpdate to generate triggers in C for incremental updates of matviews. For asynchronous updates, the tool does generate the triggers for collecting updated/inserted/deleted rows and then the codes for doing incremental updating as well. Tks and best regards, Vinh On Sat, Jan 5, 2019 at 5:10 AM Mitar wrote: > Hi! > > I am new to contributing to PostgreSQL and this is my first time > having patches in commit fest, so I am not sure about details of the > process here, but I assume that replying and discuss the patch during > this period is one of the actives, so I am replying to the comment. If > I should wait or something like that, please advise. > > On Fri, Jan 4, 2019 at 3:23 AM Peter Eisentraut > wrote: > > > A summary of the patch: This patch enables adding AFTER triggers (both > > > ROW and STATEMENT) on materialized views. They are fired when doing > > > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed. > > > > What bothers me about this patch is that it subtly changes what a > > trigger means. It currently means, say, INSERT was executed on this > > table. You are expanding that to mean, a row was inserted into this > > table -- somehow. > > Aren't almost all statements these days generated by some sort of > automatic logic? Which generates those INSERTs and then you get > triggers on them? I am not sure where is this big difference in your > view coming from? Triggers are not defined as "user-made INSERT was > executed on this table". I think it has always been defined as "INSERT > modified this table", no matter where this insert came from (from > user, from some other trigger, by backup process). I mean, this is the > beauty of declarative programming. You define it once and you do not > care who triggers it. > > Materialized views are anyway just built-in implementation of tables + > triggers to rerun the query. You could reconstruct them manually. And > why would not triggers be called if tables is being modified through > INSERTs? So if PostgreSQL has such a feature, why make it limited and > artificially make it less powerful? It is literally not possible to > have triggers only because there is "if trigger on a materialized > view, throw an exception". > > > Triggers should generally refer to user-facing commands > > So triggers on table A are not run when some other trigger from table > B decides to insert data into table A? Not true. I think triggers have > never cared who and where an INSERT came from. They just trigger. From > user, from another trigger, or from some built-in PostgreSQL procedure > called REFRESH. > > > Could you not make a trigger on REFRESH itself? > > If you mean if I could simulate this somehow before or after I call > REFRESH, then not really. I would not have access to previous state of > the table to compute the diff anymore. Moreover, I would have to > recompute the diff again, when REFRESH already did it once. > > I could implement materialized views myself using regular tables and > triggers. And then have triggers after change on that table. But this > sounds very sad. > > Or, are you saying that we should introduce a whole new type of of > trigger, REFRESH trigger, which would be valid only on materialized > views, and get OLD and NEW relations for previous and old state? I > think this could be an option, but it would require much more work, > and more changes to API. Is this what community would prefer? 
> > > This is also a problem, because it would allow bypassing the trigger > > accidentally. > > Sure, this is why it is useful to explain that CONCURRENT REFRESH uses > INSERT/UPDATE/DELETE and this is why you get triggers, and REFRESH > does not (but it is faster). I explained this in documentation. > > But yes, this is downside. I checked the idea of calling row-level > triggers after regular REFRESH, but it seems it will introduce a lot > of overhead and special handling. I tried implementing it as TRUNCATE > + INSERTS instead of heap swap and it is 2x slower. > > > Moreover, consider that there could be updatable materialized views, > > just like there are updatable normal views. And there could be triggers > > on those updatable materialized views. Those would look similar but > > work quite differently from what you are proposing here. > > Hm, not really. I would claim they would behave exactly the same. > AFTER trigger on INSERT on a materialized view would trigger for rows > which have changed through user updating materialized view directly, > or by calling CONCURRENT REFRESH which inserted a row. In both cases > the same trigger would run because materialized view had a row > inserted. Pretty nice. > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m > >
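To make the proposal concrete, here is a minimal sketch of what the patch would allow, assuming it is applied; the relation names and the log_mv_change() trigger function are hypothetical, and the unique index is what REFRESH ... CONCURRENTLY requires:

  CREATE MATERIALIZED VIEW order_totals AS
      SELECT customer_id, sum(amount) AS total
      FROM orders GROUP BY customer_id;
  CREATE UNIQUE INDEX ON order_totals (customer_id);

  -- With the patch, AFTER row/statement triggers can be attached to the
  -- materialized view; they fire for rows changed by a concurrent refresh.
  CREATE TRIGGER order_totals_changed
      AFTER INSERT OR UPDATE OR DELETE ON order_totals
      FOR EACH ROW EXECUTE FUNCTION log_mv_change();

  REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;  -- fires the triggers
  -- A plain REFRESH (heap swap) would not, which is the trade-off discussed above.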
Re: jsonpath
On 1/5/19 1:11 AM, Alexander Korotkov wrote: > On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov wrote: >> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the >> lower-case version. Heck, it's not mentioned even in DCH_keywords, which >> does this: >> >> ... >> {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ >> ... >> {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ >> ... >> >> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and >> "day". >> >> Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. >> >> "Day", "day" are not mapped to DCH_DAY because they determine letter case in >> the >> output, but "ff1" and "FF#" output contains only digits. > > Right, DCH_poz is also offset in DCH_keywords array. So, if array has > an entry for "ff1" then enum should have a DCH_ff1 member in the same > position. > I guess my question is why we define DCH_ff# at all, when it's not mapped in DCH_keywords? ISTM we could simply leave them out of all the arrays, no? Of course, this is not specific to this patch, as it applies to pre-existing items (like DCH_mi). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Use atexit() in initdb and pg_basebackup
On 04/01/2019 20:35, Alvaro Herrera wrote: > Seems you're registering the atexit cb twice here; you should only do so > in the first "!conn" block. OK, fixed. >> @@ -3438,5 +3437,8 @@ main(int argc, char *argv[]) >> >> destroyPQExpBuffer(start_db_cmd); >> >> +/* prevent cleanup */ >> +made_new_pgdata = found_existing_pgdata = made_new_xlogdir = >> found_existing_xlogdir = false; >> + >> return 0; >> } > > This is a bit ugly, but meh. Yeah. Actually, we already have a solution of this in pg_basebackup, with a bool success variable. I rewrote it like that. At least it's better for uniformity. I also added an atexit() conversion in isolationtester. It's mostly the same thing. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 1cb0310a012e9b1c3324ca9df4bfd6ede51c5d96 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Sat, 29 Dec 2018 13:21:47 +0100 Subject: [PATCH v2 1/3] pg_basebackup: Use atexit() Instead of using our custom disconnect_and_exit(), just register the desired cleanup using atexit() and use the standard exit() to leave the program. --- src/bin/pg_basebackup/pg_basebackup.c | 142 + src/bin/pg_basebackup/pg_receivewal.c | 45 src/bin/pg_basebackup/pg_recvlogical.c | 18 ++-- 3 files changed, 104 insertions(+), 101 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 88421c7893..e51ef91d66 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -136,7 +136,6 @@ static PQExpBuffer recoveryconfcontents = NULL; /* Function headers */ static void usage(void); -static void disconnect_and_exit(int code) pg_attribute_noreturn(); static void verify_dir_is_empty_or_create(char *dirname, bool *created, bool *found); static void progress_report(int tablespacenum, const char *filename, bool force); @@ -217,25 +216,25 @@ cleanup_directories_atexit(void) } static void -disconnect_and_exit(int code) +disconnect_atexit(void) { if (conn != NULL) PQfinish(conn); +} #ifndef WIN32 - - /* -* On windows, our background thread dies along with the process. But on -* Unix, if we have started a subprocess, we want to kill it off so it -* doesn't remain running trying to stream data. -*/ +/* + * On windows, our background thread dies along with the process. But on + * Unix, if we have started a subprocess, we want to kill it off so it + * doesn't remain running trying to stream data. 
+ */ +static void +kill_bgchild_atexit(void) +{ if (bgchild > 0) kill(bgchild, SIGTERM); -#endif - - exit(code); } - +#endif /* * Split argument into old_dir and new_dir and append to tablespace mapping @@ -562,7 +561,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) fprintf(stderr, _("%s: could not parse write-ahead log location \"%s\"\n"), progname, startpos); - disconnect_and_exit(1); + exit(1); } param->startptr = ((uint64) hi) << 32 | lo; /* Round off to even segment position */ @@ -575,7 +574,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) fprintf(stderr, _("%s: could not create pipe for background process: %s\n"), progname, strerror(errno)); - disconnect_and_exit(1); + exit(1); } #endif @@ -604,7 +603,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) { if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL, temp_replication_slot, true, true, false)) - disconnect_and_exit(1); + exit(1); if (verbose) { @@ -635,7 +634,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) fprintf(stderr, _("%s: could not create directory \"%s\": %s\n"), progname, statusdir, strerror(errno)); - disconnect_and_exit(1); + exit(1); } } @@ -654,19 +653,20 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) { fprintf(stderr, _("%s: could not create background process: %s\n"), progname, strerror(errno)); - disconnect_and_exit(1); + exit(1); } /* * Else we are in the parent process and all is well. */ + atexit(kill_bgchild_atexit); #else
Re: [PATCH] Log PostgreSQL version number on startup
On 21/11/2018 15:46, Christoph Berg wrote: > A startup looks like this: > > 2018-11-21 15:19:47.259 CET [24453] LOG: listening on IPv6 address "::1", > port 5431 > 2018-11-21 15:19:47.259 CET [24453] LOG: listening on IPv4 address > "127.0.0.1", port 5431 > 2018-11-21 15:19:47.315 CET [24453] LOG: listening on Unix socket > "/tmp/.s.PGSQL.5431" > 2018-11-21 15:19:47.394 CET [24453] LOG: starting PostgreSQL 12devel on > x86_64-pc-linux-gnu, compiled by gcc (Debian 8.2.0-9) 8.2.0, 64-bit > 2018-11-21 15:19:47.426 CET [24454] LOG: database system was shut down at > 2018-11-21 15:15:35 CET > 2018-11-21 15:19:47.460 CET [24453] LOG: database system is ready to accept > connections > > (I'd rather put the start message before the listening messages, but I > think the startup message should be logged via logging_collector, and > listening is logged before the log file is opened.) Why don't we start the logging collector before opening the sockets? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [proposal] Add an option for returning SQLSTATE in psql error message
On 04/12/2018 13:18, didier wrote: > diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql > index 1bb2a6e16d..64628f29a3 100644 > --- a/src/test/regress/sql/psql.sql > +++ b/src/test/regress/sql/psql.sql > @@ -1016,3 +1016,22 @@ select 1/(15-unique2) from tenk1 order by unique2 > limit 19; > \echo 'last error code:' :LAST_ERROR_SQLSTATE > > \unset FETCH_COUNT > + > +-- verbosity error setting > +\set VERBOSITY terse > +SELECT 1 UNION; > +\echo 'error:' :ERROR > +\echo 'error code:' :SQLSTATE > +\echo 'last error message:' :LAST_ERROR_MESSAGE > + > +\set VERBOSITY sqlstate > +SELECT 1 UNION; > +\echo 'error:' :ERROR > +\echo 'error code:' :SQLSTATE > +\echo 'last error message:' :LAST_ERROR_MESSAGE > + > +\set VERBOSITY default > +SELECT 1 UNION; > +\echo 'error:' :ERROR > +\echo 'error code:' :SQLSTATE > +\echo 'last error message:' :LAST_ERROR_MESSAGE > -- Why are you not including a test for \set VERBOSITY verbose? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Improvements to "Getting started" tutorial for Google Code-in task
On 04/01/2019 00:05, Alvaro Herrera wrote: > Besides that, I have a hard time considering this patch committable. > There are some good additions, but they are mixed with some wording > changes that seem to be there just because the author doesn't like the > original, not because they're actual improvements. I agree. I don't find any of the changes to be improvements. > diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml > index 4487d0cfd1..460a2279b6 100644 > --- a/doc/src/sgml/installation.sgml > +++ b/doc/src/sgml/installation.sgml > @@ -75,7 +75,7 @@ su - postgres >make programs or older > GNU make versions will > not work. >(GNU make is sometimes > installed under >the name gmake.) To test for > GNU > - make enter: > + make and check its version, enter: > > make --version > This might be worthwhile, but it kind of restates something obvious. > @@ -385,8 +385,8 @@ su - postgres > This script will run a number of tests to determine values for various > system dependent variables and detect any quirks of your > operating system, and finally will create several files in the > -build tree to record what it found. You can also run > -configure in a directory outside the source > +build tree to record what it found. If it does not print any error > messages, configuration was successful. > +You can also run configure in a directory outside > the source > tree, if you want to keep the build directory separate. This > procedure is also called a > > VPATHVPATH This just says that if there were no errors, it was successful. I think we don't need to state that. The (from-source) installation chapter is not for beginners. > @@ -1610,6 +1610,17 @@ su - postgres > > All of PostgreSQL successfully made. Ready to install. > > +If you see an error message like: > + > +ERROR: `flex' is missing on your system. It is needed to create the > +file `bootscanner.c'. You can either get flex from a GNU mirror site > +or download an official distribution of PostgreSQL, which contains > +pre-packaged flex output. > + > +then packages which are required to build > +PostgreSQL are missing. > +See for a list of requirements. > + > > > This says, if there was an error, you need to read what it says and follow the instructions. Again, see above. Moreover, I don't want to keep copies of particular error messages in the documentation that we then need to keep updating. > diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml > index 5b73557835..e29672a505 100644 > --- a/doc/src/sgml/start.sgml > +++ b/doc/src/sgml/start.sgml > @@ -19,9 +19,8 @@ > > > If you are not sure whether PostgreSQL > -is already available or whether you can use it for your > -experimentation then you can install it yourself. Doing so is not > -hard and it can be a good exercise. > +is already available for your experimentation, > +you can install it yourself, which is not complicated. > PostgreSQL can be installed by any > unprivileged user; no superuser (root) > access is required. Doesn't look like a necessary change to me. > @@ -83,7 +82,7 @@ > > > > - The user's client (frontend) application that wants to perform > + The user's client (frontend), an application that wants to perform > database operations. Client applications can be very diverse > in nature: a client could be a text-oriented tool, a graphical > application, a web server that accesses the database to Doesn't look like an improvement to me. 
> @@ -153,7 +152,7 @@ > > $ createdb mydb > > -If this produces no response then this step was successful and you can > skip over the > +If this exits without any error message then this step was successful > and you can skip over the > remainder of this section. > > This removes the valuable information that in the success case, no output is printed. > @@ -240,12 +239,14 @@ createdb: database creation failed: ERROR: permission > denied to create database > > You can also create databases with other names. > PostgreSQL allows you to create any > -number of databases at a given site. Database names must have an > -alphabetic first character and are limited to 63 bytes in > -length. A convenient choice is to create a database with the same > -name as your current user name. Many tools assume that database > -name as the default, so it can save you some typing. To create > -that database, simply type: > +number of databases at a given site. However, if you would like to > create databases with names that do not start with an alphabetic character, > +you will need to quote the identifier, like "1234". The length of > database names are limited to 63 bytes, > +which is the default length defined i
Re: Use atexit() in initdb and pg_basebackup
On 2019-Jan-05, Peter Eisentraut wrote: > On 04/01/2019 20:35, Alvaro Herrera wrote: > >> + /* prevent cleanup */ > >> + made_new_pgdata = found_existing_pgdata = made_new_xlogdir = > >> found_existing_xlogdir = false; > >> + > >>return 0; > >> } > > > > This is a bit ugly, but meh. > > Yeah. Actually, we already have a solution of this in pg_basebackup, > with a bool success variable. I rewrote it like that. At least it's > better for uniformity. Ah, yeah, much better, LGTM. > I also added an atexit() conversion in isolationtester. It's mostly the > same thing. LGTM in a quick eyeball. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [proposal] Add an option for returning SQLSTATE in psql error message
Peter Eisentraut writes: > Why are you not including a test for \set VERBOSITY verbose? Stability of the output would be a problem ... regards, tom lane
Re: btree.sgml typo?
On Sat, Jan 5, 2019 at 1:35 AM Tatsuo Ishii wrote: > PostgreSQL includes an implementation of the > standard btree (multi-way binary tree) index data > structure. > > I think the term "btree" here means "multi-way balanced tree", rather > than "multi-way binary tree". In fact in our btree, there could be > more than one key in a node. Patch attached. +1 for applying this patch. The existing wording is highly confusing, especially because many people already incorrectly think that a B-Tree is just like a self-balancing binary search tree. There is no consensus on exactly what the "b" actually stands for, but it's definitely not "binary". I suppose that the original author meant that a B-Tree is a generalization of a binary tree, which is basically true -- though that's a very academic point. -- Peter Geoghegan
Re: Remove Deprecated Exclusive Backup Mode
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > I too only learned about this recently, while the problem with exclusive > backups has been known at least since 2008 (c979a1fe), and nobody felt > this to be a terrible problem back then. My recollection was that back then it wasn't clear what to *do* about the problem. Once we had a solution (non-exclusive backups) we deprecated the exclusive backup mechanism. > I believe that the danger is greatly overrated. It is not like you end > up with a corrupted database after a crash, and you get a pretty helpful > error message. Many people are happy enough to live with that. Unfortunately, people get corrupted databases all the time because the backups are corrupted. The problem with databases not restarting correctly is another issue with the exclusive backup method but it's certainly not the only one. > I'm on board with deprecating and removing it eventually, but I see no > problem in waiting for the customary 5 years. We don't actually have a 'customary 5 years' rule of any sort. > And yes, a prominent warning in the next major release notes would be > a good thing. This is two years too late, at best. That is, maybe it would have made sense to highlight that the exclusive backup method was deprecated when it was made so, but that ship has sailed. Further, we don't put out announcements saying "we're going to remove X in the next release" and I don't see it making sense to start now. Thanks! Stephen signature.asc Description: PGP signature
Re: Remove Deprecated Exclusive Backup Mode
Greetings, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 12/12/2018 05:31, Robert Haas wrote: > > Most of the features I've been involved in removing have been > > deprecated for 5+ years. The first release where this one was > > deprecated was only 2 years ago. So it feels dramatically faster to > > me than what I think we have typically done. > > I was just looking this up as well, and I find it too fast. The > nonexclusive backups were introduced in 9.6. So I'd say that we could > remove the exclusive ones when 9.5 goes EOL. (That would mean this > patch could be submitted for PostgreSQL 13, since 9.5 will go out of > support around the time PG13 would be released.) I don't agree with either the notion that we have to wait 5 years in this case or that we've only had a good alternative to the exclusive backup mode since 9.5 as we've had pg_basebackup since 9.1. Thanks, Stephen signature.asc Description: PGP signature
Re: Remove Deprecated Exclusive Backup Mode
Greetings, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > Clearly, not having to do that at all is better, but if this is all > there is to it, then I'm confused by the characterizations of how awful > and terrible this feature is and how we must rush to remove it. It's not all there is to it though. This issue leads to extended downtime regularly and is definitely a huge 'gotcha' for users, even if you don't want to call it outright broken, but the other issue is that our documentation is ridiculously complicated around how to do a backup properly because of this and that also leads to the reality that it's difficult to make improvements to it because every sentence has to consider both methods, and that's really assuming that users actively read through the detailed documentation instead of just looking at those nice simple 'pg_start_backup' and 'pg_stop_backup' methods and use them thinking that's all that's required. We see this *all* the time, on the lists, in blog posts (even those from somewhat reputable companies...), and in other places. The exclusive backup method is a huge foot-gun for new users to PostgreSQL and leads to downtime, corrupt backups, and then corrupt databases when backups get restored. It really does need to go, and the sooner, the better. Thanks, Stephen signature.asc Description: PGP signature
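For reference, the non-exclusive low-level API (available since 9.6) that these messages recommend over the exclusive mode looks roughly like this; the label is arbitrary and the session must stay connected between the two calls:

  -- The third argument 'false' selects non-exclusive mode: no backup_label
  -- is left in the live data directory, so a crash during the backup does
  -- not prevent the server from restarting.
  SELECT pg_start_backup('nightly', false, false);
  -- ... copy the data directory with an external tool ...
  SELECT * FROM pg_stop_backup(false);
  -- The labelfile and spcmapfile columns returned here must be stored with
  -- the backup itself, not written into the running cluster.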
ALTER INDEX fails on partitioned index
12dev and 11.1: postgres=# CREATE TABLE t(i int) PARTITION BY RANGE(i); postgres=# CREATE INDEX ON t(i) WITH(fillfactor=11); postgres=# ALTER INDEX t_i_idx SET (fillfactor=12); ERROR: 42809: "t_i_idx" is not a table, view, materialized view, or index LOCATION: ATWrongRelkindError, tablecmds.c:5031 I can't see that this is deliberate, but I found an earlier problem report; however, discussion regarding the ALTER behavior seems to have been eclipsed due to a 2nd, separate issue with pageinspect. https://www.postgresql.org/message-id/flat/CAKcux6mb6AZjMVyohnta6M%2BfdkUB720Gq8Wb6KPZ24FPDs7qzg%40mail.gmail.com Justin
Re: Sketch of a fix for that truncation data corruption issue
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-12-10 15:38:55 -0500, Tom Lane wrote: > > Also, I'm not entirely sure whether there's anything in our various > > replication logic that's dependent on vacuum truncation taking AEL. > > Offhand I'd expect the reduced use of AEL to be a plus, but maybe > > I'm missing something. > > It'd be a *MAJOR* plus. One of the biggest operational headaches for > using a HS node for querying is that there'll often be conflicts due to > vacuum truncating relations (which logs an AEL), even if > hot_standby_feedback is used. There's been multiple proposals to > allow disabling truncations just because of that. Huge +1 from me here, we've seen this too. Getting rid of the conflict when using a HS node for querying would be fantastic. Thanks! Stephen signature.asc Description: PGP signature
Re: Why not represent "never vacuumed" accurately wrt pg_class.relpages?
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Tue, Dec 11, 2018 at 11:47 PM Tom Lane wrote: > > Andres Freund writes: > > > I don't quite get why we don't instead just represent "never vacuumed" > > > by storing a more meaningful value in relpages? > > > > Mostly, not wanting to break clients that look at these fields. > > If catalog compatibility weren't a concern, I'd seriously consider > > replacing both of them with a float "average tuples per page" ratio. > > I think we should do exactly that thing. On first blush, I tend to agree, but what I really wanted to remark here is that I don't think we should be hand-tying ourselves as to what we can do because we don't want to upset people who are reading the catalogs. We make changes to the catalogs pretty regularly and while we get complaints here and there, by and large, users are accustomed to them and appreciate that we don't have a lot of baggage from trying to support old interfaces and that we're able to make as much progress year over year as we are. Further, we support major releases for 5 years for a reason- users have quite a bit of time to adjust to the changes. > And I also agree that assuming 10 pages when pg_class says 0 and 1 > page when pg_class says 1 is not particularly bright. Agreed. Thanks! Stephen signature.asc Description: PGP signature
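For reference, the fields whose representation is under discussion are the ones clients typically read like this (table name illustrative); today a never-vacuumed table simply reports relpages = 0, which the planner then second-guesses as described above:

  SELECT relname, relpages, reltuples
  FROM pg_class
  WHERE relname = 'mytable';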
Re: Record last password change
Greetings, * Michael Banck (michael.ba...@credativ.de) wrote: > a customer recently mentioned that they'd like to be able to see when a > (md5, scram) role had their password last changed. There is an awful lot here that we really should be doing. For a long time, that felt prettty stalled because of the md5 mechanism being used, but now that we've got SCRAM, there's a number of things we should be doing: - Password aging (which requires knowing when it was last changed) - Password complexity - Disallow repeated use of the same password - Requiring password change on first/next connection - User/Password profiles more... > Use-cases for this would be issueing an initial password and then later > making sure it got changed, or auditing that all passwords get changed > once a year. You can do that via external authentication methods like > ldap/gss-api/pam but in some setups those might not be available to the > DBAs. Agreed. > I guess it would amount to adding a column like rolpasswordchanged to > pg_authid and updating it when rolpassword changes, but maybe there is a > better way? That could be a start, but I do expect that we'll grow at least one other table eventually to support user profiles. > The same was requested in https://dba.stackexchange.com/questions/91252/ > how-to-know-when-postgresql-password-is-changed so I was wondering > whether this would be a welcome change/addition, or whether people think > it's not worth bothering to implement it? Definitely a +1 from me, but I'd like us to be thinking about the other things we should be doing in this area to bring our password-based authentication mechanism kicking-and-screaming into the current decade. Thanks! Stephen signature.asc Description: PGP signature
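A hypothetical sketch of the auditing use-case, assuming the proposed rolpasswordchanged column were added to pg_authid (it does not exist today):

  -- Find roles whose password has not been changed in the last year.
  SELECT rolname, rolpasswordchanged
  FROM pg_authid
  WHERE rolpassword IS NOT NULL
    AND rolpasswordchanged < now() - interval '1 year';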
Re: Record last password change
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Michael Banck writes: > > The same was requested in https://dba.stackexchange.com/questions/91252/ > > how-to-know-when-postgresql-password-is-changed so I was wondering > > whether this would be a welcome change/addition, or whether people think > > it's not worth bothering to implement it? > > This has all the same practical problems as recording object creation > times, which we're not going to do either. (You can consult the > archives for details, but from memory, the stickiest aspects revolve > around what to do during dump/reload. Although even CREATE OR REPLACE > offers interesting definitional questions. In the end there are just > too many different behaviors that somebody might want.) I disagree with these being serious practical problems- we just need to decide which way to go when it comes to dump/restore here and there's an awful lot of example systems out there to compare to for this case. > I've heard that if you want to implement a password aging policy, PAM > authentication can manage that for you; but I don't know the details. That's ridiculously ugly; I know, because I've had to do it, more than once. In my view, it's past time to update our password mechanisms to have the features that one expects a serious system to have these days. Thanks! Stephen
Re: Record last password change
Stephen Frost writes: > ... Definitely a +1 from me, but I'd like us to be thinking about the other > things we should be doing in this area to bring our password-based > authentication mechanism kicking-and-screaming into the current decade. I'm not really excited about reinventing the whole of PAM, which is where this argument seems to be leading. regards, tom lane
Re: Record last password change
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> This has all the same practical problems as recording object creation >> times, which we're not going to do either. (You can consult the >> archives for details, but from memory, the stickiest aspects revolve >> around what to do during dump/reload. Although even CREATE OR REPLACE >> offers interesting definitional questions. In the end there are just >> too many different behaviors that somebody might want.) > I disagree with these being serious practical problems- we just need to > decide which was to go when it comes to dump/restore here and there's an > awful lot of example systems out there to compare to for this case. Yeah, an awful lot of systems with an awful lot of different behaviors. This doesn't exactly refute my point. For the specific case of password aging, it's possible we could agree on a definition, but that remains to be seen. regards, tom lane
Re: Grant documentation about "all tables"
Greetings Lætitia! * Lætitia Avrot (laetitia.av...@gmail.com) wrote: > When you look at Postgres' SQL reference documentation for `GRANT`, the > `ALL TABLES` clause is explained as : > > > ALL TABLES also affects views and foreign tables, just like > the specific-object GRANT command. > > A colleague of mine was asking himself if it included materialized views or > not (well, yes it does). > > I made that tiny patch to add materialized views to the list. It builds on > my laptop. > > Then another question crossed my mind... What about partitioned tables ? > I'm pretty sure it works for them too (because they're basically tables) > but should we add them too ? I couldn't decide whether to add them too or > not so I refrain from doing it and am asking you the question. The question here, at least in my mind, is if we feel it necessary to list out all of the specific kinds of "views" (as in, regular views and materialized views), and the same question applies to tables- do we list out all the specific kinds of "tables" (to include partitioned tables), or not? To figure that out, I'd suggest looking at existing documentation where we have similar lists and see what we've done in the past. If those other cases list everything explicitly, then the answer is clear, and if they don't, then we can either leave the documentation as-is, or come up with a complete list of changes that need to be made. If there aren't any other cases then I'd probably fall-back on looking at how we document things in the system catalogs area of the docs and see how much we get into the individual specific kinds of tables/views and perhaps that would help us figure out what makes sense to do here. Thanks! Stephen signature.asc Description: PGP signature
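For reference, the clause being documented (role name illustrative); the point of the doc patch is that this statement also affects views, materialized views, and foreign tables in the schema:

  GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_role;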
Re: Discussion: Fast DB/Schema/Table disk size check in Postgresql
Greetings, * Hubert Zhang (hzh...@pivotal.io) wrote: > For very large databases, the dbsize function `pg_database_size_name()` > etc. could be quite slow, since it needs to call `stat()` system call on > every file in the target database. I agree, it'd be nice to improve this. > We proposed a new solution to accelerate these dbsize functions which check > the disk usage of database/schema/table. The new functions avoid to call > `stat()` system call, and instead store the disk usage of database objects > in user tables(called diskquota.xxx_size, xxx could be db/schema/table). > Checking the size of database 'postgres' could be converted to the SQL > query `select size from diskquota.db_size where name = `postgres``. This seems like an awful lot of work though. I'd ask a different question- why do we need to stat() every file? In the largest portion of the system, when it comes to tables, indexes, and such, if there's a file 'X.1', you can be pretty darn sure that 'X' is 1GB in size. If there's a file 'X.245' then you can know that there's 245 files (X, X.1, X.2, X.3 ... X.244) that are 1GB in size. Maybe we should teach pg_database_size_name() about that? Thanks! Stephen signature.asc Description: PGP signature
Re: jsonpath
On Sat, Jan 5, 2019 at 5:21 PM Tomas Vondra wrote: > On 1/5/19 1:11 AM, Alexander Korotkov wrote: > > On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov > > wrote: > >> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the > >> lower-case version. Heck, it's not mentioned even in DCH_keywords, which > >> does this: > >> > >> ... > >> {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ > >> ... > >> {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ > >> ... > >> > >> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and > >> "day". > >> > >> Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. > >> > >> "Day", "day" are not mapped to DCH_DAY because they determine letter case > >> in the > >> output, but "ff1" and "FF#" output contains only digits. > > > > Right, DCH_poz is also offset in DCH_keywords array. So, if array has > > an entry for "ff1" then enum should have a DCH_ff1 member in the same > > position. > > > > I guess my question is why we define DCH_ff# at all, when it's not > mapped in DCH_keywords? ISTM we could simply leave them out of all the > arrays, no? I think we still need separate array entries for "FF1" and "ff1". At least, as long as we don't allow "fF1" and "Ff1". In order to get rid of DCH_ff#, I think we should decouple DCH_keywords from array indexes. > Of course, this is not specific to this patch, as it applies > to pre-existing items (like DCH_mi). Right, that should be a subject of separate patch. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Cell-Level security
Greetings, * Andrew Alsup (bluesbrea...@gmail.com) wrote: > Contrary to popular understanding, in classified environments it is common > to have data marked with a variety of combinations that make it difficult > to create roles and labels that match the various permutations. As a simple > example, a document could be classified as follows: > > Releasable to US citizens with a secret clearance OR Canadian citizens with > a top-secret clearance OR Australian citizens with a top-secret clearance. > > I am developing an extension that exposes a function, callable from a > PostgreSQL RLS policy which supports Accumulo-style visibility expressions. > Using the example above, the record might contain the following security > label expression: > > (S&US)|(TS&CAN)|(TS&AUS) > > A little more info on Accumulo-style expressions can be found here: > https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/security/ColumnVisibility.html > > Although I still have more testing to do, things are working well, and > there are some options about where the RLS policy function can pull the > authorizations from when determining visibility. Currently JWT tokens (via > set_config) are supported. > > I'm wondering how feasible a cell-level security implementation would be. > My thought is that by using an hstore column, a table could opt-in to row- > and cell-level security, by having the hstore track the visibility of any > or all columns in the row. > > | id | fname | lname | vis(hstore) | > ||---|---|-| > | 1 | Bob | Umpty | vis[_row_] = 'U'| > || | | vis[fname] = 'S'| > || | | vis[lname] = 'U&USA'| > ||---|---|-| > | 2 | Alice | Skwat | vis[_row_] = 'S'| > ||---|---|-| > > For tables that have the vis(hstore) column, a query rewrite could ensure > that column references are wrapped in a visibility check, akin to using a > CASE statement within a SELECT. > > I've begun studying the call chain from parser to planner to rewrite to > executor. I'm not sure where the best place would be to perform this work. > I'm thinking the query-rewrite might work well. Thoughts? I'm certainly interested in this as well but I'd really want to see it as a built-in addition on top of the existing policy system, in some fashion. I've thought a bit about how that might work and I'm certainly open to suggestions, but I don't think an hstore would be the way to go. I would think something more akin to a pg_attribute for pg_policy would allow a great deal of flexibility; in particular, I'm thinking that this system could be used to also provide data masking, if the user so wishes, instead of just "you see it or you don't." As for where to hook this in, the rewriter seems like a pretty good place as that's where RLS is happening but I'll admit that I have some concerns about what we'll need to do to make sure that we don't end up with data leaks when users are writing their own data-masking functions to use in these policies. Thanks! Stephen signature.asc Description: PGP signature
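A rough sketch of how the described pieces could fit together, under assumptions: vis is the hstore column from the table above, vis_check(expression, authorizations) is the extension's visibility function (hypothetical name), and the caller's authorizations are carried in a custom setting populated via set_config():

  ALTER TABLE people ENABLE ROW LEVEL SECURITY;

  -- Row level: evaluate the Accumulo-style expression stored for the row.
  CREATE POLICY people_row_vis ON people
      USING (vis_check(vis -> '_row_',
                       current_setting('app.authorizations', true)));

  -- Cell level, as a query rewrite might produce it: wrap the column in a
  -- CASE so it is nulled out when its visibility expression does not pass.
  SELECT id,
         CASE WHEN vis_check(coalesce(vis -> 'fname', vis -> '_row_'),
                             current_setting('app.authorizations', true))
              THEN fname END AS fname
  FROM people;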
Re: Feature: triggers on materialized views
Hi! On Sat, Jan 5, 2019 at 2:53 AM Nguyễn Trần Quốc Vinh wrote: > You can try https://github.com/ntqvinh/PgMvIncrementalUpdate to generate > triggers in C for incremental updates of matviews. > > For asynchronous updates, the tool does generate the triggers for collecting > updated/inserted/deleted rows and then the codes for doing incremental > updating as well. Thank you for sharing this. This looks interesting, but I could not test it myself (not using Windows), so I just read through the code. Having better updating of materialized views using incremental approach would really benefit my use case as well. Then triggers being added through my patch here on materialized view itself could communicate those changes which were done to the client. If I understand things correctly, this IVM would benefit the speed of how quickly we can do refreshes, and also if would allow that we call refresh on a materialized view for every change on the source tables, knowing exactly what we have to update in the materialized view. Really cool. I also see that there was recently more discussion about IVM on the mailing list. [1] [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp [2] https://www.postgresql.org/message-id/flat/fc784a9f-f599-4dcc-a45d-dbf6fa582...@qqdd.eu Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Offline enabling/disabling of data checksums
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 12/27/18 11:43 AM, Magnus Hagander wrote: > > Plus, the majority of people *should* want them on :) We don't run with > > say synchronous_commit=off by default either to make it easier on those > > that don't want to pay the overhead of full data safety :P (I know it's > > not a direct match, but you get the idea) +1 to having them on by default, we should have done that a long time ago. > I don't know, TBH. I agree making the on/off change cheaper makes moves > us closer to 'on' by default, because they may disable it if needed. But > it's not the whole story. > > If we enable checksums by default, 99% users will have them enabled. Yes, and they'll then be able to catch data corruption much earlier. Today, 99% of our users don't have them enabled and have no clue if their data has been corrupted on disk, or not. That's not good. > That means more people will actually observe data corruption cases that > went unnoticed so far. What shall we do with that? We don't have very > good answers to that (tooling, docs) and I'd say "disable checksums" is > not a particularly amazing response in this case :-( Now that we've got a number of tools available which will check the checksums in a running system and throw up warnings when found (pg_basebackup, pgBackRest and I think other backup tools, pg_checksums...), users will see corruption and have the option to restore from a backup before those backups expire out and they're left with a corrupt database and backups which also have that corruption. This ongoing call for specific tooling to do "something" about checksums is certainly good, but it's not right to say that we don't have existing documentation- we do, quite a bit of it, and it's all under the heading of "Backup and Recovery". > FWIW I don't know what to do about that. We certainly can't prevent the > data corruption, but maybe we could help with fixing it (although that's > bound to be low-level work). There's been some effort to try and automagically correct corrupted pages but it's certainly not something I'm ready to trust beyond a "well, this is what it might have been" review. The answer today is to find a backup which isn't corrupt and restore from it on a known-good system. If adding explicit documentation to that effect would reduce your level of concern when it comes to enabling checksums by default, then I'm happy to do that. Thanks! Stephen signature.asc Description: PGP signature
Re: Record last password change
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > ... Definitely a +1 from me, but I'd like us to be thinking about the other > > things we should be doing in this area to bring our password-based > > authentication mechanism kicking-and-screaming into the current decade. > > I'm not really excited about reinventing the whole of PAM, which is > where this argument seems to be leading. PAM isn't supported on all of our platforms and, really, even where we do support it, it's frankly beyond impractical to actually use the PAM modules because they expect to be run as root, which we don't do. I can understand that you're not excited about it, and I'm not keen to reinvent all of PAM (there's an awful lot of it which we really don't need), but there are features that happen to also exist in PAM (and Kerberos, and LDAP, and RADIUS, and...) that we really should have in our own password-based authentication system because our users are expecting them. Looking at the various forks of PG that are out there shows that quite clearly, I don't imagine they implemented these features out of pure fun, and they obviously also realized that trying to actually use PAM from PG was ultimately a bad idea. Thanks! Stephen signature.asc Description: PGP signature
Re: Facility for detecting insecure object naming
On Wed, Dec 05, 2018 at 11:20:52PM -0800, Noah Misch wrote: > On Thu, Aug 30, 2018 at 12:06:09AM -0700, Noah Misch wrote: > > On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote: > > > On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote: > > > > When the security team was discussing this issue before, we speculated > > > > about ideas like inventing a function trust mechanism, so that attacks > > > > based on search path manipulations would fail even if they managed to > > > > capture an operator reference. I'd rather go down that path than > > > > encourage people to do more schema qualification. > > > > > > Interesting. If we got a function trust mechanism, how much qualification > > > would you then like? Here are the levels I know about, along with their > > > implications: > > > > Any preferences among these options and the fifth option I gave in > > https://postgr.es/m/20180815024429.ga3535...@rfd.leadboat.com? I don't want > > to leave earthdistance broken. So far, though, it seems no two people > > accept > > any one fix. > > Ping. I prefer (1) for most functions. Nobody else has stated an explicit preference, but here are the votes I would expect based on other comments within this thread (feel free to correct): Option-1: Misch Option-4: Lane Option-5: Haas I would be okay with option-5, which presupposes a lexical search path for functions. Due to the magnitude of adding that feature, I don't expect to attempt an implementation myself. (Would anyone else like to?) > If we can't agree on something here, (4) stands, and earthdistance functions > will continue to fail unless both the cube extension's schema and the > earthdistance extension's schema are in search_path. That's bad, and I don't > want to prolong it. I don't think implementing function trust or a lexical > search_path makes (4) cease to be bad. Implementing both, however, would make > (4) non-bad. > > > > -- (1) Use qualified references and exact match for all objects. > > > -- > > > -- Always secure, even if schema usage does not conform to > > > ddl-schemas-patterns > > > -- and function trust is disabled. > > > -- > > > -- Subject to denial of service from anyone able to CREATE in cube schema > > > or > > > -- earthdistance schema. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > AS $$SELECT CASE > > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > > OPERATOR(pg_catalog./) > > > @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN > > > -90::pg_catalog.float8 > > > WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3) > > > OPERATOR(pg_catalog./) > > > @extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN > > > 90::pg_catalog.float8 > > > ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord( > > > $1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) > > > @extschema@.earth())) > > > END$$; > > > > > > > > > -- (2) Use qualified references for objects outside pg_catalog. > > > -- > > > -- With function trust disabled, this would be subject to privilege > > > escalation > > > -- from anyone able to CREATE in cube schema. > > > -- > > > -- Subject to denial of service from anyone able to CREATE in cube schema > > > or > > > -- earthdistance schema. 
> > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > AS $$SELECT CASE > > > WHEN @cube_schema@.cube_ll_coord($1, 3) > > > / > > > @extschema@.earth() < -1 THEN -90::float8 > > > WHEN @cube_schema@.cube_ll_coord($1, 3) > > > / > > > @extschema@.earth() > 1 THEN 90::float8 > > > ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / > > > @extschema@.earth())) > > > END$$; > > > > > > > > > -- (3) "SET search_path" with today's code. > > > -- > > > -- Security and reliability considerations are the same as (2). Today, > > > this > > > -- reduces performance by suppressing optimizations like inlining. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > SET search_path FROM CURRENT > > > AS $$SELECT CASE > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() < -1 THEN -90::float8 > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() > 1 THEN 90::float8 > > > ELSE degrees(asin(cube_ll_coord($1, 3) / earth())) > > > END$$; > > > > > > > > > -- (4) Today's code (reformatted). > > > -- > > > -- Always secure if schema usage conforms to ddl-schemas-patterns, even if > > > -- function trust is disabled. If cube schema or earthdistance schema is > > > not in > > > -- search_path, function doesn't work. > > > CREATE FUNCTION latitude(earth) > > > RETURNS float8 > > > LANGUAGE SQL > > > IMMUTABLE STRICT > > > PARALLEL SAFE > > > AS $$SELECT CASE > > > WHEN cube_ll_coord($1, 3) > > > / > > > earth() < -1 THEN -90::float8
Re: [patch] de.po REINDEX error
On 20/12/2018 11:42, Christoph Berg wrote: > de.po's error message when you try to "REINDEX DATABASE otherdb" has > been the wrong way round since 2011: Fixes committed to the translations repository. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: insensitive collations
On 04/01/2019 17:05, Daniel Verite wrote: > When using GROUP BY and ORDER BY on a field with a non-deterministic > collation, this pops out: > > CREATE COLLATION myfr (locale='fr-u-ks-level1', > provider='icu', deterministic=false); > > =# select n from (values ('été' collate "myfr"), ('ete')) x(n) > group by 1 order by 1 ; > n > - > ete > (1 row) > > =# select n from (values ('été' collate "myfr"), ('ete')) x(n) > group by 1 order by 1 desc; > n > - > été > (1 row) I don't see anything wrong here. The collation says that both values are equal, so which one is returned is implementation-dependent. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: "SELECT ... FROM DUAL" is not quite as silly as it appears
On Sat, 5 Jan 2019 at 08:48, Tom Lane wrote: > v5 attached; this responds to your comments plus Alexander's earlier > gripe about not getting a clean build with --disable-cassert. > No really substantive changes though. I ran a few benchmarks on an AWS m5d.large instance based on top of c5c7fa261f5. The biggest regression I see is from a simple SELECT 1 at around 5-6%. A repeat of your test of SELECT 2+2 showed about half that regression so the simple addition function call is introducing enough overhead to lower the slowdown percentage by a good amount. Test 3 improved performance a bit. SELECT 1; I believe is a common query for some connection poolers as a sort of ping to the database. In light of that, the performance drop of 2 microseconds per query is not going to amount to very much in total for that use case. i.e you'll need to do half a million pings before it'll cost you 1 second of additional CPU time. Results and tests are: Setup: create table t1 (id int primary key); Test 1: explain select 1; Unpatched: $ pgbench -n -f bench1.sql -T 60 postgres tps = 30899.599603 (excluding connections establishing) tps = 30806.247429 (excluding connections establishing) tps = 30330.971411 (excluding connections establishing) Patched: tps = 28971.551297 (excluding connections establishing) tps = 28892.053072 (excluding connections establishing) tps = 28881.105928 (excluding connections establishing) (5.75% drop) Test 2: explain select * from t1 inner join (select 1 as x) x on t1.id=x.x; Unpatched: $ pgbench -n -f bench2.sql -T 60 postgres tps = 14340.027655 (excluding connections establishing) tps = 14392.871399 (excluding connections establishing) tps = 14335.615020 (excluding connections establishing) Patched: tps = 14269.714239 (excluding connections establishing) tps = 14305.901601 (excluding connections establishing) tps = 14261.319313 (excluding connections establishing) (0.54% drop) Test 3: explain select * from t1 left join (select 1 as x) x on t1.id=x.x; Unpatched: $ pgbench -n -f bench3.sql -T 60 postgres tps = 11404.769545 (excluding connections establishing) tps = 11477.229511 (excluding connections establishing) tps = 11365.426342 (excluding connections establishing) Patched: tps = 11624.081759 (excluding connections establishing) tps = 11649.150950 (excluding connections establishing) tps = 11571.724571 (excluding connections establishing) (1.74% gain) Test 4: explain select * from t1 inner join (select * from t1) t2 on t1.id=t2.id; Unpatched: $ pgbench -n -f bench4.sql -T 60 postgres tps = 9966.796818 (excluding connections establishing) tps = 9887.775388 (excluding connections establishing) tps = 9906.681296 (excluding connections establishing) Patched: tps = 9845.451081 (excluding connections establishing) tps = 9936.377521 (excluding connections establishing) tps = 9915.724816 (excluding connections establishing) (0.21% drop) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Ordered Partitioned Table Scans
On Thu, 20 Dec 2018 at 18:20, Julien Rouhaud wrote: > > On Wed, Dec 19, 2018 at 11:08 PM David Rowley > wrote: > > create table listp (a int) partition by list (a); > > create table listp12 partition of listp for values in(1,2); > > create table listp03 partition of listp for vlaues in(0,3); > > create table listp45 partition of listp for values in(4,5); > > create table listpd partition of listp default; > > > > select * from listp where a in(1,2,4,5); > > > > Here we prune all but listp12 and listp45. Since the default is pruned > > and listp03 is pruned then there are no interleaved values. By your > > proposed design the natural ordering is not detected since we're > > storing a flag that says the partitions are unordered due to listp03. > > No, what I'm proposing is to store if the partitions are naturally > ordered or not, *and* recheck after pruning if that property could > have changed (so if some partitions have been pruned). So we avoid > extra processing if we already knew that the partitions were ordered > (possibly with the default partition pruning information), or if we > know that the partitions are not ordered and no partition have been > pruned. I see. So if the flag says "Yes", then we can skip the plan-time check, if it says "No" and partitions were pruned, then we need to re-check as the reason the flag says "No" might have been pruned away. I guess that works, but I had imagined that the test wouldn't have been particularly expensive. As more partitions are left unpruned then such a test costing a bit more I thought would have been unlikely to be measurable, but then I've not written the code yet. Are you saying that you think this patch should have this? Or are you happy to leave it until the next round? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
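Continuing the example, a query the patch aims to help once pruning has removed listp03 and the default partition, leaving partitions whose value sets do not interleave:

  -- With the patch, the ORDER BY could be satisfied by scanning listp12
  -- then listp45 in bound order, instead of sorting the appended result.
  SELECT * FROM listp WHERE a IN (1, 2, 4, 5) ORDER BY a;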
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
John Naylor writes:
> [ v6-0001-Use-offset-based-keyword-lookup.patch ]

I spent some time hacking on this today, and I think it's committable
now, but I'm putting it back up in case anyone wants to have another
look (and also so the cfbot can check it on Windows).

Given the discussion about possibly switching to perfect hashing, I
thought it'd be a good idea to try to make the APIs less dependent on
the exact table representation. So in the attached, I created a struct
ScanKeywordList that holds all the data ScanKeywordLookup needs, and
the generated headers declare variables of that type, and we just pass
around a pointer to that instead of passing several different things.

I also went ahead with the idea of splitting the category and token
data into separate arrays. That allows moving the backend token array
out of src/common entirely, which I think is a good thing because of
the dependency situation: we no longer need to run the bison build
before we can compile src/common/keywords_srv.o.

There's one remaining refactoring issue that I think we'd want to
consider before trying to jack this up and wheel a perfect-hash lookup
under it: where to do the downcasing transform. Right now, ecpg's
c_keywords.c has its own copy of the binary-search logic because it
doesn't want the downcasing transform that ScanKeywordLookup does. So
unless we want it to also have a copy of the hash lookup logic, we need
to rearrange that somehow. We could give ScanKeywordLookup a "bool
downcase" argument, or we could refactor things so that the downcasing
is done by callers if they need it (which many don't). I'm not very
sure which of those three alternatives is best.

My argument upthread that we could always do the downcasing before
keyword lookup now feels a bit shaky, because I was reminded while
working on this code that we actually have different downcasing rules
for keywords and identifiers (yes, really), so that it's not possible
for those code paths to share a downcasing transform. So the idea of
moving the keyword-downcasing logic to the callers is likely to not
work out quite as nicely as I thought.

(This might also mean that I was overly hasty to reject Joerg's |0x20
hack. It's still an ugly hack, but it would save doing the keyword
downcasing transform if we don't get a hashcode match...)

			regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e8ef966..9131991 100644
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** fill_in_constant_lengths(pgssJumbleState
*** 3075,3082
  	/* initialize the flex scanner --- should match raw_parser() */
  	yyscanner = scanner_init(query,
  				 &yyextra,
! 				 ScanKeywords,
! 				 NumScanKeywords);
  	/* we don't want to re-emit any escape string warnings */
  	yyextra.escape_string_warning = false;
--- 3075,3082
  	/* initialize the flex scanner --- should match raw_parser() */
  	yyscanner = scanner_init(query,
  				 &yyextra,
! 				 &ScanKeywords,
! 				 ScanKeywordTokens);
  	/* we don't want to re-emit any escape string warnings */
  	yyextra.escape_string_warning = false;

diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 7e9b122..4c0c258 100644
*** a/src/backend/parser/parser.c
--- b/src/backend/parser/parser.c
*** raw_parser(const char *str)
*** 41,47
  	/* initialize the flex scanner */
  	yyscanner = scanner_init(str, &yyextra.core_yy_extra,
! 				 ScanKeywords, NumScanKeywords);
  	/* base_yylex() only needs this much initialization */
  	yyextra.have_lookahead = false;
--- 41,47
  	/* initialize the flex scanner */
  	yyscanner = scanner_init(str, &yyextra.core_yy_extra,
! 				 &ScanKeywords, ScanKeywordTokens);
  	/* base_yylex() only needs this much initialization */
  	yyextra.have_lookahead = false;

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index fbeb86f..e1cae85 100644
*** a/src/backend/parser/scan.l
--- b/src/backend/parser/scan.l
*** bool escape_string_warning = true;
*** 67,72
--- 67,87
  bool standard_conforming_strings = true;
  /*
+  * Constant data exported from this file. This array maps from the
+  * zero-based keyword numbers returned by ScanKeywordLookup to the
+  * Bison token numbers needed by gram.y. This is exported because
+  * callers need to pass it to scanner_init, if they are using the
+  * standard keyword list ScanKeywords.
+  */
+ #define PG_KEYWORD(kwname, value, category) value,
+
+ const uint16 ScanKeywordTokens[] = {
+ #include "parser/kwlist.h"
+ };
+
+ #undef PG_KEYWORD
+
+ /*
   * Set the type of YYSTYPE.
   */
  #define YYSTYPE core_YYSTYPE
*** other .
*** 504,521
  				 * We will pass this along as a normal character string,
  				 * but preceded with an internally-gene
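To make the shape of the proposed API a bit more concrete, here is a
small, self-contained sketch of an offset-based keyword lookup in the
spirit of the patch. Every name in it (DemoKeywordList, demo_keyword_lookup
and so on) is made up for illustration and does not claim to match the
identifiers or layout actually used in the patch; the point is only that
the lookup needs nothing beyond a packed string, an offset array and a
count, and hands back a zero-based keyword number for the caller to
interpret:

#include <stdio.h>
#include <string.h>

/*
 * Minimal stand-in for the ScanKeywordList idea: all keyword texts are
 * concatenated, NUL-separated, into one string, and an offset array
 * points at the start of each entry.
 */
typedef struct DemoKeywordList
{
	const char *kw_string;		/* packed keyword strings */
	const int  *kw_offsets;		/* start offset of each keyword */
	int			num_keywords;	/* number of keywords */
} DemoKeywordList;

static const char *
demo_get_keyword(int n, const DemoKeywordList *list)
{
	return list->kw_string + list->kw_offsets[n];
}

/*
 * Binary search over the offsets.  Returns the zero-based keyword number,
 * or -1 if "word" is not a keyword.  The caller must already have
 * case-folded "word"; where that downcasing belongs is the open question
 * discussed above.
 */
static int
demo_keyword_lookup(const char *word, const DemoKeywordList *list)
{
	int			low = 0;
	int			high = list->num_keywords - 1;

	while (low <= high)
	{
		int			middle = low + (high - low) / 2;
		int			cmp = strcmp(word, demo_get_keyword(middle, list));

		if (cmp == 0)
			return middle;
		if (cmp < 0)
			high = middle - 1;
		else
			low = middle + 1;
	}
	return -1;
}

/*
 * Keyword-style ASCII-only downcasing.  Identifiers use different,
 * locale-aware rules, which is why the two code paths cannot simply
 * share one downcasing transform.
 */
static void
demo_downcase(char *dst, const char *src, int dstlen)
{
	int			i;

	for (i = 0; src[i] != '\0' && i < dstlen - 1; i++)
		dst[i] = (src[i] >= 'A' && src[i] <= 'Z') ? src[i] + ('a' - 'A') : src[i];
	dst[i] = '\0';
}

/* Toy keyword list; entries must be in sorted order for the binary search. */
static const char demo_kw_string[] = "abort\0between\0cast\0select";
static const int demo_kw_offsets[] = {0, 6, 14, 19};
static const DemoKeywordList demo_keywords = {demo_kw_string, demo_kw_offsets, 4};

int
main(void)
{
	char		buf[64];

	demo_downcase(buf, "SeLeCt", sizeof(buf));
	printf("%d\n", demo_keyword_lookup(buf, &demo_keywords));	/* prints 3 */
	printf("%d\n", demo_keyword_lookup("foo", &demo_keywords));	/* prints -1 */
	return 0;
}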
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
I wrote:
> I spent some time hacking on this today, and I think it's committable
> now, but I'm putting it back up in case anyone wants to have another
> look (and also so the cfbot can check it on Windows).

... and indeed, the cfbot doesn't like it. Here's v8, with the missing
addition to Mkvcbuild.pm.

			regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e8ef966..9131991 100644
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** fill_in_constant_lengths(pgssJumbleState
*** 3075,3082
  	/* initialize the flex scanner --- should match raw_parser() */
  	yyscanner = scanner_init(query,
  				 &yyextra,
! 				 ScanKeywords,
! 				 NumScanKeywords);
  	/* we don't want to re-emit any escape string warnings */
  	yyextra.escape_string_warning = false;
--- 3075,3082
  	/* initialize the flex scanner --- should match raw_parser() */
  	yyscanner = scanner_init(query,
  				 &yyextra,
! 				 &ScanKeywords,
! 				 ScanKeywordTokens);
  	/* we don't want to re-emit any escape string warnings */
  	yyextra.escape_string_warning = false;

diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 7e9b122..4c0c258 100644
*** a/src/backend/parser/parser.c
--- b/src/backend/parser/parser.c
*** raw_parser(const char *str)
*** 41,47
  	/* initialize the flex scanner */
  	yyscanner = scanner_init(str, &yyextra.core_yy_extra,
! 				 ScanKeywords, NumScanKeywords);
  	/* base_yylex() only needs this much initialization */
  	yyextra.have_lookahead = false;
--- 41,47
  	/* initialize the flex scanner */
  	yyscanner = scanner_init(str, &yyextra.core_yy_extra,
! 				 &ScanKeywords, ScanKeywordTokens);
  	/* base_yylex() only needs this much initialization */
  	yyextra.have_lookahead = false;

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index fbeb86f..e1cae85 100644
*** a/src/backend/parser/scan.l
--- b/src/backend/parser/scan.l
*** bool escape_string_warning = true;
*** 67,72
--- 67,87
  bool standard_conforming_strings = true;
  /*
+  * Constant data exported from this file. This array maps from the
+  * zero-based keyword numbers returned by ScanKeywordLookup to the
+  * Bison token numbers needed by gram.y. This is exported because
+  * callers need to pass it to scanner_init, if they are using the
+  * standard keyword list ScanKeywords.
+  */
+ #define PG_KEYWORD(kwname, value, category) value,
+
+ const uint16 ScanKeywordTokens[] = {
+ #include "parser/kwlist.h"
+ };
+
+ #undef PG_KEYWORD
+
+ /*
   * Set the type of YYSTYPE.
   */
  #define YYSTYPE core_YYSTYPE
*** other .
*** 504,521
  				 * We will pass this along as a normal character string,
  				 * but preceded with an internally-generated "NCHAR".
  				 */
! 				const ScanKeyword *keyword;
  				SET_YYLLOC();
  				yyless(1);	/* eat only 'n' this time */
! 				keyword = ScanKeywordLookup("nchar",
! 							    yyextra->keywords,
! 							    yyextra->num_keywords);
! 				if (keyword != NULL)
  				{
! 					yylval->keyword = keyword->name;
! 					return keyword->value;
  				}
  				else
  				{
--- 519,536
  				 * We will pass this along as a normal character string,
  				 * but preceded with an internally-generated "NCHAR".
  				 */
! 				int		kwnum;
  				SET_YYLLOC();
  				yyless(1);	/* eat only 'n' this time */
! 				kwnum = ScanKeywordLookup("nchar",
! 							  yyextra->keywordlist);
! 				if (kwnum >= 0)
  				{
! 					yylval->keyword = GetScanKeyword(kwnum,
! 									 yyextra->keywordlist);
! 					return yyextra->keyword_tokens[kwnum];
  				}
  				else
  				{
*** other .
*** 1021,1039
  {identifier}	{
! 				const ScanKeyword *keyword;
  				char	   *ident;
  				SET_YYLLOC();
  				/* Is it a keyword? */
! 				keyword = ScanKeywordLookup(yytext,
! 							    yyextra->keywords,
! 							    yyextra->num_keywords);
! 				if (keyword != NULL)
  				{
! 					yylval->keyword = keyword->name;
! 					return keyword->value;
  				}
  				/*
--- 1036,1054
  {identifier}	{
! 				int		kwnum;
  				char	   *ident;
  				SET_YYLLOC();
  				/* Is it a keyword? */
! 				kwnum = ScanKeywordLookup(yytext,
! 							  yyextra->keywordlist);
! 				if (kwnum >= 0)
  				{
! 					yylval->keyword = GetScanKeyword(kwnum,
! 									 yyextra->keywordlist);
! 					return yyextra->keyword_tokens[kwnum];
  				}
  				/*
*** scanner_yyerror(const char *message, cor
*** 1142,1149
  core_yyscan_t
  scanner_init(const char *str,
  	     core_yy_extra_type *yyext,
! 	     const ScanKeyword *keywords,
! 	     int num_keywords)
  {
  	Size		slen = strlen(str);
  	yyscan_t	scanner;
--- 115
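Stripped of the flex machinery, the change the scan.l hunks above make at
every call site is the same: the scanner gets back a zero-based keyword
number and indexes a separate token array with it, instead of reading the
token out of the keyword entry itself. Here is a toy, self-contained
sketch of that calling convention; the demo_* names and token values are
invented for illustration, whereas in the real patch the arrays are
generated from kwlist.h:

#include <stdio.h>
#include <string.h>

/* A toy keyword table and a parallel token array: entry n holds the token
 * for keyword number n, mirroring the ScanKeywords/ScanKeywordTokens split. */
static const char *const demo_keywords[] = {"abort", "between", "cast", "select"};
static const unsigned short demo_keyword_tokens[] = {258, 259, 260, 261};
#define DEMO_NUM_KEYWORDS 4
#define DEMO_IDENT_TOKEN 999

/* Return the zero-based keyword number, or -1 if "word" is not a keyword. */
static int
demo_keyword_lookup(const char *word)
{
	int			i;

	for (i = 0; i < DEMO_NUM_KEYWORDS; i++)
	{
		if (strcmp(word, demo_keywords[i]) == 0)
			return i;
	}
	return -1;
}

/* The scanner-side pattern: map keyword number to token, else identifier. */
static int
demo_classify(const char *word)
{
	int			kwnum = demo_keyword_lookup(word);

	if (kwnum >= 0)
		return demo_keyword_tokens[kwnum];
	return DEMO_IDENT_TOKEN;
}

int
main(void)
{
	printf("%d\n", demo_classify("select"));	/* prints 261 */
	printf("%d\n", demo_classify("foo"));		/* prints 999 */
	return 0;
}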