Re: Postgres 11 release notes
On Fri, Sep 28, 2018 at 7:24 PM Bruce Momjian wrote: > On Fri, Sep 28, 2018 at 10:21:16AM +0900, Amit Langote wrote: > > On 2018/09/27 23:24, Alvaro Herrera wrote: > > > On 2018-Sep-27, Amit Langote wrote: > > > > > >> Sorry I couldn't reply sooner, but the following of your proposed text > > >> needs to be updated a bit: > > >> > > >> + > > >> + > > >> + Having a "default" partition for storing data that does not > > >> match a > > >> + partition key > > >> + > > >> + > > >> > > >> I think "does not match a partition key" is not accurate. Description of > > >> default partitions further below in the release notes says this: > > >> > > >> "The default partition can store rows that don't match any of the other > > >> defined partitions, and is searched accordingly." > > >> > > >> So, we could perhaps write it as: > > >> > > >> Having a "default" partition for storing data that does not match any of > > >> the remaining partitions > > > > > > Yeah, I agree that "a partition key" is not the right term to use there > > > (and that term is used in the press release text also). However I don't > > > think "remaining" is the right word there either, because it sounds as > > > if you're removing something. > > > > > > For the Spanish translation of the press release, we ended up using the > > > equivalent of "for the data that does not match any other partition". > > > > Yeah, "any other partition" is what the existing description uses too, so: > > > > Having a "default" partition for storing data that does not match any > > other partition > > Uh, what text are you talking about? I see this text in the release > notes since May: > > The default partition can store rows that don't match any of the > other defined partitions, and is searched accordingly. I was commenting on Jonathan's patch upthread [1] whereby he proposed to add a line about the default partition feature in the major partitioning enhancements section at the top. The existing text that you'd added when creating the release notes is fine. > The press release? Yeah, Alvaro pointed out that the press release has inaccurate text about the default partition, but that's a separate issue, which I was unaware of. Thanks, Amit [1] https://www.postgresql.org/message-id/dcdbb9e9-cde2-fb78-fdb6-76d672d08dbc%40postgresql.org
Re: Optional message to user when terminating/cancelling backend
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed I tested this patch, and all looks well and functional. I reread the discussion and I don't see any unresolved objections against this patch. There are no warnings or crashes, and all tests pass. The new behavior is documented well. I'll mark this patch as ready for committer. The new status of this patch is: Ready for Committer
Re: Progress reporting for pg_verify_checksums
>> The time() granularity to the second makes the result awkward on small tests: 8/1554552 kB (0%, 8 kB/s) 1040856/1554552 kB (66%, 1040856 kB/s) 1554552/1554552 kB (100%, 1554552 kB/s) I'd suggest to reuse the "portability/instr_time.h" stuff which allows a much better precision. I still think it would make sense to use that instead of low-precision time(). > As the use-case of this is not for small tests, Even for a long run, the error induced by the 1 second imprecision on both points is significant at the beginning of the scan. > I don't think it is useful to make the code more complicated for this. I do not think that it would be much more complicated to use the portability macros to get a precise time. > The display is exactly the same as in pg_basebackup (except the bandwidth is shown as well), so right now I think it is more useful to be consistent here. Hmmm... units are units, and the display is wrong. The fact that other commands are wrong as well does not look like a good argument to persist in an error. > I've had a look around, and "kB" seems to be a common unit for 1024 bytes, e.g. in pg_test_fsync etc., unless I am mistaken? I can only repeat my above comment: the fact that postgres is wrong elsewhere is not a good reason to persist in reproducing an error. See the mother of all knowledge https://en.wikipedia.org/wiki/Kilobyte - SI (decimal, 1000): kB, MB, GB, ... - IEC (binary 1024): KiB, MiB, GiB, ... - JEDEC (binary 1024, used for memory): KB, MB, GB. Summary: - 1 kB = 1000 bytes - 1 KB = 1 KiB = 1024 bytes Decimal is used for storage (HDD, SSD), binary for memory. That is life, and IMHO Postgres code is not the place to invent new units. Moreover, I still think that the progress should use MB, defined as 1,000,000 bytes, for showing the progress. I would be okay with a progress printing function shared between some commands. It just needs some place. >> pg_rewind also has a --rewind option. > I guess you mean pg_rewind also has a --progress option. Indeed. > I do agree it makes sense to refactor that, but I don't think this should be part of this patch. That's a point. I'd suggest to put the new progress function directly in the common part, and other patches could take advantage of it for other commands when someone feels like it. Other comments: function toggle_progress_report has empty lines after { and before }, but this style is not used elsewhere, so I think that they should be removed. The report_progress API should be ready to be used by another client application, which suggests that global variables should be avoided. Maybe: void report_progress(current, total, force) and the scan start and last times could be static variables inside the function initialized on the first call, which would suggest calling the function at the beginning of the scan, probably with current == 0. Or maybe: time_type report_progress(current, total, start, last, force) Which would return the last time, and the caller has responsibility for initializing a start time. -- Fabien.
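To illustrate the first variant, a minimal sketch of such a function using instr_time.h might look like this (the names, the int64 counters and the 250 ms reporting interval are assumptions for illustration, not the actual patch):

    #include <stdio.h>
    #include "c.h"                      /* int64, bool */
    #include "portability/instr_time.h"

    void
    report_progress(int64 current_kb, int64 total_kb, bool force)
    {
        static bool       initialized = false;
        static instr_time scan_start;
        static instr_time last_report;
        instr_time        now, since_start, since_report;
        double            elapsed;

        INSTR_TIME_SET_CURRENT(now);
        if (!initialized)               /* call once with current_kb == 0 at scan start */
        {
            scan_start = last_report = now;
            initialized = true;
        }

        since_report = now;
        INSTR_TIME_SUBTRACT(since_report, last_report);
        if (!force && INSTR_TIME_GET_MILLISEC(since_report) < 250)
            return;                     /* rate-limit the output */
        last_report = now;

        since_start = now;
        INSTR_TIME_SUBTRACT(since_start, scan_start);
        elapsed = INSTR_TIME_GET_DOUBLE(since_start);

        fprintf(stderr, "%lld/%lld kB (%d%%, %.0f kB/s)\r",
                (long long) current_kb, (long long) total_kb,
                total_kb > 0 ? (int) (current_kb * 100 / total_kb) : 0,
                elapsed > 0 ? current_kb / elapsed : 0.0);
    }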
Re: [HACKERS] kqueue
On 28/09/2018 14:19, Thomas Munro wrote: > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund wrote: >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: >>> Matteo Beccati reported a 5-10% performance drop on a >>> low-end Celeron NetBSD box which we have no explanation for, and we >>> have no reports from server-class machines on that OS -- so perhaps we >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on >>> NetBSD until someone can figure out what needs to be fixed there >>> (possibly on the NetBSD side)? >> >> Yea, I'm not too worried about that. It'd be great to test that, but >> otherwise I'm also ok to just plonk that into the template. > > Thanks for the review! Ok, if we don't get a better idea I'll put > this in src/template/netbsd: > > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL" A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh install of NetBSD 8.0 again shows that kqueue is consistently slower running pgbench vs unpatched master on TPC-B-like pgbench workloads: ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10, while on select-only benchmarks the difference is below the noise floor, with both doing roughly the same ~30k tps. Out of curiosity, I've installed FreeBSD on an identically specced VM, and the select benchmark was ~75k tps for kqueue vs ~90k tps on unpatched master, so maybe there's something wrong with how I'm benchmarking. Could you please provide proper instructions? Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
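For what it's worth, runs of the kind described above would look roughly like this (the exact durations and flags are assumptions, not necessarily what was used):

    $ pgbench -i -s 10 bench                 # initialize, scale factor 10
    $ pgbench -c 96 -j 96 -T 120 bench       # default TPC-B-like workload
    $ pgbench -S -c 96 -j 96 -T 120 bench    # select-only workload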
Re: doc - improve description of default privileges
> The Owner column is redundant, because it's always all applicable privileges. (Having this column would give the impression that it's not always all privileges, so it would be confusing.) The reason I put the owner column is to show (1) the privileges that apply to the objects (i.e. what is under "ALL") and (2) whether public's privileges are the same or not, because there are subtle differences, so I think it is interesting to have them side by side somewhere. > Privileges should be listed using their full name (e.g., "SELECT"), not their internal abbreviation letter. Hmmm... the psql commands listed in the table output the abbreviated letters. Using the words would result in a large table, but maybe it could use multiline cells. > The psql commands seem out of place here. If you want to learn about how to use psql, you can go to the psql documentation. The information about how to display the permissions is currently not easily available; I had to test/look for it, and noticed that it is not accessible on some objects, so ISTM that it is useful to have it somewhere. Basically your points suggest that the table is maybe in the wrong place and could be improved. About the place, there is no simple choice: - backslash commands are "psql" specific - abbreviated letters are aclitem function specific, which happens to be used by psql. - full names are SQL specific (GRANT) - default permissions are object specific and can be modified... Which means that the information tends to be scattered everywhere and overall pretty unclear unless you have read all the pages, hence my proposal to put some unified summary somewhere with all the relevant information. Any choice will have its downside, and removing information to better suit one place means that my point of having some kind of summary in one place is lost, which is the initial motivation for this patch. -- Fabien.
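For reference, the abbreviation issue looks like this in practice for tables (a sketch using the standard aclitem letters; the role and table names are made up):

    GRANT SELECT, UPDATE ON t1 TO alice;
    \dp t1    -- shows an entry like: alice=rw/owner

where r = SELECT and w = UPDATE, and the remaining table letters are a = INSERT, d = DELETE, D = TRUNCATE, x = REFERENCES, t = TRIGGER.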
Re: Use durable_unlink for .ready and .done files for WAL segment removal
On Fri, Sep 28, 2018 at 07:16:25PM -0400, Stephen Frost wrote: > An alternative would be to go through the .ready files on crash-recovery > and remove any .ready files that don't have corresponding WAL files, or > if we felt that it was necessary, we could do that on every restart but > do we really think we'd need to do that..? Actually, what you are proposing here sounds much better to me. That's in the area of what has been done recently with RemoveTempXlogFiles() in 5fc1008e. Any objections to doing something like that? -- Michael
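To make the idea concrete, a rough sketch of such a cleanup pass could look like the following (the function name, paths and the plain unlink() are assumptions for illustration; the real thing would presumably live next to RemoveTempXlogFiles() and use durable_unlink()):

    #include <dirent.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Drop archive_status/*.ready entries whose WAL segment is gone. */
    static void
    RemoveOrphanedArchiveReadyFiles(void)
    {
        DIR        *dir;
        struct dirent *de;
        char        wal_path[1024];
        char        status_path[1024];
        struct stat st;

        dir = opendir("pg_wal/archive_status");
        if (dir == NULL)
            return;

        while ((de = readdir(dir)) != NULL)
        {
            size_t      len = strlen(de->d_name);

            /* only consider "<segment>.ready" entries */
            if (len <= 6 || strcmp(de->d_name + len - 6, ".ready") != 0)
                continue;

            /* path of the WAL segment this status file refers to */
            snprintf(wal_path, sizeof(wal_path), "pg_wal/%.*s",
                     (int) (len - 6), de->d_name);

            if (stat(wal_path, &st) != 0 && errno == ENOENT)
            {
                snprintf(status_path, sizeof(status_path),
                         "pg_wal/archive_status/%s", de->d_name);
                unlink(status_path);    /* durable_unlink() in the real code */
            }
        }
        closedir(dir);
    }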
Re: Online verification of checksums
Hi, On 09/26/2018 05:15 PM, Michael Banck wrote: > ... > > New version 5 attached. > I've looked at v5, and the retry/recheck logic seems OK to me - I'd still vote to keep it consistent with what pg_basebackup does (i.e. doing the LSN check first, before looking at the checksum), but I don't think it's a bug. I'm not sure about the other issues brought up (ENOENT, short reads). I haven't given it much thought. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] proposal: schema variables
On Sat, Sep 22, 2018 at 8:00 AM Pavel Stehule wrote: > Hi > > rebased against yesterday's changes in tab-complete.c rebased against the latest changes in master Regards Pavel > Regards > > Pavel > schema-variables-20180929-01.patch.gz Description: application/gzip
Re: Online enabling of checksums
Hi, While looking at the online checksum verification patch (which I guess will get committed before this one), it occurred to me that disabling checksums may need to be more elaborate, to protect against someone using the stale flag value (instead of simply switching to "off" assuming that's fine). The signals etc. seem good enough for our internal stuff, but what if someone uses the flag in a different way? E.g. the online checksum verification runs as an independent process (i.e. not a backend) and reads the control file to find out if the checksums are enabled or not. So if we just switch from "on" to "off" that will break. Of course, we may also say "Don't disable checksums while online verification is running!" but that's not ideal. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Online verification of checksums
Hi, One more thought - when running similar tools on a live system, it's usually a good idea to limit the impact by throttling the throughput. As the verification runs in an independent process it can't reuse the vacuum-like cost limit directly, but perhaps it could do something similar? Like, limit the number of blocks read/second, or so? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Online verification of checksums
On Sat, Sep 29, 2018 at 10:51:23AM +0200, Tomas Vondra wrote: > One more thought - when running similar tools on a live system, it's > usually a good idea to limit the impact by throttling the throughput. As > the verification runs in an independent process it can't reuse the > vacuum-like cost limit directly, but perhaps it could do something > similar? Like, limit the number of blocks read/second, or so? When it comes to such parameters, not using a number of blocks but throttling with a value in bytes (kB or MB of course) speaks more to the user. The past experience with checkpoint_segments is one example of that. Converting that to a number of blocks internally would definitely make the most sense. +1 for this idea. -- Michael
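As a very rough illustration of what byte-based throttling could look like in the read loop (the variables, limits and sleep interval below are assumptions, not an actual proposal):

    #include <time.h>
    #include <unistd.h>

    static long max_bytes_per_sec = 0;      /* e.g. from a --max-rate-style option; 0 = unthrottled */
    static long bytes_this_second = 0;
    static time_t current_second = 0;

    /* Call after every read of 'nbytes' from a relation file. */
    static void
    throttle(size_t nbytes)
    {
        if (max_bytes_per_sec <= 0)
            return;

        if (time(NULL) != current_second)
        {
            current_second = time(NULL);
            bytes_this_second = 0;
        }

        bytes_this_second += nbytes;
        if (bytes_this_second >= max_bytes_per_sec)
        {
            /* budget for this second used up; wait for the next one */
            while (time(NULL) == current_second)
                usleep(10000);              /* 10 ms */
            current_second = time(NULL);
            bytes_this_second = 0;
        }
    }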
Re: Online verification of checksums
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Sat, Sep 29, 2018 at 10:51:23AM +0200, Tomas Vondra wrote: > > One more thought - when running similar tools on a live system, it's > > usually a good idea to limit the impact by throttling the throughput. As > > the verification runs in an independent process it can't reuse the > > vacuum-like cost limit directly, but perhaps it could do something > > similar? Like, limit the number of blocks read/second, or so? > > When it comes to such parameters, not using a number of blocks but > throttling with a value in bytes (kB or MB of course) speaks more to the > user. The past experience with checkpoint_segments is one example of > that. Converting that to a number of blocks internally would definitely > make the most sense. +1 for this idea. While I agree this would be a nice additional feature to have, it seems like something which could certainly be added later and doesn't necessarily have to be included in the initial patch. If Michael has time to add that, great, if not, I'd rather have this as-is than not. I do tend to agree with Michael that having the parameter be specified as (or at least able to accept) a byte-based value is a good idea. As another feature idea, having this able to work in parallel across tablespaces would be nice too. I can certainly imagine some point where this is a default process which scans the database at a slow pace across all the tablespaces more-or-less all the time checking for corruption. Thanks! Stephen
Re: Online enabling of checksums
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > While looking at the online checksum verification patch (which I guess > will get committed before this one), it occurred to me that disabling > checksums may need to be more elaborate, to protect against someone > using the stale flag value (instead of simply switching to "off" > assuming that's fine). > > The signals etc. seem good enough for our internal stuff, but what if > someone uses the flag in a different way? E.g. the online checksum > verification runs as an independent process (i.e. not a backend) and > reads the control file to find out if the checksums are enabled or not. > So if we just switch from "on" to "off" that will break. > > Of course, we may also say "Don't disable checksums while online > verification is running!" but that's not ideal. I'm not really sure what else we could say here..? I don't particularly see an issue with telling people that if they disable checksums while they're running a tool that's checking the checksums that they're going to get odd results. Thanks! Stephen signature.asc Description: PGP signature
Adding pipe support to pg_dump and pg_restore
Hello, I recently wanted a way to encrypt/decrypt backups while still utilizing the parallel dump/restore functionality. I couldn't see a way to do this so I experimented a bit with the directory backup format. If there's in fact already a way to do this, please tell me now :-) The idea is to add a --pipe option to pg_dump / pg_restore where you can specify a custom shell command that is used to write / read each .dat-file. Usage examples include encryption with pgp and/or custom compression pipelines. %p in the command is expanded to the path to write to / read from. The pipe command is not applied to the toc. The current version is attached. Could something like this be acceptable for inclusion? From 27f6c541be6546edfef62646f514fe1a92042705 Mon Sep 17 00:00:00 2001 From: David Hedberg Date: Sat, 29 Sep 2018 12:55:52 +0200 Subject: [PATCH] Add support for --pipe to pg_dump and pg_restore --- src/bin/pg_dump/compress_io.c | 97 --- src/bin/pg_dump/compress_io.h | 6 +- src/bin/pg_dump/pg_backup.h | 2 + src/bin/pg_dump/pg_backup_directory.c | 14 ++-- src/bin/pg_dump/pg_dump.c | 17 - src/bin/pg_dump/pg_restore.c | 7 ++ 6 files changed, 121 insertions(+), 22 deletions(-) diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c index a96da15dc1..64c06d7eae 100644 --- a/src/bin/pg_dump/compress_io.c +++ b/src/bin/pg_dump/compress_io.c @@ -443,6 +443,9 @@ struct cfp static int hasSuffix(const char *filename, const char *suffix); #endif +static void +expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath); + /* free() without changing errno; useful in several places below */ static void free_keep_errno(void *p) @@ -464,24 +467,26 @@ free_keep_errno(void *p) * On failure, return NULL with an error code in errno. */ cfp * -cfopen_read(const char *path, const char *mode) +cfopen_read(const char *path, const char *mode, const char *pipecmd) { cfp *fp; + if (pipecmd) + fp = cfopen(path, mode, 0, pipecmd); #ifdef HAVE_LIBZ - if (hasSuffix(path, ".gz")) - fp = cfopen(path, mode, 1); + else if (hasSuffix(path, ".gz")) + fp = cfopen(path, mode, 1, NULL); else #endif { - fp = cfopen(path, mode, 0); + fp = cfopen(path, mode, 0, NULL); #ifdef HAVE_LIBZ if (fp == NULL) { char *fname; fname = psprintf("%s.gz", path); - fp = cfopen(fname, mode, 1); + fp = cfopen(fname, mode, 1, NULL); free_keep_errno(fname); } #endif @@ -501,19 +506,19 @@ cfopen_read(const char *path, const char *mode) * On failure, return NULL with an error code in errno. */ cfp * -cfopen_write(const char *path, const char *mode, int compression) +cfopen_write(const char *path, const char *mode, int compression, const char *pipecmd) { cfp *fp; if (compression == 0) - fp = cfopen(path, mode, 0); + fp = cfopen(path, mode, 0, pipecmd); else { #ifdef HAVE_LIBZ char *fname; fname = psprintf("%s.gz", path); - fp = cfopen(fname, mode, compression); + fp = cfopen(fname, mode, compression, pipecmd); free_keep_errno(fname); #else exit_horribly(modulename, "not built with zlib support\n"); @@ -530,11 +535,32 @@ cfopen_write(const char *path, const char *mode, int compression) * On failure, return NULL with an error code in errno. 
*/ cfp * -cfopen(const char *path, const char *mode, int compression) +cfopen(const char *path, const char *mode, int compression, const char *pipecmd) { cfp *fp = pg_malloc(sizeof(cfp)); - if (compression != 0) + if (pipecmd) + { + char cmd[MAXPGPATH]; + char pmode[2]; + + if ( !(mode[0] == 'r' || mode[0] == 'w') ) { + exit_horribly(modulename, "Pipe does not support mode %s", mode); + } + pmode[0] = mode[0]; + pmode[1] = '\0'; + + expand_shell_command(cmd, MAXPGPATH, pipecmd, path); + + fp->compressedfp = NULL; + fp->uncompressedfp = popen(cmd, pmode); + if (fp->uncompressedfp == NULL) + { + free_keep_errno(fp); + fp->uncompressedfp = NULL; + } + } + else if (compression != 0) { #ifdef HAVE_LIBZ if (compression != Z_DEFAULT_COMPRESSION) @@ -731,5 +757,54 @@ hasSuffix(const char *filename, const char *suffix) suffix, suffixlen) == 0; } - #endif + +/* + * Expand a shell command + * + * Replaces %p in cmd with the path in filepath and writes the result to buf. + */ +static void +expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath) +{ + char *dp; + char *endp; + const char *sp; + + dp = buf; + endp = buf + bufsize - 1; + *endp = '\0'; + + for (sp = cmd; *sp; sp++) + { + if (*sp == '%') + { + switch (sp[1]) + { +case 'p': + /* %p: absolute path of file */ + sp++; + strlcpy(dp, filepath, endp - dp); + dp += strlen(dp); + break; +case '%': + /* convert %% to a single % */ + sp++; + if (dp < endp) + *dp++ = *sp; + break; +default: + /* otherwise treat the %
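For illustration, usage of the proposed option might look roughly like this (the gpg flags, key name and paths are assumptions; with the patch as posted, the write-mode command receives the data on stdin and writes to %p, while the read-mode command reads %p and writes to stdout):

    $ pg_dump -Fd -j 4 --pipe 'gpg --encrypt --recipient backup-key --output %p' -f /backups/mydb.dir mydb
    $ pg_restore -Fd -j 4 --pipe 'gpg --decrypt %p' -d mydb /backups/mydb.dir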
Re: Odd 9.4, 9.3 buildfarm failure on s390x
On 09/29/2018 01:36 AM, Andrew Gierth wrote: Mark> What should I try next? What is the size of a C "int" on this platform? 4. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: speeding up planning with partitions
On Fri, Sep 14, 2018 at 3:58 PM, Amit Langote wrote: > Thanks a lot for your detailed review. > I was going through your patch (v3-0002) and I have some suggestions 1. - if (nparts > 0) + /* + * For partitioned tables, we just allocate space for RelOptInfo's. + * pointers for all partitions and copy the partition OIDs from the + * relcache. Actual RelOptInfo is built for a partition only if it is + * not pruned. + */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { rel->part_rels = (RelOptInfo **) - palloc(sizeof(RelOptInfo *) * nparts); + palloc0(sizeof(RelOptInfo *) * rel->nparts); + return rel; + } I think we can delay allocating memory for rel->part_rels? And we can allocate in add_rel_partitions_to_query only for those partitions which survive pruning. 2. add_rel_partitions_to_query ... + /* Expand the PlannerInfo arrays to hold new partition objects. */ + num_added_parts = scan_all_parts ? rel->nparts : + bms_num_members(partindexes); + new_size = root->simple_rel_array_size + num_added_parts; + root->simple_rte_array = (RangeTblEntry **) + repalloc(root->simple_rte_array, + sizeof(RangeTblEntry *) * new_size); + root->simple_rel_array = (RelOptInfo **) + repalloc(root->simple_rel_array, + sizeof(RelOptInfo *) * new_size); + if (root->append_rel_array) + root->append_rel_array = (AppendRelInfo **) + repalloc(root->append_rel_array, + sizeof(AppendRelInfo *) * new_size); + else + root->append_rel_array = (AppendRelInfo **) + palloc0(sizeof(AppendRelInfo *) * + new_size); Instead of re-pallocing for every partitioned relation can't we first count the total number of surviving partitions and repalloc at once. 3. + /* + * And do prunin. Note that this adds AppendRelInfo's of only the + * partitions that are not pruned. + */ prunin/pruning -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Postgres 11 release notes
On Sat, Sep 29, 2018 at 04:28:13PM +0900, Amit Langote wrote: > > The default partition can store rows that don't match any of the > > other defined partitions, and is searched accordingly. > > I was commenting on the Jonathan's patch upthread [1] whereby he > proposed to add a line about the default partition feature in the > major partitioning enhancements section at the top. > > The existing text that you'd added when creating the release notes is fine. > > > The press release? > > Yeah, Alvaro pointed out that the press release has inaccurate text > about the default partition, but that's a separate issue, which I was > unaware of. Oh, OK, thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Adding pipe support to pg_dump and pg_restore
Greetings, * David Hedberg (david.hedb...@gmail.com) wrote: > I recently wanted a way to encrypt/decrypt backups while still > utilizing the parallel dump/restore functionality. I couldn't see a > way to do this so I experimented a bit with the directory backup > format. If there's in fact already a way to do this, please tell me > now :-) Supporting encryption/decryption is certainly a good idea but I'm not sure that we want to punt like this and expect the user to provide a shell script or similar to do it. I would have thought we'd build in encryption leveraging openssl (and, ideally, other providers, similar to what we're working to do with SSL) directly. > The idea is to add a --pipe option to pg_dump / pg_restore where you > can specify a custom shell command that is used to write / read each > .dat-file. Usage examples include encryption with pgp and/or custom > compression pipelines. %p in the command is expanded to the path to > write to / read from. The pipe command is not applied to the toc. I would certainly think that we'd want to have support for custom format dumps too.. > The current version is attached. Could something like this be > acceptable for inclusion? At least for my 2c, I'm not completely against it, but I'd much rather see us providing encryption directly and for all of the formats we support, doing intelligent things like encrypting the TOC for a custom format dump independently so we can still support fast restore of individual objects and such. I'm also not entirely sure about how well this proposed approach would work on Windows.. Thanks! Stephen signature.asc Description: PGP signature
Cygwin linking rules
Most of the buildfarm is now happy with the changes I made to have libpq + ecpg get src/port and src/common files via libraries ... but lorikeet isn't. It gets through the core regression tests fine (so libpq, per se, works), but contrib/dblink fails: ! ERROR: could not establish connection ! DETAIL: libpq is incorrectly linked to backend functions What this means is that libpq is calling the backend version of src/common/link-canary.c rather than the frontend version. Why would it do the right thing normally and the wrong thing in dblink? I can think of a few theories but I lack the ability to investigate: 1. Maybe the dblink.dll build is pulling in libpq.a rather than establishing a reference to libpq.dll. If so, the wrong things would happen because libpq.a won't contain the src/common/ files that libpq needs. (It seems like libpq.a is an active hazard given this. Why are we building it at all?) 2. Maybe we need a --version-script option or something equivalent to get libpq.dll's references to be preferentially resolved internally rather than to the backend. But this doesn't really explain why it worked properly before. regards, tom lane
Re: Adding pipe support to pg_dump and pg_restore
Stephen Frost writes: > * David Hedberg (david.hedb...@gmail.com) wrote: >> The idea is to add a --pipe option to pg_dump / pg_restore where you >> can specify a custom shell command that is used to write / read each >> .dat-file. Usage examples include encryption with pgp and/or custom >> compression pipelines. %p in the command is expanded to the path to >> write to / read from. The pipe command is not applied to the toc. > I would certainly think that we'd want to have support for custom format > dumps too.. This seems like rather a kluge :-(. In the context of encrypted dumps in particular, I see no really safe way to pass an encryption key down to the custom command --- either you put it in the command line to be exec'd, or you put it in the process environment, and neither of those are secure on all platforms. The assumption that the TOC doesn't need encryption seems pretty shaky as well. So I think we'd be better off proceeding as Stephen envisions. Maybe there are use-cases for the sort of thing David is proposing, but I don't think encrypted dumps present a good argument for it. regards, tom lane
Re: Online verification of checksums
On 09/29/2018 02:14 PM, Stephen Frost wrote: > Greetings, > > * Michael Paquier (mich...@paquier.xyz) wrote: >> On Sat, Sep 29, 2018 at 10:51:23AM +0200, Tomas Vondra wrote: >>> One more thought - when running similar tools on a live system, it's >>> usually a good idea to limit the impact by throttling the throughput. As >>> the verification runs in an independent process it can't reuse the >>> vacuum-like cost limit directly, but perhaps it could do something >>> similar? Like, limit the number of blocks read/second, or so? >> >> When it comes to such parameters, not using a number of blocks but >> throttling with a value in bytes (kB or MB of course) speaks more to the >> user. The past experience with checkpoint_segments is one example of >> that. Converting that to a number of blocks internally would definitely >> make sense the most sense. +1 for this idea. > > While I agree this would be a nice additional feature to have, it seems > like something which could certainly be added later and doesn't > necessairly have to be included in the initial patch. If Michael has > time to add that, great, if not, I'd rather have this as-is than not. > True, although I don't think it'd be particularly difficult. > I do tend to agree with Michael that having the parameter be specified > as (or at least able to accept) a byte-based value is a good idea. Sure, I was not really expecting it to be exposed as raw block count. I agree it should be in byte-based values (i.e. just like --max-rate in pg_basebackup). > As another feature idea, having this able to work in parallel across > tablespaces would be nice too. I can certainly imagine some point where > this is a default process which scans the database at a slow pace across > all the tablespaces more-or-less all the time checking for corruption. > Maybe, but that's certainly a non-trivial feature. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Adding pipe support to pg_dump and pg_restore
On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote: > Stephen Frost writes: > > * David Hedberg (david.hedb...@gmail.com) wrote: > >> The idea is to add a --pipe option to pg_dump / pg_restore where > >> you can specify a custom shell command that is used to write / > >> read each .dat-file. Usage examples include encryption with pgp > >> and/or custom compression pipelines. %p in the command is > >> expanded to the path to write to / read from. The pipe command is > >> not applied to the toc. > > > I would certainly think that we'd want to have support for custom > > format dumps too.. > > This seems like rather a kluge :-(. In the context of encrypted > dumps in particular, I see no really safe way to pass an encryption > key down to the custom command --- either you put it in the command > line to be exec'd, or you put it in the process environment, and > neither of those are secure on all platforms. As I understand it, those are the options for providing secrets in general. At least in the case of encryption, one good solution would be to use an asymmetric encryption scheme, i.e. one where encrypting doesn't expose a secret in any way. As to decryption, that's generally done with more caution in environments where things are being routinely encrypted in the first place. > The assumption that the TOC doesn't need encryption seems pretty > shaky as well. That it does. > So I think we'd be better off proceeding as Stephen envisions. > Maybe there are use-cases for the sort of thing David is proposing, > but I don't think encrypted dumps present a good argument for it. Dumping over a network seems like a reasonable use case for this. I know that we have remote ways to do this, but in some environments--think FedRAMP, or similar compliance regime--setting up a remote access to do the dump can cause extra headaches. Being able to encrypt them in the process would be helpful in situations I've seen in the past week. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Online enabling of checksums
On 09/29/2018 02:19 PM, Stephen Frost wrote: > Greetings, > > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: >> While looking at the online checksum verification patch (which I guess >> will get committed before this one), it occurred to me that disabling >> checksums may need to be more elaborate, to protect against someone >> using the stale flag value (instead of simply switching to "off" >> assuming that's fine). >> >> The signals etc. seem good enough for our internal stuff, but what if >> someone uses the flag in a different way? E.g. the online checksum >> verification runs as an independent process (i.e. not a backend) and >> reads the control file to find out if the checksums are enabled or not. >> So if we just switch from "on" to "off" that will break. >> >> Of course, we may also say "Don't disable checksums while online >> verification is running!" but that's not ideal. > > I'm not really sure what else we could say here..? I don't particularly > see an issue with telling people that if they disable checksums while > they're running a tool that's checking the checksums that they're going > to get odd results. > I don't know, to be honest. I was merely looking at the online verification patch and realized that if someone disables checksums it won't notice it (because it only reads the flag once, at the very beginning) and will likely produce bogus errors. Although, maybe it won't - it now uses a checkpoint LSN, so that might fix it. The checkpoint LSN is read from the same controlfile as the flag, so we know the checksums were enabled during that checkpoint. So if we ignore failures with a newer LSN, that should do the trick, no? So perhaps that's the right "protocol" to handle this? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Cygwin linking rules
On 09/29/2018 11:35 AM, Tom Lane wrote: Most of the buildfarm is now happy with the changes I made to have libpq + ecpg get src/port and src/common files via libraries ... but lorikeet isn't. It gets through the core regression tests fine (so libpq, per se, works), but contrib/dblink fails: ! ERROR: could not establish connection ! DETAIL: libpq is incorrectly linked to backend functions What this means is that libpq is calling the backend version of src/common/link-canary.c rather than the frontend version. Why would it do the right thing normally and the wrong thing in dblink? I can think of a few theories but I lack the ability to investigate: 1. Maybe the dblink.dll build is pulling in libpq.a rather than establishing a reference to libpq.dll. If so, the wrong things would happen because libpq.a won't contain the src/common/ files that libpq needs. (It seems like libpq.a is an active hazard given this. Why are we building it at all?) 2. Maybe we need a --version-script option or something equivalent to get libpq.dll's references to be preferentially resolved internally rather than to the backend. But this doesn't really explain why it worked properly before. I will see if I can determine if 1) is the cause. I don't know enough, or in fact anything, about 2), so don't know that I can help there without advice. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Adding pipe support to pg_dump and pg_restore
On Sat, Sep 29, 2018 at 5:56 PM, David Fetter wrote: > On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote: >> Stephen Frost writes: >> > * David Hedberg (david.hedb...@gmail.com) wrote: >> >> The idea is to add a --pipe option to pg_dump / pg_restore where >> >> you can specify a custom shell command that is used to write / >> >> read each .dat-file. Usage examples include encryption with pgp >> >> and/or custom compression pipelines. %p in the command is >> >> expanded to the path to write to / read from. The pipe command is >> >> not applied to the toc. >> >> > I would certainly think that we'd want to have support for custom >> > format dumps too.. >> >> This seems like rather a kluge :-(. In the context of encrypted >> dumps in particular, I see no really safe way to pass an encryption >> key down to the custom command --- either you put it in the command >> line to be exec'd, or you put it in the process environment, and >> neither of those are secure on all platforms. > > As I understand it, those are the options for providing secrets in > general. At least in the case of encryption, one good solution would > be to use an asymmetric encryption scheme, i.e. one where encrypting > doesn't expose a secret in any way. > > As to decryption, that's generally done with more caution in > environments where things are being routinely encrypted in the first > place. > Yes; in my specific case the idea is to use public key encryption with gpg. In that scenario the secret does not need to be on the server at all. >> The assumption that the TOC doesn't need encryption seems pretty >> shaky as well. > > That it does. > I don't think there's any inherent reason it can't be applied to the TOC as well. It's mostly an accident of me following the existing compression code. On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote: > At least for my 2c, I'm not completely against it, but I'd much rather > see us providing encryption directly and for all of the formats we > support, doing intelligent things like encrypting the TOC for a custom > format dump independently so we can still support fast restore of > individual objects and such. I'm also not entirely sure about how well > this proposed approach would work on Windows.. I haven't tested it in Windows, but I did see that there's already a popen function in src/port/system.c so my guess was going to be that it can work.. Generally, my thinking is that this can be pretty useful in general besides encryption. For other formats the dumps can already be written to standard output and piped through for example gpg or a custom compression application of the administrator's choice, so in a sense this functionality would merely add the same feature to the directory format. My main wish here is to be able to combine a parallel dump/restore with encryption without having to first write the dump unencrypted and then loop over and rewrite the files encrypted in an extra step. This can surely be quite a large win as the size of the dumps grows larger.. / David
Re: Implementing SQL ASSERTION
On 26 Sep 2018, at 12:36, Peter Eisentraut wrote: > > On 25/09/2018 01:04, Joe Wildish wrote: >> Having said all that: there are obviously going to be some expressions >> that cannot be proven to have no potential for invalidating the assertion >> truth. I guess this is the prime concern from a concurrency PoV? > > Before we spend more time on this, I think we need to have at least a > plan for that. Having thought about this some more: the answer could lie in using predicate locks, and enforcing that the transaction be SERIALIZABLE whenever an ASSERTION is triggered. To make use of the predicate locks we'd do a transformation on the ASSERTION expression. I believe that there is derivation, similar to the one mentioned up-thread re: "managers and administrators", that would essentially push predicates into the expression on the basis of the changed data. The semantics of the expression would remain unchanged, but it would mean that when the expression is rechecked, the minimal set of data is read and would therefore not conflict with other DML statements that had triggered the same ASSERTION but had modified unrelated data. Example: CREATE TABLE t (n INTEGER NOT NULL, m INTEGER NOT NULL, k INTEGER NOT NULL, PRIMARY KEY (n, m)); CREATE ASSERTION sum_k_at_most_10 CHECK (NOT EXISTS (SELECT * FROM (SELECT n, sum(k) FROM t GROUP BY n) AS r(n, ks) WHERE ks > 10)); On an INSERT/DELETE/UPDATE of "t", we would transform the inner-most expression of the ASSERTION to have a predicate of "WHERE n = NEW.n". In my experiments I can see that doing so allows concurrent transactions to COMMIT that have modified unrelated segments of "t" (assuming the planner uses Index Scan). The efficacy of this would be dictated by the granularity of the SIREAD locks; my understanding is that this can be as low as tuple-level in the case where Index Scans are used (and this is borne out in my experiments - ie. you don't want a SeqScan). > Perhaps we could should disallow cases that we can't > handle otherwise. But even that would need some analysis of which > practical cases we can and cannot handle, how we could extend support in > the future, etc. The optimisation I mentioned up-thread, plus the one hypothesised here, both rely on being able to derive the key of an expression from the underlying base tables/other expressions. We could perhaps disallow ASSERTIONS that don't have such properties? Beyond that I think it starts to get difficult (impossible?) to know which expressions are likely to be costly on the basis of static analysis. It could be legitimate to have an ASSERTION defined over what turns out to be a small subset of a very large table, for example. -Joe
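To make the transformation concrete, on an INSERT into "t" the recheck for the example above would then evaluate something like this (a sketch of the rewritten condition; NEW.n stands for the inserted row's n):

    NOT EXISTS (SELECT *
                FROM (SELECT n, sum(k)
                      FROM t
                      WHERE n = NEW.n      -- pushed-down predicate
                      GROUP BY n) AS r(n, ks)
                WHERE ks > 10)

With the primary key on (n, m), the planner can satisfy this with an Index Scan over just the affected group, which is what keeps the SIREAD locks narrow and lets unrelated concurrent transactions commit.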
Re: Implementing SQL ASSERTION
Hi Andrew, On 25 Sep 2018, at 01:51, Andrew Gierth wrote: > I haven't looked at the background of this, but if what you want to know > is whether the aggregate function has the semantics of min() or max() > (and if so, which) then the place to look is pg_aggregate.aggsortop. Thanks for the pointer. I've had a quick look at pg_aggregate, and back at my code, but I think there is more to it than just the sorting property. Specifically we need to know about the aggregate function when combined with connectors <, <=, < ANY, <= ANY, < ALL and <= ALL (and their equivalents with ">" and ">="). Also, it looks like COUNT and SUM don't have a sortop (the other aggregates I've catered for do though). When I come to do the rework of the patch I'll take a more in-depth look though, and see if this can be utilised. > As for operators, you can only make assumptions about their meaning if > the operator is a member of some opfamily that assigns it some > semantics. I had clocked the BT semantics stuff when doing the PoC patch. I have used the "get_op_btree_interpretation" function for determining operator meaning. -Joe
Re: Online enabling of checksums
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 09/29/2018 02:19 PM, Stephen Frost wrote: > > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > >> While looking at the online checksum verification patch (which I guess > >> will get committed before this one), it occurred to me that disabling > >> checksums may need to be more elaborate, to protect against someone > >> using the stale flag value (instead of simply switching to "off" > >> assuming that's fine). > >> > >> The signals etc. seem good enough for our internal stuff, but what if > >> someone uses the flag in a different way? E.g. the online checksum > >> verification runs as an independent process (i.e. not a backend) and > >> reads the control file to find out if the checksums are enabled or not. > >> So if we just switch from "on" to "off" that will break. > >> > >> Of course, we may also say "Don't disable checksums while online > >> verification is running!" but that's not ideal. > > > > I'm not really sure what else we could say here..? I don't particularly > > see an issue with telling people that if they disable checksums while > > they're running a tool that's checking the checksums that they're going > > to get odd results. > > I don't know, to be honest. I was merely looking at the online > verification patch and realized that if someone disables checksums it > won't notice it (because it only reads the flag once, at the very > beginning) and will likely produce bogus errors. > > Although, maybe it won't - it now uses a checkpoint LSN, so that might > fix it. The checkpoint LSN is read from the same controlfile as the > flag, so we know the checksums were enabled during that checkpoint. Soi > if we ignore failures with a newer LSN, that should do the trick, no? > > So perhaps that's the right "protocol" to handle this? I certainly don't think we need to do anything more. Thanks! Stephen signature.asc Description: PGP signature
Re: Adding pipe support to pg_dump and pg_restore
Greetings, * David Hedberg (david.hedb...@gmail.com) wrote: > On Sat, Sep 29, 2018 at 5:56 PM, David Fetter wrote: > > On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote: > > As I understand it, those are the options for providing secrets in > > general. At least in the case of encryption, one good solution would > > be to use an asymmetric encryption scheme, i.e. one where encrypting > > doesn't expose a secret in any way. > > > > As to decryption, that's generally done with more caution in > > environments where things are being routinely encrypted in the first > > place. > > Yes; in my specific case the idea is to use public key encryption with > gpg. In that scenario the secret does not need to be on the server at > all. Using public key encryption doesn't mean you get to entirely avoid the question around how to handle secrets- you'll presumably want to actually restore the dump at some point. > On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote: > > At least for my 2c, I'm not completely against it, but I'd much rather > > see us providing encryption directly and for all of the formats we > > support, doing intelligent things like encrypting the TOC for a custom > > format dump independently so we can still support fast restore of > > individual objects and such. I'm also not entirely sure about how well > > this proposed approach would work on Windows.. > > I haven't tested it in windows, but I did see that there's already a > popen function in src/port/system.c so my guess was going to be that > it can work.. Perhaps, though these things tend to be trickier on Windows, at least from what I've seen (I'm no Windows dev myself tho, to be clear). > Generally, my thinking is that this can be pretty useful in general > besides encryption. For other formats the dumps can already be written > to standard output and piped through for example gpg or a custom > compression application of the administrators choice, so in a sense > this functionality would merely add the same feature to the directory > format. That's certainly not the same though. One of the great advantages of custom and directory format dumps is the TOC and the ability to selectively extract data from them without having to read the entire dump file. You end up losing that if you have to pass the entire dump through something else because you're using the pipe. > My main wish here is to be able combine a parallel dump/restore with > encryption without having to first write the dump encrypted and then > loop over and rewrite the files encrypted in an extra step. This can > surely be quite a large win as the size of the dumps grow larger.. That's great, and I think we agree that it'd be a very nice feature for pg_dump/restore to support encryption, but done intelligently, across the formats that pg_dump supports, with a secure way to pass the secrets. Thanks! Stephen signature.asc Description: PGP signature
Re: Cygwin linking rules
On 09/29/2018 12:09 PM, Andrew Dunstan wrote: On 09/29/2018 11:35 AM, Tom Lane wrote: Most of the buildfarm is now happy with the changes I made to have libpq + ecpg get src/port and src/common files via libraries ... but lorikeet isn't. It gets through the core regression tests fine (so libpq, per se, works), but contrib/dblink fails: ! ERROR: could not establish connection ! DETAIL: libpq is incorrectly linked to backend functions What this means is that libpq is calling the backend version of src/common/link-canary.c rather than the frontend version. Why would it do the right thing normally and the wrong thing in dblink? I can think of a few theories but I lack the ability to investigate: 1. Maybe the dblink.dll build is pulling in libpq.a rather than establishing a reference to libpq.dll. If so, the wrong things would happen because libpq.a won't contain the src/common/ files that libpq needs. (It seems like libpq.a is an active hazard given this. Why are we building it at all?) 2. Maybe we need a --version-script option or something equivalent to get libpq.dll's references to be preferentially resolved internally rather than to the backend. But this doesn't really explain why it worked properly before. I will see if I can determine if 1) is the cause. I don't know enough, or in fact anything, about 2), so don't know that I can help there without advice. It certainly looks like it's not linked to libpq.dll: Microsoft (R) COFF/PE Dumper Version 14.15.26726.0 Copyright (C) Microsoft Corporation. All rights reserved. Dump of file \cygwin64\home\andrew\\bf64\root\HEAD\inst\lib\postgresql\dblink.dll File Type: DLL Image has the following dependencies: postgres.exe cygcrypto-1.0.0.dll cygwin1.dll cygssl-1.0.0.dll KERNEL32.dll I'll build an earlier version to do a comparison just to make sure we're seeing the right thing. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Implementing SQL ASSERTION
Hi David, > On 26 Sep 2018, at 19:47, David Fetter wrote: > >> Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". > > So would DELETE(t), assuming n can be negative. Oops, right you are. Bug in my implementation :-) > Is there some interesting and fairly easily documented subset of > ASSERTIONs that wouldn't have the "can't prove" property? We can certainly know at the time the ASSERTION is created if we can use the transition table optimisation, as that relies upon the expression being written in such a way that a key can be derived for each expression. We could warn or disallow the creation on that basis. Ceri & Widom mention this actually in their papers, and their view is that most real-world use cases do indeed allow themselves to be optimised using the transition tables. -Joe
Re: Adding pipe support to pg_dump and pg_restore
Hi, On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost wrote: > Greetings, > > * David Hedberg (david.hedb...@gmail.com) wrote: >> On Sat, Sep 29, 2018 at 5:56 PM, David Fetter wrote: >> > On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote: >> > As I understand it, those are the options for providing secrets in >> > general. At least in the case of encryption, one good solution would >> > be to use an asymmetric encryption scheme, i.e. one where encrypting >> > doesn't expose a secret in any way. >> > >> > As to decryption, that's generally done with more caution in >> > environments where things are being routinely encrypted in the first >> > place. >> >> Yes; in my specific case the idea is to use public key encryption with >> gpg. In that scenario the secret does not need to be on the server at >> all. > > Using public key encryption doesn't mean you get to entirely avoid the > question around how to handle secrets- you'll presumably want to > actually restore the dump at some point. > You are right of course. But I don't see how it's more difficult to pass the secret to the piped commands than it is to pass it to postgres. You wouldn't want to pass the secrets as options to the commands of course. In the case of gpg you would probably let gpg store and handle them, which seems to me about the same as letting postgres store them. >> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote: >> Generally, my thinking is that this can be pretty useful in general >> besides encryption. For other formats the dumps can already be written >> to standard output and piped through for example gpg or a custom >> compression application of the administrators choice, so in a sense >> this functionality would merely add the same feature to the directory >> format. > > That's certainly not the same though. One of the great advantages of > custom and directory format dumps is the TOC and the ability to > selectively extract data from them without having to read the entire > dump file. You end up losing that if you have to pass the entire dump > through something else because you're using the pipe. > I can maybe see the problem here, but I apologize if I'm missing the point. Since all the files are individually passed through separate instances of the pipe, they can also be individually restored. I guess the --list option could be (adapted to be) used to produce a clear text TOC to further use in selective decryption of the rest of the archive? Possibly combined with an option to not apply the pipeline commands to the TOC during dump and/or restore, if there's any need for that. I do think that I understand the advantages of having a TOC that describes the exact format of the dump and how to restore it, and I am in no way arguing against having encryption included natively in the format as a default option. But I think the pipe option, or one like it, could be used to easily extend the format. Easily supporting a different compression algorithm, a different encryption method or even a different storage method like uploading the files directly to a bucket in S3. In this way I think that it's similar to being able to write the other formats to stdout; there are probably many different usages of it out there, including custom compression or encryption. If this is simply outside the scope of the directory or the custom format, that is certainly understandable (and, to me, somewhat regrettable :-) ). Thank you for the answers, David
Re: Adding pipe support to pg_dump and pg_restore
Hi, On 2018-09-29 14:51:33 +0200, David Hedberg wrote: > I recently wanted a way to encrypt/decrypt backups while still > utilizing the parallel dump/restore functionality. I couldn't see a > way to do this so I experimented a bit with the directory backup > format. If there's in fact already a way to do this, please tell me > now :-) > > The idea is to add a --pipe option to pg_dump / pg_restore where you > can specify a custom shell command that is used to write / read each > .dat-file. Usage examples include encryption with pgp and/or custom > compression pipelines. %p in the command is expanded to the path to > write to / read from. The pipe command is not applied to the toc. > > The current version is attached. Could something like this be > acceptable for inclusion? Isn't that a bit unsatisfying because information about the tables and their sizes leaks? My suspicion is, and was, that we're probably at some point going to want a format that supports the features the directory format does, without requiring writing to multiple files... - Andres
Re: Adding pipe support to pg_dump and pg_restore
Greetings, * David Hedberg (david.hedb...@gmail.com) wrote: > On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost wrote: > > * David Hedberg (david.hedb...@gmail.com) wrote: > >> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote: > >> Generally, my thinking is that this can be pretty useful in general > >> besides encryption. For other formats the dumps can already be written > >> to standard output and piped through for example gpg or a custom > >> compression application of the administrators choice, so in a sense > >> this functionality would merely add the same feature to the directory > >> format. > > > > That's certainly not the same though. One of the great advantages of > > custom and directory format dumps is the TOC and the ability to > > selectively extract data from them without having to read the entire > > dump file. You end up losing that if you have to pass the entire dump > > through something else because you're using the pipe. > > I can maybe see the problem here, but I apologize if I'm missing the point. > > Since all the files are individually passed through separate instances > of the pipe, they can also be individually restored. I guess the > --list option could be (adopted to be) used to produce a clear text > TOC to further use in selective decryption of the rest of the archive? This can work for directory format, but it wouldn't work for custom format. For a custom format dump, we'd need a way to encrypt the TOC independently of the rest, and we might even want to have the TOC include individual keys for the different objects or similar. > Possibly combined with an option to not apply the pipeline commands to > the TOC during dump and/or restore, if there's any need for that. That certainly doesn't seem to make things simpler or to be a very good interface. > But I think the pipe option, or one like it, could be used to easily > extend the format. Easily supporting a different compression > algorithm, a different encryption method or even a different storage > method like uploading the files directly to a bucket in S3. In this > way I think that it's similar to be able to write the other formats to > stdout; there are probably many different usages of it out there, > including custom compression or encryption. Considering the difficulty in doing selective restores (one of the primary reasons for doing a logical dump at all, imv) from a dump file that has to be completely decrypted or decompressed (due to using a custom compression method), I don't know that I really buy off on this argument that it's very commonly done or that it's a particularly good interface to use. > If this is simply outside the scope of the directory or the custom > format, that is certainly understandable (and, to me, somewhat > regrettable :-) ). What I think isn't getting through is that while this is an interesting approach, it really isn't a terribly good one, regardless of how flexible you view it to be. The way to move this forward seems pretty clearly to work on adding generalized encryption support to pg_dump/restore that doesn't depend on calling external programs underneath of the directory format with a pipe. Thanks! Stephen signature.asc Description: PGP signature
Re: Cygwin linking rules
Am 29.09.2018 um 19:03 schrieb Andrew Dunstan: On 09/29/2018 12:09 PM, Andrew Dunstan wrote: On 09/29/2018 11:35 AM, Tom Lane wrote: Most of the buildfarm is now happy with the changes I made to have libpq + ecpg get src/port and src/common files via libraries ... but lorikeet isn't. It gets through the core regression tests fine (so libpq, per se, works), but contrib/dblink fails: ! ERROR: could not establish connection ! DETAIL: libpq is incorrectly linked to backend functions What this means is that libpq is calling the backend version of src/common/link-canary.c rather than the frontend version. Why would it do the right thing normally and the wrong thing in dblink? I can think of a few theories but I lack the ability to investigate: 1. Maybe the dblink.dll build is pulling in libpq.a rather than establishing a reference to libpq.dll. If so, the wrong things would happen because libpq.a won't contain the src/common/ files that libpq needs. (It seems like libpq.a is an active hazard given this. Why are we building it at all?) 2. Maybe we need a --version-script option or something equivalent to get libpq.dll's references to be preferentially resolved internally rather than to the backend. But this doesn't really explain why it worked properly before. I will see if I can determine if 1) is the cause. I don't know enough, or in fact anything, about 2), so don't know that I can help there without advice. It certainly looks like it's not linked to libpq.dll: Microsoft (R) COFF/PE Dumper Version 14.15.26726.0 Copyright (C) Microsoft Corporation. All rights reserved. Dump of file \cygwin64\home\andrew\\bf64\root\HEAD\inst\lib\postgresql\dblink.dll File Type: DLL Image has the following dependencies: postgres.exe cygcrypto-1.0.0.dll cygwin1.dll cygssl-1.0.0.dll KERNEL32.dll I'll build an earlier version to do a comparison just to make sure we're seeing the right thing. cheers andrew Building from git and using the attached patch, which is used for all Cygwin packages, on the latest Cygwin: $ uname -svrm CYGWIN_NT-10.0 2.11.1(0.329/5/3) 2018-09-05 10:24 x86_64 I do not see the problem == creating database "contrib_regression" == CREATE DATABASE ALTER DATABASE == running regression test queries== test paths... ok test dblink ... ok == shutting down postmaster == $ objdump -x usr/lib/postgresql/dblink.dll |grep "DLL Name:" DLL Name: postgres.exe DLL Name: cygpq-5.dll DLL Name: cygwin1.dll DLL Name: KERNEL32.dll I am wondering if I am testing the same thing --- $ git log |head commit 8bddc864000f56d396621d4ad0f13e8e1872ddf5 Author: Stephen Frost Date: Fri Sep 28 19:04:50 2018 -0400 Add application_name to connection authorized msg The connection authorized message has quite a bit of useful information in it, but didn't include the application_name (when provided), so let's add that as it can be very useful. ---
--- origsrc/postgresql-9.4.2/src/Makefile.shlib 2015-05-20 00:33:58.0 +0200 +++ src/Makefile.shlib 2015-05-27 23:01:09.379468300 +0200 @@ -267,7 +267,7 @@ endif ifeq ($(PORTNAME), cygwin) LINK.shared = $(CC) -shared ifdef SO_MAJOR_VERSION -shlib = cyg$(NAME)$(DLSUFFIX) +shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX) endif haslibarule = yes endif @@ -359,12 +359,9 @@ ifeq ($(PORTNAME), cygwin) # Cygwin case $(shlib): $(OBJS) | $(SHLIB_PREREQS) - $(CC) $(CFLAGS) -shared -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE) + $(CC) $(CFLAGS) -shared -o $@ -Wl,--out-implib=$(stlib) $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE) -$(stlib): $(OBJS) | $(SHLIB_PREREQS) - rm -f $@ - $(LINK.static) $@ $^ - $(RANLIB) $@ +$(stlib): $(shlib) ; else --- origsrc/postgresql-9.4.2/src/interfaces/libpq/Makefile 2015-05-20 00:33:58.0 +0200 +++ src/interfaces/libpq/Makefile 2015-05-27 22:56:43.193200600 +0200 @@ -45,7 +45,7 @@ OBJS += ip.o md5.o OBJS += encnames.o wchar.o ifeq ($(PORTNAME), cygwin) -override shlib = cyg$(NAME)$(DLSUFFIX) +override shlib = cyg$(NAME)-$(SO_MAJOR_VERSION)$(DLSUFFIX) endif ifeq ($(PORTNAME), win32)
Re: Cygwin linking rules
On 09/29/2018 01:03 PM, Andrew Dunstan wrote: On 09/29/2018 12:09 PM, Andrew Dunstan wrote: On 09/29/2018 11:35 AM, Tom Lane wrote: Most of the buildfarm is now happy with the changes I made to have libpq + ecpg get src/port and src/common files via libraries ... but lorikeet isn't. It gets through the core regression tests fine (so libpq, per se, works), but contrib/dblink fails: ! ERROR: could not establish connection ! DETAIL: libpq is incorrectly linked to backend functions What this means is that libpq is calling the backend version of src/common/link-canary.c rather than the frontend version. Why would it do the right thing normally and the wrong thing in dblink? I can think of a few theories but I lack the ability to investigate: 1. Maybe the dblink.dll build is pulling in libpq.a rather than establishing a reference to libpq.dll. If so, the wrong things would happen because libpq.a won't contain the src/common/ files that libpq needs. (It seems like libpq.a is an active hazard given this. Why are we building it at all?) 2. Maybe we need a --version-script option or something equivalent to get libpq.dll's references to be preferentially resolved internally rather than to the backend. But this doesn't really explain why it worked properly before. I will see if I can determine if 1) is the cause. I don't know enough, or in fact anything, about 2), so don;t know that I can help there without advice. It certainly looks like it's not linked to libpq.dll: Microsoft (R) COFF/PE Dumper Version 14.15.26726.0 Copyright (C) Microsoft Corporation. All rights reserved. Dump of file \cygwin64\home\andrew\\bf64\root\HEAD\inst\lib\postgresql\dblink.dll File Type: DLL Image has the following dependencies: postgres.exe cygcrypto-1.0.0.dll cygwin1.dll cygssl-1.0.0.dll KERNEL32.dll I'll build an earlier version to do a comparison just to make sure we're seeing the right thing. Hmm. Getting the same result from REL_10_STABLE. Not sure where to go from here. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Adding pipe support to pg_dump and pg_restore
Hi, On Sat, Sep 29, 2018 at 8:03 PM, Stephen Frost wrote: > Greetings, > > * David Hedberg (david.hedb...@gmail.com) wrote: >> On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost wrote: >> > * David Hedberg (david.hedb...@gmail.com) wrote: >> >> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote: >> >> Generally, my thinking is that this can be pretty useful in general >> >> besides encryption. For other formats the dumps can already be written >> >> to standard output and piped through for example gpg or a custom >> >> compression application of the administrators choice, so in a sense >> >> this functionality would merely add the same feature to the directory >> >> format. >> > >> > That's certainly not the same though. One of the great advantages of >> > custom and directory format dumps is the TOC and the ability to >> > selectively extract data from them without having to read the entire >> > dump file. You end up losing that if you have to pass the entire dump >> > through something else because you're using the pipe. >> >> I can maybe see the problem here, but I apologize if I'm missing the point. >> >> Since all the files are individually passed through separate instances >> of the pipe, they can also be individually restored. I guess the >> --list option could be (adopted to be) used to produce a clear text >> TOC to further use in selective decryption of the rest of the archive? > I admit that my understanding of the custom format was naive (I have never actually used it). >> If this is simply outside the scope of the directory or the custom >> format, that is certainly understandable (and, to me, somewhat >> regrettable :-) ). > > What I think isn't getting through is that while this is an interesting > approach, it really isn't a terribly good one, regardless of how > flexible you view it to be. The way to move this forward seems pretty > clearly to work on adding generalized encryption support to > pg_dump/restore that doesn't depend on calling external programs > underneath of the directory format with a pipe. I did get the message that it wasn't the optimal way of doing it, and I have now also gotten the message that it's probably not really wanted at all. Thanks you for your insights, David
Re: executor relation handling
David Rowley writes: > I've attached v10 which fixes this and aligns the WARNING text in > ExecInitRangeTable() and addRangeTableEntryForRelation(). I started poking at this. I thought that it would be a good cross-check to apply just the "front half" of 0001 (i.e., creation and population of the RTE lockmode field), and then to insert checks in each of the "back half" places (executor, plancache, etc) that the lockmodes they are computing today match what is in the RTE. Those checks fell over many times in the regression tests. There seem to be at least four distinct problems: 1. You set up transformRuleStmt to insert AccessExclusiveLock into the "OLD" and "NEW" RTEs for a view. This is surely wrong; we do not want to take exclusive lock on a view just to run a query using the view. It should (usually, anyway) just be AccessShareLock. However, because addRangeTableEntryForRelation insists that you hold the requested lock type *now*, just changing the parameter to AccessShareLock doesn't work. I hacked around this for the moment by passing NoLock to addRangeTableEntryForRelation and then changing rte->lockmode after it returns, but man that's ugly. It makes me wonder whether addRangeTableEntryForRelation should be checking the lockmode at all. I'm inclined to think we should remove the check for current lockmode in addRangeTableEntryForRelation, and instead just assert that the passed lockmode must be one of AccessShareLock, RowShareLock, or RowExclusiveLock depending on whether the RTE is meant to represent a plain source table, a FOR UPDATE/SHARE source table, or a target table. I don't think it's that helpful to be checking that the caller got exactly the same lock, especially given the number of places where the patch had to cheat already by using NoLock. 2. The "forUpdatePushedDown" override in AcquireRewriteLocks isn't getting modeled in the RTE lockmode fields. In other words, if we have something like CREATE VIEW vv AS SELECT * FROM tab1; SELECT * FROM vv FOR UPDATE OF vv; the checks fall over, because the tab1 RTE in the view's rangetable just has AccessShareLock, but we need RowShareLock. I fixed this by having AcquireRewriteLocks actually modify the RTE if it is promoting the lock level. This is kinda grotty, but we were already assuming that AcquireRewriteLocks could scribble on the query tree, so it seems safe enough. 3. There remain some cases where the RTE says RowExclusiveLock but the executor calculation indicates we only need AccessShareLock. AFAICT, this happens only when we have a DO ALSO rule that results in an added query that merely scans the target table. The RTE used for that purpose is just the original one, so it still has a lockmode suitable for a target table. We could probably hack the rewriter so that it changes the RTE lockmode back down to AccessShareLock in these cases. However, I'm inclined to think that that is something *not* to do, and that we should let the higher lockmode be used instead, for two reasons: (1) Asking for both AccessShareLock and RowExclusiveLock on the same table requires an extra trip through the shared lock manager, for little value that I can see. (2) If the DO ALSO rule is run before the main one, we'd be acquiring AccessShareLock before RowExclusiveLock, resulting in deadlock hazard due to lock upgrade. (I think this may be a pre-existing bug, although it could likely only manifest in corner cases such as where we're pulling a plan tree out of plancache. 
In most cases the first thing we'd acquire on a rule target table is RowExclusiveLock in the parser, before any rule rewriting could happen.) 4. I also notice some cases (all in FDW tests) where ExecOpenScanRelation is choosing AccessShareLock although the RTE has RowShareLock. These seem to all be cases where we're implementing FOR UPDATE/SHARE via ROW_MARK_COPY, so that this: ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true); if (erm != NULL && erm->relation != NULL) lockmode = NoLock; leaves lockmode as AccessShareLock because erm->relation is NULL. Again, I think this is probably OK, and it'd be better to use RowShareLock for consistency with other places that might open the rel. It will be a change in externally-visible behavior though. Thoughts? regards, tom lane
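To make point 3 concrete, a hypothetical rule of that shape (table and rule names invented purely for illustration):

  -- The added DO ALSO query only reads the target table "orders", yet it
  -- reuses the RTE built for the UPDATE, which is marked RowExclusiveLock.
  CREATE RULE orders_audit AS ON UPDATE TO orders
      DO ALSO INSERT INTO orders_log
          SELECT now(), count(*) FROM orders;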
Re: Cygwin linking rules
Andrew Dunstan writes: > Not sure where to go from here. What would happen if we stopped building libpq.a, so that the linker didn't have any choice about what to do? regards, tom lane
Re: MERGE SQL statement for PG12
On Mon, 24 Sep 2018 at 05:15, Pavan Deolasee wrote: > > A new version rebased against the current master is attached. > Hi Pavan, A day after you posted this patch, commit 29c94e03c7d05d2b29afa1de32795ce178531246 removed ExecStoreTuple. Am I right in believing that the change in src/backend/executor/execMerge.c should be to use ExecStoreHeapTuple? - ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false); + ExecStoreHeapTuple(&tuple, mtstate->mt_existing, false); -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Continue work on changes to recovery.conf API
Hello, thank you for the review! >> - if present both standby.signal and recovery.signal - we use standby mode >> (this design from previous thread) > > Didn't find the discussion on that and don't understand the reasons, but > seems OK and easy to change if we don't like it. I meant that this was already implemented in the previously proposed patch and I did not change it. I couldn't find the discussion on that either... >> - recovery_target (immediate), recovery_target_name, recovery_target_time, >> recovery_target_xid, recovery_target_lsn are replaced to >> recovery_target_type and recovery_target_value (was discussed and changed in >> previous thread) > > I think this was the major point of contention. I reread the old > thread, and it's still not clear why we need to change this. _type and > _value look like an EAV system to me. GUC variables should be > verifiable independent of another variable. The other idea of using > only one variable seems better, but using two variables seems like a > poor compromise between one and five. > No, they MUST be independently verifiable. The interactions between the > check_xxx functions in this patch are utterly unsafe. Understood, thank you. I will implement the current set of five recovery_target* settings and would like to leave the reorganization for further patches. I am not sure about one thing, though: all recovery_target* settings change one internal recoveryTarget variable. Will the GUC infrastructure guarantee sane behavior if the user erroneously sets multiple different recovery_target* settings? I mean, will check_* and assign_* be called in order, so that the latest config line wins? >> - trigger_file was renamed to promote_signal_file for clarify (my change, >> in prev thread was removed) > > OK to add the "promote" prefix, but I'd prefer to keep the "trigger" > name. There is a lot of discussion and knowledge around that. Seems > unnecessary to change. OK, I will change it to promote_trigger_file. >> - pg_basebackup with -R (--write-recovery-conf) option will create >> standby.signal file and append primary_conninfo and primary_slot_name (if >> was used) to postgresql.auto.conf instead of writing new recovery.conf > > I wonder if that would cause any problems. The settings in > postgresql.auto.conf are normally not PGC_POSTMASTER, otherwise you > couldn't use ALTER SYSTEM to put them there. Maybe it's OK. Actually, we can use ALTER SYSTEM to set PGC_POSTMASTER settings. I just tried "alter system set max_connections to 300;": postgresql.auto.conf was changed and it affected max_connections at the next database start. Of course we cannot reload PGC_POSTMASTER settings, but during startup we call the regular ParseConfigFile and can override PGC_POSTMASTER. regards, Sergei
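For concreteness, a sketch of what the behaviour under discussion might look like on disk (file and setting names as used in this thread; this is a proposal, not a released interface):

  # pg_basebackup -R would create an empty standby.signal in the data directory
  # and append the connection settings to postgresql.auto.conf, e.g.:
  primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
  primary_slot_name = 'standby_1'

  # A point-in-time restore would instead use recovery.signal plus one of the
  # recovery_target_* settings in postgresql.conf, e.g.:
  recovery_target_time = '2018-09-30 12:00:00+02'

  # Promotion trigger, renamed from trigger_file as agreed above:
  promote_trigger_file = '/tmp/promote_now'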
Re: libpq host/hostaddr/conninfo inconsistencies
Hello, On Fri, Aug 24, 2018 at 11:22:47AM +0200, Fabien COELHO wrote: > Attached is a rebase after 5ca00774. I looked at the patch a little bit, and I have a few notes. > However, the actual capability is slightly different: specifying an ip > address to "host" does work, without ensuing any name or reverse name > look-ups, even if this is undocumented. Agreed, the documentation could give more detail on this. > sh> psql "host=/tmp hostaddr=127.0.0.1" Yeah, this example shows that the user may be confused by the output of \conninfo. I think it is both a psql issue and a libpq issue: psql, in exec_command_conninfo(), relies only on the PQhost() result. Could we add a function PQhostType() to solve this? > sh> psql "host=127.0.0.2 hostaddr=127.0.0.1" I'm not sure that this one is an issue. The user defined the host name and psql shows it. > sh> psql "hostaddr=127.0.0.1" I cannot reproduce this. It gives me the message: You are connected to database "artur" as user "artur" on host "127.0.0.1" at port "5432". I think it is because of the environment (I didn't define the PGHOST variable, for example). If so, depending on the PGHOST value ("/tmp" or "127.0.0.1"), it reduces to the first or the second issue. > * Another issue with \conninfo is that if a host resolves to multiple ips, > there is no way to know which was chosen and/or worked, although on errors > some messages show the failing ip. Can you explain this, please? You can use PQhost() to find out the chosen host. > * The documentation about host/hostaddr/port accepting lists is really > added as an afterthought: the features are presented for one, and then the > list is mentionned. I cannot agree with you here. When I was learning libpq, I found the description of the host/hostaddr rules useful, and I disagree that it is a good idea to remove it (as the patch does). Of course this is only my point of view, and others may have another opinion. > (3) checking that hostaddr non empty addresses are only accepted if the > corresponding host is a name. The user must use the "host=ip" syntax > to connect to an ip. The patch gives me an error if I specify only hostaddr: psql -d "hostaddr=127.0.0.1" psql: host "/tmp" cannot have an hostaddr "127.0.0.1" That is wrong, because I didn't specify host=/tmp. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
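As an aside, a minimal sketch of what a client can learn from the existing API (this only shows what PQhost()/PQport() report today; it is not part of the patch, and PQhostType() does not exist):

  #include <stdio.h>
  #include <libpq-fe.h>

  int
  main(void)
  {
      PGconn *conn = PQconnectdb("host=/tmp hostaddr=127.0.0.1 dbname=postgres");

      if (PQstatus(conn) != CONNECTION_OK)
          fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
      else
          /* With this conninfo, PQhost() reports "/tmp" even though the server
           * was reached via 127.0.0.1 -- the \conninfo confusion discussed above. */
          printf("host=%s port=%s\n", PQhost(conn), PQport(conn));

      PQfinish(conn);
      return 0;
  }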
Re: executor relation handling
I wrote: > I started poking at this. I thought that it would be a good cross-check > to apply just the "front half" of 0001 (i.e., creation and population of > the RTE lockmode field), and then to insert checks in each of the > "back half" places (executor, plancache, etc) that the lockmodes they > are computing today match what is in the RTE. > Those checks fell over many times in the regression tests. Here's a version that doesn't fall over (for me), incorporating fixes as I suggested for points 1 and 2, and weakening the back-half assertions enough to let points 3 and 4 succeed. I'm inclined to commit this to see if the buildfarm can find any problems I missed. Some cosmetic changes: * I renamed the RTE field to rellockmode in the name of greppability. * I rearranged the order of the arguments for addRangeTableEntryForRelation to make it more consistent with the other addRangeTableEntryXXX functions. regards, tom lane diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 4f1d365..7dfa327 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1415,6 +1415,7 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender, rte.rtekind = RTE_RELATION; rte.relid = relId; rte.relkind = RELKIND_RELATION; /* no need for exactness here */ + rte.rellockmode = AccessShareLock; context.rtables = list_make1(list_make1(&rte)); diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 9176f62..4ad5868 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -2549,6 +2549,7 @@ AddRelationNewConstraints(Relation rel, pstate->p_sourcetext = queryString; rte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, NULL, false, true); diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index d06662b..2d5bc8a 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -833,20 +833,21 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, if (stmt->relation) { + LOCKMODE lockmode = is_from ? RowExclusiveLock : AccessShareLock; + RangeTblEntry *rte; TupleDesc tupDesc; List *attnums; ListCell *cur; - RangeTblEntry *rte; Assert(!stmt->query); /* Open and lock the relation, using the appropriate lock type. */ - rel = heap_openrv(stmt->relation, - (is_from ? RowExclusiveLock : AccessShareLock)); + rel = heap_openrv(stmt->relation, lockmode); relid = RelationGetRelid(rel); - rte = addRangeTableEntryForRelation(pstate, rel, NULL, false, false); + rte = addRangeTableEntryForRelation(pstate, rel, lockmode, + NULL, false, false); rte->requiredPerms = (is_from ? 
ACL_INSERT : ACL_SELECT); tupDesc = RelationGetDescr(rel); diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 3d82edb..d5cb62d 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -528,6 +528,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) rte->rtekind = RTE_RELATION; rte->relid = intoRelationAddr.objectId; rte->relkind = relkind; + rte->rellockmode = RowExclusiveLock; rte->requiredPerms = ACL_INSERT; for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++) diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index cee0ef9..2fd17b2 100644 --- a/src/backend/commands/policy.c +++ b/src/backend/commands/policy.c @@ -567,7 +567,9 @@ RemoveRoleFromObjectPolicy(Oid roleid, Oid classid, Oid policy_id) qual_expr = stringToNode(qual_value); /* Add this rel to the parsestate's rangetable, for dependencies */ - addRangeTableEntryForRelation(qual_pstate, rel, NULL, false, false); + addRangeTableEntryForRelation(qual_pstate, rel, + AccessShareLock, + NULL, false, false); qual_parse_rtable = qual_pstate->p_rtable; free_parsestate(qual_pstate); @@ -589,8 +591,9 @@ RemoveRoleFromObjectPolicy(Oid roleid, Oid classid, Oid policy_id) with_check_qual = stringToNode(with_check_value); /* Add this rel to the parsestate's rangetable, for dependencies */ - addRangeTableEntryForRelation(with_check_pstate, rel, NULL, false, - false); + addRangeTableEntryForRelation(with_check_pstate, rel, + AccessShareLock, + NULL, false, false); with_check_parse_rtable = with_check_pstate->p_rtable; free_parsestate(with_check_pstate); @@ -752,11 +755,13 @@ CreatePolicy(CreatePolicyStmt *stmt) /* Add for the regular security quals */ rte = addRangeTableEntryForRelation(qual_pstate, target_table, + AccessShareLock, NULL, false, false); addRTEtoQuery(qual_pstate, rte, false, true, true); /* Add for the with-check quals */ rte = addRangeTableEntryForRelation(with_check_pstate, target_table, + AccessShareLock, NULL, false
Re: [HACKERS] proposal: schema variables
On Sat, 29 Sep 2018 at 10:34, Pavel Stehule wrote: > > > On Sat, 22 Sep 2018 at 8:00, Pavel Stehule > wrote: > >> Hi >> >> rebased against yesterday changes in tab-complete.c >> > > rebased against last changes in master > + using the content of schema variables for estimation + subtransaction support I hope it now provides almost complete functionality. Please check it. Regards Pavel > > Regards > > Pavel > > > >> Regards >> >> Pavel >> > schema-variables-20180929-02.patch.gz Description: application/gzip
Re: Cygwin linking rules
On 09/29/2018 04:06 PM, Tom Lane wrote: Andrew Dunstan writes: Not sure where to go from here. What would happen if we stopped building libpq.a, so that the linker didn't have any choice about what to do? I will test Marco's patch, which I think does that, tomorrow. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Implementing SQL ASSERTION
> "Joe" == Joe Wildish writes: >> I haven't looked at the background of this, but if what you want to >> know is whether the aggregate function has the semantics of min() or >> max() (and if so, which) then the place to look is >> pg_aggregate.aggsortop. Joe> Thanks for the pointer. I've had a quick look at pg_aggregate, and Joe> back at my code, but I think there is more to it than just the Joe> sorting property. Specifically we need to know about the aggregate Joe> function when combined with connectors <, <=, < ANY, <= ANY, < ALL Joe> and <= ALL (and their equivalents with ">" and ">="). The presence of an aggsortop means "this aggregate function is interchangeable with (select x from ... order by x using OP limit 1)", with all of the semantic consequences that implies. Since OP must be the "<" or ">" member of a btree index opclass, the semantics of its relationships with other members of the same opfamily can be deduced from that. Joe> Also, it looks like COUNT and SUM don't have a sortop Right, because those currently have no semantics that PG needs to know about or describe. -- Andrew (irc:RhodiumToad)
Re: Odd 9.4, 9.3 buildfarm failure on s390x
> "Andrew" == Andrew Dunstan writes: >> What is the size of a C "int" on this platform? Andrew> 4. Hmm. Because int being more than 32 bits is the simplest explanation for this difference. How about the output of this query: with d(a) as (values ('----'::uuid), ('----'::uuid), ('3f3e3c3b-3a30-3938-3736-353433a2313e'::uuid)) select d1.a, d2.a, uuid_cmp(d1.a,d2.a) from d d1, d d2 order by d1.a asc, d2.a desc; -- Andrew (irc:RhodiumToad)
Re: libpq host/hostaddr/conninfo inconsistencies
On Sat, Aug 25, 2018 at 7:25 AM Tom Lane wrote: > Fabien COELHO writes: > > Attached is a rebase after 5ca00774. > > I notice that the cfbot thinks that *none* of your pending patches apply > successfully. I tried this one locally and what I get is > > $ patch -p1 <~/libpq-host-ip-2.patch > (Stripping trailing CRs from patch.) > patching file doc/src/sgml/libpq.sgml > (Stripping trailing CRs from patch.) > patching file src/interfaces/libpq/fe-connect.c > > as compared to the cfbot report, in which every hunk is rejected: > > === applying patch ./libpq-host-ip-2.patch > Hmm... Looks like a unified diff to me... > The text leading up to this was: > -- > |diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml > |index 5e7931ba90..086172d4f0 100644 > |--- a/doc/src/sgml/libpq.sgml > |+++ b/doc/src/sgml/libpq.sgml > -- > Patching file doc/src/sgml/libpq.sgml using Plan A... > Hunk #1 failed at 964. > Hunk #2 failed at 994. > 2 out of 2 hunks failed--saving rejects to doc/src/sgml/libpq.sgml.rej > Hmm... The next patch looks like a unified diff to me... > The text leading up to this was: > -- > |diff --git a/src/interfaces/libpq/fe-connect.c > b/src/interfaces/libpq/fe-connect.c > |index a8048ffad2..34025ba041 100644 > |--- a/src/interfaces/libpq/fe-connect.c > |+++ b/src/interfaces/libpq/fe-connect.c > -- > Patching file src/interfaces/libpq/fe-connect.c using Plan A... > Hunk #1 failed at 908. > Hunk #2 failed at 930. > Hunk #3 failed at 943. > Hunk #4 failed at 974. > Hunk #5 failed at 1004. > Hunk #6 failed at 1095. > Hunk #7 failed at 2098. > Hunk #8 failed at 2158. > Hunk #9 failed at 6138. > 9 out of 9 hunks failed--saving rejects to > src/interfaces/libpq/fe-connect.c.rej > done > > So I'm speculating that the cfbot is using a version of patch(1) that > doesn't have strip-trailing-CRs logic. Which bemuses me, because > I thought they all did. Huh. Yeah. I have now switched it over to GNU patch. It seems to be happier with Fabien's patches so far, but will take a few minutes to catch up with all of them. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] kqueue
On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati wrote: > On 28/09/2018 14:19, Thomas Munro wrote: > > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund wrote: > >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: > >>> Matteo Beccati reported a 5-10% performance drop on a > >>> low-end Celeron NetBSD box which we have no explanation for, and we > >>> have no reports from server-class machines on that OS -- so perhaps we > >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on > >>> NetBSD until someone can figure out what needs to be fixed there > >>> (possibly on the NetBSD side)? > >> > >> Yea, I'm not too worried about that. It'd be great to test that, but > >> otherwise I'm also ok to just plonk that into the template. > > > > Thanks for the review! Ok, if we don't get a better idea I'll put > > this in src/template/netbsd: > > > > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL" > > A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh > install of NetBSD 8.0 again shows that kqueue is consistently slower > running pgbench vs unpatched master on TPC-B-like pgbench workloads: > > ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10 > > while on select only benchmarks the difference is below the noise floor, > with both doing roughly the same ~30k tps. > > Out of curiosity, I've installed FreeBSD on an identically specced VM, > and the select benchmark was ~75k tps for kqueue vs ~90k tps on > unpatched master, so maybe I'm doing something wrong when > benchmarking. Could you please provide proper instructions? Ouch. What kind of virtualisation is this? Which version of FreeBSD? Not sure if it's relevant, but do you happen to see gettimeofday() showing up as a syscall, if you truss a backend running pgbench? -- Thomas Munro http://www.enterprisedb.com
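For reproducibility, the runs being compared should amount to roughly the following (client count and scale factor are taken from the numbers quoted above; the database name and run duration are assumptions, adjust as needed):

  createdb bench
  pgbench -i -s 10 bench
  pgbench -c 96 -j 96 -T 60 bench      # default TPC-B-like workload
  pgbench -S -c 96 -j 96 -T 60 bench   # select-only workload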
Re: Odd 9.4, 9.3 buildfarm failure on s390x
Andrew Gierth writes: > Because int being more than 32 bits is the simplest explanation for this > difference. Curious to hear your reasoning behind that statement? I hadn't gotten further than "memcmp is broken" ... and neither of those theories is tenable, because if they were true then a lot more things besides uuid sorting would be falling over. regards, tom lane
Re: Odd 9.4, 9.3 buildfarm failure on s390x
> "Tom" == Tom Lane writes: > Andrew Gierth writes: >> Because int being more than 32 bits is the simplest explanation for >> this difference. Tom> Curious to hear your reasoning behind that statement? I hadn't Tom> gotten further than "memcmp is broken" ... and neither of those Tom> theories is tenable, because if they were true then a lot more Tom> things besides uuid sorting would be falling over. memcmp() returns an int, and guarantees only the sign of the result, so ((int32) memcmp()) may have the wrong value if int is wider than int32. But yeah, it seems unlikely that it would break for uuid but not bytea (or text in collate C). -- Andrew (irc:RhodiumToad)
Re: SerializeParamList vs machines with strict alignment
On Mon, Sep 10, 2018 at 7:05 PM Tom Lane wrote: > > Amit Kapila writes: > > On Mon, Sep 10, 2018 at 8:58 AM Tom Lane wrote: > >> In particular, SerializeParamList does this: > >> > >> /* Write flags. */ > >> memcpy(*start_address, &prm->pflags, sizeof(uint16)); > >> *start_address += sizeof(uint16); > >> > >> immediately followed by this: > >> > >> datumSerialize(prm->value, prm->isnull, typByVal, typLen, > >> start_address); > > > datumSerialize does this: > > > memcpy(*start_address, &header, sizeof(int)); > > *start_address += sizeof(int); > > > before calling EOH_flatten_into, so it seems to me it should be 4-byte > > aligned. > > But that doesn't undo the fact that you're now on an odd halfword > boundary. In the case I observed, EA_flatten_into was trying to > store an int32 through a pointer whose hex value ended in E, which > is explained by the "+= sizeof(uint16)". > > > Yeah, I think as suggested by you, start_address should be maxaligned. > > A localized fix would be for datumSerialize to temporarily palloc the > space for EOH_flatten_into to write into, and then it could memcpy > that to whatever random address it'd been passed. Attached is a patch along these lines, let me know if you have something else in mind? I have manually verified this with force_parallel_mode=regress configuration in my development environment. I don't have access to alignment-sensitive hardware, so can't test in such an environment. I will see if I can write a test as well. > Seems kind of > inefficient though, especially since you'd likely have to do the same > thing on the receiving side. > I am not sure what exactly you have in mind for receiving side. datumRestore does below for pass-by-reference values: /* Pass-by-reference case; copy indicated number of bytes. */ Assert(header > 0); d = palloc(header); memcpy(d, *start_address, header); *start_address += header; return PointerGetDatum(d); Do we need any other special handling here? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_datum_serialization_v1.patch Description: Binary data
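For readers following along, the localized fix Tom described would look roughly like this inside datumSerialize (a sketch only, reusing the header and start_address names visible in the quoted code; eoh is an illustrative name for the expanded-object header, and the attached patch is authoritative): flatten the expanded object into a freshly palloc'd, and therefore maxaligned, buffer, then memcpy the result to the possibly-unaligned destination:

  if (eoh)
  {
      /* flatten into an aligned scratch buffer, then copy byte-wise */
      char   *tmp = palloc(header);

      EOH_flatten_into(eoh, (void *) tmp, header);
      memcpy(*start_address, tmp, header);
      pfree(tmp);
  }
  else
      memcpy(*start_address, DatumGetPointer(value), header);
  *start_address += header;

Whether the receiving side needs anything beyond the existing palloc+memcpy in datumRestore is the open question raised above.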
Re: [HACKERS] [PATCH] Generic type subscripting
On Sun, 30 Sep 2018 at 0:21, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Fri, 20 Jul 2018 at 23:32, Dmitry Dolgov <9erthali...@gmail.com> > wrote: > > > > > On Thu, 26 Apr 2018 at 16:44, Dmitry Dolgov <9erthali...@gmail.com> > wrote: > > > > > > > On 22 March 2018 at 23:25, Dmitry Dolgov <9erthali...@gmail.com> > wrote: > > > > > > > > Here is the updated version of patch, rebased after recent conflicts > and with > > > > suggested documentation improvements. > > > > > > Another rebased version of the patch. > > > > I've noticed, that I never updated llvmjit code for the arrayref > expressions, > > and it's important to do so, since the patch introduces another layer of > > flexibility. Hence here is the new version. > > Here is another rebased version, and a bit of history: the first > prototypes of > this patch were sent more than 3 years ago. Of course the patch evolved > significantly over this period, and I take it as a good sign that it wasn't > rejected and keeps moving through the commitfests. At the same time the > lack of > attention makes things a bit frustrating. I have an impression that it's > sort > of regular situation and wonder if there are any ideas (besides the well > known > advice of putting some efforts into review patches from other people, > since I'm > already doing my best and enjoying this) how to make progress in such > cases? > This feature looks nice, and it can be very useful when values inside a non-atomic type need to be updated. Regards Pavel
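As an illustration of the kind of usage the patch aims to enable (syntax per the proposed patch series, shown only as a sketch, not a released feature; table and key names are invented):

  -- With generic subscripting, a jsonb column can be read and updated with
  -- array-style syntax instead of -> and jsonb_set().
  CREATE TABLE settings (id int, params jsonb);

  SELECT params['timeout'] FROM settings WHERE id = 1;
  UPDATE settings SET params['timeout'] = '"30s"' WHERE id = 1;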