Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
On Fri, Nov 08, 2019 at 10:30:39AM +0900, Michael Paquier wrote: > Per the arguments of upthread, storing a 64-bit XID would require a > catalog change and you cannot backpatch that. I would suggest to keep > this patch focused on HEAD, and keep it as an improvement of the > existing features. Concurrent deadlock risks caused by CCI exist > since the feature came to life. Marked as returned with feedback per lack of activity; the patch had been waiting on author for a bit more than two weeks. -- Michael signature.asc Description: PGP signature
Re: pglz performance
On Wed, Nov 06, 2019 at 09:04:25AM +0100, Peter Eisentraut wrote: > OK, waiting on some independent verification of benchmark numbers. Still waiting for these after 19 days, so the patch has been marked as returned with feedback. -- Michael signature.asc Description: PGP signature
Re: log bind parameter values on error
On Thu, Nov 07, 2019 at 03:41:04PM -0800, Andres Freund wrote: > The way you do it you need to do it in numerous places, and I'm pretty > sure you're missing some already. E.g. this will not work to log > parameters for parametrized statements generated on the server side, > e.g. for foreign key queries. I don't think that's the right direction > to go. You can maybe argue that we don't need support for logging server > side generated parameters in the initial version, but the approach > should be compatible with adding that support. This patch had a review from two committers, with no updates from the author, so marked as returned with feedback. -- Michael signature.asc Description: PGP signature
Re: Avoid full GIN index scan when possible
On Sat, Nov 23, 2019 at 02:35:50AM +0300, Nikita Glukhov wrote: > Attached 8th version of the patches. Please be careful here. The CF entry was still marked as waiting on author, but you sent a new patch series which has not been reviewed. I have moved this patch to the next CF instead. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Mon, Nov 18, 2019 at 05:26:37PM -0800, Peter Geoghegan wrote: > Attached is v24. This revision doesn't fix the problem with > xl_btree_insert record bloat, but it does fix the bitrot against the > master branch that was caused by commit 50d22de9. (This patch has had > a surprisingly large number of conflicts against the master branch > recently.) Please note that I have moved this patch to the next CF per this last update. Anastasia, the ball is on your side of the field, as the CF entry has been marked as waiting on author for some time now. -- Michael signature.asc Description: PGP signature
Re: WIP: Data at rest encryption
On Wed, Sep 04, 2019 at 06:56:18AM +0200, Antonin Houska wrote: > This thread started later than our effort but important design questions are > being discussed there. So far there seems to be no consensus whether > full-instance encryption should be implemented first, so any effort spent on > this patch might get wasted. When/if there will be an agreement on the design, > we'll see how much of this patch can be used. I see. I have marked the patch as returned with feedback in CF 2019-11. Let's see how the other one finishes, before perhaps coming back to this one. -- Michael signature.asc Description: PGP signature
Re: pglz performance
> On 25 Nov 2019, at 13:03, Michael Paquier wrote: > > On Wed, Nov 06, 2019 at 09:04:25AM +0100, Peter Eisentraut wrote: >> OK, waiting on some independent verification of benchmark numbers. > > Still waiting for these after 19 days, so the patch has been marked as > returned with feedback. I think status Needs Review describes what is going on better. It's not like something is awaited from my side. Thanks. Best regards, Andrey Borodin.
Re: pglz performance
On Mon, Nov 25, 2019 at 01:21:27PM +0500, Andrey Borodin wrote: > I think status Needs Review describes what is going on better. It's > not like something is awaited from my side. Indeed. You are right so I have moved the patch instead, with "Needs review". The patch status was actually incorrect in the CF app, as it was marked as waiting on author. @Tomas: updated versions of the patches have been sent by Andrey. -- Michael signature.asc Description: PGP signature
Re: accounting for memory used for BufFile during hash joins
On Tue, Sep 10, 2019 at 03:47:51PM +0200, Tomas Vondra wrote: > My feeling is that we should get the BNLJ committed first, and then maybe > use some of those additional strategies as fallbacks (depending on which > issues are still unsolved by the BNLJ). The glacier is melting more. Tomas, what's your status here? The patch has been waiting on author for two months now. If you are not planning to work more on this one, then it should be marked as returned with feedback? -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Incomplete startup packet errors
On Thu, Mar 7, 2019 at 1:26 AM Andrew Dunstan < andrew.duns...@2ndquadrant.com> wrote: > > On 3/6/19 12:12 PM, Robert Haas wrote: > > On Tue, Mar 5, 2019 at 5:35 PM Andrew Dunstan > > wrote: > >> OK, I think we have agreement on Tom's patch. Do we want to backpatch > OK, no back-patching it is. > However, Checking whether the port is open is resulting in error log like: 2019-11-25 14:03:44.414 IST [14475] LOG: invalid length of startup packet Yes, This is different from "Incomplete startup packet" discussed here. Steps to reproduce: $ telnet localhost 5432 > >
Re: Rework manipulation and structure of attribute mappings
On Fri, Nov 22, 2019 at 4:57 PM Michael Paquier wrote: > On Fri, Nov 22, 2019 at 02:21:41PM +0900, Amit Langote wrote: > > Actually, we should also refactor > > convert_tuples_by_position() to carve out the code that builds the > > AttrMap into a separate function and move it to attmap.c. > > Not sure how to name that. One logic uses a match based on the > attribute name, and the other uses the type. So the one based on the > attribute name should be something like build_attrmap_by_name() and > the second attrmap_build_by_position()? We could use a better > convention like AttrMapBuildByPosition for example. Any suggestions > of names are welcome. Actually, I was just suggesting that we create a new function convert_tuples_by_position_map() and put the logic that compares the two TupleDescs to create the AttrMap in it, just like convert_tuples_by_name_map(). Now you could say that there would be no point in having such a function, because no caller currently wants to use such a map (that is, without the accompanying TupleConversionMap), but maybe there will be in the future. Though irrespective of that consideration, I guess you'd agree to group similar code in a single source file. Regarding coming up with any new name, having a word in the name that distinguishes the aspect of mapping by attribute name vs. type (position) should suffice. We can always do the renaming in a separate patch. > Please note that I still have a commit fest to > run and finish, so I'll unlikely come back to that before December. > Let's continue with the tuning of the function names though. As it's mainly just moving around code, I gave it a shot; patch attached (applies on top of yours). I haven't invented any new names yet, but let's keep discussing that as you say. Thanks, Amit 0002-Move-more-code-to-attmap.c.patch Description: Binary data
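To make the naming side of this concrete, the two entry points in attmap.h could end up looking something like the following. These are purely hypothetical signatures to anchor the discussion; neither the names nor the exact argument lists are settled yet:

    /* build an AttrMap by matching attributes on name */
    extern AttrMap *build_attrmap_by_name(TupleDesc indesc, TupleDesc outdesc);

    /* build an AttrMap by matching attributes on position/type */
    extern AttrMap *build_attrmap_by_position(TupleDesc indesc, TupleDesc outdesc,
                                              const char *msg);

Whichever convention wins (build_attrmap_by_* vs. AttrMapBuildBy*), keeping both builders next to the AttrMap definition in attmap.c/attmap.h seems like the natural grouping.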
Re: Attempt to consolidate reading of XLOG page
Alvaro Herrera wrote: > On 2019-Nov-22, Antonin Houska wrote: > > > As I pointed out in > > > > https://www.postgresql.org/message-id/88183.1574261429%40antos > > > > seg.ws_off only replaced readOff in XLogReaderState. So we should only > > update > > ws_off where readOff was updated before commit 709d003. This does happen in > > ReadPageInternal (see HEAD) and I see no reason for the final patch to > > update > > ws_off anywhere else. > > Oh you're right. > > I see no reason to leave ws_off. We can move that to XLogReaderState; I > did that here. We also need the offset in WALReadError, though, so I > added it there too. Conceptually it seems clearer to me this way. > > What do you think of the attached? It looks good to me. Attached is just a fix of a minor problem in error reporting that Michael pointed out earlier. > BTW I'm not clear what errors can pread()/pg_pread() report that do not > set errno. I think lines 1083/1084 of WALRead are spurious now. All I can say is that the existing calls of pg_pread() do not clear errno, so you may be right. I'd appreciate more background about the "partial read" that Michael mentions here: https://www.postgresql.org/message-id/20191125033048.GG37821%40paquier.xyz -- Antonin Houska Web: https://www.cybertec-postgresql.com diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index 04124bc254..eda81c1df1 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -354,9 +354,11 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, state->segcxt.ws_segsize); if (errinfo.wre_errno != 0) - fatal_error("could not read in file %s, offset %u, length %zu: %s", - fname, errinfo.wre_off, (Size) errinfo.wre_req, - strerror(errinfo.wre_errno)); + { + errno = errinfo.wre_errno; + fatal_error("could not read in file %s, offset %u, length %zu: %m", + fname, errinfo.wre_off, (Size) errinfo.wre_req); + } else fatal_error("could not read in file %s, offset %u: length: %zu", fname, errinfo.wre_off, (Size) errinfo.wre_req);
Re: backup manifests
Hi Jeevan, I have incorporated all the comments in the attached patch. Please review and let me know your thoughts. On Thu, Nov 21, 2019 at 2:51 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote: > > > On Wed, Nov 20, 2019 at 11:05 AM Suraj Kharage < > suraj.khar...@enterprisedb.com> wrote: > >> Hi, >> >> Since now we are generating the backup manifest file with each backup, it >> provides us an option to validate the given backup. >> Let's say, we have taken a backup and after a few days, we want to check >> whether that backup is validated or corruption-free without restarting the >> server. >> >> Please find attached POC patch for same which will be based on the latest >> backup manifest patch from Rushabh. With this functionality, we add new >> option to pg_basebackup, something like --verify-backup. >> So, the syntax would be: >> ./bin/pg_basebackup --verify-backup -D >> >> Basically, we read the backup_manifest file line by line from the given >> directory path and build the hash table, then scan the directory and >> compare each file with the hash entry. >> >> Thoughts/suggestions? >> > > > I like the idea of verifying the backup once we have backup_manifest with > us. > Periodically verifying the already taken backup with this simple tool > becomes > easy now. > > I have reviewed this patch and here are my comments: > > 1. > @@ -30,7 +30,9 @@ > #include "common/file_perm.h" > #include "common/file_utils.h" > #include "common/logging.h" > +#include "common/sha2.h" > #include "common/string.h" > +#include "fe_utils/simple_list.h" > #include "fe_utils/recovery_gen.h" > #include "fe_utils/string_utils.h" > #include "getopt_long.h" > @@ -38,12 +40,19 @@ > #include "pgtar.h" > #include "pgtime.h" > #include "pqexpbuffer.h" > +#include "pgrhash.h" > #include "receivelog.h" > #include "replication/basebackup.h" > #include "streamutil.h" > > Please add new files in order. > > 2. > Can hash related file names be renamed to backuphash.c and backuphash.h? > > 3. > Need indentation adjustments at various places. > > 4. > +charbuf[100]; // 1MB chunk > > It will be good if we have multiple of block /page size (or at-least power > of 2 > number). > > 5. > +typedef struct pgrhash_entry > +{ > +struct pgrhash_entry *next; /* link to next entry in same bucket */ > +DataDirectoryFileInfo *record; > +} pgrhash_entry; > + > +struct pgrhash > +{ > +unsignednbuckets;/* number of buckets */ > +pgrhash_entry **bucket;/* pointer to hash entries */ > +}; > + > +typedef struct pgrhash pgrhash; > > These two can be moved to .h file instead of redefining over there. > > 6. > +/* > + * TODO: this function is not necessary, can be removed. > + * Test whether the given row number is match for the supplied keys. > + */ > +static bool > +pgrhash_compare(char *bt_filename, char *filename) > > Yeah, it can be removed by doing strcmp() at the required places rather > than > doing it in a separate function. > > 7. > mdate is not compared anywhere. I understand that it can't be compared with > the file in the backup directory and its entry in the manifest as manifest > entry gives mtime from server file whereas the same file in the backup will > have different mtime. But adding a few comments there will be good. > > 8. > +charmdate[24]; > > should be mtime instead? 
> > > Thanks > > -- > Jeevan Chalke > Associate Database Architect & Team Lead, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- -- Thanks & Regards, Suraj kharage, EnterpriseDB Corporation, The Postgres Database Company. backup_validator_POC.patch Description: Binary data
Re: adding partitioned tables to publications
On Fri, Nov 22, 2019 at 7:46 PM Peter Eisentraut wrote: > On 2019-11-22 07:28, Amit Langote wrote: > > Hmm, I thought it would be more desirable to not expose a published > > partitioned table's leaf partitions to a subscriber, because it allows > > the target table to be defined more flexibly. > > There are multiple different variants that we probably eventually want > to support. But I think there is value in exposing the partition > structure to the subscriber. Most notably, it allows the subscriber to > run the initial table sync per partition rather than in one big chunk -- > which ultimately reflects one of the reasons partitioning exists. I agree that replicating leaf-to-leaf has the least overhead. > The other way, exposing only the partitioned table, is also useful, > especially if you want to partition differently on the subscriber. But > without the ability to target a partitioned table on the subscriber, > this would right now only allow you to replicate a partitioned table > into a non-partitioned table. Which is valid but probably not often useful. Handling non-partitioned target tables was the main reason for me to make publishing using the root parent's schema the default behavior. But given that replicating from partitioned tables into non-partitioned ones would be rare, I agree to replicating using the leaf schema by default. > >> What happens when you add a leaf table directly to a publication? Is it > >> replicated under its own identity or under its ancestor partitioned > >> table? (What if both the leaf table and a partitioned table are > >> publication members?) > > > > If both a leaf partition and an ancestor belong to the same > > publication, then leaf partition changes are replicated using the > > ancestor's schema. For a leaf partition to be replicated using its > > own schema it must be published via a separate publication that > > doesn't contain the ancestor. At least that's what the current patch > > does. > > Hmm, that seems confusing. This would mean that if you add a > partitioned table to a publication that already contains leaf tables, > the publication behavior of the leaf tables would change. So again, I > think this alternative behavior of publishing partitions under the name > of their root table should be an explicit option on a publication, and > then it should be ensured somehow that individual partitions are not > added to the publication in confusing ways. > > So, it's up to you which aspect of this you want to tackle, but I > thought your original goal of being able to add partitioned tables to > publications and have that implicitly expand to all member partitions on > the publication side seemed quite useful, self-contained, and > uncontroversial. OK, let's make whether to publish with root or leaf schema an option, with the latter being the default. I will see about updating the patch that way. Thanks, Amit
Re: [HACKERS] Block level parallel vacuum
On Fri, Nov 22, 2019 at 2:49 PM Amit Kapila wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > wrote: > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > I have reviewed the first patch and made a number of modifications > that include adding/modifying comments, made some corrections and > modifications in the documentation. You can find my changes in > v33-0001-delta-amit.patch. > I have continued my review for this patch series and reviewed/hacked the second patch. I have added/modified comments, changed function ordering in file to make them look consistent and a few other changes. You can find my changes in v33-0002-delta-amit.patch. Are you working on review comments given recently, if you have not started yet, then it might be better to prepare a patch atop of v33 version as I am also going to work on this patch series, that way it will be easy to merge changes. OTOH, if you are already working on those, then it is fine. I can merge any remaining changes with your new patch. Whatever be the case, please let me know. Few more comments on v33-0002-Add-parallel-option-to-VACUUM-command.patch: --- 1. + * leader process re-initializes the parallel context while keeping recorded + * dead tuples so that the leader can launch parallel workers again in the next + * time. In this sentence, it is not clear to me why we need to keep the recorded dead tuples while re-initialize parallel workers? The next time when workers are launched, they should process a new set of dead tuples, no? 2. lazy_parallel_vacuum_or_cleanup_indexes() { .. + /* + * Increment the active worker count. We cannot decrement until the + * all parallel workers finish. + */ + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1); + + /* + * Join as parallel workers. The leader process alone does that in + * case where no workers launched. + */ + if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0) + vacuum_or_cleanup_indexes_worker (Irel, nindexes, stats, lps->lvshared, + vacrelstats->dead_tuples); + + /* + * Here, the indexes that had been skipped during parallel index vacuuming + * are remaining. If there are such indexes the leader process does vacuum + * or cleanup them one by one. + */ + nindexes_remains = nindexes - pg_atomic_read_u32(&(lps->lvshared->nprocessed)); + if (nindexes_remains > 0) + { + int i; +#ifdef USE_ASSERT_CHECKING + int nprocessed = 0; +#endif + + for (i = 0; i < nindexes; i++) + { + bool processed = !skip_parallel_index_vacuum(Irel[i], + lps->lvshared->for_cleanup, + lps->lvshared->first_time); + + /* Skip the already processed indexes */ + if (processed) + continue; + + if (lps->lvshared->for_cleanup) + lazy_cleanup_index(Irel[i], &stats[i], +vacrelstats->new_rel_tuples, + vacrelstats->tupcount_pages < vacrelstats->rel_pages); + else + lazy_vacuum_index(Irel[i], &stats[i], vacrelstats->dead_tuples, + vacrelstats- >old_live_tuples); +#ifdef USE_ASSERT_CHECKING + nprocessed++; +#endif + } +#ifdef USE_ASSERT_CHECKING + Assert (nprocessed == nindexes_remains); +#endif + } + + /* + * We have completed the index vacuum so decrement the active worker + * count. + */ + pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1); .. } Here, it seems that we can increment/decrement the VacuumActiveNWorkers even when there is no work performed by the leader backend. 
How about moving the increment/decrement inside the function vacuum_or_cleanup_indexes_worker? In that case, we need to do it in this function when we are actually doing an index vacuum or cleanup. After doing that, the other usage of increment/decrement of VacuumActiveNWorkers in the other function, heap_parallel_vacuum_main, can be removed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v33-0002-Add-parallel-option-to-VACUUM-command.patch Description: Binary data v33-0002-delta-amit.patch Description: Binary data
Re: Ordering of header file inclusion
On Thu, Nov 21, 2019 at 2:10 PM Amit Kapila wrote: > > Thanks for finding the remaining places, the patch looks good to me. > I hope this covers the entire code. BTW, are you using some script to > find this or is this a result of manual inspection of code? I have > modified the commit message in the attached patch. I will commit this > early next week unless someone else wants to review it. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: dropdb --force
On Sun, Nov 24, 2019 at 3:55 PM vignesh C wrote: > > On Sat, Nov 23, 2019 at 4:42 PM Amit Kapila wrote: > > > > > 2. > > ok( TestLib::pump_until( > > + $killme, > > + $psql_timeout, > > + \$killme_stderr, > > + qr/FATAL: terminating connection due to administrator command/m > > + ), > > + "psql query died successfully after SIGTERM"); > > > > Extra space before TestLib. > > Ran perltidy, perltidy adds an extra space. I'm not sure which version > is right whether to include space or without space. I had noticed > similarly in 001_stream_rep.pl, in few places space is present and in > few places it is not present. If required I can update based on > suggestion. > You can try by running perltidy on other existing .pl files where you find the usage "without space" and see if it adds the extra space in all places. I think keeping the version after running perltidy would be a better choice. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: TestLib::command_fails_like enhancement
On 11/11/19 4:28 PM, Mark Dilger wrote: > > > On further consideration, I'm wondering why we don't just unconditionally use a closed input pty for all these functions (except run_log). None of them use any input, and making the client worry about whether or not to close it seems something of an abstraction break. There would be no API change at all involved in this case, just a bit of extra cleanliness. Would need testing on Windows, I'll go and do that now. Thoughts? >>> >>> That sounds a lot better than your previous patch. >>> >>> PostgresNode.pm and TestLib.pm both invoke IPC::Run::run. If you >>> change all the invocations in TestLib to close input pty, should you >>> do the same for PostgresNode? I don't have a strong argument for >>> doing so, but it seems cleaner to have both libraries invoking >>> commands under identical conditions, such that if commands were >>> borrowed from one library and called from the other they would behave >>> the same. >>> >>> PostgresNode already uses TestLib, so perhaps setting up the >>> environment can be abstracted into something, perhaps TestLib::run, >>> and then used everywhere that IPC::Run::run currently is used. >> >> >> >> I don't think we need to do that. In the case of the PostgresNode.pm >> uses we know what the executable is, unlike the the TestLib.pm cases. >> They are our own executables and we don't expect them to be doing >> anything funky with /dev/tty. > > Ok. I think your proposal sounds fine. Here's a patch for that. The pty stuff crashes and burns on my Windows test box, so there I just set stdin to an empty string via the usual pipe mechanism. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm index a377cdb226..5ff138360c 100644 --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -87,6 +87,8 @@ our @EXPORT = qw( our ($windows_os, $tmp_check, $log_path, $test_logfile); +my @no_stdin; + BEGIN { @@ -178,6 +180,20 @@ INIT autoflush STDOUT 1; autoflush STDERR 1; autoflush $testlog 1; + + # Settings to close stdin for certain commands. + # On non-Windows, use a pseudo-terminal, so that libraries like openssl + # which open the tty if they think stdin isn't one for a password + # don't block. Windows doesn't have ptys, so just provide an empty + # string for stdin. 
+ if ($windows_os) + { + @no_stdin = ('<', \""); + } + else + { + @no_stdin = ('', \$stdout, '2>', \$stderr; + my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr, @no_stdin; chomp($stdout); chomp($stderr); return ($stdout, $stderr); @@ -576,7 +592,7 @@ sub check_pg_config my ($regexp) = @_; my ($stdout, $stderr); my $result = IPC::Run::run [ 'pg_config', '--includedir' ], '>', - \$stdout, '2>', \$stderr + \$stdout, '2>', \$stderr, @no_stdin or die "could not execute pg_config"; chomp($stdout); $stdout =~ s/\r$//; @@ -673,7 +689,7 @@ sub program_help_ok my ($stdout, $stderr); print("# Running: $cmd --help\n"); my $result = IPC::Run::run [ $cmd, '--help' ], '>', \$stdout, '2>', - \$stderr; + \$stderr, @no_stdin; ok($result, "$cmd --help exit code 0"); isnt($stdout, '', "$cmd --help goes to stdout"); is($stderr, '', "$cmd --help nothing to stderr"); @@ -695,7 +711,7 @@ sub program_version_ok my ($stdout, $stderr); print("# Running: $cmd --version\n"); my $result = IPC::Run::run [ $cmd, '--version' ], '>', \$stdout, '2>', - \$stderr; + \$stderr, @no_stdin; ok($result, "$cmd --version exit code 0"); isnt($stdout, '', "$cmd --version goes to stdout"); is($stderr, '', "$cmd --version nothing to stderr"); @@ -718,8 +734,7 @@ sub program_options_handling_ok my ($stdout, $stderr); print("# Running: $cmd --not-a-valid-option\n"); my $result = IPC::Run::run [ $cmd, '--not-a-valid-option' ], '>', - \$stdout, - '2>', \$stderr; + \$stdout, '2>', \$stderr, @no_stdin; ok(!$result, "$cmd with invalid option nonzero exit code"); isnt($stderr, '', "$cmd with invalid option prints error message"); return; @@ -740,7 +755,7 @@ sub command_like my ($cmd, $expected_stdout, $test_name) = @_; my ($stdout, $stderr); print("# Running: " . join(" ", @{$cmd}) . "\n"); - my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; + my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr, @no_stdin; ok($result, "$test_name: exit code 0"); is($stderr, '', "$test_name: no stderr"); like($stdout, $expected_stdout, "$test_name: matches"); @@ -769,7 +784,8 @@ sub command_like_safe my $stdoutfile = File::Temp->new(); my $stderrfile = File::Temp->new(); print("# Running: " . join(" ", @{$cmd}) . "\n"); - my $result = IPC::Run::run $cmd, '>', $stdoutfile, '2>', $stderrfile; + my $result = IPC::Run::run $cmd, '>', $stdoutfile, '2>', $stderrfile, + @
Re: proposal: new polymorphic types - commontype and commontypearray
> On Mon, Jun 17, 2019 at 05:31:40AM +0200, Pavel Stehule wrote: > > > I like anycompatible and anycompatiblearray. > > > > I'll update the patch > > > > and here it is Thanks for the patch! I've reviewed it a bit, and have a few small commentaries: * There are few traces of copy paste in comments +static Oid +select_common_type_from_vector(int nargs, Oid *typeids, bool noerror) ... + /* +* Nope, so set up for the full algorithm. Note that at this point, lc +* points to the first list item with type different from pexpr's; we need +* not re-examine any items the previous loop advanced over. +*/ Seems like it was taken from select_common_type, but in select_common_type_from_vector there is no `lc`, since it doesn't accept a list. * I guess it's would be beneficial to update also commentaries for check_generic_type_consistency and enforce_generic_type_consistency * The argument consistency rules are: * * 1) All arguments declared ANYELEMENT must have the same datatype. * ... Since they do not reflect the current state of things in this patch. * I've noticed that there is a small difference in how anyelement and anycompatible behave, namely anycompatible do not handle unknowns: =# select 'aaa'::anyelement; anyelement aaa =# select 'aaa'::anycompatible; ERROR: 42846: cannot cast type unknown to anycompatible LINE 1: select 'aaa'::anycompatible; ^ LOCATION: transformTypeCast, parse_expr.c:2823 It happens due to unknowns being filtered out quite early in check_generic_type_consistency and similar. By itself this difference it not a problem, but it causes different error messages in functions: -- this function accepts anycompatible =# select test_anycompatible('aaa'); ERROR: 42883: function test_anycompatible(unknown) does not exist LINE 1: select test_anycompatible('aaa'); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. LOCATION: ParseFuncOrColumn, parse_func.c:627 -- this function accepts anyelement =# select test_anyelement('aaa'); ERROR: 42804: could not determine polymorphic type because input has type unknown LOCATION: enforce_generic_type_consistency, parse_coerce.c:2177 Although of course it's not that serious. * I'm also curious about the following situation: =# create function test_both(a anycompatible) returns anycompatible as $$ begin return a; end$$ language plpgsql; CREATE FUNCTION =# create function test_both(a anyelement) returns anyelement as $$ begin return a; end$$ language plpgsql; CREATE FUNCTION =# select test_both('aaa'::text); ERROR: 42725: function test_both(text) is not unique LINE 1: select test_both('aaa'::text); ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. LOCATION: ParseFuncOrColumn, parse_func.c:568 =# select test_both('aaa'::anyelement); ERROR: 42804: could not determine polymorphic type because input has type unknown LOCATION: enforce_generic_type_consistency, parse_coerce.c:2177 Is it possible somehow to invoke any of these functions? Other than that the functionality looks pretty solid. It may look obvious, but I've also tested performance in different use cases for anycompatible, looks the same as for anyelement.
Re: Avoid full GIN index scan when possible
Michael Paquier writes: > On Sat, Nov 23, 2019 at 02:35:50AM +0300, Nikita Glukhov wrote: >> Attached 8th version of the patches. > Please be careful here. The CF entry was still marked as waiting on > author, but you sent a new patch series which has not been reviewed. > I have moved this patch to the next CF instead. That seems a bit premature --- the new patch is only a couple of days old. The CF entry should've been moved back to "Needs Review", sure. (Having said that, the end of the month isn't that far away, so it may well end up in the next CF anyway.) regards, tom lane
Re: segmentation fault when cassert enabled
On Wed, 6 Nov 2019 14:34:38 +0100 Peter Eisentraut wrote: > On 2019-11-05 17:29, Jehan-Guillaume de Rorthais wrote: > > My best bet so far is that logicalrep_relmap_invalidate_cb is not called > > after the DDL on the subscriber so the relmap cache is not invalidated. So > > we end up with slot->tts_tupleDescriptor->natts superior than > > rel->remoterel->natts in slot_store_cstrings, leading to the overflow on > > attrmap and the sigsev. > > It looks like something like that is happening. But it shouldn't. > Different table schemas on publisher and subscriber are well supported, > so this must be an edge case of some kind. Please continue investigating. I've been able to find the origin of the crash, but it was a long journey. I was unable to debug using gdb record because of this famous error: Process record does not support instruction 0xc5 at address 0x1482758a4b30. Program stopped. __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:168 168 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. Trying with rr, I had constant "stack depth limit exceeded", even with an unlimited one. Is it worth opening a discussion or a wiki page about these tools? Peter, it looks like you have some experience with rr, any feedback? Finally, Julien Rouhaud spent some time with me after work hours, answering my questions about some parts of the code and pointing me to the excellent backtrace_functions GUC added a few days ago. This finally did the trick to find out what was happening. Many thanks Julien! Back to the bug itself. Consider a working logical replication with constant update/insert activity, eg. pgbench running against the provider. Now, on the subscriber side, a session issues an "ALTER TABLE ADD COLUMN" on a subscribed table, eg. pgbench_branches. A cache invalidation message is then pending for this table. In the meantime, the logical replication worker receives an UPDATE to apply. It opens the local relation using "logicalrep_rel_open". It finds the related entry in LogicalRepRelMap is valid, so it does not update its attrmap and directly opens and locks the local relation: /* Need to update the local cache? */ if (!OidIsValid(entry->localreloid)) { [...updates attrmap here...] } else entry->localrel = table_open(entry->localreloid, lockmode); However, when locking the table, the code in LockRelationOid() actually processes any pending invalidation messages: LockRelationOid(Oid relid, LOCKMODE lockmode) { [...] /* * Now that we have the lock, check for invalidation messages, so that we * will update or flush any stale relcache entry before we try to use it. * RangeVarGetRelid() specifically relies on us for this. We can skip * this in the not-uncommon case that we already had the same type of lock * being requested, since then no one else could have modified the * relcache entry in an undesirable way. (In the case where our own xact * modifies the rel, the relcache update happens via * CommandCounterIncrement, not here.) * * However, in corner cases where code acts on tables (usually catalogs) * recursively, we might get here while still processing invalidation * messages in some outer execution of this function or a sibling. The * "cleared" status of the lock tells us whether we really are done * absorbing relevant inval messages. */ if (res != LOCKACQUIRE_ALREADY_CLEAR) { AcceptInvalidationMessages(); MarkLockClear(locallock); } } We end up with attrmap referencing N columns and the relcache referencing N+1 columns.
Later, in apply_handle_update(), we build a TupleTableSlot based on the relcache representation and we crash by overflowing attrmap while trying to feed this larger slot in slot_store_cstrings(). Please find in attachment a bugfix proposal. It just moves the attrmap update after the table_open() call. Last, I was wondering if entry->attrmap should be pfree'd before palloc'ing it again during its rebuild to avoid some memory leak? Regards, >From 4295b277952a46378f01211bd37075f20223aadc Mon Sep 17 00:00:00 2001 From: Jehan-Guillaume de Rorthais Date: Mon, 25 Nov 2019 15:21:38 +0100 Subject: [PATCH] Fix subscriber invalid memory access on DDL Before this patch, the attrmap structure mapping the local fields to remote ones for a subscribed relation was rebuild before handling pending invalidation messages and potential relcache updates. This was leading to an invalid memory access by overflowing the attrmap when building the related tuple slot received from the provider. --- src/backend/replication/logical/relation.c | 57 -- 1 file changed, 42 insertions(+), 15 deletions(-) diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c index b386f8460d..cfc34d49e0 100644 --
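In case the flattened diff is hard to read, the essential reordering is sketched below. This is only an illustration of the race described above, not the patch itself; the normal "entry not cached yet" path is omitted:

    /*
     * Lock the relation first: LockRelationOid() may absorb pending
     * invalidation messages, which reset entry->localreloid through
     * logicalrep_relmap_invalidate_cb().
     */
    entry->localrel = table_open(entry->localreloid, lockmode);

    /* Only now decide whether the cached attrmap is still usable. */
    if (!OidIsValid(entry->localreloid))
    {
        /*
         * An invalidation fired while we were opening the relation:
         * rebuild entry->attrmap against the descriptor of the relation
         * we have just opened and locked, then mark the entry valid again.
         */
        entry->localreloid = RelationGetRelid(entry->localrel);
    }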
Re: [HACKERS] Incomplete startup packet errors
Jobin Augustine writes: > However, Checking whether the port is open is resulting in error log like: > 2019-11-25 14:03:44.414 IST [14475] LOG: invalid length of startup packet > Yes, This is different from "Incomplete startup packet" discussed here. > Steps to reproduce: > $ telnet localhost 5432 >> >> Well, the agreed-to behavior change was to not log anything if the connection is closed without any data having been sent. If the client *does* send something, and it doesn't look like a valid connection request, I think we absolutely should log that. regards, tom lane
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Fri, Nov 22, 2019 at 4:38 AM Amit Kapila wrote: > On Thu, Nov 21, 2019 at 8:32 PM Juan José Santamaría Flecha > wrote: > > > > [1] Win10 (1903) MSVC 19.22.27905 > > > > I have tested this on Windows7. I am not sure if it is due to a > different version of windows, but I think we can't rule out that > possibility. > > This seems to be the case. The unexpected behaviour is on my end, which is working as described in FILE_DISPOSITION_POSIX_SEMANTICS [1]. The expected behaviour is what you have already diagnosed. [1] https://docs.microsoft.com/es-es/windows-hardware/drivers/ddi/ntddk/ns-ntddk-_file_disposition_information_ex Regards, Juan José Santamaría Flecha
Why JIT speed improvement is so modest?
Right now JIT provides about 30% improvement of TPC-H Q1 query: https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/ I wonder why even at this query, which seems to be ideal use case for JIT, we get such modest improvement? I have raised this question several years ago - but that time JIT was assumed to be in early development stage and performance aspects were less critical than required infrastructure changes. But right now JIT seems to be stable enough and is switch on by default. Vitesse DB reports 8x speedup on Q1, ISP-RAS JIT version provides 3x speedup of Q1: https://www.pgcon.org/2017/schedule/attachments/467_PGCon%202017-05-26%2015-00%20ISPRAS%20Dynamic%20Compilation%20of%20SQL%20Queries%20in%20PostgreSQL%20Using%20LLVM%20JIT.pdf According to this presentation Q1 spends 6% of time in ExecQual and 75% in ExecAgg. VOPS provides 10x improvement of Q1. I have a hypothesis that such difference was caused by the way of aggregates calculation. Postgres is using Youngs-Cramer algorithm while both ISPRAS JIT version and my VOPS are just accumulating results in variable of type double. I rewrite VOPS to use the same algorithm as Postgres, but VOPS is still about 10 times faster. Results of Q1 on scale factor=10 TPC-H data at my desktop with parallel execution enabled: no-JIT: 5640 msec JIT: 4590msec VOPS: 452 msec VOPS + Youngs-Cramer algorithm: 610 msec Below are tops of profiles (functions with more than 1% of time): JIT: 10.98% postgres postgres [.] float4_accum 8.40% postgres postgres [.] float8_accum 7.51% postgres postgres [.] HeapTupleSatisfiesVisibility 5.92% postgres postgres [.] ExecInterpExpr 5.63% postgres postgres [.] tts_minimal_getsomeattrs 4.35% postgres postgres [.] lookup_hash_entries 3.72% postgres postgres [.] TupleHashTableHash.isra.8 2.93% postgres postgres [.] tuplehash_insert 2.70% postgres postgres [.] heapgettup_pagemode 2.24% postgres postgres [.] check_float8_array 2.23% postgres postgres [.] hash_search_with_hash_value 2.10% postgres postgres [.] ExecScan 1.90% postgres postgres [.] hash_uint32 1.57% postgres postgres [.] tts_minimal_clear 1.53% postgres postgres [.] FunctionCall1Coll 1.47% postgres postgres [.] pg_detoast_datum 1.39% postgres postgres [.] heapgetpage 1.37% postgres postgres [.] TupleHashTableMatch.isra.9 1.35% postgres postgres [.] ExecStoreBufferHeapTuple 1.06% postgres postgres [.] LookupTupleHashEntry 1.06% postgres postgres [.] AggCheckCallContext no-JIT: 26.82% postgres postgres [.] ExecInterpExpr 15.26% postgres postgres [.] tts_buffer_heap_getsomeattrs 8.27% postgres postgres [.] float4_accum 7.51% postgres postgres [.] float8_accum 5.26% postgres postgres [.] HeapTupleSatisfiesVisibility 2.78% postgres postgres [.] TupleHashTableHash.isra.8 2.63% postgres postgres [.] tts_minimal_getsomeattrs 2.54% postgres postgres [.] lookup_hash_entries 2.05% postgres postgres [.] tuplehash_insert 1.97% postgres postgres [.] heapgettup_pagemode 1.72% postgres postgres [.] hash_search_with_hash_value 1.57% postgres postgres [.] float48mul 1.55% postgres postgres [.] check_float8_array 1.48% postgres postgres [.] ExecScan 1.26% postgres postgres [.] hash_uint32 1.04% postgres postgres [.] tts_minimal_clear 1.00% postgres postgres [.] FunctionCall1Coll VOPS: 44.25% postgres vops.so [.] vops_avg_state_accumulate 11.76% postgres vops.so [.] vops_float4_avg_accumulate 6.14% postgres postgres [.] ExecInterpExpr 5.89% postgres vops.so [.] vops_float4_sub_lconst 4.89% postgres vops.so [.] vops_float4_mul 4.30% postgres vops.so [.] 
vops_int4_le_rconst 2.57% postgres vops.so [.] vops_float4_add_lconst 2.31% postgres vops.so [.] vops_count_accumulate 2.24% postgres postgres [.] tts_buffer_heap_getsomeattrs 1.97% postgres postgres [.] heap_page_prune_opt 1.72% postgres postgres [.] HeapTupleSatisfiesVisibility 1.67% postgres postgres [.] AllocSetAlloc 1.47% postgres postgres [.] hash_search_with_hash_value In theory, by eliminating interpretation overhead, JIT should provide performance comparable with a vectorized executor. In most programming languages, using a JIT compiler instead of a byte-code interpreter provides about a 10x speed improvement.
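To make the comparison of the two accumulation strategies concrete, here is a small standalone sketch. It is not the actual float8_accum code, but the Youngs-Cramer update uses the same (N, Sx, Sxx) transition state that Postgres keeps; all names here are mine:

    #include <stdio.h>

    /* Naive accumulation: one addition and one increment per row. */
    typedef struct { double sum; long count; } naive_state;

    static void naive_accum(naive_state *st, double x)
    {
        st->sum += x;
        st->count++;
    }

    /*
     * Youngs-Cramer accumulation: tracks N, Sx and Sxx (sum of squared
     * deviations), which costs extra multiplications and a division per
     * row but is numerically stable for variance/stddev.
     */
    typedef struct { double N; double Sx; double Sxx; } yc_state;

    static void yc_accum(yc_state *st, double x)
    {
        st->N += 1.0;
        st->Sx += x;
        if (st->N > 1.0)
        {
            double tmp = x * st->N - st->Sx;

            st->Sxx += tmp * tmp / (st->N * (st->N - 1.0));
        }
    }

    int main(void)
    {
        naive_state ns = {0, 0};
        yc_state    ys = {0.0, 0.0, 0.0};

        for (int i = 1; i <= 1000000; i++)
        {
            naive_accum(&ns, (double) i);
            yc_accum(&ys, (double) i);
        }
        printf("avg naive = %g, avg YC = %g\n",
               ns.sum / ns.count, ys.Sx / ys.N);
        return 0;
    }

The per-row cost difference between the two update functions is small in absolute terms, which matches the observation above that switching VOPS to the Youngs-Cramer update only moved Q1 from 452 to 610 msec.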
Re: Why JIT speed improvement is so modest?
On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik wrote: > JIT was not able to significantly (times) increase speed on Q1 query? > Experiment with VOPS shows that used aggregation algorithm itself is not > a bottleneck. > Profile also give no answer for this question. > Any ideas? Well, in the VOPS variant around 2/3 of the time is spent in routines that are obviously aggregation. In the JIT version, it's around 20%. So this suggests that the replacement execution engine is more invasive. I would also guess (!) that the VOPS engine optimizes fewer classes of query plan. ExecScan, for example, looks to be completely optimized out in VOPS but is still utilized in the JIT engine. I experimented with Vitesse a couple of years back and this was consistent with my recollection. merlin
Re: [HACKERS] Block level parallel vacuum
On Fri, 22 Nov 2019 at 10:19, Amit Kapila wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > wrote: > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > I have reviewed the first patch and made a number of modifications > that include adding/modifying comments, made some corrections and > modifications in the documentation. You can find my changes in > v33-0001-delta-amit.patch. See, if those look okay to you, if so, > please include those in the next version of the patch. I am attaching > both your version of patch and delta changes by me. Thank you. All changes look good to me. But after the changes to the 0002 patch, the two macros for parallel vacuum options (VACUUM_OPTIONS_SUPPORT_XXX) are no longer necessary. So we can remove them and add them back if we need them again. > > One comment on v33-0002-Add-parallel-option-to-VACUUM-command: > > + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */ > + est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN > (nindexes))); > .. > + shared->offset = add_size(SizeOfLVShared, BITMAPLEN(nindexes)); > > Here, don't you need to do MAXALIGN to set offset as we are computing > it that way while estimating shared memory? If not, then probably, > some comments are required to explain it. You're right. Will fix it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: backup manifests
On 2019-11-24 15:38, David Steele wrote: On 11/23/19 4:34 PM, Andrew Dunstan wrote: On 11/23/19 3:13 AM, Tels wrote: Without the strong hashes it would be pointless to sign the manifest. I guess I must have missed where we are planning to add a cryptographic signature. I don't think we were planning to, but the user could do so if they wished. That was what I meant. Best regards, Tels
Re: backup manifests
On Fri, Nov 22, 2019 at 2:29 PM David Steele wrote: > See: > https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf > Search for "The maximum block size" Hmm, so it says: "The maximum block size that can be protected by a 32-bit CRC is 512MB." My problem is that (1) it doesn't back this up with a citation or any kind of logical explanation and (2) it's not very clear what "protected" means. Tels replies downthread to explain that the internal state of the 32-bit CRC calculation is also limited to 32 bits, and changes once per bit, so that after processing 512MB = 2^29 bytes = 2^32 bits of data, you're guaranteed to start repeating internal states. Perhaps this is also what the NIST folks had in mind, though it's hard to know. This link provides some more details: https://community.arm.com/developer/tools-software/tools/f/keil-forum/17467/crc-for-256-byte-data Not everyone on the thread agrees with everybody else, but it seems like there are size limits below which a CRC-n is guaranteed to detect all 1-bit and 2-bit errors, and above which this is no longer guaranteed. They put the limit *lower* than what NIST supposes, namely 2^(n-1)-1 bits, which would be 256MB, not 512MB, if I'm doing math correctly. However, they also say that above that value, you are still likely to detect most errors. Absent an intelligent adversary, the chance of a random collision when corruption is present is still about 1 in 4 billion (2^-32). To me, guaranteed detection of 1-bit and 2-bit errors (and the other kinds of specific things CRC is designed to catch) doesn't seem like a principle design consideration. It's nice if we can get it and I'm not against it, but these are algorithms that are designed to be used when data undergoes a digital-to-analog-to-digital conversion, where for example it's possible that that the conversion back to digital loses sync and reads 9 bits or 7 bits rather than 8 bits. And that's not really what we're doing here: we all know that bits get flipped sometimes, but nobody uses scp to copy a 1GB file and ends up with a file that is 1GB +/- a few bits. Some lower-level part of the communication stack is handling that part of the work; you're going to get exactly 1GB. So it seems to me that here, as with XLOG, we're not relying on the specific CRC properties that were intended to be used to catch and in some cases repair bit flips caused by wrinkles in an A-to-D conversion, but just on its general tendency to probably not match if any bits got flipped. And those properties hold regardless of input length. That being said, having done some reading on this, I am a little concerned that we're getting further and further from the design center of the CRC algorithm. Like relation segment files, XLOG records are not packets subject to bit insertions, but at least they're small, and relation files are not. Using a 40-year-old algorithm that was intended to be used for things like making sure the modem hadn't lost framing in the last second to verify 1GB files feels, in some nebulous way, like we might be stretching. That being said, I'm not sure what we think the reasonable alternatives are. Users aren't going to be better off if we say that, because CRC-32C might not do a great job detecting errors, we're not going to check for errors at all. If we go the other way and say we're going to use some variant of SHA, they will be better off, but at the price of what looks like a *significant* hit in terms of backup time. 
> > "Typically an n-bit CRC applied to a data block of arbitrary length > > will detect any single error burst not longer than n bits, and the > > fraction of all longer error bursts that it will detect is (1 − > > 2^−n)." > > I'm not sure how encouraging I find this -- a four-byte error not a lot > and 2^32 is only 4 billion. We have individual users who have backed up > more than 4 billion files over the last few years. I agree that people have a lot more than 4 billion files backed up, but I'm not sure it matters very much given the use case I'm trying to enable. There's a lot of difference between delta restore and backup integrity checking. For backup integrity checking, my goal is that, on those occasions when a file gets corrupted, the chances that we notice that it has been corrupted. For that purpose, a 32-bit checksum is probably sufficient. If a file gets corrupted, we have about a 1-in-4-billion chance of being unable to detect it. If 4 billion files get corrupted, we'll miss, on average, one of those corruption events. That's sad, but so is the fact that you had *4 billion corrupted files*. This is not the total number of files backed up; this is the number of those that got corrupted. I don't really know how common it is to copy a file and end up with a corrupt copy, but if you say it's one-in-a-million, which I suspect is far too high, then you'd have to back up something like 4 quadrillion files before you missed a corruption event, and th
Re: backup manifests
On Fri, Nov 22, 2019 at 2:02 PM David Steele wrote: > > Except - and this gets back to the previous point - I don't want to > > slow down backups by 40% by default. I wouldn't mind slowing them down > > 3% by default, but 40% is too much overhead. I think we've gotta > > either the overhead of using SHA way down or not use SHA by default. > > Maybe -- my take is that the measurements, an uncompressed backup to the > local filesystem, are not a very realistic use case. Well, compression is a feature we don't have yet, in core. So for people who are only using core tools, an uncompressed backup is a very realistic use case, because it's the only kind they can get. Granted the situation is different if you are using pgbackrest. I don't have enough experience to know how often people back up to local filesystems vs. remote filesystems mounted locally vs. overtly over-the-network. I sometimes get the impression that users choose their backup tools and procedures with, as Tom would say, the aid of a dart board, but that's probably the cynic in me talking. Or maybe a reflection of the fact that I usually end up talking to the users for whom things have gone really, really badly wrong, rather than the ones for whom things went as planned. > However, I'm still fine with leaving the user the option of checksums or > no. I just wanted to point out that CRCs have their limits so maybe > that's not a great option unless it is properly caveated and perhaps not > the default. I think the default is the sticking point here. To me, it looks like CRC is a better default than nothing at all because it should still catch a high percentage of issues that would otherwise be missed, and a better default than SHA because it's so cheap to compute. However, I'm certainly willing to consider other theories. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: backup manifests
On Fri, Nov 22, 2019 at 5:15 PM Tels wrote: > It is related to the number of states... Thanks for this explanation. See my reply to David where I also discuss this point. > However, if you choose a hash, please do not go below SHA-256. Both MD5 > and SHA-1 already had collision attacks, and these only got to be bound > to be worse. > >https://www.mscs.dal.ca/~selinger/md5collision/ >https://shattered.io/ Yikes, that second link, about SHA-1, is depressing. Now, it's not likely that an attacker has access to your backup repository and can spend 6500 years of CPU time to engineer a Trojan file there (maybe more, because the files are probably bigger than the PDFs they used in that case) and then induce you to restore and rely upon that backup. However, it's entirely likely that somebody is going to eventually ban SHA-1 as the attacks get better, which is going to be a problem for us whether the underlying exposures are problems or not. > It might even be a wise idea to encode the used Hash-Algorithm into the > manifest file, so it can be changed later. The hash length might be not > enough to decide which algorithm is the one used. I agree. Let's write SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5 or similar rather than just the hash. That way even if the entire SHA family gets cracked, we can easily substitute in something else that hasn't been cracked yet. (It is unclear to me why anyone supposes that *any* popular hash function won't eventually be cracked. For a K-bit hash function, there are 2^K possible outputs, where K is probably in the hundreds. But there are 2^{2^33} possible 1GB files. So for every possible output value, there are 2^{2^33-K} inputs that produce that value, which is a very very big number. The probability that any given input produces a certain output is very low, but the number of possible inputs that produce a given output is very high; so assuming that nobody's ever going to figure out how to construct them seems optimistic.) > To get a feeling one can use: > > openssl speed md5 sha1 sha256 sha512 > > On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says: > > The 'numbers' are in 1000s of bytes per second processed. >type 16 bytes 64 bytes256 bytes 1024 bytes 8192 > bytes 16384 bytes >md5 122638.55k 277023.96k 487725.57k 630806.19k > 683892.74k 688553.98k >sha1 127226.45k 313891.52k 632510.55k 865753.43k > 960995.33k 977215.19k >sha256 77611.02k 173368.15k 325460.99k 412633.43k > 447022.92k 448020.48k >sha512 51164.77k 205189.87k 361345.79k 543883.26k > 638372.52k 645933.74k > > Or in other words, it can hash nearly 931 MByte /s with SHA-1 and about > 427 MByte / s with SHA-256 (if I haven't miscalculated something). You'd > need a > pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top these > speeds > and then you'd use a real CPU for your server, not some poor Intel > powersaving > surfing thingy-majingy :) I mean, how fast is in theory doesn't matter nearly as much as what happens when you benchmark the proposed implementation, and the results we have so far don't support the theory that this is so cheap as to be negligible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
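Putting that parenthetical counting argument into symbols (a 1GB file is 2^30 bytes = 2^33 bits):

    \#\{\text{1GB inputs}\} = 2^{2^{33}}, \qquad
    \#\{\text{outputs of a } K\text{-bit hash}\} = 2^{K}, \qquad
    \text{preimages per output} \approx 2^{2^{33}-K},

so collisions certainly exist for any fixed-length digest; the open question is only whether anyone learns to construct them efficiently.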
Re: dropdb --force
On Sun, Nov 24, 2019 at 5:06 PM Pavel Stehule wrote: > > > > ne 24. 11. 2019 v 11:25 odesílatel vignesh C napsal: >> >> On Sat, Nov 23, 2019 at 4:42 PM Amit Kapila wrote: >> > >> > On Fri, Nov 22, 2019 at 3:16 PM vignesh C wrote: >> > > >> > > Thanks for fixing the comments. The changes looks fine to me. I have >> > > fixed the first comment, attached patch has the changes for the same. >> > > >> > >> > Few comments: >> > -- >> > 1. There is a lot of duplicate code in this test. Basically, almost >> > the same code is used once to test Drop Database command and dropdb. >> > I think we can create a few sub-functions to avoid duplicate code, but >> > do we really need to test the same thing once via command and then by >> > dropdb utility? I don't think it is required, but even if we want to >> > do so, then I am not sure if this is the right test script. I suggest >> > removing the command related test. >> > >> >> Pavel: What is your opinion on this? > > > I have not any problem with this reduction. > > We will see in future years what is benefit of this test. > Fixed, removed dropdb utility test. >> >> > 2. >> > ok( TestLib::pump_until( >> > + $killme, >> > + $psql_timeout, >> > + \$killme_stderr, >> > + qr/FATAL: terminating connection due to administrator command/m >> > + ), >> > + "psql query died successfully after SIGTERM"); >> > >> > Extra space before TestLib. >> >> Ran perltidy, perltidy adds an extra space. I'm not sure which version >> is right whether to include space or without space. I had noticed >> similarly in 001_stream_rep.pl, in few places space is present and in >> few places it is not present. If required I can update based on >> suggestion. >> >> > >> > 3. >> > +=item pump_until(proc, psql_timeout, stream, untl) >> > >> > I think moving pump_until to TestLib is okay, but if you are making it >> > a generic function to be used from multiple places, it is better to >> > name the variable as 'timeout' instead of 'psql_timeout' >> > >> >> Fixed. >> >> > 4. Have you ran perltidy, if not, can you please run it? See >> > src/test/perl/README for the recommendation. >> > >> >> I have verified by running perltidy. >> >> Please find the updated patch attached. 1st patch is same as previous, >> 2nd patch includes the fix for comments 2,3 & 4. >> Attached patch handles the fix for the above comments. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com From 0890c781b67595a51b04a1dfe972890fb6be18a4 Mon Sep 17 00:00:00 2001 From: vignesh Date: Mon, 25 Nov 2019 23:12:19 +0530 Subject: [PATCH 2/2] Made pump_until usable as a common subroutine. Patch includes movement of pump_until subroutine from 013_crash_restart to TestLib so that it can be used as a common sub from 013_crash_restart and 060_dropdb_force. 
--- src/test/perl/TestLib.pm | 37 +++ src/test/recovery/t/013_crash_restart.pl | 62 2 files changed, 59 insertions(+), 40 deletions(-) diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm index a377cdb..b58679a 100644 --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -860,6 +860,43 @@ sub command_checks_all =pod +=item pump_until(proc, timeout, stream, untl) + +# Pump until string is matched, or timeout occurs + +=cut + +sub pump_until +{ + my ($proc, $timeout, $stream, $untl) = @_; + $proc->pump_nb(); + while (1) + { + last if $$stream =~ /$untl/; + if ($timeout->is_expired) + { + diag("aborting wait: program timed out"); + diag("stream contents: >>", $$stream, "<<"); + diag("pattern searched for: ", $untl); + + return 0; + } + if (not $proc->pumpable()) + { + diag("aborting wait: program died"); + diag("stream contents: >>", $$stream, "<<"); + diag("pattern searched for: ", $untl); + + return 0; + } + $proc->pump(); + } + return 1; + +} + +=pod + =back =cut diff --git a/src/test/recovery/t/013_crash_restart.pl b/src/test/recovery/t/013_crash_restart.pl index 2c47797..dd08924 100644 --- a/src/test/recovery/t/013_crash_restart.pl +++ b/src/test/recovery/t/013_crash_restart.pl @@ -72,7 +72,8 @@ CREATE TABLE alive(status text); INSERT INTO alive VALUES($$committed-before-sigquit$$); SELECT pg_backend_pid(); ]; -ok(pump_until($killme, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m), +ok( TestLib::pump_until( + $killme, $psql_timeout, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m), 'acquired pid for SIGQUIT'); my $pid = $killme_stdout; chomp($pid); @@ -84,7 +85,9 @@ $killme_stdin .= q[ BEGIN; INSERT INTO alive VALUES($$in-progress-before-sigquit$$) RETURNING status; ]; -ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigquit/m), +ok( TestLib::pump_until( + $killme, $psql_timeout, + \$killme_stdout, qr/in-progress-before-sigquit/m), 'inserted in-progress-before-sigquit'); $killme_stdout = ''; $killme_stderr = ''; @@ -97,7 +100,8 @@ $monitor_stdin .= q[ SELECT $$psql-connected$$; SELECT pg_sleep(3600
Re: accounting for memory used for BufFile during hash joins
On Mon, Nov 25, 2019 at 05:33:35PM +0900, Michael Paquier wrote:
> On Tue, Sep 10, 2019 at 03:47:51PM +0200, Tomas Vondra wrote:
>> My feeling is that we should get the BNLJ committed first, and then maybe
>> use some of those additional strategies as fallbacks (depending on which
>> issues are still unsolved by the BNLJ).
>
> The glacier is melting more. Tomas, what's your status here? The patch has
> been waiting on author for two months now. If you are not planning to work
> more on this one, then it should be marked as returned with feedback?

I'm not planning to do any immediate work on this, so I agree with marking
it as RWF. I think Melanie is working on the BNL patch, which seems like
the right solution.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Attempt to consolidate reading of XLOG page
On 2019-Nov-25, Antonin Houska wrote: > Alvaro Herrera wrote: > > I see no reason to leave ws_off. We can move that to XLogReaderState; I > > did that here. We also need the offset in WALReadError, though, so I > > added it there too. Conceptually it seems clearer to me this way. > > > > What do you think of the attached? > > It looks good to me. Attached is just a fix of a minor problem in error > reporting that Michael pointed out earlier. Excellent, I pushed it with this change included and some other cosmetic changes. Now there's only XLogPageRead() ... > > BTW I'm not clear what errors can pread()/pg_pread() report that do not > > set errno. I think lines 1083/1084 of WALRead are spurious now. > > All I can say is that the existing calls of pg_pread() do not clear errno, so > you may be right. Right ... in this interface, we only report an error if pg_pread() returns negative, which is documented to always set errno. > I'd appreciate more background about the "partial read" that > Michael mentions here: > > https://www.postgresql.org/message-id/20191125033048.GG37821%40paquier.xyz In the current implementation, if pg_pread() does a partial read, we just loop one more time. I considered changing the "if (readbytes <= 0)" with "if (readbytes < segbytes)", but that seemed pointless. However, writing this now makes me think that we should add a CHECK_FOR_INTERRUPTS in this loop. (I also wonder if we shouldn't limit the number of times we retry if pg_pread returns zero (i.e. no error, but no bytes read either). I don't know if this is a real-world consideration.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
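For illustration, here is a minimal standalone sketch of the retry pattern under discussion, written against plain pread() rather than the backend's pg_pread()/WALRead(); the cap on zero-byte retries and the interrupt check are only the ideas floated above, not something claimed to exist in the tree:

#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly "count" bytes at "offset", tolerating partial reads.
 * Returns false on a hard error (errno set) or if the file keeps
 * returning zero bytes, e.g. because it is shorter than expected.
 */
static bool
read_fully(int fd, char *buf, size_t count, off_t offset)
{
	int			zero_reads = 0;

	while (count > 0)
	{
		ssize_t		nread = pread(fd, buf, count, offset);

		if (nread < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted by a signal, just retry */
			return false;		/* real error, errno is set */
		}
		if (nread == 0)
		{
			if (++zero_reads > 3)
				return false;	/* give up rather than loop forever */
			continue;
		}
		zero_reads = 0;

		/* in backend code, this is where a CHECK_FOR_INTERRUPTS() could go */

		buf += nread;
		offset += nread;
		count -= (size_t) nread;
	}
	return true;
}

Whether such a check belongs in the common read loop or in its callers is exactly the open question above.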
Re: TestLib::command_fails_like enhancement
On 11/25/19 5:08 AM, Andrew Dunstan wrote: On 11/11/19 4:28 PM, Mark Dilger wrote: On further consideration, I'm wondering why we don't just unconditionally use a closed input pty for all these functions (except run_log). None of them use any input, and making the client worry about whether or not to close it seems something of an abstraction break. There would be no API change at all involved in this case, just a bit of extra cleanliness. Would need testing on Windows, I'll go and do that now. Thoughts? That sounds a lot better than your previous patch. PostgresNode.pm and TestLib.pm both invoke IPC::Run::run. If you change all the invocations in TestLib to close input pty, should you do the same for PostgresNode? I don't have a strong argument for doing so, but it seems cleaner to have both libraries invoking commands under identical conditions, such that if commands were borrowed from one library and called from the other they would behave the same. PostgresNode already uses TestLib, so perhaps setting up the environment can be abstracted into something, perhaps TestLib::run, and then used everywhere that IPC::Run::run currently is used. I don't think we need to do that. In the case of the PostgresNode.pm uses we know what the executable is, unlike the the TestLib.pm cases. They are our own executables and we don't expect them to be doing anything funky with /dev/tty. Ok. I think your proposal sounds fine. Here's a patch for that. The pty stuff crashes and burns on my Windows test box, so there I just set stdin to an empty string via the usual pipe mechanism. Ok, I've reviewed and tested this. It works fine for me on Linux. I am not set up to test it on Windows. I think it is ready to commit. I have one remaining comment about the code, and this is just FYI. I won't quibble with you committing your patch as it currently stands. You might consider changing the '\x04' literal to use a named control character, both for readability and portability, as here: + use charnames ':full'; + @no_stdin = ('
GROUPING SETS and SQL standard
Hi, We are still on the process to migrate our applications from proprietary RDBMS to PostgreSQL. Here is a simple query executed on various systems (real query is different but this one does not need any data) : Connected to: Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production Version 19.3.0.0.0 SQL> select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; COUNT(*) -- 0 select @@version; GO --- --- -- Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) - 14.0.3223.3 (X64) Jul 12 2019 17:43:08 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) select count(*) from (select 1 as c1 where 0=1 group by grouping sets(())) tmp; GO --- 0 (1 rows affected) select version(); version PostgreSQL 11.5 (Debian 11.5-1+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; count --- 1 (1 ligne) 0 or 1, which behaviour conforms to the SQL standard ? We have a workaround and it's just informational. Regards, Phil
Re: TestLib::command_fails_like enhancement
On 11/25/19 1:56 PM, Mark Dilger wrote: > > > On 11/25/19 5:08 AM, Andrew Dunstan wrote: >> >> On 11/11/19 4:28 PM, Mark Dilger wrote: >>> >>> >>> >> >> On further consideration, I'm wondering why we don't just >> unconditionally use a closed input pty for all these functions >> (except >> run_log). None of them use any input, and making the client worry >> about >> whether or not to close it seems something of an abstraction break. >> There would be no API change at all involved in this case, just a >> bit of >> extra cleanliness. Would need testing on Windows, I'll go and do >> that >> now. >> >> >> Thoughts? > > That sounds a lot better than your previous patch. > > PostgresNode.pm and TestLib.pm both invoke IPC::Run::run. If you > change all the invocations in TestLib to close input pty, should you > do the same for PostgresNode? I don't have a strong argument for > doing so, but it seems cleaner to have both libraries invoking > commands under identical conditions, such that if commands were > borrowed from one library and called from the other they would behave > the same. > > PostgresNode already uses TestLib, so perhaps setting up the > environment can be abstracted into something, perhaps TestLib::run, > and then used everywhere that IPC::Run::run currently is used. I don't think we need to do that. In the case of the PostgresNode.pm uses we know what the executable is, unlike the the TestLib.pm cases. They are our own executables and we don't expect them to be doing anything funky with /dev/tty. >>> >>> Ok. I think your proposal sounds fine. >> >> >> >> Here's a patch for that. The pty stuff crashes and burns on my Windows >> test box, so there I just set stdin to an empty string via the usual >> pipe mechanism. > > Ok, I've reviewed and tested this. It works fine for me on Linux. I > am not set up to test it on Windows. I think it is ready to commit. > > I have one remaining comment about the code, and this is just FYI. I > won't quibble with you committing your patch as it currently stands. > > You might consider changing the '\x04' literal to use a named control > character, both for readability and portability, as here: > > + use charnames ':full'; > + @no_stdin = (' > The only character set I can find where this matters is EBCDIC, in > which the EOT character is 55 rather than 4. Since EBCDIC does not > occur in the list of supported character sets for postgres, per the > docs section 23.3.1, I don't suppose it matters too much. Nor can > I test how this works on EBCDIC, so I'm mostly guessing that perl > would do the right thing there. But, at least to my eyes, it is > more immediately clear what the code is doing when the control > character name is spelled out. > > Agreed, I'll do it that way. This is quite timely, as I just finished reworking the patch that relies on it. Thanks for the review. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
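For concreteness, the two spellings under discussion denote the same character; a tiny illustration, assuming a Perl new enough (5.16 or later) for \N{} to accept the official name alias of U+0004 (the snippet is not taken from the patch itself):

# "\x04" and the named form are the same EOT control character;
# the named spelling documents the intent better.
use charnames ':full';

my $eot_escape = "\x04";
my $eot_named  = "\N{END OF TRANSMISSION}";
print "identical\n" if $eot_escape eq $eot_named;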
Re: GROUPING SETS and SQL standard
po 25. 11. 2019 v 20:32 odesílatel Phil Florent napsal: > Hi, > > We are still on the process to migrate our applications from proprietary > RDBMS to PostgreSQL. > > Here is a simple query executed on various systems (real query is > different but this one does not need any data) : > > Connected to: > > Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production > > Version 19.3.0.0.0 > > > > SQL> select count(*) from (select 1 from dual where 0=1 group by grouping > sets(())) tmp; > > > > COUNT(*) > > -- > > 0 > > > > > > select @@version; > > GO > > > > > --- > --- > -- > > Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) - 14.0.3223.3 (X64) > > Jul 12 2019 17:43:08 > > Copyright (C) 2017 Microsoft Corporation > > Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) > > > > select count(*) from (select 1 as c1 where 0=1 group by grouping sets(())) > tmp; > > GO > > > > --- > > 0 > > > > (1 rows affected) > > > > > > select version(); > > version > > > > > PostgreSQL 11.5 (Debian 11.5-1+deb10u1) on x86_64-pc-linux-gnu, compiled > by gcc (Debian 8.3.0-6) 8.3.0, 64-bit > > > > > > > > select count(*) from (select 1 from dual where 0=1 group by grouping > sets(())) tmp; > > count > > --- > > 1 > > (1 ligne) > > > > > > 0 or 1, which behaviour conforms to the SQL standard ? We have a > workaround and it's just informational. > This example has not too much sense - I am not sure if these corner cases are described by ANSI SQL standards. If I add aggregate query to subquery - using grouping sets without aggregation function is strange, then Postgres result looks more correct postgres=# select 1, count(*) from dual group by grouping sets(()); ┌──┬───┐ │ ?column? │ count │ ╞══╪═══╡ │1 │ 1 │ └──┴───┘ (1 row) postgres=# select 1, count(*) from dual where false group by grouping sets(()); ┌──┬───┐ │ ?column? │ count │ ╞══╪═══╡ │1 │ 0 │ └──┴───┘ (1 row) SELECT count(*) from this should be one in both cases. I am not sure, if standard describe using grouping sets without any aggregation function Pavel > > Regards, > > > Phil > >
Re: FETCH FIRST clause WITH TIES option
On 2019-Nov-11, Alvaro Herrera wrote: > I'm not sure the proposed changes to gram.y are all that great, though. Here's a proposed simplification of the gram.y changes. There are two things here: 1. cosmetic: we don't need the LimitClause struct; we can use just SelectLimit, and return that from limit_clause; that can be complemented using the offset_clause if there's any at select_limit level. 2. there's a gratuituous palloc() in opt_select_limit when there's no clause, that seems to be there just so that NULLs can be returned. That's one extra palloc for SELECTs parsed using one the affected productions ... it's not every single select, but it seems bad enough it's worth fixing. I fixed #2 by just checking whether the return from opt_select_limit is NULL. ISTM it'd be better to pass the SelectLimit pointer to insertSelectOptions instead (which is a static function in gram.y anyway so there's no difficulty there). -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index e776215155..f4304a45b9 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -135,13 +135,6 @@ typedef struct SelectLimit LimitOption limitOption; } SelectLimit; -/* Private struct for the result of limit_clause production */ -typedef struct LimitClause -{ - Node *limitCount; - LimitOption limitOption; -} LimitClause; - /* ConstraintAttributeSpec yields an integer bitmask of these flags: */ #define CAS_NOT_DEFERRABLE 0x01 #define CAS_DEFERRABLE0x02 @@ -258,8 +251,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); PartitionSpec *partspec; PartitionBoundSpec *partboundspec; RoleSpec *rolespec; - struct SelectLimit *SelectLimit; - struct LimitClause *LimitClause; + struct SelectLimit *selectlimit; } %type stmt schema_stmt @@ -392,8 +384,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type import_qualification_type %type import_qualification %type vacuum_relation -%type opt_select_limit select_limit -%type limit_clause +%type opt_select_limit select_limit limit_clause %type stmtblock stmtmulti OptTableElementList TableElementList OptInherit definition @@ -11343,8 +11334,9 @@ select_no_parens: | select_clause opt_sort_clause for_locking_clause opt_select_limit { insertSelectOptions((SelectStmt *) $1, $2, $3, - ($4)->limitOffset, ($4)->limitCount, - ($4)->limitOption, + ($4) ? ($4)->limitOffset : NULL, + ($4) ? ($4)->limitCount : NULL, + ($4) ? ($4)->limitOption : LIMIT_OPTION_DEFAULT, NULL, yyscanner); $$ = $1; @@ -11362,7 +11354,7 @@ select_no_parens: { insertSelectOptions((SelectStmt *) $2, NULL, NIL, NULL, NULL, - LIMIT_OPTION_DEFAULT,$1, + LIMIT_OPTION_DEFAULT, $1, yyscanner); $$ = $2; } @@ -11370,15 +11362,16 @@ select_no_parens: { insertSelectOptions((SelectStmt *) $2, $3, NIL, NULL, NULL, - LIMIT_OPTION_DEFAULT,$1, + LIMIT_OPTION_DEFAULT, $1, yyscanner); $$ = $2; } | with_clause select_clause opt_sort_clause for_locking_clause opt_select_limit { insertSelectOptions((SelectStmt *) $2, $3, $4, - ($5)->limitOffset, ($5)->limitCount, - ($5)->limitOption, + ($5) ? ($5)->limitOffset : NULL, + ($5) ? ($5)->limitCount : NULL, + ($5) ? 
($5)->limitOption : LIMIT_OPTION_DEFAULT, $1, yyscanner); $$ = $2; @@ -11683,27 +11676,17 @@ sortby: a_expr USING qual_all_Op opt_nulls_order select_limit: limit_clause offset_clause { - SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit)); - n->limitOffset = $2; - n->limitCount = ($1)->limitCount; - n->limitOption = ($1)->limitOption; - $$ = n; + $$ = $1; + ($$)->limitOffset = $2; } | offset_clause limit_clause { - SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit)); - n->limitOffset = $1; - n->limitCount = ($2)->limitCount; - n->limitOption = ($2)->limitOption; - $$ = n; + $$ = $2; + ($$)->limitOffset = $1; } | limit_clause { - SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit)); - n->limitOffset = NULL; - n->limitCount = ($1)->limitCount; - n->limitOption = ($1)->limitOption; - $$ = n; + $$ = $1; } | offset_clause { @@ -11717,20 +11700,14 @@ select_limit: opt_select_limit: select_limit { $$ = $1; } - | /* EMPTY */ -{ - SelectLimit *n = (SelectLimit *) palloc(sizeof(SelectLimit)); - n->limitOffset = NULL; - n->limitCount = NULL; - n->limitOption = LIMIT_OPTION_DEFAULT; - $$ = n; -} + | /* EMPTY */ { $$ =
Re: FETCH FIRST clause WITH TIES option
(Prior to posting this delta patch, the CF bot appeared happy with this patch.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
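For readers following along, the user-visible syntax the patch series implements is the standard FETCH FIRST ... WITH TIES clause; a minimal illustration with made-up table and column names:

-- Returns the top 3 rows by amount, plus any further rows that tie
-- with the 3rd one on the ORDER BY key.
SELECT *
FROM sales
ORDER BY amount DESC
FETCH FIRST 3 ROWS WITH TIES;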
Re: [HACKERS] WAL logging problem in 9.4.3?
On Sat, Nov 23, 2019 at 4:21 PM Noah Misch wrote: > I noticed an additional defect: > > BEGIN; > CREATE TABLE t (c) AS SELECT 1; > CHECKPOINT; -- write and fsync the table's one page > TRUNCATE t; -- no WAL > COMMIT; -- no FPI, just the commit record > > If we crash after the COMMIT and before the next fsync or OS-elected sync of > the table's file, the table will stay on disk with its pre-TRUNCATE content. Shouldn't the TRUNCATE be triggering an fsync() to happen before COMMIT is permitted to complete? You'd have the same problem if the TRUNCATE were replaced by INSERT, unless fsync() happens in that case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: libpq sslpassword parameter and callback function
On 10/31/19 7:27 PM, Andrew Dunstan wrote: > On 10/31/19 6:34 PM, Andrew Dunstan wrote: >> This time with attachment. >> >> >> On 10/31/19 6:33 PM, Andrew Dunstan wrote: >>> This patch provides for an sslpassword parameter for libpq, and a hook >>> that a client can fill in for a callback function to set the password. >>> >>> >>> This provides similar facilities to those already available in the JDBC >>> driver. >>> >>> >>> There is also a function to fetch the sslpassword from the connection >>> parameters, in the same way that other settings can be fetched. >>> >>> >>> This is mostly the excellent work of my colleague Craig Ringer, with a >>> few embellishments from me. >>> >>> >>> Here are his notes: >>> >>> >>> Allow libpq to non-interactively decrypt client certificates that >>> are stored >>> encrypted by adding a new "sslpassword" connection option. >>> >>> The sslpassword option offers a middle ground between a cleartext >>> key and >>> setting up advanced key mangement via openssl engines, PKCS#11, USB >>> crypto >>> offload and key escrow, etc. >>> >>> Previously use of encrypted client certificate keys only worked if >>> the user >>> could enter the key's password interactively on stdin, in response >>> to openssl's >>> default prompt callback: >>> >>> Enter PEM passhprase: >>> >>> That's infesible in many situations, especially things like use from >>> postgres_fdw. >>> >>> This change also allows admins to prevent libpq from ever prompting >>> for a >>> password by calling: >>> >>> PQsetSSLKeyPassHook(PQdefaultSSLKeyPassHook); >>> >>> which is useful since OpenSSL likes to open /dev/tty to prompt for a >>> password, >>> so even closing stdin won't stop it blocking if there's no user >>> input available. >>> Applications may also override or extend SSL password fetching with >>> their own >>> callback. >>> >>> There is deliberately no environment variable equivalent for the >>> sslpassword >>> option. >>> >>> > I should also mention that this patch provides for support for DER > format certificates and keys. > > Here's an updated version of the patch, adjusted to the now committed changes to TestLib.pm. 
cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From b3b2104ab0a17383bb1e55ef59eaa609a417ed40 Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Mon, 25 Nov 2019 14:59:05 -0500 Subject: [PATCH] libpq sslpassword and DER support --- contrib/dblink/expected/dblink.out| 2 +- doc/src/sgml/libpq.sgml | 116 ++ doc/src/sgml/postgres-fdw.sgml| 2 +- src/interfaces/libpq/exports.txt | 4 + src/interfaces/libpq/fe-connect.c | 14 +++ src/interfaces/libpq/fe-secure-openssl.c | 99 +- src/interfaces/libpq/libpq-fe.h | 14 +++ src/interfaces/libpq/libpq-int.h | 2 + src/test/ssl/Makefile | 22 +++- src/test/ssl/ssl/client-der.key | Bin 0 -> 1191 bytes src/test/ssl/ssl/client-encrypted-der.key | Bin 0 -> 1191 bytes src/test/ssl/ssl/client-encrypted-pem.key | 30 ++ src/test/ssl/t/001_ssltests.pl| 66 ++-- 13 files changed, 354 insertions(+), 17 deletions(-) create mode 100644 src/test/ssl/ssl/client-der.key create mode 100644 src/test/ssl/ssl/client-encrypted-der.key create mode 100644 src/test/ssl/ssl/client-encrypted-pem.key diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out index 6ceabb453c..6516d4f131 100644 --- a/contrib/dblink/expected/dblink.out +++ b/contrib/dblink/expected/dblink.out @@ -879,7 +879,7 @@ $d$; CREATE USER MAPPING FOR public SERVER fdtest OPTIONS (server 'localhost'); -- fail, can't specify server here ERROR: invalid option "server" -HINT: Valid options in this context are: user, password +HINT: Valid options in this context are: user, password, sslpassword CREATE USER MAPPING FOR public SERVER fdtest OPTIONS (user :'USER'); GRANT USAGE ON FOREIGN SERVER fdtest TO regress_dblink_user; GRANT EXECUTE ON FUNCTION dblink_connect_u(text, text) TO regress_dblink_user; diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 258b09cf8e..6082b38e9e 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -776,6 +776,72 @@ PGPing PQping(const char *conninfo); + + PQsetSSLKeyPassHookPQsetSSLKeyPassHook + + + PQsetSSLKeyPassHook lets an application override + libpq's default + handling of encrypted client certificate key files using +or interactive prompting. + + +void PQsetSSLKeyPassHook(PQsslKeyPassHook_type hook); + + + The application passes a pointer to a callback function with signature: + + int callback_fn(char *buf, int size, PGc
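The quoted documentation is cut off above mid-signature, so the following sketch assumes the callback's third parameter is the PGconn pointer and that, as with OpenSSL's pem_password_cb convention, the hook copies the passphrase into buf and returns its length (0 on failure). PQsetSSLKeyPassHook exists only with this patch applied, and the function and environment-variable names below are made up:

#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Hypothetical hook: fetch the client-certificate key passphrase from an
 * application-specific source instead of letting OpenSSL prompt on /dev/tty.
 */
static int
my_sslkey_passwd(char *buf, int size, PGconn *conn)
{
	const char *pw = getenv("MYAPP_SSLKEY_PASSPHRASE");

	if (pw == NULL || (int) strlen(pw) >= size)
		return 0;				/* nothing usable; treat as failure */
	strcpy(buf, pw);
	return (int) strlen(pw);
}

static void
init_libpq_hooks(void)
{
	/* register once at application startup */
	PQsetSSLKeyPassHook(my_sslkey_passwd);
}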
RE: GROUPING SETS and SQL standard
Hi, Thank you, as you mentionned it's not really an interesting real life case anyway. Regards, Phil De : Pavel Stehule Envoyé : lundi 25 novembre 2019 21:23 À : Phil Florent Cc : pgsql-hack...@postgresql.org Objet : Re: GROUPING SETS and SQL standard po 25. 11. 2019 v 20:32 odesílatel Phil Florent mailto:philflor...@hotmail.com>> napsal: Hi, We are still on the process to migrate our applications from proprietary RDBMS to PostgreSQL. Here is a simple query executed on various systems (real query is different but this one does not need any data) : Connected to: Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production Version 19.3.0.0.0 SQL> select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; COUNT(*) -- 0 select @@version; GO --- --- -- Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) - 14.0.3223.3 (X64) Jul 12 2019 17:43:08 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) select count(*) from (select 1 as c1 where 0=1 group by grouping sets(())) tmp; GO --- 0 (1 rows affected) select version(); version PostgreSQL 11.5 (Debian 11.5-1+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; count --- 1 (1 ligne) 0 or 1, which behaviour conforms to the SQL standard ? We have a workaround and it's just informational. This example has not too much sense - I am not sure if these corner cases are described by ANSI SQL standards. If I add aggregate query to subquery - using grouping sets without aggregation function is strange, then Postgres result looks more correct postgres=# select 1, count(*) from dual group by grouping sets(()); ┌──┬───┐ │ ?column? │ count │ ╞══╪═══╡ │1 │ 1 │ └──┴───┘ (1 row) postgres=# select 1, count(*) from dual where false group by grouping sets(()); ┌──┬───┐ │ ?column? │ count │ ╞══╪═══╡ │1 │ 0 │ └──┴───┘ (1 row) SELECT count(*) from this should be one in both cases. I am not sure, if standard describe using grouping sets without any aggregation function Pavel Regards, Phil
Re: [Doc] pg_restore documentation didn't explain how to use connection string
On Wed, 2019-11-13 at 16:48 +0100, Lætitia Avrot wrote: > So after some thoughts I did the minimal patch (for now). > I corrected documentation for the following tools so that now, using > connection string > for Postgres client applications is documented in Postgres: > - clusterdb > - pgbench > - pg_dump > - pg_restore > - reindexdb > - vacuumdb I think that this patch is a good idea. Even if it adds some redundancy, that can hardly be avoided because, as you said, the options to specify the database name are not the same everywhere. The patch applies and build fine. I see some room for improvement: - I think that "connection string" is better than "conninfo string". At least the chapter to which you link is headed "Connection Strings". This would also be consistent with the use of that term in the "pg_basebackup" , "pg_dumpall" and "pg_receivewal" documentation. You seem to have copied that wording from the "pg_isready", "psql", "reindexdb" and "vacuumdb" documentation, but I think it would be better to reword those too. - You begin your paragraph with "if this parameter contains ...". First, I think "argument" might be more appropriate here, as you are talking about a) the supplied value and b) a command line argument or the argument to an option Besides, it might be confusing to refer to "*this* parameter" if the text is not immediately after what you are referring to, like for example in "pgbench", where it might refer to the -h, -p or -U options. I think it would be better and less ambiguous to use "If dbname contains ..." In the cases where there is no ambiguity, it might be better to use a wording like in the "pg_recvlogical" documentation. - There are two places you forgot: createdb --maintenance-db=dbname dropdb --maintenance-db=dbname While looking at this patch, I noticed that "createuser" and "dropuser" explicitly connect to the "postgres" database rather than using "connectMaintenanceDatabase()" like the other scripts, which would try the database "postgres" first and fall back to "template1". This is unrelated to the patch, but low-hanging fruit for unified behavior. Yours, Laurenz Albe
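As a concrete illustration of what the patch documents, the dbname argument of these client programs can already be a connection string or URI; the host and database names below are made up:

pg_restore -d "host=db.example.com port=5432 dbname=appdb sslmode=require" app.dump
pg_dump "postgresql://app_user@db.example.com/appdb" -Fc -f app.dump
vacuumdb --maintenance-db="host=db.example.com dbname=postgres" --all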
Re: [HACKERS] WAL logging problem in 9.4.3?
On Mon, Nov 25, 2019 at 03:58:14PM -0500, Robert Haas wrote: > On Sat, Nov 23, 2019 at 4:21 PM Noah Misch wrote: > > I noticed an additional defect: > > > > BEGIN; > > CREATE TABLE t (c) AS SELECT 1; > > CHECKPOINT; -- write and fsync the table's one page > > TRUNCATE t; -- no WAL > > COMMIT; -- no FPI, just the commit record > > > > If we crash after the COMMIT and before the next fsync or OS-elected sync of > > the table's file, the table will stay on disk with its pre-TRUNCATE content. > > Shouldn't the TRUNCATE be triggering an fsync() to happen before > COMMIT is permitted to complete? With wal_skip_threshold=0, you do get an fsync(). The patch tries to avoid at-commit fsync of small files by WAL-logging file contents instead. However, the patch doesn't WAL-log enough to handle files that decreased in size. > You'd have the same problem if the > TRUNCATE were replaced by INSERT, unless fsync() happens in that case. I think an insert would be fine. You'd get an FPI record for the relation's one page, which fully reproduces the relation.
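To restate the two behaviours in executable form: this assumes wal_level = minimal and the patch under discussion applied, and wal_skip_threshold is the GUC that patch introduces, not an existing setting:

-- With wal_skip_threshold = 0 the small-file optimization is effectively
-- disabled, so the commit fsyncs the file instead of relying on WAL,
-- avoiding the stale-content problem described above.
SET wal_skip_threshold = 0;
BEGIN;
CREATE TABLE t (c) AS SELECT 1;
CHECKPOINT;
TRUNCATE t;
COMMIT;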
Re: Allow superuser to grant passwordless connection rights on postgres_fdw
On Sun, Nov 10, 2019 at 4:35 AM Craig Ringer wrote: > > On Mon, 4 Nov 2019 at 12:20, Stephen Frost wrote: >> >> Greetings, >> >> * Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote: >> > On 11/1/19 12:58 PM, Robert Haas wrote: >> > > On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan >> > > wrote: >> > >> This patch allows the superuser to grant passwordless connection rights >> > >> in postgres_fdw user mappings. >> > > This is clearly something that we need, as the current code seems >> > > woefully ignorant of the fact that passwords are not the only >> > > authentication method supported by PostgreSQL, nor even the most >> > > secure. >> > > >> > > But, I do wonder a bit if we ought to think harder about the overall >> > > authentication model for FDW. Like, maybe we'd take a different view >> > > of how to solve this particular piece of the problem if we were >> > > thinking about how FDWs could do LDAP authentication, SSL >> > > authentication, credentials forwarding... >> > >> > I'm certainly open to alternatives. >> >> I've long felt that the way to handle this kind of requirement is to >> have a "trusted remote server" kind of option- where the local server >> authenticates to the remote server as a *server* and then says "this is >> the user on this server, and this is the user that this user wishes to >> be" and the remote server is then able to decide if they accept that, or >> not. > > > The original use case for the patch was to allow FDWs to use SSL/TLS client > certificates. Each user-mapping has its own certificate - there's a separate > patch to allow that. So there's no delegation of trust via Kerberos etc in > that particular case. > > I can see value in using Kerberos etc for that too though, as it separates > authorization and authentication in the same manner as most sensible systems. > You can say "user postgres@foo is trusted to vet users so you can safely hand > out tickets for any bar@foo that postgres@foo says is legit". > > I would strongly discourage allowing all users on host A to authenticate as > user postgres on host B. But with appropriate user-mappings support, we could > likely support that sort of model for both SSPI and Kerberos. > > A necessary prerequisite is that Pg be able to cope with passwordless > user-mappings though. Hence this patch. > > Yeah, I agree. Does anyone else want to weigh in on this? If nobody objects I'd like to tidy this up and get it committed so we can add support for client certs in postgres_fdw, which is the real business at hand, and which I know from various offline comments a number of people are keen to have. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: global / super barriers (for checksums)
On Wed, Nov 13, 2019 at 12:26 PM Robert Haas wrote: > On the other hand, 0002 seems like it's pretty clearly a good idea. It > makes a whole bunch of auxiliary processes use > procsignal_sigusr1_handler() and those things all get called from > AuxiliaryProcessMain(), which does ProcSignalInit(), and it seems like > clearly the right idea that processes which register to participate in > the procsignal mechanism should also register to get notified if they > receive a procsignal. I think that the reason we haven't bothered with > this up until now is because I think that it's presently impossible > for any of the kind of procsignals that we have to get sent to any of > those processes. But, global barriers would require us to do so, so it > seems like it's time to tighten that up, and it doesn't really cost > anything. So I propose to commit this part soon, unless somebody > objects. Done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: benchmarking Flex practices
[ My apologies for being so slow to get back to this ] John Naylor writes: > Now that I think of it, the regression in v7 was largely due to the > fact that the parser has to call the lexer 3 times per string in this > case, and that's going to be slower no matter what we do. Ah, of course. I'm not too fussed about the performance of queries with an explicit UESCAPE clause, as that seems like a very minority use-case. What we do want to pay attention to is not regressing for plain identifiers/strings, and to a lesser extent the U& cases without UESCAPE. > Inlining hexval() and friends seems to have helped somewhat for > unicode escapes, but I'd have to profile to improve that further. > However, v8 has regressed from v7 enough with both simple strings and > the information schema that it's a noticeable regression from HEAD. > I'm guessing getting rid of the "Uescape" production is to blame, but > I haven't tried reverting just that one piece. Since inlining the > rules didn't seem to help with the precedence hacks, it seems like the > separate production was a better way. Thoughts? I have duplicated your performance tests here, and get more or less the same results (see below). I agree that the performance of the v8 patch isn't really where we want to be --- and it also seems rather invasive to gram.y, and hence error-prone. (If we do it like that, I bet my bottom dollar that somebody would soon commit a patch that adds a production using IDENT not Ident, and it'd take a long time to notice.) It struck me though that there's another solution we haven't discussed, and that's to make the token lookahead filter in parser.c do the work of converting UIDENT [UESCAPE SCONST] to IDENT, and similarly for the string case. I pursued that to the extent of developing the attached incomplete patch ("v9"), which looks reasonable from a performance standpoint. I get these results with tests using the drive_parser function: information_schema HEAD3447.674 ms, 3433.498 ms, 3422.407 ms v6 3381.851 ms, 3442.478 ms, 3402.629 ms v7 3525.865 ms, 3441.038 ms, 3473.488 ms v8 3567.640 ms, 3488.417 ms, 3556.544 ms v9 3456.360 ms, 3403.635 ms, 3418.787 ms pgbench str HEAD4414.046 ms, 4376.222 ms, 4356.468 ms v6 4304.582 ms, 4245.534 ms, 4263.562 ms v7 4395.815 ms, 4398.381 ms, 4460.304 ms v8 4475.706 ms, 4466.665 ms, 4471.048 ms v9 4392.473 ms, 4316.549 ms, 4318.472 ms pgbench unicode HEAD4959.000 ms, 4921.751 ms, 4945.069 ms v6 4856.998 ms, 4802.996 ms, 4855.486 ms v7 5057.199 ms, 4948.342 ms, 4956.614 ms v8 5008.090 ms, 4963.641 ms, 4983.576 ms v9 4809.227 ms, 4767.355 ms, 4741.641 ms pgbench uesc HEAD5114.401 ms, 5235.764 ms, 5200.567 ms v6 5030.156 ms, 5083.398 ms, 4986.974 ms v7 5915.508 ms, 5953.135 ms, 5929.775 ms v8 5678.810 ms, 5665.239 ms, 5645.696 ms v9 5648.965 ms, 5601.592 ms, 5600.480 ms (A note about what we're looking at: on my machine, after using cpupower to lock down the CPU frequency, and taskset to bind everything to one CPU socket, I can get numbers that are very repeatable, to 0.1% or so ... until I restart the postmaster, and then I get different but equally repeatable numbers. The difference can be several percent, which is a lot of noise compared to what we're looking for. I believe the explanation is that kernel ASLR has loaded the backend executable at some different addresses and so there are different cache-line-boundary effects. While I could lock that down too by disabling ASLR, the result would be to overemphasize chance effects of a particular set of cache line boundaries. 
So I prefer to run all the tests over again after restarting the postmaster, a few times, and then look at the overall set of results to see what things look like. Each number quoted above is median-of-three tests within a single postmaster run.) Anyway, my conclusion is that the attached patch is at least as fast as today's HEAD; it's not as fast as v6, but on the other hand it's an even smaller postmaster executable, so there's something to be said for that: $ size postg* textdata bss dec hex filename 7478138 57928 203360 7739426 761822 postgres.head 7271218 57928 203360 7532506 72efda postgres.v6 7275810 57928 203360 7537098 7301ca postgres.v7 7276978 57928 203360 7538266 73065a postgres.v8 7266274 57928 203360 7527562 72dc8a postgres.v9 I based this on your v7 not v8; not sure if there's anything you want to salvage from v8. Generally, I'm pretty happy with this approach: it touches gram.y hardly at all, and it removes just about all of the complexity from scan.l. I'm happier about dropping the support code into parser.c than the other choices we've discussed. There's still undone work here, though: * I did not touch psql. Probably your patch is fine for that. * I did not do more with ecpg than get it to compile, using the same hacks as in your v7. It still fai
RE: GROUPING SETS and SQL standard
A of () (called grand total in the Standard) is equivalent to grouping the entire result Table; If I get it correctly: select max(dummy) from dual where 0 = 1 group by grouping sets(()); and select max(dummy) from dual where 0 = 1 ; should have the same output. It's the case with PostgreSQL, not with Oracle. Hence it means it's PostgreSQL which conforms to the standard in this case. Regards, Phil De : Phil Florent Envoyé : lundi 25 novembre 2019 22:18 À : Pavel Stehule Cc : pgsql-hack...@postgresql.org Objet : RE: GROUPING SETS and SQL standard Hi, Thank you, as you mentionned it's not really an interesting real life case anyway. Regards, Phil De : Pavel Stehule Envoyé : lundi 25 novembre 2019 21:23 À : Phil Florent Cc : pgsql-hack...@postgresql.org Objet : Re: GROUPING SETS and SQL standard po 25. 11. 2019 v 20:32 odesílatel Phil Florent mailto:philflor...@hotmail.com>> napsal: Hi, We are still on the process to migrate our applications from proprietary RDBMS to PostgreSQL. Here is a simple query executed on various systems (real query is different but this one does not need any data) : Connected to: Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production Version 19.3.0.0.0 SQL> select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; COUNT(*) -- 0 select @@version; GO --- --- -- Microsoft SQL Server 2017 (RTM-CU16) (KB4508218) - 14.0.3223.3 (X64) Jul 12 2019 17:43:08 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) select count(*) from (select 1 as c1 where 0=1 group by grouping sets(())) tmp; GO --- 0 (1 rows affected) select version(); version PostgreSQL 11.5 (Debian 11.5-1+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; count --- 1 (1 ligne) 0 or 1, which behaviour conforms to the SQL standard ? We have a workaround and it's just informational. This example has not too much sense - I am not sure if these corner cases are described by ANSI SQL standards. If I add aggregate query to subquery - using grouping sets without aggregation function is strange, then Postgres result looks more correct postgres=# select 1, count(*) from dual group by grouping sets(()); ┌──┬───┐ │ ?column? │ count │ ╞══╪═══╡ │1 │ 1 │ └──┴───┘ (1 row) postgres=# select 1, count(*) from dual where false group by grouping sets(()); ┌──┬───┐ │ ?column? │ count │ ╞══╪═══╡ │1 │ 0 │ └──┴───┘ (1 row) SELECT count(*) from this should be one in both cases. I am not sure, if standard describe using grouping sets without any aggregation function Pavel Regards, Phil
Re: GROUPING SETS and SQL standard
Phil Florent writes: > A of () (called grand total in the Standard) is > equivalent to grouping the entire result Table; Yeah, I believe so. Grouping by no columns is similar to what happens if you compute an aggregate with no GROUP BY: the whole table is taken as one group. If the table is empty, the group is empty, but there's still a group --- that's why you get one aggregate output value, not none, from regression=# select count(*) from dual where 0 = 1; count --- 0 (1 row) Thus, in your example, the sub-query should give regression=# select 1 from dual where 0=1 group by grouping sets(()); ?column? -- 1 (1 row) and therefore it's correct that regression=# select count(*) from (select 1 from dual where 0=1 group by grouping sets(())) tmp; count --- 1 (1 row) AFAICS, Oracle and SQL Server are getting it wrong. regards, tom lane
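Put another way, using the same one-row dual table as above: the grand-total grouping set is just the empty GROUP BY list, and both forms behave like an aggregate over the whole (here empty) input:

SELECT count(*) FROM dual WHERE 0 = 1 GROUP BY ();                  -- one row: 0
SELECT count(*) FROM dual WHERE 0 = 1 GROUP BY GROUPING SETS (());  -- one row: 0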
Dynamic gathering the values for seq_page_cost/xxx_cost
The optimizer cost model usually needs two kinds of input: one describes the data distribution, and the other describes the capacity of the hardware (CPU, I/O); let's call the latter "system stats". In Oracle, system stats can be gathered with dbms_stats.gather_system_stats [1] on the running hardware. In PostgreSQL, the values are set based on experience (users can change them as well, but it is hard to decide which values to use). The PostgreSQL way is not perfect in theory (in practice it may or may not be good enough). For example, HDD and SSD have very different characteristics with respect to seq_page_cost/random_page_cost, and CPU costs may also differ across hardware.

I ran into a paper [2] which did some research on gathering the values for the xxx_cost parameters dynamically, which looks interesting. However, it doesn't provide code for others to do more research with. Before I dive into this, it would be great to hear suggestions from experts: what do you think about this method, has it been discussed before, and if so, what was the outcome?

[1] https://docs.oracle.com/database/121/ARPLS/d_stats.htm#ARPLS68580
[2] https://dsl.cds.iisc.ac.in/publications/thesis/pankhuri.pdf

Thanks
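For reference, today these capacity parameters are set by hand, either globally or per tablespace; the paper's proposal is essentially to derive such values from measurements instead (the tablespace name below is illustrative):

ALTER SYSTEM SET seq_page_cost = 1.0;
ALTER SYSTEM SET random_page_cost = 1.1;    -- a common hand-picked value for SSDs
SELECT pg_reload_conf();

-- or per tablespace, e.g. for an SSD-backed one:
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1, seq_page_cost = 1.0);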
[PATCH] Fix possible string overflow with sscanf (xlog.c)
Hi, I know it's very hard, but is possible. Just someone with the knowledge to do. Here a proof of concept: #include #include #define MAXPGPATH 256 int main(int argc, char ** argv) { chartbsoid[MAXPGPATH]; charstr[MAXPGPATH]; int ch, prev_ch = -1, i = 0, n; FILE * lfp; lfp = fopen("c:\\tmp\\crash.dat", "rb"); while ((ch = fgetc(lfp)) != EOF) { if ((ch == '\n' || ch == '\r') && prev_ch != '\\') { str[i] = '\0'; if (sscanf(str, "%s %n", tbsoid, &n) != 1) { printf("tbsoid size=%u\n", strlen(tbsoid)); printf("tbsoid=%s\n", tbsoid); exit(1); } i = 0; continue; } else if ((ch == '\n' || ch == '\r') && prev_ch == '\\') str[i - 1] = ch; else str[i++] = ch; prev_ch = ch; } fclose(lfp); } Overflow with (MAXPGPATH=256) C:\usr\src\tests\scanf>sscanf3 tbsoid size=260 tbsoid=x xxx Now with patch: C:\usr\src\tests\scanf>sscanf3 tbsoid size=255 tbsoid=x xx The solution is simple, but clumsy. I hope that is enough. sscanf(str, "%1023s %n", tbsoid, &n) Best regards. Ranier Vileladiff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5f0ee50092..790e68ccf0 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -11382,7 +11382,7 @@ read_tablespace_map(List **tablespaces) if ((ch == '\n' || ch == '\r') && prev_ch != '\\') { str[i] = '\0'; - if (sscanf(str, "%s %n", tbsoid, &n) != 1) + if (sscanf(str, FMTPGPATH, tbsoid, &n) != 1) ereport(FATAL, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("invalid data in file \"%s\"", TABLESPACE_MAP))); diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h index 743401cb96..03e0e3735e 100644 --- a/src/include/pg_config_manual.h +++ b/src/include/pg_config_manual.h @@ -84,6 +84,11 @@ */ #define MAXPGPATH 1024 +/* + * FMTPGPATH: standard format to scan pathname buffer, must follow MAXPGPATH + */ +#define FMTPGPATH "%1023s %n" + /* * PG_SOMAXCONN: maximum accept-queue length limit passed to * listen(2). You'd think we should use SOMAXCONN from
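The point of the fix in standalone form (the buffer size mirrors MAXPGPATH = 1024 and the input string is made up): an unbounded %s conversion can write past the destination, while a width-limited one cannot.

#include <stdio.h>

int
main(void)
{
	char		tbsoid[1024];	/* same size as MAXPGPATH */
	int			n = 0;
	const char *str = "pg_tblspc/16384 trailing text";

	/* "%1023s" leaves room for the NUL terminator; plain "%s" would not */
	if (sscanf(str, "%1023s %n", tbsoid, &n) == 1)
		printf("parsed \"%s\", %d bytes consumed\n", tbsoid, n);
	return 0;
}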
Re: progress report for ANALYZE
Hi Amit-san!

Thanks for your comments!

> I have looked at the patch and here are some comments.
>
> I think include_children and current_relid are not enough to understand
> the progress of analyzing inheritance trees, because even with
> current_relid being updated, I can't tell how many more there will be.
> I think it'd be better to show the total number of children and the
> number of children processed, just like pg_stat_progress_create_index
> does for partitions. So, instead of include_children and current_relid,
> I think it's better to have child_tables_total, child_tables_done, and
> current_child_relid, placed last in the set of columns.

Ah, I understood. I'll check what pg_stat_progress_create_index does for
partitions, and will create a new patch. :)

Related to the above, I wonder whether we need the total number of ext
stats on pg_stat_progress_analyze or not. As you might know, there is the
same counter on pg_stat_progress_vacuum and pg_stat_progress_cluster. For
example, index_vacuum_count and index_rebuild_count. Would it be added to
the total number column to these views? :)

> Also, inheritance tree stats are created *after* creating single table
> stats, so I think that it would be better to have a distinct phase name
> for that, say "acquiring inherited sample rows". In do_analyze_rel(),
> you can select which of two phases to set based on whether inh is true
> or not. For partitioned tables, the progress output will immediately
> switch to this phase, because the partitioned table itself is empty so
> there's nothing to do in the "acquiring sample rows" phase.
>
> That's all for now.

Okay! I'll also add the new phase "acquiring inherited sample rows" in the
next patch. :)

Thanks,
Tatsuro Yamada
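If the columns suggested above are adopted, monitoring could look roughly like the following; child_tables_total, child_tables_done and current_child_relid are only the names proposed in this thread, not a committed catalog definition:

SELECT pid,
       relid::regclass                  AS table_being_analyzed,
       phase,
       child_tables_total,
       child_tables_done,
       current_child_relid::regclass    AS current_child
FROM pg_stat_progress_analyze;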
Re: Problem while updating a foreign table pointing to a partitioned table on foreign server
Hi Michael-san, On Mon, Nov 25, 2019 at 4:13 PM Michael Paquier wrote: > On Tue, Sep 03, 2019 at 12:37:52AM +0900, Etsuro Fujita wrote: > > On Wed, Aug 14, 2019 at 11:51 AM Etsuro Fujita > > wrote: > >> This is my TODO item for PG13, but I'll give priority to other things > >> in the next commitfest. If anyone wants to work on it, feel free; > >> else I'll move this to the November commitfest when it opens. > > > > Moved. > > This has been waiting on author for two commit fests now, and the > situation has not changed. Fujita-san, Hiriguchi-san, do you have an > update to provide? There is no meaning to keep the current stale > situation for more CFs. I was planning to work on this in this commitfest, but sorry, I didn't have time due to other priorities. Probably, I won't have time for this in the development cycle for v13. So I'll mark this as RWF, unless anyone wants to work on it. Best regards, Etsuro Fujita
Re: progress report for ANALYZE
Hi Amit-san,

> Related to the above, I wonder whether we need the total number of ext
> stats on pg_stat_progress_analyze or not. As you might know, there is the
> same counter on pg_stat_progress_vacuum and pg_stat_progress_cluster. For
> example, index_vacuum_count and index_rebuild_count. Would it be added to
> the total number column to these views? :)

Oops, I made a mistake. :(
What I'd like to say was:

  Would it be better to add the total number column to these views? :)

Thanks,
Tatsuro Yamada
Re: checkpointer: PANIC: could not fsync file: No such file or directory
I looked and found a new "hint". On Tue, Nov 19, 2019 at 05:57:59AM -0600, Justin Pryzby wrote: > < 2019-11-15 22:16:07.098 EST >PANIC: could not fsync file > "base/16491/1731839470.2": No such file or directory > < 2019-11-15 22:16:08.751 EST >LOG: checkpointer process (PID 27388) was > terminated by signal 6: Aborted An earlier segment of that relation had been opened successfully and was *still* opened: $ sudo grep 1731839470 /var/spool/abrt/ccpp-2019-11-15-22:16:08-27388/open_fds 63:/var/lib/pgsql/12/data/base/16491/1731839470 For context: $ sudo grep / /var/spool/abrt/ccpp-2019-11-15-22:16:08-27388/open_fds |tail -3 61:/var/lib/pgsql/12/data/base/16491/1757077748 62:/var/lib/pgsql/12/data/base/16491/1756223121.2 63:/var/lib/pgsql/12/data/base/16491/1731839470 So this may be an issue only with relations>segment (but, that interpretation could also be very naive).
Re: Safeguards against incorrect fd flags for fsync()
On Mon, Nov 25, 2019 at 04:18:33PM +0900, Michael Paquier wrote: > Thanks for the review. I'll look at that pretty soon. Tweaked a bit the comment block added, and committed. Thanks Mark for the input! -- Michael signature.asc Description: PGP signature
Re: checkpointer: PANIC: could not fsync file: No such file or directory
On Tue, Nov 26, 2019 at 5:21 PM Justin Pryzby wrote: > I looked and found a new "hint". > > On Tue, Nov 19, 2019 at 05:57:59AM -0600, Justin Pryzby wrote: > > < 2019-11-15 22:16:07.098 EST >PANIC: could not fsync file > > "base/16491/1731839470.2": No such file or directory > > < 2019-11-15 22:16:08.751 EST >LOG: checkpointer process (PID 27388) was > > terminated by signal 6: Aborted > > An earlier segment of that relation had been opened successfully and was > *still* opened: > > $ sudo grep 1731839470 /var/spool/abrt/ccpp-2019-11-15-22:16:08-27388/open_fds > 63:/var/lib/pgsql/12/data/base/16491/1731839470 > > For context: > > $ sudo grep / /var/spool/abrt/ccpp-2019-11-15-22:16:08-27388/open_fds |tail -3 > 61:/var/lib/pgsql/12/data/base/16491/1757077748 > 62:/var/lib/pgsql/12/data/base/16491/1756223121.2 > 63:/var/lib/pgsql/12/data/base/16491/1731839470 > > So this may be an issue only with relations>segment (but, that interpretation > could also be very naive). FTR I have been trying to reproduce this but failing so far. I'm planning to dig some more in the next couple of days. Yeah, it's a .2 file, which means that it's one that would normally be unlinked after you commit your transaction (unlike a no-suffix file, which would normally be dropped at the next checkpoint after the commit, as our strategy to prevent the relfilenode from being reused before the next checkpoint cycle), but should normally have had a SYNC_FORGET_REQUEST enqueued for it first. So the question is, how did it come to pass that a .2 file was ENOENT but there was no forget request? Diificult, given the definition of mdunlinkfork(). I wondered if something was going wrong in queue compaction or something like that, but I don't see it. I need to dig into the exactly flow with the ALTER case to see if there is something I'm missing there, and perhaps try reproducing it with a tiny segment size to exercise some more multisegment-related code paths.
Re: accounting for memory used for BufFile during hash joins
On Mon, Nov 25, 2019 at 07:11:19PM +0100, Tomas Vondra wrote:
> I'm not planning to do any immediate work on this, so I agree with
> marking it as RWF.

Thanks, I have switched the patch to returned with feedback.
--
Michael

signature.asc
Description: PGP signature
Re: [HACKERS] Block level parallel vacuum
On Mon, Nov 25, 2019 at 9:42 PM Masahiko Sawada wrote: > > On Fri, 22 Nov 2019 at 10:19, Amit Kapila wrote: > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > wrote: > > > > > > I've attached the latest version patch set. The patch set includes all > > > discussed points regarding index AM options as well as shared cost > > > balance. Also I added some test cases used all types of index AM. > > > > > > > I have reviewed the first patch and made a number of modifications > > that include adding/modifying comments, made some corrections and > > modifications in the documentation. You can find my changes in > > v33-0001-delta-amit.patch. See, if those look okay to you, if so, > > please include those in the next version of the patch. I am attaching > > both your version of patch and delta changes by me. > > Thank you. > > All changes look good to me. But after changed the 0002 patch the two > macros for parallel vacuum options (VACUUM_OPTIONS_SUPPORT_XXX) is no > longer necessary. So we can remove them and can add if we need them > again. > Sounds reasonable. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Fri, Nov 22, 2019 at 7:38 PM Amit Khandekar wrote: > > On Fri, 22 Nov 2019 at 4:26 PM, Amit Kapila wrote: >> >> I think this is exactly the reason for the problem. In my test [1], >> the error "permission denied" occurred when I second time executed >> pg_logical_slot_get_changes() which means on first execution the >> unlink would have been successful but the files are still not removed >> as they were not closed. Then on second execution, it gets an error >> "Permission denied" when it again tries to unlink files via >> ReorderBufferCleanupSerializedTXNs(). >> >> >> . >> > But what you are seeing is "Permission denied" errors. Not sure why >> > unlink() is failing. >> > >> >> In your test program, if you try to unlink the file second time, you >> should see the error "Permission denied". > > I tested using the sample program and indeed I got the error 5 (access > denied) when I called unlink the second time. > So, what is the next step here? How about if we somehow check whether the file exists before doing unlink, say by using stat? If that doesn't work, I think we might need to go in the direction of tracking file handles in some way, so that they can be closed during an abort. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
On Sun, Nov 17, 2019 at 12:01:08AM +0100, Daniel Gustafsson wrote: > Fixed by opting for the latter, mostly since it seems best convey what the > function does. - recordMultipleDependencies(depender, - context.addrs->refs, context.addrs->numrefs, - behavior); + recordMultipleDependencies(depender, context.addrs->refs, + context.addrs->numrefs, behavior); Some noise diffs. --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out index concur_reindex_ind4| column c1 of table - index concur_reindex_ind4| column c1 of table index concur_reindex_ind4| column c2 of table This removes a duplicated dependency with indexes using the same column multiple times. Guess that's good to get rid of, this was just unnecessary bloat in pg_depend. The regression tests of contrib/test_decoding are still failing here: +ERROR: could not resolve cmin/cmax of catalog tuple Getting rid the duplication between InsertPgAttributeTuples() and InsertPgAttributeTuple() would be nice. You would basically finish by just using a single slot when inserting one tuple.. -- Michael signature.asc Description: PGP signature
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Tue, 26 Nov 2019 at 10:49, Amit Kapila wrote: > > On Fri, Nov 22, 2019 at 7:38 PM Amit Khandekar wrote: > > > > On Fri, 22 Nov 2019 at 4:26 PM, Amit Kapila wrote: > >> > >> I think this is exactly the reason for the problem. In my test [1], > >> the error "permission denied" occurred when I second time executed > >> pg_logical_slot_get_changes() which means on first execution the > >> unlink would have been successful but the files are still not removed > >> as they were not closed. Then on second execution, it gets an error > >> "Permission denied" when it again tries to unlink files via > >> ReorderBufferCleanupSerializedTXNs(). > >> > >> > >> . > >> > But what you are seeing is "Permission denied" errors. Not sure why > >> > unlink() is failing. > >> > > >> > >> In your test program, if you try to unlink the file second time, you > >> should see the error "Permission denied". > > > > I tested using the sample program and indeed I got the error 5 (access > > denied) when I called unlink the second time. > > > > So, what is the next step here? How about if we somehow check whether > the file exists before doing unlink, say by using stat? But the thing is, the behaviour is so much in a grey area, that we cannot reliably say for instance that when stat() says there is no such file, there is indeed no such file, and if we re-create the same file when it is still open, it is always going to open a new file, etc. > If that doesn't work, I think we might need to go in the direction of tracking > file handles in some way, so that they can be closed during an abort. Yeah, that is one way. I am still working on different approaches. WIll get back with proposals. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Re: dropdb --force
On Mon, Nov 25, 2019 at 11:22 PM vignesh C wrote: > > On Sun, Nov 24, 2019 at 5:06 PM Pavel Stehule wrote: > > > > > > > > ne 24. 11. 2019 v 11:25 odesílatel vignesh C napsal: > >> > >> On Sat, Nov 23, 2019 at 4:42 PM Amit Kapila > >> wrote: > >> > > >> > On Fri, Nov 22, 2019 at 3:16 PM vignesh C wrote: > >> > > > >> > > Thanks for fixing the comments. The changes looks fine to me. I have > >> > > fixed the first comment, attached patch has the changes for the same. > >> > > > >> > > >> > Few comments: > >> > -- > >> > 1. There is a lot of duplicate code in this test. Basically, almost > >> > the same code is used once to test Drop Database command and dropdb. > >> > I think we can create a few sub-functions to avoid duplicate code, but > >> > do we really need to test the same thing once via command and then by > >> > dropdb utility? I don't think it is required, but even if we want to > >> > do so, then I am not sure if this is the right test script. I suggest > >> > removing the command related test. > >> > > >> > >> Pavel: What is your opinion on this? > > > > > > I have not any problem with this reduction. > > > > We will see in future years what is benefit of this test. > > > > Fixed, removed dropdb utility test. > Hmm, you have done the opposite of what I have asked. This test file is in rc/bin/scripts/t/ where we generally keep the tests for utilities. So, retaining the dropdb utility test in that file seems more sensible. +ok( TestLib::pump_until( + $killme, $psql_timeout, \$killme_stdout, qr/[[:digit:]]+[\r\n]$/m), + 'acquired pid'); How about changing 'acquired pid' to 'acquired pid for SIGTERM'? > >> > > >> > >> I have verified by running perltidy. > >> I think we don't need to run perltidy on the existing code. So, I am not sure if it is a good idea to run it for file 013_crash_restart.pl as it changes some existing code formatting. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Tue, Nov 26, 2019 at 11:19 AM Amit Khandekar wrote: > > On Tue, 26 Nov 2019 at 10:49, Amit Kapila wrote: > > > > > > So, what is the next step here? How about if we somehow check whether > > the file exists before doing unlink, say by using stat? > But the thing is, the behaviour is so much in a grey area, that we > cannot reliably say for instance that when stat() says there is no > such file, there is indeed no such file, > Why so? > and if we re-create the same > file when it is still open, it is always going to open a new file, > etc. > Yeah, or maybe even if we don't create with the same name, there is always be some dangling file which again doesn't sound like a good thing. > > If that doesn't work, I think we might need to go in the direction of > > tracking > > file handles in some way, so that they can be closed during an abort. > Yeah, that is one way. I am still working on different approaches. > WIll get back with proposals. > Fair enough. See, if you can also consider an approach that is local to ReorderBuffer module wherein we can track those handles in ReorderBufferTxn or some other place local to that module. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Implementing Incremental View Maintenance
Hi,

Attached is the latest patch (v8) to add support for Incremental View Maintenance (IVM). This patch adds OUTER join support on top of the patch (v7) submitted last week in the following post.

On Fri, 22 Nov 2019 15:29:45 +0900 (JST) Tatsuo Ishii wrote:
> Up to now, IVM supports materialized views using:
>
> - Inner joins
> - Some aggregate functions (count, sum, min, max, avg)
> - GROUP BY
> - Self joins
>
> With the latest patch now IVM supports subqueries in addition to the above.
>
> Known limitations are listed here:
> https://github.com/sraoss/pgsql-ivm/issues
>
> See more details at:
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

* About outer join support:

In the outer-join case, when a table is modified, in addition to the deltas that occur in the inner-join case, we also need to handle deletion or insertion of dangling tuples, that is, null-extended tuples generated when a join condition isn't met.

[Example]
-
-- Create base tables and an outer join view
CREATE TABLE r(i int);
CREATE TABLE s(j int);
INSERT INTO r VALUES (1);
CREATE INCREMENTAL MATERIALIZED VIEW mv
  AS SELECT * FROM r LEFT JOIN s ON r.i=s.j;
SELECT * FROM mv;
 i | j
---+---
 1 |
(1 row)

-- After an insertion to a base table ...
INSERT INTO s VALUES (1);
-- (1,1) is inserted and (1,null) is deleted from the view.
SELECT * FROM mv;
 i | j
---+---
 1 | 1
(1 row)
-

Our implementation is based on the algorithm of Larson & Zhou (2007) [1]. Before view maintenance, the view definition query's jointree is analysed to build a "view maintenance graph". This graph represents which tuples in the view are affected when a base table is modified. Specifically, tuples which are not null-extended on the modified table (that is, tuples generated by joins with the modified table) are directly affected. The delta for such effects is calculated similarly to the inner-join case.

On the other hand, dangling tuples generated by anti-joins with directly affected tuples can be indirectly affected. This means that we may need to delete dangling tuples when tuples are inserted into a table, as well as insert dangling tuples when tuples are deleted from a table.

[1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007)
https://ieeexplore.ieee.org/document/4221654

Although the original paper assumes that every base table and view has a unique key and that tuple duplicates are disallowed, we allow duplicates. If a view has tuple duplicates, we have to determine how many of each dangling tuple to insert into the view when tuples in a table are deleted. For this purpose, we count the number of each tuple which constitutes a deleted tuple. These counts are stored as a JSONB object in the delta table, and we use this information to maintain outer-join views. Also, we support outer self-joins, which are not assumed in the original paper.

* Restrictions

Currently, we have the following restrictions:

- outer join view's targetlist must contain attributes used in join conditions
- outer join view's targetlist cannot contain non-strict functions
- outer join supports only simple equijoin
- outer join view's WHERE clause cannot contain non null-rejecting predicates
- aggregate is not supported with outer join
- subquery (including EXISTS) is not supported with outer join

Regression tests for all patterns of 3-way outer joins are added.

I also reordered the IVM-related functions in matview.c so that related functions are located close together, and added more comments.
Regards, Yugo Nagata -- Yugo Nagata IVM_v8.patch.gz Description: application/gzip
Re: Implementing Incremental View Maintenance
Note that this is the last patch in the series of IVM patches: we would now like to focus on brushing up the patches, rather than adding new SQL support to IVM, so that the patch can be merged into PostgreSQL 13 (hopefully). Reviews and comments on the patch are very welcome. BTW, the SGML docs in the patch are very poor at this point. I am going to add more descriptions to the docs. > Hi, > > Attached is the latest patch (v8) to add support for Incremental View Maintenance (IVM). This patch adds OUTER join support on top of the patch (v7) submitted last week in the following post. > > On Fri, 22 Nov 2019 15:29:45 +0900 (JST) > Tatsuo Ishii wrote: >> Up to now, IVM supports materialized views using: >> >> - Inner joins >> - Some aggregate functions (count, sum, min, max, avg) >> - GROUP BY >> - Self joins >> >> With the latest patch now IVM supports subqueries in addition to the above. >> >> Known limitations are listed here: >> >> https://github.com/sraoss/pgsql-ivm/issues >> >> See more details at: >> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance > > * About outer join support: > > In the outer-join case, when a table is modified, in addition to the deltas that occur in the inner-join case, we also need to handle deletion or insertion of dangling tuples, that is, null-extended tuples generated when a join condition isn't met. > > [Example] > - > -- Create base tables and an outer join view > CREATE TABLE r(i int); > CREATE TABLE s(j int); > INSERT INTO r VALUES (1); > CREATE INCREMENTAL MATERIALIZED VIEW mv > AS SELECT * FROM r LEFT JOIN s ON r.i=s.j; > SELECT * FROM mv; > i | j > ---+--- > 1 | > (1 row) > > -- After an insertion to a base table ... > INSERT INTO s VALUES (1); > -- (1,1) is inserted and (1,null) is deleted from the view. > SELECT * FROM mv; > i | j > ---+--- > 1 | 1 > (1 row) > - > > Our implementation is based on the algorithm of Larson & Zhou (2007) [1]. Before view maintenance, the view definition query's jointree is analysed to build a "view maintenance graph". This graph represents which tuples in the view are affected when a base table is modified. Specifically, tuples which are not null-extended on the modified table (that is, tuples generated by joins with the modified table) are directly affected. The delta for such effects is calculated similarly to the inner-join case. > > On the other hand, dangling tuples generated by anti-joins with directly affected tuples can be indirectly affected. This means that we may need to delete dangling tuples when tuples are inserted into a table, as well as insert dangling tuples when tuples are deleted from a table. > > [1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007) > https://ieeexplore.ieee.org/document/4221654 > > Although the original paper assumes that every base table and view has a unique key and that tuple duplicates are disallowed, we allow duplicates. If a view has tuple duplicates, we have to determine how many of each dangling tuple to insert into the view when tuples in a table are deleted. For this purpose, we count the number of each tuple which constitutes a deleted tuple. These counts are stored as a JSONB object in the delta table, and we use this information to maintain outer-join views. Also, we support outer self-joins, which are not assumed in the original paper.
> > * Restrictions > > Currently, we have the following restrictions: > > - outer join view's targetlist must contain attributes used in join conditions > - outer join view's targetlist cannot contain non-strict functions > - outer join supports only simple equijoin > - outer join view's WHERE clause cannot contain non null-rejecting predicates > - aggregate is not supported with outer join > - subquery (including EXISTS) is not supported with outer join > > Regression tests for all patterns of 3-way outer joins are added. > > I also reordered the IVM-related functions in matview.c so that related functions are located close together, and added more comments. > > Regards, > Yugo Nagata > > -- > Yugo Nagata