Re: BufFileRead() error signalling
On Wed, May 27, 2020 at 11:59:59AM -0400, Alvaro Herrera wrote: > In the discussion that led to 811b6e36a9e2 I did suggest to use "read > only M of N" instead, but there wasn't enough discussion on that fine > point so we settled on what you now call prevalent without a lot of > support specifically on that. I guess it was enough of an improvement > over what was there. But like Robert, I too prefer the wording that > includes "only" and "bytes" over the wording that doesn't. > > I'll let it be known that from a translator's point of view, it's a > ten-seconds job to update a fuzzy string from not including "only" and > "bytes" to one that does. So let's not make that an argument for not > changing. Using "only" would be fine by me, though I tend to prefer the existing one. Now I think that we should avoid "bytes" to not have to worry about pluralization of error messages. This has been a concern in the past (see e5d11b9 and the likes). -- Michael signature.asc Description: PGP signature
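To make the wording question concrete, here is a minimal sketch of the two candidate phrasings at a BufFileRead() call site. The function name, message text, and variables below are assumptions for illustration, not text from any posted patch:

#include "postgres.h"
#include "storage/buffile.h"

static void
read_chunk(BufFile *file, void *buf, size_t len)
{
    size_t      nread = BufFileRead(file, buf, len);

    if (nread != len)
    {
        /* wording without "only" or "bytes", sidestepping plural forms */
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from temporary file: read %zu of %zu",
                        nread, len)));
    }

    /*
     * The alternative wording favored by some reviewers:
     *
     *   "could not read from temporary file: read only %zu of %zu bytes"
     *
     * "bytes" is what raises the pluralization concern; errmsg_plural() or a
     * unit-free message avoids it.
     */
}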
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, May 26, 2020 at 12:00 PM Amit Kapila wrote: > > On Fri, May 22, 2020 at 6:22 PM Dilip Kumar wrote: > > > > On Mon, May 18, 2020 at 4:10 PM Amit Kapila wrote: > > > > > > On Sun, May 17, 2020 at 12:41 PM Dilip Kumar > > > wrote: > > > > > > > > On Fri, May 15, 2020 at 4:04 PM Amit Kapila > > > > wrote: > > > > > > > > > > > > > > > Review comments: > > > > > -- > > > > > 1. > > > > > @@ -1762,10 +1952,16 @@ ReorderBufferCommit(ReorderBuffer *rb, > > > > > TransactionId xid, > > > > > } > > > > > > > > > > case REORDER_BUFFER_CHANGE_MESSAGE: > > > > > - rb->message(rb, txn, change->lsn, true, > > > > > - change->data.msg.prefix, > > > > > - change->data.msg.message_size, > > > > > - change->data.msg.message); > > > > > + if (streaming) > > > > > + rb->stream_message(rb, txn, change->lsn, true, > > > > > +change->data.msg.prefix, > > > > > +change->data.msg.message_size, > > > > > +change->data.msg.message); > > > > > + else > > > > > + rb->message(rb, txn, change->lsn, true, > > > > > +change->data.msg.prefix, > > > > > +change->data.msg.message_size, > > > > > +change->data.msg.message); > > > > > > > > > > Don't we need to set any_data_sent flag while streaming messages as we > > > > > do for other types of changes? > > > > > > > > I think any_data_sent, was added to avoid sending abort to the > > > > subscriber if we haven't sent any data, but this is not complete as > > > > the output plugin can also take the decision not to send. So I think > > > > this should not be done as part of this patch and can be done > > > > separately. I think there is already a thread for handling the > > > > same[1] > > > > > > > > > > Hmm, but prior to this patch, we never use to send (empty) aborts but > > > now that will be possible. It is probably okay to deal that with > > > another patch mentioned by you but I felt at least any_data_sent will > > > work for some cases. OTOH, it appears to be half-baked solution, so > > > we should probably refrain from adding it. BTW, how do the pgoutput > > > plugin deal with it? I see that apply_handle_stream_abort will > > > unconditionally try to unlink the file and it will probably fail. > > > Have you tested this scenario after your latest changes? > > > > Yeah, I see, I think this is a problem, but this exists without my > > latest change as well, if pgoutput ignore some changes because it is > > not published then we will see a similar error. Shall we handle the > > ENOENT error case from unlink? > Isn't this problem only for subxact file as we anyway create changes > file as part of start stream message which should have come after > abort? If so, can't we detect whether subxact file exists probably by > using nsubxacts or something like that? Can you please once try to > reproduce this scenario to ensure that we are not missing anything? I have tested this, as of now, by default we create both changes and subxact files irrespective of whether we get any subtransactions or not. Maybe this could be optimized that only if we have any subxact then only create that file otherwise not? What's your opinion on the same. > > > > > 8. > > > > > @@ -2295,6 +2677,13 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer > > > > > *rb, TransactionId xid, > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true); > > > > > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > + > > > > > + /* > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes too > > > > > + * if one of its children has. 
> > > > > + */ > > > > > + if (txn->toptxn != NULL) > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > } > > > > > > > > > > Why are we marking top transaction here? > > > > > > > > We need to mark top transaction to decide whether to build tuplecid > > > > hash or not. In non-streaming mode, we are only sending during the > > > > commit time, and during commit time we know whether the top > > > > transaction has any catalog changes or not based on the invalidation > > > > message so we are marking the top transaction there in DecodeCommit. > > > > Since here we are not waiting till commit so we need to mark the top > > > > transaction as soon as we mark any of its child transactions. > > > > > > > > > > But how does it help? We use this flag (via > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is > > > anyway done in DecodeCommit and that too after setting this flag for > > > the top transaction if required. So, how will it help in setting it > > > while processing for subxid. Also, even if we have to do it won't it > > > add the xid needlessly in builder->committed.xip array? > > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether > > to build the tuplecid hash or not based on whether it has catalog > > changes or not. > > > > Okay, but you haven't answered the second part of the question: "won't > it add
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 09:07:04AM +0300, Vladimir Sitnikov wrote: > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > bad by 19 May 14:00 UTC, > so the regression was introduced somewhere in-between. > > Does that ring any bells? It does, thanks! This would map with 1d374302 or 850196b6, which reworked this area of the code, so it seems like we are not quite done with this work yet. Do you still see the problem as of 55ca50d (today's latest HEAD)? Also, just wondering... If I use more or less the same commands as your travis job, I should be able to reproduce the problem with a fresh JDBC repository, right? Or do you have a sub-portion of your regression tests that can be run that easily? -- Michael
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, May 26, 2020 at 3:04 PM Amit Kapila wrote: > > On Mon, May 25, 2020 at 8:07 PM Dilip Kumar wrote: > > > > On Fri, May 22, 2020 at 11:54 AM Amit Kapila > > wrote: > > > > > > 4. > > > + * XXX Do we need to allocate it in TopMemoryContext? > > > + */ > > > +static void > > > +subxact_info_add(TransactionId xid) > > > { > > > .. > > > > > > For this and other places in a patch like in function > > > stream_open_file(), instead of using TopMemoryContext, can we consider > > > using a new memory context LogicalStreamingContext or something like > > > that. We can create LogicalStreamingContext under TopMemoryContext. I > > > don't see any need of using TopMemoryContext here. > > > > But, when we will delete/reset the LogicalStreamingContext? > > > > Why can't we reset it at each stream stop message? > > because > > we are planning to keep this memory until the worker is alive so that > > supposed to be the top memory context. > > > > Which part of allocation do we want to keep till the worker is alive? static TransactionId *xids = NULL; we need to keep till worker life space. > Why we need memory-related to subxacts till the worker is alive? As > we have now, after reading subxact info (subxact_info_read), we need > to ensure that it is freed after its usage due to which we need to > remember and perform pfree at various places. > > I think we should once see the possibility that such that we could > switch to this new context in start stream message and reset it in > stop stream message. That might help in avoiding > MemoryContextSwitchTo TopMemoryContext at various places. Ok, I understand, I think subxacts can be allocated in new LogicalStreamingContext which we can reset at the stream stop. How about xids? shall we create another context that will stay until the worker lifespan? > > If we create any other context > > with the same life span as TopMemoryContext then what is the point? >> > > It is helpful for debugging. It is recommended that we don't use the > top memory context unless it is really required. Read about it in > src/backend/utils/mmgr/README. I see. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
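To make the proposed arrangement concrete, here is a minimal sketch; the context name, helper names, and reset points are assumed for illustration rather than taken from the patch. Per-stream data goes into a context created once under TopMemoryContext and is reset in one call at stream stop, while worker-lifetime state such as the xids array would live elsewhere (see the ApplyContext suggestion later in this thread).

#include "postgres.h"
#include "utils/memutils.h"

static MemoryContext LogicalStreamingContext = NULL;

/* called when a "stream start" message arrives */
static void
stream_start_setup(void)
{
    if (LogicalStreamingContext == NULL)
        LogicalStreamingContext = AllocSetContextCreate(TopMemoryContext,
                                                        "LogicalStreamingContext",
                                                        ALLOCSET_DEFAULT_SIZES);
}

/* per-stream data (for example the subxact array) is allocated here ... */
static void *
stream_alloc(Size size)
{
    return MemoryContextAlloc(LogicalStreamingContext, size);
}

/* ... and released wholesale at "stream stop", with no pfree() bookkeeping */
static void
stream_stop_cleanup(void)
{
    MemoryContextReset(LogicalStreamingContext);
}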
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 16:22:33 +0900, Michael Paquier wrote in > On Thu, May 28, 2020 at 09:07:04AM +0300, Vladimir Sitnikov wrote: > > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > > bad by 19 May 14:00 UTC, > > so the regression was introduced somewhere in-between. > > > > Does that ring any bells? > > It does, thanks! This would map with 1d374302 or 850196b6 that > reworked this area of the code, so it seems like we are not quite done > with this work yet. Do you still see the problem as of 55ca50d > (today's latest HEAD)? I think that's not the case. I think I cause this crash with the HEAD. I'll post a fix soon. > Also, just wondering.. If I use more or less the same commands as > your travis job I should be able to reproduce the problem with a fresh > JDBC repository, right? Or do you a sub-portion of your regression > tests to run that easily? Pgjdbc seems initiating physical replication on a logical replication session. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: WAL reader APIs and WAL segment open/close callbacks
On Tue, May 26, 2020 at 08:49:44AM +0900, Michael Paquier wrote: > NB: I found some incorrect comments as per the attached: > s/open_segment/segment_open/ > s/close_segment/segment_close/ And fixed this one with f93bb0c. -- Michael
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, May 28, 2020 at 04:32:22PM +0900, Kyotaro Horiguchi wrote: > I think that's not the case. I think I cause this crash with the > HEAD. I'll post a fix soon. > > Pgjdbc seems initiating physical replication on a logical replication > session. Let me see... Indeed: $ psql "replication=database dbname=postgres" =# START_REPLICATION SLOT test_slot PHYSICAL 0/0; [one later] (gdb) bt #0 XLogSendPhysical () at walsender.c:2762 #1 0x55d5f7803318 in WalSndLoop (send_data=0x55d5f78039c7 ) at walsender.c:2300 #2 0x55d5f7800d70 in StartReplication (cmd=0x55d5f919bc60) at walsender.c:750 #3 0x55d5f78025ad in exec_replication_command (cmd_string=0x55d5f9119a80 "START_REPLICATION SLOT test_slot PHYSICAL 0/0;") at walsender.c:1643 #4 0x55d5f786a7ea in PostgresMain (argc=1, argv=0x55d5f91472c8, dbname=0x55d5f9147210 "ioltas", username=0x55d5f91471f0 "ioltas") at postgres.c:4311 #5 0x55d5f77b4e9c in BackendRun (port=0x55d5f913dcd0) at postmaster.c:4523 #6 0x55d5f77b4606 in BackendStartup (port=0x55d5f913dcd0) at postmaster.c:4215 #7 0x55d5f77b08ad in ServerLoop () at postmaster.c:1727 #8 0x55d5f77b00fc in PostmasterMain (argc=3, argv=0x55d5f9113260) at postmaster.c:1400 #9 0x55d5f76b3736 in main (argc=3, argv=0x55d5f9113260) at main.c:210 No need for the JDBC test suite then. -- Michael signature.asc Description: PGP signature
Re: Resolving the python 2 -> python 3 mess
On Wed, Feb 19, 2020 at 08:42:36PM +0100, Peter Eisentraut wrote: > I think there should just > be an option "plpython is: {2|3|don't build it at all}". Then packagers can > match this to what their plan for /usr/bin/python* is -- which appears to be > different everywhere. Today, we do not give packagers this sort of discretion over SQL-level behavior. We formerly had --disable-integer-datetimes, but I view that as the lesser of two evils, the greater of which would have been to force dump/reload of terabyte-scale clusters. We should continue to follow today's principle, which entails not inviting packagers to align the nature of "LANGUAGE plpythonu" with the nature of /usr/bin/python. Users wouldn't benefit from alignment. Moreover, it has long been conventional for the host to be a VM dedicated to PostgreSQL, in which case users of the DBMS aren't users of its host. I don't have an opinion on which version "LANGUAGE plpythonu" should denote in PostgreSQL v13, but the version should be constant across v13 configurations. (If some builds make it unavailable or make it an error-only stub, that is no problem.) I reviewed all code: On Thu, Feb 27, 2020 at 04:11:05PM -0500, Tom Lane wrote: > --- a/configure.in > +++ b/configure.in > @@ -766,6 +766,9 @@ PGAC_ARG_BOOL(with, python, no, [build Python modules > (PL/Python)]) > AC_MSG_RESULT([$with_python]) > AC_SUBST(with_python) > > +PGAC_ARG_BOOL(with, python2-stub, no, [build Python 2 compatibility stub]) > +AC_SUBST(with_python2_stub) > + > # > # GSSAPI > # > @@ -1042,6 +1045,12 @@ fi > if test "$with_python" = yes; then >PGAC_PATH_PYTHON >PGAC_CHECK_PYTHON_EMBED_SETUP > + # Disable building Python 2 stub if primary version isn't Python 3 > + if test "$python_majorversion" -lt 3; then > +with_python2_stub=no > + fi Standard PostgreSQL practice would be to AC_MSG_ERROR in response to the infeasible option, not to ignore the option. > --- /dev/null > +++ b/src/pl/plpython/sql/plpython_stub.sql > +call convert_python3_all(); Building with PYTHON=python3 --with-python2-stub, this fails on my RHEL 7.8 installation. I had installed python34-tools, which provides /usr/bin/2to3-3. I had not installed python-tools (v2.7.5), which provides /usr/bin/2to3. Making a 2to3 -> 2to3-3 symlink let the test pass. Requiring such a step to pass tests may or may not be fine; what do you think? (I am attaching regression.diffs; to my knowledge, it's not interesting.) > --- a/src/tools/msvc/config_default.pl > +++ b/src/tools/msvc/config_default.pl > @@ -16,6 +16,7 @@ our $config = { > tcl => undef,# --with-tcl= > perl => undef,# --with-perl= > python=> undef,# --with-python= > + python2_stub => 1, # --with-python2-stub (ignored unless Python is > v3) This default should not depend on whether one uses the MSVC build system or uses the GNU make build system. > --- /dev/null > +++ b/src/pl/plpython/convert_python3--1.0.sql > +create procedure convert_python3_all(tool text default '2to3', > + options text default '') > +language plpython3u as $$ > +import re, subprocess, tempfile > + > +# pattern to extract just the function header from pg_get_functiondef result > +aspat = re.compile("^(.*?\nAS )", re.DOTALL) This fails on: create function convert1(" AS " int) returns int AS $$return 123l$$ language plpython2u immutable; That's not up to project standard, but I'm proceeding to ignore this since the subject is an untrusted language and ~nobody uses such argument names. 
diff -U3 /home/nm/src/pg/vp/3stub/src/pl/plpython/expected/python3/plpython_stub.out /home/nm/src/pg/vp/3stub/src/pl/plpython/results/python3/plpython_stub.out --- /home/nm/src/pg/vp/3stub/src/pl/plpython/expected/python3/plpython_stub.out 2020-05-28 01:01:35.445842589 -0700 +++ /home/nm/src/pg/vp/3stub/src/pl/plpython/results/python3/plpython_stub.out 2020-05-28 01:01:37.765861474 -0700 @@ -50,22 +50,33 @@ call convert_python3_all(); NOTICE: converting function convert1() -NOTICE: converting function convert2() +ERROR: subprocess.CalledProcessError: Command '2to3 --no-diffs -w /tmp/tmpvmbr920w/temp.py' returned non-zero exit status 127. +CONTEXT: Traceback (most recent call last): + PL/Python function "convert_python3_all", line 42, in +subprocess.check_call(tool + " " + options + " --no-diffs -w " + tmpdirname + "/temp.py", shell=True) + PL/Python function "convert_python3_all", line 310, in check_call +PL/Python procedure "convert_python3_all" \sf convert1() CREATE OR REPLACE FUNCTION public.convert1() RETURNS integer - LANGUAGE plpython3u + LANGUAGE plpython2u IMMUTABLE -AS $function$return 123$function$ +AS $function$return 123l$function$ \sf convert2() CREATE OR REPLACE FUNCTION public.convert2() RETURNS integer - LANGUAGE plpython3u + LANGUAGE plpythonu IMMUTABLE -AS $function$return 123$function$ +AS $function$return 123l$function$ -- clean up
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 09:07:04 +0300, Vladimir Sitnikov wrote in > Pgjdbc test suite identified a SIGSEGV in the recent HEAD builds of > PostgreSQL, Ubuntu 14.04.5 LTS > > Here's a call stack: > https://travis-ci.org/github/pgjdbc/pgjdbc/jobs/691794110#L7484 > The crash is consistent, and it reproduces 100% of the cases so far. > > The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became > bad by 19 May 14:00 UTC, > so the regression was introduced somewhere in-between. > > Does that ring any bells? Thanks for the report. It is surely a bug since the server crashes, on the other hand Pgjdbc seems doing bad, too. It seems to me that that crash means Pgjdbc is initiating a logical replication connection to start physical replication. > In case you wonder: > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 XLogSendPhysical () at walsender.c:2762 > 2762 if (!WALRead(xlogreader, > (gdb) #0 XLogSendPhysical () at walsender.c:2762 > SendRqstPtr = 133473640 > startptr = 133473240 > endptr = 133473640 > nbytes = 400 > segno = 1 > errinfo = {wre_errno = 988942240, wre_off = 2, wre_req = -1, > wre_read = -1, wre_seg = {ws_file = 4714224, > ws_segno = 140729887364688, ws_tli = 0}} > __func__ = "XLogSendPhysical" I see the probably the same symptom by the following steps with the current HEAD. psql 'host=/tmp replication=database' =# START_REPLICATION 0/1; Physical replication is not assumed to be started on a logical replication connection. The attached would fix that. The patch adds two tests. One for this case and another for the reverse. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 1f94c98f7459ca8a4942246325815a3e0a91caa4 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 28 May 2020 15:55:30 +0900 Subject: [PATCH v1] Fix crash when starting physical replication on logical connection It is an illegal operation to try starting physical replication on a logical replication session. We should properly warn the client instead of crashing. 
--- src/backend/replication/walsender.c | 5 + src/test/recovery/t/001_stream_rep.pl | 14 +++--- src/test/recovery/t/006_logical_decoding.pl | 10 +- 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 86847cbb54..7b79c75311 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -589,6 +589,11 @@ StartReplication(StartReplicationCmd *cmd) StringInfoData buf; XLogRecPtr FlushPtr; + if (am_db_walsender) + ereport(ERROR, +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot initiate physical replication on a logical replication connection"))); + if (ThisTimeLineID == 0) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 0c316c1808..0b69b7d8d1 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 35; +use Test::More tests => 36; # Initialize master node my $node_master = get_new_node('master'); @@ -18,6 +18,14 @@ my $backup_name = 'my_backup'; # Take backup $node_master->backup($backup_name); +# Check if logical-rep session properly refuses to start physical-rep +my ($ret, $stdout, $stderr) = + $node_master->psql('template1', + qq[START_REPLICATION PHYSICAL 0/1], + replication=>'database'); +ok($stderr =~ /ERROR: cannot initiate physical replication on a logical replication connection/, + "check if physical replication is rejected on logical-rep session"); + # Create streaming standby linking to master my $node_standby_1 = get_new_node('standby_1'); $node_standby_1->init_from_backup($node_master, $backup_name, @@ -94,7 +102,7 @@ sub test_target_session_attrs # The client used for the connection does not matter, only the backend # point does. - my ($ret, $stdout, $stderr) = + ($ret, $stdout, $stderr) = $node1->psql('postgres', 'SHOW port;', extra_params => [ '-d', $connstr ]); is( $status == $ret && $stdout eq $target_node->port, @@ -136,7 +144,7 @@ my $connstr_rep= "$connstr_common replication=1"; my $connstr_db = "$connstr_common replication=database dbname=postgres"; # Test SHOW ALL -my ($ret, $stdout, $stderr) = $node_master->psql( +($ret, $stdout, $stderr) = $node_master->psql( 'postgres', 'SHOW ALL;', on_error_die => 1, extra_params => [ '-d', $connstr_rep ]); diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl index ee05535b1c..eefde8c3f1 100644 --- a/src/test/recovery/t/006_logical_decoding.pl +++ b/src/test/recovery/t/006_logical_decoding.pl @@ -7,7 +7
Re: Fix compilation failure against LLVM 11
On Wed, May 27, 2020 at 07:49:45AM -0700, Jesse Zhang wrote: > For bystanders: Andres and I argued for "fixing this sooner and > backpatch" and Michael suggested "wait longer and whack all moles". We > have waited, and there seems to be only one mole (finding all dead > unbroken "include"s was left as an exercise for the reader). Have we > come to an agreement on this? If Andres could take care of this issue as he sees fit, that's OK for me. I could look at that and play with LLVM builds. Now, I am not really familiar with it, so it would take me some time, but we are all here to learn :) Please note that I would still wait for their next GA release to plug any extra holes at the same time. @Jesse: or is this change actually part of the upcoming 10.0.1? -- Michael
Re: max_slot_wal_keep_size comment in postgresql.conf
At Thu, 28 May 2020 15:44:26 +0900, Michael Paquier wrote in > On Wed, May 27, 2020 at 04:35:51PM +0900, Michael Paquier wrote: > > On Wed, May 27, 2020 at 04:21:59PM +0900, Kyotaro Horiguchi wrote: > > > I don't oppose to full-spelling. How about the attached? > > > > No problem from me. > > And applied this one as of 55ca50d. Thanks! -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Kyotaro>It seems to me that that crash means Pgjdbc is initiating a logical Kyotaro>replication connection to start physical replication. Well, it used to work previously, so it might be a breaking change from the client/application point of view. Vladimir
Re: Inlining of couple of functions in pl_exec.c improves performance
Hi st 27. 5. 2020 v 13:31 odesílatel Amit Khandekar napsal: > On Tue, 26 May 2020 at 09:06, Amit Khandekar > wrote: > > > > On Sat, 23 May 2020 at 23:24, Pavel Stehule > wrote: > > > > > >FOR counter IN 1..180 LOOP > > > id = 0; id = 0; id1 = 0; > > > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > > > id3 = 0; id = 0; id = 0; id1 = 0; > > > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > > > id3 = 0; > > >END LOOP; > > > > > > This is not too much typical PLpgSQL code. All expressions are not > parametrized - so this test is little bit obscure. > > > > > > Last strange performance plpgsql benchmark did calculation of pi > value. It does something real > > > > Yeah, basically I wanted to have many statements, and that too with > > many assignments where casts are not required. Let me check if I can > > come up with a real-enough testcase. Thanks. > > create table tab (id int[]); > insert into tab select array((select ((random() * 10)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 60)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 100)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 10)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 60)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 100)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 10)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 60)::bigint) id > from generate_series(1, 3) order by 1)); > insert into tab select array((select ((random() * 100)::bigint) id > from generate_series(1, 3) order by 1)); > > > -- Return how much two consecutive array elements are apart from each > other, on average; i.e. how much the numbers are spaced out. > -- Input is an ordered array of integers. > CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$ > DECLARE > diff int = 0; > num int; > prevnum int = 1; > BEGIN > FOREACH num IN ARRAY $1 > LOOP > diff = diff + num - prevnum; > prevnum = num; > END LOOP; > RETURN diff/array_length($1, 1); > END; > $$ LANGUAGE plpgsql; > > explain analyze select avg_space(id) from tab; > Like earlier figures, these are execution times in milliseconds, taken > from explain-analyze. > ARM VM: >HEAD : 49.8 >patch 0001+0002 : 47.8 => 4.2% >patch 0001+0002+0003 : 42.9 => 16.1% > x86 VM: >HEAD : 32.8 >patch 0001+0002 : 32.7 => 0% >patch 0001+0002+0003 : 28.0 => 17.1% > I tested these patches on my notebook - Lenovo T520 (x64) - on pi calculation CREATE OR REPLACE FUNCTION pi_est_1(n int) RETURNS numeric AS $$ DECLARE accum double precision DEFAULT 1.0; c1 double precision DEFAULT 2.0; c2 double precision DEFAULT 1.0; BEGIN FOR i IN 1..n LOOP accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0))); c1 := c1 + 2.0; c2 := c2 + 2.0; END LOOP; RETURN accum * 2.0; END; $$ LANGUAGE plpgsql; and I see about 3-5% of speedup extra simply test shows do $$ declare i int default 0; begin while i < 1 loop i := i + 1; end loop; raise notice 'i=%', i;end $$; 2% speedup I don't see 17% anywhere, but 3-5% is not bad. patch 0001 has sense and can help with code structure patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct. 
patch 0003 has sense too. Tested on Fedora 32 with gcc 10.1.1 and the -O2 option. Regards Pavel
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 12:46 PM Dilip Kumar wrote: > > On Tue, May 26, 2020 at 12:00 PM Amit Kapila wrote: > > > > Isn't this problem only for subxact file as we anyway create changes > > file as part of start stream message which should have come after > > abort? If so, can't we detect whether subxact file exists probably by > > using nsubxacts or something like that? Can you please once try to > > reproduce this scenario to ensure that we are not missing anything? > > I have tested this, as of now, by default we create both changes and > subxact files irrespective of whether we get any subtransactions or > not. Maybe this could be optimized that only if we have any subxact > then only create that file otherwise not? What's your opinion on the > same. > Yeah, that makes sense. > > > > > > 8. > > > > > > @@ -2295,6 +2677,13 @@ > > > > > > ReorderBufferXidSetCatalogChanges(ReorderBuffer > > > > > > *rb, TransactionId xid, > > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true); > > > > > > > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > + > > > > > > + /* > > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes too > > > > > > + * if one of its children has. > > > > > > + */ > > > > > > + if (txn->toptxn != NULL) > > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > } > > > > > > > > > > > > Why are we marking top transaction here? > > > > > > > > > > We need to mark top transaction to decide whether to build tuplecid > > > > > hash or not. In non-streaming mode, we are only sending during the > > > > > commit time, and during commit time we know whether the top > > > > > transaction has any catalog changes or not based on the invalidation > > > > > message so we are marking the top transaction there in DecodeCommit. > > > > > Since here we are not waiting till commit so we need to mark the top > > > > > transaction as soon as we mark any of its child transactions. > > > > > > > > > > > > > But how does it help? We use this flag (via > > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is > > > > anyway done in DecodeCommit and that too after setting this flag for > > > > the top transaction if required. So, how will it help in setting it > > > > while processing for subxid. Also, even if we have to do it won't it > > > > add the xid needlessly in builder->committed.xip array? > > > > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether > > > to build the tuplecid hash or not based on whether it has catalog > > > changes or not. > > > > > > > Okay, but you haven't answered the second part of the question: "won't > > it add the xid of top transaction needlessly in builder->committed.xip > > array, see function SnapBuildCommitTxn?" IIUC, this can happen > > without patch as well because DecodeCommit also sets the flags just > > based on invalidation messages irrespective of whether the messages > > are generated by top transaction or not, is that right? > > Yes, with or without the patch it always adds the topxid. I think > purpose for doing this with/without patch is not for the snapshot > instead we are marking the top itself that some of its subtxn has the > catalog changes so that while building the tuplecid has we can know > whether to build the hash or not. 
But, having said that I feel in > ReorderBufferBuildTupleCidHash why do we need these two checks > if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids)) > return; > > I mean it should be enough to just have the check, because if we have > added something to the tuplecids then catalog changes must be there > because that time we are setting the catalog changes to true. > > if (dlist_is_empty(&txn->tuplecids)) > return; > > I think in the base code there are multiple things going on > 1. If we get new CID we always set the catalog change in that > transaction but add the tuplecids in the top transaction. So > basically, top transaction is so far not marked with catalog changes > but it has tuplecids. > 2. Now, in DecodeCommit the top xid will be marked that it has catalog > changes based on the invalidation messages. > I don't think it is advisable to remove that check from base code unless we have a strong reason for doing so. I think here you can write better comments about why you are marking the flag for top transaction and remove TOCHECK from the comment. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
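If the subxact file becomes optional as agreed above, the cleanup path has to tolerate its absence, which is essentially the unlink()/ENOENT question raised earlier in this thread. Here is a sketch of that tolerant cleanup; the function name is hypothetical and the path construction is left out:

#include "postgres.h"

#include <errno.h>
#include <unistd.h>

static void
cleanup_subxact_file(const char *path)
{
    /*
     * Tolerate a file that was never created (for example, an abort that
     * arrives before any subxact info was spilled); only complain about
     * real failures.
     */
    if (unlink(path) < 0 && errno != ENOENT)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not remove file \"%s\": %m", path)));
}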
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Hello, Vladimir. At Thu, 28 May 2020 11:57:23 +0300, Vladimir Sitnikov wrote in > Kyotaro>It seems to me that that crash means Pgjdbc is initiating a logical > Kyotaro>replication connection to start physical replication. > > Well, it used to work previously, so it might be a breaking change from the > client/application point of view. Mmm. It is not the proper way to use physical replication and it's totally accidental that that worked (or even it might be a bug). The documentation is saying as the follows, as more-or-less the same for all versions since 9.4. https://www.postgresql.org/docs/13/protocol-replication.html > To initiate streaming replication, the frontend sends the replication > parameter in the startup message. A Boolean value of true (or on, yes, > 1) tells the backend to go into physical replication walsender mode, > wherein a small set of replication commands, shown below, can be > issued instead of SQL statements. > > Passing database as the value for the replication parameter instructs > the backend to go into logical replication walsender mode, connecting > to the database specified in the dbname parameter. In logical > replication walsender mode, the replication commands shown below as > well as normal SQL commands can be issued. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
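The distinction in the quoted documentation is easiest to see from the client side. Below is a minimal libpq sketch; the connection strings are placeholders and the command shown is only illustrative, not pgjdbc code:

#include <stdio.h>

#include "libpq-fe.h"

int
main(void)
{
    /* physical walsender mode: only the replication command set is accepted */
    PGconn     *phys = PQconnectdb("replication=true");

    /* logical walsender mode: replication commands plus normal SQL */
    PGconn     *logi = PQconnectdb("replication=database dbname=postgres");

    if (PQstatus(phys) != CONNECTION_OK || PQstatus(logi) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    /*
     * START_REPLICATION ... PHYSICAL belongs on the first connection;
     * issuing it on the second one is the case that crashed the walsender.
     */
    PQclear(PQexec(phys, "IDENTIFY_SYSTEM"));

    PQfinish(phys);
    PQfinish(logi);
    return 0;
}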
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Thu, May 28, 2020 at 1:36 PM amul sul wrote: > On Wed, May 27, 2020 at 12:53 PM Amit Langote wrote: >> Actually, if you declare the cursor without FOR SHARE/UPDATE, the case >> would fail even with traditional inheritance: >> >> drop table if exists p cascade; >> create table p (a int); >> create table c (check (a = 2)) inherits (p); >> insert into p values (1); >> insert into c values (2); >> begin; >> declare c cursor for select * from p where a = 1; >> fetch c; >> update p set a = a where current of c; >> ERROR: cursor "c" is not a simply updatable scan of table "c" >> ROLLBACK >> > > I am not sure I understood the point, you'll see the same error with > declarative > partitioning as well. My point is that if a table is not present in the cursor's plan, there is no way for CURRENT OF to access it. Giving an error in that case seems justified. OTOH, when the CURRENT OF implementation has RowMarks to look at, it avoids the error for traditional inheritance children due their inactive RowMarks being present in the cursor's PlannedStmt. I think that's only by accident though. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 12:57 PM Dilip Kumar wrote: > > On Tue, May 26, 2020 at 3:04 PM Amit Kapila wrote: > > > > Why we need memory-related to subxacts till the worker is alive? As > > we have now, after reading subxact info (subxact_info_read), we need > > to ensure that it is freed after its usage due to which we need to > > remember and perform pfree at various places. > > > > I think we should once see the possibility that such that we could > > switch to this new context in start stream message and reset it in > > stop stream message. That might help in avoiding > > MemoryContextSwitchTo TopMemoryContext at various places. > > Ok, I understand, I think subxacts can be allocated in new > LogicalStreamingContext which we can reset at the stream stop. How > about xids? > How about storing xids in ApplyContext? We do store similar lifespan things in that context, for ex. see store_flush_position. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
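A minimal sketch of that suggestion, mirroring the store_flush_position() pattern; the helper name, growth strategy, and variables are illustrative assumptions, not patch code:

#include "postgres.h"
#include "utils/memutils.h"

extern MemoryContext ApplyContext;  /* the apply worker's long-lived context */

static TransactionId *xids = NULL;
static int  nxids = 0;
static int  maxxids = 0;

static void
remember_streamed_xid(TransactionId xid)
{
    MemoryContext oldctx = MemoryContextSwitchTo(ApplyContext);

    if (nxids >= maxxids)
    {
        maxxids = (maxxids == 0) ? 64 : maxxids * 2;

        /*
         * repalloc() keeps the array in the context where it was first
         * allocated, so only the initial palloc() really needs ApplyContext;
         * switching around the whole block is simply harmless.
         */
        xids = (xids == NULL)
            ? palloc(maxxids * sizeof(TransactionId))
            : repalloc(xids, maxxids * sizeof(TransactionId));
    }
    xids[nxids++] = xid;

    MemoryContextSwitchTo(oldctx);
}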
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 3:15 PM Amit Kapila wrote: > > On Thu, May 28, 2020 at 12:57 PM Dilip Kumar wrote: > > > > On Tue, May 26, 2020 at 3:04 PM Amit Kapila wrote: > > > > > > Why we need memory-related to subxacts till the worker is alive? As > > > we have now, after reading subxact info (subxact_info_read), we need > > > to ensure that it is freed after its usage due to which we need to > > > remember and perform pfree at various places. > > > > > > I think we should once see the possibility that such that we could > > > switch to this new context in start stream message and reset it in > > > stop stream message. That might help in avoiding > > > MemoryContextSwitchTo TopMemoryContext at various places. > > > > Ok, I understand, I think subxacts can be allocated in new > > LogicalStreamingContext which we can reset at the stream stop. How > > about xids? > > > > How about storing xids in ApplyContext? We do store similar lifespan > things in that context, for ex. see store_flush_position. That sounds good to me, I will make this change in the next patch set, along with other changes. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, May 28, 2020 at 2:41 PM Amit Kapila wrote: > > On Thu, May 28, 2020 at 12:46 PM Dilip Kumar wrote: > > > > On Tue, May 26, 2020 at 12:00 PM Amit Kapila > > wrote: > > > > > > Isn't this problem only for subxact file as we anyway create changes > > > file as part of start stream message which should have come after > > > abort? If so, can't we detect whether subxact file exists probably by > > > using nsubxacts or something like that? Can you please once try to > > > reproduce this scenario to ensure that we are not missing anything? > > > > I have tested this, as of now, by default we create both changes and > > subxact files irrespective of whether we get any subtransactions or > > not. Maybe this could be optimized that only if we have any subxact > > then only create that file otherwise not? What's your opinion on the > > same. > > > > Yeah, that makes sense. > > > > > > > > 8. > > > > > > > @@ -2295,6 +2677,13 @@ > > > > > > > ReorderBufferXidSetCatalogChanges(ReorderBuffer > > > > > > > *rb, TransactionId xid, > > > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true); > > > > > > > > > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > > + > > > > > > > + /* > > > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes > > > > > > > too > > > > > > > + * if one of its children has. > > > > > > > + */ > > > > > > > + if (txn->toptxn != NULL) > > > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > > > > > > > } > > > > > > > > > > > > > > Why are we marking top transaction here? > > > > > > > > > > > > We need to mark top transaction to decide whether to build tuplecid > > > > > > hash or not. In non-streaming mode, we are only sending during the > > > > > > commit time, and during commit time we know whether the top > > > > > > transaction has any catalog changes or not based on the invalidation > > > > > > message so we are marking the top transaction there in DecodeCommit. > > > > > > Since here we are not waiting till commit so we need to mark the top > > > > > > transaction as soon as we mark any of its child transactions. > > > > > > > > > > > > > > > > But how does it help? We use this flag (via > > > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is > > > > > anyway done in DecodeCommit and that too after setting this flag for > > > > > the top transaction if required. So, how will it help in setting it > > > > > while processing for subxid. Also, even if we have to do it won't it > > > > > add the xid needlessly in builder->committed.xip array? > > > > > > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether > > > > to build the tuplecid hash or not based on whether it has catalog > > > > changes or not. > > > > > > > > > > Okay, but you haven't answered the second part of the question: "won't > > > it add the xid of top transaction needlessly in builder->committed.xip > > > array, see function SnapBuildCommitTxn?" IIUC, this can happen > > > without patch as well because DecodeCommit also sets the flags just > > > based on invalidation messages irrespective of whether the messages > > > are generated by top transaction or not, is that right? > > > > Yes, with or without the patch it always adds the topxid. I think > > purpose for doing this with/without patch is not for the snapshot > > instead we are marking the top itself that some of its subtxn has the > > catalog changes so that while building the tuplecid has we can know > > whether to build the hash or not. 
But, having said that I feel in > > ReorderBufferBuildTupleCidHash why do we need these two checks > > if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids)) > > return; > > > > I mean it should be enough to just have the check, because if we have > > added something to the tuplecids then catalog changes must be there > > because that time we are setting the catalog changes to true. > > > > if (dlist_is_empty(&txn->tuplecids)) > > return; > > > > I think in the base code there are multiple things going on > > 1. If we get new CID we always set the catalog change in that > > transaction but add the tuplecids in the top transaction. So > > basically, top transaction is so far not marked with catalog changes > > but it has tuplecids. > > 2. Now, in DecodeCommit the top xid will be marked that it has catalog > > changes based on the invalidation messages. > > > > I don't think it is advisable to remove that check from base code > unless we have a strong reason for doing so. I think here you can > write better comments about why you are marking the flag for top > transaction and remove TOCHECK from the comment. Ok, I will do that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Thu, May 28, 2020 at 1:36 PM amul sul wrote: > On Wed, May 27, 2020 at 12:53 PM Amit Langote wrote: >> I guess the workaround is to declare the cursor such that no >> partitions/children are pruned/excluded. > > Disabling pruning as well -- at-least for the statement or function. Now *that* is actually a workaround to tell a customer. :) -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Re: WIP: WAL prefetch (another approach)
On Sun, May 3, 2020 at 3:12 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > I've finally performed couple of tests involving more IO. The > not-that-big dataset of 1.5 GB for the replica with the memory allowing > fitting ~ 1/6 of it, default prefetching parameters and an update > workload with uniform distribution. Rather a small setup, but causes > stable reading into the page cache on the replica and allows to see a > visible influence of the patch (more measurement samples tend to happen > at lower latencies): Thanks for these tests Dmitry. You didn't mention the details of the workload, but one thing I'd recommend for a uniform/random workload that's generating a lot of misses on the primary server using N backends is to make sure that maintenance_io_concurrency is set to a number like N*2 or higher, and to look at the queue depth on both systems with iostat -x 1. Then you can experiment with ALTER SYSTEM SET maintenance_io_concurrency = X; SELECT pg_reload_conf(); to try to understand the way it works; there is a point where you've set it high enough and the replica is able to handle the same rate of concurrent I/Os as the primary. The default of 10 is actually pretty low unless you've only got ~4 backends generating random updates on the primary. That's with full_page_writes=off; if you leave it on, it takes a while to get into a scenario where it has much effect. Here's a rebase, after the recent XLogReader refactoring. From a7fd3f728d64c3c94387e9e424dba507b166bcab Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 28 Mar 2020 11:42:59 +1300 Subject: [PATCH v9 1/3] Add pg_atomic_unlocked_add_fetch_XXX(). Add a variant of pg_atomic_add_fetch_XXX with no barrier semantics, for cases where you only want to avoid the possibility that a concurrent pg_atomic_read_XXX() sees a torn/partial value. Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com --- src/include/port/atomics.h | 24 ++ src/include/port/atomics/generic.h | 33 ++ 2 files changed, 57 insertions(+) diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index 4956ec55cb..2abb852893 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -389,6 +389,21 @@ pg_atomic_add_fetch_u32(volatile pg_atomic_uint32 *ptr, int32 add_) return pg_atomic_add_fetch_u32_impl(ptr, add_); } +/* + * pg_atomic_unlocked_add_fetch_u32 - atomically add to variable + * + * Like pg_atomic_unlocked_write_u32, guarantees only that partial values + * cannot be observed. + * + * No barrier semantics. 
+ */ +static inline uint32 +pg_atomic_unlocked_add_fetch_u32(volatile pg_atomic_uint32 *ptr, int32 add_) +{ + AssertPointerAlignment(ptr, 4); + return pg_atomic_unlocked_add_fetch_u32_impl(ptr, add_); +} + /* * pg_atomic_sub_fetch_u32 - atomically subtract from variable * @@ -519,6 +534,15 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_sub_fetch_u64_impl(ptr, sub_); } +static inline uint64 +pg_atomic_unlocked_add_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 add_) +{ +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION + AssertPointerAlignment(ptr, 8); +#endif + return pg_atomic_unlocked_add_fetch_u64_impl(ptr, add_); +} + #undef INSIDE_ATOMICS_H #endif /* ATOMICS_H */ diff --git a/src/include/port/atomics/generic.h b/src/include/port/atomics/generic.h index d3ba89a58f..1683653ca6 100644 --- a/src/include/port/atomics/generic.h +++ b/src/include/port/atomics/generic.h @@ -234,6 +234,16 @@ pg_atomic_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_) } #endif +#if !defined(PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U32) +#define PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U32 +static inline uint32 +pg_atomic_unlocked_add_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_) +{ + ptr->value += add_; + return ptr->value; +} +#endif + #if !defined(PG_HAVE_ATOMIC_SUB_FETCH_U32) && defined(PG_HAVE_ATOMIC_FETCH_SUB_U32) #define PG_HAVE_ATOMIC_SUB_FETCH_U32 static inline uint32 @@ -399,3 +409,26 @@ pg_atomic_sub_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, int64 sub_) return pg_atomic_fetch_sub_u64_impl(ptr, sub_) - sub_; } #endif + +#if defined(PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY) && \ + !defined(PG_HAVE_ATOMIC_U64_SIMULATION) + +#ifndef PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U64 +#define PG_HAVE_ATOMIC_UNLOCKED_ADD_FETCH_U64 +static inline uint64 +pg_atomic_unlocked_add_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val) +{ + ptr->value += val; + return ptr->value; +} +#endif + +#else + +static inline uint64 +pg_atomic_unlocked_add_fetch_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val) +{ + return pg_atomic_add_fetch_u64_impl(ptr, val); +} + +#endif -- 2.20.1 From 6ed95fffba6751ddc9607659183c072cb11fa4a8 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 7 Apr 2020 22:56:27 +1200 Subject: [PATCH v9 2/3] Allow XLogReadRecord() to be non-blocking. Exte
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Thu, May 28, 2020 at 3:06 PM Amit Langote wrote: > On Thu, May 28, 2020 at 1:36 PM amul sul wrote: > > On Wed, May 27, 2020 at 12:53 PM Amit Langote > wrote: > >> Actually, if you declare the cursor without FOR SHARE/UPDATE, the case > >> would fail even with traditional inheritance: > >> > >> drop table if exists p cascade; > >> create table p (a int); > >> create table c (check (a = 2)) inherits (p); > >> insert into p values (1); > >> insert into c values (2); > >> begin; > >> declare c cursor for select * from p where a = 1; > >> fetch c; > >> update p set a = a where current of c; > >> ERROR: cursor "c" is not a simply updatable scan of table "c" > >> ROLLBACK > >> > > > > I am not sure I understood the point, you'll see the same error with > declarative > > partitioning as well. > > My point is that if a table is not present in the cursor's plan, there > is no way for CURRENT OF to access it. Giving an error in that case > seems justified. > > OTOH, when the CURRENT OF implementation has RowMarks to look at, it > avoids the error for traditional inheritance children due their > inactive RowMarks being present in the cursor's PlannedStmt. I think > that's only by accident though. > Yeah, make sense, thank you. Regards, Amul
Read access for pg_monitor to pg_replication_origin_status view
Hi all, While working on some monitoring tasks I realised that the pg_monitor role doesn't have access to the pg_replication_origin_status view. Are there any strong thoughts on not giving pg_monitor access to this view, or is it just something that nobody has asked for yet? I can't find any reason for pg_monitor not to have access to it. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, May 27, 2020 at 8:22 PM Dilip Kumar wrote: > > On Tue, May 26, 2020 at 7:45 AM Dilip Kumar wrote: > > > > > > Okay, sending again. > > While reviewing/testing I have found a couple of problems in 0005 and > 0006 which I have fixed in the attached version. > I haven't reviewed the new fixes yet but I have some comments on 0008-Add-support-for-streaming-to-built-in-replicatio.patch. 1. I think the temporary files (and or handles) used for storing the information of changes and subxacts are getting leaked in the patch. At some places, it is taken care to close the file but cases like apply_handle_stream_commit where if any error occurred in apply_dispatch(), the file might not get closed. The other place is in apply_handle_stream_abort() where if there is an error in ftruncate the file won't be closed. Now, the bigger problem is with changes related file which is opened in apply_handle_stream_start and closed in apply_handle_stream_stop and if there is any error in-between, we won't close it. OTOH, I think the worker will exit on an error so it might not matter but then why we are at few other places we are closing it before the error? I think on error these temporary files should be removed instead of relying on them to get removed next time when we receive changes for the same transaction which I feel is what we do in other cases where we use temporary files like for sorts or hashjoins. Also, what if the changes file size overflows "OS file size limit"? If we agree that the above are problems then do you think we should explore using BufFile interface (see storage/file/buffile.c) to avoid all such problems? 2. apply_handle_stream_abort() { .. + /* discard the subxacts added later */ + nsubxacts = subidx; + + /* write the updated subxact list */ + subxact_info_write(MyLogicalRepWorker->subid, xid); .. } Here, if subxacts becomes zero, then also subxact_info_write will create a new file and write checksum. I think subxact_info_write should have a check for nsubxacts > 0 before writing to the file. 3. apply_handle_stream_commit(StringInfo s) { .. + /* + * send feedback to upstream + * + * XXX Probably should send a valid LSN. But which one? + */ + send_feedback(InvalidXLogRecPtr, false, false); .. } Why do we need to send the feedback at this stage after applying each message? If we see a non-streamed case, we never send_feedback after each message. So, following that, I don't see the need to send it here but if you see any specific reason then do let me know? And if we have to send feedback, then we need to decide the appropriate values as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
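For reference, here is a rough sketch of what point 1 could look like if the raw temporary files were replaced with the BufFile interface; the function names and surrounding structure are hypothetical, not taken from the patch. BufFile temporary files are registered with the resource owner, so they are closed and removed automatically on error, and they are segmented internally, which also sidesteps the OS file size limit mentioned above:

#include "postgres.h"
#include "storage/buffile.h"

static BufFile *changes_file = NULL;

static void
changes_open(void)
{
    /*
     * interXact = false: the file is tracked by the resource owner, so it is
     * closed and deleted automatically at transaction end or on error.
     */
    changes_file = BufFileCreateTemp(false);
}

static void
changes_append(const void *data, size_t len)
{
    /* BufFile splits into 1GB segment files internally, so no size limit */
    if (BufFileWrite(changes_file, (void *) data, len) != len)
        elog(ERROR, "could not write to temporary change file");
}

static void
changes_close(void)
{
    BufFileClose(changes_file);
    changes_file = NULL;
}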
Re: password_encryption default
On 2020-05-27 15:25, Jonathan S. Katz wrote: > $ initdb -D data --auth-local=scram-sha-256 --auth-host=md5 > Got an error message: > "initdb: error: must specify a password for the superuser to enable md5 > authentication" > For the last two, that behavior is to be expected (after all, you've set > the two login vectors to require passwords), but the error message seems > odd now. Perhaps we tweak it to be: > > "initdb: error: must specify a password for the superuser when requiring > passwords for both local and host authentication." That message is a bit long. Maybe just "must specify a password for the superuser to enable password authentication", without reference to the specific method. I think the idea is clear there. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: password_encryption default
On 2020-05-27 15:59, Stephen Frost wrote: > Agreed- let's remove the legacy options. As I've mentioned elsewhere, > distros may manage the issue for us, and if we want to get into it, we > could consider adding support to pg_upgrade to complain if it comes > across a legacy setting that isn't valid. I'm not sure that's worthwhile > though. More along these lines: We could also remove the ENCRYPTED and UNENCRYPTED keywords from CREATE and ALTER ROLE. AFAICT, these have never been emitted by pg_dump or psql, so there are no concerns from that end. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
On Thu, 28 May 2020 at 05:11, Kyotaro Horiguchi wrote: > Hello, Vladimir. > > At Thu, 28 May 2020 11:57:23 +0300, Vladimir Sitnikov < > sitnikov.vladi...@gmail.com> wrote in > > Kyotaro>It seems to me that that crash means Pgjdbc is initiating a > logical > > Kyotaro>replication connection to start physical replication. > > > > Well, it used to work previously, so it might be a breaking change from > the > > client/application point of view. > > Mmm. It is not the proper way to use physical replication and it's > totally accidental that that worked (or even it might be a bug). The > documentation is saying as the follows, as more-or-less the same for > all versions since 9.4. > > https://www.postgresql.org/docs/13/protocol-replication.html > > > To initiate streaming replication, the frontend sends the replication > > parameter in the startup message. A Boolean value of true (or on, yes, > > 1) tells the backend to go into physical replication walsender mode, > > wherein a small set of replication commands, shown below, can be > > issued instead of SQL statements. > > > > Passing database as the value for the replication parameter instructs > > the backend to go into logical replication walsender mode, connecting > > to the database specified in the dbname parameter. In logical > > replication walsender mode, the replication commands shown below as > > well as normal SQL commands can be issued. > > regards. > > While the documentation does indeed say that there is quite a bit of additional confusion added by: and START_REPLICATION [ SLOT *slot_name* ] [ PHYSICAL ] *XXX/XXX* [ TIMELINE *tli* ] If we already have a physical replication slot according to the startup message why do we need to specify it in the START REPLICATION message ? Dave
Re: New 'pg' consolidated metacommand patch
On Wed, May 27, 2020 at 4:00 PM Peter Eisentraut wrote: > First, consider that git has over 170 subcommands. PostgreSQL currently > has 36, and we're probably not going to add dozens more any time soon. > So the issue is not of the same scope. It also seems to me that the way > git is organized this becomes a self-perpetuating system: They are > adding subcommands all the time without much care where you might in > other situations think harder about combining them and keeping the > surface area smaller. For example, we wouldn't really need separate > commands clusterdb, reindexdb, vacuumdb if we had better support in psql > for "run this command in each database [in parallel]". I see your point here, but I think it's clear that some people are already concerned about the proliferation of binaries on namespace grounds. For example, it was proposed that pg_verifybackup should be part of pg_basebackup, and then later and by someone else that it should be part of pg_checksums. It's quite different from either of those things and I'm pretty confident it shouldn't be merged with either one, but there was certainly pressure in that direction. So apparently for some people 36 is already too many. It's not clear to me why it should be a problem to have a lot of commands, as long as they all start with "pg_", but if it is then we should do something about it rather than waiting until we have more of them. > git (and svn etc. before it) also has a much more consistent operating > model that is sensible to reflect in the command structure. They all > more or less operate on a git repository, in apparently 170 different > ways. The 36 PostgreSQL commands don't all work in the same way. Now > if someone were to propose a way to combine server tools, perhaps like > pginstancetool {init|controldata|resetwal|checksum}, and perhaps also in > a way that actually saves code duplication and inconsistency, that would > be something to consider. Or maybe a client-side tool that does > pgclienttool {create user|drop user|create database|...} -- but that > pretty much already exists by the name of psql. But just renaming > everything that's shipped with PostgreSQL to one common bucket without > regard to how it actually works and what role it plays would be > unnecessarily confusing. This doesn't strike me as a very practical proposal because "pginstancetool checksums" is really stinking long compared to "pg_checksums", where as "pg checksums" is no different, or one keystroke better if you assume that the underscore requires pressing shift. > Also consider some practical concerns with the command structure you > describe: Tab completion of commands wouldn't work anymore, unless you > supply custom tab completion setups. The direct association between a > command and its man page would be broken. Shell scripting becomes more > challenging: Instead of writing common things like "if which > pg_waldump; then" you'd need some custom code, to be determined. These > are all solvable, but just a sum of slight annoyances, for no real benefit. There are some potential benefits, I think, such as: 1. It seems to be the emerging standard for command line interfaces. There's not only the 'git' example but also things like 'aws', which is perhaps more similar to the case proposed here in that there are a bunch of subcommands that do quite different sorts of things. 
I think a lot of developers are now quite familiar with the idea of a main command with a bunch of subcommands, and they expect to be able to type 'git help' or 'aws help' or 'pg help' to get a list of commands, and then 'pg help createdb' for help with that. If you don't know what pg commands exist today, how do you discover them? You're right that not everyone is going this way but it seems to be pretty common (kubectl, yum, brew, npm, heroku, ...). 2. It lowers the barrier to adding more commands. For example, as Chris Browne says, we could have a 'pg completion' command to emit completion information for various shells. True, that would be more necessary with this proposal. But I bet there's stuff that could be done even today -- I think most modern shells have pretty powerful completion facilities. Someone could propose it, but what are the chances that a patch adding a pg_completion binary would be accepted? I feel like the argument that this is too narrow to justify the namespace pollution is almost inevitable. 3. It might help us achieve some better consistency between commands. Right now we have a lot of warts in the way things are named, like pgbench vs. pg_ctl vs. createdb, and also pg_basebackup vs. pg_test_fsync (why not pg_base_backup or pg_testfsync?). Standardizing on something like this would probably help us be more consistent there. Over time, we might be able to also clean other things up. Maybe we could get all of our client-side utilities to share a common config file, and have a 'pg config' utility to configure it. Maybe we could have com
Re: password_encryption default
On 5/28/20 8:10 AM, Peter Eisentraut wrote: > On 2020-05-27 15:25, Jonathan S. Katz wrote: >> $ initdb -D data --auth-local=scram-sha-256 --auth-host=md5 >> >> Got an error message: >> >> "initdb: error: must specify a password for the superuser to enable md5 >> authentication" >> >> For the last two, that behavior is to be expected (after all, you've set >> the two login vectors to require passwords), but the error message seems >> odd now. Perhaps we tweak it to be: >> >> >> "initdb: error: must specify a password for the superuser when requiring >> passwords for both local and host authentication." > > That message is a bit long. Maybe just > > must specify a password for the superuser to enable password authentication > > without reference to the specific method. I think the idea is clear there. +1 Jonathan signature.asc Description: OpenPGP digital signature
Re: Explain Analyze (Rollback off) Suggestion
On Wed, May 27, 2020 at 9:33 PM David G. Johnston wrote: > I'm not seeing enough similarity with the reasons for, and specific > behaviors, of those previous GUCs to dismiss this proposal on that basis > alone. These are "crap we messed things up" switches that alter a query > behind the scenes in ways that a user cannot do through SQL - they simply > provide for changing a default that we already allow the user to override > per-query. Its akin to "DateStyle" and its pure cosmetic influencing > ease-of-use option rather than some changing the fundamental structural > meaning of '\n' Well, I think it's usually worse to have two possible behaviors rather than one. Like, a lot of people have probably made the mistake of running EXPLAIN ANALYZE without realizing that it's actually running the query, and then been surprised or dismayed afterwards. But each person only has to learn that once. If we had a GUC controlling this behavior, then you'd have to always be aware of the setting on any particular system on which you might be thinking of running the command. Likewise, if you write an application or tool of some sort that uses EXPLAIN ANALYZE, it has to be aware of the GUC value, or it won't work as expected on some systems. This is the general problem with behavior-changing GUCs. I kind of have mixed feelings about this. On the one hand, it sucks for operators of individual systems not to be able to customize things so as to produce the behavior they want. On the other hand, each one you add makes it harder to write code that will work the same way on every PostgreSQL system. I don't think the problem would be as bad in this particular case as in some others that have been proposed, mostly because EXPLAIN ANALYZE isn't widely-used by applications, so maybe it's worth considering. But on the whole, I'm inclined to agree with Tom that it's better not to create too many ways for fundamental behavior of the system to vary from one installation to another. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: password_encryption default
On Thu, May 28, 2020 at 8:53 AM Peter Eisentraut wrote: > More along these lines: We could also remove the ENCRYPTED and > UNENCRYPTED keywords from CREATE and ALTER ROLE. AFAICT, these have > never been emitted by pg_dump or psql, so there are no concerns from > that end. Thoughts? I have a question about this. My understanding of this area isn't great. As I understand it, you can specify a password unencrypted and let the system compute the validator from it, or you can compute the validator yourself and then send that as the 'encrypted' password. But, apparently, CREATE ROLE and ALTER ROLE don't really know which thing you did. They just examine the string that you passed and decide whether it looks like a validator. If so, they assume it is; if not, they assume it's just a password. But that seems really odd. What if you choose a password that just happens to look like a validator? Perhaps that's not real likely, but why do we not permit -- or even require -- the user to specify intent? It seems out of character for us to, essentially, guess the meaning of something ambiguous rather than requiring the user to be clear about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
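To illustrate the guessing Robert describes (role names and the hex digits below are made up):

    -- Does not look like a precomputed secret, so the server hashes it
    -- according to password_encryption:
    CREATE ROLE alice LOGIN PASSWORD 'correct horse battery staple';

    -- 'md5' followed by 32 hex digits looks like a precomputed MD5 secret,
    -- so it is stored verbatim; a real password that happens to have this
    -- shape would be misread as a hash, which is exactly the ambiguity above:
    CREATE ROLE bob LOGIN PASSWORD 'md50123456789abcdef0123456789abcdef';

    -- Likewise, a string that parses as a SCRAM verifier,
    -- SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>, is stored as-is.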
Re: password_encryption default
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Thu, May 28, 2020 at 8:53 AM Peter Eisentraut > wrote: > > More along these lines: We could also remove the ENCRYPTED and > > UNENCRYPTED keywords from CREATE and ALTER ROLE. AFAICT, these have > > never been emitted by pg_dump or psql, so there are no concerns from > > that end. Thoughts? > > I have a question about this. My understanding of this area isn't > great. As I understand it, you can specify a password unencrypted and > let the system compute the validator from it, or you can compute the > validator yourself and then send that as the 'encrypted' password. > But, apparently, CREATE ROLE and ALTER ROLE don't really know which > thing you did. They just examine the string that you passed and decide > whether it looks like a validator. If so, they assume it is; if not, > they assume it's just a password. > > But that seems really odd. What if you choose a password that just > happens to look like a validator? Perhaps that's not real likely, but > why do we not permit -- or even require -- the user to specify intent? > It seems out of character for us to, essentially, guess the meaning of > something ambiguous rather than requiring the user to be clear about > it. Indeed, and it's also been a source of bugs... Watching pgcon atm but I do recall some history around exactly this. I'd certainly be in favor of having these things be more explicit- including doing things like actually splitting out the actual password validator from the algorithm instead of having them smashed together as one string as if we don't know what columns are (also recall complaining about that when scram was first being developed too, though that might just be in my head). Thanks, Stephen signature.asc Description: PGP signature
Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.
On Wed, May 27, 2020 at 6:51 PM Amit Langote wrote: > > So in Rajkumar's example, the cursor is declared as: > > CURSOR IS SELECT * FROM tbl WHERE c1< 5 FOR UPDATE; > > and the WHERE CURRENT OF query is this: > > UPDATE tbl SET c2='aa' WHERE CURRENT OF cur; Thanks for the clarification. So it looks like we expand an UPDATE on a partitioned table into an UPDATE on each partition (inheritance_planner for DML) and then execute each of those. If CURRENT OF were to save the table oid or something, we could run the UPDATE only on that partition. I am possibly shooting in the dark, but this puzzles me. And it looks like we can cause wrong rows to be updated in non-partition inheritance where the ctids match? -- Best Wishes, Ashutosh Bapat
Re: Explain Analyze (Rollback off) Suggestion
On Thu, May 28, 2020 at 6:42 AM Robert Haas wrote: > On Wed, May 27, 2020 at 9:33 PM David G. Johnston > wrote: > > I'm not seeing enough similarity with the reasons for, and specific > behaviors, of those previous GUCs to dismiss this proposal on that basis > alone. These are "crap we messed things up" switches that alter a query > behind the scenes in ways that a user cannot do through SQL - they simply > provide for changing a default that we already allow the user to override > per-query. Its akin to "DateStyle" and its pure cosmetic influencing > ease-of-use option rather than some changing the fundamental structural > meaning of '\n' > > Well, I think it's usually worse to have two possible behaviors rather > than one. Like, a lot of people have probably made the mistake of > running EXPLAIN ANALYZE without realizing that it's actually running > the query, and then been surprised or dismayed afterwards. This really belongs on the other thread (though I basically said the same thing there two days ago): The ANALYZE option should not be part of the GUC setup. None of the other EXPLAIN default changing options have the same issues with being on by default - which is basically what we are talking about here: being able to have an option be on without specifying that option in the command itself. TIMING already does this without difficulty and the others are no different. David J.
Re: Resolving the python 2 -> python 3 mess
Noah Misch writes: > On Wed, Feb 19, 2020 at 08:42:36PM +0100, Peter Eisentraut wrote: >> I think there should just >> be an option "plpython is: {2|3|don't build it at all}". Then packagers can >> match this to what their plan for /usr/bin/python* is -- which appears to be >> different everywhere. > Today, we do not give packagers this sort of discretion over SQL-level > behavior. We formerly had --disable-integer-datetimes, but I view that as the > lesser of two evils, the greater of which would have been to force dump/reload > of terabyte-scale clusters. We should continue to follow today's principle, > which entails not inviting packagers to align the nature of "LANGUAGE > plpythonu" with the nature of /usr/bin/python. FWIW, I've abandoned this patch. We've concluded (by default, at least) that nothing is getting done in v13, and by the time v14 is out it will be too late to have any useful effect. I expect that the situation on-the-ground by 2021 will be that packagers build with PYTHON=python3 and package whatever they get from that. That means (1) plpythonu won't exist anymore and (2) users will be left to their own devices to convert existing plpython code. Now, (1) corresponds to not providing any /usr/bin/python executable, only python3 -- and that is a really common choice for distros to make, AFAIK, so I don't feel too awful about it. I find (2) less than ideal, but there's evidently not enough interest in doing anything about it. There's certainly going to be no point in shipping a solution for (2) if we fail to do so before v14; people will already have done the work by hand. We should, however, consider updating the plpython docs to reflect current reality. Notably, the existing wording in section 45.1 suggests that we'll eventually redefine "plpythonu" as Python 3, and it seems to me that that's not going to happen. regards, tom lane
Re: Explain Analyze (Rollback off) Suggestion
"David G. Johnston" writes: > The ANALYZE option should not be part of the GUC setup. Yeah. While I'm generally not in favor of putting GUCs into the mix here, the only one that seriously scares me is a GUC that would affect whether the EXPLAIN'd query executes or not. A GUC that causes buffer counts to be reported/not-reported is not going to result in data destruction when someone forgets that it's on. (BTW, adding an option for auto-rollback wouldn't change my opinion about that. Not all side-effects of a query can be rolled back. Thus, if there is an auto-rollback option, it mustn't be GUC-adjustable either.) regards, tom lane
Re: Explain Analyze (Rollback off) Suggestion
On Thu, May 28, 2020 at 7:52 AM Tom Lane wrote: > (BTW, adding an option for auto-rollback wouldn't change my opinion > about that. Not all side-effects of a query can be rolled back. Thus, > if there is an auto-rollback option, it mustn't be GUC-adjustable > either.) > Yeah, I've worked myself around to that as well, this thread's proposal would be to just make setting up rollback more obvious and easier for a user of explain analyze - whose value at this point is wholly independent of the GUC discussion. David J.
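For reference, the manual way to get that behaviour today is simply an explicit transaction (table and values invented for illustration):

    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
        UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    ROLLBACK;  -- undoes the UPDATE's row changes, but, per Tom's point above,
               -- not side effects such as sequence advances or dblink calls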
Incorrect comment in be-secure-openssl.c
The comment in be-secure-openssl.c didn't get the memo that the hardcoded DH parameters moved in 573bd08b99e277026e87bb55ae69c489fab321b8. The attached updates the wording, keeping it generic enough that we won't need to update it should the parameters move again. cheers ./daniel openssl_dh_comment.patch Description: Binary data
Re: password_encryption default
On Thu, May 28, 2020 at 10:01 AM Stephen Frost wrote: > as if we don't know what columns are Amen to that! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: repeat() function, CHECK_FOR_INTERRUPTS(), and unlikely()
On 5/27/20 3:29 AM, Michael Paquier wrote: >> I think that each of those tests should have a separate unlikely() marker, >> since the whole point here is that we don't expect either of those tests >> to yield true in the huge majority of CHECK_FOR_INTERRUPTS executions. > > +1. I am not sure that the addition of unlikely() should be > backpatched though, that's not something usually done. I backpatched and pushed the changes to the repeat() function. Any other opinions regarding backpatch of the unlikely() addition to CHECK_FOR_INTERRUPTS()? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development signature.asc Description: OpenPGP digital signature
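For readers who have not seen the other thread, the change being discussed is roughly of this shape. This is a simplified sketch of the non-Windows macro, not the exact miscadmin.h text; the Windows variant has a second test (for the unblocked signal queue), hence the suggestion of a separate unlikely() per test:

    #define CHECK_FOR_INTERRUPTS() \
        do { \
            if (unlikely(InterruptPending)) \
                ProcessInterrupts(); \
        } while (0)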
Re: Fix compilation failure against LLVM 11
On Thu, May 28, 2020 at 1:07 AM Michael Paquier wrote: > Please note that I would still wait for their next GA release to plug > in any extra holes at the same time. @Jesse: or is this change > actually part of the upcoming 10.0.1? No, a refactoring like this was not in the back branches (nor is it expected). Thanks, Jesse
Re: Trouble with hashagg spill I/O pattern and costing
On Wed, May 27, 2020 at 11:07:04AM +0200, Tomas Vondra wrote: On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote: On Tue, May 26, 2020 at 5:40 PM Jeff Davis wrote: On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: As for the tlist fix, I think that's mostly ready too - the one thing we should do is probably only doing it for AGG_HASHED. For AGG_SORTED it's not really necessary. Melanie previously posted a patch to avoid spilling unneeded columns, but it introduced more code: https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com and it seems that Heikki also looked at it. Perhaps we should get an acknowledgement from one of them that your one-line change is the right approach? I spent some time looking at it today, and, it turns out I was wrong. I thought that there was a case I had found where CP_SMALL_TLIST did not eliminate as many columns as could be eliminated for the purposes of spilling, but, that turned out not to be the case. I changed CP_LABEL_TLIST to CP_SMALL_TLIST in create_groupingsets_plan(), create_agg_plan(), etc and tried a bunch of different queries and this 2-3 line change worked for all the cases I tried. Is that where you made the change? I've only made the change in create_agg_plan, because that's what was in the query plan I was investigating. You may be right that the same fix is needed in additional places, though. Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and create_groupingsets_plan. I've looked at the other places that I think seem like they might benefit from it (create_upper_unique_plan, create_group_plan) but I think we don't need to modify those - there'll either be a Sort of HashAgg, which will take care about the projection. Or do you think some other places need CP_SMALL_TLIST? And then are you proposing to set it based on the aggstrategy to either CP_LABEL_TLIST or CP_SMALL_TLIST here? Yes, something like that. The patch I shared on on 5/21 just changed that, but I'm wondering if that could add overhead for sorted aggregation, which already does the projection thanks to the sort. I ended up tweaking the tlist only for AGG_MIXED and AGG_HASHED. We clearly don't need it for AGG_PLAIN or AGG_SORTED. This way we don't break regression tests by adding unnecessary "Subquery Scan" nodes. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 9941dfe65e..ddb277cdf0 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -2113,12 +2113,22 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path) Plan *subplan; List *tlist; List *quals; + int flags; /* * Agg can project, so no need to be terribly picky about child tlist, but -* we do need grouping columns to be available +* we do need grouping columns to be available. We are a bit more careful +* with hash aggregate, where we explicitly request small tlist to minimize +* I/O needed for spilling (possibly not known already). 
*/ - subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST); + flags = CP_LABEL_TLIST; + + /* ensure small tlist for hash aggregate */ + if (best_path->aggstrategy == AGG_HASHED || + best_path->aggstrategy == AGG_MIXED) + flags |= CP_SMALL_TLIST; + + subplan = create_plan_recurse(root, best_path->subpath, flags); tlist = build_path_tlist(root, &best_path->path); @@ -2209,7 +2219,8 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path) * Agg can project, so no need to be terribly picky about child tlist, but * we do need grouping columns to be available */ - subplan = create_plan_recurse(root, best_path->subpath, CP_LABEL_TLIST); + subplan = create_plan_recurse(root, best_path->subpath, + CP_LABEL_TLIST | CP_SMALL_TLIST); /* * Compute the mapping from tleSortGroupRef to column index in the child's
Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
> On Mar 2, 2020, at 5:29 PM, Peter Geoghegan wrote: > > On Sun, Mar 1, 2020 at 12:15 PM Floris Van Nee > wrote: >> Minor conflict with that patch indeed. Attached is rebased version. I'm >> running some tests now - will post the results here when finished. > > Thanks. > > We're going to have to go back to my original approach to inlining. At > least, it seemed to be necessary to do that to get any benefit from > the patch on my comparatively modest workstation (using a similar > pgbench SELECT benchmark to the one that you ran). Tom also had a > concern about the portability of inlining without using "static > inline" -- that is another reason to avoid the approach to inlining > taken by v3 + v4. > > It's possible (though not very likely) that performance has been > impacted by the deduplication patch (commit 0d861bbb), since it > updated the definition of BTreeTupleGetNAtts() itself. > > Attached is v5, which inlines in a targeted fashion, pretty much in > the same way as the earliest version. This is the same as v4 in every > other way. Perhaps you can test this. Hi Peter, just a quick code review: The v5 patch files apply and pass the regression tests. I get no errors. The performance impact is within the noise. The TPS with the patch are higher sometimes and lower other times, looking across the 27 subtests of the "select-only" benchmark. Which subtest is slower or faster changes per run, so that doesn't seem to be relevant. I ran the "select-only" six times (three with the patch, three without). The change you made to the loop is clearly visible in the nbtsearch.s output, and the change to inline _bt_compare is even more visible, so there doesn't seem to be any defeating of your change due to the compiler ignoring the "inline" or such. I compiled using gcc -O2 % gcc --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1 Apple clang version 11.0.0 (clang-1100.0.33.17) Target: x86_64-apple-darwin19.4.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin 2.4 GHz 8-Core Intel Core i9 32 GB 2667 MHz DDR4 Reading this thread, I think the lack of a performance impact on laptop hardware was expected, but perhaps confirmation that it does not make things worse is useful? Since this patch doesn't seem to do any harm, I would mark it as "ready for committer", except that there doesn't yet seem to be enough evidence that it is a net win. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Operator class parameters and sgml docs
On Thu, May 21, 2020 at 3:17 AM Alexander Korotkov wrote: > > On Thu, May 21, 2020 at 12:37 AM Peter Geoghegan wrote: > > Commit 911e7020770 added a variety of new support routines to index > > AMs. For example, it added a support function 5 to btree (see > > BTOPTIONS_PROC), but didn't document this alongside the other support > > functions in btree.sgml. > > > > It looks like the new support functions are fundamentally different to > > the existing ones in that they exist only as a way of supplying > > parameters to other support functions. The idea was to preserve > > compatibility with the old support function signatures. Even still, I > > think that the new support functions should get some mention alongside > > the older support functions. > > > > I also wonder whether or not xindex.sgml needs to be updated to > > account for opclass parameters. > > Thank you for pointing. I'm going to take a look on this in next few days. I'm sorry for the delay. I was very busy with various stuff. I'm going to post docs patch next week. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP: WAL prefetch (another approach)
Thomas Munro escribió: > @@ -1094,8 +1103,16 @@ WALRead(XLogReaderState *state, > XLByteToSeg(recptr, nextSegNo, > state->segcxt.ws_segsize); > state->routine.segment_open(state, nextSegNo, &tli); > > - /* This shouldn't happen -- indicates a bug in > segment_open */ > - Assert(state->seg.ws_file >= 0); > + /* callback reported that there was no such file */ > + if (state->seg.ws_file < 0) > + { > + errinfo->wre_errno = errno; > + errinfo->wre_req = 0; > + errinfo->wre_read = 0; > + errinfo->wre_off = startoff; > + errinfo->wre_seg = state->seg; > + return false; > + } Ah, this is what Michael was saying ... we need to fix WALRead so that it doesn't depend on segment_open alway returning a good FD. This needs a fix everywhere, not just here, and improve the error report interface. Maybe it does make sense to get it fixed in pg13 and avoid a break later. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Conflict of implicit collations doesn't propagate out of subqueries
Hi! I think I’ve found a bug related to the strength of collations. Attached is a WIP patch, that address some other issues too. In this this example, the conflict of implicit collations propagates correctly: WITH data (c, posix) AS ( values ('a' COLLATE "C", 'b' COLLATE "POSIX") ) SELECT * FROM data ORDER BY ( c || posix ) || posix ERROR: collation mismatch between implicit collations "C" and "POSIX" LINE 6: ORDER BY ( c || posix ) || posix; ^ HINT: You can choose the collation by applying the COLLATE clause to one or both expressions. However, if the conflict happens in a subquery, it doesn’t anymore: WITH data (c, posix) AS ( values ('a' COLLATE "C", 'b' COLLATE "POSIX") ) SELECT * FROM (SELECT *, c || posix AS none FROM data) data ORDER BY none || posix; c | posix | none ---+---+-- a | b | ab (1 row) The problem is in parse_collate.c:566: A Var (and some other nodes) without valid collation gets the strength COLLATE_NONE: if (OidIsValid(collation)) strength = COLLATE_IMPLICIT; else strength = COLLATE_NONE; However, for a Var it should be COLLATE_CONFLICT, which corresponds to the standards collation derivation “none”. I guess there could be other similar cases which I don’t know about. The patch fixes that, plus necessary consequences (error handling for new scenarios) as well as some “cosmetic” improvements I found along the way. Unnecessary to mention that this patch might break existing code. With the patch, the second example form above fails similarly to the first example: ERROR: collation mismatch between implicit collation "POSIX" and unknown collation LINE 6: ORDER BY none || posix; ^ HINT: You can choose the collation by applying the COLLATE clause to one or both expressions. Other changes in the patch: ** Error Handling The current parse time error handling of implicit collisions has always both colliding collations. The patch allows a COLLATE_CONFLICT without knowing which collations caused the conflict (it’s not stored in the Var node). Thus we may have know two, one or none of the collations that contributed to the collision. The new function ereport_implicit_collation_mismatch takes care of that. ** Renaming COLLATE_NONE I found the name COLLATE_NONE a little bit unfortuante as it can easily be mistaken for the collation derivation “none” the SQL standard uses. I renamed it to COLLATE_NA for “not applicable”. The standard doesn’t have a word for that as non character string types just don’t have collations. ** Removal of location2 The code keeps track of the location (for parser_errposition) of both collations that collided, even though it reports only one of them. The patch removes location2. Due to this, the some errors report the other location as before (previously the second, now the first). See collate.out in the patch. The patch has TODO comments for code that would be needed to take the other one. Is there any policy to report the second location or to strictly keep the errors where they used to be? ** General cleanup The patch also removes a little code that I think is not needed (anymore). ** Tests Besides the errposition, only one existing test breaks. It previously triggered a runtime error, now it triggers a parse time error: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM collate_test2 ORDER BY 2; -- fail -ERROR: could not determine which collation to use for string comparison -HINT: Use the COLLATE clause to set the collation explicitly. 
+ERROR: collation mismatch between implicit collations +LINE 1: SELECT a, b FROM collate_test1 UNION ALL SELECT a, b FROM co... + ^ +HINT: Use the COLLATE clause to set the collation explicitly The patch also adds a new test to trigger the problem in the first place. The patch is against REL_12_STABLE but applies to master too. -markus WIP_collation_strength_conflict_for_Var_v1.patch Description: Binary data
Re: Expand the use of check_canonical_path() for more GUCs
> On May 20, 2020, at 12:03 AM, Michael Paquier wrote: > > On Tue, May 19, 2020 at 09:32:15AM -0400, Tom Lane wrote: >> Hm, I'm pretty certain that data_directory does not need this because >> canonicalization is done elsewhere; the most that you could accomplish >> there is to cause problems. Dunno about the rest. > > Hmm. I missed that this is getting done in SelectConfigFiles() first > by the postmaster so that's not necessary, which also does the work > for hba_file and ident_file. config_file does not need that either as > AbsoluteConfigLocation() does the same work via ParseConfigFile(). So > perhaps we could add a comment or such about that? Attached is an > idea. > > The rest is made of PromoteTriggerFile, pg_krb_server_keyfile, > ssl_cert_file, ssl_key_file, ssl_ca_file, ssl_crl_file and > ssl_dh_params_file where loaded values are taken as-is, so applying > canonicalization would be helpful there, no? Before this patch, there are three GUCs that get check_canonical_path treatment: log_directory external_pid_file stats_temp_directory After the patch, these also get the treatment (though Peter seems to be objecting to the change): promote_trigger_file krb_server_keyfile ssl_cert_file ssl_key_file ssl_ca_file ssl_crl_file ssl_dh_params_file and these still don't, with comments about how they are already canonicalized when the config file is loaded: data_directory config_file hba_file ident_file A little poking around shows that in SelectConfigFiles(), these four directories were set by SetConfigOption(). I don't see a problem with the code, but the way this stuff is spread around makes it easy for somebody adding a new GUC file path to do it wrong. I don't have much opinion about Peter's preference that paths be left alone, but I'd prefer some comments in guc.c explaining it all. The only cleanup that occurs to me is to reorder ConfigureNamesString[] to have all the path options back-to-back, with the four that are set by SelectConfigFiles() at the top with comments about why they are special, and the rest after that with comments about why they need or do not need canonicalization. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Conflict of implicit collations doesn't propagate out of subqueries
Markus Winand writes: > However, if the conflict happens in a subquery, it doesn’t anymore: > WITH data (c, posix) AS ( > values ('a' COLLATE "C", 'b' COLLATE "POSIX") > ) > SELECT * > FROM (SELECT *, c || posix AS none FROM data) data > ORDER BY none || posix; > c | posix | none > ---+---+-- > a | b | ab > (1 row) I'm not exactly convinced this is a bug. Can you cite chapter and verse in the spec to justify throwing an error? AIUI, collation conflicts can only occur within a single expression, and this is not that. Moreover, even if data.none arguably has no collation, treating it from outside the sub-query as having collation strength "none" seems to me to be similar to our policy of promoting unknown-type subquery outputs to type "text" rather than leaving them to cause trouble later. It's not pedantically correct, but nobody liked the old behavior. regards, tom lane
Re: speed up unicode normalization quick check
> On May 21, 2020, at 12:12 AM, John Naylor wrote: > > Hi, > > Attached is a patch to use perfect hashing to speed up Unicode > normalization quick check. > > 0001 changes the set of multipliers attempted when generating the hash > function. The set in HEAD works for the current set of NFC codepoints, > but not for the other types. Also, the updated multipliers now all > compile to shift-and-add on most platform/compiler combinations > available on godbolt.org (earlier experiments found in [1]). The > existing keyword lists are fine with the new set, and don't seem to be > very picky in general. As a test, it also successfully finds a > function for the OS "words" file, the "D" sets of codepoints, and for > sets of the first n built-in OIDs, where n > 5. Prior to this patch, src/tools/gen_keywordlist.pl is the only script that uses PerfectHash. Your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go. Could you instead make them arguments such that gen_keywordlist.pl, generate-unicode_combining_table.pl, and future callers can pass in the numbers they want? Or is there some advantage to having it this way? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
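For context, the multipliers under discussion feed the two hash accumulators in the generated lookup functions. Very roughly, the hashing step has this shape (illustrative only, not the actual PerfectHash.pm output); constants such as 31 or 257 are attractive because a * 31 compiles to (a << 5) - a and a * 257 to (a << 8) + a on most compilers:

    static unsigned int
    hash_accum(const unsigned char *key, size_t keylen,
               unsigned int mult, unsigned int seed)
    {
        unsigned int a = seed;

        while (keylen--)
            a = a * mult + *key++;
        return a;
    }

    /*
     * A CHM-style perfect hash combines two such accumulations (with two
     * different multipliers) through a precomputed table so that every key
     * in the fixed set lands in its own slot.
     */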
Re: Fix compilation failure against LLVM 11
On 2020-05-28 17:07:46 +0900, Michael Paquier wrote: > Please note that I would still wait for their next GA release to plug > in any extra holes at the same time. @Jesse: or is this change > actually part of the upcoming 10.0.1? Why? I plan to just commit this change now.
Re: Conflict of implicit collations doesn't propagate out of subqueries
> On 28.05.2020, at 23:43, Tom Lane wrote: > > Markus Winand writes: >> However, if the conflict happens in a subquery, it doesn’t anymore: > >>WITH data (c, posix) AS ( >>values ('a' COLLATE "C", 'b' COLLATE "POSIX") >>) >>SELECT * >> FROM (SELECT *, c || posix AS none FROM data) data >> ORDER BY none || posix; > >> c | posix | none >>---+---+-- >> a | b | ab >>(1 row) > > I'm not exactly convinced this is a bug. Can you cite chapter and verse > in the spec to justify throwing an error? I think it is 6.6 Syntax Rule 17: • 17) If the declared type of a BIC is character string, then the collation derivation of the declared type of BIC is Case: • a) If the declared type has a declared type collation DTC, then implicit. • b) Otherwise, none. That gives derivation “none” to the column. When this is concatenated, 9.5 ("Result of data type combinations”) SR 3 a ii 3 applies: • ii) The collation derivation and declared type collation of the result are determined as follows. Case: • 1) If some data type in DTS has an explicit collation derivation [… doesn’t apply] • 2) If every data type in DTS has an implicit collation derivation, then [… doesn’t apply beause of “every"] • 3) Otherwise, the collation derivation is none. [applies] Also, the standard doesn’t have a forth derivation (strength). It also says that not having a declared type collation implies the derivation “none”. See 4.2.2: Every declared type that is a character string type has a collation derivation, this being either none, implicit, or explicit. The collation derivation of a declared type with a declared type collation that is explicitly or implicitly specified by a is implicit. If the collation derivation of a declared type that has a declared type collation is not implicit, then it is explicit. The collation derivation of an expression of character string type that has no declared type collation is none. -markus
OpenSSL 3.0.0 compatibility
Since OpenSSL is now releasing 3.0.0-alpha versions, I took a look at using it with postgres to see what awaits us. As it is now shipping in releases (with GA planned for Q4), users will probably soon start to test against it, so I wanted to be prepared. Regarding the deprecations, we can either set preprocessor directives or use compiler flags to silence the warnings and do nothing (for now), or we could update to the new API. We probably want to do different things for master vs back-branches, but as an illustration of what the latter could look like I've implemented this in 0001. SSL_CTX_load_verify_locations and X509_STORE_load_locations are deprecated and replaced by individual calls to load files and directories. These are quite straightforward, and are implemented much like how we handle the TLS protocol API. DH_check has been discouraged for quite some time, and is now marked deprecated without a 1:1 replacement. The OpenSSL docs recommend using the EVP API, which is also done in 0001. For now I just stuck it in with version-gated ifdefs; something cleaner than that can clearly be done. 0001 is clearly not proposed for review/commit yet, it's included in case someone else is interested as well. OpenSSL also deprecates DES keys in 3.0.0, which causes our password callback tests to fail with the cryptic error "fetch failed", as the test suite keys are encrypted with DES. 0002 fixes this by changing to AES256 (randomly chosen among the ciphers supported in 1.0.1+ and likely to be around), and could be applied already today as there is nothing 3.0.0 specific about it. On top of DES keys there are also a lot of deprecations of low-level functions which break pgcrypto in quite a few ways. I haven't tackled these yet, but it looks like we have to do the EVP rewrite there. cheers ./daniel 0002-OpenSSL-3.0.0-compatibility-in-tests.patch Description: Binary data 0001-Compatibility-with-OpenSSL-3.0.0.patch Description: Binary data
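As a rough illustration of the verify-location change described above (sketch only, not the code in 0001; SSL_CTX_load_verify_file() and friends are the OpenSSL 3.0 replacements for the single combined call):

    #include <openssl/ssl.h>

    /* Load the CA file configured in ssl_ca_file into the SSL context. */
    static int
    load_root_cert(SSL_CTX *context, const char *ssl_ca_file)
    {
    #if OPENSSL_VERSION_NUMBER >= 0x30000000L
        /* OpenSSL 3.0: individual, non-deprecated call for the file case */
        return SSL_CTX_load_verify_file(context, ssl_ca_file);
    #else
        /* Older OpenSSL: one call covering both a CA file and a directory */
        return SSL_CTX_load_verify_locations(context, ssl_ca_file, NULL);
    #endif
    }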
Re: Fix compilation failure against LLVM 11
Hi, On 2020-05-27 07:49:45 -0700, Jesse Zhang wrote: > On the mensiversary of the last response, what can I do to help move > this along (aside from heeding the advice "don't use LLVM HEAD")? Sorry, I had looked at it at some point with the intent to commit it, and hit some stupid small snags (*). And then I forgot about it. Pushed. Thanks for the patch! Andres Freund (*) First I didn't see the problem, because I accidentally had an old version of the header around. Then I couldn't immediately build old versions of pg + LLVM due to my existing installation needing to be rebuilt. Then there were compiler errors, due to a too-new compiler. Etc.
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
On Wed, May 27, 2020 at 09:58:04PM +0800, Andy Fan wrote: On Wed, May 27, 2020 at 8:01 PM Ashutosh Bapat < ashutosh.ba...@2ndquadrant.com> wrote: On Wed, 27 May 2020 at 04:43, Andy Fan wrote: You can use the attached sql to reproduce this issue, but I'm not sure you can get the above result at the first time that is because when optimizer think the 2 index scan have the same cost, it will choose the first one it found, the order depends on RelationGetIndexList. If so, you may try drop and create j1_i_im5 index. The sense behind this patch is we still use the cost based optimizer, just when we we find out the 2 index scans have the same cost, we prefer to use the index which have more qual filter on Index Cond. This is implemented by adjust the qual cost on index filter slightly higher. Thanks for the example and the explanation. The execution time difference in your example is pretty high to account for executing the filter on so many rows. My guess is this has to do with the heap access. For applying the filter the entire row needs to be fetched from the heap. So we should investigate this case from that angle. Another guess I have is the statistics is not correct and hence the cost is wrong. I believe this is a statistics issue and then the cost is wrong. I think you're both right. Most of the time probably comes from the heap accesses, but the dabatabase has no chance to account for that as there was no analyze after inseting the data causing that. So it's very difficult to account for this when computing the cost. More characters of this issue are: 1). If a data is out of range in the old statistics, optimizer will given an 1 row assumption. 2). based on the 1 row assumption, for query "col1=out_of_range_val AND col2 = any_value" Index (col1, col2) and (col1, col3) will have exactly same cost for current cost model. 3). If the statistics was wrong, (col1, col3) maybe a very bad plan as shown above, but index (col1, col2) should always better/no worse than (col1, col3) in any case. 4). To expand the rule, for query "col1 = out_of_range_val AND col2 = any_value AND col3 = any_val", index are (col1, col2, col_m) and (col1, col_m, col_n), the former index will aways has better/no worse than the later one. 5). an statistics issue like this is not uncommon, for example an log based application, creation_date is very easy to out of range in statistics. Right. There are many ways to cause issues like this. so we need to optimize the cost model for such case, the method is the patch I mentioned above. Making the planner more robust w.r.t. to estimation errors is nice, but I wouldn't go as far saying we should optimize for such cases. The stats can be arbitrarily off, so should we expect the error to be 10%, 100% or 100%? We'd probably end up with plans that handle worst cases well, but the average performance would end up being way worse :-( Anyway, I kinda doubt making the conditions 1.001 more expensive is a way to make the planning more robust. I'm pretty sure we could construct examples in the opposite direction, in which case this change make it more likely we use the wrong index. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
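A self-contained sketch of the pattern being described (schema and numbers invented): with no ANALYZE after loading the out-of-range rows, the planner assumes one row for col1 = 200, and both two-column indexes then get the same cost even though only one of them can put both quals into the Index Cond:

    CREATE TABLE t (col1 int, col2 int, col3 int);
    INSERT INTO t SELECT i % 100, i % 10, i % 10
      FROM generate_series(1, 1000000) i;
    CREATE INDEX t_col1_col2 ON t (col1, col2);
    CREATE INDEX t_col1_col3 ON t (col1, col3);
    ANALYZE t;

    -- rows outside the recorded col1 range arrive after the last ANALYZE
    INSERT INTO t SELECT 200, i % 10, i % 10 FROM generate_series(1, 100000) i;

    EXPLAIN ANALYZE SELECT * FROM t WHERE col1 = 200 AND col2 = 3;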
Re: Trouble with hashagg spill I/O pattern and costing
On Tue, 2020-05-26 at 17:40 -0700, Jeff Davis wrote: > On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote: > > Yeah. I agree prefetching is definitely out of v13 scope. It might > > be > > interesting to try how useful would it be, if you're willing to > > spend > > some time on a prototype. > > I think a POC would be pretty quick; I'll see if I can hack something > together. Attached (intended for v14). I changed the list from a simple array to a circular buffer so that we can keep enough preallocated block numbers in it to do prefetching. On SSD I didn't see any improvement, but perhaps it will do better on magnetic storage. Regards, Jeff Davis diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index d7f99d9944c..89edbc4c579 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3941,6 +3941,9 @@ pgstat_get_wait_io(WaitEventIO w) case WAIT_EVENT_BUFFILE_WRITE: event_name = "BufFileWrite"; break; + case WAIT_EVENT_BUFFILE_PREFETCH: + event_name = "BufFilePrefetch"; + break; case WAIT_EVENT_CONTROL_FILE_READ: event_name = "ControlFileRead"; break; diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 35e8f12e62d..4e2d4f059a7 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -618,6 +618,20 @@ BufFileWrite(BufFile *file, void *ptr, size_t size) return nwritten; } +/* + * BufFilePrefetch + */ +void +BufFilePrefetchBlock(BufFile *buffile, long blknum) +{ + int filenum = (blknum / BUFFILE_SEG_SIZE); + off_t offset = (blknum % BUFFILE_SEG_SIZE) * BLCKSZ; + File file = buffile->files[filenum]; + + (void) FilePrefetch(file, offset, BLCKSZ, + WAIT_EVENT_BUFFILE_PREFETCH); +} + /* * BufFileFlush * diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c index 666a7c0e81c..014ec67b8ff 100644 --- a/src/backend/utils/sort/logtape.c +++ b/src/backend/utils/sort/logtape.c @@ -97,6 +97,7 @@ typedef struct TapeBlockTrailer * block */ long next; /* next block on this tape, or # of valid * bytes on last block (if < 0) */ + long prefetch; /* block to prefetch when reading this one */ } TapeBlockTrailer; #define TapeBlockPayloadSize (BLCKSZ - sizeof(TapeBlockTrailer)) @@ -123,6 +124,11 @@ typedef struct TapeBlockTrailer #define TAPE_WRITE_PREALLOC_MIN 8 #define TAPE_WRITE_PREALLOC_MAX 128 +#define TAPE_PREFETCH_DISTANCE_MAX 64 + +StaticAssertDecl(TAPE_PREFETCH_DISTANCE_MAX * 2 <= TAPE_WRITE_PREALLOC_MAX, + "Tape prefetch distance too large."); + /* * This data structure represents a single "logical tape" within the set * of logical tapes stored in the same file. @@ -165,13 +171,13 @@ typedef struct LogicalTape int nbytes; /* total # of valid bytes in buffer */ /* - * Preallocated block numbers are held in an array sorted in descending - * order; blocks are consumed from the end of the array (lowest block - * numbers first). + * Preallocated block numbers are held in a sorted circular array. 
*/ long *prealloc; - int nprealloc; /* number of elements in list */ - int prealloc_size; /* number of elements list can hold */ + int prealloc_next; /* next element in circular array */ + int prealloc_nelem; /* number of elements in circular array */ + int prealloc_size; /* total size of circular array */ + int prefetch_dist; /* distance to prefetch */ } LogicalTape; /* @@ -219,7 +225,8 @@ struct LogicalTapeSet static void ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer); static void ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer); static long ltsGetFreeBlock(LogicalTapeSet *lts); -static long ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt); +static long ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt, +long *prefetch_block); static void ltsReleaseBlock(LogicalTapeSet *lts, long blocknum); static void ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared, SharedFileSet *fileset); @@ -329,6 +336,10 @@ ltsReadFillBuffer(LogicalTapeSet *lts, LogicalTape *lt) else lt->nextBlockNumber = TapeBlockGetTrailer(thisbuf)->next; + if (TapeBlockGetTrailer(thisbuf)->prefetch >= 0) + BufFilePrefetchBlock(lts->pfile, + TapeBlockGetTrailer(thisbuf)->prefetch); + /* Advance to next block, if we have buffer space left */ } while (lt->buffer_size - lt->nbytes > BLCKSZ); @@ -424,38 +435,79 @@ ltsGetFreeBlock(LogicalTapeSet *lts) * Refill the preallocation list if necessary. */ static long -ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt) +ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt, long *prefetch) { - /* sorted in descending order, so return the last element */ - if (lt->nprealloc > 0) - return lt->prealloc[--lt->nprealloc]; + long block; + int prefetch_idx; + + if (lt->prealloc_nelem > lt->prefetch_dist) + { + prefetch_id
Re: segmentation fault using currtid and partitioned tables
Hi, On 2020-05-22 19:32:57 -0400, Alvaro Herrera wrote: > On 2020-Apr-09, Michael Paquier wrote: > > > Playing more with this stuff, it happens that we have zero code > > coverage for currtid() and currtid2(), and the main user of those > > functions I can find around is the ODBC driver: > > https://coverage.postgresql.org/src/backend/utils/adt/tid.c.gcov.html > > Yeah, they're there solely for ODBC as far as I know. And there only for very old servers (< 8.2), according to Hiroshi Inoue. Found that out post 12 freeze. I was planning to drop them for 13, but I unfortunately didn't get around to do so :( I guess we could decide to make a freeze exception to remove them now, although I'm not sure the reasons for doing so are strong enough. > > There are multiple cases to consider, particularly for views: > > - Case of a view with ctid as attribute taken from table. > > - Case of a view with ctid as attribute with incorrect attribute > > type. > > It is worth noting that all those code paths can trigger various > > elog() errors, which is not something that a user should be able to do > > using a SQL-callable function. There are also two code paths for > > cases where a view has no or more-than-one SELECT rules, which cannot > > normally be reached. > > > All in that, I propose something like the attached to patch the > > surroundings with tests to cover everything I could think of, which I > > guess had better be backpatched? > > I don't know, but this stuff is so unused that your patch seems > excessive ... and I think we'd rather not backpatch something so large. > I propose we do something less invasive in the backbranches, like just > throw elog() errors (nothing fancy) where necessary to avoid the > crashes. Even for pg12 it seems that that should be sufficient. > > For pg13 and beyond, I liked Tom's idea of installing dummy functions I concur that it seems unnecessary to make these translatable, even with the reduced scope from https://www.postgresql.org/message-id/20200526025959.GE6155%40paquier.xyz Greetings, Andres Freund
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Tomas Vondra writes: > On Wed, May 27, 2020 at 09:58:04PM +0800, Andy Fan wrote: >> so we need to optimize the cost model for such case, the method is the >> patch I mentioned above. > Making the planner more robust w.r.t. to estimation errors is nice, but > I wouldn't go as far saying we should optimize for such cases. Yeah, it's a serious mistake to try to "optimize" for cases where we have no data or wrong data. By definition, we don't know what we're doing, so who's to say whether we've made it better or worse? And the possible side effects on cases where we do have good data are not to be ignored. > Anyway, I kinda doubt making the conditions 1.001 more expensive is a > way to make the planning more robust. I'm pretty sure we could construct > examples in the opposite direction, in which case this change make it > more likely we use the wrong index. The other serious error we could be making here is to change things on the basis of just a few examples. You really need a pretty wide range of test cases to be sure that you're not making things worse, any time you're twiddling basic parameters like these. regards, tom lane
Re: segmentation fault using currtid and partitioned tables
Hi, On 2020-04-05 12:51:56 -0400, Tom Lane wrote: > (2) The proximate cause of the crash is that rd_tableam is zero, > so that the interface functions in tableam.h just crash hard. > This seems like a pretty bad thing; does anyone want to promise > that there are no other oversights of the same ilk elsewhere, > and never will be? > > I think it might be a good idea to make relations-without-storage > set up rd_tableam as a vector of dummy functions that will throw > some suitable complaint about "relation lacks storage". NULL is > a horrible default for this. I don't have particularly strong views here. I can see a benefit to such a pseudo AM. I can even imagine that there might be some cases where we would actually introduce some tableam functions for, e.g., partitioned tables or views, to centralize their handling more, instead of having such considerations more distributed. Clearly not worth actively trying to do that for all existing code dealing with such relkinds, but there might be cases where it's worthwhile. OTOH, it's kinda annoying having to maintain a not insignificant number of functions that need to be updated whenever the tableam interface evolves. I guess we could partially hack our way through that by having one such function, and just assigning it to all the mandatory callbacks by way of a void cast. But that'd be mighty ugly. Greetings, Andres Freund
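To make the maintenance trade-off concrete, a sketch of one such error-throwing callback (the function name is invented; each real TableAmRoutine member has its own signature, which is exactly why a correctly-typed wrapper per callback, or the ugly void cast, would be needed):

    static void
    no_storage_callback(Relation rel)
    {
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("relation \"%s\" has no storage",
                        RelationGetRelationName(rel))));
    }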
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
> >so we need to optimize the cost model for such case, the method is the > >patch I mentioned above. > > Making the planner more robust w.r.t. to estimation errors is nice, but > I wouldn't go as far saying we should optimize for such cases. The stats > can be arbitrarily off, so should we expect the error to be 10%, 100% or > 100%? I don't think my patch relies on anything like that. My patch doesn't fix the statistics issue; it just adds extra cost to the quals in the Index Filter part. Assume the query pattern is WHERE col1 = X AND col2 = Y. The impacts are: 1). The cost of (col1, other_column) becomes higher than (col1, col2). 2). The relationship between a seqscan and an index scan on index (col1, other_column) is changed (this is something I don't want). However, in my case the cost difference between an index scan and a seq scan is usually very large, so the change above should have nearly no impact on that choice. 3). The cost of an index scan on index (col1) alone becomes higher. Overall I think nothing gets worse. > We'd probably end up with plans that handle worst cases well, > but the average performance would end up being way worse :-( > > That's possible, that's why I hope to get some feedback on that. Actually I can't think of such a case; do you have anything like that in mind? I'm feeling that (qpqual_cost.per_tuple * 1.001) is not good enough, since a user may have something like WHERE expensive_func(col1) = X. We may change it to cpu_tuple_cost + qpqual_cost.per_tuple + (0.0001) * list_length(qpquals). -- Best Regards Andy Fan
Re: segmentation fault using currtid and partitioned tables
Andres Freund writes: > On 2020-04-05 12:51:56 -0400, Tom Lane wrote: >> I think it might be a good idea to make relations-without-storage >> set up rd_tableam as a vector of dummy functions that will throw >> some suitable complaint about "relation lacks storage". NULL is >> a horrible default for this. > OTOH, it's kinda annoying having to maintain a not insignificant number > of functions that needs to be updated whenever the tableam interface > evolves. That argument sounds pretty weak. If you're making breaking changes in the tableam API, updating the signatures (not even any code) of some dummy functions seems like by far the easiest part. regards, tom lane
Re: Trouble with hashagg spill I/O pattern and costing
On Thu, 2020-05-28 at 20:57 +0200, Tomas Vondra wrote: > Attached is a patch adding CP_SMALL_TLIST to create_agg_plan and > create_groupingsets_plan. Looks good, except one question: Why would aggstrategy ever be MIXED when in create_agg_path? Wouldn't that only happen in create_groupingsets_path? Regards, Jeff Davis
Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Thanks all of you for your feedback. On Fri, May 29, 2020 at 9:04 AM Tom Lane wrote: > Tomas Vondra writes: > > On Wed, May 27, 2020 at 09:58:04PM +0800, Andy Fan wrote: > >> so we need to optimize the cost model for such case, the method is the > >> patch I mentioned above. > > > Making the planner more robust w.r.t. to estimation errors is nice, but > > I wouldn't go as far saying we should optimize for such cases. > > Yeah, it's a serious mistake to try to "optimize" for cases where we have > no data or wrong data. By definition, we don't know what we're doing, > so who's to say whether we've made it better or worse? Actually I think it is a more robust way: the patch can't fix all the impact of bad statistics (that is impossible, I think), but it will make some simple things better and make others no worse. By definition, I think I do know what we are doing here, as I explained in my reply to Tomas above. But it is possible my thinking is wrong. > The other serious error we could be making here is to change things on > the basis of just a few examples. You really need a pretty wide range > of test cases to be sure that you're not making things worse, any time > you're twiddling basic parameters like these. > > I will try more things in this direction, thanks for the suggestion. -- Best Regards Andy Fan
Re: Problem with pg_atomic_compare_exchange_u64 at 32-bit platforms
Hi, On 2020-05-20 10:32:18 +0300, Konstantin Knizhnik wrote: > On 20.05.2020 08:10, Andres Freund wrote: > > On May 19, 2020 8:05:00 PM PDT, Noah Misch wrote: > > > On Tue, May 19, 2020 at 04:07:29PM +0300, Konstantin Knizhnik wrote: > > > > Definition of pg_atomic_compare_exchange_u64 requires alignment of > > > expected > > > > pointer on 8-byte boundary. > > > > > > > > pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr, > > > > uint64 *expected, uint64 newval) > > > > { > > > > #ifndef PG_HAVE_ATOMIC_U64_SIMULATION > > > > AssertPointerAlignment(ptr, 8); > > > > AssertPointerAlignment(expected, 8); > > > > #endif > > > > > > > > > > > > I wonder if there are platforms where such restriction is actually > > > needed. > > > > > > In general, sparc Linux does SIGBUS on unaligned access. Other > > > platforms > > > function but suffer performance penalties. > > Indeed. Cross cacheline atomics are e.g. really expensive on x86. > > Essentially requiring a full blown bus lock iirc. > > > Please notice that here we talk about alignment not of atomic pointer > itself, but of pointer to the expected value. That wasn't particularly clear in your first email... In hindsight I can see that you meant that, but I'm not surprised to not have understood that the on the first read either. > At Intel CMPXCHG instruction read and write expected value throw AX > register. > So alignment of pointer to expected value in pg_atomic_compare_exchange_u64 > is not needed in this case. x86 also supports doing a CMPXCHG crossing a cacheline boundary, it's just terrifyingly expensive... I can imagine this being a problem on a 32bit platforms, but on 64bit platforms, it seems only an insane platform ABI would only have 4 byte alignment on 64bit integers. That'd cause so much unnecessarily split cachlines... That's separate from the ISA actually supporting doing such reads efficiently, of course. But that still leaves the alignment check on expected to be too strong on 32 bit platforms where 64bit alignment is only 4 bytes. I'm doubtful that's it's a good idea to use a comparison value potentially split across cachelines for an atomic operation that's potentially contended. But also, I don't particularly care about 32 bit performance. I think we should probably just drop the second assert down to ALIGNOF_INT64. Would require a new configure stanza, but that's easy enough to do. It's just adding AC_CHECK_ALIGNOF(PG_INT64_TYPE) Doing that change made me think about replace the conditional long long int alignof logic in configure.in, and just unconditionally do the a check for PG_INT64_TYPE, seems nicer. That made me look at Solution.pm due to ALIGNOF_LONG_LONG, and it's interesting: # Every symbol in pg_config.h.in must be accounted for here. Set # to undef if the symbol should not be defined. my %define = ( ... ALIGNOF_LONG_LONG_INT => 8, ... PG_INT64_TYPE => 'long long int', so currently our msvc build actually claims that the alignment requirements are what the code tests. And that's not just since pg_config.h is autogenerated, it was that way before too: /* The alignment requirement of a `long long int'. */ #define ALIGNOF_LONG_LONG_INT 8 /* Define to the name of a signed 64-bit integer type. */ #define PG_INT64_TYPE long long int and has been for a while. > And my question was whether there are some platforms where implementation of > compare-exchange 64-bit primitive > requires stronger alignment of "expected" pointer than one enforced by > original alignment rules for this platform. 
IIRC there are a few older platforms that have single-copy atomicity for 8-byte values, but do *not* have it for ones not aligned to an 8-byte boundary, despite not having such an ABI alignment. It's not impossible to come up with a case where that could matter (if expected pointed into some shared memory that could be read by others), but it's hard to take them seriously. Greetings, Andres Freund
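As a rough sketch only (not from any posted patch) of what "dropping the second assert down to ALIGNOF_INT64" could look like: ALIGNOF_PG_INT64 below is an assumed name for the new pg_config.h symbol that an AC_CHECK_ALIGNOF(PG_INT64_TYPE) configure stanza would provide; the wrapper otherwise keeps the existing shape of the atomics.h code.

    static inline bool
    pg_atomic_compare_exchange_u64(volatile pg_atomic_uint64 *ptr,
                                   uint64 *expected, uint64 newval)
    {
    #ifndef PG_HAVE_ATOMIC_U64_SIMULATION
        /* the atomic variable itself still has to be fully 8-byte aligned */
        AssertPointerAlignment(ptr, 8);
        /* "expected" only needs the platform's int64 ABI alignment (4 on some 32-bit ABIs) */
        AssertPointerAlignment(expected, ALIGNOF_PG_INT64);
    #endif
        return pg_atomic_compare_exchange_u64_impl(ptr, expected, newval);
    }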
Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
At Thu, 28 May 2020 09:08:19 -0400, Dave Cramer wrote in > On Thu, 28 May 2020 at 05:11, Kyotaro Horiguchi > wrote: > > Mmm. It is not the proper way to use physical replication and it's > > totally accidental that that worked (or even it might be a bug). The > > documentation is saying as the follows, as more-or-less the same for > > all versions since 9.4. > > > > https://www.postgresql.org/docs/13/protocol-replication.html ... > > > While the documentation does indeed say that, there is quite a bit of > additional confusion added by: > > and > START_REPLICATION [ SLOT *slot_name* ] [ PHYSICAL ] *XXX/XXX* [ TIMELINE > *tli* ] > > If we already have a physical replication slot according to the startup > message, why do we need to specify it in the START_REPLICATION message? I don't know, but physical replication has worked that way since before replication slots were introduced, so we haven't needed to do so. Physical replication slots are not assumed to be more than a memorandum of the oldest required WAL segment (and oldest xmin). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
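For concreteness (slot name, LSN and timeline are made up), a physical walsender session as driven by a tool like pg_receivewal boils down to roughly this command sequence, sent over a connection whose startup packet only says replication=true; the slot is named solely in the replication commands themselves:

    IDENTIFY_SYSTEM
    CREATE_REPLICATION_SLOT myslot PHYSICAL RESERVE_WAL
    START_REPLICATION SLOT myslot PHYSICAL 0/3000000 TIMELINE 1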
feature idea: use index when checking for NULLs before SET NOT NULL
There's the age-old problem of SET NOT NULL being impossible on large actively used tables, because it needs to lock the table and do a table scan to check if there are any existing NULL values. I currently have a table that's not particularly huge but a scan takes 70 seconds, which causes unacceptable downtime for my entire application. Postgres is not able to use an index when doing this check: https://dba.stackexchange.com/questions/267947 Would it be possible to have Postgres use an index for this check? Given the right index, the check could be instant and the table would only need to be locked for milliseconds. (I'm sure I'm not the first person to think of this, but I couldn't find any other discussion on this list or elsewhere.) Thanks for reading! John
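To make the idea concrete, here is a sketch (table, column and index names invented) of the check such an index could answer; today the final statement ignores any index and scans the whole table while holding an ACCESS EXCLUSIVE lock:

    -- a partial index containing exactly the offending rows
    create index concurrently foos_bar1_null_idx on foos (bar1) where bar1 is null;

    -- with that index, "is there any NULL left?" is answered almost instantly
    select exists (select 1 from foos where bar1 is null);

    -- ... but SET NOT NULL does not consult it and still does a full scan
    alter table foos alter column bar1 set not null;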
Re: speed up unicode normalization quick check
On Fri, May 29, 2020 at 5:59 AM Mark Dilger wrote: > > > On May 21, 2020, at 12:12 AM, John Naylor > > wrote: > > very picky in general. As a test, it also successfully finds a > > function for the OS "words" file, the "D" sets of codepoints, and for > > sets of the first n built-in OIDs, where n > 5. > > Prior to this patch, src/tools/gen_keywordlist.pl is the only script that > uses PerfectHash. Your patch adds a second. I'm not convinced that > modifying the PerfectHash code directly each time a new caller needs > different multipliers is the right way to go. Calling it "each time" with a sample size of two is a bit of a stretch. The first implementation made a reasonable attempt to suit future uses and I simply made it a bit more robust. In the text quoted above you can see I tested some scenarios beyond the current use cases, with key set sizes as low as 6 and as high as 250k. > Could you instead make them arguments such that gen_keywordlist.pl, > generate-unicode_combining_table.pl, and future callers can pass in the > numbers they want? Or is there some advantage to having it this way? That is an implementation detail that callers have no business knowing about. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Incorrect comment in be-secure-openssl.c
On Thu, May 28, 2020 at 05:15:17PM +0200, Daniel Gustafsson wrote: > The comment in be-secure-openssl.c didn't get the memo that the hardcoded DH > parameters moved in 573bd08b99e277026e87bb55ae69c489fab321b8. The attached > updates the wording, keeping it generic enough that we won't need to update it > should the parameters move again. Indeed, looks good to me. I'll go fix, just let's wait and see first if others have any comments. -- Michael signature.asc Description: PGP signature
Re: OpenSSL 3.0.0 compatibility
On Fri, 2020-05-29 at 00:16 +0200, Daniel Gustafsson wrote: > Since OpenSSL is now releasing 3.0.0-alpha versions, I took a look at using it > with postgres to see what awaits us. As it is now shipping in releases (with > GA planned for Q4), users will probably soon start to test against it so I > wanted to be prepared. > > Regarding the deprecations, we can either set preprocessor directives or use > compiler flags to silence the warnings and do nothing (for now), or we could > update to the new API. We probably want to do different things for master vs > back-branches, but as an illustration of what the latter could look like I've > implemented this in 0001. An important question will be: if we convert to functions that are not deprecated, what is the earliest OpenSSL version we can support? Yours, Laurenz Albe
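As one hedged illustration of the "preprocessor directive" route mentioned above (not necessarily what 0001 does): OpenSSL's own OPENSSL_API_COMPAT macro, defined before any of its headers are included, suppresses deprecation warnings for interfaces deprecated after the declared API level, leaving the call sites untouched for now:

    /* claim compatibility with the 1.1.0 API so 3.0.0 stops warning;
     * must be defined before the first OpenSSL header is pulled in */
    #define OPENSSL_API_COMPAT 0x10100000L
    #include <openssl/ssl.h>
    #include <openssl/err.h>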
Re: segmentation fault using currtid and partitioned tables
On Thu, May 28, 2020 at 05:55:59PM -0700, Andres Freund wrote: > And there only for very old servers (< 8.2), according to Hiroshi > Inoue. Found that out post 12 freeze. I was planning to drop them for > 13, but I unfortunately didn't get around to do so :( [... digging ...] Ah, I think I see your point from the code. That's related to the use of RETURNING for ctids. > I guess we could decide to make a freeze exception to remove them now, > although I'm not sure the reasons for doing so are strong enough. Not sure that's a good thing to do after beta1 for 13, but there is an argument for that in 14. FWIW, my company is a huge user of the ODBC driver (perhaps the biggest one?), and we have nothing even close to 8.2. > I concur that it seems unnecessary to make these translatable, even with > the reduced scope from > https://www.postgresql.org/message-id/20200526025959.GE6155%40paquier.xyz Okay, I have switched the patch to do that. Any comments or objections? -- Michael diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c index 4ce8375eab..ac232a238f 100644 --- a/src/backend/utils/adt/tid.c +++ b/src/backend/utils/adt/tid.c @@ -31,6 +31,7 @@ #include "parser/parsetree.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/lsyscache.h" #include "utils/rel.h" #include "utils/snapmgr.h" #include "utils/varlena.h" @@ -349,6 +350,12 @@ currtid_for_view(Relation viewrel, ItemPointer tid) return (Datum) 0; } +/* + * currtid_byreloid + * Return the latest visible tid of a relation's tuple, associated + * to the tid given in input. This is a wrapper for currtid(), and + * uses in input the OID of the relation to look at. + */ Datum currtid_byreloid(PG_FUNCTION_ARGS) { @@ -378,6 +385,11 @@ currtid_byreloid(PG_FUNCTION_ARGS) if (rel->rd_rel->relkind == RELKIND_VIEW) return currtid_for_view(rel, tid); + if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) + elog(ERROR, "cannot look at latest visible tid for relation \"%s.%s\"", + get_namespace_name(RelationGetNamespace(rel)), + RelationGetRelationName(rel)); + ItemPointerCopy(tid, result); snapshot = RegisterSnapshot(GetLatestSnapshot()); @@ -391,6 +403,12 @@ currtid_byreloid(PG_FUNCTION_ARGS) PG_RETURN_ITEMPOINTER(result); } +/* + * currtid_byrelname + * Return the latest visible tid of a relation's tuple, associated + * to the tid given in input. This is a wrapper for currtid2(), and + * uses in input the relation name to look at. 
+ */ Datum currtid_byrelname(PG_FUNCTION_ARGS) { @@ -415,6 +433,11 @@ currtid_byrelname(PG_FUNCTION_ARGS) if (rel->rd_rel->relkind == RELKIND_VIEW) return currtid_for_view(rel, tid); + if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) + elog(ERROR, "cannot look at latest visible tid for relation \"%s.%s\"", + get_namespace_name(RelationGetNamespace(rel)), + RelationGetRelationName(rel)); + result = (ItemPointer) palloc(sizeof(ItemPointerData)); ItemPointerCopy(tid, result); diff --git a/src/test/regress/expected/tid.out b/src/test/regress/expected/tid.out new file mode 100644 index 00..e7e0d74780 --- /dev/null +++ b/src/test/regress/expected/tid.out @@ -0,0 +1,106 @@ +-- tests for functions related to TID handling +CREATE TABLE tid_tab (a int); +-- min() and max() for TIDs +INSERT INTO tid_tab VALUES (1), (2); +SELECT min(ctid) FROM tid_tab; + min +--- + (0,1) +(1 row) + +SELECT max(ctid) FROM tid_tab; + max +--- + (0,2) +(1 row) + +TRUNCATE tid_tab; +-- Tests for currtid() and currtid2() with various relation kinds +-- Materialized view +CREATE MATERIALIZED VIEW tid_matview AS SELECT a FROM tid_tab; +SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- fails +ERROR: tid (0, 1) is not valid for relation "tid_matview" +SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- fails +ERROR: tid (0, 1) is not valid for relation "tid_matview" +INSERT INTO tid_tab VALUES (1); +REFRESH MATERIALIZED VIEW tid_matview; +SELECT currtid('tid_matview'::regclass::oid, '(0,1)'::tid); -- ok + currtid +- + (0,1) +(1 row) + +SELECT currtid2('tid_matview'::text, '(0,1)'::tid); -- ok + currtid2 +-- + (0,1) +(1 row) + +DROP MATERIALIZED VIEW tid_matview; +TRUNCATE tid_tab; +-- Sequence +CREATE SEQUENCE tid_seq; +SELECT currtid('tid_seq'::regclass::oid, '(0,1)'::tid); -- ok + currtid +- + (0,1) +(1 row) + +SELECT currtid2('tid_seq'::text, '(0,1)'::tid); -- ok + currtid2 +-- + (0,1) +(1 row) + +DROP SEQUENCE tid_seq; +-- Index, fails with incorrect relation type +CREATE INDEX tid_ind ON tid_tab(a); +SELECT currtid('tid_ind'::regclass::oid, '(0,1)'::tid); -- fails +ERROR: "tid_ind" is an index +SELECT currtid2('tid_ind'::text, '(0,1)'::tid); -- fails +ERROR: "tid_ind" is an index +DROP INDEX tid_ind; +-- Partitioned table, no storage +CREATE TABLE tid_part (a int) PARTITION BY RANGE (a); +SELECT currtid('tid_part'::regclass::oid, '(0,1)'::tid); -- fails +ERROR: cannot look at latest visible
Re: feature idea: use index when checking for NULLs before SET NOT NULL
Hello, Correct index lookup is a difficult task. I tried to implement this previously... But the answer on SO is a bit incomplete for recent PostgreSQL releases. A seqscan is not the only possible way to set NOT NULL in pg12+. My patch was committed ( https://commitfest.postgresql.org/22/1389/ ) and now it's possible to do it this way:

alter table foos add constraint foos_not_null check (bar1 is not null) not valid; -- short-time exclusive lock
alter table foos validate constraint foos_not_null; -- still seqscans the entire table, but without an exclusive lock

And then another short lock:

alter table foos alter column bar1 set not null; -- in pg12+ this sees the validated constraint and skips the table scan
alter table foos drop constraint foos_not_null;

regards, Sergei