Re: logical replication empty transactions
On Sat, Aug 7, 2021 at 12:01 AM Ajin Cherian wrote: > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila wrote: > > > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > > > > Let's first split the patch for prepared and non-prepared cases as > > that will help to focus on each of them separately. > > As a first shot, I have split the patch into prepared and non-prepared cases, I have reviewed the v12* split patch set. Apply / build / test was all OK. Below are my code review comments (mostly cosmetic). // Comments for v12-0001 = 1. Patch comment => This comment as-is might have been OK before the 2PC code was committed, but now that 2PC is part of HEAD, perhaps this comment needs to be expanded just to say this patch is ONLY for fixing empty transactions for the cases of non-"streaming" and non-"two_phase", and the other kinds will be tackled separately. -- 2. src/backend/replication/pgoutput/pgoutput.c - PGOutputTxnData comment +/* + * Maintain a per-transaction level variable to track whether the + * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE + * is only sent when the first change in a transaction is processed. + * This makes it possible to skip transactions that are empty. + */ => Maybe this is true for the combined v12-0001/v12-0002 case, but just for the v12-0001 patch I think it is not right to imply that skipping of the BEGIN PREPARE is possible, because IIUC it isn't implemented in *this* patch. -- 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn whitespace + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); => Misaligned indentation? -- 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change brackets + /* + * Output BEGIN if we haven't yet, unless streaming. + */ + if (!in_streaming && !txndata->sent_begin_txn) + { + pgoutput_begin(ctx, txn); + } => The brackets are not needed for an if with a single statement. -- 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_truncate brackets/comment + /* + * output BEGIN if we haven't yet, + * while streaming no need to send BEGIN / BEGIN PREPARE. + */ + if (!in_streaming && !txndata->sent_begin_txn) + { + pgoutput_begin(ctx, txn); + } 5a. => Same as review comment 4. The brackets are not needed for an if with a single statement. 5b. => Notice this code is the same as cited in review comment 4, so probably the code comment should be consistent/same also? -- 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_message brackets + Assert(txndata); + if (!txndata->sent_begin_txn) + { + pgoutput_begin(ctx, txn); + } => The brackets are not needed for an if with a single statement. -- 7. typedefs.list => The structure PGOutputTxnData was added in v12-0001, so typedefs.list probably should also be updated. // Comments for v12-0002 = 8. Patch comment This patch addresses the above problem by postponing the BEGIN / BEGIN PREPARE messages until the first change is encountered. If (when processing a COMMIT / PREPARE message) we find there had been no other change for that transaction, then do not send the COMMIT / PREPARE message. This means that pgoutput will skip BEGIN / COMMIT or BEGIN PREPARE / PREPARE messages for transactions that are empty. pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages for transactions that are empty. 8a. => I'm not sure this comment is 100% correct for this specific patch. The whole BEGIN/COMMIT case was already handled by the v12-0001 patch, right?
So really this comment should only mention BEGIN PREPARE and COMMIT PREPARED, I thought. 8b. => I think there should also be some mention that this patch is not handling the "streaming" case of empty transactions at all. -- 9. src/backend/replication/logical/proto.c - protocol version @@ -248,8 +250,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn, pq_sendbyte(out, flags); /* send fields */ + pq_sendint64(out, prepare_end_lsn); pq_sendint64(out, commit_lsn); pq_sendint64(out, txn->end_lsn); + pq_sendint64(out, prepare_time); pq_sendint64(out, txn->xact_time.commit_time); pq_sendint32(out, txn->xid); => I agree with a previous feedback comment from Amit - probably there are some protocol version requirements/implications here because the message format has been changed in logicalrep_write_commit_prepared and logicalrep_read_commit_prepared. e.g. Does this code need to be cognisant of the version and behave differently accordingly? -- 10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_prepare flag moved? + Assert(txndata); OutputPluginPrepareWrite(ctx, !send_replication_origin); logicalrep_write_begin_prepare(ctx->out, txn); + txndata->sent_begin_txn = true; send_repl_origin(ctx, txn->origin_id, txn->origin_lsn, send_replication_origin);
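[Editor's note] For readers following this review without the patch at hand, here is a minimal sketch of the deferred-BEGIN pattern the quoted hunks implement. It is reconstructed from the fragments above, not the verbatim patch; the struct layout and the helper body are assumptions:

    /* Per-transaction state, hung off txn->output_plugin_private. */
    typedef struct PGOutputTxnData
    {
        bool        sent_begin_txn;     /* BEGIN sent for this txn yet? */
    } PGOutputTxnData;

    static void
    pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
    {
        PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;

        /* Emit BEGIN lazily, only once the first change arrives. */
        OutputPluginPrepareWrite(ctx, true);
        logicalrep_write_begin(ctx->out, txn);
        OutputPluginWrite(ctx, true);
        txndata->sent_begin_txn = true;
    }

A commit callback that finds sent_begin_txn still false can then skip the COMMIT message entirely, which is what makes empty transactions disappear from the stream.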
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
> I think that it's crystal clear what I meant in the email of July 30. > I meant: it's not okay that you're simply ignoring the RMT. You > hadn't > even made a token effort at that point. For example you didn't give > the proposed fix a cursory 15 minute review, just so we had some very > rough idea of where things stand. You still haven't. How do you know I didn't spend 15 minutes looking at the patch and the whole email thread? I surely did and it was more than 15 minutes, but not enough to give a meaningful comment. If you can do it in 15 minutes, great for you, I cannot. The meaning of your email of July 30 was crystal clear, yes. It means you'd revert the patch if I didn't resolve the issue. This is literally what it says. If you meant to say "It's not okay that you're simply ignoring the RMT. You hadn't even made a token effort at that point." it might have been helpful if you said that, instead of having me guess if there was a hidden meaning in your email. Besides, I have not ignored the RMT. I don't know why you keep insisting on something that is simply not true. > My understanding of what you're taking issue with (perhaps a flawed > understanding) is that you think that you have been treated unfairly > or arbitrarily in general. That's why I brought up the email of July > 30 yesterday. So my point was: no, you haven't been treated unfairly. Yes, this is a flawed understanding. I'm sorry you came to that understanding; I thought my emails were pretty clear as to what I was objecting to. > If you only take issue with the specific tone and tenor of my email > from Friday (the email that specified a deadline), and not the > content > itself, then maybe the timeline and the wider context are not so > important. > > I am still unsure about whether your concern is limited to the tone > of > the email from Friday, or if you also take exception to the content > of > that email (and the wider context). At the risk of repeating myself, my concern is *only* the rude and disrespectful way of addressing me in the third person while talking to me directly. Again, I thought I made that clear in my first email about the topic and every one that followed. > Perhaps the tone of my email from Friday was unhelpful. Even still, I > am surprised that you seem to think that it was totally outrageous -- > especially given the context. It was the first email that you The context never makes a derogatory communication okay, at least not in my opinion. > responded to *at all* on this thread, with the exception of your > original terse response. I am not well practised in communicating > with > a committer that just doesn't engage with the RMT at all. This is all > new to me. I admit that I found it awkward to write the email for my > own reasons. I was *never* asked for *any* response in *any* email, save the original technical discussion, which I did answer by telling people that I'm lacking time but will eventually get to it. Just to be precise, your email from July 30 told me and everyone how this will be handled. A reasonable procedure, albeit not one we'd like to see happen. But why should I answer, and what? It's not that you bring this up as a discussion point, but as a fact. > I brought up flexibility to point out that this could have been > avoided. If you needed more time, why didn't you simply ask for it? The first conversation that brought up the time issue was your email that started this thread. There you set a deadline which I understand and accept.
But then I never said a word against it, so the question remains: why accuse me of something I never did? > Again, the scope of what you're complaining about was (and still is) > unclear to me. I'm sorry, but I have no idea how to explain it more clearly. I'm not asking for any favor or special treatment, I just ask to be treated like a person. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: Reduce the number of special cases to build contrib modules on windows
On Fri, 30 Jul 2021 at 15:05, David Rowley wrote: > Does anyone have any thoughts on where we should draw the line on > parsing Makefiles? I'm a little worried that I'm adding parsing just > for exactly how the Makefiles are today and that it could easily be > broken if something is adjusted later. I'm not all that skilled with > make, so I'm struggling to judge this for myself. After thinking about this some more: since we're never going to make these Perl scripts do everything that make can do, if the parsing being added seems reasonable and works for what we have today, then I don't think there is much reason not to just go with this. The v10 patch I attached previously output almost identical *.vcxproj files. The only change was in libpq_pipeline.vcxproj, where the order of the AdditionalDependencies changed to have ws2_32.lib first rather than somewhere in the middle. I've now pushed the final patch in this series. Thank you to everyone who looked at one or more of these patches. David
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand wrote: > > Please find enclosed a patch proposal to: > > * Avoid the failed assertion on current master and generate the error message > instead (should the code reach that stage). > * Reset the toast_hash in case of relation rewrite with toast (so that the > logical decoding in the above repro is working). > I think instead of resetting toast_hash for this case, why don't we set 'relrewrite' for toast tables as well during rewrite? If we do that then we will simply skip assembling toast chunks for the toast table. In make_new_heap(), we are calling NewHeapCreateToastTable() to create the toast table, where we can pass additional information (probably 'toastid'), if required, to set 'relrewrite'. Additionally, let's add a test case if possible for this. BTW, I see this as an Open Item for PG-14 [1], which seems wrong to me as this is a bug from previous versions. I am not sure who added it, but do you see any reason to consider this an open item for PG-14? [1] - https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items -- With Regards, Amit Kapila.
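[Editor's note] To make the suggestion concrete, the call-site change might look roughly like this. This is a sketch only: the fourth argument is an assumed addition, the existing NewHeapCreateToastTable() signature takes just the first three.

    /* In make_new_heap(), hypothetically passing the old heap's toast
     * table OID down so the transient toast table can be stamped with
     * pg_class.relrewrite, letting decoding skip its chunks: */
    NewHeapCreateToastTable(OIDNewHeap, reloptions, lockmode,
                            OldHeap->rd_rel->reltoastrelid);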
Re: Tab completion for CREATE SCHEMAAUTHORIZATION
Hello Dagfinn, I had a look at your patch and below are my review comments. Please correct me if I am missing something. 1. For me the patch does not apply cleanly; I get trailing-whitespace errors. surajkhamkar@localhost:postgres$ git apply v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:25: trailing whitespace. #define Query_for_list_of_schema_roles \ v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:26: trailing whitespace. Query_for_list_of_roles \ v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:30: trailing whitespace. #define Query_for_list_of_grant_roles \ v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:31: trailing whitespace. Query_for_list_of_schema_roles \ v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:32: trailing whitespace. " UNION ALL SELECT 'PUBLIC'"\ error: patch failed: src/bin/psql/tab-complete.c:758 error: src/bin/psql/tab-complete.c: patch does not apply 2. We can remove the space before \ as shown in the change below: +" UNION ALL SELECT 'PUBLIC'" \ Should be: +" UNION ALL SELECT 'PUBLIC' "\ 3. role_specification has CURRENT_ROLE, CURRENT_USER and SESSION_USER, but the current changes are missing CURRENT_ROLE. postgres@53724=#CREATE SCHEMA AUTHORIZATION CURRENT_USER pg_execute_server_program pg_read_all_data pg_read_all_stats pg_signal_backend pg_write_all_data SESSION_USER pg_database_owner pg_monitor pg_read_all_settings pg_read_server_files pg_stat_scan_tables pg_write_server_files surajkhamkar 4. I'm not sure about this, but do we need to enable tab completion for IF NOT EXISTS? 5. I think we are not handling IF NOT EXISTS, which is why tab completion for AUTHORIZATION doesn't work after it. 6. While we are here, we can also enable missing tab completion for ALTER SCHEMA. After OWNER TO we should also get CURRENT_ROLE, CURRENT_USER and SESSION_USER. postgres@53724=#ALTER SCHEMA sch owner to pg_database_owner pg_monitor pg_read_all_settings pg_read_server_files pg_stat_scan_tables pg_write_server_files pg_execute_server_program pg_read_all_data pg_read_all_stats pg_signal_backend pg_write_all_data surajkhamkar 7. Similarly, as we can drop multiple schemas simultaneously, we should enable tab completion for a comma along with CASCADE and RESTRICT: postgres@53724=#DROP SCHEMA sch CASCADE RESTRICT Thanks. On Sun, Aug 8, 2021 at 2:39 AM Dagfinn Ilmari Mannsåker wrote: > ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: > > ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: > >> Hi Hackers, > >> > >> I just noticed there's no tab completion for CREATE SCHEMA > >> AUTHORIZATION, nor for anything after CREATE SCHEMA . > >> > >> Please find attached a patch that adds this. > > > > Added to the 2021-09 commit fest: > https://commitfest.postgresql.org/34/3252/ > > Here's an updated version that also reduces the duplication between the > various role list queries. > > - ilmari > >
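[Editor's note] For what it's worth, review points 3 and 6 amount to completion rules of roughly this shape. This is a sketch following tab-complete.c's existing Matches()/COMPLETE_WITH_QUERY() conventions; the v3 patch later in this thread factors the UNIONs into a Query_for_list_of_owner_roles macro instead:

    /* Sketch: offer role specifications, including CURRENT_ROLE (new in
     * PG 14), after CREATE SCHEMA AUTHORIZATION. */
    else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION"))
        COMPLETE_WITH_QUERY(Query_for_list_of_roles
                            " UNION ALL SELECT 'CURRENT_ROLE'"
                            " UNION ALL SELECT 'CURRENT_USER'"
                            " UNION ALL SELECT 'SESSION_USER'");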
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
On Mon, Aug 9, 2021 at 2:07 PM Amit Kapila wrote: > > On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand wrote: > > > > Please find enclosed a patch proposal to: > > > > * Avoid the failed assertion on current master and generate the error > > message instead (should the code reach that stage). > > * Reset the toast_hash in case of relation rewrite with toast (so that the > > logical decoding in the above repro is working). > > > > I think instead of resetting toast_hash for this case, why don't we set > 'relrewrite' for toast tables as well during rewrite? If we do that > then we will simply skip assembling toast chunks for the toast table. > In make_new_heap(), we are calling NewHeapCreateToastTable() to create > the toast table, where we can pass additional information (probably > 'toastid'), if required, to set 'relrewrite'. Additionally, let's add a > test case if possible for this. I agree with Amit that setting relrewrite for the toast relation as well is better, as we can then simply avoid processing the toast tuples in such cases. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Column Filtering in Logical Replication
On Mon, Aug 9, 2021 at 1:36 AM Rahila Syed wrote: > >> Having said that, I'm not sure I agree with this design decision; what I >> think this is doing is hiding from the user the fact that they are >> publishing columns that they don't want to publish. I think as a user I >> would rather get an error in that case: > > >> ERROR: invalid column list in published set >> DETAIL: The set of published commands does not include all the replica >> identity columns. > > >> or something like that. Avoid possible nasty surprises of security- >> leaking nature. > > > Ok, Thank you for your opinion. I agree that giving an explicit error in this > case will be safer. > +1 for an explicit error in this case. Can you please explain why you have the restriction for including replica identity columns and do we want to put a similar restriction for the primary key? As far as I understand, if we allow default values on subscribers for replica identity, then probably updates, deletes won't work as they need to use replica identity (or PK) to search the required tuple. If so, shouldn't we add this restriction only when a publication has been defined for one of these (Update, Delete) actions? Another point is what if someone drops the column used in one of the publications? Do we want to drop the entire relation from publication or just remove the column filter or something else? Do we want to consider that the columns specified in the filter must not have NOT NULL constraint? Because, otherwise, the subscriber will error out inserting such rows? Minor comments: pq_sendbyte(out, flags); - /* attribute name */ pq_sendstring(out, NameStr(att->attname)); @@ -953,6 +1000,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel) /* attribute mode */ pq_sendint32(out, att->atttypmod); + } bms_free(idattrs); diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c index c37e2a7e29..d7a7b00841 100644 --- a/src/backend/replication/logical/relation.c +++ b/src/backend/replication/logical/relation.c @@ -354,7 +354,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode) attnum = logicalrep_rel_att_by_name(remoterel, NameStr(attr->attname)); - entry->attrmap->attnums[i] = attnum; There are quite a few places in the patch that contain spurious line additions or removals. -- With Regards, Amit Kapila.
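[Editor's note] To illustrate the replica-identity concern with a hypothetical example (the column-list syntax here is the one proposed by the patch under review; all names are illustrative):

    -- Hypothetical: publish only "payload", filtering out the PK column.
    CREATE TABLE t (id int PRIMARY KEY, payload text);
    CREATE PUBLICATION pub FOR TABLE t (payload);
    -- Replicated UPDATEs/DELETEs need the replica identity (here "id")
    -- to find the target row on the subscriber, but "id" is filtered
    -- out; hence the idea of erroring out, or of applying the check
    -- only to publications that publish update/delete actions.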
RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Monday, August 9, 2021 11:10 AM Amit Kapila wrote: > > On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com > wrote: > > > > On Sat, Aug 7, 2021 1:36 PM Amit Kapila wrote: > > > On Fri, Aug 6, 2021 at 9:57 PM Japin Li wrote: > > > > > > Do you mean to say that do it for both Add and Drop or just for Drop? > > > Actually, doing it both will make the behavior consistent but doing > > > it just for Drop might be preferable by some users. I think it is > > > better to be consistent here but I am fine if you and others feel it > > > is better to refresh all publications only in Drop case. > > > > Personally, I also think it will be better to make the behavior consistent. > > Attach the new version patch make both ADD and DROP behave the same as > > SET PUBLICATION which refresh all the publications. > > > > There is a bug reported on pgsql-bugs [1] which seems to be happening due to > the same reason. Can you please test that the other bug is also fixed with > your > patch? > > [1] - > https://www.postgresql.org/message-id/17132-6a702189086c6243%40postgres > ql.org Thanks for the reminder, I have checked and tested the reported bug. The reported bug is that when a publication is dropped and then re-added on the subscriber side, the tables in the publication aren't synced. After applying the patch here, the tables will be synced as expected when re-added (behaves the same as SET PUBLICATION). Best regards, Hou zj
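[Editor's note] For the archives, the reported scenario is roughly the following (a sketch, assuming a subscription "sub" and publications "pub1" and "pub2"):

    -- Sketch of the reported case:
    ALTER SUBSCRIPTION sub DROP PUBLICATION pub2;
    ALTER SUBSCRIPTION sub ADD PUBLICATION pub2;
    -- Before the fix, tables belonging only to pub2 were not (re)synced,
    -- because the refresh considered just the publication being added;
    -- with the patch, ADD/DROP refresh all publications, matching the
    -- behavior of ALTER SUBSCRIPTION ... SET PUBLICATION.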
Re: Column Filtering in Logical Replication
On Mon, Aug 9, 2021 at 3:59 PM Amit Kapila wrote: > > On Mon, Aug 9, 2021 at 1:36 AM Rahila Syed wrote: > > > >> Having said that, I'm not sure I agree with this design decision; what I > >> think this is doing is hiding from the user the fact that they are > >> publishing columns that they don't want to publish. I think as a user I > >> would rather get an error in that case: > > > > > >> ERROR: invalid column list in published set > >> DETAIL: The set of published commands does not include all the replica > >> identity columns. > > > > > >> or something like that. Avoid possible nasty surprises of security- > >> leaking nature. > > > > > > Ok, Thank you for your opinion. I agree that giving an explicit error in > > this case will be safer. > > > > +1 for an explicit error in this case. > > Can you please explain why you have the restriction for including > replica identity columns and do we want to put a similar restriction > for the primary key? As far as I understand, if we allow default > values on subscribers for replica identity, then probably updates, > deletes won't work as they need to use replica identity (or PK) to > search the required tuple. If so, shouldn't we add this restriction > only when a publication has been defined for one of these (Update, > Delete) actions? > > Another point is what if someone drops the column used in one of the > publications? Do we want to drop the entire relation from publication > or just remove the column filter or something else? > > Do we want to consider that the columns specified in the filter must > not have NOT NULL constraint? Because, otherwise, the subscriber will > error out inserting such rows? > I noticed that other databases provide this feature [1] and they allow users to specify "Columns that are included in Filter" or specify "All columns to be included in filter except for a subset of columns". I am not sure if we want to provide both ways in the first version, but at least we should consider it as a future extensibility requirement and try to choose syntax accordingly. [1] - https://docs.oracle.com/en/cloud/paas/goldengate-cloud/gwuad/selecting-columns.html#GUID-9A851C8B-48F7-43DF-8D98-D086BE069E20 -- With Regards, Amit Kapila.
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand wrote: > > Hi Amit, > > On 8/9/21 10:37 AM, Amit Kapila wrote: > > On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand > > wrote: > >> Please find enclosed a patch proposal to: > >> > >> * Avoid the failed assertion on current master and generate the error > >> message instead (should the code reach that stage). > >> * Reset the toast_hash in case of relation rewrite with toast (so that the > >> logical decoding in the above repro is working). > >> > > I think instead of resetting toast_hash for this case, why don't we set > > 'relrewrite' for toast tables as well during rewrite? If we do that > > then we will simply skip assembling toast chunks for the toast table. > > Thanks for looking at it! > > I do agree, that would be even better than the current patch approach: > I'll work on it. > > > In make_new_heap(), we are calling NewHeapCreateToastTable() to create > > the toast table, where we can pass additional information (probably > > 'toastid'), if required, to set 'relrewrite'. Additionally, let's add a > > test case if possible for this. > + 1 for the test case, it will be added in the next version of the patch. > Thanks, please see if you can prepare patches for the back-branches as well. -- With Regards, Amit Kapila.
Re: Advanced Questions about PostgreSQL
On 8/9/21 1:33 AM, David Rowley wrote: > On Mon, 9 Aug 2021 at 17:14, A Z wrote: >> >> 2) How may I get PostgreSQL to output the create table statement(s) for one >> or more tables inside one database, without issuing instructions via the >> command line, but only inside a database login, as a query or pl/sql? Watch this space. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: row filtering for logical replication
On Tue, Aug 3, 2021 at 4:25 PM Amit Kapila wrote: > > On Tue, Jul 27, 2021 at 9:56 AM Dilip Kumar wrote: > > > Yeah, that's a big problem, seems like the expression evaluation > > machinery directly going and detoasting the externally stored data > > using some random snapshot. Ideally, in walsender we can never > > attempt to detoast the data because there is no guarantee that those > > data are preserved. Somehow before going to the expression evaluation > > machinery, I think we will have to deform that tuple and need to do > > something for the externally stored data otherwise it will be very > > difficult to control that inside the expression evaluation. > > > > True, I think it would be possible after we fix the issue reported in > another thread [1] where we will log the key values as part of > old_tuple_key for toast tuples even if they are not changed. We can > have a restriction that in the WHERE clause the user can specify only > key columns for Updates, similar to Deletes. Then, we have the data > required for filter columns: basically if the toasted key values are > changed, then they will be anyway part of the old and new tuple, and if > they are not changed then they will be part of the old tuple. Right. I have > not checked the implementation part of it but theoretically, it seems > possible. Yeah, it would be possible because at least after fixing [1] we would have the required column data. The only thing I am worried about is that, while applying the filter on the new tuple, the toasted unchanged key data will not be a part of the new tuple. So we can not directly call the expression evaluation machinery; basically, somehow we need to deform the new tuple and then replace the data from the old tuple before passing it to expression evaluation. Anyway, this is an implementation detail, so we can look into it while implementing. If my understanding is correct then it becomes necessary to > solve the other bug [1] to solve this part of the problem for this > patch. Right. The other possibility is to disallow columns (datatypes) that > can lead to toasted data (at least for Updates) which doesn't sound > like a good idea to me. Yeah, that will be a big limitation; then we won't be able to allow expressions on any varlena types. Do you have any other ideas for this problem? As of now, no better idea to suggest. [1] - https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229%40OS0PR01MB6113.jpnprd01.prod.outlook.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
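[Editor's note] A sketch of the problematic case, using the row-filter syntax proposed in this thread (table and column names are hypothetical):

    CREATE TABLE t (id int PRIMARY KEY, big text);
    CREATE PUBLICATION pub FOR TABLE t WHERE (big <> '');
    -- Suppose "big" holds a large, out-of-line (toasted) value:
    UPDATE t SET id = id + 1;   -- "big" itself is unchanged
    -- The new tuple then carries only a toast pointer for "big", so
    -- evaluating the WHERE filter would require detoasting inside the
    -- walsender, which is exactly what cannot be done safely.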
Extension updates and pg_upgrade
Our minor releases coming out this week will create a script when pg_upgrade finishes that contains ALTER EXTENSION ... UPDATE commands for extensions that show updates in pg_available_extensions. However, I am unclear if all extensions that have updates are reported in pg_available_extensions or support ALTER EXTENSION ... UPDATE, e.g., PostGIS. Do the new pg_upgrade docs need to be more vague about this? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
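[Editor's note] For context, the generated script (update_extensions.sql, assuming the name hasn't changed) consists of commands of this shape, one batch per database:

    \connect mydb
    ALTER EXTENSION "hstore" UPDATE;

The question above is whether every updatable extension actually appears there, or supports being updated this way at all.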
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Wed, Aug 4, 2021 at 10:03 PM Greg Nancarrow wrote: > In setting up the snapshot for the execution state used in command > execution, GetTransactionSnapshot() is called and (possibly a copy of) > the returned snapshot is pushed as the ActiveSnapshot. I mean, there are LOTS of PushActiveSnapshot() calls in the code. A lot of those specifically say PushActiveSnapshot(GetTransactionSnapshot()) but others are pushing snapshots obtained from various other places. I really don't think it can possibly be correct in general to assume that the snapshot on top of the active snapshot stack is the same as the transaction snapshot. > So why (current Postgres code, no patches applied) in setting up for > parallel-worker execution (in InitializeParallelDSM) does the Postgres > code then acquire ANOTHER TransactionSnapshot (by calling > GetTransactionSnapshot(), which could return CurrentSnapshot or a new > snapshot) and serialize that, as well as serializing what the > ActiveSnapshot points to, and then restore those in the workers as two > separate snapshots? Is it a mistake? Or if intentional and correct, > how so? Well, I already agreed that in cases where GetTransactionSnapshot() will acquire an altogether new snapshot, we shouldn't call it, but beyond that I don't see why you think this is wrong. I mean, suppose we only call GetTransactionSnapshot() at parallel worker startup when IsolationUsesXactSnapshot(). In that case, CurrentSnapshot is a durable, transaction-lifetime piece of backend-local state that can affect the results of future calls to GetTransactionSnapshot(), and therefore seems to need to be replicated to workers. Separately, regardless of IsolationUsesXactSnapshot(), the active snapshot is accessible via calls to GetActiveSnapshot() and therefore should also be replicated to workers. Now, I don't know of any necessary reason why those two things need to be the same, because again, there are PushActiveSnapshot() calls all over the place, and they're not all pushing the transaction snapshot. So therefore it makes sense to me that we need to replicate those two things separately - the active snapshot in the leader becomes the active snapshot in the workers and the transaction snapshot in the leader becomes the transaction snapshot in the worker. Now there is evidently something wrong with this line of reasoning because the code is buggy and my proposed fix doesn't work, but I don't know what is wrong with it. You seem to think that it's crazy that we try to replicate the active snapshot to the active snapshot and the transaction snapshot to the transaction snapshot, but that did, and still does, seem really sane to me. The only part that now seems clearly wrong to me is that the !IsolationUsesXactSnapshot() case takes an *extra* snapshot, but since fixing that didn't fix the bug, there's evidently more to the problem than that. -- Robert Haas EDB: http://www.enterprisedb.com
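[Editor's note] For readers without the source in front of them, the two serializations under discussion look roughly like this in InitializeParallelDSM (a simplified excerpt-style sketch; DSM bookkeeping and the estimate phase are omitted):

    /* Leader side: capture both snapshots and ship them separately. */
    Snapshot    transaction_snapshot = GetTransactionSnapshot();
    Snapshot    active_snapshot = GetActiveSnapshot();

    SerializeSnapshot(transaction_snapshot, tsnapspace);  /* restored as the
                                                           * worker's transaction
                                                           * snapshot */
    SerializeSnapshot(active_snapshot, asnapspace);       /* restored as the
                                                           * worker's active
                                                           * snapshot */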
Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS
On Thu, Aug 5, 2021 at 8:02 PM Tom Lane wrote: > I think doing nothing is fine. Given the lack of complaints, we're > more likely to break something than fix anything useful. +1. -- Robert Haas EDB: http://www.enterprisedb.com
Re: when the startup process doesn't (logging startup delays)
I modified reset_startup_progress_timeout() to take a second parameter which indicates whether it is called for initialization or for resetting. Without this parameter there is a problem if we call init_startup_progress() more than once with no call to ereport_startup_progress() in between, as the code related to disabling the timer has been removed. reset_startup_progress_timeout(TimestampTz now, bool is_init) { if (is_init) next_timeout = now + interval; else { next_timeout = scheduled_startup_progress_timeout + interval; if (next_timeout < now) { // Either the timeout was processed so late that we missed an entire cycle, // or the system clock was set backwards. next_timeout = now + interval; } enable_timeout_at(next_timeout); scheduled_startup_progress_timeout = next_timeout; } } I have incorporated this in the attached patch. Please correct me if I am wrong. > This makes sense, but I think I'd like to have all the functions in > this patch use names_like_this() rather than NamesLikeThis(). I have changed all the function names accordingly. But I would like to know why it should be names_like_this(), as there are many functions with the format NamesLikeThis(). I would like to understand when to use what, just to incorporate it in future patches. > Hmm, yeah, that seems good, but given this change, maybe the variables > need a little renaming. Like change last_startup_progress_timeout to > scheduled_startup_progress_timeout, perhaps. Right. Changed the variable name. > I guess this one needs to return a Boolean, actually. Yes. > reset_startup_progress_timeout(TimestampTz now) > { > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire > cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > last_startup_progress_timeout = next_timeout; > } As per the above logic, I would like to discuss 2 cases. Case-1: The first scheduled timeout is at 1 sec, but the timeout occurred at 1.5 sec, so the log message prints at 1.5 sec. The next timeout is scheduled at 2 sec (scheduled_startup_progress_timeout + interval). The condition (next_timeout < now) evaluates to false and everything goes smoothly. Case-2: The first scheduled timeout is at 1 sec, but the timeout occurred at 2.5 sec, so the log message prints at 2.5 sec. Now the next timeout is scheduled at 2 sec (scheduled_startup_progress_timeout + interval). So the condition (next_timeout < now) will be true and the next_timeout will get reset to 3.5 sec (2.5 + 1), and it continues from there. Is this behaviour OK, or should we set the next_timeout to 3 sec? Please share your thoughts on this. Thanks & Regards, Nitin Jadhav On Thu, Aug 5, 2021 at 7:49 PM Robert Haas wrote: > > On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav > wrote: > > This seems a little confusing. As we are setting > > last_startup_progress_timeout = now and then calling > > reset_startup_progress_timeout() which will calculate the next_time > > based on the value of last_startup_progress_timeout initially and > > checks whether next_timeout is less than now. It doesn't make sense to > > me. I feel we should calculate the next_timeout based on the time that > > it is supposed to fire next time. So we should set > > last_startup_progress_timeout = next_timeout after enabling the timer.
> > Also I feel with the 2 functions mentioned above, we also need > > InitStartupProgress() which sets the initial value to > > startupProcessOpStartTime which is used to calculate the time > > difference and display in the logs. I could see those functions like > > below. > > > > InitStartupProgress(void) > > { > > startupProcessOpStartTime = GetCurrentTimestamp(); > > ResetStartupProgressTimeout(startupProcessOpStartTime); > > } > > This makes sense, but I think I'd like to have all the functions in > this patch use names_like_this() rather than NamesLikeThis(). > > > reset_startup_progress_timeout(TimestampTz now) > > { > > next_timeout = last_startup_progress_timeout + interval; > > if (next_timeout < now) > > { > > // Either the timeout was processed so late that we missed an entire > > cycle, > > // or the system clock was set backwards. > > next_timeout = now + interval; > > } > > enable_timeout_at(next_timeout); > > last_startup_progress_timeout = next_timeout; > > } > > Hmm, yeah, that seems good, but given this change, maybe the variables > need a little renaming. Like change last_startup_progress_timeout to > scheduled_startup_progress_timeout, perhaps. > > > startup_progress_timeout_has_expired() > > { > > if (!startup_progress_timer_expired) > > return; > > now = GetCurrentTimestamp(); > > // compute timestamp difference based on startupProcessOpStartTime > > reset_startup
Re: [PATCH] OpenSSL: Mark underlying BIO with the appropriate type flags
> On 6 Aug 2021, at 12:16, Itamar Gafni wrote: > Prior to OpenSSL version 1.1.0, the BIO methods object would be copied > directly from the existing socket type and then its read/write functions > would be replaced. > With 1.1.0 and up, the object is created from scratch and then all its > methods are initialized to be the ones of the socket type, except read/write > which are custom. > In this newer way, a new type is given to it by calling "BIO_get_new_index", > but the related type flags aren't added. According to the documentation (I haven't tested it yet but will) I think you are right that the type should be set with the appropriate BIO_TYPE_ flags. For OpenSSL 1.0.1 and 1.0.2, wouldn't we need to set the .type with a randomly chosen index OR'ed with BIO_TYPE_DESCRIPTOR and BIO_TYPE_SOURCE_SINK as well? -- Daniel Gustafsson https://vmware.com/
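[Editor's note] Based on the OpenSSL 1.1.0+ API described above, the fix would presumably look something like this sketch. It mirrors the shape of PostgreSQL's existing custom-BIO setup (my_sock_read/my_sock_write are the existing custom callbacks); only the two flag ORs are the new part, and this is not the final patch:

    static BIO_METHOD *my_bio_methods = NULL;

    static BIO_METHOD *
    my_BIO_s_socket(void)
    {
        if (my_bio_methods == NULL)
        {
            /* Allocate a fresh BIO type index... */
            int     my_bio_index = BIO_get_new_index();

            if (my_bio_index == -1)
                return NULL;
            /* ...and mark it as a socket-like source/sink BIO. */
            my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
            my_bio_methods = BIO_meth_new(my_bio_index, "socket");
            if (my_bio_methods == NULL)
                return NULL;
            BIO_meth_set_write(my_bio_methods, my_sock_write);
            BIO_meth_set_read(my_bio_methods, my_sock_read);
            /* remaining methods copied from BIO_s_socket(), as before */
        }
        return my_bio_methods;
    }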
Re: Added schema level support for publication.
> On Aug 6, 2021, at 1:32 AM, vignesh C wrote: > > the attached v19 patch With v19 applied, a schema owner can publish the contents of a table regardless of ownership or permissions on that table: +CREATE ROLE user1; +GRANT CREATE ON DATABASE regression TO user1; +CREATE ROLE user2; +GRANT CREATE ON DATABASE regression TO user2; +SET SESSION AUTHORIZATION user1; +CREATE SCHEMA user1schema; +GRANT CREATE, USAGE ON SCHEMA user1schema TO user2; +RESET SESSION AUTHORIZATION; +SET SESSION AUTHORIZATION user2; +CREATE TABLE user1schema.user2private (junk text); +REVOKE ALL PRIVILEGES ON user1schema.user2private FROM PUBLIC; +REVOKE ALL PRIVILEGES ON user1schema.user2private FROM user1; +CREATE TABLE user1schema.user2public (junk text); +GRANT SELECT ON user1schema.user2public TO PUBLIC; +RESET SESSION AUTHORIZATION; +SET SESSION AUTHORIZATION user1; +SELECT * FROM user1schema.user2private; +ERROR: permission denied for table user2private +SELECT * FROM user1schema.user2public; + junk +------ +(0 rows) + +CREATE PUBLICATION user1pub; +WARNING: wal_level is insufficient to publish logical changes +HINT: Set wal_level to logical before creating subscriptions. +ALTER PUBLICATION user1pub + ADD TABLE user1schema.user2public; +ERROR: must be owner of table user2public +ALTER PUBLICATION user1pub + ADD TABLE user1schema.user2private, user1schema.user2public; +ERROR: must be owner of table user2private +SELECT * FROM pg_catalog.pg_publication_tables + WHERE pubname = 'user1pub'; + pubname | schemaname | tablename +---------+------------+----------- +(0 rows) + +ALTER PUBLICATION user1pub ADD SCHEMA user1schema; +SELECT * FROM pg_catalog.pg_publication_tables + WHERE pubname = 'user1pub'; + pubname | schemaname | tablename +----------+-------------+-------------- + user1pub | user1schema | user2private + user1pub | user1schema | user2public +(2 rows) It is a bit counterintuitive that schema owners do not have administrative privileges over tables within their schemas, but that's how it is. The design of this patch seems to assume otherwise. Perhaps ALTER PUBLICATION ... ADD SCHEMA should be restricted to superusers, just as FOR ALL TABLES is? Alternatively, you could add ownership checks per table to mirror the behavior of ALTER PUBLICATION ... ADD TABLE, but that would foreclose the option of automatically updating the list of tables in the publication as new tables are added to the schema, since those new tables would not necessarily belong to the schema owner, and having an error thrown during CREATE TABLE would be quite unfriendly. I think until this is hammered out, it is safer to require superuser privileges, and then we can revisit this issue and loosen the requirement in a subsequent commit. What do you think? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Dear Hackers, I completely missed these mails, and 8/9 was a national holiday. My apologies; very sorry for the inconvenience. > The RMT's first responsibility is to ensure that PostgreSQL 14 is > released on schedule. We would prefer to avoid a revert, but we cannot > allow this to drag on indefinitely. Of course I will try to avoid it, but I understand you have to do your job. Dear Meskes, let me summarize the thread. As you might know, DECLARE STATEMENT was added in PG14, but I found that its connection-control feature cannot be used for the DEALLOCATE and DESCRIBE statements (for more details, please see the patches or ask me). But we have a question - which statements should use the associated connection? Obviously the DEALLOCATE statement should follow the linked connection because the statement uses only one SQL identifier. For DESCRIBE or any other descriptor-related statements, however, I think it is non-obvious because they also have a descriptor name. Currently I have made patches that include DESCRIBE, but I'm wondering whether this approach is correct. I want to ask your opinion about the scope of DECLARE STATEMENT. Coding is not hard, hence I think we can fix this by the end of September if we set the policy correctly and have reviewers. Best Regards, Hayato Kuroda FUJITSU LIMITED
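[Editor's note] To make the question concrete, a hypothetical embedded-SQL fragment (names are illustrative, and the DESCRIBE form shown is one of several ECPG spellings):

    EXEC SQL AT conn1 DECLARE stmt STATEMENT;
    EXEC SQL PREPARE stmt FROM :query;

    /* Clearly should go to conn1: only the statement name is involved. */
    EXEC SQL DEALLOCATE PREPARE stmt;

    /* Less obvious: the statement name points at conn1, but the command
     * also names a descriptor, so which connection should win? */
    EXEC SQL DESCRIBE stmt INTO SQL DESCRIPTOR mydesc;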
RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Dear Wang, Good report! > I'm not sure, but how about modifying the ecpg.trailer: > > statement: ecpgstart at toplevel_stmt ';' { connection = NULL; } > > | ecpgstart toplevel_stmt ';' > I think there are lots of 'connection = NULL;' assignments in the source, and we should find a good location to reset the connection. Thank you for suggesting a solution! I will consider the idea and integrate it if it's OK. Best Regards, Hayato Kuroda FUJITSU LIMITED
RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Dear Horiguchi-san, > How does Pro*C behave in that case? If the second command ends with an > error, I think we are free to error out the second command before > execution. If it works... do you know what is happening at the time? You asked how Oracle behaves when the following is precompiled and executed, right? > > > EXEC SQL AT conn1 DECLARE stmt STATEMENT; > > > EXEC SQL AT conn2 EXECUTE stmt INTO ..; Precompiling does not throw any errors. At execution time, the second statement executes on conn1 without warnings. So the added warning is a PostgreSQL extension. Best Regards, Hayato Kuroda FUJITSU LIMITED
replay of CREATE TABLESPACE eats data at wal_level=minimal
To reproduce, initialize a cluster with wal_level=minimal and max_wal_senders=0. Then from psql: \! mkdir /tmp/goose CHECKPOINT; CREATE TABLESPACE goose LOCATION '/tmp/goose'; SET wal_skip_threshold=0; BEGIN; CREATE TABLE wild (a int, b text) TABLESPACE goose; INSERT INTO wild VALUES (1, 'chase'); COMMIT; SELECT * FROM wild; As expected, you will see one row in table 'wild'. Now perform an immediate shutdown. Restart the server. Table 'wild' is now empty. The problem appears to be that tblspc_redo() calls create_tablespace_directories(), which says: /* * Our theory for replaying a CREATE is to forcibly drop the target * subdirectory if present, and then recreate it. This may be more * work than needed, but it is simple to implement. */ Unfortunately, this theory (which dates to c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is correct only with wal_level>minimal. At wal_level='minimal', we can replay the record to recreate the relfilenode, but not the records that would have created the contents. However, note that if the table is smaller than wal_skip_threshold, then we'll log full-page images of the contents at commit time even at wal_level='minimal' after which we have no problem. As far as I can see, this bug has "always" existed, but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you would have needed a different test case. Specifically, you would have needed to use COPY to put the row in the table, and you would have needed to omit setting wal_skip_threshold since it didn't exist yet. I don't presently have a specific idea about how to fix this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On 2021-Aug-06, Andrew Dunstan wrote: > On 8/5/21 11:29 PM, Andres Freund wrote: > > I was wondering if we should have postmaster do > > personality(ADDR_NO_RANDOMIZE) > > for EXEC_BACKEND builds? It seems nicer to make it automatically work than > > have people remember that they need to call "setarch --addr-no-randomize > > make check". How common is it to get a failure? I know I've run tests under EXEC_BACKEND and not seen any failures. Not many runs though. > > Not that it actually matters for EXEC_BACKEND, but theoretically doing > > personality(ADDR_NO_RANDOMIZE) in postmaster is a tad more secure than doing > > it via setarch, as in the personality() case postmaster's layout itself is > > still randomized... True. I think the security aspect is not critically important, since hopefully nobody should be using such builds for production. > (Thinks: do we have non-Windows buildfarm members doing EXEC_BACKEND?) culicidae does that. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Pido que me den el Nobel por razones humanitarias" ("I ask to be given the Nobel for humanitarian reasons") (Nicanor Parra)
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera wrote: > How common is to get a failure? I know I've run tests under > EXEC_BACKEND and not seen any failures. Not many runs though. On macOS, failures are extremely common. Sometimes I have to run simple tests many times to get even one success. The proposal on the table won't help with that problem since it's Linux-specific, but if there's any way to do something similar on macOS it would be a _huge_ help. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Hi, On 2021-08-09 13:43:03 -0400, Robert Haas wrote: > On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera wrote: > > How common is it to get a failure? I know I've run tests under > > EXEC_BACKEND and not seen any failures. Not many runs though. I get check-world failures in about 1/2-1/3 of the runs, and a plain check fails in maybe 1/4 of the cases. It's pretty annoying because it often isn't trivial to distinguish whether I've broken something or whether it's randomization related... The frequency of failures isn't stable over time, making it a bit harder to know what's going on. > On macOS, failures are extremely common. Sometimes I have to run > simple tests many times to get even one success. The proposal on the > table won't help with that problem since it's Linux-specific, but if > there's any way to do something similar on macOS it would be a _huge_ > help. There does seem to be a way of doing so, because I found blog posts talking about how to get e.g. lldb to *enable* ASLR, which it disables by default... Greetings, Andres Freund
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Andres Freund writes: > On 2021-08-09 13:43:03 -0400, Robert Haas wrote: >> On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera >> wrote: > How common is it to get a failure? I know I've run tests under > EXEC_BACKEND and not seen any failures. Not many runs though. > I get check-world failures in about 1/2-1/3 of the runs, and a plain check > fails in maybe 1/4 of the cases. It's pretty annoying because it often isn't > trivial to distinguish whether I've broken something or whether it's > randomization related... I don't have numbers, but I do know that on Linux EXEC_BACKEND builds fail often enough to be annoying if you don't disable ASLR. If we can do something not-too-invasive about that, it'd be great. regards, tom lane
Re: Tab completion for CREATE SCHEMAAUTHORIZATION
Hi Suraj, Suraj Khamkar writes: > Hello Dagfinn, > > I had a look at your patch and below are my review comments. > Please correct me if I am missing something. > >1. For me the patch does not apply cleanly; I get trailing-whitespace >errors. I do not get these errors, neither with the patch file I still have locally nor by saving the attachment from my original email. Are you sure something in your download process hasn't converted it to Windows line endings (\r\n), or otherwise mangled the whitespace? >2. We can remove the space before \ as shown in the change below: >+" UNION ALL SELECT 'PUBLIC'" \ > >Should be: >+" UNION ALL SELECT 'PUBLIC' "\ The patch doesn't add any lines that end with quote-space-backslash. As for the space before the quote, that is not necessary either, since the next line starts with a space after the quote. Either way, the updated version of the patch doesn't add any new lines with continuation after a string constant, so the point is moot. >3. role_specification has CURRENT_ROLE, CURRENT_USER and SESSION_USER, >but the current changes are missing CURRENT_ROLE. Ah, I was looking at the documentation for 13, but CURRENT_ROLE is only allowed in this context as of 14. Fixed. >4. I'm not sure about this, but do we need to enable tab completion for IF >NOT EXISTS? > >5. I think we are not handling IF NOT EXISTS, which is why tab completion >for AUTHORIZATION doesn't work after it. As you note, psql doesn't currently tab-complete IF NOT EXISTS for any command, so that would be a subject for a separate patch. >6. While we are here, we can also enable missing tab completion for ALTER >SCHEMA. >After OWNER TO we should also get CURRENT_ROLE, CURRENT_USER and >SESSION_USER. I did an audit of all the uses of Query_for_list_of_roles, and there turned out to be several more commands that accept CURRENT_ROLE, CURRENT_USER and SESSION_USER but weren't tab-completed for them. I also renamed the constant to Query_for_list_of_owner_roles, but I'm not 100% happy with that name either. >7. Similarly, as we can drop multiple schemas simultaneously, we should >enable tab completion for >a comma along with CASCADE and RESTRICT: >postgres@53724=#DROP SCHEMA sch >CASCADE RESTRICT The tab completion code for DROP is generic for all object types (see the words_after_create array and the create_or_drop_command_generator function), so that should be done generically, and is thus outside the scope of this patch. > Thanks. Thanks for the review. Updated patch attached, with the CURRENT/SESSION ROLE/USER changes for other commands separated out. - ilmari >From 68ebc42bf142f843dd3bf5741e085ec0397c8e0b Mon Sep 17 00:00:00 2001 From: tanghy Date: Mon, 9 Aug 2021 18:42:12 +0100 Subject: [PATCH v3 1/2] Tab-complete CURRENT_ROLE, CURRENT_USER and SESSION_USER everywhere All commands that manipulate ownership let you specify the current or session user by the above aliases, but the tab completion code had only got that memo in some places.
--- src/bin/psql/tab-complete.c | 28 +--- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 064892bade..4ddc8134a0 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -758,15 +758,16 @@ static const SchemaQuery Query_for_list_of_collations = { " FROM pg_catalog.pg_roles "\ " WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'" -#define Query_for_list_of_grant_roles \ -" SELECT pg_catalog.quote_ident(rolname) "\ -" FROM pg_catalog.pg_roles "\ -" WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'"\ -" UNION ALL SELECT 'PUBLIC'"\ +#define Query_for_list_of_owner_roles \ +Query_for_list_of_roles \ " UNION ALL SELECT 'CURRENT_ROLE'"\ " UNION ALL SELECT 'CURRENT_USER'"\ " UNION ALL SELECT 'SESSION_USER'" +#define Query_for_list_of_grant_roles \ +Query_for_list_of_owner_roles \ +" UNION ALL SELECT 'PUBLIC'" + /* the silly-looking length condition is just to eat up the current word */ #define Query_for_index_of_table \ "SELECT pg_catalog.quote_ident(c2.relname) "\ @@ -1613,7 +1614,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("SET TABLESPACE", "OWNED BY"); /* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY */ else if (TailMatches("ALL", "IN", "TABLESPACE", MatchAny, "OWNED", "BY")) - COMPLETE_WITH_QUERY(Query_for_list_of_roles); + COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles); /* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY xxx */ else if (TailMatches("ALL", "IN", "TABLESPACE", MatchAny, "OWNED", "BY", MatchAny)) COMPLETE_WITH("SET TABLESPACE"); @@ -2257,7 +2258,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("USER"); /* complete ALTER GROUP ADD|DROP USER with a user name */ else if (Matches("ALTER", "GROUP", Ma
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Hi, On 2021-08-09 13:54:25 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2021-08-09 13:43:03 -0400, Robert Haas wrote: > >> On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera > >> wrote: > > How common is it to get a failure? I know I've run tests under > > EXEC_BACKEND and not seen any failures. Not many runs though. > > > I get check-world failures in about 1/2-1/3 of the runs, and a plain check > > fails in maybe 1/4 of the cases. It's pretty annoying because it often isn't > > trivial to distinguish whether I've broken something or whether it's > > randomization related... > > I don't have numbers, but I do know that on Linux EXEC_BACKEND builds fail > often enough to be annoying if you don't disable ASLR. If we can do > something not-too-invasive about that, it'd be great. I'm now not sure if personality(ADDR_NO_RANDOMIZE) in postmaster is actually sufficient. It does seem to drastically reduce the frequency of issues, but there are still conceivable failures where postmaster's layout would clash with the non-randomized children - obviously personality(ADDR_NO_RANDOMIZE) cannot retroactively change the layout of the already running binary. So maybe we should put it into pg_ctl? Greetings, Andres Freund
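[Editor's note] For reference, the Linux call under discussion, wrapped in a runnable toy launcher to illustrate its effect on child processes (where to put it - postmaster vs. pg_ctl - is exactly the open question above):

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <sys/personality.h>

    int
    main(int argc, char **argv)
    {
        /* Disable ASLR for this process and everything it forks/execs. */
        if (personality(ADDR_NO_RANDOMIZE) == -1)
        {
            fprintf(stderr, "could not disable ASLR: %s\n", strerror(errno));
            return 1;
        }
        /* Run the real binary (e.g. postgres) with randomization off. */
        execvp(argv[1], argv + 1);
        fprintf(stderr, "exec failed: %s\n", strerror(errno));
        return 1;
    }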
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Dear Kuroda-san, > I completely missed these mails, and 8/9 was a national holiday. > My apologies; very sorry for the inconvenience. No need to, none of it is your fault. > Let me summarize the thread. Thank you so much, this is very, very helpful. > As you might know, DECLARE STATEMENT was added in PG14, but I > found that its connection-control feature cannot be used for the > DEALLOCATE and DESCRIBE statements (for more details, please see the > patches or ask me). But we have a question - which statements should > use the associated connection? Obviously the DEALLOCATE statement > should follow the linked connection because the statement uses only > one SQL identifier. For DESCRIBE or any other descriptor-related > statements, however, I think it is non-obvious because they also have > a descriptor name. Currently I have made patches that include > DESCRIBE, but I'm wondering whether this approach is correct. I want > to ask your opinion about the scope of > DECLARE STATEMENT. I've been reading through quite a few documents to come up with a good reason one way or the other, but as you already pointed out yourself, other database systems seem to not be consistent about the usage either. Unfortunately I didn't find my copy of the SQL standard. But then I kind of doubt it has much to say about this particular issue. Anyway, I'm currently leaning towards including DESCRIBE in the list of statements that are influenced by DECLARE STATEMENT. My reason is that DESCRIBE can be issued with an AT clause already, regardless of whether it makes sense or not. Or in other words, if we allow it to get a connection name one way, why should we forbid the other way? To me that would seem more confusing than the current situation. The alternative would be to forbid using an AT clause with DESCRIBE, which would definitely be a compatibility change, although, again, not a functional one. > Coding is not hard, hence I think we can fix this by the end of > September > if we set the policy correctly and have reviewers. Fully agreed. That's why a short glance at the patch didn't suffice to answer this. I didn't see any issues with the patch so far. Please send it to me once it's finished (or is it already?) and I'll give it a run, too. Hopefully I caught up on all emails and didn't miss parts of the thread. Thanks, Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 12:10 AM Michael Meskes wrote: > How do you know I didn't spend 15 minutes looking at the patch and the > whole email thread? I surely did and it was more than 15 minutes, but > not enough to give a meaningful comment. If you can do it in 15 > minutes, great for you, I cannot. That was just an example of a token response. I don't know anything about ecpg. > Besides, I have not ignored the RMT. I don't know why you keep > insisting on something that is simply not true. My email of July 30 was itself pretty strongly worded, but went unanswered for a full week. Not even a token response. If that doesn't count as "ignoring the RMT", then what does? How much time has to pass before radio silence begins to count as "ignoring the RMT", in your view of things? A month? More? > At the risk of repeating myself, my concern is *only* the rude and > disrespectful way of addressing me in the third person while talking to > me directly. Again, I though I made that clear in my first email about > the topic and every one that followed. Okay, I understand that now. > I was *never* asked for *any* response in *any* email, save the > original technical discussion, which I did answer with telling people > that I'm lacking time but will eventually get to it. Just to be > precise, your email from July 30 told me and everyone how this will be > handled. A reasonable procedure, albeit not one we'd like to see > happen. But why should I answer and what? It's not that you bring this > up as a discussion point, but as a fact. As Andrew pointed out, there is a general expectation that committers take care of their own open items. That doesn't mean that they are obligated to personally fix bugs. Just that the final responsibility to make sure that the issue is resolved rests with the committer. This is one of the few hard rules that we have. As I've said before, RMT-driven revert is something that I see as an option of last resort. The RMT charter doesn't go quite that far, but I'd argue that my interpretation is quite natural given how committer responsibility works in general. In other words, I personally believe that our bottom-up approach is on balance a good one, and should be preserved. Maybe the issue is muddied by the fact that we each have different views of the community process, at a high level (I'm unsure). Unlike you, I don't believe that RMT-driven revert is "a reasonable procedure". I myself see the RMT's power to resolve open items in this way as a necessary evil. Something to be avoided at all costs. Why should people that don't know anything about ecpg make decisions about ecpg? In general they should not. -- Peter Geoghegan
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> My email of July 30 was itself pretty strongly worded, but went > unanswered for a full week. Not even a token response. If that > doesn't > count as "ignoring the RMT", then what does? How much time has to > pass > before radio silence begins to count as "ignoring the RMT", in your > view of things? A month? More? If you want me to answer, how about asking a question? Or telling me that you'd like some feedback? I don't see how I should know that you expect a reply to a simple statement of facts. > Okay, I understand that now. And? Do you care at all? > As Andrew pointed out, there is a general expectation that committers > take care of their own open items. That doesn't mean that they are > obligated to personally fix bugs, just that the final responsibility > to make sure that the issue is resolved rests with the committer. > This > is one of the few hard rules that we have. Sure, I don't question that. Again, I knew about the issue; I only misjudged it in the beginning. Your email from July 30 did show me that it was more urgent, but it still didn't create the impression that there was such a short deadline. In my opinion, my prior email already explained that I was on it but couldn't give an estimate. > As I've said before, RMT-driven revert is something that I see as an > option of last resort. The RMT charter doesn't go quite that far, but > I'd argue that my interpretation is quite natural given how committer > responsibility works in general. In other words, I personally believe > that our bottom-up approach is on balance a good one, and should be > preserved. Fair enough; to me a revert is a revert, no matter who issues it. > Maybe the issue is muddied by the fact that we each have different > views of the community process, at a high level (I'm unsure). Unlike > you, I don't believe that RMT-driven revert is "a reasonable > procedure". I myself see the RMT's power to resolve open items in > this > way as a necessary evil, something to be avoided at all costs. Why > should people who don't know anything about ecpg make decisions > about > ecpg? In general they should not. Well, you did lay out what the decision would be, and I fully agreed with it. So again, what was there to do? Had you asked me if I agreed, I would have told you. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
using an end-of-recovery record in all cases
On Thu, Jul 29, 2021 at 11:49 AM Robert Haas wrote: > On Wed, Jul 28, 2021 at 3:28 PM Andres Freund wrote: > > > Imagine if instead of > > > all the hairy logic we have now we just replaced this whole if > > > (IsInRecovery) stanza with this: > > > > > > if (InRecovery) > > > CreateEndOfRecoveryRecord(); > > > > > > That would be WAY easier to reason about than the rat's nest we have > > > here today. Now, I am not sure what it would take to get there, but I > > > think that is the direction we ought to be heading. > > > > What are we going to do in the single user ([1]) case in this awesome > > future? > > I guess we could just not create a checkpoint until single user mode is shut > > down / creates a checkpoint for other reasons? > > It probably depends on who writes this thus-far-hypothetical patch, > but my thought is that we'd unconditionally request a checkpoint after > writing the end-of-recovery record, same as we do now if (promoted). > If we happened to be in single-user mode, then that checkpoint request > would turn into performing a checkpoint on the spot in the one and > only process we've got, but StartupXLOG() wouldn't really need to care > what happens under the hood after it called RequestCheckpoint(). I decided to try writing a patch to use an end-of-recovery record rather than a checkpoint record in all cases. The patch itself was pretty simple but didn't work. There are two different reasons why it didn't work, which I'll explain in a minute. I'm not sure whether there are any other problems; these are the only two things that cause problems with 'make check-world', but that's hardly a guarantee of anything. Anyway, I thought it would be useful to report these issues first and hopefully get some feedback. The first problem I hit was that GetRunningTransactionData() does Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)). While that does seem like a superficially reasonable thing to assert, StartupXLOG() initializes latestCompletedXid by calling TransactionIdRetreat on the value extracted from checkPoint.nextXid, and the first-ever checkpoint that is written at the beginning of WAL has a nextXid of 3, so when starting up from that checkpoint nextXid is 2, which is not a normal XID. When we try to create the next checkpoint, CreateCheckPoint() does LogStandbySnapshot() which calls GetRunningTransactionData() and the assertion fails. In the current code, we avoid this more or less accidentally because LogStandbySnapshot() is skipped when starting from a shutdown checkpoint. If we write an end-of-recovery record and then trigger a checkpoint afterwards, then we don't avoid the problem. Although I'm impishly tempted to suggest that we #define SecondNormalTransactionId 4 and then use that to create the first checkpoint instead of FirstNormalTransactionId -- naturally with no comments explaining why we're doing it -- I think the real problem is that the assertion is just wrong. CurrentRunningXacts->latestCompletedXid shouldn't be InvalidTransactionId or BootstrapTransactionId, but FrozenTransactionId is a legal, if corner-case, value, at least as the code stands today. Unfortunately we can't just relax the assertion, because the XLOG_RUNNING_XACTS record will eventually be handed to ProcArrayApplyRecoveryInfo() for processing ... and that function contains a matching assertion which would in turn fail. 
It in turn passes the value to MaintainLatestCompletedXidRecovery(), which contains yet another matching assertion, so the restriction to normal XIDs here looks pretty deliberate. There are no comments, though, so the reader is left to guess why. I see one problem: MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which expects a normal XID. Perhaps it's best to just dodge the entire issue by skipping LogStandbySnapshot() if latestCompletedXid happens to be 2, but that feels like a hack, because AFAICS the real problem is that StartupXLOG() doesn't agree with the rest of the code on whether 2 is a legal case, and maybe we ought to be storing a value that doesn't need to be computed via TransactionIdRetreat(). The second problem I hit was a preexisting bug where, under wal_level=minimal, redo of a "create tablespace" record can blow away data of which we have no other copy. See http://postgr.es/m/CA+TgmoaLO9ncuwvr2nN-J4VEP5XyAcy=zkihxqzbbfrxxgx...@mail.gmail.com for details. That bug happens to make src/test/recovery's 018_wal_optimize.pl fail one of the tests, because the test starts and stops the server repeatedly, with the result that, with the attached patch, we just keep writing end-of-recovery records and never get time to finish a checkpoint before the next shutdown, so every test replays the CREATE TABLESPACE record and everything that previous tests did. The "wal_level = minimal, SET TABLESPACE commit subtransaction" test fails because it's the only one that (1) uses the tablespace for a new table, (2) commits, and (3) runs before a checkpoint is made.
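For reference, the special XID values in play here are defined in src/include/access/transam.h; laying them out makes the failure mode easy to see (the trailing comment is an illustration, not part of the header):

    #define InvalidTransactionId      ((TransactionId) 0)
    #define BootstrapTransactionId    ((TransactionId) 1)
    #define FrozenTransactionId       ((TransactionId) 2)
    #define FirstNormalTransactionId  ((TransactionId) 3)

    #define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

    /*
     * Starting from the first checkpoint: checkPoint.nextXid == 3, so after
     * retreating, latestCompletedXid == 2 == FrozenTransactionId, and
     * Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid))
     * in GetRunningTransactionData() fires.
     */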
Re: Another regexp performance improvement: skip useless paren-captures
Mark Dilger writes: > The patch looks ready to commit. I don't expect to test it any further > unless you have something in particular you'd like me to focus on. Pushed, but while re-reading it before commit I noticed that there's some more fairly low-hanging fruit in regexp_replace(). As I had it in that patch, it never used REG_NOSUB because of the possibility that the replacement string uses "\N". However, we're already pre-checking the replacement string to see if it has backslashes at all, so while we're at it we can check for \N to discover if we actually need any subexpression match data or not. We do need to refactor a little to postpone calling pg_regcomp until after we know that, but I think that makes replace_text_regexp's API less ugly not more so. While I was at it, I changed the search-for-backslash loops to use memchr rather than handwritten looping. Their use of pg_mblen was pretty unnecessary given we only need to find backslashes, and we can assume the backend encoding is ASCII-safe. Using a bunch of random cases generated by your little perl script, I see maybe 10-15% speedup on test cases that don't use \N in the replacement string, while it's about a wash on cases that do. (If I'd been using a multibyte encoding, maybe the memchr change would have made a difference, but I didn't try that.) regards, tom lane diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c index 268cee1cbe..3b7a607437 100644 --- a/src/backend/utils/adt/regexp.c +++ b/src/backend/utils/adt/regexp.c @@ -630,11 +630,10 @@ textregexreplace_noopt(PG_FUNCTION_ARGS) text *s = PG_GETARG_TEXT_PP(0); text *p = PG_GETARG_TEXT_PP(1); text *r = PG_GETARG_TEXT_PP(2); - regex_t*re; - - re = RE_compile_and_cache(p, REG_ADVANCED, PG_GET_COLLATION()); - PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0, 1)); + PG_RETURN_TEXT_P(replace_text_regexp(s, p, r, + REG_ADVANCED, PG_GET_COLLATION(), + 0, 1)); } /* @@ -648,7 +647,6 @@ textregexreplace(PG_FUNCTION_ARGS) text *p = PG_GETARG_TEXT_PP(1); text *r = PG_GETARG_TEXT_PP(2); text *opt = PG_GETARG_TEXT_PP(3); - regex_t*re; pg_re_flags flags; /* @@ -672,10 +670,9 @@ textregexreplace(PG_FUNCTION_ARGS) parse_re_flags(&flags, opt); - re = RE_compile_and_cache(p, flags.cflags, PG_GET_COLLATION()); - - PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0, - flags.glob ? 0 : 1)); + PG_RETURN_TEXT_P(replace_text_regexp(s, p, r, + flags.cflags, PG_GET_COLLATION(), + 0, flags.glob ? 0 : 1)); } /* @@ -694,7 +691,6 @@ textregexreplace_extended(PG_FUNCTION_ARGS) int n = 1; text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5); pg_re_flags re_flags; - regex_t*re; /* Collect optional parameters */ if (PG_NARGS() > 3) @@ -723,11 +719,10 @@ textregexreplace_extended(PG_FUNCTION_ARGS) if (PG_NARGS() <= 4) n = re_flags.glob ? 
0 : 1; - /* Compile the regular expression */ - re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION()); - /* Do the replacement(s) */ - PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, start - 1, n)); + PG_RETURN_TEXT_P(replace_text_regexp(s, p, r, + re_flags.cflags, PG_GET_COLLATION(), + start - 1, n)); } /* This is separate to keep the opr_sanity regression test from complaining */ diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 348b5566de..acb8741734 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -4359,34 +4359,36 @@ replace_text(PG_FUNCTION_ARGS) } /* - * check_replace_text_has_escape_char + * check_replace_text_has_escape * - * check whether replace_text contains escape char. + * Returns 0 if text contains no backslashes that need processing. + * Returns 1 if text contains backslashes, but not regexp submatch specifiers. + * Returns 2 if text contains regexp submatch specifiers (\1 .. \9). */ -static bool -check_replace_text_has_escape_char(const text *replace_text) +static int +check_replace_text_has_escape(const text *replace_text) { + int result = 0; const char *p = VARDATA_ANY(replace_text); const char *p_end = p + VARSIZE_ANY_EXHDR(replace_text); - if (pg_database_encoding_max_length() == 1) - { - for (; p < p_end; p++) - { - if (*p == '\\') -return true; - } - } - else + while (p < p_end) { - for (; p < p_end; p += pg_mblen(p)) + /* Find next escape char, if any. */ + p = memchr(p, '\\', p_end - p); + if (p == NULL) + break; + p++; + /* Note: a backslash at the end doesn't require extra processing. */ + if (p < p_end) { - if (*p == '\\') -return true; + if (*p >= '1' && *p <= '9') +return 2; /* Found a submatch specifier, so done */ + result = 1; /* Found some other sequence, keep looking */ + p++; } } - - return false; + return result; } /* @@ -4403,25 +4405,17 @@ appendStringInfoRegexpSub
Re: Use extended statistics to estimate (Var op Var) clauses
> On Jul 20, 2021, at 11:28 AM, Tomas Vondra > wrote: > > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > <0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch> Hi Tomas, I tested this patch against master looking for types of clauses that uniformly get worse with the patch applied. I found some. The tests are too large to attach, but the scripts that generate them are not. To perform the tests: git checkout master perl ./gentest.pl > src/test/regress/sql/gentest.sql cat /dev/null > src/test/regress/expected/gentest.out echo "test: gentest" >> src/test/regress/parallel_schedule ./configure && make && make check cp src/test/regress/results/gentest.out src/test/regress/expected/gentest.out patch -p 1 < 0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch make check cat src/test/regress/regression.diffs | perl ./check.pl This shows patterns of conditions that get worse, such as: better:0, worse:80: A < B and A <> A or not A < A better:0, worse:80: A < B and not A <= A or A <= A better:0, worse:80: A < B or A = A better:0, worse:80: A < B or A = A or not A >= A better:0, worse:80: A < B or A >= A better:0, worse:80: A < B or A >= A and not A <> A better:0, worse:80: A < B or not A < A better:0, worse:80: A < B or not A <> A better:0, worse:80: A < B or not A <> A or A <= A better:0, worse:80: A < B or not A >= A or not A < A It seems things get worse when the conditions contain a column compared against itself. I suspect that is being handled incorrectly. #!/usr/bin/perl use strict; use warnings; my ($query, $where, $before, $after, @before, @after); my $better = 0; my $worse = 0; my %where = (); my %gripe; while (<>) { if (/from check_estimated_rows\('(.*)'\)/) { $query = $1; if ($query =~ m/where (.*)$/) { $where = $1; my %columns = map { $_ => 1 } ($where =~ m/(column_\d+)/g); my @normal = ('A'..'Z'); my @columns = sort keys %columns; for my $i (0..$#columns) { my $old = $columns[$i]; my $new = $normal[$i]; $where =~ s/\b$old\b/$new/g; } } } elsif (m/^-(\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+))\s*$/) { ($before, @before) = ($1, $2, $3, $4); } elsif (m/^\+(\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+))\s*$/) { ($after, @after) = ($1, $2, $3, $4); $where{$where}->{better} ||= 0; $where{$where}->{worse} ||= 0; # Don't count the difference as meaningful unless we're more than 5 better or worse than before if ($after[2] > 5 + $before[2]) { $worse++; $where{$where}->{worse}++; } elsif ($after[2] + 5 < $before[2]) { $better++; $where{$where}->{better}++; } if (!exists $gripe{$where} && $where{$where}->{better} == 0 && $where{$where}->{worse} > 50) { print "TERRIBLE:\n"; print "\tQUERY: $query\n"; print "\tBEFORE: $before\n"; print "\tAFTER: $after\n\n"; $gripe{$query} = 1; } } } foreach my $where (sort keys %where) { if ($where{$where}->{better} < $where{$where}->{worse}) { print("better:", $where{$where}->{better}, ", worse:", $where{$where}->{worse}, ": $where\n"); } } print("\n\nTOTAL:\n"); print("\tbetter: $better\n"); print("\tworse: $worse\n"); #!/usr/bin/perl use strict; use warnings; our ($tblnum, $tblname, $colnum, $colname); # Generate the where clauses to be used on all tables our (%wherepattern1, %wherepattern2, %wherepattern3); my @ops = (">", "<", ">=", "<=", "=", "<>"); my @conj = ("and", "or", "and not", "or not"); for (1..100) { my $op1 = $ops[int(rand(@ops))]; my $op2 = $ops[int(rand(@ops))]; my $op3 = $ops[int(rand(@ops))]; my $conj1 = $conj[int(rand(@conj))]; my $conj2 = $conj[int(rand(@conj))]; $wherepattern1{"\%s 
$op1 \%s"} = 1; $wherepattern2{"\%s $op1 \%s $conj1 \%s $op2 \%s"} = 1; $wherepattern3{"\%s $op1 \%s $conj1 \%s $op2 \%s $conj2 \%s $op3 \%s"} = 1; } sub next_table { $tblnum++; $tblname = "table_$tblnum"; $colnum = 0; $colname = "column_$colnum"; } sub next_column { $colnum++; $colname = "column_$colnum"; } for my $colcnt (2..10) { next_table(); print("CREATE TABLE $tblname (\n"); for (1..$colcnt-1) { next_column(); print("\t$colname INTEGER,\n"); } next_column(); print("\t$colname INTEGER\n);\n"); print("INSERT INTO $tblname (SELECT ", join(", ", map { "gs-$_" } (1..$colcnt)), " FROM generate_series(1,100) gs);\n"); print("VACUUM FREEZE $tblname;\n"); for my $colmax (2..$colcnt) { print("CREATE STATISTICS ${tblname}_stats_${colmax} ON ", join(", ", map { "column_$_" } (1..$colmax)), " FROM $tblname;\n"); } print("ANALYZE $tblname;\n"); } # Restart the table sequence $tblnum = 0; for my $colcnt (2..10) { next_table(); for (1..100) { my $a = sprintf("column_%d", 1+int(rand($colcnt))); my $b = sprintf("column_%d", 1+int(rand($colcnt))); my $c = sprintf("column_%d", 1+int(rand($colcnt))); foreach my $where1 (keys %wherepattern1) { m
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 11:45 AM Michael Meskes wrote: > If you want me to answer, how about asking a question? Or telling me > that you'd like some feedback? I don't see how I should know that you > expect a reply to a simple statement of facts. I expressed concern in fairly strong terms, and received no answer for a full week. I consider that "ignoring the RMT", but you take issue with that characterization because my email didn't ask an explicit question with a question mark, despite the fact that it is generally understood that "committers own their own items", and that the RMT exists as a final check on that. Clearly we disagree about this. I don't think that there is anything to be gained from discussing this any further, though. I suggest that we leave it at that. > > Okay, I understand that now. > > And? Do you care at all? I don't want to upset anybody for any reason. I regret that my words have upset you, but I think that they were misinterpreted in a way that I couldn't possibly have predicted. The particular aspect of last Friday's email that you took exception to was actually intended to convey that it was not personal. Remember, my whole ethos is to avoid strong RMT intervention when possible, to make it impersonal. My framing of things had the opposite effect to the one I'd intended, ironically. > Sure, I don't question that. Again, I knew about the issue; I only > misjudged it in the beginning. Your email from July 30 did show me that > it was more urgent but still didn't create the impression that there > was such a short deadline. In my opinion, my prior email already > explained that I was on it but couldn't give an estimate. How could anybody on the RMT judge what was going on sensibly? There was *zero* information from you (the original committer, our point of contact) about an item that is in a part of the code totally unfamiliar to every other committer. We were effectively forced to make very conservative assumptions about the deadline. I think that it's very likely that this could have been avoided if only you'd engaged to some degree -- if you had said it was a short deadline then we'd likely have taken your word for it, as the relevant subject matter expert and committer in charge of the item. But we were never given that choice. > Well, you did lay out what the decision would be, and I fully agreed > with it. So again, what was there to do? Had you asked me if I agreed, > I would have told you. If the patch being reverted was so inconsequential to you that you didn't even feel the need to write a brief email about it, why did you commit it in the first place? I just don't understand this at all. -- Peter Geoghegan
Re: when the startup process doesn't (logging startup delays)
On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav wrote: > Modified reset_startup_progress_timeout() to take a second > parameter which indicates whether it is called for initialization or > for resetting. Without this parameter there is a problem if we call > init_startup_progress() more than once with no call to > ereport_startup_progress() in between, as the code related to disabling > the timer has been removed. I'd really like to avoid this. I don't see why it's necessary. You say it causes a problem, but you don't explain what problem it causes. enable_timeout_at() will disable the timer if not already done. I think all we need to do is set scheduled_startup_progress_timeout = 0 before calling reset_startup_progress_timeout() in the "init" case, and not do that in the non-init case. If that's not quite right, maybe you can work out something that does work. But adding an is_init flag to a function and having no common code between the is_init = true case and the is_init = false case is exactly the kind of thing that I don't want here. I want as much common code as possible. > > This makes sense, but I think I'd like to have all the functions in > > this patch use names_like_this() rather than NamesLikeThis(). > > I have changed all the function names accordingly. But I would like to > know why it should be names_like_this(), as there are many functions > with the format NamesLikeThis(). I would like to understand when to > use which, just to incorporate that in future patches. There is, unfortunately, no hard-and-fast rule, but we want to maintain as much consistency with the existing style as we can. I wasn't initially sure what would work best for this particular patch, but after we committed to a name like ereport_startup_progress(), that to me was a strong hint in favor of using names_like_this() throughout. It seems impossible to imagine punctuating it as EreportStartupProgress() or something, since that would be wildly inconsistent with the existing function name, and there seems to be no good reason why this patch can't be internally consistent. To some degree, we tend to use names_like_this() for low-level functions and NamesLikeThis() for higher-level functions, but that is not a very consistent practice. > > reset_startup_progress_timeout(TimestampTz now) > > { > > next_timeout = last_startup_progress_timeout + interval; > > if (next_timeout < now) > > { > > // Either the timeout was processed so late that we missed an entire > > cycle, > > // or the system clock was set backwards. > > next_timeout = now + interval; > > } > > enable_timeout_at(next_timeout); > > last_startup_progress_timeout = next_timeout; > > } > > As per the above logic, I would like to discuss 2 cases. > > Case-1: > The first timeout is scheduled after 1 sec, but the timeout occurs > after 1.5 sec. So the log message prints after 1.5 sec. The next timeout is > scheduled 2 sec out (scheduled_startup_progress_timeout + interval). > The condition (next_timeout < now) evaluates to false and > everything goes smoothly. > > Case-2: > The first timeout is scheduled after 1 sec, but the timeout occurs after > 2.5 sec. So the log message prints after 2.5 sec. The next timeout would be > scheduled 2 sec out (scheduled_startup_progress_timeout + interval). > So the condition (next_timeout < now) holds, and next_timeout > gets reset to 3.5 sec (2.5 + 1), and it continues from there. Is this > behaviour OK, or should we set next_timeout to 3 sec? Please share > your thoughts on this.
I can't quite follow this, because it seems like you are sometimes viewing the interval as 1 second and sometimes as 2 seconds. Maybe you could clarify that, and perhaps show example output? My feeling is that the timer will almost always be slightly late, but it will very rarely be extremely late, and it will also very rarely be early (only if someone resets the system clock). So let's consider those two cases separately. If the timer is a little bit late each time, we want to avoid drift, so we want to shorten the next sleep time by the amount that the previous interrupt was late. If the interval is 1000ms and the interrupt fires 1ms late then we should sleep 999ms the next time; if 2ms late, 998ms. That way, although there will be some variation in which the messages are logged, the drift won't accumulate over time and even after many minutes of recovery the messages will be printed at ABOUT the same number of milliseconds after the second every time, instead of drifting further and further off course. But this strategy cannot be used if the drift is larger than the interval. If we are trying to log a message every 1000ms and the timer doesn't fire for 14ms, we can wait only 986ms the next time. If it doesn't fire for 140ms, we can wait only 860ms the next time. But if the timer doesn't fire for 1400ms, we cannot wait for -400ms the next time. So what should we do? My proposal is to just wait for the configured interval, 1
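A minimal sketch of the drift-correcting rescheduling described above (the TimestampTz typedef and the fixed one-second interval are assumptions for illustration; in PostgreSQL the result would be handed to enable_timeout_at()):

    #include <stdint.h>

    typedef int64_t TimestampTz;    /* microseconds, as in PostgreSQL */

    static TimestampTz last_startup_progress_timeout = 0;
    static const int64_t interval = 1000 * 1000;    /* 1s, for illustration */

    static TimestampTz
    reschedule_progress_timeout(TimestampTz now)
    {
        TimestampTz next_timeout = last_startup_progress_timeout + interval;

        /*
         * Scheduling relative to the previous *target* (not "now") shortens
         * the next sleep by however late the previous interrupt fired, so
         * lateness does not accumulate as drift.  If we were so late that a
         * whole cycle was missed, or the clock moved backwards, fall back
         * to a full interval from "now".
         */
        if (next_timeout < now)
            next_timeout = now + interval;

        last_startup_progress_timeout = next_timeout;
        return next_timeout;    /* caller passes this to enable_timeout_at() */
    }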
Re: Autovacuum on partitioned table (autoanalyze)
Hi On 2021-Jul-27, Andres Freund wrote: > Isn't this going to create a *lot* of redundant sampling? Especially if you > have any sort of nested partition tree. In the most absurd case a partition > with n parents will get sampled n times, solely due to changes to itself. It seems to me that you're barking up the wrong tree on this point. This problem you describe is not something that was caused by this patch; ANALYZE has always worked like this. We have discussed the idea of avoiding redundant sampling, but it's clear that it is not a simple problem, and solving it was not in scope for this patch. > Additionally, while analyzing all child partitions for a partitioned tables > are AccessShareLock'ed at once. If a partition hierarchy has more than one > level, it actually is likely that multiple autovacuum workers will end up > processing the ancestors separately. This seems like it might contribute to > lock exhaustion issues with larger partition hierarchies? I agree this seems a legitimate problem. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: make MaxBackends available in _PG_init
On Sat, Aug 7, 2021 at 2:01 PM Bossart, Nathan wrote: > Here is a rebased version of the patch. > Giving this a review. Patch applies cleanly and `make check` works as of e12694523e7e4482a052236f12d3d8b58be9a22c Overall looks very nice and tucks MaxBackends safely away. I have a few suggestions: > + size = add_size(size, mul_size(GetMaxBackends(0), sizeof(BTOneVacInfo))); The use of GetMaxBackends(0) looks weird - can we add another constant in there for the "default" case? Or just have GetMaxBackends() work? > + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */ s/include/add in/ > +typedef enum GMBOption > +{ > + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */ > + GMB_NUM_AUXILIARY_PROCS = 1 << 1 /* include NUM_AUXILIARY_PROCS */ > +} GMBOption; Is a typedef enum really needed? As opposed to something like this style: #define WL_LATCH_SET (1 << 0) #define WL_SOCKET_READABLE (1 << 1) #define WL_SOCKET_WRITEABLE (1 << 2) > - (MaxBackends + max_prepared_xacts + 1)); > + (GetMaxBackends(GMB_MAX_PREPARED_XACTS) + 1)); This is a little confusing - there is no indication to the reader that this is an additive function. Perhaps a little more intuitive name: > + (GetMaxBackends(GMB_PLUS_MAX_PREPARED_XACTS) + 1)); However, the more I went through this patch, the more the GetMaxBackends(0) nagged at me. The vast majority of the calls are with "0". I'd argue for just having no arguments at all, which removes a bit of code and actually makes things like this easier to read: Original change: > - uint32 TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts; > + uint32 TotalProcs = GetMaxBackends(GMB_NUM_AUXILIARY_PROCS | GMB_MAX_PREPARED_XACTS); Versus: > - uint32 TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts; > + uint32 TotalProcs = GetMaxBackends() + NUM_AUXILIARY_PROCS + max_prepared_xacts; > + * This must be called after modules have had the change to alter GUCs in > + * shared_preload_libraries, and before shared memory size is determined. s/change/chance/; > +void > +SetMaxBackends(int max_backends) > +{ > + if (MaxBackendsInitialized) > + elog(ERROR, "MaxBackends already initialized"); Is this going to get tripped by a call from restore_backend_variables? Cheers, Greg
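For concreteness, the argument-free shape argued for above might look like this (a sketch of the reviewer's suggestion, not the patch as posted; the error checks mirror the quoted SetMaxBackends):

    static int  MaxBackends = 0;
    static bool MaxBackendsInitialized = false;

    /* Called once, after shared_preload_libraries have had their chance. */
    void
    SetMaxBackends(int max_backends)
    {
        if (MaxBackendsInitialized)
            elog(ERROR, "MaxBackends already initialized");
        MaxBackends = max_backends;
        MaxBackendsInitialized = true;
    }

    int
    GetMaxBackends(void)
    {
        if (!MaxBackendsInitialized)
            elog(ERROR, "MaxBackends not yet initialized");
        return MaxBackends;
    }

    /*
     * Callers then spell out any additions explicitly, e.g.:
     *   uint32 TotalProcs = GetMaxBackends() + NUM_AUXILIARY_PROCS
     *                       + max_prepared_xacts;
     */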
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Sat, Aug 7, 2021 at 4:13 AM Michael Meskes wrote: > I get it that the goal is to release PostgreSQL 14 and I also get it > that nobody is interested in the reasons for my slow reaction. I even, > at least to an extent, understand why nobody tried reaching out to me > directly. However, what I cannot understand at all is the tone of this > email. Is this the new way of communication in the PostgreSQL project? > > Just to be more precise, I find it highly offensive that you address an > email only to me (everyone else was on CC) and yet do not even try to > talk to me, but instead talk about me in the third person. I find this > very disrespectful. Hi, FWIW, I don't think that the phrasing of Peter's email is disrespectful. As I read it, it simply states that the RMT has made a decision to revert the patch unless certain assurances are given before a certain date. I don't expect anyone will particularly like receiving such an email, because nobody likes to be threatened with a revert, but I don't think there is anything rude about it. Either you are willing to commit to resolving the problem by a date that the RMT finds acceptable, or you aren't. If you are, great. If you aren't, the patch is going to get reverted. That sucks, but it's nothing against you personally; it's just what happens sometimes. I also feel rather strongly that being a member of the RMT is a pretty thankless task, involving going through a lot of patches that you may not care about and trying to make decisions that will benefit the project, even while knowing that some people may not like them. We should give people who are willing to offer such service the benefit of the doubt. On the substance of the issue, one question that I have reading over this thread is whether there's really a bug here at all. It is being represented (and I have not checked whether this is accurate) that the patch affects the behavior of DECLARE, PREPARE, and EXECUTE, but not DESCRIBE, DEALLOCATE, DECLARE CURSOR .. FOR, or CREATE TABLE AS EXECUTE. However, it also seems that it's not entirely clear what the behavior ought to be in those cases, except perhaps for DESCRIBE. If that's the case, maybe this doesn't really need to be an open item, and maybe we don't therefore need to talk about whether it should be reverted. Maybe it's just a feature that supports certain things now, and in the future, after due reflection, perhaps it will support more. I would be interested in hearing your view, and that of others, on whether this is really a bug at all. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> question with a question mark, despite the fact that it is generally > understood that "committers own their own items", and that the RMT > exists as a final check on that. This does not contradict my opinion, but anyway. > Clearly we disagree about this. I don't think that there is anything > to be gained from discussing this any further, though. I suggest that > we leave it at that. Agreed. > I don't want to upset anybody for any reason. I regret that my words > have upset you, but I think that they were misinterpreted in a way > that I couldn't possibly have predicted. The particular aspect of I strongly object to that. It's pretty obvious to me that addressing people in the third person is very offensive. > last > Friday's email that you took exception to was actually intended to > convey that it was not personal. Remember, my whole ethos is to avoid > strong RMT intervention when possible, to make it impersonal. My > framing of things had the opposite effect to the one I'd intended, > ironically. Let me stress again that the third-person part is the bad thing in my opinion, not the rest of the words. > How could anybody on the RMT judge what was going on sensibly? There > was *zero* information from you (the original committer, our point of > contact) about an item that is in a part of the code totally unfamiliar > to every other committer. We were effectively forced to make > very > conservative assumptions about the deadline. I think that it's very > likely that this could have been avoided if only you'd engaged to > some > degree -- if you had said it was a short deadline then we'd likely > have taken your word for it, as the relevant subject matter expert > and > committer in charge of the item. But we were never given that choice. The same holds the other way round: I only understood later that you wanted more information. Had I known that earlier, I would have gladly given it. > > Well, you did lay out what the decision would be, and I fully agreed > > with it. So again, what was there to do? Had you asked me if I > > agreed, > > I would have told you. > > If the patch being reverted was so inconsequential to you that you > didn't even feel the need to write a brief email about it, why did > you > commit it in the first place? I just don't understand this at all. I'm getting very tired of you accusing me of things I neither said nor did. Please stop doing that, or show me the email where I said the patch was "inconsequential". As for writing a brief email, please read all the other emails in this thread; I've explained myself repeatedly already. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
Hi, > FWIW, I don't think that the phrasing of Peter's email is > disrespectful. As I read it, it simply states that the RMT has made a As I said before, it might be a cultural difference. What I don't understand is that a simple "Hi Michael, this is what the RMT decided:" would have been sufficient to make this email okay. I take offense at being addressed only in the third person. > strongly that being a member of the RMT is a pretty thankless task, That I agree with. > On the substance of the issue, one question that I have reading over > this thread is whether there's really a bug here at all. It is being > represented (and I have not checked whether this is accurate) that > the > patch affects the behavior of DECLARE, PREPARE, and EXECUTE, but not > DESCRIBE, DEALLOCATE, DECLARE CURSOR .. FOR, or CREATE TABLE AS > EXECUTE. However, it also seems that it's not entirely clear what the > behavior ought to be in those cases, except perhaps for DESCRIBE. If > that's the case, maybe this doesn't really need to be an open item, > and maybe we don't therefore need to talk about whether it should be > reverted. Maybe it's just a feature that supports certain things now, > and in the future, after due reflection, perhaps it will support > more. The way I see it, we should commit the patch that makes more statements honor the new DECLARE STATEMENT feature. I don't think we can change anything for the worse by doing that. And other databases are not really strict about this either. > I would be interested in hearing your view, and that of others, on > whether this is really a bug at all. I think the question is more about which version of the patch we commit, as it does increase the functionality of ecpg. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 1:38 PM Michael Meskes wrote: > > I don't want to upset anybody for any reason. I regret that my words > > have upset you, but I think that they were misinterpreted in a way > > that I couldn't possibly have predicted. The particular aspect of > > I strongly object to that. It's pretty obvious to me that addressing > people in the third person is very offensive. > And using the third person to avoid making things personal, and to properly represent one's role as a representative as opposed to an individual, is something at least two of us consider to be "professional". If others taking on a professional/formal tone with you is offensive, I politely suggest you need to at least cut them some slack when you haven't informed them of this previously. Cultural differences happen in both directions. > How could anybody on the RMT judge what was going on sensibly? There > was *zero* information from you (the original committer, our point of > contact) about an item that is in a part of the code totally unfamiliar > to every other committer. We were effectively forced to make > very > conservative assumptions about the deadline. I think that it's very > likely that this could have been avoided if only you'd engaged to > some > degree -- if you had said it was a short deadline then we'd likely > have taken your word for it, as the relevant subject matter expert > and > committer in charge of the item. But we were never given that choice. > > The same holds the other way round: I only understood later that you > wanted more information. Had I known that earlier, I would have gladly > given it. There is clearly an expectation from the RMT, and at least from me, that an email like: "The RMT discussed this recently. We are concerned about this issue, including how it has been handled so far. If you're unable to resolve the issue (or at least commit time to resolving the issue) then the RMT will call for the revert of the original feature patch." will elicit a response from the committer in a timely fashion. Really, any email sent to -hackers from the RMT about a specific commit, or even any email sent to -hackers by a core team member, is expected to be responded to in a timely manner. These teams should not be getting involved with day-to-day operations, and being responsive to them is part of the obligation of being a committer. In hindsight, the quoted email above should probably have been worded: "If you're unable to resolve the issue, or communicate a timely plan for doing so, within the next week please revert the patch.", making it clear that the committer should be the one performing the revert. Then, absent feedback or a revert, the second email and the RMT performing the revert would be appropriate. David J.
Re: Another regexp performance improvement: skip useless paren-captures
Tom, I can still trigger the old bug for which we thought we'd pushed a fix. The test case below crashes on master (e12694523e7e4482a052236f12d3d8b58be9a22c), and also on the fixed version "Make regexp engine's backref-related compilation state more bulletproof." (cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a). Can you test if it crashes for you, too? I'm not sure I see why this one fails when millions of others pass. The backtrace is still complaining about regc_nfa.c:1265: +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]'); +server closed the connection unexpectedly + This probably means the server terminated abnormally + before or while processing the request. +connection to server was lost — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 10:38:07PM +0200, Michael Meskes wrote: > > Clearly we disagree about this. I don't think that there is anything > > to be gained from discussing this any further, though. I suggest that > > we leave it at that. > > Agreed. > > > I don't want to upset anybody for any reason. I regret that my words > > have upset you, but I think that they were misinterpreted in a way > > that I couldn't possibly have predicted. The particular aspect of > > I strongly object to that. It's pretty obvious to me that addressing > people in the third person is very offensive. So, you object to him referring to you in the third person in an email, and you object to him saying it was "misinterpreted". Are you going to object to my email too? I think everyone can accept that you interpreted what Peter said as offensive, but you must also give the same acceptance that someone might not consider it offensive. For example, I did not read it as offensive at all. I think it might have been in the third person because at that point, Peter didn't expect a reply from you, and put you on the "TO" line merely as a courtesy. He could have put out an email about reverting the patch without you on the email header at all, I guess --- then he could have referred to you without offending you. > > How could anybody on the RMT judge what was going on sensibly? There > > was *zero* information from you (the original committer, our point of > > contact) about an item that is in a part of the code totally unfamiliar > > to every other committer. We were effectively forced to make > > very > > conservative assumptions about the deadline. I think that it's very > > likely that this could have been avoided if only you'd engaged to > > some > > degree -- if you had said it was a short deadline then we'd likely > > have taken your word for it, as the relevant subject matter expert > > and > > committer in charge of the item. But we were never given that choice. > > The same holds the other way round: I only understood later that you > wanted more information. Had I known that earlier, I would have gladly > given it. Let me be practical here --- the more someone has to be chased for a reply, the less confidence they have in that person. If the RMT contacts you about something, and obviously they have had to take unusual efforts to contact you, the more it is on you to give a full report and a timeline of when you will address the issue. If they have to chase you around and you give them only a short answer, they will have less confidence in this getting resolved in a timely manner. It is the RMT's responsibility to resolve things in a timely manner, and since they have contacted you, you should be going out of your way to at least give them confidence that it will be dealt with by you, rather than by them. Whether the problem is them not asking for a timeline or you not offering one, the real solution would have been to provide a timeline when they contacted you, since if the RMT is contacting you, it is a serious issue. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 1:38 PM Michael Meskes wrote: > > I don't want to upset anybody for any reason. I regret that my words > > have upset you, but I think that they were misinterpreted in a way > > that I couldn't possibly have predicted. The particular aspect of > > I strongly object to that. It's pretty obvious to me that addressing > people in the third person is very offensive. I think that this must be a cultural thing. I can see how somebody would see the third-person style as overly formal or stilted. An interpretation like that does make sense to me. But I knew of no reason why that style would make the message offensive to you. It was never intended to denigrate. I don't know you all that well, but we have talked for quite a while on a few occasions. I got the sense from these conversations that you are a decent person. I have no possible reason to denigrate or insult you. In general I try to be respectful, and if I ever fail, it's not because I didn't care at all. Anybody that knows me well knows that I am not a mean-spirited person. If this is just an unfortunate misunderstanding, as I suspect it is, then I would be happy to let it go, and treat it as something to learn from. -- Peter Geoghegan
Re: Another regexp performance improvement: skip useless paren-captures
Mark Dilger writes: > I can still trigger the old bug for which we thought we'd pushed a fix. The > test case below crashes on master (e12694523e7e4482a052236f12d3d8b58be9a22c), > and also on the fixed version "Make regexp engine's backref-related > compilation state more bulletproof." > (cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a). > Can you test if it crashes for you, too? I'm not sure I see why this one > fails when millions of others pass. > The backtrace is still complaining about regc_nfa.c:1265: > +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]'); > +server closed the connection unexpectedly Hmmm ... yeah, I see it too. This points up something I'd wondered about before, which is whether the code that "cancels everything" after detecting {0} is really OK. It throws away the outer subre *and children* without worrying about what might be inside, and here we see that that's not good enough --- there's still a v->subs pointer to the first capturing paren set, which we just deleted, so that the \1 later on messes up. I'm not sure why the back branches are managing not to crash, but that might just be a memory management artifact. regards, tom lane
Re: Asynchronous and "direct" IO support for PostgreSQL.
On Wed, Jul 28, 2021 at 1:37 PM Melanie Plageman wrote: > > On Tue, Feb 23, 2021 at 5:04 AM Andres Freund wrote: > > > > ## AIO API overview > > > > The main steps to use AIO (without higher level helpers) are: > > > > 1) acquire an "unused" AIO: pgaio_io_get() > > > > 2) start some IO; this is done by functions like > > pgaio_io_start_(read|write|fsync|flush_range)_(smgr|sb|raw|wal) > > > > The (read|write|fsync|flush_range) part indicates the operation, whereas > > (smgr|sb|raw|wal) determines how IO completions, errors, ... are handled. > > > > (see below for more details about this design choice - it might or might not be > > right) > > > > 3) optionally: assign a backend-local completion callback to the IO > > (pgaio_io_on_completion_local()) > > > > 4) Step 2) alone does *not* cause the IO to be submitted to the kernel; instead it is > > put on a per-backend list of pending IOs. The pending IOs can be explicitly > > flushed with pgaio_submit_pending(), but they will also be submitted if the > > pending list gets to be too large, or if the current backend waits for the > > IO. > > > > There are two main reasons not to submit the IO immediately: > > - If adjacent, we can merge several IOs into one "kernel level" IO during > > submission. Larger IOs are considerably more efficient. > > - Several AIO APIs allow submitting a batch of IOs in one system call. > > > > 5) wait for the IO: pgaio_io_wait() waits for an IO "owned" by the current > > backend. When other backends may need to wait for an IO to finish, > > pgaio_io_ref() can put a reference to that AIO in shared memory (e.g. a > > BufferDesc), which can be waited for using pgaio_io_wait_ref(). > > > > 6) Process the results of the request. If a callback was registered in 3), > > this isn't always necessary. The results of AIO can be accessed using > > pgaio_io_result(), which returns an integer where negative numbers are > > -errno, and positive numbers are the [partial] success conditions > > (e.g. potentially indicating a short read). > > > > 7) release ownership of the IO (pgaio_io_release()) or reuse the IO for > > another operation (pgaio_io_recycle()) > > > > > > Most places that want to use AIO shouldn't themselves need to care about > > managing the number of writes in flight, or the readahead distance. To help > > with that there are two helper utilities, a "streaming read" and a "streaming > > write". > > > > The "streaming read" helper uses a callback to determine which blocks to > > prefetch - that allows doing readahead in a sequential fashion but, importantly, > > also allows asynchronously "reading ahead" non-sequential blocks. > > > > E.g. for vacuum, lazy_scan_heap() has a callback that uses the visibility map > > to figure out which block needs to be read next. Similarly lazy_vacuum_heap() > > uses the tids in LVDeadTuples to figure out which blocks are going to be > > needed. Here's the latter as an example: > > https://github.com/anarazel/postgres/commit/a244baa36bfb252d451a017a273a6da1c09f15a3#diff-3198152613d9a28963266427b380e3d4fbbfabe96a221039c6b1f37bc575b965R1906 > > > > Attached is a patch on top of the AIO branch which does bitmapheapscan > prefetching using the PgStreamingRead helper already used by sequential > scan and vacuum on the AIO branch. > > The prefetch iterator is removed and the main iterator in the > BitmapHeapScanState node is now used by the PgStreamingRead helper. > ... > > Oh, and I haven't done testing to see how effective the prefetching is > -- that is a larger project that I have yet to tackle.
> I have done some testing on how effective it is now. I've also updated the original patch to count the first page (in the lossy/exact page counts mentioned down-thread) as well as to remove unused prefetch fields and comments. I've also included a second patch which adds IO wait time information to EXPLAIN output when used like: EXPLAIN (buffers, analyze) SELECT ... The same commit also introduces a temporary dev GUC io_bitmap_prefetch_depth which I am using to experiment with the prefetch window size. I wanted to share some results from changing the prefetch window to demonstrate how prefetching is working. The short version of my results is that the prefetching works: - with the prefetch window set to 1, the IO wait time is 1550 ms - with the prefetch window set to 128, the IO wait time is 0.18 ms DDL and repro details below: On Andres' AIO branch [1] with my bitmap heapscan prefetching patch set applied built with the following build flags: -02 -fno-omit-frame-pointer --with-liburing And these non-default PostgreSQL settings: io_data_direct=1 io_data_force_async=off io_method=io_uring log_min_duration_statement=0 log_duration=on set track_io_timing to on; set max_parallel_workers_per_gather to 0; set enable_seqscan to off; set enable_indexscan to off; set enable_bitmapscan to on; set effective_io_concurren
Re: Slow standby snapshot
Hello, Andres. Thanks for the feedback again. > The problem is that we don't want to add a lot of work to > KnownAssignedXidsAdd/Remove, because very often nobody will build a snapshot > for that moment and building a sorted, gap-free, linear array of xids isn't > cheap. In my experience it's more common to be bottlenecked by replay CPU > performance than on replica snapshot building. Yes, but my patch adds almost the smallest possible amount of work to KnownAssignedXidsAdd/Remove - a single write to the array by index. It differs from the first version in this thread, which was based on linked lists. The "next valid offset" is just an "optimistic optimization" - it means "you can safely skip ahead by KnownAssignedXidsNext[i] entries while searching for the next valid one". But KnownAssignedXidsNext is not updated by Add/Remove. > During recovery, where there's only one writer to the procarray / known xids, > it might not be hard to opportunistically build a dense version of the known > xids whenever a snapshot is built. AFAIU the patch does exactly that. On the first snapshot build, offsets to the next valid entry are set. So, a dense version is created on demand. And this version is reused (even partially, if something was removed) on the next snapshot build. > I'm not entirely sure what datastructure would work best. I can see something > like a radix tree work well, or a skiplist style approach. Or a hashtable: We could try to use some other structure (for example, a linked hash map) - but the additional cost (memory management, links, hash calculation) would probably significantly reduce performance. And it is a much harder step to take. So, I think the "next valid offset" optimization is a good trade-off for now: * KnownAssignedXidsAdd/Remove are almost unaffected in their complexity * KnownAssignedXidsGetAndSetXmin is eliminated from the top of the CPU profile in the typical read scenario - so, more TPS, less ProcArrayLock contention * it complements GetSnapshotData() scalability - now on standbys * the changes are small Thanks, Michail.
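To illustrate (a self-contained sketch, not the actual patch; the array size, names, and the absence of wraparound handling are simplifications), removal stays a single flag write, while the snapshot scan both uses and lazily refreshes the per-slot skip offsets:

    #include <stdbool.h>
    #include <stdint.h>

    #define KAX_SIZE 65536

    typedef uint32_t TransactionId;

    static TransactionId KnownAssignedXids[KAX_SIZE];
    static bool KnownAssignedXidsValid[KAX_SIZE];
    static int32_t KnownAssignedXidsNext[KAX_SIZE]; /* 0 means "not set yet" */

    /* Removal is O(1): one flag write, no offset maintenance. */
    static void
    kax_remove_at(int slot)
    {
        KnownAssignedXidsValid[slot] = false;
    }

    /*
     * Snapshot scan: collect valid xids in [tail, head).  Jumps use the
     * stored offsets; stale offsets only cost extra hops and never skip a
     * valid slot, because slots are never re-validated in place (new xids
     * are only appended at head).  Each valid entry found fixes up the
     * offset of the previous valid entry for the next scan.
     */
    static int
    kax_get_valid(TransactionId *xarray, int tail, int head)
    {
        int count = 0;
        int prev = -1;
        int i = tail;

        while (i < head)
        {
            if (KnownAssignedXidsValid[i])
            {
                if (prev >= 0)
                    KnownAssignedXidsNext[prev] = (int32_t) (i - prev);
                prev = i;
                xarray[count++] = KnownAssignedXids[i];
            }
            {
                int32_t skip = KnownAssignedXidsNext[i];

                i += (skip > 0) ? skip : 1; /* treat unset offsets as 1 */
            }
        }
        return count;
    }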
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> > > I don't want to upset anybody for any reason. I regret that my > > > words > > > have upset you, but I think that they were misinterpreted in a > > > way > > > that I couldn't possibly have predicted. The particular aspect of > > > > I strongly object to that. It's pretty obvious to me that > > addressing > > people in the third person is very offensive. > > So, you object to him referring to you in the third person in an > email, > and you object to him saying it was "misinterpreted". Are you going > to > object to my email too? No, of course not. And sorry for not being precise enough: I only objected to the prediction part, but I agree, and I take the objection back. I guess it's as difficult for Peter to understand why this is offensive as it is for me to not see it as such. > I think it might have been in the third person because at that point, > Peter didn't expect a reply from you, and put you on the "TO" line > merely as a courtesy. He could have put out an email about reverting > the patch without you on the email header at all, I guess --- then he > could have referred to you without offending you. Right, that was my only problem originally. It seemed difficult to get that point across. > Let me be practical here --- the more someone has to be chased for a > reply, the less confidence they have in that person. If the RMT > contacts you about something, and obviously they have had to take > unusual > efforts to contact you, the more it is on you to give a full report > and > a timeline of when you will address the issue. If they have to chase > you > around and you give them only a short answer, they will have less > confidence in this getting resolved in a timely manner. Again agreed; please keep in mind, though, that I didn't notice I was being chased until Peter's first email. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 11:48:07PM +0200, Michael Meskes wrote: > No, of course not. And sorry for not being precise enough, I only > objected to the prediction part, but I agree, I take the objection > back. I guess it's as difficult for Peter to understand why this is > offensive as it is for me to not see it as such. OK, good. > > Let me be practical here --- the more someone has to be chased for a > > reply, the less confidence they have in that person. If the RMT > > contacts you about something, and obviously they have had to take > > usual > > efforts to contact you, the more it is on you to give a full report > > and > > a timeline of when you will address the issue. If they had to chase > > you > > around, and you gave them a short answer, the less confidence they > > have > > in this getting resolved in a timely manner. > > Again agreed, please keep in mind, though, that I didn't notice I was > being chased until Peter's first email. I was asked by the RMT to contact one of your employees, and I did, to tell you that the RMT was looking for feedback from you on an ecpg issue. Not sure if that was before or after Peter's email. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
Peter, > I think that this must be a cultural thing. I can see how somebody > would see the third-person style as overly formal or stilted. An > interpretation like that does make sense to me. But I knew of no > reason why that style would make the message offensive to you. It > was never intended to denigrate. This explains why it felt so difficult to make myself clear. I was feeling exactly the same, just the other way round. > I don't know you all that well, but we have talked for quite a while > on a few occasions. I got the sense from these conversations that you > are a decent person. I have no possible reason to denigrate or insult > you. In general I try to be respectful, and if I ever fail, it's not > because I didn't care at all. Anybody that knows me well knows that I > am not a mean-spirited person. I never thought that. Maybe we should have quickly talked things out. Email tends to be a bad medium for communication, especially when things go wrong. :) > If this is just an unfortunate misunderstanding, as I suspect it is, > then I would be happy to let it go, and treat it as something to > learn > from. Agreed, me too. I'd like to apologize for not answering your email the way I should have. Honestly, it never occurred to me. Maybe that's because I'm used to other procedures, or because I never had to converse with the RMT, or maybe, quite simply, because I lacked the time to think it through - the original issue that kind of started this whole mess. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 06:00:00PM -0400, Bruce Momjian wrote: > > Again agreed, please keep in mind, though, that I didn't notice I was > > being chased until Peter's first email. > > I was asked by the RMT to contact one of your employees, and I did, to > tell you that the RMT was looking for feedback from you on an ecpg > issue. Not sure if that was before or after Peter's email. The date of that request was July 28 and I was told by your employee that you would be informed that afternoon. If you want the employee's name, I will provide it in a private email. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Tue, Aug 10, 2021 at 12:03:24AM +0200, Michael Meskes wrote: > I'd like to apologize for not answering your email the way I should > have. Honestly it never occurred to me. Maybe that's because I'm used > to other procedures, or because I never had to converse with the RMT, > or maybe, quite simply, because I lacked the time to think it through, > the original issue that kind of started this whole mess. Agreed. When the RMT contacts me, I assume it is something that is time and release critical so I give them as much detail as I can, and a timeline when they will get more information. If you are not focused on the RMT process, it might not be clear why that is important. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> > Again agreed, please keep in mind, though, that I didn't notice I > > was > > being chased until Peter's first email. > > I was asked by the RMT to contact one of your employees, and I did, > to > tell you that the RMT was looking for feedback from you on an ecpg > issue. Not sure if that was before or after Peter's email. I think that was before; at that point I still thought it was nothing time sensitive. And unfortunately it didn't register that the RMT was involved at all. Michael -- Michael Meskes Michael at Fam-Meskes dot De Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org
Re: Another regexp performance improvement: skip useless paren-captures
I wrote: > Hmmm ... yeah, I see it too. This points up something I'd wondered > about before, which is whether the code that "cancels everything" > after detecting {0} is really OK. It throws away the outer subre > *and children* without worrying about what might be inside, and > here we see that that's not good enough --- there's still a v->subs > pointer to the first capturing paren set, which we just deleted, > so that the \1 later on messes up. I'm not sure why the back > branches are managing not to crash, but that might just be a memory > management artifact. ... yeah, it is. For me, this variant hits the assertion in all branches: regression=# select regexp_split_to_array('', '((.)){0}(\2){0}'); server closed the connection unexpectedly So that's a pre-existing (and very long-standing) bug. I'm not sure if it has any serious impact in non-assert builds though. Failure to clean out some disconnected arcs probably has no real effect on the regex's behavior later. regards, tom lane
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On 8/9/21 6:15 PM, Bruce Momjian wrote: > On Tue, Aug 10, 2021 at 12:03:24AM +0200, Michael Meskes wrote: >> I'd like to apologize for not answering your email the way I should >> have. Honestly it never occurred to me. Maybe that's because I'm used >> to other procedures, or because I never had to converse with the RMT, >> or maybe, quite simple, because I lacked the time to think it through, >> the original issue that kind of started this whole mess. > Agreed. When the RMT contacts me, I assume it is something that is time > and release critical so I give them as much detail as I can, and a > timeline when they will get more information. If you are not focused on > the RMT process, it might not be clear why that is important. > That's what you should be doing. If nothing else comes from this colloquy it should make all committers aware of the process. The reason we have an RMT, as I understand it, is to prevent the situation we had years ago when things sometimes dragged on almost interminably after feature freeze. cheers andrew
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
Michael, On Mon, Aug 9, 2021 at 3:03 PM Michael Meskes wrote: > This explains why it felt so difficult to make myself clear. I was > feeling exactly the same, just the other way round. My own special brand of miscommunication was also involved. I happen to be sensitive to the perception that I wield any authority that I might have (as an RMT member, as a committer, whatever) in a way that is arbitrary or unfair. And so I wrote way too much about why that wasn't actually the case here. I now realize that that wasn't really relevant. > I never thought that. Maybe we should have quickly talked things out. > Email tends to be a bad medium for communication, especially when it > goes wrong. :) Indeed. That might well have happened if we had been set up for it already. > I'd like to apologize for not answering your email the way I should > have. Honestly it never occurred to me. Maybe that's because I'm used > to other procedures, or because I never had to converse with the RMT, > or maybe, quite simply, because I lacked the time to think it through, > the original issue that kind of started this whole mess. I think that there was a snowballing effect here. We both made mistakes that compounded. I apologize for my role in this whole mess. -- Peter Geoghegan
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 03:42:45PM -0700, Peter Geoghegan wrote: > > I'd like to apologize for not answering your email the way I should > > have. Honestly it never occurred to me. Maybe that's because I'm used > > to other procedures, or because I never had to converse with the RMT, > > or maybe, quite simply, because I lacked the time to think it through, > > the original issue that kind of started this whole mess. > > I think that there was a snowballing effect here. We both made > mistakes that compounded. I apologize for my role in this whole mess. I do think all committers need to understand the role of the RMT so they can respond appropriately. Do we need to communicate this better? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Estimating HugePages Requirements?
On Mon, Aug 9, 2021 at 3:57 PM Bossart, Nathan wrote: > On 6/9/21, 8:09 PM, "Bossart, Nathan" wrote: > > On 6/9/21, 3:51 PM, "Mark Dilger" wrote: > >>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan > wrote: > >>> > >>> I'd be happy to clean it up and submit it for > >>> discussion in pgsql-hackers@ if there is interest. > >> > >> Yes, I'd like to see it. Thanks for offering. > > > > Here's the general idea. It still needs a bit of polishing, but I'm > > hoping this is enough to spark some discussion on the approach. > > Here's a rebased version of the patch. > > Nathan > > Hi, -extern void CreateSharedMemoryAndSemaphores(void); +extern Size CreateSharedMemoryAndSemaphores(bool size_only); Should the parameter be an enum / bitmask so that future additions would not change the method signature? Cheers
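For illustration, a minimal sketch of the enum/bitmask variant being suggested might look like this; the type name and flag values are hypothetical, not taken from the posted patch:

typedef enum CreateShmemFlags
{
    SHMEM_CREATE_DEFAULT = 0,   /* allocate and initialize shared memory */
    SHMEM_SIZE_ONLY = 1 << 0    /* compute and return the size, allocate nothing */
} CreateShmemFlags;

extern Size CreateSharedMemoryAndSemaphores(int flags);

A caller wanting only the estimate would pass SHMEM_SIZE_ONLY, and a future mode could be added as another flag bit without changing the signature again.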
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 9, 2021 at 3:51 PM Bruce Momjian wrote: > > I think that there was a snowballing effect here. We both made > > mistakes that compounded. I apologize for my role in this whole mess. > > I do think all committers need to understand the role of the RMT so they > can respond appropriately. Do we need to communicate this better? I think that it makes sense to codify the practical expectations that the community has of existing committers, at least to some degree. I mean why wouldn't we? The resulting document (an addition to the "committers" Postgres wiki page?) would inevitably leave certain questions open to interpretation. That seems okay to me. I don't feel qualified to write this myself. Just my opinion. -- Peter Geoghegan
Re: Another regexp performance improvement: skip useless paren-captures
Mark Dilger writes: > +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]'); > +server closed the connection unexpectedly Here's a quick draft patch for this. Basically it moves the responsibility for clearing v->subs[] pointers into the freesubre() recursion, so that it will happen for contained capturing parens not only the top level. There is a potentially interesting definitional question: what exactly ought this regexp do? ((.)){0}\2 Because the capturing paren sets are zero-quantified, they will never be matched to any characters, so the backref can never have any defined referent. I suspect that study of the POSIX spec would lead to the conclusion that this is a legal regexp but it will never match anything. Implementing that would be tedious though, and what's more it seems very unlikely that the user wanted any such behavior. So I think throwing an error is an appropriate response. The existing code will throw such an error for ((.)){0}\1 so I guess Spencer did think about this to some extent -- he just forgot about the possibility of nested parens. This patch should work OK in HEAD and v14, but it will need a bit of fooling-about for older branches I think, given that they fill v->subs[] a little differently. regards, tom lane diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c index ae3a7b6a38..ae1ff670f9 100644 --- a/src/backend/regex/regcomp.c +++ b/src/backend/regex/regcomp.c @@ -550,8 +550,6 @@ freev(struct vars *v, { if (v->re != NULL) rfree(v->re); - if (v->subs != v->sub10) - FREE(v->subs); if (v->nfa != NULL) freenfa(v->nfa); if (v->tree != NULL) @@ -562,6 +560,8 @@ freev(struct vars *v, freecvec(v->cv); if (v->cv2 != NULL) freecvec(v->cv2); + if (v->subs != v->sub10) + FREE(v->subs); if (v->lacons != NULL) freelacons(v->lacons, v->nlacons); ERR(err); /* nop if err==0 */ @@ -1091,8 +1091,8 @@ parseqatom(struct vars *v, { if (atom != NULL) freesubre(v, atom); - if (atomtype == '(') - v->subs[subno] = NULL; + /* freesubre() will clean up any v->subs[] pointers into the atom */ + assert(atomtype != '(' || v->subs[subno] == NULL); delsub(v->nfa, lp, rp); EMPTYARC(lp, rp); return top; @@ -2127,9 +2127,24 @@ freesrnode(struct vars *v, /* might be NULL */ if (sr == NULL) return; + /* + * If the node is referenced in v->subs[], clear that to avoid leaving a + * dangling pointer. This should only ever matter when parseqatom() is + * throwing away a {0,0}-quantified atom that contains capturing parens. + * Doing this will cause later backrefs to such parens to fail, which is + * maybe not strictly POSIX but seems more helpful than allowing a backref + * that can never have a defined referent. 
+ */ + if (sr->capno > 0 && v != NULL) + { + assert(v->subs[sr->capno] == sr); + v->subs[sr->capno] = NULL; + } + if (!NULLCNFA(sr->cnfa)) freecnfa(&sr->cnfa); sr->flags = 0;/* in particular, not INUSE */ + sr->capno = 0; sr->child = sr->sibling = NULL; sr->begin = sr->end = NULL; diff --git a/src/test/modules/test_regex/expected/test_regex.out b/src/test/modules/test_regex/expected/test_regex.out index 5a6cdf47c2..850e0c8ef5 100644 --- a/src/test/modules/test_regex/expected/test_regex.out +++ b/src/test/modules/test_regex/expected/test_regex.out @@ -3506,6 +3506,10 @@ select * from test_regex('((.))(\2)', 'xyy', 'oRP'); {yy,NULL,NULL,NULL} (2 rows) +-- # throwing an error for this is arguably not per POSIX +-- expectError 21.39 - {((.)){0}(\2){0}} ESUBREG +select * from test_regex('((.)){0}(\2){0}', '', '-'); +ERROR: invalid regular expression: invalid backreference number -- doing 22 "multicharacter collating elements" -- # again ugh -- MCCEs are not implemented in Postgres, so we skip all these tests diff --git a/src/test/modules/test_regex/sql/test_regex.sql b/src/test/modules/test_regex/sql/test_regex.sql index 3419564203..378f22b67b 100644 --- a/src/test/modules/test_regex/sql/test_regex.sql +++ b/src/test/modules/test_regex/sql/test_regex.sql @@ -1020,6 +1020,9 @@ select * from test_regex('((.))(\2){0}', 'xy', 'RPQ'); select * from test_regex('((.))(\2)', 'xyy', 'RP'); -- expectMatch 21.38 oRP ((.))(\2) xyy yy {} {} {} select * from test_regex('((.))(\2)', 'xyy', 'oRP'); +-- # throwing an error for this is arguably not per POSIX +-- expectError 21.39 - {((.)){0}(\2){0}} ESUBREG +select * from test_regex('((.)){0}(\2){0}', '', '-'); -- doing 22 "multicharacter collating elements" -- # again ugh
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 12:14 PM, Tom Lane wrote: > > Pushed, but while re-reading it before commit I noticed that there's > some more fairly low-hanging fruit in regexp_replace(). As I had it > in that patch, it never used REG_NOSUB because of the possibility > that the replacement string uses "\N". However, we're already > pre-checking the replacement string to see if it has backslashes > at all, so while we're at it we can check for \N to discover if we > actually need any subexpression match data or not. We do need to > refactor a little to postpone calling pg_regcomp until after we > know that, but I think that makes replace_text_regexp's API less > ugly not more so. > > While I was at it, I changed the search-for-backslash loops to > use memchr rather than handwritten looping. Their use of > pg_mblen was pretty unnecessary given we only need to find > backslashes, and we can assume the backend encoding is ASCII-safe. > > Using a bunch of random cases generated by your little perl > script, I see maybe 10-15% speedup on test cases that don't > use \N in the replacement string, while it's about a wash > on cases that do. (If I'd been using a multibyte encoding, > maybe the memchr change would have made a difference, but > I didn't try that.) I've been reviewing and testing this (let-regexp_replace-use-NOSUB.patch) since you sent it 4 hours ago, and I can't seem to break it. There are pre-existing problems in the regex code, but this doesn't seem to add any new breakage. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
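As a rough sketch of the memchr-based scan Tom describes (assuming an ASCII-safe backend encoding, as noted above; this is illustrative, not the committed code), a check for \N escapes in the replacement string could look like:

#include <stdbool.h>
#include <string.h>

/*
 * Sketch: true if the replacement string contains a \N escape, meaning
 * subexpression match data will be needed.  memchr() is safe because a
 * backslash cannot appear as a trailing byte of a multibyte character
 * in an ASCII-safe encoding.
 */
static bool
replacement_has_backref(const char *str, size_t len)
{
    const char *end = str + len;
    const char *p = str;

    while ((p = memchr(p, '\\', end - p)) != NULL)
    {
        p++;                    /* look at the escaped character */
        if (p < end && *p >= '1' && *p <= '9')
            return true;        /* found \N */
        if (p < end)
            p++;                /* not a backreference; skip it */
    }
    return false;
}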
Re: Estimating HugePages Requirements?
On 8/9/21, 4:05 PM, "Zhihong Yu" wrote: > -extern void CreateSharedMemoryAndSemaphores(void); > +extern Size CreateSharedMemoryAndSemaphores(bool size_only); > > Should the parameter be enum / bitmask so that future addition would not > change the method signature ? I don't have a strong opinion about this. I don't feel that it's really necessary, but if reviewers want a bitmask instead, I can change it. Nathan
Re: Estimating HugePages Requirements?
On Thu, Jun 10, 2021 at 07:23:33PM -0500, Justin Pryzby wrote: > On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote: > > On Wed, Jun 9, 2021, 21:03 P C wrote: > > > > > I agree, its confusing for many and that confusion arises from the fact > > > that you usually talk of shared_buffers in MB or GB whereas hugepages have > > > to be configured in units of 2mb. But once they understand they realize > > > its > > > pretty simple. > > > > > > Don, we have experienced the same not just with postgres but also with > > > oracle. I havent been able to get to the root of it, but what we usually > > > do > > > is, we add another 100-200 pages and that works for us. If the SGA or > > > shared_buffers is high eg 96gb, then we add 250-500 pages. Those few > > > hundred MBs may be wasted (because the moment you configure hugepages, > > > the > > > operating system considers it as used and does not use it any more) but > > > nowadays, servers have 64 or 128 gb RAM easily and wasting that 500mb to > > > 1gb does not hurt really. > > > > I don't have a problem with the math, just wanted to know if it was > > possible to better estimate what the actual requirements would be at > > deployment time. My fallback will probably be what you did and just pad with an > > extra 512MB by default. > > It's because the huge allocation isn't just shared_buffers, but also > wal_buffers: > > | The amount of shared memory used for WAL data that has not yet been written > to disk. > | The default setting of -1 selects a size equal to 1/32nd (about 3%) of > shared_buffers, ... > > .. and other stuff: I wonder if this shouldn't be solved the other way around: Define shared_buffers as the exact size to be allocated/requested from the OS (regardless of whether they're huge pages or not), and have postgres compute everything else based on that. So shared_buffers=2GB would end up being 1950MB (or so) of buffer cache. We'd have to check that after the other allocations, there's still at least 128kB left for the buffer cache. Maybe we'd have to bump the minimum value of shared_buffers. > src/backend/storage/ipc/ipci.c > * Size of the Postgres shared-memory block is estimated via > * moderately-accurate estimates for the big hogs, plus 100K for the > * stuff that's too small to bother with estimating. > * > * We take some care during this phase to ensure that the total size > * request doesn't overflow size_t. If this gets through, we don't > * need to be so careful during the actual allocation phase.
> */ > size = 100000; > size = add_size(size, PGSemaphoreShmemSize(numSemas)); > size = add_size(size, SpinlockSemaSize()); > size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE, > > sizeof(ShmemIndexEnt))); > size = add_size(size, dsm_estimate_size()); > size = add_size(size, BufferShmemSize()); > size = add_size(size, LockShmemSize()); > size = add_size(size, PredicateLockShmemSize()); > size = add_size(size, ProcGlobalShmemSize()); > size = add_size(size, XLOGShmemSize()); > size = add_size(size, CLOGShmemSize()); > size = add_size(size, CommitTsShmemSize()); > size = add_size(size, SUBTRANSShmemSize()); > size = add_size(size, TwoPhaseShmemSize()); > size = add_size(size, BackgroundWorkerShmemSize()); > size = add_size(size, MultiXactShmemSize()); > size = add_size(size, LWLockShmemSize()); > size = add_size(size, ProcArrayShmemSize()); > size = add_size(size, BackendStatusShmemSize()); > size = add_size(size, SInvalShmemSize()); > size = add_size(size, PMSignalShmemSize()); > size = add_size(size, ProcSignalShmemSize()); > size = add_size(size, CheckpointerShmemSize()); > size = add_size(size, AutoVacuumShmemSize()); > size = add_size(size, ReplicationSlotsShmemSize()); > size = add_size(size, ReplicationOriginShmemSize()); > size = add_size(size, WalSndShmemSize()); > size = add_size(size, WalRcvShmemSize()); > size = add_size(size, PgArchShmemSize()); > size = add_size(size, ApplyLauncherShmemSize()); > size = add_size(size, SnapMgrShmemSize()); > size = add_size(size, BTreeShmemSize()); > size = add_size(size, SyncScanShmemSize()); > size = add_size(size, AsyncShmemSize()); > #ifdef EXEC_BACKEND > size = add_size(size, ShmemBackendArraySize()); > #endif > > /* freeze the addin request size and include it */ > addin_request_allowed = false; > size = add_size(size, total_addin_request); > > /* might as well round it off to a multiple of a typical page size */ > size = add_size(size, 8192 - (size % 8192)); > > BTW, I think it'd be nice if this were a NOTICE: > | elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
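To picture the padding question Justin raises: once the add_size() chain above has produced a total, all that remains is rounding it up to the huge page size. A sketch, not from any posted patch (Size is PostgreSQL's size_t typedef, and hugepagesize would come from the OS, e.g. 2 MB):

static Size
round_up_to_huge_pages(Size request, Size hugepagesize)
{
    /* round up to the next multiple of hugepagesize, if huge pages are in use */
    if (hugepagesize > 0 && request % hugepagesize != 0)
        request += hugepagesize - (request % hugepagesize);
    return request;
}

The slack added by this rounding, plus the non-shared_buffers consumers listed above, is what users currently approximate by padding their vm.nr_hugepages setting by hand.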
Re: Autovacuum on partitioned table (autoanalyze)
Hello, On 2021-Jul-22, Andres Freund wrote: > 1) Somehow it seems like a violation to do stuff like > get_partition_ancestors() in pgstat.c. It's nothing I can't live with, but > it feels a bit off. Would likely not be too hard to address, e.g. by just > putting some of pgstat_report_anl_ancestors in partition.c instead. I understand the complaint about this being a modularity violation -- the point being that pgstat.c has no business accessing system catalogs at all. Before this function, all pgstat_report_* functions were just assembling a message from counters accumulated somewhere and sending the bytes to the collector, and this new function is a deviation from that. It seems that we could improve this by having a function (maybe in partition.c as you propose), something like

static void
report_partition_ancestors(Oid relid)
{
    List       *ancestors = get_partition_ancestors( ... );
    Oid        *array = palloc(sizeof(Oid) * list_length(ancestors));
    ListCell   *lc;
    int         i = 0;

    foreach(lc, ancestors)
    {
        array[i++] = lfirst_oid(lc);
    }
    pgstat_report_partition_ancestors(relid, array);
}

and then pgstat.c works with the given array without having to consult system catalogs. > 2) Why does it make sense that autovacuum sends a stats message for every > partition in the system that had any [changes] since the last autovacuum > cycle? On a database with a good number of objects / a short naptime we'll > often end up sending messages for the same set of tables from separate > workers, because they don't yet see the concurrent > tabentry->changes_since_analyze_reported. The traffic could be large, yeah, and I agree it seems undesirable. If the collector kept a record of the list of ancestors of each table, then we wouldn't need to do this (we would have to know if the collector knows a particular partition or not, though ... I have no ideas on that.) > 3) What is the goal of the autovac_refresh_stats() after the loop doing > pgstat_report_anl_ancestors()? I think it'll be common that the stats > collector hasn't even processed the incoming messages by that point, not to > speak of actually having written out a new stats file. If it took less than > 10ms (PGSTAT_RETRY_DELAY) to get to autovac_refresh_stats(), > backend_read_statsfile() will not wait for a new stats file to be written > out, and we'll just re-read the state we previously did. > > It's pretty expensive to re-read the stats file in some workloads, so I'm a > bit concerned that we end up significantly increasing the amount of stats > updates/reads, without actually gaining anything reliable? This is done once per autovacuum run and the point is precisely to let the next block absorb the updates that were sent. In manual ANALYZE we do it to inform future autovacuum runs. Note that the PGSTAT_RETRY_DELAY limit is used by the autovac launcher only, and this code is running in the worker; we do flush out the old data. Yes, it's expensive, but we're not doing it once per table, just once per worker run. > 4) In the shared mem stats patch I went to a fair bit of trouble to try to get > rid of pgstat_vacuum_stat() (which scales extremely poorly to larger > systems). For that to work pending stats can only be "staged" while holding > a lock on a relation that prevents the relation from being concurrently > dropped (pending stats increment a refcount for the shared stats object, > which ensures that we don't lose track of the fact that a stats object has > been dropped, even when stats only get submitted later). > > I'm not yet clear on how to make this work for > pgstat_report_anl_ancestors() - but I probably can find a way.
> But it does feel a bit off to issue stats stuff for tables we're not sure still exist. I assume you refer to locking the *partition*, right? You're not talking about locking the ancestor mentioned in the message. I don't know how the shmem-collector works, but it shouldn't be a problem that an ancestor goes away (ALTER TABLE parent DETACH; DROP TABLE parent); as long as you've kept a lock on the partition, it should be fine. Or am I misinterpreting what you mean? > I'll go and read through the thread, but my first thought is that having a > hashtable in do_autovacuum() that contains stats for partitioned tables would > be a good bit more efficient than the current approach? We already have a > hashtable for each toast table, compared to that having a hashtable for each > partitioned table doesn't seem like it'd be a problem? > With a small bit of extra work that could even avoid the need for the > additional pass through pg_class. Do the partitioned table data-gathering as > part of the "collect main tables to vacuum" pass, and then do one of I'll have to re-read the thread to remember why I made it a separate pass. I think I did it that way because otherwise there was a requirement on the pg_class scan order. (Some earlier vers
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 4:31 PM, Tom Lane wrote: > > There is a potentially interesting definitional question: > what exactly ought this regexp do? > > ((.)){0}\2 > > Because the capturing paren sets are zero-quantified, they will > never be matched to any characters, so the backref can never > have any defined referent. Perl regular expressions are not POSIX, but if there is a principled reason POSIX should differ from perl on this, we should be clear what that is: #!/usr/bin/perl use strict; use warnings; our $match; if ('foo' =~ m/((.)(??{ die; })){0}(..)/) { print "captured 1 $1\n" if defined $1; print "captured 2 $2\n" if defined $2; print "captured 3 $3\n" if defined $3; print "captured 4 $4\n" if defined $4; print "match = $match\n" if defined $match; } This will print "captured 3 fo", proving that although the regular expression is parsed with the (..) bound to the third capture group, the first two capture groups never run. If you don't believe that, change the {0} to {1} and observe that the script dies. > So I think throwing an > error is an appropriate response. The existing code will > throw such an error for > > ((.)){0}\1 > > so I guess Spencer did think about this to some extent -- he > just forgot about the possibility of nested parens. Ugg. That means our code throws an error where perl does not, pretty well negating my point above. If we're already throwing an error for this type of thing, I agree we should be consistent about it. My personal preference would have been to do the same thing as perl, but it seems that ship has already sailed. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 5:14 PM, Mark Dilger wrote: > >our $match; >if ('foo' =~ m/((.)(??{ die; })){0}(..)/) I left in a stray variable. A prior version of this script was assigning to $match where it now has die. Sorry for any confusion. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Another regexp performance improvement: skip useless paren-captures
Mark Dilger writes: >> On Aug 9, 2021, at 12:14 PM, Tom Lane wrote: >> Pushed, but while re-reading it before commit I noticed that there's >> some more fairly low-hanging fruit in regexp_replace(). > I've been reviewing and testing this (let-regexp_replace-use-NOSUB.patch) > since you sent it 4 hours ago, and I can't seem to break it. There are > pre-existing problems in the regex code, but this doesn't seem to add any new > breakage. Pushed that bit, thanks for testing! I plan to not do anything about the (()){0} bug until after the release window, since that will need to be back-patched. That bug's gotta be twenty years old, so while I kinda wish we'd found it a few days earlier, waiting another 3 months won't make much difference. regards, tom lane
Re: Small documentation improvement for ALTER SUBSCRIPTION
On Mon, Aug 9, 2021 at 1:01 PM Peter Smith wrote: > > On Mon, Aug 9, 2021 at 12:46 PM Amit Kapila wrote: > > > > On Sun, Aug 8, 2021 at 10:21 AM Peter Smith wrote: > > > > > > On Sat, Aug 7, 2021 at 4:33 PM Amit Kapila > > > wrote: > > > > > > > > On Thu, Jul 8, 2021 at 6:31 PM Masahiko Sawada > > > > wrote: > > > > > > > > > > Hi all, > > > > > > > > > > When reading the doc of ALTER SUBSCRIPTION I realized that 'refresh > > > > > options' in the following paragraph is not tagged: > > > > > > > > > > --- > > > > > Additionally, refresh options as described under REFRESH PUBLICATION > > > > > may be specified, except in the case of DROP PUBLICATION. > > > > > --- > > > > > > > > > > When I read it for the first time, I got confused because we actually > > > > > have the 'refresh' option and this description in the paragraph of the > > > > > 'refresh' option. I think we can improve it by changing to > > > > > 'refresh_option'. Thoughts? > > > > > > > > > > > > > I see that one can get confused but how about changing it to > > > > "Additionally, refresh options as described under REFRESH > > > > PUBLICATION (refresh_option) may > > > > be specified,.."? I think keeping "refresh options" in the text would > > > > be good because there could be multiple such options. > > > > > > > > > > I feel like it would be better to reword it in some way that avoids > > > using parentheses because they look like part of the syntax instead of > > > just part of the sentence. > > > > > > > Fair enough, feel free to propose if you find something better or if > > you think the current text in the docs is good. > > > Thank you for the comments! > IMO just the same as your suggestion but without the parens would be good. > e.g. > > "Additionally, refresh options as described under REFRESH > PUBLICATION refresh_option may be > specified,.." But "REFRESH PUBLICATION refresh_option" seems wrong in terms of SQL syntax, not? Given there could be multiple options how about using "refresh_options"? That is, the sentence will be: Additionally, refresh_options as described under REFRESH PUBLICATION may be specified, except in the case of DROP PUBLICATION. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Another regexp performance improvement: skip useless paren-captures
Mark Dilger writes: >> On Aug 9, 2021, at 4:31 PM, Tom Lane wrote: >> There is a potentially interesting definitional question: >> what exactly ought this regexp do? >> ((.)){0}\2 >> Because the capturing paren sets are zero-quantified, they will >> never be matched to any characters, so the backref can never >> have any defined referent. > Perl regular expressions are not POSIX, but if there is a principled reason > POSIX should differ from perl on this, we should be clear what that is: > if ('foo' =~ m/((.)(??{ die; })){0}(..)/) > { > print "captured 1 $1\n" if defined $1; > print "captured 2 $2\n" if defined $2; > print "captured 3 $3\n" if defined $3; > print "captured 4 $4\n" if defined $4; > print "match = $match\n" if defined $match; > } Hm. I'm not sure that this example proves anything about Perl's handling of the situation, since you didn't use a backref. I tried both if ('foo' =~ m/((.)){0}\1/) if ('foo' =~ m/((.)){0}\2/) and while neither throws an error, they don't succeed either. So AFAICS Perl is acting in the way I'm attributing to POSIX. But maybe we should actually read POSIX ... >> ... I guess Spencer did think about this to some extent -- he >> just forgot about the possibility of nested parens. > Ugg. That means our code throws an error where perl does not, pretty > well negating my point above. If we're already throwing an error for > this type of thing, I agree we should be consistent about it. My > personal preference would have been to do the same thing as perl, but it > seems that ship has already sailed. Removing an error case is usually an easier sell than adding one. However, the fact that the simplest case (viz, '(.){0}\1') has always thrown an error and nobody has complained in twenty-ish years suggests that nobody much cares. regards, tom lane
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 6:11 PM, Tom Lane wrote: > > Hm. I'm not sure that this example proves anything about Perl's handling > of the situation, since you didn't use a backref. Well, this doesn't die either: if ('foo' =~ m/((??{ die; })(.)(??{ die $1; })){0}((??{ die "got here"; })|\2)/) { print "matched\n"; } The point is that the regex engine never walks the part of the pattern that {0} qualifies. I thought it was more clear in the prior example, because that example proves that the engine does get as far as capturing. This example also does that, and with a backref, because it dies with "got here". — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: replay of CREATE TABLESPACE eats data at wal_level=minimal
On Mon, Aug 09, 2021 at 01:08:42PM -0400, Robert Haas wrote: > To reproduce, initialize a cluster with wal_level=minimal and > max_wal_senders=0. Then from psql: > > \! mkdir /tmp/goose > > CHECKPOINT; > CREATE TABLESPACE goose LOCATION '/tmp/goose'; > SET wal_skip_threshold=0; > BEGIN; > CREATE TABLE wild (a int, b text) TABLESPACE goose; > INSERT INTO wild VALUES (1, 'chase'); > COMMIT; > SELECT * FROM wild; > > As expected, you will see one row in table 'wild'. Now perform an > immediate shutdown. Restart the server. Table 'wild' is now empty. Thanks for finding the problem. It's a bad problem. > The problem appears to be that tblspc_redo() calls > create_tablespace_directories(), which says: > > /* > * Our theory for replaying a CREATE is to forcibly drop the target > * subdirectory if present, and then recreate it. This may be more > * work than needed, but it is simple to implement. > */ > > Unfortunately, this theory (which dates to > c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is > correct only with wal_level>minimal. At wal_level='minimal', we can > replay the record to recreate the relfilenode, but not the records > that would have created the contents. However, note that if the table > is smaller than wal_skip_threshold, then we'll log full-page images of > the contents at commit time even at wal_level='minimal' after which we > have no problem. As far as I can see, this bug has "always" existed, > but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you > would have needed a different test case. Specifically, you would have > needed to use COPY to put the row in the table, and you would have > needed to omit setting wal_skip_threshold since it didn't exist yet. Agreed. > I don't presently have a specific idea about how to fix this. Can't recovery just not delete the directory, create it if it doesn't exist, and be happy if it does exist? Like the attached WIP. If we think it's possible for a crash during mkdir to leave a directory having the wrong permissions, adding a chmod would be in order. diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index a54239a..2928ecf 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -589,7 +589,6 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) { char *linkloc; char *location_with_version_dir; - struct stat st; linkloc = psprintf("pg_tblspc/%u", tablespaceoid); location_with_version_dir = psprintf("%s/%s", location, @@ -614,39 +613,22 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid) location))); } - if (InRecovery) - { - /* - * Our theory for replaying a CREATE is to forcibly drop the target - * subdirectory if present, and then recreate it. This may be more - * work than needed, but it is simple to implement. - */ - if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode)) - { - if (!rmtree(location_with_version_dir, true)) - /* If this failed, MakePGDirectory() below is going to error. */ - ereport(WARNING, - (errmsg("some useless files may be left behind in old database directory \"%s\"", - location_with_version_dir))); - } - } - /* * The creation of the version directory prevents more than one tablespace * in a single location.
*/ if (MakePGDirectory(location_with_version_dir) < 0) { - if (errno == EEXIST) - ereport(ERROR, - (errcode(ERRCODE_OBJECT_IN_USE), - errmsg("directory \"%s\" already in use as a tablespace", - location_with_version_dir))); - else + if (errno != EEXIST) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create directory \"%s\": %m", location_with_version_dir))); + else if (!InRecovery) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_IN_USE), + errmsg("directory \"%s\" already in use as a tablespace", + location_with_version_dir))); } /*
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 6:17 PM, Mark Dilger wrote: > > Well, this doesn't die either: Meaning it doesn't die in the part of the pattern qualified by {0} either. It does die in the other part. Sorry again for the confusion. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Another regexp performance improvement: skip useless paren-captures
> On Aug 9, 2021, at 4:31 PM, Tom Lane wrote: > > This patch should work OK in HEAD and v14, but it will need > a bit of fooling-about for older branches I think, given that > they fill v->subs[] a little differently. Note that I tested your patch *before* master, so the changes look backwards. I tested this fix and it seems good to me. Some patterns that used to work (by chance?) now raise an error, such as: select 'bpgouiwcquu' ~ '[e])*?)){0,0}?(\3))'; -ERROR: invalid regular expression: invalid backreference number + ?column? +-- + f +(1 row) I ran a lot of tests with the patch, and the asserts have all cleared up, but I don't know how to think about the user facing differences. If we had a good reason for raising an error for these back-references, maybe that'd be fine, but it seems to just be an implementation detail. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Small documentation improvement for ALTER SUBSCRIPTION
On Tue, Aug 10, 2021 at 11:01 AM Masahiko Sawada wrote: > > On Mon, Aug 9, 2021 at 1:01 PM Peter Smith wrote: > > > > On Mon, Aug 9, 2021 at 12:46 PM Amit Kapila wrote: > > > > > > On Sun, Aug 8, 2021 at 10:21 AM Peter Smith wrote: > > > > > > > > On Sat, Aug 7, 2021 at 4:33 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Thu, Jul 8, 2021 at 6:31 PM Masahiko Sawada > > > > > wrote: > > > > > > > > > > > > Hi all, > > > > > > > > > > > > When reading the doc of ALTER SUBSCRIPTION I realized that 'refresh > > > > > > options' in the following paragraph is not tagged: > > > > > > > > > > > > --- > > > > > > Additionally, refresh options as described under REFRESH PUBLICATION > > > > > > may be specified, except in the case of DROP PUBLICATION. > > > > > > --- > > > > > > > > > > > > When I read it for the first time, I got confused because we > > > > > > actually > > > > > > have the 'refresh' option and this description in the paragraph of > > > > > > the > > > > > > 'refresh' option. I think we can improve it by changing to > > > > > > 'refresh_option'. Thoughts? > > > > > > > > > > > > > > > > I see that one can get confused but how about changing it to > > > > > "Additionally, refresh options as described under REFRESH > > > > > PUBLICATION (refresh_option) may > > > > > be specified,.."? I think keeping "refresh options" in the text would > > > > > be good because there could be multiple such options. > > > > > > > > > > > > > I feel like it would be better to reword it in some way that avoids > > > > using parentheses because they look like part of the syntax instead of > > > > just part of the sentence. > > > > > > > > > > Fair enough, feel free to propose if you find something better or if > > > you think the current text in the docs is good. > > > > > > > Thank you for the comments! > > > IMO just the same as your suggestion but without the parens would be good. > > e.g. > > > > "Additionally, refresh options as described under REFRESH > > PUBLICATION refresh_option may be > > specified,.." > > But "REFRESH PUBLICATION refresh_option" seems wrong in terms of SQL > syntax, not? > Because the sentence says "... as described under ..." I thought it was clear enough it was referring to the documentation below and not the SQL syntax. > Given there could be multiple options how about using > "refresh_options"? That is, the sentence > will be: > > Additionally, refresh_options as described > under REFRESH PUBLICATION may be specified, > except in the case of DROP PUBLICATION. > +1 LGTM -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Another regexp performance improvement: skip useless paren-captures
Mark Dilger writes: > I ran a lot of tests with the patch, and the asserts have all cleared up, but > I don't know how to think about the user facing differences. If we had a > good reason for raising an error for these back-references, maybe that'd be > fine, but it seems to just be an implementation detail. I thought about this some more, and I'm coming around to the idea that throwing an error is the wrong thing. As a contrary example, consider (.)|(\1\1) We don't throw an error for this, and neither does Perl, even though the capturing parens can never be defined in the branch where the backrefs are. So it seems hard to argue that this is okay but the other thing isn't. Another interesting example is (.){0}(\1){0} I think that the correct interpretation is that this is a valid regexp matching an empty string (i.e., zero repetitions of each part), even though neither capture group will be defined. That's different from (.){0}(\1) which can never match. So I took another look at the code, and it doesn't seem that hard to make it act this way. The attached passes regression, but I've not beat on it with random strings. regards, tom lane diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c index ae3a7b6a38..d9840171a3 100644 --- a/src/backend/regex/regcomp.c +++ b/src/backend/regex/regcomp.c @@ -1089,11 +1089,23 @@ parseqatom(struct vars *v, /* annoying special case: {0} or {0,0} cancels everything */ if (m == 0 && n == 0) { - if (atom != NULL) - freesubre(v, atom); - if (atomtype == '(') - v->subs[subno] = NULL; - delsub(v->nfa, lp, rp); + /* + * If we had capturing subexpression(s) within the atom, we don't want + * to destroy them, because it's legal (if useless) to back-ref them + * later. Hence, just unlink the atom from lp/rp and then ignore it. 
+ */ + if (atom != NULL && (atom->flags & CAP)) + { + delsub(v->nfa, lp, atom->begin); + delsub(v->nfa, atom->end, rp); + } + else + { + /* Otherwise, we can clean up any subre infrastructure we made */ + if (atom != NULL) +freesubre(v, atom); + delsub(v->nfa, lp, rp); + } EMPTYARC(lp, rp); return top; } diff --git a/src/test/modules/test_regex/expected/test_regex.out b/src/test/modules/test_regex/expected/test_regex.out index 5a6cdf47c2..96c0e6fa59 100644 --- a/src/test/modules/test_regex/expected/test_regex.out +++ b/src/test/modules/test_regex/expected/test_regex.out @@ -3506,6 +3506,21 @@ select * from test_regex('((.))(\2)', 'xyy', 'oRP'); {yy,NULL,NULL,NULL} (2 rows) +-- expectMatch 21.39 NPQR {((.)){0}(\2){0}} xyz {} {} {} {} +select * from test_regex('((.)){0}(\2){0}', 'xyz', 'NPQR'); + test_regex + + {3,REG_UBACKREF,REG_UBOUNDS,REG_UNONPOSIX,REG_UEMPTYMATCH} + {"",NULL,NULL,NULL} +(2 rows) + +-- expectNomatch 21.40 PQR {((.)){0}(\2)} xxx +select * from test_regex('((.)){0}(\2)', 'xxx', 'PQR'); + test_regex + + {3,REG_UBACKREF,REG_UBOUNDS,REG_UNONPOSIX} +(1 row) + -- doing 22 "multicharacter collating elements" -- # again ugh -- MCCEs are not implemented in Postgres, so we skip all these tests diff --git a/src/test/modules/test_regex/sql/test_regex.sql b/src/test/modules/test_regex/sql/test_regex.sql index 3419564203..29806e9417 100644 --- a/src/test/modules/test_regex/sql/test_regex.sql +++ b/src/test/modules/test_regex/sql/test_regex.sql @@ -1020,6 +1020,10 @@ select * from test_regex('((.))(\2){0}', 'xy', 'RPQ'); select * from test_regex('((.))(\2)', 'xyy', 'RP'); -- expectMatch 21.38 oRP ((.))(\2) xyy yy {} {} {} select * from test_regex('((.))(\2)', 'xyy', 'oRP'); +-- expectMatch 21.39 NPQR {((.)){0}(\2){0}} xyz {} {} {} {} +select * from test_regex('((.)){0}(\2){0}', 'xyz', 'NPQR'); +-- expectNomatch 21.40 PQR {((.)){0}(\2)} xxx +select * from test_regex('((.)){0}(\2)', 'xxx', 'PQR'); -- doing 22 "multicharacter collating elements" -- # again ugh
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Sat, Aug 7, 2021 at 2:36 PM Amit Kapila wrote: > > On Fri, Aug 6, 2021 at 9:57 PM Japin Li wrote: > > > > > > > > Hmm yes, it cannot cover all cases. I had somehow misunderstood that > > > the subscriber knows which relations are associated with which > > > publications. Given that the subscriber doesn’t know which relations > > > are associated with which publications (and the set of subscribed > > > relations on the subscriber becomes out of date once the publication > > > is updated), I think it's impossible to achieve that DROP PUBLICATION > > > drops relations that are associated with only the publication being > > > dropped. > > > > > >> Do we have better ideas to fix or shall we just > > >> say that we will refresh based on whatever current set of relations > > >> are present in publication to be dropped? > > > > > > I don't have better ideas. It seems to me that it's prudent that we > > > accept the approach in v3 patch and refresh all publications in DROP > > > PUBLICATION cases. > > > > > > > Maybe refreshing all publications in PUBLICATION cases is simple way to > > fix the problem. > > > > Actually, doing it both will make the behavior consistent but doing it > just for Drop might be preferable by some users. I agree that ADD/DROP PUBLICATION is convenient to just add/drop publications instead of providing the complete publication list. But I'm concerned that during development of this feature most people didn't like the behavior of refreshing both new and existing publications. Also, it was pointed out that there will be overhead from a SQL statement listing more publications when AlterSubscription_refresh() is called with all the existing publications. If we feel this behavior is unnatural, the users will feel the same. I personally think there is a benefit only in terms of syntax. And we can work on refreshing only the publications being added/dropped during v15 development. But it might be better to also consider whether there will be use cases for users to add/remove publications in the subscription even with such downsides. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Autovacuum on partitioned table (autoanalyze)
Hi, On 2021-08-09 16:02:33 -0400, Alvaro Herrera wrote: > On 2021-Jul-27, Andres Freund wrote: > > > Isn't this going to create a *lot* of redundant sampling? Especially if you > > have any sort of nested partition tree. In the most absurd case a partition > > with n parents will get sampled n times, solely due to changes to itself. > > It seems to me that you're barking up the wrong tree on this point. > This problem you describe is not something that was caused by this > patch; ANALYZE has always worked like this. We have discussed the idea > of avoiding redundant sampling, but it's clear that it is not a simple > problem, and solving it was not in scope for this patch. I don't agree. There's a difference between this happening after a manual ANALYZE on partition roots, and this continuously happening in production workloads due to auto-analyzes... Greetings, Andres Freund
Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION
On Tue, Aug 10, 2021 at 8:05 AM Masahiko Sawada wrote: > > On Sat, Aug 7, 2021 at 2:36 PM Amit Kapila wrote: > > > > On Fri, Aug 6, 2021 at 9:57 PM Japin Li wrote: > > > > > > > > > > > Hmm yes, it cannot cover all cases. I had somehow misunderstood that > > > > the subscriber knows which relations are associated with which > > > > publications. Given that the subscriber doesn’t know which relations > > > > are associated with which publications (and the set of subscribed > > > > relations on the subscriber becomes out of date once the publication > > > > is updated), I think it's impossible to achieve that DROP PUBLICATION > > > > drops relations that are associated with only the publication being > > > > dropped. > > > > > > > >> Do we have better ideas to fix or shall we just > > > >> say that we will refresh based on whatever current set of relations > > > >> are present in publication to be dropped? > > > > > > > > I don't have better ideas. It seems to me that it's prudent that we > > > > accept the approach in v3 patch and refresh all publications in DROP > > > > PUBLICATION cases. > > > > > > > > > > Maybe refreshing all publications in PUBLICATION cases is simple way to > > > fix the problem. > > > > > > > Actually, doing it both will make the behavior consistent but doing it > > just for Drop might be preferable by some users. > > I agree that ADD/DROP PUBLICATION is convenient to just add/drop > publications instead of providing the complete publication list. But > I'm concerned that during development of this feature most people didn't > like the behavior of refreshing both new and existing publications. Also, > it was pointed out that there will be overhead from a SQL statement listing > more publications when AlterSubscription_refresh() is called with all > the existing publications. > True, but I think at that time we didn't know this design problem with 'drop publication'. > If we feel this behavior is unnatural, the > users will feel the same. I personally think there is a benefit only > in terms of syntax. > Right, and I think in this particular case, the new syntax is much easier to use for users. > And we can work on refreshing only > the publications being added/dropped during v15 development. > Yeah, unless we change the design drastically we might not be able to do a refresh for dropped publications; for add it is possible. It seems most of the people who responded on this thread think that we can be consistent in terms of refreshing for add/drop at this stage, but we can still go with another option where we refresh only newly added publications for add but refresh all publications for drop. Peter E., do you have any opinion on this matter? -- With Regards, Amit Kapila.
Re: alter table set TABLE ACCESS METHOD
On Sat, Aug 07, 2021 at 07:18:19PM +0900, Michael Paquier wrote: > As a matter of curiosity, I have checked how it would look to handle > the no-op case for the sub-commands other than SET TABLESPACE, and one > would need something like the attached, with a new flag for > AlteredTableInfo. That does not really look good, but it triggers > properly the object access hook when SET LOGGED/UNLOGGED/ACCESS METHOD > are no-ops, so let's just handle the case using the version from > upthread. I'll do that at the beginning of next week. So, on a closer look, it happens that this breaks the regression tests of sepgsql, as the two following commands in ddl.sql cause a rewrite: ALTER TABLE regtest_table_4 ALTER COLUMN y TYPE float; ALTER TABLE regtest_ptable_4 ALTER COLUMN y TYPE float; I have been fighting with SE/Linux for a couple of hours to try to figure out how to run our regression tests, but gave up after running into various segmentation faults even after following all the steps of the documentation to set up things. Perhaps that's because I just set up the environment with a recent Debian, I don't know. Instead of running those tests, I have fallen back to my own module and ran all the SQLs of sepgsql to find out places where there are rewrites where I spotted those two places. One thing I have noticed is that sepgsql-regtest.te fails to compile because /usr/share/selinux/devel/Makefile does not understand auth_read_passwd(). Looking at some of the SE/Linux repos, perhaps this ought to be auth_read_shadow() instead to be able to work with a newer version? Anyway, as the addition of this InvokeObjectPostAlterHook() is consistent for a rewrite caused by SET LOGGED/UNLOGGED/ACCESS METHOD I have applied the patch. I'll fix rhinoceros once it reports back the diffs in output. -- Michael signature.asc Description: PGP signature
Re: Small documentation improvement for ALTER SUBSCRIPTION
On Tue, Aug 10, 2021 at 6:31 AM Masahiko Sawada wrote: > > On Mon, Aug 9, 2021 at 1:01 PM Peter Smith wrote: > > > > On Mon, Aug 9, 2021 at 12:46 PM Amit Kapila wrote: > > But "REFRESH PUBLICATION refresh_option" seems wrong in terms of SQL > syntax, not? > > Given there could be multiple options how about using > "refresh_options"? That is, the sentence > will be: > > Additionally, refresh_options as described > under REFRESH PUBLICATION may be specified, > except in the case of DROP PUBLICATION. > Normally (at least on this doc page), we use this tag for some defined option or syntax, and as refresh_options is neither, it would look a bit awkward. -- With Regards, Amit Kapila.
Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand wrote: > > > BTW, I see this as an Open Item for PG-14 [1] which seems wrong to me > > as this is a bug from previous versions. I am not sure who added it > > Me neither. > > > but do you see any reason for this to consider as an open item for > > PG-14? > > No, I don't see any reasons as this is a bug from previous versions. > Thanks for the confirmation. I have moved this to the section: "Older bugs affecting stable branches". -- With Regards, Amit Kapila.
Re: Estimating HugePages Requirements?
Hi, On 2021-08-09 18:58:53 -0500, Justin Pryzby wrote: > Define shared_buffers as the exact size to be allocated/requested from the OS > (regardless of whether they're huge pages or not), and have postgres compute > everything else based on that. So shared_buffers=2GB would end up being > 1950MB > (or so) of buffer cache. We'd have to check that after the other allocations, > there's still at least 128kB left for the buffer cache. Maybe we'd have to > bump the minimum value of shared_buffers. I don't like that. How much "other" shared memory we're going to need is very hard to predict and depends, to a significant degree, on extensions and configuration options like max_locks_per_transaction and max_connections. This way the user ends up needing to guess at least as much as before to get to a sensible shared_buffers. Greetings, Andres Freund
Re: Estimating HugePages Requirements?
Hi, On 2021-08-09 22:57:18 +, Bossart, Nathan wrote: > @@ -1026,6 +1031,18 @@ PostmasterMain(int argc, char *argv[]) >*/ > InitializeMaxBackends(); > > + if (output_shmem) > + { > + char output[64]; > + Size size; > + > + size = CreateSharedMemoryAndSemaphores(true); > + sprintf(output, "%zu", size); > + > + puts(output); > + ExitPostmaster(0); > + } I don't like putting this into PostmasterMain(). Either BootstrapMain() (specifically checker mode) or GucInfoMain() seem like better places. > -void > -CreateSharedMemoryAndSemaphores(void) > +Size > +CreateSharedMemoryAndSemaphores(bool size_only) > { > PGShmemHeader *shim = NULL; > > @@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void) > /* might as well round it off to a multiple of a typical page > size */ > size = add_size(size, 8192 - (size % 8192)); > > + if (size_only) > + return size; > + > elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size); > > /* > @@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void) >*/ > if (shmem_startup_hook) > shmem_startup_hook(); > + > + return 0; > } That seems like an ugly API to me. Why don't we split the size determination and shmem creation functions into two? Greetings, Andres Freund
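A sketch of the split being suggested, with hypothetical function names:

/* sizing becomes callable on its own, with no side effects */
extern Size CalculateShmemSize(void);

void
CreateSharedMemoryAndSemaphores(void)
{
    Size    size = CalculateShmemSize();

    /* ... IpcMemoryCreate(size), semaphore setup, shmem_startup_hook ... */
}

A size-reporting mode could then call CalculateShmemSize() directly and print the result, with no boolean parameter and no dummy return value in the creation path.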
Re: alter table set TABLE ACCESS METHOD
On Tue, Aug 10, 2021 at 12:24:13PM +0900, Michael Paquier wrote: > So, on a closer look, it happens that this breaks the regression tests > of sepgsql, as the two following commands in ddl.sql cause a rewrite: > ALTER TABLE regtest_table_4 ALTER COLUMN y TYPE float; > ALTER TABLE regtest_ptable_4 ALTER COLUMN y TYPE float; rhinoceros has reported back, and these are the only two that required an adjustment, so fixed. -- Michael signature.asc Description: PGP signature
RE: Skipping logical replication transactions on subscriber side
On Wednesday, August 4, 2021 8:46 PM Masahiko Sawada wrote: > I'll incorporate those comments in the next version patch. Hi, when are you going to make and share the updated v6? Best Regards, Takamichi Osumi
Re: [BUG] Update Toast data failure in logical replication
On Fri, Jul 30, 2021 at 10:21 AM Amit Kapila wrote: > > This problem seems to be from the time Logical > Replication has been introduced, so adding others (who are generally > involved in this area) to see what they think about this bug? I think > people might not be using toasted columns for Replica Identity due to > which this problem has been reported yet but I feel this is quite a > fundamental issue and we should do something about this. > > Let me summarize the problem for the ease of others. > > The logical replica can go out of sync for UPDATES when there is a > toast column as part of REPLICA IDENTITY. In such cases, updates are > not replicated if the key column doesn't change because we don't log > the actual key value for the unchanged toast key. It is neither logged > as part of old_key_tuple nor for new tuple due to which we are not > able to find the tuple to be updated on the subscriber-side and the > update is ignored on the subscriber-side. We log this in DEBUG1 mode > but I don't think the user can do anything about this and the replica > will go out-of-sync. This works when the replica identity column value > is not toasted because then it will be part of the new tuple and we > use that to fetch the tuple on the subscriber. > It seems to me this problem exists from the time we introduced wal_level = logical in the commit e55704d8b2 [1], or another possibility is that logical replication commit didn't consider something to make it work. Andres, Robert, Petr, can you guys please comment because otherwise, we might miss something here. [1] - commit e55704d8b2fe522fbc9435acbb5bc59033478bd5 Author: Robert Haas Date: Tue Dec 10 18:33:45 2013 -0500 Add new wal_level, logical, sufficient for logical decoding. When wal_level=logical, we'll log columns from the old tuple as configured by the REPLICA IDENTITY facility added in commit 07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65. This makes it possible a properly-configured logical replication solution to correctly follow table updates even if they change the chosen key columns, or, with REPLICA IDENTITY FULL, even if the table has no key at all. Note that updates which do not modify the replica identity column won't log anything extra, making the choice of a good key (i.e. one that will rarely be changed) important to performance when wal_level=logical is configured. .. Andres Freund, reviewed in various versions by myself, Heikki Linnakangas, KONDO Mitsumasa, and many others. -- With Regards, Amit Kapila.
Re: Added schema level support for publication.
On Mon, Aug 9, 2021 at 9:50 PM Mark Dilger wrote:
>
> > On Aug 6, 2021, at 1:32 AM, vignesh C wrote:
> >
> > the attached v19 patch
>
> With v19 applied, a schema owner can publish the contents of a table
> regardless of ownership or permissions on that table:
> ... ...
>
> It is a bit counterintuitive that schema owners do not have
> administrative privileges over tables within their schemas, but that's
> how it is. The design of this patch seems to assume otherwise. Perhaps
> ALTER PUBLICATION ... ADD SCHEMA should be restricted to superusers,
> just as FOR ALL TABLES?
>

+1. Your suggestion sounds reasonable to me.

> Alternatively, you could add ownership checks per table to mirror the
> behavior of ALTER PUBLICATION ... ADD TABLE, but that would foreclose
> the option of automatically updating the list of tables in the
> publication as new tables are added to the schema, since those new
> tables would not necessarily belong to the schema owner, and having an
> error thrown during CREATE TABLE would be quite unfriendly. I think
> until this is hammered out, it is safer to require superuser privileges,
> and then we can revisit this issue and loosen the requirement in a
> subsequent commit.
>

I think the same argument can be made for "FOR ALL TABLES .." as well.
So, let's leave such a requirement for another patch.
--
With Regards,
Amit Kapila.
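[Editor's note: for readers without the elided demonstration, the scenario Mark describes would be along these lines. Role, schema, and publication names are invented, and the ADD SCHEMA syntax is that of the v19 patch under discussion, not committed PostgreSQL.]

-- As superuser: alice owns the schema, but not the table inside it.
CREATE ROLE alice LOGIN;
GRANT CREATE ON DATABASE mydb TO alice;   -- needed for CREATE PUBLICATION
CREATE SCHEMA sch AUTHORIZATION alice;
CREATE TABLE sch.t (a int);               -- owned by the superuser, not alice

-- As alice: per the report, owning the schema is enough under v19 to
-- publish sch.t, despite alice having no ownership of or privileges on it.
CREATE PUBLICATION pub_sch;
ALTER PUBLICATION pub_sch ADD SCHEMA sch;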
Re: Why does the owner of a publication need CREATE privileges on the database?
On Tue, Jul 27, 2021 at 11:29 PM Mark Dilger wrote:
>
> The documentation for ALTER PUBLICATION ... OWNER TO ... claims the new
> owner must have CREATE privilege on the database, though a superuser can
> change the ownership in spite of this restriction. No explanation is
> given for this requirement.
>

I am not aware of the original thought process behind this, but the
current behavior seems reasonable: if users need to have CREATE
privilege on the database to run CREATE PUBLICATION, the same should be
true when we change the owner to a new owner. Basically, at any point
in time, the owner of the publication should have CREATE privilege on
the database which contains the publication.
--
With Regards,
Amit Kapila.
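[Editor's note: concretely, with invented database, publication, and role names, the documented requirement amounts to the following.]

-- The new owner must hold CREATE on the database containing the
-- publication, or the ownership change is rejected (superusers excepted).
GRANT CREATE ON DATABASE mydb TO new_owner;
ALTER PUBLICATION mypub OWNER TO new_owner;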
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 09, 2021 at 10:50:29PM +0200, Michael Meskes wrote:
>> On the substance of the issue, one question that I have reading over
>> this thread is whether there's really a bug here at all. It is being
>> represented (and I have not checked whether this is accurate) that the
>> patch affects the behavior of DECLARE, PREPARE, and EXECUTE, but not
>> DESCRIBE, DEALLOCATE, DECLARE CURSOR .. FOR, or CREATE TABLE AS
>> EXECUTE. However, it also seems that it's not entirely clear what the
>> behavior ought to be in those cases, except perhaps for DESCRIBE. If
>> that's the case, maybe this doesn't really need to be an open item,
>> and maybe we don't therefore need to talk about whether it should be
>> reverted. Maybe it's just a feature that supports certain things now,
>> and in the future, after due reflection, perhaps it will support more.
>
> The way I see it we should commit the patch that makes more statements
> honor the new DECLARE STATEMENT feature. I don't think we can change
> anything for the worse by doing that. And other databases are not
> really strict about this either.

Okay. So you mean to change DESCRIBE and DEALLOCATE to be able to
handle cached connection names, as of [1]?

For DEALLOCATE, I agree that it could be a good thing. declare.pgc
seems to rely on that already, but the tests are incorrect, as I
mentioned in [2].

For DESCRIBE, which provides data about a result set, I find the
assignment of a connection a bit strange, and even if this would allow
the use of the same statement name for multiple connections, it seems
to me that there is a risk of breaking existing applications. There
should not be that many, so perhaps that's fine anyway.

[1]: https://www.postgresql.org/message-id/tyapr01mb5866973462d17f2aebd8ebb8f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/yoqncyfxp868z...@paquier.xyz
--
Michael
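[Editor's note: for readers not following the ECPG thread, a minimal sketch of the feature being discussed. Connection, statement, and host-variable names are invented, and this is an excerpt from a hypothetical ECPG program, not a complete one.]

EXEC SQL AT con1 DECLARE mystmt STATEMENT;  /* bind "mystmt" to con1 */
EXEC SQL PREPARE mystmt FROM :query;        /* honored: runs on con1 */
EXEC SQL EXECUTE mystmt INTO :result;       /* honored: runs on con1 */
/* The open question: should these also target con1 via the declaration? */
EXEC SQL DESCRIBE mystmt INTO sqlda;
EXEC SQL DEALLOCATE PREPARE mystmt;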
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada wrote:
>
> I've attached the latest patches that incorporated all comments I got
> so far. Please review them.
>

I am not able to apply the latest patch
(v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD;
I get the below error:

patching file src/backend/replication/logical/worker.c
Hunk #11 succeeded at 1195 (offset 50 lines).
Hunk #12 succeeded at 1253 (offset 50 lines).
Hunk #13 succeeded at 1277 (offset 50 lines).
Hunk #14 succeeded at 1305 (offset 50 lines).
Hunk #15 succeeded at 1330 (offset 50 lines).
Hunk #16 succeeded at 1362 (offset 50 lines).
Hunk #17 succeeded at 1508 (offset 50 lines).
Hunk #18 succeeded at 1524 (offset 50 lines).
Hunk #19 succeeded at 1645 (offset 50 lines).
Hunk #20 succeeded at 1671 (offset 50 lines).
Hunk #21 succeeded at 1772 (offset 50 lines).
Hunk #22 succeeded at 1828 (offset 50 lines).
Hunk #23 succeeded at 1934 (offset 50 lines).
Hunk #24 succeeded at 1962 (offset 50 lines).
Hunk #25 succeeded at 2399 (offset 50 lines).
Hunk #26 FAILED at 2405.
Hunk #27 succeeded at 3730 (offset 54 lines).
1 out of 27 hunks FAILED -- saving rejects to file
src/backend/replication/logical/worker.c.rej
--
With Regards,
Amit Kapila.
Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS
On Mon, Aug 09, 2021 at 11:07:14AM -0400, Robert Haas wrote:
> On Thu, Aug 5, 2021 at 8:02 PM Tom Lane wrote:
>> I think doing nothing is fine. Given the lack of complaints, we're
>> more likely to break something than fix anything useful.
>
> +1.

FWIW, the only case I have in my plugin box of a background worker that
does not attach to shared memory is a template for a worker able to
catch signals, to be used as a base for simple actions. So that's not
really interesting. Making the SHMEM flag mandatory on HEAD while doing
nothing in the back-branches sounds good to me, so +1.
--
Michael
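[Editor's note: for reference, a minimal sketch of how a module registers a worker with the flag that would become mandatory on HEAD under this proposal. The module, worker, and entry-point names are invented.]

#include "postgres.h"
#include "postmaster/bgworker.h"

void		worker_main(Datum arg);		/* hypothetical entry point */

void
_PG_init(void)
{
	BackgroundWorker worker;

	memset(&worker, 0, sizeof(worker));
	/* Under the proposal, this flag would be required on HEAD. */
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_name, BGW_MAXLEN, "example worker");
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "example_module");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "worker_main");
	RegisterBackgroundWorker(&worker);
}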