Attach to shared memory after fork()
Fork is an expensive operation[1]. The major cost is copying the mm structures (VMAs, PTEs, ...). ARM is especially weak on fork, since it invalidates TLB entries one by one, and that is an expensive operation[2]. We can easily hit 100% CPU on ARM machines. We also see the fork problem on x86, but it is not as serious as on ARM. We can mitigate this by enabling huge pages (2MB pages don't help us on ARM, since we have a huge shared buffer pool), but we still think it is a problem. So I propose to remove shared buffers from the postmaster and shmat() them after fork. Not all of them; we would still keep the necessary shared memory segments in the postmaster. Or maybe we just need to give up fork, as on Windows? Any good ideas about this? [1]. https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/ [2]. https://developer.arm.com/documentation/ddi0487/latest/ D5.10 TLB maintenance requirements and the TLB maintenance instructions: A break-before-make sequence on changing from an old translation table entry to a new translation table entry requires the following steps: 1. Replace the old translation table entry with an invalid entry, and execute a DSB instruction. 2. Invalidate the translation table entry with a broadcast TLB invalidation instruction, and execute a DSB instruction to ensure the completion of that invalidation. 3. Write the new translation table entry, and execute a DSB instruction to ensure that the new entry is visible. Regards. Yuhang Qiu.
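For illustration, here is a minimal standalone C sketch of the shape of the idea (not PostgreSQL code; the segment size and names are made up, and cleanup via shmctl(IPC_RMID) is omitted): the parent creates a System V segment but never maps it, and the child attaches only after fork(), so the large mapping's page tables never have to be copied at fork time.

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
	/* Parent (postmaster-like): create the segment but do not attach it. */
	int			shmid = shmget(IPC_PRIVATE, (size_t) 1 << 30, IPC_CREAT | 0600);

	if (shmid < 0)
	{
		perror("shmget");
		return 1;
	}

	if (fork() == 0)
	{
		/* Child (backend-like): attach only after fork. */
		void	   *buf = shmat(shmid, NULL, 0);

		if (buf == (void *) -1)
		{
			perror("shmat");
			_exit(1);
		}
		/* ... use the shared buffer here ... */
		shmdt(buf);
		_exit(0);
	}
	/* Parent continues; segment removal omitted for brevity. */
	return 0;
}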
Re: [BUG] "FailedAssertion" reported when streaming in logical replication
On Tue, Apr 27, 2021 at 12:22 PM Dilip Kumar wrote: > > On Tue, Apr 27, 2021 at 12:05 PM Amit Kapila wrote: > > > > Can't we use 'txns_by_base_snapshot_lsn' list for this purpose? It is > > > > ensured in ReorderBufferSetBaseSnapshot that we always assign > > > > base_snapshot to a top-level transaction if the current is a known > > > > subxact. I think that will be true because we always form xid-subxid > > > > relation before processing each record in > > > > LogicalDecodingProcessRecord. > > > > > > Yeah, we can do that, but here we are only interested in top > > > transactions and this list will give us sub-transaction as well so we > > > will have to skip it in the below if condition. > > > > > > > I am not so sure about this point. I have explained above why I think > > there won't be any subtransactions in this. Can you please let me know > > what am I missing if anything? > > Got your point, yeah this will only have top transactions so we can > use this. I will change this in the next patch. In fact we can put > an assert that it should not be an sub transaction? > Right. It is good to have an assert. -- With Regards, Amit Kapila.
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Thanks for the updated patch. I've been reading it, but I noticed a bug in 8aba9322511f, which I thought you'd want to know to make a note of when committing this one. So we decided in 8aba9322511f that it is okay to make the memory context in which a transient partdesc is allocated a child of PortalContext so that it disappears when the portal does. But PortalContext is apparently NULL when the planner runs, at least in the "simple" query protocol, so any transient partdescs built by the planner would effectively leak. Not good. With this patch, partdesc_nodetached is no longer transient, so the problem doesn't exist. I will write more about the updated patch in a bit... -- Amit Langote EDB: http://www.enterprisedb.com
Re: SQL-standard function body
On 18.04.21 23:33, Tom Lane wrote: ... BTW, a dependency loop is also possible without using this feature, by abusing default-value expressions: create function f1(x int, y int) returns int language sql as 'select $1 + $2'; create function f2(x int, y int default f1(1,2)) returns int language sql as 'select $1 + $2'; create or replace function f1(x int, y int default f2(11,12)) returns int language sql as 'select $1 + $2'; The actual use-case for that seems pretty thin, so we never bothered to worry about it before. But if we're going to build loop-breaking logic to handle function body dependencies, it should deal with this too. I think that all that's required is for the initial dummy function declaration to omit defaults as well as providing a dummy body. I have studied this a bit. I'm not sure where the dummy function declaration should be created. The current dependency-breaking logic in pg_dump_sort.c doesn't appear to support injecting additional objects into the set of dumpable objects. So we would need to create it perhaps in dumpFunc() and then later set flags that indicate whether it will be required. Another option would be that we disallow this at creation time. It seems we could detect dependency loops using findDependentObjects(), so this might not be so difficult. The use case for recursive SQL functions is probably low, at least with the current limited set of control flow options in SQL. (And you can always use a quoted body to work around it.)
Re: [PATCH] Re: pg_identify_object_as_address() doesn't support pg_event_trigger oids
On Tue, Apr 27, 2021 at 07:16:25AM +0200, Joel Jacobson wrote: > I've added a test at the end of event_trigger.sql, > reusing the three event triggers already in existence, > just before they are dropped. Cool, thanks. I have been looking at it and I'd still like to cross-check the output data of pg_get_object_address() to see if pg_identify_object() remains consistent through it. See for example the attached that uses a trick based on LATERAL, a bit different than what's done in object_address.sql but that gives the same amount of coverage (I could also use two ROW()s and an equality, but well..). -- Michael diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c index 6d88b690d8..ad9740098e 100644 --- a/src/backend/catalog/objectaddress.c +++ b/src/backend/catalog/objectaddress.c @@ -5607,10 +5607,7 @@ getObjectIdentityParts(const ObjectAddress *object, { HeapTuple tup; Form_pg_event_trigger trigForm; - -/* no objname support here */ -if (objname) - *objname = NIL; +char *evtname; tup = SearchSysCache1(EVENTTRIGGEROID, ObjectIdGetDatum(object->objectId)); @@ -5622,8 +5619,10 @@ getObjectIdentityParts(const ObjectAddress *object, break; } trigForm = (Form_pg_event_trigger) GETSTRUCT(tup); -appendStringInfoString(&buffer, - quote_identifier(NameStr(trigForm->evtname))); +evtname = NameStr(trigForm->evtname); +appendStringInfoString(&buffer, quote_identifier(evtname)); +if (objname) + *objname = list_make1(evtname); ReleaseSysCache(tup); break; } diff --git a/src/test/regress/expected/event_trigger.out b/src/test/regress/expected/event_trigger.out index bdd0ffcdaf..1aca794d6d 100644 --- a/src/test/regress/expected/event_trigger.out +++ b/src/test/regress/expected/event_trigger.out @@ -533,6 +533,23 @@ DROP POLICY p2 ON event_trigger_test; NOTICE: DROP POLICY - ddl_command_start NOTICE: DROP POLICY - sql_drop NOTICE: DROP POLICY - ddl_command_end +-- Check the object address of those event triggers +SELECT +e.evtname, +pg_describe_object('pg_event_trigger'::regclass, e.oid, 0) as descr, +b.type, b.object_names, b.object_args, +pg_identify_object(a.classid, a.objid, a.objsubid) as ident + FROM pg_event_trigger as e, +LATERAL pg_identify_object_as_address('pg_event_trigger'::regclass, e.oid, 0) as b, +LATERAL pg_get_object_address(b.type, b.object_names, b.object_args) as a + ORDER BY e.evtname; + evtname | descr | type |object_names | object_args | ident +---+-+---+-+-+ + end_rls_command | event trigger end_rls_command | event trigger | {end_rls_command} | {} | ("event trigger",,end_rls_command,end_rls_command) + sql_drop_command | event trigger sql_drop_command | event trigger | {sql_drop_command} | {} | ("event trigger",,sql_drop_command,sql_drop_command) + start_rls_command | event trigger start_rls_command | event trigger | {start_rls_command} | {} | ("event trigger",,start_rls_command,start_rls_command) +(3 rows) + DROP EVENT TRIGGER start_rls_command; DROP EVENT TRIGGER end_rls_command; DROP EVENT TRIGGER sql_drop_command; diff --git a/src/test/regress/sql/event_trigger.sql b/src/test/regress/sql/event_trigger.sql index 18b2a267cb..f7347ffbb1 100644 --- a/src/test/regress/sql/event_trigger.sql +++ b/src/test/regress/sql/event_trigger.sql @@ -426,6 +426,17 @@ ALTER POLICY p1 ON event_trigger_test USING (TRUE); ALTER POLICY p1 ON event_trigger_test RENAME TO p2; DROP POLICY p2 ON event_trigger_test; +-- Check the object address of those event triggers +SELECT +e.evtname, +pg_describe_object('pg_event_trigger'::regclass, e.oid, 0) as descr, +b.type, 
b.object_names, b.object_args, +pg_identify_object(a.classid, a.objid, a.objsubid) as ident + FROM pg_event_trigger as e, +LATERAL pg_identify_object_as_address('pg_event_trigger'::regclass, e.oid, 0) as b, +LATERAL pg_get_object_address(b.type, b.object_names, b.object_args) as a + ORDER BY e.evtname; + DROP EVENT TRIGGER start_rls_command; DROP EVENT TRIGGER end_rls_command; DROP EVENT TRIGGER sql_drop_command; signature.asc Description: PGP signature
Re: [HACKERS] logical decoding of two-phase transactions
On Wed, Apr 21, 2021 at 12:13 PM Peter Smith wrote: > > On Tue, Apr 20, 2021 at 3:45 PM Peter Smith wrote: > > > > Please find attached the latest patch set v73`* > > > > Differences from v72* are: > > > > * Rebased to HEAD @ today (required because v72-0001 no longer applied > > cleanly) > > > > * Minor documentation correction for protocol messages for Commit Prepared > > ('K') > > > > * Non-functional code tidy (mostly proto.c) to reduce overloading > > different meanings to same member names for prepare/commit times. > > > Please find attached a re-posting of patch set v73* > > This is the same as yesterday's v73 but with a contrib module compile > error fixed. Few comments on v73-0002-Add-prepare-API-support-for-streaming-transactio.patch patch: 1) There are slight differences in error message in case of Alter subscription ... drop publication, we can keep the error message similar: postgres=# ALTER SUBSCRIPTION mysub drop PUBLICATION mypub WITH (refresh = false, copy_data=true, two_phase=true); ERROR: unrecognized subscription parameter: "copy_data" postgres=# ALTER SUBSCRIPTION mysub drop PUBLICATION mypub WITH (refresh = false, two_phase=true, streaming=true); ERROR: cannot alter two_phase option 2) We are sending txn->xid twice, I felt we should send only once in logicalrep_write_stream_prepare: + /* transaction ID */ + Assert(TransactionIdIsValid(txn->xid)); + pq_sendint32(out, txn->xid); + + /* send the flags field */ + pq_sendbyte(out, flags); + + /* send fields */ + pq_sendint64(out, prepare_lsn); + pq_sendint64(out, txn->end_lsn); + pq_sendint64(out, txn->u_op_time.prepare_time); + pq_sendint32(out, txn->xid); + 3) We could remove xid and return prepare_data->xid +TransactionId +logicalrep_read_stream_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data) +{ + TransactionId xid; + uint8 flags; + + xid = pq_getmsgint(in, 4); 4) Here comments can be above apply_spooled_messages for better readability + /* +* 1. Replay all the spooled operations - Similar code as for +* apply_handle_stream_commit (i.e. non two-phase stream commit) +*/ + + ensure_transaction(); + + nchanges = apply_spooled_messages(xid, prepare_data.prepare_lsn); + 5) Similarly this below comment can be above PrepareTransactionBlock + /* +* 2. Mark the transaction as prepared. - Similar code as for +* apply_handle_prepare (i.e. two-phase non-streamed prepare) +*/ + + /* +* BeginTransactionBlock is necessary to balance the EndTransactionBlock +* called within the PrepareTransactionBlock below. +*/ + BeginTransactionBlock(); + CommitTransactionCommand(); + + /* +* Update origin state so we can restart streaming from correct position +* in case of crash. +*/ + replorigin_session_origin_lsn = prepare_data.end_lsn; + replorigin_session_origin_timestamp = prepare_data.prepare_time; + + PrepareTransactionBlock(gid); + CommitTransactionCommand(); + + pgstat_report_stat(false); 6) There is a lot of common code between apply_handle_stream_prepare and apply_handle_prepare, if possible try to have a common function to avoid fixing at both places. + /* +* 2. Mark the transaction as prepared. - Similar code as for +* apply_handle_prepare (i.e. two-phase non-streamed prepare) +*/ + + /* +* BeginTransactionBlock is necessary to balance the EndTransactionBlock +* called within the PrepareTransactionBlock below. +*/ + BeginTransactionBlock(); + CommitTransactionCommand(); + + /* +* Update origin state so we can restart streaming from correct position +* in case of crash. 
+*/ + replorigin_session_origin_lsn = prepare_data.end_lsn; + replorigin_session_origin_timestamp = prepare_data.prepare_time; + + PrepareTransactionBlock(gid); + CommitTransactionCommand(); + + pgstat_report_stat(false); + + store_flush_position(prepare_data.end_lsn); 7) two-phase commit is slightly misleading, we can just mention streaming prepare. + * PREPARE callback (for streaming two-phase commit). + * + * Notify the downstream to prepare the transaction. + */ +static void +pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr prepare_lsn) 8) should we include Assert of in_streaming similar to other pgoutput_stream*** functions. +static void +pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx, + ReorderBufferTXN *txn, + XLogRecPtr prepare_lsn) +{ + Assert(rbtxn_is_streamed(txn)); + + OutputP
Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
Hi David, > I noticed that $subject completes with already valid constraints, > please find attached a patch that fixes it. I noticed that there are > other places constraints can be validated, but didn't check whether > similar bugs exist there yet. There was a typo in the patch ("... and and not convalidated"). I've fixed it. Otherwise the patch passes all the tests and works as expected: eax=# create table foo (x int); CREATE TABLE eax=# alter table foo add constraint bar check (x < 3) not valid; ALTER TABLE eax=# alter table foo add constraint baz check (x <> 5) not valid; ALTER TABLE eax=# alter table foo validate constraint ba bar baz eax=# alter table foo validate constraint bar; ALTER TABLE -- Best regards, Aleksander Alekseev v2-0001-tab-completion-of-ALTER-TABLE.VALIDATE-CONSTRAINT.patch Description: Binary data
Re: TRUNCATE on foreign table
On 2021/04/27 15:02, Bharath Rupireddy wrote: On Tue, Apr 27, 2021 at 11:19 AM Fujii Masao wrote: In the docs v4 patch, I think we can combine the below two lines into a single line: + supported by the foreign data wrapper, see . You mean "supported by the foreign data wrapper "? I was thinking that it's better to keep them separate because postgres_fdw is just one example of a foreign data wrapper supporting TRUNCATE. This makes me think again; isn't it better to add "for example" or "for instance" after "data wrapper"? That is, TRUNCATE can be used for foreign tables if supported by the foreign data wrapper, for instance, see . +1. Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
CREATE COLLATION - check for duplicate options and error out if found one
Hi, While reviewing [1], I found that the CREATE COLLATION doesn't throw an error if duplicate options are specified, see [2] for testing a few cases on HEAD. This may end up accepting some of the weird cases, see [2]. It's against other option checking code in the server where the duplicate option is detected and an error thrown if found one. Attached a patch doing that. I chose to have the error message "option \"%s\" specified more than once" and parser_errposition because that's kind of agreed in [3]. Thoughts? [1] https://www.postgresql.org/message-id/CALj2ACWVd%3D-E6uG5AdHD0MvHY6e4mVzkapT%3DvLDnJJseGjaJLQ%40mail.gmail.com [2] CREATE COLLATION coll_dup_chk (LC_COLLATE = "POSIX", LC_COLLATE = "NONSENSE", LC_CTYPE = "POSIX"); -- ERROR CREATE COLLATION coll_dup_chk (LC_COLLATE = "NONSENSE", LC_COLLATE = "POSIX", LC_CTYPE = "POSIX"); -- OK but it's weird CREATE COLLATION coll_dup_chk (LC_CTYPE = "POSIX", LC_CTYPE = "NONSENSE", LC_COLLATE = "POSIX"); -- ERROR CREATE COLLATION coll_dup_chk (LC_CTYPE = "NONSENSE", LC_CTYPE = "POSIX", LC_COLLATE = "POSIX",); -- OK but it's weird CREATE COLLATION coll_dup_chk (PROVIDER = icu, PROVIDER = NONSENSE, LC_COLLATE = "POSIX", LC_CTYPE = "POSIX"); -- ERROR CREATE COLLATION coll_dup_chk (PROVIDER = NONSENSE, PROVIDER = icu, LC_COLLATE = "POSIX", LC_CTYPE = "POSIX"); -- OK but it's weird CREATE COLLATION case_sensitive (LOCALE = '', LOCALE = "NONSENSE"); -- ERROR CREATE COLLATION coll_dup_chk (LOCALE = "NONSENSE", LOCALE = ''); -- OK but it's weird CREATE COLLATION coll_dup_chk (DETERMINISTIC = TRUE, DETERMINISTIC = NONSENSE, LOCALE = ''); -- ERROR CREATE COLLATION coll_dup_chk (DETERMINISTIC = NONSENSE, DETERMINISTIC = TRUE, LOCALE = ''); -- OK but it's weird [3] https://www.postgresql.org/message-id/CALj2ACUa%3DZM8QtOLPCHc7%3DWgFrx9P6-AgKQs8cmKLvNCvu7arQ%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From 555ac67a94d899b107e27a691f3a2a7340a14f64 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 27 Apr 2021 12:01:44 +0530 Subject: [PATCH v1] Check duplicate options and error out for CREATE COLLATION command CREATE COLLATION doesn't throw an error if duplicate options are specified. Modify the option parsing loop in DefineCollation to check for duplicate options and error out if found one. 
--- src/backend/commands/collationcmds.c | 35 +++ src/test/regress/expected/collate.out | 35 +++ src/test/regress/sql/collate.sql | 17 + 3 files changed, 87 insertions(+) diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c index b8ec6f5756..8d07b21eda 100644 --- a/src/backend/commands/collationcmds.c +++ b/src/backend/commands/collationcmds.c @@ -86,15 +86,50 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e if (strcmp(defel->defname, "from") == 0) defelp = &fromEl; else if (strcmp(defel->defname, "locale") == 0) + { + if (localeEl) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" specified more than once", "locale"), + parser_errposition(pstate, defel->location))); defelp = &localeEl; + } else if (strcmp(defel->defname, "lc_collate") == 0) + { + if (lccollateEl) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" specified more than once", "lc_collate"), + parser_errposition(pstate, defel->location))); defelp = &lccollateEl; + } else if (strcmp(defel->defname, "lc_ctype") == 0) + { + if (lcctypeEl) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" specified more than once", "lc_ctype"), + parser_errposition(pstate, defel->location))); defelp = &lcctypeEl; + } else if (strcmp(defel->defname, "provider") == 0) + { + if (providerEl) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" specified more than once", "provider"), + parser_errposition(pstate, defel->location))); defelp = &providerEl; + } else if (strcmp(defel->defname, "deterministic") == 0) + { + if (deterministicEl) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" specified more than once", "deterministic"), + parser_errposition(pstate, defel->location))); defelp = &deterministicEl; + } else { ereport(ERROR, diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out index 0b60b55514..85a10c8198 100644 --- a/src/test/regress/expected/collate.out +++ b/src/test/regress/expected/collate.out @@ -701,6 +701,41 @@ View definition: SELECT ss.c1 + 1 AS c1p FROM ( SELECT 4 AS c1) ss; +-- Check duplicate options in CREATE COLLATION command and throw error +-- LC_COLLATE +CREATE COLLATION coll_dup_chk (LC_COLLATE = "POSIX", LC_COLLATE = "NONSENSE", LC_CTYPE = "POSIX"); +ERROR: option "lc_collate" specifie
Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
Hi hackers, > Otherwise the patch passes all the tests and works as expected I've noticed there is no tab completion for ALTER TABLE xxx ADD. Here is an alternative version of the patch that fixes this as well. Not sure if this should be in the same commit though. -- Best regards, Aleksander Alekseev v3-0001-tab-completion-of-ALTER-TABLE.VALIDATE-CONSTRAINT.patch Description: Binary data
Re: Better sanity checking for message length words
Hi Tom, > ... > Given the small number of complaints to date, I'm hesitant to > back-patch this: if there's anybody out there with valid use for > long messages that I didn't think should be long, this might break > things for them. But I think it'd be reasonable to sneak it into > v14, since we've not started beta yet. > > Thoughts? I'm having slight issues applying your patch to the `master` branch. Is it the right target? Regarding the idea, I think extra checks are a good thing and definitely something that can be introduced in the next major version. If we receive a complaint during beta-testing we can revert the patch or maybe increase the limit for small messages. -- Best regards, Aleksander Alekseev
Re: Attach to shared memory after fork()
On 4/26/21 11:56 PM, 邱宇航(烛远) wrote: > Fork is an expensive operation[1]. The major cost is the mm(VMA > PTE...) copy. > > ARM is especially weak on fork, which will invalid TLB entries one by > one, and this is an expensive operation[2]. We could easily got 100% > CPU on ARM machine. We also meet fork problem in x86, but not as > serious as arm. > > We can avoid this by enable hugepage(and 2MB doesn’t help us under > arm, we got a huge shared buffer), but we still think it is a problem. > > So I propose to remove shared buffers from postmaster and shmat them > after fork. Not all of them, we still keep necessary shared memories > in postmaster. Or maybe we just need to give up fork like under Windows? > Windows has CreateProcess, which isn't available elsewhere. If you build with EXEC_BACKEND on *nix it will fork() followed by exec(), the classic Unix pattern. You can benchmark that but I doubt you will like the results. This is one of the reasons for using a connection pooler like pgbouncer, which can vastly reduce the number of new process creations Postgres has to do. Better shared memory management might be more promising. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Enhanced error message to include hint messages for redundant options error
On Tue, Apr 27, 2021 at 6:23 AM Bharath Rupireddy wrote: > > On Mon, Apr 26, 2021 at 9:10 PM Bharath Rupireddy > wrote: > > > > I found another problem with collationcmds.c is that it doesn't error > > out if some of the options are specified more than once, something > > like below. I think the option checking "for loop" in DefineCollation > > needs to be reworked. > > CREATE COLLATION case_insensitive (provider = icu, provider = > > someother locale = '@colStrength=secondary', deterministic = false, > > deterministic = true); > > I'm thinking that the above problem should be discussed separately. I > will start a new thread soon on this. I started a separate thread - https://www.postgresql.org/message-id/CALj2ACWtL6fTLdyF4R_YkPtf1YEDb6FUoD5DGAki3rpD%2BsWqiA%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: [BUG] "FailedAssertion" reported when streaming in logical replication
On Tue, Apr 27, 2021 at 12:55 PM Amit Kapila wrote: > > On Tue, Apr 27, 2021 at 12:22 PM Dilip Kumar wrote: > > > > On Tue, Apr 27, 2021 at 12:05 PM Amit Kapila > > wrote: > > > > > Can't we use 'txns_by_base_snapshot_lsn' list for this purpose? It is > > > > > ensured in ReorderBufferSetBaseSnapshot that we always assign > > > > > base_snapshot to a top-level transaction if the current is a known > > > > > subxact. I think that will be true because we always form xid-subxid > > > > > relation before processing each record in > > > > > LogicalDecodingProcessRecord. > > > > > > > > Yeah, we can do that, but here we are only interested in top > > > > transactions and this list will give us sub-transaction as well so we > > > > will have to skip it in the below if condition. > > > > > > > > > > I am not so sure about this point. I have explained above why I think > > > there won't be any subtransactions in this. Can you please let me know > > > what am I missing if anything? > > > > Got your point, yeah this will only have top transactions so we can > > use this. I will change this in the next patch. In fact we can put > > an assert that it should not be an sub transaction? > > > > Right. It is good to have an assert. I have modified the patch based on the above comments. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From 6fcd9be88bd85df4af0508f082dfdad0f8bbf518 Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Tue, 27 Apr 2021 13:57:21 +0530 Subject: [PATCH v3] Assorted bug fix while selecting the largest top transaction for streaming There were mainly 2 problems, 1) Ideally, if we haven't selected any transaction we should take next available transaction without comparing the size but the condition was wrong and it was selecting the next available transaction without comparing the size if we had already selected a transaction which was wrong. 2) Another probelm was we were selecting the transaction without checking their base snapshot, so if the base snapshot is NULL then we can not stream any change so it was hitting the assert that after streaming txn->size should be 0. So the solution is we should never select the transaction for streaming which doesn't have a base snapshot as we can not stream that transaction. --- src/backend/replication/logical/reorderbuffer.c | 31 ++--- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 5cb484f..ea217ef 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3362,16 +3362,17 @@ ReorderBufferLargestTXN(ReorderBuffer *rb) * iterate over the limited number of toplevel transactions. * * Note that, we skip transactions that contains incomplete changes. There - * is a scope of optimization here such that we can select the largest transaction - * which has complete changes. But that will make the code and design quite complex - * and that might not be worth the benefit. If we plan to stream the transactions - * that contains incomplete changes then we need to find a way to partially - * stream/truncate the transaction changes in-memory and build a mechanism to - * partially truncate the spilled files. Additionally, whenever we partially - * stream the transaction we need to maintain the last streamed lsn and next time - * we need to restore from that segment and the offset in WAL. 
As we stream the - * changes from the top transaction and restore them subtransaction wise, we need - * to even remember the subxact from where we streamed the last change. + * is a scope of optimization here such that we can select the largest + * transaction which has incomplete changes. But that will make the code and + * design quite complex and that might not be worth the benefit. If we plan to + * stream the transactions that contains incomplete changes then we need to + * find a way to partially stream/truncate the transaction changes in-memory + * and build a mechanism to partially truncate the spilled files. + * Additionally, whenever we partially stream the transaction we need to + * maintain the last streamed lsn and next time we need to restore from that + * segment and the offset in WAL. As we stream the changes from the top + * transaction and restore them subtransaction wise, we need to even remember + * the subxact from where we streamed the last change. */ static ReorderBufferTXN * ReorderBufferLargestTopTXN(ReorderBuffer *rb) @@ -3381,13 +3382,17 @@ ReorderBufferLargestTopTXN(ReorderBuffer *rb) ReorderBufferTXN *largest = NULL; /* Find the largest top-level transaction. */ - dlist_foreach(iter, &rb->toplevel_by_lsn) + dlist_foreach(iter, &rb->txns_by_base_snapshot_lsn) { ReorderBufferTXN *txn; - txn = dlist_container(ReorderBufferTXN, node, iter.cur); + txn = dlist_container(ReorderBufferTXN, base_snapshot_node, iter.c
Re: Replication slot stats misgivings
On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada wrote: > > On Tue, Apr 27, 2021 at 11:45 AM Amit Kapila wrote: > > > > > > Sawada-San, I would like to go ahead with your > > "Use-HTAB-for-replication-slot-statistics" unless you think otherwise? > > I agree that it's better to use the stats collector. So please go ahead. > I have pushed this patch and seeing one buildfarm failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-27%2009%3A23%3A14 starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2 s2_alter_tbl1_char s1_commit s2_get_changes + isolationtester: canceling step s1_init after 314 seconds step s1_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); ?column? I am analyzing this. Do let me know if you have any thoughts on the same? -- With Regards, Amit Kapila.
Re: Asynchronous Append on postgres_fdw nodes.
On Mon, Apr 26, 2021 at 3:01 PM Andrey V. Lepikhov wrote: > While studying the capabilities of AsyncAppend, I noticed an > inconsistency with the cost model of the optimizer: > Here I see two problems: > 1. Cost of an AsyncAppend is the same as cost of an Append. But > execution time of the AsyncAppend for three remote partitions has more > than halved. > 2. Cost of an AsyncAppend looks as a sum of the child ForeignScan costs. Yeah, we don’t adjust the cost for async Append; it’s the same as that for sync Append. But I don’t see any issue as-is, either. (It’s not that easy to adjust the cost to an appropriate value in the case of postgres_fdw, because in that case the cost would vary depending on which connections are used for scanning foreign tables [1].) > I haven't ideas why it may be a problem right now. But I can imagine > that it may be a problem in future if we have alternative paths: complex > pushdown in synchronous mode (a few rows to return) or simple > asynchronous append with a large set of rows to return. Yeah, I think it’s better if we could consider async append paths and estimate the costs for them accurately at path-creation time, not plan-creation time, because that would make it possible to use async execution in more cases, as you pointed out. But I left that for future work, because I wanted to make the first cut simple. Thanks for the review! Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/CAPmGK15i-OyCesd369P8zyBErjN_T18zVYu27714bf_L%3DCOXew%40mail.gmail.com
Re: Asynchronous Append on postgres_fdw nodes.
On Mon, Apr 26, 2021 at 7:35 PM Andrey V. Lepikhov wrote: > Small mistake i found. If no tuple was received from a foreign > partition, explain shows that we never executed node. For example, > if we have 0 tuples in f1 and 100 tuples in f2: > > Query: > EXPLAIN (ANALYZE, VERBOSE, TIMING OFF, COSTS OFF) > SELECT * FROM (SELECT * FROM f1 UNION ALL SELECT * FROM f2) AS q1 > LIMIT 101; > > Explain: > Limit (actual rows=100 loops=1) > Output: f1.a > -> Append (actual rows=100 loops=1) > -> Async Foreign Scan on public.f1 (never executed) > Output: f1.a > Remote SQL: SELECT a FROM public.l1 > -> Async Foreign Scan on public.f2 (actual rows=100 loops=1) > Output: f2.a > Remote SQL: SELECT a FROM public.l2 > > The patch in the attachment fixes this. Thanks for the report and patch! Will look into this. Best regards, Etsuro Fujita
Re: [PATCH] Re: pg_identify_object_as_address() doesn't support pg_event_trigger oids
On Tue, Apr 27, 2021, at 09:48, Michael Paquier wrote: > On Tue, Apr 27, 2021 at 07:16:25AM +0200, Joel Jacobson wrote: > > I've added a test at the end of event_trigger.sql, > > reusing the three event triggers already in existence, > > just before they are dropped. > > Cool, thanks. I have been looking at it and I'd still like to > cross-check the output data of pg_get_object_address() to see if > pg_identify_object() remains consistent through it. See for example > the attached that uses a trick based on LATERAL, a bit different than > what's done in object_address.sql but that gives the same amount of > coverage (I could also use two ROW()s and an equality, but well..). Neat trick, looks good to me. I've successfully tested fix_event_trigger_pg_identify_object_as_address3.patch. /Joel
RE: Forget close an open relation in ReorderBufferProcessTXN()
On Tuesday, April 20, 2021 12:07 PM Amit Langote wrote: > On Sat, Apr 17, 2021 at 1:30 PM Amit Kapila > wrote: > > On Fri, Apr 16, 2021 at 11:24 PM Andres Freund > > wrote:> > This made me take a brief look at pgoutput.c - maybe I am > > missing > > > something, but how is the following not a memory leak? > > > > > > static void > > > maybe_send_schema(LogicalDecodingContext *ctx, > > > ReorderBufferTXN *txn, ReorderBufferChange > *change, > > > Relation relation, RelationSyncEntry *relentry) { > > > ... > > > /* Map must live as long as the session does. */ > > > oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > > relentry->map = > convert_tuples_by_name(CreateTupleDescCopy(indesc), > > > > CreateTupleDescCopy(outdesc)); > > > MemoryContextSwitchTo(oldctx); > > > send_relation_and_attrs(ancestor, xid, ctx); > > > RelationClose(ancestor); > > > > > > If - and that's common - convert_tuples_by_name() won't have to do > > > anything, the copied tuple descs will be permanently leaked. > > > > > > > I also think this is a permanent leak. I think we need to free all the > > memory associated with this map on the invalidation of this particular > > relsync entry (basically in rel_sync_cache_relation_cb). > > I agree there's a problem here. > > Back in: > > https://www.postgresql.org/message-id/CA%2BHiwqEeU19iQgjN6HF1HTP > U0L5%2BJxyS5CmxaOVGNXBAfUY06Q%40mail.gmail.com > > I had proposed to move the map creation from maybe_send_schema() to > get_rel_sync_entry(), mainly because the latter is where I realized it > belongs, though a bit too late. Attached is the part of the patch > for this particular issue. It also takes care to release the copied > TupleDescs > if the map is found to be unnecessary, thus preventing leaking into > CacheMemoryContext. Thank you for sharing the patch. Your patch looks correct to me. Make check-world has passed with it. Also, I agree with the idea to place the processing to set the map in the get_rel_sync_entry. One thing I'd like to ask is an advanced way to confirm the memory leak is solved by the patch, not just by running make check-world. I used OSS HEAD and valgrind, expecting that I could see function stack which has a call of CreateTupleDescCopy from both pgoutput_change and pgoutput_truncate as memory leak report in the valgrind logs, and they disappear after applying the patch. But, I cannot find the pair of pgoutput functions and CreateTupleDescCopy in one report when I used OSS HEAD, which means that I need to do advanced testing to check if the memory leak of CreateTupleDescCopy is addressed. I collected the logs from RT at src/test/subscription so should pass the routes of our interest. Could someone give me an advise about the way to confirm the memory leak is solved ? Best Regards, Takamichi Osumi
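For context, a rough sketch of the kind of fix being discussed is shown below; it reuses the variable names from the snippet quoted above, but it is illustrative only and is not the actual patch (which instead moves the map setup into get_rel_sync_entry()). The point is simply that when convert_tuples_by_name() decides no conversion is needed, the copied tuple descriptors must be freed rather than left behind in CacheMemoryContext:

	MemoryContext oldctx;
	TupleDesc	indesc_copy;
	TupleDesc	outdesc_copy;

	/* Map must live as long as the session does. */
	oldctx = MemoryContextSwitchTo(CacheMemoryContext);
	indesc_copy = CreateTupleDescCopy(indesc);
	outdesc_copy = CreateTupleDescCopy(outdesc);
	relentry->map = convert_tuples_by_name(indesc_copy, outdesc_copy);
	MemoryContextSwitchTo(oldctx);

	if (relentry->map == NULL)
	{
		/* No conversion was needed; free the copies so they don't leak. */
		FreeTupleDesc(indesc_copy);
		FreeTupleDesc(outdesc_copy);
	}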
Result Cache node shows per-worker info even for workers not launched
Hi, If planned parallel workers do not get launched, the Result Cache plan node shows all-0 stats for each of those workers: tpch=# set max_parallel_workers TO 0; SET tpch=# explain analyze select avg(l_discount) from orders, lineitem where l_orderkey = o_orderkey and o_orderdate < date '1995-03-09' and l_shipdate > date '1995-03-09'; QUERY PLAN - Finalize Aggregate (cost=315012.87..315012.88 rows=1 width=32) (actual time=27533.482..27533.598 rows=1 loops=1) -> Gather (cost=315012.44..315012.85 rows=4 width=32) (actual time=27533.471..27533.587 rows=1 loops=1) Workers Planned: 4 Workers Launched: 0 -> Partial Aggregate (cost=314012.44..314012.45 rows=1 width=32) (actual time=27533.177..27533.178 rows=1 loops=1) -> Nested Loop (cost=0.44..309046.68 rows=1986303 width=4) (actual time=0.400..27390.835 rows=748912 loops=1) -> Parallel Seq Scan on lineitem (cost=0.00..154513.66 rows=4120499 width=12) (actual time=0.044..7910.399 rows=16243662 loops=1) Filter: (l_shipdate > '1995-03-09'::date) Rows Removed by Filter: 13756133 -> Result Cache (cost=0.44..0.53 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=16243662) Cache Key: lineitem.l_orderkey Hits: 12085749 Misses: 4157913 Evictions: 3256424 Overflows: 0 Memory Usage: 65537kB Worker 0: Hits: 0 Misses: 0 Evictions: 0 Overflows: 0 Memory Usage: 0kB Worker 1: Hits: 0 Misses: 0 Evictions: 0 Overflows: 0 Memory Usage: 0kB Worker 2: Hits: 0 Misses: 0 Evictions: 0 Overflows: 0 Memory Usage: 0kB Worker 3: Hits: 0 Misses: 0 Evictions: 0 Overflows: 0 Memory Usage: 0kB -> Index Scan using orders_pkey on orders (cost=0.43..0.52 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=4157913) Index Cond: (o_orderkey = lineitem.l_orderkey) Filter: (o_orderdate < '1995-03-09'::date) Rows Removed by Filter: 1 Planning Time: 0.211 ms Execution Time: 27553.477 ms (22 rows) By looking at the other cases like show_sort_info() or printing per-worker jit info, I could see that the general policy is that we skip printing info for workers that are not launched. Attached is a patch to do the same for Result Cache. I was earlier thinking about using (instrument[n].nloops == 0) to check for not-launched workers. But we are already using "if (rcstate->stats.cache_misses == 0)" for the leader process, so for consistency I used the same method for workers. -- Thanks, -Amit Khandekar Huawei Technologies skip_notlaunched_workers.patch Description: Binary data
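The attached patch is not inlined above, but the change is presumably along the following lines (a sketch only; the struct and field names are assumptions based on the surrounding discussion): when looping over the per-worker instrumentation in explain.c, skip entries for workers that were never launched, using the same zero-misses test already applied to the leader.

	for (n = 0; n < rcstate->shared_info->num_workers; n++)
	{
		ResultCacheInstrumentation *si = &rcstate->shared_info->sinstrument[n];

		/* Skip workers that were planned but never launched. */
		if (si->cache_misses == 0)
			continue;

		/* ... emit the "Worker N: Hits ... Misses ..." line as before ... */
	}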
Re: wal stats questions
On 2021/04/26 10:11, Masahiro Ikeda wrote: On 2021/04/23 16:30, Fujii Masao wrote: On 2021/04/23 10:25, Andres Freund wrote: Hi, On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote: On 2021/04/23 0:36, Andres Freund wrote: On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: On 2021/04/21 18:31, Masahiro Ikeda wrote: BTW, is it better to change PgStat_Counter from int64 to uint64 because there aren't any counters with negative numbers? On second thought, it's ok even if the counters like wal_records get overflowed. Because they are always used to calculate the diff between the previous and current counters. Even when the current counters are overflowed and the previous ones are not, WalUsageAccumDiff() seems to return the right diff of them. If this understanding is right, I'd withdraw my comment and it's ok to use "long" type for those counters. Thought? Why long? It's of a platform dependent size, which doesn't seem useful here? I think it's ok to unify on uint64. Although it's better to use a small size for cache efficiency, that only helps on platforms where "long" is 4 bytes. (Although I'm not familiar with the standardization...) It seems that we would need to adopt unsigned long if we use "long", which may be 4 bytes. I thought WalUsageAccumDiff() seems to return the right value too. But I researched more deeply and found that ISO/IEC 9899:1999 defines that unsigned integers never overflow (6.2.5 Types, 9th paragraph), although it doesn't define overflow of signed integer types. If my understanding is right, the standard only guarantees that WalUsageAccumDiff() returns the right value if the type is an unsigned integer. So it's safe to change "long" to "unsigned long". I think this should just use 64bit counters. Emitting more than 4 billion records in one transaction isn't actually particularly rare. And Right. I agree to change the types of the counters to int64. I think that it's better to make this change as a separate patch from the changes for the pg_stat_wal view. Thanks for your comments. OK, I separated it into two patches. Thanks! The first patch has only the changes for the pg_stat_wal view. ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") + pgWalUsage.wal_records == prevWalUsage.wal_records && + walStats.wal_write == 0 && walStats.wal_sync == 0 && WalStats.m_wal_write should be checked here instead of walStats.wal_write? Is there really a case where the number of syncs is larger than zero when the number of writes is zero? If not, it's enough to check only the number of writes? +* wal records weren't generated. So, the counters of 'wal_fpi', +* 'wal_bytes', 'm_wal_buffers_full' are not updated neither. It's better to add an assertion check that confirms m_wal_buffers_full == 0 whenever wal_records is larger than zero? The second one has the changes for the types of the BufferUsage and WalUsage members. I changed the type from long to int64. Is it better to start a new thread? ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch") Will review the patch later. I'm ok to discuss that in this thread. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
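As a side note on the overflow question discussed above, a tiny standalone C example (illustrative only, not PostgreSQL code) shows why a diff of unsigned counters stays correct even across wraparound, since unsigned arithmetic is defined modulo 2^64:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t	prev = UINT64_MAX - 5;	/* snapshot taken just before wraparound */
	uint64_t	cur = 10;				/* counter after wrapping past zero */

	/* Subtraction is defined modulo 2^64, so this prints 16, the true delta. */
	printf("diff = %llu\n", (unsigned long long) (cur - prev));
	return 0;
}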
Add reset information to pg_stat_statements_info
Hi. This is a proposal for a new feature in the pg_stat_statements extension. I think we need to add some statistics to the pg_stat_statements_info view. "pg_stat_statements_info.stats_reset" is only updated if "pg_stat_statements_reset()" or "pg_stat_statements_reset(0,0,0)" is executed. How about changing it to the following? [before] -[ RECORD 1 ]-- dealloc | 0 stats_reset | 2021-04-27 21:30:00 [after] -[ RECORD 1 ]-- dealloc | 0 last_reset_all_time | 2021-04-27 21:30:00 last_reset_userid | 10 last_reset_userid_time | 2021-04-27 22:30:00 last_reset_dbid | 13974 last_reset_dbid_time | 2021-04-27 23:30:00 last_reset_queryid | 8436481539005031698 last_reset_queryid_time | 2021-04-27 23:30:00 If "pg_stat_statements_reset(10,0,0)" is executed, then "last_reset_userid" and "last_reset_userid_time" are updated, but "last_reset_all_time" is not. If "pg_stat_statements_reset(0,0,0)" is executed, then "last_reset_userid", "last_reset_userid_time" and the others are null. What do you think? Regards, Seino Yuki
Re: Skip temporary table schema name from explain-verbose output.
On Tue, Apr 27, 2021 at 12:23 PM Amul Sul wrote: > > > > How about using an explain filter to replace the unstable text > > pg_temp_3 to pg_temp_N instead of changing it in the core? Following > > are the existing explain filters: explain_filter, > > explain_parallel_append, explain_analyze_without_memory, > > explain_resultcache, explain_parallel_sort_stats, explain_sq_limit. > > > > Well, yes eventually, that will be the kludge. I was wondering if that > table is accessible in a query via pg_temp schema then why should > bother about printing the pg_temp_N schema name which is an internal > purpose. Although only the associated session can access objects from that schema, I think, the entries in pg_class have different namespace oids and are accessible from other sessions. So knowing the actual schema name is useful for debugging purposes. Using auto_explain, the explain output goes to server log, where access to two temporary tables with the same name from different sessions can be identified by the actual schema name easily. I am not sure whether we should change explain output only for the sake of stable tests. You could add a flag to EXPLAIN to mask pg_temp name but that's probably an overkill. Filtering is a better option for tests. -- Best Wishes, Ashutosh Bapat
Re: Skip temporary table schema name from explain-verbose output.
On Tue, Apr 27, 2021 at 6:59 PM Ashutosh Bapat wrote: > > On Tue, Apr 27, 2021 at 12:23 PM Amul Sul wrote: > > > > > > How about using an explain filter to replace the unstable text > > > pg_temp_3 to pg_temp_N instead of changing it in the core? Following > > > are the existing explain filters: explain_filter, > > > explain_parallel_append, explain_analyze_without_memory, > > > explain_resultcache, explain_parallel_sort_stats, explain_sq_limit. > > > > > > > Well, yes eventually, that will be the kludge. I was wondering if that > > table is accessible in a query via pg_temp schema then why should > > bother about printing the pg_temp_N schema name which is an internal > > purpose. > > Although only the associated session can access objects from that > schema, I think, the entries in pg_class have different namespace oids > and are accessible from other sessions. So knowing the actual schema > name is useful for debugging purposes. Using auto_explain, the explain > output goes to server log, where access to two temporary tables with > the same name from different sessions can be identified by the actual > schema name easily. > > I am not sure whether we should change explain output only for the > sake of stable tests. I agree to not change the explain code, just for tests. > You could add a flag to EXPLAIN to mask pg_temp name but that's > probably an overkill. IMO, you are right, it will be an overkill. We might end up having requests to add flags for other cases as well. > Filtering is a better option for tests. +1. EXPLAIN output filtering is not something new, we have already stabilized a few tests. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On 4/27/21 7:34 AM, Masahiko Sawada wrote: On Tue, Apr 27, 2021 at 8:07 AM Andres Freund wrote: Hi, On 2021-04-26 23:59:17 +0200, Tomas Vondra wrote: On 4/26/21 9:27 PM, Andres Freund wrote: On 2021-04-26 15:31:02 +0200, Tomas Vondra wrote: I'm not sure what to do about this :-( I don't have any ideas about how to eliminate this overhead, so the only option I see is reverting the changes in heap_insert. Unfortunately, that'd mean inserts into TOAST tables won't be frozen ... ISTM that the fundamental issue here is not that we acquire pins that we shouldn't, but that we do so at a much higher frequency than needed. It's probably too invasive for 14, but I think it might be worth exploring passing down a BulkInsertState in nodeModifyTable.c's table_tuple_insert() iff the input will be more than one row. And then add the vm buffer of the target page to BulkInsertState, so that hio.c can avoid re-pinning the buffer. Yeah. The question still is what to do about 14, though. Shall we leave the code as it is now, or should we change it somehow? It seem a bit unfortunate that a COPY FREEZE optimization should negatively influence other (more) common use cases, so I guess we can't just keep the current code ... I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c and see whether that fixes the regression. Is this idea to have RelationGetBufferForTuple() skip re-pinning vmbuffer? If so, is this essentially the same as the one in the v3 patch? I don't think it is the same approach - it's a bit hard to follow what exactly happens in RelationGetBufferForTuple, but AFAICS it always starts with vmbuffer = InvalidBuffer, so it may pin the vmbuffer quite often, no? What Andres is suggesting (I think) is to modify ExecInsert() to pass a valid bistate to table_tuple_insert, instead of just NULL, and store the vmbuffer in it. Not sure how to identify when inserting more than just a single row, though ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Attach to shared memory after fork()
"=?UTF-8?B?6YKx5a6H6IiqKOeDm+i/nCk=?=" writes: > Fork is an expensive operation[1]. Yeah, it's not hugely cheap. > So I propose to remove shared buffers from postmaster and shmat them > after fork. This proposal seems moderately insane. In the first place, it introduces failure modes we could do without, and in the second place, how is it not strictly *more* expensive than what happens now? You still have to end up with all those TLB entries mapped in the child. (If your kernel is unable to pass down shared-memory TLBs effectively, ISTM that's a kernel shortcoming not a Postgres architectural problem.) regards, tom lane
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On Tue, Apr 27, 2021 at 7:13 PM Tomas Vondra wrote: > What Andres is suggesting (I think) is to modify ExecInsert() to pass a > valid bistate to table_tuple_insert, instead of just NULL, and store the > vmbuffer in it. Not sure how to identify when inserting more than just a > single row, though ... I think the thread "should INSERT SELECT use a BulkInsertState?" [1], has a simple dynamic mechanism [with a GUC defining the threshold tuples] to switch over to using BulkInsertState. Maybe it's worth having a look at the patch - 0001-INSERT-SELECT-to-use-BulkInsertState-and-multi_i.patch? +/* Use bulk insert after a threshold number of tuples */ +// XXX: maybe this should only be done if it's not a partitioned table or +// if the partitions don't support miinfo, which uses its own bistates +mtstate->ntuples++; +if (mtstate->bistate == NULL && +mtstate->operation == CMD_INSERT && +mtstate->ntuples > bulk_insert_ntuples && +bulk_insert_ntuples >= 0) +{ +elog(DEBUG1, "enabling bulk insert"); +mtstate->bistate = GetBulkInsertState(); +} [1] https://www.postgresql.org/message-id/20210222030158.GS14772%40telsasoft.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On Tue, Apr 27, 2021 at 03:43:07PM +0200, Tomas Vondra wrote: > On 4/27/21 7:34 AM, Masahiko Sawada wrote: > > On Tue, Apr 27, 2021 at 8:07 AM Andres Freund wrote: > > > On 2021-04-26 23:59:17 +0200, Tomas Vondra wrote: > > > > On 4/26/21 9:27 PM, Andres Freund wrote: > > > > > On 2021-04-26 15:31:02 +0200, Tomas Vondra wrote: > > > > > > I'm not sure what to do about this :-( I don't have any ideas about > > > > > > how to > > > > > > eliminate this overhead, so the only option I see is reverting the > > > > > > changes > > > > > > in heap_insert. Unfortunately, that'd mean inserts into TOAST > > > > > > tables won't > > > > > > be frozen ... > > > > > > > > > > ISTM that the fundamental issue here is not that we acquire pins that > > > > > we > > > > > shouldn't, but that we do so at a much higher frequency than needed. > > > > > > > > > > It's probably too invasive for 14, but I think it might be worth > > > > > exploring > > > > > passing down a BulkInsertState in nodeModifyTable.c's > > > > > table_tuple_insert() iff > > > > > the input will be more than one row. > > > > > > > > > > And then add the vm buffer of the target page to BulkInsertState, so > > > > > that > > > > > hio.c can avoid re-pinning the buffer. > > > > > > > > > > > > > Yeah. The question still is what to do about 14, though. Shall we leave > > > > the > > > > code as it is now, or should we change it somehow? It seem a bit > > > > unfortunate > > > > that a COPY FREEZE optimization should negatively influence other (more) > > > > common use cases, so I guess we can't just keep the current code ... > > > > > > I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c > > > and see whether that fixes the regression. > > What Andres is suggesting (I think) is to modify ExecInsert() to pass a > valid bistate to table_tuple_insert, instead of just NULL, and store the > vmbuffer in it. Not sure how to identify when inserting more than just a > single row, though ... Maybe this is relevant. https://commitfest.postgresql.org/33/2553/ | INSERT SELECT: BulkInsertState and table_multi_insert The biistate part is small - Simon requested to also use table_multi_insert, which makes the patch much bigger, and there's probably lots of restrictions I haven't even thought to check. This uses a GUC threshold for bulk insert, but I'm still not sure it's really problematic to use a biistate for a single row. /* Use bulk insert after a threshold number of tuples */ // XXX: maybe this should only be done if it's not a partitioned table or // if the partitions don't support miinfo, which uses its own bistates mtstate->ntuples++; if (mtstate->bistate == NULL && mtstate->ntuples > bulk_insert_ntuples && bulk_insert_ntuples >= 0) { elog(DEBUG1, "enabling bulk insert"); mtstate->bistate = GetBulkInsertState(); -- Justin
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On Tue, Apr 27, 2021 at 4:34 PM Amit Langote wrote: > Thanks for the updated patch. I've been reading it, but I noticed a > bug in 8aba9322511f, which I thought you'd want to know to make a note > of when committing this one. > > So we decided in 8aba9322511f that it is okay to make the memory > context in which a transient partdesc is allocated a child of > PortalContext so that it disappears when the portal does. But > PortalContext is apparently NULL when the planner runs, at least in > the "simple" query protocol, so any transient partdescs built by the > planner would effectively leak. Not good. > > With this patch, partdesc_nodetached is no longer transient, so the > problem doesn't exist. > > I will write more about the updated patch in a bit... The first thing that struck me about partdesc_nodetached is that it's not handled in RelationClearRelation(), where we (re)set a regular partdesc to NULL so that the next RelationGetPartitionDesc() has to build it from scratch. I think partdesc_nodetached and the xmin should likewise be reset. Also in load_relcache_init_file(), although nothing serious there. That said, I think I may have found a couple of practical problems with partdesc_nodetached, or more precisely with having it side-by-side with regular partdesc. Maybe they can be fixed, so the problems are not as such deal breakers for the patch's main idea. The problems can be seen when different queries in a serializable transaction have to use both the regular partdesc and partdesc_detached in a given relcache. For example, try the following after first creating a situation where the table p has a detach-pending partition p2 (for values in (2) and a live partition p1 (for values in (1)). begin isolation level serializable; insert into p values (1); select * from p where a = 1; insert into p values (1); The 1st insert succeeds but the 2nd fails with: ERROR: no partition of relation "p" found for row DETAIL: Partition key of the failing row contains (a) = (1). I haven't analyzed this error very closely but there is another situation which causes a crash due to what appears to be a conflict with rd_pdcxt's design: -- run in a new session begin isolation level serializable; select * from p where a = 1; insert into p values (1); select * from p where a = 1; The 2nd select crashes: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The crash occurs because the planner gets handed a stale copy of partdesc_nodetached for the 2nd select. It gets stale, because the context it's allocated in gets made a child of rd_pdcxt, which is in turn assigned the context of the regular partdesc when it is built for the insert query. Any child contexts of rd_pdcxt are deleted as soon as the Relation's refcount goes to zero, taking it with partdesc_nodetached. Note it is this code in RelationBuildPartitionDesc(): /* * But first, a kluge: if there's an old rd_pdcxt, it contains an old * partition descriptor that may still be referenced somewhere. * Preserve it, while not leaking it, by reattaching it as a child * context of the new rd_pdcxt. Eventually it will get dropped by * either RelationClose or RelationClearRelation. */ if (rel->rd_pdcxt != NULL) MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt); rel->rd_pdcxt = new_pdcxt; I think we may need a separate context for partdesc_nodetached, likely with the same kludges as rd_pdcxt. 
Maybe the first problem will go away with that as well. Few other minor things I noticed: +* Store it into relcache. For snapshots built excluding detached +* partitions, which we save separately, we also record the +* pg_inherits.xmin of the detached partition that was omitted; this +* informs a future potential user of such a cached snapshot. The "snapshot" in the 1st and the last sentence should be "partdesc"? + * We keep two partdescs in relcache: rd_partdesc_nodetached excludes + * partitions marked concurrently being detached, while rd_partdesc includes + * them. IMHO, describing rd_partdesc first in the sentence would be better. Like: rd_partdesc includes all partitions including any that are being concurrently detached, while rd_partdesc_nodetached excludes them. -- Amit Langote EDB: http://www.enterprisedb.com
Should we document the behaviour of TRUNCATE skipping repeated relations?
Hi, The TRUNCATE command currently skips repeated relations (see "if (list_member_oid(relids, myrelid)) continue;" in ExecuteTruncate) because the same relation can't be truncated more than once, as it remains in use for the rest of the transaction. For instance, in the following use cases 1) TRUNCATE foo, foo; 2) TRUNCATE foo, ONLY foo, foo; the first instance of relation foo is processed and the other instances (and any options specified on them) are ignored. I feel that users should be aware of this behaviour so that they can correct such commands and don't report it as unexpected behaviour, especially for cases like (2) where they might expect ONLY foo semantics but that instance is skipped by the server. AFAICS this isn't mentioned anywhere in the docs; should we document it as a note? Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Replication slot stats misgivings
On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila wrote: > > On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada wrote: > > I have pushed this patch and seeing one buildfarm failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-27%2009%3A23%3A14 > > starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2 > s2_alter_tbl1_char s1_commit s2_get_changes > + isolationtester: canceling step s1_init after 314 seconds > step s1_init: SELECT 'init' FROM > pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); > ?column? > > I am analyzing this. > After checking below logs corresponding to this test, it seems test has been executed and create_slot was successful: 2021-04-27 11:06:43.770 UTC [17694956:52] isolation/concurrent_ddl_dml STATEMENT: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); 2021-04-27 11:07:11.748 UTC [5243096:9] LOG: checkpoint starting: time 2021-04-27 11:09:24.332 UTC [5243096:10] LOG: checkpoint complete: wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.716 s, sync=0.001 s, total=132.584 s; sync files=0, longest=0.000 s, average=0.000 s; distance=198 kB, estimate=406 kB 2021-04-27 11:09:40.116 UTC [6226046:1] [unknown] LOG: connection received: host=[local] 2021-04-27 11:09:40.117 UTC [17694956:53] isolation/concurrent_ddl_dml LOG: statement: BEGIN; 2021-04-27 11:09:40.117 UTC [17694956:54] isolation/concurrent_ddl_dml LOG: statement: INSERT INTO tbl1 (val1, val2) VALUES (1, 1); 2021-04-27 11:09:40.118 UTC [17694956:55] isolation/concurrent_ddl_dml LOG: statement: INSERT INTO tbl2 (val1, val2) VALUES (1, 1); 2021-04-27 11:09:40.119 UTC [10944636:49] isolation/concurrent_ddl_dml LOG: statement: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE character varying; I am not sure but there is some possibility that even though create slot is successful, the isolation tester got successful in canceling it, maybe because create_slot is just finished at the same time. As we can see from logs, during this test checkpoint also happened which could also lead to the slowness of this particular command. Also, I see a lot of messages like below which indicate stats collector is also quite slow: 2021-04-27 10:57:59.385 UTC [18743536:1] LOG: using stale statistics instead of current ones because stats collector is not responding I am not sure if the timeout happened because the machine is slow or is it in any way related to code. I am seeing some previous failures due to timeout on this machine [1][2]. In those failures, I see the "using stale stats" message. Also, I am not able to see why it can fail due to this patch? [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-02-23%2004%3A23%3A56 [2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-12-24%2005%3A31%3A43 -- With Regards, Amit Kapila.
Re: Better sanity checking for message length words
Aleksander Alekseev writes: > I'm having slight issues applying your patch to the `master` branch. > Is it the right target? [ scratches head ... ] The patch still applies perfectly cleanly for me, using either "patch" or "git apply". > Regarding the idea, I think extra checks are a good thing and > definitely something that can be introduced in the next major version. > If we receive a complaint during beta-testing we can revert the patch > or maybe increase the limit for small messages. Actually, I did some more testing yesterday and found that "make check-world" passes with PQ_SMALL_MESSAGE_LIMIT values as small as 16. That may say more about our testing than anything else --- for example, it implies we're not using long statement or portal names anywhere. But still, it suggests that 30000 is between one and two orders of magnitude too large. I'm now thinking that 10000 would be a good conservative setting, or we could try 1000 if we want to be a bit more aggressive. As you say, beta-testing feedback could result in further modifications. regards, tom lane
Re: Replication slot stats misgivings
On Tue, Apr 27, 2021 at 11:29 PM Amit Kapila wrote: > > On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila wrote: > > > > On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada > > wrote: > > > > I have pushed this patch and seeing one buildfarm failure: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-27%2009%3A23%3A14 > > > > starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2 > > s2_alter_tbl1_char s1_commit s2_get_changes > > + isolationtester: canceling step s1_init after 314 seconds > > step s1_init: SELECT 'init' FROM > > pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); > > ?column? > > > > I am analyzing this. > > > > After checking below logs corresponding to this test, it seems test > has been executed and create_slot was successful: The pg_create_logical_replication_slot() was executed at 11:04:25: 2021-04-27 11:04:25.494 UTC [17694956:49] isolation/concurrent_ddl_dml LOG: statement: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); Therefore this command took 314 sec that matches the number the isolation test reported. And the folling logs follow: 2021-04-27 11:06:43.770 UTC [17694956:50] isolation/concurrent_ddl_dml LOG: logical decoding found consistent point at 0/17F9078 2021-04-27 11:06:43.770 UTC [17694956:51] isolation/concurrent_ddl_dml DETAIL: There are no running transactions. > 2021-04-27 11:06:43.770 UTC [17694956:52] isolation/concurrent_ddl_dml > STATEMENT: SELECT 'init' FROM > pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); > 2021-04-27 11:07:11.748 UTC [5243096:9] LOG: checkpoint starting: time > 2021-04-27 11:09:24.332 UTC [5243096:10] LOG: checkpoint complete: > wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; > write=0.716 s, sync=0.001 s, total=132.584 s; sync files=0, > longest=0.000 s, average=0.000 s; distance=198 kB, estimate=406 kB > 2021-04-27 11:09:40.116 UTC [6226046:1] [unknown] LOG: connection > received: host=[local] > 2021-04-27 11:09:40.117 UTC [17694956:53] isolation/concurrent_ddl_dml > LOG: statement: BEGIN; > 2021-04-27 11:09:40.117 UTC [17694956:54] isolation/concurrent_ddl_dml > LOG: statement: INSERT INTO tbl1 (val1, val2) VALUES (1, 1); > 2021-04-27 11:09:40.118 UTC [17694956:55] isolation/concurrent_ddl_dml > LOG: statement: INSERT INTO tbl2 (val1, val2) VALUES (1, 1); > 2021-04-27 11:09:40.119 UTC [10944636:49] isolation/concurrent_ddl_dml > LOG: statement: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE character > varying; > > I am not sure but there is some possibility that even though create > slot is successful, the isolation tester got successful in canceling > it, maybe because create_slot is just finished at the same time. Yeah, we see the test log "canceling step s1_init after 314 seconds" but don't see any log indicating canceling query. > As we > can see from logs, during this test checkpoint also happened which > could also lead to the slowness of this particular command. Yes. I also think the checkpoint could somewhat lead to the slowness. And since create_slot() took 2min to find a consistent snapshot the system might have already been busy. > > Also, I see a lot of messages like below which indicate stats > collector is also quite slow: > 2021-04-27 10:57:59.385 UTC [18743536:1] LOG: using stale statistics > instead of current ones because stats collector is not responding > > I am not sure if the timeout happened because the machine is slow or > is it in any way related to code. 
I am seeing some previous failures > due to timeout on this machine [1][2]. In those failures, I see the > "using stale stats" message. It seems like a time-dependent issue but I'm wondering why the logical decoding test failed at this time. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On Tue, Apr 27, 2021 at 10:43 PM Tomas Vondra wrote: > > > > On 4/27/21 7:34 AM, Masahiko Sawada wrote: > > On Tue, Apr 27, 2021 at 8:07 AM Andres Freund wrote: > >> > >> Hi, > >> > >> On 2021-04-26 23:59:17 +0200, Tomas Vondra wrote: > >>> On 4/26/21 9:27 PM, Andres Freund wrote: > On 2021-04-26 15:31:02 +0200, Tomas Vondra wrote: > > I'm not sure what to do about this :-( I don't have any ideas about how > > to > > eliminate this overhead, so the only option I see is reverting the > > changes > > in heap_insert. Unfortunately, that'd mean inserts into TOAST tables > > won't > > be frozen ... > > ISTM that the fundamental issue here is not that we acquire pins that we > shouldn't, but that we do so at a much higher frequency than needed. > > It's probably too invasive for 14, but I think it might be worth > exploring > passing down a BulkInsertState in nodeModifyTable.c's > table_tuple_insert() iff > the input will be more than one row. > > And then add the vm buffer of the target page to BulkInsertState, so that > hio.c can avoid re-pinning the buffer. > > >>> > >>> Yeah. The question still is what to do about 14, though. Shall we leave > >>> the > >>> code as it is now, or should we change it somehow? It seem a bit > >>> unfortunate > >>> that a COPY FREEZE optimization should negatively influence other (more) > >>> common use cases, so I guess we can't just keep the current code ... > >> > >> I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c > >> and see whether that fixes the regression. > > > > Is this idea to have RelationGetBufferForTuple() skip re-pinning > > vmbuffer? If so, is this essentially the same as the one in the v3 > > patch? > > > > I don't think it is the same approach - it's a bit hard to follow what > exactly happens in RelationGetBufferForTuple, but AFAICS it always > starts with vmbuffer = InvalidBuffer, so it may pin the vmbuffer quite > often, no? With that patch, we pin the vmbuffer only when inserting a frozen tuple into an empty page. That is, when inserting a frozen tuple into an empty page, we pin the vmbuffer and heap_insert() will mark the page all-visible and set all-frozen bit on vm. And from the next insertion (into the same page) until the page gets full, since the page is already all-visible, we skip pinning the vmbuffer. IOW, if the target page is not empty but all-visible, we skip pinning the vmbuffer. We pin the vmbuffer only once per heap page used during insertion. > > What Andres is suggesting (I think) is to modify ExecInsert() to pass a > valid bistate to table_tuple_insert, instead of just NULL, and store the > vmbuffer in it. Understood. This approach keeps using the same vmbuffer until we need another vm page corresponding to the target heap page, which seems better. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On Wed, Apr 28, 2021 at 12:26 AM Masahiko Sawada wrote: > > On Tue, Apr 27, 2021 at 10:43 PM Tomas Vondra > wrote: > > > > > > > > On 4/27/21 7:34 AM, Masahiko Sawada wrote: > > > On Tue, Apr 27, 2021 at 8:07 AM Andres Freund wrote: > > >> > > >> Hi, > > >> > > >> On 2021-04-26 23:59:17 +0200, Tomas Vondra wrote: > > >>> On 4/26/21 9:27 PM, Andres Freund wrote: > > On 2021-04-26 15:31:02 +0200, Tomas Vondra wrote: > > > I'm not sure what to do about this :-( I don't have any ideas about > > > how to > > > eliminate this overhead, so the only option I see is reverting the > > > changes > > > in heap_insert. Unfortunately, that'd mean inserts into TOAST tables > > > won't > > > be frozen ... > > > > ISTM that the fundamental issue here is not that we acquire pins that > > we > > shouldn't, but that we do so at a much higher frequency than needed. > > > > It's probably too invasive for 14, but I think it might be worth > > exploring > > passing down a BulkInsertState in nodeModifyTable.c's > > table_tuple_insert() iff > > the input will be more than one row. > > > > And then add the vm buffer of the target page to BulkInsertState, so > > that > > hio.c can avoid re-pinning the buffer. > > > > >>> > > >>> Yeah. The question still is what to do about 14, though. Shall we leave > > >>> the > > >>> code as it is now, or should we change it somehow? It seem a bit > > >>> unfortunate > > >>> that a COPY FREEZE optimization should negatively influence other (more) > > >>> common use cases, so I guess we can't just keep the current code ... > > >> > > >> I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c > > >> and see whether that fixes the regression. > > > > > > Is this idea to have RelationGetBufferForTuple() skip re-pinning > > > vmbuffer? If so, is this essentially the same as the one in the v3 > > > patch? > > > > > > > I don't think it is the same approach - it's a bit hard to follow what > > exactly happens in RelationGetBufferForTuple, but AFAICS it always > > starts with vmbuffer = InvalidBuffer, so it may pin the vmbuffer quite > > often, no? > > With that patch, we pin the vmbuffer only when inserting a frozen > tuple into an empty page. That is, when inserting a frozen tuple into > an empty page, we pin the vmbuffer and heap_insert() will mark the > page all-visible and set all-frozen bit on vm. And from the next > insertion (into the same page) until the page gets full, since the > page is already all-visible, we skip pinning the vmbuffer. IOW, if the > target page is not empty but all-visible, we skip pinning the > vmbuffer. We pin the vmbuffer only once per heap page used during > insertion. > > > > > What Andres is suggesting (I think) is to modify ExecInsert() to pass a > > valid bistate to table_tuple_insert, instead of just NULL, and store the > > vmbuffer in it. > > Understood. This approach keeps using the same vmbuffer until we need > another vm page corresponding to the target heap page, which seems > better. But how is ExecInsert() related to REFRESH MATERIALIZED VIEW? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 2021-Apr-27, Amit Langote wrote: > On Tue, Apr 27, 2021 at 4:34 PM Amit Langote wrote: > I think we may need a separate context for partdesc_nodetached, likely > with the same kludges as rd_pdcxt. Maybe the first problem will go > away with that as well. Ooh, seems I completely misunderstood what RelationClose was doing. I thought it was deleting the whole rd_pdcxt, *including* the "current" partdesc. But that's not at all what it does: it only deletes the *children* memcontexts, so the partdesc that is currently valid remains valid. I agree that your proposed fix appears to be promising, in that a separate "context tree" rd_pddcxt (?) can be used for this. I'll try it out now. > Few other minor things I noticed: > > +* Store it into relcache. For snapshots built excluding detached > +* partitions, which we save separately, we also record the > +* pg_inherits.xmin of the detached partition that was omitted; this > +* informs a future potential user of such a cached snapshot. > > The "snapshot" in the 1st and the last sentence should be "partdesc"? Doh, yeah. > + * We keep two partdescs in relcache: rd_partdesc_nodetached excludes > + * partitions marked concurrently being detached, while rd_partdesc includes > + * them. > > IMHO, describing rd_partdesc first in the sentence would be better. > Like: rd_partdesc includes all partitions including any that are being > concurrently detached, while rd_partdesc_nodetached excludes them. Makes sense. -- Álvaro Herrera Valdivia, Chile
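P.S. To be explicit about the shape I'm going to try (field names tentative, sketch only), mirroring the existing rd_pdcxt arrangement in RelationData:

    PartitionDesc rd_partdesc;                  /* includes detach-pending partitions */
    MemoryContext rd_pdcxt;                     /* private context tree holding rd_partdesc */
    PartitionDesc rd_partdesc_nodetached;       /* excludes detach-pending partitions */
    MemoryContext rd_pddcxt;                    /* private context tree holding the above */
    TransactionId rd_partdesc_nodetached_xmin;  /* xmin of the omitted pg_inherits row */

with RelationBuildPartitionDesc() applying the same reparent-the-old-context kluge to rd_pddcxt that it already applies to rd_pdcxt, and RelationClearRelation() resetting both descriptors.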
Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch
I wrote: >> Perhaps it'd be worth documenting that you can get the standard >> astronomical definition of Julian date by transposing to time zone UTC-12 >> before converting. BTW ... I'd first thought that the way to do this was to rotate to time zone UTC+12. I convinced myself on two separate days that UTC-12 was correct instead, but now I'm thinking I was right the first time. In particular, the results I'm getting with UTC-12 don't square with the example on Wikipedia [1], which says "the Julian Date for 00:30:00.0 UT January 1, 2013, is 2 456 293.520 833": regression=# select extract(julian from '2013-01-01 00:30+00'::timestamptz at time zone 'utc-12'); extract -- 2456294.5208 (1 row) But using UTC+12 does match: regression=# select extract(julian from '2013-01-01 00:30+00'::timestamptz at time zone 'utc+12'); extract -- 2456293.5208 (1 row) Of course Wikipedia has been known to contain errors, but now I'm inclined to think I blew this. Anyone want to check my work? regards, tom lane [1] https://en.wikipedia.org/wiki/Julian_day
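P.S. One more cross-check, using arithmetic derived from the same Wikipedia example rather than anything authoritative: 00:30 UT is 11.5 hours before noon, so at 2013-01-01 12:00 UT the astronomical Julian Date should be exactly 2456293.520833 + 11.5/24 = 2456294.0. Rotating to utc+12 reproduces that:

select extract(julian from '2013-01-01 12:00+00'::timestamptz at time zone 'utc+12');
-- 2456294

whereas utc-12 yields 2456295, a full day too late. That also points at UTC+12 being the right transposition, assuming I now have the POSIX sign convention straight.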
Re: SQL-standard function body
Peter Eisentraut writes: > On 18.04.21 23:33, Tom Lane wrote: >> The actual use-case for that seems pretty thin, so we never bothered >> to worry about it before. But if we're going to build loop-breaking >> logic to handle function body dependencies, it should deal with this >> too. I think that all that's required is for the initial dummy >> function declaration to omit defaults as well as providing a dummy >> body. > I have studied this a bit. I'm not sure where the dummy function > declaration should be created. The current dependency-breaking logic in > pg_dump_sort.c doesn't appear to support injecting additional objects > into the set of dumpable objects. So we would need to create it perhaps > in dumpFunc() and then later set flags that indicate whether it will be > required. Hmm, good point. The existing code that breaks loops involving views depends on the fact that the view relation and the view's ON SELECT rule are already treated as distinct objects within pg_dump. So we just need to mark the rule object to indicate whether to emit it or not. To make it work for functions, there would have to be a secondary object representing the function body (and the default expressions, I guess). That's kind of a lot of complication, and inefficiency, for a corner case that may never arise in practice. We've ignored the risk for default expressions, and AFAIR have yet to receive any field complaints about it. So maybe it's okay to do the same for SQL-style function bodies, at least for now. > Another option would be that we disallow this at creation time. Don't like that one much. The backend shouldn't be in the business of rejecting valid commands just because pg_dump might be unable to cope later. regards, tom lane
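P.S. For the archives, the dump output we'd be aiming for is roughly this (sketch only, invented function names), paralleling the view-vs-ON-SELECT-rule split:

-- first pass: dummy declaration, no defaults, dummy body
CREATE FUNCTION f2(x int, y int) RETURNS int
    LANGUAGE SQL RETURN 0;

-- ... dump f1, whose body references f2 ...

-- second pass: emitted by a secondary "function body" object
CREATE OR REPLACE FUNCTION f2(x int, y int DEFAULT f1(0, 0)) RETURNS int
    LANGUAGE SQL RETURN x + f1(y, 1);

The annoying part is the pg_dump bookkeeping needed to make that second command a separately schedulable object, per the above.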
Re: Performance degradation of REFRESH MATERIALIZED VIEW
On 4/27/21 5:44 PM, Masahiko Sawada wrote: On Wed, Apr 28, 2021 at 12:26 AM Masahiko Sawada wrote: On Tue, Apr 27, 2021 at 10:43 PM Tomas Vondra wrote: On 4/27/21 7:34 AM, Masahiko Sawada wrote: On Tue, Apr 27, 2021 at 8:07 AM Andres Freund wrote: Hi, On 2021-04-26 23:59:17 +0200, Tomas Vondra wrote: On 4/26/21 9:27 PM, Andres Freund wrote: On 2021-04-26 15:31:02 +0200, Tomas Vondra wrote: I'm not sure what to do about this :-( I don't have any ideas about how to eliminate this overhead, so the only option I see is reverting the changes in heap_insert. Unfortunately, that'd mean inserts into TOAST tables won't be frozen ... ISTM that the fundamental issue here is not that we acquire pins that we shouldn't, but that we do so at a much higher frequency than needed. It's probably too invasive for 14, but I think it might be worth exploring passing down a BulkInsertState in nodeModifyTable.c's table_tuple_insert() iff the input will be more than one row. And then add the vm buffer of the target page to BulkInsertState, so that hio.c can avoid re-pinning the buffer. Yeah. The question still is what to do about 14, though. Shall we leave the code as it is now, or should we change it somehow? It seem a bit unfortunate that a COPY FREEZE optimization should negatively influence other (more) common use cases, so I guess we can't just keep the current code ... I'd suggest prototyping the use of BulkInsertState in nodeModifyTable.c and see whether that fixes the regression. Is this idea to have RelationGetBufferForTuple() skip re-pinning vmbuffer? If so, is this essentially the same as the one in the v3 patch? I don't think it is the same approach - it's a bit hard to follow what exactly happens in RelationGetBufferForTuple, but AFAICS it always starts with vmbuffer = InvalidBuffer, so it may pin the vmbuffer quite often, no? With that patch, we pin the vmbuffer only when inserting a frozen tuple into an empty page. That is, when inserting a frozen tuple into an empty page, we pin the vmbuffer and heap_insert() will mark the page all-visible and set all-frozen bit on vm. And from the next insertion (into the same page) until the page gets full, since the page is already all-visible, we skip pinning the vmbuffer. IOW, if the target page is not empty but all-visible, we skip pinning the vmbuffer. We pin the vmbuffer only once per heap page used during insertion. What Andres is suggesting (I think) is to modify ExecInsert() to pass a valid bistate to table_tuple_insert, instead of just NULL, and store the vmbuffer in it. Understood. This approach keeps using the same vmbuffer until we need another vm page corresponding to the target heap page, which seems better. But how is ExecInsert() related to REFRESH MATERIALIZED VIEW? TBH I haven't looked into the details, but Andres talked about nodeModifyTable and table_tuple_insert, and ExecInsert is the only place calling it. But maybe I'm just confused and Andres meant something else? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
This v3 handles things as you suggested and works correctly AFAICT. I'm going to add some more tests cases to verify the behavior in the scenarios you showed, and get them to run under cache-clobber options to make sure it's good. Thanks! -- Álvaro Herrera Valdivia, Chile >From 145ed63c4338d7c7804329e736019a00c97f1a7a Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 26 Apr 2021 14:53:04 -0400 Subject: [PATCH v3] Allow a partdesc-omitting-partitions to be cached Makes partition descriptor acquisition faster during the transient period in which a partition is in the process of being detached. Per gripe from Amit Langote Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com --- src/backend/catalog/pg_inherits.c | 52 - src/backend/commands/tablecmds.c | 66 +++- src/backend/commands/trigger.c| 3 +- src/backend/partitioning/partdesc.c | 101 -- src/backend/utils/cache/relcache.c| 33 +- src/include/catalog/pg_inherits.h | 6 +- src/include/utils/rel.h | 5 + .../detach-partition-concurrently-3.out | 57 +- .../detach-partition-concurrently-3.spec | 9 +- 9 files changed, 254 insertions(+), 78 deletions(-) diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index 6447b52854..c373faf2d6 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -52,6 +52,22 @@ typedef struct SeenRelsEntry * then no locks are acquired, but caller must beware of race conditions * against possible DROPs of child relations. * + * Partitions marked as being detached are omitted; see + * find_inheritance_children_extended for details. + */ +List * +find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) +{ + return find_inheritance_children_extended(parentrelId, true, lockmode, + NULL, NULL); +} + +/* + * find_inheritance_children_extended + * + * As find_inheritance_children, with more options regarding detached + * partitions. + * * If a partition's pg_inherits row is marked "detach pending", * *detached_exist (if not null) is set true. * @@ -60,11 +76,13 @@ typedef struct SeenRelsEntry * marked "detach pending" is visible to that snapshot, then that partition is * omitted from the output list. This makes partitions invisible depending on * whether the transaction that marked those partitions as detached appears - * committed to the active snapshot. + * committed to the active snapshot. In addition, *detached_xmin (if not null) + * is set to the xmin of the row of the detached partition. */ List * -find_inheritance_children(Oid parentrelId, bool omit_detached, - LOCKMODE lockmode, bool *detached_exist) +find_inheritance_children_extended(Oid parentrelId, bool omit_detached, + LOCKMODE lockmode, bool *detached_exist, + TransactionId *detached_xmin) { List *list = NIL; Relation relation; @@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached, snap = GetActiveSnapshot(); if (!XidInMVCCSnapshot(xmin, snap)) +{ + if (detached_xmin) + { + /* + * Two detached partitions should not occur (see + * checks in MarkInheritDetached), but if they do, + * track the newer of the two. Make sure to warn the + * user, so that they can clean up. Since this is + * just a cross-check against potentially corrupt + * catalogs, we don't make it a full-fledged error + * message. 
+ */ + if (*detached_xmin != InvalidTransactionId) + { + elog(WARNING, "more than one partition pending detach found for table with OID %u", + parentrelId); + if (TransactionIdFollows(xmin, *detached_xmin)) +*detached_xmin = xmin; + } + else + *detached_xmin = xmin; + } + + /* Don't add the partition to the output list */ continue; +} } } @@ -247,8 +290,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) ListCell *lc; /* Get the direct children of this rel */ - currentchildren = find_inheritance_children(currentrel, true, - lockmode, NULL); + currentchildren = find_inheritance_children(currentrel, lockmode); /* * Add to the queue only those children not already seen. This avoids diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 8e717ada28..d9ba87a2a3 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3492,7 +3492,7 @@ renameatt_internal(Oid myrelid, * expected_parents will only be 0 if we are not already recursing. */ if (expected_parents == 0 && - find_inheritance_children(myrelid, true, NoLock, NULL) != NIL) + find_inheritance_children(myrelid, NoLock) != NIL) ereport(ERROR, (
Re: Addition of authenticated ID to pg_stat_activity
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Apr 26, 2021 at 03:21:46PM -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > >> I'm getting a bit worried about the incremental increase in > >> pg_stat_activity width - it's probably by far the view that's most > >> viewed interactively. I think we need to be careful not to add too niche > >> things to it. This is especially true for columns that may be wider. > >> > >> It'd be bad for discoverability, but perhaps something like this, that's > >> not that likely to be used interactively, would be better done as a > >> separate function that would need to be used explicitly? > > > > I mean.. we already have separate functions and views for this, though > > they're auth-method-specific currently, but also provide more details, > > since it isn't actually a "one size fits all" kind of thing like this > > entire approach is imagining it to be. > > Referring to pg_stat_ssl and pg_stat_gssapi here, right? Yes, that > would be very limited as this leads to no visibility for LDAP, all > password-based authentications and more. Yes, of course. The point being made was that we could do the same for the other auth methods rather than adding something to pg_stat_activity. > I am wondering if we should take this as an occasion to move some data > out of pg_stat_activity into a separate biew, dedicated to the data > related to the connection that remains set to the same value for the > duration of a backend's life, as of the following set: > - the backend PID > - client_addr > - client_hostname > - client_port > - authenticated ID > - application_name? (well, this one could change on reload, so I am > lying). application_name certainly changes, as pointed out elsewhere. > It would be tempting to move the database name and the username but > these are popular fields with monitoring. Maybe we could name that > pg_stat_connection_status, pg_stat_auth_status or just > pg_stat_connection? I don't know that there's really any of the existing fields that aren't "popular fields with monitoring".. The issue that Andres brought up wasn't about monitoring though- it was about users looking interactively. Monitoring systems can adjust their queries for the new major version to do whatever joins, et al, they need and that's a once-per-major-version to do. On the other hand, people doing: table pg_stat_activity; Would like to get the info they really want out of that and not anything else. If we're going to adjust the fields returned from that then that's really the lens we should use. So, what fields are people really looking at when querying pg_stat_activity interactively? User, database, pid, last query, transaction start, query start, state, wait event info, maybe backend xmin/xid? I doubt most people looking at pg_stat_activity interactively actually care about the non-user backends (autovacuum, et al). Thanks, Stephen signature.asc Description: PGP signature
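P.S. Just to make "the info they really want" concrete, the interactive shape I have in mind is nothing fancier than this (illustrative only, not a proposal for the actual name or final column list):

CREATE VIEW pg_stat_activity_interactive AS
    SELECT pid, usename, datname, state,
           wait_event_type, wait_event,
           xact_start, query_start, backend_xmin, query
    FROM pg_stat_activity
    WHERE backend_type = 'client backend';

with everything else still reachable by joining the wider view(s) on pid.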
Re: Addition of authenticated ID to pg_stat_activity
On Tue, Apr 27, 2021 at 09:59:18AM +0900, Michael Paquier wrote: > On Mon, Apr 26, 2021 at 03:21:46PM -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > >> I'm getting a bit worried about the incremental increase in > >> pg_stat_activity width - it's probably by far the view that's most > >> viewed interactively. I think we need to be careful not to add too niche > >> things to it. This is especially true for columns that may be wider. > >> > >> It'd be bad for discoverability, but perhaps something like this, that's > >> not that likely to be used interactively, would be better done as a > >> separate function that would need to be used explicitly? > > > > I mean.. we already have separate functions and views for this, though > > they're auth-method-specific currently, but also provide more details, > > since it isn't actually a "one size fits all" kind of thing like this > > entire approach is imagining it to be. > > I am wondering if we should take this as an occasion to move some data > out of pg_stat_activity into a separate biew, dedicated to the data > related to the connection that remains set to the same value for the > duration of a backend's life, as of the following set: > - the backend PID > - client_addr > - client_hostname > - client_port > - authenticated ID > - application_name? (well, this one could change on reload, so I am > lying). +backend type +leader_PID > It would be tempting to move the database name and the username but > these are popular fields with monitoring. Maybe we could name that > pg_stat_connection_status, pg_stat_auth_status or just > pg_stat_connection? Maybe - there could also be a trivial view which JOINs pg_stat_activity and pg_stat_connection ON (pid). Technically I think it could also move backend_start/backend_xmin, but it'd be odd to move them if the other timestamp/xid columns stayed in pg_stat_activity. There's no reason that pg_stat_connection would *have* to be "static" per connction, right ? That's just how you're defining what would be included. Stephen wrote: > Would like to get the info they really want out of that and not anything > else. If we're going to adjust the fields returned from that then > that's really the lens we should use. > > So, what fields are people really looking at when querying > pg_stat_activity interactively? User, database, pid, last query, > transaction start, query start, state, wait event info, maybe backend > xmin/xid? I doubt most people looking at pg_stat_activity interactively > actually care about the non-user backends (autovacuum, et al). I think the narrow/userfacing view would exclude only the OID/XID fields: datid| oid | | | usesysid | oid | | | backend_xid | xid | | | backend_xmin | xid | | | I think interactive users *would* care about other backend types - they're frequently wondering "what's going on?" TBH, query text is often so long that I have to write left(query,33), and then the idea of a "userfacing" variant loses its appeal, since it's necessary to enumerate columns anyway. -- Justin
Re: Performance degradation of REFRESH MATERIALIZED VIEW
Hi, On 2021-04-28 00:44:47 +0900, Masahiko Sawada wrote: > On Wed, Apr 28, 2021 at 12:26 AM Masahiko Sawada > wrote: > > > What Andres is suggesting (I think) is to modify ExecInsert() to pass a > > > valid bistate to table_tuple_insert, instead of just NULL, and store the > > > vmbuffer in it. > > > > Understood. This approach keeps using the same vmbuffer until we need > > another vm page corresponding to the target heap page, which seems > > better. > > But how is ExecInsert() related to REFRESH MATERIALIZED VIEW? I was thinking of the CONCURRENTLY path for REFRESH MATERIALIZED VIEW I think. Or something. That actually makes it easier - we already pass in a bistate in the relevant paths. So if we add a current_vmbuf to BulkInsertStateData, we can avoid needing to pin so often. It seems that'd end up with a good bit cleaner and less risky code than the skip_vmbuffer_for_frozen_tuple_insertion_v3.patch approach. The current RelationGetBufferForTuple() interface / how it's used in heapam.c doesn't make this quite as trivial as it could be... Attached is a quick hack implementing this. For me it reduces the overhead noticably: REFRESH MATERIALIZED VIEW mv; before: Time: 26542.333 ms (00:26.542) after: Time: 23105.047 ms (00:23.105) Greetings, Andres Freund diff --git i/src/include/access/hio.h w/src/include/access/hio.h index 1d611287c08..6d8f18152f1 100644 --- i/src/include/access/hio.h +++ w/src/include/access/hio.h @@ -30,6 +30,7 @@ typedef struct BulkInsertStateData { BufferAccessStrategy strategy; /* our BULKWRITE strategy object */ Buffer current_buf; /* current insertion target page */ + Buffer current_vmbuf; } BulkInsertStateData; diff --git i/src/backend/access/heap/heapam.c w/src/backend/access/heap/heapam.c index 13396eb7f2c..5a63efc4386 100644 --- i/src/backend/access/heap/heapam.c +++ w/src/backend/access/heap/heapam.c @@ -2011,6 +2011,7 @@ GetBulkInsertState(void) bistate = (BulkInsertState) palloc(sizeof(BulkInsertStateData)); bistate->strategy = GetAccessStrategy(BAS_BULKWRITE); bistate->current_buf = InvalidBuffer; + bistate->current_vmbuf = InvalidBuffer; return bistate; } @@ -2020,8 +2021,7 @@ GetBulkInsertState(void) void FreeBulkInsertState(BulkInsertState bistate) { - if (bistate->current_buf != InvalidBuffer) - ReleaseBuffer(bistate->current_buf); + ReleaseBulkInsertStatePin(bistate); FreeAccessStrategy(bistate->strategy); pfree(bistate); } @@ -2033,8 +2033,15 @@ void ReleaseBulkInsertStatePin(BulkInsertState bistate) { if (bistate->current_buf != InvalidBuffer) + { ReleaseBuffer(bistate->current_buf); - bistate->current_buf = InvalidBuffer; + bistate->current_buf = InvalidBuffer; + } + if (bistate->current_vmbuf != InvalidBuffer) + { + ReleaseBuffer(bistate->current_vmbuf); + bistate->current_vmbuf = InvalidBuffer; + } } @@ -2277,8 +2284,12 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, } UnlockReleaseBuffer(buffer); - if (vmbuffer != InvalidBuffer) + + if (vmbuffer != InvalidBuffer && + (!bistate || bistate->current_vmbuf != vmbuffer)) + { ReleaseBuffer(vmbuffer); + } /* * If tuple is cachable, mark it for invalidation from the caches in case @@ -2655,8 +2666,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, } /* We're done with inserting all tuples, so release the last vmbuffer. */ - if (vmbuffer != InvalidBuffer) + if (vmbuffer != InvalidBuffer && + (!bistate || bistate->current_vmbuf != vmbuffer)) + { ReleaseBuffer(vmbuffer); + } /* * We're done with the actual inserts. 
Check for conflicts again, to diff --git i/src/backend/access/heap/hio.c w/src/backend/access/heap/hio.c index ffc89685bff..a573322bb6d 100644 --- i/src/backend/access/heap/hio.c +++ w/src/backend/access/heap/hio.c @@ -136,7 +136,8 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock, * be less than block2. */ static void -GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, +GetVisibilityMapPins(Relation relation, BulkInsertState bistate, + Buffer buffer1, Buffer buffer2, BlockNumber block1, BlockNumber block2, Buffer *vmbuffer1, Buffer *vmbuffer2) { @@ -157,6 +158,12 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2, if (!need_to_pin_buffer1 && !need_to_pin_buffer2) return; + if (bistate && bistate->current_vmbuf != InvalidBuffer) + { + ReleaseBuffer(bistate->current_vmbuf); + bistate->current_vmbuf = InvalidBuffer; + } + /* We must unlock both buffers before doing any I/O. */ LockBuffer(buffer1, BUFFER_LOCK_UNLOCK); if (buffer2 != InvalidBuffer && buffer2 != buffer1) @@ -269,6 +276,26 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate) FreeSpaceMapVacuumRange(relation, firstBlock, blockNum + 1); } +static void +visibilitymap_pin_bi(Relation rel, BlockNumber targetBlock, BulkInsertState bistate, Buffer *vmbuffer) +{ + if (bistate != NULL) + { + if (*vmbuf
Re: Addition of authenticated ID to pg_stat_activity
Hi, On 2021-04-27 12:40:29 -0400, Stephen Frost wrote: > So, what fields are people really looking at when querying > pg_stat_activity interactively? User, database, pid, last query, > transaction start, query start, state, wait event info, maybe backend > xmin/xid? I doubt most people looking at pg_stat_activity interactively > actually care about the non-user backends (autovacuum, et al). Not representative, but I personally am about as often interested in one of the non-connection processes as the connection ones. E.g. investigating what is autovacuum's bottleneck, are checkpointer / wal writer / bgwriter io bound or keeping up, etc. Greetings, Andres Freund
Release 14 Beta 1
Greetings. The Release Management Team (Pete Geoghegan, Michael Paquier and myself) proposes that the date of the Beta 1 release will be **Thursday May 20, 2021**, which aligns with past practice. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 2021-Apr-27, Alvaro Herrera wrote: > This v3 handles things as you suggested and works correctly AFAICT. I'm > going to add some more tests cases to verify the behavior in the > scenarios you showed, and get them to run under cache-clobber options to > make sure it's good. Yep, it seems to work. Strangely, the new isolation case doesn't actually crash before the fix -- it merely throws a memory allocation error. -- Álvaro Herrera Valdivia, Chile "Linux transformó mi computadora, de una `máquina para hacer cosas', en un aparato realmente entretenido, sobre el cual cada día aprendo algo nuevo" (Jaime Salinas) >From 8e41eb137b835f614ddeeb7c79e70d7e0740f522 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 27 Apr 2021 19:04:37 -0400 Subject: [PATCH v4] Allow a partdesc-omitting-partitions to be cached Makes partition descriptor acquisition faster during the transient period in which a partition is in the process of being detached. (This incidentally fixes an alleged bug in 8aba9322511 whereby a memory context holding a transient partdesc is reparented to NULL ... leading to permanent leak of that memory. Reported by Amit Langote.) Per gripe from Amit Langote Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com --- src/backend/catalog/pg_inherits.c | 52 - src/backend/commands/tablecmds.c | 66 ++- src/backend/commands/trigger.c| 3 +- src/backend/partitioning/partdesc.c | 110 -- src/backend/utils/cache/relcache.c| 33 +- src/include/catalog/pg_inherits.h | 6 +- src/include/utils/rel.h | 5 + .../detach-partition-concurrently-3.out | 110 +- .../detach-partition-concurrently-3.spec | 14 ++- 9 files changed, 320 insertions(+), 79 deletions(-) diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c index 6447b52854..c373faf2d6 100644 --- a/src/backend/catalog/pg_inherits.c +++ b/src/backend/catalog/pg_inherits.c @@ -52,6 +52,22 @@ typedef struct SeenRelsEntry * then no locks are acquired, but caller must beware of race conditions * against possible DROPs of child relations. * + * Partitions marked as being detached are omitted; see + * find_inheritance_children_extended for details. + */ +List * +find_inheritance_children(Oid parentrelId, LOCKMODE lockmode) +{ + return find_inheritance_children_extended(parentrelId, true, lockmode, + NULL, NULL); +} + +/* + * find_inheritance_children_extended + * + * As find_inheritance_children, with more options regarding detached + * partitions. + * * If a partition's pg_inherits row is marked "detach pending", * *detached_exist (if not null) is set true. * @@ -60,11 +76,13 @@ typedef struct SeenRelsEntry * marked "detach pending" is visible to that snapshot, then that partition is * omitted from the output list. This makes partitions invisible depending on * whether the transaction that marked those partitions as detached appears - * committed to the active snapshot. + * committed to the active snapshot. In addition, *detached_xmin (if not null) + * is set to the xmin of the row of the detached partition. 
*/ List * -find_inheritance_children(Oid parentrelId, bool omit_detached, - LOCKMODE lockmode, bool *detached_exist) +find_inheritance_children_extended(Oid parentrelId, bool omit_detached, + LOCKMODE lockmode, bool *detached_exist, + TransactionId *detached_xmin) { List *list = NIL; Relation relation; @@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached, snap = GetActiveSnapshot(); if (!XidInMVCCSnapshot(xmin, snap)) +{ + if (detached_xmin) + { + /* + * Two detached partitions should not occur (see + * checks in MarkInheritDetached), but if they do, + * track the newer of the two. Make sure to warn the + * user, so that they can clean up. Since this is + * just a cross-check against potentially corrupt + * catalogs, we don't make it a full-fledged error + * message. + */ + if (*detached_xmin != InvalidTransactionId) + { + elog(WARNING, "more than one partition pending detach found for table with OID %u", + parentrelId); + if (TransactionIdFollows(xmin, *detached_xmin)) +*detached_xmin = xmin; + } + else + *detached_xmin = xmin; + } + + /* Don't add the partition to the output list */ continue; +} } } @@ -247,8 +290,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents) ListCell *lc; /* Get the direct children of this rel */ - currentchildren = find_inheritance_children(currentrel, true, - lockmode, NULL); + currentchildren = find_inheritance_children(currentrel, lockmode); /* * Add to the qu
Re: wal stats questions
On 2021/04/27 21:56, Fujii Masao wrote: > > > On 2021/04/26 10:11, Masahiro Ikeda wrote: >> >> First patch has only the changes for pg_stat_wal view. >> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >> > > + pgWalUsage.wal_records == prevWalUsage.wal_records && > + walStats.wal_write == 0 && walStats.wal_sync == 0 && > > WalStats.m_wal_write should be checked here instead of walStats.wal_write? Thanks! Yes, I'll fix it. > Is there really the case where the number of sync is larger than zero when > the number of writes is zero? If not, it's enough to check only the number > of writes? I thought that there is such a case if "wal_sync_method" is fdatasync, fsync or fsync_writethrough. An example case is the following. (1) backend-1 writes the WAL data because the WAL buffer has no space, but it doesn't sync the WAL data. (2) backend-2 reads data pages. During execution it needs to write and sync the WAL because dirty pages are selected as victim pages. backend-2 only needs to sync the WAL data, because the WAL data were already written by backend-1 but not yet synced. I'm ok to change it since it's a rare case. > + * wal records weren't generated. So, the counters of 'wal_fpi', > + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. > > It's better to add the assertion check that confirms > m_wal_buffers_full == 0 whenever wal_records is larger than zero? Sorry, I don't understand that yet. I thought that m_wal_buffers_full can be larger than 0 if wal_records > 0. Do you suggest that the following assertion is needed? - if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) - return false; + if (pgWalUsage.wal_records == prevWalUsage.wal_records && + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0) + { + Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes == 0 && + WalStats.m_wal_buffers_full == 0 && WalStats.m_wal_write_time == 0 && + WalStats.m_wal_sync_time == 0); + return; + } >> Second one has the changes for the type of the BufferUsage's and WalUsage's >> members. I change the type from long to int64. Is it better to make new >> thread? >> ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch") > > Will review the patch later. I'm ok to discuss that in this thread. Thanks! Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: Unresolved replication hang and stop problem.
On 2021-Apr-14, Amit Kapila wrote: > On Tue, Apr 13, 2021 at 1:18 PM Krzysztof Kois > wrote: > > > > Hello, > > After upgrading the cluster from 10.x to 13.1 we've started getting a > > problem describe pgsql-general: > > https://www.postgresql.org/message-id/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com > > We've noticed similar issue being described on this list in > > https://www.postgresql-archive.org/Logical-replication-CPU-bound-with-TRUNCATE-DROP-CREATE-many-tables-tt6155123.html > > with a fix being rolled out in 13.2. > > The fix for the problem discussed in the above threads is committed > only in PG-14, see [1]. I don't know what makes you think it is fixed > in 13.2. Also, it is not easy to back-patch that because this fix > depends on some of the infrastructure introduced in PG-14. > > [1] - > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d7181d83cf2363570f7a205b8eb1008dbc Hmm ... On what does it depend (other than plain git conflicts, which are aplenty)? On a quick look to the commit, it's clear that we need to be careful in order not to cause an ABI break, but that doesn't seem impossible to solve, but I'm wondering if there is more to it than that. -- Álvaro Herrera39°49'30"S 73°17'W
Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...
On Tue, Apr 27, 2021 at 12:33:25PM +0300, Aleksander Alekseev wrote: > Hi David, > > > I noticed that $subject completes with already valid constraints, > > please find attached a patch that fixes it. I noticed that there are > > other places constraints can be validated, but didn't check whether > > similar bugs exist there yet. > > There was a typo in the patch ("... and and not convalidated"). I've fixed > it. Otherwise the patch passes all the tests and works as expected: > > eax=# create table foo (x int); > CREATE TABLE > eax=# alter table foo add constraint bar check (x < 3) not valid; > ALTER TABLE > eax=# alter table foo add constraint baz check (x <> 5) not valid; > ALTER TABLE > eax=# alter table foo validate constraint ba > bar baz > eax=# alter table foo validate constraint bar; > ALTER TABLE Sorry about that typo, and thanks for poking at this! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Error in libpq docs for target_session_attrs, prefer-standby mode
Hi, I spotted an error in the development version documentation for libpq's connection parameter "target_session_attrs" (34.1.2 Parameter Key Words). In the description for the "prefer-standby" mode, it says "... but if none of the listed hosts is a standby server, try again in all mode". There is no such "all" mode. It should instead say "any" mode. Patch is attached. Regards, Greg Nancarrow Fujitsu Australia libpq_target_session_attrs_doc_fix.patch Description: Binary data
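P.S. For anyone verifying the corrected wording, the parameter is exercised with a multi-host connection string such as (hypothetical host names):

psql "host=standby1,primary1 dbname=postgres target_session_attrs=prefer-standby"

If neither listed host turns out to be a standby, libpq retries in "any" mode (not a nonexistent "all" mode), so the connection still succeeds against a primary.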
Re: Replication slot stats misgivings
On Tue, Apr 27, 2021 at 8:28 PM Masahiko Sawada wrote: > > On Tue, Apr 27, 2021 at 11:29 PM Amit Kapila wrote: > > > > On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila wrote: > > > > > > > I am not sure if the timeout happened because the machine is slow or > > is it in any way related to code. I am seeing some previous failures > > due to timeout on this machine [1][2]. In those failures, I see the > > "using stale stats" message. > > It seems like a time-dependent issue but I'm wondering why the logical > decoding test failed at this time. > As per the analysis done till now, it appears to be due to the reason that the machine is slow which leads to timeout and there appear to be some prior failures related to timeout as well. I think it is better to wait for another run (or few runs) to see if this occurs again. -- With Regards, Amit Kapila.
Re: Replication slot stats misgivings
On Wed, Apr 28, 2021 at 8:28 AM Amit Kapila wrote: > > On Tue, Apr 27, 2021 at 8:28 PM Masahiko Sawada wrote: > > > > On Tue, Apr 27, 2021 at 11:29 PM Amit Kapila > > wrote: > > > > > > On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila > > > wrote: > > > > > > > > > > I am not sure if the timeout happened because the machine is slow or > > > is it in any way related to code. I am seeing some previous failures > > > due to timeout on this machine [1][2]. In those failures, I see the > > > "using stale stats" message. > > > > It seems like a time-dependent issue but I'm wondering why the logical > > decoding test failed at this time. > > > > As per the analysis done till now, it appears to be due to the reason > that the machine is slow which leads to timeout and there appear to be > some prior failures related to timeout as well. I think it is better > to wait for another run (or few runs) to see if this occurs again. > Yes, checkpoint seems to take a lot of time, could be because the machine is slow. Let's wait for the next run and see. Regards, Vignesh
Re: [PATCH] Re: pg_identify_object_as_address() doesn't support pg_event_trigger oids
On Tue, Apr 27, 2021 at 02:33:36PM +0200, Joel Jacobson wrote: > I've successfully tested > fix_event_trigger_pg_identify_object_as_address3.patch. Thanks. Applied down to 9.6 then. -- Michael signature.asc Description: PGP signature
Re: Unresolved replication hang and stop problem.
On Wed, Apr 28, 2021 at 6:48 AM Alvaro Herrera wrote: > > On 2021-Apr-14, Amit Kapila wrote: > > > On Tue, Apr 13, 2021 at 1:18 PM Krzysztof Kois > > wrote: > > > > > > Hello, > > > After upgrading the cluster from 10.x to 13.1 we've started getting a > > > problem describe pgsql-general: > > > https://www.postgresql.org/message-id/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com > > > We've noticed similar issue being described on this list in > > > https://www.postgresql-archive.org/Logical-replication-CPU-bound-with-TRUNCATE-DROP-CREATE-many-tables-tt6155123.html > > > with a fix being rolled out in 13.2. > > > > The fix for the problem discussed in the above threads is committed > > only in PG-14, see [1]. I don't know what makes you think it is fixed > > in 13.2. Also, it is not easy to back-patch that because this fix > > depends on some of the infrastructure introduced in PG-14. > > > > [1] - > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d7181d83cf2363570f7a205b8eb1008dbc > > Hmm ... On what does it depend (other than plain git conflicts, which > are aplenty)? On a quick look to the commit, it's clear that we need to > be careful in order not to cause an ABI break, but that doesn't seem > impossible to solve, but I'm wondering if there is more to it than that. > As mentioned in the commit message, we need another commit [1] change to make this work. [1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c55040ccd0 -- With Regards, Amit Kapila.
Re: Replication slot stats misgivings
On Tue, Apr 27, 2021 at 11:02 AM vignesh C wrote: > > On Tue, Apr 27, 2021 at 9:48 AM vignesh C wrote: > > > > Attached patch has the changes to update statistics during > spill/stream which prevents the statistics from being lost during > interrupt. > void -UpdateDecodingStats(LogicalDecodingContext *ctx) +UpdateDecodingStats(ReorderBuffer *rb) I don't think you need to change this interface because reorderbuffer->private_data points to LogicalDecodingContext. See StartupDecodingContext. Other than that there is a comment in the code "Update the decoding stats at transaction prepare/commit/abort...". This patch should extend that comment by saying something like "Additionally we send the stats when we spill or stream the changes to avoid losing them in case the decoding is interrupted." -- With Regards, Amit Kapila.
Re: Skip temporary table schema name from explain-verbose output.
On Tue, Apr 27, 2021 at 7:08 PM Bharath Rupireddy wrote: > > On Tue, Apr 27, 2021 at 6:59 PM Ashutosh Bapat > wrote: > > > > On Tue, Apr 27, 2021 at 12:23 PM Amul Sul wrote: > > > > > > > > How about using an explain filter to replace the unstable text > > > > pg_temp_3 to pg_temp_N instead of changing it in the core? Following > > > > are the existing explain filters: explain_filter, > > > > explain_parallel_append, explain_analyze_without_memory, > > > > explain_resultcache, explain_parallel_sort_stats, explain_sq_limit. > > > > > > > > > > Well, yes eventually, that will be the kludge. I was wondering if that > > > table is accessible in a query via pg_temp schema then why should > > > bother about printing the pg_temp_N schema name which is an internal > > > purpose. > > > > Although only the associated session can access objects from that > > schema, I think, the entries in pg_class have different namespace oids > > and are accessible from other sessions. So knowing the actual schema > > name is useful for debugging purposes. Using auto_explain, the explain > > output goes to server log, where access to two temporary tables with > > the same name from different sessions can be identified by the actual > > schema name easily. > > Make sense, we would lose the ability to differentiate temporary tables from the auto_explain logs. Regards, Amul
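P.S. If anyone does want to stabilise such output in the regression tests, the kludge would presumably follow the existing filter functions, e.g. (sketch only, assuming a plpgsql helper defined in the test file):

create function explain_filter_temp(text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in execute $1
    loop
        -- mask the session-dependent temp schema number: pg_temp_3 -> pg_temp_N
        ln := regexp_replace(ln, 'pg_temp_\d+', 'pg_temp_N', 'g');
        return next ln;
    end loop;
end;
$$;

select explain_filter_temp('explain (verbose, costs off) select * from my_temp_tab');

while auto_explain output in the server log would keep the real pg_temp_N name, as Ashutosh points out.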
Re: Replication slot stats misgivings
On Wed, Apr 28, 2021 at 8:59 AM Amit Kapila wrote: > > On Tue, Apr 27, 2021 at 11:02 AM vignesh C wrote: > > > > On Tue, Apr 27, 2021 at 9:48 AM vignesh C wrote: > > > > > > > Attached patch has the changes to update statistics during > > spill/stream which prevents the statistics from being lost during > > interrupt. > > > > void > -UpdateDecodingStats(LogicalDecodingContext *ctx) > +UpdateDecodingStats(ReorderBuffer *rb) > > I don't think you need to change this interface because > reorderbuffer->private_data points to LogicalDecodingContext. See > StartupDecodingContext. Other than that there is a comment in the code > "Update the decoding stats at transaction prepare/commit/abort...". > This patch should extend that comment by saying something like > "Additionally we send the stats when we spill or stream the changes to > avoid losing them in case the decoding is interrupted." Thanks for the comments. Please find attached the v4 patch with the fixes for the same. Regards, Vignesh

From 533c45c4cbd4545350d464bbf7a2df91ee668a75 Mon Sep 17 00:00:00 2001
From: vignesh
Date: Tue, 27 Apr 2021 10:56:02 +0530
Subject: [PATCH v4] Update replication statistics after every stream/spill.

Currently, replication slot statistics are updated at prepare, commit, and
rollback. Now, if the transaction is interrupted, the stats might not get
updated. Fixed this by updating replication statistics after every
stream/spill.
---
 src/backend/replication/logical/decode.c        | 14 ++++++++------
 src/backend/replication/logical/reorderbuffer.c |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7924581cdc..888e064ec0 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -746,9 +746,10 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
     }

     /*
-     * Update the decoding stats at transaction prepare/commit/abort. It is
-     * not clear that sending more or less frequently than this would be
-     * better.
+     * Update the decoding stats at transaction prepare/commit/abort.
+     * Additionally we send the stats when we spill or stream the changes to
+     * avoid losing them in case the decoding is interrupted. It is not clear
+     * that sending more or less frequently than this would be better.
      */
     UpdateDecodingStats(ctx);
 }
@@ -828,9 +829,10 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
     ReorderBufferPrepare(ctx->reorder, xid, parsed->twophase_gid);

     /*
-     * Update the decoding stats at transaction prepare/commit/abort. It is
-     * not clear that sending more or less frequently than this would be
-     * better.
+     * Update the decoding stats at transaction prepare/commit/abort.
+     * Additionally we send the stats when we spill or stream the changes to
+     * avoid losing them in case the decoding is interrupted. It is not clear
+     * that sending more or less frequently than this would be better.
      */
     UpdateDecodingStats(ctx);
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..ceb83bcbf9 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3551,6 +3551,9 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
         /* don't consider already serialized transactions */
         rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ?
             0 : 1;
+
+        /* update the decoding stats */
+        UpdateDecodingStats(rb->private_data);
     }

     Assert(spilled == txn->nentries_mem);
@@ -3920,6 +3923,9 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
     /* Don't consider already streamed transaction. */
     rb->streamTxns += (txn_is_streamed) ? 0 : 1;

+    /* update the decoding stats */
+    UpdateDecodingStats(rb->private_data);
+
     Assert(dlist_is_empty(&txn->changes));
     Assert(txn->nentries == 0);
     Assert(txn->nentries_mem == 0);
--
2.25.1
Re: Replication slot stats misgivings
On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila wrote: > > On Tue, Apr 27, 2021 at 11:02 AM vignesh C wrote: > > > > On Tue, Apr 27, 2021 at 9:48 AM vignesh C wrote: > > > > > > > Attached patch has the changes to update statistics during > > spill/stream which prevents the statistics from being lost during > > interrupt. > > > > void > -UpdateDecodingStats(LogicalDecodingContext *ctx) > +UpdateDecodingStats(ReorderBuffer *rb) > > I don't think you need to change this interface because > reorderbuffer->private_data points to LogicalDecodingContext. See > StartupDecodingContext. +1 With this approach, we could still miss the totalTxns and totalBytes updates if decoding of a transaction that is large, but still below logical_decoding_work_mem, is interrupted, right? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Replication slot stats misgivings
On Wed, Apr 28, 2021 at 9:37 AM Masahiko Sawada wrote: > > On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila wrote: > > > > On Tue, Apr 27, 2021 at 11:02 AM vignesh C wrote: > > > > > > On Tue, Apr 27, 2021 at 9:48 AM vignesh C wrote: > > > > > > > > > > Attached patch has the changes to update statistics during > > > spill/stream which prevents the statistics from being lost during > > > interrupt. > > > > > > > void > > -UpdateDecodingStats(LogicalDecodingContext *ctx) > > +UpdateDecodingStats(ReorderBuffer *rb) > > > > I don't think you need to change this interface because > > reorderbuffer->private_data points to LogicalDecodingContext. See > > StartupDecodingContext. > > +1 > > With this approach, we could still miss the totalTxns and totalBytes > updates if the decoding a large but less than > logical_decoding_work_mem is interrupted, right? Yes you are right, I felt that is reasonable and that way it reduces frequent calls to the stats collector to update the stats. Regards, Vignesh
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Tom Lane has raised a complaint on pgsql-committers [1] about one of the commits related to this work [2]. The new buildfarm member wrasse is showing the warning:

"/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/reorderbuffer.c", line 2510: Warning: Likely null pointer dereference (*(curtxn+272)): ReorderBufferProcessTXN

The warning is for the line:

curtxn->concurrent_abort = true;

Now, we can simply fix this warning by adding an if check like:

if (curtxn)
    curtxn->concurrent_abort = true;

However, on further discussion, it seems that is not sufficient here because the callbacks can throw the surrounding error code (ERRCODE_TRANSACTION_ROLLBACK) for a completely different scenario, in which case we would set the concurrent_abort flag incorrectly. I think here we need a stronger check to ensure that we set the concurrent_abort flag, and do the other things in that branch, only when we are decoding non-committed xacts. The idea I have is to additionally check that we are decoding a streaming or prepared transaction (the same check as we have for setting curtxn), or we can check whether CheckXidAlive is a valid transaction id. What do you think?

[1] - https://www.postgresql.org/message-id/2752962.1619568098%40sss.pgh.pa.us
[2] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7259736a6e5b7c7588fff9578370736a6648acbb

-- With Regards, Amit Kapila.
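As a rough sketch only (not the actual fix; errdata, curtxn and ReThrowError are taken from the existing error-handling code in ReorderBufferProcessTXN as I understand it), the stronger check could look something like:

/*
 * Treat the error as a concurrent abort only while decoding an
 * in-progress (non-committed) transaction, i.e. when CheckXidAlive
 * is set; otherwise just re-throw it.
 */
if (errdata->sqlerrcode == ERRCODE_TRANSACTION_ROLLBACK &&
    TransactionIdIsValid(CheckXidAlive))
{
    if (curtxn)
        curtxn->concurrent_abort = true;
}
else
{
    /* some other error: re-throw */
    ReThrowError(errdata);
}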
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Wed, Apr 28, 2021 at 11:00 AM Amit Kapila wrote: > > Tom Lane has raised a complaint on pgsql-commiters [1] about one of > the commits related to this work [2]. The new member wrasse is showing > Warning: > > "/export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/reorderbuffer.c", > line 2510: Warning: Likely null pointer dereference (*(curtxn+272)): > ReorderBufferProcessTXN > > The Warning is for line: > curtxn->concurrent_abort = true; > > Now, we can simply fix this warning by adding an if check like: > if (curtxn) > curtxn->concurrent_abort = true; > > However, on further discussion, it seems that is not sufficient here > because the callbacks can throw the surrounding error code > (ERRCODE_TRANSACTION_ROLLBACK) where we set concurrent_abort flag for > a completely different scenario. I think here we need a > stronger check to ensure that we set concurrent abort flag and do > other things in that check only when we are decoding non-committed > xacts. That makes sense. > The idea I have is to additionally check that we are decoding > streaming or prepared transaction (the same check as we have for > setting curtxn) or we can check if CheckXidAlive is a valid > transaction id. What do you think? I think a check based on CheckXidAlive looks good to me. This will protect us if a similar error is raised from any other path, as you mentioned above. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch
On Wed, Apr 28, 2021 at 3:56 AM Tom Lane wrote: > Of course Wikipedia has been known to contain errors, but now > I'm inclined to think I blew this. Anyone want to check my work? I tried a couple of examples not from Wikipedia. First, from the definition of Julian days as used by astronomers[1], counting from noon on 4713-01-01 BC Julian AKA 4714-11-24 BC Gregorian, days 0 and 1 look right with 'utc+12':

postgres=# select extract(julian from '4714-11-24 11:00:00+00 BC'::timestamptz at time zone 'utc+12');
ERROR:  timestamp out of range
postgres=# select extract(julian from '4714-11-24 12:00:00+00 BC'::timestamptz at time zone 'utc+12');
 extract
---------
 0.
(1 row)

postgres=# select extract(julian from '4714-11-25 11:00:00+00 BC'::timestamptz at time zone 'utc+12');
 extract
---------
 0.9583
(1 row)

postgres=# select extract(julian from '4714-11-25 12:00:00+00 BC'::timestamptz at time zone 'utc+12');
 extract
---------
 1.
(1 row)

Next I found a worked example in an aerospace textbook[2] and it agrees, too:

postgres=# select extract(julian from '2004-05-12 14:45:30+00'::timestamptz at time zone 'utc+12');
     extract
------------------
 2453138.11493056
(1 row)

[1] http://curious.astro.cornell.edu/people-and-astronomy/125-observational-astronomy/timekeeping/calendars/763-how-was-the-starting-point-for-the-julian-date-system-chosen-advanced
[2] https://www.sciencedirect.com/topics/engineering/julian-day-number
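As an extra arithmetic cross-check (my own, not from the message above): assuming the conventional value 2453137.5 for the Julian Day at 2004-05-12 00:00 UTC, adding the elapsed fraction of that day reproduces the textbook number:

select 2453137.5 + extract(epoch from interval '14:45:30') / 86400.0 as julian_day;
-- 53130 seconds past midnight is about 0.6149306 of a day, giving
-- roughly 2453138.1149306, matching the extract(julian ...) value above.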
pg_hba.conf.sample wording improvement
I propose the attached patch to shake up the wording in the connection type section of pg_hba.conf.sample a bit. After the hostgssenc part was added on, the whole thing became a bit wordy, and it's also a bit inaccurate, for example in that the current wording for "host" appears to say that it does not apply to GSS-encrypted connections.

From dc64f4826c4dbf3bcd1cdadb1e9f351ce45f9074 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 28 Apr 2021 07:46:48 +0200
Subject: [PATCH] pg_hba.conf.sample: Reword connection type section

---
 src/backend/libpq/pg_hba.conf.sample | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/libpq/pg_hba.conf.sample b/src/backend/libpq/pg_hba.conf.sample
index b6de12b298..ead092ffab 100644
--- a/src/backend/libpq/pg_hba.conf.sample
+++ b/src/backend/libpq/pg_hba.conf.sample
@@ -18,12 +18,13 @@
 #
 # (The uppercase items must be replaced by actual values.)
 #
-# The first field is the connection type: "local" is a Unix-domain
-# socket, "host" is either a plain or SSL-encrypted TCP/IP socket,
-# "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a
-# non-SSL TCP/IP socket.  Similarly, "hostgssenc" uses a
-# GSSAPI-encrypted TCP/IP socket, while "hostnogssenc" uses a
-# non-GSSAPI socket.
+# The first field is the connection type:
+# - "local" is a Unix-domain socket
+# - "host" is a TCP/IP socket (encrypted or not)
+# - "hostssl" is an SSL-encrypted TCP/IP socket
+# - "hostnossl" is a non-SSL TCP/IP socket
+# - "hostgssenc" is a GSSAPI-encrypted TCP/IP socket
+# - "hostnogssenc" is a non-GSSAPI-encrypted TCP/IP socket
 #
 # DATABASE can be "all", "sameuser", "samerole", "replication", a
 # database name, or a comma-separated list thereof. The "all"
--
2.31.1
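As a usage illustration (not part of the patch), pg_hba.conf entries would use these connection types like this; the databases, users, addresses, and auth methods below are example values only:

# TYPE        DATABASE  USER  ADDRESS        METHOD
local         all       all                  peer
host          all       all   127.0.0.1/32   scram-sha-256
hostssl       all       all   0.0.0.0/0      scram-sha-256
hostnossl     all       all   10.0.0.0/8     reject
hostgssenc    all       all   10.0.0.0/8     gss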
Re: Replication slot stats misgivings
On Wed, Apr 28, 2021 at 9:37 AM Masahiko Sawada wrote: > > On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila wrote: > > > > On Tue, Apr 27, 2021 at 11:02 AM vignesh C wrote: > > > > > > On Tue, Apr 27, 2021 at 9:48 AM vignesh C wrote: > > > > > > > > > > Attached patch has the changes to update statistics during > > > spill/stream which prevents the statistics from being lost during > > > interrupt. > > > > > > > void > > -UpdateDecodingStats(LogicalDecodingContext *ctx) > > +UpdateDecodingStats(ReorderBuffer *rb) > > > > I don't think you need to change this interface because > > reorderbuffer->private_data points to LogicalDecodingContext. See > > StartupDecodingContext. > > +1 > > With this approach, we could still miss the totalTxns and totalBytes > updates if the decoding a large but less than > logical_decoding_work_mem is interrupted, right? > Right, but is there some simple way to avoid that? I see two possibilities: (a) store the stats in ReplicationSlot and send them at ReplicationSlotRelease, but that will lead to an increase in shared memory usage and, as per the discussion above, we don't want that; or (b) send intermediate stats after decoding, say, every N changes, but for that we need to additionally compute the size of each change, which might add some overhead. I am not sure whether either of these alternatives is a good idea. What do you think? Do you have any other ideas for this? -- With Regards, Amit Kapila.
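To make alternative (b) a bit more concrete, here is a very rough sketch (purely illustrative; the helper name, counter, and threshold are all invented, and this is not proposed code):

#define STATS_FLUSH_EVERY_N_CHANGES 1000

static void
MaybeFlushDecodingStats(ReorderBuffer *rb)
{
    static int  changes_since_flush = 0;

    /*
     * Flush the accumulated decoding statistics every N-th change processed,
     * so that an interrupted decode of a transaction that never reaches
     * logical_decoding_work_mem still reports something.
     */
    if (++changes_since_flush >= STATS_FLUSH_EVERY_N_CHANGES)
    {
        UpdateDecodingStats(rb->private_data);
        changes_since_flush = 0;
    }
}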
Re: pg_hba.conf.sample wording improvement
On Wed, 2021-04-28 at 07:51 +0200, Peter Eisentraut wrote: > I propose the attached patch to shake up the wording in the connection > type section of pg_hba.conf.sample a bit. After the hostgssenc part was > added on, the whole thing became a bit wordy, and it's also a bit > inaccurate for example in that the current wording for "host" appears to > say that it does not apply to GSS-encrypted connections. +1 Thanks for taking care of things like that. Yours, Laurenz Albe
RE: [BUG] "FailedAssertion" reported when streaming in logical replication
> I have modified the patch based on the above comments. Thanks for your patch. I tested again after applying it, and the problem is fixed. Regards, Tang