Re: Todo: Teach planner to evaluate multiple windows in the optimal order
On Wed, 15 Feb 2023 at 17:23, John Naylor wrote:
> HEAD:
>
> 4 ^ 8: latency average = 113.976 ms
> 5 ^ 8: latency average = 783.830 ms
> 6 ^ 8: latency average = 3990.351 ms
> 7 ^ 8: latency average = 15793.629 ms
>
> Skip rechecking first key:
>
> 4 ^ 8: latency average = 107.028 ms
> 5 ^ 8: latency average = 732.327 ms
> 6 ^ 8: latency average = 3709.882 ms
> 7 ^ 8: latency average = 14570.651 ms

Thanks for testing that. It's good to see improvements in each of them.

> I gather that planner hack was just a demonstration, so I didn't test it, but
> if that was a move toward something larger I can run additional tests.

Yeah, just a hack. My intention with it was just to prove we had a problem because sometimes Sort -> Incremental Sort was faster than Sort. Ideally, with your change, we'd see that it's always faster to do the full sort in one go. It would be good to see your patch with and without the planner hack patch to ensure sort is now always faster than sort -> incremental sort.

David
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
I wrote:

> it might be worthwhile to "zoom in" with more measurements, but haven't done that yet.

I've attached the script and image for 1 million / random / varying the mod by quarter-log intervals. Unfortunately I didn't get as good results as yesterday. Immediately going from mod 1 to mod 2, sort pushdown regresses sharply and stays regressed up until 1. The tiebreaker patch helps but never removes the regression.

I suspect that I fat-fingered work_mem yesterday, so next I'll pick a badly-performing mod like 32, then range over work_mem and see if that explains anything, especially whether L3 effects are in fact more important in this workload.

--
John Naylor
EDB: http://www.enterprisedb.com

bench_windowsort-jcn-random-finegrained.sh Description: application/shellscript
Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c
Hi,

When I refer to the GUC "max_locks_per_transaction", I find that the description of the shared lock table size in pg-doc[1] is inconsistent with the code (guc_table.c). BTW, the GUC "max_predicate_locks_per_xact" has similar problems. I think the descriptions in pg-doc are correct.

- GUC "max_locks_per_transaction"
  Please refer to the macro "NLOCKENTS" used to obtain max_table_size in the function InitLocks.
- GUC "max_predicate_locks_per_xact"
  Please refer to the macro "NPREDICATELOCKTARGETENTS" used to obtain max_table_size in the function InitPredicateLocks.

Attached is a patch to fix the descriptions of these two GUCs in guc_table.c.

[1] - https://www.postgresql.org/docs/devel/runtime-config-locks.html

Regards,
Wang Wei

v1-0001-Fix-the-description-of-GUC-max_locks_per_trans.patch Description: v1-0001-Fix-the-description-of-GUC-max_locks_per_trans.patch
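[Editor's note] For readers following along, the sizing logic the report points at reduces to these two macros. This is quoted from memory of recent sources, so treat the exact spelling as approximate rather than authoritative:

	/* src/include/storage/lock.h: shared lock table sizing */
	#define NLOCKENTS() \
		mul_size(max_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))

	/* src/backend/storage/lmgr/predicate.c: predicate lock table sizing */
	#define NPREDICATELOCKTARGETENTS() \
		mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))

That is, both tables are sized as the GUC multiplied by (MaxBackends + max_prepared_xacts), which matches the pg-doc wording that the report says the guc_table.c one-line descriptions should follow.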
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
Hi, On 2/15/23 1:56 AM, Kyotaro Horiguchi wrote: At Tue, 14 Feb 2023 15:43:26 +0100, "Drouvot, Bertrand" wrote in The comment assumed that the function directly returns an entry from shared memory, but now it copies the entry's contents into a palloc'ed memory and stores the sums of some counters for the current transaction in it. Do you think we should update the comment to reflect this change? Good point, thanks! Yeah, definitively, done in V5 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index f793ac1516..b26e2a5a7a 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid) * Find any existing PgStat_TableStatus entry for rel_id in the current * database. If not found, try finding from shared tables. * + * If an entry is found, copy it and increment the copy's counters with their + * subtransactions counterparts. Then return the copy. There is no need for the + * caller to pfree the copy as the MemoryContext will be reset soon after. + * * If no entry found, return NULL, don't create a new one */ PgStat_TableStatus * find_tabstat_entry(Oid rel_id) { PgStat_EntryRef *entry_ref; + PgStat_TableXactStatus *trans; + PgStat_TableStatus *tabentry = NULL; + PgStat_TableStatus *tablestatus = NULL; entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id); if (!entry_ref) + { entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id); + if(!entry_ref) + return tablestatus; + } + + tabentry = (PgStat_TableStatus *) entry_ref->pending; + tablestatus = palloc(sizeof(PgStat_TableStatus)); + *tablestatus = *tabentry; + + /* +* Live subtransactions' counts aren't in t_counts yet. This is not a hot +* code path so it sounds ok to reconcile for tuples_inserted, +* tuples_updated and tuples_deleted even if this is not what the caller +* is interested in. 
+*/ + for (trans = tabentry->trans; trans != NULL; trans = trans->upper) + { + tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted; + tablestatus->t_counts.t_tuples_updated += trans->tuples_updated; + tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted; + } - if (entry_ref) - return entry_ref->pending; - return NULL; + return tablestatus; } /* diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 9d707c3521..405d080daa 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1548,17 +1548,11 @@ pg_stat_get_xact_tuples_inserted(PG_FUNCTION_ARGS) Oid relid = PG_GETARG_OID(0); int64 result; PgStat_TableStatus *tabentry; - PgStat_TableXactStatus *trans; if ((tabentry = find_tabstat_entry(relid)) == NULL) result = 0; else - { - result = tabentry->t_counts.t_tuples_inserted; - /* live subtransactions' counts aren't in t_tuples_inserted yet */ - for (trans = tabentry->trans; trans != NULL; trans = trans->upper) - result += trans->tuples_inserted; - } + result = (int64) (tabentry->t_counts.t_tuples_inserted); PG_RETURN_INT64(result); } @@ -1569,17 +1563,11 @@ pg_stat_get_xact_tuples_updated(PG_FUNCTION_ARGS) Oid relid = PG_GETARG_OID(0); int64 result; PgStat_TableStatus *tabentry; - PgStat_TableXactStatus *trans; if ((tabentry = find_tabstat_entry(relid)) == NULL) result = 0; else - { - result = tabentry->t_counts.t_tuples_updated; - /* live subtransactions' counts aren't in t_tuples_updated yet */ - for (trans = tabentry->trans; trans != NULL; trans = trans->upper) - result += trans->tuples_updated; - } + result = (int64) (tabentry->t_counts.t_tuples_updated); PG_RETURN_INT64(result); } @@ -1590,17 +1578,11 @@ pg_stat_get_xact_tuples_deleted(PG_FUNCTION_ARGS) Oid relid = PG_GETARG_OID(0); int64 result; PgStat_TableStatus *tabentry; - PgStat_TableXactStatus *trans; if ((tabentry = find_tabstat_entry(relid)) == NULL) result = 0; else - { - result = tabentry->t_counts.t_tuples_deleted; - /* live subtransactio
Re: Support logical replication of DDLs
On 2023-Feb-15, Peter Smith wrote: > On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian wrote: > > > > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith wrote: > > > 3. ExecuteGrantStmt > > > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > > + istmt.grantor_uid = grantor; > > > > > > SUGGESTION (comment) > > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > > > didn't change this, as Alvaro said this was not a parsetree. > > Perhaps there is more to do here? Sorry, I did not understand the > details of Alvaro's post [1], but I did not recognize the difference > between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression > either one or both of these places are either wrongly commented, or > maybe are doing something that should not be done. These two cases are different. In ExecGrantStmt we're adding the grantor OID to the InternalGrant struct, which is not a parse node, so there's no strong reason not to modify it (and also the suggested comment change is factually wrong). I don't know if the idea is great, but at least I see no strong objection. In the other case, as I said in [1], the patch proposes to edit the parse node to add the grantor, but I think a better idea might be to change the signature to ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider) so that the function can set the provider there; and caller passes &secondaryObject, which is the method we adopted for this kind of thing. [1] https://postgr.es/m/20230213090752.27ftbb6byiw3qcbl@alvherre.pgsql -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Use of additional index columns in rows filtering
Hi All,

I'd like to report what seems to be a missing optimization opportunity, or understand why it is not possible to achieve.

TLDR; an additional index column B specified in CREATE INDEX .. (A) INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 ORDER BY A during an IndexScan. https://dbfiddle.uk/iehtq44L

Take the following database:

CREATE TABLE t(
  a integer NOT NULL,
  b integer NOT NULL,
  d integer NOT NULL
);

CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
-- I'd expect the index above to behave as the index below for the purpose
-- of this query
-- CREATE UNIQUE INDEX ON t(a,b);

INSERT INTO t(
  SELECT random() * 1 as a, random() * 3 as b, generate_series as d
  FROM generate_series(1,20)
) ON CONFLICT DO NOTHING;

If we filter on `a` and `b` columns while scanning an index created as `(a) INCLUDE (b)`, it seems to be fetching tuples from the heap to check the condition `b = 4` despite both columns being available in the index:

SELECT * FROM t WHERE a > 100 and b = 4 ORDER BY a ASC LIMIT 10;

Here is the plan (notice the high "shared hit"):

Limit (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
  Output: a, b, d
  Buffers: shared hit=198307
  -> Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
       Output: a, b, d
       Index Cond: (t.a > 100)
       Filter: (t.b = 4)
       Rows Removed by Filter: 197805
       Buffers: shared hit=198307
Planning:
  Buffers: shared hit=30
Planning Time: 0.201 ms
Execution Time: 84.303 ms

And here is the plan with the index on (a,b):

Limit (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
  Output: a, b, d
  Buffers: shared hit=613
  -> Index Scan using t_a_b_idx on public.t (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
       Output: a, b, d
       Index Cond: ((t.a > 100) AND (t.b = 4))
       Buffers: shared hit=613
Planning:
  Buffers: shared hit=41
Planning Time: 0.314 ms
Execution Time: 6.910 ms

Because the query doesn't sort on `b`, only filters on it while sorting on `a`, I'd expect the indexes `(a) INCLUDE (b)` and `(a,b)` to behave exactly the same with this particular query.

Interestingly, IndexOnlyScan is capable of using additional columns to filter rows without fetching them from the heap, but only for visible tuples:

VACUUM FREEZE t;
SELECT a,b FROM t WHERE a > 100 and b = 4 ORDER BY a ASC LIMIT 10;

Limit (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
  Output: a, b
  Buffers: shared hit=662
  -> Index Only Scan using t_a_include_b on public.t (cost=0.42..6619.76 rows=1 width=8) (actual time=18.477..18.477 rows=0 loops=1)
       Output: a, b
       Index Cond: (t.a > 100)
       Filter: (t.b = 4)
       Rows Removed by Filter: 197771
       Heap Fetches: 0
       Buffers: shared hit=662

Removing VACUUM makes it behave like IndexScan and fetch candidate tuples from the heap, all while returning zero rows in the result.

To make the query plans comparable I had to force index scans on both with:

SET enable_bitmapscan to off;
SET enable_seqscan to off;
SET max_parallel_workers_per_gather = 0;

A self-contained, fully reproducible example is in https://dbfiddle.uk/iehtq44L

Regards,
Maxim
Re: [PATCH] Add pretty-printed XML output option
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones wrote:
>
> On 15.02.23 02:09, Peter Smith wrote:
> > With v8, in my environment, in psql I see something slightly different:
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR: invalid XML document
> > DETAIL: line 1: switching encoding : no input
> > line 1: Document is empty
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR: invalid XML document
> > LINE 1: SELECT xmlformat('');
> > ^
> > DETAIL: line 1: switching encoding : no input
> > line 1: Document is empty
> >
> > ~~
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '
> > ERROR: relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '
> > ^
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '
> > ERROR: relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '
> > ^
> >
> > ~~
>
> Yes... a cfbot also complained about the same thing.
>
> Setting the VERBOSITY to terse might solve this issue:
>
> postgres=# \set VERBOSITY terse
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
>
> postgres=# \set VERBOSITY default
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
> DETAIL: line 1: switching encoding : no input
> ^
> line 1: Document is empty
> ^
>
> v9 wraps the corner test cases with VERBOSITY terse to reduce the error
> message output.

Bingo!! Your v9 patch now passes all 'make check' tests for me.

But I'll leave it to a committer to decide if this VERBOSITY toggle is the best fix. (I don't understand, maybe someone can explain, how the patch managed to mess up the verbosity of the existing tests.)

--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Feb 15, 2023 at 8:55 AM houzj.f...@fujitsu.com wrote: > > On Wednesday, February 15, 2023 10:34 AM Amit Kapila > wrote: > > > > > > > > > > So names like the below seem correct format: > > > > > > > > a) WAIT_EVENT_LOGICAL_APPLY_SEND_DATA > > > > b) WAIT_EVENT_LOGICAL_LEADER_SEND_DATA > > > > c) WAIT_EVENT_LOGICAL_LEADER_APPLY_SEND_DATA > > > > > > Personally I'm fine even without "LEADER" in the wait event name since > > > we don't have "who is waiting" in it. IIUC a row of pg_stat_activity > > > shows who, and the wait event name shows "what the process is > > > waiting". So I prefer (a). > > > > > > > This logic makes sense to me. So, let's go with (a). > > OK, here is patch that change the event name to > WAIT_EVENT_LOGICAL_APPLY_SEND_DATA. > LGTM. -- With Regards, Amit Kapila.
Re: [PATCH] Add pretty-printed XML output option
On 15.02.23 10:06, Peter Smith wrote:

> Bingo!! Your v9 patch now passes all 'make check' tests for me.

Nice! It also passed all tests in the patch tester.

> But I'll leave it to a committer to decide if this VERBOSITY toggle is the best fix.

I see now other test cases in the xml.sql file that also use this option, so it might be a known "issue".

Shall we now set the patch to "Ready for Committer"?

Thanks a lot for the review!

Best, Jim
Re: Support logical replication of DDLs
On Wed, Feb 15, 2023 at 2:02 PM Alvaro Herrera wrote: > > On 2023-Feb-15, Peter Smith wrote: > > > On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian wrote: > > > > > > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith wrote: > > > > > 3. ExecuteGrantStmt > > > > > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > > > + istmt.grantor_uid = grantor; > > > > > > > > SUGGESTION (comment) > > > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > > > > > didn't change this, as Alvaro said this was not a parsetree. > > > > Perhaps there is more to do here? Sorry, I did not understand the > > details of Alvaro's post [1], but I did not recognize the difference > > between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression > > either one or both of these places are either wrongly commented, or > > maybe are doing something that should not be done. > > These two cases are different. In ExecGrantStmt we're adding the > grantor OID to the InternalGrant struct, which is not a parse node, so > there's no strong reason not to modify it (and also the suggested > comment change is factually wrong). I don't know if the idea is great, > but at least I see no strong objection. > > In the other case, as I said in [1], the patch proposes to edit the > parse node to add the grantor, but I think a better idea might be to > change the signature to > ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider) so that > the function can set the provider there; and caller passes > &secondaryObject, which is the method we adopted for this kind of thing. > +1, that is a better approach to make the required change in ExecSecLabelStmt(). -- With Regards, Amit Kapila.
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On 15.02.2023 10:11, Michael Paquier wrote:

> Which comes down to blame me for both of them.

My only intention was to make postgres better. I'm sorry you took it that way.

> So, please find attached a patch to close the gap in the sample files, for both things, with descriptions of all the field values they can use.

A short and precise description. Nothing to add. Next time I will try to offer a patch at once.

- Pavel Luzanov
Re: [PATCH] Add pretty-printed XML output option
On 2023-Feb-13, Peter Smith wrote:

> Something is misbehaving.
>
> Using the latest HEAD, and before applying the v6 patch, 'make check'
> is working OK.
>
> But after applying the v6 patch, some XML regression tests are failing
> because the DETAIL part of the message indicating the wrong syntax
> position is not getting displayed. Not only for your new tests -- but
> for other XML tests too.

Note that there's another file, xml_2.out, which does not contain the additional part of the DETAIL message. I suspect in Peter's case it's xml_2.out that was originally being used as a comparison basis (not xml.out), but that one is not getting patched, and ultimately the diff is reported for him against xml.out because none of them matches.

An easy way forward might be to manually apply the patch from xml.out to xml_2.out, and edit it by hand to remove the additional lines. See commit 085423e3e326 for a bit of background.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On Wed, Feb 15, 2023 at 01:05:04PM +0300, Pavel Luzanov wrote:

> On 15.02.2023 10:11, Michael Paquier wrote:
>> Which comes down to blame me for both of them.
>
> My only intention was to make postgres better. I'm sorry you took it that way.

You have no need to feel sorry about that. I really appreciate that you took the time to report this issue, so don't worry. My point is that I have committed this code, so basically it is my responsibility to take care of its maintenance.

>> So, please find attached a patch to close the gap in the sample files,
>> for both things, with descriptions of all the field values they can
>> use.
>
> A short and precise description. Nothing to add. Next time I will try to offer a patch at once.

If you have a proposal of a patch, that's always nice to have, but you should not feel obliged to do so, either. Thanks a lot for the report, Pavel!

--
Michael
Re: [PATCH] Add pretty-printed XML output option
On 15.02.23 11:11, Alvaro Herrera wrote:

> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message. I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
>
> An easy way forward might be to manually apply the patch from xml.out
> to xml_2.out, and edit it by hand to remove the additional lines.
>
> See commit 085423e3e326 for a bit of background.

Hi Álvaro,

As my test cases were not specifically about what the error message looks like, I thought that suppressing part of the error messages by setting VERBOSITY terse would suffice. [1] Is this approach not recommended?

Thanks!

Best, Jim

1 - v9 patch: https://www.postgresql.org/message-id/CAHut%2BPtoH8zkBHxv44XyO%2Bo4kW_nZdGhNdVaJ_cpEjrckKDqtw%40mail.gmail.com
Re: Can we do something to help stop users mistakenly using force_parallel_mode?
On Wed, 15 Feb 2023 at 14:10, Andrew Dunstan wrote:

> We'll email them once this is in. Most people are fairly responsive.

I pushed the rename patch earlier. How should we go about making contact with the owners? I'm thinking it may be better coming from you, especially if you think technical details of what exactly should be changed should be included in the email. But I can certainly have a go if you'd rather I did it or you don't have time for this.

Thanks

David
Re: [PATCH] Add pretty-printed XML output option
On 2023-Feb-15, Jim Jones wrote:

> Hi Álvaro,
>
> As my test cases were not specifically about what the error message looks
> like, I thought that suppressing part of the error messages by setting
> VERBOSITY terse would suffice. [1] Is this approach not recommended?

Well, I don't see why we would depart from what we've been doing in the rest of the XML tests. I did see that patch and I thought it was taking the wrong approach.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
Re: run pgindent on a regular basis / scripted manner
> (ITYM "remove as many hurdles as possible"). yes, I messed up rewriting that sentence from "having as few hurdles as possible" to "removing as many hurdles as possible" > So far, we have had the following categories suggested: dirty, staged, > dirty+staged, untracked. Are there any others? The two workflows that make most sense to me personally are: 1. staged (indent anything that you're staging for a commit) 2. dirty+staged+untracked (indent anything you've been working on that is not committed yet) The obvious way of having --dirty, --staged, and --untracked flags would require 3 flags for this second (to me seemingly) common operation. That seems quite unfortunate. So I would propose the following two flags for those purposes: 1. --staged/--cached (--cached is in line with git, but I personally think --staged is clearer, git has --staged-only but that seems long for no reason) 2. --uncommitted And maybe for completeness we could have the following flags, so you could target any combination of staged/untracked/dirty files: 3. --untracked (untracked files only) 4. --dirty (tracked files with changes that are not staged) But I don't know in what workflow people would actually use them. > Another issue is whether or not to restrict these to files under the current > directory. I think we probably should, or at least provide a --relative > option. Good point, I think it makes sense to restrict it to the current directory by default. You can always cd to the root of the repo if you want to format everything.
RE: Time delayed LR (WAS Re: logical replication restrictions)
Dear Andres and other hackers,

> OTOH, if we want to implement the functionality on publisher-side,
> I think we need to first consider the interface.
> We can think of two options (a) Have it as a subscription parameter as the patch
> has now and then pass it as an option to the publisher which it will use to delay;
> (b) Have it defined on publisher-side, say via GUC or some other way.
> The basic idea could be that while processing commit record (in DecodeCommit),
> we can somehow check the value of delay and then use it there to delay sending
> the xact.
> Also, during delay, we need to somehow send the keepalive and process replies,
> probably via a new callback or by some existing callback.
> We also need to handle in-progress and 2PC xacts in a similar way.
> For the former, probably we would need to apply the delay before sending the
> first stream.
> Could you please share what you feel on this direction as well?

I implemented a patch where the delaying is done on the publisher side. In this patch, approach (a) was chosen: min_apply_delay is specified as a subscription parameter, and the apply worker then passes it to the publisher as an output plugin option.

During the delay, the walsender periodically checks and processes replies from the apply worker, and sends keepalive messages if needed. Therefore, the ability to handle keepalives is not lost.

To delay the transaction in the output plugin layer, a new LogicalOutputPlugin API was added. For now, I chose the output plugin layer, but we can consider doing it from the core if there is a better way.

Could you please share your opinion?

Note: thanks to Osumi-san for helping with the implementation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

0001-Time-delayed-logical-replication-on-publisher-side.patch Description: 0001-Time-delayed-logical-replication-on-publisher-side.patch
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
On Wed, Feb 15, 2023 at 3:02 PM David Rowley wrote:
>
> On Wed, 15 Feb 2023 at 17:23, John Naylor wrote:
> > HEAD:
> >
> > 4 ^ 8: latency average = 113.976 ms
> > 5 ^ 8: latency average = 783.830 ms
> > 6 ^ 8: latency average = 3990.351 ms
> > 7 ^ 8: latency average = 15793.629 ms
> >
> > Skip rechecking first key:
> >
> > 4 ^ 8: latency average = 107.028 ms
> > 5 ^ 8: latency average = 732.327 ms
> > 6 ^ 8: latency average = 3709.882 ms
> > 7 ^ 8: latency average = 14570.651 ms

> Yeah, just a hack. My intention with it was just to prove we had a
> problem because sometimes Sort -> Incremental Sort was faster than
> Sort. Ideally, with your change, we'd see that it's always faster to
> do the full sort in one go. It would be good to see your patch with
> and without the planner hack patch to ensure sort is now always faster
> than sort -> incremental sort.

Okay, here's a rerun including the sort hack, and it looks like incremental sort is only ahead with the smallest set, otherwise same or maybe slightly slower:

HEAD:

4 ^ 8: latency average = 113.461 ms
5 ^ 8: latency average = 786.080 ms
6 ^ 8: latency average = 3948.388 ms
7 ^ 8: latency average = 15733.348 ms

tiebreaker:

4 ^ 8: latency average = 106.556 ms
5 ^ 8: latency average = 734.834 ms
6 ^ 8: latency average = 3640.507 ms
7 ^ 8: latency average = 14470.199 ms

tiebreaker + incr sort hack:

4 ^ 8: latency average = 93.998 ms
5 ^ 8: latency average = 740.120 ms
6 ^ 8: latency average = 3715.942 ms
7 ^ 8: latency average = 14749.323 ms

And as far as this:

> I suspect that I fat-fingered work_mem yesterday, so next I'll pick a
> badly-performing mod like 32, then range over work_mem and see if that
> explains anything, especially whether L3 effects are in fact more
> important in this workload.

Attached is a script and image for fixing the input at random / mod32 and varying work_mem. There is not a whole lot of variation here: pushdown regresses and the tiebreaker patch only helped marginally. I'm still not sure why the results from yesterday looked better than today.

--
John Naylor
EDB: http://www.enterprisedb.com

bench_windowsort_workmem-20230215.sh Description: application/shellscript
Re: Refactoring postgres_fdw/connection.c
On Tue, Jan 31, 2023 at 3:44 PM Fujii Masao wrote: > On 2023/01/29 19:31, Etsuro Fujita wrote: > > I agree that if the name of an existing function was bad, we should > > rename it, but I do not think the name pgfdw_get_cleanup_result is > > bad; I think it is good in the sense that it well represents what the > > function waits for. > > > > The patch you proposed changes pgfdw_get_cleanup_result to cover the > > timed_out==NULL case, but I do not think it is a good idea to rename > > it for such a minor reason. That function is used in all supported > > versions, so that would just make back-patching hard. > > As far as I understand, the function name pgfdw_get_cleanup_result is > used because it's used to get the result during abort cleanup as > the comment tells. OTOH new function is used even not during abort clean, > e.g., pgfdw_get_result() calls that new function. So I don't think that > pgfdw_get_cleanup_result is good name in some places. Yeah, I agree on that point. > If you want to leave pgfdw_get_cleanup_result for the existing uses, > we can leave that and redefine it so that it just calls the workhorse > function pgfdw_get_result_timed. +1; that's actually what I proposed upthread. :-) BTW the name "pgfdw_get_result_timed" is a bit confusing to me, because the new function works *without* a timeout condition. We usually append the suffix "_internal", so how about "pgfdw_get_result_internal", to avoid that confusion? > > Yeah, this is intentional; in commit 04e706d42, I coded this to match > > the behavior in the non-parallel-commit mode, which does not call > > CHECK_FOR_INTERRUPT. > > > >> But could you tell me why? > > > > My concern about doing so is that WaitLatchOrSocket is rather > > expensive, so it might lead to useless overhead in most cases. > > pgfdw_get_result() and pgfdw_get_cleanup_result() already call > WaitLatchOrSocket() and CHECK_FOR_INTERRUPTS(). That is, during > commit phase, they are currently called when receiving the result > of COMMIT TRANSACTION command from remote server. Why do we need > to worry about their overhead only when executing DEALLOCATE ALL? DEALLOCATE ALL is a light operation and is issued immediately after executing COMMIT TRANSACTION successfully, so I thought that in most cases it too would be likely to be executed successfully and quickly; there would be less need to do so for DEALLOCATE ALL. > > Anyway, this changes the behavior, so you should show the evidence > > that this is useful. I think this would be beyond refactoring, > > though. > > Isn't it useful to react the interrupts (e.g., shutdown requests) > soon even during waiting for the result of DEALLOCATE ALL? That might be useful, but another concern about this is error handling. The existing code (both in parallel commit and non-parallel commit) ignores any kinds of errors in libpq as well as interrupts when doing DEALLOCATE ALL, and then commits the local transaction, making the remote/local transaction states consistent, but IIUC the patch aborts the local transaction when doing the command, e.g., if WaitLatchOrSocket detected some kind of error in socket access, making the transaction states *inconsistent*, which I don't think would be great. So I'm still not sure this would be acceptable. Best regards, Etsuro Fujita
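[Editor's note] To make the naming idea discussed above concrete, here is a minimal sketch in C of the wrapper arrangement: keep pgfdw_get_cleanup_result() for its existing abort-cleanup callers and make it a thin wrapper over a common workhorse that also handles the no-timeout case. The signatures below are assumptions for illustration only; they do not reproduce the actual patch or postgres_fdw's current prototypes:

	static PGresult *pgfdw_get_result_internal(PGconn *conn, TimestampTz endtime,
											   bool *timed_out);

	/* existing name kept for the abort-cleanup callers, now a thin wrapper */
	static PGresult *
	pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime, bool *timed_out)
	{
		return pgfdw_get_result_internal(conn, endtime, timed_out);
	}

	/* non-cleanup callers wait indefinitely: timed_out == NULL in this sketch */
	static PGresult *
	pgfdw_get_result(PGconn *conn)
	{
		return pgfdw_get_result_internal(conn, 0, NULL);
	}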
Re: [PATCH] Add pretty-printed XML output option
On 15.02.23 12:11, Alvaro Herrera wrote: Well, I don't see why we would depart from what we've been doing in the rest of the XML tests. I did see that patch and I thought it was taking the wrong approach. Fair point. v10 patches the xml.out to xml_2.out - manually removing the additional lines. Thanks for the review! Best, Jim From 835c9ec18255adce9f9ae1e1e5d9e4287bac5452 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Thu, 2 Feb 2023 21:27:16 +0100 Subject: [PATCH v10] Add pretty-printed XML output option This small patch introduces a XML pretty print function. It basically takes advantage of the indentation feature of xmlDocDumpFormatMemory from libxml2 to format XML strings. --- doc/src/sgml/func.sgml | 34 + src/backend/utils/adt/xml.c | 45 src/include/catalog/pg_proc.dat | 3 + src/test/regress/expected/xml.out | 108 src/test/regress/expected/xml_1.out | 57 +++ src/test/regress/expected/xml_2.out | 105 +++ src/test/regress/sql/xml.sql| 32 + 7 files changed, 384 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e09e289a43..a621192425 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14861,6 +14861,40 @@ SELECT xmltable.* ]]> + + +xmlformat + + + xmlformat + + + +xmlformat ( xml ) xml + + + + Converts the given XML value to pretty-printed, indented text. + + + + Example: + + + + diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 079bcb1208..ec12707b5c 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf) } #endif +Datum +xmlformat(PG_FUNCTION_ARGS) +{ +#ifdef USE_LIBXML + + xmlDocPtr doc; + xmlChar*xmlbuf = NULL; + text *arg = PG_GETARG_TEXT_PP(0); + StringInfoData buf; + int nbytes; + + doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); + + if(!doc) + elog(ERROR, "could not parse the given XML document"); + + /** + * xmlDocDumpFormatMemory ( + * xmlDocPtr doc, # the XML document + * xmlChar **xmlbuf, # the memory pointer + * int *nbytes, # the memory length + * int format # 1 = node indenting + *) + */ + + xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); + + xmlFreeDoc(doc); + + if(!nbytes) + elog(ERROR, "could not indent the given XML document"); + + initStringInfo(&buf); + appendStringInfoString(&buf, (const char *)xmlbuf); + + xmlFree(xmlbuf); + + PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); + +#else + NO_XML_SUPPORT(); +return 0; +#endif +} + Datum xmlcomment(PG_FUNCTION_ARGS) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c0f2a8a77c..54e8a6262a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8842,6 +8842,9 @@ { oid => '3053', descr => 'determine if a string is well formed XML content', proname => 'xml_is_well_formed_content', prorettype => 'bool', proargtypes => 'text', prosrc => 'xml_is_well_formed_content' }, +{ oid => '4642', descr => 'Indented text from xml', + proname => 'xmlformat', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlformat' }, # json { oid => '321', descr => 'I/O', diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 3c357a9c7e..8bc8919092 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -1599,3 +1599,111 @@ SELECT * FROM XMLTABLE('.' 
PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH |(1 row) +-- XML format: single line XML string +SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650'); +xmlformat +-- + + + + + Belgian Waffles+ + $5.95+ + Two of our famous Belgian Waffles with plenty of real maple syrup+ + 650+ + + ++ + +(1 row) + +-- XML format: XML string with space, tabs and newline between nodes +SELECT xmlformat(' Belgian Waffles $15.95 + Two of our famous Belgian Waffles with plenty of real maple syrup +650
Re: Minimal logical decoding on standbys
On Thu, Jan 12, 2023 at 11:46 PM Andres Freund wrote:
>
> Hi,
>
> On 2023-01-12 20:08:55 +0530, Ashutosh Sharma wrote:
> > I previously participated in the discussion on "Synchronizing the
> > logical replication slots from Primary to Standby" and one of the
> > purposes of that project was to synchronize logical slots from primary
> > to standby so that if failover occurs, it will not affect the logical
> > subscribers of the old primary much. Can someone help me understand
> > how we are going to solve this problem with this patch? Are we going
> > to encourage users to do LR from standby instead of primary to get rid
> > of such problems during failover?
>
> It only provides a building block towards that. The "Synchronizing the logical
> replication slots from Primary to Standby" project IMO needs all of the
> infrastructure in this patch. With the patch, a logical rep solution can
> e.g. maintain one slot on the primary and one on the standby, and occasionally
> forward the slot on the standby to the position of the slot on the primary. In
> case of a failover it can just start consuming changes from the former
> standby, all the necessary changes are guaranteed to be present.
>
> > Also, one small observation:
> >
> > I just played around with the latest (v38) patch a bit and found that
> > when a new logical subscriber of standby is created, it actually
> > creates two logical replication slots for it on the standby server.
> > May I know the reason for creating an extra replication slot other
> > than the one created by create subscription command? See below:
>
> That's unrelated to this patch. There's no changes to the "higher level"
> logical replication code dealing with pubs and subs, it's all on the "logical
> decoding" level.
>
> I think this is because logical rep wants to be able to concurrently perform
> ongoing replication, and synchronize tables added to the replication set. The
> pg_16399_sync_16392_7187728548042694423 slot should vanish after the initial
> synchronization.

Thanks Andres. I have one more query (both for you and Bertrand). I don't know if this has already been answered somewhere in this mail thread; if yes, please let me know the mail that answers this query.

Will there be a problem if we mandate the use of physical replication slots and hot_standby_feedback to support minimum LD on standby? I know people can do a physical replication setup without a replication slot or even with hot_standby_feedback turned off, but are we going to have any issue if we ask them to use a physical replication slot and turn on hot_standby_feedback for LD on standby? This will reduce the code changes required to do conflict handling for logical slots on standby, which is being done by the v50-0001 and v50-0002* patches currently.

IMHO even in normal scenarios, i.e. when we are not doing LD on standby, we should mandate the use of a physical replication slot.

--
With Regards,
Ashutosh Sharma.
Re: Can we do something to help stop users mistakenly using force_parallel_mode?
On 2023-02-15 We 06:05, David Rowley wrote:

> On Wed, 15 Feb 2023 at 14:10, Andrew Dunstan wrote:
>> We'll email them once this is in. Most people are fairly responsive.
>
> I pushed the rename patch earlier. How should we go about making
> contact with the owners? I'm thinking it may be better coming from
> you, especially if you think technical details of what exactly should
> be changed should be included in the email. But I can certainly have a
> go if you'd rather I did it or you don't have time for this.

Leave it with me.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: pg_statistic MCVs use float4 but extended stats use float8
Hi,

On 2/15/23 02:20, Justin Pryzby wrote:
> It seems odd that stats_ext uses double:
>
> postgres=# SELECT attrelid::regclass, attname, atttypid::regtype, relkind
> FROM pg_attribute a JOIN pg_class c ON c.oid=a.attrelid WHERE
> attname='most_common_freqs';
>       attrelid      |      attname      |      atttypid      | relkind
> --------------------+-------------------+--------------------+---------
>  pg_stats           | most_common_freqs | real[]             | v
>  pg_stats_ext       | most_common_freqs | double precision[] | v
>  pg_stats_ext_exprs | most_common_freqs | real[]             | v
>
> I'm not sure if that's deliberate?

Not really, I'm not sure why I chose float8 and not float4. Likely a case of muscle memory on 64-bit systems.

I wonder if there are practical reasons to change this, i.e. if the float8 can have adverse effects on some systems. Yes, it makes the stats a little bit larger, but I doubt the difference is significant enough to make a difference. Perhaps on 32-bit systems it's worse, because float8 is going to be pass-by-ref there ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Use of additional index columns in rows filtering
On 2/15/23 09:57, Maxim Ivanov wrote: > Hi All, > > I'd like to report what seems to be a missing optimization > opportunity or understand why it is not possible to achieve. > > TLDR; additional index column B specified in CREATE INDEX .. (A) > INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 > ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L > > ... > > Here is the plan (notice high "shared hit"): > > Limit (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 > rows=0 loops=1) > Output: a, b, d > Buffers: shared hit=198307 > -> Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 > rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1) > Output: a, b, d > Index Cond: (t.a > 100) > Filter: (t.b = 4) > Rows Removed by Filter: 197805 > Buffers: shared hit=198307 > Planning: > Buffers: shared hit=30 > Planning Time: 0.201 ms > Execution Time: 84.303 ms > Yeah. The reason for this behavior is pretty simple: 1) When matching clauses to indexes in match_clause_to_index(), we only look at key columns (nkeycolumns). We'd need to check all columns (ncolumns) and remember if the clause matched a key or included one. 2) index_getnext_slot would need to get "candidate" TIDs using conditions on keys, and then check the clauses on included columns. Seems doable, unless I'm missing some fatal issue. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
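[Editor's note] As a rough illustration of change (1) above, the clause-matching loop could walk all index columns and remember which kind matched, along these lines. This is a sketch only: the includefilters field is hypothetical, and since INCLUDE columns carry no opclass/opfamily, the real code could not reuse match_clause_to_indexcol() for them as-is and would need a separate matching path:

	/* hypothetical variant of the clause-matching loop in indxpath.c */
	for (indexcol = 0; indexcol < index->ncolumns; indexcol++)
	{
		IndexClause *iclause = match_clause_to_indexcol(root, rinfo,
														indexcol, index);

		if (iclause == NULL)
			continue;

		if (indexcol < index->nkeycolumns)
		{
			/* key column: usable as a regular index qual */
			clauseset->indexclauses[indexcol] =
				lappend(clauseset->indexclauses[indexcol], iclause);
		}
		else
		{
			/* INCLUDE column: only usable as a filter applied to index tuples */
			clauseset->includefilters =		/* hypothetical new field */
				lappend(clauseset->includefilters, iclause);
		}
		clauseset->nonempty = true;
		break;
	}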
Re: [PATCH] Add pretty-printed XML output option
Accidentally left the VERBOSE settings out -- sorry! Now it matches the approach used in a xpath test in xml.sql, xml.out, xml_1.out and xml_2.out -- Since different libxml versions emit slightly different -- error messages, we suppress the DETAIL in this test. \set VERBOSITY terse SELECT xpath('/*', ''); ERROR: could not parse XML document \set VERBOSITY default v11 now correctly sets xml_2.out. Best, Jim From 473aab0a0028cd4de515c6a3698a1cda1c987067 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Thu, 2 Feb 2023 21:27:16 +0100 Subject: [PATCH v11] Add pretty-printed XML output option This small patch introduces a XML pretty print function. It basically takes advantage of the indentation feature of xmlDocDumpFormatMemory from libxml2 to format XML strings. --- doc/src/sgml/func.sgml | 34 ++ src/backend/utils/adt/xml.c | 45 + src/include/catalog/pg_proc.dat | 3 + src/test/regress/expected/xml.out | 101 src/test/regress/expected/xml_1.out | 53 +++ src/test/regress/expected/xml_2.out | 101 src/test/regress/sql/xml.sql| 33 + 7 files changed, 370 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e09e289a43..a621192425 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14861,6 +14861,40 @@ SELECT xmltable.* ]]> + + +xmlformat + + + xmlformat + + + +xmlformat ( xml ) xml + + + + Converts the given XML value to pretty-printed, indented text. + + + + Example: + + + + diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 079bcb1208..ec12707b5c 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf) } #endif +Datum +xmlformat(PG_FUNCTION_ARGS) +{ +#ifdef USE_LIBXML + + xmlDocPtr doc; + xmlChar*xmlbuf = NULL; + text *arg = PG_GETARG_TEXT_PP(0); + StringInfoData buf; + int nbytes; + + doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); + + if(!doc) + elog(ERROR, "could not parse the given XML document"); + + /** + * xmlDocDumpFormatMemory ( + * xmlDocPtr doc, # the XML document + * xmlChar **xmlbuf, # the memory pointer + * int *nbytes, # the memory length + * int format # 1 = node indenting + *) + */ + + xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); + + xmlFreeDoc(doc); + + if(!nbytes) + elog(ERROR, "could not indent the given XML document"); + + initStringInfo(&buf); + appendStringInfoString(&buf, (const char *)xmlbuf); + + xmlFree(xmlbuf); + + PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); + +#else + NO_XML_SUPPORT(); +return 0; +#endif +} + Datum xmlcomment(PG_FUNCTION_ARGS) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c0f2a8a77c..54e8a6262a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8842,6 +8842,9 @@ { oid => '3053', descr => 'determine if a string is well formed XML content', proname => 'xml_is_well_formed_content', prorettype => 'bool', proargtypes => 'text', prosrc => 'xml_is_well_formed_content' }, +{ oid => '4642', descr => 'Indented text from xml', + proname => 'xmlformat', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlformat' }, # json { oid => '321', descr => 'I/O', diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 3c357a9c7e..e45116aaa7 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' 
PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH |(1 row) +-- XML format: single line XML string +SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650'); +xmlformat +-- + + + + + Belgian Waffles+ + $5.95+ + Two of our famous Belgian Waffles with plenty of real maple syrup+ + 650+ + + ++ + +(1 row) + +-- XML format: XML string with space, tabs and newline between nodes +SELECT xmlformat(' Belgian Waffles $15.95 + Tw
RE: Add LZ4 compression in pg_dump
On Fri, Jan 27, 2023 2:04 AM gkokola...@pm.me wrote: > > --- Original Message --- > On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier > wrote: > > > > > > > > On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote: > > > > > I gave this a little bit of thought. I think that ReadHead should not > > > emit a warning, or at least not this warning as it is slightly misleading. > > > It implies that it will automatically turn off data restoration, which is > > > false. Further ahead, the code will fail with a conflicting error message > > > stating that the compression is not available. > > > > > > Instead, it would be cleaner both for the user and the maintainer to > > > move the check in RestoreArchive and make it the sole responsible for > > > this logic. > > > > > > - pg_fatal("cannot restore from compressed archive (compression not > supported in this installation)"); > > + pg_fatal("cannot restore data from compressed archive (compression not > supported in this installation)"); > > Hmm. I don't mind changing this part as you suggest. > > > > -#ifndef HAVE_LIBZ > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > - pg_fatal("archive is compressed, but this installation does not support > compression"); > > -#endif > > However I think that we'd better keep the warning, as it can offer a > > hint when using pg_restore -l not built with compression support if > > looking at a dump that has been compressed. > > Fair enough. Please find v27 attached. > Hi, I am interested in this feature and tried the patch. While reading the comments, I noticed some minor things that could possibly be improved (in v27-0003 patch). 1. + /* +* Open a file for writing. +* +* 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already +* initialized CompressFileHandle. +*/ + int (*open_write_func) (const char *path, const char *mode, + CompressFileHandle *CFH); There is a redundant single quote in front of 'w'. 2. /* * Callback function for WriteDataToArchive. Writes one block of (compressed) * data to the archive. */ /* * Callback function for ReadDataFromArchive. To keep things simple, we * always read one compressed block at a time. */ Should the function names in the comments be updated? WriteDataToArchive -> writeData ReadDataFromArchive -> readData 3. + Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0); Could we use PG_BINARY_R instead of "r" and "rb" here? Regards, Shi Yu
Re: Minimal logical decoding on standbys
Hi,

On 2/15/23 1:32 PM, Ashutosh Sharma wrote:
> Thanks Andres. I have one more query (both for you and Bertrand). I don't
> know if this has already been answered somewhere in this mail thread; if
> yes, please let me know the mail that answers this query. Will there be a
> problem if we mandate the use of physical replication slots and
> hot_standby_feedback to support minimum LD on standby?

I don't think we have to make it mandatory. There are use cases where it's not needed, as mentioned by Andres up-thread [1] (see the comment "The patch deals with this...").

> I know people can do a physical replication setup without a replication
> slot or even with hot_standby_feedback turned off, but are we going to
> have any issue if we ask them to use a physical replication slot and turn
> on hot_standby_feedback for LD on standby? This will reduce the code
> changes required to do conflict handling for logical slots on standby
> which is being done by the v50-0001 and v50-0002* patches currently.

But on the other hand we'd need to ensure that the primary physical slot and HSF set to on on the standby are always in place. That would probably lead to extra code too.

I'm -1 on that, but +1 on the fact that it should be documented (as done in 0006).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: https://www.postgresql.org/message-id/20211028210755.afmwcvpo6ajwdx6n%40alap3.anarazel.de
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On Wed, 15 Feb 2023 at 08:11, Michael Paquier wrote:

> Hmm, I am not sure that adding more examples in the sample files is
> worth the duplication with the docs.

I think you misunderstood what I meant (because I admittedly didn't write it down clearly). I meant that the docs for pg_ident don't include any examples (only descriptions of the new patterns). Attached is a patch that addresses that.

> So, please find attached a patch to close the gap in the sample files,
> for both things, with descriptions of all the field values they can
> use.

LGTM

0001-Add-example-of-new-pg_ident-username-paterns-to-docs.patch Description: Binary data
Re: [PATCH] Add pretty-printed XML output option
Alvaro Herrera writes: > Note that there's another file, xml_2.out, which does not contain the > additional part of the DETAIL message. I suspect in Peter's case it's > xml_2.out that was originally being used as a comparison basis (not > xml.out), but that one is not getting patched, and ultimately the diff > is reported for him against xml.out because none of them matches. > See commit 085423e3e326 for a bit of background. Yeah. That's kind of sad, because it means there are still broken libxml2s out there in 2023. Nonetheless, since there are, it is not optional to fix all three expected-files. regards, tom lane
Silent overflow of interval type
Hello hackers,

I've been testing various edge cases of timestamptz and related types and noticed that, despite being a 16-byte wide type, interval overflows for some timestamptz (8-byte) subtractions (timestamp_mi). A simple example of this would be:

select timestamptz'294276-12-31 23:59:59 UTC' - timestamptz'1582-10-15 00:00:00 UTC';

Yielding:

interval'-106599615 days -08:01:50.551616'

This makes sense from the implementation point of view, since both timestamptz and Interval->TimeOffset are int64. The patch attached simply throws an error when an overflow is detected. However, I'm not sure this is a reasonable approach for a code path that could be very hot in some workloads.

Another consideration is that regardless of the values of the timestamps, the absolute value of the difference can be stored in a uint64. However, that observation has little practical value.

That being said, I'm willing to work on a fix that makes sense and make it commit-ready (or step aside if someone else wants to take over), but I'd also understand if this is marked as "not worth fixing".

Regards,
Nick

From ba326276f79cf847fd1127d0edaf7d7f99a18c8e Mon Sep 17 00:00:00 2001
From: Nick Babadzhanian
Date: Wed, 15 Feb 2023 08:42:00 +0100
Subject: [PATCH] Fix silent interval overflow

---
 src/backend/utils/adt/timestamp.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 47e059a409..88fb4160cb 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2713,7 +2713,11 @@ timestamp_mi(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
 				 errmsg("cannot subtract infinite timestamps")));
 
-	result->time = dt1 - dt2;
+	/* Subtract dt1 and dt2 with overflow detection */
+	if (unlikely(pg_sub_s64_overflow(dt1, dt2, &result->time)))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERVAL_FIELD_OVERFLOW),
+				 errmsg("interval field out of range")));
 
 	result->month = 0;
 	result->day = 0;
-- 
2.39.1
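[Editor's note] To make the arithmetic concrete: interval's time field can hold at most 2^63 - 1 microseconds, about 106,751,991 days, while the two timestamps above are roughly 106.9 million days apart, so the int64 subtraction wraps. Below is a standalone sketch of the check the patch adds. The microsecond constants are approximations of the two timestamp values (assumptions, for illustration only); PostgreSQL's pg_sub_s64_overflow() in src/include/common/int.h reduces to the same compiler builtin on GCC and Clang:

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		/* PostgreSQL timestamps are int64 microseconds since 2000-01-01 */
		int64_t		dt1 = INT64_C(9223371331199999999);	/* ~294276-12-31, just under INT64_MAX */
		int64_t		dt2 = INT64_C(-13165344000000000);	/* ~1582-10-15, before the epoch */
		int64_t		diff;

		if (__builtin_sub_overflow(dt1, dt2, &diff))
			printf("overflow detected\n");	/* taken: the true span exceeds INT64_MAX */
		else
			printf("diff = %lld us\n", (long long) diff);
		return 0;
	}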
Re: Use of additional index columns in rows filtering
Tomas Vondra writes: > On 2/15/23 09:57, Maxim Ivanov wrote: >> TLDR; additional index column B specified in CREATE INDEX .. (A) >> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 >> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L > Seems doable, unless I'm missing some fatal issue. Partly this is lack of round tuits, but there's another significant issue: there very likely are index entries corresponding to dead heap rows. Applying random user-defined quals to values found in such rows could produce semantic anomalies; for example, divide-by-zero failures even though you deleted all the rows having a zero in that column. This isn't a problem for operators found in operator families, because we trust those to not have undesirable side effects like raising data-dependent errors. But it'd be an issue if we started to apply quals that aren't index quals directly to index rows before doing the heap liveness check. (And, of course, once you've fetched the heap row there's no point in having a special path for columns available from the index.) regards, tom lane
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Wed, Feb 15, 2023 at 04:49:38AM +, Ryo Matsumura (Fujitsu) wrote:
> Hi, hackers.
>
> I found that CREATE DATABASE causes DDL results to be lost after a server crash.
> The direct cause is that the checkpoint skips syncing pages of template1's main fork
> because the buffer status is not marked as BM_PERMANENT in BufferAlloc().

I had some trouble reproducing this when running the commands by hand. But it
reproduces fine like this:

$ ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data&
sleep 2;
psql -h /tmp postgres -c "DROP DATABASE IF EXISTS j" -c "CREATE DATABASE j STRATEGY wal_log" &&
psql -h /tmp template1 -c "CREATE TABLE t(i int)" -c "INSERT INTO t SELECT generate_series(1,9)" -c CHECKPOINT;
kill -9 %1; wait;
./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data&
sleep 9;
psql -h /tmp template1 -c "table t";
kill %1

[1] 29069
2023-02-15 10:10:27.584 CST postmaster[29069] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit
2023-02-15 10:10:27.584 CST postmaster[29069] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 10:10:27.663 CST postmaster[29069] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-02-15 10:10:27.728 CST startup[29074] LOG: database system was shut down at 2023-02-15 10:10:13 CST
2023-02-15 10:10:27.780 CST postmaster[29069] LOG: database system is ready to accept connections
NOTICE: database "j" does not exist, skipping
DROP DATABASE
CREATE DATABASE
CREATE TABLE
INSERT 0 9
2023-02-15 10:10:30.160 CST checkpointer[29072] LOG: checkpoint starting: immediate force wait
2023-02-15 10:10:30.740 CST checkpointer[29072] LOG: checkpoint complete: wrote 943 buffers (5.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.070 s, sync=0.369 s, total=0.581 s; sync files=268, longest=0.274 s, average=0.002 s; distance=4322 kB, estimate=4322 kB; lsn=0/BA9E8A0, redo lsn=0/BA9E868
CHECKPOINT
[1]+ Killed ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data
[1] 29088
2023-02-15 10:10:31.664 CST postmaster[29088] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit
2023-02-15 10:10:31.665 CST postmaster[29088] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 10:10:31.724 CST postmaster[29088] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-02-15 10:10:31.780 CST startup[29094] LOG: database system was interrupted; last known up at 2023-02-15 10:10:30 CST
2023-02-15 10:10:33.888 CST startup[29094] LOG: database system was not properly shut down; automatic recovery in progress
2023-02-15 10:10:33.934 CST startup[29094] LOG: redo starts at 0/BA9E868
2023-02-15 10:10:33.934 CST startup[29094] LOG: invalid record length at 0/BA9E918: wanted 24, got 0
2023-02-15 10:10:33.934 CST startup[29094] LOG: redo done at 0/BA9E8A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-02-15 10:10:34.073 CST checkpointer[29092] LOG: checkpoint starting: end-of-recovery immediate wait
2023-02-15 10:10:34.275 CST checkpointer[29092] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.035 s, sync=0.026 s, total=0.257 s; sync files=2, longest=0.019 s, average=0.013 s; distance=0 kB, estimate=0 kB; lsn=0/BA9E918, redo lsn=0/BA9E918
2023-02-15 10:10:34.321 CST postmaster[29088] LOG: database system is ready to accept connections
2023-02-15 10:10:39.893 CST client backend[29110] psql ERROR: relation "t" does not exist at character 7
2023-02-15 10:10:39.893 CST client backend[29110] psql STATEMENT: table t
ERROR: relation "t" does not exist
Re: Use of additional index columns in rows filtering
On 2/15/23 16:18, Tom Lane wrote: > Tomas Vondra writes: >> On 2/15/23 09:57, Maxim Ivanov wrote: >>> TLDR; additional index column B specified in CREATE INDEX .. (A) >>> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 >>> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L > >> Seems doable, unless I'm missing some fatal issue. > > Partly this is lack of round tuits, but there's another significant > issue: there very likely are index entries corresponding to dead heap > rows. Applying random user-defined quals to values found in such rows > could produce semantic anomalies; for example, divide-by-zero failures > even though you deleted all the rows having a zero in that column. > > This isn't a problem for operators found in operator families, because > we trust those to not have undesirable side effects like raising > data-dependent errors. But it'd be an issue if we started to apply > quals that aren't index quals directly to index rows before doing > the heap liveness check. (And, of course, once you've fetched the > heap row there's no point in having a special path for columns > available from the index.) Sure, but we can do the same VM check as index-only scan, right? That would save some of the I/O to fetch the heap tuple, as long as the page is all-visible and the filter eliminates the tuples. It makes the costing a bit trickier, because it needs to consider both how many pages are all-visible and selectivity of the condition. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
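To make the idea concrete, here is a minimal sketch of that visibility-map check, modelled on what index-only scans already do in nodeIndexonlyscan.c. VM_ALL_VISIBLE(), xs_heaptid and ExecQual() are existing APIs; "indexfilterqual" and "iss_VMBuffer" are hypothetical fields invented for illustration, not existing executor state:

/*
 * Sketch only: evaluate a non-index qual against index tuple values, but
 * only when the heap page is known all-visible, so values from dead rows
 * can never reach user-defined operators.  Assumes a loop over fetched
 * index tuples, as in an ordinary index scan.
 */
ItemPointer tid = &scandesc->xs_heaptid;

if (VM_ALL_VISIBLE(scandesc->heapRelation,
				   ItemPointerGetBlockNumber(tid),
				   &node->iss_VMBuffer))
{
	/* page has no dead tuples: safe to apply the filter to index values */
	econtext->ecxt_scantuple = slot;	/* slot filled from the index tuple */
	if (!ExecQual(node->indexfilterqual, econtext))
		continue;				/* skip the heap fetch entirely */
}
/* otherwise fall through, fetch the heap row, and filter it afterwards */

As Tomas notes, the costing would then have to weigh the fraction of all-visible pages against the filter's selectivity, since only the all-visible fraction can skip heap fetches.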
Re: Use of additional index columns in rows filtering
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors. But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check. (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Assuming operators are pure and don't have global side effects, is it
possible to ignore any error raised during that check? If the tuple is not
visible, the error shouldn't matter; if it is visible, the error will be
reported anyway by the same routine which does the filtering now (ExecQual?).

If not, then limiting this optimization to built-in ops is something I can
live with :)
Re: We shouldn't signal process groups with SIGQUIT
On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
>> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
>> > Not really related: I do wonder how often we end up self-deadlocking in
>> > quickdie(), due to the ereport() not being reentrant. We'll "fix" it soon
>> > after, due to the postmaster's SIGKILL. Perhaps we should turn on
>> > send_abort_for_kill on CI?
>>
>> +1, this seems like it'd be useful in general. I'm guessing this will
>> require a bit of work on the CI side to generate the backtrace.
>
> They're already generated for all current platforms. It's possible that debug
> info for some system libraries is missing, but the most important ones (like
> libc) should be available.
>
> Since yesterday the cfbot website allows to easily find the coredumps, too:
> http://cfbot.cputube.org/highlights/core-7.html

Oh, that's nifty. Any reason not to enable send_abort_for_crash, too?

> There's definitely some work to be done to make the contents of the backtraces
> more useful though. E.g. printing out the program name, the current directory
> (although that doesn't seem to be doable for all programs). For everything but
> Windows that's in src/tools/ci/cores_backtrace.sh.

Got it, thanks for the info.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: some namespace.c refactoring
On 2023-Feb-15, Tom Lane wrote:
> Peter Eisentraut writes:
> > Here are two patches that refactor the mostly repetitive "${object} is
> > visible" and get_${object}_oid() functions in namespace.c. This uses
> > the functions in objectaddress.c to look up the appropriate per-catalog
> > system caches and attribute numbers, similar to other refactoring
> > patches I have posted recently.
>
> This does not look like a simple refactoring patch to me. I have
> very serious concerns first about whether it even preserves the
> existing semantics, and second about whether there is a performance
> penalty.

I suppose there are two possible questionable angles from a performance POV:

1. The new code uses get_object_property_data() when previously there was a
straight dereference after casting to the right struct type. So how expensive
is that? I think the answer to that is "not great", because it does a linear
array scan on ObjectProperty. Maybe we need a better answer.

2. Other accesses to the data are done using SysCacheGetAttr instead of
straight struct access dereferences. I expect that most of the fields being
accessed have attcacheoff set, which allows pretty fast access to the field
in question, so it's not *that* bad. (For cases where attcacheoff is not set,
the original code would also have to deform the tuple.) Still, it's going to
have a nontrivial impact in any microbenchmarking.

That said, I think most of this code is invoked for DDL, where performance is
not so critical; probably just fixing get_object_property_data to not be so
naïve would suffice. Queries are another matter. I can't think of a way to
determine which of these paths are used for queries, so that we can optimize
more (IOW: just plain not rewrite those); manually going through the callers
seems a bit laborious, but perhaps doable.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)
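For what it's worth, the cheapest cure for that naïvety might be memoizing the last hit, since DDL tends to look up the same class repeatedly. A sketch, assuming the existing ObjectPropertyType/ObjectProperty structures in objectaddress.c (field names approximate; the caching itself is the suggested addition):

static const ObjectPropertyType *
get_object_property_data(Oid class_id)
{
	static const ObjectPropertyType *prop_last = NULL;
	int			index;

	/*
	 * Remembering the last match removes most of the linear-scan cost
	 * without adding a new data structure.  (A binary search or a hash
	 * keyed by class OID would be the heavier alternatives.)
	 */
	if (prop_last && prop_last->class_oid == class_id)
		return prop_last;

	for (index = 0; index < lengthof(ObjectProperty); index++)
	{
		if (ObjectProperty[index].class_oid == class_id)
		{
			prop_last = &ObjectProperty[index];
			return prop_last;
		}
	}

	elog(ERROR, "unrecognized class ID: %u", class_id);
	return NULL;				/* keep compiler quiet */
}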
Re: We shouldn't signal process groups with SIGQUIT
Hi,

On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> > On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
> >> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> >> > Not really related: I do wonder how often we end up self-deadlocking in
> >> > quickdie(), due to the ereport() not being reentrant. We'll "fix" it soon
> >> > after, due to the postmaster's SIGKILL. Perhaps we should turn on
> >> > send_abort_for_kill on CI?
> >>
> >> +1, this seems like it'd be useful in general. I'm guessing this will
> >> require a bit of work on the CI side to generate the backtrace.
> >
> > They're already generated for all current platforms. It's possible that debug
> > info for some system libraries is missing, but the most important ones (like
> > libc) should be available.
> >
> > Since yesterday the cfbot website allows to easily find the coredumps, too:
> > http://cfbot.cputube.org/highlights/core-7.html
>
> Oh, that's nifty. Any reason not to enable send_abort_for_crash, too?

I think it'd be too noisy. Right now you get just a core dump of the crashed
process, but if we set send_abort_for_crash we'd end up with a lot of core
dumps, making it harder to know what to look at.

We should never need the send_abort_for_kill path, so I don't think the noise
issue applies to the same degree.

Greetings,

Andres
Re: Improve logging when using Huge Pages
On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: >> On Mon, Feb 13, 2023 at 05:22:45PM -0600, Justin Pryzby wrote: >> nitpick: Does this need to be initialized here? > > None of the GUCs' C vars need to be initialized, since the guc machinery > will do it. > > ...but the convention is that they *are* initialized - and that's now > partially enforced. > > See: > d9d873bac67047cfacc9f5ef96ee488f2cb0f1c3 > 7d25958453a60337bcb7bcc986e270792c007ea4 > a73952b795632b2cf5acada8476e7cf75857e9be I see. This looked a little strange to me because many of the other variables are uninitialized. In a73952b, I see that we allow the variables for string GUCs to be initialized to NULL. Anyway, this is only a nitpick. I don't feel strongly about it. >> I'm curious why you chose to make this a string instead of an enum. There >> might be little practical difference, but since there are only three >> possible values, I wonder if it'd be better form to make it an enum. > > It takes more code to write as an enum - see 002.txt. I'm not convinced > this is better. > > But your comment made me fix its , and reconsider the strings, > which I changed to active={unknown/true/false} rather than {unk/on/off}. > It could also be active={unknown/yes/no}... I think unknown/true/false is fine. I'm okay with using a string if no one else thinks it should be an enum (or a bool). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
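For comparison, the enum-GUC flavor being discussed would follow the usual guc_tables.c pattern. The sketch below uses invented names (the actual patch may spell the variable, category, and options differently):

/* possible states, with "unknown" as the initial value */
typedef enum
{
	HUGE_PAGES_ACTIVE_UNKNOWN,
	HUGE_PAGES_ACTIVE_FALSE,
	HUGE_PAGES_ACTIVE_TRUE,
} HugePagesActiveType;

static const struct config_enum_entry huge_pages_active_options[] = {
	{"unknown", HUGE_PAGES_ACTIVE_UNKNOWN, false},
	{"false", HUGE_PAGES_ACTIVE_FALSE, false},
	{"true", HUGE_PAGES_ACTIVE_TRUE, false},
	{NULL, 0, false}
};

static int	huge_pages_active = HUGE_PAGES_ACTIVE_UNKNOWN;

/* hypothetical entry for the ConfigureNamesEnum[] table in guc_tables.c */
{
	{"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS,
		gettext_noop("Indicates whether huge pages are in use."),
		NULL,
		GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
	},
	&huge_pages_active,
	HUGE_PAGES_ACTIVE_UNKNOWN, huge_pages_active_options,
	NULL, NULL, NULL
},

The trade-off is as Justin says: the enum needs a little more boilerplate, but it gives SHOW a fixed vocabulary for free.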
Re: We shouldn't signal process groups with SIGQUIT
On Wed, Feb 15, 2023 at 10:12:58AM -0800, Andres Freund wrote: > On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote: >> Oh, that's nifty. Any reason not to enable send_abort_for_crash, too? > > I think it'd be too noisy. Right now you get just a core dump of the crashed > process, but if we set send_abort_for_crash we'd end up with a lot of core > dumps, making it harder to know what to look at. > > We should never need the send_abort_for_kill path, so I don't think the noise > issue applies to the same degree. Makes sense. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi,

On 2023-02-15 18:02:11 +0530, Ashutosh Sharma wrote:
> Thanks Andres. I have one more query (both for you and Bertrand). I
> don't know if this has already been answered somewhere in this mail
> thread; if yes, please let me know the mail that answers this query.
>
> Will there be a problem if we mandate the use of physical replication
> slots and hot_standby_feedback to support minimum LD on standby? I
> know people can do a physical replication setup without a replication
> slot or even with hot_standby_feedback turned off, but are we going to
> have any issue if we ask them to use a physical replication slot and
> turn on hot_standby_feedback for LD on standby? This will reduce the
> code changes required to do conflict handling for logical slots on
> standby, which is being done by the v50-0001 and v50-0002* patches
> currently.

I don't think it would. E.g. while restoring from archives we can't rely on
knowing that the slot still exists on the primary.

We can't just do corrupt things, including potentially crashing, when the
configuration is wrong. We can't ensure that the configuration is accurate
all the time. So we need to detect this case. Hence needing to detect
conflicts.

> IMHO even in normal scenarios i.e. when we are not doing LD on
> standby, we should mandate the use of a physical replication slot.

I don't think that's going to fly. There are plenty of scenarios where you
don't want to use a slot, e.g. when you want to limit space use on the
primary.

Greetings,

Andres Freund
Re: recovery modules
On Wed, Feb 15, 2023 at 03:38:21PM +0900, Michael Paquier wrote: > On Mon, Feb 13, 2023 at 05:02:37PM -0800, Nathan Bossart wrote: >> Sorry for then noise, cfbot alerted me to a missing #include, which I've >> added in v13. Sorry for more noise. I noticed that I missed updating the IDENTIFICATION line for shell_archive.c. That's the only change in v14. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 2f32fdd5cfd89600025000c1ddfee30b6f9103b4 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 9 Feb 2023 13:49:46 -0800 Subject: [PATCH v14 1/1] restructure archive modules API --- contrib/basic_archive/basic_archive.c | 85 --- doc/src/sgml/archive-modules.sgml | 35 ++-- src/backend/Makefile | 2 +- src/backend/archive/Makefile | 18 src/backend/archive/meson.build | 5 ++ .../{postmaster => archive}/shell_archive.c | 32 --- src/backend/meson.build | 1 + src/backend/postmaster/Makefile | 1 - src/backend/postmaster/meson.build| 1 - src/backend/postmaster/pgarch.c | 27 +++--- src/backend/utils/misc/guc_tables.c | 1 + src/include/archive/archive_module.h | 58 + src/include/archive/shell_archive.h | 24 ++ src/include/postmaster/pgarch.h | 39 - 14 files changed, 242 insertions(+), 87 deletions(-) create mode 100644 src/backend/archive/Makefile create mode 100644 src/backend/archive/meson.build rename src/backend/{postmaster => archive}/shell_archive.c (77%) create mode 100644 src/include/archive/archive_module.h create mode 100644 src/include/archive/shell_archive.h diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index 36b7a4814a..d7c227a10b 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -30,9 +30,9 @@ #include #include +#include "archive/archive_module.h" #include "common/int.h" #include "miscadmin.h" -#include "postmaster/pgarch.h" #include "storage/copydir.h" #include "storage/fd.h" #include "utils/guc.h" @@ -40,14 +40,27 @@ PG_MODULE_MAGIC; +typedef struct BasicArchiveData +{ + MemoryContext context; +} BasicArchiveData; + static char *archive_directory = NULL; -static MemoryContext basic_archive_context; -static bool basic_archive_configured(void); -static bool basic_archive_file(const char *file, const char *path); +static void basic_archive_startup(ArchiveModuleState *state); +static bool basic_archive_configured(ArchiveModuleState *state); +static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path); static void basic_archive_file_internal(const char *file, const char *path); static bool check_archive_directory(char **newval, void **extra, GucSource source); static bool compare_files(const char *file1, const char *file2); +static void basic_archive_shutdown(ArchiveModuleState *state); + +static const ArchiveModuleCallbacks basic_archive_callbacks = { + .startup_cb = basic_archive_startup, + .check_configured_cb = basic_archive_configured, + .archive_file_cb = basic_archive_file, + .shutdown_cb = basic_archive_shutdown +}; /* * _PG_init @@ -67,10 +80,6 @@ _PG_init(void) check_archive_directory, NULL, NULL); MarkGUCPrefixReserved("basic_archive"); - - basic_archive_context = AllocSetContextCreate(TopMemoryContext, - "basic_archive", - ALLOCSET_DEFAULT_SIZES); } /* @@ -78,11 +87,28 @@ _PG_init(void) * * Returns the module's archiving callbacks. */ +const ArchiveModuleCallbacks * +_PG_archive_module_init(void) +{ + return &basic_archive_callbacks; +} + +/* + * basic_archive_startup + * + * Creates the module's memory context. 
+ */ void -_PG_archive_module_init(ArchiveModuleCallbacks *cb) +basic_archive_startup(ArchiveModuleState *state) { - cb->check_configured_cb = basic_archive_configured; - cb->archive_file_cb = basic_archive_file; + BasicArchiveData *data; + + data = (BasicArchiveData *) MemoryContextAllocZero(TopMemoryContext, + sizeof(BasicArchiveData)); + data->context = AllocSetContextCreate(TopMemoryContext, + "basic_archive", + ALLOCSET_DEFAULT_SIZES); + state->private_data = (void *) data; } /* @@ -133,7 +159,7 @@ check_archive_directory(char **newval, void **extra, GucSource source) * Checks that archive_directory is not blank. */ static bool -basic_archive_configured(void) +basic_archive_configured(ArchiveModuleState *state) { return archive_directory != NULL && archive_directory[0] != '\0'; } @@ -144,10 +170,12 @@ basic_archive_configured(void) * Archives one file. */ static bool -basic_archive_file(const char *file, const char *path) +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path) { sigjmp_buf local_sigjmp_buf; Mem
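To see the restructured API from the consumer side, here is a minimal hypothetical module skeleton (the "my_*" names are invented); that the startup and shutdown callbacks may be left NULL is my reading of the patch, not a documented guarantee:

#include "postgres.h"

#include "archive/archive_module.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static bool
my_check_configured(ArchiveModuleState *state)
{
	/* e.g. verify a required GUC was set; unconditionally ready here */
	return true;
}

static bool
my_archive_file(ArchiveModuleState *state, const char *file, const char *path)
{
	/* durably copy "path" (the completed WAL segment) somewhere, then... */
	return true;
}

static const ArchiveModuleCallbacks my_callbacks = {
	.startup_cb = NULL,			/* optional, per my reading of the patch */
	.check_configured_cb = my_check_configured,
	.archive_file_cb = my_archive_file,
	.shutdown_cb = NULL			/* likewise optional */
};

const ArchiveModuleCallbacks *
_PG_archive_module_init(void)
{
	return &my_callbacks;
}

Note the design change visible in the diff: instead of filling in a caller-supplied struct, the module now returns a pointer to its own const callback table, and per-backend state lives in ArchiveModuleState's private_data rather than in file-level statics.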
Re: run pgindent on a regular basis / scripted manner
On Mon, Feb 13, 2023 at 07:57:25AM -0500, Andrew Dunstan wrote:
>
> On 2023-02-12 Su 15:59, Justin Pryzby wrote:
> > It seems like if pgindent knows about git, it ought to process only
> > tracked files. Then, it wouldn't need to manually exclude generated
> > files, and it wouldn't process vpath builds and who-knows-what else it
> > finds in CWD.
>
> I don't really want to restrict this to tracked files because it would mean
> you can't pgindent files before you `git add` them.

I think you'd allow indenting files which were either tracked *or* specified
on the command line.

Also, it makes more sense to "add" the file before indenting it, to allow
checking the output and removing unrelated changes. So that doesn't seem to
me like a restriction of any significance.

But I would never want to indent an untracked file unless I specified it.

-- Justin
Re: Move defaults toward ICU in 16?
On Tue, 2023-02-14 at 16:27 -0500, Jonathan S. Katz wrote:
> Would it be any different than a vulnerability in OpenSSL et al?

In principle, no, but sometimes the details matter. I'm just trying to add
data to the discussion.

> It seems that in general, users would see performance gains switching to ICU.

That's great news, and consistent with my experience. I don't think it should
be a driving factor though. If the choice is between platform-independent
semantics (ICU) and performance, platform-independence should be the default.

> I agree with most of your points in [1]. The platform-consistent
> behavior is a good point, especially with more PG deployments running on
> different systems.

Now I think semantics are the most important driver: being consistent across
platforms and based on some kind of trusted independent organization that we
can point to. It feels very wrong to me to explain that sort order is defined
by the operating system on which Postgres happens to run. Saying that it's
defined by ICU, which is part of the Unicode consortium, is much better. It
doesn't eliminate versioning issues, of course, but I think it's a better
explanation for users. Many users have other systems in their data
infrastructure, running on a variety of platforms, and could (in theory) try
to synchronize around a common ICU version to avoid subtle bugs in their data
pipeline.

> Based on the available data, I think it's OK to move towards ICU as the
> default, or preferred, collation provider. I agree (for now) in not
> taking a hard dependency on ICU.

I count several favorable responses, so I'll take it that we (as a community)
are intending to change the default for build and initdb in v16.

Robert expressed some skepticism[1], though I don't see an objection. If I
read his concerns correctly, he's mainly concerned with quality issues like
documentation, bugs, etc. I understand those concerns (I'm the one that
raised them), but they seem like the kind of issues that one finds any time
they dig into a dependency enough. "Setting our sights very high"[1], to me,
would just be ICU with a bit more rigorous attention to quality issues.

[1] https://www.postgresql.org/message-id/CA%2BTgmoYmeGJaW%3DPy9tAZtrnCP%2B_Q%2BzRQthv%3Dzn_HyA_nqEDM-A%40mail.gmail.com

-- Jeff Davis PostgreSQL Contributor Team - AWS
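For reference, opting in to ICU explicitly has been possible since PostgreSQL 15 via initdb's provider options, so the proposal only changes which provider initdb picks by default:

    initdb -D /path/to/datadir --locale-provider=icu --icu-locale=en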
Re: doc: add missing "id" attributes to extension packaging page
On Tue, 14 Feb 2023 12:13:18 -0800 Andres Freund wrote:
> A small note: As-is this fails on CI, because we don't allow network
> access during the docs build anymore (since it always fails these days):
>
> https://cirrus-ci.com/task/5474029402849280?logs=docs_build#L297
>
> [17:02:03.114] time make -s -j${BUILD_JOBS} -C doc
> [17:02:04.092] I/O error : Attempt to load network entity
> http://cdn.docbook.org/release/xsl/current/html/sections.xsl
> [17:02:04.092] warning: failed to load external entity
> "http://cdn.docbook.org/release/xsl/current/html/sections.xsl"
> [17:02:04.092] compilation error: file stylesheet-html-common.xsl
> line 17 element import
> [17:02:04.092] xsl:import : unable to load
> http://cdn.docbook.org/release/xsl/current/html/sections.xsl

This makes me think that it would be useful to add --nonet to the xsltproc
invocations. That would catch this error before it goes to CI.

> I think this is just due to the common URL in docbook packages being
> http://docbook.sourceforge.net/release/xsl/current/*
> Because of that the docbook catalog matching logic won't work for
> that file:
>
> E.g. I have the following in /etc/xml/docbook-xsl.xml, on Debian unstable:
> uriStartString="http://docbook.sourceforge.net/release/xsl/"
> catalog="file:///usr/share/xml/docbook/stylesheet/docbook-xsl/catalog.xml"/>
>
> As all our other references use the sourceforge address, this should too.

Agreed.

I'm also noticing that the existing xsl:import-s all import entire docbook
stylesheets. It does not hurt to do this; the output is unaffected, although
I can't say what it means for build performance. It does keep it simple:
only one import is needed no matter which templates we use the import
mechanism to extend. And by importing "everything" there's no concern about
any (unlikely) changes to the "internals" of the catalog.

Should we import only what we need or all of docbook? I don't know.

Regards, Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: Silent overflow of interval type
On Wed, Feb 15, 2023 at 7:08 AM Nikolai wrote:
>
> The patch attached simply throws an error when an overflow is
> detected. However I'm not sure this is a reasonable approach for a
> code path that could be very hot in some workloads.

Given the extraordinary amount of overflow checks in the nearby code of
timestamp.c, I'd say that this case should not be an exception.

By chance, did you look at all the other nearby cases? Is it the only place
with overflow? (I took a look too, but haven't found anything suspicious.)

Best regards, Andrey Borodin.
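For the record, the check in question would presumably look like the pattern below. pg_sub_s64_overflow() comes from src/include/common/int.h and, with a reasonable compiler, becomes a single overflow-checked subtraction, which is why the hot-path concern is usually unfounded. That it lands in timestamp_mi()'s subtraction is my reading of the report, not a quote from the patch:

	/* overflow-checked subtraction, in the style timestamp.c already uses */
	if (unlikely(pg_sub_s64_overflow(dt1, dt2, &result->time)))
		ereport(ERROR,
				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				 errmsg("interval out of range")));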
Re: doc: add missing "id" attributes to extension packaging page
Hi,

On 2023-02-15 13:34:37 -0600, Karl O. Pinc wrote:
> This makes me think that it would be useful to add --nonet to the
> xsltproc invocations. That would catch this error before it goes to CI.

We are doing that now :)

commit 969509c3f2e3b4c32dcf264f9d642b5ef01319f3
Author: Tom Lane
Date: 2023-02-08 17:15:23 -0500

    Stop recommending auto-download of DTD files, and indeed disable it.

> I'm also noticing that the existing xsl:import-s all import entire
> docbook stylesheets. It does not hurt to do this; the output is
> unaffected, although I can't say what it means for build performance.
> It does keep it simple: only one import is needed no matter which
> templates we use the import mechanism to extend. And by importing
> "everything" there's no concern about any (unlikely) changes to
> the "internals" of the catalog.
>
> Should we import only what we need or all of docbook? I don't know.

It couldn't hurt to check if performance improves when you avoid doing so. I
suspect it won't make much of a difference, because the time is actually
spent evaluating xslt rather than parsing it.

Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
> Also, it makes more sense to "add" the file before indenting it, to
> allow checking the output and removing unrelated changes. So that doesn't
> seem to me like a restriction of any significance.

For my workflow it would be the same, but afaik there are two ways that
people commonly use git (mine is 1):

1. Adding changes/files to the staging area and then committing those
changes: git add (-p)/emacs magit/some other editor integration

2. Just adding everything that's changed and committing all of it:
git add -A + git commit/git commit -a

For workflow 1, a --staged/--cached flag would be enough IMHO. But that's
not at all helpful for workflow 2. That's why I proposed --uncommitted too,
to make indenting easier for workflow 2.

> But I would never want to indent an untracked file unless I specified it.

Would the --uncommitted flag I proposed be enough of an explicit way of
specifying that you want to indent untracked files?
[PATCH] FIx alloc_var() ndigits thinko
Hi,

I came across another harmless thinko in numeric.c. It is harmless since
20/DEC_DIGITS and 40/DEC_DIGITS happen to be exactly 5 and 10 given
DEC_DIGITS == 4, but it should be fixed anyway for correctness IMO.

- alloc_var(var, 20 / DEC_DIGITS);
+ alloc_var(var, (20 + DEC_DIGITS - 1) / DEC_DIGITS);

- alloc_var(var, 40 / DEC_DIGITS);
+ alloc_var(var, (40 + DEC_DIGITS - 1) / DEC_DIGITS);

/Joel

0001-fix-alloc-var-ndigits-thinko.patch Description: Binary data
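To make the arithmetic concrete, here is a tiny standalone illustration of why the ceiling-division spelling is the correct one even though both forms agree today:

#include <stdio.h>

/* ceil(n / d) for positive integers: the idiom used in the fix */
#define CEIL_DIV(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	/* DEC_DIGITS == 4 today: both forms agree, so the thinko is latent */
	printf("%d %d\n", 20 / 4, CEIL_DIV(20, 4));	/* prints: 5 5 */
	/* if DEC_DIGITS were 3, truncating division would under-allocate */
	printf("%d %d\n", 20 / 3, CEIL_DIV(20, 3));	/* prints: 6 7 */
	return 0;
}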
Re: psql: Add role's membership options to the \du+ command
On 2023-02-10 2:27 p.m., David G. Johnston wrote:
> On Fri, Feb 10, 2023 at 2:08 PM David Zhang wrote:
>> I noticed the document psql-ref.sgml has been updated for both `du+` and
>> `dg+`, but only `du` and `\du+` are covered in the regression test. Is
>> that because `dg+` is treated exactly the same as `du+` from a testing
>> point of view?
> Yes.
>> The reason I am asking this question is that I notice that `pg_monitor`
>> also has the detailed information, so I am not sure if more test cases
>> are required.
> Of course it does. Why does that bother you? And what does that have to
> do with the previous paragraph?

There is a default built-in role `pg_monitor` and the behavior changed after
the patch. If `\dg+` and `\du+` are treated as the same, and `make check`
passes, then I assume there is no test case to verify the output of `\duS+`.
My point is: should we consider adding a test case?

Before the patch, the output for `pg_monitor`:

postgres=# \duS+
                               List of roles
  Role name | Attributes | Member of | Description
------------+------------+-----------+-------------
 alice | | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables} |
 pg_checkpoint | Cannot login | {} |
 pg_database_owner | Cannot login | {} |
 pg_execute_server_program | Cannot login | {} |
 pg_maintain | Cannot login | {} |
 pg_monitor | Cannot login | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables} |
 pg_read_all_data | Cannot login | {} |
 pg_read_all_settings | Cannot login | {} |
 pg_read_all_stats | Cannot login | {} |
 pg_read_server_files | Cannot login | {} |
 pg_signal_backend | Cannot login | {} |
 pg_stat_scan_tables | Cannot login | {} |
 pg_use_reserved_connections | Cannot login | {} |
 pg_write_all_data | Cannot login | {} |
 pg_write_server_files | Cannot login | {} |
 ubuntu | Superuser, Create role, Create DB, Replication, Bypass RLS | {} |

After the patch, the output for `pg_monitor`:

postgres=# \duS+
                               List of roles
  Role name | Attributes | Member of | Description
------------+------------+-----------+-------------
 alice | | pg_read_all_settings WITH ADMIN, INHERIT, SET+|
       | | pg_read_all_stats WITH INHERIT               +|
       | | pg_stat_scan_tables                           |
 pg_checkpoint | Cannot login | |
 pg_database_owner | Cannot login | |
 pg_execute_server_program | Cannot login | |
 pg_maintain | Cannot login | |
 pg_monitor | Cannot login | pg_read_all_settings WITH INHERIT, SET +| |
Re: psql: Add role's membership options to the \du+ command
On Wed, Feb 15, 2023 at 2:31 PM David Zhang wrote:
> There is a default built-in role `pg_monitor` and the behavior changed
> after the patch. If `\dg+` and `\du+` are treated as the same, and `make
> check` passes, then I assume there is no test case to verify the output
> of `\duS+`. My point is: should we consider adding a test case?

I mean, either you accept the change in how this meta-command presents its
information or you don't. I don't see how a test case is particularly
beneficial. Or, at least, the pg_monitor role is not special in this regard.
Alice changed too and you don't seem to be including it in your complaint.

David J.
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On Wed, Feb 15, 2023 at 03:40:26PM +0100, Jelte Fennema wrote:
> On Wed, 15 Feb 2023 at 08:11, Michael Paquier wrote:
>> Hmm, I am not sure that adding more examples in the sample files is
>> worth the duplication with the docs.
>
> I think you misunderstood what I meant (because I admittedly didn't
> write it down clearly). I meant the docs for pg_ident don't include
> any examples (only descriptions of the new patterns). Attached is a
> patch that addresses that.

Shouldn't the paragraph above the example file of pg_ident.conf be updated
to reflect the changes you have added? A cleaner idea would be to split that
into two sections. For example, we could keep the current example with
bryanh, ann and bob as it is (splitting it into its own section), and add a
second example with all the new patterns.

>> So, please find attached a patch to close the gap in the sample files,
>> for both things, with descriptions of all the field values they can use.
>
> LGTM

Thanks for the review; applied this part.

-- Michael

signature.asc Description: PGP signature
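For readers following along, the kind of example being discussed might look like the sketch below; the map and user names are made up, and "all" in the PG-USERNAME column is the new keyword this thread added:

# MAPNAME  SYSTEM-USERNAME  PG-USERNAME
# Hypothetical map: let the OS user "admin" log in as any database user.
admins     admin            all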
Re: psql: Add role's membership options to the \du+ command
On 2023-02-15 1:37 p.m., David G. Johnston wrote:
> On Wed, Feb 15, 2023 at 2:31 PM David Zhang wrote:
>> There is a default built-in role `pg_monitor` and the behavior changed
>> after the patch. If `\dg+` and `\du+` are treated as the same, and `make
>> check` passes, then I assume there is no test case to verify the output
>> of `\duS+`. My point is: should we consider adding a test case?
>
> I mean, either you accept the change in how this meta-command presents
> its information or you don't. I don't see how a test case is particularly
> beneficial. Or, at least, the pg_monitor role is not special in this
> regard. Alice changed too and you don't seem to be including it in your
> complaint.

Good improvement, +1.
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones wrote:
>
> Accidentally left the VERBOSE settings out -- sorry!
>
> Now it matches the approach used in a xpath test in xml.sql, xml.out,
> xml_1.out and xml_2.out
>
> -- Since different libxml versions emit slightly different
> -- error messages, we suppress the DETAIL in this test.
> \set VERBOSITY terse
> SELECT xpath('/*', '');
> ERROR: could not parse XML document
> \set VERBOSITY default
>
> v11 now correctly sets xml_2.out.
>
> Best, Jim

Firstly, sorry: it seems I made a mistake and was premature in calling bingo
above for v9.
- Today I repeated the v9 'make check' and found it still failing.
- The new xmlformat tests are OK, but some pre-existing xmlparse tests are
broken.
- See the attached file pretty-v9-results.

OTOH, the absence of xml_2.out from that patch appears to be the correct
explanation for why my results have been differing.

Today I fetched and tried the latest v11. It is failing too, but only just.
- See the attached file pretty-v11-results.

It looks to be due only to a whitespace/EOF issue in the xml_2.out:

@@ -1679,4 +1679,4 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

--

The attached patch update (v12-0002) fixes the xml_2.out for me.

--
Kind Regards, Peter Smith. Fujitsu Australia

v12-0001-Add-pretty-printed-XML-output-option.patch Description: Binary data
v12-0002-PS-fix-EOF-for-xml_2.out.patch Description: Binary data
pretty-v9-results Description: Binary data
pretty-v11-results Description: Binary data
Re: Perform streaming logical transactions by background workers and parallel apply
LGTM. My only comment is about the commit message.

==
Commit message

d9d7fe6 reuse existing wait event when sending data in apply worker. But we
should have invent a new wait state if we are waiting at a new place, so fix
this.

~

SUGGESTION

d9d7fe6 made use of an existing wait event when sending data from the apply
worker, but we should have invented a new wait state since the code was
waiting at a new place. This patch corrects the mistake by using a new wait
state "LogicalApplySendData".

--
Kind Regards, Peter Smith. Fujitsu Australia
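As a sketch of what the new wait state looks like at the wait point (the enum spelling below is inferred from the suggested name, and the 1-second timeout is purely illustrative):

	/* hypothetical wait loop in the apply worker while sending data */
	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				   1000L,
				   WAIT_EVENT_LOGICAL_APPLY_SEND_DATA);
	if (rc & WL_LATCH_SET)
		ResetLatch(MyLatch);

The point of a dedicated wait event is that pg_stat_activity then reports "LogicalApplySendData" instead of an unrelated event name borrowed from another wait site.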
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Wed, Feb 15, 2023 at 04:49:38AM +, Ryo Matsumura (Fujitsu) wrote:
> The above is caused by the following call.
> The argument 'permanent' of ReadBufferWithoutRelcache() is passed to
> BufferAlloc() as 'relpersistence'.
>
> [src/backend/commands/]
> 298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno,
> 299                                 RBM_NORMAL, bstrategy, false);

Indeed, setting that to true (as per the attached patch) seems to fix this.
I don't see any reason this _shouldn't_ be true from what I've read so far.
We're reading pg_class, which will probably never be UNLOGGED.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..a0259cc593 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
-										RBM_NORMAL, bstrategy, false);
+										RBM_NORMAL, bstrategy, true);
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);
Re: Silent overflow of interval type
Andrey Borodin writes:
> On Wed, Feb 15, 2023 at 7:08 AM Nikolai wrote:
>> The patch attached simply throws an error when an overflow is
>> detected. However I'm not sure this is a reasonable approach for a
>> code path that could be very hot in some workloads.

> Given the extraordinary amount of overflow checks in the nearby code
> of timestamp.c, I'd say that this case should not be an exception.

Yeah, I don't think this would create a performance problem, at least not if
you're using a compiler that implements pg_sub_s64_overflow reasonably. (And
if you're not, and this bugs you, the answer is to get a better compiler.)

> By chance, did you look at all the other nearby cases? Is it the only
> place with overflow?

That was my immediate reaction as well.

regards, tom lane
Re: Normalization of utility queries in pg_stat_statements
On Wed, Feb 08, 2023 at 12:05:24PM +0900, Michael Paquier wrote:
> Thoughts and comments are welcome. 0001 and 0002 are useful on their own
> to keep track of utilities that use Const and A_Const after going through
> the query jumbling, even if an approach based on the query string or the
> automated query jumbling for utilities is used (the query string approach
> loses a bit of its value). I'll add that to the next commit fest.

While wondering about this stuff over the last few days and discussing it
with Bertrand, I have changed my mind on the point that there is no need to
be that aggressive yet with the normalization of the A_Const nodes, because
the query string normalization of pg_stat_statements is not prepared yet to
handle cases where an A_Const value uses a non-quoted value with whitespaces.

The two cases where I saw an impact are the commands that can define an
isolation level: SET TRANSACTION and BEGIN. For example, applying
normalization to A_Const nodes does the following as of HEAD:

1) BEGIN TRANSACTION READ ONLY, READ WRITE, DEFERRABLE, NOT DEFERRABLE;
   BEGIN TRANSACTION $1 ONLY, $2 WRITE, $3, $4 DEFERRABLE

2) SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
   SET TRANSACTION ISOLATION LEVEL $1 COMMITTED

On top of that, specifying a different isolation level may cause these
commands to be grouped, which is not really cool. All that could be done
incrementally later on, in 17~ or later, depending on the adjustments that
make sense.

Attached is an updated patch set. 0003 is basically the same as v3, which I
have kept around for clarity in case one wants to see the effect of A_Const
normalization on all the related commands, though I am not proposing that
for an upstream integration. 0002 has been completed with a couple of
commands to track all the commands with A_Const, so that we never lose sight
of what happens. 0004 is what I think could be done for PG16, where
normalization affects only Const.

At the end of the day, this reflects the following commands that use Const
nodes because they use queries directly, so the same rules as SELECT and
DMLs apply to them:
- DECLARE
- EXPLAIN
- CREATE MATERIALIZED VIEW
- CTAS, SELECT INTO

Comments and thoughts welcome.

Thanks,
-- Michael

From eb61fe10d0d7399573ebb3a911aed4114742cf25 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 7 Feb 2023 14:33:20 +0900
Subject: [PATCH v2 1/4] Refactor regression tests of pg_stat_statements

pg_stat_statements.sql acts as the main file for all the core tests of the
module, but things have become a bit hairy over the years as some of the
sub-scenarios tested rely on assumptions that may have been set in a
completely different block, like a GUC setup or a different relation.

This commit refactors the tests of pg_stat_statements a bit, by moving a
few test cases out of pg_stat_statements.sql into their own file, as of:
- Planning-related tests in planning.sql.
- Utilities in utility.sql.

Test scenarios and their results remain the same as the originals.
--- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/cleanup.out | 1 + .../expected/pg_stat_statements.out | 284 ++ .../pg_stat_statements/expected/planning.out | 195 .../pg_stat_statements/expected/utility.out | 72 + contrib/pg_stat_statements/meson.build| 3 + contrib/pg_stat_statements/sql/cleanup.sql| 1 + .../sql/pg_stat_statements.sql| 118 +--- contrib/pg_stat_statements/sql/planning.sql | 78 + contrib/pg_stat_statements/sql/utility.sql| 34 +++ 10 files changed, 417 insertions(+), 371 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/cleanup.out create mode 100644 contrib/pg_stat_statements/expected/planning.out create mode 100644 contrib/pg_stat_statements/expected/utility.out create mode 100644 contrib/pg_stat_statements/sql/cleanup.sql create mode 100644 contrib/pg_stat_statements/sql/planning.sql create mode 100644 contrib/pg_stat_statements/sql/utility.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index edc40c8bbf..78dc4c1d07 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -17,7 +17,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements" LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf -REGRESS = pg_stat_statements oldextversions +REGRESS = pg_stat_statements utility planning cleanup oldextversions # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", # which typical installcheck users do not have (e.g. buildfarm clients). NO_INSTALLCHECK = 1 diff --git a/contrib/pg_stat_statements/expected/cleanup.out b/contrib/pg_stat_statements/expected/cleanup.out new file mode 100644 index 00..36bec35c40 --- /dev/null +++ b/contrib/pg_stat_statements/expec
Make set_ps_display faster and easier to use
While doing some benchmarking of some fast-to-execute queries, I see that set_ps_display() popping up on the profiles. Looking a little deeper, there are some inefficiencies in there that we could fix. For example, the following is pretty poor: strlcpy(ps_buffer + ps_buffer_fixed_size, activity, ps_buffer_size - ps_buffer_fixed_size); ps_buffer_cur_len = strlen(ps_buffer); We already know the strlen of the fixed-sized part, so why bother doing strlen on the entire thing? Also, if we did just do strlen(activity), we could just memcpy, which would be much faster than strlcpy's byte-at-a-time method of copying. Adjusting that lead me to notice that we often just pass string constants to set_ps_display(), so we already know the strlen for this at compile time. So maybe we can just have set_ps_display_with_len() and then make a static inline wrapper that does strlen() so that when the compiler can figure out the length, it just hard codes it. After doing that, I went over all usages of set_ps_display() to see if any of those call sites knew the length already in a way that the compiler wouldn't be able to deduce. There were a few cases to adjust when setting the process title to contain the command tag. After fixing up the set_ps_display()s to use set_ps_display_with_len() where possible, I discovered some not so nice code which appends " waiting" onto the process title. Basically, there's a bunch of code that looks like this: const char *old_status; int len; old_status = get_ps_display(&len); new_status = (char *) palloc(len + 8 + 1); memcpy(new_status, old_status, len); strcpy(new_status + len, " waiting"); set_ps_display(new_status); new_status[len] = '\0'; /* truncate off " waiting" */ Seeing that made me wonder if we shouldn't just have something more generic for setting a suffix on the process title. I came up with set_ps_display_suffix() and set_ps_display_remove_suffix(). The above code can just become: set_ps_display_suffix("waiting"); then to remove the "waiting" suffix, just: set_ps_display_remove_suffix(); I considered adding a format version to append the suffix as there's one case that could make use of it, but in the end, decided it might be overkill, so I left that code like: char buffer[32]; sprintf(buffer, "waiting for %X/%X", LSN_FORMAT_ARGS(lsn)); set_ps_display_suffix(buffer); I don't think that's terrible enough to warrant making a va_args version of set_ps_display_suffix(), especially for just 1 instance of it. I also resisted making set_ps_display_suffix_with_len(). The new code should be quite a bit faster already without troubling over that additional function. I've attached the patch. David diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index 80d681b71c..889e20b5dd 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -148,8 +148,6 @@ static bool SyncRepQueueIsOrderedByLSN(int mode); void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) { - char *new_status = NULL; - const char *old_status; int mode; /* @@ -216,15 +214,10 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) /* Alter ps display to show waiting for sync rep. */ if (update_process_title) { - int len; - - old_status = get_ps_display(&len); - new_status = (char *) palloc(len + 32 + 1); - memcpy(new_status, old_status, len); - sprintf(new_status + len, " waiting for %X/%X", - LSN_FORMAT_ARGS(lsn)); - set_ps_display(new_status); - new_status[len] = '\0'; /* truncate off " waiting ..." 
*/ + charbuffer[32]; + + sprintf(buffer, "waiting for %X/%X", LSN_FORMAT_ARGS(lsn)); + set_ps_display_suffix(buffer); } /* @@ -322,12 +315,9 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) MyProc->syncRepState = SYNC_REP_NOT_WAITING; MyProc->waitLSN = 0; - if (new_status) - { - /* Reset ps display */ - set_ps_display(new_status); - pfree(new_status); - } + /* reset ps display to remove the suffix */ + if (update_process_title) + set_ps_display_remove_suffix(); } /* diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 2d6dbc6561..98904a7c05 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -4302,8 +4302,8 @@ void LockBufferForCleanup(Buffer buffer) { BufferDesc *bufHdr; - char *new_status = NULL; TimestampTz waitStart = 0; + boolwaiting = false; boollogged_recovery_conflict = false; Assert(BufferIsPinned(buffer)); @@ -4350,11 +4350,11 @@ LockBufferForCleanup(Buffer buffer)
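Pieced together from the description above (a paraphrase of the interface, not a verbatim excerpt of the patch), the wrapper and the suffix API look roughly like this:

/*
 * With a static inline wrapper, strlen() on a string literal folds to a
 * compile-time constant, so most callers pass the length for free.
 */
static inline void
set_ps_display(const char *activity)
{
	set_ps_display_with_len(activity, strlen(activity));
}

/* ...and the old palloc/memcpy/strcpy " waiting" dance collapses to: */
set_ps_display_suffix("waiting");
/* ... block on the lock / LSN / buffer pin here ... */
set_ps_display_remove_suffix();

The memcpy-based copy also beats strlcpy's byte-at-a-time loop because the destination length is already known, so no scan of the source string is needed.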
Bug in processing conditions on multi-column BRIN indexes
Hi,

While working on the BRIN SK_SEARCHARRAY patch I noticed a silly bug in
handling clauses on multi-column BRIN indexes, introduced in PG13.

Consider a simple table with two columns (a,b) and a multi-column BRIN index
on them:

create table t (a int, b int);

insert into t
select mod(i,1) + 100 * random(),
       mod(i,1) + 100 * random()
from generate_series(1,100) s(i);

create index on t using brin (a int4_minmax_ops, b int4_minmax_ops)
  with (pages_per_range=1);

Let's run a query with a condition on "a":

select * from t where a = 500;

                        QUERY PLAN
----------------------------------------------------------------
 Bitmap Heap Scan on t (actual rows=97 loops=1)
   Recheck Cond: (a = 500)
   Rows Removed by Index Recheck: 53189
   Heap Blocks: lossy=236
   ->  Bitmap Index Scan on t_a_b_idx (actual rows=2360 loops=1)
         Index Cond: (a = 500)
 Planning Time: 0.075 ms
 Execution Time: 8.263 ms
(8 rows)

Now let's add another condition on b:

select * from t where a = 500 and b < 800;

                        QUERY PLAN
----------------------------------------------------------------
 Bitmap Heap Scan on t (actual rows=97 loops=1)
   Recheck Cond: ((a = 500) AND (b < 800))
   Rows Removed by Index Recheck: 101101
   Heap Blocks: lossy=448
   ->  Bitmap Index Scan on t_a_b_idx (actual rows=4480 loops=1)
         Index Cond: ((a = 500) AND (b < 800))
 Planning Time: 0.085 ms
 Execution Time: 14.989 ms
(8 rows)

Well, that's wrong. With one condition we accessed 236 pages, and with an
additional condition - which should reduce the number of heap pages - we
accessed 448 pages. The problem is in bringetbitmap(), which failed to
combine the results from the consistent function correctly (and also does
not abort early).

Here's a patch for that; I'll push it shortly after a bit more testing.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

From bbe40c94e2293849d977da4720ef76d13160347a Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Wed, 15 Feb 2023 17:32:53 +0100
Subject: [PATCH 01/10] Fix handling of multi-column BRIN indexes

When evaluating clauses on multiple keys of multi-column BRIN indexes, the
results should be combined using AND, and we can stop evaluating once we
find a mismatching scan key. The existing code was simply scanning a range
as long as the last batch of scan keys returned true.
---
 src/backend/access/brin/brin.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index de1427a1e0..85ae795949 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -660,7 +660,7 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 							PointerGetDatum(bval),
 							PointerGetDatum(keys[attno - 1]),
 							Int32GetDatum(nkeys[attno - 1]));
-					addrange = DatumGetBool(add);
+					addrange &= DatumGetBool(add);
 				}
 				else
 				{
@@ -681,11 +681,18 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 							PointerGetDatum(bdesc),
 							PointerGetDatum(bval),
 							PointerGetDatum(keys[attno - 1][keyno]));
-						addrange = DatumGetBool(add);
+						addrange &= DatumGetBool(add);
 						if (!addrange)
 							break;
 					}
 				}
+
+				/*
+				 * We found a clause that eliminates this range. No point
+				 * in evaluating more clauses.
+				 */
+				if (!addrange)
+					break;
 			}
 		}
 	}
-- 2.39.1
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
On Thu, 16 Feb 2023 at 00:45, John Naylor wrote: > Okay, here's a rerun including the sort hack, and it looks like incremental > sort is only ahead with the smallest set, otherwise same or maybe slightly > slower: > > HEAD: > > 4 ^ 8: latency average = 113.461 ms > 5 ^ 8: latency average = 786.080 ms > 6 ^ 8: latency average = 3948.388 ms > 7 ^ 8: latency average = 15733.348 ms > > tiebreaker: > > 4 ^ 8: latency average = 106.556 ms > 5 ^ 8: latency average = 734.834 ms > 6 ^ 8: latency average = 3640.507 ms > 7 ^ 8: latency average = 14470.199 ms > > tiebreaker + incr sort hack: > > 4 ^ 8: latency average = 93.998 ms > 5 ^ 8: latency average = 740.120 ms > 6 ^ 8: latency average = 3715.942 ms > 7 ^ 8: latency average = 14749.323 ms Sad news :( the sort hacks are still quite a bit faster for 4 ^ 8. I was fooling around with the attached (very quickly and crudely put together) patch just there. The idea is to sort during tuplesort_puttuple_common() when the memory consumption goes over some approximation of L3 cache size in the hope we still have cache lines for the tuples in question still. The code is not ideal there as we track availMem rather than the used mem, so maybe I need to do that better as we could cross some boundary without actually having done very much, plus we USEMEM for other reasons too. I found that the patch didn't really help: create table t (a int not null, b int not null); insert into t select (x*random())::int % 100,(x*random())::int % 100 from generate_Series(1,1000)x; vacuum freeze t; select pg_prewarm('t'); show work_mem; work_mem -- 4GB explain (analyze, timing off) select * from t order by a,b; master: Execution Time: 5620.704 ms Execution Time: 5506.705 ms patched: Execution Time: 6801.421 ms Execution Time: 6762.130 ms I suspect it's slower because the final sort must sort the entire array still without knowledge that portions of it are pre-sorted. It would be very interesting to improve this and do some additional work and keep track of the "memtupsortedto" index by pushing them onto a List each time we cross the availMem boundary, then do then qsort just the final portion of the array in tuplesort_performsort() before doing a k-way merge on each segment rather than qsorting the entire thing again. I suspect this would be faster when work_mem exceeds L3 by some large amount. David diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index 9ca9835aab..65ffe442cd 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -216,6 +216,7 @@ struct Tuplesortstate SortTuple *memtuples; /* array of SortTuple structs */ int memtupcount;/* number of tuples currently present */ int memtupsize; /* allocated length of memtuples array */ + int memtupsortedto; /* index of where we've already sorted up to */ boolgrowmemtuples; /* memtuples' growth still underway? 
*/ /* @@ -465,7 +466,7 @@ static bool mergereadnext(Tuplesortstate *state, LogicalTape *srcTape, SortTuple static void dumptuples(Tuplesortstate *state, bool alltuples); static void make_bounded_heap(Tuplesortstate *state); static void sort_bounded_heap(Tuplesortstate *state); -static void tuplesort_sort_memtuples(Tuplesortstate *state); +static void tuplesort_sort_memtuples(Tuplesortstate *state, int start_index); static void tuplesort_heap_insert(Tuplesortstate *state, SortTuple *tuple); static void tuplesort_heap_replace_top(Tuplesortstate *state, SortTuple *tuple); static void tuplesort_heap_delete_top(Tuplesortstate *state); @@ -708,6 +709,7 @@ tuplesort_begin_common(int workMem, SortCoordinate coordinate, int sortopt) */ state->memtupsize = INITIAL_MEMTUPSIZE; state->memtuples = NULL; + state->memtupsortedto = 0; /* * After all of the other non-parallel-related state, we setup all of the @@ -793,6 +795,7 @@ tuplesort_begin_batch(Tuplesortstate *state) state->tapeset = NULL; state->memtupcount = 0; + state->memtupsortedto = 0; /* * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD; @@ -1193,10 +1196,29 @@ tuplesort_puttuple_common(Tuplesortstate *state, SortTuple *tuple, bool useAbbre Assert(!LEADER(state)); +/* just roughly. Must be power-of-2 for efficient division */ +#define EFFECTIVE_CPU_L3_SIZE 16777216 + /* Count the size of the out-of-line data */ if (tuple->tuple != NULL) + { + size_t before_mem = state->availMem / EFFECTIVE_CPU_L3_SIZE; + USEMEM(state, GetMemoryChunkSpace(tuple->tuple)); + /* +* Being a bit more cache awareness to the sort and do a presort +* of the tuple seen thus far while they're likely still sitting in +* L3.
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Feb 14, 2023 at 8:24 PM John Naylor wrote: > > On Mon, Feb 13, 2023 at 2:51 PM Masahiko Sawada wrote: > > > > On Sat, Feb 11, 2023 at 2:33 PM John Naylor > > wrote: > > > > > > I didn't get any closer to radix-tree regression, > > > > Me neither. It seems that in v26, inserting chunks into node-32 is > > slow but needs more analysis. I'll share if I found something > > interesting. > > If that were the case, then the other benchmarks I ran would likely have > slowed down as well, but they are the same or faster. There is one > microbenchmark I didn't run before: "select * from > bench_fixed_height_search(15)" (15 to reduce noise from growing size class, > and despite the name it measures load time as well). Trying this now shows no > difference: a few runs range 19 to 21ms in each version. That also reinforces > that update_inner is fine and that the move to value pointer API didn't > regress. > > Changing TIDS_PER_BLOCK_FOR_LOAD to 1 to stress the tree more gives (min of > 5, perf run separate from measurements): > > v15 + v26 store: > > mem_allocated | load_ms > ---+- > 98202152 | 553 > > 19.71% postgres postgres [.] tidstore_add_tids > + 31.47% postgres postgres [.] rt_set > = 51.18% > > 20.62% postgres postgres [.] rt_node_insert_leaf >6.05% postgres postgres [.] AllocSetAlloc >4.74% postgres postgres [.] AllocSetFree >4.62% postgres postgres [.] palloc >2.23% postgres postgres [.] SlabAlloc > > v26: > > mem_allocated | load_ms > ---+- > 98202032 | 617 > > 57.45% postgres postgres [.] tidstore_add_tids > > 20.67% postgres postgres [.] local_rt_node_insert_leaf >5.99% postgres postgres [.] AllocSetAlloc >3.55% postgres postgres [.] palloc >3.05% postgres postgres [.] AllocSetFree >2.05% postgres postgres [.] SlabAlloc > > So it seems the store itself got faster when we removed shared memory paths > from the v26 store to test it against v15. > > I thought to favor the local memory case in the tidstore by controlling > inlining -- it's smaller and will be called much more often, so I tried the > following (done in 0007) > > #define RT_PREFIX shared_rt > #define RT_SHMEM > -#define RT_SCOPE static > +#define RT_SCOPE static pg_noinline > > That brings it down to > > mem_allocated | load_ms > ---+- > 98202032 | 590 The improvement makes sense to me. I've also done the same test (with changing TIDS_PER_BLOCK_FOR_LOAD to 1): w/o 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 334 | 445 (1 row) w/ 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 316 | 434 (1 row) On the other hand, with TIDS_PER_BLOCK_FOR_LOAD being 30, the load performance didn't improve: w/0 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 601 | 608 (1 row) w/ 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 610 | 606 (1 row) That being said, it might be within noise level, so I agree with 0007 patch. > Perhaps some slowdown is unavoidable, but it would be nice to understand why. True. > > > I can think that something like traversing a HOT chain could visit > > offsets out of order. But fortunately we prune such collected TIDs > > before heap vacuum in heap case. > > Further, currently we *already* assume we populate the tid array in order > (for binary search), so we can just continue assuming that (with an assert > added since it's more public in this form). I'm not sure why such basic > common sense evaded me a few versions ago... Right. TidStore is implemented not only for heap, so loading out-of-order TIDs might be important in the future. 
> > > If these are acceptable, I can incorporate them into a later patchset.
> >
> > These are nice improvements! I agree with all changes.
>
> Great, I've squashed these into the tidstore patch (0004). Also added 0005,
> which is just a simplification.

I've attached some small patches to improve the radix tree and tidstore:

We have the following WIP comment in test_radixtree:

// WIP: compiles with warnings because rt_attach is defined but not used
// #define RT_SHMEM

How about unsetting RT_SCOPE to suppress warnings for unused rt_attach and
friends?

FYI I've briefly tested the TidStore with blocksize = 32kB, and it seems to
work fine.

> I squashed the earlier dead code removal into the radix tree patch.

Thanks!

> v27-0008 measures tid store iteration performance and adds a stub function to
> prevent spurious warnings, so the benchmarking module can always be built.
>
> Getting the list of offsets from the
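For readers following the template mechanics, here is a rough sketch of the knobs being discussed, pieced together from the #defines quoted in this thread; the header path, the second instantiation in one translation unit, and the idea that leaving RT_SCOPE undefined yields extern (hence "used") declarations are all assumptions about the WIP patch:

/* local-memory variant: small, called often, so inline aggressively */
#define RT_PREFIX local_rt
#define RT_SCOPE static inline
#include "lib/radixtree.h"

/* shared-memory variant: pg_noinline keeps it off the hot local paths */
#define RT_PREFIX shared_rt
#define RT_SHMEM
#define RT_SCOPE static pg_noinline
#include "lib/radixtree.h"

This is the same include-the-template-twice pattern PostgreSQL already uses for simplehash.h, with RT_PREFIX playing the role of the name mangler.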
Dead code in ps_status.c
Hi,

Here's some archeology I did a while back, but was reminded to post when I
saw David's nearby performance improvements for ps_status.c.

* there are no systems with HAVE_PS_STRINGS (ancient BSD)
* setproctitle_fast() is in all live FreeBSD releases
* setproctitle() is in all other BSDs
* PostgreSQL can't run on GNU/Hurd apparently, for lack of shared
  semaphores, so who would even know if that works?
* IRIX is rusting in peace
* there are no other NeXT-derived systems (NeXTSTEP and OPENSTEP are departed)

Therefore I think it is safe to drop the PS_USE_PS_STRING and
PS_USE_CHANGE_ARGV code branches, remove a bunch of outdated comments and
macro tests, and prune the defunct configure/meson probe.

I guess (defined(sun) && !defined(BSD)) || defined(__svr5__) could be changed
to just defined(sun) (surely there are no other living SysV-derived systems,
and I think non-BSD Sun probably meant "Solaris but not SunOS"), but I don't
know, so I didn't touch that. I think the history here is that the ancient
BSD sendmail code (conf.c) had all this stuff for BSD and SVR5 systems, but
then its setproctitle() function actually moved into the OS so that the
underlying PS_STRINGS stuff wouldn't have to be stable, and indeed it was
not.

From fdca29db1140aac55f9b07d63bf5cdc7555454da Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 16 Feb 2023 15:48:50 +1300 Subject: [PATCH] Remove obsolete code from ps_status.c. We can now remove various code, comments and configure/meson probes for ancient BSD variants, GNU/Hurd, IRIX, NeXT. --- configure | 35 --- configure.ac | 13 --- meson.build| 17 - src/backend/utils/misc/ps_status.c | 55 +- src/include/pg_config.h.in | 3 -- 5 files changed, 9 insertions(+), 114 deletions(-) diff --git a/configure b/configure index 7bb829ddf4..e35769ea73 100755 --- a/configure +++ b/configure @@ -16278,41 +16278,6 @@ cat >>confdefs.h <<_ACEOF _ACEOF -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for PS_STRINGS" >&5 -$as_echo_n "checking for PS_STRINGS... " >&6; } -if ${pgac_cv_var_PS_STRINGS+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h.
*/ -#include <machine/vmparam.h> -#include <sys/exec.h> - -int -main () -{ -PS_STRINGS->ps_nargvstr = 1; -PS_STRINGS->ps_argvstr = "foo"; - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - pgac_cv_var_PS_STRINGS=yes -else - pgac_cv_var_PS_STRINGS=no -fi -rm -f core conftest.err conftest.$ac_objext \ -conftest$ac_exeext conftest.$ac_ext -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_var_PS_STRINGS" >&5 -$as_echo "$pgac_cv_var_PS_STRINGS" >&6; } -if test "$pgac_cv_var_PS_STRINGS" = yes ; then - -$as_echo "#define HAVE_PS_STRINGS 1" >>confdefs.h - -fi - ac_fn_c_check_func "$LINENO" "explicit_bzero" "ac_cv_func_explicit_bzero" if test "x$ac_cv_func_explicit_bzero" = xyes; then : $as_echo "#define HAVE_EXPLICIT_BZERO 1" >>confdefs.h diff --git a/configure.ac b/configure.ac index 137a40a942..af23c15cb2 100644 --- a/configure.ac +++ b/configure.ac @@ -1849,19 +1849,6 @@ AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include <sys/uio.h>]) # This is probably only present on macOS, but may as well check always AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include <fcntl.h>]) -AC_CACHE_CHECK([for PS_STRINGS], [pgac_cv_var_PS_STRINGS], -[AC_LINK_IFELSE([AC_LANG_PROGRAM( -[#include <machine/vmparam.h> -#include <sys/exec.h> -], -[PS_STRINGS->ps_nargvstr = 1; -PS_STRINGS->ps_argvstr = "foo";])], -[pgac_cv_var_PS_STRINGS=yes], -[pgac_cv_var_PS_STRINGS=no])]) -if test "$pgac_cv_var_PS_STRINGS" = yes ; then - AC_DEFINE([HAVE_PS_STRINGS], 1, [Define to 1 if the PS_STRINGS thing exists.]) -fi - AC_REPLACE_FUNCS(m4_normalize([ explicit_bzero getopt diff --git a/meson.build b/meson.build index b5daed9f38..f534704452 100644 --- a/meson.build +++ b/meson.build @@ -2293,23 +2293,6 @@ endif cdata.set('pg_restrict', '__restrict') -if cc.links(''' -#include <machine/vmparam.h> -#include <sys/exec.h> - -int main(void) -{ -PS_STRINGS->ps_nargvstr = 1; -PS_STRINGS->ps_argvstr = "foo"; -} -''', - name: 'PS_STRINGS', args: test_c_args) - cdata.set('HAVE_PS_STRINGS', 1) -else - cdata.set('HAVE_PS_STRINGS', false) -endif - - # Most libraries are included only if they demonstrably provide a function we # need, but libm is an exception: always include it, because there are too # many compilers that play cute optimization games that will break probes for diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c index 06bdc2afd1..d6a3389c29 100644 --- a/src/backend/utils/misc/ps_status.c +++ b/src/backend/utils/misc/ps_status.c @@ -15,10 +15,6 @@ #include "postgres.h" #include <unistd.h> -#ifdef HAVE_PS_STRINGS -#include <machine/vmparam.h> /* for old BSD */ -#include <sys/exec.h> -#endif #if defined(__darwin__) #include <crt_externs.h> #endif @@ -39,16 +35,10 @@ bool update_process_title = DEFAULT_UPDATE_PROCESS_TITLE; *
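To see what survives the cleanup, here is a condensed sketch of the strategy cascade that remains in ps_status.c once the PS_STRINGS and CHANGE_ARGV branches are gone (simplified; the real file tests several more platforms, so this is not the literal source):

/* Condensed sketch of the remaining PS display strategy selection */
#if defined(HAVE_SETPROCTITLE_FAST)
#define PS_USE_SETPROCTITLE_FAST    /* FreeBSD: no syscall per update */
#elif defined(HAVE_SETPROCTITLE)
#define PS_USE_SETPROCTITLE         /* other BSDs */
#elif defined(__linux__) || defined(__darwin__) || defined(__sun)
#define PS_USE_CLOBBER_ARGV         /* overwrite argv[] storage in place */
#elif defined(WIN32)
#define PS_USE_WIN32                /* Windows */
#else
#define PS_USE_NONE
#endif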
Re: Support logical replication of DDLs
On Thu, Feb 16, 2023 at 3:46 AM Zheng Li wrote: > > We have not discussed much about the ownership of replicated objects. > Currently, replicated > objects belong to the subscription owner. However, it makes sense to > allow replicated > objects to keep the same owner from the publisher for certain use > cases otherwise users > may need to run lots of ALTER TABLE/OBJ OWNER TO manually. This issue > has been raised in [1] and [2]. > > I've implemented a prototype to allow replicated objects to have the > same owner from the publisher in > v69-0008-Allow-replicated-objects-to-have-the-same-owner-from.patch. > I also think it would be a helpful addition for users. A few points that come to my mind are: (a) Shouldn't the role have the same privileges (for ex. rolbypassrls or rolsuper) on both sides before we allow this? (b) Isn't it better to first have replication of roles? I think if we have (b) then it would probably be a bit easier because if the subscription has allowed replicating roles and we can confirm that the role is replicated then we don't need to worry about the differences. Now, coming to the implementation, won't it be better if we avoid sending the owner to the subscriber unless it is changed for the replicated command? Consider the current case of tables where we send the schema only if it is changed. This is not a direct mapping but it would be better to avoid sending additional information and then processing it on the subscriber for each command. -- With Regards, Amit Kapila.
Re: REASSIGN OWNED vs ALTER TABLE OWNER TO permission inconsistencies
On Wed, Feb 15, 2023 at 9:01 AM Stephen Frost wrote: > I don't think I really agree that "because a superuser can arrange for > it to not be valid" that it follows that requiring the recipient to have > CREATE permission on the parent object doesn't make sense. Surely for > any of these scenarios, whatever rule we come up with (assuming we have > any rule at all...) a superuser could arrange to make that rule no > longer consistent. Well yes and no. The superuser can always hack things by modifying the system catalogs, but we have plenty of integrity constraints that a superuser can't just casually violate because they feel like it. For example, a superuser is no more able to revoke privileges without revoking the privileges that depend upon them than anyone else. > I'm not really a fan of just dropping the CREATE check. If we go with > "recipient needs CREATE rights" then at least without superuser > intervention and excluding cases where REVOKE's or such are happening, > we should be able to see that only objects where the owners of those > objects have CREATE rights exist in the system. If we drop the CREATE > check entirely then clearly any user who happens to have access to > multiple roles can arrange to have objects owned by any of their roles > in any schema or database they please without any consideration for what > the owner of the parent object's wishes are. That's true, and it is a downside of dropping the CREATE check, but it's also a bit hard to believe that anyone's really getting a lot of value out of the current inconsistent checks. -- Robert Haas EDB: http://www.enterprisedb.com
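To make the inconsistency concrete, a hypothetical session (all role and schema names invented; behavior as described in this thread) could look like this:

-- carol is granted membership in both alice and bob
CREATE ROLE alice;
CREATE ROLE bob;
CREATE ROLE carol LOGIN IN ROLE alice, bob;
CREATE SCHEMA app AUTHORIZATION alice;
CREATE TABLE app.t (id int);
ALTER TABLE app.t OWNER TO alice;

SET ROLE carol;
ALTER TABLE app.t OWNER TO bob;   -- fails: bob lacks CREATE on schema app
REASSIGN OWNED BY alice TO bob;   -- succeeds: no CREATE check is applied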
Re: run pgindent on a regular basis / scripted manner
On Wed, Feb 15, 2023 at 12:45:52PM -0600, Justin Pryzby wrote: > On Mon, Feb 13, 2023 at 07:57:25AM -0500, Andrew Dunstan wrote: > > On 2023-02-12 Su 15:59, Justin Pryzby wrote: > > > It seems like if pgindent knows about git, it ought to process only > > > tracked files. Then, it wouldn't need to manually exclude generated > > > files, and it wouldn't process vpath builds and who-knows-what else it > > > finds in CWD. > > > > I don't really want restrict this to tracked files because it would mean you > > can't pgindent files before you `git add` them. > > I think you'd allow indenting files which were either tracked *or* > specified on the command line. > > Also, it makes a more sense to "add" the file before indenting it, to > allow checking the output and remove unrelated changes. So that doesn't > seem to me like a restriction of any significance. > > But I would never want to indent an untracked file unless I specified > it. Agreed. I use pgindent three ways: 1. Indent everything that changed between master and the current branch. Most common, since I develop nontrivial patches on branches. 2. Indent all staged files. For trivial changes. 3. Indent all tracked files. For typedefs.list changes. That said, pre-2023 pgindent changed untracked files if called without a file list. I've lived with that and could continue to do so.
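As a sketch, those three workflows might look like this from the top of the source tree (pgindent's exact invocation can vary across branches, so treat these as illustrative):

# 1. Indent everything that changed between master and the current branch:
src/tools/pgindent/pgindent $(git diff --name-only master... -- '*.c' '*.h')
# 2. Indent all staged files:
src/tools/pgindent/pgindent $(git diff --name-only --cached -- '*.c' '*.h')
# 3. Indent all tracked C files (e.g. after a typedefs.list change):
src/tools/pgindent/pgindent $(git ls-files -- '*.c' '*.h')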
Re: some namespace.c refactoring
On Tue, Feb 14, 2023 at 02:32:04PM +0100, Peter Eisentraut wrote: > Notes on 0001-Refactor-is-visible-functions.patch: > > Among the functions that are being unified, some check temp schemas and some > skip them. I suppose that this is because some (most) object types cannot > normally be in temp schemas, but this isn't made explicit in the code. I > added a code comment about this, the way I understand it. > > That said, you can create objects explicitly in temp schemas, so I'm not > sure the existing code is completely correct. > + /* > + * Do not look in temp namespace for object types that > don't > + * support temporary objects > + */ > + if (!(classid == RelationRelationId || classid == > TypeRelationId) && > + namespaceId == myTempNamespace) > + continue; I think the reason for the class-specific *IsVisible behavior is alignment with the lookup rules that CVE-2007-2138 introduced (commit aa27977). "CREATE FUNCTION pg_temp.f(...)" works, but calling the resulting function requires a schema-qualified name regardless of search_path. Since *IsVisible functions determine whether you can reach the object without schema qualification, their outcomes shall reflect those CVE-2007-2138 rules.
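A short session illustrating the CVE-2007-2138 lookup rules referenced above (a sketch; the error text is paraphrased):

CREATE FUNCTION pg_temp.f(int) RETURNS int LANGUAGE sql AS 'SELECT $1 + 1';
SELECT pg_temp.f(1);   -- ok: the schema-qualified call is allowed
SELECT f(1);           -- ERROR: function f(integer) does not exist;
                       -- functions are never resolved from pg_temp
                       -- via search_path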
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote: > I suggest we call it "xmlformat", which is an established term for this. Some very-very old, rusted memory told me that there was something in the standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard? Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with the ability to specify the number of spaces for indentation: https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.
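If it were attached to XMLSERIALIZE per the standard's optional Feature X069, usage might look like this (hypothetical syntax for PostgreSQL at this point, sketched for discussion):

SELECT XMLSERIALIZE(DOCUMENT '<root><a>1</a><b>2</b></root>'::xml AS text INDENT);

Oracle's variant, linked above, instead takes an indent width, something like INDENT SIZE = 2.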
Re: Minimal logical decoding on standbys
On Wed, Feb 15, 2023 at 11:48 PM Andres Freund wrote: > > Hi, > > On 2023-02-15 18:02:11 +0530, Ashutosh Sharma wrote: > > Thanks Andres. I have one more query (both for you and Bertrand). I > > don't know if this has already been answered somewhere in this mail > > thread, if yes, please let me know the mail that answers this query. > > > > Will there be a problem if we mandate the use of physical replication > > slots and hot_standby_feedback to support minimum LD on standby. I > > know people can do a physical replication setup without a replication > > slot or even with hot_standby_feedback turned off, but are we going to > > have any issue if we ask them to use a physical replication slot and > > turn on hot_standby_feedback for LD on standby. This will reduce the > > code changes required to do conflict handling for logical slots on > > standby which is being done by v50-0001 and v50-0002* patches > > currently. > > I don't think it would. E.g. while restoring from archives we can't rely on > knowing that the slot still exists on the primary. > > We can't just do corrupt things, including potentially crashing, when the > configuration is wrong. We can't ensure that the configuration is accurate all > the time. So we need to detect this case. Hence needing to detect conflicts. > OK. Got it, thanks. > > > IMHO even in normal scenarios i.e. when we are not doing LD on > > standby, we should mandate the use of a physical replication slot. > > I don't think that's going to fly. There are plenty of scenarios where you e.g. don't > want to use a slot, e.g. when you want to limit space use on the primary. > I think this can be controlled via the max_slot_wal_keep_size GUC, if I understand it correctly. -- With Regards, Ashutosh Sharma.
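For reference, the setup under discussion (physical slot plus feedback), with the GUC that bounds slot-retained WAL, might be sketched as follows (values and names illustrative):

# primary's postgresql.conf
max_slot_wal_keep_size = 10GB        # cap WAL retained by slots; -1 = unlimited

# standby's postgresql.conf
hot_standby_feedback = on
primary_slot_name = 'standby1_slot'  # hypothetical physical slot name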
Re: Is psSocketPoll doing the right thing?
On 2023-02-10 10:42, Kyotaro Horiguchi wrote: On 2023/02/09 11:50, Kyotaro Horiguchi wrote: > Hello. > While looking at a patch, I found that pqSocketPoll passes through the > result from poll(2) to the caller and throws away revents. If I > understand it correctly, poll() *doesn't* return -1 or set errno merely > because it has set POLLERR, POLLHUP, POLLNVAL, and POLLRDHUP for some > of the target sockets, and returns 0 unless poll() itself failed to > work. As far as I understand correctly, poll() returns >0 if "revents" has either of those bits, not 0 nor -1. Right, that matches my understanding. If any of the sockets is in any of the states, pqSocketPoll returns a positive value, which makes pqSocketCheck return 1. Finally pqRead/WriteReady return "ready" even though the connection socket is in an error state. Actually, that behavior does no harm, since the succeeding actual read/write will "properly" fail. However, once we use this function to simply check that the socket is sound without doing an actual read/write, that behavior becomes harmful because of the false answer. I agree with you. The current pqSocketCheck could return a false result from a caller's point of view. You're thinking that pqSocketPoll() should check "revents" and return -1 if either of those bits is set? In short, yes. pqSocketPoll() should somehow inform callers about that state. Fortunately pqSocketPoll is a private function, thus we can refactor the function so that it can do that properly. Does this mean that pqSocketPoll or pqSocketCheck somehow returns the poll's result including error conditions (POLLERR, POLLHUP, POLLNVAL) to callers? Then callers filter the result to produce their final result. regards, -- Katsuragi Yuta Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
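A minimal sketch of what such a refactored check could look like, assuming POSIX poll(2); the real pqSocketPoll also handles end-time arithmetic and has a select() fallback, both omitted here:

#include <poll.h>

/*
 * Returns >0 if the socket is ready, 0 on timeout, and -1 on poll()
 * failure or when the socket is in an error state per revents.
 */
static int
socket_poll_checked(int sock, int forRead, int timeout_ms)
{
	struct pollfd input_fd;
	int			rc;

	input_fd.fd = sock;
	input_fd.events = forRead ? POLLIN : POLLOUT;
	input_fd.revents = 0;

	rc = poll(&input_fd, 1, timeout_ms);
	if (rc > 0 && (input_fd.revents & (POLLERR | POLLHUP | POLLNVAL)))
		return -1;			/* surface the error state instead of "ready" */
	return rc;
}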
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 5:37 AM Nathan Bossart wrote: > > On Wed, Feb 15, 2023 at 04:49:38AM +, Ryo Matsumura (Fujitsu) wrote: > > The above is occured by the following call. > > The argument 'permanent' of ReadBufferWithoutRelcache() is passed to > > BufferAlloc() as 'relpersistence'. > > > > [src/backend/commands/] > > 298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno, > > 299 RBM_NORMAL, bstrategy, false); > > Indeed, setting that to true (as per the attached patch) seems to fix this. > I don't see any reason this _shouldn't_ be true from what I've read so far. > We're reading pg_class, which will probably never be UNLOGGED. Yes, there is no reason to pass this as false, seems like this is passed false by mistake. And your patch fixes the issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Force testing of query jumbling code in TAP tests
On Tue, Feb 14, 2023 at 10:11:21AM -0800, Andres Freund wrote: > I didn't mean printing in the sense of outputting the statements to the tap > log. Maybe creating a temp table or such for all the queries. And yes, then > doing some top-level analysis on it like you describe sounds like a good idea. One idea would be something like this, which makes sure that reports are generated for the most common query patterns: WITH select_stats AS (SELECT upper(substr(query, 1, 6)) AS select_query FROM pg_stat_statements WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE', 'INSERT', 'DELETE', 'CREATE')) SELECT select_query, count(select_query) > 1 AS some_rows FROM select_stats GROUP BY select_query ORDER BY select_query; Other ideas are welcome. At least this would be a start. -- Michael diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile index 570bf42b58..c60314d195 100644 --- a/src/test/recovery/Makefile +++ b/src/test/recovery/Makefile @@ -9,7 +9,7 @@ # #- -EXTRA_INSTALL=contrib/test_decoding contrib/pg_prewarm +EXTRA_INSTALL=contrib/pg_prewarm contrib/pg_stat_statements contrib/test_decoding subdir = src/test/recovery top_builddir = ../../.. diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl index 13482adbaf..4f26192ffd 100644 --- a/src/test/recovery/t/027_stream_regress.pl +++ b/src/test/recovery/t/027_stream_regress.pl @@ -14,6 +14,17 @@ $node_primary->init(allows_streaming => 1); $node_primary->adjust_conf('postgresql.conf', 'max_connections', '25'); $node_primary->append_conf('postgresql.conf', 'max_prepared_transactions = 10'); + +# Enable pg_stat_statements to force tests to do query jumbling. +# pg_stat_statements.max should be large enough to hold all the entries +# of the regression database. +$node_primary->append_conf( + 'postgresql.conf', + qq{shared_preload_libraries = 'pg_stat_statements' +pg_stat_statements.max = 5 +compute_query_id = 'regress' +}); + # We'll stick with Cluster->new's small default shared_buffers, but since that # makes synchronized seqscans more probable, it risks changing the results of # some test queries. Disable synchronized seqscans to prevent that. @@ -106,6 +117,27 @@ command_ok( [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ], 'compare primary and standby dumps'); +# Check some data from pg_stat_statements. +$node_primary->safe_psql('postgres', 'CREATE EXTENSION pg_stat_statements'); +# This gathers data based on the first characters for some common query types, +# providing coverage for SELECT, DMLs, and some DDLs. +my $result = $node_primary->safe_psql( + 'postgres', + qq{WITH select_stats AS + (SELECT upper(substr(query, 1, 6)) AS select_query + FROM pg_stat_statements + WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE', + 'INSERT', 'DELETE', + 'CREATE')) + SELECT select_query, count(select_query) > 1 AS some_rows +FROM select_stats +GROUP BY select_query ORDER BY select_query;}); +is( $result, qq(CREATE|t +DELETE|t +INSERT|t +SELECT|t +UPDATE|t), 'check contents of pg_stat_statements on regression database'); + $node_standby_1->stop; $node_primary->stop;
RE: Allow logical replication to copy tables in binary format
Hi Melih, On Monday, January 30, 2023 7:50 PM Melih Mutlu wrote: > Thanks for reviewing. ... > Well, with this patch, it will begin to fail in the table copy phase... The latest patch hasn't been updated for more than two weeks since the review comments. If you don't have time, I would like to help update the patch for you and other reviewers. Kindly let me know your status. Best Regards, Takamichi Osumi
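For context, the subscription-level option involved looks like this today; with the patch, the initial table synchronization would use binary COPY as well (connection string and names invented):

CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub1
    WITH (binary = true);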
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Wed, 15 Feb 2023 11:29:18 +, "Hayato Kuroda (Fujitsu)" wrote in > Dear Andres and other hackers, > > > OTOH, if we want to implement the functionality on publisher-side, > > I think we need to first consider the interface. > > We can think of two options (a) Have it as a subscription parameter as the > > patch > > has now and > > then pass it as an option to the publisher which it will use to delay; > > (b) Have it defined on publisher-side, say via GUC or some other way. > > The basic idea could be that while processing commit record (in > > DecodeCommit), > > we can somehow check the value of delay and then use it there to delay > > sending > > the xact. > > Also, during delay, we need to somehow send the keepalive and process > > replies, > > probably via a new callback or by some existing callback. > > We also need to handle in-progress and 2PC xacts in a similar way. > > For the former, probably we would need to apply the delay before sending > > the first > > stream. > > Could you please share what you feel on this direction as well ? > > I implemented a patch where the delaying is done on the publisher side. In > this patch, > approach (a) was chosen, in which min_apply_delay is specified as a > subscription > parameter, and then the apply worker passes it to the publisher as an output > plugin option. As Amit-K mentioned, we may need to change the name of the option in this version, since the delay mechanism in this version causes a delay in sending from the publisher rather than delaying apply on the subscriber side. I'm not sure why the output plugin is involved in the delay mechanism. It appears to me that it would be simpler if the delay occurred in the reorder buffer or the logical decoder instead. If I understand correctly, we could delay right before sending only the commit records in this case. If we delay at the publisher end, all changes will be sent at once if !streaming, and otherwise, all changes in a transaction will be spooled at the subscriber end. In any case, the apply worker won't be holding an active transaction unnecessarily. Of course we need to add a mechanism to process keep-alive and status report messages. > During the delay, the walsender periodically checks and processes replies > from the > apply worker and sends keepalive messages if needed. Therefore, the ability > to handle > keepalives is not lost. My understanding is that keep-alives are a different mechanism with a different objective from status reports. Even if the subscriber doesn't send spontaneous or extra status reports at all, the connection can be checked and maintained by keep-alive packets. It is possible to set up an asymmetric configuration where only the walsender sends keep-alives, and none are sent from the peer. Those setups work fine when no apply delay is involved, but they won't work with the patches we're talking about because the subscriber won't respond to the keep-alive packets from the peer. So when I wrote "practically works" in the last mail, this is what I meant. Thus if someone plans to enable apply_delay for logical replication, that person should be aware of some additional subtle restrictions that are required compared to non-delayed setups. > To delay the transaction in the output plugin layer, the new > LogicalOutputPlugin > API was added. For now, I chose the output plugin layer but can consider > doing > it from the core if there is a better way. > > Could you please share your opinion? > > Note: thanks to Osumi-san for helping with the implementation. regards. 
-- Kyotaro Horiguchi NTT Open Source Software Center
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 10:24:13AM +0530, Dilip Kumar wrote: > Yes, there is no reason to pass this as false, seems like this is > passed false by mistake. And your patch fixes the issue. So, if I am understanding this stuff right, this issue can create data corruption once a DDL updates any pages of pg_class stored in a template database that gets copied by this routine. In this case, the patch sent makes sure that any page copied will get written once a checkpoint kicks in. Shouldn't we have at least a regression test for such a scenario? The issue can happen when updating a template database after creating a database from it, which is less worrying than the initial impression I got, still I'd like to think that we should have some coverage as of the special logic this code path relies on for pg_class when reading its buffers. I have not given much attention to this area, but I am a bit suspicious that enforcing the default as WAL_LOG was a good idea for 15~, TBH. We are usually much more conservative when it comes to such choices, switching to the new behavior after a few years would have been wiser.. -- Michael
Re: Dead code in ps_status.c
Thomas Munro writes: > Therefore I think it is safe to drop the PS_USE_PS_STRING and > PS_USE_CHANGE_ARGV code branches, remove a bunch of outdated comments > and macro tests, and prune the defunct configure/meson probe. Seems reasonable. Patch passes an eyeball check. > I guess (defined(sun) && !defined(BSD)) || defined(__svr5__) could be > changed to just defined(sun) (surely there are no other living > SysV-derived systems, and I think non-BSD Sun probably meant "Solaris > but not SunOS"), but I don't know so I didn't touch that. Hm, is "defined(sun)" true on any live systems at all? regards, tom lane
remove duplicate comment.
Hi, The attached patch removes the comment line noting the same as the previous paragraph of the ExecUpdateAct() prolog comment. -- Regards, Amul Sul EDB: http://www.enterprisedb.com diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index f419c47065a..645e62f4a76 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -1948,9 +1948,6 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo, * caller is also in charge of doing EvalPlanQual if the tuple is found to * be concurrently updated. However, in case of a cross-partition update, * this routine does it. - * - * Caller is in charge of doing EvalPlanQual as necessary, and of keeping - * indexes current for the update. */ static TM_Result ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
Missing free_var() at end of accum_sum_final()?
Hi, I noticed the NumericVar's pos_var and neg_var are not free_var()'d at the end of accum_sum_final(). The potential memory leak seems small, since the function is called only once per sum() per worker (and from a few more places), but maybe they should be free'd anyways for correctness? /Joel
Re: Dead code in ps_status.c
On Thu, Feb 16, 2023 at 6:34 PM Tom Lane wrote: > Thomas Munro writes: > > Therefore I think it is safe to drop the PS_USE_PS_STRINGS and > > PS_USE_CHANGE_ARGV code branches, remove a bunch of outdated comments > > and macro tests, and prune the defunct configure/meson probe. > > Seems reasonable. Patch passes an eyeball check. Thanks for looking. > > I guess (defined(sun) && !defined(BSD)) || defined(__svr5__) could be > > changed to just defined(sun) (surely there are no other living > > SysV-derived systems, and I think non-BSD Sun probably meant "Solaris > > but not SunOS"), but I don't know so I didn't touch that. > > Hm, is "defined(sun)" true on any live systems at all? My GCC compile farm account seems to have expired, or something, so I couldn't check on wrasse's host (though whether wrasse is "live" is debatable: Solaris 11.3 has reached EOL, it's just that the CPU is too old to be upgraded, so it's not testing a real OS that anyone would actually run PostgreSQL on). But from some googling[1], I think __sun, __sun__ and sun should all be defined. Ohh, but __svr5__ should not be. Solaris boxes define __svr4__, I was confused by the two fives. __svr5__ was SCO/Unixware, another dead OS[2], so I think we can just remove that one too. So, yeah, I think we should replace (defined(sun) && !defined(BSD)) || defined(__svr5__) with defined(__sun). (Hmph. We have all of __sun__, __sun and sun in the tree.) [1] https://stackoverflow.com/questions/16618604/solaris-and-preprocessor-macros [2] https://en.wikipedia.org/wiki/UNIX_System_V#SVR5_/_UnixWare_7
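So the proposed simplification, as a before/after sketch of the preprocessor test (the real test sits in a longer #elif chain listing more platforms, so this is abbreviated):

/* before (sketch) */
#if (defined(sun) && !defined(BSD)) || defined(__svr5__)
#define PS_USE_CLOBBER_ARGV
#endif

/* after: Solaris is the only surviving SysV descendant here */
#if defined(__sun)
#define PS_USE_CLOBBER_ARGV
#endif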
RE: Time delayed LR (WAS Re: logical replication restrictions)
Dear Horiguchi-san, Thank you for responding! Before modifying the patches, I want to confirm something you said. > As Amit-K mentioned, we may need to change the name of the option in > this version, since the delay mechanism in this version causes a delay > in sending from the publisher rather than delaying apply on the subscriber side. Right, will be changed. > I'm not sure why the output plugin is involved in the delay mechanism. It > appears to me that it would be simpler if the delay occurred in > the reorder buffer or the logical decoder instead. I'm planning to change it, but.. > If I understand correctly, > we could delay right before sending only the commit > records in this case. If we delay at the publisher end, all changes will > be sent at once if !streaming, and otherwise, all changes in a > transaction will be spooled at the subscriber end. In any case, the apply > worker won't be holding an active transaction unnecessarily. What about the parallel case? The latest patch does not reject the combination of parallel streaming mode and delay. If the delay is done at commit and the subscriber uses a parallel apply worker, it may hold locks for a long time. > Of > course we need to add a mechanism to process keep-alive and status > report messages. Could you share a good way to handle keep-alive and status messages if you have one? If we moved this to the decoding layer, it would be strange to call walsender functions directly. > Those setups work fine when no > apply delay is involved, but they won't work with the patches we're > talking about because the subscriber won't respond to the keep-alive > packets from the peer. So when I wrote "practically works" in the > last mail, this is what I meant. I'm not sure about that part. I think that in the latest patch, the subscriber can respond to the keepalive packets from the peer. Also, the publisher can respond to the peer. Could you please tell me if you know of a case where the publisher or subscriber cannot respond to the opposite side? Note that if we apply the publisher-side patch, we don't have to apply the subscriber-side patch. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 06:59:13AM +0100, Joel Jacobson wrote: > I noticed the NumericVar's pos_var and neg_var are not free_var()'d > at the end of accum_sum_final(). > > The potential memory leak seems small, since the function is called > only once per sum() per worker (and from a few more places), but > maybe they should be free'd anyways for correctness? Indeed, it is true that any code path of numeric.c that relies on a NumericVar with an allocation done in its buffer is careful enough to free it, except for generate_series's SRF where one step of the computation is done. I don't see directly why you could not do the following: @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, NumericVar *result) /* And add them together */ add_var(&pos_var, &neg_var, result); + free_var(&pos_var); + free_var(&neg_var); + /* Remove leading/trailing zeroes */ strip_var(result); -- Michael
Re: Support logical replication of global object commands
On Tue, Aug 30, 2022 at 8:09 AM Zheng Li wrote: > > > > I think a publication of ALL OBJECTS sounds intuitive. Does it mean we'll > > > publish all DDL commands, all commit and abort operations in every > > > database if there is such publication of ALL OBJECTS? > > > > > > > Actually, I intend something for global objects. But the main thing > > that is worrying me about this is that we don't have a clean way to > > untie global object replication from database-specific object > > replication. > > I think ultimately we need a clean and efficient way to publish (and > subscribe to) any changes in all databases, preferably in one logical > replication slot. > Agreed. I was thinking that, currently, both the walsender and the slot are database-specific for logical replication. So we need a way to distinguish the WAL for global objects and then avoid filtering based on the slot's database during decoding. I also thought about whether we want to have a WALSender that is not connected to a database for the replication of global objects but I couldn't come up with a reason for doing so. Do you have any thoughts on this matter? -- With Regards, Amit Kapila.
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote: > So, if I am understanding this stuff right, this issue can create data > corruption once a DDL updates any pages of pg_class stored in a > template database that gets copied by this routine. In this case, the > patch sent makes sure that any page copied will get written once a > checkpoint kicks in. Shouldn't we have at least a regression test for > such a scenario? The issue can happen when updating a template > database after creating a database from it, which is less worrying > than the initial impression I got, still I'd like to think that we > should have some coverage as of the special logic this code path > relies on for pg_class when reading its buffers. I was able to quickly hack together a TAP test that seems to reliably fail before the fix and pass afterwards. There's probably a better place for it, though... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index ef05633bb0..a0259cc593 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath) CHECK_FOR_INTERRUPTS(); buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno, - RBM_NORMAL, bstrategy, false); + RBM_NORMAL, bstrategy, true); LockBuffer(buf, BUFFER_LOCK_SHARE); page = BufferGetPage(buf); diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build index 209118a639..6e9f8a7c7f 100644 --- a/src/test/recovery/meson.build +++ b/src/test/recovery/meson.build @@ -39,6 +39,7 @@ tests += { 't/031_recovery_conflict.pl', 't/032_relfilenode_reuse.pl', 't/033_replay_tsp_drops.pl', + 't/100_bugs.pl', ], }, } diff --git a/src/test/recovery/t/100_bugs.pl b/src/test/recovery/t/100_bugs.pl new file mode 100644 index 00..6429a352e9 --- /dev/null +++ b/src/test/recovery/t/100_bugs.pl @@ -0,0 +1,24 @@ + +# Copyright (c) 2023, PostgreSQL Global Development Group + +# Tests for various bugs found over time +use strict; +use warnings; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# DDL on template is persisted after creating database using WAL_LOG strategy +my $node = PostgreSQL::Test::Cluster->new('node'); +$node->init; +$node->start; +$node->safe_psql("postgres", "CREATE DATABASE test STRATEGY WAL_LOG;"); +$node->safe_psql("template1", "CREATE TABLE test (a INT);"); +$node->safe_psql("postgres", "CHECKPOINT;"); +$node->stop('immediate'); +$node->start; +my $result = $node->safe_psql("template1", "SELECT count(*) FROM test;"); +is($result, "0", "check table still exists"); +$node->stop; + +done_testing();
Re: remove duplicate comment.
On Thu, Feb 16, 2023 at 11:05:13AM +0530, Amul Sul wrote: > The attached patch removes the comment line noting the same as the > previous paragraph of the ExecUpdateAct() prolog comment. - * Caller is in charge of doing EvalPlanQual as necessary, and of keeping - * indexes current for the update. Indeed, good catch. Both are mentioned in the previous paragraph. -- Michael
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:11 PM Nathan Bossart wrote: > > On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote: > > So, if I am understanding this stuff right, this issue can create data > > corruption once a DDL updates any pages of pg_class stored in a > > template database that gets copied by this routine. In this case, the > > patch sent makes sure that any page copied will get written once a > > checkpoint kicks in. Shouldn't we have at least a regression test for > > such a scenario? The issue can happen when updating a template > > database after creating a database from it, which is less worrying > > than the initial impression I got, still I'd like to think that we > > should have some coverage as of the special logic this code path > > relies on for pg_class when reading its buffers. > > I was able to quickly hack together a TAP test that seems to reliably fail > before the fix and pass afterwards. There's probably a better place for > it, though... I think the below change is not relevant to this bug right? diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build index 209118a639..6e9f8a7c7f 100644 --- a/src/test/recovery/meson.build +++ b/src/test/recovery/meson.build @@ -39,6 +39,7 @@ tests += { 't/031_recovery_conflict.pl', 't/032_relfilenode_reuse.pl', 't/033_replay_tsp_drops.pl', + 't/100_bugs.pl', ], }, } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:32 PM Dilip Kumar wrote: > I think the below change is not relevant to this bug right? > > diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build > index 209118a639..6e9f8a7c7f 100644 > --- a/src/test/recovery/meson.build > +++ b/src/test/recovery/meson.build > @@ -39,6 +39,7 @@ tests += { >'t/031_recovery_conflict.pl', >'t/032_relfilenode_reuse.pl', >'t/033_replay_tsp_drops.pl', > + 't/100_bugs.pl', > ], >}, > } Why not? The patch creates 100_bugs.pl so it also adds it to meson.build. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Add pretty-printed XML output option
On 16.02.23 05:38, Nikolay Samokhvalov wrote: On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut wrote: I suggest we call it "xmlformat", which is an established term for this. Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard? Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231. Hi Nikolay, My first thought was to call it xmlpretty, to make it consistent with the jsonb equivalent "jsonb_pretty". But yes, you make a good observation .. xmlserialize seems to be a much better candidate. I would be willing to refactor my patch if we agree on xmlserialize. Thanks for the suggestion! Jim
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023, at 07:26, Michael Paquier wrote: > Indeed, it is true that any code path of numeric.c that relies on a > NumericVar with an allocation done in its buffer is careful enough to > free it, except for generate_series's SRF where one step of the > computation is done. I don't see directly why you could not do the > following: > @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, > NumericVar *result) > /* And add them together */ > add_var(&pos_var, &neg_var, result); > > + free_var(&pos_var); > + free_var(&neg_var); > + Thanks for looking and explaining. I added the free_var() calls after strip_var() to match the similar existing code at the end of sqrt_var(). Not that it matters, but thought it looked nicer to keep them in the same order. /Joel
wrong query result due to wang plan
Hi hackers, I have this query as shown below: select ref_1.r_comment as c0, subq_0.c1 as c1 from public.region as sample_0 right join public.partsupp as sample_1 right join public.lineitem as sample_2 on (cast(null as path) = cast(null as path)) on (cast(null as "timestamp") < cast(null as "timestamp")) inner join public.lineitem as ref_0 on (true) left join (select sample_3.ps_availqty as c1, sample_3.ps_comment as c2 from public.partsupp as sample_3 where false order by c1, c2 ) as subq_0 on (sample_1.ps_supplycost = subq_0.c1 ) right join public.region as ref_1 on (sample_1.ps_availqty = ref_1.r_regionkey ) where ref_1.r_comment is not NULL order by c0, c1; *This query has different results on pg12.12 and on HEAD*, on pg12.12: c0 | c1 -+ even, ironic theodolites according to the bold platelets wa | furiously unusual packages use carefully above the unusual, exp | silent, bold requests sleep slyly across the quickly sly dependencies. furiously silent instructions alongside | special, bold deposits haggle foxes. platelet | special Tiresias about the furiously even dolphins are furi | (5 rows) its plan: QUERY PLAN -- Sort Sort Key: ref_1.r_comment, c1 -> Hash Left Join Hash Cond: (ref_1.r_regionkey = ps_availqty) -> Seq Scan on region ref_1 Filter: (r_comment IS NOT NULL) -> Hash -> Result One-Time Filter: false (9 rows) But on HEAD (pg16devel), its result is below: c0 | c1 + (0 rows) its plan: QUERY PLAN Sort Sort Key: ref_1.r_comment, subq_0.c1 -> Result One-Time Filter: false (4 rows) The attached file includes the table schema info. regards, tender wang
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:37:57PM +0530, Robert Haas wrote: > Why not? The patch creates 100_bugs.pl so it also adds it to meson.build. It would not matter for REL_15_STABLE, but on HEAD all the new TAP test files have to be added in their respective meson.build. If you don't do that, the CFBot would not test it, neither would a local build with meson. Adding a test in src/test/recovery/ is OK by me. This is a recovery case, and that's a.. Bug. Another possible name would be something like 0XX_create_database.pl, now that's me being picky. -- Michael
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 08:08:37AM +0100, Joel Jacobson wrote: > I added the free_var() calls after strip_var() to match the similar > existing code at the end of sqrt_var(). > Not that it matters, but thought it looked nicer to keep them in the > same order. WFM. Thanks! -- Michael
Re: [PATCH] Add pretty-printed XML output option
On 16.02.23 00:13, Peter Smith wrote: Today I fetched and tried the latest v11. It is failing too, but only just. - see attached file pretty-v11-results It looks only due to a whitespace EOF issue in the xml_2.out @@ -1679,4 +1679,4 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -\set VERBOSITY default \ No newline at end of file +\set VERBOSITY default -- The attached patch update (v12-0002) fixes the xml_2.out for me. It works for me too. Thanks for the quick fix! Jim
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:38 PM Robert Haas wrote: > > On Thu, Feb 16, 2023 at 12:32 PM Dilip Kumar wrote: > > I think the below change is not relevant to this bug right? > > > > diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build > > index 209118a639..6e9f8a7c7f 100644 > > --- a/src/test/recovery/meson.build > > +++ b/src/test/recovery/meson.build > > @@ -39,6 +39,7 @@ tests += { > >'t/031_recovery_conflict.pl', > >'t/032_relfilenode_reuse.pl', > >'t/033_replay_tsp_drops.pl', > > + 't/100_bugs.pl', > > ], > >}, > > } > > Why not? The patch creates 100_bugs.pl so it also adds it to meson.build. Yeah my bad, I somehow assumed this was an existing file. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Wed, Feb 15, 2023 at 01:00:00PM +0530, Bharath Rupireddy wrote: > The v3 patch reduces initialization of iovec array elements which is a > clear win when pg_pwrite_zeros is called for sizes less than BLCKSZ > many times (I assume this is what is needed for the relation extension > lock improvements feature). However, it increases the number of iovec > initialization with zerobuf for the cases when pg_pwrite_zeros is > called for sizes far greater than BLCKSZ (for instance, WAL file > initialization). It seems to me that v3 would do extra initializations only if pg_pwritev_with_retry() does *not* retry its writes, but that's not the case as it retries on a partial write as per its name. The number of iov buffers is strictly capped by remaining_size. FWIW, I find the v3 proposal more elegant. > FWIW, I attached v4 patch, a simplified version of the v2 - it > initializes all the iovec array elements if the total blocks to be > written crosses lengthof(iovec array), otherwise it initializes only > the needed blocks. + static size_t zbuf_sz = BLCKSZ; In v4, what's the advantage of marking that as static? It could actually be dangerous if this is carelessly updated. Well, that's not the case, still.. -- Michael
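To ground the discussion, here is a minimal sketch of iovec-based zero-filling, assuming POSIX pwritev; the real pg_pwrite_zeros/pg_pwritev_with_retry additionally loop over partial writes, which is exactly why initializing only the iovecs a given call needs stays correct:

#include <sys/types.h>
#include <sys/uio.h>

#define ZBUF_SZ 8192			/* stand-in for BLCKSZ */

static char zbuf[ZBUF_SZ];		/* static storage: zero-filled by C rules */

static ssize_t
write_zeros_once(int fd, size_t size, off_t offset)
{
	struct iovec iov[32];
	int			n = 0;
	size_t		remaining = size;

	/* Initialize only as many iovecs as this call actually needs. */
	while (remaining > 0 && n < 32)
	{
		size_t		chunk = remaining < ZBUF_SZ ? remaining : ZBUF_SZ;

		iov[n].iov_base = zbuf;
		iov[n].iov_len = chunk;
		remaining -= chunk;
		n++;
	}
	return pwritev(fd, iov, n, offset);	/* may be partial; caller retries */
}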