Re: Time delayed LR (WAS Re: logical replication restrictions)
Here are some review comments for patch v9-0001:

==

GENERAL

1. min_ prefix?

What's the significance of the "min_" prefix for this parameter? I'm
guessing the background is that at one time it was considered to be a
GUC, so it took a name similar to the GUC recovery_min_apply_delay (??)

But in practice, I think it is meaningless and/or misleading. For
example, suppose the user wants to defer replication by 1hr. IMO it is
more natural to just say "defer replication by 1 hr" (aka
apply_delay='1hr'). Clearly it means replication will take place about
1 hr into the future. OTOH, saying "defer replication by a MINIMUM of
1 hr" (aka min_apply_delay='1hr') is quite vague, because then it is
equally valid if the replication gets delayed by 1 hr or 2 hrs or 5
days or 3 weeks, since all of those satisfy the minimum delay. The
implementation could hardwire a delay of INT_MAX ms, but clearly
that's not really what the user would expect.

~

So, I think this parameter should be renamed just 'apply_delay'.

But, if you still decide to keep it as 'min_apply_delay' then there is
a lot of other code that ought to be changed to be consistent with
that name. e.g.

- subapplydelay in catalogs.sgml --> subminapplydelay
- subapplydelay in system_views.sql --> subminapplydelay
- subapplydelay in pg_subscription.h --> subminapplydelay
- subapplydelay in dump.h --> subminapplydelay
- i_subapplydelay in pg_dump.c --> i_subminapplydelay
- applydelay member name of Form_pg_subscription --> minapplydelay
- "Apply Delay" for the column name displayed by describe.c --> "Min apply delay"
- more...

(IMO the fact that so much code does not currently say 'min' at all is
just evidence that the 'min' prefix really didn't mean much in the
first place.)

==

doc/src/sgml/catalogs.sgml

2. Section 31.2 Subscription

+
+   Time delayed replica of subscription is available by indicating
+   min_apply_delay. See
+   for details.
+

How about saying it like:

SUGGESTION
The subscriber replication can be instructed to lag behind the
publisher side changes by specifying the min_apply_delay subscription
parameter. See XXX for details.

==

doc/src/sgml/ref/create_subscription.sgml

3. min_apply_delay

+
+          By default, subscriber applies changes as soon as possible. As with
+          the physical replication feature
+          (), it can be useful to
+          have a time-delayed logical replica. This parameter allows you to
+          delay the application of changes by a specified amount of time. If
+          this value is specified without units, it is taken as milliseconds.
+          The default is zero, adding no delay.
+

"subscriber applies" -> "the subscriber applies"

"allows you" -> "lets the user"

"The default is zero, adding no delay." -> "The default is zero (no delay)."

~

4.

+          larger than the time deviations between servers. Note that
+          in the case when this parameter is set to a long value, the
+          replication may not continue if the replication slot falls behind the
+          current LSN by more than max_slot_wal_keep_size.
+          See more details in .
+

4a.
SUGGESTION
Note that if this parameter is set to a long delay, the replication
will stop if the replication slot falls behind the current LSN by
more than max_slot_wal_keep_size.

~

4b.
When it is rendered (like below) it looks a bit repetitive:

... if the replication slot falls behind the current LSN by more than
max_slot_wal_keep_size. See more details in max_slot_wal_keep_size.

~

IMO the previous sentence should include the link.

SUGGESTION
if the replication slot falls behind the current LSN by more than
max_slot_wal_keep_size.

~

5.

+
+         Synchronous replication is affected by this setting when
+         synchronous_commit is set to
+         remote_write; every COMMIT
+         will need to wait to be applied.
+

Yes, this deserves a big warning -- but I am just not quite sure of
the details. I think this impacts more than just "remote_write" --
e.g. the same problem would happen if "synchronous_standby_names" is
non-empty. I think this warning needs to be more generic to cover
everything. Maybe something like the below.

SUGGESTION:
Delaying the replication can mean there is a much longer time between
making a change on the publisher, and that change being committed on
the subscriber. This can have a big impact on synchronous
replication. See
https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

==

src/backend/commands/subscriptioncmds.c

6. parse_subscription_options

+		ms = interval_to_ms(interval);
+		if (ms < 0 || ms > INT_MAX)
+			ereport(ERROR,
+					errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+					errmsg("%lld ms is outside the valid range for option \"%s\"",
+						   (long long) ms, "min_apply_delay"));

"for option" -> "for parameter"

==

src/backend/replication/logical/worker.c

7. apply_delay

+static
Re: Generate pg_stat_get_* functions with Macros
On Tue, Dec 06, 2022 at 12:23:55PM +0530, Bharath Rupireddy wrote:
> Likewise, is there a plan to add function generation macros for
> pg_stat_get_bgwriter, pg_stat_get_xact and so on?

Yes, I saw that, and we could do it, but I was not as enthusiastic
about it in terms of code reduction.
--
Michael
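For context, the function-generation pattern being discussed has
roughly the following shape (a sketch only: the macro name and the
PgStat_StatDBEntry field naming here are assumptions for illustration,
not the committed code):

/*
 * Sketch of a function-generation macro for per-database stats.
 * CppConcat(pg_stat_get_db_, stat) builds the SQL-callable symbol name;
 * the dbentry field naming is assumed to match the stat argument.
 */
#define PG_STAT_GET_DBENTRY_INT64(stat)							\
Datum															\
CppConcat(pg_stat_get_db_,stat)(PG_FUNCTION_ARGS)				\
{																\
	Oid			dbid = PG_GETARG_OID(0);						\
	int64		result;											\
	PgStat_StatDBEntry *dbentry;								\
																\
	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)	\
		result = 0;												\
	else														\
		result = (int64) (dbentry->stat);						\
																\
	PG_RETURN_INT64(result);									\
}

/* Each one-line invocation then replaces a whole function definition. */
PG_STAT_GET_DBENTRY_INT64(xact_commit)
PG_STAT_GET_DBENTRY_INT64(xact_rollback)

Extending the same pattern to pg_stat_get_bgwriter_* and friends would
mostly save the boilerplate around a handful of one-line accessors,
which is why the code-reduction payoff is smaller there.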
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Dec 6, 2022 at 2:51 PM Amit Kapila wrote:
>
> On Tue, Dec 6, 2022 at 5:27 AM Peter Smith wrote:
> >
> > Here are my review comments for patch v55-0002
> >
...
> > 4.
> >
> > /*
> > + * Replay the spooled messages in the parallel apply worker if the leader apply
> > + * worker has finished serializing changes to the file.
> > + */
> > +static void
> > +pa_spooled_messages(void)
> >
> > I'm not 100% sure of the logic, so IMO maybe the comment should say a
> > bit more about how this works:
> >
> > Specifically, let's say there was some timeout and the LA needed to
> > write the spool file, then let's say the PA timed out and found itself
> > inside this function. Now, let's say the LA is still busy writing the
> > file -- so what happens next?
> >
> > Does this function simply return, then the main PA loop waits again,
> > then times out again, then PA finds itself back inside this
> > function again... and that keeps happening over and over until
> > eventually the spool file is found FS_READY? Some explanatory comments
> > might help.
> >
>
> No, PA will simply wait for LA to finish. See the code handling for
> FS_BUSY state. We might want to slightly improve part of the current
> comment to: "If the leader apply worker is busy serializing the
> partial changes then acquire the stream lock now and wait for the
> leader worker to finish serializing the changes".
>

Sure, "PA will simply wait for LA to finish". Except I think it's not
quite that simple, because IIUC when LA *does* finish, the PA (this
function) will continue and just drop out the bottom -- it cannot
apply those spooled messages yet until it cycles all the way back
around the main loop, times out again, and gets back into the
pa_spooled_messages function again to reach the FS_READY block of code
where it can finally call 'apply_spooled_messages'...

If my understanding is correct, then it's that extra looping that I
thought maybe warrants some mention in a comment here.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Question regarding "Make archiver process an auxiliary process. commit"
At Mon, 5 Dec 2022 12:06:11 +0530, Sravan Kumar wrote in
> timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
> It so happens that last_copy_time and curtime are always set at the same
> time, which always makes timeout equal (actually roughly equal) to
> PGARCH_AUTOWAKE_INTERVAL.

Oooo *^^*.

> This behaviour was different before the commit
> d75288fb27b8fe0a926aaab7d75816f091ecdc27,
> in which the archiver keeps track of how much time has elapsed since
> last_copy_time in case there was a signal, and it results in a smaller
> subsequent value of timeout, until timeout is zero. This also avoids
> calling pgarch_ArchiverCopyLoop before PGARCH_AUTOWAKE_INTERVAL in case
> there's an intermittent signal.

Yes, WaitLatch() (I believe) no longer causes spurious wakeups.

> With the current changes it may be okay to always sleep for
> PGARCH_AUTOWAKE_INTERVAL,
> but that means curtime and last_copy_time are no longer needed.

I think you're right.

> I would like to validate if my understanding is correct, and which of the
> behaviours we would like to retain.

As I understand it, the patch didn't change the copying behavior of
the function. I think we should simplify the loop by removing
last_copy_time and curtime in the "if (!time_to_stop)" block. Then we
can remove the variable "timeout" and the "if (timeout > 0)"
branch. Are you willing to work on this?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
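For illustration, the simplified loop being suggested would look
roughly like this (a sketch of the proposal only, not the attached
patch further down the thread):

/*
 * Sketch of the simplified archiver main loop: no curtime/last_copy_time
 * bookkeeping, just a fixed wait between archiving cycles.
 */
while (!time_to_stop)
{
	int			rc;

	/* ... handle any pending signals (SIGHUP, shutdown request) ... */

	/* Do what we're here for: archive everything that is ready. */
	pgarch_ArchiverCopyLoop();

	/*
	 * Sleep until the latch is set (e.g. by PgArchWakeup() on a WAL file
	 * switch) or the full autowake interval elapses; there is no need to
	 * track how much of the interval remains.
	 */
	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
				   PGARCH_AUTOWAKE_INTERVAL * 1000L,
				   WAIT_EVENT_ARCHIVER_MAIN);
	if (rc & WL_POSTMASTER_DEATH)
		time_to_stop = true;
	else if (rc & WL_LATCH_SET)
		ResetLatch(MyLatch);
}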
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 01:55, Ian Lawrence Barwick wrote:

> Oh, now you mention it, I vaguely recall seeing those. However the thread
> stalled back in March and the patches don't seem to have made it to a
> CommitFest entry.

Yes, my patches added quite a few ids and also some xsl/css logic to
make them more discoverable in the browser, but I had gotten the
impression that nobody besides me cares about this, so I didn't push
it any further.

> Brar, would you like to add an entry so they don't get lost? See:
>
> https://commitfest.postgresql.org/41/

Yes. I can certainly add them to the commitfest, although I'm not sure
if they still apply cleanly. I can also rebase or extend them if
somebody cares.

Regards,
Brar
Re: doc: add missing "id" attributes to extension packaging page
On 2022-Dec-06, Brar Piening wrote:

> On 06.12.2022 at 01:55, Ian Lawrence Barwick wrote:
>
> > Oh, now you mention it, I vaguely recall seeing those. However the thread
> > stalled back in March and the patches don't seem to have made it to a
> > CommitFest entry.
>
> Yes, my patches added quite a few ids and also some xsl/css logic to
> make them more discoverable in the browser but I had gotten the
> impression that nobody besides me cares about this, so I didn't push it
> any further.

I care. The problem last time is that we were in the middle of the
last commitfest, so we were (or at least I was) distracted by other
stuff.

Looking at the resulting psql page,
https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTIONS-EXPANDED
I note that the ID for the -x option is called "options-blah". I
understand where this comes from: it's the "expanded" bit in the
"options" section. However, put together it's a bit silly to have
"options" in plural there; it would make more sense to have it be
https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTION-EXPANDED
(where you can read more naturally "the expanded option for psql").
How laborious would it be to make it so?

> Yes. I can certainly add them to the commitfest although I'm not sure if
> they still apply cleanly.

It'll probably have some conflicts, yeah.

> I can also rebase or extend them if somebody cares.

I would welcome separate patches: one to add the IDs, another for the
XSL/CSS stuff. That allows us to discuss them separately.

--
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Re: refactor ExecGrant_*() functions
Peter Eisentraut wrote:

> Continuing the ideas in [0], this patch refactors the ExecGrant_Foo()
> functions and replaces many of them by a common function that is driven by the
> ObjectProperty table.
>
> It would be nice to do more here, for example ExecGrant_Language(), which has
> additional non-generalizable checks, but I think this is already a good start.
> For example, the work being discussed on privileges on publications [1] would
> be able to take good advantage of this.

Right, I mostly copied and pasted the code when writing
ExecGrant_Publication(). I agree that your refactoring is very useful.

Attached are my proposals for improvements. One is to avoid a memory
leak, the other tries to improve readability a little bit.

> [0]: https://www.postgresql.org/message-id/95c30f96-4060-2f48-98b5-a4392d3b6...@enterprisedb.com
> [1]: https://www.postgresql.org/message-id/flat/20330.1652105397@antos

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d3121a469f..ac4490c0b8 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -2228,6 +2234,9 @@ ExecGrant_generic(InternalGrant *istmt, Oid classid, AclMode default_privs)
 		newtuple = heap_modify_tuple(tuple, RelationGetDescr(relation),
 									 values, nulls, replaces);
+		pfree(values);
+		pfree(nulls);
+		pfree(replaces);

 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d3121a469f..ac4490c0b8 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -2144,6 +2144,7 @@ ExecGrant_generic(InternalGrant *istmt, Oid classid, AclMode default_privs)
 	{
 		Oid			objectid = lfirst_oid(cell);
 		Datum		aclDatum;
+		Datum		nameDatum;
 		bool		isNull;
 		AclMode		avail_goptions;
 		AclMode		this_privileges;
@@ -2197,6 +2198,11 @@ ExecGrant_generic(InternalGrant *istmt, Oid classid, AclMode default_privs)
 									old_acl, ownerId,
 									&grantorId, &avail_goptions);

+		nameDatum = SysCacheGetAttr(cacheid, tuple,
+									get_object_attnum_name(classid),
+									&isNull);
+		Assert(!isNull);
+
 		/*
 		 * Restrict the privileges to what we can actually grant, and emit the
 		 * standards-mandated warning and error messages.
 		 */
@@ -2205,7 +2211,7 @@ ExecGrant_generic(InternalGrant *istmt, Oid classid, AclMode default_privs)
 		restrict_and_check_grant(istmt->is_grant, avail_goptions,
 								 istmt->all_privs, istmt->privileges,
 								 objectid, grantorId,
 								 get_object_type(classid, objectid),
-								 NameStr(*DatumGetName(SysCacheGetAttr(cacheid, tuple, get_object_attnum_name(classid), &isNull))),
+								 NameStr(*DatumGetName(nameDatum)),
 								 0, NULL);

 		/*
A problem about ParamPathInfo for an AppendPath
While going through the MemoizePath code, I noticed that MemoizePath
cannot be generated if the inner path is a union-all AppendPath, even
if it is parameterized. And I found the problem is that for a
union-all AppendPath the ParamPathInfo is not created in the usual
way. Instead it is created with get_appendrel_parampathinfo, which
just creates a struct with no ppi_clauses. As a result, MemoizePath
cannot find a proper cache key to use.

This problem does not exist for an AppendPath generated for a
partitioned table though, because in that case we always create a
normal param_info with ppi_clauses, for use in run-time pruning.

As an illustration, we can use the table 'prt' in sql/memoize.sql with
the settings below:

set enable_hashjoin to off;
set enable_mergejoin to off;
set enable_seqscan to off;
set enable_material to off;

explain (costs off)
select * from prt_p1 t1 join prt t2 on t1.a = t2.a;
                            QUERY PLAN
------------------------------------------------------------------
 Nested Loop
   ->  Index Only Scan using iprt_p1_a on prt_p1 t1
   ->  Memoize
         Cache Key: t1.a
         Cache Mode: logical
         ->  Append
               ->  Index Only Scan using iprt_p1_a on prt_p1 t2_1
                     Index Cond: (a = t1.a)
               ->  Index Only Scan using iprt_p2_a on prt_p2 t2_2
                     Index Cond: (a = t1.a)
(10 rows)

explain (costs off)
select * from prt_p1 t1 join
  (select * from prt_p1 union all select * from prt_p2) t2
  on t1.a = t2.a;
                      QUERY PLAN
------------------------------------------------------
 Nested Loop
   ->  Index Only Scan using iprt_p1_a on prt_p1 t1
   ->  Append
         ->  Index Only Scan using iprt_p1_a on prt_p1
               Index Cond: (a = t1.a)
         ->  Index Only Scan using iprt_p2_a on prt_p2
               Index Cond: (a = t1.a)
(7 rows)

As we can see, MemoizePath can be generated for the partitioned
AppendPath but not for the union-all AppendPath.

For the fix I think we can relax the check in create_append_path and
always use get_baserel_parampathinfo if the parent is a baserel.

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1266,7 +1266,7 @@ create_append_path(PlannerInfo *root,
 	 * partition, and it's not necessary anyway in that case.  Must skip it if
 	 * we don't have "root", too.)
 	 */
-	if (root && rel->reloptkind == RELOPT_BASEREL && IS_PARTITIONED_REL(rel))
+	if (root && rel->reloptkind == RELOPT_BASEREL)
 		pathnode->path.param_info = get_baserel_parampathinfo(root,
 															  rel,
 															  required_outer);

Thanks
Richard
Re: Add semi-join pushdown to postgres_fdw
Hi, Yuki.
Thanks for looking at this patch.

fujii.y...@df.mitsubishielectric.co.jp wrote on 2022-12-03 06:02:
> question1)
> > + if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
> It takes time for me to find in what case this condition is true.
> There are cases in which this condition is true for a semi-join of two
> baserels when running a query which joins more than two relations, such
> as query2 and query3. Running queries such as query2, you maybe want to
> push down only the semi-join path of the joinrel (outerrel) defined by
> (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1) and the baserel (innerrel) f_t3
> because of safety of deparse. So you add this condition. Because of
> this limitation, your patch can't push down the subquery expression
> "exists (select null from f_t2 where c1 = a1.c1)" in query3.
> I think it is one of the difficulty points for semi-join pushdown.
> This is my understanding of the intent of this condition and the
> restrictions imposed by this condition. Is my understanding right?

IIRC, the planner can create a semi-join whose targetlist references
Vars from the inner join relation. However, it's deparsed as EXISTS,
and so we can't reference it from SQL. So, there's this check - if a
Var is referenced in the semi-join target list, it can't be pushed
down. You can see this if you comment out this check.

EXPLAIN (verbose, costs off)
SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
   (SELECT * FROM ft4 WHERE EXISTS (
       SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
   ON ft2.c2 = ft4.c1
   INNER JOIN
   (SELECT * FROM ft2 WHERE EXISTS (
       SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
   ON ft2.c2 = ft21.c2
   WHERE ft2.c1 > 900
   ORDER BY ft2.c1 LIMIT 10;

will fail with

EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT
NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2

Here you can see that

SELECT * FROM ft2 WHERE EXISTS (
    SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)

was transformed to

SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL
FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2

where our EXISTS subquery is referenced from the tlist. It's fine for
the plan (relations participating in the semi-join can be referenced
in the tlist), but it is not going to work with an EXISTS subquery.

BTW, there's a comment in joinrel_target_ok(). It tells exactly that -

5535                 if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
5536                 {
5537                         /* We deparse semi-join as exists() subquery, and so can't deparse references to inner rel in join target list. */
5538                         ok = false;
5539                         break;
5540                 }

Expanded comment.

> question2) In foreign_join_ok
> > * Constructing queries representing ANTI joins is hard, hence
> Is this true? Is it hard to expand your approach to ANTI join
> pushdown?

I haven't tried, so I don't know.

> question3) You use variables whose name is "addl_condXXX" in the
> following code.
> > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s", join_sql_i.data);
> Does this naming mean additional literal? Is there a more
> comprehensive naming, such as "subquery_exprXXX"?

The naming means additional conditions (for the WHERE clause, by
analogy with ignore_conds and remote_conds). Not sure if subquery_expr
sounds better, but if you come up with a better idea, I'm fine with
renaming them.

> question4) Although a really small detail, there are expressions
> making a space, such as "ft4.c2 = ft2.c2", and ones making no space,
> such as "c1=ftupper.c1". Is there a reason for this difference? If
> not, do we need to use the same policy for making spaces?

Fixed.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 0f26b0841cb095b4e114984deac2b1b001368c15 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH v3] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing a semi-join
leads to generating an addl_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c                | 201 +---
 .../postgres_fdw/expected/postgres_fdw.out    | 297 --
 contrib/postgres_fdw/postgres_fdw.c           |  82 -
 contrib/postgres_fdw/postgres_fdw.h           |   3 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 119 ++-
 5 files changed, 620 insertions(+), 82 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 95247656504..10d82d9f2ab 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 								  RelOptInfo *foreignrel, bool use_alias,
Re: Allow batched insert during cross-partition updates
Amit-san,

On Tue, Mar 22, 2022 at 10:17 AM Amit Langote wrote:
> Rebased to fix a minor conflict with some recently committed
> nodeModifyTable.c changes.

Apologies for not having reviewed the patch. Here are some review
comments:

* The patch conflicts with commit ffbb7e65a, so please update the
patch. (That commit would cause an API break, so I am planning to
apply a fix to HEAD as well [1].) That commit fixed the handling of
pending inserts, which I think would eliminate the need to do this:

    * ExecModifyTable(), when inserting any remaining batched tuples, must
    look at the correct set of ResultRelInfos that would've been used by
    such inserts, because failing to do so would result in those tuples not
    actually getting inserted. To fix, ExecModifyTable() is now made to get
    the ResultRelInfos from the PartitionTupleRouting data structure which
    contains the ResultRelInfo that would be used by those internal
    inserts. To allow nodeModifyTable.c to look inside
    PartitionTupleRouting, its definition, which was previously local to
    execPartition.c, is exposed via execPartition.h.

* In postgresGetForeignModifyBatchSize():

 	/*
-	 * Should never get called when the insert is being performed as part of a
-	 * row movement operation.
+	 * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+	 * details on what it represents.
 	 */
-	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+	if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+		fmstate = fmstate->aux_fmstate;

I might be missing something, but I think we should leave the Assert
as-is, because we still disallow moving rows to another foreign-table
partition that is also an UPDATE subplan result relation, which means
that any fmstate should have fmstate->aux_fmstate=NULL.

* Also in that function:

-	if (fmstate)
+	if (fmstate != NULL)

This is correct, but I would vote for leaving that as-is, to make
back-patching easy.

That is all I have for now. I will mark this as Waiting on Author.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/CAPmGK17rmXEY3BL%3DAq71L8UZv5f-mz%3DuxJkz1kMnfSSY%2BpFe-A%40mail.gmail.com
Re: Fix gin index cost estimation
On Friday, December 2, 2022 at 13:58:27 CET, Ronan Dunklau wrote:
> On Friday, December 2, 2022 at 12:33:33 CET, Alexander Korotkov wrote:
> > Hi, Ronan!
> > Thank you for your patch. Couple of quick questions.
> > 1) What does the magic number 50.0 stand for? I think we at least
> > should make it a macro.
>
> This is what is used in other tree-descending estimation functions, so I
> used that too. Maybe a DEFAULT_PAGE_CPU_COST macro would work for both? If
> so I'll separate this into two patches, one introducing the macro for the
> other estimation functions, and this patch for gin.

The 0001 patch does this.

> > 2) "We only charge one data page for the startup cost" – should this
> > be dependent on the number of search entries?

In fact there was another problem. The current code estimates two
different paths for fetching data pages: in the case of a partial
match, it takes into account that all the data pages will have to be
fetched. So this is now taken into account for the CPU cost as well.
For the regular search, we scale the number of data pages by the
number of search entries.

Best regards,

--
Ronan Dunklau

From b3d03a96ecb3d03c4c212e3c434def6e0e77fcd4 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau
Date: Tue, 6 Dec 2022 11:08:46 +0100
Subject: [PATCH v4 1/2] Extract the cpu page-process cost multiplier into a
 macro.

Btree, GiST and SP-GiST all charge 50.0 * cpu_operator_cost for
processing an index page. Extract this to a macro to avoid repeated
magic numbers.
---
 src/backend/utils/adt/selfuncs.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f116924d3c..5baf8cf631 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -140,6 +140,7 @@
 #include "utils/timestamp.h"
 #include "utils/typcache.h"

+#define DEFAULT_PAGE_CPU_MULTIPLIER 50.0

 /* Hooks for plugins to get control when we ask for stats */
 get_relation_stats_hook_type get_relation_stats_hook = NULL;
@@ -6858,7 +6859,7 @@ btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 * touched. The number of such pages is btree tree height plus one (ie,
 	 * we charge for the leaf page too). As above, charge once per SA scan.
 	 */
-	descentCost = (index->tree_height + 1) * 50.0 * cpu_operator_cost;
+	descentCost = (index->tree_height + 1) * DEFAULT_PAGE_CPU_MULTIPLIER * cpu_operator_cost;
 	costs.indexStartupCost += descentCost;
 	costs.indexTotalCost += costs.num_sa_scans * descentCost;

@@ -7053,7 +7054,7 @@ gistcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	/*
 	 * Likewise add a per-page charge, calculated the same as for btrees.
 	 */
-	descentCost = (index->tree_height + 1) * 50.0 * cpu_operator_cost;
+	descentCost = (index->tree_height + 1) * DEFAULT_PAGE_CPU_MULTIPLIER * cpu_operator_cost;
 	costs.indexStartupCost += descentCost;
 	costs.indexTotalCost += costs.num_sa_scans * descentCost;

@@ -7108,7 +7109,7 @@ spgcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	/*
 	 * Likewise add a per-page charge, calculated the same as for btrees.
 	 */
-	descentCost = (index->tree_height + 1) * 50.0 * cpu_operator_cost;
+	descentCost = (index->tree_height + 1) * DEFAULT_PAGE_CPU_MULTIPLIER * cpu_operator_cost;
 	costs.indexStartupCost += descentCost;
 	costs.indexTotalCost += costs.num_sa_scans * descentCost;
--
2.38.1

From 817f02fa5ace98b343e585568165cc2aef84328a Mon Sep 17 00:00:00 2001
From: Ronan Dunklau
Date: Mon, 12 Sep 2022 15:40:18 +0200
Subject: [PATCH v4 2/2] Fix gin costing.
GIN index scans were not taking any descent CPU-based cost into
account. That made them look cheaper than other types of indexes when
they shouldn't be.

We use the same heuristic as for btree indexes, but multiplying it by
the number of searched entries.

Additionally, the cpu cost for the tree was based largely on
genericcostestimate. For a GIN index, we should not charge index quals
per tuple, but per entry. On top of this, charge cpu_index_tuple_cost
per actual tuple.

This should fix the cases where a GIN index is preferred over a btree,
and the ones where a memoize node is not added on top of the GIN index
scan because it seemed too cheap.

Per report of Hung Nguyen.
---
 src/backend/utils/adt/selfuncs.c | 54 +---
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5baf8cf631..0b78fe450b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7446,6 +7446,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 				qual_arg_cost,
 				spc_random_page_cost,
 				outer_scans;
+	Cost		descentCost;
 	Relation	indexRel;
 	GinStatsData ginStats;
 	ListCell   *lc;
@@ -7670,6 +7671,43 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 */
 	dataPagesFetched = c
RE: Add semi-join pushdown to postgres_fdw
Hi Mr. Pyhalov.

Thank you for fixing it and giving more explanation.

> IIRC, the planner can create a semi-join whose targetlist references
> Vars from the inner join relation. However, it's deparsed as EXISTS,
> and so we can't reference it from SQL. So, there's this check - if a
> Var is referenced in the semi-join target list, it can't be pushed
> down. You can see this if you comment out this check.
>
> EXPLAIN (verbose, costs off)
> SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
>    (SELECT * FROM ft4 WHERE EXISTS (
>        SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
>    ON ft2.c2 = ft4.c1
>    INNER JOIN
>    (SELECT * FROM ft2 WHERE EXISTS (
>        SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
>    ON ft2.c2 = ft21.c2
>    WHERE ft2.c1 > 900
>    ORDER BY ft2.c1 LIMIT 10;
>
> will fail with
>
> EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT
> NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2
>
> Here you can see that
>
> SELECT * FROM ft2 WHERE EXISTS (
>     SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)
>
> was transformed to
>
> SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL
> FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2
>
> where our EXISTS subquery is referenced from the tlist. It's fine for
> the plan (relations participating in the semi-join can be referenced
> in the tlist), but it is not going to work with an EXISTS subquery.
>
> BTW, there's a comment in joinrel_target_ok(). It tells exactly that -
>
> 5535                 if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
> 5536                 {
> 5537                         /* We deparse semi-join as exists() subquery, and so can't deparse references to inner rel in join target list. */
> 5538                         ok = false;
> 5539                         break;
> 5540                 }
>
> Expanded comment.

Thank you for expanding your comment and giving examples. Thanks to
the above examples, I understood in what case the planner would create
a semi-join whose targetlist references Vars from the inner join
relation.

> > question2) In foreign_join_ok
> > > * Constructing queries representing ANTI joins is hard, hence
> > Is this true? Is it hard to expand your approach to ANTI join
> > pushdown?
>
> I haven't tried, so I don't know.

I understand the situation.

> The naming means additional conditions (for the WHERE clause, by
> analogy with ignore_conds and remote_conds). Not sure if subquery_expr
> sounds better, but if you come up with a better idea, I'm fine with
> renaming them.

Sure.

> > question4) Although a really small detail, there are expressions
> > making a space, such as "ft4.c2 = ft2.c2", and ones making no space,
> > such as "c1=ftupper.c1". Is there a reason for this difference? If
> > not, do we need to use the same policy for making spaces?

Thank you.

Later, I'm going to look at other parts of your patch.

Sincerely yours, Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center
Mitsubishi Electric Corporation
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com wrote:
>
> Hi hackers,
>
> In logical decoding, when logical_decoding_work_mem is exceeded, the
> changes are sent to the output plugin in streaming mode. But there is a
> restriction that the minimum value of logical_decoding_work_mem is 64kB.
> I tried to add a GUC to allow sending every change to the output plugin
> without waiting until logical_decoding_work_mem is exceeded.
>
> This helps to test streaming mode. For example, to test "Avoid streaming
> the transactions which are skipped" [1], it needs many
> XLOG_XACT_INVALIDATIONS messages. With the new option, it can be tested
> with fewer changes and in less time. Also, this new option helps to test
> more scenarios for "Perform streaming logical transactions by background
> workers" [2].
>

Yeah, I think this can also help in reducing the time for various
tests in test_decoding/stream and the
src/test/subscription/t/*_stream_*.pl files by reducing the number of
changes required to invoke streaming mode.

Can we think of making this GUC extensible, to test even more options
on server-side (publisher) and client-side (subscriber) cases? For
example, we can have something like logical_replication_mode with the
following valid values: (a) server_serialize: this will serialize each
change to a file on the publisher and then, on commit, restore and
send all changes; (b) server_stream: this will stream each change as
currently proposed in your patch. Then if we want to extend it for
subscriber-side testing, we can introduce new options like
client_serialize for the case being discussed in the email [1].

Thoughts?

[1] - https://www.postgresql.org/message-id/CAD21AoAVUfDrm4-%3DykihNAmR7bTX-KpHXM9jc42RbHePJv5k1w%40mail.gmail.com

--
With Regards,
Amit Kapila.
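If this were implemented as an enum GUC, it might look roughly like the
sketch below (the value names follow the suggestion above; all symbols
here are hypothetical placeholders, not committed code):

/* Hypothetical modes for the proposed logical_replication_mode GUC. */
typedef enum
{
	LOGICAL_REP_MODE_DEFAULT,			/* buffer until logical_decoding_work_mem is exceeded */
	LOGICAL_REP_MODE_SERVER_SERIALIZE,	/* serialize each change, send on commit */
	LOGICAL_REP_MODE_SERVER_STREAM		/* stream each change immediately */
} LogicalReplicationMode;

static const struct config_enum_entry logical_replication_mode_options[] = {
	{"default", LOGICAL_REP_MODE_DEFAULT, false},
	{"server_serialize", LOGICAL_REP_MODE_SERVER_SERIALIZE, false},
	{"server_stream", LOGICAL_REP_MODE_SERVER_STREAM, false},
	{NULL, 0, false}
};

A subscriber-side value such as client_serialize could then be added to
the same options array later, without inventing another GUC.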
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Tue, Dec 6, 2022 at 1:30 PM Peter Smith wrote:
>
> Here are some review comments for patch v9-0001:
>
> ==
>
> GENERAL
>
> 1. min_ prefix?
>
> What's the significance of the "min_" prefix for this parameter? I'm
> guessing the background is that at one time it was considered to be a
> GUC so took a name similar to the GUC recovery_min_apply_delay (??)
>
> But in practice, I think it is meaningless and/or misleading. For
> example, suppose the user wants to defer replication by 1hr. IMO it is
> more natural to just say "defer replication by 1 hr" (aka
> apply_delay='1hr'). Clearly it means replication will take place about
> 1 hr into the future. OTOH, saying "defer replication by a MINIMUM of 1
> hr" (aka min_apply_delay='1hr') is quite vague because then it is
> equally valid if the replication gets delayed by 1 hr or 2 hrs or 5
> days or 3 weeks since all of those satisfy the minimum delay. The
> implementation could hardwire a delay of INT_MAX ms but clearly,
> that's not really what the user would expect.
>

There is another way to look at this naming. It is quite possible a
user has set its value to '1 second' and the transaction is delayed by
more than that, say because the publisher delayed sending it. There
could be various reasons why the publisher could delay, like it was
busy processing another workload, the replication connection between
publisher and subscriber was not working, etc. Moreover, it will be
similar to the same parameter for physical replication. So, I think
keeping min in the name is a good idea.

--
With Regards,
Amit Kapila.
Re: [PoC] Reducing planning time when tables have many partitions
On Mon, 5 Dec 2022 at 21:28, David Rowley wrote:
>
> On Tue, 6 Dec 2022 at 04:45, Thom Brown wrote:
> > Testing your patches with the same 1024 partitions, each with 64
> > sub-partitions, I get a planning time of 205.020 ms, which is now a
> > 1,377x speedup. This has essentially reduced the planning time from a
> > catastrophe to a complete non-issue. Huge win!
>
> Thanks for testing the v10 patches.
>
> I wouldn't have expected such additional gains from v10. I was mostly
> focused on trying to minimise any performance regression for simple
> queries that wouldn't benefit from indexing the EquivalenceMembers.
> Your query sounds like it does not fit into that category. Perhaps it
> is down to the fact that v9-0002 or v9-0003 reverts a couple of the
> optimisations, which is causing v9 to be slower than v10 for your
> query. It's hard to tell without more details of what you're running.

I celebrated prematurely, as I neglected to wait for the 6th execution
of the prepared statement, which shows the real result. With the v10
patches, it takes 5632.040 ms, a speedup of 50x.

Testing the v9 patches, the same query takes 3388.173 ms, a speedup of 83x.

And re-testing v8, I'm getting roughly the same times. These are all
with a cold cache.

So the result isn't as dramatic as I had initially interpreted it to
be, unfortunately.

--
Thom
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Tue, Dec 6, 2022 at 12:57 PM Kyotaro Horiguchi wrote:
>
> At Mon, 5 Dec 2022 13:13:23 +0900, Michael Paquier wrote in
> > On Mon, Dec 05, 2022 at 08:48:25AM +0530, Bharath Rupireddy wrote:
> > > So, a SQL function pg_dissect_walfile_name (or some other better name)
> > > given a WAL file name returns the tli and seg number. Then the
> > > pg_walfile_offset_lsn can just be a SQL-defined function (in
> > > system_functions.sql) using this new function and pg_settings. If this
> > > understanding is correct, it looks good to me at this point.
> >
> > I would do without the SQL function that looks at pg_settings, FWIW.
>
> If that function may be called at a high frequency, a SQL-defined one is
> not suitable, but I don't think this function is used that way. With
> that premise, I would prefer SQL-defined, as it is far simpler on its
> face.
>
> At Mon, 5 Dec 2022 10:03:55 +0900, Michael Paquier wrote in
> > Hence I would tend to let XLogFromFileName do the job, while having a
> > SQL function that is just a thin wrapper around it that returns the
> > segment TLI and its number, leaving the offset out of the equation as
> > well as this new XLogIdFromFileName().
>
> I don't think it could be problematic that the SQL-callable function
> returns a bogus result for a wrong WAL filename in the correct
> format. Specifically, I think that the function may return (0/0,0) for
> "" since that behavior is completely harmless. If we don't check logid,
> XLogFromFileName fits instead.

If we were to provide correctness and input validations (specifically,
the validations around 'seg'; see below) of the WAL file name typed in
by the user, the function pg_walfile_offset_lsn() wins the race.

+    XLogIdFromFileName(fname, &tli, &segno, &log, &seg, wal_segment_size);
+
+    if (seg >= XLogSegmentsPerXLogId(wal_segment_size) ||
+        (log == 0 && seg == 0) ||
+        segno == 0 ||
+        tli == 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("invalid WAL file name \"%s\"", fname)));

+SELECT * FROM pg_walfile_offset_lsn('0001', 15);
+ERROR:  invalid WAL file name "0001"

That said, I think we can have a single function
pg_dissect_walfile_name(wal_file_name, offset optional) returning
segno (XLogSegNo - physical log file sequence number), tli, and lsn
(if offset is given). This way there is no need for another
SQL-callable function using pg_settings. Thoughts?

> (If we assume that the file names are typed in letter-by-letter, I
> rather prefer to allow lower-case letters :p)

It's easily doable if we convert the entered WAL file name to
uppercase using pg_toupper() and then pass it to IsXLogFileName().

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
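A thin wrapper of that shape might look like the sketch below
(hypothetical: the function name and result columns follow the
proposal above, and the validation details are illustrative only):

/*
 * Hypothetical sketch of the proposed pg_dissect_walfile_name(),
 * returning (segno, tli) for a given WAL file name.
 */
Datum
pg_dissect_walfile_name(PG_FUNCTION_ARGS)
{
	char	   *fname = text_to_cstring(PG_GETARG_TEXT_PP(0));
	TimeLineID	tli;
	XLogSegNo	segno;
	TupleDesc	tupdesc;
	Datum		values[2];
	bool		isnull[2] = {false, false};

	if (!IsXLogFileName(fname))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("invalid WAL file name \"%s\"", fname)));

	/* XLogFromFileName does the actual parsing of the name. */
	XLogFromFileName(fname, &tli, &segno, wal_segment_size);

	/* Build the (segno, tli) result row. */
	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
		elog(ERROR, "return type must be a row type");

	values[0] = Int64GetDatum((int64) segno);
	values[1] = Int32GetDatum((int32) tli);

	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, isnull)));
}

The LSN for a given offset could then be computed at the SQL level
from segno, the segment size, and the offset.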
Re: Question regarding "Make archiver process an auxiliary process. commit"
Thank you for the feedback.

I'll be glad to help with the fix. Here's the patch for review.

On Tue, Dec 6, 2022 at 1:54 PM Kyotaro Horiguchi wrote:
> At Mon, 5 Dec 2022 12:06:11 +0530, Sravan Kumar wrote in
> > timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
> > It so happens that last_copy_time and curtime are always set at the same
> > time, which always makes timeout equal (actually roughly equal) to
> > PGARCH_AUTOWAKE_INTERVAL.
>
> Oooo *^^*.
>
> > This behaviour was different before the commit
> > d75288fb27b8fe0a926aaab7d75816f091ecdc27,
> > in which the archiver keeps track of how much time has elapsed since
> > last_copy_time in case there was a signal, and it results in a smaller
> > subsequent value of timeout, until timeout is zero. This also avoids
> > calling pgarch_ArchiverCopyLoop before PGARCH_AUTOWAKE_INTERVAL in case
> > there's an intermittent signal.
>
> Yes, WaitLatch() (I believe) no longer causes spurious wakeups.
>
> > With the current changes it may be okay to always sleep for
> > PGARCH_AUTOWAKE_INTERVAL,
> > but that means curtime and last_copy_time are no longer needed.
>
> I think you're right.
>
> > I would like to validate if my understanding is correct, and which of the
> > behaviours we would like to retain.
>
> As I understand it, the patch didn't change the copying behavior of
> the function. I think we should simplify the loop by removing
> last_copy_time and curtime in the "if (!time_to_stop)" block. Then we
> can remove the variable "timeout" and the "if (timeout > 0)"
> branch. Are you willing to work on this?
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

--
Thanks And Regards,
Sravan
Take life one day at a time.

Attachment: v1-0001-simplify-wait-loop-in-the-archiver.patch
Re: Error-safe user functions
On 2022-12-05 Mo 20:06, Tom Lane wrote:
> Andres Freund writes:
>
> > But perhaps it's even worth having such a function properly exposed:
> > It's not at all rare to parse text data during ETL and quite often
> > erroring out fatally is undesirable. As savepoints are undesirable
> > overhead-wise, there's a lot of SQL out there that tries to do a
> > pre-check about whether some text could be cast to some other data
> > type. A function that'd try to cast input to a certain type without
> > erroring out hard would be quite useful for that.
>
> Corey and Vik are already talking about a non-error CAST variant.

/metoo! :-)

> Maybe we should leave this in abeyance until something shows up
> for that? Otherwise we'll be making a nonstandard API for what
> will probably ultimately be SQL-spec functionality. I don't mind
> that as regression-test infrastructure, but I'm a bit less excited
> about exposing it as a user feature.

I think a functional mechanism could be very useful. Who knows when
the standard might specify something in this area?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: predefined role(s) for VACUUM and ANALYZE
Nathan Bossart writes:

> diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
> index 3b5ea3c137..bd967eaa78 100644
> --- a/src/backend/catalog/aclchk.c
> +++ b/src/backend/catalog/aclchk.c
> @@ -4202,6 +4202,26 @@ pg_class_aclmask_ext(Oid table_oid, Oid roleid, AclMode mask,
>  		has_privs_of_role(roleid, ROLE_PG_WRITE_ALL_DATA))
>  		result |= (mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE));
>
> +	/*
> +	 * Check if ACL_VACUUM is being checked and, if so, and not already set as
> +	 * part of the result, then check if the user is a member of the
> +	 * pg_vacuum_all_tables role, which allows VACUUM on all relations.
> +	 */
> +	if (mask & ACL_VACUUM &&
> +		!(result & ACL_VACUUM) &&
> +		has_privs_of_role(roleid, ROLE_PG_VACUUM_ALL_TABLES))
> +		result |= ACL_VACUUM;
> +
> +	/*
> +	 * Check if ACL_ANALYZE is being checked and, if so, and not already set as
> +	 * part of the result, then check if the user is a member of the
> +	 * pg_analyze_all_tables role, which allows ANALYZE on all relations.
> +	 */
> +	if (mask & ACL_ANALYZE &&
> +		!(result & ACL_ANALYZE) &&
> +		has_privs_of_role(roleid, ROLE_PG_ANALYZE_ALL_TABLES))
> +		result |= ACL_ANALYZE;
> +
>  	return result;
>  }

These checks are getting rather repetitive; how about a data-driven
approach, along the lines of the below patch? I'm not quite happy with
the naming of the struct and its members (and maybe it should be in a
header?), suggestions welcome.

- ilmari

From 34bac3aced60931b2e995c5e1e6269f40c0828f5 Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker
Date: Thu, 1 Dec 2022 11:49:14 +
Subject: [PATCH] Make built-in role permission checking data-driven

---
 src/backend/catalog/aclchk.c     | 64 +---
 src/tools/pgindent/typedefs.list |  1 +
 2 files changed, 27 insertions(+), 38 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index bd967eaa78..434bd39124 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -4084,6 +4084,22 @@ pg_class_aclmask(Oid table_oid, Oid roleid,
 	return pg_class_aclmask_ext(table_oid, roleid, mask, how, NULL);
 }

+/*
+ * Actions that built-in roles can perform unconditionally
+ */
+typedef struct RoleAcl
+{
+	Oid			role;
+	AclMode		mode;
+} RoleAcl;
+
+static const RoleAcl builtin_role_acls[] = {
+	{ROLE_PG_READ_ALL_DATA, ACL_SELECT},
+	{ROLE_PG_WRITE_ALL_DATA, ACL_INSERT | ACL_UPDATE | ACL_DELETE},
+	{ROLE_PG_VACUUM_ALL_TABLES, ACL_VACUUM},
+	{ROLE_PG_ANALYZE_ALL_TABLES, ACL_ANALYZE},
+};
+
 /*
  * Routine for examining a user's privileges for a table
  *
@@ -4182,45 +4198,17 @@ pg_class_aclmask_ext(Oid table_oid, Oid roleid, AclMode mask,
 	ReleaseSysCache(tuple);

 	/*
-	 * Check if ACL_SELECT is being checked and, if so, and not set already as
-	 * part of the result, then check if the user is a member of the
-	 * pg_read_all_data role, which allows read access to all relations.
+	 * For each built-in role, check if its permissions are being checked and,
+	 * if so, and not set already as part of the result, then check if the
+	 * user is a member of the role, and allow the action if so.
 	 */
-	if (mask & ACL_SELECT && !(result & ACL_SELECT) &&
-		has_privs_of_role(roleid, ROLE_PG_READ_ALL_DATA))
-		result |= ACL_SELECT;
-
-	/*
-	 * Check if ACL_INSERT, ACL_UPDATE, or ACL_DELETE is being checked and, if
-	 * so, and not set already as part of the result, then check if the user
-	 * is a member of the pg_write_all_data role, which allows
-	 * INSERT/UPDATE/DELETE access to all relations (except system catalogs,
-	 * which requires superuser, see above).
-	 */
-	if (mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE) &&
-		!(result & (ACL_INSERT | ACL_UPDATE | ACL_DELETE)) &&
-		has_privs_of_role(roleid, ROLE_PG_WRITE_ALL_DATA))
-		result |= (mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE));
-
-	/*
-	 * Check if ACL_VACUUM is being checked and, if so, and not already set as
-	 * part of the result, then check if the user is a member of the
-	 * pg_vacuum_all_tables role, which allows VACUUM on all relations.
-	 */
-	if (mask & ACL_VACUUM &&
-		!(result & ACL_VACUUM) &&
-		has_privs_of_role(roleid, ROLE_PG_VACUUM_ALL_TABLES))
-		result |= ACL_VACUUM;
-
-	/*
-	 * Check if ACL_ANALYZE is being checked and, if so, and not already set as
-	 * part of the result, then check if the user is a member of the
-	 * pg_analyze_all_tables role, which allows ANALYZE on all relations.
-	 */
-	if (mask & ACL_ANALYZE &&
-		!(result & ACL_ANALYZE) &&
-		has_privs_of_role(roleid, ROLE_PG_ANALYZE_ALL_TABLES))
-		result |= ACL_ANALYZE;
+	for (int i = 0; i < lengthof(builtin_role_acls); i++)
+	{
+		const RoleAcl *const builtin = &builtin_role_acls[i];
+
+		if (mask & builtin->mode && !(result & builtin->mode) &&
+			has_privs_of_role(roleid, bu
Re: Question regarding "Make archiver process an auxiliary process. commit"
On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar wrote:
>
> Thank you for the feedback.
>
> I'll be glad to help with the fix. Here's the patch for review.

Thanks. +1 for fixing this.

I would like to quote recent discussions on reducing useless wakeups,
or increasing the sleep/hibernation times, in various processes for
power savings [1] [2] [3] [4] [5]. With that in context, does the
archiver need to wake up every 60 sec at all, when its latch gets set
(PgArchWakeup()) whenever the server switches to a new WAL file? What
happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely on its
latch being set? If required, we can spread PgArchWakeup() to more
places, no?

Before even answering the above questions, I think we need to see if
there're any cases where a process can miss SetLatch() calls (I don't
have an answer for that). Or do we want to stick to what the below
comment says?

    /*
     * There shouldn't be anything for the archiver to do except to wait for a
     * signal ... however, the archiver exists to protect our data, so she
     * wakes up occasionally to allow herself to be proactive.
     */

[1] https://www.postgresql.org/message-id/CA%2BhUKGJCbcv8AtujLw3kEO2wRB7Ffzo1fmwaGG-tQLuMOjf6qQ%40mail.gmail.com

[2] commit cd4329d9393f84dce34f0bd2dd936adc8ffaa213
    Author: Thomas Munro
    Date:   Tue Nov 29 11:28:08 2022 +1300

    Remove promote_trigger_file.

    Previously, an idle startup (recovery) process would wake up every 5
    seconds to have a chance to poll for promote_trigger_file, even if that
    GUC was not configured. That promotion triggering mechanism was
    effectively superseded by pg_ctl promote and pg_promote() a long time
    ago. There probably aren't many users left and it's very easy to change
    to the modern mechanisms, so we agreed to remove the feature.

    This is part of a campaign to reduce wakeups on idle systems.

[3] https://commitfest.postgresql.org/41/4035/
[4] https://commitfest.postgresql.org/41/4020/

[5] commit 05a7be93558c614ab89c794cb1d301ea9ff33199
    Author: Thomas Munro
    Date:   Tue Nov 8 20:36:36 2022 +1300

    Suppress useless wakeups in walreceiver.

    Instead of waking up 10 times per second to check for various timeout
    conditions, keep track of when we next have periodic work to do.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Using WaitEventSet in the postmaster
On Tue, Dec 6, 2022 at 7:09 AM Andres Freund wrote:
> On 2022-12-05 22:45:57 +1300, Thomas Munro wrote:
> > The reason for the existing sleep-based approach was that we didn't
> > want to accept any more connections (or spin furiously because the
> > listen queue was non-empty). So in this version I invented a way to
> > suppress socket events temporarily with WL_SOCKET_IGNORE, and then
> > reactivate them after crash reinit.
>
> That seems like an odd special flag. Why do we need it? Is it just because we
> want to have assertions ensuring that something is being queried?

Yeah. Perhaps 0 would be a less clumsy way to say "no events please".
I removed the assertions and did it that way in this next iteration.

I realised that the previous approach didn't actually suppress POLLHUP
and POLLERR in the poll and epoll implementations (even though our
code seems to think it needs to ask for those events, it's not
necessary, you get them anyway), and, being level-triggered, if those
were ever reported we'd finish up pegging the CPU at 100% until the
children exited. Unlikely to happen with a server socket, but wrong on
principle, and maybe a problem for other potential users of this
temporary event suppression mode.

One way to fix that for the epoll version is to EPOLL_CTL_DEL and
EPOLL_CTL_ADD whenever transitioning to/from a zero event mask. Tried
like that in this version. Another approach would be to (finally)
write DeleteWaitEvent() to do the same thing at a higher level...
seems overkill for this. The kqueue version was already doing that
because of the way it was implemented, and the poll and Windows
versions needed only a small adjustment.

I'm not too sure about the Windows change; my two ideas are passing
the 0 through as shown in this version (not sure if it really works
the way I want, but it makes some sense and the WSAEventSelect() call
doesn't fail...), or sticking a dummy unsignaled event in the array
passed to WaitForMultipleObjects().

To make sure this code is exercised, I made the state machine code
eager about silencing the socket events during PM_WAIT_DEAD_END, so
crash TAP tests go through the cycle. Regular non-crash shutdown also
runs EPOLL_CTL_DEL/EV_DELETE, which stands out if you trace the
postmaster.

> > * WL_SOCKET_ACCEPT is a new event for an incoming connection (on Unix,
> >   this is just another name for WL_SOCKET_READABLE, but Windows has a
> >   different underlying event; this mirrors WL_SOCKET_CONNECTED on the
> >   other end of a connection)
>
> Perhaps worth committing separately and soon? Seems pretty uncontroversial
> from here.

Alright, I split this into a separate patch.

> > +/*
> > + * Object representing the state of a postmaster.
> > + *
> > + * XXX Lots of global variables could move in here.
> > + */
> > +typedef struct
> > +{
> > +	WaitEventSet *wes;
> > +} Postmaster;
> > +
>
> Seems weird to introduce this but then basically have it be unused. I'd say
> either have a preceding patch move at least a few members into it, or just
> omit it for now.

Alright, I'll just have to make a global variable wait_set for now to
keep things simple.

> > +	/* This may configure SIGURG, depending on platform. */
> > +	InitializeLatchSupport();
> > +	InitLocalLatch();
>
> I'm mildly preferring InitProcessLocalLatch(), but not sure why - there's not
> really a conflicting meaning of "local" here.

Done.

> > +/*
> > + * Initialize the WaitEventSet we'll use in our main event loop.
> > + */
> > +static void
> > +InitializeWaitSet(Postmaster *postmaster)
> > +{
> > +	/* Set up a WaitEventSet for our latch and listening sockets. */
> > +	postmaster->wes = CreateWaitEventSet(CurrentMemoryContext, 1 + MAXLISTEN);
> > +	AddWaitEventToSet(postmaster->wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
> > +	for (int i = 0; i < MAXLISTEN; i++)
> > +	{
> > +		int			fd = ListenSocket[i];
> > +
> > +		if (fd == PGINVALID_SOCKET)
> > +			break;
> > +		AddWaitEventToSet(postmaster->wes, WL_SOCKET_ACCEPT, fd, NULL, NULL);
> > +	}
> > +}
>
> The naming seems like it could be confused with latch.h
> infrastructure. InitPostmasterWaitSet()?

OK.

> > +/*
> > + * Activate or deactivate the server socket events.
> > + */
> > +static void
> > +AdjustServerSocketEvents(Postmaster *postmaster, bool active)
> > +{
> > +	for (int pos = 1; pos < GetNumRegisteredWaitEvents(postmaster->wes); ++pos)
> > +		ModifyWaitEvent(postmaster->wes,
> > +						pos, active ? WL_SOCKET_ACCEPT : WL_SOCKET_IGNORE,
> > +						NULL);
> > +}
>
> This seems to hardcode the specific wait events we're waiting for based on
> latch.c infrastructure. Not really convinced that's a good idea.

What are you objecting to? The assumption that the first socket is at
position 1? The use of Get
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Friday, December 2, 2022 4:05 PM Amit Kapila wrote:
> On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila wrote:
> > One more thing I would like you to consider is the point raised by me
> > related to this patch's interaction with the parallel apply feature as
> > mentioned in the email [1]. I am not sure the idea proposed in that
> > email [1] is a good one because delaying after applying commit may not
> > be good as we want to delay the apply of the transaction(s) on
> > subscribers by this feature. I feel this needs more thought.
> >
>
> I have thought a bit more about this and we have the following options to
> choose the delay point from. (a) apply delay just before committing a
> transaction. As mentioned in comments in the patch this can lead to bloat
> and locks held for a long time. (b) apply delay before starting to apply
> changes for a transaction, but here the problem is which time to consider.
> In some cases, like for streaming transactions, we don't receive the
> commit/prepare xact time in the start message. (c) use (b) but use the
> previous transaction's commit time. (d) apply delay after committing a
> transaction by using the xact's commit time.
>
> At this stage, among the above, I feel any one of (c) or (d) is worth
> considering. Now, the difference between (c) and (d) is that if after
> commit the next xact's data is already delayed by more than
> min_apply_delay time then we don't need to kick the new logic of apply
> delay.
>
> The other thing to consider is whether we need to process any keepalive
> messages during the delay because otherwise, walsender may think that the
> subscriber is not available and time out. This may not be a problem for
> synchronous replication but otherwise, it could be a problem.
>
> Thoughts?

Hi,

Thank you for your comments! Below is some analysis for the major
points above.

(1) About the timing to apply the delay

One approach like (b) would be best. The idea is to delay the
application of all types of transactions based on the time when the
transaction arrives at the subscriber node. One advantage of this
approach over (c) and (d) is that it can avoid the case where we might
apply a transaction immediately without waiting, if there are two
transactions in sequence and the time in between exceeds the
min_apply_delay time. When we receive stream-in-progress transactions,
we'll first check whether the time for delay has passed or not in this
approach.

(2) About the timeout issue

Looking at the physical replication internals, it conducts sending
feedback and application of delay separately in different processes.
OTOH, logical replication needs to achieve those within one process.

When we want to apply the delay and avoid the timeout, we should not
store all the transaction data in memory. So, one approach for this is
to serialize the transaction data and, after the delay, apply the
transaction data. However, this means that if users adopt this
feature, then all transaction data that should be delayed would be
serialized. We are not sure if this sounds like a valid approach or
not.

Another approach was to divide the time of delay in apply_delay() and
utilize the divided time for WaitLatch, and send the keepalive
messages from there. But this approach requires some change at the
libpq layer (like implementing a new function for the wal receiver in
order to monitor whether the data from the publisher is readable
there). Probably, the first idea, to serialize the delayed
transactions, might be better on this point.

Any feedback is welcome.

Best Regards,
Takamichi Osumi
Re: pg_basebackup: add test about zstd compress option
> The only thing that I can think of would be if $decompress_program
> --test were failing, but actually trying to decompress succeeded. I
> would be inclined to dismiss that particular scenario as not important
> enough to be worth the additional CPU cycles.

When I wrote this test, it was just to increase coverage for pg_basebackup. Checking again, that coverage is already provided by the pg_verifybackup tests you mentioned, 008_untar.pl and 009_extract.pl. Therefore, I agree with your opinion.

---
Regards,
DongWook Lee.
Re: [PATCH] Check snapshot argument of index_beginscan and family
On Tue, 6 Dec 2022 at 04:31, Alexander Korotkov wrote: > > On Fri, Dec 2, 2022 at 6:18 PM Alexander Korotkov > wrote: > > On Mon, Nov 28, 2022 at 1:30 PM Aleksander Alekseev > > wrote: > > > Thanks for the feedback! > > > > > > > I think it's a nice catch and worth fixing. The one thing I don't > > > > agree with is using asserts for handling the error that can appear > > > > because most probably the server is built with assertions off and in > > > > this case, there still will be a crash in this case. I'd do this with > > > > report ERROR. Otherwise, the patch looks right and worth committing. > > > > > > Normally a developer is not supposed to pass NULLs there so I figured > > > having extra if's in the release builds doesn't worth it. Personally I > > > wouldn't mind using ereport() but my intuition tells me that the > > > committers are not going to approve this :) > > > > > > Let's see what the rest of the community thinks. > > > > I think this is harmless assertion patch. I'm going to push this if > > no objections. > > Pushed! Great, thanks! Pavel Borisov.
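For readers skimming the thread: the two flavors discussed above differ only in assertion-disabled builds. Roughly (paraphrasing, not quoting the commit):

    /* what was committed: only checked in assert-enabled builds */
    Assert(snapshot != NULL);

    /* the ereport() alternative, which would also fire in production builds */
    if (snapshot == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                 errmsg("snapshot must not be NULL")));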
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 7:29 PM Amit Kapila wrote:
>
> On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com wrote:
> >
> > Hi hackers,
> >
> > In logical decoding, when logical_decoding_work_mem is exceeded, the
> > changes are sent to output plugin in streaming mode. But there is a
> > restriction that the minimum value of logical_decoding_work_mem is 64kB.
> > I tried to add a GUC to allow sending every change to output plugin
> > without waiting until logical_decoding_work_mem is exceeded.
> >
> > This helps to test streaming mode. For example, to test "Avoid streaming
> > the transaction which are skipped" [1], it needs many
> > XLOG_XACT_INVALIDATIONS messages. With the new option, it can be tested
> > with fewer changes and in less time. Also, this new option helps to test
> > more scenarios for "Perform streaming logical transactions by background
> > workers" [2].
> >
>
> Yeah, I think this can also help in reducing the time for various
> tests in test_decoding/stream and
> src/test/subscription/t/*_stream_*.pl file by reducing the number of
> changes required to invoke streaming mode.

+1

> Can we think of making this
> GUC extendible to even test more options on server-side (publisher)
> and client-side (subscriber) cases? For example, we can have something
> like logical_replication_mode with the following valid values: (a)
> server_serialize: this will serialize each change to file on
> publishers and then on commit restore and send all changes; (b)
> server_stream: this will stream each change as currently proposed in
> your patch. Then if we want to extend it for subscriber-side testing
> then we can introduce new options like client_serialize for the case
> being discussed in the email [1].

Does setting logical_replication_mode = 'client_serialize' imply that the publisher behaves as server_stream? Or do you mean we could set something like logical_replication_mode = 'server_stream, client_serialize'?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
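For concreteness, if the proposed GUC were a plain enum, its definition would presumably look something like this (all names here are illustrative, not from any posted patch):

    typedef enum
    {
        LOGICAL_REP_MODE_BUFFERED,          /* current default behavior */
        LOGICAL_REP_MODE_SERVER_SERIALIZE,  /* serialize each change */
        LOGICAL_REP_MODE_SERVER_STREAM      /* stream each change */
    } LogicalReplicationMode;

    static const struct config_enum_entry logical_replication_mode_options[] = {
        {"buffered", LOGICAL_REP_MODE_BUFFERED, false},
        {"server_serialize", LOGICAL_REP_MODE_SERVER_SERIALIZE, false},
        {"server_stream", LOGICAL_REP_MODE_SERVER_STREAM, false},
        {NULL, 0, false}
    };

A combined setting such as 'server_stream, client_serialize' would instead require a list-valued GUC, which is exactly what the question above is probing.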
Re: Add tracking of backend memory allocated to pg_stat_activity
On Fri, 2022-12-02 at 09:18 -0800, Andres Freund wrote:
> Hi,
>
> On 2022-12-02 11:09:30 -0500, Reid Thompson wrote:
> > It appears to me that Postmaster populates the local variable that
> > *my_allocated_bytes points to. That allocation is passed to forked
> > children, and if not zeroed out, will be double counted as part of
> > the child allocation. Is this invalid?
>
> I don't think we should count allocations made before
> backend_status.c has been initialized.
>
> Greetings,
>
> Andres Freund

Hi,

The intent was to capture and display all the memory allocated to the backends, for admins/users/max_total_backend_memory/others to utilize. Why should we ignore allocations made before backend_status.c is initialized?

Thanks,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thomp...@crunchydata.com
www.crunchydata.com
Re: predefined role(s) for VACUUM and ANALYZE
On 06.12.2022 03:04, Nathan Bossart wrote:
> I wonder why \dpS wasn't added. I wrote up a patch to add it and the
> corresponding documentation that other meta-commands already have.

Yes, a \dpS command and a clarification in the documentation are exactly what is needed.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
Re: Missing MaterialPath support in reparameterize_path_by_child
On Mon, Dec 5, 2022 at 8:13 PM Tom Lane wrote:
>
> Ashutosh Bapat writes:
> > partition-wise join should be willing to fallback to non-partitionwise
> > join in such a case. After spending a few minutes with the code, I
> > think generate_partitionwise_join_paths() should not call
> > set_cheapest() if the pathlist of the child is NULL and should just
> > wind up and avoid adding any path.
>
> We clearly need to not call set_cheapest(), but that's not sufficient;
> we still fail at higher levels, as you'll see if you try the example
> Richard found. I ended up making fe12f2f8f to fix this.

Thanks. That looks good.

> I don't especially like "rel->nparts = 0" as a way of disabling
> partitionwise join; ISTM it'd be clearer and more flexible to reset
> consider_partitionwise_join instead of destroying the data structure.
> But that's the way it's being done elsewhere, and I didn't want to
> tamper with it in a bug fix. I see various assertions about parent
> and child consider_partitionwise_join flags being equal, which we
> might have to revisit if we try to make it work that way.
>

AFAIR, consider_partitionwise_join tells whether a given partitioned relation (join, higher, or base) can be considered for partitionwise join; set_append_rel_size() decides that based on some properties. But rel->nparts is an indicator of whether the relation (join, higher, or base) is partitioned or not. If we can not generate AppendPath for a join relation, it means there is no way to compute child join relations and thus the relation is not partitioned. So setting rel->nparts = 0 is right.

Probably we should add macros, similar to the dummy-relation ones, for marking and checking partitioned relations. I see IS_PARTITIONED_RELATION() is defined already. Maybe we could add mark_(un)partitioned_rel().

--
Best Wishes,
Ashutosh Bapat
Re: Error-safe user functions
[ continuing the naming quagmire... ] I wrote: > Andres Freund writes: >> Not that I have a suggestion for a better name, but I don't particularly >> like "Safe" denoting non-erroring input function calls. There's too many >> interpretations of safe - e.g. safe against privilege escalation issues >> or such. > Yeah, I'm not that thrilled with it either --- but it's a reasonably > on-point modifier, and short. It occurs to me that another spelling could be NoError (or _noerror where not using camel case). There's some precedent for that already; and where we have it, it has the same implication of reporting rather than throwing certain errors, without making a guarantee about all errors. For instance lookup_rowtype_tupdesc_noerror won't prevent throwing errors if catalog corruption is detected inside the catcaches. I'm not sure this is any *better* than Safe ... it's longer, less mellifluous, and still subject to misinterpretation. But it's a possible alternative. regards, tom lane
Re: Missing MaterialPath support in reparameterize_path_by_child
Ashutosh Bapat writes: > On Mon, Dec 5, 2022 at 8:13 PM Tom Lane wrote: >> I don't especially like "rel->nparts = 0" as a way of disabling >> partitionwise join ... > ... If we can not generate AppendPath for a > join relation, it means there is no way to compute child join > relations and thus the relation is not partitioned. So setting > rel->nparts = 0 is right. If we had nparts > 0 before, then it is partitioned for some value of "partitioned", so I don't entirely buy this argument. > Probably we should add macros similar to > dummy relation for marking and checking partitioned relation. I see > IS_PARTITIONED_RELATION() is defined already. Maybe we could add > mark_(un)partitioned_rel(). Hiding it behind a macro with an explanatory name would be an improvement, for sure. regards, tom lane
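A minimal sketch of the macro idea, under a hypothetical name (the checking side already exists as the IS_PARTITIONED_REL() test in pathnodes.h):

    /*
     * Give "rel->nparts = 0" an explanatory name instead of open-coding
     * it at every site that wants to disable partitionwise processing.
     */
    #define mark_rel_unpartitioned(rel) \
        ((rel)->nparts = 0)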
Re: old_snapshot: add test for coverage
On Mon, Aug 8, 2022 at 2:37 PM Tom Lane wrote:
>
> Dong Wook Lee writes:
> > I wrote a test of the old_snapshot extension for coverage.
>
> Hmm, does this really provide any meaningful coverage? The test
> sure looks like it's not doing much.

The tests I wrote previously were simply meant to increase coverage, so there is certainly room for a better test. I'll think about it this week; if the work exceeds my ability, I will let you know by reply. It's fine to close the issue if I don't come up with a meaningful test.

---
Regards,
DongWook Lee.
Re: Allow placeholders in ALTER ROLE w/o superuser
On Mon, Dec 5, 2022 at 10:32 PM Alexander Korotkov wrote:
> On Mon, Dec 5, 2022 at 8:18 PM Tom Lane wrote:
> > Alvaro Herrera writes:
> > > I couldn't find any discussion of the idea of adding "(s)" to the
> > > variable name in order to mark the variable userset in the catalog, and
> > > I have to admit I find it a bit strange. Are we really agreed that
> > > that's the way to proceed?
> >
> > I hadn't been paying close attention to this thread, sorry.
> >
> > I agree that that seems like a very regrettable choice,
> > especially if you anticipate having to bump catversion anyway.
>
> I totally understand that this change requires a catversion bump.
> I've reflected this in the commit message.
>
> > Better to add a bool column to the catalog.
>
> What about adding a boolean array to the pg_db_role_setting? So,
> pg_db_role_setting would have the following columns.
> * setdatabase oid
> * setrole oid
> * setconfig text[]
> * setuser bool[]

The revised patch implements this way of storing the USER SET flags. I think it really became more structured and less cumbersome.

--
Regards,
Alexander Korotkov

0001-Add-USER-SET-parameter-values-for-pg_db_role_sett-v8.patch
Description: Binary data
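For reference, the proposed layout would make the catalog header look roughly like this (a sketch of the proposal above, not the actual patch):

    CATALOG(pg_db_role_setting,2964,DbRoleSettingRelationId) BKI_SHARED_RELATION
    {
        Oid     setdatabase;    /* database, or 0 for a role-wide setting */
        Oid     setrole;        /* role, or 0 for a database-wide setting */

    #ifdef CATALOG_VARLEN       /* variable-length fields start here */
        text    setconfig[1];   /* GUC settings to apply at login */
        bool    setuser[1];     /* new: per-entry USER SET flags */
    #endif
    } FormData_pg_db_role_setting;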
Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE
Hi Vignesh,

Looks like the patch needs a rebase. Also one little suggestion:

> +	if (ends_with(prev_wd, ')'))
> +		COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT",
> +					  "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT");

What do you think about gathering the FUNCTION options as you did with the ROUTINE options? Something like the following would seem nicer, I think:

> #define Alter_function_options \
> 	Alter_routine_options, "CALLED ON NULL INPUT", \
> 	"RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"

Best,
--
Melih Mutlu
Microsoft
Re: ExecRTCheckPerms() and many prunable partitions
I have pushed this finally. I made two further changes:

1. There was no reason to rename ExecCheckPerms_hook, since its signature was changing anyway. I reverted it to the original name.

2. I couldn't find any reason to expose ExecGetRTEPermissionInfo, and given that it's a one-line function, I removed it. Maybe you had a reason to add ExecGetRTEPermissionInfo, thinking about external callers; if so, please discuss it.

I'll mark this commitfest entry as committed soon; please post the other two patches you had in this series in a new thread.

--
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Re: Error-safe user functions
On 2022-12-06 Tu 09:42, Tom Lane wrote: > [ continuing the naming quagmire... ] > > I wrote: >> Andres Freund writes: >>> Not that I have a suggestion for a better name, but I don't particularly >>> like "Safe" denoting non-erroring input function calls. There's too many >>> interpretations of safe - e.g. safe against privilege escalation issues >>> or such. >> Yeah, I'm not that thrilled with it either --- but it's a reasonably >> on-point modifier, and short. > It occurs to me that another spelling could be NoError (or _noerror > where not using camel case). There's some precedent for that already; > and where we have it, it has the same implication of reporting rather > than throwing certain errors, without making a guarantee about all > errors. For instance lookup_rowtype_tupdesc_noerror won't prevent > throwing errors if catalog corruption is detected inside the catcaches. > > I'm not sure this is any *better* than Safe ... it's longer, less > mellifluous, and still subject to misinterpretation. But it's > a possible alternative. > > Yeah, I don't think there's terribly much to choose between 'safe' and 'noerror' in terms of meaning. I originally chose InputFunctionCallContext as a more neutral name in case we wanted to be able to pass some other sort of node for the context in future. Maybe that was a little too forward looking. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-06 Tu 09:42, Tom Lane wrote: >> I'm not sure this is any *better* than Safe ... it's longer, less >> mellifluous, and still subject to misinterpretation. But it's >> a possible alternative. > Yeah, I don't think there's terribly much to choose between 'safe' and > 'noerror' in terms of meaning. Yeah, I just wanted to throw it out there and see if anyone thought it was a better idea. > I originally chose InputFunctionCallContext as a more neutral name in > case we wanted to be able to pass some other sort of node for the > context in future. > Maybe that was a little too forward looking. I didn't like that because it seemed to convey nothing at all about the expected behavior. regards, tom lane
Re: wake up logical workers after ALTER SUBSCRIPTION
Hi Nathan,

> @@ -410,6 +411,12 @@ ExecRenameStmt(RenameStmt *stmt)
> 				stmt->newname);
> 	table_close(catalog, RowExclusiveLock);
>
> +	/*
> +	 * Wake up the logical replication workers to handle this
> +	 * change quickly.
> +	 */
> +	LogicalRepWorkersWakeupAtCommit(address.objectId);

Is it really necessary to wake logical workers up when renaming objects other than subscriptions or publications? address.objectId will be a valid subid only when renaming a subscription.

> @@ -322,6 +323,9 @@ UpdateSubscriptionRelState(Oid subid, Oid relid, char state,
>
> 	/* Cleanup. */
> 	table_close(rel, NoLock);
> +
> +	/* Wake up the logical replication workers to handle this change quickly. */
> +	LogicalRepWorkersWakeupAtCommit(subid);

I wonder why a wakeup call is needed every time a subscription relation is updated. It seems to me that there are two places where UpdateSubscriptionRelState is called and we need another worker to wake up:

- When a relation is in SYNCWAIT state, it waits for the apply worker to wake up and change the relation state to CATCHUP. Then the tablesync worker needs to wake up to continue from the CATCHUP state.
- When the state is SYNCDONE, the apply worker has to wake up to change the state to READY.

I think we already call logicalrep_worker_wakeup_ptr wherever it's needed for the above cases? What am I missing here?

Best,
--
Melih Mutlu
Microsoft
Re: Error-safe user functions
On Tue, Dec 6, 2022 at 11:07 AM Tom Lane wrote: > > I originally chose InputFunctionCallContext as a more neutral name in > > case we wanted to be able to pass some other sort of node for the > > context in future. > > Maybe that was a little too forward looking. > > I didn't like that because it seemed to convey nothing at all about > the expected behavior. I feel like this can go either way. If we pick a name that conveys a specific intended behavior now, and then later we want to pass some other sort of node for some purpose other than ignoring errors, it's unpleasant to have a name that sounds like it can only ignore errors. But if we never use it for anything other than ignoring errors, a specific name is clearer. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add tracking of backend memory allocated to pg_stat_activity
Hi, On 2022-12-06 08:47:55 -0500, Reid Thompson wrote: > The intent was to capture and display all the memory allocated to the > backends, for admins/users/max_total_backend_memory/others to utilize. It's going to be far less accurate than that. For one, there's a lot of operating system resources, like the page table, that are going to be ignored. We're also not going to capture allocations done directly via malloc(). There's also copy-on-write effects that we're ignoring. If we just want to show an accurate picture of the current memory usage, we need to go to operating system specific interfaces. > Why should we ignore the allocations prior to backend_status.c? It seems to add complexity without a real increase in accuracy to me. But I'm not going to push harder on it than I already have. Greetings, Andres Freund
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 09:38, Alvaro Herrera wrote:
> I care. The problem last time is that we were in the middle of the last
> commitfest, so we were (or at least I was) distracted by other stuff.

Ok, thanks. That's appreciated and understood.

> Looking at the resulting psql page,
> https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTIONS-EXPANDED
> I note that the ID for the -x option is called "options-blah". I
> understand where this comes from: it's the "expanded" bit in the
> "options" section. However, put together it's a bit silly to have
> "options" in plural there; it would make more sense to have it be
> https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTION-EXPANDED
> (where you can read more naturally "the expanded option for psql").
> How laborious would it be to make it so?

No problem. I've already done it. Your second link should work now.

> It'll probably have some conflicts, yeah.

I've updated and rebased my branch on current master now.

> I would welcome separate patches: one to add the IDs, another for the
> XSL/CSS stuff. That allows us to discuss them separately.

I'll send two patches in two separate e-mails in a moment.

Regards,
Brar
Re: pg_upgrade test failure
Hi,

On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> So [1] on its own didn't fix this. My next guess is that the attached
> might help.
>
> Hmm. Following Michael's clue that this might involve log files and
> pg_ctl, I noticed one thing: pg_ctl implements
> wait_for_postmaster_stop() by waiting for kill(pid, 0) to fail, and
> our kill emulation does CallNamedPipe(). If the server is in the
> process of exiting and the kernel is cleaning up all the handles we
> didn't close, is there any reason to expect the signal pipe to be
> closed after the log file?

What is our plan here? This afaict is the most common "false positive" for cfbot in recent weeks. E.g.:

https://api.cirrus-ci.com/v1/artifact/task/5462686092230656/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade

...
[00:02:58.761](93.859s) ok 10 - run of pg_upgrade for new instance
[00:02:58.808](0.047s) not ok 11 - pg_upgrade_output.d/ removed after pg_upgrade success
[00:02:58.815](0.007s) # Failed test 'pg_upgrade_output.d/ removed after pg_upgrade success'
# at C:/cirrus/src/bin/pg_upgrade/t/002_pg_upgrade.pl line 288.

Michael: Why does 002_pg_upgrade.pl try to filter the list of files in pg_upgrade_output.d for files ending in .log? And why does it print those only after starting the new node? How about moving the iteration through pg_upgrade_output.d to before the ->start, and printing all the files, but only calling slurp_file() if the filename ends with *.log?

Minor nit: It seems off to have quite so many copies of $newnode->data_dir . "/pg_upgrade_output.d", particularly where the test defines $log_path but then still builds it from scratch afterwards (line 304).

Greetings,

Andres Freund
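For context on the close-ordering question quoted above, pg_ctl's wait loop is roughly this shape (a simplified sketch, not the exact source):

    /*
     * wait_for_postmaster_stop(), simplified: probe with kill(pid, 0)
     * until the process is gone.  On Windows, kill() is the emulation in
     * src/port/kill.c, which reaches the postmaster's signal pipe via
     * CallNamedPipe() -- so the probe keeps succeeding until the pipe
     * handle goes away, which is not necessarily after the log file has
     * been closed during process exit.
     */
    for (;;)
    {
        if (kill(pid, 0) != 0)
            break;              /* postmaster (or its signal pipe) is gone */
        pg_usleep(100 * 1000);  /* sleep 100 ms between probes */
    }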
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 18:59, Brar Piening wrote:
> On 06.12.2022 at 09:38, Alvaro Herrera wrote:
> > I would welcome separate patches: one to add the IDs, another for the
> > XSL/CSS stuff. That allows us to discuss them separately.
> I'll send two patches in two separate e-mails in a moment.

This is patch no. 2, which adds links to HTML elements with ids to make them visible on the HTML surface when hovering over the element.

Regards,
Brar

diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl
index 9df2782ce4..111b03d6fc 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -301,4 +301,115 @@ set toc,title
[The XSLT markup of this hunk did not survive the archive's tag stripping. What remains shows a new template that appends a hidden "#" anchor of class "id_link" to elements that have an id, and emits a message for elements without one: "element without id. Please add an id to make it usable as stable anchor in the public HTML documentation." and "Please add an id to the ... element in the sgml source code."]
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 6410a47958..e4174e0613 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -163,3 +163,13 @@ acronym{ font-style: inherit; }
 width: 75%;
 }
 }
+
+/* Links to ids of headers and definition terms */
+a.id_link {
+color: inherit;
+visibility: hidden;
+}
+
+*:hover > a.id_link {
+visibility: visible;
+}
Re: Make EXPLAIN generate a generic plan for a parameterized query
Hi, On 2022-10-29 10:35:26 +0200, Laurenz Albe wrote: > On Tue, 2022-10-25 at 19:03 +0800, Julien Rouhaud wrote: > > On Tue, Oct 25, 2022 at 11:08:27AM +0200, Laurenz Albe wrote: > > > Here is a patch that > > > implements it with an EXPLAIN option named GENERIC_PLAN. > > > > I only have a quick look at the patch for now. Any reason why you don't > > rely > > on the existing explain_filter() function for emitting stable output > > (without > > having to remove the costs)? It would also take care of checking that it > > works > > in plpgsql. > > No, there is no principled reason I did it like that. Version 2 does it like > you suggest. This fails to build the docs: https://cirrus-ci.com/task/5609301511766016 [17:47:01.064] ref/explain.sgml:179: parser error : Opening and ending tag mismatch: likeral line 179 and literal [17:47:01.064] ANALYZE, since a statement with unknown parameters [17:47:01.064] ^ Greetings, Andres Freund
Re: Error-safe user functions
On Tue, Dec 6, 2022 at 6:46 AM Andrew Dunstan wrote: > > On 2022-12-05 Mo 20:06, Tom Lane wrote: > > Andres Freund writes: > > > >> But perhaps it's even worth having such a function properly exposed: > >> It's not at all rare to parse text data during ETL and quite often > >> erroring out fatally is undesirable. As savepoints are undesirable > >> overhead-wise, there's a lot of SQL out there that tries to do a > >> pre-check about whether some text could be cast to some other data > >> type. A function that'd try to cast input to a certain type without > >> erroring out hard would be quite useful for that. > > Corey and Vik are already talking about a non-error CAST variant. > > > /metoo! :-) > > > > Maybe we should leave this in abeyance until something shows up > > for that? Otherwise we'll be making a nonstandard API for what > > will probably ultimately be SQL-spec functionality. I don't mind > > that as regression-test infrastructure, but I'm a bit less excited > > about exposing it as a user feature. > > > > > I think a functional mechanism could be very useful. Who knows when the > standard might specify something in this area? > > > Vik's working on the standard (he put the spec in earlier in this thread). I'm working on implementing it on top of Tom's work, but I'm one patchset behind at the moment. Once completed, it should be leverage-able in several places, COPY being the most obvious. What started all this was me noticing that if I implemented TRY_CAST in pl/pgsql with an exception block, then I wasn't able to use parallel query.
Re: [Proposal] Add foreign-server health checks infrastructure
Hi, On 2022-11-25 11:41:45 +, Hayato Kuroda (Fujitsu) wrote: > Sorry for my late reply. I understood that we got agreement the basic design > of first version. Thanks! > I attached new version patches. This is failing on cfbot / CI, as have prior versions. https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F40%2F3388 https://api.cirrus-ci.com/v1/artifact/task/6655254493134848/testrun/build/testrun/postgres_fdw/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build/testrun/postgres_fdw/regress/results/postgres_fdw.out --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 2022-12-06 05:21:33.906116000 + +++ /tmp/cirrus-ci-build/build/testrun/postgres_fdw/regress/results/postgres_fdw.out 2022-12-06 05:27:24.929694000 + @@ -11732,10 +11732,9 @@ -- execute check function. This should return false on supported platforms, -- otherwise return true. SELECT postgres_fdw_verify_connection_states('loopback'); -WARNING: 08006 postgres_fdw_verify_connection_states --- - f + t (1 row) ABORT; The failure happens everywhere except on linux, so presumably there's some portability issue in the patch. I've set the patch as waiting on author for now. Note that you can test CI in your own repository, as explained in src/tools/ci/README Greetings, Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2022-11-26 22:22:15 -0500, Reid Thompson wrote: > rebased/patched to current master && current > pg-stat-activity-backend-memory-allocated This version fails to build with msvc, and builds with warnings on other platforms. https://cirrus-ci.com/build/5410696721072128 msvc: [20:26:51.286] c:\cirrus\src\include\utils/backend_status.h(40): error C2059: syntax error: 'constant' mingw cross: [20:26:26.358] from /usr/share/mingw-w64/include/winsock2.h:23, [20:26:26.358] from ../../src/include/port/win32_port.h:60, [20:26:26.358] from ../../src/include/port.h:24, [20:26:26.358] from ../../src/include/c.h:1306, [20:26:26.358] from ../../src/include/postgres.h:47, [20:26:26.358] from controldata_utils.c:18: [20:26:26.358] ../../src/include/utils/backend_status.h:40:2: error: expected identifier before numeric constant [20:26:26.358]40 | IGNORE, [20:26:26.358] | ^~ [20:26:26.358] In file included from ../../src/include/postgres.h:48, [20:26:26.358] from controldata_utils.c:18: [20:26:26.358] ../../src/include/utils/backend_status.h: In function ‘pgstat_report_allocated_bytes’: [20:26:26.358] ../../src/include/utils/backend_status.h:365:12: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘uint64’ {aka ‘long long unsigned int’} [-Werror=format=] [20:26:26.358] 365 | errmsg("Backend %d deallocated %ld bytes, exceeding the %ld bytes it is currently reporting allocated. Setting reported to 0.", [20:26:26.358] | ^~~ [20:26:26.358] 366 | MyProcPid, allocated_bytes, *my_allocated_bytes)); [20:26:26.358] |~~~ [20:26:26.358] || [20:26:26.358] |uint64 {aka long long unsigned int} Due to windows having long be 32bit, you need to use %lld. Our custom to deal with that is to cast the argument to errmsg as long long unsigned and use %llu. Btw, given that the argument is uint64, it doesn't seem correct to use %ld, that's signed. Not that it's going to matter, but ... Greetings, Andres Freund
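The convention described above looks like this in practice (allocated_bytes being the uint64 from the patch):

    /*
     * Cast 64-bit values to (unsigned long long) and print with %llu in
     * translatable messages, because "long" is only 32 bits on Windows.
     */
    ereport(WARNING,
            (errmsg("backend %d deallocated %llu bytes",
                    MyProcPid,
                    (unsigned long long) allocated_bytes)));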
Re: PATCH: AM-specific statistics, with an example implementation for BRIN (WIP)
Hi, On 2022-10-18 13:33:59 +0200, Tomas Vondra wrote: > I mentioned I have some ideas how to improve this, and that I'll start a > separate thread to discuss this. So here we go ... This CF entry has been failing tests since it was submitted. Are you planning to work on this further? If not I think we should just close the CF entry. Greetings, Andres Freund
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 19:11, Brar Piening wrote:
> The current statistics for docbook elements with/without ids after
> applying the patch are the following:

Somehow my e-mail client destroyed the table. This is how it was supposed to look:

     name      | with_id | without_id | id_coverage | min_id_len | max_id_len
---------------+---------+------------+-------------+------------+------------
 sect2         |     870 |          0 |      100.00 |          7 |         57
 sect1         |     739 |          0 |      100.00 |          4 |         46
 refentry      |     307 |          0 |      100.00 |          8 |         37
 sect3         |     206 |          0 |      100.00 |         11 |         57
 chapter       |      77 |          0 |      100.00 |          5 |         24
 sect4         |      28 |          0 |      100.00 |         16 |         47
 biblioentry   |      23 |          0 |      100.00 |          6 |         15
 simplesect    |      20 |          0 |      100.00 |         24 |         39
 appendix      |      15 |          0 |      100.00 |          7 |         23
 part          |       8 |          0 |      100.00 |          5 |         20
 co            |       4 |          0 |      100.00 |         18 |         30
 figure        |       3 |          0 |      100.00 |         13 |         28
 reference     |       3 |          0 |      100.00 |         14 |         18
 anchor        |       1 |          0 |      100.00 |         21 |         21
 bibliography  |       1 |          0 |      100.00 |          8 |          8
 book          |       1 |          0 |      100.00 |         10 |         10
 index         |       1 |          0 |      100.00 |         11 |         11
 legalnotice   |       1 |          0 |      100.00 |         13 |         13
 preface       |       1 |          0 |      100.00 |          9 |          9
 glossentry    |     119 |         14 |       89.47 |         13 |         32
 table         |     285 |        162 |       63.76 |         12 |         56
 example       |      27 |         16 |       62.79 |         12 |         42
 refsect3      |       5 |          3 |       62.50 |         19 |         24
 refsect2      |      41 |         56 |       42.27 |         10 |         36
 varlistentry  |    1701 |       3172 |       34.91 |          9 |         64
 footnote      |       5 |         18 |       21.74 |         17 |         32
 step          |      28 |        130 |       17.72 |          7 |         28
 refsect1      |     151 |       1333 |       10.18 |         15 |         40
 informaltable |       1 |         15 |        6.25 |         25 |         25
 phrase        |       2 |         94 |        2.08 |         20 |         26
 indexterm     |       5 |       3262 |        0.15 |         17 |         26
 variablelist  |       1 |        813 |        0.12 |         21 |         21
 function      |       4 |       4011 |        0.10 |         12 |         28
 entry         |      11 |      17740 |        0.06 |         21 |         40
 para          |       3 |      25734 |        0.01 |         21 |         27
Re: Eliminating SPI from RI triggers - take 2
Hi, On 2022-10-15 14:47:05 +0900, Amit Langote wrote: > Attached updated patches. These started to fail to build recently: [04:43:33.046] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage -Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/executor_execPartition.c.o -MF src/backend/postgres_lib.a.p/executor_execPartition.c.o.d -o src/backend/postgres_lib.a.p/executor_execPartition.c.o -c ../src/backend/executor/execPartition.c [04:43:33.046] ../src/backend/executor/execPartition.c: In function ‘ExecGetLeafPartitionForKey’: [04:43:33.046] ../src/backend/executor/execPartition.c:1679:19: error: too few arguments to function ‘build_attrmap_by_name_if_req’ [04:43:33.046] 1679 |AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel), [04:43:33.046] | ^~~~ [04:43:33.046] In file included from ../src/include/access/tupconvert.h:17, [04:43:33.046] from ../src/include/nodes/execnodes.h:32, [04:43:33.046] from ../src/include/executor/execPartition.h:16, [04:43:33.046] from ../src/backend/executor/execPartition.c:21: [04:43:33.046] ../src/include/access/attmap.h:47:17: note: declared here [04:43:33.046]47 | extern AttrMap *build_attrmap_by_name_if_req(TupleDesc indesc, [04:43:33.046] | ^~~~ Regards, Andres
Re: RFC: Logging plan of the running query
Hi,

This patch does not currently build, due to a conflicting oid:
https://cirrus-ci.com/task/4638460594618368?logs=build#L108

[17:26:44.602] /usr/bin/perl ../src/include/catalog/../../backend/catalog/genbki.pl --include-path=../src/include --set-version=16 --output=src/include/catalog ../src/include/catalog/pg_proc.h ../src/include/catalog/pg_type.h ../src/include/catalog/pg_attribute.h ../src/include/catalog/pg_class.h ../src/include/catalog/pg_attrdef.h ../src/include/catalog/pg_constraint.h ../src/include/catalog/pg_inherits.h ../src/include/catalog/pg_index.h ../src/include/catalog/pg_operator.h ../src/include/catalog/pg_opfamily.h ../src/include/catalog/pg_opclass.h ../src/include/catalog/pg_am.h ../src/include/catalog/pg_amop.h ../src/include/catalog/pg_amproc.h ../src/include/catalog/pg_language.h ../src/include/catalog/pg_largeobject_metadata.h ../src/include/catalog/pg_largeobject.h ../src/include/catalog/pg_aggregate.h ../src/include/catalog/pg_statistic.h ../src/include/catalog/pg_statistic_ext.h ../src/include/catalog/pg_statistic_ext_data.h ../src/include/catalog/pg_rewrite.h ../src/include/catalog/pg_trigger.h ../src/include/catalog/pg_event_trigger.h ../src/include/catalog/pg_description.h ../src/include/catalog/pg_cast.h ../src/include/catalog/pg_enum.h ../src/include/catalog/pg_namespace.h ../src/include/catalog/pg_conversion.h ../src/include/catalog/pg_depend.h ../src/include/catalog/pg_database.h ../src/include/catalog/pg_db_role_setting.h ../src/include/catalog/pg_tablespace.h ../src/include/catalog/pg_authid.h ../src/include/catalog/pg_auth_members.h ../src/include/catalog/pg_shdepend.h ../src/include/catalog/pg_shdescription.h ../src/include/catalog/pg_ts_config.h ../src/include/catalog/pg_ts_config_map.h ../src/include/catalog/pg_ts_dict.h ../src/include/catalog/pg_ts_parser.h ../src/include/catalog/pg_ts_template.h ../src/include/catalog/pg_extension.h ../src/include/catalog/pg_foreign_data_wrapper.h ../src/include/catalog/pg_foreign_server.h ../src/include/catalog/pg_user_mapping.h ../src/include/catalog/pg_foreign_table.h ../src/include/catalog/pg_policy.h ../src/include/catalog/pg_replication_origin.h ../src/include/catalog/pg_default_acl.h ../src/include/catalog/pg_init_privs.h ../src/include/catalog/pg_seclabel.h ../src/include/catalog/pg_shseclabel.h ../src/include/catalog/pg_collation.h ../src/include/catalog/pg_parameter_acl.h ../src/include/catalog/pg_partitioned_table.h ../src/include/catalog/pg_range.h ../src/include/catalog/pg_transform.h ../src/include/catalog/pg_sequence.h ../src/include/catalog/pg_publication.h ../src/include/catalog/pg_publication_namespace.h ../src/include/catalog/pg_publication_rel.h ../src/include/catalog/pg_subscription.h ../src/include/catalog/pg_subscription_rel.h
[17:26:44.602] Duplicate OIDs detected:
[17:26:44.602] 4550
[17:26:44.602] found 1 duplicate OID(s) in catalog data

I suggest you choose a random oid out of the "development purposes" range:

 * OIDs 1-9999 are reserved for manual assignment (see .dat files in
 * src/include/catalog/).  Of these, 8000-9999 are reserved for
 * development purposes (such as in-progress patches and forks);
 * they should not appear in released versions.

Greetings,

Andres Freund
Re: New strategies for freezing, advancing relfrozenxid early
Hi, On 2022-11-23 15:06:52 -0800, Peter Geoghegan wrote: > Attached is v8. The docs don't build: https://cirrus-ci.com/task/5456939761532928 [20:00:58.203] postgres.sgml:52: element link: validity error : IDREF attribute linkend references an unknown ID "vacuum-for-wraparound" Greetings, Andres Freund
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
Hi, On 2022-09-19 01:29:21 +0300, Anton A. Melnikov wrote: > Corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch). This patch doesn't pass the main regression tests tests successfully: https://cirrus-ci.com/task/5502700019253248 diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/rules.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out --- /tmp/cirrus-ci-build/src/test/regress/expected/rules.out2022-12-06 05:49:53.687424000 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out 2022-12-06 05:53:04.64269 + @@ -1816,6 +1816,9 @@ FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset); pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, +pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed, +pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req, +pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done, pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, Greetings, Andres Freund
Re: Removing unneeded self joins
Hi, On 2022-10-05 17:25:18 +0500, Andrey Lepikhov wrote: > New version, rebased onto current master. > Nothing special, just rebase. This doesn't pass the main regression tests due to a plan difference. https://cirrus-ci.com/task/5537518245380096 https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 + @@ -5806,7 +5806,7 @@ Nested Loop Join Filter: (sj_t3.id = sj_t1.id) -> Nested Loop - Join Filter: (sj_t3.id = sj_t2.id) + Join Filter: (sj_t2.id = sj_t3.id) -> Nested Loop Semi Join -> Nested Loop -> HashAggregate Greetings, Andres Freund
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi, On 2022-11-11 17:16:36 +0100, Önder Kalacı wrote: > I rebased the changes to the current master branch, reflected pg_indent > suggestions and also made a few minor style changes. Needs another rebase, I think: https://cirrus-ci.com/task/559244463758 [05:44:22.102] FAILED: src/backend/postgres_lib.a.p/replication_logical_worker.c.o [05:44:22.102] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage -Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/replication_logical_worker.c.o -MF src/backend/postgres_lib.a.p/replication_logical_worker.c.o.d -o src/backend/postgres_lib.a.p/replication_logical_worker.c.o -c ../src/backend/replication/logical/worker.c [05:44:22.102] ../src/backend/replication/logical/worker.c: In function ‘get_usable_indexoid’: [05:44:22.102] ../src/backend/replication/logical/worker.c:2101:36: error: ‘ResultRelInfo’ has no member named ‘ri_RootToPartitionMap’ [05:44:22.102] 2101 | TupleConversionMap *map = relinfo->ri_RootToPartitionMap; [05:44:22.102] |^~ Greetings, Andres Freund
Re: Temporary tables versus wraparound... again
So I talked about this patch with Ronan Dunklau and he had a good question: why are we maintaining relfrozenxid and relminmxid in pg_class for temporary tables at all? Autovacuum can't use them and other sessions won't care about them. The only session that might care about them is the one attached to the temp schema.

So we could replace relfrozenxid and relminmxid for temp tables with a local hash table that can be updated on every truncate easily and efficiently. If a temp table actually wraps around, the only session that runs into problems is the one attached to that temp schema. It can throw local session errors and doesn't need to interfere with the rest of the cluster in any way. It could even start running vacuums, though I'm not convinced that's a great solution.

At least I think so. I'm pretty sure about relfrozenxid, but as always I don't really know how relminmxid works. I think we only need to worry about multixacts for subtransactions, all of which are in the same transaction -- does that even work that way?

But this is really attractive since it means no catalog updates just for temp tables on every transaction, and no wraparound cluster problems even if you have on-commit-preserve-rows tables. It really shouldn't be possible for a regular user to cause the whole cluster to run into problems just by creating a temp table and keeping a connection around for a while.
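A purely illustrative sketch of the session-local bookkeeping this would imply (every name here is hypothetical):

    /* one entry per temp table owned by this backend */
    typedef struct TempRelHorizon
    {
        Oid             relid;          /* hash key: the temp table */
        TransactionId   relfrozenxid;   /* oldest unfrozen xid it can contain */
        MultiXactId     relminmxid;     /* oldest multixact it can contain */
    } TempRelHorizon;

    /* backend-local; updated on table creation and on-commit truncation */
    static HTAB *temp_rel_horizons = NULL;

The minimum across this hash would still have to be advertised somewhere global (for example in PGPROC) so that clog and multixact truncation respect it -- which is where the discussion below picks up.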
Re: PATCH: Using BRIN indexes for sorted output
Hi, On 2022-11-17 00:52:35 +0100, Tomas Vondra wrote: > Well, yeah. That's pretty much exactly what the last version of this > patch (from October 23) does. That version unfortunately doesn't build successfully: https://cirrus-ci.com/task/5108789846736896 [03:02:48.641] Duplicate OIDs detected: [03:02:48.641] 9979 [03:02:48.641] 9980 [03:02:48.641] found 2 duplicate OID(s) in catalog data Greetings, Andres Freund
Re: 16: Collation versioning and dependency helpers
Hi, On 2022-10-31 16:36:54 -0700, Jeff Davis wrote: > Committed these two small changes. FWIW, as it stands cfbot can't apply the remaining changes: http://cfbot.cputube.org/patch_40_3977.log Perhaps worth posting a new version? Or are the remaining patches abandoned in favor of the other threads? Greetings, Andres Freund
Re: Temporary tables versus wraparound... again
Hi, On 2022-12-06 13:47:39 -0500, Greg Stark wrote: > So I talked about this patch with Ronan Dunklau and he had a good > question Why are we maintaining relfrozenxid and relminmxid in > pg_class for temporary tables at all? Autovacuum can't use them and > other sessions won't care about them. The only session that might care > about them is the one attached to the temp schema. Uh, without relfrozenxid for temp tables we can end up truncating clog "ranges" away that are required to access the temp tables. So this would basically mean that temp tables can't be used reliably anymore. Greetings, Andres Freund
Re: generic plans and "initial" pruning
I find the API of GetCachedPlan a little weird after this patch. I think it may be better to have it return a pointer to a new struct -- one that contains both the CachedPlan pointer and the list of pruning results. (As I understand it, the sole caller that isn't interested in the pruning results, SPI_plan_get_cached_plan, can be explained by the fact that it knows there won't be any, so I don't think we need to worry about this case?)

And I think you should also make that struct the last argument of PortalDefineQuery, so you don't need the separate PortalStorePartitionPruneResults function -- because as far as I can tell, the callers that pass a non-NULL pointer there are exactly the same ones that later call PortalStorePartitionPruneResults.

--
Álvaro Herrera         48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The first law of live demos is: don't try to use the system. Write a
script that touches nothing, so as to cause no damage." (Jakob Nielsen)
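The suggested struct might look something like this (name and comments hypothetical):

    typedef struct CachedPlanHolder
    {
        CachedPlan *cplan;          /* what GetCachedPlan() returns today */
        List       *prune_results;  /* one PartitionPruneResult per
                                     * PlannedStmt in the plan, or NIL if
                                     * no initial pruning was done */
    } CachedPlanHolder;

GetCachedPlan() would return this, and PortalDefineQuery() would take it as its last argument, making the separate PortalStorePartitionPruneResults() call unnecessary.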
Re: Time delayed LR (WAS Re: logical replication restrictions)
Hi, The tests fail on cfbot: https://cirrus-ci.com/task/4533866329800704 They only seem to fail on 32bit linux. https://api.cirrus-ci.com/v1/artifact/task/4533866329800704/testrun/build-32/testrun/subscription/032_apply_delay/log/regress_log_032_apply_delay [06:27:10.628](0.138s) ok 2 - check if the new rows were applied to subscriber timed out waiting for match: (?^:logical replication apply delay) at /tmp/cirrus-ci-build/src/test/subscription/t/032_apply_delay.pl line 124. Greetings, Andres Freund
Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE
On Tue, 6 Dec 2022 at 20:42, Melih Mutlu wrote: > > Hi Vignesh, > > Looks like the patch needs a rebase. Rebased > Also one little suggestion: > >> + if (ends_with(prev_wd, ')')) >> + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT", >> + "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"); > > > What do you think about gathering FUNCTION options as you did with ROUTINE > options. > Something like the following would seem nicer, I think. > >> #define Alter_function_options \ >> Alter_routine_options, "CALLED ON NULL INPUT", \ >> >> "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT" I did not make it as a macro for alter function options as it is used only in one place whereas the others were required in more than one place. The attached v4 patch is rebased on top of HEAD. Regards, Vignesh From c1436c498d1d939384999ecb508d5da3403abc90 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Wed, 7 Dec 2022 00:27:37 +0530 Subject: [PATCH v4] Tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE. Tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE action was missing, added the tab completion for the same. --- src/bin/psql/tab-complete.c | 57 ++--- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 89e7317c23..105517e61e 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -1149,6 +1149,16 @@ static const SchemaQuery Query_for_trigger_of_table = { "CREATE", "CONNECT", "TEMPORARY", "EXECUTE", "USAGE", "SET", "ALTER SYSTEM", \ "VACUUM", "ANALYZE", "ALL" +/* ALTER PROCEDURE options */ +#define Alter_procedure_options \ +"DEPENDS ON EXTENSION", "EXTERNAL SECURITY", "NO DEPENDS ON EXTENSION", \ +"OWNER TO", "RENAME TO", "RESET", "SECURITY", "SET" + +/* ALTER ROUTINE options */ +#define Alter_routine_options \ +Alter_procedure_options, "COST", "IMMUTABLE", "LEAKPROOF", "NOT LEAKPROOF", \ +"PARALLEL", "ROWS", "STABLE", "VOLATILE" + /* * These object types were introduced later than our support cutoff of * server version 9.2. We use the VersionedQuery infrastructure so that @@ -1812,12 +1822,51 @@ psql_completion(const char *text, int start, int end) else COMPLETE_WITH_FUNCTION_ARG(prev2_wd); } - /* ALTER FUNCTION,PROCEDURE,ROUTINE (...) */ - else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny)) + /* ALTER FUNCTION (...) */ + else if (Matches("ALTER", "FUNCTION", MatchAny, MatchAny)) + { + if (ends_with(prev_wd, ')')) + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT", + "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"); + else + COMPLETE_WITH_FUNCTION_ARG(prev2_wd); + } + /* ALTER FUNCTION|ROUTINE (...) PARALLEL */ + else if (Matches("ALTER", "FUNCTION|ROUTINE", MatchAny, MatchAny, + "PARALLEL")) + COMPLETE_WITH("RESTRICTED", "SAFE", "UNSAFE"); + /* + * ALTER FUNCTION|PROCEDURE|ROUTINE (...) + * [EXTERNAL] SECURITY + */ + else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "SECURITY") || + Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "EXTERNAL", "SECURITY")) + COMPLETE_WITH("DEFINER" ,"INVOKER"); + /* ALTER FUNCTION|PROCEDURE|ROUTINE (...) RESET */ + else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "RESET")) + COMPLETE_WITH_QUERY_VERBATIM_PLUS(Query_for_list_of_set_vars, + "ALL"); + /* ALTER FUNCTION|PROCEDURE|ROUTINE (...) 
SET */ + else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "SET")) + COMPLETE_WITH_QUERY_VERBATIM_PLUS(Query_for_list_of_set_vars, + "SCHEMA"); + /* ALTER PROCEDURE (...) */ + else if (Matches("ALTER", "PROCEDURE", MatchAny, MatchAny)) + { + if (ends_with(prev_wd, ')')) + COMPLETE_WITH(Alter_procedure_options); + else + COMPLETE_WITH_FUNCTION_ARG(prev2_wd); + } + /* ALTER ROUTINE (...) */ + else if (Matches("ALTER", "ROUTINE", MatchAny, MatchAny)) { if (ends_with(prev_wd, ')')) - COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA", - "DEPENDS ON EXTENSION", "NO DEPENDS ON EXTENSION"); + COMPLETE_WITH(Alter_routine_options); else COMPLETE_WITH_FUNCTION_ARG(prev2_wd); } -- 2.34.1
Re: wake up logical workers after ALTER SUBSCRIPTION
Thanks for reviewing! On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote: > Is it really necessary to wake logical workers up when renaming other than > subscription or publication? address.objectId will be a valid subid only > when renaming a subscription. Oops, that is a mistake. I only meant to wake up the workers for ALTER SUBSCRIPTION RENAME. I think I've fixed this in v6. > - When the state is SYNCDONE and the apply worker has to wake up to change > the state to READY. > > I think we already call logicalrep_worker_wakeup_ptr wherever it's needed > for the above cases? What am I missing here? IIUC we must restart all the apply workers for a subscription to enable two_phase mode. It looks like finish_sync_worker() only wakes up its own apply worker. I moved this logic to where the sync worker marks the state as SYNCDONE and added a check that two_phase mode is pending. Even so, there can still be unnecessary wakeups, but this adjustment should limit them. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 4be842c8a857a13b0bec2ebcbd9f8addf8b8562a Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Nov 2022 16:01:01 -0800 Subject: [PATCH v6 1/1] wake up logical workers as needed instead of relying on periodic wakeups --- src/backend/access/transam/xact.c | 3 ++ src/backend/commands/alter.c| 7 src/backend/commands/subscriptioncmds.c | 4 ++ src/backend/replication/logical/tablesync.c | 8 src/backend/replication/logical/worker.c| 46 + src/include/replication/logicalworker.h | 3 ++ 6 files changed, 71 insertions(+) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 8086b857b9..dc00e66cfb 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -47,6 +47,7 @@ #include "pgstat.h" #include "replication/logical.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/snapbuild.h" #include "replication/syncrep.h" @@ -2360,6 +2361,7 @@ CommitTransaction(void) AtEOXact_PgStat(true, is_parallel_worker); AtEOXact_Snapshot(true, false); AtEOXact_ApplyLauncher(true); + AtEOXact_LogicalRepWorkers(true); pgstat_report_xact_timestamp(0); CurrentResourceOwner = NULL; @@ -2860,6 +2862,7 @@ AbortTransaction(void) AtEOXact_HashTables(false); AtEOXact_PgStat(false, is_parallel_worker); AtEOXact_ApplyLauncher(false); + AtEOXact_LogicalRepWorkers(false); pgstat_report_xact_timestamp(0); } diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 10b6fe19a2..d095cd3ced 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -59,6 +59,7 @@ #include "commands/user.h" #include "miscadmin.h" #include "parser/parse_func.h" +#include "replication/logicalworker.h" #include "rewrite/rewriteDefine.h" #include "tcop/utility.h" #include "utils/builtins.h" @@ -279,6 +280,12 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name) if (strncmp(new_name, "regress_", 8) != 0) elog(WARNING, "subscriptions created by regression test cases should have names starting with \"regress_\""); #endif + + /* + * Wake up the logical replication workers to handle this change + * quickly. 
+ */ + LogicalRepWorkersWakeupAtCommit(objectId); } else if (nameCacheId >= 0) { diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d673557ea4..d6993c26e5 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -34,6 +34,7 @@ #include "nodes/makefuncs.h" #include "pgstat.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/slot.h" #include "replication/walreceiver.h" @@ -1362,6 +1363,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0); + /* Wake up the logical replication workers to handle this change quickly. */ + LogicalRepWorkersWakeupAtCommit(subid); + return myself; } diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 94e813ac53..ca556f7145 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -105,6 +105,7 @@ #include "pgstat.h" #include "replication/logicallauncher.h" #include "replication/logicalrelation.h" +#include "replication/logicalworker.h" #include "replication/walreceiver.h" #include "replication/worker_internal.h" #include "replication/slot.h" @@ -334,6 +335,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) */ ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false); + /* + * We might be ready to enable two_phase mode. Wa
Re: Cygwin cleanup
Hi, On 2022-11-08 19:04:37 -0600, Justin Pryzby wrote: > From 2741472080eceac5cb6d002c39eaf319d7f72b50 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby > Date: Fri, 30 Sep 2022 13:39:43 -0500 > Subject: [PATCH 1/3] meson: other fixes for cygwin > > XXX: what about HAVE_BUGGY_STRTOF ? What about it? As noted in another thread, HAVE_BUGGY_STRTOF is defined in a header, and shouldn't be affected by the buildsystem. Pushed this commit. > XXX This should use a canned Docker image with all the right packages > installed? But if the larger image is slower to start, then maybe not... I think once we convert the windows containers to windows VMs we can just install both cygwin and mingw in the same image. The overhead of installing too much seems far far smaller there. > +CONFIGURE_FLAGS: --enable-cassert --enable-debug --with-ldap > --with-ssl=openssl --with-libxml > +# --enable-tap-tests I assume this is disabled as tap tests fail? > +C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y" I'd copy the approach used for mingw of putting most of this in an environment variable. > +findargs='' > case $os in > freebsd|linux|macos) > -;; > +;; > + > +cygwin) > +# XXX Evidently I don't know how to write two arguments here without > pathname expansion later, other than eval. > +#findargs='-name "*.stackdump"' > +for stack in $(find "$directory" -type f -name "*.stackdump") ; do > +binary=`basename "$stack" .stackdump` > +echo;echo; > +echo "dumping ${stack} for ${binary}" > +awk '/^0/{print $2}' $stack |addr2line -f -i -e > ./src/backend/postgres.exe > +#awk '/^0/{print $2}' $stack |addr2line -f -i -e > "./src/backend/$binary.exe" > +done > +exit 0 > +;; Is this stuff actually needed? Could we use the infrastructure we use for backtraces with msvc instead? Or use something that understands .stackdump files? > +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm > [...] > +++ b/src/test/perl/PostgreSQL/Test/Utils.pm > [...] > +++ b/src/test/recovery/t/020_archive_status.pl > [...] I think these should be in a separate commit, they're not actually about CI. Greetings, Andres Freund
add \dpS to psql
Hi hackers, As discussed elsewhere [0], \dp doesn't show privileges on system objects, and this behavior is not mentioned in the docs. I've attached a small patch that adds support for the S modifier (i.e., \dpS) and the adjusts the docs. Thoughts? [0] https://postgr.es/m/a2382acd-e465-85b2-9d8e-f9ed1a5a66e9%40postgrespro.ru -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index d3dd638b14..406936dd1c 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1825,14 +1825,16 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g -\dp [ pattern ] +\dp[S] [ pattern ] Lists tables, views and sequences with their associated access privileges. If pattern is specified, only tables, views and sequences whose names match the -pattern are listed. +pattern are listed. By default only user-created objects are shown; +supply a pattern or the S modifier to include system +objects. diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index de6a3a71f8..3520655dc0 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -875,7 +875,7 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) success = listCollations(pattern, show_verbose, show_system); break; case 'p': -success = permissionsList(pattern); +success = permissionsList(pattern, show_system); break; case 'P': { @@ -2831,7 +2831,7 @@ exec_command_z(PsqlScanState scan_state, bool active_branch) char *pattern = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, true); - success = permissionsList(pattern); + success = permissionsList(pattern, false); free(pattern); } else diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 2eae519b1d..eb98797d67 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -1002,7 +1002,7 @@ listAllDbs(const char *pattern, bool verbose) * \z (now also \dp -- perhaps more mnemonic) */ bool -permissionsList(const char *pattern) +permissionsList(const char *pattern, bool showSystem) { PQExpBufferData buf; PGresult *res; @@ -1121,15 +1121,12 @@ permissionsList(const char *pattern) CppAsString2(RELKIND_FOREIGN_TABLE) "," CppAsString2(RELKIND_PARTITIONED_TABLE) ")\n"); - /* - * Unless a schema pattern is specified, we suppress system and temp - * tables, since they normally aren't very interesting from a permissions - * point of view. You can see 'em by explicit request though, eg with \z - * pg_catalog.* - */ + if (!showSystem && !pattern) + appendPQExpBufferStr(&buf, "AND n.nspname !~ '^pg_'\n"); + if (!validateSQLNamePattern(&buf, pattern, true, false, "n.nspname", "c.relname", NULL, -"n.nspname !~ '^pg_' AND pg_catalog.pg_table_is_visible(c.oid)", +"pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) goto error_return; diff --git a/src/bin/psql/describe.h b/src/bin/psql/describe.h index bd051e09cb..58d0cf032b 100644 --- a/src/bin/psql/describe.h +++ b/src/bin/psql/describe.h @@ -38,7 +38,7 @@ extern bool describeRoles(const char *pattern, bool verbose, bool showSystem); extern bool listDbRoleSettings(const char *pattern, const char *pattern2); /* \z (or \dp) */ -extern bool permissionsList(const char *pattern); +extern bool permissionsList(const char *pattern, bool showSystem); /* \ddp */ extern bool listDefaultACLs(const char *pattern);
Re: 16: Collation versioning and dependency helpers
On Tue, 2022-12-06 at 10:53 -0800, Andres Freund wrote: > Perhaps worth posting a new version? Or are the remaining patches > abandoned in > favor of the other threads? Marked what is there as committed, and the remainder is abandoned in favor of other threads. Thanks, Jeff Davis
Re: predefined role(s) for VACUUM and ANALYZE
On Tue, Dec 06, 2022 at 04:57:37PM +0300, Pavel Luzanov wrote: > On 06.12.2022 03:04, Nathan Bossart wrote: >> I wonder why \dpS wasn't added. I wrote up a patch to add it and the >> corresponding documentation that other meta-commands already have. > > Yes, \dpS command and clarification in the documentation is exactly what is > needed. I created a new thread for this: https://postgr.es/m/20221206193606.GB3078082%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: predefined role(s) for VACUUM and ANALYZE
On Tue, Dec 06, 2022 at 11:47:50AM +, Dagfinn Ilmari Mannsåker wrote: > These checks are getting rather repetitive, how about a data-driven > approach, along the lines of the below patch? I'm not quite happy with > the naming of the struct and its members (and maybe it should be in a > header?), suggestions welcome. +1. I wonder if we should also consider checking all the bits at once before we start checking for the predefined roles. I'm thinking of something a bit like this: role_mask = ACL_SELECT | ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_VACUUM | ACL_ANALYZE; if ((mask & role_mask) != (result & role_mask)) { ... existing checks here ... } I'm skeptical this actually produces any measurable benefit, but presumably the predefined roles list will continue to grow, so maybe it's still worth adding a fast path. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Temporary tables versus wraparound... again
On Tue, 6 Dec 2022 at 13:59, Andres Freund wrote: > > Hi, > > On 2022-12-06 13:47:39 -0500, Greg Stark wrote: > > So I talked about this patch with Ronan Dunklau and he had a good > > question Why are we maintaining relfrozenxid and relminmxid in > > pg_class for temporary tables at all? Autovacuum can't use them and > > other sessions won't care about them. The only session that might care > > about them is the one attached to the temp schema. > > Uh, without relfrozenxid for temp tables we can end up truncating clog > "ranges" away that are required to access the temp tables. So this would > basically mean that temp tables can't be used reliably anymore. True, we would have to have some other mechanism for exporting the frozenxid that the session needs. Presumably that would be something in PGPROC, like the xmin and other numbers. It could be updated by scanning our local hash table whenever a transaction starts. This is probably also what would be needed for multixacts, I guess? -- greg
Re: Temporary tables versus wraparound... again
Hi, On 2022-12-06 14:50:34 -0500, Greg Stark wrote: > On Tue, 6 Dec 2022 at 13:59, Andres Freund wrote: > > On 2022-12-06 13:47:39 -0500, Greg Stark wrote: > > > So I talked about this patch with Ronan Dunklau and he had a good > > > question Why are we maintaining relfrozenxid and relminmxid in > > > pg_class for temporary tables at all? Autovacuum can't use them and > > > other sessions won't care about them. The only session that might care > > > about them is the one attached to the temp schema. > > > > Uh, without relfrozenxid for temp tables we can end up truncating clog > > "ranges" away that are required to access the temp tables. So this would > > basically mean that temp tables can't be used reliably anymore. > > True, we would have to have some other mechanism for exporting the > frozenxid that the session needs. Presumably that would be something > in PGProc like the xmin and other numbers. It could be updated by > scanning our local hash table whenever a transaction starts. That'd be a fair bit of new mechanism. Not at all impossible, but I'm doubtful the complexity is worth it. In your patch the relevant catalog change is an inplace change and thus doesn't cause bloat. And if we have to retain the clog, I don't see that much benefit in the proposed approach. > This also probably is what would be needed for multixacts I guess? Yes. Greetings, Andres Freund
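As a sketch of the mechanism under discussion (the PGPROC field and the function name here are hypothetical, not part of any posted patch):

    /* Publish the oldest relfrozenxid among this backend's temp tables,
     * recomputed at transaction start by scanning the backend-local hash
     * of temp relations.  Publication rules would have to mirror those
     * for xmin so that clog truncation always sees a consistent value. */
    static void
    PublishTempRelFrozenXid(TransactionId oldest_temp_frozenxid)
    {
        MyProc->tempRelFrozenXid = oldest_temp_frozenxid; /* hypothetical field */
    }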
Re: Error-safe user functions
OK, here's a v3 responding to the comments from Andres. is preliminary refactoring of elog.c, with (I trust) no functional effect. It gets rid of some pre-existing code duplication as well as setting up to let 0001's additions be less duplicative. 0001 adopts use of Node pointers in place of "void *". To do this I needed an alias type in elog.h equivalent to fmgr.h's fmNodePtr. I decided that having two different aliases would be too confusing, so what I did here was to converge both elog.h and fmgr.h on using the same alias "typedef struct Node *NodePtr". That has to be in elog.h since it's included first, from postgres.h. (I thought of defining NodePtr in postgres.h, but postgres.h includes elog.h immediately so that wouldn't have looked very nice.) I also adopted Andres' recommendation that InputFunctionCallSafe return boolean. I'm still not totally sold on that ... but it does end with array_in and record_in never using SAFE_ERROR_OCCURRED at all, so maybe the idea's OK. 0002 adjusts the I/O functions for these API changes, and fixes my silly oversight about error cleanup in record_in. Given the discussion about testing requirements, I threw away the COPY hack entirely. This 0003 provides a couple of SQL-callable functions that can be used to invoke a specific datatype's input function. I haven't documented them, pending bikeshedding on names etc. I also arranged to test array_in and record_in with a datatype that still throws errors, reserving the existing test type "widget" for that purpose. (I'm not intending to foreclose development of new COPY features in this area, just abandoning the idea that that's our initial test mechanism.) Thoughts? regards, tom lane diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 2585e24845..f5cd1b7493 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -176,8 +176,15 @@ static char formatted_log_time[FORMATTED_TS_LEN]; static const char *err_gettext(const char *str) pg_attribute_format_arg(1); +static ErrorData *get_error_stack_entry(void); +static void set_stack_entry_domain(ErrorData *edata, const char *domain); +static void set_stack_entry_location(ErrorData *edata, + const char *filename, int lineno, + const char *funcname); +static bool matches_backtrace_functions(const char *funcname); static pg_noinline void set_backtrace(ErrorData *edata, int num_skip); static void set_errdata_field(MemoryContextData *cxt, char **ptr, const char *str); +static void FreeErrorDataContents(ErrorData *edata); static void write_console(const char *line, int len); static const char *process_log_prefix_padding(const char *p, int *ppadding); static void log_line_prefix(StringInfo buf, ErrorData *edata); @@ -434,27 +441,13 @@ errstart(int elevel, const char *domain) debug_query_string = NULL; } } - if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE) - { - /* - * Wups, stack not big enough. We treat this as a PANIC condition - * because it suggests an infinite loop of errors during error - * recovery. - */ - errordata_stack_depth = -1; /* make room on stack */ - ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded"))); - } /* Initialize data for this error frame */ - edata = &errordata[errordata_stack_depth]; - MemSet(edata, 0, sizeof(ErrorData)); + edata = get_error_stack_entry(); edata->elevel = elevel; edata->output_to_server = output_to_server; edata->output_to_client = output_to_client; - /* the default text domain is the backend's */ - edata->domain = domain ? 
domain : PG_TEXTDOMAIN("postgres"); - /* initialize context_domain the same way (see set_errcontext_domain()) */ - edata->context_domain = edata->domain; + set_stack_entry_domain(edata, domain); /* Select default errcode based on elevel */ if (elevel >= ERROR) edata->sqlerrcode = ERRCODE_INTERNAL_ERROR; @@ -462,8 +455,6 @@ errstart(int elevel, const char *domain) edata->sqlerrcode = ERRCODE_WARNING; else edata->sqlerrcode = ERRCODE_SUCCESSFUL_COMPLETION; - /* errno is saved here so that error parameter eval can't change it */ - edata->saved_errno = errno; /* * Any allocations for this error state level should go into ErrorContext @@ -474,32 +465,6 @@ errstart(int elevel, const char *domain) return true; } -/* - * Checks whether the given funcname matches backtrace_functions; see - * check_backtrace_functions. - */ -static bool -matches_backtrace_functions(const char *funcname) -{ - char *p; - - if (!backtrace_symbol_list || funcname == NULL || funcname[0] == '\0') - return false; - - p = backtrace_symbol_list; - for (;;) - { - if (*p == '\0') /* end of backtrace_symbol_list */ - break; - - if (strcmp(funcname, p) == 0) - return true; - p += strlen(p) + 1; - } - - return false; -} - /* * errfinish --- end an error-reporting cycle * @@ -520,23 +485,7 @@ errfinish(const char *filename, int lineno, const cha
Re: Error-safe user functions
Robert Haas writes: > I feel like this can go either way. If we pick a name that conveys a > specific intended behavior now, and then later we want to pass some > other sort of node for some purpose other than ignoring errors, it's > unpleasant to have a name that sounds like it can only ignore errors. > But if we never use it for anything other than ignoring errors, a > specific name is clearer. With Andres' proposal to make the function return boolean succeed/fail, I think it's pretty clear that the only useful case is to pass an ErrorSaveContext. There may well be future APIs that pass some other kind of context object to input functions, but they'll presumably have different goals and want a different sort of wrapper function. regards, tom lane
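For reference, a minimal sketch of a soft-error input call using the API shape discussed in this thread; the exact signature and the ErrorSaveContext initializer are assumptions until the patches land, and flinfo/str/typioparam/typmod are presumed to be in scope:

    ErrorSaveContext escontext = {T_ErrorSaveContext};
    Datum       result;

    if (!InputFunctionCallSafe(&flinfo, str, typioparam, typmod,
                               (Node *) &escontext, &result))
    {
        /* soft failure: nothing was thrown; the details were captured in
         * escontext instead, and result must not be used */
    }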
Re: Cygwin cleanup
On Fri, Jul 29, 2022 at 10:57 AM Thomas Munro wrote: > I wonder if these problems would go away as a nice incidental > side-effect if we used latches for postmaster wakeups. I don't > know... maybe, if the problem is just with the postmaster's pattern of > blocking/unblocking? Maybe backend startup is simple enough that it > doesn't hit the bug? From a quick glance, I think the assertion > failures that occur in regular backends can possibly be blamed on the > postmaster getting confused about its children due to unexpected > handler re-entry. Just to connect the dots, that's what this patch does: https://www.postgresql.org/message-id/flat/CA+hUKG+Z-HpOj1JsO9eWUP+ar7npSVinsC_npxSy+jdOMsx=g...@mail.gmail.com (There may be other places that break under Cygwin's flaky sa_mask implementation, I don't know and haven't seen any clues about that.)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi hackers! Here's an update on the subject: I've made a branch based on master from 28/11 and introduced a 64-bit TOAST value ID there. It is not complete yet but is already working and has some features: - an extended structure for the TOAST pointer (varatt_long_external) with a 64-bit TOAST value ID; - individual ID counters for each TOAST table, cached for faster access, with a lookup of the maximum TOAST value ID in use after a server restart. https://github.com/postgrespro/postgres/tree/toast_64bit -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
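For readers following along, a rough sketch of what such an extended TOAST pointer could look like, modeled on the existing varatt_external; the actual layout in the branch may differ:

    /* assumed layout: on-disk TOAST pointer with a 64-bit value ID */
    typedef struct varatt_long_external
    {
        int32       va_rawsize;     /* original data size (includes header) */
        uint32      va_extinfo;     /* external size and compression method */
        uint64      va_valueid;     /* 64-bit unique ID within the TOAST table */
        Oid         va_toastrelid;  /* relation OID of the TOAST table */
    } varatt_long_external;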
Re: wake up logical workers after ALTER SUBSCRIPTION
On Tue, Dec 06, 2022 at 11:25:51AM -0800, Nathan Bossart wrote: > On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote: >> - When the state is SYNCDONE and the apply worker has to wake up to change >> the state to READY. >> >> I think we already call logicalrep_worker_wakeup_ptr wherever it's needed >> for the above cases? What am I missing here? > > IIUC we must restart all the apply workers for a subscription to enable > two_phase mode. It looks like finish_sync_worker() only wakes up its own > apply worker. I moved this logic to where the sync worker marks the state > as SYNCDONE and added a check that two_phase mode is pending. Even so, > there can still be unnecessary wakeups, but this adjustment should limit > them. Actually, that's not quite right. The sync worker will wake up the apply worker to change the state from SYNCDONE to READY. AllTablesyncsReady() checks that all tables are READY, so we need to wake up all the workers when an apply worker changes the state to READY. Each worker will then evaluate whether to restart for two_phase mode. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 1ca15608be05229b3349ba5840abceebb1497fe1 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Nov 2022 16:01:01 -0800 Subject: [PATCH v7 1/1] wake up logical workers as needed instead of relying on periodic wakeups --- src/backend/access/transam/xact.c | 3 ++ src/backend/commands/alter.c| 7 src/backend/commands/subscriptioncmds.c | 4 ++ src/backend/replication/logical/tablesync.c | 8 src/backend/replication/logical/worker.c| 46 + src/include/replication/logicalworker.h | 3 ++ 6 files changed, 71 insertions(+) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 8086b857b9..dc00e66cfb 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -47,6 +47,7 @@ #include "pgstat.h" #include "replication/logical.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/snapbuild.h" #include "replication/syncrep.h" @@ -2360,6 +2361,7 @@ CommitTransaction(void) AtEOXact_PgStat(true, is_parallel_worker); AtEOXact_Snapshot(true, false); AtEOXact_ApplyLauncher(true); + AtEOXact_LogicalRepWorkers(true); pgstat_report_xact_timestamp(0); CurrentResourceOwner = NULL; @@ -2860,6 +2862,7 @@ AbortTransaction(void) AtEOXact_HashTables(false); AtEOXact_PgStat(false, is_parallel_worker); AtEOXact_ApplyLauncher(false); + AtEOXact_LogicalRepWorkers(false); pgstat_report_xact_timestamp(0); } diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 10b6fe19a2..d095cd3ced 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -59,6 +59,7 @@ #include "commands/user.h" #include "miscadmin.h" #include "parser/parse_func.h" +#include "replication/logicalworker.h" #include "rewrite/rewriteDefine.h" #include "tcop/utility.h" #include "utils/builtins.h" @@ -279,6 +280,12 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name) if (strncmp(new_name, "regress_", 8) != 0) elog(WARNING, "subscriptions created by regression test cases should have names starting with \"regress_\""); #endif + + /* + * Wake up the logical replication workers to handle this change + * quickly. 
+ */ + LogicalRepWorkersWakeupAtCommit(objectId); } else if (nameCacheId >= 0) { diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d673557ea4..d6993c26e5 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -34,6 +34,7 @@ #include "nodes/makefuncs.h" #include "pgstat.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/slot.h" #include "replication/walreceiver.h" @@ -1362,6 +1363,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0); + /* Wake up the logical replication workers to handle this change quickly. */ + LogicalRepWorkersWakeupAtCommit(subid); + return myself; } diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 94e813ac53..e253db371a 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -105,6 +105,7 @@ #include "pgstat.h" #include "replication/logicallauncher.h" #include "replication/logicalrelation.h" +#include "replication/logicalworker.h" #include "replication/walreceiver.h" #include "replication/worker_internal.h" #include "replication/slot.h" @@ -517,6 +518,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn) UpdateSubscriptionRelState(MyLogicalRepWorker->su
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, Dec 6, 2022 at 10:42 AM Andres Freund wrote: > The docs don't build: > https://cirrus-ci.com/task/5456939761532928 > [20:00:58.203] postgres.sgml:52: element link: validity error : IDREF > attribute linkend references an unknown ID "vacuum-for-wraparound" Thanks for pointing this out. FWIW it is a result of Bruce's recent addition of the transaction processing chapter to the docs. My intention is to post v9 later in the week, which will fix the doc build, and a lot more besides that. If you are planning on doing another round of review, I'd suggest that you hold off until then. v9 will have structural improvements that will likely make it easier to understand all the steps leading up to removing aggressive mode completely. It'll be easier to relate each local step/patch to the bigger picture for VACUUM. v9 will also address some of the concerns you raised in your review that weren't covered by v8, especially about the VM snapshotting infrastructure. But also your concerns about the transition from lazy strategies to eager strategies. The "catch up freezing" performed by the first VACUUM operation run against a table that just exceeded the GUC-controlled table size threshold will have far more limited impact, because the burden of freezing will be spread out across multiple VACUUM operations. The big idea behind the patch series is to relieve users from having to think about a special type of VACUUM that has to do much more freezing than other VACUUMs that ran against the same table in the recent past, of course, so it is important to avoid accidentally allowing any behavior that looks kind of like the ghost of aggressive VACUUM. -- Peter Geoghegan
core dumps generated in archive / restore commands etc
Hi, Occasionally I see core dumps for sh, cp etc when running the tests. I think this is mainly due to immediate shutdowns / crashes signalling the entire process group with SIGQUIT. If a sh/cp/... is running as part of an archive/restore command when the signal arrives, we'll trigger a coredump, because those tools won't have a SIGQUIT handler. ISTM that postmaster's signal_child() shouldn't send SIGQUIT to the process group in the #ifdef HAVE_SETSID section. We've already signalled the backend with SIGQUIT, so we could change the signal we send to the whole process group to one that doesn't trigger core dumps by default. SIGTERM seems like it would be the right choice. The one non-trivial aspect of this is that that signal will also be delivered to the group leader. It's possible that that could lead to some minor test behaviour issues, because the output could change if e.g. SIGTERM is received / processed first. Greetings, Andres Freund
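Concretely, the proposal amounts to something like this in the postmaster's signal_child() (a sketch; today the same signal is used for both kills):

    kill(pid, SIGQUIT);             /* the backend itself still gets SIGQUIT */
    #ifdef HAVE_SETSID
        /* Avoid core dumps from archive_command/restore_command shells:
         * signal the rest of the process group with SIGTERM instead.
         * A negative pid addresses the whole process group. */
        kill(-pid, SIGTERM);
    #endif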
Re: Allow placeholders in ALTER ROLE w/o superuser
Hi, Alexander! On Tue, 6 Dec 2022 at 19:01, Alexander Korotkov wrote: > > On Mon, Dec 5, 2022 at 10:32 PM Alexander Korotkov > wrote: > > On Mon, Dec 5, 2022 at 8:18 PM Tom Lane wrote: > > > Alvaro Herrera writes: > > > > I couldn't find any discussion of the idea of adding "(s)" to the > > > > variable name in order to mark the variable userset in the catalog, and > > > > I have to admit I find it a bit strange. Are we really agreed that > > > > that's the way to proceed? > > > > > > I hadn't been paying close attention to this thread, sorry. > > > > > > I agree that that seems like a very regrettable choice, > > > especially if you anticipate having to bump catversion anyway. > > > > I totally understand that this change requires a catversion bump. > > I've reflected this in the commit message. > > > > > Better to add a bool column to the catalog. > > > > What about adding a boolean array to the pg_db_role_setting? So, > > pg_db_role_setting would have the following columns. > > * setdatabase oid > > * setrole oid > > * setconfig text[] > > * setuser bool[] > > The revised patch implements this way of storing the USER SET flag. > I think it really became more structured and less cumbersome. I agree that the patch became more structured and the complications for string parameter suffixing have gone away. I've looked through it and don't see problems with it. The only change that may be relevant is a two-line fix regarding variable initialization (see v9). Tests pass and CI is also happy with it. I'd like to set it ready for committer if there are no objections. Regards, Pavel Borisov, Supabase. v9-0001-Add-USER-SET-parameter-values-for-pg_db_role_sett.patch Description: Binary data
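To illustrate the proposed catalog shape, setuser becomes a bool array parallel to setconfig (sample contents are hypothetical):

    SELECT setrole::regrole, setconfig, setuser
    FROM pg_db_role_setting;

     setrole |            setconfig             | setuser
    ---------+----------------------------------+---------
     alice   | {work_mem=64MB,myext.trace=on}   | {f,t}

Here the "t" marks the entry that was recorded with the USER SET flag.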
Re: Using WaitEventSet in the postmaster
On 2022-12-07 00:58:06 +1300, Thomas Munro wrote: > One way to fix that for the epoll version is to EPOLL_CTL_DEL and > EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. > Tried like that in this version. Another approach would be to > (finally) write DeleteWaitEvent() to do the same thing at a higher > level... seems overkill for this. What about just recreating the WES during crash restart? > > This seems to hardcode the specific wait events we're waiting for based on > > latch.c infrastructure. Not really convinced that's a good idea. > > What are you objecting to? The assumption that the first socket is at > position 1? The use of GetNumRegisteredWaitEvents()? The latter.
Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row
Hi, On 2022-11-25 02:45:01 +0500, Hamid Akhtar wrote: > Rebasing the patch to the tip of the master branch. This doesn't pass tests on cfbot. Looks like possibly some files are missing? https://api.cirrus-ci.com/v1/artifact/task/4916614353649664/testrun/build/testrun/pageinspect/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/contrib/pageinspect/expected/page.out /tmp/cirrus-ci-build/build/testrun/pageinspect/regress/results/page.out --- /tmp/cirrus-ci-build/contrib/pageinspect/expected/page.out 2022-12-06 20:07:47.691479000 + +++ /tmp/cirrus-ci-build/build/testrun/pageinspect/regress/results/page.out 2022-12-06 20:11:42.955606000 + @@ -1,4 +1,5 @@ CREATE EXTENSION pageinspect; +ERROR: extension "pageinspect" has no installation script nor update path for version "1.12" -- Use a temp table so that effects of VACUUM are predictable CREATE TEMP TABLE test1 (a int, b int); INSERT INTO test1 VALUES (16777217, 131584); @@ -6,236 +7,203 @@ Greetings, Andres Freund
remove extra space from dumped ALTER DEFAULT PRIVILEGES commands
Hi hackers, pg_dump adds an extra space in ALTER DEFAULT PRIVILEGES commands. AFAICT it's been this way since the command was first introduced (249724c). I've attached a patch to fix it. This is admittedly just a pet peeve, but maybe it is bothering others, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c index 9311417f18..694a28f532 100644 --- a/src/bin/pg_dump/dumputils.c +++ b/src/bin/pg_dump/dumputils.c @@ -184,7 +184,9 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, prefix, privs->data, type); if (nspname && *nspname) appendPQExpBuffer(firstsql, "%s.", fmtId(nspname)); - appendPQExpBuffer(firstsql, "%s FROM ", name); + if (name && *name) +appendPQExpBuffer(firstsql, "%s ", name); + appendPQExpBufferStr(firstsql, "FROM "); if (grantee->len == 0) appendPQExpBufferStr(firstsql, "PUBLIC;\n"); else @@ -253,7 +255,9 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, prefix, privs->data, type); if (nspname && *nspname) appendPQExpBuffer(thissql, "%s.", fmtId(nspname)); - appendPQExpBuffer(thissql, "%s TO ", name); + if (name && *name) + appendPQExpBuffer(thissql, "%s ", name); + appendPQExpBufferStr(thissql, "TO "); if (grantee->len == 0) appendPQExpBufferStr(thissql, "PUBLIC;\n"); else @@ -265,7 +269,9 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, prefix, privswgo->data, type); if (nspname && *nspname) appendPQExpBuffer(thissql, "%s.", fmtId(nspname)); - appendPQExpBuffer(thissql, "%s TO ", name); + if (name && *name) + appendPQExpBuffer(thissql, "%s ", name); + appendPQExpBufferStr(thissql, "TO "); if (grantee->len == 0) appendPQExpBufferStr(thissql, "PUBLIC"); else diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl index 6656222363..4732ee2e4a 100644 --- a/src/bin/pg_dump/t/002_pg_dump.pl +++ b/src/bin/pg_dump/t/002_pg_dump.pl @@ -563,7 +563,7 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role IN SCHEMA dump_test \E - \QGRANT SELECT ON TABLES TO regress_dump_test_role;\E + \QGRANT SELECT ON TABLES TO regress_dump_test_role;\E /xm, like => { %full_runs, %dump_test_schema_runs, section_post_data => 1, }, @@ -582,7 +582,7 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role IN SCHEMA dump_test \E - \QGRANT ALL ON FUNCTIONS TO regress_dump_test_role;\E + \QGRANT ALL ON FUNCTIONS TO regress_dump_test_role;\E /xm, like => { %full_runs, %dump_test_schema_runs, section_post_data => 1, }, @@ -600,7 +600,7 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role \E - \QREVOKE ALL ON FUNCTIONS FROM PUBLIC;\E + \QREVOKE ALL ON FUNCTIONS FROM PUBLIC;\E /xm, like => { %full_runs, section_post_data => 1, }, unlike => { no_privs => 1, }, @@ -615,10 +615,10 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role \E - \QREVOKE ALL ON TABLES FROM regress_dump_test_role;\E\n + \QREVOKE ALL ON TABLES FROM regress_dump_test_role;\E\n \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role \E - \QGRANT INSERT,REFERENCES,DELETE,TRIGGER,TRUNCATE,VACUUM,ANALYZE,UPDATE ON TABLES TO regress_dump_test_role;\E + \QGRANT INSERT,REFERENCES,DELETE,TRIGGER,TRUNCATE,VACUUM,ANALYZE,UPDATE ON TABLES TO regress_dump_test_role;\E /xm, like => { %full_runs, section_post_data => 1, }, unlike => { no_privs => 1, },
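To make the fix visible, here is the shape of the dumped command before and after; the doubled space comes from appending "%s FROM " (or "%s TO ") with an empty object name, which is always the case for default privileges:

    -- before (note the two spaces before FROM):
    ALTER DEFAULT PRIVILEGES FOR ROLE regress_dump_test_role REVOKE ALL ON FUNCTIONS  FROM PUBLIC;
    -- after:
    ALTER DEFAULT PRIVILEGES FOR ROLE regress_dump_test_role REVOKE ALL ON FUNCTIONS FROM PUBLIC;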
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 9:29 PM Amit Kapila wrote: > > On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com > wrote: > > > > Hi hackers, > > > > In logical decoding, when logical_decoding_work_mem is exceeded, the > > changes are > > sent to output plugin in streaming mode. But there is a restriction that the > > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC to > > allow sending every change to output plugin without waiting until > > logical_decoding_work_mem is exceeded. > > > > This helps to test streaming mode. For example, to test "Avoid streaming the > > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS > > messages. With the new option, it can be tested with fewer changes and in > > less > > time. Also, this new option helps to test more scenarios for "Perform > > streaming > > logical transactions by background workers" [2]. > > +1 > > Yeah, I think this can also help in reducing the time for various > tests in test_decoding/stream and > src/test/subscription/t/*_stream_*.pl file by reducing the number of > changes required to invoke streaming mode. Can we think of making this > GUC extendible to even test more options on server-side (publisher) > and client-side (subscriber) cases? For example, we can have something > like logical_replication_mode with the following valid values: (a) > server_serialize: this will serialize each change to file on > publishers and then on commit restore and send all changes; (b) > server_stream: this will stream each change as currently proposed in > your patch. Then if we want to extend it for subscriber-side testing > then we can introduce new options like client_serialize for the case > being discussed in the email [1]. > > Thoughts? There is potential for lots of developer GUCs for testing/debugging in the area of logical replication but IMO it might be better to keep them all separated. Putting everything into a single 'logical_replication_mode' might cause difficulties later when/if you want combinations of the different modes. For example, instead of logical_replication_mode = XXX/YYY/ZZZ maybe something like below will give more flexibility. logical_replication_dev_XXX = true/false logical_replication_dev_YYY = true/false logical_replication_dev_ZZZ = true/false -- Kind Regards, Peter Smith. Fujitsu Australia
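As a sketch of the separated-GUC idea, each behavior would get its own boolean entry in guc_tables.c (the names here are placeholders):

    static bool logical_replication_dev_force_stream = false;

    /* entry for ConfigureNamesBool[] */
    {
        {"logical_replication_dev_force_stream", PGC_USERSET, DEVELOPER_OPTIONS,
            gettext_noop("Forces immediate streaming of each change in logical decoding."),
            NULL,
            GUC_NOT_IN_SAMPLE
        },
        &logical_replication_dev_force_stream,
        false,
        NULL, NULL, NULL
    },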
Re: Generate pg_stat_get_* functions with Macros
On Tue, Dec 06, 2022 at 05:28:47AM +0100, Drouvot, Bertrand wrote: > Renaming the fields was mandatory in the previous ones, as there was > already a mix of with/without "n_" in the existing field names. > > That said, I think it's better to rename the fields as you did (to > be "consistent" on the naming between relation/db stats), so the > patch LGTM. Yeah, PgStat_StatDBEntry is the last one using this style, so I have kept my change with the variables renamed rather than painting more CppConcat()s. The functions are still named the same as the original ones. -- Michael signature.asc Description: PGP signature
Collation DDL inconsistencies
When I looked at the bug: https://postgr.es/m/CALDQics_oBEYfOnu_zH6yw9WR1waPCmcrqxQ8+39hK3Op=z...@mail.gmail.com I noticed that the DDL around collations is inconsistent. For instance, CREATE COLLATION[1] uses LOCALE, LC_COLLATE, and LC_CTYPE parameters to specify either libc locales or an icu locale; whereas CREATE DATABASE[2] uses LOCALE, LC_COLLATE, and LC_CTYPE always for libc, and ICU_LOCALE if the default collation is ICU. The catalog representation is strange in a different way: datcollate/collcollate are always for libc, and daticulocale is for icu. That means anything that deals with those fields needs to pick the right one based on the provider. If this were a clean slate, it would make more sense if it were something like: datcollate/collcollate: to instantiate pg_locale_t datctype/collctype: to instantiate pg_locale_t datlibccollate: used by libc elsewhere datlibcctype: used by libc elsewhere daticulocale/colliculocale: remove these fields That way, if you are instantiating a pg_locale_t, you always just pass datcollate/datctype/collcollate/collctype, regardless of the provider (pg_newlocale_from_collation() would figure it out). And if you are going to do something straight with libc, you always use datlibccollate/datlibcctype. Aside: why don't we support different collate/ctype with ICU? It appears that u_strToTitle/u_strToUpper/u_strToLower just accept a string "locale", and it would be easy enough to pass it whatever is in datctype/collctype, right? We should validate that it's a valid locale; but other than that, I don't see the problem. Thoughts? Implementation-wise, I suppose this could create some annoyances in pg_dump. [1] https://www.postgresql.org/docs/devel/sql-createcollation.html [2] https://www.postgresql.org/docs/devel/sql-createdatabase.html [3] https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ustring_8h.html -- Jeff Davis PostgreSQL Contributor Team - AWS
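For concreteness, the inconsistency shows up in current syntax like this:

    -- CREATE COLLATION: LOCALE carries the ICU locale when provider = icu
    CREATE COLLATION de_icu (provider = icu, locale = 'de-DE');

    -- CREATE DATABASE: LOCALE/LC_COLLATE/LC_CTYPE stay libc settings, and
    -- the ICU locale needs its own keyword
    CREATE DATABASE mydb TEMPLATE = template0
        LOCALE_PROVIDER = icu ICU_LOCALE = 'de-DE' LOCALE = 'C';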
Re: Query Jumbling for CALL and SET utility statements
On Tue, Oct 11, 2022 at 11:54:51AM +0900, Michael Paquier wrote: > On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote: >> Unless I'm missing something both can be handled using pg_node_attr() >> annotations that already exist? > > Indeed, that should work for the locators. My mistake here. When it comes to the locations to silence in the normalization, we already rely on the variable name "location" of type int in gen_node_support.pl, so we can just do the same for the code generating the jumbling. This stuff still needs two more pg_node_attr() to be able to work: one to ignore a full Node in the jumbling and a second that can be used on a per-field basis. Once this is in place, automating the generation of the code is not that complicated; most of the work is in placing the pg_node_attr() markers so that the jumbling gets things right. The number of fields to mark as things to ignore depends on the Node type (heavy for Const, for example), but I'd like to think that an "ignore" approach is better than an "include" approach, so that new fields are considered by default in the jumble compilation. I have not looked at all the edge cases, so perhaps more attrs would be needed.. -- Michael signature.asc Description: PGP signature
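For instance, the per-field annotations could end up looking like this (a sketch; the node and attribute names are placeholders):

    /* hypothetical node definition driving the generated jumble code */
    typedef struct MyNode
    {
        NodeTag     type;
        int         frobnication_level;                      /* jumbled */
        char       *debug_name pg_node_attr(jumble_ignore);  /* skipped */
        int         location pg_node_attr(jumble_location);  /* recorded as
                                                              * a normalization
                                                              * point */
    } MyNode;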
Re: Using WaitEventSet in the postmaster
> From 61480441f67ca7fac96ca4bcfe85f27013a47aa8 Mon Sep 17 00:00:00 2001 > From: Thomas Munro > Date: Tue, 6 Dec 2022 16:13:36 +1300 > Subject: [PATCH v4 2/5] Don't leak a signalfd when using latches in the > postmaster. > > + /* > + * It would probably be safe to re-use the inherited signalfd > since > + * signalfds only see the current processes pending signals, > but it I think you mean "current process's", right ?
Re: Using WaitEventSet in the postmaster
On Wed, Dec 7, 2022 at 12:12 PM Andres Freund wrote: > On 2022-12-07 00:58:06 +1300, Thomas Munro wrote: > > One way to fix that for the epoll version is to EPOLL_CTL_DEL and > > EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. > > Tried like that in this version. Another approach would be to > > (finally) write DeleteWaitEvent() to do the same thing at a higher > > level... seems overkill for this. > > What about just recreating the WES during crash restart? It seems a bit like cheating but yeah that's a super simple solution, and removes one patch from the stack. Done like that in this version. > > > This seems to hardcode the specific wait events we're waiting for based on > > > latch.c infrastructure. Not really convinced that's a good idea. > > > > What are you objecting to? The assumption that the first socket is at > > position 1? The use of GetNumRegisteredWaitEvents()? > > The latter. Removed. From 07b04dc410118ad04fd0006edda7ba80f241357a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 6 Dec 2022 15:21:11 +1300 Subject: [PATCH v5 1/4] Add WL_SOCKET_ACCEPT event to WaitEventSet API. To be able to handle incoming connections on a server socket with the WaitEventSet API, we'll need a new kind of event to indicate that the the socket is ready to accept a connection. On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there is a different kernel event that we need to map our abstraction to. A future commit will use this. Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 13 - src/include/storage/latch.h | 7 +++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index eb3a569aae..7ced8264f0 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set) * - WL_SOCKET_CONNECTED: Wait for socket connection to be established, * can be combined with other WL_SOCKET_* events (on non-Windows * platforms, this is the same as WL_SOCKET_WRITEABLE) + * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket, + * can be combined with other WL_SOCKET_* events (on non-Windows + * platforms, this is the same as WL_SOCKET_READABLE) * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer. * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies * @@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set) * i.e. it must be a process-local latch initialized with InitLatch, or a * shared latch associated with the current process by calling OwnLatch. * - * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error + * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error * conditions cause the socket to be reported as readable/writable/connected, * so that the caller can deal with the condition. 
* @@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event) flags |= FD_WRITE; if (event->events & WL_SOCKET_CONNECTED) flags |= FD_CONNECT; + if (event->events & WL_SOCKET_ACCEPT) + flags |= FD_ACCEPT; if (*handle == WSA_INVALID_EVENT) { @@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout, /* connected */ occurred_events->events |= WL_SOCKET_CONNECTED; } + if ((cur_event->events & WL_SOCKET_ACCEPT) && + (resEvents.lNetworkEvents & FD_ACCEPT)) + { + /* incoming connection could be accepted */ + occurred_events->events |= WL_SOCKET_ACCEPT; + } if (resEvents.lNetworkEvents & FD_CLOSE) { /* EOF/error, so signal all caller-requested socket flags */ diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 68ab740f16..c55838db60 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -135,9 +135,16 @@ typedef struct Latch #define WL_SOCKET_CONNECTED WL_SOCKET_WRITEABLE #endif #define WL_SOCKET_CLOSED (1 << 7) +#ifdef WIN32 +#define WL_SOCKET_ACCEPT (1 << 8) +#else +/* avoid having to deal with case on platforms not requiring it */ +#define WL_SOCKET_ACCEPT WL_SOCKET_READABLE +#endif #define WL_SOCKET_MASK (WL_SOCKET_READABLE | \ WL_SOCKET_WRITEABLE | \ WL_SOCKET_CONNECTED | \ + WL_SOCKET_ACCEPT | \ WL_SOCKET_CLOSED) typedef struct WaitEvent -- 2.35.1 From 827866959dbbe537f6677271093f6d7730bd2527 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 6 Dec 2022 16:13:36 +1300 Subject: [PATCH v5 2/4] Don't leak a signalfd when using latches in the postmaster. At the time of commit 6a2a70a02 we didn't use latch infrastructure in the postmaster. We're planning to start doing that, so we'd better make sure that the signalfd inherited from a postmaster is not duplicated and then leaked in the child. Reviewed-by: Andres Freund
Re: Using WaitEventSet in the postmaster
On Wed, Dec 7, 2022 at 2:08 PM Justin Pryzby wrote: > > + /* > > + * It would probably be safe to re-use the inherited signalfd > > since > > + * signalfds only see the current processes pending signals, > > but it > > I think you mean "current process's", right ? Fixed in v5, thanks.
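For context, a minimal sketch of how a server loop can consume the new event with the existing WaitEventSet API (listen_sock is assumed to be a listening socket):

    WaitEventSet *wes = CreateWaitEventSet(CurrentMemoryContext, 2);
    WaitEvent   event;

    AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
    AddWaitEventToSet(wes, WL_SOCKET_ACCEPT, listen_sock, NULL, NULL);

    for (;;)
    {
        WaitEventSetWait(wes, -1 /* no timeout */, &event, 1, 0);
        if (event.events & WL_LATCH_SET)
            ResetLatch(MyLatch);
        if (event.events & WL_SOCKET_ACCEPT)
        {
            /* accept() will not block now; hand off the new connection */
        }
    }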
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 5:23 PM shiy.f...@fujitsu.com wrote: > > Hi hackers, > > In logical decoding, when logical_decoding_work_mem is exceeded, the changes are > sent to output plugin in streaming mode. But there is a restriction that the > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC to > allow sending every change to output plugin without waiting until > logical_decoding_work_mem is exceeded. > Some review comments for patch v1-0001. 1. Typos In several places "Wheather/wheather" -> "Whether/whether" == src/backend/replication/logical/reorderbuffer.c 2. ReorderBufferCheckMemoryLimit { ReorderBufferTXN *txn; - /* bail out if we haven't exceeded the memory limit */ - if (rb->size < logical_decoding_work_mem * 1024L) + /* + * Stream the changes immediately if force_stream_mode is on and the output + * plugin supports streaming. Otherwise wait until size exceeds + * logical_decoding_work_mem. + */ + bool force_stream = (force_stream_mode && ReorderBufferCanStream(rb)); + + /* bail out if force_stream is false and we haven't exceeded the memory limit */ + if (!force_stream && rb->size < logical_decoding_work_mem * 1024L) return; /* - * Loop until we reach under the memory limit. One might think that just - * by evicting the largest (sub)transaction we will come under the memory - * limit based on assumption that the selected transaction is at least as - * large as the most recent change (which caused us to go over the memory - * limit). However, that is not true because a user can reduce the + * If force_stream is true, loop until there's no change. Otherwise, loop + * until we reach under the memory limit. One might think that just by + * evicting the largest (sub)transaction we will come under the memory limit + * based on assumption that the selected transaction is at least as large as + * the most recent change (which caused us to go over the memory limit). + * However, that is not true because a user can reduce the * logical_decoding_work_mem to a smaller value before the most recent * change. */ - while (rb->size >= logical_decoding_work_mem * 1024L) + while ((!force_stream && rb->size >= logical_decoding_work_mem * 1024L) || +(force_stream && rb->size > 0)) { /* * Pick the largest transaction (or subtransaction) and evict it from IIUC this logic can be simplified quite a lot just by shifting that "bail out" condition into the loop. Something like: while (true) { if (!((force_stream && rb->size > 0) || rb->size >= logical_decoding_work_mem * 1024L)) break; ... } -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Using WaitEventSet in the postmaster
Hi, On 2022-12-07 14:12:37 +1300, Thomas Munro wrote: > On Wed, Dec 7, 2022 at 12:12 PM Andres Freund wrote: > > On 2022-12-07 00:58:06 +1300, Thomas Munro wrote: > > > One way to fix that for the epoll version is to EPOLL_CTL_DEL and > > > EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. > > > Tried like that in this version. Another approach would be to > > > (finally) write DeleteWaitEvent() to do the same thing at a higher > > > level... seems overkill for this. > > > > What about just recreating the WES during crash restart? > > It seems a bit like cheating but yeah that's a super simple solution, > and removes one patch from the stack. Done like that in this version. I somewhat wish we'd do that more aggressively during crash-restart, rather than the opposite. Mostly around shared memory contents though, so perhaps that's not that comparable... Greetings, Andres Freund
Re: [DOCS] Stats views and functions not in order?
I'd like to "fix" this but IIUC there is no consensus yet about what order is best for patch 0001, right? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Force streaming every change in logical decoding
On Wed, Dec 7, 2022 at 8:46 AM Peter Smith wrote: > > On Tue, Dec 6, 2022 at 9:29 PM Amit Kapila wrote: > > > > On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com > > wrote: > > > > > > Hi hackers, > > > > > > In logical decoding, when logical_decoding_work_mem is exceeded, the > > > changes are > > > sent to output plugin in streaming mode. But there is a restriction that > > > the > > > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC > > > to > > > allow sending every change to output plugin without waiting until > > > logical_decoding_work_mem is exceeded. > > > > > > This helps to test streaming mode. For example, to test "Avoid streaming > > > the > > > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS > > > messages. With the new option, it can be tested with fewer changes and in > > > less > > > time. Also, this new option helps to test more scenarios for "Perform > > > streaming > > > logical transactions by background workers" [2]. > > > > > +1 > > > > > Yeah, I think this can also help in reducing the time for various > > tests in test_decoding/stream and > > src/test/subscription/t/*_stream_*.pl file by reducing the number of > > changes required to invoke streaming mode. Can we think of making this > > GUC extendible to even test more options on server-side (publisher) > > and client-side (subscriber) cases? For example, we can have something > > like logical_replication_mode with the following valid values: (a) > > server_serialize: this will serialize each change to file on > > publishers and then on commit restore and send all changes; (b) > > server_stream: this will stream each change as currently proposed in > > your patch. Then if we want to extend it for subscriber-side testing > > then we can introduce new options like client_serialize for the case > > being discussed in the email [1]. > > > > Thoughts? > > There is potential for lots of developer GUCs for testing/debugging in > the area of logical replication but IMO it might be better to keep > them all separated. Putting everything into a single > 'logical_replication_mode' might cause difficulties later when/if you > want combinations of the different modes. I think we want the developer option that forces streaming changes during logical decoding to be PGC_USERSET but probably the developer option for testing the parallel apply feature would be PGC_SIGHUP. Also, since streaming changes is not specific to logical replication but to logical decoding, I'm not sure logical_replication_XXX is a good name. IMO having force_stream_mode and a different GUC for testing the parallel apply feature makes sense to me. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 7:18 PM Masahiko Sawada wrote: > > On Tue, Dec 6, 2022 at 7:29 PM Amit Kapila wrote: > > > > On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com > > wrote: > > > > > > Hi hackers, > > > > > > In logical decoding, when logical_decoding_work_mem is exceeded, the > > > changes are > > > sent to output plugin in streaming mode. But there is a restriction that > > > the > > > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC > > > to > > > allow sending every change to output plugin without waiting until > > > logical_decoding_work_mem is exceeded. > > > > > > This helps to test streaming mode. For example, to test "Avoid streaming > > > the > > > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS > > > messages. With the new option, it can be tested with fewer changes and in > > > less > > > time. Also, this new option helps to test more scenarios for "Perform > > > streaming > > > logical transactions by background workers" [2]. > > > > > > > Yeah, I think this can also help in reducing the time for various > > tests in test_decoding/stream and > > src/test/subscription/t/*_stream_*.pl file by reducing the number of > > changes required to invoke streaming mode. > > +1 > > > Can we think of making this > > GUC extendible to even test more options on server-side (publisher) > > and client-side (subscriber) cases? For example, we can have something > > like logical_replication_mode with the following valid values: (a) > > server_serialize: this will serialize each change to file on > > publishers and then on commit restore and send all changes; (b) > > server_stream: this will stream each change as currently proposed in > > your patch. Then if we want to extend it for subscriber-side testing > > then we can introduce new options like client_serialize for the case > > being discussed in the email [1]. > > Setting logical_replication_mode = 'client_serialize' implies that the > publisher behaves as server_stream? or do you mean we can set like > logical_replication_mode = 'server_stream, client_serialize'? > The latter one (logical_replication_mode = 'server_stream, client_serialize'). The idea is to cover more options with one GUC and each option can be used individually as well as in combination with others. -- With Regards, Amit Kapila.
Re: Transaction timeout
At Mon, 5 Dec 2022 17:10:50 -0800, Andres Freund wrote in > I'm most concerned about the overhead when the timeouts are *not* > enabled. And this adds a branch to start_xact_command() and a function > call for get_timeout_active(TRANSACTION_TIMEOUT) in that case. On its > own, that's not a whole lot, but it does add up. There's 10+ function > calls for timeout and ps_display purposes for every single statement. That path seems to exist just for robustness. I inserted "Assert(0)" just before the disable_timeout(), but make check-world didn't fail [1]. Couldn't we get rid of that path, adding an assertion instead? I'm not sure about other timeouts yet, though. On the disabling side, we cannot rely on StatementTimeout. [1] # 032_apply_delay.pl fails for me so I don't know whether any of the # later tests fail. > But it's definitely also worth optimizing the timeout enabled paths. And > you're right, it looks like there's a fair bit of optimization > potential. Thanks. I'll work on that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Commit fest 2022-11
Hi, The Commitfest 2022-11 status still shows as "In Progress". Shouldn't the status be changed to "Closed" and the entries be moved to the next commitfest? Regards, Vignesh On Wed, 16 Nov 2022 at 18:30, Ian Lawrence Barwick wrote: > On Mon, 14 Nov 2022 at 22:38, Ian Lawrence Barwick : > > > > On Mon, 14 Nov 2022 at 22:23, James Coleman : > > > > > > On Mon, Nov 14, 2022 at 7:08 AM Ian Lawrence Barwick > > > wrote: > > > > > > > > On Wed, 9 Nov 2022 at 8:12, Justin Pryzby : > > > > > > > > If my script is not wrong, these patches add TAP tests, but don't > > > > > update > > > > > the requisite ./meson.build file. It seems like it'd be reasonable to > > > > > set them all as WOA until that's done. > > > > > > > > > > $ for a in `git branch -a |sort |grep commitfest/40`; do : echo > > > > > "$a..."; x=`git log -1 --compact-summary "$a"`; echo "$x" |grep > > > > > '/t/.*pl.*new' >/dev/null || continue; echo "$x" |grep -Fw meson > > > > > >/dev/null && continue; git log -1 --oneline "$a"; done > > > > > ... [CF 40/3558] Allow file inclusion in pg_hba and pg_ident files > > > > > ... [CF 40/3628] Teach pg_waldump to extract FPIs from the WAL stream > > > > > ... [CF 40/3646] Skip replicating the tables specified in except > > > > > table option > > > > > ... [CF 40/3663] Switching XLog source from archive to streaming when > > > > > primary available > > > > > ... [CF 40/3670] pg_rewind: warn when checkpoint hasn't happened > > > > > after promotion > > > > > ... [CF 40/3729] Testing autovacuum wraparound > > > > > ... [CF 40/3877] vacuumlo: add test to vacuumlo for test coverage > > > > > ... [CF 40/3985] TDE key management patches > > > > > > > > Looks like your script is correct, will update accordingly. > > > > > > > > > > It's possible this has been discussed before, but it seems less than > > > ideal to have notifications about moving to WOA be sent only in a bulk > > > email hanging off the "current CF" email chain as opposed to being > > > sent to the individual threads. Perhaps that's something the app > > > should do for us in this situation. Without that though the patch > > > authors are left to wade through unrelated discussion, and, probably > > > more importantly, the patch discussion thread doesn't show the current > > > state (I think bumping there is more likely to prompt activity as > > > well). > > > > FWIW I've been manually "bumping" the respective threads, which is somewhat > > time-consuming but seems to have been quite productive in terms of getting > > patches updated. > > > > Will do same for the above once I've confirmed what is being requested, > > (which I presume is adding the new tests to the 'tests' array in the > > respective > > "meson.build" file; just taking the opportunity to familiarize myself with > > it). > Various mails since sent to the appropriate threads; I took the opportunity > to create a short wiki page: > >https://wiki.postgresql.org/wiki/Meson_for_patch_authors > > with relevant details (AFAICS) for anyone not familiar with meson; > corrections/ > improvements welcome. > > In the meantime I notice a number of patches in cfbot are currently failing on > TAP test "test_custom_rmgrs/t/001_basic.pl" in some environments. This was > added the other day in commit ae168c794f; there is already a fix for the issue > ( 36e0358e70 ) but the cfbot doesn't have that commit yet. > > Regards > > Ian Barwick
Re: [DOCS] Stats views and functions not in order?
On Tue, Dec 6, 2022 at 6:36 PM Peter Smith wrote: > I'd like to "fix" this but IIUC there is no consensus yet about what > order is best for patch 0001, right? > > I'm planning on performing a more thorough review of 0003 and 0004 tomorrow. As for 0001 - go with Peter E.'s suggested ordering. David J.
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Tue, 6 Dec 2022 11:08:43 -0800, Andres Freund wrote in > Hi, > > The tests fail on cfbot: > https://cirrus-ci.com/task/4533866329800704 > > They only seem to fail on 32bit linux. > > https://api.cirrus-ci.com/v1/artifact/task/4533866329800704/testrun/build-32/testrun/subscription/032_apply_delay/log/regress_log_032_apply_delay > [06:27:10.628](0.138s) ok 2 - check if the new rows were applied to subscriber > timed out waiting for match: (?^:logical replication apply delay) at > /tmp/cirrus-ci-build/src/test/subscription/t/032_apply_delay.pl line 124. It fails for me on 64bit Linux.. (Rocky 8.7) > t/032_apply_delay.pl ... Dubious, test returned 29 (wstat 7424, > 0x1d00) > No subtests run .. > t/032_apply_delay.pl (Wstat: 7424 Tests: 0 Failed: 0) > Non-zero exit status: 29 > Parse errors: No plan found in TAP output regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Perform streaming logical transactions by background workers and parallel apply
On Tue, Dec 6, 2022 7:57 AM Peter Smith wrote: > Here are my review comments for patch v55-0002 Thanks for your comments. > == > > .../replication/logical/applyparallelworker.c > > 1. pa_can_start > > @@ -276,9 +278,9 @@ pa_can_start(TransactionId xid) > /* > * Don't start a new parallel worker if user has set skiplsn as it's > * possible that user want to skip the streaming transaction. For > - * streaming transaction, we need to spill the transaction to disk so > that > - * we can get the last LSN of the transaction to judge whether to > skip > - * before starting to apply the change. > + * streaming transaction, we need to serialize the transaction to a > + file > + * so that we can get the last LSN of the transaction to judge > + whether to > + * skip before starting to apply the change. > */ > if (!XLogRecPtrIsInvalid(MySubscription->skiplsn)) > return false; > > I think the wording change may belong in patch 0001 because it has > nothing to do with partial serializing. Changed. > ~~~ > > 2. pa_free_worker > > + /* > + * Stop the worker if there are enough workers in the pool. > + * > + * XXX The worker is also stopped if the leader apply worker needed > + to > + * serialize part of the transaction data due to a send timeout. This > + is > + * because the message could be partially written to the queue due to > + send > + * timeout and there is no way to clean the queue other than > + resending the > + * message until it succeeds. To avoid complexity, we directly stop > + the > + * worker in this case. > + */ > + if (winfo->serialize_changes || > + napplyworkers > (max_parallel_apply_workers_per_subscription / 2)) > > Don't need to say "due to send timeout" 2 times in 2 sentences. > > SUGGESTION > XXX The worker is also stopped if the leader apply worker needed to > serialize part of the transaction data due to a send timeout. This is > because the message could be partially written to the queue but there > is no way to clean the queue other than resending the message until it > succeeds. Directly stopping the worker avoids needing this complexity. Changed. > 4. > > /* > + * Replay the spooled messages in the parallel apply worker if the > +leader apply > + * worker has finished serializing changes to the file. > + */ > +static void > +pa_spooled_messages(void) > > I'm not 100% sure of the logic, so IMO maybe the comment should say a > bit more about how this works: > > Specifically, let's say there was some timeout and the LA needed to > write the spool file, then let's say the PA timed out and found itself > inside this function. Now, let's say the LA is still busy writing the > file -- so what happens next? > > Does this function simply return, then the main PA loop waits again, > then it times out again, then PA finds itself back inside this > function again... and that keeps happening over and over until > eventually the spool file is found FS_READY? Some explanatory comments > might help. Slightly changed the logic and comment here. > ~ > > 5. > > + /* > + * Check if changes have been serialized to a file. if so, read and > + apply > + * them. > + */ > + SpinLockAcquire(&MyParallelShared->mutex); > + fileset_state = MyParallelShared->fileset_state; > + SpinLockRelease(&MyParallelShared->mutex); > > "if so" -> "If so" Changed. > ~~~ > > > 6. 
pa_send_data > > + * > + * If the attempt to send data via shared memory times out, then we > + will > switch > + * to "PARTIAL_SERIALIZE mode" for the current transaction to prevent > possible > + * deadlocks with another parallel apply worker (refer to the > + comments atop > + * applyparallelworker.c for details). This means that the current > + data and any > + * subsequent data for this transaction will be serialized to a file. > */ > void > pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void > *data) > > SUGGESTION (minor comment rearranging) > > If the attempt to send data via shared memory times out, then we will > switch to "PARTIAL_SERIALIZE mode" for the current transaction -- this > means that the current data and any subsequent data for this > transaction will be serialized to a file. This is done to prevent > possible deadlocks with another parallel apply worker (refer to the > comments atop applyparallelworker.c for details). Changed. > ~ > > 7. > > + /* > + * Take the stream lock to make sure that the parallel apply worker > + * will wait for the leader to release the stream lock until the > + * end of the transaction. > + */ > + pa_lock_stream(winfo->shared->xid, AccessExclusiveLock); > > The comment doesn't sound right. > > "until the end" -> "at the end" (??) I think it means "PA wait ... until the end of transaction". > ~~~ > > 8. pa_stream_abort > > @@ -1374,6 +1470,7 @@ pa_stream_abort(LogicalRepStreamAbortData > *abort_data) > RollbackToSavepoint(spname); > CommitTransactionCommand(); > subxactlist = list_truncate(subx
Re: meson PGXS compatibility
Hi,

On 2022-12-01 12:17:51 -0800, Andres Freund wrote:
> I'll push 0001, 0002 shortly. I don't think 0002 is the most elegant approach,
> but it's not awful. I'd still like some review for 0003, but will try to push
> it in a few days if that's not forthcoming.

Pushed 0003 (move export_dynamic determination to configure.ac) and 0004
(PGXS compatibility). Hope there's no fallout from 0003.

Greetings,

Andres Freund
Re: Using WaitEventSet in the postmaster
Oops, v5 was broken as visible on cfbot (a last-second typo broke it).
Here's a better one.

From 07b04dc410118ad04fd0006edda7ba80f241357a Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 6 Dec 2022 15:21:11 +1300
Subject: [PATCH v6 1/4] Add WL_SOCKET_ACCEPT event to WaitEventSet API.

To be able to handle incoming connections on a server socket with the
WaitEventSet API, we'll need a new kind of event to indicate that the
socket is ready to accept a connection.

On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there
is a different kernel event that we need to map our abstraction to.

A future commit will use this.

Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 13 ++++++++++++-
 src/include/storage/latch.h    |  7 +++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index eb3a569aae..7ced8264f0 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *   can be combined with other WL_SOCKET_* events (on non-Windows
  *   platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket,
+ *   can be combined with other WL_SOCKET_* events (on non-Windows
+ *   platforms, this is the same as WL_SOCKET_READABLE)
  * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
  * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
  *
@@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * i.e. it must be a process-local latch initialized with InitLatch, or a
  * shared latch associated with the current process by calling OwnLatch.
  *
- * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error
+ * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error
  * conditions cause the socket to be reported as readable/writable/connected,
  * so that the caller can deal with the condition.
  *
@@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
 		flags |= FD_WRITE;
 	if (event->events & WL_SOCKET_CONNECTED)
 		flags |= FD_CONNECT;
+	if (event->events & WL_SOCKET_ACCEPT)
+		flags |= FD_ACCEPT;

 	if (*handle == WSA_INVALID_EVENT)
 	{
@@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			/* connected */
 			occurred_events->events |= WL_SOCKET_CONNECTED;
 		}
+		if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+			(resEvents.lNetworkEvents & FD_ACCEPT))
+		{
+			/* incoming connection could be accepted */
+			occurred_events->events |= WL_SOCKET_ACCEPT;
+		}
 		if (resEvents.lNetworkEvents & FD_CLOSE)
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 68ab740f16..c55838db60 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -135,9 +135,16 @@ typedef struct Latch
 #define WL_SOCKET_CONNECTED	WL_SOCKET_WRITEABLE
 #endif
 #define WL_SOCKET_CLOSED	(1 << 7)
+#ifdef WIN32
+#define WL_SOCKET_ACCEPT	(1 << 8)
+#else
+/* avoid having to deal with case on platforms not requiring it */
+#define WL_SOCKET_ACCEPT	WL_SOCKET_READABLE
+#endif
 #define WL_SOCKET_MASK		(WL_SOCKET_READABLE | \
 							 WL_SOCKET_WRITEABLE | \
 							 WL_SOCKET_CONNECTED | \
+							 WL_SOCKET_ACCEPT | \
 							 WL_SOCKET_CLOSED)

 typedef struct WaitEvent
--
2.35.1

From 827866959dbbe537f6677271093f6d7730bd2527 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 6 Dec 2022 16:13:36 +1300
Subject: [PATCH v6 2/4] Don't leak a signalfd when using latches in the
 postmaster.

At the time of commit 6a2a70a02 we didn't use latch infrastructure in
the postmaster. We're planning to start doing that, so we'd better make
sure that the signalfd inherited from a postmaster is not duplicated
and then leaked in the child.

Reviewed-by: Andres Freund
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 7ced8264f0..b32c96b63d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -283,6 +283,22 @@ InitializeLatchSupport(void)
 #ifdef WAIT_USE_SIGNALFD
 	sigset_t	signalfd_mask;

+	if (IsUnderPostmaster)
+	{
+		/*
+		 * It would probably be safe to re-use the inherited signalfd since
+		 * signalfds only see the current process's pending signals, but it
+		 * seems less surprising to close it and create our own.
+		 */
+		if (signal_fd != -1)
+		{
+			/* Release postmaster's signal FD; ignore a
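To give an idea of how the new event is meant to be used by a later
commit, here is a rough sketch of an accept loop. It is illustrative
only, not part of the patch series, and assumes "server_fd" is a socket
on which listen() has already been called:

static void
accept_loop_sketch(pgsocket server_fd)
{
	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 2);

	AddWaitEventToSet(set, WL_SOCKET_ACCEPT, server_fd, NULL, NULL);
	AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);

	for (;;)
	{
		WaitEvent	event;

		/* Block until the server socket is ready or the latch is set. */
		(void) WaitEventSetWait(set, -1 /* no timeout */ , &event, 1, 0);

		if (event.events & WL_LATCH_SET)
			ResetLatch(MyLatch);
		else if (event.events & WL_SOCKET_ACCEPT)
		{
			/* A connection is pending, so accept() should not block here. */
			pgsocket	sock = accept(server_fd, NULL, NULL);

			/* Hand the new socket off to connection handling... */
			StreamClose(sock);	/* placeholder for real handling */
		}
	}
}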
Re: Commit fest 2022-11
On Wed, Dec 07, 2022 at 08:25:25AM +0530, vignesh C wrote:
> The Commitfest 2022-11 status still shows as "In Progress". Shouldn't
> the status be changed to "Closed" and the entries be moved to the next
> commitfest?

Yes, Ian has told me that he is on it this week.
--
Michael