Re: [PATCH] Add pretty-printed XML output option
On 09.02.23 08:23, Tom Lane wrote: Um ... why are you using PG_TRY here at all? It seems like you have full control of the actually likely error cases. The only plausible error out of the StringInfo calls is OOM, and you probably don't want to trap that at all. My intention was to catch any unexpected error from xmlDocDumpFormatMemory and handle it properly. But I guess you're right, I can control the likely error cases by checking doc and nbytes. Are you suggesting something along these lines? xmlDocPtr doc; xmlChar *xmlbuf = NULL; text *arg = PG_GETARG_TEXT_PP(0); StringInfoData buf; int nbytes; doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); if (!doc) elog(ERROR, "could not parse the given XML document"); xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); xmlFreeDoc(doc); if (!nbytes) elog(ERROR, "could not indent the given XML document"); initStringInfo(&buf); appendStringInfoString(&buf, (const char *) xmlbuf); xmlFree(xmlbuf); PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); Thanks! Best, Jim
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Thu, Feb 9, 2023 at 10:45 AM Kyotaro Horiguchi wrote: > > At Wed, 8 Feb 2023 09:03:03 +, "Hayato Kuroda (Fujitsu)" > wrote in > > Thank you for reviewing! PSA new version. > > + if (statusinterval_ms > 0 && diffms > statusinterval_ms) > > The next expected feedback time is measured from the last status > report. Thus, it seems to me this may suppress feedbacks from being > sent for an unexpectedly long time especially when min_apply_delay is > shorter than wal_r_s_interval. > I think the minimum time before we send any feedback during the wait is wal_r_s_interval. Now, I think if there is no transaction for a long time before we get a new transaction, there should be keep-alive messages in between which would allow us to send feedback at regular intervals (wal_receiver_status_interval). So, I think we should be able to send feedback in less than 2 * wal_receiver_status_interval unless wal_sender/receiver timeout is very large and there is a very low volume of transactions. Now, we can try to send the feedback before we start waiting or maybe after every wal_receiver_status_interval / 2 but I think that will lead to more spurious feedback messages than we get the benefit from them. -- With Regards, Amit Kapila.
Re: Can we do something to help stop users mistakenly using force_parallel_mode?
On Thu, Feb 9, 2023 at 5:50 AM David Rowley wrote: > Attached updated patch. Looks good at a glance, just found a spurious word: + "by forcing the planner into to generate plans which contains nodes " -- John Naylor EDB: http://www.enterprisedb.com
Re: Is psSocketPoll doing the right thing?
On 2023/02/09 11:50, Kyotaro Horiguchi wrote: Hello. While looking at a patch, I found that pqSocketPoll passes the result from poll(2) through to the caller and throws away revents. If I understand it correctly, poll() *doesn't* return -1 or set errno just because it has set POLLERR, POLLHUP, POLLNVAL, or POLLRDHUP for some of the target sockets, and returns 0 unless poll() itself failed to work. As far as I understand, poll() returns >0 if "revents" has any of those bits set, not 0 nor -1. Are you thinking that pqSocketPoll() should check "revents" and return -1 if any of those bits is set? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
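For reference, a minimal self-contained sketch of the kind of revents check being discussed (illustrative only; this is not the actual pqSocketPoll code, and the return convention is an assumption):

```c
#include <poll.h>

/*
 * Wait for the socket to become readable or writable, but also map
 * error conditions reported via revents to a failure return.
 * Returns >0 on readiness, 0 on timeout, -1 on failure.
 */
static int
poll_with_revents_check(int sock, int forRead, int timeout_ms)
{
	struct pollfd pfd;
	int			rc;

	pfd.fd = sock;
	pfd.events = forRead ? POLLIN : POLLOUT;
	pfd.revents = 0;

	rc = poll(&pfd, 1, timeout_ms);
	if (rc <= 0)
		return rc;				/* -1: poll() itself failed, 0: timed out */

	/* poll() saw activity: treat error conditions as a failure */
	if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
		return -1;

	return rc;
}
```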
Re: Consolidate ItemPointer to Datum conversion functions
On 06.02.23 11:11, Heikki Linnakangas wrote: On 06/02/2023 11:54, Peter Eisentraut wrote: Instead of defining the same set of macros several times, define it once in an appropriate header file. In passing, convert to inline functions. Looks good to me. Did you consider moving PG_GETARG_ITEMPOINTER and PG_RETURN_ITEMPOINTER, too? They're only used in tid.c, but for most datatypes, we define the PG_GETARG and PG_RETURN macros in the same header file as the the Datum conversion functions. Yeah that makes sense. Here is an updated patch for that. From 47b316eb6cfa50412d81fd183a8b54e0a104a685 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 9 Feb 2023 09:23:20 +0100 Subject: [PATCH v2] Consolidate ItemPointer to Datum conversion functions Instead of defining the same set of macros several times, define it once in an appropriate header file. In passing, convert to inline functions. Discussion: https://www.postgresql.org/message-id/flat/844dd4c5-e5a1-3df1-bfaf-d1e1c2a16e45%40enterprisedb.com --- contrib/pageinspect/btreefuncs.c | 2 -- contrib/pageinspect/ginfuncs.c | 3 --- contrib/pageinspect/gistfuncs.c | 2 -- src/backend/utils/adt/tid.c | 5 - src/include/storage/itemptr.h| 20 5 files changed, 20 insertions(+), 12 deletions(-) diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c index e4e5dc3c81..9cdc8e182b 100644 --- a/contrib/pageinspect/btreefuncs.c +++ b/contrib/pageinspect/btreefuncs.c @@ -50,8 +50,6 @@ PG_FUNCTION_INFO_V1(bt_multi_page_stats); #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX) #define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID) -#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X)) -#define ItemPointerGetDatum(X) PointerGetDatum(X) /* * structure for single btree page statistics diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c index efaa47e86d..0f846988df 100644 --- a/contrib/pageinspect/ginfuncs.c +++ b/contrib/pageinspect/ginfuncs.c @@ -21,9 +21,6 @@ #include "utils/builtins.h" #include "utils/rel.h" -#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X)) -#define ItemPointerGetDatum(X) PointerGetDatum(X) - PG_FUNCTION_INFO_V1(gin_metapage_info); PG_FUNCTION_INFO_V1(gin_page_opaque_info); diff --git a/contrib/pageinspect/gistfuncs.c b/contrib/pageinspect/gistfuncs.c index 3a947c82af..100697814d 100644 --- a/contrib/pageinspect/gistfuncs.c +++ b/contrib/pageinspect/gistfuncs.c @@ -31,8 +31,6 @@ PG_FUNCTION_INFO_V1(gist_page_items_bytea); #define IS_GIST(r) ((r)->rd_rel->relam == GIST_AM_OID) -#define ItemPointerGetDatum(X) PointerGetDatum(X) - Datum gist_page_opaque_info(PG_FUNCTION_ARGS) diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c index 251219a1ef..77fb74ab0c 100644 --- a/src/backend/utils/adt/tid.c +++ b/src/backend/utils/adt/tid.c @@ -37,11 +37,6 @@ #include "utils/varlena.h" -#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X)) -#define ItemPointerGetDatum(X) PointerGetDatum(X) -#define PG_GETARG_ITEMPOINTER(n) DatumGetItemPointer(PG_GETARG_DATUM(n)) -#define PG_RETURN_ITEMPOINTER(x) return ItemPointerGetDatum(x) - #define LDELIM '(' #define RDELIM ')' #define DELIM ',' diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h index 354e50e68b..fafefa14cd 100644 --- a/src/include/storage/itemptr.h +++ b/src/include/storage/itemptr.h @@ -222,4 +222,24 @@ extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2); extern void ItemPointerInc(ItemPointer pointer); extern void ItemPointerDec(ItemPointer 
pointer); +/* + * Datum conversion functions + * + */ + +static inline ItemPointer +DatumGetItemPointer(Datum X) +{ + return (ItemPointer) DatumGetPointer(X); +} + +static inline Datum +ItemPointerGetDatum(const ItemPointerData *X) +{ + return PointerGetDatum(X); +} + +#define PG_GETARG_ITEMPOINTER(n) DatumGetItemPointer(PG_GETARG_DATUM(n)) +#define PG_RETURN_ITEMPOINTER(x) return ItemPointerGetDatum(x) + #endif /* ITEMPTR_H */ -- 2.39.1
Re: Exit walsender before confirming remote flush in logical replication
Hi, On Wed, Feb 8, 2023 at 6:47 PM Hayato Kuroda (Fujitsu) wrote: > > > > > Dear Horiguchi-san, > > > > Thank you for checking the patch! PSA new version. > > PSA rebased patch that supports updated time-delayed patch[1]. > Thank you for the patch! Here are some comments on v5 patch: +/* + * Options for controlling the behavior of the walsender. Options can be + * specified in the START_STREAMING replication command. Currently only one + * option is allowed. + */ +typedef struct +{ +WalSndShutdownMode shutdown_mode; +} WalSndOptions; + +static WalSndOptions *my_options = NULL; I'm not sure we need to have it as a struct at this stage since we support only one option. I wonder if we can have one value, say shutdown_mode, and we can make it a struct when we really need it. Even if we use WalSndOptions struct, I don't think we need to dynamically allocate it. Since a walsender can start logical replication multiple times in principle, my_options is not freed. --- +/* + * Parse given shutdown mode. + * + * Currently two values are accepted - "wait_flush" and "immediate" + */ +static void +ParseShutdownMode(char *shutdownmode) +{ +if (pg_strcasecmp(shutdownmode, "wait_flush") == 0) +my_options->shutdown_mode = WALSND_SHUTDOWN_MODE_WAIT_FLUSH; +else if (pg_strcasecmp(shutdownmode, "immediate") == 0) +my_options->shutdown_mode = WALSND_SHUTDOWN_MODE_IMMIDEATE; +else +ereport(ERROR, +errcode(ERRCODE_SYNTAX_ERROR), +errmsg("SHUTDOWN_MODE requires \"wait_flush\" or \"immediate\"")); +} I think we should make the error message consistent with other enum parameters. How about the message like: ERROR: invalid value shutdown mode: "%s" Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: proposal: psql: psql variable BACKEND_PID
On 03.02.23 11:41, Pavel Stehule wrote: We can simply allow access to the backend process id through a psql variable. I propose the name "BACKEND_PID". The advantages of usage are simple accessibility by the command \set, and less typing than using the function pg_backend_pid, because psql variables are supported by the tab completion routine. Implementation is very simple, because we can use the function PQbackendPID. What would this be useful for? You can mostly do this using select pg_backend_pid() AS "BACKEND_PID" \gset
Re: proposal: psql: psql variable BACKEND_PID
On Thu, Feb 9, 2023 at 9:57 AM Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote: > On 03.02.23 11:41, Pavel Stehule wrote: > > We can simply allow access to the backend process id through a psql variable. > > I propose the name "BACKEND_PID". The advantages of usage are simple > > accessibility by the command \set, and less typing than using the function > > pg_backend_pid, because psql variables are supported by the tab completion > > routine. Implementation is very simple, because we can use the function > > PQbackendPID. > > What would this be useful for? > > You can mostly do this using > > select pg_backend_pid() AS "BACKEND_PID" \gset > There are two (three) motivations on my side. First, and the main one for me: I can use psql variable tab completion - just :B - which is significantly faster. Second: I can see all connection-related information with \set. Third: there is no reconnect hook in psql, so if you implement BACKEND_PID yourself, you have to remember to re-run the query with pg_backend_pid() after any reconnect or connection change. It is true that you can run "select pg_backend_pid() AS "BACKEND_PID" \gset" and store it in .psqlrc. But most of the time I am in a customer's environment, and I don't have the time or the possibility to do a complete setup of .psqlrc. It looks (to me) like a generally useful feature that should be available everywhere. Regards Pavel
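To illustrate the workaround and the reconnect caveat described above, a small hypothetical psql session (PIDs and database names are made up):

```
-- in ~/.psqlrc, executed once at psql startup:
SELECT pg_backend_pid() AS "BACKEND_PID" \gset

postgres=# \echo :BACKEND_PID
12345
postgres=# \c otherdb
You are now connected to database "otherdb" as user "postgres".
otherdb=# \echo :BACKEND_PID
12345
-- still the old PID: .psqlrc is only read at startup, so the variable is
-- stale until the SELECT ... \gset is run again on the new connection
```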
Re: A bug with ExecCheckPermissions
On 08.02.2023 21:23, Alvaro Herrera wrote: On 2023-Feb-08, Amit Langote wrote: On Wed, Feb 8, 2023 at 16:19 Alvaro Herrera wrote: I think we should also patch ExecCheckPermissions to use forboth(), scanning the RTEs as it goes over the perminfos, and make sure that the entries are consistent. Hmm, we can’t use forboth here, because not all RTEs have the corresponding RTEPermissionInfo, inheritance children RTEs, for example. Doh, of course. Also, it doesn’t make much sense to reinstate the original loop over range table and fetch the RTEPermissionInfo for the RTEs with non-0 perminfoindex, because the main goal of the patch was to make ExecCheckPermissions() independent of range table length. Yeah, I'm thinking in a mechanism that would allow us to detect bugs in development builds — no need to have it run in production builds. However, I can't see any useful way to implement it. Maybe something like the attached would do? -- Sergey Shinderukhttps://postgrespro.com/ diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 39bfb48dc22..94866743f48 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -577,6 +577,28 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos, ListCell *l; boolresult = true; +#ifdef USE_ASSERT_CHECKING + Bitmapset *indexset = NULL; + + /* Check that rteperminfos is consistent with rangeTable */ + foreach(l, rangeTable) + { + RangeTblEntry *rte = lfirst_node(RangeTblEntry, l); + + if (rte->perminfoindex != 0) + { + /* Sanity checks */ + (void) getRTEPermissionInfo(rteperminfos, rte); + /* Many-to-one mapping not allowed */ + Assert(!bms_is_member(rte->perminfoindex, indexset)); + indexset = bms_add_member(indexset, rte->perminfoindex); + } + } + + /* All rteperminfos are referenced */ + Assert(bms_num_members(indexset) == list_length(rteperminfos)); +#endif + foreach(l, rteperminfos) { RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l); diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c index 375b17b9f3b..f6d19226a0a 100644 --- a/src/backend/utils/adt/ri_triggers.c +++ b/src/backend/utils/adt/ri_triggers.c @@ -1399,6 +1399,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) pkrte->relid = RelationGetRelid(pk_rel); pkrte->relkind = pk_rel->rd_rel->relkind; pkrte->rellockmode = AccessShareLock; + pkrte->perminfoindex = 2; pk_perminfo = makeNode(RTEPermissionInfo); pk_perminfo->relid = RelationGetRelid(pk_rel); @@ -1409,6 +1410,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) fkrte->relid = RelationGetRelid(fk_rel); fkrte->relkind = fk_rel->rd_rel->relkind; fkrte->rellockmode = AccessShareLock; + fkrte->perminfoindex = 1; fk_perminfo = makeNode(RTEPermissionInfo); fk_perminfo->relid = RelationGetRelid(fk_rel);
Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals
When we try to generate qual variants with different nullingrels in deconstruct_distribute_oj_quals, we traverse all the JoinTreeItems and adjust qual nulling bits as we crawl up the join tree. For a SpecialJoinInfo which commutes with current sjinfo from below left, in the next level up it would null all the relids in its righthand. So we adjust qual nulling bits as below. /* * Adjust qual nulling bits for next level up, if needed. We * don't want to put sjinfo's own bit in at all, and if we're * above sjinfo then we did it already. */ if (below_sjinfo) quals = (List *) add_nulling_relids((Node *) quals, othersj->min_righthand, bms_make_singleton(othersj->ojrelid)); It seems to me there is oversight here. Actually in next level up this othersj would null all the relids in its syn_righthand, not only the relids in its min_righthand. If the quals happen to contain references to relids which are in othersj->syn_righthand but not in othersj->min_righthand, these relids would not get updated with othersj->ojrelid added. And this would cause qual nulling bits not consistent. I've managed to devise a query that can show this problem. create table t1(a int, b int); create table t2(a int, b int); create table t3(a int, b int); create table t4(a int, b int); insert into t1 select i, i from generate_series(1,10)i; insert into t2 select i, i from generate_series(1,10)i; insert into t3 select i, i from generate_series(1,1000)i; insert into t4 select i, i from generate_series(1,1000)i; analyze; select * from t1 left join (t2 left join t3 on t2.a > t3.a) on t1.b = t2.b left join t4 on t2.b = t3.b; This query would trigger the Assert() in search_indexed_tlist_for_var. So I wonder that we should use othersj->syn_righthand here. --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -2046,7 +2046,7 @@ deconstruct_distribute_oj_quals(PlannerInfo *root, if (below_sjinfo) quals = (List *) add_nulling_relids((Node *) quals, - othersj->min_righthand, + othersj->syn_righthand, bms_make_singleton(othersj->ojrelid)); Thanks Richard
Re: daitch_mokotoff module
Tomas Vondra writes: > On 2/8/23 15:31, Dag Lem wrote: >> Alvaro Herrera writes: >> >>> On 2023-Jan-17, Dag Lem wrote: >>> + * Daitch-Mokotoff Soundex + * + * Copyright (c) 2021 Finance Norway + * Author: Dag Lem >>> >>> Hmm, I don't think we accept copyright lines that aren't "PostgreSQL >>> Global Development Group". Is it okay to use that, and update the year >>> to 2023? (Note that answering "no" very likely means your patch is not >>> candidate for inclusion.) Also, we tend not to have "Author:" lines. >>> >> >> You'll have to forgive me for not knowing about this rule: >> >> grep -ER "Copyright.*[0-9]{4}" contrib/ | grep -v PostgreSQL >> >> In any case, I have checked with the copyright owner, and it would be OK >> to assign the copyright to "PostgreSQL Global Development Group". >> > > I'm not entirely sure what's the rule either, and I'm a committer. My > guess is these cases are either old and/or adding a code that already > existed elsewhere (like some of the double metaphone, for example), or > maybe both. But I'd bet we'd prefer not adding more ... > >> To avoid going back and forth with patches, how do you propose that the >> sponsor and the author of the contributed module should be credited? >> Woule something like this be acceptable? >> > > We generally credit contributors in two ways - by mentioning them in the > commit message, and by listing them in the release notes (for individual > features). > I'll ask again, would the proposed credits be acceptable? In this case, the code already existed elsewhere (as in your example for double metaphone) as a separate extension. The copyright owner is OK with copyright assignment, however I find it quite unreasonable that proper credits should not be given. Neither commit messages nor release notes follow the contributed module, which is in its entirety contributed by an external entity. I'll also point out that in addition to credits in code all over the place, PostgreSQL has much more prominent credits in the documentation: grep -ER "Author" doc/ | grep -v PostgreSQL "Author" is even documented as a top level section in the Reference Pages as "Author (only used in the contrib section)", see https://www.postgresql.org/docs/15/docguide-style.html#id-1.11.11.8.2 If there really exists some new rule which says that for new contributions under contrib/, credits should not be allowed in any way in either code or documentation (IANAL, but AFAIU this would be in conflict with laws on author's moral rights in several countries), then one would reasonably expect that you'd be upfront about this, both in documentation, and also as the very first thing when a contribution is first proposed for inclusion. Best regards Dag Lem
Re: A bug with ExecCheckPermissions
Hi, On Thu, Feb 9, 2023 at 14:44 Sergey Shinderuk wrote: > On 08.02.2023 21:23, Alvaro Herrera wrote: > > On 2023-Feb-08, Amit Langote wrote: > > > >> On Wed, Feb 8, 2023 at 16:19 Alvaro Herrera > wrote: > > > >>> I think we should also patch ExecCheckPermissions to use forboth(), > >>> scanning the RTEs as it goes over the perminfos, and make sure that the > >>> entries are consistent. > >> > >> Hmm, we can’t use forboth here, because not all RTEs have the > corresponding > >> RTEPermissionInfo, inheritance children RTEs, for example. > > > > Doh, of course. > > > >> Also, it doesn’t make much sense to reinstate the original loop over > >> range table and fetch the RTEPermissionInfo for the RTEs with non-0 > >> perminfoindex, because the main goal of the patch was to make > >> ExecCheckPermissions() independent of range table length. > > > > Yeah, I'm thinking in a mechanism that would allow us to detect bugs in > > development builds — no need to have it run in production builds. > > However, I can't see any useful way to implement it. > > > > > Maybe something like the attached would do? Thanks for the patch. Something like this makes sense to me. > -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
RE: Time delayed LR (WAS Re: logical replication restrictions)
Hi, On Thursday, February 9, 2023 4:56 PM Amit Kapila wrote: > On Thu, Feb 9, 2023 at 12:17 AM Peter Smith > wrote: > > > > On Wed, Feb 8, 2023 at 8:03 PM Hayato Kuroda (Fujitsu) > > wrote: > > > > > ... > > > > == > > > > > > > > src/backend/replication/logical/worker.c > > > > > > > > 2. maybe_apply_delay > > > > > > > > + if (wal_receiver_status_interval > 0 && diffms > > > > > + wal_receiver_status_interval * 1000L) { WaitLatch(MyLatch, > > > > + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, > > > > + wal_receiver_status_interval * 1000L, > > > > + WAIT_EVENT_RECOVERY_APPLY_DELAY); > send_feedback(last_received, > > > > + true, false, true); } > > > > > > > > I felt that introducing another variable like: > > > > > > > > long statusinterval_ms = wal_receiver_status_interval * 1000L; > > > > > > > > would help here by doing 2 things: > > > > 1) The condition would be easier to read because the ms units > > > > would be the same > > > > 2) Won't need * 1000L repeated in two places. > > > > > > > > Only, do take care to assign this variable in the right place in > > > > this loop in case the configuration is changed. > > > > > > Fixed. Calculations are done on two lines - first one is the > > > entrance of the loop, and second one is the after SIGHUP is detected. > > > > > > > TBH, I expected you would write this as just a *single* variable > > assignment before the condition like below: > > > > SUGGESTION (tweaked comment and put single assignment before > > condition) > > /* > > * Call send_feedback() to prevent the publisher from exiting by > > * timeout during the delay, when the status interval is greater than > > * zero. > > */ > > status_interval_ms = wal_receiver_status_interval * 1000L; if > > (status_interval_ms > 0 && diffms > status_interval_ms) { ... > > > > ~ > > > > I understand in theory, your code is more efficient, but in practice, > > I think the overhead of a single variable assignment every loop > > iteration (which is doing WaitLatch anyway) is of insignificant > > concern, whereas having one assignment is simpler than having two IMO. > > > > Yeah, that sounds better to me as well. OK, fixed. The comment adjustment suggested by Peter-san above was also included in this v33. Please have a look at the attached patch. Best Regards, Takamichi Osumi v33-0001-Time-delayed-logical-replication-subscriber.patch Description: v33-0001-Time-delayed-logical-replication-subscriber.patch
Re: Improve logging when using Huge Pages
On 2023-Feb-08, Justin Pryzby wrote: > I don't think it makes sense to run postgres -C huge_pages_active, > however, so I see no issue that that would always returns "false". Hmm, I would initialize it to return "unknown" rather than "off" — and make sure it turns "off" at the appropriate time. Otherwise you're just moving the confusion elsewhere. > If need be, maybe the documentation could say "indicates whether huge > pages are active for the running server". Dunno, that seems way too subtle. > Does anybody else want to vote for a function rather than a > RUNTIME_COMPUTED GUC ? I don't think I'd like to have SELECT show_block_size() et al, so I'd rather not go that way. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "¿Qué importan los años? Lo que realmente importa es comprobar que a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
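To make the two interfaces being weighed here concrete, a rough sketch (huge_pages_active is the proposed runtime-computed parameter from this thread, not an existing one; the function form is purely hypothetical, by analogy with show_block_size()):

```sql
-- GUC style, as in the patch under discussion:
SHOW huge_pages_active;                        -- 'on'/'off', or 'unknown' per the suggestion above
SELECT current_setting('huge_pages_active');

-- function style, the alternative being voted on:
SELECT huge_pages_active();                    -- hypothetical
```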
Re: ANY_VALUE aggregate
On 1/23/23 08:50, David Rowley wrote: On Thu, 19 Jan 2023 at 06:01, Vik Fearing wrote: Thank you for the review. Attached is a new version rebased to d540a02a72. I've only a bunch of nit-picks, personal preferences and random thoughts to offer as a review: 1. I'd be inclined *not* to mention the possible future optimisation in: + * Currently this just returns the first value, but in the future it might be + * able to signal to the aggregate that it does not need to be called anymore. I think it's unlikely that the transfn would "signal" such a thing. It seems more likely if we did anything about it that nodeAgg.c would maybe have some additional knowledge not to call that function if the agg state already has a value. Just so we're not preempting how we might do such a thing in the future, it seems best just to remove the mention of it. I don't really think it serves as a good reminder that we might want to do this one day anyway. Modified. My logic in having the transition function signal that it is finished is to one day allow something like: array_agg(x order by y limit z) 2. +any_value_trans(PG_FUNCTION_ARGS) Many of transition function names end in "transfn", not "trans". I think it's better to follow the existing (loosely followed) naming pattern that a few aggregates seem to follow rather than invent a new one. Renamed. 3. I tend to try to copy the capitalisation of keywords from the surrounding regression tests. I see the following breaks that. +SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v); (obviously, ideally, we'd always just follow the same capitalisation of keywords everywhere in each .sql file, but we've long broken that and the best way can do is be consistent with surrounding tests) Downcased. 4. I think I'd use the word "Returns" instead of "Chooses" in: +Chooses a non-deterministic value from the non-null input values. Done. 5. I've not managed to find a copy of the 2023 draft, so I'm assuming you've got the ignoring of NULLs correct. Yes, I do. This is part of , so SQL:2016 10.9 GR 7.a applies. 6. Is it worth adding a WindowFunc test somewhere in window.sql with an any_value(...) over (...)? Is what any_value() returns as a WindowFunc equally as non-deterministic as when it's used as an Aggref? Can we assume there's no guarantee that it'll return the same value for each partition in each row? Does the spec mention anything about that? This is governed by SQL:2016 10.9 GR 1.d and 1.e which defines the source rows for the aggregate: either a group or a window frame. There is no difference in behavior. I don't think a windowed test is useful here unless I were to implement moving transitions. I think that might be overkill for this function. 7. I wondered if it's worth adding a SupportRequestOptimizeWindowClause support function for this aggregate. I'm thinking that it might not be as likely people would use something more specific like first_value/nth_value/last_value instead of using any_value as a WindowFunc. Also, I'm currently thinking that a SupportRequestWFuncMonotonic for any_value() is not worth the dozen or so lines of code it would take to write it. I'm assuming it would always be a MONOTONICFUNC_BOTH function. It seems unlikely that someone would have a subquery with a WHERE clause in the upper-level query referencing the any_value() aggregate. Thought I'd mention both of these things anyway as someone else might think of some good reason we should add them that I didn't think of. 
I thought about this for a while and decided that it was not worthwhile. v4 attached. I am putting this back to Needs Review in the commitfest app, but these changes were editorial so it is probably RfC like Peter had set it. I will let you be the judge of that. -- Vik Fearing From 410751cfa6367e5436e20011f3f47f37888190a1 Mon Sep 17 00:00:00 2001 From: Vik Fearing Date: Thu, 9 Feb 2023 10:37:10 +0100 Subject: [PATCH v4] Implement ANY_VALUE aggregate SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an implementation-dependent (i.e. non-deterministic) value from the aggregated rows. --- doc/src/sgml/func.sgml | 14 ++ src/backend/catalog/sql_features.txt | 1 + src/backend/utils/adt/misc.c | 10 ++ src/include/catalog/pg_aggregate.dat | 4 src/include/catalog/pg_proc.dat | 8 src/test/regress/expected/aggregates.out | 24 src/test/regress/sql/aggregates.sql | 6 ++ 7 files changed, 67 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e09e289a43..8bdef6eb32 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19735,6 +19735,20 @@ SELECT NULLIF(value, '(none)') ... + + + + any_value + +any_value ( anyelement ) +same as input type + + +
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi, > Yes, and the fact is that cmin == cmax is something that we don't normally > produce Not sure if this is particularly relevant to this discussion but I can't resist noticing that the heap doesn't even store cmin and cmax... There is only HeapTupleHeaderData.t_cid and flags. cmin/cmax are merely smoke and mirrors we use to trick a user. And yes, the patch doesn't seem to break much mirrors: ``` =# create table t (a int unique, b int); =# insert into t values (1,1), (1,2) on conflict (a) do update set b = 0; =# SELECT xmin, xmax, cmin, cmax, * FROM t; xmin | xmax | cmin | cmax | a | b --+--+--+--+---+--- 731 |0 |0 |0 | 1 | 0 =# begin; =# insert into t values (2,1), (2,2), (3,1) on conflict (a) do update set b = 0; =# SELECT xmin, xmax, cmin, cmax, * FROM t; xmin | xmax | cmin | cmax | a | b --+--+--+--+---+--- 731 |0 |0 |0 | 1 | 0 732 |0 |0 |0 | 2 | 0 732 |0 |0 |0 | 3 | 1 =# insert into t values (2,1), (2,2), (3,1) on conflict (a) do update set b = 0; =# SELECT xmin, xmax, cmin, cmax, * FROM t; xmin | xmax | cmin | cmax | a | b --+--+--+--+---+--- 731 |0 |0 |0 | 1 | 0 732 | 732 |1 |1 | 2 | 0 732 | 732 |1 |1 | 3 | 0 =# commit; =# SELECT xmin, xmax, cmin, cmax, * FROM t; xmin | xmax | cmin | cmax | a | b --+--+--+--+---+--- 731 |0 |0 |0 | 1 | 0 732 | 732 |1 |1 | 2 | 0 732 | 732 |1 |1 | 3 | 0 ``` > That's a spectactularly wrong argument in almost all cases. Unless you have a > way to get to full branch coverage or use a model checker that basically does > the same, testing isn't going to give you a whole lot of confidence that you > haven't introduced bugs. But neither will reviewing a lot of code... > I've said my piece, as-is I vote to reject the patch. Fair enough. I'm merely saying that rejecting a patch because it doesn't include a TLA+ model is something novel :) > I don't buy your argument about DO UPDATE needing to be brought into > line with DO NOTHING. In any case I'm pretty sure that Tom's remarks > in 2016 about a behavioral inconsistencies (which you cited) actually > called for making DO NOTHING more like DO UPDATE -- not the other way > around. Interesting. Yep, we could use a bit of input from Tom on this one. This of course would break backward compatibility. But we can always invent something like: ``` INSERT INTO .. ON CONFLICT DO [NOTHING|UPDATE .. ] [ALLOWING|FORBIDDING] SELF CONFLICTS; ``` ... if we really want to. > problem. To me the best argument is also the simplest: who would want > us to allow it, and for what purpose? Good question. This arguably has little use for application developers. As an application developer you typically know your unique constraints and using this knowledge you can rewrite the query as needed and add any other accompanying logic. However, extension developers, as an example, often don't know the underlying unique constraints (more specifically, it's difficult to look for them and process them manually) and often have to process any garbage the application developer passes to an extension. This of course is applicable not only to extensions, but to any middleware between the DBMS and the application. -- Best regards, Aleksander Alekseev
RE: Exit walsender before confirming remote flush in logical replication
Dear Osumi-san, Thank you for reviewing! PSA new version. > (1) > > + Decides the condition for exiting the walsender process. > + 'wait_flush', which is the default, the walsender > + will wait for all the sent WALs to be flushed on the subscriber side, > + before exiting the process. 'immediate' will exit > + without confirming the remote flush. This may break the consistency > + between publisher and subscriber, but it may be useful for a system > + that has a high-latency network to reduce the amount of time for > + shutdown. > > (1-1) > > The first part "exiting the walsender process" can be improved. > Probably, you can say "the exiting walsender process" or > "Decides the behavior of the walsender process at shutdown" instead. Fixed. The second idea was chosen. > (1-2) > > Also, the next sentence can be improved something like > "If the shutdown mode is wait_flush, which is the default, the > walsender waits for all the sent WALs to be flushed on the subscriber side. > If it is immediate, the walsender exits without confirming the remote flush". Fixed. > (1-3) > > We don't need to wrap wait_flush and immediate by single quotes > within the literal tag. This style was ported from the SNAPSHOT options part, so I decided to keep it. > (2) > > + /* minapplydelay affects SHUTDOWN_MODE option */ > > I think we can move this comment to just above the 'if' condition > and combine it with the existing 'if' conditions comments. Moved and added some comments. > (3) 001_rep_changes.pl > > (3-1) Question > > In general, do we add this kind of check when we extend the protocol > (STREAM_REPLICATION command) > or add a new condition for apply worker exit ? > In case when we would like to know the restart of the walsender process in TAP > tests, > then could you tell me why the new test code matches the purpose of this > patch ? The replication command is not meant for normal users, so I think we don't have to test it directly. The check that waits for the apply worker to restart was added to improve robustness. There is a possibility that the test fails when the apply worker receives a transaction before it checks the new subscription option. Now the failure can be avoided by confirming the reload of pg_subscription and the restart. > (3-2) > > + "Timed out while waiting for apply to restart after changing > min_apply_delay to non-zero value"; > > Probably, we can partly change this sentence like below, because we check > walsender's pid. > FROM: "... while waiting for apply to restart..." > TO: "... while waiting for the walsender to restart..." Right, fixed. Best Regards, Hayato Kuroda FUJITSU LIMITED v6-0001-Time-delayed-logical-replication-subscriber.patch Description: v6-0001-Time-delayed-logical-replication-subscriber.patch v6-0002-Extend-START_REPLICATION-command-to-accept-walsen.patch Description: v6-0002-Extend-START_REPLICATION-command-to-accept-walsen.patch
RE: Exit walsender before confirming remote flush in logical replication
Dear Sawada-san, Thank you for reviewing! > +/* > + * Options for controlling the behavior of the walsender. Options can be > + * specified in the START_STREAMING replication command. Currently only one > + * option is allowed. > + */ > +typedef struct > +{ > +WalSndShutdownMode shutdown_mode; > +} WalSndOptions; > + > +static WalSndOptions *my_options = NULL; > > I'm not sure we need to have it as a struct at this stage since we > support only one option. I wonder if we can have one value, say > shutdown_mode, and we can make it a struct when we really need it. > Even if we use WalSndOptions struct, I don't think we need to > dynamically allocate it. Since a walsender can start logical > replication multiple times in principle, my_options is not freed. +1, removed the structure. > --- > +/* > + * Parse given shutdown mode. > + * > + * Currently two values are accepted - "wait_flush" and "immediate" > + */ > +static void > +ParseShutdownMode(char *shutdownmode) > +{ > +if (pg_strcasecmp(shutdownmode, "wait_flush") == 0) > +my_options->shutdown_mode = > WALSND_SHUTDOWN_MODE_WAIT_FLUSH; > +else if (pg_strcasecmp(shutdownmode, "immediate") == 0) > +my_options->shutdown_mode = > WALSND_SHUTDOWN_MODE_IMMIDEATE; > +else > +ereport(ERROR, > +errcode(ERRCODE_SYNTAX_ERROR), > +errmsg("SHUTDOWN_MODE requires > \"wait_flush\" or \"immediate\"")); > +} > > I think we should make the error message consistent with other enum > parameters. How about the message like: > > ERROR: invalid value shutdown mode: "%s" Modified like enum parameters and hint message was also provided. New patch is attached in [1]. [1]: https://www.postgresql.org/message-id/TYAPR01MB586683FC450662990E356A0EF5D99%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi, ``` =# commit; =# SELECT xmin, xmax, cmin, cmax, * FROM t; xmin | xmax | cmin | cmax | a | b --+--+--+--+---+--- 731 |0 |0 |0 | 1 | 0 732 | 732 |1 |1 | 2 | 0 732 | 732 |1 |1 | 3 | 0 ``` Oops, you got me :) This of course isn't right - the xmax transaction is committed but we still see the data, etc. If we really are going to work on this, this part is going to require more work. -- Best regards, Aleksander Alekseev
Re: Rework LogicalOutputPluginWriterUpdateProgress (WAS Re: Logical replication timeout ...)
On Thu, Feb 9, 2023 at 11:21 AM Amit Kapila wrote: > > > How about renaming ProcessPendingWrites to WaitToSendPendingWrites or > WalSndWaitToSendPendingWrites? > How about renaming WalSndUpdateProgress() to WalSndUpdateProgressAndSendKeepAlive() or WalSndUpdateProgressAndKeepAlive()? One thing to note about the changes we are discussing here is that some of the plugins like wal2json already call OutputPluginUpdateProgress in their commit callback. They may need to update it accordingly. One difference I see with the patch is that I think we will end up sending keepalive for empty prepared transactions even though we don't skip sending begin/prepare messages for those. The reason why we don't skip sending prepare for empty 2PC xacts is that if the WALSender restarts after the PREPARE of a transaction and before the COMMIT PREPARED of the same transaction then we won't be able to figure out if we have skipped sending BEGIN/PREPARE of a transaction. To skip sending prepare for empty xacts, we previously thought of some ideas like (a) At commit-prepare time have a check on the subscriber-side to know whether there is a corresponding prepare for it before actually doing commit-prepare but that sounded costly. (b) somehow persist the information whether the PREPARE for a xact is already sent and then use that information for commit prepared but again that also didn't sound like a good idea. -- With Regards, Amit Kapila.
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi, On 2023-02-09 13:06:04 +0300, Aleksander Alekseev wrote: > > Yes, and the fact is that cmin == cmax is something that we don't normally > > produce > > Not sure if this is particularly relevant to this discussion but I > can't resist noticing that the heap doesn't even store cmin and > cmax... There is only HeapTupleHeaderData.t_cid and flags. cmin/cmax > are merely smoke and mirrors we use to trick a user. No, they're not just that. Yes, cmin/cmax aren't both stored on-disk, but if both are needed, they *are* stored in-memory. We can do that because it's only ever needed from within a transaction. > > That's a spectactularly wrong argument in almost all cases. Unless you have > > a > > way to get to full branch coverage or use a model checker that basically > > does > > the same, testing isn't going to give you a whole lot of confidence that you > > haven't introduced bugs. > > But neither will reviewing a lot of code... And yet my review did figure out that your patch would have visibility problems, which you did end up having, as you noticed yourself downthread :) > > I've said my piece, as-is I vote to reject the patch. > > Fair enough. I'm merely saying that rejecting a patch because it > doesn't include a TLA+ model is something novel :) I obviously am not suggesting that (although some things could probably benefit). Just that not having an example showing something working, isn't sufficient to consider something suspicious OK. And changes affecting heapam.c visibility semantics need extremely careful review, I have the battle scars to prove that to be true :P. Greetings, Andres Freund
Re: [PATCH] Add pretty-printed XML output option
On 02.02.23 21:35, Jim Jones wrote: This small patch introduces a XML pretty print function. It basically takes advantage of the indentation feature of xmlDocDumpFormatMemory from libxml2 to format XML strings. I suggest we call it "xmlformat", which is an established term for this.
Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
Hi hackers, Please find attached a patch proposal for $SUBJECT. The idea has been raised in [1] by Andres: it would allow to simplify even more the work done to generate pg_stat_get_xact*() functions with Macros. Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions (making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or not). Looking forward to your feedback, Regards -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com [1]: https://www.postgresql.org/message-id/20230111225907.6el6c5j3hukizqxc%40awork3.anarazel.dediff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 0fa5370bcd..a08a2c8e66 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -291,7 +291,7 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Function), .shared_data_off = offsetof(PgStatShared_Function, stats), .shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats), - .pending_size = sizeof(PgStat_BackendFunctionEntry), + .pending_size = sizeof(PgStat_FunctionCounts), .flush_pending_cb = pgstat_function_flush_cb, }, diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c index 6139a27fee..0fdcefb783 100644 --- a/src/backend/utils/activity/pgstat_function.c +++ b/src/backend/utils/activity/pgstat_function.c @@ -73,7 +73,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo, PgStat_FunctionCallUsage *fcu) { PgStat_EntryRef *entry_ref; - PgStat_BackendFunctionEntry *pending; + PgStat_FunctionCounts *pending; boolcreated_entry; if (pgstat_track_functions <= fcinfo->flinfo->fn_stats) @@ -121,10 +121,10 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo, pending = entry_ref->pending; - fcu->fs = &pending->f_counts; + fcu->fs = pending; /* save stats for this function, later used to compensate for recursion */ - fcu->save_f_total_time = pending->f_counts.f_total_time; + fcu->save_f_total_time = pending->f_total_time; /* save current backend-wide total time */ fcu->save_total = total_func_time; @@ -192,10 +192,10 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize) bool pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait) { - PgStat_BackendFunctionEntry *localent; + PgStat_FunctionCounts *localent; PgStatShared_Function *shfuncent; - localent = (PgStat_BackendFunctionEntry *) entry_ref->pending; + localent = (PgStat_FunctionCounts *) entry_ref->pending; shfuncent = (PgStatShared_Function *) entry_ref->shared_stats; /* localent always has non-zero content */ @@ -203,11 +203,11 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait) if (!pgstat_lock_entry(entry_ref, nowait)) return false; - shfuncent->stats.f_numcalls += localent->f_counts.f_numcalls; + shfuncent->stats.f_numcalls += localent->f_numcalls; shfuncent->stats.f_total_time += - INSTR_TIME_GET_MICROSEC(localent->f_counts.f_total_time); + INSTR_TIME_GET_MICROSEC(localent->f_total_time); shfuncent->stats.f_self_time += - INSTR_TIME_GET_MICROSEC(localent->f_counts.f_self_time); + INSTR_TIME_GET_MICROSEC(localent->f_self_time); pgstat_unlock_entry(entry_ref); @@ -215,11 +215,11 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait) } /* - * find any existing PgStat_BackendFunctionEntry entry for specified function + * find any existing 
PgStat_FunctionCounts entry for specified function * * If no entry, return NULL, don't create a new one */ -PgStat_BackendFunctionEntry * +PgStat_FunctionCounts * find_funcstat_entry(Oid func_id) { PgStat_EntryRef *entry_ref; diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index 2e20b93c20..b33965c581 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -466,14 +466,30 @@ PgStat_TableStatus * find_tabstat_entry(Oid rel_id) { PgStat_EntryRef *entry_ref; + PgStat_TableStatus *tablestatus = NULL; entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id); if (!entry_ref) entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id); if (entry_ref) - return entry_ref->pending; - return NULL; + { + PgStat_TableSta
Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent
Hi, > And yet my review did figure out that your patch would have visibility > problems, which you did end up having, as you noticed yourself downthread :) Yep, this particular implementation turned out to be buggy. >> I don't buy your argument about DO UPDATE needing to be brought into >> line with DO NOTHING. In any case I'm pretty sure that Tom's remarks >> in 2016 about a behavioral inconsistencies (which you cited) actually >> called for making DO NOTHING more like DO UPDATE -- not the other way >> around. > > Interesting. Yep, we could use a bit of input from Tom on this one. > > This of course would break backward compatibility. But we can always > invent something like: > > ``` > INSERT INTO .. > ON CONFLICT DO [NOTHING|UPDATE .. ] > [ALLOWING|FORBIDDING] SELF CONFLICTS; > ``` > > ... if we really want to. I suggest we discuss whether we even want to support something like this before proceeding further, and then think about a particular implementation if necessary. One thing that occurred to me during the discussion is that we don't necessarily have to physically write one tuple at a time to the heap. Alternatively, we could use information about the existing unique constraints and write only the needed tuples. > However, extension developers, as an example, often don't know the > underlying unique constraints (more specifically, it's difficult to > look for them and process them manually) and often have to process any > garbage the application developer passes to an extension. > > This of course is applicable not only to extensions, but to any > middleware between the DBMS and the application. This however is arguably a niche use case. So maybe we don't want to spend time on this. -- Best regards, Aleksander Alekseev
Re: [PATCH] Compression dictionaries for JSONB
Hi Andres, > > So to clarify, are we talking about tuple-level compression? Or > > perhaps page-level compression? > > Tuple level. > although my own patch proposed attribute-level compression, not > tuple-level one, it is arguably closer to tuple-level approach than > page-level one Just wanted to make sure that by tuple-level we mean the same thing. When saying tuple-level do you mean that the entire tuple should be compressed as one large binary (i.e. similarly to page-level compression but more granularly), or every single attribute should be compressed separately (similarly to how TOAST does this)? -- Best regards, Aleksander Alekseev
Re: Support logical replication of DDLs
I happened to notice that MINVFUNC in 0003 displays like this "fmt": "MINVFUNC==%{type}T", in some cases; this appears in the JSON that's emitted by the regression tests at some point. How can we detect this kind of thing, so that these mistakes become self-evident? I thought the intention of the regress module was to run the deparsed code, so the syntax error should have become obvious. ... Oh, I see the problem. There are two 'fmt' lines for that clause (and many others), one of which is used when the clause is not present. So there's never a syntax error, because this one never expands other than to empty. AFAICS this defeats the purpose of the 'present' field. I mean, if the clause is never to deparse, then why have it there in the first place? If we want to have it, then it has to be correct. I think we should design the code to avoid the repetition, because that has an inherent risk of typo bugs and such. Maybe we should set forth policy that each 'fmt' string should appear in the source code only once. So instead of this + /* MINVFUNC */ + if (OidIsValid(agg->aggminvtransfn)) + tmp = new_objtree_VA("MINVFUNC=%{type}T", 1, +"type", ObjTypeObject, + new_objtree_for_qualname_id(ProcedureRelationId, + agg->aggminvtransfn)); + else + { + tmp = new_objtree("MINVFUNC==%{type}T"); + append_bool_object(tmp, "present", false); + } we would have something like tmp = new_objtree("MINVFUNC=%{type}T"); if (OidIsValid(agg->aggminvtransfn)) { append_bool_object(tmp, "present", true); append...(tmp, "type", new_objtree_for_qualname_id(ProcedureRelationId, agg->aggminvtransfn)); } else { append_bool_object(tmp, "present", false); } -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "No necesitamos banderas No reconocemos fronteras" (Jorge González)
Re: [PATCH] Compression dictionaries for JSONB
Hi, On February 9, 2023 2:50:57 AM PST, Aleksander Alekseev wrote: >Hi Andres, > >> > So to clarify, are we talking about tuple-level compression? Or >> > perhaps page-level compression? >> >> Tuple level. > >> although my own patch proposed attribute-level compression, not >> tuple-level one, it is arguably closer to tuple-level approach than >> page-level one > >Just wanted to make sure that by tuple-level we mean the same thing. > >When saying tuple-level do you mean that the entire tuple should be >compressed as one large binary (i.e. similarly to page-level >compression but more granularly), or every single attribute should be >compressed separately (similarly to how TOAST does this)? Good point - should have been clearer. I meant attribute wise compression. Like we do today, except that we would use a dictionary to increase compression rates. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
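For context, a small sketch of what attribute-wise compression means in terms of today's user-visible knob (the dictionary-aware method below is hypothetical, not an existing option):

```sql
-- today: the compression method is chosen per attribute, with no shared
-- dictionary (lz4 assumes a build with --with-lz4; pglz always works)
ALTER TABLE docs ALTER COLUMN body SET COMPRESSION lz4;

-- the idea discussed here keeps the same per-attribute granularity, but the
-- method would consult a shared dictionary to improve compression ratios:
-- ALTER TABLE docs ALTER COLUMN body SET COMPRESSION zstd_with_dictionary;  -- hypothetical
```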
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Feb 9, 2023 at 2:08 PM Masahiko Sawada wrote: > I think it's still important for lazy vacuum that an iteration over a > TID store returns TIDs in ascending order, because otherwise a heap > vacuum does random writes. That being said, we can have > RT_ITERATE_NEXT() return key-value pairs in an order regardless of how > the key chunks are stored in a node. Okay, we can keep that possibility in mind if we need to go there. > > Note: The regression seems to have started in v17, which is the first with a full template. > > 0007 removes some dead code. RT_NODE_INSERT_INNER is only called during RT_SET_EXTEND, so it can't possibly find an existing key. This kind of change is much easier with the inner/node cases handled together in a template, as far as being sure of how those cases are different. I thought about trying the search in assert builds and verifying it doesn't exist, but thought yet another #ifdef would be too messy. It just occurred to me that these facts might be related. v17 was the first use of the full template, and I decided then I liked one of your earlier patches where replace_node() calls node_update_inner() better than calling node_insert_inner() with a NULL parent, which was a bit hard to understand. That now-dead code was actually used in the latter case for updating the (original) parent. It's possible that trying to use separate paths contributed to the regression. I'll try the other way and report back. > I've attached the patch that adds a simple benchmark test using > TidStore. With this test, I got similar trends of results to yours > with gcc, but I've not analyzed them in depth yet. Thanks for that! I'll take a look. > BTW it would be better to remove the RT_DEBUG macro from bench_radix_tree.c. Absolutely. -- John Naylor EDB: http://www.enterprisedb.com
Re: pg_stat_statements and "IN" conditions
On 07.02.23 21:14, Sergei Kornilov wrote: It seems a little strange to me that with const_merge_threshold = 1, such a test case gives the same result as with const_merge_threshold = 2. What is the point of making this a numeric setting? Either you want to merge all values or you don't want to merge any values.
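As a concrete illustration of what is being merged (queries are illustrative; the exact normalized form shown in pg_stat_statements depends on the patch):

```sql
-- two queries that differ only in the number of constants in the IN list:
SELECT * FROM test WHERE a IN (1, 2);
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);

-- with merging enabled, both would collapse into a single pg_stat_statements
-- entry; the question above is whether a numeric threshold ("merge only
-- lists longer than N constants") buys anything over a simple on/off switch.
```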
Re: Introduce a new view for checkpointer related stats
On Thu, Feb 9, 2023 at 12:33 PM Andres Freund wrote: > > Hi, Thanks for looking at this. > On 2023-02-09 12:21:51 +0530, Bharath Rupireddy wrote: > > @@ -1105,18 +1105,22 @@ CREATE VIEW pg_stat_archiver AS > > > > CREATE VIEW pg_stat_bgwriter AS > > SELECT > > -pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, > > -pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, > > -pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, > > -pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, > > -pg_stat_get_bgwriter_buf_written_checkpoints() AS > > buffers_checkpoint, > > pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, > > pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, > > -pg_stat_get_buf_written_backend() AS buffers_backend, > > -pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, > > pg_stat_get_buf_alloc() AS buffers_alloc, > > pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; > > > > +CREATE VIEW pg_stat_checkpointer AS > > +SELECT > > +pg_stat_get_timed_checkpoints() AS timed_checkpoints, > > +pg_stat_get_requested_checkpoints() AS requested_checkpoints, > > +pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, > > +pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, > > +pg_stat_get_buf_written_checkpoints() AS > > buffers_written_checkpoints, > > +pg_stat_get_buf_written_backend() AS buffers_written_backend, > > +pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend, > > +pg_stat_get_checkpointer_stat_reset_time() AS stats_reset; > > + > > I don't think the backend written stats belong more accurately in > pg_stat_checkpointer than pg_stat_bgwriter. We accumulate buffers_written_backend and buffers_fsync_backend of all backends under checkpointer stats to show the aggregated results to the users. I think this is correct because the checkpointer is the one that processes fsync requests (of course, backends themselves can fsync when needed, that's what the buffers_fsync_backend shows), whereas bgwriter doesn't perform IIUC. > I continue to be worried about breaking just about any postgres monitoring > setup. Hm. Yes, it requires minimal and straightforward changes in monitoring scripts. Please note that we separated out bgwriter and checkpointer in v9.2 12 years ago but we haven't had a chance to separate the stats so far. We might do it at some point of time, IMHO this is that time. We did away with promote_trigger_file (cd4329d) very recently. The agreement was that the changes required to move on to other mechanisms of promotion are minimal, hence we didn't want it to be first deprecated and then removed. >From the discussion upthread, it looks like Robert, Amit, Bertrand, Greg and myself are in favour of not having a deprecated version but moving them to the new pg_stat_checkpointer view. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
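As a rough illustration of the monitoring-script impact, using the column names from the quoted view definitions above:

```sql
-- before the patch: checkpoint counters live in pg_stat_bgwriter
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;

-- with the patch: the same counters move (and are renamed) in pg_stat_checkpointer
SELECT timed_checkpoints, requested_checkpoints, buffers_written_checkpoints
FROM pg_stat_checkpointer;
```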
Re: Weird failure with latches in curculio on v15
On Wed, Feb 8, 2023 at 7:24 PM Nathan Bossart wrote: > On Thu, Feb 09, 2023 at 08:56:24AM +0900, Michael Paquier wrote: > > On Wed, Feb 08, 2023 at 02:25:54PM -0800, Nathan Bossart wrote: > >> These are all good points. Perhaps there could be a base archiver > >> implementation that shell_archive uses (and that other modules could use if > >> desired, which might be important for backward compatibility with the > >> existing callbacks). But if you want to do something fancier than > >> archiving sequentially, you could write your own. > > > > Which is basically the kind of things you can already achieve with a > > background worker and a module of your own? > > IMO one of the big pieces that's missing is a way to get the next N files > to archive. Right now, you'd have to trawl through archive_status on your > own if you wanted to batch/parallelize. I think one advantage of what > Robert is suggesting is that we could easily provide a supported way to get > the next set of files to archive, and we can asynchronously mark them > "done". Otherwise, each module has to implement this. Right. I think that we could certainly, as Michael suggests, have people provide their own background worker rather than having the archiver invoke the user-supplied code directly. As long as the functions that you need in order to get the necessary information can be called from some other process, that's fine. The only difficulty I see is that if the archiving is happening from a separate background worker rather than from the archiver, then what is the archiver doing? We could somehow arrange to not run the archiver process at all, or I guess to just sit there and have it do nothing. Or, we can decide not to have a separate background worker and just have the archiver call the user-supplied code directly. I kind of like that approach at the moment; it seems more elegant to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Use appropriate wait event when sending data in the apply worker
On Mon, Feb 6, 2023 at 11:40 PM Amit Kapila wrote: > Use appropriate wait event when sending data in the apply worker. > > Currently, we reuse WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE in the > apply worker while sending data to the parallel apply worker via a shared > memory queue. This is not appropriate as one won't be able to distinguish > whether the worker is waiting for sending data or for the state change. > > The patch instead uses the wait event WAIT_EVENT_MQ_SEND which has been > already used in blocking mode while sending data via a shared memory > queue. This is not right at all. You should invent a new wait state if you're waiting in a new place. -- Robert Haas EDB: http://www.enterprisedb.com
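To illustrate the kind of thing being asked for -- reporting a wait event specific to the new wait point rather than reusing an unrelated one -- here is a minimal sketch, with the event name invented for illustration (this is not the committed fix):

    /* Wait for the parallel apply worker to drain the queue, reporting a
     * dedicated wait event for this particular wait point.
     * WAIT_EVENT_LOGICAL_APPLY_SEND_DATA is a made-up name standing in for
     * whatever new event would be defined. */
    int         rc;

    rc = WaitLatch(MyLatch,
                   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                   1000L /* ms */ ,
                   WAIT_EVENT_LOGICAL_APPLY_SEND_DATA);
    if (rc & WL_LATCH_SET)
    {
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }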
Re: AW: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
On 2/9/23 10:03, Hans Buschmann wrote: > Hello Tomas, > > > Thank you for looking at it. > > > First, I miscalculated the factor which should be about 50, not 500. Sorry. > > Then I want to show you the table definitions (simple, very similar, > omitted child_tables and additional indexes, here using always "ONLY"): > > cpsdb_matcol=# \d sa_upper; > Tabelle »public.sa_upper« > Spalte | Typ | Sortierfolge | NULL erlaubt? | > Vorgabewert > --+---+--+---+-- > id_sup | integer | | not null | > generated by default as identity > sup_season | smallint | | | > sup_sa_code | character varying(10) | C | | > sup_mat_code | character varying(4) | C | | > sup_clr_code | character varying(3) | C | | > Indexe: > "sa_upper_active_pkey" PRIMARY KEY, btree (id_sup) > > > cpsdb_matcol=# \d sa_lining+; > Tabelle »public.sa_lining« > Spalte | Typ | Sortierfolge | NULL erlaubt? | > Vorgabewert > --+---+--+---+-- > id_sli | integer | | not null | > generated by default as identity > sli_season | smallint | | | > sli_sa_code | character varying(10) | C | | > sli_mat_code | character varying(4) | C | | > sli_clr_code | character varying(3) | C | | > Indexe: > "sa_lining_active_pkey" PRIMARY KEY, btree (id_sli) > > > cpsdb_matcol=# \d sa_insole; > Tabelle »public.sa_insole« > Spalte | Typ | Sortierfolge | NULL erlaubt? | > Vorgabewert > --+---+--+---+-- > id_sin | integer | | not null | > generated by default as identity > sin_season | smallint | | | > sin_sa_code | character varying(10) | C | | > sin_mat_code | character varying(4) | C | | > sin_clr_code | character varying(3) | C | | > Indexe: > "sa_insole_active_pkey" PRIMARY KEY, btree (id_sin) > > > cpsdb_matcol=# \d sa_outsole; > Tabelle »public.sa_outsole« > Spalte | Typ | Sortierfolge | NULL erlaubt? | > Vorgabewert > --+---+--+---+-- > id_sou | integer | | not null | > generated by default as identity > sou_season | smallint | | | > sou_sa_code | character varying(10) | C | | > sou_mat_code | character varying(4) | C | | > sou_clr_code | character varying(3) | C | | > Indexe: > "sa_outsole_active_pkey" PRIMARY KEY, btree (id_sou) > > The xxx_target tables are very similar; here is the upper one as an example: > They are count_aggregates of the whole dataset, where > up_mat_code=sup_mat_code etc. > > cpsdb_matcol=# \d upper_target > Tabelle »admin.upper_target« > Spalte | Typ | Sortierfolge | NULL erlaubt? | Vorgabewert > -+--+--+---+- > id_up | smallint | | | > nup | integer | | | > up_mat_code | text | C | | > > > > I have reworked the two queries to show their complete explain plans: > > 1. query with left join in the qupd CTE: > > \set only 'ONLY' > > cpsdb_matcol=# explain analyze -- explain analyze verbose -- explain -- > select * from ( -- select count(*) from ( -- select length(sel) from ( > cpsdb_matcol-# with > cpsdb_matcol-# qup as ( > cpsdb_matcol(# select > cpsdb_matcol(# curr_season -- all xxx_seasosn are always smallint > cpsdb_matcol(# ,curr_code-- all xx_code are always varchar(10) > cpsdb_matcol(# ,array_agg(id_up order by > id_up)||array_fill(0::smallint,array[10]) as mat_arr > cpsdb_matcol(# ,array_agg(curr_mat_code order by id_up) as matcode_arr > cpsdb_matcol(# ,bit_or(imask) as ibitmask > cpsdb_matcol(# from( > cpsdb_matcol(# select > cpsdb_matcol(# sup_season as curr_season > cpsdb_matcol(# ,sup_sa_code as curr_code > cpsdb_matcol(# ,sup_mat_code as curr_mat_code > cpsdb_matcol(# ,sup_clr_code as curr_clr_code > cpsdb_matcol(# ,id_up > cpsdb_matcol(# ,
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
On 08.02.23 23:18, Tom Lane wrote: I pushed the discussed documentation improvements, and changed the behavior of "ninja docs" to only build the HTML docs. I don't like this change. Now the default set of docs is different between the make builds and the meson builds. And people will be less likely to make sure the man pages still build. What's wrong with just typing "ninja html"?
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Horiguchi-san, Thank you for checking! The patch will be attached to another mail. > At Fri, 27 Jan 2023 06:57:01 +, "Hayato Kuroda (Fujitsu)" > wrote in > > I found cfbot failure, PSA fixed version. > > + Unlike , this function checks socket > + health. This check is performed by polling the socket. This function > is > + currently available only on systems that support the non-standard > + POLLRDHUP extension to the > poll system > > I find it quite confusing that we have pqSocketCheck and PQconnCheck, > that do almost the same thing... Since pqSocketCheck is a static > function, we can modify the function as we like. Renamed it to pqSocketIsReadableOrWritableOrValid(), but that name seems rather awkward... > I still don't understand why we need pqconnCheck_internal separate > from pqSocketPoll(), and PQconnCheck from pqSocketCheck. pqconnCheck_internal() was combined into pqSocketPoll(), but PQconnCheck() still exists. libpq-fe.h does not include standard header files except stdio.h. I'm not sure whether we can add an inclusion of time.h, because some platforms may not have the header, which could break compatibility. If there is no such system, we may be able to export pqSocketCheck() and remove PQconnCheck(). The side effect of this change is that the code becomes messier when we add kqueue() support... > https://www.postgresql.org/message-id/flat/TYAPR01MB58665BF23D38EDF1 > 0028DE2AF5299%40TYAPR01MB5866.jpnprd01.prod.outlook.com#47d21431bf9 > fa94f763c824f6e81fa54 > > IIUC, pqSocketCheck () calls pqSocketPoll(), > > and in the pqSocketPoll() we poll()'d the POLLIN or POLLOUT event. > > But according to [1], we must wait POLLRDHUP event, > > so we cannot reuse it straightforward. > > Yeah, I didn't suggest to use the function as-is. Couldn't we extend > the function by letting it accept end_time = 0 && !forRead && > !forWrite, not causing side effects? Modified accordingly. Is this what you expected? Best Regards, Hayato Kuroda FUJITSU LIMITED
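For readers following along, here is a minimal standalone sketch of the kind of check under discussion -- polling a socket for the Linux-specific POLLRDHUP event to detect a peer-initiated close. The names are invented; this is illustrative only, not the actual libpq code:

    #define _GNU_SOURCE             /* for POLLRDHUP on Linux */
    #include <errno.h>
    #include <poll.h>

    /* Returns 1 if the peer appears to have closed the connection, 0 if the
     * socket still looks healthy, -1 if poll() itself failed. */
    static int
    socket_closed_by_peer(int sock)
    {
        struct pollfd pfd;
        int     rc;

        pfd.fd = sock;
        pfd.events = POLLRDHUP;
        pfd.revents = 0;

        do
        {
            rc = poll(&pfd, 1, 0);  /* timeout 0: never block */
        } while (rc < 0 && errno == EINTR);

        if (rc < 0)
            return -1;
        if (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR | POLLNVAL))
            return 1;
        return 0;
    }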
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Katsuragi-san, Thank you for reviewing! PSA new version patches. > 0001: > + while (result < 0 && errno == EINTR); > + > + if (result < 0) > + return -1; > > this `return -1` is not indented properly. This part is no longer needed. Please see another discussion[1]. > 0002: > +postgres_fdw_verify_connection_states(server_name > text) returns boolean > ... > + extension to the poll system call, including > Linux. This > + returns true if checked connections are still > valid, > + or the checking is not supported on this platform. > false > + is returned if the local session seems to be disconnected from > other > + servers. Example usage of the function: > > Here, 'still valid' seems a little bit confusing because this 'valid' is > not > the same as postgres_fdw_get_connections's 'valid' [1]. > > Should 'still valid' be 'existing connection is not closed by the remote > peer'? > But this description does not cover all the cases where this function > returns true... > I think one choice is to write all the cases like 'returns true if any > of the > following condition is satisfied. 1) existing connection is not closed > by the > remote peer 2) there is no connection for specified server yet 3) the > checking > is not supported...'. If my understanding is not correct, please point > it out. Modified like you pointed out. > BTW, is it reasonable to return true if ConnectionHash is not > initialized or > there is no ConnCacheEntry for specified remote server? What do you > think > about returning NULL in that case? I'm not sure which one is better, but modified accordingly. > 0003: > I think it is better that the test covers all the new functions. > How about adding a test for postgres_fdw_verify_connection_states_all? > > > +-- > === > > +-- test for postgres_fdw_verify_foreign_servers function > +-- > === > > > Writing all the functions' name like 'test for > postgres_fdw_verify_connection_states > and postgres_fdw_can_verify_connection_states' looks straightforward. > What do you think about this? Added. > 0004: > Sorry, I have not read 0004. I will read. No problem:-). [1]: https://www.postgresql.org/message-id/TYAPR01MB58664E039F45959AB321FA1FF5D99%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED v31-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch Description: v31-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch v31-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch Description: v31-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch v31-0003-add-test.patch Description: v31-0003-add-test.patch v31-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch Description: v31-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch
Re: ICU locale validation / canonicalization
On 08.02.23 08:59, Jeff Davis wrote: The overall benefit here is that we keep our catalogs consistently using an independent standard format for ICU locale strings, rather than whatever the user specifies. That makes it less likely that ICU needs to use any fallback logic when trying to open a collator, which could only be bad news. One use case is that if a user specifies a locale, say, of 'de-AT', this might canonicalize to 'de' today, but we should still store what the user specified because 1) that documents what the user wanted, and 2) it might not canonicalize to the same thing tomorrow.
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Peter Eisentraut writes: > On 08.02.23 23:18, Tom Lane wrote: >> I pushed the discussed documentation improvements, and changed the >> behavior of "ninja docs" to only build the HTML docs. > I don't like this change. Now the default set of docs is different > between the make builds and the meson builds. And people will be less > likely to make sure the man pages still build. What? The default behavior of "make" has been to build only the html docs for many years. And I've never ever seen a case where the html docs build and the man pages don't. > What's wrong with just typing "ninja html"? Don't really care how the command is spelled, but there needs to be a convenient way to get that behavior. regards, tom lane
Re: pg_stat_statements and "IN" conditions
> On Tue, Feb 07, 2023 at 11:14:52PM +0300, Sergei Kornilov wrote: > Hello! Thanks for reviewing. > Unfortunately, rebase is needed again due to recent changes in > queryjumblefuncs ( 9ba37b2cb6a174b37fc51d0649ef73e56eae27fc ) Yep, my favourite game, rebaseball. Will post a new version soon, after figuring out all the recent questions. > It seems a little strange to me that with const_merge_threshold = 1, such a > test case gives the same result as with const_merge_threshold = 2 > > select pg_stat_statements_reset(); > set const_merge_threshold to 1; > select * from test where i in (1,2,3); > select * from test where i in (1,2); > select * from test where i in (1); > select query, calls from pg_stat_statements order by query; > > query| calls > -+--- > select * from test where i in (...) | 2 > select * from test where i in ($1) | 1 > > Probably const_merge_threshold = 1 should produce only "i in (...)"? Well, it's not intentional, probably I need to be more careful with off-by-one. Although I agree to a certain extent with Peter questioning the value of having numerical option here, let me think about this. > const_merge_threshold is "the minimal length of an array" (more or equal) or > "array .. length is larger than" (not equals)? I think the documentation is > ambiguous in this regard. > > I also noticed a typo in guc_tables.c: "Sets the minimal numer of constants > in an array" -> number Yep, I'll rephrase the documentation.
Re: pg_stat_statements and "IN" conditions
> On Thu, Feb 09, 2023 at 02:30:34PM +0100, Peter Eisentraut wrote: > On 07.02.23 21:14, Sergei Kornilov wrote: > > It seems a little strange to me that with const_merge_threshold = 1, such a > > test case gives the same result as with const_merge_threshold = 2 > > What is the point of making this a numeric setting? Either you want to > merge all values or you don't want to merge any values. At least in theory the definition of "too many constants" is different for different use cases, and I see allowing it to be configured as a way of reducing the level of surprise here. The main scenario for a numerical setting would be to distinguish between normal usage with just a handful of constants (and the user expecting to see them represented in pgss) and some sort of outliers with thousands of constants in a query (e.g. as a defence mechanism for the infrastructure working with those metrics). But I agree that it's not clear how much value is in that. Not having a strong opinion about this, I would be fine with changing it to a boolean option (with an actual limit hidden internally) if everyone agrees it fits better.
Re: meson: Optionally disable installation of test modules
Hi, On 2/8/23 13:30, Peter Eisentraut wrote: If you feel that your patch is ready, please add it to the commit fest. I look forward to reviewing it. Thanks! Commit fest entry link: https://commitfest.postgresql.org/42/4173/ Regards, Nazir Bilal Yavuz Microsoft
[Question] Revamping btree gist implementation of inet
Hi, I was looking at the bug mentioned at https://www.postgresql.org/message-id/flat/201010112055.o9BKtZf7011251%40wwwmaster.postgresql.org The issue appears to be in gbt_inet_compress, which doesn't store inet details like ip_family and the netmask in inetKEY, and in gbt_inet_consistent, which doesn't have enough info (as gbt_inet_compress didn't store it) and uses the imprecise convert_network_to_scalar for performing operations. Looking at the reference implementation for inet in src/backend/utils/adt/network_gist.c, if we add the missing inet data (as seen in GistInetKey) into inetKEY and refactor gbt_inet_consistent to use network functions instead of convert_network_to_scalar, the mentioned bugs might be alleviated. I just want to know whether this is the right direction to approach this problem? Thanks, Ankit
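To make the direction a bit more concrete, one possible shape of an extended key is sketched below -- every field name here is invented for illustration and is not taken from btree_gist or network_gist.c:

    /* Hypothetical extended btree_gist key for inet: keep enough of the
     * original value that gbt_inet_consistent can use the network comparison
     * functions instead of convert_network_to_scalar. */
    typedef struct inetKeyExtended
    {
        char            family;     /* ip_family of the indexed value */
        unsigned char   bits;       /* netmask length */
        unsigned char   addr[16];   /* raw address bytes, enough for IPv6 */
    } inetKeyExtended;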
Re: Weird failure with latches in curculio on v15
Robert Haas writes: > I think that we could certainly, as Michael suggests, have people > provide their own background worker rather than having the archiver > invoke the user-supplied code directly. As long as the functions that > you need in order to get the necessary information can be called from > some other process, that's fine. The only difficulty I see is that if > the archiving is happening from a separate background worker rather > than from the archiver, then what is the archiver doing? We could > somehow arrange to not run the archiver process at all, or I guess to > just sit there and have it do nothing. Or, we can decide not to have a > separate background worker and just have the archiver call the > user-supplied core directly. I kind of like that approach at the > moment; it seems more elegant to me. I'm fairly concerned about the idea of making it common for people to write their own main loop for the archiver. That means that, if we have a bug fix that requires the archiver to do X, we will not just be patching our own code but trying to get an indeterminate set of third parties to add the fix to their code. If we think we need primitives to let the archiver hooks get all the pending files, or whatever, by all means add those. But don't cede fundamental control of the archiver. The hooks need to be decoration on a framework we provide, not the framework themselves. regards, tom lane
Re: AW: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
> > FWIW I suggest you provide the data in a form that's easier to use (like > a working SQL script). More people are likely to look and help than when > they have to extract stuff from an e-mail, fill in missing pieces etc. > BTW if anyone wants to play with this, here are the SQL scripts I used to create the tables and the queries. There's no data, but it's enough to see how the plans change. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company query-complete.sql Description: application/sql query-smaller.sql Description: application/sql create-join.sql Description: application/sql
Re: ICU locale validation / canonicalization
On Wed, Feb 8, 2023 at 2:59 AM Jeff Davis wrote: > We do check that the value is accepted by ICU, but ICU seems to accept > anything and use some fallback logic. Bogus strings will typically end > up as the "root" locale (spelled "root" or ""). I've noticed this, and I think it's really frustrating. There's barely any documentation of what strings you're allowed to specify, and the documentation that does exist is extremely difficult to understand. Normally, you could work around that problem to some degree by making a guess at what you're supposed to be doing and then seeing whether the program accepts it, but here that doesn't work either. It just accepts anything you give it and then you have to try to figure out whether the behavior is what you wanted. But there's also no real documentation of what the behavior of any collation is, so you're apparently just supposed to magically know what collations exist and how they behave and then you can test whether the string you put in gave you the behavior you wanted. Adding validation and canonicalization wouldn't cure the documentation problems, but it would be a big help. You still wouldn't know what string you were supposed to be passing to ICU, but if you did pass it a string, you'd find out what it thought that string meant. I think that would be a huge step forward. Unfortunately, I have no idea whether your specific ideas about how to make that happen are any good or not. But I hope they are, because the current situation is pessimal. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals
Richard Guo writes: > It seems to me there is oversight here. Actually in next level up this > othersj would null all the relids in its syn_righthand, not only the > relids in its min_righthand. Good point. I think this code originated before it was clear to me that nullingrels would need to follow the syntactic structure. > This query would trigger the Assert() in search_indexed_tlist_for_var. > So I wonder that we should use othersj->syn_righthand here. There are two such calls in deconstruct_distribute_oj_quals ... don't they both need this change? regards, tom lane
Re: Weird failure with latches in curculio on v15
On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote: > I'm fairly concerned about the idea of making it common for people > to write their own main loop for the archiver. That means that, if > we have a bug fix that requires the archiver to do X, we will not > just be patching our own code but trying to get an indeterminate > set of third parties to add the fix to their code. I don't know what kind of bug we could really have in the main loop that would be common to every implementation. They're probably all going to check for interrupts, do some work, and then wait for I/O on some things by calling select() or some equivalent. But the work, and the wait for the I/O, would be different for every implementation. I would anticipate that the amount of common code would be nearly zero. Imagine two archive modules, one of which archives files via HTTP and the other of which archives them via SSH. They need to do a lot of the same things, but the code is going to be totally different. When the HTTP archiver module needs to open a new connection, it's going to call some libcurl function. When the SSH archiver module needs to do the same thing, it's going to call some libssh function. It seems quite likely that the HTTP implementation would want to juggle multiple connections in parallel, but the SSH implementation might not want to do that, or its logic for determining how many connections to open might be completely different based on the behavior of that protocol vs. the other protocol. Once either implementation has sent as much data as it can over the connections it has open, it needs to wait for those sockets to become write-ready or, possibly, read-ready. There again, each one will be calling into a different library to do that. It could be that in this particular case, both would be waiting for a set of file descriptors, and we could provide some framework for waiting on a set of file descriptors provided by the module. But you could also have some other archiver implementation that is, say, waiting for a process to terminate rather than for a file descriptor to become ready for I/O. > If we think we need primitives to let the archiver hooks get all > the pending files, or whatever, by all means add those. But don't > cede fundamental control of the archiver. The hooks need to be > decoration on a framework we provide, not the framework themselves. I don't quite see how you can make asynchronous and parallel archiving work if the archiver process only calls into the archive module at times that it chooses. That would mean that the module has to return control to the archiver when it's in the middle of archiving one or more files -- and then I don't see how it can get control back at the appropriate time. Do you have a thought about that? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Inconsistent nullingrels due to oversight in deconstruct_distribute_oj_quals
I wrote: > Richard Guo writes: >> It seems to me there is oversight here. Actually in next level up this >> othersj would null all the relids in its syn_righthand, not only the >> relids in its min_righthand. > Good point. I think this code originated before it was clear to me > that nullingrels would need to follow the syntactic structure. Although ... the entire point here is that we're trying to build quals that don't match the original syntactic structure. I'm worried that distribute_qual_to_rels will do the wrong thing (put the qual at the wrong level) if we add more nullingrel bits than we meant to. This might be less trivial than it appears. regards, tom lane
Re: HOT chain validation in verify_heapam()
On Wed, Feb 8, 2023 at 11:17 PM Robert Haas wrote: > On Sun, Feb 5, 2023 at 3:57 AM Himanshu Upadhyaya > wrote: > > Thanks, yes it's working fine with Prepared Transaction. > > Please find attached the v9 patch incorporating all the review comments. > I don't know quite how we're still going around in circles about this, > but this code makes no sense to me at all: > > /* > * Add data to the predecessor array even if the current or > * successor's LP is not valid. We will not process/validate > these > * offset entries while looping over the predecessor array but > * having all entries in the predecessor array will help in > * identifying(and validating) the Root of a chain. > */ > if (!lp_valid[ctx.offnum] || !lp_valid[nextoffnum]) > { > predecessor[nextoffnum] = ctx.offnum; > continue; > } > > If the current offset number is not for a valid line pointer, then it > makes no sense to talk about the successor. An invalid redirected line > pointer is one that points off the end of the line pointer array, or > to before the beginning of the line pointer array, or to a line > pointer that is unused. An invalid line pointer that is LP_USED is one > which points to a location outside the page, or to a location inside > the page. In none of these cases does it make any sense to talk about > the next tuple. If the line pointer isn't valid, it's pointing to some > invalid location where there cannot possibly be a tuple. In other > words, if lp_valid[ctx.offnum] is false, then nextoffnum is a garbage > value, and therefore referencing predecessor[nextoffnum] is useless > and dangerous. > > If the next offset number is not for a valid line pointer, we could in > theory still assign to the predecessor array, as you propose here. In > that case, the tuple or line pointer at ctx.offnum is pointing to the > line pointer at nextoffnum and that is all fine. But what is the > point? The comment claims that the point is that it will help us > identify and validate the root of the hot chain. But if the line > pointer at nextoffnum is not valid, it can't be the root of a hot > chain. When we're talking about the root of a HOT chain, we're > speaking about a tuple. If lp_valid[nextoffnum] is false, there is no > tuple. Instead of pointing to a tuple, that line pointer is pointing > to garbage. > > Initially, while implementing the logic to identify the root of the HOT chain, I was getting crashes and regression failures; at that time I thought of having this check along with a few other changes that were required. But you are right, it's unnecessary to add data to the predecessor array in this case, so I am removing it from the patch.
-- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 19c93cce0189150d6bfe68237eb3d5a414a18ad9 Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Thu, 9 Feb 2023 22:00:25 +0530 Subject: [PATCH v10] Implement HOT chain validation in verify_heapam() Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev, Andres Freund --- contrib/amcheck/verify_heapam.c | 303 +- src/bin/pg_amcheck/t/004_verify_heapam.pl | 241 - 2 files changed, 527 insertions(+), 17 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index 4fcfd6df72..7fc984dd33 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -150,7 +150,7 @@ typedef struct HeapCheckContext } HeapCheckContext; /* Internal implementation */ -static void check_tuple(HeapCheckContext *ctx); +static void check_tuple(HeapCheckContext *ctx, bool *lp_valid); static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx, ToastedAttribute *ta, int32 *expected_chunk_seq, uint32 extsize); @@ -160,7 +160,7 @@ static void check_toasted_attribute(HeapCheckContext *ctx, ToastedAttribute *ta); static bool check_tuple_header(HeapCheckContext *ctx); -static bool check_tuple_visibility(HeapCheckContext *ctx); +static bool check_tuple_visibility(HeapCheckContext *ctx, bool *lp_valid); static void report_corruption(HeapCheckContext *ctx, char *msg); static void report_toast_corruption(HeapCheckContext *ctx, @@ -399,9 +399,14 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++) { OffsetNumber maxoff; + OffsetNumber predecessor[MaxOffsetNumber]; + OffsetNumber successor[MaxOffsetNumber]; + bool lp_valid[MaxOffsetNumber]; CHECK_FOR_INTERRUPTS(); + memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber); + /* Optionally skip over all-frozen or all-visible blocks */ if (skip_option != SKIP_PAGES_NONE) { @@ -433,6 +438,10 @@ verify_heapam(PG_FUNCTION_ARGS) for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
Re: Weird failure with latches in curculio on v15
On Thu, Feb 09, 2023 at 11:12:21AM -0500, Robert Haas wrote: > On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote: >> If we think we need primitives to let the archiver hooks get all >> the pending files, or whatever, by all means add those. But don't >> cede fundamental control of the archiver. The hooks need to be >> decoration on a framework we provide, not the framework themselves. > > I don't quite see how you can make asynchronous and parallel archiving > work if the archiver process only calls into the archive module at > times that it chooses. That would mean that the module has to return > control to the archiver when it's in the middle of archiving one or > more files -- and then I don't see how it can get control back at the > appropriate time. Do you have a thought about that? I've been thinking about this, actually. I'm wondering if we could provide a list of files to the archiving callback (configurable via a variable in ArchiveModuleState), and then have the callback return a list of files that are archived. (Or maybe we just put the list of files that need archiving in ArchiveModuleState.) The returned list could include files that were sent to the callback previously. The archive module would be responsible for creating background worker(s) (if desired), dispatching files to-be-archived to its background worker(s), and gathering the list of archived files to return. This is admittedly half-formed, but I'm tempted to hack something together quickly to see whether it might be viable. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
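To make the half-formed idea slightly more tangible, the callback might have a shape roughly like this -- all names below are hypothetical, not a concrete API proposal:

    /* Hypothetical batch-archiving callback: the archiver passes the module a
     * list of WAL file names that need archiving; the module dispatches them
     * to its background worker(s) as it sees fit and returns the names it has
     * finished archiving so far (possibly including files handed to it in
     * earlier calls), which the archiver then marks "done". */
    typedef List *(*ArchiveFilesCB) (ArchiveModuleState *state,
                                     List *files_to_archive);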
Re: recovery modules
On Wed, Feb 08, 2023 at 09:16:19PM -0800, Andres Freund wrote: > Pushed. Thanks! Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pg_stat_statements and "IN" conditions
On 2023-Feb-09, Dmitry Dolgov wrote: > > On Thu, Feb 09, 2023 at 02:30:34PM +0100, Peter Eisentraut wrote: > > What is the point of making this a numeric setting? Either you want > > to merge all values or you don't want to merge any values. > > At least in theory the definition of "too many constants" is different > for different use cases and I see allowing to configure it as a way of > reducing the level of surprise here. I was thinking about this a few days ago and I agree that we don't necessarily want to make it just a boolean thing; we may want to make it more complex. One trivial idea is to make it group entries in powers of 10: for 0-9 elements, you get one entry, and 10-99 you get a different one, and so on: # group everything in a single bucket const_merge_threshold = true / yes / on # group 0-9, 10-99, 100-999, 1000- const_merge_threshold = powers Ideally the value would be represented somehow in the query text. For example query| calls --+--- select * from test where i in ({... 0-9 entries ...})| 2 select * from test where i in ({... 10-99 entries ...}) | 1 What do you think? The jumble would have to know how to reduce all values within each power-of-ten group to one specific value, but I don't think that should be particularly difficult. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Find a bug in a program, and fix it, and the program will work today. Show the program how to find and fix a bug, and the program will work forever" (Oliver Silfridge)
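A trivial sketch of the bucketing rule (not actual pg_stat_statements code): reduce the length of the IN-list to its power-of-ten bucket before mixing it into the query jumble, so that all queries in the same bucket share one entry.

    /* Map an IN-list length to its power-of-ten bucket:
     * 0-9 -> 0, 10-99 -> 1, 100-999 -> 2, and so on. */
    static int
    const_list_bucket(int nconsts)
    {
        int     bucket = 0;

        while (nconsts >= 10)
        {
            nconsts /= 10;
            bucket++;
        }
        return bucket;
    }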
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Hi, On 2023-02-09 09:57:42 -0500, Tom Lane wrote: > Peter Eisentraut writes: > > On 08.02.23 23:18, Tom Lane wrote: > >> I pushed the discussed documentation improvements, and changed the > >> behavior of "ninja docs" to only build the HTML docs. > > > I don't like this change. Now the default set of docs is different > > between the make builds and the meson builds. And people will be less > > likely to make sure the man pages still build. > > What? The default behavior of "make" has been to build only the > html docs for many years. And I've never ever seen a case where > the html docs build and the man pages don't. I think this misunderstanding is again due to the confusion between the 'all' target in doc/src/sgml and the default target, just like earlier in the thread / why I ended up with the prior set of targets under 'docs'. # Make "html" the default target, since that is what most people tend # to want to use. html: ... all: html man Given the repeated confusion from that, among fairly senior hackers, perhaps we ought to at least put those lines next to each other? It's certainly not obvious as-is. > > What's wrong with just typing "ninja html"? > > Don't really care how the command is spelled, but there needs to > be a convenient way to get that behavior. Perhaps we should have doc-html, doc-man, doc-all or such? The shell autocompletions for ninja work pretty well for me; a prefix like that would make it easier to discover such "sub"-targets. I was pondering adding a 'help' target that shows important targets. Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
On 2023-02-08 We 21:29, Shinoda, Noriyoshi (PN Japan FSIP) wrote: Hi, I tried the committed pgindent. The attached small patch changes spaces in the usage message to tabs. Options other than --commit start with a tab. Thanks, pushed. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Andres Freund writes: > I think this misunderstanding is again due to the confusion between the 'all' > target in doc/src/sgml and the default target, just like earlier in the thread > / why I ended up with the prior set of targets under 'docs'. > # Make "html" the default target, since that is what most people tend > # to want to use. > html: > ... > all: html man > Given the repeated confusion from that, among fairly senior hackers, perhaps > we ought to at least put those lines next to each other? It's certainly not > obvious as-is. I think there are ordering constraints between these and the Makefile.global inclusion. But we could add a comment beside the "all:" line pointing out that that's not the default target. > Perhaps we should have doc-html, doc-man, doc-all or such? No objection here. If we intend to someday build tarballs with meson, there'd need to be a target that builds html+man, but that could perhaps be named "distprep". regards, tom lane
Minor meson gripe
Currently, meson has a test suite named "setup". According to the Wiki, this is needed to get something equivalent to "make check", by running "meson test -v --suite setup --suite regress". Some questions about this: * Isn't it confusing that we have a suite by that name, given that we also need to use the unrelated --setup flag for some nearby testing recipes? * Why do we actually need a "setup" suite? Offhand it appears that a simple "meson test -v --suite regress" works just as well. Have I missed something? -- Peter Geoghegan
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On Wed, Feb 8, 2023 at 5:12 PM Michael Paquier wrote: > At the end, I have just done this stuff down to ~12, 11 does not seem > worth the trouble as the next stable version to go out of support. > I'll reduce gokiburi's script a bit, as a result, until the oldest > version support is v12. For the record, according to [1] it's not necessary to use --reset-author when back-patching. (Maybe a little confusingly, because it's not quite clear whether our policies consider the author field to be meaningful or not.) [1] https://www.postgresql.org/message-id/flat/1347459696.16215.11.camel%40vanquo.pezone.net
Re: Improve logging when using Huge Pages
On Thu, Feb 09, 2023 at 10:40:13AM +0100, Alvaro Herrera wrote: > On 2023-Feb-08, Justin Pryzby wrote: >> I don't think it makes sense to run postgres -C huge_pages_active, >> however, so I see no issue that that would always returns "false". > > Hmm, I would initialize it to return "unknown" rather than "off" — and > make sure it turns "off" at the appropriate time. Otherwise you're just > moving the confusion elsewhere. I think this approach would address my concerns about using a GUC. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Weird failure with latches in curculio on v15
Hi, On 2023-02-09 11:12:21 -0500, Robert Haas wrote: > On Thu, Feb 9, 2023 at 10:51 AM Tom Lane wrote: > > I'm fairly concerned about the idea of making it common for people > > to write their own main loop for the archiver. That means that, if > > we have a bug fix that requires the archiver to do X, we will not > > just be patching our own code but trying to get an indeterminate > > set of third parties to add the fix to their code. I'm somewhat concerned about that too, but perhaps from a different angle. First, I think we don't do our users a service by defaulting the in-core implementation to something that doesn't scale to even a moderately busy server. Second, I doubt we'll get the API for any of this right, without an actual user that does something more complicated than restoring one-by-one in a blocking manner. > I don't know what kind of bug we could really have in the main loop > that would be common to every implementation. They're probably all > going to check for interrupts, do some work, and then wait for I/O on > some things by calling select() or some equivalent. But the work, and > the wait for the I/O, would be different for every implementation. I > would anticipate that the amount of common code would be nearly zero. I don't think it's that hard to imagine problems. To be reasonably fast, a decent restore implementation will have to 'restore ahead'. Which also provides plenty of opportunities for things to go wrong. E.g. - WAL source is switched, restore module needs to react to that, but doesn't, and we end up with lots of wasted work, or worse, filename conflicts - recovery follows a timeline, restore module doesn't catch on quickly enough - end of recovery happens, restore just continues on > > If we think we need primitives to let the archiver hooks get all > > the pending files, or whatever, by all means add those. But don't > > cede fundamental control of the archiver. The hooks need to be > > decoration on a framework we provide, not the framework themselves. > > I don't quite see how you can make asynchronous and parallel archiving > work if the archiver process only calls into the archive module at > times that it chooses. That would mean that the module has to return > control to the archiver when it's in the middle of archiving one or > more files -- and then I don't see how it can get control back at the > appropriate time. Do you have a thought about that? I don't think the archiver is the hard part, that already has a dedicated process, and it also has something of a queuing system already. The startup process imo is the complicated one... If we had a 'restorer' process, and startup fed some sort of a queue with things to restore in the near future, it might be more realistic to do something like what you describe? Greetings, Andres Freund
Re: recovery modules
rebased for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From ea6339276c6863f260572dfc816f9dd27ac7b516 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 27 Jan 2023 21:01:22 -0800 Subject: [PATCH v9 1/3] s/ArchiveContext/ArchiveCallbacks --- src/backend/postmaster/pgarch.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index e551af2905..065d7d1313 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -97,7 +97,7 @@ char *XLogArchiveLibrary = ""; */ static time_t last_sigterm_time = 0; static PgArchData *PgArch = NULL; -static ArchiveModuleCallbacks ArchiveContext; +static ArchiveModuleCallbacks ArchiveCallbacks; /* @@ -406,8 +406,8 @@ pgarch_ArchiverCopyLoop(void) HandlePgArchInterrupts(); /* can't do anything if not configured ... */ - if (ArchiveContext.check_configured_cb != NULL && -!ArchiveContext.check_configured_cb()) + if (ArchiveCallbacks.check_configured_cb != NULL && +!ArchiveCallbacks.check_configured_cb()) { ereport(WARNING, (errmsg("archive_mode enabled, yet archiving is not configured"))); @@ -508,7 +508,7 @@ pgarch_archiveXlog(char *xlog) snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog); set_ps_display(activitymsg); - ret = ArchiveContext.archive_file_cb(xlog, pathname); + ret = ArchiveCallbacks.archive_file_cb(xlog, pathname); if (ret) snprintf(activitymsg, sizeof(activitymsg), "last was %s", xlog); else @@ -814,7 +814,7 @@ HandlePgArchInterrupts(void) /* * LoadArchiveLibrary * - * Loads the archiving callbacks into our local ArchiveContext. + * Loads the archiving callbacks into our local ArchiveCallbacks. */ static void LoadArchiveLibrary(void) @@ -827,7 +827,7 @@ LoadArchiveLibrary(void) errmsg("both archive_command and archive_library set"), errdetail("Only one of archive_command, archive_library may be set."))); - memset(&ArchiveContext, 0, sizeof(ArchiveModuleCallbacks)); + memset(&ArchiveCallbacks, 0, sizeof(ArchiveModuleCallbacks)); /* * If shell archiving is enabled, use our special initialization function. 
@@ -844,9 +844,9 @@ LoadArchiveLibrary(void) ereport(ERROR, (errmsg("archive modules have to define the symbol %s", "_PG_archive_module_init"))); - (*archive_init) (&ArchiveContext); + (*archive_init) (&ArchiveCallbacks); - if (ArchiveContext.archive_file_cb == NULL) + if (ArchiveCallbacks.archive_file_cb == NULL) ereport(ERROR, (errmsg("archive modules must register an archive callback"))); @@ -859,6 +859,6 @@ LoadArchiveLibrary(void) static void pgarch_call_module_shutdown_cb(int code, Datum arg) { - if (ArchiveContext.shutdown_cb != NULL) - ArchiveContext.shutdown_cb(); + if (ArchiveCallbacks.shutdown_cb != NULL) + ArchiveCallbacks.shutdown_cb(); } -- 2.25.1 >From e4e28ed2aea395e5120389e92d6944c48468208e Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 27 Jan 2023 21:16:34 -0800 Subject: [PATCH v9 2/3] move archive module exports to dedicated headers --- contrib/basic_archive/basic_archive.c | 2 +- src/backend/postmaster/pgarch.c | 2 ++ src/backend/postmaster/shell_archive.c | 3 +- src/backend/utils/misc/guc_tables.c | 1 + src/include/postmaster/archive_module.h | 47 + src/include/postmaster/pgarch.h | 39 src/include/postmaster/shell_archive.h | 24 + 7 files changed, 77 insertions(+), 41 deletions(-) create mode 100644 src/include/postmaster/archive_module.h create mode 100644 src/include/postmaster/shell_archive.h diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index 36b7a4814a..384336884d 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -32,7 +32,7 @@ #include "common/int.h" #include "miscadmin.h" -#include "postmaster/pgarch.h" +#include "postmaster/archive_module.h" #include "storage/copydir.h" #include "storage/fd.h" #include "utils/guc.h" diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 065d7d1313..281d9fd8b7 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -34,8 +34,10 @@ #include "lib/binaryheap.h" #include "libpq/pqsignal.h" #include "pgstat.h" +#include "postmaster/archive_module.h" #include "postmaster/interrupt.h" #include "postmaster/pgarch.h" +#include "postmaster/shell_archive.h" #include "storage/fd.h" #include "storage/ipc.h" #include "storage/latch.h" diff --git a/src/backend/postmaster/shell_archive.c b/src/backend/postmaster/shell_archive.c index 7771b951b7..0f0558e155 100644 --- a/src/backend/postmaster/shell_archive.c +++ b/src/backend/postmaster/shell_archive.c @@ -20,7 +20,8 @@ #include "access/xlog.h" #include "common/percentrepl.h" #include "pgstat.h" -#i
Re: pg_stat_statements and "IN" conditions
> On Thu, Feb 09, 2023 at 06:26:51PM +0100, Alvaro Herrera wrote: > On 2023-Feb-09, Dmitry Dolgov wrote: > > > > On Thu, Feb 09, 2023 at 02:30:34PM +0100, Peter Eisentraut wrote: > > > > What is the point of making this a numeric setting? Either you want > > > to merge all values or you don't want to merge any values. > > > > At least in theory the definition of "too many constants" is different > > for different use cases and I see allowing to configure it as a way of > > reducing the level of surprise here. > > I was thinking about this a few days ago and I agree that we don't > necessarily want to make it just a boolean thing; we may want to make it > more complex. One trivial idea is to make it group entries in powers of > 10: for 0-9 elements, you get one entry, and 10-99 you get a different > one, and so on: > > # group everything in a single bucket > const_merge_threshold = true / yes / on > > # group 0-9, 10-99, 100-999, 1000- > const_merge_treshold = powers > > Ideally the value would be represented somehow in the query text. For > example > > query| calls > --+--- > select * from test where i in ({... 0-9 entries ...})| 2 > select * from test where i in ({... 10-99 entries ...}) | 1 > > What do you think? The jumble would have to know how to reduce all > values within each power-of-ten group to one specific value, but I don't > think that should be particularly difficult. Yeah, it sounds appealing and conveniently addresses the question of losing the information about how many constants originally were there. Not sure if the example above would be the most natural way to represent it in the query text, but otherwise I'm going to try implementing this. Stay tuned.
Re: recovery modules
Hi, On 2023-02-09 11:39:17 -0800, Nathan Bossart wrote: > rebased for cfbot I think this nearly ready. Michael, are you planning to commit this? Personally I'd probably squash these into a single commit. > diff --git a/doc/src/sgml/archive-modules.sgml > b/doc/src/sgml/archive-modules.sgml > index ef02051f7f..2db1b19216 100644 > --- a/doc/src/sgml/archive-modules.sgml > +++ b/doc/src/sgml/archive-modules.sgml > @@ -47,23 +47,30 @@ > normal library search path is used to locate the library. To provide the > required archive module callbacks and to indicate that the library is > actually an archive module, it needs to provide a function named > - _PG_archive_module_init. This function is passed a > - struct that needs to be filled with the callback function pointers for > - individual actions. > + _PG_archive_module_init. This function must return > an > + ArchiveModuleCallbacks struct filled with the > + callback function pointers for individual actions. I'd probably mention that this should typically be of server lifetime / a 'static const' struct. Tableam documents this as follows: The result of the function must be a pointer to a struct of type TableAmRoutine, which contains everything that the core code needs to know to make use of the table access method. The return value needs to be of server lifetime, which is typically achieved by defining it as a static const variable in global scope > + > + > + > +archive_library is only loaded in the archiver > process. > + > + > That's not really related to any of the changes here, right? I'm not sure it's a good idea to document that. We e.g. probably should allow the library to check that the configuration is correct, at postmaster start, rather than later, at runtime. > > @@ -73,6 +80,20 @@ typedef void (*ArchiveModuleInit) (struct > ArchiveModuleCallbacks *cb); > The server will call them as required to process each individual WAL file. > > > + > + Startup Callback > + > +The startup_cb callback is called shortly after the > +module is loaded. This callback can be used to perform any additional > +initialization required. If the archive module needs to have a state, it > +can use state->private_data to store it. s/needs to have a state/has state/? > @@ -83,7 +104,7 @@ typedef void (*ArchiveModuleInit) (struct > ArchiveModuleCallbacks *cb); > assumes the module is configured. > > > -typedef bool (*ArchiveCheckConfiguredCB) (void); > +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); > > > If true is returned, the server will proceed with Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the state. We're not really doing anything yet, at that point, so it shouldn't really need state? The reason I'm wondering is that I think we should consider calling this from the GUC assignment hook, at least in postmaster. Which would make it more convenient to not have state, I think? > @@ -128,7 +149,7 @@ typedef bool (*ArchiveFileCB) (const char *file, const > char *path); > these situations. > > > -typedef void (*ArchiveShutdownCB) (void); > +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); > > > Perhaps mention that this needs to free state it allocated in the ArchiveModuleState, or it'll be leaked? Greetings, Andres Freund
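For what it's worth, a minimal sketch of what a module might do with that state, assuming the ArchiveModuleState/private_data shape from the patch (module and field names invented):

    /* Illustrative module-private state, allocated once at startup and freed
     * again at shutdown, per the review comment above. */
    typedef struct MyArchiveData
    {
        int     files_archived;
    } MyArchiveData;

    static void
    my_startup_cb(ArchiveModuleState *state)
    {
        state->private_data = palloc0(sizeof(MyArchiveData));
    }

    static void
    my_shutdown_cb(ArchiveModuleState *state)
    {
        if (state->private_data)
        {
            pfree(state->private_data);
            state->private_data = NULL;
        }
    }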
Re: Can we do something to help stop users mistakenly using force_parallel_mode?
On Thu, 9 Feb 2023 at 21:20, John Naylor wrote: > Looks good at a glance, just found a spurious word: > > + "by forcing the planner into to generate plans which contains nodes " Thanks for looking. I'll fix that. Likely the hardest part to get right here is the new name. Can anyone think of anything better than debug_parallel_query? David
Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Hi, On 2023-02-09 13:48:46 -0500, Tom Lane wrote: > Andres Freund writes: > > I think this misunderstanding is again due to the confusion between the > > 'all' > > target in doc/src/sgml and the default target, just like earlier in the > > thread > > / why I ended up with the prior set of targets under 'docs'. > > > # Make "html" the default target, since that is what most people tend > > # to want to use. > > html: > > ... > > all: html man > > > Given the repeated confusion from that, among fairly senior hackers, perhaps > > we ought to at least put those lines next to each other? It's certainly not > > obvious as-is. > > I think there are ordering constraints between these and the > Makefile.global inclusion. But we could add a comment beside the "all:" > line pointing out that that's not the default target. Yes, html: has to happen before the inclusion of Makefile.global to become the default target, but afaics we can just move "all: html man" up? > If we intend to someday build tarballs with meson, there'd need to be > a target that builds html+man, but that could perhaps be named > "distprep". Yea, a distprep target just depending on all the required targets seems to be the way to go for that. Not really related: I think we should seriously consider removing most of the things distprep includes in the tarball. I'd leave docs in though. But IMO all of the generated code doesn't make sense in this day and age. I guess that's a discussion for a different thread and a different day. Greetings, Andres Freund
Re: Minor meson gripe
Hi, On 2023-02-09 11:01:31 -0800, Peter Geoghegan wrote: > Currently, meson has a test suite named "setup". According to the > Wiki, this is needed to get something equivalent to "make check", by > running "meson test -v --suite setup --suite regress". Yep. > Some questions about this: > > * Isn't it confusing that we have a suite by that name, given that we > also need to use the unrelated --setup flag for some nearby testing > recipes? Hm. I don't find it particularly confusing, but I don't think I'm a good judge of that, too close. > * Why do we actually need a "setup" suite? > > Offhand it appears that a simple "meson test -v --suite regress" works > just as well. Have I missed something? It'll work, but only if you have run setup before. And it'll not use changed C code. The setup suite creates the installation in tmp_install/. So if you haven't run the tests before, it'll fail due to that missing. If you have run it before, but have changed code, it'll not get used. The background for the issue is that while meson test supports dependencies for each test, and will build exactly the required dependencies if you run individual tests with meson test, it unfortunately also adds all the test dependencies to the default ninja target. That's mostly for historical reasons, because initially meson didn't support dependencies for tests. There's recent work on changing that though. Creating the temp installation every time you run 'ninja' would not be nice. On slower machines it can take quite a while. I think medium term we should just stop requiring a temporary install to run tests, it's substantial, unnecessary, overhead, and it requires us to build way too much to run basic tests. It'd not take a whole lot to make that work: - a search path for finding extensions, which'd be very useful for other reasons as well - a way to tell 'postgres', 'initdb' etc, which use find_other_exec(), that they should use PATH - a way to tell initdb where to find things like postgres.bki, postgres where it can find timezone data, etc. Greetings, Andres Freund
pg_usleep for multisecond delays
I just found myself carefully counting the zeros in a call to pg_usleep(). Besides getting my eyes checked, perhaps there should be a wrapper called pg_ssleep() than can be used for multisecond sleeps. Or maybe the USECS_PER_SEC macro should be used more widely. I attached a patch for the former approach. I don't have a strong opinion, but I do think it's worth improving readability a bit here. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c index 3feee28d19..63de896cae 100644 --- a/src/backend/access/nbtree/nbtpage.c +++ b/src/backend/access/nbtree/nbtpage.c @@ -2976,7 +2976,7 @@ _bt_pendingfsm_finalize(Relation rel, BTVacState *vstate) * never be effective without some other backend concurrently consuming an * XID. */ - pg_usleep(500L); + pg_ssleep(5); #endif /* diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f9f0f6db8d..87664045d0 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5103,7 +5103,7 @@ StartupXLOG(void) /* This is just to allow attaching to startup process with a debugger */ #ifdef XLOG_REPLAY_DELAY if (ControlFile->state != DB_SHUTDOWNED) - pg_usleep(6000L); + pg_ssleep(60); #endif /* diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index ff6149a179..744adff984 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -441,7 +441,7 @@ AutoVacLauncherMain(int argc, char *argv[]) (errmsg_internal("autovacuum launcher started"))); if (PostAuthDelay) - pg_usleep(PostAuthDelay * 100L); + pg_ssleep(PostAuthDelay); SetProcessingMode(InitProcessing); @@ -561,7 +561,7 @@ AutoVacLauncherMain(int argc, char *argv[]) * Sleep at least 1 second after any error. We don't want to be * filling the error logs as fast as we can. */ - pg_usleep(100L); + pg_ssleep(1); } /* We can now handle ereport(ERROR) */ @@ -687,7 +687,7 @@ AutoVacLauncherMain(int argc, char *argv[]) * of a worker will continue to fail in the same way. */ AutoVacuumShmem->av_signal[AutoVacForkFailed] = false; -pg_usleep(100L); /* 1s */ +pg_ssleep(1); SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER); continue; } @@ -1708,7 +1708,7 @@ AutoVacWorkerMain(int argc, char *argv[]) (errmsg_internal("autovacuum: processing database \"%s\"", dbname))); if (PostAuthDelay) - pg_usleep(PostAuthDelay * 100L); + pg_ssleep(PostAuthDelay); /* And do an appropriate amount of work */ recentXid = ReadNextTransactionId(); diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 0dd22b2351..6d38dfeeba 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -759,7 +759,7 @@ StartBackgroundWorker(void) /* Apply PostAuthDelay */ if (PostAuthDelay > 0) - pg_usleep(PostAuthDelay * 100L); + pg_ssleep(PostAuthDelay); /* * Set up signal handlers. diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index 9bb47da404..147f9b1e38 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -196,7 +196,7 @@ BackgroundWriterMain(void) * to be repeated, and we don't want to be filling the error logs as * fast as we can. */ - pg_usleep(100L); + pg_ssleep(1); /* * Close all open files after any error. 
This is helpful on Windows, diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index aaad5c5228..12786229dd 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -310,7 +310,7 @@ CheckpointerMain(void) * to be repeated, and we don't want to be filling the error logs as * fast as we can. */ - pg_usleep(100L); + pg_ssleep(1); /* * Close all open files after any error. This is helpful on Windows, diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index e551af2905..6460c69726 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -450,7 +450,7 @@ pgarch_ArchiverCopyLoop(void) } /* wait a bit before retrying */ -pg_usleep(100L); +pg_ssleep(1); continue; } @@ -482,7 +482,7 @@ pgarch_ArchiverCopyLoop(void) xlog))); return; /* give up archiving for now */ } -pg_usleep(100L); /* wait a bit before retrying */ +pg_ssleep(1); /* wait a bit before retrying */ } } } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 2552327d90..6b80f423a8 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4292,7 +4292,7 @@ BackendInitialize(Port *port) * is not honored until after authentication.) */
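For reference, the proposed wrapper itself could be as simple as the sketch below; this is illustrative only, and the actual patch may define it differently (e.g. in terms of USECS_PER_SEC):

    /* Sleep for the given number of seconds.  Thin wrapper around pg_usleep()
     * so call sites don't have to count zeros. */
    static inline void
    pg_ssleep(long seconds)
    {
        pg_usleep(seconds * 1000000L);
    }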
Re: Importing pg_bsd_indent into our source tree
Hi, On 2023-02-09 13:30:30 -0500, Tom Lane wrote: > 0003 lacks meson support (anyone want to help with that?) I'll give it a go, unless somebody else wants to. Do we expect pg_bsd_indent to build / work on Windows, right now? If it doesn't, do we want to make that a hard requirement? I'll have CI test that, once I've added meson support. > I'd anticipate pushing 0001-0003 shortly but holding 0004 until we are ready > to do the post-March-CF pgindent run. (Come to think of it, 0004 had > probably better include a pg_bsd_indent version bump too.) How large is the diff that creates? If it's not super-widespread, it might be ok to do that earlier. I wouldn't mind not seeing that ugliness every time I run pgindent on a patch... Although I guess post-March-CF isn't that far away at this point :) Greetings, Andres Freund
Re: ICU locale validation / canonicalization
On Thu, 2023-02-09 at 15:44 +0100, Peter Eisentraut wrote: > One use case is that if a user specifies a locale, say, of 'de-AT', > this > might canonicalize to 'de' today, Canonicalization should not lose useful information, it should just rearrange it, so I don't see a risk here based on what I read and the behavior I saw. In ICU, "de-AT" canonicalizes to "de_AT" and becomes the language tag "de-AT". > but we should still store what the > user specified because 1) that documents what the user wanted, and 2) > it > might not canonicalize to the same thing tomorrow. We don't want to store things with ambiguous interpretations that could change tomorrow; that's a recipe for trouble. That's why most people store timestamps as the offset from some epoch in UTC rather than as "2/9/23" (Feb 9 or Sept 2? 1923 or 2023?). There are exceptions where you would want to store something like that, but I don't see why they'd apply in this case, where reinterpretation probably means a corrupted index. If the user wants to know how their ad-hoc string was interpreted, they can look at the resulting BCP 47 language tag, and see if it's what they meant. We can try to make this user-friendly by offering a NOTICE, WARNING, or helper functions that allow them to explore. We can also double check that the canonicalized form resolves to the same actual collator to be safe, and maybe even fall back to whatever the user specified if not. I'm open to discuss how strict we want to be and what kind of escape hatches we need to offer. There is still a risk that the BCP 47 language tag resolves to a different specific ICU collator or different collator version tomorrow. That's why we need to be careful about versioning (library versions or collator versions or both), and we've had long discussions about that. -- Jeff Davis PostgreSQL Contributor Team - AWS
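For readers who want to try the round trip themselves, here is a minimal standalone sketch against the ICU C API (my own illustration, not part of any posted patch; link against ICU's common library):

    #include <stdio.h>
    #include <unicode/uloc.h>

    int
    main(void)
    {
        UErrorCode  status = U_ZERO_ERROR;
        char        locale_id[ULOC_FULLNAME_CAPACITY];
        char        langtag[ULOC_FULLNAME_CAPACITY];

        /* "de-AT" canonicalizes to the ICU locale ID "de_AT" ... */
        uloc_canonicalize("de-AT", locale_id, sizeof(locale_id), &status);

        /* ... and converts to the BCP 47 language tag "de-AT" (strict mode) */
        uloc_toLanguageTag("de-AT", langtag, sizeof(langtag), true, &status);

        if (U_FAILURE(status))
        {
            fprintf(stderr, "ICU error: %s\n", u_errorName(status));
            return 1;
        }

        printf("locale ID: %s\nlanguage tag: %s\n", locale_id, langtag);
        return 0;
    }

Neither direction drops the country code, which is the point: the information is rearranged, not lost.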
Re: pg_usleep for multisecond delays
Hi, On 2023-02-09 12:59:29 -0800, Nathan Bossart wrote: > I just found myself carefully counting the zeros in a call to pg_usleep(). > Besides getting my eyes checked, perhaps there should be a wrapper called > pg_ssleep() than can be used for multisecond sleeps. Or maybe the > USECS_PER_SEC macro should be used more widely. I attached a patch for the > former approach. I don't have a strong opinion, but I do think it's worth > improving readability a bit here. pg_usleep() should pretty much never be used for sleeps that long, at least in the backend - depending on the platform they're not interruptible. Most of the things changed here are debugging tools, but even so, it's e.g. pretty annoying. E.g. you can't normally shut down while a backend is in pre_auth_delay. So I'm not sure it's the right direction to make pg_usleep() easier to use... Greetings, Andres Freund
Re: Importing pg_bsd_indent into our source tree
Andres Freund writes: > On 2023-02-09 13:30:30 -0500, Tom Lane wrote: >> 0003 lacks meson support (anyone want to help with that?) > I'll give it a go, unless somebody else wants to. Thanks. > Do we expect pg_bsd_indent to build / work on windows, right now? It would be desirable, for sure. I've not noticed anything remarkably unportable in the code, so probably it's just a matter of getting the build infrastructure to build it. I suppose that we aren't going to update the src/tools/msvc scripts anymore, so getting meson to handle it should be enough?? >> I'd anticipate pushing 0001-0003 shortly but holding 0004 until we are ready >> to do the post-March-CF pgindent run. (Come to think of it, 0004 had >> probably better include a pg_bsd_indent version bump too.) > How large is the diff that creates? If it's not super-widespread, it might be > ok to do that earlier. It wasn't that big; I posted it in the thread discussing that change. I think the real issue might just be that, assuming we bump the pg_bsd_indent version number, that in itself would force interested committers to update their copy Right Now. I'd rather give a little notice. regards, tom lane
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On 2/6/23 08:22, Robert Haas wrote: > I don't think that's quite the right concept. It seems to me that the > client is responsible for informing the server of what the situation > is, and the server is responsible for deciding whether to allow the > connection. In your scenario, the client is not only communicating > information ("here's the password I have got") but also making demands > on the server ("DO NOT authenticate using anything else"). I like the > first part fine, but not the second part. For what it's worth, making a negative demand during authentication is pretty standard: if you visit example.com and it tells you "I need your OS login password and Social Security Number to authenticate you," you have the option of saying "no thanks" and closing the tab. It's not really about protecting the server at that point; the server can protect itself. It's about protecting *you*. Allowing the proxy to pin a specific set of authentication details to the connection is just a way for it to close the tab on a server that would otherwise pull some other piece of ambient authority out of it. In a hypothetical world where the server presented the client with a list of authentication options before allowing any access, this would maybe be a little less convoluted to solve. For example, a proxy seeing a SASL list of - ANONYMOUS - EXTERNAL could understand that both methods allow the client to assume the authority of the proxy itself. So if its client isn't allowed to do that, the proxy realizes something is wrong (either it, or its target server, has been misconfigured or is under attack), and it can close the connection *before* the server runs login triggers. > In this kind of scenario, the client has no business > demanding that the server authenticate using the password rather than > anything else. The server, not the client, is in charge of deciding > which connections to accept; the client's job is only to decide which > connections to proxy. This sounds like a reasonable separation of responsibilities on the surface, but I think it's subtly off. The entire confused-deputy problem space revolves around the proxy being unable to correctly decide which connections to allow unless it also knows why the connections are being authorized. You've constructed an example where that's not a concern: everything's symmetrical, all proxies operate with the same authority, and internal users are identical to external users. But the CVE that led to the password requirement, as far as I can tell, dealt with asymmetry. The proxy had the authority to connect locally to a user, and the clients had the authority to connect to other machines' users, but those users weren't the same and were not mutually trusting. > And the human being is responsible for making > sure that the combination of those two things implements the intended > security policy. Sure, but upthread it was illustrated how difficult it is for even the people implementing the protocol to reason through what's safe and what's not. The primitives we're providing in the protocol are, IMO, difficult to wield safely for more complex use cases. We can provide mitigations, and demand that the DBA reason through every combination, and tell them "don't do that" when they screw up or come across a situation that our mitigations can't paper over. But I think we can solve the root problem instead. > We > could do something that looks more like a series of if-then rules, > e.g. 
>
> target-host 127.0.0.0/8 => reject
> authentication-method scram => accept
> reject

Yeah, I think something based on allow/deny is going to be the most intuitive.

> But it's only a hop, skip and a jump from there to something that
> looks an awful lot like a full-blown programing language, and maybe
> that's even the right idea, but, oh, the bike-shedding!

Eh. Someone will demand Turing-completeness eventually, but you don't have to listen. :D

--Jacob
Re: ICU locale validation / canonicalization
On Thu, 2023-02-09 at 10:53 -0500, Robert Haas wrote: > Unfortunately, I have no idea whether your specific ideas about how > to > make that happen are any good or not. But I hope they are, because > the > current situation is pessimal. It feels like BCP 47 is the right catalog representation. We are already using it for the import of initial collations, and it's a standard, and there seems to be good support in ICU. There are a couple cases where canonicalization will succeed but conversion to a BCP 47 language tag will fail. One is for unsupported attributes, like "en_US@foo=bar". Another is a bug I found and reported here: https://unicode-org.atlassian.net/browse/ICU-22268 In both cases, we know that conversion has failed, and we have a choice about how to proceed. We can fail, warn and continue with the user- entered representation, or turn off the strictness checking and come up with some BCP 47 tag and see if it resolves to the same collator. I do like the ICU format locale IDs from a readability standpoint. "en_US@colstrength=primary" is more meaningful to me than "en-US-u-ks- level1" (the equivalent language tag). And the format is specified[1], even though it's not an independent standard. But I think the benefits of better validation, an independent standard, and the fact that we're already favoring BCP47 outweigh my subjective opinion. I also attached a simple test program that I've been using to experiment (not intended for code review). It's hard for me to say that I'm sure I'm right. I really just got involved in this a few months back, and had a few off-list conversations with Peter Eisentraut to try to learn more (I believe he is aligned with my proposal but I will let him speak for himself). I should also say that I'm not exactly an expert in languages or scripts. I assume that ICU and IETF are doing sensible things to accommodate the diversity of human language as well as they can (or at least much better than the Postgres project could do on its own). I'm happy to hear more input or other proposals. 
[1] https://unicode-org.github.io/icu/userguide/locale/#canonicalization

-- Jeff Davis PostgreSQL Contributor Team - AWS

#include <stdio.h>
#include <unicode/uloc.h>
#include <unicode/ucol.h>

#define CAPACITY 1024

int main(int argc, char *argv[])
{
    UErrorCode status;
    UCollator *collator;
    const char *requested_locale = NULL;
    const char *valid_locale;
    const char *actual_locale;
    char canonical[CAPACITY] = {0};
    char variant[CAPACITY] = {0};
    char basename[CAPACITY] = {0};
    char getname[CAPACITY] = {0};
    char langtag[CAPACITY] = {0};
    char langtag_s[CAPACITY] = {0};

    if (argc > 1)
    {
        requested_locale = argv[1];

        status = U_ZERO_ERROR;
        uloc_canonicalize(requested_locale, canonical, CAPACITY, &status);
        if (U_FAILURE(status))
            printf("canonicalize error: %s\n", u_errorName(status));

        status = U_ZERO_ERROR;
        uloc_getBaseName(requested_locale, basename, CAPACITY, &status);
        if (U_FAILURE(status))
            printf("basename error: %s\n", u_errorName(status));

        status = U_ZERO_ERROR;
        uloc_getName(requested_locale, getname, CAPACITY, &status);
        if (U_FAILURE(status))
            printf("getname error: %s\n", u_errorName(status));

        status = U_ZERO_ERROR;
        uloc_getVariant(requested_locale, variant, CAPACITY, &status);
        if (U_FAILURE(status))
            printf("variant error: %s\n", u_errorName(status));

        status = U_ZERO_ERROR;
        uloc_toLanguageTag(requested_locale, langtag, CAPACITY, false, &status);
        if (U_FAILURE(status))
            printf("langtag error: %s\n", u_errorName(status));
        uloc_toLanguageTag(requested_locale, langtag_s, CAPACITY, true, &status);
        if (U_FAILURE(status))
            printf("langtag strict error: %s\n", u_errorName(status));
    }
    else if (argc > 2)
        fprintf(stderr, "too many arguments");

    status = U_ZERO_ERROR;
    collator = ucol_open(requested_locale, &status);
    valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, &status);
    actual_locale = ucol_getLocaleByType(collator, ULOC_ACTUAL_LOCALE, &status);

    printf("canonicalize: %s\n", canonical);
    printf("langtag : %s\n", langtag);
    printf("langtag strict: %s\n", langtag_s);
    printf("variant: %s\n", variant);
    printf("getBaseName: %s\n", basename);
    printf("getName: %s\n", getname);
    printf("valid locale: %s\n", valid_locale);
    printf("actual locale: %s\n", actual_locale);

    ucol_close(collator);
}
Re: Importing pg_bsd_indent into our source tree
Andres Freund writes: > Did that in the attached. Thanks. > I didn't convert the test though, due to the duplicating it'd create. Perhaps > we should just move it to a shell script? Or maybe it just doesn't matter > enough to bother with? We could move it to a shell script perhaps, but that seems pretty low-priority. > It doesn't build as-is with msvc, but does build with mingw. Failure: > https://cirrus-ci.com/task/6290206869946368?logs=build#L1573 Thanks, I'll take a look at these things. > To we really want to require users to install pg_bsd_indent into PATH? Seems > like we ought to have a build target to invoke pgindent with a path to > pg_bsd_indent or such? But I guess we can address that later. For the moment I was just interested in maintaining the current workflow. I know people muttered about having some sort of build target that'd indent the whole tree from scratch after building pg_bsd_indent, but it's not very clear to me how that'd work with e.g. VPATH configurations. (I think you can already tell pgindent to use a specific pg_bsd_indent, if your gripe is just about wanting to use a prebuilt copy that you don't want to keep in PATH for some reason.) > Independent of this specific patch: You seem to be generating your patch > series by invoking git show and redirecting that to a file? Yeah, it's pretty low-tech. I'm not in the habit of posting multi-patch series very often, so I haven't really bothered to use format-patch. (I gave up on "git am" long ago as being too fragile, and always use good ol' "patch" to apply patches, so I don't think about things like whether it'd automatically absorb commit messages. I pretty much never use anyone else's commit message verbatim anyway ...) regards, tom lane
Re: Importing pg_bsd_indent into our source tree
Hi, On 2023-02-09 13:55:32 -0800, Andres Freund wrote: > ../src/tools/pg_bsd_indent/args.c(179): error C2065: 'PATH_MAX': undeclared > identifier > ../src/tools/pg_bsd_indent/args.c(179): error C2057: expected constant > expression > ../src/tools/pg_bsd_indent/args.c(179): error C2466: cannot allocate an array > of constant size 0 > ../src/tools/pg_bsd_indent/args.c(179): error C2133: 'fname': unknown size > ../src/tools/pg_bsd_indent/args.c(183): warning C4034: sizeof returns 0 > ../src/tools/pg_bsd_indent/args.c(185): warning C4034: sizeof returns 0 > [1557/2161] Compiling C object > src/tools/pg_bsd_indent/pg_bsd_indent.exe.p/err.c.obj > [1558/2161] Precompiling header > src/interfaces/ecpg/ecpglib/libecpg.dll.p/meson_pch-c.c > [1559/2161] Compiling C object > src/tools/pg_bsd_indent/pg_bsd_indent.exe.p/indent.c.obj > FAILED: src/tools/pg_bsd_indent/pg_bsd_indent.exe.p/indent.c.obj > "cl" "-Isrc\tools/pg_bsd_indent\pg_bsd_indent.exe.p" > "-Isrc\tools/pg_bsd_indent" "-I..\src\tools\pg_bsd_indent" "-Isrc\include" > "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" > "-I..\src\include\port\win32_msvc" "/MDd" "/nologo" "/showIncludes" "/utf-8" > "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" > "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" > "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" > "/Fdsrc\tools/pg_bsd_indent\pg_bsd_indent.exe.p\indent.c.pdb" > /Fosrc/tools/pg_bsd_indent/pg_bsd_indent.exe.p/indent.c.obj "/c" > ../src/tools/pg_bsd_indent/indent.c > ../src/tools/pg_bsd_indent/indent.c(63): error C2065: 'MAXPATHLEN': > undeclared identifier > ../src/tools/pg_bsd_indent/indent.c(63): error C2057: expected constant > expression > ../src/tools/pg_bsd_indent/indent.c(63): error C2466: cannot allocate an > array of constant size 0 > > This specific issue at least should be easily fixable. The trivial fix of using MAXPGPATH made it build, without warnings. That doesn't say anything about actually working. So I guess porting the test would make sense. Opinions on whether it would make sense as a shell script? Greetings, Andres Freund
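For the archives, the shape of that fix is roughly the following (a sketch of the idea only; the actual declarations in args.c and indent.c may differ slightly):

    /* before: BSD/POSIX limits that MSVC does not define */
    char        fname[PATH_MAX];        /* args.c */
    char        bakfile[MAXPATHLEN];    /* indent.c */

    /* after: use PostgreSQL's own path-length limit */
    char        fname[MAXPGPATH];
    char        bakfile[MAXPGPATH];

MAXPGPATH comes from c.h, which the imported sources should already be pulling in; if not, that's a one-line addition.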
Re: Importing pg_bsd_indent into our source tree
Andres Freund writes: > The trivial fix of using MAXPGPATH made it build, without warnings. That > doesn't say anything about actually working. So I guess porting the test would > make sense. > Opinions on whether it would make sense as a shell script? Hmmm .. a shell script would be fine by me, but it won't help in testing a Windows build. Maybe we need to make it a Perl script? BTW, the attachments to your previous message are identical to what I previously posted --- did you attach the wrong set of diffs? regards, tom lane
Re: Time delayed LR (WAS Re: logical replication restrictions)
> The comment adjustment suggested by Peter-san above > was also included in this v33. > Please have a look at the attached patch. Patch v33 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: recovery modules
On Thu, Feb 09, 2023 at 12:18:55PM -0800, Andres Freund wrote: > On 2023-02-09 11:39:17 -0800, Nathan Bossart wrote: > Personally I'd probably squash these into a single commit. done > I'd probably mention that this should typically be of server lifetime / a > 'static const' struct. Tableam documents this as follows: done >> + >> + >> +archive_library is only loaded in the archiver >> process. >> + >> + >> > > That's not really related to any of the changes here, right? > > I'm not sure it's a good idea to document that. We e.g. probably should allow > the library to check that the configuration is correct, at postmaster start, > rather than later, at runtime. removed >> + >> + Startup Callback >> + >> +The startup_cb callback is called shortly after the >> +module is loaded. This callback can be used to perform any additional >> +initialization required. If the archive module needs to have a state, >> it >> +can use state->private_data to store it. > > s/needs to have a state/has state/? done >> >> -typedef bool (*ArchiveCheckConfiguredCB) (void); >> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); >> >> >> If true is returned, the server will proceed with > > Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the > state. We're not really doing anything yet, at that point, so it shouldn't > really need state? > > The reason I'm wondering is that I think we should consider calling this from > the GUC assignment hook, at least in postmaster. Which would make it more > convenient to not have state, I think? I agree that this callback should typically not need the state, but I'm not sure whether it fits into the assignment hook for archive_library. This callback is primarily meant for situations when you have archiving enabled, but your module isn't configured yet (e.g., archive_command is empty). In this case, we keep the WAL around, but we don't try to archive it until this hook returns true. It's up to each module to define that criteria. I can imagine someone introducing a GUC in their archive module that temporarily halts archiving via this callback. In that case, calling it via a GUC assignment hook probably won't work. In fact, checking whether archive_command is empty in that hook might not work either. >> >> -typedef void (*ArchiveShutdownCB) (void); >> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); >> >> >> > > Perhaps mention that this needs to free state it allocated in the > ArchiveModuleState, or it'll be leaked? done I left this out originally because the archiver exits shortly after calling this. However, if you have DSM segments or something, it's probably wise to make sure those are cleaned up. And I suppose we might not always exit immediately after this callback, so establishing the habit of freeing the state could be a good idea. In addition to updating this part of the docs, I wrote a shutdown callback for basic_archive that frees its state. 
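To make that concrete for module authors, a minimal startup/shutdown pair might look roughly like this (a sketch written against the API in this patch, not the basic_archive code; MyArchiveData is a made-up struct, and the usual module boilerplate plus postmaster/archive_module.h and utils/memutils.h is assumed):

    typedef struct MyArchiveData
    {
        int         files_archived;    /* whatever per-module state is needed */
    } MyArchiveData;

    static void
    my_archive_startup(ArchiveModuleState *state)
    {
        /* allocate long-lived private state and hang it off the module state */
        state->private_data = MemoryContextAllocZero(TopMemoryContext,
                                                     sizeof(MyArchiveData));
    }

    static void
    my_archive_shutdown(ArchiveModuleState *state)
    {
        /* free whatever startup allocated, so nothing is leaked */
        if (state->private_data != NULL)
        {
            pfree(state->private_data);
            state->private_data = NULL;
        }
    }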
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From fab6305c344deb83c5b5c30ca2c09dbe1925a36c Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 9 Feb 2023 13:49:46 -0800 Subject: [PATCH v10 1/1] restructure archive modules API --- contrib/basic_archive/basic_archive.c | 91 + doc/src/sgml/archive-modules.sgml | 35 +++--- src/backend/postmaster/pgarch.c | 27 +--- src/backend/postmaster/shell_archive.c | 34 + src/backend/utils/misc/guc_tables.c | 1 + src/include/postmaster/archive_module.h | 58 src/include/postmaster/pgarch.h | 39 --- src/include/postmaster/shell_archive.h | 24 +++ 8 files changed, 224 insertions(+), 85 deletions(-) create mode 100644 src/include/postmaster/archive_module.h create mode 100644 src/include/postmaster/shell_archive.h diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index 36b7a4814a..e4742c9c94 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -32,7 +32,7 @@ #include "common/int.h" #include "miscadmin.h" -#include "postmaster/pgarch.h" +#include "postmaster/archive_module.h" #include "storage/copydir.h" #include "storage/fd.h" #include "utils/guc.h" @@ -40,14 +40,27 @@ PG_MODULE_MAGIC; +typedef struct BasicArchiveData +{ + MemoryContext context; +} BasicArchiveData; + static char *archive_directory = NULL; -static MemoryContext basic_archive_context; -static bool basic_archive_configured(void); -static bool basic_archive_file(const char *file, const char *path); +static void basic_archive_startup(ArchiveModuleState *state); +static bool basic_archive_configured(ArchiveModuleState *state); +static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path); static void basic_archive_file_internal(const cha
Re: Generating code for query jumbling through gen_node_support.pl
On Wed, Feb 08, 2023 at 03:47:51PM +0900, Michael Paquier wrote: > This one was intentional to let extensions play with jumbling of such > nodes, but perhaps you are right that it makes little sense at this > stage. If there is an ask for it later, though.. Using > shared_preload_libraries = pg_stat_statements and compute_query_id = > regress shows that numbers go up to 60% for funcs.c and 30% for > switch.c. Removing nodes like as of the attached brings these numbers > respectively up to 94.5% and 93.5% for a check. With a check-world, I > measure respectively 96.7% and 96.1% because there is more coverage > for extensions, ALTER SYSTEM and database commands, roughly. Tom, did you get a chance to look at what is proposed here, namely expanding the use of query_jumble_ignore in the definitions of the nodes rather than having an enforced per-file policy in gen_node_support.pl? The attached improves the situation as much as we can; still, the numbers reported by coverage.postgresql.org won't go up much further until we enforce query jumbling testing on the regression database (something we should have done since this code moved to core, I guess). -- Michael
Re: pg_usleep for multisecond delays
On Thu, Feb 09, 2023 at 01:30:27PM -0800, Andres Freund wrote: > On 2023-02-09 12:59:29 -0800, Nathan Bossart wrote: >> I just found myself carefully counting the zeros in a call to pg_usleep(). >> Besides getting my eyes checked, perhaps there should be a wrapper called >> pg_ssleep() than can be used for multisecond sleeps. Or maybe the >> USECS_PER_SEC macro should be used more widely. I attached a patch for the >> former approach. I don't have a strong opinion, but I do think it's worth >> improving readability a bit here. > > pg_usleep() should pretty much never used for sleeps that long, at least in > the backend - depending on the platform they're not interruptible. Most of the > things changed here are debugging tools, but even so, it's e.g. pretty > annoying. E.g. you can't normally shut down while a backend is in > pre_auth_delay. > > So I'm not sure it's the right direction to make pg_usleep() easier to use... Hm... We might be able to use WaitLatch() for some of these. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
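For reference, the interruptible idiom that would replace a long pg_usleep() in backend code looks roughly like this (a sketch only; delay_secs stands in for whatever delay the caller wants, and the wait event should be chosen per call site):

    /*
     * Sleep for delay_secs seconds, but wake up promptly on latch set
     * (e.g. shutdown) or postmaster death instead of blocking blindly.
     */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     delay_secs * 1000L,    /* WaitLatch timeout is in msec */
                     WAIT_EVENT_PG_SLEEP);
    ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();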
Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
On Fri, Feb 10, 2023 at 08:15:21AM +1300, Thomas Munro wrote: > For the record, according to [1] it's not necessary to use > --reset-author when back-patching. (Maybe a little confusingly, > because it's not quite clear whether our policies consider the author > field to be meaningful or not.) > > [1] > https://www.postgresql.org/message-id/flat/1347459696.16215.11.camel%40vanquo.pezone.net Oops, sorry about that. Using --reset-author is a habit of mine when back-patching. It looks like I made that mistake when back-patching something only to stable branches. -- Michael
Re: Importing pg_bsd_indent into our source tree
Hi, On 2023-02-09 17:19:22 -0500, Tom Lane wrote: > Andres Freund writes: > > The trivial fix of using MAXPGPATH made it build, without warnings. That > > doesn't say anything about actually working. So I guess porting the test > > would > > make sense. > > > Opinions on whether it would make sense as a shell script? > > Hmmm .. a shell script would be fine by me, but it won't help in > testing a Windows build. Maybe we need to make it a Perl script? At least for casual testing a shell script actually mostly works, due to git it's easy enough to have a sh.exe around... Not something I'd necessarily want to make a hard dependency, but for something like this it might suffice. Of course perl would be more dependable... > BTW, the attachments to your previous message are identical to what > I previously posted --- did you attach the wrong set of diffs? I attached an extra patch, in addition to yours. I also attached yours so that cfbot could continue to work, if you registered this. Greetings, Andres Freund
Re: Generating code for query jumbling through gen_node_support.pl
Michael Paquier writes: > Tom, did you get a chance to look at what is proposed here and expand > the use of query_jumble_ignore in the definitions of the nodes rather > than have an enforced per-file policy in gen_node_support.pl? Sorry, didn't look at it before. I'm okay with the pathnodes.h changes --- although surely you don't need changes like this: - pg_node_attr(abstract) + pg_node_attr(abstract, no_query_jumble) "abstract" should already imply "no_query_jumble". I wonder too if you could shorten the changes by making no_query_jumble an inheritable attribute, and then just applying it to Path and Plan. The changes in parsenodes.h seem wrong, except for RawStmt. Those node types are used in parsed queries, aren't they? regards, tom lane
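For anyone who hasn't poked at gen_node_support.pl much, the knobs under discussion are node attributes that can be applied either to a whole struct or to individual fields, along these lines (an illustrative sketch, not text from the patch; the struct names are invented):

    /* planner/executor-only node types can opt out of jumbling entirely */
    typedef struct SomePlannerNode
    {
        pg_node_attr(abstract, no_query_jumble)

        NodeTag     type;
    } SomePlannerNode;

    /* ... or individual fields of a parse node can be skipped */
    typedef struct SomeParseNode
    {
        NodeTag     type;
        List       *args;       /* contributes to the query ID */
        Oid         resulttype pg_node_attr(query_jumble_ignore);
        int         location pg_node_attr(query_jumble_location);
    } SomeParseNode;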
Re: Importing pg_bsd_indent into our source tree
Andres Freund writes: > On 2023-02-09 17:19:22 -0500, Tom Lane wrote: >> Hmmm .. a shell script would be fine by me, but it won't help in >> testing a Windows build. Maybe we need to make it a Perl script? > At least for casual testing a shell script actually mostly works, due to git > it's easy enough to have a sh.exe around... Not something I'd necessarily want > to make a hard dependency, but for something like this it might suffice. Of > course perl would be more dependable... Yeah, also less question about whether it works on Windows. I'll see about moving that into Perl. It's short enough. >> BTW, the attachments to your previous message are identical to what >> I previously posted --- did you attach the wrong set of diffs? > I attached an extra patch, in addition to yours. D'oh, I didn't notice that :-( > I also attached yours so that > cfbot could continue to work, if you registered this. I thought about registering it, but that won't teach us anything unless we make it built-by-default, which was not my intention. I guess we could temporarily include it in the build. regards, tom lane
Re: Minor meson gripe
On Thu, Feb 9, 2023 at 12:56 PM Andres Freund wrote: > > * Isn't it confusing that we have a suite by that name, given that we > > also need to use the unrelated --setup flag for some nearby testing > > recipes? > > Hm. I don't find it particularly confusing, but I don't think I'm a good judge > of that, too close. > It'll work, but only if you have run setup before. And it'll not use changed C > code. I see. It's not that confusing on its own, but it does cause confusion once you consider how things fit together. Suppose I want to do the equivalent of running the amcheck tests -- the tests that run when "make check" runs from contrib/amcheck with an autoconf build. That looks like this with our current meson workflow:

meson test -v --suite setup --suite amcheck

Now consider what I have to run to get the equivalent of a "make installcheck" run from the contrib/amcheck directory:

meson test -v --setup running --suite amcheck-running

Notice that I have to specify "--suite setup" in the first example, whereas I have to specify "--setup running" in the second example instead -- at the same point in the command. Also notice that the rest of the details almost match. This makes it quite natural to wonder if "--suite setup" is related to "--setup running" in some way, which is not the case at all. They're two wholly unrelated concepts. Why not change the suite name to tmp_install? That immediately reminds me of what's really going on here, since I'm used to seeing that directory name. And it clashes with "--suite setup" in a way that seems useful. -- Peter Geoghegan
Re: ICU locale validation / canonicalization
On 2/9/23 23:09, Jeff Davis wrote:
I do like the ICU format locale IDs from a readability standpoint. "en_US@colstrength=primary" is more meaningful to me than "en-US-u-ks-level1" (the equivalent language tag). And the format is specified[1], even though it's not an independent standard. But I think the benefits of better validation, an independent standard, and the fact that we're already favoring BCP47 outweigh my subjective opinion.

I have the same feeling: one is readable and the other unreadable, but the unreadable one is standardized. Hard call.

And in general I agree: if we are going to make ICU the default, it needs to be more user-friendly than it is now. Currently there is no nice way to tell whether you entered the right locale or made a typo in the BCP47 syntax.

Andreas
Re: make_ctags: use -I option to ignore pg_node_attr macro
>> Oops. Thank you for pointing it out. BTW, just out of curiosity, do
>> you have etags on you Mac?
>
> Yes.
>
> $ etags --version
> etags (GNU Emacs 28.2)
> Copyright (C) 2022 Free Software Foundation, Inc.
> This program is distributed under the terms in ETAGS.README

Ok. Probably that was installed with emacs. For some reason I don't know, I don't have etags even though I have already installed emacs. So I decided to test using my old Ubuntu 18 vm, which has old ctags and etags.

> How about just applying the following into make_etags?
>
> +if [ $# -gt 1 ] || ( [ $# -eq 1 ] && [ $1 != "-n" ] )
> +then echo "Usage: $0 [-n]"
> + exit 1
> +fi

Ok from me. Looks simple enough.

> With the patch, make_etags caused the following error messages
> on my MacOS.
>
> $ ./src/tools/make_etags
> No such file or directory
> No such file or directory
>
> To fix this error, probaby we should get rid of double-quotes
> from "$FLAGS" "$IGNORE_IDENTIFIES" in the following command.
>
> - xargs ctags $MODE -a -f $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"
> + xargs $PROG $MODE $TAGS_OPT $TAGS_FILE "$FLAGS" "$IGNORE_IDENTIFIES"

Oh, I see.

> + elsectags --help 2>&1 | grep -- -e >/dev/null
> + # Note that "ctags --help" does not always work. Even certain ctags
> does not have the option.
>
> This code seems to assume that there is non-Exuberant ctags
> supporting -e option. But does such ctags really exist?

Good question. I vaguely recalled so, but I haven't been able to find any evidence for it.

>
> I fixed the above issues and refactored the code.
> Attached is the updated version of the patch. Thought?

Thank you! Looks good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: Importing pg_bsd_indent into our source tree
Hi, On 2023-02-09 18:19:06 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-02-09 17:19:22 -0500, Tom Lane wrote: > >> Hmmm .. a shell script would be fine by me, but it won't help in > >> testing a Windows build. Maybe we need to make it a Perl script? > > > At least for casual testing a shell script actually mostly works, due to git > > it's easy enough to have a sh.exe around... Not something I'd necessarily > > want > > to make a hard dependency, but for something like this it might suffice. Of > > course perl would be more dependable... > > Yeah, also less question about whether it works on Windows. > I'll see about moving that into Perl. It's short enough. Cool. > > I also attached yours so that > > cfbot could continue to work, if you registered this. > > I thought about registering it, but that won't teach us anything unless > we make it built-by-default, which was not my intention. I guess we > could temporarily include it in the build. The meson patch I sent did build it by default, that's why I saw the windows failure and the freebsd warnings. If we don't want that, we'd need to add build_by_default: false I'm fine either way. It's barely noticeable compared to the rest of postgres. Greetings, Andres Freund
Re: Importing pg_bsd_indent into our source tree
Andres Freund writes: > On 2023-02-09 18:19:06 -0500, Tom Lane wrote: >> I thought about registering it, but that won't teach us anything unless >> we make it built-by-default, which was not my intention. I guess we >> could temporarily include it in the build. > The meson patch I sent did build it by default, that's why I saw the windows > failure and the freebsd warnings. If we don't want that, we'd need to add > build_by_default: false > I'm fine either way. It's barely noticeable compared to the rest of postgres. Yeah, build-by-default isn't really a big deal. Install-by-default is more of a problem... regards, tom lane
Re: Minor meson gripe
Hi, On 2023-02-09 15:34:34 -0800, Peter Geoghegan wrote: > Why not change the suite name to tmp_install? That immediately reminds > me of what's really going on here, since I'm used to seeing that > directory name. And it clashes with "--suite setup" in a way that > seems useful. The individual test is actually named tmp_install. I thought it might be useful to add further things to the setup "stage", hence the more general name. I e.g. have a not-quite-done patch that creates a "template initdb", which pg_regress and TAP tests automatically use (except if non-default options are required), which quite noticeably reduces test times (*). But something needs to create the template initdb, and initdb doesn't run without an installation, so we need to run it after the temp_install. Of course we could wrap that into one "test", but it seemed nicer to see if you fail during installation, or during initdb. So for that I added a separate test, that is also part of the setup suite. Of course we could still name the suite tmp_install (or to be consistent with the confusing make naming, have a temp-install target, which creates the tmp_install directory :)). I guess that'd still be less confusing? I'm not at all wedded to the "setup" name. Greetings, Andres Freund

* approximate test time improvements:

local: 688.67user 154.44system 1:08.29elapsed 1234%CPU (0avgtext+0avgdata 138984maxresident)k -> 172.37user 109.43system 1:00.12elapsed 468%CPU (0avgtext+0avgdata 139168maxresident)k

The 4x reduction in CPU cycles is pretty bonkers. Too bad wall clock time doesn't improve that much - we end up waiting for isolationtester, pg_upgrade tests to finish.

CI freebsd: 06:30 -> 05:00
CI linux, sanitize-address: 5:40 -> 4:30
CI linux, sanitize-undefined,alignment: 3:00 -> 2:15
CI windows: 12:20 -> 10:00
CI macos: 10:20 -> 08:00

I expect it to very substantially speed up the valgrind buildfarm animal. By far the most cycles are spent below initdb.

https://github.com/anarazel/postgres/tree/initdb-caching
Re: Importing pg_bsd_indent into our source tree
On 2023-02-09 19:20:37 -0500, Tom Lane wrote: > Andres Freund writes: > > I'm fine either way. It's barely noticeable compared to the rest of > > postgres. > > Yeah, build-by-default isn't really a big deal. Install-by-default > is more of a problem... Perhaps we should install it, just not in bin/, but alongside pgxs/, similar to pg_regress et al?
Re: Importing pg_bsd_indent into our source tree
Andres Freund writes: > Perhaps we should install it, just not in bin/, but alongside pgxs/, similar > to pg_regress et al? For my own purposes, I really don't want it anywhere in the --prefix tree. That's not necessarily present when I'm using the program. (Hmm, clarify: it wouldn't matter if install sticks it under pgxs, but I don't want to be forced to use it from there.) regards, tom lane
Re: Introduce a new view for checkpointer related stats
Hi, On 2023-02-09 19:00:00 +0530, Bharath Rupireddy wrote: > On Thu, Feb 9, 2023 at 12:33 PM Andres Freund wrote: > > On 2023-02-09 12:21:51 +0530, Bharath Rupireddy wrote: > > > @@ -1105,18 +1105,22 @@ CREATE VIEW pg_stat_archiver AS > > > > > > CREATE VIEW pg_stat_bgwriter AS > > > SELECT > > > -pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, > > > -pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, > > > -pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, > > > -pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, > > > -pg_stat_get_bgwriter_buf_written_checkpoints() AS > > > buffers_checkpoint, > > > pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, > > > pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, > > > -pg_stat_get_buf_written_backend() AS buffers_backend, > > > -pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, > > > pg_stat_get_buf_alloc() AS buffers_alloc, > > > pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; > > > > > > +CREATE VIEW pg_stat_checkpointer AS > > > +SELECT > > > +pg_stat_get_timed_checkpoints() AS timed_checkpoints, > > > +pg_stat_get_requested_checkpoints() AS requested_checkpoints, > > > +pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, > > > +pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, > > > +pg_stat_get_buf_written_checkpoints() AS > > > buffers_written_checkpoints, > > > +pg_stat_get_buf_written_backend() AS buffers_written_backend, > > > +pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend, > > > +pg_stat_get_checkpointer_stat_reset_time() AS stats_reset; > > > + > > > > I don't think the backend written stats belong more accurately in > > pg_stat_checkpointer than pg_stat_bgwriter. > > We accumulate buffers_written_backend and buffers_fsync_backend of all > backends under checkpointer stats to show the aggregated results to > the users. I think this is correct because the checkpointer is the one > that processes fsync requests (of course, backends themselves can > fsync when needed, that's what the buffers_fsync_backend shows), > whereas bgwriter doesn't perform IIUC. That's true for buffers_fsync_backend, but not true for buffers_backend/buffers_written_backend. That isn't tied to checkpointer. I think if we end up breaking compat, we should just drop that column. The pg_stat_io patch from Melanie, which I hope to finish committing by tomorrow, provides that in a more useful way, in a less confusing place. I'm not sure it's worth having buffers_fsync_backend in pg_stat_checkpointer in that case. You can get nearly the same information from pg_stat_io as well (except fsyncs for SLRUs that couldn't be put into the queue, which you'd not see right now - hard to believe that ever happens at a relelvant frequency). > > I continue to be worried about breaking just about any postgres monitoring > > setup. > > Hm. Yes, it requires minimal and straightforward changes in monitoring > scripts. Please note that we separated out bgwriter and checkpointer > in v9.2 12 years ago but we haven't had a chance to separate the stats > so far. We might do it at some point of time, IMHO this is that time. > We did away with promote_trigger_file (cd4329d) very recently. The > agreement was that the changes required to move on to other mechanisms > of promotion are minimal, hence we didn't want it to be first > deprecated and then removed. That's not really comparable, because we have had pg_ctl promote for a long time. 
You can use it across all supported versions. pg_promote() is nearly there as well. Whereas there's no way to use the same query across all versions. IME there also exist a lot more hand-rolled monitoring setups than hand-rolled automatic promotion setups. > From the discussion upthread, it looks like Robert, Amit, Bertrand, > Greg and myself are in favour of not having a deprecated version but > moving them to the new pg_stat_checkpointer view. Yep, and I think you are all wrong, and that this is just going to cause unnecessary pain :). I'm not going to try to prevent the patch from going in because of this, just to be clear. Greetings, Andres Freund
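To spell out the compatibility concern with the view definitions quoted above, a typical monitoring query would have to change roughly like this (column names as in the posted patch, which may still change):

    -- today, and in all currently supported branches
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_backend, buffers_backend_fsync
    FROM pg_stat_bgwriter;

    -- with the proposed pg_stat_checkpointer split
    SELECT timed_checkpoints, requested_checkpoints,
           buffers_written_checkpoints, buffers_written_backend, buffers_fsync_backend
    FROM pg_stat_checkpointer;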
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Thu, 9 Feb 2023 13:26:19 +0530, Amit Kapila wrote in > On Thu, Feb 9, 2023 at 12:17 AM Peter Smith wrote: > > I understand in theory, your code is more efficient, but in practice, > > I think the overhead of a single variable assignment every loop > > iteration (which is doing WaitLatch anyway) is of insignificant > > concern, whereas having one assignment is simpler than having two IMO. > > > > Yeah, that sounds better to me as well. FWIW, I'm on board with this. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Thu, 9 Feb 2023 13:48:52 +0530, Amit Kapila wrote in > On Thu, Feb 9, 2023 at 10:45 AM Kyotaro Horiguchi > wrote: > > > > At Wed, 8 Feb 2023 09:03:03 +, "Hayato Kuroda (Fujitsu)" > > wrote in > > > Thank you for reviewing! PSA new version. > > > > + if (statusinterval_ms > 0 && diffms > statusinterval_ms) > > > > The next expected feedback time is measured from the last status > > report. Thus, it seems to me this may suppress feedbacks from being > > sent for an unexpectedly long time especially when min_apply_delay is > > shorter than wal_r_s_interval. > > > > I think the minimum time before we send any feedback during the wait > is wal_r_s_interval. Now, I think if there is no transaction for a > long time before we get a new transaction, there should be keep-alive > messages in between which would allow us to send feedback at regular > intervals (wal_receiver_status_interval). So, I think we should be Right. > able to send feedback in less than 2 * wal_receiver_status_interval > unless wal_sender/receiver timeout is very large and there is a very > low volume of transactions. Now, we can try to send the feedback We have suffered this kind of feedback silence many times. Thus I don't want to rely on luck here. I had in mind exposing last_send itself, or providing an interval-calculation function to the logic. > before we start waiting or maybe after every > wal_receiver_status_interval / 2 but I think that will lead to more > spurious feedback messages than we get the benefit from them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Minor meson gripe
On Thu, Feb 9, 2023 at 4:33 PM Andres Freund wrote: > The individual test is actually named tmp_install. I thought it might be > useful to add further things to the setup "stage", hence the more general > name. I did notice that, but only after sitting with my initial confusion for a while. > I e.g. have a not-quite-done patch that creates a "template initdb", which > pg_regress and tap tests automatically use (except if non-default options are > required), which quite noticably reduces test times (*). But something needs > to > create the template initdb, and initdb doesn't run without an installation, so > we need to run it after the temp_install. > > Of course we could wrap that into one "test", but it seemed nicer to see if > you fail during installation, or during initdb. So for that I added a separate > test, that is also part of the setup suite. But what are the chances that the setup / tmp_install "test" will actually fail? It's almost a test in name only. > Of course we could still name the suite tmp_install (or to be consistent with > the confusing make naming, have a temp-install target, which creates the > tmp_install directory :)). I guess that'd still be less confusing? Yes, that definitely seems like an improvement. I don't care about the tiny inconsistency that this creates. I wonder if this could be addressed by adding another custom test setup, like --setup running, used whenever you want to just run one or two tests against an ad-hoc temporary installation? Offhand it seems as if add_test_setup() could support that requirement? -- Peter Geoghegan
Re: Time delayed LR (WAS Re: logical replication restrictions)
Mmm. Part of the previous mail went missing for some unknown reason and was replaced by mysterious blank lines... At Fri, 10 Feb 2023 09:57:22 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 9 Feb 2023 13:48:52 +0530, Amit Kapila > wrote in > > On Thu, Feb 9, 2023 at 10:45 AM Kyotaro Horiguchi > > wrote: > > > > > > At Wed, 8 Feb 2023 09:03:03 +, "Hayato Kuroda (Fujitsu)" > > > wrote in > > > > Thank you for reviewing! PSA new version. > > > > > > + if (statusinterval_ms > 0 && diffms > statusinterval_ms) > > > > > > The next expected feedback time is measured from the last status > > > report. Thus, it seems to me this may suppress feedbacks from being > > > sent for an unexpectedly long time especially when min_apply_delay is > > > shorter than wal_r_s_interval. > > > > > > > I think the minimum time before we send any feedback during the wait > > is wal_r_s_interval. Now, I think if there is no transaction for a > > long time before we get a new transaction, there should be keep-alive > > messages in between which would allow us to send feedback at regular > > intervals (wal_receiver_status_interval). So, I think we should be > > Right. > > > able to send feedback in less than 2 * wal_receiver_status_interval > > unless wal_sender/receiver timeout is very large and there is a very > > low volume of transactions. Now, we can try to send the feedback > > We have suffered this kind of feedback silence many times. Thus I > don't want to rely on luck here. I had in mind exposing last_send > itself, or providing an interval-calculation function to the logic. > > > before we start waiting or maybe after every > > wal_receiver_status_interval / 2 but I think that will lead to more > > spurious feedback messages than we get the benefit from them. Agreed. I think we don't want to do that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
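To spell out the alternative being floated here (purely a sketch of the idea, not code from any posted patch; the variable names are illustrative), the delay loop could measure against the walreceiver's own last feedback send time instead of relying on keep-alives to arrive in time:

    now = GetCurrentTimestamp();
    if (wal_receiver_status_interval > 0 &&
        TimestampDifferenceExceeds(last_send_time, now,
                                   wal_receiver_status_interval * 1000))
    {
        /* a full status interval has passed since we last reported */
        send_feedback(last_received, true, false);
        last_send_time = now;
    }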
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones wrote: > > On 09.02.23 08:23, Tom Lane wrote: > > Um ... why are you using PG_TRY here at all? It seems like > > you have full control of the actually likely error cases. > > The only plausible error out of the StringInfo calls is OOM, > > and you probably don't want to trap that at all. > > My intention was to catch any unexpected error from > xmlDocDumpFormatMemory and handle it properly. But I guess you're right, > I can control the likely error cases by checking doc and nbytes. > > You suggest something along these lines? > > xmlDocPtr doc; > xmlChar*xmlbuf = NULL; > text *arg = PG_GETARG_TEXT_PP(0); > StringInfoData buf; > int nbytes; > > doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, > GetDatabaseEncoding(), NULL); > > if(!doc) > elog(ERROR, "could not parse the given XML document"); > > xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); > > xmlFreeDoc(doc); > > if(!nbytes) > elog(ERROR, "could not indent the given XML document"); > > initStringInfo(&buf); > appendStringInfoString(&buf, (const char *)xmlbuf); > > xmlFree(xmlbuf); > > PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); > > > Thanks! > Something like that LGTM, but here are some other minor comments. == 1. FYI, there are some whitespace warnings applying the v5 patch [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41: trailing whitespace. warning: squelched 48 whitespace errors warning: 53 lines add whitespace errors. == src/test/regress/sql/xml.sql 2. The v5 patch was already testing NULL, but it might be good to add more tests to verify the function is behaving how you want for edge cases. For example, +-- XML pretty print: NULL, empty string, spaces only... SELECT xmlpretty(NULL); SELECT xmlpretty(''); SELECT xmlpretty(' '); ~~ 3. The function is returning XML anyway, so is the '::xml' casting in these tests necessary? e.g. SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL); == src/include/catalog/pg_proc.dat 4. + { oid => '4642', descr => 'Indented text from xml', + proname => 'xmlpretty', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlpretty' }, Spurious leading space for this new entry. == doc/src/sgml/func.sgml 5. + + + 42 + + + +(1 row) + +]]> A spurious blank line in the example after the "(1 row)" ~~~ 6. Does this function docs belong in section 9.15.1 "Producing XML Content"? Or (since it is not really producing content) should it be moved to the 9.15.3 "Processing XML" section? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Minor meson gripe
Hi, On 2023-02-09 17:00:48 -0800, Peter Geoghegan wrote: > On Thu, Feb 9, 2023 at 4:33 PM Andres Freund wrote: > > I e.g. have a not-quite-done patch that creates a "template initdb", which > > pg_regress and tap tests automatically use (except if non-default options > > are > > required), which quite noticably reduces test times (*). But something > > needs to > > create the template initdb, and initdb doesn't run without an installation, > > so > > we need to run it after the temp_install. > > > > Of course we could wrap that into one "test", but it seemed nicer to see if > > you fail during installation, or during initdb. So for that I added a > > separate > > test, that is also part of the setup suite. > > But what are the chances that the setup / tmp_install "test" will > actually fail? It's almost a test in name only. I've seen more failures than I'd like. Permission errors, conflicting names, and similar things. But mainly that was a reference to running initdb, which I've broken temporarily many a time. > > Of course we could still name the suite tmp_install (or to be consistent > > with > > the confusing make naming, have a temp-install target, which creates the > > tmp_install directory :)). I guess that'd still be less confusing? > > Yes, that definitely seems like an improvement. I don't care about the > tiny inconsistency that this creates. Then let's do that - feel free to push something, or send something for review. Otherwise I'll try to get to it, but I owe a few people work before this... > I wonder if this could be addressed by adding another custom test > setup, like --setup running, used whenever you want to just run one or > two tests against an ad-hoc temporary installation? Offhand it seems > as if add_test_setup() could support that requirement? What precisely do you mean by "ad-hoc temporary installation"? I was wondering about adding a different setup that'd use the "real" installation to run tests. But perhaps that's something different than what you have in mind? The only restriction I see wrt add_test_setup() is that it's not entirely trivial to use a "runtime-variable" path to an installation. Greetings, Andres Freund
Re: ICU locale validation / canonicalization
On Fri, 2023-02-10 at 01:04 +0100, Andreas Karlsson wrote: > I have the same feeling one is readable and the other unreadable but > the > unreadable one is standardized. Hard call. > > And in general I agree, if we are going to make ICU default it needs > to > be more user friendly than it is now. Currently there is no nice way > to > understand if you entered the right locale or made a typo in the > BCP47 > syntax. We will still allow the ICU format locale IDs for input; we would just convert them to BCP47 before storing them in the catalog. And there's an inverse function, so it's easy enough to offer a view that shows the ICU format locale IDs in addition to the BCP 47 tags. I don't think it's hugely important that we use BCP47; ICU format locale IDs would also make sense. But I do think we should be consistent to simplify things where we can -- collator versioning is hard enough without wondering how a user-entered string will be interpreted. And if we're going to be consistent, BCP 47 seems like the most obvious choice. -- Jeff Davis PostgreSQL Contributor Team - AWS
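As a concrete illustration of the proposed behavior (hypothetical -- this is what the proposal would do, not what happens today), using the equivalence mentioned earlier in the thread:

    -- the user may keep typing the ICU-format locale ID ...
    CREATE COLLATION english_primary (provider = icu, locale = 'en_US@colstrength=primary');

    -- ... but the catalog would store the canonical BCP 47 tag
    SELECT collname, colliculocale
    FROM pg_collation
    WHERE collname = 'english_primary';
    -- expected under the proposal: english_primary | en-US-u-ks-level1

A view (or a simple function) exposing the ICU-style spelling alongside colliculocale would then address the readability concern.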
Re: tests against running server occasionally fail, postgres_fdw & tenk1
On Wed, Feb 8, 2023 at 7:18 PM Andres Freund wrote: > > This is a good thing for performance, of course, but it also makes VACUUM > > VERBOSE show information that makes sense to users, since things actually > > happen in a way that makes a lot more sense. I'm quite happy about the fact > > that the new VACUUM VERBOSE allows users to mostly ignore obscure details > > like whether an index was scanned by amvacuumcleanup() or by ambulkdelete() > > -- stuff that basically nobody understands. That seems worth preserving. > > I don't mind making the messages as similar as possible, but I do mind if I as > a postgres hacker, or an expert consultant, can't parse that detail out. We > need to be able to debug things like amvacuumcleanup() doing too much work too > often. FWIW you can tell even today. You can observe that the number of index scans is 0, and that one or more indexes have their size reported -- that indicates that an amvacuumcleanup()-only scan took place, say because we needed to put some preexisting deleted pages in the FSM. There is also another detail that strongly hints that VACUUM VERBOSE had to scan an index during its call to amvacuumcleanup(), which is atypical: it only shows details for that particular index, which is really noticeable. It won't report anything about those indexes that had no-op calls to amvacuumcleanup(). It kind of makes sense that we report 0 index scans when there were 0 calls to ambulkdelete(), even though there may still have been some index scans during a call to some amvacuumcleanup() routine. The common case is that they're no-op calls for every index, but even when they're not there is still probably only one or two indexes that have to do a noticeable amount of I/O. It makes sense to "round down to 0". Granted, there are some notable exceptions. For example, gistvacuumcleanup() doesn't even try to avoid scanning the index. But that's really a problem in gistvacuumcleanup() -- since it really doesn't make very much sense, even from a GiST point of view. It can follow exactly the same approach as B-Tree here, since its approach to page deletion is already directly based on nbtree. -- Peter Geoghegan