Re: Add 'worker_type' to pg_stat_subscription
On Sat, Sep 16, 2023 at 1:06 AM Nathan Bossart wrote: > > On Thu, Sep 14, 2023 at 03:04:19PM -0700, Nathan Bossart wrote: > > The only reason I didn't apply this already is because IMHO we should > > adjust the worker types and the documentation for the view to be > > consistent. For example, the docs say "leader apply worker" but the view > > just calls them "apply" workers. The docs say "synchronization worker" but > > the view calls them "table synchronization" workers. My first instinct is > > to call apply workers "leader apply" workers in the view, and to call table > > synchronization workers "table synchronization workers" in the docs. > > Concretely, like this. > I think there is a merit in keeping the worker type as 'sync' or 'synchronization' because these would be used in future for syncing other objects like sequences. One more thing that slightly looks odd is the 'leader apply' type of worker, won't this be confusing when there is no parallel apply worker in the system? In this regard, probably existing documentation could also be improved. -- With Regards, Amit Kapila.
Re: JSON Path and GIN Questions
On Sep 15, 2023, at 20:36, Tom Lane wrote: > I think that that indicates that you're putting the info in the > wrong place. Perhaps the right answer is to insert something > more explicit in section 11.2, which is the first place where > we really spend any effort discussing what can be indexed. Fair enough. How ’bout this? --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id); B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension bloom. Each index type uses a different - algorithm that is best suited to different types of queries. + algorithm that is best suited to different types of queries and operators. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations. @@ -132,6 +132,14 @@ CREATE INDEX name ON table + + +Only operators on indexed columns are considered as index qualifications. +Functions never qualify for index usage, aside from +indexes on expressions. + + + B-Tree signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 15, 2023, at 23:59, Erik Rijkers wrote: > movie @? '$ ?($.year >= 2023)' > > I believe it is indeed not possible to have such a unequality-search use the > GIN index. It is another weakness of JSON that can be unexpected to those > not in the fullness of Knowledge of the manual. Yes, this too would be good > to explain in the doc where JSON indexes are explained. Is that a limitation of GIN indexes in general? Or could there be opclass improvements in the future that would enable such comparisons? Thanks, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 12, 2023, at 21:00, Erik Wienhold wrote: >> If so, I’d like to submit a patch to the docs talking about this, and >> suggesting the use of jsonb_path_query() to test paths to see if they return >> a boolean or not. > > +1 I’ve started work on this; there’s so much to learn! Here’s a new example that surprised me a bit. Using the GPS tracker example from the docs [1] loaded into a `:json` psql variable, this output of this query makes perfect sense to me: david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)'); jsonb_path_query -- 13.4034 13.2635 Because `[*]` selects all the values. This, however, I did not expect: david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)'); jsonb_path_query -- 13.4034 13.2635 (2 rows) I had expected it to return two single-value arrays, instead: [13.4034] [13.2635] It appears that the filter expression is doing some sub-selection, too. Is that expected? Best, David [1]: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH signature.asc Description: Message signed with OpenPGP
Re: Add 'worker_type' to pg_stat_subscription
On Sat, Sep 16, 2023 at 06:09:48PM +0530, Amit Kapila wrote: > I think there is a merit in keeping the worker type as 'sync' or > 'synchronization' because these would be used in future for syncing > other objects like sequences. One more thing that slightly looks odd > is the 'leader apply' type of worker, won't this be confusing when > there is no parallel apply worker in the system? In this regard, > probably existing documentation could also be improved. These are good points. I went ahead and adjusted the patch back to using "apply" for [leader] apply workers and to using "synchronization" for synchronization workers. I also adjusted a couple of the error messages that Michael pointed out to say "synchronization worker" instead of "table synchronization worker" or "tablesync worker". This still leaves the possibility for confusion with the documentation's use of "leader apply worker", but I haven't touched that for now. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 42c7b1910af29a8543ed65b2150e5eedae34a594 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Sat, 16 Sep 2023 13:21:54 -0700 Subject: [PATCH v8 1/1] Add worker type to pg_stat_subscription. Thanks to 2a8b40e368, the logical replication worker type is easily determined, and this information is a nice addition to the pg_stat_subscription view. The worker type could already be deduced via other columns such as leader_pid and relid, but that is unnecessary complexity for users. Bumps catversion. Author: Peter Smith Reviewed-by: Michael Paquier, Maxim Orlov, Amit Kapila Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com --- doc/src/sgml/monitoring.sgml| 11 +++ src/backend/catalog/system_views.sql| 1 + src/backend/replication/logical/launcher.c | 18 +- src/backend/replication/logical/tablesync.c | 2 +- src/backend/replication/logical/worker.c| 6 +++--- src/include/catalog/catversion.h| 2 +- src/include/catalog/pg_proc.dat | 6 +++--- src/test/regress/expected/rules.out | 3 ++- src/test/subscription/t/004_sync.pl | 2 +- 9 files changed, 40 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 4ff415d6a0..3546e8b3d9 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1993,6 +1993,17 @@ description | Waiting for a newly initialized WAL file to reach durable storage + + + worker_type text + + + Type of the subscription worker process. Possible types are + apply, parallel apply, and + synchronization. + + + pid integer diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 77b06e2a7a..fcb14976c0 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS SELECT su.oid AS subid, su.subname, +st.worker_type, st.pid, st.leader_pid, st.relid, diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c index 7882fc91ce..49e1c934a7 100644 --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -1278,7 +1278,7 @@ GetLeaderApplyWorkerPid(pid_t pid) Datum pg_stat_get_subscription(PG_FUNCTION_ARGS) { -#define PG_STAT_GET_SUBSCRIPTION_COLS 9 +#define PG_STAT_GET_SUBSCRIPTION_COLS 10 Oid subid = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0); int i; ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; @@ -1339,6 +1339,22 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS) else values[8] = TimestampTzGetDatum(worker.reply_time); + switch (worker.type) + { + case WORKERTYPE_APPLY: +values[9] = CStringGetTextDatum("apply"); +break; + case WORKERTYPE_PARALLEL_APPLY: +values[9] = CStringGetTextDatum("parallel apply"); +break; + case WORKERTYPE_TABLESYNC: +values[9] = CStringGetTextDatum("synchronization"); +break; + case WORKERTYPE_UNKNOWN: +/* Should never happen. */ +elog(ERROR, "unknown worker type"); + } + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index e2cee92cf2..cd71807088 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -151,7 +151,7 @@ finish_sync_worker(void) StartTransactionCommand(); ereport(LOG, - (errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished", + (errmsg("logical replication synchronization worker for subscription \"%s\", table \"%s\" has finished"
Re: JSON Path and GIN Questions
On 16/09/2023 22:19 CEST David E. Wheeler wrote: > On Sep 15, 2023, at 23:59, Erik Rijkers wrote: > > > movie @? '$ ?($.year >= 2023)' > > > > I believe it is indeed not possible to have such a unequality-search use > > the GIN index. It is another weakness of JSON that can be unexpected to > > those not in the fullness of Knowledge of the manual. Yes, this too would > > be good to explain in the doc where JSON indexes are explained. > > Is that a limitation of GIN indexes in general? Or could there be opclass > improvements in the future that would enable such comparisons? This detail is mentioned in docs [1]: "For these operators, a GIN index extracts clauses of the form **accessors_chain = constant** out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses." I don't know if this is a general limitation of GIN indexes or just how these operators are implemented right now. [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING -- Erik
Re: JSON Path and GIN Questions
On Sep 16, 2023, at 16:50, Erik Wienhold wrote: > "For these operators, a GIN index extracts clauses of the form > **accessors_chain = constant** out of the jsonpath pattern, and does the > index search based on the keys and values mentioned in these clauses." > > I don't know if this is a general limitation of GIN indexes or just how these > operators are implemented right now. > > [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING The detail that jumps out at me is this one on jsonb_path_ops: “Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it” Because jsonb_path_ops indexes hashes, I would assume it would only support path equality. But it’s not clear to me from these docs that jsonb_ops also indexes hashes. Does it? Best, D signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On 16/09/2023 22:26 CEST David E. Wheeler wrote: > I’ve started work on this; there’s so much to learn! Here’s a new example > that surprised me a bit. Using the GPS tracker example from the docs [1] > loaded into a `:json` psql variable, this output of this query makes perfect > sense to me: > > david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < > 14)'); > jsonb_path_query > -- > 13.4034 > 13.2635 > > Because `[*]` selects all the values. This, however, I did not expect: > > david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < > 14)'); > jsonb_path_query > -- > 13.4034 > 13.2635 > (2 rows) > > I had expected it to return two single-value arrays, instead: > > [13.4034] > [13.2635] > > It appears that the filter expression is doing some sub-selection, too. > Is that expected? Looks like the effect of lax mode which may unwrap arrays when necessary [1]. The array unwrapping looks like the result of jsonb_array_elements(). It kinda works in strict mode: SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)'); jsonb_path_query --- [47.763, 13.4034] [47.706, 13.2635] (2 rows) But it does not remove elements from the matching arrays. Which I don't even expect here because the path specifies the location array as the object to be returned. The filter expression then only decides whether to return the location array or not. Nowhere in the docs does it say that the filter expression itself removes any elements from a matched array. Here's a query that filter's out individual array elements. It's quite a mouthful (especially to preserve the order of array elements): WITH location AS ( SELECT loc, row_number() OVER () AS array_num FROM jsonb_path_query(:'json', 'strict $.track.segments[*].location') loc ), element AS ( SELECT array_num, e.num AS elem_num, e.elem FROM location CROSS JOIN jsonb_array_elements(loc) WITH ORDINALITY AS e (elem, num) ) SELECT jsonb_agg(elem ORDER BY elem_num) FROM element WHERE jsonb_path_exists(elem, '$ ? (@ < 14)') GROUP BY array_num; jsonb_agg --- [13.2635] [13.4034] (2 rows) [1] https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES -- Erik
Re: JSON Path and GIN Questions
On Sep 16, 2023, at 18:13, Erik Wienhold wrote: > Looks like the effect of lax mode which may unwrap arrays when necessary [1]. > The array unwrapping looks like the result of jsonb_array_elements(). > > It kinda works in strict mode: > > SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] > < 14)'); > >jsonb_path_query > --- > [47.763, 13.4034] > [47.706, 13.2635] > (2 rows) > > But it does not remove elements from the matching arrays. Which I don't even > expect here because the path specifies the location array as the object to be > returned. The filter expression then only decides whether to return the > location array or not. Nowhere in the docs does it say that the filter > expression itself removes any elements from a matched array. Yes, this is what I expected. It means “select the location array if any of its contents is less that 14.” I don’t understand why it’s different in lax mode, though, as `@[*]` is not a structural error; it confirms to the schema, as the docs say. The flattening in this case seems weird. Ah, here’s why:, from the docs: "Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box.” There follow some discussion of the need to specify `[*]` on segments in strict mode, but since that’s exactly what my example does (and the same for the locations array inside the filter), it doesn’t seem right to me that it would be unwrapped here. > Here's a query that filter's out individual array elements. It's quite a > mouthful (especially to preserve the order of array elements): Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of thing! Best, David signature.asc Description: Message signed with OpenPGP
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Sat, Sep 16, 2023 at 9:03 AM Andres Freund wrote: > > Hi, > > On 2023-08-28 23:43:22 +0900, Masahiko Sawada wrote: > > I've attached v42 patch set. I improved tidstore regression test codes > > in addition of imcorporating the above comments. > > Why did you need to disable the benchmark module for CI? I didn't want to unnecessarily make cfbot unhappy since the benchmark module is not going to get committed to the core and sometimes not up-to-date. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: remaining sql/json patches
Op 9/14/23 om 10:14 schreef Amit Langote: Hi Amit, Just now I built a v14-patched server and I found this crash: select json_query(jsonb ' { "arr": [ {"arr": [2,3]} , {"arr": [4,5]} ] }' , '$.arr[*].arr ? (@ <= 3)' returning anyarray WITH WRAPPER) --crash ; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost Can you have a look? Thanks, Erik