Re: Add 'worker_type' to pg_stat_subscription

2023-09-16 Thread Amit Kapila
On Sat, Sep 16, 2023 at 1:06 AM Nathan Bossart  wrote:
>
> On Thu, Sep 14, 2023 at 03:04:19PM -0700, Nathan Bossart wrote:
> > The only reason I didn't apply this already is because IMHO we should
> > adjust the worker types and the documentation for the view to be
> > consistent.  For example, the docs say "leader apply worker" but the view
> > just calls them "apply" workers.  The docs say "synchronization worker" but
> > the view calls them "table synchronization" workers.  My first instinct is
> > to call apply workers "leader apply" workers in the view, and to call table
> > synchronization workers "table synchronization workers" in the docs.
>
> Concretely, like this.
>

I think there is a merit in keeping the worker type as 'sync' or
'synchronization' because these would be used in future for syncing
other objects like sequences. One more thing that slightly looks odd
is the 'leader apply' type of worker, won't this be confusing when
there is no parallel apply worker in the system? In this regard,
probably existing documentation could also be improved.

-- 
With Regards,
Amit Kapila.




Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 15, 2023, at 20:36, Tom Lane  wrote:

> I think that that indicates that you're putting the info in the
> wrong place.  Perhaps the right answer is to insert something
> more explicit in section 11.2, which is the first place where
> we really spend any effort discussing what can be indexed.

Fair enough. How ’bout this?

--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id);
B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension bloom.
Each index type uses a different
-   algorithm that is best suited to different types of queries.
+   algorithm that is best suited to different types of queries and operators.
By default, the CREATE
INDEX command creates
B-tree indexes, which fit the most common situations.
@@ -132,6 +132,14 @@ CREATE INDEX name ON 
table
 
   

+  
+
+Only operators on indexed columns are considered as index qualifications.
+Functions never qualify for index usage, aside from
+indexes on expressions.
+
+  
+
   
B-Tree




signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 15, 2023, at 23:59, Erik Rijkers  wrote:

> movie @? '$ ?($.year >= 2023)'
> 
> I believe it is indeed not possible to have such a unequality-search use the 
> GIN index.  It is another weakness of JSON that can be unexpected to those 
> not in the fullness of Knowledge of the manual. Yes, this too would be good 
> to explain in the doc where JSON indexes are explained.

Is that a limitation of GIN indexes in general? Or could there be opclass 
improvements in the future that would enable such comparisons?

Thanks,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:

>> If so, I’d like to submit a patch to the docs talking about this, and
>> suggesting the use of jsonb_path_query() to test paths to see if they return
>> a boolean or not.
> 
> +1

I’ve started work on this; there’s so much to learn! Here’s a new example that 
surprised me a bit. Using the GPS tracker example from the docs [1] loaded into 
a `:json` psql variable, this output of this query makes perfect sense to me:

david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 
14)');
 jsonb_path_query
--
 13.4034
 13.2635

Because `[*]` selects all the values. This, however, I did not expect:

david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 
14)');
 jsonb_path_query
--
 13.4034
 13.2635
(2 rows)

I had expected it to return two single-value arrays, instead:

 [13.4034]
 [13.2635]

It appears that the filter expression is doing some sub-selection, too. Is that 
expected?

Best,

David

  [1]: 
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH


signature.asc
Description: Message signed with OpenPGP


Re: Add 'worker_type' to pg_stat_subscription

2023-09-16 Thread Nathan Bossart
On Sat, Sep 16, 2023 at 06:09:48PM +0530, Amit Kapila wrote:
> I think there is a merit in keeping the worker type as 'sync' or
> 'synchronization' because these would be used in future for syncing
> other objects like sequences. One more thing that slightly looks odd
> is the 'leader apply' type of worker, won't this be confusing when
> there is no parallel apply worker in the system? In this regard,
> probably existing documentation could also be improved.

These are good points.  I went ahead and adjusted the patch back to using
"apply" for [leader] apply workers and to using "synchronization" for
synchronization workers.  I also adjusted a couple of the error messages
that Michael pointed out to say "synchronization worker" instead of "table
synchronization worker" or "tablesync worker".

This still leaves the possibility for confusion with the documentation's
use of "leader apply worker", but I haven't touched that for now.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 42c7b1910af29a8543ed65b2150e5eedae34a594 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 16 Sep 2023 13:21:54 -0700
Subject: [PATCH v8 1/1] Add worker type to pg_stat_subscription.

Thanks to 2a8b40e368, the logical replication worker type is easily
determined, and this information is a nice addition to the
pg_stat_subscription view.  The worker type could already be
deduced via other columns such as leader_pid and relid, but that is
unnecessary complexity for users.

Bumps catversion.

Author: Peter Smith
Reviewed-by: Michael Paquier, Maxim Orlov, Amit Kapila
Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml| 11 +++
 src/backend/catalog/system_views.sql|  1 +
 src/backend/replication/logical/launcher.c  | 18 +-
 src/backend/replication/logical/tablesync.c |  2 +-
 src/backend/replication/logical/worker.c|  6 +++---
 src/include/catalog/catversion.h|  2 +-
 src/include/catalog/pg_proc.dat |  6 +++---
 src/test/regress/expected/rules.out |  3 ++-
 src/test/subscription/t/004_sync.pl |  2 +-
 9 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4ff415d6a0..3546e8b3d9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1993,6 +1993,17 @@ description | Waiting for a newly initialized WAL file to reach durable storage
   
  
 
+ 
+  
+   worker_type text
+  
+  
+   Type of the subscription worker process.  Possible types are
+   apply, parallel apply, and
+   synchronization.
+  
+ 
+
  
   
pid integer
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 77b06e2a7a..fcb14976c0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS
 SELECT
 su.oid AS subid,
 su.subname,
+st.worker_type,
 st.pid,
 st.leader_pid,
 st.relid,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7882fc91ce..49e1c934a7 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1278,7 +1278,7 @@ GetLeaderApplyWorkerPid(pid_t pid)
 Datum
 pg_stat_get_subscription(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_SUBSCRIPTION_COLS	9
+#define PG_STAT_GET_SUBSCRIPTION_COLS	10
 	Oid			subid = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
 	int			i;
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
@@ -1339,6 +1339,22 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS)
 		else
 			values[8] = TimestampTzGetDatum(worker.reply_time);
 
+		switch (worker.type)
+		{
+			case WORKERTYPE_APPLY:
+values[9] = CStringGetTextDatum("apply");
+break;
+			case WORKERTYPE_PARALLEL_APPLY:
+values[9] = CStringGetTextDatum("parallel apply");
+break;
+			case WORKERTYPE_TABLESYNC:
+values[9] = CStringGetTextDatum("synchronization");
+break;
+			case WORKERTYPE_UNKNOWN:
+/* Should never happen. */
+elog(ERROR, "unknown worker type");
+		}
+
 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
 			 values, nulls);
 
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index e2cee92cf2..cd71807088 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -151,7 +151,7 @@ finish_sync_worker(void)
 
 	StartTransactionCommand();
 	ereport(LOG,
-			(errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished",
+			(errmsg("logical replication synchronization worker for subscription \"%s\", table \"%s\" has finished"

Re: JSON Path and GIN Questions

2023-09-16 Thread Erik Wienhold
On 16/09/2023 22:19 CEST David E. Wheeler  wrote:

> On Sep 15, 2023, at 23:59, Erik Rijkers  wrote:
>
> > movie @? '$ ?($.year >= 2023)'
> >
> > I believe it is indeed not possible to have such a unequality-search use
> > the GIN index.  It is another weakness of JSON that can be unexpected to
> > those not in the fullness of Knowledge of the manual. Yes, this too would
> > be good to explain in the doc where JSON indexes are explained.
>
> Is that a limitation of GIN indexes in general? Or could there be opclass
> improvements in the future that would enable such comparisons?

This detail is mentioned in docs [1]:

"For these operators, a GIN index extracts clauses of the form
 **accessors_chain = constant** out of the jsonpath pattern, and does the
 index search based on the keys and values mentioned in these clauses."

I don't know if this is a general limitation of GIN indexes or just how these
operators are implemented right now.

[1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING

--
Erik




Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 16, 2023, at 16:50, Erik Wienhold  wrote:

> "For these operators, a GIN index extracts clauses of the form
> **accessors_chain = constant** out of the jsonpath pattern, and does the
> index search based on the keys and values mentioned in these clauses."
> 
> I don't know if this is a general limitation of GIN indexes or just how these
> operators are implemented right now.
> 
> [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING


The detail that jumps out at me is this one on jsonb_path_ops:

“Basically, each jsonb_path_ops index item is a hash of the value and the 
key(s) leading to it”

Because jsonb_path_ops indexes hashes, I would assume it would only support 
path equality. But it’s not clear to me from these docs that jsonb_ops also 
indexes hashes. Does it?

Best,

D



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread Erik Wienhold
On 16/09/2023 22:26 CEST David E. Wheeler  wrote:

> I’ve started work on this; there’s so much to learn! Here’s a new example
> that surprised me a bit. Using the GPS tracker example from the docs [1]
> loaded into a `:json` psql variable, this output of this query makes perfect
> sense to me:
>
> david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 
> 14)');
>  jsonb_path_query
> --
>  13.4034
>  13.2635
>
> Because `[*]` selects all the values. This, however, I did not expect:
>
> david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 
> 14)');
>  jsonb_path_query
> --
>  13.4034
>  13.2635
> (2 rows)
>
> I had expected it to return two single-value arrays, instead:
>
>  [13.4034]
>  [13.2635]
>
> It appears that the filter expression is doing some sub-selection, too.
> Is that expected?

Looks like the effect of lax mode which may unwrap arrays when necessary [1].
The array unwrapping looks like the result of jsonb_array_elements().

It kinda works in strict mode:

SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? 
(@[*] < 14)');

   jsonb_path_query
---
 [47.763, 13.4034]
 [47.706, 13.2635]
(2 rows)

But it does not remove elements from the matching arrays.  Which I don't even
expect here because the path specifies the location array as the object to be
returned.  The filter expression then only decides whether to return the
location array or not.  Nowhere in the docs does it say that the filter
expression itself removes any elements from a matched array.

Here's a query that filter's out individual array elements.  It's quite a
mouthful (especially to preserve the order of array elements):

WITH location AS (
  SELECT loc, row_number() OVER () AS array_num
  FROM jsonb_path_query(:'json', 'strict $.track.segments[*].location') 
loc
),
element AS (
  SELECT array_num, e.num AS elem_num, e.elem
  FROM location
CROSS JOIN jsonb_array_elements(loc) WITH ORDINALITY AS e (elem, 
num)
)
SELECT jsonb_agg(elem ORDER BY elem_num)
FROM element
WHERE jsonb_path_exists(elem, '$ ? (@ < 14)')
GROUP BY array_num;

   jsonb_agg
---
 [13.2635]
 [13.4034]
(2 rows)

[1] 
https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES

--
Erik




Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 16, 2023, at 18:13, Erik Wienhold  wrote:

> Looks like the effect of lax mode which may unwrap arrays when necessary [1].
> The array unwrapping looks like the result of jsonb_array_elements().
> 
> It kinda works in strict mode:
> 
> SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] 
> < 14)');
> 
>jsonb_path_query
> ---
>  [47.763, 13.4034]
>  [47.706, 13.2635]
> (2 rows)
> 
> But it does not remove elements from the matching arrays.  Which I don't even
> expect here because the path specifies the location array as the object to be
> returned.  The filter expression then only decides whether to return the
> location array or not.  Nowhere in the docs does it say that the filter
> expression itself removes any elements from a matched array.

Yes, this is what I expected. It means “select the location array if any of its 
contents is less that 14.”

I don’t understand why it’s different in lax mode, though, as `@[*]` is not a 
structural error; it confirms to the schema, as the docs say. The flattening in 
this case seems weird.

Ah, here’s why:, from the docs:

"Besides, comparison operators automatically unwrap their operands in the lax 
mode, so you can compare SQL/JSON arrays out-of-the-box.”

There follow some discussion of the need to specify `[*]` on segments in strict 
mode, but since that’s exactly what my example does (and the same for the 
locations array inside the filter), it doesn’t seem right to me that it would 
be unwrapped here.

> Here's a query that filter's out individual array elements.  It's quite a
> mouthful (especially to preserve the order of array elements):

Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of 
thing!

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-09-16 Thread Masahiko Sawada
On Sat, Sep 16, 2023 at 9:03 AM Andres Freund  wrote:
>
> Hi,
>
> On 2023-08-28 23:43:22 +0900, Masahiko Sawada wrote:
> > I've attached v42 patch set. I improved tidstore regression test codes
> > in addition of imcorporating the above comments.
>
> Why did you need to disable the benchmark module for CI?

I didn't want to unnecessarily make cfbot unhappy since the benchmark
module is not going to get committed to the core and sometimes not
up-to-date.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: remaining sql/json patches

2023-09-16 Thread Erik Rijkers

Op 9/14/23 om 10:14 schreef Amit Langote:





Hi Amit,

Just now I built a v14-patched server and I found this crash:

select json_query(jsonb '
{
  "arr": [
{"arr": [2,3]}
  , {"arr": [4,5]}
  ]
}'
  , '$.arr[*].arr ? (@ <= 3)' returning anyarray  WITH WRAPPER) --crash
;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost


Can you have a look?

Thanks,

Erik