Greetings Hackers,

Been a while! I’m working on some experiments with JSONB columns and GIN 
indexes, and have operated on the assumption that JSON Path operations would 
take advantage of GIN indexes, with json_path_ops as a nice optimization. But 
I’ve run into what appear to be some inconsistencies and oddities I’m hoping to 
figure out with your help.

For the examples in this email, I’m using this simple table:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s 
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json 
| jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;

That gives me a simple table with around 3600 rows. Not a lot of data, but 
hopefully enough to demonstrate the issues.

Issue 1: @@ vs @?
-----------------

I have been confused as to the difference between @@ vs @?: Why do these return 
different results?

david=# select id from movies where movie @@ '$ ?(@.title == "New Life 
Rescue")';
 id
----
(0 rows)

david=# select id from movies where movie @? '$ ?(@.title == "New Life 
Rescue")';
 id
----
 10
(1 row)

I posted this question on Stack Overflow 
(https://stackoverflow.com/q/77046554/79202), and from the suggestion I got 
there, it seems that @@ expects a boolean to be returned by the path query, 
while @? wraps it in an implicit exists(). Is that right?

If so, I’d like to submit a patch to the docs talking about this, and 
suggesting the use of jsonb_path_query() to test paths to see if they return a 
boolean or not.


Issue 2: @? Index Use
---------------------

From Oleg’s (happy belated birthday!) notes 
(https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):


> Operators @? and @@ are interchangeable:
> 
>     js @? '$.a' <=> js @@ 'exists($.a)’
>     js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’

For the purposes of the above example, this appears to hold true: if I wrap the 
path query in exists(), @@ returns a result:

david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life 
Rescue"))';
 id
----
 10
(1 row)

Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN 
index while @? does not.

Or, no, fiddling with it again just now, I think I have still been confusing 
these operators! @@ was using the index with an an explicit exists(), but @? 
was not…because I was still using an explicit exists.

In other words:

* @@ 'exists($ ?($.year == 1944))'  Uses the index
* @? '$ ?(@.year == 1944)'          Uses the index
* @? 'exists($ ?($.year == 1944))'  Does not use the index

That last one presumably doesn’t work, because there is an implicit exists() 
around the exists(), making it `exists(exists($ ?($.year == 1944)))`, which 
returns true for every row  (true and false both exists)! 🤦🏻‍♂️.

Anyway, if I have this right, I’d like to flesh out the docs a bit.

Issue 3: Index Use for Comparison
---------------------------------

From the docs 
(https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I 
had assumed any JSON Path query would be able to use the GIN index. However 
while the use of the == JSON Path operator is able to take advantage of the GIN 
index, apparently the >= operator cannot:

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 
2023)';
                                               QUERY PLAN                       
                          
---------------------------------------------------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual 
time=34.815..36.259 rows=192 loops=1)
   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
   Rows Removed by Filter: 36081
 Planning Time: 1.864 ms
 Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with json_path_ops, which I can understand 
not working, since it stores hashes of paths, which would allow only exact 
matches. But a plain old GIN index doesn’t appear to work, either. Should it? 
Is there perhaps some other op class that would allow it to work? Or would I 
have to create a separate BTREE index on `movie -> 'year'`?

Thanks your your patience with my questions!

Best,

David

Attachment: signature.asc
Description: Message signed with OpenPGP

Reply via email to