On 3/17/19 12:55 PM, Alexander Korotkov wrote: > >> However, when I did something a little more complex, like the below: >> >> SELECT count(*) >> FROM news_feed >> WHERE data @? '$.length ? (@ < 150)'; >> >> SELECT count(*) >> FROM news_feed >> WHERE data @? '$.content ? (@ like_regex "^Start")'; >> >> SELECT id, jsonb_path_query(data, '$.content') >> FROM news_feed >> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")'; >> >> I would find that the index scan performed as well as the sequential >> scan. Additionally, on my laptop, the parallel sequential scan would >> beat the index scan by ~2.5x in some cases. > > Yeah, this cases are not supported. Did optimizer automatically > select sequential scan in this case (if not touching enable_* > variables)? It should, because optimizer understands that GIN scan > will be bad if extract_query method failed to extract anything.
It did not - it was doing a bitmap heap scan. I have default costs setup. Example output from EXPLAIN ANALYZE with the index available: Aggregate (cost=1539.78..1539.79 rows=1 width=8) (actual time=270.419..270.419 rows=1 loops=1) -> Bitmap Heap Scan on news_feed (cost=23.24..1538.73 rows=418 width=0) (actual time=84.040..270.407 rows=5 loops=1) Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath) Rows Removed by Index Recheck: 418360 Heap Blocks: exact=28690 -> Bitmap Index Scan on news_feed_data_gin_idx (cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788 rows=418365 loops=1) Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath) Planning Time: 0.168 ms Execution Time: 271.105 ms And for arguments sake, after I dropped the index (and max_parallel_workers = 8): Finalize Aggregate (cost=30998.07..30998.08 rows=1 width=8) (actual time=91.062..91.062 rows=1 loops=1) -> Gather (cost=30997.65..30998.06 rows=4 width=8) (actual time=90.892..97.739 rows=5 loops=1) Workers Planned: 4 Workers Launched: 4 -> Partial Aggregate (cost=29997.65..29997.66 rows=1 width=8) (actual time=76.977..76.977 rows=1 loops=5) -> Parallel Seq Scan on news_feed (cost=0.00..29997.39 rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5) Filter: (data @? '$."length"?(@ < 150)'::jsonpath) Rows Removed by Filter: 83672 Planning Time: 0.127 ms Execution Time: 97.801 ms >> Reading up on what the GIN patch does, this all makes sense: it's >> optimized for equality, I understand there are challenges to be able to >> handle inequality, regex exps, etc. And the cases where it really does >> work well, it's _incredibly_ fast. > > Yes, for more complex cases, we need different opclasses. For > instance, we can consider porting jsquery opclasses to PG 13. And it > become even more important to get parametrized opclasses, because we > don't necessary want to index all the json fields in this same way. > That's another challenge for future releases. But what we have now is > just support for some of jsonpathes for existing opclasses. Yeah, that makes sense, and seems to be my recollection from the several years of presentations I've seen on the topic ;) >> My suggestion would be adding some additional guidance in the user >> documentation around how GIN works with the @@ and @? operators so they >> can understand where GIN will work very well with JSON path + their data >> and not be surprised when other types of JSON path queries are >> performing on par with a sequential scan (or worse than a parallel seq >> scan). > > Good point. Will do. Thanks! Jonathan
signature.asc
Description: OpenPGP digital signature