Re: Slow query when pg_trgm is in inner lopp

Sasa Vilic Wed, 20 Jun 2018 08:40:17 -0700

Hi Jeff,

the way I see it, is it a poor man's implementation of 'full-text' search.
I just discussed it with out navdata team and we might be redefine how do
we do the search. Regardless of that, I think that issue with Postgres
stands.


I tried now to see, how the query would behave if we always had
left-anchored pattern and that would allow us to stick to btree indexes.

* Query
------------------------------------
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT
  r.*
FROM navdata.route r
  INNER JOIN navdata.point op ON r.frompointguid = op.guid
  INNER JOIN navdata.point dp ON r.topointguid = dp.guid
WHERE
  r.routeidentifier LIKE '%'
  AND tsrange(r.startvalid, r.endvalid) @> now() :: TIMESTAMP
  AND (NOT false :: BOOLEAN OR r.source = ANY (ARRAY[] :: VARCHAR []))
  AND op.identifier LIKE 'LOW%'
  AND op.type = ANY (ARRAY['PA'] :: VARCHAR [])
  AND tsrange(op.startvalid, op.endvalid) @> now() :: TIMESTAMP
  AND dp.identifier LIKE '%' :: VARCHAR
  AND dp.type = ANY (ARRAY['PA'] :: VARCHAR [])
  AND tsrange(dp.startvalid, dp.endvalid) @> now() :: TIMESTAMP
ORDER BY r.routeidentifier
LIMIT 1000;

* Analysis
---------------------------------------
Limit  (cost=646.48..646.48 rows=1 width=349) (actual
time=1375.359..1376.447 rows=1000 loops=1)
  Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid,
r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid,
r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar,
r.from_first, r.dep_airports, r.dst_airports, r.tag,
r.expanded_route_string, r.route_geometry
  Buffers: shared hit=79276
  ->  Sort  (cost=646.48..646.48 rows=1 width=349) (actual
time=1375.356..1375.785 rows=1000 loops=1)
        Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid,
r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid,
r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar,
r.from_first, r.dep_airports, r.dst_airports, r.tag,
r.expanded_route_string, r.route_geometry
        Sort Key: r.routeidentifier
        Sort Method: top-N heapsort  Memory: 321kB
        Buffers: shared hit=79276
        ->  Nested Loop  (cost=250.30..646.47 rows=1 width=349) (actual
time=202.826..1372.178 rows=2596 loops=1)
              Output: r.uid, r.routeidentifier, r.frompointguid,
r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype,
r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi,
r.grounddistance_excl_sidstar, r.from_first, r.dep_airports,
r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
              Buffers: shared hit=79276
              ->  Nested Loop  (cost=249.87..621.96 rows=1 width=349)
(actual time=202.781..1301.135 rows=2602 loops=1)
                    Output: r.uid, r.routeidentifier, r.frompointguid,
r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype,
r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi,
r.grounddistance_excl_sidstar, r.from_first, r.dep_airports,
r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
                    Buffers: shared hit=51974
                    ->  Index Scan using idx_point_11 on navdata.point op
(cost=0.43..107.02 rows=2 width=16) (actual time=0.055..0.214 rows=7
loops=1)
                          Output: op.uid, op.guid, op.airportguid,
op.identifier, op.icaocode, op.name, op.type, op.coordinates, op.fir,
op.navaidfrequency, op.elevation, op.magneticvariance, op.startvalid,
op.endvalid, op.revisionuid, op.source, op.leveltype
                          Index Cond: (((op.type)::text = ANY
('{PA}'::text[])) AND ((op.identifier)::text ~>=~ 'LOW'::text) AND
((op.identifier)::text ~<~ 'LOX'::text))
                          Filter: (((op.identifier)::text ~~ 'LOW%'::text)
AND (tsrange(op.startvalid, op.endvalid) @> (now())::timestamp without time
zone))
                          Rows Removed by Filter: 42
                          Buffers: shared hit=52
                    ->  Bitmap Heap Scan on navdata.route r
(cost=249.44..257.45 rows=2 width=349) (actual time=183.255..185.491
rows=372 loops=7)
                          Output: r.uid, r.routeidentifier,
r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation,
r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi,
r.grounddistance_excl_sidstar, r.from_first, r.dep_airports,
r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry
                          Recheck Cond: ((r.frompointguid = op.guid) AND
(tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone))
                          Filter: ((r.routeidentifier)::text ~~ '%'::text)
                          Heap Blocks: exact=2140
                          Buffers: shared hit=51922
                          ->  BitmapAnd  (cost=249.44..249.44 rows=2
width=0) (actual time=183.197..183.197 rows=0 loops=7)
                                Buffers: shared hit=49782
                                ->  Bitmap Index Scan on idx_route_02
(cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884
loops=7)
                                      Index Cond: (r.frompointguid =
op.guid)
                                      Buffers: shared hit=47
                                ->  Bitmap Index Scan on idx_route_07
(cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858
rows=579062 loops=7)
                                      Index Cond: (tsrange(r.startvalid,
r.endvalid) @> (now())::timestamp without time zone)
                                      Buffers: shared hit=49735
              ->  Index Scan using cidx_point on navdata.point dp
(cost=0.43..24.50 rows=1 width=16) (actual time=0.008..0.025 rows=1
loops=2602)
                    Output: dp.uid, dp.guid, dp.airportguid, dp.identifier,
dp.icaocode, dp.name, dp.type, dp.coordinates, dp.fir, dp.navaidfrequency,
dp.elevation, dp.magneticvariance, dp.startvalid, dp.endvalid,
dp.revisionuid, dp.source, dp.leveltype
                    Index Cond: (dp.guid = r.topointguid)
                    Filter: (((dp.type)::text = ANY ('{PA}'::text[])) AND
((dp.identifier)::text ~~ '%'::text) AND (tsrange(dp.startvalid,
dp.endvalid) @> (now())::timestamp without time zone))
                    Rows Removed by Filter: 6
                    Buffers: shared hit=27302
Planning time: 12.202 ms
Execution time: 1376.912 ms

Why think is weird here is this:

                        ->  BitmapAnd  (cost=249.44..249.44 rows=2 width=0)
(actual time=183.197..183.197 rows=0 loops=7)
                                Buffers: shared hit=49782
                                ->  Bitmap Index Scan on idx_route_02
(cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884
loops=7)
                                      Index Cond: (r.frompointguid =
op.guid)
                                      Buffers: shared hit=47
                                ->  Bitmap Index Scan on idx_route_07
(cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858
rows=579062 loops=7)
                                      Index Cond: (tsrange(r.startvalid,
r.endvalid) @> (now())::timestamp without time zone)
                                      Buffers: shared hit=49735

Why would postgres choose to use second index idx_route_07 at all when row
estimate is way higher then on idx_route_02? Wouldn't it be better just to
use one index with lower number of estimated rows and then filter?

Thanks
Sasa

On 20 June 2018 at 16:53, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Wed, Jun 20, 2018 at 9:21 AM, Sasa Vilic <sasavi...@gmail.com> wrote:
>
>
>> Query that we have finds all routes between two set of points. A set is a
>> dynamically/loosely defined by pattern given by the user input. So for
>> example
>> if user wants to find all routes between international airports in
>> Austria
>> toward London Heathrow, he or she would use 'LOW%' as
>> :from_point_identifier
>> and 'EGLL' as :to_point_identifier. Please keep in mind that is a simple
>> case,
>> and that user is allowed to define search term any way he/she see it fit,
>> i.e. '%OW%', 'EG%'.
>>
>
>
> Letting users do substring searches on airport codes in the middle of a
> complex query makes no sense.  Do all airports with 'OW' in the middle of
> them having something in common with each other?  If people can't remember
> the real airport code of the airport they are using, you should offer a
> look-up tool which they can use to figure that out **before** hitting the
> main query.
>
> But taking for granted your weird use case, the most obvious improvement
> to the PostgreSQL code that I can see is in the executor, not the planner.
> There is no reason to recompute the bitmap on idx_point_08 each time
> through the nested loop, as the outcome of that scan doesn't depend on the
> outer tuple.  Presumably the reason this happens is that it is being
> 'BitmapAnd'ed with another bitmap index scan which does depend on the outer
> tuple, and it is just not smart enough to reuse the stable bitmap while
> recomputing the parameterized one.
>
> Cheers,
>
> Jeff
>

Re: Slow query when pg_trgm is in inner lopp

Reply via email to