Hi Jeff, the way I see it, is it a poor man's implementation of 'full-text' search. I just discussed it with out navdata team and we might be redefine how do we do the search. Regardless of that, I think that issue with Postgres stands.
I tried now to see, how the query would behave if we always had left-anchored pattern and that would allow us to stick to btree indexes. * Query ------------------------------------ EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT r.* FROM navdata.route r INNER JOIN navdata.point op ON r.frompointguid = op.guid INNER JOIN navdata.point dp ON r.topointguid = dp.guid WHERE r.routeidentifier LIKE '%' AND tsrange(r.startvalid, r.endvalid) @> now() :: TIMESTAMP AND (NOT false :: BOOLEAN OR r.source = ANY (ARRAY[] :: VARCHAR [])) AND op.identifier LIKE 'LOW%' AND op.type = ANY (ARRAY['PA'] :: VARCHAR []) AND tsrange(op.startvalid, op.endvalid) @> now() :: TIMESTAMP AND dp.identifier LIKE '%' :: VARCHAR AND dp.type = ANY (ARRAY['PA'] :: VARCHAR []) AND tsrange(dp.startvalid, dp.endvalid) @> now() :: TIMESTAMP ORDER BY r.routeidentifier LIMIT 1000; * Analysis --------------------------------------- Limit (cost=646.48..646.48 rows=1 width=349) (actual time=1375.359..1376.447 rows=1000 loops=1) Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry Buffers: shared hit=79276 -> Sort (cost=646.48..646.48 rows=1 width=349) (actual time=1375.356..1375.785 rows=1000 loops=1) Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry Sort Key: r.routeidentifier Sort Method: top-N heapsort Memory: 321kB Buffers: shared hit=79276 -> Nested Loop (cost=250.30..646.47 rows=1 width=349) (actual time=202.826..1372.178 rows=2596 loops=1) Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry Buffers: shared hit=79276 -> Nested Loop (cost=249.87..621.96 rows=1 width=349) (actual time=202.781..1301.135 rows=2602 loops=1) Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry Buffers: shared hit=51974 -> Index Scan using idx_point_11 on navdata.point op (cost=0.43..107.02 rows=2 width=16) (actual time=0.055..0.214 rows=7 loops=1) Output: op.uid, op.guid, op.airportguid, op.identifier, op.icaocode, op.name, op.type, op.coordinates, op.fir, op.navaidfrequency, op.elevation, op.magneticvariance, op.startvalid, op.endvalid, op.revisionuid, op.source, op.leveltype Index Cond: (((op.type)::text = ANY ('{PA}'::text[])) AND ((op.identifier)::text ~>=~ 'LOW'::text) AND ((op.identifier)::text ~<~ 'LOX'::text)) Filter: (((op.identifier)::text ~~ 'LOW%'::text) AND (tsrange(op.startvalid, op.endvalid) @> (now())::timestamp without time zone)) Rows Removed by Filter: 42 Buffers: shared hit=52 -> Bitmap Heap Scan on navdata.route r (cost=249.44..257.45 rows=2 width=349) (actual time=183.255..185.491 rows=372 loops=7) Output: r.uid, r.routeidentifier, r.frompointguid, r.topointguid, r.sidguid, r.starguid, r.routeinformation, r.routetype, r.startvalid, r.endvalid, r.revisionuid, r.source, r.fufi, r.grounddistance_excl_sidstar, r.from_first, r.dep_airports, r.dst_airports, r.tag, r.expanded_route_string, r.route_geometry Recheck Cond: ((r.frompointguid = op.guid) AND (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone)) Filter: ((r.routeidentifier)::text ~~ '%'::text) Heap Blocks: exact=2140 Buffers: shared hit=51922 -> BitmapAnd (cost=249.44..249.44 rows=2 width=0) (actual time=183.197..183.197 rows=0 loops=7) Buffers: shared hit=49782 -> Bitmap Index Scan on idx_route_02 (cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884 loops=7) Index Cond: (r.frompointguid = op.guid) Buffers: shared hit=47 -> Bitmap Index Scan on idx_route_07 (cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858 rows=579062 loops=7) Index Cond: (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone) Buffers: shared hit=49735 -> Index Scan using cidx_point on navdata.point dp (cost=0.43..24.50 rows=1 width=16) (actual time=0.008..0.025 rows=1 loops=2602) Output: dp.uid, dp.guid, dp.airportguid, dp.identifier, dp.icaocode, dp.name, dp.type, dp.coordinates, dp.fir, dp.navaidfrequency, dp.elevation, dp.magneticvariance, dp.startvalid, dp.endvalid, dp.revisionuid, dp.source, dp.leveltype Index Cond: (dp.guid = r.topointguid) Filter: (((dp.type)::text = ANY ('{PA}'::text[])) AND ((dp.identifier)::text ~~ '%'::text) AND (tsrange(dp.startvalid, dp.endvalid) @> (now())::timestamp without time zone)) Rows Removed by Filter: 6 Buffers: shared hit=27302 Planning time: 12.202 ms Execution time: 1376.912 ms Why think is weird here is this: -> BitmapAnd (cost=249.44..249.44 rows=2 width=0) (actual time=183.197..183.197 rows=0 loops=7) Buffers: shared hit=49782 -> Bitmap Index Scan on idx_route_02 (cost=0.00..10.96 rows=338 width=0) (actual time=0.162..0.162 rows=884 loops=7) Index Cond: (r.frompointguid = op.guid) Buffers: shared hit=47 -> Bitmap Index Scan on idx_route_07 (cost=0.00..237.01 rows=4896 width=0) (actual time=182.858..182.858 rows=579062 loops=7) Index Cond: (tsrange(r.startvalid, r.endvalid) @> (now())::timestamp without time zone) Buffers: shared hit=49735 Why would postgres choose to use second index idx_route_07 at all when row estimate is way higher then on idx_route_02? Wouldn't it be better just to use one index with lower number of estimated rows and then filter? Thanks Sasa On 20 June 2018 at 16:53, Jeff Janes <jeff.ja...@gmail.com> wrote: > On Wed, Jun 20, 2018 at 9:21 AM, Sasa Vilic <sasavi...@gmail.com> wrote: > > >> Query that we have finds all routes between two set of points. A set is a >> dynamically/loosely defined by pattern given by the user input. So for >> example >> if user wants to find all routes between international airports in >> Austria >> toward London Heathrow, he or she would use 'LOW%' as >> :from_point_identifier >> and 'EGLL' as :to_point_identifier. Please keep in mind that is a simple >> case, >> and that user is allowed to define search term any way he/she see it fit, >> i.e. '%OW%', 'EG%'. >> > > > Letting users do substring searches on airport codes in the middle of a > complex query makes no sense. Do all airports with 'OW' in the middle of > them having something in common with each other? If people can't remember > the real airport code of the airport they are using, you should offer a > look-up tool which they can use to figure that out **before** hitting the > main query. > > But taking for granted your weird use case, the most obvious improvement > to the PostgreSQL code that I can see is in the executor, not the planner. > There is no reason to recompute the bitmap on idx_point_08 each time > through the nested loop, as the outcome of that scan doesn't depend on the > outer tuple. Presumably the reason this happens is that it is being > 'BitmapAnd'ed with another bitmap index scan which does depend on the outer > tuple, and it is just not smart enough to reuse the stable bitmap while > recomputing the parameterized one. > > Cheers, > > Jeff >