The index filters using bounding boxes. A long, diagonal route has a large bounding box relative to the area you actually care about (a narrow strip along the route). Use ST_Segmentize() to add points to your route, ST_DumpPoints() to dump those out as points, and ST_MakeLine() to build new lines from those points, each line very short. Maximum index effectiveness comes when your line length is close to your buffer width.
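In case it's useful, here is a rough, untested sketch of that approach against your table (the CTE names and the 600 m segment length are my own choices; adjust to taste):

WITH route AS (
  SELECT ST_GeographyFromText(
           'SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)') AS line
),
pts AS (
  -- ST_Segmentize on geography takes a max segment length in meters;
  -- ST_DumpPoints wants geometry, hence the cast
  SELECT (dp).path[1] AS idx, (dp).geom AS pt
  FROM (SELECT ST_DumpPoints(ST_Segmentize(line, 600)::geometry) AS dp
        FROM route) q
),
segs AS (
  -- pair each vertex with the next one to get many short two-point lines
  SELECT ST_MakeLine(pt, next_pt) AS seg
  FROM (SELECT pt, lead(pt) OVER (ORDER BY idx) AS next_pt FROM pts) p
  WHERE next_pt IS NOT NULL
)
SELECT max(d.elevation) AS max_elevation
FROM segs s
JOIN data d ON ST_DWithin(d.location, s.seg::geography, 600);

Each short segment carries a small bounding box, so each index probe pulls in far fewer candidate rows that then have to be discarded by the exact distance filter.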
On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of
> latitude (numeric), longitude (numeric), elevation (integer) data, along
> with a PostGIS (2.3.0) geometry column (location), running on a CentOS 6.8
> box with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the
> maximum elevation along a path, for which purpose I've come up with the
> following query (for one particular path example):
>
> SELECT elevation FROM data
>     WHERE ST_DWithin(location,
>         ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>         600)
>     ORDER BY elevation DESC LIMIT 1;
>
> The EXPLAIN ANALYZE output of this particular query
> (https://explain.depesz.com/s/heZ) shows:
>
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 rows=1 loops=1)
>    ->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.837..22653.837 rows=1 loops=1)
>          Sort Key: elevation DESC
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Index Scan using location_gix on data  (cost=0.42..4.82 rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
>                Index Cond: (location && '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography)
>                Filter: (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography, '600'::double precision, true))
>                Rows Removed by Filter: 4934534
>  Planning time: 0.741 ms
>  Execution time: 22653.906 ms
> (10 rows)
>
> So it is using the index properly, but still takes a good 22 seconds to
> run, most of which appears to be in the Index Scan.
>
> Is there any way to improve this, or is this going to be about as good as
> it gets with the number of rows being dealt with? I was planning to use
> this for a real-time display - punch in a couple of points, get some
> information about the route between, including maximum elevation - but
> with it taking 22 seconds for the longer routes at least, that doesn't
> make for the best user experience.
>
> It's perhaps worth noting that the example above is most likely a worst
> case scenario. I would expect the vast majority of routes to be
> significantly shorter, and I want to say the shorter routes query much
> faster [testing needed]. That said, the faster the better, even for short
> routes :-)
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------