Yes, you did. You want a query that spits out a set of geometries (one for each wee segment), and then you can join that set to your main table using ST_DWithin() as the join clause. So start by ditching the main table and just work on a query that generates a pile of wee segments.
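A minimal sketch of such a generator, reusing the route and the 600 m buffer width from the thread below (untested; the pts/pairs/seg names are purely illustrative):

WITH pts AS (
    SELECT (dp).path[1] AS n, (dp).geom AS pt
    FROM (
        SELECT ST_DumpPoints(
                   ST_Segmentize(
                       ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
                       600  -- max segment length in meters, close to the buffer width
                   )::geometry
               ) AS dp
    ) q
),
pairs AS (
    SELECT pt, lead(pt) OVER (ORDER BY n) AS next_pt
    FROM pts
)
SELECT ST_MakeLine(pt, next_pt)::geography AS seg
FROM pairs
WHERE next_pt IS NOT NULL;  -- the last vertex has no successor

Each output row is a two-point line roughly 600 m long, so each segment's bounding box stays close to the search radius - the index-effectiveness sweet spot described in the quoted reply below.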
On Thu, Jan 5, 2017 at 11:36 AM, Israel Brewster <isr...@ravnalaska.net> wrote:

> On Jan 5, 2017, at 8:50 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>
> The index filters using bounding boxes. A long, diagonal route will have
> a large bounding box relative to the area you actually care about (a
> narrow strip along the route). Use ST_Segmentize() to add points to your
> route, ST_DumpPoints() to dump those out as points, and ST_MakeLine() to
> generate new lines from those points, each line very short. The maximum
> index effectiveness will come when your line length is close to your
> buffer width.
>
> P
>
> Ok, I think I understand the concept. So, attempting to follow your
> advice, I modified the query to be:
>
> SELECT elevation
> FROM data
> WHERE
>     ST_DWithin(
>         location,
>         (SELECT ST_MakeLine(geom)::geography AS split_line
>          FROM (SELECT
>              (ST_DumpPoints(
>                  ST_Segmentize(
>                      ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>                      600
>                  )::geometry
>              )).geom
>          ) s1),
>         600
>     )
> ORDER BY elevation DESC LIMIT 1;
>
> It took some fiddling to find a syntax that PostgreSQL would accept, but
> eventually that's what I came up with. Unfortunately, far from improving
> performance, it killed it - the query went from 22 seconds to several
> minutes (EXPLAIN ANALYZE has yet to return a result). Looking at the
> query execution plan shows, at least partially, why:
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=17119748.98..17119748.98 rows=1 width=4)
>    InitPlan 1 (returns $0)
>      ->  Aggregate  (cost=17.76..17.77 rows=1 width=32)
>            ->  Result  (cost=0.00..5.25 rows=1000 width=32)
>    ->  Sort  (cost=17119731.21..17171983.43 rows=20900890 width=4)
>          Sort Key: data.elevation DESC
>          ->  Seq Scan on data  (cost=0.00..17015226.76 rows=20900890 width=4)
>                Filter: st_dwithin(location, $0, '600'::double precision)
> (8 rows)
>
> So apparently it is now doing a sequential scan on data rather than using
> the index. And, of course, sorting 20 million rows is not trivial either.
> Did I do something wrong with forming the query?
>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
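For reference, here is one way the corrected query could look, with the wee segments generated in CTEs and then joined to the data table using ST_DWithin() as the join clause, per the reply at the top of the thread (a sketch against the schema from the original post, not a tested drop-in):

WITH pts AS (
    SELECT (dp).path[1] AS n, (dp).geom AS pt
    FROM (
        SELECT ST_DumpPoints(
                   ST_Segmentize(
                       ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
                       600
                   )::geometry
               ) AS dp
    ) q
),
segs AS (
    SELECT ST_MakeLine(pt, next_pt)::geography AS seg
    FROM (SELECT pt, lead(pt) OVER (ORDER BY n) AS next_pt FROM pts) p
    WHERE next_pt IS NOT NULL
)
SELECT d.elevation
FROM segs s
JOIN data d ON ST_DWithin(d.location, s.seg, 600)  -- one small index probe per segment
ORDER BY d.elevation DESC
LIMIT 1;

The crucial difference from the query above is that the segments remain a set of short lines. The ST_MakeLine(geom) aggregate in the subquery glues all the dumped points back into a single long line whose bounding box is just as large as the original route's, and, as the st_dwithin(location, $0, ...) line in the plan shows, the condition then runs as a per-row filter over the whole table instead of an index probe. Note also that a point within 600 m of two adjacent segments matches twice; that is harmless for a LIMIT 1 maximum but would matter for counts or sums.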
>
> On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <isr...@ravnalaska.net>
> wrote:
>
>> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of
>> latitude (numeric), longitude (numeric), elevation (integer) data, along
>> with a PostGIS (2.3.0) geometry column (location), running on a CentOS
>> 6.8 box with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the
>> maximum elevation along a path, for which purpose I've come up with the
>> following query (for one particular path example):
>>
>> SELECT elevation
>> FROM data
>> WHERE ST_DWithin(
>>           location,
>>           ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>>           600)
>> ORDER BY elevation DESC LIMIT 1;
>>
>> The EXPLAIN ANALYZE output of this particular query
>> (https://explain.depesz.com/s/heZ) shows:
>>
>>                                  QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 rows=1 loops=1)
>>    ->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.837..22653.837 rows=1 loops=1)
>>          Sort Key: elevation DESC
>>          Sort Method: top-N heapsort  Memory: 25kB
>>          ->  Index Scan using location_gix on data  (cost=0.42..4.82 rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
>>                Index Cond: (location && '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography)
>>                Filter: (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography, '600'::double precision, true))
>>                Rows Removed by Filter: 4934534
>>  Planning time: 0.741 ms
>>  Execution time: 22653.906 ms
>> (10 rows)
>>
>> So it is using the index properly, but it still takes a good 22 seconds
>> to run, most of which appears to be in the Index Scan.
>>
>> Is there any way to improve this, or is this going to be about as good
>> as it gets with the number of rows being dealt with? I was planning to
>> use this for a real-time display - punch in a couple of points, get some
>> information about the route between them, including maximum elevation -
>> but with it taking 22 seconds for the longer routes at least, that
>> doesn't make for the best user experience.
>>
>> It's perhaps worth noting that the example above is most likely a
>> worst-case scenario. I would expect the vast majority of routes to be
>> significantly shorter, and I want to say the shorter routes query much
>> faster [testing needed]. That said, the faster the better, even for
>> short routes :-)
>>
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
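As a rough illustration of the bounding-box effect behind the "Rows Removed by Filter: 4934534" line above, one can compare the area of the route's bounding box against the area of the 600 m strip that actually matters (a sketch; the geodesic area of the degree-aligned box is only approximate):

WITH r AS (
    SELECT ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)') AS route
)
SELECT ST_Area(ST_Envelope(route::geometry)::geography) AS bbox_area_m2,  -- area the index has to search
       ST_Length(route) * 2 * 600 AS strip_area_m2                        -- 600 m on either side of the route
FROM r;

The ratio between the two numbers roughly tracks the fraction of index hits the filter then has to discard, which is why chopping the route into buffer-width segments pays off.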