Yes, you did. You want a query that spits out a set of geometries (one for each wee segment), and then you can join that set to your main table using ST_DWithin() as the join clause. So start by ditching the main table and just work on a query that generates a pile of wee segments.
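A minimal sketch of such a generator, reusing the route and the 600 m buffer width from the thread below (untested; the pts/pairs/seg names are purely illustrative):

WITH pts AS (
    SELECT (dp).path[1] AS n, (dp).geom AS pt
    FROM (
        SELECT ST_DumpPoints(
                   ST_Segmentize(
                       ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
                       600  -- max segment length in meters, close to the buffer width
                   )::geometry
               ) AS dp
    ) q
),
pairs AS (
    SELECT pt, lead(pt) OVER (ORDER BY n) AS next_pt
    FROM pts
)
SELECT ST_MakeLine(pt, next_pt)::geography AS seg
FROM pairs
WHERE next_pt IS NOT NULL;  -- the last vertex has no successor

Each output row is a two-point line roughly 600 m long, so each segment's bounding box stays close to the search radius - the index-effectiveness sweet spot described in the quoted reply below.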
On Thu, Jan 5, 2017 at 11:36 AM, Israel Brewster <isr...@ravnalaska.net> wrote:

> On Jan 5, 2017, at 8:50 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>
> The index filters using bounding boxes. A long, diagonal route will have
> a large bounding box relative to the area you actually care about (a
> narrow strip along the route). Use ST_Segmentize() to add points to your
> route, ST_DumpPoints() to dump those out as points, and ST_MakeLine() to
> generate new lines from those points, each line very short. The maximum
> index effectiveness will come when your line length is close to your
> buffer width.
>
> P
>
> Ok, I think I understand the concept. So, attempting to follow your
> advice, I modified the query to be:
>
> SELECT elevation
> FROM data
> WHERE
>     ST_DWithin(
>         location,
>         (SELECT ST_MakeLine(geom)::geography AS split_line
>          FROM (SELECT
>              (ST_DumpPoints(
>                  ST_Segmentize(
>                      ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>                      600
>                  )::geometry
>              )).geom
>          ) s1),
>         600
>     )
> ORDER BY elevation DESC LIMIT 1;
>
> It took some fiddling to find a syntax that PostgreSQL would accept, but
> eventually that's what I came up with. Unfortunately, far from improving
> performance, it killed it - the query went from 22 seconds to several
> minutes (EXPLAIN ANALYZE has yet to return a result). Looking at the
> query execution plan shows, at least partially, why:
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=17119748.98..17119748.98 rows=1 width=4)
>    InitPlan 1 (returns $0)
>      ->  Aggregate  (cost=17.76..17.77 rows=1 width=32)
>            ->  Result  (cost=0.00..5.25 rows=1000 width=32)
>    ->  Sort  (cost=17119731.21..17171983.43 rows=20900890 width=4)
>          Sort Key: data.elevation DESC
>          ->  Seq Scan on data  (cost=0.00..17015226.76 rows=20900890 width=4)
>                Filter: st_dwithin(location, $0, '600'::double precision)
> (8 rows)
>
> So apparently it is now doing a sequential scan on data rather than using
> the index. And, of course, sorting 20 million rows is not trivial either.
> Did I do something wrong with forming the query?
>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
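For reference, here is one way the corrected query could look, with the wee segments generated in CTEs and then joined to the data table using ST_DWithin() as the join clause, per the reply at the top of the thread (a sketch against the schema from the original post, not a tested drop-in):

WITH pts AS (
    SELECT (dp).path[1] AS n, (dp).geom AS pt
    FROM (
        SELECT ST_DumpPoints(
                   ST_Segmentize(
                       ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
                       600
                   )::geometry
               ) AS dp
    ) q
),
segs AS (
    SELECT ST_MakeLine(pt, next_pt)::geography AS seg
    FROM (SELECT pt, lead(pt) OVER (ORDER BY n) AS next_pt FROM pts) p
    WHERE next_pt IS NOT NULL
)
SELECT d.elevation
FROM segs s
JOIN data d ON ST_DWithin(d.location, s.seg, 600)  -- one small index probe per segment
ORDER BY d.elevation DESC
LIMIT 1;

The crucial difference from the query above is that the segments remain a set of short lines. The ST_MakeLine(geom) aggregate in the subquery glues all the dumped points back into a single long line whose bounding box is just as large as the original route's, and, as the st_dwithin(location, $0, ...) line in the plan shows, the condition then runs as a per-row filter over the whole table instead of an index probe. Note also that a point within 600 m of two adjacent segments matches twice; that is harmless for a LIMIT 1 maximum but would matter for counts or sums.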
>
> On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <isr...@ravnalaska.net>
> wrote:
>
>> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of
>> latitude (numeric), longitude (numeric), elevation (integer) data, along
>> with a PostGIS (2.3.0) geometry column (location), running on a CentOS
>> 6.8 box with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the
>> maximum elevation along a path, for which purpose I've come up with the
>> following query (for one particular path example):
>>
>> SELECT elevation
>> FROM data
>> WHERE ST_DWithin(
>>           location,
>>           ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>>           600)
>> ORDER BY elevation DESC LIMIT 1;
>>
>> The EXPLAIN ANALYZE output of this particular query
>> (https://explain.depesz.com/s/heZ) shows:
>>
>>                                  QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 rows=1 loops=1)
>>    ->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.837..22653.837 rows=1 loops=1)
>>          Sort Key: elevation DESC
>>          Sort Method: top-N heapsort  Memory: 25kB
>>          ->  Index Scan using location_gix on data  (cost=0.42..4.82 rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
>>                Index Cond: (location && '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography)
>>                Filter: (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography, '600'::double precision, true))
>>                Rows Removed by Filter: 4934534
>>  Planning time: 0.741 ms
>>  Execution time: 22653.906 ms
>> (10 rows)
>>
>> So it is using the index properly, but it still takes a good 22 seconds
>> to run, most of which appears to be in the Index Scan.
>>
>> Is there any way to improve this, or is this going to be about as good
>> as it gets with the number of rows being dealt with? I was planning to
>> use this for a real-time display - punch in a couple of points, get some
>> information about the route between them, including maximum elevation -
>> but with it taking 22 seconds for the longer routes at least, that
>> doesn't make for the best user experience.
>>
>> It's perhaps worth noting that the example above is most likely a
>> worst-case scenario. I would expect the vast majority of routes to be
>> significantly shorter, and I want to say the shorter routes query much
>> faster [testing needed]. That said, the faster the better, even for
>> short routes :-)
>>
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
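As a rough illustration of the bounding-box effect behind the "Rows Removed by Filter: 4934534" line above, one can compare the area of the route's bounding box against the area of the 600 m strip that actually matters (a sketch; the geodesic area of the degree-aligned box is only approximate):

WITH r AS (
    SELECT ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)') AS route
)
SELECT ST_Area(ST_Envelope(route::geometry)::geography) AS bbox_area_m2,  -- area the index has to search
       ST_Length(route) * 2 * 600 AS strip_area_m2                        -- 600 m on either side of the route
FROM r;

The ratio between the two numbers roughly tracks the fraction of index hits the filter then has to discard, which is why chopping the route into buffer-width segments pays off.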