On Thu, Nov 9, 2017 at 12:11 PM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
> All, > > As we try and make PostGIS more "parallel sensitive" we have been added > costs to our functions, so that their relative CPU cost is more accurately > reflected in parallel plans. > > This has resulted in an odd side effect: some of our "wrapper" functions > stop giving index scans in plans [1]. This is a problem! > > An example of a "wrapper" is ST_Intersects(geom1, geom2). It combines an > index operation (geom1 && geom2) with an exact spatial test > (_ST_Intersects(geom1, geom2). This is primarily for user convenience, and > has worked for us well for a decade and more. Having this construct stop > working is definitely a problem. > > As we add costs to our functions, the odds increase that one of the > parameters to a wrapper might be a costed function. It's not uncommon to > see: > > ST_Interects(geom, ST_SetSRID('POLYGON(...)', 4326)) > > It's fair to say that we really do depend on our wrappers getting inlined > basically all the time. They are simple functions, they do nothing other > than 'SELECT func1() AND func2() AND arg1 && arg2'. > > However, once costs are added to the parameters, the inlining can be > turned off relatively quickly. Here's a PgSQL native example: > > -- Create data table and index. Analyze. > DROP TABLE IF EXISTS boxen; > CREATE TABLE boxen AS > SELECT row_number() OVER() As gid, > box(point(x, y),point(x+1, y+1)) AS b, x, y > FROM generate_series(-100,100) As y, generate_series(-100,100) As > x; > CREATE INDEX idx_b_geom_gist ON boxen USING gist(b); > ANALYZE boxen; > > -- An inlined function > -- When set 'STRICT' it breaks index access > -- However 'IMMUTABLE' doesn't seem to bother it > CREATE OR REPLACE FUNCTION good_box(box, box) > RETURNS boolean > AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1),point($2))) > < 3' > LANGUAGE 'sql'; > > -- Start with a low cost circle() > ALTER FUNCTION circle(point, double precision) COST 1; > > -- [A] Query plan hits index > EXPLAIN SELECT gid > FROM boxen > WHERE good_box( > boxen.b, > box(circle(point(20.5, 20.5), 2)) > ); > > -- [B] Query plan hits index > EXPLAIN SELECT gid > FROM boxen, > (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c > WHERE good_box( > boxen.b, > box(circle(point(c.x, c.y), 2)) > ); > > -- Increase cost of circle > ALTER FUNCTION circle(point, double precision) COST 100; > > -- [B] Query plan does not hit index! > EXPLAIN SELECT gid > FROM boxen, > (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c > WHERE good_box( > boxen.b, > box(circle(point(c.x, c.y), 2)) > ); > > The inlining is getting tossed out on a test of how expensive the function > parameters are [2]. As a result, we lose what is really the correct plan, > and get a sequence scan instead of an index scan. > > The test of parameter cost seems quite old (15+ years) and perhaps didn't > anticipate highly variable individual function costs (or maybe it did). As > it stands though, PostGIS is currently stuck choosing between having costs > on our functions or having our inlined wrappers, because we cannot have > both at the same time. > I personally find it hard to make the case for not inlining all the time, but that's probably a lack of imagination. The functions in question pass all the other tests of "inlinability" the function cost one seems arbitrary. > > Any thoughts? > > Thanks! > > P. > > > [1] https://trac.osgeo.org/postgis/ticket/3675#comment:18 > [2] https://github.com/postgres/postgres/blob/ > ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/ > optimizer/util/clauses.c#L4581-L4584 >