On Fri, Nov 9, 2018 at 11:14 AM Paul Ramsey <pram...@cleverelephant.ca> wrote:
> > Neglected to include the footnotes... > > As I promised at PgConf.eu, here's a follow-up email about PostGIS > parallelism and function inlining (an odd combination, it is true). > > So, context: > > - We want PostGIS to parallelize more. In order to achieve that we need to > mark our functions with more realistic COSTs. Much much higher COSTs. > - When we do that, we hit a different problem. Our most commonly used > functions, ST_Intersects(), ST_DWithin() are actually SQL wrapper functions > are more complex combinations of index operators and exact computational > geometry functions. > - In the presence of high cost parameters that are used multiple times in > SQL functions, PostgreSQL will stop inlining those functions, in an attempt > to save the costs of double-calculating the parameters. > - For us, that's the wrong choice, because we lose the index operators at > the same time as we "save" the cost of double calculation. > - We need our wrapper functions inlined, even when they are carrying a > high COST. > > At pgconf.eu, I canvassed this problem and some potential solutions: > > * Solution #1 - Quick and dirty and visible: Add an 'INLINE' function > decorator, which tells PostgreSQL to just ignore costs and inline the > function regardless. Pros: it's not too hard to implement and I'm happy to > contribute this. Cons: it adds very specific single-purpose syntax to > CREATE FUNCTION. > > * Solution #2 - Quick and dirty and invisible. Tom suggested a hack that > achieves the aims of #1 but without adding syntax to CREATE FUNCTION: have > the inlining logic look at the cost of the wrapper and the cost of > parameters, and if the cost of the wrapper "greatly exceeded" the cost of > the parameters, then inline. So the PostGIS project would just set the cost > of our wrappers very high, and we'd get the behaviour we want, while other > users who want to use wrappers to force caching of calculations would have > zero coded wrapper functions. Pros: Solves the problem and easy to > implement, I'm happy to contribute. Cons: it's so clearly a hack involving > hidden (from users) magic. > > * Solution #3 - Correct and globally helpful. When first presented with > this problem last year, both Andres and Tom said [1] "but the right fix is > to avoid the double-calculation of identical entries in the target list" > because then it would be safe to inline functions with duplicate expensive > parameters. This would not only address the proximate PostGIS problem but > make a whole class of queries faster. There was some discussion of this > approach last week [2]. Pros: The right thing! Improves a whole pile of > other performance cases. Cons: Hard! Only experienced PgSQL developers need > apply. > > Naturally, I would love to see #3 implemented, but there's only so much > experienced developer time to go around, and it's beyond my current skill > set. I would like to be able to start to improve PostGIS parallelism with > PgSQL 12, so in order to make that not impossible, I'd like to implement > either #1 or #2 in case #3 doesn't happen for PgSQL 12. > > So my question to hackers is: which is less worse, #1 or #2, to implement > and submit to commitfest, in case #3 does not materialize in time for PgSQL > 12? > Absent any preferences, I would be inclined to go with #2, having a high personal tolerance for ugly hacks... :) P > > [1] > https://www.postgresql.org/message-id/20171116182208.kcvf75nfaldv36uh%40alap3.anarazel.de > [2] https://www.postgresql.org/message-id/10355.1540926295%40sss.pgh.pa.us > >