On Tue, Mar 13, 2018 at 6:31 AM, David Rowley <david.row...@2ndquadrant.com> wrote:
> On 13 March 2018 at 11:44, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > While it would certainly be nice to have better behavior for that,
> > "add a hook so users who can write C can fix it by hand" doesn't seem
> > like a great solution. On top of the sheer difficulty of writing a
> > hook function, you'd have the problem that no pre-written hook could
> > know about all available functions. I think somehow we'd need a way
> > to add per-function knowledge, perhaps roughly like the protransform
> > feature.
>

I think this isn't either-or. I think a general hook can be useful for
extensions that want to optimize particular data distributions/workloads
using domain knowledge about functions common to those workloads. That
way, users working with that data can use extensions to optimize their
workloads without writing C themselves. I also think a protransform-like
feature would add a lot of power to the native planner, but this could
take a while to get into core properly and may not handle all kinds of
data distributions/cases.

An example of a case that a protransform-type system would not be able
to optimize is a mathematical operator expression like bucketing
integers by decile: (integer / 10) * 10. This is somewhat analogous to
date_trunc in the integer space and would also change the number of
resulting distinct rows.

>
> I always imagined that extended statistics could be used for this.
> Right now the estimates are much better when you create an index on
> the function, but there's no real reason to limit the stats that are
> gathered to just plain columns + expression indexes.
>
> I believe I'm not the only person to have considered this. Originally
> extended statistics were named multivariate statistics. I think it was
> Dean and I (maybe others too) that suggested to Tomas to give the
> feature a more generic name so that it can be used for a more general
> purpose later.
>

I also think that the point about extended statistics is a good one, and
it points to the need for more experimentation/experience, which I think
a C hook is better suited for. Putting in a hook will allow extension
writers like us to experiment and figure out the kinds of transforms on
statistics that are useful, while having a small footprint on the core.
I think designing a protransform-like system would benefit from more
experience with the kinds of transformations that are useful. For
example, can anything be done if the interval passed to date_trunc is
not constant, or is it not even worth bothering with that case? Maybe
extended statistics is a better approach, etc.

>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
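
To make the decile bucketing case mentioned earlier a bit more concrete, here is a minimal sketch in Python of the kind of transform on a distinct-count estimate that an extension might experiment with. This is not PostgreSQL code and not how the planner works today. The function names (bucket_by_decile, estimate_bucketed_ndistinct) are hypothetical, and the sketch assumes non-negative integers so that Python's floor division agrees with SQL integer division.

```python
def bucket_by_decile(value: int) -> int:
    """Round a non-negative integer down to a multiple of 10.

    Mirrors the expression (integer / 10) * 10 with integer division.
    """
    return (value // 10) * 10


def estimate_bucketed_ndistinct(ndistinct: int, min_value: int,
                                max_value: int, width: int = 10) -> int:
    """Hypothetical estimate of distinct values after bucketing.

    The result can exceed neither the input distinct count nor the
    number of buckets that span [min_value, max_value].
    Assumes 0 <= min_value <= max_value.
    """
    buckets_in_range = max_value // width - min_value // width + 1
    return min(ndistinct, buckets_in_range)


# Illustrative data only: every integer from 0 to 999 exactly once.
values = list(range(1000))

ndistinct_before = len(set(values))
ndistinct_after = len({bucket_by_decile(v) for v in values})
estimate = estimate_bucketed_ndistinct(ndistinct_before,
                                       min(values), max(values))

print(ndistinct_before)  # 1000
print(ndistinct_after)   # 100
print(estimate)          # 100
```

The point of the sketch is only that the expression shrinks the number of distinct rows by a factor that depends on the column's range, which per-column statistics alone do not tell the planner once the column is wrapped in an expression.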