On Sun, Mar 15, 2020 at 3:23 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > On Sun, Mar 15, 2020 at 02:48:02PM +1300, Thomas Munro wrote: > >Stimulated by some bad plans involving JSON, I found my way to your > >WIP stats-on-expressions patch in this thread. Do I understand > >correctly that it will eventually also support single expressions, > >like CREATE STATISTICS t_distinct_abc (ndistinct) ON > >(my_jsonb_column->>'abc') FROM t? It looks like that would solve > >problems that otherwise require a generated column or an expression > >index just to get ndistinct. > > Yes, I think that's generally the plan. I was also thinking about > inventing some sort of special JSON statistics (e.g. extracting paths > from the JSONB and computing frequencies, or something like that). But > stats on expressions are one of the things I'd like to do in PG14.
Interesting idea. If you had simple single-expression statistics, I suppose a cave-person version of this would be to write a script/stored procedure that extracts the distinct set of JSON paths and does CREATE STATISTICS for expressions to access each path. That said, I suspect that in many cases there's a small set of a paths and a human designer would know what to do. I didn't manage to try your WIP stats-on-expressions patch due to bitrot and unfinished parts, but I am hoping it just needs to remove the "if (numcols < 2) ereport(ERROR ...)" check to get a very very useful thing.