On Fri, 31 Dec 2021 at 22:07, Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> The patch does something far more > elegant - it simply uses stavalues to store an array of JSONB documents, > each describing stats for one path extracted from the sampled documents. Sounds good. > I'm sure there's plenty open questions - for example I think we'll need > some logic to decide which paths to keep, otherwise the statistics can > get quite big, if we're dealing with large / variable documents. We're > already doing something similar for MCV lists. > > One of Nikita's patches not included in this thread allow "selective" > statistics, where you can define in advance a "filter" restricting which > parts are considered interesting by ANALYZE. That's interesting, but I > think we need some simple MCV-like heuristics first anyway. > > Another open question is how deep the stats should be. Imagine documents > like this: > > {"a" : {"b" : {"c" : {"d" : ...}}}} > > The current patch build stats for all possible paths: > > "a" > "a.b" > "a.b.c" > "a.b.c.d" > > and of course many of the paths will often have JSONB documents as > values, not simple scalar values. I wonder if we should limit the depth > somehow, and maybe build stats only for scalar values. The user interface for designing filters sounds hard, so I'd hope we can ignore that for now. -- Simon Riggs http://www.EnterpriseDB.com/