Re: Collecting statistics about contents of JSONB columns

Simon Riggs Wed, 05 Jan 2022 12:23:01 -0800

On Fri, 31 Dec 2021 at 22:07, Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:


> The patch does something far more
> elegant - it simply uses stavalues to store an array of JSONB documents,
> each describing stats for one path extracted from the sampled documents.

Sounds good.

> I'm sure there's plenty open questions - for example I think we'll need
> some logic to decide which paths to keep, otherwise the statistics can
> get quite big, if we're dealing with large / variable documents. We're
> already doing something similar for MCV lists.
>
> One of Nikita's patches not included in this thread allow "selective"
> statistics, where you can define in advance a "filter" restricting which
> parts are considered interesting by ANALYZE. That's interesting, but I
> think we need some simple MCV-like heuristics first anyway.
>
> Another open question is how deep the stats should be. Imagine documents
> like this:
>
>    {"a" : {"b" : {"c" : {"d" : ...}}}}
>
> The current patch build stats for all possible paths:
>
>   "a"
>   "a.b"
>   "a.b.c"
>   "a.b.c.d"
>
> and of course many of the paths will often have JSONB documents as
> values, not simple scalar values. I wonder if we should limit the depth
> somehow, and maybe build stats only for scalar values.

The user interface for designing filters sounds hard, so I'd hope we
can ignore that for now.

--
Simon Riggs                http://www.EnterpriseDB.com/

Re: Collecting statistics about contents of JSONB columns

Reply via email to