Re: Multidimensional Histograms

Andrei Lepikhov Sun, 07 Jan 2024 09:27:10 -0800

On 7/1/2024 17:51, Tomas Vondra wrote:

On 1/7/24 11:22, Andrei Lepikhov wrote:

On 7/1/2024 06:54, Tomas Vondra wrote:

It's an interesting area for experiments, no doubt about it. And if you
choose to explore it, that's fine. But it's better to be aware it may
not end with a commit.
For the multi-dimensional case, I propose we first try to experiment
with the various algorithms, and figure out what works etc. Maybe
implementing them in python or something would be easier than C.


Curiously, trying to utilize extended statistics for some problematic
cases, I am experimenting with auto-generating such statistics by
definition of indexes [1]. Doing that, I wanted to add some hand-made
statistics like a multidimensional histogram or just a histogram which
could help to perform estimation over a set of columns/expressions.
I realized that current hooks get_relation_stats_hook and
get_index_stats_hook are insufficient if I want to perform an estimation
over a set of ANDed quals on different columns.
In your opinion, is it possible to add a hook into the extended
statistics to allow for an extension to propose alternative estimation?

[1] https://github.com/danolivo/pg_index_stats


No idea, I haven't thought about that very much. Presumably the existing
hooks are insufficient because they're per-attnum? I guess it would make
sense to have a hook for all the attnums of the relation, but I'm not
sure it'd be enough to introduce a new extended statistics kind ...

I got stuck on the same problem Alexander mentioned: we usually have
large tables with many uniformly distributed values. In this case, MCV
doesn't help a lot.

Usually, I face problems scanning a table with a filter containing 3-6
ANDed quals. Here, Postgres multiplies selectivities and ends up with an
estimate of less than 1 tuple. But such scans, in reality, mostly have
some physical sense and return a bunch of tuples. It looks like the set
of columns represents some value of a composite type.

Sometimes extended statistics on dependencies help, but they are
expensive for multiple columns. And sometimes I see that even a trivial
histogram on a ROW(x1,x2,...) could predict a much more adequate value
(kind of conservative upper estimation) for a clause like "x1=N1 AND
x2=N2 AND ..." if somewhere in an extension we'd transform it to
ROW(x1,x2,...) = ROW(N1, N2,...).

For such cases we don't have an in-core solution, and introducing a hook
on clause list estimation (paired with maybe a hook on statistics
generation) could help invent an extension that would deal with that
problem. Also, it would open a way for experiments with different types
of extended statistics ...
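
A minimal Python sketch of the underestimation described above, for
illustration only. The table, its three columns, and the helper names
are all hypothetical, and exact per-tuple counts stand in for whatever
statistics on ROW(x1,x2,x3) an extension might actually collect (this is
not a histogram and not how the planner is implemented). It only shows
how multiplying per-column selectivities can drop below one row when the
columns jointly describe a single composite value.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical table: 10,000 rows, three columns. x2 and x3 are functions
# of x1, so the three columns together behave like one composite value.
rows = [(i % 100, (i % 100) * 2, (i % 100) + 7) for i in range(10_000)]
n = len(rows)


def column_selectivity(col, value):
    """Fraction of rows where a single column equals the given value."""
    matches = sum(1 for r in rows if r[col] == value)
    return Fraction(matches, n)


def independent_estimate(values):
    """Row estimate for x1=N1 AND x2=N2 AND x3=N3 assuming independence:
    per-column selectivities are multiplied together."""
    sel = Fraction(1)
    for col, value in enumerate(values):
        sel *= column_selectivity(col, value)
    return sel * n


def row_estimate(values):
    """Row estimate for ROW(x1,x2,x3) = ROW(N1,N2,N3), using exact counts
    of whole tuples as a stand-in for composite-value statistics."""
    return Counter(rows)[tuple(values)]


probe = (5, 10, 12)
independent_rows = independent_estimate(probe)
composite_rows = row_estimate(probe)

print(f"independence assumption: {float(independent_rows)} rows")
print(f"composite-value counts:  {composite_rows} rows")
```

With this synthetic data each column value matches 1% of the rows, so
the multiplied estimate is 10,000 * 0.01^3 = 0.01 rows, while the scan
really returns 100 rows.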


--
regards,
Andrei Lepikhov
Postgres Professional
