Re: Columns correlation and adaptive query optimization

Konstantin Knizhnik Thu, 26 Mar 2020 07:44:13 -0700

Thank you very much for the review.

On 25.03.2020 20:04, Rafia Sabih wrote:


+static void
+AddMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es);
+

This doesn't look like the right place for it; you might want to
declare it with the other functions at the start of the file.

Also, there is no description of any of the functions here;
it wouldn’t hurt to have some more comments there.


Sorry, I will fix it.
Actually, this patch consists of two independent parts:

The first part allows the auto_explain extension to generate multicolumn
statistics for variables used in clauses for which selectivity estimation
gives a wrong result. It affects only the auto_explain extension.

The second part allows multicolumn statistics to be used for join
selectivity estimation.

As far as I know, extended statistics are being actively improved right now:
https://www.postgresql.org/message-id/flat/20200309000157.ig5tcrynvaqu4ixd%40development#bfbdf9c41c31ef92819dfc5ecde4a67c

I think that using extended statistics for join selectivity is very
important and should also be addressed.
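
To illustrate why multicolumn statistics matter for join selectivity, here
is a minimal sketch in Python. It is not code from the patch. It assumes the
simplified model in which the selectivity of an equality join clause is
1 / max(ndistinct on each side), ignoring MCV lists and NULLs. The function
names and all numbers are hypothetical.

```python
def join_sel_independent(nd_a, nd_b):
    """Per-column estimate: multiply 1/max(ndistinct) over all equality clauses.

    nd_a, nd_b: per-column distinct counts for the join columns of each side.
    This treats the columns as statistically independent.
    """
    sel = 1.0
    for a, b in zip(nd_a, nd_b):
        sel *= 1.0 / max(a, b)
    return sel


def join_sel_multicolumn(nd_group_a, nd_group_b):
    """Estimate from the number of distinct (x, y) combinations on each side."""
    return 1.0 / max(nd_group_a, nd_group_b)


# Hypothetical data: x has 1000 distinct values and y is fully determined by x,
# so each side has only 1000 distinct (x, y) pairs, not 1000 * 1000.
rows_a = rows_b = 1_000_000

# Join condition: a.x = b.x AND a.y = b.y
est_rows_independent = rows_a * rows_b * join_sel_independent([1000, 1000], [1000, 1000])
est_rows_multicolumn = rows_a * rows_b * join_sel_multicolumn(1000, 1000)

print(est_rows_independent, est_rows_multicolumn)
```

Under these assumptions the per-column estimate is about one million rows,
while the estimate based on the joint distinct count is about one billion, so
the independence assumption underestimates the join size by a factor of 1000.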

If my approach is not so good, I will be glad to hear other suggestions.


A few more questions that cross my mind at this point,

- have you tried measuring the extra cost we have to pay for these
extra statistics, and also comparing it with the benefit they give in
terms of accuracy?

Adding statistics does not always lead to a performance improvement, but I
have never observed any performance degradation caused by the presence of
extended statistics.
Of course, we can manually create too many extended statistics entries for
different subsets of columns, and that certainly increases planning time
because the optimizer has to consider more alternatives.

But in practice I have never noticed such a slowdown.
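
As a rough illustration of how quickly the number of possible statistics
objects can grow, the Python sketch below counts the column subsets of size
two or more that could each get their own extended statistics. This is plain
combinatorics, not a measurement of planning time. The function name is
hypothetical, and the default cap of 8 is the column limit mentioned further
down in this message.

```python
from math import comb


def candidate_column_sets(n_columns, max_columns=8):
    """Count column subsets with at least 2 and at most max_columns columns."""
    upper = min(n_columns, max_columns)
    return sum(comb(n_columns, k) for k in range(2, upper + 1))


for n in (3, 5, 8):
    print(n, candidate_column_sets(n))
```

For a table with 8 candidate columns this gives 247 possible subsets, which
is why creating statistics for every combination by hand is not a good idea
even if each individual entry is cheap.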

- I would also be interested in understanding if there are cases when
adding this extra step doesn’t help, and whether you have excluded them
already or whether some of them are easily identifiable at this stage...?


Unfortunately, there are many cases where extended statistics cannot help:

either because the optimizer is not able to use them (for example, my patch
considers only cases with strict equality comparison, so if you use a
predicate like "a.x=b.x and a.y in (1,2,3)" then extended statistics for
<x,y> cannot be used), or because the collected statistics themselves are
not precise enough, especially in the case of data skew.

- is there any limit on the number of columns for which this will
work, or should there be any such limit...?

Right now there is a limit on the maximum number of columns used in
extended statistics: 8 columns.

But in practice I rarely see join predicates involving more than 3 columns.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
