Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Julien Rouhaud
Hi,

On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
> Description:
> 
> About halfway down this page
> https://www.postgresql.org/docs/current/row-estimation-examples.html we see
> the following formula for calculating selectivity:
> 
> > selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
> 
> And just below the formula we see the explanatory sentence saying:
> 
> > That is, add up all the frequencies for the MCVs and subtract them from
> > one, ...
> 
> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> of the formula, however I am not sure what "mvf" is referring to
> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> explanatory sentence is saying?

It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
that survived until now.




Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Daniel Gustafsson
> On 22 Aug 2022, at 09:48, Julien Rouhaud wrote:
> On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:

>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?
> 
> It should be mcf, i.e. Most Common Frequencies. It looks like a very old typo
> that survived until now.

That seems plausible, but it does seem to have been introduced on purpose in
f5678e8e075, so CC:ing Tom for a trip down memory lane.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

--
Daniel Gustafsson   https://vmware.com/



mcx_acronyms.diff
Description: Binary data


Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Julien Rouhaud
Hi,

On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:
> > On 22 Aug 2022, at 09:48, Julien Rouhaud wrote:
> > On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:
> 
> >> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> >> of the formula, however I am not sure what "mvf" is referring to
> >> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> >> explanatory sentence is saying?
> > 
> > It should be mcf, i.e. Most Common Frequencies. It looks like a very old typo
> > that survived until now.
> 
> That seems plausible, but it does seem to have been introduced on purpose in
> f5678e8e075, so CC:ing Tom for a trip down memory lane.

That was actually introduced 2 years before in 234d50812c8 by Bruce.

> Looking at this I noticed that we mark up MCV and MCF as acronyms but they
> aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
> between markup and content, so we should probably do that as per the attached?

Agreed, although MCF is only used in planstats.sgml and the acronym is defined
locally there.




Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Daniel Gustafsson
> On 22 Aug 2022, at 12:08, Julien Rouhaud wrote:

> That was actually introduced 2 years before in 234d50812c8 by Bruce.

Yes, I was unclear; I meant that the second use was by Tom (whom I also forgot
to CC as I said I would, so I'm doing that now).

--
Daniel Gustafsson   https://vmware.com/





Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Tom Lane
Julien Rouhaud writes:
> On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:
>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?

> It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
> that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for "Most
Common Values' Frequencies".  (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".

regards, tom lane
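
For reference, the formula discussed in this thread can be sketched in Python,
using the "mcv_freqs" spelling suggested above. This is only an illustration of
the arithmetic: the function name and the sample numbers are hypothetical and
are not taken from the PostgreSQL source or documentation. It sums the
frequencies of the most common values (not the values themselves), subtracts
that from one, and divides by the number of remaining distinct values.

```python
def non_mcv_selectivity(mcv_freqs, num_distinct):
    """Estimate selectivity of "col = const" when const is not an MCV.

    mcv_freqs    -- frequencies of the most common values (fractions of rows)
    num_distinct -- estimated number of distinct values in the column
    """
    num_mcv = len(mcv_freqs)
    if num_distinct <= num_mcv:
        # Assumption for this sketch: refuse to divide by zero or a
        # negative count rather than guess at a fallback.
        raise ValueError("num_distinct must exceed the number of MCVs")
    # selectivity = (1 - sum(mcv_freqs)) / (num_distinct - num_mcv)
    return (1.0 - sum(mcv_freqs)) / (num_distinct - num_mcv)


# Made-up statistics: three MCVs covering 75% of the rows, 8 distinct values.
example = non_mcv_selectivity([0.25, 0.25, 0.25], 8)
```

With these made-up inputs the remaining 25% of rows is spread over 5 non-MCV
values, so the estimate is 0.25 / 5 = 0.05.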