Re: Should 'sum(mvf)' read 'sum(mcv)'...?
Hi,

On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
> Description:
>
> About halfway down this page
> https://www.postgresql.org/docs/current/row-estimation-examples.html we see
> the following formula for calculating selectivity:
>
> selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
>
> And just below the formula we see the explanatory sentence saying:
>
>> That is, add up all the frequencies for the MCVs and subtract them from
>> one, ...
>
> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> of the formula, however I am not sure what "mvf" is referring to
> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> explanatory sentence is saying?

It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
that survived until now.
Re: Should 'sum(mvf)' read 'sum(mcv)'...?
> On 22 Aug 2022, at 09:48, Julien Rouhaud wrote:
> On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:

>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?
>
> It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
> that survived until now.

That seems plausible, but it does seem to have been introduced on purpose in
f5678e8e075, so CC:ing Tom for a trip down memory lane.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

--
Daniel Gustafsson		https://vmware.com/

Attachment: mcx_acronyms.diff (binary data)
Re: Should 'sum(mvf)' read 'sum(mcv)'...?
Hi,

On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:
> > On 22 Aug 2022, at 09:48, Julien Rouhaud wrote:
> > On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:
>
> >> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> >> of the formula, however I am not sure what "mvf" is referring to
> >> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> >> explanatory sentence is saying?
> >
> > It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
> > that survived until now.
>
> That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
> CC:ing Tom for a trip down memory lane.

That was actually introduced 2 years before in 234d50812c8 by Bruce.

> Looking at this I noticed that we mark up MCV and MCF as acronyms but they
> aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
> between markup and content, so we should probably do that as per the attached?

Agreed, although MCF is only used in planstats.sgml and the acronym is defined
locally.
Re: Should 'sum(mvf)' read 'sum(mcv)'...?
> On 22 Aug 2022, at 12:08, Julien Rouhaud wrote:

> That was actually introduced 2 years before in 234d50812c8 by Bruce.

Yes, I was unclear, I meant that the second use was by Tom (whom I also forgot
to CC as I said I would, so doing that now).

--
Daniel Gustafsson		https://vmware.com/
Re: Should 'sum(mvf)' read 'sum(mcv)'...?
Julien Rouhaud writes:
> On Sun, Aug 21, 2022 at 11:02:04PM +, PG Doc comments form wrote:
>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?

> It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
> that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for
"Most common Values' Frequencies".  (Summing the MCVs themselves
isn't sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit
more spelled-out, perhaps "mcv_freqs".

			regards, tom lane
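
For readers following the thread, the formula under discussion can be
sketched in a few lines of Python.  This is only an illustration of the
formula as quoted above, using the "mcv_freqs" spelling suggested in this
thread; the function name and the sample numbers are made up for the
example, and it is not the planner's actual code, which may account for
more than this formula shows.

```python
def non_mcv_selectivity(mcv_freqs, num_distinct):
    """Selectivity of "col = value" for a value that is NOT in the MCV list.

    mcv_freqs    -- frequencies of the most common values (hypothetical input)
    num_distinct -- estimated number of distinct values in the column
    """
    num_mcv = len(mcv_freqs)
    if num_distinct <= num_mcv:
        raise ValueError("num_distinct must exceed the number of MCVs")
    # Fraction of rows not covered by any MCV, spread evenly over
    # the remaining distinct values.
    return (1.0 - sum(mcv_freqs)) / (num_distinct - num_mcv)


# Made-up statistics: two MCVs covering half the rows, 12 distinct values.
example = non_mcv_selectivity([0.25, 0.25], 12)
```

Note that the sum runs over the frequencies, not over the values
themselves, which is why "sum(mcv)" would not be a sensible reading.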