On 12/21/23 10:56, Melanie Plageman wrote:
> On Sat, Dec 9, 2023 at 9:24 AM Joe Conway <m...@joeconway.com> wrote:
>> However, even if we assume a more-or-less normal distribution, we should
>> consider using subgroups in a way similar to Statistical Process
>> Control[1]. The reasoning is explained in this quote:
>>
>>    The Math Behind Subgroup Size
>>
>>    The Central Limit Theorem (CLT) plays a pivotal role here. According
>>    to CLT, as the subgroup size (n) increases, the distribution of the
>>    sample means will approximate a normal distribution, regardless of
>>    the shape of the population distribution. Therefore, as your
>>    subgroup size increases, your control chart limits will narrow,
>>    making the chart more sensitive to special cause variation and more
>>    prone to false alarms.
>
> I haven't read anything about statistical process control until you
> mentioned this. I read the link you sent and also googled around a
> bit. I was under the impression that the more samples we have, the
> better. But, it seems like this may not be the assumption in
> statistical process control?
>
> It may help us to get more specific. I'm not sure what the
> relationship between "unsets" in my code and subgroup members would
> be. The article you linked suggests that each subgroup should be of
> size 5 or smaller. Translating that to my code, were you imagining
> subgroups of "unsets" (each time we modify a page that was previously
> all-visible)?
Basically, yes.
It might not make sense, but I think we could test the theory by
plotting a histogram of the raw data, and then also plotting a histogram
based on sub-grouping every 5 sequential values in your accumulator
(that is, a histogram of the mean of each group of 5).
If the former does not look very normal (I would guess that for most
workloads it will be skewed with a long tail) and the latter looks more
normal, then that would suggest we are on the right track.
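To make the comparison concrete, here is a rough sketch of what I have in mind. It is only an illustration: the input is synthetic (exponentially distributed numbers standing in for a skewed, long-tailed accumulator), and the helper names subgroup_means, text_histogram and skewness are made up for this example, not anything from the patch.

```python
import random
import statistics

def subgroup_means(values, size=5):
    """Average every `size` sequential values; a trailing partial group is dropped."""
    full = len(values) - len(values) % size
    return [statistics.fmean(values[i:i + size]) for i in range(0, full, size)]

def text_histogram(values, bins=10, width=40):
    """Return one text row per equal-width bin: left edge, bar, count."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        left = lo + span * i / bins
        rows.append(f"{left:10.2f} | {'#' * round(c / peak * width)} {c}")
    return rows

def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

# ASSUMPTION: synthetic stand-in data, not measurements from any real workload.
rng = random.Random(42)
raw = [rng.expovariate(1.0) for _ in range(5000)]
means = subgroup_means(raw, size=5)

print("raw values, skewness = %.2f" % skewness(raw))
print("\n".join(text_histogram(raw)))
print("subgroup means (n=5), skewness = %.2f" % skewness(means))
print("\n".join(text_histogram(means)))
```

With real data, raw would be the sequence of values from the accumulator in arrival order. If the second histogram is visibly more symmetric than the first, that would be the hint that sub-grouping helps.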
There are statistical tests for "normalness" that could be applied too
(<quickly looks> e.g.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/#sec2-13title )
which would be a more rigorous approach, but a quick look at histograms
might be sufficiently convincing.
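As a sketch of the more rigorous route, the example below computes a Jarque-Bera statistic by hand. To be clear about the assumptions: I picked Jarque-Bera only because it needs nothing beyond the Python standard library, not because the linked article singles it out; something like scipy.stats.shapiro would probably be the more usual choice. The data is again synthetic, and the function name jarque_bera is just for this example.

```python
import math
import random
import statistics

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for the null hypothesis of normality."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under the null, JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

# ASSUMPTION: synthetic stand-in data, not measurements from any real workload.
rng = random.Random(42)
raw = [rng.expovariate(1.0) for _ in range(5000)]
means = [statistics.fmean(raw[i:i + 5]) for i in range(0, len(raw), 5)]
gauss = [rng.gauss(0.0, 1.0) for _ in range(len(means))]

# JB grows with sample size, so compare samples of equal length.
jb_raw, p_raw = jarque_bera(raw[:len(means)])
jb_means, p_means = jarque_bera(means)
jb_gauss, p_gauss = jarque_bera(gauss)

print("raw:            JB = %10.1f  p = %.3g" % (jb_raw, p_raw))
print("subgroup means: JB = %10.1f  p = %.3g" % (jb_means, p_means))
print("gaussian ref:   JB = %10.1f  p = %.3g" % (jb_gauss, p_gauss))
```

For strongly skewed input I would expect the test to still reject normality for subgroups of 5, so the useful signal is probably how far the statistic drops relative to the raw data rather than whether it passes outright.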
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com