On Sat, Dec 9, 2023 at 9:24 AM Joe Conway <m...@joeconway.com> wrote:
>
> On 12/8/23 23:11, Melanie Plageman wrote:
> >
> > I'd be delighted to receive any feedback, ideas, questions, or review.
>
> This is well thought out, well described, and a fantastic improvement in
> my view -- well done!

Thanks, Joe! That means a lot! I often see work by hackers on the
mailing list that makes me think, "hey, that's cool/clever/awesome!"
but I don't give that feedback. I appreciate you doing that!

> I do think we will need to consider distributions other than normal, but
> I don't know offhand what they will be.

Agreed. I plan to test with another distribution. Though, the exercise
of determining which ones are useful is probably more challenging. I
imagine we will have to choose one distribution (as opposed to
supporting different distributions and choosing based on data access
patterns for a table). Though, even with a normal distribution, I think
it should be an improvement.

> However, even if we assume a more-or-less normal distribution, we should
> consider using subgroups in a way similar to Statistical Process
> Control[1]. The reasoning is explained in this quote:
>
>     The Math Behind Subgroup Size
>
>     The Central Limit Theorem (CLT) plays a pivotal role here. According
>     to CLT, as the subgroup size (n) increases, the distribution of the
>     sample means will approximate a normal distribution, regardless of
>     the shape of the population distribution. Therefore, as your
>     subgroup size increases, your control chart limits will narrow,
>     making the chart more sensitive to special cause variation and more
>     prone to false alarms.

I hadn't read anything about statistical process control until you
mentioned this. I read the link you sent and also googled around a bit.
I was under the impression that the more samples we have, the better.
But it seems like this may not be the assumption in statistical process
control?

It may help us to get more specific. I'm not sure what the relationship
between "unsets" in my code and subgroup members would be. The article
you linked suggests that each subgroup should be of size 5 or smaller.

Translating that to my code, were you imagining subgroups of "unsets"
(each time we modify a page that was previously all-visible)?
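
As a minimal sketch of the quoted claim that limits narrow as subgroup
size grows: this assumes the textbook X-bar chart rule of center +/-
3 * sigma / sqrt(n), which is my reading of the article and not
something taken from the patch. The function name and the example
numbers are hypothetical.

```python
import math


def xbar_control_limits(center, sigma, n):
    """Return (lower, upper) 3-sigma limits for the mean of a subgroup of size n.

    Assumption: textbook X-bar chart rule, center +/- 3 * sigma / sqrt(n).
    """
    if n < 1:
        raise ValueError("subgroup size must be at least 1")
    half_width = 3.0 * sigma / math.sqrt(n)
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical process: mean 100, standard deviation 10.
    for n in (1, 5, 25, 100):
        lower, upper = xbar_control_limits(center=100.0, sigma=10.0, n=n)
        print(f"n={n:3d}: limits = [{lower:.1f}, {upper:.1f}]")
```

The band shrinks with 1/sqrt(n), so with large subgroups even a small
shift in the subgroup mean falls outside the limits. That would be the
"more sensitive, more prone to false alarms" effect from the quote.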

Thanks for the feedback!

- Melanie