I opened a PR after changing the expected failure probability to 1e-5. I had an old version of the GH build file when I estimated it used 4 runs. The latest CI runs 4 JDKs on 3 platforms plus CodeQL and coverage. So this is 14 runs. We should see failures with a probability of:
(1 - (1 - 1e-5)**14) = 0.0001399, or approximately 1 in 7143. Compare that to previously: (1 - (1 - 1e-3)**14) = 0.01391, or approximately 1 in 71.89. If this is still a problem moving forward then we can replace with a fixed random number generator calling the underlying method and test coverage of the original method by other means. Alex On Fri, 20 Oct 2023 at 20:16, Gary D. Gregory <ggreg...@apache.org> wrote: > > Hi Alex, > > I'd prefer if you could give a shot at adjusting this test when you can take > the time. > > TY, > Gary > > On 2023/10/20 18:17:35 Alex Herbert wrote: > > On Fri, 20 Oct 2023 at 18:55, Alex Herbert <alex.d.herb...@gmail.com> wrote: > > > > > > The chi-square critical value (13.82) is correct: > > > > > > >>> from scipy.stats import chi2 > > > >>> chi2(2).isf(0.001) > > > 13.815510557964274 > > > > > > The test seems to fail with the expected frequency when run locally. I > > > annotated with: > > > > > > @RepeatedTest(value = 100000) > > > > > > I observe 93 failures (just under 1 in 1000). So it is strange this > > > fails a lot on the GH CI build. > > > > > > We could just use a fixed Random argument to the call that is > > > ultimately performing the random string generation: > > > > > > random(count, 0, chars.length, false, false, chars, random()); > > > > > > Switch the test to: > > > > > > Random rng = new Random(0xdeadbeef) > > > > > > gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng); > > > > > > You will see a drop in coverage by not exercising the public API. > > > > > > The alternative is to change the chi-square critical value: > > > > > > 1 in 10,000: 18.420680743952364 > > > 1 in 100,000: 23.025850929940457 > > > 1 in 1,000,000: 27.631021115928547 > > > > > > Alex > > > > Also note that although the test fails 1 in 1000 times, we run 4 > > builds in CI (coverage + 3 JDKS). So we expect to see failure with a > > p-value of: > > > > 1 - (1 - 0.001)^4 = 0.00399 > > > > This is the probability of not seeing a failure subtracted from 1. It > > is approximately 1 in 250. > > > > I did check the computation of the chi-square statistic and AFAIK it is > > correct. > > > > My suggestion for a first change is to bump the critical value to the > > level for 1 in 100,000. We should then see failures every 25,000 GH > > builds. If the frequency is more than that then we have to assume that > > the ThreadLocalRandom instance is not uniformly sampling from the set > > of size 3. I find this unlikely as the underlying algorithm for > > ThreadLocalRandom is good [1]. > > > > Alex > > > > [1] IIRC it passes the Test U01 BigCrush test for random generators > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org