On Fri, 20 Oct 2023 at 18:55, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> The chi-square critical value (13.82) is correct:
>
> >>> from scipy.stats import chi2
> >>> chi2(2).isf(0.001)
> 13.815510557964274
>
> The test seems to fail with the expected frequency when run locally. I
> annotated with:
>
> @RepeatedTest(value = 100000)
>
> I observe 93 failures (just under 1 in 1000). So it is strange this
> fails a lot on the GH CI build.
>
> We could just use a fixed Random argument to the call that is
> ultimately performing the random string generation:
>
> random(count, 0, chars.length, false, false, chars, random());
>
> Switch the test to:
>
> Random rng = new Random(0xdeadbeef);
>
> gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng);
>
> You will see a drop in coverage by not exercising the public API.
>
> The alternative is to change the chi-square critical value:
>
> 1 in 10,000: 18.420680743952364
> 1 in 100,000: 23.025850929940457
> 1 in 1,000,000: 27.631021115928547
>
> Alex

Also note that although the test fails 1 in 1000 times, we run 4
builds in CI (coverage + 3 JDKs). So we expect to see at least one
failure per CI run with a probability of:

1 - (1 - 0.001)^4 = 0.00399

This is 1 minus the probability that none of the 4 builds fails. It
is approximately 1 in 250.
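
The same arithmetic works for any per-run failure rate and number of
runs. A minimal Java sketch, assuming the runs are independent (the
class and method names are illustrative and not part of Commons Lang):

```java
/** Illustrative helper; not part of Commons Lang. */
final class CiFailureProbability {

    /**
     * Probability that at least one of {@code runs} independent test runs
     * fails, given that each run fails with probability {@code p}.
     */
    static double atLeastOneFailure(double p, int runs) {
        // P(no run fails) = (1 - p)^runs; subtract from 1 for "at least one".
        return 1.0 - Math.pow(1.0 - p, runs);
    }

    public static void main(String[] args) {
        double perCiRun = atLeastOneFailure(0.001, 4);
        System.out.printf("P(at least one failure) = %.5f%n", perCiRun);
        System.out.printf("Roughly 1 in %.0f CI runs%n", 1.0 / perCiRun);
    }
}
```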

I did check the computation of the chi-square statistic and AFAIK it is correct.
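
For reference, the standard Pearson statistic against a uniform
expectation can be sketched as below. This is not the code from the
Commons Lang test; the class name, method name and example counts are
made up for illustration. With 3 categories the statistic has 2
degrees of freedom, which matches the chi2(2) call quoted above.

```java
/** Illustrative helper; not the code used in the Commons Lang test. */
final class ChiSquareUniform {

    /** Pearson chi-square statistic for observed counts vs. equal expected counts. */
    static double statistic(long[] observed) {
        long total = 0;
        for (long o : observed) {
            total += o;
        }
        // Under a uniform distribution every category expects the same count.
        double expected = (double) total / observed.length;
        double sum = 0;
        for (long o : observed) {
            double d = o - expected;
            sum += d * d / expected;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical counts for the characters 'a', 'b', 'c'.
        long[] counts = {12, 9, 9};
        System.out.println("chi-square = " + statistic(counts));
    }
}
```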

My suggestion for a first change is to bump the critical value to the
level for 1 in 100,000. With 4 builds per CI run we should then see a
failure about once every 25,000 GH builds. If failures are more
frequent than that then we have to assume that the ThreadLocalRandom
instance is not uniformly sampling from the set of size 3. I find
this unlikely as the underlying algorithm for ThreadLocalRandom is
good [1].
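
With 2 degrees of freedom the chi-square survival function is
exp(-x/2), so the critical values quoted earlier can be reproduced
without SciPy as -2 ln(alpha). A small Java sketch (class and method
names are illustrative only):

```java
/** Illustrative helper; valid only for 2 degrees of freedom. */
final class ChiSquareTwoDofCriticalValue {

    /** Value x such that P(X > x) = alpha for a chi-square variable with 2 d.o.f. */
    static double criticalValue(double alpha) {
        // Survival function is exp(-x / 2); solve exp(-x / 2) = alpha for x.
        return -2.0 * Math.log(alpha);
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {1e-3, 1e-4, 1e-5, 1e-6}) {
            System.out.printf("alpha = %s, critical value = %.6f%n",
                              alpha, criticalValue(alpha));
        }
    }
}
```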

Alex

[1] IIRC it passes the TestU01 BigCrush test suite for random number generators
