I opened a PR after changing the expected failure probability to 1e-5.

I had an old version of the GH build file when I estimated it used 4
runs. The latest CI runs 4 JDKs on 3 platforms plus CodeQL and
coverage. So this is 14 runs. We should see failures with a
probability of:

(1 - (1 - 1e-5)**14) = 0.0001399, or approximately 1 in 7143.

Compare that to previously:

(1 - (1 - 1e-3)**14) = 0.01391, or approximately 1 in 71.89.

If this is still a problem moving forward then we can replace with a
fixed random number generator calling the underlying method and test
coverage of the original method by other means.

Alex

On Fri, 20 Oct 2023 at 20:16, Gary D. Gregory <ggreg...@apache.org> wrote:
>
> Hi Alex,
>
> I'd prefer if you could give a shot at adjusting this test when you can take 
> the time.
>
> TY,
> Gary
>
> On 2023/10/20 18:17:35 Alex Herbert wrote:
> > On Fri, 20 Oct 2023 at 18:55, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > >
> > > The chi-square critical value (13.82) is correct:
> > >
> > > >>> from scipy.stats import chi2
> > > >>> chi2(2).isf(0.001)
> > > 13.815510557964274
> > >
> > > The test seems to fail with the expected frequency when run locally. I
> > > annotated with:
> > >
> > > @RepeatedTest(value = 100000)
> > >
> > > I observe 93 failures (just under 1 in 1000). So it is strange this
> > > fails a lot on the GH CI build.
> > >
> > > We could just use a fixed Random argument to the call that is
> > > ultimately performing the random string generation:
> > >
> > > random(count, 0, chars.length, false, false, chars, random());
> > >
> > > Switch the test to:
> > >
> > > Random rng = new Random(0xdeadbeef)
> > >
> > > gen = RandomStringUtils.random(6, 0, 3, false, false, chars, rng);
> > >
> > > You will see a drop in coverage by not exercising the public API.
> > >
> > > The alternative is to change the chi-square critical value:
> > >
> > > 1 in 10,000: 18.420680743952364
> > > 1 in 100,000: 23.025850929940457
> > > 1 in 1,000,000: 27.631021115928547
> > >
> > > Alex
> >
> > Also note that although the test fails 1 in 1000 times, we run 4
> > builds in CI (coverage + 3 JDKS). So we expect to see failure with a
> > p-value of:
> >
> > 1 - (1 - 0.001)^4 = 0.00399
> >
> > This is the probability of not seeing a failure subtracted from 1. It
> > is approximately 1 in 250.
> >
> > I did check the computation of the chi-square statistic and AFAIK it is 
> > correct.
> >
> > My suggestion for a first change is to bump the critical value to the
> > level for 1 in 100,000. We should then see failures every 25,000 GH
> > builds. If the frequency is more than that then we have to assume that
> > the ThreadLocalRandom instance is not uniformly sampling from the set
> > of size 3. I find this unlikely as the underlying algorithm for
> > ThreadLocalRandom is good [1].
> >
> > Alex
> >
> > [1] IIRC it passes the Test U01 BigCrush test for random generators
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to