On Fri, Feb 08, 2019 at 07:23:19PM +0100, SZEDER Gábor wrote:

> > Picking an <N> is tough. Too low and you get a false negative, too high
> > and you can wait forever, especially if the script is long. But I don't
> > think there's any real way to auto-scale it, except by seeing a few of
> > the failing cases and watching how long they take.
> 
> So far I've chosen <N> like this: run the test script with --stress
> 3-5 times to trigger the failure, take the highest repetition count
> that was necessary for the failure, multiply it by 4-6 to get a round
> number, and that's a good ballpark for <N>.  And once bisect came up
> with the suspect commit, I double checked it by letting the test
> script run with --stress on its parent commit for at least 5-10x <N>
> repetitions.

Heh. That's exactly my process, too. :)
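
For concreteness, the arithmetic in that heuristic could be sketched
roughly as below. This is only an illustration: the function names, the
default factor of 5, and the choice to round up to one significant digit
are assumptions made for the example, not anything that exists in
test-lib.sh.

```python
def suggest_n(failure_reps, factor=5):
    """Suggest a repetition limit <N> from a few observed failures.

    failure_reps: repetition counts at which each --stress run failed
                  (e.g. collected from 3-5 runs).
    factor:       safety multiplier; the heuristic uses roughly 4-6
                  (5 is an arbitrary default chosen for this sketch).
    """
    if not failure_reps:
        raise ValueError("need at least one observed failure")
    raw = max(failure_reps) * factor
    # Round up to one significant digit to get a "round number"
    # (an assumed interpretation of "round").
    magnitude = 10 ** (len(str(raw)) - 1)
    return -(-raw // magnitude) * magnitude


def verification_range(n):
    """Repetitions to run on the suspect's parent commit: 5-10x <N>."""
    return 5 * n, 10 * n


n = suggest_n([12, 30, 21])       # hypothetical failure counts
low, high = verification_range(n)
print(n, low, high)
```

With the hypothetical counts above, the worst case is 30 repetitions, so
the suggested limit is 30 * 5 = 150, rounded up to 200, and the parent
commit would be stressed for 1000 to 2000 repetitions.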

> Anyway, I doubt that auto-scaling <N> is worth the effort.

Yeah, especially because as a concept it exists outside of the script
itself (i.e., you have to check out a failing version and then run the
script a bunch of times; that's not something that test-lib.sh should
even know about).

So let's go with this for now. It's already a much nicer tool than we
had yesterday, so we can take some time to get used to it.

-Peff
