On 1/7/14 2:53 PM, Michael Still wrote:
> Hi. Thanks for reaching out about this.
>
> It seems this patch has now passed turbo hipster, so I am going to
> treat this as a more theoretical question than perhaps you intended. I
> should note, though, that Joshua Hesketh and I have been trying to
> read and triage every turbo hipster failure, but that has been hard
> this week because we're both at a conference.
>
> The problem this patch faced is that we are having trouble defining
> what a reasonable amount of time is for a database migration to run.
> Specifically:
>
> 2014-01-07 14:59:32,012 [output] 205 -> 206...
> 2014-01-07 14:59:32,848 [heartbeat]
> 2014-01-07 15:00:02,848 [heartbeat]
> 2014-01-07 15:00:32,849 [heartbeat]
> 2014-01-07 15:00:39,197 [output] done
>
> So applying migration 206 took slightly over a minute (67 seconds).
> Our historical data (mean + 2 standard deviations) says that this
> migration should take no more than 63 seconds, so this only just
> failed the test.
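
If I'm reading that right, the check amounts to something like the
sketch below (Python; the function name and the sample history are
mine, not turbo hipster's actual code or data):

    import statistics
    from datetime import datetime

    # Elapsed time for migration 206, taken straight from the log above.
    FMT = "%Y-%m-%d %H:%M:%S,%f"
    start = datetime.strptime("2014-01-07 14:59:32,012", FMT)
    end = datetime.strptime("2014-01-07 15:00:39,197", FMT)
    runtime = (end - start).total_seconds()
    print(runtime)  # 67.185 -- the "slightly over a minute"

    def too_slow(runtime_s, history_s):
        # Fail the run if it exceeds mean + 2 stddev of the historical
        # runtimes recorded for the same migration.
        mean = statistics.mean(history_s)
        stddev = statistics.stdev(history_s)
        return runtime_s > mean + 2 * stddev

    # Invented history: mean 55s, sample stddev ~3.3s, so the ceiling
    # is ~61.5s and the 67s run above is rejected.
    print(too_slow(runtime, [51.0, 55.0, 55.0, 59.0]))  # True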

It seems to me that requiring a runtime less than (mean + 2 stddev) leads to a false-positive rate of 1 in 40, right? If the runtimes have a normal(-ish) distribution, then 95% of them will fall within 2 standard deviations of the mean, so 1 in 20 falls outside that range. Half of those are on the fast side (below mean - 2 stddev) and don't trigger the check, which leaves 1 in 40. Please correct me if I'm wrong; I'm no statistician.
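
For what it's worth, that ballpark checks out if you compute the exact
normal tail (a two-line check using Python 3.8's statistics.NormalDist;
the textbook 95% figure is itself rounded, so the one-sided rate comes
out closer to 1 in 44, which doesn't change the point):

    from statistics import NormalDist

    # P(X > mean + 2*stddev) for a normal distribution: the one-sided
    # tail beyond two standard deviations.
    tail = 1 - NormalDist().cdf(2)
    print(tail)      # ~0.0228
    print(1 / tail)  # ~43.96, i.e. roughly 1 in 44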

Such a high false-positive rate may make it too easy to dismiss turbo hipster as the bot that cried wolf. This problem already exists with Jenkins and the devstack/tempest tests; when one of those fails, I don't wonder what I broke, but rather how many times I'll have to recheck the patch until the tests pass.

Unfortunately, I don't have a solution to offer, but perhaps someone else will.

