Re: Probabilistic unit tests?

duncan smith Fri, 11 Jan 2013 10:14:52 -0800

On 11/01/13 01:59, Nick Mellor wrote:

Hi,


I've got a unit test that will usually succeed but sometimes fails. An 
occasional failure is expected and fine. What I want to detect is the test 
failing all the time.

What I want to test is "on average, there are the same number of males and females 
in a sample, give or take 2%."

Here's the unit test code:
import unittest
from collections import Counter

sex_count = Counter()
for contact in range(self.binary_check_sample_size):
     p = get_record_as_dict()
     sex_count[p['Sex']] += 1
self.assertAlmostEqual(sex_count['male'],
                        sex_count['female'],
                        delta=self.binary_check_sample_size * 2.0 / 100.0)

My question is: how would you run an identical test 5 times and pass the group 
*as a whole* if only one or two iterations passed the test? Something like:

     for n in range(5):
         # self.assertAlmostEqual(...)
         # if test passed: break
     else:
         self.fail()

(except that would create 5+1 tests as written!)

Thanks for any thoughts,

Best wishes,

Nick

The appropriateness of "give or take 2%" will depend on sample size, e.g. if the proportion of males should be 0.5 and your sample size is small enough this will fail most of the time regardless of whether the proportion is 0.5.
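
To make the sample-size dependence concrete, here is a minimal sketch that computes the exact probability that the "give or take 2%" check passes when the true proportion really is 0.5. It assumes every record is either male or female, so the female count is n minus the male count; `pass_probability` is a name made up for this illustration.

```python
from math import comb


def pass_probability(n, tolerance_pct=2.0):
    """Exact chance that |males - females| <= delta when P(male) = 0.5.

    Assumes exactly two categories, so females = n - males.
    """
    delta = n * tolerance_pct / 100.0
    # Add up the Binomial(n, 0.5) weights of every male count that passes.
    passing = sum(comb(n, males) for males in range(n + 1)
                  if abs(males - (n - males)) <= delta)
    return passing / 2 ** n


for n in (11, 100, 1000, 10000):
    print(n, round(pass_probability(n), 3))
```

With a perfectly fair source the check passes only about 24% of the time at n = 100 and about 96% of the time at n = 10000. At n = 11 it can never pass, because an odd-sized sample cannot split to within 0.22 of even.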

What you could do is perform a statistical test. Generally this involves generating a p-value and rejecting the null hypothesis if the p-value is below some chosen threshold (Type I error rate), often taken to be 0.05. Here the null hypothesis would be that the underlying proportion of males is 0.5.
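
As a sketch of what such a test might look like for the sex-count case, the following computes an exact two-sided p-value for the null hypothesis that the proportion of males is 0.5, using only the standard library. The function name, and the choice of an exact binomial test that doubles the smaller tail (valid here because Binomial(n, 0.5) is symmetric), are assumptions made for this illustration.

```python
from math import comb


def binomial_p_value(males, n):
    """Two-sided exact p-value for H0: P(male) = 0.5, given males out of n."""
    k = min(males, n - males)            # size of the smaller group
    # P(X <= k) under Binomial(n, 0.5); the distribution is symmetric,
    # so doubling the smaller tail gives the two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


alpha = 0.05                             # chosen Type I error rate
p = binomial_p_value(60, 100)
print(p, "reject H0" if p < alpha else "do not reject H0")
```

For 60 males in a sample of 100 the p-value is about 0.057, so the null hypothesis is not rejected at the 0.05 level.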

A statistical test will incorrectly reject a true null in a proportion of cases equal to the chosen Type I error rate. A test will also fail to reject false nulls a certain proportion of the time (the Type II error rate). The Type II error rate can be reduced by using larger samples. I prefer to generate several samples and test whether the proportion of failures is about equal to the error rate.
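
A rough sketch of the "several samples" idea: run the same test on many independent samples and look at the fraction of rejections. The generators `fair` and `biased` are stand-ins for the code under test, and `p_value` and `rejection_rate` are hypothetical helper names; the p-value here uses the normal approximation to the binomial.

```python
import random
from math import erfc, sqrt


def p_value(males, n):
    """Two-sided p-value for H0: P(male) = 0.5 (normal approximation)."""
    z = (males / n - 0.5) / sqrt(0.25 / n)
    return erfc(abs(z) / sqrt(2.0))


def rejection_rate(draw_is_male, n=1000, num_samples=200, alpha=0.05):
    """Fraction of independent samples on which H0 is rejected at level alpha."""
    rejections = 0
    for _ in range(num_samples):
        males = sum(draw_is_male() for _ in range(n))
        if p_value(males, n) < alpha:
            rejections += 1
    return rejections / num_samples


rng = random.Random(12345)               # fixed seed so the run is repeatable
fair = lambda: rng.random() < 0.5        # stand-in for correct code
biased = lambda: rng.random() < 0.6      # stand-in for broken code

print("fair:  ", rejection_rate(fair))   # expected to be near alpha
print("biased:", rejection_rate(biased)) # expected to be near 1
```

A rejection rate well above the chosen error rate suggests the null hypothesis is false; a rate near it is what correct code should produce.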

The above implies that p-values follow a [0,1] uniform density function if the null hypothesis is true. So alternatively you could generate many samples / p-values and test the p-values for uniformity. That is what I generally do:



p_values = []
for _ in range(numtests):
    values = data generated from code to be tested
    p_values.append(stat_test(values))
test p_values for uniformity

The result is still a test that will fail a given proportion of the time. You just have to live with that. Run your test suite several times and check that no one test is "failing" too regularly (more often than the chosen Type I error rate for the test of uniformity). My experience is that any issues generally result in the test of uniformity being consistently rejected (which is why I do that rather than just performing a single test on a single generated data set).

In your case you're testing a Binomial proportion and as long as you're generating enough data (you need to take into account any test assumptions / approximations) the observed proportions will be approximately normally distributed. Samples of e.g. 100 would be fine. P-values can be generated from the appropriate normal (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval), and uniformity can be tested using e.g. the Kolmogorov-Smirnov or Anderson-Darling test (http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm).
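
Putting the pieces together, here is a standard-library sketch of the p-value uniformity check. All function names are hypothetical, and the lambdas stand in for the code under test. The Kolmogorov-Smirnov p-value uses the usual asymptotic series with a small-sample correction, so it is approximate. The sample size of 2500 is a choice made for this sketch: counts are integers, so the p-values are discrete, and larger samples keep them closer to continuous.

```python
import random
from math import erfc, exp, sqrt


def proportion_p_value(males, n, p0=0.5):
    """Two-sided p-value for H0: P(male) = p0, via the normal approximation."""
    z = (males / n - p0) / sqrt(p0 * (1.0 - p0) / n)
    return erfc(abs(z) / sqrt(2.0))      # equals 2 * (1 - Phi(|z|))


def kolmogorov_sf(lam, terms=100):
    """Asymptotic P(sqrt(n) * D > lam) for the Kolmogorov-Smirnov statistic."""
    if lam < 0.2:                        # series is unstable here; value is ~1
        return 1.0
    total = sum((-1) ** (j - 1) * exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_uniform(p_values):
    """Return (D, approximate p-value) for a KS test against Uniform[0, 1]."""
    xs = sorted(p_values)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d   # small-sample correction
    return d, kolmogorov_sf(lam)


def uniformity_check(draw_is_male, n=2500, numtests=100):
    """Generate numtests samples of size n and test their p-values."""
    p_values = []
    for _ in range(numtests):
        males = sum(draw_is_male() for _ in range(n))
        p_values.append(proportion_p_value(males, n))
    return ks_uniform(p_values)


rng = random.Random(2013)                # fixed seed for a repeatable run
print(uniformity_check(lambda: rng.random() < 0.5))    # fair stand-in
print(uniformity_check(lambda: rng.random() < 0.55))   # biased stand-in
```

For the fair stand-in the reported p-value should usually be unremarkable; for the biased one it should be essentially zero. If SciPy is available, `scipy.stats.kstest(p_values, "uniform")` performs the same kind of test without the hand-rolled series.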

I'd have thought that something like this also exists somewhere. How do people usually test e.g. functions that generate random variates, or other cases where deterministic tests don't cut it?


Duncan
--
http://mail.python.org/mailman/listinfo/python-list
