On 11/01/13 01:59, Nick Mellor wrote:
Hi,

I've got a unit test that will usually succeed but sometimes fails. An 
occasional failure is expected and fine. It's failing all the time I want to 
test for.

What I want to test is "on average, there are the same number of males and females 
in a sample, give or take 2%."

Here's the unit test code:
import unittest
from collections import Counter

sex_count = Counter()
for contact in range(self.binary_check_sample_size):
     p = get_record_as_dict()
     sex_count[p['Sex']] += 1
self.assertAlmostEqual(sex_count['male'],
                        sex_count['female'],
                        delta=self.binary_check_sample_size * 2.0 / 100.0)

My question is: how would you run an identical test 5 times and pass the group 
*as a whole* if only one or two iterations passed the test? Something like:

     for n in range(5):
         # self.assertAlmostEqual(...)
         # if test passed: break
     else:
         self.fail()

(except that would create 5+1 tests as written!)
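One way to keep this as a single test is to move the retry loop into a helper and make one assertion on its result. The following is only a sketch: passes_within is a hypothetical helper name, and the check here is a stand-in for the randomised assertion above.

```python
import unittest


def passes_within(check, attempts=5):
    """Call check() up to `attempts` times.

    Returns True as soon as one call raises no AssertionError,
    and False if every attempt fails.
    """
    for _ in range(attempts):
        try:
            check()
        except AssertionError:
            continue  # this attempt failed; try again
        return True   # one passing attempt is enough
    return False


class RetryExample(unittest.TestCase):
    def test_passes_if_any_attempt_passes(self):
        # Stand-in for a randomised check: fails twice, then passes.
        outcomes = iter([False, False, True])

        def check():
            self.assertTrue(next(outcomes))

        # A single assertion, so the group counts as one test.
        self.assertTrue(passes_within(check, attempts=5))
```

Note that retrying makes the test more lenient: a check that wrongly passes some fraction of the time gets several chances to do so.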

Thanks for any thoughts,

Best wishes,

Nick


The appropriateness of "give or take 2%" will depend on the sample size. For example, if the proportion of males should be 0.5 and your sample size is small enough, the test will fail most of the time even when the true proportion is 0.5.
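To put numbers on that, here is a rough sketch that computes the exact chance of the original assertion passing when the true proportion really is 0.5. The function name pass_probability is made up for illustration, and it assumes each record is independently male or female with probability 0.5.

```python
from math import comb


def pass_probability(n, tolerance_percent=2.0):
    """Exact probability that |males - females| <= tolerance_percent% of n
    when each of n records is male with probability 0.5."""
    delta = n * tolerance_percent / 100.0
    # Count the outcomes (numbers of males) that satisfy the assertion.
    # females = n - males, so males - females = 2 * males - n.
    favourable = sum(comb(n, males)
                     for males in range(n + 1)
                     if abs(2 * males - n) <= delta)
    # Every one of the 2**n male/female sequences is equally likely.
    return favourable / 2 ** n


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(pass_probability(n), 3))
```

With 100 records the assertion passes only about 24% of the time, while with 10000 records it passes about 96% of the time, even though the generator is behaving correctly in both cases.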

What you could do is perform a statistical test. Generally this involves generating a p-value and rejecting the null hypothesis if the p-value is below some chosen threshold (Type I error rate), often taken to be 0.05. Here the null hypothesis would be that the underlying proportion of males is 0.5.
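As a minimal sketch of such a test, the p-value can be computed from a normal approximation to the binomial. This assumes the sample is large enough for the approximation to be reasonable; proportion_p_value and ALPHA are illustrative names, not part of any library.

```python
from math import erfc, sqrt

ALPHA = 0.05  # chosen Type I error rate


def proportion_p_value(successes, n, p0=0.5):
    """Two-sided p-value for H0: true proportion == p0,
    using the normal approximation to the binomial."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (successes / n - p0) / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return erfc(abs(z) / sqrt(2))


def reject_null(successes, n, p0=0.5, alpha=ALPHA):
    """True if the observed count is inconsistent with p0 at level alpha."""
    return proportion_p_value(successes, n, p0) < alpha
```

For example, 60 males out of 100 gives z = 2 and a p-value just under 0.05, so the null would be rejected at the 0.05 level, whereas 55 out of 100 would not be.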

A statistical test will incorrectly reject a true null in a proportion of cases equal to the chosen Type I error rate. A test will also fail to reject false nulls a certain proportion of the time (the Type II error rate). The Type II error rate can be reduced by using larger samples. I prefer to generate several samples and test whether the proportion of failures is about equal to the error rate.

The above implies that p-values follow a [0,1] uniform density function if the null hypothesis is true. So alternatively you could generate many samples / p-values and test the p-values for uniformity. That is what I generally do:


p_values = []
for _ in range(numtests):
    # generate_values() is a placeholder for the code under test
    values = generate_values()
    p_values.append(stat_test(values))
# finally, test p_values for uniformity (e.g. with a Kolmogorov-Smirnov test)


The result is still a test that will fail a given proportion of the time. You just have to live with that. Run your test suite several times and check that no one test is "failing" too regularly (more often than the chosen Type I error rate for the test of uniformity). My experience is that any issues generally result in the test of uniformity being consistently rejected (which is why I do that rather than just performing a single test on a single generated data set).

In your case you're testing a Binomial proportion and as long as you're generating enough data (you need to take into account any test assumptions / approximations) the observed proportions will be approximately normally distributed. Samples of e.g. 100 would be fine. P-values can be generated from the appropriate normal (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval), and uniformity can be tested using e.g. the Kolmogorov-Smirnov or Anderson-Darling test (http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm).
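For the uniformity check, a one-sample Kolmogorov-Smirnov test against Uniform(0, 1) can be sketched with the standard library alone. This is an illustrative version, not a reference implementation: ks_uniform_test is a made-up name, and the p-value uses the asymptotic Kolmogorov series with Stephens' small-sample correction, so it is approximate.

```python
from math import exp, sqrt


def ks_uniform_test(p_values):
    """Return (D, approximate p-value) for H0: values are Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    # D is the largest gap between the empirical CDF and the
    # uniform CDF F(x) = x, checked on both sides of each step.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Scaled statistic with Stephens' small-sample correction.
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
    if lam < 0.3:
        # The series below converges slowly here, and the tail
        # probability is within about 1e-5 of 1 anyway.
        return d, 1.0
    tail = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam)
                   for k in range(1, 101))
    return d, min(1.0, max(0.0, tail))
```

A small p-value from this function suggests the collected p-values are not uniform. Because a binomial count is discrete, the p-values derived from it are only approximately uniform, so larger samples per p-value make this check better behaved. If SciPy is available, scipy.stats.kstest(p_values, 'uniform') performs the same kind of test.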

I'd have thought that something like this also exists somewhere. How do people usually test e.g. functions that generate random variates, or other cases where deterministic tests don't cut it?

Duncan
--
http://mail.python.org/mailman/listinfo/python-list