Steven D'Aprano wrote:
> Rob Williscroft wrote:
>> MISSING = MissingObject()
>> def mean( sequence, missing = MISSING ):
>
> So you think the right API is to allow the caller to specify what
> counts as a missing value at runtime? Are you aware of any other
> statistics packages that do that?
R does it, not in the stats functions themselves but in, for instance, read.table. When reading data from an external file, you can specify a set of values that will be converted to NA in the resulting data frame.

I think it's worth considering this approach, namely separating the input of the data into your system from the calculations on that data. You haven't said exactly how people are going to be using your API, but your example of "where missing data comes from" showed something like a table of data from a survey. If this is the case, and users are going to be importing sets of data from external files, it makes a lot of sense to let them specify "convert these particular values to MISSING when importing".

Either way, my answer to your original question would be: if you want to err on the side of caution, use your own MISSING value and just provide a simple function that will MISSING-ize specified values:

    MISSING = object()  # stand-in for your own missing-value sentinel

    def cleanUp(data, missing=None):
        """Return a copy of data with every value in missing replaced by MISSING."""
        if missing is None:
            missing = []
        return [MISSING if d in missing else d for d in data]

(Yet another use of None here! :-)

Then if people find their functions are returning None (or any other value, such as an empty string) to mean a "genuine" missing value, they can just wrap the call in this cleanUp function. The reverse is harder to do: if you use None as your missing-value sentinel, you irrevocably lose the ability to tell it apart from other uses of None.

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
	--author unknown
--
http://mail.python.org/mailman/listinfo/python-list
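A minimal sketch of the import-time approach described above, in the spirit of R's read.table: the caller lists which raw values mean "missing", those are converted to a sentinel while the file is read, and the calculation then only has to skip that one sentinel. MissingObject is borrowed from the quoted snippet; read_column, its default missing_values, and this mean are hypothetical illustrations, not anyone's actual API. Only csv and io from the standard library are assumed.

```python
import csv
import io


class MissingObject:
    """Hypothetical sentinel type; only its single instance MISSING is used."""

    def __repr__(self):
        return "MISSING"


MISSING = MissingObject()


def read_column(lines, column, missing_values=("", "NA")):
    """Read one CSV column, converting any value in missing_values to MISSING.

    Non-missing cells are converted to float. The default missing_values
    are an assumption chosen for this sketch.
    """
    result = []
    for row in csv.DictReader(lines):
        raw = row[column].strip()
        result.append(MISSING if raw in missing_values else float(raw))
    return result


def mean(sequence, missing=MISSING):
    """Arithmetic mean of the values in sequence that are not `missing`."""
    # Identity test: only the exact sentinel object is treated as missing.
    values = [x for x in sequence if x is not missing]
    if not values:
        raise ValueError("mean of an empty sequence (all values missing?)")
    return sum(values) / len(values)


survey = io.StringIO("id,age\n1,34\n2,NA\n3,\n4,-999\n5,28\n")
ages = read_column(survey, "age", missing_values=("", "NA", "-999"))
print(ages)        # [34.0, MISSING, MISSING, MISSING, 28.0]
print(mean(ages))  # 31.0
```

Because the conversion happens once, at import time, mean never has to guess which raw values (an empty string, "NA", -999) signal missing data; it only has to recognise the single sentinel.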