Hello folks,

I'm designing an API for some lightweight calculator-like statistics functions, such as mean, standard deviation, etc., and I want to support missing values. Missing values should simply be ignored. E.g.:
    mean([1, 2, MISSING, 3]) => 6/3 = 2

rather than 6/4 or raising an error.

My question is, should I accept None as the missing value, or a dedicated singleton?

In favour of None: it's already there, so no extra code is required. People may expect it to work.

Against None: it's too easy to add None to a data set by mistake, because functions return None by default.

In favour of a dedicated MISSING singleton: it's obvious from context. It's not much more work to implement than using None. It's hard to include by accident. If None does creep into the data by accident, you get a nice explicit exception.

Against MISSING: users may expect to be able to choose their own sentinel by assigning to MISSING. I don't want to support that.

I've considered what other packages do:

- R uses a special value, NA, to stand in for missing values. This is more or less the model I wish to follow.

- I believe that MATLAB treats float NaNs as missing values. I consider this an abuse of NaNs and I won't be supporting that :-P

- Spreadsheets such as Excel, OpenOffice and Gnumeric generally ignore blank cells, and give you a choice between ignoring text and treating it as zero. E.g. with cells set to [1, 2, "spam", 3], the AVERAGE function returns 2 and the AVERAGEA function returns 1.5.

- numpy uses masked arrays, which is probably overkill for my purposes; I am gratified to see it doesn't abuse NaNs:

    >>> import numpy as np
    >>> a = np.array([1, 2, float('nan'), 3])
    >>> np.mean(a)
    nan

  numpy also treats None as an error:

    >>> a = np.array([1, 2, None, 3])
    >>> np.mean(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/site-packages/numpy/core/fromnumeric.py", line 860, in mean
        return mean(axis, dtype, out)
    TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

I would appreciate any comments, advice or suggestions.

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list