On 7 January 2013 01:46, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 06 Jan 2013 19:44:08 +0000, Joseph L. Casale wrote:
>
>> I have a dataset that consists of a dict with text descriptions and
>> values that are integers. If required, I collect the values into a list
>> and create a numpy array running it through a simple routine:
>>
>> data[abs(data - mean(data)) < m * std(data)]
>>
>> where m is the number of std deviations to include.
>
> I'm not sure that this approach is statistically robust. No, let me be
> even more assertive: I'm sure that this approach is NOT statistically
> robust, and may be scientifically dubious.
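
For anyone following along, here is a minimal, self-contained sketch of the filter quoted above. The function name `clip_outliers`, the dict `readings`, and the choice `m=1.5` are my own illustrative assumptions, not the OP's code; only the masking expression itself comes from the original post.

```python
import numpy as np

def clip_outliers(data, m=2.0):
    """Keep only values strictly within m standard deviations of the mean."""
    data = np.asarray(data, dtype=float)
    # Boolean mask: True where the absolute deviation from the mean
    # is smaller than m population standard deviations (numpy's default).
    mask = np.abs(data - data.mean()) < m * data.std()
    return data[mask]

# Hypothetical dataset: text descriptions mapped to integer values.
readings = {"a": 10, "b": 12, "c": 11, "d": 13, "e": 100}
values = np.array(list(readings.values()))

kept = clip_outliers(values, m=1.5)
print(kept)
```

Note that the mean and standard deviation are computed from the full sample, including the very points being tested, so the result depends heavily on the choice of m and on the sample size.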

Whether this is "statistically robust" can't be judged without knowing more about the OP's intention. Thus far, the OP has not given any reason or motivation for excluding data, or even for having any data in the first place! It's hard to say whether any technique is really accurate or robust without knowing *anything* about the purpose of the operation.

Oscar