On Wed, Dec 31, 2014 at 10:28:10AM +0100, John Darrington wrote:
> On Tue, Dec 30, 2014 at 04:58:48PM -0600, Alan Mead wrote:
>> or GNU/Linux.
>>
>> Regarding the actual algorithm, the boxplot I get from SPSS is attached as "boxplot2.png". I think it's a lot more reasonable (albeit uglier). The main difference is that the SPSS boxplot has short whiskers, while PSPP's boxplot whiskers seem to include the entire range of the data (including the outlier).
>>
>> In the physio dataset, apparently there are some outliers like 30 mm for a human height. That's the kind of thing that boxplots are supposed to help you find.
>>
>> Maybe that's a bug in PSPP that the whisker length is just wrong? Otherwise I think it would make more sense to limit the whiskers to some reasonable value like 1.5 times the inter-quartile range (or to the highest and lowest values that are within 1.5 times the inter-quartile range).
>
> Here is what SPSS has to say about boxplots:
>
> The boundaries of the box are Tukey's hinges. The length of the box is the interquartile range based on Tukey's hinges. That is,
>
>     IQR = Q_3 - Q_1
>
> Define
>
>     STEP = 1.5 * IQR
>
> A case is an outlier if
>
>     Q_3 + STEP < y < Q_3 + 2 * STEP
>     or
>     Q_1 - 2 * STEP < y < Q_1 - STEP
>
> A case is an extreme if
>
>     y >= Q_3 + 2 * STEP
>     or
>     y <= Q_1 - 2 * STEP
>
> Note that it doesn't actually say where the whiskers should be. However, it seems that PSPP is placing the lower whisker at the lowest value y of the dataset for which y < Q_1 - STEP, and the upper whisker at the highest value y for which y < Q_3 + STEP.
>
> I vaguely remember reading this recommendation in the literature. If someone can reference any better recommendations, then we can consider implementing that instead.
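For anyone following along, the quoted SPSS rules can be sketched in Python roughly as below. This is only an illustration, not code from PSPP or SPSS. The function name `classify` and the label strings are made up for the example, and it assumes the lower-side outlier band is measured from Q_1, mirroring the upper side.

```python
def classify(y, q1, q3):
    """Label a value using the quoted rules.

    q1 and q3 are assumed to be Tukey's hinges, computed elsewhere.
    The name and the returned labels are hypothetical.
    """
    step = 1.5 * (q3 - q1)  # STEP = 1.5 * IQR

    # Extreme: at least 2 * STEP beyond either hinge.
    if y >= q3 + 2 * step or y <= q1 - 2 * step:
        return "extreme"

    # Outlier: more than STEP, but less than 2 * STEP, beyond a hinge.
    # (Assumption: the lower band is Q_1 - 2*STEP < y < Q_1 - STEP.)
    if y > q3 + step or y < q1 - step:
        return "outlier"

    return "normal"
```

With hinges of 10 and 20, STEP is 15, so values strictly between 35 and 50 are outliers and values of 50 or more are extremes.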
Most other implementations seem to have the whiskers extend to the most extreme points of the dataset which are not themselves outliers. So I pushed a change so that boxplots in PSPP do that too.

J'

--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
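As a rough illustration of the whisker rule described above (whiskers at the most extreme data points that are not outliers), here is a minimal Python sketch. It is not the code that was pushed to PSPP. The function names are hypothetical, and the hinges are computed with one common definition of Tukey's hinges (the median of each half of the sorted data, with the overall median included in both halves when the count is odd), which may differ in detail from what PSPP does.

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, upper hinge) of a non-empty sequence."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # includes the median itself when n is odd
    return median(xs[:half]), median(xs[n - half:])


def whiskers(values):
    """Return the lowest and highest values that are not outliers.

    A value counts as an outlier here if it lies more than
    STEP = 1.5 * IQR beyond a hinge.
    """
    q1, q3 = tukey_hinges(values)
    step = 1.5 * (q3 - q1)
    inside = [y for y in values if q1 - step <= y <= q3 + step]
    return min(inside), max(inside)
```

For example, with the made-up heights `[150, 155, 160, 165, 170, 175, 180, 30]`, the hinges are 152.5 and 172.5, STEP is 30, and `whiskers` returns `(150, 180)`, so the stray value 30 falls outside the whiskers instead of stretching them.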
_______________________________________________
Pspp-users mailing list
Pspp-users@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users