On Tue, Dec 30, 2014 at 04:58:48PM -0600, Alan Mead wrote:

> John and Harry,
>
> PSPPIRE.exe 0.8.4-g5ce6b1 (and the associated PSPP) is pretty wonky. Paste doesn't work correctly (I believe that's a known issue?). To paste syntax into the syntax window, I had to right-click and choose paste; neither Control-v nor Edit > Paste would paste the text. And Run > Current line wasn't working (but now I cannot replicate it; I'm guessing it has to do with where the cursor was rather than which line of the syntax window was "shaded"). What was most annoying is that by "not working" I mean that the output window would flash but there was no additional output in the window and no warning or error.

I don't know why this should be. Perhaps Harry can shed some light on it.

> And then look at the boxplot I got when I ran Robin's syntax on the physio data (attached "boxplot.png")... I don't use examine and I don't know what " /STATISTICS = EXTREME (3)" is meant to do, but I know what a boxplot is and there shouldn't be values like 9999999 between 1200 and 200 on the y-axis.

[ EXTREME (3) reports the largest and smallest three values of the variable ]

Regarding the 99999 issue, I certainly don't get that on GNU/Linux. My guess is that Windows has rounding issues and is miscalculating 300 as 299.99999999999999 (the left-hand side is off the page). As you say, Windows is somewhat wonky. That is one reason why I don't regularly use it. Note that whilst we try to support PSPP under Windows (and Harry has done an excellent job making his binaries available), the recommended platform is GNU or GNU/Linux.

> Regarding the actual algorithm, the boxplot I get from SPSS is attached as "boxplot2.png". I think it's a lot more reasonable (albeit uglier). The main difference is that the SPSS boxplot has short whiskers, while PSPP's boxplot whiskers seem to include the entire range of the data (including the outlier). In the physio dataset, apparently there are some outliers like 30 mm for a human height. That's the kind of thing that boxplots are supposed to help you find. Maybe that's a bug in PSPP that the whisker length is just wrong? Otherwise I think it would make more sense to limit the whiskers to some reasonable value like 1.5 times the inter-quartile range (or to the highest and lowest values that are within 1.5 times the inter-quartile range).

Here is what SPSS has to say about boxplots:

The boundaries of the box are Tukey's hinges. The length of the box is the interquartile range based on Tukey's hinges.
That is,

    IQR = Q_3 - Q_1

Define

    STEP = 1.5 * IQR

A case is an outlier if

    Q_3 + STEP < y < Q_3 + 2 * STEP
or
    Q_1 - 2 * STEP < y < Q_1 - STEP

A case is an extreme if

    y >= Q_3 + 2 * STEP
or
    y <= Q_1 - 2 * STEP

Note that it doesn't actually say where the whiskers should be. However, it seems that PSPP is placing the lower whisker at the lowest value y of the dataset for which y < Q_1 - STEP, and the upper whisker at the highest value y for which y < Q_3 + STEP. I vaguely remember reading this recommendation in the literature. If someone can reference any better recommendations, then we can consider implementing that instead.

J'

-- 
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
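
For anyone who wants to check a dataset by hand, the SPSS rules quoted in this message can be sketched in a few lines of Python. This is only an illustration, not PSPP's or SPSS's code: the function names and the heights are made up, and the whisker placement uses the common convention of stopping at the most extreme values that lie within one STEP of the hinges, which is an assumption and not something the SPSS text specifies.

```python
from statistics import median


def tukey_hinges(values):
    """Return (lower hinge, upper hinge).

    Each hinge is the median of one half of the sorted data; when the
    number of cases is odd, the overall median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def boxplot_summary(values):
    """Classify cases using the outlier/extreme rules quoted from SPSS."""
    q1, q3 = tukey_hinges(values)
    step = 1.5 * (q3 - q1)  # STEP = 1.5 * IQR

    outliers = [y for y in values
                if q3 + step < y < q3 + 2 * step
                or q1 - 2 * step < y < q1 - step]
    extremes = [y for y in values
                if y >= q3 + 2 * step or y <= q1 - 2 * step]

    # Assumption: whiskers end at the most extreme cases that are
    # neither outliers nor extremes.
    inside = [y for y in values if q1 - step <= y <= q3 + step]

    return {
        "q1": q1,
        "q3": q3,
        "step": step,
        "outliers": sorted(outliers),
        "extremes": sorted(extremes),
        "whiskers": (min(inside), max(inside)),
    }


# Hypothetical heights in mm, including a 30 mm data-entry error.
heights_mm = [1600, 30, 1500, 1200, 1550, 1620, 1650, 1700, 1750, 1800, 1810]
summary = boxplot_summary(heights_mm)
print(summary)
```

With these made-up heights the hinges are 1525 and 1725, so STEP is 300; the 30 mm case is flagged as an extreme, 1200 as an outlier, and the whiskers stop at 1500 and 1810 instead of stretching down to 30.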