On 2010-08-24 11:06, Mike Williamson wrote:
Hello All,
Using the standard "summary" function in 'R', I ran across some odd
behavior that I cannot understand. Easy to reproduce:
Typing:
summary(c(6,207936))
Yields:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      6   51990  104000  104000  156000  207900
None of these values are correct except for the minimum. If I perform
"quantile(c(6, 207936))", it gives the correct values. I originally
presumed that summary was merely calling "quantile" if it saw a numeric, but
this doesn't seem to be the case.
Anyone know what's going on here? On a related note, what is the
statistically correct answer for calculating the 1st quartile & 3rd quartile
when only 2 values are present? I presume one takes the mid-point between
the median (also calculated) and the min or max. So in this case, 51988.5
for 1st & 155953.5 for 3rd (which is what quantile calculates). But taking
25% & 75% of the sum of the 2 also seems "reasonable". Either way,
"summary" is calculating the wrong number, and most disturbing is that it
mis-calculates the max.
Regards,
Mike
This is one of those (many) situations where reading the help pages
really helps nicely:
help(summary) points you to the 'digits' argument (as David has said)
and that probably defaults to 'digits=4' for you, so the values are
displayed rounded to 4 significant digits (207936 is shown as 207900).
So, no, R is not miscalculating anything.
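
A minimal sketch of the rounding explanation, assuming the display behaves like signif() with 4 significant digits (the variable names are only illustrative, and newer R versions may format the summary differently):

```r
# Exact statistics for the two values from the original question.
x <- c(6, 207936)
q <- quantile(x)                      # default type = 7
exact <- c(q[1:3], mean(x), q[4:5])
names(exact) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")

# Assumption: the display rounds each value to 4 significant digits.
rounded <- signif(exact, 4)

print(exact)    # unrounded values, matching quantile() and mean()
print(rounded)  # rounded values, matching the summary output quoted above
```

If that assumption holds, asking for more digits, e.g. summary(x, digits = 7), should display the unrounded quartiles and maximum.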
help(quantile) shows that there are quite a few ways to define
quantiles and that R defaults to 'type=7'.
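
To see how much the definition matters with only two values, here is a small sketch that tabulates the 1st and 3rd quartiles for each of the nine 'type' values, alongside the linear interpolation between min and max that type 7 reduces to when n = 2 (the names by_type and manual are only illustrative):

```r
x <- c(6, 207936)
probs <- c(0.25, 0.75)

# One row per quantile definition, one column per requested probability.
by_type <- t(sapply(1:9, function(k) quantile(x, probs = probs, type = k)))
rownames(by_type) <- paste0("type", 1:9)
print(by_type)

# For n = 2, type 7 interpolates linearly between the two values:
# Q(p) = x[1] + p * (x[2] - x[1])
manual <- x[1] + probs * diff(x)
print(manual)
```

The "type7" row should agree with the manual interpolation, which is the mid-point rule described in the question.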
-Peter Ehlers
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.