On Aug 24, 2010, at 1:06 PM, Mike Williamson wrote:
Hello All,
Using the standard "summary" function in 'R', I ran across some odd
behavior that I cannot understand. Easy to reproduce:
Typing:
summary(c(6,207936))
Yields:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      6   51990  104000  104000  156000  207900
None of these values are correct except for the minimum. If I perform
"quantile(c(6, 207936))", it gives the correct values. I originally
presumed that summary was merely calling "quantile" if it saw a numeric,
but this doesn't seem to be the case.
I would have assumed as you did, and continue to think so with
appropriate modification of "merely" after reading the code in
summary.default:
    else if (is.numeric(object)) {
        nas <- is.na(object)
        object <- object[!nas]
        qq <- stats::quantile(object)
        qq <- signif(c(qq[1L:3L], mean(object), qq[4L:5L]), digits)
        names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.",
                       "Max.")
        if (any(nas))
            c(qq, `NA's` = sum(nas))
        else qq
    }
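Notice the digits argument: signif() rounds every value to that many
significant digits (four here, judging by the output), so 207936 is
displayed as 207900: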
> summary(c(6,207936))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      6   51990  104000  104000  156000  207900
> quantile(c(6,207936))
      0%      25%      50%      75%     100%
     6.0  51988.5 103971.0 155953.5 207936.0
> summary(c(6,207936), digits=6)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     6.0  51988.5 103971.0 103971.0 155954.0 207936.0
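
To make the rounding step explicit, here is a minimal sketch that mirrors
the numeric branch quoted above. The function name summary_sketch and its
default of digits = 4 are assumptions made for illustration (the default is
inferred from the output shown, not taken from the R sources); this is not
the code R itself runs.

```r
# Hypothetical helper mirroring the quoted numeric branch; not part of base R.
# The default digits = 4 is an assumption inferred from the output above.
summary_sketch <- function(x, digits = 4) {
  x <- x[!is.na(x)]                         # drop missing values first
  qq <- stats::quantile(x)                  # exact quartiles (default type)
  qq <- signif(c(qq[1:3], mean(x), qq[4:5]), digits)  # the rounding step
  names(qq) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  qq
}

x <- c(6, 207936)
rounded <- summary_sketch(x)                # four significant digits
exact   <- summary_sketch(x, digits = 7)    # enough digits to keep every value
print(rounded)
print(exact)
```

With four significant digits the maximum comes back as 207900; with seven,
all six values agree with quantile() and mean(). So the quartiles are
computed correctly and only rounded afterwards.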
Anyone know what's going on here? On a related note, what is the
statistically correct answer for calculating the 1st quartile & 3rd
quartile when only 2 values are present? I presume one takes the
mid-point between the median (also calculated) and the min or max. So in
this case, 51988.5 for 1st & 155953.5 for 3rd (which is what quantile
calculates). But taking 25% & 75% of the sum of the 2 also seems
"reasonable". Either way, "summary" is calculating the wrong number, and
most disturbing is that it mis-calculates the max.
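
The two candidate quartile rules mentioned above can be compared directly.
This is only a sketch: the variable names are made up for illustration, and
it covers just quantile()'s default definition (linear interpolation between
the order statistics). quantile() offers other definitions through its type
argument, so there is arguably no single "correct" answer.

```r
x <- c(6, 207936)
p <- c(0.25, 0.75)

# Default definition: linear interpolation between the order statistics,
# which for two values reduces to (1 - p) * min + p * max.
q_default <- quantile(x, probs = p, names = FALSE)
q_by_hand <- (1 - p) * min(x) + p * max(x)

# The alternative floated above: 25% and 75% of the sum of the two values.
q_sum <- p * sum(x)

print(q_default)
print(q_by_hand)
print(q_sum)
```

The two rules differ by (1 - 2p) * min(x), which is 3 in absolute value
here, so they coincide only when the minimum is 0.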
Regards,
David Winsemius, MD
West Hartford, CT