Oh for Pete's sake!

Computers use floating point arithmetic. Your residual standard error in case 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can't quite see that this is so. Put in a check for the RSE being 0, and ``over- ride'' the adjusted R squared to be NA (or NaN, or whatever floats your boat) in such instances.
The all.equal() function might be useful to you:

> x <- 1.44e-14
> all.equal(x,0)
[1] TRUE

(Caution:  Trap for Young Players:  If x and y are ``really'' different,
then all.equal(x,y) doesn't return FALSE as you might expect, but rather
a description of the difference between x and y --- which may be complicated if x and y are complicated objects. The function isTRUE() is useful here.)

        cheers,

                Rolf Turner


On 21/01/2009, at 9:21 AM, tyler wrote:

Hi,

I'm analyzing a large number of simulations using lm(), a sample of the
resulting data is pasted below. In some simulations, the response
variable doesn't vary, ie:

tmp[[2]]$richness
[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40

When I analyze this using R version 2.8.0 (2008-10-20) on a linux
cluster, I get an appropriate result:


## begin R ##

summary(lm(richness ~ het, data = tmp[[2]]))

Call:
lm(formula = richness ~ het, data = tmp[[2]])

Residuals:
   Min     1Q Median     3Q    Max
     0      0      0      0      0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)       40          0     Inf   <2e-16 ***
het                0          0      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0 on 23 degrees of freedom
Multiple R-squared:      ,      Adjusted R-squared:
F-statistic:       on 1 and 23 DF,  p-value: NA

## end R ##

This is good, as when I extract the Adjusted R-squared and slope I get
NaN and 0, which are easily identified in my aggregate analysis, so I
can deal with them appropriately.

However, this isn't always the case:

## begin R ##

 tmp[[1]]$richness
[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
[26] 40 40 40 40 40 40 40 40 40 40 40

 summary(lm(richness ~ het, data = tmp[[1]]))

Call:
lm(formula = richness ~ het, data = tmp[[1]])

Residuals:
       Min         1Q     Median         3Q        Max
-8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
het         1.495e-14  4.723e-14 3.160e-01    0.754
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.44e-14 on 34 degrees of freedom
Multiple R-squared: 0.5112,     Adjusted R-squared: 0.4968
F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07

## end R ##

This is a problem, as when I plot the adj. R sq as part of an aggregate
analysis of a large number of simulations, it appears to be a very
strong regression. I wouldn't have caught this except it was
exceptionally high for the simulation parameters. It also differs by
more than rounding error from the results with R 2.8.1 running on my
laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
Furthermore, on my laptop, none of the analyses produce a NaN adj. R sq,
even for data that do produce that result on the cluster.

Both my laptop and the linux cluster have na.action set to na.omit. Is
there something else I can do to ensure that lm() returns slope == 0
and adj.R.sq == NaN when the response variable is fixed?

Thanks for any suggestions,

Tyler

Data follows:

`tmp` <-
list(structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40), range = c(0.655084651733024, 0.579667533660137, 0.433092220907644, 0.62937198839679, 0.787891987978164, 0.623511540624239, 0.542744487102066, 0.905937570175433, 0.806802881350753, 0.680413208666325, 0.873426339019084, 0.699982832956593, 0.697716600618959, 0.952729864926405, 0.782938474636578, 1.03899695305995, 0.715075858219333, 0.579749205792549, 1.20648999819246, 0.648677938600964, 0.651883559714785, 0.997318331273967, 0.926368116052012, 0.91001274146868, 1.20737951037620, 1.12006560586723, 1.09806272133903, 0.9750792390176, 0.356496202035743, 0.612018080768747, 0.701905693862144, 0.735857916053381, 0.991787489781244, 1.07247435214078, 0.60061903319766,
0.699733090379818), het = c(0.154538307084452, 0.143186508136608,
0.0690948358402777, 0.132337152911839, 0.169037344105692, 0.117783183361602, 0.117524251767612, 0.221161206774407, 0.204574928003633, 0.170571000779693, 0.204489357007294, 0.131749663515638, 0.154127894997213, 0.232672587431942, 0.198610891796736, 0.260497696582693, 0.129028191256682, 0.128717975847452, 0.254300896783617, 0.113546727236817, 0.142220347446853, 0.24828642688332, 0.194340945175726, 0.190782985783610, 0.214676796387244, 0.252940213066992, 0.22362832797347, 0.182423482989676, 0.0602332226418674, 0.145400861749859, 0.141297315445974, 0.139798699247632, 0.222815139716421, 0.211971297234962,
0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40)), .Names = c("richness", "range", "het", "n.rich")),
 structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624, 0.599708526308711, 0.714477274087683, 0.892359682406808, 0.868440625159371, 0.753239521511417, 1.20164969658467, 1.20462111558583, 1.13142122690491, 0.95241921975703, 1.13214481653550, 0.827528954009827, 1.14827745443481, 0.936048043180592, 0.874649332193952, 1.38844778296649, 0.985016220913809, 1.18166853164661, 0.784679773255876, 0.94894149080785, 0.770312904574722, 1.10203660758219,
1.15624067277321, 0.692776967548628, 0.79343712876973),
het = c(0.170481207967181,
0.108265674755723, 0.123316519598517, 0.220631611141464, 0.160460967122565, 0.145032358811883, 0.293678286125082, 0.284769842125969, 0.258637372765782, 0.18303781265474, 0.265304220319150, 0.194784967445680, 0.248055723803990, 0.204658616507612, 0.167203828355069, 0.287030735881294, 0.247639113771915, 0.269348295820692, 0.111409735752589, 0.209076579513581, 0.176890183224181, 0.249378876987384, 0.260323833307383, 0.177061093736427, 0.172263958005774
), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c ("richness",
"range", "het", "n.rich")))

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- guide.html
and provide commented, minimal, self-contained, reproducible code.

######################################################################
Attention: This e-mail message is privileged and confidential. If you are not the intended recipient please delete the message and notify the sender. Any views or opinions presented are solely those of the author.

This e-mail has been scanned and cleared by MailMarshal www.marshalsoftware.com
######################################################################

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to