Hi,
I'm analyzing a large number of simulations using lm(), a sample of
the
resulting data is pasted below. In some simulations, the response
variable doesn't vary, ie:
tmp[[2]]$richness
[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
40 40 40 40
When I analyze this using R version 2.8.0 (2008-10-20) on a linux
cluster, I get an appropriate result:
## begin R ##
summary(lm(richness ~ het, data = tmp[[2]]))
Call:
lm(formula = richness ~ het, data = tmp[[2]])
Residuals:
Min 1Q Median 3Q Max
0 0 0 0 0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40 0 Inf <2e-16 ***
het 0 0 NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0 on 23 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 23 DF, p-value: NA
## end R ##
This is good, as when I extract the Adjusted R-squared and slope I get
NaN and 0, which are easily identified in my aggregate analysis, so I
can deal with them appropriately.
However, this isn't always the case:
## begin R ##
tmp[[1]]$richness
[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
40 40 40 40
[26] 40 40 40 40 40 40 40 40 40 40 40
summary(lm(richness ~ het, data = tmp[[1]]))
Call:
lm(formula = richness ~ het, data = tmp[[1]])
Residuals:
Min 1Q Median 3Q Max
-8.265e-14 1.689e-15 2.384e-15 2.946e-15 4.022e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.000e+01 8.418e-15 4.752e+15 <2e-16 ***
het 1.495e-14 4.723e-14 3.160e-01 0.754
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.44e-14 on 34 degrees of freedom
Multiple R-squared: 0.5112, Adjusted R-squared: 0.4968
F-statistic: 35.56 on 1 and 34 DF, p-value: 9.609e-07
## end R ##
This is a problem, as when I plot the adj. R sq as part of an
aggregate
analysis of a large number of simulations, it appears to be a very
strong regression. I wouldn't have caught this except it was
exceptionally high for the simulation parameters. It also differs by
more than rounding error from the results with R 2.8.1 running on my
laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
Furthermore, on my laptop, none of the analyses produce a NaN adj.
R sq,
even for data that do produce that result on the cluster.
Both my laptop and the linux cluster have na.action set to na.omit. Is
there something else I can do to ensure that lm() returns slope == 0
and adj.R.sq == NaN when the response variable is fixed?
Thanks for any suggestions,
Tyler
Data follows:
`tmp` <-
list(structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40), range = c(0.655084651733024, 0.579667533660137,
0.433092220907644,
0.62937198839679, 0.787891987978164, 0.623511540624239,
0.542744487102066,
0.905937570175433, 0.806802881350753, 0.680413208666325,
0.873426339019084,
0.699982832956593, 0.697716600618959, 0.952729864926405,
0.782938474636578,
1.03899695305995, 0.715075858219333, 0.579749205792549,
1.20648999819246,
0.648677938600964, 0.651883559714785, 0.997318331273967,
0.926368116052012,
0.91001274146868, 1.20737951037620, 1.12006560586723,
1.09806272133903,
0.9750792390176, 0.356496202035743, 0.612018080768747,
0.701905693862144,
0.735857916053381, 0.991787489781244, 1.07247435214078,
0.60061903319766,
0.699733090379818), het = c(0.154538307084452, 0.143186508136608,
0.0690948358402777, 0.132337152911839, 0.169037344105692,
0.117783183361602,
0.117524251767612, 0.221161206774407, 0.204574928003633,
0.170571000779693,
0.204489357007294, 0.131749663515638, 0.154127894997213,
0.232672587431942,
0.198610891796736, 0.260497696582693, 0.129028191256682,
0.128717975847452,
0.254300896783617, 0.113546727236817, 0.142220347446853,
0.24828642688332,
0.194340945175726, 0.190782985783610, 0.214676796387244,
0.252940213066992,
0.22362832797347, 0.182423482989676, 0.0602332226418674,
0.145400861749859,
0.141297315445974, 0.139798699247632, 0.222815139716421,
0.211971297234962,
0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40)), .Names = c("richness", "range", "het", "n.rich")),
structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624,
0.599708526308711,
0.714477274087683, 0.892359682406808, 0.868440625159371,
0.753239521511417,
1.20164969658467, 1.20462111558583, 1.13142122690491,
0.95241921975703,
1.13214481653550, 0.827528954009827, 1.14827745443481,
0.936048043180592,
0.874649332193952, 1.38844778296649, 0.985016220913809,
1.18166853164661,
0.784679773255876, 0.94894149080785, 0.770312904574722,
1.10203660758219,
1.15624067277321, 0.692776967548628, 0.79343712876973),
het = c(0.170481207967181,
0.108265674755723, 0.123316519598517, 0.220631611141464,
0.160460967122565,
0.145032358811883, 0.293678286125082, 0.284769842125969,
0.258637372765782,
0.18303781265474, 0.265304220319150, 0.194784967445680,
0.248055723803990,
0.204658616507612, 0.167203828355069, 0.287030735881294,
0.247639113771915,
0.269348295820692, 0.111409735752589, 0.209076579513581,
0.176890183224181,
0.249378876987384, 0.260323833307383, 0.177061093736427,
0.172263958005774
), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c
("richness",
"range", "het", "n.rich")))
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.