Re: [R] inconsistent lm results with fixed response variable

Rolf Turner Tue, 20 Jan 2009 13:03:00 -0800


Oh for Pete's sake!

Computers use floating point arithmetic. Your residual standarderror incase 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can'tquite seethat this is so. Put in a check for the RSE being 0, and ``over-ride'' theadjusted R squared to be NA (or NaN, or whatever floats your boat) insuch instances.

The all.equal() function might be useful to you:

> x <- 1.44e-14
> all.equal(x,0)
[1] TRUE

(Caution:  Trap for Young Players:  If x and y are ``really'' different,
then all.equal(x,y) doesn't return FALSE as you might expect, but rather

a description of the difference between x and y --- which may becomplicatedif x and y are complicated objects. The function isTRUE() is usefulhere.)


        cheers,

                Rolf Turner


On 21/01/2009, at 9:21 AM, tyler wrote:

Hi,
I'm analyzing a large number of simulations using lm(), a sample ofthe
resulting data is pasted below. In some simulations, the response
variable doesn't vary, ie:
tmp[[2]]$richness
[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 4040 40 40 40
When I analyze this using R version 2.8.0 (2008-10-20) on a linux
cluster, I get an appropriate result:


## begin R ##

summary(lm(richness ~ het, data = tmp[[2]]))

Call:
lm(formula = richness ~ het, data = tmp[[2]])

Residuals:
   Min     1Q Median     3Q    Max
     0      0      0      0      0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)       40          0     Inf   <2e-16 ***
het                0          0      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0 on 23 degrees of freedom
Multiple R-squared:      ,      Adjusted R-squared:
F-statistic:       on 1 and 23 DF,  p-value: NA

## end R ##

This is good, as when I extract the Adjusted R-squared and slope I get
NaN and 0, which are easily identified in my aggregate analysis, so I
can deal with them appropriately.

However, this isn't always the case:

## begin R ##

 tmp[[1]]$richness
[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 4040 40 40 40
[26] 40 40 40 40 40 40 40 40 40 40 40

 summary(lm(richness ~ het, data = tmp[[1]]))

Call:
lm(formula = richness ~ het, data = tmp[[1]])

Residuals:
       Min         1Q     Median         3Q        Max
-8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
het         1.495e-14  4.723e-14 3.160e-01    0.754
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.44e-14 on 34 degrees of freedom
Multiple R-squared: 0.5112,     Adjusted R-squared: 0.4968
F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07

## end R ##
This is a problem, as when I plot the adj. R sq as part of anaggregate
analysis of a large number of simulations, it appears to be a very
strong regression. I wouldn't have caught this except it was
exceptionally high for the simulation parameters. It also differs by
more than rounding error from the results with R 2.8.1 running on my
laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
Furthermore, on my laptop, none of the analyses produce a NaN adj.R sq,
even for data that do produce that result on the cluster.

Both my laptop and the linux cluster have na.action set to na.omit. Is
there something else I can do to ensure that lm() returns slope == 0
and adj.R.sq == NaN when the response variable is fixed?

Thanks for any suggestions,

Tyler

Data follows:

`tmp` <-
list(structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40), range = c(0.655084651733024, 0.579667533660137,0.433092220907644,0.62937198839679, 0.787891987978164, 0.623511540624239,0.542744487102066,0.905937570175433, 0.806802881350753, 0.680413208666325,0.873426339019084,0.699982832956593, 0.697716600618959, 0.952729864926405,0.782938474636578,1.03899695305995, 0.715075858219333, 0.579749205792549,1.20648999819246,0.648677938600964, 0.651883559714785, 0.997318331273967,0.926368116052012,0.91001274146868, 1.20737951037620, 1.12006560586723,1.09806272133903,0.9750792390176, 0.356496202035743, 0.612018080768747,0.701905693862144,0.735857916053381, 0.991787489781244, 1.07247435214078,0.60061903319766,
0.699733090379818), het = c(0.154538307084452, 0.143186508136608,
0.0690948358402777, 0.132337152911839, 0.169037344105692,0.117783183361602,0.117524251767612, 0.221161206774407, 0.204574928003633,0.170571000779693,0.204489357007294, 0.131749663515638, 0.154127894997213,0.232672587431942,0.198610891796736, 0.260497696582693, 0.129028191256682,0.128717975847452,0.254300896783617, 0.113546727236817, 0.142220347446853,0.24828642688332,0.194340945175726, 0.190782985783610, 0.214676796387244,0.252940213066992,0.22362832797347, 0.182423482989676, 0.0602332226418674,0.145400861749859,0.141297315445974, 0.139798699247632, 0.222815139716421,0.211971297234962,
0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40)), .Names = c("richness", "range", "het", "n.rich")),
 structure(list(richness = c(40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624,0.599708526308711,0.714477274087683, 0.892359682406808, 0.868440625159371,0.753239521511417,1.20164969658467, 1.20462111558583, 1.13142122690491,0.95241921975703,1.13214481653550, 0.827528954009827, 1.14827745443481,0.936048043180592,0.874649332193952, 1.38844778296649, 0.985016220913809,1.18166853164661,0.784679773255876, 0.94894149080785, 0.770312904574722,1.10203660758219,
1.15624067277321, 0.692776967548628, 0.79343712876973),
het = c(0.170481207967181,
0.108265674755723, 0.123316519598517, 0.220631611141464,0.160460967122565,0.145032358811883, 0.293678286125082, 0.284769842125969,0.258637372765782,0.18303781265474, 0.265304220319150, 0.194784967445680,0.248055723803990,0.204658616507612, 0.167203828355069, 0.287030735881294,0.247639113771915,0.269348295820692, 0.111409735752589, 0.209076579513581,0.176890183224181,0.249378876987384, 0.260323833307383, 0.177061093736427,0.172263958005774
), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c("richness",
"range", "het", "n.rich")))

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


######################################################################

Attention:This e-mail message is privileged and confidential. If you are not theintended recipient please delete the message and notify the sender.Any views or opinions presented are solely those of the author.

This e-mail has been scanned and cleared by MailMarshalwww.marshalsoftware.com

######################################################################

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] inconsistent lm results with fixed response variable

Reply via email to