Re: [R] when to use "I", "as is" caret

David Winsemius Fri, 14 Sep 2012 07:52:17 -0700

On Sep 14, 2012, at 12:41 AM, agent dunham wrote:

> Dear community, 
> 
> I've checked this while working, but I just want to reassure myself. Let's say we have
> 2 models:
>
> model1 <- lm(vdep ~ log(v1) + v2 + v3 + I(v4^2), data = mydata)


If you want to create a second-degree polynomial for "proper" statistical
inference via a formula, the way forward is:

?poly
model1 <-  lm(vdep ~ log(v1) + v2 + v3 + poly(v4,2) , data = mydata)

You will get orthogonal polynomials, which differ from most people's
naive expectations, but they do allow you to fairly assess departures from
linearity.
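A minimal sketch of what "orthogonal" means here, using a toy vector x = 1:10 chosen only for illustration (it is not from the thread): the columns returned by poly() are orthonormal and orthogonal to the intercept, whereas the raw terms x and x^2 are strongly correlated with each other.

```r
x <- 1:10  # illustrative toy data (assumption)

# Orthogonal polynomial basis of degree 2 (poly's default, raw = FALSE)
P <- poly(x, 2)

# Columns are orthonormal: crossprod(P) is numerically the 2 x 2 identity
ortho <- unname(crossprod(P))
print(round(ortho, 10))

# Each column also sums to zero, i.e. it is orthogonal to the intercept column
print(round(colSums(P), 10))

# By contrast, the raw linear and quadratic terms are highly correlated
r_raw <- cor(x, x^2)
print(r_raw)
```

Because the orthogonal columns are uncorrelated, the test on the quadratic column does not disturb the estimate for the linear one, which is what makes the departure-from-linearity test clean.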

It's interesting to compare the two methods on the built-in cars dataset:

Proper use of poly():

> fm <- lm(dist ~ poly(speed, 2), data = cars)
> summary(fm)

Call:
lm(formula = dist ~ poly(speed, 2), data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.720  -9.184  -3.188   4.628  45.152 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       42.980      2.146  20.026  < 2e-16 ***
poly(speed, 2)1  145.552     15.176   9.591 1.21e-12 ***
poly(speed, 2)2   22.996     15.176   1.515    0.136    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673,     Adjusted R-squared: 0.6532 
F-statistic: 47.14 on 2 and 47 DF,  p-value: 5.852e-12 

Improper use of linear and "I-quadratic":

> fm2 <- lm(dist ~ speed+I(speed^2), data = cars)
> summary(fm2)

Call:
lm(formula = dist ~ speed + I(speed^2), data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.720  -9.184  -3.188   4.628  45.152 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.47014   14.81716   0.167    0.868
speed        0.91329    2.03422   0.449    0.656
I(speed^2)   0.09996    0.06597   1.515    0.136

Residual standard error: 15.18 on 47 degrees of freedom
Multiple R-squared: 0.6673,     Adjusted R-squared: 0.6532 
F-statistic: 47.14 on 2 and 47 DF,  p-value: 5.852e-12 
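A short sketch that checks programmatically what the two summaries above show: both parametrizations span the same column space, so the fitted values, residuals, R-squared and F statistic coincide, and the t statistic for the quadratic term (1.515) is the same. Only the lower-order coefficients change meaning, partly because raw speed and speed^2 are strongly correlated, which inflates their standard errors.

```r
fm  <- lm(dist ~ poly(speed, 2), data = cars)
fm2 <- lm(dist ~ speed + I(speed^2), data = cars)

# Same column space, so the fitted values are identical
same_fit <- isTRUE(all.equal(fitted(fm), fitted(fm2)))
print(same_fit)

# The t statistic for the highest-degree term agrees in both fits
t_poly <- coef(summary(fm))[3, "t value"]
t_raw  <- coef(summary(fm2))[3, "t value"]
print(c(poly = t_poly, raw = t_raw))

# Correlation between the raw linear and quadratic predictors
print(cor(cars$speed, cars$speed^2))
```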

#---------

If you wanted the same values as you would get from I(v4^2) but produced with
poly(), you would use raw=TRUE and take the second column:

(z <- poly(1:10, 2, raw=TRUE)[,2])
 [1]   1   4   9  16  25  36  49  64  81 100
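A minimal check of what raw=TRUE returns, again on the toy vector 1:10 (an illustrative assumption): no orthogonalization is applied, so column 1 is x itself and column 2 is x^2, exactly what I(x^2) would produce.

```r
x  <- 1:10
rp <- poly(x, 2, raw = TRUE)

# Column 1 is the linear term, column 2 the plain square
col1_ok <- all(rp[, 1] == x)
col2_ok <- all(rp[, 2] == x^2)
print(c(col1_ok, col2_ok))
```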

I didn't know offhand whether one could use a single raw-poly column within an
lm formula, but it seems to work as I expected:

> fm <- lm(dist ~ I(speed^2), data = cars)
> fm

Call:
lm(formula = dist ~ I(speed^2), data = cars)

Coefficients:
(Intercept)   I(speed^2)  
      8.860        0.129  

> fm <- lm(dist ~ poly(speed, 2, raw=TRUE)[,2], data = cars)
> fm

Call:
lm(formula = dist ~ poly(speed, 2, raw = TRUE)[, 2], data = cars)

Coefficients:
                    (Intercept)  poly(speed, 2, raw = TRUE)[, 2]  
                          8.860                            0.129  
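The same equivalence holds for the full quadratic: a sketch showing that the raw poly() basis reproduces the linear-plus-I(speed^2) fit from earlier. Only the coefficient names differ, which is why they are stripped before comparing.

```r
fm2 <- lm(dist ~ speed + I(speed^2), data = cars)
fm3 <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)

# Coefficient names differ, but the estimates are the same
same_coef <- isTRUE(all.equal(unname(coef(fm2)), unname(coef(fm3))))
print(same_coef)
```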


(And Uwe's answer covers the rest.)

> model2 <-   lm(vdep ~ log(v1) + v2 + v3 + v4^2, data = mydata)
> 
> So in model1 you really square v4, but in model2 v4^2 doesn't do
> anything, does it? Model2 could be rewritten as
> model2b <- lm(vdep ~ log(v1) + v2 + v3 + v4, data = mydata)
> and nothing changes, does it?

> 
> This "I" (as-is) function is essential for powers, or when including
> transformations such as I(1/(v2+v3)), but not for a log transformation, is it?
> Are there any other transformations where I must also use I()?
> 
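A short sketch of how R's formula parser treats the cases asked about above. Within a formula, the operators +, -, *, /, :, ^ and %in% have model-building meanings (see ?formula), so arithmetic that uses them must be wrapped in I(); ordinary function calls such as log() are evaluated as written and form a single term. The variable names y, x, a and b are placeholders used only to inspect term labels.

```r
# Inside a formula, ^ means crossing, so x^2 expands to x*x = x (no squaring)
lab_caret <- attr(terms(y ~ x^2), "term.labels")          # "x"
lab_cross <- attr(terms(y ~ (a + b)^2), "term.labels")    # "a" "b" "a:b"

# I() protects arithmetic so it is evaluated as ordinary R code
lab_I_pow <- attr(terms(y ~ I(x^2)), "term.labels")       # "I(x^2)"
lab_plus  <- attr(terms(y ~ a + b), "term.labels")        # "a" "b"
lab_I_sum <- attr(terms(y ~ I(a + b)), "term.labels")     # "I(a + b)"

# A plain function call such as log() needs no I()
lab_log <- attr(terms(y ~ log(x)), "term.labels")         # "log(x)"

# On real data, dist ~ speed^2 fits exactly the same model as dist ~ speed
fit_caret  <- lm(dist ~ speed^2, data = cars)
fit_linear <- lm(dist ~ speed, data = cars)
print(all.equal(coef(fit_caret), coef(fit_linear)))       # TRUE
```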

David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
