On 11/02/2015 4:30 PM, Goldschneider, Jill wrote:
> I was playing with some examples of piecewise regression using lm() and have
> come across a behavior I'm uncertain about.
> Below is simple 3-segment dataset. I compare predicted output of a model
> created by one call to lm() to that of 3 models created by 3 calls to lm().
> In case A and B, the results are the same. However, in case C the results
> differ for the middle segment.
> Is the output of lm() for case C to be expected or not and if so, why?
Take a look at the fit value, and you'll see what happened:
> lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x)
Call:
lm(formula = y ~ (x <= 10) * x + I(x > 10 & x <= 20) + (x > 20) *
x)
Coefficients:
(Intercept) x <= 10TRUE
x I(x > 10 & x <= 20)TRUE
35.0741 -15.0733
-0.1644 -17.7344
x > 20TRUE x <= 10TRUE:x x:x > 20TRUE
NA 1.2397 -0.9658
The * in a formula means "main effect plus interaction", not
multiplication. W hat you want is
lm(y ~ I((x<=10)*x) + I(x>10 & x<=20) + I((x>20)*x))
This doesn't give exactly the same results as the segmented regression,
because it uses the same intercept in all three segments; you might want
to add I(x <= 10) as well if you don't want that.
Duncan Murdoch
> Thank you,
> Jill
>
> set.seed(133)
> y <- c(21:30, rep(15,10), 10:1) + runif(30, -2, 2)
> x <- 1:30
>
> # Case A: 3 segments, each fit with a constant value
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~1)), predict(lm(y[11:20]~1)),
> predict(lm(y[21:30]~1)))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ I(x<=10) + I(x>10 & x<=20) + I(x>20)))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
>
> # Case B: 3 segments, each fit with a line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~x[11:20])),
> predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + (x>10 & x<=20)*x + (x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
>
> # Cast C: 3 segments - the middle fit with a constant value, the outer by a
> line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~1)),
> predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> ## the single call to lm does not have a constant value fit to the middle
> section
> plot(x, p.c3-p.lm1 )
>
>
>
> The information contained in this transmission may contain confidential
> information. If the reader of this message is not the intended recipient,
> you are hereby notified that any review, dissemination, distribution or
> duplication of this communication is strictly prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy all
> copies of the original message.
>
> ______________________________________________
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.