
On Tue, Aug 3, 2010 at 6:51 AM, <haenl...@gmail.com> wrote:

> I'm sorry -- I think I chose a bad example. Let me start over again:
> I want to estimate a moderated regression model of the following form:
> y = a*x1 + b*x2 + c*x1*x2 + e

No intercept? What's your null model, then?
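Why the null model matters: when the intercept is dropped, summary() measures R-squared against a total sum of squares taken around zero rather than around mean(y). A response with a non-zero mean can then show a large R-squared even when it is pure noise. This is a minimal sketch. The seed, the mean of 10 and the names d, m_int and m_noint are made up for illustration.

```r
set.seed(1)
d <- data.frame(x1 = rep(1:3, each = 3), x2 = rep(1:3, 3))
d$y <- 10 + rnorm(9)   # pure noise around a non-zero mean (illustrative)

m_int   <- lm(y ~ x1 * x2, data = d)      # intercept included
m_noint <- lm(y ~ x1 * x2 - 1, data = d)  # y = a*x1 + b*x2 + c*x1*x2 + e

r2_int   <- summary(m_int)$r.squared
r2_noint <- summary(m_noint)$r.squared

# summary.lm() measures R^2 against mean(y) with an intercept, against 0 without
rss <- function(m) sum(resid(m)^2)
c(r2_int,   1 - rss(m_int)   / sum((d$y - mean(d$y))^2))
c(r2_noint, 1 - rss(m_noint) / sum(d$y^2))
r2_noint   # close to 1, although y is unrelated to x1 and x2
```

So an R-squared from the no-intercept model is not comparable to one from a model with an intercept.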

> Based on my understanding, including an interaction term (x1*x2) in the
> regression in addition to x1 and x2 leads to issues of multicollinearity,
> as x1*x2 is likely to covary to some degree with x1 (and x2).

Is it possible you're confusing interaction with multicollinearity? You've
stated that x1 and x2 are weakly correlated; the product term is going to
be correlated with each of its constituent covariates, but unless that
correlation is above 0.9 (some say 0.95) in magnitude, multicollinearity is
not really a substantive issue. As others have suggested, if you're
concerned about multicollinearity, then fit the interaction model and use
the vif() function from package car or elsewhere to check for it.
Multicollinearity has to do with ill-conditioning in the model matrix;
interaction means that the response y is influenced by the product of the x1
and x2 covariates as well as by the individual covariates. They are not the same
thing. Perhaps an example will help.

Here's your x1 and x2 with a manufactured response:

df <- data.frame(x1 = rep(1:3, each = 3),
                  x2 = rep(1:3, 3))
df$y <- 0.5 + df$x1 + 1.2 * df$x2 + 2.5 * df$x1 * df$x2 + rnorm(9)
# Response is generated to produce a significant interaction
# (no seed is set, so your y values and the fit below will differ)
df
  x1 x2         y
1  1  1  5.968255
2  1  2  7.566212
3  1  3 13.420006
4  2  1  9.025791
5  2  2 16.382381
6  2  3 20.923113
7  3  1 11.669916
8  3  2 20.714224
9  3  3 31.757423

m1 <- lm(y ~ x1 * x2, data = df)
summary(m1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3642     2.6214   0.902  0.40846
x1           -0.1200     1.2135  -0.099  0.92505
x2            0.2549     1.2135   0.210  0.84193
x1:x2         3.1589     0.5617   5.624  0.00246 **
Residual standard error: 1.123 on 5 degrees of freedom
Multiple R-squared: 0.9882,     Adjusted R-squared: 0.9812
F-statistic: 139.9 on 3 and 5 DF,  p-value: 3.053e-05

# So the model has insignificant marginal covariate effects but a strong
# interaction effect. Next, the variance inflation factors:

library(car)
vif(m1)
   x1    x2 x1:x2
    7     7    13

# None of these is big enough to raise a red flag
# re multicollinearity. Let's look at the correlation
# matrix of the two covariates and their interaction.

with(df, cor(cbind(x1, x2, x1 * x2)))
          x1        x2
x1 1.0000000 0.0000000 0.6793662
x2 0.0000000 1.0000000 0.6793662
   0.6793662 0.6793662 1.0000000

The correlation of the interaction with the other two covariates is 0.68,
which is nowhere close to the 0.9 or above correlations that signal
potential multicollinearity.
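The VIFs of 7, 7 and 13 reported by vif() can be reproduced in base R, which also ties them to the correlation matrix above. For a model with an intercept, each VIF is a diagonal element of the inverse of the correlation matrix of the non-intercept columns, or equivalently 1 / (1 - R^2) from regressing that column on the others. A sketch using the same design (no response is needed, since VIFs depend only on the covariates); the names X, r2_aux and vif_aux are introduced here for illustration.

```r
df <- data.frame(x1 = rep(1:3, each = 3),
                 x2 = rep(1:3, 3))
X <- with(df, cbind(x1 = x1, x2 = x2, `x1:x2` = x1 * x2))

# VIF_j = j-th diagonal element of the inverse correlation matrix
vifs <- diag(solve(cor(X)))
round(vifs, 3)   # x1 = 7, x2 = 7, x1:x2 = 13

# Same thing via the definition: regress the product on the other columns
r2_aux  <- summary(lm(I(x1 * x2) ~ x1 + x2, data = df))$r.squared
vif_aux <- 1 / (1 - r2_aux)   # 13, the VIF for x1:x2
```

Because x1 and x2 are uncorrelated here, r2_aux is just the sum of the squared correlations of the product with x1 and x2 (2 * 0.679^2, about 0.92), which is where the 13 comes from.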


> recommendation I have seen in this context is to use mean centering, but
> apparently this does not solve the problem (see: Echambadi, Raj and James
> D. Hess (2007), "Mean-centering does not alleviate collinearity problems in
> moderated multiple regression models," Marketing Science, 26 (3), 438-445).
> So my question is: which R function can I use to estimate this type of
> model?
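On mean centering: centering x1 and x2 before forming the product is only a reparameterisation of the same model. The VIFs of the main-effect columns drop, but the fitted values, the residual variance, and the interaction estimate and its standard error do not change. This is in line with the title of the Echambadi and Hess paper cited above. A sketch with simulated data; the seed, coefficients and the names d, m_raw, m_cen and Xc are made up for illustration.

```r
set.seed(42)
d <- data.frame(x1 = rep(1:3, each = 3), x2 = rep(1:3, 3))
d$y <- 0.5 + d$x1 + 1.2 * d$x2 + 2.5 * d$x1 * d$x2 + rnorm(9)

# Centre the covariates before forming the product
d$x1c <- d$x1 - mean(d$x1)
d$x2c <- d$x2 - mean(d$x2)

m_raw <- lm(y ~ x1 * x2, data = d)
m_cen <- lm(y ~ x1c * x2c, data = d)

# Same fitted values: both models span the same column space
all.equal(fitted(m_raw), fitted(m_cen))

# Interaction estimate and its standard error are identical
coef(summary(m_raw))["x1:x2", 1:2]
coef(summary(m_cen))["x1c:x2c", 1:2]

# ...but the VIFs of the centred columns drop (to 1 in this balanced design)
Xc <- with(d, cbind(x1c, x2c, x1c * x2c))
diag(solve(cor(Xc)))
```

Only the main-effect coefficients change meaning: after centering they are the slopes of x1 and x2 at the mean of the other covariate.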

> Sorry for the confusion caused by my previous message,
> Michael
> On Aug 3, 2010 3:42pm, David Winsemius <dwinsem...@comcast.net> wrote:
> > I think you are attributing to "collinearity" a problem that is due to
> > your small sample size. You are predicting 9 points with 3 predictor
> > terms, and incorrectly concluding that there is some "inconsistency"
> > because you get an R^2 that is above some number you deem surprising. (I
> > got values between 0.2 and 0.4 on several runs.)
> > Try:
> > x1
> > x2
> > x3
> > y
> > model
> > summary(model)
> > # Multiple R-squared: 0.04269
> > --
> > David.
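David's point about sample size can be checked by simulation. With n = 9 observations and k = 3 predictor terms plus an intercept, a pure-noise response with normal errors has R-squared distributed as Beta(k/2, (n - k - 1)/2), with mean k/(n - 1) = 3/8 = 0.375. So values like those reported above are what chance alone produces. A sketch; the seed, the 5000 replicates and the names d and null_r2 are arbitrary choices.

```r
set.seed(123)
d <- data.frame(x1 = rep(1:3, each = 3), x2 = rep(1:3, 3))

# R^2 from regressing a pure-noise response on x1, x2 and x1:x2
null_r2 <- replicate(5000, {
  d$y <- rnorm(nrow(d))   # modifies a local copy of d in each replicate
  summary(lm(y ~ x1 * x2, data = d))$r.squared
})

mean(null_r2)                          # close to 3 / 8 = 0.375
quantile(null_r2, c(0.05, 0.5, 0.95))  # wide spread with only 5 residual df
```

Adjusted R-squared, or the overall F-test that summary() reports, accounts for this baseline.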
> > On Aug 3, 2010, at 9:10 AM, Michael Haenlein wrote:
> > Dear all,
> > I have one dependent variable y and two independent variables x1 and x2
> > which I would like to use to explain y. x1 and x2 are design factors in an
> > experiment and are not correlated with each other. For example assume
> > that:
> > x1
> > x2
> > cor(x1,x2)
> > The problem is that I do not only want to analyze the effect of x1 and x2 on
> > y but also of their interaction x1*x2. Evidently this interaction term has a
> > substantial correlation with both x1 and x2:
> > x3
> > cor(x1,x3)
> > cor(x2,x3)
> > I therefore expect that a simple regression of y on x1, x2 and x1*x2 will
> > lead to biased results due to multicollinearity. For example, even when y is
> > completely random and unrelated to x1 and x2, I obtain a substantial R2 for
> > a simple linear model which includes all three variables. This evidently
> > does not make sense:
> > y
> > model
> > summary(model)
> > Is there some function within R or in some separate library that allows me
> > to estimate such a regression without obtaining inconsistent results?
> > Thanks for your help in advance,
> > Michael
> > Michael Haenlein
> > Associate Professor of Marketing
> > ESCP Europe
> > Paris, France
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > David Winsemius, MD
> > West Hartford, CT
