On 06/06/17 18:08, Marc Girondot via R-help wrote:
This is a question at the border between stats and R.
When I fit a glm with many potential effects and select a model using
stepAIC, many independent variables are selected even if there is no
relationship between the dependent variable and the effects (all are random
numbers).
Does anyone have a solution to prevent this effect? Is it related to the
Bonferroni correction?
Is there a ratio of independent variables to number of observations that is
safe for stepAIC?
Thanks
Marc
Example code: when 2 independent variables are included, no effect is
selected; when 11 are included, 7 to 8 are selected.
library(MASS)  # provides stepAIC()

# Response and 11 candidate predictors, all independent random noise
x <- rnorm(15, 15, 2)
A <- rnorm(15, 20, 5)
B <- rnorm(15, 20, 5)
C <- rnorm(15, 20, 5)
D <- rnorm(15, 20, 5)
E <- rnorm(15, 20, 5)
F <- rnorm(15, 20, 5)
G <- rnorm(15, 20, 5)
H <- rnorm(15, 20, 5)
I <- rnorm(15, 20, 5)
J <- rnorm(15, 20, 5)
K <- rnorm(15, 20, 5)
df <- data.frame(x=x, A=A, B=B, C=C, D=D,
                 E=E, F=F, G=G, H=H, I=I, J=J,
                 K=K)

# Model with 2 candidate predictors
G1 <- glm(formula = x ~ A + B,
          data=df, family = gaussian(link = "identity"))
g1 <- stepAIC(G1)
summary(g1)

# Model with 11 candidate predictors
G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
          data=df, family = gaussian(link = "identity"))
g2 <- stepAIC(G2)
summary(g2)
IMHO there's nothing much that you can do about this. Trying to get the
data to select a model is always fraught with peril.
The phenomenon that you have observed has been remarked on before; see
Alan Miller's book "Subset Selection in Regression" (Chapman and Hall,
1990), page 12 (first paragraph of section 1.4).
However you might find some of Miller's recommendations to be at least a
*bit* useful.
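
For what it's worth, the size of the effect can be checked by simulation.
What follows is only a minimal sketch under stated assumptions, not code from
the original post: the helper count_selected() and the choice of 200
replications are made up for illustration, and the k = log(n) penalty
(BIC-style) is simply one stricter setting of stepAIC's k argument to compare
against, not something taken from Miller's book. As a rough guide, with the
default k = 2 a pure-noise term is kept whenever its likelihood-ratio
statistic exceeds 2, which asymptotically happens about 16% of the time per
term; with 15 observations and 11 candidates the asymptotics do not apply and
one would expect the retention rate to be higher.

```r
library(MASS)  # provides stepAIC()

# Hypothetical helper: fit the full Gaussian model on pure-noise data and
# return how many predictors remain after stepwise selection.
count_selected <- function(n_obs, n_pred, k = 2) {
  dat <- as.data.frame(matrix(rnorm(n_obs * n_pred, 20, 5), nrow = n_obs))
  dat$x <- rnorm(n_obs, 15, 2)            # response, unrelated to predictors
  full <- glm(x ~ ., data = dat, family = gaussian(link = "identity"))
  sel <- stepAIC(full, trace = 0, k = k)  # trace = 0 silences the step log
  length(attr(terms(sel), "term.labels"))
}

set.seed(1)   # arbitrary seed, only for reproducibility
n_rep <- 200  # assumed number of replications

# Mean number of noise predictors retained in each setting
res <- c(
  two_aic    = mean(replicate(n_rep, count_selected(15, 2))),
  eleven_aic = mean(replicate(n_rep, count_selected(15, 11))),
  eleven_bic = mean(replicate(n_rep, count_selected(15, 11, k = log(15))))
)
print(round(res, 2))
```

Comparing the three averages shows how the number of retained noise variables
grows with the number of candidates, and whether a heavier penalty helps much
at this sample size.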
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276