I have two questions about factors in regression.

Q1. In a table there are a few categorical (factor) variables and a few numeric variables, and the response variable is numeric. Some factors are important and others are not. How can I determine which categorical variables are significant for the response?
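For Q1, one approach I am aware of is to test each factor as a whole with an F test, rather than reading the per-level t tests in summary(). The following is only a sketch: the data frame dat and the variables f1, f2, x1 and y are made up for illustration, and f1 is built to matter while f2 is pure noise. drop1() with test = "F" drops each term in turn and compares against the full model; anova() on two nested fits gives the same test for a single factor.

```r
set.seed(42)
n <- 120

## Hypothetical data: f1 shifts the response, f2 has no effect at all
dat <- data.frame(
  f1 = gl(3, 40, labels = c("a", "b", "c")),
  f2 = factor(sample(c("u", "v", "w"), n, replace = TRUE)),
  x1 = rnorm(n)
)
shift <- c(0, 2, 4)                       # assumed effects of levels a, b, c
dat$y <- 1 + 0.5 * dat$x1 + shift[as.integer(dat$f1)] + rnorm(n)

## Full model with both factors and the numeric predictor
fit <- lm(y ~ x1 + f1 + f2, data = dat)

## One F test per term: each factor is tested as a whole (all levels at once)
tab <- drop1(fit, test = "F")
print(tab)

## The same test for f2 alone, written as a comparison of nested models
fit.no.f2 <- lm(y ~ x1 + f1, data = dat)
cmp <- anova(fit.no.f2, fit)
print(cmp)
```

The "Pr(>F)" column of the drop1() table holds one p-value per term, so a factor with many levels still gets a single test.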
Q2. As we know, lm can handle categorical variables. I thought that when there is a categorical predictor we can use lm directly, without quantifying the factor, and that assigning different numeric values to the factor levels would not change the fit, as shown here:

x <- 1:20                                          ## numeric predictor
yes.no <- c("yes", "no")
factors <- gl(2, 10, 20, yes.no)                   ## factor predictor
factors.quant <- rep(c(18.8, 29.9), c(10, 10))     ## quantification of the factor
factors.quant.1 <- rep(c(16.9, 38.9), c(10, 10))   ## second quantification of the factor
response <- 0.8*x + 18 + factors.quant + rnorm(20) ## response

lm.quant <- lm(response ~ x + factors.quant)       ## lm with first quantification
lm.fact <- lm(response ~ x + factors)              ## lm with factor
lm.quant.1 <- lm(response ~ x + factors.quant.1)   ## lm with second quantification
lm.fact.1 <- lm(response ~ x + factors)            ## lm with factor (same model as lm.fact)

par(mfrow = c(2, 2))                               ## comparison of the fits
plot(x, response)
lines(x, fitted(lm.quant), col = "blue")
grid()
plot(x, response)
lines(x, fitted(lm.fact), col = "red")
grid()
plot(x, response)
lines(x, fitted(lm.quant.1), lty = 2, col = "blue")
grid()
plot(x, response)
lines(x, fitted(lm.fact.1), lty = 2, col = "red")
grid()
par(mfrow = c(1, 1))

So, is it right that we can assign any numeric values to the factor levels, for example c(yes, no) = c(18.8, 29.9) or c(16.9, 38.9) as above, before calling lm, glm, aov, or even nls? Please drop me a few lines and/or point me to some references.

Thanks,
-james

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
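
P.S. A numerical check of the Q2 claim, as a sketch rather than a full answer. It compares fitted values with all.equal() instead of by eye. The names f, q1, q2, g, x3 and y3 are introduced here only for illustration, and the three-level part uses made-up level effects. As far as I can tell, the fits agree in the two-level case because any two distinct numeric codes span the same column space as the 0/1 dummy variable plus the intercept; only the coefficients change. With three or more levels a single numeric column forces the level effects onto a straight line, so the fit generally differs from the factor fit.

```r
set.seed(1)

## Two-level case, same setup as in the post
x  <- 1:20
f  <- gl(2, 10, 20, labels = c("yes", "no"))
q1 <- rep(c(18.8, 29.9), c(10, 10))       # first numeric coding
q2 <- rep(c(16.9, 38.9), c(10, 10))       # second numeric coding
y  <- 0.8 * x + 18 + q1 + rnorm(20)

fit.f  <- lm(y ~ x + f)
fit.q1 <- lm(y ~ x + q1)
fit.q2 <- lm(y ~ x + q2)

## Fitted values should agree up to rounding error ...
same.fit <- isTRUE(all.equal(fitted(fit.f), fitted(fit.q1))) &&
            isTRUE(all.equal(fitted(fit.f), fitted(fit.q2)))
print(same.fit)

## ... while intercept and group coefficient depend on the coding
cf <- rbind(factor = unname(coef(fit.f)),
            q1     = unname(coef(fit.q1)),
            q2     = unname(coef(fit.q2)))
colnames(cf) <- c("intercept", "x", "group")
print(cf)

## Three-level case with hypothetical, non-linear level effects (0, 5, 1)
g  <- gl(3, 10, 30, labels = c("a", "b", "c"))
x3 <- rep(1:10, 3)
y3 <- 0.8 * x3 + c(0, 5, 1)[as.integer(g)] + rnorm(30)

fit.g <- lm(y3 ~ x3 + g)                  # factor: two dummy columns
fit.s <- lm(y3 ~ x3 + as.integer(g))      # numeric score 1, 2, 3: one column
same.fit3 <- isTRUE(all.equal(fitted(fit.g), fitted(fit.s)))
print(same.fit3)
```

Here same.fit reports whether the two-level fits coincide and same.fit3 does the same for the three-level comparison; cf shows how the coefficients move with the coding even when the fitted values do not.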