On 05/28/2011 12:54 PM, Ben Haller wrote:
1. Is my choice of glmnet() ok? On what basis should I choose glmnet() vs. lars()?
lars() fits linear regression models only; your outcome is binary, so glmnet() with family = "binomial" is the appropriate choice of the two.
2. Is the way I'm scaling the variables before calling glmnet() correct? Or should the squares themselves be centered and scaled?
3. Is my model matrix correct, or do I have a problem with the scale of the interaction variables?
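glmnet() standardizes every column of the model matrix itself by default (standardize = TRUE), including the squared and interaction columns, and reports the coefficients on the original scale. You do not need to center or scale anything beforehand.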
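As a rough sketch of what that looks like in practice: the data below are simulated and the names x1, x2, X and y are hypothetical stand-ins, not your actual variables. The raw (unscaled) model matrix is passed straight to glmnet(), and the value s = 0.01 is an arbitrary penalty chosen only for display.

```r
library(glmnet)

set.seed(1)
n <- 200

# Hypothetical predictors on deliberately different scales
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- runif(n, 0, 1)

# Hypothetical binary outcome
y <- rbinom(n, 1, plogis(-5 + 0.1 * x1))

# Raw model matrix with squares and the interaction; drop the intercept
# column because glmnet() adds its own intercept
X <- model.matrix(~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2)[, -1]

# standardize = TRUE is the default; it is spelled out here for clarity
fit <- glmnet(X, y, family = "binomial", standardize = TRUE)

# Coefficients are returned on the original scale of each column
coef(fit, s = 0.01)
```

In real use you would more likely pick the penalty with cv.glmnet() rather than fixing s by hand.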
4. Is it a problem that the lasso fit gives non-zero coefficients for interactions whose underlying terms have zero coefficients?
This is going to occur with any automated model selection procedure unless specifically disallowed.
5. Is there any way to choose a simple explanatory model, smaller than the best predictive model supported by the data, that is less arbitrary / subjective?
You have 5 variables, so variable selection is not really your goal. What you are trying to do is fit a curve (as opposed to a line) through your data, possibly along with interactions. I would suggest looking into splines, as provided for example by gam() in the mgcv package.
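A minimal sketch of the spline approach, under stated assumptions: the data are simulated, only two hypothetical predictors (x1, x2) are used instead of your five, and the choice of s() for main effects plus ti() for the interaction is one reasonable setup rather than the only one.

```r
library(mgcv)

set.seed(1)
n <- 300

# Hypothetical data: binary outcome with a curved dependence on x1 and x2
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- rbinom(n, 1, plogis(sin(2 * pi * d$x1) + d$x2^2 - 0.5))

# s() fits a smooth curve for each variable; ti() adds a smooth
# interaction on top of the two main-effect smooths
fit <- gam(y ~ s(x1) + s(x2) + ti(x1, x2),
           family = binomial, data = d, method = "REML")

# Effective degrees of freedom near 1 suggest a term is close to linear
summary(fit)
```

plot(fit, pages = 1) then draws the estimated curves, which is usually the easiest way to see whether a straight line would have been adequate.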
--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky