On 05/28/2011 12:54 PM, Ben Haller wrote:
> 1. Is my choice of glmnet() ok?  On what basis should I choose
> glmnet() vs. lars()?

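lars() fits linear regression models only. Your outcome is binary, so glmnet(), which supports logistic regression via family = "binomial", is the appropriate choice.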

> 2. Is the way I'm scaling the variables before calling glmnet()
> correct?  Or should the squares themselves be centered and scaled?
>
> 3. Is my model matrix correct, or do I have a problem with the scale
> of the interaction variables?

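glmnet() standardizes the columns of the model matrix itself by default (standardize = TRUE) and reports the coefficients on the original scale.  You do not need to do so, and that applies to the squared and interaction columns as well.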
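
A minimal sketch of what this looks like in practice, assuming a binary outcome and two made-up predictors (x1, x2 and the simulated data are hypothetical, not your variables). The model matrix is passed to glmnet() unscaled, and the internal standardization is left to do its job:

```r
library(glmnet)

set.seed(1)
n <- 200
# Hypothetical predictors on very different scales
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 0, sd = 0.1)
y  <- rbinom(n, 1, plogis(0.08 * (x1 - 50) + 5 * x2))

# Model matrix with squares and the interaction, left unscaled.
# The intercept column is dropped because glmnet() adds its own.
X <- model.matrix(~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2)[, -1]

# standardize = TRUE is the default; it is spelled out here for emphasis
fit <- glmnet(X, y, family = "binomial", standardize = TRUE)

# Coefficients at an arbitrarily chosen lambda, on the original scale of X
coef(fit, s = 0.01)
```

The value s = 0.01 is only for illustration; in a real analysis cv.glmnet() would normally be used to pick lambda.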

> 4. Is it a problem that the lasso fit gives non-zero coefficients for
> interactions whose underlying terms have zero coefficients?

This is going to occur with any automated model selection procedure unless specifically disallowed.

> 5. Is there any way to choose a simple explanatory model, smaller
> than the best predictive model supported by the data, that is less
> arbitrary / subjective?

You have 5 variables. Variable selection is not your goal. What you are trying to do is fit a curve (as opposed to a line) through your data, possibly along with interactions. I would suggest looking into splines, as provided for example in the mgcv package.
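
As a rough sketch of the spline approach, assuming a binary outcome and, for brevity, only two simulated predictors (the data frame d and its columns are hypothetical), a fit with mgcv might look like this:

```r
library(mgcv)

set.seed(1)
n <- 300
# Hypothetical data: a nonlinear effect of x1 and a curved effect of x2
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- rbinom(n, 1, plogis(2 * sin(2 * pi * d$x1) + 3 * (d$x2 - 0.5)^2 - 0.5))

# s() fits a penalized smooth for each main effect; ti() adds a smooth
# interaction with the main effects excluded
fit <- gam(y ~ s(x1) + s(x2) + ti(x1, x2),
           family = binomial, data = d, method = "REML")

summary(fit)          # approximate significance of each smooth term
plot(fit, pages = 1)  # estimated smooths on the logit scale
```

The smoothness of each curve is estimated from the data, so there is no need to decide in advance whether a squared term is enough.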

--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
