I haven't read all of your code, but at first read it seems right. With regard to your questions:

1. Am I doing it correctly or not?

Seems OK, as I said. You could use more standard code to convert your data to a matrix (cbind(), for example), but essentially the results should be the same. Also, lambda.min may be a tad too optimistic: to correct for the reuse of data in cross-validation, one normally uses the one-standard-error rule, i.e. the largest lambda whose cross-validated error is still within one standard error of the minimum. I think this is described in the help file for cv.glmnet, and it is also present in the cv.glmnet return value (lambda.1se, if I'm not mistaken).
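As a rough illustration of the lambda.1se point, here is a minimal sketch on simulated data. The data, the seed, and the object names (x, y, cvfit) are made up for the example and are not the poster's; only cv.glmnet, lambda.min, lambda.1se and coef come from glmnet itself.

```r
library(glmnet)

set.seed(1)
n <- 104
p <- 15

# Simulated stand-in for the real data (assumption: NOT the poster's data).
# Columns are already on a common scale, so standardize = FALSE is harmless.
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(-1.2 + 0.8 * x[, 1] - 0.6 * x[, 2]))

# Cross-validated lasso (alpha = 1 is the default)
cvfit <- cv.glmnet(x, y, family = "binomial", standardize = FALSE)

# lambda.min minimises the cross-validated deviance; lambda.1se is the
# largest lambda whose CV error is within one standard error of that minimum
lambdas <- c(min = cvfit$lambda.min, one_se = cvfit$lambda.1se)
print(lambdas)

# Coefficients at the more conservative choice
coef_1se <- coef(cvfit, s = "lambda.1se")
print(coef_1se)
```

Because lambda.1se is never smaller than lambda.min, the model it selects is penalized at least as heavily and usually keeps fewer variables.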
2. Which model, I mean lasso or elastic net, should be selected? and why? Both models chose the same variables but different coefficient values.

You may want to read 'The Elements of Statistical Learning' to find some info on the relative advantages of ridge, lasso and elastic net. Lasso should work fine in this relatively low-dimensional setting, although it depends on the correlation structure of your covariates. Depending on your goals, you may want to refit a standard logistic regression with only the variables selected by the lasso: this avoids the downward bias (shrinkage toward zero) that is in (just about) every penalized regression.

3. Is it O.K. to calculate odds ratio by exp(coefficients)? And how can you calculate 95% confidence interval of odds ratio? Or 95%CI is meaningless in this kind of analysis?

At this time, confidence intervals for lasso/elnet in GLM settings are an open problem (the reason being that the L1 penalty is not differentiable). Some 'solutions' exist (bootstrap, for one), but they have all been shown to have (statistical) properties that make them, at the least, doubtful. I know, because I'm working on this. Short answer: there is no way to do this (at this time).

HTH (and hang on there in Japan),

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ????
Sent: Friday, 25 March 2011 14:04
To: r-h...@stat.math.ethz.ch
Subject: [R] A question on glmnet analysis

Hi,

I am trying to do logistic regression on data from 104 patients, with one outcome (yes or no) and 15 variables (9 categorical factors [yes or no] and 6 continuous variables). The number of yes outcomes is 25. Twenty-five events and 15 variables mean that the number of events per variable is much less than 10. Therefore, I tried to analyze the data with a penalized regression method.
I would be grateful if some of the experts here could help me.

First of all, I standardized all 6 continuous variables with scale(), using the center=TRUE and scale=TRUE options. The nine categorical variables and the one outcome variable were re-coded as 0 or 1. Then I used glmnet with the standardize=FALSE option because of the presence of categorical variables.

x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15), 104, 15)
y <- outcome
library(glmnet)
fit.1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
fit.1cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)

The default is alpha=1, so this should be the lasso penalty.

Coefficients.fit1 <- coef(fit.1, s=fit.1cv$lambda.min)
Active.Index.fit1 <- which(Coefficients.fit1 != 0)
Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]

Active.Index.fit1
[1] 1 5 9 10 16
Active.Coefficients.fit1
[1] -1.28774827 0.01420395 0.70444865 -0.27726625 0.18455926

My optimal model chose 5 active coefficients, the first of which is the intercept.

Second, I did the same thing with the alpha=0.5 option to do an elastic net analysis.

fit.2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
fit.2cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
Coefficients.fit2 <- coef(fit.2, s=fit.2cv$lambda.min)
Active.Index.fit2 <- which(Coefficients.fit2 != 0)
Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]

Active.Index.fit2
[1] 1 5 9 10 16
Active.Coefficients.fit2
[1] -1.3286190 0.1410739 0.6315108 -0.2668022 0.2292459

This model chose the same 5 active coefficients as the first one with the lasso penalty.

My questions are as follows:

1. Am I doing it correctly or not?
2. Which model, I mean lasso or elastic net, should be selected? and why? Both models chose the same variables but different coefficient values.
3. Is it O.K. to calculate odds ratio by exp(coefficients)? And how can you calculate 95% confidence interval of odds ratio? Or 95%CI is meaningless in this kind of analysis?
Thank you in advance for your help.

KH

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.