Hi, I am trying to fit a logistic regression to data from 104 patients, with one binary outcome (yes or no) and 15 variables (9 categorical factors [yes or no] and 6 continuous variables). The number of "yes" outcomes is 25. With 25 events and 15 variables, the number of events per variable is about 1.7, which is much less than 10. I therefore tried to analyze the data with a penalized regression method, and I would be grateful if some of the experts here could help me.
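To make the events-per-variable arithmetic explicit, here is a small sketch. The counts come from the description above; the object names are only illustrative.

```r
# Counts taken from the description of the data
n.events <- 25   # number of "yes" outcomes
n.vars   <- 15   # candidate covariates (9 binary + 6 continuous)

# Events per variable (EPV)
epv <- n.events / n.vars
print(epv)       # 1.666667
print(epv < 10)  # TRUE: far below the common rule of thumb of 10
```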
First of all, I standardized all 6 continuous variables with scale(), using center=TRUE and scale=TRUE. The nine categorical variables and the outcome variable were re-coded as 0 or 1. Then I used glmnet with standardize=FALSE because of the presence of the categorical variables.

x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15), 104, 15)
y <- outcome
library(glmnet)
# alpha defaults to 1, so this should be the lasso penalty
fit.1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
fit.1cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)
Coefficients.fit1 <- coef(fit.1, s=fit.1cv$lambda.min)
Active.Index.fit1 <- which(Coefficients.fit1 != 0)
Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]

Active.Index.fit1
[1]  1  5  9 10 16
Active.Coefficients.fit1
[1] -1.28774827  0.01420395  0.70444865 -0.27726625  0.18455926

The optimal model has 5 active coefficients, the first of which is the intercept.

Second, I did the same thing with alpha=0.5 to fit an elastic net.

fit.2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
fit.2cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
Coefficients.fit2 <- coef(fit.2, s=fit.2cv$lambda.min)
Active.Index.fit2 <- which(Coefficients.fit2 != 0)
Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]

Active.Index.fit2
[1]  1  5  9 10 16
Active.Coefficients.fit2
[1] -1.3286190  0.1410739  0.6315108 -0.2668022  0.2292459

This model has the same 5 active coefficients as the first one with the lasso penalty.

My questions are as follows:

1. Am I doing this correctly?
2. Which model, lasso or elastic net, should be selected, and why? Both models chose the same variables but give different coefficient values.
3. Is it O.K. to calculate odds ratios as exp(coefficients)? And how can one calculate a 95% confidence interval for an odds ratio? Or is a 95% CI meaningless in this kind of analysis?

Thank you in advance for your help.
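For anyone who wants to run the steps above, here is a minimal self-contained sketch on simulated data. The simulated covariates and outcome, the seed, the shared foldid vector, and all object names are assumptions for illustration only and are not the patient data. The exp() line only shows the arithmetic asked about in question 3; it is not meant to settle whether that interpretation is appropriate.

```r
library(glmnet)

set.seed(1)                       # assumed seed, only for reproducibility
n <- 104
p <- 15

# Simulated stand-ins: 9 binary (0/1) and 6 continuous covariates
x.bin  <- matrix(rbinom(n * 9, 1, 0.5), n, 9)
x.cont <- scale(matrix(rnorm(n * 6), n, 6))  # centred and scaled
x <- cbind(x.bin, x.cont)
colnames(x) <- paste0("x", 1:p)

# Simulated 0/1 outcome with exactly 25 events, unrelated to x
y <- sample(rep(c(1, 0), c(25, n - 25)))

# Same fold assignment for both fits, so the CV errors are comparable
foldid <- sample(rep(1:10, length.out = n))

cv.lasso <- cv.glmnet(x, y, family = "binomial", standardize = FALSE,
                      alpha = 1, foldid = foldid)
cv.enet  <- cv.glmnet(x, y, family = "binomial", standardize = FALSE,
                      alpha = 0.5, foldid = foldid)

# Coefficients at lambda.min; the first row is the intercept
b.lasso <- as.matrix(coef(cv.lasso, s = "lambda.min"))
b.enet  <- as.matrix(coef(cv.enet,  s = "lambda.min"))

# Names of the active (non-zero) coefficients in each fit
print(rownames(b.lasso)[b.lasso[, 1] != 0])
print(rownames(b.enet)[b.enet[, 1] != 0])

# exp() of the lasso coefficients (arithmetic only, see note above)
print(exp(b.lasso[b.lasso[, 1] != 0, 1]))

# Minimum cross-validated binomial deviance for each penalty
print(c(lasso = min(cv.lasso$cvm), enet = min(cv.enet$cvm)))
```

Because cv.glmnet assigns folds at random, fixing foldid (or the seed) is what makes the two penalties comparable from run to run; with simulated noise as the outcome, it would not be surprising if only the intercept were selected.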
KH

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.