Hi all,

I'm trying to do model reduction for logistic regression. I have 13 predictors (4 continuous variables and 9 binary variables). Using subject matter knowledge, I selected 4 important variables. For the remaining 9 variables, I tried to perform data reduction by principal component analysis (PCA). However, 8 of these 9 variables are binary and only one is continuous. I transformed the data with transcan (Hmisc, loaded with the rms package) and did PCA with princomp. PC1 explained only 20% of the variance. Still, I used PC1 as a predictor in the logistic model and obtained some results.
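A minimal sketch of the PCA-then-logistic-regression step described above. It uses simulated data and base R's prcomp in place of transcan and princomp, so it only illustrates the shape of the workflow. The data frame sim, the outcome y, and the binary columns b1 to b8 are hypothetical stand-ins, not the real data.

```r
# Simulated stand-in data (assumption): 1 continuous and 8 binary predictors.
set.seed(1)
n <- 104
sim <- data.frame(age = round(rnorm(n, mean = 71, sd = 7)))
for (j in 1:8) sim[[paste0("b", j)]] <- rbinom(n, size = 1, prob = 0.4)
sim$y <- rbinom(n, size = 1, prob = 0.5)   # hypothetical binary outcome

# PCA on the standardised predictors (equivalent to a correlation-matrix PCA).
pred <- sim[, setdiff(names(sim), "y")]
pca <- prcomp(pred, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# First principal component score, one value per subject.
sim$PC1 <- pca$x[, 1]

# Logistic model with PC1 standing in for the nine reduced predictors.
fit <- glm(y ~ PC1, data = sim, family = binomial)
```

Inspecting var_explained[1] shows how much of the total variance PC1 carries, and summary(fit) gives the coefficient for PC1 in the logistic model.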
Then, I tried multiple correspondence analysis (MCA). The only continuous variable was age. I transformed the "age" variable into the factor variable "age_Q" as follows.

> quantile(mydata.df$age)
   0%   25%   50%   75%  100%
53.00 66.75 72.00 76.25 85.00
> age_Q <- cut(x17.df$age, right=TRUE, breaks=c(-Inf, 66, 72, 76, Inf),
+              labels=c("53-66", "67-72", "73-76", "77-85"))
> table(age_Q)
age_Q
53-66 67-72 73-76 77-85
   26    27    25    26

Then, I used mjca from the ca package for MCA.

> mjca1 <- mjca(mydata.df[, c("age_Q","sex","symptom", "HT", "DM", "IHD","smoking","DL", "Statin")])
> summary(mjca1)

Principal inertias (eigenvalues):

 dim    value      %   cum%   scree plot
 1      0.009592  43.4  43.4  *************************
 2      0.003983  18.0  61.4  **********
 3      0.001047   4.7  66.1  **
 4      0.000367   1.7  67.8
        -------- -----
 Total: 0.022111

Dimension 1 explained 43% of the variance. Then, I was wondering which values I could use in the same way as PC1 in PCA. I explored mjca1 and found "rowcoord".

> mjca1$rowcoord
               [,1]          [,2]        [,3]         [,4]
  [1,]  0.07403748  0.8963482181  0.10828273  1.581381849
  [2,]  0.92433996 -1.1497911361  1.28872517  0.304065865
  [3,]  0.49833354  0.6482940556 -2.11114314  0.365023261
  [4,]  0.18998290 -1.4028117048 -1.70962159  0.451951744
  [5,] -0.13008173  0.2557656854  1.16561601 -1.012992485
 .........................................................
 .........................................................
[101,] -1.86940216  0.5918128751  0.87352987 -1.118865117
[102,] -2.19096615  1.2845448725  0.25227354 -0.938612155
[103,]  0.77981265 -1.1931087587  0.23934034  0.627601413
[104,] -2.37058237 -1.4014005013 -0.73578248 -1.455055095

Then, I used mjca1$rowcoord[, 1] as follows.

> mydata.df$NewScore <- mjca1$rowcoord[, 1]

I used this "NewScore" as one of the predictors in the model instead of the original 9 variables. The final logistic model obtained with MCA was similar to the one obtained with PCA.

My questions are:

1. Is it O.K. to perform PCA on data consisting of 1 continuous variable and 8 binary variables?
2. Is it O.K. to transform age from a continuous variable to a factor variable for MCA?
3. Is "mjca1$rowcoord[, 1]" the correct set of values to use as a predictor in a logistic regression model, like PC1 from PCA?

Thank you in advance for your help.

--
Kohkichi Hosoda

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.