Dear Daniel,

Thank you for your mail.
Your comment is exactly what I was worried about.

I know very little about latent class analysis, so I would like to use multiple correspondence analysis (MCA) for data reduction. Moreover, the first dimension of the MCA captured 43% of the inertia (variance).

Do you think my use of "mjca1$rowcoord[, 1]" from the ca package for data reduction, as described in my previous mail, is OK?

Thank you for your help.

--
Kohkichi Hosoda

(11/08/18 17:39), Daniel Malter wrote:
Pooling nominal with numeric variables and running PCA on them sounds like
conceptual nonsense to me. You use PCA to reduce the dimensionality of the
data if the data are numeric. For categorical data analysis, you should use
latent class analysis or something along those lines.

The fact that your first PC captures only 20 percent of the variance
indicates that either you apply the wrong technique or that dimensionality
reduction is of little use for these data more generally. The first step
should generally be to check the correlations/associations between the
variables to inspect whether what you intend to do makes sense.
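As a minimal sketch of that first step, the following base-R code computes a pairwise Cramér's V matrix for a set of binary factors. The data frame `demo` and its simulated columns are hypothetical stand-ins for the real variables, and `cramers_v()` is a helper written for this example, not a function from any package.

```r
# Hypothetical stand-in data: 104 patients, a few simulated binary factors.
# Replace with the real variables (e.g. sex, HT, DM, ...) from mydata.df.
set.seed(1)
n <- 104
demo <- data.frame(
  HT  = factor(rbinom(n, 1, 0.5)),
  DM  = factor(rbinom(n, 1, 0.3)),
  IHD = factor(rbinom(n, 1, 0.2))
)

# Cramér's V for two factors: sqrt(chi^2 / (n * (min(rows, cols) - 1)))
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k    <- min(dim(tab)) - 1
  as.numeric(sqrt(chi2 / (sum(tab) * k)))
}

# Pairwise association matrix (1 on the diagonal, 0 = no association)
vars <- names(demo)
V <- outer(seq_along(vars), seq_along(vars),
           Vectorize(function(i, j) cramers_v(demo[[i]], demo[[j]])))
dimnames(V) <- list(vars, vars)
round(V, 2)
```

If all off-diagonal values are close to zero, the variables share little common structure and a single summary dimension is unlikely to capture much of it.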

HTH,
Daniel



khosoda wrote:

Hi all,

I'm trying to do model reduction for logistic regression. I have 13
predictors (4 continuous variables and 9 binary variables). Using subject
matter knowledge, I selected 4 important variables. For the remaining 9
variables, I tried to perform data reduction by principal component
analysis (PCA). However, 8 of these 9 variables were binary and only one
was continuous. I transformed the data with transcan() from the rms package
and ran PCA with princomp(). PC1 explained only 20% of the variance. Still,
I used PC1 as a predictor in the logistic model and obtained some results.
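A minimal base-R sketch of the PCA route described above, on simulated, hypothetical data. It does not reproduce the transcan() step from rms: the binary columns are simply coded 0/1 and all columns are standardised via `cor = TRUE`. It reports the proportion of variance per component and enters the PC1 scores into a logistic model with glm(). The sign of PC1 is arbitrary, so the sign of its coefficient has to be read together with the loadings (`pc$loadings[, 1]`).

```r
# Hypothetical stand-in data: one continuous and three binary predictors
set.seed(2)
n <- 104
X <- data.frame(
  age = round(runif(n, 53, 85)),
  HT  = rbinom(n, 1, 0.5),
  DM  = rbinom(n, 1, 0.3),
  IHD = rbinom(n, 1, 0.2)
)
y <- rbinom(n, 1, 0.4)  # hypothetical binary outcome

pc <- princomp(X, cor = TRUE)            # PCA on the correlation matrix
prop_var <- pc$sdev^2 / sum(pc$sdev^2)   # proportion of variance per component
round(prop_var, 3)

PC1 <- pc$scores[, 1]                    # one PC1 score per patient
fit <- glm(y ~ PC1, family = binomial)   # PC1 replaces the original columns
summary(fit)$coefficients
```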

Then, I tried multiple correspondence analysis (MCA). The only
continuous variable was age, so I converted "age" into a factor variable,
"age_Q", as follows.

quantile(mydata.df$age)
    0%   25%   50%   75%  100%
 53.00 66.75 72.00 76.25 85.00
age_Q <- cut(mydata.df$age, right = TRUE, breaks = c(-Inf, 66, 72, 76, Inf),
             labels = c("53-66", "67-72", "73-76", "77-85"))
table(age_Q)
age_Q
53-66 67-72 73-76 77-85
   26    27    25    26
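The breaks above were typed in by hand; assuming ages are recorded in whole years, they give the same groups as the printed quartiles (66.75 and 76.25). A sketch that derives the breaks from quantile() directly, using simulated ages as a hypothetical stand-in for `mydata.df$age`: unique() guards against tied quantiles, and `include.lowest = TRUE` keeps the minimum age in the first group.

```r
# Hypothetical ages standing in for mydata.df$age
set.seed(3)
age <- round(runif(104, 53, 85))

# Quartile-based breaks; unique() avoids duplicate breaks when quantiles tie
brks  <- unique(quantile(age, probs = seq(0, 1, 0.25)))
age_Q <- cut(age, breaks = brks, include.lowest = TRUE, right = TRUE)
table(age_Q)
```

Custom labels such as "53-66" can still be supplied through the `labels` argument of cut() once the breaks are fixed.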

Then, I used mjca() from the ca package for the MCA.

library(ca)
mydata.df$age_Q <- age_Q  # add the new factor to the data frame first
mjca1 <- mjca(mydata.df[, c("age_Q", "sex", "symptom", "HT", "DM",
                            "IHD", "smoking", "DL", "Statin")])

summary(mjca1)

Principal inertias (eigenvalues):

  dim    value      %   cum%   scree plot
  1      0.009592  43.4  43.4  *************************
  2      0.003983  18.0  61.4  **********
  3      0.001047   4.7  66.1  **
  4      0.000367   1.7  67.8
         -------- -----
  Total: 0.022111
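Each percentage in this table is a principal inertia divided by the printed total. Recomputing them from the printed values shows that 43.4% belongs to dimension 1 alone, while the first plane (dimensions 1 and 2) accounts for about 61.4%. The four listed dimensions sum to only about 67.8%, presumably because mjca() reports adjusted inertias by default, whose percentages are not expected to add up to 100.

```r
# Principal inertias and total copied from the summary(mjca1) output above
lambda <- c(0.009592, 0.003983, 0.001047, 0.000367)
total  <- 0.022111

pct <- 100 * lambda / total
round(pct, 1)          # 43.4 18.0  4.7  1.7
round(cumsum(pct), 1)  # 43.4 61.4 66.1 67.8
```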

Dimension 1 explained 43% of the inertia. I then wondered which values I
could use in the same way as PC1 in PCA. Exploring mjca1, I found
"rowcoord".

mjca1$rowcoord
               [,1]          [,2]        [,3]         [,4]
   [1,]  0.07403748  0.8963482181  0.10828273  1.581381849
   [2,]  0.92433996 -1.1497911361  1.28872517  0.304065865
   [3,]  0.49833354  0.6482940556 -2.11114314  0.365023261
   [4,]  0.18998290 -1.4028117048 -1.70962159  0.451951744
   [5,] -0.13008173  0.2557656854  1.16561601 -1.012992485
.........................................................
.........................................................
[101,] -1.86940216  0.5918128751  0.87352987 -1.118865117
[102,] -2.19096615  1.2845448725  0.25227354 -0.938612155
[103,]  0.77981265 -1.1931087587  0.23934034  0.627601413
[104,] -2.37058237 -1.4014005013 -0.73578248 -1.455055095

Then, I used mjca1$rowcoord[, 1] as follows.

mydata.df$NewScore <- mjca1$rowcoord[, 1]

I used this "NewScore" as one of the predictors in the model in place of
the original 9 variables.
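For readers without the ca package at hand, here is a base-R sketch of what a dimension-1 row score is, on hypothetical simulated data: MCA computed as a correspondence analysis of the indicator matrix via svd(). This is the plain indicator-matrix variant, not mjca()'s default adjusted analysis, so its values may differ in scale from `mjca1$rowcoord`, and the sign of the dimension is arbitrary. It also illustrates that standard and principal coordinates differ only by a constant factor within one dimension, so choosing between them rescales the logistic slope but leaves the fit (deviance, p-value) unchanged.

```r
# Hypothetical stand-in data: replace with the nine factors from mydata.df
set.seed(4)
n <- 104
dat <- data.frame(
  age_Q = factor(sample(c("53-66", "67-72", "73-76", "77-85"), n, replace = TRUE)),
  HT    = factor(rbinom(n, 1, 0.5)),
  DM    = factor(rbinom(n, 1, 0.3)),
  IHD   = factor(rbinom(n, 1, 0.2))
)
y <- rbinom(n, 1, 0.4)  # hypothetical binary outcome

# Indicator (disjunctive) matrix: one 0/1 column per category of each factor
Z <- do.call(cbind, lapply(dat, function(f) model.matrix(~ f - 1)))

# Correspondence analysis of Z via SVD of the standardised residuals
P     <- Z / sum(Z)
rmass <- rowSums(P)
cmass <- colSums(P)
S     <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec   <- svd(S)

# Dimension-1 row scores, one per patient
score1  <- dec$u[, 1] / sqrt(rmass)  # standard coordinate
score1p <- score1 * dec$d[1]         # principal coordinate

fit_std <- glm(y ~ score1,  family = binomial)
fit_pc  <- glm(y ~ score1p, family = binomial)
c(deviance(fit_std), deviance(fit_pc))  # equal up to numerical tolerance
coef(fit_std)[2] / coef(fit_pc)[2]      # equals dec$d[1]
```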

The final logistic model obtained with MCA was similar to the one
obtained with PCA.

My questions are:

1. Is it OK to perform PCA on data consisting of 1 continuous
variable and 8 binary variables?

2. Is it OK to transform age from a continuous variable into a factor
for MCA?

3. Is "mjca1$rowcoord[, 1]" the correct value to use as a predictor in a
logistic regression model, in the way PC1 is used in PCA?

Thank you in advance for your help.

--
Kohkichi Hosoda

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-use-PC1-of-PCA-and-dim1-of-MCA-as-a-predictor-in-logistic-regression-model-for-data-reduction-tp3750251p3752062.html
Sent from the R help mailing list archive at Nabble.com.
