Let's say I perform 4 measurements on an animal: three are linear measurements
in millimeters and the fourth is its weight in milligrams. So, we have a data
set with mixed units.
Based on these four correlated measurements, I would like to obtain one "score"
or value that describes an individual animal's size. I considered simply taking
the geometric mean of these 4 measurements, and that would give me a "score" -
larger values would be for larger animals, etc.
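For concreteness, the geometric-mean score I have in mind looks like the sketch below. It uses only the first three animals from the data set further down, with weight left on its raw scale (same factor of 1000 as in the data frame, but without the log). The names `raw` and `geo_mean` are chosen just for this illustration.

```r
# First three animals from the data set, weight on its raw (untransformed) scale
raw <- data.frame(
  weight  = 1000 * c(0.0980, 0.0622, 0.0600),
  interoc = c(0.853, 0.865, 0.811),
  cwidth  = c(3.152, 3.046, 3.139),
  clength = c(3.889, 3.733, 3.762)
)

# Geometric mean per animal: exponentiate the mean of the logged measurements.
# Every variable gets the same weight (1/4), which is the assumption in question.
geo_mean <- exp(rowMeans(log(as.matrix(raw))))
geo_mean
```

Because the units are mixed, the weight values are numerically much larger than the linear measurements, yet each variable still enters the product with the same exponent.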
However, this assumes that all 4 of these measurements contribute equally to an
animal's size. Of course, more than likely this is not the case. I then
performed a PCA to discover how much influence each variable had on the overall
data set. I was hoping to use this analysis to refine my original approach.
I honestly do not know how to apply the information from the PCA to this
particular problem...
I do know, however, that principal components 1 and 2 together capture about
84% of the variation, which seems enough to reduce the number of dimensions to
2 (see the analysis below, which includes the original data set).
Note: animal weights were ln() transformed to increase correlation with the 3
other variables.
df <- data.frame(
  weight = log(1000*c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138,
                      0.0540, 0.0629, 0.0930,
                      0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339,
                      0.0547, 0.0844,
                      0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585,
                      0.0645, 0.0529,
                      0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758,
                      0.0649, 0.0465,
                      0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630,
                      0.0466)),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838,
              0.799,
              0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767,
              0.746,
              0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757,
              0.737,
              0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743,
              0.870,
              0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
  cwidth = c(3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160,
             3.186,
             2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972,
             3.291,
             2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813,
             2.935,
             2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827,
             2.909,
             3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721,
              3.794,
              3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534,
              3.967,
              3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453,
              3.589,
              3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504,
              3.688,
              3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))

pca_morpho <- princomp(df, cor = TRUE)
summary(pca_morpho)
Importance of components:
                         Comp.1    Comp.2    Comp.3    Comp.4
Standard deviation     1.604107 0.8827323 0.7061206 0.3860275
Proportion of Variance 0.643290 0.1948041 0.1246516 0.0372543
Cumulative Proportion  0.643290 0.8380941 0.9627457 1.0000000

Loadings:
        Comp.1 Comp.2 Comp.3 Comp.4
weight  -0.371  0.907        -0.201
interoc -0.486 -0.227 -0.840
cwidth  -0.537 -0.349  0.466 -0.611
clength -0.582         0.278  0.761

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
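The only candidate I can pull out of this object myself is each animal's score on Comp.1, since all four Comp.1 loadings share a sign. I do not know whether treating that as "size" is justified, so the following is only a sketch of the idea. To keep it short it uses just the first six animals rather than the full data set, and `size_score` is a helper name I made up, not an existing function. Because the sign of a component is arbitrary, the helper flips it so that larger scores go with larger measurements.

```r
# Hypothetical helper: Comp.1 scores, oriented so that larger = bigger animal
size_score <- function(d) {
  pca <- princomp(d, cor = TRUE)
  pc1_loadings <- pca$loadings[, 1]
  # Flip the component if its loadings are negative overall
  flip <- if (sum(pc1_loadings) < 0) -1 else 1
  flip * pca$scores[, 1]
}

# First six animals from the data set (weight log-transformed as in the post)
demo <- data.frame(
  weight  = log(1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701)),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783, 0.868),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023, 3.452),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911, 3.822)
)

scores <- size_score(demo)
scores
```

On the full data the call would be `size_score(df)`. The scores are centered at zero, so they rank animals relative to the sample mean rather than giving an absolute size.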
Any guidance will be greatly appreciated!
Salvatore A. Sidoti
PhD Student
The Ohio State University
Behavioral Ecology