[R] Principal Component Analysis: Ranking Animal Size Based On Combined Metrics

Sidoti, Salvatore A. Sun, 13 Nov 2016 03:44:27 -0800

Let's say I perform 4 measurements on an animal: three are linear measurements 
in millimeters and the fourth is its weight in milligrams. So, we have a data 
set with mixed units.


Based on these four correlated measurements, I would like to obtain one "score" 
or value that describes an individual animal's size. I considered simply taking 
the geometric mean of these 4 measurements, and that would give me a "score" - 
larger values would be for larger animals, etc.
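A minimal sketch of that geometric-mean score, using only the first three
animals from the data set further down. The names `animals`, `geo_mean` and
`size_score` are just illustrative, and the factor of 1000 on the weights
simply mirrors the conversion used in my PCA code:

```r
# First three animals from the data set in this post (illustrative subset).
animals <- data.frame(
  weight  = 1000 * c(0.0980, 0.0622, 0.0600),  # same 1000x scaling as below
  interoc = c(0.853, 0.865, 0.811),
  cwidth  = c(3.152, 3.046, 3.139),
  clength = c(3.889, 3.733, 3.762)
)

# Geometric mean of a positive vector: exp of the mean of the logs.
geo_mean <- function(x) exp(mean(log(x)))

# One score per animal (row); every measurement gets equal weight.
size_score <- apply(animals, 1, geo_mean)
print(size_score)
```

Because every variable enters with the same exponent (1/4), this is where the
"equal contribution" assumption comes from.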

However, this assumes that all 4 of these measurements contribute equally to an 
animal's size. Of course, more than likely this is not the case. I then 
performed a PCA to discover how much influence each variable had on the overall 
data set. I was hoping to use this analysis to refine my original approach.

I honestly do not know how to apply the information from the PCA to this 
particular problem...

I do know, however, that principal components 1 and 2 capture enough of the 
variation (about 84% cumulatively) to reduce the number of dimensions to 2 
(see the analysis below, run on the original data set).

Note: animal weights were ln() transformed to increase correlation with the 3 
other variables.

df <- data.frame(
  # weight: raw values multiplied by 1000, then natural-log transformed
  weight = log(1000 * c(
    0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138, 0.0540, 0.0629, 0.0930,
    0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339, 0.0547, 0.0844,
    0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585, 0.0645, 0.0529,
    0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758, 0.0649, 0.0465,
    0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630, 0.0466)),
  interoc = c(
    0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838, 0.799,
    0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767, 0.746,
    0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757, 0.737,
    0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743, 0.870,
    0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
  cwidth = c(
    3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160, 3.186,
    2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972, 3.291,
    2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813, 2.935,
    2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827, 2.909,
    3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
  clength = c(
    3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721, 3.794,
    3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534, 3.967,
    3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453, 3.589,
    3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504, 3.688,
    3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))

pca_morpho <- princomp(df, cor = TRUE)

summary(pca_morpho)

Importance of components:
                         Comp.1    Comp.2    Comp.3    Comp.4
Standard deviation     1.604107 0.8827323 0.7061206 0.3860275
Proportion of Variance 0.643290 0.1948041 0.1246516 0.0372543
Cumulative Proportion  0.643290 0.8380941 0.9627457 1.0000000

Loadings:
        Comp.1 Comp.2 Comp.3 Comp.4
weight  -0.371  0.907        -0.201
interoc -0.486 -0.227 -0.840
cwidth  -0.537 -0.349  0.466 -0.611
clength -0.582         0.278  0.761

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
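For anyone reproducing this: princomp() stores the per-animal component scores
in the fitted object, so a first-component score per animal can be pulled out
directly. Whether that score is an appropriate size index is exactly the part
I am unsure about, so treat the following as a sketch only. It uses just the
first 10 animals to stay short, and the names `df10`, `load1`, `flip` and
`size_pc1` are illustrative. The sign of a component is arbitrary, so the
sketch flips it if the first-component loadings sum to a negative value, with
the aim that larger scores correspond to larger measurements:

```r
# First 10 animals from the data set above (illustrative subset).
df10 <- data.frame(
  weight  = log(1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                         0.0701, 0.1138, 0.0540, 0.0629, 0.0930)),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794)
)

# PCA on the correlation matrix, as in the analysis above.
pca10 <- princomp(df10, cor = TRUE)

# Loadings of the first component; its overall sign is arbitrary.
load1 <- pca10$loadings[, 1]
flip  <- if (sum(load1) < 0) -1 else 1

# One first-component score per animal, oriented by the sign rule above.
size_pc1 <- flip * pca10$scores[, 1]

# Rank animals from largest to smallest score.
print(order(size_pc1, decreasing = TRUE))
```

On the full data set the same calls would be made on df and pca_morpho.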

Any guidance will be greatly appreciated!

Salvatore A. Sidoti
PhD Student
The Ohio State University
Behavioral Ecology

______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
