Hello,
I need to analyse a 30x100 data matrix. Before the analysis, however, I need to remove outliers from the data. I have read quite a lot about outlier removal already, and the most common technique seems to be Principal Component Analysis (PCA). However, I think this technique is quite subjective. When is an outlier an outlier? I uploaded an example PCA plot here: http://s14.postimage.org/oknyya1ld/pca.png Should we already treat the green and red dots as outliers, or only the blue one, which lies outside the 95% confidence interval? It seems very arbitrary how people remove outliers using PCA.

I also thought about fitting a linear model to my data and looking at the distribution of the residuals. However, the problem with linear models is that one can never be sure that the model used is the one that describes the data best. Under model A, for instance, we might treat sample 1 as an outlier, but under a different model B sample 1 might not be an outlier at all.

I had a brief look at k-means clustering as well, but I don't think it is the right approach. Again, how does one decide which cluster is an outlier? It is also known that different cluster analyses can lead to totally different results. So which one should I choose?

Is there any other way to remove outliers from data non-subjectively? I would really appreciate any ideas/comments you might have on this topic.

Cheers

--
View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4372652.html
Sent from the R help mailing list archive at Nabble.com.
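P.S. To make the 95% rule from the PCA plot concrete, here is a minimal sketch of how I understand it, written in plain Python so it needs no packages. It is only an illustration under assumptions: the scores below are made-up toy values (not my data), the function name flag_outside_ellipse is hypothetical, and the cutoff is the chi-square approximation with 2 degrees of freedom, which presumes roughly bivariate normal scores.

```python
import math
import statistics


def flag_outside_ellipse(pc1, pc2, level=0.95):
    """Flag samples whose (PC1, PC2) scores lie outside the confidence ellipse.

    Assumption: the two score vectors are uncorrelated (as PCA scores are)
    and roughly bivariate normal. For 2 degrees of freedom the chi-square
    quantile has the closed form -2 * ln(1 - level).
    """
    cutoff = -2.0 * math.log(1.0 - level)
    m1, m2 = statistics.fmean(pc1), statistics.fmean(pc2)
    v1, v2 = statistics.variance(pc1), statistics.variance(pc2)
    flags = []
    for a, b in zip(pc1, pc2):
        # Squared standardised distance from the centre of the score cloud.
        d2 = (a - m1) ** 2 / v1 + (b - m2) ** 2 / v2
        flags.append(d2 > cutoff)
    return flags


# Made-up toy scores for 12 samples (centred and uncorrelated by construction).
pc1 = [6, -1, 1, -1, 1, -1, 1, -1, 1, -2, -2, -2]
pc2 = [0, 1, -1, -1, 1, 2, 0, -2, 0, 1, -1, 0]

flags = flag_outside_ellipse(pc1, pc2, level=0.95)
flags_99 = flag_outside_ellipse(pc1, pc2, level=0.99)

print(flags)     # only the first sample is flagged at the 95% level
print(flags_99)  # nothing is flagged at the 99% level
```

With these toy values the first sample counts as an outlier at 95% but not at 99%, which is exactly the arbitrariness I am worried about: the verdict depends on a level somebody had to pick.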