>>>>> Bert Gunter <gunter.ber...@gene.com>
>>>>>     on Wed, 18 Jul 2012 07:14:31 -0700 writes:

> checkforoutliers <- function(series) NULL

> Cheers,
> Bert

> *Explanation: There is no such thing as a statistical
> outlier -- or, rather, "outlier" is a fraudulent
> statistical concept, defined arbitrarily and without
> scientific legitimacy. The typical unstated purpose of
> such identification is to remove contaminating or
> irrelevant data, but such a judgment can only be made by a
> subject matter expert with knowledge of the context and,
> usually, the specific cause for the unusual data. Do not
> be misled by the large body of statistical literature on
> this topic into believing that statistical analysis alone
> can provide objective criteria to do this. That is a path
> to scientific purgatory.

> For the record:
> 1. I am a statistician
> 2. Lots of highly knowledgeable, smart statisticians will condemn what I
>    have just said as stupid ranting.

I entirely agree with you that outlier-removing procedures are
mostly misused, and dangerous because of that misuse
{and hence should typically NOT be taught, or not the way I have
 seen them taught (on occasions, not here at ETH!)...}

and I even more fervently agree with Michael Weylandt's
recommendation to use robust statistics rather than outlier
detection --- at least in those cases where "robust statistics"
is *not* ill-re-defined as {outlier detection}+{classical stats}.

However, I don't think 'outlier' to be a fraudulent concept.
Rather I think outliers can be pretty well defined along the
line of "outlier WITH RESPECT TO A MODEL"
(and 'model' means 'statistical model', i.e., with some
 randomness built in):

   Outlier wrt model M := an observation which is highly
                          improbable to be observed under model M

(and "highly improbable" of course is somewhat vague, but that's
 not a problem per se.)

A version of the above is

   Outlier := an observation that has unduly large influence on
              the estimators/inference performed

where 'estimator / inference' imply a model of course.
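
The two definitions above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the model M is taken to be a simple linear regression with Gaussian errors, the data are simulated, and the observation at index 30 is deliberately planted so that it does not follow M. None of these choices comes from the original thread. No cutoff is imposed, since "highly improbable" is left vague on purpose; the code only ranks observations.

```r
## Illustration only: the data, the model M (simple linear regression with
## Gaussian errors) and the planted observation are assumptions.
set.seed(1)
n <- 30
x <- 1:n
y <- 2 + 0.5 * x + rnorm(n)
y[n] <- y[n] + 10            # plant one observation that does not follow M

fit <- lm(y ~ x)             # the (assumed) model M

## Definition 1: improbable under M.
## If M holds, externally studentized residuals follow a t distribution
## with (residual df - 1) degrees of freedom.
r <- rstudent(fit)
p <- 2 * pt(-abs(r), df = df.residual(fit) - 1)   # two-sided tail probability

## Definition 2: unduly large influence on the fitted model.
cd <- cooks.distance(fit)

res <- data.frame(obs = x, rstudent = r, p.value = p, cooks.d = cd)
res[order(p)[1:3], ]         # the three least probable observations under M
```

Both measures depend entirely on the fitted model: change M and a different set of observations may stand out. Nothing here says what to do with a flagged point; as argued above, that still needs subject-matter judgment.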

So I think outlier is a useful concept for those who think about
*models* (rather than just data sets), and I agree that without
an implicit or explicit model, "outlier" is not well defined.

> The perils of a mailing list.
> -- Bert :-)

Martin

> On Wed, Jul 18, 2012 at 6:27 AM, Sajeeka Nanayakkara .. wrote:
>>
>> What is the R code to check whether data series have
>> outliers or not?
>>
>> Thanks,
>>
>> Sajeeka Nanayakkara

> --
> Bert Gunter
> Genentech Nonclinical Biostatistics