Re: [R] Removing Outliers Function

David Winsemius Tue, 08 Feb 2011 20:46:22 -0800

Exactly right. I use the phrase to catch the unwary's attention. I think the effect is properly placed on the y-axis.

IIRC, Ben Bolker (or was it Bert Gunter?) has also commented in the R-help or r-devel pages on this curious inversion of functional meaning.


--
David
On Feb 8, 2011, at 10:36 PM, Ravi Varadhan wrote:

David,

Please allow me to digress a lot here. You are one of the few (including yours truly!) who use the phrase "shallow learning curve" to indicate difficulty of learning (I assume this is what you meant). I always felt that "steep learning curve" was incorrect. If you plotted the amount of learning on the Y-axis and time on the X-axis, a steep learning curve would mean that one learns very quickly, but this is just the opposite of what is actually meant.


Best,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
Date: Tuesday, February 8, 2011 10:09 pm
Subject: Re: [R] Removing Outliers Function
To: kirtau <kir...@live.com>
Cc: r-help@r-project.org

On Feb 8, 2011, at 9:11 PM, kirtau wrote:


I am working on a function that will remove outliers for regression
analysis. I am stating that a data point is an outlier if its studentized
residual is above or below 3 and -3, respectively. The code below is what
I have thus far for the function:

x = c(1:20)
y = c(1,3,4,2,5,6,18,8,10,8,11,13,14,14,15,85,17,19,19,20)
data1 = data.frame(x,y)


rm.outliers = function(dataset,dependent,independent){
 dataset$predicted = predict(lm(dependent~independent))
 dataset$stdres = rstudent(lm(dependent~independent))
 m = 1
 for(i in 1:length(dataset$stdres)){
   dataset$outlier_counter[i] = if(dataset$stdres[i] >= 3 |
dataset$stdres[i] <= -3) {m} else{0}
 }
 j = length(which(dataset$outlier_counter >= 1))
 while(j>=1){
   print(dataset[which(dataset$outlier_counter >= 1),])
   dataset = dataset[which(dataset$outlier_counter == 0),]
   dataset$predicted = predict(lm(dependent~independent))
   dataset$stdres = rstudent(lm(dependent~independent))
     m = m+1
     for(k in 1:length(dataset$stdres)){
       dataset$outlier_counter[k] = if(dataset$stdres[k] >= 3 |
dataset$stdres[k] <= -3) {m} else{0}
     }
   j = length(which(dataset$outlier_counter >= 1))
 }
 return(dataset)
}

The problem I run into is that I receive this error when I type

rm.outliers(data1,data1$y,data1$x)

"    x  y predicted   stdres outlier_counter
16 16 85  22.98647 24.04862               1

Error in `$<-.data.frame`(`*tmp*`, "predicted", value = c(0.114285714285714,  :
  replacement has 20 rows, data has 19"

Note: the outlier_counter variable is used to state which "round" of the
loop the data point was marked as an outlier.
This would be a HUGE help to me and a few buddies who run a lot of
different regression tests.


The solution is about 3 or 4 lines of code to make the function, but
removing outliers like this is simply statistical malpractice. Maybe
it's a good thing that R has a shallow learning curve.
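
For illustration only (this is not code posted in the thread): a minimal
sketch of what such a short function might look like. The name rm_outliers
and the arguments form and cutoff are hypothetical. The error quoted above
appears to arise because dependent and independent are the original
20-element vectors, so every refit still produces 20 fitted values while
dataset has shrunk to 19 rows. Passing a formula together with
data = dataset makes each refit use only the rows that remain. The
statistical objection stands regardless of whether the code runs.

```r
# Hypothetical helper: repeatedly refit and drop rows whose externally
# studentized residual is at or beyond +/- cutoff.
rm_outliers <- function(dataset, form, cutoff = 3) {
  repeat {
    fit  <- lm(form, data = dataset)               # refit on current rows only
    drop <- which(abs(rstudent(fit)) >= cutoff)    # positions of flagged rows
    if (length(drop) == 0) return(dataset)         # nothing flagged: done
    print(dataset[drop, , drop = FALSE])           # show what is being removed
    dataset <- dataset[-drop, , drop = FALSE]
  }
}

# Data from the original post
x <- 1:20
y <- c(1, 3, 4, 2, 5, 6, 18, 8, 10, 8, 11, 13, 14, 14, 15, 85, 17, 19, 19, 20)
data1 <- data.frame(x, y)

cleaned <- rm_outliers(data1, y ~ x)
```

Because the loop refits after every removal, it can keep discarding points
that only look extreme once the previous ones are gone, which is part of
why this approach is risky.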

--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list

PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.


