To answer part 2: You should read up on statistical distributions and on when a sample is (or isn't) large enough to produce reliable estimates of parameters such as the mean or variance. I suspect David was implying that your yardstick, based on studentized residuals, removes valid samples.

I once wrote a simple bit of code (back when I had to do things in C rather than R :-( ) that removed data points lying more than N*sigma away from the current fit, where N was 3 or 4. Even that is sloppy, as it doesn't take the sample size or other fit parameters into account, but it's a lot easier than your setup.
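
A minimal sketch of that kind of N*sigma filter is below, written in Python for brevity rather than C or R. It is not the original code: the names fit_line and sigma_clip, the choice of a straight-line fit, the use of the residual standard deviation as sigma, and the iteration cap are all assumptions made for illustration. It carries the same weakness noted above, since the cutoff ignores sample size and leverage.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sigma_clip(xs, ys, n_sigma=3.0, max_iter=10):
    """Refit repeatedly, dropping points more than n_sigma*sigma off the fit.

    Returns (kept_indices, slope, intercept), where the fit is the one
    computed from the points that survive.
    """
    keep = list(range(len(xs)))
    for _ in range(max_iter):
        kx = [xs[i] for i in keep]
        ky = [ys[i] for i in keep]
        slope, intercept = fit_line(kx, ky)
        resid = [y - (slope * x + intercept) for x, y in zip(kx, ky)]
        # Residual standard deviation with n - 2 degrees of freedom.
        sigma = math.sqrt(sum(r * r for r in resid) / (len(keep) - 2))
        if sigma == 0.0:
            break  # exact fit, nothing to clip
        survivors = [i for i, r in zip(keep, resid)
                     if abs(r) <= n_sigma * sigma]
        # Stop when nothing was removed or too few points would remain.
        if len(survivors) == len(keep) or len(survivors) < 3:
            break
        keep = survivors
    # Final fit on whatever points are left.
    slope, intercept = fit_line([xs[i] for i in keep],
                                [ys[i] for i in keep])
    return keep, slope, intercept
```

Calling sigma_clip(x, y, n_sigma=3) returns the indices of the retained points along with the final slope and intercept. Note that one large outlier inflates sigma on the first pass, so a smaller outlier can hide behind it until a later iteration, which is one reason this approach is only a rough tool.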


Carl


<quote>
From: kirtau <kirtau_at_live.com>
Date: Wed, 09 Feb 2011 10:06:07 -0800 (PST)

I have two questions,

1. If the solution is only three or four lines of code, is there any way you can share those lines, without disrespecting me further?
2. Can you explain why you feel that this is "statistical malpractice"?
</quote>

