I tried
> lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x) shapiro.test(x$GRADE))
and I got
>Error in shapiro.test(x$GRADE.) : sample size must be between 3 and 5000
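One likely cause of this error, offered as a guess rather than a diagnosis of the real data: split() with a list of two factors builds one group for every TERM x INST_NUM combination, including combinations that never occur in the data. Those empty groups reach shapiro.test() with zero observations. As far as I know ave() groups the same way, which would also explain why a length check done through ave() looks fine: empty groups contribute no elements to its result. A minimal sketch on hypothetical toy data (the data frame `toy` and its values are made up for illustration), using the drop argument of split():

```r
# Toy data (hypothetical): instructor 1 did not teach in term 8,
# so the combination TERM = 8, INST_NUM = 1 never occurs.
toy <- data.frame(
  GRADE    = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5),
  TERM     = factor(c(9, 9, 9, 8, 8, 8, 9, 9, 9)),
  INST_NUM = factor(c(1, 1, 1, 2, 2, 2, 2, 2, 2))
)

# Default drop = FALSE keeps every TERM x INST_NUM combination,
# including the one with no rows.
all_groups  <- split(toy, list(toy$TERM, toy$INST_NUM))
group_sizes <- sapply(all_groups, nrow)

# drop = TRUE discards combinations that never occur,
# so shapiro.test() only sees non-empty groups.
used_groups <- split(toy, list(toy$TERM, toy$INST_NUM), drop = TRUE)
tests <- lapply(used_groups, function(x) shapiro.test(x$GRADE))
```

Inspecting group_sizes on the real data should show whether any group has fewer than 3 rows. Three-point groups like these only demonstrate the mechanics; they are far too small for the test to say anything useful.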
I also tried
with(Clean, aggregate(GRADE,list(TERM,INST_NUM),FUN=shapiro.test))
and got

   Group.1 Group.2         x
1   201001  689809 0.9546164
2   201201  689809 0.9521624
3   201301  689809 0.9106206
4   200701  994474 0.8862705
5   200710  994474 0.9176743
6   201203 1105752 0.9382688
. . .
72  201001 1759272 0.9291295
73  201101 1759272 0.9347072
74  201110 1897809 0.9395375
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

I am not sure how to interpret the output of the second.

Thanks!

On Tue, Aug 13, 2013 at 11:01 AM, arun <smartpink...@yahoo.com> wrote:

> Hi,
> You could try:
> lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x) shapiro.test(x$GRADE))
> A.K.
>
> ----- Original Message -----
> From: Robert Lynch <robert.b.ly...@gmail.com>
> To: r-help@r-project.org
> Cc:
> Sent: Tuesday, August 13, 2013 1:46 PM
> Subject: [R] ave function
>
> I've written the following function
> CoursePrep <- function (Source, SaveName) {
>
> Clean$TERM <- as.factor(Clean$TERM)
> Clean$INST_NUM <- as.factor(Clean$INST_NUM)
> Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
> write.csv(Clean,paste(SaveName, "csv", sep ="."), row.names = FALSE)
> return(Clean)
> }
>
> which is all well and good, but I want to throw a shapiro.test in before I
> normalize. That is, I don't really understand quite how I did (I got help)
> what I wanted to in the
> Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
> That code, for the whole of Clean, finds all sets of GRADE.'s that have the
> same INST_NUM and TERM, computes a mean, subtracts off the mean and divides
> by the standard deviation. I would like, for each one of those sets of
> grades, to call shapiro.test() on the set, to see if it is normal *before* I
> assume it is.
>
> I know the naive
> with(Clean, shapiro.test( list(TERM, INST_NUM)))
> doesn't work.
> with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = function(x)shapiro.test(x)))
>
> which returns
> Error in shapiro.test(x) : sample size must be between 3 and 5000
> and I have checked that the sets selected are all of length between 3 and
> 5000.
> using the following on my full data
>
> ClassSize <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = function(x)length(x)))
>
> summary(ClassSize)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>    22.0   198.0   241.0   244.4   279.0   466.0
>
> here is some sample data
> GRADE TERM INST_NUM
> 1, 9, 1
> 2, 9, 1
> 3, 9, 1
> 1.5, 8, 2
> 1.75, 8, 2
> 2, 8, 2
> 0.5, 9, 2
> 2, 9, 2
> 3.5, 9, 2
> 3.5, 8, 1
> 3.75, 8, 1
> 4, 8, 1
>
> and hopefully the code would test the following set of grades
> (1,2,3)(1.5,1.75,2)(0.5,2,3.5)(3.5,3.75,4)
>
> Thanks Robert
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
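On interpreting the aggregate() output in the reply above: shapiro.test() returns a whole test object (statistic, p-value, method, data name), not a single number. aggregate() probably cannot store that cleanly in one column, which would account for the "corrupt data frame" warning, and the x values shown look like the W statistics rather than p-values. A sketch that returns one number per group instead, using the sample data from the original post (the column name shapiro_p is an assumption chosen for illustration):

```r
# Sample data from the original post
Clean <- data.frame(
  GRADE    = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5, 3.5, 3.75, 4),
  TERM     = factor(c(9, 9, 9, 8, 8, 8, 9, 9, 9, 8, 8, 8)),
  INST_NUM = factor(c(1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1))
)

# Extract only the p-value so that each group yields a single number
# and aggregate() can build an ordinary numeric column.
pvals <- aggregate(GRADE ~ TERM + INST_NUM, data = Clean,
                   FUN = function(g) shapiro.test(g)$p.value)

# Rename the result column (shapiro_p is a hypothetical name)
names(pvals)[3] <- "shapiro_p"
```

The formula interface only produces rows for TERM and INST_NUM combinations that actually occur, so empty groups are never passed to shapiro.test(). With three grades per group the p-values here carry no real information; on the full data, each row would give the p-value for one class.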