Hi Robert,
source("shareB101")
##Clean is the dataset
res1<-with(Clean,aggregate(GRADE,list(TERM,INST_NUM),FUN=function(x) cbind(shapiro.test(x)$p.value,shapiro.test(x)$statistic)) )
head(res1)
#  Group.1 Group.2          x.1          x.2
#1  201001  689809 1.720329e-07 9.307362e-01
#2  201201  689809 2.029761e-11 9.139405e-01
#3  201301  689809 4.709662e-14 8.791063e-01
#4  200701  994474 3.695317e-14 7.939902e-01
#5  200710  994474 4.560275e-13 8.849943e-01
#6  201203 1105752 4.434649e-15 9.220643e-01

#Regarding the lapply() error, it was the same problem as I thought:
lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x) shapiro.test(x$GRADE))
#Error in shapiro.test(x$GRADE) : sample size must be between 3 and 5000
lst1<-split(Clean,list(Clean$TERM,Clean$INST_NUM))
lst2<- lapply(lst1[lapply(lst1,nrow)>0], function(x) shapiro.test(x$GRADE))
lst2[[1]]
#    Shapiro-Wilk normality test
#
#data:  x$GRADE
#W = 0.9307, p-value = 1.72e-07

library(plyr)
res2<- ldply(dlply(Clean,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)), summarize, pval=p.value,stat1=statistic)
head(res2)
#    TERM INST_NUM         pval     stat1
#1 200610  1106842 1.420787e-11 0.9192428
#2 200610  1324438 2.345177e-12 0.9048394
#3 200610  1557630 4.618117e-10 0.8968445
#4 200701   994474 3.695317e-14 0.7939902
#5 200701  1106842 2.745429e-08 0.9292158
#6 200701  1107019 6.887642e-10 0.9213602

A.K.

________________________________
From: Robert Lynch <robert.b.ly...@gmail.com>
To: arun <smartpink...@yahoo.com>
Sent: Wednesday, August 21, 2013 4:49 PM
Subject: Re: [R] ave function

Arun--
Thanks, I had no idea about dput. I really appreciate your help.
I have attached an example data set from dput. Not to worry, the ID#s have
been changed, but I wanted to include them just in case they were part of
the issue (though I doubt it).

On Tue, Aug 20, 2013 at 7:27 PM, arun <smartpink...@yahoo.com> wrote:

Hi,
>
>
>I guess your original dataset would have some list elements as empty.
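
To make the point about empty list elements concrete, here is a minimal sketch. It assumes the small Clean1 example data quoted further down in this thread (copied into the block so it runs on its own); the object names all_groups, group_sizes, empty_groups, used_groups and pvals are only illustrative. It also swaps in split(..., drop = TRUE) from base R as an alternative to the nrow() > 0 filter used in this thread, so treat it as one possible approach rather than the method used above.

```r
# Clean1 as posted in this thread: instructor 1 has terms 8, 9 and 10,
# instructor 2 only has terms 8 and 9.
Clean1 <- data.frame(
  GRADE    = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5, 3.5, 3.75, 4, 4.5, 4.25, 4.32),
  TERM     = c(9L, 9L, 9L, 8L, 8L, 8L, 9L, 9L, 9L, 8L, 8L, 8L, 10L, 10L, 10L),
  INST_NUM = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L)
)

# split() on a list of two grouping vectors builds every TERM x INST_NUM
# combination, including combinations that never occur in the data.
all_groups  <- split(Clean1, list(Clean1$TERM, Clean1$INST_NUM))
group_sizes <- sapply(all_groups, nrow)

# Names of the 0-row pieces; these are what make shapiro.test() stop with
# "sample size must be between 3 and 5000".
empty_groups <- names(group_sizes)[group_sizes == 0]

# drop = TRUE discards the unused combinations before the test is applied.
used_groups <- split(Clean1, list(Clean1$TERM, Clean1$INST_NUM), drop = TRUE)
pvals <- sapply(used_groups, function(x) shapiro.test(x$GRADE)$p.value)
```

Printing empty_groups shows which TERM.INST_NUM combinations have no rows, which is a quick way to check whether the full data has the same problem before running the test.
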
>
>Clean<- structure(list(GRADE = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5,
>3.5, 3.75, 4), TERM = c(9L, 9L, 9L, 8L, 8L, 8L, 9L, 9L, 9L, 8L,
>8L, 8L), INST_NUM = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
>1L, 1L)), .Names = c("GRADE", "TERM", "INST_NUM"), class = "data.frame",
>row.names = c(NA,
>-12L))
>
>lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x)
>shapiro.test(x$GRADE))
>#$`8.1`
>#
>#    Shapiro-Wilk normality test
>#
>#data:  x$GRADE
>#W = 1, p-value = 1
>#
>
>#$`9.1`
>#
>#    Shapiro-Wilk normality test
>#
>#data:  x$GRADE
>#W = 1, p-value = 1
>
>-----------------------------------------------------
>sapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x)
>shapiro.test(x$GRADE)$p.value)
>#8.1 9.1 8.2 9.2
>#  1   1   1   1
>with(Clean, aggregate(GRADE,list(TERM,INST_NUM),FUN=shapiro.test)) #the output is a list,
>#  Group.1 Group.2 x
>#1       8       1 1
>#2       9       1 1
>#3       8       2 1
>#4       9       2 1
>#Warning message:
>#In format.data.frame(x, digits = digits, na.encode = FALSE) :
>#  corrupt data frame: columns will be truncated or padded with NAs
>
>
>
>library(plyr)
>ldply(dlply(Clean,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)),
>summarize, pval=p.value)
>#  TERM INST_NUM pval
>#1    8        1    1
>#2    8        2    1
>#3    9        1    1
>#4    9        2    1
>
>
>
>Now, consider this example:
>
>Clean1<- structure(list(GRADE = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5,
>3.5, 3.75, 4, 4.5, 4.25, 4.32), TERM = c(9L, 9L, 9L, 8L, 8L,
>8L, 9L, 9L, 9L, 8L, 8L, 8L, 10L, 10L, 10L), INST_NUM = c(1L,
>1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("GRADE",
>"TERM", "INST_NUM"), class = "data.frame", row.names = c(NA,
>-15L))
>lapply(split(Clean1,list(Clean1$TERM,Clean1$INST_NUM)),function(x)
>shapiro.test(x$GRADE))
>#Error in shapiro.test(x$GRADE) : sample size must be between 3 and 5000
>
>split(Clean1,list(Clean1$TERM,Clean1$INST_NUM))[[6]] ###0 rows
>#[1] GRADE    TERM     INST_NUM
>#<0 rows> (or 0-length row.names)
>
>
>lst1<-split(Clean1,list(Clean1$TERM,Clean1$INST_NUM))
>lapply(lst1[lapply(lst1,nrow)>0], function(x) shapiro.test(x$GRADE))
>#$`8.1`
>#
>#    Shapiro-Wilk normality test
>#
>#data:  x$GRADE
>#W = 1, p-value = 1
>
>
>You could do this directly with:
>ldply(dlply(Clean1,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)),
>summarize, pval=p.value)
>#  TERM INST_NUM      pval
>#1    8        1 1.0000000
>#2    8        2 1.0000000
>#3    9        1 1.0000000
>#4    9        2 1.0000000
>#5   10        1 0.5248807
>ldply(dlply(Clean1,.(TERM,INST_NUM), function(x) shapiro.test(x$GRADE)),
>summarize, pval=p.value,stat1=statistic)
>#  TERM INST_NUM      pval     stat1
>#1    8        1 1.0000000 1.0000000
>#2    8        2 1.0000000 1.0000000
>#3    9        1 1.0000000 1.0000000
>#4    9        2 1.0000000 1.0000000
>#5   10        1 0.5248807 0.9393788
>
>
>
>#or
>with(Clean1, aggregate(GRADE,list(TERM,INST_NUM),FUN=function(x)
>shapiro.test(x)$p.value))
>  Group.1 Group.2         x
>1       8       1 1.0000000
>2       9       1 1.0000000
>3      10       1 0.5248807
>4       8       2 1.0000000
>5       9       2 1.0000000
>
>#If you want both pvalue and statistic
>with(Clean1, aggregate(GRADE,list(TERM,INST_NUM),FUN=function(x)
>cbind(shapiro.test(x)$p.value,shapiro.test(x)$statistic)) )
>#  Group.1 Group.2       x.1       x.2
>#1       8       1 1.0000000 1.0000000
>#2       9       1 1.0000000 1.0000000
>#3      10       1 0.5248807 0.9393788
>#4       8       2 1.0000000 1.0000000
>#5       9       2 1.0000000 1.0000000
>
>
>Hope this helps.
>
>
>A.K.
>
>
>________________________________
>From: Robert Lynch <robert.b.ly...@gmail.com>
>To: arun <smartpink...@yahoo.com>
>Cc: R help <r-help@r-project.org>
>Sent: Tuesday, August 20, 2013 8:00 PM
>Subject: Re: [R] ave function
>
>
>I tried
>> lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x)
>> shapiro.test(x$GRADE))
>
>and I got
>
>>Error in shapiro.test(x$GRADE.) : sample size must be between 3 and 5000
>
>
>I also tried
>with(Clean, aggregate(GRADE,list(TERM,INST_NUM),FUN=shapiro.test))
>
>
>and got
>   Group.1 Group.2         x
>1   201001  689809 0.9546164
>2   201201  689809 0.9521624
>3   201301  689809 0.9106206
>4   200701  994474 0.8862705
>5   200710  994474 0.9176743
>6   201203 1105752 0.9382688
>.
>.
>.
>72  201001 1759272 0.9291295
>73  201101 1759272 0.9347072
>74  201110 1897809 0.9395375
>Warning message:
>In format.data.frame(x, digits = digits, na.encode = FALSE) :
>  corrupt data frame: columns will be truncated or padded with NAs
>
>I am not sure how to interpret the output of the second.
>
>Thanks!
>
>
>
>On Tue, Aug 13, 2013 at 11:01 AM, arun <smartpink...@yahoo.com> wrote:
>
>Hi,
>>You could try:
>> lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x)
>>shapiro.test(x$GRADE))
>>A.K.
>>
>>
>>----- Original Message -----
>>From: Robert Lynch <robert.b.ly...@gmail.com>
>>To: r-help@r-project.org
>>Cc:
>>Sent: Tuesday, August 13, 2013 1:46 PM
>>Subject: [R] ave function
>>
>>I've written the following function
>>CoursePrep <- function (Source, SaveName) {
>>
>>
>> Clean$TERM <- as.factor(Clean$TERM)
>>
>> Clean$INST_NUM <- as.factor(Clean$INST_NUM)
>> Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
>>scale))
>> write.csv(Clean,paste(SaveName, "csv", sep ="."), row.names = FALSE)
>> return(Clean)
>>}
>>
>>which is all well and good, but I want to throw a shapiro.test in before I
>>normalize. That is, I don't really understand quite how I did (I got help)
>>what I wanted to in the line
>>Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
>>For the whole of Clean, that code finds all sets of GRADE. values that have
>>the same INST_NUM and TERM, computes a mean, subtracts off the mean, and
>>divides by the standard deviation. For each one of those sets of grades I
>>would like to call shapiro.test() on the set, to see if it is normal
>>*before* I assume it is.
>>
>>I know the naive
>>with(Clean, shapiro.test( list(TERM, INST_NUM)))
>>doesn't work. Nor does
>>with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
>>function(x)shapiro.test(x)))
>>
>>which returns
>>Error in shapiro.test(x) : sample size must be between 3 and 5000
>>and I have checked that the sets selected are all of length between 3 and
>>5000.
>>using the following on my full data
>>
>>ClassSize <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
>>function(x)length(x)))
>>> summary(ClassSize)
>>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   22.0   198.0   241.0   244.4   279.0   466.0
>>
>>here is some sample data
>>GRADE TERM INST_NUM
>>1, 9, 1
>>2, 9, 1
>>3, 9, 1
>>1.5, 8, 2
>>1.75, 8, 2
>>2, 8, 2
>>0.5, 9, 2
>>2, 9, 2
>>3.5, 9, 2
>>3.5, 8, 1
>>3.75, 8, 1
>>4, 8, 1
>>
>>and hopefully the code would test the following set of grades
>>(1,2,3)(1.5,1.75,2)(0.5,2,3.5)(3.5,3.75,4)
>>
>>Thanks Robert
>>
>>______________________________________________
>>R-help@r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.