Thanks. Before I never used findInterval function. It seems very nice. On Wed, Apr 6, 2011 at 11:20 PM, David Winsemius <dwinsem...@comcast.net> wrote: > > On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote: > >> Dear Henrique Dallazuanna, >> >> Thank you very much for your suggestion. >> >> It is obvious that your method is better than me. >> >> Is it possible to use cut, table,by etc? Whether there is some >> aggregate function in R can do this? >> >> Thanks. >> >> On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna <www...@gmail.com> >> wrote: >>> >>> Try this: >>> >>> fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) + >>> 1)], '>=', e1 = dat[,1]) >>> mm <- apply(fil, 2, function(idx)mean(dat[idx, 2])) >>> >>> On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.c...@gmail.com> >>> wrote: >>>> >>>> Dear list, >>>> >>>> I have a dataframe with two column as fellow. >>>> >>>>> head(dat) >>>> >>>> V1 V2 >>>> 0.15624 0.94567 >>>> 0.26039 0.66442 >>>> 0.16629 0.97822 >>>> 0.23474 0.72079 >>>> 0.11037 0.83760 >>>> 0.14969 0.91312 >>>> >>>> I want to get the column V2 mean value based on the bin of column of >>>> V1. I write the code as fellow. It works, but I think this is not the >>>> elegant way. Any suggestions? >>>> >>>> dat<-read.table("dat.txt",head=F) >>>> ran<-seq(0,0.5,0.05) >>>> mm<-NULL >>>> for (i in c(1:(length(ran)-1))) >>>> { >>>> fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1] >>>> m<-mean(dat[fil,2]) >>>> mm<-c(mm,m) >>>> } >>>> mm >>>> >>>> Here is the first 20 lines of my data. >>>> >>>>> dput(head(dat,20)) >>>> >>>> structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037, >>>> 0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856, >>>> 0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238, >>>> 0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376, >>>> 0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424, >>>> 0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022 >>>> )), .Names = c("V1", "V2"), row.names = c(NA, 20L), class = >>>> "data.frame") >>>> >>>> ______________________________________________ > > Here is how I would have done it with findInterval and tapply which is very > similar to using a `cut` and `table` approach: > >> dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) ) >> tapply(dat$V2, dat$grp, mean) > 1 2 3 4 5 6 8 > 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900 > #####--------------- > > You do not get exactly the same form of the result as with Henrique's > method. His yields: >> mm > [1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 NaN > [8] 0.6978900 NaN NaN NaN > > ####---------------- > > The cut approach would yield this, which is more informatively labeled. (I'm > wasn't completely sure the second to last word in the prior sentence was a > real word, but several dictionaries seem to think so.): > >> dat$grp2 <- cut(dat$V1 , breaks=ran) >> tapply(dat$V2, dat$grp2, mean) > (0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3] > 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 > (0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5] > NA 0.6978900 NA NA > > >> > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.