On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:
Dear Henrique Dallazuanna,
Thank you very much for your suggestion.
It is obvious that your method is better than me.
Is it possible to use cut, table,by etc? Whether there is some
aggregate function in R can do this?
Thanks.
On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna
<www...@gmail.com> wrote:
Try this:
fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.c...@gmail.com
> wrote:
Dear list,
I have a dataframe with two column as fellow.
head(dat)
V1 V2
0.15624 0.94567
0.26039 0.66442
0.16629 0.97822
0.23474 0.72079
0.11037 0.83760
0.14969 0.91312
I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not
the
elegant way. Any suggestions?
dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
m<-mean(dat[fil,2])
mm<-c(mm,m)
}
mm
Here is the first 20 lines of my data.
dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
"data.frame")
______________________________________________
Here is how I would have done it with findInterval and tapply which is
very similar to using a `cut` and `table` approach:
> dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
> tapply(dat$V2, dat$grp, mean)
1 2 3 4 5 6 8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------
You do not get exactly the same form of the result as with Henrique's
method. His yields:
> mm
[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450
0.7269900 NaN
[8] 0.6978900 NaN NaN NaN
####----------------
The cut approach would yield this, which is more informatively
labeled. (I'm wasn't completely sure the second to last word in the
prior sentence was a real word, but several dictionaries seem to think
so.):
> dat$grp2 <- cut(dat$V1 , breaks=ran)
> tapply(dat$V2, dat$grp2, mean)
(0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
NA 0.6978900 NA NA
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.