Re: [R] Calculated mean value based on another column bin from dataframe.

David Winsemius Wed, 06 Apr 2011 14:48:54 -0700


On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:

Dear Henrique Dallazuanna,

Thank you very much for your suggestion.

It is obvious that your method is better than me.

Is it possible to use cut, table,by etc? Whether there is some
aggregate function in R can do this?

Thanks.

On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna<www...@gmail.com> wrote:

Try this:

fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))

On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.c...@gmail.com> wrote:

Dear list,

I have a dataframe with two column as fellow.

head(dat)

      V1      V2
 0.15624 0.94567
 0.26039 0.66442
 0.16629 0.97822
 0.23474 0.72079
 0.11037 0.83760
 0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of

V1. I write the code as fellow. It works, but I think this is notthe

elegant way. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
   fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
   m<-mean(dat[fil,2])
   mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))

structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022

)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class ="data.frame")


______________________________________________

Here is how I would have done it with findInterval and tapply which isvery similar to using a `cut` and `table` approach:


> dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
> tapply(dat$V2, dat$grp, mean)
        1         2         3         4         5         6         8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------

You do not get exactly the same form of the result as with Henrique'smethod. His yields:

> mm

[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.84934500.7269900 NaN

 [8] 0.6978900       NaN       NaN       NaN

####----------------

The cut approach would yield this, which is more informativelylabeled. (I'm wasn't completely sure the second to last word in theprior sentence was a real word, but several dictionaries seem to thinkso.):


> dat$grp2 <- cut(dat$V1 , breaks=ran)
> tapply(dat$V2, dat$grp2, mean)
  (0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
 0.9252300  0.8836100  0.9135429  0.9213600  0.8493450  0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
        NA  0.6978900         NA         NA


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculated mean value based on another column bin from dataframe.

Reply via email to