Let me just reply to myself. Sorry, it's funny how much I don't get this, but it appears Ray follows what you're asking and has provided an answer, so scratch my email; it seems to be way off
(you should still learn plyr and/or data.table if you haven't yet, tho ;-)

Apologies,
-steve

On Tue, Jan 10, 2012 at 7:18 PM, Steve Lianoglou <mailinglist.honey...@gmail.com> wrote:
> I'm having a really difficult time understanding what you're trying to
> get -- copying and pasting your code fails to run, and your question
> isn't clear, ie:
>
> "For each phone call that BEGINS with the module which is denoted by 81
> (i.e. of the form 81X,XXX), what is the expected number of modules in these
> calls?"
>
> How does one calculate the expected number of "modules" in this
> module? What does that even mean?
>
> Anyway, here's some code using your `data` data.frame that calculates the
> number of unique calls and other statistics on the "call id" within
> each module prefix. I'm using both data.table and plyr ... there are
> no for loops.
>
> You will want to do `whatever it is you really want to do` inside the
> "blocks" below.
>
> ## R code
> library(data.table)
>
> ## add a column holding the first two characters of each module id
> data <- transform(data, module.prefix=substring(modules, 1, 2))
>
> ## take a look at `data` now
>
> ## calculate "stuff" inside each module.prefix using data.table
> xx <- data.table(data, key="module.prefix")
>
> ans <- xx[, {
>   ## the columns of the particular subset of your data.table
>   ## are "injected" into the scope for this expression block
>   ## which is where the `calls` variable below comes from
>   tabled <- table(as.character(calls))
>   ## you will want to return your own list of "stuff"
>   list(unique.calls=length(tabled), min=min(tabled),
>        median=as.numeric(median(tabled)), max=max(tabled))
> }, by='module.prefix']
>
>
> ## with plyr
> library(plyr)
> ans <- ddply(data, "module.prefix", function(x) {
>   ## `x` is a data.frame whose rows all share the same module.prefix
>   ## do whatever you want with it here
>   tabled <- table(as.character(x$calls))
>   c(unique.calls=length(tabled), min=min(tabled),
>     median=median(tabled), max=max(tabled))
> })
>
> You'll have to read up on the particulars of data.table and plyr. Both
> are really powerful packages ... you should get familiar with at least
> one.
>
> plyr is a bit more flexible in some ways.
>
> data.table is a bit more strict (cf. the need for
> `as.numeric(median(tabled))`), but also tends to be (much) faster when
> working over large datasets.
>
> HTH,
> -steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact