Yeah -- just fired off an apology email before this landed in my inbox. Sometimes I'm better off not trying to help at all -- this was one of those cases ;-)
Whatever I was trying to do was clearly going down the wrong trail. Thankfully, you're on top of it, though.

Sorry for the spam,
-steve

On Tue, Jan 10, 2012 at 7:33 PM, Ray Brownrigg <ray.brownr...@ecs.vuw.ac.nz> wrote:
> Steve:
>
> I don't understand why you couldn't get the original code working. You just
> have to notice that one comment overflows its line.
>
> However, I couldn't get your code to match the output of the original:
> almost, but not quite!
>
> Ray
>
> On Wed, 11 Jan 2012, Steve Lianoglou wrote:
>> I'm having a really difficult time understanding what you're trying to
>> get -- copying and pasting your code fails to run, and your question
>> isn't clear, i.e.:
>>
>> "For each phone call that BEGINS with the module which is denoted by 81
>> (i.e. of the form 81X,XXX), what is the expected number of modules in these
>> calls?"
>>
>> How does one calculate the expected number of "modules" in this
>> module? What does that even mean?
>>
>> Anyway, here's some code using your `data` data.frame that calculates the
>> number of unique calls and other statistics on the "call id" within
>> each module prefix. I'm using both data.table and plyr ... there are
>> no for loops.
>>
>> You will want to do `whatever it is you really want to do` inside the
>> "blocks" below.
>>
>> ## R code
>> data <- transform(data, module.prefix=substring(modules, 1, 2))
>>
>> ## take a look at `data` now
>>
>> ## calculate "stuff" inside each module.prefix using data.table
>> library(data.table)
>> xx <- data.table(data, key="module.prefix")
>>
>> ans <- xx[, {
>>   ## the columns of the particular subset of your data.table
>>   ## are "injected" into the scope for this expression block,
>>   ## which is where the `calls` variable below comes from
>>   tabled <- table(as.character(calls))
>>   ## you will want to return your own list of "stuff"
>>   list(unique.calls=length(tabled), min=min(tabled),
>>        median=as.numeric(median(tabled)), max=max(tabled))
>> }, by='module.prefix']
>>
>> ## with plyr
>> library(plyr)
>> ans <- ddply(data, "module.prefix", function(x) {
>>   ## `x` is a data.frame whose rows all share the same module.prefix;
>>   ## do whatever you want with it here
>>   tabled <- table(as.character(x$calls))
>>   c(unique.calls=length(tabled), min=min(tabled),
>>     median=median(tabled), max=max(tabled))
>> })
>>
>> You'll have to read up on the particulars of data.table and plyr. Both
>> are really powerful packages ... you should get familiar with at least
>> one.
>>
>> plyr is a bit more flexible in some ways.
>>
>> data.table is a bit more strict (every group has to return columns of
>> the same type, cf. the need for `as.numeric(median(tabled))`), but also
>> tends to be (much) faster when working over large datasets.
>>
>> HTH,
>> -steve
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
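For readers who have neither data.table nor plyr installed, the same per-prefix summary can be sketched in base R with split() and lapply(). This is a swapped-in technique, not the approach used in the thread. The toy data frame `toy_data` and its values are hypothetical; they only mirror the `modules` and `calls` column names from the thread. The last two lines illustrate the type issue behind `as.numeric(median(tabled))`: median() of an integer vector returns an integer for odd lengths and a double for even lengths. As a result, different groups could otherwise return different column types.

```r
# Hypothetical toy data: column names mirror the thread, values are invented.
toy_data <- data.frame(
  modules = c("81A001", "81A002", "81B003", "82C001", "82C002", "83D001"),
  calls   = c("c1",     "c1",     "c2",     "c3",     "c3",     "c3"),
  stringsAsFactors = FALSE
)

# Same two-character prefix column as in the thread.
toy_data <- transform(toy_data, module.prefix = substring(modules, 1, 2))

# Summarise how often each call id appears within one prefix group.
# as.numeric() keeps the median a double for every group.
summarise_calls <- function(calls) {
  tabled <- table(as.character(calls))
  c(unique.calls = length(tabled),
    min          = min(tabled),
    median       = as.numeric(median(tabled)),
    max          = max(tabled))
}

# split() groups the call ids by prefix; lapply() summarises each group;
# rbind() stacks the per-group named vectors into a matrix.
per_prefix <- lapply(split(toy_data$calls, toy_data$module.prefix), summarise_calls)
ans <- do.call(rbind, per_prefix)
ans <- data.frame(module.prefix = rownames(ans), ans,
                  row.names = NULL, stringsAsFactors = FALSE)
print(ans)

# Why the data.table version needs as.numeric(): the return type of
# median() on integers depends on whether the length is odd or even.
class(median(c(1L, 2L, 3L)))  # "integer"
class(median(c(1L, 2L)))      # "numeric"
```

The split()/lapply() pattern mirrors the plyr version's function-per-group style. For large datasets, the data.table route discussed above is likely to be faster.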