> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of S. Few
> Sent: Thursday, September 10, 2009 1:46 PM
> To: r-help@r-project.org
> Subject: [R] R 2.9.2 memory max - object vector size
>
> Me:
>
> Win XP
> 4 gig ram
> R 2.9.2
>
> library(foreign)   # to read/write SPSS files
> library(doBy)      # for summaryBy
> library(RODBC)
> setwd("C:\\Documents and Settings\\............00909BR")
> gc()
> memory.limit(size=4000)
>
> ## PROBLEM:
>
> I have memory limit problems, in R and otherwise. My data frames for
> merging or subsetting are about 300k to 900k records.
> I've had errors such as "vector size too large". gc() was done, the
> workspace was reset, etc.
>
> This fails:
>
> y$pickseq <- with(y, ave(as.numeric(as.Date(timestamp)), id, FUN=seq))
>
> Any clues?
>
> Is this a 2.9.2 issue? Should I go back to version 2.8 or earlier?
>
> Thanks!
> Steve
If any value in id is a singleton, then for that group seq() is called on
the length-1 vector timestamp[id=="singleton"], and seq() applied to a
single number n returns 1:n, a vector whose length is the value of n
rather than the length of the input.  Since
as.numeric(as.Date("2009-09-10")) is 14497, you may be creating a lot of
14497-long vectors that get thrown away, unused except for their initial
value.  Using seq_along instead of seq would take care of that potential
problem.  E.g.,

  > d1 <- data.frame(x=c(2,3,5e9,4,5), id=c("A","B","B","B","A"))
  > d2 <- data.frame(x=c(2,3,5e9,4,5), id=c("A","B","C","B","A"))
  > # d1$id has no singletons; d2$id does, where d2$x is huge
  > with(d1, ave(x, id, FUN=seq))
  [1] 1 1 2 3 2
  > with(d2, ave(x, id, FUN=seq))
  Error in 1L:from : result would be too long a vector
  > with(d2, ave(x, id, FUN=seq_along))
  [1] 1 1 1 2 2

If your intent is to create a vector of within-group sequence numbers,
there are more efficient ways to do it.  E.g., with the following
functions

  # for each element of x, give its sequence number within
  # the group of elements sharing that value of x
  withinGroupSeq <- function(x) {
      x <- as.factor(x)
      retval <- integer(length(x))
      retval[order(as.integer(x))] <- Sequence(table(x))
      retval
  }
  # Sequence is like base::sequence but should use less memory
  # by avoiding the list that sequence's lapply call makes
  Sequence <- function(nvec) {
      seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
  }

you can get the same result as ave(FUN=seq_along) in less time and, I
suspect, less memory:

  > withinGroupSeq(d1$id)
  [1] 1 1 2 3 2
  > withinGroupSeq(d2$id)
  [1] 1 1 1 2 2

Base R may have a function for that already.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
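P.S. Applied to your original call, the fix is a one-word change.  This
is an untested sketch that assumes y, timestamp, and id are as in your
code.  Note that seq_along ignores the actual timestamp values, so if
the within-group row number is all you need, the as.Date conversion can
be dropped entirely:

  ## drop-in fix: seq_along uses only the group's length, so no
  ## huge 1:n vector is ever built from a date value
  y$pickseq <- with(y, ave(as.numeric(as.Date(timestamp)), id,
                           FUN=seq_along))

  ## the same numbers, using the helper defined above and skipping
  ## the date conversion altogether
  y$pickseq <- withinGroupSeq(y$id)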