On Thu, 17 Apr 2008, Rolf Turner wrote:

>
> On 17/04/2008, at 7:52 AM, Matthew Keller wrote:
>
>> Hello all,
>>
>> I should probably know this by now... Anyway:
>>
>> I have a large matrix (dim(data) is 3000 18000). In each element are
>> one of the following character strings "0/0", "1/1", "1/2", "2/2". I
>> wanted to replace "0/0" with NA and the other three with 0,1,2
>> respectively. To accomplish just the first of these four steps I did
>> this:
>>
>> data[data=="0/0"] <- NA
>>
>> Which is still running after 13 hours. I have 18 GB RAM and running 64
>> bit R. What is a more efficient way to accomplish this (I've already
>> done it using sed in UNIX - but want to know how to do so in R)?
>> Thanks in advance.
>
> Well I just did
>
> gorp <- c("0/0","1/1","1/2","2/2")
> mung <- matrix(sample(gorp,54e6,TRUE),3000,18000)
> mung[mung=="0/0"] <- NA
>
> and the whole schmear ran in under half a minute of real time.

Likewise.

I'll lay odds that Matthew's 'matrix' is actually a data.frame, and I'll
not be surprised if the columns are factors.

In which case

	mung2 <- as.data.frame(lapply( mung, function(x) {
		levels(x)[ levels(x)=='0/0' ] <- NA
		x
	} ))

will be faster, but still not as fast as what you show with a matrix.

HTH,

Chuck

>
> > sessionInfo()
> R version 2.6.2 (2008-02-08)
> i386-apple-darwin8.10.1
>
> locale:
> C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] misc_0.0-2
>
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17
>
> I would say that something is seriously snarled up in your system.
>
> cheers,
>
> Rolf Turner
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
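
For the complete recoding Matthew describes ("0/0" to NA, and "1/1", "1/2", "2/2" to 0, 1, 2), a lookup through match() is one possible approach. What follows is only a minimal sketch on a small stand-in matrix, not anything posted in the thread: the names values, recoded, df and recoded_df are made up for illustration, and the data.frame branch assumes the columns really are factors, as guessed above.

```r
# Small hypothetical stand-in for the 3000 x 18000 matrix in the question.
set.seed(1)
gorp <- c("0/0", "1/1", "1/2", "2/2")
mung <- matrix(sample(gorp, 20, TRUE), 4, 5)

# Numeric codes aligned with gorp: "0/0" -> NA, then 0, 1, 2.
values <- c(NA, 0, 1, 2)

# Matrix case: match() returns the position of each string in gorp
# (as a plain vector), so index the codes and restore the dimensions.
recoded <- values[match(mung, gorp)]
dim(recoded) <- dim(mung)

# Data frame of factors case: translate each column's levels once,
# then expand by the factor's integer codes.
df <- as.data.frame(mung, stringsAsFactors = TRUE)
recoded_df <- as.data.frame(lapply(df, function(x) {
  values[match(levels(x), gorp)][as.integer(x)]
}))
```

Translating the levels rather than every element keeps the work per column proportional to the number of distinct strings, which is the same idea as the levels() replacement shown above.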