On 26-01-2013, at 21:09, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:
>
> On 26.01.2013 20:46, Berend Hasselman wrote:
>>
>> On 26-01-2013, at 19:43, emorway <emor...@usgs.gov> wrote:
>>
>>> I'm wondering if I need to use a function other than sapply as the following
>>> line of code runs indefinitely (or > 30 min so far) and uses up all 16Gb of
>>> memory on my machine for what seems like a very small dataset (data attached
>>> in a txt file wells.txt
>>> <http://r.789695.n4.nabble.com/file/n4656723/wells.txt> ). The R code is:
>>>
>>> wells<-read.table("c:/temp/wells.txt",col.names=c("name","plc_hldr"))
>>> wells2<-wells[sapply(wells[,1],function(x)length(strsplit(as.character(x),
>>> "_")[[1]])==2),]
>>>
>>> The 2nd line of R code above gets bogged down and takes all my RAM with it:
>>> <http://r.789695.n4.nabble.com/file/n4656723/memory_loss.png>
>>>
>>> I'm simply trying to extract all of the lines of data that have a single "_"
>>> in the first column and place them into a dataset called "wells2". If that
>>> were to work, I then want to extract the lines of data that have two "_" and
>>> put them into a separate dataset, say "wells3". Is there a better way to do
>>> this than the one-liner above?
>>
>>
>> Read your file with
>>
>> wells<-read.table("wells.txt",col.names=c("name","plc_hldr"),
>> stringsAsFactors=FALSE)
>>
>> Remove all non underscores with
>>
>> w.sub <- gsub("[^_]+","",wells[,1])
>>
>> then select elements of w.sub with 2 underscores and a single underscore with
>>
>> u.2 <- which(w.sub=="__")
>> u.1 <- which(w.sub=="_")
>>
>> and use u.1 and u.2 to select the appropriate rows of wells.
>
> With grep:
>
> wells1 <- wells[grep("^[^\\_]*_[^\\_]*$", wells[,1]),]
> wells2 <- wells[grep("^[^\\_]*_[^\\_]*_[^\\_]*$", wells[,1]),]
>

Are the \\ necessary?
I tried without the \\ and that gives identical results.

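For anyone who wants to repeat the comparison without the original file, here is a minimal sketch. The data frame below is made up for illustration (the names are hypothetical, not taken from wells.txt); it compares the grep patterns with and without the backslashes and cross-checks both against the gsub approach. None of the sample names contains a literal backslash, so this does not show whether the two forms could differ on such input.

```r
# Hypothetical well names, invented for this check (not from wells.txt)
wells <- data.frame(
  name = c("A_1", "B_2_3", "C", "D_4", "E_5_6", "F_7_8_9"),
  plc_hldr = 1:6,
  stringsAsFactors = FALSE
)

# Patterns as posted, with the escaped underscore inside the brackets
one.esc <- grep("^[^\\_]*_[^\\_]*$", wells[, 1])
two.esc <- grep("^[^\\_]*_[^\\_]*_[^\\_]*$", wells[, 1])

# The same patterns with the backslashes removed
one.plain <- grep("^[^_]*_[^_]*$", wells[, 1])
two.plain <- grep("^[^_]*_[^_]*_[^_]*$", wells[, 1])

# Cross-check: strip everything except underscores, then compare the remainder
w.sub <- gsub("[^_]+", "", wells[, 1])
u.1 <- which(w.sub == "_")
u.2 <- which(w.sub == "__")

# Each comparison prints TRUE when the two selections agree
print(identical(one.esc, one.plain))
print(identical(two.esc, two.plain))
print(identical(one.plain, u.1))
print(identical(two.plain, u.2))

# Row selection then works the same way for any of the index vectors
wells1 <- wells[one.plain, ]
wells2 <- wells[two.plain, ]
```

On this sample the row with three underscores and the row with none should end up in neither wells1 nor wells2.
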
Berend

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.