On 26-01-2013, at 21:09, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:
> 
> 
> On 26.01.2013 20:46, Berend Hasselman wrote:
>> 
>> On 26-01-2013, at 19:43, emorway <emor...@usgs.gov> wrote:
>> 
>>> I'm wondering whether I need a function other than sapply, because the following
>>> line of code runs indefinitely (more than 30 min so far) and uses up all 16 GB of
>>> memory on my machine for what seems like a very small dataset (data attached
>>> as a text file, wells.txt:
>>> <http://r.789695.n4.nabble.com/file/n4656723/wells.txt>). The R code is:
>>> 
>>> wells <- read.table("c:/temp/wells.txt", col.names = c("name", "plc_hldr"))
>>> wells2 <- wells[sapply(wells[, 1], function(x)
>>>                   length(strsplit(as.character(x), "_")[[1]]) == 2), ]
>>> 
>>> The second statement above (the sapply call) gets bogged down and takes all my RAM with it:
>>> <http://r.789695.n4.nabble.com/file/n4656723/memory_loss.png>
>>> 
>>> I'm simply trying to extract all of the lines of data that have a single "_"
>>> in the first column and place them into a dataset called "wells2".  If that
>>> were to work, I then want to extract the lines of data that have two "_" and
>>> put them into a separate dataset, say "wells3".  Is there a better way to do
>>> this than the one-liner above?
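A minimal sketch of a vectorised version of the same strsplit() idea, assuming the names are read as character rather than factor. The data frame below is made-up stand-in data, not the contents of wells.txt. strsplit() already accepts a whole character vector, so one call replaces the per-element sapply(); lengths() (available in R 3.2.0 and later) then counts the pieces per name.

```r
# Made-up stand-in data; stringsAsFactors = FALSE keeps name as character.
wells <- data.frame(
  name     = c("A_1", "B_2_x", "C", "D_3", "E_4_y"),
  plc_hldr = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# One vectorised split of every name on a literal "_".
n.pieces <- lengths(strsplit(wells$name, "_", fixed = TRUE))

wells2 <- wells[n.pieces == 2, ]   # one "_"  -> two pieces
wells3 <- wells[n.pieces == 3, ]   # two "_"  -> three pieces
```

Counting pieces is not quite the same as counting underscores: strsplit() drops a trailing empty string, so a hypothetical name such as "F_" yields one piece, not two.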
>> 
>> 
>> Read your file with
>> 
>>      wells <- read.table("wells.txt", col.names = c("name", "plc_hldr"),
>>                          stringsAsFactors = FALSE)
>> 
>> Remove all non-underscore characters with
>> 
>>      w.sub <- gsub("[^_]+","",wells[,1])
>> 
>> then find the elements of w.sub with exactly two underscores and exactly one underscore with
>> 
>>      u.2 <- which(w.sub=="__")
>>      u.1 <- which(w.sub=="_")
>> 
>> and use u.1 and u.2 to select the appropriate rows of wells.
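The steps above, put together on a small made-up data frame (stand-in data, not wells.txt). The last line is an extra beyond the original recipe: nchar(w.sub) is the underscore count per name, so split() can group all rows by that count in one call.

```r
# Made-up stand-in data; stringsAsFactors = FALSE keeps name as character.
wells <- data.frame(
  name     = c("A_1", "B_2_x", "C", "D_3", "E_4_y"),
  plc_hldr = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Delete every run of non-underscore characters; only the underscores remain.
w.sub <- gsub("[^_]+", "", wells[, 1])

# Row indices of names with exactly one and exactly two underscores.
u.1 <- which(w.sub == "_")
u.2 <- which(w.sub == "__")

wells2 <- wells[u.1, ]   # one "_"
wells3 <- wells[u.2, ]   # two "_"

# Extra: group all rows by underscore count; list names are "0", "1", "2".
by.count <- split(wells, nchar(w.sub))
```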
> 
> With grep:
> 
> wells1 <- wells[grep("^[^\\_]*_[^\\_]*$", wells[,1]),]
> wells2 <- wells[grep("^[^\\_]*_[^\\_]*_[^\\_]*$", wells[,1]),]
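A quick check of the two grep() patterns on made-up names (stand-in data, not wells.txt). The ^ and $ anchors force the whole name to match, and each [^\\_]* run cannot cross an underscore, so the first pattern admits exactly one "_" and the second exactly two. Here wells1 holds the single-underscore rows and wells2 the double-underscore rows, which differs from the wells2/wells3 naming in the original question.

```r
# Made-up stand-in data.
wells <- data.frame(
  name     = c("A_1", "B_2_x", "C", "D_3", "E_4_y"),
  plc_hldr = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Patterns exactly as posted: exactly one underscore, then exactly two.
wells1 <- wells[grep("^[^\\_]*_[^\\_]*$", wells[, 1]), ]
wells2 <- wells[grep("^[^\\_]*_[^\\_]*_[^\\_]*$", wells[, 1]), ]
```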
> 

Are the \\ necessary?
I tried it without the \\ and got identical results.

Berend
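One way to check this on a given set of names is to run both versions and compare the matches with identical(). The names below are made up. The underscore is not a regex metacharacter, so it needs no escaping; how a backslash inside a bracket expression is read (literal character or escape) depends on the regex engine, so names that themselves contain a backslash are where the two patterns could differ. The sample deliberately contains none.

```r
# Made-up sample names (no backslashes in them).
x <- c("A_1", "B_2_x", "C", "D_3", "E_4_y", "_lead", "trail_")

with.bs    <- grep("^[^\\_]*_[^\\_]*$", x)            # pattern as posted
without.bs <- grep("^[^_]*_[^_]*$", x)                # backslashes dropped
pcre       <- grep("^[^_]*_[^_]*$", x, perl = TRUE)   # same, PCRE engine

identical(with.bs, without.bs)
identical(without.bs, pcre)
```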

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
