Another useful trick that could come in handy, thanks!

baptiste

2009/11/18 Gabor Grothendieck <ggrothendi...@gmail.com>:
> Here is a slight variation:
>
> > read.table(textConnection(grep("<aa?[xy]>", input, value = TRUE)),
> +   colClasses = c("NULL", "NULL", "numeric"))
>           V3         V6
> 1 0.00137700 3.4644e-07
> 2 0.00019412 4.8840e-08
> 3 0.00137700 3.4644e-07
> 4 0.00019412 4.8840e-08
>
>
>
> On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
> <baptiste.aug...@googlemail.com> wrote:
>> Hi,
>>
>> Thanks for the alternative approach. However, I should have made my
>> example more complete in that other lines may also have numeric
>> values, which I'm not interested in. Below is an updated problem, with
>> my current solution,
>>
>> tc <- textConnection(
>> "some text
>> <ax> = 1.3770E-03 <bx> = 3.4644E-07
>> <ay> = 1.9412E-04 <by> = 4.8840E-08
>>
>> other text
>> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
>> <aay> = 1.9412E-04 <bby> = 4.8840E-08
>>
>> lots of other material, including numeric values
>> 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
>> 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
>> etc...")
>>
>> input <-
>> readLines(tc)
>> close(tc)
>>
>> ## I want to retrieve the values for
>> ## <ax>, <ay>, <aax> and <aay> only
>>
>> results <- c(
>> strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind),
>> strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
>> simplify = rbind))
>>
>> results
>>
>> Using the suggested base R solution, I've come up with this variation,
>>
>> z <- gsub("[^[:digit:]E.+-]", " ", grep("<ax>|<ay>|<aax>|<aay>", input,
>> value=TRUE))
>>
>> test <- scan(textConnection(z),what=0)
>> test[seq(1, length(test), by=2)]
>>
>>
>> Thanks again,
>>
>> baptiste
>>
>> 2009/11/18 Bert Gunter <gunter.ber...@gene.com>:
>>> The previous elegant solutions required the use of the gsubfn package.
>>> Nothing wrong with that, of course, but I'm always curious whether still
>>> relatively simple base R solutions can be found, as they are often (but not
>>> always!) much faster. And anyway, it seems to be in the spirit of your query
>>> to try such a solution. So here is one base R approach that I believe works.
>>> I'll break it up into 2 lines so you can see what's going on.
>>>
>>> ## Using your example...
>>> ## First replace everything but the number with spaces
>>>
>>> > z <- gsub("[^[:digit:]E.+-]"," ",input)
>>> > z
>>> [1] " "
>>> [2] " 1.3770E-03 3.4644E-07"
>>> [3] " 1.9412E-04 4.8840E-08"
>>> [4] ""
>>> [5] " "
>>> [6] " 1.3770E-03 3.4644E-07"
>>> [7] " 1.9412E-04 4.8840E-08"
>>>
>>> ## Now it can be scanned to a numeric via
>>>
>>> > z<-scan(textConnection(z),what=0)
>>> Read 8 items
>>> > z
>>> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
>>> 1.9412e-04 4.8840e-08
>>>
>>> ########
>>> I believe this strategy is reasonably general, but I haven't checked it
>>> carefully and would appreciate folks pointing out where it trips up (e.g.
>>> perhaps with NA's).
>>>
>>> Best,
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>>> Behalf Of baptiste auguie
>>> Sent: Wednesday, November 18, 2009 3:57 AM
>>> To: r-help
>>> Subject: [R] parsing numeric values
>>>
>>> Dear list,
>>>
>>> I'm seeking advice to extract some numeric values from a log file
>>> created by an external program.
>>> Consider the following example,
>>>
>>> input <-
>>> readLines(textConnection(
>>> "some text
>>> <ax> = 1.3770E-03 <bx> = 3.4644E-07
>>> <ay> = 1.9412E-04 <by> = 4.8840E-08
>>>
>>> other text
>>> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
>>> <aay> = 1.9412E-04 <bby> = 4.8840E-08"))
>>>
>>> ## this is what I want
>>> results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
>>> as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
>>> as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
>>> as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
>>> )
>>>
>>> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>>>
>>> The use of strsplit is not ideal here as there is a different number
>>> of space characters in the lines containing <ax> and <aax> for
>>> instance (hence the indices 8 and 9 respectively).
>>>
>>> I tried to use gsubfn for a cleaner construct,
>>>
>>> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>>>
>>> but I can't seem to find the correct regular expression to deal with
>>> the exponent.
>>>
>>>
>>> Any tips are welcome!
>>>
>>>
>>> Best regards,
>>>
>>> baptiste
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
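
The question that opened the thread was how to write a regular expression that copes with the exponent. One possible base R answer, which anchors the number to its tag instead of blanking out all other characters, is sketched below. It is only an illustration under stated assumptions: extract_tag and num_pattern are hypothetical names that appear nowhere in the thread, and the input is the extended example from the discussion with its whitespace collapsed.

```r
# The extended example from the thread (whitespace collapsed).
input <- c(
  "some text",
  "<ax> = 1.3770E-03 <bx> = 3.4644E-07",
  "<ay> = 1.9412E-04 <by> = 4.8840E-08",
  "",
  "other text",
  "<aax> = 1.3770E-03 <bbx> = 3.4644E-07",
  "<aay> = 1.9412E-04 <bby> = 4.8840E-08",
  "",
  "lots of other material, including numeric values",
  "1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5",
  "etc..."
)

# Digits, optional fraction, optional exponent such as E-03 or e+5.
num_pattern <- "[-+]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?"

# Hypothetical helper (not from the thread): return the numbers that
# directly follow "<tag> =" in any of the lines.
extract_tag <- function(lines, tag) {
  pattern <- paste0("<", tag, "> *= *(", num_pattern, ")")
  matches <- regmatches(lines, regexec(pattern, lines))
  hits <- Filter(length, matches)  # drop lines without a match
  # Element 1 is the whole match, element 2 the captured number.
  as.numeric(vapply(hits, `[`, character(1), 2))
}

tags <- c("ax", "ay", "aax", "aay")
# Take the first occurrence of each tag; a missing tag yields NA.
results <- vapply(tags, function(tag) extract_tag(input, tag)[1], numeric(1))
print(results)
```

Because the pattern requires the tag, the lines of bare numbers and the trailing "etc..." are never considered, so no preliminary grep is needed. A tag that does not occur in the input comes back as NA rather than raising an error.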