Here is a slight variation:

> read.table(textConnection(grep("<aa?[xy]>", input, value = TRUE)),
+   colClasses = c("NULL", "NULL", "numeric"))
          V3         V6
1 0.00137700 3.4644e-07
2 0.00019412 4.8840e-08
3 0.00137700 3.4644e-07
4 0.00019412 4.8840e-08
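A self-contained sketch of the read.table() call above follows. It rebuilds baptiste's updated example input so it can be run as-is, and it uses read.table()'s `text =` argument in place of an explicit textConnection(). It relies on read.table() recycling an unnamed colClasses vector: c("NULL", "NULL", "numeric") applied to six fields drops the tag and "=" columns and keeps columns 3 and 6. The names `tagged` and `vals` are illustrative only.

```r
# Updated example input from the thread, one element per line
input <- strsplit(
"some text
<ax> = 1.3770E-03 <bx> = 3.4644E-07
<ay> = 1.9412E-04 <by> = 4.8840E-08

other text
<aax> = 1.3770E-03 <bbx> = 3.4644E-07
<aay> = 1.9412E-04 <bby> = 4.8840E-08

lots of other material, including numeric values
1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
etc...", "\n")[[1]]

# Keep only the <ax>, <ay>, <aax> and <aay> lines
tagged <- grep("<aa?[xy]>", input, value = TRUE)

# Each kept line has six whitespace-separated fields:
#   <tag> = value <tag> = value
# The unnamed colClasses vector is recycled across the six columns,
# so "NULL" drops fields 1, 2, 4 and 5, and V3 and V6 are read as numeric.
vals <- read.table(text = tagged, colClasses = c("NULL", "NULL", "numeric"))
vals
```

Because the grep() pattern only keeps the tagged lines, the unrelated numeric rows never reach read.table(), so they cannot upset the column count.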
On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
<baptiste.aug...@googlemail.com> wrote:
> Hi,
>
> Thanks for the alternative approach. However, I should have made my
> example more complete in that other lines may also have numeric
> values, which I'm not interested in. Below is an updated problem, with
> my current solution,
>
> tc <- textConnection(
> "some text
> <ax> = 1.3770E-03 <bx> = 3.4644E-07
> <ay> = 1.9412E-04 <by> = 4.8840E-08
>
> other text
> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
> <aay> = 1.9412E-04 <bby> = 4.8840E-08
>
> lots of other material, including numeric values
> 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
> 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
> etc...")
>
> input <-
> readLines(tc)
> close(tc)
>
> ## I want to retrieve the values for
> ## <ax>, <ay>, <aax> and <aay> only
>
> results <- c(
> strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind))
>
> results
>
> Using the suggested base R solution, I've come up with this variation,
>
> z <- gsub("[^[:digit:]E.+-]", " ", grep("<ax>|<ay>|<aax>|<aay>", input,
> value=TRUE))
>
> test <- scan(textConnection(z),what=0)
> test[seq(1, length(test), by=2)]
>
>
> Thanks again,
>
> baptiste
>
> 2009/11/18 Bert Gunter <gunter.ber...@gene.com>:
>> The previous elegant solutions required the use of the gsubfn package.
>> Nothing wrong with that, of course, but I'm always curious whether still
>> relatively simple base R solutions can be found, as they are often (but not
>> always!) much faster. And anyway, it seems to be in the spirit of your query
>> to try such a solution. So here is one base R approach that I believe works.
>> I'll break it up into 2 lines so you can see what's going on.
>>
>> ## Using your example...
>> ## First replace everything but the number with spaces
>>
>>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>>> z
>> [1] " "
>> [2] " 1.3770E-03 3.4644E-07"
>> [3] " 1.9412E-04 4.8840E-08"
>> [4] ""
>> [5] " "
>> [6] " 1.3770E-03 3.4644E-07"
>> [7] " 1.9412E-04 4.8840E-08"
>>
>> ## Now it can be scanned to a numeric via
>>
>>> z<-scan(textConnection(z),what=0)
>> Read 8 items
>>> z
>> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
>> 1.9412e-04 4.8840e-08
>>
>> ########
>> I believe this strategy is reasonably general, but I haven't checked it
>> carefully and would appreciate folks pointing out where it trips up (e.g.
>> perhaps with NA's).
>>
>> Best,
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>> Behalf Of baptiste auguie
>> Sent: Wednesday, November 18, 2009 3:57 AM
>> To: r-help
>> Subject: [R] parsing numeric values
>>
>> Dear list,
>>
>> I'm seeking advice to extract some numeric values from a log file
>> created by an external program. Consider the following example,
>>
>> input <-
>> readLines(textConnection(
>> "some text
>> <ax> = 1.3770E-03 <bx> = 3.4644E-07
>> <ay> = 1.9412E-04 <by> = 4.8840E-08
>>
>> other text
>> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
>> <aay> = 1.9412E-04 <bby> = 4.8840E-08"))
>>
>> ## this is what I want
>> results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
>> as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
>> as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
>> as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
>> )
>>
>> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>>
>> The use of strsplit is not ideal here as there is a different number
>> of space characters in the lines containing <ax> and <aax> for
>> instance (hence the indices 8 and 9 respectively).
>>
>> I tried to use gsubfn for a cleaner construct,
>>
>> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>>
>> but I can't seem to find the correct regular expression to deal with
>> the exponent.
>>
>> Any tips are welcome!
>>
>> Best regards,
>>
>> baptiste
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
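On the exponent question in the original post: the ([0-9.]+) group stops at the "E", so an optional exponent part such as ([eE][-+]?[0-9]+)? has to follow the mantissa. Below is a minimal base-R sketch of that idea, offered as an alternative to the strapply() calls rather than as a gsubfn feature. It uses regexec() and regmatches() (available in current R, though newer than this thread) and assumes each value of interest sits right after its <ax>, <ay>, <aax> or <aay> tag on the same line. The names `pattern`, `hits` and `values` are illustrative only.

```r
# Lines in the shape of the thread's example log (illustrative subset)
input <- c("some text",
           "<ax> = 1.3770E-03 <bx> = 3.4644E-07",
           "<ay> = 1.9412E-04 <by> = 4.8840E-08",
           "",
           "other text",
           "<aax> = 1.3770E-03 <bbx> = 3.4644E-07",
           "<aay> = 1.9412E-04 <bby> = 4.8840E-08",
           "1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5")

# Group 1: the tag name (ax, ay, aax or aay)
# Group 2: the number, an optionally signed mantissa followed by an
#          optional exponent part E[-+]digits (captured again as group 3)
pattern <- "<(aa?[xy])> *= *([-+]?[0-9.]+([eE][-+]?[0-9]+)?)"

# regexec() records the first match on each line plus its capture groups;
# regmatches() turns that into a list of character vectors
# (character(0) for lines without a match)
hits <- regmatches(input, regexec(pattern, input))
hits <- hits[sapply(hits, length) > 0]

# Element 2 of each match is the tag, element 3 the full number
values <- as.numeric(sapply(hits, `[`, 3))
names(values) <- sapply(hits, `[`, 2)
values
```

A single alternation in the tag group replaces the four separate pattern calls, and the tag names stay attached to their values, which the gsub()/scan() route does not preserve.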