Yes, you could bring it up on the R-SIG-Mac list or file a bug report.

On Wed, May 5, 2010 at 10:11 PM, steven mosher <mosherste...@gmail.com> wrote:
> Thanks,
> perhaps we should report it
>
> On Wed, May 5, 2010 at 4:52 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
>>
>> I am using Vista. Another thing to try is strapply using the tcl
>> engine (assuming you do have tcltk capabilities) and the R engine. On
>> Vista R 2.11.0 patched I get the same result:
>>
>> > capabilities()[["tcltk"]]
>> [1] TRUE
>> > strapply(test, "\\d{5}", c, engine = "tcl")[[1]]
>> [1] "88958"
>> > strapply(test, "\\d{5}", c, engine = "R")[[1]]
>> [1] "88958"
>>
>> On Vista with R 2.9.2 I do get bad results:
>>
>> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> > sub(".*(\\d{5}).*", "\\1", test)
>> [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> > sub(".*(\\d{5}).*", "\\1", test, extended = TRUE)
>> [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> > R.version.string
>> [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
>> > win.version()
>> [1] "Windows Vista (build 6002) Service Pack 2"
>>
>>
>> On Wed, May 5, 2010 at 6:20 PM, steven mosher <mosherste...@gmail.com> wrote:
>> > Hmm.
>> > I have R11 just downloaded fresh.
>> > I'll reload a new session... and revert. I will note that I've had
>> > trouble with \\d, which is why I was using [0-9].
>> > MAC here.
>> >
>> > On Wed, May 5, 2010 at 3:00 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
>> >>
>> >> That's not what I get:
>> >>
>> >> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> >> > sub(".*(\\d{5}).*", "\\1", test)
>> >> [1] "88958"
>> >> > R.version.string
>> >> [1] "R version 2.10.1 (2009-12-14)"
>> >>
>> >> I also got the above in R 2.11.0 patched as well.
>> >>
>> >>
>> >> On Wed, May 5, 2010 at 5:55 PM, steven mosher <mosherste...@gmail.com> wrote:
>> >> > test
>> >> > [1] "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> >> >> sub(".*(\\d{5}).*", "\\1", test)
>> >> > [1] "</th>"
>> >> >> sub(".*([0-9]{5}).*","\\1",test)
>> >> > [1] "88958"
>> >> >>
>> >> >
>> >> > I think the "</" in the source throws something off, as the group
>> >> > capture appears not to be working, except in the bracket version,
>> >> > where it did.
>> >> >
>> >> > On Wed, May 5, 2010 at 2:35 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
>> >> >>
>> >> >> Here are two ways to extract 5 digits.
>> >> >>
>> >> >> In the first one \\1 refers to the portion matched between the
>> >> >> parentheses in the regular expression.
>> >> >>
>> >> >> In the second one strapply is like apply where the object to be
>> >> >> worked on is the first argument (array for apply, string for
>> >> >> strapply), the second modifies it (which dimension for apply,
>> >> >> regular expression for strapply) and the last is a function which
>> >> >> acts on each value (typically each row or column for apply and
>> >> >> each match for strapply). In this case we use c as our function
>> >> >> to just return all the results. They are returned in a list with
>> >> >> one component per string, but here test is just a single string so
>> >> >> we get a list one long and we ask for the contents of the first
>> >> >> component using [[1]].
>> >> >>
>> >> >> # 1 - sub
>> >> >> sub(".*(\\d{5}).*", "\\1", test)
>> >> >>
>> >> >> # 2 - strapply - see http://gsubfn.googlecode.com
>> >> >> library(gsubfn)
>> >> >> strapply(test, "\\d{5}", c)[[1]]
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, May 5, 2010 at 5:13 PM, steven mosher <mosherste...@gmail.com> wrote:
>> >> >> > Given a text like
>> >> >> >
>> >> >> > I want to be able to extract a matched regular expression from a
>> >> >> > piece of text.
>> >> >> >
>> >> >> > this apparently works, but is pretty ugly
>> >> >> > # some html
>> >> >> > test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
>> >> >> > # a pattern to extract 5 digits
>> >> >> >> pattern<-"[0-9]{5}"
>> >> >> > # regexpr returns a start point[1] and an attribute "match.length":
>> >> >> > # attr(,"match.length")
>> >> >> > # get the substring from the start point to the stop point,
>> >> >> > # where stop = start + length - 1
>> >> >> >>
>> >> >> > answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
>> >> >> >> answer
>> >> >> > [1] "88958"
>> >> >> >
>> >> >> > I tried using sub(pattern, replacement, x) with a regexp that
>> >> >> > captured the group. I'd found an example of this in the mails
>> >> >> > but it didn't seem to work.
>> >> >
>> >> >
>> >
>> >
>
>
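
For reference, the regexpr/substr approach from the original question can be written with a single regexpr() call instead of three. This is only a minimal sketch and was not posted in the thread: the names m, start, len, answer and answer2 are illustrative, and regmatches() is assumed to be available in the installed R (it is not present in the R versions discussed above).

```r
# Sample HTML fragment from the thread
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

# Bracket class instead of \\d, as in the original question
pattern <- "[0-9]{5}"

# Run the match once and reuse the result (m is an illustrative name)
m <- regexpr(pattern, test)

# regexpr() returns the start position, with the match length as an attribute
start <- as.integer(m)
len <- attr(m, "match.length")

# stop = start + length - 1, as described in the original post
answer <- substr(test, start, start + len - 1)

# Assumption: regmatches() exists in this R; it extracts the match directly
answer2 <- regmatches(test, m)
```

Both answer and answer2 should hold the five-digit string for this input. Because the pattern uses [0-9] and no ".*", the sketch sidesteps the \\d and newline behaviour that differed between platforms above.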
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.