Hi Bert,

In the final.pattern, there are ten patterns.

>sub(final.pattern, '\\1', test.string)
Expected results: "240.m.g" "3.mg.kg" "240.m.g"
Current results:  "" "" "240.m.g"

>sub(final.pattern, '\\2', test.string)
Expected results: ">110.kg" ">110.kg" ">50-70.kg"
Current results:  "" "" ">50-70.kg"

>sub(final.pattern, '\\3', test.string)
Expected results: "geo.mean" "P05" "geo.mean"
Current results:  "" "" "geo.mean"

Right now, I only get the results from the third string.

On Tue, Sep 6, 2016 at 8:01 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Jun:
>
> 1. Tell us your desired result from your test vector and maybe someone
> will help.
>
> 2. As we played this game once already (you couldn't do it; I showed
> you how), this seems to be a function of your limitations with regular
> expressions. I'm probably not much better, but in any case, I don't
> intend to be your consultant. See if you can find someone locally to
> help you if you do not receive a satisfactory reply from the list.
> There are many people here who are pretty good at this sort of thing,
> but I don't know if they'll reply. Regex's are certainly complex. PERL
> people tend to be pretty good at them, I believe. There are numerous
> web sites and books on them if you need to acquire expertise for your
> work.
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen...@gmail.com> wrote:
> > Hi Bert,
> >
> > I still couldn't make the multiple patterns to work. Here is an example.
> > I make the pattern as follows
> >
> > final.pattern <-
> > "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
> >
> > test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
> > '240.m.g.>50-70.kg.geo.mean')
> >
> > sub(final.pattern, '\\1', test.string)
> > sub(final.pattern, '\\2', test.string)
> > sub(final.pattern, '\\3', test.string)
> >
> > Only the third string has been correctly parsed, which matches the first
> > pattern. It seems the rest of the patterns are not called.
> >
> > Jun
> >
> >
> > On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> >>
> >> Just noticed: My clumsy do.call() line in my previously posted code
> >> below should be replaced with:
> >> pat <- paste(pat,collapse = "|")
> >>
> >>
> >> > pat <- c(pat1,pat2)
> >> > paste(pat,collapse="|")
> >> [1] "a+\\.*a+|b+\\.*b+"
> >>
> >> ************ replace this **************************
> >> > pat <- do.call(paste,c(as.list(pat), sep="|"))
> >> ********************************************
> >> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
> >> [1] "a.a" "bb" "b.bbb"
> >>
> >>
> >> -- Bert
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4...@gmail.com>
> >> wrote:
> >> > Jun:
> >> >
> >> > You need to provide a clear specification via regular expressions of
> >> > the patterns you wish to match -- at least for me to decipher it.
> >> > Others may be smarter than I, though...
> >> >
> >> > Jeff: Thanks.
> >> > I have now convinced myself that it can be done (a
> >> > "proof" of sorts): If pat1, pat2,..., patn are m different patterns
> >> > (in a vector of patterns) to be matched in a vector of n strings,
> >> > where only one of the patterns will match in any string, then use
> >> > paste() (probably via do.call()) or otherwise to paste them together
> >> > separated by "|" to form the concatenated pattern, pat. Then
> >> >
> >> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
> >> >
> >> > should extract the matching pattern in each (perhaps with a little
> >> > fiddling due to precedence rules); e.g.
> >> >
> >> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
> >> >
> >> >> pat1 <- "a+\\.*a+"
> >> >> pat2 <-"b+\\.*b+"
> >> >> pat <- c(pat1,pat2)
> >> >
> >> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
> >> >> pat
> >> > [1] "a+\\.*a+|b+\\.*b+"
> >> >
> >> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
> >> > [1] "a.a" "bb" "b.bbb"
> >> >
> >> > Cheers,
> >> > Bert
> >> >
> >> >
> >> > Bert Gunter
> >> >
> >> > "The trouble with having an open mind is that people keep coming along
> >> > and sticking things into it."
> >> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >> >
> >> >
> >> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen...@gmail.com> wrote:
> >> >> Thanks for the reply, Bert.
> >> >>
> >> >> Your solution solves the example. I actually have a more general
> >> >> situation where I have this dot concatenated string from multiple
> >> >> variables. The problem is those variables may have values with dots
> >> >> in there. The number of dots are not consistent for all values of a
> >> >> variable. So I am thinking to define a vector of patterns for the
> >> >> vector of the string and hopefully to find a way to use a pattern
> >> >> from the pattern vector for each value of the string vector. The
> >> >> only way I can think of is "for" loop, which can be slow.
> >> >> Also these are happening in a function I am writing. Just wonder if
> >> >> there is another more efficient way. Thanks a lot.
> >> >>
> >> >> Jun
> >> >>
> >> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Well, he did provide an example, and...
> >> >>>
> >> >>>
> >> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
> >> >>>
> >> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
> >> >>> [1] "WT.CUT" "tx"
> >> >>>
> >> >>>
> >> >>> ## seems to do what was requested.
> >> >>>
> >> >>> Jeff would have to amplify on his initial statement however: do you
> >> >>> mean that separate patterns can always be combined via "|" ? Or
> >> >>> something deeper?
> >> >>>
> >> >>> Cheers,
> >> >>> Bert
> >> >>> Bert Gunter
> >> >>>
> >> >>> "The trouble with having an open mind is that people keep coming along
> >> >>> and sticking things into it."
> >> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >> >>>
> >> >>>
> >> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller
> >> >>> <jdnew...@dcn.davis.ca.us> wrote:
> >> >>> > Your opening assertion is false.
> >> >>> >
> >> >>> > Provide a reproducible example and someone will demonstrate.
> >> >>> > --
> >> >>> > Sent from my phone. Please excuse my brevity.
> >> >>> >
> >> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen
> >> >>> > <jun.shen...@gmail.com> wrote:
> >> >>> >>Dear list,
> >> >>> >>
> >> >>> >>I have a vector of strings that cannot be described by one pattern.
> >> >>> >>So let's say I construct a vector of patterns in the same length as
> >> >>> >>the vector of strings, can I do the element wise pattern recognition
> >> >>> >>and string substitution.
> >> >>> >>
> >> >>> >>For example,
> >> >>> >>
> >> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
> >> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
> >> >>> >>
> >> >>> >>patterns <- c(pattern1,pattern2)
> >> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
> >> >>> >>
> >> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from
> >> >>> >>the second string. If I do
> >> >>> >>
> >> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
> >> >>> >>
> >> >>> >>looping the patterns doesn't work the way I want. Appreciate any
> >> >>> >>comments.
> >> >>> >>Thanks.
> >> >>> >>
> >> >>> >>Jun
> >> >>> >>
> >> >>> >> [[alternative HTML version deleted]]
> >> >>> >>
> >> >>> >>______________________________________________
> >> >>> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >> >>> >>PLEASE do read the posting guide
> >> >>> >>http://www.R-project.org/posting-guide.html
> >> >>> >>and provide commented, minimal, self-contained, reproducible code.
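
A note on why only the third string is parsed in the final.pattern example. Capture groups are numbered by the position of their opening parenthesis across the whole pattern, so the ten alternatives define groups 1 to 30, and '\\1' to '\\3' refer only to the groups of the first alternative. When a later alternative matches, groups 1 to 3 did not take part in the match and are substituted as empty strings. Since sub() only supports backreferences '\\1' to '\\9', the later groups cannot be reached that way either. One possible fix, sketched below, is to move the alternation inside each field so that the pattern has exactly three groups. The names dose, weight and factored.pattern are illustrative and do not come from the thread, and the sketch assumes the dose and weight values are exactly the ones listed in final.pattern. The last lines show mapply() as one way to apply the pattern vector from the original question element-wise; it still loops internally, so it is not necessarily faster than a "for" loop.

```r
# Test strings from the thread
test.string <- c('240.m.g.>110.kg.geo.mean',
                 '3.mg.kg.>110.kg.P05',
                 '240.m.g.>50-70.kg.geo.mean')

# Alternatives for each field, taken from the ten patterns in final.pattern
# (dose, weight and factored.pattern are hypothetical names)
dose   <- c("240\\.m\\.g", "3\\.mg\\.kg")
weight <- c(">50-70\\.kg", ">70-90\\.kg", ">90-110\\.kg",
            "50\\.kg\\.or\\.less", ">110\\.kg")

# One pattern with exactly three capture groups:
# (dose alternatives).(weight alternatives).(rest)
factored.pattern <- paste0("^(", paste(dose, collapse = "|"), ")\\.(",
                           paste(weight, collapse = "|"), ")\\.(.*)$")

res1 <- sub(factored.pattern, "\\1", test.string)
res2 <- sub(factored.pattern, "\\2", test.string)
res3 <- sub(factored.pattern, "\\3", test.string)

# Element-wise use of a pattern vector (original question in the thread):
# mapply() calls sub(patterns[i], '\\2', strings[i]) for each i
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c('TX.WT.CUT.mean', 'mg.tx.cv')

elementwise <- mapply(sub, patterns, "\\2", strings, USE.NAMES = FALSE)
```

With the three test strings, res1, res2 and res3 should match the "Expected results" listed at the top of the thread, and elementwise should give "WT.CUT" and "tx".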