Hi Jeff,

Thanks for the reply. I tried your suggestion and it doesn't seem to work. I tried a simple pattern as follows, and it works as expected:

sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\1', "3.mg.kg.>50-70.kg.P05")
[1] "3.mg.kg"
sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\2', "3.mg.kg.>50-70.kg.P05")
[1] ">50-70.kg"
sub("(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)", '\\3', "3.mg.kg.>50-70.kg.P05")
[1] "P05"

My problem is the pattern has to be dynamically constructed on the input data of the function I am writing. It's actually not too difficult to assemble the final.pattern with some code like the following

sort.var <- c('TX','WTCUT')
combn.sort.var <- do.call(expand.grid, lapply(sort.var, function(x)paste('(',gsub('\\.','\\\\.',unlist(unique(all.exposure[x]))), ')', sep='')))
all.patterns <- do.call(paste, c(combn.sort.var, '(.*)', sep='\\.'))
final.pattern <- paste0(all.patterns, collapse='|')

You cannot run the code directly since the data object "all.exposure" is not provided here.

Jun

On Tue, Sep 6, 2016 at 8:18 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

> I am not near my computer today, but each parenthesis gets its own result
> number, so you should put the parenthesis around the whole pattern of
> alternatives instead of having many parentheses.
>
> I recommend thinking in terms of what common information you expect to
> find in these various strings, and place your parentheses to capture that
> information. There is no other reason to put parentheses in the pattern...
> they are not grouping symbols.
> --
> Sent from my phone. Please excuse my brevity.
>
> On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
> >Jun:
> >
> >1. Tell us your desired result from your test vector and maybe someone
> >will help.
> >
> >2. As we played this game once already (you couldn't do it; I showed
> >you how), this seems to be a function of your limitations with regular
> >expressions. I'm probably not much better, but in any case, I don't
> >intend to be your consultant.
> >See if you can find someone locally to
> >help you if you do not receive a satisfactory reply from the list.
> >There are many people here who are pretty good at this sort of thing,
> >but I don't know if they'll reply. Regex's are certainly complex. PERL
> >people tend to be pretty good at them, I believe. There are numerous
> >web sites and books on them if you need to acquire expertise for your
> >work.
> >
> >Cheers,
> >Bert
> >Bert Gunter
> >
> >"The trouble with having an open mind is that people keep coming along
> >and sticking things into it."
> >-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> >On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen...@gmail.com> wrote:
> >> Hi Bert,
> >>
> >> I still couldn't make the multiple patterns to work. Here is an example. I
> >> make the pattern as follows
> >>
> >> final.pattern <-
> >> "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
> >>
> >> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
> >> '240.m.g.>50-70.kg.geo.mean')
> >>
> >> sub(final.pattern, '\\1', test.string)
> >> sub(final.pattern, '\\2', test.string)
> >> sub(final.pattern, '\\3', test.string)
> >>
> >> Only the third string has been correctly parsed, which matches the first
> >> pattern. It seems the rest of the patterns are not called.
> >>
> >> Jun
> >>
> >>
> >> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> >>>
> >>> Just noticed: My clumsy do.call() line in my previously posted code
> >>> below should be replaced with:
> >>> pat <- paste(pat,collapse = "|")
> >>>
> >>>
> >>> > pat <- c(pat1,pat2)
> >>> > paste(pat,collapse="|")
> >>> [1] "a+\\.*a+|b+\\.*b+"
> >>>
> >>> ************ replace this **************************
> >>> > pat <- do.call(paste,c(as.list(pat), sep="|"))
> >>> ********************************************
> >>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
> >>> [1] "a.a" "bb" "b.bbb"
> >>>
> >>>
> >>> -- Bert
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along
> >>> and sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>>
> >>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4...@gmail.com>
> >>> wrote:
> >>> > Jun:
> >>> >
> >>> > You need to provide a clear specification via regular expressions of
> >>> > the patterns you wish to match -- at least for me to decipher it.
> >>> > Others may be smarter than I, though...
> >>> >
> >>> > Jeff: Thanks. I have now convinced myself that it can be done (a
> >>> > "proof" of sorts): If pat1, pat2,..., patn are m different patterns
> >>> > (in a vector of patterns) to be matched in a vector of n strings,
> >>> > where only one of the patterns will match in any string, then use
> >>> > paste() (probably via do.call()) or otherwise to paste them together
> >>> > separated by "|" to form the concatenated pattern, pat. Then
> >>> >
> >>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
> >>> >
> >>> > should extract the matching pattern in each (perhaps with a little
> >>> > fiddling due to precedence rules); e.g.
> >>> >
> >>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
> >>> >
> >>> >> pat1 <- "a+\\.*a+"
> >>> >> pat2 <-"b+\\.*b+"
> >>> >> pat <- c(pat1,pat2)
> >>> >
> >>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
> >>> >> pat
> >>> > [1] "a+\\.*a+|b+\\.*b+"
> >>> >
> >>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
> >>> > [1] "a.a" "bb" "b.bbb"
> >>> >
> >>> > Cheers,
> >>> > Bert
> >>> >
> >>> >
> >>> > Bert Gunter
> >>> >
> >>> > "The trouble with having an open mind is that people keep coming along
> >>> > and sticking things into it."
> >>> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>> >
> >>> >
> >>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen...@gmail.com> wrote:
> >>> >> Thanks for the reply, Bert.
> >>> >>
> >>> >> Your solution solves the example. I actually have a more general situation
> >>> >> where I have this dot concatenated string from multiple variables. The
> >>> >> problem is those variables may have values with dots in there. The number of
> >>> >> dots are not consistent for all values of a variable. So I am thinking to
> >>> >> define a vector of patterns for the vector of the string and hopefully to
> >>> >> find a way to use a pattern from the pattern vector for each value of the
> >>> >> string vector. The only way I can think of is "for" loop, which can be slow.
> >>> >> Also these are happening in a function I am writing. Just wonder if there is
> >>> >> another more efficient way. Thanks a lot.
> >>> >>
> >>> >> Jun
> >>> >>
> >>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4...@gmail.com>
> >>> >> wrote:
> >>> >>>
> >>> >>> Well, he did provide an example, and...
> >>> >>>
> >>> >>>
> >>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
> >>> >>>
> >>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
> >>> >>> [1] "WT.CUT" "tx"
> >>> >>>
> >>> >>>
> >>> >>> ## seems to do what was requested.
> >>> >>>
> >>> >>> Jeff would have to amplify on his initial statement however: do you
> >>> >>> mean that separate patterns can always be combined via "|" ? Or
> >>> >>> something deeper?
> >>> >>>
> >>> >>> Cheers,
> >>> >>> Bert
> >>> >>> Bert Gunter
> >>> >>>
> >>> >>> "The trouble with having an open mind is that people keep coming along
> >>> >>> and sticking things into it."
> >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>> >>>
> >>> >>>
> >>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller
> >>> >>> <jdnew...@dcn.davis.ca.us>
> >>> >>> wrote:
> >>> >>> > Your opening assertion is false.
> >>> >>> >
> >>> >>> > Provide a reproducible example and someone will demonstrate.
> >>> >>> > --
> >>> >>> > Sent from my phone. Please excuse my brevity.
> >>> >>> >
> >>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen
> >>> >>> > <jun.shen...@gmail.com>
> >>> >>> > wrote:
> >>> >>> >>Dear list,
> >>> >>> >>
> >>> >>> >>I have a vector of strings that cannot be described by one pattern. So
> >>> >>> >>let's say I construct a vector of patterns in the same length as the vector
> >>> >>> >>of strings, can I do the element wise pattern recognition and string
> >>> >>> >>substitution.
> >>> >>> >>
> >>> >>> >>For example,
> >>> >>> >>
> >>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
> >>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
> >>> >>> >>
> >>> >>> >>patterns <- c(pattern1,pattern2)
> >>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
> >>> >>> >>
> >>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from the
> >>> >>> >>second string.
> >>> >>> >>If I do
> >>> >>> >>
> >>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
> >>> >>> >>
> >>> >>> >>looping the patterns doesn't work the way I want. Appreciate any
> >>> >>> >>comments.
> >>> >>> >>Thanks.
> >>> >>> >>
> >>> >>> >>Jun
> >>> >>> >>
> >>> >>> >> [[alternative HTML version deleted]]
> >>> >>> >>
> >>> >>> >>______________________________________________
> >>> >>> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >>> >>PLEASE do read the posting guide
> >>> >>> >>http://www.R-project.org/posting-guide.html
> >>> >>> >>and provide commented, minimal, self-contained, reproducible code.
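
A minimal sketch of Jeff's suggestion in this thread, for illustration only. In a pattern of the form "(A)\\.(B)\\.(.*)|(C)\\.(D)\\.(.*)|..." every parenthesis gets its own number, so '\\1' to '\\3' refer only to the groups of the first alternative and come back empty when any later alternative is the one that matches. Placing the alternatives inside the parentheses gives one group per variable. The vectors tx.values and wt.values and the helper make.group are hypothetical stand-ins (the real values would come from all.exposure, which is not shown in the thread); the dot escaping is the same gsub() call used in Jun's code.

```r
# Hypothetical stand-ins for unique(all.exposure$TX) and
# unique(all.exposure$WTCUT); the real data object is not available here.
tx.values <- c("240.m.g", "3.mg.kg")
wt.values <- c(">50-70.kg", ">70-90.kg", ">90-110.kg", "50.kg.or.less", ">110.kg")

# Escape literal dots, then join all values of one variable with "|"
# inside a single pair of parentheses: one capture group per variable.
# Only dots are escaped; other regex metacharacters are assumed absent.
make.group <- function(values) {
  paste0("(", paste(gsub("\\.", "\\\\.", values), collapse = "|"), ")")
}

# Three groups in total: \\1 = TX, \\2 = WTCUT, \\3 = the remainder.
final.pattern <- paste0("^", make.group(tx.values), "\\.",
                        make.group(wt.values), "\\.(.*)$")

test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")

tx   <- sub(final.pattern, "\\1", test.string)
wt   <- sub(final.pattern, "\\2", test.string)
stat <- sub(final.pattern, "\\3", test.string)

tx    # "240.m.g"  "3.mg.kg"  "240.m.g"
wt    # ">110.kg"  ">110.kg"  ">50-70.kg"
stat  # "geo.mean" "P05"      "geo.mean"
```

Because the group numbers no longer depend on which combination of values matched, the same three sub() calls work for every string, and no loop over patterns is needed.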