Just noticed: My clumsy do.call() line in my previously posted code below should be replaced with:

pat <- paste(pat, collapse = "|")
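For anyone who wants to check this quickly, here is a minimal sketch using the same z, pat1 and pat2 as in the quoted example. It confirms that the collapse form gives the same pattern as the do.call() form, and then tries an alternative extraction with regexpr()/regmatches() from base R. That alternative is my own assumption about what might be convenient, not something proposed in the thread; its appeal is that it needs no anchoring prefix such as "^[^b]*".

```r
# Data and patterns copied from the quoted example.
z    <- c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
pat1 <- "a+\\.*a+"
pat2 <- "b+\\.*b+"

# The replacement line: collapse the pattern vector with "|".
pat <- paste(c(pat1, pat2), collapse = "|")

# The original do.call() form, kept only to check that both agree.
old <- do.call(paste, c(as.list(c(pat1, pat2)), sep = "|"))
stopifnot(identical(pat, old))

# Alternative extraction (an assumption, not from the thread):
# regexpr() locates the first match of the alternation in each string,
# and regmatches() pulls out the matched text.
found <- regmatches(z, regexpr(pat, z))
print(found)
# [1] "a.a"   "bb"    "b.bbb"
```

One caveat: regmatches() silently drops strings with no match, so the result can be shorter than z unless every string matches one of the patterns.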
> pat <- c(pat1,pat2)
> paste(pat,collapse="|")
[1] "a+\\.*a+|b+\\.*b+"

************ replace this **************************
> pat <- do.call(paste,c(as.list(pat), sep="|"))
********************************************

> sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
[1] "a.a"   "bb"    "b.bbb"

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Jun:
>
> You need to provide a clear specification via regular expressions of
> the patterns you wish to match -- at least for me to decipher it.
> Others may be smarter than I, though...
>
> Jeff: Thanks. I have now convinced myself that it can be done (a
> "proof" of sorts): If pat1, pat2,..., patm are m different patterns
> (in a vector of patterns) to be matched in a vector of n strings,
> where only one of the patterns will match in any string, then use
> paste() (probably via do.call()) or otherwise to paste them together
> separated by "|" to form the concatenated pattern, pat. Then
>
> sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
>
> should extract the matching pattern in each (perhaps with a little
> fiddling due to precedence rules); e.g.
>
>> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
>
>> pat1 <- "a+\\.*a+"
>> pat2 <-"b+\\.*b+"
>> pat <- c(pat1,pat2)
>
>> pat <- do.call(paste,c(as.list(pat), sep="|"))
>> pat
> [1] "a+\\.*a+|b+\\.*b+"
>
>> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
> [1] "a.a"   "bb"    "b.bbb"
>
> Cheers,
> Bert
>
>
> On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen...@gmail.com> wrote:
>> Thanks for the reply, Bert.
>>
>> Your solution solves the example.
>> I actually have a more general situation
>> where I have this dot concatenated string from multiple variables. The
>> problem is those variables may have values with dots in there. The number of
>> dots are not consistent for all values of a variable. So I am thinking to
>> define a vector of patterns for the vector of the string and hopefully to
>> find a way to use a pattern from the pattern vector for each value of the
>> string vector. The only way I can think of is "for" loop, which can be slow.
>> Also these are happening in a function I am writing. Just wonder if there is
>> another more efficient way. Thanks a lot.
>>
>> Jun
>>
>> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>
>>> Well, he did provide an example, and...
>>>
>>>
>>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>
>>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>>> [1] "WT.CUT" "tx"
>>>
>>>
>>> ## seems to do what was requested.
>>>
>>> Jeff would have to amplify on his initial statement however: do you
>>> mean that separate patterns can always be combined via "|" ? Or
>>> something deeper?
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>>> wrote:
>>> > Your opening assertion is false.
>>> >
>>> > Provide a reproducible example and someone will demonstrate.
>>> > --
>>> > Sent from my phone. Please excuse my brevity.
>>> >
>>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen...@gmail.com>
>>> > wrote:
>>> >>Dear list,
>>> >>
>>> >>I have a vector of strings that cannot be described by one pattern. So
>>> >>let's say I construct a vector of patterns in the same length as the
>>> >>vector
>>> >>of strings, can I do the element wise pattern recognition and string
>>> >>substitution.
>>> >>
>>> >>For example,
>>> >>
>>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>> >>
>>> >>patterns <- c(pattern1,pattern2)
>>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>> >>
>>> >>Say I want to extract "WT.CUT" from the first string and "tx" from the
>>> >>second string. If I do
>>> >>
>>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
>>> >>
>>> >>looping the patterns doesn't work the way I want. Appreciate any
>>> >>comments.
>>> >>Thanks.
>>> >>
>>> >>Jun
>>> >>
>>> >>______________________________________________
>>> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>PLEASE do read the posting guide
>>> >>http://www.R-project.org/posting-guide.html
>>> >>and provide commented, minimal, self-contained, reproducible code.
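
On the original question (one pattern per string, applied element-wise): sub() is not vectorized over its pattern argument, which is why only the first pattern gets used. A minimal sketch of one way around that without writing an explicit for loop is mapply() from base R. This assumes that patterns[i] is the pattern intended for strings[i] and that the two vectors have the same length. mapply() still loops internally, so it is mainly a convenience and not necessarily faster.

```r
# Patterns and strings copied from the original post.
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c("TX.WT.CUT.mean", "mg.tx.cv")

# Assumption: patterns[i] belongs to strings[i], so lengths must agree.
stopifnot(length(patterns) == length(strings))

# mapply() walks both vectors in parallel and calls sub() once per pair,
# keeping the second capture group of whichever pattern is paired with
# that string.
extracted <- mapply(
  function(p, s) sub(p, "\\2", s),
  patterns, strings,
  USE.NAMES = FALSE
)
print(extracted)
# [1] "WT.CUT" "tx"
```

If the pairing of patterns to strings is not known in advance, the "|" approach discussed earlier in the thread is probably the better fit.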