My error. However, Jun has been severely abusing them: such use is unusual, and the "(?:" non-capturing group marker was invented precisely because the
capture side effect is so central to the use of ordinary parentheses.
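A small sketch of the difference between a capturing and a non-capturing group. The strings are made up for illustration, and perl = TRUE is used so that the "(?:" syntax is handled by the Perl-compatible engine. Wrapping the alternation in ordinary parentheses makes it group 1 and pushes the digit to group 2; "(?:" groups without capturing, so the digit stays group 1.

```r
x <- c("ab-1", "cd-2")

# "(ab|cd)" both groups and captures, so the digit is back-reference 2
with_capture <- sub("^(ab|cd)-([0-9])$", "\\2", x)

# "(?:ab|cd)" only groups the alternation, so the digit is back-reference 1
without_capture <- sub("^(?:ab|cd)-([0-9])$", "\\1", x, perl = TRUE)

with_capture     # "1" "2"
without_capture  # "1" "2"
```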

On Tue, 6 Sep 2016, Bert Gunter wrote:

Jeff:

Not sure what you meant by this:

"There is no other reason to put parentheses in the pattern... they
are not grouping symbols."

... but in fact, from ?regexp

"Repetition takes precedence over concatenation, which in turn takes
precedence over alternation. A whole subexpression may be enclosed in
parentheses to override these precedence rules. "

So parentheses *are* in fact "grouping symbols."
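To illustrate the precedence rules quoted from ?regexp, here is a minimal sketch on made-up strings: without parentheses, "+" applies only to the preceding character and "|" splits the whole pattern; with parentheses, both apply to the enclosed subexpression.

```r
# Repetition binds tighter than concatenation: "ab+" repeats only the "b"
rep_plain   <- grepl("^ab+$",   c("abbb", "abab"))  # TRUE FALSE
rep_grouped <- grepl("^(ab)+$", c("abbb", "abab"))  # FALSE TRUE

# Alternation binds loosest: "^ab|cd$" means "^ab" OR "cd$"
s <- c("abxx", "xxcd", "ab", "cd")
alt_plain   <- grepl("^ab|cd$",   s)  # TRUE TRUE TRUE TRUE
alt_grouped <- grepl("^(ab|cd)$", s)  # FALSE FALSE TRUE TRUE
```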

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Tue, Sep 6, 2016 at 5:18 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
I am not near my computer today, but each pair of parentheses gets its own group
number, so you should put one pair of parentheses around the whole set of
alternatives instead of having many of them.

I recommend thinking in terms of what common information you expect to find in 
these various strings, and place your parentheses to capture that information. 
There is no other reason to put parentheses in the pattern... they are not 
grouping symbols.
--
Sent from my phone. Please excuse my brevity.
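Applied to Jun's test strings quoted below, this advice might look like the following sketch. In Jun's final.pattern each of the ten alternatives carries its own three groups (30 in total), so a string that matches, say, the ninth alternative fills groups 25 to 27 and leaves "\\1" empty; sub() replacements can also only refer to "\\1" through "\\9". The helper names dose, weight, stat and single.pattern are hypothetical, chosen for illustration.

```r
test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
                 '240.m.g.>50-70.kg.geo.mean')

# One capturing group per field, with the alternatives *inside* that group
dose   <- "(240\\.m\\.g|3\\.mg\\.kg)"
weight <- "(>50-70\\.kg|>70-90\\.kg|>90-110\\.kg|>110\\.kg|50\\.kg\\.or\\.less)"
stat   <- "(.*)"
# Join the fields with a literal dot and anchor the whole pattern
single.pattern <- paste0("^", paste(dose, weight, stat, sep = "\\."), "$")

sub(single.pattern, "\\1", test.string)  # dose
sub(single.pattern, "\\2", test.string)  # weight band
sub(single.pattern, "\\3", test.string)  # statistic

# Or pull all three groups at once; column 1 of each match is the whole string
parts <- do.call(rbind,
                 regmatches(test.string, regexec(single.pattern, test.string)))[, -1]
parts
```

Because every string now matches the same three groups, the group numbers no longer depend on which alternative happened to match.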

On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
Jun:

1. Tell us your desired result from your test vector and maybe someone
will help.

2. As we played this game once already (you couldn't do it; I showed
you how), this seems to be a function of your limitations with regular
expressions. I'm probably not much better, but in any case, I don't
intend to be your consultant. See if you can find someone locally to
help you if you do not receive a satisfactory reply from the list.
There are many people here who are pretty good at this sort of thing,
but I don't know if they'll reply. Regexes are certainly complex. Perl
people tend to be pretty good at them, I believe. There are numerous
web sites and books on them if you need to acquire expertise for your
work.

Cheers,
Bert


On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen...@gmail.com> wrote:
Hi Bert,

I still couldn't get the multiple patterns to work. Here is an
example. I make the pattern as follows:

final.pattern <- "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"

test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
'240.m.g.>50-70.kg.geo.mean')

sub(final.pattern, '\\1', test.string)
sub(final.pattern, '\\2', test.string)
sub(final.pattern, '\\3', test.string)

Only the third string has been correctly parsed, which matches the
first pattern. It seems the rest of the patterns are not called.

Jun


On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:

Just noticed: My clumsy do.call() line in my previously posted code
below should be replaced with:
pat <- paste(pat,collapse = "|")


pat <- c(pat1,pat2)
paste(pat,collapse="|")
[1] "a+\\.*a+|b+\\.*b+"

************ replace this **************************
pat <- do.call(paste,c(as.list(pat), sep="|"))
********************************************
sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
[1] "a.a"   "bb"    "b.bbb"


-- Bert


On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
Jun:

You need to provide a clear specification via regular expressions of
the patterns you wish to match -- at least for me to decipher it.
Others may be smarter than I, though...

Jeff: Thanks. I have now convinced myself that it can be done (a
"proof" of sorts): If pat1, pat2, ..., patm are m different patterns
(in a vector of patterns) to be matched in a vector of n strings,
where only one of the patterns will match in any string, then use
paste() (probably via do.call()) or otherwise to paste them together
separated by "|" to form the concatenated pattern, pat. Then

sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)

should extract the matching pattern in each (perhaps with a little
fiddling due to precedence rules); e.g.

z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")

pat1 <- "a+\\.*a+"
pat2 <-"b+\\.*b+"
pat <- c(pat1,pat2)

pat <- do.call(paste,c(as.list(pat), sep="|"))
pat
[1] "a+\\.*a+|b+\\.*b+"

sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
[1] "a.a"   "bb"    "b.bbb"
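The sub() approach above needs the "^[^b]*" prefix rather than "^.*" because a leading ".*" is greedy and can swallow part of the wanted match. A minimal alternative sketch, using the same z and patterns, sidesteps the prefix by letting regexpr() locate the leftmost match of the combined pattern and regmatches() extract it. It also checks that paste(..., collapse = "|") builds the same pattern as the do.call() line.

```r
z <- c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
pats <- c("a+\\.*a+", "b+\\.*b+")

# Combine the alternatives; collapse = "|" gives the same string as do.call()
pat <- paste(pats, collapse = "|")
stopifnot(identical(pat, do.call(paste, c(as.list(pats), sep = "|"))))

# regexpr() finds the leftmost match in each string, regmatches() extracts it;
# note that strings with no match at all would simply be dropped from the result
found <- regmatches(z, regexpr(pat, z))
found  # "a.a" "bb" "b.bbb"
```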

Cheers,
Bert




On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen...@gmail.com> wrote:
Thanks for the reply, Bert.

Your solution solves the example. I actually have a more general situation
where I have this dot-concatenated string built from multiple variables. The
problem is that those variables may have values with dots in them, and the
number of dots is not consistent across all values of a variable. So I am
thinking of defining a vector of patterns for the vector of strings, and
hopefully finding a way to apply each pattern in the pattern vector to the
corresponding value of the string vector. The only way I can think of is a
"for" loop, which can be slow. Also, this is happening inside a function I am
writing. I just wonder if there is a more efficient way. Thanks a lot.

Jun

On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:

Well, he did provide an example, and...


z <- c('TX.WT.CUT.mean','mg.tx.cv')

sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
[1] "WT.CUT" "tx"


## seems to do what was requested.
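Bert's pattern keeps everything between the first and the last dot. A sketch of the same idea written without the lazy "+?" quantifier, by simply forbidding dots in the first field (same z as above):

```r
z <- c('TX.WT.CUT.mean', 'mg.tx.cv')

# ^[^.]+\\.   first field plus its dot (no dots allowed inside it)
# (.+)        everything in between, captured as group 1
# \\.[^.]+$   last dot plus the final field
middle <- sub("^[^.]+\\.(.+)\\.[^.]+$", "\\1", z)
middle  # "WT.CUT" "tx"
```

This only works while the first and last fields themselves contain no dots, which is exactly the limitation Jun runs into later in the thread.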

Jeff would have to amplify on his initial statement, however: do you
mean that separate patterns can always be combined via "|"? Or
something deeper?

Cheers,
Bert


On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
Your opening assertion is false.

Provide a reproducible example and someone will demonstrate.

On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen...@gmail.com> wrote:
Dear list,

I have a vector of strings that cannot be described by one pattern. So
let's say I construct a vector of patterns of the same length as the
vector of strings: can I do element-wise pattern matching and string
substitution?

For example,

pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"

patterns <- c(pattern1,pattern2)
strings <- c('TX.WT.CUT.mean','mg.tx.cv')

Say I want to extract "WT.CUT" from the first string and "tx" from the
second string. If I do

sub(patterns, '\\2', strings), only the first pattern will be used.

Looping over the patterns doesn't work the way I want. I'd appreciate
any comments. Thanks.

Jun
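sub() is vectorised over its x argument but uses only the first element of pattern (with a warning), which is why only pattern1 is applied. A minimal sketch of pairing each pattern with its own string uses mapply(); this is still an R-level loop, so it is mainly a tidier spelling of the "for" loop rather than a guaranteed speed-up.

```r
patterns <- c("([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)",  # pattern1: middle field has one dot
              "([^.]*)\\.([^.]*)\\.(.*)")          # pattern2: middle field has no dot
strings  <- c('TX.WT.CUT.mean', 'mg.tx.cv')

# Apply patterns[i] to strings[i] and keep back-reference 2 of each match
extracted <- mapply(function(p, s) sub(p, "\\2", s), patterns, strings,
                    USE.NAMES = FALSE)
extracted  # "WT.CUT" "tx"
```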


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
