Re: Need match character help

Norman Gaywood Thu, 17 May 2018 20:03:27 -0700

Nice RE tutorial :-)

/^ [ | <[d..z]-[g]> | g<!before m> ]* $/


One question I have is what is the first | for?

On Thu, 17 May 2018 at 21:10, Timo Paulssen <t...@wakelift.de> wrote:

> The description perhaps doesn't point this out clearly enough: the only
> reason the stuff inside the [ ] matches any number of times is the * at
> the end. The [ ] is only there because otherwise the regex would instead
> match something you didn't mean at all. If you're interested, read on for
> an explanation, but it might actually be more confusing than helpful:
>
> Without the [ ], the resulting regex means "either the beginning of the
> string is followed by any letter from d to z except g, or there's a g
> that's either not before an m, or it is, and followed by the end of the
> string".
>
> That's because the | would then split the whole regex into two
> alternatives, with the ^ anchor in one and the $ anchor in the other, and
> the * would cling to the <!before m>, which is then allowed to not match
> at all (because it's a * and not a +).
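
As a rough cross-check of the precedence point, here is a small Python sketch. It is a translation, not the Perl 6 code from this thread: it assumes that `[d-fh-z]` stands in for `<[d..z]-[g]>` and `(?!m)` for `<!before m>`, and it leaves out the stray quantifier on the assertion so that only the effect of the missing group is visible. The names `grouped` and `ungrouped` are made up for the illustration.

```python
import re

# Grouped: the alternation is repeated between the two anchors.
grouped = re.compile(r"^(?:[d-fh-z]|g(?!m))*$")

# Ungrouped: | now splits the whole pattern, so ^ belongs only to the
# left branch and $ only to the right branch.
ungrouped = re.compile(r"^[d-fh-z]|g(?!m)$")

for text in ["hello world", "12345g", "fog"]:
    print(repr(text), bool(grouped.search(text)), bool(ungrouped.search(text)))
```

The ungrouped pattern accepts "hello world" (it merely starts with an allowed letter) and "12345g" (it merely ends in a g), while the grouped pattern rejects both and accepts only "fog".
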
>
> Also, putting a quantifier (which is what * and + are called) on a before
> or after assertion makes no sense and probably leads to an infinite loop
> (the regex engine tries to make you proud by matching it as often as it
> possibly can, which, if the assertion is true, is infinitely often. It is
> very diligent, but it does not really think much about what it does).
>
> Hope that helps
>   - Timo
>
> On 17/05/18 12:51, Timo Paulssen wrote:
>
> Character classes are fundamentally the wrong thing for "phrases", since
> each one describes only a single character.
>
> Your current regex (before changing [gm] to ["gm"]) was expressing "from
> the start of the string, there's any number of characters d through z (but
> neither g nor m) and then the end of the string", which can be more easily
> expressed as "the whole string contains only letters d through z (but
> neither g nor m)".
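
To make the single-character point concrete, here is a hedged Python sketch of the behaviour described above. The pattern is an assumption reconstructed from the description ("only letters d through z, but neither g nor m"), not the poster's actual Perl 6 regex, and `accepts` is a hypothetical helper name.

```python
import re

# Assumed Python rendering of "letters d through z, minus g and minus m".
# A character class only ever looks at one character at a time.
ONLY_D_TO_Z_NO_G_NO_M = re.compile(r"[d-fh-ln-z]*")

def accepts(word: str) -> bool:
    # fullmatch anchors at both ends, like ^ ... $ in the description.
    return ONLY_D_TO_Z_NO_G_NO_M.fullmatch(word) is not None

for word in ["hi", "fog", "hm", "gm"]:
    print(word, accepts(word))
```

Only "hi" is accepted. "fog" and "hm" are rejected although neither contains the phrase "gm", because the class bans every g and every m individually.
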
>
> What you apparently want is "the whole string contains only letters d
> through z, but never the phrase 'gm'", which - in order to get to a working
> regex - we can rephrase as "the whole string contains only letters d
> through z and no occurrence of g is followed by an m". Let's turn that into
> a regex:
>
>     /^     # Require the match to start at the beginning of the
>            # string so nothing can sneak in before that.
>     [      # Everything in this group will be matched a bunch
>            # of times.
>     |  <[d..z]-[g]>  # either anything between d and z, with no
>                      # further restrictions, except for g.
>     |  g <!before m> # If there's a g, it must not be followed
>                      # by an m.
>     ]*     # end of the group, allow the things in the group to
>            # occur any amount of times.
>     $/     # Require the match to end at the end of the string,
>            # so nothing at the end can sneak in.
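
For readers who want to try the same idea outside Perl 6, the annotated regex above can be sketched in Python's verbose mode. This is a translation under assumptions, not the code from the thread: `[d-fh-z]` replaces `<[d..z]-[g]>`, `(?!m)` replaces `<!before m>`, and the leading | is dropped because Python's `re` treats it as an empty alternative rather than ignoring it. `PATTERN` and `matches` are names invented for this sketch.

```python
import re

PATTERN = re.compile(
    r"""
    \A                # start of the string
    (?:               # group, repeated by the * below
        [d-fh-z]      # any letter d through z except g
      | g (?!m)       # or a g that is not followed by an m
    )*                # any number of times
    \Z                # end of the string
    """,
    re.VERBOSE,
)

def matches(word: str) -> bool:
    """Return True if word uses only d..z and never contains 'gm'."""
    return PATTERN.search(word) is not None

for word in ["hi", "bleh", "fog", "dm", "gm", "rofl", "dddddddddddg"]:
    print(word, matches(word))
```

With these inputs the sketch rejects only "bleh" and "gm", which agrees with the results listed further down for the Perl 6 regex.
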
>
> Important things to note here:
>
>    - <!before m> (spoken as "do not match before an m") is also fine with
>    a g at the very end of the string, where nothing follows at all.
>    - We no longer subtract m from the character class, only g, because m
>    can be in the string without restrictions; if there is an m after a g,
>    our regex will already have failed before it even reaches the m, and
>    all other cases are fine (like dm or fm or hm).
>    - You are allowed to put a | not only between things, but also at the
>    very front. This is allowed in the syntax so that you can line things up
>    vertically like I did. Think of it as similar to allowing a , after the
>    last element in a list, like with [1, 2, 3, 4, ]
>
> hi
> Match: ｢hi｣
> bleh
> Match: Nil
> fog
> Match: ｢fog｣
> dm
> Match: ｢dm｣
> fm
> Match: ｢fm｣
> hm
> Match: ｢hm｣
> gm
> Match: Nil
> rofl
> Match: ｢rofl｣
> dddddddddddg
> Match: ｢dddddddddddg｣
> gggggggggggg
> Match: ｢gggggggggggg｣
> mmmmmmmm
> Match: ｢mmmmmmmm｣
>
>
> Hope that helps!
>   - Timo
>
>
>

-- 
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
