Re: Need match character help

Timo Paulssen Thu, 17 May 2018 04:10:45 -0700

The description perhaps doesn't point this out clearly enough: the only
reason the stuff inside the [ ] can match any number of times is the *
after the closing ]. The [ ] is only there because without it the regex
would match something you didn't mean at all. If you're interested, read
on for an explanation, but it might actually be more confusing than
helpful:


Without the [ ], the resulting regex means "either the beginning of the
string is followed by any letter from d to z except g, or there's a g
that's either not before an m, or it is, and followed by the end of the
string".

That's because the | would then split the whole regex into alternatives,
so the ^ anchor belongs only to the first one and the $ anchor only to
the last one. On top of that, the * would cling to the <!before m>,
which is then allowed to not match at all (because it's a * and not a +).
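
A small sketch of the anchor effect, in case it helps. The two regexes
below are simplified ones made up for illustration (single letter, no
quantifier, no <!before m>), not the ones from this thread; the variable
names and the inputs dAB and ABg are invented too:

```raku
# Same two alternatives, once without and once with the [ ] group.
my $ungrouped = / ^ <[d..z]-[g]> | g $ /;
my $grouped   = / ^ [ <[d..z]-[g]> | g ] $ /;

# 'dAB' merely starts with a d, 'ABg' merely ends with a g.
# Only 'd' and 'g' consist of exactly one allowed letter.
for <dAB ABg d g> -> $input {
    say "$input ungrouped: {so $input ~~ $ungrouped}, grouped: {so $input ~~ $grouped}";
}
```

Without the group, ^ only guards the first alternative and $ only the
second, so dAB and ABg both get through. With the group, both anchors
apply to whichever alternative is picked.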

Also, putting a quantifier (which is what * and + are called) on a
before or after assertion makes no sense and probably leads to an
infinite loop: the regex engine tries to make you proud by matching it
as often as it possibly can, which, if the assertion is true, is
infinitely often. It is very diligent, but it does not really think much
about what it does.
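
For comparison, here is the regex from the quoted message below with
the * on the group instead of on the assertion, as a sketch you can run.
The variable names are made up, and the inputs are a few of the ones
from the session quoted further down, so it should agree with the
results shown there:

```raku
# The * applies to the whole [ ] group, not to <!before m>.
my $only-d-to-z-without-gm = /
    ^                    # start of the string
    [
    |  <[d..z]-[g]>      # any letter d through z, except g
    |  g <!before m>     # a g, as long as no m follows it
    ]*                   # the group as a whole may repeat
    $                    # end of the string
/;

# Report for each input whether the whole string is accepted.
for <hi bleh fog dm gm rofl> -> $input {
    my $verdict = $input ~~ $only-d-to-z-without-gm ?? 'matches' !! 'does not match';
    say "$input $verdict";
}
```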

Hope that helps
  - Timo


On 17/05/18 12:51, Timo Paulssen wrote:
>
> Character classes are fundamentally the wrong thing for "phrases",
> since they only ever describe a single character.
>
> Your current regex (before changing [gm] to ["gm"]) was expressing
> "from the start of the string, there's any amount of characters d
> through z (but neither g nor m) and then the end of the string", which
> can be more easily expressed as "the whole string contains only
> letters d through z (but neither g nor m)".
>
> What you apparently want is "the whole string contains only letters d
> through z, but never the phrase 'gm'", which - in order to get to a
> working regex - we can rephrase as "the whole string contains only
> letters d through z and no occurrence of g is followed by an m". Let's
> turn that into a regex:
>
>     /^     # Require the match to start at the beginning of the
>            # string so nothing can sneak in before that.
>     [      # Everything in this group will be matched a bunch
>            # of times.
>     |  <[d..z]-[g]>  # either anything between d and z, with no
>                      # further restrictions, except for g.
>     |  g <!before m> # If there's a g, it must not be followed
>                      # by an m.
>     ]*     # end of the group, allow the things in the group to
>            # occur any amount of times.
>     $/     # Require the match to end at the end of the string,
>            # so nothing at the end can sneak in.
>
> Important things to note here:
>
>   * <!before m> (spoken as "do not match before an m") will be fine
>     with occurrences at the end of the string, too.
>   * we don't remove the m from the character class any more, we only
>     keep the g in there, because m can be in the string without
>     restrictions; if there is an m after a g, our regex will already
>     have failed before it even reaches the m, and all other cases are
>     fine (like dm or fm or hm).
>   * you are allowed to put a | not only between things, but also at
>     the very front. This is allowed in the syntax so that you can line
>     things up vertically like I did. Think of it as similar to
>     allowing a , after the last element in a list, like with [1, 2, 3,
>     4, ]
>
>> hi
>> Match: ｢hi｣
>> bleh
>> Match: Nil
>> fog
>> Match: ｢fog｣
>> dm
>> Match: ｢dm｣
>> fm
>> Match: ｢fm｣
>> hm
>> Match: ｢hm｣
>> gm
>> Match: Nil
>> rofl
>> Match: ｢rofl｣
>> dddddddddddg
>> Match: ｢dddddddddddg｣
>> gggggggggggg
>> Match: ｢gggggggggggg｣
>> mmmmmmmm
>> Match: ｢mmmmmmmm｣
>
> Hope that helps!
>   - Timo
