Nice RE tutorial :-) /^ [ | <[d..z]-[g]> | g<!before m> ]* $/
One question I have is what is the first | for?

On Thu, 17 May 2018 at 21:10, Timo Paulssen <t...@wakelift.de> wrote:

> The description perhaps doesn't point out clearly enough: the reason why
> the stuff inside the [ ] will match any amount of times is only the * at
> the end, the [ ] is only there because otherwise the regex would instead
> match something you didn't mean at all. If you're interested, read on for
> an explanation, but it might actually be more confusing than helpful:
>
> The resulting regex means "either the beginning of the string is followed
> by any letter from d to z except g, or there's a g that's either not before
> an m, or it is, and followed by the end of the string".
>
> That's because now the | would not only separate the ^ and $ anchors into
> becoming alternatives, but the * would cling to the <!before m> which is
> now allowed to not match at all (because it's a * and not a +).
>
> Also, putting a quantifier (which is what * and + are called) on a before
> or after assertion makes no sense and probably leads to an infinite loop
> (the regex engine tries to make you proud by matching it as often as it
> possibly can. which if the assertion is true, is infinitely often. it is
> very diligent, but it does not really think much about what it does).
>
> Hope that helps
> - Timo
>
> On 17/05/18 12:51, Timo Paulssen wrote:
>
> character classes are fundamentally the wrong thing for "phrases", since
> they describe only a character.
>
> Your current regex (before changing [gm] to ["gm"]) was expressing "from
> the start of the string, there's any amount of characters d through z (but
> neither g nor m) and then the end of the string", which can be more easily
> expressed as "the whole string contains only letters d through z (but
> neither g nor m)".
>
> What you apparently want is "the whole string contains only letters d
> through z, but never the phrase 'gm'", which - in order to get to a working
> regex - we can rephrase as "the whole string contains only letters d
> through z and no occurrence of g is followed by an m". Let's turn that into
> a regex:
>
> /^                   # Require the match to start at the beginning of the
>                      # string so nothing can sneak in before that.
>     [                # Everything in this group will be matched a bunch
>                      # of times.
>     | <[d..z]-[g]>   # either anything between d and z, with no
>                      # further restrictions, except for g.
>     | g <!before m>  # If there's a g, it must not be followed
>                      # by an m.
>     ]*               # end of the group, allow the things in the group to
>                      # occur any amount of times.
> $/                   # Require the match to end at the end of the string,
>                      # so nothing at the end can sneak in.
>
> Important things to note here:
>
> - <!before m> (spoken as "do not match before an m") will be fine with
>   occurrences at the end of the string, too.
> - we don't remove the m from the character class any more, we only
>   keep the g in there, because m can be in the string without restrictions;
>   if there is an m after a g, our regex will already have failed before it
>   even reaches the m, and all other cases are fine (like dm or fm or hm).
> - you are allowed to put a | not only between things, but also at the
>   very front. This is allowed in the syntax so that you can line things up
>   vertically like I did. Think of it as similar to allowing a , after the
>   last element in a list, like with [1, 2, 3, 4, ]
>
> hi
> Match: 「hi」
> bleh
> Match: Nil
> fog
> Match: 「fog」
> dm
> Match: 「dm」
> fm
> Match: 「fm」
> hm
> Match: 「hm」
> gm
> Match: Nil
> rofl
> Match: 「rofl」
> dddddddddddg
> Match: 「dddddddddddg」
> gggggggggggg
> Match: 「gggggggggggg」
> mmmmmmmm
> Match: 「mmmmmmmm」
>
> Hope that helps!
> - Timo
>

--
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
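
For anyone who wants to cross-check the sample results above outside Raku, here is a rough sketch using Python's re module. It is an assumed translation, not the regex from the thread: the character class <[d..z]-[g]> is written out as [d-fh-z], <!before m> becomes the lookahead (?!m), and \A / \Z stand in for ^ and $. The name check is only a helper made up for this illustration. Note that Python has no counterpart to the leading |; putting one at the front of a Python group would add an empty alternative instead of being ignored, so it is simply left out here.

```python
import re

# Assumed Python counterpart of /^ [ | <[d..z]-[g]> | g <!before m> ]* $/ :
#   <[d..z]-[g]>   ->  [d-fh-z]  (d through z, minus g)
#   g <!before m>  ->  g(?!m)    (a g that is not followed by an m)
# \A and \Z anchor at the very start and very end of the string.
PATTERN = re.compile(r"\A(?:[d-fh-z]|g(?!m))*\Z")


def check(s):
    """Return the matched text, or None when the string is rejected."""
    m = PATTERN.match(s)
    return m.group(0) if m else None


# The same inputs that were tried in the quoted message.
samples = ["hi", "bleh", "fog", "dm", "fm", "hm", "gm", "rofl",
           "dddddddddddg", "gggggggggggg", "mmmmmmmm"]

for s in samples:
    print(s, "->", check(s))
```

Under these assumptions, bleh and gm are the only samples that come back as None, which lines up with the two Nil results in the quoted output.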