Cfengine Help: Re: regex help

no-reply Wed, 04 May 2011 19:13:09 -0700

Forum: Cfengine Help
Subject: Re: regex help
Author: sauer
Link to topic: https://cfengine.com/forum/read.php?3,21705,21776#msg-21776


Yes, I'm saying put a .* at the end. :)

To handle the zero-width look-arounds, imagine that there's an index which 
keeps track of which character you're at in the string.  Meanwhile, the regex 
keeps track of which expression it's evaluating.  Say we have the expression 
/[a-z]+\d+/.  Comparing it against the string 123abc456, the regex parser 
indicates "well, first I'm looking for a member of the set a to z".  So, it 
starts at 1, doesn't match.  Goes to 2, doesn't match.  Eventually gets to a, 
and matches.  Now it's found one, but it has to find one or more.  So, it looks 
for either more members of that set, or a digit.  It plods along, eventually 
finding a digit.  At that point "one or more" is already fulfilled by just one 
digit, so the expression is known to match.  But since the + is greedy, it 
keeps consuming digits until it runs out, and the match ends up being abc456.
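
If you want to watch that scan happen, here's a rough sketch using Python's re 
module (my choice for illustration; cfengine itself uses PCRE, but the two 
behave the same way for this simple case).  It assumes the expression is 
written out in full as [a-z]+\d+, and it tries the pattern at each index in 
turn, the way the walk-through above describes.

```python
import re

pattern = re.compile(r"[a-z]+\d+")
text = "123abc456"

# Try the pattern at each starting index, like the imaginary index
# plodding along the string one character at a time.
first_start = None
for pos in range(len(text)):
    attempt = pattern.match(text, pos)
    print(pos, text[pos], "match" if attempt else "no match")
    if attempt:
        first_start = pos
        break

# re.search does the same scan internally and returns the first hit.
m = pattern.search(text)
print(m.group(), m.span())
```

The first three attempts fail on the digits, and the fourth succeeds starting 
at the "a".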

I think we're on the same page up to this point.  But then we throw in the 
zero-width expressions.

When the regex matching encounters a zero-width assertion, imagine that the 
parser says "ok, hang on a second.  I'm gonna copy the index we were at and, in 
a child process, go check this pattern out".  So, it heads off, checking the 
zero-width expression to see if it precedes or follows the place where the main 
index is located.  But then, after checking that, the important thing to 
remember is that it returns to the original index before continuing to check 
the pattern.  If the zero-width look-ahead is at the end of the pattern, fine.  
However, that's not the end of the pattern in an anchored cfengine expression.  
The end of the pattern is a $, i.e., a pattern which matches the end of the 
line.  So, you have to follow the zero-width look-ahead assertion with a 
pattern which matches from the prior-to-the-zero-width pointer to the end of 
the line.
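
Here's a small sketch of why the trailing .* matters.  I'm using Python's 
re.fullmatch as a stand-in for an anchored match (the whole string has to 
match), and the command name and the --quiet flag are made up for the example.

```python
import re

def anchored(pattern, text):
    # Stand-in for an anchored expression: the whole string must match.
    return re.fullmatch(pattern, text) is not None

line = "foo.sh --verbose"  # hypothetical command line

# The look-ahead succeeds, but it consumes nothing, so the index is still
# sitting right after "foo.sh" and the end of the line is never reached.
without_tail = anchored(r"foo\.sh(?! --quiet)", line)

# Adding .* after the look-ahead consumes the rest of the line.
with_tail = anchored(r"foo\.sh(?! --quiet).*", line)

# The look-ahead still does its job when the unwanted text is present.
rejected = anchored(r"foo\.sh(?! --quiet).*", "foo.sh --quiet")

print(without_tail, with_tail, rejected)
```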

Say you want to match one or more lower-case letters followed by a number, but 
the number cannot start with a 1.  You'd say:
/[a-z]+(?!1)\d+/
You match the letters with the [a-z]+, then note that it should not be followed 
by a 1, but *should* be followed by numbers.  Compare it against abc123.  Both 
the (?!1) and the \d start matching immediately after the "c", which is the end 
of the [a-z]+ match.  The [a-z]+ ends at the first non-letter.  Since that 
character is a 1, the (?!1) fails and abc123 is rejected, while abc234 would 
match.
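
To check that by hand, here's a minimal sketch in Python's re module, assuming 
the lower-case letter class is written as [a-z].  The last line shows that the 
look-ahead really is zero-width: the match ends right after the "c".

```python
import re

pattern = re.compile(r"[a-z]+(?!1)\d+")

# The digit right after the letters is a 1, so the look-ahead rejects it.
starts_with_one = pattern.search("abc123")

# The digit right after the letters is a 2, so the look-ahead passes and
# \d+ then consumes the digits starting from that same position.
starts_with_two = pattern.search("abc234")

# With only the look-ahead after the letters, nothing past "c" is consumed.
end_of_letters = re.search(r"[a-z]+(?!1)", "abc234").end()

print(starts_with_one, starts_with_two.group(), end_of_letters)
```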

If you don't want to start the numeric sequence with a 0 or a 1, one hard way 
to express that could be
/[a-z]+(?!0)(?!1)\d+/
You've got two zero-width expressions with nothing separating them, so they 
both start just after the c in my example, and then the \d still also starts 
after the "c".
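
A quick sketch of the stacked look-aheads in Python's re module, assuming the 
goal is to exclude a leading 0 or 1 and that the letter class is [a-z].  The 
second pattern folds both checks into one character class, which I believe is 
the easier way to say the same thing; the test strings are made up.

```python
import re

# Two look-aheads back to back: both are checked at the same position.
stacked = re.compile(r"[a-z]+(?!0)(?!1)\d+")

# A single look-ahead with a character class, for comparison.
combined = re.compile(r"[a-z]+(?![01])\d+")

samples = ["abc023", "abc123", "abc234", "abc901"]

stacked_hits = [s for s in samples if stacked.search(s)]
combined_hits = [s for s in samples if combined.search(s)]

print(stacked_hits)
print(combined_hits)
```

Only the first digit after the letters is constrained, so abc901 is fine even 
though it contains a 0 and a 1 later on.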

The same deal applies with the lookbehind, except that you go backwards instead 
of forwards. :)  Basically, your expression should match the whole string 
without the zero-width expressions, and then the zero-width parts add 
additional description to the pattern without consuming any space in the 
match.  In your string, you want to match the command and anything else to the 
end of the line (the .*), but that "anything else" cannot include the 
redirection to /dev/null (the negative lookahead).
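
Roughly, that could look like the sketch below.  The command path is 
hypothetical (I'm not reproducing your actual string here), and re.fullmatch 
again stands in for an anchored match.  The lookbehind line at the top is just 
to show the "backwards" check landing at the same index.

```python
import re

# Lookbehind: \d+ starts at index 3, and (?<=abc) looks backwards from there.
behind = re.search(r"(?<=abc)\d+", "abc123")

# Hypothetical command; the negative look-ahead scans the rest of the line
# for a redirection to /dev/null, then .* consumes that same rest.
pattern = r"/usr/local/bin/job(?!.*>\s*/dev/null).*"

plain = re.fullmatch(pattern, "/usr/local/bin/job --all")
redirected = re.fullmatch(pattern, "/usr/local/bin/job --all > /dev/null")

print(behind.group(), behind.span())
print(plain is not None, redirected is not None)
```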

I feel like there should be an animated gif or something showing the 
progression through a regex here. :)  Until that time, does this clarify or 
further muddy things?

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
