On Mon, Dec 05, 2011 at 09:33:05AM -0500, no-re...@cfengine.com wrote:
>Forum: CFEngine Help
>Subject: problem with negative lookahead regex
>Author: svenXY
>Link to topic: https://cfengine.com/forum/read.php?3,24188,24188#msg-24188
>
>What I want is find out if a line starts with "log4j.rootLogger=", then 
>something else, then does not contain the string ", SYSLOG". This works fine 
>when specifying the whole string, but as soon as I use '(.*)' or even '(.*?)' 
>inbetween, the regex fails.
>
>Here's some code to demonstrate:
>
>
>body common control
>{
>        bundlesequence => { "test" };
>}
>
>
>bundle agent test
>{
>
>  vars:
>    "start" string => "log4j.rootLogger=";
>    "startlong" string => "log4j.rootLogger=INFO, FILE";
>    "end"   string => ", SYSLOG";

Shouldn't this be "SYSLOG" without the comma and space characters?  It
could be the only option listed.  For example:  "log4j.rootLogger=SYSLOG"

Of course, that doesn't address the regex problem.

>  classes:
>    "matched_origin"    expression => regcmp("^($(start)(.*?))(?!$(end))$", 
> "log4j.rootLogger=INFO, FILE");
>    "matched_whole"     expression => regcmp("^($(start)(.*?))(?!$(end))$", 
> "log4j.rootLogger=INFO, FILE, SYSLOG");
>    "matched_l_origin"    expression => regcmp("^($(startlong))(?!$(end))$", 
> "log4j.rootLogger=INFO, FILE");
>    "matched_l_whole"     expression => regcmp("^($(startlong))(?!$(end))$", 
> "log4j.rootLogger=INFO, FILE, SYSLOG");


I think the problem here is that the '.*?' construct still ends up
consuming too much. Because the pattern ends in '$', the lazy match has
to keep expanding until the rest of the pattern can succeed. You match
the value of ${start} just fine, then match *everything else to the end
of the string*, and only then perform the negative lookahead. The
negative lookahead succeeds because there's nothing left to match.
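
To see where the lazy group actually ends up, here is a small sketch. It
uses Python's "re" module as a stand-in for the PCRE library behind
regcmp(), on the assumption that the two agree for these constructs. The
pattern is the original one with $(start) and $(end) expanded by hand,
and the helper name lazy_capture is made up for this illustration.

```python
import re

# The poster's pattern with $(start) and $(end) expanded by hand.
ORIGINAL = r"^(log4j.rootLogger=(.*?))(?!, SYSLOG)$"

def lazy_capture(line):
    """Return what the lazy (.*?) group captured, or None if there is no match."""
    m = re.search(ORIGINAL, line)
    return m.group(2) if m else None

for line in ("log4j.rootLogger=INFO, FILE",
             "log4j.rootLogger=INFO, FILE, SYSLOG"):
    # In both cases the lazy group is stretched to the end of the line,
    # so the lookahead is only ever tested where nothing follows.
    print(repr(line), "->", repr(lazy_capture(line)))
```

Both lines match, and for the second one the lazy group has swallowed
", SYSLOG" itself, so the lookahead never gets a chance to reject it.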

Consider this example where we want to find all rabbits not
chased by a dog:

        $ cat test.txt
        rabbit
        rabbit dog

        $ pcregrep 'rabbit.*(?!dog)' test.txt
        rabbit
        rabbit dog

It finds both.  Now consider:

        $ pcregrep 'rabbit(?!.*dog)' test.txt
        rabbit

So maybe try this regex (note that I've moved the '.*'):

        ^($(start))(?!.*$(end)).*$
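
As a quick check of that suggestion outside of CFEngine, the following
sketch builds the same pattern in Python, again assuming "re" behaves
like PCRE for these constructs. The function name lacks_end and its
default arguments are made up for this illustration; the strings are the
ones from the original post.

```python
import re

def lacks_end(line, start="log4j.rootLogger=", end=", SYSLOG"):
    """True if line begins with start and end does not appear anywhere after it."""
    # Same shape as the suggested regex: the lookahead sits directly after
    # $(start) and scans the rest of the line with its own '.*'.
    pattern = "^(" + start + ")(?!.*" + end + ").*$"
    return re.search(pattern, line) is not None

print(lacks_end("log4j.rootLogger=INFO, FILE"))          # no SYSLOG present
print(lacks_end("log4j.rootLogger=INFO, FILE, SYSLOG"))  # SYSLOG present

# With ", SYSLOG" as the end string, a line where SYSLOG is the only
# appender still counts as lacking it; using plain "SYSLOG" catches it.
print(lacks_end("log4j.rootLogger=SYSLOG"))
print(lacks_end("log4j.rootLogger=SYSLOG", end="SYSLOG"))
```

The last two calls show why dropping the comma and space from the end
string may be worth considering.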


So I don't think this is a bug, but one of the dark and subtle corners
of regexes.



>
>  reports:
>    matched_origin::
>      "this should match - start is not followed by end";
>    matched_whole::
>      "this should not match (best version)";
>    matched_l_origin::
>      "should match, but start contains the whole string";
>    matched_l_whole::
>      "this should not match (whole string version)";
>}
>
>
>
>output is:
>
>
>R: this should match - start is not followed by end
>R: this should not match (best version)
>R: should match, but start contains the whole string
>
>
>- the second one is my problem. It should not work, because I do the following 
>there:
>
>"log4j.rootLogger=INFO, FILE, SYSLOG" matched against 
>"^(log4j.rootLogger=(.*?))(?!, SYSLOG)$"
>
>--> and that should not match!!!
>
>Is that a bug or can someone enlighten my poor understanding of regexes here?
>
>Thanks a bunch,
>Sven
>
>_______________________________________________
>Help-cfengine mailing list
>Help-cfengine@cfengine.org
>https://cfengine.org/mailman/listinfo/help-cfengine

-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)