On Mon, Dec 05, 2011 at 09:33:05AM -0500, no-re...@cfengine.com wrote:
>Forum: CFEngine Help
>Subject: problem with negative lookahead regex
>Author: svenXY
>Link to topic: https://cfengine.com/forum/read.php?3,24188,24188#msg-24188
>
>What I want is find out if a line starts with "log4j.rootLogger=", then
>something else, then does not contain the string ", SYSLOG". This works fine
>when specifying the whole string, but as soon as I use '(.*)' or even '(.*?)'
>inbetween, the regex fails.
>
>Here's some code to demonstrate:
>
>body common control
>{
>    bundlesequence => { "test" };
>}
>
>bundle agent test
>{
>
>  vars:
>    "start" string => "log4j.rootLogger=";
>    "startlong" string => "log4j.rootLogger=INFO, FILE";
>    "end" string => ", SYSLOG";

Shouldn't this be "SYSLOG" without the comma and space characters?  It
could be the only option listed.  For example:

    log4j.rootLogger=SYSLOG

Of course, that doesn't address the regex problem.

>  classes:
>    "matched_origin" expression => regcmp("^($(start)(.*?))(?!$(end))$",
>                                          "log4j.rootLogger=INFO, FILE");
>    "matched_whole" expression => regcmp("^($(start)(.*?))(?!$(end))$",
>                                         "log4j.rootLogger=INFO, FILE, SYSLOG");
>    "matched_l_origin" expression => regcmp("^($(startlong))(?!$(end))$",
>                                            "log4j.rootLogger=INFO, FILE");
>    "matched_l_whole" expression => regcmp("^($(startlong))(?!$(end))$",
>                                           "log4j.rootLogger=INFO, FILE, SYSLOG");

I think the problem here is that the '.*?' construct still ends up
consuming too much.  It is lazy, but the trailing '$' can only match at
the end of the string, so the engine keeps extending '.*?' until it gets
there.  In other words, you match the value of ${start} just fine, then
match *everything else to the end of the string*, and only then perform
the negative lookahead.  The negative lookahead succeeds because there's
nothing left to match.

Consider this example where we want to find all rabbits not chased by a
dog:

    $ cat test.txt
    rabbit
    rabbit dog

    $ pcregrep 'rabbit.*(?!dog)' test.txt
    rabbit
    rabbit dog

It finds both.  Now consider:

    $ pcregrep 'rabbit(?!.*dog)' test.txt
    rabbit

So maybe try this regex (note that I've moved the '.*' inside the
lookahead, so the lookahead runs right after ${start} while the rest of
the line is still ahead of it):

    ^($(start))(?!.*$(end)).*$

So I don't think this is a bug, but one of the dark and subtle corners
of regexes.

>
>  reports:
>    matched_origin::
>      "this should match - start is not followed by end";
>    matched_whole::
>      "this should not match (best version)";
>    matched_l_origin::
>      "should match, but start contains the whole string";
>    matched_l_whole::
>      "this should not match (whole string version)";
>}
>
>output is:
>
>R: this should match - start is not followed by end
>R: this should not match (best version)
>R: should match, but start contains the whole string
>
>- the second one is my problem.
>It should not work, because I do the following
>there:
>
>"log4j.rootLogger=INFO, FILE, SYSLOG" matched against
>"^(log4j.rootLogger=(.*?)(?!, SYSLOG)$"
>
>--> and that should not match!!!
>
>Is that a bug or can someone enlighten my poor understanding of regexes here?
>
>Thaanks a bunch,
>Sven
>
>_______________________________________________
>Help-cfengine mailing list
>Help-cfengine@cfengine.org
>https://cfengine.org/mailman/listinfo/help-cfengine

-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)
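
For anyone who wants to check the lookahead placement outside CFEngine, here is a minimal sketch in Python. It assumes that Python's `re` module treats lazy quantifiers and negative lookaheads the same way PCRE does for these patterns; it is not CFEngine code, and the names `original` and `moved` are made up for the illustration. The two patterns are built by plain string concatenation, mirroring how $(start) and $(end) are expanded in the policy above.

```python
import re

# Values taken from the policy in the quoted message.
start = "log4j.rootLogger="
end = ", SYSLOG"

# Pattern from the question: lazy '.*?', then the lookahead, then '$'.
# '$' forces '.*?' to run to the end of the string, where the lookahead
# has nothing left to inspect and therefore always succeeds.
original = re.compile("^(" + start + "(.*?))(?!" + end + ")$")

# Suggested pattern: the lookahead runs right after the prefix and scans
# the rest of the line for the unwanted suffix before '.*' consumes it.
moved = re.compile("^(" + start + ")(?!.*" + end + ").*$")

samples = [
    "log4j.rootLogger=INFO, FILE",
    "log4j.rootLogger=INFO, FILE, SYSLOG",
]

for line in samples:
    print(line)
    print("  original pattern matches:", bool(original.match(line)))
    print("  moved lookahead matches: ", bool(moved.match(line)))
```

With the original pattern both sample lines match, which is the behaviour reported in the question. With the lookahead moved, only the line without ", SYSLOG" matches.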