Mark-

I'm Cc'ing this to the mailing list so that the rest of the user community can 
chime in (since I may be barking up the wrong tree).  This is in response to 
the many bug reports I see that involve regexes (and that are in fact not bugs 
but misunderstandings about which characters are regex characters).  It's a 
thought that came to me in the "if I had it all to do over" frame of mind, and 
maybe you'll find a way to incorporate it in the future somehow...  consider it 
a long-term feature request.  It may be possible to do this with a control 
promise, too, where we change the overall behavior of regex interpretation.

The biggest problem is that to a n00b, it is not always clear where the regex 
is, or when a regex is expected.  Currently, wherever a regex is allowed, the 
string is interpreted as a regex (whether or not it contains any regex 
characters, and even though it looks like any other string).  So, you get code 
that looks like this:

> bundle edit_line comment_lines_matching
>   {
>   vars:
> 
>     "regexes" slist => { "one.*", "two.*", "four.*" };
> 
>   replace_patterns:
> 
>    "^($(regexes))$"
>       replace_with => comment("# ");
> 
>    ".*foo.*"
>       replace_with => comment("# ");
>   } 
> 
> bundle agent wintest
> {
> vars:
> 
>   "dim_array"
>      int =>  readstringarray("array_name","/tmp/array","#[^\n]*",":",10,4000);
> 
> 
> files:
>   "c:\tmp\file"
>     delete => nodir,
>     pathtype => "literal";    # force literal string interpretation
> 
> 
>   "C:/windows/tmp/f\d"
>     delete => nodir,
>     pathtype => "regex";      # force regular expression interpretation
> }

I propose introducing a new semantic distinction with a control promise and two 
new syntactic elements.

0) The default (current) behavior is unchanged, but a new behavior can be 
introduced with the control promise "explicit_regex"
1) With the new behavioral mode, a quoted string like "foo.*" is just a string 
containing 5 characters ("foo", a literal '.', and a literal '*').
2) Also with the new behavioral mode, an explicit regex string like /foo.*/ or 
r"foo.*" or r'foo.*' is a string that contains a regex.  The r"str" notation is 
a convenience, to spare lots of backslashes in filenames.
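
To make rules 1) and 2) concrete, here is a rough sketch in Python (not 
CFEngine code).  The helper name compile_pattern is made up for illustration, 
and Python's re module only stands in for whatever regex engine the agent uses; 
the point is just that a plain quoted string gets escaped, while /.../ and 
r"..." are compiled as written.

```python
import re


def compile_pattern(token: str) -> re.Pattern:
    """Hypothetical helper: interpret one policy token under explicit_regex."""
    # /.../ marks an explicit regex: compile the body as written.
    if len(token) >= 2 and token.startswith("/") and token.endswith("/"):
        return re.compile(token[1:-1])
    # r"..." or r'...' also marks an explicit regex.
    if len(token) >= 3 and token[0] == "r" and token[1] in "\"'" and token[-1] == token[1]:
        return re.compile(token[2:-1])
    # "..." or '...' is a plain string: escape it so every character matches itself.
    if len(token) >= 2 and token[0] in "\"'" and token[-1] == token[0]:
        return re.compile(re.escape(token[1:-1]))
    raise ValueError(f"unrecognized token: {token!r}")


literal = compile_pattern('"foo.*"')   # plain string, 5 literal characters
explicit = compile_pattern('/foo.*/')  # explicit regex

print(bool(literal.fullmatch("foo.*")))    # the literal text matches itself
print(bool(literal.fullmatch("foobar")))   # '.' and '*' are no longer special
print(bool(explicit.fullmatch("foobar")))  # the regex form still behaves as a regex
```

A real implementation would presumably make this decision in the parser rather 
than by inspecting quote characters at run time; the sketch only shows the 
intended semantics.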

This means that the above code (with the appropriate control promise) will now 
look like this:

> body common control
> {
>    explicit_regex => "true";
> }
> 
> bundle edit_line comment_lines_matching
>   {
>   vars:
> 
>     "regexes" slist => { "one.*", "two.*", "four.*" };
> 
>   replace_patterns:
> 
>    # The slist contains strings, but they are expanded and then the
>    # result is interpreted as a regex
>    /^($(regexes))$/
>       replace_with => comment("# ");
> 
>    /.*foo.*/
>       replace_with => comment("# ");
>   } 
> 
> bundle agent wintest
> {
> vars:
> 
>   # 3rd parameter is an explicit regex, 4th param is a string (both could
>   # be regexes)
>   "dim_array"
>      int =>  readstringarray("array_name","/tmp/array",/#[^\n]*/,":",10,4000);
> 
> 
> files:
>   # literal string interpretation is automatic, because the string is
>   # simple and not a regex string
>   "c:\tmp\file"
>     delete => nodir;
> 
> 
>   # regular expression interpretation is automatic, because the string
>   # is a regex
>   r"C:/windows/tmp/f\d"
>     delete => nodir;
> }

The biggest problem with the current behavior is that a user may specify a file 
as "file.ext" and not realize that this can also select a file named "filesext" 
or "file-ext" or "file_ext" (and to get just the one file, they need to say 
"file\.ext").

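A quick way to see this for yourself, using Python's re module as a stand-in 
for the agent's regex engine (the file names are just the ones from the 
paragraph above, and I'm assuming whole-name matching):

```python
import re

names = ["file.ext", "filesext", "file-ext", "file_ext"]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [name for name in candidates if re.fullmatch(pattern, name)]


# Unescaped: '.' matches any single character, so all four names are selected.
unescaped = matching(r"file.ext", names)

# Escaped: '\.' matches only a literal dot, so just "file.ext" is selected.
escaped = matching(r"file\.ext", names)

print(unescaped)
print(escaped)
```
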
I also just saw a user report where
>       perms => system("0400", "DOMAIN+USER", "sysmgt")

didn't work as expected; it needed to be
>       perms => system("0400", "DOMAIN\+USER", "sysmgt")
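
The reason is easy to reproduce outside of cfengine.  The sketch below uses 
Python's re module as a stand-in, and assumes the owner string is matched as a 
regex against the whole account name (I haven't checked exactly how the agent 
does the comparison):

```python
import re

name = "DOMAIN+USER"

# Unescaped, '+' means "one or more of the preceding N", so the pattern
# does not match the account name that contains a literal '+'.
unescaped_hit = re.fullmatch(r"DOMAIN+USER", name)

# What the unescaped pattern does match: names with one or more N's and no '+'.
surprise_hit = re.fullmatch(r"DOMAIN+USER", "DOMAINNNUSER")

# Escaped, '\+' is a literal plus sign and the name matches.
escaped_hit = re.fullmatch(r"DOMAIN\+USER", name)

print(unescaped_hit is None)
print(surprise_hit is not None)
print(escaped_hit is not None)
print(re.escape(name))  # the escaped form a literal-string mode would need
```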


I think it would be so much clearer if the regex/string distinction were 
explicit.  What do you all think?  I'd be willing to help implement this, of 
course.

-Dan
_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
