Gary V. Vaughan <gary <at> gnu.org> writes:

> I'm thinking of removing epatsubst, eregexp and erenamesyms in HEAD, in
> favour of a more flexible and scalable changeresyntax builtin as an
> analogue to re_set_syntax in the GNU regex C API.
>
> If a bogus operand is given:
>
>   changeresyntax(`meh')
>   => stdin:1: m4: ERROR: unknown argument to built-in `changeresyntax';
>      use one of: AWK, ED, EGREP, EMACS, GNU_AWK, GREP, POSIX_AWK,
>      POSIX_BASIC, POSIX_EGREP, POSIX_EXTENDED, SED.

I like it!  It is similar to the -regextype primary recently added to
findutils 4.2.24.  And it goes well with the regexprops-generic.texi in
gnulib, which documents all the high-level regular expression families
in GNU programs.  And what about POSIX_MINIMAL_BASIC?

> This replaces 3 builtins with one more powerful builtin, an obvious
> win to my mind.  Can anyone see a downside to this change?

A few issues need to be resolved first.

One - autoconf documents m4_bpatsubst as mapping to m4's patsubst, with
the note that m4_patsubst is reserved for the day that m4 introduces
epatsubst.  We need to make sure that repeated use of changeresyntax is
efficient.  With your proposal, autoconf will have to do something like:

  define(`m4_patsubst', `changeresyntax(`POSIX_EXTENDED')'defn(`patsubst'))
  define(`m4_bpatsubst', `changeresyntax(`EMACS')'defn(`patsubst'))

(which implies that we will need to fix the mixing of text and builtins
in a single definition, or else expand the above example to use helper
macros).

Two - what about case-insensitive regular expressions?  Again, using
findutils as an example, it provides -regex and -iregex as the two
primaries affected by -regextype.  So we should really have 7 regex
builtins in m4: patsubst, regexp, renamesyms, ipatsubst, iregexp,
irenamesyms, changeresyntax.

Three - is changeresyntax(`emacs') the same as changeresyntax(`EMACS')?
Should we accept unambiguous prefixes, like changeresyntax(`em')?

Four - what should the default be?  Do we stick with EMACS syntax, for
1.4.x compatibility, or do we go for broke and make the default
POSIX_EXTENDED?  Whatever we choose, we should probably also have a
command-line option to set the default.

Five - it looks like you already have a patch started.  Don't forget to
add the current resyntax to frozen files, since it should be saved
across loads.  And how should this behave if the state in the frozen
file and the state requested on the command line differ on reload?

Six - is it also worth adding an optional parameter to the existing
regex builtins?  I'm thinking along the lines of:

  patsubst(string, regexp, replacement, opt syntax)

That optional syntax parameter could also serve as the place to request
flags like case-insensitive or global vs. first match only (kind of like
perl's s///ig).  Then you would only need four regex primitives
(changeresyntax, patsubst, regexp, and renamesyms) instead of seven,
since case-insensitivity becomes a flag rather than a separate builtin.
For example, autoconf could then do something like:

  define(`m4_bpatsubst',
    `m4_builtin(`patsubst', `$1', `$2', `$3', `EMACS')')
  define(`m4_patsubst',
    `m4_builtin(`patsubst', `$1', `$2', `$3', `POSIX_EXTENDED')')
  define(`m4_ipatsubst',
    `m4_builtin(`patsubst', `$1', `$2', `$3', `POSIX_EXTENDED,insensitive')')

It still makes sense to provide changeresyntax, even if you add the
optional parameter to the other three builtins, so that you don't have
to name the syntax on every call.

-- 
Eric Blake

_______________________________________________
M4-discuss mailing list
M4-discuss@gnu.org
http://lists.gnu.org/mailman/listinfo/m4-discuss
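
Points Three and Six leave open how a syntax argument such as
`POSIX_EXTENDED,insensitive' would be matched.  The following is only a
rough sketch, in Python rather than m4's C, of one possible answer: names
are compared case-insensitively, unambiguous prefixes are accepted, and
comma-separated fields may carry flags.  The syntax names come from the
error message quoted in this thread; the helper names (resolve,
parse_spec, RegexSpec), the single flag name, and the EMACS default are
assumptions made for illustration, not anything m4 implements.

```python
from typing import NamedTuple, Optional, Sequence

# Syntax names as listed in the error message quoted in the thread.
SYNTAX_NAMES = (
    "AWK", "ED", "EGREP", "EMACS", "GNU_AWK", "GREP", "POSIX_AWK",
    "POSIX_BASIC", "POSIX_EGREP", "POSIX_EXTENDED", "SED",
)

# Hypothetical flag list; only "insensitive" is mentioned in the thread.
FLAG_NAMES = ("insensitive",)


class RegexSpec(NamedTuple):
    syntax: str
    insensitive: bool


def resolve(word: str, candidates: Sequence[str], fold) -> Optional[str]:
    """Return the candidate equal to word, or the single candidate that
    word is a prefix of; None if there is no match or more than one."""
    key = fold(word)
    if key in candidates:
        return key  # an exact name always wins over prefix matching
    matches = [c for c in candidates if c.startswith(key)]
    return matches[0] if len(matches) == 1 else None


def parse_spec(spec: str, default: str = "EMACS") -> RegexSpec:
    """Parse a comma-separated spec; the default syntax is an assumption
    (point Four in the thread leaves the real default undecided)."""
    syntax, insensitive = default, False
    for field in spec.split(","):
        word = field.strip()
        if not word:
            continue  # empty fields leave the current settings alone
        name = resolve(word, SYNTAX_NAMES, str.upper)
        if name is not None:
            syntax = name
            continue
        if resolve(word, FLAG_NAMES, str.lower) == "insensitive":
            insensitive = True
            continue
        raise ValueError(f"unknown or ambiguous regex option: {word!r}")
    return RegexSpec(syntax, insensitive)
```

Under these assumptions, parse_spec("em") and parse_spec("emacs") both
select EMACS, while parse_spec("POSIX") is rejected because it is a prefix
of four different names.  Whether m4 should be that permissive is exactly
the open question in point Three.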