According to Paul Eggert on 1/20/2007 12:43 AM:
> Eric Blake <[EMAIL PROTECTED]> writes:
>
>> +  /* This warning must not kill m4 -E, or it will break autoconf. */
>> +  if (text && strstr (text, "${"))
>> +    M4ERROR ((0, 0, "Warning: raw `${' in defn of %s will change semantics",
>> +              name));
>
> This warning will generate a lot of false positives, right?
> Most of the time, a stray ${ in an M4 file won't be followed
> by a series of digits and then a }. So it will be treated
> as itself (for backward compatibility).
OK, I toned down my patch on the M4 side of things. Originally, the patch warned for the two-character sequence ${, since I was planning that even ${foo} could have meaning in M4 2.0 (as the current definition of foo), but we can save that for M4 2.1 and a long transition period. For m4 2.0, if ${ is followed by a non-digit, then I will be sure to stick with the old behavior of literal output. This greatly reduces (but does not eliminate) the number of places in autoconf that need extra quoting; I'll follow up with a patch to autoconf along those lines.

It is also possible in 2.0 to disable ${} handling, using the changesyntax builtin to assign { and } back to the ordinary character category, at the expense of no longer being able to refer to more than 9 arguments to a macro. My patch to autoconf will include an action along those lines, so that no matter how fancy M4 2.0 actually becomes when handling ${}, it is possible for autoconf to ignore that new feature for the sake of the large existing codebase of macros that use raw ${.

Meanwhile, this particular patch is only for the 1.4.x branch, and I'm going ahead and committing it. I hope it is the last patch prior to 1.4.9, although this week's changes in gnulib regarding <string.h> need to stabilize first. It adds the --warn-syntax option (off by default) in order to detect uses of the three-character sequences $<digit><digit> (which will change to mean the one-digit argument concatenated with the second digit, rather than a multi-digit argument; I doubt much code tickles this) as well as uses of ${<digit> (common when generating shell or Makefile code; I doubt there are many false positives where a closing } cannot be found, so the warning is simplified by not looking for it).

I will be using this patch to find the problem spots in autoconf. I already know that m4 1.4.9 + autoconf 2.61 will trigger the warning (and since autom4te runs m4 -E, it is fatal to autoconf), so this patch is careful to document that issue.
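To make the proposed M4 2.0 rules concrete, here is a minimal C sketch of how a macro body might be expanded under them. It is not GNU m4 code: `expand_body` and `append` are hypothetical names, and copying `${` plus digits literally when no closing `}` follows is an assumption, since the message does not settle that case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy up to N bytes of S into OUT (capacity OUTSIZE) starting at USED,
   always leaving OUT NUL-terminated.  Returns the new length.  */
static size_t
append (char *out, size_t outsize, size_t used, const char *s, size_t n)
{
  while (n-- > 0 && used + 1 < outsize)
    out[used++] = *s++;
  out[used] = '\0';
  return used;
}

/* Hypothetical helper (not GNU m4 code): expand macro body TEXT using the
   rules proposed for M4 2.0.  ARGV[0] is the macro name and ARGV[1] through
   ARGV[ARGC - 1] are its arguments; missing arguments expand to "".
     $N    exactly one digit, so "$11" is argument 1 followed by "1"
     ${N}  any number of digits, so "${11}" is argument 11
     ${    followed by a non-digit is copied literally, as in 1.4.x  */
static void
expand_body (const char *text, int argc, const char *const *argv,
             char *out, size_t outsize)
{
  size_t used = 0;
  const char *p = text;

  assert (text != NULL && argv != NULL && out != NULL && outsize > 0);
  out[0] = '\0';
  while (*p != '\0')
    {
      if (p[0] == '$' && isdigit ((unsigned char) p[1]))
        {
          int n = p[1] - '0';
          if (n < argc)
            used = append (out, outsize, used, argv[n], strlen (argv[n]));
          p += 2;
          continue;
        }
      if (p[0] == '$' && p[1] == '{' && isdigit ((unsigned char) p[2]))
        {
          const char *q = p + 2;
          long n = 0;
          while (isdigit ((unsigned char) *q))
            {
              if (n < 100000L)   /* cap to avoid overflow on huge indices */
                n = n * 10 + (*q - '0');
              q++;
            }
          if (*q == '}')
            {
              if (n < argc)
                used = append (out, outsize, used, argv[n], strlen (argv[n]));
              p = q + 1;
              continue;
            }
          /* Assumption: with no closing brace, the '$' is copied literally.  */
        }
      used = append (out, outsize, used, p, 1);
      p++;
    }
}
```

With `foo` as `$0` and arguments `a` through `k`, the body `$11` yields `a1`, `${11}` yields `k`, and `${HOME}` passes through unchanged, which is exactly the split between the two-digit and brace forms described above.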
Hopefully, autoconf 2.62 will be immune from this warning.

2007-01-27  Eric Blake  <[EMAIL PROTECTED]>

	* src/m4.h (warn_syntax): Declare.
	(init_pattern_buffer): Export.
	* src/m4.c (warn_syntax, usage, WARN_SYNTAX_OPTION)
	(long_options, main): Implement new option.
	* src/builtin.c (init_pattern_buffer): Allow NULL regs argument.
	(define_user_macro): Warn on $11 and ${1} if requested.
	* src/input.c (init_pattern_buffer): Delete duplicate method.
	* doc/m4.texinfo (Operation modes): Document it.
	(Arguments): Document future direction of ${11} vs. $11.
	(Incompatibilities): Fix wording on POSIX limitations.
	* checks/get-them: Parse @{ and @} correctly.
	* NEWS: Document this change.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             [EMAIL PROTECTED]
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.90
diff -u -p -r1.1.1.1.2.90 NEWS
--- NEWS 15 Jan 2007 13:51:33 -0000 1.1.1.1.2.90
+++ NEWS 28 Jan 2007 01:50:28 -0000
@@ -15,6 +15,14 @@ Version 1.4.9 - ?? ??? 2007, by ???? (C
   of variable assignment as an extension.
 * The `include' builtin now affects exit status on failure, as required
   by POSIX. Use `sinclude' if you need a successful exit status.
+* A new `--warn-syntax' command-line option allows detection of
+  non-portable syntax that might be broken when upgrading to M4 2.0. For
+  example, POSIX requires a macro definition containing `$11' to expand to
+  the first argument concatenated with 1, rather than the eleventh
+  argument; and allows implementations to choose whether `${11}' is treated
+  as literal text, as in M4 1.4.x, or as the eleventh argument, as in the
+  eventual M4 2.0. Be aware that Autoconf 2.61 will not work with this
+  option enabled.
 * Improved portability to platforms such as BSD/OS.
 
 Version 1.4.8 - 20 November 2006, by Eric Blake (CVS version 1.4.7a)
Index: checks/get-them
===================================================================
RCS file: /sources/m4/m4/checks/Attic/get-them,v
retrieving revision 1.1.1.1.2.8
diff -u -p -r1.1.1.1.2.8 get-them
--- checks/get-them 6 Jan 2007 19:56:11 -0000 1.1.1.1.2.8
+++ checks/get-them 28 Jan 2007 01:50:28 -0000
@@ -73,6 +73,8 @@ BEGIN {
     else
       prefix = "";
     gsub("@@", "@", $0);
+    gsub("@{", "{", $0);
+    gsub("@}", "}", $0);
     gsub("@w{ }", " ", $0);
     gsub("@tabchar{}", "\t", $0);
     printf("%s%s\n", prefix, $0) >> file;
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.108
diff -u -p -r1.1.1.1.2.108 m4.texinfo
--- doc/m4.texinfo 15 Jan 2007 13:51:33 -0000 1.1.1.1.2.108
+++ doc/m4.texinfo 28 Jan 2007 01:50:29 -0000
@@ -577,6 +577,13 @@ is also specified.
Suppress warnings, such as missing or superfluous arguments in macro calls, or treating the empty string as zero. [EMAIL PROTECTED] --warn-syntax +Issue warnings when syntax is encountered that will change semantics in [EMAIL PROTECTED] M4 2.0. For now, the only semantics that will change have +to do with how more than 9 arguments in a macro definition are handled +(@pxref{Arguments}). This warning is disabled by default because it +triggers spurious failures in @acronym{GNU} Autoconf 2.61. + @item -W @var{REGEXP} @itemx [EMAIL PROTECTED] Use @var{REGEXP} as an alternative syntax for macro names. This @@ -1354,8 +1361,8 @@ As a @acronym{GNU} extension, the first not have to be a simple word. It can be any text string, even the empty string. A macro with a non-standard name cannot be invoked in the normal way, as the name is -not recognized. It can only be referenced by the builtins @code{Indir} -(@pxref{Indir}) and @code{Defn} (@pxref{Defn}). +not recognized. It can only be referenced by the builtins @code{indir} +(@pxref{Indir}) and @code{defn} (@pxref{Defn}). @cindex arrays Arrays and associative arrays can be simulated by using this trick. @@ -1375,7 +1382,7 @@ array(eval(`10 + 7')) @result{}array element no. 17 @end example -Change the @code{%d} to @code{%s} and it is an associative array. +Change the @samp{%d} to @samp{%s} and it is an associative array. @node Arguments @section Arguments to macros @@ -1412,13 +1419,6 @@ macro (You should try and improve this example so that clients of @code{exch} do not have to double quote; or @pxref{Improved exch, , Answers}). [EMAIL PROTECTED] @acronym{GNU} extensions [EMAIL PROTECTED] @code{m4} allows the number following the @samp{$} to -consist of one -or more digits, allowing macros to have any number of arguments. This -is not so in UNIX implementations of @code{m4}, which only recognize -one digit. - As a special case, the zeroth argument, @code{$0}, is always the name of the macro being expanded. 
@@ -1443,6 +1443,51 @@ foo The @samp{foo} in the expansion text is @emph{not} expanded, since it is a quoted string, and not a name. [EMAIL PROTECTED] @acronym{GNU} extensions [EMAIL PROTECTED] nine arguments, more than [EMAIL PROTECTED] more than nine arguments [EMAIL PROTECTED] arguments, more than nine [EMAIL PROTECTED] positional parameters, more than nine [EMAIL PROTECTED] @code{m4} allows the number following the @samp{$} to +consist of one or more digits, allowing macros to have any number of +arguments. The extension of accepting multiple digits is incompatible +with @acronym{POSIX}, and is different than traditional implementations +of @code{m4}, which only recognize one digit. Therefore, future +versions of @acronym{GNU} M4 will phase out this feature. [EMAIL PROTECTED], for an example of how to portably access the eleventh +argument. + [EMAIL PROTECTED] also states that @samp{$} followed immediately by [EMAIL PROTECTED]@{} in a macro definition is implementation-defined. This version +of M4 passes the literal characters @[EMAIL PROTECTED] through unchanged, but M4 +2.0 will implement an optional feature similar to @command{sh}, where [EMAIL PROTECTED]@[EMAIL PROTECTED] expands to the eleventh argument, to replace the current +recognition of @samp{$11}. 
Meanwhile, if you want to guarantee that you +will get a literal @[EMAIL PROTECTED] in output when expanding a macro, even +when you upgrade to M4 2.0, you can use nested quoting to your +advantage: + [EMAIL PROTECTED] +define(`foo', `single quoted $`'@[EMAIL PROTECTED] output') [EMAIL PROTECTED] +define(`bar', ``double quoted $'[EMAIL PROTECTED]@} output'') [EMAIL PROTECTED] +foo(`a', `b') [EMAIL PROTECTED] quoted [EMAIL PROTECTED]@} output +bar(`a', `b') [EMAIL PROTECTED] quoted [EMAIL PROTECTED]@} output [EMAIL PROTECTED] example + +To help you detect places in your M4 input files that might change in +behavior due to the changed behavior of M4 2.0, you can use the [EMAIL PROTECTED] command-line option (@pxref{Operation modes, , +Invoking m4}). This will add a warning any time a macro definition +includes @samp{$} followed by multiple digits, or by @[EMAIL PROTECTED] and a +digit. The warning is not enabled by default, because it triggers a +number of warnings in Autoconf 2.61 (and Autoconf uses @option{-E} to +treat warnings as errors), and because it will still be possible to +restore traditional behavior in M4 2.0. + @node Pseudo Arguments @section Special arguments to macros @@ -2588,7 +2633,7 @@ foo @result{}blah @end example -Tracing even works on builtins. However, @command{defn} (@pxref{Defn}) +Tracing even works on builtins. However, @code{defn} (@pxref{Defn}) does not transfer tracing status. @example @@ -4721,10 +4766,10 @@ There are a few builtin macros in @code{ commands from within @code{m4}. Note that the definition of a valid shell command is system dependent. -On UNIX systems, this is the typical @code{/bin/sh}. But on other +On UNIX systems, this is the typical @command{/bin/sh}. But on other systems, such as native Windows, the shell has a different syntax of commands that it understands. 
Some examples in this chapter assume [EMAIL PROTECTED]/bin/sh}, and also demonstrate how to quit early with a known [EMAIL PROTECTED]/bin/sh}, and also demonstrate how to quit early with a known exit value if this is not the case. @menu @@ -4934,7 +4979,7 @@ sysval @result{}0 @end example [EMAIL PROTECTED] results in 127 if there was a problem executing the [EMAIL PROTECTED] results in 127 if there was a problem executing the command, for example, if the system-imposed argument length is exceeded, or if there were not enough resources to fork. It is not possible to distinguish between failed execution and successful execution that had @@ -5262,8 +5307,8 @@ which files are listed on each @code{m4} user's input file, or else each input file uses @code{include}. Reading the common base of a big application, over and over again, may -be time consuming. @acronym{GNU} @code{m4} offers some machinery to speed up -the start of an application using lengthy common bases. +be time consuming. @acronym{GNU} @code{m4} offers some machinery to +speed up the start of an application using lengthy common bases. @menu * Using frozen files:: Using frozen files @@ -5311,7 +5356,7 @@ with the varying input. The first call, option, only reads and executes file @file{base.m4}, defining various application macros and computing other initializations. Once the input file @file{base.m4} has been completely processed, @acronym{GNU} [EMAIL PROTECTED] produces on @file{base.m4f} a @dfn{frozen} file, that is, a [EMAIL PROTECTED] produces in @file{base.m4f} a @dfn{frozen} file, that is, a file which contains a kind of snapshot of the @code{m4} internal state. 
Later calls, containing the @option{-R} option, are able to reload @@ -5466,7 +5511,7 @@ Invoking m4}), unless overridden by othe @itemize @bullet @item -In the @[EMAIL PROTECTED] notation for macro arguments, @var{n} can contain +In the @[EMAIL PROTECTED] notation for macro arguments, @var{n} can contain several digits, while the System V @code{m4} only accepts one digit. This allows macros in @acronym{GNU} @code{m4} to take any number of arguments, and not only nine (@pxref{Arguments}). @@ -5623,10 +5668,11 @@ m4wrap(`a`'m4wrap(`c @end example @item [EMAIL PROTECTED] requires that all builtins that require arguments, but -are called without arguments, behave as though empty strings had been -passed. For example, @code{a`'define`'b} would expand to @code{ab}. -But @acronym{GNU} @code{m4} ignores certain builtins if they have missing [EMAIL PROTECTED] states that builtins that require arguments, but are +called without arguments, have undefined behavior. Traditional +implementations simply behave as though empty strings had been passed. +For example, @code{a`'define`'b} would expand to @code{ab}. But [EMAIL PROTECTED] @code{m4} ignores certain builtins if they have missing arguments, giving @code{adefineb} for the above example. @item
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.55
diff -u -p -r1.1.1.1.2.55 builtin.c
--- src/builtin.c 27 Jan 2007 00:25:33 -0000 1.1.1.1.2.55
+++ src/builtin.c 28 Jan 2007 01:50:30 -0000
@@ -231,6 +231,7 @@ void
 define_user_macro (const char *name, const char *text, symbol_lookup mode)
 {
   symbol *s;
+  size_t len;
 
   s = lookup_symbol (name, mode);
   if (SYMBOL_TYPE (s) == TOKEN_TEXT)
@@ -238,6 +239,43 @@ define_user_macro (const char *name, con
 
   SYMBOL_TYPE (s) = TOKEN_TEXT;
   SYMBOL_TEXT (s) = xstrdup (text ? text : "");
+
+  /* In M4 2.0, $11 will mean the first argument concatenated with 1,
+     not the eleventh argument. Also, ${1} will mean the first
+     argument, rather than literal text (although for compatibility
+     sake, it will be possible to restore the traditional meaning of
+     ${1} using changesyntax). Needing more than 9 arguments is
+     somewhat rare, but using M4 to process shell code is quite
+     common; either way, warn on usages that will change in
+     semantics. */
+  if (warn_syntax && text && (len = strlen (text)) >= 3)
+    {
+      static struct re_pattern_buffer buf;
+      static bool init = false;
+      regoff_t offset = 0;
+
+      if (! init)
+        {
+          const char *msg = "\\$[{0-9][0-9]";
+          init_pattern_buffer (&buf, NULL);
+          msg = re_compile_pattern (msg, strlen (msg), &buf);
+          if (msg != NULL)
+            {
+              M4ERROR ((EXIT_FAILURE, 0,
+                        "unable to check --warn-syntax: %s", msg));
+            }
+          init = true;
+        }
+      while ((offset = re_search (&buf, text, len, offset, len - offset,
+                                  NULL)) >= 0)
+        {
+          M4ERROR ((warning_status, 0,
+                    "Warning: semantics of `$%c%c%s' in `%s' will change",
+                    text[offset + 1], text[offset + 2],
+                    text[offset + 1] == '{' ? "...}" : "", name));
+          offset += 3;
+        }
+    }
 }
 
 /*-----------------------------------------------.
@@ -1828,15 +1866,18 @@ Warning: trailing \\ ignored in replacem
 | Initialize regular expression variables. |
 `------------------------------------------*/
 
-static void
+void
 init_pattern_buffer (struct re_pattern_buffer *buf, struct re_registers *regs)
 {
   buf->translate = NULL;
   buf->fastmap = NULL;
   buf->buffer = NULL;
   buf->allocated = 0;
-  regs->start = NULL;
-  regs->end = NULL;
+  if (regs)
+    {
+      regs->start = NULL;
+      regs->end = NULL;
+    }
 }
 
 /*----------------------------------------.
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.32
diff -u -p -r1.1.1.1.2.32 input.c
--- src/input.c 1 Nov 2006 22:29:08 -0000 1.1.1.1.2.32
+++ src/input.c 28 Jan 2007 01:50:30 -0000
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
 
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006, 2007
    Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
@@ -752,15 +752,6 @@ set_comment (const char *bc, const char
 
 #ifdef ENABLE_CHANGEWORD
 
-static void
-init_pattern_buffer (struct re_pattern_buffer *buf)
-{
-  buf->translate = NULL;
-  buf->fastmap = NULL;
-  buf->buffer = NULL;
-  buf->allocated = 0;
-}
-
 void
 set_word_regexp (const char *regexp)
 {
@@ -776,7 +767,7 @@ set_word_regexp (const char *regexp)
     }
 
   /* Dry run to see whether the new expression is compilable. */
-  init_pattern_buffer (&new_word_regexp);
+  init_pattern_buffer (&new_word_regexp, NULL);
   msg = re_compile_pattern (regexp, strlen (regexp), &new_word_regexp);
   regfree (&new_word_regexp);
 
Index: src/m4.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/m4.c,v
retrieving revision 1.1.1.1.2.41
diff -u -p -r1.1.1.1.2.41 m4.c
--- src/m4.c 5 Jan 2007 02:58:32 -0000 1.1.1.1.2.41
+++ src/m4.c 28 Jan 2007 01:50:30 -0000
@@ -55,6 +55,9 @@ int suppress_warnings = 0;
 /* If not zero, then value of exit status for warning diagnostics. */
 int warning_status = 0;
 
+/* If true, then warn about usage of ${1} in macro definitions. */
+bool warn_syntax = false;
+
 /* Artificial limit for expansion_level in macro.c. */
 int nesting_limit = 1024;
 
@@ -142,10 +145,13 @@ for short options too.\n\
 Operation modes:\n\
       --help                   display this help and exit\n\
       --version                output version information and exit\n\
+", stdout);
+      fputs ("\
   -E, --fatal-warnings         stop execution after first warning\n\
   -i, --interactive            unbuffer output, ignore interrupts\n\
   -P, --prefix-builtins        force a `m4_' prefix to all builtins\n\
   -Q, --quiet, --silent        suppress some warnings for builtins\n\
+      --warn-syntax            warn on syntax that will change in future\n\
 ", stdout);
 #ifdef ENABLE_CHANGEWORD
       fputs ("\
@@ -221,6 +227,7 @@ enum
 {
   DEBUGFILE_OPTION = CHAR_MAX + 1,      /* no short opt */
   DIVERSIONS_OPTION,                    /* not quite -N, because of message */
+  WARN_SYNTAX_OPTION,                   /* no short opt */
 
   HELP_OPTION,                          /* no short opt */
   VERSION_OPTION                        /* no short opt */
@@ -250,6 +257,7 @@ static const struct option long_options[
 
   {"debugfile", required_argument, NULL, DEBUGFILE_OPTION},
   {"diversions", required_argument, NULL, DIVERSIONS_OPTION},
+  {"warn-syntax", no_argument, NULL, WARN_SYNTAX_OPTION},
 
   {"help", no_argument, NULL, HELP_OPTION},
   {"version", no_argument, NULL, VERSION_OPTION},
@@ -455,6 +463,10 @@ main (int argc, char *const *argv, char
         debugfile = optarg;
         break;
 
+      case WARN_SYNTAX_OPTION:
+        warn_syntax = true;
+        break;
+
       case VERSION_OPTION:
         version_etc (stdout, PACKAGE, PACKAGE_NAME, VERSION, AUTHORS, NULL);
         exit (EXIT_SUCCESS);
Index: src/m4.h
===================================================================
RCS file: /sources/m4/m4/src/m4.h,v
retrieving revision 1.1.1.1.2.36
diff -u -p -r1.1.1.1.2.36 m4.h
--- src/m4.h 6 Jan 2007 19:56:11 -0000 1.1.1.1.2.36
+++ src/m4.h 28 Jan 2007 01:50:30 -0000
@@ -110,6 +110,7 @@ extern int max_debug_argument_length; /*
 extern int suppress_warnings;           /* -Q */
 extern int warning_status;              /* -E */
 extern int nesting_limit;               /* -L */
+extern bool warn_syntax;                /* --warn-syntax */
 #ifdef ENABLE_CHANGEWORD
 extern const char *user_word_regexp;    /* -W */
 #endif
@@ -396,6 +397,8 @@ struct predefined
 
 typedef struct builtin builtin;
 typedef struct predefined predefined;
+struct re_pattern_buffer;
+struct re_registers;
 
 void builtin_init (void);
 void define_builtin (const char *, const builtin *, symbol_lookup);
@@ -403,6 +406,7 @@ void define_user_macro (const char *, co
 void undivert_all (void);
 void expand_user_macro (struct obstack *, symbol *, int, token_data **);
 void m4_placeholder (struct obstack *, int, token_data **);
+void init_pattern_buffer (struct re_pattern_buffer *, struct re_registers *);
 
 const builtin *find_builtin_by_addr (builtin_func *);
 const builtin *find_builtin_by_name (const char *);
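For readers more familiar with the POSIX `<regex.h>` interface than with the GNU `re_search` API used in `define_user_macro`, the same scan can be sketched with `regcomp` and `regexec`. This is an illustration rather than m4 code: `count_syntax_warnings` is a hypothetical name, and unlike the patch (which compiles the pattern once into a static buffer) it recompiles the pattern on every call for simplicity.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not GNU m4 code): find each "$" followed by a digit
   or "{" and then a digit in the body TEXT of macro NAME, the same pattern
   the --warn-syntax patch searches for.  Writes one warning per match to
   REPORT when it is non-NULL.  Returns the number of matches, or -1 if the
   pattern cannot be compiled.  */
static int
count_syntax_warnings (const char *name, const char *text, FILE *report)
{
  regex_t re;
  regmatch_t m;
  const char *p = text;
  int count = 0;

  assert (name != NULL);
  if (text == NULL)
    return 0;
  if (regcomp (&re, "\\$[{0-9][0-9]", REG_EXTENDED) != 0)
    return -1;

  while (regexec (&re, p, 1, &m, 0) == 0)
    {
      const char *hit = p + m.rm_so;
      if (report != NULL)
        fprintf (report,
                 "Warning: semantics of `$%c%c%s' in `%s' will change\n",
                 hit[1], hit[2], hit[1] == '{' ? "...}" : "", name);
      count++;
      /* Resume after the three-character match, like `offset += 3'.  */
      p = hit + 3;
    }
  regfree (&re);
  return count;
}
```

Passing `stderr` as REPORT for a body such as `echo ${1}` prints a warning mentioning `${1...}`, mirroring the message format in the patch; bodies like `$1$2` or `${x}` produce no warning.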
_______________________________________________
M4-discuss mailing list
M4-discuss@gnu.org
http://lists.gnu.org/mailman/listinfo/m4-discuss