On Fri, Feb 18, 2022, at 9:38 AM, Eric Blake wrote:
>> typeset as_tr_cpp='eval sed
>> '\''y%*abcdefghijklmnopqrstuvwxyz%PABCDEFGHIJKLMNOPQRSTUVWXYZ%;s%[^_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]%_%g'\'
>>
>> (via “typeset -p”), and without -o noglob, using it as simply
>> $as_tr_cpp does, in fact, glob on it.
>>
>> Yes, clearly a bug in GNU autoconf… which I’m not personally going to
>> even try and report. The …[^… is passed to sed. But it is also
>> processed by the shell first, by accident. (This is from
>> OpenSSH-portable’s configure.)
>
> So we need to patch autoconf to properly shell-quote the sed script
> stored in as_tr_cpp.

(This also affects as_tr_sh.)  This is going to take some surgery.
In a generated autoconf script we will have

    # Sed expression to map a string onto a valid CPP name.
    as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"

    # Sed expression to map a string onto a valid variable name.
    as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'"

The RHS of these assignments is double-quoted so the shell variables
($as_cr_letters, etc) are expanded first, leaving us with something like

    # Sed expression to map a string onto a valid CPP name.
    as_tr_cpp="eval sed 'y%*abcdef...%PABCDEF...%;s%[^_abcABC123...]%_%g'"

    # Sed expression to map a string onto a valid variable name.
    as_tr_sh="eval sed 'y%*+%pp%;s%[^_abcABC123...]%_%g'"

Typical usage for these is e.g.

    cat >>confdefs.h <<_ACEOF
    #define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
    _ACEOF

We have an unquoted expansion of $as_tr_cpp, which is split by fields,
then subject to glob expansion, and _then_ eval'ed (which strips the
single quotes).  At the point of the glob expansion, the single quotes
are not special.  We're only getting away with this because of how
unlikely it is that a file matching the glob

    'y%*abcdef...%PABCDEF...%;s%[^_abcABC123...]%_%g'

(_including_ the single quotes) will exist, and how rare it is for
anyone to turn on failglob or nullglob (probably M4sh breaks in many
more ways if you do that).  Note that if there were spaces inside the
single quotes, the sed script would get split into multiple words!

I think the only practical way to fix this is to convert $as_tr_* into
shell functions, which will also mean that we don't need the eval
anymore, which is nice.  Something like

    as_fn_tr_cpp ()
    {
      sed "y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g"
    }
    as_tr_cpp=as_fn_tr_cpp

If we don't like having $as_cr_letters etc expanded when as_fn_tr_cpp
is invoked, we can define the functions using the equivalent m4_cr_*
macros instead and use single quotes around the sed script.

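
For anyone who wants to see the difference for themselves, here is a minimal sketch of both the hazard and the function-based form. It assumes a POSIX-ish sh with `mktemp -d` available and a sed that accepts `;` between commands. The as_cr_* values are spelled out by hand, and as_tr_cpp_old, demo_dir and hostile are hypothetical names used only for this illustration; none of this is what M4sh actually emits.

```shell
# Hand-written stand-ins for the M4sh character-class variables (assumption).
as_cr_letters='abcdefghijklmnopqrstuvwxyz'
as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
as_cr_digits='0123456789'
as_cr_alnum=$as_cr_letters$as_cr_LETTERS$as_cr_digits

# Existing style: a string that is expanded unquoted, so it is field-split
# and globbed before eval ever sees the single quotes.
as_tr_cpp_old="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"

# Function style: the sed script is a single double-quoted word, so it is
# never subject to field splitting or pathname expansion.
as_fn_tr_cpp ()
{
  sed "y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g"
}
as_tr_cpp=as_fn_tr_cpp

# Scratch directory to play in (hypothetical name).
demo_dir=$(mktemp -d) || exit 1

# In an empty directory the glob matches nothing, so the old form still works.
old_clean=$(cd "$demo_dir" && printf '%s\n' "HAVE_sys/types.h" | $as_tr_cpp_old)

# A file name that matches the glob, single quotes included.  "X" stands in
# for the "*", and "^" matches the bracket expression whether or not the
# shell treats a leading "^" as negation.
hostile="'y%X$as_cr_letters%P$as_cr_LETTERS%;s%^%_%g'"
: > "$demo_dir/$hostile"

# Now the third field of the old form is replaced by that file name before
# eval runs, so sed gets a different script.
old_hostile=$(cd "$demo_dir" && printf '%s\n' "HAVE_sys/types.h" | $as_tr_cpp_old 2>/dev/null)

# The function form is unaffected by the contents of the directory.
new_result=$(cd "$demo_dir" && printf '%s\n' "HAVE_sys/types.h" | $as_tr_cpp)

rm -rf "$demo_dir"

printf 'old form, empty dir:   %s\n' "$old_clean"
printf 'old form, hostile dir: %s\n' "$old_hostile"
printf 'function form:         %s\n' "$new_result"
```

The call sites do not change in this sketch: `$as_tr_cpp` now expands to the single word as_fn_tr_cpp, which contains nothing for the shell to split or glob.
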
I don't know when I will have time to write a patch for this.

zw

p.s. I'm sympathetic to mirabilos' position that POSIX should not
mandate [^...] range complement in globs; but given that there exist
several widely-used shells that already implement [^...], _none_ of the
possible changes to POSIX actually make anything better.  There exist
scripts that require [^...] to expand, and scripts that require it
_not_ to expand.  Defensively coded scripts have to avoid [^...]
entirely, which as we see above can be a major headache.  Changes to
POSIX take upwards of ten years to become ubiquitous.  Honestly, at
this point in history I would be inclined to say "No further changes to
the POSIX shell language, period.  It is what it is.  Use a less
terrible scripting language if you have the option."