On 06/ 8/16 05:15 PM, Dimitry Andric wrote:
> On 08 Jun 2016, at 21:11, Gerald Pfeifer <ger...@pfeifer.com> wrote:
>>
>> I got a user report, and could reproduce this, that building
>> GCC (lang/gcc, but also current HEAD, so probably pretty much
>> any version) with FreeBSD 11 and LANG = en_US.UTF-8 we get
>> conflicting entries in $BUILDDIR/gcc/options.h such as
>>
>>   OPT_d = 135, /* -d */
>>   OPT_D = 136, /* -D */
>>   OPT_d = 137, /* -d */
>>   OPT_D = 138, /* -D */
>>   OPT_d = 141, /* -d */
>>   OPT_D = 142, /* -D */
>>   OPT_d = 143, /* -d */
>>
>> Using LANG = en_US (without UTF-8), everything works fine.
>>
>> Any ideas what might be going on here?  (This is done via
>> AWK scripts from what I can tell, does this trigger any
>> ideas?)
>
> It is definitely something caused by our awk in base, in any case.
> First opt-gather.awk is run to generate a flat list of all options:
>
> /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-gather.awk
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/ada/gcc-interface/lang.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/fortran/lang.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/go/lang.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/java/lang.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/lto/lang.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/c-family/c.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/common.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/fused-madd.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/i386/i386.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/rpath.opt
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/freebsd.opt > tmp-optionlist
>
> Then opt-functions.awk is run to process optionlist into options.h:
>
> /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-functions.awk -f
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-read.awk -f
> /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opth-gen.awk < optionlist > options.h
>
> If I run the first step using LANG=C, or without any LANG setting, both
> optionlist and options.h are as expected.  If I run the first step using
> LANG=en_US.UTF-8, the optionlist is sorted differently, for example the
> "good" optionlist has the uppercase d options first, and much later the
> lowercase d options:
>
> D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing
> after %qs)^\-D<macro>[=<val>] Define a <macro> with <val> as its value.  If
> just <macro> is given, <val> is taken to be 1
> D^\Driver Joined Separate
> D^\Fortran Joined Separate
> ... much later in the file, after all options starting with an uppercase
> letter ...
> d^\C ObjC C++ ObjC++ Joined
> d^\Common Joined^\-d<letters> Enable dumps from specific passes of the
> compiler
> d^\Fortran Joined
> d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
>
> The "bad" optionlist has the upper and lower case d options sorted
> together:
>
> d^\C ObjC C++ ObjC++ Joined
> D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing
> after %qs)^\-D<macro>[=<val>] Define a <macro> with <val> as its value.  If
> just <macro> is given, <val> is taken to be 1
> d^\Common Joined^\-d<letters> Enable dumps from specific passes of the
> compiler
> D^\Driver Joined Separate
> defsym=^\Driver JoinedOrMissing
> defsym^\Driver Separate
> d^\Fortran Joined
> D^\Fortran Joined Separate
> d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
>
> Note that GNU awk does *not* produce a different optionlist file when
> used with either LANG=C or LANG=en_US.UTF-8.
>
> opt-gather.awk's sorting function looks like this:
>
> function sort(ARRAY, ELEMENTS)
> {
>     for (i = 2; i <= ELEMENTS; ++i) {
>         for (j = i; ARRAY[j-1] > ARRAY[j]; --j) {
>             temp = ARRAY[j]
>             ARRAY[j] = ARRAY[j-1]
>             ARRAY[j-1] = temp
>         }
>     }
>     return
> }
>
> So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
> differently in our awk, depending on the LANG settings.  No idea when
> that changed, though, if it changed at all...

This behaviour has been known for a very long time:

https://svnweb.freebsd.org/changeset/base/173731

and it is not our fault:

https://www.gnu.org/software/gawk/manual/html_node/POSIX-String-Comparison.html

GNU awk produces the same output with the "--posix" option.

FYI...

Jung-uk Kim
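
To illustrate the effect described above, here is a minimal Python sketch (not the awk code itself) that ports the quoted sort() and runs it with two different comparison rules. It assumes that "^\" in the quoted optionlist stands for the 0x1c separator byte. byte_key models a strcmp()-style comparison, as in the C locale. collation_like_key is a hypothetical stand-in for strcoll() under en_US.UTF-8: it simply ignores case and non-alphanumeric characters, which is only a rough approximation and not the real collation table. The records are a subset of the ones quoted above.

```python
SEP = "\x1c"  # assumption: the "^\" shown in the quoted optionlist

# A few records taken from the quoted optionlist, in arbitrary input order.
RECORDS = [
    "d" + SEP + "Java Separate SeparateAlias Alias(foutput-class-dir=)",
    "D" + SEP + "Fortran Joined Separate",
    "defsym" + SEP + "Driver Separate",
    "d" + SEP + "C ObjC C++ ObjC++ Joined",
    "D" + SEP + "Driver Joined Separate",
    "defsym=" + SEP + "Driver JoinedOrMissing",
    "d" + SEP + "Fortran Joined",
]


def insertion_sort(items, key):
    """Port of the quoted sort(), with the comparison rule passed in as key."""
    result = list(items)
    for i in range(1, len(result)):
        j = i
        # The awk loop relies on the unset ARRAY[0] comparing lowest;
        # the explicit j > 0 check plays that role here.
        while j > 0 and key(result[j - 1]) > key(result[j]):
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result


def byte_key(record):
    """Compare raw bytes, like strcmp() or the C locale."""
    return record.encode("utf-8")


def collation_like_key(record):
    """Hypothetical approximation of locale collation: ignore case,
    punctuation, spaces and control characters."""
    return "".join(ch for ch in record if ch.isalnum()).casefold()


def option_names(records):
    """Return only the option name (the text before the separator)."""
    return [record.split(SEP)[0] for record in records]


byte_sorted = insertion_sort(RECORDS, byte_key)
collated = insertion_sort(RECORDS, collation_like_key)

print("byte order:     ", option_names(byte_sorted))
print("collation-like: ", option_names(collated))
```

With byte comparison all "D" records come before all "d" records, as in the "good" optionlist. With the collation-like key the two cases interleave, as in the "bad" one. This is consistent with the report above that running the first step with LANG=C gives the expected result.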