On Sat, Dec 2, 2023, at 11:24 AM, Zack Weinberg wrote:
> On Sat, Dec 2, 2023, at 6:05 AM, Bruno Haible wrote:
>> On Solaris OpenIndiana/x86_64...
>> 445: semantics.at:1255 AC_PROG_LEX with noyywrap
>> 446: semantics.at:1257 AC_PROG_LEX with yywrap
>> 510: acprograms.at:22 AC_DECL_YYTEXT
>
> I can reproduce the lex-related failures (445/446/510) on
> OmniOS/x86_64 using a custom PATH that excludes bison and flex; expect
> a patch shortly.

The root cause of this failure is a previously unsuspected problem;
IMNSHO it qualifies as a bug in the version of lex that shipped with
OmniOS (and presumably also OpenIndiana). The OmniOS installation I
have access to also ships flex (version 2.6.4), and there’s a
*different* problem with that. It’s more of a C++ compatibility
headache than an actual *bug*, but its consequences are more severe:
it causes Autoconf to pick the wrong LEXLIB under some circumstances,
rather than just “giving up on lex”. I am not sure what to do about
either problem and would appreciate suggestions.

An important detail is that all the problems I’m about to describe
only manifest when you compile the generated scanner as C++. The
Autoconf test suite tries both C and C++ for every macro that can be
used with either language, and expects config.status to be
nigh-identical both ways. Thus, if ./configure reports that we have a
usable lex program for C, but not for C++, the test fails.

First, the flex problem. This system ships both libfl.a and libfl.so,
both defining yywrap and main. If I compile a test .l file that needs
an external definition of yywrap through flex *as C*, everything is
fine. However, if I compile it as C++, then the object file defines
yylex() with a mangled name. This is probably what C++ users of lex
want, and it *ought to* be harmless, because the object file also
defines main. But the linker picks the shared version of libfl to
satisfy the external reference to yywrap, and then fails the link
because libfl.so’s main contains a strong external reference to yylex
with an *unmangled* name.

AC_PROG_LEX then goes on to try -ll, which we should not be using with
flex on this system for reasons I explain below, and that link
*succeeds* because its version of main has only a *weak* reference to
yylex. This makes config.status for C++ say LEXLIB=-ll, while
config.status for C says LEXLIB=-lfl, and so test 446 fails when flex
is available to the test suite.

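To make the weak-versus-strong distinction concrete, here is a minimal
sketch. It is not taken from libfl or libl; it assumes GCC or Clang on
an ELF system (the weak attribute is a compiler extension), and the
names scanner_as_cxx and unmangled_yylex_is_available are made up for
illustration. The namespace is there only so both declarations can
coexist in one file; in the real case the C++ yylex sits at global
scope in the scanner's own object file.

```cpp
// Sketch only: assumes GCC or Clang on an ELF system.

// The kind of reference described for libl's main: a *weak* reference
// to yylex under its unmangled (C) name. If nothing defines that
// symbol, the link still succeeds and the address is null at run time.
extern "C" int yylex(void) __attribute__((weak));

// Stand-in for a scanner compiled as C++: this yylex has C++ linkage,
// so its symbol name is mangled and cannot satisfy the reference above.
// (Hypothetical namespace, used only so both can live in one file.)
namespace scanner_as_cxx {
int yylex() { return 0; }
}

// True only if some object or library defines an unmangled yylex.
bool unmangled_yylex_is_available() { return yylex != nullptr; }
```

Dropping the weak attribute turns this into the libfl.so situation:
the reference becomes strong, and because only the mangled name is
defined, the link fails with an undefined symbol.
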
(Test 445 succeeds, because its version of the test program doesn’t
require yywrap, so it links fine with no -lfl.)

I currently cannot think of a way to modify a test lexer that needs
yywrap so that it links successfully with LEXLIB=-lfl, on this system,
when compiled as C++, but that’s what we need here. (My Linux
workstation avoids this problem by not having a shared version of
libfl. I wonder if there’s a good way to prevent linkers from using
shared libfl, specifically; that would fix the C++ problem and, in the
worst case, would make us report that a test lexer that needs yywrap
can’t be linked as *either* C or C++, which we could live with.)

Now, the lex problem. OmniOS’s /usr/bin/lex identifies itself as

    lex: Software Generation Utilities (SGU) Solaris-ELF (4.0)

It includes libl.so (shared object *only*) and its lexer skeleton
depends on that library for several symbols, not just for yywrap.
(This is why we shouldn’t try to use -ll with flex on this system; a
nontrivial scanner is liable to get symbol clashes and/or malfunction.)

    $ nm --dynamic --defined-only /usr/lib/libl.so
    00000000 A SUNW_1.1
    000000b4 R _DYNAMIC
    00012000 D _GLOBAL_OFFSET_TABLE_
    00000f78 T _PROCEDURE_LINKAGE_TABLE_
    000120a0 D _edata
    000120c0 B _end
    00001ca8 R _etext
    00001162 T allprint@@SUNW_1.1
    0000153d T allprint_w@@SUNW_1.1
    00001285 T main@@SUNW_1.1
    0000124e T sprint@@SUNW_1.1
    0000162b T sprint_w@@SUNW_1.1
    000014b5 T yyless@@SUNW_1.1
    00001b3c T yyless_e@@SUNW_1.1
    00001856 T yyless_w@@SUNW_1.1
    000012c1 T yyracc@@SUNW_1.1
    00001380 T yyreject@@SUNW_1.1
    000019d4 T yyreject_e@@SUNW_1.1
    00001720 T yyreject_w@@SUNW_1.1
    00001537 T yywrap@@SUNW_1.1

You can see that none of these names are mangled.

However, the lexer skeleton declares the functions it needs like this:

| #if defined(__cplusplus) && defined(__EXTERN_C__)
| extern "C" {
| #endif
| int yyback(int *, int);
| int yyinput(void);
| int yylook(void);
| void yyoutput(int);
| int yyracc(int);
| int yyreject(void);
| void yyunput(int);
| int yylex(void);
| #ifdef YYLEX_E
| void yywoutput(wchar_t);
| wchar_t yywinput(void);
| void yywunput(wchar_t);
| #endif
| #ifndef yyless
| int yyless(int);
| #endif
| #ifndef yywrap
| int yywrap(void);
| #endif
| #ifdef LEXDEBUG
| void allprint(char);
| void sprint(char *);
| #endif
| #if defined(__cplusplus) && defined(__EXTERN_C__)
| }
| #endif

That is, compiling as C++ *does not* make the skeleton declare yyless,
yyreject, etc. as extern "C" unless you also define __EXTERN_C__.
Moreover, the %{ %} block is too late to define __EXTERN_C__; it needs
to be on the command line or injected at the top of the generated
lexer.

I consider this an outright bug in this version of lex, since there’s
no way that a C++ scanner can work correctly without __EXTERN_C__
defined.

I don’t want to put a huge amount of work into compatibility with a
buggy lex implementation. I’d be fine with rejecting it altogether,
for C++ lexers at least. However, that breaks the test suite’s
expectation that AC_PROG_LEX’s results will match for C and C++. We do
have machinery to cancel that expectation, but if I use it here, we
won’t get notified of some classes of bugs in hypothetical future
versions of Flex.

Thoughts?
zw
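
P.S. For anyone who wants to see the __EXTERN_C__ ordering problem in
isolation, here is a minimal sketch. It models only the preprocessor
ordering, not the real skeleton, and it assumes (as described above)
that the skeleton's declarations are emitted ahead of the %{ %}
contents. SKELETON_SAW_EXTERN_C and skeleton_decls_are_extern_c are
made-up names.

```cpp
// Sketch only: models the ordering problem, not the real lex skeleton.

// Stand-in for the skeleton's guard, evaluated at the point where the
// skeleton's declarations appear, i.e. before any code from %{ %}.
#if defined(__cplusplus) && defined(__EXTERN_C__)
#define SKELETON_SAW_EXTERN_C 1   // hypothetical marker macro
#else
#define SKELETON_SAW_EXTERN_C 0
#endif

// Stand-in for a definition placed in the %{ %} block. The guard above
// has already been evaluated, so this cannot change its outcome.
#ifndef __EXTERN_C__
#define __EXTERN_C__ 1
#endif

// Records what the guard decided for the skeleton's declarations.
constexpr bool skeleton_decls_are_extern_c = (SKELETON_SAW_EXTERN_C == 1);
```

Compiled as-is, the flag ends up false even though __EXTERN_C__ is
defined by the end of the file. Compiling the same file with
-D__EXTERN_C__ flips it to true, which corresponds to the "on the
command line" option mentioned above.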