l...@gnu.org (Ludovic Courtès) writes:

> (The branch is called ‘wip-’ because the glibc upgrade happens to cause
> troubles: since it has new locale category elements, the locale data is
> incompatible with what older libcs expect, which means the bootstrap
> binaries fail with an assertion failure when trying to load the new
> locale data, like:
>
>   xz: loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof
>   (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed.

I thought spelling out the details of why this is annoying might help
find a solution, so here we go.

The binary format for locales depends on the libc version. Over the
last few releases it happened to remain compatible, but the format of
2.22 differs from that of 2.21 (a new element was added to locale
categories, according to ChangeLog).

During bootstrapping, at some point we build ‘guile-final’ against the
latest libc (2.22). In gnu-build-system.scm we make heavy use of
‘regexp-exec’ (via ‘substitute*’), which calls C code and thus uses
‘scm_to_locale_string’. If we run in the “C” locale, we can only pass
purely ASCII strings to ‘regexp-exec’.

However, it turns out that, occasionally, strings read from files (in
‘patch-shebangs’ etc.) are not ASCII, but rather UTF-8 (see commit
87c8b92). Thus, calls to ‘regexp-exec’ with these strings lead to a
“failed to convert to locale encoding” error.

So ‘guile-final’ needs to run in a UTF-8 locale (the bootstrap Guile
doesn’t have that problem thanks to the hacky
‘guile-default-utf8.patch’).

However, if we set LOCPATH to point to the libc 2.22 locales, we
satisfy ‘guile-final’, but we break all the bootstrap binaries, which
were built with an older libc; specifically, these binaries terminate
with the assertion failure above.

(If you’re still reading, I thank you for your support.)

So we have some sort of an “interesting” chicken-and-egg problem.

We could side-step the issue by using the pure-Scheme SRFI-115 instead
of ‘regexp-exec’. That may work to some extent, but we cannot get rid
of ‘substitute*’ entirely overnight, so it’s not clear whether this
would be enough.

Apart from that, I can only think of dirty hacks.

What do people think?

Ludo’.
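
For readers unfamiliar with the conversion failure described above, here
is a minimal sketch of it. It is written in Python rather than Guile and
is only an analogy: the helper ‘to_locale_string’ is hypothetical and
merely stands in for what ‘scm_to_locale_string’ has to do, with the
"ascii" codec assumed to model the “C” locale and "utf-8" a UTF-8
locale. The sample string is made up for illustration.

```python
# Illustration only: this is not Guile's implementation.
# "ascii" models the encoding of the "C" locale, "utf-8" a UTF-8 locale.

def to_locale_string(text, locale_encoding):
    """Convert TEXT to bytes in LOCALE_ENCODING before handing it to C code.

    Hypothetical stand-in for the conversion done on the way to a C-level
    regexp call; raises ValueError when a character cannot be represented.
    """
    try:
        return text.encode(locale_encoding)
    except UnicodeEncodeError as err:
        raise ValueError("failed to convert to locale encoding") from err


def can_convert(text, locale_encoding):
    """Return True if TEXT is representable in LOCALE_ENCODING."""
    try:
        to_locale_string(text, locale_encoding)
        return True
    except ValueError:
        return False


# A made-up line such as one might read from a file: it contains a
# non-ASCII character.
line = "# Written by Ludovic Courtès"

print(can_convert("#!/bin/sh", "ascii"))  # pure ASCII: fine in the "C" locale
print(can_convert(line, "ascii"))         # non-ASCII: conversion fails
print(can_convert(line, "utf-8"))         # same string: fine in a UTF-8 locale
```

The point of the sketch is only that the same string converts or fails
depending on the locale's encoding, which is why ‘guile-final’ needs a
UTF-8 locale while the bootstrap binaries cannot load the 2.22 locale
data.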