I commented on bug-gettext that these fixes work for me when environment variables LANG, LC_*, or LANGUAGE aren't set. Thanks again!
The commit 00211fc69c92 ("setlocale: Support the UTF-8 environment on native Windows.") introduces Windows support for C.UTF-8 in setlocale.c. Emulating C.UTF-8 with "English_United States.65001" looks like a great idea with LC_CTYPE. However, with some other locale categories it's problematic. I think "C.UTF-8" should map to a mixed locale.

I noticed the informative link[1] in the Gettext commit 3873b7f1c777 ("intl: Treat C.UTF-8 locale like C locale."). The wiki page makes me suspect that LC_ALL=C.UTF-8 should set LC_CTYPE to "English_United States.65001" while setting all other categories to "C". This seems pretty clear for LC_COLLATE and LC_NUMERIC but I'm not sure about the other categories.

LC_COLLATE=C.UTF-8 should sort in Unicode codepoint order. In UTF-8 this is the same as byte order, thus "C" does the right thing (at least for valid UTF-8 inputs). In contrast, "English_United States.65001" sorts by English rules, for example, putting "ä" before "b". Test with strcoll("b", "ä").

LC_NUMERIC can make a difference if thousand separators are requested. Thousand separators in printf() are a POSIX feature that UCRT's printf() doesn't support. However, MinGW-w64's replacement via "#define __USE_MINGW_ANSI_STDIO 1" provides an implementation that does support thousand separators.

[1] https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

--
Lasse Collin