On 2025-01-26 Pali Rohár wrote:
> Maybe it could be a good idea to look into last released version of
> source code for UCRT. Such ___lc_codepage_func() / CP_UTF8 /
> AreFileApisANSI() / CP_ACP / CP_OEMCP should be there too (if it was
> correctly guessed). Maybe there could be some other corner cases?

I cannot do that, sorry. Perhaps someone else can.

> Slightly off-topic, not related to readdir, but could be interesting
> to check, what would happen if you call setlocale(LC_ALL, ".UTF-8")
> before __getmainargs() call (which is in mingw-w64 startup code
> crtexe.c)? Would this force UCRT to pass argv[] in UTF-8 encoding
> into main() even without having UTF-8 manifest?

I didn't test it, but even if it worked, I suspect that ACP wouldn't
become UTF-8 and thus argv[] and the CRT wouldn't be in sync with the
Win32 *A() APIs. I *assume* that ACP is set fairly early, even before
the first instruction is run from the executable.

It's good to avoid the situation where the CRT file system APIs use a
different encoding than the *A() functions. There is code around that
uses both in parallel with the assumption that the encodings are the
same.

One would think that setlocale(LC_ALL, ".UTF-8") is rare, but I think
it's more common than it seems at first. <libintl.h> from
gettext-runtime overrides setlocale() with its intl_setlocale()
wrapper. The wrapper reads environment variables like LC_CTYPE (the
native setlocale() doesn't do that). Cygwin and MSYS2 default to a
UTF-8 locale and they export these POSIX environment variables even
when running native Windows programs. When setlocale(LC_ALL, "")
becomes intl_setlocale(LC_ALL, "") and there is LC_CTYPE=en_US.UTF-8,
one ends up with a UTF-8 locale in UCRT but the *A() APIs and argv[]
are still in 1252.

In MSYS2, you can try with /ucrt64/bin/size.exe by passing it a
filename that contains non-ASCII characters. It cannot open the file
because it tries to use the ANSI-encoded filename from argv[] with
UCRT's file system APIs that expect UTF-8 due to the locale. If you set
LC_CTYPE=C then it works.

On the other hand, the <libintl.h> setlocale() override is there only
when translations have been enabled. If a package is configured with
--disable-nls, then <libintl.h> isn't #included either and the LC_*
environment variables aren't obeyed on native Windows.
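The mismatch described above can be illustrated with a minimal sketch.
It assumes the code page selection rule guessed earlier in this thread
(UTF-8 locale -> CP_UTF8, otherwise CP_ACP or CP_OEMCP depending on
AreFileApisANSI()); that rule has not been confirmed against the UCRT
source. The helper names guessed_crt_fs_codepage(),
crt_and_ansi_apis_disagree() and print_encoding_state() are hypothetical
and exist only for this example. The Windows-only part further assumes
that <locale.h> declares ___lc_codepage_func(), as the mingw-w64 header
does.

```c
#include <stdbool.h>
#include <stdio.h>

#ifdef _WIN32
#include <locale.h>
#include <windows.h>
#endif

/* Same numeric value as CP_UTF8 in <windows.h>; redefined here so the
 * pure helpers below also compile on non-Windows systems. */
#define SKETCH_CP_UTF8 65001u

/* Hypothetical helper: the code page that the CRT file system functions
 * are GUESSED to use. This is the rule speculated about in the thread,
 * not verified UCRT behavior. */
unsigned guessed_crt_fs_codepage(unsigned locale_cp, bool file_apis_ansi,
                                 unsigned acp, unsigned oemcp)
{
    if (locale_cp == SKETCH_CP_UTF8)
        return SKETCH_CP_UTF8;
    return file_apis_ansi ? acp : oemcp;
}

/* Hypothetical helper: true if, under the guessed rule, the CRT file
 * system functions and the Win32 *A() file APIs would interpret the
 * same byte string in different code pages. */
bool crt_and_ansi_apis_disagree(unsigned locale_cp, bool file_apis_ansi,
                                unsigned acp, unsigned oemcp)
{
    unsigned a_api_cp = file_apis_ansi ? acp : oemcp;
    return guessed_crt_fs_codepage(locale_cp, file_apis_ansi, acp, oemcp)
           != a_api_cp;
}

#ifdef _WIN32
/* Query the live process state and report whether the two sides agree.
 * Call this after setlocale() has been called. */
void print_encoding_state(void)
{
    unsigned locale_cp = ___lc_codepage_func();
    bool ansi = AreFileApisANSI() != 0;
    unsigned acp = GetACP();
    unsigned oemcp = GetOEMCP();

    printf("locale code page: %u\n", locale_cp);
    printf("ACP: %u, OEMCP: %u, file APIs ANSI: %d\n",
           acp, oemcp, (int)ansi);
    printf("guessed mismatch: %s\n",
           crt_and_ansi_apis_disagree(locale_cp, ansi, acp, oemcp)
               ? "yes" : "no");
}
#endif
```

On Windows, calling print_encoding_state() right after
setlocale(LC_ALL, "") in a program that #includes <libintl.h>, with
LC_CTYPE=en_US.UTF-8 in the environment, should make the situation
visible. If the description above is right, the locale code page would
be 65001 while GetACP() still reports the system ANSI code page.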
(Packages that use Gnulib might have a setlocale() override still
though.)

To keep things simpler, UTF-8 locales ideally wouldn't be used unless
ACP is UTF-8 (set in application manifest or globally in Windows
settings). It's not that simple though because, for some apps,
filenames don't matter but stdin/stdout encoding does.

It's a curious mess.

-- 
Lasse Collin

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public