On 2025-01-26 Pali Rohár wrote:
> Maybe it could be a good idea to look into last released version of
> source code for UCRT. Such ___lc_codepage_func() / CP_UTF8 /
> AreFileApisANSI() / CP_ACP / CP_OEMCP should be there too (if it was
> correctly guessed). Maybe there could be some other corner cases?

I cannot do that, sorry. Perhaps someone else can.

> Slightly off-topic, not related to readdir, but could be interesting
> to check, what would happen if you call setlocale(LC_ALL, ".UTF-8")
> before __getmainargs() call (which is in mingw-w64 startup code
> crtexe.c)? Would this force UCRT to pass argv[] in UTF-8 encoding
> into main() even without having UTF-8 manifest?

I didn't test it, but even if it worked, I suspect that ACP wouldn't
become UTF-8, and thus argv[] and the CRT wouldn't be in sync with the
Win32 *A() APIs.
I *assume* that ACP is set fairly early, even before the first
instruction is run from the executable.

It's good to avoid the situation where the CRT file system APIs use a
different encoding than the *A() functions. There is code around that
uses both in parallel with the assumption that the encodings are the
same.

One would think that setlocale(LC_ALL, ".UTF-8") is rare but I think
it's more common than it seems at first. <libintl.h> from
gettext-runtime overrides setlocale() with its intl_setlocale()
wrapper. The wrapper reads environment variables like LC_CTYPE (the
native setlocale() doesn't do that).

Cygwin and MSYS2 default to a UTF-8 locale and they export these POSIX
environment variables even when running native Windows programs. When
setlocale(LC_ALL, "") becomes intl_setlocale(LC_ALL, "") and there is
LC_CTYPE=en_US.UTF-8, one ends up with a UTF-8 locale in UCRT but the
*A() APIs and argv[] still use the ANSI code page (1252, for example).
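
To illustrate the paragraph above, here is a minimal C sketch of how an
environment-aware wrapper could end up choosing a UTF-8 locale. It is
not the gettext-runtime code: the names pick_ctype_name() and
names_utf8_codeset() are made up for this example, and the
LC_ALL > LC_CTYPE > LANG precedence is the usual POSIX order, which I
assume here rather than having checked it against intl_setlocale().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first non-empty value among LC_ALL,
 * LC_CTYPE and LANG (assumed POSIX precedence), or "C" if none is set. */
const char *pick_ctype_name(const char *lc_all, const char *lc_ctype,
                            const char *lang)
{
    if (lc_all != NULL && lc_all[0] != '\0')
        return lc_all;
    if (lc_ctype != NULL && lc_ctype[0] != '\0')
        return lc_ctype;
    if (lang != NULL && lang[0] != '\0')
        return lang;
    return "C";
}

/* Hypothetical helper: nonzero if the codeset after '.' is "UTF-8" or
 * "UTF8", ignoring case. A trailing "@modifier" is ignored. */
int names_utf8_codeset(const char *name)
{
    const char *dot = strchr(name, '.');
    char buf[8];
    size_t len, i;

    if (dot == NULL)
        return 0;
    len = strcspn(dot + 1, "@");
    if (len == 0 || len >= sizeof(buf))
        return 0;
    for (i = 0; i < len; i++)
        buf[i] = (char)toupper((unsigned char)dot[1 + i]);
    buf[len] = '\0';
    return strcmp(buf, "UTF-8") == 0 || strcmp(buf, "UTF8") == 0;
}
```

A caller would pass getenv("LC_ALL"), getenv("LC_CTYPE") and
getenv("LANG"). With only LC_CTYPE=en_US.UTF-8 set, the picked name has
a UTF-8 codeset; with LC_CTYPE=C it does not.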

In MSYS2, you can try this with /ucrt64/bin/size.exe by passing it a
filename that contains non-ASCII characters. It cannot open the file
because it uses the ANSI-encoded filename from argv[] with UCRT's
file system APIs, which expect UTF-8 due to the locale. If you set
LC_CTYPE=C then it works.
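
The mismatch is easy to see at the byte level. The following is only a
sketch: is_valid_utf8() is a made-up helper, not anything from UCRT, and
I don't claim it mirrors how UCRT treats a malformed sequence. It just
shows that a Windows-1252 filename such as "f\xE4.o" (an "a" with
diaeresis stored as the single byte 0xE4) is not well-formed UTF-8,
while the UTF-8 form "f\xC3\xA4.o" is.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if the n bytes at s are well-formed
 * UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF). */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t extra, k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead */
        }

        if (n - i <= extra)             /* sequence is truncated */
            return 0;
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}
```

In the 1252 case, 0xE4 looks like the lead byte of a three-byte UTF-8
sequence, but the next byte is '.', so the check fails. A pure ASCII
filename passes either way, which is why the problem only shows up with
non-ASCII characters.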

On the other hand, the <libintl.h> setlocale() override is there only
when translations have been enabled. If a package is configured with
--disable-nls, then <libintl.h> isn't #included either and the LC_*
environment variables aren't obeyed on native Windows. (Packages that
use Gnulib might have a setlocale() override still though.)

To keep things simpler, UTF-8 locales ideally wouldn't be used unless
ACP is UTF-8 (set in application manifest or globally in Windows
settings). It's not that simple though because, for some apps,
filenames don't matter but stdin/stdout encoding does.
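
For the filename-centric case only, the "ideal" rule could be sketched
roughly like this. The names locale_for_acp() and
set_locale_matching_acp() are hypothetical, and the sketch deliberately
ignores the stdin/stdout concern mentioned above. GetACP() and CP_UTF8
(65001) are the real Win32 names; the Windows-only part is guarded so
that the rest compiles elsewhere.

```c
#include <locale.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Numeric value of CP_UTF8, repeated so the helper below doesn't
 * depend on <windows.h>. */
#define UTF8_CODE_PAGE 65001u

/* Hypothetical helper: pick the setlocale() argument for a given ACP.
 * ".UTF-8" only when ACP is UTF-8; otherwise "" so that the CRT
 * follows the system default. */
const char *locale_for_acp(unsigned acp)
{
    return acp == UTF8_CODE_PAGE ? ".UTF-8" : "";
}

#ifdef _WIN32
/* Hypothetical wrapper: keep the CRT locale in step with the ACP. */
char *set_locale_matching_acp(void)
{
    return setlocale(LC_ALL, locale_for_acp(GetACP()));
}
#endif
```

This assumes the native setlocale() is called, not a <libintl.h> or
Gnulib override that would consult the LC_* environment variables.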

It's a curious mess.

-- 
Lasse Collin


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
