This is a very good idea. It would also be good to check that functions which return a char as an int avoid sign-extension. For example, this was one reason to replace wctob() for UCRT: the CRT's wctob() sign-extends its return value, making it impossible to distinguish `return (char)255` from EOF, which is returned on failure. (Yes, it is a CRT bug, but the same idea applies.)
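
To illustrate the return-value side of this, here is a minimal sketch. The helper names are hypothetical and this is not the CRT's code; it only models what happens when a byte with bit pattern 0xFF is widened to int with and without going through unsigned char. It assumes EOF is -1, as it is in these CRTs.

```
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper modelling the buggy path: a signed char is widened
   straight to int, so bit pattern 0xFF becomes -1. */
int byte_to_int_sign_extended(signed char c)
{
    return c;
}

/* Hypothetical helper modelling the intended path: convert to unsigned char
   first, so bit pattern 0xFF becomes 255. */
int byte_to_int_zero_extended(signed char c)
{
    return (unsigned char)c;
}

/* Small self-check; the first assertion assumes EOF == -1. */
void demo(void)
{
    const signed char byte = -1;   /* bit pattern 0xFF */

    assert(byte_to_int_sign_extended(byte) == EOF);   /* collides with failure */
    assert(byte_to_int_zero_extended(byte) == 255);   /* stays distinguishable */
}
```

With the first helper a caller cannot tell a successful conversion to byte 255 apart from a failure; with the second it can.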
I also wanted to suggest checking the usage of [f]printf functions in mingw-w64 and, where we use them, replacing them with the wide [f]wprintf functions instead. The reason is that applications may call _setmode[1] on stdout/stderr with one of _O_U8TEXT, _O_U16TEXT and _O_WTEXT, and if they do so, [f]printf will fail and produce no output.

Speaking of _setmode, I wonder if there is a way to find out which translation mode is set on a particular file descriptor? It's kinda dumb that there is no way to get it other than calling _setmode (it returns the previous translation mode).

- Kirill Makurin

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode

________________________________
From: Pali Rohár <[email protected]>
Sent: Monday, September 8, 2025 7:29 AM
To: Kirill Makurin <[email protected]>
Cc: mingw-w64-public <[email protected]>
Subject: Re: Tests for mbrlen, mbrtowc and mbsrtowcs fail with crtdll.dll

I was thinking about the isleadbyte issue and it seems that an explicit cast to unsigned char should be used.

Basically every ISO C is* function from ctype.h has the same defined behavior. The function takes an argument of type int, which should be either EOF or a value in the range of unsigned char. The function isleadbyte() is not in ISO C, but it makes sense for it to follow the above logic.

As EOF is defined as -1, it means that sign-extended input would never work for (char)255.

I would suggest checking all usage of is*() functions in mingw-w64 to make sure that there is an explicit cast to unsigned char.

On Thursday 04 September 2025 09:50:22 Kirill Makurin wrote:
> Out of curiosity I tried to configure mingw-w64 for crtdll.dll and run the
> tests. Something strange happens in tests for mbrlen, mbrtowc and mbsrtowcs
> functions.
>
> First, it seems that crtdll's setlocale() does not allow string which specify
> ACP/OCP instead of actual code page (it's not a big issue, I'll send a patch
> in coming days). Second, something seems to be wrong with crtdll's
> isleadbyte().
>
> For mbrlen's test, the failing assertion was
>
> ```
> assert (mbrlen ((char *) Multibyte, 1, &state) == (size_t) -2);
> ```
>
> It seems to me that the issue may come from line 78 in mbrtowc.c:
>
> ```
> } else if (mb_cur_max == 2 && isleadbyte (mbs[0])) {
> ```
>
> Here mbs[0] is passed to isleadbyte(). If value of mbs[0] is outside of range
> [0,127] (as all lead bytes are), it will be sign-extended. This seems to be
> the source of the issue. I tried re-run tests after adding a cast to
> `unsigned char` and it seems to fix it.
>
> I tried configuring and running tests with msvcrt20.dll and they pass. I do
> not have msvcrt10.dll to run tests with it. It would be helpful if anyone
> could run tests with crtdll.dll and msvcrt10.dll to see how they behave.
>
> If the issue is sign-extended input to isleadbyte(), it is fixable by a
> simple patch with cast to `unsigned char`. If issue is indeed in crtdll's
> isleadbyte(), I suggest we replace it by emu.
>
> - Kirill Makurin
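
For reference, the argument-side cast discussed in the quoted message can be sketched as follows. This is only an illustration: is_lead_byte_sketch() and ctype_arg() are hypothetical names, and the predicate is a stand-in for isleadbyte() that hard-codes the code page 932 lead-byte ranges instead of consulting the locale. It follows the is*() contract described above (argument is EOF or a value in the range of unsigned char).

```
#include <assert.h>
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for isleadbyte(). Accepts EOF or 0..UCHAR_MAX, like
   the ISO C is*() functions. The ranges are the code page 932 lead bytes and
   are used purely for illustration. */
int is_lead_byte_sketch(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Turn a byte read from a char buffer into a valid is*() argument. Without
   the cast, a byte above 127 becomes negative where plain char is signed. */
int ctype_arg(char c)
{
    return (unsigned char)c;
}
```

At the call site in question this would read `is_lead_byte_sketch (ctype_arg (mbs[0]))`, or with the real function simply `isleadbyte ((unsigned char) mbs[0])`.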
