Re: [Mingw-w64-public] Fix return value of mbrlen and mbrtowc

Kirill Makurin Wed, 20 Aug 2025 06:56:27 -0700

I like the idea to avoid LoadLibrary at all, I wasn't sure how to achieve it. 
Thanks for the info about using DATA in def files.


The bug itself seems to be around since msvcr80.dll, where mbrtowc and other 
C95 conversion functions were introduced. I guess it was unnoticed because it 
only occurs with specific code pages and in very uncommon case when you pass 1 
as `count` argument instead of MB_CUR_MAX.

I can't figure myself how to report a bug in CRT to Microsoft, any pointers? If 
they would fix this in UCRT, this would be amazing.

Using escape sequenced instead of literal UTF-8 characters sounds reasonable. 
In this case I could also replace them in existing tests for conversion 
functions.

- Kirill Makurin
________________________________
From: Martin Storsjö <[email protected]>
Sent: Wednesday, August 20, 2025 9:49 PM
To: mingw-w64-public <[email protected]>
Subject: Re: [Mingw-w64-public] Fix return value of mbrlen and mbrtowc

Hi,

Side note; patches as attachments are much harder to comment on, than
inline patches, with my mail client.

Comments on patch 3:

> crt: use replacements for C95 conversion functions with UCRT

The commit message states that it does this, but it does not say _why_.
This is essential information.

It may have been conveyed elsewhere (prior discussions on the mailing
list, or maybe in prior commit messages as well), but at least for me for
jumping in to review it, I don't have that information available.

Overall - with older msvcrt.dll, we have had the habit to replace lots of
functions with our own, statically linked. With UCRT, the general
direction is to work towards not replacing as many CRT functions, if
possible.

In this case, as far as I've understood, there's a spec bug in the UCRT
implementation of this bug. I would like this bug to be outlined in the
commit message here. Ideally it would also come with a link to a bugreport
to MS about this bug (because they actually do fix these kinds of bugs),
and a note about which version of UCRT the bug was observed in.

Whether to replace the function or not is of course always a judgement
call between the severity of bug and how widespread the usage is. In this
case I guess the function isn't that widely used, so not all users would
end up statically linking more code, so it may be ok in that sense.


Some practical comments on the patch:

> +#ifdef _UCRT
> +static
> +#endif
>  size_t wcrtomb (

Cases like these are kinda ugly and repetitive. It could be nicer to
define a macro like STATIC_ON_UCRT at the start of the file, to avoid some
ifdefs.

> +#ifdef _UCRT
> +#undef wcrtomb
> +#undef wcsrtombs
> +#endif

I don't think we need to undef defines at the end of the file.

> +static size_t wcrtomb_init (
> +  char *__restrict__ mbc,
> +  wchar_t wc,
> +  mbstate_t *__restrict__ state
> +) {
> +  HANDLE ucrt = LoadLibraryW (L"ucrtbase.dll");
> +
> +  if (ucrt == NULL) {
> +    fwprintf (stderr, L"Runtime error: failed to obtain handle to 
> ucrtbase.dll\n");
> +    abort ();
> +  }
> +
> +  __wcrtomb_t real_wcrtomb = (__wcrtomb_t) GetProcAddress (ucrt, "wcrtomb");

Instead of LoadLibrary, we could also consider GetModuleHandle, as the DLL
should be expected to already be loaded here.

However - this approach doesn't work on UWP. In UWP environments, the
regular LoadLibrary and GetModuleHandle are unavailable. (If compiling for
such environments, one probably ends up linking with the winstorecompat
library, which replaces them with stubs.) Or when looking closer at it,
LoadLibrary is being stubbed in both winstorecompat (for Win 8.x "windows
store") and windowsappcompat (for Win 10 UWP), while GetModuleHandle is
stubbed only in the former, so it may be usable on Win10.

But anyway, if we could avoid the need to do this dynamic lookup if
possible, it would be much less brittle.

In this case, we _know_ the DLL we link against does have these functions,
so we don't need to avoid it for the sake of not having a binary loading
(like in the case of msvcrt.dll).

An alternative approach, is to not comment out the original functions in
the .def file, but mark them as "DATA" - which retains the __imp_<func>
symbols but skips the <func> symbol. Then we can reference the real ones
through __imp_<func>, or rather __MINGW_IMP_SYMBOL(<func>), in code,
without requiring the dynamic lookup at all.

If callers end up calling e.g. wcrtomb with a dllimport declaration (our
headers don't declare it this way right now), this would end up going
directly towards the UCRT version though, so it's less complete, but I
think it still would be a simpler solution overall. What do you think?

For patch 4:

> + * Valid UTF-8 character which cannot be represented by a single wchar_t.
> + */
> +char UTF8SurrogatePair[] = "🧡";
>  #endif

I think we'd rather have the fancy utf8 char in the form of escapes,
rather than as a literal utf8 char.

// Martin

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Re: [Mingw-w64-public] Fix return value of mbrlen and mbrtowc

Reply via email to