The difference between isleadbyte (declared in ctype.h) and _ismbblead 
(declared in mbctype.h) is that isleadbyte uses the code page set by a call to 
setlocale(), while _ismbblead uses the code page set with _setmbcp()[1].

All functions declared in mbctype.h and mbstring.h use the code page set with 
_setmbcp() (not with setlocale()), so we should avoid using them altogether.
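
To illustrate, here is a small portable model of the two lookups, following the 
macro expansions quoted further down in this thread. This is only a sketch, not 
the real CRT code: all model_* names, the table layout and the flag values are 
assumptions made for the example, and the CP932 lead-byte ranges (0x81-0x9F, 
0xE0-0xFC) are used as sample data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the CRT flags; the values are assumptions. */
enum { MODEL_LEADBYTE = 0x8000, MODEL_M1 = 0x04 };

/* Table consulted by the isleadbyte() model; filled by the setlocale() side. */
static unsigned short model_pctype[256];
/* Table consulted by the _ismbblead() model; filled by the _setmbcp() side.
   Slot 0 is reserved for EOF, hence the +1 offset in the lookup. */
static unsigned char model_mbctype[257];

/* Sample data: lead-byte ranges of code page 932 as (first, last) pairs. */
static const unsigned char model_cp932_lead[] = { 0x81, 0x9F, 0xE0, 0xFC };

/* Model of setlocale(): only touches the ctype table. */
static void model_setlocale(const unsigned char *ranges, size_t npairs)
{
    size_t i;
    int c;

    memset(model_pctype, 0, sizeof(model_pctype));
    for (i = 0; i < npairs; i++)
        for (c = ranges[2 * i]; c <= ranges[2 * i + 1]; c++)
            model_pctype[c] |= MODEL_LEADBYTE;
}

/* Model of _setmbcp(): only touches the mbctype table. */
static void model_setmbcp(const unsigned char *ranges, size_t npairs)
{
    size_t i;
    int c;

    memset(model_mbctype, 0, sizeof(model_mbctype));
    for (i = 0; i < npairs; i++)
        for (c = ranges[2 * i]; c <= ranges[2 * i + 1]; c++)
            model_mbctype[c + 1] |= MODEL_M1;
}

static int model_isleadbyte(int c)
{
    return model_pctype[(unsigned char)c] & MODEL_LEADBYTE;
}

static int model_ismbblead(unsigned int c)
{
    return (model_mbctype + 1)[(unsigned char)c] & MODEL_M1;
}
```

In this model, calling model_setlocale(model_cp932_lead, 2) makes 
model_isleadbyte(0x81) nonzero, while model_ismbblead(0x81) stays 0 until 
model_setmbcp() is called with the same ranges. That is the kind of mismatch 
we would risk by mixing the two families of functions.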

Yes, _ismbblead does not work with UTF-8, and this is not an issue in our case. 
I believe isleadbyte will always return 0 with UTF-8, but I didn't test it.

It is not a problem for me to recreate patches if yours are pushed first. What 
do you think about splitting misc/wcrtomb.c into multiple files just like you 
did with misc/mbrtowc.c?

- Kirill Makurin

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmbcp
________________________________
From: Pali Rohár <[email protected]>
Sent: Saturday, August 16, 2025 7:26 AM
To: Kirill Makurin <[email protected]>
Cc: mingw-w64-public <[email protected]>
Subject: Re: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func / ___lc_codepage_func

Hello!

On Friday 15 August 2025 21:48:10 Kirill Makurin wrote:
> Hi Pali,
>
> Some of the patches in this series conflict with the patches I sent in `Fix 
> return value of mbrlen and mbrtowc`, in particular splitting the 
> implementation into three files and removing `mb_wc_common.h`.

I can rebase my changes, that is no problem.

> I like the idea of using `_ismbblead()` instead of `IsDBCSLeadByteEx` and 
> removing static `mbrtowc_cp`, but shouldn't it be isleadbyte[1] instead?

That is a good question. I blindly chose _ismbblead because it is
already used in crtexewin.c and did not think about it. Now I'm
thinking, what is the difference between those two functions?

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/ismbblead-ismbblead-l

The descriptions are not very useful. To me it looks like _ismbblead is
limited to double-byte encodings (so it does not work for 4-byte UTF-8),
but isleadbyte works for any encoding?

Looking into ms files, they are defined as (after expanding macros):

#define isleadbyte(c) (__pctype_func()[(unsigned char)(c)] & _LEADBYTE)
#define _ismbblead(c) ((__p__mbctype()+1)[(unsigned char)(c)] & _M1)

And it seems that UCRT sets both the _LEADBYTE and _M1 flags for the
CPINFO.LeadByte ranges returned by GetCPInfo().

So I do not know which should be used, needs more investigation.

mingw-w64's mbrtowc() seems to handle at most mb_cur_max == 2, so its
UTF-8 support (cp=65001) is also questionable.
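
For comparison, UTF-8 needs more than a yes/no lead-byte flag, because the
first byte announces a sequence of up to 4 bytes. A minimal sketch of that
classification follows; utf8_seq_len is a hypothetical helper written for
this mail, not something that exists in mingw-w64 or in the CRT.

```c
/* Return the total length (1..4) of the UTF-8 sequence announced by its
   first byte, or 0 if the byte cannot start a valid sequence. */
static int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;   /* ASCII, single byte */
    if (b >= 0xC2 && b <= 0xDF)
        return 2;   /* lead byte of a 2-byte sequence */
    if (b >= 0xE0 && b <= 0xEF)
        return 3;   /* lead byte of a 3-byte sequence */
    if (b >= 0xF0 && b <= 0xF4)
        return 4;   /* lead byte of a 4-byte sequence */
    /* 0x80-0xBF are continuation bytes, 0xC0/0xC1 would only encode
       overlong forms, 0xF5-0xFF are never valid in UTF-8. */
    return 0;
}
```

A conversion loop limited to "lead byte plus one trail byte" cannot represent
the 3 and 4 cases, which is why a DBCS-style lead-byte check alone is not
enough for UTF-8.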

> Replacing `_set_errno()` with `errno` also sounds like a good idea to me.
>
> Both usage of `IsDBCSLeadByteEx` and `_set_errno` came from my original code 
> on which I based this implementation for mingw-w64.
>
> - Kirill Makurin
>
> [1] 
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
> ________________________________
> From: Pali Rohár <[email protected]>
> Sent: Saturday, August 16, 2025 6:28 AM
> To: [email protected] 
> <[email protected]>
> Cc: Martin Storsjö <[email protected]>; LIU Hao <[email protected]>; Kirill 
> Makurin <[email protected]>
> Subject: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func / 
> ___lc_codepage_func
>
> Pali Rohár (9):
>   crt: Provide emulation of ___lc_handle_func for msvcrt.dll and
>     msvcrtd.dll
>   crt: Improve support for ___lc_codepage_func() function
>   crt: Remove internal mb_wc_common.h and replace it by locale.h usage
>   crt: Remove static helper function mbrtowc_cp()
>   crt: Move private state_mbrlen/state_mbrtowc/state_mbsrtowcs variables
>     to corresponding functions
>   crt: Replace IsDBCSLeadByteEx() by _ismbblead() in mbrtowc()
>   crt: Use errno instead of _set_errno in mbrtowc
>   crt: Split mbrtowc.c into 3 files mbrlen.c mbrtowc.c and mbsrtowcs.c
>   crt: Use only mbstate_t in mbsrtowcs
>
>  mingw-w64-crt/Makefile.am                     |  12 +-
>  mingw-w64-crt/lib-common/msvcrt.def.in        |   2 +-
>  .../{mb_wc_common.h => ___lc_codepage_func.c} |  15 +-
>  ...cale_func.c => ___lc_codepage_func_emul.c} |  64 ++-----
>  .../{mb_wc_common.h => ___lc_handle_func.c}   |  17 +-
>  mingw-w64-crt/misc/btowc.c                    |   2 +-
>  .../misc/{mb_wc_common.h => mbrlen.c}         |  17 +-
>  mingw-w64-crt/misc/mbrtowc.c                  | 147 ++-------------
>  mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} | 174 +-----------------
>  mingw-w64-crt/misc/mingw_wcstold.c            |   2 -
>  mingw-w64-crt/misc/wcrtomb.c                  |   2 +-
>  mingw-w64-crt/misc/wcstof.c                   |   2 -
>  mingw-w64-crt/misc/wctob.c                    |   2 +-
>  13 files changed, 95 insertions(+), 363 deletions(-)
>  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_codepage_func.c} (41%)
>  rename mingw-w64-crt/misc/{lc_locale_func.c => ___lc_codepage_func_emul.c} 
> (15%)
>  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_handle_func.c} (39%)
>  rename mingw-w64-crt/misc/{mb_wc_common.h => mbrlen.c} (38%)
>  copy mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} (30%)
>
> --
> 2.20.1
>

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
