The difference between isleadbyte (declared in ctype.h) and _ismbblead (declared in mbctype.h) is that isleadbyte uses the code page set by a call to setlocale(), while _ismbblead uses the code page set by _setmbcp() [1].
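
To picture why the two checks can disagree, here is a toy model in portable C. It is only a sketch and not CRT code: every toy_* name is hypothetical, and the real tables and flags in the CRT are more involved. It assumes each check reads its own 256-entry table, and that a table is filled from lead-byte ranges laid out like CPINFO.LeadByte (pairs of first/last bytes, terminated by a pair of zeros). The CP932 ranges are the ones GetCPInfo() reports for that code page.

```c
#include <string.h>

/* Toy model, NOT CRT code: two independent lead-byte tables. */
static unsigned char toy_locale_table[256]; /* stands in for what isleadbyte reads */
static unsigned char toy_mbcp_table[256];   /* stands in for what _ismbblead reads */

/* Lead-byte ranges in the CPINFO.LeadByte layout: (first, last) pairs,
   terminated by a pair of zeros. */
static const unsigned char toy_cp932_ranges[] = { 0x81, 0x9F, 0xE0, 0xFC, 0, 0 };
static const unsigned char toy_sbcs_ranges[]  = { 0, 0 }; /* no lead bytes */

/* Mark every byte inside the given ranges as a lead byte. */
static void toy_fill(unsigned char table[256], const unsigned char *ranges)
{
    memset(table, 0, 256);
    for (; ranges[0] != 0 && ranges[1] != 0; ranges += 2)
        for (unsigned b = ranges[0]; b <= ranges[1]; b++)
            table[b] = 1;
}

/* Hypothetical stand-in for the code page change done by setlocale(). */
static void toy_setlocale_cp(const unsigned char *ranges)
{
    toy_fill(toy_locale_table, ranges);
}

/* Hypothetical stand-in for _setmbcp(). */
static void toy_setmbcp(const unsigned char *ranges)
{
    toy_fill(toy_mbcp_table, ranges);
}

/* Each check consults only its own table. */
static int toy_isleadbyte(int c) { return toy_locale_table[(unsigned char)c]; }
static int toy_ismbblead(int c)  { return toy_mbcp_table[(unsigned char)c]; }
```

In this model, calling toy_setlocale_cp(toy_cp932_ranges) while the other table still holds toy_sbcs_ranges makes toy_isleadbyte(0x81) report a lead byte and toy_ismbblead(0x81) report none, which is the kind of mismatch a program would see if it changed the locale without also changing the multibyte code page.
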

All functions declared in mbctype.h and mbstring.h use the code page set with _setmbcp() (not with setlocale()), so we should avoid using them at all.

Yes, _ismbblead does not work with UTF-8, and this is not an issue in our case. I believe isleadbyte will always return 0 with UTF-8, but I didn't test it.

It is not a problem for me to recreate my patches if yours are pushed first.

What do you think about splitting misc/wcrtomb.c into multiple files just like you did with misc/mbrtowc.c?

- Kirill Makurin

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmbcp

________________________________
From: Pali Rohár <[email protected]>
Sent: Saturday, August 16, 2025 7:26 AM
To: Kirill Makurin <[email protected]>
Cc: mingw-w64-public <[email protected]>
Subject: Re: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func / ___lc_codepage_func

Hello!

On Friday 15 August 2025 21:48:10 Kirill Makurin wrote:
> Hi Pali,
>
> Some of patches in this series conflict with patches I have sent in `Fix
> return value of mbrlen and mbrtowc`. In particular, splitting implementation
> into three files and removing `mb_wc_common.h`.

I can rebase my changes, that is no problem.

> I like the idea of using `_ismbblead()` instead of `IsDBCSLeadByteEx` and
> removing static `mbrtowc_cp`, but shouldn't it be isleadbyte[1] instead?

That is a good question. I blindly chose _ismbblead because it is already used in crtexewin.c and did not think about it.

Now I'm thinking, what is the difference between those two functions?

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/ismbblead-ismbblead-l

The description is not very useful. To me it looks like _ismbblead is limited to double-byte encodings (so it does not work for 4-byte UTF-8), while isleadbyte works for any encoding?

Looking into the MS files, they are defined as (after expanding macros):

#define isleadbyte(c) (__pctype_func()[(unsigned char)(c)] & _LEADBYTE)
#define _ismbblead(c) ((__p__mbctype()+1)[(unsigned char)(c)] & _M1)

And it seems that ucrt sets both the _LEADBYTE and _M1 flags for the CPINFO.LeadByte ranges returned from GetCPInfo().

So I do not know which should be used; this needs more investigation.

mingw-w64's mbrtowc() seems to handle at most mb_cur_max == 2, so its UTF-8 support (cp=65001) is also questionable.

> Replacing `_set_errno()` with `errno` also sounds like a good idea to me.
>
> Both usage of `IsDBCSLeadByteEx` and `_set_errno` came from my original code
> on which I based this implementation for mingw-w64.
>
> - Kirill Makurin
>
> [1]
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
> ________________________________
> From: Pali Rohár <[email protected]>
> Sent: Saturday, August 16, 2025 6:28 AM
> To: [email protected]
> <[email protected]>
> Cc: Martin Storsjö <[email protected]>; LIU Hao <[email protected]>; Kirill
> Makurin <[email protected]>
> Subject: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func /
> ___lc_codepage_func
>
> Pali Rohár (9):
>   crt: Provide emulation of ___lc_handle_func for msvcrt.dll and
>     msvcrtd.dll
>   crt: Improve support for ___lc_codepage_func() function
>   crt: Remove internal mb_wc_common.h and replace it by locale.h usage
>   crt: Remove static helper function mbrtowc_cp()
>   crt: Move private state_mbrlen/state_mbrtowc/state_mbsrtowcs variables
>     to corresponding functions
>   crt: Replace IsDBCSLeadByteEx() by _ismbblead() in mbrtowc()
>   crt: Use errno instead of _set_errno in mbrtowc
>   crt: Split mbrtowc.c into 3 files mbrlen.c mbrtowc.c and mbsrtowcs.c
>   crt: Use only mbstate_t in mbsrtowcs
>
>  mingw-w64-crt/Makefile.am                     |  12 +-
>  mingw-w64-crt/lib-common/msvcrt.def.in        |   2 +-
>  .../{mb_wc_common.h => ___lc_codepage_func.c} |  15 +-
>  ...cale_func.c => ___lc_codepage_func_emul.c} |  64 ++-----
>  .../{mb_wc_common.h => ___lc_handle_func.c}   |  17 +-
>  mingw-w64-crt/misc/btowc.c                    |   2 +-
>  .../misc/{mb_wc_common.h => mbrlen.c}         |  17 +-
>  mingw-w64-crt/misc/mbrtowc.c                  | 147 ++-------------
>  mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} | 174 +-----------------
>  mingw-w64-crt/misc/mingw_wcstold.c            |   2 -
>  mingw-w64-crt/misc/wcrtomb.c                  |   2 +-
>  mingw-w64-crt/misc/wcstof.c                   |   2 -
>  mingw-w64-crt/misc/wctob.c                    |   2 +-
>  13 files changed, 95 insertions(+), 363 deletions(-)
>  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_codepage_func.c} (41%)
>  rename mingw-w64-crt/misc/{lc_locale_func.c => ___lc_codepage_func_emul.c} (15%)
>  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_handle_func.c} (39%)
>  rename mingw-w64-crt/misc/{mb_wc_common.h => mbrlen.c} (38%)
>  copy mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} (30%)
>
> --
> 2.20.1
>

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
