Thanks. _ismbblead() really should not be used in mbrtowc(), which uses the code page set by setlocale(). isleadbyte() makes sense to use there instead.
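To illustrate why the two predicates can disagree, here is a small toy model in portable C. It is only a sketch and not the real CRT: all toy_* names and the two tables are hypothetical stand-ins, the flag values are illustrative, and only code page 932 (lead bytes 0x81-0x9F and 0xE0-0xFC) is modelled. The lookups mirror the shape of the two macros quoted below: one table is filled by a setlocale-like call, the other by a _setmbcp-like call.

```c
#include <assert.h>

/* Toy model only. Nothing here is the real CRT; every toy_* name is hypothetical. */
#define TOY_LEADBYTE 0x8000u  /* illustrative flag, plays the role of _LEADBYTE */
#define TOY_M1       0x04u    /* illustrative flag, plays the role of _M1 */

unsigned short toy_pctype[256];   /* stands in for the table behind isleadbyte() */
unsigned char  toy_mbctype[257];  /* stands in for the table behind _ismbblead(); slot 0 is for EOF */

/* Lead-byte ranges of code page 932: 0x81-0x9F and 0xE0-0xFC. */
int toy_is_cp932_lead(unsigned c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Stand-in for setlocale(): updates only the ctype-style table. */
void toy_setlocale_cp(int cp)
{
    for (unsigned c = 0; c < 256; c++)
        toy_pctype[c] = (cp == 932 && toy_is_cp932_lead(c)) ? TOY_LEADBYTE : 0;
}

/* Stand-in for _setmbcp(): updates only the mbctype-style table. */
void toy_setmbcp(int cp)
{
    for (unsigned c = 0; c < 256; c++)
        toy_mbctype[c + 1] = (cp == 932 && toy_is_cp932_lead(c)) ? TOY_M1 : 0;
}

/* Same lookup shape as the isleadbyte() macro quoted in this thread. */
int toy_isleadbyte(int c)
{
    return (toy_pctype[(unsigned char)c] & TOY_LEADBYTE) != 0;
}

/* Same lookup shape as the _ismbblead() macro quoted in this thread. */
int toy_ismbblead(int c)
{
    return ((toy_mbctype + 1)[(unsigned char)c] & TOY_M1) != 0;
}

/* The locale says CP932 while the multibyte code page is a single-byte one. */
void toy_demo(void)
{
    toy_setlocale_cp(932);
    toy_setmbcp(1252);
    assert(toy_isleadbyte(0x81) == 1);  /* follows the locale code page */
    assert(toy_ismbblead(0x81) == 0);   /* follows the _setmbcp-style code page */
    assert(toy_isleadbyte('A') == 0);
}
```

In this model a mbrtowc() that converts according to the locale code page would get the wrong answer from toy_ismbblead() whenever the two code pages differ, which is the case toy_demo() sets up.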
Splitting the wcrtomb.c file is a good idea too.

On Friday 15 August 2025 22:38:28 Kirill Makurin wrote:
> The difference between isleadbyte (declared in ctype.h) and _ismbblead
> (declared in mbctype.h) is that isleadbyte uses code page set with call to
> setlocale(), while _ismbblead uses code page set with _setmbcp[1].
>
> All functions declared in mbctype.h and mbstring.h use code page set with
> _setmbcp() (not with setlocale), so we should avoid using them at all.
>
> Yes, _ismbblead does not work with UTF-8 and this is not an issue in our
> case. I believe isleadbyte will always return 0 with UTF-8, but I didn't test
> it.
>
> It is not a problem for me to recreate patches if yours are pushed first.
> What do you think about splitting misc/wcrtomb.c into multiple files just
> like you did with misc/mbrtowc.c?
>
> - Kirill Makurin
>
> [1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmbcp
> ________________________________
> From: Pali Rohár <[email protected]>
> Sent: Saturday, August 16, 2025 7:26 AM
> To: Kirill Makurin <[email protected]>
> Cc: mingw-w64-public <[email protected]>
> Subject: Re: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func /
> ___lc_codepage_func
>
> Hello!
>
> On Friday 15 August 2025 21:48:10 Kirill Makurin wrote:
> > Hi Pali,
> >
> > Some of patches in this series conflict with patches I have sent in `Fix
> > return value of mbrlen and mbrtowc`. In particular, splitting
> > implementation into three files and removing `mb_wc_common.h`.
>
> I can rebase my changes, that is no problem.
>
> > I like the idea of using `_ismbblead()` instead of `IsDBCSLeadByteEx` and
> > removing static `mbrtowc_cp`, but shouldn't it be isleadbyte[1] instead?
>
> That is a good question. I blindly chose _ismbblead because it is
> already used in crtexewin.c and did not think about it. Now I'm
> thinking, what is the difference between those two functions?
>
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/ismbblead-ismbblead-l
>
> Description is not very useful, for me it looks like that _ismbblead is
> limited to double byte encodings (so for 4-byte UTF-8 it does not work)
> but isleadbyte works for any encodings?
>
> Looking into ms files, they are defined as (after expanding macros):
>
> #define isleadbyte(c) (__pctype_func()[(unsigned char)(c)] & _LEADBYTE)
> #define _ismbblead(c) ((__p__mbctype()+1)[(unsigned char)(c)] & _M1)
>
> And seems that ucrt sets both _LEADBYTE and _M1 flags for
> CPINFO.LeadByte sequences returned from GetCPInfo().
>
> So I do not know which should be used, needs more investigation.
>
> mingw-w64's mbrtowc() seems to handle maximally mb_cur_max == 2, so it
> is also questionable UTF-8 support (cp=65001).
>
> > Replacing `_set_errno()` with `errno` also sounds like a good idea to me.
> >
> > Both usage of `IsDBCSLeadByteEx` and `_set_errno` came from my original
> > code on which I based this implementation for mingw-w64.
> >
> > - Kirill Makurin
> >
> > [1]
> > https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
> > ________________________________
> > From: Pali Rohár <[email protected]>
> > Sent: Saturday, August 16, 2025 6:28 AM
> > To: [email protected]
> > <[email protected]>
> > Cc: Martin Storsjö <[email protected]>; LIU Hao <[email protected]>; Kirill
> > Makurin <[email protected]>
> > Subject: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func /
> > ___lc_codepage_func
> >
> > Pali Rohár (9):
> >   crt: Provide emulation of ___lc_handle_func for msvcrt.dll and
> >     msvcrtd.dll
> >   crt: Improve support for ___lc_codepage_func() function
> >   crt: Remove internal mb_wc_common.h and replace it by locale.h usage
> >   crt: Remove static helper function mbrtowc_cp()
> >   crt: Move private state_mbrlen/state_mbrtowc/state_mbsrtowcs variables
> >     to corresponding functions
> >   crt: Replace IsDBCSLeadByteEx() by _ismbblead() in mbrtowc()
> >   crt: Use errno instead of _set_errno in mbrtowc
> >   crt: Split mbrtowc.c into 3 files mbrlen.c mbrtowc.c and mbsrtowcs.c
> >   crt: Use only mbstate_t in mbsrtowcs
> >
> >  mingw-w64-crt/Makefile.am                     |  12 +-
> >  mingw-w64-crt/lib-common/msvcrt.def.in        |   2 +-
> >  .../{mb_wc_common.h => ___lc_codepage_func.c} |  15 +-
> >  ...cale_func.c => ___lc_codepage_func_emul.c} |  64 ++-----
> >  .../{mb_wc_common.h => ___lc_handle_func.c}   |  17 +-
> >  mingw-w64-crt/misc/btowc.c                    |   2 +-
> >  .../misc/{mb_wc_common.h => mbrlen.c}         |  17 +-
> >  mingw-w64-crt/misc/mbrtowc.c                  | 147 ++-------------
> >  mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} | 174 +-----------------
> >  mingw-w64-crt/misc/mingw_wcstold.c            |   2 -
> >  mingw-w64-crt/misc/wcrtomb.c                  |   2 +-
> >  mingw-w64-crt/misc/wcstof.c                   |   2 -
> >  mingw-w64-crt/misc/wctob.c                    |   2 +-
> >  13 files changed, 95 insertions(+), 363 deletions(-)
> >  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_codepage_func.c} (41%)
> >  rename mingw-w64-crt/misc/{lc_locale_func.c => ___lc_codepage_func_emul.c} (15%)
> >  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_handle_func.c} (39%)
> >  rename mingw-w64-crt/misc/{mb_wc_common.h => mbrlen.c} (38%)
> >  copy mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} (30%)
> >
> > --
> > 2.20.1
> >

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
