Thanks. _ismbblead() should indeed not be used in mbrtowc(), which uses
the code page from setlocale(), so isleadbyte() is the right choice there.
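
To make the difference concrete, here is a toy model of the two lookups.
It is only a sketch: the toy_* names, the flag values and the tables are
hypothetical stand-ins, not the real CRT internals. It assumes CP932,
whose lead bytes are 0x81-0x9F and 0xE0-0xFC, and shows that the two
predicates disagree once the locale and multibyte code pages differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for _LEADBYTE and _M1; the values are arbitrary. */
#define TOY_LEADBYTE 0x8000
#define TOY_M1       0x04

/* Stands in for the table behind __pctype_func(), filled by setlocale(). */
static unsigned short toy_pctype[256];
/* Stands in for the table behind __p__mbctype(), filled by _setmbcp().
   It has one extra leading slot, hence the +1 in the lookup below. */
static unsigned char toy_mbctype[257];

/* CP932 lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static int toy_is_cp932_lead(unsigned c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Model of setlocale(): CP932 if use_cp932 != 0, else a single-byte page. */
static void toy_setlocale(int use_cp932)
{
    unsigned c;
    memset(toy_pctype, 0, sizeof(toy_pctype));
    for (c = 0; use_cp932 && c < 256; c++)
        if (toy_is_cp932_lead(c))
            toy_pctype[c] |= TOY_LEADBYTE;
}

/* Model of _setmbcp(): fills the other table, independently of the locale. */
static void toy_setmbcp(int use_cp932)
{
    unsigned c;
    memset(toy_mbctype, 0, sizeof(toy_mbctype));
    for (c = 0; use_cp932 && c < 256; c++)
        if (toy_is_cp932_lead(c))
            (toy_mbctype + 1)[c] |= TOY_M1;
}

/* Same shape as the expanded isleadbyte() macro. */
static int toy_isleadbyte(int c)
{
    return (toy_pctype[(unsigned char)c] & TOY_LEADBYTE) != 0;
}

/* Same shape as the expanded _ismbblead() macro. */
static int toy_ismbblead(int c)
{
    return ((toy_mbctype + 1)[(unsigned char)c] & TOY_M1) != 0;
}

/* Locale says CP932, multibyte code page says single-byte: they disagree. */
void toy_demo(void)
{
    toy_setlocale(1);
    toy_setmbcp(0);
    assert(toy_isleadbyte(0x81));
    assert(!toy_ismbblead(0x81));
}
```

In this model mbrtowc() follows toy_setlocale(), so only toy_isleadbyte()
is guaranteed to match the code page it converts from.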

Splitting the wcrtomb.c file is a good idea too.

On Friday 15 August 2025 22:38:28 Kirill Makurin wrote:
> The difference between isleadbyte (declared in ctype.h) and _ismbblead 
> (declared in mbctype.h) is that isleadbyte uses the code page set by a 
> call to setlocale(), while _ismbblead uses the code page set with 
> _setmbcp[1].
> 
> All functions declared in mbctype.h and mbstring.h use code page set with 
> _setmbcp() (not with setlocale), so we should avoid using them at all.
> 
> Yes, _ismbblead does not work with UTF-8, but this is not an issue in our 
> case. I believe isleadbyte will always return 0 with UTF-8, but I didn't 
> test it.
> 
> It is not a problem for me to recreate patches if yours are pushed first. 
> What do you think about splitting misc/wcrtomb.c into multiple files just 
> like you did with misc/mbrtowc.c?
> 
> - Kirill Makurin
> 
> [1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmbcp
> ________________________________
> From: Pali Rohár <[email protected]>
> Sent: Saturday, August 16, 2025 7:26 AM
> To: Kirill Makurin <[email protected]>
> Cc: mingw-w64-public <[email protected]>
> Subject: Re: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func / 
> ___lc_codepage_func
> 
> Hello!
> 
> On Friday 15 August 2025 21:48:10 Kirill Makurin wrote:
> > Hi Pali,
> >
> > Some of the patches in this series conflict with patches I have sent in 
> > `Fix return value of mbrlen and mbrtowc`. In particular, splitting the 
> > implementation into three files and removing `mb_wc_common.h`.
> 
> I can rebase my changes, that is no problem.
> 
> > I like the idea of using `_ismbblead()` instead of `IsDBCSLeadByteEx` and 
> > removing static `mbrtowc_cp`, but shouldn't it be isleadbyte[1] instead?
> 
> That is a good question. I blindly chose _ismbblead because it is
> already used in crtexewin.c and did not think about it. Now I'm
> thinking, what is the difference between those two functions?
> 
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/ismbblead-ismbblead-l
> 
> The descriptions are not very useful. To me it looks like _ismbblead is
> limited to double-byte encodings (so it does not work for 4-byte UTF-8),
> but isleadbyte works for any encoding?
> 
> Looking into ms files, they are defined as (after expanding macros):
> 
> #define isleadbyte(c) (__pctype_func()[(unsigned char)(c)] & _LEADBYTE)
> #define _ismbblead(c) ((__p__mbctype()+1)[(unsigned char)(c)] & _M1)
> 
> And it seems that ucrt sets both the _LEADBYTE and _M1 flags for the
> CPINFO.LeadByte ranges returned from GetCPInfo().
> 
> So I do not know which should be used, needs more investigation.
> 
> mingw-w64's mbrtowc() seems to handle at most mb_cur_max == 2, so its
> UTF-8 support (cp=65001) is also questionable.
> 
> > Replacing `_set_errno()` with `errno` also sounds like a good idea to me.
> >
> > Both usage of `IsDBCSLeadByteEx` and `_set_errno` came from my original 
> > code on which I based this implementation for mingw-w64.
> >
> > - Kirill Makurin
> >
> > [1] 
> > https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/isleadbyte-isleadbyte-l
> > ________________________________
> > From: Pali Rohár <[email protected]>
> > Sent: Saturday, August 16, 2025 6:28 AM
> > To: [email protected] 
> > <[email protected]>
> > Cc: Martin Storsjö <[email protected]>; LIU Hao <[email protected]>; Kirill 
> > Makurin <[email protected]>
> > Subject: [PATCH 0/9] crt: Improve mbrtowc and ___lc_handle_func / 
> > ___lc_codepage_func
> >
> > Pali Rohár (9):
> >   crt: Provide emulation of ___lc_handle_func for msvcrt.dll and
> >     msvcrtd.dll
> >   crt: Improve support for ___lc_codepage_func() function
> >   crt: Remove internal mb_wc_common.h and replace it by locale.h usage
> >   crt: Remove static helper function mbrtowc_cp()
> >   crt: Move private state_mbrlen/state_mbrtowc/state_mbsrtowcs variables
> >     to corresponding functions
> >   crt: Replace IsDBCSLeadByteEx() by _ismbblead() in mbrtowc()
> >   crt: Use errno instead of _set_errno in mbrtowc
> >   crt: Split mbrtowc.c into 3 files mbrlen.c mbrtowc.c and mbsrtowcs.c
> >   crt: Use only mbstate_t in mbsrtowcs
> >
> >  mingw-w64-crt/Makefile.am                     |  12 +-
> >  mingw-w64-crt/lib-common/msvcrt.def.in        |   2 +-
> >  .../{mb_wc_common.h => ___lc_codepage_func.c} |  15 +-
> >  ...cale_func.c => ___lc_codepage_func_emul.c} |  64 ++-----
> >  .../{mb_wc_common.h => ___lc_handle_func.c}   |  17 +-
> >  mingw-w64-crt/misc/btowc.c                    |   2 +-
> >  .../misc/{mb_wc_common.h => mbrlen.c}         |  17 +-
> >  mingw-w64-crt/misc/mbrtowc.c                  | 147 ++-------------
> >  mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} | 174 +-----------------
> >  mingw-w64-crt/misc/mingw_wcstold.c            |   2 -
> >  mingw-w64-crt/misc/wcrtomb.c                  |   2 +-
> >  mingw-w64-crt/misc/wcstof.c                   |   2 -
> >  mingw-w64-crt/misc/wctob.c                    |   2 +-
> >  13 files changed, 95 insertions(+), 363 deletions(-)
> >  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_codepage_func.c} (41%)
> >  rename mingw-w64-crt/misc/{lc_locale_func.c => ___lc_codepage_func_emul.c} 
> > (15%)
> >  copy mingw-w64-crt/misc/{mb_wc_common.h => ___lc_handle_func.c} (39%)
> >  rename mingw-w64-crt/misc/{mb_wc_common.h => mbrlen.c} (38%)
> >  copy mingw-w64-crt/misc/{mbrtowc.c => mbsrtowcs.c} (30%)
> >
> > --
> > 2.20.1
> >


_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
