On Jul 22 05:38, Thomas Wolff via Cygwin wrote:
> On 27.06.2025 at 12:30, Corinna Vinschen via Cygwin wrote:
> > On Jun 26 19:07, Christian Franke via Cygwin wrote:
> > > With some trial and error I found a testcase for this more serious problem
> > > reported yesterday but not quoted above:
> > > 
> > > > > In cases like file3-... above, the converted Windows path ends with
> > > > > 0xF000. This suggests that this is an accidental conversion of the
> > > > > terminating null to the 0xF0xx range.
> > > > > 
> > > > > In some cases, the created Windows file name has random garbage
> > > > > behind the 0xF000. Then even Cygwin is not able to access or unlink
> > > > > the file after creation.
> > > Testcase (attached):
> > Thanks for the testcase!
> > 
> > I found the problem in the newlib core function creating wchar_t from
> > UTF-8 input.  In case of 4 byte UTF-8 sequences, the code created the
> > low surrogate already after reading byte 3, without checking if byte 4
> > of the UTF-8 sequence is a valid byte. Hilarity ensues.
> I'm afraid the fix may have broken mbrtowc as I just reported to the list,
> with a test case, thus also breaking mintty.
> The low surrogate MUST be created after byte 3 because otherwise the high
> surrogate cannot be delivered after byte 4 as it needs to.
> I think it's a drawback of UTF-16 that must be swallowed, even if some
> incorrect sequences slip through somehow.

Bummer.  What bugs me most is that you might be right here.  It's a bit
late, but we should have defined wchar_t as a 4-byte type back when we
worked on Cygwin 1.7.0... sigh.

mbrtowc() is inherently a bad fit for UTF-16.  It's a function which
only works fully correctly for the Unicode Basic Multilingual Plane,
or if wchar_t is big enough to hold any code point.
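
To illustrate why a 16-bit wchar_t is awkward here, this is a minimal
sketch (not newlib's code) of how one 4-byte UTF-8 sequence maps to two
UTF-16 units.  The helper name utf8_4byte_to_utf16 is hypothetical.
Because it sees all four bytes at once, it can validate every byte
before writing anything.  A streaming interface that returns one
wchar_t per call has to hand out the first unit before byte 4 has
arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: decode one 4-byte UTF-8
   sequence into a UTF-16 surrogate pair.  Returns 0 on success, -1 if
   the sequence is malformed.  Nothing is written to out[] until all
   four bytes have been checked.  */
static int
utf8_4byte_to_utf16 (const unsigned char s[4], uint16_t out[2])
{
  assert (s != NULL && out != NULL);

  /* The lead byte of a 4-byte sequence is 0xF0..0xF4.  */
  if (s[0] < 0xF0 || s[0] > 0xF4)
    return -1;
  /* Bytes 2..4 must all be continuation bytes (10xxxxxx).  */
  for (int i = 1; i < 4; i++)
    if ((s[i] & 0xC0) != 0x80)
      return -1;

  uint32_t cp = ((uint32_t) (s[0] & 0x07) << 18)
                | ((uint32_t) (s[1] & 0x3F) << 12)
                | ((uint32_t) (s[2] & 0x3F) << 6)
                | (uint32_t) (s[3] & 0x3F);
  /* Reject overlong forms and values beyond U+10FFFF.  */
  if (cp < 0x10000 || cp > 0x10FFFF)
    return -1;

  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 + (cp >> 10));   /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF)); /* low surrogate */
  return 0;
}
```

For example, the bytes F0 9F 98 80 (U+1F600) come out as the pair
0xD83D 0xDE00, while F0 9F 98 41 is rejected with out[] left untouched.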

That's the reason we avoid mbrtowc() where possible.  It's better to
call mbstowcs() or friends and allow room for at least 3 wchar_t
elements in the output buffer.  Is there any chance you could change
that in mintty?
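
A rough sketch of that approach, under a few assumptions: the helper
names use_utf8_locale and convert_short are hypothetical, the locale
names "C.UTF-8" and "en_US.UTF-8" are assumed to be available on the
system, and the buffer size of 3 is read here as room for a high
surrogate, a low surrogate, and the terminating L'\0'.  This is not
how mintty is structured, just the bare call pattern.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: try to select a UTF-8 locale for LC_CTYPE.
   Returns 1 on success, 0 if neither assumed locale name exists.  */
static int
use_utf8_locale (void)
{
  return setlocale (LC_CTYPE, "C.UTF-8") != NULL
         || setlocale (LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: convert a short NUL-terminated multibyte
   string in a single call.  Returns the number of wchar_t elements
   written (not counting the terminator), or (size_t) -1 if the input
   is not valid in the current locale.  */
static size_t
convert_short (const char *mb, wchar_t out[3])
{
  assert (mb != NULL && out != NULL);
  /* At most 3 elements are written: with a 16-bit wchar_t that is
     one surrogate pair plus the terminating L'\0'.  */
  return mbstowcs (out, mb, 3);
}
```

With a 16-bit wchar_t, convert_short ("\xF0\x9F\x98\x80", buf) would be
expected to return 2 in a UTF-8 locale; with a 32-bit wchar_t it
returns 1.  Either way the caller gets the whole character from one
call and never sees a half-converted sequence.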


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
