On Jul 22 05:38, Thomas Wolff via Cygwin wrote:
> On 27.06.2025 at 12:30, Corinna Vinschen via Cygwin wrote:
> > On Jun 26 19:07, Christian Franke via Cygwin wrote:
> > > With some trial and error I found a testcase for this more serious problem
> > > reported yesterday but not quoted above:
> > >
> > > > > In cases like file3-... above, the converted Windows path ends with
> > > > > 0xF000. This suggests that this is an accidental conversion of the
> > > > > terminating null to the 0xF0xx range.
> > > > >
> > > > > In some cases, the created Windows file name has random garbage
> > > > > behind the 0xF000. Then even Cygwin is not able to access or unlink
> > > > > the file after creation.
> > > Testcase (attached):
> > Thanks for the testcase!
> >
> > I found the problem in the newlib core function creating wchar_t from
> > UTF-8 input. In case of 4 byte UTF-8 sequences, the code created the
> > low surrogate already after reading byte 3, without checking if byte 4
> > of the UTF-8 sequence is a valid byte. Hilarity ensues.
> I'm afraid the fix may have broken mbrtowc as I just reported to the list,
> with a test case, thus also breaking mintty.
> The low surrogate MUST be created after byte 3 because otherwise the high
> surrogate cannot be delivered after byte 4 as it needs to.
> I think it's a drawback of UTF-16 that must be swallowed, even if some
> incorrect sequences slip through somehow.
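For illustration only: the dilemma above goes away when a converter sees the whole 4-byte sequence and has room for two UTF-16 units, which is the situation with mbstowcs() and a large enough buffer. The helper below, utf8_to_utf16_one(), is hypothetical and is not newlib's code. It is a sketch that validates every continuation byte, including byte 4, before writing either surrogate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not newlib's implementation: decode one UTF-8
   sequence from s (at most n bytes) into UTF-16 code units.
   Returns the number of bytes consumed, or -1 if the sequence is
   invalid or truncated. *units receives 1 or 2 on success.
   Nothing is written to out until every byte has been validated. */
static int utf8_to_utf16_one(const unsigned char *s, size_t n,
                             uint16_t out[2], int *units)
{
    assert(s != NULL && out != NULL && units != NULL);
    if (n == 0)
        return -1;

    uint32_t cp;
    int len;
    unsigned char b0 = s[0];
    if (b0 < 0x80)                { cp = b0;        len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else
        return -1;                 /* invalid lead byte */

    if ((size_t)len > n)
        return -1;                 /* truncated sequence */

    /* Check ALL continuation bytes first, including byte 4. */
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, UTF-16 surrogate values, and > U+10FFFF. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return -1;

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                       /* BMP: one unit */
        *units = 1;
    } else {
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        *units = 2;
    }
    return len;
}
```

With F0 9F 98 80 (U+1F600) this yields the pair D83D DE00. With F0 9F 98 41, where byte 4 is not a continuation byte, it returns -1 and emits no surrogate at all. A byte-at-a-time mbrtowc() with a 2-byte wchar_t cannot defer its first output this way, which is the trade-off being discussed.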
Bummer. What bugs me most is that you might be right here.

It's a bit late, but we should have defined wchar_t as a 4-byte type back when we worked on Cygwin 1.7.0... sigh.

mbrtowc() is inherently a bad idea when it comes to UTF-16. It only works really correctly for the Unicode base plane (the BMP), or if wchar_t is big enough. That's the reason we avoid mbrtowc() where possible. It's better to call mbstowcs() or friends and allow at least 3 chars in the wchar_t buffer.

Any chance you could change that in mintty?


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple