On Jul 24 15:41, Thomas Wolff via Cygwin wrote:
> On 24.07.2025 at 12:30, Corinna Vinschen wrote:
> > What does that mean?  Consider this UTF8 input string:
> >
> >   0xf0 0x90 0x80 0x2e
> >
> >   mbstowcs:     returns -1
> >   sys_mbstowcs: f0f0 f090 f080 002e
> >
> > Let's convert it back to multibyte:
> >
> >   sys_wcstombs: 0xf0 0x90 0x80 0x2e
> >   wcstombs:     0xef 0x83 0xb0 0xef 0x82 0x90 0xef 0x82 0x80 0x2e
> >
> > So while sys_wcstombs has special code converting the string back to its
> > original MB string, wcstombs converts to the CESU-8 representation.
> >
> > This is transparent.  If we convert this CESU-8 string back to
> > wide-char, the resulting wide-char strings are the same:
> >
> >   mbstowcs:     f0f0 f090 f080 002e
> >   sys_mbstowcs: f0f0 f090 f080 002e
> >
> > So the question here is, shall we keep the special case converting
> > private use area bytes back to their original byte encoding?
> >
> > Or shall we simply go along with CESU-8 when converting back to multibyte
> > to keep the string the same as with wcstombs?
> >
> > Exempt from this are the characters not valid in a DOS filename.
> > These will always be converted if we create wide-char filenames.
> Sounds like a fair solution with only minor glitches. Poor 4th byte but
> thanks a lot anyway.
> About the latter decision, if there's no strong bias otherwise, I'd prefer
> to drop special handling (but don't take my vote, I don't care so much about
> that).

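For illustration, the two back-conversions quoted above can be reproduced
with a small Python sketch.  This is not the Cygwin code: the function
names are made up, and the rule that an undecodable byte 0xXX is stored as
U+F0XX (with only U+F080..U+F0FF mapped back to a single byte) is an
assumption read off the example values.

```python
# Wide-char values from the example: what sys_mbstowcs produced for
# the input bytes f0 90 80 2e.
WIDE = [0xF0F0, 0xF090, 0xF080, 0x002E]

# Assumption for this sketch: an undecodable byte 0xXX is stored as U+F0XX.
PUA_BASE = 0xF000


def to_utf8(wide):
    """Encode every code point as ordinary UTF-8 (the wcstombs result above)."""
    return "".join(chr(w) for w in wide).encode("utf-8")


def to_original_bytes(wide):
    """Map U+F080..U+F0FF back to one byte (the sys_wcstombs result above)."""
    out = bytearray()
    for w in wide:
        if PUA_BASE + 0x80 <= w <= PUA_BASE + 0xFF:
            # Hypothetical special case: restore the original byte.
            out.append(w - PUA_BASE)
        else:
            out += chr(w).encode("utf-8")
    return bytes(out)


def hexdump(data):
    """Format bytes in the same style as the example."""
    return " ".join(f"0x{b:02x}" for b in data)


print("plain encoding:  ", hexdump(to_utf8(WIDE)))
print("special handling:", hexdump(to_original_bytes(WIDE)))
```

The first printed line corresponds to the wcstombs output in the quoted
example, the second to the sys_wcstombs output.
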
Thanks for your input.  As another datapoint, we have to consider how
sys_wcstombs is used.

wcstombs on a filename will be used by the application only, and only if
the filename is incoming application-level data or has been converted to
wide-char by the application itself.

sys_wcstombs will be used to generate a readable multibyte filename from
UTF-16 filenames read from the filesystem.  So its major use in terms of
filenames is by readdir().

Knowing that, the question boils down to this:

Do we want readdir() to return the same name as given to open(), or is
CESU-8 sufficient?


Corinna