On Jul 24 15:41, Thomas Wolff via Cygwin wrote:
> Am 24.07.2025 um 12:30 schrieb Corinna Vinschen:
> > What does that mean?  Consider this UTF8 input string:
> > 
> >    0xf0 0x90 0x80 0x2e
> > 
> >    mbstowcs:     returns -1
> >    sys_mbstowcs: f0f0 f090 f080 002e
> > 
> > Let's convert it back to multibyte:
> > 
> >    sys_wcstombs: 0xf0 0x90 0x80 0x2e
> >    wcstombs:     0xef 0x83 0xb0 0xef 0x82 0x90 0xef 0x82 0x80 0x2e
> > 
> > So while sys_wcstombs has special code converting the string back to its
> > original MB string, wcstombs converts to the CESU-8 representation.
> > 
> > This is transparent.  If we convert this CESU-8 string back to
> > wide-char, the resulting wide-char strings are the same:
> > 
> >    mbstowcs:     f0f0 f090 f080 002e
> >    sys_mbstowcs: f0f0 f090 f080 002e
> > 
> > So the question here is, shall we keep the special case converting
> > private use area bytes back to their original byte encoding?
> > 
> > Or shall we simply go along with CESU-8 when converting back to multibyte
> > to keep the string the same as with wcstombs?
> > 
> > Exempt from this are the characters not valid in a DOS filename.
> > These will always be converted if we create wide-char filenames.
> Sounds like a fair solution with only minor glitches. Poor 4th byte but
> thanks a lot anyway.
> About the latter decision, if there's no strong bias otherwise, I'd prefer
> to drop special handling (but don't take my vote, I don't care so much about
> that).
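
For reference, the conversions in the quoted example can be reproduced with a short Python sketch.  It is not Cygwin code.  It assumes that sys_mbstowcs maps every byte that does not start a valid UTF-8 sequence to U+F000 plus the byte value, and that the sys_wcstombs special case turns U+F080..U+F0FF back into that byte.  The error handler name `pua-bytes`, the `sketch_*` functions and the `keep_raw` switch are made up for illustration.  Python strings hold code points, which for these BMP values are the same as the UTF-16 units shown above.  Python's own `surrogateescape` error handler uses the same idea, with U+DC80..U+DCFF instead of the private use area.

```python
import codecs


def _pua_bytes(exc):
    """Hypothetical decode-error handler: map the first offending byte to
    U+F000 + byte and resume decoding right after that byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    return chr(0xF000 + exc.object[exc.start]), exc.start + 1


codecs.register_error("pua-bytes", _pua_bytes)


def sketch_mbstowcs(raw: bytes) -> str:
    """Approximates sys_mbstowcs: invalid bytes end up in the PUA."""
    return raw.decode("utf-8", errors="pua-bytes")


def sketch_wcstombs(wide: str, keep_raw: bool) -> bytes:
    """keep_raw=True approximates the sys_wcstombs special case,
    keep_raw=False is plain UTF-8 as produced by wcstombs."""
    out = bytearray()
    for ch in wide:
        cp = ord(ch)
        if keep_raw and 0xF080 <= cp <= 0xF0FF:
            out.append(cp - 0xF000)      # restore the original raw byte
        else:
            out += ch.encode("utf-8")    # e.g. U+F0F0 -> ef 83 b0
    return bytes(out)


def units(s: str) -> str:
    return " ".join(f"{ord(c):04x}" for c in s)


def hexbytes(b: bytes) -> str:
    return " ".join(f"0x{x:02x}" for x in b)


raw = bytes([0xF0, 0x90, 0x80, 0x2E])
try:
    raw.decode("utf-8")                  # strict, like mbstowcs
except UnicodeDecodeError:
    print("strict decode fails")
wide = sketch_mbstowcs(raw)
print(units(wide))                                      # f0f0 f090 f080 002e
print(hexbytes(sketch_wcstombs(wide, keep_raw=True)))   # 0xf0 0x90 0x80 0x2e
reencoded = sketch_wcstombs(wide, keep_raw=False)
print(hexbytes(reencoded))  # 0xef 0x83 0xb0 0xef 0x82 0x90 0xef 0x82 0x80 0x2e
# Decoding the re-encoded form gives back the same wide-char string.
print(units(sketch_mbstowcs(reencoded)))                # f0f0 f090 f080 002e
```

The last line is the "transparent" property: both byte strings decode to identical wide-char strings, so only the multibyte side differs between the two policies.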

Thanks for your input.

As another data point, we have to consider how sys_wcstombs is used.

wcstombs is only called on a filename by the application itself, and only
if the filename is incoming application-level data or has been converted
to a wide-char string by the application.

sys_wcstombs is used to generate a readable multibyte filename from
UTF-16 filenames read from the filesystem.  So its major use in terms of
filenames is in readdir().

Knowing that, the question boils down to this:

Do we want readdir() to return the same name as given to open(), or is
CESU-8 sufficient?
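
The question can be illustrated with a hypothetical Python model of the round trip: open() turns the caller's bytes into a UTF-16 name on disk, and readdir() turns that name back into bytes via sys_wcstombs.  `name_on_disk` and `readdir_name` are made-up stand-ins, not the real Cygwin code paths, and they assume what is described above: invalid bytes become U+F000 plus the byte value, and the special case restores U+F080..U+F0FF to the original byte.

```python
import codecs


def _pua_bytes(exc):
    # Hypothetical: invalid byte b becomes U+F000 + b; resume after it.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    return chr(0xF000 + exc.object[exc.start]), exc.start + 1


codecs.register_error("pua-bytes", _pua_bytes)


def name_on_disk(name: bytes) -> str:
    """Model of the conversion done when open() creates the file."""
    return name.decode("utf-8", errors="pua-bytes")


def readdir_name(disk: str, keep_raw: bool) -> bytes:
    """Model of sys_wcstombs as used by readdir(), under either policy."""
    out = bytearray()
    for ch in disk:
        if keep_raw and 0xF080 <= ord(ch) <= 0xF0FF:
            out.append(ord(ch) - 0xF000)   # special case: original byte
        else:
            out += ch.encode("utf-8")      # plain UTF-8 of the PUA char
    return bytes(out)


created = b"\xf0\x90\x80."                 # name passed to open()
disk = name_on_disk(created)
for keep_raw in (True, False):
    listed = readdir_name(disk, keep_raw)
    # Is the listed name byte-identical, and does it reach the same file?
    print(keep_raw, listed == created, name_on_disk(listed) == disk)
```

With the special case the listed name is byte-identical to the one given to open().  Without it the bytes differ, but passing the listed name back to open() still reaches the same file, since it decodes to the same UTF-16 name.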


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple