On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cyg...@cygwin.com> wrote:
>
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
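For anyone who wants to see the quoted rule concretely, here is a minimal Python sketch of the standard UTF-16 surrogate-pair arithmetic. The function name to_surrogate_pair and the sample code point U+1F600 are my own choices for illustration; nothing here is taken from the Cygwin sources.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point beyond the base plane into UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not beyond the base plane")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
encoded = "\U0001F600".encode("utf-16-be")
print(encoded.hex())
```

Both surrogates fall inside the 0xd800 - 0xdfff range mentioned in the quote, and the same code point takes four bytes in UTF-8.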
I happened to run across a similar strangeness in Unicode earlier today. Does Cygwin cope with, or care about, Unicode normalization forms?

http://goo.gl/jnsqhC

For example, will open(2) cope with any UTF-8 form of a string that you could pass in UTF-16 encoding to CreateFile()?

You could imagine, say, a web app getting a string from a user, then using that to access a file on disk. A different browser given the “same” string could result in a different series of bytes being passed to the Cygwin POSIX layer.
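To make the concern concrete, here is a small Python sketch of how one visible file name can reach the file-system layer as two different byte sequences. The file name "café.txt" and the helper same_name are made up for illustration, and the sketch says nothing about what Cygwin or CreateFile() actually do with such names; it only shows why the question arises.

```python
import unicodedata

# The "same" name in two normalization forms (hypothetical example name).
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")  # precomposed U+00E9
name_nfd = unicodedata.normalize("NFD", name_nfc)         # 'e' + combining U+0301

# The strings render identically but differ in both encodings.
print(name_nfc.encode("utf-8"))
print(name_nfd.encode("utf-8"))
print(name_nfc.encode("utf-16-le") == name_nfd.encode("utf-16-le"))


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC.

    Hypothetical helper: one way an application could compare
    user-supplied names before touching the disk.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(name_nfc == name_nfd)           # raw comparison
print(same_name(name_nfc, name_nfd))  # comparison after normalization
```

An application that normalizes user input to a single form before calling open(2) sidesteps the ambiguity regardless of how the lower layers behave, at the cost of being unable to address a file whose on-disk name is stored in the other form.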