On Jul 29 15:14, Jon Turney wrote:
> On 29/07/2022 12:58, Corinna Vinschen wrote:
> > Hi Jon,
> >
> > On Jul 29 11:01, Jon TURNEY via Cygwin-cvs wrote:
> > > https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=d4689b99c68628d9ec2fc1ac7884906ddbf6a2fc
> > >
> > > commit d4689b99c68628d9ec2fc1ac7884906ddbf6a2fc
> > > Author: Jon Turney <jon.tur...@dronecode.org.uk>
> > > Date: Thu May 19 17:27:39 2022 +0100
> > >
> > >     Cygwin: Set threadnames with SetThreadDescription()
> > > [...]
> > > +  /* SetThreadDescription only exists in a wide-char version, so we must
> > > +     convert threadname to wide-char.  The encoding of threadName is
> > > +     unclear, so use UTF8 until we know better. */
> > > +  int bufsize = MultiByteToWideChar (CP_UTF8, 0, threadName, -1, NULL, 0);
> > > +  WCHAR buf[bufsize];
> > > +  bufsize = MultiByteToWideChar (CP_UTF8, 0, threadName, -1, buf, bufsize);
> >
> > I think this is wrong.  The function should use stock mbstowcs instead
> > to get the externally used encoding.  Think of SetThreadName called with
> > program_invocation_short_name in pthread::thread_init_wrapper, or called
> > from pthread_setname_np with an externally provided thread name.  This
> > thread name will use the locale of the application code it's called by.
>
> I'm not sure.
>
> The linux manpage for pthread_setname_np() says "The thread name is a
> meaningful C language string", which I think means it's ASCII-encoded, not
> locale-encoded.

I think this only means, it's a NUL-terminated string.  "Meaningful" is
just trying to nudge developers into using meaningful names, not
something like "blurb".

> (The solaris manpage explicitly says that the thread name is utf8 encoded)

Ok, that's an interesting point.

> The encoding for program_invocation_short_name was also unclear to me.
> (It's the same as argv[0], so I guess it's in whatever encoding the
> filesystem uses, which doesn't have to match the process locale encoding)
>
> Expecting this function to work with non-ASCII names seems optimistic :)

Well, for Linux it's certainly just an arbitrary, NUL-terminated byte
stream, but yeah, it's certainly the only portable way to expect the
portable codeset.

Anyway, feel free to just keep the code as is.  We're typically using
UTF-8 anyway and people switching to one of the legacy codesets are
supposed to know what they are doing.

Corinna
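
For reference, the mbstowcs alternative discussed above could look
roughly like the sketch below.  This is not the committed code:
threadname_to_wide is a hypothetical helper name, and the sketch assumes
the POSIX behaviour that mbstowcs with a NULL destination returns the
required length.  It converts using whatever locale the calling process
has set, rather than a fixed CP_UTF8.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of Cygwin: convert a thread name from
   the current locale's multibyte encoding into a newly allocated wide
   string.  Returns NULL if the name is not valid in that encoding or
   if allocation fails.  The caller frees the result. */
wchar_t *
threadname_to_wide (const char *name)
{
  /* Assumption: POSIX semantics.  With a NULL destination, mbstowcs
     returns the number of wide characters needed, excluding the
     terminating NUL, or (size_t) -1 on an invalid sequence. */
  size_t len = mbstowcs (NULL, name, 0);
  if (len == (size_t) -1)
    return NULL;

  wchar_t *wname = malloc ((len + 1) * sizeof (wchar_t));
  if (!wname)
    return NULL;

  /* Second pass does the actual conversion, including the NUL. */
  if (mbstowcs (wname, name, len + 1) == (size_t) -1)
    {
      free (wname);
      return NULL;
    }
  return wname;
}
```

On Windows the resulting string could then be handed to
SetThreadDescription.  Whether the locale-based or the fixed UTF-8
conversion is preferable is exactly the open question in this thread;
the two only differ when the process runs under a non-UTF-8 codeset.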