== Quote from Stewart Gordon's article
> What compiler version/platform are you using? I had to fix some errors
> before it would compile on mine (1.066/2.051 Windows).
>
> On 19/02/2011 13:42, Nrgyzer wrote:
> <snip>
> > Now... and with writefln("%s", cast(ubyte[]) convertToUTF8(f.readLine()));
> > I get the following:
> >
> > [195, 131, 164]
> > [195, 131, 182]
> > [195, 131, 188]
>
> It took a while for me to make sense of what's going on!
>
> The expressions (0xC0 | (ch >> 6)) and (0x80 | (ch & 0x3F)) both have type
> int. It appears that, in D2, if you append an int to a string then it treats
> the int as a Unicode codepoint and automagically converts it to UTF-8. But
> why is it doing it on the first byte and not the second? This looks like a
> bug.
>
> Casting each UTF-8 byte value to a char
>
>     if (ch < 0x80) {
>         result ~= cast(char) ch;
>     } else {
>         result ~= cast(char) (0xC0 | (ch >> 6));
>         result ~= cast(char) (0x80 | (ch & 0x3F));
>     }
>
> gives the expected output
>
> [195, 164]
> [195, 182]
> [195, 188]
>
> HTH
>
> Stewart.

I was puzzled too, because I've used the same code in D1 and it worked without any problems. Anyway... thanks :)
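
For anyone finding this thread later, here is a minimal, self-contained sketch of the conversion with the casts applied. The full convertToUTF8 was never posted here, so the function name latin1ToUTF8 and its signature are made up for illustration, and it assumes the input holds one Latin-1 (ISO-8859-1) byte per character, which is consistent with the byte values quoted above. It also fits the odd first byte in the earlier output: 195 (0xC3) treated as a code point, U+00C3, encodes in UTF-8 as [195, 131].

```d
import std.stdio;

// Hypothetical stand-in for the convertToUTF8 discussed in the thread
// (the original function body is not shown there).
// Assumption: each input byte is one Latin-1 character (0x00 .. 0xFF).
string latin1ToUTF8(const(ubyte)[] input)
{
    string result;
    foreach (ubyte ch; input)
    {
        if (ch < 0x80)
        {
            // ASCII range: one byte, unchanged.
            result ~= cast(char) ch;
        }
        else
        {
            // 0x80 .. 0xFF: two UTF-8 bytes. The casts make sure a single
            // code unit is appended rather than an int-typed value.
            result ~= cast(char) (0xC0 | (ch >> 6));
            result ~= cast(char) (0x80 | (ch & 0x3F));
        }
    }
    return result;
}

void main()
{
    ubyte[] input = [0xE4, 0xF6, 0xFC]; // ä, ö, ü in Latin-1
    foreach (b; input)
        writeln(cast(const(ubyte)[]) latin1ToUTF8([b]));
    // prints:
    // [195, 164]
    // [195, 182]
    // [195, 188]
}
```

The two-byte branch is enough here because a Latin-1 byte never exceeds 0xFF; a general code point encoder would need the three- and four-byte cases as well.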
