Re: Read non-UTF8 file

Nrgyzer Tue, 22 Feb 2011 08:55:26 -0800

== Quote from Stewart Gordon ([email protected])'s article
> What compiler version/platform are you using?  I had to fix some errors before it would
> compile on mine (1.066/2.051 Windows).
> On 19/02/2011 13:42, Nrgyzer wrote:
> <snip>
> > Now... and with writefln("%s", cast(ubyte[]) convertToUTF8(f.readLine())); I get the following:
> >
> > [195, 131, 164]
> > [195, 131, 182]
> > [195, 131, 188]
> It took a while for me to make sense of what's going on!
> The expressions (0xC0 | (ch >> 6)) and (0x80 | (ch & 0x3F)) both have type int.  It
> appears that, in D2, if you append an int to a string then it treats the int as a Unicode
> codepoint and automagically converts it to UTF-8.  But why is it doing it on the first
> byte and not the second?  This looks like a bug.
> Casting each UTF-8 byte value to a char
>      if (ch < 0x80) {
>          result ~= cast(char) ch;
>      } else {
>          result ~= cast(char) (0xC0 | (ch >> 6));
>          result ~= cast(char) (0x80 | (ch & 0x3F));
>      }
> gives the expected output
> [195, 164]
> [195, 182]
> [195, 188]
> HTH
> Stewart.


I was puzzled too, because I had used the same code in D1 and it worked without any
problems. Anyway... thanks :)
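
For reference, here is a minimal self-contained sketch of the corrected conversion Stewart describes. It assumes the input bytes are Latin-1 (ISO-8859-1), which is a guess based on the byte values above; the name latin1ToUTF8 and the sample input are hypothetical and not from the original program. The explicit cast(char) keeps each value a single UTF-8 code unit, whereas the broken output is consistent with 195 (0xC3) being treated as code point U+00C3 and encoded as the two bytes 195, 131.

```d
import std.stdio;

/// Hypothetical helper (not from the original program): encodes bytes that
/// are assumed to be Latin-1 (ISO-8859-1) as UTF-8.
string latin1ToUTF8(const(ubyte)[] input)
{
    char[] result;
    foreach (ubyte ch; input)
    {
        if (ch < 0x80)
        {
            // ASCII range: one byte, unchanged.
            result ~= cast(char) ch;
        }
        else
        {
            // 0x80..0xFF: two UTF-8 bytes. The cast to char appends each
            // value as a raw code unit instead of as an int code point.
            result ~= cast(char) (0xC0 | (ch >> 6));
            result ~= cast(char) (0x80 | (ch & 0x3F));
        }
    }
    return result.idup;
}

void main()
{
    // Sample input (an assumption): Latin-1 bytes for ä, ö, ü.
    ubyte[] input = [0xE4, 0xF6, 0xFC];
    writeln(cast(const(ubyte)[]) latin1ToUTF8(input));
    // prints [195, 164, 195, 182, 195, 188]
}
```

The two-byte branch only covers values up to 0xFF, which is all a single Latin-1 byte can hold; a general code point encoder would need the three- and four-byte forms as well.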
