On Tue, Feb 28, 2017 at 9:05 AM, Fraser Hanson <fraser.han...@gmail.com> wrote:
> https://play.golang.org/p/05wZM9BhfB
>
> I'm working on some code that reads UTF32 and converts it to go strings.
> I'm finding some surprising behavior when casting slices of runes to
> strings.
>
>  runes := []rune{'©'}
>  fmt.Printf(" cast to string: (%s)\n", string(runes))
>  fmt.Printf("bytes in string: (%x)\n", string(runes))
> Output:
>
>  cast to string: (©)
> bytes in string: (c2a9) // <-- where's the C2 byte coming from??
>
>
> The weird part is that casting the rune slice to a string causes it to pick
> up an additional leading character.
>
> runes 0x00-0x7f get nothing prepended.
> runes 0x80-0xbf get a leading c2 byte as seen above.
> runes 0xc0-0xff get a leading c3 byte.
> rune 0x100 gets a leading c4 byte.  Seems like a pattern here.
>
> The same thing happens if I add the runes into a bytes.Buffer with
> WriteRune(), then print it out with bytes.Buffer.String().
>
> Can anyone explain this?
> What's the correct way to convert a slice of runes into a string?

When you convert []rune to string, the runes are encoded into UTF-8
and the resulting bytes are the contents of the string.  That is what
you are seeing: '©' is U+00A9, which UTF-8 encodes as the two bytes
C2 A9.  I don't know what you expect to see.

Ian
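The question mentions reading UTF-32, so here is a sketch of going from raw UTF-32 bytes to a Go string. It assumes little-endian input with no byte-order mark. `decodeUTF32LE` is a hypothetical helper, not a standard-library function. Each code point is checked with `utf8.ValidRune` and rejected with an error. This matters because a plain `string(runes)` conversion would silently turn surrogates or out-of-range values into U+FFFD.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf8"
)

// decodeUTF32LE converts little-endian UTF-32 bytes (no BOM assumed)
// into a Go string, rejecting values that are not valid code points.
func decodeUTF32LE(b []byte) (string, error) {
	if len(b)%4 != 0 {
		return "", errors.New("utf32: length is not a multiple of 4")
	}
	runes := make([]rune, 0, len(b)/4)
	for i := 0; i < len(b); i += 4 {
		r := rune(binary.LittleEndian.Uint32(b[i:]))
		// ValidRune rejects negatives, surrogates, and values > U+10FFFF.
		if !utf8.ValidRune(r) {
			return "", fmt.Errorf("utf32: invalid code point %#x at offset %d", uint32(r), i)
		}
		runes = append(runes, r)
	}
	// string(runes) encodes each rune as UTF-8.
	return string(runes), nil
}

func main() {
	in := []byte{0xa9, 0x00, 0x00, 0x00, 0x41, 0x00, 0x00, 0x00} // U+00A9, U+0041
	s, err := decodeUTF32LE(in)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s (% x)\n", s, s)
}
```

For big-endian input, binary.BigEndian.Uint32 can be used in place of binary.LittleEndian.Uint32.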

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
