Go strings are UTF-8 encoded, as others have mentioned. Each human-readable character in the string is really a cluster of one or more runes (Unicode code points). Some characters are made up of one rune, some of several, because certain runes combine with others to form a different character. Runes also don't have a fixed size in bytes: in UTF-8 a rune is encoded as anywhere from one to four bytes.
In your example, the character © is a single rune (U+00A9), which UTF-8 encodes as two bytes: 0xc2 followed by 0xa9.

On Tuesday, 28 February 2017 17:29:07 UTC, Fraser Hanson wrote:
>
> https://play.golang.org/p/05wZM9BhfB
>
> I'm working on some code that reads UTF32 and converts it to go strings.
> I'm finding some surprising behavior when casting slices of runes to
> strings.
>
> runes := []rune{'©'}
> fmt.Printf(" cast to string: (%s)\n", string(runes))
> fmt.Printf("bytes in string: (%x)\n", string(runes))
> Output:
>
> cast to string: (©)
> bytes in string: (c2a9) // <-- where's the C2 byte coming from??
>
>
> The weird part is that casting the rune slice to a string causes it to
> pick up an additional leading character.
>
> runes 0x00-0x7f get nothing prepended.
> runes 0x80-0xbf gets a leading c2 byte as seen above.
> runes 0xc0-0xff gets a leading c3 byte.
> rune 0x100 gets a leading c4 byte. Seems like a pattern here.
>
> The same thing happens if I add the runes into a bytes.Buffer with
> WriteRune(), then print it out with bytes.Buffer.String().
>
> Can anyone explain this?
> What's the correct way to convert a slice of runes into a string?
>
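On the pattern of leading bytes: it falls out of the UTF-8 two-byte layout, 110xxxxx 10xxxxxx, which covers code points U+0080 through U+07FF. The sketch below hand-encodes that layout and compares it with what string(r) produces. The function name encodeTwoByte is made up for this example, and it deliberately handles only the two-byte range. As far as I can tell, string(runes) is already the right conversion; the c2 is the lead byte of the encoding rather than an extra character.

```go
package main

import "fmt"

// encodeTwoByte is a hypothetical helper that hand-encodes a code point
// in the range U+0080..U+07FF using the two-byte UTF-8 layout.
// It is not valid for runes outside that range.
func encodeTwoByte(r rune) [2]byte {
	return [2]byte{
		0xC0 | byte(r>>6),   // lead byte: marker bits 110 plus the top 5 bits
		0x80 | byte(r&0x3F), // continuation byte: marker bits 10 plus the low 6 bits
	}
}

func main() {
	// The boundary values from the question.
	for _, r := range []rune{0x80, 0xA9, 0xBF, 0xC0, 0xFF, 0x100} {
		b := encodeTwoByte(r)
		fmt.Printf("U+%04X -> by hand: % x, string(r): % x\n", r, b[:], string(r))
	}

	// Converting back to []rune recovers the original value,
	// so nothing was added to the text itself.
	runes := []rune{'©'}
	s := string(runes)
	fmt.Println([]rune(s)[0] == runes[0], len(s))
}
```

The lead byte moves from c2 to c3 to c4 each time the code point crosses a multiple of 0x40, because only the low six bits fit in the continuation byte.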