Re: [go-nuts] Sanitising a UTF-8 string

2017-10-22 Thread Sam Whited
On Sun, Oct 22, 2017, at 09:29, Juliusz Chroboczek wrote:
> I'm probably missing something obvious, but I've looked through the
> standard library to no avail. How do I sanitise a []byte to make sure
> it's a UTF-8 string by replacing all incorrect sequences by the
> replacement character (or what

Re: [go-nuts] Sanitising a UTF-8 string

2017-10-22 Thread Jakob Borg
Converting a string to a slice of runes gives you the individual code points, with the replacement character substituted where necessary. Converting a slice of runes into a string gives you the UTF-8 representation. So sanitising a string should be as simple as string([]rune(someString)). This will be O(n)
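
A minimal sketch of the round trip described above, for readers who want to try it. The function name sanitise and the sample input are made up for illustration; only the conversions and utf8.ValidString come from the language and standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sanitise (hypothetical name) round-trips b through []rune.
// Decoding turns every invalid byte into U+FFFD; encoding the runes
// back produces a string that is valid UTF-8.
func sanitise(b []byte) string {
	return string([]rune(string(b)))
}

func main() {
	in := []byte("a\xffb") // 0xff can never appear in valid UTF-8
	out := sanitise(in)
	fmt.Printf("before: valid=%v\n", utf8.Valid(in))
	fmt.Printf("after:  valid=%v %q\n", utf8.ValidString(out), out)
}
```

Note that the intermediate []rune uses four bytes per code point, so this trades extra memory for brevity.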

Re: [go-nuts] Sanitising a UTF-8 string

2017-10-22 Thread andrey mirtchovski
See the section "For statements with range clause" in the spec: https://golang.org/ref/spec#For_statements "For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first
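
As a rough illustration of the range behaviour quoted from the spec, the loop below rebuilds a string one code point at a time. It is only a sketch: sanitiseRange is a hypothetical name, and strings.Builder assumes Go 1.10 or later (a bytes.Buffer would do the same job on older releases).

```go
package main

import (
	"fmt"
	"strings"
)

// sanitiseRange (hypothetical name) copies s rune by rune.
// Ranging over a string yields U+FFFD for each byte that does not
// start a valid UTF-8 sequence, and advances by a single byte.
func sanitiseRange(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // a hint only; U+FFFD is 3 bytes, so output may be longer
	for _, r := range s {
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	for i, r := range "a\xffb" {
		fmt.Printf("index %d: %U\n", i, r)
	}
	fmt.Printf("%q\n", sanitiseRange("a\xffb"))
}
```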

[go-nuts] Sanitising a UTF-8 string

2017-10-22 Thread Juliusz Chroboczek
I'm probably missing something obvious, but I've looked through the standard library to no avail. How do I sanitise a []byte to make sure it's a UTF-8 string by replacing all incorrect sequences by the replacement character (or whatever)? I've found unicode/utf8.Valid, which tells me if a []byte
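
For the []byte case asked about here, one possible approach is to check with utf8.Valid first and only rewrite when needed, decoding with utf8.DecodeRune. This is a sketch under assumptions, not something proposed in the thread: sanitiseBytes is a hypothetical name, and emitting one replacement character per invalid byte is just one of several reasonable policies.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sanitiseBytes (hypothetical name) returns b unchanged when it is
// already valid UTF-8; otherwise it returns a new slice in which each
// invalid byte is replaced by the encoding of U+FFFD.
func sanitiseBytes(b []byte) []byte {
	if utf8.Valid(b) {
		return b // fast path: nothing to fix, no allocation
	}
	out := make([]byte, 0, len(b))
	var buf [utf8.UTFMax]byte
	for len(b) > 0 {
		// For invalid input DecodeRune returns (utf8.RuneError, 1).
		r, size := utf8.DecodeRune(b)
		n := utf8.EncodeRune(buf[:], r)
		out = append(out, buf[:n]...)
		b = b[size:]
	}
	return out
}

func main() {
	fmt.Printf("%q\n", sanitiseBytes([]byte("a\xffb")))
}
```

Go 1.13 and later also ship bytes.ToValidUTF8 and strings.ToValidUTF8, which replace each run of invalid bytes with a caller-supplied replacement; they did not exist when this thread was written.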