I've been going over the spec to clarify the finer points of how string and 
[]byte behave, and I think there may be an unnecessary degree of freedom that 
could be 
removed. Either that, or I missed a load-bearing statement that constrains 
implementations.

In https://go.dev/ref/spec#Conversions, `[]rune(str)` is specified as: 
"Converting a value of a string type to a slice of runes type yields a slice 
containing the individual Unicode code points of the string."

This does not specify the behavior if the string contains invalid UTF-8 byte 
sequences. If my reading is correct, a compliant implementation would be free 
to panic() on such a conversion, or implement the conversion in an arbitrary 
way of its choosing.

This is in contrast to for...range over a string, which strictly specifies how 
invalid UTF-8 byte sequences are handled. 
https://go.dev/ref/spec#For_statements says: "For a string value [...] If the 
iteration encounters an invalid UTF-8 sequence, the second value will be 
`0xFFFD`, the Unicode replacement character, and the next iteration will 
advance a single byte in the string." This is in line with current Unicode 
recommendations for input processing, and (IMO) is the only reasonable thing to 
do when decoding invalid UTF-8.
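
As a small illustration of the range rule quoted above, here is a minimal sketch that records each (byte index, rune) pair produced by ranging over a string containing two invalid bytes. The `step` type, the `rangeSteps` helper, and the sample string are illustrative names chosen for this example, not anything from the spec.

```go
package main

import "fmt"

// step is a hypothetical helper type: one iteration of for...range over a string.
type step struct {
	Index int  // byte offset at which the iteration started
	Rune  rune // decoded code point, or U+FFFD for an invalid sequence
}

// rangeSteps collects every (index, rune) pair yielded by ranging over s.
func rangeSteps(s string) []step {
	var out []step
	for i, r := range s {
		out = append(out, step{Index: i, Rune: r})
	}
	return out
}

func main() {
	// "a", then the bytes 0xff and 0xfe (never valid in UTF-8), then "b".
	s := "a\xff\xfeb"
	for _, st := range rangeSteps(s) {
		fmt.Printf("index %d: %U\n", st.Index, st.Rune)
	}
}
```

Per the spec text, each invalid byte should yield U+FFFD and advance the index by exactly one, so the indices here run 0, 1, 2, 3 with no bytes skipped.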

Empirically, the reference Go compiler does the sensible thing: string to 
[]rune conversions behave consistently with the ranged-for behavior. I haven't 
checked, but I presume that gccgo et al. do the same: they must implement the 
ranged-for behavior anyway, so doing something different for []rune conversion 
would be more work just to introduce gratuitous, surprising behavior.
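
One way to check the consistency claim empirically is a sketch like the following, which compares `[]rune(s)` against the runes collected by ranging over the same string. The helpers `runesViaRange` and `sameRunes` and the sample inputs are illustrative assumptions. An `equal=true` result only describes the toolchain it is run on, not what the spec requires.

```go
package main

import "fmt"

// runesViaRange decodes s using for...range, whose handling of invalid
// UTF-8 is spelled out in the spec.
func runesViaRange(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

// sameRunes reports whether a and b hold the same runes in the same order.
func sameRunes(a, b []rune) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// Sample inputs (assumed for illustration): valid text, a lone invalid
	// byte, a truncated three-byte sequence, and an overlong encoding.
	inputs := []string{"héllo", "a\xffb", "\xe2\x82", "\xc0\x80"}
	for _, s := range inputs {
		conv := []rune(s)
		ranged := runesViaRange(s)
		fmt.Printf("%q: conversion=%U range=%U equal=%t\n",
			s, conv, ranged, sameRunes(conv, ranged))
	}
}
```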

But, unless I missed a clarification in the spec, a contrarian implementation 
_could_ implement novel behavior for []rune conversion of invalid UTF-8. Did I 
miss anything?

If not, I'll file a proposal to spell out the required behavior in the spec, 
since I 
don't think there are any compatibility concerns or reasonable arguments for 
allowing []rune conversion alone to behave strangely in this respect.

- Dave
