Hi Joakim,

> On 17 Oct 2015, at 22:42, Joakim Erdfelt <joakim.erdf...@gmail.com> wrote:
>
> You are required, per the RFC6455 spec, to validate incoming and outgoing
> TEXT messages are valid UTF8.
> (also Handshake and Close Reason Messages)
>
> http://tools.ietf.org/html/rfc6455#section-8.1
>
> Relying on the JVM built-in replacement character behavior for invalid UTF8
> sequences will cause many bugs.
> If you rely on the CharsetEncoder and CharBuffer you'll wind up with
> situations where you are changing the data.
>
> You need to rely on an implementation that does not use replacement
> characters and throws exceptions on bad Write,
> and on bad received TEXT messages you MUST close the connection with a 1007
> error code.
The only thing I was trying to say is that, in my opinion, CharSequence or even String gives us no extra confidence in UTF-8 representability compared to what CharBuffer does. On the other hand, compared to any other implementation of CharSequence, or to String, CharBuffer is the most charset-friendly thing we have: CharsetEncoder/CharsetDecoder speak in CharBuffers.

Sorry, but I believe I haven't proposed relying on the JDK's built-in replacement characters. On the contrary, being able to tell the decoder/encoder to throw exceptions (e.g. UnmappableCharacterException) on incorrect input was one of the main reasons to use CharsetEncoder/CharsetDecoder rather than, say, String.getBytes(StandardCharsets.UTF_8).

Thanks.
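To illustrate the distinction (this is my own minimal sketch, not code from either implementation; the class and method names are made up): a CharsetDecoder/CharsetEncoder configured with CodingErrorAction.REPORT throws a CharacterCodingException on bad input, whereas the String-based conveniences silently substitute replacement characters and change the data:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Sketch {

    // Hypothetical helper: decode UTF-8 bytes, throwing instead of
    // substituting U+FFFD. A WebSocket impl would catch the exception
    // and close the connection with status 1007.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Hypothetical helper for the outgoing side: encode a CharBuffer,
    // throwing on unencodable input (e.g. an unpaired surrogate).
    static ByteBuffer encodeStrict(CharBuffer chars) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
    }

    public static void main(String[] args) {
        byte[] bad = {(byte) 0xC3, (byte) 0x28}; // invalid UTF-8 sequence

        // Lenient path: invalid bytes silently become U+FFFD — data changed.
        System.out.println("lenient decode: " + new String(bad, StandardCharsets.UTF_8));

        // Strict path: the decoder reports the error instead.
        try {
            decodeStrict(bad);
            System.out.println("strict decode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("strict decode rejected: " + e.getClass().getSimpleName());
        }

        // Outgoing side: an unpaired surrogate is rejected rather than
        // silently replaced as String.getBytes(UTF_8) would do.
        try {
            encodeStrict(CharBuffer.wrap("\uD800"));
            System.out.println("strict encode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("strict encode rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Note that Charset.decode/encode and the String conversions use CodingErrorAction.REPLACE internally, which is exactly the replacement-character behavior being warned about; only a hand-configured CharsetDecoder/CharsetEncoder gives you REPORT.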