You know that CharBuffer doesn't actually do UTF-8, right? It's just a ByteBuffer split into equal 2-byte segments. CharBuffer is a way to obtain chars (the 2-byte numbers, not characters) from the ByteBuffer or String you created it from. Functionally, CharBuffer is no different from ShortBuffer.
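To make that concrete, here is a minimal sketch (not taken from any WebSocket implementation; the class and method names are made up for illustration). It views four raw bytes as a CharBuffer, which happily holds an unpaired surrogate, and then shows that only a CharsetEncoder set to REPORT notices the problem, while String.getBytes quietly substitutes '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class CharBufferCodeUnits {

    // Views raw bytes as 16-bit code units (big-endian by default).
    // No charset is involved at any point.
    static CharBuffer viewAsChars(byte[] raw) {
        return ByteBuffer.wrap(raw).asCharBuffer();
    }

    // Strict UTF-8 encoding: reports errors instead of replacing bad input.
    static ByteBuffer encodeStrict(CharBuffer chars) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars.duplicate()); // duplicate() leaves the caller's position alone
    }

    // True if the code units form text that UTF-8 can represent.
    static boolean isEncodableAsUtf8(CharBuffer chars) {
        try {
            encodeStrict(chars);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0x0041 is 'A'; 0xD800 is a high surrogate with no partner.
        CharBuffer view = viewAsChars(new byte[] {0x00, 0x41, (byte) 0xD8, 0x00});
        System.out.println("encodable: " + isEncodableAsUtf8(view));

        // The lenient path changes the data instead of failing.
        byte[] lenient = "\uD800".getBytes(StandardCharsets.UTF_8);
        System.out.println("lenient first byte: " + (char) lenient[0]);
    }
}
```

The point of the sketch is only that the buffer type itself gives no guarantee; whatever validation happens has to come from an encoder or decoder configured to report errors.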
On Tue, Oct 20, 2015 at 1:39 AM, Peter Levart <peter.lev...@gmail.com> wrote:

> On 10/18/2015 12:08 AM, Pavel Rappo wrote:
>
>> Hi Joakim,
>>
>> On 17 Oct 2015, at 22:42, Joakim Erdfelt <joakim.erdf...@gmail.com> wrote:
>>>
>>> You are required, per the RFC6455 spec, to validate incoming and
>>> outgoing TEXT messages are valid UTF8.
>>> (also Handshake and Close Reason Messages)
>>>
>>> http://tools.ietf.org/html/rfc6455#section-8.1
>>>
>>> Relying on the JVM built-in replacement character behavior for invalid
>>> UTF8 sequences will cause many bugs.
>>> If you rely on the CharsetEncoder and CharBuffer you'll wind up with
>>> situations where you are changing the data.
>>>
>>> You need to rely on an implementation that does not use replacement
>>> characters and throws exceptions on bad Write,
>>> and on bad received TEXT messages you MUST close the connection with a
>>> 1007 error code.
>>>
>> The only thing I was trying to say is that in my opinion there's no extra
>> confidence in UTF-8 representability that CharSequence or even String
>> gives us compared to what CharBuffer does. On the other hand, compared to
>> any other implementation of CharSequence or String, CharBuffer is the most
>> charset-friendly thing we have: CharsetEncoder/CharsetDecoder speaks in
>> CharBuffers.
>>
>> Sorry, but I believe I haven't proposed to rely on JDK built-in
>> replacement characters. Moreover, being able to tell the decoder/encoder
>> to throw exceptions (e.g. UnmappableCharacterException) on incorrect input
>> was one of the main reasons to use CharsetEncoder/Decoder. And not, say,
>> String.getBytes(StandardCharsets.UTF_8).
>>
>> Thanks.
>>
>
> Hi,
>
> Just to clear things... The onText(..., CharBuffer cb, ...) call-back
> method receives a CharBuffer with content that is already UTF-8 decoded
> from wire message bytes, right? If it was different, it would not be right!
> So decoding is performed by WebSocket implementation, not by user and
> therefore can be performed per RFC6455 spec. CharBuffer, CharSequence,
> String - those object all represent characters and their API has nothing to
> do with UTF-8 or any other encoding.
>
> Regards, Peter
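
For reference, the strict decoding discussed in the quoted thread could look roughly like the sketch below. It is an assumption-laden illustration, not the actual WebSocket implementation: the class, method, and constant names are hypothetical, and it decodes a complete payload in one go (a fragmented TEXT message would need a streaming decoder that carries state between frames).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of any real WebSocket API.
class StrictTextDecoding {

    // RFC 6455 section 7.4.1: data not consistent with the message type,
    // e.g. non-UTF-8 data within a text message.
    static final int CLOSE_INVALID_PAYLOAD = 1007;

    // Returns the decoded text, or null if the payload is not valid UTF-8.
    // A null result is where an implementation would close with 1007
    // rather than hand replacement characters to the user.
    static String decodeTextPayload(byte[] payload) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(payload))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xC3, (byte) 0xA9};   // U+00E9 encoded as UTF-8
        byte[] invalid = {(byte) 0xC3, (byte) 0x28}; // lead byte followed by a non-continuation byte
        System.out.println("valid decodes: " + (decodeTextPayload(valid) != null));
        if (decodeTextPayload(invalid) == null) {
            System.out.println("would close with " + CLOSE_INVALID_PAYLOAD);
        }
    }
}
```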