On 10/18/2015 12:08 AM, Pavel Rappo wrote:
Hi Joakim,
On 17 Oct 2015, at 22:42, Joakim Erdfelt <joakim.erdf...@gmail.com> wrote:
You are required, per the RFC 6455 spec, to validate that incoming and outgoing
TEXT messages are valid UTF-8.
(also Handshake and Close Reason Messages)
http://tools.ietf.org/html/rfc6455#section-8.1
Relying on the JVM built-in replacement character behavior for invalid UTF-8
sequences will cause many bugs.
If you rely on the CharsetEncoder and CharBuffer you'll wind up with situations
where you are changing the data.
You need to rely on an implementation that does not use replacement characters
and throws exceptions on a bad write. On bad received TEXT messages you MUST
close the connection with a 1007 status code.
The only thing I was trying to say is that in my opinion there's no extra
confidence in UTF-8 representability that CharSequence or even String gives us
compared to what CharBuffer does. On the other hand, compared to any other
implementation of CharSequence or String, CharBuffer is the most
charset-friendly thing we have: CharsetEncoder/CharsetDecoder speaks in
CharBuffers.
Sorry, but I don't believe I proposed relying on the JDK's built-in replacement
characters. Moreover, being able to tell the decoder/encoder to throw exceptions
(e.g. UnmappableCharacterException) on incorrect input was one of the main
reasons to use CharsetEncoder/Decoder rather than, say,
String.getBytes(StandardCharsets.UTF_8).
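
For illustration, the difference between a reporting encoder/decoder and
String.getBytes can be sketched as below. This is a minimal sketch, not code
from the proposed API: the class name StrictUtf8 and its two helper methods are
hypothetical, and only standard java.nio.charset calls are used. With
CodingErrorAction.REPORT, bad input surfaces as a CharacterCodingException
instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class StrictUtf8 {
    private StrictUtf8() {}

    // Encodes text to UTF-8, throwing instead of substituting a replacement byte.
    static ByteBuffer encode(CharSequence text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return encoder.encode(CharBuffer.wrap(text));
    }

    // Decodes UTF-8 bytes, throwing instead of emitting U+FFFD.
    static CharBuffer decode(ByteBuffer bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(bytes);
    }

    public static void main(String[] args) {
        String loneSurrogate = "\uD800"; // not representable in UTF-8
        try {
            encode(loneSurrogate);
            System.out.println("unexpected: encoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict encoder rejected input: " + e);
        }
        // String.getBytes does not throw; it substitutes a replacement byte.
        byte[] lenient = loneSurrogate.getBytes(StandardCharsets.UTF_8);
        System.out.println("String.getBytes silently produced "
                + lenient.length + " byte(s)");
    }
}
```

A fresh encoder or decoder is created per call here because these objects are
stateful and not thread-safe; a real implementation would more likely keep one
per connection and reset it between messages.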
Thanks.
Hi,
Just to clear things up... The onText(..., CharBuffer cb, ...) callback
method receives a CharBuffer with content that is already UTF-8 decoded
from the wire message bytes, right? If it were otherwise, that would not be
right! So decoding is performed by the WebSocket implementation, not by the
user, and can therefore be performed per the RFC 6455 spec. CharBuffer,
CharSequence, String - those objects all represent characters, and their
APIs have nothing to do with UTF-8 or any other encoding.
Regards, Peter
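
The point that decoding happens inside the implementation, before the callback
sees any characters, can be sketched roughly as follows. Everything named here
(TextMessageDecoder, TextListener, deliver) is hypothetical and is not the API
under discussion. The sketch also assumes a complete, unfragmented TEXT message;
a real implementation would need to carry decoder state across frames.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical implementation-side helper, for illustration only.
final class TextMessageDecoder {
    // RFC 6455 section 7.4.1: data not consistent with the message type,
    // e.g. non-UTF-8 data within a text message.
    static final int CLOSE_INVALID_PAYLOAD = 1007;

    // Hypothetical stand-in for the user's onText callback.
    interface TextListener {
        void onText(CharBuffer text);
    }

    private TextMessageDecoder() {}

    // Strictly decodes a whole TEXT payload. Returns 0 if the listener was
    // called, or the close status code the connection should be closed with.
    static int deliver(ByteBuffer payload, TextListener listener) {
        CharBuffer text;
        try {
            text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(payload);
        } catch (CharacterCodingException e) {
            // The user never sees the bad payload; the caller closes with 1007.
            return CLOSE_INVALID_PAYLOAD;
        }
        listener.onText(text);
        return 0;
    }

    public static void main(String[] args) {
        byte[] good = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 0xC3, (byte) 0x28}; // invalid two-byte sequence
        System.out.println(deliver(ByteBuffer.wrap(good),
                t -> System.out.println("onText: " + t)));
        System.out.println(deliver(ByteBuffer.wrap(bad),
                t -> System.out.println("onText: " + t)));
    }
}
```

Under these assumptions the listener only ever receives characters, so whether
the parameter type is CharBuffer, CharSequence or String has no bearing on
UTF-8 validity.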