Hi Joakim,

> On 20 Oct 2015, at 14:37, Joakim Erdfelt <joakim.erdf...@gmail.com> wrote:
>
> But we *think* we understand what you are trying to do.
>
> Here's a split UTF8 scenario (just whipped up)
> https://gist.github.com/joakime/e34b727a6989ca7cef94
>
> So the JVM implementation side will take the raw bytes (presumably as a
> ByteBuffer), and when the entire message is fully received it will convert it
> to a CharBuffer using the CharsetDecoder for UTF8 with REPORT logic to
> capture bad UTF8 sequences.
>
> Some concerns about this approach.
>
> 1. You can't fast-fail a large and fragmented TEXT message if the problematic
> UTF8 sequence occurs early (this is a spec test in the autobahn testsuite btw)
> 2. You can't use CharBuffer with partial TEXT message handling, as UTF8
> sequences that are split across Frames will trigger the REPORT processing.
> (see gist/example above for this scenario) (also a spec test in the autobahn
> testsuite)
> 3. For each TEXT message, there's 2 data copies (ByteBuffer -> HeapCharBuffer
> -> String) for it to be practical to use in many 3rd party libs (eg JSON
> parsing). For large messages, this can get expensive.
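To make concern 2 concrete, here is a minimal sketch (it does not reproduce the linked gist's code). The hypothetical helper `decodeFrameAlone` decodes each frame as if it were a whole message, using the one-shot `CharsetDecoder#decode(ByteBuffer)` with REPORT. Because that method treats its input as the end of the stream, a multi-byte UTF-8 sequence cut at a frame boundary is rejected even though the message as a whole is valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SplitUtf8Naive {
    // Hypothetical helper: decodes one frame in isolation, as if it were a whole message.
    public static String decodeFrameAlone(byte[] frame) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // decode(ByteBuffer) treats its input as the end of the stream, so a sequence
        // cut off at the frame boundary is reported as malformed input.
        return decoder.decode(ByteBuffer.wrap(frame)).toString();
    }

    public static void main(String[] args) {
        // "a", U+20AC EURO SIGN, "b" is 61 E2 82 AC 62 in UTF-8;
        // the 3-byte euro sign is split across two frames.
        byte[] frame1 = {0x61, (byte) 0xE2, (byte) 0x82};
        byte[] frame2 = {(byte) 0xAC, 0x62};
        for (byte[] frame : new byte[][] {frame1, frame2}) {
            try {
                System.out.println("decoded: " + decodeFrameAlone(frame));
            } catch (CharacterCodingException e) {
                System.out.println("rejected: " + e); // both frames end up here
            }
        }
    }
}
```

Decoding the five bytes together as one buffer succeeds; it is only the per-frame, end-of-input treatment that fails.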
Joakim,

If I tell you that

1. CharsetDecoder is a stateful object capable of incrementally decoding from any given ByteBuffer into any given CharBuffer. Have a look at the CharsetDecoder#decode(java.nio.ByteBuffer, java.nio.CharBuffer, boolean) method.
2. String#String(byte[], int, int, java.nio.charset.Charset) uses the same machinery underneath. The difference (among others) is that this constructor creates its own Buffer wrappers internally, whereas we can preallocate a bunch of CharBuffers and reuse them (speaking of performance).
3. Probably the quickest way for bytes from a Channel to end up as UTF-8-decoded chars is through a ByteBuffer, a CharsetDecoder and a CharBuffer (if you know a better way, please tell me).
4. So if, after all, one needs a String, the pipeline

   Channel --> ByteBuffer --> CharsetDecoder --> CharBuffer --> String

   would be the quickest way to get it.
5. Probably not everyone needs a String. For some users a CharSequence would do: consider appending it (or a subsequence of it) to a java.lang.Appendable, or just processing the IntStream from cs.chars(), etc.

Would that change any of your concerns? If not, please try to explain these problems again, in even more detail.
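A minimal sketch of points 1 and 5, assuming frames arrive as ByteBuffers together with a "last frame" flag (the class `IncrementalUtf8`, its `onFrame` method and the 16-char buffer size are hypothetical). One CharsetDecoder with REPORT is fed frame by frame through `decode(ByteBuffer, CharBuffer, boolean)`, so a malformed sequence fails as soon as the decoder sees the offending bytes, a sequence split across frames is completed when the next frame arrives, and a single preallocated CharBuffer is drained into a CharSequence sink and reused.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class IncrementalUtf8 {
    // One decoder per TEXT message; REPORT turns bad UTF-8 into an error result.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    // Preallocated, reused output buffer (16 chars is an arbitrary choice).
    private final CharBuffer out = CharBuffer.allocate(16);
    // Bytes of a multi-byte sequence left incomplete at the end of the previous frame.
    private ByteBuffer carry = ByteBuffer.allocate(0);
    // Decoded text; any Appendable would do (modulo its IOException).
    private final StringBuilder sink = new StringBuilder();

    public void onFrame(ByteBuffer frame, boolean last) throws CharacterCodingException {
        ByteBuffer in = frame;
        if (carry.hasRemaining()) {
            // Prepend the carried-over bytes so the split sequence can be completed.
            in = ByteBuffer.allocate(carry.remaining() + frame.remaining());
            in.put(carry).put(frame);
            in.flip();
        }
        while (true) {
            CoderResult result = decoder.decode(in, out, last);
            if (result.isError()) {
                result.throwException(); // fail fast on the first bad sequence
            }
            drain();
            if (result.isUnderflow()) {
                break; // everything decodable in this frame has been consumed
            }
            // OVERFLOW: out was full and has just been drained, so decode again.
        }
        // The decoder leaves an incomplete trailing sequence unconsumed; keep it.
        carry = ByteBuffer.allocate(in.remaining());
        carry.put(in);
        carry.flip();
        if (last) {
            while (decoder.flush(out).isOverflow()) {
                drain();
            }
            drain();
            decoder.reset(); // clear the decoder's internal state
        }
    }

    // Appends the decoded chars (a CharBuffer is a CharSequence) and reuses the buffer.
    private void drain() {
        out.flip();
        sink.append(out);
        out.clear();
    }

    public String text() {
        return sink.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        IncrementalUtf8 message = new IncrementalUtf8();
        // 61 E2 82 | AC 62: the 3-byte U+20AC EURO SIGN is split across two frames.
        message.onFrame(ByteBuffer.wrap(new byte[] {0x61, (byte) 0xE2, (byte) 0x82}), false);
        message.onFrame(ByteBuffer.wrap(new byte[] {(byte) 0xAC, 0x62}), true);
        System.out.println(message.text().equals("a\u20ACb")); // true
    }
}
```

Note that the decoder does not buffer the incomplete tail of a frame itself: per the `decode` contract it leaves those bytes unconsumed in the input buffer, and the caller has to preserve them for the next invocation, which is what `carry` does here.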