I'm looking into adding a fast path case for encoding Strings into ByteBuffers, 
and wanted to get feedback on a possible approach.  My use case is taking 
mostly-ASCII, UTF-8 Strings and writing them to the disk/network.  To do this 
today, there are two approaches which both have drawbacks:

1.  Use String.getBytes(StandardCharsets.UTF_8), and pass the result to 
ByteBuffer.put().  The downside of this approach is that I need to make a 
copy of the String's byte[] value.  The upside is that ByteBuffer.put() uses 
the intrinsic copy methods, which are fast.

2.  Wrap the String in a CharBuffer, and call CharsetEncoder.encode(CharBuffer, 
ByteBuffer).  This avoids copying the String value.  However, when using the 
UTF_8 encoder, there is no fast path for writing to direct ByteBuffers.  
sun.nio.cs.UTF_8.encodeLoop() only has fast paths for when the destination is 
array-backed.  This approach allocates less memory, but is slower overall in 
my JMH benchmark.
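
For reference, the two approaches above can be sketched roughly as follows.  
This is a minimal illustration, not my benchmark code; the class and method 
names (EncodeApproaches, viaGetBytes, viaEncoder) are made up for the example, 
and the destination is assumed to be large enough.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class EncodeApproaches {
    // Approach 1: encode into a temporary byte[], then bulk-copy it into dst.
    // Throws BufferOverflowException if dst is too small.
    static void viaGetBytes(String s, ByteBuffer dst) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        dst.put(bytes);
    }

    // Approach 2: wrap the String (no copy) and let the encoder write into dst.
    // Returns the CoderResult so the caller can detect overflow.
    static CoderResult viaEncoder(String s, ByteBuffer dst) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        CoderResult r = enc.encode(src, dst, true);
        return r.isUnderflow() ? enc.flush(dst) : r;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // mostly ASCII, one 2-byte character
        ByteBuffer a = ByteBuffer.allocateDirect(64);
        ByteBuffer b = ByteBuffer.allocateDirect(64);
        viaGetBytes(s, a);
        CoderResult r = viaEncoder(s, b);
        System.out.println(a.position() + " " + b.position() + " " + r);
    }
}
```

Both methods should leave the same bytes in the destination; the difference 
is only in where the copying and the per-character work happen.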

To fix this, I looked at adding an overload to CharsetEncoder to accept a 
String (or a CharSequence), and a ByteBuffer as a destination.  However, this 
is not easily doable, since it's hard to call it in a loop.  In the case that 
the String overflows the BB, the caller needs to be able to provide a new BB 
and resume from where they left off.  The CharBuffer approach works here 
because it keeps the position last read, and can resume from there.  
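
To make the resume behaviour concrete, here is a rough sketch of the loop a 
caller can write today with a CharBuffer.  The names (ResumeEncode, 
encodeChunked, chunkSize) are hypothetical, and the chunk size is assumed to 
be at least 4 bytes so that any single code point fits in an empty buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ResumeEncode {
    // Encodes s into a list of direct buffers of chunkSize bytes each,
    // swapping in a fresh buffer whenever the encoder reports overflow.
    static List<ByteBuffer> encodeChunked(String s, int chunkSize) {
        if (chunkSize < 4) {
            throw new IllegalArgumentException("chunkSize must be >= 4");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s); // remembers how far we have read
        List<ByteBuffer> out = new ArrayList<>();
        ByteBuffer dst = ByteBuffer.allocateDirect(chunkSize);
        while (true) {
            CoderResult r = enc.encode(src, dst, true);
            if (r.isUnderflow()) {
                r = enc.flush(dst);
            }
            if (r.isUnderflow()) {
                break; // all input consumed and flushed
            }
            if (r.isOverflow()) {
                // dst is full: hand it off and resume from src.position().
                dst.flip();
                out.add(dst);
                dst = ByteBuffer.allocateDirect(chunkSize);
            } else {
                // Malformed input, e.g. an unpaired surrogate in the String.
                throw new IllegalArgumentException("cannot encode: " + r);
            }
        }
        dst.flip();
        out.add(dst);
        return out;
    }

    public static void main(String[] args) {
        int total = 0;
        for (ByteBuffer b : encodeChunked("hello world", 4)) {
            total += b.remaining();
        }
        System.out.println(total + " bytes written");
    }
}
```

The caller never tracks an index here; the CharBuffer's position does that 
job, which is exactly what a plain String argument cannot do.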

To encode a String, we need to know the index of the last character written, 
so that we can resume with a larger buffer.  However, the return type of 
CharsetEncoder's encode method is a CoderResult, and its length() method is 
only defined for error results (malformed or unmappable input); it can't be 
called for underflow or overflow.  This means that there isn't a usable return 
type here (neither int nor CoderResult can be used).
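
A small sketch of the CoderResult limitation, using only the public API.  The 
class and method names (CoderResultLength, lengthSupported, 
resumeIndexAfterEncode) are made up for the example.  It shows that length() 
throws for non-error results, and that today the resume index has to be read 
from the CharBuffer instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CoderResultLength {
    // True if r.length() can be called without throwing.
    static boolean lengthSupported(CoderResult r) {
        try {
            r.length();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Encodes as much of s as fits in dst and returns the char index at
    // which encoding stopped, taken from the wrapping CharBuffer.
    static int resumeIndexAfterEncode(String s, ByteBuffer dst) {
        CharBuffer src = CharBuffer.wrap(s);
        StandardCharsets.UTF_8.newEncoder().encode(src, dst, true);
        return src.position();
    }

    public static void main(String[] args) {
        System.out.println(lengthSupported(CoderResult.UNDERFLOW));
        System.out.println(lengthSupported(CoderResult.OVERFLOW));
        System.out.println(lengthSupported(CoderResult.malformedForLength(1)));
        System.out.println(
            resumeIndexAfterEncode("abcdef", ByteBuffer.allocateDirect(4)));
    }
}
```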

Another, almost-possible solution I was considering is adding a special case 
to UTF_8 for direct buffer destinations, and a corresponding JLA.encodeASCII 
overload that accepts a ByteBuffer.  The challenge here is that a wrapped 
CharBuffer doesn't have an array, and so doesn't get the fast path copying.
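
The "doesn't have an array" point can be checked with the public API alone.  
A minimal sketch (the class and method names are hypothetical): a CharBuffer 
wrapped around a String reports no accessible backing array, while one 
wrapped around a char[] does, at the cost of the copy made by toCharArray().

```java
import java.nio.CharBuffer;

// Hypothetical helper class, for illustration only.
public class WrappedHasArray {
    // CharBuffer.wrap(CharSequence) gives a read-only view with no array.
    static boolean wrappedStringHasArray(String s) {
        return CharBuffer.wrap(s).hasArray();
    }

    // CharBuffer.wrap(char[]) is array-backed, but needs a copy of the String.
    static boolean wrappedCharArrayHasArray(String s) {
        return CharBuffer.wrap(s.toCharArray()).hasArray();
    }

    public static void main(String[] args) {
        System.out.println(wrappedStringHasArray("hello"));
        System.out.println(wrappedCharArrayHasArray("hello"));
    }
}
```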

The reason I am reaching out here is that I am looking for feedback on my 
analysis of the existing API.  I am wondering what API compromises could be 
made to fast path writing Strings to direct buffers, which I feel is probably a 
common operation.  The only reasonable way I can see to implement this is a 
new return type, which seems undesirable as well.

Carl
