Re: String encoding to ByteBuffer

Brian Burkhalter Mon, 13 Mar 2023 16:16:36 -0700
Redirecting to nio-dev which is the more appropriate forum for this topic.

> On Feb 26, 2023, at 3:39 PM, Carl M <j...@rkive.org> wrote:
> 
> I'm looking into adding a fast path case for encoding Strings into 
> ByteBuffers, and wanted to get feedback on a possible approach.  My use case 
> is taking mostly-ASCII, UTF-8 Strings and writing them to the disk/network.  
> To do this today, there are two approaches which both have drawbacks:
> 
> 1.  Use String.getBytes(StandardCharsets.UTF_8), and call ByteBuffer.put().  
> The downside of this approach is that I need to make a copy of the String's 
> byte[] value.    The upside of this approach is that ByteBuffer uses the 
> intrinsic copy methods, which are fast.
> 
> 2.  Wrap the String in a CharBuffer, and call 
> CharsetEncoder.encode(CharBuffer, ByteBuffer).  This avoids copying the 
> String value.  However, when using the UTF_8 encoder, there is no fast path 
> for writing to direct ByteBuffers.  sun.nio.cs.UTF_8.encodeLoop() only has 
> fast paths for when the destination is array based.  Approach 2 allocates 
> less memory, but is overall slower in my JMH benchmark.
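
For readers following along, the two approaches described above might look roughly like this. This is only a sketch: the class and method names (EncodeApproaches, viaGetBytes, viaEncoder) are made up for illustration and do not come from the original message or from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodeApproaches {
    // Approach 1: materialize an intermediate byte[] and bulk-copy it.
    static void viaGetBytes(String s, ByteBuffer dst) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // extra copy
        dst.put(bytes); // throws BufferOverflowException if dst is too small
    }

    // Approach 2: wrap the String (no copy) and run the encoder directly.
    static CoderResult viaEncoder(String s, ByteBuffer dst) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        CoderResult r = enc.encode(src, dst, true);
        if (r.isUnderflow()) {
            r = enc.flush(dst);
        }
        return r; // OVERFLOW here means dst filled up before src was drained
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocateDirect(64);
        ByteBuffer b = ByteBuffer.allocateDirect(64);
        viaGetBytes("h\u00e9llo", a);
        viaEncoder("h\u00e9llo", b);
        System.out.println(a.position() + " bytes vs " + b.position() + " bytes");
    }
}
```

Both methods should leave the same bytes in the destination; the difference under discussion is the intermediate allocation versus the per-character encode loop.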
> 
> To fix this, I looked at adding an overload to CharsetEncoder to accept a 
> String (or a CharSequence), and a ByteBuffer as a destination.  However, this 
> is not easily doable, since it's hard to call it in a loop.  In the case that 
> the String overflows the BB, the caller needs to be able to provide a new BB 
> and resume from where they left off.  The CharBuffer approach works here 
> because it keeps the position last read, and can resume from there.  
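
The resume-after-overflow loop that the CharBuffer approach allows could be sketched as below. The names (ResumableEncode, encodeChunked, join) are hypothetical, and the fixed chunk size merely stands in for "the caller provides a new BB". The sketch assumes chunkSize >= 4 so that any single UTF-8 code point fits in an empty chunk.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class ResumableEncode {
    static List<ByteBuffer> encodeChunked(String s, int chunkSize) {
        if (chunkSize < 4) {
            throw new IllegalArgumentException("chunkSize must be >= 4");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s); // src.position() is the resume point
        List<ByteBuffer> out = new ArrayList<>();
        ByteBuffer dst = ByteBuffer.allocateDirect(chunkSize);
        while (true) {
            CoderResult r = enc.encode(src, dst, true);
            if (r.isUnderflow()) {
                r = enc.flush(dst);
            }
            if (r.isUnderflow()) {
                break; // all input consumed and flushed
            }
            if (r.isOverflow()) {
                dst.flip();
                out.add(dst);
                dst = ByteBuffer.allocateDirect(chunkSize); // fresh destination
            } else {
                // Malformed or unmappable input, e.g. an unpaired surrogate.
                throw new IllegalArgumentException("cannot encode input: " + r);
            }
        }
        dst.flip();
        out.add(dst);
        return out;
    }

    // Concatenates the readable bytes of each chunk without disturbing them.
    static byte[] join(List<ByteBuffer> chunks) {
        int total = 0;
        for (ByteBuffer b : chunks) {
            total += b.remaining();
        }
        byte[] all = new byte[total];
        int off = 0;
        for (ByteBuffer b : chunks) {
            int n = b.remaining();
            b.duplicate().get(all, off, n);
            off += n;
        }
        return all;
    }

    public static void main(String[] args) {
        List<ByteBuffer> chunks = encodeChunked("mostly ASCII, some caf\u00e9", 8);
        System.out.println(chunks.size() + " chunks, " + join(chunks).length + " bytes");
    }
}
```

The caller never tracks a character index itself; the CharBuffer carries that state between calls, which is exactly what a plain String argument cannot do.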
> 
> To encode a String directly, the caller needs to know the index of the last 
> character written in order to resume with a new buffer.  However, the return 
> type of CharsetEncoder's encode method is a CoderResult, and its length() 
> method can't be called for underflow cases.  This means that there isn't a 
> usable return type here (neither int nor CoderResult can be used).
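
A small sketch of the limitation described above: CoderResult.length() is documented to throw UnsupportedOperationException unless the result describes an error, so it cannot report progress, while the CharBuffer's position can. The class and method names (ResumeIndex, lengthIsCallable, charsConsumed) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class ResumeIndex {
    // True only when length() can be called without throwing.
    static boolean lengthIsCallable(CoderResult r) {
        try {
            r.length();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Encodes into a deliberately small direct buffer and reports how many
    // chars were consumed, read back from the CharBuffer's position.
    static int charsConsumed(String s, int capacity) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        ByteBuffer dst = ByteBuffer.allocateDirect(capacity);
        enc.encode(src, dst, true);
        return src.position();
    }

    public static void main(String[] args) {
        System.out.println("UNDERFLOW: " + lengthIsCallable(CoderResult.UNDERFLOW));
        System.out.println("OVERFLOW:  " + lengthIsCallable(CoderResult.OVERFLOW));
        System.out.println("malformed: " + lengthIsCallable(CoderResult.malformedForLength(1)));
        System.out.println("consumed:  " + charsConsumed("abcdef", 4));
    }
}
```

With a String as the source there is no position to read back, which is why the resume index would have to come out through the return value.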
> 
> Another, almost-possible solution I was considering is adding a special case 
> to UTF_8 for direct buffer destinations, and a corresponding 
> JLA.encodeASCII overload that accepts a ByteBuffer.  The challenge here is 
> that a wrapped CharBuffer doesn't have an array, and so doesn't get the 
> fast-path copying.
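
The "doesn't have an array" point can be checked with the public hasArray() method, as in the sketch below. The class name BackingArrays and its helpers are hypothetical; the internal fast-path selection inside sun.nio.cs.UTF_8 is not shown and is only assumed to depend on these answers.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical helper class, for illustration only.
final class BackingArrays {
    // A CharBuffer wrapping a String has no accessible char[].
    static boolean wrappedStringHasArray(String s) {
        return CharBuffer.wrap(s).hasArray();
    }

    // A CharBuffer wrapping a char[] exposes that array.
    static boolean wrappedCharsHaveArray(char[] chars) {
        return CharBuffer.wrap(chars).hasArray();
    }

    // Direct ByteBuffers have no accessible byte[].
    static boolean directHasArray(int capacity) {
        return ByteBuffer.allocateDirect(capacity).hasArray();
    }

    // Heap ByteBuffers from allocate() are array backed.
    static boolean heapHasArray(int capacity) {
        return ByteBuffer.allocate(capacity).hasArray();
    }

    public static void main(String[] args) {
        System.out.println("wrapped String: " + wrappedStringHasArray("abc"));
        System.out.println("wrapped char[]: " + wrappedCharsHaveArray("abc".toCharArray()));
        System.out.println("direct buffer:  " + directHasArray(16));
        System.out.println("heap buffer:    " + heapHasArray(16));
    }
}
```

So in the String-to-direct-buffer case neither side offers an array, and code that keys on hasArray() falls through to the general buffer loop.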
> 
> The reason I am reaching out here is that I am looking for feedback on my 
> analysis of the existing API.  I am wondering what API compromises could be 
> made to fast path writing Strings to direct buffers, which I feel is probably 
> a common operation.  The only reasonable way I can see to implement this is 
> a new return type, which seems undesirable as well.
> 
> Carl