The fragmentation can be done within the update(…) functions that call the 
intrinsified processBlocks(…) (in this case there are only two of those, with 
three call sites altogether). A more general solution would be a way to tell 
the JIT compiler, with an annotation similar to @IntrinsicCandidate: “use the 
intrinsic if there is one, starting from the first call; I am fairly sure it 
will be worth it even if this function is used only once”.
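
For illustration, a minimal sketch of that fragmentation, assuming a 
hypothetical chunk bound and a simplified signature (the real 
processBlocks(…) takes more parameters):

    // Hypothetical sketch: fragment large inputs inside update(...) so that
    // each call into the intrinsified processBlocks(...) covers a bounded
    // range. CHUNK and the simplified signature are assumptions.
    class FragmentingUpdate {
        private static final int CHUNK = 4096; // assumed bound, a multiple of the block size

        void update(byte[] input, int offset, int length) {
            while (length > 0) {
                int n = Math.min(length, CHUNK);
                processBlocks(input, offset, n); // the intrinsified routine
                offset += n;
                length -= n;
            }
        }

        private void processBlocks(byte[] input, int offset, int length) {
            // stand-in for the real intrinsified implementation
        }
    }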

Ferenc

From: security-dev <security-dev-r...@openjdk.org> on behalf of Carter Kozak 
<cko...@ckozak.net>
Date: Thursday, October 27, 2022 14:26
To: security-dev@openjdk.org <security-dev@openjdk.org>
Subject: Re: Clogged pipes: 50x throughput degradation with large Cipher writes
Thanks for your interest in the topic.

> While it might not be a problem in practice (large buffers are fine, but 
> larger than 1 MB seems rare, especially in multi-threaded apps), it is still 
> a condition which can be handled. But with AE ciphers becoming the norm, such 
> large cipher chunks seem to be legacy as well?

The linked benchmark is a good representation of the problem as we encountered 
it in a production instance. A compute job (as opposed to a typical 
multi-threaded server) took data that was already buffered (a relatively large 
on-heap byte array) and attempted to store it (to disk, S3, etc.), encrypting 
the data first. The filesystem is likely network-attached, but the encryption 
in this case is a separate concern. We use a CipherOutputStream to encrypt 
data as we write it to storage. The client code only sees an OutputStream, so 
it's not clear to the caller that inputs should be segmented into smaller 
chunks; indeed, it's often best to provide fewer, larger writes to a disk. The 
cipher interactions in this case are the result of the way other JDK 
components (namely CipherOutputStream) work. BufferedOutputStream is generally 
used to reduce Cipher interactions and avoid inefficiently small operations, 
but it passes large buffers straight through to the delegate, allowing 
multi-megabyte cipher operations.
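
A minimal sketch of the pattern, assuming an AES/GCM cipher and a local file 
as the destination (the key handling and buffer size are illustrative only):

    // One large write flows through BufferedOutputStream into
    // CipherOutputStream unsegmented, producing a single multi-megabyte
    // Cipher.update(...) call.
    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import java.io.BufferedOutputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class LargeCipherWrite {
        public static void main(String[] args) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key);

            byte[] alreadyBuffered = new byte[64 * 1024 * 1024]; // e.g. 64 MiB, already on heap

            try (OutputStream out = new BufferedOutputStream(
                    new CipherOutputStream(
                            Files.newOutputStream(Path.of("data.enc")), cipher))) {
                // The array exceeds BufferedOutputStream's internal buffer, so
                // it is handed to the CipherOutputStream in a single call
                // rather than being segmented.
                out.write(alreadyBuffered);
            }
        }
    }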

I could see an argument for CipherOutputStream becoming responsible for 
chunking, although that may not be ideal for native Cipher implementations, 
which aren't constrained in the same ways. Perhaps a Cipher instance should 
instead have the ability to recommend a maximum segment size to callers (e.g. 
CipherOutputStream would segment based on a recommendation from cipher 
instances).
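
As a sketch of what caller-side segmentation could look like today, assuming 
a hypothetical fixed segment size (under the proposed API, the Cipher itself 
would recommend this value):

    // Hypothetical wrapper that segments large writes before they reach a
    // CipherOutputStream. The 16 KiB segment size is an assumption.
    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    public class SegmentingOutputStream extends FilterOutputStream {
        private static final int MAX_SEGMENT = 16 * 1024; // assumed segment size

        public SegmentingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Forward the input in bounded segments so the delegate never
            // sees a multi-megabyte cipher operation.
            while (len > 0) {
                int n = Math.min(len, MAX_SEGMENT);
                out.write(b, off, n);
                off += n;
                len -= n;
            }
        }
    }

Wrapping the stream as new SegmentingOutputStream(new CipherOutputStream(out, 
cipher)) would then bound every cipher operation regardless of how the caller 
sizes its writes.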

> Can you clarify? You said JSSE; does this actually happen in TLS usage? How 
> big are your TLS records? Isn’t there a 16 KB limit anyway?

I'm sorry, I confused initialisms; I believe JCE is more accurate. I can test 
TLS, but that's not the scenario where this was problematic in production.

Carter Kozak
