On Thu, 23 Nov 2023 10:30:52 GMT, Daniel Jeliński <[email protected]> wrote:
>> Hi,
>>
>> I need a review for a new internal buffer class called AEADBufferStream.
>> AEADBufferStream extends ByteArrayOutputStream, but eliminates some data
>> checking and copying that are not necessary for what GaloisCounterMode.java
>> and ChaCha20Cipher.java need.
>>
>> The changes greatest benefit is with decryption operations.
>> ChaCha20-Poly1305 had larger performance gains by adopting similar
>> techniques that AES/GCM already uses.
>>
>> The new buffer shows up to 21% bytes/sec performance increase for decryption
>> for ChaCha20-Poly1305 and 12% for AES/GCM. 16K data sizes saw a memory
>> usage reduction of 46% with and 83% with ChaCha20-Poly1305. These results
>> come from the JMH tests updated in this request and memory usage using the
>> JMH gc profile gc.alloc.rate.norm entry
>>
>> thanks
>>
>> Tony
>
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java
> line 45:
>
>> 43: * Create an instance with no buffer
>> 44: */
>> 45: public AEADBufferedStream() {
>
> Never used. And once you remove it, `buf` is never null
Sure
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java
> line 55:
>
>> 53:
>> 54: public AEADBufferedStream(int len) {
>> 55: buf = new byte[len];
>
> Suggestion:
>
> super(len);
>
> otherwise buf will be initialized twice, once here, once in the base class
> constructor
Yeah.. better memory performance too.
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java
> line 56:
>
>> 54: public AEADBufferedStream(int len) {
>> 55: buf = new byte[len];
>> 56: count = 0;
>
> no need to zero-initialize, it's zero by default
Yes
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java
> line 67:
>
>> 65: */
>> 66: @Override
>> 67: public synchronized byte[] toByteArray() {
>
> remove this `synchronized`; the new `write` methods are not synchronized, so
> this does not add value.
Interesting, I thought I was stuck with `synchronized` from the parent
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java
> line 68:
>
>> 66: @Override
>> 67: public synchronized byte[] toByteArray() {
>> 68: if (buf.length > count) {
>
> Can this ever happen? And if it can, would it be better to return the whole
> buffer and have the caller extract the necessary range?
Yes, because it can reuse the allocated buffer during encryption scenarios when
multi-part ops send non-block size data. For decryption, it should never
happen.
> src/java.base/share/classes/com/sun/crypto/provider/AEADBufferedStream.java
> line 94:
>
>> 92: } else {
>> 93: if (buf.length < (count + len)) {
>> 94: buf = Arrays.copyOf(buf, count + len);
>
> this will copy a lot if the user performs many small writes, for example when
> decrypting with CipherInputStream; see `AESGCMCipherOutputStream` benchmark.
As I understand the `ByteArrayOutputStream` code, the
`ArraysSupport.newLength()` will double the allocation each time. So if the
buffer is 1k size and it wants to add one more byte, the method will allocate
2k.
I agree that in my change, if you send one byte at a time, it will be doing a
lot of allocating. But I don't believe that is the general case. If you have
examples of apps doing small allocations, let me know and I can come up with a
different scheme. But at this point I thought it was a bitter risk more
allocations in the less-likely situation, than unused allocation in what I see
as a more common case.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 699:
>
>> 697: * @throws ShortBufferException if the buffer {@code out} does not
>> have
>> 698: * enough space to hold the resulting data.
>> 699: */ @Override
>
> Suggestion:
>
> */
> @Override
Yep
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 792:
>
>> 790:
>> 791: /*
>> 792: * Optimized version of bufferCrypt from CipherSpi.java. Direct
>
> Can you document the optimizations done in this function?
The second sentence says what the optimizations is.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 844:
>
>> 842: total = engine.doFinal(inArray, inOfs, inLen,
>> outArray, outOfs);
>> 843: }
>> 844: } catch (BadPaddingException | KeyException e) {
>
> Preexisting, and probably out of scope for this PR, but we should expose the
> "Counter exhausted" exception in javax.crypto.Cipher; filed
> [JDK-8320743](https://bugs.openjdk.org/browse/JDK-8320743) for that
Certainly out of scope here.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 1427:
>
>> 1425: input.get(in);
>> 1426: byte[] out = new byte[in.length];
>> 1427: doUpdate(in, 0, in.length, out, out.length);
>
> Suggestion:
>
> byte[] out = in;
> doUpdate(in, 0, in.length, out, 0);
>
> I guess we need more test coverage here
I don't see a testing issue there, but that's better memory usage. I probably
copied this code over from AES/GCM where it's blocksized data and `in` and
`out` could have been different sizes. But CC20 can use this optimization
because it's a streaming cipher.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 1505:
>
>> 1503: byte[] in = new byte[input.remaining()];
>> 1504: input.get(in);
>> 1505: byte[] out = new byte[in.length];
>
> Suggestion:
>
> byte[] out = in;
yep
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 1516:
>
>> 1514: input.get(in);
>> 1515: byte[] out = new byte[in.length + TAG_LENGTH];
>> 1516: doFinal(in, 0, in.length, out, 0);
>
> A bit more complex, but you can avoid one allocation here too; something like
> (untested):
> Suggestion:
>
> int inLen = input.remaining();
> byte[] out = new byte[inLen + TAG_LENGTH];
> byte[] in = out;
> input.get(in, 0, inLen);
> doFinal(in, 0, inLen, out, 0);
I will have to check on this, I've run into `BufferUnderflowException` too many
times with ByteBuffer to say for sure at this point.
> src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java line
> 1652:
>
>> 1650: int ctLen = getBufferedLength() + inLen;
>> 1651: if (ctLen == 0) {
>> 1652: throw new AEADBadTagException("Tag mismatch");
>
> Suggestion:
>
> if (ctLen < TAG_LENGTH) {
> throw new AEADBadTagException("Input too short - need tag");
Yep
> test/micro/org/openjdk/bench/javax/crypto/full/AESGCMBench.java line 43:
>
>> 41: AlgorithmParameterSpec getNewSpec() {
>> 42: iv_index = (iv_index + 1) % IV_MODULO;
>> 43: return new GCMParameterSpec(96, iv, iv_index, 12);
>
> Can you also change tag length to 128 bits? TLS uses 128, and 128-bit tag
> generates a slightly different flamegraph.
I'll see if I can do it cleanly. GCM spec defaults to 96bit and because
CC20P1305 requires 96bit, it made the common code easier. I'm surprised you any
differences which such a minor change.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946509
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946538
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968884
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946567
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946612
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946637
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946687
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946735
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406946776
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947706
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406947296
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406953636
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968978
PR Review Comment: https://git.openjdk.org/jdk/pull/16487#discussion_r1406968180