Can you create a bug in JBS (if there isn't one already)? In general
there are a lot of issues with java.io.PipedXXX, and periodic
discussions on whether to just deprecate these JDK 1.0/1.1 classes.
-Alan
On 11/03/2026 12:08, wenshao wrote:
Hi,
I found an integer overflow bug in String.encodedLengthUTF8() where
the LATIN1 code path uses an int accumulator without overflow check,
while the UTF16 path correctly uses long.
The bug:
LATIN1 path (lines 1499-1517):
    private static int encodedLengthUTF8(byte coder, byte[] val) {
        if (coder == UTF16) {
            return encodedLengthUTF8_UTF16(val, null); // ← uses long dp, has overflow check
        }
        int positives = StringCoding.countPositives(val, 0, val.length);
        if (positives == val.length) {
            return positives;
        }
        int dp = positives; // ← int, no overflow protection
        for (int i = dp; i < val.length; i++) {
            if (val[i] < 0) dp += 2;
            else dp++;
        }
        return dp; // ← may have overflowed
    }
UTF16 path (encodedLengthUTF8_UTF16, lines 1596-1642):
    long dp = 0L; // ← long
    ...
    if (dp > (long)Integer.MAX_VALUE) { // ← overflow check
        throw new OutOfMemoryError("Required length exceeds implementation limit");
    }
    return (int) dp;
When a LATIN1 string contains more than Integer.MAX_VALUE / 2
non-ASCII bytes (~1 GB of 0x80-0xFF), each byte encodes to 2 UTF-8
bytes, so dp exceeds Integer.MAX_VALUE and wraps to negative.
This causes NegativeArraySizeException in downstream buffer
allocation, instead of OutOfMemoryError.
Analytical proof:
length = Integer.MAX_VALUE / 2 + 1 = 1,073,741,824
correct result (long) = 2 * 1,073,741,824 = 2,147,483,648
overflowed result (int) = -2,147,483,648 // silent overflow!
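The arithmetic above can be checked without allocating a 1 GB array. The following is a minimal sketch, assuming a hypothetical class OverflowArithmetic whose names are illustrative only and not part of the JDK. It replaces the per-byte loop with a closed-form multiplication, which wraps in the same way as repeated `dp += 2` in 32-bit arithmetic:

```java
// Illustrative only: class and method names are hypothetical, not JDK code.
class OverflowArithmetic {
    // Length as an int accumulator would see it: 2 UTF-8 bytes per
    // non-ASCII LATIN1 byte, computed in 32-bit arithmetic (wraps silently).
    static int intLength(int nonAsciiCount) {
        return nonAsciiCount * 2;
    }

    // Same computation widened to long: no wrap-around.
    static long longLength(int nonAsciiCount) {
        return 2L * nonAsciiCount;
    }

    public static void main(String[] args) {
        int n = Integer.MAX_VALUE / 2 + 1;      // 1,073,741,824
        System.out.println(longLength(n));      // 2147483648
        System.out.println(intLength(n));       // -2147483648
    }
}
```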
The fix:
Align LATIN1 path with UTF16 path:
    long dp = positives;
    for (int i = positives; i < val.length; i++) {
        if (val[i] < 0) dp += 2;
        else dp++;
    }
    if (dp > (long)Integer.MAX_VALUE) {
        throw new OutOfMemoryError("Required length exceeds implementation limit");
    }
    return (int) dp;
Note: for (int i = dp; ...) changed to for (int i = positives; ...)
to avoid implicit long→int narrowing after dp changed to long.
This is semantically equivalent since dp == positives at loop entry.
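The fixed loop can be exercised on small inputs outside the JDK. The following is a minimal sketch, assuming a hypothetical standalone class Latin1Utf8Length; it is a re-creation for illustration, not the JDK source. The internal StringCoding.countPositives call is not accessible from user code, so a plain loop (countLeadingAscii, also hypothetical) stands in for it, and the byte array is treated as LATIN1 content:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a standalone re-creation of the proposed logic.
class Latin1Utf8Length {
    // Stand-in for the internal StringCoding.countPositives:
    // counts the leading bytes that are non-negative (ASCII).
    static int countLeadingAscii(byte[] val) {
        int i = 0;
        while (i < val.length && val[i] >= 0) {
            i++;
        }
        return i;
    }

    // UTF-8 encoded length of LATIN1 bytes, using a long accumulator
    // and a single overflow check after the loop.
    static int encodedLengthUTF8(byte[] val) {
        int positives = countLeadingAscii(val);
        if (positives == val.length) {
            return positives;               // pure ASCII: one byte per char
        }
        long dp = positives;
        for (int i = positives; i < val.length; i++) {
            if (val[i] < 0) dp += 2;        // 0x80-0xFF: two UTF-8 bytes
            else dp++;                      // ASCII: one UTF-8 byte
        }
        if (dp > (long) Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        return (int) dp;
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9 na\u00efve".getBytes(StandardCharsets.ISO_8859_1);
        // Cross-check against the length of the actual UTF-8 encoding.
        int expected = new String(latin1, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8).length;
        System.out.println(encodedLengthUTF8(latin1) + " == " + expected);
    }
}
```

Cross-checking against String.getBytes(UTF_8) only covers small inputs; the overflow branch itself still needs the large-heap test.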
No performance impact is expected: long arithmetic costs the same as
int arithmetic on 64-bit platforms, the overflow check runs once
outside the loop, and pure ASCII strings exit early at line 1504
before reaching this code.
The patch includes a jtreg test with small-string correctness
verification and a large-string overflow test (requires 3GB heap).
Webrev: https://github.com/wenshao/jdk/tree/fix/string-encodedLengthUTF8-overflow
Thanks,
Shaojin Wen