Hi,
I found an integer overflow bug in String.encodedLengthUTF8(): the
LATIN1 code path uses an int accumulator without an overflow check,
while the UTF16 path correctly uses a long.
The bug:
LATIN1 path (lines 1499-1517):
    private static int encodedLengthUTF8(byte coder, byte[] val) {
        if (coder == UTF16) {
            return encodedLengthUTF8_UTF16(val, null); // ← uses long dp, has overflow check
        }
        int positives = StringCoding.countPositives(val, 0, val.length);
        if (positives == val.length) {
            return positives;
        }
        int dp = positives; // ← int, no overflow protection
        for (int i = dp; i < val.length; i++) {
            if (val[i] < 0) dp += 2;
            else dp++;
        }
        return dp; // ← may have overflowed
    }
UTF16 path (encodedLengthUTF8_UTF16, lines 1596-1642):
    long dp = 0L; // ← long
    ...
    if (dp > (long)Integer.MAX_VALUE) { // ← overflow check
        throw new OutOfMemoryError("Required length exceeds implementation limit");
    }
    return (int) dp;
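When a LATIN1 string contains more than Integer.MAX_VALUE / 2
non-ASCII bytes (~1 GB of 0x80-0xFF), each of which encodes to 2 UTF-8
bytes, dp exceeds Integer.MAX_VALUE and wraps to a negative value.
More generally, the true result is val.length plus the number of
non-ASCII bytes, so any string whose UTF-8 length exceeds
Integer.MAX_VALUE is affected. Because val.length is at most
Integer.MAX_VALUE, the true result is below 2^32, so the wrapped int
is always negative. This causes a NegativeArraySizeException in
downstream buffer allocation instead of an OutOfMemoryError.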
Analytical proof (every byte in 0x80-0xFF):
    length                  = Integer.MAX_VALUE / 2 + 1 = 1,073,741,824
    correct result (long)   = 2 * 1,073,741,824 = 2,147,483,648
    overflowed result (int) = -2,147,483,648   // silent overflow!
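The arithmetic above can be reproduced without allocating 1 GB. The following is a minimal sketch; the class and method names are hypothetical, chosen only for illustration. It computes 2 * length once in long and once in int. The int value matches what the LATIN1 loop's int accumulator ends up with when every byte is non-ASCII, since repeated int additions wrap the same way. The sketch then shows that passing the wrapped value to new byte[] fails with NegativeArraySizeException rather than OutOfMemoryError.

```java
public class OverflowArithmetic {

    // Length from the analysis: Integer.MAX_VALUE / 2 + 1 = 1,073,741,824
    static final int LENGTH = Integer.MAX_VALUE / 2 + 1;

    // Exact UTF-8 length when every byte is 0x80-0xFF (2 bytes each), computed in long
    static long correctLength() {
        return 2L * LENGTH;
    }

    // Same value computed in int, like the LATIN1 int accumulator: wraps silently
    static int overflowedLength() {
        return 2 * LENGTH;
    }

    // Allocating a buffer with the given size; a negative size throws NegativeArraySizeException
    static String allocateWith(int size) {
        try {
            byte[] buf = new byte[size];
            return "allocated " + buf.length;
        } catch (NegativeArraySizeException e) {
            return "NegativeArraySizeException";
        }
    }

    public static void main(String[] args) {
        System.out.println(correctLength());                   // 2147483648
        System.out.println(overflowedLength());                // -2147483648
        System.out.println(allocateWith(overflowedLength()));  // NegativeArraySizeException
    }
}
```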
The fix:
Align the LATIN1 path with the UTF16 path:
    long dp = positives;
    for (int i = positives; i < val.length; i++) {
        if (val[i] < 0) dp += 2;
        else dp++;
    }
    if (dp > (long)Integer.MAX_VALUE) {
        throw new OutOfMemoryError("Required length exceeds implementation limit");
    }
    return (int) dp;
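For reference, here is a self-contained sketch of the proposed LATIN1 logic that can be compiled and checked outside the JDK. It rests on a few assumptions. It works on a plain byte[] rather than inside java.lang.String. countPositives is a simple hypothetical stand-in for the internal StringCoding.countPositives, which is not accessible from user code. The overflow check is factored into a hypothetical checkedLength helper so it can be exercised without a multi-gigabyte array.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Utf8Length {

    // Hypothetical stand-in for StringCoding.countPositives:
    // number of leading bytes in 0x00-0x7F (non-negative as a Java byte)
    static int countPositives(byte[] val) {
        for (int i = 0; i < val.length; i++) {
            if (val[i] < 0) {
                return i;
            }
        }
        return val.length;
    }

    // UTF-8 length of LATIN1-coded bytes, accumulated in long as in the proposed fix
    static int encodedLengthUTF8Latin1(byte[] val) {
        int positives = countPositives(val);
        if (positives == val.length) {
            return positives; // pure ASCII: one UTF-8 byte per char
        }
        long dp = positives;
        for (int i = positives; i < val.length; i++) {
            // 0x80-0xFF (negative as a Java byte) needs 2 UTF-8 bytes; 0x00-0x7F needs 1
            dp += (val[i] < 0) ? 2 : 1;
        }
        return checkedLength(dp);
    }

    // Hypothetical helper holding the same check as the UTF16 path:
    // reject lengths that do not fit in an int
    static int checkedLength(long dp) {
        if (dp > (long) Integer.MAX_VALUE) {
            throw new OutOfMemoryError("Required length exceeds implementation limit");
        }
        return (int) dp;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 na\u00efve \u00ff";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        int expected = s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(encodedLengthUTF8Latin1(latin1) + " == " + expected); // 15 == 15
    }
}
```

For small inputs the result can be cross-checked against String.getBytes(StandardCharsets.UTF_8).length, as main does. The large-length branch is exercised by calling checkedLength directly with a value above Integer.MAX_VALUE.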
Note: for (int i = dp; ...) is changed to for (int i = positives; ...)
because, once dp is a long, int i = dp no longer compiles (Java does
not implicitly narrow long to int). This is semantically equivalent,
since dp == positives at loop entry.
No performance impact is expected: long addition costs the same as
int addition on 64-bit platforms, the overflow check runs once outside
the loop, and pure-ASCII strings return early at line 1504, before
reaching this code.
The patch includes a jtreg test covering small-string correctness and
a large-string overflow case (the latter requires a 3 GB heap).
 Webrev: 
https://github.com/wenshao/jdk/tree/fix/string-encodedLengthUTF8-overflow 
 Thanks,
 Shaojin Wen
