This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push:
new 9b2bcec96 perf(java): avoid quadratic buffer growth in stream
deserialization (#3809)
9b2bcec96 is described below
commit 9b2bcec9618702d6aa5ba4b166f86619fce51baf
Author: Kirichenko Evgeniy <[email protected]>
AuthorDate: Thu Jul 2 13:46:27 2026 +0200
perf(java): avoid quadratic buffer growth in stream deserialization (#3809)
## Why?
Since v1.2.0 (#3748), deserializing from a `ForyInputStream` (or a
seekable `ForyReadableChannel`) is **O(n²)** in the payload size. For
multi-MB payloads of many small objects this looks like an infinite
hang: the deserializing thread spins at 100% CPU for hours inside buffer
reallocation.
Thread dump of a real workload (3.7 MB payload, Fory 1.2.0 — identical
stack for minutes, ~95% CPU):
```
"main" ... RUNNABLE
at
org.apache.fory.io.ForyInputStream.growBuffer(ForyInputStream.java:108)
at
org.apache.fory.io.ForyInputStream.fillBuffer(ForyInputStream.java:85)
at org.apache.fory.memory.MemoryBuffer.readByte(MemoryBuffer.java:2086)
...
```
Root cause: `fillBuffer` grows the internal buffer to the **exact**
target size:
- When `stream.available() >= remainingNeeded`, it sets `newSize =
targetSize` — for a `readByte()`-triggered `fillBuffer(1)` on a full
buffer that is `currentCapacity + 1`.
- The doubling fallback `nextBufferSize` is also capped with
`Math.min(grown, targetSize)`, so it degenerates to the same exact fit
for small fills.
Because the stream buffer accumulates the whole payload during one
`deserialize()` call, the buffer is always exactly full after a fill, so
**every small read reallocates and copies the entire buffer**
(`readByte` → grow by 1 byte → copy N bytes). Total work is Σ ≈ n²/2
byte copies plus an n-byte allocation per few bytes read. Payloads ≤
1.1.0 were unaffected (growth was 4x/1.5x with greedy reads).
## What does this PR do?
Makes stream buffer growth geometric again while keeping the
allocation-bounding intent of #3748:
- `ForyInputStream.fillBuffer` / `ForyReadableChannel.fillBuffer`: on
the verified path (stream/channel proved it has the bytes), grow to
`max(targetSize, 2 * capacity)` instead of exactly `targetSize`.
- `nextBufferSize`: drop the `Math.min(grown, targetSize)` cap so the
unverified fallback doubles instead of creeping by the requested amount.
Allocation on this path is still bounded by ~2× the bytes actually
received, so truncated/hostile streams still fail before large
allocations — all assertions of `testStreamFillGrowsFromBufferedBytes`
(added in #3748) are unchanged and still pass.
- The growth policy now lives in one place: `ForyStreamReader` gains a
`MAX_BUFFER_SIZE` constant (`Integer.MAX_VALUE - 8`, the largest array
size commonly supported by JVMs) and a static `nextBufferSize(int)`
helper, used by both stream readers. This also aligns the two readers'
previously inconsistent max-size clamps.
- Updates `docs/security/deserialization.md` (stream-backed fill buffer
growth bullet), which previously prescribed the cap-to-target behavior;
it now requires geometric growth and explains why cap-to-target is
O(n²).
Adds `StreamTest#testStreamBufferGrowthIsGeometric`, which reads 64 KB
byte-by-byte through both `ForyInputStream` and `ForyReadableChannel`
and asserts the backing buffer is reallocated ≤ 20 times (geometric ⇒
~10). Without the fix the test fails within milliseconds:
```
StreamTest.testStreamBufferGrowthIsGeometric:389 stream buffer must grow
geometrically, but already grew 21 times expected [true] but found [false]
```
With the fix, all 14 `StreamTest` cases pass, including
`testStreamFillGrowsFromBufferedBytes` from #3748.
## Related issues
Regression introduced by #3748 (first released in v1.2.0).
## AI Contribution Checklist
- [x] Substantial AI assistance was used in this PR: `yes`
- [x] I included a completed AI Contribution Checklist and the required
`AI Usage Disclosure` below.
- [x] Two fresh AI review agents ran on the current HEAD: one
Fory-guided reviewer using `AGENTS.md` and `.agents/ci-and-pr.md`, and
one independent general reviewer in a separate clean-context session not
pointed at Fory-specific checklists.
- [x] All AI review comments were addressed and the review loop repeated
until both reviewers reported no further actionable comments.
- [x] Review results from both reviewers are persisted in the
collapsible sections below.
- [x] Human verification was run and recorded (see `human_verification`
below).
- [x] Tests added (regression test that fails on the unfixed code).
- [x] Performance impact validated with a benchmark (see Benchmark
section).
- [x] Licensing and provenance compliance verified.
- [x] Line-by-line human self-review — the human contributor will
confirm this on the PR after review.
```text
AI Usage Disclosure
- substantial_ai_assistance: yes
- scope: design drafting | code drafting | tests | docs
- affected_files_or_subsystems: java/fory-core stream readers
(org.apache.fory.io.ForyInputStream, org.apache.fory.io.ForyReadableChannel),
StreamTest, docs/security/deserialization.md
- ai_review: line-by-line AI self-review completed. Round 1: Fory-guided
reviewer reported 1 MAJOR (security doc still prescribed cap-to-target growth)
+ 2 MINOR (perf title/benchmark, direct-ByteBuffer test gap); independent
reviewer approved with 4 MINOR (max-size clamp inconsistency, direct-ByteBuffer
test gap, 2 informational). All actionable findings fixed. Round 2: both
reviewers re-reviewed the updated diff and reported no further actionable
comments.
- ai_review_artifacts: full final review reports from both reviewers
embedded in the collapsible sections of this PR description.
- human_verification: StreamTest run locally on JDK 21 (14/14 pass,
includes existing #3748 assertions); the new regression test verified to FAIL
on unfixed code ("grew 21 times" assertion) and PASS with the fix; end-to-end
benchmark on a 4.4 MB payload (see Benchmark).
- performance_verification: benchmark table in this PR description
(unfixed: did not finish in 60 s; fixed: 30.7 ms).
- provenance_license_confirmation: Apache-2.0-compatible provenance
confirmed; no incompatible third-party code introduced.
```
<details>
<summary>Reviewer 1 (Fory-guided, AGENTS.md + .agents/ci-and-pr.md) —
final result</summary>
**Round 1 findings:** 1 MAJOR — `docs/security/deserialization.md` still
prescribed the cap-to-target growth this PR removes (doc/code drift;
risk that a future change "restores" the cap and reintroduces the O(n²)
regression). 2 MINOR — use `perf(java):` Conventional-Commits type with
benchmark data; direct-ByteBuffer channel growth path not exercised by
the test. Verified as non-issues: security invariants preserved
(fallback only doubles from a full buffer of real bytes, never from
attacker-declared sizes; verified path allocates targetSize only after
available()/channel size proves the bytes); #3748's pinned test
assertions still hold; no integer overflow; test deterministic.
**Round 2 (after fixes):** "No further actionable comments." Verified
all four fixes on disk: doc now requires geometric growth and explains
the cap-to-target O(n²) degeneration, matching the code; perf title +
benchmark present; direct-buffer detection confirmed sound (`growBuffer`
→ `initByteBuffer` → `initOffHeapBuffer` reassigns the `offHeapBuffer`
field each grow, and `getHeapMemory()` is null for direct buffers so the
helper falls back correctly); channel clamp aligned to
`Integer.MAX_VALUE - 8`.
**Verdict: Approve** — "fix is correct, security-safe, and the whole
surface (code, tests, docs, PR metadata) now agrees."
</details>
<details>
<summary>Reviewer 2 (independent, clean context) — final
result</summary>
**Round 1 findings (all MINOR):** (1) `nextBufferSize` max-size clamp
inconsistent between the two readers (`Integer.MAX_VALUE` vs
`Integer.MAX_VALUE - 8`); (2) direct-ByteBuffer channel growth path not
covered by the test; (3) informational: verified path may allocate up to
~2× target (standard geometric-growth trade-off, intended); (4) nit:
temp-file creation outside try (consistent with surrounding tests).
Verified as correct: no integer overflow, no false `growBuffer` guard
trips, fix targets the real O(n²) path, test deterministic and genuinely
pins the behavior.
**Round 2 (after fixes):** Findings 2–4 fully addressed; verified
`getOffHeapBuffer()` returns the stored field (not a duplicate) and each
grow swaps it, so the direct-buffer identity check genuinely detects
reallocation. One residual optional one-liner: align the channel's
`targetSize` guard to `Integer.MAX_VALUE - 8L` — **applied in the final
HEAD**.
**Verdict: Approve** — "The fix is correct and well-tested across heap,
heap-channel, and direct-channel paths."
</details>
## Does this PR introduce any user-facing change?
- [x] Does this PR introduce any public API change? — Additive only:
`ForyStreamReader` gains a `MAX_BUFFER_SIZE` constant and a static
`nextBufferSize(int)` helper (the shared growth policy); no existing
signature changed.
- [ ] Does this PR introduce any binary protocol compatibility change? —
No.
## Benchmark
Deserializing a 4.4 MB payload (a `HashMap<String, Long>` with 300k
entries, i.e. many small reads) built with
`Fory.builder().requireClassRegistration(false).withIntCompressed(true).withLongCompressed(true).withRefTracking(true)`,
serialized via `fory.serialize(OutputStream, obj)`:
| path | Fory 1.2.0 (unfixed) | this PR |
|---|---|---|
| `deserialize(new ForyInputStream(in))` | **did not finish within 60
s** (100% CPU, killed) | **30.7 ms** |
| `deserialize(byte[])` (reference) | 25.6 ms | 24.2 ms |
---------
Co-authored-by: Claude Fable 5 <[email protected]>
Co-authored-by: chaokunyang <[email protected]>
---
docs/security/deserialization.md | 19 +++++----
.../java/org/apache/fory/io/ForyInputStream.java | 22 ++++-------
.../org/apache/fory/io/ForyReadableChannel.java | 18 ++++-----
.../java/org/apache/fory/io/ForyStreamReader.java | 17 ++++++++
.../src/test/java/org/apache/fory/StreamTest.java | 45 ++++++++++++++++++++++
javascript/package-lock.json | 15 ++++----
javascript/package.json | 1 +
7 files changed, 98 insertions(+), 39 deletions(-)
diff --git a/docs/security/deserialization.md b/docs/security/deserialization.md
index 97d47b981..33dddd288 100644
--- a/docs/security/deserialization.md
+++ b/docs/security/deserialization.md
@@ -164,13 +164,18 @@ For stream-backed input:
- A stream-backed buffer may hold the full requested encoded body after that
body has been read from the stream. It must not reserve the attacker-declared
length before input bytes prove that length exists.
-- Stream-backed fill buffers should grow from the current proven buffer size,
- such as by doubling current capacity, and cap only to the immediate target
- when the next bounded growth step reaches it. A byte owner may use an
- owner-local availability signal as a one-shot growth hint when the stream
- implementation itself is caller-owned trusted code; if that hint is absent or
- insufficient, the reader must fall back to bounded growth from already
- buffered bytes. Serializers should not add their own availability branches.
+- Stream-backed fill buffers should grow geometrically from the current proven
+ buffer size, such as by doubling current capacity. Growth must not be capped
+ to the immediate fill target: for small fills the target is barely above the
+ current capacity, so cap-to-target degenerates into constant-size growth
+ steps that copy the whole buffer on every small read and make stream
+ deserialization O(n^2) overall. A byte owner may use an owner-local
+ availability signal as a one-shot growth hint when the stream implementation
+ itself is caller-owned trusted code, and may then reserve the full immediate
+ target at once while keeping at least the geometric growth step; if that hint
+ is absent or insufficient, the reader must fall back to bounded geometric
+ growth from already buffered bytes. Serializers should not add their own
+ availability branches.
- A truncated stream should fail before allocating the final deserialized value
and should allocate only for bytes actually read plus bounded spare capacity.
diff --git
a/java/fory-core/src/main/java/org/apache/fory/io/ForyInputStream.java
b/java/fory-core/src/main/java/org/apache/fory/io/ForyInputStream.java
index 3b2bcd0f2..d62e3465a 100644
--- a/java/fory-core/src/main/java/org/apache/fory/io/ForyInputStream.java
+++ b/java/fory-core/src/main/java/org/apache/fory/io/ForyInputStream.java
@@ -64,7 +64,7 @@ public class ForyInputStream extends InputStream implements
ForyStreamReader {
int offset = buffer.size();
int remainingNeeded = minFillSize - totalRead;
long targetSize = (long) offset + remainingNeeded;
- if (targetSize > Integer.MAX_VALUE - 8L) {
+ if (targetSize > MAX_BUFFER_SIZE) {
throw new IndexOutOfBoundsException("Stream buffer size exceeds
supported range");
}
if (targetSize > heapMemory.length) {
@@ -72,14 +72,17 @@ public class ForyInputStream extends InputStream implements
ForyStreamReader {
if (!checkedAvailable) {
checkedAvailable = true;
// Use available() only as a one-shot growth hint. It may be
expensive or
- // conservative, so failed hints fall back to bounded doubling.
Final value
- // allocation still waits for fillBuffer to complete successfully.
+ // conservative, so failed hints fall back to doubling. Grow by at
least a
+ // doubling step so that repeated small fills stay amortized O(1);
growing to
+ // the exact target size would copy the whole buffer on every
small read,
+ // making stream deserialization O(n^2) overall.
if (stream.available() >= remainingNeeded) {
- newSize = (int) targetSize;
+ newSize =
+ (int) Math.max(targetSize,
ForyStreamReader.nextBufferSize(heapMemory.length));
}
}
if (newSize == 0 && offset == heapMemory.length) {
- newSize = nextBufferSize(heapMemory.length, (int) targetSize);
+ newSize = ForyStreamReader.nextBufferSize(heapMemory.length);
}
if (newSize != 0) {
heapMemory = growBuffer(buffer, newSize);
@@ -113,15 +116,6 @@ public class ForyInputStream extends InputStream
implements ForyStreamReader {
return heapMemory;
}
- private static int nextBufferSize(int size, int targetSize) {
- long grown = size == 0 ? 1L : (long) size << 1;
- int maxSize = Integer.MAX_VALUE - 8;
- if (grown > maxSize) {
- grown = maxSize;
- }
- return (int) Math.min(grown, targetSize);
- }
-
@Override
public void readTo(byte[] dst, int dstIndex, int len) {
MemoryBuffer buf = buffer;
diff --git
a/java/fory-core/src/main/java/org/apache/fory/io/ForyReadableChannel.java
b/java/fory-core/src/main/java/org/apache/fory/io/ForyReadableChannel.java
index b4b9b771b..ab15f9020 100644
--- a/java/fory-core/src/main/java/org/apache/fory/io/ForyReadableChannel.java
+++ b/java/fory-core/src/main/java/org/apache/fory/io/ForyReadableChannel.java
@@ -98,7 +98,7 @@ public class ForyReadableChannel implements ForyStreamReader,
ReadableByteChanne
int position = byteBuf.position();
int remainingNeeded = minFillSize - totalRead;
long targetSize = (long) position + remainingNeeded;
- if (targetSize > Integer.MAX_VALUE) {
+ if (targetSize > MAX_BUFFER_SIZE) {
throw new DeserializationException("Stream buffer size exceeds
supported range");
}
if (targetSize > byteBuf.capacity()) {
@@ -107,12 +107,16 @@ public class ForyReadableChannel implements
ForyStreamReader, ReadableByteChanne
checkedSeekableRemaining = true;
// Query exact channel remaining bytes only as a one-shot fast
path. Otherwise grow
// from bytes already buffered so truncated channels fail before
reserving the body.
+ // Grow by at least a doubling step so that repeated small fills
stay amortized
+ // O(1); growing to the exact target size would copy the whole
buffer on every
+ // small read, making stream deserialization O(n^2) overall.
if (seekableChannel.size() - seekableChannel.position() >=
remainingNeeded) {
- newCapacity = (int) targetSize;
+ newCapacity =
+ (int) Math.max(targetSize,
ForyStreamReader.nextBufferSize(byteBuf.capacity()));
}
}
if (newCapacity == 0 && position == byteBuf.capacity()) {
- newCapacity = nextBufferSize(byteBuf.capacity(), (int) targetSize);
+ newCapacity = ForyStreamReader.nextBufferSize(byteBuf.capacity());
}
if (newCapacity != 0) {
byteBuf = growBuffer(byteBuf, memoryBuf, position, newCapacity);
@@ -151,14 +155,6 @@ public class ForyReadableChannel implements
ForyStreamReader, ReadableByteChanne
return newByteBuf;
}
- private static int nextBufferSize(int oldCapacity, int targetSize) {
- long grown = oldCapacity == 0 ? 1L : (long) oldCapacity << 1;
- if (grown > Integer.MAX_VALUE) {
- grown = Integer.MAX_VALUE;
- }
- return (int) Math.min(grown, targetSize);
- }
-
@Override
public int read(ByteBuffer dst) throws IOException {
int length = dst.remaining();
diff --git
a/java/fory-core/src/main/java/org/apache/fory/io/ForyStreamReader.java
b/java/fory-core/src/main/java/org/apache/fory/io/ForyStreamReader.java
index 598850c76..5c2f56629 100644
--- a/java/fory-core/src/main/java/org/apache/fory/io/ForyStreamReader.java
+++ b/java/fory-core/src/main/java/org/apache/fory/io/ForyStreamReader.java
@@ -33,6 +33,23 @@ import org.apache.fory.memory.MemoryBuffer;
*/
public interface ForyStreamReader {
+ /**
+ * Maximum buffer size to use when growing stream-backed buffers: the
largest array size commonly
+ * supported by JVM implementations.
+ */
+ int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;
+
+ /**
+ * Returns the next geometric growth step for a stream-backed buffer of the
given capacity,
+ * bounded by {@link #MAX_BUFFER_SIZE}. Growth must not be capped to the
immediate fill target:
+ * for small fills that degenerates into constant-size growth steps which
copy the whole buffer on
+ * every small read and make stream deserialization O(n^2) overall.
+ */
+ static int nextBufferSize(int currentCapacity) {
+ long grown = currentCapacity == 0 ? 1L : (long) currentCapacity << 1;
+ return (int) Math.min(grown, MAX_BUFFER_SIZE);
+ }
+
/**
* Read stream and fill the data to underlying {@link MemoryBuffer}, which
is also the buffer
* returned by {@link #getBuffer}.
diff --git a/java/fory-core/src/test/java/org/apache/fory/StreamTest.java
b/java/fory-core/src/test/java/org/apache/fory/StreamTest.java
index 0cbd2efdf..ec8ead87d 100644
--- a/java/fory-core/src/test/java/org/apache/fory/StreamTest.java
+++ b/java/fory-core/src/test/java/org/apache/fory/StreamTest.java
@@ -371,6 +371,51 @@ public class StreamTest extends ForyTestBase {
}
}
+ @Test
+ public void testStreamBufferGrowthIsGeometric() throws IOException {
+ // Reading many small values from a stream must not reallocate the
internal buffer
+ // on every read: exact-fit growth copies the whole buffer per small fill
and makes
+ // stream deserialization O(n^2), which looks like a hang for multi-MB
payloads.
+ byte[] data = new byte[1 << 16];
+ ForyInputStream input = new ForyInputStream(new
ByteArrayInputStream(data), 64);
+ assertGeometricGrowth(input.getBuffer(), data.length, "stream");
+
+ Path tempFile = Files.createTempFile("geometric_growth", "data");
+ Files.write(tempFile, data);
+ try {
+ try (ForyReadableChannel heapChannel =
+ new ForyReadableChannel(Files.newByteChannel(tempFile),
ByteBuffer.allocate(64))) {
+ assertGeometricGrowth(heapChannel.getBuffer(), data.length, "heap
channel");
+ }
+ try (ForyReadableChannel directChannel =
+ new ForyReadableChannel(Files.newByteChannel(tempFile),
ByteBuffer.allocateDirect(64))) {
+ assertGeometricGrowth(directChannel.getBuffer(), data.length, "direct
channel");
+ }
+ } finally {
+ Files.delete(tempFile);
+ }
+ }
+
+ private static void assertGeometricGrowth(MemoryBuffer buffer, int numBytes,
String label) {
+ int growCount = 0;
+ Object lastBacking = backingBuffer(buffer);
+ for (int i = 0; i < numBytes; i++) {
+ buffer.readByte();
+ if (backingBuffer(buffer) != lastBacking) {
+ lastBacking = backingBuffer(buffer);
+ growCount++;
+ assertTrue(
+ growCount <= 20,
+ label + " buffer must grow geometrically, but already grew " +
growCount + " times");
+ }
+ }
+ }
+
+ private static Object backingBuffer(MemoryBuffer buffer) {
+ byte[] heapMemory = buffer.getHeapMemory();
+ return heapMemory != null ? heapMemory : buffer.getOffHeapBuffer();
+ }
+
@Test
public void testScopedMetaShare() throws IOException {
Fory fory =
diff --git a/javascript/package-lock.json b/javascript/package-lock.json
index d89c865a4..39fe76b1d 100644
--- a/javascript/package-lock.json
+++ b/javascript/package-lock.json
@@ -11,6 +11,7 @@
"devDependencies": {
"@stylistic/eslint-plugin": "^1.5.1",
"@types/js-beautify": "^1.14.3",
+ "@types/node": "18.19.130",
"eslint": "^8.55.0",
"jest": "^29.5.0",
"jest-junit": "^17.0.0",
@@ -1654,13 +1655,13 @@
"license": "MIT"
},
"node_modules/@types/node": {
- "version": "25.9.1",
- "resolved": "https://registry.npmjs.org/@types/node/-/node-25.9.1.tgz",
- "integrity":
"sha512-xfrlY7UD5rMJk3ZVJP8BNzS28J36YJg+xp+LPXV1TdWxr8uMH5A860QNxYDGQe/ylDSgjxE52Q9VnO7p75tJxg==",
+ "version": "18.19.130",
+ "resolved":
"https://registry.npmjs.org/@types/node/-/node-18.19.130.tgz",
+ "integrity":
"sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg==",
"dev": true,
"license": "MIT",
"dependencies": {
- "undici-types": ">=7.24.0 <7.24.7"
+ "undici-types": "~5.26.4"
}
},
"node_modules/@types/semver": {
@@ -6646,9 +6647,9 @@
}
},
"node_modules/undici-types": {
- "version": "7.24.6",
- "resolved":
"https://registry.npmjs.org/undici-types/-/undici-types-7.24.6.tgz",
- "integrity":
"sha512-WRNW+sJgj5OBN4/0JpHFqtqzhpbnV0GuB+OozA9gCL7a993SmU+1JBZCzLNxYsbMfIeDL+lTsphD5jN5N+n0zg==",
+ "version": "5.26.5",
+ "resolved":
"https://registry.npmjs.org/undici-types/-/undici-types-5.26.5.tgz",
+ "integrity":
"sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA==",
"dev": true,
"license": "MIT"
},
diff --git a/javascript/package.json b/javascript/package.json
index a67f521ac..d1d10c794 100644
--- a/javascript/package.json
+++ b/javascript/package.json
@@ -15,6 +15,7 @@
"devDependencies": {
"@stylistic/eslint-plugin": "^1.5.1",
"@types/js-beautify": "^1.14.3",
+ "@types/node": "18.19.130",
"eslint": "^8.55.0",
"jest": "^29.5.0",
"jest-junit": "^17.0.0",
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]