The GitHub Actions job "npm_and_yarn in /javascript for js-yaml - Update 
#1445291020" on fory.git/main has failed.
Run started by GitHub user dependabot[bot] (triggered by dependabot[bot]).

Head commit for run:
9b2bcec9618702d6aa5ba4b166f86619fce51baf / Kirichenko Evgeniy 
<[email protected]>
perf(java): avoid quadratic buffer growth in stream deserialization (#3809)

## Why?

Since v1.2.0 (#3748), deserializing from a `ForyInputStream` (or a
seekable `ForyReadableChannel`) is **O(n²)** in the payload size. For
multi-MB payloads of many small objects this looks like an infinite
hang: the deserializing thread spins at 100% CPU for hours inside buffer
reallocation.

Thread dump of a real workload (3.7 MB payload, Fory 1.2.0 — identical
stack for minutes, ~95% CPU):

```
"main" ... RUNNABLE
    at org.apache.fory.io.ForyInputStream.growBuffer(ForyInputStream.java:108)
    at org.apache.fory.io.ForyInputStream.fillBuffer(ForyInputStream.java:85)
    at org.apache.fory.memory.MemoryBuffer.readByte(MemoryBuffer.java:2086)
    ...
```

Root cause: `fillBuffer` grows the internal buffer to the **exact**
target size:

- When `stream.available() >= remainingNeeded`, it sets `newSize =
targetSize` — for a `readByte()`-triggered `fillBuffer(1)` on a full
buffer that is `currentCapacity + 1`.
- The doubling fallback `nextBufferSize` is also capped with
`Math.min(grown, targetSize)`, so it degenerates to the same exact fit
for small fills.

Because the stream buffer accumulates the whole payload during one
`deserialize()` call, the buffer is always exactly full after a fill, so
**every small read reallocates and copies the entire buffer**
(`readByte` → grow by 1 byte → copy N bytes). Total work is Σ ≈ n²/2
byte copies plus an n-byte allocation per few bytes read. Payloads ≤
1.1.0 were unaffected (growth was 4x/1.5x with greedy reads).

## What does this PR do?

Makes stream buffer growth geometric again while keeping the
allocation-bounding intent of #3748:

- `ForyInputStream.fillBuffer` / `ForyReadableChannel.fillBuffer`: on
the verified path (stream/channel proved it has the bytes), grow to
`max(targetSize, 2 * capacity)` instead of exactly `targetSize`.
- `nextBufferSize`: drop the `Math.min(grown, targetSize)` cap so the
unverified fallback doubles instead of creeping by the requested amount.
Allocation on this path is still bounded by ~2× the bytes actually
received, so truncated/hostile streams still fail before large
allocations — all assertions of `testStreamFillGrowsFromBufferedBytes`
(added in #3748) are unchanged and still pass.
- The growth policy now lives in one place: `ForyStreamReader` gains a
`MAX_BUFFER_SIZE` constant (`Integer.MAX_VALUE - 8`, the largest array
size commonly supported by JVMs) and a static `nextBufferSize(int)`
helper, used by both stream readers. This also aligns the two readers'
previously inconsistent max-size clamps.
- Updates `docs/security/deserialization.md` (stream-backed fill buffer
growth bullet), which previously prescribed the cap-to-target behavior;
it now requires geometric growth and explains why cap-to-target is
O(n²).

Adds `StreamTest#testStreamBufferGrowthIsGeometric`, which reads 64 KB
byte-by-byte through both `ForyInputStream` and `ForyReadableChannel`
and asserts the backing buffer is reallocated ≤ 20 times (geometric ⇒
~10). Without the fix the test fails within milliseconds:

```
StreamTest.testStreamBufferGrowthIsGeometric:389 stream buffer must grow
geometrically, but already grew 21 times expected [true] but found [false]
```

With the fix, all 14 `StreamTest` cases pass, including
`testStreamFillGrowsFromBufferedBytes` from #3748.

## Related issues

Regression introduced by #3748 (first released in v1.2.0).

## AI Contribution Checklist

- [x] Substantial AI assistance was used in this PR: `yes`
- [x] I included a completed AI Contribution Checklist and the required
`AI Usage Disclosure` below.
- [x] Two fresh AI review agents ran on the current HEAD: one
Fory-guided reviewer using `AGENTS.md` and `.agents/ci-and-pr.md`, and
one independent general reviewer in a separate clean-context session not
pointed at Fory-specific checklists.
- [x] All AI review comments were addressed and the review loop repeated
until both reviewers reported no further actionable comments.
- [x] Review results from both reviewers are persisted in the
collapsible sections below.
- [x] Human verification was run and recorded (see `human_verification`
below).
- [x] Tests added (regression test that fails on the unfixed code).
- [x] Performance impact validated with a benchmark (see Benchmark
section).
- [x] Licensing and provenance compliance verified.
- [x] Line-by-line human self-review — the human contributor will
confirm this on the PR after review.

```text
AI Usage Disclosure
- substantial_ai_assistance: yes
- scope: design drafting | code drafting | tests | docs
- affected_files_or_subsystems: java/fory-core stream readers 
(org.apache.fory.io.ForyInputStream, org.apache.fory.io.ForyReadableChannel), 
StreamTest, docs/security/deserialization.md
- ai_review: line-by-line AI self-review completed. Round 1: Fory-guided 
reviewer reported 1 MAJOR (security doc still prescribed cap-to-target growth) 
+ 2 MINOR (perf title/benchmark, direct-ByteBuffer test gap); independent 
reviewer approved with 4 MINOR (max-size clamp inconsistency, direct-ByteBuffer 
test gap, 2 informational). All actionable findings fixed. Round 2: both 
reviewers re-reviewed the updated diff and reported no further actionable 
comments.
- ai_review_artifacts: full final review reports from both reviewers embedded 
in the collapsible sections of this PR description.
- human_verification: StreamTest run locally on JDK 21 (14/14 pass, includes 
existing #3748 assertions); the new regression test verified to FAIL on unfixed 
code ("grew 21 times" assertion) and PASS with the fix; end-to-end benchmark on 
a 4.4 MB payload (see Benchmark).
- performance_verification: benchmark table in this PR description (unfixed: 
did not finish in 60 s; fixed: 30.7 ms).
- provenance_license_confirmation: Apache-2.0-compatible provenance confirmed; 
no incompatible third-party code introduced.
```

<details>
<summary>Reviewer 1 (Fory-guided, AGENTS.md + .agents/ci-and-pr.md) —
final result</summary>

**Round 1 findings:** 1 MAJOR — `docs/security/deserialization.md` still
prescribed the cap-to-target growth this PR removes (doc/code drift;
risk that a future change "restores" the cap and reintroduces the O(n²)
regression). 2 MINOR — use `perf(java):` Conventional-Commits type with
benchmark data; direct-ByteBuffer channel growth path not exercised by
the test. Verified as non-issues: security invariants preserved
(fallback only doubles from a full buffer of real bytes, never from
attacker-declared sizes; verified path allocates targetSize only after
available()/channel size proves the bytes); #3748's pinned test
assertions still hold; no integer overflow; test deterministic.

**Round 2 (after fixes):** "No further actionable comments." Verified
all four fixes on disk: doc now requires geometric growth and explains
the cap-to-target O(n²) degeneration, matching the code; perf title +
benchmark present; direct-buffer detection confirmed sound (`growBuffer`
→ `initByteBuffer` → `initOffHeapBuffer` reassigns the `offHeapBuffer`
field each grow, and `getHeapMemory()` is null for direct buffers so the
helper falls back correctly); channel clamp aligned to
`Integer.MAX_VALUE - 8`.

**Verdict: Approve** — "fix is correct, security-safe, and the whole
surface (code, tests, docs, PR metadata) now agrees."

</details>

<details>
<summary>Reviewer 2 (independent, clean context) — final
result</summary>

**Round 1 findings (all MINOR):** (1) `nextBufferSize` max-size clamp
inconsistent between the two readers (`Integer.MAX_VALUE` vs
`Integer.MAX_VALUE - 8`); (2) direct-ByteBuffer channel growth path not
covered by the test; (3) informational: verified path may allocate up to
~2× target (standard geometric-growth trade-off, intended); (4) nit:
temp-file creation outside try (consistent with surrounding tests).
Verified as correct: no integer overflow, no false `growBuffer` guard
trips, fix targets the real O(n²) path, test deterministic and genuinely
pins the behavior.

**Round 2 (after fixes):** Findings 2–4 fully addressed; verified
`getOffHeapBuffer()` returns the stored field (not a duplicate) and each
grow swaps it, so the direct-buffer identity check genuinely detects
reallocation. One residual optional one-liner: align the channel's
`targetSize` guard to `Integer.MAX_VALUE - 8L` — **applied in the final
HEAD**.

**Verdict: Approve** — "The fix is correct and well-tested across heap,
heap-channel, and direct-channel paths."

</details>

## Does this PR introduce any user-facing change?

- [x] Does this PR introduce any public API change? — Additive only:
`ForyStreamReader` gains a `MAX_BUFFER_SIZE` constant and a static
`nextBufferSize(int)` helper (the shared growth policy); no existing
signature changed.
- [ ] Does this PR introduce any binary protocol compatibility change? —
No.

## Benchmark

Deserializing a 4.4 MB payload (a `HashMap<String, Long>` with 300k
entries, i.e. many small reads) built with
`Fory.builder().requireClassRegistration(false).withIntCompressed(true).withLongCompressed(true).withRefTracking(true)`,
serialized via `fory.serialize(OutputStream, obj)`:

| path | Fory 1.2.0 (unfixed) | this PR |
|---|---|---|
| `deserialize(new ForyInputStream(in))` | **did not finish within 60
s** (100% CPU, killed) | **30.7 ms** |
| `deserialize(byte[])` (reference) | 25.6 ms | 24.2 ms |

---------

Co-authored-by: Claude Fable 5 <[email protected]>
Co-authored-by: chaokunyang <[email protected]>

Report URL: https://github.com/apache/fory/actions/runs/28587544109

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to