The GitHub Actions job "npm_and_yarn in /javascript for js-yaml - Update #1445291020" on fory.git/main has failed. Run started by GitHub user dependabot[bot] (triggered by dependabot[bot]).
Head commit for run: 9b2bcec9618702d6aa5ba4b166f86619fce51baf / Kirichenko Evgeniy <[email protected]> perf(java): avoid quadratic buffer growth in stream deserialization (#3809) ## Why? Since v1.2.0 (#3748), deserializing from a `ForyInputStream` (or a seekable `ForyReadableChannel`) is **O(n²)** in the payload size. For multi-MB payloads of many small objects this looks like an infinite hang: the deserializing thread spins at 100% CPU for hours inside buffer reallocation. Thread dump of a real workload (3.7 MB payload, Fory 1.2.0 — identical stack for minutes, ~95% CPU): ``` "main" ... RUNNABLE at org.apache.fory.io.ForyInputStream.growBuffer(ForyInputStream.java:108) at org.apache.fory.io.ForyInputStream.fillBuffer(ForyInputStream.java:85) at org.apache.fory.memory.MemoryBuffer.readByte(MemoryBuffer.java:2086) ... ``` Root cause: `fillBuffer` grows the internal buffer to the **exact** target size: - When `stream.available() >= remainingNeeded`, it sets `newSize = targetSize` — for a `readByte()`-triggered `fillBuffer(1)` on a full buffer that is `currentCapacity + 1`. - The doubling fallback `nextBufferSize` is also capped with `Math.min(grown, targetSize)`, so it degenerates to the same exact fit for small fills. Because the stream buffer accumulates the whole payload during one `deserialize()` call, the buffer is always exactly full after a fill, so **every small read reallocates and copies the entire buffer** (`readByte` → grow by 1 byte → copy N bytes). Total work is Σ ≈ n²/2 byte copies plus an n-byte allocation per few bytes read. Payloads ≤ 1.1.0 were unaffected (growth was 4x/1.5x with greedy reads). ## What does this PR do? Makes stream buffer growth geometric again while keeping the allocation-bounding intent of #3748: - `ForyInputStream.fillBuffer` / `ForyReadableChannel.fillBuffer`: on the verified path (stream/channel proved it has the bytes), grow to `max(targetSize, 2 * capacity)` instead of exactly `targetSize`. - `nextBufferSize`: drop the `Math.min(grown, targetSize)` cap so the unverified fallback doubles instead of creeping by the requested amount. Allocation on this path is still bounded by ~2× the bytes actually received, so truncated/hostile streams still fail before large allocations — all assertions of `testStreamFillGrowsFromBufferedBytes` (added in #3748) are unchanged and still pass. - The growth policy now lives in one place: `ForyStreamReader` gains a `MAX_BUFFER_SIZE` constant (`Integer.MAX_VALUE - 8`, the largest array size commonly supported by JVMs) and a static `nextBufferSize(int)` helper, used by both stream readers. This also aligns the two readers' previously inconsistent max-size clamps. - Updates `docs/security/deserialization.md` (stream-backed fill buffer growth bullet), which previously prescribed the cap-to-target behavior; it now requires geometric growth and explains why cap-to-target is O(n²). Adds `StreamTest#testStreamBufferGrowthIsGeometric`, which reads 64 KB byte-by-byte through both `ForyInputStream` and `ForyReadableChannel` and asserts the backing buffer is reallocated ≤ 20 times (geometric ⇒ ~10). Without the fix the test fails within milliseconds: ``` StreamTest.testStreamBufferGrowthIsGeometric:389 stream buffer must grow geometrically, but already grew 21 times expected [true] but found [false] ``` With the fix, all 14 `StreamTest` cases pass, including `testStreamFillGrowsFromBufferedBytes` from #3748. ## Related issues Regression introduced by #3748 (first released in v1.2.0). ## AI Contribution Checklist - [x] Substantial AI assistance was used in this PR: `yes` - [x] I included a completed AI Contribution Checklist and the required `AI Usage Disclosure` below. - [x] Two fresh AI review agents ran on the current HEAD: one Fory-guided reviewer using `AGENTS.md` and `.agents/ci-and-pr.md`, and one independent general reviewer in a separate clean-context session not pointed at Fory-specific checklists. - [x] All AI review comments were addressed and the review loop repeated until both reviewers reported no further actionable comments. - [x] Review results from both reviewers are persisted in the collapsible sections below. - [x] Human verification was run and recorded (see `human_verification` below). - [x] Tests added (regression test that fails on the unfixed code). - [x] Performance impact validated with a benchmark (see Benchmark section). - [x] Licensing and provenance compliance verified. - [x] Line-by-line human self-review — the human contributor will confirm this on the PR after review. ```text AI Usage Disclosure - substantial_ai_assistance: yes - scope: design drafting | code drafting | tests | docs - affected_files_or_subsystems: java/fory-core stream readers (org.apache.fory.io.ForyInputStream, org.apache.fory.io.ForyReadableChannel), StreamTest, docs/security/deserialization.md - ai_review: line-by-line AI self-review completed. Round 1: Fory-guided reviewer reported 1 MAJOR (security doc still prescribed cap-to-target growth) + 2 MINOR (perf title/benchmark, direct-ByteBuffer test gap); independent reviewer approved with 4 MINOR (max-size clamp inconsistency, direct-ByteBuffer test gap, 2 informational). All actionable findings fixed. Round 2: both reviewers re-reviewed the updated diff and reported no further actionable comments. - ai_review_artifacts: full final review reports from both reviewers embedded in the collapsible sections of this PR description. - human_verification: StreamTest run locally on JDK 21 (14/14 pass, includes existing #3748 assertions); the new regression test verified to FAIL on unfixed code ("grew 21 times" assertion) and PASS with the fix; end-to-end benchmark on a 4.4 MB payload (see Benchmark). - performance_verification: benchmark table in this PR description (unfixed: did not finish in 60 s; fixed: 30.7 ms). - provenance_license_confirmation: Apache-2.0-compatible provenance confirmed; no incompatible third-party code introduced. ``` <details> <summary>Reviewer 1 (Fory-guided, AGENTS.md + .agents/ci-and-pr.md) — final result</summary> **Round 1 findings:** 1 MAJOR — `docs/security/deserialization.md` still prescribed the cap-to-target growth this PR removes (doc/code drift; risk that a future change "restores" the cap and reintroduces the O(n²) regression). 2 MINOR — use `perf(java):` Conventional-Commits type with benchmark data; direct-ByteBuffer channel growth path not exercised by the test. Verified as non-issues: security invariants preserved (fallback only doubles from a full buffer of real bytes, never from attacker-declared sizes; verified path allocates targetSize only after available()/channel size proves the bytes); #3748's pinned test assertions still hold; no integer overflow; test deterministic. **Round 2 (after fixes):** "No further actionable comments." Verified all four fixes on disk: doc now requires geometric growth and explains the cap-to-target O(n²) degeneration, matching the code; perf title + benchmark present; direct-buffer detection confirmed sound (`growBuffer` → `initByteBuffer` → `initOffHeapBuffer` reassigns the `offHeapBuffer` field each grow, and `getHeapMemory()` is null for direct buffers so the helper falls back correctly); channel clamp aligned to `Integer.MAX_VALUE - 8`. **Verdict: Approve** — "fix is correct, security-safe, and the whole surface (code, tests, docs, PR metadata) now agrees." </details> <details> <summary>Reviewer 2 (independent, clean context) — final result</summary> **Round 1 findings (all MINOR):** (1) `nextBufferSize` max-size clamp inconsistent between the two readers (`Integer.MAX_VALUE` vs `Integer.MAX_VALUE - 8`); (2) direct-ByteBuffer channel growth path not covered by the test; (3) informational: verified path may allocate up to ~2× target (standard geometric-growth trade-off, intended); (4) nit: temp-file creation outside try (consistent with surrounding tests). Verified as correct: no integer overflow, no false `growBuffer` guard trips, fix targets the real O(n²) path, test deterministic and genuinely pins the behavior. **Round 2 (after fixes):** Findings 2–4 fully addressed; verified `getOffHeapBuffer()` returns the stored field (not a duplicate) and each grow swaps it, so the direct-buffer identity check genuinely detects reallocation. One residual optional one-liner: align the channel's `targetSize` guard to `Integer.MAX_VALUE - 8L` — **applied in the final HEAD**. **Verdict: Approve** — "The fix is correct and well-tested across heap, heap-channel, and direct-channel paths." </details> ## Does this PR introduce any user-facing change? - [x] Does this PR introduce any public API change? — Additive only: `ForyStreamReader` gains a `MAX_BUFFER_SIZE` constant and a static `nextBufferSize(int)` helper (the shared growth policy); no existing signature changed. - [ ] Does this PR introduce any binary protocol compatibility change? — No. ## Benchmark Deserializing a 4.4 MB payload (a `HashMap<String, Long>` with 300k entries, i.e. many small reads) built with `Fory.builder().requireClassRegistration(false).withIntCompressed(true).withLongCompressed(true).withRefTracking(true)`, serialized via `fory.serialize(OutputStream, obj)`: | path | Fory 1.2.0 (unfixed) | this PR | |---|---|---| | `deserialize(new ForyInputStream(in))` | **did not finish within 60 s** (100% CPU, killed) | **30.7 ms** | | `deserialize(byte[])` (reference) | 25.6 ms | 24.2 ms | --------- Co-authored-by: Claude Fable 5 <[email protected]> Co-authored-by: chaokunyang <[email protected]> Report URL: https://github.com/apache/fory/actions/runs/28587544109 With regards, GitHub Actions via GitBox --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
