chaokunyang commented on code in PR #3734:
URL: https://github.com/apache/fory/pull/3734#discussion_r3418712793


##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,182 @@
+<!--
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Apache Fory — Threat Model (v0 draft)
+
+## §1 Header
+
+- **Project:** Apache Fory (`apache/fory`), `main`, against which this draft 
was written. Fory is a multi-language serialization framework (Java, C++, 
Python, Go, Rust, JavaScript, Kotlin, Scala, Swift, Dart, C#).
+- **Date:** 2026-06-02. **Status:** draft — for Apache Fory PMC review. 
**Author:** ASF Security team (drafted via the Scovetta threat-model rubric), 
for PMC ratification.
+- **Version binding:** versioned with the project; a report against Fory 
version *N* is triaged against the model as it stood at *N*, not at HEAD.
+- **Reporting cross-reference:** findings that violate a §8 property should be 
reported privately per the ASF process (`[email protected]` → 
`[email protected]`); findings under §3 or §9 are closed citing this 
document.
+- **Provenance legend:** *(documented)* = stated in Fory's own docs/repo; 
*(maintainer)* = confirmed by a Fory PMC member through this process; 
*(inferred)* = reasoned from architecture/domain knowledge, not yet confirmed — 
every *(inferred)* claim has a matching §14 open question.
+- **Draft confidence:** ~20 documented / 0 maintainer / ~26 inferred.
+- **What Fory is:** Apache Fory is a high-performance, multi-language 
object/data serialization framework. An application uses it in-process to 
serialize its objects to bytes and deserialize bytes back into objects, either 
within one language ("native" mode) or across languages ("xlang" mode), with 
optional zero-copy and a row format. *(documented — README, docs/guide)*
+
+## §2 Scope and intended use
+
+- **Primary use:** an **in-process library** linked into a host application 
that calls `serialize()` / `deserialize()` on its own data types. *(documented 
— guides)*
+- **It is not a network service or daemon.** It has no listening surface, no 
auth, no users — the embedding application owns where the bytes come from and 
go. *(inferred)*
+- **Caller / trust level:** a single caller — the embedding application — 
which is **trusted** (it links the library and registers its types). The 
security-relevant question is not "who calls Fory" but **"where do the bytes 
handed to `deserialize()` come from"** — trusted producer, or 
attacker-controlled. *(inferred; the registration guidance is documented)*
+
+**Component-family table** *(in/out of this model):*
+
+| Family | Entry point | Notes | In model? |
+| --- | --- | --- | --- |
+| Object-graph serialization (native, per language) | `fory.serialize` / 
`deserialize` | the core; instantiates registered types from bytes | **In** 
*(documented)* |
+| Cross-language (xlang) serialization | xlang `serialize`/`deserialize` | 
type mapping across languages | **In** *(documented)* |
+| Row format / zero-copy | row encoders | reads fields in place from a buffer 
| **In** *(documented)* |
+| Class/type registration + "secure mode" | `requireClassRegistration`, 
`register(...)` | the primary defense | **In** *(documented)* |
+| Per-language implementations | `java/`, `cpp/`, `python/`, `go/`, `rust/`, 
`javascript/`, `kotlin/`, `scala/`, `swift/`, `dart/`, `csharp/` | each is a 
separate impl of the same model | **In** — but memory-safety profile differs by 
language (see §5/§8) *(documented: dirs exist)* |
+| `examples/`, `benchmarks/`, `integration_tests/` | demo/bench/test | not 
production surface | **Out** *(see §3)* |
+
+## §3 Out of scope (explicit non-goals)
+
+- **The integrity / authenticity / confidentiality of the serialized bytes.** 
Fory deserializes what it is given; it does not authenticate, MAC, or encrypt 
payloads. If bytes can be tampered with in transit/at rest, that is the 
application's problem to solve (sign/encrypt before handing to Fory). 
*(inferred)*
+- **Anything when the caller disables class registration on an untrusted 
payload source.** `requireClassRegistration(false)` is a documented, 
deliberately-available footgun; using it against attacker-controlled bytes is 
out of the model's protection (see §5a/§9). *(documented — config: "Disabling 
may allow unknown classes to be deserialized, potentially causing security 
risks")*
+- **The behaviour of the application's own registered classes.** Fory 
instantiates and populates registered types; if a registered class has 
dangerous side effects in its constructors/setters/finalizers, that is the 
application's design, not Fory's. *(inferred)*
+- **`examples/`, `benchmarks/`, `integration_tests/`** — shipped but not a 
production trust surface. *(inferred)*
+
+## §4 Trust boundaries and data flow
+
+- **The trust boundary is the byte buffer passed to `deserialize()`** (and the 
row-format buffer). Everything Fory does on the serialize side operates on the 
application's own in-memory objects (trusted); the deserialize side is where 
attacker-controlled bytes, if any, enter. *(inferred)*
+- **Data flow:** untrusted bytes → format/header parse → (class id / type 
resolution → **registration check**) → field decode → object graph construction 
→ returned to caller. The registration check is the gate that decides whether 
an arbitrary type may be instantiated. *(inferred; registration mechanism 
documented)*
+- **Reachability precondition:** a deserialize-side finding is **in-model** 
only if it is reachable from the byte buffer under the **default secure 
configuration** (`requireClassRegistration(true)`). A finding that requires 
`requireClassRegistration(false)`, or that requires the *serialize* side to be 
fed attacker-controlled live objects, is out-of-model (§5a / trusted-input). 
*(inferred)*
+
+## §5 Assumptions about the environment
+
+- **In-process, no ambient I/O.** Fory does not (by design) open sockets, 
spawn processes, or read the network; it operates on in-memory buffers handed 
to it. *(inferred — high-priority confirmation; negative claim)*
+- **Per-language memory model differs.** In managed runtimes (Java, Python, 
Go, JS, …) memory safety is the runtime's; in the **C++** (and unsafe-Rust FFI) 
paths, malformed input reaching the decoder is a memory-safety surface in a way 
it is not on the JVM. The model's "memory safety on malformed input" property 
is therefore language-conditional (see §8). *(inferred)*
+- **Codegen / JIT:** on ordinary JVMs Fory generates serializer code at 
runtime (`codeGenEnabled` default true); disabled on Android / GraalVM native 
image. This is a performance mechanism over the application's own registered 
types, not a path for executing attacker bytes. *(documented — config table)*
+
+## §5a Build-time and configuration variants
+
+The security envelope is set by runtime configuration, not build flags. The 
load-bearing knobs *(documented — docs/guide/java/configuration.md)*:
+
+| Knob | Default | Effect on the model |
+| --- | --- | --- |
+| `requireClassRegistration` | **`true`** (secure) | When true, only 
registered types are deserialized — the primary defense against deserializing 
arbitrary/gadget classes. Disabling "may allow unknown classes to be 
deserialized, potentially causing security risks." |
+| `maxDepth` | **`50`** | Bounds deserialization recursion depth; "can be used 
to refuse deserialization DDOS attack." |
+| `deserializeUnknownClass` | `true` in compatible mode, else `false` | 
Whether data for unknown/non-existent classes is skipped/deserialized. |
+| `compatible` | xlang: `true`; native: `false` | Schema forward/backward 
compatibility. |
+| `suppressClassRegistrationWarnings` | `true` | Registration warnings are 
useful for security audit but suppressed by default. |
+
+**The default is the *secure* posture here** (registration required) — the 
inverse of the usual insecure-default case. The model's §8 properties hold 
*under the defaults*; a report that only manifests under 
`requireClassRegistration(false)` is `OUT-OF-MODEL: non-default-build`. Confirm 
this framing with the PMC (§14).
+
+## §6 Assumptions about inputs
+
+Per-entry-point trust table *(registration mechanism + defaults documented; 
trust framing inferred):*
+
+| Entry point | Input | Attacker-controllable? | Caller must enforce |
+| --- | --- | --- | --- |
+| `deserialize(bytes)` / `deserialize(bytes, Class)` | serialized byte buffer 
| **yes, if the application sources bytes from an untrusted producer** | keep 
`requireClassRegistration(true)`; register only safe types; integrity-check 
bytes upstream |
+| row-format readers | buffer | **yes** (same as above) | same |
+| `serialize(obj)` | a live application object | no — the app's own trusted 
object | n/a |
+| `register(Class, …)` | type registered at setup | no — controlled by the app 
developer | register only types safe to instantiate from untrusted data |
+
+- **Size/shape/rate:** `maxDepth` (default 50) bounds nesting; whether total 
allocation / output size is otherwise bounded against a hostile payload is open 
(see §8 resource line). *(maxDepth documented; broader bound inferred)*
+
+## §7 Adversary model
+
+- **Primary adversary:** a party who controls the **serialized bytes** an 
application later passes to `deserialize()` (e.g. data arriving over a network 
the app feeds to Fory, or persisted data an attacker can tamper with). Goal: 
instantiate dangerous types (gadget-chain RCE), corrupt memory in the native 
paths, or exhaust CPU/memory. *(inferred — the canonical 
serialization-framework adversary)*
+- **Capabilities:** can craft arbitrary/malformed byte buffers; cannot change 
the application's Fory configuration or its registered-type set (those are set 
by the trusted app at startup). *(inferred)*
+- **Out of scope:** an attacker who controls the embedding application, its 
configuration, or the objects passed to `serialize()` — already trusted; an 
attacker who has set `requireClassRegistration(false)` themselves. *(inferred)*
+
+## §8 Security properties the project provides
+
+*(Registration + depth defenses documented; the guarantees framed below are 
for PMC confirmation.)*
+
+- **Registered-type-only instantiation (default).** With 
`requireClassRegistration(true)` (the default), deserialization instantiates 
only types the application registered, so attacker bytes cannot drive Fory to 
construct an arbitrary class. *Violation symptom:* an unregistered/unexpected 
type is instantiated from input under the default config. *Severity:* 
security-critical (this is the deserialization-RCE defense). *(documented that 
registration is required by default + that disabling causes risk; the 
unbypassability guarantee is the claim to confirm)*
+- **Bounded recursion depth.** Deserialization beyond `maxDepth` (default 50) 
throws rather than recursing unbounded. *Violation symptom:* stack overflow / 
unbounded recursion from crafted nesting under the default. *Severity:* 
security-critical (DoS). *(documented — config table)*
+- **Memory safety on malformed input — language-conditional.** In 
managed-runtime implementations, malformed bytes yield an exception, not memory 
corruption. For the **C++** implementation this is the load-bearing property to 
confirm (malformed-input fuzzing of the C++ decoder). *Violation symptom:* OOB 
read/write, crash. *Severity:* security-critical. *(inferred — confirm per 
language)*
+- **Resource bounds beyond depth — UNSPECIFIED.** Whether a crafted payload 
can force large allocation / CPU blowup within the depth limit (e.g. huge 
declared collection sizes) is a bug or expected is open; the model needs a line 
(§14). *(inferred; maxDepth documented)*
+
+## §9 Security properties the project does *not* provide
+
+*(Highest-value section for integrators.)*
+
+- **No protection when class registration is disabled.** 
`requireClassRegistration(false)` deliberately allows deserializing unknown 
classes — using it on untrusted input re-opens the classic 
deserialization-gadget RCE surface. This is the caller's choice, documented as 
risky. *(documented — config)*
+- **No payload authentication or confidentiality.** Fory does not verify that 
bytes came from a trusted producer or that they are unmodified; it is not a 
MAC, signature, or cipher. *(inferred)* **False friend:** a successful 
round-trip / schema-compatibility check is *not* an integrity guarantee against 
a malicious producer.
+- **Not a sandbox for registered types.** Registering a class authorizes Fory 
to instantiate it from bytes; if that class's construction has side effects, 
Fory does not contain them. *(inferred)*
+- **Cross-language type-confusion is the integrator's concern** in xlang mode 
— relying on the peer to send a compatible schema is a trust assumption between 
the two ends, not something Fory enforces against a hostile peer. *(inferred)*
+- **Well-known classes left to the caller:** deserialization-gadget attacks 
(defended by registration, *if left on*), decompression/allocation bombs 
(partially bounded by `maxDepth`), and integrity attacks on the byte stream. 
*(inferred)*
+
+## §10 Downstream responsibilities (the embedding application)
+
+- **Keep `requireClassRegistration(true)`** whenever any deserialized bytes 
could be attacker-influenced (the documented production guidance). 
*(documented)*
+- **Register only types that are safe to instantiate from untrusted data**; do 
not register types with dangerous construction side effects. *(inferred)*
+- **Authenticate / integrity-check / decrypt** untrusted bytes *before* 
handing them to `deserialize()` — Fory will not. *(inferred)*
+- **Tune `maxDepth`** to the application's real object depth rather than 
disabling it. *(inferred)*
+- **In xlang mode, treat the peer's schema as a trust relationship** you 
control, not something Fory polices. *(inferred)*
+
+## §11 Known misuse patterns
+
+*(Draft one-liners — expand before publishing.)*
+
+- Setting `requireClassRegistration(false)` for convenience, then 
deserializing network/user data. *(documented as risky)*
+- Treating Fory deserialization of untrusted bytes as safe without 
integrity-checking the bytes first. *(inferred)*
+- Registering broad/dangerous types (or whole packages) to "make it work", 
widening the gadget surface. *(inferred)*
+- Assuming the C++ decoder is as forgiving of malformed input as the JVM one. 
*(inferred)*
+
+## §11a Known non-findings (recurring false positives)
+
+*(Seed list — confirmations here are the highest-leverage scan-suppression 
input.)*
+
+- "Fory can deserialize arbitrary classes → RCE" — **only** with 
`requireClassRegistration(false)`; under the default (`true`) it cannot. A 
report that assumes registration is off is `OUT-OF-MODEL: non-default-build` 
unless the PMC says otherwise. *(documented)*
+- "No signature/MAC/encryption on the serialized format" — by-design; 
integrity/confidentiality is the caller's (§9/§10). *(inferred)*
+- "Unbounded recursion on nested input" — bounded by `maxDepth` (default 50). 
*(documented)*
+- "Registered class X does something dangerous when constructed" — the 
application's registration choice (§3/§10), not a Fory bug. *(inferred)*
+- "Reflection / dynamic codegen used at runtime" — `codeGenEnabled` operates 
over the app's own registered types, not attacker bytes (§5). *(documented 
config; framing inferred)*
+
+## §12 Conditions that would change this model
+
+- A change to the **default** of `requireClassRegistration` or `maxDepth`. 
*(documented knobs)*
+- A new deserialization entry point or a new language implementation with a 
different memory-safety profile. *(inferred)*
+- Fory gaining any I/O / network surface (it would stop being a pure 
in-process library). *(inferred)*
+- A report that cannot be routed to a single §13 disposition → revise the 
model.
+
+## §13 Triage dispositions
+
+| Disposition | Meaning | Licensed by |
+| --- | --- | --- |
+| `VALID` | Violates a §8 property under the **default** config via 
attacker-controlled bytes (e.g. unregistered-type instantiation with 
registration on; unbounded recursion within maxDepth; C++ memory corruption on 
malformed input). | §8, §6, §7 |
+| `VALID-HARDENING` | No §8 property broken, but a §11 misuse is easy enough 
to harden (e.g. a safer default, a louder warning). | §11 |
+| `OUT-OF-MODEL: trusted-input` | Requires attacker control of the 
serialize-side objects, the registered-type set, or the Fory config. | §6, §7 |
+| `OUT-OF-MODEL: non-default-build` | Only manifests with 
`requireClassRegistration(false)` or another discouraged §5a setting. | §5a |
+| `OUT-OF-MODEL: unsupported-component` | Lands in `examples/`, `benchmarks/`, 
`integration_tests/`. | §3 |
+| `BY-DESIGN: property-disclaimed` | Concerns a §9-disclaimed property (no 
payload auth/encryption, not a sandbox for registered types, xlang peer trust). 
| §9 |
+| `KNOWN-NON-FINDING` | Matches a §11a entry. | §11a |
+| `MODEL-GAP` | Cannot be cleanly routed — triggers a §12 revision. | §12 |
+
+## §14 Open questions for the maintainers
+
+**Wave 1 — scope & the registration framing:**
+
+1. Confirm Fory is modeled as an **in-process library** with no ambient I/O 
(no sockets/processes/network) — the negative-side-effects inventory in §5. 
Proposed: yes. → §2/§5.
+2. **The core ruling:** with `requireClassRegistration(true)` (default), is 
"only registered types are instantiated from untrusted bytes" a property Fory 
**commits to** (so a bypass is `VALID`/security-critical)? And is a finding 
that requires `requireClassRegistration(false)` correctly `OUT-OF-MODEL: 
non-default-build`? Proposed: yes to both. → §8/§5a/§13.
+3. Confirm `examples/`/`benchmarks/`/`integration_tests/` are out of scope. → 
§3.
+
+**Wave 2 — language profiles & inputs:**
+
+4. **Per-language memory safety:** for which implementations does Fory claim 
"malformed input → clean error, not memory corruption"? Is the **C++** decoder 
the primary memory-safety surface to fuzz, and does it carry the same 
guarantee? → §5/§8.
+5. Beyond `maxDepth`, are there bounds on total allocation / declared 
collection sizes / output size against a hostile payload, or is that explicitly 
the caller's concern? Where is the resource/DoS line? → §8/§11a.
+6. In **xlang** mode, what does Fory assume about the peer — is a 
hostile/malformed peer schema in scope, or is the peer a trusted endpoint? 
Proposed: peer trusted; type-confusion is the integrator's concern. → §7/§9.
+
+**Wave 3 — disclaimers & non-findings:**
+
+7. Confirm Fory disclaims payload integrity/authenticity/confidentiality (no 
MAC/sig/encryption) and is not a sandbox for registered types' own logic. → §9.
+8. Any other recurring scanner/fuzzer false positives the PMC already knows 
about, to seed §11a (e.g. reflection/Unsafe usage, codegen)? → §11a.
+9. **Meta:** Fory has no in-repo `SECURITY.md` and an `AGENTS.md` that is a 
developer/agent guide. This engagement adds `SECURITY.md` + `THREAT_MODEL.md` 
and wires `AGENTS.md → SECURITY.md → THREAT_MODEL.md`. Confirm the model should 
live in-repo (as proposed) vs. on the website, and who owns revisions. The 
existing config-guide "Security" section becomes a pointer to this model. → §1.

Review Comment:
   We have 
https://github.com/apache/fory/blob/main/docs/security/deserialization.md now, 
could the THREAD_MODEL.md be moved into docs/security/threat-model.md ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to