chaokunyang commented on code in PR #3734:
URL: https://github.com/apache/fory/pull/3734#discussion_r3418720810


##########
THREAT_MODEL.md:
##########
@@ -0,0 +1,182 @@
+<!--
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    https://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Apache Fory — Threat Model (v0 draft)
+
+## §1 Header
+
+- **Project:** Apache Fory (`apache/fory`), `main`, against which this draft 
was written. Fory is a multi-language serialization framework (Java, C++, 
Python, Go, Rust, JavaScript, Kotlin, Scala, Swift, Dart, C#).
+- **Date:** 2026-06-02. **Status:** draft — for Apache Fory PMC review. 
**Author:** ASF Security team (drafted via the Scovetta threat-model rubric), 
for PMC ratification.
+- **Version binding:** versioned with the project; a report against Fory 
version *N* is triaged against the model as it stood at *N*, not at HEAD.
+- **Reporting cross-reference:** findings that violate a §8 property should be 
reported privately per the ASF process (`[email protected]` → 
`[email protected]`); findings under §3 or §9 are closed citing this 
document.
+- **Provenance legend:** *(documented)* = stated in Fory's own docs/repo; 
*(maintainer)* = confirmed by a Fory PMC member through this process; 
*(inferred)* = reasoned from architecture/domain knowledge, not yet confirmed — 
every *(inferred)* claim has a matching §14 open question.
+- **Draft confidence:** ~20 documented / 0 maintainer / ~26 inferred.
+- **What Fory is:** Apache Fory is a high-performance, multi-language 
object/data serialization framework. An application uses it in-process to 
serialize its objects to bytes and deserialize bytes back into objects, either 
within one language ("native" mode) or across languages ("xlang" mode), with 
optional zero-copy and a row format. *(documented — README, docs/guide)*
+
+## §2 Scope and intended use
+
+- **Primary use:** an **in-process library** linked into a host application 
that calls `serialize()` / `deserialize()` on its own data types. *(documented 
— guides)*
+- **It is not a network service or daemon.** It has no listening surface, no 
auth, no users — the embedding application owns where the bytes come from and 
go. *(inferred)*
+- **Caller / trust level:** a single caller — the embedding application — 
which is **trusted** (it links the library and registers its types). The 
security-relevant question is not "who calls Fory" but **"where do the bytes 
handed to `deserialize()` come from"** — trusted producer, or 
attacker-controlled. *(inferred; the registration guidance is documented)*
+
+**Component-family table** *(in/out of this model):*
+
+| Family | Entry point | Notes | In model? |
+| --- | --- | --- | --- |
+| Object-graph serialization (native, per language) | `fory.serialize` / 
`deserialize` | the core; instantiates registered types from bytes | **In** 
*(documented)* |
+| Cross-language (xlang) serialization | xlang `serialize`/`deserialize` | 
type mapping across languages | **In** *(documented)* |
+| Row format / zero-copy | row encoders | reads fields in place from a buffer 
| **In** *(documented)* |
+| Class/type registration + "secure mode" | `requireClassRegistration`, 
`register(...)` | the primary defense | **In** *(documented)* |
+| Per-language implementations | `java/`, `cpp/`, `python/`, `go/`, `rust/`, 
`javascript/`, `kotlin/`, `scala/`, `swift/`, `dart/`, `csharp/` | each is a 
separate impl of the same model | **In** — but memory-safety profile differs by 
language (see §5/§8) *(documented: dirs exist)* |
+| `examples/`, `benchmarks/`, `integration_tests/` | demo/bench/test | not 
production surface | **Out** *(see §3)* |
+
+## §3 Out of scope (explicit non-goals)
+
+- **The integrity / authenticity / confidentiality of the serialized bytes.** 
Fory deserializes what it is given; it does not authenticate, MAC, or encrypt 
payloads. If bytes can be tampered with in transit/at rest, that is the 
application's problem to solve (sign/encrypt before handing to Fory). 
*(inferred)*
+- **Anything when the caller disables class registration on an untrusted 
payload source.** `requireClassRegistration(false)` is a documented, 
deliberately-available footgun; using it against attacker-controlled bytes is 
out of the model's protection (see §5a/§9). *(documented — config: "Disabling 
may allow unknown classes to be deserialized, potentially causing security 
risks")*
+- **The behaviour of the application's own registered classes.** Fory 
instantiates and populates registered types; if a registered class has 
dangerous side effects in its constructors/setters/finalizers, that is the 
application's design, not Fory's. *(inferred)*
+- **`examples/`, `benchmarks/`, `integration_tests/`** — shipped but not a 
production trust surface. *(inferred)*
+
+## §4 Trust boundaries and data flow
+
+- **The trust boundary is the byte buffer passed to `deserialize()`** (and the 
row-format buffer). Everything Fory does on the serialize side operates on the 
application's own in-memory objects (trusted); the deserialize side is where 
attacker-controlled bytes, if any, enter. *(inferred)*
+- **Data flow:** untrusted bytes → format/header parse → (class id / type 
resolution → **registration check**) → field decode → object graph construction 
→ returned to caller. The registration check is the gate that decides whether 
an arbitrary type may be instantiated. *(inferred; registration mechanism 
documented)*
+- **Reachability precondition:** a deserialize-side finding is **in-model** 
only if it is reachable from the byte buffer under the **default secure 
configuration** (`requireClassRegistration(true)`). A finding that requires 
`requireClassRegistration(false)`, or that requires the *serialize* side to be 
fed attacker-controlled live objects, is out-of-model (§5a / trusted-input). 
*(inferred)*
+
+## §5 Assumptions about the environment
+
+- **In-process, no ambient I/O.** Fory does not (by design) open sockets, 
spawn processes, or read the network; it operates on in-memory buffers handed 
to it. *(inferred — high-priority confirmation; negative claim)*
+- **Per-language memory model differs.** In managed runtimes (Java, Python, 
Go, JS, …) memory safety is the runtime's; in the **C++** (and unsafe-Rust FFI) 
paths, malformed input reaching the decoder is a memory-safety surface in a way 
it is not on the JVM. The model's "memory safety on malformed input" property 
is therefore language-conditional (see §8). *(inferred)*
+- **Codegen / JIT:** on ordinary JVMs Fory generates serializer code at 
runtime (`codeGenEnabled` default true); disabled on Android / GraalVM native 
image. This is a performance mechanism over the application's own registered 
types, not a path for executing attacker bytes. *(documented — config table)*
+
+## §5a Build-time and configuration variants
+
+The security envelope is set by runtime configuration, not build flags. The 
load-bearing knobs *(documented — docs/guide/java/configuration.md)*:
+
+| Knob | Default | Effect on the model |
+| --- | --- | --- |
+| `requireClassRegistration` | **`true`** (secure) | When true, only 
registered types are deserialized — the primary defense against deserializing 
arbitrary/gadget classes. Disabling "may allow unknown classes to be 
deserialized, potentially causing security risks." |
+| `maxDepth` | **`50`** | Bounds deserialization recursion depth; "can be used 
to refuse deserialization DDOS attack." |
+| `deserializeUnknownClass` | `true` in compatible mode, else `false` | 
Whether data for unknown/non-existent classes is skipped/deserialized. |
+| `compatible` | xlang: `true`; native: `false` | Schema forward/backward 
compatibility. |
+| `suppressClassRegistrationWarnings` | `true` | Registration warnings are 
useful for security audit but suppressed by default. |
+
+**The default is the *secure* posture here** (registration required) — the 
inverse of the usual insecure-default case. The model's §8 properties hold 
*under the defaults*; a report that only manifests under 
`requireClassRegistration(false)` is `OUT-OF-MODEL: non-default-build`. Confirm 
this framing with the PMC (§14).
+
+## §6 Assumptions about inputs
+
+Per-entry-point trust table *(registration mechanism + defaults documented; 
trust framing inferred):*
+
+| Entry point | Input | Attacker-controllable? | Caller must enforce |
+| --- | --- | --- | --- |
+| `deserialize(bytes)` / `deserialize(bytes, Class)` | serialized byte buffer 
| **yes, if the application sources bytes from an untrusted producer** | keep 
`requireClassRegistration(true)`; register only safe types; integrity-check 
bytes upstream |
+| row-format readers | buffer | **yes** (same as above) | same |
+| `serialize(obj)` | a live application object | no — the app's own trusted 
object | n/a |
+| `register(Class, …)` | type registered at setup | no — controlled by the app 
developer | register only types safe to instantiate from untrusted data |
+
+- **Size/shape/rate:** `maxDepth` (default 50) bounds nesting; whether total 
allocation / output size is otherwise bounded against a hostile payload is open 
(see §8 resource line). *(maxDepth documented; broader bound inferred)*
+
+## §7 Adversary model
+
+- **Primary adversary:** a party who controls the **serialized bytes** an 
application later passes to `deserialize()` (e.g. data arriving over a network 
the app feeds to Fory, or persisted data an attacker can tamper with). Goal: 
instantiate dangerous types (gadget-chain RCE), corrupt memory in the native 
paths, or exhaust CPU/memory. *(inferred — the canonical 
serialization-framework adversary)*
+- **Capabilities:** can craft arbitrary/malformed byte buffers; cannot change 
the application's Fory configuration or its registered-type set (those are set 
by the trusted app at startup). *(inferred)*
+- **Out of scope:** an attacker who controls the embedding application, its 
configuration, or the objects passed to `serialize()` — already trusted; an 
attacker who has set `requireClassRegistration(false)` themselves. *(inferred)*
+
+## §8 Security properties the project provides
+
+*(Registration + depth defenses documented; the guarantees framed below are 
for PMC confirmation.)*
+
+- **Registered-type-only instantiation (default).** With 
`requireClassRegistration(true)` (the default), deserialization instantiates 
only types the application registered, so attacker bytes cannot drive Fory to 
construct an arbitrary class. *Violation symptom:* an unregistered/unexpected 
type is instantiated from input under the default config. *Severity:* 
security-critical (this is the deserialization-RCE defense). *(documented that 
registration is required by default + that disabling causes risk; the 
unbypassability guarantee is the claim to confirm)*
+- **Bounded recursion depth.** Deserialization beyond `maxDepth` (default 50) 
throws rather than recursing unbounded. *Violation symptom:* stack overflow / 
unbounded recursion from crafted nesting under the default. *Severity:* 
security-critical (DoS). *(documented — config table)*
+- **Memory safety on malformed input — language-conditional.** In 
managed-runtime implementations, malformed bytes yield an exception, not memory 
corruption. For the **C++** implementation this is the load-bearing property to 
confirm (malformed-input fuzzing of the C++ decoder). *Violation symptom:* OOB 
read/write, crash. *Severity:* security-critical. *(inferred — confirm per 
language)*
+- **Resource bounds beyond depth — UNSPECIFIED.** Whether it is a bug or 
expected behaviour that a crafted payload can force large allocation / CPU 
blowup within the depth limit (e.g. huge declared collection sizes) is open; 
the model needs a line (§14). *(inferred; maxDepth documented)*

Review Comment:
   This is stale against the current security doc. 
`docs/security/deserialization.md` now draws the resource line: no 
disproportionate allocation before bytes are supplied or proven readable, no 
stream buffer growth to attacker-declared sizes before exact read/skip, and 
proportional checks before collection preallocation. I would replace this open 
question with a link to that model; otherwise, future triage will treat an 
already-settled rule as undefined.
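
   To make the third rule concrete, here is a minimal sketch of a proportional check before collection preallocation. It assumes a hypothetical wire layout (a 4-byte element count followed by 4-byte ints), and the class and method names are invented for illustration. It is not Fory's decoder, only the shape of the check the security doc describes: compare the attacker-declared size with the bytes actually available before allocating anything sized by it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Fory: illustrates a proportional
// check before preallocating a collection from a declared size.
public final class BoundedCollectionRead {
    private BoundedCollectionRead() {}

    // Assumed layout: 4-byte element count, then that many 4-byte ints.
    public static List<Integer> readIntList(ByteBuffer in) {
        if (in.remaining() < Integer.BYTES) {
            throw new IllegalArgumentException("truncated length prefix");
        }
        int declared = in.getInt();
        if (declared < 0) {
            throw new IllegalArgumentException("negative element count");
        }
        // Compute in long so a huge declared count cannot overflow int.
        long needed = (long) declared * Integer.BYTES;
        if (needed > in.remaining()) {
            // Reject before allocating: the payload cannot back its claim.
            throw new IllegalArgumentException(
                "declared " + declared + " elements but only "
                    + in.remaining() + " bytes remain");
        }
        // Safe to preallocate: capacity is bounded by bytes already present.
        List<Integer> out = new ArrayList<>(declared);
        for (int i = 0; i < declared; i++) {
            out.add(in.getInt());
        }
        return out;
    }
}
```

   With a check of this shape, a 4-byte payload declaring `Integer.MAX_VALUE` elements is rejected before any list is allocated, so allocation stays proportional to the bytes the sender actually supplied.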



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


