This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory-site.git

commit 316baa51723e37621168f423aad83c37fdd85623
Author: chaokunyang <[email protected]>
AuthorDate: Fri Jun 5 11:00:25 2026 +0000

    🔄 synced local 'docs/specification/' with remote 'docs/specification/'
---
 docs/specification/java_serialization_spec.md    | 10 ++---
 docs/specification/xlang_implementation_guide.md | 53 ++++++++++++------------
 docs/specification/xlang_serialization_spec.md   | 50 +++++++++++-----------
 docs/specification/xlang_type_mapping.md         |  8 ++--
 4 files changed, 60 insertions(+), 61 deletions(-)

diff --git a/docs/specification/java_serialization_spec.md b/docs/specification/java_serialization_spec.md
index 2cd160ebaf..609bae7001 100644
--- a/docs/specification/java_serialization_spec.md
+++ b/docs/specification/java_serialization_spec.md
@@ -50,7 +50,7 @@ root:
 | reference flag | [type metadata] | [value payload] |
 ```

-All multi-byte fixed-width values are little endian. A big-endian Java runtime
+All multi-byte fixed-width values are little endian. A big-endian Java implementation
 must still write and read little-endian payloads.

 The stream is stateful. Type metadata, class definitions, and object references
@@ -113,7 +113,7 @@ name metadata. Schema-evolution classes may carry a ClassDef.
 | `22..63` | Reserved in Java native mode for the xlang internal ID range. |
 | `64..68` | Reserved for future Java native internal IDs. |
 | `69..98` | Java native built-ins listed below. |
-| `99+` | User and runtime class IDs assigned by the Java `ClassResolver`. |
+| `99+` | User and Fory class IDs assigned by the Java `ClassResolver`. |

 The shared scalar IDs are:

@@ -207,9 +207,9 @@ Indexes are assigned in first-use order.

 Java native mode has two object schema modes.

-### Schema-Consistent Mode
+### Same-Schema Mode

-Schema-consistent mode is used when compatible mode is disabled. The writer and
+Same-schema mode is used when compatible mode is disabled. The writer and
 reader must have matching fields and field order.
 No per-object ClassDef is
 required for ordinary registered classes. Field values are written directly in
 protocol order.
@@ -225,7 +225,7 @@ In compatible mode, a matched field may read between direct top-level scalar
 ClassDef schemas when the remote value can be represented by the local scalar
 schema without changing the logical value. This is a read adaptation only:
 writers keep emitting their local canonical field schema and payload, and
-ClassDef metadata, schema-consistent mode, dynamic value serialization, and
+ClassDef metadata, same-schema mode, dynamic value serialization, and
 unknown-field skipping continue to treat the original field schemas as distinct.

 The rule applies only to the immediate schema of a matched field. It does not
diff --git a/docs/specification/xlang_implementation_guide.md b/docs/specification/xlang_implementation_guide.md
index c54eb7687e..60ed45ef63 100644
--- a/docs/specification/xlang_implementation_guide.md
+++ b/docs/specification/xlang_implementation_guide.md
@@ -21,22 +21,22 @@ license: |

 ## Overview

-This guide describes the current xlang runtime ownership model used by the
-reference Java runtime and mirrored by the Dart runtime rewrite.
+This guide describes the current xlang implementation ownership model used by
+the reference Java implementation and mirrored by the Dart implementation rewrite.

 The wire format is defined by
 [Xlang Serialization Spec](xlang_serialization_spec.md). This document is about
-service boundaries, operation flow, and internal ownership. New runtimes do not
+service boundaries, operation flow, and internal ownership. New language implementations do not
 need the same class names, but they should preserve the same control flow:

-- root operations stay on the runtime facade
+- root operations stay on the `Fory` facade
 - nested payload work stays on explicit read and write contexts
 - type metadata stays in the type resolver layer
 - serializers stay payload-focused

 When this guide conflicts with the wire-format specification, follow
 `docs/specification/xlang_serialization_spec.md`. When it conflicts with a
-runtime-specific implementation detail, follow the current runtime code for
+language-specific implementation detail, follow the current implementation code for
 that language.

 ## Source Of Truth
@@ -44,10 +44,10 @@ that language.
 Use these sources in this order:

 1. `docs/specification/xlang_serialization_spec.md`
-2. the current runtime implementation for the language
+2. the current implementation for the language
 3. cross-language tests under `integration_tests/`

-For Dart, the runtime shape is centered on:
+For Dart, the implementation shape is centered on:

 - `Fory`
 - `WriteContext`
@@ -57,20 +57,20 @@ For Dart, the runtime shape is centered on:
 - `TypeResolver`
 - `StructSerializer`

-## Runtime Ownership Model
+## Implementation Ownership Model

 ### `Fory` is the root-operation facade

-`Fory` owns the reusable runtime services for one runtime instance.
+`Fory` owns the reusable services for one Fory instance.

-In Dart, `Fory` owns exactly four runtime members:
+In Dart, `Fory` owns exactly four reusable members:

 - `Buffer`
 - `WriteContext`
 - `ReadContext`
 - `TypeResolver`

-In Java, `Fory` also owns runtime-local services such as `JITContext` and
+In Java, `Fory` also owns instance-local services such as `JITContext` and
 `CopyContext`, but the ownership rule is the same: `Fory` is the root facade,
 not the place where nested serializers do their work.

@@ -104,7 +104,7 @@ That operation-local state includes:
 - logical object-graph depth

 Generated and hand-written serializers should treat these contexts as the only
-source of operation-local services. Serializers must not keep ambient runtime
+source of operation-local services. Serializers must not keep ambient instance
 state in thread locals, globals, or serializer instance fields.

 ### `WriteContext`
@@ -312,7 +312,7 @@ In Dart that internal owner is `StructSerializer`.
 When `Config.compatible` is enabled and the struct is marked evolving:

 - the wire type uses the compatible struct form
-- the runtime writes shared TypeDef metadata
+- the writer emits shared TypeDef metadata
 - reads map incoming fields by identifier and skip unknown fields
 - generated serializers apply matched fields directly while preserving their
   own object construction and default-value rules
@@ -322,7 +322,7 @@ When `Config.compatible` is enabled and the struct is marked evolving:

 When `compatible` is disabled and `checkStructVersion` is enabled:

-- the runtime writes the schema hash for struct payloads
+- the writer emits the schema hash for struct payloads
 - the read side checks that hash before reading fields

 Compatible scalar conversion is owned by the compatible struct field reader or
@@ -335,8 +335,7 @@ layout. Layout classification must reject top-level scalar conversions when
 either matched schema has `trackingRef = true` and must reject same scalar type
 pairs whose top-level `trackingRef` framing differs; converters must not add a
 reference-table path for scalar mismatches. Same-schema readers with matching
-reference and null/optional framing and schema-consistent readers must keep
-direct scalar read paths without conversion branches or per-field conversion
+reference and null/optional framing must keep direct scalar read paths without conversion branches or per-field conversion
 objects.
 Same raw scalar types with different null/optional framing may still
 use the compatible nullable/optional composition path when both fields are not
 reference-tracked.
@@ -366,7 +365,7 @@ In Java:
 - `@ForyEnumId` can override that with a stable explicit tag
 - `serializeEnumByName(true)` affects native Java mode, not xlang mode

-Other runtimes should preserve the same wire rule even if the configuration or
+Other language implementations should preserve the same wire rule even if the configuration or
 annotation surface differs.

 ## Out-Of-Band Buffer Objects
@@ -376,7 +375,7 @@ Buffer-object handling follows the same split:
 - one root bit advertises whether out-of-band buffers are in play
 - nested buffer-object payloads still decide in-band vs out-of-band one value
   at a time
-- serializers use read/write context helpers rather than bypassing the runtime
+- serializers use read/write context helpers rather than bypassing the context layer

 ## Code Generation

@@ -396,9 +395,9 @@ Generated code should emit:
 - a public per-library registration helper that users call from application code
 - private generated installation helpers that keep serializer factories private

-The public helper should be a thin generated wrapper around the runtime
-registration API, not a public global registry or a second unrelated runtime API
-family.
+The public helper should be a thin generated wrapper around the Fory
+registration API, not a public global registry or a second unrelated
+registration API family.

 ## Directory Layout

@@ -414,19 +413,19 @@ Not allowed:

 - `lib/src/<area>/<subarea>/<file>.dart`

-## Serializer Design Rules For New Runtimes
+## Serializer Design Rules For New Implementations

-Any new xlang runtime should follow these rules even if its surface API looks
+Any new xlang implementation should follow these rules even if its surface API looks
 different:

-1. Keep root operations on the runtime facade and nested payload work on
+1. Keep root operations on the `Fory` facade and nested payload work on
    explicit read and write contexts.
 2. Keep reference tracking behind dedicated read-side and write-side services so
    the disabled path stays cheap.
 3. Make serializers payload-only. Type metadata, registration, and root
-   framing belong to the runtime and type resolver layers.
+   framing belong to the `Fory` and type resolver layers.
 4. Track per-operation state explicitly. Do not rely on ambient thread-local
-   runtime state.
+   instance state.
 5. Reserve read reference IDs before materializing new objects, and bind
    partially built objects as soon as a nested child may refer back to them.
 6. Keep operation setup and operation cleanup separate. `prepare(...)` binds
@@ -442,7 +441,7 @@ different:

 ## Validation

-For Dart runtime changes, run at minimum:
+For Dart implementation changes, run at minimum:

 ```bash
 cd dart
diff --git a/docs/specification/xlang_serialization_spec.md b/docs/specification/xlang_serialization_spec.md
index 6328b43a41..27b8e19ee4 100644
--- a/docs/specification/xlang_serialization_spec.md
+++ b/docs/specification/xlang_serialization_spec.md
@@ -29,7 +29,7 @@ Key characteristics:

 - **Cross-language**: Same binary format works across Java, Python, C++, Go, Rust, JavaScript/TypeScript, C#, Swift, Dart, Scala, and Kotlin
 - **Reference-aware**: Handles shared references and circular references without duplication or infinite recursion
-- **Polymorphic**: Supports object polymorphism with runtime type resolution
+- **Polymorphic**: Supports object polymorphism with concrete type resolution

 This specification defines the Fory xlang binary format. The format is dynamic rather than static, which enables
 flexibility and ease of use at the cost of additional complexity in the wire format.
@@ -101,7 +101,7 @@ Note:
 ### Polymorphisms

 For polymorphism, if one non-final class is registered, and only one subclass is registered, then we can take all
-elements in List/Map have same type, thus reduce runtime check cost.
+elements in List/Map have same type, thus reduce per-element type checks.

 Collection/Array polymorphism are not fully supported, since some languages such as golang have only one collection
 type. If users want to get exactly the type he passed, he must pass that type when deserializing or annotate that type
@@ -190,7 +190,7 @@ encodings in the same signedness and width domain match the corresponding dense
 array element domain. This is a read adaptation, not a schema-kind merge:
 writers keep emitting their local canonical `list<T>` or `array<T>` payload, and
 TypeDef/ClassDef encodings, fingerprints, dynamic root serialization,
-schema-consistent mode, and unknown-field skipping continue to treat `list<T>`
+same-schema mode, and unknown-field skipping continue to treat `list<T>`
 and `array<T>` as distinct kinds.

 The adaptation is limited to the immediate schema of the matched compatible
@@ -210,7 +210,7 @@ between direct top-level scalar schemas when the remote value can be represented
 by the local scalar schema without changing the logical value. This is a
 compatible read adaptation only: writers keep emitting their local canonical
 schema and payload, and TypeDef/ClassDef encodings, fingerprints, dynamic root
-serialization, schema-consistent mode, unknown-field skipping, and container
+serialization, same-schema mode, unknown-field skipping, and container
 element schemas continue to treat the original scalar types as distinct.

 The scalar conversion rule applies only to the immediate schema of the matched
@@ -318,16 +318,16 @@ matched scalar pairs whose top-level field schemas have `trackingRef = false`.
 Readers first consume the remote null/optional framing described by the remote
 field metadata.
 If a value is present, the reader converts the unwrapped scalar
 value and then assigns or wraps it into the local carrier. If the remote value
-is null or absent, the runtime uses the same missing/null compatible-field rule
+is null or absent, the reader uses the same missing/null compatible-field rule
 it already applies for that local field; this feature does not introduce a
 second null policy. Reference-tracked scalar conversion is not supported.

 Conversion failures are data errors, not schema misses. A schema pair outside
 the conversion matrix remains a schema/type compatibility error when building
 the compatible layout. Once a matched field is accepted as a scalar conversion
-action, an invalid payload value MUST be reported through the runtime's
-data-error owner with enough context to identify the remote type, local type,
-and field when that owner has the information.
+action, an invalid payload value MUST be reported through the implementation's
+data-error path with enough context to identify the remote type, local type, and
+field when that path has the information.

 Users can also provide meta hints for fields of a type, or the type whole.
 Here is an example in java which use annotation to provide such information.
@@ -394,7 +394,7 @@ Named types (`NAMED_*`) do not embed a user ID; their names are carried in metad
 | 24 | MAP | Key-value mapping |
 | 25 | ENUM | Enum registered by numeric ID |
 | 26 | NAMED_ENUM | Enum registered by namespace + type name |
-| 27 | STRUCT | Struct registered by numeric ID (schema consistent) |
+| 27 | STRUCT | Struct registered by numeric ID (same-schema) |
 | 28 | COMPATIBLE_STRUCT | Struct with schema evolution support (by ID) |
 | 29 | NAMED_STRUCT | Struct registered by namespace + type name |
 | 30 | NAMED_COMPATIBLE_STRUCT | Struct with schema evolution (by name) |
@@ -1299,12 +1299,12 @@ The elements header is a single byte that encodes metadata about the collection
 | reserved | is_same_type| is_decl_elem_type| has_null | track_ref |
 ```

-| Bit | Name              | Value | Meaning when SET (1)                    | Meaning when UNSET (0)                  |
-| --- | ----------------- | ----- | --------------------------------------- | --------------------------------------- |
-| 0   | track_ref         | 0x01  | Track references for elements           | Don't track element references          |
-| 1   | has_null          | 0x02  | Payload contains null element markers   | No null elements (skip null checks)     |
-| 2   | is_decl_elem_type | 0x04  | Elements are the declared generic type  | Element types differ from declared type |
-| 3   | is_same_type      | 0x08  | All elements have the same runtime type | Elements have different runtime types   |
+| Bit | Name              | Value | Meaning when SET (1)                     | Meaning when UNSET (0)                  |
+| --- | ----------------- | ----- | ---------------------------------------- | --------------------------------------- |
+| 0   | track_ref         | 0x01  | Track references for elements            | Don't track element references          |
+| 1   | has_null          | 0x02  | Payload contains null element markers    | No null elements (skip null checks)     |
+| 2   | is_decl_elem_type | 0x04  | Elements are the declared generic type   | Element types differ from declared type |
+| 3   | is_same_type      | 0x08  | All elements have the same concrete type | Elements have different concrete types  |

 **Common header values:**

@@ -1377,7 +1377,7 @@ can be taken as an example.
 Primitive array are taken as a binary buffer, serialization will just write the length of array size as an unsigned int,
 then copy the whole buffer into the stream.
 Multi-byte element arrays are always encoded in little-endian element order;
-runtimes whose native typed-array storage uses another byte order must swap or write elements explicitly instead of
+implementations whose native typed-array storage uses another byte order must swap or write elements explicitly instead of
 copying native storage bytes unchanged.

 Such serialization won't compress the array. If users want to compress primitive array, users need to register custom
@@ -1506,7 +1506,7 @@ Date represents a date without timezone. It is encoded as:
 - `days` (varint64): signed count of days since the Unix epoch (`1970-01-01`)

 The value is reconstructed as `LocalDate.ofEpochDay(days)` or the equivalent calendar-date constructor in
-the target runtime.
+the target language implementation.

 This `varint64` encoding applies to xlang serialization only. Native, language-specific local-date encodings are
 unchanged.
@@ -1654,7 +1654,7 @@ reachable only in invalid schemas (e.g., duplicate tag IDs).
 - The compressed numeric rule is critical for cross-language consistency: compressed integer fields are always placed
   after all fixed-width integer fields.

-#### Schema consistent (meta share disabled)
+#### Same-schema mode (meta share disabled)

 Object value layout:

@@ -1680,7 +1680,7 @@ value; polymorphic fields include type meta.

 #### Compatible mode (meta share enabled)

-The field value layout is the same as schema-consistent mode, but the type meta for
+The field value layout is the same as same-schema mode, but the type meta for
 `COMPATIBLE_STRUCT` and `NAMED_COMPATIBLE_STRUCT` uses shared TypeDef entries.
 Deserializers use TypeDef to map fields by name or tag ID and to honor
 nullable/ref flags from metadata; unknown fields are skipped.
@@ -1702,7 +1702,7 @@ union Contact [id=0] {
 Rules:

 - A union schema MUST declare at least one schema-defined alternative. The
-  unknown-case carrier used by some language bindings is runtime-owned and is
+  unknown-case carrier used by some language bindings is implementation-provided and is
   omitted from the schema's alternative table.
 - Each union alternative MUST have a stable non-negative tag number (`= 0`, `= 1`, ...).
 - Tag numbers MUST be unique within the union and MUST NOT be reused.
@@ -1751,7 +1751,7 @@ This is required even for primitives so unknown alternatives can be skipped safe
 If a reader sees a `case_id` that is not present in its local union schema, it
 SHOULD preserve the unknown case when the target language has a
 language-neutral carrier for it. Such a carrier MUST expose the original case
-ID and decoded value, and it MUST retain only runtime-internal wire type ID
+ID and decoded value, and it MUST retain only implementation-internal wire type ID
 state needed for reserialization. It MUST NOT store resolver-owned type
 metadata or other context-owned state. Writers MUST use the stored original
 case ID for the union envelope, not any generated carrier marker. Unknown-case
@@ -1759,10 +1759,10 @@ payload writers MUST emit the Any-style payload body in wire order: ref metadata
 first, then full value type metadata, then value bytes. For internal numeric
 type IDs, the type ID byte is the complete value type metadata and the payload
 writer MAY use the stored wire type ID to preserve fixed, variable, or
-tagged integer encodings when the decoded value has the expected runtime type.
+tagged integer encodings when the decoded value has the expected concrete value type.
 These scalar numeric payloads are not reference-tracked, so their ref metadata
-is `NotNullValue`. Otherwise it MUST fall back to the language runtime's
-ordinary polymorphic Any-value writer. Unknown carriers are runtime-owned
+is `NotNullValue`. Otherwise it MUST fall back to the language implementation's
+ordinary polymorphic Any-value writer. Unknown carriers are implementation-provided
 forward-compatibility containers, not entries in the local schema case table;
 schema-defined union cases MAY use `0..N`. When an unknown carrier is written
 back, the union envelope MUST use the carrier's original peer schema case ID
@@ -1823,7 +1823,7 @@ Type will be serialized using type meta format.
 1. **Byte Order**: Always use little-endian for multi-byte values
 2. **Varint Sign Extension**: Ensure proper handling of signed vs unsigned varints
 3. **Reference ID Ordering**: IDs must be assigned in serialization order
-4. **Field Order Consistency**: Must match exactly across languages in schema-consistent mode; in compatible mode, match by TypeDef field names or tag IDs
+4. **Field Order Consistency**: Must match exactly across languages in same-schema mode; in compatible mode, match by TypeDef field names or tag IDs
 5. **String Encoding**: Use best encoding for current language
 6. **Null Handling**: Different languages represent null differently
 7. **Empty Collections**: Still write length (0) and header byte
diff --git a/docs/specification/xlang_type_mapping.md b/docs/specification/xlang_type_mapping.md
index 64ed25ea3d..bb8d49785d 100644
--- a/docs/specification/xlang_type_mapping.md
+++ b/docs/specification/xlang_type_mapping.md
@@ -114,7 +114,7 @@ Notes:

 - Python `pyfory.Float16` and `pyfory.BFloat16` are reserved annotation markers; scalar values deserialize as native Python `float`.
 - Python `BoolArray`, `Int8Array`, `Int16Array`, `Int32Array`, `Int64Array`, `UInt8Array`, `UInt16Array`, `UInt32Array`, `UInt64Array`, `Float16Array`, `BFloat16Array`, `Float32Array`, and `Float64Array` are public dense-array wrappers with list-like sequence behavior.
-- JavaScript `BoolArray`, fallback `Float16Array`, and `BFloat16Array` are public dense-array wrappers backed by `Uint8Array` or `Uint16Array`. Scalar `float16` and `bfloat16` values use `number`. A JavaScript runtime with native `Float16Array` may return that native carrier for `array<float16>`.
+- JavaScript `BoolArray`, fallback `Float16Array`, and `BFloat16Array` are public dense-array wrappers backed by `Uint8Array` or `Uint16Array`. Scalar `float16` and `bfloat16` values use `number`. A JavaScript environment with native `Float16Array` may return that native carrier for `array<float16>`.
 - Java plain `byte[]` maps to `binary`. Numeric byte arrays use type-use annotations: `@Int8Type byte[]` for `array<int8>` and `@UInt8Type byte[]` for `array<uint8>`.
 - Dart uses `double` plus `Float16Type` or `Bfloat16Type` metadata for scalar
@@ -147,12 +147,12 @@ Notes:
 - The table above remains the canonical xlang schema mapping. Compatible readers may apply the scalar field adaptation
   rules defined by `xlang_serialization_spec.md` during schema-compatible struct/class field matching. Those rules do
   not change TypeDef metadata, dynamic root type
-  mapping, schema-consistent mode, or nested collection/map/array/union/generic positions.
+  mapping, same-schema mode, or nested collection/map/array/union/generic positions.

 ### Scala IDL Mapping

-The Scala schema IDL target emits Scala 3 source only. The `fory-scala` runtime
-artifact remains cross-built for Scala 2.13 and Scala 3.
+The Scala schema IDL target emits Scala 3 source only. The `fory-scala` artifact remains cross-built
+for Scala 2.13 and Scala 3.

 | Fory schema kind | Scala generated carrier |
 | ------------------------------------- | ---------------------------------------------------------------------------------------- |
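
Reader's note on the collection elements header touched in the `xlang_serialization_spec.md` hunk at line 1299: the table assigns one flag to each of bits 0 to 3 (`track_ref` 0x01, `has_null` 0x02, `is_decl_elem_type` 0x04, `is_same_type` 0x08). The Python sketch below only illustrates that bit layout. `ElementsHeader`, `decode_elements_header`, and `encode_elements_header` are hypothetical names, not part of any Fory API, and ignoring the reserved upper bits on decode is an assumption rather than something the diff states.

```python
from dataclasses import dataclass

# Bit values copied from the elements-header table in the diff.
TRACK_REF = 0x01
HAS_NULL = 0x02
IS_DECL_ELEM_TYPE = 0x04
IS_SAME_TYPE = 0x08


@dataclass(frozen=True)
class ElementsHeader:
    """Hypothetical carrier for the four documented header flags."""

    track_ref: bool
    has_null: bool
    is_decl_elem_type: bool
    is_same_type: bool


def decode_elements_header(header: int) -> ElementsHeader:
    """Split a one-byte header into its flags (reserved bits are ignored here)."""
    if not 0 <= header <= 0xFF:
        raise ValueError("elements header must fit in one byte")
    return ElementsHeader(
        track_ref=bool(header & TRACK_REF),
        has_null=bool(header & HAS_NULL),
        is_decl_elem_type=bool(header & IS_DECL_ELEM_TYPE),
        is_same_type=bool(header & IS_SAME_TYPE),
    )


def encode_elements_header(flags: ElementsHeader) -> int:
    """Pack the four flags back into the low four bits of a byte."""
    header = 0
    if flags.track_ref:
        header |= TRACK_REF
    if flags.has_null:
        header |= HAS_NULL
    if flags.is_decl_elem_type:
        header |= IS_DECL_ELEM_TYPE
    if flags.is_same_type:
        header |= IS_SAME_TYPE
    return header
```

For instance, a header byte of `0x0C` sets `is_decl_elem_type` and `is_same_type` only, which under the table's wording means every element has the declared type and the same concrete type, with no reference tracking and no null markers.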
