This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory-site.git

commit 1e1b652717c44da7f8bce1cc1f350afa08b46643
Author: chaokunyang <[email protected]>
AuthorDate: Wed May 13 11:15:30 2026 +0000

    🔄 synced local 'docs/specification/' with remote 'docs/specification/'
---
 docs/specification/java_serialization_spec.md  | 964 ++++++++++++++-----------
 docs/specification/xlang_serialization_spec.md |  47 +-
 2 files changed, 559 insertions(+), 452 deletions(-)

diff --git a/docs/specification/java_serialization_spec.md b/docs/specification/java_serialization_spec.md
index 1cba88ebc4..2d759e1c8e 100644
--- a/docs/specification/java_serialization_spec.md
+++ b/docs/specification/java_serialization_spec.md
@@ -19,563 +19,667 @@ license: |
   limitations under the License.
 ---
 
-## Spec overview
+## Scope
 
-Apache Fory Java serialization is a dynamic binary format for Java object graphs. It supports
-shared references, circular references, polymorphism, and optional schema evolution. The format is
-stream friendly: shared type metadata is written inline when needed and there is no meta start
-offset.
+This document specifies the Apache Fory Java native binary format: the format
+used by Java when `withXlang(false)` is configured. The format is optimized for
+Java object graphs, Java collection implementations, Java primitive arrays,
+Java class registration, Java serialization hooks, and optional schema
+evolution.
 
-The Java native format is an extension of the xlang wire format and reuses the same core framing
-and encodings; see `docs/specification/xlang_serialization_spec.md` for the shared baseline.
+Java native mode and xlang mode share low-level building blocks such as
+little-endian numeric payloads, variable-length integer encodings, reference
+flags, meta string encodings, and TypeDef/ClassDef concepts. They are different
+wire formats. In Java native mode, only the scalar type IDs from `BOOL` through
+`STRING` are shared with xlang. Collection, map, struct, array, enum, and
+native Java implementation type IDs are Java native IDs unless this document
+explicitly says otherwise.
 
-Overall layout:
+See [Xlang Serialization Format](xlang_serialization_spec.md) for the
+cross-language format.
 
-```
-| fory header | object ref meta | object type meta | object value data |
-```
-
-All data is encoded in little endian byte order. When running on a big endian platform, array
-serializers swap byte order on write/read so the on-wire layout remains little endian.
+## Stream Layout
 
-## Fory header
+A Java native stream contains one header byte followed by one or more root
+objects. Each root object is encoded as a normal object slot:
 
-Java native serialization writes a one byte bitmap header. The header layout mirrors the xlang
-bitmap and uses the same flag bits.
+```text
+| header | root_0 | root_1 | ... |
 
+root:
+| reference flag | [type metadata] | [value payload] |
 ```
-|     6 bits    | 1 bit | 1 bit |
-+---------------+-------+-------+
-| reserved      |  oob  | xlang |
-```
-
-- xlang flag: bit 0, set when serialization uses xlang format and clear for Java native format.
-- oob flag: bit 1, set when `BufferCallback` is not null.
-- reserved bits: bits 2-7, must be zero.
 
-The header is always a single byte; no language ID is written.
+All multi-byte fixed-width values are little endian. A big-endian Java runtime
+must still write and read little-endian payloads.
+
+The stream is stateful. Type metadata, class definitions, and object references
+are assigned indexes as they are first encountered and may be referenced later
+in the same stream.
+
+## Header
+
+The header is a single byte:
+
+```text
+| bits 7..2 reserved | bit 1 out-of-band | bit 0 xlang |
+```
+
+- `xlang` must be `0` for Java native mode.
+- `out-of-band` is `1` when a `BufferCallback` is configured.
+- Reserved bits must be `0`.
+
+Java native mode does not write a language ID after the header.
+
+## Reference Slots
+
+Objects, nullable fields, and reference-tracked fields use the standard Fory
+reference slot. The first byte is signed:
+
+| Flag                  | Byte | Payload that follows                                             |
+| --------------------- | ---- | ---------------------------------------------------------------- |
+| `NULL_FLAG`           | `-3` | No payload. The slot value is `null`.                            |
+| `REF_FLAG`            | `-2` | `varuint32` reference ID of an earlier object.                   |
+| `NOT_NULL_VALUE_FLAG` | `-1` | Value payload. No reference ID is assigned for this occurrence.  |
+| `REF_VALUE_FLAG`      | `0`  | Value payload. Assign the next reference ID before reading data. |
+
+When reference tracking is disabled for a slot, writers use only `NULL_FLAG`
+and `NOT_NULL_VALUE_FLAG`.
+
+Primitive field fast paths do not wrap non-null primitive values in a reference
+slot. Boxed primitives and other nullable values use the slot selected by field
+metadata.
+
+## Type Metadata
+
+Dynamic object slots write type metadata before the value payload. Type metadata
+identifies the serializer and, when needed, carries class names or ClassDef
+metadata.
+
+```text
+| varuint32 type_id | [type-specific metadata] |
+```
+
+Registered Java classes, Java native built-ins, and Fory internal serializers
+use numeric type IDs. Unregistered classes or classes registered by name carry
+name metadata. Schema-evolution classes may carry a ClassDef.
+
+### Native Type ID Ranges
+
+| Range    | Meaning                                                            |
+| -------- | ------------------------------------------------------------------ |
+| `0`      | `UNKNOWN`, used in metadata for dynamic or object-typed positions. |
+| `1..21`  | Shared scalar IDs from `BOOL` through `STRING`.                    |
+| `22..63` | Reserved in Java native mode for the xlang internal ID range.      |
+| `64..68` | Reserved for future Java native internal IDs.                      |
+| `69..98` | Java native built-ins listed below.                                |
+| `99+`    | User and runtime class IDs assigned by the Java `ClassResolver`.   |
+
+The shared scalar IDs are:
+
+| ID  | Name            | Java value domain                       |
+| --- | --------------- | --------------------------------------- |
+| 1   | `BOOL`          | boolean values in xlang metadata        |
+| 2   | `INT8`          | signed 8-bit integer metadata           |
+| 3   | `INT16`         | signed 16-bit integer metadata          |
+| 4   | `INT32`         | fixed-width signed 32-bit metadata      |
+| 5   | `VARINT32`      | variable-width signed 32-bit metadata   |
+| 6   | `INT64`         | fixed-width signed 64-bit metadata      |
+| 7   | `VARINT64`      | variable-width signed 64-bit metadata   |
+| 8   | `TAGGED_INT64`  | tagged signed 64-bit metadata           |
+| 9   | `UINT8`         | unsigned 8-bit metadata                 |
+| 10  | `UINT16`        | unsigned 16-bit metadata                |
+| 11  | `UINT32`        | fixed-width unsigned 32-bit metadata    |
+| 12  | `VAR_UINT32`    | variable-width unsigned 32-bit metadata |
+| 13  | `UINT64`        | fixed-width unsigned 64-bit metadata    |
+| 14  | `VAR_UINT64`    | variable-width unsigned 64-bit metadata |
+| 15  | `TAGGED_UINT64` | tagged unsigned 64-bit metadata         |
+| 16  | `FLOAT8`        | reserved 8-bit float metadata           |
+| 17  | `FLOAT16`       | half precision float metadata           |
+| 18  | `BFLOAT16`      | bfloat16 metadata                       |
+| 19  | `FLOAT32`       | 32-bit floating point metadata          |
+| 20  | `FLOAT64`       | 64-bit floating point metadata          |
+| 21  | `STRING`        | Java `String`                           |
+
+Java native built-ins start at ID `69`:
 
-## Reference meta
+| ID  | Name                         | Java type or serializer owner            |
+| --- | ---------------------------- | ---------------------------------------- |
+| 69  | `VOID_ID`                    | `java.lang.Void`                         |
+| 70  | `CHAR_ID`                    | `java.lang.Character`                    |
+| 71  | `PRIMITIVE_VOID_ID`          | `void`                                   |
+| 72  | `PRIMITIVE_BOOL_ID`          | `boolean`                                |
+| 73  | `PRIMITIVE_INT8_ID`          | `byte`                                   |
+| 74  | `PRIMITIVE_CHAR_ID`          | `char`                                   |
+| 75  | `PRIMITIVE_INT16_ID`         | `short`                                  |
+| 76  | `PRIMITIVE_INT32_ID`         | `int`                                    |
+| 77  | `PRIMITIVE_FLOAT32_ID`       | `float`                                  |
+| 78  | `PRIMITIVE_INT64_ID`         | `long`                                   |
+| 79  | `PRIMITIVE_FLOAT64_ID`       | `double`                                 |
+| 80  | `PRIMITIVE_BOOLEAN_ARRAY_ID` | `boolean[]`                              |
+| 81  | `PRIMITIVE_BYTE_ARRAY_ID`    | `byte[]`                                 |
+| 82  | `PRIMITIVE_CHAR_ARRAY_ID`    | `char[]`                                 |
+| 83  | `PRIMITIVE_SHORT_ARRAY_ID`   | `short[]`                                |
+| 84  | `PRIMITIVE_INT_ARRAY_ID`     | `int[]`                                  |
+| 85  | `PRIMITIVE_FLOAT_ARRAY_ID`   | `float[]`                                |
+| 86  | `PRIMITIVE_LONG_ARRAY_ID`    | `long[]`                                 |
+| 87  | `PRIMITIVE_DOUBLE_ARRAY_ID`  | `double[]`                               |
+| 88  | `STRING_ARRAY_ID`            | `String[]`                               |
+| 89  | `OBJECT_ARRAY_ID`            | `Object[]` and object array serializers  |
+| 90  | `ARRAYLIST_ID`               | `java.util.ArrayList`                    |
+| 91  | `HASHMAP_ID`                 | `java.util.HashMap`                      |
+| 92  | `HASHSET_ID`                 | `java.util.HashSet`                      |
+| 93  | `CLASS_ID`                   | `java.lang.Class`                        |
+| 94  | `EMPTY_OBJECT_ID`            | Empty-object serializer                  |
+| 95  | `LAMBDA_STUB_ID`             | Lambda replacement stub                  |
+| 96  | `JDK_PROXY_STUB_ID`          | JDK proxy replacement stub               |
+| 97  | `REPLACE_STUB_ID`            | `writeReplace`/`readResolve` replacement |
+| 98  | `NONEXISTENT_META_SHARED_ID` | Unknown class placeholder                |
+
+### Registered, Named, and Unregistered Classes
+
+Java native mode supports three class identity forms:
+
+- ID registration: the type ID is the registered numeric class ID.
+- Name registration: the type metadata carries namespace and type name strings.
+- Unregistered class: the type metadata carries the package name as namespace
+  and the simple Java class name as type name.
+
+Class registration is the fastest and most compact form. Name-based forms are
+used when stable names are required or class registration is disabled.
+
+### Meta Sharing
+
+When meta sharing is enabled, class metadata is written once and referenced by a
+stream-local index:
+
+```text
+| varuint32 marker | [class definition bytes if new] |
+
+marker = (index << 1) | flag
+flag = 0: new definition, class definition bytes follow
+flag = 1: reference to an earlier definition
+```
+
+Indexes are assigned in first-use order.
+
+## Schema Modes
+
+Java native mode has two object schema modes.
+
+### Schema-Consistent Mode
+
+Schema-consistent mode is used when compatible mode is disabled. The writer and
+reader must have matching fields and field order. No per-object ClassDef is
+required for ordinary registered classes. Field values are written directly in
+protocol order.
+
+### Compatible Mode
+
+Compatible mode writes ClassDef metadata for struct-like classes. Readers match
+local fields against remote ClassDef fields by identifier, read matching fields,
+and skip unknown fields using the remote field type metadata. Compatible mode is
+the Java native schema-evolution path.
+
+## Field Order
+
+Java native object serializers use the same deterministic field-order
+categories as the current xlang protocol:
+
+1. Primitive non-nullable numeric and boolean scalar fields.
+2. Primitive nullable numeric and boolean scalar fields, including boxed Java
+   primitive wrappers.
+3. Non-primitive fields.
+
+Primitive groups keep the primitive comparator:
+
+1. Fixed-width primitive encodings before compressed or variable-width
+   primitive encodings.
+2. Larger primitive width before smaller primitive width.
+3. Internal primitive type ID ascending.
+4. Field identifier.
+
+Non-primitive fields sort directly by field identifier. Non-primitive type ID,
+serializer kind, collection kind, map kind, and Java implementation class do not
+participate in field order.
+
+Field identifiers are selected as follows:
+
+- If a field has an explicit non-negative `@ForyField(id = ...)`, that numeric
+  ID is the field identifier.
+- Otherwise, the Java field name converted to snake_case is the field
+  identifier.
+- Negative annotation values are not valid field IDs. The annotation default
+  value `-1` means no explicit ID and is ignored for identifier selection.
 
-Reference tracking uses the same flags as the xlang specification.
+Identifier comparison is:
 
-| Flag                | Byte Value | Description                                                                                              |
-| ------------------- | ---------- | -------------------------------------------------------------------------------------------------------- |
-| NULL FLAG           | `-3`       | Object is null. No further bytes are written for this object.                                            |
-| REF FLAG            | `-2`       | Object was already serialized. Followed by unsigned varint32 reference ID.                               |
-| NOT_NULL VALUE FLAG | `-1`       | Object is non-null but reference tracking is disabled for this type. Object data follows immediately.    |
-| REF VALUE FLAG      | `0`        | Object is referencable and this is its first occurrence. Object data follows. Assigns next reference ID. |
+1. If both fields have explicit IDs, compare IDs numerically.
+2. If only one field has an explicit ID, the ID-based field sorts before the
+   name-based field.
+3. If neither field has an explicit ID, compare snake_case names
+   lexicographically.
+4. If identifiers are equal, use deterministic tie-breakers such as declaring
+   class and original field name. Untagged fields with the same snake_case
+   identifier in the same class are invalid. A child field that hides an
+   inherited field with the same Java field name keeps only the nearest field in
+   xlang TypeDef metadata because the inherited field has no distinct untagged
+   identifier.
 
-When reference tracking is disabled globally or for a specific field/type, only `NULL FLAG` and
-`NOT_NULL VALUE FLAG` are used.
+Generated serializers may keep separate internal descriptor groups for
+primitive, collection, map, built-in, and user-defined serializers so they can
+emit specialized fast paths. Those internal groups are an implementation detail
+and must not change wire field order.
 
-## Type system and type IDs
+## ClassDef Encoding
 
-Java native serialization uses the unified type ID layout shared with xlang:
+Compatible mode and meta sharing encode Java class definitions as TypeDef
+records. A TypeDef has an 8-byte header followed by class metadata bytes:
 
-```
-full_type_id = (user_type_id << 8) | internal_type_id
+```text
+| 8-byte header | [varuint32 extra_size] | class metadata bytes |
 ```
 
-- `internal_type_id` is the low 8 bits describing the kind (enum/struct/ext, named variants, or a
-  built-in type).
-- `user_type_id` is the numeric registration ID (0-based) for user-defined enum/struct/ext types.
-- Named types use `NAMED_*` internal IDs and carry names in metadata rather than embedding a user
-  ID.
-
-### Shared internal type IDs (0-63)
-
-Java native mode shares the xlang internal IDs for all values below 64. IDs `0~56` are defined by
-the xlang spec, while `57~63` are reserved for future internal use. These IDs are stable across
-languages.
-
-See the internal type ID table in
-[Xlang Serialization Format](xlang_serialization_spec.md#internal-type-id-table).
-Java shares all IDs `< 64`, with `57~63` reserved for future internal use.
-
-### Java native built-in type IDs
-
-Java native serialization assigns Java-specific built-ins starting at
-`Types.BOUND + 5` (`Types.BOUND` is 64; 5 IDs are reserved for future use).
-Type IDs in `0~56` are shared with xlang; `57~63` are reserved; `64+` are only
-valid in Java native mode.
-
-| Type ID | Name                       | Description                    |
-| ------- | -------------------------- | ------------------------------ |
-| 69      | VOID_ID                    | java.lang.Void                 |
-| 70      | CHAR_ID                    | java.lang.Character            |
-| 71      | PRIMITIVE_VOID_ID          | void                           |
-| 72      | PRIMITIVE_BOOL_ID          | boolean                        |
-| 73      | PRIMITIVE_INT8_ID          | byte                           |
-| 74      | PRIMITIVE_CHAR_ID          | char                           |
-| 75      | PRIMITIVE_INT16_ID         | short                          |
-| 76      | PRIMITIVE_INT32_ID         | int                            |
-| 77      | PRIMITIVE_FLOAT32_ID       | float                          |
-| 78      | PRIMITIVE_INT64_ID         | long                           |
-| 79      | PRIMITIVE_FLOAT64_ID       | double                         |
-| 80      | PRIMITIVE_BOOLEAN_ARRAY_ID | boolean[]                      |
-| 81      | PRIMITIVE_BYTE_ARRAY_ID    | byte[]                         |
-| 82      | PRIMITIVE_CHAR_ARRAY_ID    | char[]                         |
-| 83      | PRIMITIVE_SHORT_ARRAY_ID   | short[]                        |
-| 84      | PRIMITIVE_INT_ARRAY_ID     | int[]                          |
-| 85      | PRIMITIVE_FLOAT_ARRAY_ID   | float[]                        |
-| 86      | PRIMITIVE_LONG_ARRAY_ID    | long[]                         |
-| 87      | PRIMITIVE_DOUBLE_ARRAY_ID  | double[]                       |
-| 88      | STRING_ARRAY_ID            | String[]                       |
-| 89      | OBJECT_ARRAY_ID            | Object[]                       |
-| 90      | ARRAYLIST_ID               | java.util.ArrayList            |
-| 91      | HASHMAP_ID                 | java.util.HashMap              |
-| 92      | HASHSET_ID                 | java.util.HashSet              |
-| 93      | CLASS_ID                   | java.lang.Class                |
-| 94      | EMPTY_OBJECT_ID            | empty object stub              |
-| 95      | LAMBDA_STUB_ID             | lambda stub                    |
-| 96      | JDK_PROXY_STUB_ID          | JDK proxy stub                 |
-| 97      | REPLACE_STUB_ID            | writeReplace/readResolve stub  |
-| 98      | NONEXISTENT_META_SHARED_ID | meta-shared unknown class stub |
-
-### Registration and named types
-
-User-defined enum/struct/ext types can be registered by numeric ID or by name.
-
-- Numeric registration: `full_type_id = (user_id << 8) | internal_type_id`.
-- Name registration: type meta uses namespace and type name (see below).
-- Unregistered types are encoded as named types using namespace = package name and type name =
-  simple class name.
-
-Named type selection rules for unregistered types:
-
-- enum -> NAMED_ENUM
-- struct-like serializers -> NAMED_STRUCT (or NAMED_COMPATIBLE_STRUCT in compatible mode)
-- all other custom serializers -> NAMED_EXT
-
-## Type meta encoding
-
-Every value is written with a type ID followed by optional type metadata:
-
-1. Write `type_id` using varuint32 small7 encoding.
-2. For `NAMED_ENUM`, `NAMED_STRUCT`, `NAMED_EXT`, `NAMED_COMPATIBLE_STRUCT`:
-   - If meta share is enabled: write shared class meta (streaming format).
-   - Otherwise: write namespace and type name as meta strings.
-3. For `COMPATIBLE_STRUCT`:
-   - If meta share is enabled: write shared class meta (streaming format).
-   - Otherwise: no extra meta (type ID is sufficient).
-4. All other types: no extra meta.
-
-### Shared class meta (streaming)
-
-When meta share is enabled, Java uses the streaming shared meta protocol and writes TypeDef
-bytes inline on first use.
+Header bits:
 
+```text
+| 52-bit hash | 3 reserved bits | 1 compress bit | 8 size bits |
 ```
-| varuint32: index_marker | [class def bytes if new] |
 
-index_marker = (index << 1) | flag
-flag = 1 -> reference
-flag = 0 -> new type
-```
+- `size`: the lower 8 bits. If the value is `0xff`, read `extra_size` as
+  `varuint32` and add it to `0xff`.
+- `compress`: set when class metadata bytes are compressed by the configured
+  meta compressor.
+- `reserved`: must be zero.
+- `hash`: 52 bits derived from MurmurHash3 x64_128 seed 47 over
+  `class_metadata_bytes || header_low12_le`. `header_low12_le` is the low 12
+  header bits encoded as two little-endian bytes with the upper four bits of the
+  second byte clear. Take lane 0 of the MurmurHash3 result, left-shift it by 12
+  with signed 64-bit wraparound, apply signed absolute value, and mask with
+  `0xfffffffffffff000`.
 
-- If `flag == 1`, this is a reference to a previously written type. No class def bytes follow.
-- If `flag == 0`, this is a new type definition and class def bytes are written inline.
+### Class Metadata Body
 
-The index is assigned sequentially in the order types are first encountered.
+```text
+| root_kind_and_layer_count | class_layer_0 | class_layer_1 | ... |
 
-## Schema modes
+class_layer:
+| varuint32 class_header | [registered type IDs or names] | field_info... |
+```
 
-Java native serialization supports two schema modes:
+`root_kind_and_layer_count` stores the root TypeDef kind in the high four bits
+and `(num_layers - 1)` in the low four bits. If the low four bits are `0b1111`,
+read an extra `varuint32` and add it to `15`.
 
-- Schema consistent (compatible mode disabled): fields are serialized in a fixed order and no
-  ClassDef is required. Type meta uses `STRUCT` or `NAMED_STRUCT` for user-defined classes.
-- Schema evolution (compatible mode enabled): fields are serialized with schema evolution metadata
-  (ClassDef). Type meta uses `COMPATIBLE_STRUCT` or `NAMED_COMPATIBLE_STRUCT`.
+Root kind codes:
 
-## ClassDef format (compatible mode)
+| Code  | Kind                                         |
+| ----- | -------------------------------------------- |
+| 0     | `STRUCT`                                     |
+| 1     | `COMPATIBLE_STRUCT`                          |
+| 2     | `NAMED_STRUCT`                               |
+| 3     | `NAMED_COMPATIBLE_STRUCT`                    |
+| 4     | `ENUM`                                       |
+| 5     | `NAMED_ENUM`                                 |
+| 6     | `EXT`                                        |
+| 7     | `NAMED_EXT`                                  |
+| 8     | `TYPED_UNION`                                |
+| 9     | `NAMED_UNION`                                |
+| 10-14 | Reserved                                     |
+| 15    | Extended-kind escape, rejected until defined |
 
-ClassDef is the schema evolution metadata encoded for compatible structs. It is written inline
-when shared meta is enabled, or referenced by index when already seen.
+`class_header = (num_fields << 1) | registered_flag`.
 
-### Binary layout
+- If `registered_flag == 1`, write the class type ID as one byte. For
+  user-registered `ENUM`, `STRUCT`, `COMPATIBLE_STRUCT`, `EXT`, and
+  `TYPED_UNION`, write the user type ID as `varuint32`.
+- If `registered_flag == 0`, write namespace and type name as meta strings.
 
-```
-| 8 bytes header | [varuint32 extra size] | class meta bytes |
-```
+Class layers are encoded from parent to leaf. Field lists inside each layer use
+the field order defined above.
 
-Header layout (lower bits on the right):
+### Field Info
 
-```
-| 52-bit hash | 3 bits reserved | 1 bit compress | 8-bit size |
+Each field is encoded as:
+
+```text
+| field_header | [extended_name_or_id_size] | [field name bytes] | field_type |
 ```
 
-- size: lower 8 bits. If size equals the mask (0xFF), write extra size as varuint32 and add it.
-- compress: bit 8, set when class meta bytes are compressed.
-- reserved: bits 9-11 are reserved for future use and must be zero.
-- hash: 52 stored hash bits derived from MurmurHash3 x64_128 seed 47 over
-  `class meta bytes || header_low12_le`. `header_low12_le` is two little-endian bytes containing
-  the low 12 header bits (size, compress, and reserved bits); the upper four bits of the second
-  byte are zero. Take lane 0 of the 128-bit MurmurHash3 result as a signed int64, left-shift it by
-  12 with two's-complement 64-bit wraparound, apply signed absolute value (leaving `INT64_MIN`
-  unchanged), then mask with `0xfffffffffffff000`. The final header is the masked hash bits OR-ed
-  with the low 12 header bits.
+`field_header` bits:
 
-### Class meta bytes
+| Bits | Meaning                                          |
+| ---- | ------------------------------------------------ |
+| 0    | `trackingRef`                                    |
+| 1    | `nullable`                                       |
+| 2..3 | field name encoding                              |
+| 4..6 | encoded name length minus one, or compact tag ID |
+| 7    | reserved, must be zero                           |
 
-Class meta encodes a linearized class hierarchy (from parent to leaf) and 
field metadata:
+Field name encodings:
 
-```
-| root_kind_and_num_classes | class_layer_0 | class_layer_1 | ... |
+| Code | Encoding                             |
+| ---- | ------------------------------------ |
+| 0    | UTF-8                                |
+| 1    | all-to-lower special encoding        |
+| 2    | lower/upper/digit special encoding   |
+| 3    | tag ID; field name bytes are omitted |
 
-class_layer:
-| num_fields << 1 | registered_flag | [type_id if registered] |
-| namespace | type_name | field_infos |
-```
+For name encodings, bits `4..6` store `encoded_length - 1` when it is less than
+`7`. If the value is `7`, read an extra `varuint32` and add it to `7`.
 
-- `root_kind_and_num_classes` stores the root TypeDef kind in the high four bits and
-  `(num_layers - 1)` in the low four bits.
-  - Root kind codes are `STRUCT=0`, `COMPATIBLE_STRUCT=1`, `NAMED_STRUCT=2`,
-    `NAMED_COMPATIBLE_STRUCT=3`, `ENUM=4`, `NAMED_ENUM=5`, `EXT=6`, `NAMED_EXT=7`,
-    `TYPED_UNION=8`, and `NAMED_UNION=9`.
-  - Kind codes `10-14` are reserved and `15` is an extended-kind escape rejected until defined.
-  - If the low four bits equal `0b1111`, read an extra varuint32 small7 and add it.
-  - The actual number of layers is `num_classes + 1`.
-- `registered_flag` is 1 if the class is registered by numeric ID.
-- If registered by ID, the one-byte class type ID follows. For user-registered ID kinds, the
-  user type ID follows as varuint32.
-- If registered by name or unregistered, namespace and type name are written as meta strings.
+For tag ID encoding, bits `4..6` store the numeric field ID when it is less than
+`7`. If the value is `7`, read an extra `varuint32` and add it to `7`. Field IDs
+must be non-negative. Duplicate field IDs in one TypeDef are invalid.
 
-### Field info
+### Field Type
 
-Each field uses a compact header followed by its name bytes (omitted when TAG_ID is used) and its
-type info:
+Field types describe how compatible readers read or skip the field payload.
+Top-level field types write only the type tag. Nested field types store
+`nullable` and `trackingRef` in the low bits:
 
-```
-| field_header | [field_name_bytes] | field_type |
+```text
+nested_field_type_header = (type_tag << 2) | (nullable << 1) | trackingRef
 ```
 
-`field_header` bits:
+Type tags:
 
-- bit 0: trackingRef
-- bit 1: nullable
-- bits 2-3: field name encoding
-- bits 4-6: name length (len-1), or tag ID when TAG_ID is used; value 7 indicates extended length
-- bit 7: reserved (0)
+| Tag | Field type                  | Payload                          |
+| --- | --------------------------- | -------------------------------- |
+| 0   | Object/dynamic              | none                             |
+| 1   | Map                         | key field type, value field type |
+| 2   | Collection/List/Set         | element field type               |
+| 3   | Java array                  | dimensions, component field type |
+| 4   | Enum                        | none                             |
+| 5+  | Registered or built-in type | `tag - 5` is the type ID         |
 
-Field name encoding:
+## Meta Strings
 
-- 0: UTF8
-- 1: ALL_TO_LOWER_SPECIAL
-- 2: LOWER_UPPER_DIGIT_SPECIAL
-- 3: TAG_ID (field name omitted, tag ID stored in size field)
+Namespaces, type names, and field names use the meta string encodings defined
+by the xlang specification. A meta string header stores the byte length and
+encoding kind; extended lengths are written as `varuint32`.
 
-If length is extended (size==7), an extra varuint32 small7 storing `(len-1) - 7` follows.
+Package and namespace names use UTF-8, all-to-lower special encoding, or
+lower/upper/digit special encoding. Type names use UTF-8,
+lower/upper/digit special encoding, first-to-lower special encoding, or
+all-to-lower special encoding. Field names use the field-info encoding table
+above.
 
-### Field type encoding
+## Primitive Values
 
-Field types are encoded with a type tag and optional nested type info. For nested types, the header
-includes nullable/trackingRef flags in the low bits.
-Top-level field types use the tag only (no flags).
+Primitive values are written without type metadata when the field serializer is
+known statically:
 
-Type tags:
+| Java type | Payload                                                                     |
+| --------- | --------------------------------------------------------------------------- |
+| `boolean` | one byte: `0` or `1`                                                        |
+| `byte`    | one signed byte                                                             |
+| `char`    | two-byte UTF-16 code unit, little endian                                    |
+| `short`   | two-byte signed integer, little endian                                      |
+| `int`     | fixed int32 little endian, or ZigZag varint32 when configured               |
+| `long`    | fixed int64 little endian, ZigZag varint64, or tagged int64 when configured |
+| `float`   | IEEE 754 binary32, little endian                                            |
+| `double`  | IEEE 754 binary64, little endian                                            |
 
-| Tag | Field type                                |
-| --- | ----------------------------------------- |
-| 0   | Object (ObjectFieldType)                  |
-| 1   | Map (MapFieldType)                        |
-| 2   | Collection/List/Set (CollectionFieldType) |
-| 3   | Array (ArrayFieldType)                    |
-| 4   | Enum (EnumFieldType)                      |
-| 5+  | Registered type (RegisteredFieldType)     |
+Boxed primitives use the same value payload after the selected null/reference
+slot.
 
-Encoding rules:
+## String Values
 
-- ObjectFieldType: write tag 0.
-- MapFieldType: write tag 1, then key type, then value type.
-- CollectionFieldType: write tag 2, then element type.
-- ArrayFieldType: write tag 3, then dimensions, then component type.
-- EnumFieldType: write tag 4.
-- RegisteredFieldType: write tag `5 + type_id`.
+Java strings are encoded as:
 
-For nested types, nullable/trackingRef flags are stored in the low bits of the header as
-`(type_tag << 2) | (nullable << 1) | tracking_ref`.
+```text
+| varuint36_small7 header | bytes |
 
-## Meta string encoding
+header = (num_bytes << 2) | coder
+```
 
-Namespace, type names, and field names use the same meta string encodings as the xlang spec.
+`coder` values:
 
-### Package and type names
+| Value | Encoding             |
+| ----- | -------------------- |
+| 0     | Latin-1              |
+| 1     | UTF-16 little endian |
+| 2     | UTF-8                |
 
-Header format:
+`num_bytes` is the byte length of the encoded payload.
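
The header formula above can be sketched as follows. This is an illustrative helper with hypothetical names, not the Fory string serializer; it only packs and unpacks the header value and leaves the `varuint36_small7` byte encoding of that value out of scope.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Fory API.
final class StringHeaderSketch {
  static final int LATIN1 = 0;
  static final int UTF16 = 1;
  static final int UTF8 = 2;

  // header = (num_bytes << 2) | coder; a long leaves room for 36 bits.
  static long packHeader(int numBytes, int coder) {
    return ((long) numBytes << 2) | coder;
  }

  static int numBytes(long header) {
    return (int) (header >>> 2);
  }

  static int coder(long header) {
    return (int) (header & 0b11);
  }

  // Encodes the string with the chosen coder and returns the matching header.
  static long headerFor(String s, int coder) {
    byte[] payload;
    switch (coder) {
      case LATIN1:
        payload = s.getBytes(StandardCharsets.ISO_8859_1);
        break;
      case UTF16:
        payload = s.getBytes(StandardCharsets.UTF_16LE);
        break;
      case UTF8:
        payload = s.getBytes(StandardCharsets.UTF_8);
        break;
      default:
        throw new IllegalArgumentException("unknown coder " + coder);
    }
    return packHeader(payload.length, coder);
  }
}
```

For example, `"abc"` as Latin-1 has a 3-byte payload and header `12`, while the same string as UTF-16 has a 6-byte payload and header `25`.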
 
-```
-| 6 bits size | 2 bits encoding |
-```
+## Enum Values
 
-- size is the byte length of the encoded name.
-- if size == 63, write extra length `(size - 63)` as varuint32 small7.
+Enum value payload depends on configuration:
 
-Encodings:
+- Ordinal mode writes the enum ordinal as `varuint32`.
+- `@ForyEnumId` mode writes the configured non-negative enum tag as
+  `varuint32`.
+- Name mode writes the enum constant name as a meta string.
 
-- Package name: UTF8, ALL_TO_LOWER_SPECIAL, LOWER_UPPER_DIGIT_SPECIAL
-- Type name: UTF8, LOWER_UPPER_DIGIT_SPECIAL, FIRST_TO_LOWER_SPECIAL, ALL_TO_LOWER_SPECIAL
+`@ForyEnumId` may be declared on enum constants, on one integer field, or on one
+zero-argument integer getter, according to the Java API contract. Duplicate or
+negative enum tags are invalid.
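
The validity rule for enum tags (non-negative and unique) could be checked along these lines. This is a sketch over a plain array of already-resolved tag values; how Fory actually collects the tags from `@ForyEnumId` and which exception it raises are not stated here, so the class name, method, and exception type are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the Fory API.
final class EnumTagValidationSketch {
  // Rejects negative and duplicate tags, the two invalid cases named above.
  static void validateTags(int[] tags) {
    Set<Integer> seen = new HashSet<>();
    for (int tag : tags) {
      if (tag < 0) {
        throw new IllegalArgumentException("negative enum tag: " + tag);
      }
      if (!seen.add(tag)) {
        throw new IllegalArgumentException("duplicate enum tag: " + tag);
      }
    }
  }
}
```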
 
-### Field names
+## Arrays
 
-Field name encoding is described in the ClassDef field header section. When using TAG_ID, the
-field name bytes are omitted and the tag ID is stored in the size field.
+### Primitive Arrays
 
-### Encoding algorithms
+Primitive arrays write a length prefix and contiguous little-endian element
+payload:
 
-See the xlang specification for encoding algorithms and tables:
-`docs/specification/xlang_serialization_spec.md#meta-string`.
-
-## Value encodings
+```text
+| varuint32 byte_length | raw element bytes |
+```
 
-This section describes the byte layouts for common built-in serializers used in Java native
-serialization. Custom serializers (EXT) may define additional formats but must still follow the
-reference and type meta rules described above.
+Compressed `int[]` and `long[]` arrays use element count followed by compressed
+elements:
 
-### Primitives
+```text
+int[] compressed:
+| varuint32 length | varint32... |
 
-- boolean: 1 byte (0x00 or 0x01).
-- byte: 1 byte.
-- short: 2 bytes little endian.
-- char: 2 bytes little endian (UTF-16 code unit).
-- int:
-  - fixed: 4 bytes little endian.
-  - varint: signed varint32 (ZigZag) when `compressInt` is enabled.
-- long:
-  - fixed: 8 bytes little endian.
-  - varint: signed varint64 (ZigZag) when `longEncoding=VARINT`.
-  - tagged: tagged int64 when `longEncoding=TAGGED`.
-- float: IEEE 754 float32, little endian.
-- double: IEEE 754 float64, little endian.
+long[] compressed:
+| varuint32 length | varint64 or tagged_int64... |
+```
 
-Varint encodings follow the xlang spec:
-`docs/specification/xlang_serialization_spec.md#unsigned-varint32`.
+`byte[]` is the binary serializer and writes `varuint32 length` followed by raw
+bytes.
 
-### String
+### Object Arrays
 
-Strings are encoded as:
+Object arrays write the array length and an element type mode:
 
+```text
+| varuint32_small7 (length << 1 | monomorphic_flag) |
+| [shared element class metadata] |
+| element slots... |
 ```
-| varuint36_small: (num_bytes << 2) | coder | string bytes |
-```
-
-- coder: 2-bit value
-  - 0: LATIN1
-  - 1: UTF16
-  - 2: UTF8
-- num_bytes: byte length of the encoded string payload.
-
-UTF16 is encoded as little endian 2-byte code units.
 
-### Enum
+- If `monomorphic_flag == 1`, all non-null elements use the same element
+  serializer. The shared element class metadata is written once.
+- If `monomorphic_flag == 0`, each non-null element writes its own type
+  metadata.
 
-- If `serializeEnumByName` is enabled: write enum name as a meta string.
-- Otherwise: write an enum tag as varuint32 small7.
-  - By default the tag is the declaration ordinal.
-  - If the enum configures `@ForyEnumId`, write the configured stable id instead. Java supports
-    annotating exactly one id field, exactly one zero-argument id getter, or every enum constant
-    with explicit tag values.
+Each nullable or reference-tracked element is still represented by a reference
+slot before its element payload.
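
The first slot of the object array layout packs the length and the monomorphic flag into one value. A minimal sketch of that packing, with hypothetical names and without the `varuint32_small7` byte encoding, might look like this:

```java
// Hypothetical helper, not part of the Fory API.
final class ObjectArrayHeaderSketch {
  // (length << 1) | monomorphic_flag
  static int packHeader(int length, boolean monomorphic) {
    return (length << 1) | (monomorphic ? 1 : 0);
  }

  // Unsigned shift so the header is treated as an unsigned 32-bit value.
  static int length(int header) {
    return header >>> 1;
  }

  static boolean isMonomorphic(int header) {
    return (header & 1) == 1;
  }
}
```

A three-element monomorphic array would therefore start with header value `7`; the same array written polymorphically would start with `6`.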
 
-### Binary (byte[])
+## Collections
 
-Primitive byte arrays are encoded as:
+Java collection serializers write collection size, element flags, optional
+shared element type metadata, and element payloads:
 
+```text
+| varuint32_small7 size | elements_header | [element type metadata] | elements... |
 ```
-| varuint32: num_bytes | raw bytes |
-```
-
-### Primitive arrays
 
-Primitive arrays write a byte-length prefix followed by the little-endian primitive payload unless
-compression is enabled:
+`elements_header` bits:
 
-```
-| varuint32: byte_length | raw bytes |
-```
+| Bit | Meaning                               |
+| --- | ------------------------------------- |
+| 0   | Element reference tracking is enabled |
+| 1   | At least one element may be null      |
+| 2   | Declared element type is used         |
+| 3   | All non-null elements share one type  |
 
-- `compressIntArray`: int[] encoded as `| varuint32: length | varint32... |`.
-- `compressLongArray`: long[] encoded as `| varuint32: length | varint64/tagged... |`.
+When all non-null elements share a type and the declared element type is not
+used, the shared element type metadata is written once before element payloads.
+Otherwise each non-null element writes its own type metadata. Null and reference
+flags follow the reference-slot rules.
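
To make the `elements_header` bits concrete, here is a small sketch that decodes where element type metadata appears. The constant names follow the `CollectionFlags` names in the removed text; the class itself is hypothetical. The third case (same type and declared type both set, so no inline type metadata is needed) is a reading based on the removed wording and should be treated as an assumption.

```java
// Hypothetical helper, not part of the Fory API.
final class CollectionHeaderSketch {
  static final int TRACKING_REF = 1;              // bit 0
  static final int HAS_NULL = 1 << 1;             // bit 1
  static final int IS_DECL_ELEMENT_TYPE = 1 << 2; // bit 2
  static final int IS_SAME_TYPE = 1 << 3;         // bit 3

  enum TypeMetaMode { SHARED_ONCE, PER_ELEMENT, DECLARED }

  static TypeMetaMode typeMetaMode(int header) {
    boolean sameType = (header & IS_SAME_TYPE) != 0;
    boolean declared = (header & IS_DECL_ELEMENT_TYPE) != 0;
    if (sameType && !declared) {
      // Shared element type metadata is written once before the elements.
      return TypeMetaMode.SHARED_ONCE;
    }
    if (!sameType) {
      // Each non-null element carries its own type metadata.
      return TypeMetaMode.PER_ELEMENT;
    }
    // Assumption: the declared element type already identifies the serializer.
    return TypeMetaMode.DECLARED;
  }
}
```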
 
-### Object arrays
+### Collection Subclasses
 
-Object arrays encode length and a monomorphic flag:
+Specialized serializers for supported JDK collection subclasses write
+subclass-owned field layers before the element payload:
 
-```
-| varuint32_small7: (length << 1) | mono_flag |
+```text
+| varuint32_small7 size |
+| [comparator reference for sorted/priority collections] |
+| varuint32_small7 num_class_layers |
+| class_layer_fields... |
+| elements_header | [element type metadata] | elements... |
 ```
 
-- If `mono_flag == 1`, all elements share a known component serializer. Each element uses ref
-  flags and the component serializer writes the value.
-- If `mono_flag == 0`, each element uses ref flags and writes its own class info and data.
+`num_class_layers` is the exact number of subclass field layers encoded in the
+payload. Readers must reject a payload whose layer count does not match the
+local serializer because the value payload does not carry enough layer identity
+to skip a mismatched subclass layout.
 
-### Collections (List/Set)
+## Maps
 
-Collections encode length and a one-byte elements header:
+Maps write entry count followed by one or more chunks. Each chunk groups entries
+with compatible key and value metadata:
 
-```
-| varuint32_small7: length | elements_header | [elem_class_info] | elements... |
+```text
+| varuint32_small7 size | chunk... |
 ```
 
-`elements_header` bits (see `CollectionFlags`):
+Non-null chunks:
 
-- bit 0: TRACKING_REF
-- bit 1: HAS_NULL
-- bit 2: IS_DECL_ELEMENT_TYPE
-- bit 3: IS_SAME_TYPE
+```text
+| header | uint8 chunk_size | [key type metadata] | [value type metadata] | entries... |
+```
 
-If `IS_SAME_TYPE` is set and `IS_DECL_ELEMENT_TYPE` is not set, the element class info is written
-once before the elements. Element values then follow with either ref flags (if TRACKING_REF) or
-per-element null flags (if HAS_NULL).
+`chunk_size` is in `1..255`.
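
Because `chunk_size` is a single unsigned byte in `1..255`, a long run of entries has to be split across several chunks. The sketch below shows only that size arithmetic, under the simplifying assumption that every entry has a non-null key and value and the same key and value metadata; a real writer also starts a new chunk whenever the metadata changes. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Fory API.
final class MapChunkSketch {
  static final int MAX_CHUNK_SIZE = 255;

  // Returns the chunk_size values for a run of entries that all share
  // key/value metadata (an assumption of this sketch).
  static List<Integer> chunkSizes(int entryCount) {
    List<Integer> sizes = new ArrayList<>();
    int remaining = entryCount;
    while (remaining > 0) {
      int size = Math.min(remaining, MAX_CHUNK_SIZE);
      sizes.add(size);
      remaining -= size;
    }
    return sizes;
  }
}
```

With 600 homogeneous entries this yields chunks of 255, 255, and 90 entries; an empty map produces no chunks.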
 
-If `IS_SAME_TYPE` is not set, each element is written with its own class info and data (and
-optionally ref flags).
+`header` bits:
 
-#### Child collection subclasses
+| Bit | Meaning                             |
+| --- | ----------------------------------- |
+| 0   | Key reference tracking is enabled   |
+| 1   | Chunk may contain null keys         |
+| 2   | Declared key type is used           |
+| 3   | Value reference tracking is enabled |
+| 4   | Chunk may contain null values       |
+| 5   | Declared value type is used         |
 
-Optimized serializers for subclasses of supported JDK collection implementations write subclass
-field layers before element payloads:
+Null key or null value entries are encoded as single-entry special chunks
+without a `chunk_size` byte:
 
-```
-| varuint32_small7: length | [comparator_ref] | varuint32_small7: num_class_layers |
-| class_layer_fields... | [elements_header | elem_class_info | elements...] |
-```
+- null key and non-null value: special null-key header, then value payload.
+- non-null key and null value: special null-value header, then key payload.
+- null key and null value: `KV_NULL` header only.
 
-- `comparator_ref` is present only for sorted-set and priority-queue subclasses.
-- `num_class_layers` is the exact number of subclass-owned field layers written after the collection
-  header and before the element payload.
-- Readers must reject a payload whose `num_class_layers` does not match the local serializer's layer
-  count. These serializers do not carry per-layer class identity in the value payload, so mismatched
-  layers cannot be skipped safely.
+`EnumMap` writes one serializer-owned payload mode byte before its normal map
+payload:
 
-### Maps
+- `0`: normal payload follows.
+- `1`: Java-serialized empty `EnumMap` payload.
 
-Maps encode entry count and then a sequence of chunks. Each chunk groups entries that share key
-and value types.
+### Map Subclasses
 
-```
-| varuint32_small7: size | chunk_1 | chunk_2 | ... |
+Specialized serializers for supported JDK map subclasses write subclass-owned
+field layers before entry chunks:
 
-chunk (non-null entries):
-| header | chunk_size | [key_class_info] | [value_class_info] | entries... |
+```text
+| varuint32_small7 size |
+| [comparator reference for sorted maps] |
+| varuint32_small7 num_class_layers |
+| class_layer_fields... |
+| chunk... |
 ```
 
-`header` bits (see `MapFlags`):
+Readers must reject mismatched `num_class_layers` for the same reason as
+collection subclasses.
 
-- bit 0: TRACKING_KEY_REF
-- bit 1: KEY_HAS_NULL
-- bit 2: KEY_DECL_TYPE
-- bit 3: TRACKING_VALUE_REF
-- bit 4: VALUE_HAS_NULL
-- bit 5: VALUE_DECL_TYPE
+## JDK Wrappers and Views
 
-If `KEY_DECL_TYPE` or `VALUE_DECL_TYPE` is unset, the corresponding class info is written once at
-the start of the chunk. `chunk_size` is a single byte (1..255) and `MAX_CHUNK_SIZE` is 255.
+Java native mode has serializers for selected JDK wrappers and views:
 
-#### Null key/value entries
+- Unmodifiable and synchronized collection/map wrappers keep the wrapper type
+  metadata and write the wrapped source collection or map as a normal object
+  payload.
+- Recognized sublist views keep the sublist type metadata and write one
+  serializer-owned mode byte. Mode `0` writes visible elements as a collection
+  payload. Mode `1` writes view offset, size, and source list reference.
+- `Collections.newSetFromMap` writes the backing map payload.
+- Immutable JDK collection serializers keep list, set, or map payload
+  semantics and materialize an equivalent immutable or unmodifiable container
+  on read.
 
-Entries with null key or null value are encoded as special single-entry chunks without a
-`chunk_size` byte:
+Android and JVM implementations may choose different concrete public backing
+types for wrapper payloads, but the serializer-owned payload modes above define
+the wire shape.
 
-- null key, non-null value: `NULL_KEY_VALUE_DECL_TYPE*` flags, then value payload
-- null value, non-null key: `NULL_VALUE_KEY_DECL_TYPE*` flags, then key payload
-- null key and null value: `KV_NULL` header only
+## Struct and Object Payloads
 
-These chunks always represent exactly one entry.
+Struct-like object payloads contain field values in protocol field order. The
+selected serializer owns the exact field fast path:
 
-`EnumMap` has an EnumMap-owned one-byte payload mode before its map payload:
+```text
+| field_0 payload | field_1 payload | ... |
+```
 
-- `0`: normal payload, then `varuint32_small7` size, key enum class info, and the map chunks above.
-- `1`: Java-serialized empty `EnumMap` payload. Android uses this mode when an empty map has no
-  public key from which to derive the enum class. Readers on Android and JVM must accept both modes.
+For each field, field metadata decides whether the field writes a primitive
+payload directly, a nullable slot, a reference-tracked slot, type metadata, or a
+specialized collection/map/array payload.
 
-#### Child map subclasses
+Compatible-mode readers use the remote ClassDef field list to map fields by
+identifier. Unknown fields are skipped using their remote field type metadata.
 
-Optimized serializers for subclasses of supported JDK map implementations write subclass field
-layers before map entry chunks:
+Generated serializers may split large generated methods and hoist serializers,
+field offsets, collection metadata, or map metadata. Those generated-code
+decisions must preserve the same object payload order.
 
-```
-| varuint32_small7: size | [comparator_ref] | varuint32_small7: num_class_layers |
-| class_layer_fields... | [chunk_1 | chunk_2 | ...] |
-```
+## Throwable Payloads
 
-- `comparator_ref` is present only for sorted-map subclasses.
-- `num_class_layers` is the exact number of subclass-owned field layers written after the map header
-  and before the entry chunks.
-- Readers must reject a payload whose `num_class_layers` does not match the local serializer's layer
-  count. These serializers do not carry per-layer class identity in the value payload, so mismatched
-  layers cannot be skipped safely.
-
-### JDK collection/map wrappers and views
-
-Java native mode may use specialized serializers for JDK collection/map wrappers and views. These
-serializers do not introduce a new collection/map protocol branch; they write ordinary object,
-collection, or map payloads in serializer-owned value slots.
-
-- Unmodifiable and synchronized wrappers keep the outer wrapper type metadata. The wrapper value
-  payload is the wrapped source collection or map written as a normal referencable object. Android
-  writers use public source implementations for that payload: `ArrayList`, `HashSet`, `TreeSet`,
-  `HashMap`, or `TreeMap`. Readers rewrap the source through `Collections.unmodifiable*` or
-  `Collections.synchronized*`.
-- Recognized sublist view classes keep their outer sublist type metadata and use a
-  serializer-local one-byte payload mode. Mode `0` writes visible elements as a normal collection
-  payload. Mode `1` writes view metadata as `offset`, `size`, and source list reference. Android
-  writers use mode `0`; JVM writers may use mode `1` when the view fields match the supported JDK
-  shape. Readers on Android and JVM must accept both modes.
-- `Collections.newSetFromMap` writes a backing-map payload. Android writers use `HashMap` backing
-  type metadata.
-- Immutable JDK collection serializers keep ordinary list/set/map payload semantics. Android readers
-  materialize public unmodifiable containers when JDK internal immutable constructors are not
-  available.
-
-Xlang mode uses the xlang collection/map protocol and does not encode Java wrapper or view internals.
-
-### Objects and structs
-
-Object values are encoded as:
+`Throwable` serializers preserve standard Java throwable state and
+subclass-owned fields:
 
+```text
+| stack_trace_ref | cause_ref | message_ref |
+| varuint32 suppressed_count | suppressed_ref... |
+| varuint32 extra_field_count | extra_field_name/value... |
+| varuint32_small7 num_class_layers |
+| class_layer_fields... |
 ```
-| ref meta | type meta | field data |
-```
-
-Field data is written by the serializer selected by the class info. For standard object
-serialization:
 
-- Fields are sorted deterministically using `DescriptorGrouper` order:
-  primitives, boxed primitives, built-ins, collections, maps, then other fields, with names sorted
-  within each category.
-- For compatible mode, `MetaSharedSerializer` uses ClassDef field metadata to read and skip
-  unknown fields.
-- For each field, the serializer uses field metadata (nullable, trackingRef, polymorphic) to decide
-  whether to write ref flags and/or type meta before the field value.
+`extra_field_count` is reserved for serializer-owned extension fields and is
+currently written as zero. `num_class_layers` must match the local throwable
+serializer layout on read.
 
-### Throwable values
+## Replacement and Java Serialization Hooks
 
-`Throwable` subclasses use a specialized payload that preserves stack trace, cause, message,
-suppressed exceptions, and subclass-owned fields:
+Java native mode supports serializer-owned handling for Java object replacement
+and Java serialization hooks:
 
-```
-| stack_trace_ref | cause_ref | message_string_ref |
-| varuint32: suppressed_count | suppressed_ref... |
-| varuint32: extra_field_count | extra_field_name/value... |
-| varuint32_small7: num_class_layers | class_layer_fields... |
-```
+- `writeReplace`/`readResolve` values use replacement metadata and payloads
+  owned by the replacement serializer.
+- JDK proxy and lambda stubs use their registered native stub IDs.
+- Types that require Java Object Serialization compatibility may be delegated to
+  serializers that reproduce the required Java semantics inside a Fory object
+  slot.
 
-- `extra_field_count` is reserved for serializer-owned extension fields and is currently written as
-  zero.
-- `num_class_layers` is the exact number of `Throwable` subclass field layers written after the
-  built-in Throwable state.
-- Readers must reject a payload whose `num_class_layers` does not match the local serializer's layer
-  count. The Throwable value payload does not carry per-layer class identity, so mismatched layers
-  cannot be skipped safely.
+These serializers still obey the stream header, reference slot, and type
+metadata rules in this document.
 
-### Extensions (EXT)
+## Unknown Classes
 
-Extension types are encoded by their registered serializer. Type meta is still written before the
-value as described above. The serializer is responsible for the value layout.
+When meta sharing is enabled and a reader does not have a local class for a
+remote ClassDef, Java may materialize an unknown-class placeholder using
+`NONEXISTENT_META_SHARED_ID`. The placeholder stores enough field data to
+preserve and copy the unknown value according to the unknown-class serializer.
+It does not make the unknown Java class available to user code.
 
-## Out-of-band buffers
+## Out-of-Band Buffers
 
-When a `BufferCallback` is provided, the oob flag is set in the header and serializers may emit
-buffer references instead of inline bytes (for example, large primitive arrays). The out-of-band
-buffer protocol is specific to the callback implementation; the main stream only contains
-references to those buffers.
+When the header out-of-band bit is set, serializers may write references to
+external buffers instead of writing all bytes inline. The callback defines the
+external buffer transport. The main stream remains a valid Fory stream
+containing references to those buffers at serializer-owned payload positions.
diff --git a/docs/specification/xlang_serialization_spec.md b/docs/specification/xlang_serialization_spec.md
index 1c9a424931..bb5b2c1598 100644
--- a/docs/specification/xlang_serialization_spec.md
+++ b/docs/specification/xlang_serialization_spec.md
@@ -646,8 +646,8 @@ Field names:
 
 Field order:
 
-Field order is implementation-defined. Decoders must match fields by name or tag ID rather than
-position. Fory uses a stable grouping and sorting order to produce deterministic TypeDefs.
+TypeDef field lists use the same ordering defined in [Field order](#field-order). Compatible
+decoders must still match fields by name or tag ID rather than relying only on position.
 
 ## Meta String
 
@@ -1469,10 +1469,21 @@ language-specific helper classes.
 
 For every field, compute a stable identifier used for ordering:
 
-- If a tag ID is configured (e.g., `@ForyField(id=...)`), use the tag ID as a decimal string.
+- If a non-negative tag ID is configured (e.g., `@ForyField(id=...)`), use the tag ID.
 - Otherwise, use the field name converted to `snake_case`.
 
-Tag IDs must be unique within a type; duplicate tag IDs are invalid.
+Configured tag IDs must be non-negative. A negative configured tag ID is invalid; languages may
+use a negative value only as a default or internal sentinel for "no tag ID configured", which falls
+back to the `snake_case` field name and is not a tag ID. Tag IDs must be unique within a type;
+duplicate tag IDs are invalid.
+
+Field identifiers compare as follows:
+
+1. If both fields have tag IDs, compare the IDs numerically.
+2. If only one field has a tag ID, the tagged field sorts first.
+3. If neither field has a tag ID, compare the `snake_case` names lexicographically.
+4. If fields still compare equal, use deterministic language-local tie-breakers such as declaring
+   class name, original field name, or original field index.
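
The first three comparison rules above can be sketched as a Java `Comparator`. The class is hypothetical and uses `-1` as an internal "no tag ID configured" sentinel, in line with the paragraph on negative values. Rule 4 tie-breakers are omitted, and `String.compareTo` (UTF-16 code unit order) stands in for "lexicographically", which is an assumption about the intended string order.

```java
import java.util.Comparator;

// Hypothetical model of a field identifier, not part of the Fory API.
final class FieldIdSketch {
  static final int NO_TAG = -1; // internal sentinel, never a real tag ID

  final int tagId;
  final String snakeName;

  FieldIdSketch(int tagId, String snakeName) {
    this.tagId = tagId;
    this.snakeName = snakeName;
  }

  boolean hasTag() {
    return tagId >= 0;
  }

  static final Comparator<FieldIdSketch> ORDER = (a, b) -> {
    if (a.hasTag() && b.hasTag()) {
      return Integer.compare(a.tagId, b.tagId); // rule 1: numeric compare
    }
    if (a.hasTag()) {
      return -1; // rule 2: tagged field sorts first
    }
    if (b.hasTag()) {
      return 1;
    }
    return a.snakeName.compareTo(b.snakeName); // rule 3: compare names
  };
}
```

Comparing tag IDs numerically rather than as decimal strings means tag `2` sorts before tag `10`, which a string comparison would reverse.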
 
 ##### Step 2: Group assignment
 
@@ -1480,13 +1491,9 @@ Assign each field to exactly one group in the following order:
 
 1. **Primitive (non-nullable)**: primitive or boxed numeric/boolean types with `nullable=false`.
 2. **Primitive (nullable)**: primitive or boxed numeric/boolean types with `nullable=true`.
-3. **Built-in (non-container)**: internal type IDs that are not user-defined and not UNKNOWN,
-   excluding collections and maps (for example: STRING, TIME types, UNION/TYPED_UNION/NAMED_UNION,
-   primitive arrays).
-4. **Collection**: list/set/object-array fields. Non-primitive arrays are treated as LIST for
-   ordering purposes.
-5. **Map**: map fields.
-6. **Other**: user-defined enum/struct/ext and UNKNOWN types.
+3. **Non-primitive**: every other field, including strings, time/date/duration/decimal/binary
+   values, unions, primitive arrays, collections, maps, enums, structs, ext/user-defined types,
+   UNKNOWN fields, object arrays, and all other non-primitive schemas.
 
 ##### Step 3: Intra-group ordering
 
@@ -1498,16 +1505,11 @@ Within each group, apply the following sort keys in order until a difference is
    types (`VARINT32`, `VAR_UINT32`, `VARINT64`, `VAR_UINT64`, `TAGGED_INT64`, `TAGGED_UINT64`).
 2. **Primitive size** (descending): 8-byte > 4-byte > 2-byte > 1-byte.
 3. **Internal type ID** (ascending) as a tie-breaker for equal sizes.
-4. **Field identifier** (lexicographic ascending).
-
-**Built-in / Collection / Map groups (3-5):**
-
-1. **Internal type ID** (ascending).
-2. **Field identifier** (lexicographic ascending).
+4. **Field identifier** using the comparator from Step 1.
 
-**Other group (6):**
+**Non-primitive group (3):**
 
-1. **Field identifier** (lexicographic ascending).
+1. **Field identifier** using the comparator from Step 1.
 
 If two fields still compare equal after the rules above, preserve a deterministic order by
 comparing declaring class name and then the original field name. This tie-breaker should be
@@ -1517,8 +1519,9 @@ reachable only in invalid schemas (e.g., duplicate tag IDs).
 
 - The ordering above is used for serialization order and TypeDef field lists. Schema hashes use
   the field identifier ordering described in the schema hash section.
-- Collection/map normalization is required so peers with different concrete types (e.g.,
-  `List` vs `Collection`) still agree on ordering.
+- Non-primitive type IDs and codec categories must not affect field order. Implementations may keep
+  internal categories to preserve optimized serializers and generated code paths, but the categories
+  are not ordering keys.
 - The compressed numeric rule is critical for cross-language consistency: compressed integer
   fields are always placed after all fixed-width integer fields.
 
@@ -1535,7 +1538,7 @@ MurmurHash3 x64_128 of the struct fingerprint string:
 
 - For each field, build `<field_id_or_name>,<field_type_fingerprint>;`.
 - Field identifier is the tag ID if present, otherwise the snake_case field name.
-- Sort by field identifier lexicographically before concatenation.
+- Sort by the field identifier comparator from [Field order](#field-order) before concatenation.
 - `field_type_fingerprint` is recursive:
   - Leaf: `<type_id>,<ref>,<nullable>`
   - `LIST` / `SET`: `<type_id>,<ref>,<nullable>[<element_fingerprint>]`

