This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git


The following commit(s) were added to refs/heads/main by this push:
     new ee7d6b80e fix(javascript): align TypeMeta preamble constants with python/java/rust/go xlang bindings (#3603)
ee7d6b80e is described below

commit ee7d6b80e9a0383793e4d4abb93fca4363465b7e
Author: Emrul <[email protected]>
AuthorDate: Wed Apr 22 07:45:37 2026 +0100

    fix(javascript): align TypeMeta preamble constants with python/java/rust/go xlang bindings (#3603)
    
    ## Why?
    
    `@apache-fory/core`'s `NAMED_COMPATIBLE_STRUCT` TypeMeta preamble is not
    byte-compatible with pyfory, fory-java, fory-rust, or fory-go. For the
    same logical struct, the JavaScript binding emits an 8-byte int64 header
    that no other binding can read. I noticed this as part of issue #3602.
    
    With this patch my cross-language tests pass, but I don't know if this
    is entirely correct — I'd appreciate a deeper review from someone who
    knows the TypeMeta spec better than I do (especially around the
    signed-vs-unsigned hash interpretation in `prependHeader`).
    
    ## What does this PR do?
    
    Aligns four constants / behaviours in
    `javascript/packages/core/lib/meta/TypeMeta.ts` with what every other
    xlang binding does at 0.17:
    
    | Constant / behaviour | JS before | python / java / rust / go | Reference |
    |----------------------|-----------|---------------------------|-----------|
    | `NUM_HASH_BITS` | `41` | `50` | `python/pyfory/meta/typedef.py:37`, `java/.../meta/TypeDef.java:77`, `rust/fory-core/src/meta/type_meta.rs:76`, `go/fory/type_def.go:35` |
    | `COMPRESS_META_FLAG` | `1n << 63n` | `1 << 9` | same files |
    | `HAS_FIELDS_META_FLAG` | `1n << 62n` | `1 << 8` | same files |
    | hash read in `prependHeader` | unsigned `BigInt` built from two `uint32` halves via `getUint32(0, false) << 32n \| getUint32(4, false)` | signed `int64` | pyfory unpacks `int64_t[0]`, fory-java `murmurhash3_x64_128(...)[0]` returns `long`, rust `.0 as i64` |
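    
    As a rough illustration of the layout these constants imply, the sketch
    below splits a 64-bit header into its parts. The `describeHeader` helper
    and the `HeaderFields` shape are hypothetical names used only here; the
    bit positions are taken from the table, and anything beyond them (for
    example how sizes above 0xFF are encoded) is deliberately left out.
    
    ```typescript
    // Constants as listed in the table above (post-patch values).
    const META_SIZE_MASK = 0xFFn;            // low 8 bits carry metaSize
    const HAS_FIELDS_META_FLAG = 1n << 8n;   // bit 8
    const COMPRESS_META_FLAG = 1n << 9n;     // bit 9
    const NUM_HASH_BITS = 50n;               // hash occupies the top 50 bits
    
    // Hypothetical result shape, for illustration only.
    interface HeaderFields {
      metaSize: number;
      hasFieldsMeta: boolean;
      compressed: boolean;
      hashBits: bigint;
    }
    
    // Hypothetical helper: pull the individual fields back out of a header.
    function describeHeader(header: bigint): HeaderFields {
      return {
        metaSize: Number(header & META_SIZE_MASK),
        hasFieldsMeta: (header & HAS_FIELDS_META_FLAG) !== 0n,
        compressed: (header & COMPRESS_META_FLAG) !== 0n,
        hashBits: header >> (64n - NUM_HASH_BITS),
      };
    }
    
    // A made-up header: hash bits = 5, fields-meta flag set, metaSize = 30.
    const sampleHeader = (5n << 14n) | HAS_FIELDS_META_FLAG | 30n;
    const sampleFields = describeHeader(sampleHeader);
    console.log(sampleFields);
    ```
    
    With the pre-patch flag positions (bits 63 and 62), the same decode
    would read both flags from the wrong bits, which is why the preamble
    was unreadable to the other bindings.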
    
    On the hash read specifically: reading the same 8 bytes as unsigned
    `BigInt` never produces a negative value, so the subsequent `abs()` is
    effectively a no-op. Whenever the hash's high bit is set, the resulting
    header diverges from what the other bindings emit for the same struct.
    The patch uses `hash.getBigInt64(0, false)` (signed read) followed by
    explicit arbitrary-precision `abs()` + 63-bit mask, mirroring pyfory's
    `abs(hash) & 0x7FFFFFFFFFFFFFFF`.
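    
    A minimal standalone sketch of that divergence, assuming the same
    shift / abs / mask sequence in both paths so that only the signedness
    of the read differs (the pre-patch code did not apply the 63-bit mask
    and used `NUM_HASH_BITS = 41`). `hashHeaderBits` and the sample byte
    arrays are hypothetical and not part of the binding:
    
    ```typescript
    const NUM_HASH_BITS = 50;
    const MASK_63 = 0x7FFFFFFFFFFFFFFFn;
    
    // Hypothetical helper: derive the hash portion of the header from the
    // first 8 bytes of a hash, read either as signed int64 or as unsigned.
    function hashHeaderBits(hash: DataView, signed: boolean): bigint {
      let header = signed
        ? hash.getBigInt64(0, false)
        : (BigInt(hash.getUint32(0, false)) << 32n) | BigInt(hash.getUint32(4, false));
      header = header << BigInt(64 - NUM_HASH_BITS);
      if (header < 0n) {
        header = -header; // only reachable on the signed path
      }
      return header & MASK_63;
    }
    
    // Made-up hash bytes: one with the high bit set, one without.
    const highBitSet = new DataView(new Uint8Array([0xff, 0, 0, 0, 0, 0, 0, 0x01]).buffer);
    const highBitClear = new DataView(new Uint8Array([0x00, 0, 0, 0, 0, 0, 0, 0x01]).buffer);
    
    const signedHigh = hashHeaderBits(highBitSet, true);
    const unsignedHigh = hashHeaderBits(highBitSet, false);
    const signedLow = hashHeaderBits(highBitClear, true);
    const unsignedLow = hashHeaderBits(highBitClear, false);
    
    console.log(signedHigh.toString(16));   // 7fffffffffffc000
    console.log(unsignedHigh.toString(16)); // 4000
    console.log(signedLow === unsignedLow); // true
    ```
    
    The two reads agree when the high bit is clear and disagree when it is
    set, which is consistent with the mismatch only surfacing for some
    structs.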
    
    Empirical reproduction (fory-core 0.17.0 on every binding, matching
    config `xlang=true, ref=true, compatible=true`, NAMED_COMPATIBLE_STRUCT
    via `(namespace, typename)` registration):
    
    ```python
    # python
    import pyfory, dataclasses
    @dataclasses.dataclass
    class Point:
        x: pyfory.int32 = 0
        y: pyfory.int32 = 0
    f = pyfory.Fory(xlang=True, ref=True, compatible=True)
    f.register_type(Point, namespace='demo', typename='Point')
    print(f.serialize(Point(x=10, y=20)).hex(' '))
    ```
    
    ```java
    // java
    Fory fory = Fory.builder()
        .withLanguage(Language.XLANG)
        .withRefTracking(true)
        .withCompatibleMode(CompatibleMode.COMPATIBLE)
        .build();
    fory.register(Point.class, "demo", "Point");
    byte[] out = fory.serialize(new Point(10, 20));
    ```
    
    ```typescript
    // javascript
    const fory = new Fory({ ref: true, compatible: true });
    const ti = Type.struct({ namespace: 'demo', typeName: 'Point' },
      { x: Type.varInt32(), y: Type.varInt32() },
      { withConstructor: true });
    ti.initMeta(Point);
    const reg = fory.register(Point);
    console.log(Array.from(reg.serialize({ x: 10, y: 20 })));
    ```
    
    Before this PR:
    - python / java / rust / go all produce
    `02 00 1e 00 10 01 d2 92 ce 5f 2b 73 22 0d 0c 8c 70 13 bd c8 6c c0 40 05
    5c 40 05 60 14 28` (30 bytes)
    - javascript produces
    `02 ff 1e 00 10 00 00 ad 86 c0 98 d5 23 15 31 12 92 1c d0 2d f6 53 04 e9
    2e c4 92 7b 9b 22 00 58 07 …`
    The field-descriptor and value bytes align once you get past the
    preamble, but the 8-byte int64 header and the byte-1 reference flag
    diverge. `pyfory.deserialize(jsBytes)` silently returns `Point(x=0,
    y=0)` (every field unmatched, falls through to defaults);
    `fory.deserialize(jsBytes)` in Java throws `DeserializationException:
    read objects are: [null]`.
    
    After this PR: javascript produces byte-identical output to python /
    java / rust / go, and each binding can decode every other binding's
    bytes. Ran manual round-trip against both pyfory 0.17 and fory-java 0.17
    with a Point struct and with a richer struct containing strings, a
    `list<string>`, and int/float fields — both succeed.
    
    ## Related issues
    
    - #3602
    
    ## AI Contribution Checklist
    
    - [x] Substantial AI assistance was used in this PR: **no** (a couple of
    lines of constant alignment; no architectural or API decisions)
    - [ ] If `yes`, I included a completed AI Contribution Checklist in this
    PR description and the required `AI Usage Disclosure`.
    - [ ] If `yes`, my PR description includes the required `ai_review`
    summary and screenshot evidence of the final clean AI review results
    from both fresh reviewers on the current PR diff or current HEAD after
    the latest code changes.
    
    ## Does this PR introduce any user-facing change?
    
    - [x] Does this PR introduce any public API change? — **No.**
    - [x] Does this PR introduce any binary protocol compatibility change? —
    **Yes:** this fixes the JavaScript binding's TypeMeta preamble so it
    matches the canonical wire format the other bindings have been
    producing. Existing `@apache-fory/core` clients communicating only with
    each other will continue to work (same-binding output still
    round-trips). Any persisted JS-produced bytes, or in-flight messages
    relying on JS-specific preamble, will no longer be readable. Given
    cross-binding interop was broken on 0.17 anyway, practical impact should
    be small.
    
    ## Benchmark
    
    Not applicable — constant alignment with no hot-path change.
---
 javascript/packages/core/lib/meta/TypeMeta.ts | 38 ++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/javascript/packages/core/lib/meta/TypeMeta.ts b/javascript/packages/core/lib/meta/TypeMeta.ts
index 08473251d..83f832e17 100644
--- a/javascript/packages/core/lib/meta/TypeMeta.ts
+++ b/javascript/packages/core/lib/meta/TypeMeta.ts
@@ -36,11 +36,18 @@ const pkgDecoder = new MetaStringDecoder(".", "_");
 const typeNameEncoder = new MetaStringEncoder("$", ".");
 const typeNameDecoder = new MetaStringDecoder("$", ".");
 
-// Constants from Java implementation
-const COMPRESS_META_FLAG = 1n << 63n;
-const HAS_FIELDS_META_FLAG = 1n << 62n;
-const META_SIZE_MASKS = 0xFF; // 22 bits
-const NUM_HASH_BITS = 41;
+// Constants shared with python/java/rust/go 0.17+. See e.g.
+// python/pyfory/meta/typedef.py, java/.../TypeDef.java,
+// rust/fory-core/src/meta/type_meta.rs, go/fory/type_def.go. The
+// JavaScript binding previously placed COMPRESS_META_FLAG at bit 63
+// and HAS_FIELDS_META_FLAG at bit 62, and used NUM_HASH_BITS = 41,
+// producing an 8-byte TypeMeta preamble that no other xlang binding
+// could decode. Aligning with the constants every other binding uses
+// so NAMED_COMPATIBLE_STRUCT output is byte-compatible cross-binding.
+const COMPRESS_META_FLAG = 1n << 9n;
+const HAS_FIELDS_META_FLAG = 1n << 8n;
+const META_SIZE_MASKS = 0xFF;
+const NUM_HASH_BITS = 50;
 const BIG_NAME_THRESHOLD = 0b111111;
 
 const PRIMITIVE_TYPE_IDS = [
@@ -645,9 +652,26 @@ export class TypeMeta {
   private prependHeader(buffer: Uint8Array, isCompressed: boolean, hasFieldsMeta: boolean): Uint8Array {
     const metaSize = buffer.length;
     const hash = x64hash128(buffer, 47);
-    let header = BigInt(hash.getUint32(0, false)) << 32n | BigInt(hash.getUint32(4, false));
+    // Read the high 64 bits of the 128-bit MurmurHash3 as a SIGNED
+    // int64 to match pyfory (`hash_buffer()[0]` unpacks `int64_t[0]`),
+    // java (`murmurhash3_x64_128(...)[0]` returns `long`), and rust
+    // (`.0 as i64`). Reading the same bytes as unsigned via two
+    // uint32 halves produces a different value after
+    // `<< (64 - NUM_HASH_BITS); abs()` whenever the hash's high bit
+    // is set -- unsigned BigInt can't go negative, so its sign-check
+    // is always false and the abs is a no-op. Signed int64 here
+    // matches the canonical behaviour of the other xlang bindings.
+    let header = hash.getBigInt64(0, false);
     header = header << BigInt(64 - NUM_HASH_BITS);
-    header = header >= 0n ? header : -header; // Math.abs for bigint
+    // Arbitrary-precision abs + mask to 63 bits, matching pyfory's
+    // `abs(hash) & 0x7FFFFFFFFFFFFFFF`. The mask clears the sign bit
+    // so the COMPRESS_META_FLAG (bit 9) / HAS_FIELDS_META_FLAG
+    // (bit 8) / metaSize (low 8 bits) ORs below don't collide with
+    // residual hash bits.
+    if (header < 0n) {
+      header = -header;
+    }
+    header = header & 0x7FFFFFFFFFFFFFFFn;
 
     if (isCompressed) {
       header |= COMPRESS_META_FLAG;

