This is an automated email from the ASF dual-hosted git repository.

junrushao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git


The following commit(s) were added to refs/heads/main by this push:
     new 281f7e1  doc: clarify structural_eq semantics and py_class eq/hash 
interaction (#566)
281f7e1 is described below

commit 281f7e114fc46d262ccb57c18d19c7ef7b0a018c
Author: Yaoyao Ding <[email protected]>
AuthorDate: Fri Apr 24 01:20:07 2026 -0400

    doc: clarify structural_eq semantics and py_class eq/hash interaction (#566)
    
    Rework the const-tree section in docs/concepts/structural_eq_hash.rst:
    the "Why not use it everywhere?" example was bad — the drawn trees had
    different pointers for their `+` nodes, so the const-tree pointer
    shortcut would never have fired. Replaced it with a shared-subtree
    scenario under `map_free_vars=True` that actually exercises the bug, and
    expanded "When is this safe?" with a third condition (interning payoff)
    plus a rule of thumb for choosing between `tree` and `const-tree`.
    
    Expand the `"var"` section to document that non-ignored fields on a var
    type (e.g. a type annotation) are compared on first encounter but
    subsequent occurrences short-circuit through the bijective mapping —
    i.e. the mapping is sticky.
    
    Clarify the py_class docstring so readers understand that
    `structural_eq` is independent of `eq` / `unsafe_hash`: the default
    `__eq__` / `__hash__` remain pointer-based (inherited from `Object`),
    while `structural_equal` / `structural_hash` are driven separately by
    the kind metadata registered in C++.
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
    Co-authored-by: gemini-code-assist[bot] 
<176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 docs/concepts/structural_eq_hash.rst   | 306 +++++++++++++++++++++++++++------
 python/tvm_ffi/dataclasses/py_class.py |  48 +++++-
 2 files changed, 297 insertions(+), 57 deletions(-)

diff --git a/docs/concepts/structural_eq_hash.rst 
b/docs/concepts/structural_eq_hash.rst
index 63dbd4c..55614c3 100644
--- a/docs/concepts/structural_eq_hash.rst
+++ b/docs/concepts/structural_eq_hash.rst
@@ -33,6 +33,18 @@ The behavior is controlled by two layers of annotation on
 This document explains what each annotation means, when to use it, and how
 they compose.
 
+.. note::
+
+   Structural equality and hashing **never call Python-level** ``__eq__``
+   or ``__hash__``.  ``structural_equal`` / ``structural_hash`` dispatch
+   entirely through a C++ walker driven by the kind metadata registered
+   via ``structural_eq=``; the Python ``a == b`` / ``hash(a)`` dunders
+   are independent (they default to pointer identity and handle address,
+   inherited from ``Object``).  To customize how a specific type
+   participates in *structural* comparison, register the
+   :ref:`sequal-shash` hooks described below — do **not** override
+   ``__eq__`` or ``__hash__``.
+
 
 Type-Level Annotation
 ---------------------
@@ -160,61 +172,103 @@ object, they are guaranteed equal — skip the field 
comparison."
 This is purely a **performance optimization**. The only behavioral difference
 from ``"tree"`` is that pointer identity short-circuits to ``True``.
 
-When is this safe?
-~~~~~~~~~~~~~~~~~~
+When is this safe (and worth it)?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When the type satisfies two conditions:
+Three conditions decide whether ``"const-tree"`` is the right choice:
 
 1. **Immutable** — content doesn't change after construction, so same-pointer
    always implies same-content.
 2. **No transitive** ``"var"`` **children** — skipping field traversal won't
    cause variable mappings to be missed (see :ref:`var-kind` for why this
    matters).
+3. **Sharing is common** — instances are interned or canonicalized, so the
+   same pointer actually appears on both sides of real comparisons. Without
+   interning, the shortcut never fires and ``"const-tree"`` behaves like
+   ``"tree"`` with a dead branch.
+
+Conditions 1 and 2 are correctness requirements: violating them is a bug,
+not a performance regression. Condition 3 is the payoff — ``"const-tree"``
+is worth reaching for only when it will actually save work.
+
+A useful rule of thumb: *does the system go out of its way to make two
+equal instances of this type share a pointer?* Canonical types, interned
+constants, cached shapes, and op metadata usually do. General expression
+and statement nodes usually don't — and also fail condition 2. Prefer
+``"const-tree"`` for the type / attribute / metadata layer of the IR, not
+the expression / statement layer.
+
+Note also that condition 2 is a *whole-subgraph* property: once a field
+holds an ``Expr`` (which may one day contain a ``Var``), the annotation
+silently commits the type to that invariant — a later refactor embedding
+a ``Var`` becomes a correctness break rather than a local change.
 
 Why not use it everywhere?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Most IR nodes are immutable, but many transitively contain variables
-(e.g., ``x + 1`` contains the ``"var"`` node ``x``). If the pointer
-shortcut fires, the traversal skips ``x``, and a variable mapping that should
-have been established is silently missed.
+(e.g., ``x + 1`` contains the ``"var"`` node ``x``). The pointer shortcut
+fires only when both sides of a comparison reference the **same object** —
+but when that sharing exists, skipping traversal also skips the variable
+occurrences inside, and mappings that should have been recorded are
+silently missed.
 
-The following diagram shows the danger. Suppose the ``+`` node were
-incorrectly annotated as ``"const-tree"``. When comparing two trees that
-share a sub-expression, the pointer shortcut fires on the shared node, and
-the ``"var"`` ``x`` inside it is never visited — so no ``x ↔ y`` mapping
-is recorded:
+Suppose the ``+`` node were incorrectly annotated as ``"const-tree"``, and
+consider comparing two tuples that share the ``+`` subtree via pointer
+identity:
 
-.. mermaid::
+.. code-block:: text
 
-   graph TD
-       subgraph "lhs"
-           LT["(_, _)"]
-           LE["x + 1"]
-           LX["x : var"]
-           LT -->|".0"| LE
-           LT -->|".1"| LX
-           LE -->|".lhs"| LX2["x"]
-       end
+   shared = x + 1                     # pointer P, contains var x
 
-       subgraph "rhs"
-           RT["(_, _)"]
-           RE["y + 1"]
-           RY["y : var"]
-           RT -->|".0"| RE
-           RT -->|".1"| RY
-           RE -->|".lhs"| RY2["y"]
-       end
+   lhs = (shared, x)                  # .0 = P, .1 = var x
+   rhs = (shared, y)                  # .0 = P, .1 = var y  (different Var)
 
-       LE -. "const-tree would skip here<br/>(misses x ↔ y mapping)" .-> RE
-       LX -. "Later comparison fails:<br/>x has no recorded mapping" .-> RY
+   structural_equal(lhs, rhs, map_free_vars=True)
 
-       style LE fill:#fff3cd
-       style RE fill:#fff3cd
-       style LX fill:#f8d7da
-       style RY fill:#f8d7da
-       style LX2 fill:#f8d7da
-       style RY2 fill:#f8d7da
+With the ``+`` annotated as plain ``"tree"`` (correct):
+
+- ``.0``: traverse into ``shared`` on both sides, visit ``x`` at ``.lhs``,
+  record the mapping ``x ↔ x``.
+- ``.1``: look up ``x`` → maps to ``x``, but rhs is ``y``. **NOT EQUAL** ✓
+
+With the ``+`` annotated as ``"const-tree"`` (the bug):
+
+- ``.0``: pointer shortcut fires on ``shared`` (both sides reference P).
+  Fields are skipped, ``x`` inside is never visited, no mapping is recorded.
+- ``.1``: compare ``x`` vs ``y``. No existing mapping, and
+  ``map_free_vars=True`` lets a new one be recorded as ``x ↔ y``.
+  **EQUAL** ✗ (wrong)
+
+The following diagram illustrates the shared structure. The ``+`` node
+(``shared``) has two incoming ``.0`` edges — one from each side — which
+is exactly the situation in which the pointer shortcut fires:
+
+.. mermaid::
+
+   graph TD
+       LT["lhs: (_, _)"]
+       RT["rhs: (_, _)"]
+       ADD["shared = x + 1<br/>const-tree<br/><i>same pointer on both 
sides</i>"]
+       X["x : var"]
+       ONE["1"]
+       Y["y : var"]
+
+       LT -->|".0"| ADD
+       RT -->|".0"| ADD
+       LT -->|".1"| X
+       RT -->|".1"| Y
+       ADD -->|".lhs"| X
+       ADD -->|".rhs"| ONE
+
+       style ADD fill:#fff3cd
+       style X fill:#f8d7da
+       style Y fill:#f8d7da
+
+The same failure mode arises whenever a shared subtree containing a
+``"var"`` is compared inside any definition region (e.g., the body of a
+``Lambda`` whose params field is ``structural_eq="def"``), not only under
+``map_free_vars=True``.
 
 
 ``"dag"`` — Sharing-Aware Comparison
@@ -337,11 +391,15 @@ Full comparison: ``"tree"`` vs ``"dag"``
 
    @py_class(structural_eq="var")
    class Var(Object):
-       name: str
+       name: str = field(structural_eq="ignore")   # alpha-equivalent vars 
differ in name
+       type: Type                                  # participates in equality
 
 **Meaning**: "This is a variable. Two variables are equal if they are
-**bound in corresponding positions**, not if they have the same name or
-content."
+**bound in corresponding positions**, not if they have the same name."
+The ``name`` field is almost always marked ``structural_eq="ignore"``
+because alpha-equivalent variables have different names. Other fields
+such as ``type`` *are* compared — but only at the binding site (see
+:ref:`var-fields`).
 
 The problem
 ~~~~~~~~~~~
@@ -350,7 +408,7 @@ The problem
 
    fun x → x + 1       should equal       fun y → y + 1
 
-Variables are not defined by their content (name, type annotation). They
+Variables are not defined by their content, such as their name. They
 are defined by **where they are introduced** and **how they are used**.
 ``x`` and ``y`` above are interchangeable because they occupy the same
 binding position and are used in the same way.
@@ -399,6 +457,43 @@ The following diagram traces the comparison of two 
alpha-equivalent functions:
    # Bare expressions, no enclosing function:
    x + 1  vs  y + 1   →  NOT EQUAL (no definition region, different pointers)
 
+.. _var-fields:
+
+Fields and the sticky mapping
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A ``"var"`` type still has fields, and non-ignored fields *are* compared —
+but only on the **first** encounter of a var pair. Once a mapping is
+recorded, subsequent occurrences look up the mapping and skip field
+comparison entirely.
+
+Take the ``Var`` declaration from the top of this section: ``name`` is
+ignored, but ``type`` is not. The first time a pair of vars is seen in
+a definition region, their ``type`` fields are compared and the mapping
+is only established if they match. After that, the mapping is **sticky**
+— later occurrences trust the correspondence regardless of those fields:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 55 45
+
+   * - Scenario
+     - Result
+   * - ``Var("x", int)`` vs ``Var("y", int)`` on first encounter
+     - Fields match → mapping ``x ↔ y`` recorded → **Equal**
+   * - ``Var("x", int)`` vs ``Var("y", float)`` on first encounter
+     - Fields differ → **Not equal**
+   * - ``Var("x", int)`` vs ``Var("y", float)`` when ``x ↔ y`` already mapped
+     - Lookup succeeds → **Equal** (types are *not* rechecked)
+
+For IRs where type consistency is part of well-formedness, this is
+usually sufficient: a well-formed program uses each var with a consistent
+type at every occurrence, so the first-encounter check at the binding
+site covers the rest. If you truly want types re-verified at every use,
+they don't belong on the ``"var"`` node — lift them into the surrounding
+expression/statement node where they participate in normal ``"tree"``
+comparison.
+
 Full comparison: with and without definition regions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -549,20 +644,125 @@ Use for:
 - **Any field that introduces names into scope**
 
 
-Custom Equality and Hashing
-----------------------------
+.. _sequal-shash:
+
+Custom Equality and Hashing: ``__s_equal__`` / ``__s_hash__``
+--------------------------------------------------------------
+
+For types where the default field-by-field traversal is insufficient (for
+example, fields that need to be visited in a specific order, cross-field
+invariants, or sub-values that need a different ``def_region`` setting
+than the declarative field flags allow), you can register custom
+callbacks as **type attributes**:
+
+- ``__s_equal__`` — custom structural equality logic.
+- ``__s_hash__`` — custom structural hashing logic.
+
+These are the *only* supported way to override structural comparison.
+``structural_equal`` / ``structural_hash`` never consult Python
+``__eq__`` / ``__hash__`` — those dunders serve a separate purpose
+(``==`` and ``hash()``, which default to pointer identity).
+
+When either hook is registered, it replaces the default field iteration
+for that type.  All kind-specific machinery (``"dag"`` memoization,
+``"var"`` mapping, the pointer shortcut of ``"const-tree"``, etc.) is
+still managed by the framework — the custom callback only controls
+*which* sub-values are compared or hashed, *in what order*, and *with
+what* ``def_region`` flag.
 
-For types where the default field-by-field traversal is insufficient, you
-can register custom callbacks as type attributes:
+Signatures
+~~~~~~~~~~
 
-- **``__s_equal__``** — custom equality logic
-- **``__s_hash__``** — custom hashing logic
+``__s_equal__``:
 
-These are registered per type via ``EnsureTypeAttrColumn``. When present,
-they replace the default field iteration. The system still manages all
-kind-specific logic (``"dag"`` memoization, ``"var"`` mapping, etc.) — the
-custom callback only controls which sub-values to compare/hash and in what
-order.
+.. code-block:: text
+
+   (self, other, eq_cb) -> bool
+
+   eq_cb(lhs, rhs, def_region: bool, field_name: str) -> bool
+
+``__s_hash__``:
+
+.. code-block:: text
+
+   (self, init_hash: int, hash_cb) -> int
+
+   hash_cb(value, init_hash: int, def_region: bool) -> int
+
+The ``def_region`` flag on each recursive call controls whether the
+sub-value is compared/hashed in a definition region (enabling new
+variable bindings, just like ``field(structural_eq="def")``).  The
+``field_name`` argument on ``eq_cb`` is used only for mismatch path
+reporting from :py:func:`~tvm_ffi.get_first_structural_mismatch`.
+
+Example (Python)
+~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+   @py_class(structural_eq="tree")
+   class Lambda(Object):
+       params: list
+       body: Any
+       comment: str  # not part of identity, but also not iterated below
+
+       def __s_equal__(self, other, eq_cb):
+           # params is a definition region; body is not.
+           if not eq_cb(self.params, other.params, True, "params"):
+               return False
+           if not eq_cb(self.body, other.body, False, "body"):
+               return False
+           return True
+
+       def __s_hash__(self, init_hash, hash_cb):
+           h = hash_cb(self.params, init_hash, True)
+           h = hash_cb(self.body, h, False)
+           return h
+
+The two methods must agree: if ``__s_equal__`` considers two instances
+equal, ``__s_hash__`` must produce the same hash for them.
+
+Example (C++)
+~~~~~~~~~~~~~
+
+.. code-block:: c++
+
+   class MyNodeObj : public Object {
+    public:
+     Array<Var> params;
+     Array<ObjectRef> body;
+
+     bool SEqual(const MyNodeObj* other,
+                 ffi::TypedFunction<bool(AnyView, AnyView, bool, AnyView)> 
cmp) const {
+       if (!cmp(params, other->params, /*def_region=*/true, "params")) return 
false;
+       if (!cmp(body, other->body, /*def_region=*/false, "body")) return false;
+       return true;
+     }
+
+     int64_t SHash(int64_t init_hash,
+                   ffi::TypedFunction<int64_t(AnyView, int64_t, bool)> hash) 
const {
+       int64_t h = hash(params, init_hash, /*def_region=*/true);
+       h = hash(body, h, /*def_region=*/false);
+       return h;
+     }
+
+     static void RegisterReflection() {
+       namespace refl = tvm::ffi::reflection;
+       refl::ObjectDef<MyNodeObj>()
+           .def_ro("params", &MyNodeObj::params)
+           .def_ro("body", &MyNodeObj::body);
+       refl::TypeAttrDef<MyNodeObj>()
+           .def(refl::type_attr::kSEqual, &MyNodeObj::SEqual)
+           .def(refl::type_attr::kSHash, &MyNodeObj::SHash);
+     }
+
+     static constexpr TVMFFISEqHashKind _type_s_eq_hash_kind = 
kTVMFFISEqHashKindTreeNode;
+     TVM_FFI_DECLARE_OBJECT_INFO_FINAL("my.Node", MyNodeObj, Object);
+   };
+
+See :cpp:var:`tvm::ffi::reflection::type_attr::kSEqual` and
+:cpp:var:`tvm::ffi::reflection::type_attr::kSHash` in
+``include/tvm/ffi/reflection/accessor.h`` for the full reference.
 
 
 All Kinds at a Glance
@@ -683,7 +883,7 @@ source location:
 
    @py_class(structural_eq="var")
    class Var(Object):
-       name: str
+       name: str = field(structural_eq="ignore")
 
    @py_class(structural_eq="singleton")
    class Op(Object):
diff --git a/python/tvm_ffi/dataclasses/py_class.py 
b/python/tvm_ffi/dataclasses/py_class.py
index a3e249f..0a44aa0 100644
--- a/python/tvm_ffi/dataclasses/py_class.py
+++ b/python/tvm_ffi/dataclasses/py_class.py
@@ -543,12 +543,21 @@ def py_class(  # noqa: PLR0913
     repr
         If True (default), generate ``__repr__``.
     eq
-        If True, generate ``__eq__`` and ``__ne__``.
+        If True, generate ``__eq__`` and ``__ne__`` using recursive
+        field-wise content equality.  Default False, in which case the
+        class inherits the pointer-based ``__eq__`` from ``Object``
+        (``a == b`` is equivalent to ``a.same_as(b)``).  If the class
+        body defines ``__eq__`` or ``__ne__``, the generator is skipped
+        and the user definition is preserved.
     order
         If True, generate ``__lt__``, ``__le__``, ``__gt__``, ``__ge__``.
         Requires ``eq=True``.
     unsafe_hash
-        If True, generate ``__hash__`` (unsafe for mutable objects).
+        If True, generate ``__hash__`` using recursive field-wise
+        content hashing (unsafe for mutable objects).  Default False,
+        in which case the class inherits the handle-address ``__hash__``
+        from ``Object``.  A user-defined ``__hash__`` in the class body
+        is preserved.
     match_args
         If True (default), set ``__match_args__`` to a tuple of the
         positional ``__init__`` field names (``init=True`` and not
@@ -558,8 +567,8 @@ def py_class(  # noqa: PLR0913
         If True, all fields are keyword-only in ``__init__`` by default.
     structural_eq
         Structural equality/hashing kind for this type.  Controls how
-        instances participate in ``StructuralEqual`` and ``StructuralHash``.
-        Valid values are:
+        instances participate in ``structural_equal`` and
+        ``structural_hash``.  Valid values are:
 
         - ``None`` (default): structural comparison is not supported.
         - ``"tree"``: content-based comparison, the safe default for
@@ -571,6 +580,11 @@ def py_class(  # noqa: PLR0913
           fast path (only safe for types with no transitive ``"var"``
           children).
         - ``"singleton"``: pointer equality only, for singleton types.
+
+        This parameter is **independent** of ``eq`` / ``unsafe_hash``:
+        it only configures how ``structural_equal`` / ``structural_hash``
+        walk the object in C++ and never installs or alters Python-level
+        ``__eq__`` / ``__hash__``.  See Notes below.
     slots
         Accepted for ``dataclass_transform`` compatibility.  Object
         subclasses always use ``__slots__ = ()`` via the metaclass.
@@ -580,6 +594,32 @@ def py_class(  # noqa: PLR0913
     Callable | type
         A class decorator, or the decorated class (bare usage).
 
+    Notes
+    -----
+    Three orthogonal equality/hashing mechanisms coexist on a
+    ``@py_class`` type, each controlled by an independent knob:
+
+    - ``a == b`` / ``hash(a)`` — selected by ``eq`` / ``unsafe_hash``
+      params (or user-defined ``__eq__`` / ``__hash__`` in the class
+      body).  Default: pointer-based ``same_as`` and handle-address
+      hash, inherited from ``Object``.
+    - ``structural_equal(a, b)`` / ``structural_hash(a)`` — selected
+      by the ``structural_eq`` param.  Default (``None``): structural
+      comparison is unsupported for this type.
+    - ``a.same_as(b)`` — always available; always pointer comparison.
+
+    The typical pattern is to leave ``eq`` / ``unsafe_hash`` at their
+    defaults so ``==`` and ``hash()`` stay cheap and pointer-based
+    (ideal for pass-internal bookkeeping such as visited-set tracking),
+    and call ``structural_equal`` / ``structural_hash`` explicitly at
+    the points that require the heavy semantic check.
+
+    Combining ``eq=True`` (or ``unsafe_hash=True``) with a
+    ``structural_eq`` kind is legal but gives the type two different
+    recursive equalities — a Python-level one for ``==`` and a C++
+    structural one for ``structural_equal`` — which rarely coincide.
+    Prefer setting only one.
+
     """
     if order and not eq:
         raise ValueError("order=True requires eq=True")

Reply via email to