This is an automated email from the ASF dual-hosted git repository.
junrushao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git
The following commit(s) were added to refs/heads/main by this push:
new 281f7e1 doc: clarify structural_eq semantics and py_class eq/hash
interaction (#566)
281f7e1 is described below
commit 281f7e114fc46d262ccb57c18d19c7ef7b0a018c
Author: Yaoyao Ding <[email protected]>
AuthorDate: Fri Apr 24 01:20:07 2026 -0400
doc: clarify structural_eq semantics and py_class eq/hash interaction (#566)
Rework the const-tree section in docs/concepts/structural_eq_hash.rst:
the "Why not use it everywhere?" example was bad — the drawn trees had
different pointers for their `+` nodes, so the const-tree pointer
shortcut would never have fired. Replaced it with a shared-subtree
scenario under `map_free_vars=True` that actually exercises the bug, and
expanded "When is this safe?" with a third condition (interning payoff)
plus a rule of thumb for choosing between `tree` and `const-tree`.
Expand the `"var"` section to document that non-ignored fields on a var
type (e.g. a type annotation) are compared on first encounter but
subsequent occurrences short-circuit through the bijective mapping —
i.e. the mapping is sticky.
Clarify the py_class docstring so readers understand that
`structural_eq` is independent of `eq` / `unsafe_hash`: the default
`__eq__` / `__hash__` remain pointer-based (inherited from `Object`),
while `structural_equal` / `structural_hash` are driven separately by
the kind metadata registered in C++.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Co-authored-by: gemini-code-assist[bot]
<176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
docs/concepts/structural_eq_hash.rst | 306 +++++++++++++++++++++++++++------
python/tvm_ffi/dataclasses/py_class.py | 48 +++++-
2 files changed, 297 insertions(+), 57 deletions(-)
diff --git a/docs/concepts/structural_eq_hash.rst
b/docs/concepts/structural_eq_hash.rst
index 63dbd4c..55614c3 100644
--- a/docs/concepts/structural_eq_hash.rst
+++ b/docs/concepts/structural_eq_hash.rst
@@ -33,6 +33,18 @@ The behavior is controlled by two layers of annotation on
This document explains what each annotation means, when to use it, and how
they compose.
+.. note::
+
+ Structural equality and hashing **never call Python-level** ``__eq__``
+ or ``__hash__``. ``structural_equal`` / ``structural_hash`` dispatch
+ entirely through a C++ walker driven by the kind metadata registered
+ via ``structural_eq=``; the Python ``a == b`` / ``hash(a)`` dunders
+ are independent (they default to pointer identity and handle address,
+ inherited from ``Object``). To customize how a specific type
+ participates in *structural* comparison, register the
+ :ref:`sequal-shash` hooks described below — do **not** override
+ ``__eq__`` or ``__hash__``.
+
Type-Level Annotation
---------------------
@@ -160,61 +172,103 @@ object, they are guaranteed equal — skip the field
comparison."
This is purely a **performance optimization**. The only behavioral difference
from ``"tree"`` is that pointer identity short-circuits to ``True``.
-When is this safe?
-~~~~~~~~~~~~~~~~~~
+When is this safe (and worth it)?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-When the type satisfies two conditions:
+Three conditions decide whether ``"const-tree"`` is the right choice:
1. **Immutable** — content doesn't change after construction, so same-pointer
always implies same-content.
2. **No transitive** ``"var"`` **children** — skipping field traversal won't
cause variable mappings to be missed (see :ref:`var-kind` for why this
matters).
+3. **Sharing is common** — instances are interned or canonicalized, so the
+ same pointer actually appears on both sides of real comparisons. Without
+ interning, the shortcut never fires and ``"const-tree"`` behaves like
+ ``"tree"`` with a dead branch.
+
+Conditions 1 and 2 are correctness requirements: violating them is a bug,
+not a performance regression. Condition 3 is the payoff — ``"const-tree"``
+is worth reaching for only when it will actually save work.
+
+A useful rule of thumb: *does the system go out of its way to make two
+equal instances of this type share a pointer?* Canonical types, interned
+constants, cached shapes, and op metadata usually do. General expression
+and statement nodes usually don't — and also fail condition 2. Prefer
+``"const-tree"`` for the type / attribute / metadata layer of the IR, not
+the expression / statement layer.
+
+Note also that condition 2 is a *whole-subgraph* property: once a field
+holds an ``Expr`` (which may one day contain a ``Var``), the annotation
+silently commits the type to that invariant — a later refactor embedding
+a ``Var`` becomes a correctness break rather than a local change.
Why not use it everywhere?
~~~~~~~~~~~~~~~~~~~~~~~~~~
Most IR nodes are immutable, but many transitively contain variables
-(e.g., ``x + 1`` contains the ``"var"`` node ``x``). If the pointer
-shortcut fires, the traversal skips ``x``, and a variable mapping that should
-have been established is silently missed.
+(e.g., ``x + 1`` contains the ``"var"`` node ``x``). The pointer shortcut
+fires only when both sides of a comparison reference the **same object** —
+but when that sharing exists, skipping traversal also skips the variable
+occurrences inside, and mappings that should have been recorded are
+silently missed.
-The following diagram shows the danger. Suppose the ``+`` node were
-incorrectly annotated as ``"const-tree"``. When comparing two trees that
-share a sub-expression, the pointer shortcut fires on the shared node, and
-the ``"var"`` ``x`` inside it is never visited — so no ``x ↔ y`` mapping
-is recorded:
+Suppose the ``+`` node were incorrectly annotated as ``"const-tree"``, and
+consider comparing two tuples that share the ``+`` subtree via pointer
+identity:
-.. mermaid::
+.. code-block:: text
- graph TD
- subgraph "lhs"
- LT["(_, _)"]
- LE["x + 1"]
- LX["x : var"]
- LT -->|".0"| LE
- LT -->|".1"| LX
- LE -->|".lhs"| LX2["x"]
- end
+ shared = x + 1 # pointer P, contains var x
- subgraph "rhs"
- RT["(_, _)"]
- RE["y + 1"]
- RY["y : var"]
- RT -->|".0"| RE
- RT -->|".1"| RY
- RE -->|".lhs"| RY2["y"]
- end
+ lhs = (shared, x) # .0 = P, .1 = var x
+ rhs = (shared, y) # .0 = P, .1 = var y (different Var)
- LE -. "const-tree would skip here<br/>(misses x ↔ y mapping)" .-> RE
- LX -. "Later comparison fails:<br/>x has no recorded mapping" .-> RY
+ structural_equal(lhs, rhs, map_free_vars=True)
- style LE fill:#fff3cd
- style RE fill:#fff3cd
- style LX fill:#f8d7da
- style RY fill:#f8d7da
- style LX2 fill:#f8d7da
- style RY2 fill:#f8d7da
+With the ``+`` annotated as plain ``"tree"`` (correct):
+
+- ``.0``: traverse into ``shared`` on both sides, visit ``x`` at ``.lhs``,
+ record the mapping ``x ↔ x``.
+- ``.1``: look up ``x`` → maps to ``x``, but rhs is ``y``. **NOT EQUAL** ✓
+
+With the ``+`` annotated as ``"const-tree"`` (the bug):
+
+- ``.0``: pointer shortcut fires on ``shared`` (both sides reference P).
+ Fields are skipped, ``x`` inside is never visited, no mapping is recorded.
+- ``.1``: compare ``x`` vs ``y``. No existing mapping, and
+ ``map_free_vars=True`` lets a new one be recorded as ``x ↔ y``.
+ **EQUAL** ✗ (wrong)
+
+The following diagram illustrates the shared structure. The ``+`` node
+(``shared``) has two incoming ``.0`` edges — one from each side — which
+is exactly the situation in which the pointer shortcut fires:
+
+.. mermaid::
+
+ graph TD
+ LT["lhs: (_, _)"]
+ RT["rhs: (_, _)"]
+ ADD["shared = x + 1<br/>const-tree<br/><i>same pointer on both
sides</i>"]
+ X["x : var"]
+ ONE["1"]
+ Y["y : var"]
+
+ LT -->|".0"| ADD
+ RT -->|".0"| ADD
+ LT -->|".1"| X
+ RT -->|".1"| Y
+ ADD -->|".lhs"| X
+ ADD -->|".rhs"| ONE
+
+ style ADD fill:#fff3cd
+ style X fill:#f8d7da
+ style Y fill:#f8d7da
+
+The same failure mode arises whenever a shared subtree containing a
+``"var"`` is compared inside any definition region (e.g., the body of a
+``Lambda`` whose params field is ``structural_eq="def"``), not only under
+``map_free_vars=True``.
``"dag"`` — Sharing-Aware Comparison
@@ -337,11 +391,15 @@ Full comparison: ``"tree"`` vs ``"dag"``
@py_class(structural_eq="var")
class Var(Object):
- name: str
+ name: str = field(structural_eq="ignore") # alpha-equivalent vars
differ in name
+ type: Type # participates in equality
**Meaning**: "This is a variable. Two variables are equal if they are
-**bound in corresponding positions**, not if they have the same name or
-content."
+**bound in corresponding positions**, not if they have the same name."
+The ``name`` field is almost always marked ``structural_eq="ignore"``
+because alpha-equivalent variables have different names. Other fields
+such as ``type`` *are* compared — but only at the binding site (see
+:ref:`var-fields`).
The problem
~~~~~~~~~~~
@@ -350,7 +408,7 @@ The problem
fun x → x + 1 should equal fun y → y + 1
-Variables are not defined by their content (name, type annotation). They
+Variables are not defined by their content, such as their name. They
are defined by **where they are introduced** and **how they are used**.
``x`` and ``y`` above are interchangeable because they occupy the same
binding position and are used in the same way.
@@ -399,6 +457,43 @@ The following diagram traces the comparison of two
alpha-equivalent functions:
# Bare expressions, no enclosing function:
x + 1 vs y + 1 → NOT EQUAL (no definition region, different pointers)
+.. _var-fields:
+
+Fields and the sticky mapping
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A ``"var"`` type still has fields, and non-ignored fields *are* compared —
+but only on the **first** encounter of a var pair. Once a mapping is
+recorded, subsequent occurrences look up the mapping and skip field
+comparison entirely.
+
+Take the ``Var`` declaration from the top of this section: ``name`` is
+ignored, but ``type`` is not. The first time a pair of vars is seen in
+a definition region, their ``type`` fields are compared and the mapping
+is only established if they match. After that, the mapping is **sticky**
+— later occurrences trust the correspondence regardless of those fields:
+
+.. list-table::
+ :header-rows: 1
+ :widths: 55 45
+
+ * - Scenario
+ - Result
+ * - ``Var("x", int)`` vs ``Var("y", int)`` on first encounter
+ - Fields match → mapping ``x ↔ y`` recorded → **Equal**
+ * - ``Var("x", int)`` vs ``Var("y", float)`` on first encounter
+ - Fields differ → **Not equal**
+ * - ``Var("x", int)`` vs ``Var("y", float)`` when ``x ↔ y`` already mapped
+ - Lookup succeeds → **Equal** (types are *not* rechecked)
+
+For IRs where type consistency is part of well-formedness, this is
+usually sufficient: a well-formed program uses each var with a consistent
+type at every occurrence, so the first-encounter check at the binding
+site covers the rest. If you truly want types re-verified at every use,
+they don't belong on the ``"var"`` node — lift them into the surrounding
+expression/statement node where they participate in normal ``"tree"``
+comparison.
+
Full comparison: with and without definition regions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -549,20 +644,125 @@ Use for:
- **Any field that introduces names into scope**
-Custom Equality and Hashing
-----------------------------
+.. _sequal-shash:
+
+Custom Equality and Hashing: ``__s_equal__`` / ``__s_hash__``
+--------------------------------------------------------------
+
+For types where the default field-by-field traversal is insufficient (for
+example, fields that need to be visited in a specific order, cross-field
+invariants, or sub-values that need a different ``def_region`` setting
+than the declarative field flags allow), you can register custom
+callbacks as **type attributes**:
+
+- ``__s_equal__`` — custom structural equality logic.
+- ``__s_hash__`` — custom structural hashing logic.
+
+These are the *only* supported way to override structural comparison.
+``structural_equal`` / ``structural_hash`` never consult Python
+``__eq__`` / ``__hash__`` — those dunders serve a separate purpose
+(``==`` and ``hash()``, which default to pointer identity).
+
+When either hook is registered, it replaces the default field iteration
+for that type. All kind-specific machinery (``"dag"`` memoization,
+``"var"`` mapping, the pointer shortcut of ``"const-tree"``, etc.) is
+still managed by the framework — the custom callback only controls
+*which* sub-values are compared or hashed, *in what order*, and *with
+what* ``def_region`` flag.
-For types where the default field-by-field traversal is insufficient, you
-can register custom callbacks as type attributes:
+Signatures
+~~~~~~~~~~
-- **``__s_equal__``** — custom equality logic
-- **``__s_hash__``** — custom hashing logic
+``__s_equal__``:
-These are registered per type via ``EnsureTypeAttrColumn``. When present,
-they replace the default field iteration. The system still manages all
-kind-specific logic (``"dag"`` memoization, ``"var"`` mapping, etc.) — the
-custom callback only controls which sub-values to compare/hash and in what
-order.
+.. code-block:: text
+
+ (self, other, eq_cb) -> bool
+
+ eq_cb(lhs, rhs, def_region: bool, field_name: str) -> bool
+
+``__s_hash__``:
+
+.. code-block:: text
+
+ (self, init_hash: int, hash_cb) -> int
+
+ hash_cb(value, init_hash: int, def_region: bool) -> int
+
+The ``def_region`` flag on each recursive call controls whether the
+sub-value is compared/hashed in a definition region (enabling new
+variable bindings, just like ``field(structural_eq="def")``). The
+``field_name`` argument on ``eq_cb`` is used only for mismatch path
+reporting from :py:func:`~tvm_ffi.get_first_structural_mismatch`.
+
+Example (Python)
+~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+ @py_class(structural_eq="tree")
+ class Lambda(Object):
+ params: list
+ body: Any
+ comment: str # not part of identity, but also not iterated below
+
+ def __s_equal__(self, other, eq_cb):
+ # params is a definition region; body is not.
+ if not eq_cb(self.params, other.params, True, "params"):
+ return False
+ if not eq_cb(self.body, other.body, False, "body"):
+ return False
+ return True
+
+ def __s_hash__(self, init_hash, hash_cb):
+ h = hash_cb(self.params, init_hash, True)
+ h = hash_cb(self.body, h, False)
+ return h
+
+The two methods must agree: if ``__s_equal__`` considers two instances
+equal, ``__s_hash__`` must produce the same hash for them.
+
+Example (C++)
+~~~~~~~~~~~~~
+
+.. code-block:: c++
+
+ class MyNodeObj : public Object {
+ public:
+ Array<Var> params;
+ Array<ObjectRef> body;
+
+ bool SEqual(const MyNodeObj* other,
+ ffi::TypedFunction<bool(AnyView, AnyView, bool, AnyView)>
cmp) const {
+ if (!cmp(params, other->params, /*def_region=*/true, "params")) return
false;
+ if (!cmp(body, other->body, /*def_region=*/false, "body")) return false;
+ return true;
+ }
+
+ int64_t SHash(int64_t init_hash,
+ ffi::TypedFunction<int64_t(AnyView, int64_t, bool)> hash)
const {
+ int64_t h = hash(params, init_hash, /*def_region=*/true);
+ h = hash(body, h, /*def_region=*/false);
+ return h;
+ }
+
+ static void RegisterReflection() {
+ namespace refl = tvm::ffi::reflection;
+ refl::ObjectDef<MyNodeObj>()
+ .def_ro("params", &MyNodeObj::params)
+ .def_ro("body", &MyNodeObj::body);
+ refl::TypeAttrDef<MyNodeObj>()
+ .def(refl::type_attr::kSEqual, &MyNodeObj::SEqual)
+ .def(refl::type_attr::kSHash, &MyNodeObj::SHash);
+ }
+
+ static constexpr TVMFFISEqHashKind _type_s_eq_hash_kind =
kTVMFFISEqHashKindTreeNode;
+ TVM_FFI_DECLARE_OBJECT_INFO_FINAL("my.Node", MyNodeObj, Object);
+ };
+
+See :cpp:var:`tvm::ffi::reflection::type_attr::kSEqual` and
+:cpp:var:`tvm::ffi::reflection::type_attr::kSHash` in
+``include/tvm/ffi/reflection/accessor.h`` for the full reference.
All Kinds at a Glance
@@ -683,7 +883,7 @@ source location:
@py_class(structural_eq="var")
class Var(Object):
- name: str
+ name: str = field(structural_eq="ignore")
@py_class(structural_eq="singleton")
class Op(Object):
diff --git a/python/tvm_ffi/dataclasses/py_class.py
b/python/tvm_ffi/dataclasses/py_class.py
index a3e249f..0a44aa0 100644
--- a/python/tvm_ffi/dataclasses/py_class.py
+++ b/python/tvm_ffi/dataclasses/py_class.py
@@ -543,12 +543,21 @@ def py_class( # noqa: PLR0913
repr
If True (default), generate ``__repr__``.
eq
- If True, generate ``__eq__`` and ``__ne__``.
+ If True, generate ``__eq__`` and ``__ne__`` using recursive
+ field-wise content equality. Default False, in which case the
+ class inherits the pointer-based ``__eq__`` from ``Object``
+ (``a == b`` is equivalent to ``a.same_as(b)``). If the class
+ body defines ``__eq__`` or ``__ne__``, the generator is skipped
+ and the user definition is preserved.
order
If True, generate ``__lt__``, ``__le__``, ``__gt__``, ``__ge__``.
Requires ``eq=True``.
unsafe_hash
- If True, generate ``__hash__`` (unsafe for mutable objects).
+ If True, generate ``__hash__`` using recursive field-wise
+ content hashing (unsafe for mutable objects). Default False,
+ in which case the class inherits the handle-address ``__hash__``
+ from ``Object``. A user-defined ``__hash__`` in the class body
+ is preserved.
match_args
If True (default), set ``__match_args__`` to a tuple of the
positional ``__init__`` field names (``init=True`` and not
@@ -558,8 +567,8 @@ def py_class( # noqa: PLR0913
If True, all fields are keyword-only in ``__init__`` by default.
structural_eq
Structural equality/hashing kind for this type. Controls how
- instances participate in ``StructuralEqual`` and ``StructuralHash``.
- Valid values are:
+ instances participate in ``structural_equal`` and
+ ``structural_hash``. Valid values are:
- ``None`` (default): structural comparison is not supported.
- ``"tree"``: content-based comparison, the safe default for
@@ -571,6 +580,11 @@ def py_class( # noqa: PLR0913
fast path (only safe for types with no transitive ``"var"``
children).
- ``"singleton"``: pointer equality only, for singleton types.
+
+ This parameter is **independent** of ``eq`` / ``unsafe_hash``:
+ it only configures how ``structural_equal`` / ``structural_hash``
+ walk the object in C++ and never installs or alters Python-level
+ ``__eq__`` / ``__hash__``. See Notes below.
slots
Accepted for ``dataclass_transform`` compatibility. Object
subclasses always use ``__slots__ = ()`` via the metaclass.
@@ -580,6 +594,32 @@ def py_class( # noqa: PLR0913
Callable | type
A class decorator, or the decorated class (bare usage).
+ Notes
+ -----
+ Three orthogonal equality/hashing mechanisms coexist on a
+ ``@py_class`` type, each controlled by an independent knob:
+
+ - ``a == b`` / ``hash(a)`` — selected by ``eq`` / ``unsafe_hash``
+ params (or user-defined ``__eq__`` / ``__hash__`` in the class
+ body). Default: pointer-based ``same_as`` and handle-address
+ hash, inherited from ``Object``.
+ - ``structural_equal(a, b)`` / ``structural_hash(a)`` — selected
+ by the ``structural_eq`` param. Default (``None``): structural
+ comparison is unsupported for this type.
+ - ``a.same_as(b)`` — always available; always pointer comparison.
+
+ The typical pattern is to leave ``eq`` / ``unsafe_hash`` at their
+ defaults so ``==`` and ``hash()`` stay cheap and pointer-based
+ (ideal for pass-internal bookkeeping such as visited-set tracking),
+ and call ``structural_equal`` / ``structural_hash`` explicitly at
+ the points that require the heavy semantic check.
+
+ Combining ``eq=True`` (or ``unsafe_hash=True``) with a
+ ``structural_eq`` kind is legal but gives the type two different
+ recursive equalities — a Python-level one for ``==`` and a C++
+ structural one for ``structural_equal`` — which rarely coincide.
+ Prefer setting only one.
+
"""
if order and not eq:
raise ValueError("order=True requires eq=True")