tinywisdom opened a new issue, #18612:
URL: https://github.com/apache/tvm/issues/18612
## Summary
I am seeing a **hard segfault** (no Python exception) during
`tvm.compile(...)` for a CUDA target. The crash consistently occurs inside the
TIR pass `InjectPTXLDG32`; the relevant frames in the backtrace are:
* `tvm::tir::transform::InjectPTXLDG32(bool)`
* `tvm::tir::PTXRewriter::VisitStmt_(tvm::tir::BufferStoreNode const*)`
* `tvm::tir::BufferStore::BufferStore(...)`
The input IRModule is produced by converting a PyTorch `torch.export`
program using `tvm.relax.frontend.torch.from_exported_program`. The PyTorch
model is intentionally small (`Linear(4,4)`) and returns a **tuple of
tensors**: `(torch.tril(x), torch.triu(x))`.
This looks like a bug in the `InjectPTXLDG32` rewrite logic, or an unsafe
assumption in the pass leading to a null/invalid `BufferStore` construction.
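As a side note for triage (an assumption based on the pass name, not something I have verified): this pass appears to be gated by the `tir.ptx_ldg32` PassContext option that the repro enables, and it also seems to be exposed on the Python side, e.g.:

```python
import tvm

# Hypothetical sanity check (assuming the Python binding is
# tvm.tir.transform.InjectPTXLDG32, mirroring the C++ symbol in the
# backtrace): instantiate the suspect pass and print its registered name.
# This does not reproduce the crash on its own; the full repro below does.
ptx_pass = tvm.tir.transform.InjectPTXLDG32(True)
print(ptx_pass.info.name)
```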
---
## Environment
From the repro output:
* TVM version: `0.22.0`
* TVM commit: `9dbf3f22ff6f44962472f9af310fda368ca85ef2`
* LLVM: `17.0.6`
* Python: `3.10.16` (from stack paths)
* NumPy: `2.2.6`
* PyTorch: `2.9.0+cu128`
* CUDA GPU: `NVIDIA RTX A6000` (sm_86)
Target string used:
```
cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 -thread_warp_size=32
```
---
## Minimal Repro Script
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import torch
import torch.nn as nn
import tvm
from tvm import tir
def print_env_info():
    print("==== Environment Info ====")
    print("TVM version:", getattr(tvm, "__version__", "unknown"))
    try:
        print("TVM git commit:",
              tvm.support.libinfo().get("GIT_COMMIT_HASH", "unknown"))
    except Exception:
        print("TVM git commit: unknown")
    try:
        print("TVM LLVM version:",
              tvm.support.libinfo().get("LLVM_VERSION", "unknown"))
    except Exception:
        print("TVM LLVM version: unknown")
    print("NumPy version:", np.__version__)
    print("PyTorch version:", torch.__version__)
    print("CUDA available (torch):", torch.cuda.is_available())
    if torch.cuda.is_available():
        try:
            print("CUDA device:", torch.cuda.get_device_name(0))
        except Exception:
            pass
    print("==========================\n")


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        x = self.linear(x)
        return torch.tril(x), torch.triu(x)


def export_to_relax(mod: nn.Module, x: torch.Tensor) -> tvm.IRModule:
    mod = mod.to("cpu").eval()
    x = x.to("cpu")
    ep = torch.export.export(mod, (x,))
    from tvm.relax.frontend.torch import from_exported_program
    return from_exported_program(ep)


def main():
    print_env_info()
    if not torch.cuda.is_available():
        raise RuntimeError(
            "CUDA is required for this repro, but torch.cuda.is_available() is False"
        )

    target_str = (
        "cuda -keys=cuda,gpu -arch=sm_86 -max_num_threads=1024 "
        "-thread_warp_size=32"
    )
    target = tvm.target.Target(target_str)

    relax_pipeline = "default"
    tir_pipeline = "default"

    model = MyModel()
    x = torch.zeros((1, 4), dtype=torch.float32)

    print("[repro] exporting torch -> relax ...")
    ir_mod = export_to_relax(model, x)

    disabled_pass = [
        "DeadCodeElimination",
        "CanonicalizeBindings",
        "Simplify",
        "UnrollLoop",
        "VectorizeLoop",
        "StorageRewrite",
        "RemoveNoOp",
        "LoopPartition",
    ]
    pass_config = {
        "relax.FuseOps.max_depth": 2,
        "relax.lift_transform_params.consume_params": 1,
        "tir.disable_storage_rewrite": 1,
        "tir.disable_vectorize": 1,
        "tir.instrument_bound_checkers": 1,
        "tir.merge_static_smem": 1,
        "tir.noalias": 1,
        "tir.ptx_ldg32": 1,
        "tir.use_async_copy": 1,
    }
    pc_kwargs = {
        "opt_level": 3,
        "disabled_pass": disabled_pass,
        "config": pass_config,
    }

    print("[repro] target:", target)
    print("[repro] relax_pipeline:", relax_pipeline)
    print("[repro] tir_pipeline:", tir_pipeline)
    print("[repro] opt_level:", pc_kwargs["opt_level"])
    print("[repro] disabled_pass:", disabled_pass)
    print("[repro] PassContext.config keys:", sorted(pass_config.keys()))

    print("[repro] compiling with tvm.compile ...")
    with tvm.transform.PassContext(**pc_kwargs):
        _ = tvm.compile(
            ir_mod,
            target=target,
            relax_pipeline=relax_pipeline,
            tir_pipeline=tir_pipeline,
        )
    print("[repro] compile finished (no crash).")


if __name__ == "__main__":
    main()
```
---
## Actual Behavior
Segfault during compilation:
```text
!!!!!!! Segfault encountered !!!!!!!
...
tvm::tir::BufferStore::BufferStore(...)
tvm::tir::PTXRewriter::VisitStmt_(tvm::tir::BufferStoreNode const*)
...
tvm::tir::transform::InjectPTXLDG32(bool)
Segmentation fault (core dumped)
```
This is a hard crash (core dumped), not a recoverable error.
---
## Expected Behavior
`tvm.compile(...)` should either:
* successfully compile the module, or
* raise a normal Python exception / diagnostic if some pass configuration is invalid,

but it should **not** segfault.
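
One possible triage step (a sketch only, reusing `pass_config`, `disabled_pass`, `ir_mod`, and `target` from the repro script above) is to drop just the `tir.ptx_ldg32` option and see whether compilation then succeeds:

```python
# Same PassContext as the repro, but with the suspect option removed.
# If this compiles cleanly, the segfault is isolated to InjectPTXLDG32.
pass_config_no_ldg32 = {
    k: v for k, v in pass_config.items() if k != "tir.ptx_ldg32"
}

with tvm.transform.PassContext(
    opt_level=3,
    disabled_pass=disabled_pass,
    config=pass_config_no_ldg32,
):
    tvm.compile(
        ir_mod,
        target=target,
        relax_pipeline="default",
        tir_pipeline="default",
    )
```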
### Triage
* needs-triage
* bug