# Introduction

The TVM community has worked since the last release to deliver the following 
exciting new improvements!

The main tags are below (**bold text indicates areas with significant 
progress**): Relax (especially the PyTorch frontend), CUDA, etc.

Please visit the full listing of commits for a complete view: 
[v0.21.dev0...v0.21.0.rc0](https://github.com/apache/tvm/compare/v0.21.dev0...v0.21.0.rc0).

### Community

None.

### RFCs

None.

### Arith
 * [#18067](https://github.com/apache/tvm/pull/18067) - Add IsBound method to 
ConstIntBoundAnalyzer
 * [#18031](https://github.com/apache/tvm/pull/18031) - Canonicalize 
mul-coefficient to rhs
 * [#18025](https://github.com/apache/tvm/pull/18025) - Fix canonical simplify 
for LE with incorrect range assumptions
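
For orientation (not part of any PR above), here is a minimal sketch of how the arithmetic analyzer these PRs extend is driven from Python; the `IsBound` method itself (#18067) lives on the C++ `ConstIntBoundAnalyzer`, while the calls below are the long-standing `tvm.arith.Analyzer` API:

```python
import tvm
from tvm import tir

# Long-standing Python entry point to the analyzer these PRs touch.
analyzer = tvm.arith.Analyzer()
n = tir.Var("n", "int32")

# Declare 0 <= n <= 128, then query the bound of a derived expression.
analyzer.update(n, tvm.arith.ConstIntBound(0, 128))
bound = analyzer.const_int_bound(n * 2 + 1)
print(bound.min_value, bound.max_value)  # 1 257
```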

### BugFix
 * [#18115](https://github.com/apache/tvm/pull/18115) - [Fix][Serialization] 
Add support for NaN value serialization
 * [#18103](https://github.com/apache/tvm/pull/18103) - [Fix] Replace 
dmlc::Error with std::exception in VerifyGPUCode
 * [#18092](https://github.com/apache/tvm/pull/18092) - [Fix] Fix 
ExecBuilderDeclareFunction method name in exec_builder.py
 * [#18087](https://github.com/apache/tvm/pull/18087) - Fix exception when TVM 
is not built with LLVM support
 * [#18035](https://github.com/apache/tvm/pull/18035) - [CUDA] Increase 
FloatImm precision when printing 64-bit values in CUDA codegen
 * [#17968](https://github.com/apache/tvm/pull/17968) - [Relax][Pytorch] Bugfix 
of conv_transpose1d and conv_transpose2d
 * [#17950](https://github.com/apache/tvm/pull/17950) - [Fix][Relax] Fix 
dangling reference in GetTargetFunctions()
 * [#17902](https://github.com/apache/tvm/pull/17902) - Fix off-by-one error in 
the type index range check within Object::IsInstance()
 * [#17882](https://github.com/apache/tvm/pull/17882) - [Relax][Pytorch] Fix 
incorrect behaviour of % (mod) operator in TVM frontend
 * [#17875](https://github.com/apache/tvm/pull/17875) - [Relax][Pytorch] 
Incorrect Handling of In-Place Ops in FX-Based TVM Frontend
 * [#17838](https://github.com/apache/tvm/pull/17838) - [TIR] Schedule support 
reverse-inline with reduction blocks

### CI
 * [#18071](https://github.com/apache/tvm/pull/18071) - Update windows to 2025
 * [#18058](https://github.com/apache/tvm/pull/18058) - [TEST] Move temp files 
into tempdir
 * [#18037](https://github.com/apache/tvm/pull/18037) - Further robustify 
is_last_build check
 * [#17981](https://github.com/apache/tvm/pull/17981) - Update images to 
`20250513-063354-70aa3797`
 * [#17891](https://github.com/apache/tvm/pull/17891) - Update images to 
20250428-080833-03eadc65
 * [#17905](https://github.com/apache/tvm/pull/17905) - Install PyTorch 2.7 
compatible with CUDA 11.8
 * [#17887](https://github.com/apache/tvm/pull/17887) - Upgrade pytorch to 
2.7.0, torchvision to 0.22.0, and vulkan sdk to 1.4.309
 * [#17846](https://github.com/apache/tvm/pull/17846) - Upgrade ubuntu runner 
image for GitHub CI

### Docker
 * [#17955](https://github.com/apache/tvm/pull/17955) - [CI] Reintroduce NNEF 
to CI images

### Docs
 * [#18056](https://github.com/apache/tvm/pull/18056) - Update installation 
instructions based on the FFI refactor

### Frontend
 * [#18090](https://github.com/apache/tvm/pull/18090) - [Relax][ONNX] Update 
Reduce ops to support axes as input
 * [#18072](https://github.com/apache/tvm/pull/18072) - [Relax][ONNX] Update 
ReduceL1 to opset 18
 * [#18016](https://github.com/apache/tvm/pull/18016) - [Relax][ONNX] Replace 
deprecated `mapping.TENSOR_TYPE_TO_NP_TYPE` usage
 * [#18001](https://github.com/apache/tvm/pull/18001) - [Relax][ONNX] Fix: 
bitwise_not misclassified as binary (is …
 * [#17990](https://github.com/apache/tvm/pull/17990) - [Relax] Fix: Output 
tensor with zero dimension after torch.u…
 * [#17925](https://github.com/apache/tvm/pull/17925) - [Relax][PyTorch] 
Re-enable test_subgraph_capture in dynamo test
 * [#17980](https://github.com/apache/tvm/pull/17980) - [ONNX] Make bias input 
optional in LayerNormalization
 * [#17918](https://github.com/apache/tvm/pull/17918) - [Relax][PyTorch] Add 
ReLU6 Op Support for Exported Program and FX graph
 * [#17930](https://github.com/apache/tvm/pull/17930) - [Relax][PyTorch] Add 
torch.outer Op Support for Exported Program and FX graph 
 * [#17932](https://github.com/apache/tvm/pull/17932) - [Relax][PyTorch] Add 
UpSample Bicubic Op Support for Exported Program and FX graph
 * [#17921](https://github.com/apache/tvm/pull/17921) - [Relax][PyTorch] Add 
AvgPool 1D and 3D Op Support for Exported Program and FX graph
 * [#17922](https://github.com/apache/tvm/pull/17922) - [Relax][PyTorch] Add 
Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
 * [#17863](https://github.com/apache/tvm/pull/17863) - [Relax][PyTorch] 
CrossEntropyLoss
 * [#17919](https://github.com/apache/tvm/pull/17919) - [Relax][PyTorch] Add 
MaxPool 1D and 3D Op Support for Exported Program and FX graph
 * [#17926](https://github.com/apache/tvm/pull/17926) - [Relax][PyTorch] Add 
tests for all the dtypes supported in the PyTorch frontend
 * [#17924](https://github.com/apache/tvm/pull/17924) - [Relax][PyTorch] Add 
div.Tensor_mode and trunc Op Support for Exported Program and FX graph
 * [#17904](https://github.com/apache/tvm/pull/17904) - [Relax][PyTorch] Add 
Meshgrid Op Support for Exported Program and FX graph
 * [#17915](https://github.com/apache/tvm/pull/17915) - [Relax][PyTorch] Add 
support for linspace op in fx graph
 * [#17886](https://github.com/apache/tvm/pull/17886) - [Relax][PyTorch] Add 
Pixel Shuffle Op Support for Exported Program and FX graph
 * [#17908](https://github.com/apache/tvm/pull/17908) - [Relax][PyTorch] Add 
support for eye op in fx graph
 * [#17893](https://github.com/apache/tvm/pull/17893) - [Relax][Pytorch] Add 
fmod support
 * [#17894](https://github.com/apache/tvm/pull/17894) - [Relax][PyTorch] 
Support torch.bfloat16 dtype in pytorch frontend
 * [#17878](https://github.com/apache/tvm/pull/17878) - [Relax][PyTorch] Add 
torch.isin Op Support for Exported Program and FX graph
 * [#17889](https://github.com/apache/tvm/pull/17889) - [Relax][PyTorch] 
Support linspace op for ExportedProgram importer
 * [#17868](https://github.com/apache/tvm/pull/17868) - [Relax][Pytorch] Add 
support for ones_like, zero_, zeros, type_as, item ops
 * [#17857](https://github.com/apache/tvm/pull/17857) - [Relax][PyTorch] 
Refactor norm op for ExportedProgram importer
 * [#17852](https://github.com/apache/tvm/pull/17852) - [Relax][PyTorch] 
Sort.default
 * [#17871](https://github.com/apache/tvm/pull/17871) - [Relax][Pytorch] Add 
support for bitwise_or op
 * [#17836](https://github.com/apache/tvm/pull/17836) - [Relax][PyTorch] 
support for index.Tensor
 * [#17864](https://github.com/apache/tvm/pull/17864) - [Relax][PyTorch] 
Support eye op for ExportedProgram importer
 * [#17858](https://github.com/apache/tvm/pull/17858) - [Relax][PyTorch] Add 
copy_ op support in fxGraph
 * [#17851](https://github.com/apache/tvm/pull/17851) - [Relax][PyTorch] 
Support `leaky_relu_.default` and `reshape_as.default` in ExportedProgram 
frontend
 * [#17843](https://github.com/apache/tvm/pull/17843) - [Relax][PyTorch] Add 
mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported 
Program Frontend
 * [#17821](https://github.com/apache/tvm/pull/17821) - [Relax][PyTorch] Add 
Pad Op Support for Exported Program and FX graph
 * [#17819](https://github.com/apache/tvm/pull/17819) - [Relax][PyTorch] Add 
Stack Op Support for Exported Program 
 * [#17849](https://github.com/apache/tvm/pull/17849) - [Relax][PyTorch] Add 
RSub Op Support for Exported Program and FX graph
 * [#17850](https://github.com/apache/tvm/pull/17850) - [Relax][Pytorch] Add 
masked_fill op support in ExportedProgram
 * [#17816](https://github.com/apache/tvm/pull/17816) - [Relax][PyTorch] Add 
PReLU Op Support for Exported Program and FX graph
 * [#17803](https://github.com/apache/tvm/pull/17803) - [Relax][PyTorch] Add 
Logaddexp op support for exported program 
 * [#17841](https://github.com/apache/tvm/pull/17841) - [Relax][PyTorch] Add 
support for norm op
 * [#17832](https://github.com/apache/tvm/pull/17832) - [Relax][PyTorch] 
full.default, full_like.default, ones.default 
 * [#17830](https://github.com/apache/tvm/pull/17830) - [Relax][PyTorch] 
Support narrow and broadcast_to ops for ExportedProgram importer
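
Most of the PRs above extend the Relax PyTorch importers. For orientation, a minimal sketch of the ExportedProgram path (assuming a recent TVM build with `tvm.relax.frontend.torch` and PyTorch 2.x; `ToyModel` is a hypothetical module using relu6, one of the ops added this cycle in #17918):

```python
import torch
from torch.export import export

from tvm.relax.frontend.torch import from_exported_program

class ToyModel(torch.nn.Module):  # hypothetical toy module
    def forward(self, x):
        return torch.nn.functional.relu6(x)

# Export with torch.export, then import the graph into a Relax IRModule.
exported = export(ToyModel(), (torch.randn(4, 4),))
mod = from_exported_program(exported)
mod.show()  # print the imported Relax module
```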

### LLVM
 * [#17859](https://github.com/apache/tvm/pull/17859) - [Codegen] Enable 
SVE/VLA for RISCV targets
 * [#17958](https://github.com/apache/tvm/pull/17958) - Fix JIT unknown reloc 
issue for case of RISCV
 * [#17954](https://github.com/apache/tvm/pull/17954) - [FFI] Fix compilation 
errors with clang20

### Metal
 * [#18034](https://github.com/apache/tvm/pull/18034) - Fix `GetFunction` of 
metal runtime

### ROCm
 * [#18029](https://github.com/apache/tvm/pull/18029) - Fix ROCm build after 
FFI refactor

### Relax
 * [#18102](https://github.com/apache/tvm/pull/18102) - Fix rotary embedding 
buffer size calculation
 * [#17928](https://github.com/apache/tvm/pull/17928) - [KVCache] Per Layer 
Sliding Window
 * [#17840](https://github.com/apache/tvm/pull/17840) - Refactor missing op 
check into shared utility for Torch frontends
 * [#17826](https://github.com/apache/tvm/pull/17826) - Fix Torch frontends to 
report all the missing ops

### Runtime
 * [#18097](https://github.com/apache/tvm/pull/18097) - CutensorMap support

### TIR
 * [#18068](https://github.com/apache/tvm/pull/18068) - Extend address_of to 
support Buffer objects
 * [#18069](https://github.com/apache/tvm/pull/18069) - Fix block access region 
detection for nested let bindings
 * [#18057](https://github.com/apache/tvm/pull/18057) - Phase out 
ProducerStore, ProducerRealize and Prefetch
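
As a point of reference for #18068, `T.address_of` is the TVMScript spelling of the `address_of` intrinsic. A minimal sketch using the long-standing buffer-element form (the PR extends it to accept whole `Buffer` objects as well):

```python
import tvm
from tvm.script import tir as T

# Take the address of a buffer element inside a prim_func; the element
# form predates #18068, which additionally allows passing the Buffer itself.
@T.prim_func
def take_address(A: T.Buffer((16,), "float32")):
    T.evaluate(T.address_of(A[0]))
```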

### TOPI
 * [#18039](https://github.com/apache/tvm/pull/18039) - [Relax] Support 
InstanceNorm & Bugfix of InstanceNorm
 * [#18063](https://github.com/apache/tvm/pull/18063) - [NN][Layer_Norm] Fix 
layer_norm error with reduce-only axes
 * [#18006](https://github.com/apache/tvm/pull/18006) - Fix index handling in 
expand_like operator for axis expansion
 * [#18015](https://github.com/apache/tvm/pull/18015) - Support integer type 
input for log10
 * [#17942](https://github.com/apache/tvm/pull/17942) - Add shape validation to 
prevent negative dimensions in conv operations
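
A small illustration of the `log10` change (#18015), under the assumption that integer inputs are cast to a float dtype internally:

```python
import tvm
from tvm import te, topi

# Per #18015, log10 now accepts integer inputs (previously float-only).
x = te.placeholder((4,), dtype="int32", name="x")
y = topi.log10(x)
print(y.dtype)  # expected: a float dtype after the internal cast (assumption)
```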

### Vulkan
 * [#18005](https://github.com/apache/tvm/pull/18005) - Add TIR unary 
trigonometric/hyperbolic intrinsic definitions

### cuda & cutlass & tensorrt
 * [#18064](https://github.com/apache/tvm/pull/18064) - [CUTLASS] Fix CUTLASS 
kernel build on Hopper
 * [#18033](https://github.com/apache/tvm/pull/18033) - [CUTLASS] Add GeMM 
kernels for Blackwell GPUs
 * [#18024](https://github.com/apache/tvm/pull/18024) - [CUDA] Fix thrust with 
latest FFI refactor
 * [#18118](https://github.com/apache/tvm/pull/18118) - bump 
cutlass_fpA_intB_gemm
 * [#18113](https://github.com/apache/tvm/pull/18113) - [CMake] Refine C++/CUDA 
standard settings in CMakeLists.txt

### FFI
 * [#18076](https://github.com/apache/tvm/pull/18076) - [FFI][REFACTOR] 
Stabilize container ABI and implementation
 * [#18091](https://github.com/apache/tvm/pull/18091) - [FFI] Provide Field 
Visit bridge so we can do gradual transition
 * [#18095](https://github.com/apache/tvm/pull/18095) - [FFI][REFACTOR] Migrate 
attrs to use new reflection
 * [#18083](https://github.com/apache/tvm/pull/18083) - [FFI] Update typeinfo 
to speedup parent reflection
 * [#18077](https://github.com/apache/tvm/pull/18077) - [FFI] Optimize atomic 
decref in Object
 * [#18065](https://github.com/apache/tvm/pull/18065) - [FFI] Introduce FFI 
reflection support in python
 * [#18062](https://github.com/apache/tvm/pull/18062) - [FFI][REFACTOR] Update 
registry to have complete meta-data
 * [#18059](https://github.com/apache/tvm/pull/18059) - [FFI][REFACTOR] Enhance 
reflection
 * [#18050](https://github.com/apache/tvm/pull/18050) - [FFI] Enhance FFI 
Object exception safety during init
 * [#18121](https://github.com/apache/tvm/pull/18121) - Revert "[FFI] Replace 
`Arg2Str` with a more powerful `for_each`"
 * [#18117](https://github.com/apache/tvm/pull/18117) - [FFI] Replace `Arg2Str` 
with a more powerful `for_each`
 * [#18116](https://github.com/apache/tvm/pull/18116) - [FFI] Use fold 
expression to simplify for_each
 * [#18114](https://github.com/apache/tvm/pull/18114) - [FFI] Replace 
`__attribute__` with C++ standard attributes
 * [#18112](https://github.com/apache/tvm/pull/18112) - [FFI] Cleanup 
visit_attrs attribute after refactor
 * [#18111](https://github.com/apache/tvm/pull/18111) - [FFI] Introduce 
GlobalDef for function registration
 * [#18106](https://github.com/apache/tvm/pull/18106) - [REFACTOR][FFI] Phase 
out old VisitAttrs mechanism
 * [#18042](https://github.com/apache/tvm/pull/18042) - [REFACTOR][FFI] Update 
symbol name for library module
 * [#18023](https://github.com/apache/tvm/pull/18023) - [FFI] More strict tuple 
constructor checking
 * [#18022](https://github.com/apache/tvm/pull/18022) - [REFACTOR][FFI] Cleanup 
PackedFunc redirections
 * [#18020](https://github.com/apache/tvm/pull/18020) - [REFACTOR][PYTHON] 
Phase out tvm.\_ffi and Limited API support
 * [#17979](https://github.com/apache/tvm/pull/17979) - [FFI][REFACTOR] Update 
to distinguish as and cast
 * [#17983](https://github.com/apache/tvm/pull/17983) - [FFI][JVM] Upgrade 
tvm4j to latest FFI
 * [#18010](https://github.com/apache/tvm/pull/18010) - [REFACTOR][FFI] Phase 
out legacy C API
 * [#17943](https://github.com/apache/tvm/pull/17943) - [FFI] Variant 
specialize for all ObjectRef
 * [#17939](https://github.com/apache/tvm/pull/17939) - [REFACTOR] Phase out 
legacy rust ffi
 * [#17940](https://github.com/apache/tvm/pull/17940) - [REFACTOR] Phase out 
legacy go ffi
 * [#17931](https://github.com/apache/tvm/pull/17931) - [REFACTOR][FFI][RPC] 
Migrate RPC to use the latest FFI ABI
 * [#17929](https://github.com/apache/tvm/pull/17929) - [REFACTOR][FFI] Cleanup 
container redirections
 * [#17927](https://github.com/apache/tvm/pull/17927) - [FFI][FEAT] AutoDLPack 
for taking external tensor objects
 * [#17923](https://github.com/apache/tvm/pull/17923) - [REFACTOR][FFI] Cleanup 
PackedFunc related redirection
 * [#17920](https://github.com/apache/tvm/pull/17920) - [REFACTOR] Introduce 
and modernize ffi system
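
The FFI overhaul above is largely internal, but one user-visible consequence (#18020) is that the legacy `tvm._ffi` entry points are gone. A hedged sketch of the surviving Python-side lookup path ("testing.echo" is a hypothetical registered name; the exact module layout may differ across builds):

```python
import tvm

# Global functions registered from C++ (now via GlobalDef, #18111) are still
# looked up by name from Python; allow_missing avoids raising when the
# function is absent from a particular build.
f = tvm.get_global_func("testing.echo", allow_missing=True)
if f is not None:
    print(f("hello"))
```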

### web
 * [#17946](https://github.com/apache/tvm/pull/17946) - [REFACTOR][FFI] Upgrade 
Web Runtime to new FFI
 * [#17917](https://github.com/apache/tvm/pull/17917) - [WebGPU][CodeGen] 
Override PrintVecElemLoad and Store for WebGPU

### Misc
 * [#18104](https://github.com/apache/tvm/pull/18104) - Add LLVM Legalization 
for tir.erf
 * [#18107](https://github.com/apache/tvm/pull/18107) - fix: guard tensormap 
with cuda version check
 * [#18101](https://github.com/apache/tvm/pull/18101) - [REFACTOR] Formalize 
namespace for all objects
 * [#18040](https://github.com/apache/tvm/pull/18040) - Add support for 
bucketize
 * [#18098](https://github.com/apache/tvm/pull/18098) - [REFACTOR] Transition 
VisitAttrs to new reflection mechanism
 * [#18096](https://github.com/apache/tvm/pull/18096) - [REFACTOR] Transition 
VisitAttrs to new reflection mechanism in tir/ir_builder/meta_schedule
 * [#18093](https://github.com/apache/tvm/pull/18093) - [NVSHMEM] Extend CUDA 
backend to compile and link TIR modules with NVSHMEM
 * [#18088](https://github.com/apache/tvm/pull/18088) - [Script] Enhance alloc 
buffer handling in nested frames
 * [#18086](https://github.com/apache/tvm/pull/18086) - [SCRIPT] Bump Python 
minimum version to 3.9 and update AST compatibility
 * [#18075](https://github.com/apache/tvm/pull/18075) - add support for 
softsign op
 * [#18079](https://github.com/apache/tvm/pull/18079) - [Script] Add support 
for merging block annotations
 * [#18080](https://github.com/apache/tvm/pull/18080) - [REFACTOR] Phase out 
LegacyReprPrinter and improve CommonSubExprElim
 * [#18078](https://github.com/apache/tvm/pull/18078) - [REFACTOR] Phase out 
the RelaxExpr.checked_type in favor of struct_info
 * [#18073](https://github.com/apache/tvm/pull/18073) - [NVSHMEM] Update 
NDArray allocation
 * [#18066](https://github.com/apache/tvm/pull/18066) - [Script] Remove 
deprecated attributes from Constant AST node
 * [#18060](https://github.com/apache/tvm/pull/18060) - Add Python functor 
support for TIR expressions and statements
 * [#18054](https://github.com/apache/tvm/pull/18054) - [Pytest] Remove 
obsolete test suite entries
 * [#18036](https://github.com/apache/tvm/pull/18036) - Add support for 
hamming_window op
 * [#18049](https://github.com/apache/tvm/pull/18049) - [Refactor] Rename 
`relax_vm` to `vm`
 * [#18046](https://github.com/apache/tvm/pull/18046) - [3rdparty] Phasing out 
FlashInfer AOT from 3rdparty
 * [#18047](https://github.com/apache/tvm/pull/18047) - [Backend] JIT compile 
FlashInfer kernel with FFI header
 * [#18041](https://github.com/apache/tvm/pull/18041) - [DTYPE] Fix dtype 
functions after dtype refactor
 * [#18043](https://github.com/apache/tvm/pull/18043) - [REFACTOR] Phase out 
the relax tuning_api
 * [#18038](https://github.com/apache/tvm/pull/18038) - Resolving inconsistency 
between attention/attention_bias
 * [#18027](https://github.com/apache/tvm/pull/18027) - [Dtype] Low-precision 
Blackwell Datatype Support
 * [#17985](https://github.com/apache/tvm/pull/17985) - [Codegen] Resolve issue 
#17965 where the same model produces different outputs on the LLVM (CPU) and 
CUDA (GPU) backends
 * [#17978](https://github.com/apache/tvm/pull/17978) - Fix IR generation 
conflict in topi.nn.simplify by separating Tensor and PrimExpr handling
 * [#18026](https://github.com/apache/tvm/pull/18026) - [Python] Fix library 
lookup path for pip installed packages
 * [#18019](https://github.com/apache/tvm/pull/18019) - Add op support for 
slice_scatter
 * [#17974](https://github.com/apache/tvm/pull/17974) - Fix FLOP estimation for 
EvaluateNode by implementing VisitStmt_ handler
 * [#18013](https://github.com/apache/tvm/pull/18013) - Fix RuntimeError: 
parallel_for_dynamic
 * [#18014](https://github.com/apache/tvm/pull/18014) - Fix division truncation 
in window size calculation for small dtypes in average_pool
 * [#17995](https://github.com/apache/tvm/pull/17995) - Fix zero-extent loops 
in PerStoreFeature to prevent crashes
 * [#17969](https://github.com/apache/tvm/pull/17969) - Add registration for 
the operators asinh, acosh, atanh in LLVM
 * [#17972](https://github.com/apache/tvm/pull/17972) - Fix g.costs
 * [#17953](https://github.com/apache/tvm/pull/17953) - Fix sqrt/rsqrt 
Compatibility with Integer Data Types
 * [#17961](https://github.com/apache/tvm/pull/17961) - Fix basic FLOP 
estimation for WhileNode
 * [#17945](https://github.com/apache/tvm/pull/17945) - Add registration for 
the operators asin and acos in LLVM
 * [#17951](https://github.com/apache/tvm/pull/17951) - [NODE] Fix structural 
equality for Array<Any> specialization
 * [#17913](https://github.com/apache/tvm/pull/17913) - [Triton] Support latest 
`triton.compile` interface
 * [#17911](https://github.com/apache/tvm/pull/17911) - Add op support for 
new_zeros op in Exported Program and fx graph frontend
 * [#17909](https://github.com/apache/tvm/pull/17909) - Add 
masked_fill_.scalar, logical_not.default in Exported Program frontend
 * [#17910](https://github.com/apache/tvm/pull/17910) - [RPC] Fix Bug That 
Change Dict When Iterate The Keys
 * [#17896](https://github.com/apache/tvm/pull/17896) - Add op support for 
zeros_like and fill_
 * [#17900](https://github.com/apache/tvm/pull/17900) - Fix onnx expand op
 * [#17865](https://github.com/apache/tvm/pull/17865) - Add support for 
index_put_ op
 * [#17839](https://github.com/apache/tvm/pull/17839) - Add op support for roll 
op
 * [#17844](https://github.com/apache/tvm/pull/17844) - Fix incorrect docstring 
in topi softmax 
 * [#17831](https://github.com/apache/tvm/pull/17831) - [3rdparty] Bump DLPack 
to v1.1 for float8/6/4 dtype supports
 * [#17848](https://github.com/apache/tvm/pull/17848) - Fix docstring in 
batch_to_space_nd and bitpack
 * [#17845](https://github.com/apache/tvm/pull/17845) - fixing incorrect 
docstring in upsampling.py
 * [#17808](https://github.com/apache/tvm/pull/17808) - [Install] Fix error 
during python/tvm installation
