# Introduction

The TVM community has worked since the last release to deliver the following exciting new improvements!

The main tags are below (**bold text** marks areas with lots of progress): **Relax** (especially the PyTorch frontend), **FFI**, etc. Please visit the full listing of commits for a complete view: [v0.21.dev0...v0.21.0.rc0](https://github.com/apache/tvm/compare/v0.21.dev0...v0.21.0.rc0).

### Community

None.

### RFCs

None.

### Arith

* [#18067](https://github.com/apache/tvm/pull/18067) - Add IsBound method to ConstIntBoundAnalyzer
* [#18031](https://github.com/apache/tvm/pull/18031) - Canonicalize mul-coefficient to rhs
* [#18025](https://github.com/apache/tvm/pull/18025) - Fix canonical simplify for LE with incorrect range assumptions

### BugFix

* [#18115](https://github.com/apache/tvm/pull/18115) - [Fix][Serialization] Add support for NaN value serialization
* [#18103](https://github.com/apache/tvm/pull/18103) - [Fix] Replace dmlc::Error with std::exception in VerifyGPUCode
* [#18092](https://github.com/apache/tvm/pull/18092) - [Fix] Fix ExecBuilderDeclareFunction method name in exec_builder.py
* [#18087](https://github.com/apache/tvm/pull/18087) - fix exception when tvm not built with llvm support
* [#18035](https://github.com/apache/tvm/pull/18035) - [CUDA] Fix: Increase FloatImm precision when printing 64 bit values in CUDA codegen
* [#17968](https://github.com/apache/tvm/pull/17968) - [Relax][Pytorch] Bugfix of conv_transpose1d and conv_transpose2d
* [#17950](https://github.com/apache/tvm/pull/17950) - [Fix][Relax] Fix dangling reference in GetTargetFunctions()
* [#17902](https://github.com/apache/tvm/pull/17902) - Fix off-by-one error in the type index range check within Object::IsInstance()
* [#17882](https://github.com/apache/tvm/pull/17882) - [Relax][Pytorch] Fix incorrect behaviour of % (mod) operator in TVM frontend
* [#17875](https://github.com/apache/tvm/pull/17875) - [Relax][Pytorch] Incorrect Handling of In-Place Ops in FX-Based TVM Frontend
* [#17838](https://github.com/apache/tvm/pull/17838) - [TIR] Schedule support reverse-inline with reduction blocks

### CI

* [#18071](https://github.com/apache/tvm/pull/18071) - Update windows to 2025
* [#18058](https://github.com/apache/tvm/pull/18058) - [TEST] Move temp files into tempdir
* [#18037](https://github.com/apache/tvm/pull/18037) - Further robustify is_last_build check
* [#17981](https://github.com/apache/tvm/pull/17981) - Update images to `20250513-063354-70aa3797`
* [#17891](https://github.com/apache/tvm/pull/17891) - Update images to 20250428-080833-03eadc65
* [#17905](https://github.com/apache/tvm/pull/17905) - Install PyTorch 2.7 compatible with CUDA 11.8
* [#17887](https://github.com/apache/tvm/pull/17887) - Upgrade pytorch to 2.7.0, torchvision to 0.22.0, and vulkan sdk to 1.4.309
* [#17846](https://github.com/apache/tvm/pull/17846) - Upgrade ubuntu runner image for GitHub CI

### Docker

* [#17955](https://github.com/apache/tvm/pull/17955) - [CI] Reintroduce NNEF to CI images

### Docs

* [#18056](https://github.com/apache/tvm/pull/18056) - Update installation instructions based on ffi refactor

### Frontend

* [#18090](https://github.com/apache/tvm/pull/18090) - [Relax][ONNX] Update Reduce ops to support axes as input
* [#18072](https://github.com/apache/tvm/pull/18072) - [Relax][ONNX] Update ReduceL1 to opset 18
* [#18016](https://github.com/apache/tvm/pull/18016) - [Relax][ONNX] Replace deprecated `mapping.TENSOR_TYPE_TO_NP_TYPE` usage
* [#18001](https://github.com/apache/tvm/pull/18001) - [Relax][ONNX] Fix: bitwise_not misclassified as binary (is …
* [#17990](https://github.com/apache/tvm/pull/17990) - [Relax] Fix: Output tensor with zero dimension after torch.u…
* [#17925](https://github.com/apache/tvm/pull/17925) - [Relax][PyTorch] Re-enable test_subgraph_capture in dynamo test
* [#17980](https://github.com/apache/tvm/pull/17980) - [ONNX] Make bias input optional in LayerNormalization
* [#17918](https://github.com/apache/tvm/pull/17918) - [Relax][PyTorch] Add ReLU6 Op Support for Exported Program and FX graph
* [#17930](https://github.com/apache/tvm/pull/17930) - [Relax][PyTorch] Add torch.outer Op Support for Exported Program and FX graph
* [#17932](https://github.com/apache/tvm/pull/17932) - [Relax][PyTorch] Add UpSample Bicubic Op Support for Exported Program and FX graph
* [#17921](https://github.com/apache/tvm/pull/17921) - [Relax][PyTorch] Add AvgPool 1D and 3D Op Support for Exported Program and FX graph
* [#17922](https://github.com/apache/tvm/pull/17922) - [Relax][PyTorch] Add Adaptive AvgPool 1D and 3D Op Support for Exported Program and FX graph
* [#17863](https://github.com/apache/tvm/pull/17863) - [Relax][PyTorch] CrossEntropyLoss
* [#17919](https://github.com/apache/tvm/pull/17919) - [Relax][PyTorch] Add MaxPool 1D and 3D Op Support for Exported Program and FX graph
* [#17926](https://github.com/apache/tvm/pull/17926) - [Relax][PyTorch] Add tests for all the dtypes supported in the PyTorch frontend
* [#17924](https://github.com/apache/tvm/pull/17924) - [Relax][PyTorch] Add div.Tensor_mode and trunc Op Support for Exported Program and FX graph
* [#17904](https://github.com/apache/tvm/pull/17904) - [Relax][PyTorch] Add Meshgrid Op Support for Exported Program and FX graph
* [#17915](https://github.com/apache/tvm/pull/17915) - [Relax][PyTorch] Add support for linspace op in fx graph
* [#17886](https://github.com/apache/tvm/pull/17886) - [Relax][PyTorch] Add Pixel Shuffle Op Support for Exported Program and FX graph
* [#17908](https://github.com/apache/tvm/pull/17908) - [Relax][PyTorch] Add support for eye op in fx graph
* [#17893](https://github.com/apache/tvm/pull/17893) - [Relax][Pytorch] Add fmod support
* [#17894](https://github.com/apache/tvm/pull/17894) - [Relax][PyTorch] Support torch.bfloat16 dtype in pytorch frontend
* [#17878](https://github.com/apache/tvm/pull/17878) - [Relax][PyTorch] Add torch.isin Op Support for Exported Program and FX graph
* [#17889](https://github.com/apache/tvm/pull/17889) - [Relax][PyTorch] Support linspace op for ExportedProgram importer
* [#17868](https://github.com/apache/tvm/pull/17868) - [Relax][Pytorch] Add support for ones_like, zero_, zeros, type_as, item ops
* [#17857](https://github.com/apache/tvm/pull/17857) - [Relax][PyTorch] Refactor norm op for ExportedProgram importer
* [#17852](https://github.com/apache/tvm/pull/17852) - [Relax][PyTorch] Sort.default
* [#17871](https://github.com/apache/tvm/pull/17871) - [Relax][Pytorch] Add support for bitwise_or op
* [#17836](https://github.com/apache/tvm/pull/17836) - [Relax][PyTorch] support for index.Tensor
* [#17864](https://github.com/apache/tvm/pull/17864) - [Relax][PyTorch] Support eye op for ExportedProgram importer
* [#17858](https://github.com/apache/tvm/pull/17858) - [Relax][PyTorch] Add copy_ op support in fxGraph
* [#17851](https://github.com/apache/tvm/pull/17851) - [Relax][PyTorch] Support `leaky_relu_.default` and `reshape_as.default` in ExportedProgram frontend
* [#17843](https://github.com/apache/tvm/pull/17843) - [Relax][PyTorch] Add mul_.Tensor, max.default, min.default and pow.Scalar Op Support into Exported Program Frontend
* [#17821](https://github.com/apache/tvm/pull/17821) - [Relax][PyTorch] Add Pad Op Support for Exported Program and FX graph
* [#17819](https://github.com/apache/tvm/pull/17819) - [Relax][PyTorch] Add Stack Op Support for Exported Program
* [#17849](https://github.com/apache/tvm/pull/17849) - [Relax][PyTorch] Add RSub Op Support for Exported Program and FX graph
* [#17850](https://github.com/apache/tvm/pull/17850) - [Relax][Pytorch] Add masked_fill op support in ExportedProgram
* [#17816](https://github.com/apache/tvm/pull/17816) - [Relax][PyTorch] Add PReLU Op Support for Exported Program and FX graph
* [#17803](https://github.com/apache/tvm/pull/17803) - [Relax][PyTorch] Add Logaddexp op support for exported program
* [#17841](https://github.com/apache/tvm/pull/17841) - [Relax][PyTorch] Add support for norm op
* [#17832](https://github.com/apache/tvm/pull/17832) - [Relax][PyTorch] full.default, full_like.default, ones.default
* [#17830](https://github.com/apache/tvm/pull/17830) - [Relax][PyTorch] Support narrow and broadcast_to ops for ExportedProgram importer
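
Most of the PyTorch entries above extend the Relax ExportedProgram/FX importers with additional operators and dtypes. As a rough sketch of how these importers are typically exercised (the module, shapes, and the choice of `relu6` are illustrative, and a PyTorch 2.x install with `torch.export` is assumed):

```python
import torch
import tvm
from tvm.relax.frontend.torch import from_exported_program

class SmallNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 8)

    def forward(self, x):
        # relu6 is among the ops whose ExportedProgram/FX support is
        # listed in this release (#17918).
        return torch.nn.functional.relu6(self.fc(x))

# Export the model with torch.export, then translate it to a Relax IRModule.
example_args = (torch.randn(2, 16, dtype=torch.float32),)
exported = torch.export.export(SmallNet().eval(), example_args)
mod = from_exported_program(exported)
mod.show()  # inspect the imported Relax module
```

The ONNX entries follow the analogous `tvm.relax.frontend.onnx.from_onnx` path.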

### LLVM

* [#17859](https://github.com/apache/tvm/pull/17859) - [Codegen] Enable SVE/VLA for RISCV targets
* [#17958](https://github.com/apache/tvm/pull/17958) - Fix JIT unknown reloc issue for case of RISCV
* [#17954](https://github.com/apache/tvm/pull/17954) - [FFI] Fix compilation errors with clang20

### Metal

* [#18034](https://github.com/apache/tvm/pull/18034) - Fix `GetFunction` of metal runtime

### ROCm

* [#18029](https://github.com/apache/tvm/pull/18029) - Fix ROCm build after FFI refactor

### Relax

* [#18102](https://github.com/apache/tvm/pull/18102) - Fix rotary embedding buffer size calculation
* [#17928](https://github.com/apache/tvm/pull/17928) - [KVCache] Per Layer Sliding Window
* [#17840](https://github.com/apache/tvm/pull/17840) - Refactor missing op check into shared utility for Torch frontends
* [#17826](https://github.com/apache/tvm/pull/17826) - Fix Torch frontends to report all the missing ops

### Runtime

* [#18097](https://github.com/apache/tvm/pull/18097) - CutensorMap support

### TIR

* [#18068](https://github.com/apache/tvm/pull/18068) - Extend address_of to support Buffer objects
* [#18069](https://github.com/apache/tvm/pull/18069) - Fix block access region detection for nested let bindings
* [#18057](https://github.com/apache/tvm/pull/18057) - Phase out ProducerStore, ProducerRealize and Prefetch

### TOPI

* [#18039](https://github.com/apache/tvm/pull/18039) - [Relax] Support InstanceNorm & Bugfix of InstanceNorm
* [#18063](https://github.com/apache/tvm/pull/18063) - [NN][Layer_Norm] Fix layer_norm error with reduce-only axes
* [#18006](https://github.com/apache/tvm/pull/18006) - Fix index handling in expand_like operator for axis expansion
* [#18015](https://github.com/apache/tvm/pull/18015) - Support integer type input for log10
* [#17942](https://github.com/apache/tvm/pull/17942) - Add shape validation to prevent negative dimensions in conv operations

### Vulkan

* [#18005](https://github.com/apache/tvm/pull/18005) - Add TIR unary trigonometric/hyperbolic intrinsic definitions

### cuda & cutlass & tensorrt

* [#18064](https://github.com/apache/tvm/pull/18064) - [CUTLASS] Fix CUTLASS kernel build on Hopper
* [#18033](https://github.com/apache/tvm/pull/18033) - [CUTLASS] Add GeMM kernels for Blackwell GPUs
* [#18024](https://github.com/apache/tvm/pull/18024) - [CUDA] Fix thrust with latest FFI refactor
* [#18118](https://github.com/apache/tvm/pull/18118) - bump cutlass_fpA_intB_gemm
* [#18113](https://github.com/apache/tvm/pull/18113) - [CMake] Refine C++/CUDA standard settings in CMakeLists.txt

### FFI

* [#18076](https://github.com/apache/tvm/pull/18076) - [FFI][REFACTOR] Stablize container ABI and implementation
* [#18091](https://github.com/apache/tvm/pull/18091) - [FFI] Provide Field Visit bridge so we can do gradual transition
* [#18095](https://github.com/apache/tvm/pull/18095) - [FFI][REFACTOR] Migrate attrs to use new reflection
* [#18083](https://github.com/apache/tvm/pull/18083) - [FFI] Update typeinfo to speedup parent reflection
* [#18077](https://github.com/apache/tvm/pull/18077) - [FFI] Optimize atomic decref in Object
* [#18065](https://github.com/apache/tvm/pull/18065) - [FFI] Introduce FFI reflection support in python
* [#18062](https://github.com/apache/tvm/pull/18062) - [FFI][REFACTOR] Update registry to have complete meta-data
* [#18059](https://github.com/apache/tvm/pull/18059) - [FFI][REFACTOR] Enhance reflection
* [#18050](https://github.com/apache/tvm/pull/18050) - [FFI] Enhance FFI Object exception safety during init
* [#18121](https://github.com/apache/tvm/pull/18121) - Revert "[FFI] Replace `Arg2Str` with a more powerful `for_each`"
* [#18117](https://github.com/apache/tvm/pull/18117) - [FFI] Replace `Arg2Str` with a more powerful `for_each`
* [#18116](https://github.com/apache/tvm/pull/18116) - [FFI] Use fold expression to simplify for_each
* [#18114](https://github.com/apache/tvm/pull/18114) - [FFI] Replace `__attribute__` with C++ standard attributes
* [#18112](https://github.com/apache/tvm/pull/18112) - [FFI] Cleanup visit_attrs attribute after refactor
* [#18111](https://github.com/apache/tvm/pull/18111) - [FFI] Introduce GlobalDef for function registration
* [#18106](https://github.com/apache/tvm/pull/18106) - [REFACTOR][FFI] Phase out old VisitAttrs mechanism
* [#18042](https://github.com/apache/tvm/pull/18042) - [REFACTOR][FFI] Update symbol name for library module
* [#18023](https://github.com/apache/tvm/pull/18023) - [FFI] More strict tuple constructor checking
* [#18022](https://github.com/apache/tvm/pull/18022) - [REFACTOR][FFI] Cleanup PackedFunc redirections
* [#18020](https://github.com/apache/tvm/pull/18020) - [REFACTOR][PYTHON] Phase out tvm.\_ffi and Limited API support
* [#17979](https://github.com/apache/tvm/pull/17979) - [FFI][REFACTOR] Update to distinguish as and cast
* [#17983](https://github.com/apache/tvm/pull/17983) - [FFI][JVM] Upgrade tvm4j to latest FFI
* [#18010](https://github.com/apache/tvm/pull/18010) - [REFACTOR][FFI] Phase out legacy C API
* [#17943](https://github.com/apache/tvm/pull/17943) - [FFI] Variant specialize for all ObjectRef
* [#17939](https://github.com/apache/tvm/pull/17939) - [REFACTOR] Phase out legacy rust ffi
* [#17940](https://github.com/apache/tvm/pull/17940) - [REFACTOR] Phase out legacy go ffi
* [#17931](https://github.com/apache/tvm/pull/17931) - [REFACTOR][FFI][RPC] Migrate RPC to use the latest FFI ABI
* [#17929](https://github.com/apache/tvm/pull/17929) - [REFACTOR][FFI] Cleanup container redirections
* [#17927](https://github.com/apache/tvm/pull/17927) - [FFI][FEAT] AutoDLPack for taking external tensor objects
* [#17923](https://github.com/apache/tvm/pull/17923) - [REFACTOR][FFI] Cleanup PackedFunc related redirection
* [#17920](https://github.com/apache/tvm/pull/17920) - [REFACTOR] Introduce and modernize ffi system
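
The FFI series above rewrites the core ABI, registry, and reflection machinery and phases out the legacy `tvm._ffi` Python layer. For orientation, here is a minimal sketch of the Python-facing global-function registry; it assumes the long-standing `tvm.register_func` / `tvm.get_global_func` entry points remain exposed after the refactor, and the `demo.add_one` name is purely illustrative:

```python
import tvm

# Register a Python callback under an illustrative name in the global
# function registry (assumes tvm.register_func is still the top-level
# entry point after the tvm._ffi phase-out).
@tvm.register_func("demo.add_one", override=True)
def add_one(x):
    return x + 1

# Retrieve the callable through the registry and invoke it across the FFI.
fadd = tvm.get_global_func("demo.add_one")
assert fadd(41) == 42
```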

### web

* [#17946](https://github.com/apache/tvm/pull/17946) - [REFACTOR][FFI] Upgrade Web Runtime to new FFI
* [#17917](https://github.com/apache/tvm/pull/17917) - [WebGPU][CodeGen] Override PrintVecElemLoad and Store for WebGPU

### Misc

* [#18104](https://github.com/apache/tvm/pull/18104) - Add LLVM Legalization for tir.erf
* [#18107](https://github.com/apache/tvm/pull/18107) - fix: guard tensormap with cuda version check
* [#18101](https://github.com/apache/tvm/pull/18101) - [REFACTOR] Formalize namespace for all objects
* [#18040](https://github.com/apache/tvm/pull/18040) - Add support for bucketize
* [#18098](https://github.com/apache/tvm/pull/18098) - [REFACTOR] Transition VisitAttrs to new reflection mechanism
* [#18096](https://github.com/apache/tvm/pull/18096) - [REFACTOR] Transition VisitAttrs to new reflection mechanism in tir/ir_builder/meta_schedule
* [#18093](https://github.com/apache/tvm/pull/18093) - [NVSHMEM] Extend CUDA backend to compile and link TIR modules with NVSHMEM
* [#18088](https://github.com/apache/tvm/pull/18088) - [Script] Enhance alloc buffer handling in nested frames
* [#18086](https://github.com/apache/tvm/pull/18086) - [SCRIPT] Bump Python minimum version to 3.9 and update AST compatibility
* [#18075](https://github.com/apache/tvm/pull/18075) - add support for softsign op
* [#18079](https://github.com/apache/tvm/pull/18079) - [Script] Add support for merging block annotations
* [#18080](https://github.com/apache/tvm/pull/18080) - [REFACTOR] Phase out LegacyReprPrinter and improve CommonSubExprElim
* [#18078](https://github.com/apache/tvm/pull/18078) - [REFACTOR] Phase out the RelaxExpr.checked_type in favor of struct_info
* [#18073](https://github.com/apache/tvm/pull/18073) - [NVSHMEM] Update NDArray allocation
* [#18066](https://github.com/apache/tvm/pull/18066) - [Script] Remove deprecated attributes from Constant AST node
* [#18060](https://github.com/apache/tvm/pull/18060) - Add Python functor support for TIR expressions and statements
* [#18054](https://github.com/apache/tvm/pull/18054) - [Pytest] Remove obsolete test suite entries
* [#18036](https://github.com/apache/tvm/pull/18036) - Add support for hamming_window op
* [#18049](https://github.com/apache/tvm/pull/18049) - [Refactor] Rename `relax_vm` to `vm`
* [#18046](https://github.com/apache/tvm/pull/18046) - [3rdparty] Phasing out FlashInfer AOT from 3rdparty
* [#18047](https://github.com/apache/tvm/pull/18047) - [Backend] JIT compile FlashInfer kernel with FFI header
* [#18041](https://github.com/apache/tvm/pull/18041) - [DTYPE] Fix dtype functions after dtype refactor
* [#18043](https://github.com/apache/tvm/pull/18043) - [REFACTOR] Phase out the relax tuning_api
* [#18038](https://github.com/apache/tvm/pull/18038) - Resolving inconsistency between attention/attention_bias
* [#18027](https://github.com/apache/tvm/pull/18027) - [Dtype] Low-precision Blackwell Datatype Support
* [#17985](https://github.com/apache/tvm/pull/17985) - [Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends
* [#17978](https://github.com/apache/tvm/pull/17978) - Fix IR generation conflict in topi.nn.simplify by separating Tensor and PrimExpr handling
* [#18026](https://github.com/apache/tvm/pull/18026) - [Python] Fix library lookup path for pip installed packages
* [#18019](https://github.com/apache/tvm/pull/18019) - Add op support for slice_scatter
* [#17974](https://github.com/apache/tvm/pull/17974) - Fix FLOP estimation for EvaluateNode by implementing VisitStmt_ handler
* [#18013](https://github.com/apache/tvm/pull/18013) - Fix RuntimeError: parallel_for_dynamic
* [#18014](https://github.com/apache/tvm/pull/18014) - Fix division truncation in window size calculation for small dtypes in average_pool
* [#17995](https://github.com/apache/tvm/pull/17995) - Fix zero-extent loops in PerStoreFeature to prevent crashes
* [#17969](https://github.com/apache/tvm/pull/17969) - Add registration for the operators asinh, acosh, atanh in llvm
* [#17972](https://github.com/apache/tvm/pull/17972) - Fix g.costs
* [#17953](https://github.com/apache/tvm/pull/17953) - Fix sqrt/rsqrt Compatibility with Integer Data Types
* [#17961](https://github.com/apache/tvm/pull/17961) - Fix basic FLOP estimation for WhileNode
* [#17945](https://github.com/apache/tvm/pull/17945) - Add registration for the operators asin and acos in llvm
* [#17951](https://github.com/apache/tvm/pull/17951) - [NODE] Fix structural equality for Array<Any> specialization
* [#17913](https://github.com/apache/tvm/pull/17913) - [Triton] Support latest `triton.compile` interface
* [#17911](https://github.com/apache/tvm/pull/17911) - Add op support for new_zeros op in Exported Program and fx graph frontend
* [#17909](https://github.com/apache/tvm/pull/17909) - Add masked_fill_.scalar, logical_not.default in Exported Program frontend
* [#17910](https://github.com/apache/tvm/pull/17910) - [RPC] Fix Bug That Change Dict When Iterate The Keys
* [#17896](https://github.com/apache/tvm/pull/17896) - Add op support for zeros_like and fill_
* [#17900](https://github.com/apache/tvm/pull/17900) - Fix onnx expand op
* [#17865](https://github.com/apache/tvm/pull/17865) - Add support for index_put_ op
* [#17839](https://github.com/apache/tvm/pull/17839) - Add op support for roll op
* [#17844](https://github.com/apache/tvm/pull/17844) - Fix incorrect docstring in topi softmax
* [#17831](https://github.com/apache/tvm/pull/17831) - [3rdparty] Bump DLPack to v1.1 for float8/6/4 dtype supports
* [#17848](https://github.com/apache/tvm/pull/17848) - Fix docstring in batch_to_space_nd and bitpack
* [#17845](https://github.com/apache/tvm/pull/17845) - fixing incorrect docstring in upsampling.py
* [#17808](https://github.com/apache/tvm/pull/17808) - [Install] Fix error during python/tvm installation