[apache/tvm] [Release] v0.16.0 Release Candidate Notes (Issue #16911)

ysh329 Sun, 21 Apr 2024 05:03:54 -0700

# Introduction

The TVM community has worked since the v0.15.0 release to deliver the following 
exciting new improvements! Highlights of this release:


- **First support of Relax**, with dynamic shape and pipeline
- Dlight module for optimizing LLM TIR workloads on GPU
- Disco module for initial SPMD multi-GPU support

The main tags are below (**bold** marks areas with substantial progress):

- Community, RFCs
- Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
- **Relax**, **Dlight**, **Disco**
- Arith, **TIR**, TVMScript
- Docs, CI, **Misc**, **BugFix**

Please visit the full listing of commits for a complete view: 
[v0.16.dev0...v0.16.0.rc0](https://github.com/apache/tvm/compare/v0.16.dev0...v0.16.0.rc0).

### Community

 * [#16695](https://github.com/apache/tvm/pull/16695) - Add new key for release 
signing
 * [#16419](https://github.com/apache/tvm/pull/16419) - Add new key for release 
signing

### RFCs

This new RFC explores how TVM can be utilized to generate code for the SME ISA 
to achieve improved inference performance on supported Arm®-based hardware 
implementing the SME extension.

 * [#107](https://github.com/apache/tvm-rfcs/pull/107) - [RFC] Scalable Matrix 
Extension enablement
----

### Arith
 * [#16735](https://github.com/apache/tvm/pull/16735) - [Fixup] Require feature 
flag for tighter inequality bounds
 * [#16588](https://github.com/apache/tvm/pull/16588) - Provide tighter 
ConstIntBounds for special cases
 * [#16704](https://github.com/apache/tvm/pull/16704) - [Fix]Fix canonical 
simplification of LE

### BYOC
 * [#16567](https://github.com/apache/tvm/pull/16567) - Skip processed 
functions in FuseOpsByPattern and RunCodegen

### BugFix
 * [#16766](https://github.com/apache/tvm/pull/16766) - [Target] Added null 
check to fix segfault at ->defined() in cpu.cc DetectSystemTriple()
 * [#16739](https://github.com/apache/tvm/pull/16739) - [Ansor] Fixing Ansor 
Gradient Bug
 * [#16820](https://github.com/apache/tvm/pull/16820) - [Fix] PAPI docs
 * [#16793](https://github.com/apache/tvm/pull/16793) - [Fix] fix for numpy 2.0 
compatibility
 * [#16790](https://github.com/apache/tvm/pull/16790) - [Fix] Fix build errors 
with VS2022
 * [#16780](https://github.com/apache/tvm/pull/16780) - [Fix] Fix numpy dtype 
map
 * [#16773](https://github.com/apache/tvm/pull/16773) - [Fix] Fix the purity 
flag of "vm.call_tir_dyn" and "kill" ops
 * [#16770](https://github.com/apache/tvm/pull/16770) - [Hotfix] Revert driver 
API pass ordering that breaks MLC, mark failing test
 * [#16771](https://github.com/apache/tvm/pull/16771) - [Fix] Remove redundant 
"remove_all_unused" in IPC memory lowering
 * [#16746](https://github.com/apache/tvm/pull/16746) - [Fix][Builtin] Fix 
"GetQueryPosition" of PagedKVCache
 * [#16728](https://github.com/apache/tvm/pull/16728) - [Fix] Introduce 
TVM_DEBUG_WITH_ABI_CHANGE to warn ABI changes in debug mode
 * [#16714](https://github.com/apache/tvm/pull/16714) - [Fix] PagedKVCache 
fetching compute stream when copy stream is needed
 * [#16684](https://github.com/apache/tvm/pull/16684) - [SLM] Produce 
well-formed Relax for nn.modules.KVCache
 * [#16659](https://github.com/apache/tvm/pull/16659) - add the default value 
for DFT in ONNX frontend
 * [#16637](https://github.com/apache/tvm/pull/16637) - [Transform] Preserve 
symbolic variables in FuseOps
 * [#16649](https://github.com/apache/tvm/pull/16649) - [FFI] Add a missing 
default for datatype lanes
 * [#16492](https://github.com/apache/tvm/pull/16492) - [Executor] fix 
debug_executor function debug_get_output
 * [#16598](https://github.com/apache/tvm/pull/16598) - [Transform]Handle 
non-composite lambda functions in FuseOps
 * [#16565](https://github.com/apache/tvm/pull/16565) - [Transform] Keep 
private non-primitive functions in FuseTIR
 * [#16518](https://github.com/apache/tvm/pull/16518) - Use x*x*x instead of 
pow(x,3)
 * [#16436](https://github.com/apache/tvm/pull/16436) - Ensure that bf16 arrays 
are created as expected
 * [#16361](https://github.com/apache/tvm/pull/16361) - Disable 
SingleEnvThreadVerifier
 * [#16289](https://github.com/apache/tvm/pull/16289) - [AUTOTVM][FIX] Typo 
fixes and add a warning in the Droplet Search

### CI
 * [#16837](https://github.com/apache/tvm/pull/16837) - Disable flaky unit test
 * [#16765](https://github.com/apache/tvm/pull/16765) - [AOT][Testing] Improve 
output mismatch information on test failure
 * [#16661](https://github.com/apache/tvm/pull/16661) - add merge_with_main in 
unity
 * [#16611](https://github.com/apache/tvm/pull/16611) - [AOT][Testing] Print 
output values on test failure
 * [#16546](https://github.com/apache/tvm/pull/16546) - Disable testing that 
downloads from mxnet
 * [#16521](https://github.com/apache/tvm/pull/16521) - Fix CI Script and 
Broken Tests
 * [#16502](https://github.com/apache/tvm/pull/16502) - Support tvm-bot rerun 
for tvm-unity task
 * [#16435](https://github.com/apache/tvm/pull/16435) - Update image tag to 
20240126-070121-8ade9c30e
 * [#16420](https://github.com/apache/tvm/pull/16420) - [WASM] Update emsdk and 
nodejs version
 * [#16384](https://github.com/apache/tvm/pull/16384) - Remove 
NVIDIA_DISABLE_REQUIRE
 * [#16382](https://github.com/apache/tvm/pull/16382) - In 
jenkins.cmd_utils.Sh.tee, check for failing subprocess
 * [#16366](https://github.com/apache/tvm/pull/16366) - Upgrade sccache version 
to 0.7.*
 * [#16369](https://github.com/apache/tvm/pull/16369) - Upgrade Unity ci images
 * [#16344](https://github.com/apache/tvm/pull/16344) - Update docker images 
tag to 20240105-165030-51bdaec6
 * [#16340](https://github.com/apache/tvm/pull/16340) - [Unity][UnitTest] 
Increase atol to resolve flaky CI failure
 * [#16337](https://github.com/apache/tvm/pull/16337) - [Hexagon][UnitTest] 
Disable flaky quantization test
 * [#16336](https://github.com/apache/tvm/pull/16336) - Upgrade cmake version 
to 3.24.0

### Docker
 * [#16755](https://github.com/apache/tvm/pull/16755) - [SME]Add Fixed Virtual 
Platform (FVP) and toolchain install
 * [#16348](https://github.com/apache/tvm/pull/16348) - Upgrade pip in i386 
container

### Dlight
 * [#16775](https://github.com/apache/tvm/pull/16775) - [Fix][Dlight] 
(Low-batched-)GeMV on small spatial loops
 * [#16429](https://github.com/apache/tvm/pull/16429) - [Unity][Dlight][Fix] 
Reduction rule support dyn-shape epilogue
 * [#16351](https://github.com/apache/tvm/pull/16351) - [Unity] Add 
dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
 * [#16338](https://github.com/apache/tvm/pull/16338) - [Unity][DLight] 
Introduce Specific Rule for RMSNorm
 * [#16251](https://github.com/apache/tvm/pull/16251) - [Unity][Dlight] Support 
dlight gemv rule on nested inner block
 * [#16878](https://github.com/apache/tvm/pull/16878) - [Dlight] Enhance 
vectorization loading weight for gemv
 * [#16848](https://github.com/apache/tvm/pull/16848) - [DLight] Fix a corner 
case for reduction rule
 * [#16701](https://github.com/apache/tvm/pull/16701) - [Dlight] Add fallback 
for low batch gemv with outer reduction
 * [#16678](https://github.com/apache/tvm/pull/16678) - [Dlight] LowBatchGemv 
rule only apply to function with spatial symbolic var
 * [#16665](https://github.com/apache/tvm/pull/16665) - [Dlight] Skip GeMV when 
normalization fails
 * [#16579](https://github.com/apache/tvm/pull/16579) - [Dlight] Scheduling Low 
batch GEMM using GEMV-like rule
 * [#16321](https://github.com/apache/tvm/pull/16321) - [DLight] Skip rule if 
target is not suitable
 * [#16731](https://github.com/apache/tvm/pull/16731) - [Dlight] Fix GeMV 
shared memory estimation

### Docs
 * [#16792](https://github.com/apache/tvm/pull/16792) - [Doc] Fix 
set_axis_separator example
 * [#16610](https://github.com/apache/tvm/pull/16610) - [Doc] Fixed Docstring 
usage example in `tvm.ir.make_node`
 * [#16572](https://github.com/apache/tvm/pull/16572) - [Doc] Remove MxNet 
related tutorials
 * [#16514](https://github.com/apache/tvm/pull/16514) - [Unity][Doc] Document 
passes that depend on `DataflowBlock`s and encourage using `ConvertToDataflow`
 * [#16482](https://github.com/apache/tvm/pull/16482) - [Doc] Fix Docstring in 
`extern.py` for Sphinx
 * [#16346](https://github.com/apache/tvm/pull/16346) - [Doc] Fix minor error 
in "Expressions in Relay"

### Frontend
 * [#16001](https://github.com/apache/tvm/pull/16001) - [ONNX] Fix interpreting 
auto_pad parameters in ConvTranspose operator
 * [#16651](https://github.com/apache/tvm/pull/16651) - [PaddlePaddle] 
PaddlePaddle model with NCHW data format that supports quantization
 * [#16616](https://github.com/apache/tvm/pull/16616) - [PaddlePaddle] Support 
conv2d when data_format is NHWC
 * [#16526](https://github.com/apache/tvm/pull/16526) - [Keras] Enable Dense 
operator for any input dims
 * [#16478](https://github.com/apache/tvm/pull/16478) - [PaddlePaddle] Fixed 
the bug that prevented the model from being successfully converted to microTVM 
on MacOS

### Hexagon
 * [#16762](https://github.com/apache/tvm/pull/16762) - [VM]Cache operations 
when bypass mode is enabled
 * [#16706](https://github.com/apache/tvm/pull/16706) - [VM] Add buffers to 
`dma_wait` builtin
 * [#16448](https://github.com/apache/tvm/pull/16448) - [VM]Implement dma_copy 
and dma_wait builtin for hexagon

### LLVM
 * [#16782](https://github.com/apache/tvm/pull/16782) - [SVE] Support scalable 
vectors in LoopVectorizer
 * [#16812](https://github.com/apache/tvm/pull/16812) - Fix compilation failure 
due to minor change
 * [#16808](https://github.com/apache/tvm/pull/16808) - [Runtime]Fix errors 
during loading of target tags
 * [#16748](https://github.com/apache/tvm/pull/16748) - Lack of DWARF type is 
not an error
 * [#16696](https://github.com/apache/tvm/pull/16696) - [SVE] Add codegen 
support for scalable buffer accesses
 * [#15964](https://github.com/apache/tvm/pull/15964) - [RUNTIME] Add optional 
LLVM ORCJIT runtime executor
 * [#16612](https://github.com/apache/tvm/pull/16612) - [SVE] Add support for 
scalable data type strings
 * [#16523](https://github.com/apache/tvm/pull/16523) - [SVE] Change the dtype 
of Ramp and Broadcast lanes to PrimExpr
 * [#16484](https://github.com/apache/tvm/pull/16484) - [SVE] Add vscale builtin
 * [#16373](https://github.com/apache/tvm/pull/16373) - Update Host.h path

### MetaSchedule
 * [#16725](https://github.com/apache/tvm/pull/16725) - Make the `opt_level` of 
`tune_relay()` adjustable

### Metal
 * [#16713](https://github.com/apache/tvm/pull/16713) - [RUNTIME]Provide richer 
runtime when error happens
 * [#16605](https://github.com/apache/tvm/pull/16605) - [RUNTIME]Fix 
multithreading access of metal runtime
 * [#16438](https://github.com/apache/tvm/pull/16438) - Dispatch numerically 
stable tanh for metal

### OpenCL & CLML
 * [#16854](https://github.com/apache/tvm/pull/16854) - [OpenCL] Add OpenCL 
device for automatic target detection
 * [#16846](https://github.com/apache/tvm/pull/16846) - [Meta-Schedule][OpenCL] 
Enable MS tuning for Android OpenCL
 * [#16768](https://github.com/apache/tvm/pull/16768) - [RUNTIME][OPENCL] 
Bugfix for ciImage create with host ptr
 * [#16672](https://github.com/apache/tvm/pull/16672) - [CLML] Fix build TVM 
with CLML on MacOS
 * [#16328](https://github.com/apache/tvm/pull/16328) - [RUNTIME][CLML] Fix for 
Softmax op for 4D tensors
 * [#16394](https://github.com/apache/tvm/pull/16394) - [OpenCL][CMake] Fix 
OpenCL tests compilation

### ROCm
 * [#16441](https://github.com/apache/tvm/pull/16441) - [WebGPU] Intrin 
Dispatch: `tanh`, `erf`, `log`
 * [#16404](https://github.com/apache/tvm/pull/16404) - Some fixes of ROCm 
codegen

### Relax
 * [#16872](https://github.com/apache/tvm/pull/16872) - Enhance symbolic expr 
estimation in memory planning
 * [#16867](https://github.com/apache/tvm/pull/16867) - Dispatch sort/scan for 
non-cuda gpu backends
 * [#16852](https://github.com/apache/tvm/pull/16852) - Fix 
EliminateCommonSubexpr removing alloc tensor
 * [#16851](https://github.com/apache/tvm/pull/16851) - [Relax,Topi] Allow 
passing workspace to thrust to avoid allocations
 * [#16841](https://github.com/apache/tvm/pull/16841) - Provide well-formed 
output in `transform.LazyGetInput`
 * [#16798](https://github.com/apache/tvm/pull/16798) - [Transform] Provide 
callback versions of LazyTransformParams
 * [#16801](https://github.com/apache/tvm/pull/16801) - Allow 
DeadCodeElimination within ApplyPassToFunction
 * [#16834](https://github.com/apache/tvm/pull/16834) - Capture symbolic vars 
in struct info of weights
 * [#16830](https://github.com/apache/tvm/pull/16830) - Share storage allocs 
among functions after cuda graph rewriting
 * [#16823](https://github.com/apache/tvm/pull/16823) - [VM] Refactor CUDA 
graph builtins as VM extension
 * [#16828](https://github.com/apache/tvm/pull/16828) - [Bugfix] Provide the 
full Expr to pattern-match rewriter
 * [#16805](https://github.com/apache/tvm/pull/16805) - [Bugfix]BlockBuilder 
may not assume unique input functions
 * [#16815](https://github.com/apache/tvm/pull/16815) - Enable capturing 
symbolic shapes in cuda graph
 * [#16642](https://github.com/apache/tvm/pull/16642) - Allow R.Prim('bool') in 
relax::If and assert_op
 * [#16796](https://github.com/apache/tvm/pull/16796) - Unit-test for 
structural equal of recursive function
 * [#16732](https://github.com/apache/tvm/pull/16732) - Allow composition of 
DFPattern replacements
 * [#16783](https://github.com/apache/tvm/pull/16783) - Improve 
CanonicalizeBindings in DataflowVar edge case
 * [#16721](https://github.com/apache/tvm/pull/16721) - Implement operators to 
inspect DLTensor::strides and offset
 * [#16730](https://github.com/apache/tvm/pull/16730) - Refactor 
PatternRewriter into separate Block/Expr mutators
 * [#16756](https://github.com/apache/tvm/pull/16756) - [IR]Improve 
highlighting in assert_structural_equal
 * [#16779](https://github.com/apache/tvm/pull/16779) - Improve malform error 
msg
 * [#16569](https://github.com/apache/tvm/pull/16569) - [Unity][Parser] Check 
well-formedness in the parser
 * [#16759](https://github.com/apache/tvm/pull/16759) - [Pass] Lowering passes 
for GPU IPC memory and allreduce
 * [#16697](https://github.com/apache/tvm/pull/16697) - Implement 
relax.transform.TopologicalSort
 * [#16658](https://github.com/apache/tvm/pull/16658) - Normalize use of 
void-type variable to inline R.tuple()
 * [#16711](https://github.com/apache/tvm/pull/16711) - [Frontend] Add op 
`tanh`, `exp`, `negative`, and `permute`
 * [#16703](https://github.com/apache/tvm/pull/16703) - [Fix]Fix top-p/top-k 
sampling kernel
 * [#16669](https://github.com/apache/tvm/pull/16669) - [Frontend][Onnx] add 
sum and globalavgpool 1d/3d op
 * [#16691](https://github.com/apache/tvm/pull/16691) - CUDA graph rewrite 
treating StringImm as static
 * [#16685](https://github.com/apache/tvm/pull/16685) - Implement 
StructInfoPattern for dataflow pattern matching
 * [#16681](https://github.com/apache/tvm/pull/16681) - [Frontend][Onnx] 
support MaxPool1/2/3D and AveragePool1/2/3D
 * [#16584](https://github.com/apache/tvm/pull/16584) - [Unity][TIR] Clear 
struct info when specializing PrimFunc
 * [#16676](https://github.com/apache/tvm/pull/16676) - Remove the legalization 
of cumsum/cumprod
 * [#16654](https://github.com/apache/tvm/pull/16654) - [Frontend][NN] Add 
support for Conv3D
 * [#16674](https://github.com/apache/tvm/pull/16674) - Eager free original 
weights in transform_params
 * [#16675](https://github.com/apache/tvm/pull/16675) - add sample_indices in 
sampling
 * [#16648](https://github.com/apache/tvm/pull/16648) - [Runtime] Support 
Unpack API for NDArrayCache
 * [#16591](https://github.com/apache/tvm/pull/16591) - [Unity][Transform] 
Handle dynamic shapes in CombineParallelMatmul
 * [#16594](https://github.com/apache/tvm/pull/16594) - [Transform] Preserve 
param names in LiftTransformParams
 * [#16575](https://github.com/apache/tvm/pull/16575) - [Unity] GPU sampling
 * [#16574](https://github.com/apache/tvm/pull/16574) - Additional unit tests 
for RemoveUnusedParameters
 * [#16585](https://github.com/apache/tvm/pull/16585) - [Unity][Analysis] 
Include impure call in VerifyWellFormed errors
 * [#16421](https://github.com/apache/tvm/pull/16421) - [Unity][Transform] 
Raise error in FuseOpsByPattern for SSA violation
 * [#16629](https://github.com/apache/tvm/pull/16629) - Fix error message in 
BlockBuilder
 * [#16592](https://github.com/apache/tvm/pull/16592) - Handle dynamic 
arguments in legalization of nn.attention
 * [#16590](https://github.com/apache/tvm/pull/16590) - [Unity][Transform] 
Check for permute_dims in ExpandMatmulOfSum
 * [#16604](https://github.com/apache/tvm/pull/16604) - [Frontend][Onnx] fix 
clip unsqueeze opset implement
 * [#16568](https://github.com/apache/tvm/pull/16568) - [Runtime] RNNState for 
State Space Models
 * [#16563](https://github.com/apache/tvm/pull/16563) - Implement operators to 
read runtime DLTensor* information
 * [#16581](https://github.com/apache/tvm/pull/16581) - 
[Unity][MSC][M4.2][Step2] Enable plugin with manager, test plugins in compile 
pipeline
 * [#16600](https://github.com/apache/tvm/pull/16600) - Expose name_hint field 
for BlockBuilder.match_cast
 * [#16601](https://github.com/apache/tvm/pull/16601) - [Transform] 
Canonicalize `let var = R.const` bindings
 * [#16583](https://github.com/apache/tvm/pull/16583) - [Unity][VM] Recursively 
visit match bindings in VMShapeLowerMutator
 * [#16586](https://github.com/apache/tvm/pull/16586) - Ignore non-relax 
functions in relax.transform.RunCodegen
 * [#16573](https://github.com/apache/tvm/pull/16573) - [VM] Re-implementation 
of callback functions
 * [#16561](https://github.com/apache/tvm/pull/16561) - [Bugfix]Remove call to 
tvm.build for empty TIR module
 * [#16564](https://github.com/apache/tvm/pull/16564) - [Unity] Check for 
symbolic vars in PrimValue when lowering to TIR
 * [#16558](https://github.com/apache/tvm/pull/16558) - Minor updates for NN 
frontend
 * [#16542](https://github.com/apache/tvm/pull/16542) - Support callback as 
argument
 * [#16487](https://github.com/apache/tvm/pull/16487) - [Unity][Transform] 
Handle `call_tir_inplace` in `FuseTIR` and `FuseOps`
 * [#16355](https://github.com/apache/tvm/pull/16355) - [Unity] Infer struct 
info for relax.op.split on dynamic-sized index
 * [#16465](https://github.com/apache/tvm/pull/16465) - [Redo][Unity] Split 
DecomposeOpsForTraining into two steps
 * [#16495](https://github.com/apache/tvm/pull/16495) - 
[Unity][MSC][M4.2][Step1] Enable plugin with manager, test plugins in compile 
pipeline
 * [#16498](https://github.com/apache/tvm/pull/16498) - [Frontend] 
"tensor_ir_inplace" op
 * [#16500](https://github.com/apache/tvm/pull/16500) - [Unity] Support storage 
reuse for dynamic shapes
 * [#16493](https://github.com/apache/tvm/pull/16493) - [Pass] Skip data type 
node for CSE pass
 * [#16467](https://github.com/apache/tvm/pull/16467) - [Unity][MSC][Refactor] 
Reconstruct BYOC and runner
 * [#16422](https://github.com/apache/tvm/pull/16422) - [Unity][CodeGen] 
RunCodegen based on externally-exposed functions
 * [#16483](https://github.com/apache/tvm/pull/16483) - [Unity][Frontend] Add 
Sigmoid and Square Op
 * [#16472](https://github.com/apache/tvm/pull/16472) - [Unity] Improved error 
message in tvm::relax::UpdateStructInfo
 * [#16473](https://github.com/apache/tvm/pull/16473) - [Unity] Improve error 
message in tensor_to_shape struct inference
 * [#16466](https://github.com/apache/tvm/pull/16466) - Memory planning for 
"partially dynamic" shapes
 * [#16464](https://github.com/apache/tvm/pull/16464) - NDArray Cache Update 
with DLTensor Support
 * [#16315](https://github.com/apache/tvm/pull/16315) - [Unity][Transform] 
Implement relax.transform.ReorderTakeAfterMatmul
 * [#16313](https://github.com/apache/tvm/pull/16313) - [Unity][Transform] 
Implement relax.transform.ExpandMatmulOfSum
 * [#16411](https://github.com/apache/tvm/pull/16411) - [Unity][Transform] 
Handle symbolic variables in LambdaLift
 * [#16443](https://github.com/apache/tvm/pull/16443) - [Unity][FIX] fix thread 
dtype mismatch
 * [#16442](https://github.com/apache/tvm/pull/16442) - Revert "[Unity] Split 
DecomposeOpsForTraining into two steps"
 * [#16437](https://github.com/apache/tvm/pull/16437) - [Unity] Improve buffer 
allocation for handling duplicated buffer names.
 * [#16439](https://github.com/apache/tvm/pull/16439) - [Unity]  Support cumsum 
with pure int32
 * [#16432](https://github.com/apache/tvm/pull/16432) - [Unity] downgrade cmake 
version requirement
 * [#16427](https://github.com/apache/tvm/pull/16427) - [Unity][Frontend][NN] 
Better support for dynamic convolutions
 * [#16418](https://github.com/apache/tvm/pull/16418) - [Unity][Fix] Fix 
mismatched intrinsic name
 * [#16129](https://github.com/apache/tvm/pull/16129) - [Unity][Transform] 
Replace eligible operators with in-place versions in dataflow blocks
 * [#16414](https://github.com/apache/tvm/pull/16414) - [Bugfix][Unity] Recover 
MSVC/NVCC/ROCm/Vulkan
 * [#15954](https://github.com/apache/tvm/pull/15954) - [Unity] Split 
DecomposeOpsForTraining into two steps
 * [#16111](https://github.com/apache/tvm/pull/16111) - [Unity][Transform] 
Memory planning for dynamic-shape func return
 * [#16396](https://github.com/apache/tvm/pull/16396) - [Unity] PagedKVCache 
supporting on-the-fly RoPE calculation
 * [#16395](https://github.com/apache/tvm/pull/16395) - [Frontend][ONNX]fix 
onnx frontend parse
 * [#16385](https://github.com/apache/tvm/pull/16385) - [Unity][Op] Add Conv3D 
Operator
 * [#16284](https://github.com/apache/tvm/pull/16284) - [Unity][nnModule] 
Dynamic shape support in nn Module
 * [#16378](https://github.com/apache/tvm/pull/16378) - [Unity][BlockBuilder] 
Restore bb.get()
 * [#16374](https://github.com/apache/tvm/pull/16374) - [Unity] Support TIR 
kernel for PagedKVCache
 * [#16314](https://github.com/apache/tvm/pull/16314) - [Unity][Transform] 
Implement relax.transform.AdjustMatmulOrder
 * [#16349](https://github.com/apache/tvm/pull/16349) - [Unity][MSC] Avoid 
depending on trivial bindings in Relax intermediate
 * [#16376](https://github.com/apache/tvm/pull/16376) - [Unity][Contrib] Fix a 
bug due to typo in vllm `reconstruct_from_cache` kernel and add test
 * [#16388](https://github.com/apache/tvm/pull/16388) - [Unity] Update dispatch 
test cases following the merge from main
 * [#16335](https://github.com/apache/tvm/pull/16335) - [Unity] Set 
CMAKE_CUDA_ARCHITECTURES default to native
 * [#16306](https://github.com/apache/tvm/pull/16306) - [Unity][Transform] 
Update LambdaLift to use name of lifted lambda
 * [#16310](https://github.com/apache/tvm/pull/16310) - [Unity][Analysis] Show 
objects instead of names in WellFormedChecker
 * [#16362](https://github.com/apache/tvm/pull/16362) - [Unity][Fix] Memory 
planning check value type of 'tir_var_upper_bound'
 * [#16367](https://github.com/apache/tvm/pull/16367) - [Unity][Transform] 
Handle replacement at both var binding and usage
 * [#16309](https://github.com/apache/tvm/pull/16309) - [Unity][Transform] Use 
parameter name in BundleModelParams
 * [#16307](https://github.com/apache/tvm/pull/16307) - [Unity] Improved error 
message in ExprMutator::ReEmitBinding
 * [#16308](https://github.com/apache/tvm/pull/16308) - [Unity] Improved error 
message for matmul shape mismatch
 * [#16360](https://github.com/apache/tvm/pull/16360) - [Unity] Enhance 
Torch-consistency in reshape
 * [#16350](https://github.com/apache/tvm/pull/16350) - [Unity][Contrib] Add 
vLLM paged attention kernel
 * [#16303](https://github.com/apache/tvm/pull/16303) - [Unity][NN] Use Linear 
name for nn.op.permute_dims
 * [#16325](https://github.com/apache/tvm/pull/16325) - [Unity][MSC][Legalize] 
legalize codes and mute logging
 * [#16312](https://github.com/apache/tvm/pull/16312) - [Unity][Analysis] Add 
utility for collecting compile-time bindings
 * [#16330](https://github.com/apache/tvm/pull/16330) - [Unity][WEBGPU] Enable 
wasm exception propagation
 * [#16304](https://github.com/apache/tvm/pull/16304) - [Unity][Analysis] 
Handle PrimStructInfo in EraseToWellDefined
 * [#16305](https://github.com/apache/tvm/pull/16305) - [Unity][Transform] 
Implement UpdateParamStructInfo
 * [#16331](https://github.com/apache/tvm/pull/16331) - [Unity] Alter op impl 
handling empty transform for output
 * [#16254](https://github.com/apache/tvm/pull/16254) - [Unity] Dispatch cumsum 
and sort
 * [#16120](https://github.com/apache/tvm/pull/16120) - [Unity][Transform] 
Extract partial-tuple-usage from FuseTIR
 * [#16311](https://github.com/apache/tvm/pull/16311) - [Unity] Validate struct 
info in relax::Call constructor
 * [#16333](https://github.com/apache/tvm/pull/16333) - [Unity] Fix 
nn.op.tensor_ir_op signature
 * [#16302](https://github.com/apache/tvm/pull/16302) - [Unity] Cutlass kernel 
compatibility with cmake 3.18+

### Relay
 * [#16622](https://github.com/apache/tvm/pull/16622) - [ONNX] Fix the 
attribute mode parse of operator Upsample
 * [#16626](https://github.com/apache/tvm/pull/16626) - [ONNX] Fix the Resize 
operator in ONNX frontend
 * [#16624](https://github.com/apache/tvm/pull/16624) - [ONNX] fix the wrong 
default value about dtype in Multinomial converter
 * [#16417](https://github.com/apache/tvm/pull/16417) - [Frontend][Torch] fix 
pytorch frontend linspace op
 * [#16400](https://github.com/apache/tvm/pull/16400) - [Frontend][Torch] fix 
pytorch frontend not support logical or
 * [#16390](https://github.com/apache/tvm/pull/16390) - [Frontend][Torch] fix a 
typo mistake in nonzero_numpy
 * [#16324](https://github.com/apache/tvm/pull/16324) - make "ToScalar" support 
directly obtaining "int64_t"

### Runtime
 * [#16804](https://github.com/apache/tvm/pull/16804) - Introduce MSCCLPP with 
NCCL equivalent interface
 * [#16809](https://github.com/apache/tvm/pull/16809) - Add "TVM_DLL" to NVTX 
header
 * [#16750](https://github.com/apache/tvm/pull/16750) - CUDA IPC Memory support 
and custom allreduce kernels
 * [#16738](https://github.com/apache/tvm/pull/16738) - [Refactor]Always 
specify device in allocator interface
 * [#16716](https://github.com/apache/tvm/pull/16716) - Ensure 
NDArray.CopyTo(Device) always sync
 * [#16705](https://github.com/apache/tvm/pull/16705) - Add TVM_DLL to memory 
manager functions
 * [#16692](https://github.com/apache/tvm/pull/16692) - PagedKVCache execute 
data copy on a separate stream
 * [#16647](https://github.com/apache/tvm/pull/16647) - [RPC] Fix FreeObject in 
minrpc server
 * [#16667](https://github.com/apache/tvm/pull/16667) - [Builtin] Using float32 
accumulation in attention kernel
 * [#16635](https://github.com/apache/tvm/pull/16635) - [RPC] Enable 
RPCObjectRef over multi-hop RPC
 * [#16630](https://github.com/apache/tvm/pull/16630) - Add TVM_DLL to 
threading backend funcs
 * [#16541](https://github.com/apache/tvm/pull/16541) - Add "TVM_DLL" to 
NDArray cache load func
 * [#16550](https://github.com/apache/tvm/pull/16550) - [ROCM] Properly align 
rocm parameter buffer
 * [#16545](https://github.com/apache/tvm/pull/16545) - Fix dtype conversion 
for bf16 and fp8
 * [#16508](https://github.com/apache/tvm/pull/16508) - ParallelFor skipping 
thread backend for unit extent
 * [#16486](https://github.com/apache/tvm/pull/16486) - KV cache providing 
workspace for attn kernel
 * [#16456](https://github.com/apache/tvm/pull/16456) - [KVCache] 
AttentionWithFusedQKV and RoPE mode
 * [#16415](https://github.com/apache/tvm/pull/16415) - [Memory] Implement 
support for non-zero offset within a storage object in AllocNDArr…
 * [#16387](https://github.com/apache/tvm/pull/16387) - [RPC] Enable 
RPCObjectRef return in RPC
 * [#16377](https://github.com/apache/tvm/pull/16377) - Use cudaGetDeviceCount 
to check if device exists

### TIR
 * [#16832](https://github.com/apache/tvm/pull/16832) - Use constructor for new 
PrimFunc in TransformLayout
 * [#16543](https://github.com/apache/tvm/pull/16543) - Fix segfaults from 
ordering of Let/Assert in MakePackedAPI
 * [#16795](https://github.com/apache/tvm/pull/16795) - Ramp and Broadcast 
lanes fixed to int32 dtype
 * [#16767](https://github.com/apache/tvm/pull/16767) - [Driver] Use 
`BindTarget` to specify target for FP8 legalization
 * [#16742](https://github.com/apache/tvm/pull/16742) - [Bugfix]Fix cache_read 
update buffer region
 * [#16726](https://github.com/apache/tvm/pull/16726) - [Bugfix]Avoid overwrite 
of unmanaged buffer allocations
 * [#16548](https://github.com/apache/tvm/pull/16548) - [CUDA] Add native FP8 
support to codegen
 * [#16723](https://github.com/apache/tvm/pull/16723) - Implement max/min_value 
for fp8 data types
 * [#16655](https://github.com/apache/tvm/pull/16655) - Improve well-formed 
check's handling of match buffer
 * [#16673](https://github.com/apache/tvm/pull/16673) - Support Vector 
Reinterpret Calls
 * [#16682](https://github.com/apache/tvm/pull/16682) - [Bugfix]Handle AttrStmt 
of upcoming tir.Var in ConvertSSA
 * [#16560](https://github.com/apache/tvm/pull/16560) - Enhance and fix 
tensorize schedule for some case
 * [#16660](https://github.com/apache/tvm/pull/16660) - [Bugfix]Fix duplicate 
AllocateConst in CacheReadWrite schedule primitive
 * [#16544](https://github.com/apache/tvm/pull/16544) - Expand debug symbol 
output for CodeGenLLVM
 * [#16553](https://github.com/apache/tvm/pull/16553) - Fix 
get_block_access_region for let bindings
 * [#16515](https://github.com/apache/tvm/pull/16515) - Require exactly 
same-dtype matching for Vulkan smem reuse
 * [#16406](https://github.com/apache/tvm/pull/16406) - Fix of inter thread 
reduction with shared memory prefetch
 * [#16293](https://github.com/apache/tvm/pull/16293) - Extend DP4A tensor 
intrin
 * [#16345](https://github.com/apache/tvm/pull/16345) - Allow sync threads 
inside condition
 * [#16250](https://github.com/apache/tvm/pull/16250) - In SplitHostDevice, 
check for variables in thread extents
 * [#16184](https://github.com/apache/tvm/pull/16184) - [Transform] Implement 
InlinePrivateFunctions

### TOPI
 * [#16652](https://github.com/apache/tvm/pull/16652) - improve inclusive_scan 
for thrust
 * [#16383](https://github.com/apache/tvm/pull/16383) - [Target] Add fp16 SIMD 
support for conv2d on `arm_cpu` targets

### TVMC
 * [#16261](https://github.com/apache/tvm/pull/16261) - Add tvmc flag to print 
ir before and print ir after named pass

### TVMScript
 * [#16864](https://github.com/apache/tvm/pull/16864) - Add parser and printer 
support for e4m3/e5m2 fp8
 * [#16844](https://github.com/apache/tvm/pull/16844) - Produce empty DictAttrs 
when R.func_attrs is absent
 * [#16811](https://github.com/apache/tvm/pull/16811) - Do not throw error for 
duplicate definitions
 * [#16641](https://github.com/apache/tvm/pull/16641) - Allow use of relax.Expr 
with void type as a statement
 * [#16663](https://github.com/apache/tvm/pull/16663) - Infer T.reads() for 
DeclBuffer nodes
 * [#16640](https://github.com/apache/tvm/pull/16640) - Represent 
tir::builtin::ret() using python "return"
 * [#16562](https://github.com/apache/tvm/pull/16562) - [Bugfix]Handle 
R.match_cast as last binding in if/else
 * [#16593](https://github.com/apache/tvm/pull/16593) - [Unity]Parse R.Object 
return type from call_pure_packed
 * [#16356](https://github.com/apache/tvm/pull/16356) - [Unity]Optionally hide 
StructInfo that can be inferred
 * [#16379](https://github.com/apache/tvm/pull/16379) - [Unity]Update 
`call_packed` semantics to support empty sinfo_args

### Vulkan
 * [#16858](https://github.com/apache/tvm/pull/16858) - Fix CLZ support for 
Vulkan

### cuda & cutlass & tensorrt
 * [#16865](https://github.com/apache/tvm/pull/16865) - [Codegen, CUDA] Add 
handling of fp8 broadcast / const
 * [#16818](https://github.com/apache/tvm/pull/16818) - [Cutlass] Fix usage of 
cuda stream for group gemm
 * [#16788](https://github.com/apache/tvm/pull/16788) - [Cutlass] Add check for 
group gemm param shapes
 * [#16789](https://github.com/apache/tvm/pull/16789) - [Bugfix][Cutlass] 
Remove a typo in cutlass build
 * [#16787](https://github.com/apache/tvm/pull/16787) - [Codegen, Cuda] Add 
overload for fp8x4 e5m2 <-> half4 conversion
 * [#16751](https://github.com/apache/tvm/pull/16751) - [Cutlass] Add group 
gemm kernels
 * [#16736](https://github.com/apache/tvm/pull/16736) - [Target][CUDA] Allow 
non-numeric arch as needed for latest gpu
 * [#16619](https://github.com/apache/tvm/pull/16619) - [Bugfix][Cutlass] Check 
if function attributes is None
 * [#16342](https://github.com/apache/tvm/pull/16342) - [CUDA] Simple extend to 
optimize reuse for static shared memory.

### microNPU
 * [#16266](https://github.com/apache/tvm/pull/16266) - [microNPU][ETHOSU] Add 
fixed point for tanh
 * [#16680](https://github.com/apache/tvm/pull/16680) - [microNPU][ETHOSU] Fix 
LUT size for int16 activations
 * [#16401](https://github.com/apache/tvm/pull/16401) - [microNPU][ETHOSU] Add 
fixed point for matmul

### web
 * [#16733](https://github.com/apache/tvm/pull/16733) - Support web IndexedDB 
cache for larger model storage
 * [#16810](https://github.com/apache/tvm/pull/16810) - Support building 
tvm/web on Windows
 * [#16825](https://github.com/apache/tvm/pull/16825) - Allow custom bc files 
in emcc making
 * [#16791](https://github.com/apache/tvm/pull/16791) - Add `kv_state` and 
`rnn_state` to wasm_runtime
 * [#16722](https://github.com/apache/tvm/pull/16722) - Implement linear 
congruential generator, make runtime seedable
 * [#16650](https://github.com/apache/tvm/pull/16650) - Separate parallel shard 
download and iterative shard loading
 * [#16694](https://github.com/apache/tvm/pull/16694) - Initial support for 
asyncify
 * [#16631](https://github.com/apache/tvm/pull/16631) - Fix NDArrayCache 
loading report callback
 * [#16525](https://github.com/apache/tvm/pull/16525) - Move ArtifactCache to 
Interface, Support Cache delete and Batch Delete, Remove typo
 * [#16554](https://github.com/apache/tvm/pull/16554) - Compatibility with 
PagedKVCache in WebGPU
 * [#16527](https://github.com/apache/tvm/pull/16527) - Revert "[Unity]Temp 
disable wasm exception (#16444)"
 * [#16504](https://github.com/apache/tvm/pull/16504) - [Relax]Add 
ApplyPresenceAndFrequencyPenalty
 * [#16485](https://github.com/apache/tvm/pull/16485) - [wasm] Enlarge initial 
memory for emcc
 * [#16444](https://github.com/apache/tvm/pull/16444) - [Unity]Temp disable 
wasm exception

### Misc
 * [#16873](https://github.com/apache/tvm/pull/16873) - [Thrust] Fix thrust 
workspace allocation
 * [#16868](https://github.com/apache/tvm/pull/16868) - [3rdparty] Bump 
flashinfer
 * [#16871](https://github.com/apache/tvm/pull/16871) - [PageKV] allow PopN to 
pop all the tokens in last block
 * [#16866](https://github.com/apache/tvm/pull/16866) - [3rdparty] Bump 
FlashInfer
 * [#16863](https://github.com/apache/tvm/pull/16863) - [Picojson] Let the key 
of objects in json be ordered by default
 * [#16856](https://github.com/apache/tvm/pull/16856) - [Thrust] Use pointer to 
tls pool to prevent creating new pool
 * [#16850](https://github.com/apache/tvm/pull/16850) - Fixing probability 
comment
 * [#16849](https://github.com/apache/tvm/pull/16849) - [KVCache] Initialize 
one extra page than specified
 * [#16843](https://github.com/apache/tvm/pull/16843) - [IR] Provide 
well-formed intermediate in ApplyPassToFunction
 * [#16772](https://github.com/apache/tvm/pull/16772) - [MSC][M5.3] Support 
torch.dynamo for dynamic models
 * [#16839](https://github.com/apache/tvm/pull/16839) - Bump pillow from 10.2.0 
to 10.3.0 in /apps/microtvm/cmsisnn
 * [#16838](https://github.com/apache/tvm/pull/16838) - Bump pillow from 10.2.0 
to 10.3.0 in /apps/microtvm/ethosu
 * [#16831](https://github.com/apache/tvm/pull/16831) - [KVCache] Reducing 
CacheAuxDataManager copy size
 * [#16794](https://github.com/apache/tvm/pull/16794) - [SME] Target parser 
support for SME
 * [#16824](https://github.com/apache/tvm/pull/16824) - [KVCache] Introducing 
auxiliary data manager
 * [#16800](https://github.com/apache/tvm/pull/16800) - [BugTIR]fix error 
merging shared memory for ptx_cp_async
 * [#16822](https://github.com/apache/tvm/pull/16822) - [VM] Recycle VMFrame
 * [#16813](https://github.com/apache/tvm/pull/16813) - [KVCache] Support 
forking sequence at specific position
 * [#16786](https://github.com/apache/tvm/pull/16786) - [Codegen] Add check to 
disable invalid reinterpret
 * [#16816](https://github.com/apache/tvm/pull/16816) - [Cmake] Allow using 
custom CCCL path for thrust
 * [#16784](https://github.com/apache/tvm/pull/16784) - [SLM] Add unit tests 
for SLM to Relax exporter
 * [#16814](https://github.com/apache/tvm/pull/16814) - Fix includes of custom 
allreduce kernel
 * [#16806](https://github.com/apache/tvm/pull/16806) - [Debug] Improve error 
message in VMShapeLower
 * [#16802](https://github.com/apache/tvm/pull/16802) - [Debug] Improve error 
messages in LiftTransformParams
 * [#16425](https://github.com/apache/tvm/pull/16425) - [Target] Use LLVM 
target parser for determining Arm(R) A-Profile Architecture features
 * [#16797](https://github.com/apache/tvm/pull/16797) - [3rdparty] AUTO mode 
for custom all-reduce strategy
 * [#16761](https://github.com/apache/tvm/pull/16761) - [SME] Add support for 
inserting processor state annotations
 * [#16778](https://github.com/apache/tvm/pull/16778) - [Analysis] Allow calls 
to GlobalVar in @R.function
 * [#16745](https://github.com/apache/tvm/pull/16745) - [IR] Default to empty 
attributes, instead of NULL
 * [#16777](https://github.com/apache/tvm/pull/16777) - Revert "[SLM] Allow 
modules to define pre-processing of weights"
 * [#16776](https://github.com/apache/tvm/pull/16776) - [Contrib] Remove thrust 
"built but not used" warning
 * [#16757](https://github.com/apache/tvm/pull/16757) - [SLM] Allow modules to 
define pre-processing of weights
 * [#16763](https://github.com/apache/tvm/pull/16763) - [CONTRIB] Add nm symbol 
dump
 * [#16717](https://github.com/apache/tvm/pull/16717) - Enable Shared Function 
in LiftTransformParam Pass
 * [#16729](https://github.com/apache/tvm/pull/16729) - [Builtin] Sliding 
window and sink support for PagedKVCache
 * [#16724](https://github.com/apache/tvm/pull/16724) - Fix cpp_rtvm cmake 
build on Windows
 * [#16513](https://github.com/apache/tvm/pull/16513) - [Target] Automatically 
detect system triple when not specified by the user
 * [#16710](https://github.com/apache/tvm/pull/16710) - [CMake] Add 
"USE_FLASHINFER" to libinfo
 * [#16702](https://github.com/apache/tvm/pull/16702) - [MSC][M5.2] Enable 
quantize && prune with gym by wrapper
 * [#16699](https://github.com/apache/tvm/pull/16699) - [Transform] Remove 
R.Object parameters after LazyTransformParams
 * [#16668](https://github.com/apache/tvm/pull/16668) - [MSC][M5.1] Build 
wrapper to support compression
 * [#16693](https://github.com/apache/tvm/pull/16693) - [Contrib] Support 
NDArray cache taking generator
 * [#16412](https://github.com/apache/tvm/pull/16412) - [Lint] Add check to 
prevent usage of #include <regex>
 * [#16689](https://github.com/apache/tvm/pull/16689) - [DeviceAPI] Support 
"GetCurrentStream"
 * [#16690](https://github.com/apache/tvm/pull/16690) - Use target name instead 
of node name as function name
 * [#16683](https://github.com/apache/tvm/pull/16683) - [skip ci] Fix wasm 
exception flag
 * [#16609](https://github.com/apache/tvm/pull/16609) - Minor update docs 
instructions
 * [#16656](https://github.com/apache/tvm/pull/16656) - Simplify Windows CMake 
Command
 * [#16666](https://github.com/apache/tvm/pull/16666) - [KVCache] Fix the 
reference counter in sequence fork
 * [#16662](https://github.com/apache/tvm/pull/16662) - Fixing workload comment
 * [#16595](https://github.com/apache/tvm/pull/16595) - [Transform] Check for 
zero-param operators in LiftTransformParams
 * [#16599](https://github.com/apache/tvm/pull/16599) - [Transform] 
De-duplicate MatchCast nodes in EliminateCommonSubexpr
 * [#16596](https://github.com/apache/tvm/pull/16596) - [Transform] Implement 
relax.transform.ReorderPermuteDimsAfterConcat
 * [#16597](https://github.com/apache/tvm/pull/16597) - [Transform] Allow 
explicit name of bundled model parameters
 * [#16602](https://github.com/apache/tvm/pull/16602) - [Transform] 
Improvements to LazyTransformParams
 * [#16606](https://github.com/apache/tvm/pull/16606) - [KVCache] Support 
passing in attn_score_scaling_factor into KV cache
 * [#16608](https://github.com/apache/tvm/pull/16608) - Extend gpu memory 
bandwidth test to work through RPC
 * [#16587](https://github.com/apache/tvm/pull/16587) - [Debug] Improve error 
message for codegen pattern mismatches
 * [#16570](https://github.com/apache/tvm/pull/16570) - [Marvell BYOC]: Marvell 
AI Accelerator Integration - Phase 1
 * [#16576](https://github.com/apache/tvm/pull/16576) - Update the 
3rdparty/libflash_attn submodule
 * [#16580](https://github.com/apache/tvm/pull/16580) - [KVCache] Support mode 
"None" for Rotary Embedding
 * [#16578](https://github.com/apache/tvm/pull/16578) - [KVCache] Support 
returning query positions
 * [#16571](https://github.com/apache/tvm/pull/16571) - Fix compile warnings
 * [#16540](https://github.com/apache/tvm/pull/16540) - [Upd] Enable lld search 
to include /opt/rocm/llvm/bin for rocm
 * [#16539](https://github.com/apache/tvm/pull/16539) - Improve error message 
in NDArray::CopyFromTo
 * [#16524](https://github.com/apache/tvm/pull/16524) - [Build] Improving debug 
and build-dir options
 * [#16551](https://github.com/apache/tvm/pull/16551) - [KVCache] Fix attention 
kernel for ROCm
 * [#16512](https://github.com/apache/tvm/pull/16512) - Cut pytest-lazy-fixture
 * [#16506](https://github.com/apache/tvm/pull/16506) - Bump 
3rdparty/cutlass_fpA_intB_gemm version
 * [#16511](https://github.com/apache/tvm/pull/16511) - [Minor] Fix Clang 
compilation warning in fuse_tir.cc and codegen_c_host.cc
 * [#16516](https://github.com/apache/tvm/pull/16516) - Add Relax, Unity Tags 
in make_notes.py
 * [#16497](https://github.com/apache/tvm/pull/16497) - [Instrument] Add 
default instrument to print all passes
 * [#16494](https://github.com/apache/tvm/pull/16494) - [DPL] Support tir_vars 
field in is_call_tir pattern
 * [#16453](https://github.com/apache/tvm/pull/16453) - Bump pillow from 10.0.1 
to 10.2.0 in /apps/microtvm
 * [#16454](https://github.com/apache/tvm/pull/16454) - [BugTIR] fix 
thread_sync occurs in letstmt
 * [#16468](https://github.com/apache/tvm/pull/16468) - [LINT] Fix pylint 
issues in test_dma_builtin.py
 * [#16413](https://github.com/apache/tvm/pull/16413) - [Contrib] Workspace for 
cuBLAS backend
 * [#16460](https://github.com/apache/tvm/pull/16460) - 
[Cherry-pick][MSC][M4.1] Add plugin && plugin_builder, enable build and test in 
different frameworks (#16397)
 * [#16461](https://github.com/apache/tvm/pull/16461) - [Minor] Fix Docstring 
for sphinx-build
 * [#16431](https://github.com/apache/tvm/pull/16431) - [Schedule] 
Loop-Partition Scheduling Primitive
 * [#16451](https://github.com/apache/tvm/pull/16451) - Bump pillow from 10.0.1 
to 10.2.0 in /apps/microtvm/ethosu
 * [#16452](https://github.com/apache/tvm/pull/16452) - Bump pillow from 10.0.1 
to 10.2.0 in /apps/microtvm/cmsisnn
 * [#16445](https://github.com/apache/tvm/pull/16445) - [skip ci] update branch 
rule to prepare for unity transition
 * [#16426](https://github.com/apache/tvm/pull/16426) - [CMake] Enable cuda 
lang if USE_CUDA is on
 * [#16407](https://github.com/apache/tvm/pull/16407) - Add NVIDIA Hopper H100 
target tag
 * [#16398](https://github.com/apache/tvm/pull/16398) - [DeviceAPI] Support 
querying total global memory
 * [#16357](https://github.com/apache/tvm/pull/16357) - [RPC] Fix tuning on 
macOS and Windows (#15771)
 * [#16386](https://github.com/apache/tvm/pull/16386) - [Thrust] Use no sync 
exec policy and caching allocator
 * [#16343](https://github.com/apache/tvm/pull/16343) - [CMake][MSVC] Disable 
permissive mode for MSVC builds
 * [#16242](https://github.com/apache/tvm/pull/16242) - [Codegen] Fix 
if_then_else codegen
 * [#16341](https://github.com/apache/tvm/pull/16341) - [CMake] Use ccache as 
CMAKE_CUDA_COMPILER_LAUNCHER
 * [#16332](https://github.com/apache/tvm/pull/16332) - Change metal dtype of 
ceil_log2 to fp32
