# Introduction

The TVM community has worked since the v0.12.0 release to deliver the following exciting improvements! The main categories are listed below (**bold text marks areas with significant progress**):

- Community, RFC
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethos-N, Vulkan, Hexagon, Metal, and other runtime improvements
- Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, Keras
- TE, Relay, BYOC, TOPI, Arith, **TIR, TVMScript, MetaSchedule**, Schedule
- CI, Tests, BugFix, Docs, Docker, Build
- Android, **microTVM**, AOT, LLVM

Please visit the full listing of commits for a complete view: [v0.12.0...v0.13.0](https://github.com/apache/tvm/compare/v0.12.0...v0.13.0).

### Community

* #15086 - Aleksei-grovety -> Reviewer
* #14853 - Anirudh Sundar Subramaniam -> Committer
* #14772 - Add new key for release signing
* #14676 - Jiajun Jiang -> Reviewer
* #14677 - Qiang Zhang -> Reviewer
* #14622 - Sunghyun Park -> Reviewer
* #14578 - [skip ci] Zihao Ye -> Committer

### Arith

* #15131 - Hotfix flaky test in padded matmul
* #15120 - NormalizeToIterSum
* #15081 - Improve arith simplify to handle symbolic reshape pattern
* #14532 - Implement statistics counters for RewriteSimplifier
* #14704 - [cherry-pick][BUGFIX] Fix a bug of iter map floormod(x,2) simplify
* #14849 - [TVMScript] Capture fails if var appears only in annotation
* #14596 - [TensorIR] Improve CompactBufferRegion for symbolic shape
* #15129 - [TIR] Recognize empty extents
* #14982 - [TIR][VTA] Update host-side target, even without device func
* #14547 - Enhance IterMapSimplify for symbolic
* #14571 - [BUGFIX] Fix a bug of iter map floormod(x,2) simplify
* #14582 - Fix solve inequality of unbound var ranges
* #14538 - Enhance CanonicalSimplify to Simplify ProdDiv
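Several of the arithmetic changes above harden the rewrite and iterator-map simplifiers (for example the `floormod(x, 2)` fixes in #14571/#14704). As a hedged illustration of the surface these changes live behind, here is a minimal sketch using the Python `arith.Analyzer` API; the specific expression is an illustrative assumption, not taken from the PRs:

```python
import tvm
from tvm import te, arith

ana = arith.Analyzer()
x = te.var("x")

# floormod(4*x + 2, 2) is expected to simplify to 0; the iter-map
# floormod(x, 2) handling was fixed in #14571/#14704.
print(ana.simplify(tvm.tir.floormod(x * 4 + 2, 2)))
```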
### Frontend

* #14830 - Use f-strings for string formatting, NFC
* Keras
  * #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
  * #15107 - [Relay][Keras] Fix a wrong variable name in the Keras frontend
  * #15053 - [Relay][Keras] Fix the wrong implementation logic of cropping2D
  * #15082 - [Relay][Keras] Fix UpSampling2D's wrong assertion about size
  * #15060 - [Relay][Keras] Fix the bug in the 'output_padding' attribute in Deconv
  * #14707 - [Keras] Fix a bug in the alpha attribute of LeakyReLU which led to a pass conflict
  * #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
* Paddle
  * #14801 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for gaussian_random/softplus/Conv3d/Conv2d
  * #14973 - [Paddle] [PaddlePaddle Hackathon 4] Add convert support for tanhshrink/pool3d/set_value ops for the Paddle frontend
  * #14826 - [Paddle] [PaddlePaddle Hackathon 4] Add convert support for p_norm/roi_align/softmax_with_cross_entropy
  * #14575 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for dropout/hard_sigmoid/pixel_shuffle
* TFLite
  * #14667 - [TFLite] Support for quantized squared difference
  * #14819 - [TFLite] Generate name when tensor name is missing
  * #15173 - [FRONTEND][TFLITE] Fix int16 transpose conv loading
* TensorFlow
  * #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
* PyTorch
  * #14747 - [PyTorch] Add aten::new_zeros
  * #14699 - [Torch] Fix typo in new_full
  * #14963 - [PyTorch] Support use_input_stats in instance_norm
  * #14930 - Fix PyTorch axis
* ONNX
  * #15017 - [ONNX] Fix bug in scatter_elements
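Most of the frontend fixes above surface through the standard importer entry points. As context for where they land, here is a minimal sketch of importing a traced PyTorch model into Relay; the toy model and the input name `input0` are illustrative assumptions:

```python
import torch
import tvm
from tvm import relay

# A toy model touching ops fixed above (e.g. LeakyReLU's alpha handling).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.LeakyReLU(0.1),
).eval()

inp = torch.randn(1, 3, 32, 32)
scripted = torch.jit.trace(model, inp)

# input_infos is a list of (input name, shape) pairs; the name is arbitrary here.
mod, params = relay.frontend.from_pytorch(scripted, [("input0", tuple(inp.shape))])
print(mod)
```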
### AOT

* #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
* #15032 - Remove duplication in tvm.testing.aot.compile_models
* #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName

### Runtime

* #15182 - Add weak symbol to builtin fp16
* #15161 - Clean TVM stacktrace in error messages
* #15162 - Support void as dtype in FFI
* #14902 - Update Module and Registry to use String Container
* #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
* #14887 - Make systemlib unique per prefix
* #14775 - Added `__str__` for `tvm._ffi.runtime_ctypes.TVMArray`
* #14656 - Fix Can't "query_imports" Bug of VM Executable

### Adreno

* #15061 - [TOPI] Fix problem with ceil_log2
* #14996 - [OpenCL] Fix conv2d when output channels < 4

### CMSIS-NN

* #15059 - Update CMSIS-NN release to v4.1.0

### OpenCL & CLML

* #14972 - [OPENCL] Always use convert_T for type conversion
* #14995 - [OpenCL] Improve diagnostic message
* #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
* #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
* #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
* #14949 - [CodegenC] Updated unit test for sorted CodegenC output
* #14767 - [OpenCLML] Transposed convolution support and other fixes

### CUDA & CUTLASS & TensorRT

* #14751 - [CUDA] Fixed the call of the min function in the schedule for CUDA
* #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
* #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM

### Metal

* #14962 - Fix int8 vectorized cast
* #14846 - Fix vectorized select
* #14727 - Update Metal runtime to directly store kernel map
* #14671 - Fix flaky memory issue due to racing

### Vulkan

* #15035 - [Vulkan] Allow DeclBuffer in CodeGenSPIRV
* #14817 - [Vulkan] Add cooperative matrix support

### Hexagon

* #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
* #14948 - Update instructions to compile Hexagon runtime
* #14965 - Add support for v73, make v68 default
* #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
* #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit

### ROCm

* #15106 - [TensorIR] AMD Matrix Core Support
* #15088 - [Target] Replace rocm arch parsing from int to string

### microNPU

* #15159 - [microNPU][ETHOSU] Fix compiler attributes types
* #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
* #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
* #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
* #15114 - [microNPU] Upgrade Vela to v3.8.0
* #15104 - [microNPU][ETHOSU] Fix minimum buffer size
* #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
* #14861 - [microNPU][ETHOSU] Offload the nn.avg_pool2d operator with a stride > 3 to the NPU
* #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
* #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
* #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
* #14353 - [microNPU] Add support for MEAN with uint8 ifm
* #14587 - [microNPU] Fix skip tests when Vela is not present
* #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass

### microTVM

* #14872 - Use self.close_transport() on error

### BYOC

* #15046 - Add GEMM kernel from FasterTransformer as submodule
* #15029 - Hide internal cutlass symbols

### Relay

* #15068 - Improve the "clip" op optimization in simplify expr pass
* #14925 - Add a dimension check to reject invalid input
* #14858 - [simplify_expr] Add pass to remove trivial transpose ops
* #14838 - Use f-strings for string formatting, NFC
* #14831 - [Relay/Op] Use f-strings for string formatting, NFC
* #14580 - Simplify the square of a binomial
* #14735 - Handle pad value coming from Tensor instead of scalar
* #14601 - Enhance type infer for dynamic shape
* #14885 - [Relay] Fix broadcast in PyTorch frontend
* #15090 - [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes
* #14845 - [Relay] Fix softplus in PaddlePaddle frontend
* #14837 - [Relay] Fix AdaptiveAvgPool2d about wrong dtype parsing
* #14821 - [Relay] Fix softplus about the wrong calculation formula in Relay PyTorch frontend
* #14820 - [Relay] Fix threshold calculation logic in PyTorch frontend
* #14824 - [Relay] Fix a bug in ReLU's threshold attribute which caused results to differ from Keras
* #14796 - [Relay] Fix wrong calculation logic in CELU
* #14773 - [Relay] Fix `scatter_nd` type relation
* #14742 - [Relay] Fix alpha attribute with None in ELU
* #14740 - [Relay] Fix default stride in LpPool
* #14556 - [Relay] Fix a bug caused by IncompleteTypeNode in EinsumRel while doing MergeComposite
* #15057 - [QNN] Implement quantized avg_pool2d
* #14536 - [QNN] Implement 'qnn.softmax'
* #14875 - [Quantization] Update simulated_quantize to infer correct layout
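Several of the Relay items above extend the `SimplifyExpr` pass (the "clip" optimization in #15068, trivial-transpose removal in #14858, binomial-square simplification in #14580). Here is a hedged sketch of running that pass; the back-to-back transpose graph is an illustrative assumption:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="float32")
# Two transposes that compose to the identity are trivial and should be
# removable by the simplify-expr machinery (#14858).
y = relay.transpose(relay.transpose(x, axes=[1, 0]), axes=[1, 0])
mod = tvm.IRModule.from_expr(relay.Function([x], y))

mod = relay.transform.InferType()(mod)
mod = relay.transform.SimplifyExpr()(mod)
print(mod)
```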
### TOPI

* #15018 - Fix dynamic dimensions support for Dense on TOPI side
* #14856 - Fix in interpretation of empty axis parameter in reduction fun…
* #14483 - [Target] Add SVE specific convolution
* #14839 - Use f-strings for string formatting, NFC
* #14822 - Use f-strings for string formatting, NFC
* #14519 - Vectorize depthwise conv2d output operator
* #14549 - Remove the i32 cast for the output shape of pool
* #14566 - [Topi] Output strides in pack_buffer() utility

### MetaSchedule

* #14781 - [MetaSchedule] RPC port needs to be an integer
* #14673 - Introduce MMA Tensor Core Multilevel Tiling
* #14784 - Enhance `tune_tir` to tune IRModule of TIR Collections
* #14783 - Add an API to dump a pruned database
* #14785 - Clear screen only when specified
* #14654 - Handle output cases for InlineConstantScalars
* #14642 - PostProc not rewriting unroll for purely spatial block
* #14591 - Handle cases when no features found by FeatureExtractor
* #14584 - [ARM] Beautification of the function names
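#14784 generalizes `tune_tir` to take an IRModule containing a collection of TIR PrimFuncs, not just a single kernel. A minimal sketch of the tuning flow, assuming the `ms.tune_tir`/`ms.tir_integration.compile_tir` entry points of this release; the workload, trial budget, and work directory are all illustrative:

```python
import tvm
from tvm import meta_schedule as ms
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((128,), "float32"), B: T.Buffer((128,), "float32")):
    for i in range(128):
        with T.block("B"):
            vi = T.axis.spatial(128, i)
            B[vi] = A[vi] + 1.0

# tune_tir returns a database of tuning records; compile_tir then picks
# the best schedule found for the workload.
database = ms.tune_tir(
    mod=add_one,
    target="llvm --num-cores=4",
    work_dir="./tune_tmp",  # illustrative scratch directory
    max_trials_global=64,
)
sch = ms.tir_integration.compile_tir(database, add_one, target="llvm --num-cores=4")
if sch is not None:
    print(sch.mod.script())
```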
### TIR

* #15153 - [TensorIR][Visitor] Visit buffer members in `match_buffer`'s in block visitor functions
* #15168 - [Schedule] Support padding-by-factor in PadEinsum
* #15165 - Expose UndefinedVars to Python
* #15163 - Fix RenewDef for symbolic input shapes
* #15142 - [Schedule] Enhance `compute-inline` for fusion
* #15150 - Fix typo in code example
* #15144 - [TensorIR][Schedule] New schedule primitive `unsafe_hide_buffer_access`
* #15146 - Block dependence analysis without schedules
* #15119 - Avoid duplicate GlobalVar names in SplitHostDevice
* #15037 - Handle DeclBuffer in CacheReadWrite schedule primitive
* #15098 - [Ethos-U] Handle DeclBuffer in Ethos-U inputs
* #15044 - [USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter
* #15078 - Handle DeclBuffer in LowerThreadAllreduce
* #15094 - Handle DeclBuffer in MergeDynamicSharedMemoryAllocations
* #15093 - Handle DeclBuffer in StorageAccessInfoLower
* #15045 - Handle DeclBuffer in InjectDoubleBuffer
* #15096 - Handle DeclBuffer in RemoveNoOp
* #15076 - [CodeGen] Define PackedFunc error code in MakePackedAPI
* #15102 - Update primfunc host attachment to include host
* #14854 - [Compute-at] Enable complex floordiv/floormod expressions in compute_at
* #15041 - Handle DeclBuffer in LowerCustomDatatypes
* #15038 - Handle DeclBuffer in Inline/ComputeAt/ReverseComputeAt
* #15052 - [Analysis] Handle DeclBuffer in FlopEstimator
* #15051 - Handle DeclBuffer in StorageRewrite
* #15050 - [Schedule] Fix decompose_padding bug with dtypes
* #15034 - Refactor BlockScope outside schedule
* #15054 - Handle DeclBuffer in IRSubstitute
* #14986 - Move SplitHostDevice to before MakePackedAPI
* #15042 - Handle DeclBuffer in StorageFlatten's input
* #15040 - Preserve object equality in Buffer::GetFlattenedBuffer
* #14693 - Enhance TVMScript Buffer Slice Access
* #14988 - Handle callees on same target, different codegen
* #14951 - Keep trivial LetStmt in tir.Simplify when used in buffer decl
* #14944 - Restrict tir.transform.LowerTVMBuiltin to host functions
* #14990 - [IR,TE,TIR] Use f-strings for string formatting, NFC
* #14993 - Fix incorrect construction of block frames
* #14952 - Avoid re-defining `var = arg_var` in ArgBinder
* #14918 - SplitHostDevice, handle subroutines
* #14943 - Restrict tir.transform.InstallDebugSpans to host functions
* #14942 - Preserve existing kTarget function attribute in BindTarget
* #14945 - Restrict tir.transform.CombineContextCall to host functions
* #14914 - Handle subroutine calls in MakeUnpackedAPI
* #14913 - Handle subroutine calls in MakePackedAPI
* #14892 - Expand unit tests for ConvertSSA
* #14866 - Avoid too complex predicate in compaction
* #14766 - [Schedule] Improve blockize to support blockizing multiple blocks
* #14776 - Improved parameter name in DLTensor unpacking error messages
* #14562 - [Driver] Move ShouldAnnotateEntryFunc logic into transform
* #14741 - Keep block annotations from tensorization
* #14021 - More flexible buffer compaction
* #14711 - [Analysis] Calculate allocated memory at module level
* #14492 - Flatten SeqStmt on construction
* #14598 - Add CUDA int4 tensor core intrinsics
* #14593 - [Schedule] Method returning the function being worked on
* #14592 - [TensorIR] Fix ComputeAt with perfect symbolic bound
* #14491 - Use String instead of StringImm for AttrStmtNode::node
* #14626 - [TensorIR] `reindex_cache_write` do not mutate init statement
* #14588 - [Fix][TIR] UnifyThreadBinding creating unit loop with annotation
* #14589 - [Fix][TIR][Analysis] Reduction block checking alloc_buffers
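A large share of the TIR work this cycle is in TensorIR schedule primitives (`compute-inline` fusion in #15142, `blockize` over multiple blocks in #14766, `PadEinsum` padding-by-factor in #15168). Here is a minimal, self-contained sketch of the schedule API these primitives share; the two-block workload is an illustrative assumption:

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(A: T.Buffer((16,), "float32"), C: T.Buffer((16,), "float32")):
        B = T.alloc_buffer((16,), "float32")
        for i in range(16):
            with T.block("B"):
                vi = T.axis.spatial(16, i)
                B[vi] = A[vi] * 2.0
        for i in range(16):
            with T.block("C"):
                vi = T.axis.spatial(16, i)
                C[vi] = B[vi] + 1.0

sch = tvm.tir.Schedule(Module)
# Inline the producer block "B" into its consumer "C".
sch.compute_inline(sch.get_block("B"))
print(sch.mod.script())
```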
### TVMScript

* #15083 - Avoid visiting repetition tensor in SetCommonPrefix Visitor
* #15091 - [TIR] Convert tir.op operands to PrimExpr
* #14919 - [TIR] Parse subroutine calls with no arguments
* #14941 - Prevent bool to int conversion in T.Assert condition
* #14915 - Allow T.target("device", host="host") to specify host
* #14900 - Round-trip DeclBuffer with undefined data pointer
* #14889 - [TIR] Added format/parsing of subroutine calls
* #14874 - Use default fallback for un-registered type
* #14840 - Print Executor, Runtime, and FunctionInfo as metadata
* #14812 - Handle AllocatedPoolInfo, ConstantPoolInfo, ConstantInfo
* #14786 - Add `__name__` attr for parsed PrimFunc and IRModule
* #14531 - Preserve LetStmt of constants
* #14488 - Distinguish between void* and handle
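As an example of the round-trip improvements above, #14915 lets the TVMScript target annotation carry its host target. A hedged sketch; attaching the target through `T.func_attr` follows the printer's usual convention:

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((8,), "float32"), B: T.Buffer((8,), "float32")):
    # A CUDA device target with an LLVM host (#14915).
    T.func_attr({"target": T.target("cuda", host="llvm")})
    for i in range(8):
        with T.block("B"):
            vi = T.axis.spatial(8, i)
            B[vi] = A[vi] + 1.0

# The host target should survive the round-trip through the printer.
print(add_one.script())
```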
### TVMC

* #14994 - [Bugfix] Fix tvmc option for printing which operators are offloaded to the Ethos-U

### BugFix

* #14960 - [Bug] Add typing_extensions requirement again
* #15015 - [Hotfix] Remove `LOG(INFO)` from unsupported dtype legalization pass
* #14991 - Make ThreadAllReduce pass compatible with int64
* #14950 - Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
* #14903 - [Test Cases] Add version checks to make test cases run in all PyTorch versions
* #14890 - [Fix] Fix typo in error message
* #14879 - Fix the undeclared identifier 'f'
* #14857 - Fix batch_norm
* #14787 - [FIX] Fix typo in comment

### CI

* #15179 - [Testing] Utility method to run TVM on remote device
* #15138 - [Test] Improve check for TVMError exception in test_cast
* #15062 - Clone submodule recursively
* #15065 - Revert "Make Graviton3 default AArch64 job runner node (#14983)"
* #14983 - Make Graviton3 default AArch64 job runner node
* #15056 - [Bugfix] Fix CacheControl version constraint violation
* #14908 - Update the expected CI jobs list in the update_branch script
* #14847 - Update CPU image to install PyTorch
* #14808 - [Testing] Use TVMScript's "name" argument for error messages
* #14780 - Fix doc deploy issue
* #14651 - Modify test cases to accommodate the CI upgrades
* #14666 - sccache support while using ci.py under multi-user environments
* #14635 - Upgrade CI
* #14713 - Add PLATFORM env var to builds
* #14680 - Downgrade ci_cpu llvm version back to 11
* #14653 - [tests][scripts][release] Optimize release note script about categories, etc.
* #14646 - [test][script] Fix release gather_pr.py script for ghost users or blank PR nodes
* #14550 - Add JAX deps in Dockerfiles
* #14466 - Update ci_cpu image and build with llvm-15

### LLVM

* #15127 - Remove the "ret_void" argument of AddFunction
* #15139 - Minor refactor to LLVMModuleNode::SaveToFile
* #14958 - [Codegen] Allow void return type from PackedFunc
* #14946 - Expose Host CPU Feature Detection
* #14901 - Codegen subroutine call when CallNode::op is GlobalVar
* #14570 - Use Var annotation in LetStmt for pointer type
* #14843 - [RUNTIME] Enable multi systemlib with device code
* #14564 - Validate generated LLVM module before optimization
* #14568 - Expand tvm::Type to DWARF conversion
* #14563 - [Codegen] Remove cast to i8* in builtin::address_of

### Docker

* #15149 - Fix build.sh environment variables
* #15105 - Update docker images for llvm-16
* #15092 - Update ci-cortexm docker image to contain CMSIS-NN release v…
* #15095 - Add build.sh environment variables
* #15067 - Migrate arm docker image to use llvm packages
* #15031 - Update ci_cpu docker image to one containing polly package f…
* #15003 - [ADRENO] Docker setup changes for multi-user environments
* #14912 - Add polly package
* #14842 - Install PyTorch on cpu image
* #14590 - Support rootless docker when using docker/bash.sh

### Docs

* #15126 - [DOC] Add RPC System Setup Document
* #15071 - [#15043] Updated the copyright year from 2020 to 2023
* #15055 - [#14992][DOC][TUTORIAL] Fix typo for the 'Making your Hardware Accelerator TVM-ready with UMA' tutorial
* #14504 - [TensorIR][Doc] Docstring of `reorder_block_iter_var`
* #14611 - [TIR] Fix unsafe_set_dtype docstring
* #14585 - Fix typo in the Vitis AI Integration docs

### Misc

* #15267 - [release] Disable git merge to avoid conflict
* #15187 - [RPC] Report RPC Session Timeout to Client Instead of "kShutdown"
* #15185 - Update tvm_runtime.h
* #15164 - [CMake] Support LLVM-16 static linking
* #15167 - [Python] Enhance Wheel Packaging
* #15166 - [Target] Add MetaSchedule-compatible attributes to OpenCL
* #15154 - [Minor] Fix Compilation Warnings
* #15132 - [NDArray] Allow creating a view from a strided array
* #15116 - [RPC] Add Missing Option "port_end" to RPC Proxy
* #15073 - [CodeGenC] Use PrimFuncNode::ret_type in function signature
* #15036 - [StackVM] Updated CodeGenStackVM to handle DeclBuffer
* #15022 - [Build] Fix missing virtual destructor in SIBuilder
* #15016 - Fix type parse error about AdaptiveMaxPool
* #15007 - [Minor] Fix compilation warnings
* #15000 - [CMAKE] Introduce dummy build as an option
* #14863 - [DataType] Initial support of fp8 (e4m3/e5m2)
* #14975 - [CMAKE] Add a dummy target to defer libtvm dep
* #14574 - [IR][SIBuilder]
* #14939 - [Target] Add target to all TVM callbacks
* #14937 - [BUILD] Enable log before throw message in windows
* #14934 - [TestCases] Fix unreachable test cases that were outside the for-loop
* #14916 - [TypoFix] Fix some typos in the Keras frontend
* #14893 - [Contrib] Use f-strings for string formatting, NFC
* #14884 - [AutoTVM] Use f-strings for string formatting, NFC
* #14876 - [CONTRIB] Enable create_staticlib to take in tar files
* #14867 - Fix f-string typo
* #14851 - Add v0.12.0 docs
* #14813 - [BUILD] Removed the duplicated MACROs in config.cmake
* #14743 - [SUPPORT] Fix RingBuffer ReadWithCallback
* #14799 - [LINT] Fix clang-format script for newest clang-format
* #14797 - [NDArray] Allow arbitrary stride when the corresponding shape is 1
* #14790 - Clearer reference to third-party licenses
* #14779 - Fix: use ARM on-demand instances instead of spot
* #14762 - [Target][Minor] Add A6000 Target Tag
* #14683 - [AutoTVM] Added Droplet algorithm in TVM
* #14694 - Unify search path approach for various libs
* #14686 - [CMAKE] Update search pattern of config
* #14636 - Fix bug about wrong attribute name
* #14628 - [CODEGEN] Fix Metal codegen with only a single working dim
* #14607 - Fix: deploy CI
* #14569 - [Node] Allow alternative root names in ObjectPath::Root()
* #14522 - [Object] Implemented .as<T> for ObjectRef param, returns Optional<T>
* #14477 - Feat: use spot instances for CI with on-demand as a backup
* #14468 - [AutoTVM] New rank-binary loss_type for the new xgboost >= 2.0.0 behaviour
* #14544 - Update to v0.13.dev0
* #14539 - [Target] Add Apple M1 GPU tag with 256-thread restriction