# Introduction

The TVM community has worked since the v0.12.0 release to deliver the following exciting improvements! The main categories are listed below (**bold text marks areas with significant progress**):

- Community, RFC
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethos-N, Vulkan, Hexagon, Metal, and other runtime improvements
- Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, Keras
- TE, Relay, BYOC, TOPI, Arith, **TIR, TVMScript, MetaSchedule**, Schedule
- CI, Tests, BugFix, Docs, Docker, Build
- Android, **microTVM**, AOT, LLVM

Please visit the full listing of commits for a complete view: [v0.12.0...v0.13.0](https://github.com/apache/tvm/compare/v0.12.0...v0.13.0).

### Community

* #15086 - Aleksei-grovety -> Reviewer
* #14853 - Anirudh Sundar Subramaniam -> Committer
* #14772 - Add new key for release signing
* #14676 - Jiajun Jiang -> Reviewer
* #14677 - Qiang Zhang -> Reviewer
* #14622 - Sunghyun Park -> Reviewer
* #14578 - [skip ci] Zihao Ye -> Committer

### Arith

* #15131 - Hotfix flaky test in padded matmul
* #15120 - NormalizeToIterSum
* #15081 - Improve arith simplify to handle symbolic reshape pattern
* #14532 - Implement statistics counters for RewriteSimplifier
* #14704 - [cherry-pick][BUGFIX] Fix a bug of iter map floormod(x,2) simplify
* #14849 - [TVMScript] Capture fails if var appears only in annotation
* #14596 - [TensorIR] Improve CompactBufferRegion for symbolic shape
* #15129 - [TIR] Recognize empty extents
* #14982 - [TIR][VTA] Update host-side target, even without device func
* #14547 - Enhance IterMapSimplify for symbolic
* #14571 - [BUGFIX] Fix a bug of iter map floormod(x,2) simplify
* #14582 - Fix solve inequality of unbound var ranges
* #14538 - Enhance CanonicalSimplify to Simplify ProdDiv
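Several of the arithmetic changes above harden the rewrite and iterator-map simplifiers (for example the `floormod(x, 2)` fixes in #14571/#14704). As a hedged illustration of the surface these changes live behind, here is a minimal sketch using the Python `arith.Analyzer` API; the specific expression is an illustrative assumption, not taken from the PRs:

```python
import tvm
from tvm import te, arith

ana = arith.Analyzer()
x = te.var("x")

# floormod(4*x + 2, 2) is expected to simplify to 0; the iter-map
# floormod(x, 2) handling was fixed in #14571/#14704.
print(ana.simplify(tvm.tir.floormod(x * 4 + 2, 2)))
```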
### Frontend

* #14830 - Use f-strings for string formatting, NFC
* Keras
  * #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
  * #15107 - [Relay][Keras] Fix a wrong variable name in the Keras frontend
  * #15053 - [Relay][Keras] Fix the wrong implementation logic of cropping2D
  * #15082 - [Relay][Keras] Fix UpSampling2D's wrong assertion about size
  * #15060 - [Relay][Keras] Fix the bug in the 'output_padding' attribute in Deconv
  * #14707 - [Keras] Fix a bug in the alpha attribute of LeakyReLU which led to a pass conflict
  * #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
* Paddle
  * #14801 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for gaussian_random/softplus/Conv3d/Conv2d
  * #14973 - [Paddle] [PaddlePaddle Hackathon 4] Add convert support for tanhshrink/pool3d/set_value ops for the Paddle frontend
  * #14826 - [Paddle] [PaddlePaddle Hackathon 4] Add convert support for p_norm/roi_align/softmax_with_cross_entropy
  * #14575 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for dropout/hard_sigmoid/pixel_shuffle
* TFLite
  * #14667 - [TFLite] Support for quantized squared difference
  * #14819 - [TFLite] Generate name when tensor name is missing
  * #15173 - [FRONTEND][TFLITE] Fix int16 transpose conv loading
* TensorFlow
  * #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
* PyTorch
  * #14747 - [PyTorch] Add aten::new_zeros
  * #14699 - [Torch] Fix typo in new_full
  * #14963 - [PyTorch] Support use_input_stats in instance_norm
  * #14930 - Fix PyTorch axis
* ONNX
  * #15017 - [ONNX] Fix bug in scatter_elements
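Most of the frontend fixes above surface through the standard importer entry points. As context for where they land, here is a minimal sketch of importing a traced PyTorch model into Relay; the toy model and the input name `input0` are illustrative assumptions:

```python
import torch
import tvm
from tvm import relay

# A toy model touching ops fixed above (e.g. LeakyReLU's alpha handling).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.LeakyReLU(0.1),
).eval()

inp = torch.randn(1, 3, 32, 32)
scripted = torch.jit.trace(model, inp)

# input_infos is a list of (input name, shape) pairs; the name is arbitrary here.
mod, params = relay.frontend.from_pytorch(scripted, [("input0", tuple(inp.shape))])
print(mod)
```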
### AOT

* #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
* #15032 - Remove duplication in tvm.testing.aot.compile_models
* #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName

### Runtime

* #15182 - Add weak symbol to builtin fp16
* #15161 - Clean TVM stacktrace in error messages
* #15162 - Support void as dtype in FFI
* #14902 - Update Module and Registry to use String Container
* #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
* #14887 - Make systemlib unique per prefix
* #14775 - Added `__str__` for `tvm._ffi.runtime_ctypes.TVMArray`
* #14656 - Fix Can't "query_imports" Bug of VM Executable

### Adreno

* #15061 - [TOPI] Fix problem with ceil_log2
* #14996 - [OpenCL] Fix conv2d when output channels < 4

### CMSIS-NN

* #15059 - Update CMSIS-NN release to v4.1.0

### OpenCL & CLML

* #14972 - [OPENCL] Always use convert_T for type conversion
* #14995 - [OpenCL] Improve diagnostic message
* #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
* #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
* #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
* #14949 - [CodegenC] Updated unit test for sorted CodegenC output
* #14767 - [OpenCLML] Transposed convolution support and other fixes

### CUDA & CUTLASS & TensorRT

* #14751 - [CUDA] Fixed the call of the min function in the schedule for CUDA
* #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
* #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM

### Metal

* #14962 - Fix int8 vectorized cast
* #14846 - Fix vectorized select
* #14727 - Update Metal runtime to directly store kernel map
* #14671 - Fix flaky memory issue due to racing

### Vulkan

* #15035 - [Vulkan] Allow DeclBuffer in CodeGenSPIRV
* #14817 - [Vulkan] Add cooperative matrix support

### Hexagon

* #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
* #14948 - Update instructions to compile Hexagon runtime
* #14965 - Add support for v73, make v68 default
* #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
* #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit

### ROCm

* #15106 - [TensorIR] AMD Matrix Core Support
* #15088 - [Target] Replace rocm arch parsing from int to string

### microNPU

* #15159 - [microNPU][ETHOSU] Fix compiler attributes types
* #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
* #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
* #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
* #15114 - [microNPU] Upgrade Vela to v3.8.0
* #15104 - [microNPU][ETHOSU] Fix minimum buffer size
* #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
* #14861 - [microNPU][ETHOSU] Offload the nn.avg_pool2d operator with a stride > 3 to the NPU
* #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
* #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
* #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
* #14353 - [microNPU] Add support for MEAN with uint8 ifm
* #14587 - [microNPU] Fix skip tests when Vela is not present
* #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass

### microTVM

* #14872 - Use self.close_transport() on error

### BYOC

* #15046 - Add GEMM kernel from FasterTransformer as submodule
* #15029 - Hide internal cutlass symbols

### Relay

* #15068 - Improve the "clip" op optimization in simplify expr pass
* #14925 - Add a dimension check to reject invalid input
* #14858 - [simplify_expr] Add pass to remove trivial transpose ops
* #14838 - Use f-strings for string formatting, NFC
* #14831 - [Relay/Op] Use f-strings for string formatting, NFC
* #14580 - Simplify the square of a binomial
* #14735 - Handle pad value coming from Tensor instead of scalar
* #14601 - Enhance type infer for dynamic shape
* #14885 - [Relay] Fix broadcast in PyTorch frontend
* #15090 - [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes
* #14845 - [Relay] Fix softplus in PaddlePaddle frontend
* #14837 - [Relay] Fix AdaptiveAvgPool2d about wrong dtype parsing
* #14821 - [Relay] Fix softplus about the wrong calculation formula in Relay PyTorch frontend
* #14820 - [Relay] Fix threshold calculation logic in PyTorch frontend
* #14824 - [Relay] Fix a bug in ReLU's threshold attribute which caused results to differ from Keras
* #14796 - [Relay] Fix wrong calculation logic in CELU
* #14773 - [Relay] Fix `scatter_nd` type relation
* #14742 - [Relay] Fix alpha attribute with None in ELU
* #14740 - [Relay] Fix default stride in LpPool
* #14556 - [Relay] Fix a bug caused by IncompleteTypeNode in EinsumRel while doing MergeComposite
* #15057 - [QNN] Implement quantized avg_pool2d
* #14536 - [QNN] Implement 'qnn.softmax'
* #14875 - [Quantization] Update simulated_quantize to infer correct layout
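Several of the Relay items above extend the `SimplifyExpr` pass (the "clip" optimization in #15068, trivial-transpose removal in #14858, binomial-square simplification in #14580). Here is a hedged sketch of running that pass; the back-to-back transpose graph is an illustrative assumption:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="float32")
# Two transposes that compose to the identity are trivial and should be
# removable by the simplify-expr machinery (#14858).
y = relay.transpose(relay.transpose(x, axes=[1, 0]), axes=[1, 0])
mod = tvm.IRModule.from_expr(relay.Function([x], y))

mod = relay.transform.InferType()(mod)
mod = relay.transform.SimplifyExpr()(mod)
print(mod)
```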
### TOPI

* #15018 - Fix dynamic dimensions support for Dense on TOPI side
* #14856 - Fix in interpretation of empty axis parameter in reduction fun…
* #14483 - [Target] Add SVE specific convolution
* #14839 - Use f-strings for string formatting, NFC
* #14822 - Use f-strings for string formatting, NFC
* #14519 - Vectorize depthwise conv2d output operator
* #14549 - Remove the i32 cast for the output shape of pool
* #14566 - [Topi] Output strides in pack_buffer() utility

### MetaSchedule

* #14781 - [MetaSchedule] RPC port needs to be an integer
* #14673 - Introduce MMA Tensor Core Multilevel Tiling
* #14784 - Enhance `tune_tir` to tune IRModule of TIR Collections
* #14783 - Add an API to dump a pruned database
* #14785 - Clear screen only when specified
* #14654 - Handle output cases for InlineConstantScalars
* #14642 - PostProc not rewriting unroll for purely spatial block
* #14591 - Handle cases when no features found by FeatureExtractor
* #14584 - [ARM] Beautification of the function names
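#14784 generalizes `tune_tir` to take an IRModule containing a collection of TIR PrimFuncs, not just a single kernel. A minimal sketch of the tuning flow, assuming the `ms.tune_tir`/`ms.tir_integration.compile_tir` entry points of this release; the workload, trial budget, and work directory are all illustrative:

```python
import tvm
from tvm import meta_schedule as ms
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((128,), "float32"), B: T.Buffer((128,), "float32")):
    for i in range(128):
        with T.block("B"):
            vi = T.axis.spatial(128, i)
            B[vi] = A[vi] + 1.0

# tune_tir returns a database of tuning records; compile_tir then picks
# the best schedule found for the workload.
database = ms.tune_tir(
    mod=add_one,
    target="llvm --num-cores=4",
    work_dir="./tune_tmp",  # illustrative scratch directory
    max_trials_global=64,
)
sch = ms.tir_integration.compile_tir(database, add_one, target="llvm --num-cores=4")
if sch is not None:
    print(sch.mod.script())
```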
### TIR

* #15153 - [TensorIR][Visitor] Visit buffer members in `match_buffer`'s in block visitor functions
* #15168 - [Schedule] Support padding-by-factor in PadEinsum
* #15165 - Expose UndefinedVars to Python
* #15163 - Fix RenewDef for symbolic input shapes
* #15142 - [Schedule] Enhance `compute-inline` for fusion
* #15150 - Fix typo in code example
* #15144 - [TensorIR][Schedule] New schedule primitive `unsafe_hide_buffer_access`
* #15146 - Block dependence analysis without schedules
* #15119 - Avoid duplicate GlobalVar names in SplitHostDevice
* #15037 - Handle DeclBuffer in CacheReadWrite schedule primitive
* #15098 - [Ethos-U] Handle DeclBuffer in Ethos-U inputs
* #15044 - [USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter
* #15078 - Handle DeclBuffer in LowerThreadAllreduce
* #15094 - Handle DeclBuffer in MergeDynamicSharedMemoryAllocations
* #15093 - Handle DeclBuffer in StorageAccessInfoLower
* #15045 - Handle DeclBuffer in InjectDoubleBuffer
* #15096 - Handle DeclBuffer in RemoveNoOp
* #15076 - [CodeGen] Define PackedFunc error code in MakePackedAPI
* #15102 - Update primfunc host attachment to include host
* #14854 - [Compute-at] Enable complex floordiv/floormod expressions in compute_at
* #15041 - Handle DeclBuffer in LowerCustomDatatypes
* #15038 - Handle DeclBuffer in Inline/ComputeAt/ReverseComputeAt
* #15052 - [Analysis] Handle DeclBuffer in FlopEstimator
* #15051 - Handle DeclBuffer in StorageRewrite
* #15050 - [Schedule] Fix decompose_padding bug with dtypes
* #15034 - Refactor BlockScope outside schedule
* #15054 - Handle DeclBuffer in IRSubstitute
* #14986 - Move SplitHostDevice to before MakePackedAPI
* #15042 - Handle DeclBuffer in StorageFlatten's input
* #15040 - Preserve object equality in Buffer::GetFlattenedBuffer
* #14693 - Enhance TVMScript Buffer Slice Access
* #14988 - Handle callees on same target, different codegen
* #14951 - Keep trivial LetStmt in tir.Simplify when used in buffer decl
* #14944 - Restrict tir.transform.LowerTVMBuiltin to host functions
* #14990 - [IR,TE,TIR] Use f-strings for string formatting, NFC
* #14993 - Fix incorrect construction of block frames
* #14952 - Avoid re-defining `var = arg_var` in ArgBinder
* #14918 - SplitHostDevice, handle subroutines
* #14943 - Restrict tir.transform.InstallDebugSpans to host functions
* #14942 - Preserve existing kTarget function attribute in BindTarget
* #14945 - Restrict tir.transform.CombineContextCall to host functions
* #14914 - Handle subroutine calls in MakeUnpackedAPI
* #14913 - Handle subroutine calls in MakePackedAPI
* #14892 - Expand unit tests for ConvertSSA
* #14866 - Avoid too complex predicate in compaction
* #14766 - [Schedule] Improve blockize to support blockizing multiple blocks
* #14776 - Improved parameter name in DLTensor unpacking error messages
* #14562 - [Driver] Move ShouldAnnotateEntryFunc logic into transform
* #14741 - Keep block annotations from tensorization
* #14021 - More flexible buffer compaction
* #14711 - [Analysis] Calculate allocated memory at module level
* #14492 - Flatten SeqStmt on construction
* #14598 - Add CUDA int4 tensor core intrinsics
* #14593 - [Schedule] Method returning the function being worked on
* #14592 - [TensorIR] Fix ComputeAt with perfect symbolic bound
* #14491 - Use String instead of StringImm for AttrStmtNode::node
* #14626 - [TensorIR] `reindex_cache_write` do not mutate init statement
* #14588 - [Fix][TIR] UnifyThreadBinding creating unit loop with annotation
* #14589 - [Fix][TIR][Analysis] Reduction block checking alloc_buffers
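A large share of the TIR work this cycle is in TensorIR schedule primitives (`compute-inline` fusion in #15142, `blockize` over multiple blocks in #14766, `PadEinsum` padding-by-factor in #15168). Here is a minimal, self-contained sketch of the schedule API these primitives share; the two-block workload is an illustrative assumption:

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Module:
    @T.prim_func
    def main(A: T.Buffer((16,), "float32"), C: T.Buffer((16,), "float32")):
        B = T.alloc_buffer((16,), "float32")
        for i in range(16):
            with T.block("B"):
                vi = T.axis.spatial(16, i)
                B[vi] = A[vi] * 2.0
        for i in range(16):
            with T.block("C"):
                vi = T.axis.spatial(16, i)
                C[vi] = B[vi] + 1.0

sch = tvm.tir.Schedule(Module)
# Inline the producer block "B" into its consumer "C".
sch.compute_inline(sch.get_block("B"))
print(sch.mod.script())
```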
### TVMScript

* #15083 - Avoid visiting repetition tensor in SetCommonPrefix Visitor
* #15091 - [TIR] Convert tir.op operands to PrimExpr
* #14919 - [TIR] Parse subroutine calls with no arguments
* #14941 - Prevent bool to int conversion in T.Assert condition
* #14915 - Allow T.target("device", host="host") to specify host
* #14900 - Round-trip DeclBuffer with undefined data pointer
* #14889 - [TIR] Added format/parsing of subroutine calls
* #14874 - Use default fallback for un-registered type
* #14840 - Print Executor, Runtime, and FunctionInfo as metadata
* #14812 - Handle AllocatedPoolInfo, ConstantPoolInfo, ConstantInfo
* #14786 - Add `__name__` attr for parsed PrimFunc and IRModule
* #14531 - Preserve LetStmt of constants
* #14488 - Distinguish between void* and handle
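As an example of the round-trip improvements above, #14915 lets the TVMScript target annotation carry its host target. A hedged sketch; attaching the target through `T.func_attr` follows the printer's usual convention:

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((8,), "float32"), B: T.Buffer((8,), "float32")):
    # A CUDA device target with an LLVM host (#14915).
    T.func_attr({"target": T.target("cuda", host="llvm")})
    for i in range(8):
        with T.block("B"):
            vi = T.axis.spatial(8, i)
            B[vi] = A[vi] + 1.0

# The host target should survive the round-trip through the printer.
print(add_one.script())
```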
### TVMC

* #14994 - [Bugfix] Fix tvmc option for printing which operators are offloaded to the Ethos-U

### BugFix

* #14960 - [Bug] Add typing_extensions requirement again
* #15015 - [Hotfix] Remove `LOG(INFO)` from unsupported dtype legalization pass
* #14991 - Make ThreadAllReduce pass compatible with int64
* #14950 - Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
* #14903 - [Test Cases] Add version checks to make test cases run in all PyTorch versions
* #14890 - [Fix] Fix typo in error message
* #14879 - Fix the undeclared identifier 'f'
* #14857 - Fix batch_norm
* #14787 - [FIX] Fix typo in comment

### CI

* #15179 - [Testing] Utility method to run TVM on remote device
* #15138 - [Test] Improve check for TVMError exception in test_cast
* #15062 - Clone submodule recursively
* #15065 - Revert "Make Graviton3 default AArch64 job runner node (#14983)"
* #14983 - Make Graviton3 default AArch64 job runner node
* #15056 - [Bugfix] Fix CacheControl version constraint violation
* #14908 - Update the expected CI jobs list in the update_branch script
* #14847 - Update CPU image to install PyTorch
* #14808 - [Testing] Use TVMScript's "name" argument for error messages
* #14780 - Fix doc deploy issue
* #14651 - Modify test cases to accommodate the CI upgrades
* #14666 - sccache support while using ci.py under multi-user environments
* #14635 - Upgrade CI
* #14713 - Add PLATFORM env var to builds
* #14680 - Downgrade ci_cpu llvm version back to 11
* #14653 - [tests][scripts][release] Optimize release note script about categories, etc.
* #14646 - [test][script] Fix release gather_pr.py script for ghost users or blank PR nodes
* #14550 - Add JAX deps in Dockerfiles
* #14466 - Update ci_cpu image and build with llvm-15

### LLVM

* #15127 - Remove the "ret_void" argument of AddFunction
* #15139 - Minor refactor to LLVMModuleNode::SaveToFile
* #14958 - [Codegen] Allow void return type from PackedFunc
* #14946 - Expose Host CPU Feature Detection
* #14901 - Codegen subroutine call when CallNode::op is GlobalVar
* #14570 - Use Var annotation in LetStmt for pointer type
* #14843 - [RUNTIME] Enable multi systemlib with device code
* #14564 - Validate generated LLVM module before optimization
* #14568 - Expand tvm::Type to DWARF conversion
* #14563 - [Codegen] Remove cast to i8* in builtin::address_of

### Docker

* #15149 - Fix build.sh environment variables
* #15105 - Update docker images for llvm-16
* #15092 - Update ci-cortexm docker image to contain CMSIS-NN release v…
* #15095 - Add build.sh environment variables
* #15067 - Migrate arm docker image to use llvm packages
* #15031 - Update ci_cpu docker image to one containing polly package f…
* #15003 - [ADRENO] Docker setup changes for multi-user environments
* #14912 - Add polly package
* #14842 - Install PyTorch on cpu image
* #14590 - Support rootless docker when using docker/bash.sh

### Docs

* #15126 - [DOC] Add RPC System Setup Document
* #15071 - [#15043] Updated the copyright year from 2020 to 2023
* #15055 - [#14992][DOC][TUTORIAL] Fix typo for the 'Making your Hardware Accelerator TVM-ready with UMA' tutorial
* #14504 - [TensorIR][Doc] Docstring of `reorder_block_iter_var`
* #14611 - [TIR] Fix unsafe_set_dtype docstring
* #14585 - Fix typo in the Vitis AI Integration docs

### Misc

* #15267 - [release] Disable git merge to avoid conflict
* #15187 - [RPC] Report RPC Session Timeout to Client Instead of "kShutdown"
* #15185 - Update tvm_runtime.h
* #15164 - [CMake] Support LLVM-16 static linking
* #15167 - [Python] Enhance Wheel Packaging
* #15166 - [Target] Add MetaSchedule-compatible attributes to OpenCL
* #15154 - [Minor] Fix Compilation Warnings
* #15132 - [NDArray] Allow creating a view from a strided array
* #15116 - [RPC] Add Missing Option "port_end" to RPC Proxy
* #15073 - [CodeGenC] Use PrimFuncNode::ret_type in function signature
* #15036 - [StackVM] Updated CodeGenStackVM to handle DeclBuffer
* #15022 - [Build] Fix missing virtual destructor in SIBuilder
* #15016 - Fix type parse error about AdaptiveMaxPool
* #15007 - [Minor] Fix compilation warnings
* #15000 - [CMAKE] Introduce dummy build as an option
* #14863 - [DataType] Initial support of fp8 (e4m3/e5m2)
* #14975 - [CMAKE] Add a dummy target to defer libtvm dep
* #14574 - [IR][SIBuilder]
* #14939 - [Target] Add target to all TVM callbacks
* #14937 - [BUILD] Enable log before throw message in windows
* #14934 - [TestCases] Fix unreachable test cases that were outside the for-loop
* #14916 - [TypoFix] Fix some typos in the Keras frontend
* #14893 - [Contrib] Use f-strings for string formatting, NFC
* #14884 - [AutoTVM] Use f-strings for string formatting, NFC
* #14876 - [CONTRIB] Enable create_staticlib to take in tar files
* #14867 - Fix f-string typo
* #14851 - Add v0.12.0 docs
* #14813 - [BUILD] Removed the duplicated MACROs in config.cmake
* #14743 - [SUPPORT] Fix RingBuffer ReadWithCallback
* #14799 - [LINT] Fix clang-format script for newest clang-format
* #14797 - [NDArray] Allow arbitrary stride when the corresponding shape is 1
* #14790 - Clearer reference to third-party licenses
* #14779 - Fix: use ARM on-demand instances instead of spot
* #14762 - [Target][Minor] Add A6000 Target Tag
* #14683 - [AutoTVM] Added Droplet algorithm in TVM
* #14694 - Unify search path approach for various libs
* #14686 - [CMAKE] Update search pattern of config
* #14636 - Fix bug about wrong attribute name
* #14628 - [CODEGEN] Fix Metal codegen with only a single working dim
* #14607 - Fix: deploy CI
* #14569 - [Node] Allow alternative root names in ObjectPath::Root()
* #14522 - [Object] Implemented .as<T> for ObjectRef param, returns Optional<T>
* #14477 - Feat: use spot instances for CI with on-demand as a backup
* #14468 - [AutoTVM] New rank-binary loss_type for the new xgboost >= 2.0.0 behaviour
* #14544 - Update to v0.13.dev0
* #14539 - [Target] Add Apple M1 GPU tag with 256-thread restriction