# Introduction

The TVM community has worked since the v0.10.0 release to deliver the following new exciting improvements!
* MetaSchedule
  * Tuning API improvements and anchor-block tuning
* TVMScript metaprogramming
  * Lots of progress with TVMScript, with the introduction of a core parser, AST, Evaluator, Source and diagnostics

And many other general improvements to microTVM, code quality, CI, frontends, and more! Please visit the full listing of commits for a complete view: https://github.com/apache/tvm/compare/v0.10.0...v0.11.0.

## RFCs

These RFCs have been merged in [apache/tvm-rfcs](https://github.com/apache/tvm-rfcs) since the last release.

* [CodeGenAArch64 backend with Scalable Vector Extension (SVE) #94](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0094-aarch64-backend-with-sve.md) https://github.com/apache/tvm-rfcs/commit/04b9909d6f8b63524091f12ff5eb964ad490c7b8

## What's Changed

Note that this list is not comprehensive of all PRs and discussions since v0.10. Please visit the full listing of commits for a complete view: https://github.com/apache/tvm/compare/v0.10.0...v0.11.0.

### Adreno
* [Adreno] Add global pooling schedule (#13573)
* [Adreno] Add documentation for Adreno deployment (#13393)
* [Adreno] Fix mem_scope annotations for prim funcs having several heads (#13153)
* [Adreno] Adapt reduction schedule for adreno (#13100)
* [Adreno] Fix winograd accuracy (#13117)
* [Adreno][Textures] Fix static memory planner (#13253)
* [DOCKER][Adreno] Docker infra for Adreno target with CLML support (#12833)

### AoT
* [AOT] Add CreateExecutorMetadata analysis pass (#13250)
* [AOT] Add CreateFunctionMetadata analysis pass (#13095)
* [AOT] Sanitize input/output name in runtime (#13046)

### Arith
* [Arith] Add internal NarrowPredicateExpression utility (#13041)
* [Arith] Optional rewriting and simplification into AND of ORs (#12972)

### arm
* [bfloat16] Fixed dtype conversion in the arm_cpu injective schedule (#13417)

### AutoTVM
* [AutoTVM] Introducing multi_filter into ConfigSpace autotvm (#12545)

### Build
* [BUILD] Re-enable ccache by default (#12839)

### CI
* [ci] Fix docs deploy (#13570)
* [ci] Split Jenkinsfile into platform-specific jobs (#13300)
* [ci] Disallow any non-S3 URLs in CI (#13283)
* [ci] Split out C++ unittests (#13335)
* [CI] Separate the ci scripts into Github and Jenkins scripts (#13368)
* [ci] Assert some tests are not skipped in the CI (#12915)
* [ci] Ignore JUnit upload failures (#13142)
* [ci] Lint for trailing newlines and spaces (#13058)
* [ci] Template build steps (#12983)
* [ci][docker] Allow usage of ECR images in PRs (#13590)
* [ci][docker] Read docker image tags during CI runs (#13572)
* [ci][wasm] Add package-lock.json to git (#13505)

### CL
* [ACL] Enable int8 data type in pooling operators (#13488)

### CMSIS-NN
* [CMSIS-NN] Support for int16 conv2d (#12950)
* [CMSIS-NN] Support for int16 in fully connected layer (#13484)

### DNNL
* [AMP] refine AMP and the corresponding tests for bfloat16 (#12787)

### Docker
* [Docker] Refactor timezone script and NRF installation (#13342)

### Docs
* [docs] Fix empty code blocks in tutorials (#13188)

### Ethos-N
* [ETHOSN] Consolidate target string usage (#13159)
* [ETHOSN] Throw error message when inference fails (#13022)
* [ETHOSN] Inline non-compute-intensive partitions (#13092)
* [ETHOSN] Transpose fully connected weights (#12970)
* [ETHOSN] Support conversion of add/mul to requantize where possible (#12887)

### Frontend
* [TFLite] Enable int64 biases for int16 quantized operators (#12042)

### Hexagon
* [Hexagon] Add HVX quant conv2d implementation (#13256)
* [Hexagon] Add test to show scheduling of resnet50 with async dma pipe… (#13352)
* [Hexagon] Enable Hexagon User DMA bypass mode (#13381)
* [Hexagon] Lint tests part 2 (#13271)
* [Hexagon] Add pylint on tests (#13233)
* [Hexagon] Add E2E test demonstrating how to apply blocked layout schedule to conv2d via metaschedule (#13180)
* [Hexagon] Add a test to show how to use multi input async dma pipelin… (#13110)
* [Hexagon] Add upload function to hexagon session (#13161)
* [Hexagon] Add support for instrumentation based profiling for Hexagon (#12971)
* [Hexagon] Add power manager (#13162)
* [Hexagon] Add scripts for e2e MetaSchedule tuning demonstration (#13135)
* [Hexagon] Add feature to copy logcat to --hexagon-debug and add new --sysmon-profile option to run sysmon profiler during the test (#13107)
* [Hexagon] Async DMA pipelining test suite (#13005)
* [Hexagon] Enable multi input Async DMA; same queue / stage (#13037)
* [Hexagon] Do not use `target` test fixture in Hexagon tests (#12981)
* [Hexagon] 3-stage pipeline; multi queue async DMA for cache read / write (#12954)
* [Hexagon] vrmpy tensorization for e2e compilation of int8 models (#12911)
* [Hexagon] Support template-free meta schedule tuning (#12854)
* [Hexagon] depth_to_space slice op (#12669)
* [Hexagon] Make allocate_hexagon_array a hexagon contrib API (#13336)
* [Hexagon] Add fix for vtcm allocation searches (#13197)
* [MetaSchedule][Hexagon] Add postproc for verifying VTCM usage (#13538)
* [Hexagon][QNN] Add TOPI strategies for qnn ops mul/tanh/subtract (#13416)
* [Logging][Hexagon] Improve logging on Hexagon (#13072)
* [Hexagon][runtime] Per-thread hardware resource management (#13181)
* [Hexagon][runtime] Create objects to manage thread hardware resources (#13111)
* [QNN][Hexagon] Disable QNN canonicalization pass (#12398)
* [Hexagon][runtime] Manage RPC and runtime buffers separately (#13028)
* [Hexagon][runtime] VTCM Allocator (#12947)
* [TOPI][Hexagon] Add schedule and test for maxpool uint8 layout (#12826)
* [TOPI][Hexagon] Implement quantize op for hexagon (#12820)
* [Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule (#12141)
* [TIR][Hexagon] Add vdmpy intrinsic and transform_layout for tests (#13557)
* [Hexagon][runtime] Support VTCM alignments of 128 or 2k (#12999)
* [HEXAGON][QHL] Clipping the inputs of HVX version of QHL Sigmoid operation (#12919)
* [Hexagon][runtime] Add user DMA to device API resource management (#12918)

### LLVM
* [LLVM] Emit fp16/fp32 builtins directly into target module (#12877)
* [LLVM] Switch to using New Pass Manager (NPM) with LLVM 16+ (#13515)

### MetaSchedule
* [MetaSchedule] Make `MultiLevelTiling` apply condition customizable (#13535)
* [MetaSchedule] Enhance Database Validation Script (#13459)
* [MetaSchedule] Fix Dynamic Loop from AutoBinding (#13421)
* [MetaSchedule] Support schedules with cache read in RewriteLayout (#13384)
* [MetaSchedule] Improve inlining and `VerifyGPUCode` for quantized model workload (#13334)
* [MetaSchedule] Add JSON Database Validation Scripts (#12948)
* [MetaSchedule] Fix the order of applying `AutoInline` in `ScheduleUsingAnchorTrace` (#13329)
* [MetaSchedule] Refactor ScheduleRule Attributes (#13195)
* [MetaSchedule] Improve the script for TorchBench model tuning & benchmarking (#13255)
* [MetaSchedule] Enable anchor-block tuning (#13206)
* [MetaSchedule] Introduce a variant of ModuleEquality to enable ignoring NDArray raw data (#13091)
* [MetaSchedule] Consolidate module hashing and equality testing (#13050)
* [MetaSchedule] Support RewriteLayout postproc on AllocateConst (#12991)
* [MetaSchedule] Tuning API cleanup & ergonomics (#12895)
* [MetaSchedule] Fix XGBoost Import Issue (#12936)
* [MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking (#12914)
* [MetaSchedule] Restore `num_threads` parameter in tuning API (#13561)
* [MetaSchedule] TorchBench tuning script: add option to disallow operators in sub graph (#13453)
* [MetaSchedule] Fix segfault in gradient based scheduler (#13399)
* [MetaSchedule] Add `from-target` Defaults for x86 VNNI Targets (#13383)
* [MetaSchedule] Fix Task Hanging in EvolutionarySearch (#13246)
* [MetaSchedule] Allow skipping exact NDArray rewrite in RemoveWeightLayoutRewriteBlock (#13052)
* [MetaSchedule][UX] Support Interactive Performance Table Printing in Notebook (#13006)
* [MetaSchedule][UX] User Interface for Jupyter Notebook (#12866)

### microNPU
* [microNPU] Upgrade Vela to v3.5.0 (#13394)
* [microNPU] Fixed MergeConstants pass on striped networks (#13281)

### microTVM
* [microTVM] Modernize Arm Cortex-M convolution schedules (#13242)
* [microTVM] Improve code reuse in Corstone300 conv2d tests (#13051)
* [microTVM] Add Cortex-M DSP schedules for optimal conv2d layouts (#12969)
* [microTVM] Use default Project Options in template projects and add Makefile for Arduino template project (#12818)
* [microTVM] Generalize depthwise_conv2d schedule (#12856)
* [microTVM] add the option to open a saved micro project for debugging (#12495)
* Added macro generation in MLF export (#12789)
* [microTVM][Arduino] Add `serial_number` to project options and tests (#13518)
* [microTVM][Zephyr] Add 'serial_number' option (#13377)
* [microTVM][PyTorch][Tutorial] Adding a PyTorch tutorial for microTVM with CRT (#13324)

### Misc
* [CodegenC] Explicit forward function declarations (#13522)
* [FQ2I] Support converting `dense` -> `add` to `qnn.dense` -> `add` -> `requantize` (#13578)
* [Minor][Testing] Consolidate IRs into corresponding functions (#13339)
* Add recursive on loop with marked kUnrolled (#13536)
* Skip stride check if shape is 1 in IsContiguous (#13121)
* [TEST] CPU feature detection for x86 and ARM dot product instructions (#12980)
* [Node] Expose StructuralEqual/Hash handler implementation to header (#13001)
* [Tensorize] Add logs to comparator to make debugging tensorize failures easier (#13285)
* [usmp] Also remap VarNode to USMP-allocated buffer (#12880)
* [Virtual Machine] Implementation of 'set_output_zero_copy' (#11358)

### ONNX
* [ONNX] Add converter for FastGelu from Microsoft onnxruntime contrib opset (#13119)
* [QNN, ONNX] Extension of QLinearMatMul in ONNX front-end for all ranks of input tensors (#13322)

### OpenCL
* [OpenCL] Introduce OpenCL wrapper to TVM (#13362)
* [OpenCL] Introduction of weights on buffers (#13563)
* [OPENCL][TEXTURE] Test case enhancements and fixes for RPC (#13408)

### Relay
* [Relay] Fix `CombineParallelDense` slicing axis (#13597)
* [Relay] Refactor constant folding over expr into a utility function (#13343)
* [Relay] Enhancement for fold_scale_axis and simplify_expr (#13275)
* [Relay] Add ClipAndConsecutiveCast and CastClip to SimplifyExpr (#13236)
* [Relay] Rewrite division by constant to multiply (#13182)
* [Relay] Extend split for blocked ConvertLayout pass (#12886)
* [Relay][transform][SimplifyExpr] simplify adjacent muls and adds with constants (#13213)
* [Relay][Hexagon] Add per-channel FixedPointMultiply operation (#13080)
* [IRBuilder][Minor] Add intrinsics like `T.int32x4` (#13361)

### roofline
* [ROOFLINE] Add support for different dtypes (#13003)
* [Roofline] Add fma (non-tensorcore) peak flops for CUDA (#13419)

### RPC
* [RPC] Fix tracker connection termination (#13420)

### Runtime
* [RUNTIME][CLML] Add fixes to clml runtime api (#13426)
* [DLPack][runtime] Update DLPack to v0.7 (#13177)

### Target
* [Target] Replace utility functions with target.features (#12455)
* [Target] Add Target Parser for Arm(R) Cortex(R) A-Profile CPUs (#12454)
* [Target] Add target_device_type attribute to override default device_type (#12509)

### TIR
* [TIR] Add preserve_unit_iters option to blockize/tensorize (#13579)
* [TIR] Introduce ReduceBranchingThroughOvercompute (#13299)
* [TIR] Unify index data type when creating prim func (#13327)
* [TIR] Remove PrimFuncNode::preflattened_buffer_map (#10940)
* [TIR] Make syntax of AST nodes different than ops (#13358)
* [TIR] Update ReductionIterNotIndexOutputBuffer to check BlockRealizeN… (#13301)
* [TIR] Check producer predicate in `ReverseComputeInline` (#13338)
* [TIR] Add utility for anchor block extraction (#13194)
* [TIR] Allow IndexMap applied to arguments with different dtypes (#13085)
* [TIR] Fix handling of int64 extent in blockize and tensorize (#13069)
* [TIR] Refactor NarrowDataType into DataTypeLegalizer (#13049)
* [TIR] add unit-tests for upcoming primfunc-slicing (#12794)
* [TIR] Fix plan buffer allocation location for loop carried dependencies (#12757)
* [TIR] Fix predefined inverse map in layout transform dtype legalization (#13565)
* [TIR] Preserve loop annotation after loop partitioning (#13292)
* [TIR] Use IndexMap to transform NDArray (#12949)
* [TIR] Preserve loop annotations in inject_software_pipeline pass (#12937)
* [TIR][Schedule] Support for specific consumer block targeting in cache_write (#13510)
* [TIR][Hexagon] Add vtcm memory capacity verification for Hexagon target (#13349)
* [TIR][Transform] Optional data-flow analysis in RemoveNoOp (#13217)
* [TIR][Analysis][Arith] Implement basic data-flow analysis (#13130)
* [TIR][Bugfix] Fix AXIS_SEPARATORS in tir.Schedule.transform_layout (#13326)
* [TIR][Arith] Use TryCompare to narrow inequalities if possible (#13024)
* [TIR][Primitive] Support rolling_buffer schedule primitive in TensorIR (#13033)
* [Arith][TIR] Check for constant offsets of known literal constraints (#13023)
* [TIR][Arith] Implement kApplyConstraintsToBooleanBranches extension (#13129)
* [TIR][Schedule] Add cache_index to precompute index of buffer load (#13192)
* [TIR][Schedule] Add cache_inplace primitive to cache opaque buffer (#12939)
* [UnitTest][TIR] Support IRModule comparisons in CompareBeforeAfter (#12920)
* [TIR][Arith] Prove conditionals by transitively applying knowns (#12863)
* [TIR, MetaSchedule] Preserve unit block iters for auto-tensorization (#12974)
* [TIR][MetaSchedule] Add regression test for layout_rewrite extent=1 (#12916)
* [TIR][Transform] Keep the allocate buffers order after update buffer allocation location (#13560)
* [TIR][Schedule] Fix cache_read loc detecting and region_cover checking (#13345)
* [TIR][Transform] Clear buffer_map during MakeUnpackedAPI (#12891)
* [TIR][Schedule] Relax cache read/write's restriction and fix unexpected behavior (#12766)

### TOPI
* [TOPI] Implement Einsum with reduction axes (#12913)
* [TOPI] Add layer norm operator (#12864)
* [TOPI] Add handwritten matvec for dynamic cases (#13423)
* [TOPI] Fix dtype legalize logic for CPU dot product instruction (#12865)
* [TOPI][Hexagon] Implement quantized adaptive_avg_pool1d for hexagon (#13282)
* [TOPI][Hexagon] Implement quantized depthwise conv2d (#12499)

### Torch
* [TVM PyTorch Integration] optimized_torch & as_torch how-to guide (#12318)
* [frontend][pytorch] Support aten::Tensor_split operator (#12871)

### TVMC
* [TVMC] Global pass context for compile and tune (#13309)

### TVMScript
* [TVMScript] Improvements tvm.script.highlight (#13438)
* [TVMScript] Reorganize the folder structure (#12496)
* [TVMScript] TIR parser (#13190)
* [TVMScript] IRModule parser (#13176)
* [TVMScript] Evaluator, core parser, var table (#13088)
* [TVMScript] AST, Source and diagnostics for Parser (#12978)
* [TVMScript] Import TIR methods into the IRBuilder (#12900)
* [TVMScript] Infer T.match_buffer parameters for region (#12890)

View it on GitHub: https://github.com/apache/tvm/releases/tag/v0.11.0.rc0