RFC @ https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251
Tracking issue @ https://github.com/apache/tvm/issues/15233
This is M0 of MSC (Multi-System Compiler): build the core parts of MSCGraph and enable translation between Relay, Relax, and MSCGraph without loss of information.

Core parts:
1. MSCGraph: a DAG structure for storing information from Relax and Relay. An MSCGraph can be built from a Relax or Relay function.
2. Codegen: a printer that generates target code. An MSCGraph can be translated to Relax.

This milestone adds examples of translating Relay to Relax without loss of information. This makes it possible to use Relay-based features in Relax, such as quantization, pruning, and some optimization passes.

Both the Python and C++ code are managed in a new module, so there are no changes to the current code base. To use MSC, set USE_MSC to ON in config.cmake.

You can view, comment on, or merge this pull request online at: https://github.com/apache/tvm/pull/15489

-- Commit Summary --
* [Unity] Preserve symbolic var args when applying call_tir (#14555) * [Unity] Enable pod args in WebGPU (#14560) * [Unity][Op] Symbolic shape support of take grad (#14559) * [Unity][Op] add einsum and flip in Relax (#14545) * [Unity][TOPI] Symbolic shape support for `collapse_sum` (#14535) * [Unity] Enhance Dynamic-aware FuseTIR (#14577) * [Unity][TVMScript] Update struct_info for GlobalVar (#14579) * [Unity][MetaSchedule] Skip Scheduled PrimFuncs in Task Generation (#14402) * [Unity][PyTorch] Disable gradient during dynamo subgraph capture to save RAM (#14602) * [Unity] Fix FuseTIR when the same buffer is read multiple times with different access pattern (#14603) * [Unity][MetaSchedule] BlockCollector focusing on current func (#14595) * [Unity][Op] Dynamic Strided Slice (#14548) * [Unity] Add pass for combining parallel matmul (#14583) * [MERGE] Bring changes from main into unity 2023-04-12 * [MERGE-FIX] Fix regresions after merge * [Unity][BYOC] Add fused patterns for stacked attention (#14608) * [Unity] Fix ForceNarrowI32 with pod arguments (#14605) * [Unity][UX] Symbolic Variables Used in Multiple Functions (#14606) 
* [Unity][Bugfix] Resolve failure on `test_e2e_op_dynamic.py` (#14616) * [Unity][VM] Converting tuple arg to Python tuple (#14620) * [Unity][VM] LibComparator using dtype from input (#14623) * [Unity] Update specific builtins for LM (#14617) * [Unity][CODEGEN] Fix metal codegen when with only single working dim (#14627) * [Unity][CUTLASS] Support batched matmul + residual fusion (#14613) * [Unity][CI] Update images to include jax deps (#14610) * [Unity] hotfix webgpu codegen for vec load (#14630) * [Unity][Frontend] Some changes on the PyTorch FX Frontend (#14625) * [Unity] update ci cpu/gpu images (#14631) * [Unity][AMP] Fix merging concrete type and "unknown" type (#14612) * [Unity][MetaSchedule] Add the module_equality param for tune_relax flow (#14537) * [Unity][TARGET] Updates vulkan codegen for DeclBuffer (#14641) * [Unity] `enable_warning` option for LegalizeOps and MSApplyDatabase (#14634) * [Unity] BlockBuilder assigning unique tensor names in call_te (#14632) * [Unity] Improve error message in webgpu request (#14640) * [Unity][Frontend] Add `no_bind_return_tuple` for PyTorch FX Translator (#14639) * Adding powerPreference argument to navigator.gpu.requestAdapter (#14650) * [Unity][BYOC] Fuse attention pattern with `strided_slice` (#14649) * [Unity] Improve and reduces possible memory leak RPC debug (#14662) * [Unity][BYOC] Add check for stacked attention patterns (#14664) * [Unity] Add rewriting for CUDA graph capturing (#14513) * [Unity][CUTLASS] Require the residual input to have the same shape as input (#14657) * [Unity] Update docs for operators (#14659) * [Unity] Improve WebGPU codegen for large grid (#14674) * [Unity] Use custom hash in `BlockBuilder` to avoid hashing large constants (#14675) * [Unity][Training] Optimizer library (#14670) * [Unity] Fix `DataflowReshapeRewrite` when input has multiple buffers from tuple (#14669) * [WebGPU] This PR fixes the webgpu runtime when there is no pod params (#14685) * [Unity][TuningAPI] Temporary patch for 
large models (#14691) * [Unity] FuseOps skipping PrimValues (#14687) * [Unity][CUTLASS] Fix CUTLASS codegen for occasional variable name conflict (#14692) * [Unity] Use split rather than slice in `CombineParallelMatmul` (#14688) * [Unity][WebGPU] Move NDArrayCache Support to relax runtime (#14689) * [Unity][Training] Loss functions and AppendLoss pass (#14668) * [Unity][Op] Avoid indices in TIR matmul being 0 in legalization (#14701) * [Unity] MetaScheduleApplyDatabase using workload from records (#14702) * [MERGE] Merge main into unity 2023-04-23 * [Unity] Reduce cast to fp32 for constant input in AMP (#14679) * [Unity] Smart parameter fetching (#14708) * [Unity][Training] Trainer and SetupTrainer (#14706) * [Unity][CUTLASS] Fixed memory leak in attention kernel offload (#14723) * [Unity] Cache ndarray-cache.json (#14722) * [Unity] Allow KVCache Access without Shape (#14726) * [Unity][CUTLASS] Fixed stacked attention offload when QKV reshape uses the same shape expression (#14728) * [Unity] Allow modifying function signature by AMP to accept fp16 inputs (#14719) * [Unity] Fix ForceNarrowIndexToI32 so it ignores i16 (#14733) * [Unity] Add system lib build option to relax (#14734) * [Unity] Process all Relax functions in CompositeFunctionAnnotator (#14736) * [Unity] Add options to MS tuning pass to enable more fine-grained tuning (#14730) * [Uniy][Op] Expand support of attention bias layout (#14737) * [Unity][CUTLASS] Add layer norm support (#14731) * [Unity][BYOC] Support implicit attention patterns (#14744) * [Unity][Relax] Memory planning for call_tir_dyn (#14750) * Fix super().visit_var_binding_ in PyExprVisitor and PyExprMutator (#14754) * [Unity][VM] `kill_tensor` and `kill_storage` releasing NDArray in VM at runtime (#14753) * [Unity] Limit number of characters in logger names (#14752) * [Unity] Fix FX translator no output issue (#14761) * [Unity] Enhance CopyWithNewVars utility (#14764) * [Unity][CI] Use the upgraded images (#14768) * [Unity] Update LM 
Sample builtins (#14793) * [Unity][Pass] Lazy transform params (#14769) * [Unity][Training] More Relax operators gradient supported (#14777) * [MERGE] Merge main into unity 2023-05-07 * Use latest lint image * [LINT] Fix clang-format script for newest clang-format * [MERGE] Fix lint after lint image upgrade * [Unity] Fix Unary Op Legalization (#14789) * [Unity] Introduce FewShotTuning Pass (#14624) * [Unity][BYOC] Fix incorrect bias stride in matmul cutlass offload (#14807) * [Unity] Add pass to allocate big workspace and pass it to all functions that need temp storage (#14802) * [Unity][BYOC] Add shape validation for bias arg in cuBLAS (#14809) * [Unity] NDArray Cache Efficient Load in OpenCL (#14816) * [Unity] Fix CUDA graph rewrite var used before def (#14800) * [Unity] Cover all Relax functions in implicit attention rewrite (#14818) * [Unity] Improve Error Message loading NDArray Shards (#14823) * [Unity][Training] Categorical cross entropy loss (#14757) * [Unity][CI] Update CPU image to install PyTorch (#14848) * [Unity][CI] Update CPU image to install PyTorch (fix) (#14852) * [MERGE] Merge main into unity 2023-05-14 * [Unity][FX] Add support for PT2.0 scaled_dot_product_attention (#14841) * [Unity] Support Mixed-Precision FMA Pattern (#14865) * [Unity]Lazy transform param now only work on non-dataflow block (#14864) * [Unity] Support multilib relax build (#14873) * [Unity][Relax] gelu-tanh operator (#14814) * [Unity][IR] Purity Tracking (#14394) * [Unity] Fix broken test in cutlass codegen (#14881) * [Unity][WEB] Enable String object (#14882) * [Unity] Only process relax functions in workspace annotation (#14896) * [TIR][Doc] Fix formatting in pop_trace docstring (#14894) * [Unity][CUTLASS] Fix for purity tracking (#14891) * [MERGE] merge main into unity 2023-05-20 * [Unity][WebGPU] Try F16 support for WebGPU Backend (#14904) * [Unity][Transform] Fix scalar case in DefaultGPUSchedule (#14928) * [Unity] Allow eliminating only call nodes in CSE pass (#14895) * 
[Unity] Support causal mask for `R.nn.attention` (#14907) * [Unity][WebGPU] Fix WebGPU adapter requiring features (#14933) * [Unity] Cleanup Web runtime wasi (#14929) * [Unity] Improve caching logic in webruntime (#14940) * [Unity] Enhance web cache to add scopes (#14953) * [Unity] Fix MergeCompositeFunctions for non-CallNode dataflow inputs (#14959) * [Unity][BYOC] Make CUTLASS attention rewriting aware of fp16 <-> f32 casting (#14957) * [Unity][Transform] Fix bug for tir expression in shape in fuse_tir (#14931) * [Unity] support update KV cache (#14964) * [Unity][CUTLASS] Attention hot fix (#14966) * [Unity][Frontend] Translate StableHLO to Relax (#14460) * [Unity][Training] Enhance op gradient (#14932) * [Unity][CUTLASS] Support more residual input shape (#14968) * [Unity] Allow filtering out unwanted branches in matmul combining pass (#14971) * [Unity] Add popn to kvcache (#14970) * [Unity] Fix LazyTransformParams use-def analysis and binding emission (#14974) * [Unity] Fix ConvertLayout on binary elemwise ops involving scalar input (#14961) * [Unity] Reset match state when backtracking (#14984) * [Unity][Training] Avoid problematic inputs to nll_loss in test_op_gradient_numeric (#14987) * [Unity][CUTLASS] Fix circular import bug in relax cutlass backend (#15001) * [MERGE] Merge main into unity 2023-06-01 * [Unity] Fix importing tvm.contrib.cutlass (#15010) * [Unity] Fix cutlass BYOC after merge (#15012) * [Unity][FIX] add init file to `relax.backend.contrib` (#15023) * [Unity] Allow name_hint in additional locations (#15027) * [Unity] Cutlass attention with dynamic sequence length (#15028) * [Unity][BYOC] Cache cuBlasLt handle with thread entry (#15030) * [Unity][NN] Allow nn.Placeholder/Parameter prior to BlockBuilder (#15025) * [Unity] Add hexp for compute capacity <= 5.2 (#15070) * [Unity] Add an API to create multiple kv caches with single allocation (#15064) * [Unity][Frontend] Add relax onnx importer and tests (#14999) * [Unity] Added bounds checking on 
TupleGetItem index (#15024) * [Unity] Optimize SampleTopPFromProb (#15072) * [Unity][Relax] Add masked_fill operator (#15077) * [Unity][Relax] Add bitwise and logical ops (AND, NOT, OR, XOR) (#15075) * [Bugfix][CUDA] Fix codegen for hexp for sm >= 52 (#15079) * [Unity] Fix FewShotTuning Failure When Missing Global Symbol (#15097) * [MERGE] Merge main into unity 2023-06-13 * [MERGE] Fix after merge * [Unity][Analysis] Reshape TIR detection with iter-map-simplify (#15099) * [MERGE] recover cutlass in unity * [Unity][FuseTIR] Flatten and add tuple fields to parameters / arguments only when they are used (#15113) * [Unity][Relax][UX] Specify function purity in the @R.function decorator (#15109) * [Unity][Relax] Make RewriteDataflowReshape only rewrite volume-preserving ops (#15112) * [Unity] Allocate workspace for all functions (#15118) * [Unity][BYOC] Integrate fp16 A - int4 B GEMM kernel from FasterTransformer into CUTLASS BYOC (#15111) * [Unity] Minor fix to `RewriteDataflowReshape` condition (#15125) * [Unity] Hotfix webgpu runtime (#15135) * [Unity][Relax] Generalize CSE to work outside DataflowBlocks (#15047) * [Unity] Fix handling of vm builtins in cuda graph (#15145) * [Unity][Bugfix] Fix purity annotation in CSE test (#15143) * [Unity] Scaffolding DLight (#15141) * [Unity][Pass] FuseOps with partially accessed Tuple param (#15152) * [MERGE] Merge main to unity 2023-06-24 * [Fix] Fix merge error * [Unity][Dlight] Add reduction rules (#15156) * [Unity][IR][UX] Privacy annotation in Relax (#15140) * [Unity][UX][Tweak] Make it an error to mark a function private and specify a global symbol (#15170) * [Unity] Support clear global memory allocators (#15172) * Merge remote-tracking branch 'apache-upstream/main' into unity * Fix super() visit function in PyExprVisitor and PyExprMutator (#15189) * [RPC] Disable socket SO_REUSEADDR for Windows (#15188) * [Unity] Legalization for LayoutTransform (#15184) * [Unity] Add memory scope and nd allocation support in allocators 
(#15178) * [Unity][Dlight] general reduction rule for gemv-decode (#15169) * [Unity][Dlight] Matmul Rules (#15191) * [MERGE] Merge main into unity 2023-07-03 * [MERGE] Fix testcase after merge * [Unity] Fix dlight reduction rule (#15194) * [VM] Add repetition penalty functions to Relax VM (#15219) * [Unity] Allow specifying struct_info for relax constant (#15220) * [Dlight] Enhance Decode-GEMV Schedule (#15195) * [Unity][TIR][Transform] Support no spatial axes cases for DefaultGPUSchedule (#15232) * [Unity] Fix memory statistics issues in estimate_memory_usage (#15224) * [Unity][NestedMsg] Add NestedMsgTo helper function (#15223) * [Unity][Dlight] Avoid TransformBlockLayout in GEMV Rule (#15248) * [Unity][Dlight] Handle Epilogue Broadcasting (#15252) * [Unity] Add a Standalone VM Version Number (#15254) * Merge remote-tracking branch 'apache-upstream/main' into unity-staging * [Unity][TIR] Allow symbolic bounds in IndexMap analysis (#15262) * [Unity][Training] Registering te gradient (#15231) * [MERGE] Hotfix layout transform related change after last merge from main * [Unity][Dlight] Minor performance improvement for gemm and gemv (#15278) * [Unity][BYOC] `PrimValue` handling in `FuseOpByPattern` for BYOC (#15217) * [Unity][CUTLASS] Offload RMS norm (#15288) * [Unity] [DistIR] Introducing DistIR (#15289) * [Dlight] Enhance fallback schedule with DecomposeReduction (#15302) * [Unity][Relax][Transform] Do not remove MatchCast for RemoveAllUnused (#15290) * [Unity] [Relax] [ONNX frontend] [op] Add support for Trilu operator (#15299) * [Unity] Update CUTLASS Attention to incorprate upstream change (#15309) * [Unity] Fix FuseOpsByPattern when a subgraph can be matched by multiple residual patterns (#15308) * [Unity][Op] Add leaky relu operator (#15296) * [Unity][Dlight] Add schedule rule for decode transpose (#15304) * [Unity] CUDA Graph update (#15320) * [Unity][Training] Enhance gradient system (#15230) * [Unity][Dlight] Fix decode-GeMV rule when spatial-inner 
without broadcasting (#15330) * [Unity][Dlight] Rule matmul avoiding blockIdx.z (#15333) * [Unity] fp16 A x int B GEMM update - support int8, more bias shape (#15318) * [Unity][Dlight] Fix DecodeGeMV rule for spatial-inner with grouping (#15340) * Merge remote-tracking branch 'apache-upstream/main' into unity * [Unity][OP] Sync `rms_norm` with main (#15355) * Add the ability to differentiate between model loads from remote fetch v/s model loads from cache (#15357) * [Unity][Dlight] Fix matmul schedule when out_dtype = fp32 and bias add is fp32 (#15363) * [Unity][CUTLASS] Support `out_dtype = "float32"` for FasterTransformer kernel (#15377) * [Unity] Avoid overloaded-virtual warnings (#15382) * [Dlight] Benchmarking Tools for Dynamic Shape PrimFuncs & Relax Function (#15322) * [Unity][DLight] GEMV Rules (#15381) * [Unity] Add support for AXIS_SEPARATOR in AlterOpImpl Pass (#15315) * [Unity][Dlight] Fix reduction rule, aligning last block's iters (#15383) * [Unity][Module] Add Core Data Structure (#15398) * [Unity][DLight] Fix Reduction Rule (#15412) * [Unity] Lowering of axis separator in Layout Transform (#15390) * [Unity][Dlight] Tensorization Rule in GPU Matmul (#15389) * [Unity] nn.Module Spec (#15416) * [Unity] nn.Module Op (#15418) * [Unity] nn.Module Torch Integration (#15424) * [Unity][Dlight] Improve Dlight Tensorization Rule (#15427) * [Unity][DLight] Update GEMV rules (#15429) * [Unity][Frontends][Onnx] Improve ConstantOfShape behavior (#15434) * [Unity] nn.Module Module and Effect (#15438) * [Unity] cuda graph support for cublas (#15435) * [Unity][Op] Implement basic `call_tir_inplace` operator (#15372) * [Unity] dynamo with dynamic shape (#15441) * [Unity][Ops] Support for erf in relax (#15445) * [Unity][Dlight] Avoid too large vectorization factor in caching (#15443) * Merge remote-tracking branch 'upstream/main' into unity-staging * [MERGE-FIX] Update the code to fix merge issues * [Unity]: fix error on enum (#15451) * [Unity] Eslint and TypeScript 
configuration fix (#15452) * [Unity][Op] Support symbolic shape inference for slice op. (#15450) * [Unity][Frontend][Onnx] Simplify gemm (#15458) * [Unity][Op] Conv1dTranspose (#15456) * [Unity][Transform] Elide redundant bindings of dataflow vars (#15341) * [Unity][Fix][Op] Add groups to conv1d (#15457) * [Bugfix][CUTLASS] CUTLASS path finding (#15476) * [Cherry-Pick][BugFix][TIR] ThreadSync with shared.dyn awareness (#15481) * add msc -- File Changes -- M 3rdparty/cutlass (2) M 3rdparty/cutlass_fpA_intB_gemm (2) M CMakeLists.txt (14) A apps/relax_examples/e2e_auto_tir.py (253) A apps/relax_examples/mlp.py (57) A apps/relax_examples/nn_module.py (69) A apps/relax_examples/resnet.py (53) M ci/jenkins/generated/arm_jenkinsfile.groovy (5) M ci/jenkins/generated/cortexm_jenkinsfile.groovy (5) M ci/jenkins/generated/cpu_jenkinsfile.groovy (5) M ci/jenkins/generated/docker_jenkinsfile.groovy (5) M ci/jenkins/generated/gpu_jenkinsfile.groovy (5) M ci/jenkins/generated/hexagon_jenkinsfile.groovy (5) M ci/jenkins/generated/i386_jenkinsfile.groovy (5) M ci/jenkins/generated/lint_jenkinsfile.groovy (5) M ci/jenkins/generated/minimal_cross_isa_jenkinsfile.groovy (5) M ci/jenkins/generated/minimal_jenkinsfile.groovy (5) M ci/jenkins/generated/riscv_jenkinsfile.groovy (5) M ci/jenkins/generated/wasm_jenkinsfile.groovy (5) A ci/jenkins/unity_jenkinsfile.groovy (337) M cmake/config.cmake (3) M cmake/modules/CUDA.cmake (8) M cmake/modules/contrib/CUTLASS.cmake (5) M cmake/modules/contrib/DNNL.cmake (8) A cmake/modules/contrib/MSC.cmake (26) M cmake/modules/contrib/TensorRT.cmake (2) M include/tvm/ir/expr.h (9) M include/tvm/ir/function.h (133) A include/tvm/ir/global_info.h (80) M include/tvm/ir/module.h (22) M include/tvm/ir/name_supply.h (46) M include/tvm/ir/transform.h (54) M include/tvm/ir/type.h (3) M include/tvm/node/script_printer.h (5) A include/tvm/relax/analysis.h (484) A include/tvm/relax/attrs/create.h (54) A include/tvm/relax/attrs/datatype.h (53) A 
include/tvm/relax/attrs/distributed.h (48) A include/tvm/relax/attrs/image.h (81) A include/tvm/relax/attrs/index.h (68) A include/tvm/relax/attrs/linear_algebra.h (53) A include/tvm/relax/attrs/manipulate.h (162) A include/tvm/relax/attrs/nn.h (394) A include/tvm/relax/attrs/op.h (63) A include/tvm/relax/attrs/search.h (48) A include/tvm/relax/attrs/statistical.h (63) A include/tvm/relax/backend.h (51) A include/tvm/relax/binding_rewrite.h (115) A include/tvm/relax/block_builder.h (240) A include/tvm/relax/dataflow_matcher.h (74) A include/tvm/relax/dataflow_pattern.h (828) A include/tvm/relax/dataflow_pattern_functor.h (183) A include/tvm/relax/distributed/axis_group_graph.h (305) A include/tvm/relax/distributed/global_info.h (92) A include/tvm/relax/distributed/struct_info.h (192) A include/tvm/relax/distributed/transform.h (56) A include/tvm/relax/exec_builder.h (181) A include/tvm/relax/expr.h (1059) A include/tvm/relax/expr_functor.h (551) A include/tvm/relax/nested_msg.h (602) A include/tvm/relax/op_attr_types.h (75) A include/tvm/relax/struct_info.h (451) A include/tvm/relax/struct_info_functor.h (157) A include/tvm/relax/tir_pattern.h (75) A include/tvm/relax/transform.h (537) A include/tvm/relax/tuning_api.h (396) A include/tvm/relax/type.h (166) A include/tvm/relax/utils.h (111) M include/tvm/relay/transform.h (2) M include/tvm/runtime/module.h (4) A include/tvm/runtime/relax_vm/builtin.h (89) A include/tvm/runtime/relax_vm/bytecode.h (223) A include/tvm/runtime/relax_vm/executable.h (219) A include/tvm/runtime/relax_vm/memory_manager.h (152) A include/tvm/runtime/relax_vm/vm.h (187) M include/tvm/script/ir_builder/ir/frame.h (4) A include/tvm/script/ir_builder/relax/frame.h (297) A include/tvm/script/ir_builder/relax/ir.h (146) M include/tvm/te/operation.h (2) M include/tvm/tir/buffer.h (14) M include/tvm/tir/builtin.h (44) M include/tvm/tir/data_type_rewriter.h (12) M include/tvm/tir/function.h (7) M include/tvm/tir/transform.h (22) M 
include/tvm/topi/nn/group_norm.h (31) M include/tvm/topi/nn/layer_norm.h (28) M include/tvm/topi/nn/rms_norm.h (11) M include/tvm/topi/transform.h (52) M jvm/pom.xml (4) M python/tvm/_ffi/libinfo.py (21) A python/tvm/contrib/cutlass/attention_operation.py (161) M python/tvm/contrib/cutlass/build.py (494) M python/tvm/contrib/cutlass/conv2d_operation.py (61) M python/tvm/contrib/cutlass/gemm_operation.py (216) M python/tvm/contrib/cutlass/gemm_profiler.py (4) M python/tvm/contrib/cutlass/gen_conv2d.py (119) M python/tvm/contrib/cutlass/gen_gemm.py (81) M python/tvm/contrib/cutlass/gen_tensor_op.py (349) A python/tvm/contrib/cutlass/layer_norm_operation.py (48) M python/tvm/contrib/cutlass/library.py (8) A python/tvm/contrib/cutlass/rms_norm_operation.py (47) M python/tvm/contrib/hexagon/session.py (37) A python/tvm/contrib/msc/__init__.py (17) A python/tvm/contrib/msc/core/__init__.py (17) A python/tvm/contrib/msc/core/_ffi_api.py (21) A python/tvm/contrib/msc/core/ir/__init__.py (20) A python/tvm/contrib/msc/core/ir/graph.py (520) A python/tvm/contrib/msc/core/ir/translate.py (172) A python/tvm/contrib/msc/core/runtime/__init__.py (17) A python/tvm/contrib/msc/core/tools/__init__.py (17) A python/tvm/contrib/msc/core/transform/__init__.py (20) A python/tvm/contrib/msc/core/transform/pattern.py (490) A python/tvm/contrib/msc/core/transform/transform.py (61) A python/tvm/contrib/msc/core/utils/__init__.py (23) A python/tvm/contrib/msc/core/utils/expr.py (105) A python/tvm/contrib/msc/core/utils/file.py (159) A python/tvm/contrib/msc/core/utils/info.py (68) A python/tvm/contrib/msc/core/utils/logging.py (17) A python/tvm/contrib/msc/core/utils/namespace.py (64) A python/tvm/contrib/msc/core/utils/register.py (61) A python/tvm/contrib/msc/framework/__init__.py (17) A python/tvm/contrib/msc/framework/tvm/__init__.py (17) A python/tvm/contrib/msc/framework/tvm/_ffi_api.py (21) A python/tvm/contrib/msc/framework/tvm/codegen/__init__.py (19) A 
python/tvm/contrib/msc/framework/tvm/codegen/translate.py (74) A python/tvm/contrib/msc/pipeline/__init__.py (17) A python/tvm/contrib/tvmjs.py (305) A python/tvm/dlight/__init__.py (27) A python/tvm/dlight/base/__init__.py (28) A python/tvm/dlight/base/analysis.py (253) A python/tvm/dlight/base/common_schedules.py (98) A python/tvm/dlight/base/schedule_rule.py (105) A python/tvm/dlight/base/transform.py (86) A python/tvm/dlight/benchmark/__init__.py (24) A python/tvm/dlight/benchmark/bench.py (312) A python/tvm/dlight/benchmark/extract.py (351) A python/tvm/dlight/benchmark/utils.py (172) A python/tvm/dlight/gpu/__init__.py (26) A python/tvm/dlight/gpu/fallback.py (70) A python/tvm/dlight/gpu/gemv.py (296) A python/tvm/dlight/gpu/general_reduction.py (94) A python/tvm/dlight/gpu/matmul.py (632) A python/tvm/dlight/gpu/reduction.py (244) A python/tvm/dlight/gpu/transpose.py (128) A python/tvm/dlight/gpu/utils.py (87) M python/tvm/exec/microtvm_debug_shell.py (8) M python/tvm/exec/rpc_proxy.py (38) M python/tvm/ir/__init__.py (1) M python/tvm/ir/expr.py (62) M python/tvm/ir/function.py (27) A python/tvm/ir/global_info.py (42) M python/tvm/ir/module.py (53) M python/tvm/ir/supply.py (7) M python/tvm/ir/transform.py (95) M python/tvm/meta_schedule/__init__.py (1) M python/tvm/meta_schedule/builder/local_builder.py (12) M python/tvm/meta_schedule/logging.py (3) A python/tvm/meta_schedule/relax_integration.py (0) M python/tvm/meta_schedule/runner/local_runner.py (0) M python/tvm/meta_schedule/testing/tune_utils.py (0) M python/tvm/meta_schedule/tir_integration.py (0) M python/tvm/meta_schedule/tune_context.py (0) M python/tvm/meta_schedule/utils.py (0) A python/tvm/relax/__init__.py (0) A python/tvm/relax/_ffi_api.py (0) A python/tvm/relax/analysis/__init__.py (0) A python/tvm/relax/analysis/_ffi_api.py (0) A python/tvm/relax/analysis/analysis.py (0) A python/tvm/relax/analysis/estimate_memory_usage.py (0) A python/tvm/relax/backend/__init__.py (0) A 
python/tvm/relax/backend/_ffi_api.py (0) A python/tvm/relax/backend/contrib/__init__.py (0) A python/tvm/relax/backend/contrib/cublas.py (0) A python/tvm/relax/backend/contrib/cutlass.py (0) A python/tvm/relax/backend/pattern_registry.py (0) A python/tvm/relax/backend/patterns.py (0) A python/tvm/relax/backend_tir/__init__.py (0) A python/tvm/relax/backend_tir/contrib/__init__.py (0) A python/tvm/relax/backend_tir/contrib/cutlass.py (0) A python/tvm/relax/backend_tir/pattern.py (0) A python/tvm/relax/binding_rewrite.py (0) A python/tvm/relax/block_builder.py (0) A python/tvm/relax/distributed/__init__.py (0) A python/tvm/relax/distributed/_ffi_api.py (0) A python/tvm/relax/distributed/global_info.py (0) A python/tvm/relax/distributed/struct_info.py (0) A python/tvm/relax/distributed/transform/__init__.py (0) A python/tvm/relax/distributed/transform/_ffi_api.py (0) A python/tvm/relax/distributed/transform/transform.py (0) A python/tvm/relax/dpl/__init__.py (0) A python/tvm/relax/dpl/_ffi.py (0) A python/tvm/relax/dpl/context.py (0) A python/tvm/relax/dpl/pattern.py (0) A python/tvm/relax/dpl/rewrite.py (0) A python/tvm/relax/exec_builder.py (0) A python/tvm/relax/expr.py (0) A python/tvm/relax/expr_functor.py (0) A python/tvm/relax/frontend/__init__.py (0) A python/tvm/relax/frontend/common.py (0) A python/tvm/relax/frontend/nn/__init__.py (0) A python/tvm/relax/frontend/nn/_tensor_op.py (0) A python/tvm/relax/frontend/nn/core.py (0) A python/tvm/relax/frontend/nn/modules.py (0) A python/tvm/relax/frontend/nn/op.py (0) A python/tvm/relax/frontend/nn/spec.py (0) A python/tvm/relax/frontend/nn/torch.py (0) A python/tvm/relax/frontend/onnx/__init__.py (0) A python/tvm/relax/frontend/onnx/onnx_frontend.py (0) A python/tvm/relax/frontend/stablehlo/__init__.py (0) A python/tvm/relax/frontend/stablehlo/stablehlo_translator.py (0) A python/tvm/relax/frontend/torch/__init__.py (0) A python/tvm/relax/frontend/torch/dynamo.py (0) A 
python/tvm/relax/frontend/torch/fx_translator.py (0) A python/tvm/relax/ir/instrument.py (0) A python/tvm/relax/op/__init__.py (0) A python/tvm/relax/op/_ffi_api.py (0) A python/tvm/relax/op/_op_gradient.py (0) A python/tvm/relax/op/base.py (0) A python/tvm/relax/op/binary.py (0) A python/tvm/relax/op/builtin/__init__.py (0) A python/tvm/relax/op/builtin/_ffi_api.py (0) A python/tvm/relax/op/builtin/builtin.py (0) A python/tvm/relax/op/create.py (0) A python/tvm/relax/op/datatype.py (0) A python/tvm/relax/op/distributed/__init__.py (0) A python/tvm/relax/op/distributed/_ffi_api.py (0) A python/tvm/relax/op/distributed/distributed.py (0) A python/tvm/relax/op/grad/__init__.py (0) A python/tvm/relax/op/grad/_ffi_api.py (0) A python/tvm/relax/op/grad/grad.py (0) A python/tvm/relax/op/image/__init__.py (0) A python/tvm/relax/op/image/_ffi_api.py (0) A python/tvm/relax/op/image/image.py (0) A python/tvm/relax/op/index.py (0) A python/tvm/relax/op/linear_algebra.py (0) A python/tvm/relax/op/manipulate.py (0) A python/tvm/relax/op/mask.py (0) A python/tvm/relax/op/memory/__init__.py (0) A python/tvm/relax/op/memory/_ffi_api.py (0) A python/tvm/relax/op/memory/memory.py (0) A python/tvm/relax/op/nn/__init__.py (0) A python/tvm/relax/op/nn/_ffi_api.py (0) A python/tvm/relax/op/nn/nn.py (0) A python/tvm/relax/op/op_attrs.py (0) A python/tvm/relax/op/search.py (0) A python/tvm/relax/op/set.py (0) A python/tvm/relax/op/statistical.py (0) A python/tvm/relax/op/ternary.py (0) A python/tvm/relax/op/unary.py (0) A python/tvm/relax/op/vm/__init__.py (0) A python/tvm/relax/op/vm/_ffi_api.py (0) A python/tvm/relax/op/vm/vm.py (0) A python/tvm/relax/pipeline.py (0) A python/tvm/relax/struct_info.py (0) A python/tvm/relax/testing/__init__.py (0) A python/tvm/relax/testing/ast_printer.py (0) A python/tvm/relax/testing/lib_comparator.py (0) A python/tvm/relax/testing/matmul.py (0) A python/tvm/relax/testing/nn.py (0) A python/tvm/relax/testing/relay_translator.py (0) A 
python/tvm/relax/testing/runtime_builtin.py (0) A python/tvm/relax/testing/transform.py (0) A python/tvm/relax/testing/vm.py (0) A python/tvm/relax/training/__init__.py (0) A python/tvm/relax/training/_ffi_api.py (0) A python/tvm/relax/training/loss.py (0) A python/tvm/relax/training/optimizer.py (0) A python/tvm/relax/training/setup_trainer.py (0) A python/tvm/relax/training/trainer.py (0) A python/tvm/relax/training/utils.py (0) A python/tvm/relax/transform/__init__.py (0) A python/tvm/relax/transform/_ffi_api.py (0) A python/tvm/relax/transform/lazy_transform_params.py (0) A python/tvm/relax/transform/legalize_ops/__init__.py (0) A python/tvm/relax/transform/legalize_ops/binary.py (0) A python/tvm/relax/transform/legalize_ops/common.py (0) A python/tvm/relax/transform/legalize_ops/create.py (0) A python/tvm/relax/transform/legalize_ops/datatype.py (0) A python/tvm/relax/transform/legalize_ops/grad.py (0) A python/tvm/relax/transform/legalize_ops/image.py (0) A python/tvm/relax/transform/legalize_ops/index.py (0) A python/tvm/relax/transform/legalize_ops/linear_algebra.py (0) A python/tvm/relax/transform/legalize_ops/manipulate.py (0) A python/tvm/relax/transform/legalize_ops/nn.py (0) A python/tvm/relax/transform/legalize_ops/search.py (0) A python/tvm/relax/transform/legalize_ops/statistical.py (0) A python/tvm/relax/transform/legalize_ops/unary.py (0) A python/tvm/relax/transform/transform.py (0) A python/tvm/relax/transform/tuning_api/__init__.py (0) A python/tvm/relax/transform/tuning_api/_ffi_api.py (0) A python/tvm/relax/transform/tuning_api/database.py (0) A python/tvm/relax/transform/tuning_api/default_functions.py (0) A python/tvm/relax/transform/tuning_api/primitives.py (0) A python/tvm/relax/ty.py (0) A python/tvm/relax/utils.py (0) A python/tvm/relax/vm_build.py (0) M python/tvm/relay/op/op_attrs.py (0) M python/tvm/rpc/proxy.py (0) M python/tvm/rpc/server.py (0) M python/tvm/rpc/tracker.py (0) -- Patch Links -- 
https://github.com/apache/tvm/pull/15489.patch
https://github.com/apache/tvm/pull/15489.diff
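
For readers unfamiliar with the two core parts named in the PR description (a DAG that stores operator information, and a printer-style codegen that walks it), the following is a minimal, self-contained sketch of that idea. It is not the MSC implementation and does not use any TVM API: `Node`, `ToyGraph`, `print_graph`, and `build_example` are hypothetical names invented for illustration, and the operator names are only sample strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One operator in the toy graph (hypothetical, not an MSC class)."""

    name: str
    optype: str
    inputs: List[str] = field(default_factory=list)
    attrs: Dict[str, str] = field(default_factory=dict)


class ToyGraph:
    """A minimal DAG: nodes are stored in insertion order, which is topological."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.nodes: Dict[str, Node] = {}

    def add_node(self, node: Node) -> None:
        # A node may only consume nodes that already exist, so no cycle can form.
        if node.name in self.nodes:
            raise ValueError(f"duplicate node name: {node.name}")
        for src in node.inputs:
            if src not in self.nodes:
                raise ValueError(f"{node.name} uses undefined input {src}")
        self.nodes[node.name] = node


def print_graph(graph: ToyGraph) -> str:
    """A tiny printer-style 'codegen': emit one assignment per node, in order."""
    lines = []
    for node in graph.nodes.values():
        args = list(node.inputs)
        args += [f"{key}={value}" for key, value in sorted(node.attrs.items())]
        lines.append(f"{node.name} = {node.optype}({', '.join(args)})")
    return "\n".join(lines)


def build_example() -> ToyGraph:
    """Build a three-node graph: input -> conv2d -> relu (sample op names only)."""
    graph = ToyGraph("main")
    graph.add_node(Node("x", "input"))
    graph.add_node(Node("conv", "nn.conv2d", ["x"], {"strides": "1"}))
    graph.add_node(Node("out", "nn.relu", ["conv"]))
    return graph


if __name__ == "__main__":
    print(print_graph(build_example()))
```

Because every attribute and input edge is kept on the node, the printed text carries the same information as the graph itself, which is the property the description refers to as translating "without loss of information". The real MSCGraph and its Relax codegen live under `python/tvm/contrib/msc/` in this PR and will differ in structure and naming.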