[apache/tvm] [RFC][Unity][MSC] MileStone 0 (PR #15489)

Archermmt Fri, 04 Aug 2023 19:38:10 -0700

RFC @ https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251
Tracking issue @ https://github.com/apache/tvm/issues/15233


This is milestone 0 (M0) of MSC (Multi-System Compiler): build the core parts 
of MSCGraph and enable translation between Relay, Relax and MSCGraph without 
loss of information.

Core parts:
1. MSCGraph: A DAG structure for storing information from Relax and Relay. An 
MSCGraph can be built from a Relax or Relay function.
2. Codegen: A printer that generates target code. An MSCGraph can be translated 
to Relax.
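
To make the two core parts concrete, here is a minimal sketch of the general idea: a DAG of operator nodes plus a printer that walks it in topological order. Every name in it (ToyNode, ToyGraph, add, emit) is hypothetical and chosen for illustration only. It is not the MSCGraph API from this PR and does not depend on TVM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyNode:
    """One operator in the toy graph (hypothetical, not the MSC API)."""

    name: str
    optype: str
    inputs: List[str] = field(default_factory=list)
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyGraph:
    """A DAG stored as an insertion-ordered mapping from name to node."""

    nodes: Dict[str, ToyNode] = field(default_factory=dict)

    def add(self, node: ToyNode) -> None:
        # Every input must already exist, so edges only point backwards
        # and insertion order is a valid topological order.
        for src in node.inputs:
            if src not in self.nodes:
                raise ValueError(f"unknown input {src!r} for {node.name!r}")
        if node.name in self.nodes:
            raise ValueError(f"duplicate node name {node.name!r}")
        self.nodes[node.name] = node

    def emit(self) -> List[str]:
        """Toy 'codegen': print one line of text per node."""
        lines = []
        for node in self.nodes.values():
            args = ", ".join(node.inputs)
            attrs = ", ".join(f"{k}={v}" for k, v in sorted(node.attrs.items()))
            sep = ", " if args and attrs else ""
            lines.append(f"{node.name} = {node.optype}({args}{sep}{attrs})")
        return lines


graph = ToyGraph()
graph.add(ToyNode("x", "input"))
graph.add(ToyNode("conv", "conv2d", ["x"], {"strides": "1"}))
graph.add(ToyNode("out", "relu", ["conv"]))
print("\n".join(graph.emit()))
```

Keeping operator types and attributes on the nodes is what lets a printer like this reproduce the source program without dropping information. The real MSCGraph carries far more detail than this toy does.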

This milestone adds examples of translating Relay to Relax without loss of 
information. This approach makes Relay-based features, such as quantization, 
pruning and some optimization passes, usable in Relax.

Both the Python and C++ code are managed in a new module, so there is no change 
to the current code base. To use MSC, set USE_MSC to ON in config.cmake.
You can view, comment on, or merge this pull request online at:

  https://github.com/apache/tvm/pull/15489

-- Commit Summary --

  * [Unity] Preserve symbolic var args when applying call_tir (#14555)
  * [Unity] Enable pod args in WebGPU (#14560)
  * [Unity][Op] Symbolic shape support of take grad (#14559)
  * [Unity][Op] add einsum and flip in Relax (#14545)
  * [Unity][TOPI] Symbolic shape support for `collapse_sum` (#14535)
  * [Unity] Enhance Dynamic-aware FuseTIR (#14577)
  * [Unity][TVMScript] Update struct_info for GlobalVar (#14579)
  * [Unity][MetaSchedule] Skip Scheduled PrimFuncs in Task Generation (#14402)
  * [Unity][PyTorch] Disable gradient during dynamo subgraph capture to save 
RAM (#14602)
  * [Unity] Fix FuseTIR when the same buffer is read multiple times with 
different access pattern (#14603)
  * [Unity][MetaSchedule] BlockCollector focusing on current func (#14595)
  * [Unity][Op] Dynamic Strided Slice (#14548)
  * [Unity] Add pass for combining parallel matmul (#14583)
  * [MERGE] Bring changes from main into unity 2023-04-12
  * [MERGE-FIX] Fix regresions after merge
  * [Unity][BYOC] Add fused patterns for stacked attention (#14608)
  * [Unity] Fix ForceNarrowI32 with pod arguments (#14605)
  * [Unity][UX] Symbolic Variables Used in Multiple Functions (#14606)
  * [Unity][Bugfix] Resolve failure on `test_e2e_op_dynamic.py` (#14616)
  * [Unity][VM] Converting tuple arg to Python tuple (#14620)
  * [Unity][VM] LibComparator using dtype from input (#14623)
  * [Unity] Update specific builtins for LM (#14617)
  * [Unity][CODEGEN] Fix metal codegen when with only single working dim 
(#14627)
  * [Unity][CUTLASS] Support batched matmul + residual fusion (#14613)
  * [Unity][CI] Update images to include jax deps (#14610)
  * [Unity] hotfix webgpu codegen for vec load (#14630)
  * [Unity][Frontend] Some changes on the PyTorch FX Frontend (#14625)
  * [Unity] update ci cpu/gpu images (#14631)
  * [Unity][AMP] Fix merging concrete type and "unknown" type (#14612)
  * [Unity][MetaSchedule] Add the module_equality param for tune_relax flow 
(#14537)
  * [Unity][TARGET] Updates vulkan codegen for DeclBuffer (#14641)
  * [Unity] `enable_warning` option for LegalizeOps and MSApplyDatabase (#14634)
  * [Unity] BlockBuilder assigning unique tensor names in call_te (#14632)
  * [Unity] Improve error message in webgpu request (#14640)
  * [Unity][Frontend] Add `no_bind_return_tuple` for PyTorch FX Translator 
(#14639)
  * Adding powerPreference argument to navigator.gpu.requestAdapter (#14650)
  * [Unity][BYOC] Fuse attention pattern with `strided_slice` (#14649)
  * [Unity] Improve and reduces possible memory leak RPC debug (#14662)
  * [Unity][BYOC] Add check for stacked attention patterns (#14664)
  * [Unity] Add rewriting for CUDA graph capturing (#14513)
  * [Unity][CUTLASS] Require the residual input to have the same shape as input 
(#14657)
  * [Unity] Update docs for operators (#14659)
  * [Unity] Improve WebGPU codegen for large grid (#14674)
  * [Unity] Use custom hash in `BlockBuilder` to avoid hashing large constants 
(#14675)
  * [Unity][Training] Optimizer library (#14670)
  * [Unity] Fix `DataflowReshapeRewrite` when input has multiple buffers from 
tuple  (#14669)
  * [WebGPU] This PR fixes the webgpu runtime when there is no pod params 
(#14685)
  * [Unity][TuningAPI] Temporary patch for large models  (#14691)
  * [Unity] FuseOps skipping PrimValues (#14687)
  * [Unity][CUTLASS] Fix CUTLASS codegen for occasional variable name conflict 
(#14692)
  * [Unity] Use split rather than slice in `CombineParallelMatmul` (#14688)
  * [Unity][WebGPU] Move NDArrayCache Support to relax runtime (#14689)
  * [Unity][Training] Loss functions and AppendLoss pass (#14668)
  * [Unity][Op] Avoid indices in TIR matmul being 0 in legalization (#14701)
  * [Unity] MetaScheduleApplyDatabase using workload from records (#14702)
  * [MERGE] Merge main into unity 2023-04-23
  * [Unity] Reduce cast to fp32 for constant input in AMP (#14679)
  * [Unity] Smart parameter fetching (#14708)
  * [Unity][Training] Trainer and SetupTrainer (#14706)
  * [Unity][CUTLASS] Fixed memory leak in attention kernel offload (#14723)
  * [Unity] Cache ndarray-cache.json (#14722)
  * [Unity] Allow KVCache Access without Shape (#14726)
  * [Unity][CUTLASS] Fixed stacked attention offload when QKV reshape uses the 
same shape expression (#14728)
  * [Unity] Allow modifying function signature by AMP to accept fp16 inputs  
(#14719)
  * [Unity] Fix ForceNarrowIndexToI32 so it ignores i16 (#14733)
  * [Unity] Add system lib build option to relax (#14734)
  * [Unity] Process all Relax functions in CompositeFunctionAnnotator (#14736)
  * [Unity] Add options to MS tuning pass to enable more fine-grained tuning 
(#14730)
  * [Uniy][Op] Expand support of attention bias layout (#14737)
  * [Unity][CUTLASS] Add layer norm support (#14731)
  * [Unity][BYOC] Support implicit attention patterns (#14744)
  * [Unity][Relax] Memory planning for call_tir_dyn (#14750)
  * Fix super().visit_var_binding_ in PyExprVisitor and PyExprMutator (#14754)
  * [Unity][VM] `kill_tensor` and `kill_storage` releasing NDArray in VM at 
runtime (#14753)
  * [Unity] Limit number of characters in logger names (#14752)
  * [Unity] Fix FX translator no output issue (#14761)
  * [Unity] Enhance CopyWithNewVars utility (#14764)
  * [Unity][CI] Use the upgraded images (#14768)
  * [Unity] Update LM Sample builtins (#14793)
  * [Unity][Pass] Lazy transform params (#14769)
  * [Unity][Training] More Relax operators gradient supported (#14777)
  * [MERGE] Merge main into unity 2023-05-07
  * Use latest lint image
  * [LINT] Fix clang-format script for newest clang-format
  * [MERGE] Fix lint after lint image upgrade
  * [Unity] Fix Unary Op Legalization (#14789)
  * [Unity] Introduce FewShotTuning Pass (#14624)
  * [Unity][BYOC] Fix incorrect bias stride in matmul cutlass offload (#14807)
  * [Unity] Add pass to allocate big workspace and pass it to all functions 
that need temp storage   (#14802)
  * [Unity][BYOC] Add shape validation for bias arg in cuBLAS (#14809)
  * [Unity] NDArray Cache Efficient Load in OpenCL (#14816)
  * [Unity] Fix CUDA graph rewrite var used before def (#14800)
  * [Unity] Cover all Relax functions in implicit attention rewrite (#14818)
  * [Unity] Improve Error Message loading NDArray Shards (#14823)
  * [Unity][Training] Categorical cross entropy loss (#14757)
  * [Unity][CI] Update CPU image to install PyTorch (#14848)
  * [Unity][CI] Update CPU image to install PyTorch (fix) (#14852)
  * [MERGE] Merge main into unity 2023-05-14
  * [Unity][FX] Add support for PT2.0 scaled_dot_product_attention (#14841)
  * [Unity] Support Mixed-Precision FMA Pattern (#14865)
  * [Unity]Lazy transform param now only work on non-dataflow block (#14864)
  * [Unity] Support multilib relax build (#14873)
  * [Unity][Relax] gelu-tanh operator (#14814)
  * [Unity][IR] Purity Tracking (#14394)
  * [Unity] Fix broken test in cutlass codegen (#14881)
  * [Unity][WEB] Enable String object (#14882)
  * [Unity] Only process relax functions in workspace annotation (#14896)
  * [TIR][Doc] Fix formatting in pop_trace docstring (#14894)
  * [Unity][CUTLASS] Fix for purity tracking (#14891)
  * [MERGE] merge main into unity 2023-05-20
  * [Unity][WebGPU] Try F16 support for  WebGPU Backend (#14904)
  * [Unity][Transform] Fix scalar case in DefaultGPUSchedule (#14928)
  * [Unity] Allow eliminating only call nodes in CSE pass (#14895)
  * [Unity] Support causal mask for `R.nn.attention` (#14907)
  * [Unity][WebGPU] Fix WebGPU adapter requiring features (#14933)
  * [Unity] Cleanup Web runtime wasi (#14929)
  * [Unity] Improve caching logic in webruntime (#14940)
  * [Unity] Enhance web cache to add scopes (#14953)
  * [Unity] Fix MergeCompositeFunctions for non-CallNode dataflow inputs  
(#14959)
  * [Unity][BYOC] Make CUTLASS attention rewriting aware of fp16 <-> f32 
casting  (#14957)
  * [Unity][Transform] Fix bug for tir expression in shape in fuse_tir (#14931)
  * [Unity] support update KV cache (#14964)
  * [Unity][CUTLASS] Attention hot fix (#14966)
  * [Unity][Frontend] Translate StableHLO to Relax (#14460)
  * [Unity][Training] Enhance op gradient (#14932)
  * [Unity][CUTLASS] Support more residual input shape (#14968)
  * [Unity] Allow filtering out unwanted branches in matmul combining pass 
(#14971)
  * [Unity] Add popn to kvcache (#14970)
  * [Unity] Fix LazyTransformParams use-def analysis and binding emission 
(#14974)
  * [Unity] Fix ConvertLayout on binary elemwise ops involving scalar input 
(#14961)
  * [Unity] Reset match state when backtracking (#14984)
  * [Unity][Training] Avoid problematic inputs to nll_loss in 
test_op_gradient_numeric (#14987)
  * [Unity][CUTLASS]  Fix circular import bug in relax cutlass backend (#15001)
  * [MERGE] Merge main into unity 2023-06-01
  * [Unity] Fix importing tvm.contrib.cutlass (#15010)
  * [Unity] Fix cutlass BYOC after merge (#15012)
  * [Unity][FIX] add init file to `relax.backend.contrib` (#15023)
  * [Unity] Allow name_hint in additional locations (#15027)
  * [Unity] Cutlass attention with dynamic sequence length (#15028)
  * [Unity][BYOC] Cache cuBlasLt handle with thread entry (#15030)
  * [Unity][NN] Allow nn.Placeholder/Parameter prior to BlockBuilder (#15025)
  * [Unity] Add hexp for compute capacity <= 5.2 (#15070)
  * [Unity] Add an API to create multiple kv caches with single allocation 
(#15064)
  * [Unity][Frontend] Add relax onnx importer and tests (#14999)
  * [Unity] Added bounds checking on TupleGetItem index (#15024)
  * [Unity] Optimize SampleTopPFromProb (#15072)
  * [Unity][Relax] Add masked_fill operator (#15077)
  * [Unity][Relax] Add bitwise and logical ops (AND, NOT, OR, XOR) (#15075)
  * [Bugfix][CUDA] Fix codegen for hexp for sm >= 52 (#15079)
  * [Unity] Fix FewShotTuning Failure When Missing Global Symbol (#15097)
  * [MERGE] Merge main into unity 2023-06-13
  * [MERGE] Fix after merge
  * [Unity][Analysis] Reshape TIR detection with iter-map-simplify (#15099)
  * [MERGE] recover cutlass in unity
  * [Unity][FuseTIR] Flatten and add tuple fields to parameters / arguments 
only when they are used (#15113)
  * [Unity][Relax][UX] Specify function purity in the @R.function decorator 
(#15109)
  * [Unity][Relax] Make RewriteDataflowReshape only rewrite volume-preserving 
ops (#15112)
  * [Unity] Allocate workspace for all functions (#15118)
  * [Unity][BYOC] Integrate fp16 A - int4 B GEMM kernel from FasterTransformer 
into CUTLASS BYOC  (#15111)
  * [Unity] Minor fix to `RewriteDataflowReshape` condition  (#15125)
  * [Unity] Hotfix webgpu runtime (#15135)
  * [Unity][Relax] Generalize CSE to work outside DataflowBlocks (#15047)
  * [Unity] Fix handling of vm builtins in cuda graph (#15145)
  * [Unity][Bugfix] Fix purity annotation in CSE test (#15143)
  * [Unity] Scaffolding DLight (#15141)
  * [Unity][Pass] FuseOps with partially accessed Tuple param (#15152)
  * [MERGE] Merge main to unity 2023-06-24
  * [Fix] Fix merge error
  * [Unity][Dlight] Add reduction rules (#15156)
  * [Unity][IR][UX] Privacy annotation in Relax (#15140)
  * [Unity][UX][Tweak] Make it an error to mark a function private and specify 
a global symbol (#15170)
  * [Unity] Support clear global memory allocators (#15172)
  * Merge remote-tracking branch 'apache-upstream/main' into unity
  * Fix super() visit function in PyExprVisitor and PyExprMutator (#15189)
  * [RPC] Disable socket SO_REUSEADDR for Windows (#15188)
  * [Unity] Legalization for LayoutTransform (#15184)
  * [Unity] Add memory scope and nd allocation support in allocators (#15178)
  * [Unity][Dlight] general reduction rule for gemv-decode (#15169)
  * [Unity][Dlight] Matmul Rules (#15191)
  * [MERGE] Merge main into unity 2023-07-03
  * [MERGE] Fix testcase after merge
  * [Unity] Fix dlight reduction rule (#15194)
  * [VM] Add repetition penalty functions to Relax VM (#15219)
  * [Unity] Allow specifying struct_info for relax constant (#15220)
  * [Dlight] Enhance Decode-GEMV Schedule (#15195)
  * [Unity][TIR][Transform] Support no spatial axes cases for 
DefaultGPUSchedule (#15232)
  * [Unity] Fix memory statistics issues in estimate_memory_usage (#15224)
  * [Unity][NestedMsg] Add NestedMsgTo helper function (#15223)
  * [Unity][Dlight] Avoid TransformBlockLayout in GEMV Rule (#15248)
  * [Unity][Dlight] Handle Epilogue Broadcasting (#15252)
  * [Unity] Add a Standalone VM Version Number (#15254)
  * Merge remote-tracking branch 'apache-upstream/main' into 
unity-staging
  * [Unity][TIR] Allow symbolic bounds in IndexMap analysis (#15262)
  * [Unity][Training] Registering te gradient (#15231)
  * [MERGE] Hotfix layout transform related change after last merge from main
  * [Unity][Dlight] Minor performance improvement for gemm and gemv (#15278)
  * [Unity][BYOC] `PrimValue` handling in `FuseOpByPattern` for BYOC (#15217)
  * [Unity][CUTLASS] Offload RMS norm (#15288)
  * [Unity] [DistIR] Introducing DistIR (#15289)
  * [Dlight] Enhance fallback schedule with DecomposeReduction (#15302)
  * [Unity][Relax][Transform] Do not remove MatchCast for RemoveAllUnused 
(#15290)
  * [Unity] [Relax] [ONNX frontend] [op] Add support for Trilu operator (#15299)
  * [Unity] Update CUTLASS Attention to incorprate upstream change (#15309)
  * [Unity] Fix FuseOpsByPattern when a subgraph can be matched by multiple 
residual patterns (#15308)
  * [Unity][Op] Add leaky relu operator (#15296)
  * [Unity][Dlight] Add schedule rule for decode transpose (#15304)
  * [Unity] CUDA Graph update  (#15320)
  * [Unity][Training] Enhance gradient system (#15230)
  * [Unity][Dlight] Fix decode-GeMV rule when spatial-inner without 
broadcasting (#15330)
  * [Unity][Dlight] Rule matmul avoiding blockIdx.z (#15333)
  * [Unity] fp16 A x int B GEMM update - support int8, more bias shape (#15318)
  * [Unity][Dlight] Fix DecodeGeMV rule for spatial-inner with grouping (#15340)
  * Merge remote-tracking branch 'apache-upstream/main' into unity
  * [Unity][OP] Sync `rms_norm` with main (#15355)
  * Add the ability to differentiate between model loads from remote fetch v/s 
model loads from cache (#15357)
  * [Unity][Dlight] Fix matmul schedule when out_dtype = fp32 and bias add is 
fp32 (#15363)
  * [Unity][CUTLASS] Support `out_dtype = "float32"` for 
FasterTransformer kernel (#15377)
  * [Unity] Avoid overloaded-virtual warnings (#15382)
  * [Dlight] Benchmarking Tools for Dynamic Shape PrimFuncs & Relax 
Function (#15322)
  * [Unity][DLight] GEMV Rules (#15381)
  * [Unity] Add support for AXIS_SEPARATOR in AlterOpImpl Pass (#15315)
  * [Unity][Dlight] Fix reduction rule, aligning last block's iters (#15383)
  * [Unity][Module] Add Core Data Structure (#15398)
  * [Unity][DLight] Fix Reduction Rule (#15412)
  * [Unity] Lowering of axis separator in Layout Transform (#15390)
  * [Unity][Dlight] Tensorization Rule in GPU Matmul (#15389)
  * [Unity] nn.Module Spec (#15416)
  * [Unity] nn.Module Op (#15418)
  * [Unity] nn.Module Torch Integration (#15424)
  * [Unity][Dlight] Improve Dlight Tensorization Rule (#15427)
  * [Unity][DLight] Update GEMV rules (#15429)
  * [Unity][Frontends][Onnx] Improve ConstantOfShape behavior (#15434)
  * [Unity] nn.Module Module and Effect (#15438)
  * [Unity] cuda graph support for cublas (#15435)
  * [Unity][Op] Implement basic `call_tir_inplace` operator (#15372)
  * [Unity] dynamo with dynamic shape (#15441)
  * [Unity][Ops] Support for erf in relax (#15445)
  * [Unity][Dlight] Avoid too large vectorization factor in caching (#15443)
  * Merge remote-tracking branch 'upstream/main' into unity-staging
  * [MERGE-FIX] Update the code to fix merge issues
  * [Unity]: fix error on enum (#15451)
  * [Unity] Eslint and TypeScript configuration fix (#15452)
  * [Unity][Op] Support symbolic shape inference for slice op. (#15450)
  * [Unity][Frontend][Onnx] Simplify gemm (#15458)
  * [Unity][Op] Conv1dTranspose (#15456)
  * [Unity][Transform] Elide redundant bindings of dataflow vars (#15341)
  * [Unity][Fix][Op] Add groups to conv1d (#15457)
  * [Bugfix][CUTLASS] CUTLASS path finding (#15476)
  * [Cherry-Pick][BugFix][TIR] ThreadSync with shared.dyn awareness (#15481)
  * add msc

-- File Changes --

    M 3rdparty/cutlass (2)
    M 3rdparty/cutlass_fpA_intB_gemm (2)
    M CMakeLists.txt (14)
    A apps/relax_examples/e2e_auto_tir.py (253)
    A apps/relax_examples/mlp.py (57)
    A apps/relax_examples/nn_module.py (69)
    A apps/relax_examples/resnet.py (53)
    M ci/jenkins/generated/arm_jenkinsfile.groovy (5)
    M ci/jenkins/generated/cortexm_jenkinsfile.groovy (5)
    M ci/jenkins/generated/cpu_jenkinsfile.groovy (5)
    M ci/jenkins/generated/docker_jenkinsfile.groovy (5)
    M ci/jenkins/generated/gpu_jenkinsfile.groovy (5)
    M ci/jenkins/generated/hexagon_jenkinsfile.groovy (5)
    M ci/jenkins/generated/i386_jenkinsfile.groovy (5)
    M ci/jenkins/generated/lint_jenkinsfile.groovy (5)
    M ci/jenkins/generated/minimal_cross_isa_jenkinsfile.groovy (5)
    M ci/jenkins/generated/minimal_jenkinsfile.groovy (5)
    M ci/jenkins/generated/riscv_jenkinsfile.groovy (5)
    M ci/jenkins/generated/wasm_jenkinsfile.groovy (5)
    A ci/jenkins/unity_jenkinsfile.groovy (337)
    M cmake/config.cmake (3)
    M cmake/modules/CUDA.cmake (8)
    M cmake/modules/contrib/CUTLASS.cmake (5)
    M cmake/modules/contrib/DNNL.cmake (8)
    A cmake/modules/contrib/MSC.cmake (26)
    M cmake/modules/contrib/TensorRT.cmake (2)
    M include/tvm/ir/expr.h (9)
    M include/tvm/ir/function.h (133)
    A include/tvm/ir/global_info.h (80)
    M include/tvm/ir/module.h (22)
    M include/tvm/ir/name_supply.h (46)
    M include/tvm/ir/transform.h (54)
    M include/tvm/ir/type.h (3)
    M include/tvm/node/script_printer.h (5)
    A include/tvm/relax/analysis.h (484)
    A include/tvm/relax/attrs/create.h (54)
    A include/tvm/relax/attrs/datatype.h (53)
    A include/tvm/relax/attrs/distributed.h (48)
    A include/tvm/relax/attrs/image.h (81)
    A include/tvm/relax/attrs/index.h (68)
    A include/tvm/relax/attrs/linear_algebra.h (53)
    A include/tvm/relax/attrs/manipulate.h (162)
    A include/tvm/relax/attrs/nn.h (394)
    A include/tvm/relax/attrs/op.h (63)
    A include/tvm/relax/attrs/search.h (48)
    A include/tvm/relax/attrs/statistical.h (63)
    A include/tvm/relax/backend.h (51)
    A include/tvm/relax/binding_rewrite.h (115)
    A include/tvm/relax/block_builder.h (240)
    A include/tvm/relax/dataflow_matcher.h (74)
    A include/tvm/relax/dataflow_pattern.h (828)
    A include/tvm/relax/dataflow_pattern_functor.h (183)
    A include/tvm/relax/distributed/axis_group_graph.h (305)
    A include/tvm/relax/distributed/global_info.h (92)
    A include/tvm/relax/distributed/struct_info.h (192)
    A include/tvm/relax/distributed/transform.h (56)
    A include/tvm/relax/exec_builder.h (181)
    A include/tvm/relax/expr.h (1059)
    A include/tvm/relax/expr_functor.h (551)
    A include/tvm/relax/nested_msg.h (602)
    A include/tvm/relax/op_attr_types.h (75)
    A include/tvm/relax/struct_info.h (451)
    A include/tvm/relax/struct_info_functor.h (157)
    A include/tvm/relax/tir_pattern.h (75)
    A include/tvm/relax/transform.h (537)
    A include/tvm/relax/tuning_api.h (396)
    A include/tvm/relax/type.h (166)
    A include/tvm/relax/utils.h (111)
    M include/tvm/relay/transform.h (2)
    M include/tvm/runtime/module.h (4)
    A include/tvm/runtime/relax_vm/builtin.h (89)
    A include/tvm/runtime/relax_vm/bytecode.h (223)
    A include/tvm/runtime/relax_vm/executable.h (219)
    A include/tvm/runtime/relax_vm/memory_manager.h (152)
    A include/tvm/runtime/relax_vm/vm.h (187)
    M include/tvm/script/ir_builder/ir/frame.h (4)
    A include/tvm/script/ir_builder/relax/frame.h (297)
    A include/tvm/script/ir_builder/relax/ir.h (146)
    M include/tvm/te/operation.h (2)
    M include/tvm/tir/buffer.h (14)
    M include/tvm/tir/builtin.h (44)
    M include/tvm/tir/data_type_rewriter.h (12)
    M include/tvm/tir/function.h (7)
    M include/tvm/tir/transform.h (22)
    M include/tvm/topi/nn/group_norm.h (31)
    M include/tvm/topi/nn/layer_norm.h (28)
    M include/tvm/topi/nn/rms_norm.h (11)
    M include/tvm/topi/transform.h (52)
    M jvm/pom.xml (4)
    M python/tvm/_ffi/libinfo.py (21)
    A python/tvm/contrib/cutlass/attention_operation.py (161)
    M python/tvm/contrib/cutlass/build.py (494)
    M python/tvm/contrib/cutlass/conv2d_operation.py (61)
    M python/tvm/contrib/cutlass/gemm_operation.py (216)
    M python/tvm/contrib/cutlass/gemm_profiler.py (4)
    M python/tvm/contrib/cutlass/gen_conv2d.py (119)
    M python/tvm/contrib/cutlass/gen_gemm.py (81)
    M python/tvm/contrib/cutlass/gen_tensor_op.py (349)
    A python/tvm/contrib/cutlass/layer_norm_operation.py (48)
    M python/tvm/contrib/cutlass/library.py (8)
    A python/tvm/contrib/cutlass/rms_norm_operation.py (47)
    M python/tvm/contrib/hexagon/session.py (37)
    A python/tvm/contrib/msc/__init__.py (17)
    A python/tvm/contrib/msc/core/__init__.py (17)
    A python/tvm/contrib/msc/core/_ffi_api.py (21)
    A python/tvm/contrib/msc/core/ir/__init__.py (20)
    A python/tvm/contrib/msc/core/ir/graph.py (520)
    A python/tvm/contrib/msc/core/ir/translate.py (172)
    A python/tvm/contrib/msc/core/runtime/__init__.py (17)
    A python/tvm/contrib/msc/core/tools/__init__.py (17)
    A python/tvm/contrib/msc/core/transform/__init__.py (20)
    A python/tvm/contrib/msc/core/transform/pattern.py (490)
    A python/tvm/contrib/msc/core/transform/transform.py (61)
    A python/tvm/contrib/msc/core/utils/__init__.py (23)
    A python/tvm/contrib/msc/core/utils/expr.py (105)
    A python/tvm/contrib/msc/core/utils/file.py (159)
    A python/tvm/contrib/msc/core/utils/info.py (68)
    A python/tvm/contrib/msc/core/utils/logging.py (17)
    A python/tvm/contrib/msc/core/utils/namespace.py (64)
    A python/tvm/contrib/msc/core/utils/register.py (61)
    A python/tvm/contrib/msc/framework/__init__.py (17)
    A python/tvm/contrib/msc/framework/tvm/__init__.py (17)
    A python/tvm/contrib/msc/framework/tvm/_ffi_api.py (21)
    A python/tvm/contrib/msc/framework/tvm/codegen/__init__.py (19)
    A python/tvm/contrib/msc/framework/tvm/codegen/translate.py (74)
    A python/tvm/contrib/msc/pipeline/__init__.py (17)
    A python/tvm/contrib/tvmjs.py (305)
    A python/tvm/dlight/__init__.py (27)
    A python/tvm/dlight/base/__init__.py (28)
    A python/tvm/dlight/base/analysis.py (253)
    A python/tvm/dlight/base/common_schedules.py (98)
    A python/tvm/dlight/base/schedule_rule.py (105)
    A python/tvm/dlight/base/transform.py (86)
    A python/tvm/dlight/benchmark/__init__.py (24)
    A python/tvm/dlight/benchmark/bench.py (312)
    A python/tvm/dlight/benchmark/extract.py (351)
    A python/tvm/dlight/benchmark/utils.py (172)
    A python/tvm/dlight/gpu/__init__.py (26)
    A python/tvm/dlight/gpu/fallback.py (70)
    A python/tvm/dlight/gpu/gemv.py (296)
    A python/tvm/dlight/gpu/general_reduction.py (94)
    A python/tvm/dlight/gpu/matmul.py (632)
    A python/tvm/dlight/gpu/reduction.py (244)
    A python/tvm/dlight/gpu/transpose.py (128)
    A python/tvm/dlight/gpu/utils.py (87)
    M python/tvm/exec/microtvm_debug_shell.py (8)
    M python/tvm/exec/rpc_proxy.py (38)
    M python/tvm/ir/__init__.py (1)
    M python/tvm/ir/expr.py (62)
    M python/tvm/ir/function.py (27)
    A python/tvm/ir/global_info.py (42)
    M python/tvm/ir/module.py (53)
    M python/tvm/ir/supply.py (7)
    M python/tvm/ir/transform.py (95)
    M python/tvm/meta_schedule/__init__.py (1)
    M python/tvm/meta_schedule/builder/local_builder.py (12)
    M python/tvm/meta_schedule/logging.py (3)
    A python/tvm/meta_schedule/relax_integration.py (0)
    M python/tvm/meta_schedule/runner/local_runner.py (0)
    M python/tvm/meta_schedule/testing/tune_utils.py (0)
    M python/tvm/meta_schedule/tir_integration.py (0)
    M python/tvm/meta_schedule/tune_context.py (0)
    M python/tvm/meta_schedule/utils.py (0)
    A python/tvm/relax/__init__.py (0)
    A python/tvm/relax/_ffi_api.py (0)
    A python/tvm/relax/analysis/__init__.py (0)
    A python/tvm/relax/analysis/_ffi_api.py (0)
    A python/tvm/relax/analysis/analysis.py (0)
    A python/tvm/relax/analysis/estimate_memory_usage.py (0)
    A python/tvm/relax/backend/__init__.py (0)
    A python/tvm/relax/backend/_ffi_api.py (0)
    A python/tvm/relax/backend/contrib/__init__.py (0)
    A python/tvm/relax/backend/contrib/cublas.py (0)
    A python/tvm/relax/backend/contrib/cutlass.py (0)
    A python/tvm/relax/backend/pattern_registry.py (0)
    A python/tvm/relax/backend/patterns.py (0)
    A python/tvm/relax/backend_tir/__init__.py (0)
    A python/tvm/relax/backend_tir/contrib/__init__.py (0)
    A python/tvm/relax/backend_tir/contrib/cutlass.py (0)
    A python/tvm/relax/backend_tir/pattern.py (0)
    A python/tvm/relax/binding_rewrite.py (0)
    A python/tvm/relax/block_builder.py (0)
    A python/tvm/relax/distributed/__init__.py (0)
    A python/tvm/relax/distributed/_ffi_api.py (0)
    A python/tvm/relax/distributed/global_info.py (0)
    A python/tvm/relax/distributed/struct_info.py (0)
    A python/tvm/relax/distributed/transform/__init__.py (0)
    A python/tvm/relax/distributed/transform/_ffi_api.py (0)
    A python/tvm/relax/distributed/transform/transform.py (0)
    A python/tvm/relax/dpl/__init__.py (0)
    A python/tvm/relax/dpl/_ffi.py (0)
    A python/tvm/relax/dpl/context.py (0)
    A python/tvm/relax/dpl/pattern.py (0)
    A python/tvm/relax/dpl/rewrite.py (0)
    A python/tvm/relax/exec_builder.py (0)
    A python/tvm/relax/expr.py (0)
    A python/tvm/relax/expr_functor.py (0)
    A python/tvm/relax/frontend/__init__.py (0)
    A python/tvm/relax/frontend/common.py (0)
    A python/tvm/relax/frontend/nn/__init__.py (0)
    A python/tvm/relax/frontend/nn/_tensor_op.py (0)
    A python/tvm/relax/frontend/nn/core.py (0)
    A python/tvm/relax/frontend/nn/modules.py (0)
    A python/tvm/relax/frontend/nn/op.py (0)
    A python/tvm/relax/frontend/nn/spec.py (0)
    A python/tvm/relax/frontend/nn/torch.py (0)
    A python/tvm/relax/frontend/onnx/__init__.py (0)
    A python/tvm/relax/frontend/onnx/onnx_frontend.py (0)
    A python/tvm/relax/frontend/stablehlo/__init__.py (0)
    A python/tvm/relax/frontend/stablehlo/stablehlo_translator.py (0)
    A python/tvm/relax/frontend/torch/__init__.py (0)
    A python/tvm/relax/frontend/torch/dynamo.py (0)
    A python/tvm/relax/frontend/torch/fx_translator.py (0)
    A python/tvm/relax/ir/instrument.py (0)
    A python/tvm/relax/op/__init__.py (0)
    A python/tvm/relax/op/_ffi_api.py (0)
    A python/tvm/relax/op/_op_gradient.py (0)
    A python/tvm/relax/op/base.py (0)
    A python/tvm/relax/op/binary.py (0)
    A python/tvm/relax/op/builtin/__init__.py (0)
    A python/tvm/relax/op/builtin/_ffi_api.py (0)
    A python/tvm/relax/op/builtin/builtin.py (0)
    A python/tvm/relax/op/create.py (0)
    A python/tvm/relax/op/datatype.py (0)
    A python/tvm/relax/op/distributed/__init__.py (0)
    A python/tvm/relax/op/distributed/_ffi_api.py (0)
    A python/tvm/relax/op/distributed/distributed.py (0)
    A python/tvm/relax/op/grad/__init__.py (0)
    A python/tvm/relax/op/grad/_ffi_api.py (0)
    A python/tvm/relax/op/grad/grad.py (0)
    A python/tvm/relax/op/image/__init__.py (0)
    A python/tvm/relax/op/image/_ffi_api.py (0)
    A python/tvm/relax/op/image/image.py (0)
    A python/tvm/relax/op/index.py (0)
    A python/tvm/relax/op/linear_algebra.py (0)
    A python/tvm/relax/op/manipulate.py (0)
    A python/tvm/relax/op/mask.py (0)
    A python/tvm/relax/op/memory/__init__.py (0)
    A python/tvm/relax/op/memory/_ffi_api.py (0)
    A python/tvm/relax/op/memory/memory.py (0)
    A python/tvm/relax/op/nn/__init__.py (0)
    A python/tvm/relax/op/nn/_ffi_api.py (0)
    A python/tvm/relax/op/nn/nn.py (0)
    A python/tvm/relax/op/op_attrs.py (0)
    A python/tvm/relax/op/search.py (0)
    A python/tvm/relax/op/set.py (0)
    A python/tvm/relax/op/statistical.py (0)
    A python/tvm/relax/op/ternary.py (0)
    A python/tvm/relax/op/unary.py (0)
    A python/tvm/relax/op/vm/__init__.py (0)
    A python/tvm/relax/op/vm/_ffi_api.py (0)
    A python/tvm/relax/op/vm/vm.py (0)
    A python/tvm/relax/pipeline.py (0)
    A python/tvm/relax/struct_info.py (0)
    A python/tvm/relax/testing/__init__.py (0)
    A python/tvm/relax/testing/ast_printer.py (0)
    A python/tvm/relax/testing/lib_comparator.py (0)
    A python/tvm/relax/testing/matmul.py (0)
    A python/tvm/relax/testing/nn.py (0)
    A python/tvm/relax/testing/relay_translator.py (0)
    A python/tvm/relax/testing/runtime_builtin.py (0)
    A python/tvm/relax/testing/transform.py (0)
    A python/tvm/relax/testing/vm.py (0)
    A python/tvm/relax/training/__init__.py (0)
    A python/tvm/relax/training/_ffi_api.py (0)
    A python/tvm/relax/training/loss.py (0)
    A python/tvm/relax/training/optimizer.py (0)
    A python/tvm/relax/training/setup_trainer.py (0)
    A python/tvm/relax/training/trainer.py (0)
    A python/tvm/relax/training/utils.py (0)
    A python/tvm/relax/transform/__init__.py (0)
    A python/tvm/relax/transform/_ffi_api.py (0)
    A python/tvm/relax/transform/lazy_transform_params.py (0)
    A python/tvm/relax/transform/legalize_ops/__init__.py (0)
    A python/tvm/relax/transform/legalize_ops/binary.py (0)
    A python/tvm/relax/transform/legalize_ops/common.py (0)
    A python/tvm/relax/transform/legalize_ops/create.py (0)
    A python/tvm/relax/transform/legalize_ops/datatype.py (0)
    A python/tvm/relax/transform/legalize_ops/grad.py (0)
    A python/tvm/relax/transform/legalize_ops/image.py (0)
    A python/tvm/relax/transform/legalize_ops/index.py (0)
    A python/tvm/relax/transform/legalize_ops/linear_algebra.py (0)
    A python/tvm/relax/transform/legalize_ops/manipulate.py (0)
    A python/tvm/relax/transform/legalize_ops/nn.py (0)
    A python/tvm/relax/transform/legalize_ops/search.py (0)
    A python/tvm/relax/transform/legalize_ops/statistical.py (0)
    A python/tvm/relax/transform/legalize_ops/unary.py (0)
    A python/tvm/relax/transform/transform.py (0)
    A python/tvm/relax/transform/tuning_api/__init__.py (0)
    A python/tvm/relax/transform/tuning_api/_ffi_api.py (0)
    A python/tvm/relax/transform/tuning_api/database.py (0)
    A python/tvm/relax/transform/tuning_api/default_functions.py (0)
    A python/tvm/relax/transform/tuning_api/primitives.py (0)
    A python/tvm/relax/ty.py (0)
    A python/tvm/relax/utils.py (0)
    A python/tvm/relax/vm_build.py (0)
    M python/tvm/relay/op/op_attrs.py (0)
    M python/tvm/rpc/proxy.py (0)
    M python/tvm/rpc/server.py (0)
    M python/tvm/rpc/tracker.py (0)

-- Patch Links --

https://github.com/apache/tvm/pull/15489.patch
https://github.com/apache/tvm/pull/15489.diff

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/pull/15489
You are receiving this because you are subscribed to this thread.

Message ID: <apache/tvm/pull/15...@github.com>
