Thanks @Mousius for drafing this RFC! First of all, I completely agree on the importance to handle `arch`-specific checks. Use our experience as an example, on CUDA we might want to check if the PTX intrinsic `cp.async.commit_group` is available on certain architecture before tensorizing using that PTX intrinsic, and the existing approach is to add extra flags in `TargetKind`.
To motivate discussion, I would love to provide more context on the system design, and then talk about some specific points I noticed in the current RFC, and finally propose some ideas inspired by this RFC. ## Background **Design principles.** Just wanted to share some of my design principles when developing TVM, the compiler I believe that aims to work across hardware models: - A1. Generic. By design, TVM serves the common interest of all vendors. Therefore, it would be great if we could discuss broadly how a design choice would benefit all hardware platforms, including ARM, NV, Intel, Qual, etc. - A2. Minimal. Ideally, practitioners are supposed to learn only bare minimum to participate in TVM development. - Take TIR as an example, for a developer who knows the IR and passes to use and develop TIR, the only extra concept is `Target` . - Indeed Relay is in a less ideal state, but Relax is designed to address this issue so I wouldn't worry too much - A3. Customizable. We want to develop the infrastructure to provide customization opportunities for all vendors to independently work on their extension in a collaborative way, making TVM a platform for everyone. A few examples: - TIR scheduling is heading towards this direction that all the schedule primitives are decoupled with each other, so that vendors could develop their own primitives without affecting each other. - TVMScript is going to be re-designed this way treating TIR and Relax as independent dialects while the core infra itself supports any 3rdparty IR. - Admittedly Relay is much limited with the assumption of Relay => TE => TIR lowering path, with some hooks to hack in other compilation paths, but Relax is going to change this so again I wouldn't worry... **Current `arch`-specifc checks.** Most of the 6 groups of `arch`-specific helper functions, mentioned in the "Motivation" section, are developed by concurrent efforts from multiple parties, and therefore I would say fragmentation is almost a certain thing to happen. On the other hand, fragmentation of those helper functions, which are indeed ad-hoc, is currently confined locally as independent checks without yet polluting the design of global infrastructure. **Special target attributes.** Below are a few existing target attributes that serve special semantics, the design of which I don't fully agree with, but now are preserved as legacy: - **keys.** This is used to guide TOPI dispatch. For example, "llvm --keys=arm_cpu,cpu" first finds if there is any `GenericFunc` registered with `arm_cpu`, and if not, it falls back to `cpu`. - **libs.** This is used in TOPI to dispatch to vendor library. For example, "cuda -libs=cudnn" prefers dispatching to cuDNN. - **device.** and **model.** These two sometimes control the behavior of auto-tuning. **Existing `arch`-like attributes.** Note that the design of `Target` attributes in `TargetKind` is intended to describe the capability of hardware, according to the [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844): > Pass in additional target-specific attributes like ISA/library extension to a > target. Therefore, currently all the `arch`-like flags are centered around `Target`. As an example: - The [Vulkan target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L345-L370) comprehensively includes hardware feature support (e.g. whether or not fp16 is supported), physical limits of the device (e.g. max number of threads allowed). - The [CUDA target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L292-L296) is defined in similar approach but less comprehensive yet. It doesn't require architectural change to grow its attributes. Note that the number of attributes may grow if there is new hardware feature, but to the best of my knowledge, it could be less common that those hardware features may bloat, mainly because there is concrete cost to grow features on hardwares. **Target tag system.** Given the fact that existing hardware models are enumerable, the [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) proposes to use "tags" to allow easy creation of targets. For example, `Target("nvidia/jetson-agx-xavier")` gives full specification of this device, including the cuda target and the ARM host. At the time of writing, it only takes 200 tags to describe [all the CUDA hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348). **What on earth are `Target`s.** Actually, `target` in TVM not only refers to the hardware, but also the codegen targets. For example, LLVM targets means TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for compilation. ## Discussion **Naming choice.** When implementing the Target RFC, I came up with the `preprocessor` to be backward compatible with existing functionalities that auto-detects the local enviroment (e.g. cuda version), which I have to admit it's probably not the best name. In the context of our RFC, we might want to use names like `target_parser` instead to be more specific. **Where to dispatch target parsers.** Currently, the preprocessor is dispatched solely based on `TargetKind`, which is admittedly a bit limited and overlooked the possiblity that `aarch64` and `x86` may need completely different parsers. Therefore, here we would love to raise a question: based on which attribute the parser should be dispatched? Previous wisdom in clang seems to suggest dispatching according to `mtriple` (IIUC), so shall we introduce anything similar, for example, dispatching based on `--device aarch64-foo-bar`? **Do we need multiple target parsers?** My answer is no. If the parsers are maintained by different vendors separately on their own interest, then they could decide how to implement parsers for "keys", "arch", together, without conflicting with other contributors. Therefore, I would say it's already consistent with our principle A3 without having to develop multiple parsers. **Function Signature for a target parser.** Note that with the introduction of previous Target RFC, a target object is canonically represented by a JSON-like object. Therefore, the signature could be: ```c++ using TargetJSON = Map<String, ObjectRef>; using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>; ``` so that it's generic enough and could potentially be useful if vendors in the future want to hack around the Target object. To clarify the discussion [here](https://github.com/apache/tvm-rfcs/pull/71/files#r876079831), I would propose that our parsing code path not go back to string again, i.e.: - str -> TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target - TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target **Distinguishing `arch` and `attrs`.** Note that the number of possible attributes grow less frequently as we might expect. Also, there is no fundamental difference between `arch` and `attrs` given `arch` is special attrs, and target is designed for this according to point C5 in the [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). Therefore, I would personally prefer a more flattened access pattern, and therefore we do not need to distinguish `arch` and `attrs`. **Folder structure.** According to principle A3, I would propose that different vendors may maintain their own TargetKind and parsers separately, for example, aarch64 could be maintained using: - `src/target/target/aarch64/parser.cc` - `src/target/target/aarch64/tags.cc` - `src/target/target/aarch64/kind.cc` - `src/target/target/aarch64/helpers.cc` Where the last item provides pre-defined packed functions mentioned in the "Motivation" section. -- Reply to this email directly or view it on GitHub: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132579480 You are receiving this because you are subscribed to this thread. Message ID: <apache/tvm-rfcs/pull/71/c1132579...@github.com>