Thanks @Mousius for drafting this RFC!

First of all, I completely agree on the importance of handling `arch`-specific 
checks. Taking our experience as an example: on CUDA, we might want to check 
whether the PTX intrinsic `cp.async.commit_group` is available on a given 
architecture before tensorizing with it, and the existing approach is to add 
extra flags in `TargetKind`.

To motivate discussion, I would love to provide more context on the system 
design, and then talk about some specific points I noticed in the current RFC, 
and finally propose some ideas inspired by this RFC.

## Background

**Design principles.** Just wanted to share some of my design principles when 
developing TVM, a compiler that I believe aims to work across hardware models:
- A1. Generic. By design, TVM serves the common interest of all vendors. 
Therefore, it would be great if we could discuss broadly how a design choice 
would benefit all hardware platforms, including ARM, NV, Intel, Qual, etc.
- A2. Minimal. Ideally, practitioners should need to learn only the bare 
minimum to participate in TVM development.
  - Take TIR as an example: for a developer who already knows the IR and the 
passes needed to use and develop TIR, the only extra concept is `Target`.
  - Indeed Relay is in a less ideal state, but Relax is designed to address 
this issue, so I wouldn't worry too much.
- A3. Customizable. We want to develop the infrastructure to provide 
customization opportunities for all vendors to independently work on their 
extension in a collaborative way, making TVM a platform for everyone. A few 
examples:
  - TIR scheduling is heading in this direction: all the schedule primitives 
are decoupled from each other, so that vendors can develop their own 
primitives without affecting each other.
  - TVMScript is going to be re-designed this way, treating TIR and Relax as 
independent dialects while the core infra itself supports any third-party IR.
  - Admittedly Relay is much more limited, as it assumes the Relay => TE => TIR 
lowering path with some hooks to hack in other compilation paths, but Relax is 
going to change this, so again I wouldn't worry.

**Current `arch`-specific checks.** Most of the 6 groups of `arch`-specific 
helper functions mentioned in the "Motivation" section were developed through 
concurrent efforts by multiple parties, so I would say fragmentation was almost 
certain to happen. On the other hand, the fragmentation of those helper 
functions, which are indeed ad hoc, is currently confined locally to 
independent checks and has not yet polluted the design of the global 
infrastructure.

**Special target attributes.** Below are a few existing target attributes that 
carry special semantics. I don't fully agree with their design, but they are 
now preserved as legacy:
- **keys.** This is used to guide TOPI dispatch. For example, "llvm 
--keys=arm_cpu,cpu" first checks whether any `GenericFunc` is registered with 
`arm_cpu`, and if not, it falls back to `cpu`.
- **libs.** This is used in TOPI to dispatch to vendor libraries. For example, 
"cuda -libs=cudnn" prefers dispatching to cuDNN.
- **device.** and **model.** These two sometimes control the behavior of 
auto-tuning.
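
To make the `keys` fallback concrete, here is a minimal, self-contained sketch. It is not TVM's actual `GenericFunc` machinery: `Impl`, `Registry`, `Register` and `DispatchByKeys` are hypothetical names, and the only behavior it assumes is the one described above, i.e. keys are tried in order and the first registered one wins.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-key implementations of a single operator.
using Impl = std::function<std::string()>;
using Registry = std::map<std::string, Impl>;

// Register an implementation under a dispatch key such as "cpu" or "arm_cpu".
void Register(Registry* registry, const std::string& key, Impl impl) {
  assert(impl && "implementation must be callable");
  (*registry)[key] = std::move(impl);
}

// Try each key in order and return the first registered implementation.
// Returns nullptr when none of the keys is registered.
const Impl* DispatchByKeys(const Registry& registry,
                           const std::vector<std::string>& keys) {
  for (const std::string& key : keys) {
    Registry::const_iterator it = registry.find(key);
    if (it != registry.end()) {
      return &it->second;
    }
  }
  return nullptr;
}
```

With only `cpu` registered, `DispatchByKeys(registry, {"arm_cpu", "cpu"})` falls through to the `cpu` entry; once `arm_cpu` is registered as well, the same call picks the more specific one.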

**Existing `arch`-like attributes.** Note that the design of `Target` 
attributes in `TargetKind` is intended to describe the capability of hardware, 
according to the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844):
> Pass in additional target-specific attributes like ISA/library extension to a 
> target.

Therefore, currently all the `arch`-like flags are centered around `Target`. As 
an example:
- The [Vulkan 
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L345-L370)
 comprehensively includes hardware feature support (e.g. whether or not fp16 is 
supported) and physical limits of the device (e.g. max number of threads 
allowed).
- The [CUDA 
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L292-L296)
 is defined with a similar approach but is not yet as comprehensive. It doesn't 
require architectural change to grow its attributes.

Note that the number of attributes may grow when new hardware features appear, 
but to the best of my knowledge, it is unlikely that those hardware features 
will bloat, mainly because there is a concrete cost to adding features to 
hardware.

**Target tag system.** Given the fact that existing hardware models are 
enumerable, the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) 
proposes to use "tags" to allow easy creation of targets. For example, 
`Target("nvidia/jetson-agx-xavier")` gives full specification of this device, 
including the cuda target and the ARM host. At the time of writing, it only 
takes 200 tags to describe [all the CUDA 
hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348).

**What on earth are `Target`s.** Actually, `target` in TVM refers not only to 
the hardware, but also to the codegen target. For example, an LLVM target means 
TVM codegens to LLVM and lets LLVM do the rest of the compilation. A CUDA 
target codegens to CUDA source code (i.e. `*.cu` files) and invokes NVCC for 
compilation.

## Discussion

**Naming choice.** When implementing the Target RFC, I came up with the name 
`preprocessor` to stay backward compatible with existing functionality that 
auto-detects the local environment (e.g. CUDA version), and I have to admit 
it's probably not the best name. In the context of our RFC, we might want to 
use a more specific name like `target_parser` instead.

**Where to dispatch target parsers.** Currently, the preprocessor is dispatched 
solely based on `TargetKind`, which is admittedly a bit limited and overlooks 
the possibility that `aarch64` and `x86` may need completely different parsers. 
Therefore, I would love to raise a question here: based on which attribute 
should the parser be dispatched? Previous wisdom in clang seems to suggest 
dispatching according to `mtriple` (IIUC), so shall we introduce something 
similar, for example, dispatching based on `--device aarch64-foo-bar`?
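
As a rough illustration of what triple-style dispatch could look like, here is a self-contained sketch. Everything in it is hypothetical: `TargetConfig`, `Parser`, `ArchOfTriple` and `SelectParser` are made-up names, the config is simplified to string values, and I assume the architecture is the first `-`-separated field of the string (as in `aarch64-foo-bar`).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Simplified, hypothetical stand-in for the JSON-like target configuration.
using TargetConfig = std::map<std::string, std::string>;
using Parser = std::function<TargetConfig(TargetConfig)>;

// Extract the architecture field of a triple-like string,
// e.g. "aarch64-foo-bar" -> "aarch64". A string without '-' is returned whole.
std::string ArchOfTriple(const std::string& triple) {
  assert(!triple.empty() && "triple must not be empty");
  return triple.substr(0, triple.find('-'));
}

// Pick the parser registered for the architecture named by `triple`.
// Unknown architectures get an identity parser that leaves the config as-is.
Parser SelectParser(const std::map<std::string, Parser>& parsers,
                    const std::string& triple) {
  std::map<std::string, Parser>::const_iterator it =
      parsers.find(ArchOfTriple(triple));
  if (it != parsers.end()) {
    return it->second;
  }
  return [](TargetConfig config) { return config; };
}
```

The point of the sketch is only that `aarch64` and `x86` can land in completely different parsers even when they share the same `TargetKind`.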

**Do we need multiple target parsers?** My answer is no. If the parsers are 
maintained by different vendors separately in their own interest, then each 
vendor can decide how to implement the parsing of "keys", "arch", etc. 
together, without conflicting with other contributors. Therefore, I would say 
it's already consistent with our principle A3 without having to develop 
multiple parsers.

**Function signature for a target parser.** Note that since the introduction of 
the previous Target RFC, a target object is canonically represented by a 
JSON-like object. Therefore, the signature could be:
```c++
using TargetJSON = Map<String, ObjectRef>;
using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>;
```
so that it's generic enough and could potentially be useful if vendors in the 
future want to hack around the Target object. To clarify the discussion 
[here](https://github.com/apache/tvm-rfcs/pull/71/files#r876079831), I would 
propose that our parsing code path not go back to string again, i.e.:
- str -> TargetJSON (canonical form) -> TargetJSON (target parser applied) -> 
Target
- TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target
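
The two code paths above can be sketched as follows. This is a standalone approximation rather than TVM code: `TargetJSON` is simplified to a string-to-string map instead of `Map<String, ObjectRef>`, `Target` is a plain struct, and `ParseString`, `TargetFromJSON` and `TargetFromString` are hypothetical names. The string syntax handled is just the `<kind> -key=value` form used in the examples earlier.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Simplified stand-ins for the real types (assumptions, see above).
using TargetJSON = std::map<std::string, std::string>;
using TargetParser = std::function<TargetJSON(TargetJSON)>;
struct Target {
  TargetJSON attrs;
};

// str -> TargetJSON (canonical form).
// Accepts "<kind> -key=value ..." with one or two leading dashes per option.
TargetJSON ParseString(const std::string& text) {
  std::istringstream stream(text);
  TargetJSON json;
  stream >> json["kind"];
  std::string token;
  while (stream >> token) {
    std::string::size_type start = token.find_first_not_of('-');
    std::string::size_type eq = token.find('=');
    assert(start != std::string::npos && eq != std::string::npos && start < eq);
    json[token.substr(start, eq - start)] = token.substr(eq + 1);
  }
  return json;
}

// TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target.
Target TargetFromJSON(const TargetJSON& canonical, const TargetParser& parser) {
  return Target{parser(canonical)};
}

// The string entry point only prepends the str -> TargetJSON step and then
// reuses the JSON path, so nothing is ever serialized back to a string.
Target TargetFromString(const std::string& text, const TargetParser& parser) {
  return TargetFromJSON(ParseString(text), parser);
}
```

Both entry points converge on `TargetFromJSON`, so a target parser only ever sees the canonical JSON-like form.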

**Distinguishing `arch` and `attrs`.** Note that the number of possible 
attributes grows less frequently than we might expect. Also, there is no 
fundamental difference between `arch` and `attrs`, given that `arch` is just a 
special kind of attrs, and the target is designed for this according to point 
C5 in the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). 
Therefore, I would personally prefer a flatter access pattern in which we do 
not need to distinguish `arch` and `attrs`.

**Folder structure.** According to principle A3, I would propose that different 
vendors maintain their own TargetKind and parsers separately. For example, 
aarch64 could be maintained using:
- `src/target/target/aarch64/parser.cc`
- `src/target/target/aarch64/tags.cc`
- `src/target/target/aarch64/kind.cc`
- `src/target/target/aarch64/helpers.cc`

where the last item provides the pre-defined packed functions mentioned in the 
"Motivation" section.

