Re: [apache/tvm-rfcs] Add Target Pre-processing RFC (PR #71)

Junru Shao Fri, 20 May 2022 00:36:17 -0700

Thanks @Mousius for drafting this RFC!

First of all, I completely agree on the importance of handling `arch`-specific
checks. Taking our experience as an example: on CUDA we might want to check
whether the PTX intrinsic `cp.async.commit_group` is available on a given
architecture before tensorizing with that intrinsic, and the existing approach
is to add extra flags in `TargetKind`.
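
To make the kind of check concrete, here is a minimal standalone sketch. It does not use any TVM API; `ComputeCapability` and `SupportsCpAsync` are hypothetical helper names, and the `sm_80` threshold reflects my understanding that the PTX ISA documents the `cp.async` family as requiring `sm_80` or higher.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a TVM API): extract the numeric compute capability
// from an arch string such as "sm_80" or "sm_90a". Returns -1 if the string
// does not start with "sm_" followed by at least one digit.
int ComputeCapability(const std::string& arch) {
  const std::string prefix = "sm_";
  if (arch.compare(0, prefix.size(), prefix) != 0) return -1;
  int value = 0;
  std::size_t i = prefix.size();
  for (; i < arch.size() && std::isdigit(static_cast<unsigned char>(arch[i])); ++i) {
    value = value * 10 + (arch[i] - '0');
  }
  return i == prefix.size() ? -1 : value;
}

// Assumption: cp.async.* (including cp.async.commit_group) needs sm_80 or newer.
bool SupportsCpAsync(const std::string& arch) {
  return ComputeCapability(arch) >= 80;
}

// Small usage example: the check a tensorization rule could run up front.
void CheckExamples() {
  assert(SupportsCpAsync("sm_80"));
  assert(!SupportsCpAsync("sm_75"));
}
```

Today such a result would typically be surfaced as an extra flag on the target rather than recomputed at every call site.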

To motivate discussion, I would love to provide more context on the system
design, and then talk about some specific points I noticed in the current RFC,
and finally propose some ideas inspired by this RFC.

## Background

**Design principles.** I'd like to share some of the design principles I follow
when developing TVM, a compiler that I believe aims to work across hardware
models:
- A1. Generic. By design, TVM serves the common interest of all vendors.
  Therefore, it would be great if we could discuss broadly how a design choice
  would benefit all hardware platforms, including ARM, NVIDIA, Intel, Qualcomm,
  etc.
- A2. Minimal. Ideally, practitioners should need to learn only the bare minimum
  to participate in TVM development.
  - Take TIR as an example: for a developer who already knows the IR and
    passes, the only extra concept needed to use and develop TIR is `Target`.
  - Indeed, Relay is in a less ideal state, but Relax is designed to address
    this issue, so I wouldn't worry too much.
- A3. Customizable. We want to develop the infrastructure to provide
  customization opportunities for all vendors to independently work on their
  extensions in a collaborative way, making TVM a platform for everyone. A few
  examples:
  - TIR scheduling is heading in this direction: all the schedule primitives
    are decoupled from each other, so that vendors can develop their own
    primitives without affecting each other.
  - TVMScript is going to be redesigned this way, treating TIR and Relax as
    independent dialects while the core infra itself supports any third-party
    IR.
  - Admittedly, Relay is much more limited because it assumes the Relay => TE
    => TIR lowering path, with some hooks to hack in other compilation paths,
    but Relax is going to change this, so again I wouldn't worry...

**Current `arch`-specific checks.** Most of the 6 groups of `arch`-specific
helper functions mentioned in the "Motivation" section were developed through
concurrent efforts by multiple parties, so I would say fragmentation was almost
certain to happen. On the other hand, the fragmentation of those helper
functions, which are indeed ad hoc, is currently confined locally to
independent checks and has not yet polluted the design of the global
infrastructure.

**Special target attributes.** Below are a few existing target attributes that
serve special semantics, the design of which I don't fully agree with, but now
are preserved as legacy:
- **keys.** This is used to guide TOPI dispatch. For example, "llvm
--keys=arm_cpu,cpu" first finds if there is any `GenericFunc` registered with
`arm_cpu`, and if not, it falls back to `cpu`.
- **libs.** This is used in TOPI to dispatch to vendor library. For example,
"cuda -libs=cudnn" prefers dispatching to cuDNN.
- **device.** and **model.** These two sometimes control the behavior of
auto-tuning.
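
As a rough illustration of the `keys` fallback described above, the sketch below walks the keys in order and picks the first one that has a registered implementation. It is a standalone model, not TOPI code: `Registry`, `SplitKeys`, and `Dispatch` are hypothetical names, and a plain map of strings stands in for `GenericFunc` registration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for GenericFunc registration: key -> implementation name.
using Registry = std::map<std::string, std::string>;

// Split a comma-separated key list such as "arm_cpu,cpu" into its parts.
std::vector<std::string> SplitKeys(const std::string& keys) {
  std::vector<std::string> result;
  std::istringstream stream(keys);
  std::string key;
  while (std::getline(stream, key, ',')) {
    if (!key.empty()) result.push_back(key);
  }
  return result;
}

// Try each key in order; the first registered key wins. If none of the keys
// is registered, return the caller-supplied default implementation name.
std::string Dispatch(const Registry& registry, const std::vector<std::string>& keys,
                     const std::string& default_impl) {
  assert(!default_impl.empty() && "a default implementation is required");
  for (const std::string& key : keys) {
    auto it = registry.find(key);
    if (it != registry.end()) return it->second;
  }
  return default_impl;
}
```

With only `cpu` registered, `Dispatch(registry, SplitKeys("arm_cpu,cpu"), "generic")` skips `arm_cpu` and lands on the `cpu` entry, which is the fallback behaviour described for `keys`.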

**Existing `arch`-like attributes.** Note that the design of `Target`
attributes in `TargetKind` is intended to describe the capability of hardware,
according to the [Target
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844):
> Pass in additional target-specific attributes like ISA/library extension to a
> target.

Therefore, currently all the `arch`-like flags are centered around `Target`. As
an example:
- The [Vulkan
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L345-L370)
comprehensively includes hardware feature support (e.g. whether or not fp16 is
supported), physical limits of the device (e.g. max number of threads allowed).
- The [CUDA
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L292-L296)
is defined in a similar way but is not yet as comprehensive. Growing its
attributes does not require any architectural change.

Note that the number of attributes may grow when new hardware features appear,
but to the best of my knowledge such features are unlikely to bloat, mainly
because adding features to hardware has a concrete cost.

**Target tag system.** Given the fact that existing hardware models are
enumerable, the [Target
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844)
proposes to use "tags" to allow easy creation of targets. For example,
`Target("nvidia/jetson-agx-xavier")` gives full specification of this device,
including the cuda target and the ARM host. At the time of writing, it only
takes 200 tags to describe [all the CUDA
hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348).

**What on earth are `Target`s.** Actually, `target` in TVM refers not only to
the hardware but also to the codegen target. For example, an LLVM target means
TVM generates LLVM IR and lets LLVM do the rest of the compilation. A CUDA
target generates CUDA source code (i.e. `*.cu` files) and invokes NVCC for
compilation.

## Discussion

**Naming choice.** When implementing the Target RFC, I came up with the name
`preprocessor` for the hook that keeps backward compatibility with existing
functionality that auto-detects the local environment (e.g. the CUDA version).
I have to admit it's probably not the best name. In the context of our RFC, we
might want to use a name like `target_parser` instead to be more specific.

**Where to dispatch target parsers.** Currently, the preprocessor is dispatched
solely based on `TargetKind`, which is admittedly a bit limited and overlooks
the possibility that `aarch64` and `x86` may need completely different parsers.
Therefore, here we would love to raise a question: on which attribute should
the parser be dispatched? Prior art in clang seems to suggest dispatching
according to `mtriple` (IIUC), so shall we introduce anything similar, for
example, dispatching based on `--device aarch64-foo-bar`?
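
One way to picture triple-style dispatch is the standalone sketch below. Everything in it is an assumption for the sake of discussion: the convention that the architecture is the text before the first `-`, the names `ArchOf`, `ParserTable`, and `SelectParser`, and the idea of falling back to the parser registered for the `TargetKind`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical convention: the architecture component of a device string such
// as "aarch64-foo-bar" is everything before the first '-'.
std::string ArchOf(const std::string& device) {
  assert(!device.empty() && "device string must not be empty");
  return device.substr(0, device.find('-'));
}

// Maps an architecture name to the (hypothetical) name of its parser.
using ParserTable = std::map<std::string, std::string>;

// Prefer the architecture-specific parser; otherwise fall back to the parser
// that is registered for the TargetKind, as happens today.
std::string SelectParser(const ParserTable& by_arch, const std::string& device,
                         const std::string& kind_default) {
  auto it = by_arch.find(ArchOf(device));
  return it != by_arch.end() ? it->second : kind_default;
}
```

Under these assumptions, `aarch64-foo-bar` and `x86_64-foo-bar` can resolve to different parsers even though both share the `llvm` kind.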

**Do we need multiple target parsers?** My answer is no. If the parsers are
maintained separately by different vendors in their own interest, then each
vendor can decide how to implement parsing for "keys" and "arch" together
without conflicting with other contributors. Therefore, I would say a single
parser per vendor is already consistent with our principle A3, without having
to develop multiple parsers.

**Function signature for a target parser.** Note that since the earlier Target
RFC, a target object is canonically represented by a JSON-like object.
Therefore, the signature could be:
```c++
using TargetJSON = Map<String, ObjectRef>;
using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>;
```
so that it's generic enough and could potentially be useful if vendors in the
future want to hack around the Target object. To clarify the discussion
[here](https://github.com/apache/tvm-rfcs/pull/71/files#r876079831), I would
propose that our parsing code path not go back to string again, i.e.:
- str -> TargetJSON (canonical form) -> TargetJSON (target parser applied) ->
Target
- TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target
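
The two code paths above can be sketched end to end in standalone C++. This is only a model under stated assumptions: `std::map<std::string, std::string>` stands in for `Map<String, ObjectRef>`, `std::function` stands in for `TypedPackedFunc`, and `FromString`, `CudaParser`, `TargetFromJSON`, `TargetFromString`, and the `has_cp_async` attribute are hypothetical names rather than existing TVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <sstream>
#include <string>

using TargetJSON = std::map<std::string, std::string>;       // stand-in type
using TargetParser = std::function<TargetJSON(TargetJSON)>;  // stand-in type

struct Target { TargetJSON attrs; };

// Step 1 (string path only): str -> canonical TargetJSON. The first bare token
// is the kind; "-key=value" and "--key=value" tokens become attributes.
TargetJSON FromString(const std::string& text) {
  TargetJSON json;
  std::istringstream stream(text);
  std::string token;
  while (stream >> token) {
    if (json.empty() && token[0] != '-') { json["kind"] = token; continue; }
    std::size_t start = token.find_first_not_of('-');
    std::size_t eq = token.find('=');
    if (start == std::string::npos || eq == std::string::npos || eq <= start) continue;
    json[token.substr(start, eq - start)] = token.substr(eq + 1);
  }
  return json;
}

// Step 2: a vendor parser maps TargetJSON to TargetJSON. This one derives a
// flattened, hypothetical "has_cp_async" attribute from "arch".
TargetJSON CudaParser(TargetJSON json) {
  auto it = json.find("arch");
  if (it != json.end() && it->second.rfind("sm_", 0) == 0) {
    int capability = std::atoi(it->second.c_str() + 3);
    json["has_cp_async"] = capability >= 80 ? "1" : "0";
  }
  return json;
}

// Step 3: build the Target from the parsed JSON; nothing is re-serialized.
Target TargetFromJSON(const TargetJSON& canonical, const TargetParser& parser) {
  assert(static_cast<bool>(parser) && "a target parser must be provided");
  return Target{parser(canonical)};
}

Target TargetFromString(const std::string& text, const TargetParser& parser) {
  return TargetFromJSON(FromString(text), parser);
}
```

Both entry points converge on `TargetFromJSON`, so the parser always sees the canonical form and its output is never turned back into a string.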

**Distinguishing `arch` and `attrs`.** Note that the number of possible
attributes grows less frequently than we might expect. Also, there is no
fundamental difference between `arch` and `attrs`, since `arch` is just a
special kind of attribute, and the target is designed for this according to
point C5 in the [Target
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844).
Therefore, I would personally prefer a more flattened access pattern, in which
we do not need to distinguish `arch` from `attrs`.

**Folder structure.** According to principle A3, I would propose that different
vendors may maintain their own TargetKind and parsers separately, for example,
aarch64 could be maintained using:
- `src/target/target/aarch64/parser.cc`
- `src/target/target/aarch64/tags.cc`
- `src/target/target/aarch64/kind.cc`
- `src/target/target/aarch64/helpers.cc`

where the last item provides the pre-defined packed functions mentioned in the
"Motivation" section.

--
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132579480