Hi @junrushao1994, thanks for the detailed reply 😸  I don't want to debate our 
personal principles, but I appreciate you sharing them and will reference them 
where I can.

> **Current `arch`-specific checks.** Most of the 6 groups of `arch`-specific 
> helper functions, mentioned in the "Motivation" section, are developed by 
> concurrent efforts from multiple parties, and therefore I would say 
> fragmentation is almost a certain thing to happen. On the other hand, 
> fragmentation of those helper functions, which are indeed ad-hoc, is 
> currently confined locally as independent checks without yet polluting the 
> design of global infrastructure.

Yip, this fragmentation occurs due to the lack of a standardised mechanism for 
these groups to use, which this RFC aims to provide. I'm not sure why we'd 
consider that pollution, given it should have a positive impact on all groups 
and aids customisation through a well-defined route established by all vendors, 
providing both simplicity and customisability (A1, A2 and A3).

> Existing arch-like attributes. Note that the design of Target attributes in 
> TargetKind is intended to describe the capability of hardware, according to 
> the [Target 
> RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844):

I believe this still holds, given the crossover between hardware and code 
generators; for `c`/`llvm` you will still be able to pass `mattr`/`mcpu`/etc., 
which are the standard way to specify the CPU, ISA and extensions in existing 
compilers (A1). This also follows the TVM guideline: "Be consistent with 
existing well-known package's APIs if the features overlap. For example, tensor 
operation APIs should always be consistent with the numpy API."

> Target tag system. Given the fact that existing hardware models are 
> enumerable, the [Target 
> RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) 
> proposes to use "tags" to allow easy creation of targets. For example, 
> Target("nvidia/jetson-agx-xavier") gives full specification of this device, 
> including the cuda target and the ARM host. At the time of writing, it only 
> takes 200 tags to describe [all the CUDA 
> hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348).

> **What on earth are `Target`s.** Actually, `target` in TVM not only refers to 
> the hardware, but also the codegen targets. For example, LLVM targets means 
> TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets 
> codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for 
> compilation.

There's definitely an existing amount of confusion in the `Target` system, but 
I think it's even more confusing than this, judging by tagged entries such as:

```
TVM_REGISTER_TARGET_TAG("raspberry-pi/4b-aarch64")
    .set_config({{"kind", String("llvm")},
                 {"mtriple", String("aarch64-linux-gnu")},
                 {"mcpu", String("cortex-a72")},
                 {"mattr", Array<String>{"+neon"}},
                 {"num-cores", Integer(4)},
                 {"host", Map<String, ObjectRef>{{"kind", String("llvm")},
                                                 {"mtriple", String("aarch64-linux-gnu")},
                                                 {"mcpu", String("cortex-a72")},
                                                 {"mattr", Array<String>{"+neon"}},
                                                 {"num-cores", Integer(4)}}}});
```

This defines a system configuration that then uses the `llvm` code generator 
with a specific CPU profile; the `Target` system can therefore represent **at 
minimum** 3 distinct layers: systems, hardware and code generators. Given the 
principle of having to understand a minimal amount (A2), `Target` needs to be 
streamlined into understandable parts. This is a digression from the actual 
RFC, though, as the features are pre-processed from the tagged `Target`'s 
information.

The problem I've faced trying to figure out how to implement this is that 
`llvm` takes `mcpu`/`mattr`/etc., which are used to infer features later, 
whereas the `cuda` `Target` has a more complete reflection of the `attrs` 
available for CUDA. These inconsistencies make TVM difficult for a user to 
approach, and don't keep what a developer has to learn to the bare minimum 
(A2). It also diverges from other compilers, where you'd expect 
`mcpu`/`mtriple`/etc. to infer much of this information for you (A1).

> Do we need multiple target parsers? My answer is no. If the parsers are 
> maintained by different vendors separately on their own interest, then they 
> could decide how to implement parsers for "keys", "arch", together, without 
> conflicting with other contributors. Therefore, I would say it's already 
> consistent with our principle A3 without having to develop multiple parsers.

> Naming choice. When implementing the Target RFC, I came up with the 
> preprocessor to be backward compatible with existing functionalities that 
> auto-detects the local environment (e.g. cuda version), which I have to admit 
> it's probably not the best name. In the context of our RFC, we might want to 
> use names like target_parser instead to be more specific.

I don't mind using one parser; I was just making it more granular to avoid 
stepping on people's toes with the existing pre-processor. Either design seems 
to fulfil your principle of customisability (A3): the more granular approach 
means you only have to implement a small piece of customisation, whereas the 
single parser requires some thought as to how to cater for all attributes 
within a given `TargetKind` parser.

> **Where to dispatch target parsers.** Currently, the preprocessor is 
> dispatched solely based on `TargetKind`, which is admittedly a bit limited 
> and overlooked the possibility that `aarch64` and `x86` may need completely 
> different parsers. Therefore, here we would love to raise a question: based 
> on which attribute the parser should be dispatched? Previous wisdom in clang 
> seems to suggest dispatching according to `mtriple` (IIUC), so shall we 
> introduce anything similar, for example, dispatching based on `--device 
> aarch64-foo-bar`?

This requires developers to learn more than the bare minimum (A2), as users 
would have to learn the specifics of how to specify a `Target` in TVM. If an 
`llvm` `Target` supports `-mcpu`/`-mattr`/`-mtriple`, then all of these fields 
can reasonably be expected to allow the `Target` to infer features, given the 
relation to GCC/LLVM (A1).

> Distinguishing arch and attrs. Note that the number of possible attributes 
> grow less frequently as we might expect. Also, there is no fundamental 
> difference between arch and attrs given arch is special attrs, and target is 
> designed for this according to point C5 in the [Target 
> RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). 
> Therefore, I would personally prefer a more flattened access pattern, and 
> therefore we do not need to distinguish arch and attrs.

Based on LLVM, the `Features` inferred from a given CPU/GPU profile can be 
extensive already, which would lead to a drastic increase in `attrs` if we were 
to track them this way. For example: 

* https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/X86TargetParser.cpp
* https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/AArch64TargetParser.cpp

If we were to attach all of these directly to `Target` (i.e. `llvm`), that 
would drastically increase the number of available fields, and in all cases 
only a subset would be used, specific to a given CPU/GPU profile. Therefore I 
would strongly advise following the pattern (as in LLVM) of using a separate 
structure for this; clear boundaries provide a much more straightforward 
developer experience (A2) and allow us to do things as @Lunderberg suggested, 
like plucking just `features` or `arch` to give to `Pass`es rather than the 
entire `Target`.

> **Folder structure.** According to principle A3, I would propose that 
> different vendors may maintain their own TargetKind and parsers separately, 
> for example, aarch64 could be maintained using:
> 
> * `src/target/target/aarch64/parser.cc`
> * `src/target/target/aarch64/tags.cc`
> * `src/target/target/aarch64/kind.cc`
> * `src/target/target/aarch64/helpers.cc`
>   Where the last item provides pre-defined packed functions mentioned in the 
> "Motivation" section.

I believe the proposed layout also conforms to A3, as it already details two 
different pre-processors, and you can easily imagine it being extended:

* src/target/preprocessors/aarch64.cc
* src/target/preprocessors/cpu.cc
* src/target/preprocessors/cuda.cc
* src/target/preprocessors/woofles.cc
* src/target/preprocessors/x86_64.cc

The composition of `CPU` calling `AArch64` or `x86_64` conditionally also 
allows multiple vendors to provide pre-processing functionality for `llvm`, by 
virtue of being able to call their own functions within the central `CPU` 
pre-processor.

# Outcomes
I'll try to summarise the above into changes to the RFC which I can then 
implement 😸 

## Single Parser
I can re-word the RFC to use a single parser (`target_parser`) rather than the 
three separate ones, using the syntax you suggested:

```c++
using TargetJSON = Map<String, ObjectRef>;
using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>;
```

## `Target` Field
I'm strongly in favour of having `target.features` to represent this, do you 
actively object to this?

## Location
I believe the `src/target/parsers` folder would provide maximum customisation, 
given parsers can be re-used between `TargetKind`s, and therefore serves as a 
good starting point? I'm also happy for an RFC in 6 months' time to move 
everything to a new folder if we refactor `Target`s 😸 
