Hi @junrushao1994, thanks for the elaborate reply 😸 I don't want to debate our personal principles but I appreciate you sharing them and will reference them where I can.
> **Current `arch`-specifc checks.** Most of the 6 groups of `arch`-specific > helper functions, mentioned in the "Motivation" section, are developed by > concurrent efforts from multiple parties, and therefore I would say > fragmentation is almost a certain thing to happen. On the other hand, > fragmentation of those helper functions, which are indeed ad-hoc, is > currently confined locally as independent checks without yet polluting the > design of global infrastructure. Yip, this fragmentation occurs due to a lack of a standardised mechanism for these groups to use, which the RFC aims to provide - I'm not sure why we consider that pollution given it should have a positive impact on all groups and aids in providing customisation through a well defined route established by all vendors, providing simplicity and customisation (A1, A2 and A3). > Existing arch-like attributes. Note that the design of Target attributes in > TargetKind is intended to describe the capability of hardware, according to > the [Target > RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844): I believe this is still held given the crossover between hardware and code generators, for `c`/`llvm` you will still be able to pass `mattr`/`mcpu`/etc which are the standard way to specify CPU, ISA and extensions in existing compilers (A1). This is also following the TVM Guideline of "Be consistent with existing well-known package’s APIs if the features overlap. For example, tensor operation APIs should always be consistent with the numpy API.". > Target tag system. Given the fact that existing hardware models are > enumerable, the [Target > RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) > proposes to use "tags" to allow easy creation of targets. For example, > Target("nvidia/jetson-agx-xavier") gives full specification of this device, > including the cuda target and the ARM host. At the time of writing, it only > takes 200 tags to describe [all the CUDA > hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348). > **What on earth are `Target`s.** Actually, `target` in TVM not only refers to > the hardware, but also the codegen targets. For example, LLVM targets means > TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets > codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for > compilation. There's definitely an existing amount of confusion in the `Target` system, but I think it's even more confused than this by looking at the tagged entries, such as: ``` TVM_REGISTER_TARGET_TAG("raspberry-pi/4b-aarch64") .set_config({{"kind", String("llvm")}, {"mtriple", String("aarch64-linux-gnu")}, {"mcpu", String("cortex-a72")}, {"mattr", Array<String>{"+neon"}}, {"num-cores", Integer(4)}, {"host", Map<String, ObjectRef>{{"kind", String("llvm")}, {"mtriple", String("aarch64-linux-gnu")}, {"mcpu", String("cortex-a72")}, {"mattr", Array<String>{"+neon"}}, {"num-cores", Integer(4)}}}}); ``` Which defines a system configuration that then uses a code generator `llvm` with a specific CPU profile, therefore the `Target` system can represent **at minimum** 3 distinct layers: systems/hardware/code generators. Given the principle of having to understand a minimal amount (A2), `Target` needs to be streamlined into understandable parts. This is a digression from the actual RFC, as the features are pre-processed from the tagged `Target`s information. The problem that I've faced trying to figure out how to implement this is that `llvm` takes `mcpu`/`mattr`/etc which are used to infer features later where-as the `cuda` `Target` has a more complete reflection of the `attrs` available for CUDA. These inconsistencies make it difficult for a user to approach TVM and that isn't requiring a developers to learn the bare minimum (A2). It also diverges from other compilers where you'd expect `mcpu`/`mtriple`/etc to infer much of this information for you (A1). > Do we need multiple target parsers? My answer is no. If the parsers are > maintained by different vendors separately on their own interest, then they > could decide how to implement parsers for "keys", "arch", together, without > conflicting with other contributors. Therefore, I would say it's already > consistent with our principle A3 without having to develop multiple parsers. > Naming choice. When implementing the Target RFC, I came up with the > preprocessor to be backward compatible with existing functionalities that > auto-detects the local enviroment (e.g. cuda version), which I have to admit > it's probably not the best name. In the context of our RFC, we might want to > use names like target_parser instead to be more specific. I don't mind using one parser, I was just making it more granular and avoiding stepping on peoples toes with the existing pre-processor. Either design seems to fulfil your principle of customisability (A3), more granularly means you only have to implement a small piece of customisation where-as with the single parser it requires some thought as to how to cater for all attributes within a given `TargetKind` parser. > **Where to dispatch target parsers.** Currently, the preprocessor is > dispatched solely based on `TargetKind`, which is admittedly a bit limited > and overlooked the possiblity that `aarch64` and `x86` may need completely > different parsers. Therefore, here we would love to raise a question: based > on which attribute the parser should be dispatched? Previous wisdom in clang > seems to suggest dispatching according to `mtriple` (IIUC), so shall we > introduce anything similar, for example, dispatching based on `--device > aarch64-foo-bar`? This requires developers to learn more than the bare minimum (A2), as users are required to learn the specifics of how to specify a `Target` in TVM. If an `llvm` `Target` supports `-mcpu`/`-mattr`/`-mtriple` then all of these fields can be reasonably expected to allow the `Target` to infer features given the relation to GCC/LLVM (A1). > Distinguishing arch and attrs. Note that the number of possible attributes > grow less frequently as we might expect. Also, there is no fundamental > difference between arch and attrs given arch is special attrs, and target is > designed for this according to point C5 in the [Target > RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). > Therefore, I would personally prefer a more flattened access pattern, and > therefore we do not need to distinguish arch and attrs. Based on LLVM, the `Features` inferred from a given CPU/GPU profile can be extensive already, which would lead to a drastic increase in `attrs` if we were to track them this way. For example: * https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/X86TargetParser.cpp * https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/AArch64TargetParser.cpp If we were to attach all of these directly to `Target` (i.e. `llvm`), that would drastically increase the number of available fields and in all cases only a subset would be used - specific to a given CPU/GPU profile. Therefore I would strongly advise following the pattern (as in LLVM) of using a separate structure for this, clear boundaries provide a much more straight forward developer experience (A2) and allow us to do things as @Lunderberg suggested like plucking just `features` or `arch` to give to `Pass`es rather than the entire `Target`. > **Folder structure.** According to principle A3, I would propose that > different vendors may maintain their own TargetKind and parsers separately, > for example, aarch64 could be maintained using: > > * `src/target/target/aarch64/parser.cc` > * `src/target/target/aarch64/tags.cc` > * `src/target/target/aarch64/kind.cc` > * `src/target/target/aarch64/helpers.cc` > Where the last item provides pre-defined packed functions mentioned in the > "Motivation" section. I believe the proposed layout also conforms to A3 as it already details two different pre-processors, and you can easily imagine it extending: * src/target/preprocessors/aarch64.cc * src/target/preprocessors/cpu.cc * src/target/preprocessors/cuda.cc * src/target/preprocessors/woofles.cc * src/target/preprocessors/x86_64.cc The composition of `CPU` calling `AArch64` or `x86_64` conditionally also provides multiple vendors to provide pre-processing functionality to `llvm` by virtue of being able to call their own functions within the central `CPU` pre-processor. # Outcomes Will try and summarise the above into changes in the RFC which I can implement 😸 ## Single Parser I can re-word the RFC to use a single parser (`target_parser`) rather than the three separate ones, using the syntax you suggested: ```c++ using TargetJSON = Map<String, ObjectRef>; using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>; ``` ## `Target` Field I'm strongly in favour of having `target.features` to represent this, do you actively object to this? ## Location I believe the `src/target/parsers` folder would provide for maximum customisation given parsers can be re-used between `TargetKind`s and therefore serves as a good starting point? Also happy for an RFC in 6 months to move everything to a new folder if we re-factor the `Target`s 😸 -- Reply to this email directly or view it on GitHub: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132872196 You are receiving this because you are subscribed to this thread. Message ID: <apache/tvm-rfcs/pull/71/c1132872...@github.com>