Re: [apache/tvm-rfcs] Add Target Pre-processing RFC (PR #71)

Junru Shao Fri, 20 May 2022 22:52:57 -0700

@Mousius Thank you so much for your response! This makes lots of sense to me! 
Also, thanks for including my personal principles in the discussion! It's my 
personal principles which are completely okay to disagree with :-)


> I'm not sure why we consider that pollution given it should have a positive 
> impact on all groups and aids in providing customisation through a well 
> defined route established by all vendors, providing simplicity and 
> customisation (A1, A2 and A3).

Sorry my words may lead to some potential miscommunication. As I mentioned in 
the very beginning, I completely agree with the idea of handling `arch` more 
systematically and it's important to us, so in no means I consider it 
pollution. I mean the fragmentation itself is not good but hasn't become 
extremely bad right now, because they are not used globally at infrastructural 
level, so there is no namespace polluting.

> I believe this is still held given the crossover between hardware and code 
> generators, for c/llvm you will still be able to pass mattr/mcpu/etc which 
> are the standard way to specify CPU, ISA and extensions in existing compilers.

Agreed. It is designed to be consistent with existing wisdom, i.e. LLVM/GCC 
mattr/mcpu/etc., so I believe it's not controversial.

> There's definitely an existing amount of confusion in the Target system, but 
> I think it's even more confused than this by looking at the tagged entries, 
> such as: .... Which defines a system configuration that then uses a code 
> generator llvm with a specific CPU profile, therefore the Target system can 
> represent at minimum 3 distinct layers: systems/hardware/code generators.

Yes, I think your understanding is correct. Per [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844), this 
object represents host compiler info (C0), heterogeneous compilation settings 
(C1), compiler behavior control (C2), additional ISA/library (C4). Therefore, 
the tag system (C4) is introduced to reduce the cognitive overhead, because the 
end users can use a simple plain string tag `Target("raspberry-pi/4b-aarch64")` 
to represent all C0/1/2/4, instead of understanding every detail of the 
hardware themselves (A2). In other words, to the end users, the vision of 
having tag system is that they only need a simple string to do configure TVM 
codegen on their their hardware.

> where-as the cuda Target has a more complete reflection of the attrs 
> available for CUDA. These inconsistencies make it difficult for a user to 
> approach TVM and that isn't requiring a developers to learn the bare minimum 
> (A2). It also diverges from other compilers where you'd expect 
> mcpu/mtriple/etc to infer much of this information for you (A1).

Right, in fact I share the same confusion with you with it comes to 
CUDA/Vulkan/etc., where (to best of my knowledge) there is no precedent of 
mcpu/mattr/mtriple/etc (except for LLVM NVPTX backend). Certainly, we should 
develop unified approach to address this confusion, which could possibly be the 
`arch` proposed in the RFC.

> I don't mind using one parser, I was just making it more granular and 
> avoiding stepping on peoples toes with the existing pre-processor. Either 
> design seems to fulfil your principle of customisability (A3), more 
> granularly means you only have to implement a small piece of customisation 
> where-as with the single parser it requires some thought as to how to cater 
> for all attributes within a given TargetKind parser.

Thanks for sharing your thoughts! Definitely. The issue of having multiple 
parsers is that the order of their application is less explicit, so I would be 
a little bit leaning towards a single one, while the implementation could share 
common helper functions.

> > Where to dispatch target parsers. Currently, the preprocessor is dispatched 
> > solely based on TargetKind, which is admittedly a bit limited and 
> > overlooked the possiblity that aarch64 and x86 may need completely 
> > different parsers. Therefore, here we would love to raise a question: based 
> > on which attribute the parser should be dispatched? Previous wisdom in 
> > clang seems to suggest dispatching according to mtriple (IIUC), so shall we 
> > introduce anything similar, for example, dispatching based on --device 
> > aarch64-foo-bar?
>
> This requires developers to learn more than the bare minimum (A2), as users 
> are required to learn the specifics of how to specify a Target in TVM. If an 
> llvm Target supports -mcpu/-mattr/-mtriple then all of these fields can be 
> reasonably expected to allow the Target to infer features given the relation 
> to GCC/LLVM (A1).

Right. I agree. `TargetKind`-based dispatch is certainly the clearest approach 
as you mentioned (A2). The reason I'm thinking about (not 100% sure though) 
other directions is: what if different vendors (say Qualcomm and ARM) want to 
customize their own target parsing logic separately, while their `TargetKind` 
is the same `llvm`? Admittedly I don't have a perfect answer for this and would 
love to hear about your thoughts.

> Based on LLVM, the Features inferred from a given CPU/GPU profile can be 
> extensive already, which would lead to a drastic increase in attrs if we were 
> to track them this way. ... Therefore I would strongly advise following the 
> pattern (as in LLVM) of using a separate structure for this, clear boundaries 
> provide a much more straight forward developer experience (A2) and allow us 
> to do things as @Lunderberg suggested like plucking just features or arch to 
> give to Passes rather than the entire Target.

Thank you for detailing the challenges! I'm convinced that `target.features` 
may be the best way to go. A couple of more questions:
- What's the difference between `target.features` and `target.arch`? If they 
are the same thing, I would say `features` is certainly a better name
- IIUC we are not going to introduce 2-level of indirection like 
`target.arch.features`. Is that correct?
- Is it possible to completely replace `target.attrs` with more structured 
configuration? For example, `target.features` for C4 in the Target RFC, while 
`target.hardware_limits` for maximum shared memory, etc (just a hypothetical 
name, please feel free to suggest better ones)

> I believe the src/target/parsers folder would provide for maximum 
> customisation given parsers can be re-used between TargetKinds and therefore 
> serves as a good starting point? Also happy for an RFC in 6 months to move 
> everything to a new folder if we re-factor the Targets 😸

Sounds good to me!

Overall, I think we are in broad agreement, and thanks for sharing all the 
context!

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1133541236
You are receiving this because you are subscribed to this thread.

Message ID: <apache/tvm-rfcs/pull/71/[email protected]>

Re: [apache/tvm-rfcs] Add Target Pre-processing RFC (PR #71)

Reply via email to