Thanks @areusch and @jroesch for the input and great questions on this pre-RFC 👍.
We really appreciate it. As this is a pre-RFC, we felt it was really important 
to get input from the TVM community as soon as possible :)

[quote="jroesch, post:2, topic:12039"]
There has been talking of unifying the partitioners to use target specific 
annotations in the default fusion/partitioning flow, do you still need the 
`UMAPartitioner` in this case? or is the goal here to build a stable API which 
can map on to internal APIs as they change?
[/quote]
The intent of UMA is mostly to have a stable API, so mapping it onto another 
partitioner activity could really make sense. Could you provide a pointer to a 
description of the activity you have in mind?

[quote="jroesch, post:2, topic:12039"]
In that case would it make more sense to build a data structure representing 
the patterns vs. using imperative APIs? i.e. 
`self._register_pattern("conv1d_relu", conv1d_relu_pattern())`
[/quote]

That is great input! In our team discussion we concluded that your proposal to 
build a data structure representation makes more sense. We are currently also 
in favor 
of moving away from multiple base classes to a common *UMABackendBase* class.
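
To make the data structure idea concrete, here is a minimal sketch of how a pattern table on a common base class could look. It is only an illustration under assumptions: `OpPattern` is a plain-Python stand-in for a Relay dataflow pattern, `MyAcceleratorBackend` and the `patterns` property are hypothetical names, and none of this is the final UMA API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class OpPattern:
    """Stand-in for a Relay dataflow pattern: a linear chain of operator names."""
    ops: Tuple[str, ...]


def conv1d_relu_pattern() -> OpPattern:
    # Hypothetical helper, named after the call in the quoted snippet.
    return OpPattern(ops=("nn.conv1d", "nn.relu"))


class UMABackendBase:
    """Hypothetical common base class; not the actual UMA implementation."""

    def __init__(self) -> None:
        # The pattern table is plain data, so a partitioner could consume it
        # without calling back into backend-specific code.
        self._patterns: Dict[str, OpPattern] = {}

    def _register_pattern(self, name: str, pattern: OpPattern) -> None:
        if name in self._patterns:
            raise ValueError(f"pattern {name!r} is already registered")
        self._patterns[name] = pattern

    @property
    def patterns(self) -> Dict[str, OpPattern]:
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self._patterns)


class MyAcceleratorBackend(UMABackendBase):
    """Hypothetical backend for an example accelerator."""

    def __init__(self) -> None:
        super().__init__()
        self._register_pattern("conv1d_relu", conv1d_relu_pattern())


backend = MyAcceleratorBackend()
```

Because the registry is just a dictionary, it could later be mapped onto whatever partitioner API TVM settles on without changing backend code.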

[quote="jroesch, post:2, topic:12039"]
Then finally do you have an example of how the UMACodegen step would work?
[/quote]
Let me give you one pain point that shows why we think changes to the codegen 
should be possible for the standard developer:
Adding an include statement like `#include "accelerator_a_lib.h"` to the target 
code requires changing **codegen_c.cc** and recompiling (at least that is the 
solution we are aware of). 
There are more cases like this, and we think that a Python interface is 
required.

How would this work? There could be multiple ways, e.g. packed calls into 
codegen_c. We are trying to think from the user/developer perspective first 
here.
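
From the developer's point of view, the include case might look roughly like the sketch below. Everything in it is an assumption for illustration: `CodegenCustomizer`, `add_include`, and `emit` are hypothetical names that do not exist in TVM, and the sketch simply prepends text to already generated C source rather than hooking into codegen_c.

```python
from typing import List


class CodegenCustomizer:
    """Hypothetical Python-side hook; no such class exists in TVM."""

    def __init__(self) -> None:
        self._includes: List[str] = []

    def add_include(self, header: str, system: bool = False) -> None:
        # System headers use <...>, accelerator library headers use "...".
        line = f"#include <{header}>" if system else f'#include "{header}"'
        if line not in self._includes:  # keep the prelude free of duplicates
            self._includes.append(line)

    def emit(self, generated_c: str) -> str:
        """Prepend the registered includes to already generated C source."""
        prelude = "\n".join(self._includes)
        return f"{prelude}\n\n{generated_c}" if prelude else generated_c


customizer = CodegenCustomizer()
customizer.add_include("accelerator_a_lib.h")
# accelerator_a_start() is a made-up function from the made-up header.
source = customizer.emit("void run(void) { accelerator_a_start(); }\n")
```

The point of the sketch is the user-facing shape: a developer registers the header from Python and never touches or recompiles the C++ code.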


            
[quote="areusch, post:3, topic:12039"]
One of the challenges with adding several different lowering flows to TVM is 
understanding the advantages and drawbacks of each 
(hopefully there are really not so many drawbacks to any flow, but as with any 
system I’m sure they exist). At a high level, it’d be
great if you guys could add additional motivation where you depart from the 
standard flow to explain what is difficult to accomplish 
 with the existing standard flow
[/quote]
We are under the impression that there is no "standard flow" for accelerators. 
There are many paths through the TVM flow that lead to the same outcome.
The following is difficult for a developer who has to integrate an accelerator:
* Defining the steps from Relay graph to TIR and from TIR to target code
* Finding the hooks to register custom transformations for a new accelerator
* Some changes require modifying the TVM code base and recompiling. It is more convenient for a developer to call a Python interface than to change C++ code and recompile

            
[quote="areusch, post:3, topic:12039"]
[quote="MJKlaiber, post:1, topic:12039"]
The intention is to use TensorIR and Relax with MetaScheduler for optimization.
[/quote]

Just curious where you guys have gotten to with this part of the effort. Will 
this be in the initial PR(s)?
[/quote]
TensorIR: yes

Relax: Probably not in the first PR. I attended the last Relax meeting and 
was impressed by the progress and the elegance of the interface. 
An advantage of UMA would be that it is a stable API, i.e. the move from Relay 
to Relax should be easier.

MetaScheduler: generally yes, depending on the timeline of the first PR.
            
[quote="areusch, post:3, topic:12039"]
Is it possible to output other things? e.g. if TIR-to-Runtime assembles binary 
programming for an accelerator, is it possible to also output `.bin` or similar?
[/quote]

We currently assume that the primary target will be generated C code. Similar 
to Ethos-U, binary command streams will be embedded in the generated C code.
Standalone binary command streams are planned, but we do not have a clear 
opinion on how to implement them.
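
As a rough illustration of what embedding a command stream in generated C code means, the sketch below renders raw bytes as a C array definition. The function name `embed_command_stream`, the array name `cmd_stream`, and the example bytes are all made up; this is not how the Ethos-U integration or UMA actually emits code.

```python
def embed_command_stream(name: str, data: bytes, per_line: int = 8) -> str:
    """Render `data` as a C byte array definition named `name`."""
    if not data:
        # Zero-length arrays are not valid standard C, so refuse empty input.
        raise ValueError("command stream must not be empty")
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(rows)
    return (
        "#include <stdint.h>\n\n"
        f"static const uint8_t {name}[{len(data)}] = {{\n{body}\n}};\n"
    )


# Made-up bytes standing in for an accelerator command stream.
c_source = embed_command_stream("cmd_stream", bytes([0x01, 0x02, 0xFF]))
```

A standalone `.bin` output would write the same bytes to a file instead of formatting them as source text, which is why it needs a different output path than the C codegen.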

We are also considering outputting other files, e.g. memory initialization 
dumps and simulation graphs.
@areusch and @jroesch, maybe you can help us understand what the best 
options would be in this case. We do not want to make a major change to 
**codegen_c**.


CC: @cgerum @paulpb @philippvk @aca88 @SebastianBoblestETAS @r.stahl @jroesch 
@areusch @tqchen





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/rfc-uma-universal-modular-accelerator-interface/12039/4)
 to respond.
