[Apache TVM Discuss] [Development/pre-RFC] [RFC] UMA: Universal Modular Accelerator Interface

Michael J Klaiber via Apache TVM Discuss Thu, 10 Feb 2022 08:33:31 -0800


Thanks @areusch and @jroesch for the input and great questions on this pre-RFC 👍.
We really appreciate it. As this is a pre-RFC, we felt it is really important
to get input from the TVM community as soon as possible :).

[quote="jroesch, post:2, topic:12039"]
There has been talking of unifying the partitioners to use target specific
annotations in the default fusion/partitioning flow, do you still need the
`UMAPartitioner` in this case? or is the goal here to build a stable API which
can map on to internal APIs as they change?
[/quote]
The intent of UMA is mostly to have a stable API, so mapping it onto another
partitioner activity could really make sense. Could you provide a pointer to a
description of the activity you have in mind?

[quote="jroesch, post:2, topic:12039"]
In that case would it make more sense to build a data structure representing
the patterns vs. using imperative APIs? i.e.
`self._register_pattern("conv1d_relu", conv1d_relu_pattern())`
[/quote]

That is great input! In our team discussion we concluded that your proposal to
build a data structure representation makes more sense. We are currently also
in favor of moving away from multiple base classes to a common
*UMABackendBase* class.
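
To make the data-structure idea concrete, here is a minimal sketch of how patterns could be declared as a table on a single base class. Everything in it is an assumption for illustration: the `patterns` attribute, the `match` helper, the tuple-of-operator-names encoding, and the `"dense"` entry are hypothetical, and `UMABackendBase` here is a plain Python stand-in rather than an existing TVM class. In TVM the patterns would be Relay dataflow patterns rather than tuples of strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Relay dataflow pattern:
# a sequence of operator names that must appear consecutively.
Pattern = Tuple[str, ...]


@dataclass
class UMABackendBase:
    """Illustrative common base class; not an existing TVM API."""

    target_name: str
    # Patterns are plain data, keyed by name, instead of imperative register calls.
    patterns: Dict[str, Pattern] = field(default_factory=dict)

    def match(self, ops: List[str]) -> List[str]:
        """Return the names of all registered patterns that occur in `ops`."""
        found = []
        for name, pattern in self.patterns.items():
            n = len(pattern)
            if any(tuple(ops[i:i + n]) == pattern for i in range(len(ops) - n + 1)):
                found.append(name)
        return found


backend = UMABackendBase(
    target_name="accelerator_a",
    patterns={
        "conv1d_relu": ("nn.conv1d", "nn.relu"),
        "dense": ("nn.dense",),
    },
)
matched = backend.match(["nn.conv1d", "nn.relu", "nn.softmax"])
```

Because the patterns are data, a partitioner (UMA's own or a unified one) could read the same table without the backend calling into it.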

[quote="jroesch, post:2, topic:12039"]
Then finally do you have an example of how the UMACodegen step would work?
[/quote]
Let me give you one pain point that shows why we think changes to the codegen
should be possible for the standard developer:
Adding an include statement like `#include "accelerator_a_lib.h"` to the target
code requires changing **codegen_c.cc** and recompiling (at least, that is the
solution we are aware of).
There are more cases like this, and we think that a Python interface is
required.

How would this work? There could be multiple ways, e.g. packed calls into
codegen_c. We are trying to think from the user/developer perspective first
here.
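
From the user perspective, the include case might look roughly like the sketch below. This is only an illustration under stated assumptions: `CodegenConfig`, `add_include`, and `emit_prologue` are hypothetical names, not existing TVM APIs, and the sketch only renders the text that would precede the generated C code. How this would be wired into codegen_c is left open.

```python
from typing import List


class CodegenConfig:
    """Hypothetical Python-side container for C codegen customizations."""

    def __init__(self) -> None:
        self._includes: List[str] = []

    def add_include(self, header: str) -> None:
        # Keep insertion order and skip duplicates.
        if header not in self._includes:
            self._includes.append(header)

    def emit_prologue(self) -> str:
        """Render the include lines that would precede the generated C code."""
        return "".join(f'#include "{h}"\n' for h in self._includes)


config = CodegenConfig()
config.add_include("accelerator_a_lib.h")
config.add_include("accelerator_a_lib.h")  # ignored, already registered
prologue = config.emit_prologue()
```

The point is that the developer stays in Python and does not need to rebuild TVM to add a header.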

[quote="areusch, post:3, topic:12039"]
One of the challenges with adding several different lowering flows to TVM is
understanding the advantages and drawbacks of each
(hopefully there are really not so many drawbacks to any flow, but as with any
system I’m sure they exist). At a high level, it’d be
great if you guys could add additional motivation where you depart from the
standard flow to explain what is difficult to accomplish
with the existing standard flow
[/quote]
We are under the impression that there is no "standard flow" for accelerators.
There are many paths through the TVM flow that lead to the same outcome.
The difficult parts for a developer who has to integrate an accelerator are:
* Defining the steps from Relay graph to TIR and from TIR to target code
* Finding the hooks to register custom transformations for a new accelerator
* For some changes, a developer has to change the TVM code base and recompile.
It is more convenient for a developer to call a Python interface
than to change C++ code and recompile
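
As a rough sketch of what a single registration point for the points above could look like, the following uses plain Python with strings standing in for IR modules. All names (`LoweringPipeline`, `register`, `run`, the phase names) are hypothetical and chosen for illustration only; real passes would operate on TVM IR modules.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Stand-in for a pass: takes a "module" (here just a string) and returns a new one.
Pass = Callable[[str], str]


class LoweringPipeline:
    """Hypothetical registry that makes the lowering steps and hooks explicit."""

    # Fixed phase order: Relay graph -> TIR -> target code.
    PHASES = ("relay", "tir", "codegen")

    def __init__(self) -> None:
        self._passes: DefaultDict[str, List[Pass]] = defaultdict(list)

    def register(self, phase: str, fn: Pass) -> None:
        if phase not in self.PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._passes[phase].append(fn)

    def run(self, module: str) -> str:
        # Passes run phase by phase, in registration order within a phase.
        for phase in self.PHASES:
            for fn in self._passes[phase]:
                module = fn(module)
        return module


pipeline = LoweringPipeline()
pipeline.register("tir", lambda m: m + "|tir_pass")
pipeline.register("relay", lambda m: m + "|relay_pass")
result = pipeline.run("mod")
```

The phase order is fixed by the pipeline, so a developer only decides where a transformation belongs, not how to find the hook for it.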

[quote="areusch, post:3, topic:12039"]
[quote="MJKlaiber, post:1, topic:12039"]
The intention is to use TensorIR and Relax with MetaScheduler for optimization.
[/quote]

Just curious where you guys have gotten to with this part of the effort. Will
this be in the initial PR(s)?
[/quote]
TensorIR: yes

Relax: Probably not in the first PR. I attended the Relax meeting last time and
was impressed by the progress and the elegance of the interface.
An advantage of UMA would be that it is a stable API, i.e. the move from Relay
to Relax should be easier.

MetaScheduler: generally yes, but it depends on the timeline of the first PR.

[quote="areusch, post:3, topic:12039"]
Is it possible to output other things? e.g. if TIR-to-Runtime assembles binary
programming for an accelerator, is it possible to also output `.bin` or similar?
[/quote]

We currently assume that the primary target will be generated C code. Similar
to EthosU, binary command streams will be embedded in the generated C code.
Standalone binary command streams are planned, but we do not yet have a clear
opinion on how to implement them.
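
To illustrate what embedding a command stream in generated C code means, here is a minimal sketch that renders raw bytes as a C array definition. The function name `embed_command_stream`, the array name, and the layout are assumptions for illustration; this is not how the EthosU flow or UMA implements it.

```python
def embed_command_stream(name: str, data: bytes, per_line: int = 8) -> str:
    """Render `data` as a C array definition that can be placed in generated code."""
    if not data:
        # A zero-length array is not valid standard C, so reject empty streams.
        raise ValueError("command stream must not be empty")
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"static const unsigned char {name}[{len(data)}] = {{\n"
        f"{body}\n"
        f"}};\n"
    )


c_source = embed_command_stream("cmd_stream", bytes([0x01, 0x02, 0xFF]))
```

A standalone `.bin` output would write the same bytes to a file instead of rendering them as C source.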

We are also considering outputting other files, e.g. memory initialization
dumps and simulation graphs.
@areusch and @jroesch, maybe you can help us understand what the best
options would be in this case. We do not want to make major changes to
**codegen_c**.

CC: @cgerum @paulpb @philippvk @aca88 @SebastianBoblestETAS @r.stahl @jroesch
@areusch @tqchen

---
[Visit
Topic](https://discuss.tvm.apache.org/t/rfc-uma-universal-modular-accelerator-interface/12039/4)
to respond.
