@ccjoechou Summarizing our discussion a bit:

- Marvell is interested in being able to arbitrarily partition a Relay graph into hardware-accelerated and non-hardware-accelerated parts.
- The boundaries between these parts are to be determined by the Marvell backend; therefore, some additional control is needed over the default behavior provided by MergeComposite.
- @mbs-octoml suggests using the [StopFusion annotation](https://github.com/apache/tvm/blob/main/src/relay/op/annotation/annotation.h#L38) to manually enforce the boundaries. These annotations could be added programmatically via a Relay IRModule pass. StopFusion is consulted by the [FuseOps pass](https://github.com/apache/tvm/blob/main/src/relay/transforms/fuse_ops.cc#L896) to avoid fusing across a boundary.
- Using this approach, the Marvell partitioning pass defined here could be simplified and the existing fusion pass could be reused.
- Marvell needs to be able to determine which:
  - imported ONNX operator is responsible for a given Relay node
  - Relay node is responsible for a given TIR CallNode

  This mapping is needed at two times:
  1. At compile time, to serve as a reference to the boundary nodes between a hardware-accelerated and a non-hardware-accelerated subgraph
  2. At runtime, to determine which backend operator to call

  A follow-up question here from me: at runtime, couldn't you just emit the call to the correct backend operator? I wonder whether this mapping was needed because of previous difficulties configuring the TVM partitioner (it would sometimes fuse across a desired boundary). Given the improved partitioning approach mentioned above, is it possible to avoid needing this reference at runtime? That doesn't solve the problem of identifying a Relay node at compile time. However, if we can reduce this to a purely compile-time need, perhaps we can develop an alternate way to refer to a Relay node given Relay source code, other than adding an id to the IR. cc @tqchen @junrushao1994 in case they have ideas here.
- Marvell proposes to add a Relay attribute `exprnode_id` and export it from the compiled artifact to identify the Relay nodes that are fused into a particular subgraph.
- More broadly, source maps (e.g. mapping TIR to Relay to frontend operators) would help here.
- Right now the RFC proposes to create a new GraphExecutorCodegen. This might not be necessary if we could export the `exprnode_id` for Relay operators passed to BYOC. A suggestion is to create a Marvell-specific runtime::Module, modeled after [CUDAModule](https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_module.cc#L137), which contains several distinct pieces of generated code. Encoded this way, the `exprnode_id`s could be kept separate from any binary instructions. This pattern is common among GPU-offloaded runtime::Modules.
- Additionally, note the [SaveToFile](https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_module.cc#L70) override, which is invoked when `Module.save()` is called from Python. This allows you to walk the runtime::Module tree from Python and collect the various `exprnode_id`s into a single blob, e.g. JSON.
- @jroesch to comment on the Rust CI failures.
- Marvell would like to contribute a simulator that can run in TVM CI to test their accelerator. We discussed either adding the simulator to ci-cpu or creating a new ci-marvell, the method to do this, and the limitations of TVM CI.
- Marvell runs a patched version of the TVM CI internally. A primary reason patching is needed is that many tests in the TVM CI require an internet connection (e.g. to download models), but their CI runs in a sandbox. It would be particularly helpful to mark such tests (e.g. via `pytest.mark`) in order to make them easy to skip. We also discussed pre-populating the download_testdata cache and patching `pytest.skip` into download_testdata on their internal fork. cc @leandron @driazati @konturn for visibility and in case they have ideas here.
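As a shape sketch for the "walk the runtime::Module tree from Python" idea: the real `tvm.runtime.Module` exposes `imported_modules` and `type_key`, so a recursive walk collecting per-module metadata into one JSON blob is straightforward. The sketch below uses a stand-in `FakeModule` class (hypothetical, to keep the example self-contained) rather than a real compiled artifact, and `exprnode_ids` is the proposed attribute from this RFC, not an existing TVM field.

```python
import json


class FakeModule:
    """Stand-in for tvm.runtime.Module: has a type_key, optional
    per-module metadata, and a list of imported child modules."""

    def __init__(self, type_key, exprnode_ids=None, imports=()):
        self.type_key = type_key
        self.exprnode_ids = list(exprnode_ids or [])
        self.imported_modules = list(imports)


def collect_exprnode_ids(mod):
    """Walk the module tree and gather every module's exprnode_ids
    into a single JSON blob keyed by the module's type_key."""
    blob = {}
    stack = [mod]
    while stack:
        m = stack.pop()
        if m.exprnode_ids:
            blob[m.type_key] = m.exprnode_ids
        stack.extend(m.imported_modules)
    return json.dumps(blob, sort_keys=True)
```

With a real TVM artifact the same traversal would start from the loaded top-level module and read each submodule's saved metadata instead of a Python attribute.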
-- Reply to this email directly or view it on GitHub: https://github.com/apache/tvm-rfcs/pull/48#issuecomment-1056363210