Hi Michael, thanks for the proposal! Like others I'm very supportive of 
tightening up the BYOC interfaces.

My group here at OctoML have been looking at bringing a backend placement 
search capability to TVM, a la the 'Collage' paper 
(https://arxiv.org/pdf/2111.00655.pdf). Under that approach there's no longer a 
notion of a BYOC uniquely partitioning the graph according to its rules and 
heuristics in 'one shot'. Instead the BYOC must convey the rules (patterns, 
predicates) for which operators could potentially be offloaded, and leave the 
actual partitioning to the main Collage searcher.

Currently we have two mechanisms for conveying those rules:
 - pattern tables (triple of label, Relay pattern and predicate over the 
matched sub-expression)
 - per BYOC backend predicates associated with ops

My feeling is Collage would benefit if there was a well-known way of getting to 
the former, and we just port the latter over to the former to avoid a 
proliferation of equivalent mechanisms. Though there is a global pattern 
registry, it seems folks have realized it is not necessary to use it, so BYOC 
integrations are inconsistent in their use of it.
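
To make the porting idea concrete, here is a rough sketch in plain Python. It is a toy model rather than the TVM API: `Call`, `is_op`, `port_op_predicates`, `candidates` and the `mybackend` label are all hypothetical names made up for illustration. Each per-op predicate becomes a single-operator pattern table entry, so the searcher only ever needs to consult one mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Call:
    """Toy stand-in for a Relay call node (hypothetical, not the TVM class)."""
    op: str
    args: Tuple["Call", ...] = ()
    dtype: str = "float32"


# In this toy model a pattern is just a function saying whether an expression matches.
Pattern = Callable[[Call], bool]
Predicate = Callable[[Call], bool]
PatternEntry = Tuple[str, Pattern, Predicate]  # (label, pattern, predicate)


def is_op(name: str) -> Pattern:
    """Hypothetical helper: a pattern matching a single call to operator `name`."""
    return lambda expr: expr.op == name


def port_op_predicates(backend: str, op_predicates: Dict[str, Predicate]) -> List[PatternEntry]:
    """Re-express per-op predicates as equivalent single-operator pattern entries."""
    return [
        (f"{backend}.{op}", is_op(op), pred)
        for op, pred in sorted(op_predicates.items())
    ]


def candidates(expr: Call, table: List[PatternEntry]) -> List[str]:
    """Labels of entries whose pattern matches and whose predicate accepts `expr`."""
    return [label for label, pattern, pred in table if pattern(expr) and pred(expr)]


# Hypothetical per-op rules for an imaginary backend.
op_rules: Dict[str, Predicate] = {
    "nn.conv2d": lambda e: e.dtype == "float16",  # only fp16 convolutions
    "nn.relu": lambda e: True,                    # always supported
}
table = port_op_predicates("mybackend", op_rules)

conv_fp16 = Call("nn.conv2d", dtype="float16")
conv_fp32 = Call("nn.conv2d")
print(candidates(conv_fp16, table))
print(candidates(conv_fp32, table))
```

The entries only say what could be offloaded; deciding what actually is offloaded stays with the searcher.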

Collage would also benefit if BYOC backends could be represented by Targets (as 
@Mousius at ARM has been working towards). For example, both CUTLASS and 
TensorRT could be represented by Targets which refine that of the CUDA device. 
In this way the search space of placements can be controlled by including the 
relevant Targets in the list of heterogeneous targets, and the result of 
partitioning (irrespective of which implementation(s) actually do it) can be 
conveyed by a "target" annotation on a "Primitive" Relay Function.
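
A minimal sketch of how that could look, again in plain Python rather than against the real TVM `Target` class: the `Target` dataclass, its `refines` field, and `placements_for` are assumptions invented for this example. The point is only that the list of targets bounds the search space, and that the outcome is recorded as attributes on the partitioned function.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Target:
    """Toy target (hypothetical): a kind plus the target it refines, if any."""
    kind: str
    refines: Optional["Target"] = None

    def device(self) -> "Target":
        """Follow the `refines` chain back to the underlying device target."""
        t = self
        while t.refines is not None:
            t = t.refines
        return t


def placements_for(device: Target, targets: List[Target]) -> List[str]:
    """Kinds of the listed targets that may host work placed on `device`."""
    return [t.kind for t in targets if t.device() == device]


cuda = Target("cuda")
cutlass = Target("cutlass", refines=cuda)
tensorrt = Target("tensorrt", refines=cuda)
llvm = Target("llvm")

# Leaving TensorRT out of the list removes it from the search space.
search_space = placements_for(cuda, [cuda, cutlass, llvm])
print(search_space)

# The result of partitioning, conveyed as attributes on a function (toy dict form).
partitioned_attrs = {"Primitive": 1, "target": cutlass.kind}
print(partitioned_attrs)
```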

I don't think Collage has any implications for how lowering/codegen is 
dispatched, provided it is keyed by Target. However, personally I think it may 
be better if we decompose that into:
 - well-known places in the standard pipeline to insert new passes (especially 
just before built-in lowering)
 - a pass combinator that can filter based on "target" annotations

So part of registering a BYOC backend could be to both register the patterns 
and register the new passes wrapped by the above filtering combinator.
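
Roughly what I have in mind for the combinator, sketched in plain Python under heavy assumptions: functions are modelled as dicts, and `only_for_target`, `lower_with_mybackend` and `run_pipeline` are hypothetical names, not existing TVM passes or APIs.

```python
from typing import Any, Callable, Dict, List

Function = Dict[str, Any]            # toy function: {"attrs": {...}, "body": ...}
Pass = Callable[[Function], Function]


def only_for_target(target_kind: str, inner: Pass) -> Pass:
    """Wrap `inner` so it only rewrites Primitive functions annotated for `target_kind`."""
    def wrapped(func: Function) -> Function:
        attrs = func.get("attrs", {})
        if attrs.get("Primitive") and attrs.get("target") == target_kind:
            return inner(func)
        return func  # anything else passes through untouched
    return wrapped


def lower_with_mybackend(func: Function) -> Function:
    """Hypothetical backend-specific lowering pass."""
    return {**func, "body": f"mybackend_codegen({func['body']})"}


def run_pipeline(module: List[Function], passes: List[Pass]) -> List[Function]:
    """Apply each pass in order to every function in the module."""
    for p in passes:
        module = [p(f) for f in module]
    return module


module = [
    {"attrs": {"Primitive": 1, "target": "mybackend"}, "body": "conv2d"},
    {"attrs": {"Primitive": 1, "target": "cuda"}, "body": "dense"},
    {"attrs": {}, "body": "main"},
]

# Registering the backend would insert this wrapped pass just before built-in lowering.
lowered = run_pipeline(module, [only_for_target("mybackend", lower_with_mybackend)])
for f in lowered:
    print(f["body"])
```

With this shape the backend's passes never need to know about other backends; the filter keeps each one confined to its own annotated functions.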

Very happy to work on this with you all -- if we can get this right it will 
make our work much easier!

Best,
-Mark




