Hi Michael, thanks for the proposal! Like others, I'm very supportive of tightening up the BYOC interfaces.
My group here at OctoML has been looking at bringing a backend placement search capability to TVM, à la the 'Collage' paper (https://arxiv.org/pdf/2111.00655.pdf). Under that approach, a BYOC backend no longer partitions the graph uniquely, according to its own rules and heuristics, in 'one shot'. Instead, the BYOC backend must convey the rules (patterns and predicates) describing which operators could potentially be offloaded, and leave the actual partitioning to the main Collage searcher.

Currently we have two mechanisms for conveying those rules:
- pattern tables (triples of label, Relay pattern, and predicate over the matched sub-expression)
- per-BYOC-backend predicates associated with ops

My feeling is that Collage would benefit if there were a well-known way of getting to the former, and if we simply ported the latter over to the former to avoid a proliferation of equivalent mechanisms. Though there is a global pattern registry, it seems folks have realized it is not necessary to use it, so BYOC integrations are inconsistent in their use of it.

Collage would also benefit if BYOC backends could be represented by Targets (as @Mousius at ARM has been working towards). For example, both CUTLASS and TensorRT could be represented by Targets which refine that of the CUDA device. In this way, the search space of placements can be controlled by including the relevant Targets in the list of heterogeneous targets, and the result of partitioning (irrespective of which implementation(s) actually do it) can be conveyed by a "target" annotation on a "Primitive" Relay Function. I don't think Collage has any implications for how lowering/codegen is dispatched, provided it is keyed by Target.
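To make the pattern-table idea concrete, here is a minimal, TVM-free Python sketch under stated assumptions: `Call`, `Target`, `Function`, `match_op`, `match_chain`, `from_op_predicate`, `PATTERN_TABLES`, `candidates`, `outline` and the backend name `my_byoc` are all hypothetical stand-ins, not TVM APIs, and patterns are modeled as plain Python matchers rather than Relay dataflow patterns. It shows a pattern table as (label, pattern, predicate) triples, a per-op predicate ported into a single-op entry, a BYOC Target refining the CUDA Target, a searcher-side helper that only enumerates candidate placements, and a chosen placement recorded as a "Primitive" function carrying a "target" attribute.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Call:
    """Toy stand-in for a Relay call: operator name, arguments, attributes."""
    op: str
    args: List["Call"] = field(default_factory=list)
    attrs: Dict[str, object] = field(default_factory=dict)


# Patterns and predicates are both modeled as plain matchers over a node.
Matcher = Callable[[Call], bool]
PatternTableEntry = Tuple[str, Matcher, Matcher]  # (label, pattern, predicate)


def match_op(name: str) -> Matcher:
    """Matches a single call to operator `name`."""
    return lambda node: node.op == name


def match_chain(outer: str, inner: str) -> Matcher:
    """Matches outer(inner(...), ...)."""
    return lambda node: (node.op == outer and len(node.args) > 0
                         and node.args[0].op == inner)


def from_op_predicate(backend: str, op: str, pred: Matcher) -> PatternTableEntry:
    """Port a per-op support predicate to a single-op pattern-table entry."""
    return (f"{backend}.{op}", match_op(op), pred)


@dataclass(frozen=True)
class Target:
    kind: str
    parent: Optional["Target"] = None  # a BYOC Target refining a device Target


@dataclass
class Function:
    body: Call
    attrs: Dict[str, object] = field(default_factory=dict)


CUDA = Target("cuda")
MY_BYOC = Target("my_byoc", parent=CUDA)  # hypothetical backend refining CUDA

# Hypothetical pattern table registered by the hypothetical backend.
PATTERN_TABLES: Dict[Target, List[PatternTableEntry]] = {
    MY_BYOC: [
        ("my_byoc.conv2d_relu", match_chain("nn.relu", "nn.conv2d"), lambda n: True),
        from_op_predicate("my_byoc", "nn.dense",
                          lambda n: n.attrs.get("units", 0) % 8 == 0),
    ],
}


def candidates(node: Call, targets: List[Target]) -> List[Tuple[Target, str]]:
    """All (target, label) placements the backends allow at `node`.

    Backends only say what *could* be offloaded; choosing among these
    candidates is left to the searcher.
    """
    out = []
    for target in targets:  # the search space is controlled by this list
        for label, pattern, predicate in PATTERN_TABLES.get(target, []):
            if pattern(node) and predicate(node):
                out.append((target, label))
    return out


def outline(node: Call, target: Target) -> Function:
    """Record a chosen placement as a "Primitive" function with a "target" attribute."""
    return Function(node, {"Primitive": 1, "target": target})
```

For example, `candidates(Call("nn.relu", [Call("nn.conv2d")]), [CUDA, MY_BYOC])` yields only the `my_byoc.conv2d_relu` placement, since in this sketch the CUDA Target has no pattern table. Whichever component ends up choosing among the candidates, its decision is carried solely by the `target` attribute on the outlined function, which is what lowering/codegen dispatch would key on.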
However, personally I think it may be better if we decompose lowering/codegen dispatch into:
- well-known places in the standard pipeline to insert new passes (especially just before built-in lowering)
- a pass combinator that can filter based on "target" annotations

So part of registering a BYOC backend could be to both register the patterns and register the new passes wrapped by the above filtering combinator.

Very happy to work on this with you all. If we can get this right, it will make our work much easier!

Best,
-Mark

---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-uma-universal-modular-accelerator-interface/12039/5) to respond.
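The filtering pass combinator and registration step proposed above can be sketched as follows. This is a minimal, TVM-free illustration under stated assumptions: a module is a plain dict of name to `Function`, targets are plain strings standing in for Target objects, and `only_for_target`, `PIPELINE_HOOKS`, the `"before_lowering"` hook name, `PATTERN_REGISTRY`, `register_byoc_backend`, `run_hook` and `my_byoc_lower` are hypothetical names, not existing TVM APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Function:
    """Toy stand-in for a Relay function: a body string plus attributes."""
    body: str
    attrs: Dict[str, object] = field(default_factory=dict)


Module = Dict[str, Function]  # global name -> function
FunctionPass = Callable[[Function], Function]
ModulePass = Callable[[Module], Module]


def only_for_target(target: str, fn_pass: FunctionPass) -> ModulePass:
    """Lift fn_pass to a module pass that only rewrites functions whose
    "target" attribute equals `target`; all other functions pass through."""
    def run(mod: Module) -> Module:
        return {name: fn_pass(f) if f.attrs.get("target") == target else f
                for name, f in mod.items()}
    return run


# Hypothetical well-known insertion point in the standard pipeline.
PIPELINE_HOOKS: Dict[str, List[ModulePass]] = {"before_lowering": []}
PATTERN_REGISTRY: Dict[str, list] = {}


def register_byoc_backend(target: str, patterns: list,
                          passes: List[FunctionPass]) -> None:
    """Register a backend's patterns plus its passes, each wrapped by the filter."""
    PATTERN_REGISTRY[target] = patterns
    for p in passes:
        PIPELINE_HOOKS["before_lowering"].append(only_for_target(target, p))


def run_hook(point: str, mod: Module) -> Module:
    """Run every pass registered at the given insertion point, in order."""
    for module_pass in PIPELINE_HOOKS[point]:
        mod = module_pass(mod)
    return mod


# Usage: a hypothetical backend pass that only "lowers" its own functions.
def my_byoc_lower(f: Function) -> Function:
    return Function(f"my_byoc:{f.body}", dict(f.attrs))


register_byoc_backend("my_byoc", patterns=["my_byoc.conv2d_relu"],
                      passes=[my_byoc_lower])

mod = {
    "main": Function("call(f0, f1)"),
    "f0": Function("conv2d+relu", {"Primitive": 1, "target": "my_byoc"}),
    "f1": Function("softmax", {"Primitive": 1, "target": "cuda"}),
}
lowered = run_hook("before_lowering", mod)
```

Here only `f0` is rewritten; `main` and the CUDA-targeted `f1` pass through untouched. Keeping the filter in a combinator means a backend author writes ordinary passes, while the decision about which functions they apply to stays with whatever performed the partitioning.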