Hi @MJKlaiber,
Apologies for not getting back to this in time. Thanks for the proposal! It broadly looks like it wraps the Target Hooks RFC (by @Mousius): https://github.com/apache/tvm-rfcs/blob/main/rfcs/0010-target-registered-compiler-flow-customisation.md, and exposes a nice, structured interface to Python. It is nice to see progress on this :).

I would like to suggest some text changes for the formal RFC, for the benefit of those of us who are familiar with the existing flow (especially around naming).

[quote="MJKlaiber, post:1, topic:12039"]
UMA Partitioning:
* Register relay passes
* Register patterns - supported sub-graph operations
* Order: pre-partitioning passes, Graph partitioning, post-partitioning passes
* UMAPartitioner baseclass (Python only) has to be inherited by accelerator-specific Partitioners (e.g. Accelerator A Partitioner, etc)
[/quote]

Maybe it is worth mentioning that these are currently implemented as partition_for_\<(backend\target)\>?

https://github.com/apache/tvm/blob/f1ff61a7b9adb61eaa0db842bbf71c704d350154/python/tvm/relay/op/contrib/dnnl.py#L210
https://github.com/apache/tvm/blob/f1ff61a7b9adb61eaa0db842bbf71c704d350154/python/tvm/relay/op/contrib/tensorrt.py#L83
https://github.com/apache/tvm/blob/f1ff61a7b9adb61eaa0db842bbf71c704d350154/python/tvm/relay/op/contrib/ethosu.py#L1660

I am a bit curious why this interface is specifically positioned as an "accelerator" (as in UMA) partitioner, though. That is, would it not also be used for the optimized library support that we have today with BYOC?
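For readers less familiar with the partition_for_* helpers linked above, here is a rough pure-Python toy of the shape I have in mind: one function that runs pre-partitioning passes, the partitioning step, and post-partitioning passes in order. This is only a sketch and deliberately does not use the TVM API. The "module" is just a list of operator names, and every name in it (`run_pipeline`, `fold_identity`, `make_partitioner`, `merge_adjacent_regions`, `partition_for_toy_backend`, `toy_backend`) is hypothetical.

```python
from typing import Callable, List, Set

# Toy stand-in for an IRModule: just an ordered list of operator names.
Module = List[str]
Pass = Callable[[Module], Module]

PREFIX = "toy_backend."  # hypothetical backend name, not a real TVM target


def run_pipeline(mod: Module, passes: List[Pass]) -> Module:
    """Apply module-to-module passes one after another, in list order."""
    for p in passes:
        mod = p(mod)
    return mod


def fold_identity(mod: Module) -> Module:
    """Hypothetical pre-partitioning pass: drop no-op 'identity' operators."""
    return [op for op in mod if op != "identity"]


def make_partitioner(supported: Set[str]) -> Pass:
    """Hypothetical partitioning step: tag supported operators for offload."""
    def partition(mod: Module) -> Module:
        return [PREFIX + op if op in supported else op for op in mod]
    return partition


def merge_adjacent_regions(mod: Module) -> Module:
    """Hypothetical post-partitioning pass: fuse neighbouring offloaded ops."""
    merged: Module = []
    for op in mod:
        if op.startswith(PREFIX) and merged and merged[-1].startswith(PREFIX):
            merged[-1] = merged[-1] + "+" + op[len(PREFIX):]
        else:
            merged.append(op)
    return merged


def partition_for_toy_backend(mod: Module) -> Module:
    """Same shape as a partition_for_<backend> helper: pre, partition, post."""
    return run_pipeline(mod, [
        fold_identity,                         # pre-partitioning
        make_partitioner({"conv1d", "relu"}),  # graph partitioning
        merge_adjacent_regions,                # post-partitioning
    ])


result = partition_for_toy_backend(["conv1d", "identity", "relu", "softmax"])
print(result)  # ['toy_backend.conv1d+relu', 'softmax']
```

Nothing in this shape is specific to accelerators: the same three-stage ordering would serve a library backend equally well, which is part of why I am asking about the naming.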
[quote="MJKlaiber, post:1, topic:12039"]
```
class MyCustomAcceleratorPartitioner(UMAPartitioner):
    @property
    def target_name(self):
        return "my_custom_accelerator"

    def _register_patterns(self):
        self._register_pattern("conv1d_relu", conv1d_relu_pattern())

    def _register_relay_passes(self):
        self._register_relay_pass(1, ConfigGenerator())
        self._register_relay_pass(2, BufferScopeAnnotator())
```
[/quote]

Since the proposal suggests using properly registered targets, is there any reason we should stick to target_name (str) as opposed to the actual TargetKind?

Following up on the above question, what are your thoughts on moving the UMAPartitioner inside relay.build(...)?

Also, this seems to be proposed on top of S-TIR (as opposed to the "legacy" TE->TIR pipeline). Would you be able to share the motivation for separating tir_schedules and tir_passes? (I'm asking mainly because they will all be S-TIR --> S-TIR IRModule passes.)

Following from the above question, is there an ambition to hand S-TIR back to the core compiler?

Following up on Mark's comments,

[quote="mbs-octoml, post:5, topic:12039"]
My group here at OctoML have been looking at bringing a backend placement search capability to TVM, a la the ‘Collage’ paper ([https://arxiv.org/pdf/2111.00655.pdf](https://arxiv.org/pdf/2111.00655.pdf)). Under that approach there’s no longer a notion of a BYOC uniquely partitioning the graph according to its rules and heuristics in ‘one shot’. Instead the BYOC must convey the rules (patterns, predicates) for which operators could potentially be offloaded, and leave the actual partitioning to the main Collage searcher.
[/quote]

Mark, we are very much looking forward to the RFC for this, especially the reference-level explanation, to see where this work is headed. I believe it would be good to know that, given our mutual interest in structuring BYOC targets.
However, I think we all share the ambition to replace kCompiler strings with targets, if we can get more support from the community.