Hi @wrongtest, thanks for the nice write-up.

> Currently, we have to hack the compile engine to find the pre-scheduled
> PrimFunc from a standalone cache, we are glad to know what is the best way to
> achieve this goal.

Here are some thoughts; please correct any misunderstandings I might have.

Yes, the relay_to_tir pass that @Mousius added a few months back could work, since it seems you want to completely take over scheduling. You can use annotations to convey layout hints from your analysis pass to lowering (or just compute them on the fly in your hook, though you're probably doing a global analysis?). You'd have to implement caching yourself, but when Chris and I were trying to decide whether caching was worth building into the relay_to_tir machinery, the consensus was that it is straightforward to implement directly in each hook function. So that gives you full control over both the conversion to TIR and the rewritten call_lowered you leave behind. I think everyone would be happy to extend that if you find it lacking.

We've also been mulling over another approach to incremental layout optimization, though it's by no means ready to use out of the box (but maybe it sparks your interest?). We can now invoke lowering multiple times, and with a bit more work we could even restrict lowering to trigger only on particular 'focus' primitive functions, i.e. we don't have to lower all at once. We've also done some legwork to allow virtual device annotations to flow both into and out of already-lowered PrimFuncs, albeit currently only for memory scope and not layout. Putting those together, we could imagine:

- allow layout constraints to appear in VirtualDevices, just as we now do for memory/storage scope.
- choose a subset of 'critical' primitives (maybe just one) to lower, and give lowering free choice of the best layout. Capture that choice in the PrimFunc using VirtualDevices on the arguments.
- re-run device planning to flow the new layout constraints to yet-to-be-lowered primitives.
  Where layouts have a hard disagreement, insert the necessary layout transforms, as per the bijections you describe.
- re-run lowering on the next set of 'critical' primitives, this time respecting any layout constraints already imposed on the arguments; as before, any still-unconstrained arguments can have their layout chosen during lowering.
- repeat until all primitives are lowered.

Would be happy to talk more about that if you see a connection.
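
P.S. To make the incremental loop above a bit more concrete, here is a toy Python sketch. None of it is TVM API: `Prim`, `lower_incrementally`, the layout strings and the `critical` ranking are all hypothetical stand-ins, and a plain dict plays the role of device planning flowing layout constraints between primitives. It only illustrates the order of decisions, not how real lowering would pick a layout.

```python
from dataclasses import dataclass


@dataclass
class Prim:
    """Hypothetical stand-in for a not-yet-lowered primitive function."""
    name: str
    tensors: tuple        # names of the argument/result tensors it touches
    preferred: str        # layout it picks when a tensor is unconstrained
    supported: frozenset  # layouts it can accept without a transform
    critical: int = 0     # higher value = lowered in an earlier round


def lower_incrementally(prims):
    """Lower one 'focus' primitive per round, most critical first."""
    constraints = {}  # tensor -> layout fixed by an already-lowered primitive
    chosen = {}       # (prim name, tensor) -> layout used inside that primitive
    transforms = []   # (tensor, constrained layout, local layout, prim name)

    for prim in sorted(prims, key=lambda p: -p.critical):
        for tensor in prim.tensors:
            fixed = constraints.get(tensor)
            if fixed is None:
                # Unconstrained: lowering chooses freely, and the choice
                # becomes a constraint that later rounds must respect.
                constraints[tensor] = prim.preferred
                chosen[(prim.name, tensor)] = prim.preferred
            elif fixed in prim.supported:
                # Already constrained and acceptable: just adopt it.
                chosen[(prim.name, tensor)] = fixed
            else:
                # Hard disagreement: keep the local layout and record
                # that a layout transform is needed on this tensor.
                transforms.append((tensor, fixed, prim.preferred, prim.name))
                chosen[(prim.name, tensor)] = prim.preferred
    return chosen, transforms


# Assumed example graph: conv -> relu -> pool, chained through tensors y and z.
prims = [
    Prim("conv", ("x", "y"), "NCHW16c", frozenset({"NCHW16c"}), critical=2),
    Prim("relu", ("y", "z"), "NCHW", frozenset({"NCHW", "NCHW16c"})),
    Prim("pool", ("z", "out"), "NHWC", frozenset({"NHWC"}), critical=1),
]
chosen, transforms = lower_incrementally(prims)
print(chosen)
print(transforms)
```

In this toy run conv and pool get their free choice first, relu then adopts conv's layout on `y`, and the only recorded transform is on `z`, where relu cannot accept pool's layout.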