Hi @wrongtest, thanks for the nice write-up.

> Currently, we have to hack the compile engine to find the pre-scheduled 
> PrimFunc from a standalone cache, we are glad to know what is the best way to 
> achieve this goal.

Here are some thoughts; please correct any misunderstandings I might have.

Yes, the relay_to_tir pass that @Mousius added a few months back could work, 
since it seems you want to take over scheduling completely. You can use 
annotations to convey layout hints from your analysis pass to lowering (or just 
compute them on the fly in your hook, though you're probably doing a global 
analysis?). You'd have to implement caching yourself: when Chris and I were 
deciding whether caching was worth building into the relay_to_tir machinery, 
the consensus was that it is straightforward to implement directly in each 
hook function. That gives you full control over both the conversion to TIR 
and the rewritten call_lowered you leave behind. I think everyone would be 
happy to extend that if you find it lacking.
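
To make the "cache inside the hook" idea concrete, here is a minimal sketch in 
plain Python. It deliberately avoids any TVM API: `LoweringCache`, `make_key` 
and the `lower` callback are hypothetical names for illustration only. In a 
real hook the key might be built from something like 
`tvm.ir.structural_hash` of the primitive function plus the target and your 
layout hints, but that choice is an assumption on my part.

```python
from typing import Any, Callable, Dict, Hashable, Mapping, Tuple


def make_key(func_hash: int, target: str, layout_hints: Mapping[str, str]) -> Tuple:
    """Build a hashable cache key (hypothetical scheme, not a TVM API)."""
    # Sort the hints so that dict ordering does not affect the key.
    return (func_hash, target, tuple(sorted(layout_hints.items())))


class LoweringCache:
    """Memoizes a lowering callback so each distinct key is lowered once."""

    def __init__(self, lower: Callable[[Any], Any]) -> None:
        self._lower = lower                       # stand-in for "schedule and lower"
        self._entries: Dict[Hashable, Any] = {}   # key -> already lowered result
        self.hits = 0
        self.misses = 0

    def lookup(self, key: Hashable, func: Any) -> Any:
        """Return the cached result for key, lowering func on a miss."""
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = self._lower(func)
        return self._entries[key]
```

The hook would call `lookup` for every primitive function it visits and emit 
the call_lowered against whatever comes back, so identical primitives with 
identical hints share one pre-scheduled PrimFunc.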

We've also been mulling over another approach to incremental layout 
optimization, though it's by no means ready to use out-of-the-box (but maybe it 
sparks your interest?). We can now invoke lowering multiple times, and with a 
bit more work we could even restrict lowering to trigger on only particular 
'focus' primitive functions, i.e. we don't have to lower all at once. We've also 
done some legwork to allow virtual device annotations to flow both into and out 
of already lowered PrimFuncs, albeit currently only for memory scope and not 
layout. Putting those together, we could imagine:
 - allow layout constraints to appear in VirtualDevices, just as we now do for 
memory/storage scope.
 - choose a subset of 'critical' primitives (maybe just one) to lower, and let 
lowering choose the best layout freely. Capture that choice in the 
PrimFunc using VirtualDevices on the arguments.
 - re-run device planning to flow the new layout constraints to 
yet-to-be-lowered primitives. Where layouts have a hard disagreement, insert the 
necessary layout transforms as per the bijections you describe.
 - re-run lowering on the next set of 'critical' primitives, this time 
respecting any layout constraints already imposed on the arguments. As 
before, any still unconstrained arguments can have their layout chosen during 
lowering.
 - repeat until all primitives are lowered.
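
The loop above can be sketched as a toy in plain Python. Nothing here is TVM 
API: `Primitive`, `plan_layouts`, the layout strings and the priority field 
are all hypothetical, one primitive is lowered per round, and "device 
planning" is collapsed into a shared dict of tensor layouts. It only shows 
the control flow, not a real implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Primitive:
    name: str
    args: List[str]               # tensors the primitive reads or writes
    preferred: Dict[str, str]     # layout it would pick if given a free choice
    required: Dict[str, str] = field(default_factory=dict)  # hard requirements
    priority: int = 0             # higher means more 'critical'


def plan_layouts(prims: List[Primitive]):
    layouts: Dict[str, str] = {}                      # tensor -> imposed layout
    transforms: List[Tuple[str, str, str, str]] = []  # (tensor, src, dst, consumer)
    order: List[str] = []
    # Lower the most critical primitive first, then the next, and so on.
    for prim in sorted(prims, key=lambda p: -p.priority):
        for arg in prim.args:
            imposed = layouts.get(arg)
            hard = prim.required.get(arg)
            if imposed is None:
                # Unconstrained argument: this primitive gets to choose.
                layouts[arg] = hard or prim.preferred.get(arg, "default")
            elif hard is not None and hard != imposed:
                # Hard disagreement: record a layout transform to insert.
                transforms.append((arg, imposed, hard, prim.name))
            # Otherwise the primitive respects the layout already imposed.
        order.append(prim.name)
    return layouts, transforms, order


prims = [
    Primitive("softmax", ["z", "w"], {"w": "NCHW"}, required={"z": "NCHW"}),
    Primitive("relu", ["y", "z"], {"y": "NCHW", "z": "NCHW16c"}, priority=1),
    Primitive("conv", ["x", "y"], {"x": "NCHW16c", "y": "NCHW16c"}, priority=2),
]
layouts, transforms, order = plan_layouts(prims)
```

In this toy run the conv picks its layouts first, the relu inherits the 
conv's choice for `y`, and the softmax's hard requirement on `z` shows up as 
a single recorded transform rather than as a rejected schedule.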

Would be happy to talk more about that if you see a connection.




