> Indeed it is important to avoid having a separate compute definition for each
> workload on a new target. In this particular case, all computation definitions
> would start with the original layout. Then there is a "schedule
> transformation" like transform layout which will generate the new stage as
> part of the scheduling process.

Thank you, and that is roughly how I'm seeing it as well: everything starts
with the base compute definition and is modified from there.  If I understand
correctly, the main differences are below.

* Option A: Layout transformations of inputs are allowed, but only during 
initial graph-level optimization.  When optimizing an individual PrimFunc, 
layout transformations of inputs and outputs are not allowed.

* Option B: Layout transformations of inputs and outputs are not allowed.  If
such a transformation is desired, it should instead be done by first
introducing a cache stage in TIR, then transforming the layout of the cache,
and finally running a graph-level transformation that inspects each PrimFunc
and hoists the cache stage out (a rough sketch of the first two steps is
below).
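
To make sure we're describing the same workflow, here's a rough sketch of
Option B's first two steps using the existing `cache_read` and
`transform_layout` primitives.  The PrimFunc and the index map are toy
examples, not part of the proposal.

```python
import tvm
from tvm.script import tir as T


@T.prim_func
def elementwise(A: T.Buffer((16, 16), "float32"),
                B: T.Buffer((16, 16), "float32")):
    for i, j in T.grid(16, 16):
        with T.block("compute"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0


sch = tvm.tir.Schedule(elementwise)
block = sch.get_block("compute")

# Step 1: introduce a cache stage so that the layout change applies to an
# internal buffer rather than to the PrimFunc's interface.
cache = sch.cache_read(block, 0, "global")

# Step 2: transform the layout of the cached copy (here, tiling the inner
# dimension by 4).  The interface buffer A keeps its original layout.
sch.transform_layout(cache, ("write", 0),
                     lambda i, j: [i, j // 4, j % 4])

# Step 3 (not shown): a graph-level pass would inspect the PrimFunc and
# hoist the cache stage out, exposing the new layout at the graph level.
```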

> The particular stage can be marked, which contains effectively the same 
> information as BufferConstraint, except that it does not introduce new data 
> structures. During global layout reflowing, such information can be used to 
> guide the reflowing to reconstruct a data structure like BufferConstraint or 
> other Layout mappings and use that to serve the same purpose.

So long as the constraints can be statically searched for, this approach makes
sense to me.  I would be more concerned about adding additional semantics to
existing nodes, such as an AttrStmt node, since passes would then need to be
aware not only that the constraint exists, but also that it must be
reconstructed from the existing data structure.  That would make it much more
difficult for a static analysis tool to identify the locations where the
constraints must be updated.
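
Roughly, the kind of static search I have in mind is below.  The
"layout_constraint" annotation key is purely hypothetical, just to illustrate
the lookup that every layout-mutating pass would need to perform.

```python
import tvm
from tvm.tir.stmt_functor import post_order_visit


def find_constraint_blocks(prim_func, key="layout_constraint"):
    """Collect every block whose annotations carry the (hypothetical) key."""
    found = []

    def visit(node):
        # Blocks expose their annotations directly, so a constraint stored
        # there can be discovered without reconstructing it from the
        # surrounding statements.
        if isinstance(node, tvm.tir.Block) and key in node.annotations:
            found.append(node)

    post_order_visit(prim_func.body, visit)
    return found


# e.g. constraint_blocks = find_constraint_blocks(mod["main"])
```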

As a potential way forward, what if we start by implementing pad values only
for buffers that are allocated internally to a function?  This would be
allowed behavior under both Option A and Option B, and would help determine
how difficult it is to reconstruct the constraints from the transformation
block without any additional annotation.  That in turn could help establish
whether additional annotations are necessary, regardless of whether they are
stored alongside the Buffer itself or in a separate attribute/annotation.
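
As a toy example of that starting point (names and shapes are illustrative
only, and `pad_value` here is the argument proposed in this RFC rather than an
existing one):

```python
import tvm
from tvm.script import tir as T


@T.prim_func
def scale(A: T.Buffer((14,), "float32"), B: T.Buffer((14,), "float32")):
    for i in T.serial(14):
        with T.block("compute"):
            vi = T.axis.spatial(14, i)
            B[vi] = A[vi] * 2.0


sch = tvm.tir.Schedule(scale)
block = sch.get_block("compute")

# The cached copy of A is allocated inside the function, so padding it does
# not change the PrimFunc's interface (allowed under both options).
cache = sch.cache_read(block, 0, "global")

# Tiling 14 elements by 4 implicitly pads the internal buffer up to 16.  The
# proposed pad_value records what the out-of-range elements hold, which is
# exactly the information a BufferConstraint-like structure would need to be
# reconstructed from this cache stage.
sch.transform_layout(cache, ("write", 0),
                     lambda i: [i // 4, i % 4],
                     pad_value=0.0)
```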
