> Indeed it is important to avoid having a separate compute definition for each workload on a new target. In this particular case, all computation definition would start with the original layout. Then there is a "schedule transformation" like transform layout which will generate the new stage as part of the scheduling process.

Thank you, and that is roughly how I'm seeing it as well: everything starts with the base compute definition and is modified from there. If I understand correctly, the main differences are below.

* Option A: Layout transformations of inputs are allowed, but only during initial graph-level optimization. When optimizing an individual PrimFunc, layout transformations of inputs and outputs are not allowed.
* Option B: Layout transformations of inputs and outputs are not allowed. If such a transformation is desired, it should be done by first introducing a cache stage in TIR, then transforming the layout of the cache, and finally applying a graph-level transformation that inspects each PrimFunc and hoists the cache stage out.

> The particular stage can be marked, which contains effectively the same information as BufferConstraint, except that it does not introduce new data structures. During global layout reflowing, such information can be used to guide the reflowing to reconstruct a data structure like BufferConstraint or other Layout mappings and use that to serve the same purpose.

So long as the constraints can be statically searched for, this approach makes sense to me. I would be more concerned about adding additional semantics to existing nodes, such as an AttrStmt node, since it then requires passes to be aware not only of the existence of the constraint, but also that it must be reconstructed from the existing data structure. This approach would make it much more difficult for a static analysis tool to identify locations where the constraints must be updated.

As a way to potentially find a way forward, what if we start by implementing pad values only for buffers that are allocated internally to a function? This would be allowed behavior under both Option A and Option B, and would help determine how difficult reconstruction of the constraints would be from the transformation block without any additional annotation.
This could help motivate whether additional annotations are necessary, regardless of whether they are stored alongside the Buffer itself or in a separate attribute/annotation.
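
To make the "internally allocated buffer" case concrete, here is a minimal plain-Python sketch. It is not TVM code and does not use any TVM API: `transform_layout` and `row_sums` are hypothetical names chosen for illustration, and the shapes (14 elements regrouped into rows of 4) are assumptions. The idea it shows is that the caller keeps passing the original layout, while the padded, transformed cache exists only inside the function.

```python
def transform_layout(flat, inner, pad_value):
    """Copy a 1-D buffer into a 2-D cache with rows of length `inner`.

    Hypothetical helper: trailing slots that have no source element
    are filled with `pad_value`.
    """
    outer = -(-len(flat) // inner)  # ceiling division
    cache = [[pad_value] * inner for _ in range(outer)]
    for i, value in enumerate(flat):
        cache[i // inner][i % inner] = value
    return cache


def row_sums(flat, inner=4):
    """Toy kernel: sum each group of `inner` consecutive elements.

    The input keeps its original 1-D layout. The padded cache is
    allocated inside the function, so the pad value never appears
    in the function's inputs or outputs.
    """
    # 0 is the identity for addition, so padded slots do not change the sums.
    cache = transform_layout(flat, inner, pad_value=0)
    return [sum(row) for row in cache]


data = list(range(14))                     # original layout: shape [14]
cache = transform_layout(data, 4, pad_value=0)
sums = row_sums(data)
```

Because the pad value is known at the point where the cache is written, a later stage could in principle recover the "padding is zero" constraint by inspecting that write alone, which is the reconstruction question raised above.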