> Introducing changes to TIR would need some additional thought that
> deserves some extra consideration, due to the N*M complexity (where N is
> the number of TIR possibilities and M is the number of primitives to be
> supported) that needs to be handled in implementation (by backend
> implementers and primitive implementers).
This was part of the design consideration: to minimize the impact of the proposed changes on primitives, lowering transformations, and backends.

* The `BufferConstraint` annotations do not need specific handling at the codegen level, as they are only present to enable compile-time optimizations.
* Use of the `BufferConstraint` hints would occur within existing utilities, primarily as additional information available to the `arith::Analyzer` utilities. This minimizes the need for other primitives/transforms to be aware of the buffer constraints, while still benefiting from them.
* The `T.undef()` built-in does not need specific handling at the codegen level, as it is removed during lowering.
* The `T.undef()` built-in does not require specific handling from other primitives, as stores of `T.undef()` can be treated the same as stores of any other value.

> Right now it is possible to do non-local constraint rewriting flows as
> part of the graph pass. Note that while E1 is indeed less "compact" on
> one hand, we can use it to reconstruct the desirable compact data
> structure (something like BufferConstraint that represents the layout
> mapping) that we can use to flow the decisions across the graph nodes
> during the pass.

I definitely agree that graph-level transforms are where the layouts and constraints should be decided. The `BufferConstraint` annotations are not intended as a way to override in TIR what was already decided at the graph level, but rather as a way to communicate to TIR transformations what has been decided at the graph level.

> E1: Composing a stage that transforms the layout (a loop that represents
> the mapping)

I'm still a bit confused by this approach, specifically how one would avoid having a separate compute definition for each workload on a new target (initially brought up by @csullivan [here](https://github.com/apache/tvm-rfcs/pull/77#discussion_r893701372)).
In my mind, if I'm going to compose a layout transformation stage, it would need to be followed by a compute stage that takes the transformed layout as input. So rather than having a single conv2d that can be generalized over layouts, each transformed layout would still need its own compute stage.

> Note that initially such data structure does not need to live beyond the
> life of a pass, because it can be reconstructed at any time from the
> other representation.

How would this be represented while optimizing the performance of a subgraph? My concern is how to express the non-local constraints while keeping a small search space for optimization.

* Ensure that the producer and consumer stages are within the same subgraph. Since the constraints provided to a consumer depend not only on the producer, but also on the constraints provided to the producer, this might require fusing the entire end-to-end model into a single monolithic kernel. My understanding is that this would result in a search space that is too large to effectively optimize, though I haven't explicitly tested it.
* Insert a transformation stage into the subgraph, in which the constraint is written. Later portions of the subgraph could then rely on the constraint without examining other subgraphs. This would need some way to indicate that the transformation stage shouldn't be altered during optimization, nor should it be part of the performance timing.
* Express the graph-level constraints to a subgraph, so that it can optimize using those constraints. This was my intent with the `BufferConstraint` annotations, since the subgraphs could then take advantage of them.

> E1 also enables some additional capabilities (e.g.) expressing future
> memory remappings that do not necessarily fit into padding/packing.

Is there an existing annotation to indicate that a stage should be removed entirely during lowering?
That might be an effective way to allow more general usage: annotating a stage that can be assumed to have been performed prior to the subgraph. This would express the second option above (an extra transformation stage), while still providing enough information to remove the transformation stage during lowering.
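As a sketch of what such an annotation could look like, here is a toy lowering step in plain Python (hypothetical attribute name, not an existing TVM annotation): the tagged stage is visible during scheduling and analysis, but is stripped during lowering so it contributes neither generated code nor measured runtime.

```python
# Hypothetical attribute key marking a stage as assumed to have been
# performed by the caller, prior to this subgraph.
ASSUME_PERFORMED = "assume_performed_prior"

def lower(stages):
    """Drop stages whose attributes mark them as assumed-already-performed."""
    return [s for s in stages if not s.get("attrs", {}).get(ASSUME_PERFORMED)]

subgraph = [
    {"name": "transform_layout", "attrs": {ASSUME_PERFORMED: True}},
    {"name": "conv2d"},
    {"name": "relu"},
]

# The transform stage informs optimization, but is removed before codegen.
assert [s["name"] for s in lower(subgraph)] == ["conv2d", "relu"]
```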