> Introducing changes to TIR would need some additional thought and deserves
> extra consideration, due to the N*M complexity (where N is the number of TIR
> possibilities and M is the number of primitives to be supported) that needs
> to be handled in implementation (by backend implementers and primitive
> implementers).

This was part of the design consideration: minimizing the impact of the 
proposed changes on primitives, lowering transformations, and backends.

* The `BufferConstraint` annotations do not need specific handling at the 
codegen level, as they are only present to enable compile-time optimizations.
  
* Use of the `BufferConstraint` hints would occur within existing utilities, 
primarily as additional information made available to the `arith::Analyzer` 
utilities (see the first sketch after this list).  This minimizes the need for 
other primitives/transforms to be aware of the buffer constraints, while still 
allowing them to benefit from those constraints.
  
* The `T.undef()` built-in does not need specific handling at the codegen 
level, as it is removed during lowering.
  
* The `T.undef()` built-in does not require specific handling from other 
primitives, as stores of `T.undef()` can be treated the same as stores of any 
other value (see the second sketch after this list).
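
To make the `arith::Analyzer` point concrete, here is a rough illustration
using the existing `tvm.arith.Analyzer` Python bindings.  The analyzer calls
below already exist today; the only hypothetical part is where the known fact
comes from (a `BufferConstraint` hint rather than, say, a loop bound).

```python
import tvm

ana = tvm.arith.Analyzer()
i = tvm.tir.Var("i", "int32")

# Hypothetical: suppose a BufferConstraint hint told us that only the
# non-padded region i < 14 is ever accessed.  A transform only needs to
# register that fact; the existing simplifications do the rest.
with ana.constraint_scope(i < 14):
    print(ana.can_prove(i < 16))  # True, so e.g. a bounds check can be dropped
```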
  
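For the `T.undef()` bullets, a minimal sketch of how a padded store might look
with the proposed built-in.  `T.undef()` does not exist yet, so this only
illustrates the intended semantics, not current TVMScript:

```python
from tvm.script import tir as T

@T.prim_func
def pad_to_16(A: T.Buffer[(14,), "int32"], B: T.Buffer[(16,), "int32"]):
    for i in T.serial(16):
        with T.block("copy_and_pad"):
            vi = T.axis.remap("S", [i])
            # The padded tail may hold any value.  Until lowering strips it,
            # this store can be scheduled like any other store.
            B[vi] = T.if_then_else(vi < 14, A[vi], T.undef(), dtype="int32")
```
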
> Right now it is possible to do non-local constraint rewriting flows as part 
> of the graph pass. Note that while E1 is indeed less "compact" on one hand, 
> we can use it to reconstruct the desirable compact data structure (something 
> like BufferConstraint that represents the layout mapping) that we can use to 
> flow the decisions across the graph nodes during the pass.
  
I definitely agree that graph-level transforms are where the layouts and 
constraints should be decided.  The `BufferConstraint` annotations are not 
intended as a way to override in TIR what was already decided at the graph 
level, but rather as a way to communicate to TIR transformations what has been 
decided at the graph level.

> E1: Composing a stage that transforms the layout (a loop that represents the 
> mapping)

I'm still a bit confused by this approach, specifically how one would avoid 
having a separate compute definition for each workload on a new target 
(initially brought up by @csullivan 
[here](https://github.com/apache/tvm-rfcs/pull/77#discussion_r893701372)).  In 
my mind, if I'm going to compose a layout transformation stage, it would need 
to be followed by a compute stage that takes the transformed layout as input.  So 
rather than having a single conv2d that can be generalized over layouts, each 
transformed layout would still need its own compute stage, as in the sketch below.
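
As an illustration of that concern (the shapes, names, and the trivial
consumer below are mine, purely for the example), composing an explicit
packing stage in TE looks something like this, and the consuming stage ends
up written against one specific packed layout:

```python
from tvm import te

A = te.placeholder((1, 64, 56, 56), name="A")  # NCHW input

# Stage that transforms the layout: NCHW -> NCHW4c
A_packed = te.compute(
    (1, 16, 56, 56, 4),
    lambda n, co, h, w, ci: A[n, co * 4 + ci, h, w],
    name="A_packed",
)

# The consumer is written against the NCHW4c layout; a different packing
# (say NCHW8c) would need a separate compute definition rather than reusing
# a single layout-generic one.
B = te.compute(
    (1, 16, 56, 56, 4),
    lambda n, co, h, w, ci: A_packed[n, co, h, w, ci] * 2,
    name="B",
)
```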

> Note that initially such a data structure does not need to live beyond the 
> life of a pass, because it can be reconstructed at any time from the other 
> representation.

How would this be represented while optimizing the performance of a subgraph?  
My concern would be how to express the non-local constraints while keeping a 
small search space for optimization.

* Ensure that the producer and consumer stages are within the same subgraph.  
Since the constraints provided to a consumer depend not only on the producer, 
but also on the constraints provided to the producer, this might require 
fusing the entire end-to-end model into a single monolithic kernel.
  
  My understanding is that this would result in a search space that is too 
large to effectively optimize, though I haven't explicitly tested it.
  
* Insert a transformation stage into the subgraph, in which the constraint is 
established (e.g. the padding is written).  Later portions of the subgraph 
could then rely on the constraint without examining other subgraphs.
  
  This would need some way to indicate that the transformation stage should 
neither be altered during optimization nor included in the performance timing.
  
* Express the graph-level constraints to a subgraph, so that it can optimize 
using those constraints.
  
  This was my intent with the `BufferConstraint` annotations, since the 
subgraphs could then take advantage of the graph-level constraints while being 
optimized (a rough sketch follows this list).
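
As a purely hypothetical sketch of that third option (the `T.buffer_constraint`
spelling below is invented for illustration; only the intent matters), the
subgraph's PrimFunc would carry a statement of what the graph level already
guarantees about its inputs:

```python
from tvm.script import tir as T

@T.prim_func
def subgraph(A: T.Buffer[(16,), "int32"], B: T.Buffer[(16,), "int32"]):
    # Hypothetical annotation: the graph level guarantees the padded tail of
    # A (i >= 14) is zero, so optimizations within this subgraph may rely on
    # it without inspecting the producer's subgraph.
    T.buffer_constraint(A, lambda i: T.if_then_else(i < 14, True, A[i] == 0))
    for i in T.serial(16):
        with T.block("compute"):
            vi = T.axis.remap("S", [i])
            B[vi] = A[vi] + 1
```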
  
> E1 also enables some additional capabilities, e.g. expressing future memory 
> remappings that do not necessarily fit into padding/packing.

Is there an existing annotation to indicate that a stage should be removed 
entirely during lowering?  That might be an effective way to allow more general 
usage, by annotating a stage that can be assumed to have been performed prior to 
the subgraph.  This would be a way to express the second option above (inserting 
an extra transformation stage), while still providing enough information to 
remove that transformation stage during lowering.
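
For concreteness, something along these lines is what I have in mind.  The
"assume_already_applied" attribute is invented for this sketch, and whether an
equivalent annotation already exists is exactly the question above:

```python
from tvm.script import tir as T

@T.prim_func
def subgraph(A: T.Buffer[(14,), "int32"], B: T.Buffer[(16,), "int32"]):
    A_padded = T.alloc_buffer([16], "int32")
    for i in T.serial(16):
        with T.block("pad"):
            vi = T.axis.remap("S", [i])
            # Hypothetical attribute: this stage is assumed to have been
            # performed by the producer, so it would be stripped during
            # lowering and excluded from performance timing.
            T.block_attr({"assume_already_applied": True})
            A_padded[vi] = T.if_then_else(vi < 14, A[vi], 0, dtype="int32")
    for i in T.serial(16):
        with T.block("compute"):
            vi = T.axis.remap("S", [i])
            B[vi] = A_padded[vi] + 1
```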
