Re: [apache/tvm-rfcs] [RFC] Buffer Layout Padding (PR #77)

Tianqi Chen Wed, 29 Jun 2022 10:46:55 -0700

Thanks @Lunderberg for the update. I think we are moving in a positive 
direction on the overall IR design. Some additional feedback:


## Keep Schedule Decisions Local to PrimFunc then Compose

On schedule primitives, to be pragmatic, it would be helpful to do some of 
the cross-PrimFunc re-flowing in two steps. Specifically, some of your 
`transform_layout` examples touch buffers that are function inputs. One 
approach is of course to trace up to the producers and rewrite the producer 
functions as well (or trace down to the consumer functions). However, the 
complications here are that:

- There can be multiple consumer/producer TIR functions.
- In certain cases the producer and consumer may not have consistent requirements.
- The producers/consumers themselves can have their own local layout preferences 
that need to be consolidated.

In general it is helpful to first keep schedule decisions local, e.g. by 
introducing a caching stage (AC, BC in the example), and then compose with a 
separate reflowing pass that brings the decision to the consumers/producers. 
This is mainly to reduce the overall complexity of implementing such 
transformations, and it also makes things more modular.

```python
@T.prim_func
def grow(A: T.Buffer[14, "int32"], B: T.Buffer[14, "int32"]):
    AC = T.alloc_buffer([4, 4], "int32")
    BC = T.alloc_buffer([4, 4], "int32")

    for io, ii in T.grid(4, 4):
        with T.block():
            T.block_attr("preproc", "pad")
            AC[io, ii] = T.if_then_else(4 * io + ii < 14, A[4 * io + ii], 0)

    for i, j in T.grid(4, 4):
        BC[i, j] = 2 * AC[i, j]

    for i in T.grid(14):
        with T.block():
            T.block_attr("postproc", ["crop", 0])
            B[i] = BC[i // 4, i % 4]

@T.prim_func
def addone(A: T.Buffer[14, "int32"], B: T.Buffer[14, "int32"]):
    for i in T.grid(14):
        B[i] = A[i] + 1

@R.func
def main(A: T.Tensor[14, "int32"]):
        lv0 = call_tir(grow, [A], (14))
        # an intermediate stage to show non-local reflowing
        lv1 = call_tir(addone, [lv0], (14))
        lv2 = call_tir(grow, [lv1], (14))
        ...
```
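To make the local-then-compose point concrete, here is a rough plain-Python model of the example (lists instead of TIR buffers, so this is not TVMScript and not an actual TVM API). The names mirror the functions above; the `TILE` constant and the `fill` parameter are my own assumptions for illustration. The point it shows is that the padded layout stays inside `grow`, so `addone` and `main` need no changes.

```python
TILE = 4  # inner tile width, matching the 4x4 cache buffers above


def grow(a, fill=0):
    """Plain-Python model of `grow`: pad into a tiled cache, compute, crop."""
    n = len(a)
    rows = -(-n // TILE)  # ceil(n / TILE)
    # Local caching stage with padding (models the "preproc: pad" block).
    ac = [[a[TILE * io + ii] if TILE * io + ii < n else fill
           for ii in range(TILE)]
          for io in range(rows)]
    # Compute on the padded, tiled layout.
    bc = [[2 * v for v in row] for row in ac]
    # Crop back to the original flat shape (models the "postproc: crop" block).
    return [bc[i // TILE][i % TILE] for i in range(n)]


def addone(a):
    """Intermediate stage that knows nothing about the padded layout."""
    return [v + 1 for v in a]


def main(a):
    lv0 = grow(a)
    lv1 = addone(lv0)
    return grow(lv1)
```

A later reflowing pass would be free to remove the crop/pad pairs between stages, but correctness does not depend on it.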

## Use IfThenElse expression for Padding.

While it is possible to express padding with one loop that copies the data and 
another loop that writes the padded value, the resulting blocks are harder to 
schedule because there is more than one producer. Having a single loop that 
uses `T.if_then_else` expresses the pattern in one shot and makes future 
rewriting easier.


```python
    for io, ii in T.grid(4, 4):
        with T.block():
            T.block_attr("preproc", "pad")
            AC[io, ii] = T.if_then_else(4 * io + ii < 14, A[4 * io + ii], 0)
```
```
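As a sanity check on the two formulations, the following plain-Python sketch (not TVMScript; the function names and the `rows`/`cols`/`fill` parameters are assumptions for illustration) builds the padded buffer both ways. The two-loop version has two separate writers to `ac`, while the single-loop version has exactly one, which is the property that makes it easier to schedule.

```python
def pad_two_loops(a, rows=4, cols=4, fill=0):
    """Padding expressed as two producers writing disjoint regions."""
    ac = [[None] * cols for _ in range(rows)]
    # Producer 1: copy the valid elements.
    for i in range(len(a)):
        ac[i // cols][i % cols] = a[i]
    # Producer 2: write the pad value into the remaining slots.
    for i in range(len(a), rows * cols):
        ac[i // cols][i % cols] = fill
    return ac


def pad_single_loop(a, rows=4, cols=4, fill=0):
    """Padding expressed as one producer with a conditional value."""
    ac = [[None] * cols for _ in range(rows)]
    for io in range(rows):
        for ii in range(cols):
            flat = cols * io + ii
            # Plays the role of if_then_else(flat < len(a), a[flat], fill).
            ac[io][ii] = a[flat] if flat < len(a) else fill
    return ac
```

Both functions return the same buffer for any input that fits in `rows * cols` elements.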

## Propagate Padding Decisions from the End.
 
Some of the complications around duplicated conditions (and their 
simplification) stem from the fact that we do the layout transforms of the 
output and the input separately (each introducing its own conditions, which 
then need to be simplified). It might be helpful to do a global transformation, 
usually driven from the output, and then "backprop" the implications of that 
decision to the input. Doing such a transformation in a single shot will likely 
alleviate the need to generate extra conditions and then simplify them.
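
A toy model of the difference, in plain Python rather than TIR (all names here are hypothetical, and the bounds-check counter is only a stand-in for the conditions a real pass would generate): `run_separately` pads and crops around every stage, while `run_from_output` fixes the padded layout once and keeps the intermediates padded.

```python
N, PADDED = 14, 16  # logical extent and padded extent (4 x 4 tiles)


def pad(x, counter):
    """Pad x to PADDED elements, counting each bounds check."""
    out = []
    for k in range(PADDED):
        counter[0] += 1  # one `k < N` predicate per element
        out.append(x[k] if k < N else 0)
    return out


def run_separately(a, stages):
    """Each stage pads its own input and crops its own output."""
    counter = [0]
    x = a
    for f in stages:
        x = [f(v) for v in pad(x, counter)][:N]
    return x, counter[0]


def run_from_output(a, stages):
    """Padded layout chosen at the output and propagated back to the input."""
    counter = [0]
    x = pad(a, counter)        # single pad at the input of the chain
    for f in stages:
        x = [f(v) for v in x]  # intermediate buffers stay padded
    return x[:N], counter[0]   # single crop at the output of the chain


stages = [lambda v: 2 * v, lambda v: v + 1, lambda v: 2 * v]
result_a, checks_a = run_separately(list(range(N)), stages)
result_b, checks_b = run_from_output(list(range(N)), stages)
```

For this three-stage elementwise chain the results match, and the per-stage version evaluates the padding predicate three times as often. Note that in the second version the padded slots no longer hold 0 after `v + 1`; that is harmless here because they are cropped, but it would matter for stages that read the padding, such as reductions.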



-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1170294348