we discussed this at the June 8 [community meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-june-8-2022/12900). a significant chunk of the meeting was spent presenting the RFC, and we had about 15 minutes of discussion at the end.
i think there is more to be discussed here. if we'd like to discuss in high-bandwidth, we can bring this back up at future community meetings. here are notes:

@kparzysz-quic:
- aside from transform_layout, the immediate application i see from this is vectorization of variable-length loops. we should separate the transformation and optimization parts, because those two things are logically independent. transform_layout will generate TIR, and then that TIR is optimized using a set of other passes/techniques.
- @Lunderberg agrees. this is the motivation behind splitting this into "transforms" and "more generic operations." HoistExpression does a large part of what is needed for variable-length loop vectorization by splitting out the parts that depend on a dynamic size from the parts that don't.
- KP is worried that it'll take quite a while to implement enough transforms to get to overcompute (e.g. it's hard to determine whether overcompute can be applied). can we have something that transforms the layout, then allows the user to provide a compute statement that they attest works on the transformed layout, without any verification?
- @Lunderberg: i think that on its own (assuming ops are fused together by providing a tensorization that defines "this entire fused operation can be replaced with x followed by y") can be done, but we don't have a good way to express "turn off all additional safeties" and then proceed to perform those optimizations.
  - could imagine having something analogous to `undef` (where that is the "least-convenient value"), except as the "most-convenient value." if it's most convenient to presume a value is 0, then wherever this value is present, it's legal to assume that the value is 0 and move forward.
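to illustrate the hoisting point above, here is a minimal sketch in plain Python (not the TVM HoistExpression pass itself): a condition that depends only on the dynamic size `N`, never on the loop variable `i`, is hoisted out of the loop body, duplicating the loop so that each copy has a branch-free body that a vectorizer could then handle.

```python
# hypothetical illustration, not TVM API: hoisting an N-dependent
# predicate out of a variable-length loop.

def before(buf, N):
    """Branch inside the loop body, re-evaluated every iteration."""
    out = []
    for i in range(N):
        if N % 4 == 0:          # depends only on N, not on i
            out.append(buf[i] * 2)
        else:
            out.append(buf[i])
    return out

def after(buf, N):
    """Same semantics, with the N-only predicate hoisted."""
    out = []
    if N % 4 == 0:              # tested once, outside the loop
        for i in range(N):      # branch-free body: vectorizable
            out.append(buf[i] * 2)
    else:
        for i in range(N):      # branch-free body: vectorizable
            out.append(buf[i])
    return out
```

both functions compute the same result; the transformation only moves the size-dependent test out of the per-iteration path.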
- there's also a [partway condition](https://github.com/apache/tvm-rfcs/pull/77/files#diff-a5740745158592278e549c62bd8c7ccb5b6317deb56d1164d8bf845ee4db5e75R1919) that doesn't require any of the overcompute proving, but does get to a useful intermediate using only expression hoisting and insertion of the existing if/thens that happen for loop rewrites. after everything has been hoisted and simplified, what falls out naturally is an outer loop that splits up into two inner loops:
  - a slow one that handles the edges
  - a fast one that handles the interior

  this might allow us to get to the point of adding the branchless/vectorizable piece, even if it's not the only thing there.
- @tqchen notes that one of the reasons we have complexity here is that we are trying to decompose the problem into more general predicates. if we aim for less complexity, we could introduce transformations that do more at once and thus require less proving.
  - the question remains how we might remove additional unnecessary steps added by the initial layout_transform. on GPUs it might be possible to pad while loading shared memory. in other cases we may need to consult the graph-level model to determine how much padding is needed.
- @Lunderberg notes much of this complexity came from "how can we prove the additional steps are unnecessary?" there are also some cases where the constraints written in the copy stage may need to flow upwards from something downstream in the data-dependency graph in order to be stated properly.
- between explicitly specifying options over N buffers with different pre-existing layouts and identifying whether a layout transformation would require branching loops to handle the edge, a lot of it boils down to: at which level of abstraction is the layout decided, and how is that exposed to lower levels of abstraction?
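the interior/edge split described above can be sketched in plain Python (again a hypothetical illustration, not the generated TIR): the outer loop over a dynamic extent `N` becomes a fast interior loop that processes whole vectors with no bounds check, plus a slow scalar loop for the remainder at the edge.

```python
# hypothetical illustration of the "partway condition": a copy into a
# padded destination, split into a fast interior loop and a slow edge loop.

def copy_padded(src, N, vec_width=4):
    # destination is padded up to a multiple of vec_width
    padded = (N + vec_width - 1) // vec_width * vec_width
    dst = [0] * padded
    # fast interior loop: every iteration handles one full vector,
    # so the body needs no per-element bounds check
    interior = N // vec_width * vec_width
    for i in range(0, interior, vec_width):
        dst[i:i + vec_width] = src[i:i + vec_width]
    # slow edge loop: handles the remaining N % vec_width elements
    for i in range(interior, N):
        dst[i] = src[i]
    return dst
```

the point of the intermediate is exactly this shape: the interior loop is already branchless and vectorizable even before any overcompute proving happens, while the edge loop keeps the safe scalar handling.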