We discussed this at the June 6 [community
meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-june-8-2022/12900).
A significant chunk of the meeting was spent presenting the RFC, and we had
about 15 minutes of discussion at the end.

I think there is more to be discussed here. If we'd like to discuss in
high bandwidth, we can bring this back up at future community meetings. Here
are notes:

@kparzysz-quic : 
- Aside from `transform_layout`, the immediate application I see from this is
vectorization of variable-length loops. We should separate the transformation
and optimization parts, because those two things are logically independent:
`transform_layout` will generate TIR, and then that TIR is optimized using a set
of other passes/techniques.
  - @Lunderberg agrees. This is the motivation behind splitting this into
"transforms" and "more generic operations." `HoistExpression` does a large part
of what is needed for variable-length loop vectorization by separating the parts
that depend on a dynamic size from the parts that don't.
- KP is worried that it'll take quite a while to implement enough transforms to
get to overcompute (e.g. it's hard to determine whether overcompute can be
applied). Can we have something that transforms the layout, then allows the user
to provide a compute statement that they attest will work on the transformed
layout, without any verification?
  - @Lunderberg: I think that on its own (assuming ops are fused together by
providing a tensorization that defines "this entire fused operation can be
replaced with x followed by y") could be done.
  - We don't have a good way to express "turn off all additional safety checks,
but still perform those optimizations."
  - We could imagine something analogous to `undef` (which acts as the
"least convenient value"), except as the "most convenient value": if it's most
convenient to presume a value is 0, then wherever this value is present, it's
legal to assume the value is 0 and move forward.
  - There's also a [partway
condition](https://github.com/apache/tvm-rfcs/pull/77/files#diff-a5740745158592278e549c62bd8c7ccb5b6317deb56d1164d8bf845ee4db5e75R1919)
that doesn't require any of the overcompute proving, but does get to a useful
intermediate using only expression hoisting and insertion of the existing
if/then's that happen for loop rewrites. After everything has been hoisted and
simplified, what falls out naturally is an outer loop that splits into two inner
loops:
     - a slow one that handles the edges
     - a fast one that handles the interior

  This might allow us to get to the point of adding the branchless/vectorizable
piece even if it's not the only thing there.
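The fast/slow loop split that falls out of this can be sketched in plain Python (hypothetical helper name; this is not the TVM API, just an illustration of the loop structure):

```python
# Plain-Python sketch of the two-loop split described above (hypothetical
# helper, not the TVM API). After the bounds check is hoisted out of the
# inner loop, the iteration space separates into a branch-free interior
# that a vectorizer can handle, plus a guarded edge loop for the tail.
def double_elems(src, vec_width=4):
    n = len(src)
    out = [0] * n
    full = (n // vec_width) * vec_width
    # Fast interior: full vector-width chunks with no bounds check in the
    # body, so each inner loop over `lane` is vectorizable.
    for i in range(0, full, vec_width):
        for lane in range(vec_width):
            out[i + lane] = 2 * src[i + lane]
    # Slow edge: the remaining tail keeps the original guarded form.
    for i in range(full, n):
        out[i] = 2 * src[i]
    return out
```

For example, `double_elems([1, 2, 3, 4, 5])` handles the first four elements in the fast interior loop and only the fifth in the edge loop.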
- @tqchen notes that one of the reasons we have complexity here is that we are
trying to decompose the problem into more general predicates. If we try to go
for less complexity, we could introduce transformations that do more at once
and thus require less proving.
  - The question remains how we might remove additional unnecessary steps added
by the initial `layout_transform`. On GPUs it might be possible to pad while
loading shared memory; in other cases we may need to consult the graph-level
model to determine how much padding is needed.
- @Lunderberg notes that much of this complexity came from "how can we prove the
additional steps are unnecessary?" There are also cases where the constraints
written in the copy stage may need to flow upward from something downstream in
the data-dependency graph in order to be stated properly.
  - Between explicitly specifying options over N buffers with different
pre-existing layouts and identifying whether a layout transformation would
require branching loops to handle the edge, a lot of this boils down to which
level of abstraction the layout is decided at, and how that decision is exposed
to lower levels of abstraction.
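As a rough illustration of the padding discussion (plain Python with hypothetical names, not the proposed TVM API): padding the tail with the "most convenient value" lets downstream compute stay branch-free, and for a sum reduction a pad value of 0 makes the overcomputed lanes harmless, with no proving or edge branching required.

```python
# Hypothetical sketch, not the TVM API: logical index i maps to physical
# (i // tile, i % tile), and the tail is padded with a "most convenient
# value" so every tile is full-width.
def pad_and_tile(src, tile=4, pad_value=0):
    n_tiles = -(-len(src) // tile)  # ceiling division
    padded = src + [pad_value] * (n_tiles * tile - len(src))
    return [padded[t * tile : (t + 1) * tile] for t in range(n_tiles)]

def tiled_sum(tiles):
    # Every tile is full-width, so the inner loop is branch-free; the
    # padded lanes are "overcompute," but with pad_value=0 they do not
    # change the reduction's result.
    total = 0
    for tile_vals in tiles:
        for v in tile_vals:
            total += v
    return total
```

Here `tiled_sum(pad_and_tile([1, 2, 3, 4, 5]))` sums two full tiles, `[1, 2, 3, 4]` and `[5, 0, 0, 0]`, and still returns 15.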
