we discussed this at the June 8 [community meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-june-8-2022/12900). a significant chunk of the meeting was spent presenting the RFC, and we had about 15 minutes of discussion at the end.
i think there is more to be discussed here. if we'd like to discuss in high-bandwidth, we can bring this back up at future community meetings. here are notes:

@kparzysz-quic:
- aside from transform_layout, the immediate application i see from this is vectorization of variable-length loops. we should separate the transformation and optimization parts, because those two things are logically independent. transform_layout will generate TIR, and then that TIR is optimized using a set of other passes/techniques.
- @Lunderberg agrees. this is the motivation behind splitting this into "transforms" and "more generic operations." HoistExpression does a large part of what is needed for variable-length loop vectorization by splitting out the parts that depend on a dynamic size from the parts that don't.
- KP is worried that it'll take quite a while to implement enough transforms to get to overcompute (e.g. it's hard to determine whether overcompute can be applied). can we have something that transforms the layout, then allows the user to provide a compute statement that they attest works on the transformed layout, without any verification?
- @Lunderberg: i think that on its own (assuming ops are fused together by providing a tensorization that defines "this entire fused operation can be replaced with x followed by y") can be done, but we don't have a good way to express "turn off all additional safeties" and then proceed to perform those optimizations.
  - could imagine having something analogous to `undef` (where that is the "least-convenient value"), except as the "most-convenient value." if it's most convenient to presume a value is 0, then wherever this value is present, it's legal to assume that the value is 0 and move forward.
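to illustrate the hoisting point above, here is a minimal sketch in plain Python (not the TVM HoistExpression pass itself): a condition that depends only on the dynamic size `N`, never on the loop variable `i`, is hoisted out of the loop body, duplicating the loop so that each copy has a branch-free body that a vectorizer could then handle.

```python
# hypothetical illustration, not TVM API: hoisting an N-dependent
# predicate out of a variable-length loop.

def before(buf, N):
    """Branch inside the loop body, re-evaluated every iteration."""
    out = []
    for i in range(N):
        if N % 4 == 0:          # depends only on N, not on i
            out.append(buf[i] * 2)
        else:
            out.append(buf[i])
    return out

def after(buf, N):
    """Same semantics, with the N-only predicate hoisted."""
    out = []
    if N % 4 == 0:              # tested once, outside the loop
        for i in range(N):      # branch-free body: vectorizable
            out.append(buf[i] * 2)
    else:
        for i in range(N):      # branch-free body: vectorizable
            out.append(buf[i])
    return out
```

both functions compute the same result; the transformation only moves the size-dependent test out of the per-iteration path.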
- there's also a [partway condition](https://github.com/apache/tvm-rfcs/pull/77/files#diff-a5740745158592278e549c62bd8c7ccb5b6317deb56d1164d8bf845ee4db5e75R1919) that doesn't require any of the overcompute proving, but does get to a useful intermediate using only expression hoisting and insertion of the existing if/thens that happen for loop rewrites. after everything has been hoisted and simplified, what falls out naturally is an outer loop that splits up into two inner loops:
  - a slow one that handles the edges
  - a fast one that handles the interior

  this might allow us to get to the point of adding the branchless/vectorizable piece, even if it's not the only thing there.
- @tqchen notes that one of the reasons we have complexity here is that we are trying to decompose the problem into more general predicates. if we aim for less complexity, we could introduce transformations that do more at once and thus require less proving.
  - the question remains how we might remove additional unnecessary steps added by the initial layout_transform. on GPUs it might be possible to pad while loading shared memory. in other cases we may need to consult the graph-level model to determine how much padding is needed.
- @Lunderberg notes much of this complexity came from "how can we prove the additional steps are unnecessary?" there are also some cases where the constraints written in the copy stage may need to flow upwards from something downstream in the data-dependency graph in order to be stated properly.
- between explicitly specifying options over N buffers with different pre-existing layouts and identifying whether a layout transformation would require branching loops to handle the edge, a lot of it boils down to: at which level of abstraction is the layout decided, and how is that exposed to lower levels of abstraction?
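the interior/edge split described above can be sketched in plain Python (again a hypothetical illustration, not the generated TIR): the outer loop over a dynamic extent `N` becomes a fast interior loop that processes whole vectors with no bounds check, plus a slow scalar loop for the remainder at the edge.

```python
# hypothetical illustration of the "partway condition": a copy into a
# padded destination, split into a fast interior loop and a slow edge loop.

def copy_padded(src, N, vec_width=4):
    # destination is padded up to a multiple of vec_width
    padded = (N + vec_width - 1) // vec_width * vec_width
    dst = [0] * padded
    # fast interior loop: every iteration handles one full vector,
    # so the body needs no per-element bounds check
    interior = N // vec_width * vec_width
    for i in range(0, interior, vec_width):
        dst[i:i + vec_width] = src[i:i + vec_width]
    # slow edge loop: handles the remaining N % vec_width elements
    for i in range(interior, N):
        dst[i] = src[i]
    return dst
```

the point of the intermediate is exactly this shape: the interior loop is already branchless and vectorizable even before any overcompute proving happens, while the edge loop keeps the safe scalar handling.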