Thanks @MeeraN7. Yes, I get what you mean. Right now we are adding an `is_scalable` field to indicate that the broadcast and ramp are "context dependent" on VL. Additionally, we might need to update DataType to indicate a scalable data type.
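For concreteness, here is a minimal sketch (plain Python dataclasses, not TVM's actual node definitions; the field name `is_scalable` follows the proposal) of roughly what the flagged nodes would carry:

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the TIR nodes under discussion;
# these are NOT TVM's actual class definitions. The only new information
# the flag adds is the boolean itself -- the concrete lane count stays
# implicit in the runtime VL.

@dataclass
class ScalableDataType:
    code: str          # e.g. "float"
    bits: int          # e.g. 32
    lanes: int         # base lane count
    is_scalable: bool  # True => real lane count is lanes * VL, known only at runtime

@dataclass
class Ramp:
    base: int
    stride: int
    lanes: int
    is_scalable: bool  # ramp length depends on VL when True

@dataclass
class Broadcast:
    value: float
    lanes: int
    is_scalable: bool  # broadcast width depends on VL when True
```

Note that under this representation two nodes built under different runtime VLs are structurally identical, which is exactly the aliasing concern described below.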
This context dependency is the missing information I mentioned here. The semantics of such code is really undefined unless it is parameterized by VL. Additionally, considering the case of two loops with different vector lengths VL1 and VL2 that we want to transform, we might fall into the trap of treating them as the same type (because only `is_scalable` is marked) when in reality they are not, since the implicit dependency on VL can be overlooked.

I can understand that the additional flag can be helpful, as we could reuse some of the vectorization logic. However, the `is_scalable` field might introduce the confusion described above, and the additional ramp node may not carry much extra information (apart from the fact that we use a scalar vs. a vector type). So my main question is whether we could use a separate normal form to hint the code generator without changing the current DataType, ramp and broadcast. Specifically, a regular loop like the one below would carry the same amount of information: an access through `i` (the VLA index) would indicate a vector load, and accesses through other indices would become normal loads. The main constraint could be that we only allow `i` to appear in certain locations (say in the innermost position, to represent a ramp-like pattern), and defer the generation of SVE code to the codegen phase, by pattern matching the indices and looking up intermediate values:

**N1: A possible loop normal form via annotation**

```
for (i: int32, 0, 17, annotation={"VLA"}) {
  C_2[i] = A_2[i] + B_2[i];
}
```

N1 would indeed push a bit more pressure onto the code generator, because it now needs to pattern match loads/stores of the VLA index (`i`), and possibly perform a broadcast when necessary (a rough sketch of this is given at the end of this comment). However, the additional overhead may not be too large and could help us keep the code cleaner. My guess is that this approach is also easier to generalize to later loop patterns such as scalable matrix instructions, in which case we cannot really reuse ramp and broadcast.
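To make that codegen-side pattern matching concrete, here is a rough sketch (hypothetical names like `BufferLoad` and `classify_operand`; not TVM's actual codegen API) of the decision the code generator would make for each operand inside a "VLA"-annotated loop:

```python
from dataclasses import dataclass

# Hypothetical sketch of the pattern matching a codegen phase could do for
# an N1-style loop; the class and function names are illustrative only.

@dataclass
class BufferLoad:
    buffer: str
    index: str  # the index expression, e.g. a loop variable name

def classify_operand(load: BufferLoad, vla_var: str) -> str:
    """Decide how to lower an operand inside a loop annotated with "VLA"."""
    if load.index == vla_var:
        # Access indexed by the VLA loop variable: contiguous across lanes,
        # so it lowers to a scalable vector load (e.g. an SVE ld1w).
        return "vector_load"
    # Index does not involve the VLA variable: the value is invariant across
    # the vectorized lanes, so it is loaded as a scalar and broadcast.
    return "broadcast"

# For a hypothetical body C_2[i] = A_2[i] + B_2[j] inside the annotated loop:
print(classify_operand(BufferLoad("A_2", "i"), "i"))  # -> vector_load
print(classify_operand(BufferLoad("B_2", "j"), "i"))  # -> broadcast
```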