Thanks for commenting @tqchen. Could you clarify a few things for me, please? See my remarks inlined below.
> Thanks @MeeraN7. Yes I get what you mean. Right now we are adding a "is_scalable" field to indicate that the broadcast and ramp are "context dependent" on VL. Additionally, we might need to update DataType to indicate a scalable data type.
>
> This context dependency is the missing information I mentioned here.

I don't think I understand what you mean by context dependency. In my view of the world, the Ramp node means that we can process elements in a data-parallel way; how exactly this is done is up to the backends, depending on the architecture and the code. What we are doing here is annotating this Ramp node with a bit of state, which is a hint that we want a special kind of vectorisation. From this point of view it is syntactic sugar: if the hint is dropped or ignored, the code can still be vectorised, just not in a vector length agnostic way. I don't think we need to encode more information than one boolean that specifies the vectorisation style. You could indeed argue that this is ad hoc, but if we did it in a different way, we would still need to keep a bit of state around, like the `annotation={"VLA"}` example that you gave earlier. From that point of view, I don't see any difference.

> The set of code is really undefined and should be parameterized by VL.

What do you mean by undefined here?

> Additionally, considering the case of two loops with different VL1 and VL2 and want to do some transformations, we might fall into the trap of thinking them as same type (because only "is_scalable" is marked) but in reality they are not, as a implicit dependency on VL can be ignored.
>
> I can understand that the additional flag can be helpful as we could reuse some of the vectorization logic. However, the "is_scalable" field might introduce additional confusion as above, and the additional ramp node may not carry too much additional information (apart from the fact that we use a scalar vs a vector type). So my main question is that whether or not we could use a separate normal form to hint the code generator without changing the current DataType, ramp and broadcast.

Correct me if I am wrong, but your assumption is that explicit scalable state is bad. Looking at your example:

> **N1: A possible loop normal form via annotation**
>
> ```
> for (i: int32, 0, 17;i, annotation={"VLA"}) {
>   C_2[i] = A_2[i] + B_2[i];
> }
> ```

this annotation looks equivalent to a loop pragma. In Clang, for example, you can write:

```
#pragma clang loop vectorize_width(4, scalable)
for (i = 0; i < 17; ++i) {
  C_2[i] = A_2[i] + B_2[i];
}
```

and thus request scalable vectorisation. If this annotation results in scalable vectorisation, the LLVM vectoriser will lower the loop to operations on scalable vector IR types such as `<vscale x 4 x i32>`.

What I would like to say with these examples is that there are two things going on at different levels:

- the `annotation={"VLA"}` corresponds to a loop pragma in Clang,
- the TIR Ramp node extension corresponds to LLVM IR scalable types, e.g. `<vscale x 4 x i32>`.

These two concepts are different things that both have their value and place. If we can only annotate loops as being scalable, we lose the finer-grained control to request this at the statement level. I don't know if mixing fixed and scalable vectorisation will be an important use case, but I think it is possible.
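To make the statement-level point concrete, here is a minimal sketch using TVM's Python API. The plain `tir.Ramp` call is the existing API; the `is_scalable` argument is the proposed extension and does not exist in TVM today, so it is shown only as a hypothetical:

```
from tvm import tir

i = tir.Var("i", "int32")

# Today: a fixed-width ramp over 4 lanes. When vectorised, this lowers to a
# fixed LLVM vector type such as <4 x i32>.
fixed_ramp = tir.Ramp(i * 4, 1, 4)

# Proposed: the same node plus one bit of state. A scalable ramp would lower
# to LLVM's scalable type <vscale x 4 x i32>, where the hardware vector
# length scales the lane count at runtime.
# (Hypothetical signature, not current TVM API:)
#   scalable_ramp = tir.Ramp(i * 4, 1, 4, is_scalable=True)
```

Both forms could then coexist in one function body, which is the finer-grained control that a loop-level annotation alone cannot express.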
Summarising: I don't think explicit encoding of scalability in TIR nodes is a bad thing; quite the opposite, I think we need it, and the annotation on the loop might be a complementary technique to it. What do you think?