Thanks for commenting @tqchen
Could you clarify a few things for me, please? See my remarks inline.

> Thanks @MeeraN7 . Yes I get what you mean. Right now we are adding a 
> "is_scalable" field to indicate that the broadcast and ramp are "context 
> dependent" on VL. Additionally, we might need to update DataType to indicate 
> a scalable data type.
> 
> This context dependency is the missing information I mentioned here. 

I don't think I understand what you mean by context dependency. In my view of 
the world, the Ramp node means that we can process elements in a data-parallel 
way; exactly how this is done is up to the backends, depending on the 
architecture and the code. What we are doing here is annotating this Ramp node 
with a bit of state, which is a hint that we want a special kind of 
vectorisation. From this point of view it is syntactic sugar: if the hint is 
dropped or ignored, the code can still be vectorised, just not in a 
vector-length-agnostic way.

I don't think we need to encode much more information than a single boolean 
that specifies the vectorisation style. You could indeed argue that this is 
ad hoc, but if we were to do it in a different way, we would still need to keep 
a bit of state around, like the `annotation={"VLA"}` example that you gave 
earlier. From that point of view, I don't see any difference.
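
To make this concrete, below is a minimal Python sketch of what I mean. The 
fixed-length `Ramp` constructor exists in TVM today; the `is_scalable` argument 
is the proposed extension and is hypothetical at this point:

```
from tvm import tir

base = tir.IntImm("int32", 0)
stride = tir.IntImm("int32", 1)

# Exists today: a fixed-length ramp <0, 1, 2, 3>; the 4 lanes are
# baked in at compile time.
fixed = tir.Ramp(base, stride, 4)

# Proposed (hypothetical): the same node carrying one extra bit of
# state, hinting that the lanes should be read as "4 x VL" elements,
# where VL is only known to the target.
# scalable = tir.Ramp(base, stride, 4, is_scalable=True)
```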

> The set of code is really undefined and should be parameterized by VL. 

What do you mean by undefined here?

> Additionally, considering the case of two loops with different VL1 and VL2 
> and want to do some transformations, we might fall into the trap of thinking 
> them as same type(because only "is_scalable" is marked) but in reality they 
> are not, as a implicit dependency on VL can be ignored.
> 
> I can understand that the additional flag can be helpful as we could reuse 
> some of the vectorization logic. However, the "is_scalable" field might 
> introduce additional confusion as above, and the additional ramp node may not 
> carry too much additional information(apart from the fact that we use a 
> scalar vs a vector type). So my main question is that whether or not we could 
> use a separate normal form to hint the code generator without changing the 
> current DataType, ramp and broadcast.

Correct me if I am wrong, but your assumption is that explicit scalable state 
is bad. Looking at your example:

> **N1: A possible loop normal form via annotation**
> 
> ```
>   for (i: int32, 0, 17;i, annotation={"VLA"}) {
>     C_2[i] = A_2[i] + B_2[i];
>   } 
> ```

This annotation looks equivalent to a loop pragma. In Clang, for example, you 
can do:

```
#pragma clang loop vectorize_width(4, scalable)
for (int i = 0; i < 17; ++i) {
  C_2[i] = A_2[i] + B_2[i];
}
```

and thus request scalable vectorisation. If this annotation results in scalable 
vectorisation, the LLVM vectoriser will lower the loop to operations on 
scalable vector IR types such as `<vscale x 4 x i32>`.
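
To show the difference at the IR level, here is a minimal hand-written sketch 
(not actual vectoriser output) contrasting a fixed-width and a scalable vector 
add in LLVM IR:

```
; Fixed width: exactly 4 lanes, decided at compile time.
define <4 x i32> @add_fixed(<4 x i32> %a, <4 x i32> %b) {
  %sum = add <4 x i32> %a, %b
  ret <4 x i32> %sum
}

; Scalable: "vscale x 4" lanes, where vscale is a runtime constant
; chosen by the target hardware (e.g. an SVE implementation).
define <vscale x 4 x i32> @add_scalable(<vscale x 4 x i32> %a,
                                        <vscale x 4 x i32> %b) {
  %sum = add <vscale x 4 x i32> %a, %b
  ret <vscale x 4 x i32> %sum
}
```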

What I would like to say with these examples is that there are two things going 
on at different levels. I think:
- the `annotation={"VLA"}` corresponds to a loop pragma in Clang, and
- the TIR Ramp node extension corresponds to LLVM IR scalable vector types, 
e.g. `<vscale x 4 x i32>`.

These two concepts are different things that both have their value and place.

If we can only annotate loops as being scalable, we lose the finer-grained 
control to request this at the statement level. I don't know whether mixing 
fixed and scalable vectorisation will be an important use case, but I think it 
is possible.

Summarising: I don't think explicit encoding of scalable vectorisation in TIR 
nodes is a bad thing. Quite the opposite, actually: I think we need it, and the 
annotation on the loop might be a complementary technique to this.

What do you think? 

