Thanks @MeeraN7. Yes, I get what you mean. Right now we are adding an
"is_scalable" field to indicate that the broadcast and ramp are "context
dependent" on VL. Additionally, we might need to update DataType to indicate a
scalable data type.
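
For reference, here is a minimal sketch of the fixed-length nodes as they exist
today on the Python side; the `is_scalable` variant in the comment is only my
reading of the proposed addition, not existing API:

```python
import tvm
from tvm import tir

i = tir.Var("i", "int32")

# Fixed-length forms that exist today: `lanes` is a compile-time constant.
ramp = tir.Ramp(i * 4, 1, 4)                          # <i*4, i*4+1, i*4+2, i*4+3>
bcast = tir.Broadcast(tir.const(2.0, "float32"), 4)   # <2.0, 2.0, 2.0, 2.0>

# Proposed direction (as I understand it, hypothetical signature): something
# like tir.Ramp(i * 4, 1, 4, is_scalable=True), where the effective lane count
# becomes 4 * VL, i.e. the node's meaning is context dependent on the runtime VL.
```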

This context dependency is the missing information I mentioned here. The
semantics of the code are otherwise under-specified and should be parameterized
by VL. Additionally, consider two loops with different VL1 and VL2 on which we
want to do some transformations: we might fall into the trap of treating them
as the same type (because only "is_scalable" is marked), while in reality they
are not, since the implicit dependency on VL can silently be ignored.
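
To make the trap concrete, here is a tiny standalone sketch (the `VecType`
class is hypothetical, not a TVM data structure): if the type only records an
"is_scalable" flag, values produced under VL1 and VL2 are indistinguishable to
the type system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VecType:
    elem: str
    lanes: int          # base lane count
    is_scalable: bool   # means "lanes * VL", but *which* VL is not recorded

# Loop 1 runs under VL1, loop 2 under VL2, yet the value types compare equal,
# so a transformation mixing values across the two loops would still type-check.
t1 = VecType("float32", 4, True)   # produced inside a loop with VL = VL1
t2 = VecType("float32", 4, True)   # produced inside a loop with VL = VL2
assert t1 == t2                    # the implicit dependency on VL is dropped
```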

I can understand that the additional flag can be helpful, since it would let us
reuse some of the vectorization logic. However, the "is_scalable" field might
introduce additional confusion as described above, and the additional ramp node
may not carry much extra information (apart from the fact that we use a scalar
vs a vector type). So my main question is whether we could use a separate
normal form to hint the code generator, without changing the current DataType,
ramp and broadcast.

Specifically, a regular loop like the one below would carry the same amount of
information: an access through `i` (the VLA index) would indicate a vector
load, while accesses through other indices would become normal loads. The main
constraint could be that we only allow `i` to appear in certain locations (say,
in the innermost position, to represent a ramp-like pattern), and we defer the
generation of SVE code to the codegen phase, by pattern matching the indices
and looking up the intermediate values:

**N1: A possible loop normal form via annotation**
```
  for (i: int32, 0, 17, annotation={"VLA"}) {
    C_2[i] = A_2[i] + B_2[i];
  }
```

N1 would indeed put a bit more pressure on the code generator, because the code
generator now needs to pattern match loads/stores on the VLA index (`i`), and
possibly perform a broadcast where necessary. However, the additional overhead
may not be too large, and it could help us keep the code cleaner. My guess is
that this approach is also easier to generalize to later loop patterns such as
scalable matrix instructions, in which case we cannot really reuse ramp and
broadcast.
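
As a rough illustration of what that pattern matching could look like (a sketch
only: `classify_load` is a hypothetical helper and the real lowering would live
in the C++ codegen), the core check is simply whether a buffer access depends
on the VLA loop variable:

```python
import tvm
from tvm import tir

def contains_var(expr, var):
    """Return True if `var` appears anywhere inside `expr`."""
    found = [False]

    def visit(node):
        if isinstance(node, tir.Var) and node.same_as(var):
            found[0] = True

    tir.stmt_functor.post_order_visit(expr, visit)
    return found[0]

def classify_load(load: tir.BufferLoad, vla_var: tir.Var) -> str:
    """Decide how a load inside a VLA-annotated loop would be lowered."""
    # Per the constraint above, `i` may only appear in the innermost index.
    if contains_var(load.indices[-1], vla_var):
        return "scalable vector load"   # e.g. a VL-wide contiguous load
    return "scalar load + broadcast"    # invariant in i: load once, broadcast
```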