I think there's some confusion about the difference between what we have referred to as `vscale` and `vfactor`. I'll try to summarise the difference and the respective pros and cons.

For reference, this is how LLVM represents vectors (copied from the 
[documentation](https://llvm.org/docs/LangRef.html#vector-type)):
```
< <# elements> x <elementtype> >          ; Fixed-length vector
< vscale x <# elements> x <elementtype> > ; Scalable vector
```
A concrete example of a scalable vector:
```
<vscale x 4 x float>
```
or
```
<vscale x 16 x i8> 
```
To construct these vectors we need to know the minimum vector length (SVE's 128 bits in these examples) and the size of the vector element data type (32 bits or 8 bits in these examples).
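
To make the arithmetic explicit, here is how the multipliers above fall out of those two numbers (illustrative Python only, nothing TVM-specific):
```
# The element count in the scalable type is the minimum vector length
# divided by the element width.
SVE_MIN_VECTOR_BITS = 128

print(SVE_MIN_VECTOR_BITS // 32)  # 4  -> <vscale x 4 x float>
print(SVE_MIN_VECTOR_BITS // 8)   # 16 -> <vscale x 16 x i8>
```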

## Vscale

This would mirror LLVM's `vscale` intrinsic, so if we had a TIR intrinsic with 
the same meaning, a TVM vector of floats that would exactly map to a hardware 
vector would look like
```
ramp(base, stride, 4 * vscale)   # or vscale(4), depending on which UI we want to go for
```
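
As a purely hypothetical sketch of how this could look through the Python TIR API, assuming the proposed `tir.vscale()` intrinsic existed and `Ramp` accepted an expression for its lanes (neither is settled API, this is exactly what's being discussed):
```
from tvm import tir

base = tir.Var("base", "int32")
stride = tir.IntImm("int32", 1)

# Option 1: explicit multiplier, mirroring LLVM's `4 * vscale`.
ramp = tir.Ramp(base, stride, 4 * tir.vscale())  # hypothetical intrinsic

# Option 2: the `vscale(4)` UI, folding the multiplier into the intrinsic call.
# ramp = tir.Ramp(base, stride, tir.vscale(4))   # also hypothetical
```
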
### Pros
1. When eyeballing the TIR, the meaning of the `vscale` intrinsic is intuitive since it matches LLVM
2. It makes translating expressions involving `vscale` that exist outside of vectors very easy in codegen, since we just have to map `tir.vscale` -> `llvm.vscale`
3. Since we can pull the information about the vector element data type from 
the ramp node, we can deduce the minimum vector length from the multiplier
4. Makes it simpler to support arbitrarily long vectors\*

### Cons
1. Representing `lanes` in the runtime data type is very awkward (see the comments above)
2. It's harder to place restrictions on what `ramp->lanes` can be, so it can accidentally get set to something nonsensical. This could be alleviated by using `vscale(4)`, though, as recommended by @kparzysz-quic

## Vfactor

This was proposed in the first version of this RFC. A TVM vector that would map 
to a hardware vector would be:
```
ramp(base, stride, vfactor)
```
In this case the constant multiplier is implicitly absorbed into `vfactor` and will be deduced during codegen. The minimum vector length should be known to the backend-specific codegen, and the data type size can be pulled from the data type of the elements in the vector.
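
A minimal sketch of that deduction (a hypothetical helper, not actual TVM codegen) is just the division from earlier, except now it is the backend rather than the schedule that performs it:
```
# Hypothetical backend-side helper: reconstruct the lanes implied by `vfactor`
# from the ramp's element dtype and the target's minimum vector length.
def implicit_vfactor_lanes(element_bits, min_vector_bits=128):
    return min_vector_bits // element_bits

implicit_vfactor_lanes(32)  # 4  -> <vscale x 4 x float>
implicit_vfactor_lanes(8)   # 16 -> <vscale x 16 x i8>
```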

### Pros
1. Simpler to use in scheduling, since you don't have to worry about the data type size or the minimum vector length
2. Less visual clutter
3. Easier to create a robust implementation, since we can enforce that if `lanes` of the ramp is not an `int`, it is `vfactor` (unless we go into the territory of arbitrarily long vectors\*)
4. `DLDataType` representation is less of an issue; we can just go for -1

### Cons
1. We don't know the implicit data type of a `vfactor` that appears outside of a vector, e.g. in a loop extent, where there is no vector element type nearby to deduce it from (this is a big problem)


## \*The arbitrarily long vectors

This is the "vectors with multiple vector width" that @tqchen mentioned. It refers to there being no restriction on the length of TIR vectors, and subsequently LLVM vectors, in TVM. I've seen things like
```
<1024 x float>
```
coming out of TVM's codegen. I've always wondered if this is a feature or a (mostly harmless) side effect. LLVM deals with it by breaking these vectors down into a string of vector instructions that match the hardware length. LLVM's SVE support can also do that for scalable vectors, so in theory we could create vectors like
```
<vscale x 512 x float>
```
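
For a sense of the sizes involved (illustrative arithmetic only, assuming a 128-bit hardware vector):
```
# A fixed <1024 x float> is 1024 * 32 = 32768 bits, which LLVM legalises into
# 32768 / 128 = 256 hardware-width operations on a 128-bit machine.
print(1024 * 32 // 128)  # 256

# A <vscale x 512 x float> occupies at least 512 * 32 = 16384 bits, i.e. at
# least 128 SVE registers at the 128-bit minimum width (fewer on wider ones).
print(512 * 32 // 128)   # 128
```
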
So the question is whether we want to support creating these vectors in TVM. If we do, the `vscale` approach would be more appropriate. I agree, though, that it is probably not particularly useful, so it depends on how much we care about feature parity between the vector types here.
