Re: [apache/tvm-rfcs] [RFC] Scalable vectors in TIR (PR #104)

Elen Kalda Fri, 08 Dec 2023 04:30:34 -0800

@cbalint13 @tqchen  Thank you for your input! This thread has been dormant for 
a bit, but we're still on it!


> A comprehensive presentation on SVE design both on RISCV and ARM from 
> perspective of LLVM.
> The presentation captures all the design details of the SVE rationale in LLVM 
> including arch comparisons.
> https://youtu.be/-ox8iJmbp0c?feature=shared (Vector Codegen / Luke Lau)

Thanks for sharing this, a really nice presentation! I'm trying to think how 
RVV's features will align with this RFC proposal... I think LLVM can be a good 
source of inspiration there :) Based on my (quite basic) understanding of RVV, 
there are two features that need consideration:

**1. Addressing several vectors at once (`LMUL`)**
They have resolved it in LLVM by encoding the `LMUL` value into the multiplier 
of `vscale`. Since this proposal follows the LLVM convention in expressing the 
scalable length, it can easily be adopted in TVM. As it currently stands, it 
will be up to the schedule author to do the maths and figure out the correct 
multiplier. It would be even easier if we implemented both `vscale` and 
`vfactor`...*
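
To make "doing the maths" concrete, here is a rough Python sketch of the multiplier calculation. It assumes LLVM's RVV type mapping, where (to my understanding) one unit of `vscale` corresponds to a 64-bit block at `LMUL = 1`. The constant `RVV_BITS_PER_BLOCK` and the helper `vscale_multiplier` are illustrative names, not part of TVM or LLVM.

```python
from fractions import Fraction

# Assumption: in LLVM's RVV type mapping, LMUL = 1 holds 64 bits per unit of vscale.
RVV_BITS_PER_BLOCK = 64


def vscale_multiplier(elem_bits, lmul):
    """Return n such that `n * vscale` lanes of `elem_bits` occupy `lmul` registers.

    `lmul` may be an int (1, 2, 4, 8) or a Fraction for fractional LMUL.
    """
    lanes = Fraction(RVV_BITS_PER_BLOCK, elem_bits) * Fraction(lmul)
    if lanes.denominator != 1:
        # e.g. 64-bit elements at LMUL = 1/2 have no whole-lane representation
        raise ValueError(f"no integer multiplier for {elem_bits}-bit at LMUL={lmul}")
    return int(lanes)


# 32-bit elements at LMUL = 2 would need lanes = 4 * vscale under this assumption.
print(vscale_multiplier(32, 2))
```

This is the calculation a schedule author would currently have to do by hand for each element type.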

**2. Predication**
If I understood it correctly, there are two ways of setting the active lanes:
  1. By providing a bitmask as a predicate to the operation - I'd expect the 
LLVM RVV backend to support `llvm.get.active.lane.mask` for that purpose, so 
for this case the current proposal should work
  2. By setting the `VL` register to the number of active lanes - I suppose 
that's the feature @cbalint13 you mentioned in your previous comment? I can 
think of a few options there:
     1. If it is more like a status register that will apply to several 
instructions, we can use pragmas/TensorIR block attributes
     2. If it is not an expensive operation, we can add an optional 
argument to ramps and broadcasts to indicate the active lanes
     3. The RISC-V backend in TVM should have all the information to 
translate `tir.get_active_lane_mask` to appropriate LLVM intrinsics that set 
the VL register.
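
To illustrate why the two predication styles can carry the same information, here is a small Python model. It is a simplified sketch: `get_active_lane_mask` mirrors the documented semantics of LLVM's `llvm.get.active.lane.mask` (lane `i` is active when `base + i < limit`), while `active_vl` and `mask_from_vl` are hypothetical helpers that assume `VL` is simply the remaining element count clamped to the maximum vector length. Real hardware has more freedom in choosing `VL` than this model suggests.

```python
from typing import List


def get_active_lane_mask(base: int, limit: int, lanes: int) -> List[bool]:
    """Bitmask style: lane i is active iff base + i < limit."""
    return [base + i < limit for i in range(lanes)]


def active_vl(base: int, limit: int, vlmax: int) -> int:
    """VL style (simplified assumption): remaining elements clamped to [0, vlmax]."""
    return max(0, min(limit - base, vlmax))


def mask_from_vl(vl: int, lanes: int) -> List[bool]:
    """Expand a VL value into the equivalent prefix mask."""
    return [i < vl for i in range(lanes)]


# Tail iteration of a 10-element loop processed 4 lanes at a time.
mask = get_active_lane_mask(base=8, limit=10, lanes=4)
vl = active_vl(base=8, limit=10, vlmax=4)
print(mask, vl, mask == mask_from_vl(vl, 4))
```

Under these assumptions the mask is always a prefix of active lanes, which is what would let a backend turn the mask-producing call into a `VL` setting instead.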

### * ... implement both `vscale` and `vfactor`

> Why not have both `vfactor` (abstract) concept along with `vscale` (real), 
> where the `vfactor` would be a "virtual" teller of how a single true type 
> `vscale` ramps ? This make the "implicit data type to be know" on one hand, 
> and also would be expressive enough for "vectors with multiple vector width".

Sorry I missed this before! That's a good point, I think there would be 
benefits in having both of these available. It would certainly make expressing 
multiples of vector length simpler, e.g. 
```
Ramp(base, 1, 2 * vfactor)
```
would imply `LMUL = 2`. If we want to keep `vfactor` as a user-facing 
convenience function, we could do the translation from `vfactor` to 
`n * vscale` in the Ramp constructor, so we wouldn't need to teach the TVM 
internal passes how to deal with it. 
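
A toy Python sketch of what that constructor-time translation could look like is below. None of this is the TVM API: `VScale`, `VFactor` and this `Ramp` class are hypothetical stand-ins, and the sketch assumes that one `vfactor` means "one full register of the element type", with 64 bits per unit of `vscale` as in LLVM's RVV mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VScale:
    """Hypothetical stand-in for the internal form `multiplier * vscale`."""
    multiplier: int


@dataclass(frozen=True)
class VFactor:
    """Hypothetical user-facing form: a multiple of one full vector register."""
    multiplier: int = 1

    def __rmul__(self, n: int) -> "VFactor":
        # Supports the `2 * vfactor` spelling.
        return VFactor(self.multiplier * n)


vfactor = VFactor()


class Ramp:
    """Toy Ramp (not tvm.tir.Ramp) that lowers vfactor as soon as it is built."""

    def __init__(self, base, stride, lanes, elem_bits=32, bits_per_vscale=64):
        if isinstance(lanes, VFactor):
            # Assumed translation: registers * (bits per vscale unit / element bits).
            lanes = VScale(lanes.multiplier * bits_per_vscale // elem_bits)
        self.base, self.stride, self.lanes = base, stride, lanes


# Later passes would only ever see the `n * vscale` form.
print(Ramp(0, 1, 2 * vfactor).lanes)
```

The point of the design is that `VFactor` never escapes the constructor, so the rest of the compiler only needs to understand one representation.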

---
@tqchen 

> Just to circle back here a bit. the main root issue is that we are using 
> runtime::DataType, which is supposely being concrete through out the TIR node.

> This places restrictions on what we can normally represent. A more 
> comprehensive update would change the PrimExpr's field to also an object, as 
> per StructInfo in the relax. That would requires bit more thinking, which 
> likely can get around the issues mentioned in the thread(of passing around 
> runtime::DataType which is not an object).


I think I see what you mean, and I agree: if the data type were represented by 
something derived from the base type `Object`, that would give us much more 
freedom in expressing the compile-time data type. I see how this would be a 
pretty invasive change, though. I'm also not sure how it would interoperate 
with the `DLDataType`-dependent runtime implementation (but I also don't know 
the runtime implementation very well).


> I think in short term making the protocol of lanes = -1 and lanes = -8(for 
> vscale(8)) may not be a bad idea. The main reason is I cannot think of 
> another possible use of the lanes field other than for the SVE.

I'm fine with going with this option (especially if we manage to hide the -8 
from the user), as it is probably the least invasive and most sturdy option 
that will allow us to achieve our goals. @lhutton has been prototyping an 
additional field in `runtime::DataType`, but it's a bit of a can of worms (as 
per his post above).
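
As a rough illustration of how the negative-lanes protocol could be hidden from the user, here is a Python sketch. It assumes that an encoded value of `-n` stands for `n * vscale` lanes and a positive value for a fixed lane count; `encode_lanes`, `decode_lanes` and `describe_lanes` are hypothetical helpers, not existing TVM functions.

```python
from typing import Tuple


def encode_lanes(lanes: int, scalable: bool = False) -> int:
    """Pack (lanes, scalable) into one integer: negative means `lanes * vscale`."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return -lanes if scalable else lanes


def decode_lanes(encoded: int) -> Tuple[int, bool]:
    """Unpack the integer back into (multiplier or lane count, is_scalable)."""
    if encoded == 0:
        raise ValueError("0 is not a valid lanes encoding")
    return abs(encoded), encoded < 0


def describe_lanes(encoded: int) -> str:
    """User-facing text, so the raw negative value never needs to be shown."""
    n, scalable = decode_lanes(encoded)
    return f"vscale x {n}" if scalable else str(n)


print(encode_lanes(8, scalable=True), describe_lanes(-8))
```

With accessors like these, only the encode and decode helpers would need to know about the sign convention.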

We intend to upload a draft prototype soon, then you guys will have something 
more concrete to look at :)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/104#issuecomment-1847090278
You are receiving this because you are subscribed to this thread.

Message ID: <apache/tvm-rfcs/pull/104/c1847090...@github.com>
