Thanks @ekalda for the nice work on the proposal, allow me a few personal 
points of view supporting the initiative:

### Pros
> 1. When eyeballing the TIR, the meaning of the `vscale` intrinsic is 
> intuitive since it matches LLVM
> 2. It makes translating the expressions involving `vscale` that exist outside 
> of the vectors in codegen very easy since we just have to map `tir.vscale` -> 
> `llvm.vscale`
> 3. Since we can pull the information about the vector element data type from 
> the ramp node, we can deduce the minimum vector length from the multiplier
> 4. Makes it simpler to support arbitrarily long vectors*
> 
> ### Cons
> 1. Representing `lanes` in runtime data type is very awkward (see the 
> comments above)

* I don't see the `lanes` information being awkward; it is already done for 
classical x86, see: [x86 unrolled tensorizers]( 
https://github.com/apache/tvm/blob/0d338828eebaa3ff705e8521f2a1b3530f73dc7d/python/tvm/topi/x86/tensor_intrin.py#L94-L117)
* Also, given the `lanes` information, even the schedulers are starting to be 
aware of it, see this recent fragment: [x86 
proposal](https://github.com/apache/tvm/blob/0d338828eebaa3ff705e8521f2a1b3530f73dc7d/python/tvm/topi/x86/dense.py#L326-L355)
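Pro 3 above (deducing the minimum vector length from the multiplier) could be sketched roughly as follows. This is a toy model, not TVM's actual TIR nodes; all class and function names here are hypothetical:

```python
from dataclasses import dataclass
from typing import Union

# Toy model of the proposal: `VScale` stands in for the opaque,
# hardware-dependent multiplier (llvm.vscale), and a ramp's lane
# count may be `k * vscale` for a constant k.

@dataclass(frozen=True)
class VScale:
    """Placeholder for the runtime value of llvm.vscale."""

@dataclass(frozen=True)
class ScalableLanes:
    multiplier: int          # the constant k in `k * vscale`
    vscale: VScale = VScale()

@dataclass(frozen=True)
class Ramp:
    base: int
    stride: int
    lanes: Union[int, ScalableLanes]

def min_vector_length(ramp: Ramp, elem_bits: int) -> int:
    """Minimum vector length in bits, deduced from the multiplier and
    the element data type carried by the ramp node (pro 3)."""
    k = ramp.lanes.multiplier if isinstance(ramp.lanes, ScalableLanes) else ramp.lanes
    return k * elem_bits

# A ramp of float32 elements with `4 * vscale` lanes needs vectors of
# at least 4 * 32 = 128 bits (one SVE granule).
r = Ramp(base=0, stride=1, lanes=ScalableLanes(multiplier=4))
print(min_vector_length(r, elem_bits=32))  # -> 128
```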


> 2. It's harder to place restrictions on what `ramp->lanes` can be, so it can 
> get accidentally set to something nonsensical. This could be alleviated by 
> using `vscale(4)` though, as recommended by @kparzysz-quic
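The restriction hinted at above could be a simple structural check: a lane count is either a positive constant or `vscale * k` for a positive constant `k`. A minimal sketch, with the scalable form written as a tuple `("vscale", k)` purely for illustration:

```python
# Assumed semantics: `lanes` is valid only if it is a positive constant
# int, or `vscale * k` for a positive constant k (the vscale(4) form).

def validate_lanes(lanes) -> bool:
    if isinstance(lanes, int):
        return lanes > 0
    if (isinstance(lanes, tuple) and len(lanes) == 2
            and lanes[0] == "vscale" and isinstance(lanes[1], int)):
        return lanes[1] > 0
    return False  # anything else is the "nonsensical" case

print(validate_lanes(8))              # True
print(validate_lanes(("vscale", 4)))  # True  -- the vscale(4) form
print(validate_lanes(("vscale", 0)))  # False
print(validate_lanes("banana"))       # False
```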

> ```
> ramp(base, stride, vfactor)
> ```
 
> ### Cons
> 1. We don't know the implicit data type of `vfactor` that is outside of the 
> vector (this is a big problem)

* Why not have both the `vfactor` (abstract) concept and `vscale` (concrete), 
where `vfactor` would be a "virtual" multiplier telling how many times a 
single, true-typed `vscale` ramps? This makes the implicit data type known on 
one hand, and would also be expressive enough for "vectors with multiple 
vector widths".
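The dual-concept idea could look roughly like this: `vscale` stays the single concrete, hardware-typed runtime quantity, while `vfactor` is a compile-time constant counting vscale granules. A minimal sketch under those assumptions (names are hypothetical):

```python
# `vscale` is the one concrete runtime quantity (an int32, say,
# matching llvm.vscale); `vfactor` is a compile-time constant telling
# how many vscale granules one ramp covers.

VSCALE_DTYPE = "int32"  # assumed: the single concrete type of vscale

def lanes_at_runtime(vfactor: int, vscale_value: int) -> int:
    """Actual lane count once the hardware vscale is known."""
    return vfactor * vscale_value

# Two ramps with different vector widths, expressed with one `vscale`:
# on hardware where vscale == 2 (e.g. 256-bit SVE, 128-bit granules)
print(lanes_at_runtime(vfactor=4, vscale_value=2))  # -> 8 lanes
print(lanes_at_runtime(vfactor=8, vscale_value=2))  # -> 16 lanes
```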

---

Personal note:

  I would keep going (a +1 ✌️) with aligning to the LLVM concepts around the 
`vscale` type, even at the price of implementing a native data type from the 
very bottom of the dlpack stack up to the top-level TVM endings of the LLVM 
emitters.

  From an ASIC point of view, in CPU design proper, there is a clear trend 
that these single-shot atomic "reductors" are becoming increasingly 
parametrizable w.r.t. data (the veclen/lanes concept), easily trading off 
between bandwidth needs and specific data access in their hottest possible 
pipeline path.

  There is also the ["V" RISC-V 
extension](https://eupilot.eu/wp-content/uploads/2022/11/RISC-V-VectorExtension-1-1.pdf)
 that I think is well aligned with these recent concepts (if it was not even 
the first to introduce them), so it looks like this is becoming a de facto 
thing in SIMD design trends.




-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/104#issuecomment-1772429695