Thanks for your comments @kparzysz-quic! Some clarifying questions and thoughts:

> Add a parameter to tir.vscale to state the minimal assumed vector length. For
> AArch64 SVE it will be 128 (bits), but some other non-SVE architecture can
> provide a different value (via a target hook, or something like that).

Happy to include it, but I'd like to better understand the value it would add. AFAIK `llvm::vscale` does not have a minimum vector length associated with it; the minimum length is encoded in the "multiplier", e.g. in

```
%wide.load11 = load <vscale x 4 x i32>, ptr %12
```

the 4 represents min_vector_length / size_of_the_data_type. If we follow that philosophy and mimic LLVM's `vscale` in TIR, then it will be the responsibility of the author of the target-specific schedule to set that multiplier correctly. It would be different if we opted for something like `vfactor` instead of `vscale` (as originally proposed in the RFC), since `vfactor` would essentially represent the number of elements in a vector, which would depend on the minimum length. I'm mostly looking at this from the point of view of SVE, so I'm interested to learn whether there is a case for it in other scalable architecture extensions.

> If you plan to include predication eventually, that would be something that a
> lot of targets could use. The LLVM intrinsics for predicated operations do
> not explicitly require SVE, they can be used with fixed-sized vectors as well.

Agreed! This might require its own mini-RFC.

> For dealing with an unknown vector lengths and simultaneously allowing
> specific lengths per use-site we could either
> 1. Require that if Ramp/Broadcast has lanes == -1, then the base/value member
> must be a TIR intrinsic specifying the vscale for the value. E.g.
> Ramp(tir.vscale(128, base), stride, -1) or Broadcast(tir.vscale(256, value),
> -1).
> 2. Extend Ramp and Broadcast to take lanes as PrimExpr, with restrictions on
> what that expression can contain.

Option 2 is what we propose in this RFC.
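
As a rough illustration of the multiplier arithmetic above and of what a lanes expression under option 2 might look like, here is a plain-Python sketch. It does not use TVM: `vscale_multiplier` and `ScalableLanes` are hypothetical names, and restricting lanes to a constant multiple of vscale is an assumption made only for this example.

```python
from dataclasses import dataclass

# Minimum SVE vector length in bits, as mentioned in the quoted comment.
SVE_MIN_VECTOR_BITS = 128


def vscale_multiplier(dtype_bits: int, min_vector_bits: int = SVE_MIN_VECTOR_BITS) -> int:
    """Return min_vector_length / size_of_the_data_type (hypothetical helper)."""
    if dtype_bits <= 0 or min_vector_bits % dtype_bits != 0:
        raise ValueError("data type width must evenly divide the minimum vector length")
    return min_vector_bits // dtype_bits


@dataclass(frozen=True)
class ScalableLanes:
    """Hypothetical stand-in for a lanes expression of the form `multiplier * vscale`."""

    multiplier: int

    def concrete(self, vscale: int) -> int:
        """Lane count once the hardware's vscale value is known."""
        return self.multiplier * vscale

    def __str__(self) -> str:
        return f"{self.multiplier} * vscale"


# i32 on SVE: 128 / 32 = 4, matching <vscale x 4 x i32> in the LLVM IR above.
lanes_i32 = ScalableLanes(vscale_multiplier(32))
print(lanes_i32)
# With vscale = 2 (256-bit vectors) the same expression covers 8 lanes.
print(lanes_i32.concrete(vscale=2))
```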
From some prototyping experience, option 2 would let us reuse all the current vector infrastructure in TVM, and the LLVM codegen pretty much "just works", needing about 10 lines to map `tir.vscale` to `llvm::vscale`. (That applies to simple consecutive loads and stores; it is a bit more complex for things like ramps with stride != 1.)

I'm not in favour of exposing -1 to the user in any form, e.g. in TVMScript or when printing TIR, since it is not a particularly intuitive interface. The only reason for -1 is the DLPack standard, for which we need a way to express scalable vectors. Another way to handle this would be to add a new field to `DLDataType`, e.g. `bool is_scalable`, but I'm not sure how feasible changing that standard is.