On Mon, Sep 1, 2025 at 6:51 PM Robin Dapp <rdapp....@gmail.com> wrote:
>
> > We used to apply -mrvv-max-lmul= to limit VLS code gen, auto vectorizer,
> > and builtin string function expansion. But I think the VLS code gen part
> > doesn't need this limit, since it only happens when the user explicitly
> > writes vector types.
> >
> > For example, int32x8_t under -mrvv-max-lmul=m1 with VLEN=128 would be
> > split into two int32x4_t, which generates more instructions and runs
> > slower.
> >
> > In this patch, I changed -mrvv-max-lmul= to only affect auto vectorization
> > and builtin string function expansion. Actually, the option's help text
> > already says it only controls the LMUL used by auto-vectorization, so I
> > believe this change makes sense :)
>
> This might have been discussed while I was away so I haven't complained yet :)
> To me the -mrvv-max-lmul option always included "everything" and IMHO the
> maximum LMUL should be generally tied to a microarchitecture.
>
> Many of the higher-end cores won't favor LMUL > 1 and I'd find it
> surprising if we started emitting LMUL8 even for fixed vector sizes.
>
> To play devil's advocate: If LMUL8 (or 4, 2) is faster why don't we enable it
> unconditionally?  Not that I think it's generally faster but what's special
> about such a VLS example that doesn't hold for auto-vectorization?
>
> Is the code for this example particularly bad for LMUL1 or is it optimal and
> LMUL8 is just faster on your uarchs?

The main reason is that I’m working on the fixed-length-vector calling
convention [1]. For that, I need all these VLS types to be available so
that arguments can be passed correctly.

I know LMUL choice is very u-arch specific, so I agree the option makes
sense for the vectorizer. But when people use fixed-length vectors in
their code, I think it’s a bit different. My assumption is that if
someone writes code with fixed-length vectors, they usually expect it to
map directly to hardware operations, not to be split into smaller ones.
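
A minimal sketch of such user code, assuming the GCC generic vector
extension; the typedef spelling and the function name add_i32x8 are
illustrative and not taken from the patch:

```c
#include <stdint.h>

/* Eight 32-bit lanes, 256 bits in total, declared with the GCC/Clang
   generic vector extension. The name int32x8_t mirrors the example in
   the quoted patch description; this typedef is an assumption. */
typedef int32_t int32x8_t __attribute__((vector_size(32)));

/* Hypothetical helper: the vector type is written explicitly by the
   user and is passed and returned by value. */
int32x8_t add_i32x8(int32x8_t a, int32x8_t b)
{
    /* Element-wise addition across all eight lanes. */
    return a + b;
}
```

Built for RISC-V with something like -march=rv64gcv -O2 (an assumed
command line), this addition is the kind of operation that, per the
quoted example, -mrvv-max-lmul=m1 would split into two int32x4_t
halves when VLEN=128.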

Also, the option doesn’t really match the meaning of explicit vector
types in code. For example, with -mrvv-max-lmul=dynamic, the current
implementation basically acts the same as -mrvv-max-lmul=m8 for VLS
types.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418


>
> --
> Regards
>  Robin
>
