https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <pa...@gcc.gnu.org>:

https://gcc.gnu.org/g:5e0f67b84a615ba186ab234a9bc43df0df5a50b6

commit r14-6528-g5e0f67b84a615ba186ab234a9bc43df0df5a50b6
Author: Juzhe-Zhong <juzhe.zh...@rivai.ai>
Date:   Thu Dec 14 11:23:43 2023 +0800

    RISC-V: Add RVV builtin vectorization cost model

    This patch fixes PR111153. Before this patch:

            ble     a1,zero,.L8
            addiw   a5,a1,-1
            li      a4,4
            addi    sp,sp,-16
            mv      a2,a0
            sext.w  a3,a1
            bleu    a5,a4,.L9
            srliw   a4,a3,2
            slli    a4,a4,4
            mv      a5,a0
            add     a4,a4,a0
            vsetivli        zero,4,e32,m1,ta,ma
            vmv.v.i v1,0
            vse32.v v1,0(sp)
    .L4:
            vle32.v v1,0(a5) ---> This loop always processes 4 elements, which
                             is fine for VLEN = 128 bits but wastes a large
                             share of the computation units when VLEN > 128 bits.
            vle32.v v2,0(sp)
            addi    a5,a5,16
            vadd.vv v1,v2,v1
            vse32.v v1,0(sp)
            bne     a4,a5,.L4
            ld      a5,0(sp)
            lw      a4,0(sp)
            andi    a1,a1,-4
            srai    a5,a5,32
            addw    a5,a4,a5
            lw      a4,8(sp)
            addw    a5,a5,a4
            ld      a4,8(sp)
            srai    a4,a4,32
            addw    a0,a5,a4
            beq     a3,a1,.L15
    .L3:
            subw    a3,a3,a1
            slli    a5,a1,32
            slli    a3,a3,32
            srli    a3,a3,32
            srli    a5,a5,30
            add     a2,a2,a5
            vsetvli a5,a3,e8,mf4,tu,mu
            vsetvli a4,zero,e32,m1,ta,ma
            sub     a1,a3,a5
            vmv.v.i v1,0
            vsetvli zero,a3,e32,m1,tu,ma
            vle32.v v2,0(a2)
            vmv.v.v v1,v2
            bne     a3,a5,.L21
    .L7:
            vsetvli a4,zero,e32,m1,ta,ma
            vmv.s.x v2,zero
            vredsum.vs      v1,v1,v2
            vmv.x.s a5,v1
            addw    a0,a0,a5
    .L15:
            addi    sp,sp,16
            jr      ra
    .L21:
            slli    a5,a5,2
            add     a2,a2,a5
            vsetvli zero,a1,e32,m1,tu,ma
            vle32.v v2,0(a2)
            vadd.vv v1,v1,v2
            j       .L7
    .L8:
            li      a0,0
            ret
    .L9:
            li      a1,0
            li      a0,0
            j       .L3

    The root cause is that the RVV builtin vectorization cost model was missing.

    After this patch:

            ble     a1,zero,.L4
            vsetvli a5,zero,e32,m1,ta,ma
            vmv.v.i v1,0
    .L3:
            vsetvli a5,a1,e32,m1,tu,ma
            vle32.v v2,0(a0)
            slli    a4,a5,2
            sub     a1,a1,a5
            add     a0,a0,a4
            vadd.vv v1,v2,v1
            bne     a1,zero,.L3
            li      a5,0
            vsetivli        zero,1,e32,m1,ta,ma
            vmv.s.x v2,a5
            vsetvli a5,zero,e32,m1,ta,ma
            vredsum.vs      v1,v1,v2
            vmv.x.s a0,v1
            ret
    .L4:
            li      a0,0
            ret

            PR target/111153

    gcc/ChangeLog:

            * config/riscv/riscv-protos.h (struct common_vector_cost): New
            struct.
            (struct scalable_vector_cost): Ditto.
            (struct cpu_vector_cost): Ditto.
            * config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add
            RVV builtin vectorization cost.
            * config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
            (get_common_costs): New function.
            (riscv_builtin_vectorization_cost): Ditto.
            (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New target hook.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.
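
For readers comparing the two listings in the commit message, the sketch below is a rough illustration and not part of the commit. It assumes the source is a plain 32-bit integer sum reduction (inferred from the e32 loads, vadd.vv and vredsum.vs; the actual contents of pr111153.c are not shown here). The helpers model_vl, iters_fixed4 and iters_vla are hypothetical names that only count main-loop iterations, and model_vl is a simplified model of vsetvli that returns min(remaining, VLMAX).

```c
#include <stddef.h>

/* Assumed shape of the scalar source loop: a 32-bit sum reduction. */
int sum_scalar(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical, simplified model of vsetvli: the number of elements
   handled in one iteration is min(remaining, vlmax). */
static int model_vl(int remaining, int vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Main-loop iterations of the pre-patch code: always 4 elements per
   iteration, whatever the hardware VLEN is. The tail is not counted. */
int iters_fixed4(int n)
{
    return n / 4;
}

/* Iterations of the post-patch length-agnostic loop for e32, m1,
   where VLMAX = VLEN / 32. */
int iters_vla(int n, int vlen_bits)
{
    int vlmax = vlen_bits / 32;
    int iters = 0;
    while (n > 0) {
        n -= model_vl(n, vlmax);
        iters++;
    }
    return iters;
}
```

Under this model, 64 elements take 16 iterations in the fixed 4-element loop on any hardware, while the length-agnostic loop needs 16 iterations at VLEN = 128 bits and only 4 at VLEN = 512 bits.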
