On Tue, Sep 29, 2020 at 7:22 PM 夏 晋 <ilyply2...@hotmail.com> wrote:
> vint16m1_t foo3(vint16m1_t a, vint16m1_t b){
>   vint16m1_t add = a+b;
>   vint16m1_t mul = a*b;
>   vsetvl_e8m1(32);
>   return add + mul;
> }

Taking another look at your example, you have type confusion. Using vsetvl to specify an element width of 8 does not magically convert types into 8-bit vector types. They are still 16-bit vector types and will still result in 16-bit vector operations. So your explicit vsetvl_e8m1 is completely useless.

In the RISC-V V scheme, every vector operation emits an implicit vsetvl instruction, and then we optimize away the redundant ones. So the add and mul at the start are emitting two vsetvl instructions. Then you have an explicit vsetvl. Then another add, which will emit another implicit vsetvl. The compiler reordered the arithmetic in such a way that two of the implicit vsetvl instructions can be optimized away. That probably happened by accident. But we don't have support for optimizing away the useless explicit vsetvl, so it remains.

Jim