Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> I am not very familiar with the memory pattern, maybe juzhe can provide more
> information or correct me if anything is misleading.
>
> The different precision try to resolve the below bugs, the second vlm(with
> different size of load bytes compared to first one)
> is eliminated because vbool8 and vbool16 have the same precision size, aka
> [8, 8].
>
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>
> addi a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi a1,a1,200
> vlm.v v24,0(a0)
> vsm.v v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v v24,0(a1)

But I think it's important to think about the patch as more than a way
of fixing the bug above.  The aim has to be to describe the modes as they
really are.

I don't think there's a way for GET_MODE_SIZE to be "conservatively wrong".
A GET_MODE_SIZE that is too small would cause problems.  So would a
GET_MODE_SIZE that is too big.

Like Richard says, I think the question comes down to the amount of padding.
Is it the case that for 4+4X ([4,4]), the memory representation has 4 bits
of padding for even X and 0 bits of padding for odd X?

I agree getting rid of GET_MODE_SIZE and representing everything in bits
would avoid the problem at this point, but I think it would just be pushing
the difficulty elsewhere.  E.g. stack layout will be "interesting" if we
can't work in byte sizes.

Thanks,
Richard
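
For reference, the padding arithmetic behind the 4+4X question can be sketched
in a few lines of C.  This is only an illustration, not GCC code: it assumes
the memory representation rounds the bit precision up to whole bytes, treats X
as the runtime indeterminate of the [4,4] precision, and uses hypothetical
helper names (precision_bits, size_bytes, padding_bits) that do not exist in
GCC.

```c
/* Illustrative only: none of these helpers are part of GCC.  */

/* Precision in bits of a mode described by the polynomial 4 + 4X,
   i.e. [4,4], for a given runtime value of X.  */
static unsigned precision_bits(unsigned x)
{
    return 4 + 4 * x;
}

/* Bytes needed to hold BITS bits, assuming the size is rounded up
   to a whole number of bytes.  */
static unsigned size_bytes(unsigned bits)
{
    return (bits + 7) / 8;
}

/* Bits left unused in the final byte under the same assumption.  */
static unsigned padding_bits(unsigned bits)
{
    return size_bytes(bits) * 8 - bits;
}
```

Under that rounding assumption, X = 0 gives 4 bits in 1 byte with 4 bits of
padding, and X = 1 gives 8 bits in 1 byte with no padding.  The pattern repeats
for larger X: even values leave 4 padding bits, odd values leave none.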