Hi Robin: I just got a few more clarifications from Andrew about the behavior of the valid* LMUL for ELEN=32.
* "Valid" may not be the precise word; the point is that the spec guarantees these settings must be implemented.

The spec [1] says:

---
When LMUL < SEWMIN/ELEN, there is no guarantee an implementation would have
enough bits in the fractional vector register to store at least one element,
as VLEN=ELEN is a valid implementation choice. For example, with VLEN=ELEN=32,
and SEWMIN=8, an LMUL of 1/8 would only provide four bits of storage in a
vector register.

For a given supported fractional LMUL setting, implementations must support
SEW settings between SEWMIN and LMUL * ELEN, inclusive.
---

So, for ELEN=32 and SEW = 8, 16, 32, the valid SEW ranges for each fractional LMUL are:

mf8 = [8, (1/8)*32] = [8, 4]  = [], no SEW is valid with mf8 for ELEN=32
mf4 = [8, (1/4)*32] = [8, 8]  = only SEW 8 is valid with mf4
mf2 = [8, (1/2)*32] = [8, 16] = SEW 8 and 16 are valid with mf2

[1] https://github.com/riscvarchive/riscv-v-spec/blob/master/v-spec.adoc#342-vector-register-grouping-vlmul20

>> >> In particular, how would the same LMUL for AVL=2 and AVL=4 and the same
>> >> data type be correct?
>> >
>> > That's right. The case just allocates more space, but storing 2 and 4
>> > elements remains the same.
>>
>> Even if a V2SI with LMUL=1 on VLEN=128 doesn't lead to a SIGILL right away
>> it would surely modify the overlap constraints and such. To me that doesn't
>> look right.

I am not sure I got the point. We are using early clobber to avoid the overlap
constraint, which is a pretty conservative approach and should not cause a
problem here when we use LMUL=1/SEW=32 for V2SI. Or are you worried that we
may put two V2SI values in a single vector register?
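
For reference, the mf8/mf4/mf2 ranges above can be reproduced with a small sketch. This is only an illustration of the "SEWMIN to LMUL * ELEN, inclusive" rule quoted from the spec, not code from the patch: the function name `fractional_lmul_sews` and its parameters are hypothetical, and it assumes SEW only takes power-of-two values starting at SEWMIN=8.

```python
def fractional_lmul_sews(lmul_denominator, elen=32, sew_min=8):
    """Return the SEW values that must be supported for LMUL = 1/lmul_denominator.

    Hypothetical helper: applies the quoted rule that SEW ranges from
    SEWMIN up to LMUL * ELEN, inclusive, stepping through powers of two.
    """
    upper = elen // lmul_denominator  # LMUL * ELEN for a fractional LMUL
    sews = []
    sew = sew_min
    while sew <= upper:
        sews.append(sew)
        sew *= 2
    return sews


if __name__ == "__main__":
    # mf8, mf4, mf2 correspond to LMUL = 1/8, 1/4, 1/2.
    for name, denominator in [("mf8", 8), ("mf4", 4), ("mf2", 2)]:
        print(name, fractional_lmul_sews(denominator))
```

With the default ELEN=32 this gives an empty list for mf8, [8] for mf4, and [8, 16] for mf2, matching the ranges worked out by hand above.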