On Sun 5. Sep 2021 at 11:11, Richard Henderson <richard.hender...@linaro.org> wrote: > > On 9/4/21 10:35 PM, Philipp Tomsich wrote: > > Assume clzw being executed on a register that is not sign-extended, such > > as for the following sequence that uses (1ULL << 63) | 392 as the operand > > to clzw: > > bseti a2, zero, 63 > > addi a2, a2, 392 > > clzw a3, a2 > > The correct result of clzw would be 23, but the current implementation > > returns -32 (as it performs a 64bit clz, which results in 0 leading zero > > bits, and then subtracts 32). > > > > Fix this by changing the implementation to: > > 1. shift the original register up by 32 > > 2. performs a target-length (64bit) clz > > 3. return 32 if no bits are set > > > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu> > > --- > > > > Changes in v10: > > - New patch, fixing correctnes for clzw called on a register with undefined > > (as in: not properly sign-extended) upper bits. > > But we have > > return gen_unary(ctx, a, EXT_ZERO, gen_clzw); > > should *not* be undefined. Ah, what's missing is > > ctx->w = true; > > within trans_clzw to cause the extend to take effect. > > There are a few other "w" functions that are missing that set, though they > use EXT_NONE so > there is no visible bug, it would probably be best to set w anyway.
I had originally considered that (it would have to be ctx->w = true; and EXT_SIGN), but that has the side-effect of performing an extension on the result of the clzw as well (i.e. bot get_gpr and set_gpr are extended). However, this is not what clzw does: it only ignores the upper bits and then produces a result in the value-range [0..32]. An extension on the result of either a clz or clzw is never needed (we have been fighting that problem in GCC and had to use patterns for the combiner), so I don't want to introduce this inefficiency in qemu. If you have EXT_SIGN (or _ZERO), we will end up with 2 additional extensions (one on the argument, one on the result) in addition to the 2 other tcg operations that we need (i.e. clzi/subi for the extending case, shli/clzi for the proposed fix). So I believe that this commit is the best fix: 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the input _and_ extends the output), as it only treats the input as 32bit, but the output is xlen. 2. It doesn't introduce any unnecessary extensions, but handles the case in-place. Philipp.