Richard,

Did you have a chance to consider what to do with clzw?
I would prefer to avoid the extra extension instructions and change the
implementation (and would update the commit message to provide more
context), but if you insist on setting 'ctx->w' I'll just have the extra
extensions emitted than delay this series further…

Thanks,
Philipp.

On Sun, 5 Sept 2021 at 12:01, Philipp Tomsich <philipp.toms...@vrull.eu>
wrote:

> On Sun 5. Sep 2021 at 11:11, Richard Henderson
> <richard.hender...@linaro.org> wrote:
> >
> > On 9/4/21 10:35 PM, Philipp Tomsich wrote:
> > > Assume clzw being executed on a register that is not sign-extended,
> such
> > > as for the following sequence that uses (1ULL << 63) | 392 as the
> operand
> > > to clzw:
> > >       bseti   a2, zero, 63
> > >       addi    a2, a2, 392
> > >       clzw    a3, a2
> > > The correct result of clzw would be 23, but the current implementation
> > > returns -32 (as it performs a 64bit clz, which results in 0 leading
> zero
> > > bits, and then subtracts 32).
> > >
> > > Fix this by changing the implementation to:
> > >   1. shift the original register up by 32
> > >   2. performs a target-length (64bit) clz
> > >   3. return 32 if no bits are set
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
> > > ---
> > >
> > > Changes in v10:
> > > - New patch, fixing correctnes for clzw called on a register with
> undefined
> > >    (as in: not properly sign-extended) upper bits.
> >
> > But we have
> >
> >      return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
> >
> > should *not* be undefined.  Ah, what's missing is
> >
> >      ctx->w = true;
> >
> > within trans_clzw to cause the extend to take effect.
> >
> > There are a few other "w" functions that are missing that set, though
> they use EXT_NONE so
> > there is no visible bug, it would probably be best to set w anyway.
>
>
> I had originally considered that (it would have to be ctx->w = true;
> and EXT_SIGN),
> but that has the side-effect of performing an extension on the result
> of the clzw as
> well (i.e. bot get_gpr and set_gpr are extended).
>
> However, this is not what clzw does: it only ignores the upper bits
> and then produces
> a result in the value-range [0..32]. An extension on the result of
> either a clz or clzw
> is never needed (we have been fighting that problem in GCC and had to
> use patterns
> for the combiner), so I don't want to introduce this inefficiency in qemu.
>
> If you have EXT_SIGN (or _ZERO), we will end up with 2 additional
> extensions (one
> on the argument, one on the result) in addition to the 2 other tcg
> operations that we
> need (i.e. clzi/subi for the extending case, shli/clzi for the proposed
> fix).
>
> So I believe that this commit is the best fix:
> 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the
> input _and_
> extends the output), as it only treats the input as 32bit, but the
> output is xlen.
> 2. It doesn't introduce any unnecessary extensions, but handles the
> case in-place.
>
> Philipp.
>

Reply via email to