i386: reimplement 0x0f 0x28-0x2f, add AVX

Peter Maydell Tue, 09 May 2023 03:53:36 -0700

On Tue, 18 Oct 2022 at 15:32, Paolo Bonzini <pbonz...@redhat.com> wrote:
>
> Here the code is a bit uglier due to the truncation and extension
> of registers to and from 32-bit.  There is also a mistake in the
> manual with respect to the size of the memory operand of CVTPS2PI
> and CVTTPS2PI, reported by Ricky Zhou.
>
> Reviewed-by: Richard Henderson <richard.hender...@linaro.org>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---


Hi; in bug https://gitlab.com/qemu-project/qemu/-/issues/1637
it's been reported that there's an error in the handling
of the UCOMISS instruction that was added in this commit:

>  static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry 
> *entry, uint8_t *b)
>  {
>      if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) {
> @@ -746,6 +793,15 @@ static const X86OpEntry opcodes_0F[256] = {
>      [0x76] = X86_OP_ENTRY3(PCMPEQD,    V,x, H,x, W,x,  vex4 mmx avx2_256 
> p_00_66),
>      [0x77] = X86_OP_GROUP0(0F77),
>
> +    [0x28] = X86_OP_ENTRY3(MOVDQ,      V,x,  None,None, W,x, vex1 p_00_66), 
> /* MOVAPS */
> +    [0x29] = X86_OP_ENTRY3(MOVDQ,      W,x,  None,None, V,x, vex1 p_00_66), 
> /* MOVAPS */
> +    [0x2A] = X86_OP_GROUP0(0F2A),
> +    [0x2B] = X86_OP_GROUP0(0F2B),
> +    [0x2C] = X86_OP_GROUP0(0F2C),
> +    [0x2D] = X86_OP_GROUP0(0F2D),
> +    [0x2E] = X86_OP_ENTRY3(VUCOMI,     None,None, V,x, W,x,  vex4 p_00_66),
> +    [0x2F] = X86_OP_ENTRY3(VCOMI,      None,None, V,x, W,x,  vex4 p_00_66),

I've figured out what's going wrong, but the tables here are
a bit opaque for my level of understanding of the instruction
set. The problem seems to be that we assume that UCOMISS takes
two 64 bit arguments. However, the documentation says that it
operates on the low 32-bits of a register operand and on a
32-bit memory location. (UCOMISD operates on 64-bit values.)
This only causes a problem when the guest code tries to do
a UCOMISS on a 32-bit memory operand that happens to be at
the very end of a page where there is nothing mapped after it.
It looks like QEMU always tries to load a full 128 bits from
memory, and so it causes a segfault that shouldn't happen:
you can see that we do two qemu_ld_i64 from rcx + 0xff4 and
rcx + 0xffc, when we should only be loading a single 32 bit value.

0x400000116f:  0f 2e 81 f4 0f 00 00     ucomiss  0xff4(%rcx), %xmm0

OP:
 ld_i32 loc9,env,$0xfffffffffffffff8
 brcond_i32 loc9,$0x0,lt,$L0

 ---- 000000400000116f 0000000000000000
 add_i64 loc2,rcx,$0xff4
 qemu_ld_i64 loc4,loc2,leq,0
 st_i64 loc4,env,$0xb50
 add_i64 loc3,loc2,$0x8
 qemu_ld_i64 loc4,loc3,leq,0
 st_i64 loc4,env,$0xb58
 add_i64 loc13,env,$0xb50
 add_i64 loc15,env,$0x350
 call ucomiss,$0x0,$0,env,loc15,loc13

(this is from the test case in the bug report)

Presumably the table entry should be marked up somehow to
indicate that the operand size is only 32 bits/64 bits,
but I'm not sure how that should be done. Any suggestions?

thanks
-- PMM

Re: [PULL 47/53] target/i386: reimplement 0x0f 0x28-0x2f, add AVX

Reply via email to