On Tue, 18 Oct 2022 at 15:32, Paolo Bonzini <pbonz...@redhat.com> wrote: > > Here the code is a bit uglier due to the truncation and extension > of registers to and from 32-bit. There is also a mistake in the > manual with respect to the size of the memory operand of CVTPS2PI > and CVTTPS2PI, reported by Ricky Zhou. > > Reviewed-by: Richard Henderson <richard.hender...@linaro.org> > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> > ---
Hi; in bug https://gitlab.com/qemu-project/qemu/-/issues/1637 it's been reported that there's an error in the handling of the UCOMISS instruction that was added in this commit: > static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry > *entry, uint8_t *b) > { > if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) { > @@ -746,6 +793,15 @@ static const X86OpEntry opcodes_0F[256] = { > [0x76] = X86_OP_ENTRY3(PCMPEQD, V,x, H,x, W,x, vex4 mmx avx2_256 > p_00_66), > [0x77] = X86_OP_GROUP0(0F77), > > + [0x28] = X86_OP_ENTRY3(MOVDQ, V,x, None,None, W,x, vex1 p_00_66), > /* MOVAPS */ > + [0x29] = X86_OP_ENTRY3(MOVDQ, W,x, None,None, V,x, vex1 p_00_66), > /* MOVAPS */ > + [0x2A] = X86_OP_GROUP0(0F2A), > + [0x2B] = X86_OP_GROUP0(0F2B), > + [0x2C] = X86_OP_GROUP0(0F2C), > + [0x2D] = X86_OP_GROUP0(0F2D), > + [0x2E] = X86_OP_ENTRY3(VUCOMI, None,None, V,x, W,x, vex4 p_00_66), > + [0x2F] = X86_OP_ENTRY3(VCOMI, None,None, V,x, W,x, vex4 p_00_66), I've figured out what's going wrong, but the tables here are a bit opaque for my level of understanding of the instruction set. The problem seems to be that we assume that UCOMISS takes two 64 bit arguments. However, the documentation says that it operates on the low 32-bits of a register operand and on a 32-bit memory location. (UCOMISD operates on 64-bit values.) This only causes a problem when the guest code tries to do a UCOMISS on a 32-bit memory operand that happens to be at the very end of a page where there is nothing mapped after it. It looks like QEMU always tries to load a full 128 bits from memory, and so it causes a segfault that shouldn't happen: you can see that we do two qemu_ld_i64 from rcx + 0xff4 and rcx + 0xffc, when we should only be loading a single 32 bit value. 0x400000116f: 0f 2e 81 f4 0f 00 00 ucomiss 0xff4(%rcx), %xmm0 OP: ld_i32 loc9,env,$0xfffffffffffffff8 brcond_i32 loc9,$0x0,lt,$L0 ---- 000000400000116f 0000000000000000 add_i64 loc2,rcx,$0xff4 qemu_ld_i64 loc4,loc2,leq,0 st_i64 loc4,env,$0xb50 add_i64 loc3,loc2,$0x8 qemu_ld_i64 loc4,loc3,leq,0 st_i64 loc4,env,$0xb58 add_i64 loc13,env,$0xb50 add_i64 loc15,env,$0x350 call ucomiss,$0x0,$0,env,loc15,loc13 (this is from the test case in the bug report) Presumably the table entry should be marked up somehow to indicate that the operand size is only 32 bits/64 bits, but I'm not sure how that should be done. Any suggestions? thanks -- PMM