On 07/22/2018 02:31 PM, Richard Henderson wrote:
> On 07/22/2018 01:47 PM, Jason A. Donenfeld wrote:
>> Hello,
>>
>> Gcc 7.3 compiles bash's array_flush's dual assignment using:
>>
>> STP X20, X20, [X20,#0x10]
>>
>> But gcc 8.1 compiles it as:
>>
>> STR Q0, [X20,#0x10]
>>
>> Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
>> results in a segfaulting process. I'm pretty sure this is a TCG bug.
>>
>> In the attached tarball, please find kernel and run.sh. Calling
>> ./run.sh will start the kernel with the bad bash executable that tries
>> to execute `config=({1..100000})` and crashes. Also included in there
>> is the actual crashing bash binary, in case you'd like to disassemble
>> a little bit.
>
> Interesting. The test passes on master with --enable-debug, but fails when
> qemu is compiled with optimization...
>
> I'll dig a bit deeper.

The failing sequence is

0x0045ba44:  4e080e80  dup v0.2d, x20
0x0045ba48:  90000340  adrp x0, #0x4c3000
0x0045ba4c:  91098003  add x3, x0, #0x260
0x0045ba50:  92800001  movn x1, #0
0x0045ba54:  f9413002  ldr x2, [x0, #0x260]
0x0045ba58:  3d800680  str q0, [x20, #0x10]
...

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xffffffffffffffdc       dead: 1
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0               dead: 0 1

 ---- 000000000045ba44 0000000000000000 0000000000000000
 dup_vec v128,e64,tmp2,x20
 st_vec v128,e8,tmp2,env,$0x8c0            dead: 0
...
 ---- 000000000045ba58 0000000000000000 0000000000000000
 movi_i64 tmp4,$0x10
 add_i64 tmp3,x20,tmp4                     dead: 1 2
 ld_i64 tmp4,env,$0x8c0
 movi_i64 tmp6,$0x8
 add_i64 tmp5,tmp3,tmp6                    dead: 2
 qemu_st_i64 tmp4,tmp3,leq,0               dead: 0 1
 ld_i64 tmp4,env,$0x8c8                    dead: 1
 qemu_st_i64 tmp4,tmp5,leq,0               dead: 0 1
...
0x7fffcd2e678c:  vmovq 0xe0(%r14), %xmm0
0x7fffcd2e6795:  vpbroadcastq %xmm0, %xmm1
0x7fffcd2e679a:  vmovdqu %xmm1, 0x8c0(%r14)
...
0x7fffcd2c0e78:  vmovq %xmm0, %r12
0x7fffcd2c0e7d:  addq $0x10, %r12

The guest x20 is loaded into xmm0 for the dup at 0x45ba44, and that
copy in xmm0 is reused for the store at 0x45ba58. However, if the load
at 0x45ba54 misses the TLB, then we will have a function call, which
can clobber xmm0.

With -O0, it just so happens that the function call does not clobber
xmm0; with optimization enabled, the compiler's different code
generation does clobber xmm0.

The fix is to properly consider xmm registers to be call-clobbered, at
which point the saved value is evicted from xmm0 naturally.

Patch posted separately.


r~
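
For readers following the thread, the failure mode described above can be sketched with a toy model in C. This is not QEMU or TCG code: `HostReg`, `slow_path_helper`, and `reuse_after_call` are hypothetical names invented for illustration, and the scratch value written by the helper is arbitrary. The sketch only assumes what the message states: a guest value is cached in a host vector register, a slow-path function call may overwrite that register, and the allocator is correct only if it treats the register as call-clobbered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical, not QEMU code): one host vector register
 * that the allocator may believe still caches the guest x20 value. */
typedef struct {
    uint64_t contents;   /* what the host register physically holds */
    bool     holds_x20;  /* allocator's belief: register caches guest x20 */
} HostReg;

/* Stand-in for the TLB-miss slow path. An optimized build of the helper
 * is free to use the register as scratch, which is modeled here by
 * overwriting it with an arbitrary value. */
static void slow_path_helper(HostReg *xmm0)
{
    assert(xmm0 != NULL);
    xmm0->contents = 0xdeadbeefULL;
}

/* Models "cache x20 in xmm0, make a call, then reuse x20".
 * xmm0_call_clobbered selects whether the allocator knows that the
 * register does not survive a function call. */
static uint64_t reuse_after_call(uint64_t x20_in_env, bool xmm0_call_clobbered)
{
    HostReg xmm0 = { .contents = x20_in_env, .holds_x20 = true };

    if (xmm0_call_clobbered) {
        /* Correct behaviour: forget values cached in clobbered
         * registers before emitting the call. */
        xmm0.holds_x20 = false;
    }

    slow_path_helper(&xmm0);

    /* Reuse the cached copy only if the allocator still trusts it;
     * otherwise reload the value from its home slot in env. */
    return xmm0.holds_x20 ? xmm0.contents : x20_in_env;
}
```

Calling `reuse_after_call(x20, false)` returns the helper's scratch value rather than `x20`, which corresponds to the bad store address seen in the optimized build; `reuse_after_call(x20, true)` returns the original `x20`, because the stale copy is dropped before the call and the value is reloaded.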