(added qemu-devel to the cc list) On Mon, 10 Feb 2025 at 17:26, Stu Grossman <stu.gross...@gmail.com> wrote: > > I've been getting SIGBUS cores with a bunch of user apps running under > linux 5.15 and qemu-system-aarch64. These happen to be 32 bit (T32?) > programs. > > All of the cores point at the following instruction: > > ldrd r2,r3,[r2] > > where r2 points at four bytes prior to the end of a page. Eg: 0x10ffc. > > The bug appears to be that we get a page fault between the accesses for > each word. Prior to the fault, the first register is updated. Later, > after the fault is serviced, the instruction is restarted with the index > register containing the word loaded prior to the fault, which is > probably not the desired address, resulting in a coredump. > > So the conditions for the problem are: > > - the index register must be one of the registers being loaded. > - the index register must point at a uint64_t that spans a page > boundary. > > I understand that the uint64_t spanning a page boundary may not be > allowed by the arm 32 bit API. However, the ldrd instruction definition > allows for four byte alignment.
Yes, you're right; this is a longstanding bug in our LDRD implementation. The Arm ARM defines in section G1.17.8.3 that if the execution of an instruction generates a synhcronous Data Abort then the base register should be restored to its original value if the aborted instruction is a load and the list of registers to be loaded includes the base register. > Note that ldm may have similar issues if the index is one of the > registers being loaded and the words cross a page boundary. I believe in LDM we get this correct: in the do_ldm() function we have a case for 'i == a->rn' which stashes the loaded value into a TCG temporary, and we don't write that into the actual register until after we've completed all the loads. > The fix is to defer the register stores till after both words have been > read from memory. > > Here is my fix: > > diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c > index 9ee761fc64..78ad0ed186 100644 > --- a/target/arm/tcg/translate.c > +++ b/target/arm/tcg/translate.c > @@ -5006,7 +5006,7 @@ static bool op_store_rr(DisasContext *s, > arg_ldst_rr *a, > static bool trans_LDRD_rr(DisasContext *s, arg_ldst_rr *a) > { > int mem_idx = get_mem_index(s); > - TCGv_i32 addr, tmp; > + TCGv_i32 addr, tmp1, tmp2; > > if (!ENABLE_ARCH_5TE) { > return false; > @@ -5017,15 +5017,16 @@ static bool trans_LDRD_rr(DisasContext *s, > arg_ldst_rr *a) > } > addr = op_addr_rr_pre(s, a); > > - tmp = tcg_temp_new_i32(); > - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN); > - store_reg(s, a->rt, tmp); > + tmp1 = tcg_temp_new_i32(); > + gen_aa32_ld_i32(s, tmp1, addr, mem_idx, MO_UL | MO_ALIGN); > > tcg_gen_addi_i32(addr, addr, 4); > > - tmp = tcg_temp_new_i32(); > - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN); > - store_reg(s, a->rt + 1, tmp); > + tmp2 = tcg_temp_new_i32(); > + gen_aa32_ld_i32(s, tmp2, addr, mem_idx, MO_UL | MO_ALIGN); > + > + store_reg(s, a->rt, tmp1); > + store_reg(s, a->rt + 1, tmp2); > > /* LDRD w/ base writeback is undefined if the registers overlap. */ > op_addr_rr_post(s, a, addr, -4); > @@ -5153,19 +5154,20 @@ static bool op_store_ri(DisasContext *s, > arg_ldst_ri *a, > static bool op_ldrd_ri(DisasContext *s, arg_ldst_ri *a, int rt2) > { > int mem_idx = get_mem_index(s); > - TCGv_i32 addr, tmp; > + TCGv_i32 addr, tmp1, tmp2; > > addr = op_addr_ri_pre(s, a); > > - tmp = tcg_temp_new_i32(); > - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN); > - store_reg(s, a->rt, tmp); > + tmp1 = tcg_temp_new_i32(); > + gen_aa32_ld_i32(s, tmp1, addr, mem_idx, MO_UL | MO_ALIGN); > > tcg_gen_addi_i32(addr, addr, 4); > > - tmp = tcg_temp_new_i32(); > - gen_aa32_ld_i32(s, tmp, addr, mem_idx, MO_UL | MO_ALIGN); > - store_reg(s, rt2, tmp); > + tmp2 = tcg_temp_new_i32(); > + gen_aa32_ld_i32(s, tmp2, addr, mem_idx, MO_UL | MO_ALIGN); > + > + store_reg(s, a->rt, tmp1); > + store_reg(s, rt2, tmp2); > > /* LDRD w/ base writeback is undefined if the registers overlap. */ > op_addr_ri_post(s, a, addr, -4); Yes, this fix looks correct to me. Can you provide a Signed-off-by: tag for it? We can't accept it as a patch without that. (I can do the other administrative tidying up of it into a commit, but the signed-off-by is what says you have the legal right and are happy to submit it to QEMU under our license (LGPLv2.1+ in this case)). thanks -- PMM