On Thu, 27 Jul 2023 at 17:33, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> STGP writes to tag memory, it does not check it.
> This happened to work because we wrote tag memory first
> so that the check always succeeded.
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> ---
>  target/arm/tcg/translate-a64.c | 41 +++++++++++++---------------------
>  1 file changed, 15 insertions(+), 26 deletions(-)
>
> diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
> index 5fa1257d32..dfd18e19ca 100644
> --- a/target/arm/tcg/translate-a64.c
> +++ b/target/arm/tcg/translate-a64.c
> @@ -3020,37 +3020,17 @@ static bool trans_STGP(DisasContext *s, arg_ldstpair *a)
>          tcg_gen_addi_i64(dirty_addr, dirty_addr, offset);
>      }
>
> -    if (!s->ata) {
> -        /*
> -         * TODO: We could rely on the stores below, at least for
> -         * system mode, if we arrange to add MO_ALIGN_16.
> -         */
> -        gen_helper_stg_stub(cpu_env, dirty_addr);
> -    } else if (tb_cflags(s->base.tb) & CF_PARALLEL) {
> -        gen_helper_stg_parallel(cpu_env, dirty_addr, dirty_addr);
> -    } else {
> -        gen_helper_stg(cpu_env, dirty_addr, dirty_addr);
> -    }
> -
> -    mop = finalize_memop(s, MO_64);
> -    clean_addr = gen_mte_checkN(s, dirty_addr, true, false, 2 << MO_64, mop);
> -
> +    clean_addr = clean_data_tbi(s, dirty_addr);
>      tcg_rt = cpu_reg(s, a->rt);
>      tcg_rt2 = cpu_reg(s, a->rt2);
>
>      /*
> -     * STGP is defined as two 8-byte memory operations and one tag operation.
> -     * We implement it as one single 16-byte memory operation for convenience.
> -     * Rebuild mop as for STP.
> -     * TODO: The atomicity with LSE2 is stronger than required.
> -     * Need a form of MO_ATOM_WITHIN16_PAIR that never requires
> -     * 16-byte atomicity.
> +     * STGP is defined as two 8-byte memory operations, aligned to TAG_GRANULE,
> +     * and one tag operation.  We implement it as one single aligned 16-byte
> +     * memory operation for convenience.  Note that the alignment ensures
> +     * MO_ATOM_IFALIGN_PAIR produces 8-byte atomicity for the memory store.
>       */
> -    mop = MO_128;
> -    if (s->align_mem) {
> -        mop |= MO_ALIGN_8;
> -    }
> -    mop = finalize_memop_pair(s, mop);
> +    mop = MO_128 | MO_ALIGN | MO_ATOM_IFALIGN_PAIR;
So here we're implicitly assuming TAG_GRANULE is 16 and then relying
on the codegen for a MO_128 | MO_ALIGN operation to give us the
alignment fault if the guest address isn't aligned to the tag
granule, right?

Previously we also put s->be_data into the MemOp (via
finalize_memop_pair() calling finalize_memop_atom()). Don't we still
need to do that?

(We do explicitly swap the two i64s into the i128 in different orders
depending on the be_data setting, but I think that is to handle
MO_128 | MO_BE giving a true BE 128-bit store, whereas what we're
implementing is two BE 64-bit stores.)

>      tmp = tcg_temp_new_i128();
>      if (s->be_data == MO_LE) {
> @@ -3060,6 +3040,15 @@ static bool trans_STGP(DisasContext *s, arg_ldstpair *a)
>      }
>      tcg_gen_qemu_st_i128(tmp, clean_addr, get_mem_index(s), mop);

thanks
-- PMM
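PS: to make the be_data question concrete, the kind of thing I had in
mind is the snippet below. This is only a sketch, not tested; it
assumes the existing finalize_memop_atom() helper, which today ORs
s->be_data (plus the usual s->align_mem handling) into the MemOp:

    /*
     * Build mop via finalize_memop_atom() so that s->be_data is
     * folded back into the MemOp, as the old finalize_memop_pair()
     * path did for us, while keeping the explicit MO_ALIGN so the
     * 16-byte store still produces the tag-granule alignment fault.
     */
    mop = finalize_memop_atom(s, MO_128 | MO_ALIGN, MO_ATOM_IFALIGN_PAIR);

If that's right, the be_data-dependent swap of the two i64s into the
i128 below would still be needed, since the i128 store itself remains
a single 16-byte access rather than two 64-bit ones.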