On Fri, Jan 8, 2016 at 4:53 PM, Alex Bennée <alex.ben...@linaro.org> wrote:
> From: Alvise Rigo <a.r...@virtualopensystems.com>
>
> Attempting to simplify the helper_*_st_name, wrap the
> do_unaligned_access code into an shared inline function. As this also
> removes the goto statement the inline code is expanded twice in each
> helper.
>
> Suggested-by: Jani Kokkonen <jani.kokko...@huawei.com>
> Suggested-by: Claudio Fontana <claudio.font...@huawei.com>
> CC: Alvise Rigo <a.r...@virtualopensystems.com>
> Signed-off-by: Alex Bennée <alex.ben...@linaro.org>
>
> ---
> v2
>   - based on original patch from Alvise
>   - uses a single shared inline function to reduce duplication
> ---
>  softmmu_template.h | 75
> ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 39 insertions(+), 36 deletions(-)
>
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 0074bd7..ac0b4ac 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -159,6 +159,39 @@ static inline int smmu_helper(victim_tlb_hit) (const
> bool is_read, CPUArchState
>  }
>
>  #ifndef SOFTMMU_CODE_ACCESS
> +
> +static inline void smmu_helper(do_unl_store)(CPUArchState *env,
> +                                             bool little_endian,
> +                                             DATA_TYPE val,
> +                                             target_ulong addr,
> +                                             TCGMemOpIdx oi,
> +                                             unsigned mmu_idx,
> +                                             uintptr_t retaddr)
> +{
> +    int i;
> +
> +    if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> +        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> +                             mmu_idx, retaddr);
> +    }
> +    /* Note: relies on the fact that tlb_fill() does not remove the
> +     * previous page from the TLB cache. */
> +    for (i = DATA_SIZE - 1; i >= 0; i--) {
> +        uint8_t val8;
> +        if (little_endian) {
> +            /* Little-endian extract. */
> +            val8 = val >> (i * 8);
> +        } else {
> +            /* Big-endian extract. */
> +            val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> +        }
> +        /* Note the adjustment at the beginning of the function.
> +           Undo that for the recursion. */
> +        glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> +                                        oi, retaddr + GETPC_ADJ);
> +    }
> +}
> +
>  static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>                                                CPUIOTLBEntry *iotlbentry,
>                                                target_ulong addr,
> @@ -416,7 +449,8 @@ void helper_le_st_name(CPUArchState *env, target_ulong
> addr, DATA_TYPE val,
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>          CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
> -            goto do_unaligned_access;
> +            smmu_helper(do_unl_store)(env, true, val, addr, oi, mmu_idx,
> retaddr);
> +            return;
>          }
>          iotlbentry = &env->iotlb[mmu_idx][index];
>
> @@ -431,23 +465,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong
> addr, DATA_TYPE val,
>      if (DATA_SIZE > 1
>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>                       >= TARGET_PAGE_SIZE)) {
> -        int i;
> -    do_unaligned_access:
> -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> -                                 mmu_idx, retaddr);
> -        }
> -        /* XXX: not efficient, but simple */
> -        /* Note: relies on the fact that tlb_fill() does not remove the
> -         * previous page from the TLB cache. */
> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> -            /* Little-endian extract. */
> -            uint8_t val8 = val >> (i * 8);
> -            /* Note the adjustment at the beginning of the function.
> -               Undo that for the recursion. */
> -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> -                                            oi, retaddr + GETPC_ADJ);
> -        }
> +        smmu_helper(do_unl_store)(env, true, val, addr, oi, mmu_idx,
> retaddr);
>          return;
>      }
>
> @@ -496,7 +514,8 @@ void helper_be_st_name(CPUArchState *env, target_ulong
> addr, DATA_TYPE val,
>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>          CPUIOTLBEntry *iotlbentry;
>          if ((addr & (DATA_SIZE - 1)) != 0) {
> -            goto do_unaligned_access;
> +            smmu_helper(do_unl_store)(env, false, val, addr, oi, mmu_idx,
> retaddr);
> +            return;
>          }
>          iotlbentry = &env->iotlb[mmu_idx][index];
>
> @@ -511,23 +530,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong
> addr, DATA_TYPE val,
>      if (DATA_SIZE > 1
>          && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
>                       >= TARGET_PAGE_SIZE)) {
> -        int i;
> -    do_unaligned_access:
> -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> -                                 mmu_idx, retaddr);
> -        }
> -        /* XXX: not efficient, but simple */
> -        /* Note: relies on the fact that tlb_fill() does not remove the
> -         * previous page from the TLB cache. */
> -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> -            /* Big-endian extract. */
> -            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> -            /* Note the adjustment at the beginning of the function.
> -               Undo that for the recursion. */
> -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> -                                            oi, retaddr + GETPC_ADJ);
> -        }
> +        smmu_helper(do_unl_store)(env, false, val, addr, oi, mmu_idx,
> retaddr);
>          return;
>      }
>
> --
> 2.6.4
>

This approach makes sense to me: once the function is inlined, only the
leg of the *if* statement selected by the (constant) value of
little_endian ends up in each helper. What does not convince me is that
we are not forcing the inlining; we are relying on compiler optimizations
to do it. I wonder whether this will always happen, including with other
compilers (clang).

alvise
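
To make the concern concrete, here is a minimal sketch (not from the patch) of how the inlining could be forced instead of left to the optimizer. FORCE_INLINE, store_bytes, store_le and store_be are hypothetical names used only for illustration, and a plain uint32_t stands in for DATA_TYPE. The always_inline attribute itself is accepted by both GCC and clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro, not part of the patch: request inlining regardless
 * of the optimizer's own heuristics where the attribute is available. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Simplified stand-in for the shared store helper: writes a 32-bit value
 * byte by byte, choosing the extraction by the little_endian flag. */
FORCE_INLINE void store_bytes(uint8_t *dst, bool little_endian, uint32_t val)
{
    const int size = (int)sizeof(val);
    int i;

    for (i = size - 1; i >= 0; i--) {
        if (little_endian) {
            /* Little-endian extract. */
            dst[i] = (uint8_t)(val >> (i * 8));
        } else {
            /* Big-endian extract. */
            dst[i] = (uint8_t)(val >> (((size - 1) * 8) - (i * 8)));
        }
    }
}

/* Each wrapper passes a compile-time constant, mirroring the le/be helpers. */
void store_le(uint8_t *dst, uint32_t val)
{
    store_bytes(dst, true, val);
}

void store_be(uint8_t *dst, uint32_t val)
{
    store_bytes(dst, false, val);
}
```

With the attribute, the body is expanded into each wrapper whatever the optimization level; dropping the unused leg of the *if* still depends on the optimizer folding the constant argument. Without it, the compiler is free to emit a single out-of-line copy with a runtime test on little_endian.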