> From: Aleksandar Markovic > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA > instructions for MIPS big endian host > > > From: Mateja Marjanovic <mateja.marjano...@rt-rk.com> > > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA > > instructions for MIPS big endian host > > On 25.3.19. 22:21, Aleksandar Markovic wrote: > >> From: Mateja Marjanovic <mateja.marjano...@rt-rk.com> > >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA > >> instructions for MIPS big endian host > >> Please split this patch into one for load instructions and another for > >> store instructions. > > > > I will do that. > > > >> I don't think the variable "big_endian" is needed, you should better > >> resolve > >> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)", > >> like you did in other patches. It is important to be consistent. > > > > I think you can't have a preprocessing statement inside a macro, > > so I did the thing that seemed most similar to that. > > > > There should be some equivalent way of doing that involving an argument > to the macro, however, in the light of your recent work on MSA optimization, > let's split/demacro and unroll loops in load and store helpers. > > After splitting helper for LD.B should look something like this: > (the code is just for demonstration purposes, doublecheck it) > > void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) > { > wr_t *pwd = &(env->active_fpu.fpr[wd].wr); > wr_t wx; > int i; > > MEMOP_IDX(DF_BYTE) > for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { > #if !defined(CONFIG_USER_ONLY) > wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, > GETPC()); > #else > wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE)); > #endif > } > memcpy(pwd, &wx, sizeof(wr_t)); > } > > After unrolling: > > void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) > { > wr_t *pwd = &(env->active_fpu.fpr[wd].wr); > wr_t wx; > int i; > > MEMOP_IDX(DF_BYTE) > > #if !defined(CONFIG_USER_ONLY) > wx.b[0] = helper_ret_ldub_mmu(env, addr, oi, GETPC()); > wx.b[1] = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC()); > wx.b[2] = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC()); > wx.b[3] = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC()); > wx.b[4] = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC()); > wx.b[5] = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC()); > wx.b[6] = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC()); > wx.b[7] = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC()); > wx.b[8] = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC()); > wx.b[9] = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC()); > wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC()); > wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC()); > wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC()); > wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC()); > wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC()); > wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC()); > #else > wx.b[0] = cpu_ldub_data(env, addr); > wx.b[1] = cpu_ldub_data(env, addr + 1); > wx.b[2] = cpu_ldub_data(env, addr + 2); > wx.b[3] = cpu_ldub_data(env, addr + 3); > wx.b[4] = cpu_ldub_data(env, addr + 4); > wx.b[5] = cpu_ldub_data(env, addr + 5); > wx.b[6] = cpu_ldub_data(env, addr + 6); > wx.b[7] = cpu_ldub_data(env, addr + 7); > wx.b[8] = cpu_ldub_data(env, addr + 8); > wx.b[9] = cpu_ldub_data(env, addr + 9); > wx.b[10] = cpu_ldub_data(env, addr + 10); > wx.b[11] = cpu_ldub_data(env, addr + 11); > wx.b[12] = cpu_ldub_data(env, addr + 12); > wx.b[13] = cpu_ldub_data(env, addr + 13); > wx.b[14] = cpu_ldub_data(env, addr + 14); > wx.b[15] = cpu_ldub_data(env, addr + 15); > #endif > } > memcpy(pwd, &wx, sizeof(wr_t)); > } > > > And then apply big-endian-related changes. > > I think that would be more in the spirit of other MSA work.
The same approach (splitting helpers/unrolling loops) should be good for COPY_S, COPY_U, INSERT as well. Thanks, Aleksandar