> From: Mateja Marjanovic <mateja.marjano...@rt-rk.com> > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA > instructions for MIPS big endian host > On 25.3.19. 22:21, Aleksandar Markovic wrote: >> From: Mateja Marjanovic <mateja.marjano...@rt-rk.com> >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions >> for MIPS big endian host >> Please split this patch into one for load instructions and another for >> store instructions. > > I will do that. > >> I don't think the variable "big_endian" is needed, you should better resolve >> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)", >> like you did in other patches. It is important to be consistent. > > I think you can't have a preprocessing statement inside a macro, > so I did the thing that seemed most similar to that. >
There should be some equivalent way of doing that involving an argument to the macro, however, in the light of your recent work on MSA optimization, let's split/demacro and unroll loops in load and store helpers. After splitting helper for LD.B should look something like this: (the code is just for demonstration purposes, doublecheck it) void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) { wr_t *pwd = &(env->active_fpu.fpr[wd].wr); wr_t wx; int i; MEMOP_IDX(DF_BYTE) for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { #if !defined(CONFIG_USER_ONLY) wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, GETPC()); #else wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE)); #endif } memcpy(pwd, &wx, sizeof(wr_t)); } After unrolling: void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) { wr_t *pwd = &(env->active_fpu.fpr[wd].wr); wr_t wx; int i; MEMOP_IDX(DF_BYTE) #if !defined(CONFIG_USER_ONLY) wx.b[0] = helper_ret_ldub_mmu(env, addr, oi, GETPC()); wx.b[1] = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC()); wx.b[2] = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC()); wx.b[3] = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC()); wx.b[4] = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC()); wx.b[5] = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC()); wx.b[6] = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC()); wx.b[7] = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC()); wx.b[8] = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC()); wx.b[9] = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC()); wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC()); wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC()); wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC()); wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC()); wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC()); wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC()); #else wx.b[0] = cpu_ldub_data(env, addr); wx.b[1] = cpu_ldub_data(env, addr + 1); wx.b[2] = cpu_ldub_data(env, addr + 2); wx.b[3] = cpu_ldub_data(env, addr + 3); wx.b[4] = cpu_ldub_data(env, addr + 4); wx.b[5] = cpu_ldub_data(env, addr + 5); wx.b[6] = cpu_ldub_data(env, addr + 6); wx.b[7] = cpu_ldub_data(env, addr + 7); wx.b[8] = cpu_ldub_data(env, addr + 8); wx.b[9] = cpu_ldub_data(env, addr + 9); wx.b[10] = cpu_ldub_data(env, addr + 10); wx.b[11] = cpu_ldub_data(env, addr + 11); wx.b[12] = cpu_ldub_data(env, addr + 12); wx.b[13] = cpu_ldub_data(env, addr + 13); wx.b[14] = cpu_ldub_data(env, addr + 14); wx.b[15] = cpu_ldub_data(env, addr + 15); #endif } memcpy(pwd, &wx, sizeof(wr_t)); } And then apply big-endian-related changes. I think that would be more in the spirit of other MSA work. Thanks, Aleksandar