> From: Mateja Marjanovic <mateja.marjano...@rt-rk.com>
> Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA 
> instructions for MIPS big endian host
> On 25.3.19. 22:21, Aleksandar Markovic wrote:
>> From: Mateja Marjanovic <mateja.marjano...@rt-rk.com>
>> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions 
>> for MIPS big endian host
>> Please split this patch into one for load instructions and another for
>> store instructions.
>
> I will do that.
>
>> I don't think the variable "big_endian" is needed, you should better resolve
>> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)",
>> like you did in other patches. It is important to be consistent.
>
> I think you can't have a preprocessing statement inside a macro,
> so I did the thing that seemed most similar to that.
>

There should be some equivalent way of doing that involving an argument
to the macro, however, in the light of your recent work on MSA optimization,
let's split/demacro and unroll loops in load and store helpers.

After splitting helper for LD.B should look something like this:
(the code is just for demonstration purposes, doublecheck it)

void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
{
    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
    wr_t wx;
    int i;

    MEMOP_IDX(DF_BYTE)
    for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
#if !defined(CONFIG_USER_ONLY)
        wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, GETPC());
#else
        wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE));
#endif
    }
    memcpy(pwd, &wx, sizeof(wr_t));
}

After unrolling:

void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr)
{
    wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
    wr_t wx;
    int i;

    MEMOP_IDX(DF_BYTE)

#if !defined(CONFIG_USER_ONLY)
    wx.b[0]  = helper_ret_ldub_mmu(env, addr, oi, GETPC());
    wx.b[1]  = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC());
    wx.b[2]  = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC());
    wx.b[3]  = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC());
    wx.b[4]  = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC());
    wx.b[5]  = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC());
    wx.b[6]  = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC());
    wx.b[7]  = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC());
    wx.b[8]  = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC());
    wx.b[9]  = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC());
    wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC());
    wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC());
    wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC());
    wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC());
    wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC());
    wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC());
#else
    wx.b[0]  = cpu_ldub_data(env, addr);
    wx.b[1]  = cpu_ldub_data(env, addr + 1);
    wx.b[2]  = cpu_ldub_data(env, addr + 2);
    wx.b[3]  = cpu_ldub_data(env, addr + 3);
    wx.b[4]  = cpu_ldub_data(env, addr + 4);
    wx.b[5]  = cpu_ldub_data(env, addr + 5);
    wx.b[6]  = cpu_ldub_data(env, addr + 6);
    wx.b[7]  = cpu_ldub_data(env, addr + 7);
    wx.b[8]  = cpu_ldub_data(env, addr + 8);
    wx.b[9]  = cpu_ldub_data(env, addr + 9);
    wx.b[10] = cpu_ldub_data(env, addr + 10);
    wx.b[11] = cpu_ldub_data(env, addr + 11);
    wx.b[12] = cpu_ldub_data(env, addr + 12);
    wx.b[13] = cpu_ldub_data(env, addr + 13);
    wx.b[14] = cpu_ldub_data(env, addr + 14);
    wx.b[15] = cpu_ldub_data(env, addr + 15);
#endif
    }
    memcpy(pwd, &wx, sizeof(wr_t));
}


And then apply big-endian-related changes.

I think that would be more in the spirit of other MSA work.

Thanks,
Aleksandar




Reply via email to