On 08/07/2016 11:06 PM, Nikunj A Dadhania wrote:
+#define LXV(name, access, swap, type, elems) \
+uint64_t helper_##name(CPUPPCState *env, \
+                       target_ulong addr) \
+{ \
+    type r[elems] = {0}; \
+    int i, index, bound, step; \
+    if (msr_le) { \
+        index = elems - 1; \
+        bound = -1; \
+        step = -1; \
+    } else { \
+        index = 0; \
+        bound = elems; \
+        step = 1; \
+    } \
+ \
+    for (i = index; i != bound; i += step) { \
+        if (needs_byteswap(env)) { \
+            r[i] = swap(access(env, addr, GETPC())); \
+        } else { \
+            r[i] = access(env, addr, GETPC()); \
+        } \
+        addr = addr_add(env, addr, sizeof(type)); \
+    } \
+    return *((uint64_t *)r); \
+}
This looks more complicated than necessary.
(1) In big-endian mode, surely this simplifies to two 64-bit big-endian loads.
(2) In little-endian mode, the overhead of accessing memory surely dominates,
and therefore we should perform two 64-bit loads and manipulate the data after.
AFAICS, this is easiest done by requesting two 64-bit *big-endian* loads, and
then swapping the bytes within each element. E.g.
uint64_t helper_bswap16x4(uint64_t x)
{
    uint64_t m = 0x00ff00ff00ff00ffull;
    return ((x & m) << 8) | ((x >> 8) & m);
}
uint64_t helper_bswap32x2(uint64_t x)
{
    /* Swap bytes within each 32-bit half; the halves stay in place. */
    return deposit64(bswap32(x), 32, 32, bswap32(x >> 32));
}
    tcg_gen_qemu_ld_i64(dest, addr, MO_BEQ, s->mem_index);
    if (ctx->le_mode) {
        gen_helper_bswap16x4(dest, dest);
    }
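
To sanity-check the claim that one big-endian 64-bit load plus a lane-wise
swap matches the element-by-element loop, here is a minimal standalone sketch
in plain C. It uses no QEMU API: load_be64, load16, load32, pack16, pack32 and
lanes_match are hypothetical names for illustration only, and it assumes that
element 0 ends up in the most significant lane of the returned doubleword.

```c
#include <assert.h>   /* only needed for the usage shown after the block */
#include <stdint.h>

/* Hypothetical stand-ins for guest memory accessors; not QEMU functions. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint16_t load16(const uint8_t *p, int le)
{
    return le ? (uint16_t)(p[0] | (p[1] << 8))
              : (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t load32(const uint8_t *p, int le)
{
    uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    return le ? (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
              : ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
}

/* Swap the two bytes inside each of the four 16-bit lanes. */
static uint64_t bswap16x4(uint64_t x)
{
    uint64_t m = 0x00ff00ff00ff00ffull;
    return ((x & m) << 8) | ((x >> 8) & m);
}

/* Swap the four bytes inside each of the two 32-bit lanes. */
static uint64_t bswap32x2(uint64_t x)
{
    uint64_t m = 0x0000ffff0000ffffull;
    x = bswap16x4(x);
    return ((x & m) << 16) | ((x >> 16) & m);
}

/* Reference: load element by element, element 0 in the top lane (assumed). */
static uint64_t pack16(const uint8_t *p, int le)
{
    uint64_t v = 0;
    for (int i = 0; i < 4; i++) {
        v = (v << 16) | load16(p + 2 * i, le);
    }
    return v;
}

static uint64_t pack32(const uint8_t *p, int le)
{
    uint64_t v = 0;
    for (int i = 0; i < 2; i++) {
        v = (v << 32) | load32(p + 4 * i, le);
    }
    return v;
}

/* Returns 1 if the single-load approach agrees with the per-element loop. */
static int lanes_match(void)
{
    static const uint8_t mem[8] = {
        0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef
    };
    uint64_t x = load_be64(mem);

    return pack16(mem, 0) == x                /* BE mode: no fixup needed */
        && pack32(mem, 0) == x
        && pack16(mem, 1) == bswap16x4(x)     /* LE mode: swap per lane */
        && pack32(mem, 1) == bswap32x2(x);
}
```

Calling assert(lanes_match()); from a test main() should pass if the reasoning
above holds. Here bswap32x2 is built from two mask-and-shift steps rather than
deposit64/bswap32, purely so the sketch compiles outside the QEMU tree.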
r~