On 08/28/2013 07:34 AM, Peter Maydell wrote:
> On 28 August 2013 15:31, Richard Henderson <r...@twiddle.net> wrote:
>> On 08/28/2013 01:15 AM, Peter Maydell wrote:
>>> [*] not impossible, we already do something on the ppc
>>> that's similar; however I'd really want to take the time to
>>> figure out how to do endianness swapping "properly"
>>> and what qemu does currently before messing with it.
>>
>> I've got a loose plan in my head for how to clean up handling of
>> reverse-endian load/store instructions at both the translator and
>> tcg backend levels.
>
> Nice. Will it allow us to get rid of TARGET_WORDS_BIGENDIAN?

I don't know, as I don't know off-hand what all that implies.

Let me lay out my idea and see what you think:

Currently, at the TCG level we have 8 qemu_ld* opcodes, and 4 qemu_st*
opcodes, that always produce target_ulong sized results, and always in
the guest declared endianness.

There are several problems I want to address:

(1) I want explicit _i32 and _i64 sizes for the loads and stores. This
    will clean up a number of places in several translators where we
    have to load to _tl and then truncate or extend to an explicit size.

(2) I want explicit endianness for the loads and stores. E.g. when a
    sparc guest does a byte-swapped store, there's little point in doing
    two offsetting bswaps to make that happen.

(3) For hosts that do not support byte-swapped loads and stores
    themselves, the need to allocate extra registers during the memory
    operation in order to hold the swapped results is an unnecessary
    burden. Better to expose the bswap operation at the tcg opcode level
    and let normal register allocation happen.

Now, naively implementing 1 and 2 would result in 32 opcodes for
qemu_ld*. That is obviously a non-starter. However, the very first
thing that each tcg backend does is map the current 8 opcodes into a
bitmask ("opc" and "s_bits" in the source). Let us make that official,
and then extend it.

Therefore:

(A) Compress qemu_ld* into two qemu_ld_{i32,i64}, with an additional
    constant argument that describes the actual load, exactly as "opc"
    does today. Adjusting the translators to match can be done in
    stages, or we might decide to leave the existing translator-level
    interface in place permanently.

(B) Add an additional bit to the "opc" to indicate which endianness is
    desired. E.g. 0 = LE, 8 = BE. Expose the opc interface to the
    translators.

    At which point generating a load becomes more like

        tcg_gen_qemu_ld_tl(dest, addr, size | sign | dc->big_endian);

    and the current endianness of the guest becomes a bit on the TB, to
    be copied into the DisasContext at the beginning of translation.

(C) Examine the endian bit in the tcg-op.h expander, and check a
    TCG_TARGET_HAS_foo flag to see if the tcg backend supports reverse
    endian memory ops. If not, break out the bswap into the opcode
    stream as a temporary.

    The corollary here is that we must have a full set of bi-endian tcg
    helper functions. At the moment, the helper functions are all keyed
    to the hard-coded guest endianness. That means the typical LE/BE
    host/guest memory op looks like

        if (tlb hit) {
            t = bswap(data);
            store t;
        } else {
            helper_store_be(data);
        }

    If we hoist the bswap it'll need to be

        t = bswap(data);
        if (tlb hit) {
            store t;
        } else {
            helper_store_le(t);
        }

(D) Profit!

I'm not sure what will be left of TARGET_WORDS_BIGENDIAN at this point.
Possibly only if we leave the current translator interface in place in
step A.


r~
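
The opc encoding from (A) and (B) and the hoisted bswap from (C) can be
sketched in plain C as below. This is an illustration under stated
assumptions, not QEMU code: the OPC_* names, sketch_ld(), sketch_st() and
the host_* functions are hypothetical, the low two bits are assumed to hold
log2 of the access size and bit 2 the sign flag (mirroring how "s_bits" is
derived today), and a flat byte array stands in for both the TLB fast path
and the helper call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "opc" layout; names are illustrative, not QEMU's. */
enum {
    OPC_SIZE_MASK = 3,  /* log2 of the access size in bytes */
    OPC_SIGN      = 4,  /* sign-extend the loaded value */
    OPC_BE        = 8   /* 0 = LE, 8 = BE, as proposed in (B) */
};

/* Detect the host byte order at run time. */
static int host_is_be(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Reverse the low `bytes` bytes of v; any higher bytes are dropped. */
static uint64_t bswap_n(uint64_t v, unsigned bytes)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < bytes; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Stand-in for "store t" / the helper: a plain host-endian store. */
static void host_store(uint8_t *mem, uint64_t v, unsigned bytes)
{
    uint8_t  b = (uint8_t)v;
    uint16_t h = (uint16_t)v;
    uint32_t w = (uint32_t)v;
    switch (bytes) {
    case 1: memcpy(mem, &b, 1); break;
    case 2: memcpy(mem, &h, 2); break;
    case 4: memcpy(mem, &w, 4); break;
    default: assert(bytes == 8); memcpy(mem, &v, 8); break;
    }
}

/* Host-endian, zero-extending load of 1, 2, 4 or 8 bytes. */
static uint64_t host_load(const uint8_t *mem, unsigned bytes)
{
    uint8_t b; uint16_t h; uint32_t w; uint64_t d;
    switch (bytes) {
    case 1: memcpy(&b, mem, 1); return b;
    case 2: memcpy(&h, mem, 2); return h;
    case 4: memcpy(&w, mem, 4); return w;
    default: assert(bytes == 8); memcpy(&d, mem, 8); return d;
    }
}

/* Store with the bswap hoisted ahead of the memory operation, as in (C). */
static void sketch_st(uint8_t *mem, uint64_t val, int opc)
{
    unsigned bytes = 1u << (opc & OPC_SIZE_MASK);
    int want_be = (opc & OPC_BE) != 0;
    if (want_be != host_is_be()) {
        val = bswap_n(val, bytes);  /* t = bswap(data) */
    }
    host_store(mem, val, bytes);    /* the same store on either path */
}

/* Load: host-endian access, then swap and sign-extend as opc requests. */
static uint64_t sketch_ld(const uint8_t *mem, int opc)
{
    unsigned bytes = 1u << (opc & OPC_SIZE_MASK);
    int want_be = (opc & OPC_BE) != 0;
    uint64_t val = host_load(mem, bytes);
    if (want_be != host_is_be()) {
        val = bswap_n(val, bytes);
    }
    if ((opc & OPC_SIGN) && bytes < 8) {
        uint64_t sign = (uint64_t)1 << (bytes * 8 - 1);
        val = (val ^ sign) - sign;  /* sign-extend from the top bit */
    }
    return val;
}
```

With this layout a 32-bit big-endian store is sketch_st(mem, val, 2 | OPC_BE)
and a signed 16-bit little-endian load is sketch_ld(mem, 1 | OPC_SIGN). The
point of the sketch is that once the swap sits outside the memory access,
the access itself is always host-endian, so the fast path and the helper can
share one code path.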