[Qemu-devel] [RFC] Streamlining endian handling in TCG

Richard Henderson Wed, 28 Aug 2013 08:29:28 -0700

On 08/28/2013 07:34 AM, Peter Maydell wrote:
> On 28 August 2013 15:31, Richard Henderson <r...@twiddle.net> wrote:
>> On 08/28/2013 01:15 AM, Peter Maydell wrote:
>>> [*] not impossible, we already do something on the ppc
>>> that's similar; however I'd really want to take the time to
>>> figure out how to do endianness swapping "properly"
>>> and what qemu does currently before messing with it.
>>
>> I've got a loose plan in my head for how to clean up handling of
>> reverse-endian load/store instructions at both the translator and
>> tcg backend levels.
> 
> Nice. Will it allow us to get rid of TARGET_WORDS_BIGENDIAN?


I don't know, as I don't know off-hand what all that implies.

Let me lay out my idea and see what you think:

Currently, at the TCG level we have 8 qemu_ld* opcodes, and 4 qemu_st* opcodes,
that always produce target_ulong sized results, and always in the guest
declared endianness.

There are several problems I want to address:

(1) I want explicit _i32 and _i64 sizes for the loads and stores.  This will
clean up a number of places in several translators where we have to load to _tl
and then truncate or extend to an explicit size.

(2) I want explicit endianness for the loads and stores.  E.g. when a sparc
guest does a byte-swapped store, there's little point in doing two offsetting
bswaps to make that happen.

(3) For hosts that do not support byte-swapped loads and stores themselves, the
need to allocate extra registers during the memory operation in order to hold
the swapped results is an unnecessary burden.  Better to expose the bswap
operation at the tcg opcode level and let normal register allocation happen.
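
To make problem (2) concrete, here is a minimal sketch, not QEMU code.  It
assumes a big-endian guest on a little-endian host (the host store is modelled
by writing bytes explicitly), and every function name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Model of a little-endian host store: least significant byte first. */
static void host_store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Today (assumed flow): the translator swaps to get a reverse-endian
   store, then the memory op swaps again because the guest is declared
   big-endian. */
static void guest_swapped_store_today(uint8_t *p, uint32_t v)
{
    uint32_t t = bswap32(v);        /* translator-level bswap */
    uint32_t u = bswap32(t);        /* bswap inside the BE-guest store */
    assert(u == v);                 /* the two swaps cancel out */
    host_store_le32(p, u);
}

/* With explicit endianness: request a little-endian store directly. */
static void guest_swapped_store_explicit(uint8_t *p, uint32_t v)
{
    host_store_le32(p, v);
}

/* Returns 1 if both variants leave the same bytes in memory. */
static int same_bytes(uint32_t v)
{
    uint8_t a[4], b[4];
    guest_swapped_store_today(a, v);
    guest_swapped_store_explicit(b, v);
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}
```

Both variants leave identical bytes in memory; the explicit form simply never
emits the two swaps.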

Now, naively implementing 1 and 2 would result in 32 opcodes for qemu_ld* (the
current 8, times two result sizes, times two endiannesses).  That is obviously
a non-starter.  However, the very first thing that each tcg backend does is map
the current 8 opcodes into a bitmask ("opc" and "s_bits" in the source).  Let
us make that official, and then extend it.

Therefore:

(A) Compress qemu_ld* into two qemu_ld_{i32,i64}, with an additional constant
argument that describes the actual load, exactly as "opc" does today.
Adjusting the translators to match can be done in stages, or we might decide to
leave the existing translator-level interface in place permanently.

(B) Add an additional bit to the "opc" to indicate which endianness is desired.
E.g. 0 = LE, 8 = BE.  Expose the opc interface to the translators.  At that
point, generating a load becomes more like

    tcg_gen_qemu_ld_tl(dest, addr, size | sign | dc->big_endian);

and the current endianness of the guest becomes a bit on the TB, to be copied
into the DisasContext at the beginning of translation.
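
As a rough sketch of what such an extended "opc" could look like: the names
below are hypothetical, and the layout (log2 of the size in bits 0-1, sign in
bit 2, endianness in bit 3) is an assumption based on the description above,
not a settled interface.

```c
#include <assert.h>

/* Hypothetical names.  Assumed layout:
   bits 0-1 = log2 of the access size in bytes,
   bit 2    = sign-extend the loaded value,
   bit 3    = proposed endianness flag (0 = LE, 8 = BE). */
enum {
    OPC_SIZE_MASK = 3,
    OPC_SIGN      = 4,
    OPC_BE        = 8
};

/* Build an opc value, in the spirit of "size | sign | dc->big_endian". */
static unsigned opc_make(unsigned log2_size, int is_signed, int big_endian)
{
    assert(log2_size <= OPC_SIZE_MASK);
    return log2_size | (is_signed ? OPC_SIGN : 0) | (big_endian ? OPC_BE : 0);
}

/* Access size in bytes: 1, 2, 4 or 8. */
static unsigned opc_size_bytes(unsigned opc)
{
    return 1u << (opc & OPC_SIZE_MASK);
}

static int opc_is_signed(unsigned opc)
{
    return (opc & OPC_SIGN) != 0;
}

static int opc_is_big_endian(unsigned opc)
{
    return (opc & OPC_BE) != 0;
}
```

With this layout a signed 16-bit big-endian load would be opc_make(1, 1, 1),
and a backend can still recover size, sign and endianness with simple masks.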

(C) Examine the endian bit in the tcg-op.h expander, and check a
TCG_TARGET_HAS_foo flag to see if the tcg backend supports reverse endian
memory ops.  If not, break out the bswap into the opcode stream as a temporary.
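
The decision the expander would make can be sketched as below.  This is a toy
model only: the Lowering enum, choose_lowering() and its
backend_has_reversed_memop parameter (standing in for the TCG_TARGET_HAS_foo
check) are all hypothetical, and the opc layout is the assumed one with size in
bits 0-1 and BE = 8.

```c
#include <assert.h>

#define OPC_SIZE_MASK 3u  /* assumed: log2 of the access size */
#define OPC_BE        8u  /* proposed endianness bit: 0 = LE, 8 = BE */

typedef enum {
    LOWER_PLAIN,            /* host-order memory op, no swap needed */
    LOWER_REVERSED_MEMOP,   /* backend emits a byte-reversed memory op */
    LOWER_PLAIN_PLUS_BSWAP  /* host-order memory op plus a separate bswap */
} Lowering;

/* Decide how a memory op with the given opc should be expanded. */
static Lowering choose_lowering(unsigned opc, int host_is_be,
                                int backend_has_reversed_memop)
{
    int want_be = (opc & OPC_BE) != 0;

    assert((opc & ~0xfu) == 0);     /* only the four assumed bits are used */

    /* Single-byte accesses have no byte order to reverse. */
    if ((opc & OPC_SIZE_MASK) == 0) {
        return LOWER_PLAIN;
    }
    /* Requested order already matches the host. */
    if (want_be == (host_is_be != 0)) {
        return LOWER_PLAIN;
    }
    /* Backend can do the reversal as part of the memory op. */
    if (backend_has_reversed_memop) {
        return LOWER_REVERSED_MEMOP;
    }
    /* Otherwise break the bswap out into the opcode stream. */
    return LOWER_PLAIN_PLUS_BSWAP;
}
```

Only the last case puts a visible bswap into the opcode stream, where normal
register allocation can deal with the temporary.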

The corollary here is that we must have a full set of bi-endian tcg helper
functions.  At the moment, the helper functions are all keyed to the hard-coded
guest endianness.  That means the typical LE/BE host/guest memory op looks like

        if (tlb hit) {
            t = bswap(data);
            store t;
        } else {
            helper_store_be(data);
        }

If we hoist the bswap it'll need to be

        t = bswap(data);
        if (tlb hit) {
            store t;
        } else {
            helper_store_le(t);
        }
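
The two shapes above can be checked against each other with a small
stand-alone model.  Everything here is a sketch under assumptions: the host is
modelled as little-endian by writing bytes explicitly, guest_ram is a toy
buffer, and the helper functions are hypothetical stand-ins for the bi-endian
helper set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t guest_ram[8];    /* toy guest memory */

static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Fast path on the modelled little-endian host: plain host-order store. */
static void host_store32(size_t off, uint32_t v)
{
    assert(off + 4 <= sizeof(guest_ram));
    guest_ram[off]     = (uint8_t)v;
    guest_ram[off + 1] = (uint8_t)(v >> 8);
    guest_ram[off + 2] = (uint8_t)(v >> 16);
    guest_ram[off + 3] = (uint8_t)(v >> 24);
}

/* Slow-path helpers: one per endianness. */
static void helper_store_le(size_t off, uint32_t v)
{
    host_store32(off, v);
}

static void helper_store_be(size_t off, uint32_t v)
{
    host_store32(off, bswap32(v));
}

/* Current shape: swap on the fast path, BE helper on the slow path. */
static void store_be_unhoisted(size_t off, uint32_t data, int tlb_hit)
{
    if (tlb_hit) {
        uint32_t t = bswap32(data);
        host_store32(off, t);
    } else {
        helper_store_be(off, data);
    }
}

/* Hoisted shape: swap once up front, LE helper on the slow path. */
static void store_be_hoisted(size_t off, uint32_t data, int tlb_hit)
{
    uint32_t t = bswap32(data);
    if (tlb_hit) {
        host_store32(off, t);
    } else {
        helper_store_le(off, t);
    }
}

/* Read guest memory back as a big-endian value, for checking. */
static uint32_t read_be32(size_t off)
{
    return ((uint32_t)guest_ram[off] << 24) |
           ((uint32_t)guest_ram[off + 1] << 16) |
           ((uint32_t)guest_ram[off + 2] << 8) |
           (uint32_t)guest_ram[off + 3];
}
```

In this model both shapes leave the same big-endian bytes in memory on either
path, which is why the hoisted form needs the opposite-endian helper.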

(D) Profit!  I'm not sure what will be left of TARGET_WORDS_BIGENDIAN at this
point.  Possibly only if we leave the current translator interface in place in
step A.



r~
