On Sat, 2013-07-13 at 00:10 +0100, Peter Maydell wrote:

> The block marked "byteswap here" does "byte invariant bigendian",
> so byte accesses are unchanged, 16 bit accesses have the two words
> of data flipped, and 32 bit accesses have the four bytes flipped;
> this happens as the data passes through; addresses are unchanged.
> It only happens if the CPU is configured by the guest to operate
> in big-endian mode, obviously.
> (Contrast 'word invariant bigendian', which is what ARM used to do,
> where the addresses are changed but the data is not. That would be
> pretty painful to implement in the memory region API though it is
> of course trivial in hardware since it is just XORing of the low
> address bits according to the access size...)

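To make the two schemes quoted above concrete, here is a rough sketch in C. It is not QEMU code: the function names are made up for illustration, and it assumes a 32-bit bus with access sizes of 1, 2 or 4 bytes.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not QEMU API.
 * 'size' is the access width in bytes and is assumed to be 1, 2 or 4.
 */

/* Byte invariant: the address is untouched, the bytes of the data are
 * reversed within the access as it passes through. */
static uint32_t byte_invariant_data(uint32_t data, unsigned size)
{
    switch (size) {
    case 2:
        return ((data & 0xffu) << 8) | ((data >> 8) & 0xffu);
    case 4:
        return ((data & 0x000000ffu) << 24) |
               ((data & 0x0000ff00u) << 8)  |
               ((data & 0x00ff0000u) >> 8)  |
               ((data >> 24) & 0xffu);
    default:
        return data;    /* byte accesses are unchanged */
    }
}

/* Word invariant: the data is untouched, the low address bits are XORed
 * according to the access size (3 for bytes, 2 for halfwords, 0 for words). */
static uint32_t word_invariant_addr(uint32_t addr, unsigned size)
{
    return addr ^ (4u - size);
}
```

With these, a 16-bit access carrying 0x1234 at address 0x1000 becomes 0x3412 at 0x1000 in the byte invariant scheme, and stays 0x1234 but moves to address 0x1002 in the word invariant one.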
Which means that ARM switches the byte order of its bus when switching
endian, which is very very wrong... oh well. PowerPCs operate in both
endians and nowadays do not require any adjustment at the bus level.

It's simple: the definition that matters, and the only definition that
matters for data, is which byte is at what address, and that shouldn't
change. For addresses, they should be in a fixed significance order, not
change order based on the "endianness" of the processor.

Or to say things differently, the byte order of data should be fixed, and
the "endianness" of the bus (which in that case is purely the byte order
of addresses) should be fixed as well.

ARM seems to have chosen to go from fixing the byte order of data &
changing the bus endianness, to changing the byte order of data to
preserve the bus endianness, or something along those lines. Both
approaches are WRONG.

Anyway, whatever, what is done is done. The point is, what you have is a
byte lane swapper, similar to what older ppc did, which indeed is
needed if your bus flips around.

I would still not model that using the term "endianness" of either the
bus or the bridge. It's again purely a statement of what byte lane coming
out corresponds to what *address*, regardless of byte significance.

> > Again, the only endian
> > attribute that exists are the byte order of the original access (which
> > byte has the lowest address, regardless of significance of those bytes
> > in the target, ie, purely from a qemu standpoint, in the variable that
> > carries the access around inside qemu, which byte has the lowest
> > address)
>
> What does this even mean? At the point where a memory access leaves
> the CPU (emulation or real hardware) it has (a) an address and
> (b) a width -- a 16 bit access is neither big nor little endian,

Not from the point of view of the bus indeed. The value inside might or
might not have an endianness but that's purely somebody else's business.

> it's just a request for 16 bits of data (on real hardware it's
> typically a bus transaction on a bunch of data lines with some
> control lines indicating transaction width). Now the CPU emulation
> may internally be intending to put that data into its emulated
> register one way round or the other, but that's an internal detail
> of the CPU emulation. (Similarly for stores.)

My statement above meant (sorry if it wasn't clear) that what matters is
that when qemu carries that request around (thus carries those 16 bits in
a u16 variable inside qemu itself, ie, the argument of some load/store
callback), the only attribute of interest is which of the bytes in that
u16 variable is the first in ascending address order.

I.e., this is a property of the processor bus: it is *defined*, as part of
the bus definition, which of the 2 bytes that form that 16 bit wide
"access" is supposed to go at what address (though ARM seems to flip it
around).

Obviously, it should be done according to the host endianness, and thus
one would expect that qemu just "storing" that in memory results in the
two bytes being laid out in the right order.

My point here is that if any conversion at that level is needed, it's
purely somewhere in TCG, to ensure that what's in the variable carried
out of TCG is represented according to the intended byte order of the
original access.

That's why the word "endianness" is so confusing. At this point, if
anything, the only endianness that exists is the one of the host CPU :-)

Now when we cross your byte-lane adjusting bridge in qemu, two things
can happen.

 - The bridge is configured properly and it's a nop

 - The bridge is *not* configured properly, and you basically need to
   perform a lane swap on anything crossing that bridge.

I wouldn't call that "endian". I wouldn't model that by saying that some
range of addresses is "LE" or "BE" or "Host Endian" or whatever ...

The best way to represent that in qemu would be to have some kind of
"lane swap" attribute which is set based on the (mis)match of the CPU
and bridge configuration.

> >, and the same on the target device (at which point a concept of
> > significance does apply, but it's a guest driver business to get it
> > right, qemu just need to make sure byte 0 goes to byte 0).
>
> Similarly, at the target device end there is no concept
> of a "big endian access" -- we make a request for 16
> bits of data at a particular address (via the MemoryRegion
> API) and the device returns 16 bits of data.
> It's entirely
> possible to design hardware so that byte access to address
> X, halfword access to address X and word access to address
> X all return entirely different data (though it would be
> a bit perverse.) (As an implementation convenience we may
> choose to provide helper infrastructure so you don't have
> to actually implement all of byte/halfword/word access by hand.)

In fact, I think some x86 IO devices did that perverse thing in the
past. It's not actually always possible, though. On modern busses it
tends not to be. Often, busses do *not* have low address bits below
their width but use byte enables instead. Now of course the device could
be perverse enough to try to use those to "deduce" the address and still
return different values, but I doubt anybody does that :-)

However this is academic. The point is that the bus the device is on has
a definition of what byte is expected first in ascending address order,
which should match the way qemu carries the data.

Endianness at the device level thus is purely about the interpretation
of those data at a register level, because qemu is no HW.

> > If a bridge flips things around in a way that breaks the
> > model,
>
> That breaks what model?

Byte order invariance, which is what we care about. If the bridge
preserves that, then it's a nop for qemu.

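As a very rough sketch of what I mean by a "lane swap" attribute (plain C, every name here is hypothetical, none of it is existing qemu API): the bridge only records whether its lane configuration matches the one of the CPU bus, and a value crossing it is either passed through untouched or has its byte lanes reversed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bridge state, for illustration only. */
typedef struct {
    bool cpu_lanes_flipped;     /* CPU bus currently uses the flipped lane order */
    bool bridge_lanes_flipped;  /* bridge is configured for the flipped lane order */
} LaneBridge;

/* The attribute: set only when CPU and bridge configuration mismatch. */
static bool lane_swap_needed(const LaneBridge *b)
{
    return b->cpu_lanes_flipped != b->bridge_lanes_flipped;
}

/* Reverse the low 'size' bytes of val (size is 1, 2, 4 or 8). */
static uint64_t lane_swap(uint64_t val, unsigned size)
{
    uint64_t out = 0;

    for (unsigned i = 0; i < size; i++) {
        out = (out << 8) | (val & 0xff);
        val >>= 8;
    }
    return out;
}

/* Crossing the bridge: a nop when both sides agree, a lane swap otherwise. */
static uint64_t bridge_cross(const LaneBridge *b, uint64_t val, unsigned size)
{
    return lane_swap_needed(b) ? lane_swap(val, size) : val;
}
```

Note there is no "LE" or "BE" anywhere in there: the only thing being described is which byte lane ends up at which address.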
> > then add some property describing the flipping
> > properties but don't call it "big
> > endian" or "little endian" at the bridge level, that has no meaning,
> > confuses things and introduces breakage like we have seen.
>
> I'm happy to call the property "byteswap", yes, because
> that's what it does. If you did two of these in a row you'd
> get a no-op.

Right, but the "byteswap" configuration of the bridge you mentioned is
based on the configuration of the processor bus byte order, and if those
are in sync, the bridge should essentially be a nop for all intents and
purposes.

If byte order is preserved, then the "value" qemu carries around can go
unmodified, all the way.

> >> (Our other serious endianness problem is that we don't really
> >> do very well at supporting a TCG CPU arbitrarily flipping
> >> endianness -- TARGET_WORDS_BIGENDIAN is a compile time setting
> >> and ideally it should not be.)
> >
> > Our experience is that it actually works fine for almost everything
> > except virtio :-) ie mostly TARGET_WORDS_BIGENDIAN is irrelevant (and
> > should be).
>
> I agree that TARGET_WORDS_BIGENDIAN *should* go away, but
> it exists currently. Do you actually implement a CPU which
> does dynamic endianness flipping? Is it at all efficient
> in the config which is the opposite of whatever
> TARGET_WORDS_BIGENDIAN says?

Yes. I haven't measured the speed (in fact I haven't looked at the code
either, others have, but I know it works).

Cheers,
Ben.