On 12/16/14 14:48, Andrew Jones wrote:
> On Fri, Dec 12, 2014 at 04:58:45PM +0100, Laszlo Ersek wrote:
>> Make it clear that the maximum access size to the MMIO data register
>> determines the full size of the memory region.
>>
>> Currently the max access size is 1. Ensure that if a larger size were
>> used in "fw_cfg_data_mem_ops.valid.max_access_size", the memory
>> subsystem would split the access to byte-sized accesses internally,
>> in increasing address order.
>>
>> fw_cfg_data_mem_read() and fw_cfg_data_mem_write() don't care about
>> "address" or "size"; they just call the sequential fw_cfg_read() and
>> fw_cfg_write() functions, correspondingly. Therefore the automatic
>> splitting is just right. (The endianness of "fw_cfg_data_mem_ops" is
>> native.)
>
> This 'is native' caught my eye. Laszlo's
> Documentation/devicetree/bindings/arm/fw-cfg.txt patch states that the
> selector register is LE and
>
> "
> The data register allows accesses with 8, 16, 32 and 64-bit width.
> Accesses larger than a byte are interpreted as arrays, bundled
> together only for better performance. The bytes constituting such a
> word, in increasing address order, correspond to the bytes that would
> have been transferred by byte-wide accesses in chronological order.
> "
>
> I chatted with Laszlo to make sure the host-is-BE case was considered.
> It looks like there may be an issue there that Laszlo promised to look
> into. Laszlo had already noticed that the selector was defined as
> native in qemu as well, but should be LE. Now that we support > 1 byte
> reads and writes from the data port, then maybe we should look into
> changing that as well.
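(For context, here is roughly the shape of the two callbacks named in
the quoted commit message -- a from-memory paraphrase, not a verbatim
quote of hw/nvram/fw_cfg.c. The point is that they ignore "addr" and
"size" completely and defer to the sequential, byte-wide fw_cfg
accessors, which is what makes the byte-splitting described above
correct:

static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
                                     unsigned size)
{
    FWCfgState *s = opaque;

    /* one byte from the current fw_cfg entry; the read cursor
     * advances internally */
    return fw_cfg_read(s);
}

static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
                                  uint64_t value, unsigned size)
{
    FWCfgState *s = opaque;

    /* low byte only; wider writes rely on the memory core's
     * byte-splitting */
    fw_cfg_write(s, (uint8_t)value);
}
)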
Yes. The root of this question is what each of

enum device_endian {
    DEVICE_NATIVE_ENDIAN,
    DEVICE_BIG_ENDIAN,
    DEVICE_LITTLE_ENDIAN,
};

means.

Consider the following call tree, which implements the splitting /
combining of an MMIO read:

memory_region_dispatch_read() [memory.c]
  memory_region_dispatch_read1()
    access_with_adjusted_size()
      memory_region_big_endian()
      for each byte in the wide access:
        memory_region_read_accessor()
          fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
            fw_cfg_read()
  adjust_endianness()
    memory_region_wrong_endianness()
    bswapXX()

The function access_with_adjusted_size() always iterates over the MMIO
address range in incrementing address order. However, it can calculate
the shift count for memory_region_read_accessor() in two ways. When
memory_region_big_endian() returns "true", the shift count decreases
as the MMIO address increases. When memory_region_big_endian() returns
"false", the shift count increases as the MMIO address increases.

In memory_region_read_accessor(), the shift count is used for a
logical (ie. numeric) bitwise left shift (<<). That operator works
with numeric values and hides (ie. accounts for) host endianness.

Let's consider
- an array of two bytes, [0xaa, 0xbb],
- a uint16_t access made from the guest,
- and all twelve possible cases.

In the table below, the "host", "guest" and "device_endian" columns
are input variables. The columns to the right are calculated / derived
values, with the following data dependencies:

- memory_region_big_endian() is determined by "guest" and
  "device_endian",
- "host value" is determined by memory_region_big_endian(),
- "host repr." is determined by "host value" and "host",
- memory_region_wrong_endianness() is determined by "guest" and
  "device_endian",
- "guest repr." is determined by "host repr." and
  memory_region_wrong_endianness(),
- "guest value" is determined by "guest repr." and "guest".

After memory_region_dispatch_read1() constructs the value that is
visible in the "host value" column, it is passed to
adjust_endianness(). If memory_region_wrong_endianness() returns
"true", then the host value is byte-swapped. The result ("guest
repr.") is then passed to the guest.

 #  host  guest  device_endian  memory_region_big_endian()  host value  host repr.    memory_region_wrong_endianness()  guest repr.   guest value
--  ----  -----  -------------  --------------------------  ----------  ------------  --------------------------------  ------------  -----------
 1  LE    LE     native         0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
 2  LE    LE     BE             1                           0xaabb      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xbbaa
 3  LE    LE     LE             0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
 4  LE    BE     native         1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
 5  LE    BE     BE             1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
 6  LE    BE     LE             0                           0xbbaa      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xbbaa
 7  BE    LE     native         0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
 8  BE    LE     BE             1                           0xaabb      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xaabb
 9  BE    LE     LE             0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
10  BE    BE     native         1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
11  BE    BE     BE             1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
12  BE    BE     LE             0                           0xbbaa      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xaabb
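To sanity-check the twelve rows mechanically, here is a small
standalone program -- a model of the pipeline above written for this
mail, not QEMU code. big_endian() and wrong_endianness() are
simplified stand-ins for memory_region_big_endian() and
memory_region_wrong_endianness(), and host endianness is modeled
explicitly by the store16() / load16() helpers rather than taken from
the machine the program runs on, so a single run prints all twelve
cases:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum endian { LE, BE };
enum dev_endian { NATIVE, DEV_BE, DEV_LE };

/* stand-in for memory_region_big_endian(): NATIVE means "match the
 * guest (target) endianness" */
static bool big_endian(enum dev_endian d, enum endian guest)
{
    return d == DEV_BE || (d == NATIVE && guest == BE);
}

/* stand-in for memory_region_wrong_endianness(): swap exactly when
 * the device's declared endianness contradicts the guest's */
static bool wrong_endianness(enum dev_endian d, enum endian guest)
{
    return (d == DEV_BE && guest == LE) || (d == DEV_LE && guest == BE);
}

/* "repr." helpers: materialize / read back a uint16_t in an
 * explicitly chosen byte order */
static void store16(uint8_t r[2], uint16_t v, enum endian e)
{
    r[0] = (e == LE) ? (v & 0xff) : (v >> 8);
    r[1] = (e == LE) ? (v >> 8) : (v & 0xff);
}

static uint16_t load16(const uint8_t r[2], enum endian e)
{
    return (uint16_t)((e == LE) ? (r[0] | r[1] << 8)
                                : (r[0] << 8 | r[1]));
}

int main(void)
{
    static const char *const en[] = { "LE", "BE" };
    static const char *const dn[] = { "native", "BE", "LE" };
    const uint8_t fw_cfg[2] = { 0xaa, 0xbb }; /* device-side array */
    int row = 1;

    for (enum endian host = LE; host <= BE; host++) {
        for (enum endian guest = LE; guest <= BE; guest++) {
            for (enum dev_endian d = NATIVE; d <= DEV_LE; d++) {
                bool big = big_endian(d, guest);
                bool wrong = wrong_endianness(d, guest);

                /* access_with_adjusted_size(): byte-sized accesses in
                 * increasing address order; the shift count decreases
                 * with the address iff memory_region_big_endian() */
                uint16_t hval = 0;
                for (int i = 0; i < 2; i++) {
                    hval |= (uint16_t)(fw_cfg[i] << (big ? 8 * (1 - i)
                                                         : 8 * i));
                }

                uint8_t hrepr[2], grepr[2];
                store16(hrepr, hval, host);       /* host repr.  */
                grepr[0] = wrong ? hrepr[1] : hrepr[0]; /* bswapXX() */
                grepr[1] = wrong ? hrepr[0] : hrepr[1];
                uint16_t gval = load16(grepr, guest);   /* guest value */

                printf("%2d  %s  %s  %-6s  %d  0x%04x  [0x%02x, 0x%02x]"
                       "  %d  [0x%02x, 0x%02x]  0x%04x\n",
                       row++, en[host], en[guest], dn[d], big, hval,
                       hrepr[0], hrepr[1], wrong, grepr[0], grepr[1],
                       gval);
            }
        }
    }
    return 0;
}

Note that nothing in the shift-assembly loop refers to host
endianness; host endianness only enters at the store16() ("host
repr.") step. That is exactly the property the conclusions below turn
on.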
Looking at the two rightmost columns, we must conclude:

(a) The "device_endian" field has absolutely no significance wrt. what
    the guest sees. In each triplet of cases, when we go from
    DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN to DEVICE_LITTLE_ENDIAN,
    the guest sees the exact same value.

    I don't understand this result (it makes me doubt device_endian
    makes any sense). What did I do wrong?

    Just to be sure that I was not seeing ghosts, I actually tested
    this result. On an x86_64 host I tested the aarch64 guest firmware
    (using TCG), cycling the "fw_cfg_data_mem_ops.endianness" field
    through all three possible values. That is, I tested cases #1 to
    #3. They *all* worked!

(b) Considering a given host endianness (ie. a group of six cases):
    when the guest goes from little endian to big endian, the
    *numerical value* the guest sees does not change. In addition,
    fixing the guest endianness and changing the host endianness, the
    *numerical value* that the guest sees (for the original host-side
    array [0xaa, 0xbb]) changes.

    This means that this interface is *value preserving*, not
    representation preserving. In other words: when host and guest
    endiannesses are identical, the *array* is transferred okay (the
    guest array matches the initial host array [0xaa, 0xbb]). When
    guest and host differ in endianness, the guest will see an
    incorrect *array*.

    Which, alas, makes this interface completely unsuitable for the
    purpose at hand (ie. transferring kernel & initrd images in words
    wider than a byte). We couldn't care less what numerical value the
    array [0xaa, 0xbb] encodes on the host -- whatever it encodes, the
    guest wants to receive the same *array* (not the same value). And
    the byte order cannot be fixed up in the guest, because the
    required fixup depends on the XOR of guest and host endiannesses,
    and the guest doesn't know about the host's endianness.

I apologize for wasting everyone's time, but I think both results are
very non-intuitive. I noticed the use of the bitwise shift in
memory_region_read_accessor() -- which internally accounts for the
host-side byte order -- just today, while discussing this with Drew on
IRC. I had assumed that it would store bytes in the recipient uint64_t
by address, not by numerical order of magnitude.

Looks like the DMA-like interface is the only way forward. :(

Laszlo
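P.S.: conclusion (b) fits in a few lines of C. This sketch (again not
QEMU code; it mirrors rows #1 and #7 of the table, ie.
DEVICE_NATIVE_ENDIAN with an LE guest) shows that the shift-assembled
*value* is host-independent while its in-memory *representation* is
not:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const uint8_t dev[2] = { 0xaa, 0xbb };  /* device-side array */

    /* shift-assembly: numerically 0xbbaa on any host */
    uint16_t val = (uint16_t)(dev[0] | dev[1] << 8);

    /* representation of that value on *this* host */
    uint8_t out[2];
    memcpy(out, &val, sizeof val);

    printf("val=0x%04x out=[0x%02x, 0x%02x]\n", val, out[0], out[1]);
    return 0;
}

On an LE host this prints val=0xbbaa out=[0xaa, 0xbb] (the array
survives); on a BE host it prints val=0xbbaa out=[0xbb, 0xaa] (the
array is corrupted) -- same value, different array, which is
conclusion (b) in miniature.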