On 12/16/14 14:48, Andrew Jones wrote:
> On Fri, Dec 12, 2014 at 04:58:45PM +0100, Laszlo Ersek wrote:
>> Make it clear that the maximum access size to the MMIO data register
>> determines the full size of the memory region.
>>
>> Currently the max access size is 1. Ensure that if a larger size were
>> used in "fw_cfg_data_mem_ops.valid.max_access_size", the memory
>> subsystem would split the access to byte-sized accesses internally,
>> in increasing address order.
>>
>> fw_cfg_data_mem_read() and fw_cfg_data_mem_write() don't care about
>> "address" or "size"; they just call the sequential fw_cfg_read() and
>> fw_cfg_write() functions, correspondingly. Therefore the automatic
>> splitting is just right. (The endianness of "fw_cfg_data_mem_ops" is
>> native.)
>
> This 'is native' caught my eye. Laszlo's
> Documentation/devicetree/bindings/arm/fw-cfg.txt patch states that the
> selector register is LE and
>
> "
> The data register allows accesses with 8, 16, 32 and 64-bit width.
> Accesses larger than a byte are interpreted as arrays, bundled
> together only for better performance. The bytes constituting such a
> word, in increasing address order, correspond to the bytes that would
> have been transferred by byte-wide accesses in chronological order.
> "
>
> I chatted with Laszlo to make sure the host-is-BE case was considered.
> It looks like there may be an issue there that Laszlo promised to look
> into. Laszlo had already noticed that the selector was defined as
> native in qemu as well, but should be LE. Now that we support > 1 byte
> reads and writes from the data port, then maybe we should look into
> changing that as well.
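(For context, here is roughly the shape of the two callbacks named in
the quoted commit message -- a from-memory paraphrase, not a verbatim
quote of hw/nvram/fw_cfg.c. The point is that they ignore "addr" and
"size" completely and defer to the sequential, byte-wide fw_cfg
accessors, which is what makes the byte-splitting described above
correct:

static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
                                     unsigned size)
{
    FWCfgState *s = opaque;

    /* one byte from the current fw_cfg entry; the read cursor
     * advances internally */
    return fw_cfg_read(s);
}

static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
                                  uint64_t value, unsigned size)
{
    FWCfgState *s = opaque;

    /* low byte only; wider writes rely on the memory core's
     * byte-splitting */
    fw_cfg_write(s, (uint8_t)value);
}
)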
Yes. The root of this question is what each of

enum device_endian {
    DEVICE_NATIVE_ENDIAN,
    DEVICE_BIG_ENDIAN,
    DEVICE_LITTLE_ENDIAN,
};

means.

Consider the following call tree, which implements the splitting /
combining of an MMIO read:

memory_region_dispatch_read() [memory.c]
  memory_region_dispatch_read1()
    access_with_adjusted_size()
      memory_region_big_endian()
      for each byte in the wide access:
        memory_region_read_accessor()
          fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
            fw_cfg_read()
  adjust_endianness()
    memory_region_wrong_endianness()
    bswapXX()

The function access_with_adjusted_size() always iterates over the MMIO
address range in incrementing address order. However, it can calculate
the shift count for memory_region_read_accessor() in two ways. When
memory_region_big_endian() returns "true", the shift count decreases
as the MMIO address increases. When memory_region_big_endian() returns
"false", the shift count increases as the MMIO address increases.

In memory_region_read_accessor(), the shift count is used for a
logical (ie. numeric) bitwise left shift (<<). That operator works
with numeric values and hides (ie. accounts for) host endianness.

Let's consider
- an array of two bytes, [0xaa, 0xbb],
- a uint16_t access made from the guest,
- and all twelve possible cases.

In the table below, the "host", "guest" and "device_endian" columns
are input variables. The columns to the right are calculated / derived
values, with the following data dependencies:

- memory_region_big_endian() is determined by "guest" and
  "device_endian",
- "host value" is determined by memory_region_big_endian(),
- "host repr." is determined by "host value" and "host",
- memory_region_wrong_endianness() is determined by "guest" and
  "device_endian",
- "guest repr." is determined by "host repr." and
  memory_region_wrong_endianness(),
- "guest value" is determined by "guest repr." and "guest".

After memory_region_dispatch_read1() constructs the value that is
visible in the "host value" column, it is passed to
adjust_endianness(). If memory_region_wrong_endianness() returns
"true", then the host value is byte-swapped. The result ("guest
repr.") is then passed to the guest.

 #  host  guest  device_endian  memory_region_big_endian()  host value  host repr.    memory_region_wrong_endianness()  guest repr.   guest value
--  ----  -----  -------------  --------------------------  ----------  ------------  --------------------------------  ------------  -----------
 1  LE    LE     native         0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
 2  LE    LE     BE             1                           0xaabb      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xbbaa
 3  LE    LE     LE             0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
 4  LE    BE     native         1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
 5  LE    BE     BE             1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
 6  LE    BE     LE             0                           0xbbaa      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xbbaa
 7  BE    LE     native         0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
 8  BE    LE     BE             1                           0xaabb      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xaabb
 9  BE    LE     LE             0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
10  BE    BE     native         1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
11  BE    BE     BE             1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
12  BE    BE     LE             0                           0xbbaa      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xaabb
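To sanity-check the twelve rows mechanically, here is a small
standalone program -- a model of the pipeline above written for this
mail, not QEMU code. big_endian() and wrong_endianness() are
simplified stand-ins for memory_region_big_endian() and
memory_region_wrong_endianness(), and host endianness is modeled
explicitly by the store16() / load16() helpers rather than taken from
the machine the program runs on, so a single run prints all twelve
cases:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum endian { LE, BE };
enum dev_endian { NATIVE, DEV_BE, DEV_LE };

/* stand-in for memory_region_big_endian(): NATIVE means "match the
 * guest (target) endianness" */
static bool big_endian(enum dev_endian d, enum endian guest)
{
    return d == DEV_BE || (d == NATIVE && guest == BE);
}

/* stand-in for memory_region_wrong_endianness(): swap exactly when
 * the device's declared endianness contradicts the guest's */
static bool wrong_endianness(enum dev_endian d, enum endian guest)
{
    return (d == DEV_BE && guest == LE) || (d == DEV_LE && guest == BE);
}

/* "repr." helpers: materialize / read back a uint16_t in an
 * explicitly chosen byte order */
static void store16(uint8_t r[2], uint16_t v, enum endian e)
{
    r[0] = (e == LE) ? (v & 0xff) : (v >> 8);
    r[1] = (e == LE) ? (v >> 8) : (v & 0xff);
}

static uint16_t load16(const uint8_t r[2], enum endian e)
{
    return (uint16_t)((e == LE) ? (r[0] | r[1] << 8)
                                : (r[0] << 8 | r[1]));
}

int main(void)
{
    static const char *const en[] = { "LE", "BE" };
    static const char *const dn[] = { "native", "BE", "LE" };
    const uint8_t fw_cfg[2] = { 0xaa, 0xbb }; /* device-side array */
    int row = 1;

    for (enum endian host = LE; host <= BE; host++) {
        for (enum endian guest = LE; guest <= BE; guest++) {
            for (enum dev_endian d = NATIVE; d <= DEV_LE; d++) {
                bool big = big_endian(d, guest);
                bool wrong = wrong_endianness(d, guest);

                /* access_with_adjusted_size(): byte-sized accesses in
                 * increasing address order; the shift count decreases
                 * with the address iff memory_region_big_endian() */
                uint16_t hval = 0;
                for (int i = 0; i < 2; i++) {
                    hval |= (uint16_t)(fw_cfg[i] << (big ? 8 * (1 - i)
                                                         : 8 * i));
                }

                uint8_t hrepr[2], grepr[2];
                store16(hrepr, hval, host);       /* host repr.  */
                grepr[0] = wrong ? hrepr[1] : hrepr[0]; /* bswapXX() */
                grepr[1] = wrong ? hrepr[0] : hrepr[1];
                uint16_t gval = load16(grepr, guest);   /* guest value */

                printf("%2d  %s  %s  %-6s  %d  0x%04x  [0x%02x, 0x%02x]"
                       "  %d  [0x%02x, 0x%02x]  0x%04x\n",
                       row++, en[host], en[guest], dn[d], big, hval,
                       hrepr[0], hrepr[1], wrong, grepr[0], grepr[1],
                       gval);
            }
        }
    }
    return 0;
}

Note that nothing in the shift-assembly loop refers to host
endianness; host endianness only enters at the store16() ("host
repr.") step. That is exactly the property the conclusions below turn
on.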
Looking at the two rightmost columns, we must conclude:

(a) The "device_endian" field has absolutely no significance wrt. what
    the guest sees. In each triplet of cases, when we go from
    DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN to DEVICE_LITTLE_ENDIAN,
    the guest sees the exact same value.

    I don't understand this result (it makes me doubt device_endian
    makes any sense). What did I do wrong?

    Just to be sure that I was not seeing ghosts, I actually tested
    this result. On an x86_64 host I tested the aarch64 guest firmware
    (using TCG), cycling the "fw_cfg_data_mem_ops.endianness" field
    through all three possible values. That is, I tested cases #1 to
    #3. They *all* worked!

(b) Considering a given host endianness (ie. a group of six cases):
    when the guest goes from little endian to big endian, the
    *numerical value* the guest sees does not change. In addition,
    fixing the guest endianness and changing the host endianness, the
    *numerical value* that the guest sees (for the original host-side
    array [0xaa, 0xbb]) changes.

    This means that this interface is *value preserving*, not
    representation preserving. In other words: when host and guest
    endiannesses are identical, the *array* is transferred okay (the
    guest array matches the initial host array [0xaa, 0xbb]). When
    guest and host differ in endianness, the guest will see an
    incorrect *array*.

    Which, alas, makes this interface completely unsuitable for the
    purpose at hand (ie. transferring kernel & initrd images in words
    wider than a byte). We couldn't care less what numerical value the
    array [0xaa, 0xbb] encodes on the host -- whatever it encodes, the
    guest wants to receive the same *array* (not the same value). And
    the byte order cannot be fixed up in the guest, because the
    required fixup depends on the XOR of guest and host endiannesses,
    and the guest doesn't know about the host's endianness.

I apologize for wasting everyone's time, but I think both results are
very non-intuitive. I noticed the use of the bitwise shift in
memory_region_read_accessor() -- which internally accounts for the
host-side byte order -- just today, while discussing this with Drew on
IRC. I had assumed that it would store bytes in the recipient uint64_t
by address, not by numerical order of magnitude.

Looks like the DMA-like interface is the only way forward. :(

Laszlo
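P.S.: conclusion (b) fits in a few lines of C. This sketch (again not
QEMU code; it mirrors rows #1 and #7 of the table, ie.
DEVICE_NATIVE_ENDIAN with an LE guest) shows that the shift-assembled
*value* is host-independent while its in-memory *representation* is
not:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const uint8_t dev[2] = { 0xaa, 0xbb };  /* device-side array */

    /* shift-assembly: numerically 0xbbaa on any host */
    uint16_t val = (uint16_t)(dev[0] | dev[1] << 8);

    /* representation of that value on *this* host */
    uint8_t out[2];
    memcpy(out, &val, sizeof val);

    printf("val=0x%04x out=[0x%02x, 0x%02x]\n", val, out[0], out[1]);
    return 0;
}

On an LE host this prints val=0xbbaa out=[0xaa, 0xbb] (the array
survives); on a BE host it prints val=0xbbaa out=[0xbb, 0xaa] (the
array is corrupted) -- same value, different array, which is
conclusion (b) in miniature.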