On 16.06.20 06:50, Halil Pasic wrote:
> The atomic_cmpxchg() loop is broken because we occasionally end up with
> old and _old having different values (a legit compiler can generate code
> that accesses *ind_addr again to pick up a value for _old instead of
> using the value of old that was already fetched according to the
> rules of the abstract machine). This means the underlying CS instruction
> may use a different old (_old) than the one we intended to use if
> atomic_cmpxchg() performed the xchg part.
>
> Let us use volatile to force the rules of the abstract machine for
> accesses to *ind_addr. Let us also rewrite the loop so that the
> new old is used to compute the new desired value if the xchg part
> is not performed.
>
> Signed-off-by: Halil Pasic <pa...@linux.ibm.com>
> Reported-by: Andre Wild <andre.wi...@ibm.com>
> Fixes: 7e7494627f ("s390x/virtio-ccw: Adapter interrupt support.")
> ---
> hw/s390x/virtio-ccw.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> index c1f4bb1d33..3c988a000b 100644
> --- a/hw/s390x/virtio-ccw.c
> +++ b/hw/s390x/virtio-ccw.c
> @@ -786,9 +786,10 @@ static inline VirtioCcwDevice
> *to_virtio_ccw_dev_fast(DeviceState *d)
> static uint8_t virtio_set_ind_atomic(SubchDev *sch, uint64_t ind_loc,
> uint8_t to_be_set)
> {
> - uint8_t ind_old, ind_new;
> + uint8_t expected, actual;
> hwaddr len = 1;
> - uint8_t *ind_addr;
> + /* avoid multiple fetches */
> + uint8_t volatile *ind_addr;
>
> ind_addr = cpu_physical_memory_map(ind_loc, &len, true);
> if (!ind_addr) {
> @@ -796,14 +797,15 @@ static uint8_t virtio_set_ind_atomic(SubchDev *sch,
> uint64_t ind_loc,
> __func__, sch->cssid, sch->ssid, sch->schid);
> return -1;
> }
> + actual = *ind_addr;
> do {
> - ind_old = *ind_addr;
to make things easier to understand. Adding a barrier in here also fixes the
issue.
Reasoning follows below:
> - ind_new = ind_old | to_be_set;
with an analysis from Andreas (cc)
#define atomic_cmpxchg__nocheck(ptr, old, new) ({ \
typeof_strip_qual(*ptr) _old = (old); \
(void)__atomic_compare_exchange_n(ptr, &_old, new, false, \
__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \
_old; \
})
ind_old is copied into _old in the macro. Instead of copying the value from
the register, the compiler reloads it from memory. As a result, _old and
ind_old end up with different values: _old in r1 with the bits already set,
and ind_old in r10 with the bits cleared. CS then updates _old, which
afterwards matches ind_old, both with the bits being 0. So the != compare is
false and the loop is left without having set any bits.
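
For illustration, here is a minimal standalone sketch of the loop shape the
patch moves to. It is not the QEMU code: set_bits_atomic() is a hypothetical
helper, and the GCC/Clang __atomic_compare_exchange_n builtin is used directly
in place of atomic_cmpxchg(). The point is that the value compared against is
the one actually handed to the compare-and-swap, and a failed attempt feeds
the value it found in memory into the next iteration.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the shape of the patched loop.
 * Returns the value that was in *addr before the bits were set.
 */
static uint8_t set_bits_atomic(volatile uint8_t *addr, uint8_t to_be_set)
{
    uint8_t expected, actual;

    /* Single initial fetch; volatile forbids the compiler from re-reading. */
    actual = *addr;
    do {
        uint8_t found;

        expected = actual;
        found = expected;
        /* On failure the builtin stores the current memory value in found. */
        (void)__atomic_compare_exchange_n(addr, &found,
                                          (uint8_t)(expected | to_be_set),
                                          0, __ATOMIC_SEQ_CST,
                                          __ATOMIC_SEQ_CST);
        /* Reuse what the CAS saw instead of loading *addr again. */
        actual = found;
    } while (actual != expected);

    return actual;
}
```

Calling set_bits_atomic(&ind, 0x04) on a byte holding 0x01 would return 0x01
and leave 0x05 in memory. The loop exits only when the value the CAS observed
equals the value it was asked to compare against, so it cannot leave without
the swap having been performed.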
Paolo (to),
I wonder whether it would be safer to add a barrier or something similar to
the macros in include/qemu/atomic.h.
> - } while (atomic_cmpxchg(ind_addr, ind_old, ind_new) != ind_old);
> - trace_virtio_ccw_set_ind(ind_loc, ind_old, ind_new);
> - cpu_physical_memory_unmap(ind_addr, len, 1, len);
> + expected = actual;
> + actual = atomic_cmpxchg(ind_addr, expected, expected | to_be_set);
> + } while (actual != expected);
> + trace_virtio_ccw_set_ind(ind_loc, actual, actual | to_be_set);
> + cpu_physical_memory_unmap((void *)ind_addr, len, 1, len);
>
> - return ind_old;
> + return actual;
> }
>
> static void virtio_ccw_notify(DeviceState *d, uint16_t vector)
>