On Tue, 16 Jun 2020 08:45:14 +0200 Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> On 16.06.20 08:33, Cornelia Huck wrote:
> > On Tue, 16 Jun 2020 07:58:53 +0200
> > Christian Borntraeger <borntrae...@de.ibm.com> wrote:
> >
> >> On 16.06.20 06:50, Halil Pasic wrote:
> >>> The atomic_cmpxchg() loop is broken because we occasionally end up with
> >>> old and _old having different values (a legit compiler can generate code
> >>> that accesses *ind_addr again to pick up a value for _old instead of
> >>> using the value of old that was already fetched according to the
> >>> rules of the abstract machine). This means the underlying CS instruction
> >>> may use a different old (_old) than the one we intended to use if
> >>> atomic_cmpxchg() performed the xchg part.
> >>>
> >>> Let us use volatile to force the rules of the abstract machine for
> >>> accesses to *ind_addr. Let us also rewrite the loop so that the
> >>> new old is used to compute the new desired value if the xchg part
> >>> is not performed.
> >>>
> >>> Signed-off-by: Halil Pasic <pa...@linux.ibm.com>
> >>> Reported-by: Andre Wild <andre.wi...@ibm.com>
> >>> Fixes: 7e7494627f ("s390x/virtio-ccw: Adapter interrupt support.")
> >>> ---
> >>>  hw/s390x/virtio-ccw.c | 18 ++++++++++--------
> >>>  1 file changed, 10 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> >>> index c1f4bb1d33..3c988a000b 100644
> >>> --- a/hw/s390x/virtio-ccw.c
> >>> +++ b/hw/s390x/virtio-ccw.c
> >>> @@ -786,9 +786,10 @@ static inline VirtioCcwDevice *to_virtio_ccw_dev_fast(DeviceState *d)
> >>>  static uint8_t virtio_set_ind_atomic(SubchDev *sch, uint64_t ind_loc,
> >>>                                       uint8_t to_be_set)
> >>>  {
> >>> -    uint8_t ind_old, ind_new;
> >>> +    uint8_t expected, actual;
> >>>      hwaddr len = 1;
> >>> -    uint8_t *ind_addr;
> >>> +    /* avoid multiple fetches */
> >>> +    uint8_t volatile *ind_addr;
> >>>
> >>>      ind_addr = cpu_physical_memory_map(ind_loc, &len, true);
> >>>      if (!ind_addr) {
> >>> @@ -796,14 +797,15 @@ static uint8_t virtio_set_ind_atomic(SubchDev *sch, uint64_t ind_loc,
> >>>                        __func__, sch->cssid, sch->ssid, sch->schid);
> >>>          return -1;
> >>>      }
> >>> +    actual = *ind_addr;
> >>>      do {
> >>> -        ind_old = *ind_addr;
> >>
> >> to make things easier to understand. Adding a barrier in here also fixes
> >> the issue.
> >> Reasoning follows below:
> >>
> >>> -        ind_new = ind_old | to_be_set;
> >>
> >> with an analysis from Andreas (cc)
> >>
> >> #define atomic_cmpxchg__nocheck(ptr, old, new)    ({                      \
> >>     typeof_strip_qual(*ptr) _old = (old);                                 \
> >>     (void)__atomic_compare_exchange_n(ptr, &_old, new, false,             \
> >>                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \
> >>     _old;                                                                 \
> >> })
> >>
> >> ind_old is copied into _old in the macro. Instead of doing the copy from the
> >> register, the compiler reloads the value from memory. The result is that _old
> >> and ind_old end up having different values: _old in r1 with the bits set
> >> already and ind_old in r10 with the bits cleared. _old gets updated by CS
> >> and matches ind_old afterwards - both with the bits being 0. So the !=
> >> compare is false and the loop is left without having set any bits.
> >>
> >> Paolo (to),
> >> I am asking myself if it would be safer to add a barrier or something like
> >> this in the macros in include/qemu/atomic.h.
>
> Having said this, I think that the refactoring from Halil (to re-use actual)
> also makes sense independent of the fix.

What about adding a barrier instead, as you suggested? (Still wondering about
other users of atomic_cmpxchg(), though.)
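
For illustration, here is a minimal, self-contained sketch of the restructured
loop discussed above. It is not the actual QEMU patch; the function name
set_bits_atomic and the standalone main() are made up for this example. The
idea it shows: the indicator byte is fetched from memory exactly once, and
when __atomic_compare_exchange_n fails it writes the value it found back into
'expected', so the next iteration reuses that value instead of dereferencing
ind_addr again.

/*
 * Minimal sketch, not the actual QEMU code: a compare-and-swap retry
 * loop that reuses the value returned by a failed CAS.
 */
#include <stdint.h>
#include <stdio.h>

static uint8_t set_bits_atomic(volatile uint8_t *ind_addr, uint8_t to_be_set)
{
    uint8_t expected = *ind_addr;   /* single explicit fetch */
    uint8_t desired;

    do {
        desired = expected | to_be_set;
        /*
         * On failure, __atomic_compare_exchange_n stores the value it
         * found in *ind_addr into 'expected', so the next iteration
         * works with that value; there is no second read of *ind_addr
         * in the loop body for the compiler to (re)load differently.
         */
    } while (!__atomic_compare_exchange_n(ind_addr, &expected, desired,
                                          false, __ATOMIC_SEQ_CST,
                                          __ATOMIC_SEQ_CST));

    return expected;   /* value observed just before the successful swap */
}

int main(void)
{
    uint8_t indicator = 0x01;
    uint8_t old = set_bits_atomic(&indicator, 0x80);

    printf("old=0x%02x new=0x%02x\n", old, indicator);  /* old=0x01 new=0x81 */
    return 0;
}

With this shape there is only one explicit dereference of ind_addr, which is
the property the volatile qualifier in the patch is also meant to guarantee;
a barrier in the atomic_cmpxchg() macros would instead keep the original loop
but prevent the compiler from reloading the old value.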