On 16/02/2021 12.43, Peter Maydell wrote:
On Tue, 16 Feb 2021 at 11:01, Thomas Huth <th...@redhat.com> wrote:
According to the virtio specification, a memory barrier should be
used before incrementing the idx field in the "available" ring.
So far, we did not do this in the s390-ccw bios yet, but recently
Peter Maydell saw problems with the s390-ccw bios when running
the qtests on an aarch64 host (the bios panic'ed with the message:
"SCSI cannot report LUNs: response VS RESP=09"), which could
maybe be related to the missing memory barriers. Thus let's add
those barriers now. Since we've only seen the problem on TCG so far,
a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
in the TCG translate code.
(Note: The virtio spec also talks about using a memory barrier
*after* incrementing the idx field, but if I understood correctly
this is only required when using notification suppression - which
we don't use in the s390-ccw bios here)
Signed-off-by: Thomas Huth <th...@redhat.com>
This noticeably increases the reliability of the test for me:
it goes from failing one time in 2 or 3 to failing about one time
in 5 or 6. However it does still hit the virtio_scsi_parse_req()
check on "out_size && in_size" eventually. So my guess is that this
is not a sufficient supply of barriers. I'm firmly not a virtio expert
but my reading of the spec suggested that some of the way the
s390-ccw code is doing things might be in the wrong order and
need restructuring beyond merely "add barriers to existing code".
I've had a look at the virtio 1.0 spec and looked for "barrier" there, and
these two spots were the only ones that I identified that were definitely
missing. The other spots in the spec that talk about memory barriers rather
seem to be related to notification suppression, which we are not doing here,
if I've got that right. So no clue what else could be going wrong here on
your aarch64 host...
Thomas