date:20240731

Re: [PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto

2024-07-31 Thread Markus Armbruster

Akihiko Odaki  writes:

> rom_bar is tristate but was defined as uint32_t so convert it into
> OnOffAuto to clarify that. For compatibility, a uint32 value set via
> QOM will be converted into OnOffAuto.
>
> Signed-off-by: Akihiko Odaki 

I agree making property "rombar" an integer was a design mistake.  I
further agree that vfio_pci_size_rom() peeking into dev->opts to
distinguish "user didn't set a value" from "user set the default value")
is an unadvisable hack.

However, ...

> ---
> Changes in v2:
> - Documented in docs/about/deprecated.rst (Daniel P. Berrangé)
> - Link to v1: 
> https://lore.kernel.org/r/20240706-rombar-v1-0-802daef2a...@daynix.com
>
> ---
> Akihiko Odaki (4):
>   qapi: Add visit_type_str_preserving()
>   qapi: Do not consume a value when visit_type_enum() fails
>   hw/pci: Convert rom_bar into OnOffAuto
>   hw/qdev: Remove opts member
>
>  docs/about/deprecated.rst |  7 +
>  docs/igd-assign.txt   |  2 +-
>  include/hw/pci/pci_device.h   |  2 +-
>  include/hw/qdev-core.h|  4 ---
>  include/qapi/visitor-impl.h   |  3 ++-
>  include/qapi/visitor.h| 25 +
>  hw/core/qdev.c|  1 -
>  hw/pci/pci.c  | 57 
> +--
>  hw/vfio/pci-quirks.c  |  2 +-
>  hw/vfio/pci.c | 11 
>  hw/xen/xen_pt_load_rom.c  |  4 +--
>  qapi/opts-visitor.c   | 12 -
>  qapi/qapi-clone-visitor.c |  2 +-
>  qapi/qapi-dealloc-visitor.c   |  4 +--
>  qapi/qapi-forward-visitor.c   |  4 ++-
>  qapi/qapi-visit-core.c| 21 ---
>  qapi/qobject-input-visitor.c  | 23 
>  qapi/qobject-output-visitor.c |  2 +-
>  qapi/string-input-visitor.c   |  2 +-
>  qapi/string-output-visitor.c  |  2 +-
>  system/qdev-monitor.c | 12 +
>  tests/qtest/virtio-net-failover.c | 32 +++---
>  22 files changed, 161 insertions(+), 73 deletions(-)
> ---
> base-commit: f2cb4026fccfe073f84a4b440e41d3ed0c3134f6
> change-id: 20240704-rombar-1a4ba2890d6c
>
> Best regards,

... this is an awful lot of QAPI visitor infrastructure.  Infrastructure
that will likely only ever be used for this one property.

Moreover, the property setter rom_bar_set() is a hack: it first tries to
parse the value as enum, and if that fails, as uint32_t.  QAPI already
provides a way to express "either this type or that type": alternate.
Like this:

{ 'alternate': 'OnOffAutoUint32',
  'data': { 'sym': 'OnOffAuto',
'uint': 'uint32' } }

Unfortunately, such alternates don't work on the command line due to
keyval visitor restrictions.  These cannot be lifted entirely, but we
might be able to lift them sufficiently to make this alternate work.

Whether it would be worth your trouble and mine just to clean up
"rombar" seems highly dubious, though.

Re: [PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto

2024-07-31 Thread Michael S. Tsirkin

On Wed, Jul 31, 2024 at 10:32:19AM +0200, Markus Armbruster wrote:
> Whether it would be worth your trouble and mine just to clean up
> "rombar" seems highly dubious, though.

Exactly.

-- 
MST

Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation

2024-07-31 Thread Cédric Le Goater


On 7/30/24 19:56, Michael S. Tsirkin wrote:

On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote:

On 2024/07/30 20:37, Michael S. Tsirkin wrote:

On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote:

Based-on: <20240714-rombar-v2-0-af1504ef5...@daynix.com>
("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto")


OK I will revert this for now. We'll try again after the release,
there will be time to address s390.



I'd like to know if anybody wants to use igb on a s390x machine configured
with libvirt. Such a configuration is already kind of broken, and it is
likely to require significant effort on both side of libvirt and QEMU to fix
it.



I assume Cédric wouldn't report it if was unused.



As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It
does not suffer the same problem with igb thanks to its flexible
configuration mechanism.

Regards,
Akihiko Odaki


Sounds like this needs more review time, anyway.




Using an emulated IGB device in a s390x VM is not a common scenario.
The IGB device is not supported downstream (RHEL on Z). However, the
change broke the s390x machines I use for upstream QEMU development,
I removed the IGB device as a fix for now.

The Z PCI device model has 'uid' and 'fid' properties which are set
by the VMM, and in this case, they are auto-generated, hence the
conflicting ids with libvirt. This is Z specific but, possibly there
are subtle use cases on other platforms which could have similar
consequences. Something to be aware of.


Also, and this is why I thought it was important to report. Creating
PCI devices at machine init time (with -nodefaults) in the back of the
management layer is a no-no. libvirt requests to have full control
on the machine topology and this change seems like a violation of
this rule, even if VFs are kind of special.

Daniel, did I understand correctly the above constraint on the machine
definition ?


I can't say if we should revert for 9.1. Again, this Z is specific.
The changes were introduced to catch errors early when the PF device
is realized. There is no doubt they are useful. Some rework is needed
to avoid the conflicting id issue though.


Thanks,

C.

Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation

2024-07-31 Thread Yui Washizu




On 2024/07/15 14:15, Akihiko Odaki wrote:

On 2024/05/16 11:00, Yui Washizu wrote:


On 2024/04/28 18:05, Akihiko Odaki wrote:

Based-on: <20240315-reuse-v9-0-67aa69af4...@daynix.com>
("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and 
improvements")


Introduction


This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide L2 switching feature for
offloading though it is out of scope to allow the guest to configure
such a feature.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of lack of validation.

User Interface
--

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
   -netdev user,id=n -netdev user,id=o
   -netdev user,id=p -netdev user,id=q
   -device pcie-root-port,id=b
   -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
   -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
   -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
   -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with "sriov-pf" property. The PF must be
added after all VFs. It is user's responsibility to ensure that VFs 
have

function numbers larger than one of the PF, and the function numbers
have a consistent stride.



I tried to start a VM with more than 8 VFs allocated using your patch,
but the following error occured and qemu didn't work:
VF function number overflows.

I think the cause of this error is that virtio-net-pci PFs don't have 
ARI.

(pcie_ari_init is not added to virtio-net-pci when PFs are initialized.)
I think it is possible to add it later,
but how about adding pcie_ari_init ?

As a trial,
adding pcie_ari_init to virtio_pci_realize enabled the creation of 
more than 8 VFs.


I have just looked into that possibility, but adding pcie_ari_init to 
virtio_pci_realize has some implications. Unconditionally calling 
pcie_ari_init will break the existing configuration of virtio-pci 
devices so we need to implement some logic to detect when ARI is 
needed. Preferably such logic should be implemented in the common PCI 
infrastructure instead of implementing it in virtio-pci so that other 
PCI multifunction devices can benefit from it.


While I don't think implementing this will be too complicated, I need 
to ensure that such a feature is really needed before doing so.



OK.
I want to use this emulation for offloading virtual network
in a environment where there are many containers in VMs.
So, I consider that the feature is need.
I think that 7 VFs are too few.
I'll keep thinking about the feature's necessity.


I'll add other comments to RFC v5 patch.


Regards,

Yui Washizu

Re: [PATCH 01/18] qapi: Smarter camel_to_upper() to reduce need for 'prefix'

2024-07-31 Thread Kevin Wolf

Am 30.07.2024 um 10:10 hat Markus Armbruster geschrieben:
> camel_to_upper() converts its argument from camel case to upper case
> with '_' between words.  Used for generated enumeration constant
> prefixes.
> 
> When some of the words are spelled all caps, where exactly to insert
> '_' is guesswork.  camel_to_upper()'s guesses are bad enough in places
> to make people override them with a 'prefix' in the schema.
> 
> Rewrite it to guess better:
> 
> 1. Insert '_' after a non-upper case character followed by an upper
>case character:
> 
>OneTwo -> ONE_TWO
>One2Three -> ONE2_THREE
> 
> 2. Insert '_' before the last upper case character followed by a
>non-upper case character:
> 
>ACRONYMWord -> ACRONYM_Word
> 
>Except at the beginning (as in OneTwo above), or when there is
>already one:
> 
>AbCd -> AB_CD

Maybe it's just me, but the exception "at the beginning" (in the sense
of "after the first character") seems to be exactly where I thought
"that looks strange" while going through your list below. In particular,
I'd expect X_DBG_* instead of XDBG_*. I also thought that the Q_*
spelling made more sense, though this might be less clear. But in case
of doubt, less exceptions seems like a good choice.

> +# Copy remainder of ``value`` to ``ret`` with '_' inserted
> +for ch in value[1:]:
> +if ch.isupper() == upc:
> +pass
> +elif upc:
> +# ``ret`` ends in upper case, next char isn't: insert '_'
> +# before the last upper case char unless there is one
> +# already, or it's at the beginning
> +if len(ret) > 2 and ret[-2] != '_':
> +ret = ret[:-1] + '_' + ret[-1]

I think in the code this means I would have expected len(ret) >= 2.

Kevin

Re: [PATCH 11/18] qapi/crypto: Rename QCryptoHashAlgorithm to *Algo, and drop prefix

2024-07-31 Thread Daniel P . Berrangé

On Tue, Jul 30, 2024 at 02:26:49PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > On Tue, Jul 30, 2024 at 10:10:25AM +0200, Markus Armbruster wrote:
> >> QAPI's 'prefix' feature can make the connection between enumeration
> >> type and its constants less than obvious.  It's best used with
> >> restraint.
> >> 
> >> QCryptoHashAlgorithm has a 'prefix' that overrides the generated
> >> enumeration constants' prefix to QCRYPTO_HASH_ALG.
> >> 
> >> We could simply drop 'prefix', but then the prefix becomes
> >> QCRYPTO_HASH_ALGORITHM, which is rather long.
> >> 
> >> We could additionally rename the type to QCryptoHashAlg, but I think
> >> the abbreviation "alg" is less than clear.
> >
> > I would have gone with this, but it is a bit of a bike shed colouring
> > debate so I'm not fussed
> 
> Either solution seems okay, so I went with my personal preference.  Do
> feel free to state yours and ask me to respin!

After reviewing the patches that follow, I'd observe that picking
Algo has made the following patches much larger than if it had
stuck with Alg. Basically changing both the types & constants,
instead of only having to change the types. 

> 
> >> Rename the type to QCryptoHashAlgo instead.  The prefix becomes to
> >> QCRYPTO_HASH_ALGO.
> >> 
> >> Signed-off-by: Markus Armbruster 
> >> ---
> >>  qapi/crypto.json| 17 +-
> >>  crypto/blockpriv.h  |  2 +-
> >>  crypto/hashpriv.h   |  2 +-
> >>  crypto/hmacpriv.h   |  4 +--
> >>  crypto/ivgenpriv.h  |  2 +-
> >>  include/crypto/afsplit.h|  8 ++---
> >>  include/crypto/block.h  |  2 +-
> >>  include/crypto/hash.h   | 18 +-
> >>  include/crypto/hmac.h   |  6 ++--
> >>  include/crypto/ivgen.h  |  6 ++--
> >>  include/crypto/pbkdf.h  | 10 +++---
> >>  backends/cryptodev-builtin.c|  8 ++---
> >>  backends/cryptodev-lkcf.c   | 10 +++---
> >>  block/parallels-ext.c   |  2 +-
> >>  block/quorum.c  |  4 +--
> >>  crypto/afsplit.c|  6 ++--
> >>  crypto/block-luks.c | 16 -
> >>  crypto/block.c  |  2 +-
> >>  crypto/hash-afalg.c | 26 +++
> >>  crypto/hash-gcrypt.c| 20 +--
> >>  crypto/hash-glib.c  | 20 +--
> >>  crypto/hash-gnutls.c| 20 +--
> >>  crypto/hash-nettle.c| 18 +-
> >>  crypto/hash.c   | 30 -
> >>  crypto/hmac-gcrypt.c| 22 ++---
> >>  crypto/hmac-glib.c  | 22 ++---
> >>  crypto/hmac-gnutls.c| 22 ++---
> >>  crypto/hmac-nettle.c| 22 ++---
> >>  crypto/hmac.c   |  2 +-
> >>  crypto/ivgen.c  |  4 +--
> >>  crypto/pbkdf-gcrypt.c   | 36 ++--
> >>  crypto/pbkdf-gnutls.c   | 36 ++--
> >>  crypto/pbkdf-nettle.c   | 32 +-
> >>  crypto/pbkdf-stub.c |  4 +--
> >>  crypto/pbkdf.c  |  2 +-
> >>  hw/misc/aspeed_hace.c   | 16 -
> >>  io/channel-websock.c|  2 +-
> >>  target/i386/sev.c   |  6 ++--
> >>  tests/bench/benchmark-crypto-akcipher.c | 12 +++
> >>  tests/bench/benchmark-crypto-hash.c | 10 +++---
> >>  tests/bench/benchmark-crypto-hmac.c |  6 ++--
> >>  tests/unit/test-crypto-afsplit.c| 10 +++---
> >>  tests/unit/test-crypto-akcipher.c   |  6 ++--
> >>  tests/unit/test-crypto-block.c  | 16 -
> >>  tests/unit/test-crypto-hash.c   | 42 +++
> >>  tests/unit/test-crypto-hmac.c   | 16 -
> >>  tests/unit/test-crypto-ivgen.c  |  8 ++---
> >>  tests/unit/test-crypto-pbkdf.c  | 44 -
> >>  ui/vnc.c|  2 +-
> >>  util/hbitmap.c  |  2 +-
> >>  crypto/akcipher-gcrypt.c.inc| 14 
> >>  crypto/akcipher-nettle.c.inc| 26 +++
> >>  52 files changed, 350 insertions(+), 351 deletions(-)
> >
> > Acked-by: Daniel P. Berrangé 
> 
> Thanks!
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v2 1/4] scsi-disk: Use positive return value for status in dma_readv/writev

2024-07-31 Thread Kevin Wolf

In some error cases, scsi_block_sgio_complete() never calls the passed
callback, but directly completes the request. This leads to bugs because
its error paths are not exact copies of what the callback would normally
do.

In preparation to fix this, allow passing positive return values to the
callbacks that represent the status code that should be used to complete
the request.

scsi_handle_rw_error() already handles positive values for its ret
parameter because scsi_block_sgio_complete() calls directly into it.

Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-disk.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index a67092db6a..3ff6798bde 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -65,6 +65,10 @@ OBJECT_DECLARE_TYPE(SCSIDiskState, SCSIDiskClass, 
SCSI_DISK_BASE)
 
 struct SCSIDiskClass {
 SCSIDeviceClass parent_class;
+/*
+ * Callbacks receive ret == 0 for success. Errors are represented either as
+ * negative errno values, or as positive SAM status codes.
+ */
 DMAIOFunc   *dma_readv;
 DMAIOFunc   *dma_writev;
 bool(*need_fua_emulation)(SCSICommand *cmd);
@@ -283,7 +287,7 @@ static bool scsi_disk_req_check_error(SCSIDiskReq *r, int 
ret, bool acct_failed)
 return true;
 }
 
-if (ret < 0) {
+if (ret != 0) {
 return scsi_handle_rw_error(r, ret, acct_failed);
 }
 
@@ -360,7 +364,7 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
 static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret)
 {
 assert(r->req.aiocb == NULL);
-if (scsi_disk_req_check_error(r, ret, false)) {
+if (scsi_disk_req_check_error(r, ret, ret > 0)) {
 goto done;
 }
 
@@ -385,9 +389,10 @@ static void scsi_dma_complete(void *opaque, int ret)
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
+/* ret > 0 is accounted for in scsi_disk_req_check_error() */
 if (ret < 0) {
 block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
-} else {
+} else if (ret == 0) {
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 }
 scsi_dma_complete_noio(r, ret);
@@ -403,7 +408,7 @@ static void scsi_read_complete_noio(SCSIDiskReq *r, int ret)
qemu_get_current_aio_context());
 
 assert(r->req.aiocb == NULL);
-if (scsi_disk_req_check_error(r, ret, false)) {
+if (scsi_disk_req_check_error(r, ret, ret > 0)) {
 goto done;
 }
 
@@ -424,9 +429,10 @@ static void scsi_read_complete(void *opaque, int ret)
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
+/* ret > 0 is accounted for in scsi_disk_req_check_error() */
 if (ret < 0) {
 block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
-} else {
+} else if (ret == 0) {
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 trace_scsi_disk_read_complete(r->req.tag, r->qiov.size);
 }
@@ -534,7 +540,7 @@ static void scsi_write_complete_noio(SCSIDiskReq *r, int 
ret)
qemu_get_current_aio_context());
 
 assert (r->req.aiocb == NULL);
-if (scsi_disk_req_check_error(r, ret, false)) {
+if (scsi_disk_req_check_error(r, ret, ret > 0)) {
 goto done;
 }
 
@@ -562,9 +568,10 @@ static void scsi_write_complete(void * opaque, int ret)
 assert (r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
+/* ret > 0 is accounted for in scsi_disk_req_check_error() */
 if (ret < 0) {
 block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
-} else {
+} else if (ret == 0) {
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
 }
 scsi_write_complete_noio(r, ret);
-- 
2.45.2

[PATCH v2 0/4] scsi-block: Fix error handling with r/werror=stop

2024-07-31 Thread Kevin Wolf

Running validation tests in Windows 2019's Failover Cluster Manager
fails in two different ways when run with rerror/werror=stop:

1. It runs into an assertion failure because the sgio-based I/O path
   takes shortcuts in its error handling that skip necessary cleanup

2. RESERVATION_CONFLICT is treated as a host error and stops the VM,
   which in some cases can't be resumed at all because nothing will make
   the error go away on retry. The error should always go to the guest
   instead, it's an invalid request from the guest.

This series fixes these problems.

v2:
- Patch 4: [Paolo]
  * Set error = 0 explicitly, remove useless variable initialisation
  * Add comment to explain why we consider it a guest error
  * Mention scsi-block specifically in the commit message

Kevin Wolf (4):
  scsi-disk: Use positive return value for status in dma_readv/writev
  scsi-block: Don't skip callback for sgio error status/driver_status
  scsi-disk: Add warning comments that host_status errors take a
shortcut
  scsi-disk: Always report RESERVATION_CONFLICT to guest

 hw/scsi/scsi-disk.c | 73 +++--
 1 file changed, 51 insertions(+), 22 deletions(-)

-- 
2.45.2

[PATCH v2 2/4] scsi-block: Don't skip callback for sgio error status/driver_status

2024-07-31 Thread Kevin Wolf

Instead of calling into scsi_handle_rw_error() directly from
scsi_block_sgio_complete() and skipping the normal callback, go through
the normal cleanup path by calling the callback with a positive error
value.

The important difference here is not only that the code path is cleaner,
but that the callbacks set r->req.aiocb = NULL. If we skip setting this
and the error action is BLOCK_ERROR_ACTION_STOP, resuming the VM runs
into an assertion failure in scsi_read_data() or scsi_write_data()
because the dangling aiocb pointer is unexpected.

Fixes: a108557bbf ("scsi: inline sg_io_sense_from_errno() into the callers.")
Buglink: https://issues.redhat.com/browse/RHEL-5
Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-disk.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 3ff6798bde..6e1a5c98df 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2832,16 +2832,6 @@ static void scsi_block_sgio_complete(void *opaque, int 
ret)
 } else {
 ret = io_hdr->status;
 }
-
-if (ret > 0) {
-if (scsi_handle_rw_error(r, ret, true)) {
-scsi_req_unref(&r->req);
-return;
-}
-
-/* Ignore error.  */
-ret = 0;
-}
 }
 
 req->cb(req->cb_opaque, ret);
-- 
2.45.2

[PATCH v2 4/4] scsi-disk: Always report RESERVATION_CONFLICT to guest

2024-07-31 Thread Kevin Wolf

In the case of scsi-block, RESERVATION_CONFLICT is not a backend error,
but indicates that the guest tried to make a request that it isn't
allowed to execute. Pass the error to the guest so that it can decide
what to do with it.

Without this, if we stop the VM in response to a RESERVATION_CONFLICT
(as is the default policy in management software such as oVirt or
KubeVirt), it can happen that the VM cannot be resumed any more because
every attempt to resume it immediately runs into the same error and
stops the VM again.

One case that expects RESERVATION_CONFLICT errors to be visible in the
guest is running the validation tests in Windows 2019's Failover Cluster
Manager, which intentionally tries to execute invalid requests to see if
they are properly rejected.

Buglink: https://issues.redhat.com/browse/RHEL-5
Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-disk.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 69a195177e..4d94b2b816 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -224,7 +224,7 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, 
bool acct_failed)
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
 SCSISense sense = SENSE_CODE(NO_SENSE);
-int error = 0;
+int error;
 bool req_has_sense = false;
 BlockErrorAction action;
 int status;
@@ -235,11 +235,35 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, 
bool acct_failed)
 } else {
 /* A passthrough command has completed with nonzero status.  */
 status = ret;
-if (status == CHECK_CONDITION) {
+switch (status) {
+case CHECK_CONDITION:
 req_has_sense = true;
 error = scsi_sense_buf_to_errno(r->req.sense, 
sizeof(r->req.sense));
-} else {
+break;
+case RESERVATION_CONFLICT:
+/*
+ * Don't apply the error policy, always report to the guest.
+ *
+ * This is a passthrough code path, so it's not a backend error, 
but
+ * a response to an invalid guest request.
+ *
+ * Windows Failover Cluster validation intentionally sends invalid
+ * requests to verify that reservations work as intended. It is
+ * crucial that it sees the resulting errors.
+ *
+ * Treating a reservation conflict as a guest-side error is obvious
+ * when a pr-manager is in use. Without one, the situation is less
+ * clear, but there might be nothing that can be fixed on the host
+ * (like in the above example), and we don't want to be stuck in a
+ * loop where resuming the VM and retrying the request immediately
+ * stops it again. So always reporting is still the safer option in
+ * this case, too.
+ */
+error = 0;
+break;
+default:
 error = EINVAL;
+break;
 }
 }
 
@@ -249,8 +273,9 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, 
bool acct_failed)
  * are usually retried immediately, so do not post them to QMP and
  * do not account them as failed I/O.
  */
-if (req_has_sense &&
-scsi_sense_buf_is_guest_recoverable(r->req.sense, 
sizeof(r->req.sense))) {
+if (!error || (req_has_sense &&
+   scsi_sense_buf_is_guest_recoverable(r->req.sense,
+   sizeof(r->req.sense 
{
 action = BLOCK_ERROR_ACTION_REPORT;
 acct_failed = false;
 } else {
-- 
2.45.2

[PATCH v2 3/4] scsi-disk: Add warning comments that host_status errors take a shortcut

2024-07-31 Thread Kevin Wolf

scsi_block_sgio_complete() has surprising behaviour in that there are
error cases in which it directly completes the request and never calls
the passed callback. In the current state of the code, this doesn't seem
to result in bugs, but with future code changes, we must be careful to
never rely on the callback doing some cleanup until this code smell is
fixed. For now, just add warnings to make people aware of the trap.

Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-disk.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 6e1a5c98df..69a195177e 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -68,6 +68,9 @@ struct SCSIDiskClass {
 /*
  * Callbacks receive ret == 0 for success. Errors are represented either as
  * negative errno values, or as positive SAM status codes.
+ *
+ * Beware: For errors returned in host_status, the function may directly
+ * complete the request and never call the callback.
  */
 DMAIOFunc   *dma_readv;
 DMAIOFunc   *dma_writev;
@@ -381,6 +384,7 @@ done:
 scsi_req_unref(&r->req);
 }
 
+/* May not be called in all error cases, don't rely on cleanup here */
 static void scsi_dma_complete(void *opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -421,6 +425,7 @@ done:
 scsi_req_unref(&r->req);
 }
 
+/* May not be called in all error cases, don't rely on cleanup here */
 static void scsi_read_complete(void *opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -560,6 +565,7 @@ done:
 scsi_req_unref(&r->req);
 }
 
+/* May not be called in all error cases, don't rely on cleanup here */
 static void scsi_write_complete(void * opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -2821,6 +2827,7 @@ static void scsi_block_sgio_complete(void *opaque, int 
ret)
 sg_io_hdr_t *io_hdr = &req->io_header;
 
 if (ret == 0) {
+/* FIXME This skips calling req->cb() and any cleanup in it */
 if (io_hdr->host_status != SCSI_HOST_OK) {
 scsi_req_complete_failed(&r->req, io_hdr->host_status);
 scsi_req_unref(&r->req);
-- 
2.45.2

Re: [PATCH] iotests/024: exclude 'backing file format' field from the output

2024-07-31 Thread Philippe Mathieu-Daudé


On 30/7/24 11:47, Andrey Drobyshev wrote:

Apparently 'qemu-img info' doesn't report the backing file format field
for qed (as it does for qcow2):

$ qemu-img create -f qed base.qed 1M && qemu-img create -f qed -b base.qed -F 
qed top.qed 1M
$ qemu-img create -f qcow2 base.qcow2 1M && qemu-img create -f qcow2 -b 
base.qcow2 -F qcow2 top.qcow2 1M
$ qemu-img info top.qed | grep 'backing file format'
$ qemu-img info top.qcow2 | grep 'backing file format'
backing file format: qcow2

This leads to the 024 test failure with -qed.  Let's just filter the
field out and exclude it from the output.

This is a fixup for the commit f93e65ee51 ("iotests/{024, 271}: add
testcases for qemu-img rebase").

Found-by: Thomas Huth 


s/Found/Reported/


Signed-off-by: Andrey Drobyshev 
---
  tests/qemu-iotests/024 | 2 +-
  tests/qemu-iotests/024.out | 1 -
  2 files changed, 1 insertion(+), 2 deletions(-)

[PATCH 1/7] block/vdi.c: Avoid potential overflow when calculating size of write

2024-07-31 Thread Peter Maydell

In vdi_co_pwritev() we multiply a sector count by SECTOR_SIZE to
get the size to write in bytes. Coverity notes that this means that
we do the multiply as a 32x32->32 multiply before converting to
64 bits, which has the potential to overflow.

This is very unlikely to happen, since the block map has 4 bytes per
block and the maximum number of blocks in the image must fit into a
32-bit integer.  But we can keep Coverity happy by including a cast
so we do a 64-bit multiply here.

Resolves: Coverity CID 1508076
Signed-off-by: Peter Maydell 
---
 block/vdi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vdi.c b/block/vdi.c
index 6363da08cee..27c60ba18d0 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -728,7 +728,7 @@ nonallocating_write:
 logout("will write %u block map sectors starting from entry %u\n",
n_sectors, bmap_first);
 ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE,
- n_sectors * SECTOR_SIZE, base, 0);
+ n_sectors * (uint64_t)SECTOR_SIZE, base, 0);
 }
 
 return ret;
-- 
2.34.1

[PATCH 0/7] block: Miscellaneous minor Coverity fixes

2024-07-31 Thread Peter Maydell

This patchset is a collection of fixes for minor Coverity reported
issues. In all cases, there isn't a user-visible problem, but
the Coverity issue pointed up somewhere where we could clean
up the code or make it a bit more obvious to a human reader
what the intent was.

Only lightly tested (with "make check" and "make check-avocado").

thanks
-- PMM

Peter Maydell (7):
  block/vdi.c: Avoid potential overflow when calculating size of write
  block/gluster: Use g_autofree for string in qemu_gluster_parse_json()
  hw/block/pflash_cfi01: Don't decrement pfl->counter below 0
  hw/ide/atapi: Be explicit that assigning to s->lcyl truncates
  hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found
something
  hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()
  block/ssh.c: Don't double-check that characters are hex digits

 block/gluster.c |  6 +-
 block/ssh.c | 10 ++
 block/vdi.c |  2 +-
 hw/block/fdc-isa.c  |  2 ++
 hw/block/pflash_cfi01.c |  1 +
 hw/ide/atapi.c  |  2 +-
 hw/ide/pci.c|  4 
 7 files changed, 12 insertions(+), 15 deletions(-)

-- 
2.34.1

[PATCH 7/7] block/ssh.c: Don't double-check that characters are hex digits

2024-07-31 Thread Peter Maydell

In compare_fingerprint() we effectively check whether the characters
in the fingerprint are valid hex digits twice: first we do so with
qemu_isxdigit(), but then the hex2decimal() function also has a code
path where it effectively detects an invalid digit and returns -1.
This causes Coverity to complain because it thinks that we might use
that -1 value in an expression where it would be an integer overflow.

Avoid the double-check of hex digit validity by testing the return
values from hex2decimal() rather than doing separate calls to
qemu_isxdigit().

Signed-off-by: Peter Maydell 
---
Could alternatively have put a g_assert_non_reached() in
hex2decimal(), but this seemed better to me.
---
 block/ssh.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index 27d582e0e3d..510dd208aba 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -376,13 +376,15 @@ static int compare_fingerprint(const unsigned char 
*fingerprint, size_t len,
 unsigned c;
 
 while (len > 0) {
+unsigned c0, c1;
 while (*host_key_check == ':')
 host_key_check++;
-if (!qemu_isxdigit(host_key_check[0]) ||
-!qemu_isxdigit(host_key_check[1]))
+c0 = hex2decimal(host_key_check[0]);
+c1 = hex2decimal(host_key_check[1]);
+if (c0 > 0xf || c1 > 0xf) {
 return 1;
-c = hex2decimal(host_key_check[0]) * 16 +
-hex2decimal(host_key_check[1]);
+}
+c = c0 * 16 + c1;
 if (c - *fingerprint != 0)
 return c - *fingerprint;
 fingerprint++;
-- 
2.34.1

[PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something

2024-07-31 Thread Peter Maydell

Coverity complains about an overflow in isa_fdc_get_drive_max_chs()
that can happen if the loop over fd_formats never finds a match,
because we initialize *maxc to 0 and then at the end of the
function decrement it.

This can't ever actually happen because fd_formats has at least
one entry for each FloppyDriveType, so we must at least once
find a match and update *maxc, *maxh and *maxs. Assert that we
did find a match, which should keep Coverity happy and will also
detect possible bugs in the data in fd_formats.

Resolves: Coverity CID 1547663
Signed-off-by: Peter Maydell 
---
 hw/block/fdc-isa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/fdc-isa.c b/hw/block/fdc-isa.c
index e43dc532af8..796835f57b3 100644
--- a/hw/block/fdc-isa.c
+++ b/hw/block/fdc-isa.c
@@ -147,6 +147,8 @@ static void isa_fdc_get_drive_max_chs(FloppyDriveType type, 
uint8_t *maxc,
 *maxs = fdf->last_sect;
 }
 }
+/* fd_formats must contain at least one entry per FloppyDriveType */
+assert(*maxc);
 (*maxc)--;
 }
 
-- 
2.34.1

[PATCH 6/7] hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()

2024-07-31 Thread Peter Maydell

Coverity notes that the code at the end of the loop in
bmdma_prepare_buf() is unreachable.  This is because in commit
9fbf0fa81fca8f527 ("ide: remove hardcoded 2GiB transactional limit")
we removed the only codepath in the loop which could "break" out of
it, but didn't notice that this meant we should also remove the code
at the end of the loop.

Remove the dead code.

Resolves: Coverity CID 1547772
Signed-off-by: Peter Maydell 
---
 hw/ide/pci.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4675d079a17..f2cb500a94f 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -266,10 +266,6 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, 
int32_t limit)
 s->io_buffer_size += l;
 }
 }
-
-qemu_sglist_destroy(&s->sg);
-s->io_buffer_size = 0;
-return -1;
 }
 
 /* return 0 if buffer completed */
-- 
2.34.1

[PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates

2024-07-31 Thread Peter Maydell

In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then
assign its two halves to s->lcyl and s->hcyl like this:

   s->lcyl = size;
   s->hcyl = size >> 8;

Coverity warns that the first line here can overflow the
8-bit s->lcyl variable. This is true, and in this case we're
deliberately only after the low 8 bits of the value. The
code is clearer to both humans and Coverity if we're explicit
that we only wanted the low 8 bits, though.

Signed-off-by: Peter Maydell 
---
 hw/ide/atapi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index fcb6cca1573..e82959dc2d3 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -265,7 +265,7 @@ void ide_atapi_cmd_reply_end(IDEState *s)
 byte_count_limit--;
 size = byte_count_limit;
 }
-s->lcyl = size;
+s->lcyl = size & 0xff;
 s->hcyl = size >> 8;
 s->elementary_transfer_size = size;
 /* we cannot transmit more than one sector at a time */
-- 
2.34.1

[PATCH 2/7] block/gluster: Use g_autofree for string in qemu_gluster_parse_json()

2024-07-31 Thread Peter Maydell

In the loop in qemu_gluster_parse_json() we do:

char *str = NULL;
for(...) {
str = g_strdup_printf(...);
...
if (various errors) {
goto out;
}
...
g_free(str);
str = NULL;
}
return 0;
out:
various cleanups;
g_free(str);
...
return -errno;

Coverity correctly complains that the assignment "str = NULL" at the
end of the loop is unnecessary, because we will either go back to the
top of the loop and overwrite it, or else we will exit the loop and
then exit the function without ever reading str again. The assignment
is there as defensive coding to ensure that str is only non-NULL if
it's a live allocation, so this is intentional.

We can make Coverity happier and simplify the code here by using
g_autofree, since we never need 'str' outside the loop.

Resolves: Coverity CID 1527385
Signed-off-by: Peter Maydell 
---
 block/gluster.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index f8b415f3812..61ded95e660 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -514,7 +514,6 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 SocketAddressList **tail;
 QDict *backing_options = NULL;
 Error *local_err = NULL;
-char *str = NULL;
 const char *ptr;
 int i, type, num_servers;
 
@@ -547,7 +546,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 tail = &gconf->server;
 
 for (i = 0; i < num_servers; i++) {
-str = g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i);
+g_autofree char *str = 
g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i);
 qdict_extract_subqdict(options, &backing_options, str);
 
 /* create opts info from runtime_type_opts list */
@@ -658,8 +657,6 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 
 qobject_unref(backing_options);
 backing_options = NULL;
-g_free(str);
-str = NULL;
 }
 
 return 0;
@@ -668,7 +665,6 @@ out:
 error_propagate(errp, local_err);
 qapi_free_SocketAddress(gsconf);
 qemu_opts_del(opts);
-g_free(str);
 qobject_unref(backing_options);
 errno = EINVAL;
 return -errno;
-- 
2.34.1

[PATCH 3/7] hw/block/pflash_cfi01: Don't decrement pfl->counter below 0

2024-07-31 Thread Peter Maydell

In pflash_write() Coverity points out that we can decrement the
unsigned pfl->counter below zero, which makes it wrap around.  In
fact this is harmless, because if pfl->counter is 0 at this point we
also increment pfl->wcycle to 3, and the wcycle == 3 handling doesn't
look at counter; the only way back into code which looks at the
counter value is via wcycle == 1, which will reinitialize the counter.
But it's arguably a little clearer to break early in the "counter ==
0" if(), to avoid the decrement-below-zero.

Resolves: Coverity CID 1547611
Signed-off-by: Peter Maydell 
---
 hw/block/pflash_cfi01.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index c8f1cf5a872..2f3d1dd509c 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -614,6 +614,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 if (!pfl->counter) {
 trace_pflash_write(pfl->name, "block write finished");
 pfl->wcycle++;
+break;
 }
 
 pfl->counter--;
-- 
2.34.1

Re: [PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates

2024-07-31 Thread Markus Armbruster

Peter Maydell  writes:

> In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then
> assign its two halves to s->lcyl and s->hcyl like this:
>
>s->lcyl = size;
>s->hcyl = size >> 8;
>
> Coverity warns that the first line here can overflow the
> 8-bit s->lcyl variable. This is true, and in this case we're
> deliberately only after the low 8 bits of the value. The
> code is clearer to both humans and Coverity if we're explicit
> that we only wanted the low 8 bits, though.
>

Resolves?

> Signed-off-by: Peter Maydell 
> ---
>  hw/ide/atapi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index fcb6cca1573..e82959dc2d3 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -265,7 +265,7 @@ void ide_atapi_cmd_reply_end(IDEState *s)
   /* a new transfer is needed */
   s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
   ide_bus_set_irq(s->bus);
   byte_count_limit = atapi_byte_count_limit(s);
   trace_ide_atapi_cmd_reply_end_bcl(s, byte_count_limit);
   size = s->packet_transfer_size;
   if (size > byte_count_limit) {
   /* byte count limit must be even if this case */
   if (byte_count_limit & 1)
>  byte_count_limit--;
>  size = byte_count_limit;
>  }
> -s->lcyl = size;
> +s->lcyl = size & 0xff;
>  s->hcyl = size >> 8;
>  s->elementary_transfer_size = size;
>  /* we cannot transmit more than one sector at a time */

@size is int.  I wonder why it's fine with size >> 8...  ah,
atapi_byte_count_limit() returns uint16_t, and Coverity is smart enough
to data-flow this via @byte_count_limit into @size.

Reviewed-by: Markus Armbruster

Re: [PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates

2024-07-31 Thread Peter Maydell

On Wed, 31 Jul 2024 at 15:47, Markus Armbruster  wrote:
>
> Peter Maydell  writes:
>
> > In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then
> > assign its two halves to s->lcyl and s->hcyl like this:
> >
> >s->lcyl = size;
> >s->hcyl = size >> 8;
> >
> > Coverity warns that the first line here can overflow the
> > 8-bit s->lcyl variable. This is true, and in this case we're
> > deliberately only after the low 8 bits of the value. The
> > code is clearer to both humans and Coverity if we're explicit
> > that we only wanted the low 8 bits, though.
> >
>
> Resolves?

Whoops.

Resolves: Coverity CID 1547621

> > Signed-off-by: Peter Maydell 
> > ---
> >  hw/ide/atapi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> > index fcb6cca1573..e82959dc2d3 100644
> > --- a/hw/ide/atapi.c
> > +++ b/hw/ide/atapi.c
> > @@ -265,7 +265,7 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>/* a new transfer is needed */
>s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO;
>ide_bus_set_irq(s->bus);
>byte_count_limit = atapi_byte_count_limit(s);
>trace_ide_atapi_cmd_reply_end_bcl(s, byte_count_limit);
>size = s->packet_transfer_size;
>if (size > byte_count_limit) {
>/* byte count limit must be even if this case */
>if (byte_count_limit & 1)
> >  byte_count_limit--;
> >  size = byte_count_limit;
> >  }
> > -s->lcyl = size;
> > +s->lcyl = size & 0xff;
> >  s->hcyl = size >> 8;
> >  s->elementary_transfer_size = size;
> >  /* we cannot transmit more than one sector at a time */
>
> @size is int.  I wonder why it's fine with size >> 8...  ah,
> atapi_byte_count_limit() returns uint16_t, and Coverity is smart enough
> to data-flow this via @byte_count_limit into @size.
>
> Reviewed-by: Markus Armbruster 

thanks
-- PMM

Re: [PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something

2024-07-31 Thread Markus Armbruster

Peter Maydell  writes:

> Coverity complains about an overflow in isa_fdc_get_drive_max_chs()
> that can happen if the loop over fd_formats never finds a match,
> because we initialize *maxc to 0 and then at the end of the
> function decrement it.
>
> This can't ever actually happen because fd_formats has at least
> one entry for each FloppyDriveType, so we must at least once
> find a match and update *maxc, *maxh and *maxs. Assert that we
> did find a match, which should keep Coverity happy and will also
> detect possible bugs in the data in fd_formats.
>
> Resolves: Coverity CID 1547663
> Signed-off-by: Peter Maydell 
> ---
>  hw/block/fdc-isa.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/block/fdc-isa.c b/hw/block/fdc-isa.c
> index e43dc532af8..796835f57b3 100644
> --- a/hw/block/fdc-isa.c
> +++ b/hw/block/fdc-isa.c
> @@ -147,6 +147,8 @@ static void isa_fdc_get_drive_max_chs(FloppyDriveType 
> type, uint8_t *maxc,
>  *maxs = fdf->last_sect;
>  }
>  }
> +/* fd_formats must contain at least one entry per FloppyDriveType */
> +assert(*maxc);
>  (*maxc)--;
>  }

Reviewed-by: Markus Armbruster

Re: [PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> Coverity complains about an overflow in isa_fdc_get_drive_max_chs()
> that can happen if the loop over fd_formats never finds a match,
> because we initialize *maxc to 0 and then at the end of the
> function decrement it.
> 
> This can't ever actually happen because fd_formats has at least
> one entry for each FloppyDriveType, so we must at least once
> find a match and update *maxc, *maxh and *maxs. Assert that we
> did find a match, which should keep Coverity happy and will also
> detect possible bugs in the data in fd_formats.
> 
> Resolves: Coverity CID 1547663
> Signed-off-by: Peter Maydell 

Reviewed-by: Kevin Wolf

Re: [PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then
> assign its two halves to s->lcyl and s->hcyl like this:
> 
>s->lcyl = size;
>s->hcyl = size >> 8;
> 
> Coverity warns that the first line here can overflow the
> 8-bit s->lcyl variable. This is true, and in this case we're
> deliberately only after the low 8 bits of the value. The
> code is clearer to both humans and Coverity if we're explicit
> that we only wanted the low 8 bits, though.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Kevin Wolf

Re: [PATCH 1/7] block/vdi.c: Avoid potential overflow when calculating size of write

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> In vdi_co_pwritev() we multiply a sector count by SECTOR_SIZE to
> get the size to write in bytes. Coverity notes that this means that
> we do the multiply as a 32x32->32 multiply before converting to
> 64 bits, which has the potential to overflow.
> 
> This is very unlikely to happen, since the block map has 4 bytes per
> block and the maximum number of blocks in the image must fit into a
> 32-bit integer.  But we can keep Coverity happy by including a cast
> so we do a 64-bit multiply here.
> 
> Resolves: Coverity CID 1508076
> Signed-off-by: Peter Maydell 
> ---
>  block/vdi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index 6363da08cee..27c60ba18d0 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -728,7 +728,7 @@ nonallocating_write:
>  logout("will write %u block map sectors starting from entry %u\n",
> n_sectors, bmap_first);
>  ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE,
> - n_sectors * SECTOR_SIZE, base, 0);
> + n_sectors * (uint64_t)SECTOR_SIZE, base, 0);
>  }

I wonder if we shouldn't just make VDI's SECTOR_SIZE 64 bits like
BDRV_SECTOR_SIZE. It's easy to miss the cast in individual places.

Kevin

Re: [PATCH 2/7] block/gluster: Use g_autofree for string in qemu_gluster_parse_json()

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> In the loop in qemu_gluster_parse_json() we do:
> 
> char *str = NULL;
> for(...) {
> str = g_strdup_printf(...);
> ...
> if (various errors) {
> goto out;
> }
> ...
> g_free(str);
> str = NULL;
> }
> return 0;
> out:
> various cleanups;
> g_free(str);
> ...
> return -errno;
> 
> Coverity correctly complains that the assignment "str = NULL" at the
> end of the loop is unnecessary, because we will either go back to the
> top of the loop and overwrite it, or else we will exit the loop and
> then exit the function without ever reading str again. The assignment
> is there as defensive coding to ensure that str is only non-NULL if
> it's a live allocation, so this is intentional.
> 
> We can make Coverity happier and simplify the code here by using
> g_autofree, since we never need 'str' outside the loop.
> 
> Resolves: Coverity CID 1527385
> Signed-off-by: Peter Maydell 
> ---
>  block/gluster.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index f8b415f3812..61ded95e660 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -514,7 +514,6 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
> *gconf,
>  SocketAddressList **tail;
>  QDict *backing_options = NULL;
>  Error *local_err = NULL;
> -char *str = NULL;
>  const char *ptr;
>  int i, type, num_servers;
>  
> @@ -547,7 +546,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
> *gconf,
>  tail = &gconf->server;
>  
>  for (i = 0; i < num_servers; i++) {
> -str = g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i);
> +g_autofree char *str = 
> g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i);

This line is too long now.

With this fixed:
Reviewed-by: Kevin Wolf

Re: [PATCH 1/7] block/vdi.c: Avoid potential overflow when calculating size of write

2024-07-31 Thread Stefan Weil via


Am 31.07.24 um 16:36 schrieb Peter Maydell:


In vdi_co_pwritev() we multiply a sector count by SECTOR_SIZE to
get the size to write in bytes. Coverity notes that this means that
we do the multiply as a 32x32->32 multiply before converting to
64 bits, which has the potential to overflow.

This is very unlikely to happen, since the block map has 4 bytes per
block and the maximum number of blocks in the image must fit into a
32-bit integer.  But we can keep Coverity happy by including a cast
so we do a 64-bit multiply here.

Resolves: Coverity CID 1508076
Signed-off-by: Peter Maydell 
---
  block/vdi.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vdi.c b/block/vdi.c
index 6363da08cee..27c60ba18d0 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -728,7 +728,7 @@ nonallocating_write:
  logout("will write %u block map sectors starting from entry %u\n",
 n_sectors, bmap_first);
  ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE,
- n_sectors * SECTOR_SIZE, base, 0);
+ n_sectors * (uint64_t)SECTOR_SIZE, base, 0);
  }
  
  return ret;


Thanks.

Reviewed-by: Stefan Weil

Re: [PATCH 3/7] hw/block/pflash_cfi01: Don't decrement pfl->counter below 0

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> In pflash_write() Coverity points out that we can decrement the
> unsigned pfl->counter below zero, which makes it wrap around.  In
> fact this is harmless, because if pfl->counter is 0 at this point we
> also increment pfl->wcycle to 3, and the wcycle == 3 handling doesn't
> look at counter; the only way back into code which looks at the
> counter value is via wcycle == 1, which will reinitialize the counter.
> But it's arguably a little clearer to break early in the "counter ==
> 0" if(), to avoid the decrement-below-zero.
> 
> Resolves: Coverity CID 1547611
> Signed-off-by: Peter Maydell 

Reviewed-by: Kevin Wolf

Re: [PATCH 6/7] hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> Coverity notes that the code at the end of the loop in
> bmdma_prepare_buf() is unreachable.  This is because in commit
> 9fbf0fa81fca8f527 ("ide: remove hardcoded 2GiB transactional limit")
> we removed the only codepath in the loop which could "break" out of
> it, but didn't notice that this meant we should also remove the code
> at the end of the loop.
> 
> Remove the dead code.
> 
> Resolves: Coverity CID 1547772
> Signed-off-by: Peter Maydell 
> ---
>  hw/ide/pci.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 4675d079a17..f2cb500a94f 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -266,10 +266,6 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, 
> int32_t limit)
>  s->io_buffer_size += l;
>  }
>  }
> -
> -qemu_sglist_destroy(&s->sg);
> -s->io_buffer_size = 0;
> -return -1;
>  }

Should we put a g_assert_not_reached() here instead to make it easier
for the reader to understand how this function works?

Either way:
Reviewed-by: Kevin Wolf

Re: [PATCH 7/7] block/ssh.c: Don't double-check that characters are hex digits

2024-07-31 Thread Kevin Wolf

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> In compare_fingerprint() we effectively check whether the characters
> in the fingerprint are valid hex digits twice: first we do so with
> qemu_isxdigit(), but then the hex2decimal() function also has a code
> path where it effectively detects an invalid digit and returns -1.
> This causes Coverity to complain because it thinks that we might use
> that -1 value in an expression where it would be an integer overflow.

If it's a Coverity issue, I think you want to mention the CID, too.

> Avoid the double-check of hex digit validity by testing the return
> values from hex2decimal() rather than doing separate calls to
> qemu_isxdigit().
> 
> Signed-off-by: Peter Maydell 
> ---
> Could alternatively have put a g_assert_non_reached() in
> hex2decimal(), but this seemed better to me.

I don't like that hex2decimal() returns -1 when its result is unsigned,
which is why you had to write the check as > 0xf rather than < 0. So in
this sense, g_assert_non_reached() would look better to me. But we could
also just change it to return UINT_MAX instead, which should be the
same, just written in a less confusing way.

>  block/ssh.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block/ssh.c b/block/ssh.c
> index 27d582e0e3d..510dd208aba 100644
> --- a/block/ssh.c
> +++ b/block/ssh.c
> @@ -376,13 +376,15 @@ static int compare_fingerprint(const unsigned char 
> *fingerprint, size_t len,
>  unsigned c;
>  
>  while (len > 0) {
> +unsigned c0, c1;
>  while (*host_key_check == ':')
>  host_key_check++;
> -if (!qemu_isxdigit(host_key_check[0]) ||
> -!qemu_isxdigit(host_key_check[1]))
> +c0 = hex2decimal(host_key_check[0]);
> +c1 = hex2decimal(host_key_check[1]);
> +if (c0 > 0xf || c1 > 0xf) {
>  return 1;
> -c = hex2decimal(host_key_check[0]) * 16 +
> -hex2decimal(host_key_check[1]);
> +}
> +c = c0 * 16 + c1;
>  if (c - *fingerprint != 0)
>  return c - *fingerprint;
>  fingerprint++;

Reviewed-by: Kevin Wolf

Re: [PATCH 7/7] block/ssh.c: Don't double-check that characters are hex digits

2024-07-31 Thread Peter Maydell

On Wed, 31 Jul 2024 at 16:21, Kevin Wolf  wrote:
>
> Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:
> > In compare_fingerprint() we effectively check whether the characters
> > in the fingerprint are valid hex digits twice: first we do so with
> > qemu_isxdigit(), but then the hex2decimal() function also has a code
> > path where it effectively detects an invalid digit and returns -1.
> > This causes Coverity to complain because it thinks that we might use
> > that -1 value in an expression where it would be an integer overflow.
>
> If it's a Coverity issue, I think you want to mention the CID, too.

Yes;

Resolves: Coverity CID 1547813

> > Avoid the double-check of hex digit validity by testing the return
> > values from hex2decimal() rather than doing separate calls to
> > qemu_isxdigit().
> >
> > Signed-off-by: Peter Maydell 
> > ---
> > Could alternatively have put a g_assert_non_reached() in
> > hex2decimal(), but this seemed better to me.
>
> I don't like that hex2decimal() returns -1 when its result is unsigned,
> which is why you had to write the check as > 0xf rather than < 0. So in
> this sense, g_assert_non_reached() would look better to me. But we could
> also just change it to return UINT_MAX instead, which should be the
> same, just written in a less confusing way.

I was not super happy with the -1 either. Happy to change that
to 'return UINT_MAX'.

-- PMM

Re: [PATCH-for-9.1 5/5] hw/sd/sdhci: Check ADMA descriptors can be accessed

2024-07-31 Thread Philippe Mathieu-Daudé


On 30/7/24 11:21, Philippe Mathieu-Daudé wrote:

Since malicious guest can write invalid addresses to
the ADMASYSADDR register, we need to check whether the
descriptor could be correctly filled or not.

Cc: qemu-sta...@nongnu.org
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/sd/sdhci.c  | 86 ++
  hw/sd/trace-events |  2 +-
  2 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 66b9364e9e..eb0476b9aa 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -698,53 +698,62 @@ static void trace_adma_description(const char *type, 
const ADMADescr *dscr)
  trace_sdhci_adma_desc(type, dscr->addr, dscr->length, dscr->attr, 
dscr->incr);
  }
  
-static void get_adma_description(SDHCIState *s, ADMADescr *dscr)

+static MemTxResult get_adma_description(SDHCIState *s, ADMADescr *dscr)
  {
  uint32_t adma1 = 0;
  uint64_t adma2 = 0;
  hwaddr entry_addr = (hwaddr)s->admasysaddr;
+MemTxResult res;
+
  switch (SDHC_DMA_TYPE(s->hostctl1)) {
  case SDHC_CTRL_ADMA2_32:
-dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
-MEMTXATTRS_UNSPECIFIED);
-adma2 = le64_to_cpu(adma2);
-/* The spec does not specify endianness of descriptor table.
- * We currently assume that it is LE.
- */
-dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull;
-dscr->length = (uint16_t)extract64(adma2, 16, 16);
-dscr->attr = (uint8_t)extract64(adma2, 0, 7);
-dscr->incr = 8;
-trace_adma_description("ADMA2_32", dscr);
+res = dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
+  MEMTXATTRS_UNSPECIFIED);
+if (res == MEMTX_OK) {
+adma2 = le64_to_cpu(adma2);
+/* The spec does not specify endianness of descriptor table.
+ * We currently assume that it is LE.
+ */
+dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull;
+dscr->length = (uint16_t)extract64(adma2, 16, 16);
+dscr->attr = (uint8_t)extract64(adma2, 0, 7);
+dscr->incr = 8;
+trace_adma_description("ADMA2_32", dscr);
+}
  break;
  case SDHC_CTRL_ADMA1_32:
-dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1),
-MEMTXATTRS_UNSPECIFIED);
-adma1 = le32_to_cpu(adma1);
-dscr->addr = (hwaddr)(adma1 & 0xF000);
-dscr->attr = (uint8_t)extract32(adma1, 0, 7);
-dscr->incr = 4;
-if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == SDHC_ADMA_ATTR_SET_LEN) {
-dscr->length = (uint16_t)extract32(adma1, 12, 16);
-} else {
-dscr->length = 4 * KiB;
+res = dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1),
+  MEMTXATTRS_UNSPECIFIED);
+if (res == MEMTX_OK) {
+adma1 = le32_to_cpu(adma1);
+dscr->addr = (hwaddr)(adma1 & ~0xfff);
+dscr->attr = (uint8_t)extract32(adma1, 0, 7);
+dscr->incr = 4;
+if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == 
SDHC_ADMA_ATTR_SET_LEN) {
+dscr->length = (uint16_t)extract32(adma1, 12, 16);
+} else {
+dscr->length = 4 * KiB;
+}
+trace_adma_description("ADMA1_32", dscr);
  }
-trace_adma_description("ADMA1_32", dscr);
  break;
  case SDHC_CTRL_ADMA2_64:
-dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1,
-MEMTXATTRS_UNSPECIFIED);
-dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2,
-MEMTXATTRS_UNSPECIFIED);
-dscr->length = le16_to_cpu(dscr->length);
-dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8,
-MEMTXATTRS_UNSPECIFIED);
-dscr->addr = le64_to_cpu(dscr->addr);
-dscr->attr &= (uint8_t) ~0xC0;
-dscr->incr = 12;
-trace_adma_description("ADMA2_64", dscr);
+res = dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1,
+  MEMTXATTRS_UNSPECIFIED);
+res |= dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2,
+   MEMTXATTRS_UNSPECIFIED);
+res |= dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8,
+   MEMTXATTRS_UNSPECIFIED);
+if (res == MEMTX_OK) {
+dscr->length = le16_to_cpu(dscr->length);
+dscr->addr = le64_to_cpu(dscr->addr);
+dscr->attr &= (uint8_t) ~0xc0;
+dscr->incr = 12;
+trace_adma_description("ADMA2_64", dscr);
+}
  break;
  }
+return res;
  }
  
  /* Advanced DMA data transfer */

@@ -755,7 +764,6 @@ static void sdhci_do_adma(SDHCIState *

Re: [PATCH 2/7] block/gluster: Use g_autofree for string in qemu_gluster_parse_json()

2024-07-31 Thread Philippe Mathieu-Daudé


On 31/7/24 16:36, Peter Maydell wrote:

In the loop in qemu_gluster_parse_json() we do:

 char *str = NULL;
 for(...) {
 str = g_strdup_printf(...);
 ...
 if (various errors) {
 goto out;
 }
 ...
 g_free(str);
 str = NULL;
 }
 return 0;
out:
 various cleanups;
 g_free(str);
 ...
 return -errno;

Coverity correctly complains that the assignment "str = NULL" at the
end of the loop is unnecessary, because we will either go back to the
top of the loop and overwrite it, or else we will exit the loop and
then exit the function without ever reading str again. The assignment
is there as defensive coding to ensure that str is only non-NULL if
it's a live allocation, so this is intentional.

We can make Coverity happier and simplify the code here by using
g_autofree, since we never need 'str' outside the loop.

Resolves: Coverity CID 1527385
Signed-off-by: Peter Maydell 
---
  block/gluster.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something

2024-07-31 Thread Philippe Mathieu-Daudé


On 31/7/24 16:36, Peter Maydell wrote:

Coverity complains about an overflow in isa_fdc_get_drive_max_chs()
that can happen if the loop over fd_formats never finds a match,
because we initialize *maxc to 0 and then at the end of the
function decrement it.

This can't ever actually happen because fd_formats has at least
one entry for each FloppyDriveType, so we must at least once
find a match and update *maxc, *maxh and *maxs. Assert that we
did find a match, which should keep Coverity happy and will also
detect possible bugs in the data in fd_formats.

Resolves: Coverity CID 1547663
Signed-off-by: Peter Maydell 
---
  hw/block/fdc-isa.c | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 6/7] hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()

2024-07-31 Thread Philippe Mathieu-Daudé


On 31/7/24 17:13, Kevin Wolf wrote:

Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben:

Coverity notes that the code at the end of the loop in
bmdma_prepare_buf() is unreachable.  This is because in commit
9fbf0fa81fca8f527 ("ide: remove hardcoded 2GiB transactional limit")
we removed the only codepath in the loop which could "break" out of
it, but didn't notice that this meant we should also remove the code
at the end of the loop.

Remove the dead code.

Resolves: Coverity CID 1547772
Signed-off-by: Peter Maydell 
---
  hw/ide/pci.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4675d079a17..f2cb500a94f 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -266,10 +266,6 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, 
int32_t limit)
  s->io_buffer_size += l;
  }
  }
-
-qemu_sglist_destroy(&s->sg);
-s->io_buffer_size = 0;
-return -1;
  }


Should we put a g_assert_not_reached() here instead to make it easier
for the reader to understand how this function works?


Or break and keep returning at EOF:

-- >8 --
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 4675d079a1..8386c4fe42 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -237,7 +237,7 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, 
int32_t limit)

 /* end of table (with a fail safe of one page) */
 if (bm->cur_prd_last ||
 (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE) {
-return s->sg.size;
+break;
 }
 pci_dma_read(pci_dev, bm->cur_addr, &prd, 8);
 bm->cur_addr += 8;
@@ -267,9 +267,7 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, 
int32_t limit)

 }
 }

-qemu_sglist_destroy(&s->sg);
-s->io_buffer_size = 0;
-return -1;
+return s->sg.size;
 }

---



Either way:
Reviewed-by: Kevin Wolf 



Reviewed-by: Philippe Mathieu-Daudé

[PATCH-for-9.1? v2 0/4] hw/sd/sdhci: Check ADMA descriptors can be accessed

2024-07-31 Thread Philippe Mathieu-Daudé

Since v1:
- split patch
- do not return MemTxResult from get_adma_description()
- single DMA read in SDHC_CTRL_ADMA2_64 case

Based-on: <20240730092138.32443-5-phi...@linaro.org>

Philippe Mathieu-Daudé (4):
  hw/sd/sdhci: Reduce variables scope in sdhci_do_adma()
  hw/sd/sdhci: Reduce variables scope in get_adma_description()
  hw/sd/sdhci: Read ADMA2_64 descriptor with a single dma_memory_read()
  hw/sd/sdhci: Check ADMA descriptors can be accessed

 hw/sd/sdhci.c | 117 ++
 1 file changed, 70 insertions(+), 47 deletions(-)

-- 
2.45.2

[PATCH-for-9.1? v2 4/4] hw/sd/sdhci: Check ADMA descriptors can be accessed

2024-07-31 Thread Philippe Mathieu-Daudé

Since malicious guest can write invalid addresses to
the ADMASYSADDR register, we need to check whether the
descriptor could be correctly filled or not.

Cc: qemu-sta...@nongnu.org
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 2d8fa3151a..6794ee2267 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -701,13 +701,18 @@ static void trace_adma_description(const char *type, 
const ADMADescr *dscr)
 static void get_adma_description(SDHCIState *s, ADMADescr *dscr)
 {
 hwaddr entry_addr = (hwaddr)s->admasysaddr;
+MemTxResult res;
+
 switch (SDHC_DMA_TYPE(s->hostctl1)) {
 case SDHC_CTRL_ADMA2_32:
 {
 uint64_t adma2 = 0;
 
-dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
-MEMTXATTRS_UNSPECIFIED);
+res = dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
+  MEMTXATTRS_UNSPECIFIED);
+if (res != MEMTX_OK) {
+break;
+}
 adma2 = le64_to_cpu(adma2);
 /*
  * The spec does not specify endianness of descriptor table.
@@ -724,8 +729,11 @@ static void get_adma_description(SDHCIState *s, ADMADescr 
*dscr)
 {
 uint32_t adma1 = 0;
 
-dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1),
-MEMTXATTRS_UNSPECIFIED);
+res = dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1),
+  MEMTXATTRS_UNSPECIFIED);
+if (res != MEMTX_OK) {
+break;
+}
 adma1 = le32_to_cpu(adma1);
 dscr->addr = (hwaddr)(adma1 & ~0xfff);
 dscr->attr = (uint8_t)extract32(adma1, 0, 7);
@@ -748,8 +756,11 @@ static void get_adma_description(SDHCIState *s, ADMADescr 
*dscr)
 } QEMU_PACKED adma2;
 QEMU_BUILD_BUG_ON(sizeof(adma2) != 12);
 
-dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
-MEMTXATTRS_UNSPECIFIED);
+res = dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
+  MEMTXATTRS_UNSPECIFIED);
+if (res != MEMTX_OK) {
+break;
+}
 dscr->length = le16_to_cpu(adma2.length);
 dscr->addr = le64_to_cpu(adma2.addr);
 dscr->attr = adma2.attr & (uint8_t) ~0xc0;
-- 
2.45.2

[PATCH-for-9.1? v2 1/4] hw/sd/sdhci: Reduce variables scope in sdhci_do_adma()

2024-07-31 Thread Philippe Mathieu-Daudé

All variables are only used within the for loop.
Declare them within it. In particular this resets
'dscr' on each iteration.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 66b9364e9e..773f2b284b 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -751,20 +751,19 @@ static void get_adma_description(SDHCIState *s, ADMADescr 
*dscr)
 
 static void sdhci_do_adma(SDHCIState *s)
 {
-unsigned int begin, length;
-const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK;
-const MemTxAttrs attrs = { .memory = true };
-ADMADescr dscr = {};
-MemTxResult res;
-int i;
-
 if (s->trnmod & SDHC_TRNS_BLK_CNT_EN && !s->blkcnt) {
 /* Stop Multiple Transfer */
 sdhci_end_transfer(s);
 return;
 }
 
-for (i = 0; i < SDHC_ADMA_DESCS_PER_DELAY; ++i) {
+for (int i = 0; i < SDHC_ADMA_DESCS_PER_DELAY; ++i) {
+unsigned int begin, length;
+const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK;
+const MemTxAttrs attrs = { .memory = true };
+ADMADescr dscr = { };
+MemTxResult res;
+
 s->admaerr &= ~SDHC_ADMAERR_LENGTH_MISMATCH;
 
 get_adma_description(s, &dscr);
-- 
2.45.2

[PATCH-for-9.1? v2 2/4] hw/sd/sdhci: Reduce variables scope in get_adma_description()

2024-07-31 Thread Philippe Mathieu-Daudé

The 'adma1' variable is only used in the SDHC_CTRL_ADMA1_32
case, and 'adma2' in SDHC_CTRL_ADMA2_32. Add braces in the
switch case to use local declarations.
Do the same in the SDHC_CTRL_ADMA2_64 case because we'll add
a local variable there in the next commit.

Replace 0xF000 -> ~0xfff to align with our codebase style.

No functional change intended.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 87 ---
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 773f2b284b..0a95f49b93 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -700,50 +700,59 @@ static void trace_adma_description(const char *type, 
const ADMADescr *dscr)
 
 static void get_adma_description(SDHCIState *s, ADMADescr *dscr)
 {
-uint32_t adma1 = 0;
-uint64_t adma2 = 0;
 hwaddr entry_addr = (hwaddr)s->admasysaddr;
 switch (SDHC_DMA_TYPE(s->hostctl1)) {
 case SDHC_CTRL_ADMA2_32:
-dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
-MEMTXATTRS_UNSPECIFIED);
-adma2 = le64_to_cpu(adma2);
-/* The spec does not specify endianness of descriptor table.
- * We currently assume that it is LE.
- */
-dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull;
-dscr->length = (uint16_t)extract64(adma2, 16, 16);
-dscr->attr = (uint8_t)extract64(adma2, 0, 7);
-dscr->incr = 8;
-trace_adma_description("ADMA2_32", dscr);
-break;
-case SDHC_CTRL_ADMA1_32:
-dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1),
-MEMTXATTRS_UNSPECIFIED);
-adma1 = le32_to_cpu(adma1);
-dscr->addr = (hwaddr)(adma1 & 0xF000);
-dscr->attr = (uint8_t)extract32(adma1, 0, 7);
-dscr->incr = 4;
-if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == SDHC_ADMA_ATTR_SET_LEN) {
-dscr->length = (uint16_t)extract32(adma1, 12, 16);
-} else {
-dscr->length = 4 * KiB;
+{
+uint64_t adma2 = 0;
+
+dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
+MEMTXATTRS_UNSPECIFIED);
+adma2 = le64_to_cpu(adma2);
+/*
+ * The spec does not specify endianness of descriptor table.
+ * We currently assume that it is LE.
+ */
+dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull;
+dscr->length = (uint16_t)extract64(adma2, 16, 16);
+dscr->attr = (uint8_t)extract64(adma2, 0, 7);
+dscr->incr = 8;
+trace_adma_description("ADMA2_32", dscr);
+break;
+}
+case SDHC_CTRL_ADMA1_32:
+{
+uint32_t adma1 = 0;
+
+dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1),
+MEMTXATTRS_UNSPECIFIED);
+adma1 = le32_to_cpu(adma1);
+dscr->addr = (hwaddr)(adma1 & ~0xfff);
+dscr->attr = (uint8_t)extract32(adma1, 0, 7);
+dscr->incr = 4;
+if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == 
SDHC_ADMA_ATTR_SET_LEN) {
+dscr->length = (uint16_t)extract32(adma1, 12, 16);
+} else {
+dscr->length = 4 * KiB;
+}
+trace_adma_description("ADMA1_32", dscr);
+break;
 }
-trace_adma_description("ADMA1_32", dscr);
-break;
 case SDHC_CTRL_ADMA2_64:
-dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1,
-MEMTXATTRS_UNSPECIFIED);
-dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2,
-MEMTXATTRS_UNSPECIFIED);
-dscr->length = le16_to_cpu(dscr->length);
-dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8,
-MEMTXATTRS_UNSPECIFIED);
-dscr->addr = le64_to_cpu(dscr->addr);
-dscr->attr &= (uint8_t) ~0xC0;
-dscr->incr = 12;
-trace_adma_description("ADMA2_64", dscr);
-break;
+{
+dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1,
+MEMTXATTRS_UNSPECIFIED);
+dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2,
+MEMTXATTRS_UNSPECIFIED);
+dscr->length = le16_to_cpu(dscr->length);
+dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8,
+MEMTXATTRS_UNSPECIFIED);
+dscr->addr = le64_to_cpu(dscr->addr);
+dscr->attr &= (uint8_t) ~0xC0;
+dscr->incr = 12;
+trace_adma_description("ADMA2_64", dscr);
+break;
+}
 }
 }
 
-- 
2.45.2

[PATCH-for-9.1? v2 3/4] hw/sd/sdhci: Read ADMA2_64 descriptor with a single dma_memory_read()

2024-07-31 Thread Philippe Mathieu-Daudé

Instead of 3 consecutive dma_memory_read() calls, use
a packed structure to read the descriptor in a single
call.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 0a95f49b93..2d8fa3151a 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -740,16 +740,20 @@ static void get_adma_description(SDHCIState *s, ADMADescr 
*dscr)
 }
 case SDHC_CTRL_ADMA2_64:
 {
-dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1,
+struct {
+uint8_t attr;
+uint8_t pad;
+uint16_t length;
+uint64_t addr;
+} QEMU_PACKED adma2;
+QEMU_BUILD_BUG_ON(sizeof(adma2) != 12);
+
+dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2),
 MEMTXATTRS_UNSPECIFIED);
-dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2,
-MEMTXATTRS_UNSPECIFIED);
-dscr->length = le16_to_cpu(dscr->length);
-dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8,
-MEMTXATTRS_UNSPECIFIED);
-dscr->addr = le64_to_cpu(dscr->addr);
-dscr->attr &= (uint8_t) ~0xC0;
-dscr->incr = 12;
+dscr->length = le16_to_cpu(adma2.length);
+dscr->addr = le64_to_cpu(adma2.addr);
+dscr->attr = adma2.attr & (uint8_t) ~0xc0;
+dscr->incr = sizeof(adma2);
 trace_adma_description("ADMA2_64", dscr);
 break;
 }
-- 
2.45.2

Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation

2024-07-31 Thread Akihiko Odaki


On 2024/07/31 18:34, Yui Washizu wrote:


On 2024/07/15 14:15, Akihiko Odaki wrote:

On 2024/05/16 11:00, Yui Washizu wrote:


On 2024/04/28 18:05, Akihiko Odaki wrote:

Based-on: <20240315-reuse-v9-0-67aa69af4...@daynix.com>
("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and 
improvements")


Introduction


This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide L2 switching feature for
offloading though it is out of scope to allow the guest to configure
such a feature.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of lack of validation.

User Interface
--

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
   -netdev user,id=n -netdev user,id=o
   -netdev user,id=p -netdev user,id=q
   -device pcie-root-port,id=b
   -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
   -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
   -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
   -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with "sriov-pf" property. The PF must be
added after all VFs. It is user's responsibility to ensure that VFs 
have

function numbers larger than one of the PF, and the function numbers
have a consistent stride.



I tried to start a VM with more than 8 VFs allocated using your patch,
but the following error occured and qemu didn't work:
VF function number overflows.

I think the cause of this error is that virtio-net-pci PFs don't have 
ARI.

(pcie_ari_init is not added to virtio-net-pci when PFs are initialized.)
I think it is possible to add it later,
but how about adding pcie_ari_init ?

As a trial,
adding pcie_ari_init to virtio_pci_realize enabled the creation of 
more than 8 VFs.


I have just looked into that possibility, but adding pcie_ari_init to 
virtio_pci_realize has some implications. Unconditionally calling 
pcie_ari_init will break the existing configuration of virtio-pci 
devices so we need to implement some logic to detect when ARI is 
needed. Preferably such logic should be implemented in the common PCI 
infrastructure instead of implementing it in virtio-pci so that other 
PCI multifunction devices can benefit from it.


While I don't think implementing this will be too complicated, I need 
to ensure that such a feature is really needed before doing so.



OK.
I want to use this emulation for offloading virtual network
in a environment where there are many containers in VMs.
So, I consider that the feature is need.
I think that 7 VFs are too few.
I'll keep thinking about the feature's necessity.


I understand there could be many containers in VMs, but will a single 
device deal with them? If the virtio-net VFs are backed by the vDPA 
capability of one physical device, it will not have VFs more than that. 
The VMs must have several PFs individually paired with VFs to 
accommodate more containers on one VM.


I don't know much about vDPA-capable device, but as a reference, igb 
only has 8 VFs.





I'll add other comments to RFC v5 patch.


The RFC tag is already dropped.

Regards,
Akihiko Odaki

Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation

2024-07-31 Thread Michael S. Tsirkin

On Thu, Aug 01, 2024 at 02:37:55PM +0900, Akihiko Odaki wrote:
> I don't know much about vDPA-capable device, but as a reference, igb only
> has 8 VFs.

modern vdpa capable devices have much more than 8 VFs, 8 is a very low
number.

-- 
MST

43 matches

Mail list logo