Re: [PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto
Akihiko Odaki writes: > rom_bar is tristate but was defined as uint32_t so convert it into > OnOffAuto to clarify that. For compatibility, a uint32 value set via > QOM will be converted into OnOffAuto. > > Signed-off-by: Akihiko Odaki I agree making property "rombar" an integer was a design mistake. I further agree that vfio_pci_size_rom() peeking into dev->opts to distinguish "user didn't set a value" from "user set the default value") is an unadvisable hack. However, ... > --- > Changes in v2: > - Documented in docs/about/deprecated.rst (Daniel P. Berrangé) > - Link to v1: > https://lore.kernel.org/r/20240706-rombar-v1-0-802daef2a...@daynix.com > > --- > Akihiko Odaki (4): > qapi: Add visit_type_str_preserving() > qapi: Do not consume a value when visit_type_enum() fails > hw/pci: Convert rom_bar into OnOffAuto > hw/qdev: Remove opts member > > docs/about/deprecated.rst | 7 + > docs/igd-assign.txt | 2 +- > include/hw/pci/pci_device.h | 2 +- > include/hw/qdev-core.h| 4 --- > include/qapi/visitor-impl.h | 3 ++- > include/qapi/visitor.h| 25 + > hw/core/qdev.c| 1 - > hw/pci/pci.c | 57 > +-- > hw/vfio/pci-quirks.c | 2 +- > hw/vfio/pci.c | 11 > hw/xen/xen_pt_load_rom.c | 4 +-- > qapi/opts-visitor.c | 12 - > qapi/qapi-clone-visitor.c | 2 +- > qapi/qapi-dealloc-visitor.c | 4 +-- > qapi/qapi-forward-visitor.c | 4 ++- > qapi/qapi-visit-core.c| 21 --- > qapi/qobject-input-visitor.c | 23 > qapi/qobject-output-visitor.c | 2 +- > qapi/string-input-visitor.c | 2 +- > qapi/string-output-visitor.c | 2 +- > system/qdev-monitor.c | 12 + > tests/qtest/virtio-net-failover.c | 32 +++--- > 22 files changed, 161 insertions(+), 73 deletions(-) > --- > base-commit: f2cb4026fccfe073f84a4b440e41d3ed0c3134f6 > change-id: 20240704-rombar-1a4ba2890d6c > > Best regards, ... this is an awful lot of QAPI visitor infrastructure. Infrastructure that will likely only ever be used for this one property. Moreover, the property setter rom_bar_set() is a hack: it first tries to parse the value as enum, and if that fails, as uint32_t. QAPI already provides a way to express "either this type or that type": alternate. Like this: { 'alternate': 'OnOffAutoUint32', 'data': { 'sym': 'OnOffAuto', 'uint': 'uint32' } } Unfortunately, such alternates don't work on the command line due to keyval visitor restrictions. These cannot be lifted entirely, but we might be able to lift them sufficiently to make this alternate work. Whether it would be worth your trouble and mine just to clean up "rombar" seems highly dubious, though.
Re: [PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto
On Wed, Jul 31, 2024 at 10:32:19AM +0200, Markus Armbruster wrote: > Whether it would be worth your trouble and mine just to clean up > "rombar" seems highly dubious, though. Exactly. -- MST
Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation
On 7/30/24 19:56, Michael S. Tsirkin wrote: On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote: On 2024/07/30 20:37, Michael S. Tsirkin wrote: On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote: Based-on: <20240714-rombar-v2-0-af1504ef5...@daynix.com> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") OK I will revert this for now. We'll try again after the release, there will be time to address s390. I'd like to know if anybody wants to use igb on a s390x machine configured with libvirt. Such a configuration is already kind of broken, and it is likely to require significant effort on both side of libvirt and QEMU to fix it. I assume Cédric wouldn't report it if was unused. As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It does not suffer the same problem with igb thanks to its flexible configuration mechanism. Regards, Akihiko Odaki Sounds like this needs more review time, anyway. Using an emulated IGB device in a s390x VM is not a common scenario. The IGB device is not supported downstream (RHEL on Z). However, the change broke the s390x machines I use for upstream QEMU development, I removed the IGB device as a fix for now. The Z PCI device model has 'uid' and 'fid' properties which are set by the VMM, and in this case, they are auto-generated, hence the conflicting ids with libvirt. This is Z specific but, possibly there are subtle use cases on other platforms which could have similar consequences. Something to be aware of. Also, and this is why I thought it was important to report. Creating PCI devices at machine init time (with -nodefaults) in the back of the management layer is a no-no. libvirt requests to have full control on the machine topology and this change seems like a violation of this rule, even if VFs are kind of special. Daniel, did I understand correctly the above constraint on the machine definition ? I can't say if we should revert for 9.1. Again, this Z is specific. The changes were introduced to catch errors early when the PF device is realized. There is no doubt they are useful. Some rework is needed to avoid the conflicting id issue though. Thanks, C.
Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
On 2024/07/15 14:15, Akihiko Odaki wrote: On 2024/05/16 11:00, Yui Washizu wrote: On 2024/04/28 18:05, Akihiko Odaki wrote: Based-on: <20240315-reuse-v9-0-67aa69af4...@daynix.com> ("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements") Introduction This series is based on the RFC series submitted by Yui Washizu[1]. See also [2] for the context. This series enables SR-IOV emulation for virtio-net. It is useful to test SR-IOV support on the guest, or to expose several vDPA devices in a VM. vDPA devices can also provide L2 switching feature for offloading though it is out of scope to allow the guest to configure such a feature. The PF side code resides in virtio-pci. The VF side code resides in the PCI common infrastructure, but it is restricted to work only for virtio-net-pci because of lack of validation. User Interface -- A user can configure a SR-IOV capable virtio-net device by adding virtio-net-pci functions to a bus. Below is a command line example: -netdev user,id=n -netdev user,id=o -netdev user,id=p -netdev user,id=q -device pcie-root-port,id=b -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f The VFs specify the paired PF with "sriov-pf" property. The PF must be added after all VFs. It is user's responsibility to ensure that VFs have function numbers larger than one of the PF, and the function numbers have a consistent stride. I tried to start a VM with more than 8 VFs allocated using your patch, but the following error occured and qemu didn't work: VF function number overflows. I think the cause of this error is that virtio-net-pci PFs don't have ARI. (pcie_ari_init is not added to virtio-net-pci when PFs are initialized.) I think it is possible to add it later, but how about adding pcie_ari_init ? As a trial, adding pcie_ari_init to virtio_pci_realize enabled the creation of more than 8 VFs. I have just looked into that possibility, but adding pcie_ari_init to virtio_pci_realize has some implications. Unconditionally calling pcie_ari_init will break the existing configuration of virtio-pci devices so we need to implement some logic to detect when ARI is needed. Preferably such logic should be implemented in the common PCI infrastructure instead of implementing it in virtio-pci so that other PCI multifunction devices can benefit from it. While I don't think implementing this will be too complicated, I need to ensure that such a feature is really needed before doing so. OK. I want to use this emulation for offloading virtual network in a environment where there are many containers in VMs. So, I consider that the feature is need. I think that 7 VFs are too few. I'll keep thinking about the feature's necessity. I'll add other comments to RFC v5 patch. Regards, Yui Washizu
Re: [PATCH 01/18] qapi: Smarter camel_to_upper() to reduce need for 'prefix'
Am 30.07.2024 um 10:10 hat Markus Armbruster geschrieben: > camel_to_upper() converts its argument from camel case to upper case > with '_' between words. Used for generated enumeration constant > prefixes. > > When some of the words are spelled all caps, where exactly to insert > '_' is guesswork. camel_to_upper()'s guesses are bad enough in places > to make people override them with a 'prefix' in the schema. > > Rewrite it to guess better: > > 1. Insert '_' after a non-upper case character followed by an upper >case character: > >OneTwo -> ONE_TWO >One2Three -> ONE2_THREE > > 2. Insert '_' before the last upper case character followed by a >non-upper case character: > >ACRONYMWord -> ACRONYM_Word > >Except at the beginning (as in OneTwo above), or when there is >already one: > >AbCd -> AB_CD Maybe it's just me, but the exception "at the beginning" (in the sense of "after the first character") seems to be exactly where I thought "that looks strange" while going through your list below. In particular, I'd expect X_DBG_* instead of XDBG_*. I also thought that the Q_* spelling made more sense, though this might be less clear. But in case of doubt, less exceptions seems like a good choice. > +# Copy remainder of ``value`` to ``ret`` with '_' inserted > +for ch in value[1:]: > +if ch.isupper() == upc: > +pass > +elif upc: > +# ``ret`` ends in upper case, next char isn't: insert '_' > +# before the last upper case char unless there is one > +# already, or it's at the beginning > +if len(ret) > 2 and ret[-2] != '_': > +ret = ret[:-1] + '_' + ret[-1] I think in the code this means I would have expected len(ret) >= 2. Kevin
Re: [PATCH 11/18] qapi/crypto: Rename QCryptoHashAlgorithm to *Algo, and drop prefix
On Tue, Jul 30, 2024 at 02:26:49PM +0200, Markus Armbruster wrote: > Daniel P. Berrangé writes: > > > On Tue, Jul 30, 2024 at 10:10:25AM +0200, Markus Armbruster wrote: > >> QAPI's 'prefix' feature can make the connection between enumeration > >> type and its constants less than obvious. It's best used with > >> restraint. > >> > >> QCryptoHashAlgorithm has a 'prefix' that overrides the generated > >> enumeration constants' prefix to QCRYPTO_HASH_ALG. > >> > >> We could simply drop 'prefix', but then the prefix becomes > >> QCRYPTO_HASH_ALGORITHM, which is rather long. > >> > >> We could additionally rename the type to QCryptoHashAlg, but I think > >> the abbreviation "alg" is less than clear. > > > > I would have gone with this, but it is a bit of a bike shed colouring > > debate so I'm not fussed > > Either solution seems okay, so I went with my personal preference. Do > feel free to state yours and ask me to respin! After reviewing the patches that follow, I'd observe that picking Algo has made the following patches much larger than if it had stuck with Alg. Basically changing both the types & constants, instead of only having to change the types. > > >> Rename the type to QCryptoHashAlgo instead. The prefix becomes to > >> QCRYPTO_HASH_ALGO. > >> > >> Signed-off-by: Markus Armbruster > >> --- > >> qapi/crypto.json| 17 +- > >> crypto/blockpriv.h | 2 +- > >> crypto/hashpriv.h | 2 +- > >> crypto/hmacpriv.h | 4 +-- > >> crypto/ivgenpriv.h | 2 +- > >> include/crypto/afsplit.h| 8 ++--- > >> include/crypto/block.h | 2 +- > >> include/crypto/hash.h | 18 +- > >> include/crypto/hmac.h | 6 ++-- > >> include/crypto/ivgen.h | 6 ++-- > >> include/crypto/pbkdf.h | 10 +++--- > >> backends/cryptodev-builtin.c| 8 ++--- > >> backends/cryptodev-lkcf.c | 10 +++--- > >> block/parallels-ext.c | 2 +- > >> block/quorum.c | 4 +-- > >> crypto/afsplit.c| 6 ++-- > >> crypto/block-luks.c | 16 - > >> crypto/block.c | 2 +- > >> crypto/hash-afalg.c | 26 +++ > >> crypto/hash-gcrypt.c| 20 +-- > >> crypto/hash-glib.c | 20 +-- > >> crypto/hash-gnutls.c| 20 +-- > >> crypto/hash-nettle.c| 18 +- > >> crypto/hash.c | 30 - > >> crypto/hmac-gcrypt.c| 22 ++--- > >> crypto/hmac-glib.c | 22 ++--- > >> crypto/hmac-gnutls.c| 22 ++--- > >> crypto/hmac-nettle.c| 22 ++--- > >> crypto/hmac.c | 2 +- > >> crypto/ivgen.c | 4 +-- > >> crypto/pbkdf-gcrypt.c | 36 ++-- > >> crypto/pbkdf-gnutls.c | 36 ++-- > >> crypto/pbkdf-nettle.c | 32 +- > >> crypto/pbkdf-stub.c | 4 +-- > >> crypto/pbkdf.c | 2 +- > >> hw/misc/aspeed_hace.c | 16 - > >> io/channel-websock.c| 2 +- > >> target/i386/sev.c | 6 ++-- > >> tests/bench/benchmark-crypto-akcipher.c | 12 +++ > >> tests/bench/benchmark-crypto-hash.c | 10 +++--- > >> tests/bench/benchmark-crypto-hmac.c | 6 ++-- > >> tests/unit/test-crypto-afsplit.c| 10 +++--- > >> tests/unit/test-crypto-akcipher.c | 6 ++-- > >> tests/unit/test-crypto-block.c | 16 - > >> tests/unit/test-crypto-hash.c | 42 +++ > >> tests/unit/test-crypto-hmac.c | 16 - > >> tests/unit/test-crypto-ivgen.c | 8 ++--- > >> tests/unit/test-crypto-pbkdf.c | 44 - > >> ui/vnc.c| 2 +- > >> util/hbitmap.c | 2 +- > >> crypto/akcipher-gcrypt.c.inc| 14 > >> crypto/akcipher-nettle.c.inc| 26 +++ > >> 52 files changed, 350 insertions(+), 351 deletions(-) > > > > Acked-by: Daniel P. Berrangé > > Thanks! > With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[PATCH v2 1/4] scsi-disk: Use positive return value for status in dma_readv/writev
In some error cases, scsi_block_sgio_complete() never calls the passed callback, but directly completes the request. This leads to bugs because its error paths are not exact copies of what the callback would normally do. In preparation to fix this, allow passing positive return values to the callbacks that represent the status code that should be used to complete the request. scsi_handle_rw_error() already handles positive values for its ret parameter because scsi_block_sgio_complete() calls directly into it. Signed-off-by: Kevin Wolf --- hw/scsi/scsi-disk.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index a67092db6a..3ff6798bde 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -65,6 +65,10 @@ OBJECT_DECLARE_TYPE(SCSIDiskState, SCSIDiskClass, SCSI_DISK_BASE) struct SCSIDiskClass { SCSIDeviceClass parent_class; +/* + * Callbacks receive ret == 0 for success. Errors are represented either as + * negative errno values, or as positive SAM status codes. + */ DMAIOFunc *dma_readv; DMAIOFunc *dma_writev; bool(*need_fua_emulation)(SCSICommand *cmd); @@ -283,7 +287,7 @@ static bool scsi_disk_req_check_error(SCSIDiskReq *r, int ret, bool acct_failed) return true; } -if (ret < 0) { +if (ret != 0) { return scsi_handle_rw_error(r, ret, acct_failed); } @@ -360,7 +364,7 @@ static void scsi_write_do_fua(SCSIDiskReq *r) static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret) { assert(r->req.aiocb == NULL); -if (scsi_disk_req_check_error(r, ret, false)) { +if (scsi_disk_req_check_error(r, ret, ret > 0)) { goto done; } @@ -385,9 +389,10 @@ static void scsi_dma_complete(void *opaque, int ret) assert(r->req.aiocb != NULL); r->req.aiocb = NULL; +/* ret > 0 is accounted for in scsi_disk_req_check_error() */ if (ret < 0) { block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); -} else { +} else if (ret == 0) { block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); } scsi_dma_complete_noio(r, ret); @@ -403,7 +408,7 @@ static void scsi_read_complete_noio(SCSIDiskReq *r, int ret) qemu_get_current_aio_context()); assert(r->req.aiocb == NULL); -if (scsi_disk_req_check_error(r, ret, false)) { +if (scsi_disk_req_check_error(r, ret, ret > 0)) { goto done; } @@ -424,9 +429,10 @@ static void scsi_read_complete(void *opaque, int ret) assert(r->req.aiocb != NULL); r->req.aiocb = NULL; +/* ret > 0 is accounted for in scsi_disk_req_check_error() */ if (ret < 0) { block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); -} else { +} else if (ret == 0) { block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); trace_scsi_disk_read_complete(r->req.tag, r->qiov.size); } @@ -534,7 +540,7 @@ static void scsi_write_complete_noio(SCSIDiskReq *r, int ret) qemu_get_current_aio_context()); assert (r->req.aiocb == NULL); -if (scsi_disk_req_check_error(r, ret, false)) { +if (scsi_disk_req_check_error(r, ret, ret > 0)) { goto done; } @@ -562,9 +568,10 @@ static void scsi_write_complete(void * opaque, int ret) assert (r->req.aiocb != NULL); r->req.aiocb = NULL; +/* ret > 0 is accounted for in scsi_disk_req_check_error() */ if (ret < 0) { block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct); -} else { +} else if (ret == 0) { block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct); } scsi_write_complete_noio(r, ret); -- 2.45.2
[PATCH v2 0/4] scsi-block: Fix error handling with r/werror=stop
Running validation tests in Windows 2019's Failover Cluster Manager fails in two different ways when run with rerror/werror=stop: 1. It runs into an assertion failure because the sgio-based I/O path takes shortcuts in its error handling that skip necessary cleanup 2. RESERVATION_CONFLICT is treated as a host error and stops the VM, which in some cases can't be resumed at all because nothing will make the error go away on retry. The error should always go to the guest instead, it's an invalid request from the guest. This series fixes these problems. v2: - Patch 4: [Paolo] * Set error = 0 explicitly, remove useless variable initialisation * Add comment to explain why we consider it a guest error * Mention scsi-block specifically in the commit message Kevin Wolf (4): scsi-disk: Use positive return value for status in dma_readv/writev scsi-block: Don't skip callback for sgio error status/driver_status scsi-disk: Add warning comments that host_status errors take a shortcut scsi-disk: Always report RESERVATION_CONFLICT to guest hw/scsi/scsi-disk.c | 73 +++-- 1 file changed, 51 insertions(+), 22 deletions(-) -- 2.45.2
[PATCH v2 2/4] scsi-block: Don't skip callback for sgio error status/driver_status
Instead of calling into scsi_handle_rw_error() directly from scsi_block_sgio_complete() and skipping the normal callback, go through the normal cleanup path by calling the callback with a positive error value. The important difference here is not only that the code path is cleaner, but that the callbacks set r->req.aiocb = NULL. If we skip setting this and the error action is BLOCK_ERROR_ACTION_STOP, resuming the VM runs into an assertion failure in scsi_read_data() or scsi_write_data() because the dangling aiocb pointer is unexpected. Fixes: a108557bbf ("scsi: inline sg_io_sense_from_errno() into the callers.") Buglink: https://issues.redhat.com/browse/RHEL-5 Signed-off-by: Kevin Wolf --- hw/scsi/scsi-disk.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 3ff6798bde..6e1a5c98df 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2832,16 +2832,6 @@ static void scsi_block_sgio_complete(void *opaque, int ret) } else { ret = io_hdr->status; } - -if (ret > 0) { -if (scsi_handle_rw_error(r, ret, true)) { -scsi_req_unref(&r->req); -return; -} - -/* Ignore error. */ -ret = 0; -} } req->cb(req->cb_opaque, ret); -- 2.45.2
[PATCH v2 4/4] scsi-disk: Always report RESERVATION_CONFLICT to guest
In the case of scsi-block, RESERVATION_CONFLICT is not a backend error, but indicates that the guest tried to make a request that it isn't allowed to execute. Pass the error to the guest so that it can decide what to do with it. Without this, if we stop the VM in response to a RESERVATION_CONFLICT (as is the default policy in management software such as oVirt or KubeVirt), it can happen that the VM cannot be resumed any more because every attempt to resume it immediately runs into the same error and stops the VM again. One case that expects RESERVATION_CONFLICT errors to be visible in the guest is running the validation tests in Windows 2019's Failover Cluster Manager, which intentionally tries to execute invalid requests to see if they are properly rejected. Buglink: https://issues.redhat.com/browse/RHEL-5 Signed-off-by: Kevin Wolf --- hw/scsi/scsi-disk.c | 35 ++- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 69a195177e..4d94b2b816 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -224,7 +224,7 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed) SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev); SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s)); SCSISense sense = SENSE_CODE(NO_SENSE); -int error = 0; +int error; bool req_has_sense = false; BlockErrorAction action; int status; @@ -235,11 +235,35 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed) } else { /* A passthrough command has completed with nonzero status. */ status = ret; -if (status == CHECK_CONDITION) { +switch (status) { +case CHECK_CONDITION: req_has_sense = true; error = scsi_sense_buf_to_errno(r->req.sense, sizeof(r->req.sense)); -} else { +break; +case RESERVATION_CONFLICT: +/* + * Don't apply the error policy, always report to the guest. + * + * This is a passthrough code path, so it's not a backend error, but + * a response to an invalid guest request. + * + * Windows Failover Cluster validation intentionally sends invalid + * requests to verify that reservations work as intended. It is + * crucial that it sees the resulting errors. + * + * Treating a reservation conflict as a guest-side error is obvious + * when a pr-manager is in use. Without one, the situation is less + * clear, but there might be nothing that can be fixed on the host + * (like in the above example), and we don't want to be stuck in a + * loop where resuming the VM and retrying the request immediately + * stops it again. So always reporting is still the safer option in + * this case, too. + */ +error = 0; +break; +default: error = EINVAL; +break; } } @@ -249,8 +273,9 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int ret, bool acct_failed) * are usually retried immediately, so do not post them to QMP and * do not account them as failed I/O. */ -if (req_has_sense && -scsi_sense_buf_is_guest_recoverable(r->req.sense, sizeof(r->req.sense))) { +if (!error || (req_has_sense && + scsi_sense_buf_is_guest_recoverable(r->req.sense, + sizeof(r->req.sense { action = BLOCK_ERROR_ACTION_REPORT; acct_failed = false; } else { -- 2.45.2
[PATCH v2 3/4] scsi-disk: Add warning comments that host_status errors take a shortcut
scsi_block_sgio_complete() has surprising behaviour in that there are error cases in which it directly completes the request and never calls the passed callback. In the current state of the code, this doesn't seem to result in bugs, but with future code changes, we must be careful to never rely on the callback doing some cleanup until this code smell is fixed. For now, just add warnings to make people aware of the trap. Signed-off-by: Kevin Wolf --- hw/scsi/scsi-disk.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 6e1a5c98df..69a195177e 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -68,6 +68,9 @@ struct SCSIDiskClass { /* * Callbacks receive ret == 0 for success. Errors are represented either as * negative errno values, or as positive SAM status codes. + * + * Beware: For errors returned in host_status, the function may directly + * complete the request and never call the callback. */ DMAIOFunc *dma_readv; DMAIOFunc *dma_writev; @@ -381,6 +384,7 @@ done: scsi_req_unref(&r->req); } +/* May not be called in all error cases, don't rely on cleanup here */ static void scsi_dma_complete(void *opaque, int ret) { SCSIDiskReq *r = (SCSIDiskReq *)opaque; @@ -421,6 +425,7 @@ done: scsi_req_unref(&r->req); } +/* May not be called in all error cases, don't rely on cleanup here */ static void scsi_read_complete(void *opaque, int ret) { SCSIDiskReq *r = (SCSIDiskReq *)opaque; @@ -560,6 +565,7 @@ done: scsi_req_unref(&r->req); } +/* May not be called in all error cases, don't rely on cleanup here */ static void scsi_write_complete(void * opaque, int ret) { SCSIDiskReq *r = (SCSIDiskReq *)opaque; @@ -2821,6 +2827,7 @@ static void scsi_block_sgio_complete(void *opaque, int ret) sg_io_hdr_t *io_hdr = &req->io_header; if (ret == 0) { +/* FIXME This skips calling req->cb() and any cleanup in it */ if (io_hdr->host_status != SCSI_HOST_OK) { scsi_req_complete_failed(&r->req, io_hdr->host_status); scsi_req_unref(&r->req); -- 2.45.2
Re: [PATCH] iotests/024: exclude 'backing file format' field from the output
On 30/7/24 11:47, Andrey Drobyshev wrote: Apparently 'qemu-img info' doesn't report the backing file format field for qed (as it does for qcow2): $ qemu-img create -f qed base.qed 1M && qemu-img create -f qed -b base.qed -F qed top.qed 1M $ qemu-img create -f qcow2 base.qcow2 1M && qemu-img create -f qcow2 -b base.qcow2 -F qcow2 top.qcow2 1M $ qemu-img info top.qed | grep 'backing file format' $ qemu-img info top.qcow2 | grep 'backing file format' backing file format: qcow2 This leads to the 024 test failure with -qed. Let's just filter the field out and exclude it from the output. This is a fixup for the commit f93e65ee51 ("iotests/{024, 271}: add testcases for qemu-img rebase"). Found-by: Thomas Huth s/Found/Reported/ Signed-off-by: Andrey Drobyshev --- tests/qemu-iotests/024 | 2 +- tests/qemu-iotests/024.out | 1 - 2 files changed, 1 insertion(+), 2 deletions(-)
[PATCH 1/7] block/vdi.c: Avoid potential overflow when calculating size of write
In vdi_co_pwritev() we multiply a sector count by SECTOR_SIZE to get the size to write in bytes. Coverity notes that this means that we do the multiply as a 32x32->32 multiply before converting to 64 bits, which has the potential to overflow. This is very unlikely to happen, since the block map has 4 bytes per block and the maximum number of blocks in the image must fit into a 32-bit integer. But we can keep Coverity happy by including a cast so we do a 64-bit multiply here. Resolves: Coverity CID 1508076 Signed-off-by: Peter Maydell --- block/vdi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/vdi.c b/block/vdi.c index 6363da08cee..27c60ba18d0 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -728,7 +728,7 @@ nonallocating_write: logout("will write %u block map sectors starting from entry %u\n", n_sectors, bmap_first); ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE, - n_sectors * SECTOR_SIZE, base, 0); + n_sectors * (uint64_t)SECTOR_SIZE, base, 0); } return ret; -- 2.34.1
[PATCH 0/7] block: Miscellaneous minor Coverity fixes
This patchset is a collection of fixes for minor Coverity reported issues. In all cases, there isn't a user-visible problem, but the Coverity issue pointed up somewhere where we could clean up the code or make it a bit more obvious to a human reader what the intent was. Only lightly tested (with "make check" and "make check-avocado"). thanks -- PMM Peter Maydell (7): block/vdi.c: Avoid potential overflow when calculating size of write block/gluster: Use g_autofree for string in qemu_gluster_parse_json() hw/block/pflash_cfi01: Don't decrement pfl->counter below 0 hw/ide/atapi: Be explicit that assigning to s->lcyl truncates hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something hw/ide/pci.c: Remove dead code from bmdma_prepare_buf() block/ssh.c: Don't double-check that characters are hex digits block/gluster.c | 6 +- block/ssh.c | 10 ++ block/vdi.c | 2 +- hw/block/fdc-isa.c | 2 ++ hw/block/pflash_cfi01.c | 1 + hw/ide/atapi.c | 2 +- hw/ide/pci.c| 4 7 files changed, 12 insertions(+), 15 deletions(-) -- 2.34.1
[PATCH 7/7] block/ssh.c: Don't double-check that characters are hex digits
In compare_fingerprint() we effectively check whether the characters in the fingerprint are valid hex digits twice: first we do so with qemu_isxdigit(), but then the hex2decimal() function also has a code path where it effectively detects an invalid digit and returns -1. This causes Coverity to complain because it thinks that we might use that -1 value in an expression where it would be an integer overflow. Avoid the double-check of hex digit validity by testing the return values from hex2decimal() rather than doing separate calls to qemu_isxdigit(). Signed-off-by: Peter Maydell --- Could alternatively have put a g_assert_non_reached() in hex2decimal(), but this seemed better to me. --- block/ssh.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/block/ssh.c b/block/ssh.c index 27d582e0e3d..510dd208aba 100644 --- a/block/ssh.c +++ b/block/ssh.c @@ -376,13 +376,15 @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len, unsigned c; while (len > 0) { +unsigned c0, c1; while (*host_key_check == ':') host_key_check++; -if (!qemu_isxdigit(host_key_check[0]) || -!qemu_isxdigit(host_key_check[1])) +c0 = hex2decimal(host_key_check[0]); +c1 = hex2decimal(host_key_check[1]); +if (c0 > 0xf || c1 > 0xf) { return 1; -c = hex2decimal(host_key_check[0]) * 16 + -hex2decimal(host_key_check[1]); +} +c = c0 * 16 + c1; if (c - *fingerprint != 0) return c - *fingerprint; fingerprint++; -- 2.34.1
[PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something
Coverity complains about an overflow in isa_fdc_get_drive_max_chs() that can happen if the loop over fd_formats never finds a match, because we initialize *maxc to 0 and then at the end of the function decrement it. This can't ever actually happen because fd_formats has at least one entry for each FloppyDriveType, so we must at least once find a match and update *maxc, *maxh and *maxs. Assert that we did find a match, which should keep Coverity happy and will also detect possible bugs in the data in fd_formats. Resolves: Coverity CID 1547663 Signed-off-by: Peter Maydell --- hw/block/fdc-isa.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/block/fdc-isa.c b/hw/block/fdc-isa.c index e43dc532af8..796835f57b3 100644 --- a/hw/block/fdc-isa.c +++ b/hw/block/fdc-isa.c @@ -147,6 +147,8 @@ static void isa_fdc_get_drive_max_chs(FloppyDriveType type, uint8_t *maxc, *maxs = fdf->last_sect; } } +/* fd_formats must contain at least one entry per FloppyDriveType */ +assert(*maxc); (*maxc)--; } -- 2.34.1
[PATCH 6/7] hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()
Coverity notes that the code at the end of the loop in bmdma_prepare_buf() is unreachable. This is because in commit 9fbf0fa81fca8f527 ("ide: remove hardcoded 2GiB transactional limit") we removed the only codepath in the loop which could "break" out of it, but didn't notice that this meant we should also remove the code at the end of the loop. Remove the dead code. Resolves: Coverity CID 1547772 Signed-off-by: Peter Maydell --- hw/ide/pci.c | 4 1 file changed, 4 deletions(-) diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 4675d079a17..f2cb500a94f 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -266,10 +266,6 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, int32_t limit) s->io_buffer_size += l; } } - -qemu_sglist_destroy(&s->sg); -s->io_buffer_size = 0; -return -1; } /* return 0 if buffer completed */ -- 2.34.1
[PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates
In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then assign its two halves to s->lcyl and s->hcyl like this: s->lcyl = size; s->hcyl = size >> 8; Coverity warns that the first line here can overflow the 8-bit s->lcyl variable. This is true, and in this case we're deliberately only after the low 8 bits of the value. The code is clearer to both humans and Coverity if we're explicit that we only wanted the low 8 bits, though. Signed-off-by: Peter Maydell --- hw/ide/atapi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c index fcb6cca1573..e82959dc2d3 100644 --- a/hw/ide/atapi.c +++ b/hw/ide/atapi.c @@ -265,7 +265,7 @@ void ide_atapi_cmd_reply_end(IDEState *s) byte_count_limit--; size = byte_count_limit; } -s->lcyl = size; +s->lcyl = size & 0xff; s->hcyl = size >> 8; s->elementary_transfer_size = size; /* we cannot transmit more than one sector at a time */ -- 2.34.1
[PATCH 2/7] block/gluster: Use g_autofree for string in qemu_gluster_parse_json()
In the loop in qemu_gluster_parse_json() we do: char *str = NULL; for(...) { str = g_strdup_printf(...); ... if (various errors) { goto out; } ... g_free(str); str = NULL; } return 0; out: various cleanups; g_free(str); ... return -errno; Coverity correctly complains that the assignment "str = NULL" at the end of the loop is unnecessary, because we will either go back to the top of the loop and overwrite it, or else we will exit the loop and then exit the function without ever reading str again. The assignment is there as defensive coding to ensure that str is only non-NULL if it's a live allocation, so this is intentional. We can make Coverity happier and simplify the code here by using g_autofree, since we never need 'str' outside the loop. Resolves: Coverity CID 1527385 Signed-off-by: Peter Maydell --- block/gluster.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/block/gluster.c b/block/gluster.c index f8b415f3812..61ded95e660 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -514,7 +514,6 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf, SocketAddressList **tail; QDict *backing_options = NULL; Error *local_err = NULL; -char *str = NULL; const char *ptr; int i, type, num_servers; @@ -547,7 +546,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf, tail = &gconf->server; for (i = 0; i < num_servers; i++) { -str = g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i); +g_autofree char *str = g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i); qdict_extract_subqdict(options, &backing_options, str); /* create opts info from runtime_type_opts list */ @@ -658,8 +657,6 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf, qobject_unref(backing_options); backing_options = NULL; -g_free(str); -str = NULL; } return 0; @@ -668,7 +665,6 @@ out: error_propagate(errp, local_err); qapi_free_SocketAddress(gsconf); qemu_opts_del(opts); -g_free(str); qobject_unref(backing_options); errno = EINVAL; return -errno; -- 2.34.1
[PATCH 3/7] hw/block/pflash_cfi01: Don't decrement pfl->counter below 0
In pflash_write() Coverity points out that we can decrement the unsigned pfl->counter below zero, which makes it wrap around. In fact this is harmless, because if pfl->counter is 0 at this point we also increment pfl->wcycle to 3, and the wcycle == 3 handling doesn't look at counter; the only way back into code which looks at the counter value is via wcycle == 1, which will reinitialize the counter. But it's arguably a little clearer to break early in the "counter == 0" if(), to avoid the decrement-below-zero. Resolves: Coverity CID 1547611 Signed-off-by: Peter Maydell --- hw/block/pflash_cfi01.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c index c8f1cf5a872..2f3d1dd509c 100644 --- a/hw/block/pflash_cfi01.c +++ b/hw/block/pflash_cfi01.c @@ -614,6 +614,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset, if (!pfl->counter) { trace_pflash_write(pfl->name, "block write finished"); pfl->wcycle++; +break; } pfl->counter--; -- 2.34.1
Re: [PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates
Peter Maydell writes: > In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then > assign its two halves to s->lcyl and s->hcyl like this: > >s->lcyl = size; >s->hcyl = size >> 8; > > Coverity warns that the first line here can overflow the > 8-bit s->lcyl variable. This is true, and in this case we're > deliberately only after the low 8 bits of the value. The > code is clearer to both humans and Coverity if we're explicit > that we only wanted the low 8 bits, though. > Resolves? > Signed-off-by: Peter Maydell > --- > hw/ide/atapi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c > index fcb6cca1573..e82959dc2d3 100644 > --- a/hw/ide/atapi.c > +++ b/hw/ide/atapi.c > @@ -265,7 +265,7 @@ void ide_atapi_cmd_reply_end(IDEState *s) /* a new transfer is needed */ s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO; ide_bus_set_irq(s->bus); byte_count_limit = atapi_byte_count_limit(s); trace_ide_atapi_cmd_reply_end_bcl(s, byte_count_limit); size = s->packet_transfer_size; if (size > byte_count_limit) { /* byte count limit must be even if this case */ if (byte_count_limit & 1) > byte_count_limit--; > size = byte_count_limit; > } > -s->lcyl = size; > +s->lcyl = size & 0xff; > s->hcyl = size >> 8; > s->elementary_transfer_size = size; > /* we cannot transmit more than one sector at a time */ @size is int. I wonder why it's fine with size >> 8... ah, atapi_byte_count_limit() returns uint16_t, and Coverity is smart enough to data-flow this via @byte_count_limit into @size. Reviewed-by: Markus Armbruster
Re: [PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates
On Wed, 31 Jul 2024 at 15:47, Markus Armbruster wrote: > > Peter Maydell writes: > > > In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then > > assign its two halves to s->lcyl and s->hcyl like this: > > > >s->lcyl = size; > >s->hcyl = size >> 8; > > > > Coverity warns that the first line here can overflow the > > 8-bit s->lcyl variable. This is true, and in this case we're > > deliberately only after the low 8 bits of the value. The > > code is clearer to both humans and Coverity if we're explicit > > that we only wanted the low 8 bits, though. > > > > Resolves? Whoops. Resolves: Coverity CID 1547621 > > Signed-off-by: Peter Maydell > > --- > > hw/ide/atapi.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c > > index fcb6cca1573..e82959dc2d3 100644 > > --- a/hw/ide/atapi.c > > +++ b/hw/ide/atapi.c > > @@ -265,7 +265,7 @@ void ide_atapi_cmd_reply_end(IDEState *s) >/* a new transfer is needed */ >s->nsector = (s->nsector & ~7) | ATAPI_INT_REASON_IO; >ide_bus_set_irq(s->bus); >byte_count_limit = atapi_byte_count_limit(s); >trace_ide_atapi_cmd_reply_end_bcl(s, byte_count_limit); >size = s->packet_transfer_size; >if (size > byte_count_limit) { >/* byte count limit must be even if this case */ >if (byte_count_limit & 1) > > byte_count_limit--; > > size = byte_count_limit; > > } > > -s->lcyl = size; > > +s->lcyl = size & 0xff; > > s->hcyl = size >> 8; > > s->elementary_transfer_size = size; > > /* we cannot transmit more than one sector at a time */ > > @size is int. I wonder why it's fine with size >> 8... ah, > atapi_byte_count_limit() returns uint16_t, and Coverity is smart enough > to data-flow this via @byte_count_limit into @size. > > Reviewed-by: Markus Armbruster thanks -- PMM
Re: [PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something
Peter Maydell writes: > Coverity complains about an overflow in isa_fdc_get_drive_max_chs() > that can happen if the loop over fd_formats never finds a match, > because we initialize *maxc to 0 and then at the end of the > function decrement it. > > This can't ever actually happen because fd_formats has at least > one entry for each FloppyDriveType, so we must at least once > find a match and update *maxc, *maxh and *maxs. Assert that we > did find a match, which should keep Coverity happy and will also > detect possible bugs in the data in fd_formats. > > Resolves: Coverity CID 1547663 > Signed-off-by: Peter Maydell > --- > hw/block/fdc-isa.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/block/fdc-isa.c b/hw/block/fdc-isa.c > index e43dc532af8..796835f57b3 100644 > --- a/hw/block/fdc-isa.c > +++ b/hw/block/fdc-isa.c > @@ -147,6 +147,8 @@ static void isa_fdc_get_drive_max_chs(FloppyDriveType > type, uint8_t *maxc, > *maxs = fdf->last_sect; > } > } > +/* fd_formats must contain at least one entry per FloppyDriveType */ > +assert(*maxc); > (*maxc)--; > } Reviewed-by: Markus Armbruster
Re: [PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > Coverity complains about an overflow in isa_fdc_get_drive_max_chs() > that can happen if the loop over fd_formats never finds a match, > because we initialize *maxc to 0 and then at the end of the > function decrement it. > > This can't ever actually happen because fd_formats has at least > one entry for each FloppyDriveType, so we must at least once > find a match and update *maxc, *maxh and *maxs. Assert that we > did find a match, which should keep Coverity happy and will also > detect possible bugs in the data in fd_formats. > > Resolves: Coverity CID 1547663 > Signed-off-by: Peter Maydell Reviewed-by: Kevin Wolf
Re: [PATCH 4/7] hw/ide/atapi: Be explicit that assigning to s->lcyl truncates
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then > assign its two halves to s->lcyl and s->hcyl like this: > >s->lcyl = size; >s->hcyl = size >> 8; > > Coverity warns that the first line here can overflow the > 8-bit s->lcyl variable. This is true, and in this case we're > deliberately only after the low 8 bits of the value. The > code is clearer to both humans and Coverity if we're explicit > that we only wanted the low 8 bits, though. > > Signed-off-by: Peter Maydell Reviewed-by: Kevin Wolf
Re: [PATCH 1/7] block/vdi.c: Avoid potential overflow when calculating size of write
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > In vdi_co_pwritev() we multiply a sector count by SECTOR_SIZE to > get the size to write in bytes. Coverity notes that this means that > we do the multiply as a 32x32->32 multiply before converting to > 64 bits, which has the potential to overflow. > > This is very unlikely to happen, since the block map has 4 bytes per > block and the maximum number of blocks in the image must fit into a > 32-bit integer. But we can keep Coverity happy by including a cast > so we do a 64-bit multiply here. > > Resolves: Coverity CID 1508076 > Signed-off-by: Peter Maydell > --- > block/vdi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/vdi.c b/block/vdi.c > index 6363da08cee..27c60ba18d0 100644 > --- a/block/vdi.c > +++ b/block/vdi.c > @@ -728,7 +728,7 @@ nonallocating_write: > logout("will write %u block map sectors starting from entry %u\n", > n_sectors, bmap_first); > ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE, > - n_sectors * SECTOR_SIZE, base, 0); > + n_sectors * (uint64_t)SECTOR_SIZE, base, 0); > } I wonder if we shouldn't just make VDI's SECTOR_SIZE 64 bits like BDRV_SECTOR_SIZE. It's easy to miss the cast in individual places. Kevin
Re: [PATCH 2/7] block/gluster: Use g_autofree for string in qemu_gluster_parse_json()
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > In the loop in qemu_gluster_parse_json() we do: > > char *str = NULL; > for(...) { > str = g_strdup_printf(...); > ... > if (various errors) { > goto out; > } > ... > g_free(str); > str = NULL; > } > return 0; > out: > various cleanups; > g_free(str); > ... > return -errno; > > Coverity correctly complains that the assignment "str = NULL" at the > end of the loop is unnecessary, because we will either go back to the > top of the loop and overwrite it, or else we will exit the loop and > then exit the function without ever reading str again. The assignment > is there as defensive coding to ensure that str is only non-NULL if > it's a live allocation, so this is intentional. > > We can make Coverity happier and simplify the code here by using > g_autofree, since we never need 'str' outside the loop. > > Resolves: Coverity CID 1527385 > Signed-off-by: Peter Maydell > --- > block/gluster.c | 6 +- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/block/gluster.c b/block/gluster.c > index f8b415f3812..61ded95e660 100644 > --- a/block/gluster.c > +++ b/block/gluster.c > @@ -514,7 +514,6 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster > *gconf, > SocketAddressList **tail; > QDict *backing_options = NULL; > Error *local_err = NULL; > -char *str = NULL; > const char *ptr; > int i, type, num_servers; > > @@ -547,7 +546,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster > *gconf, > tail = &gconf->server; > > for (i = 0; i < num_servers; i++) { > -str = g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i); > +g_autofree char *str = > g_strdup_printf(GLUSTER_OPT_SERVER_PATTERN"%d.", i); This line is too long now. With this fixed: Reviewed-by: Kevin Wolf
Re: [PATCH 1/7] block/vdi.c: Avoid potential overflow when calculating size of write
Am 31.07.24 um 16:36 schrieb Peter Maydell: In vdi_co_pwritev() we multiply a sector count by SECTOR_SIZE to get the size to write in bytes. Coverity notes that this means that we do the multiply as a 32x32->32 multiply before converting to 64 bits, which has the potential to overflow. This is very unlikely to happen, since the block map has 4 bytes per block and the maximum number of blocks in the image must fit into a 32-bit integer. But we can keep Coverity happy by including a cast so we do a 64-bit multiply here. Resolves: Coverity CID 1508076 Signed-off-by: Peter Maydell --- block/vdi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/vdi.c b/block/vdi.c index 6363da08cee..27c60ba18d0 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -728,7 +728,7 @@ nonallocating_write: logout("will write %u block map sectors starting from entry %u\n", n_sectors, bmap_first); ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE, - n_sectors * SECTOR_SIZE, base, 0); + n_sectors * (uint64_t)SECTOR_SIZE, base, 0); } return ret; Thanks. Reviewed-by: Stefan Weil
Re: [PATCH 3/7] hw/block/pflash_cfi01: Don't decrement pfl->counter below 0
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > In pflash_write() Coverity points out that we can decrement the > unsigned pfl->counter below zero, which makes it wrap around. In > fact this is harmless, because if pfl->counter is 0 at this point we > also increment pfl->wcycle to 3, and the wcycle == 3 handling doesn't > look at counter; the only way back into code which looks at the > counter value is via wcycle == 1, which will reinitialize the counter. > But it's arguably a little clearer to break early in the "counter == > 0" if(), to avoid the decrement-below-zero. > > Resolves: Coverity CID 1547611 > Signed-off-by: Peter Maydell Reviewed-by: Kevin Wolf
Re: [PATCH 6/7] hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > Coverity notes that the code at the end of the loop in > bmdma_prepare_buf() is unreachable. This is because in commit > 9fbf0fa81fca8f527 ("ide: remove hardcoded 2GiB transactional limit") > we removed the only codepath in the loop which could "break" out of > it, but didn't notice that this meant we should also remove the code > at the end of the loop. > > Remove the dead code. > > Resolves: Coverity CID 1547772 > Signed-off-by: Peter Maydell > --- > hw/ide/pci.c | 4 > 1 file changed, 4 deletions(-) > > diff --git a/hw/ide/pci.c b/hw/ide/pci.c > index 4675d079a17..f2cb500a94f 100644 > --- a/hw/ide/pci.c > +++ b/hw/ide/pci.c > @@ -266,10 +266,6 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, > int32_t limit) > s->io_buffer_size += l; > } > } > - > -qemu_sglist_destroy(&s->sg); > -s->io_buffer_size = 0; > -return -1; > } Should we put a g_assert_not_reached() here instead to make it easier for the reader to understand how this function works? Either way: Reviewed-by: Kevin Wolf
Re: [PATCH 7/7] block/ssh.c: Don't double-check that characters are hex digits
Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > In compare_fingerprint() we effectively check whether the characters > in the fingerprint are valid hex digits twice: first we do so with > qemu_isxdigit(), but then the hex2decimal() function also has a code > path where it effectively detects an invalid digit and returns -1. > This causes Coverity to complain because it thinks that we might use > that -1 value in an expression where it would be an integer overflow. If it's a Coverity issue, I think you want to mention the CID, too. > Avoid the double-check of hex digit validity by testing the return > values from hex2decimal() rather than doing separate calls to > qemu_isxdigit(). > > Signed-off-by: Peter Maydell > --- > Could alternatively have put a g_assert_non_reached() in > hex2decimal(), but this seemed better to me. I don't like that hex2decimal() returns -1 when its result is unsigned, which is why you had to write the check as > 0xf rather than < 0. So in this sense, g_assert_non_reached() would look better to me. But we could also just change it to return UINT_MAX instead, which should be the same, just written in a less confusing way. > block/ssh.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/block/ssh.c b/block/ssh.c > index 27d582e0e3d..510dd208aba 100644 > --- a/block/ssh.c > +++ b/block/ssh.c > @@ -376,13 +376,15 @@ static int compare_fingerprint(const unsigned char > *fingerprint, size_t len, > unsigned c; > > while (len > 0) { > +unsigned c0, c1; > while (*host_key_check == ':') > host_key_check++; > -if (!qemu_isxdigit(host_key_check[0]) || > -!qemu_isxdigit(host_key_check[1])) > +c0 = hex2decimal(host_key_check[0]); > +c1 = hex2decimal(host_key_check[1]); > +if (c0 > 0xf || c1 > 0xf) { > return 1; > -c = hex2decimal(host_key_check[0]) * 16 + > -hex2decimal(host_key_check[1]); > +} > +c = c0 * 16 + c1; > if (c - *fingerprint != 0) > return c - *fingerprint; > fingerprint++; Reviewed-by: Kevin Wolf
Re: [PATCH 7/7] block/ssh.c: Don't double-check that characters are hex digits
On Wed, 31 Jul 2024 at 16:21, Kevin Wolf wrote: > > Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: > > In compare_fingerprint() we effectively check whether the characters > > in the fingerprint are valid hex digits twice: first we do so with > > qemu_isxdigit(), but then the hex2decimal() function also has a code > > path where it effectively detects an invalid digit and returns -1. > > This causes Coverity to complain because it thinks that we might use > > that -1 value in an expression where it would be an integer overflow. > > If it's a Coverity issue, I think you want to mention the CID, too. Yes; Resolves: Coverity CID 1547813 > > Avoid the double-check of hex digit validity by testing the return > > values from hex2decimal() rather than doing separate calls to > > qemu_isxdigit(). > > > > Signed-off-by: Peter Maydell > > --- > > Could alternatively have put a g_assert_non_reached() in > > hex2decimal(), but this seemed better to me. > > I don't like that hex2decimal() returns -1 when its result is unsigned, > which is why you had to write the check as > 0xf rather than < 0. So in > this sense, g_assert_non_reached() would look better to me. But we could > also just change it to return UINT_MAX instead, which should be the > same, just written in a less confusing way. I was not super happy with the -1 either. Happy to change that to 'return UINT_MAX'. -- PMM
Re: [PATCH-for-9.1 5/5] hw/sd/sdhci: Check ADMA descriptors can be accessed
On 30/7/24 11:21, Philippe Mathieu-Daudé wrote: Since malicious guest can write invalid addresses to the ADMASYSADDR register, we need to check whether the descriptor could be correctly filled or not. Cc: qemu-sta...@nongnu.org Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller") Signed-off-by: Philippe Mathieu-Daudé --- hw/sd/sdhci.c | 86 ++ hw/sd/trace-events | 2 +- 2 files changed, 49 insertions(+), 39 deletions(-) diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c index 66b9364e9e..eb0476b9aa 100644 --- a/hw/sd/sdhci.c +++ b/hw/sd/sdhci.c @@ -698,53 +698,62 @@ static void trace_adma_description(const char *type, const ADMADescr *dscr) trace_sdhci_adma_desc(type, dscr->addr, dscr->length, dscr->attr, dscr->incr); } -static void get_adma_description(SDHCIState *s, ADMADescr *dscr) +static MemTxResult get_adma_description(SDHCIState *s, ADMADescr *dscr) { uint32_t adma1 = 0; uint64_t adma2 = 0; hwaddr entry_addr = (hwaddr)s->admasysaddr; +MemTxResult res; + switch (SDHC_DMA_TYPE(s->hostctl1)) { case SDHC_CTRL_ADMA2_32: -dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), -MEMTXATTRS_UNSPECIFIED); -adma2 = le64_to_cpu(adma2); -/* The spec does not specify endianness of descriptor table. - * We currently assume that it is LE. - */ -dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull; -dscr->length = (uint16_t)extract64(adma2, 16, 16); -dscr->attr = (uint8_t)extract64(adma2, 0, 7); -dscr->incr = 8; -trace_adma_description("ADMA2_32", dscr); +res = dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), + MEMTXATTRS_UNSPECIFIED); +if (res == MEMTX_OK) { +adma2 = le64_to_cpu(adma2); +/* The spec does not specify endianness of descriptor table. + * We currently assume that it is LE. + */ +dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull; +dscr->length = (uint16_t)extract64(adma2, 16, 16); +dscr->attr = (uint8_t)extract64(adma2, 0, 7); +dscr->incr = 8; +trace_adma_description("ADMA2_32", dscr); +} break; case SDHC_CTRL_ADMA1_32: -dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1), -MEMTXATTRS_UNSPECIFIED); -adma1 = le32_to_cpu(adma1); -dscr->addr = (hwaddr)(adma1 & 0xF000); -dscr->attr = (uint8_t)extract32(adma1, 0, 7); -dscr->incr = 4; -if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == SDHC_ADMA_ATTR_SET_LEN) { -dscr->length = (uint16_t)extract32(adma1, 12, 16); -} else { -dscr->length = 4 * KiB; +res = dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1), + MEMTXATTRS_UNSPECIFIED); +if (res == MEMTX_OK) { +adma1 = le32_to_cpu(adma1); +dscr->addr = (hwaddr)(adma1 & ~0xfff); +dscr->attr = (uint8_t)extract32(adma1, 0, 7); +dscr->incr = 4; +if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == SDHC_ADMA_ATTR_SET_LEN) { +dscr->length = (uint16_t)extract32(adma1, 12, 16); +} else { +dscr->length = 4 * KiB; +} +trace_adma_description("ADMA1_32", dscr); } -trace_adma_description("ADMA1_32", dscr); break; case SDHC_CTRL_ADMA2_64: -dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1, -MEMTXATTRS_UNSPECIFIED); -dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2, -MEMTXATTRS_UNSPECIFIED); -dscr->length = le16_to_cpu(dscr->length); -dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8, -MEMTXATTRS_UNSPECIFIED); -dscr->addr = le64_to_cpu(dscr->addr); -dscr->attr &= (uint8_t) ~0xC0; -dscr->incr = 12; -trace_adma_description("ADMA2_64", dscr); +res = dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1, + MEMTXATTRS_UNSPECIFIED); +res |= dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2, + MEMTXATTRS_UNSPECIFIED); +res |= dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8, + MEMTXATTRS_UNSPECIFIED); +if (res == MEMTX_OK) { +dscr->length = le16_to_cpu(dscr->length); +dscr->addr = le64_to_cpu(dscr->addr); +dscr->attr &= (uint8_t) ~0xc0; +dscr->incr = 12; +trace_adma_description("ADMA2_64", dscr); +} break; } +return res; } /* Advanced DMA data transfer */ @@ -755,7 +764,6 @@ static void sdhci_do_adma(SDHCIState *
Re: [PATCH 2/7] block/gluster: Use g_autofree for string in qemu_gluster_parse_json()
On 31/7/24 16:36, Peter Maydell wrote: In the loop in qemu_gluster_parse_json() we do: char *str = NULL; for(...) { str = g_strdup_printf(...); ... if (various errors) { goto out; } ... g_free(str); str = NULL; } return 0; out: various cleanups; g_free(str); ... return -errno; Coverity correctly complains that the assignment "str = NULL" at the end of the loop is unnecessary, because we will either go back to the top of the loop and overwrite it, or else we will exit the loop and then exit the function without ever reading str again. The assignment is there as defensive coding to ensure that str is only non-NULL if it's a live allocation, so this is intentional. We can make Coverity happier and simplify the code here by using g_autofree, since we never need 'str' outside the loop. Resolves: Coverity CID 1527385 Signed-off-by: Peter Maydell --- block/gluster.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 5/7] hw/block/fdc-isa: Assert that isa_fdc_get_drive_max_chs() found something
On 31/7/24 16:36, Peter Maydell wrote: Coverity complains about an overflow in isa_fdc_get_drive_max_chs() that can happen if the loop over fd_formats never finds a match, because we initialize *maxc to 0 and then at the end of the function decrement it. This can't ever actually happen because fd_formats has at least one entry for each FloppyDriveType, so we must at least once find a match and update *maxc, *maxh and *maxs. Assert that we did find a match, which should keep Coverity happy and will also detect possible bugs in the data in fd_formats. Resolves: Coverity CID 1547663 Signed-off-by: Peter Maydell --- hw/block/fdc-isa.c | 2 ++ 1 file changed, 2 insertions(+) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 6/7] hw/ide/pci.c: Remove dead code from bmdma_prepare_buf()
On 31/7/24 17:13, Kevin Wolf wrote: Am 31.07.2024 um 16:36 hat Peter Maydell geschrieben: Coverity notes that the code at the end of the loop in bmdma_prepare_buf() is unreachable. This is because in commit 9fbf0fa81fca8f527 ("ide: remove hardcoded 2GiB transactional limit") we removed the only codepath in the loop which could "break" out of it, but didn't notice that this meant we should also remove the code at the end of the loop. Remove the dead code. Resolves: Coverity CID 1547772 Signed-off-by: Peter Maydell --- hw/ide/pci.c | 4 1 file changed, 4 deletions(-) diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 4675d079a17..f2cb500a94f 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -266,10 +266,6 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, int32_t limit) s->io_buffer_size += l; } } - -qemu_sglist_destroy(&s->sg); -s->io_buffer_size = 0; -return -1; } Should we put a g_assert_not_reached() here instead to make it easier for the reader to understand how this function works? Or break and keep returning at EOF: -- >8 -- diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 4675d079a1..8386c4fe42 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -237,7 +237,7 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, int32_t limit) /* end of table (with a fail safe of one page) */ if (bm->cur_prd_last || (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE) { -return s->sg.size; +break; } pci_dma_read(pci_dev, bm->cur_addr, &prd, 8); bm->cur_addr += 8; @@ -267,9 +267,7 @@ static int32_t bmdma_prepare_buf(const IDEDMA *dma, int32_t limit) } } -qemu_sglist_destroy(&s->sg); -s->io_buffer_size = 0; -return -1; +return s->sg.size; } --- Either way: Reviewed-by: Kevin Wolf Reviewed-by: Philippe Mathieu-Daudé
[PATCH-for-9.1? v2 0/4] hw/sd/sdhci: Check ADMA descriptors can be accessed
Since v1: - split patch - do not return MemTxResult from get_adma_description() - single DMA read in SDHC_CTRL_ADMA2_64 case Based-on: <20240730092138.32443-5-phi...@linaro.org> Philippe Mathieu-Daudé (4): hw/sd/sdhci: Reduce variables scope in sdhci_do_adma() hw/sd/sdhci: Reduce variables scope in get_adma_description() hw/sd/sdhci: Read ADMA2_64 descriptor with a single dma_memory_read() hw/sd/sdhci: Check ADMA descriptors can be accessed hw/sd/sdhci.c | 117 ++ 1 file changed, 70 insertions(+), 47 deletions(-) -- 2.45.2
[PATCH-for-9.1? v2 4/4] hw/sd/sdhci: Check ADMA descriptors can be accessed
Since malicious guest can write invalid addresses to the ADMASYSADDR register, we need to check whether the descriptor could be correctly filled or not. Cc: qemu-sta...@nongnu.org Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller") Signed-off-by: Philippe Mathieu-Daudé --- hw/sd/sdhci.c | 23 +-- 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c index 2d8fa3151a..6794ee2267 100644 --- a/hw/sd/sdhci.c +++ b/hw/sd/sdhci.c @@ -701,13 +701,18 @@ static void trace_adma_description(const char *type, const ADMADescr *dscr) static void get_adma_description(SDHCIState *s, ADMADescr *dscr) { hwaddr entry_addr = (hwaddr)s->admasysaddr; +MemTxResult res; + switch (SDHC_DMA_TYPE(s->hostctl1)) { case SDHC_CTRL_ADMA2_32: { uint64_t adma2 = 0; -dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), -MEMTXATTRS_UNSPECIFIED); +res = dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), + MEMTXATTRS_UNSPECIFIED); +if (res != MEMTX_OK) { +break; +} adma2 = le64_to_cpu(adma2); /* * The spec does not specify endianness of descriptor table. @@ -724,8 +729,11 @@ static void get_adma_description(SDHCIState *s, ADMADescr *dscr) { uint32_t adma1 = 0; -dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1), -MEMTXATTRS_UNSPECIFIED); +res = dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1), + MEMTXATTRS_UNSPECIFIED); +if (res != MEMTX_OK) { +break; +} adma1 = le32_to_cpu(adma1); dscr->addr = (hwaddr)(adma1 & ~0xfff); dscr->attr = (uint8_t)extract32(adma1, 0, 7); @@ -748,8 +756,11 @@ static void get_adma_description(SDHCIState *s, ADMADescr *dscr) } QEMU_PACKED adma2; QEMU_BUILD_BUG_ON(sizeof(adma2) != 12); -dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), -MEMTXATTRS_UNSPECIFIED); +res = dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), + MEMTXATTRS_UNSPECIFIED); +if (res != MEMTX_OK) { +break; +} dscr->length = le16_to_cpu(adma2.length); dscr->addr = le64_to_cpu(adma2.addr); dscr->attr = adma2.attr & (uint8_t) ~0xc0; -- 2.45.2
[PATCH-for-9.1? v2 1/4] hw/sd/sdhci: Reduce variables scope in sdhci_do_adma()
All variables are only used within the for loop. Declare them within it. In particular this resets 'dscr' on each iteration. Signed-off-by: Philippe Mathieu-Daudé --- hw/sd/sdhci.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c index 66b9364e9e..773f2b284b 100644 --- a/hw/sd/sdhci.c +++ b/hw/sd/sdhci.c @@ -751,20 +751,19 @@ static void get_adma_description(SDHCIState *s, ADMADescr *dscr) static void sdhci_do_adma(SDHCIState *s) { -unsigned int begin, length; -const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK; -const MemTxAttrs attrs = { .memory = true }; -ADMADescr dscr = {}; -MemTxResult res; -int i; - if (s->trnmod & SDHC_TRNS_BLK_CNT_EN && !s->blkcnt) { /* Stop Multiple Transfer */ sdhci_end_transfer(s); return; } -for (i = 0; i < SDHC_ADMA_DESCS_PER_DELAY; ++i) { +for (int i = 0; i < SDHC_ADMA_DESCS_PER_DELAY; ++i) { +unsigned int begin, length; +const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK; +const MemTxAttrs attrs = { .memory = true }; +ADMADescr dscr = { }; +MemTxResult res; + s->admaerr &= ~SDHC_ADMAERR_LENGTH_MISMATCH; get_adma_description(s, &dscr); -- 2.45.2
[PATCH-for-9.1? v2 2/4] hw/sd/sdhci: Reduce variables scope in get_adma_description()
The 'adma1' variable is only used in the SDHC_CTRL_ADMA1_32 case, and 'adma2' in SDHC_CTRL_ADMA2_32. Add braces in the switch case to use local declarations. Do the same in the SDHC_CTRL_ADMA2_64 case because we'll add a local variable there in the next commit. Replace 0xF000 -> ~0xfff to align with our codebase style. No functional change intended. Signed-off-by: Philippe Mathieu-Daudé --- hw/sd/sdhci.c | 87 --- 1 file changed, 48 insertions(+), 39 deletions(-) diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c index 773f2b284b..0a95f49b93 100644 --- a/hw/sd/sdhci.c +++ b/hw/sd/sdhci.c @@ -700,50 +700,59 @@ static void trace_adma_description(const char *type, const ADMADescr *dscr) static void get_adma_description(SDHCIState *s, ADMADescr *dscr) { -uint32_t adma1 = 0; -uint64_t adma2 = 0; hwaddr entry_addr = (hwaddr)s->admasysaddr; switch (SDHC_DMA_TYPE(s->hostctl1)) { case SDHC_CTRL_ADMA2_32: -dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), -MEMTXATTRS_UNSPECIFIED); -adma2 = le64_to_cpu(adma2); -/* The spec does not specify endianness of descriptor table. - * We currently assume that it is LE. - */ -dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull; -dscr->length = (uint16_t)extract64(adma2, 16, 16); -dscr->attr = (uint8_t)extract64(adma2, 0, 7); -dscr->incr = 8; -trace_adma_description("ADMA2_32", dscr); -break; -case SDHC_CTRL_ADMA1_32: -dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1), -MEMTXATTRS_UNSPECIFIED); -adma1 = le32_to_cpu(adma1); -dscr->addr = (hwaddr)(adma1 & 0xF000); -dscr->attr = (uint8_t)extract32(adma1, 0, 7); -dscr->incr = 4; -if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == SDHC_ADMA_ATTR_SET_LEN) { -dscr->length = (uint16_t)extract32(adma1, 12, 16); -} else { -dscr->length = 4 * KiB; +{ +uint64_t adma2 = 0; + +dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), +MEMTXATTRS_UNSPECIFIED); +adma2 = le64_to_cpu(adma2); +/* + * The spec does not specify endianness of descriptor table. + * We currently assume that it is LE. + */ +dscr->addr = (hwaddr)extract64(adma2, 32, 32) & ~0x3ull; +dscr->length = (uint16_t)extract64(adma2, 16, 16); +dscr->attr = (uint8_t)extract64(adma2, 0, 7); +dscr->incr = 8; +trace_adma_description("ADMA2_32", dscr); +break; +} +case SDHC_CTRL_ADMA1_32: +{ +uint32_t adma1 = 0; + +dma_memory_read(s->dma_as, entry_addr, &adma1, sizeof(adma1), +MEMTXATTRS_UNSPECIFIED); +adma1 = le32_to_cpu(adma1); +dscr->addr = (hwaddr)(adma1 & ~0xfff); +dscr->attr = (uint8_t)extract32(adma1, 0, 7); +dscr->incr = 4; +if ((dscr->attr & SDHC_ADMA_ATTR_ACT_MASK) == SDHC_ADMA_ATTR_SET_LEN) { +dscr->length = (uint16_t)extract32(adma1, 12, 16); +} else { +dscr->length = 4 * KiB; +} +trace_adma_description("ADMA1_32", dscr); +break; } -trace_adma_description("ADMA1_32", dscr); -break; case SDHC_CTRL_ADMA2_64: -dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1, -MEMTXATTRS_UNSPECIFIED); -dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2, -MEMTXATTRS_UNSPECIFIED); -dscr->length = le16_to_cpu(dscr->length); -dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8, -MEMTXATTRS_UNSPECIFIED); -dscr->addr = le64_to_cpu(dscr->addr); -dscr->attr &= (uint8_t) ~0xC0; -dscr->incr = 12; -trace_adma_description("ADMA2_64", dscr); -break; +{ +dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1, +MEMTXATTRS_UNSPECIFIED); +dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2, +MEMTXATTRS_UNSPECIFIED); +dscr->length = le16_to_cpu(dscr->length); +dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8, +MEMTXATTRS_UNSPECIFIED); +dscr->addr = le64_to_cpu(dscr->addr); +dscr->attr &= (uint8_t) ~0xC0; +dscr->incr = 12; +trace_adma_description("ADMA2_64", dscr); +break; +} } } -- 2.45.2
[PATCH-for-9.1? v2 3/4] hw/sd/sdhci: Read ADMA2_64 descriptor with a single dma_memory_read()
Instead of 3 consecutive dma_memory_read() calls, use a packed structure to read the descriptor in a single call. Signed-off-by: Philippe Mathieu-Daudé --- hw/sd/sdhci.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c index 0a95f49b93..2d8fa3151a 100644 --- a/hw/sd/sdhci.c +++ b/hw/sd/sdhci.c @@ -740,16 +740,20 @@ static void get_adma_description(SDHCIState *s, ADMADescr *dscr) } case SDHC_CTRL_ADMA2_64: { -dma_memory_read(s->dma_as, entry_addr, &dscr->attr, 1, +struct { +uint8_t attr; +uint8_t pad; +uint16_t length; +uint64_t addr; +} QEMU_PACKED adma2; +QEMU_BUILD_BUG_ON(sizeof(adma2) != 12); + +dma_memory_read(s->dma_as, entry_addr, &adma2, sizeof(adma2), MEMTXATTRS_UNSPECIFIED); -dma_memory_read(s->dma_as, entry_addr + 2, &dscr->length, 2, -MEMTXATTRS_UNSPECIFIED); -dscr->length = le16_to_cpu(dscr->length); -dma_memory_read(s->dma_as, entry_addr + 4, &dscr->addr, 8, -MEMTXATTRS_UNSPECIFIED); -dscr->addr = le64_to_cpu(dscr->addr); -dscr->attr &= (uint8_t) ~0xC0; -dscr->incr = 12; +dscr->length = le16_to_cpu(adma2.length); +dscr->addr = le64_to_cpu(adma2.addr); +dscr->attr = adma2.attr & (uint8_t) ~0xc0; +dscr->incr = sizeof(adma2); trace_adma_description("ADMA2_64", dscr); break; } -- 2.45.2
Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
On 2024/07/31 18:34, Yui Washizu wrote: On 2024/07/15 14:15, Akihiko Odaki wrote: On 2024/05/16 11:00, Yui Washizu wrote: On 2024/04/28 18:05, Akihiko Odaki wrote: Based-on: <20240315-reuse-v9-0-67aa69af4...@daynix.com> ("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements") Introduction This series is based on the RFC series submitted by Yui Washizu[1]. See also [2] for the context. This series enables SR-IOV emulation for virtio-net. It is useful to test SR-IOV support on the guest, or to expose several vDPA devices in a VM. vDPA devices can also provide L2 switching feature for offloading though it is out of scope to allow the guest to configure such a feature. The PF side code resides in virtio-pci. The VF side code resides in the PCI common infrastructure, but it is restricted to work only for virtio-net-pci because of lack of validation. User Interface -- A user can configure a SR-IOV capable virtio-net device by adding virtio-net-pci functions to a bus. Below is a command line example: -netdev user,id=n -netdev user,id=o -netdev user,id=p -netdev user,id=q -device pcie-root-port,id=b -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f The VFs specify the paired PF with "sriov-pf" property. The PF must be added after all VFs. It is user's responsibility to ensure that VFs have function numbers larger than one of the PF, and the function numbers have a consistent stride. I tried to start a VM with more than 8 VFs allocated using your patch, but the following error occured and qemu didn't work: VF function number overflows. I think the cause of this error is that virtio-net-pci PFs don't have ARI. (pcie_ari_init is not added to virtio-net-pci when PFs are initialized.) I think it is possible to add it later, but how about adding pcie_ari_init ? As a trial, adding pcie_ari_init to virtio_pci_realize enabled the creation of more than 8 VFs. I have just looked into that possibility, but adding pcie_ari_init to virtio_pci_realize has some implications. Unconditionally calling pcie_ari_init will break the existing configuration of virtio-pci devices so we need to implement some logic to detect when ARI is needed. Preferably such logic should be implemented in the common PCI infrastructure instead of implementing it in virtio-pci so that other PCI multifunction devices can benefit from it. While I don't think implementing this will be too complicated, I need to ensure that such a feature is really needed before doing so. OK. I want to use this emulation for offloading virtual network in a environment where there are many containers in VMs. So, I consider that the feature is need. I think that 7 VFs are too few. I'll keep thinking about the feature's necessity. I understand there could be many containers in VMs, but will a single device deal with them? If the virtio-net VFs are backed by the vDPA capability of one physical device, it will not have VFs more than that. The VMs must have several PFs individually paired with VFs to accommodate more containers on one VM. I don't know much about vDPA-capable device, but as a reference, igb only has 8 VFs. I'll add other comments to RFC v5 patch. The RFC tag is already dropped. Regards, Akihiko Odaki
Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
On Thu, Aug 01, 2024 at 02:37:55PM +0900, Akihiko Odaki wrote: > I don't know much about vDPA-capable device, but as a reference, igb only > has 8 VFs. modern vdpa capable devices have much more than 8 VFs, 8 is a very low number. -- MST